Parsing 8-bit integers quickly

Daniel Lemire's blog

Suppose that you want to parse quickly 8-bit integers (0, 1, 2, …, 254, 255) from an ASCII/UTF-8 string. The problem comes up in the simdzone project lead by Jeroen Koekkoek (NLnet Labs). You are given a string and its length: e.g., ’22’ and length is 2. The naive approach in C might be:

int parse_uint8_naive(const char *str, size_t len, uint8_t *num) {
  uint32_t n = 0;
  for (size_t i = 0, r = len & 0x3; i < r; i++) {
    uint8_t d = (uint8_t)(str[i] - '0');
    if (d > 9)
     return 0;
    n = n * 10 + d;
  }
  *num = (uint8_t)n;
  return n < 256 && len && len < 4;
}

This code is a C function that takes a string of characters, its length, and a pointer to an unsigned 8-bit integer as input arguments. The function returns a Boolean value indicating whether the input string can be parsed into an unsigned 8-bit integer or not.

The function first initializes a 32-bit unsigned integer n to zero, we will store our answer there. The function then iterates over the input string, extracting each digit character from the string and converting it to an unsigned 8-bit integer. The extracted digit is then added to n after being multiplied by 10. This process continues until the end of the string or until the function has processed 4 bytes of the string. Finally, the function assigns the value of n to the unsigned 8-bit integer pointed to by num. It then returns a boolean value indicating whether the parsed integer is less than 256 and the length of the input string is between 1 and 3 characters. If the input string contains any non-digit characters or if the length of the string is greater than 3 bytes, the function returns false.

In C++, you could call from_chars:

int parse_uint8_fromchars(const char *str, size_t len, uint8_t *num) {
  auto [p, ec] = std::from_chars(str, str + len, *num);
  return (ec == std::errc());
}

The std::from_chars function takes three arguments: a pointer to the beginning of the input character sequence, a pointer to the end of the input character sequence, and a reference to the output variable that will hold the parsed integer value. The function returns a std::from_chars_result object that contains two members: a pointer to the first character that was not parsed, and an error code that indicates whether the parsing was successful or not.

In this function, the std::from_chars function is called with the input string and its length as arguments, along with a pointer to the unsigned 8-bit integer that will hold the parsed value. The function then checks whether the error code returned by std::from_chars is equal to std::errc(), which indicates that the parsing was successful. If the parsing was successful, the function returns true. Otherwise, it returns false.

Can we do better than these functions? Suppose that you can always read 4 bytes, even if the string is shorter (i.e., there is a buffer). This is often a safe assumption. Then you can load your input into a 32-bit word and process all bytes at once, in a single register. This often called SWAR: In computer science, SWAR means SIMD within a register, which is a technique for performing parallel operations on data contained in a processor register.

Jeroen Koekkoek first came up with a valid SWAR approach, but it was only about 40% than the naive approach in the case where we had unpredictable inputs, and no faster than the naive approach given predictable inputs. Let us examine an approach that might be competitive all around:

int parse_uint8_fastswar(const char *str, size_t len, uint8_t *num) {
  union {
    uint8_t as_str[4];
    uint32_t as_int;
  } digits;
  memcpy(&digits.as_int, str, sizeof(digits));
  digits.as_int ^= 0x30303030lu;
  digits.as_int <<= (4 - (len & 0x3)) * 8;
  uint32_t y = ((UINT64_C(0x640a0100) * digits.as_int)>>32)&0xff;
  *num = (uint8_t)(y);
  return (digits.as_int & 0xf0f0f0f0) == 0 && y < 256 && len != 0 
          && len < 4;
}

Again, this code is a C function that takes a string of characters, its length, and a pointer to an unsigned 8-bit integer as input arguments. The function returns a boolean value indicating whether the input string can be parsed into an unsigned 8-bit integer or not. The function first initializes a union digits that contains an array of 4 unsigned 8-bit integers and a 32-bit unsigned integer. The function then copies the input string into the 32-bit unsigned integer using the memcpy function. The memcpy function copies the input string into the 32-bit unsigned integer.

The function then flips the bits of the 32-bit unsigned integer using the XOR operator and the constant value 0x30303030lu. This operation converts each digit character in the input string to its corresponding decimal value. Indeed, the ASCII characters from 0 to 9 have code point values 0x30 to 0x39 in ASCII. The function then shifts the 32-bit unsigned integer to the left by a certain number of bits, depending on the length of the input string. This operation removes any trailing bytes that were not part of the input string.

The next part is where I contributed to the routine. The function then multiplies the 32-bit unsigned integer by the constant value 0x640a0100 using a 64-bit unsigned integer. It is a concise way to do two multiplications (by 100 and by 10) and two sums at once. Observe that 0x64 is 100 and 0x0a is 10. The result of this multiplication is then shifted to the right by 32 bits and masked with the value 0xff. This operation extracts the least significant byte of the 32-bit unsigned integer, which represents the parsed unsigned 8-bit integer. Finally, the function assigns the value of the parsed unsigned 8-bit integer to the unsigned 8-bit integer pointed to by num. It then returns a boolean value indicating whether the parsed integer is less than 256 and the length of the input string is between 1 and 3 characters.

To test these functions, I wrote a benchmark. The benchmark uses random inputs, or sequential inputs (0,1,…), and it ends up being very relevant. I use GCC 12 and an Ice Lake (Intel) linux server running at 3.2 GHz. I report the number of millions of numbers parsed by second.

So the SWAR approach is effectively twice as fast as the naive approach when the inputs are unpredictable. Otherwise, for predictable inputs, we still have a 30% gain according to my numbers. The from_chars results are disappointing all around.

Can you do better? The benchmark is available.

Credit: I am grateful to Jeroen Koekkoek from NLnet Labs for suggesting this problem.

Generated by RSStT. The copyright belongs to the original author.

Source

Parsing 8-bit integers quickly

Report Page