Fast character classification with z3
Daniel Lemire's blog
We often need to quickly classify characters. For example, consider how the binary data that you send by email is converted to an ASCII string made of 64 distinct characters (A-Z, a-z, 0-9, +, /). ASCII characters are stored as 7-bit integer values (from 0 to 127), one byte per character. During the decoding of a base64 string, you might want to identify which characters are base64 characters. If you are processing the data character-by-character from an ASCII or UTF-8 stream, a simple 256-element lookup table suffices. E.g., you create a table filled with false values, except at the indexes corresponding to the 7-bit code point values of the designated characters (A-Z, a-z, 0-9, +, /).
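To make this concrete, here is a minimal Python sketch of the scalar approach; the table and function names (is_base64, all_base64) are mine, not from the post:

import string

# 256-entry table: True only at the code points of the base64 alphabet.
base64_alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
is_base64 = [False] * 256
for ch in base64_alphabet:
    is_base64[ord(ch)] = True

def all_base64(data: bytes) -> bool:
    # Check the input one byte at a time against the lookup table.
    return all(is_base64[b] for b in data)

print(all_base64(b"SGVsbG8h"))   # True
print(all_base64(b"SGVsbG8h!"))  # False: '!' is not a base64 character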
However, if you are using fancier techniques such as Single Instruction, Multiple Data (SIMD) instructions, where you try to process the data in blocks of 16 characters or more, the standard lookup table fails: there is no cheap way to do 16 independent lookups in a 256-entry table at once.
I call the process of classifying characters using SIMD instructions vectorized classification. In a common version of vectorized classification, each character’s ASCII value is split into two 4-bit nibbles (lower and upper), and two 16-byte lookup tables (lut_lo and lut_hi) classify the character via a bitwise AND: f(c) = lut_lo[c & 0x0F] AND lut_hi[c >> 4]. For base64 decoding, you might want f(c) = 0 to indicate a valid base64 character, while f(c) > 0 flags an invalid one, allowing you to validate a whole vector of characters with just a few instructions.
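To see the mechanics without the SIMD details, here is a small sketch (function names are mine) that emulates the classification one byte at a time; a real SIMD implementation performs each 16-entry lookup on 16 or more bytes at once with a byte-shuffle instruction such as pshufb on x64:

def classify_byte(c, lut_lo, lut_hi):
    # f(c) = lut_lo[c & 0x0F] AND lut_hi[c >> 4]
    return lut_lo[c & 0x0F] & lut_hi[c >> 4]

def block_is_base64(block, lut_lo, lut_hi):
    # The block is valid base64 if every byte classifies to zero.
    return all(classify_byte(c, lut_lo, lut_hi) == 0 for c in block)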
How do you figure out the content of the lut_lo and lut_hi tables? In “Parsing gigabytes of JSON per second” (Langdale and Lemire, 2019), as well as in other papers, we describe how you can reason it out. But in other papers, such as “Faster Base64 Encoding and Decoding Using AVX2 Instructions” (Muła and Lemire, 2018), we just give the solution without further explanation.
The constants in these tables must be computed to satisfy the classification rules for all possible input characters, which can be a confusing task given the overlapping nibble patterns of valid and invalid characters. I do not recommend using ChatGPT. However, you can automate the task with a theorem prover such as Z3. Z3 can model the classification constraints as a satisfiability problem and automatically compute a set of table values that meet the requirements.
The following Python code uses Z3 to solve for the lut_lo and lut_hi tables. It begins by importing Z3 and creating a solver instance (s = Solver()). Two lists of 8-bit bitvector variables, lut_lo and lut_hi, are defined to represent the lookup tables, each with 16 entries. The base64 character set (A-Z, a-z, 0-9, +, /) is defined using ASCII ranges, and invalid characters are identified as all other byte values (0 to 255). Constraints are added to the solver: base64 characters must classify to 0, and invalid characters must classify to a value greater than 0. The solver checks for a solution; if one is found, it extracts the table values. It is quite easy! Admittedly, you have to know how to use Z3.
from z3 import *
s = Solver()
lut_lo = [BitVec(f'lut_lo_{i}', 8) for i in range(16)]
lut_hi = [BitVec(f'lut_hi_{i}', 8) for i in range(16)]
# Base64 characters (A-Z, a-z, 0-9, +, /)
base64_chars = list(range(65, 91)) + list(range(97, 123)) + list(range(48, 58)) + [43, 47]
invalid_chars = [c for c in range(256) if c not in base64_chars]
def classify(c):
    # Model of the vectorized classification: AND of the two nibble lookups.
    lower_nibble = c & 0x0F
    upper_nibble = c >> 4
    return lut_lo[lower_nibble] & lut_hi[upper_nibble]

# Valid base64 characters must classify to 0.
for c in base64_chars:
    s.add(classify(c) == 0)

# All other byte values must classify to a non-zero value.
for c in invalid_chars:
    s.add(classify(c) > 0)

if s.check() == sat:
    m = s.model()
    print("lut_lo =", [m[lut_lo[i]].as_long() for i in range(16)])
    print("lut_hi =", [m[lut_hi[i]].as_long() for i in range(16)])

In this instance, it finds the following solution:
lut_lo = [103, 39, 39, 39, 39, 39, 39, 39, 39, 39, 47, 31, 63, 63, 63, 31]
lut_hi = [255, 255, 224, 152, 192, 144, 192, 144, 255, 255, 255, 255, 255, 255, 255, 255]
In general the solution is not unique.
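As a sanity check (my own addition, not part of the original script), you can plug the solved tables back into the classification rule and confirm that f(c) = 0 exactly for the 64 base64 code points:

lut_lo = [103, 39, 39, 39, 39, 39, 39, 39, 39, 39, 47, 31, 63, 63, 63, 31]
lut_hi = [255, 255, 224, 152, 192, 144, 192, 144, 255, 255, 255, 255, 255, 255, 255, 255]
base64_chars = set(range(65, 91)) | set(range(97, 123)) | set(range(48, 58)) | {43, 47}
# Collect every byte value that classifies to zero and compare with the base64 alphabet.
valid = {c for c in range(256) if lut_lo[c & 0x0F] & lut_hi[c >> 4] == 0}
assert valid == base64_chars
print("tables verified")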
In Muła and Lemire (2018), we solved the full base64 decoding problem, and we needed at least one more table. I have checked in a script showing how to do the full computation in the GitHub repository I use for my blog.
If you are interested in vectorized classification, you might find the following references useful:
- Lemire, D. (2025). Scanning HTML at Tens of Gigabytes Per Second on ARM Processors. Software: Practice and Experience.
- Keiser, J., & Lemire, D. (2021). Validating UTF-8 in less than one instruction per byte. Software: Practice and Experience, 51(5).
- Koekkoek, J., & Lemire, D. (2025). Parsing Millions of DNS Records per Second. Software: Practice and Experience, 55(4), 778-788.