Unsigned comparisons using signed types
Daniel Lemire's blog
There are two main types of fixed-precision integers in modern software: unsigned and signed. In C++20 and above, the signed integers must use the two’s complement convention. Other programming languages typically specify two’s complement as well.
Two’s complement is a method for representing signed integers in binary, where the leftmost bit serves as the sign bit—0 for positive numbers and 1 for negative ones—allowing computers to handle both positive and negative values efficiently with simple arithmetic. Positive numbers are written as standard binary (e.g., 5 in 8-bit is 00000101), while negative numbers are formed by taking the positive version, flipping all bits (one’s complement), and adding 1 (e.g., -5 becomes 11111011 from 00000101).
There are cases where you have signed types and you wish to use unsigned comparisons. It happens in Java (which lacks unsigned types) with some of the x64 SIMD instruction sets: extensions to the x64 architecture that allow a singleinstruction to perform the same operation on multiple data elements simultaneously.
Thus, you would want any negative value to be larger than any positive value. The trick is to simply use the fact that arithmetic operations typically wrap around in hardware. Let M be the smallest possible value (a very small negative), and take two signed integers x and y, then (x + M) < (y + M) is equivalent to comparing x and y as if they had been cast to unsigned integer values.
To see why it is so, consider that the transformation x to x + M maps 0 to M (so 0 becomes the smallest value). It maps all positive values to the range from 1 to –M-1 to the range from M+1 to –1 (so positives become negatives). The negative values all overflow in such a way that the last bit becomes a zero while other bits remain unchanged. Thus it maps negative values from M to -1 to the range 0 to -M-1 (so negatives become positives).
Unfortunately, it will not generally work in C or C++ where signed overflow leads to an undefined behavior. However, it will work if you code for SIMD instruction sets using intrinsics (special functions used by SIMD enthusiasts). It will also work in other programming languages.
Generated by RSStT. The copyright belongs to the original author.