C++ Fast Division/Mod by 10^X

Fastest way to get mod 10 in C

Because most modern processors can do multiplication much, much faster than division, it is often possible to speed up division and modulus operations where the dividend is a known small constant by replacing the division with one or two multiplications and a few other fast operations (such as shift and addition).

To do so requires computing at compile-time some magic numbers dependent on the dividend; fortunately most modern compilers know how to do this so you don't need to do anything to take advantage. Just let your compiler do the heavy lifting for you, as @chux suggests in an excellent answer.

You can help the compiler by using unsigned types; for some dividends, signed division and modulus are harder to replace.

The basic outline of the optimisation of modulus looks like this:

If you had exact arithmetic, you could replace x % p with p * ((x * (1/p)) % 1). For constant p, 1/p can be precomputed at compile time. The %1 operation simply consists of discarding the fraction part, which is just a right-shift. So that replaces a division with two multiplies, and if p only has a few bits set, the multiply by p might be further optimised into a few left-shifts.

We can do that computation with fixed-point arithmetic, taking advantage of the fact that most processors produce a double-sized result for integer multiplication. Since we don't care about the integer part of the inner multiplication and we know that the result of the outer multiplication must be less than p, we only need to reserve ceil(log2 p) bits for the integer part of the computation, leaving the rest of the bits for the fraction. And that might give us enough precision to correctly handle the possible range of values of x, particularly if x has a limited range (eg. uint8_t or even uint16_t). The key is finding a position of the fixed-point which minimises the error in representation of 1/p.

For many small values of p, that works. For others, there is an alternative (but slower) solution which involves estimating q = x/p using multiplication by the inverse, and then computing x - q * p. If the estimate of q can be guaranteed to be either correct or off by one in a known direction, we only need to correct the final computation by conditionally adding or subtracting p; that can be accomplished without a branch on many modern CPUs. (The direction of the error is known because it will depend only on whether the approximation we chose for the inverse of the dividend was too small or too big.)

In the very specific case of x % 10 where x is a uint_8, you might be able to do better than the above using a 256-byte lookup table. That would only be worthwhile if you were doing the modulus operation in a tight loop over a large number of values, and even then you'd want to profile carefully to verify that it is an improvement.

I doubt whether that's the best expenditure of your time; there are probably much more fruitful optimisation opportunities in your application.

Divide by 10 using bit shifts?

Editor's note: this is not actually what compilers do, and gives the wrong answer for large positive integers ending with 9, starting with div10(1073741829) = 107374183 not 107374182. It is exact for smaller inputs, though, which may be sufficient for some uses.

Compilers (including MSVC) do use fixed-point multiplicative inverses for constant divisors, but they use a different magic constant and shift on the high-half result to get an exact result for all possible inputs, matching what the C abstract machine requires. See Granlund & Montgomery's paper on the algorithm.

See Why does GCC use multiplication by a strange number in implementing integer division? for examples of the actual x86 asm gcc, clang, MSVC, ICC, and other modern compilers make.

This is a fast approximation that's inexact for large inputs

It's even faster than the exact division via multiply + right-shift that compilers use.

You can use the high half of a multiply result for divisions by small integral constants. Assume a 32-bit machine (code can be adjusted accordingly):

int32_t div10(int32_t dividend)
{
    int64_t invDivisor = 0x1999999A;
    return (int32_t) ((invDivisor * dividend) >> 32);
}

What's going here is that we're multiplying by a close approximation of 1/10 * 2^32 and then removing the 2^32. This approach can be adapted to different divisors and different bit widths.

This works great for the ia32 architecture, since its IMUL instruction will put the 64-bit product into edx:eax, and the edx value will be the wanted value. Viz (assuming dividend is passed in eax and quotient returned in eax)

div10 proc 
    mov    edx,1999999Ah    ; load 1/10 * 2^32
    imul   eax              ; edx:eax = dividend / 10 * 2 ^32
    mov    eax,edx          ; eax = dividend / 10
    ret
    endp

Even on a machine with a slow multiply instruction, this will be faster than a software or even hardware divide.

Faster modulus in C/C#?

If the denominator is known at compile time to be a power of 2, like your example of 2048, you could subtract 1 and do a bitwise-and.

That is:

n % m == n & (m - 1)

...where m is a power of 2.

For example:

22 % 8 == 22 - 16 == 6

         Dec   Bin
       -----   -----
          22 = 10110
           8 = 01000  
       8 - 1 = 00111 
22 & (8 - 1) =   10110 
               & 00111 
               -------
           6 =   00110

Bear in mind that a good compiler will have its own optimizations for %, maybe even enough to be as fast as the above technique. Arithmetic operators tend to be pretty heavily optimized.

Fast ceiling of an integer division in C / C++

For positive numbers where you want to find the ceiling (q) of x when divided by y.

unsigned int x, y, q;

To round up ...

q = (x + y - 1) / y;

or (avoiding overflow in x+y)

q = 1 + ((x - 1) / y); // if x != 0

Fast Division on GCC/ARM

After that, however, it subtracted (dividend/2^31) from the result.

Actually, it subtracts dividend >> 31, which is -1 for negative dividend, and 0 for non-negative dividend, when right-shifting negative integers is arithmetic right-shift (and int is 32 bits wide).

0x6666667 = (2^34 + 6)/10

So for x < 0, we have, writing x = 10*k + r with -10 < r <= 0,

0x66666667 * (10*k+r) = (2^34+6)*k + (2^34 + 6)*r/10 = 2^34*k + 6*k + (2^34+6)*r/10

Now, arithmetic right shift of negative integers yields the floor of v / 2^n, so

(0x66666667 * x) >> 34

results in

k + floor((6*k + (2^34+6)*r/10) / 2^34)

So we need to see that

-2^34 < 6*k + (2^34+6)*r/10 < 0

The right inequality is easy, both k and r are non-positive, and not both are 0.

For the left inequality, a bit more analysis is needed.

r >= -9

so the absolute value of (2^34+6)*r/10 is at most 2^34+6 - (2^34+6)/10.

|k| <= 2^31/10,

so |6*k| <= 3*2^31/5.

And it remains to verify that

6 + 3*2^31/5 < (2^34+6)/10
1288490194   < 1717986919

Yup, true.

Which is better option to use for dividing an integer number by 2?

Use the operation that best describes what you are trying to do.

If you are treating the number as a sequence of bits, use bitshift.
If you are treating it as a numerical value, use division.

Note that they are not exactly equivalent. They can give different results for negative integers. For example:

-5 / 2  = -2
-5 >> 1 = -3

(ideone)