Is There Any Alternative to Using % (Modulus) in C/C++

Is there any alternative to using % (modulus) in C/C++?

Ah, the joys of bitwise arithmetic. A side effect of many division routines is the modulus, so in few cases should division actually be faster than modulus. I'm interested to see the source you got this information from. Processors with multipliers have interesting division routines using the multiplier, but you can get from the division result to the modulus with just two more steps (multiply and subtract), so it's still comparable. If the processor has a built-in division instruction, you'll likely see it also provides the remainder.
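The "two more steps" can be sketched in C as below. `rem_from_quotient` and `rem_via_div` are illustrative names, not from the original. `div()` is the standard C library function from <stdlib.h>, which returns the quotient and remainder together in a `div_t`. Whether a given compiler actually issues a single divide instruction for both is up to the compiler.

```c
#include <stdlib.h>  /* div, div_t */

/* Remainder recovered from the quotient with one multiply and one subtract.
   Assumes b != 0 and that a / b does not overflow (e.g. INT_MIN / -1). */
int rem_from_quotient(int a, int b)
{
    int q = a / b;      /* the expensive step */
    return a - q * b;   /* multiply + subtract gives the remainder */
}

/* Standard C's div() hands back quotient and remainder together. */
int rem_via_div(int a, int b)
{
    div_t r = div(a, b);
    return r.rem;
}
```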

Still, there is a small branch of number theory devoted to modular arithmetic, which requires study if you really want to understand how to optimize a modulus operation. Modular arithmetic, for instance, is very handy for generating magic squares.

So, in that vein, here's a very low-level look at the math of modulus for a concrete example (reducing a value mod 7), which should show you how simple it can be compared to division:


Maybe a better way to think about the problem is in terms of number
bases and modulo arithmetic. For example, your goal is to compute DOW
mod 7 where DOW is the 16-bit representation of the day of the
week. You can write this as:

  DOW = DOW_HI*256 + DOW_LO

  DOW%7 = (DOW_HI*256 + DOW_LO) % 7
        = ((DOW_HI*256)%7 + (DOW_LO%7)) % 7
        = ((DOW_HI%7 * 256%7) + (DOW_LO%7)) % 7
        = ((DOW_HI%7 * 4) + (DOW_LO%7)) % 7

Expressed in this manner, you can separately compute the modulo 7
result for the high and low bytes. Multiply the result for the high byte
by 4, add it to the result for the low byte, and finally compute that
sum modulo 7.
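A minimal sketch of the byte-split identity, using plain % so each step can be checked against x % 7. The function names are hypothetical. The check loop covers all 65536 16-bit inputs.

```c
#include <stdint.h>

/* Direct transcription of the identity: 256 % 7 == 4, so the high byte's
   residue is weighted by 4 before the final reduction. */
unsigned mod7_by_bytes(uint16_t dow)
{
    unsigned hi = dow >> 8;     /* DOW_HI */
    unsigned lo = dow & 0xFF;   /* DOW_LO */
    return ((hi % 7) * 4 + (lo % 7)) % 7;
}

/* Returns 1 if the identity agrees with x % 7 for every 16-bit value. */
int mod7_by_bytes_ok(void)
{
    for (uint32_t x = 0; x <= 0xFFFF; ++x)
        if (mod7_by_bytes((uint16_t)x) != x % 7)
            return 0;
    return 1;
}
```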

Computing the mod 7 result of an 8-bit number can be performed in a
similar fashion. You can write an 8-bit number in octal like so:

  X = a*64 + b*8 + c

where a, b, and c are 3-bit numbers.

  X%7 = ((a%7)*(64%7) + (b%7)*(8%7) + c%7) % 7
      = (a%7 + b%7 + c%7) % 7
      = (a + b + c) % 7

since 64%7 = 8%7 = 1.
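The same argument works for any width: because 8 % 7 == 1, each octal digit contributes its own value mod 7, much like casting out nines in decimal. The sketch below is a hypothetical helper, assuming a 32-bit unsigned input. It keeps summing 3-bit digits until at most 7 remains.

```c
#include <stdint.h>

/* x % 7 via repeated octal digit sums. For x >= 8 the digit sum is
   strictly smaller than x, so the outer loop terminates. */
unsigned mod7_octal(uint32_t x)
{
    while (x > 7) {
        uint32_t sum = 0;
        while (x != 0) {
            sum += x & 7;   /* take one octal digit */
            x >>= 3;
        }
        x = sum;
    }
    return x == 7 ? 0 : x;  /* 7 is congruent to 0 */
}
```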

Of course, a, b, and c are:

  c = X & 7
  b = (X>>3) & 7
  a = (X>>6) & 7   // (actually, a is only 2 bits)

The largest possible value for a+b+c is 7+7+3 = 17. So, you'll need
one more octal step. The complete (untested) C version could be
written like:

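unsigned char Mod7Byte(unsigned char X)
{
    X = (X&7) + ((X>>3)&7) + (X>>6);   /* a + b + c, at most 17 */
    X = (X&7) + (X>>3);                /* second octal fold, at most 8 */

    return X >= 7 ? X - 7 : X;         /* 7 and 8 reduce to 0 and 1 */
}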
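An exhaustive check over all 256 byte values is cheap. Note that the second fold can produce 8 (X = 127 gives a + b + c = 15, which folds to 7 + 1 = 8), so the last step needs to subtract 7 from any value of 7 or more rather than only mapping 7 to 0. The sketch repeats the routine in that form so it compiles on its own. `Mod7Byte_first_failure` is a hypothetical test helper.

```c
/* Byte routine with the final step as a conditional subtraction of 7. */
unsigned char Mod7Byte(unsigned char X)
{
    X = (X & 7) + ((X >> 3) & 7) + (X >> 6);   /* 0..17 */
    X = (X & 7) + (X >> 3);                    /* 0..8  */
    return X >= 7 ? X - 7 : X;
}

/* Compares against the % operator: returns the first failing input,
   or -1 if all 256 inputs agree. */
int Mod7Byte_first_failure(void)
{
    for (int x = 0; x < 256; ++x)
        if (Mod7Byte((unsigned char)x) != x % 7)
            return x;
    return -1;
}
```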

I spent a few moments writing a PIC version. The actual implementation
is slightly different from the one described above:

Mod7Byte:
        movwf   temp1           ;
        andlw   7               ;W=c
        movwf   temp2           ;temp2=c
        rlncf   temp1,F         ;
        swapf   temp1,W         ;W= a*8+b
        andlw   0x1F
        addwf   temp2,W         ;W= a*8+b+c
        movwf   temp2           ;temp2 is now a 6-bit number
        andlw   0x38            ;get the high 3 bits == a'
        xorwf   temp2,F         ;temp2 now has the 3 low bits == b'
        rlncf   WREG,F          ;shift the high bits right 3
        swapf   WREG,F          ;(rotate left 1, then swap nibbles)
        addwf   temp2,W         ;W = a' + b'

        ; at this point, W is between 0 and 10

        addlw   -7
        bc      Mod7Byte_L2
Mod7Byte_L1:
        addlw   7
Mod7Byte_L2:
        return
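For readers without a PIC18 at hand, here is a C model of what the assembly above computes. It is a transliteration for checking the arithmetic, not the author's code. The rotate/swap pair is replaced by plain shifts, and the `addlw -7` / `bc` / `addlw 7` sequence becomes one conditional subtraction, which suffices because the intermediate sum is at most 10. `mod7_pic_model` is a hypothetical name.

```c
/* C model of the PIC Mod7Byte routine. */
unsigned char mod7_pic_model(unsigned char x)
{
    unsigned t = (x >> 3) + (x & 7u);   /* a*8 + b + c, at most 31 + 7 = 38 */
    t = (t >> 3) + (t & 7u);            /* a' + b', at most 10 */
    return (unsigned char)(t >= 7 ? t - 7 : t);  /* addlw -7 / bc / addlw 7 */
}
```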

Here's a little routine to test the algorithm:

        clrf    x
        clrf    count

TestLoop:
        movf    x,W
        rcall   Mod7Byte
        cpfseq  count
        bra     fail

        incf    count,W
        xorlw   7
        skpz
        xorlw   7
        movwf   count

        incfsz  x,F
        bra     TestLoop
passed:

Finally, for the 16-bit result (which I have not tested), you could
write:

#include <stdint.h>

uint16_t Mod7Word(uint16_t X)
{
    /* 256 % 7 == 4, so the high byte's residue is weighted by 4 */
    return Mod7Byte(Mod7Byte(X & 0xff) + Mod7Byte(X>>8)*4);
}
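Mod7Word can be checked exhaustively as well. The sketch below uses `uint16_t` from <stdint.h> in place of the nonstandard `uint16`, and repeats a byte routine whose last step subtracts 7 from anything of 7 or more. `Mod7Word_all_ok` is a hypothetical test helper.

```c
#include <stdint.h>

/* Byte routine with the final step as a conditional subtraction of 7. */
static unsigned char Mod7Byte(unsigned char X)
{
    X = (X & 7) + ((X >> 3) & 7) + (X >> 6);
    X = (X & 7) + (X >> 3);
    return X >= 7 ? X - 7 : X;
}

/* Each Mod7Byte result is 0..6, so the outer argument is at most
   6 + 6*4 = 30 and still fits in a byte. */
uint16_t Mod7Word(uint16_t X)
{
    return Mod7Byte(Mod7Byte(X & 0xff) + Mod7Byte(X >> 8) * 4);
}

/* Returns 1 if Mod7Word agrees with % for all 65536 inputs. */
int Mod7Word_all_ok(void)
{
    for (uint32_t x = 0; x <= 0xFFFF; ++x)
        if (Mod7Word((uint16_t)x) != x % 7)
            return 0;
    return 1;
}
```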

Scott


Alternative to using % operator and / Operator in C++

Nothing is going to be considerably more efficient than the % operator. If there were a better way to do it, any reasonable compiler would automatically convert it. When you're told that % and / are inefficient, that's just because those are difficult operations; if you need to perform a modulo, then do that.

There may be special cases where there are better ways (for example, mod a power of two can be written as a bitwise AND with one less than that power, at least for non-negative operands), but those are probably optimized by your compiler.
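The AND shortcut is only a drop-in replacement for unsigned (or known non-negative) operands. With signed operands, C99 and C++11 define % to truncate toward zero, so a negative dividend gives a negative remainder that the mask does not reproduce. This is one reason compilers emit a few extra instructions for signed % by a power of two. A small illustration with hypothetical helper names:

```c
#include <stdint.h>

/* For unsigned operands and a power-of-two divisor, % and & agree.
   m must be a power of two. */
uint32_t mod_pow2_u(uint32_t n, uint32_t m)
{
    return n & (m - 1);
}

/* For signed operands they do not: -1 % 8 == -1 (truncation toward zero),
   whereas -1 & 7 == 7 on two's-complement machines. */
int signed_rem(int n)  { return n % 8; }
int signed_mask(int n) { return n & 7; }
```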

What is the replacement of mod operator?

It appears from your question that bpp is always a power of 2, in which case (as long as gid is non-negative) you can use instead:

int rem = gid & (bpp - 1);

However you should not optimise prematurely - unless you have profiled and know for certain that this mod operation is a bottleneck then you should just leave it in its original, more readable form.
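If you do use the mask, it can help to state the assumptions in code. Below is a sketch with hypothetical helpers `is_pow2` and `rem_pow2`. The assert documents (and, in debug builds, checks) that gid is non-negative and bpp is a power of two.

```c
#include <assert.h>

/* True when v is a nonzero power of two: such values have exactly one
   set bit, so clearing the lowest set bit leaves zero. */
static int is_pow2(unsigned v) { return v != 0 && (v & (v - 1)) == 0; }

/* gid % bpp for non-negative gid and power-of-two bpp. */
int rem_pow2(int gid, int bpp)
{
    assert(gid >= 0 && bpp > 0 && is_pow2((unsigned)bpp));
    return gid & (bpp - 1);   /* equals gid % bpp under the assertion */
}
```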

Faster modulus in C/C#?

If the denominator is known at compile time to be a power of 2, like your example of 2048, you could subtract 1 and do a bitwise-and.

That is:

n % m == (n & (m - 1))

...where m is a power of 2 and n is non-negative. (The parentheses matter: in C, == binds more tightly than &.)

For example:

22 % 8 == 22 - 16 == 6

Dec               Bin
------------      -------
22           =    10110
8            =    01000
8 - 1        =    00111
22 & (8 - 1) =    10110
                & 00111
                  -----
6            =    00110

Bear in mind that a good compiler will have its own optimizations for %, maybe even enough to be as fast as the above technique. Arithmetic operators tend to be pretty heavily optimized.

How to code a modulo (%) operator in C/C++/Obj-C that handles negative numbers

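First of all, I'd like to note that you cannot even rely on the fact that (-1) % 8 == -1. The only thing you can rely on is that (x / y) * y + (x % y) == x. However, whether or not the remainder is negative is implementation-defined. (This applies to C++03, quoted below, and to C89. Since C99 and C++11, integer division truncates toward zero, so a nonzero remainder takes the sign of the dividend and (-1) % 8 == -1 is guaranteed.)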

Reference: C++03 paragraph 5.6 clause 4:

The binary / operator yields the quotient, and the binary % operator yields the remainder from the division of the first expression by the second. If the second operand of / or % is zero the behavior is undefined; otherwise (a/b)*b + a%b is equal to a. If both operands are nonnegative then the remainder is nonnegative; if not, the sign of the remainder is implementation-defined.
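The guarantee quoted above can be exercised directly. Below is a small sketch assuming a C99 or later compiler, where division truncates toward zero so a nonzero remainder has the sign of the dividend. `division_identity_holds` is a hypothetical name.

```c
/* Checks (a/b)*b + a%b == a, and that a nonzero remainder has the sign
   of a, for every sign combination in a small range. */
int division_identity_holds(void)
{
    for (int a = -20; a <= 20; ++a)
        for (int b = -9; b <= 9; ++b) {
            if (b == 0)
                continue;                       /* division by zero is UB */
            if ((a / b) * b + a % b != a)
                return 0;
            if (a % b != 0 && (a % b < 0) != (a < 0))
                return 0;
        }
    return 1;
}
```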

Here is a version that handles negative values of either operand so that the result matches floored division: subtracting the result from the dividend leaves floor(a / b) * b. mod(-1, 8) yields 7, while mod(13, -8) yields -3.

int mod(int a, int b)
{
    if (b < 0) // you can check for b == 0 separately and do what you want
        return -mod(-a, -b); // note: negating INT_MIN overflows, not handled here
    int ret = a % b;
    if (ret < 0)
        ret += b;
    return ret;
}
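One way to convince yourself that mod() matches floored division is to compare it against an integer floor-division reference over a grid of small values. The function is repeated so the block compiles on its own. `floor_div` and `mod_matches_floor` are hypothetical helpers, and the C99 truncation rule is assumed.

```c
/* mod() as in the answer above. */
int mod(int a, int b)
{
    if (b < 0)
        return -mod(-a, -b);
    int ret = a % b;
    if (ret < 0)
        ret += b;
    return ret;
}

/* Floored division built from C's truncating division: step the quotient
   down by one when there is a remainder and the operand signs differ. */
int floor_div(int a, int b)
{
    int q = a / b;
    if (a % b != 0 && ((a < 0) != (b < 0)))
        --q;
    return q;
}

/* Returns 1 if mod(a, b) == a - floor_div(a, b) * b on a small grid. */
int mod_matches_floor(void)
{
    for (int a = -50; a <= 50; ++a)
        for (int b = -9; b <= 9; ++b) {
            if (b == 0)
                continue;
            if (mod(a, b) != a - floor_div(a, b) * b)
                return 0;
        }
    return 1;
}
```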

Why is operator% referred to as the modulus operator instead of the remainder operator?

It seems like a misnomer to me to call it "modulus" rather than "remainder" (in math, the answer really should be 9).

C calls it the % operator, and calls its result the remainder. C++ copies this from C. Neither language calls it the modulus operator. This also explains why the remainder is negative: because the / operator truncates towards 0, and (a / b) * b + (a % b) should equal a.

Edit: David Rodríguez rightly points out that C++ does define a template class, std::modulus, which calls operator%. In my opinion, that class is poorly named. Digging a little, I found it is inherited from the STL, where it already had its current name. The download page for the STL says "The STL was developed on SGI MIPSpro™ C++ 7.0, 7.1, 7.2, and 7.2.1.", and as far as I can tell without actually having the compiler and hardware, MIPSpro passes the division to the CPU, and MIPS hardware truncates toward 0, which would mean std::modulus has always been misnamed.

Fastest way to get a positive modulo in C/C++

Most of the time, compilers are very good at optimizing your code, so it is usually best to keep your code readable (for both compilers and other developers to know what you are doing).

Since your array size is always positive, I suggest you define the divisor as unsigned. The compiler can turn small if/else blocks into conditional-move instructions, which avoid branches:

unsigned modulo(int value, unsigned m) {
    int mod = value % (int)m;
    if (mod < 0) {
        mod += m;
    }
    return mod;
}

This creates a very small function without branches:

modulo(int, unsigned int):
        mov     eax, edi
        cdq
        idiv    esi
        add     esi, edx
        mov     eax, edx
        test    edx, edx
        cmovs   eax, esi
        ret

For example, modulo(-5, 7) returns 2.

Unfortunately, since the divisor is not known at compile time, the compiler must perform an integer division, which is a bit slow compared to other integer operations. If you know the sizes of your arrays are powers of two, I recommend keeping these function definitions in a header, so that the compiler can inline them and optimize for the constant divisor. Here is the function unsigned modulo256(int v) { return modulo(v,256); }:

modulo256(int):                          # @modulo256(int)
        mov     edx, edi
        sar     edx, 31
        shr     edx, 24
        lea     eax, [rdi+rdx]
        movzx   eax, al
        sub     eax, edx
        lea     edx, [rax+256]
        test    eax, eax
        cmovs   eax, edx
        ret

See assembly: https://gcc.godbolt.org/z/DG7jMw

See comparison with most voted answer: http://quick-bench.com/oJbVwLr9G5HJb0oRaYpQOCec4E4

[Image: benchmark comparison]
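When m is a known power of two there is one more division-free option, not part of the original answer. Converting an int to unsigned is defined to reduce it modulo UINT_MAX + 1, which is a power of two, so masking the low bits of `(unsigned)v` already yields the non-negative remainder. Below is a sketch with the hypothetical name `modulo256_mask`, checked against the `modulo` function from the answer.

```c
/* The if-based version from the answer, used as the reference. */
unsigned modulo(int value, unsigned m)
{
    int mod = value % (int)m;
    if (mod < 0)
        mod += m;
    return mod;
}

/* Mask-based version for m == 256. (unsigned)v is v reduced modulo
   UINT_MAX + 1, a multiple of 256, so the low 8 bits are v mod 256. */
unsigned modulo256_mask(int v)
{
    return (unsigned)v & 255u;
}

/* Returns 1 if both versions agree over a range of positive and
   negative inputs. */
int modulo256_mask_ok(void)
{
    for (int v = -100000; v <= 100000; ++v)
        if (modulo256_mask(v) != modulo(v, 256))
            return 0;
    return 1;
}
```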

Edit: it turns out Clang is able to generate a function without any conditional-move instructions (which cost more than regular arithmetic operations). The difference is negligible in the general case, because the integer division takes around 70% of the total time.

Basically, Clang shifts mod right to extend its sign bit across the whole width of m (giving 0xffffffff when mod is negative and 0 otherwise), and uses the result to mask the second operand in mod + m.

#include <limits>

unsigned modulo(int value, unsigned m) {
    int mod = value % (int)m;
    // arithmetic shift: all ones if mod < 0, zero otherwise
    m &= mod >> std::numeric_limits<int>::digits;
    return mod + m;
}
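The branchless variant can be written in C as well, with `sizeof(int) * CHAR_BIT - 1` standing in for `std::numeric_limits<int>::digits`. It relies on `>>` of a negative int being an arithmetic shift, which is implementation-defined in C (and in C++ before C++20) but is what GCC and Clang do. `modulo_branchless` and `modulo_branchy` are hypothetical names, and the two are compared over a range of values.

```c
#include <limits.h>   /* CHAR_BIT */

/* If-based reference version. */
unsigned modulo_branchy(int value, unsigned m)
{
    int mod = value % (int)m;
    return mod < 0 ? (unsigned)mod + m : (unsigned)mod;
}

/* Branchless version: mod >> (bits - 1) is all ones when mod < 0
   (assuming an arithmetic shift) and zero otherwise, so m is added
   only when the remainder is negative. */
unsigned modulo_branchless(int value, unsigned m)
{
    int mod = value % (int)m;
    m &= (unsigned)(mod >> (sizeof(int) * CHAR_BIT - 1));
    return (unsigned)mod + m;
}
```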

