Is Multiplication and Division Using Shift Operators in C Actually Faster

Is multiplication and division using shift operators in C actually faster?

Short answer: Not likely.

Long answer:
Your compiler has an optimizer in it that knows how to multiply as quickly as your target processor architecture is capable. Your best bet is to tell the compiler your intent clearly (i.e. i*2 rather than i << 1) and let it decide what the fastest assembly/machine code sequence is. It's even possible that the processor itself has implemented the multiply instruction as a sequence of shifts & adds in microcode.

Bottom line--don't spend a lot of time worrying about this. If you mean to shift, shift. If you mean to multiply, multiply. Do what is semantically clearest--your coworkers will thank you later. Or, more likely, curse you later if you do otherwise.
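As a minimal sketch of the point above (the function names are made up for illustration), here are the two spellings of "double this value" on an unsigned integer. Both return the same result for every input, and an optimizing compiler will typically emit the same machine code for both, though that is worth confirming in your own compiler's assembly output:

```c
#include <stdint.h>

/* Written as a multiplication: the intent (doubling) is obvious. */
uint32_t double_by_multiply(uint32_t i)
{
    return i * 2u;
}

/* Written as a shift: same value for every input, but the intent
 * is less obvious to a reader. */
uint32_t double_by_shift(uint32_t i)
{
    return i << 1;
}
```

Unsigned types are used here because left-shifting a negative signed value is undefined behavior in C, which is one more reason to write the multiplication when multiplication is what you mean.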

Is shifting bits faster than multiplying and dividing in Java? .NET?

Most compilers today will do more than convert multiply or divide by a power-of-two to shift operations. When optimizing, many compilers can optimize a multiply or divide with a compile time constant even if it's not a power of 2. Often a multiply or divide can be decomposed to a series of shifts and adds, and if that series of operations will be faster than the multiply or divide, the compiler will use it.
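To illustrate the kind of decomposition described above, here is a hand-written sketch of multiplying by the constant 10 (not a power of two) with two shifts and an add, using 10 = 8 + 2. The function names are hypothetical, and whether a compiler actually chooses this sequence depends on the target processor:

```c
#include <stdint.h>

/* x * 10 rewritten as x * 8 + x * 2, i.e. two shifts and one add. */
uint32_t times10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* The plain form you would normally write and let the compiler optimize. */
uint32_t times10_plain(uint32_t x)
{
    return x * 10u;
}
```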

For division by a constant, the compiler can often convert the operation to a multiply by a 'magic number' followed by a shift. This can be a major clock-cycle saver since multiplication is often much faster than a division operation.
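A sketch of the 'magic number' idea for one specific case: unsigned 32-bit division by 10. The constant 0xCCCCCCCD is 2^35 / 10 rounded up, so multiplying by it in 64 bits and shifting right by 35 gives the same quotient as x / 10 for every 32-bit unsigned x. The function name is made up, and the exact sequence a given compiler emits may differ:

```c
#include <stdint.h>

/* Unsigned divide by 10 using a multiply and a shift instead of a divide.
 * 0xCCCCCCCD == ceil(2^35 / 10); the product is computed in 64 bits so
 * that no high bits are lost before the shift. */
uint32_t div10_magic(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDull) >> 35);
}
```

Other divisors and signed operands need different constants, shifts, and sometimes a correction step, which is exactly the tedious detail that is best left to the compiler.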

Henry Warren's book, Hacker's Delight, has a wealth of information on this topic, which is also covered quite well on the companion website:

http://www.hackersdelight.org/

See also a discussion (with a link or two) in:

Reading assembly code

Anyway, all this boils down to allowing the compiler to take care of the tedious details of micro-optimizations. It's been years since doing your own shifts outsmarted the compiler.

Is shifting or multiplying faster and why?

It really depends on the architecture of the processor, as well as the compiler that you're using.

But you can simply view the dis-assembly of each option, and see for yourself.

Here is what I got using Visual-Studio 2010 compiler for Pentium:

int v2 = (v<<13) + (v<<11) + (v<<4) - (v<<8);
mov         eax,dword ptr [v]  
shl         eax,0Dh  
mov         ecx,dword ptr [v]  
shl         ecx,0Bh  
add         eax,ecx  
mov         edx,dword ptr [v]  
shl         edx,4  
add         eax,edx  
mov         ecx,dword ptr [v]  
shl         ecx,8  
sub         eax,ecx  
mov         dword ptr [v2],eax  

int v2 = 10000*v;
mov         eax,dword ptr [v]  
imul        eax,eax,2710h  
mov         dword ptr [v2],eax

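So it appears that the second option is faster in my case: the multiplication compiles to a single imul, while the shift version needs four loads, four shifts, and three add/sub instructions.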

BTW, you might get a different result if you enable optimization (mine was disabled)...
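For reference, the shift expression in the first listing really is a multiplication by 10000, since 2^13 + 2^11 + 2^4 - 2^8 = 8192 + 2048 + 16 - 256 = 10000. A small sketch of both forms follows (function names are made up, and unsigned types are used so that overflow wraps instead of being undefined):

```c
#include <stdint.h>

/* 10000 * v written as shifts, adds, and one subtract. */
uint32_t times10000_shifts(uint32_t v)
{
    return (v << 13) + (v << 11) + (v << 4) - (v << 8);
}

/* The same computation written as a plain multiplication. */
uint32_t times10000_mul(uint32_t v)
{
    return 10000u * v;
}
```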

Differences in division and multiplication vs bit shifting

To sum up the answers already mentioned in the comments:

Multiplication and bit shifting are both native CPU operations, so both are fast. On typical modern x86 processors a shift completes in about one cycle and an integer multiplication in a few cycles. Division is much more expensive: something between 11 and 18 cycles was quoted, and the exact figure depends on the processor.
Using C# I cannot get close enough to the CPU to get diagnostically conclusive results because many optimizations take place between my code and the CPU.
Also, microbenchmarking is hard and can produce erroneous results, partly for the reason mentioned above.

If I forgot anything, please comment and tell me!

How can I multiply and divide using only bit shifting and adding?

To multiply in terms of adding and shifting you want to decompose one of the numbers by powers of two, like so:

21 * 5 = 10101_2 * 101_2                 (Initial step)
       = 10101_2 * (1 * 2^2  +  0 * 2^1  +  1 * 2^0)
       = 10101_2 * 2^2 + 10101_2 * 2^0
       = (10101_2 << 2) + (10101_2 << 0) (Decomposed)
       = 10101_2 * 4 + 10101_2 * 1
       = 10101_2 * 5
       = 21 * 5                          (Same as initial expression)

(_2 means base 2)

As you can see, multiplication can be decomposed into adding and shifting and back again. This is also why multiplication takes longer than bit shifts or adding - it's O(n^2) rather than O(n) in the number of bits. Real computer systems (as opposed to theoretical computer systems) have a finite number of bits, so multiplication takes a constant multiple of time compared to addition and shifting. If I recall correctly, modern processors, if pipelined properly, can do multiplication just about as fast as addition, by messing with the utilization of the ALUs (arithmetic units) in the processor.
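The decomposition above can be turned into a general routine. This is a minimal sketch (the function name is hypothetical) that walks the bits of one operand and adds a shifted copy of the other for every set bit. It uses unsigned 32-bit values, so the result wraps modulo 2^32 just as the built-in multiplication does:

```c
#include <stdint.h>

/* Multiply using only shifts and adds: for each set bit k of b,
 * add (a << k) to the result. */
uint32_t multiply_shift_add(uint32_t a, uint32_t b)
{
    uint32_t result = 0;

    while (b != 0) {
        if (b & 1u) {
            result += a;   /* current bit of b is set: add shifted a */
        }
        a <<= 1;           /* a now represents a * 2^(k+1) */
        b >>= 1;           /* move on to the next bit of b */
    }
    return result;
}
```

With the numbers from the example, multiply_shift_add(21, 5) adds 21 << 0 and 21 << 2, giving 21 + 84 = 105.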

When to use Shift operators in C#?

There is no need to use them for optimisation purposes because the compiler will take care of this for you.

Only use them when shifting bits is the real intent of your code (as in the remaining examples in your question). The rest of the time just use multiply and divide so readers of your code can understand it at a glance.
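As an illustration of code where shifting bits is the real intent, here is a small sketch that packs three 8-bit colour channels into one integer and pulls one back out. It is written in C to match the other examples here, and the function names and the 0xRRGGBB layout are assumptions chosen for the example:

```c
#include <stdint.h>

/* Pack red, green, and blue into a single value laid out as 0x00RRGGBB. */
uint32_t pack_rgb(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Extract the green channel: move it down to the low byte, then mask. */
uint8_t green_of(uint32_t rgb)
{
    return (uint8_t)((rgb >> 8) & 0xFFu);
}
```

Writing these with multiplications and divisions by 256 and 65536 would work, but it would hide the fact that the code is manipulating bit fields.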

C++ Modulus operator Vs. Shift operator, which is faster and why?

I really want to know how '%' works in lower level!

If you're asking how it is implemented then the answer is that chances are the CPU you're using has a single instruction for modulo (%). For example, take this C++ code:

int main()
{
    int x = 100;

    int mod = x % 128;
    int shift = x >> 7;

    return 0;
}

The generated x86-64 assembly code (Clang 6.0.0, evidently compiled without optimization) for it is:

main:
    push    rbp
    mov     rbp, rsp
    xor     eax, eax
    mov     ecx, 128
    mov     dword ptr [rbp - 4], 0
    mov     dword ptr [rbp - 8], 100
    mov     edx, dword ptr [rbp - 8]  # Start of modulo boilerplate
    mov     dword ptr [rbp - 20], eax 
    mov     eax, edx
    cdq
    idiv    ecx                       # Modulo CPU instruction
    mov     dword ptr [rbp - 12], edx # End of modulo sequence
    mov     ecx, dword ptr [rbp - 8]  # Start of shift boilerplate
    sar     ecx, 7                    # Shift CPU instruction
    mov     dword ptr [rbp - 16], ecx # End of shift sequence
    mov     ecx, dword ptr [rbp - 20]
    mov     eax, ecx
    pop     rbp
    ret

The idiv instruction performs a signed divide: it places the quotient in EAX/RAX and the remainder in EDX/RDX (on x86/x64 respectively).

I guessed the reason is that the shift version needs one more comparison than the '%' version... Am I correct?

No comparisons are being done in this case, since it's a single instruction.
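Note that the two lines in the example compute different things: x % 128 is a remainder, while x >> 7 is a quotient. The following sketch (function names are made up) shows how each relates to a bit operation for unsigned values, and why a signed remainder is not a plain mask:

```c
#include <stdint.h>

/* For unsigned x, x % 128 keeps the low 7 bits. */
uint32_t mod128_mask(uint32_t x)
{
    return x & 127u;
}

/* For unsigned x, x / 128 drops the low 7 bits. */
uint32_t div128_shift(uint32_t x)
{
    return x >> 7;
}

/* For signed x, the remainder takes the sign of the dividend
 * (C99 and later), so negative inputs give negative results and
 * a simple mask would give the wrong answer. */
int signed_mod128(int x)
{
    return x % 128;
}
```

For example, signed_mod128(-1) is -1, which is why the compiler cannot replace a signed % 128 with a bare AND; with optimization enabled it will usually emit a short shift-and-mask sequence with a sign correction rather than idiv.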
