Which Is Faster: X<<1 or X<<10

Which is faster: x1 or x10?

Potentially depends on the CPU.

However, all modern CPUs (x86, ARM) use a "barrel shifter" -- a hardware module specifically designed to perform arbitrary shifts in constant time.

So the bottom line is... no. No difference.

Is multiplication and division using shift operators in C actually faster?

Short answer: Not likely.

Long answer:
Your compiler has an optimizer in it that knows how to multiply as quickly as your target processor architecture is capable. Your best bet is to tell the compiler your intent clearly (i.e. i*2 rather than i << 1) and let it decide what the fastest assembly/machine code sequence is. It's even possible that the processor itself has implemented the multiply instruction as a sequence of shifts & adds in microcode.

Bottom line--don't spend a lot of time worrying about this. If you mean to shift, shift. If you mean to multiply, multiply. Do what is semantically clearest--your coworkers will thank you later. Or, more likely, curse you later if you do otherwise.

Difference in x/2 and x1 or x*2 and x 1 where x is an integer

This question has been answered on the ridiculousfishblog : http://ridiculousfish.com/blog/posts/will-it-optimize.html


  1. Division by 2 to right shift

Will GCC transform an integer division by 2 to a right shift?

int halve_it(int x) {
return x / 2;
}

int halve_it(int x) {
return x >> 1;
}

The right shift operator is equivalent to division that rounds towards
negative infinity, but normal division rounds towards zero. Thus the
proposed optimization will produce the wrong result for odd negative
numbers.

The result can be "fixed up" by adding the most significant bit to the
numerator before shifting, and gcc does this.

Good programers let compilers optimize their code, unless they hit a performance penalty.

EDIT : Since you ask for official sources, let's quote the standard rationale document for C99. You find it here : http://www.open-std.org/jtc1/sc22/wg14/www/docs/C99RationaleV5.10.pdf

In C89, division of integers involving negative operands could round upward or downward in an implementation-defined manner; the intent was to avoid incurring overhead in run-time code to check for special cases and enforce specific behavior. In Fortran, however, the result will always truncate toward zero, and the overhead seems to be acceptable to the numeric programming community. Therefore, C99 now requires similar behavior, which should facilitate porting of code from Fortran to C. The table in §7.20.6.2 of this document illustrates the required semantics.

Your optimization would have been correct in C89, since it let to the compiler to do as it wants. However, C99 introduce a new convention to comply with Fortran code. Here is an example of what is expected of the divide operator (always from the same document) :

Sample Image

Unfortunately your optimization does not comply with the C99 standard, since it does not give the right result for x = -1 :

#include <stdio.h>

int div8(int x)
{
return x/3;
}

int rs8( int x )
{
return x >> 3;
}

int main(int argc, char *argv[])
{
volatile int x = -1;
printf("div : %d \n", div8(x) );
printf("rs : %d \n", rs8(x) );

return 0;
}

Result:
div : 0
rs : -1
[Finished in 0.2s]

If you look at the compiled code, you can spot an interesting difference (compiled with g++ v4.6.2) :

0040138c <__Z4div8i>:
40138c: 55 push %ebp
40138d: 89 e5 mov %esp,%ebp
40138f: 8b 45 08 mov 0x8(%ebp),%eax
401392: 85 c0 test %eax,%eax
401394: 79 03 jns 401399 <__Z4div8i+0xd>
401396: 83 c0 0f add $0x7,%eax
401399: c1 f8 04 sar $0x3,%eax
40139c: 5d pop %ebp
40139d: c3 ret

0040139e <__Z3rs8i>:
40139e: 55 push %ebp
40139f: 89 e5 mov %esp,%ebp
4013a1: 8b 45 08 mov 0x8(%ebp),%eax
4013a4: c1 f8 03 sar $0x3,%eax
4013a7: 5d pop %ebp
4013a8: c3 ret

line 401392, there is a test instruction, which will check the parity bit and, if the number is negative, will add 1 << (n-1) = 7 to x before right-shifting by 3 units.

Which is better option to use for dividing an integer number by 2?

Use the operation that best describes what you are trying to do.

  • If you are treating the number as a sequence of bits, use bitshift.
  • If you are treating it as a numerical value, use division.

Note that they are not exactly equivalent. They can give different results for negative integers. For example:

-5 / 2  = -2
-5 >> 1 = -3

(ideone)

Which is faster in Python: x**.5 or math.sqrt(x)?

math.sqrt(x) is significantly faster than x**0.5.

import math
N = 1000000
%%timeit
for i in range(N):
z=i**.5

10 loops, best of 3: 156 ms per loop

%%timeit
for i in range(N):
z=math.sqrt(i)

10 loops, best of 3: 91.1 ms per loop

Using Python 3.6.9 (notebook).

Times-two faster than bit-shift, for Python 3.x integers?

This seems to be because multiplication of small numbers is optimized in CPython 3.5, in a way that left shifts by small numbers are not. Positive left shifts always create a larger integer object to store the result, as part of the calculation, while for multiplications of the sort you used in your test, a special optimization avoids this and creates an integer object of the correct size. This can be seen in the source code of Python's integer implementation.

Because integers in Python are arbitrary-precision, they are stored as arrays of integer "digits", with a limit on the number of bits per integer digit. So in the general case, operations involving integers are not single operations, but instead need to handle the case of multiple "digits". In pyport.h, this bit limit is defined as 30 bits on 64-bit platform, or 15 bits otherwise. (I'll just call this 30 from here on to keep the explanation simple. But note that if you were using Python compiled for 32-bit, your benchmark's result would depend on if x were less than 32,768 or not.)

When an operation's inputs and outputs stay within this 30-bit limit, the operation can be handled in an optimized way instead of the general way. The beginning of the integer multiplication implementation is as follows:

static PyObject *
long_mul(PyLongObject *a, PyLongObject *b)
{
PyLongObject *z;

CHECK_BINOP(a, b);

/* fast path for single-digit multiplication */
if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
stwodigits v = (stwodigits)(MEDIUM_VALUE(a)) * MEDIUM_VALUE(b);
#ifdef HAVE_LONG_LONG
return PyLong_FromLongLong((PY_LONG_LONG)v);
#else
/* if we don't have long long then we're almost certainly
using 15-bit digits, so v will fit in a long. In the
unlikely event that we're using 30-bit digits on a platform
without long long, a large v will just cause us to fall
through to the general multiplication code below. */
if (v >= LONG_MIN && v <= LONG_MAX)
return PyLong_FromLong((long)v);
#endif
}

So when multiplying two integers where each fits in a 30-bit digit, this is done as a direct multiplication by the CPython interpreter, instead of working with the integers as arrays. (MEDIUM_VALUE() called on a positive integer object simply gets its first 30-bit digit.) If the result fits in a single 30-bit digit, PyLong_FromLongLong() will notice this in a relatively small number of operations, and create a single-digit integer object to store it.

In contrast, left shifts are not optimized this way, and every left shift deals with the integer being shifted as an array. In particular, if you look at the source code for long_lshift(), in the case of a small but positive left shift, a 2-digit integer object is always created, if only to have its length truncated to 1 later: (my comments in /*** ***/)

static PyObject *
long_lshift(PyObject *v, PyObject *w)
{
/*** ... ***/

wordshift = shiftby / PyLong_SHIFT; /*** zero for small w ***/
remshift = shiftby - wordshift * PyLong_SHIFT; /*** w for small w ***/

oldsize = Py_ABS(Py_SIZE(a)); /*** 1 for small v > 0 ***/
newsize = oldsize + wordshift;
if (remshift)
++newsize; /*** here newsize becomes at least 2 for w > 0, v > 0 ***/
z = _PyLong_New(newsize);

/*** ... ***/
}


Integer division

You didn't ask about the worse performance of integer floor division compared to right shifts, because that fit your (and my) expectations. But dividing a small positive number by another small positive number is not as optimized as small multiplications, either. Every // computes both the quotient and the remainder using the function long_divrem(). This remainder is computed for a small divisor with a multiplication, and is stored in a newly-allocated integer object, which in this situation is immediately discarded.

Or at least, that was the case when this question was originally asked. In CPython 3.6, a fast path for small int // was added, so // now beats >> for small ints too.

Is shifting bits faster than multiplying and dividing in Java? .NET?

Most compilers today will do more than convert multiply or divide by a power-of-two to shift operations. When optimizing, many compilers can optimize a multiply or divide with a compile time constant even if it's not a power of 2. Often a multiply or divide can be decomposed to a series of shifts and adds, and if that series of operations will be faster than the multiply or divide, the compiler will use it.

For division by a constant, the compiler can often convert the operation to a multiply by a 'magic number' followed by a shift. This can be a major clock-cycle saver since multiplication is often much faster than a division operation.

Henry Warren's book, Hacker's Delight, has a wealth of information on this topic, which is also covered quite well on the companion website:

  • http://www.hackersdelight.org/

See also a discussion (with a link or two ) in:

  • Reading assembly code

Anyway, all this boils down to allowing the compiler to take care of the tedious details of micro-optimizations. It's been years since doing your own shifts outsmarted the compiler.

What are bitwise shift (bit-shift) operators and how do they work?

The bit shifting operators do exactly what their name implies. They shift bits. Here's a brief (or not-so-brief) introduction to the different shift operators.

The Operators

  • >> is the arithmetic (or signed) right shift operator.
  • >>> is the logical (or unsigned) right shift operator.
  • << is the left shift operator, and meets the needs of both logical and arithmetic shifts.

All of these operators can be applied to integer values (int, long, possibly short and byte or char). In some languages, applying the shift operators to any datatype smaller than int automatically resizes the operand to be an int.

Note that <<< is not an operator, because it would be redundant.

Also note that C and C++ do not distinguish between the right shift operators. They provide only the >> operator, and the right-shifting behavior is implementation defined for signed types. The rest of the answer uses the C# / Java operators.

(In all mainstream C and C++ implementations including GCC and Clang/LLVM, >> on signed types is arithmetic. Some code assumes this, but it isn't something the standard guarantees. It's not undefined, though; the standard requires implementations to define it one way or another. However, left shifts of negative signed numbers is undefined behaviour (signed integer overflow). So unless you need arithmetic right shift, it's usually a good idea to do your bit-shifting with unsigned types.)



Left shift (<<)

Integers are stored, in memory, as a series of bits. For example, the number 6 stored as a 32-bit int would be:

00000000 00000000 00000000 00000110

Shifting this bit pattern to the left one position (6 << 1) would result in the number 12:

00000000 00000000 00000000 00001100

As you can see, the digits have shifted to the left by one position, and the last digit on the right is filled with a zero. You might also note that shifting left is equivalent to multiplication by powers of 2. So 6 << 1 is equivalent to 6 * 2, and 6 << 3 is equivalent to 6 * 8. A good optimizing compiler will replace multiplications with shifts when possible.

Non-circular shifting

Please note that these are not circular shifts. Shifting this value to the left by one position (3,758,096,384 << 1):

11100000 00000000 00000000 00000000

results in 3,221,225,472:

11000000 00000000 00000000 00000000

The digit that gets shifted "off the end" is lost. It does not wrap around.



Logical right shift (>>>)

A logical right shift is the converse to the left shift. Rather than moving bits to the left, they simply move to the right. For example, shifting the number 12:

00000000 00000000 00000000 00001100

to the right by one position (12 >>> 1) will get back our original 6:

00000000 00000000 00000000 00000110

So we see that shifting to the right is equivalent to division by powers of 2.

Lost bits are gone

However, a shift cannot reclaim "lost" bits. For example, if we shift this pattern:

00111000 00000000 00000000 00000110

to the left 4 positions (939,524,102 << 4), we get 2,147,483,744:

10000000 00000000 00000000 01100000

and then shifting back ((939,524,102 << 4) >>> 4) we get 134,217,734:

00001000 00000000 00000000 00000110

We cannot get back our original value once we have lost bits.



Arithmetic right shift (>>)

The arithmetic right shift is exactly like the logical right shift, except instead of padding with zero, it pads with the most significant bit. This is because the most significant bit is the sign bit, or the bit that distinguishes positive and negative numbers. By padding with the most significant bit, the arithmetic right shift is sign-preserving.

For example, if we interpret this bit pattern as a negative number:

10000000 00000000 00000000 01100000

we have the number -2,147,483,552. Shifting this to the right 4 positions with the arithmetic shift (-2,147,483,552 >> 4) would give us:

11111000 00000000 00000000 00000110

or the number -134,217,722.

So we see that we have preserved the sign of our negative numbers by using the arithmetic right shift, rather than the logical right shift. And once again, we see that we are performing division by powers of 2.

What is the difference between 1 x and pow(2, x)?

In general terms, for most languages:

The pow function is likely to be less efficient, as it essentially general purpose and can raise any number to any power, so cannot be optimized as easily for special cases.

The bit shifting will map directly to a processor level operation.

In some languages, the compiler would see that the operations have a constant result and use that, however if there is a little more complexity, then the optimization may be missed.



Related Topics



Leave a reply



Submit