Examples of When a Bitwise Swap() Is a Bad Idea

Examples of when a bitwise swap() is a bad idea?

I'm going to argue that this is almost always a bad idea except in the specific case where profiling has been done and a more obvious and clear implementation of swap has performance problems. Even in that case I would only go with this sort of approach for straight up no-inheritance structures, never for any sort of class. You never know when inheritance will be added potentially breaking the whole thing (possibly in truly insidious ways too).

If you want a fast swap implementation perhaps a better choice (where appropriate) is to pimpl the class and then just swap out the implementation (again, this assumes that there are no back-pointers to the owner, but that's easily contained to the class & impl rather than external factors).

EDIT: Possible problems with this approach:

Pointers back to self (directly or indirectly)
If the class contains any object for which a straight byte-copy is meaningless (effectively recursing this definition) or for which copying is normally disabled
If the class needs any sort of locking to copy
It's easy to accidentally pass in two different types here (all it takes is one intermediate function to implicitly make a derived class look like the parent) and then you swap vptrs (OUCH!)

Making swap faster, easier to use and exception-safe

So why not simply write a swap template that does exactly that: swap the object representations*?

There's many ways in which an object, once being constructed, can break when you copy the bytes it resides in. In fact, one could come up with a seemingly endless number of cases where this would not do the right thing - even though in practice it might work in 98% of all cases.

That's because the underlying problem to all this is that, other than in C, in C++ we must not treat objects as if they are mere raw bytes. That's why we have construction and destruction, after all: to turn raw storage into objects and objects back into raw storage. Once a constructor has run, the memory where the object resides is more than only raw storage. If you treat it as if it weren't, you will break some types.

However, essentially, moving objects shouldn't perform that much worse than your idea, because, once you start to recursively inline the calls to std::move(), you usually ultimately arrive at where built-ins are moved. (And if there's more to moving for some types, you'd better not fiddle with the memory of those yourself!) Granted, moving memory en bloc is usually faster than single moves (and it's unlikely that a compiler might find out that it could optimize the individual moves to one all-encompassing std::memcpy()), but that's the price we pay for the abstraction opaque objects offer us. And it's quite small, especially when you compare it to the copying we used to do.

You could, however, have an optimized swap() using std::memcpy() for aggregate types.

Byte swapping in bit wise operations

Extract the ith byte by using ((1ll << ((i + 1) * 8)) - 1) >> (i * 8). Swap using the XOR operator, and put the swapped bytes in their places.

int x, y, z;
y = 1, z = 3;
x = 0x12345678;

int a, b; /* bytes to swap */
a = (x & ((1ll << ((y + 1) * 8)) - 1)) >> (y * 8);
b = (x & ((1ll << ((z + 1) * 8)) - 1)) >> (z * 8);

/* swap */
a = a ^ b;
b = a ^ b;
a = a ^ b;

/* put zeros in bytes to swap */
x = x & (~((0xff << (y * 8))));
x = x & (~((0xff << (z * 8))));

/* put new bytes in place */
x = x | (a << (y * 8));
x = x | (b << (z * 8));

Why is a = (a+b) - (b=a) a bad choice for swapping two integers?

No. This is not acceptable. This code invokes Undefined behavior. This is because of the operation on b is not defined. In the expression

a=(a+b)-(b=a);

it is not certain whether b gets modified first or its value gets used in the expression (a+b) because of the lack of the sequence point.

See what standard syas:

C11: 6.5 Expressions:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar
object, the behavior is undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an unsequenced side
effect occurs in any of the orderings.84)₁.

Read C-faq- 3.8 and this answer for more detailed explanation of sequence point and undefined behavior.

_{1. Emphasis is mine.}

Real world use cases of bitwise operators

Bit fields (flags)

They're the most efficient way of representing something whose state is defined by several "yes or no" properties. ACLs are a good example; if you have let's say 4 discrete permissions (read, write, execute, change policy), it's better to store this in 1 byte rather than waste 4. These can be mapped to enumeration types in many languages for added convenience.
Communication over ports/sockets

Always involves checksums, parity, stop bits, flow control algorithms, and so on, which usually depend on the logic values of individual bytes as opposed to numeric values, since the medium may only be capable of transmitting one bit at a time.
Compression, Encryption

Both of these are heavily dependent on bitwise algorithms. Look at the deflate algorithm for an example - everything is in bits, not bytes.
Finite State Machines

I'm speaking primarily of the kind embedded in some piece of hardware, although they can be found in software too. These are combinatorial in nature - they might literally be getting "compiled" down to a bunch of logic gates, so they have to be expressed as AND, OR, NOT, etc.
Graphics
There's hardly enough space here to get into every area where these operators are used in graphics programming. XOR (or ^) is particularly interesting here because applying the same input a second time will undo the first. Older GUIs used to rely on this for selection highlighting and other overlays, in order to eliminate the need for costly redraws. They're still useful in slow graphics protocols (i.e. remote desktop).

Those were just the first few examples I came up with - this is hardly an exhaustive list.

XOR operator in C

You use exclusive or to flip bits - the ones that are ON are turned OFF, and vice versa. This can be handy for swapping two numbers without having space for a third number, for example.

0x0A ^ 0xFF = 0x03 ( 00001010 ^ 11111111 = 11110101 )

Swapping numbers:

operation   example 
             A      B
initial:   0011    1010
a = a^b;   1001    1010
b = a^b;   1001    0011
a = a^b;   1010    0011

As you can see, the numbers (nibbles in this case) A and B were swapped without using additional space. This works for any two numbers of the same type (although in C, bitwise operators expect unsigned integers)

XOR operations are also used for "weak encryption". You can take the XOR of a string with a (repeated) "code word", and the result will be a string of bytes that make no sense. Apply the same operation again, and the original appears. This is quite a weak algorithm, and should never be used in real life.

Regarding toggling bits - in the "olden days", if you wanted to invert an image, you would do pix = pix & 1 for each pixel - or if you could do it a byte at a time, byte = byte & 0xFF. This would turn black text on a white background into white text on a black background. I think ATARI had a patent for creating a blinking cursor "anywhere on the screen" by doing XOR with the bit map.

Similarly if you have a microcontroller that wants to create a blinking light, repeatedly doing state = state XOR 1 will cause the state to toggle.

Finallly, there are many "bit twiddling hacks" that rely on the XOR operation. For example, computing the parity of a word (i.e. whether the number of set bits is odd or even) can be done with the following (source: http://graphics.stanford.edu/~seander/bithacks.html)

unsigned int v;  // word value to compute the parity of
v ^= v >> 16;
v ^= v >> 8;
v ^= v >> 4;
v &= 0xf;
return (0x6996 >> v) & 1;

There are many other "clever tricks" - they usually show up when you are trying to find the fastest way to do something that involves bit manipulation. Most people can go through life perfectly fine without ever really "getting" it, and that's OK. Me, I like bit fiddling.

Swap two variables with XOR

Your problem is easily reduced to xor-swap buffers of raw memory. Something like that.

void xorswap(void *a, void *b, size_t size);

That can be implemented in terms of xorswaps of primitive types. For example:

void xorswap(void *a, void *b, size_t size)
{
    if (a == b)
        return; //nothing to do

    size_t qwords = size / 8;
    size_t rest = size % 8;

    uint64_t *a64 = (uint64_t *)a;
    uint64_t *b64 = (uint64_t *)b;
    for (size_t i = 0; i < qwords; ++i)
        xorswap64(a64++, b64++);
    uint8_t *a8 = (uint8_t*)a64;
    uint8_t *b8 = (uint8_t*)b64;
    for (size_t i = 0; i < rest; ++i)
        xorswap8(a8++, b8++);
}

I leave the implementation of xorswap64() and xorswap8() as an exercise to the reader.

Also note that to be efficient, the original buffers should be 8-byte aligned. If that's not the case, depending on the architecture, the code may work suboptimally or not work at all (again, an exercise to the reader ;-).

Other optimizations are possible. You can even use Duff's device to unroll the last loop, but I don't know if it is worth it. You'll have to profile it to know for sure.

Is switch case faster if using smaller numbers?

This benchmarking method is nonsense, because the compiler can statically determine the value of i between the two cases. Your actual code will probably end up something like this:

start = std::chrono::system_clock::now();
switch (f)
{
    case 0x1234500 : i = 0; break;
    case 0x1234522 : i = 2; break;
    case 0x1234555 : i = 5; break;
    case 0x1234588 : i = 8; break;
    default : break;
}
end = std::chrono::system_clock::now();
std::chrono::duration<double> elapsed_seconds = end-start;
std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";

// i = -1;  this line isn't needed, the value isn't used, lets optimize it away

start = std::chrono::system_clock::now();
// Oh great, we already know the value of i
// Because nothing in the previos code could have affected f
// i will get the same value as it did above, and we removed i = -1
// So lets optimize away this pointless code here
end = std::chrono::system_clock::now();
elapsed_seconds = end-start;

You could attempt to declare i as volatile but that may prevent the compiler from other doing optimizations too, so it might not make a valid benchmark test either.

The i = -1; between tests is meaningless, since the compiler can deduct that this value isn't used. So that code will just get removed as well.

The best way might be to read the constant 0x12345688 from a file or user input, so that the compiler can't assume anything about it. You need to do this twice, for both test cases.

Generally: when doing benchmarking like this, always disassemble the code, to verify that your test isn't nonsense.

Examples of When a Bitwise Swap() Is a Bad Idea