How to Read/Write Arbitrary Bits in C/C++

How to read/write arbitrary bits in C/C++

Some 2+ years after I asked this question I'd like to explain it the way I'd want it explained back when I was still a complete newb and would be most beneficial to people who want to understand the process.

First of all, forget the "11111111" example value, which is not really all that suited for the visual explanation of the process. So let the initial value be 10111011 (187 decimal) which will be a little more illustrative of the process.

1 - how to read a 3 bit value starting from the second bit:

    ___  <- those 3 bits
10111011

The value is 101, or 5 in decimal, there are 2 possible ways to get it:

mask and shift

In this approach, the needed bits are first masked with the value 00001110 (14 decimal) after which it is shifted in place:

    ___
10111011 AND
00001110 =
00001010 >> 1 =
     ___
00000101

The expression for this would be: (value & 14) >> 1

shift and mask

This approach is similar, but the order of operations is reversed, meaning the original value is shifted and then masked with 00000111 (7) to only leave the last 3 bits:

    ___
10111011 >> 1
     ___
01011101 AND
00000111
00000101

The expression for this would be: (value >> 1) & 7

Both approaches involve the same amount of complexity, and therefore will not differ in performance.

2 - how to write a 3 bit value starting from the second bit:

In this case, the initial value is known, and when this is the case in code, you may be able to come up with a way to set the known value to another known value which uses less operations, but in reality this is rarely the case, most of the time the code will know neither the initial value, nor the one which is to be written.

This means that in order for the new value to be successfully "spliced" into byte, the target bits must be set to zero, after which the shifted value is "spliced" in place, which is the first step:

    ___ 
10111011 AND
11110001 (241) =
10110001 (masked original value)

The second step is to shift the value we want to write in the 3 bits, say we want to change that from 101 (5) to 110 (6)

     ___
00000110 << 1 =
    ___
00001100 (shifted "splice" value)

The third and final step is to splice the masked original value with the shifted "splice" value:

10110001 OR
00001100 =
    ___
10111101

The expression for the whole process would be: (value & 241) | (6 << 1)

Bonus - how to generate the read and write masks:

Naturally, using a binary to decimal converter is far from elegant, especially in the case of 32 and 64 bit containers - decimal values get crazy big. It is possible to easily generate the masks with expressions, which the compiler can efficiently resolve during compilation:

read mask for "mask and shift": ((1 << fieldLength) - 1) << (fieldIndex - 1), assuming that the index at the first bit is 1 (not zero)
read mask for "shift and mask": (1 << fieldLength) - 1 (index does not play a role here since it is always shifted to the first bit
write mask : just invert the "mask and shift" mask expression with the ~ operator

How does it work (with the 3bit field beginning at the second bit from the examples above)?

00000001 << 3
00001000  - 1
00000111 << 1
00001110  ~ (read mask)
11110001    (write mask)

The same examples apply to wider integers and arbitrary bit width and position of the fields, with the shift and mask values varying accordingly.

Also note that the examples assume unsigned integer, which is what you want to use in order to use integers as portable bit-field alternative (regular bit-fields are in no way guaranteed by the standard to be portable), both left and right shift insert a padding 0, which is not the case with right shifting a signed integer.

Even easier:

Using this set of macros (but only in C++ since it relies on the generation of member functions):

#define GETMASK(index, size) ((((size_t)1 << (size)) - 1) << (index))
#define READFROM(data, index, size) (((data) & GETMASK((index), (size))) >> (index))
#define WRITETO(data, index, size, value) ((data) = (((data) & (~GETMASK((index), (size)))) | (((value) << (index)) & (GETMASK((index), (size))))))
#define FIELD(data, name, index, size) \
  inline decltype(data) name() const { return READFROM(data, index, size); } \
  inline void set_##name(decltype(data) value) { WRITETO(data, index, size, value); }

You could go for something as simple as:

struct A {
  uint bitData;
  FIELD(bitData, one, 0, 1)
  FIELD(bitData, two, 1, 2)
};

And have the bit fields implemented as properties you can easily access:

A a;
a.set_two(3);
cout << a.two();

Replace decltype with gcc's typeof pre-C++11.

Extract bits in C

Extract the bits from x:

int b = x & 0x3E000;  /* x AND (...) 0011 1110 0000 0000 0000*/

Shift the extracted bits:

b = b >> 13; /* 13 bits to the right, so that they land up starting on the first position */

Clean the bits from z:

z &= 0xFFFFFFE0; /* z AND 1111 1111 1111 1111 1111 1111 1110 0000 */

Apply the bits to z:

z |= b;

Extract the bits from y:

b = y & 0x3; /* y AND (...) 0000 0111 */

Shift the extracted bits:

b = b << 5; /* move them 5 bits to the left so that they end up starting on position 5 */

Clear the bits from z:

z &= 0xFFFFFF1F; /* z AND 1111 1111 1111 1111 1111 1111 0001 1111 */

Apply the bits to z:

z |= b;

You can now combine the "clearings" into a single operation:

z &= 0xFFFFFF00; /* z AND 1111 1111 1111 1111 1111 1111 0000 0000 */

All in a single statement:

z = (z & 0xFFFFFF00) | ((y & 0x3) << 5) | ((x & 0x3E000) >> 13);

This works for 32-bits integers, or int32_t (C99 and beyond).
With 64-bits integers (int64_t), you just have to use 0xFFFFFFFFFFFFFF00 as a "clearing mask".

If you're using vanilla int variables, get the size in bits using the following:

printf("int has %ud bits\n", sizeof(int) * 8);

sizeof() returns the size in bytes of an integer, and then you multiply that result by 8 (bits per byte in 99.999% of cases)to get the size in bits of your integer, and therefore the size of the masks you have to apply.

In general:

Extract bits by applying an AND mask, with every bit set to 0, except those you want to extract, which you will set to 1. The reason is that 0 AND X = 0, but 1 AND X = X.
Shift bits to the position you need them to be.
Clear bits by applying an AND mask, with every bit set to 1, except those you want to clear, which you will set to 0. The reason is the same as above.
Apply bits by using an OR operation. You must ensure that the bits you do want to modify on the destination are set to 0, and that those which you do NOT want to modify on the origin are set to 0. The reason for that is that 1 OR X = 1, but 0 OR X = X.

Structure for an array of bits in C

If you're mainly interested in accessing a single bit at a time, you can take an array of unsigned char and treat it as a bit array. For example:

unsigned char array[125];

Assuming 8 bits per byte, this can be treated as an array of 1000 bits. The first 16 logically look like this:

     ---------------------------------------------------------------------------------
byte |                   0                   |              1                        |
     ---------------------------------------------------------------------------------
bit  |  0 |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 |
     ---------------------------------------------------------------------------------

Let's say you want to work with bit b. You can then do the following:

Read bit b:

value = (array[b/8] & (1 << (b%8)) != 0;

Set bit b:

array[b/8] |= (1 << (b%8));

Clear bit b:

array[b/8] &= ~(1 << (b%8));

Dividing the bit number by 8 gets you the relevant byte. Similarly, mod'ing the bit number by 8 gives you the relevant bit inside of that byte. You then left shift the value 1 by the bit number to give you the necessary bit mask.

While there is integer division and modulus at work here, the dividend is a power of 2 so any decent compiler should replace them with bit shifting/masking.

Read 'N' bit from a byte

You must define precisely how you count the bits:

starting at 0 or 1
from least significant to most significant or the other way?

Assuming bit 0 is the least significant, you can get bit 7 with this expression:

int bit7 = ((unsigned char)Buffer[0] >> 7) & 1;

Here is a generic loop:

for (int i = 7; i >= 0; i--) {
    putchar('0' + (((unsigned char)Buffer[0] >> i) & 1));
}

How to Read/Write Arbitrary Bits in C/C++