What Is Data Alignment? Why and When Should I Be Worried When Typecasting Pointers in C

I'll try to explain in short.

What is data alignment?

A computer's architecture is built around a processor and memory.
Memory is organized in cells, so:

 0x00 |   data  |  
 0x01 |   ...   |
 0x02 |   ...   |

Each memory cell has a fixed size: the number of bits it can store. This is architecture dependent.

When you define a variable in your C/C++ program, it occupies one or more of these cells.

For example

int variable = 12;

Suppose each cell holds 32 bits and the int type is 32 bits wide; then somewhere in your memory:

variable: | 0 0 0 c |  // c is hexadecimal of 12.

When your CPU has to operate on that variable, it needs to bring it into a register. In "1 clock" a CPU can fetch only a small, fixed number of bits from memory; that size is usually called a WORD. It is architecture dependent as well.

Now suppose you have a variable which is stored, because of some offset, in two cells.

For example, I have two different pieces of data to store (I'm going to use a "string representation" to make it clearer):

data1: "ab"
data2: "cdef"

So the memory will be laid out like this (2 different cells):

|a b c d|     |e f 0 0|

That is, data1 occupies just half of the cell, so data2 occupies the remaining part and a part of a second cell.

Now suppose your CPU wants to read data2. The CPU needs 2 clocks to access the data: in the first clock it reads the first cell, and in the second clock it reads the remaining part from the second cell.

If we align data2 in accordance with this memory-example, we can
introduce a sort of padding and shift data2 all in the second cell.

|a b 0 0|     |c d e f|
     ---
   padding

That way the CPU spends only "1 clock" to access data2.
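The same padding shows up in C and C++ structs. The following is a minimal sketch, not taken from the original answer: the struct `Padded` and the helper `padding_before_value` are made-up names, and the exact byte counts depend on your platform.

```cpp
#include <cstddef>   // offsetof, std::size_t

// Hypothetical struct for illustration: a 1-byte member followed by an int.
struct Padded {
    char tag;    // occupies 1 byte
    int  value;  // must start at a multiple of alignof(int)
};

// Number of padding bytes the compiler inserted between tag and value.
constexpr std::size_t padding_before_value() {
    return offsetof(Padded, value) - sizeof(char);
}

// These hold on every conforming implementation:
static_assert(offsetof(Padded, value) % alignof(int) == 0,
              "value starts on an int boundary");
static_assert(sizeof(Padded) % alignof(Padded) == 0,
              "elements of a Padded array stay aligned");
```

On a typical platform with a 4-byte, 4-byte-aligned int, `padding_before_value()` would be 3, which corresponds to the zeros shown after "a b" in the diagram above.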

What an alignment system does

An alignment system just introduces that padding in order to align the data with the memory of the system, in accordance with the architecture.

Why should I care about alignment?

I will not go deep in this answer.
However, broadly speaking, memory alignment comes from the requirements of the context.

In the example above, having padding (so the data is memory-aligned) can save CPU cycles when retrieving the data. This can improve the execution performance of the program because fewer memory accesses are needed.

However, beyond the above example (made only for the sake of explanation), there are many other scenarios where memory alignment is useful or even required.

For example, some architectures have strict requirements on how memory can be accessed. In such cases, the padding helps to lay out memory so that it fulfills the platform constraints.

Should I worry about the alignment during pointer casting?

1. Is it REALLY safe to dereference the pointer after casting in a real project?

If the pointer happens to not be aligned properly it really can cause problems. I've personally seen and fixed bus errors in real, production code caused by casting a char* to a more strictly aligned type. Even if you don't get an obvious error you can have less obvious issues like slower performance. Strictly following the standard to avoid UB is a good idea even if you don't immediately see any problems. (And one rule the code is breaking is the strict aliasing rule, § 3.10/10*)

A better alternative is to use std::memcpy() (or std::memmove() if the buffers overlap) or, better yet, bit_cast<>():

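#include <cstring>  // std::memcpy

unsigned char data[16];
int i1, i2, i3, i4;
// Copy the bytes out instead of dereferencing a cast pointer.
// The offsets assume sizeof(int) == 4.
std::memcpy(&i1, data     , sizeof(int));
std::memcpy(&i2, data +  4, sizeof(int));
std::memcpy(&i3, data +  8, sizeof(int));
std::memcpy(&i4, data + 12, sizeof(int));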
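If you need this in more than one place, the memcpy approach can be wrapped in a small helper. This is only a sketch: `load_unaligned` is a hypothetical name, not a standard function, and the caller is assumed to guarantee that `offset + sizeof(T)` stays inside the buffer.

```cpp
#include <cstddef>      // std::size_t
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable

// Hypothetical helper: read a T starting at any byte offset of a buffer.
// memcpy has no alignment requirement, so the offset need not be a
// multiple of alignof(T), and no aliasing rule is broken.
template <typename T>
T load_unaligned(const unsigned char* bytes, std::size_t offset) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    T value;
    std::memcpy(&value, bytes + offset, sizeof(T));
    return value;
}
```

For example, `load_unaligned<int>(data, 1)` reads an int starting at the second byte of `data`, a position that a cast pointer could not safely dereference. Compilers commonly turn such a fixed-size memcpy into a plain load where the target allows it.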

Some compilers work harder than others to make sure char arrays are aligned more strictly than necessary, because programmers so often get this wrong.

#include <cstdint>
#include <typeinfo>
#include <iostream>

template<typename T> void check_aligned(void *p) {
    std::cout << p << " is " <<
      (0==(reinterpret_cast<std::uintptr_t>(p) % alignof(T))?"":"NOT ") <<
      "aligned for the type " << typeid(T).name() << '\n';
}

void foo1() {
    char a;
    char b[sizeof (int)];
    check_aligned<int>(b); // unaligned in clang
}

struct S {
    char a;
    char b[sizeof(int)];
};

void foo2() {
    S s;
    check_aligned<int>(s.b); // unaligned in clang and msvc
}

S s;

void foo3() {
    check_aligned<int>(s.b); // unaligned in clang, msvc, and gcc
}

int main() {
    foo1();
    foo2();
    foo3();
}

http://ideone.com/FFWCjf

2. Is there any difference between C-style casting and reinterpret_cast?

It depends. C-style casts do different things depending on the types involved. A C-style cast between unrelated pointer types (such as char* and int*) gives the same result as a reinterpret_cast; see § 5.4 Explicit type conversion (cast notation) and § 5.2.9-11.
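To make "it depends" concrete, here is a small sketch. The types `Base1`, `Base2`, and `Derived` are hypothetical and exist only for this illustration.

```cpp
// Hypothetical class hierarchy with two non-empty bases.
struct Base1 { int a; };
struct Base2 { int b; };
struct Derived : Base1, Base2 {};

// Unrelated pointer types: the C-style cast can only be performed as a
// reinterpret_cast, so both expressions yield the same pointer.
inline bool same_for_unrelated(int* p) {
    return (char*)p == reinterpret_cast<char*>(p);
}

// Related class types: the C-style cast is performed as a static_cast,
// which may adjust the address to point at the Base2 subobject.
inline bool c_cast_acts_like_static_cast(Derived* d) {
    return (Base2*)d == static_cast<Base2*>(d);
}

// reinterpret_cast never adjusts: the address stays that of the Derived.
inline bool reinterpret_keeps_address(Derived* d) {
    return static_cast<void*>(reinterpret_cast<Base2*>(d)) ==
           static_cast<void*>(d);
}
```

On common implementations `(Base2*)d` and `reinterpret_cast<Base2*>(d)` point to different addresses, which is why a C-style cast is harder to reason about than the named C++ casts.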

3. Is there any difference between C and C++?

There shouldn't be as long as you're dealing with types that are legal in C.

* Another issue is that C++ does not specify the result of casting from one pointer type to a type with stricter alignment requirements. This is to support platforms where unaligned pointers cannot even be represented. However typical platforms today can represent unaligned pointers and compilers specify the results of such a cast to be what you would expect. As such, this issue is secondary to the aliasing violation. See [expr.reinterpret.cast]/7.

Additional questions on memory alignment

1) Why is cast_1 UB?

Because the language rules say so. Multiple rules in fact.

The offset where you access the object does not meet the alignment requirements of int32_t (except on systems where the alignment requirement is 1). No objects can be created without conforming to the alignment requirement of the type.
A char pointer may not be aliased by an int32_t pointer.

2) Is cast_2 a correct approach to fixing the UB of cast_1?

cast_2 has well defined behaviour. The reinterpret_cast in that function is redundant, and it is bad to use magic constants (use sizeof).

Data layouts used by C compilers (the alignment concept)

First of all, machine 1 is not special at all - it is exactly like an x86-32 or 32-bit ARM.

Moreover I find it quite weird that in the row for double the size of the type is more than what is given in the alignment field. Shouldn't alignment(in bits) ≥ size (in bits) ? Because alignment refers to the memory actually allocated for the data object (?).

No, this isn't true. Alignment means that the address of the lowest addressable byte in the object must be divisible by the given number of bytes.

Additionally, in C it is also true that for arrays, sizeof (ElementType) must be greater than or equal to the alignment of the element type and must be divisible by that alignment, hence footnote a. Therefore on the latter computer:

 struct { char a, b; }

might have sizeof 16 because the characters are in distinct addressable words, whereas

struct { char a[2];  }

could be squeezed into 8 bytes.
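Both rules (the address is a multiple of the alignment, and the size is a multiple of the alignment) can be checked directly. The sketch below uses hypothetical names (`is_aligned`, `TwoChars`, `CharArray`, `all_elements_aligned`), and it assumes that converting a pointer to std::uintptr_t yields the numeric address, which is implementation-defined but true on mainstream platforms.

```cpp
#include <cstddef>  // std::size_t
#include <cstdint>  // std::uintptr_t

// True if the address held by p is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

struct TwoChars  { char a, b; };
struct CharArray { char a[2]; };

// sizeof is always a multiple of alignof, so that consecutive
// array elements all land on properly aligned addresses.
static_assert(sizeof(double)    % alignof(double)    == 0, "double");
static_assert(sizeof(TwoChars)  % alignof(TwoChars)  == 0, "TwoChars");
static_assert(sizeof(CharArray) % alignof(CharArray) == 0, "CharArray");

// If the first element is aligned, every later element is too.
inline bool all_elements_aligned(const double* arr, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (!is_aligned(arr + i, alignof(double))) return false;
    }
    return true;
}
```

Note that nothing here requires alignof(double) to equal sizeof(double); the size only has to be a multiple of the alignment.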

How should this statement about pointers be visualized in terms of alignment? (2^6 = 64 is fine, but how do these 6 bits correlate with the alignment concept?)

As for the character pointers, the 6 bits is bogus. 3 bits are needed to choose one of the 8 bytes within the 8-byte words, so this is an error in the book. An ordinary pointer would select just a word with 24 bits, and a character (a byte) pointer would select the word with 24 bits, and one of the 8-bit bytes inside the word with 3 bits.
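One way to picture such a character pointer is as a 24-bit word address with a 3-bit byte index attached. The encoding below is purely hypothetical (the book's machine may pack the bits differently); it only illustrates why 3 bits are enough to pick one of 8 bytes.

```cpp
#include <cstdint>

// Assumed layout for illustration: [ 24-bit word address | 3-bit byte index ]
constexpr unsigned kByteIndexBits = 3;  // 2^3 = 8 bytes per word
constexpr std::uint32_t kByteIndexMask = (1u << kByteIndexBits) - 1;

// Build a character pointer from a word address and a byte index (0..7).
constexpr std::uint32_t make_char_pointer(std::uint32_t word_address,
                                          std::uint32_t byte_index) {
    return (word_address << kByteIndexBits) | (byte_index & kByteIndexMask);
}

// Which word does the character pointer refer to?
constexpr std::uint32_t word_of(std::uint32_t char_pointer) {
    return char_pointer >> kByteIndexBits;
}

// Which byte inside that word?
constexpr std::uint32_t byte_of(std::uint32_t char_pointer) {
    return char_pointer & kByteIndexMask;
}
```

An ordinary (word) pointer on such a machine would carry only the 24-bit part, so converting a character pointer to a word pointer loses the byte index.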

What is aligned memory allocation?

Alignment requirements specify which addresses can be assigned to which types. This is completely implementation-dependent, but is generally based on word size. For instance, some 32-bit architectures require every int variable to start at an address that is a multiple of four. On some architectures, alignment requirements are absolute. On others (e.g. x86), flouting them only comes with a performance penalty.

malloc is required to return an address suitable for any alignment requirement. In other words, the returned address can be assigned to a pointer of any type. From C99 §7.20.3 (Memory management functions):

The pointer returned if the allocation
succeeds is suitably aligned so that
it may be assigned to a pointer to any
type of object and then used to access
such an object or an array of such
objects in the space allocated (until
the space is explicitly deallocated).
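The guarantee can be observed with a small check. This is a sketch under assumptions: `malloc_result_is_max_aligned` is a made-up helper, the pointer-to-integer conversion is assumed to expose the numeric address, and the request should be at least sizeof(std::max_align_t) bytes, since some allocators align smaller blocks less strictly.

```cpp
#include <cstddef>  // std::size_t, std::max_align_t
#include <cstdint>  // std::uintptr_t
#include <cstdlib>  // std::malloc, std::free

// Allocate `size` bytes and report whether the returned address is a
// multiple of the strictest fundamental alignment.
inline bool malloc_result_is_max_aligned(std::size_t size) {
    void* p = std::malloc(size);
    if (p == nullptr) {
        return false;  // allocation failed, nothing to check
    }
    const bool aligned =
        reinterpret_cast<std::uintptr_t>(p) % alignof(std::max_align_t) == 0;
    std::free(p);
    return aligned;
}
```

Types with an alignment stricter than std::max_align_t (over-aligned types) are outside this guarantee and need a dedicated aligned allocation function.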

What does alignment mean in .comm directives?

Pay attention to the error message: "alignment 1 of symbol 'hello' in file.o is smaller than 4". The problem is not with the .comm; it's the .int in the first file, which does not specify an alignment. To fix it, you can put a .balign 4 before the .int.

Pointer Alignment when Implementing a container_of

If your pointer originally points to a valid object, then you may cast it to void* and then back to the original type as often as you want (cf., for example, this online C11 standard draft):

6.3.2.3 Pointers

(1) A pointer to void may be converted to or from a pointer to any
object type. A pointer to any object type may be converted to a
pointer to void and back again; the result shall compare equal to the
original pointer.

So your sequence of casts does not introduce undefined behaviour.
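For reference, a container_of-style lookup might be sketched as follows. The question's own code is not reproduced here, so `Node`, `Item`, `item_from_link`, and `round_trip` are hypothetical names. The sketch assumes the pointer really refers to the `link` member of a live `Item`; under that assumption the computed address is that of the `Item` itself and is therefore correctly aligned.

```cpp
#include <cstddef>  // offsetof

struct Node { Node* next; };          // hypothetical embedded member type
struct Item { int id; Node link; };   // hypothetical containing struct

// Step back from the member to the start of the enclosing Item.
inline Item* item_from_link(Node* node) {
    char* raw = reinterpret_cast<char*>(node) - offsetof(Item, link);
    return reinterpret_cast<Item*>(raw);
}

// Converting to void* and back yields the original pointer.
inline Item* round_trip(Item* item) {
    void* erased = item;
    return static_cast<Item*>(erased);
}
```

If `node` does not point into an `Item`, the result of `item_from_link` is meaningless and may be misaligned, so the correctness of this idiom rests entirely on the caller.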

Does casting to a char pointer to increment a pointer by a certain amount and then accessing as a different type violate strict aliasing?

Not inherently so.

Normally, accessing an int * casted from a char * violates strict aliasing rules

Not necessarily. Strict aliasing is about the (effective) type of the pointed-to object. It is quite possible for the object to which a char * points to be an int, or compatible with int, or to be assigned effective type int as a consequence of the (write) access. In such cases, casting to int * and dereferencing the result is perfectly valid.

There are, yes, lots of cases in which casting a char * to an int * and then dereferencing the result would constitute a strict-aliasing violation, but it is not specifically because of the involvement of, or the casting to or from, type char *.

The above applies regardless of how the particular char * value was obtained, so in your particular example case, too. If the result of your pointer computation is a valid pointer, and the object to which it points is genuinely an (effective) int or is compatible with int in one of the specific ways documented in section 6.5 of the language spec, then reading the pointed-to value via the pointer is fine. Otherwise, it is a strict-aliasing violation.
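A minimal sketch of the valid case, with made-up function names: the pointed-to objects really are ints, so going through a char pointer on the way does not matter. The second function assumes `index` stays within the array.

```cpp
#include <cstddef>  // std::size_t

// The object is an int; the detour through char* does not change that.
inline int read_through_char_pointer(int* original) {
    char* as_bytes = reinterpret_cast<char*>(original);
    int* back = reinterpret_cast<int*>(as_bytes);
    return *back;
}

// Advance in bytes by a whole number of ints, then access as int.
// The result points at a genuine int element of the array.
inline int element_via_byte_offset(int* array, std::size_t index) {
    char* bytes = reinterpret_cast<char*>(array);
    bytes += index * sizeof(int);
    return *reinterpret_cast<int*>(bytes);
}
```

Had the byte offset not been a multiple of sizeof(int), the final pointer would not point at an int object, and the read would fall into the "otherwise" case described above.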

Attempting to dereference a pointer value that is not correctly aligned for its type is a potential issue in general with pointer manipulation, but the strict aliasing rule is stronger than and effectively inclusive of pointer alignment considerations. If you have an access that satisfies the strict aliasing rule then the pointer involved must be satisfactorily aligned for its type. The reverse is not necessarily true.

Do note, however, that although on many platforms your align16() will indeed attempt to perform a read of a 16-byte-aligned object, the C language specifications do not require that to be so. Pointer-to-integer and integer-to-pointer conversions are explicitly allowed, but their results are implementation defined. It is not necessarily the case that the value on the integer side of such a conversion reports on or controls the alignment of the pointer on the other side.

How does the standard deal with such case, accessing a pointer modified while casted to a uintptr_t?

See above. Pointer-to-integer and integer-to-pointer conversions have implementation-defined effect as far as the language spec is concerned. However, on most implementations you're likely to meet, your two versions of align16() will have equivalent behavior.
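The question's align16() is not shown here, so the two functions below are hypothetical reconstructions of the two usual styles, both rounding an address up to the next multiple of 16. They assume the common case where the integer value of a pointer is its address.

```cpp
#include <cstdint>  // std::uintptr_t

// Style 1: round up in the integer domain and convert back.
// The integer-to-pointer conversion is implementation-defined.
inline void* align16_via_integer(void* p) {
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    addr = (addr + 15u) & ~static_cast<std::uintptr_t>(15u);
    return reinterpret_cast<void*>(addr);
}

// Style 2: use the integer only to compute the distance, then move the
// original pointer with char* arithmetic.
inline void* align16_via_offset(void* p) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t adjust = (16u - addr % 16u) % 16u;
    return static_cast<char*>(p) + adjust;
}
```

On mainstream platforms both return the same pointer. In C++, std::align from <memory> offers a standard way to perform this adjustment without writing the arithmetic yourself.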
