Memory alignment : how to use alignof / alignas?
Alignment is a restriction on the memory addresses at which a value's first byte can be stored. (It improves performance on many processors and is required by certain instructions that only work on data with a particular alignment; for example, aligned SSE loads and stores need 16-byte alignment, while their AVX counterparts need 32-byte alignment.)
An alignment of 16 means that only memory addresses that are a multiple of 16 are valid addresses.
alignas forces alignment to the required number of bytes. You can only align to powers of 2: 1, 2, 4, 8, 16, 32, 64, 128, ...
#include <cstdio>
int main() {
    alignas(16) int a[4];
    alignas(1024) int b[4];
    // %p expects a void*, so cast the array addresses explicitly
    std::printf("%p\n", static_cast<void*>(a));
    std::printf("%p\n", static_cast<void*>(b));
}
example output:
0xbfa493e0
0xbfa49000 // note how many more "zeros" now.
// binary equivalent
1011 1111 1010 0100 1001 0011 1110 0000
1011 1111 1010 0100 1001 0000 0000 0000 // each extra trailing zero bit is one more power of 2
The other keyword, alignof, is very convenient. You cannot do something like
int a[4];
assert(a % 16 == 0); // attempt to check for 16-byte alignment: WRONG, compile error (% cannot be applied to a pointer)
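but, with a and b declared with alignas as in the first example, you can do the following (alignof applied to a variable rather than a type is a compiler extension accepted by GCC and Clang; standard C++ only accepts a type):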
assert(alignof(a) == 16);
assert(alignof(b) == 1024);
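Note that this is stricter than a simple "%" (modulus) check. We know that something aligned to 1024 bytes is necessarily also aligned to 1, 2, 4, 8, ..., 512 bytes, but
assert(alignof(b) == 32); // fails
So, to be more precise, alignof returns the alignment requirement of the type or declaration (here the 1024 requested with alignas), not every alignment that the address happens to satisfy. An address may by chance be aligned more strictly than required; alignof does not report that.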
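The difference between the compile-time value of alignof and the alignment of an actual address can be sketched as follows. The helper is_aligned and the type Block are illustrative names, not something from the original answer; the run-time check converts the pointer to an integer first, which is what the failing "%" attempt above was missing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if address p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    // Precondition: alignment is a non-zero power of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// alignof takes a type and yields its alignment requirement at compile time.
struct alignas(1024) Block {
    int data[4];
};
static_assert(alignof(Block) == 1024, "alignment requested with alignas");
static_assert(alignof(char) == 1, "char always has alignment 1");

// Objects whose addresses can be inspected at run time.
alignas(16) int values[4];
Block block;
```

With these definitions, is_aligned(&block, 1024) and is_aligned(&block, 32) both hold at run time, even though alignof(Block) is exactly 1024.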
alignof is also a nice way to find out in advance the minimum alignment requirement of the basic data types (it returns 1 for char and will probably return 4 for float, etc.).
Still legal:
alignas(alignof(float)) float SqDistance;
Something with an alignment of 16 will be placed at the next available address that is a multiple of 16 (there may be implicit padding after the last used address).
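The implicit padding can be made visible with offsetof. This is a small sketch with a made-up struct, Packet, and it assumes a platform where float is 4 bytes wide:

```cpp
#include <cstddef>

// Hypothetical struct used only to show the padding.
struct Packet {
    char tag;                // offset 0, occupies 1 byte
    // bytes 1..15 are padding inserted by the compiler
    alignas(16) float v[4];  // placed at the next multiple of 16
};

static_assert(offsetof(Packet, v) == 16, "v starts on a 16-byte boundary");
static_assert(alignof(Packet) == 16, "the struct inherits its strictest member alignment");
static_assert(sizeof(Packet) == 32, "1 byte + 15 padding + 16 bytes of floats");
```

The struct as a whole takes on the strictest alignment among its members, so arrays of Packet keep every v member aligned as well.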
Practical use cases for alignof and alignas C++ keywords
A common use case for the alignas specifier is the scenario where you want to pass multiple objects between different threads through a queue (e.g., an event or task queue) while avoiding false sharing. False sharing results from multiple threads competing for the same cache line when they are actually accessing different objects. It is usually undesirable because it degrades performance.
For example, assuming that the cache line size is 64 bytes, given the following Event class:
struct Event {
int event_type_;
};
The alignment of Event will correspond to the alignment of its data member, event_type_. Assuming that the alignment of int is 4 bytes (i.e., alignof(int) evaluates to 4), then up to 16 Event objects can fit into a single cache line. So, if you have a queue like:
std::queue<Event> eventQueue;
where one thread pushes events into the back of the queue and another thread pulls events from the front, both threads may end up competing for the same cache line. This can be avoided by properly using the alignas specifier on Event:
struct alignas(64) Event {
int event_type_;
};
This way, an Event object will always be aligned on a cache line boundary, so a cache line will contain at most one Event object. Therefore two or more threads will never compete for the same cache line when accessing distinct Event objects (if multiple threads are accessing the same Event object, they will obviously compete for the same cache line).
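The effect on the memory layout can be checked with a few lines of code. This is only a sketch: the constant kCacheLineSize, the helper cache_line_of, and the events array are illustrative names, and 64 bytes is the assumed cache line size from the text, not a value queried from the hardware.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as in the discussion above.
constexpr std::size_t kCacheLineSize = 64;

struct alignas(kCacheLineSize) Event {
    int event_type_;
};

// alignas also rounds sizeof up, so each Event fills a whole line.
static_assert(alignof(Event) == kCacheLineSize, "starts on a line boundary");
static_assert(sizeof(Event) == kCacheLineSize, "padded to one full line");

// Hypothetical helper: index of the cache line containing address p.
inline std::uintptr_t cache_line_of(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLineSize;
}

// Neighbouring elements now live on consecutive, distinct cache lines.
Event events[4];
```

Whether a standard container such as std::queue<Event> honors the 64-byte alignment for its heap storage depends on its allocator; from C++17 onward, std::allocator obtains memory through the alignment-aware operator new.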
What are the alignas and alignof keywords used for?
Some special types must be aligned to more bytes than usual. For example, matrices are typically aligned to 16 bytes on x86 for the most efficient copying to the GPU, and SSE vector types behave this way too. As such, if you want to write a container type, you must know the alignment requirement of the type you are trying to contain or allocate.
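As a rough illustration of a container respecting the alignment of its element type, here is a minimal single-element holder. Slot and Vec4 are hypothetical names invented for this sketch; the point is only that alignas(T) on the raw byte buffer makes it suitable for placement-new of a T.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <utility>

// Hypothetical one-element container that owns raw storage for a T.
template <typename T>
class Slot {
public:
    Slot() = default;
    Slot(const Slot&) = delete;
    Slot& operator=(const Slot&) = delete;
    ~Slot() { reset(); }

    // Construct a T in place, destroying any previous element first.
    template <typename... Args>
    T& emplace(Args&&... args) {
        reset();
        // Guaranteed by alignas(T) below; checked here for illustration.
        assert(reinterpret_cast<std::uintptr_t>(storage_) % alignof(T) == 0);
        ptr_ = ::new (static_cast<void*>(storage_)) T(std::forward<Args>(args)...);
        return *ptr_;
    }

    void reset() {
        if (ptr_) {
            ptr_->~T();
            ptr_ = nullptr;
        }
    }

    T* get() { return ptr_; }

private:
    alignas(T) unsigned char storage_[sizeof(T)];  // raw bytes, aligned like T
    T* ptr_ = nullptr;
};

// Example element type with a 16-byte requirement.
struct alignas(16) Vec4 {
    float x, y, z, w;
};
static_assert(alignof(Slot<Vec4>) >= alignof(Vec4), "container is at least as aligned as T");
```

A plain unsigned char array without alignas(T) would only be guaranteed an alignment of 1, which is why the container has to query the element type.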
aligned_malloc() vs alignas() for Constant Buffers
Yes, you could use it like this:
struct SceneConstantBuffer
{
alignas(16) DirectX::XMFLOAT4X4 ViewProjection[2];
alignas(16) DirectX::XMFLOAT4 EyePosition[2];
alignas(16) DirectX::XMFLOAT3 LightDirection{};
alignas(16) DirectX::XMFLOAT3 LightDiffuseColor{};
alignas(16) int NumSpecularMipLevels{ 1 };
};
What won't work is __declspec(align)
...
EDIT: If you want to use it on the struct itself something similar to this should work too:
struct alignas(16) SceneConstantBuffer
{
DirectX::XMMATRIX ViewProjection; // 16-byte aligned
...
DirectX::XMFLOAT3 LightDiffuseColor{};
};
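One way to confirm that per-member alignas produces the intended 16-byte layout is to assert the member offsets at compile time. The sketch below uses stand-in types (Float3, Float4, Float4x4) in place of the DirectXMath types so that it compiles without the DirectX headers; they are assumptions made for illustration, as is the struct name SceneConstants.

```cpp
#include <cstddef>

// Hypothetical stand-ins for DirectX::XMFLOAT3, XMFLOAT4 and XMFLOAT4X4.
struct Float3 { float x, y, z; };
struct Float4 { float x, y, z, w; };
struct Float4x4 { float m[4][4]; };

struct SceneConstants {
    alignas(16) Float4x4 ViewProjection[2];  // 128 bytes
    alignas(16) Float4 EyePosition[2];       // 32 bytes
    alignas(16) Float3 LightDirection;       // 12 bytes + 4 padding
    alignas(16) Float3 LightDiffuseColor;    // 12 bytes + 4 padding
    alignas(16) int NumSpecularMipLevels;    // 4 bytes + 12 padding
};

// Every member starts on a 16-byte boundary.
static_assert(offsetof(SceneConstants, ViewProjection) == 0, "");
static_assert(offsetof(SceneConstants, EyePosition) == 128, "");
static_assert(offsetof(SceneConstants, LightDirection) == 160, "");
static_assert(offsetof(SceneConstants, LightDiffuseColor) == 176, "");
static_assert(offsetof(SceneConstants, NumSpecularMipLevels) == 192, "");
static_assert(sizeof(SceneConstants) == 208, "total size is a multiple of 16");
```

Checks like these fail at compile time if someone later reorders or adds members in a way that breaks the layout expected on the shader side.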
How to tell the maximum data alignment requirement in C++
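You might be looking for std::max_align_t (declared in <cstddef>): its alignment is at least as strict as that of every scalar type, so alignof(std::max_align_t) gives you that maximum.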
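A short sketch of how std::max_align_t might be used in practice. The names OverAligned, is_over_aligned and malloc_is_max_aligned are made up for this example; the code assumes a compiler that supports alignments above alignof(std::max_align_t), which is implementation-defined but common.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// max_align_t is at least as strictly aligned as every scalar type.
static_assert(alignof(std::max_align_t) >= alignof(long double), "");
static_assert(alignof(std::max_align_t) >= alignof(long long), "");
static_assert(alignof(std::max_align_t) >= alignof(void*), "");

// A type whose alignment exceeds it is called "over-aligned".
struct alignas(2 * alignof(std::max_align_t)) OverAligned {
    char c;
};
constexpr bool is_over_aligned =
    alignof(OverAligned) > alignof(std::max_align_t);

// Hypothetical check: malloc returns memory suitable for any type with
// fundamental alignment (returns false if the allocation itself fails).
inline bool malloc_is_max_aligned() {
    void* p = std::malloc(64);
    if (!p) return false;
    const bool ok =
        reinterpret_cast<std::uintptr_t>(p) % alignof(std::max_align_t) == 0;
    std::free(p);
    return ok;
}
```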
Does the alignas specifier work with 'new'?
Before C++17, if your type is not over-aligned, then yes, the default new will work. "Over-aligned" means that the alignment you specify in alignas is greater than alignof(std::max_align_t). The default new will work with non-over-aligned types more or less by accident; the default memory allocator will always allocate memory with an alignment equal to alignof(std::max_align_t).
If your type is over-aligned, however, you're out of luck. Neither the default new nor any global new operator you write will be able to even know the alignment required by the type, let alone allocate memory appropriate to it. The only way to help in this case is to overload the class's operator new, which will be able to query the class's alignment with alignof.
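A class-specific operator new of the kind described here could look roughly like this. It is a sketch under assumptions: the class name OverAligned is invented, and the over-allocate-and-round-up technique on top of std::malloc is one possible choice for pre-C++17 code, not something prescribed by the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

struct alignas(64) OverAligned {
    int value;

    static void* operator new(std::size_t size) {
        const std::size_t align = alignof(OverAligned);
        // Over-allocate: the object, worst-case padding, and one slot
        // in which to remember the pointer malloc really returned.
        void* raw = std::malloc(size + align + sizeof(void*));
        if (!raw) throw std::bad_alloc();
        const std::uintptr_t start =
            reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
        // Round up to the next multiple of align (a power of two).
        const std::uintptr_t aligned =
            (start + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
        assert(aligned % align == 0);
        // Stash the original pointer just below the aligned block.
        reinterpret_cast<void**>(aligned)[-1] = raw;
        return reinterpret_cast<void*>(aligned);
    }

    static void operator delete(void* p) noexcept {
        // Recover and free the pointer stored by operator new.
        if (p) std::free(static_cast<void**>(p)[-1]);
    }
};
```

With this in place, new OverAligned{42} yields a 64-byte-aligned object, and delete releases the original malloc block. Array forms (operator new[] and operator delete[]) would need the same treatment.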
Of course, this won't be useful if that class is used as a member of another class, not unless that other class also overloads operator new. So something as simple as new pair<over_aligned, int>() won't work.
C++17 adds a number of allocation functions (operator new overloads taking a std::align_val_t) which are given the alignment of the type being allocated. These are used specifically for over-aligned types (or more specifically, new-extended over-aligned types). So new pair<over_aligned, int>() will work in C++17.
Of course, this only works to the extent that the allocator handles over-aligned types.
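The C++17 behavior can be observed with a small check. This sketch assumes the code is compiled as C++17 or later (for example with -std=c++17); the names OverAligned and pair_is_aligned are illustrative.

```cpp
#include <cstdint>
#include <new>
#include <utility>

struct alignas(64) OverAligned {
    int value;
};

// The pair takes on the strictest alignment of its members.
static_assert(alignof(std::pair<OverAligned, int>) == 64, "");

// Hypothetical check: allocate the pair with a plain new-expression and
// report whether the returned address satisfies the 64-byte requirement.
// In C++17 the compiler passes the alignment to operator new on its own.
inline bool pair_is_aligned() {
    auto* p = new std::pair<OverAligned, int>();
    const bool ok = reinterpret_cast<std::uintptr_t>(p) %
                        alignof(std::pair<OverAligned, int>) == 0;
    delete p;
    return ok;
}
```

Compiled as C++14 or earlier, the same code may still happen to return an aligned address, but nothing guarantees it.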
Where can I use alignas() in C++11?
You cannot apply an alignment to a typedef. In the C++ model of alignment specifiers, the alignment is an inseparable part of the type itself, and a typedef does not create a new type (it only provides a new name for an existing type), so it is not meaningful to apply an alignment specifier in a typedef declaration.
From [dcl.align] (7.6.2)p1:
An alignment-specifier may be applied to a variable or to a class data member [...]. An alignment-specifier may also be applied to the declaration or definition of a class (in an elaborated-type-specifier (7.1.6.3) or class-head (Clause 9), respectively) and to the declaration or definition of an enumeration (in an opaque-enum-declaration or enum-head, respectively (7.2)).
These are the only places where the standard says an alignment-specifier (alignas(...)) may be applied. Note that this does not include typedef declarations or alias-declarations.
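The permitted placements can be summarized in a short sketch. All names below (scratch, Particle, Counter) are invented for illustration, and enumerations are left out because compiler support for alignas on them varies.

```cpp
// On a variable.
alignas(32) unsigned char scratch[64];

// On a non-static data member.
struct Particle {
    alignas(16) float position[4];
    float mass;
};

// On a class definition (in the class-head).
struct alignas(64) Counter {
    long value;
};

static_assert(alignof(Particle) == 16, "member alignment propagates to the class");
static_assert(alignof(Counter) == 64, "");
static_assert(sizeof(Counter) == 64, "size is rounded up to the alignment");

// Not permitted: an alignment specifier in a typedef or alias declaration,
// for example:
//   typedef alignas(16) int aligned_int;
//   using aligned_int2 = alignas(16) int;
// To get an aligned int type, wrap it in a class instead:
struct alignas(16) AlignedInt {
    int value;
};
static_assert(alignof(AlignedInt) == 16, "");
```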
Per [dcl.attr.grammar] (7.6.1)p4:
If an attribute-specifier-seq that appertains to some entity or statement contains an attribute that is not allowed to apply to that entity or statement, the program is ill-formed.
This wording was intended to apply to alignas as well as the other forms of attribute that may appear within an attribute-specifier-seq, but was not correctly updated when alignment switched from being a "real" attribute to being a different kind of attribute-specifier-seq.
So: your example code using alignas is supposed to be ill-formed. The C++ standard does not currently explicitly say this, but it also does not permit the usage, so instead it currently would result in undefined behavior (because the standard does not define any behavior for it).