Making std::vector allocate aligned memory
Starting in C++17, just use std::vector<__m256i> or a vector of any other over-aligned type. There's an aligned version of operator new, and std::allocator uses it for over-aligned types (as do plain new-expressions, so new __m256i[N] is also safe starting in C++17).
There's a comment by @MarcGlisse saying this, making this an answer to make it more visible.
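A quick way to convince yourself is to check the address that a default-allocated vector hands back. This is only a sketch: Block32 is a hypothetical stand-in for __m256i (so it builds without SIMD headers), the helper names are made up, and it assumes a C++17 compiler with a standard library that implements aligned new.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 32-byte-aligned element type, standing in for __m256i so the
// sketch builds without <immintrin.h> or -mavx.
struct alignas(32) Block32 {
    unsigned char bytes[32];
};

// True if p is a multiple of `alignment` (which must be a power of two).
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Builds a std::vector<Block32> with the default allocator and reports
// whether its storage honors alignof(Block32). Needs C++17 aligned new.
inline bool default_vector_is_aligned(std::size_t n) {
    std::vector<Block32> v(n);
    return is_aligned(v.data(), alignof(Block32));
}
```

Calling default_vector_is_aligned(1000) should return true on a conforming C++17 implementation; with an older standard mode it may not.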
Usage of alignas in template argument of std::vector
If alignas(32) double compiled, it would require each element separately to have 32-byte alignment, i.e. pad each double out to 32 bytes, completely defeating SIMD. (I don't think it will compile, but similar things with GNU C typedef double da __attribute__((aligned(32))) do compile that way, with sizeof(da) == 32.)
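The padding effect can be shown in portable C++ with a wrapper struct instead of the GNU typedef. PaddedDouble and element_stride_bytes are hypothetical names for this sketch, and the vector of an over-aligned type assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: one double that individually demands 32-byte alignment.
struct alignas(32) PaddedDouble {
    double value;
};

// sizeof is always a multiple of alignof, so every element carries padding.
static_assert(alignof(PaddedDouble) == 32, "each element is 32-byte aligned");
static_assert(sizeof(PaddedDouble) == 32, "each double is padded out to 32 bytes");

// Distance in bytes between two consecutive elements of a std::vector<T>.
template <typename T>
std::size_t element_stride_bytes() {
    std::vector<T> v(2);
    return static_cast<std::size_t>(reinterpret_cast<const char*>(&v[1]) -
                                    reinterpret_cast<const char*>(&v[0]));
}
```

With a 32-byte stride, a 256-bit load starting at one element would pick up one useful double and 24 bytes of padding, which is why you want the alignment on the allocation rather than on each element.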
See Modern approach to making std::vector allocate aligned memory for working code.
As of C++17, std::vector<__m256d> would work, but it is usually not what you want because it makes scalar access a pain.
C++ sucks for this in my experience, although there might be a standard (or Boost) allocator that takes an over-alignment you can use as the second (usually defaulted) template param.
std::vector<double, some_aligned_allocator<32> > still isn't type-compatible with a normal std::vector<double>, which makes sense because any function that might reallocate it has to maintain alignment. But unfortunately that makes it not type-compatible even for passing to functions that only want read-only access to a std::vector of double elements.
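For illustration, here is roughly what such an allocator could look like on top of the C++17 aligned operator new. AlignedAllocator, AlignedDoubles and sum are hypothetical names for this sketch, not a standard or Boost API. The pointer-plus-length signature of sum is one possible workaround for the type incompatibility and is not something the original answer proposes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <type_traits>
#include <vector>

// Hypothetical minimal allocator built on C++17 aligned operator new/delete.
template <typename T, std::size_t Alignment>
struct AlignedAllocator {
    static_assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0,
                  "Alignment must be a power of two");
    using value_type = T;

    // rebind has to be spelled out because Alignment is a non-type parameter.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T))
            throw std::bad_array_new_length();
        return static_cast<T*>(
            ::operator new(n * sizeof(T), std::align_val_t{Alignment}));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t{Alignment});
    }
};

// Stateless allocator: all instances are interchangeable.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

using AlignedDoubles = std::vector<double, AlignedAllocator<double, 32>>;

// The allocator is part of the type, so this is a different type from std::vector<double>.
static_assert(!std::is_same<AlignedDoubles, std::vector<double>>::value,
              "different allocator, different vector type");

// Read-only consumers can take pointer + length and accept either vector type.
inline double sum(const double* data, std::size_t count) {
    double total = 0.0;
    for (std::size_t i = 0; i < count; ++i) total += data[i];
    return total;
}
```

Both sum(v.data(), v.size()) for an AlignedDoubles v and the same call for a plain std::vector<double> compile, whereas a function taking const std::vector<double>& would only accept the latter.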
Cost of misalignment
In a lot of cases, misaligned data is only a couple of percent slower than aligned data for AVX/AVX2 loops over an array, if the data is coming from L3 cache or RAM (on recent Intel CPUs); only with 64-byte vectors do you get a significantly bigger penalty (like 15% or so, even when memory bandwidth is still the bottleneck). You'd hope that the CPU core would have time to deal with it and keep the same number of outstanding off-core transactions in flight, but it doesn't.
For data hot in L1d, misalignment could hurt more even with 32-byte vectors.
In x86-64 code, alignof(max_align_t) is 16 on mainstream C++ implementations, so in practice even a vector<double> will end up aligned by at least 16, because the underlying allocator used by new always aligns at least that much. But that's very often an odd multiple of 16, at least on GNU/Linux. For large allocations, glibc's allocator (also used by malloc) uses mmap to get a whole range of pages, but it reserves the first 16 bytes for bookkeeping info. This is unfortunate for AVX and AVX-512 because it means your arrays are always misaligned unless you use aligned allocations. (See: How to solve the 32-byte-alignment issue for AVX load/store operations?)
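If you want to see what your own allocator does, a small helper that reports how far a pointer is from a 32-byte boundary is enough. The function names here are hypothetical, and the result for a default-allocated vector depends on the platform and the allocator, so no particular output is claimed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes by which p exceeds the previous multiple of `alignment`
// (a power of two). 0 means p is aligned.
inline std::size_t misalignment(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return static_cast<std::size_t>(
        reinterpret_cast<std::uintptr_t>(p) & (alignment - 1));
}

// Offset from 32-byte alignment of a default-allocated vector<double> with
// n elements. Platform-dependent: the paragraph above suggests 16 is common
// for large allocations with glibc.
inline std::size_t vector_double_offset_from_32(std::size_t n) {
    std::vector<double> v(n);
    return misalignment(v.data(), 32);
}
```

Printing vector_double_offset_from_32(1 << 20) on your target is a cheap way to find out whether you need an aligned allocator there.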
Mainstream std::vector implementations are also inefficient when they have to grow: C++ doesn't provide a realloc equivalent that's compatible with new/delete, so a vector always has to allocate more space and copy the elements to the start of it. It never even tries to allocate more space contiguous with the existing mapping (which would be safe even for non-trivially-copyable types), and it doesn't use implementation-specific tricks like Linux mremap to map the same physical pages to a different virtual address without having to copy all those megabytes or gigabytes. The fact that C++ allows code to redefine operator new means library implementations of std::vector can't just use a better allocator, either. All of this is a non-problem if you .reserve the size you're going to need, but it is pretty dumb.
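To make the .reserve point concrete, this sketch counts how often the capacity changes while appending. count_reallocations is a hypothetical helper; the exact count without reserve depends on the implementation's growth factor, so only "zero versus more than zero" is claimed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends n doubles and counts how many times the capacity changed during
// the push_back loop, i.e. how many times the vector reallocated.
inline std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<double> v;
    if (reserve_first) v.reserve(n);  // one up-front allocation

    std::size_t reallocations = 0;
    std::size_t capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<double>(i));
        if (v.capacity() != capacity) {  // storage was reallocated and copied
            ++reallocations;
            capacity = v.capacity();
        }
    }
    return reallocations;
}
```

count_reallocations(n, true) is 0 for any n, while count_reallocations(n, false) grows with n, and each of those reallocations copies everything stored so far.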
Is it possible to have a std::vector<char> allocate memory with a chosen memory alignment
I could solve my issue with a custom allocator.
Example with boost::alignment::aligned_allocator
#include <vector>
#include <boost/align/aligned_allocator.hpp>
template <typename T>
using aligned_vector = std::vector<T, boost::alignment::aligned_allocator<T, 16>>;
// 16 bytes aligned allocation
See also: How is a vector's data aligned?
How is a vector's data aligned?
The C++ standard requires allocation functions (malloc() and operator new()) to allocate memory suitably aligned for any standard type. As these functions don't receive the alignment requirement as an argument, in practice the alignment for all allocations is the same: that of the standard type with the largest alignment requirement, which is often long double and/or long long (see the boost max_align union).
Vector instructions, such as SSE and AVX, have stronger alignment requirements (16-byte aligned for 128-bit access and 32-byte aligned for 256-bit access) than those provided by the standard C++ allocation functions. posix_memalign() or memalign() can be used to satisfy such allocations with stronger alignment requirements.
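As a rough sketch of the posix_memalign() route, the buffer can be wrapped in a std::unique_ptr that releases it with free(). FreeDeleter, AlignedDoubleBuffer and make_aligned_doubles are hypothetical names, and this assumes a POSIX system (posix_memalign is not standard C++).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Memory from posix_memalign must be released with free().
struct FreeDeleter {
    void operator()(void* p) const noexcept { free(p); }
};
using AlignedDoubleBuffer = std::unique_ptr<double[], FreeDeleter>;

// Allocates `count` uninitialized doubles aligned to `alignment` bytes.
// posix_memalign requires a power-of-two alignment that is a multiple of
// sizeof(void*); on any failure an empty pointer is returned.
inline AlignedDoubleBuffer make_aligned_doubles(std::size_t count,
                                                std::size_t alignment) {
    void* raw = nullptr;
    if (posix_memalign(&raw, alignment, count * sizeof(double)) != 0)
        return AlignedDoubleBuffer{};
    return AlignedDoubleBuffer{static_cast<double*>(raw)};
}
```

This gives you an aligned array but none of std::vector's interface, which is why the allocator-based approaches are usually more convenient.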
In C++17 the allocation functions accept an additional argument of type std::align_val_t. You can make use of it like:
#include <immintrin.h>
#include <memory>
#include <new>
int main() {
    std::unique_ptr<__m256i[]> arr{new(std::align_val_t{alignof(__m256i)}) __m256i[32]};
}
Moreover, in C++17 the standard allocators have been updated to respect the type's alignment, so you can simply do:
#include <immintrin.h>
#include <vector>
int main() {
    std::vector<__m256i> arr2(32);
}
Or (no heap allocation involved and supported in C++11):
#include <immintrin.h>
#include <array>
int main() {
    std::array<__m256i, 32> arr3;
}
Aligned allocation of elements in vector
In practice, I see that it is true in Visual Studio 2019, and in gcc 8+. But can I be absolutely sure, or is it just a coincidence and some custom allocator in std::vector (like boost::alignment::aligned_allocator) is necessary?
There is no reason to expect that a custom allocator is necessary, barring bugs in the implementation of the respective compiler (which can be checked at the assembly level, if required).
Since C++11, there is the alignas specifier, which allows you to enforce the alignment in a standardized way. The standard allocator calls operator new from allocator::allocate() and, according to the documentation, forwards the alignment information to it (the aligned overloads of operator new were added in C++17). Thus, the standard allocator already respects alignment needs, if specified. However, if the global operator new is replaced by a custom implementation, no such guarantee can be made.
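Whether the aligned overload is involved for a given type can be checked at compile time: in C++17 it is used when alignof(T) exceeds the predefined macro __STDCPP_DEFAULT_NEW_ALIGNMENT__. The names uses_aligned_new, OverAligned and over_aligned_vector_ok below are hypothetical, and the sketch assumes a C++17 compiler that defines that macro.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True when new-expressions and std::allocator (C++17) have to go through
// the operator new overloads taking std::align_val_t for T.
template <typename T>
constexpr bool uses_aligned_new = alignof(T) > __STDCPP_DEFAULT_NEW_ALIGNMENT__;

// Hypothetical type aligned to twice the default new alignment, so it is
// over-aligned on any platform.
struct alignas(2 * __STDCPP_DEFAULT_NEW_ALIGNMENT__) OverAligned {
    unsigned char bytes[2 * __STDCPP_DEFAULT_NEW_ALIGNMENT__];
};

static_assert(!uses_aligned_new<char>, "char never needs aligned new");
static_assert(uses_aligned_new<OverAligned>, "OverAligned needs aligned new");

// Runtime check that the default allocator honored the requirement.
inline bool over_aligned_vector_ok(std::size_t n) {
    std::vector<OverAligned> v(n);
    return reinterpret_cast<std::uintptr_t>(v.data()) % alignof(OverAligned) == 0;
}
```

If a program replaces the global aligned operator new with something that ignores the alignment argument, the runtime check is the part that would start failing.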
How to make tr1::array allocate aligned memory?
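tr1::array (and std::array and boost::array) are aggregates that hold their elements directly (and are POD when the element type is POD), so the memory occupied by the contents is coincident with the memory of the array. So, allocate the storage for the array however you need to, and construct the array in it with placement new.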
typedef std::tr1::array< MyClass, ary_sz > AryT;
void *array_storage = aligned_allocation( sizeof( AryT ) );
AryT *ary = new( array_storage ) AryT( initial_value );
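A C++17 take on the same idea might look like the following. It swaps the unspecified aligned_allocation() for the aligned operator new and uses std::array instead of tr1::array; Array8, kArrayAlignment, AlignedArrayDeleter and make_aligned_array are hypothetical names, and 64 is just an example alignment.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

using Array8 = std::array<double, 8>;
constexpr std::size_t kArrayAlignment = 64;  // example value, e.g. a cache line

// Destroys the array and returns its storage to the aligned operator delete.
struct AlignedArrayDeleter {
    void operator()(Array8* a) const noexcept {
        std::destroy_at(a);
        ::operator delete(a, std::align_val_t{kArrayAlignment});
    }
};
using AlignedArrayPtr = std::unique_ptr<Array8, AlignedArrayDeleter>;

inline AlignedArrayPtr make_aligned_array(double initial_value) {
    // 1. Get raw storage with the alignment we want.
    void* storage = ::operator new(sizeof(Array8), std::align_val_t{kArrayAlignment});
    // 2. Construct the array object in that storage with placement new.
    Array8* a = new (storage) Array8;
    a->fill(initial_value);
    return AlignedArrayPtr{a};
}
```

Because the elements live directly inside the array object, aligning the object aligns the element data as well.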
How to create std::vector of char/std::byte where first byte is aligned to 16 bytes, but there is no padding?
Aligning the data in a vector isn't provided by default. Before C++17, not even for over-aligned classes.
The best way of doing alignment is with the aligned_allocator of boost.
Unfortunately, it doesn't prevent padding; it even over-allocates so that it can adjust the pointer to the alignment. From C++17, it can use aligned new (see the std::align_val_t overloads). However, all implementations I've seen actually use the same trick.
An alternative is allocating a whole page at once and doing your own memory management with a custom allocator. You can do it, though it might take a lot of time to do correctly.
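The over-allocate-and-adjust trick mentioned above can be sketched with std::align from <memory>. This only shows the pointer arithmetic on a buffer you already own, not a complete allocator; first_aligned_address and padded_size are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Worst-case number of bytes to request so that `needed` bytes at the given
// alignment always fit somewhere inside the block.
constexpr std::size_t padded_size(std::size_t alignment, std::size_t needed) {
    return needed + alignment - 1;
}

// Returns the first address inside [buffer, buffer + buffer_size) that is
// aligned to `alignment` and still leaves room for `needed` bytes, or
// nullptr if the buffer is too small.
inline void* first_aligned_address(void* buffer, std::size_t buffer_size,
                                   std::size_t alignment, std::size_t needed) {
    void* p = buffer;
    std::size_t space = buffer_size;
    return std::align(alignment, needed, p, space);
}
```

A real allocator also has to remember the original pointer somewhere so it can free the block later, which is where the extra bookkeeping (and padding) comes from.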