How is a vector's data aligned?
The C++ standard requires allocation functions (malloc() and operator new()) to return memory suitably aligned for any standard type. Because these functions don't receive the alignment requirement as an argument, in practice every allocation gets the same alignment: that of the standard type with the largest alignment requirement, which is often long double and/or long long (see the boost max_align union).
Vector instructions, such as SSE and AVX, have stronger alignment requirements (16-byte alignment for 128-bit access and 32-byte alignment for 256-bit access) than the standard C++ allocation functions provide. posix_memalign() or memalign() can be used to satisfy allocations with these stronger alignment requirements.
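A minimal sketch of the posix_memalign() route, assuming a POSIX system. The helper names alloc_aligned_floats and is_aligned are hypothetical and only serve the illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>   // std::free
#include <stdlib.h>  // posix_memalign (POSIX, not ISO C++)

// Hypothetical helper: returns storage for `count` floats aligned to
// `alignment` bytes, or nullptr on failure. `alignment` must be a power
// of two and a multiple of sizeof(void*). Release it with std::free().
float* alloc_aligned_floats(std::size_t count, std::size_t alignment) {
    void* p = nullptr;
    if (posix_memalign(&p, alignment, count * sizeof(float)) != 0)
        return nullptr;
    return static_cast<float*>(p);
}

// Hypothetical helper: true if `p` is a multiple of `alignment`.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A buffer obtained with alloc_aligned_floats(n, 32) is suitable for aligned 256-bit loads and stores. On non-POSIX platforms a different call would be needed.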
In C++17 the allocation functions accept an additional argument of type std::align_val_t.
You can make use of it like this:
#include <immintrin.h>
#include <memory>
#include <new>

int main() {
    std::unique_ptr<__m256i[]> arr{new(std::align_val_t{alignof(__m256i)}) __m256i[32]};
}
Moreover, in C++17 the standard allocators have been updated to respect a type's alignment, so you can simply do:
#include <immintrin.h>
#include <vector>

int main() {
    std::vector<__m256i> arr2(32);
}
Or (no heap allocation involved, and supported since C++11):
#include <immintrin.h>
#include <array>

int main() {
    std::array<__m256i, 32> arr3;
}
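To check the C++17 behavior without depending on AVX headers, the following sketch uses a hypothetical over-aligned struct named Block in place of __m256i. It assumes the code is compiled as C++17 or later; with an older standard the check may fail.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 256-bit SIMD type: 32 bytes, 32-byte aligned.
struct alignas(32) Block {
    float lanes[8];
};

// Returns true if the heap buffer of a std::vector<Block> with n elements
// starts at an address that satisfies alignof(Block).
bool vector_storage_is_aligned(std::size_t n) {
    std::vector<Block> v(n);
    return reinterpret_cast<std::uintptr_t>(v.data()) % alignof(Block) == 0;
}
```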
How is a Vector of Vector aligned in memory?
The size of the vector<int> struct that is stored in ref is constant. Common implementations have this as three pointers, or around 12 bytes on 32-bit architectures, or 24 bytes on shiny new 64-bit architectures.
So ref manages roughly ref.capacity() * 12 bytes of contiguous storage.
Each element/vector<int> in ref manages its own integers independently of the elements ref manages. In the artistic rendering that accompanied this answer, ref.size() == ref.capacity() for the sake of simplicity.
So your
ref.resize(i);
only affects the top row. Your
ref[i].push_back(23);
only affects the i-th column.
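The layout described above can be checked with a small sketch. The two helper functions are hypothetical names introduced only for this illustration:

```cpp
#include <cstddef>
#include <vector>

// True if consecutive outer elements sit exactly sizeof(std::vector<int>)
// bytes apart, i.e. the outer vector stores its elements contiguously.
bool outer_elements_are_contiguous(const std::vector<std::vector<int>>& ref) {
    for (std::size_t i = 1; i < ref.size(); ++i) {
        const char* prev = reinterpret_cast<const char*>(&ref[i - 1]);
        const char* cur = reinterpret_cast<const char*>(&ref[i]);
        if (cur - prev != static_cast<std::ptrdiff_t>(sizeof(std::vector<int>)))
            return false;
    }
    return true;
}

// Appends to the i-th inner vector and reports whether the outer storage
// stayed where it was (it should: only the inner buffer is touched).
bool inner_push_back_leaves_outer_alone(std::vector<std::vector<int>>& ref,
                                        std::size_t i) {
    const std::vector<int>* before = ref.data();
    ref[i].push_back(23);
    return ref.data() == before;
}
```

Only the integers owned by ref[i] may move when ref[i] grows; the outer array of vector<int> objects moves only when ref itself reallocates.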
Making std::vector allocate aligned memory
Starting in C++17, just use std::vector<__m256i> or a vector of any other aligned type. There is an aligned version of operator new, and it is used by std::allocator for aligned types (as well as by a plain new-expression, so new __m256i[N] is also safe starting in C++17).
There's a comment by @MarcGlisse saying this; I'm making it an answer to make it more visible.
If T is aligned, is std::vector<T> aligned too?
C++ default allocators are required to return memory properly aligned for any so-called standard type, and the padding automatically added at the end of a struct (visible via sizeof()) generally facilitates this in contiguous allocations.
struct C {
    uint8_t a;   // followed by 7B of invisible padding to naturally align b
    uint64_t b;
    uint32_t c;
    uint8_t d;   // followed by 3B padding for C (natural alignment of 8B due to b)
};
// sizeof(C) = 24B, alignof(C) = 8B

struct D {
    uint8_t a;   // followed by 3B padding for b
    uint32_t b;
    uint8_t c;   // followed by 3B padding for D (natural alignment of 4B due to b)
};
// sizeof(D) = 12B, alignof(D) = 4B

struct E {
    __m256 v;    // SSE/AVX intrinsics handle natural alignment properly too
    char v2;
};
// sizeof(E) = 64B, alignof(E) = 32B
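The sizes and paddings quoted for C and D can be turned into compile-time checks. This is a sketch that assumes a typical 64-bit ABI (x86-64 or AArch64); layout is implementation-defined, so on other targets the asserted numbers may differ.

```cpp
#include <cstddef>
#include <cstdint>

struct C {
    std::uint8_t a;
    std::uint64_t b;
    std::uint32_t c;
    std::uint8_t d;
};

struct D {
    std::uint8_t a;
    std::uint32_t b;
    std::uint8_t c;
};

// Values below hold on typical 64-bit ABIs; they are not guaranteed by the standard.
static_assert(sizeof(C) == 24 && alignof(C) == 8, "unexpected layout for C");
static_assert(offsetof(C, b) == 8, "7 bytes of padding expected after C::a");
static_assert(offsetof(C, d) == 20, "C::d expected right after C::c");
static_assert(sizeof(D) == 12 && alignof(D) == 4, "unexpected layout for D");
static_assert(offsetof(D, b) == 4, "3 bytes of padding expected after D::a");

// Always true for any type: the size is a multiple of the alignment,
// which is what keeps every element of an array correctly aligned.
static_assert(sizeof(C) % alignof(C) == 0, "");
static_assert(sizeof(D) % alignof(D) == 0, "");
```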
For most cases, this is adequate, but if you are doing fancy casting tricks or need 64B cache line alignment, etc., you can use alignas(), provided you are using C++11 or newer. This works partially by padding the end of the structure too:
#include <iostream>

struct alignas(64) F {
    double stuff[3];
};
// sizeof(F) = 64B, alignof(F) = 64B

void foo() {
    F f[4];
    // these addresses are separated by (and even multiples of) 0x40 bytes:
    std::cout << &f[0] << " " << &f[1] << " " << &f[2] << std::endl;
}
Use std::aligned_storage<Len, Align> if you need a large block aligned against, e.g., 4 KiB page boundaries (whether an alignment that large is supported is implementation-defined). But then you're on your own with placement new in general and lose the convenience of std::vector<> doing everything for you.
C++ struct alignment and STL vectors
The standard requires you to be able to create an array of a struct type. When you do so, the array is required to be contiguous. That means, whatever size is allocated for the struct, it has to be one that allows you to create an array of them. To ensure that, the compiler can allocate extra space inside the structure, but cannot require any extra space between the structs.
The space for the data in a vector is (normally) allocated with ::operator new (via an Allocator class), and ::operator new is required to allocate space that's properly aligned to store any type with a fundamental (non-extended) alignment.
You could supply your own Allocator and/or overload ::operator new, but if you do, your version is still required to meet the same requirements, so it won't change anything in this respect.
In other words, exactly what you want is required to work as long as the data in the file was created in essentially the same way you're trying to read it back in. If it was created on another machine or with a different compiler (or even the same compiler with different flags) you have a fair number of potential problems -- you might get differences in endianness, padding in the struct, and so on.
Edit: Given that you don't know whether the structs have been written out in the format expected by the compiler, you not only need to read the structs one at a time, you really need to read the items in the structs one at a time, then put each into a temporary struct, and finally add that filled-in struct to your collection.
Fortunately, you can overload operator>> to automate most of this. It doesn't improve speed (for example), but it can keep your code cleaner:
#include <algorithm>
#include <cassert>
#include <fstream>
#include <iterator>
#include <vector>

struct whatever {
    int x, y, z;
    char stuff[672-3*sizeof(int)];

    // Read the members one at a time instead of copying raw struct bytes.
    friend std::istream &operator>>(std::istream &is, whatever &w) {
        is >> w.x >> w.y >> w.z;
        return is.read(w.stuff, sizeof(w.stuff));
    }
};

int main(int argc, char **argv) {
    std::vector<whatever> data;
    assert(argc>1);
    std::ifstream infile(argv[1]);
    std::copy(std::istream_iterator<whatever>(infile),
              std::istream_iterator<whatever>(),
              std::back_inserter(data));
    return 0;
}
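Note that operator>> on an int performs formatted (text) extraction. If the file holds raw binary fields, reading the items one at a time could look like the following sketch. It assumes, purely for illustration, that each field is a 32-bit little-endian unsigned integer; Record, read_u32_le, read_record and parse_record are hypothetical names, not part of the original answer.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record with three 32-bit fields (assumed on-disk format).
struct Record {
    std::uint32_t x, y, z;
};

// Reads one 32-bit little-endian value byte by byte, so the result does not
// depend on the host's endianness or on struct padding.
bool read_u32_le(std::istream& is, std::uint32_t& out) {
    unsigned char b[4];
    if (!is.read(reinterpret_cast<char*>(b), sizeof b))
        return false;
    out = std::uint32_t{b[0]} | (std::uint32_t{b[1]} << 8) |
          (std::uint32_t{b[2]} << 16) | (std::uint32_t{b[3]} << 24);
    return true;
}

// Fills the record field by field; returns false on a short read.
bool read_record(std::istream& is, Record& r) {
    return read_u32_le(is, r.x) && read_u32_le(is, r.y) && read_u32_le(is, r.z);
}

// Convenience wrapper for in-memory data, handy for testing.
bool parse_record(const std::string& bytes, Record& r) {
    std::istringstream is(bytes);
    return read_record(is, r);
}
```

For a real file, the stream would be opened with std::ios::binary and read_record called in a loop until it returns false.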
What is the byte alignment of the elements in a std::vector<char>?
The elements of the container have at least the alignment required for them in that implementation: if int is 4-aligned in your implementation, then each element of a vector<int> is an int and therefore is 4-aligned. I say "if" because there's a difference between size and alignment requirements: just because int has size 4 doesn't necessarily mean that it must be 4-aligned, as far as the standard is concerned. It's very common, though, since int is usually the word size of the machine, and most machines have advantages for memory access on word boundaries. So it makes sense to align int even if it's not strictly necessary. On x86, for example, you can do unaligned word-sized memory access, but it can be slower than aligned access. On some ARM cores (older ones in particular), unaligned word operations are not allowed and typically fault.
vector guarantees contiguous storage, so there won't be any "padding" in between the first and second element of a vector<char>, if that's what you're concerned about. The specific requirement for std::vector is that for 0 <= n < vec.size(), &vec[n] == &vec[0] + n.
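The contiguity requirement can be expressed directly in code. A minimal sketch follows; elements_are_contiguous is a hypothetical helper name, and it is not meant for vector<bool>, whose elements are not addressable objects.

```cpp
#include <cstddef>
#include <vector>

// Checks &vec[n] == &vec[0] + n for every valid index n.
template <class T>
bool elements_are_contiguous(const std::vector<T>& vec) {
    for (std::size_t n = 0; n < vec.size(); ++n) {
        if (&vec[n] != &vec[0] + n)
            return false;
    }
    return true;
}
```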
[Edit: this bit is now irrelevant, the questioner has disambiguated: The container itself will usually have whatever alignment is required for a pointer, regardless of what the value_type is. That's because the vector itself would not normally incorporate any elements, but will have a pointer to some dynamically-allocated memory with the elements in that. This isn't explicitly required, but it's a predictable implementation detail.]
Every object in C++ is 1-aligned; the only things that aren't are bitfields and the elements of the borderline-crazy special case that is vector<bool>. So you can rest assured that your hope for std::vector<char> is well-founded. Both the vector and its first element will probably also be 4-aligned ;-)
As for how they get aligned - the same way anything in C++ gets aligned. When memory is allocated from the heap, it is required to be aligned sufficiently for any object that can fit into the allocation. When objects are placed on the stack, the compiler is responsible for designing the stack layout. The calling convention will specify the alignment of the stack pointer on function entry, then the compiler knows the size and alignment requirement of each object it lays down, so it knows whether the stack needs any padding to bring the next object to the correct alignment.
Memory alignment of Armadillo vectors vec/fvec
The Armadillo documentation does not seem to address this point, so it is left unspecified. Thus, vector data is likely not guaranteed to be 32-byte aligned.
However, you do not need vector data to be aligned to load it into AVX registers: you can use the unaligned load intrinsic _mm256_loadu_ps. AFAIK, the performance of _mm256_load_ps and _mm256_loadu_ps is about the same on relatively new x86 processors.
How to create std::vector of char/std::byte where first byte is aligned to 16 bytes, but there is no padding?
Aligning the data in a vector ain't provided by default. Not even for aligned classes.
The best way of doing alignment is with the aligned_allocator of boost.
Unfortunately, it doesn't prevent padding; it even overallocates so that it can adjust the pointer to the alignment. From C++17, it can use aligned new (see the std::align_val_t overloads). However, all implementations I've seen actually use the same trick.
An alternative is allocating a whole page at once, and do your own memory management with a custom allocator. You can do it, though it might take a lot of time to do correctly.
Usage of alignas in template argument of std::vector
If alignas(32) double compiled, it would require that each element separately had 32-byte alignment, i.e. pad each double out to 32 bytes, completely defeating SIMD. (I don't think it will compile, but similar things with GNU C typedef double da __attribute__((aligned(32))) do compile that way, with sizeof(da) == 32.)
See Modern approach to making std::vector allocate aligned memory for working code.
As of C++17, std::vector<__m256d> would work, but is usually not what you want because it makes scalar access a pain.
C++ sucks for this in my experience, although there might be a standard (or Boost) allocator that takes an over-alignment you can use as the second (usually defaulted) template param.
std::vector<double, some_aligned_allocator<32> > still isn't type-compatible with normal std::vector, which makes sense because any function that might reallocate it has to maintain alignment. But unfortunately that makes it not type-compatible even for passing to functions that only want read-only access to a std::vector of double elements.
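As a rough illustration of what such an allocator could look like, here is a minimal sketch built on the C++17 aligned operator new. AlignedAllocator and AlignedVector are hypothetical names, not an existing standard or Boost API, and the sketch omits overflow checks on n * sizeof(T).

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical allocator that over-aligns every allocation to Align bytes.
template <class T, std::size_t Align>
struct AlignedAllocator {
    static_assert(Align >= alignof(T), "Align must not weaken T's alignment");
    static_assert((Align & (Align - 1)) == 0, "Align must be a power of two");

    using value_type = T;

    // rebind must be spelled out because Align is a non-type parameter.
    template <class U>
    struct rebind { using other = AlignedAllocator<U, Align>; };

    AlignedAllocator() noexcept = default;
    template <class U>
    AlignedAllocator(const AlignedAllocator<U, Align>&) noexcept {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(
            ::operator new(n * sizeof(T), std::align_val_t{Align}));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t{Align});
    }
};

// The allocator is stateless, so all instances compare equal.
template <class T, class U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return true;
}
template <class T, class U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept {
    return false;
}

// Convenience alias: a vector whose buffer starts on a 32-byte boundary.
template <class T>
using AlignedVector = std::vector<T, AlignedAllocator<T, 32>>;
```

With this, AlignedVector<double> keeps scalar element access while its data() pointer is 32-byte aligned. As noted above, it remains a distinct type from std::vector<double>.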
Cost of misalignment
For a lot of cases the misalignment is only a couple percent worse than aligned, for AVX/AVX2 loops over an array if data's coming from L3 cache or RAM (on recent Intel CPUs); only with 64-byte vectors do you get a significantly bigger penalty (like 15% or so even when memory bandwidth is still the bottleneck.) You'd hope that the CPU core would have time to deal with it and keep the same number of outstanding off-core transactions in flight. But it doesn't.
For data hot in L1d, misalignment could hurt more even with 32-byte vectors.
In x86-64 code, alignof(max_align_t) is 16 on mainstream C++ implementations, so in practice even a vector<double> will end up aligned by 16 at least because the underlying allocator used by new always aligns at least that much. But that's very often an odd multiple of 16, at least on GNU/Linux. Glibc's allocator (also used by malloc) for large allocations uses mmap to get a whole range of pages, but it reserves the first 16 bytes for bookkeeping info. This is unfortunate for AVX and AVX-512 because it means your arrays are always misaligned unless you used aligned allocations. (How to solve the 32-byte-alignment issue for AVX load/store operations?)
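To see what alignment a given buffer actually ended up with, one can inspect the lowest set bit of its address. A small sketch follows; pointer_alignment is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>

// Returns the largest power of two that divides the address of p
// (0 for a null pointer). For example, an address that is an odd
// multiple of 16 yields 16, so it is not usable for aligned 32-byte loads.
std::size_t pointer_alignment(const void* p) {
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    if (addr == 0)
        return 0;
    // addr & (two's complement of addr) isolates the lowest set bit.
    return static_cast<std::size_t>(addr & (~addr + 1));
}
```

Calling pointer_alignment(v.data()) on a large std::vector<double> shows whether the buffer happens to be 32-byte or 64-byte aligned on a particular platform; the result varies by allocator and is not guaranteed.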
Mainstream std::vector implementations are also inefficient when they have to grow: C++ doesn't provide a realloc equivalent that's compatible with new/delete, so it always has to allocate more space and copy to the start. Never even trying to allocate more space contiguous with the existing mapping (which would be safe even for non-trivially-copyable types), and not using implementation-specific tricks like Linux mremap to map the same physical pages to a different virtual address without having to copy all those mega/gigabytes. The fact that C++ allows code to redefine operator new means library implementations of std::vector can't just use a better allocator, either. All of this is a non-problem if you .reserve the size you're going to need, but it is pretty dumb.
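The effect of .reserve can be observed by counting how often the buffer moves while elements are appended. This is a sketch; count_reallocations is a hypothetical helper name.

```cpp
#include <cstddef>
#include <vector>

// Appends n doubles and counts how many times data() changed, i.e. how many
// times the vector allocated a new buffer and copied the elements over.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<double> v;
    if (reserve_first)
        v.reserve(n);
    std::size_t moves = 0;
    const double* last = v.data();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<double>(i));
        if (v.data() != last) {
            ++moves;
            last = v.data();
        }
    }
    return moves;
}
```

With reserve_first set to true the count stays at zero; without it, the count depends on the implementation's growth factor.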