Questions about Hinnant's stack allocator
I've been using Howard Hinnant's stack allocator and it works
like a charm, but some details of the implementation are a little
unclear to me.
Glad it's been working for you.
1. Why are global operators
new
anddelete
used? Theallocate()
anddeallocate()
member functions use::operator new
and
::operator delete
respectively. Similarly, the member function
construct()
uses the global placement new. Why not allow for any
user-defined global or class-specific overloads?
There's no particular reason. Feel free to modify this code in whatever way works best for you. This was meant to be more of an example, and it is by no means perfect. The only requirements are that the allocator and deallocator supply properly aligned memory, and that the construct member constructs an argument.
In C++11, the construct (and destroy) members are optional. I would encourage you to remove them from the allocator if you're operating in an environment that supplies allocator_traits
. To find out, just remove them and see if things still compile.
2. Why is alignment set to hard-coded 16 bytes instead of
std::alignment_of<T>
?
std::alignment_of<T>
would probably work fine. I was probably being paranoid that day.
3. Why do the constructors and
max_size
have athrow()
exception specification? Isn't this discouraged (see e.g. More Effective C++
Item 14.)? Is it really necessary to terminate and abort when an
exception occurs in the allocator? Does this change with the new C++11
noexcept
keyword?
These members just won't ever throw. For C++11 I should update them to noexcept
. In C++11 it becomes more important to decorate things with noexcept
, especially special members. In C++11 one can detect whether an expression is nothrow or not. Code can branch depending on that answer. Code that is known to be nothrow is more likely to cause generic code to branch to a more efficient path. std::move_if_noexcept
is the canonical example in C++11.
Don't use throw(type1, type2)
ever. It has been deprecated in C++11.
Do use throw()
when you really want to say: This will never throw, and if I'm wrong, terminate the program so I can debug it. throw()
is also deprecated in C++11, but has a drop-in replacement: noexcept
.
4. The
construct()
member function would be an ideal candidate for perfect forwarding (to the constructor that is being called). Is this
the way to write C++11 conformant allocators?
Yes. However allocator_traits
will do it for you. Let it. The std::lib has already debugged that code for you. C++11 containers will call allocator_traits<YourAllocator>::construct(your_allocator, pointer, args...)
. If your allocator implements these functions, allocator_traits will call your implementation, else it calls a debugged, efficient, default implementation.
5. What other changes are necessary to make the current code C++11 conformant?
To tell you the truth, this allocator isn't really C++03 or C++11 conformant. When you copy an allocator, the original and the copy are supposed to be equal to each other. In this design, that is never true. However this thing still just happens to work in many contexts.
If you want to make it strictly conforming, you need another level of indirection such that copies will point to the same buffer.
Aside from that, C++11 allocators are so much easier to build than C++98/03 allocators. Here's the minimum you must do:
template <class T>
class MyAllocator
{
public:
typedef T value_type;
MyAllocator() noexcept; // only required if used
MyAllocator(const MyAllocator&) noexcept; // copies must be equal
MyAllocator(MyAllocator&&) noexcept; // not needed if copy ctor is good enough
template <class U>
MyAllocator(const MyAllocator<U>& u) noexcept; // requires: *this == MyAllocator(u)
value_type* allocate(std::size_t);
void deallocate(value_type*, std::size_t) noexcept;
};
template <class T, class U>
bool operator==(const MyAllocator<T>&, const MyAllocator<U>&) noexcept;
template <class T, class U>
bool operator!=(const MyAllocator<T>&, const MyAllocator<U>&) noexcept;
You might optionally consider making MyAllocator
Swappable and put the following nested type in the allocator:
typedef std::true_type propagate_on_container_swap;
There's a few other knobs like that you can tweak on C++11 allocators. But all of the knobs have reasonable defaults.
Update
Above I note that my stack allocator is not conforming due to the fact that copies are not equal. I've decided to update this allocator to a conforming C++11 allocator. The new allocator is called short_allocator and is documented here.
The short_allocator differs from the stack allocator in that the "internal" buffer is no longer internal to the allocator, but is now a separate "arena" object that can be located on the local stack, or given thread or static storage duration. The arena
isn't thread safe though so watch out for that. You could make it thread safe if you wanted to, but that has diminishing returns (eventually you'll reinvent malloc).
This is conforming because copies of allocators all point to the same external arena
. Note that the unit of N
is now bytes, not number of T
.
One could convert this C++11 allocator to a C++98/03 allocator by adding the C++98/03 boiler-plate (the typedefs, the construct member, the destroy member, etc.). A tedious, but straightforward task.
The answers to this question for the new short_allocator remain unchanged.
Hinnant's stack allocator and exceptions
The gcc and clang implementations follow a specification called the Itanium ABI. It says, among other things:
Storage is needed for exceptions being thrown. This storage must persist while stack is being unwound, since it will be used by the handler, and must be thread-safe. Exception object storage will therefore normally be allocated in the heap, although implementations may provide an emergency buffer to support throwing
bad_alloc
exceptions under low memory conditions (see Section 3.3.1).Memory will be allocated by the
__cxa_allocate_exception
runtime library routine. This routine is passed the size of the exception object to be thrown (not including the size of the__cxa_exception
header), and returns a pointer to the temporary space for the exception object. It will allocate the exception memory on the heap if possible. If heap allocation fails, an implementation may use other backup mechanisms (see Section 3.4.1).If
__cxa_allocate_exception
cannot allocate an exception object under these constraints, it callsterminate()
.
The libc++abi implementation of __cxa_allocate_exception
is here. It will first go to the heap, and if that fails, try an emergency backup buffer that is statically allocated within the libc++abi.dylib. If both the heap, and the emergency-stash fail to allocate enough memory for the arbitrary-sized user-created exception, terminate()
is called.
Hinnant's stack allocator with boost rtrees: compilation failure
The heart of the issue is essentially a circular dependency.
Constructing the RTree
causes the rtree<...>
template instantiation which includes a typedef node_pointer = allocators_type::node_pointer
, which triggers the instantiation of allocators_type
, i.e. detail::rtree::allocators<...>
, which has a base class of detail::rtree::node_alloc<...>
, which in its definition rebinds the allocator to the node type. The node type is a variant of detail::rtree::variant_leaf<...>
and detail::rtree::variant_internal_node<...>
.
But stack_alloc
needs the sizeof(T)
, so both templates included in the variant
types get instantiated, and when instantiating variant_internal_node
, it needs Allocators::node_pointer
, so Allocators
must be instantiated, but isn't that what we're in the middle of instantiating!
I suggest trying short_alloc
and passing the allocator to the container. Because it separates the storage from the allocator type, it should not require completeness of the template type, breaking the circle.
Hinnant's short_alloc and alignment guarantees
This code was written before I had std::max_align_t
in my toolbox (which now lives in <cstddef>
). I would now write this as:
static const std::size_t alignment = alignof(std::max_align_t);
which on my system is exactly equivalent to the current code, but now more portable. This is the alignment which new
and malloc
are guaranteed to return. And once you have this "maximally aligned" buffer, you can put an array of any one type in it. But you can't use the same arena
for different types (at least not different types that have different alignment requirements). And for that reason, perhaps it would be best to template arena
on a second size_t
, which is equal to the the alignof(T)
. In that way you can prevent the same arena
from being accidentally used by types with differing alignment requirements:
arena<N, alignof(T)>& a_;
Assuming each allocation from arena
has the same alignment requirements, and assuming the buffer is maximally aligned, then every allocation from the buffer will be suitably aligned for T
.
E.g. on my system alignof(std::max_align_t) == 16
. A buffer with this alignment can hold arrays of:
- Types with
alignof == 1
. - Types with
alignof == 2
. - Types with
alignof == 4
. - Types with
alignof == 8
. - Types with
alignof == 16
.
As some environment may support types that have "super alignment" requirements, an added safety precaution would be to add (say within short_alloc
):
static_assert(alignof(T) <= alignof(std::max_align_t), "");
If you are super paranoid you could also check that alignof(T)
is a power of 2, though the C++ standard itself guarantees that this will always be true ([basic.align]/p4).
Update
I have taken a closer look at this problem and believe that rounding the requested allocation size up to the next alignment
(as the OP suggested) is the best solution. I have updated "short_alloc" to do that on my website.
template <std::size_t N>
char*
arena<N>::allocate(std::size_t n)
{
assert(pointer_in_buffer(ptr_) && "short_alloc has outlived arena");
n = align_up(n);
if (buf_ + N - ptr_ >= n)
{
char* r = ptr_;
ptr_ += n;
return r;
}
return static_cast<char*>(::operator new(n));
}
For special situations where you know that you don't need maximally aligned allocations (e.g. vector<unsigned char>
), one can simply tweak alignment
appropriately. And one could also have short_alloc::allocate
pass alignof(T)
to arena::allocate
and assert(requested_align <= alignment)
template <std::size_t N>
char*
arena<N>::allocate(std::size_t n, std::size_t requested_align)
{
assert(requested_align <= alignment);
assert(pointer_in_buffer(ptr_) && "short_alloc has outlived arena");
n = align_up(n);
if (buf_ + N - ptr_ >= n)
{
char* r = ptr_;
ptr_ += n;
return r;
}
return static_cast<char*>(::operator new(n));
}
This would give you confidence that if you adjusted alignment
downward, you didn't adjust it too far downward.
Update Again!
I've updated the description and code of this allocator quite a bit because of this excellent question (I had neglected this code for years).
The alignment check mentioned in the previous update is now done at compile-time (compile-time errors are always superior to run-time errors, even asserts).
Both the arena
and short_alloc
is now templated on alignment so that you can easily customize the alignment requirements you anticipate (and if you guess too low it is caught at compile-time). This template parameter is defaulted to alignof(std::max_align_t)
.
The arena::allocate
function now looks like:
template <std::size_t N, std::size_t alignment>
template <std::size_t ReqAlign>
char*
arena<N, alignment>::allocate(std::size_t n)
{
static_assert(ReqAlign <= alignment, "alignment is too small for this arena");
assert(pointer_in_buffer(ptr_) && "short_alloc has outlived arena");
auto const aligned_n = align_up(n);
if (buf_ + N - ptr_ >= aligned_n)
{
char* r = ptr_;
ptr_ += aligned_n;
return r;
}
return static_cast<char*>(::operator new(n));
}
Thanks to alias templates, this allocator is easier to use than ever. For example:
// Create a vector<T> template with a small buffer of 200 bytes.
// Note for vector it is possible to reduce the alignment requirements
// down to alignof(T) because vector doesn't allocate anything but T's.
// And if we're wrong about that guess, it is a comple-time error, not
// a run time error.
template <class T, std::size_t BufSize = 200>
using SmallVector = std::vector<T, short_alloc<T, BufSize, alignof(T)>>;
// Create the stack-based arena from which to allocate
SmallVector<int>::allocator_type::arena_type a;
// Create the vector which uses that arena.
SmallVector<int> v{a};
This isn't necessarily the final word in such allocators. But hopefully this is a solid foundation upon which you can build your custom allocators.
Fast move assignment with Howard Hinnant's short_alloc
Two different arena
objects may have different lifetime. Two different short_alloc
objects that depend on different arena
objects manage memory with different lifetime. As a result, two std::vector
objects with different short_alloc objects can't simply move pointers between them.
Your hack won't work, since it is the pointer that is allocated either from the arena
or by new[]
. Your hack assumes that the allocator becomes a heap allocator for big vectors, but this is not the case. This is not something that an allocator knows without inspecting the requested size or the freed pointer.
The correct solution is to replace the allocator object with the move operator. To do that, short_alloc
should define:
using propagate_on_container_move_assignment = std::true_type;
private:
arena_type * a_; // <--- instead of reference
public:
short_alloc(const short_alloc&) = default;
// !!! Don't delete the move assignment !!!
// short_alloc& operator=(const short_alloc&) = delete;
This will make the move operator work as expected. It will start using the other arena after the move.
In general, such allocation techniques are very dangerous and should be rarely used, due to their high risk of memory-related bugs. For example, if one is to return a vector from a function, there is a high risk for it to refer to an
arena
on a scope that is about to be exited.With my proposed change, the risk factor is slightly higher. The problem of the arena
going out-of-scope now also goes to pass-by-reference vectors. The problem of the arena
going out-of-scope also exists when the arena
is defined in an inner block.
This behavior of the other arena
going out of scope can surprise a programmer, and introduce bugs. That's why I am not a fan of this solution. However, sometimes one is willing to write dangerous code in time-critical sections (after profiling and analyzing).
As the question suggests, it is possible to mark a
short_alloc
allocator as a heap allocator when applicable. It can be marked this way right after the first allocation that uses new[]
. This will work well with std::vector
, since it holds only one chunk of memory between method calls. Despite working with std::vector
, it breaks with most other containers, because most of them use nodes, such as std::map
and std::unordered_set
. The problem is that some of the nodes come from the arena
and some from the heap. With the suggested operator==
that returns true
if new[]
is used, a move from std::map
will make some nodes from an unrelated arena move the the target std::map
. Very unexpected, and a bad idea. This will result in a std::map
object that contains nodes from its own arena
and from an unrelated arena
. The nodes of the unrelated arena
will never be freed by std::map
. These bad nodes will be freed only when their allocating arena
dies.
The technique proposed in the question is completely broken. It results with inconsistent allocations in surprising ways for almost anything but std::vector
. I will strongly advise against it.
Issue with stack allocator with embedded arena
When you move-construct the second container, the container move-constructs the allocator and takes over the pointers. When the second container dies, the it tries to deallocate the pointers and does it wrongly, causing UB in the pointer comparison in deallocate()
.
The "allocator" does not satisfy the allocator requirements, particularly:
X a1(move(a))
Post:a1
equals the prior value ofa
.
Two allocators are "equal" if one can deallocate memory allocated by the other, which is blatantly not the case here.
Issue with stack allocator with embedded arena
When you move-construct the second container, the container move-constructs the allocator and takes over the pointers. When the second container dies, the it tries to deallocate the pointers and does it wrongly, causing UB in the pointer comparison in deallocate()
.
The "allocator" does not satisfy the allocator requirements, particularly:
X a1(move(a))
Post:a1
equals the prior value ofa
.
Two allocators are "equal" if one can deallocate memory allocated by the other, which is blatantly not the case here.
Getting Howard Hinnant's short_alloc (C++11 version) to compile in Visual C++ 2015
The simplest hack that I can think of is replacing
short_alloc& operator=(const short_alloc&) = delete;
with
short_alloc& operator=(const short_alloc&)
{
assert(false && "this should never be called");
return *this;
};
It looks like a pretty dangerous hack, but it's actually not that bad in this particular case, and here's why:
The reason the original version doesn't compile in VC++ is that its standard library implementation of std::vector
's move assignment operator makes the classic mistake of testing std::allocator_traits<...>::propagate_on_container_move_assignment::value
using an if()
statement.
It does the proper check and doesn't assign the allocator if the trait value is false
(and moves the elements individually to the other side if the allocators are different, as requested by the standard), but the code on the if()
branch still has to compile, even though it will never be reached for this type of allocator.
So, that assignment operator will never be called by the container's implementation at runtime, and that's why the hack is safe in this particular case.
(The funniest thing is that, one line below that if()
, the actual moving is implemented properly with a helper function using tag dispatch...)
This is based on the standard library implementation that comes with Visual C++ 2013 Update 4.
Update: As reported by the OP in the comments, VC14 CTP5 has the same problem.
Update 2: As indicated in the comments for the bug report, the fix for this issue will be available in the final version of Visual C++ 2015.
Related Topics
How to Enforce Copy Elision, Why It Won't Work with Deleted Copy Constructor
Order of Constructor Call in Virtual Inheritance
Why Does Gcc Allow Char Array Initialization with String Literal Larger Than Array
Static Member Access in Constant Expressions
How to Check That an Element Is in a Std::Set
Why Do We Require Requires Requires
What Is a Scalar Object in C++
What Happens If 'Throw' Fails to Allocate Memory for Exception Object
How to Prevent Stack Allocation of an Object and Only Allow It to Be Instantiated with 'New'
How to Use Identical Names for Fields and Constructor Parameters
Visual Studio: Run C++ Project Post-Build Event Even If Project Is Up-To-Date
Linking with Clang++ on Os X Generates Lots of Symbol Not Found Errors