Questions About Hinnant's Stack Allocator

Questions about Hinnant's stack allocator

I've been using Howard Hinnant's stack allocator and it works
like a charm, but some details of the implementation are a little
unclear to me.

Glad it's been working for you.

1. Why are global operators new and delete used? The allocate() and deallocate() member functions use ::operator new and
::operator delete respectively. Similarly, the member function
construct() uses the global placement new. Why not allow for any
user-defined global or class-specific overloads?

There's no particular reason. Feel free to modify this code in whatever way works best for you. This was meant to be more of an example, and it is by no means perfect. The only requirements are that the allocator and deallocator supply properly aligned memory, and that the construct member constructs an argument.

In C++11, the construct (and destroy) members are optional. I would encourage you to remove them from the allocator if you're operating in an environment that supplies allocator_traits. To find out, just remove them and see if things still compile.

2. Why is alignment set to hard-coded 16 bytes instead of std::alignment_of<T>?

std::alignment_of<T> would probably work fine. I was probably being paranoid that day.

3. Why do the constructors and max_size have a throw() exception specification? Isn't this discouraged (see e.g. More Effective C++
Item 14.)? Is it really necessary to terminate and abort when an
exception occurs in the allocator? Does this change with the new C++11
noexcept keyword?

These members just won't ever throw. For C++11 I should update them to noexcept. In C++11 it becomes more important to decorate things with noexcept, especially special members. In C++11 one can detect whether an expression is nothrow or not. Code can branch depending on that answer. Code that is known to be nothrow is more likely to cause generic code to branch to a more efficient path. std::move_if_noexcept is the canonical example in C++11.

Don't use throw(type1, type2) ever. It has been deprecated in C++11.

Do use throw() when you really want to say: This will never throw, and if I'm wrong, terminate the program so I can debug it. throw() is also deprecated in C++11, but has a drop-in replacement: noexcept.

4. The construct() member function would be an ideal candidate for perfect forwarding (to the constructor that is being called). Is this
the way to write C++11 conformant allocators?

Yes. However allocator_traits will do it for you. Let it. The std::lib has already debugged that code for you. C++11 containers will call allocator_traits<YourAllocator>::construct(your_allocator, pointer, args...). If your allocator implements these functions, allocator_traits will call your implementation, else it calls a debugged, efficient, default implementation.

5. What other changes are necessary to make the current code C++11 conformant?

To tell you the truth, this allocator isn't really C++03 or C++11 conformant. When you copy an allocator, the original and the copy are supposed to be equal to each other. In this design, that is never true. However this thing still just happens to work in many contexts.

If you want to make it strictly conforming, you need another level of indirection such that copies will point to the same buffer.

Aside from that, C++11 allocators are so much easier to build than C++98/03 allocators. Here's the minimum you must do:

template <class T>
class MyAllocator
{
public:
    typedef T value_type;

    MyAllocator() noexcept;  // only required if used
    MyAllocator(const MyAllocator&) noexcept;  // copies must be equal
    MyAllocator(MyAllocator&&) noexcept;  // not needed if copy ctor is good enough
    template <class U>
        MyAllocator(const MyAllocator<U>& u) noexcept;  // requires: *this == MyAllocator(u)

    value_type* allocate(std::size_t);
    void deallocate(value_type*, std::size_t) noexcept;
};

template <class T, class U>
bool operator==(const MyAllocator<T>&, const MyAllocator<U>&) noexcept;

template <class T, class U>
bool operator!=(const MyAllocator<T>&, const MyAllocator<U>&) noexcept;

You might optionally consider making MyAllocator Swappable and put the following nested type in the allocator:

typedef std::true_type propagate_on_container_swap;

There's a few other knobs like that you can tweak on C++11 allocators. But all of the knobs have reasonable defaults.

Update

Above I note that my stack allocator is not conforming due to the fact that copies are not equal. I've decided to update this allocator to a conforming C++11 allocator. The new allocator is called short_allocator and is documented here.

The short_allocator differs from the stack allocator in that the "internal" buffer is no longer internal to the allocator, but is now a separate "arena" object that can be located on the local stack, or given thread or static storage duration. The arena isn't thread safe though so watch out for that. You could make it thread safe if you wanted to, but that has diminishing returns (eventually you'll reinvent malloc).

This is conforming because copies of allocators all point to the same external arena. Note that the unit of N is now bytes, not number of T.

One could convert this C++11 allocator to a C++98/03 allocator by adding the C++98/03 boiler-plate (the typedefs, the construct member, the destroy member, etc.). A tedious, but straightforward task.

The answers to this question for the new short_allocator remain unchanged.

Hinnant's stack allocator and exceptions

The gcc and clang implementations follow a specification called the Itanium ABI. It says, among other things:

Storage is needed for exceptions being thrown. This storage must persist while stack is being unwound, since it will be used by the handler, and must be thread-safe. Exception object storage will therefore normally be allocated in the heap, although implementations may provide an emergency buffer to support throwing bad_alloc exceptions under low memory conditions (see Section 3.3.1).
Memory will be allocated by the __cxa_allocate_exception runtime library routine. This routine is passed the size of the exception object to be thrown (not including the size of the __cxa_exception header), and returns a pointer to the temporary space for the exception object. It will allocate the exception memory on the heap if possible. If heap allocation fails, an implementation may use other backup mechanisms (see Section 3.4.1).
If __cxa_allocate_exception cannot allocate an exception object under these constraints, it calls terminate().

The libc++abi implementation of __cxa_allocate_exception is here. It will first go to the heap, and if that fails, try an emergency backup buffer that is statically allocated within the libc++abi.dylib. If both the heap, and the emergency-stash fail to allocate enough memory for the arbitrary-sized user-created exception, terminate() is called.

Hinnant's stack allocator with boost rtrees: compilation failure

The heart of the issue is essentially a circular dependency.

Constructing the RTree causes the rtree<...> template instantiation which includes a typedef node_pointer = allocators_type::node_pointer, which triggers the instantiation of allocators_type, i.e. detail::rtree::allocators<...>, which has a base class of detail::rtree::node_alloc<...>, which in its definition rebinds the allocator to the node type. The node type is a variant of detail::rtree::variant_leaf<...> and detail::rtree::variant_internal_node<...>.

But stack_alloc needs the sizeof(T), so both templates included in the variant types get instantiated, and when instantiating variant_internal_node, it needs Allocators::node_pointer, so Allocators must be instantiated, but isn't that what we're in the middle of instantiating!

I suggest trying short_alloc and passing the allocator to the container. Because it separates the storage from the allocator type, it should not require completeness of the template type, breaking the circle.

Hinnant's short_alloc and alignment guarantees

This code was written before I had std::max_align_t in my toolbox (which now lives in <cstddef>). I would now write this as:

static const std::size_t alignment = alignof(std::max_align_t);

which on my system is exactly equivalent to the current code, but now more portable. This is the alignment which new and malloc are guaranteed to return. And once you have this "maximally aligned" buffer, you can put an array of any one type in it. But you can't use the same arena for different types (at least not different types that have different alignment requirements). And for that reason, perhaps it would be best to template arena on a second size_t, which is equal to the the alignof(T). In that way you can prevent the same arena from being accidentally used by types with differing alignment requirements:

arena<N, alignof(T)>& a_;

Assuming each allocation from arena has the same alignment requirements, and assuming the buffer is maximally aligned, then every allocation from the buffer will be suitably aligned for T.

E.g. on my system alignof(std::max_align_t) == 16. A buffer with this alignment can hold arrays of:

Types with alignof == 1.
Types with alignof == 2.
Types with alignof == 4.
Types with alignof == 8.
Types with alignof == 16.

As some environment may support types that have "super alignment" requirements, an added safety precaution would be to add (say within short_alloc):

static_assert(alignof(T) <= alignof(std::max_align_t), "");

If you are super paranoid you could also check that alignof(T) is a power of 2, though the C++ standard itself guarantees that this will always be true ([basic.align]/p4).

Update

I have taken a closer look at this problem and believe that rounding the requested allocation size up to the next alignment (as the OP suggested) is the best solution. I have updated "short_alloc" to do that on my website.

template <std::size_t N>
char*
arena<N>::allocate(std::size_t n)
{
    assert(pointer_in_buffer(ptr_) && "short_alloc has outlived arena");
    n = align_up(n);
    if (buf_ + N - ptr_ >= n)
    {
        char* r = ptr_;
        ptr_ += n;
        return r;
    }
    return static_cast<char*>(::operator new(n));
}

For special situations where you know that you don't need maximally aligned allocations (e.g. vector<unsigned char>), one can simply tweak alignment appropriately. And one could also have short_alloc::allocate pass alignof(T) to arena::allocate and assert(requested_align <= alignment)

template <std::size_t N>
char*
arena<N>::allocate(std::size_t n, std::size_t requested_align)
{
    assert(requested_align <= alignment);
    assert(pointer_in_buffer(ptr_) && "short_alloc has outlived arena");
    n = align_up(n);
    if (buf_ + N - ptr_ >= n)
    {
        char* r = ptr_;
        ptr_ += n;
        return r;
    }
    return static_cast<char*>(::operator new(n));
}

This would give you confidence that if you adjusted alignment downward, you didn't adjust it too far downward.

Update Again!

I've updated the description and code of this allocator quite a bit because of this excellent question (I had neglected this code for years).

The alignment check mentioned in the previous update is now done at compile-time (compile-time errors are always superior to run-time errors, even asserts).

Both the arena and short_alloc is now templated on alignment so that you can easily customize the alignment requirements you anticipate (and if you guess too low it is caught at compile-time). This template parameter is defaulted to alignof(std::max_align_t).

The arena::allocate function now looks like:

template <std::size_t N, std::size_t alignment>
template <std::size_t ReqAlign>
char*
arena<N, alignment>::allocate(std::size_t n)
{
    static_assert(ReqAlign <= alignment, "alignment is too small for this arena");
    assert(pointer_in_buffer(ptr_) && "short_alloc has outlived arena");
    auto const aligned_n = align_up(n);
    if (buf_ + N - ptr_ >= aligned_n)
    {
        char* r = ptr_;
        ptr_ += aligned_n;
        return r;
    }
    return static_cast<char*>(::operator new(n));
}

Thanks to alias templates, this allocator is easier to use than ever. For example:

// Create a vector<T> template with a small buffer of 200 bytes.
//   Note for vector it is possible to reduce the alignment requirements
//   down to alignof(T) because vector doesn't allocate anything but T's.
//   And if we're wrong about that guess, it is a comple-time error, not
//   a run time error.
template <class T, std::size_t BufSize = 200>
using SmallVector = std::vector<T, short_alloc<T, BufSize, alignof(T)>>;

// Create the stack-based arena from which to allocate
SmallVector<int>::allocator_type::arena_type a;
// Create the vector which uses that arena.
SmallVector<int> v{a};

This isn't necessarily the final word in such allocators. But hopefully this is a solid foundation upon which you can build your custom allocators.

Fast move assignment with Howard Hinnant's short_alloc

Two different arena objects may have different lifetime. Two different short_alloc objects that depend on different arena objects manage memory with different lifetime. As a result, two std::vector objects with different short_alloc objects can't simply move pointers between them.

Your hack won't work, since it is the pointer that is allocated either from the arena or by new[]. Your hack assumes that the allocator becomes a heap allocator for big vectors, but this is not the case. This is not something that an allocator knows without inspecting the requested size or the freed pointer.

The correct solution is to replace the allocator object with the move operator. To do that, short_alloc should define:

   using propagate_on_container_move_assignment = std::true_type;
private:
   arena_type * a_;  // <--- instead of reference
public:
   short_alloc(const short_alloc&) = default;
   // !!! Don't delete the move assignment !!!
   // short_alloc& operator=(const short_alloc&) = delete;

This will make the move operator work as expected. It will start using the other arena after the move.

In general, such allocation techniques are very dangerous and should be rarely used, due to their high risk of memory-related bugs. For example, if one is to return a vector from a function, there is a high risk for it to refer to an arena on a scope that is about to be exited.

With my proposed change, the risk factor is slightly higher. The problem of the arena going out-of-scope now also goes to pass-by-reference vectors. The problem of the arena going out-of-scope also exists when the arena is defined in an inner block.

This behavior of the other arena going out of scope can surprise a programmer, and introduce bugs. That's why I am not a fan of this solution. However, sometimes one is willing to write dangerous code in time-critical sections (after profiling and analyzing).

As the question suggests, it is possible to mark a short_alloc allocator as a heap allocator when applicable. It can be marked this way right after the first allocation that uses new[]. This will work well with std::vector, since it holds only one chunk of memory between method calls. Despite working with std::vector, it breaks with most other containers, because most of them use nodes, such as std::map and std::unordered_set.

The problem is that some of the nodes come from the arena and some from the heap. With the suggested operator== that returns true if new[] is used, a move from std::map will make some nodes from an unrelated arena move the the target std::map. Very unexpected, and a bad idea. This will result in a std::map object that contains nodes from its own arena and from an unrelated arena. The nodes of the unrelated arena will never be freed by std::map. These bad nodes will be freed only when their allocating arena dies.

The technique proposed in the question is completely broken. It results with inconsistent allocations in surprising ways for almost anything but std::vector. I will strongly advise against it.

Issue with stack allocator with embedded arena

When you move-construct the second container, the container move-constructs the allocator and takes over the pointers. When the second container dies, the it tries to deallocate the pointers and does it wrongly, causing UB in the pointer comparison in deallocate().

The "allocator" does not satisfy the allocator requirements, particularly:

X a1(move(a)) Post: a1 equals the prior value of a.

Two allocators are "equal" if one can deallocate memory allocated by the other, which is blatantly not the case here.

Issue with stack allocator with embedded arena

The "allocator" does not satisfy the allocator requirements, particularly:

X a1(move(a)) Post: a1 equals the prior value of a.

Two allocators are "equal" if one can deallocate memory allocated by the other, which is blatantly not the case here.

Getting Howard Hinnant's short_alloc (C++11 version) to compile in Visual C++ 2015

The simplest hack that I can think of is replacing

short_alloc& operator=(const short_alloc&) = delete;

with

short_alloc& operator=(const short_alloc&) 
{ 
    assert(false && "this should never be called"); 
    return *this; 
};

It looks like a pretty dangerous hack, but it's actually not that bad in this particular case, and here's why:

The reason the original version doesn't compile in VC++ is that its standard library implementation of std::vector's move assignment operator makes the classic mistake of testing std::allocator_traits<...>::propagate_on_container_move_assignment::value using an if() statement.

It does the proper check and doesn't assign the allocator if the trait value is false (and moves the elements individually to the other side if the allocators are different, as requested by the standard), but the code on the if() branch still has to compile, even though it will never be reached for this type of allocator.

So, that assignment operator will never be called by the container's implementation at runtime, and that's why the hack is safe in this particular case.

(The funniest thing is that, one line below that if(), the actual moving is implemented properly with a helper function using tag dispatch...)

This is based on the standard library implementation that comes with Visual C++ 2013 Update 4.

Update: As reported by the OP in the comments, VC14 CTP5 has the same problem.

Update 2: As indicated in the comments for the bug report, the fix for this issue will be available in the final version of Visual C++ 2015.

Questions About Hinnant's Stack Allocator