Implementing a Std::Vector Like Container Without Undefined Behavior

Implementing a std::vector like container without undefined behavior

This is topic that is under active discussion, we can see this in the proposal p0593: Implicit creation of objects for low-level object manipulation. This is a pretty solid discussion of the issues and why they are not fixable without changes. If you have different approaches or strong views on the approaches being considered you may want to reach out to the proposals authors.

It includes this discussion:

2.3. Dynamic construction of arrays

Consider this program that attempts to implement a type like std::vector (with many details omitted for brevity):

....

In practice, this code works across a range of existing
implementations, but according to the C++ object model, undefined
behavior occurs at points #a, #b, #c, #d, and #e, because they attempt
to perform pointer arithmetic on a region of allocated storage that
does not contain an array object.

At locations #b, #c, and #d, the arithmetic is performed on a char*,
and at locations #a, #e, and #f, the arithmetic is performed on a T*.
Ideally, a solution to this problem would imbue both calculations with
defined behavior.


  1. Approach

The above snippets have a common theme: they attempt to use objects that they never created. Indeed, there is a family of types for which programmers assume they do not need to explicitly create objects. We propose to identify these types, and carefully carve out rules that remove the need to explicitly create such objects, by instead creating them implicitly.

The approach using the adc union has the problem that we expect to be able to access the contained data via a pointer T* i.e. via std::vector::data. Accessing the union as a T* would violate the strict aliasing rules and therefore be undefined behavior.

Dynamic arrays in C++ without Undefined Behavior

If std::vector can't, then you can't either.

But I wouldn't worry about it. This is one of those cases where people have found a problem with the wording of the standard, that technically makes an extremely common use case undefined. But your vectors still work.

Now, the key: that's not because of magic innate to std::vector, or to some particular std::vector implementation: it's because the compiler doesn't perform absurd optimisations that make use of this undefined behaviour that somebody only just spotted while studying the text with a fine-toothed comb.

Perhaps it'll be tidied up in a future revision, but for practical purposes you do not need to worry about it, whether you use std::vector or you use new[].

Why is undefined behavior allowed in the STL?

When undefined behaviour is allowed, it's usually for reasons of efficiency.

If the standard specified what has to happen when you access an array out of bounds, it would force the implementation to check whether the index is in bounds. Same goes for a vector, which is just a wrapper for a dynamic array.

In other cases the behaviour is allowed to be undefined in order to allow freedom in the implementation. But that, too, is really about efficiency (as some possible implementation strategies could be more efficient on some machines than on others, and C++ leaves it up to the implementer to pick the most efficient strategy, if they so desire.)

Appending std::vector to itself, undefined behavior?

According to the C++03 ISO spec (§23.1.1, Table 67) (and as @AndyProwl has mentioned, in §23.2.3, table 11 of the C++11 ISO spec), as part of sequence requirements, the operation a.insert(p, i, j) in a sequence container has this precondition:

i, j are not iterators into a.

In other words, sequence containers are allowed to safely assume that if you do a range insertion operation, that range will not be defined from iterators over that original container.

As a result, if you try to insert a container's elements into itself, you are calling a standard library function and breaking a precondition. This results in undefined behavior, meaning that it might work on some platforms if the library implementers are nice people, but it could terribly and catastrophically fail with no justification.

Hope this helps!

Undefined behavior for std::vector reserve()

It is undefined behaviour to call v.front() whenever v.empty() is true. It is undefined behaviour to call v[n] unless n < v.size(). Moreover, there are no objects at the reserved memory, so you cannot treat the memory as if it was an object. A vector only guranatees

that [data(), data() + size()) is a valid range

and there are no guarantees that there is any larger valid range. (Note that data() == &front(), so this is applies to your code.)

Using a std::vectorstd::arrayT,N as a flat contiguous array of T

No, but yes.

No, the C++ standard does not let you treat structs with arrays in them, or even structs with uniform elements, packed together as a single contiguous larger array of the base type.

Yes, in that enough in world production code requires this to work that no compiler is going to break it from working any time soon.

Things to watch out for include packing. Add static asserts that the size of the structs and arrays is what you expect. Also, avoid being fancy with object lifetime or going "backwards" from an index besides the first one; the reachability rules of C++ are slightly more likrly to bite you if you do strange things.

Another concern is that these constructs are under somewhat active investigation; the fiasco that std vector cannot be implemented in standard C++, for example, or std::launder and bit_cast. When a real standard way to do what you want develops, switching to it might be a good idea, because the old technique will become less likely to be supported.



Related Topics



Leave a reply



Submit