Implementing a std::vector like container without undefined behavior
This is topic that is under active discussion, we can see this in the proposal p0593: Implicit creation of objects for low-level object manipulation. This is a pretty solid discussion of the issues and why they are not fixable without changes. If you have different approaches or strong views on the approaches being considered you may want to reach out to the proposals authors.
It includes this discussion:
2.3. Dynamic construction of arrays
Consider this program that attempts to implement a type like std::vector (with many details omitted for brevity):
....
In practice, this code works across a range of existing
implementations, but according to the C++ object model, undefined
behavior occurs at points #a, #b, #c, #d, and #e, because they attempt
to perform pointer arithmetic on a region of allocated storage that
does not contain an array object.At locations #b, #c, and #d, the arithmetic is performed on a char*,
and at locations #a, #e, and #f, the arithmetic is performed on a T*.
Ideally, a solution to this problem would imbue both calculations with
defined behavior.
- Approach
The above snippets have a common theme: they attempt to use objects that they never created. Indeed, there is a family of types for which programmers assume they do not need to explicitly create objects. We propose to identify these types, and carefully carve out rules that remove the need to explicitly create such objects, by instead creating them implicitly.
The approach using the adc union has the problem that we expect to be able to access the contained data via a pointer T*
i.e. via std::vector::data. Accessing the union as a T*
would violate the strict aliasing rules and therefore be undefined behavior.
Dynamic arrays in C++ without Undefined Behavior
If std::vector
can't, then you can't either.
But I wouldn't worry about it. This is one of those cases where people have found a problem with the wording of the standard, that technically makes an extremely common use case undefined. But your vectors still work.
Now, the key: that's not because of magic innate to std::vector
, or to some particular std::vector
implementation: it's because the compiler doesn't perform absurd optimisations that make use of this undefined behaviour that somebody only just spotted while studying the text with a fine-toothed comb.
Perhaps it'll be tidied up in a future revision, but for practical purposes you do not need to worry about it, whether you use std::vector
or you use new[]
.
Why is undefined behavior allowed in the STL?
When undefined behaviour is allowed, it's usually for reasons of efficiency.
If the standard specified what has to happen when you access an array out of bounds, it would force the implementation to check whether the index is in bounds. Same goes for a vector, which is just a wrapper for a dynamic array.
In other cases the behaviour is allowed to be undefined in order to allow freedom in the implementation. But that, too, is really about efficiency (as some possible implementation strategies could be more efficient on some machines than on others, and C++ leaves it up to the implementer to pick the most efficient strategy, if they so desire.)
Appending std::vector to itself, undefined behavior?
According to the C++03 ISO spec (§23.1.1, Table 67) (and as @AndyProwl has mentioned, in §23.2.3, table 11 of the C++11 ISO spec), as part of sequence requirements, the operation a.insert(p, i, j)
in a sequence container has this precondition:
i
,j
are not iterators intoa
.
In other words, sequence containers are allowed to safely assume that if you do a range insertion operation, that range will not be defined from iterators over that original container.
As a result, if you try to insert a container's elements into itself, you are calling a standard library function and breaking a precondition. This results in undefined behavior, meaning that it might work on some platforms if the library implementers are nice people, but it could terribly and catastrophically fail with no justification.
Hope this helps!
Undefined behavior for std::vector reserve()
It is undefined behaviour to call v.front()
whenever v.empty()
is true. It is undefined behaviour to call v[n]
unless n < v.size()
. Moreover, there are no objects at the reserved memory, so you cannot treat the memory as if it was an object. A vector only guranatees
that
[data(), data() + size())
is a valid range
and there are no guarantees that there is any larger valid range. (Note that data() == &front()
, so this is applies to your code.)
Using a std::vectorstd::arrayT,N as a flat contiguous array of T
No, but yes.
No, the C++ standard does not let you treat structs with arrays in them, or even structs with uniform elements, packed together as a single contiguous larger array of the base type.
Yes, in that enough in world production code requires this to work that no compiler is going to break it from working any time soon.
Things to watch out for include packing. Add static asserts that the size of the structs and arrays is what you expect. Also, avoid being fancy with object lifetime or going "backwards" from an index besides the first one; the reachability rules of C++ are slightly more likrly to bite you if you do strange things.
Another concern is that these constructs are under somewhat active investigation; the fiasco that std vector cannot be implemented in standard C++, for example, or std::launder
and bit_cast
. When a real standard way to do what you want develops, switching to it might be a good idea, because the old technique will become less likely to be supported.
Related Topics
Find Two Elements in an Array That Sum to K
How to Compare Two Character Strings Statically at Compile Time
Sorting a Vector of Objects by a Property of the Object
Why "Not All Control Paths Return a Value" Is Warning and Not an Error
Effective Use of C++ Iomanip Library
Skip Some Arguments in a C++ Function
For Loop Prints an Extra Comma
How to Set Up Unit Testing for Visual Studio C++
Class Members and Explicit Stack/Heap Allocation
Calculate System Time Using Rdtsc
Error: Cannot Convert 'Const Wchar_T [13]' to 'Lpcstr {Aka Const Char*}' in Assignment
What's the Real Reason to Not Use the Eof Bit as Our Stream Extraction Condition
Piping (Or Command Chaining) with Qprocess
How to Set the Cout Locale to Insert Commas as Thousands Separators