May I Take the Address of the One-Past-The-End Element of an Array

Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?

Your example is legal, but only because you're not actually using an out of bounds pointer.

Let's deal with out of bounds pointers first (because that's how I originally interpreted your question, before I noticed that the example uses a one-past-the-end pointer instead):

In general, you're not even allowed to create an out-of-bounds pointer. A pointer must point to an element within the array, or one past the end. Nowhere else.

The pointer is not even allowed to exist, which means you're obviously not allowed to dereference it either.

Here's what the standard has to say on the subject:

5.7:5:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

(emphasis mine)

Of course, this is for operator+. So just to be sure, here's what the standard says about array subscripting:

5.2.1:1:

The expression E1[E2] is identical (by definition) to *((E1)+(E2))

Of course, there's an obvious caveat: your example doesn't actually show an out-of-bounds pointer. It uses a "one past the end" pointer, which is different. The pointer is allowed to exist (as the above says), but the standard, as far as I can see, says nothing about dereferencing it. The closest I can find is 3.9.2:3:

[Note: for instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address. —end note ]

Which seems to me to imply that yes, you can legally dereference it, but the result of reading or writing to the location is unspecified.

Thanks to ilproxyil for correcting the last bit here, answering the last part of your question:

  • array + 5 doesn't actually dereference anything; it simply creates a pointer to one past the end of array.
  • &array[4] + 1 dereferences array+4 (which is perfectly safe), takes the address of that lvalue, and adds one to that address, which results in a one-past-the-end pointer (but that pointer never gets dereferenced).
  • &array[5] dereferences array+5 (which as far as I can see is legal, and results in "an unrelated object of the array’s element type", as the above said), and then takes the address of that element, which also seems legal enough.

So they don't do quite the same thing, although in this case, the end result is the same.
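The difference between the spellings can be sketched as follows. This is only an illustration, assuming a five-element int array named array as in the question; the helper names (end_by_arithmetic, end_by_last_element, check_same_end, length) are made up for this example. The &array[5] form is left in a comment because its status is exactly the point under discussion.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical five-element array, standing in for the one in the question.
int array[5] = {1, 2, 3, 4, 5};

// array + 5: pure pointer arithmetic, nothing is dereferenced.
int* end_by_arithmetic() { return array + 5; }

// &array[4] + 1: dereferences array + 4 (a real element), takes its
// address, then adds one. The resulting pointer is never dereferenced.
int* end_by_last_element() { return &array[4] + 1; }

// &array[5] would mean &*(array + 5): it formally dereferences the
// one-past-the-end pointer before taking the address, which is the
// contested step, so it is deliberately not compiled here.

// Both uncontested spellings yield the same one-past-the-end pointer.
void check_same_end() {
    assert(end_by_arithmetic() == end_by_last_element());
}

// Number of elements between the first element and the end pointer.
std::ptrdiff_t length() { return end_by_arithmetic() - array; }
```

Neither helper ever reads through the returned pointer; it is only compared or subtracted.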

May I take the address of the one-past-the-end element of an array?

Yes, you can take the address one beyond the end of an array, but you can't dereference it. For your array of 10 items, array+10 would work. It's been argued a few times (by the committee, among others) whether &array[10] really causes undefined behavior or not (and if it does, whether it really should). The bottom line with it is that at least according to the current standards (both C and C++) it officially causes undefined behavior, but if there's a single compiler for which it actually doesn't work, nobody in any of the arguments has been able to find or cite it.

Edit: For once my memory was half correct -- this was (part of) an official Defect Report to the committee, and at least some committee members (e.g., Tom Plum) thought the wording had been changed so it would not cause undefined behavior. OTOH, the DR dates from 2000, and the status is still "Drafting", so it's open to question whether it's really fixed, or ever likely to be (I haven't looked through N3090/3092 to figure out).

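In C99, however, it's clearly not undefined behavior: 6.5.3.2/3 says that when the operand of & is the result of a [] operator, neither the & nor the unary * implied by the [] is evaluated, and the result is as if the & were removed and the [] were changed to a +.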
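Whatever the status of &array[10], the array + 10 spelling is uncontroversial and is all a standard algorithm needs. A minimal sketch, assuming a ten-element int array like the one in the answer (the name sum_all is hypothetical):

```cpp
#include <cassert>
#include <iterator>
#include <numeric>

// Hypothetical ten-element array matching the size used in the answer.
int array[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

// Sum the elements using the half-open range [array, array + 10).
// The end pointer is only compared against, never dereferenced.
int sum_all() {
    int* first = array;
    int* last = array + 10;           // one past the end, formed by arithmetic
    assert(last == std::end(array));  // std::end yields the same pointer
    return std::accumulate(first, last, 0);
}
```

Using std::end(array) instead of a hand-written array + 10 also avoids repeating the array size.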

How do I take the address of one past the end of an array if the last address is 0xFFFFFFFF?

If this situation is a problem for a particular architecture (it may or may not be), then the compiler and runtime can be expected to arrange that allocated arrays never end at 0xFFFFFFFF. If they were to fail to do this, and something breaks when an array does end there, then they would not conform to the C++ standard.

Why point to memory location one past the last element in an array or vector?

To be able to tell an "empty" container/sequence from one with elements in it.

If a container is empty, a pointer to its first element would be one past the end, the same as the "end" location.
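As a small sketch of that point (is_empty_range and check_empty_and_nonempty are names invented for this example), an empty range is simply one whose begin and end compare equal:

```cpp
#include <cassert>
#include <vector>

// True when the half-open range [first, last) contains no elements.
template <typename Iterator>
bool is_empty_range(Iterator first, Iterator last) {
    return first == last;
}

void check_empty_and_nonempty() {
    std::vector<int> none;
    std::vector<int> some = {1, 2, 3};
    assert(is_empty_range(none.begin(), none.end()));   // begin == end
    assert(!is_empty_range(some.begin(), some.end()));  // begin != end
}
```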

A NULL pointer could also have been used, but since iterators are not necessarily pointers, NULL would not apply. The default value of the iterator could also be used, but in generic programming, how would that "default" be determined uniformly across all types (bear in mind pre-C++98 implementations here)?

Not all sequences/containers are contiguous in memory, so a comparison such as operator< would not be suitable. Equality and inequality (operators == and !=) are what is required, so a single position is needed to mark where the sequence ends; the "one past the end" position solves that.
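To illustrate, here is a sketch of a traversal that relies only on ++ and !=, so it works for a node-based std::list just as well as for an array. The names count_elements and count_list_nodes are assumptions made for this example.

```cpp
#include <cassert>
#include <list>

// Count elements by walking [first, last) using only ++ and !=.
// No operator< is needed, so non-contiguous containers work too.
template <typename Iterator>
int count_elements(Iterator first, Iterator last) {
    int n = 0;
    for (; first != last; ++first) {
        ++n;
    }
    return n;
}

int count_list_nodes() {
    std::list<int> nodes = {10, 20, 30};
    // The range [end, end) is empty, so nothing is counted.
    assert(count_elements(nodes.end(), nodes.end()) == 0);
    return count_elements(nodes.begin(), nodes.end());
}
```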

For uniformity, the "end" is chosen as one past the end; it solves a lot of the issues and brought power to the STL. The half closed interval has become the norm in C++.

On a side note, the ranges library and its techniques shine another light on the matter, but they were not around when the foundation for the STL was laid.

Another side note: not all sequences correspond to a whole container. Some sequences that need to be iterated over are only a portion of the original container or sequence, and the half-closed interval offers a uniform technique to access such a sequence, independently of where it sits in the original container/sequence.
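A brief sketch of the sub-sequence case, with sum_range and sum_middle_of_vector as made-up names: the same function handles a whole container or just a slice of it, because both are described by a pair of iterators.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Sum the elements of any half-open range [first, last).
template <typename Iterator>
int sum_range(Iterator first, Iterator last) {
    return std::accumulate(first, last, 0);
}

int sum_middle_of_vector() {
    std::vector<int> data = {1, 2, 3, 4, 5};
    // The sub-sequence {2, 3, 4}: from begin + 1 up to, but not
    // including, begin + 4.
    auto first = data.begin() + 1;
    auto last = data.begin() + 4;
    assert(last - first == 3);
    return sum_range(first, last);
}
```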


When is a pointer to one past the last element of a memory area valid for pointer arithmetic?


An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. The element type shall be complete whenever the array type is specified. Array types are characterized by their element type and by the number of elements in the array. An array type is said to be derived from its element type, and if its element type is T, the array type is sometimes called "array of T". The construction of an array type from an element type is called "array type derivation".
(6.2.5, 23)

An object is defined as:

region of data storage in the execution environment, the contents of which can represent values
Note 1 to entry: When referenced, an object can be interpreted as having a particular type (3.15)

void * specifically does not support arithmetic (UB), so that would rule out using the pointers returned by malloc and mmap directly. If you assign or cast the pointer to a pointer to some element type, then it's fair game.

@ChrisDodd pointed out that:

For the purposes of these operators [ed: *, +, -], a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type. (6.5.6, 8)
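The quoted rule is from the C standard; C++ treats a lone object the same way for pointer arithmetic, so the idea can be sketched in C++ as follows (span_of_single_object is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>

// Distance from a lone object to its one-past-the-end pointer.
std::ptrdiff_t span_of_single_object() {
    int value = 42;         // not an element of any array
    int* first = &value;
    int* last = first + 1;  // allowed: acts like one past a 1-element array
    assert(last - 1 == first);
    return last - first;    // both pointers belong to the same "array"
}
```

Forming first + 2 would already be undefined, exactly as it would be for a real one-element array.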

Pointer to an array, why is this the address of the last element?

Because arrays in C++ are zero-based, i.e. start from 0 and end at n-1. var[MAX] is the element past the end of the array, so is out-of-bounds and accessing it is undefined behaviour.

var   { 10, 100, 200 }
index   ^0  ^1   ^2   ^3?
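A minimal sketch of the same picture in code, assuming MAX is 3 and var holds the values shown above (the helper names are invented for illustration):

```cpp
#include <cassert>

const int MAX = 3;
int var[MAX] = {10, 100, 200};

// Address of the last real element: index MAX - 1, not MAX.
int* last_element() { return &var[MAX - 1]; }

// One past the end: fine to form and compare, not to read or write.
int* past_the_end() { return var + MAX; }

// Read the last element through a pointer that is known to be in bounds.
int last_value() {
    int* p = last_element();
    assert(p < past_the_end());
    return *p;
}
```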

pointer comparisons with one past the last element of an array object

Yes, a pointer is permitted to point to the location just past the end of the array. However, you aren't permitted to dereference such a pointer.

C99 6.5.6/8 Additive operators (emphasis added)

if the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary *
operator that is evaluated.

And, specifically for comparison operations on pointers:

C99 6.5.8/5 Relational operators

If the expression P points to an element of an array object and the
expression Q points to the last element of the same array object, the
pointer expression Q+1 compares greater than P. In all other cases,
the behavior is undefined.
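The quotes are from C99, but the usual loop idiom reads the same in C++. Here is a sketch (count_positive is a hypothetical helper) in which the one-past-the-end pointer appears only on the right-hand side of a comparison and is never dereferenced:

```cpp
#include <cassert>
#include <cstddef>

// Count how many elements of a[0..n) are greater than zero.
int count_positive(const int* a, std::size_t n) {
    assert(a != nullptr || n == 0);  // precondition assumed by this sketch
    const int* end = a + n;          // one past the last element
    int count = 0;
    for (const int* p = a; p < end; ++p) {  // comparing against end is fine
        if (*p > 0) {                       // p is always a real element here
            ++count;
        }
    }
    return count;
}
```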


