Is the "One-Past-The-End" Pointer of a Non-Array Type a Valid Concept in C++

Is the one-past-the-end pointer of a non-array type a valid concept in C++?

Yes, it is legal. 5.7(4), one paragraph before your quote, says: "For the purposes of these operators, a pointer to a nonarray object behaves the same as a pointer to the
first element of an array of length one with the type of the object as its element type."
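
For instance (this snippet is mine, not from the original question), forming and comparing such a pointer is fine; only dereferencing it is not:

int obj = 42;
int *p = &obj;           // behaves like a pointer to the first element of an int[1]
int *end = p + 1;        // one-past-the-end pointer: legal to form and to compare
bool done = (p == end);  // false; the comparison itself is well-defined
// *end;                 // dereferencing it would be undefined behavior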

Is one-past-end pointer OK for non-array object types?

Yes, it is okay. It is one of the four categories of values any pointer type may hold.

[basic.compound] (emphasis mine)

3 Every value of pointer type is one of the following:

  • a pointer to an object or function (the pointer is said to point to the object or function), or
  • a pointer past the end of an object ([expr.add]), or
  • the null pointer value ([conv.ptr]) for that type, or
  • an invalid pointer value.

A value of a pointer type that is a pointer to or past the end of an
object represents the address of the first byte in memory
([intro.memory]) occupied by the object or the first byte in memory
after the end of the storage occupied by the object, respectively. [
Note: A pointer past the end of an object ([expr.add]) is not
considered to point to an unrelated object of the object's type that
might be located at that address. A pointer value becomes invalid when
the storage it denotes reaches the end of its storage duration; see
[basic.stc]. — end note ] For purposes of pointer arithmetic
([expr.add]) and comparison ([expr.rel], [expr.eq]), a pointer past
the end of the last element of an array x of n elements is considered
to be equivalent to a pointer to a hypothetical array element n of x
and an object of type T that is not an array element is considered to
belong to an array with one element of type T.

As you can see, array types are also mentioned here, together with the hypothetical element past their end. And as the footnote in [expr.add] explains, the arithmetic used to obtain a one-past-the-end pointer is meant to be valid too:

As specified in [basic.compound], an object that is not an array element is considered to belong to a single-element array for this purpose and a pointer past the last element of an array of n elements is considered to be equivalent to a pointer to a hypothetical array element n for this purpose.
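
To illustrate the equivalence (my own sketch, not part of the quoted text):

int arr[4] = {};
int *past = arr + 4;              // points to the hypothetical element arr[4]
bool ok = (past == &arr[0] + 4);  // well-defined and true
int i = 0;
int *obj_past = &i + 1;           // i is treated as a one-element array, so this is also fine
// *past; *obj_past;              // reading through either one is undefined behavior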

A pointer to the "one-before-first" element of an array is UB. When was this first defined so?

The very earliest C standard, C89, has the same rule in 3.3.6 Additive Operators:

When an expression that has integral type is added to or subtracted from a pointer, the integral value is first multiplied by the size of the object pointed to. The result has the type of the pointer operand. If the pointer operand points to a member of an array object, and the array object is large enough, the result points to a member of the same array object, appropriately offset from the original member. Thus if P points to a member of an array object, the expression P+1 points to the next member of the array object. Unless both the pointer operand and the result point to a member of the same array object, or one past the last member of the array object, the behavior is undefined. Unless both the pointer operand and the result point to a member of the same array object, or the pointer operand points one past the last member of an array object and the result points to a member of the same array object, the behavior is undefined if the result is used as the operand of a unary * operator.

I don't believe that forming pointers to the "-1" element of an array has ever been well-defined C. Of course there might have been specific implementations where it happened to work, or was documented to do so.
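
For example (my own illustration, assuming a plain int array), even just forming such a pointer is undefined, whether or not it is ever dereferenced:

int arr[10];
int *first = arr;
int *before = first - 1;   // undefined behavior: the result would lie before the array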

Why is it OK to compare one past the last element?

It's okay to compute that pointer and to compare against it, but nothing more. The reason is that it allows an array to be passed and iterated over using a pair of pointers.

If it were not allowed by the standard, the following loop would exhibit undefined behavior through the mere existence of pend:

int arr[100] = {0};

for (int *pbegin = arr, *pend = arr + 100; pbegin != pend; ++pbegin)
{
    // do stuff
}

This is especially important in C++, where the idiom of passing a range as "an iterator to the beginning and one-past the end" is used heavily by the standard library.
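
A rough sketch of that idiom (the function name sum_range is mine): the range is passed as a pointer to the first element plus the one-past-the-end pointer, which is only ever compared against, never dereferenced:

#include <numeric>

// Sums over [first, last): 'last' is only compared against, never dereferenced.
int sum_range(const int *first, const int *last)
{
    int total = 0;
    for (const int *it = first; it != last; ++it)
        total += *it;
    return total;
}

int main()
{
    int arr[100] = {0};
    int s = sum_range(arr, arr + 100);           // arr + 100 is the one-past-the-end pointer
    int t = std::accumulate(arr, arr + 100, 0);  // the standard library relies on the same idiom
    return s == t ? 0 : 1;
}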

Clarification on behaviour of comparing invalid pointers

Am I to understand that this is exactly one past the end of the array, or that all pointers past the end of the array are valid?

Only a pointer one-past-the-array or one-past-the-object is valid (although you cannot dereference such a pointer). Pointers after that cannot be constructed, because pointer arithmetic has undefined behavior past this point.

The note above would have me believe that a pointer that does not point to an instantiated object is automatically invalid, since it is pointing to potentially unallocated memory.

The pointer doesn't need to point to an actual object if it is the one-past-the-end pointer. However, such a pointer cannot be dereferenced. The pointers to the array/object, including the one-past-the-end pointer, become invalid as soon as the storage duration of the object/array ends.

Which to me would suggest that if a pointer does not point to an array element or an object that has not reached the end of its lifetime, it is invalid.

A one-past-the-end pointer is considered to point to a hypothetical element of the (possibly one-element) array for the purposes of the quoted clauses; see the note in [basic.compound] quoted above.

Would this be undefined behaviour if elem is not in the interval [arr_first, arr_last] since there is no guarantee elem points to anything?

Assuming arr_first points to the first element of an array and arr_last to its last element, your function has unspecified behavior if elem doesn't point into the range arr_first to arr_last + 1 inclusive.

This doesn't mean that it has undefined behavior, just that the return value of the function may be completely arbitrary.

However, trying to form e.g. a pointer arr_last+2 to pass to the function already has undefined behavior itself, since pointer arithmetic is only defined as long as one stays within the bounds of the array (or one-past-the array).

Which in turn invalidates the existence of this function since I can't guarantee its (expected) false results are defined?

The function as written is technically not useful, although I suppose it will work more or less as expected in practice most of the time. It is a much better approach to validate indices into the array, rather than pointers.
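
A minimal sketch of that index-based approach (the helper index_in_range is my own, not from the question):

#include <cstddef>

// Validate an index instead of a possibly out-of-bounds pointer.
// No pointer outside [arr, arr + size] is ever formed this way.
bool index_in_range(std::size_t idx, std::size_t size)
{
    return idx < size;
}

// Usage:
// int arr[10];
// if (index_in_range(i, 10)) { /* arr[i] is safe to access */ }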

Is incrementing a pointer to a 0-sized dynamic array undefined?

Pointers to elements of arrays are allowed to point to a valid element, or one past the end. If you increment a pointer in a way that goes more than one past the end, the behavior is undefined.

For your 0-sized array, p is already pointing one past the end, so incrementing it is not allowed.

See C++17 8.7/4 regarding the + operator (++ has the same restrictions):

If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0 ≤ i+j ≤ n; otherwise, the behavior is undefined.
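
Concretely (my own illustration of the point above):

int *p = new int[0];   // valid: p already compares equal to the one-past-the-end value
// ++p;                // undefined behavior: would go more than one past the end
delete[] p;            // still required, even for a zero-sized allocation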

Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?

Your example is legal, but only because you're not actually using an out-of-bounds pointer.

Let's deal with out-of-bounds pointers first (because that's how I originally interpreted your question, before I noticed that the example uses a one-past-the-end pointer instead):

In general, you're not even allowed to create an out-of-bounds pointer. A pointer must point to an element within the array, or one past the end. Nowhere else.

The pointer is not even allowed to exist, which means you're obviously not allowed to dereference it either.

Here's what the standard has to say on the subject:

5.7:5:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

(emphasis mine)

Of course, this is for operator+. So just to be sure, here's what the standard says about array subscripting:

5.2.1:1:

The expression E1[E2] is identical (by definition) to *((E1)+(E2))

Of course, there's an obvious caveat: your example doesn't actually show an out-of-bounds pointer. It uses a "one past the end" pointer, which is different. The pointer is allowed to exist (as the above says), but the standard, as far as I can see, says nothing about dereferencing it. The closest I can find is 3.9.2:3:

[Note: for instance, the address one past the end of an array (5.7) would be considered to
point to an unrelated object of the array’s element type that might be located at that address. —end note ]

Which seems to me to imply that yes, you can legally dereference it, but the result of reading or writing to the location is unspecified.

Thanks to ilproxyil for correcting the last bit here, answering the last part of your question:

  • array + 5 doesn't actually dereference anything; it simply creates a pointer to one past the end of array.
  • &array[4] + 1 dereferences array + 4 (which is perfectly safe), takes the address of that lvalue, and adds one to that address, which results in a one-past-the-end pointer (but that pointer never gets dereferenced).
  • &array[5] dereferences array + 5 (which as far as I can see is legal, and results in "an unrelated object of the array’s element type", as the above said), and then takes the address of that element, which also seems legal enough.

So they don't do quite the same thing, although in this case, the end result is the same.
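
Putting the three expressions side by side (a sketch of my own, not from the original question):

int array[5] = {0};

int *a = array + 5;      // pure pointer arithmetic, no dereference
int *b = &array[4] + 1;  // dereferences array[4] (fine), then forms the same past-the-end value
int *c = &array[5];      // the debated form discussed above: *(array + 5) followed by &
// Reading or writing through a, b or c is never allowed.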

Is it UB to access an element one past the end of a row of a 2d array?

A debugging implementation might use "fat" pointers. For example, a pointer may be represented as a tuple (address, base, size) to detect out-of-bounds access. There is absolutely nothing wrong or contrary to the standard about such a representation. So any pointer arithmetic that takes the pointer outside the range [base, base + size] fails, and any dereference outside [base, base + size) also fails.

Note that base and size are not the address and the size of the 2D array but rather of the array that the pointer points into (the row in this case).

It might sound trivial in this case, but when deciding whether a certain pointer construction is UB or not, it is useful to mentally run your example through this hypothetical implementation.
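
To make that concrete (my own example, not the original poster's code):

int a[3][4] = {};

int *row0 = &a[0][0];
int *end0 = row0 + 4;   // one past the end of row 0: fine to form and compare
// int x = *end0;       // UB: the fat-pointer implementation above is entitled to trap here,
                        // even though &a[1][0] happens to have the same address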

Are non-dereferenced iterators past the one-past-the-end iterator of an array undefined behavior?

Yes, your program has undefined behaviour if you form such a pointer.

That's because the only way you can do so is to increment a valid pointer past the bounds of the object it points inside, and that is an undefined operation.

[C++14: 5.7/5]: When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

An uninitialised pointer is not the same thing because you never did anything to "get" that pointer, other than declaring it (which is obviously valid). But you can't even evaluate it (not dereference — evaluate) without imbuing your program with undefined behaviour. Not until you've assigned it a valid value.

As a sidenote, I would not call these "past-the-end" iterators/pointers, a term in C++ which specifically means the "one past-the-end" iterator/pointer, which is valid (e.g. cend(foo) itself). You're waaaay past the end. ;)
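
For example (my own snippet; cend(foo) here is std::cend from <iterator>):

#include <iterator>

int main()
{
    int foo[3] = {1, 2, 3};

    auto one_past = std::cend(foo);  // the valid one-past-the-end iterator, equal to foo + 3
    // auto way_past = foo + 4;      // undefined behavior: more than one past the end
    return one_past == foo + 3 ? 0 : 1;
}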


