Take the Address of a One-Past-The-End Array Element Via Subscript: Legal by the C++ Standard or Not

Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?

Your example is legal, but only because you're not actually using an out of bounds pointer.

Let's deal with out of bounds pointers first (because that's how I originally interpreted your question, before I noticed that the example uses a one-past-the-end pointer instead):

In general, you're not even allowed to create an out-of-bounds pointer. A pointer must point to an element within the array, or one past the end. Nowhere else.

The pointer is not even allowed to exist, which means you're obviously not allowed to dereference it either.

Here's what the standard has to say on the subject:

5.7:5:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

(emphasis mine)

Of course, this is for operator+. So just to be sure, here's what the standard says about array subscripting:

5.2.1:1:

The expression E1[E2] is identical (by definition) to *((E1)+(E2))

Of course, there's an obvious caveat: your example doesn't actually show an out-of-bounds pointer. It uses a "one past the end" pointer, which is different. That pointer is allowed to exist (as the above says), but the standard, as far as I can see, says nothing about dereferencing it. The closest I can find is 3.9.2:3:

[Note: for instance, the address one past the end of an array (5.7) would be considered to
point to an unrelated object of the array’s element type that might be located at that address. —end note ]

Which seems to me to imply that yes, you can legally dereference it, but the result of reading or writing to the location is unspecified.

Thanks to ilproxyil for correcting the last bit here, answering the last part of your question:

  • array + 5 doesn't actually dereference anything; it simply creates a pointer to one past the end of array.
  • &array[4] + 1 dereferences array+4 (which is perfectly safe), takes the address of that lvalue, and adds one to that address, which results in a one-past-the-end pointer (but that pointer never gets dereferenced).
  • &array[5] dereferences array+5 (which as far as I can see is legal, and results in "an unrelated object of the array’s element type", as the above said), and then takes the address of that element, which also seems legal enough.

So they don't do quite the same thing, although in this case, the end result is the same.
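As a rough sketch of the comparison above, the two forms that nobody disputes can be checked against each other. The function names here are made up for illustration, and &array[5] is deliberately left out of the executable code because its status is the contested point.

```cpp
#include <cassert>
#include <cstddef>

// Returns true if the two uncontested ways of forming a
// one-past-the-end pointer produce the same address.
bool end_pointers_agree() {
    int array[5] = {1, 2, 3, 4, 5};

    int* by_addition = array + 5;           // pure pointer arithmetic, nothing is dereferenced
    int* by_last_element = &array[4] + 1;   // dereferences array+4 (in bounds), then adds 1

    // &array[5] would mean &*(array + 5); whether that dereference is
    // allowed in C++ is exactly what is being debated, so it is not used here.
    return by_addition == by_last_element;
}

// Returns the distance from the first element to the one-past-the-end
// pointer, which is the number of elements in the array.
std::ptrdiff_t distance_to_end() {
    int array[5] = {1, 2, 3, 4, 5};
    int* end = array + 5;
    assert(end - array == 5);   // subtracting from the end pointer is also allowed
    return end - array;
}
```

Both functions only create and compare the end pointer; neither reads or writes through it.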

C++ is it legal to take the address of 2 or more past the end of the array?

The key issue with taking addresses beyond the end of an array is segmented architectures: you may overflow the representable range of the pointer. The existing rule already creates some level of pain, as it means that the last object can't be right on the boundary of a segment. However, the ability to form this address was well established.

May I take the address of the one-past-the-end element of an array? [duplicate]

Yes, you can take the address one beyond the end of an array, but you can't dereference it. For your array of 10 items, array+10 would work. It's been argued a few times (by the committee, among others) whether &array[10] really causes undefined behavior or not (and if it does, whether it really should). The bottom line with it is that at least according to the current standards (both C and C++) it officially causes undefined behavior, but if there's a single compiler for which it actually doesn't work, nobody in any of the arguments has been able to find or cite it.

Edit: For once my memory was half correct -- this was (part of) an official Defect Report to the committee, and at least some committee members (e.g., Tom Plum) thought the wording had been changed so it would not cause undefined behavior. OTOH, the DR dates from 2000, and the status is still "Drafting", so it's open to question whether it's really fixed, or ever likely to be (I haven't looked through N3090/3092 to figure out).

In C99, however, it's clearly not undefined behavior.
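If you would rather sidestep the question in C++ altogether, one option is to never write the subscript form and to get the end pointer from pointer arithmetic or std::end instead. A minimal sketch, with sum_ten as a hypothetical helper name:

```cpp
#include <cassert>
#include <iterator>
#include <numeric>

// Sums a 10-element array using a one-past-the-end pointer that is
// formed without any subscript expression.
int sum_ten() {
    int array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

    int* last = array + 10;            // one past the end, formed by addition only
    assert(last == std::end(array));   // std::end yields the same pointer for a built-in array

    // The end pointer is used only as a bound and is never dereferenced.
    return std::accumulate(std::begin(array), last, 0);
}
```

This avoids having to decide whether &array[10] is well-defined, since the expression never appears.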

Is &array[i] always equivalent to (array + i)? [duplicate]

Firstly there is 6.5.2.1/2:

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2)))

Then 6.5.3.2/3 defines the unary & operator:

[...] Similarly, if the operand is the result of a [] operator, neither the & operator nor
the unary * that is implied by the [] is evaluated and the result is as if the & operator
were removed and the [] operator were changed to a + operator.

Which is explicitly saying that &x[y] means (x) + (y) exactly.

Is it UB to access an element one past the end of a row of a 2d array?

A debugging implementation might use "fat" pointers. For example, a pointer may be represented as a tuple (address, base, size) to detect out-of-bounds access. There is absolutely nothing wrong or contrary to the standard about such representation. So any pointer arithmetic that brings the pointer outside the range of [base, base+size] fails, and any dereference outside of [base, base+size) also fails.

Note that base and size are not the address and the size of the 2D array but rather of the array that the pointer points into (the row in this case).

It might sound trivial in this case, but when deciding whether a certain pointer construction is UB or not, it is useful to mentally run your example through this hypothetical implementation.
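To make that mental exercise concrete, here is a small sketch of such a checked pointer. It is not how any particular compiler or sanitizer works; CheckedIntPtr and rejected are hypothetical names, and the class only models the (address, base, size) tuple described above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical "fat" pointer for int arrays: (address, base, size).
class CheckedIntPtr {
public:
    CheckedIntPtr(int* base, std::ptrdiff_t size)
        : addr_(base), base_(base), size_(size) {
        assert(size >= 0);
    }

    // Arithmetic may land anywhere in [base, base + size], end included.
    CheckedIntPtr operator+(std::ptrdiff_t n) const {
        std::ptrdiff_t index = (addr_ - base_) + n;
        if (index < 0 || index > size_) {
            throw std::out_of_range("pointer arithmetic left the array");
        }
        CheckedIntPtr result(*this);
        result.addr_ = base_ + index;   // computed only after the check passed
        return result;
    }

    // Dereference is allowed only in [base, base + size), end excluded.
    int& operator*() const {
        if (addr_ - base_ >= size_) {
            throw std::out_of_range("dereference of one-past-the-end pointer");
        }
        return *addr_;
    }

private:
    int* addr_;
    int* base_;
    std::ptrdiff_t size_;
};

// Returns true if calling f throws std::out_of_range.
template <typename F>
bool rejected(F f) {
    try {
        f();
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

For a row of a 2D array such as int grid[2][3], constructing CheckedIntPtr(grid[0], 3) uses the row as base and size, so row + 3 is accepted, *(row + 3) is rejected, and row + 4 is rejected even though the memory belongs to the next row.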

Dereferencing one past the end pointer to array type

This is CWG 232. That issue might seem like it's mainly about dereferencing a null pointer but it's fundamentally about what it means to simply dereference something that doesn't point to an object. There is no explicit language rule about this case.

One of the examples in the issue is:

Similarly, dereferencing a pointer to the end of an array should be allowed as long as the value is not used:

char a[10];
char *b = &a[10]; // equivalent to "char *b = &*(a+10);"

Both cases come up often enough in real code that they should be allowed.

This is basically the same thing as OP (the a[10] part of the above expression), except using char instead of an array type.

Common wisdom is that it's undefined behavior to dereference a one-past-the-end pointer. However, does this hold true for pointers to array types?

There is no difference in the rules based on what kind of pointer it is. my_past_end is a past-the-end pointer, so whether it's UB to dereference it or not is not a function of the fact that it points to an array as opposed to any other kind of type.


While the type of is_this_valid is an int* which gets initialized from an int(&)[3] (array-to-pointer decay), and thus nothing here actually reads from memory, that is immaterial to the way the language rules work. my_past_end is a pointer whose value is past the end of an object, and that's the only thing that matters.

Is it OK to access past the size of a structure via member address, with enough space allocated?

The intent at least since the standardization of C in 1989 has been that implementations are allowed to check array bounds for array accesses.

The member p->a is an object of type int. C11 6.5.6p7 says that

7 For the purposes of [additive operators] a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

Thus

&(p->a)

is a pointer to an int; but it is also as if it were a pointer to the first element of an array of length 1, with int as the object type.

Now 6.5.6p8 allows one to calculate &(p->a) + 1 which is a pointer to just past the end of the array, so there is no undefined behaviour. However, the dereference of such a pointer is invalid. From Appendix J.2 where it is spelt out, the behaviour is undefined when:

Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond the array object and is used as the operand of a unary * operator that is evaluated (6.5.6).

In the expression above, there is only one array, the one (as if) with exactly 1 element. If &(p->a) + 1 is dereferenced, the array with length 1 is accessed out of bounds and undefined behaviour occurs, i.e.

behavior [...], for which [The C11] Standard imposes no requirements

With the note saying that:

Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

That the most common behaviour is ignoring the situation completely, i.e. behaving as if the pointer referenced the memory location just after, doesn't mean that other kinds of behaviour wouldn't be acceptable from the standard's point of view - the standard allows every imaginable and unimaginable outcome.


There have been claims that the C11 standard text was written vaguely, that the committee's intention was for this to be allowed, and that it would have been fine in earlier versions. That is not true. Read the committee response to [Defect Report #017 dated 10 Dec 1992 to C89].

Question 16

[...]

Response

For an array of arrays, the permitted pointer arithmetic in subclause 6.3.6, page 47, lines 12-40 is to be understood by interpreting the use of the word object as denoting the specific object determined directly by the pointer's type and value, not other objects related to that one by contiguity. Therefore, if an expression exceeds these permissions, the behavior is undefined. For example, the following code has undefined behavior:

 int a[4][5];

a[1][7] = 0; /* undefined */

Some conforming implementations may choose to diagnose an array bounds violation, while others may choose to interpret such attempted accesses successfully with the obvious extended semantics.

(bolded emphasis mine)

There is no reason why the same wouldn't be transferred to scalar members of structures, especially when 6.5.6p7 says that a pointer to them should be considered to behave the same as a pointer to the first element of an array of length one with the type of the object as its element type.

If you want to address the consecutive structs, you can always take the pointer to the first member and cast that as the pointer to the struct and advance that instead:

*(int *)((S *)&(p->a) + 1) = 0;
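Spelled out in C++ terms, that one-liner might look like the sketch below. The layout of S and the name zero_next_a are assumptions for illustration; the approach relies on a being the first member of a standard-layout struct, and on *p really being an element of an array of S with another element after it.

```cpp
#include <cassert>
#include <type_traits>

// Assumed layout: `a` must be the first member for the cast back to S* to work.
struct S {
    int a;
    int b;
};
static_assert(std::is_standard_layout<S>::value,
              "the first-member cast relies on standard layout");

// Zeroes the `a` member of the struct that follows *p.
// Precondition (not checked): *p is an array element and is not the last one.
void zero_next_a(S* p) {
    assert(p != nullptr);

    int* first_member = &(p->a);
    S* as_struct = reinterpret_cast<S*>(first_member);   // back to the enclosing struct
    S* next = as_struct + 1;                             // step in units of S, not int

    *reinterpret_cast<int*>(next) = 0;                   // first member of the next struct
}
```

The arithmetic here is done on an S*, so the array being stepped through is the array of structs, not the as-if one-element array around the int member.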

