Why Is Out-Of-Bounds Pointer Arithmetic Undefined Behaviour

Why is out-of-bounds pointer arithmetic undefined behaviour?

That's because pointers don't behave like integers. It's undefined behavior because the standard says so.

On most platforms however (if not all), you won't get a crash or run into dubious behavior if you don't dereference the array. But then, if you don't dereference it, what's the point of doing the addition?

That said, note that an expression going one over the end of an array is technically 100% "correct" and guaranteed not to crash per §5.7 ¶5 of the C++11 spec. However, the result of that expression is unspecified (just guaranteed not to be an overflow); while any other expression going more than one past the array bounds is explicitly undefined behavior.

Note: That does not mean it is safe to read and write from an over-by-one offset. You likely will be editing data that does not belong to that array, and will cause state/memory corruption. You just won't cause an overflow exception.

My guess is that it's like that because it's not only dereferencing that's wrong. Also pointer arithmetics, comparing pointers, etc. So it's just easier to say don't do this instead of enumerating the situations where it can be dangerous.

Out of the bounds in C++ and undefined behaviour

It's undefined behavior. We can juxtapose a couple of passages to be convinced of it. First, and I won't explicitly prove it, table[4] is *(table + 4). We need only ask ourselves the properties of the pointer value table + 4 and how it relates to the requirements of the indirection operator.

On the pointer, we have this passage:

[basic.compound]

3 Every value of pointer type is one of the following:

  • a pointer to an object or function (the pointer is said to point to the object or function), or
  • a pointer past the end of an object ([expr.add]), or
  • the null pointer value for that type, or
  • an invalid pointer value.

Our pointer is of the second bullet's type, not the first. As for the indirection operator:

[expr.unary.op]

1 The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T”, the type of the result is “T”.

I hope it's obvious from reading this paragraph that the operation is defined for a pointer of the category described by the first bullet in the preceding paragraph.

So we apply an operation to a pointer value for which its behavior is not defined. The result is undefined behavior.

Is it safe to keep a pointer out-of-bounds without dereferencing it?

Moving pointer to one element past the last element is allowed, but moving further or moving before the first element is not allowed.

Quote from N1570 6.5.6 Additive operators (point 8):

When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. Moreover, if the expression P points to the last
element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object,
the expression (Q)-1 points to the last element of the array object. If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.

Is accessing an element of a multidimensional array out of bounds undefined behavior?

According to the standard, it is clearly undefined behaviour as such a case is explicitly listed in the section J.2 undefined behaviour (found in an online C99 standard draft):

An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6).

It can still be the case that your example will work, and actually I have seen a lot of such cases in C code; However, to be accurate, it is UB.

Are the following statements using pointer arithmetic involving out of bound aceess valid?

In your case,

  char *p1 = array + 7;

and

  char *p2 = array + 8;

are valid, because, you are legally allowed to point to

  • any object inside the array length
  • one past the last array element until you dereference it.

OTOH,

  char *p3 = array + 9;

and

  char *p4 = array + 10; 

are undefined.


Quoting C11, chapter §6.5.6, "Additive operators" (emphasis mine)

[same in C99, chapter §6.5.6/p8, for anybody interested]

When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand. If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression.
In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and
(P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of
the array object, provided they exist. Moreover, if the expression P points to the last
element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object,
the expression (Q)-1 points to the last element of the array object. If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
[...]

Pointer arithmetic in c and array bounds

Undefined behavior means exactly that: absolutely anything could happen. It could succeed silently, it could fail silently, it could crash your program, it could blue screen your OS, or it could erase your hard drive. Some of these are not very likely, but all of them are permissible behaviors as far as the C language standard is concerned.

In this particular case, yes, the C standard is saying that even computing the address of a pointer outside of valid array bounds, without dereferencing it, is undefined behavior. The reason it says this is that there are some arcane systems where doing such a calculation could result in a fault of some sort. For example, you might have an array at the very end of addressable memory, and constructing a pointer beyond that would cause an overflow in a special address register which generates a trap or fault. The C standard wants to permit this behavior in order to be as portable as possible.

In reality, though, you'll find that constructing such an invalid address without dereferencing it has well-defined behavior on the vast majority of systems you'll come across in common usage. Creating an invalid memory address will have no ill effects unless you attempt to dereference it. But of course, it's better to avoid creating those invalid addresses so that your code will work perfectly even on those arcane systems.

Pointer to one-before-first element of array is UB. When was this first defined so?

The very earliest C standard, C89, has the same rule in 3.3.6 Additive Operators:

When an expression that has integral type is added to or subtracted from a pointer, the integral value is first multiplied by the size of the object pointed to. The result has the type of the pointer operand. If the pointer operand points to a member of an array object, and the array object is large enough, the result points to a member of the same array object, appropriately offset from the original member. Thus if P points to a member of an array object, the expression P+1 points to the next member of the array object. Unless both the pointer operand and the result point to a member of the same array object, or one past the last member of the array object, the behavior is undefined. Unless both the pointer operand and the result point to a member of the same array object, or the pointer operand points one past the last member of an array object and the result points to a member of the same array object, the behavior is undefined if the result is used as the operand of a unary * operator.

I don't believe that forming pointers to the "-1" element of an array has ever been well-defined C. Of course there might have been specific implementations where it happened to work, or was documented to do so.



Related Topics



Leave a reply



Submit