Is Pointer Comparison Undefined or Unspecified Behavior in C++

Is pointer comparison undefined or unspecified behavior in C++?

Note that pointer subtraction and pointer comparison are different operations with different rules.

C++14 5.7/6, on subtracting pointers:

Unless both pointers point to elements of the same array object or one past the last element of the array object, the behavior is undefined.

C++14 5.9/3-4:

Comparing pointers to objects is defined as follows:

  • If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.

  • If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the array, the latter pointer compares greater.

  • If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member compares greater provided the two members have the same access control and provided their class is not a union.

If two operands p and q compare equal (5.10), p<=q and p>=q both yield true and p<q and p>q both yield false. Otherwise, if a pointer p compares greater than a pointer q, p>=q, p>q, q<=p, and q<p all yield true, and p<=q, p<q, q>=p, and q>p all yield false. Otherwise, the result of each of the operators is unspecified.
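As a rough illustration of the three cases the quoted rules do define, here is a minimal sketch. The names Pair and defined_orderings_hold are made up for this example and do not come from the standard or from the question.

```cpp
#include <cassert>

// Hypothetical type for the example: two members with the same access control.
struct Pair {
    int first;   // declared earlier
    int second;  // declared later
};

// Returns true when every comparison covered by the quoted rules comes out as stated.
bool defined_orderings_hold() {
    int values[4] = {0, 1, 2, 3};
    int* low = &values[1];
    int* high = &values[3];
    int* past_end = values + 4;  // one past the last element: may be compared, not dereferenced

    Pair pair{1, 2};

    bool by_subscript = low < high;               // higher subscript compares greater
    bool by_past_end = high < past_end;           // one-past-the-end pointer compares greater
    bool by_member = &pair.first < &pair.second;  // later declared member compares greater
    return by_subscript && by_past_end && by_member;
}

void run_checks() {
    assert(defined_orderings_hold());
}
```

Comparing, say, &values[0] with &pair.first using < falls outside these cases, so its result is unspecified and is deliberately left out of the sketch.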

Is comparing two pointers with < undefined behavior if they are both cast to an integer type?

The conversion is legal but there is, technically, no meaning defined for the result. If instead you convert the pointer to void * and then convert to uintptr_t, there is slight meaning defined: Performing the reverse operations will reproduce the original pointer (or something equivalent).

In particular, you cannot rely on the fact that one integer is less than another to mean it is earlier in memory or has a lower address.

The specification for uintptr_t (C 2018 7.20.1.4 1) says it has the property that any valid void * can be converted to uintptr_t, then converted back to void *, and the result will compare equal to the original pointer.

However, when you convert an unsigned char * to uintptr_t, you are not converting a void * to uintptr_t. So 7.20.1.4 does not apply. All we have is the general definition of pointer conversions in 6.3.2.3, in which paragraphs 5 and 6 say:

An integer may be converted to any pointer type. Except as previously specified [involving zero for null pointers], the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.

Any pointer type may be converted to an integer type. Except as previously specified [null pointers again], the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

So these paragraphs are no help except they tell you that the implementation documentation should tell you whether the conversions are useful. Undoubtedly they are in most C implementations.

In your example, you actually start with a void * from a parameter and convert it to unsigned char * and then to uintptr_t. So the remedy there is simple: Convert to uintptr_t directly from the void *.

For situations where we have some other pointer type, not void *, then 6.3.2.3 1 is useful:

A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.

So, converting to and from void * is defined to preserve the original pointer, so we can combine it with a conversion from void * to uintptr_t:

(uintptr_t) (void *) A < (uintptr_t) (void *) B

Since (void *) A must be able to produce the original A upon conversion back, and (uintptr_t) (void *) A must be able to produce its (void *) A, then (uintptr_t) (void *) A and (uintptr_t) (void *) B must be different if A and B are different.

And that is all we can say from the C standard about the comparison. Converting from pointers to integers might produce the address bits out of order or some other oddities. For example, the conversion might produce a 32-bit integer containing a 16-bit segment address and a 16-bit offset. Some of those integers might have higher values for lower addresses while others have lower values for lower addresses. Worse, the same address might have two representations, so the comparison might indicate “less than” even though A and B refer to the same object.
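A minimal sketch of the round trip described above, written in C++ (which gives a comparable guarantee for reinterpret_cast to an integer type of sufficient size). The helper names to_integer, to_pointer, round_trip_preserves and distinct_objects_get_distinct_integers are invented for this example, and std::uintptr_t is assumed to be available, as it is on mainstream implementations.

```cpp
#include <cassert>
#include <cstdint>

// Taking a const void* parameter forces the conversion to go through void* first.
std::uintptr_t to_integer(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

const void* to_pointer(std::uintptr_t bits) {
    return reinterpret_cast<const void*>(bits);
}

// The only property relied on: converting back yields a pointer equal to the original.
bool round_trip_preserves(const int* p) {
    const void* original = p;
    return to_pointer(to_integer(original)) == original;
}

// Follows from the round trip: two different objects cannot map to the same integer.
bool distinct_objects_get_distinct_integers() {
    int a = 0;
    int b = 0;
    return to_integer(&a) != to_integer(&b);
}

void run_checks() {
    int x = 0;
    assert(round_trip_preserves(&x));
    assert(distinct_objects_get_distinct_integers());
}
```

Nothing in the sketch uses < on the integers, because the standard gives no meaning to their relative order.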

How does pointer comparison work in C? Is it ok to compare pointers that don't point to the same array?

According to the C11 standard, the relational operators <, <=, >, and >= may only be used on pointers to elements of the same array or struct object. This is spelled out in section 6.5.8p5:

When two pointers are compared, the result depends on the
relative locations in the address space of the objects pointed to.
If two pointers to object types both point to the same object, or
both point one past the last element of the same array
object, they compare equal. If the objects pointed to are
members of the same aggregate object, pointers to structure
members declared later compare greater than pointers to
members declared earlier in the structure, and pointers to
array elements with larger subscript values compare greater than
pointers to elements of the same array with lower subscript values.
All pointers to members of the same union object compare
equal. If the expression P points to an element of an array
object and the expression Q points to the last element of the
same array object, the pointer expression Q+1 compares greater than P.
In all other cases, the behavior is undefined.

Note that any comparisons that do not satisfy this requirement invoke undefined behavior, meaning (among other things) that you can't depend on the results to be repeatable.

In your particular case, for both the comparison between the addresses of two local variables and between the address of a local and a dynamic address, the operation appeared to "work", however the result could change by making a seemingly unrelated change to your code or even compiling the same code with different optimization settings. With undefined behavior, just because the code could crash or generate an error doesn't mean it will.

As an example, an x86 processor running in 8086 real mode has a segmented memory model using a 16-bit segment and a 16-bit offset to build a 20-bit address. So in this case an address doesn't convert exactly to an integer.
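To make the segmented case concrete, here is a small arithmetic sketch. The physical address formula (segment times 16 plus offset) is the usual 8086 real mode rule; the function packed is a hypothetical way an implementation might turn a segment:offset pair into a 32-bit integer, chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// 8086 real mode: physical address = segment * 16 + offset (wraparound above 20 bits ignored here).
constexpr std::uint32_t physical_address(std::uint16_t segment, std::uint16_t offset) {
    return (static_cast<std::uint32_t>(segment) << 4) + offset;
}

// Hypothetical pointer-to-integer conversion: segment in the high 16 bits, offset in the low 16.
constexpr std::uint32_t packed(std::uint16_t segment, std::uint16_t offset) {
    return (static_cast<std::uint32_t>(segment) << 16) | offset;
}

// 0x1000:0x0010 and 0x1001:0x0000 name the same byte, yet their packed integers differ.
bool two_spellings_of_one_address() {
    return physical_address(0x1000, 0x0010) == physical_address(0x1001, 0x0000)
        && packed(0x1000, 0x0010) != packed(0x1001, 0x0000);
}

void run_checks() {
    assert(two_spellings_of_one_address());
}
```

Under that assumed packing, comparing the integers would report the two pointers as different, and one as "less", even though they refer to the same location.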

The equality operators == and != however do not have this restriction. They can be used between any two pointers to compatible types or NULL pointers. So using == or != in both of your examples would produce valid C code.

However, even with == and != you could get some unexpected yet still well-defined results. See "Can an equality comparison of unrelated pointers evaluate to true?" for more details on this.

Regarding the exam question given by your professor, it makes a number of flawed assumptions:

  • A flat memory model exists where there is a 1-to-1 correspondence between an address and an integer value.
  • That the converted pointer values fit inside an integer type.
  • That the implementation simply treats pointers as integers when performing comparisons without exploiting the freedom given by undefined behavior.
  • That a stack is used and that local variables are stored there.
  • That a heap is used to pull allocated memory from.
  • That the stack (and therefore local variables) appears at a higher address than the heap (and therefore allocated objects).
  • That string constants appear at a lower address than the heap.

If you were to run this code on an architecture and/or with a compiler that does not satisfy these assumptions then you could get very different results.

Also, both examples exhibit undefined behavior when they call strcpy, since the right operand (in some cases) points to a single character and not a null-terminated string, resulting in the function reading past the bounds of the given variable.

Does the pointer arithmetic in this usage cause undefined behavior

It is undefined behavior because there are severe restrictions on what can be done with pointer arithmetic. The edits that you have made and that were suggested do nothing to fix this.

Undefined Behavior in Addition

StructA* a = (StructA*)((char*)copy + offset);

First of all, this is undefined behavior due to the addition onto copy:

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

  • (4.1) If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
  • (4.2) Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0 ≤ i + j ≤ n and the expression P - J points to the (possibly-hypothetical) array element i − j of x if 0 ≤ i − j ≤ n.
  • (4.3) Otherwise, the behavior is undefined.

See https://eel.is/c++draft/expr.add#4

In short, pointer arithmetic is only defined within an array (a single object counts as an array with one element for this purpose) or when adding 0 to a null pointer; anything else is undefined behavior. Even if copy or its members were arrays, adding onto a pointer so that it becomes:

  • two or more past the end of the array
  • at least one before the first element

is also undefined behavior.
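The range condition 0 ≤ i + j ≤ n from the quoted rule can be sketched as a small check. step_is_defined is a hypothetical helper written for this example; it works on indices only and never forms an out-of-range pointer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true when moving `step` elements from element `index` of an
// n-element array lands in [0, n], the only range for which P + J is defined.
bool step_is_defined(std::ptrdiff_t index, std::ptrdiff_t step, std::ptrdiff_t n) {
    std::ptrdiff_t target = index + step;  // assumes the index sum itself does not overflow
    return 0 <= target && target <= n;
}

void run_checks() {
    int data[3] = {10, 20, 30};
    int* p = &data[1];

    assert(step_is_defined(1, 1, 3));
    assert(*(p + 1) == 30);              // still inside the array

    assert(step_is_defined(1, 2, 3));
    assert(p + 2 == data + 3);           // one past the end: fine to form, not to dereference

    assert(!step_is_defined(1, 3, 3));   // p + 3 would be two past the end
    assert(!step_is_defined(1, -2, 3));  // p - 2 would be before the first element
}
```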

Undefined Behavior in Subtraction

ptrdiff_t offset = (char*)original - (char*)(copy->b);

The subtraction of your two pointers is also undefined behavior:

When two pointer expressions P and Q are subtracted, the type of the result is an implementation-defined signed integral type; [...]

  • (5.1) If P and Q both evaluate to null pointer values, the result is 0.
  • (5.2) Otherwise, if P and Q point to, respectively, array elements i and j of the same array object x, the expression P - Q has the value i − j.
  • (5.3) Otherwise, the behavior is undefined.

See https://eel.is/c++draft/expr.add#5

So subtracting pointers from one another, when they are not both null or both pointers to elements of the same array, is undefined behavior.
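For contrast, this is what the defined case (5.2) looks like. It is only a sketch with made-up names (index_distance, demo_distance); both pointers point into, or one past the end of, the same array.

```cpp
#include <cassert>
#include <cstddef>

// Defined only when both pointers refer to elements of (or one past) the same array.
std::ptrdiff_t index_distance(const int* from, const int* to) {
    return to - from;
}

std::ptrdiff_t demo_distance() {
    int data[5] = {};
    return index_distance(&data[1], data + 5);  // subscripts 5 and 1, so the result is 5 - 1
}

void run_checks() {
    assert(demo_distance() == 4);
}
```

Passing pointers into two separately allocated objects to index_distance would be the undefined case (5.3), which is the situation in the question.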

Undefined Behavior in C

The C standard has similar restrictions:

(8) [...] If the pointer operand points to an element of
an array object, and the array is large enough, the result points to an element offset from
the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression.

(Paragraph 7 of the same section treats a pointer to a non-array object as a pointer to the first element of an array of length one, so the same limits apply to it.)

(9) When two pointers are subtracted, both shall point to elements of the same array object,
or one past the last element of the array object; [...]

See §6.5.6 Additive Operators in the C11 standard (n1570).

Using Data Member Pointers Instead

A clean and type-safe solution in C++ would be to use data member pointers.

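#include <cstdio>
#include <cstdlib>
#include <cstring>

// Assumed definition: the question's StructA is not shown here, only that it has a member a.
struct StructA {
    int a;
};

struct StructB {
    StructA a;
    StructA StructB::*b_offset;  // data member pointer: names a member, not an address
};

int main() {
    StructB* original = (StructB*) malloc(sizeof(StructB));
    original->a.a = 5;
    original->b_offset = &StructB::a;

    StructB* copy = (StructB*) malloc(sizeof(StructB));
    memcpy(copy, original, sizeof(StructB));  // the member pointer stays valid in the copy
    free(original);
    printf("%i\n", (copy->*(copy->b_offset)).a);
    free(copy);
}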

Notes

The standard citations are from a C++ draft. The C++11 standard, which you cited, does not appear to have any looser restrictions on pointer arithmetic; it is just formatted differently. See C++11 standard (n3337).

Is it unspecified behavior to compare pointers to different arrays for equality?

The semantics for op== and op!= explicitly say that they follow the relational operators except for their lower precedence and truth-value result. So you need to look at what is defined for their truth-value result. If the standard says that the result is unspecified, then it is unspecified. If it defines specific rules, then it is not. It says in particular

Two pointers of the same type compare equal if and only if they are both null, both point to the same function, or both represent the same address

Is this write to an array truly undefined behavior in C?

There's nothing invalid about your code; the compiler is wrong. If you remove the unnecessary ADDR_AFTER check in test(), the code runs as expected with no UBSan error. If you run it with optimization enabled and without UBSan, you get the wrong output (test=1, should be 2).

Something about the ADDR_AFTER(first) == (uintptr_t)an_int code inside test() makes Clang do the wrong thing when compiling with -O2.

I tested with Apple clang version 11.0.3 (clang-1103.0.32.62) but it looks like Clang 13 and current trunk also have the bug: https://godbolt.org/z/s83ncTsbf - if you change the compiler to any version of GCC you'll see it can return 1 or 2 from main(), while Clang always returns 1 (mov eax, 1).

You should probably file a Clang bug for this.

is not required == undefined behavior?

The wording has changed in various editions of the C++ standard, and in the recent draft cited in the question. (See my comments on the question for the gory details.)

C++11 says:

Other pointer comparisons are unspecified.

C++17 says:

Otherwise, neither pointer compares greater than the other.

The latest draft, cited in the question, says:

Otherwise, neither pointer is required to compare greater than the other.

That change was made in response to an issue titled '"compares greater" term is needlessly confusing'.

If you look at the surrounding context in the draft standard, it's clear that in the remaining cases the result is unspecified. Quoting from [expr.rel] (text in square brackets is my summary):

The result of comparing unequal pointers to objects is defined in
terms of a partial order consistent with the following rules:

  • [pointers to elements of the same array]

  • [pointers to members of the same object]

  • [remaining cases] Otherwise, neither pointer is required to compare greater than the other.

If two operands p and q compare equal, p<=q and
p>=q both yield true and p<q and p>q both yield
false. Otherwise, if a pointer p compares greater than a pointer q, p>=q, p>q, q<=p, and q<p all
yield true and p<=q, p<q, q>=p, and q>p
all yield false. Otherwise, the result of each of the operators
is unspecified.

So the result of the < operator in such cases is unspecified, but it does not have undefined behavior. It can be either true or false, but I don't believe it's required to be consistent. The program's output could be any of 00, 01, 10, or 11.
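If a consistent order between unrelated pointers is actually needed, the usual tool is std::less, which the library guarantees to yield a strict total order over pointers even where the built-in < does not. This is an addition to the answer above, not something the question asked about; ordered_before and exactly_one_direction_holds are names invented for the sketch.

```cpp
#include <cassert>
#include <functional>

// std::less on pointer types gives an implementation-defined strict total order,
// including for pointers to unrelated objects.
bool ordered_before(const int* a, const int* b) {
    return std::less<const int*>{}(a, b);
}

// For two distinct objects, exactly one direction must hold under a strict total order.
bool exactly_one_direction_holds() {
    int x = 0;
    int y = 0;
    return ordered_before(&x, &y) != ordered_before(&y, &x);
}

void run_checks() {
    assert(exactly_one_direction_holds());
}
```

Which of the two objects comes first is still up to the implementation; the guarantee is only that the answer is consistent.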

Is it undefined behavior to compare a character array char u[10] with a string literal "abc"?

The standard says:

[expr.eq]

The == (equal to) and the != (not equal to) operators group left-to-right.
The lvalue-to-rvalue ([conv.lval]), array-to-pointer ([conv.array]), and function-to-pointer ([conv.func]) standard conversions are performed on the operands...

Hence, we are comparing pointers to the respective arrays.

If at least one of the operands is a pointer, ...
Comparing pointers is defined as follows:

  • If one pointer represents the address of a complete object, and another pointer represents the address one past the last element of a different complete object, the result of the comparison is unspecified. [does not apply since neither pointer is past last element]
  • Otherwise, if the pointers are both null, both point to the same function, or both represent the same address, they compare equal. [does not apply since neither is null, neither point to functions, nor represent same address]
  • Otherwise, the pointers compare unequal. [applies]

The behaviour is defined and the else branch will be unconditionally executed.

An if-statement whose outcome never changes is probably a bug; most likely the author was trying to compare the contents of the arrays, which the == operator does not do.
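A short sketch of the difference between comparing addresses and comparing contents, assuming the author wanted the latter. The helpers same_address and same_contents are hypothetical; std::strcmp is the standard function from <cstring>.

```cpp
#include <cassert>
#include <cstring>

// What == does after array-to-pointer conversion: compares addresses.
bool same_address(const char* a, const char* b) {
    return a == b;
}

// What was probably intended: compare the characters up to the terminating null.
bool same_contents(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}

bool demo_address_differs() {
    char u[10] = "abc";  // a separate, writable array initialized from the literal
    return !same_address(u, "abc");
}

bool demo_contents_match() {
    char u[10] = "abc";
    return same_contents(u, "abc");
}

void run_checks() {
    assert(demo_address_differs());
    assert(demo_contents_match());
}
```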



"warning: comparison with string literal results in unspecified behavior"

I believe that this warning message is slightly misleading. A comparison of two string literals would be unspecified:

if ("tri" == "tri")

It's unspecified whether this conditional is true or false.

Is it UB to compare (for equality) a void pointer with a typed pointer in C?

This comparison is well defined.

When a void * is compared against another pointer type via ==, the other pointer is converted to void *.

Also, Section 6.5.9p6 of the C standard says the following regarding pointer comparisons with ==:

Two pointers compare equal if and only if both are null pointers, both
are pointers to the same object (including a pointer to an object and
a subobject at its beginning) or function, both are pointers to one
past the last element of the same array object, or one is a pointer to
one past the end of one array object and the other is a pointer to the
start of a different array object that happens to immediately
follow the first array object in the address space.

There is no mention here of undefined behavior.
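The same comparison can be sketched in C++, where the typed pointer is likewise converted to void * before the comparison. The function name void_and_typed_agree is made up for the example.

```cpp
#include <cassert>

bool void_and_typed_agree() {
    int value = 42;
    int* typed = &value;
    void* untyped = &value;
    return untyped == typed;  // typed is converted to void* and both name the same object
}

void run_checks() {
    assert(void_and_typed_agree());
}
```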

Is computing a pointer to uninitialized memory undefined behavior in C?

There is no undefined behavior here. You can consider a single object as an array with one element. With pointer arithmetic, the pointer may point one past the last element of the array, so this statement

p = p + 1 - 1;

is correct.

From the C Standard (6.5.6 Additive operators)

7 For the purposes of these operators, a pointer to an object that is
not an element of an array behaves the same as a pointer to the first
element of an array of length one with the type of the object as its
element type.

and


8 ...Moreover, if the expression P points to the last element of an
array object, the expression (P)+1 points one past the last element of
the array object, and if the expression Q points one past the last
element of an array object, the expression (Q)-1 points to the last
element of the array object.

Note also that


8 ...If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise,
the behavior is undefined.
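The statement from the question can be checked with a small sketch (written in C++, which has the same array-of-one rule). The function name one_past_and_back is invented for the example.

```cpp
#include <cassert>

bool one_past_and_back() {
    int x = 7;
    int* p = &x;
    int* end = p + 1;  // one past the single "element": valid to form, not to dereference
    p = p + 1 - 1;     // steps to one past the end and back again
    return p == &x && end - p == 1 && *p == 7;
}

void run_checks() {
    assert(one_past_and_back());
}
```

Writing p + 2 instead would leave the range the quoted paragraph allows, and the behavior would then be undefined.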

