Is It Unspecified Behavior to Compare Pointers to Different Arrays for Equality

Is it unspecified behavior to compare pointers to different arrays for equality?

The semantics for == and != explicitly say that they match the relational operators except for their truth-value result. So you need to look at what is defined for their truth-value result. If the standard says that the result is unspecified, then it is unspecified. If it defines specific rules, then it is not. In particular, it says:

Two pointers of the same type compare equal if and only if they are both null, both point to the same function, or both represent the same address

Is pointer comparison undefined or unspecified behavior in C++?

Note that pointer subtraction and pointer comparison are different operations with different rules.

C++14 5.7/6, on subtracting pointers:

Unless both pointers point to elements of the same array object or one past the last element of the array object, the behavior is undefined.

C++14 5.9/3-4:

Comparing pointers to objects is defined as follows:
If two pointers point to different elements of the same array, or to subobjects thereof, the pointer to the element with the higher subscript compares greater.
If one pointer points to an element of an array, or to a subobject thereof, and another pointer points one past the last element of the array, the latter pointer compares greater.
If two pointers point to different non-static data members of the same object, or to subobjects of such members, recursively, the pointer to the later declared member compares greater provided the two members have the same access control and provided their class is not a union.
If two operands p and q compare equal (5.10), p<=q and p>=q both yield true and p<q and p>q both yield false. Otherwise, if a pointer p compares greater than a pointer q, p>=q, p>q, q<=p, and q<p all yield true, and p<=q, p<q, q>=p, and q>p all yield false. Otherwise, the result of each of the operators is unspecified.

How does pointer comparison work in C? Is it ok to compare pointers that don't point to the same array?

According to the C11 standard, the relational operators <, <=, >, and >= may only be used on pointers to elements of the same array or struct object. This is spelled out in section 6.5.8p5:

When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.

Note that any comparisons that do not satisfy this requirement invoke undefined behavior, meaning (among other things) that you can't depend on the results to be repeatable.

In your particular case, for both the comparison between the addresses of two local variables and between the address of a local and a dynamic address, the operation appeared to "work"; however, the result could change by making a seemingly unrelated change to your code or even compiling the same code with different optimization settings. With undefined behavior, just because the code could crash or generate an error doesn't mean it will.

As an example, an x86 processor running in 8086 real mode has a segmented memory model using a 16-bit segment and a 16-bit offset to build a 20-bit address. So in this case an address doesn't convert exactly to an integer.

The equality operators == and != however do not have this restriction. They can be used between any two pointers to compatible types or NULL pointers. So using == or != in both of your examples would produce valid C code.

However, even with == and != you could get some unexpected yet still well-defined results. See Can an equality comparison of unrelated pointers evaluate to true? for more details on this.

Regarding the exam question given by your professor, it makes a number of flawed assumptions:

A flat memory model exists where there is a 1-to-1 correspondence between an address and an integer value.
That the converted pointer values fit inside an integer type.
That the implementation simply treats pointers as integers when performing comparisons without exploiting the freedom given by undefined behavior.
That a stack is used and that local variables are stored there.
That a heap is used to pull allocated memory from.
That the stack (and therefore local variables) appears at a higher address than the heap (and therefore allocated objects).
That string constants appear at a lower address than the heap.

If you were to run this code on an architecture and/or with a compiler that does not satisfy these assumptions then you could get very different results.

Both examples also exhibit undefined behavior when they call strcpy, since the right operand (in some cases) points to a single character and not a null-terminated string, resulting in the function reading past the bounds of the given variable.

Can an equality comparison of unrelated pointers evaluate to true?

Yes, but ...

int a;
int b;
printf("a precedes b: %d\n", (&a + 1) == &b);
printf("b precedes a: %d\n", (&b + 1) == &a);

There are, by my interpretation of the C standard, three possibilities:

a immediately precedes b
b immediately precedes a
neither a nor b immediately precedes the other (there could be a gap, or another object, between them)

I played around with this some time ago and concluded that GCC was performing an invalid optimization on the == operator for pointers, making it yield false even when the addresses are the same, so I submitted a bug report:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63611

That bug was closed as a duplicate of another report:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502

The GCC maintainers who responded to these bug reports seem to be of the opinion that adjacency of two objects need not be consistent and that the comparison of their addresses might show them to be adjacent or not, within the same run of the program. As you can see from my comments on the second Bugzilla ticket, I strongly disagree. In my opinion, without consistent behavior of the == operator, the standard's requirements for adjacent objects are meaningless, and I think we have to assume that those words are not merely decorative.

Here's a simple test program:

#include <stdio.h>
int main(void) {
    int x;
    int y;
    printf("&x = %p\n&y = %p\n", (void*)&x, (void*)&y);
    if (&y == &x + 1) {
        puts("y immediately follows x");
    }
    else if (&x == &y + 1) {
        puts("x immediately follows y");
    }
    else {
        puts("x and y are not adjacent");
    }
}

When I compile it with GCC 6.2.0, the printed addresses of x and y differ by exactly 4 bytes at all optimization levels, but I get y immediately follows x only at -O0; at -O1, -O2, and -O3 I get x and y are not adjacent. I believe this is incorrect behavior, but apparently, it's not going to be fixed.

clang 3.8.1, in my opinion, behaves correctly, showing x immediately follows y at all optimization levels. Clang previously had a problem with this; I reported it:

https://bugs.llvm.org/show_bug.cgi?id=21327

and it was corrected.

I suggest not relying on comparisons of addresses of possibly adjacent objects behaving consistently.

(Note that relational operators (<, <=, >, >=) on pointers to unrelated objects have undefined behavior, but equality operators (==, !=) are generally required to behave consistently.)

Comparing pointers that are not necessarily associated with the same array

Yes. Well, unspecified not undefined, which is much safer.

Converting to std::intptr_t or std::uintptr_t and back is a guaranteed round trip, however. Also, std::less<>{}( a, b ) is guaranteed to be well behaved and consistent with < when < is specified.

This unspecified behaviour permits three things.

Originally, segmented memory: pointers could ignore the segment and compare faster.
Now, it permits certain optimizations, such as assuming that compared pointers were derived in certain ways. If the assumption is violated, the compiler can return anything.
It blocks this comparison in constant-evaluated code.

However, most compilers do not aggressively blow up when you violate that rule. So it isn't a super high priority fix. At least one compiler actually implements less as a raw <.

Is comparing two pointers with < undefined behavior if they are both cast to an integer type?

The conversion is legal but there is, technically, no meaning defined for the result. If instead you convert the pointer to void * and then convert to uintptr_t, there is slight meaning defined: Performing the reverse operations will reproduce the original pointer (or something equivalent).

In particular, you cannot rely on the fact that one integer is less than another to mean it is earlier in memory or has a lower address.

The specification for uintptr_t (C 2018 7.20.1.4 1) says it has the property that any valid void * can be converted to uintptr_t, then converted back to void *, and the result will compare equal to the original pointer.

However, when you convert an unsigned char * to uintptr_t, you are not converting a void * to uintptr_t. So 7.20.1.4 does not apply. All we have is the general definition of pointer conversions in 6.3.2.3, in which paragraphs 5 and 6 say:

An integer may be converted to any pointer type. Except as previously specified [involving zero for null pointers], the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
Any pointer type may be converted to an integer type. Except as previously specified [null pointers again], the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

So these paragraphs are no help except they tell you that the implementation documentation should tell you whether the conversions are useful. Undoubtedly they are in most C implementations.

In your example, you actually start with a void * from a parameter and convert it to unsigned char * and then to uintptr_t. So the remedy there is simple: Convert to uintptr_t directly from the void *.

For situations where we have some other pointer type, not void *, then 6.3.2.3 1 is useful:

A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.

So, converting to and from void * is defined to preserve the original pointer, so we can combine it with a conversion from void * to uintptr_t:

(uintptr_t) (void *) A < (uintptr_t) (void *) B

Since (void *) A must be able to produce the original A upon conversion back, and (uintptr_t) (void *) A must be able to produce its (void *) A, then (uintptr_t) (void *) A and (uintptr_t) (void *) B must be different if A and B are different.

And that is all we can say from the C standard about the comparison. Converting from pointers to integers might produce the address bits out of order or some other oddities. For example, the conversion might produce a 32-bit integer containing a 16-bit segment address and a 16-bit offset. Some of those integers might have higher values for lower addresses while others have lower values for lower addresses. Worse, the same address might have two representations, so the comparison might indicate “less than” even though A and B refer to the same object.

How to compare pointers?

Yes, that is the definition of raw pointer equality: they both point to the same location (or are pointer aliases), usually in the virtual address space of the process running your C++ application under some operating system. C++ can also be used to program embedded devices whose microcontrollers have a Harvard architecture; on such microcontrollers some pointer casts are forbidden and make no sense, since read-only data could sit in code ROM.

For C++, read a good C++ programming book, see this C++ reference website, read the documentation of your C++ compiler (perhaps GCC or Clang) and consider coding with smart pointers. Maybe also read a draft C++ standard, such as n4713, or buy the official standard from your ISO representative.

The concepts and terminology of garbage collection are also relevant when managing pointers and memory zones obtained by dynamic allocation (e.g. ::operator new), so read perhaps the GC handbook.

For pointers on Linux machines, see also this.

As per author below first version of compare will be undefined if two pointer pointing to different array

The operators >, >=, < and <= invoke undefined behaviour when applied to pointers into different arrays according to the C language standard; in C++ the result is unspecified instead. It's a pain. (== and != don't have undefined behaviour if the pointers are valid; the only problem is that a pointer past the end of one object may compare equal to a pointer to the start of another object. For example, declare int a[1], b[1] and compare &a[1] and &b[0].)

The std::less function object doesn't have this problem. It has defined behaviour in those cases as well. It has defined behaviour because the C++ standard says so, and it is up to the implementor of the standard library to make it work. On most current implementations std::less is just as efficient as <.
