Is the Behavior of Subtracting Two Null Pointers Defined

Is the behavior of subtracting two NULL pointers defined?

In C99, it's technically undefined behavior. C99 §6.5.6 says:

7) For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.

[...]

9) When two pointers are subtracted, both shall point to elements of the same array object,
or one past the last element of the array object; the result is the difference of the
subscripts of the two array elements. [...]

And §6.3.2.3/3 says:

An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.55) If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

So, since a null pointer compares unequal to a pointer to any object, it violates the preconditions of 6.5.6/9, and the subtraction is therefore undefined behavior. In practice, though, I'd be willing to bet that pretty much every compiler will return a result of 0 without any ill side effects.

In C89, it's also undefined behavior, though the wording of the standard is slightly different.

C++03, on the other hand, does have defined behavior in this instance. The standard makes a special exception for subtracting two null pointers. C++03 §5.7/7 says:

If the value 0 is added to or subtracted from a pointer value, the result compares equal to the original pointer value. If two pointers point to the same object or both point one past the end of the same array or both are null, and the two pointers are subtracted, the result compares equal to the value 0 converted to the type ptrdiff_t.

C++11 (as well as the latest C++14 draft, n3690) has wording identical to C++03, with just the minor change of std::ptrdiff_t in place of ptrdiff_t.

Can we subtract NULL pointers?

Subtracting two NULL pointers is not allowed. Section 6.5.6p9 of the C standard states:

When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. The size of the result is implementation-defined, and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. If the result is not representable in an object of that type, the behavior is undefined. In other words, if the expressions P and Q point to, respectively, the i-th and j-th elements of an array object, the expression (P)-(Q) has the value i−j provided the value fits in an object of type ptrdiff_t. Moreover, if the expression P points either to an element of an array object or one past the last element of an array object, and the expression Q points to the last element of the same array object, the expression ((Q)+1)-(P) has the same value as ((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the expression P points one past the last element of the array object, even though the expression (Q)+1 does not point to an element of the array object.

Because neither pointer points to an array object, the behavior is undefined.

You also can't subtract two void * because void is an incomplete type, and pointer subtraction depends on knowing the size of the pointed-to object. You could cast each pointer to an intptr_t and subtract those; however, that would give you the byte difference between the pointers, not the index difference.

Why pointer subtraction is undefined behavior in C++?

For the code, as you've written it, the answer is basically the same for C++ as it is for C: you get defined behavior if and only if the two pointers involved refer to parts of the same array, or one past its end (where, as @bathsheba already noted, a non-array object is treated as being the same as an array of one item).

C++ does, however, add one wrinkle that might be useful to know about here: even though neither subtraction nor the ordered comparison operators (e.g., <) are required to produce meaningful results when applied to pointers into separate objects, std::less<T> and friends from <functional> are required to do so. So, given two separate objects like this:

Object a;
Object b;

...comparing the addresses of the two objects with the comparison objects must "yield a strict total order that is consistent among those specializations and is also consistent with the partial order imposed by the built-in operators <, >, <=, >=." (N4659, [comparisons]/2).

As such, you could write your function something like this:

template <typename Ty>
bool in_range(const Ty *test, const Ty *begin, const Ty *end)
{
    return std::less_equal<const Ty *>()(begin, test) &&
           std::less<const Ty *>()(test, end);
}

If you really want to maintain the original function signature, you could do that as well:

template <typename Ty>
bool in_range(const Ty *test, const Ty *r, size_t n)
{
    auto end = r + n;
    return std::less_equal<const Ty *>()(r, test) &&
           std::less<const Ty *>()(test, end);
}

[Note that I've written it using std::less_equal for the first comparison, and std::less for the second, to match the typically expected semantics of C++, where a range is defined as [begin, end).]

This does carry one proviso, though: you need to ensure that r points to the beginning of an array of at least n items1, or else the auto end = r + n; will produce undefined behavior.

At least for what I'd expect as the typical use-case for such a function, you can probably simplify usage a little bit by passing the array itself, rather than a pointer and explicit length:

template <class Ty, size_t N>
bool in_range(Ty (&array)[N], Ty *test) {
    return std::less_equal<Ty *>()(&array[0], test) &&
           std::less<Ty *>()(test, &array[0] + N);
}

In this case, you'd just pass the name of the array, and the pointer you want to test:

int foo[10];
int *bar = &foo[4];

std::cout << std::boolalpha << in_range(foo, bar) << "\n"; // returns true

This only supports testing against actual arrays though. If you attempt to pass a non-array item as the first parameter it simply won't compile:

int foo[10];
int bar;
int *baz = &foo[0];
int *ptr = new int[20];

std::cout << std::boolalpha << in_range(bar, baz) << "\n"; // won't compile
std::cout << std::boolalpha << in_range(ptr, baz) << "\n"; // won't compile either

The former probably prevents some accidents. The latter might not be quite as desirable. If we want to support both, we can do so via overloading (for all three situations, if we choose to):

template <class Ty, size_t N>
bool in_range(Ty (&array)[N], Ty *test) {
    return std::less_equal<Ty *>()(&array[0], test) &&
           std::less<Ty *>()(test, &array[0] + N);
}

template <class Ty>
bool in_range(Ty &a, Ty *b) { return &a == b; }

template <class Ty>
bool in_range(Ty a, Ty b, size_t N) {
    return std::less_equal<Ty>()(a, b) &&
           std::less<Ty>()(b, a + N);
}

void f() {
    double foo[10];
    double *x = &foo[0];
    double bar;
    double *baz = new double[20];

    std::cout << std::boolalpha << in_range(foo, x) << "\n";
    std::cout << std::boolalpha << in_range(bar, x) << "\n";
    std::cout << std::boolalpha << in_range(baz, x, 20) << "\n";
}


1. If you want to get really technical, it doesn't have to point to the beginning of the array; it just has to point to part of an array that's followed by at least n items in the array.

What is the rationale of making subtraction of two pointers not related to the same array undefined behavior?

Speaking more academically: pointers are not numbers. They are pointers.

It is true that a pointer on your system is implemented as a numerical representation of an address-like representation of a location in some abstract kind of memory (probably a virtual, per-process memory space).

But C++ doesn't care about that. C++ wants you to think of pointers as post-its, as bookmarks, to specific objects. The numerical address values are just a side-effect. The only arithmetic that makes sense on a pointer is forwards and backwards through an array of objects; nothing else is philosophically meaningful.

This may seem pretty arcane and useless, but it's actually deliberate and useful. C++ doesn't want to constrain implementations by attaching further meaning to practical, low-level machine properties that it cannot control. And, since there is no reason for it to do so (why would you want to do this?), it just says that the result is undefined.

In practice you may find that your subtraction works. However, compilers are extremely complicated and make great use of the standard's rules in order to generate the fastest code possible; that can and often will result in your program appearing to do strange things when you break the rules. Don't be too surprised if your pointer arithmetic operation is mangled when the compiler assumes that both the originating value and the result refer to the same array — an assumption that you violated.

Consequences of null pointer subtraction

When the C Standard was written, many hardware platforms performed pointer arithmetic in such a way that adding zero to a null pointer would yield a null pointer with no side effects, and subtracting one null pointer from another would yield zero with no side effects. These behaviors were often useful, since they could eliminate the need for corner-case code when performing tasks involving N-byte chunks of storage, where N might be zero.

Even though many platforms could support the aforementioned corner cases without having to generate any extra machine code, it was hardly clear that all platforms would be able to do so (I don't know of any particular platforms that couldn't, but wouldn't be at all surprised if some such platforms existed). The Standard thus handled such situations the same way as it handles other situations where almost all implementations would process a construct in the same useful fashion, but it might be impractical for all to do so: it categorized the action as "undefined behavior" but allowed implementations to, as a form of "conforming language extension", process it in a manner consistent with the underlying execution environment.

There was never any doubt about how such constructs should be processed on commonplace platforms. The only doubt would have been whether implementations whose target platforms would require extra machine code to yield the commonplace semantics should generate such extra machine code, and classifying such constructs as UB would allow such decisions to be made by people who were working with such platforms, and would thus be better placed than the Committee to weigh the costs and benefits of supporting the commonplace behavior.

Is this failing test that adds zero to a null pointer undefined behaviour, a compiler bug, or something else?

This looks like an MSVC bug: the C++14 draft standard explicitly allows adding or subtracting the value 0 to or from a pointer value, with the result comparing equal to the original pointer; from [expr.add]p7:

If the value 0 is added to or subtracted from a pointer value, the result compares equal to the original pointer value. If two pointers point to the same object or both point one past the end of the same array or both are null, and the two pointers are subtracted, the result compares equal to the value 0 converted to the type std::ptrdiff_t.

It looks like CWG defect 1776 led to p0137, which adjusted [expr.add]p7 to explicitly say null pointer.

The latest draft makes this even more explicit in [expr.add]p4:

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

- If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.

- Otherwise, if P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0 ≤ i+j ≤ n and the expression P - J points to the (possibly-hypothetical) element x[i−j] if 0 ≤ i−j ≤ n.

- Otherwise, the behavior is undefined.

This change was made editorially; see this github issue and this PR.

MSVC is inconsistent here in that it allows adding and subtracting zero in a constant expression, just like gcc and clang do. This is key because undefined behavior in a constant expression is ill-formed and therefore requires a diagnostic. Given the following:

constexpr int *p = nullptr;
constexpr int z = 0;
constexpr int *q1 = p + z;
constexpr int *q2 = p - z;

gcc, clang, and MSVC all allow it as a constant expression (live godbolt example), although sadly MSVC is doubly inconsistent in that it allows a non-zero value as well. Given the following:

constexpr int *p = nullptr;
constexpr int z = 1;
constexpr int *q1 = p + z;
constexpr int *q2 = p - z;

both clang and gcc say it is ill-formed while MSVC does not (live godbolt).

Why subtract null pointer in offsetof()?

The first version converts a pointer into an integer with a cast, which is not portable.

The second version is more portable across a wider variety of compilers, because it relies on pointer arithmetic by the compiler to get an integer result instead of a typecast.

BTW, I was the editor that added the original code to the Wiki entry, which was the Linux form. Later editors changed it to the more portable version.

Subtracting null pointer from a normal pointer?

It has undefined behaviour, since pointer arithmetic is only defined within an array.

In practice, on a machine that represents pointers as numerical addresses, with sufficiently large integers and the null pointer represented by address zero, it will convert a byte address into a word address. That is, it gives the number of int-sized lumps of memory between the null pointer's address and the address p.

Also there was a comment that this is a pure and re-entrant function.

"Pure" means that it has no side-effects, and the result only depends on its inputs. "re-entrant" means that it's safe to call while it's already in progress (for example, from an interrupt handler) - it has no internal static data.

Subtraction of two nullptr values guaranteed to be zero?

Yes, that is valid. It would be undefined in C, but C++ has added a special exception to the - operator to define the behaviour.

5.7 Additive operators [expr.add]

7 If the value 0 is added to or subtracted from a pointer value, the result compares equal to the original pointer value. If two pointers point to the same object or both point one past the end of the same array or both are null, and the two pointers are subtracted, the result compares equal to the value 0 converted to the type std::ptrdiff_t.

C++ Undefined Behavior when subtracting pointers

You are correct.

The behaviour on subtracting pointers that do not point to elements of the same array is undefined. For this purpose a pointer is allowed to be one beyond the final element of an array, and an object counts as an array of length one.

But once you cast a pointer to a suitable type, e.g. to a std::uintptr_t (assuming your compiler supports it; it doesn't have to), you can apply any arithmetic you want to it, subject to the constraints imposed upon you by that type.

Although such rules may seem obtuse, they are related to a similar rule where you are not allowed to read a pointer that doesn't point to valid memory. All of this helps achieve greater portability of the language.


