How to Write C/C++ Code Correctly When Null Pointer Is Not All Bits Zero

According to the C spec:

An integer constant expression with the value 0, or such an expression
cast to type void *, is called a null pointer constant. If a null
pointer constant is converted to a pointer type, the resulting
pointer, called a null pointer, is guaranteed to compare unequal to a
pointer to any object or function.

So 0 is a null pointer constant, and converting it to a pointer type yields a null pointer whose representation might not be all-bits-zero on some architectures. Next, let's see what the spec says about comparing a pointer with a null pointer constant:

If one operand is a
pointer and the other is a null pointer constant, the null pointer
constant is converted to the type of the pointer.

Let's consider (p == 0): first 0 is converted to a null pointer of p's type, and then p is compared with that null pointer, whose actual bit pattern is architecture-dependent.

Next, see what the spec says about the negation operator:

The result of the logical negation operator ! is 0 if the value of its
operand compares unequal to 0, 1 if the value of its operand compares
equal to 0. The result has type int. The expression !E is equivalent
to (0==E).

This means that (!p) is equivalent to (p == 0), which, according to the spec, tests p against the machine-defined null pointer.

Thus, you may safely write if (!p) even on architectures where the null pointer is not all-bits-zero.
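
As a minimal sketch of the three spellings discussed above (the helper names are hypothetical, chosen only for illustration), each function below asks the same question of a pointer and should give the same answer on any conforming C implementation, whatever bit pattern it uses for null pointers:

```c
#include <stddef.h>

/* Each helper returns 1 when p is a null pointer and 0 otherwise. */

/* Logical negation: !p is defined as (0 == p). */
int is_null_by_negation(const void *p) { return !p; }

/* Comparison with the integer constant 0, which is converted to a null pointer. */
int is_null_by_zero(const void *p) { return p == 0; }

/* Comparison with the NULL macro, itself a null pointer constant. */
int is_null_by_macro(const void *p) { return p == NULL; }
```

Calling any of the three with NULL returns 1, and calling them with the address of an object returns 0.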

As for C++, a null pointer constant is defined as:

A null pointer constant is an integral constant expression (5.19)
prvalue of integer type that evaluates to zero or a prvalue of type
std::nullptr_t. A null pointer constant can be converted to a pointer
type; the result is the null pointer value of that type and is
distinguishable from every other value of object pointer or function
pointer type.

This is close to what we have for C, plus the nullptr syntactic sugar. The behavior of operator == is defined by:

In addition, pointers to members can be compared, or a pointer to
member and a null pointer constant. Pointer to member conversions
(4.11) and qualification conversions (4.4) are performed to bring them
to a common type. If one operand is a null pointer constant, the
common type is the type of the other operand. Otherwise, the common
type is a pointer to member type similar (4.4) to the type of one of
the operands, with a cv-qualification signature (4.4) that is the
union of the cv-qualification signatures of the operand types. [ Note:
this implies that any pointer to member can be compared to a null
pointer constant. — end note ]

That leads to conversion of 0 to a pointer type (as for C). For the negation operator:

The operand of the logical negation operator ! is contextually
converted to bool (Clause 4); its value is true if the converted
operand is true and false otherwise. The type of the result is bool.

That means that the result of !p depends on how the conversion from pointer to bool is performed. The standard says:

A zero value, null pointer value, or null member pointer value is
converted to false;

So if (p == NULL) and if (!p) do the same thing in C++ too.

When NULL is not all-zero-bits, is an all-zero-bit pointer value also 'false'?

typedef struct { void * p; } obj;
obj * o = calloc(1, sizeof(obj));  // calloc takes (nmemb, size)
assert(o);  // Let us set aside the case of a failed allocation
printf("%s\n", o->p ? "true" : "false");  // 1st: could print "true" ?

Can I rely on calloc to produce a pointer value that will always evaluate to false in boolean contexts/comparisons?

No, the output could be "true" (*1).

The bit pattern of all zeros, as a pointer, may not be a null pointer.

7.22.3.2 The calloc function

2 The calloc function allocates space for an array of nmemb objects, each of whose size is size. The space is initialized to all bits zero.301)

Footnote 301) Note that this need not be the same as the representation of floating-point zero or a null pointer constant.

Example: An implementation may have only a single null pointer encoding, with a bit pattern of all ones. (void *)0 then converts the all-zeros bit pattern of int 0 to an all-ones void *. if (null_pointer) is always false, regardless of the bit pattern of the null pointer.

*1 Yet in practice, yes: the output is "false". Implementations that do not use the all-zero bit pattern as a null pointer are uncommon these days. Highly portable code would not rely on this, though. An old system, or a novel new one, may use the all-zero bit pattern as a non-null pointer and sadly break many a code base that assumes an all-zero bit pattern is a null pointer.
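
If portability to such implementations matters, one option is to assign the pointer members explicitly rather than rely on calloc's zero fill. The sketch below assumes the obj type from the question; obj_new is a hypothetical helper name, not something from the original code:

```c
#include <stdlib.h>

typedef struct { void *p; } obj;

/* Allocate one obj whose pointer member is a null pointer on any
   conforming implementation. Returns NULL if the allocation fails. */
obj *obj_new(void)
{
    obj *o = malloc(sizeof *o);
    if (o != NULL) {
        /* Assigning NULL stores the implementation's null pointer
           representation, whatever its bit pattern is. */
        o->p = NULL;
    }
    return o;
}
```

With this, o->p tests false in a boolean context regardless of how the platform encodes null pointers. Assigning *o = (obj){0}; after the allocation should have the same effect, because a pointer member initialized this way becomes a null pointer.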

How does the compiler handle a non-zero null pointer value in C?

if (!pointer)

If the C implementation used the value DEADBEEF₁₆ for a null pointer, the compiler would compile if (!pointer) to code such as:

    compare             pointer, #0xDEADBEEF
    branch-if-not-equal else-clause

if (pointer == 0)

An integer constant zero qualifies as a “null pointer constant” (C 2018 6.3.2.3 3). When a pointer is compared to a null pointer constant, the null pointer constant is converted to the type of the pointer (6.5.9 5). The compiler would implement this conversion by producing DEADBEEF₁₆ for the resulting pointer. Then it would compare pointer to DEADBEEF₁₆ and branch accordingly.

Simply put, just because the character “0” appears in source code does not mean the compiler has to use zero in the instructions it generates. It generates whatever instructions and values it needs to get the job done.

I am really unable to understand how this literal 0 becomes non-all-bits-zero when initialized to a pointer.

There is nothing about the character “0” that forces a compiler to give it a value of zero. The code for “0” is 48 in ASCII and 240 in EBCDIC, so the compiler is not starting with a value of zero when it processes this or other characters. Normally, when processing numerals, it has to read the digits and do some arithmetic to calculate the numbers represented by the numerals. It is that software that causes “0” to stand for the value zero or that causes “23” to stand for the value twenty-three. To make “0” represent a null pointer, the software in the compiler simply substitutes the internal value of a null pointer wherever “0” is used in a context for a pointer.

For example, in void *x = 0;, the compiler might initially convert “0” to zero, but this will be part of a data structure that also says this is a token or integer constant expression that currently has the value zero. When the compiler sees this integer constant expression is being used to initialize a pointer, it will change the value, and it will generate code that initializes the pointer to the internal value for a null pointer.
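
To see which representation an implementation actually produces, one can inspect the bytes of a null pointer. This is only a sketch and not part of the original answer: null_is_all_bits_zero is a hypothetical name, and the check covers only the representation the compiler generates for a null void *.

```c
#include <string.h>

/* Returns 1 if this implementation stores a null void * as all bits zero,
   0 otherwise. */
int null_is_all_bits_zero(void)
{
    void *null_ptr = 0;                         /* constant 0 converted to a null pointer */
    unsigned char zeros[sizeof(void *)] = {0};  /* all-bits-zero pattern of the same size */

    /* Compare the object representation of the null pointer with zero bytes. */
    return memcmp(&null_ptr, zeros, sizeof null_ptr) == 0;
}
```

On mainstream platforms this is expected to return 1. A return value of 0 would indicate one of the unusual implementations discussed here, where the source-level 0 and the stored bit pattern differ.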

Is NULL always zero in C?

I'm assuming you mean the null pointer. It is guaranteed to compare equal to 0.¹ But it doesn't have to be represented with all-zero bits.²

See also the comp.lang.c FAQ on null pointers.

¹ See C99, 6.3.2.3.
² There's no explicit claim; but see the footnote for C99, 7.20.3 (thanks to @birryree in the comments).

Why is address zero used for the null pointer?

Two points:

1. Only the constant value 0 in the source code is the null pointer; the compiler implementation can use whatever value it wants or needs in the running code. Some platforms have a special pointer value that's 'invalid' that the implementation might use as the null pointer. The C FAQ has a question, "Seriously, have any actual machines really used nonzero null pointers, or different representations for pointers to different types?", that points out several platforms that used this property of 0 being the null pointer in C source while represented differently at runtime. The C++ standard has a note that makes clear that converting "an integral constant expression with value zero always yields a null pointer, but converting other expressions that happen to have value zero need not yield a null pointer".
2. A negative value might be just as usable by the platform as an address. The C standard simply had to choose something to indicate a null pointer, and zero was chosen. I'm honestly not sure if other sentinel values were considered.

The only requirements for a null pointer are:

- it's guaranteed to compare unequal to a pointer to an actual object
- any two null pointers will compare equal (C++ refines this such that this only needs to hold for pointers to the same type)
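
The two requirements can be written down as a small C sketch. The function names are hypothetical and exist only for this illustration:

```c
#include <stddef.h>

/* Returns 1 if a null pointer compares unequal to the address of an object. */
int null_differs_from_object(void)
{
    int object = 0;
    int *null_ptr = NULL;
    return null_ptr != &object;
}

/* Returns 1 if null pointers obtained in different ways compare equal. */
int null_pointers_compare_equal(void)
{
    int *from_zero = 0;        /* from the integer constant 0 */
    int *from_macro = NULL;    /* from the NULL macro */
    char *other_type = NULL;   /* null pointer of a different pointer type */

    /* Converting a null pointer to void * yields a null pointer again. */
    return from_zero == from_macro
        && (void *)from_zero == (void *)other_type;
}
```

Both functions should return 1 on any conforming implementation, independent of the bit pattern used for null pointers.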

Does the Standard define the null pointer constant to have all bits set to zero?

No, NULL doesn't have to be all bits zero.

N1570 6.3.2.3 Pointers, paragraph 3:

An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant. 66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.

Note the word "converted": the integer 0 is converted if necessary, so the resulting pointer doesn't have to have the same bit representation.

Footnote 66 at the bottom of the page says:

66) The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant; see 7.19.

This leads us to a paragraph of that section:

The macros are
NULL
which expands to an implementation-defined null pointer constant

What is more, Annex J.3.12 (Portability issues, Implementation-defined behaviour, Library functions) lists:

— The null pointer constant to which the macro NULL expands (7.19).
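
One practical consequence, sketched here under the assumption that NULL may expand to a plain 0 of type int: where no pointer context exists, such as the variable arguments of a variadic function, nothing converts the constant to a pointer, so the caller should cast it explicitly. count_strings is a hypothetical helper written for this illustration:

```c
#include <stdarg.h>
#include <stddef.h>

/* Counts the strings in an argument list terminated by a null char pointer.
   count_strings is a hypothetical helper, not a standard function. */
int count_strings(const char *first, ...)
{
    va_list args;
    int count = 0;
    const char *s = first;

    va_start(args, first);
    while (s != NULL) {
        count++;
        /* Each variable argument is read as a char *, so the terminator
           must really be a pointer, not an int 0. */
        s = va_arg(args, char *);
    }
    va_end(args);
    return count;
}
```

A call would look like count_strings("a", "b", (char *)NULL). Passing a bare NULL or 0 as the terminator is not portable, because it may be passed as an int rather than as a pointer.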

Do you use NULL or 0 (zero) for pointers in C++?

Here's Stroustrup's take on this: C++ Style and Technique FAQ

In C++, the definition of NULL is 0, so there is only an aesthetic difference. I prefer to avoid macros, so I use 0. Another problem with NULL is that people sometimes mistakenly believe that it is different from 0 and/or not an integer. In pre-standard code, NULL was/is sometimes defined to something unsuitable and therefore had/has to be avoided. That's less common these days.

If you have to name the null pointer, call it nullptr; that's what it's called in C++11. Then, nullptr will be a keyword.

That said, don't sweat the small stuff.

When was the NULL macro not 0?

The C FAQ has some examples of historical machines with non-0 NULL representations.

From The C FAQ List, question 5.17:

Q: Seriously, have any actual machines really used nonzero null
pointers, or different representations for pointers to different
types?

A: The Prime 50 series used segment 07777, offset 0 for the null
pointer, at least for PL/I. Later models used segment 0, offset 0 for
null pointers in C, necessitating new instructions such as TCNP (Test
C Null Pointer), evidently as a sop to [footnote] all the extant
poorly-written C code which made incorrect assumptions. Older,
word-addressed Prime machines were also notorious for requiring larger
byte pointers (char *'s) than word pointers (int *'s).

The Eclipse MV series from Data General has three architecturally
supported pointer formats (word, byte, and bit pointers), two of which
are used by C compilers: byte pointers for char * and void *, and word
pointers for everything else. For historical reasons during the
evolution of the 32-bit MV line from the 16-bit Nova line, word
pointers and byte pointers had the offset, indirection, and ring
protection bits in different places in the word. Passing a mismatched
pointer format to a function resulted in protection faults.
Eventually, the MV C compiler added many compatibility options to try
to deal with code that had pointer type mismatch errors.

Some Honeywell-Bull mainframes use the bit pattern 06000 for
(internal) null pointers.

The CDC Cyber 180 Series has 48-bit pointers consisting of a ring,
segment, and offset. Most users (in ring 11) have null pointers of
0xB00000000000. It was common on old CDC ones-complement machines to
use an all-one-bits word as a special flag for all kinds of data,
including invalid addresses.

The old HP 3000 series uses a different addressing scheme for byte
addresses than for word addresses; like several of the machines above
it therefore uses different representations for char * and void *
pointers than for other pointers.

The Symbolics Lisp Machine, a tagged architecture, does not even have
conventional numeric pointers; it uses the pair <NIL, 0> (basically a
nonexistent <object, offset> handle) as a C null pointer.

Depending on the "memory model" in use, 8086-family processors (PC
compatibles) may use 16-bit data pointers and 32-bit function
pointers, or vice versa.

Some 64-bit Cray machines represent int * in the lower 48 bits of a
word; char * additionally uses some of the upper 16 bits to indicate a
byte address within a word.

How to set a pointer to zero'th location?

One way is to store all-bits-zero into your pointer:

#include <string.h>  // for memset

void* zero;
memset(&zero, 0, sizeof(zero));  // sets every byte of the pointer object to zero

Why NULL is not predefined by the compiler

C 2011 Standard, online draft

6.3.2.3 Pointers
...

3 An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.⁶⁶⁾ If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

66) The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant; see 7.19.

The macro NULL is always defined as a zero-valued constant expression; it can be a naked 0, or 0 cast to void *, or some other integral expression that evaluates to 0. As far as your source code is concerned, NULL will always evaluate to 0.

Once the code has been translated, any occurrence of the null pointer constant (0, NULL, etc.) in a pointer context will be replaced with whatever the underlying architecture uses for a null pointer, which may or may not be 0-valued.
