Why Is Address Zero Used For the Null Pointer

Why is address zero used for the null pointer?

2 points:

only the constant value 0 in the source code is the null pointer - the compiler implementation can use whatever value it wants or needs in the running code. Some platforms have a special pointer value that's 'invalid' that the implementation might use as the null pointer. The C FAQ has a question, "Seriously, have any actual machines really used nonzero null pointers, or different representations for pointers to different types?", that points out several platforms that used this property of 0 being the null pointer in C source while represented differently at runtime. The C++ standard has a note that makes clear that converting "an integral constant expression with value zero always yields a null pointer, but converting other expressions that happen to have value zero need not yield a null pointer".
a negative value might be just as usable by the platform as an address - the C standard simply had to choose something to use to indicate a null pointer, and zero was chosen. I'm honestly not sure if other sentinel values were considered.

The only requirements for a null pointer are:

it's guaranteed to compare unequal to a pointer to an actual object
any two null pointers will compare equal (C++ refines this such that this only needs to hold for pointers to the same type)
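
The two requirements above can be checked directly. This is a minimal sketch; the function name null_guarantees_hold is made up for illustration.

```c
#include <stddef.h>

/* Returns 1 if both guarantees hold for a sample object. */
int null_guarantees_hold(void)
{
    int object = 0;
    int *to_object = &object;   /* pointer to an actual object */
    int *null_a = 0;            /* null pointer from the constant 0 */
    int *null_b = NULL;         /* null pointer from the NULL macro */

    /* A null pointer never compares equal to a pointer to an object. */
    int differs_from_object = (to_object != null_a);
    /* Any two null pointers of the same type compare equal. */
    int nulls_are_equal = (null_a == null_b);

    return differs_from_object && nulls_are_equal;
}
```

Neither comparison looks at how the null pointer is stored, so the result should be the same whatever representation the platform picks.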

Could I ever want to access the address zero?

In neither C nor C++ is the null-pointer value in any way tied to physical address 0. The fact that you use the constant 0 in the source code to set a pointer to the null-pointer value is nothing more than a piece of syntactic sugar. The compiler is required to translate it into the actual physical address used as the null-pointer value on the specific platform.

In other words, 0 in the source code has no physical importance whatsoever. It could have been 42 or 13, for example. I.e. the language authors, if they so pleased, could have made it so that you'd have to do p = 42 in order to set the pointer p to null-pointer value. Again, this does not mean that the physical address 42 would have to be reserved for null pointers. The compiler would be required to translate source code p = 42 into machine code that would stuff the actual physical null-pointer value (0x0000 or 0xBAAD) into the pointer p. That's exactly how it is now with constant 0.

Also note that neither C nor C++ provides a strictly defined feature that would allow you to assign a specific physical address to a pointer. So your question about "how one would assign 0 address to a pointer" formally has no answer. You simply can't assign a specific address to a pointer in C/C++. However, in the realm of implementation-defined features, the explicit integer-to-pointer conversion is intended to have that effect. So you'd do it as follows:

#include <stdint.h>   /* uintptr_t */

uintptr_t address = 0;
void *p = (void *) address;

Note that this is not the same as doing

void *p = 0;

The latter always produces the null-pointer value, while the former in the general case does not. The former will normally produce a pointer to physical address 0, which might or might not be the null-pointer value on the given platform.
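
A minimal sketch of the difference between the two forms. The helper names from_constant_zero, from_runtime_integer and address_zero_is_null are invented here, and the result of the integer conversion is implementation-defined, so no particular outcome is claimed for the last function.

```c
#include <stddef.h>
#include <stdint.h>

/* Always yields a null pointer: the 0 here is a null pointer constant. */
void *from_constant_zero(void)
{
    void *p = 0;
    return p;
}

/* Converts a run-time integer to a pointer. The result is
   implementation-defined and need not be a null pointer. */
void *from_runtime_integer(uintptr_t address)
{
    return (void *)address;
}

/* Reports whether address 0 happens to be the null pointer on this platform. */
int address_zero_is_null(void)
{
    return from_runtime_integer(0) == NULL;
}
```

On mainstream platforms address_zero_is_null() most likely returns 1, but only from_constant_zero() is guaranteed by the language to return a null pointer.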

Is it safe to assume that the NULL constant is zero?

NULL will compare equal to 0.

NULL is very commonly a zero bit pattern. It is possible for NULL to have a non-zero bit pattern, but that is not seen these days.

OP is mixing at least 4 things: NULL, null pointer constant, null pointer, comparing a null pointer to 0. C does not define a NULL constant.

NULL

NULL is a macro "which expands to an implementation-defined null
pointer constant" C17dr § 7.19 3

null pointer constant

An integer constant expression with the value 0, or such an expression
cast to type void *, is called a null pointer constant. C17dr § 6.3.2.3 3

Thus the type of a null pointer constant may be int, unsigned, long, ... or void * .

When an integer constant expression¹, the null pointer constant value is 0. As a pointer like ((void *)0), its value/encoding is not specified. It ubiquitously does have the bit pattern of zeros, but is not specified so.

There may be many null pointer constants. They all compare equal to each other.

Note: the size of a null pointer constant, when it is an integer, may differ from the size of an object pointer. This size difference is often avoided by appending an L or LL suffix as needed.
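
The size difference matters mostly for variadic functions, where no conversion to a pointer type happens. A sketch, assuming a hypothetical count_strings function that reads char * arguments until it meets a null pointer:

```c
#include <stdarg.h>
#include <stddef.h>

/* Counts string arguments up to a terminating null pointer. */
size_t count_strings(const char *first, ...)
{
    size_t count = 0;
    va_list args;

    va_start(args, first);
    /* Each argument after the first is read back as a char *. */
    for (const char *s = first; s != NULL; s = va_arg(args, char *))
        count++;
    va_end(args);
    return count;
}
```

A call such as count_strings("a", "b", (char *)0) passes a real pointer as the sentinel. A bare 0 would be passed as an int, which may be narrower than a pointer, so the cast (or a suitably typed null pointer) is the safer choice.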

null pointer

If a null pointer constant is converted to a pointer type, the
resulting pointer, called a null pointer, is guaranteed to compare
unequal to a pointer to any object or function. C17dr § 6.3.2.3 3
Conversion of a null pointer to another pointer type yields a null
pointer of that type. Any two null pointers shall compare equal. C17dr
§ 6.3.2.3 4

The type of null pointer is some pointer, either an object pointer like int *, char * or function pointer like int (*)(int, int) or void *.

The value of a null pointer is not specified. It ubiquitously does have the bit pattern of zeros, but is not specified so.

All null pointers compare equal, regardless of their encoding.

comparing a null pointer to 0

if(!ptr) is the same as if(!(ptr != 0)). When the pointer ptr, which is a null pointer, is compared to 0, the zero is converted to a pointer, a null pointer of the same type: int *. These 2 null pointers, which could have different bit patterns, compare as equal.
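
The equivalent spellings of the test can be put side by side. The three function names below are illustrative only.

```c
#include <stddef.h>

/* Three spellings of the same null test; each returns 1 for a null pointer. */
int is_null_not(const int *ptr)   { return !ptr; }
int is_null_zero(const int *ptr)  { return ptr == 0; }
int is_null_macro(const int *ptr) { return ptr == NULL; }
```

In each case the 0 or NULL is converted to a null pointer of type const int * before the comparison, so all three agree for any argument.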

So when is it not safe to assume that the NULL constant is zero?

NULL may be a ((void*)0) and its bit pattern may differ from zeros. It does compare equal to 0 as above regardless of its encoding. Recall pointer compares have been discussed, not integer compares. Converting NULL to an integer may not result in an integer value of 0 even if ((void*)0) was all zero bits.

printf("%ju\n", (uintmax_t)(uintptr_t)NULL); // Possibly not 0

Notice this is converting a pointer to an integer, not the case of if(!ptr) where a 0 was converted to a pointer.

The C spec embraces many old ways of doing things and is open to novel new ones. I have never come across an implementation where NULL was not an all-zeros bit pattern. Given that much code exists that assumes NULL is all zero bits, I suspect only old, obscure implementations ever used a non-zero bit-pattern NULL, and that NULL is all but certain to be an all-zero bit pattern.
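
Code that relies on the all-zero assumption (for example, clearing a struct with memset and expecting its pointer members to be null) can check it by inspecting the object representation. A sketch; the function name null_is_all_zero_bits is made up, and it only looks at int *.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if this implementation stores a null int pointer as all zero bits. */
int null_is_all_zero_bits(void)
{
    int *null_ptr = NULL;
    unsigned char zeros[sizeof null_ptr] = {0};

    /* Compare the stored bytes of the pointer against a run of zero bytes. */
    return memcmp(&null_ptr, zeros, sizeof null_ptr) == 0;
}
```

The expected answer on current mainstream implementations is 1, but the standard does not promise it.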

¹ The null pointer constant is 1) an integer or 2) a void*. "When an integer ..." refers to the first case, not a cast or conversion of the second case as in (int)((void*)0).

Why is NULL/0 an illegal memory location for an object?

The null pointer does not actually have to be 0. It's guaranteed in the C spec that when a constant 0 value is given in the context of a pointer it is treated as null by the compiler, however if you do

char *foo = (void *)1;
--foo;
// do something with foo

You will access the 0-address, not necessarily the null pointer. In most cases the two happen to be the same, but that's not required, so we don't really have to waste that byte. Although, in the larger picture, if the null pointer isn't 0, it has to be something, so a byte is being wasted somewhere.

Edit: Edited out the use of NULL due to the confusion in the comments. Also, the main message here is "null pointer != 0, and here's some C/pseudo code that shows the point I'm trying to make." Please don't actually try to compile this or worry about whether the types are proper; the meaning is clear.

How compiler handles a non-zero null pointer value in C?

if (!pointer)

If the C implementation used the value DEADBEEF₁₆ for a null pointer, the compiler would compile if (!pointer) to code such as:

    compare             pointer, #0xDEADBEEF
    branch-if-not-equal else-clause

if (pointer == 0)

An integer constant zero qualifies as a “null pointer constant” (C 2018 6.3.2.3 3). When a pointer is compared to a null pointer constant, the null pointer constant is converted to the type of the pointer (6.5.9 5). The compiler would implement this conversion by producing DEADBEEF₁₆ for the resulting pointer. Then it would compare pointer to DEADBEEF₁₆ and branch accordingly.

Simply put, just because the character “0” appears in source code does not mean the compiler has to use zero in the instructions it generates. It generates whatever instructions and values it needs to get the job done.

I am really unable to understand how this literal 0 becomes non-all-bits-zero when initialized to a pointer.

There is nothing about the character “0” that forces a compiler to give it a value of zero. The code for “0” is 48 in ASCII and 240 in EBCDIC, so the compiler is not starting with a value of zero when it processes this or other characters. Normally, when processing numerals, it has to read the digits and do some arithmetic to calculate the numbers represented by the numerals. It is that software that causes “0” to stand for the value zero or that causes “23” to stand for the value twenty-three. To make “0” represent a null pointer, the software in the compiler simply substitutes the internal value of a null pointer wherever “0” is used in a context for a pointer.

For example, in void *x = 0;, the compiler might initially convert “0” to zero, but this will be part of a data structure that also says this is a token or integer constant expression that currently has the value zero. When the compiler sees this integer constant expression is being used to initialize a pointer, it will change the value, and it will generate code that initializes the pointer to the internal value for a null pointer.
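
The substitution can be modeled in ordinary C. This is a toy simulation, not real compiler output: sim_ptr, SIM_NULL_BITS and the three functions are hypothetical, and the identity mapping in sim_from_integer is just one possible implementation-defined choice.

```c
#include <stdint.h>

/* Hypothetical machine whose null pointer is stored as 0xDEADBEEF. */
#define SIM_NULL_BITS UINT32_C(0xDEADBEEF)

typedef struct { uint32_t bits; } sim_ptr;

/* Models what the compiler emits for `pointer = 0;` in source code. */
sim_ptr sim_assign_constant_zero(void)
{
    sim_ptr p = { SIM_NULL_BITS };
    return p;
}

/* Models `(void *)n` for a run-time integer n (assumed identity mapping). */
sim_ptr sim_from_integer(uint32_t n)
{
    sim_ptr p = { n };
    return p;
}

/* Models what the compiler emits for `!pointer` or `pointer == 0`. */
int sim_is_null(sim_ptr p)
{
    return p.bits == SIM_NULL_BITS;
}
```

On this imaginary machine the source constant 0 never reaches the generated code as the number zero; both the assignment and the test use the internal null value.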

Are these null pointers, or are they pointers to address 0?

p1 and p2 are null pointers; p3 is implementation defined,
and may be something else. (A comma operator cannot be part of
a constant expression. And the mapping of a non-constant
integral value 0 to a pointer is implementation defined.) C is
identical to C++ here.

p8 and p9 are both null pointers in C++, but not in C.

With regard to your comment on static_zero_2, there is no
requirement in either language that a literal zero be present,
anywhere. g++ defines NULL as the compiler built-in __null,
for example, and you can use (1 - 1), or '\0', or any other
constant expression evaluating to 0.

Access to field results in dereference of a null pointer in C

You haven't allocated memory for the structure itself, yet you are accessing its members in

void allocateStruct(int sizeN, A* aType) {
    aType->x = (int*)malloc(sizeN * sizeof(int));   
    aType->y = (int*)malloc(sizeN * sizeof(int));    
}

Allocate memory for the structure itself first:

atype = malloc(sizeof(A));

You need to return the address of atype to your main function because the pointer is passed by value; otherwise your changes in allocateStruct won't be visible in main and the allocation will leak. If you return the address, you don't need to pass atype as a parameter.

A* allocateStruct(int sizeN){
    A* atype;
    atype = malloc(sizeof(A));
    atype->x = malloc(sizeN * sizeof(int));
    atype->y = malloc(sizeN * sizeof(int));
    return atype;
}

and in main

atype = allocateStruct(5);

Also, you don't need to cast explicitly in C: malloc returns a void pointer, which converts implicitly to any object pointer type. For completeness, free everything you have malloced at the end of main so that you don't leak memory.

free(atype->x);
free(atype->y);
free(atype);
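
Put together, the pieces above might look like the following sketch. The layout of A is an assumption (the question's definition is not shown here), and the failure checks and the freeStruct helper are additions for illustration.

```c
#include <stdlib.h>

/* Assumed layout of A; the original definition is not shown. */
typedef struct {
    int *x;
    int *y;
} A;

/* Allocates an A plus two arrays of sizeN ints; returns NULL on failure. */
A *allocateStruct(int sizeN)
{
    if (sizeN <= 0)
        return NULL;

    A *atype = malloc(sizeof *atype);
    if (atype == NULL)
        return NULL;

    atype->x = malloc((size_t)sizeN * sizeof *atype->x);
    atype->y = malloc((size_t)sizeN * sizeof *atype->y);
    if (atype->x == NULL || atype->y == NULL) {
        free(atype->x);   /* free(NULL) is a no-op */
        free(atype->y);
        free(atype);
        return NULL;
    }
    return atype;
}

/* Releases everything allocateStruct obtained; accepts NULL. */
void freeStruct(A *atype)
{
    if (atype == NULL)
        return;
    free(atype->x);
    free(atype->y);
    free(atype);
}
```

In main, the caller would check the result of allocateStruct(5) against NULL before touching atype->x or atype->y, which is exactly the dereference the original code got wrong.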
