Is There a Platform or Situation Where Dereferencing (But Not Using) a Null Pointer to Make a Null Reference Will Behave Badly

Is using NULL references OK?

No. It is undefined behaviour and can lead to code to do anything (including reformatting you hard disk, core dumping or insulting your mother).

If you need to be able to pass NULL, then use pointers. Code that takes a reference can assume it refers to a valid object.


Addendum: The C++03 Standard (ISO/IEC 14882, 2nd edition 2003) says, in §8.3.2 "References", paragraph 4:

A reference shall be initialized to refer to a valid object
or function. [Note: in particular, a null reference cannot exist in a well-defined program, because the only
way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer,
which causes undefined behavior.
As described in 9.6, a reference cannot be bound directly to a bit-field. ]

[Bold added for emphasis]

What happens in OS when we dereference a NULL pointer in C?

Short answer: it depends on a lot of factors, including the compiler, processor architecture, specific processor model, and the OS, among others.

Long answer (x86 and x86-64): Let's go down to the lowest level: the CPU. On x86 and x86-64, that code will typically compile into an instruction or instruction sequence like this:

movl $10, 0x00000000

Which says to "store the constant integer 10 at virtual memory address 0". The Intel® 64 and IA-32 Architectures Software Developer Manuals describe in detail what happens when this instruction gets executed, so I'm going to summarize it for you.

The CPU can operate in several different modes, several of which are for backwards compatibility with much older CPUs. Modern operating systems run user-level code in a mode called protected mode, which uses paging to convert virtual addresses into physical addresses.

For each process, the OS keeps a page table which dictates how the addresses are mapped. The page table is stored in memory in a specific format (and protected so that they can not be modified by the user code) that the CPU understands. For every memory access that happens, the CPU translates it according to the page table. If the translation succeeds, it performs the corresponding read/write to the physical memory location.

The interesting things happen when the address translation fails. Not all addresses are valid, and if any memory access generates an invalid address, the processor raises a page fault exception. This triggers a transition from user mode (aka current privilege level (CPL) 3 on x86/x86-64) into kernel mode (aka CPL 0) to a specific location in the kernel's code, as defined by the interrupt descriptor table (IDT).

The kernel regains control and, based on the information from the exception and the process's page table, figures out what happened. In this case, it realizes that the user-level process accessed an invalid memory location, and then it reacts accordingly. On Windows, it will invoke structured exception handling to allow the user code to handle the exception. On POSIX systems, the OS will deliver a SIGSEGV signal to the process.

In other cases, the OS will handle the page fault internally and restart the process from its current location as if nothing happened. For example, guard pages are placed at the bottom of the stack to allow the stack to grow on demand up to a limit, instead of preallocating a large amount of memory for the stack. Similar mechanisms are used for achieving copy-on-write memory.

In modern OSes, the page tables are usually set up to make the address 0 an invalid virtual address. But sometimes it's possible to change that, e.g. on Linux by writing 0 to the pseudofile /proc/sys/vm/mmap_min_addr, after which it's possible to use mmap(2) to map the virtual address 0. In that case, dereferencing a null pointer would not cause a page fault.

The above discussion is all about what happens when the original code is running in user space. But this could also happen inside the kernel. The kernel can (and is certainly much more likely than user code to) map the virtual address 0, so such a memory access would be normal. But if it's not mapped, then what happens then is largely similar: the CPU raises a page fault error which traps into a predefined point at the kernel, the kernel examines what happened, and reacts accordingly. If the kernel can't recover from the exception, it will typically panic in some fashion (kernel panic, kernel oops, or a BSOD on Windows, e.g.) by printing out some debug information to the console or serial port and then halting.

See also Much ado about NULL: Exploiting a kernel NULL dereference for an example of how an attacker could exploit a null pointer dereference bug from inside the kernel in order to gain root privileges on a Linux machine.

Why dereferencing a null pointer is undefined behaviour?

Defining consistent behavior for dereferencing a NULL pointer would require the compiler to check for NULL pointers before each dereference on most CPU architectures. This is an unacceptable burden for a language that is designed for speed.

It also only fixes a small part of a larger problem - there are many ways to have an invalid pointer beyond a NULL pointer.

Dereferencing a null pointer

A null pointer is not a pointer to "memory [whose] address is simply 0". It's just a special pointer that doesn't point to anything valid.

The C language says that there are no requirements on the behaviour of a program that dereferences a null pointer.

C/C++ nullptr dereference

TL;DR &(*(char*)0) is well defined.

The C++ standard doesn't say that indirection of null pointer by itself has UB. Current standard draft, [expr.unary.op]

  1. The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T”, the type of the result is “T”. [snip]

  2. The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue or a qualified-id. [snip]

There is no UB unless the lvalue of the indirection expression is converted to an rvalue.


The C standard is much more explicit. C11 standard draft §6.5.3.2


  1. The unary & operator yields the address of its operand. If the operand has type "type", the result has type "pointer to type". If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the
    & operator were removed and the [] operator were changed to a + operator. Otherwise, the result is a pointer to the object or function designated by its operand.


Related Topics



Leave a reply



Submit