At What Point Does Dereferencing the Null Pointer Become Undefined Behavior

At what point does dereferencing the null pointer become undefined behavior?

Yes, it is undefined behavior. The C++ standard says that an "lvalue designates an object or function" (clause 3.10) and, for the unary * operator, that "the result [of dereferencing] is an lvalue referring to the object or function to which the expression points" (clause 5.3.1).

Since a null pointer does not point to any object or function, the standard gives no description of what happens when you dereference one. It is simply undefined behavior.

Why is dereferencing a null pointer undefined behaviour?

Defining consistent behavior for dereferencing a NULL pointer would require the compiler to check for NULL pointers before each dereference on most CPU architectures. This is an unacceptable burden for a language that is designed for speed.

It would also fix only a small part of a larger problem: there are many ways to have an invalid pointer beyond a NULL pointer, as the sketch below illustrates.
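
As a minimal sketch of that larger problem (assuming a hosted implementation with malloc()), here is a pointer that is invalid but not NULL, so a mandated NULL check would not catch it:

#include <stdlib.h>

int main(void) {
    int *p = malloc(sizeof *p);
    free(p);
    /* p is now dangling: invalid, yet not NULL */
    if (p != NULL) {
        *p = 1;   /* passes the NULL check but is still undefined behavior */
    }
    return 0;
}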

Is null pointer dereference undefined behavior in Objective-C?

Since Objective-C is essentially an object-oriented layer on top of C, pure C statements carry no special additional meaning. Accordingly, *(long*)0 = 0; is evaluated and interpreted exactly as in C (because it is C), and thus it invokes undefined behavior. As such, it is not guaranteed to do anything.

Is dereferencing a pointer that's equal to nullptr undefined behavior by the standard?

Since you are quoting C: dereferencing a null pointer is clearly undefined behavior per this quote from the Standard (emphasis mine):

(C11, 6.5.3.2p4) "If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.102)"

102): "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime."

The exact same quote appears in C99, and similar wording appears in C89/C90.
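
The footnote's other two categories of invalid values are just as easy to produce without ever touching NULL. A hedged sketch (note that even the misaligned pointer conversion itself may already be undefined per C11 6.3.2.3p7 if the alignment is wrong for int):

int main(void) {
    char buf[2 * sizeof(int)];
    int *misaligned = (int *)(buf + 1);   /* likely misaligned for int */

    int *dangling;
    {
        int x = 42;
        dangling = &x;
    }   /* x's lifetime ends here */

    /* Dereferencing misaligned or dangling with unary * is undefined
       behavior, even though neither pointer is NULL. */
    (void)misaligned;
    (void)dangling;
    return 0;
}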

Dereferencing the null pointer

The answer to this question is: it depends on which language standard you are following :-).

In C90 and C++, this is not valid because you perform indirection on the null pointer (by doing *p), and doing so results in undefined behavior.

However, in C99, this is valid, well-formed, and well-defined. In C99, if the operand of the unary & was obtained by applying the unary * or by subscripting ([]), then neither the & nor the * (or []) is applied. For example:

int* p = 0;
int* q = &*p; // In C99, this is equivalent to int* q = p;

Likewise,

int* p = 0;
int* q = &p[0]; // In C99, this is equivalent to int* q = p + 0;

From C99 §6.5.3.2/3:

If the operand [of the unary & operator] is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue.

Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator.

(and its footnote, #84):

Thus, &*E is equivalent to E (even if E is a null pointer)
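
A minimal compilable sketch of footnote 84's guarantee (the comparison result is an assumption based on the quoted equivalence; no dereference ever takes place):

#include <stdio.h>

int main(void) {
    int *p = 0;
    int *q = &*p;   /* C99: the & and * cancel out; p is never dereferenced */
    printf("%d\n", q == p);   /* prints 1 */
    return 0;
}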

Difference between dereferencing NULL pointer and uninitialised pointer

Your expectation of a runtime error is flawed.

Dereferencing an uninitialised/invalid pointer with arbitrary value can do anything at all.

That means the potential symptoms range from:

  • nothing happens
  • something happens
  • a runtime error happens
  • your source code is spontaneously edited to use proper standard headers instead of misguidedly hacking in implementation "bits"
  • your attitude in the comments section is magically improved
  • your cat is murdered
  • your cat is not murdered
  • your cat is murdered and not murdered
  • your cat murders itself
  • a black hole opens inside your cat
  • a cat opens up inside a black hole

and so on.

This is true for dereferencing NULL, too, but modern commodity hardware and operating systems tend to treat NULL dereferences specially, usually producing a segmentation fault to aid in diagnostics, as the sketch below shows. Obviously, a CPU cannot do that for arbitrary pointer values, because they may be valid as far as it knows!
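
A minimal sketch of that diagnostic (the segmentation fault is typical on desktop operating systems, not something the C standard guarantees):

int main(void) {
    int *p = 0;
    return *p;   /* undefined behavior; typically raises SIGSEGV (or an
                     access violation on Windows) because page 0 is unmapped */
}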

What does C6011 dereferencing null pointer mean in my program?

First of all, note that the warning is generated by your compiler (or static analyzer, or linter), not by your debugger, as you initially wrote.

The warning is telling you that your program might dereference a null pointer. The reason is that you call malloc() and then use the result (the pointer) without checking it for NULL. In this specific example, malloc() will most likely just return the requested block of memory; on a desktop computer or laptop, there is generally no reason why allocating 12 bytes would fail. That is why your application runs fine and exits successfully. However, if this were part of a larger application, or if it ran on a memory-limited system such as an embedded system, malloc() could fail and return NULL. Note that malloc() can fail not only when there is not enough memory available; it can also fail when there is no sufficiently large contiguous block available, due to fragmentation.

According to the C standard, dereferencing a NULL pointer is undefined behavior, meaning that anything could happen. On modern computers it would likely get your application killed (which could lead to data loss or corruption, depending on what the application does). On older computers or embedded systems the problem might be undetected and your application would read from or (worse) write to the address NULL (which is most likely 0, but even that isn't guaranteed by the C standard). This could lead to data corruption, crashes or other unexpected behavior at an arbitrary time after this happened.

Note that the compiler/analyzer/linter doesn't know anything about your application or the platform you will be running it on, and it doesn't make any assumptions about it. It just warns you about this possible problem. It's up to you to determine if this specific warning is relevant for your situation and how to deal with it.

Generally speaking, there are three things you can do about it:

  1. If you know for sure that malloc() would never fail (for example, in such a toy example that you would only run on a modern computer with gigabytes of memory) or if you don't care about the results (because the application will be killed by your OS and you don't mind), then there's no need for this warning. Just disable it in your compiler, or ignore the warning message.

  2. If you don't expect malloc() to fail but do want to be informed when it happens, the quick-and-dirty solution is to add assert(v != NULL); right after the malloc() call, as sketched after this list. Note that this will also exit your application when it happens, but in a slightly more controlled way, and you'll get an error message stating where the problem occurred. I would recommend this for simple hobby projects, where you do not want to spend much time on error handling and corner cases but just want to have some fun programming :-)

  3. When there is a realistic chance that malloc() could fail and you want well-defined behavior from your application, you should definitely add code to handle that situation (check for NULL values). In that case, you would generally have to do more than just add an if-statement: you would have to think about how the application can continue to work, or shut down gracefully, without requiring more memory allocations. And on an embedded system, you would also have to think about things such as memory fragmentation.
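
A minimal sketch of option 2 (assuming the same toy allocation as in the question; assert() aborts with a file-and-line diagnostic when NDEBUG is not defined):

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

int main(void) {
    uint32_t *v = malloc(3 * sizeof(uint32_t));
    assert(v != NULL);   /* aborts with a diagnostic here if malloc() failed */
    v[0] = 12;
    free(v);
    return 0;
}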

The easiest fix for the example code in question is to add the NULL check. This makes the warning go away, and (assuming malloc() does not fail) your program still runs the same:

#include <stdint.h>
#include <stdlib.h>

int main(void) {
    uint32_t *v = malloc(3 * sizeof(uint32_t));
    if (v != NULL) {   /* only use and free the memory if allocation succeeded */
        v[0] = 12;
        v[1] = 59;
        v[2] = 83;
        twice_three(v);   /* twice_three() is defined elsewhere in the question */
        free(v);
    }
    return 0;
}

What happens in OS when we dereference a NULL pointer in C?

Short answer: it depends on a lot of factors, including the compiler, processor architecture, specific processor model, and the OS, among others.

Long answer (x86 and x86-64): Let's go down to the lowest level: the CPU. On x86 and x86-64, that code will typically compile into an instruction or instruction sequence like this:

movl $10, 0x00000000

This instruction says "store the constant integer 10 at virtual memory address 0". The Intel® 64 and IA-32 Architectures Software Developer Manuals describe in detail what happens when this instruction gets executed, so I'm going to summarize it for you.

The CPU can operate in several different modes, several of which exist for backwards compatibility with much older CPUs. Modern operating systems run user-level code in protected mode (long mode on x86-64), which uses paging to convert virtual addresses into physical addresses.

For each process, the OS keeps a page table which dictates how addresses are mapped. The page table is stored in memory in a specific format that the CPU understands, and it is protected so that it cannot be modified by user code. For every memory access, the CPU translates the address according to the page table. If the translation succeeds, it performs the corresponding read or write to the physical memory location.

The interesting things happen when the address translation fails. Not all addresses are valid, and if any memory access generates an invalid address, the processor raises a page fault exception. This triggers a transition from user mode (aka current privilege level (CPL) 3 on x86/x86-64) into kernel mode (aka CPL 0) to a specific location in the kernel's code, as defined by the interrupt descriptor table (IDT).

The kernel regains control and, based on the information from the exception and the process's page table, figures out what happened. In this case, it realizes that the user-level process accessed an invalid memory location, and then it reacts accordingly. On Windows, it will invoke structured exception handling to allow the user code to handle the exception. On POSIX systems, the OS will deliver a SIGSEGV signal to the process.
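
A minimal POSIX sketch of that delivery (hedged: the handler must exit immediately, because returning from a SIGSEGV handler triggered by real undefined behavior is itself undefined):

#include <signal.h>
#include <unistd.h>

static void on_segv(int sig) {
    (void)sig;
    /* Only async-signal-safe calls are allowed here: write(), _exit(). */
    static const char msg[] = "caught SIGSEGV\n";
    write(STDERR_FILENO, msg, sizeof msg - 1);
    _exit(1);
}

int main(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_segv;
    sigaction(SIGSEGV, &sa, 0);

    int *p = 0;
    *p = 10;   /* typically page-faults; the kernel then delivers SIGSEGV */
    return 0;
}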

In other cases, the OS will handle the page fault internally and restart the process from its current location as if nothing happened. For example, guard pages are placed at the bottom of the stack to allow the stack to grow on demand up to a limit, instead of preallocating a large amount of memory for the stack. Similar mechanisms are used for achieving copy-on-write memory.

In modern OSes, the page tables are usually set up to make the address 0 an invalid virtual address. But sometimes it's possible to change that, e.g. on Linux by writing 0 to the pseudofile /proc/sys/vm/mmap_min_addr, after which it's possible to use mmap(2) to map the virtual address 0. In that case, dereferencing a null pointer would not cause a page fault.
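
A Linux-specific sketch of that scenario (hedged: it assumes vm.mmap_min_addr has been set to 0 and the process has sufficient privileges; otherwise the mmap() call simply fails):

#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    void *page0 = mmap((void *)0, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (page0 == MAP_FAILED) {
        perror("mmap");   /* expected unless mmap_min_addr permits address 0 */
        return 1;
    }
    /* Virtual address 0 is now mapped: a store through a null pointer no
       longer page-faults here, though it is still undefined behavior in C. */
    return 0;
}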

The above discussion is all about what happens when the original code runs in user space. But this could also happen inside the kernel. The kernel can map the virtual address 0 (and is much more likely than user code to do so), in which case such a memory access would be normal. But if it's not mapped, what happens is largely similar: the CPU raises a page fault that traps into a predefined point in the kernel, the kernel examines what happened, and it reacts accordingly. If the kernel can't recover from the exception, it will typically panic in some fashion (a kernel panic or oops on Linux, or a BSOD on Windows) by printing debug information to the console or serial port and then halting.

See also Much ado about NULL: Exploiting a kernel NULL dereference for an example of how an attacker could exploit a null pointer dereference bug from inside the kernel in order to gain root privileges on a Linux machine.

Why simply dereferencing a pointer assigned NULL does not crash

Dereferencing a null pointer is undefined behavior, and undefined behavior is not the same as an error: anything may happen once you trigger it. If you are asking the other way around:

Why does undefined behavior not cause an error instead of giving us strange behavior?

It may be due to many reasons. One reason is performance. For example, in MSVC's implementation of std::vector, operator[] performs no range check in Release mode. You can try this:

std::vector<int> v(4);
v[4] = 0;   // out of range: unchecked in Release mode, undefined behavior

It will compile and run; you may get strange behavior, or you may not. In Debug mode, however, the failed range check stops your program at run time. MSVC performs the check in Debug mode because performance does not matter there, and omits it in Release mode because performance does.

The same applies to dereferencing a null pointer. You can imagine the dereference being wrapped in a check like this:

// Imaginary code
T& dereference(T* ptr) {
    if (ptr == nullptr) {
        throw;
    }
    return *ptr;
}

The check if (ptr == nullptr) { throw; } would slow down every single pointer dereference, which is not desirable.

However, it may be done like this:

// Imaginary code
T& dereference(T* ptr) {
#ifdef DEBUG
    if (ptr == nullptr) {
        throw;
    }
#endif
    return *ptr;
}

I think you get the idea now.


