Dereferencing an Invalid Pointer, Then Taking the Address of the Result

Assuming the variable `ptr' does not contain a pointer to a valid object, undefined behavior occurs if the program requires the lvalue-to-rvalue conversion of the expression `*ptr', as specified in [conv.lval] (ISO/IEC 14882:2011, page 82, 4.1 [#1]).

During the evaluation of `&*ptr', the program does not require the lvalue-to-rvalue conversion of the subexpression `*ptr', according to [expr.unary.op] (ISO/IEC 14882:2011, page 109, 5.3.1 [#3]).

Hence, `&*ptr' is legal.

Is dereferencing invalid pointers legal if no lvalue-to-rvalue conversion occurs?

[basic.compound] says:

Every value of pointer type is one of the following:

  • a pointer to an object or function (the pointer is said to point to the object or function), or
  • a pointer past the end of an object ([expr.add]), or
  • the null pointer value ([conv.ptr]) for that type, or
  • an invalid pointer value.

By the process of elimination we can deduce that p is an invalid pointer value.

[basic.stc] says:

Indirection through an invalid pointer value and passing an invalid
pointer value to a deallocation function have undefined behavior. Any
other use of an invalid pointer value has implementation-defined
behavior.

Since [expr.unary.op] says that the unary * operator performs indirection, I would say that the expression *p causes UB whether or not the result is used.

Is storing an invalid pointer automatically undefined behavior?

I have the C draft standard here, and it makes this undefined by omission. 6.5.6/8 defines the result of ptr + i only for these cases:

  • If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression.
  • Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object.

Your case fits neither of these. Your array is not large enough for the -1 to move the pointer to a different array element, and neither the result nor the original pointer points one past the end.

What does C6011 dereferencing null pointer mean in my program?

First of all, note that the warning is generated by your compiler (or static analyzer, or linter), not by your debugger, as you initially wrote.

The warning is telling you that your program might dereference a null pointer. The reason for this warning is that you call malloc() and then use the result (the pointer) without checking it for NULL. In this specific code example, malloc() will most likely just return the requested block of memory. On any desktop computer or laptop, there's generally no reason why it would fail to allocate 12 bytes. That's why your application runs fine and exits successfully. However, if this were part of a larger application and/or ran on a memory-limited system such as an embedded system, malloc() could fail and return NULL. Note that malloc() does not only fail if there is not enough memory available; it could also fail if there is no large enough contiguous block of memory available, due to fragmentation.

According to the C standard, dereferencing a NULL pointer is undefined behavior, meaning that anything could happen. On modern computers it would likely get your application killed (which could lead to data loss or corruption, depending on what the application does). On older computers or embedded systems the problem might be undetected and your application would read from or (worse) write to the address NULL (which is most likely 0, but even that isn't guaranteed by the C standard). This could lead to data corruption, crashes or other unexpected behavior at an arbitrary time after this happened.

Note that the compiler/analyzer/linter doesn't know anything about your application or the platform you will be running it on, and it doesn't make any assumptions about it. It just warns you about this possible problem. It's up to you to determine if this specific warning is relevant for your situation and how to deal with it.

Generally speaking, there are three things you can do about it:

  1. If you know for sure that malloc() would never fail (for example, in such a toy example that you would only run on a modern computer with gigabytes of memory) or if you don't care about the results (because the application will be killed by your OS and you don't mind), then there's no need for this warning. Just disable it in your compiler, or ignore the warning message.

  2. If you don't expect malloc() to fail, but do want to be informed when it happens, the quick-and-dirty solution is to add assert(v != NULL); after the malloc. Note that this will also exit your application when it happens, but in a slightly more controlled way, and you'll get an error message stating where the problem occurred. I would recommend this for simple hobby projects, where you do not want to spend much time on error handling and corner cases but just want to have some fun programming :-)

  3. When there is a realistic chance that malloc() could fail and you want well-defined behavior from your application, you should definitely add code to handle that situation (check for NULL values). If this is the case, you would generally have to do more than just add an if-statement. You would have to think about how the application can continue to work, or shut down gracefully, without requiring more memory allocations. And on an embedded system, you would also have to think about things such as memory fragmentation.

The easiest fix for the example code in question is to add the NULL check. This makes the warning go away, and (assuming malloc() does not fail) your program still runs the same.

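#include <stdint.h>
#include <stdlib.h>

void twice_three(uint32_t *v); // Defined elsewhere in the program from the question

int main(void) {
    uint32_t *v = malloc(3 * sizeof(uint32_t));
    if (v != NULL) {
        v[0] = 12;
        v[1] = 59;
        v[2] = 83;
        twice_three(v);
        free(v);
    }
    return 0;
}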

dereferencing the null pointer

The answer to this question is: it depends which language standard you are following :-).

In C90 and C++, this is not valid because you perform indirection on the null pointer (by doing *p), and doing so results in undefined behavior.

However, in C99, this is valid, well-formed, and well-defined. In C99, if the operand of the unary-& was obtained as the result of applying the unary-* or by performing subscripting ([]), then neither the & nor the * or [] is applied. For example:

int* p = 0;
int* q = &*p; // In C99, this is equivalent to int* q = p;

Likewise,

int* p = 0;
int* q = &p[0]; // In C99, this is equivalent to int* q = p + 0;

From C99 §6.5.3.2/3:

If the operand [of the unary & operator] is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue.

Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator.

(and its footnote, #84):

Thus, &*E is equivalent to E (even if E is a null pointer)

What does dereferencing a pointer mean?

Reviewing the basic terminology

It's usually good enough - unless you're programming assembly - to envisage a pointer containing a numeric memory address, with 1 referring to the second byte in the process's memory, 2 the third, 3 the fourth and so on.

  • What happened to 0 and the first byte? Well, we'll get to that later - see null pointers below.
  • For a more accurate definition of what pointers store, and how memory and addresses relate, see "More about memory addresses, and why you probably don't need to know" at the end of this answer.

When you want to access the data/value in the memory that the pointer points to - the contents of the address with that numerical index - then you dereference the pointer.

Different computer languages have different notations to tell the compiler or interpreter that you're now interested in the pointed-to object's (current) value - I focus below on C and C++.

A pointer scenario

Consider in C, given a pointer such as p below...

const char* p = "abc";

...four bytes with the numerical values used to encode the letters 'a', 'b', 'c', and a 0 byte to denote the end of the textual data, are stored somewhere in memory, and the numerical address of that data is stored in p. This way of encoding text in memory, used by C, is known as ASCIIZ.

For example, if the string literal happened to be at address 0x1000 and p a 32-bit pointer at 0x2000, the memory content would be:

Memory Address (hex)    Variable name    Contents
1000                                     'a' == 97 (ASCII)
1001                                     'b' == 98
1002                                     'c' == 99
1003                                     0
...
2000-2003               p                1000 hex

Note that there is no variable name/identifier for address 0x1000, but we can indirectly refer to the string literal using a pointer storing its address: p.

Dereferencing the pointer

To refer to the characters p points to, we dereference p using one of these notations (again, for C):

assert(*p == 'a');        // The first character at address p will be 'a'
assert(p[1] == 'b');      // p[1] actually dereferences a pointer created by adding
                          // p and 1 times the size of the things to which p points:
                          // In this case they're char which are 1 byte in C...
assert(*(p + 1) == 'b');  // Another notation for p[1]

You can also move pointers through the pointed-to data, dereferencing them as you go:

++p;  // Increment p so it's now 0x1001
assert(*p == 'b'); // p == 0x1001 which is where the 'b' is...

If you have some data that can be written to, then you can do things like this:

int x = 2;
int* p_x = &x; // Put the address of the x variable into the pointer p_x
*p_x = 4; // Change the memory at the address in p_x to be 4
assert(x == 4); // Check x is now 4

Above, you must have known at compile time that you would need a variable called x, and the code asks the compiler to arrange where it should be stored, ensuring the address will be available via &x.

Dereferencing and accessing a structure data member

In C, if you have a variable that is a pointer to a structure with data members, you can access those members using the -> dereferencing operator:

typedef struct X { int i_; double d_; } X;
X x;
X* p = &x;
p->d_ = 3.14159; // Dereference and access data member x.d_
(*p).d_ *= -1; // Another equivalent notation for accessing x.d_

Multi-byte data types

To use a pointer, a computer program also needs some insight into the type of data that is being pointed at - if that data type needs more than one byte to represent, then the pointer normally points to the lowest-numbered byte in the data.

So, looking at a slightly more complex example:

double sizes[] = { 10.3, 13.4, 11.2, 19.4 };
double* p = sizes;
assert(p[0] == 10.3);  // Knows to look at all the bytes in the first double value
assert(p[1] == 13.4);  // Actually looks at bytes from address p + 1 * sizeof(double)
                       // (sizeof(double) is almost always eight bytes)
++p;                   // Advance p by sizeof(double)
assert(*p == 13.4);    // The double at memory beginning at address p has value 13.4
*(p + 2) = 29.8;       // Change sizes[3] from 19.4 to 29.8
                       // Note earlier ++p and + 2 here => sizes[3]

Pointers to dynamically allocated memory

Sometimes you don't know how much memory you'll need until your program is running and sees what data is thrown at it... then you can dynamically allocate memory using malloc. It is common practice to store the address in a pointer...

int* p = (int*)malloc(sizeof(int)); // Get some memory somewhere...
*p = 10; // Dereference the pointer to the memory, then write a value in
fn(*p); // Call a function, passing it the value at address p
(*p) += 3; // Change the value, adding 3 to it
free(p); // Release the memory back to the heap allocation library

In C++, memory allocation is normally done with the new operator, and deallocation with delete:

int* p = new int(10); // Memory for one int with initial value 10
delete p;

p = new int[10]; // Memory for ten ints with unspecified initial value
delete[] p;

p = new int[10](); // Memory for ten ints that are value initialised (to 0)
delete[] p;

See also C++ smart pointers below.

Losing and leaking addresses

Often a pointer may be the only indication of where some data or buffer exists in memory. If ongoing use of that data/buffer is needed, or the ability to call free() or delete to avoid leaking the memory, then the programmer must operate on a copy of the pointer...

char* p;
if (asprintf(&p, "name: %s", name) < 0)  // Common but non-Standard printf-on-heap
    exit(EXIT_FAILURE);                  // On failure p is unusable

// Replace non-printable characters with underscores....
for (char* q = p; *q; ++q)
    if (!isprint((unsigned char)*q))
        *q = '_';

printf("%s\n", p); // Only q was modified
free(p);

...or carefully orchestrate reversal of any changes...

const size_t n = ...;
p += n;
...
p -= n; // Restore earlier value...
free(p);

C++ smart pointers

In C++, it's best practice to use smart pointer objects to store and manage the pointers, automatically deallocating them when the smart pointers' destructors run. Since C++11 the Standard Library provides two, unique_ptr for when there's a single owner for an allocated object...

{
std::unique_ptr<T> p{new T(42, "meaning")};
call_a_function(p);
// The function above might throw, so delete here is unreliable, but...
} // p's destructor's guaranteed to run "here", calling delete

...and shared_ptr for shared ownership (using reference counting)...

{
    auto p = std::make_shared<T>(3.14, "pi");
    number_storage1.may_add(p); // Might copy p into its container
    number_storage2.may_add(p); // Might copy p into its container
} // p's destructor will only delete the T if neither may_add copied it

Null pointers

In C, NULL and 0 - and additionally in C++ nullptr - can be used to indicate that a pointer doesn't currently hold the memory address of a variable, and shouldn't be dereferenced or used in pointer arithmetic. For example:

const char* p_filename = NULL; // Or "= 0", or "= nullptr" in C++
int c;
while ((c = getopt(argc, argv, "f:")) != -1)
    switch (c) {
    case 'f': p_filename = optarg; break;
    }
if (p_filename) // Only NULL converts to false
    ... // Only get here if -f flag specified

In C and C++, just as inbuilt numeric types don't necessarily default to 0, nor bools to false, pointers are not always set to NULL. All of these are set to 0/false/NULL when they're static variables or (C++ only) direct or indirect member variables of static objects or their bases, or when they undergo zero-initialisation. For example, for a class type T without a user-provided default constructor, new T() value-initialises and therefore zero-initialises T's members, including pointers, whereas new T does not, and new T(x, y, z) initialises members only as the constructor it calls specifies.

Further, when you assign 0, NULL or nullptr to a pointer, the bits in the pointer are not necessarily all reset: the pointer may not contain "0" at the hardware level, or refer to address 0 in your virtual address space. The compiler is allowed to store something else there if it has reason to, but whatever it does - if you come along and compare the pointer to 0, NULL, nullptr or another pointer that was assigned any of those, the comparison must work as expected. So, below the source code at the compiler level, "NULL" is potentially a bit "magical" in the C and C++ languages...

More about memory addresses, and why you probably don't need to know

More strictly, initialised pointers store a bit-pattern identifying either NULL or a (often virtual) memory address.

The simple case is where this is a numeric offset into the process's entire virtual address space; in more complex cases the pointer may be relative to some specific memory area, which the CPU may select based on CPU "segment" registers or some manner of segment id encoded in the bit-pattern, and/or looking in different places depending on the machine code instructions using the address.

For example, an int* properly initialised to point to an int variable might - after casting to a float* - access memory in "GPU" memory quite distinct from the memory where the int variable is, then once cast to and used as a function pointer it might point into further distinct memory holding machine opcodes for the program (with the numeric value of the int* effectively a random, invalid pointer within these other memory regions).

3GL programming languages like C and C++ tend to hide this complexity, such that:

  • If the compiler gives you a pointer to a variable or function, you can dereference it freely (as long as the variable's not destructed/deallocated meanwhile) and it's the compiler's problem whether e.g. a particular CPU segment register needs to be restored beforehand, or a distinct machine code instruction used.

  • If you get a pointer to an element in an array, you can use pointer arithmetic to move anywhere else in the array, or even to form an address one-past-the-end of the array that's legal to compare with other pointers to elements in the array (or that have similarly been moved by pointer arithmetic to the same one-past-the-end value); again in C and C++, it's up to the compiler to ensure this "just works"

  • Specific OS functions, e.g. shared memory mapping, may give you pointers, and they'll "just work" within the range of addresses that makes sense for them

  • Attempts to move legal pointers beyond these boundaries, or to cast arbitrary numbers to pointers, or use pointers cast to unrelated types, typically have undefined behaviour, so should be avoided in higher level libraries and applications, but code for OSes, device drivers, etc. may need to rely on behaviour left undefined by the C or C++ Standard, that is nevertheless well defined by their specific implementation or hardware.

Is apparent NULL pointer dereference in C actually pointer arithmetic?

This is not a bitwise "and"; the & here is the unary address-of operator, taking the address of the expression that follows it.

This is a well-known (if non-portable) hack for getting the offset of a struct member at run time. You are casting 0 to a pointer to struct hi, then referencing the 'b' member and taking its address; because the base address is 0, that address is numerically the offset of b. Then you add this offset to the pointer "ptr", giving the real address of the 'b' field of the struct pointed to by ptr, which is ob. Then you cast that pointer back to an int pointer (because b is an int) and output it. This is the 2nd print.
The first print outputs num, which is 4 not because b's value is 4, but because 4 is the offset of the b field in struct hi. That is sizeof(int), because b follows a, and a is an int...
Hope this makes sense :)


