Dangling References and Undefined Behavior

C++ dangling reference strange behaviour

Your test is not as sharp as it could be: In order to show that the reference is dangling you should actually store the reference and not a copy of the value of the deceased object it refers to.

To understand why that would be more interesting let's dissect the function for a sec.

  • int** z = &y; makes z point to y; *z is now an alias for y.
  • *z=x; copies the address stored in the pointer that x refers to and assigns it to y (also known as *z). That address is entirely valid (f() is called with the address of main's a).
  • return *z; returns an lvalue reference (that is, a reference you could syntactically assign to) to the lvalue *z aka y. That lvalue is of type pointer to int and contains the valid address of main's a. The issue with the code is that the object referred to, namely y, is destroyed as soon as the function returns. Using the returned reference, as in cout<<"indirizzo funzione: "<<&f(i,u), is therefore undefined behavior, and the compiler warns about it.

The reason that the program doesn't crash is that immediately after f returns, the memory of its former local variables is still intact. Of course it's illegal to access it, but if you look at the memory it's all there. Consequently, int* aux = f(i,u); simply reads the (valid) address stored in the recently deceased y and stores it as a copy in aux. You can now write on the stack as much as you like: aux will contain a valid value.

That's why you were not successful in your attempts to write on the stack in order to overwrite it.

If instead you store the returned reference to *z aka y you'll refer to the deceased object itself which inevitably will be overwritten by future stack operations or used in other ways by the compiler.
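The difference between storing a copy and storing a reference can be seen without invoking undefined behavior. Below is a minimal sketch, assuming a pointer object that outlives the call (a function-local static). The names long_lived_slot and copy_and_reference_diverge are made up for illustration and do not appear in the question.

```cpp
#include <cassert>

// Hypothetical helper: returns a reference to a pointer object that outlives
// the call (a function-local static), so using the result is well defined.
int*& long_lived_slot()
{
    static int* slot = nullptr;
    return slot;
}

// Returns true if a stored copy and a stored reference diverge once the
// referred-to pointer object is overwritten.
bool copy_and_reference_diverge()
{
    int a = 1;
    int b = 2;

    long_lived_slot() = &a;

    int*  copy = long_lived_slot();  // snapshot of the pointer value
    int*& ref  = long_lived_slot();  // alias for the pointer object itself

    long_lived_slot() = &b;          // overwrite the referred-to object

    assert(&ref == &long_lived_slot());  // ref names the same object
    bool result = (*copy == 1) && (*ref == 2);

    long_lived_slot() = nullptr;     // do not leave a pointer to a dead local behind
    return result;
}
```

The copy keeps pointing at a no matter what later happens to the slot, which mirrors why aux stayed valid. The reference follows whatever the slot holds, which is exactly what goes wrong when the "slot" is a dead stack variable.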

Here is an anglicized, minimal example using a reference instead of a copy (note the definition of the variable dangling_ref). I compile and run it twice, once with standard optimization and once with maximum optimization. Simply changing the compiler options changes the output (and, in what I'd assume is a bug, determines whether the warning is emitted!). Here is a sample session on msys2.

$ cat dangling-ref.cpp
#include <iostream>
using namespace std;

int*& dangling_ref_ret(int*& x, int* y)
{
    int** z = &y;
    *z = x;
    cout << "ret addr " << *z << " (should be == " << y << ")" << ", val = " << **z << endl;
    return *z;
}

int main()
{
    int b = 1;
    int* pb = &b;
    int c = 2;
    int*& dangling_ref = dangling_ref_ret(pb, &c);
    cout << "val of dangling_ref " << dangling_ref << " is " << *dangling_ref << endl;
}

$ gcc --version
gcc (GCC) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ g++ -Wall -o dangling-ref dangling-ref.cpp && ./dangling-ref.exe
ret addr 0xffffcc34 (should be == 0xffffcc34), val = 1
val of dangling_ref 0xffffcc34 is 1
$ g++ -Wall -O3 -o dangling-ref dangling-ref.cpp && ./dangling-ref.exe
dangling-ref.cpp: In function ‘int*& dangling_ref_ret(int*&, int*)’:
dangling-ref.cpp:9:13: warning: function may return address of local variable [-Wreturn-local-addr]
    9 |     return *z;
      |             ^
dangling-ref.cpp:4:38: note: declared here
    4 | int*& dangling_ref_ret(int*& x, int* y)
      |                                 ~~~~~^
ret addr 0xffffcc10 (should be == 0xffffcc10), val = 1
val of dangling_ref 0xffffcc14 is 2

Visual Studio also behaves differently between Debug and Release mode.

You can try different compilers and options on godbolt.

Dangling references in std::function captures. Why undefined behavior?

Undefined behaviour does not mean segfault; it means anything. "It works" or "format hard drive" or "segfault" or "email your browser history and passwords to all of your contacts".

In this case, the stack and heap had garbage memory that happened to be laid out like non-garbage.

To prevent this, don't use any kind of [&] capture unless the lambda and all copies are discarded before the end of the current scope.
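As a rough sketch of that rule: when the closure has to outlive the scope that creates it, capture by value. The factory make_adder and the check function below are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <functional>

// Hypothetical factory: the returned std::function outlives `offset`,
// so `offset` is captured by value and copied into the closure.
std::function<int(int)> make_adder(int offset)
{
    // Dangling alternative, deliberately NOT used here:
    //   return [&](int x) { return x + offset; };
    // `offset` is destroyed when make_adder returns, so calling that
    // closure afterwards would be undefined behavior.
    return [offset](int x) { return x + offset; };
}

// Usage check: the closure is called after make_adder's scope has ended.
void check_adder()
{
    std::function<int(int)> add5 = make_adder(5);
    assert(add5(1) == 6);
}
```

A [&] capture stays fine for a lambda that is passed to an algorithm and thrown away before the enclosing scope ends.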

There is no way to deterministically detect all dangling references in C++. Write code in a style that doesn't generate risks of dangling references for 99% of your code. In the 1% where you cannot for whatever reason, be extremely careful, comment heavily, and include proofs that there are no dangling references.

There are many tools that help track down dangling references, to a greater or lesser extent, but none are reliable enough to deal with programmers insisting on doing dumb things. Asking for tool recommendations is explicitly off topic on SO.

Dangling reference when returning reference to reference parameter bound to temporary

I like to keep an example class A around for situations like this. The full definition of A is a little too lengthy to list here, but it is included in its entirety at this link.

In a nutshell, A keeps a state and a status, and the status can be one of these enumerators:

    destructed            = -4,
    self_move_assigned    = -3,
    move_assigned_from    = -2,
    move_constructed_from = -1,
    constructed_specified = 0

That is, the special members set the status accordingly. For example ~A() looks like this:

~A()
{
    assert(is_valid());
    --count;
    state_ = randomize();
    status_ = destructed;
}

And there's a streaming operator that prints this class out.

Language lawyer disclaimer: Printing out a destructed A is undefined behavior, and anything could happen. That being said, when experiments are compiled with optimizations turned off, you typically get the expected result.

For me, using clang at -O0, this:

#include "A.h"
#include <iostream>

int
main()
{
    A a{1};
    A b{2};
    A c{3};
    A&& x = a + b + c;
    std::cout << x << '\n';
}

Outputs:

destructed: -1002199219

Changing the line to:

    A x = a + b + c;

Results in:

6
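For comparison, here is a minimal sketch of a status-tracking class whose operator+ returns by value. Tracked is a hypothetical, much smaller stand-in for A and is not the author's code. It assumes that a by-value return is acceptable for the type. With that signature, even binding the result to an rvalue reference is well defined, because the reference binds directly to the returned temporary and extends its lifetime.

```cpp
#include <cassert>

// Hypothetical stand-in for the answer's class A (not the author's code).
class Tracked
{
public:
    enum Status { destructed = -4, constructed_specified = 0 };

    explicit Tracked(int state) : state_(state), status_(constructed_specified) {}
    ~Tracked() { status_ = destructed; }

    int  state() const    { return state_; }
    bool is_valid() const { return status_ == constructed_specified; }

private:
    int    state_;
    Status status_;
};

// Returns a fresh object by value, never a reference to a parameter.
Tracked operator+(const Tracked& lhs, const Tracked& rhs)
{
    return Tracked{lhs.state() + rhs.state()};
}

int sum_via_extended_temporary()
{
    Tracked a{1};
    Tracked b{2};
    Tracked c{3};
    Tracked&& x = a + b + c;  // temporary's lifetime is extended to x's scope
    assert(x.is_valid());
    return x.state();
}
```

The intermediate result of a + b still dies at the end of the full expression, but nothing refers to it afterwards, so there is no dangling reference.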

realloc() dangling pointers and undefined behavior

When you free memory, what happens to pointers that point into that memory? Do they become invalid immediately?

Yes, definitely. From section 6.2.4 of the C standard:

The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address, and retains
its last-stored value throughout its lifetime. If an object is referred to outside of its
lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when
the object it points to (or just past) reaches the end of its lifetime.

And from section 7.22.3.5:

The realloc function deallocates the old object pointed to by ptr and returns a
pointer to a new object that has the size specified by size. The contents of the new
object shall be the same as that of the old object prior to deallocation, up to the lesser of
the new and old sizes. Any bytes in the new object beyond the size of the old object have
indeterminate values.

Note the reference to old object and new object ... by the standard, what you get back from realloc is a different object than what you had before; it's no different from doing a free and then a malloc, and there is no guarantee that the two objects have the same address, even if the new size is <= the old size ... and in real implementations they often won't because objects of different sizes are drawn from different free lists.
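One way to stay within those rules is to remember positions as offsets and re-derive pointers from whatever realloc returns. The following is only a sketch (written in C++ with the C library functions; the function name and buffer sizes are made up for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Grows a malloc'd buffer and returns the character at a remembered offset.
// Returns '\0' if an allocation fails.
char char_at_offset_after_grow()
{
    char* buf = static_cast<char*>(std::malloc(8));
    if (!buf) return '\0';
    std::memcpy(buf, "abcdefg", 8);   // 7 characters plus the terminator

    std::size_t offset = 3;           // remember a position, not a pointer

    char* grown = static_cast<char*>(std::realloc(buf, 4096));
    if (!grown) {                     // on failure the old block is still live
        std::free(buf);
        return '\0';
    }
    buf = grown;                      // every pointer into the old block is now invalid

    assert(std::strlen(buf) == 7);    // contents preserved up to the old size
    char result = buf[offset];        // pointer re-derived from the new base
    std::free(buf);
    return result;
}
```

Assigning the result to a separate variable first also avoids leaking the old block when realloc fails and returns a null pointer.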

What happens if they later become valid again?

There's no such animal. Validity isn't some event that takes place, it's an abstract condition placed by the C standard. Your pointers might happen to work in some implementation, but all bets are off once you free the memory they point into.

But what if the memory becomes valid again for the same allocation? There's only one Standard way for that to happen: realloc()

Sorry, no, the C Standard does not contain any language to that effect.

If you then use realloc() again to grow the block back to at least cover the object type pointed to by the dangling pointer, and in neither case did realloc() move the memory block

You can't know whether it will ... the standard does not guarantee any such thing. And notably, when you realloc to a smaller size, most implementations modify the memory immediately following the shortened block; reallocing back to the original size will have some garbage in the added part, it won't be what it was before it was shrunk. In some implementations, some block sizes are kept on lists for that block size; reallocating to a different size will give you totally different memory. And in a program with multiple threads, any freed memory can be allocated in a different thread between the two reallocs, in which case the realloc for a larger size will be forced to move the object to a different location.
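A small sketch of what the standard does promise across a shrink followed by a grow: only the bytes up to the smaller size are preserved, and they have to be reached through the pointer the last realloc returned. The function name and parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Shrinks a block from `total` to `keep` bytes and grows it back to `total`.
// Returns true if the first `keep` bytes still hold the original fill value.
bool prefix_survives_shrink_and_grow(std::size_t total, std::size_t keep)
{
    assert(keep > 0 && keep <= total);

    unsigned char* p = static_cast<unsigned char*>(std::malloc(total));
    if (!p) return false;
    std::memset(p, 0xAB, total);

    unsigned char* q = static_cast<unsigned char*>(std::realloc(p, keep));
    if (!q) { std::free(p); return false; }
    p = q;                            // old value of p is invalid from here on

    q = static_cast<unsigned char*>(std::realloc(p, total));
    if (!q) { std::free(p); return false; }
    p = q;

    // Only bytes [0, keep) are guaranteed. Bytes [keep, total) are
    // indeterminate, so this code does not read them.
    bool ok = true;
    for (std::size_t i = 0; i < keep; ++i) {
        ok = ok && (p[i] == 0xAB);
    }
    std::free(p);
    return ok;
}
```

Nothing here relies on the block staying at the same address, which is the part no implementation is required to deliver.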

is the dangling pointer valid again?

See above; invalid is invalid; there's no going back.

This is such a corner case that I don't really know how to interpret the C or C++ standards to figure it out.

It's not any sort of corner case and I don't know what you're seeing in the standard, which is quite clear that freed memory has indeterminate content and that the values of any pointers to or into it are also indeterminate, and makes no claim that they are magically restored by a later realloc.

Note that modern optimizing compilers are written to know about undefined behavior and take advantage of it. As soon as you realloc string, overwrite is invalid, and the compiler is free to trash it ... e.g., it might be in a register that the compiler reallocates for temporaries or parameter passing. Whether any compiler does this, it can, precisely because the standard is quite clear about pointers into objects becoming invalid when the object's lifetime ends.


