How to Determine If Returned Pointer Is on the Stack or Heap

How to determine if returned pointer is on the stack or heap

Distinguishing between malloc/free and new/delete is generally not possible, at least not in a reliable or portable way, all the more so because new simply wraps malloc in many implementations.

None of the following alternatives to distinguish heap/stack have been tested, but they should all work.

Linux:

  1. Solution proposed by Luca Tettananti, parse /proc/self/maps to get the address range of the stack.
  2. As the first thing at startup, clone your process, this implies supplying a stack. Since you supply it, you automatically know where it is.
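  3. Call GCC's __builtin_frame_address function with increasing level parameter until it returns 0. You then know the depth. Now call __builtin_frame_address again with the maximum level, and once with a level of 0. Anything that lives on the stack must necessarily be between these two addresses. Note that GCC requires the level to be a constant integer, so each level needs its own call, and that GCC documents nonzero levels as potentially unpredictable, up to crashing the program.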
  4. sbrk(0) as the first thing at startup, and remember the value. Whenever you want to know if something is on the heap, sbrk(0) again -- something that's on the heap must be between the two values. Note that this will not work reliably with allocators that use memory mapping for large allocations.

Knowing the location and size of the stack (alternatives 1 and 2), it's trivial to find out if an address is within that range. If it's not, it is necessarily "heap" (unless someone tries to be super smart-ass and gives you a pointer to a static global, or a function pointer, or such...).
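Alternative 1 can be sketched as follows. This is a minimal, Linux-only illustration that assumes the kernel labels the main thread's stack mapping with [stack] in /proc/self/maps. The names StackRange, main_stack_range and on_main_stack are made up for this example, and the stacks of other threads are not covered.

```cpp
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>

// Address range [lo, hi) of the main thread's stack; both zero if not found.
struct StackRange {
    std::uintptr_t lo = 0;
    std::uintptr_t hi = 0;
};

// Scans /proc/self/maps for the line tagged "[stack]" and parses its range.
StackRange main_stack_range() {
    StackRange r;
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        if (line.find("[stack]") == std::string::npos) continue;
        unsigned long long lo = 0, hi = 0;
        // Each line starts with "<start>-<end>" in hexadecimal.
        if (std::sscanf(line.c_str(), "%llx-%llx", &lo, &hi) == 2) {
            r.lo = static_cast<std::uintptr_t>(lo);
            r.hi = static_cast<std::uintptr_t>(hi);
        }
        break;
    }
    return r;
}

// True if p lies inside the main thread's stack mapping.
bool on_main_stack(const void* p) {
    const StackRange r = main_stack_range();
    const auto a = reinterpret_cast<std::uintptr_t>(p);
    return a >= r.lo && a < r.hi;
}
```

Called from the main thread, on_main_stack(&some_local) should report true, while a pointer obtained from new or malloc should report false. The file is re-read on every call because the stack mapping can grow; that is slow, so this is only suitable for diagnostics.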

Windows:

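  1. Use CaptureStackBackTrace; the idea is that anything living on the stack lies between the first and last element of the returned pointer array. (Caveat: the array holds return addresses, which point into code rather than into the stack, so this probably needs the frame addresses instead.)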
  2. Use GCC-MinGW (and __builtin_frame_address, which should just work) as above.
  3. Use GetProcessHeaps and HeapWalk to check every allocated block for a match. If no block in any heap matches, the memory is consequently on the stack (... or in a memory mapping, if someone tries to be super-smart with you).
  4. Use HeapReAlloc with HEAP_REALLOC_IN_PLACE_ONLY and with exactly the same size. If this fails, the memory block starting at the given address is not allocated on the heap. If it "succeeds", it is a no-op.
  5. Use GetCurrentThreadStackLimits (Windows 8 / Server 2012 and later).
  6. Call NtCurrentTeb() (or read fs:[18h] on 32-bit x86, gs:[30h] on x64) and use the fields StackBase and StackLimit of the returned TEB.

How to know if a pointer points to the heap or the stack?

There is no way of doing this - and if you need to do it, there is something wrong with your design. There is a discussion of why you can't do this in More Effective C++.

Where are pointers in C++ stored, on the stack or in the heap?

Your understanding may be correct, but the statements are wrong:

A pointer to object m has been allocated on the stack.

m is the pointer. It is on the stack. Perhaps you meant pointer to a Member object.

The object m itself (the data that it carries, as well as access to its methods) has been allocated on the heap.

It would be correct to say that the object pointed to by m is created on the heap.

In general, any function/method local object and function parameters are created on the stack. Since m is a function local object, it is on the stack, but the object pointed to by m is on the heap.
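The distinction can be made visible by printing both addresses. A small sketch, assuming a stand-in Member type with a single int field (the class from the question is not shown) and a hypothetical helper named show_where_things_live:

```cpp
#include <cstdint>
#include <cstdio>

struct Member { int value; };  // stand-in for the class in the question

struct Addresses {
    std::uintptr_t of_pointer;  // where the pointer variable m itself is stored
    std::uintptr_t of_object;   // where the Member object that m points to is stored
};

Addresses show_where_things_live() {
    Member* m = new Member{42};  // m is a local variable; *m comes from the free store
    Addresses a{reinterpret_cast<std::uintptr_t>(&m),
                reinterpret_cast<std::uintptr_t>(m)};
    std::printf("pointer m is stored at %p\n", static_cast<void*>(&m));
    std::printf("object *m is stored at %p\n", static_cast<void*>(m));
    delete m;
    return a;
}
```

On typical desktop platforms the first address falls in the stack region and the second in the heap region, but the exact values are platform-specific.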

Can a C++ class determine whether it's on the stack or heap?

You need to actually ask us the real question(a) :-) It may be apparent to you why you think this is necessary but it almost certainly isn't. In fact, it's almost always a bad idea. In other words, why do you think you need to do this?

I usually find it's because developers want to delete or not delete the object based on where it was allocated but that's something that should usually be left to the client of your code rather than your code itself.


Update:

Now that you've clarified your reasons in the question, I apologise, you've probably found one of the few areas in which what you're asking makes sense (running your own garbage collection processes). Ideally, you'd override all the memory allocation and de-allocation operators to keep track of what is created and removed from the heap.
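A minimal sketch of that bookkeeping, assuming class-specific operator new and operator delete that record addresses in a registry. Tracked and on_heap are hypothetical names; the sketch is not thread-safe and ignores new[], placement new, and objects embedded as members of other heap-allocated objects.

```cpp
#include <cstddef>
#include <new>
#include <set>

class Tracked {
public:
    // Class-specific allocation: forward to the global operator and record the address.
    static void* operator new(std::size_t size) {
        void* p = ::operator new(size);
        registry().insert(p);
        return p;
    }

    // Class-specific deallocation: forget the address, then release the memory.
    static void operator delete(void* p) noexcept {
        registry().erase(p);
        ::operator delete(p);
    }

    // True only if this object was created through Tracked::operator new.
    bool on_heap() const { return registry().count(this) != 0; }

private:
    // The set's own nodes come from the global operator new, so there is no recursion.
    static std::set<const void*>& registry() {
        static std::set<const void*> addresses;
        return addresses;
    }
};
```

With this in place, a Tracked declared as a local variable reports on_heap() as false, while one created with new Tracked reports true until it is deleted.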

However, I'm not sure it's a simple matter of intercepting the new/delete for the class since there could be situations where delete is not called and, since mark/sweep relies on a reference count, you need to be able to intercept pointer assignments for it to work correctly.

Have you thought about how you're going to handle that?

The classic example:

myobject *x = new xclass();
x = 0;

will not result in a delete call.

Also, how will you detect the fact that the pointer to one of your instances is on the stack? The interception of new and delete can let you store whether the object itself is stack or heap-based but I'm at a loss as to how you tell where the pointer is going to be assigned to, especially with code like:

myobject *x1 = new xclass();  // yes, calls new.
myobject *x2 = x1; // no, it doesn't.

Perhaps you may want to look into C++'s smart pointers, which go a long way toward making manual memory management obsolete. Shared pointers on their own can still suffer from problems like circular dependencies but the judicious use of weak pointers can readily solve that.
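How a weak pointer breaks a cycle can be sketched with std::shared_ptr and std::weak_ptr from <memory>. The Node type, its alive counter and the function link_and_drop are invented for this illustration.

```cpp
#include <memory>

struct Node {
    std::shared_ptr<Node> next;  // owning link
    std::weak_ptr<Node> prev;    // non-owning back link; does not keep its target alive
    static int alive;            // number of Node objects currently alive
    Node() { ++alive; }
    ~Node() { --alive; }
};
int Node::alive = 0;

// Builds a two-node cycle (a -> b via next, b -> a via prev) and lets it go out of scope.
void link_and_drop() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->next = b;
    b->prev = a;  // were this a shared_ptr, a and b would keep each other alive
}
```

After link_and_drop() returns, Node::alive is back to its previous value: both nodes were destroyed without any explicit delete.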

It may be that manual garbage collection is no longer required in your scenario.


(a) This is known as the X/Y problem. Many times, people will ask a question that pre-supposes a class of solution whereas a better approach would be just to describe the problem with no preconceptions of what the best solution will be.

Find out whether a pointer is pointing at the stack, heap or program text?

You cannot do what you want in a portable way, because the C language standard does not specify the stack, program area, and heap as distinct areas. Their location can depend on the processor architecture, the operating system, the loader, the linker, and the compiler. Trying to guess where a pointer is pointing is breaking the abstraction provided by C, so you probably shouldn't be doing that.

Nevertheless, there are ways to write code that will make a correct guess for a specific environment. You do that by examining the addresses of existing objects, and looking for patterns. Consider the following program.

#include <stdlib.h>
#include <stdio.h>

void
function(void)
{
    int stack2;

    printf("stack2: %15p\n", (void *)&stack2);
}

int
main(int argc, char *argv[])
{
    int stack;
    void *heap = malloc(1);
    void *heap2 = malloc(1);

    /* Casting a function pointer to void * is a common extension, not strict ISO C. */
    printf("program: %15p\n", (void *)main);
    printf("heap: %15p\n", heap);
    printf("heap2: %15p\n", heap2);
    printf("stack: %15p\n", (void *)&stack);
    function();
    free(heap);
    free(heap2);
    return 0;
}

By examining its output you can see a pattern, such as the following on x64 Linux.

program:        0x400504
heap: 0x1675010
heap2: 0x1675030
stack: 0x7fff282c783c
stack2: 0x7fff6ae37afc

From the above you can determine that (probably) the heap grows up from 0x1675010, anything below it is program code (or static data, which you didn't mention), and that the stack grows in an unpredictable manner (probably due to stack randomization) around a very large address, like 0x7fff282c783c.

Compare this with the output under 32-bit Intel Linux:

program:       0x804842f
heap: 0x804b008
heap2: 0x804b018
stack: 0xbf84ad38
stack2: 0xbf84ad14

Microsoft Windows and the 32-bit Microsoft C compiler:

program:        01271020
heap: 002E3B00
heap2: 002E3B10
stack: 0024F978
stack2: 0024F964

gcc under Windows Cygwin:

program:        0040130B
heap: 00A41728
heap2: 00A417A8
stack: 0028FF44
stack2: 0028FF14

gcc under Intel 32-bit FreeBSD:

program:       0x8048524
heap: 0x804b030
heap2: 0x804b040
stack: 0xbfbffb3c
stack2: 0xbfbffb1c

gcc under Intel 64-bit FreeBSD:

program:        0x400770
heap: 0x801006058
heap2: 0x801006060
stack: 0x7fffffffdaec
stack2: 0x7fffffffdabc

gcc under SPARC-64 FreeBSD:

program:        0x100860
heap: 0x40c04098
heap2: 0x40c040a0
stack: 0x7fdffffe9ac
stack2: 0x7fdffffe8dc

PowerPC running MacOS X:

program:          0x1ed4
heap: 0x100120
heap2: 0x100130
stack: 0xbffffba0
stack2: 0xbffffb38

PowerPC running Linux:

program:      0x10000514
heap: 0x100c6008
heap2: 0x100c6018
stack: 0xbff45db0
stack2: 0xbff45d88

StrongARM running NetBSD:

program:          0x1c5c
heap: 0x5030
heap2: 0x5040
stack: 0xefbfdcd0
stack2: 0xefbfdcb4

and ARMv6 running Linux:

program:          0x842c
heap: 0xb63008
heap2: 0xb63018
stack: 0xbe83eac4
stack2: 0xbe83eaac

As you can see, the possibilities are endless.

Do pointer return types always need to be allocated on the heap?

  1. Yes. Returning a local non-static address has little value. Such returned addresses are unusable for dereferencing. You can still printf("%p\n",(void*)the_address) them but that's about all you can do with them. (Returning the address of a local static makes sense, though. Such a returned local address is safe to dereference.)

  2. Pointers can point to anything: globals, statics, and they can be passed from a caller (who could allocate their target on the stack for example).

  3. int ReturnInt(){const int a = 5;return a;} returns through a register on most platforms. If that's not possible, the compiler will have made sure the caller has stack-allocated space for the return value.
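The cases from the list can be illustrated with a few small functions. This is only a sketch; next_id, largest and return_int are names chosen for the example.

```cpp
#include <cstddef>

// Returning the address of a local static is fine: the object outlives the call.
int* next_id() {
    static int id = 0;
    ++id;
    return &id;
}

// Returning a pointer into storage owned by the caller is fine too;
// that storage may well be an array on the caller's stack.
// Precondition: count > 0.
const int* largest(const int* values, std::size_t count) {
    const int* best = values;
    for (std::size_t i = 1; i < count; ++i) {
        if (values[i] > *best) best = &values[i];
    }
    return best;
}

// Returning by value copies the result out; no pointer to the local escapes.
int return_int() {
    const int a = 5;
    return a;
}
```

None of these allocate on the heap, yet every returned pointer is safe to dereference for as long as the object it points to is alive.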

Check if a pointer points to allocated memory on the heap

There's no standard way to do this, but various malloc debugging tools may have a way of doing it. For example, if you use valgrind, you can use VALGRIND_CHECK_MEM_IS_ADDRESSABLE to check this and related things.

Why is returning a stack allocated pointer variable in a function allowed in C?

They will both be undefined behaviour, if the returned value is accessed. So, none of them are "OK".

You're trying to return a pointer to a block-scoped variable which is of auto storage duration. So, once the scope ends, the lifetime of the variable comes to an end.

Quoting C11, chapter §6.2.4/P2, regarding the lifetime (emphasis mine)

The lifetime of an object is the portion of program execution during which storage is
guaranteed to be reserved for it. An object exists, has a constant address, and retains
its last-stored value throughout its lifetime. If an object is referred to outside of its
lifetime, the behavior is undefined
[...]

Then, from P5,

An object whose identifier is declared with no linkage and without the storage-class
specifier static has automatic storage duration, [...]

and

For such an object that does not have a variable length array type, its lifetime extends
from entry into the block with which it is associated until execution of that block ends in
any way. [...]

So, in your case, the variable arr has automatic storage duration and its lifetime is limited to the function body. Once the address is returned to the caller, any attempt to access the memory at that address is UB.

Oh, and there's no "stack" or "heap" in the C standard. All we have is the lifetime of a variable.

What and where are the stack and heap?

The stack is the memory set aside as scratch space for a thread of execution. When a function is called, a block is reserved on the top of the stack for local variables and some bookkeeping data. When that function returns, the block becomes unused and can be used the next time a function is called. The stack is always reserved in a LIFO (last in first out) order; the most recently reserved block is always the next block to be freed. This makes it really simple to keep track of the stack; freeing a block from the stack is nothing more than adjusting one pointer.
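The "adjusting one pointer" idea can be modeled with a toy arena. This is not how a real call stack is implemented (the compiler and CPU manage that); StackArena and its members are hypothetical names, and alignment and overflow handling are kept deliberately simple.

```cpp
#include <cstddef>

class StackArena {
public:
    // Reserve n bytes on top of the arena; returns nullptr if they do not fit.
    void* push(std::size_t n) {
        if (n > sizeof(buffer_) - top_) return nullptr;
        void* p = buffer_ + top_;
        top_ += n;  // allocating is just moving the top
        return p;
    }

    // Remember the current top so everything pushed later can be released at once.
    std::size_t mark() const { return top_; }

    // Release every block pushed after the mark by moving the top back (LIFO).
    void release_to(std::size_t m) {
        if (m <= top_) top_ = m;
    }

private:
    unsigned char buffer_[1024];
    std::size_t top_ = 0;
};
```

Taking a mark, pushing a block, and releasing to the mark leaves the arena exactly as it was, so the next push reuses the same address. That mirrors how a returning function frees its block for the next call.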

The heap is memory set aside for dynamic allocation. Unlike the stack, there's no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time; there are many custom heap allocators available to tune heap performance for different usage patterns.

Each thread gets a stack, while there's typically only one heap for the application (although it isn't uncommon to have multiple heaps for different types of allocation).

To answer your questions directly:

To what extent are they controlled by the OS or language runtime?

The OS allocates the stack for each system-level thread when the thread is created. Typically the OS is called by the language runtime to allocate the heap for the application.

What is their scope?

The stack is attached to a thread, so when the thread exits the stack is reclaimed. The heap is typically allocated at application startup by the runtime, and is reclaimed when the application (technically process) exits.

What determines the size of each of them?

The size of the stack is set when a thread is created. The size of the heap is set on application startup, but can grow as space is needed (the allocator requests more memory from the operating system).

What makes one faster?

The stack is faster because the access pattern makes it trivial to allocate and deallocate memory from it (a pointer/integer is simply incremented or decremented), while the heap has much more complex bookkeeping involved in an allocation or deallocation. Also, each byte in the stack tends to be reused very frequently which means it tends to be mapped to the processor's cache, making it very fast. Another performance hit for the heap is that the heap, being mostly a global resource, typically has to be multi-threading safe, i.e. each allocation and deallocation needs to be - typically - synchronized with "all" other heap accesses in the program.

[Image: stack and heap demonstration. Source: vikashazrati.wordpress.com]

Checking if a pointer is allocated memory or not

You cannot check, except with some implementation-specific hacks.

Pointers carry no information other than where they point. The best you can do is say "I know how this particular compiler version allocates memory, so I'll dereference memory, move the pointer back 4 bytes, check the size, make sure it matches..." and so on. You cannot do it in a standard fashion, since memory allocation is implementation-defined. Not to mention that they might not have dynamically allocated it at all.

You just have to assume your client knows how to program in C. The only un-solution I can think of would be to allocate the memory yourself and return it, but that's hardly a small change. (It's a larger design change.)


