Global Memory Management in C++ in Stack or Heap

Global memory management in C++ in stack or heap?

Since I wasn't satisfied with the answers, and hope that sameer karjatkar wants to learn more than just a simple yes/no answer, here you go.

Typically a process has 5 different areas of memory allocated:

  1. Code - text segment
  2. Initialized data – data segment
  3. Uninitialized data – bss segment
  4. Heap
  5. Stack
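As a rough sketch of how the five areas map onto ordinary declarations (the names below are made up for illustration, and the exact placement is up to the compiler and linker):

```cpp
#include <cassert>
#include <cstdlib>

int initialized_global = 42;   // initialized data (data segment)
int uninitialized_global;      // zero-initialized data (bss segment)

// The machine code of this function lives in the code (text) segment.
int sum_of_areas()
{
    int local = 1;   // stack (or just a register after optimization)

    // Heap: the block is obtained at run time and must be released by hand.
    int *dynamic = static_cast<int *>(std::malloc(sizeof(int)));
    assert(dynamic != nullptr);
    *dynamic = 2;

    int total = initialized_global + uninitialized_global + local + *dynamic;
    std::free(dynamic);
    return total;
}
```

Calling sum_of_areas() adds one value from each data area; uninitialized_global contributes 0 because objects with static storage are zero-initialized.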

If you really want to learn what is saved where then read and bookmark these:

COMPILER, ASSEMBLER, LINKER AND LOADER: A BRIEF STORY (look at Table w.5)

Anatomy of a Program in Memory


What and where are the stack and heap?

The stack is the memory set aside as scratch space for a thread of execution. When a function is called, a block is reserved on the top of the stack for local variables and some bookkeeping data. When that function returns, the block becomes unused and can be used the next time a function is called. The stack is always reserved in a LIFO (last in first out) order; the most recently reserved block is always the next block to be freed. This makes it really simple to keep track of the stack; freeing a block from the stack is nothing more than adjusting one pointer.

The heap is memory set aside for dynamic allocation. Unlike the stack, there's no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time; there are many custom heap allocators available to tune heap performance for different usage patterns.
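The difference in release order can be made visible with a small sketch. Tracer is a hypothetical helper type, not part of any library; it just records its name when it is destroyed.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: appends its name to a log when it is destroyed.
struct Tracer {
    std::string name;
    std::string *log;
    Tracer(const std::string &n, std::string *l) : name(n), log(l) {}
    ~Tracer() { *log += name; }
};

// Stack (automatic) objects are always released in LIFO order.
std::string stack_order()
{
    std::string log;
    {
        Tracer a("a", &log);
        Tracer b("b", &log);
    }   // b is destroyed first, then a
    return log;
}

// Heap objects are released in whatever order the program chooses.
std::string heap_order()
{
    std::string log;
    Tracer *a = new Tracer("a", &log);
    Tracer *b = new Tracer("b", &log);
    assert(log.empty());   // nothing has been destroyed yet
    delete a;              // released first although it was allocated first
    delete b;
    return log;
}
```

stack_order() yields "ba" because the language enforces the LIFO order, while heap_order() yields "ab" only because the code happened to delete in that order.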

Each thread gets a stack, while there's typically only one heap for the application (although it isn't uncommon to have multiple heaps for different types of allocation).

To answer your questions directly:

To what extent are they controlled by the OS or language runtime?

The OS allocates the stack for each system-level thread when the thread is created. Typically the OS is called by the language runtime to allocate the heap for the application.

What is their scope?

The stack is attached to a thread, so when the thread exits the stack is reclaimed. The heap is typically allocated at application startup by the runtime, and is reclaimed when the application (technically process) exits.

What determines the size of each of them?

The size of the stack is set when a thread is created. The size of the heap is set on application startup, but can grow as space is needed (the allocator requests more memory from the operating system).

What makes one faster?

The stack is faster because the access pattern makes it trivial to allocate and deallocate memory from it (a pointer/integer is simply incremented or decremented), while the heap has much more complex bookkeeping involved in an allocation or deallocation. Also, each byte in the stack tends to be reused very frequently which means it tends to be mapped to the processor's cache, making it very fast. Another performance hit for the heap is that the heap, being mostly a global resource, typically has to be multi-threading safe, i.e. each allocation and deallocation needs to be - typically - synchronized with "all" other heap accesses in the program.
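To illustrate why stack-style allocation is so cheap, here is a minimal sketch of an allocator that works the same way, by moving a single offset. BumpArena is a hypothetical class written for this example; it ignores alignment and is not how any particular runtime implements its stack.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size arena that allocates the way a stack does:
// by moving a single offset up and down.
class BumpArena {
public:
    // "Allocate" n bytes by advancing the offset; nullptr if out of space.
    void *push(std::size_t n)
    {
        if (n > sizeof(buffer_) - top_) return nullptr;
        void *p = buffer_ + top_;
        top_ += n;
        return p;
    }

    // "Free" the most recent n bytes by moving the offset back (LIFO only).
    void pop(std::size_t n)
    {
        assert(n <= top_);
        top_ -= n;
    }

    std::size_t used() const { return top_; }

private:
    unsigned char buffer_[1024];
    std::size_t top_ = 0;
};
```

Both operations are a comparison and an addition or subtraction. A general-purpose heap allocator cannot be this simple because it has to find a suitable free block, handle frees in arbitrary order, and usually synchronize between threads.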

[Image: a clear demonstration of the stack and the heap. Image source: vikashazrati.wordpress.com]

Are global arrays allocated on the stack

No, global data are not allocated on the stack. They are allocated statically: the memory is reserved when the program is built and loaded, and it exists for the whole run of the program.

One simple way to think about this is to consider threads. There is one stack per thread. But global data is shared between threads. So global data cannot be allocated on a stack.

Other types of global variable are on the heap.

Not so. Global data is never allocated on the heap. Heap allocation is performed dynamically at runtime.

Perhaps you have a pointer global variable. And you assign a dynamic array to that pointer. In that scenario the pointer is a global, but the array is a dynamic heap allocated object.

So perhaps you have code like this:

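#include <stdlib.h>

int *arr;   /* the pointer itself is a global */
....
arr = calloc(N, sizeof(int));   /* the array it points to is on the heap */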

In that scenario, arr is a global object, but *arr is heap allocated.

C++ variables and where they are stored in memory (stack, heap, static)

Mostly right.

Any variable that is accessed with a pointer is stored on the heap.

This isn't true. You can have pointers to stack-based or global variables.
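A short sketch of the point, using made-up names: the same pointer variable can refer to a global, a stack variable and a heap object in turn.

```cpp
#include <cassert>

int global_value = 1;   // static storage

// A pointer can refer to an object in any kind of storage.
int read_through_pointers()
{
    int local_value = 2;            // automatic storage (stack)
    int *heap_value = new int(3);   // dynamic storage (heap)

    int *p = &global_value;   // points at a global
    int total = *p;
    p = &local_value;         // now points at a stack variable
    total += *p;
    p = heap_value;           // now points at a heap object
    assert(*p == 3);
    total += *p;

    delete heap_value;
    return total;
}
```

Using a pointer says nothing about where the pointee lives; only the way the object was created decides that.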

Also it's worth pointing out that global variables are generally unified by the linker (i.e. if two modules have "int i" at global scope, you'll only have one global variable called "i"). Dynamic libraries complicate that slightly; on Windows, DLLs don't have that behaviour (i.e. an "int i" in a Windows DLL will not be the same "int i" as in another DLL in the same process, or as in the main executable), while on most other platforms dynamic libraries do. There are some additional complications on Darwin (iOS/macOS), which has a hierarchical namespace for symbols; as long as you're linking with the flat_namespace option, what I just said will hold.

Additionally, it's worth talking about initialisation behaviour; global variables are initialised automatically by the runtime (typically either using special linker features or by means of a call that is inserted into the code for your main function). Within a single translation unit globals are initialised in the order of their definitions, but the order of initialisation across translation units isn't guaranteed. However, static variables declared at function scope are initialised when that function is first executed, and not at program start-up as you might suppose, and that feature is commonly used by C++ programmers to do lazy initialisation.
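Lazy initialisation with a function-scope static might look like the following sketch. Config and the constructions counter are hypothetical names used only to make the timing observable.

```cpp
#include <cassert>

int constructions = 0;   // counts how many times Config is constructed

// Hypothetical type standing in for something expensive to build.
struct Config {
    int value;
    Config() : value(7) { ++constructions; }
};

// The static object is constructed the first time control reaches its
// declaration, not at program start-up, and only once.
Config &config()
{
    static Config instance;
    return instance;
}

void use_config_twice()
{
    Config &first = config();
    Config &second = config();
    assert(&first == &second);   // both calls return the same object
}
```

Until config() is called for the first time, constructions stays at 0; after any number of calls it is 1. Since C++11 this initialisation is also guaranteed to be thread-safe.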

(Similar concerns apply to destructors for global objects; those are best avoided entirely IMO, not least because on some platforms there are fast termination features that simply won't call them.)

const keyword means you can't change the variable.

Almost. const affects the type, and there is a difference depending on where you write it exactly. For example

const char *foo;

should be read as foo is a pointer to a const char, i.e. foo itself is not const, but the thing it points at is. Contrast with

char * const foo;

which says that foo is a const pointer to char.
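The two declarations can be checked at compile time. The sketch below uses made-up variable names in place of foo so that both forms can coexist in one file.

```cpp
#include <cassert>
#include <type_traits>

const char *pointer_to_const = "abc";   // the chars are const, the pointer is not
char buffer[] = "xyz";
char *const const_pointer = buffer;     // the pointer is const, the chars are not

// The pointer itself is not const in the first declaration...
static_assert(!std::is_const<decltype(pointer_to_const)>::value, "pointer is mutable");
// ...but it is const in the second.
static_assert(std::is_const<decltype(const_pointer)>::value, "pointer is const");

void modify_what_is_allowed()
{
    pointer_to_const = "def";        // OK: re-seat the pointer
    // pointer_to_const[0] = 'x';    // error: the pointee is const
    const_pointer[0] = 'X';          // OK: modify the pointee
    // const_pointer = nullptr;      // error: the pointer is const
    assert(buffer[0] == 'X');        // same memory, seen through the array
}
```

Uncommenting either of the two marked lines should make the compiler reject the program, which is the difference between the two spellings.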

Finally, you've missed out volatile, the point of which is to tell the compiler not to make assumptions about the thing to which it applies (e.g. it can't assume that it's safe to cache a volatile value in a register, or to optimise away accesses, or in general to optimise across any operation that affects a volatile value). Hopefully you'll never need to use volatile; it's most often useful if you're doing really low-level things that frankly a lot of people have no need to go anywhere near.

stack memory management in embedded systems

Your understanding of static variables and the .data section is correct. You may also want to consider zero-initialized static variables in the .bss section. These are initialized at the same time as those in the .data section, but their initial value does not need to be stored because it is zero.

Automatic variables may be on the stack or may be optimized to only be in processor registers. Either way, code is generated by the compiler to initialize them each time the function using them is called. If they are on the stack then this will include an instruction to adjust the stack pointer to "allocate" space for them when they are needed and "free" them when they go out of scope.

The space for the entire stack is usually allocated in the linker script. In an embedded microcontroller system no instructions are necessary to "allocate" it. Depending on the hardware there may be code required to enable access to external memory, but in most cases there is a bank of fast SRAM ready to use as soon as the system powers on, and the first stack will be in this.

Where in memory are my variables stored in C?

You got some of these right, but whoever wrote the questions tricked you on at least one question:

  • global variables -------> data (correct)
  • static variables -------> data (correct)
  • constant data types -----> code and/or data. Consider string literals for a situation when a constant itself would be stored in the data segment, and references to it would be embedded in the code
  • local variables(declared and defined in functions) --------> stack (correct)
  • variables declared and defined in main function -----> also stack, not heap (the teacher was trying to trick you)
  • pointers(ex: char *arr, int *arr) -------> data or stack (not heap), depending on the context. C lets you declare a global or a static pointer, in which case the pointer itself would end up in the data segment.
  • dynamically allocated space(using malloc, calloc, realloc) --------> heap (not stack)

It is worth mentioning that the C standard does not use the word "stack"; it calls this "automatic storage duration" (often loosely referred to as the "automatic storage class").

Location of pointers and global variables in C

Global variables can be in a couple places, depending on how they're set up - for example, const globals may be in a read-only section of the executable. "Normal" globals are in a read-write section of the executable. They're not on the heap or the stack at all. Pointers are just a type of variable, so they can be wherever you want them to be (on the heap if you malloc() them, on the stack if they're local variables, or in the data section if they're global).
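Here is a sketch of the last point, that the pointer variable itself can live in any of the three places. The names are invented for the example, and it is written as C++, so the malloc() result needs a cast that plain C would not require.

```cpp
#include <cassert>
#include <cstdlib>

int target = 5;

int *global_pointer = &target;   // the pointer itself lives in the data section

int pointers_everywhere()
{
    int *stack_pointer = &target;   // the pointer itself is a local (stack)

    // The pointer itself lives on the heap: we malloc() room for an int*.
    int **heap_pointer = static_cast<int **>(std::malloc(sizeof(int *)));
    assert(heap_pointer != nullptr);
    *heap_pointer = &target;

    // All three pointers refer to the same int.
    int total = *global_pointer + *stack_pointer + **heap_pointer;
    std::free(heap_pointer);
    return total;
}
```

In all three cases the pointee is the same global int; only the storage of the pointer variable differs.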

C memory management conventions: freeing memory allocated on heap by object allocated on stack

If you're using _init and _deinit functions, yes, you'd want _deinit to free the memory, and yes, vec_init and vec_deinit would be mandatory for stack allocated structs. For this use case, a stack allocated struct could be initialized with vec_t my_vec = {0}; and a vec_init call avoided, but that assumes zeroing produces a validly initialized struct now and forever (if you change vec_init later to make some fields non-zero, users of your library that didn't use vec_init have to update), and it can be confusing when the unavoidable vec_deinit is not paired with a corresponding vec_init.

Note that code need not be so heavily duplicated; _alloc and _free can be implemented in terms of _init and _deinit, keeping the code duplication to a minimum:

struct vec_t *vec_alloc(void)
{
    struct vec_t *vec = malloc(sizeof(struct vec_t));
    if (vec) vec_init(vec); // Don't try to init if malloc failed
    return vec;
}

void vec_free(struct vec_t *vec)
{
    if (vec) vec_deinit(vec); // Don't try to deinit when passed NULL
    free(vec);
}
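To show how the two pairs fit together, here is a self-contained sketch. The fields of vec_t and the bodies of vec_init and vec_deinit are assumptions made up for this example, since the original does not show them. It is written as C++, so the malloc() and calloc() results are cast; in C the casts are unnecessary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical layout; the real fields of vec_t are not shown above.
struct vec_t {
    int *data;
    std::size_t len;
};

void vec_init(struct vec_t *vec)
{
    vec->data = nullptr;
    vec->len = 0;
}

void vec_deinit(struct vec_t *vec)
{
    std::free(vec->data);   // free(NULL) is a no-op
    vec->data = nullptr;
    vec->len = 0;
}

struct vec_t *vec_alloc(void)
{
    struct vec_t *vec = static_cast<struct vec_t *>(std::malloc(sizeof(struct vec_t)));
    if (vec) vec_init(vec);   // Don't try to init if malloc failed
    return vec;
}

void vec_free(struct vec_t *vec)
{
    if (vec) vec_deinit(vec);   // Don't try to deinit when passed NULL
    std::free(vec);
}

std::size_t use_both_ways()
{
    struct vec_t on_stack;   // the struct is on the stack...
    vec_init(&on_stack);
    on_stack.data = static_cast<int *>(std::calloc(4, sizeof(int)));   // ...but owns heap memory
    if (on_stack.data) on_stack.len = 4;
    std::size_t n = on_stack.len;
    vec_deinit(&on_stack);   // mandatory: releases the heap buffer
    assert(on_stack.data == nullptr);

    struct vec_t *on_heap = vec_alloc();   // the struct itself is on the heap
    vec_free(on_heap);                     // deinit + free; also safe for NULL
    return n;
}
```

The stack-allocated struct pairs vec_init with vec_deinit, while the heap-allocated one pairs vec_alloc with vec_free, and the cleanup logic exists only once, in vec_deinit.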

Who is responsible for the stack and heap in C++?

Who or what is responsible for the invention of the stack and heap?

As far as inventing a stack and heap, you would have better luck searching the web. Those concepts have been around for many decades.

Are these inventions of the C++ compiler?

Perhaps invention is the wrong term here. They are data structures. The compiler and OS (if present) are in charge of organizing and utilizing memory.

Does the os specify memory sections in RAM designated "stack" and "heap"?

This is OS specific and can vary by OS. Some OSes reserve stack and heap areas, others don't.

In the embedded system I am working on, there are two heap areas: 1) The area specified in the linker, 2) A portion of the memory allocated to the OS. Both of these areas are set to zero size, so we don't have any heaps.

The stack areas are set up by initialization code that runs before the C language Run-Time Library is initialized. The RTL code may also create some stack areas as well. Our RTOS also creates stack areas (one for each task).

So, there is not one single area called the stack. Some platforms don't use a stack concept at all (especially those whose memory capacity is severely restricted).

I'm pretty sure they are not built into the hardware but I could be wrong.

Depends on the hardware. Simple and cheap hardware only allocates an area of RAM (read/write memory). More complex and expensive hardware may allocate separate areas for stacks, heaps, executables and data. Constants may be placed into ROM (read-only memory, such as Flash). There is no one-size or one-configuration that supports everything. Desktop computers are different animals than smaller embedded systems.

Also, is the compiler responsible for generating assembly code that specify which local or function data will be stored on the stack vs CPU registers?

The task can be in the Linker or Compiler or both.

Many compiler tool-chains utilize both stack and CPU registers. Many variables and data can be on the stack, in registers, in RAM or in ROM. A compiler is designed to make best use of the platform's resources, including memory and registers.

A good example to study is the assembly language generated by your compiler. Also look at the linker instruction file. The use of registers or stack memory is so dependent on the data structures (and types) that it may be different for different functions. Another factor is the amount and kind of memory available. If the processor has few registers available, the compiler may pass variables using the stack. Larger data (that doesn't fit in a register) may be passed on the stack, or a pointer to the data may be passed instead. There are too many options and combinations available to enumerate here.

Summary

In order for the C and C++ languages to be very portable, many concepts are delegated to the implementation (compiler / toolchain). Two of these concepts are commonly referred to as stack and heap. The C and C++ language standards use a simple model as the environment for the languages. Also, there are terms such as "hosted" and "semihosted" which indicate the degree to which a platform supports the language requirements. The stack and heap are data structures not required by the platform in order to support the languages. They do assist in the efficiency of the implementation.

If stacks and heaps are supported, their location and management is the responsibility of the implementation (toolchain). A compiler is free to use its own memory management functions or those of the OS (if present). The management of the stacks and heaps may require hardware support (such as virtual memory management or paging; and fences). There is no requirement for the stack to grow towards the heap. There is no requirement for the stack to grow in a positive direction. These are all up to the implementation (toolchain), and they can implement and locate stacks however and wherever they like. Note: most likely, they won't place variables in read-only memory and won't locate stacks outside memory capacity.


