What Does "Memory Allocated at Compile Time" Really Mean

What does "memory allocated at compile time" really mean?

Memory allocated at compile time means that the compiler resolves, at compile time, where certain things will be allocated inside the process memory map.

For example, consider a global array:

int array[100];

The compiler knows at compile time the size of the array and the size of an int, so it knows the size of the entire array at compile time. Also, a global variable has static storage duration by default: it is allocated in the static memory area of the process memory space (the .data or .bss section). Given that information, the compiler decides during compilation where in that static memory area the array will be (strictly speaking, the compiler fixes its position within a section, and the linker then assigns the final address).
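A minimal C sketch of that point, assuming a C11 compiler (for _Static_assert). array_bytes and check_array are hypothetical helper names. Because sizeof array is an integer constant expression, the size check below is done entirely by the compiler, and the array reads as zero because it has static storage duration.

```c
#include <assert.h>
#include <stddef.h>

/* Same declaration as in the answer: file scope, so static storage duration. */
int array[100];

/* sizeof array is an integer constant expression, so this check is performed
   entirely by the compiler; nothing about it is evaluated at run time. */
_Static_assert(sizeof array == 100 * sizeof(int), "array size is known at compile time");

/* Hypothetical helper: the value it returns is a compile-time constant. */
size_t array_bytes(void) {
    return sizeof array;
}

/* Hypothetical usage check: the size matches, and static storage starts zeroed. */
void check_array(void) {
    assert(array_bytes() == 100 * sizeof(int));
    assert(array[0] == 0 && array[99] == 0);
}
```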

Of course, those memory addresses are virtual addresses. The program assumes that it has its own entire address space (from 0x00000000 to 0xFFFFFFFF on a 32-bit system, for example). That's why the compiler can make assumptions like "Okay, the array will be at address 0x00A33211". At runtime those addresses are translated to real (physical) addresses by the MMU and the OS.
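To see that the address of a static object is settled before main runs, note that C even lets you use it in a static initializer. A minimal sketch, assuming a hosted C implementation; first and show_address are hypothetical names, and the printed address is a virtual address that may differ between builds or, with address-space layout randomization, between runs.

```c
#include <assert.h>
#include <stdio.h>

int array[100];

/* &array[0] is an address constant, so this pointer can be initialized
   statically: no code runs to compute it. Its value is filled in before
   main starts (through a relocation if the executable is position-independent). */
int *const first = &array[0];

/* Hypothetical helper: prints the virtual address the program sees. */
void show_address(void) {
    assert(first == &array[0]);
    /* This is a virtual address; with address-space layout randomization
       it may differ from run to run. */
    printf("array is at %p\n", (void *)first);
}
```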

Objects with static storage duration and explicit initial values are a bit different. For example:

int array[] = { 1 , 2 , 3 , 4 };

In the first example, the compiler only decided where the array will be allocated, storing that information in the executable.

In the case of explicitly initialized objects, the compiler also stores the initial values of the array in the executable (typically in the .data section), so that when the program loader maps that section into memory at program start, the array already contains those values; no extra code has to run to fill it.
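The same two cases in C, as a sketch. The section names in the comments describe typical GCC/ELF output and are an assumption about your toolchain; sum and check_arrays are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

int a[4];                 /* no initializer: zero-filled, typically .bss (no bytes in the file) */
int b[] = { 1, 2, 3, 4 }; /* initial values stored in the executable, typically in .data */

/* The element count of b is deduced from the initializer at compile time. */
_Static_assert(sizeof b / sizeof b[0] == 4, "b has 4 elements");

/* Hypothetical helper: adds up n ints. */
int sum(const int *p, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += p[i];
    }
    return total;
}

/* Hypothetical usage check: no user code filled these arrays in. */
void check_arrays(void) {
    assert(sum(a, 4) == 0);
    assert(sum(b, 4) == 10);
}
```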

Here is the assembly generated by the compiler for both cases (GCC 4.8.1 targeting x86-64):

C++ code:

int a[4];
int b[] = { 1 , 2 , 3 , 4 };

int main()
{}

Output assembly:

a:
    .zero 16
b:
    .long 1
    .long 2
    .long 3
    .long 4
main:
    pushq %rbp
    movq %rsp, %rbp
    movl $0, %eax
    popq %rbp
    ret

As you can see, the values are directly injected into the assembly. For the array a, the compiler emits 16 zero bytes (four 4-byte ints), because the Standard says that objects with static storage duration are zero-initialized by default:

8.5.9 (Initializers) [Note]:

Every object of static storage duration is zero-initialized at
program startup before any other initialization takes place. In some
cases, additional initialization is done later.
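C has an equivalent rule (C11 6.7.9p10): static objects that are not explicitly initialized get zero of the appropriate kind, which for pointers means a null pointer. A small sketch covering a few cases; every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct point { int x; double y; int *p; };

static int counter;          /* integer zero */
static double ratio;         /* 0.0 */
static int *ptr;             /* null pointer */
static struct point origin;  /* every member zero-initialized, recursively */

/* Hypothetical helper: block-scope statics follow the same rule. */
int *local_hits(void) {
    static int hits[8];
    return hits;
}

void check_zero_init(void) {
    assert(counter == 0);
    assert(ratio == 0.0);
    assert(ptr == NULL);
    assert(origin.x == 0 && origin.y == 0.0 && origin.p == NULL);
    assert(local_hits()[7] == 0);
}
```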

I always suggest that people disassemble their code to see what the compiler really does with their C++ code. This applies to everything from storage classes/duration (like this question) to advanced compiler optimizations. You can instruct your compiler to generate the assembly (for example, with GCC's -S option), but there are wonderful tools on the Internet that do this in a friendly manner. My favourite is GCC Explorer (now called Compiler Explorer).

Does automatic memory allocation occur at compile time or at run time in C?

Automatic allocation happens at run time, though the details are very system-specific. Automatic storage duration variables may end up in registers, on the stack, or be optimized away entirely.

If they do end up on the stack, the compiler assigns each variable a fixed offset within the stack frame of the function where it is declared. That is, the variable might be referred to as SP + 8 or something similar, where SP is the stack pointer. The stack pointer itself could hold any value when the function is entered; neither the compiler nor the machine code knows or cares about that, which is why stack overflows exist.
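A quick way to see the difference between automatic and static storage in C; both function names are hypothetical. The automatic c gets fresh storage on every call (a stack slot or a register, as described above), while the static c is a single object for the whole run.

```c
#include <assert.h>

/* c has automatic storage: new storage and a new initialization on every call. */
int auto_counter(void) {
    int c = 0;
    return ++c;
}

/* c has static storage: one object, zero-initialized once, for the whole program. */
int static_counter(void) {
    static int c;
    return ++c;
}

/* Hypothetical usage check: the automatic counter never remembers anything. */
void check_counters(void) {
    int first = auto_counter();
    int second = auto_counter();
    assert(first == 1 && second == 1);
}
```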

You might find this useful: "What gets allocated on the stack and the heap?"

When is static memory allocated in C/C++? At compile time or at the very beginning of when a program is run?

In typical tools, memory with static storage duration is arranged in multiple steps:

  • The compiler generates data in object modules (likely passing through some form of assembly code) that describes the needs for various kinds of memory: memory initialized to zero, memory initialized to particular values that is read-only thereafter, memory initialized to particular values that may be modified, memory that does not need to be initialized, and possibly others. The compiler also includes initial data as necessary, information about symbols that refer to various places in the required memory, and other information. At this point, the allocation of memory is in forms roughly like “8 bytes are needed in the constant data section, and a symbol called foo should be set to their address.”
  • The linker combines this information into similar information in an executable file. It also resolves some or all information about symbols. At this point, the allocation of memory is in forms like “The initialized non-constant data section requires 3048 bytes, and here is the initial data for it. When it is assigned a virtual address, the following symbols should be adjusted: bar is at offset 124 from the start of the section, baz is at offset 900…”
  • The program loader reads this information, allocates locations in the virtual address space for it, and may read some of the data from the executable file into memory or inform the operating system where the data is to be found when it is needed. At this point, the places in the code that refer to various symbols have been modified according to the final values of those symbols.
  • The operating system allocates physical memory for the virtual addresses. Often, this is done “on demand” in pieces (memory pages) when a process attempts to access the memory in a specific page, rather than being done at the time the program is initially loaded.
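The kinds of memory listed in the first step map onto separate sections. A sketch, assuming a typical GCC/ELF toolchain where the placement in the comments holds (other toolchains may differ); on such a system, tools like objdump -h or size show the resulting section sizes. greeting, limits, scratch and check_sections are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Typical placement with GCC/ELF (an assumption; other toolchains differ): */
const char greeting[] = "hello";  /* initialized, read-only: usually .rodata  */
int limits[3] = { 10, 20, 30 };   /* initialized, modifiable: usually .data   */
char scratch[4096];               /* zero-initialized: usually .bss, no bytes in the file */

/* Hypothetical usage check. */
void check_sections(void) {
    assert(strcmp(greeting, "hello") == 0);
    limits[1] += 5;               /* initialized data may be modified */
    assert(limits[1] == 25);
    assert(scratch[4095] == 0);
}
```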

All in all, static memory is not allocated at any single point in time. It is the result of many activities. The effect on the program is largely the same as if it were all allocated when the program started, but the physical memory might only be allocated just before an instruction actually uses it. (The physical memory can even be taken away from the process and restored to it later.)
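A sketch of the on-demand behaviour, assuming a typical virtual-memory OS, an ELF toolchain that puts zero-initialized data in .bss, and 4096-byte pages (all assumptions; none of this is guaranteed by C). The 16 MiB array does not enlarge the executable file, and only the pages that touch_prefix (a hypothetical helper) writes to are likely to receive physical memory.

```c
#include <assert.h>
#include <stddef.h>

#define BIG_SIZE (16u * 1024u * 1024u) /* 16 MiB */
#define PAGE_GUESS 4096u               /* assumed page size */

static unsigned char big[BIG_SIZE];    /* zero-initialized: typically .bss */

/* Writes one byte per assumed page in [0, upto) and returns how many
   pages were touched. Untouched pages typically never get physical memory. */
size_t touch_prefix(size_t upto) {
    assert(upto <= BIG_SIZE);
    size_t touched = 0;
    for (size_t i = 0; i < upto; i += PAGE_GUESS) {
        big[i] = 1;
        touched++;
    }
    return touched;
}
```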

What does "All memory allocated on the stack is known at compile time" mean?

The statement is a little simplified for the reader. You're right that the stack is dynamic in nature and that the amount actually allocated can vary depending on dynamic input. Here is a simple example with a recursive function:

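#include <iostream>

void f(int n)
{
    int x = n * 10;     // automatic: every call gets its own x in its own frame
    if (x <= 0) return; // <= (not ==) also ends the recursion for negative input

    std::cout << x << std::endl;
    f(n - 1);
}

int main()
{
    int n;
    std::cout << "Enter n: " << std::endl;
    std::cin >> n;
    f(n);
}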

Here the number of invocations of the recursive function f clearly depends on the n entered by the user, so for any given invocation the compiler cannot possibly know the exact memory address of the local variable x in f. What it does know is x's offset within the local stack frame, which is what I believe the example is referring to. A stack frame is a region of the stack set up each time a function is invoked. Within a given stack frame, local variables sit at known constant offsets relative to the start of the frame. That start is typically held in a dedicated register during each invocation (the frame or base pointer, e.g. %rbp on x86-64, unless the compiler omits it and addresses locals relative to the stack pointer instead), so all the compiler has to do to find the address of any local is to add its fixed, known offset to this dynamic base pointer.
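The same idea in C, as a sketch: f_record is a hypothetical variant of f without the printing, and MAX_DEPTH is an arbitrary limit I picked. It records the address of x at each recursion depth. The absolute addresses differ from call to call because each call gets its own frame; what stays fixed is x's position within each frame, which portable C cannot show directly but the generated assembly does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 64

/* seen[d] holds the address of x in the call at recursion depth d. */
static uintptr_t seen[MAX_DEPTH];

/* A C version of f from the example, minus the printing.
   Returns the number of stack frames (calls) used. */
static size_t f_record(int n, size_t depth) {
    int x = n * 10;                       /* lives at a fixed offset in this frame */
    assert(depth < MAX_DEPTH);
    seen[depth] = (uintptr_t)(void *)&x;  /* absolute address depends on the depth */
    if (x <= 0) return depth + 1;

    size_t frames = f_record(n - 1, depth + 1);
    /* x is still alive here, at the same address, after the deeper calls return. */
    assert(seen[depth] == (uintptr_t)(void *)&x);
    return frames;
}
```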

Why are local static variables allocated memory at compile time?

Allocating and initializing the memory at compile time means the program doesn't have to keep track of whether the function has already been entered and the variable has been initialized. Local static variables with constant initial values are treated essentially the same as global variables, except that the name is only in the scope of that function.

It's a time-space tradeoff: initializing the variable during the first call would require a check that has to be executed every time the function is called. Initializing it when the program is loaded means that its initialization is done as part of loading the executable's initialized data section into memory, along with the global statics.
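A minimal C sketch of that behaviour (next_ticket and check_tickets are hypothetical names). Both statics are set up before main runs, so the function body has no first-call check, and whatever it writes into the table persists across calls.

```c
#include <assert.h>

/* Both statics come from the executable's data image (or are zero-filled)
   before main runs, just like globals: no "is this the first call?" test. */
int next_ticket(void) {
    static int table[4] = { 100, 200, 300, 400 }; /* constant initializer */
    static int idx;                               /* zero-initialized */
    int t = table[idx % 4];
    table[idx % 4] += 1;  /* changes persist across calls */
    idx++;
    return t;
}

/* Usage: the first four calls see the initial values, the fifth a modified one. */
void check_tickets(void) {
    const int expected[5] = { 100, 200, 300, 400, 101 };
    for (int i = 0; i < 5; i++) {
        int t = next_ticket();
        assert(t == expected[i]);
    }
}
```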

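See "What is the lifetime of a static variable in a C++ function?" for the more complicated case of C++ local static variables. In C++ I would probably use a static std::array. If its initializer is a constant expression, it is constant-initialized, which in practice works just like the C case; only a local static whose initialization has to run code is initialized when control first passes through its declaration.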

If you have a large array in a function that's rarely called, and you don't want to waste memory for it, use a static pointer instead of a static array, and initialize it yourself.

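#include <stdlib.h>

#define N 100000 // hypothetical size of the rarely used array

void func1(void) {
    static int *data; // static pointer: NULL until the first call allocates

    if (!data) { // Need to protect this with a mutex if multi-threading
        data = malloc(N * sizeof(int));
        if (!data) return; // allocation failed; try again on the next call
        for (int i = 0; i < N; i++) {
            data[i] = i;
        }
    }
    // ...
}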

This is essentially the code the compiler would have to generate to perform first-time initialization of the array.

I'm a little confused about whether automatic memory allocation takes place at run time or at compile time.

In your example, it is unclear where "a" is defined, so I'll take a stab at answering this by making assumptions about that.

  1. If the array is declared as a global array without an initializer, it resides in the bss segment, and memory is allocated as the segments are loaded into memory.
  2. If the array is on the stack, and the size of the array is known at compile-time, the stack pointer is moved to allocate space for the array. You can see this if you disassemble the code.
  3. If the array is on the stack, but its size depends on an argument to the function, you have a VLA (variable-length array). Compilers commonly implement these much like an "alloca" call: the stack pointer is simply moved at run time to allocate n * sizeof(element) bytes on the stack.
  4. If the array is on the heap, the allocations are performed by the heap allocator in use.
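The four cases side by side, as a C sketch. The names are hypothetical, case 3 needs VLA support (mandatory in C99, optional in C11 and later, supported by GCC and Clang), and the comments about the stack pointer describe typical code generation rather than anything the Standard requires.

```c
#include <assert.h>
#include <stdlib.h>

int global_arr[16];            /* case 1: static storage; typically .bss */

/* case 2: fixed-size automatic array; its space is typically reserved when
   the function's stack frame is set up on entry. */
int fixed_stack(void) {
    int arr[16] = { 0 };
    arr[15] = 15;
    return arr[15];
}

/* case 3: variable-length array; its size is only known at run time,
   so the stack pointer is adjusted by n * sizeof(int) during the call. */
int vla_stack(int n) {
    assert(n > 0);
    int arr[n];
    for (int i = 0; i < n; i++) arr[i] = i;
    return arr[n - 1];
}

/* case 4: heap allocation, performed by whatever allocator malloc uses. */
int heap_array(int n) {
    assert(n > 0);
    int *arr = malloc((size_t)n * sizeof *arr);
    if (arr == NULL) return -1;  /* allocation can fail at run time */
    for (int i = 0; i < n; i++) arr[i] = i;
    int last = arr[n - 1];
    free(arr);
    return last;
}
```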

