Why Does a C/C++ Compiler Need to Know the Size of an Array at Compile Time

Why does a C/C++ compiler need to know the size of an array at compile time?

To understand why variably-sized arrays are more complicated to implement, you need to know a little about how automatic storage duration ("local") variables are usually implemented.

Local variables tend to be stored on the runtime stack. The stack is basically a large array of memory that is allocated sequentially to local variables, with a single index pointing to the current "high water mark". This index is the stack pointer.

When a function is entered, the stack pointer is moved in one direction to allocate memory on the stack for local variables; when the function exits, the stack pointer is moved back in the other direction, to deallocate them.

This means that the actual location of local variables in memory is defined only with reference to the value of the stack pointer at function entry¹. The code in a function must access local variables via an offset from the stack pointer. The exact offsets to be used depend upon the size of the local variables.

Now, when all the local variables have a size that is fixed at compile-time, these offsets from the stack pointer are also fixed - so they can be coded directly into the instructions that the compiler emits. For example, in this function:

void foo(void)
{
    int a;
    char b[10];
    int c;
}

a might be accessed as STACK_POINTER + 0, b might be accessed as STACK_POINTER + 4, and c might be accessed as STACK_POINTER + 14 (or STACK_POINTER + 16 if the compiler pads c to a 4-byte boundary).

However, when you introduce a variably-sized array, these offsets can no longer be computed at compile-time; some of them will vary depending upon the size that the array has on this invocation of the function. This makes things significantly more complicated for compiler writers, because they must now write code that accesses STACK_POINTER + N - and since N itself varies, it must also be stored somewhere. Often this means doing two accesses - one to STACK_POINTER + <constant> to load N, then another to load or store the actual local variable of interest.

¹ In fact, "the value of the stack pointer at function entry" is such a useful value to have around that it has a name of its own - the frame pointer - and many CPUs provide a separate register dedicated to storing the frame pointer. In practice, it is usually the frame pointer from which the location of local variables is calculated, rather than the stack pointer itself.

Why does a C++ compiler allow creating an array on the stack whose size is unknown at compile time?

C++ does not permit Variable Length Arrays (VLAs).

However, C has had them since C99 (C11 made them an optional feature), so some C++ compilers, GCC among them, accept them as an extension.

When compiling, make sure to explicitly select a language standard (choose C++17 or later if you can) and ask for pedantic (strictly standards-conforming) behavior; with GCC, for example, -std=c++17 -pedantic-errors.

How does the compiler allocate memory without knowing the size at compile time?

This is not a "static memory allocation". Your array k is a Variable Length Array (VLA), which means that memory for this array is allocated at run time. The size will be determined by the run-time value of n.

The language specification does not dictate any specific allocation mechanism, but in a typical implementation your k will usually end up being a simple int * pointer with the actual memory block being allocated on the stack at run time.

For a VLA, the sizeof operator is evaluated at run time as well, which is why you obtain the correct value from it in your experiment. Just use %zu (not %ld) to print values of type size_t.

The primary purpose of malloc (and other dynamic memory allocation functions) is to override the scope-based lifetime rules, which apply to local objects. I.e. memory allocated with malloc remains allocated "forever", or until you explicitly deallocate it with free. Memory allocated with malloc does not get automatically deallocated at the end of the block.

A VLA, as in your example, does not provide this "scope-defeating" functionality. Your array k still obeys regular scope-based lifetime rules: its lifetime ends at the end of the block. For this reason, in the general case, a VLA cannot possibly replace malloc and other dynamic memory allocation functions.

But in specific cases when you don't need to "defeat scope" and just use malloc to allocate a run-time sized array, VLA might indeed be seen as a replacement for malloc. Just keep in mind, again, that VLAs are typically allocated on the stack and allocating large chunks of memory on the stack to this day remains a rather questionable programming practice.

Ensuring array is filled to size at compile time

It is standard practice for such cases of arrays and corresponding enums to compare the "enum size member", N_COLORS in your case, against the number of items in the array.

To get the number of items in an array, simply divide the array size by the size of one array member.

Thus:

_Static_assert(sizeof(COLOR_NAMES)/sizeof(*COLOR_NAMES) == N_COLORS, 
               "Array item missing!");

Edit:

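Oh btw, for this to be meaningful the array must be declared with empty brackets, const char* COLOR_NAMES[] = ..., so that its length comes from the initializer list. With an explicit size such as [N_COLORS], missing initializers are silently filled with null pointers and you wouldn't be able to tell that any are missing.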

How to determine the length of an array at compile time?

(sizeof(array)/sizeof(array[0]))

Or as a macro

#define ARRAY_SIZE(foo) (sizeof(foo)/sizeof((foo)[0]))

    int array[10];
    printf("%zu %zu\n", sizeof(array), ARRAY_SIZE(array));

40 10

Caution: You can apply this ARRAY_SIZE() macro to a pointer (for example, an array function parameter, which is really a pointer) and get a meaningless value, possibly without any compiler warning or error.

Static memory allocation of array at compile time

The array's size isn't determined at compile time; it's determined at run time. At the time that the array is allocated, n has a known value. In typical implementations where automatic variables are allocated on the program stack, the stack pointer will be adjusted to make room for that many ints. The array becomes part of the stack frame and will be automatically reclaimed when it goes out of scope.

This code was not valid in C90; C90 required that all variables be declared at the beginning of the block, so mixing declarations and code like this was not permitted. Variable-length arrays and mixed code and declarations were introduced in C99.

Determine positions in array at compile-time in c

Rather than trying to kludge C to do something it was not meant to do, a better and common method is to prepare data for a C program by writing a program that prepares the data. That is, write some other program that counts the data in the parts and writes the C code necessary to initialize both DATA and PART_IDX.

Another option is:

Put the data for each part in a separate “.h” file, such as “part1.h”, “part2.h”, and “part3.h”.
To initialize DATA, include all of those header files in its initializer list.
To calculate the indices for the parts, use sizeof to count the elements in proxy arrays containing the preceding parts.

Example:

“part1.h” contains: 10, 11, 12,

“part2.h” contains: 20, 21,

“part3.h” contains: 30, 31, 32, 33,

The C file is:

const unsigned char DATA[] =
{
    #include "part1.h"
    #include "part2.h"
    #include "part3.h"
};

const unsigned int PART_IDX [] =
{
    0,

    sizeof (const unsigned char []) {
        #include "part1.h"
    } / sizeof (const unsigned char),

    sizeof (const unsigned char []) {
        #include "part1.h"
        #include "part2.h"
    } / sizeof (const unsigned char),
};

#include <stdio.h>

int main(void)
{
    for (int i = 0; i < 3; ++i)
        printf("Part %d begins at index %u with value %d.\n",
            i, PART_IDX[i], DATA[PART_IDX[i]]);
}

In C++ what is the point of std::array if the size has to be determined at compile time?

Ease of programming

std::array provides several of the convenient interfaces and idioms familiar from std::vector. Plain C-style arrays have no .size() (only the sizeof hack), no .at() (which throws on an out-of-range index), no front()/back(), no iterator member functions, and so on. Everything has to be hand-coded.

Many programmers might choose std::vector even for arrays whose size is known at compile time, just because they want these conveniences. But that gives up the performance available with fixed-size arrays, since std::vector allocates its storage dynamically.

Hence std::array was provided by the library designers to discourage C-style arrays, while avoiding std::vector when the size is known at compile time.

C++, Array size must be a constant expression

Because those books are teaching you C++.

It is true, in C++.

What you are using is a non-standard extension provided by GCC specifically, called Variable Length Arrays.

If you turn on all your compiler warnings, which you should always do (with GCC that includes -Wpedantic, since -Wall and -Wextra alone do not flag this), you will be informed about this during the build.
