How Do Compilers Treat Variable Length Arrays

How do compilers treat variable length arrays

In the C99 version of the C standard, variable length arrays are permitted. However, they are not permitted in any version of C++; you're seeing a G++ extension. Note that Microsoft's C compiler does not fully support C99; since G++ supports C99 it's easy enough to apply the VLA support to C++ as an extension.

As to how the compiler usually implements VLAs, it's the same as alloca() (except that it has to keep the size around for sizeof) - the compiler saves the original stack pointer, then adjusts it down by however many bytes it calculates that it needs. The downside is function entry and exit is a bit more complicated as the compiler needs to store where to reset the stack pointer to rather than just adjusting by fixed constants.

How does GCC implement variable-length arrays?

Here's the allocation code (x86 - the x64 code is similar) for the following example line taken from some GCC docs for VLA support:

char str[strlen (s1) + strlen (s2) + 1];

where the calculation for strlen (s1) + strlen (s2) + 1 is in eax (GCC MinGW 4.8.1 - no optimizations):

mov edx, eax
sub edx, 1
mov DWORD PTR [ebp-12], edx
mov edx, 16
sub edx, 1
add eax, edx
mov ecx, 16
mov edx, 0
div ecx
imul eax, eax, 16
call ___chkstk_ms
sub esp, eax
lea eax, [esp+8]
add eax, 0
mov DWORD PTR [ebp-16], eax

So it looks to be essentially alloca().

How do I tell if I am using VLA (Variable Length Array)?

int *numberArray; numberArray = new int[size]; is not a variable-length array, it's a dynamically allocated array. That's fine. Note that you have to delete[] it when done, which you do.

A VLA declaration would look like int numberArray[size]; where size is not a constant. It gets automatically deallocated when it goes out of scope, so you don't use delete on it. They are typically allocated on the stack and so can be created and deallocated very fast, but have various pitfalls. The main one is that there is no way to check if enough stack space is available, and your program will simply crash if there isn't; there is no way to safely detect or handle that error. So you would have to be very careful about checking that the value of size is reasonable.

VLAs are not part of standard C++, but some compilers support them anyway.

Why aren't variable-length arrays part of the C++ standard?

There recently was a discussion about this kicked off in usenet: Why no VLAs in C++0x.

I agree with those people that seem to agree that having to create a potential large array on the stack, which usually has only little space available, isn't good. The argument is, if you know the size beforehand, you can use a static array. And if you don't know the size beforehand, you will write unsafe code.

C99 VLAs could provide a small benefit of being able to create small arrays without wasting space or calling constructors for unused elements, but they will introduce rather large changes to the type system (you need to be able to specify types depending on runtime values - this does not yet exist in current C++, except for new operator type-specifiers, but they are treated specially, so that the runtime-ness doesn't escape the scope of the new operator).

You can use std::vector, but it is not quite the same, as it uses dynamic memory, and making it use one's own stack-allocator isn't exactly easy (alignment is an issue, too). It also doesn't solve the same problem, because a vector is a resizable container, whereas VLAs are fixed-size. The C++ Dynamic Array proposal is intended to introduce a library based solution, as alternative to a language based VLA. However, it's not going to be part of C++0x, as far as I know.

Why is this VLA (variable-length array) definition unreliable?

Diagnosis

In the code in the question, the variable n is uninitialized when it is used in the definition of vla. Indeed, with GCC set fussy, the code shown produces a compilation error (it'll give a warning if you are careless enough to omit -Werror from your compilation options — don't do that!):

$ gcc -std=c11 -O3 -g -Wall -Wextra -Werror -Wstrict-prototypes -Wmissing-prototypes -Wshadow -pedantic-errors  vla37.c -o vla37  
vla37.c: In function ‘main’:
vla37.c:6:5: error: ‘n’ is used uninitialized [-Werror=uninitialized]
6 | double vla[n];
| ^~~~~~
vla37.c:5:9: note: ‘n’ declared here
5 | int n;
| ^
cc1: all warnings being treated as errors
$

(That's from GCC 11.2.0 on a machine running RedHat RHEL 7.4.)

The trouble is that the compiler must know the size of the array when it is declared, but the value in n is undefined (indeterminate) because it is uninitialized. It could be huge; it could be zero; it could be negative.

Prescription

The cure for the problem is simple — make sure the size is known and sane before it is used to declare the VLA:

#include <stdio.h>

int main(void)
{
int n;

if (scanf("%d", &n) != 1)
return 1;

double vla[n];
for (int i = 0; i < n; i++)
{
if (scanf("%lf", &vla[i]) != 1)
return 1;
}
for (int i = 0; i < n; i++)
printf("[%d] = %.2f\n", i, vla[i]);
return 0;
}

Now you can run the result:

$ vla41 <<<'9 2.34 3.45 6.12 8.12 99.60 -12.31 1 2 3'
[0] = 2.34
[1] = 3.45
[2] = 6.12
[3] = 8.12
[4] = 99.60
[5] = -12.31
[6] = 1.00
[7] = 2.00
[8] = 3.00
$

(That assumes your shell is Bash or compatible with Bash and supports 'here strings' (the <<<'…' notation.)

The code shown in the question and in this answer is barely adequate in handling I/O errors; it detects input problems but doesn't provide useful feedback to the user.
The code shown does not validate the value of n for plausibility. You should ensure that the size is larger than zero and less than some upper bound. The maximum size depends on the size of the data being stored in the VLA and the platform you're on.

If you're on a Unix-like machine, you probably have 8 MiB of stack; if you're on a Windows machine, you probably have 1 MiB of stack; if you're on an embedded system, you may have much less stack available to you. You need to leave some stack space for other code too, so you should probably check that the array size is not more than, for sake of discussion, 1024 — that would be 8 KiB of stack for an array of double, which is not huge at all but it provides plenty of space for most homework programs. Tweak the number larger to suit your purposes, but when the number grows, you should use malloc() et al to dynamically allocate the array instead of using an on-stack VLA. For example, on a Windows machine, if you use a VLA of type int, setting the size above 262,144 (256 * 1024) almost guarantees that your program will crash, and it may crash at somewhat smaller sizes than that.

Lessons to learn

  • Compile with stringent warning options.
  • Compile with -Werror or its equivalent so warnings are treated as errors.
  • Make sure the variable defining the size of a VLA is initialized before defining the array.
    • Not too small (not zero, not negative).
    • Not too big (not using more than 1 megabyte on Windows, 8 megabytes on Unix).
    • Leave a decent margin for other code to use as well.

Note that all compilers that support VLAs also support variables defined at arbitrary points within a function. Both features were added in C99. VLAs were made optional in C11 — and a compiler should define __STDC_NO_VLA__ if it does not support VLAs at all but claims conformance with C11 or later.

C++ and variable-length arrays

Standard C++ does not support C-style VLAs. However, GCC (g++) does support them by default as an extension. This can cause confusion.

Why is it a bad idea to use variable length arrays in C++?

Variable length arrays are supported by some compilers as an extension. They are not standard C++. Using VLAs makes your code non-portable.

A better alternative is to use std::vector.

int i = 0;
std::cin >> i;

std::vector<int> array(i);

How does C99 handle being unable to create a variable length array at run time?

From the point of view of the Standard, an attempt to allocate a VLA with a size the implementation cannot accommodate invokes Undefined Behavior. Because the Standard provides no means of discovering what size array an implementation could safely create, and does not mandate that implementations allow any particular size, any attempt to create a VLA object with a size greater than 1 should be regarded as invoking Undefined Behavior except in cases where one happens to know enough about implementation's inner workings to determine the size of VLA it will be able to handle.

If malloc() is unavailable, one's best bet may be to define a large array of
whatever type has the coarsest alignment requirement, store its address into
a volatile-qualified pointer [the storage in which the pointer itself resides
should be thus qualified] read it back, and interpret that as the start of a
memory pool. No other use should be made of the original array object. While
the Standard wouldn't guarantee that a compiler wouldn't decide that it should generate code that checks whether the pointer still identifies the original object and, if it does, skipping any code that would use that pointer to access anything other than the original object's type, the use of volatile on the pointer should make that really unlikely.

Once a memory pool is created, you can write your own memory-management
functions to use it, though any time a pointer is returned to the pool it
may be necessary to use the volatile-pointer-laundering hack to prevent
compilers from using type-based aliasing to justify treating the last uses
of storage as its old type as unsequenced relative to the first uses of
storage as a new type.

What are the differences between Variable Length Arrays and Flexible Array Member?

There are two different things that the C99 standard added and they are easy to mix up by mistake:

Flexible array members. This means that a struct can have a member of unknown size at the end. Example from the C standard:

    struct s { int n; double d[]; };

int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));

This was used before C99 as well, but it was then undefined behavior, known as the "struct hack" referred to in another answer. Before C90, there could be unexpected padding bytes at the end of the struct, leading to bugs.

Variable length arrays (VLA). These are arrays with their size set in runtime. They are most likely implemented by the compiler by using dynamic memory allocation. Example:

void func (int n)
{
int array[n];
}

referred from user29079 : https://softwareengineering.stackexchange.com/questions/154089/c-flexible-arrays-when-did-they-become-part-of-the-standard

Is using alloca() for variable length arrays better than using a vector on the heap?

You could and probably should use some dynamically allocated heap memory, such as managed by a std::vector (as answered by Peter). You could use smart pointers, or plain raw pointers (new, malloc,....) that you should not forget to release (delete,free,....). Notice that heap allocation is probably faster than what you believe (practically, much less than a microsecond on current laptops most of the time).

Sometimes you can move the allocation out of some inner loop, or grow it only occasionally (so for a realloc-like thing, better use unsigned newsize=5*oldsize/4+10; than unsigned newsize=oldsize+1; i.e. have some geometrical growth). If you can't use vectors, be sure to keep separate allocated size and used lengths (as std::vector does internally).

Another strategy would be to special case small sizes vs bigger ones. e.g. for an array less than 30 elements, use the call stack; for bigger ones, use the heap.

If you insist on allocating (using VLAs -they are a commonly available extension of standard C++11- or alloca) on the call stack, be wise to limit your call frame to a few kilobytes. The total call stack is limited (e.g. often to about a megabyte or a few of them on many laptops) to some implementation specific limit. In some OSes you can raise that limit (see also setrlimit(2) on Linux)

Be sure to benchmark before hand-tuning your code. Don't forget to enable compiler optimization (e.g. g++ -O2 -Wall with GCC) before benchmarking. Remember that caches misses are generally much more expensive than heap allocation. Don't forget that developer's time also has some cost (which often is comparable to cumulated hardware costs).

Notice that using static variable or data has also issues (it is not reentrant, not thread safe, not async-signal-safe -see signal-safety(7) ....) and is less readable and less robust.



Related Topics



Leave a reply



Submit