Why Is Stack Memory Size So Limited

Why is stack memory size so limited?

My intuition is the following. The stack is not as easy to manage as the heap. The stack needs to be stored in contiguous memory locations. This means that you cannot allocate stack pages on demand from arbitrary locations; you need to at least reserve a range of virtual addresses for that purpose. The larger the reserved virtual address range, the fewer threads you can create.

For example, a 32-bit application generally has a virtual address space of 2GB. This means that if the stack size is 2MB (the default in pthreads), then you can create a maximum of 1024 threads. This can be too few for applications such as web servers. Increasing the stack size to, say, 100MB (i.e., you reserve 100MB, but do not necessarily allocate 100MB to the stack immediately) would limit the number of threads to about 20, which can be limiting even for simple GUI applications.

An interesting question is: why do we still have this limit on 64-bit platforms? I do not know the answer, but I assume that people are already used to some "stack best practices": be careful to allocate huge objects on the heap and, if needed, manually increase the stack size. Therefore, nobody found it useful to add "huge" stack support on 64-bit platforms.
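To make the "manually increase the stack size" part concrete, here is a minimal sketch using POSIX threads (my example; the 8 MB figure is arbitrary, not a recommendation):

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg) {
    (void)arg;
    /* ... thread work that needs a deep stack ... */
    return NULL;
}

int main(void) {
    pthread_attr_t attr;
    pthread_t tid;

    /* Reserve an 8 MB stack for this thread instead of the default. */
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 8 * 1024 * 1024);

    if (pthread_create(&tid, &attr, worker, NULL) != 0) {
        perror("pthread_create");
        return 1;
    }
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}

Every thread created this way carves another 8 MB out of the virtual address space, which is exactly the trade-off described above.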

Is there any limit on stack memory?

This all depends on what language and compiler you use. But programs compiled from, for instance, C or C++ allocate a fixed-size stack at program startup. The size of the stack can usually be specified at compile time (on my particular compiler it defaults to 1 MB).

Stack size on Linux - Limited stack size vs automatic stack expansion

The stack can grow, but not indefinitely. As shown in the diagram in the first question you link to, the stack and the heap can both grow into the free memory area, but if they keep growing eventually they'll run into each other.

It's not common to write programs that need an enormous amount of stack growth. Often, if a program keeps growing the stack, it's an indicator of a bug causing infinite recursion. Limiting the stack size catches these errors. While there are some algorithms that recurse deeply, they're not common, and they can often be refactored into iterative algorithms.
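As a contrived illustration (my example, not from the answer) of the kind of bug a bounded stack catches:

#include <stdio.h>

/* Deliberately missing its base case: every call adds a stack frame,
   and the addition performed after the call prevents tail-call
   optimization, so the stack grows until the program crashes. */
static long broken_depth(long n) {
    return 1 + broken_depth(n + 1);
}

int main(void) {
    printf("%ld\n", broken_depth(0));  /* never prints; overflows first */
    return 0;
}

With a 1 MB stack this dies after roughly a few tens of thousands of calls, immediately and diagnosably; with an effectively unbounded stack it would instead quietly eat memory.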

The problem with
int a[1000000];
isn't that it "wastes" the stack. Most architectures have a relatively small limit to the size of a single stack frame, so you can't have such large arrays as local variables.
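To make that concrete, a small sketch (hypothetical functions, assuming a default-sized stack of about 1 MB):

#include <stdlib.h>

void on_stack(void) {
    int a[1000000];   /* ~4 MB frame: likely overflows a default stack */
    a[0] = 1;
}

void on_heap(void) {
    int *a = malloc(1000000 * sizeof *a);  /* fine: not frame-limited */
    if (a) { a[0] = 1; free(a); }
}

int main(void) {
    on_heap();    /* works */
    on_stack();   /* likely dies with a stack overflow */
    return 0;
}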

But except for that, the usual reason to choose heap versus stack memory is how the data will be used. Stack variables are declared in the program code and their sizes are fixed at compile time. If you need a variable number of objects, it's usually necessary to use the heap (C has variable-length arrays, but C++ doesn't, and you can't resize them the way you can with realloc()). Also, memory allocated on the stack goes away when the function returns, so you must use heap objects for data that persists beyond a single function call.
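And a small sketch of the lifetime point (hypothetical functions for illustration):

#include <stdlib.h>

/* BROKEN: c lives in bad()'s stack frame and disappears when bad()
   returns, so the caller is left holding a dangling pointer. */
int *bad(void) {
    int c = 42;
    return &c;              /* using this afterwards is undefined behavior */
}

/* OK: heap memory persists until the caller free()s it. */
int *good(void) {
    int *p = malloc(sizeof *p);
    if (p) *p = 42;
    return p;
}

int main(void) {
    int *q = good();
    free(q);
    return 0;
}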

Don't worry about what he's talking about at 47:40 in the video. Application programs deal only with virtual memory; physical memory is totally hidden and only relevant to the internals of the virtual memory subsystem in the kernel.

The process break is used by the runtime library's implementation of malloc(), via the brk()/sbrk() system calls. You don't normally deal with it directly in application programs.
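If you are curious, on Linux/glibc you can watch the break move with the (deprecated but still available) sbrk() call. A sketch; note that modern allocators serve large requests with mmap() instead of the break, so this peeks at an implementation detail rather than guaranteed behavior:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    printf("break before malloc: %p\n", sbrk(0));
    void *p = malloc(1000);   /* small: likely carved out of the break */
    printf("break after malloc:  %p (block at %p)\n", sbrk(0), p);
    free(p);
    return 0;
}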

Size of stack and heap memory


"Stack is created in the top-level-address and the heap at the
low-level-address" Please elobarate this

This is a myth. It may have a basis in historical truth. It might sometimes resonate with things you see in real life. But it is not literally true.

It's easy enough to explore, though:

#include <stdlib.h>
#include <stdio.h>

void check(int depth) {
    char c;                   /* lives on the stack */
    char *ptr = malloc(1);    /* lives on the heap; leaked on purpose so
                                 each call gets a fresh, distinct address */
    printf("stack at %p, heap at %p\n", (void *)&c, (void *)ptr);
    if (depth <= 0) return;
    check(depth - 1);
}

int main() {
    check(10);
    return 0;
}

On my machine I see:

stack at 0x22ac3b, heap at 0x20010240
stack at 0x22ac0b, heap at 0x200485b0
stack at 0x22abdb, heap at 0x200485c0
stack at 0x22abab, heap at 0x200485d0
stack at 0x22ab7b, heap at 0x200485e0
stack at 0x22ab4b, heap at 0x200485f0
stack at 0x22ab1b, heap at 0x20048600
stack at 0x22aaeb, heap at 0x20048610
stack at 0x22aabb, heap at 0x20048620
stack at 0x22aa8b, heap at 0x20048630
stack at 0x22aa5b, heap at 0x20048640

So, the stack is going downwards and the heap is going upwards (as you might expect based on the myth), but the stack has the smaller address, and they are not growing toward each other (myth busted).

Btw, my check function is tail-recursive, and on some implementations with some compiler options you might see the stack not moving at all. Which tells you something about why the standard doesn't mandate how all this works -- if it did it might inadvertently forbid useful optimizations.

Why is stack size in C# exactly 1 MB?

[Photo: David Cutler]

You are looking at the guy that made that choice. David Cutler and his team selected one megabyte as the default stack size. It has nothing to do with .NET or C#; this was nailed down when they created Windows NT. One megabyte is what gets picked when the EXE header of a program or the CreateThread() winapi call doesn't specify the stack size explicitly. Which is the normal way; almost any programmer leaves it up to the OS to pick the size.

That choice probably pre-dates the Windows NT design; history is way too murky about this. It would be nice if Cutler would write a book about it, but he's never been a writer. He's been extraordinarily influential on the way computers work. His first OS design was RSX-11M, a 16-bit operating system for DEC (Digital Equipment Corporation) computers. It heavily influenced Gary Kildall's CP/M, the first decent OS for 8-bit microprocessors, which in turn heavily influenced MS-DOS.

His next design was VMS, an operating system for 32-bit processors with virtual memory support. Very successful. His next one was cancelled by DEC around the time the company started disintegrating, unable to compete with cheap PC hardware. Cue Microsoft: they made him an offer he could not refuse. Many of his co-workers joined too. They worked on VMS v2, better known as Windows NT. DEC got upset about it, and money changed hands to settle it. Whether VMS already picked one megabyte is something I don't know; I only know RSX-11 well enough. It isn't unlikely.

Enough history. One megabyte is a lot; a real thread rarely consumes more than a couple of handfuls of kilobytes. So a megabyte is actually rather wasteful. It is, however, the kind of waste you can afford on a demand-paged virtual memory operating system; that megabyte is just virtual memory. Just numbers to the processor, one for every 4096 bytes. You never actually use the physical memory, the RAM in the machine, until you actually address it.
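A sketch of that reserve-versus-touch distinction in Win32 terms (my example):

#include <stdio.h>
#include <windows.h>

int main(void) {
    /* Reserve 1 MB of address space: no RAM or pagefile space used yet. */
    char *reserved = VirtualAlloc(NULL, 1 << 20, MEM_RESERVE, PAGE_NOACCESS);

    /* Commit one 4096-byte page inside it: backing store is charged now,
       but physical RAM is only assigned when the page is first touched. */
    char *page = VirtualAlloc(reserved, 4096, MEM_COMMIT, PAGE_READWRITE);
    page[0] = 1;   /* first touch: the OS wires in an actual RAM page */

    printf("reserved at %p, committed page at %p\n",
           (void *)reserved, (void *)page);
    VirtualFree(reserved, 0, MEM_RELEASE);
    return 0;
}

A thread's stack works the same way: the address range is reserved up front, and pages are committed and backed by RAM only as the stack actually grows into them.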

It is extra excessive in a .NET program because the one megabyte size was originally picked to accommodate native programs, which tend to create large stack frames, storing strings and buffers (arrays) on the stack as well. That is infamous as a malware attack vector: a buffer overflow on the stack can manipulate the program with data. That is not the way .NET programs work; strings and arrays are allocated on the GC heap and indexing is checked. The only way to allocate space on the stack in C# is with the unsafe stackalloc keyword.

The only non-trivial user of the stack in .NET is the jitter. It uses the stack of your thread to just-in-time compile MSIL to machine code. I've never seen or checked how much space it requires; it rather depends on the nature of the code and whether or not the optimizer is enabled, but a couple of tens of kilobytes is a rough guess. Which is otherwise how this website got its name: a stack overflow in a .NET program is quite fatal. There isn't enough space left (less than 3 kilobytes) to still reliably JIT any code that tries to catch the exception. Kaboom to desktop is the only option.

Last but not least, a .NET program does something pretty unproductive with the stack. The CLR will commit the stack of a thread. That's an expensive word that means it doesn't just reserve the size of the stack, it also makes sure that space is set aside in the operating system's paging file so the stack can always be swapped out when necessary. Failing to commit is a fatal error that terminates a program unconditionally. That only happens on a machine with very little RAM that runs entirely too many processes, and such a machine will have turned to molasses before programs start dying. A possible problem 15+ years ago, not today. Programmers that tune their program to act like an F1 race car use the <disableCommitThreadStack> element in their .config file.

Fwiw, Cutler didn't stop designing operating systems. That photo was taken while he worked on Azure.


Update: I noticed that .NET no longer commits the stack. I'm not exactly sure when or why this happened; it's been too long since I checked. I'm guessing this design change happened somewhere around .NET 4.5. A pretty sensible change.

C/C++ maximum stack size of program on mainstream OSes

In Visual Studio the default stack size is 1 MB, I think, so with a recursion depth of 10,000 each stack frame can be at most ~100 bytes, which should be sufficient for a DFS algorithm.

Most compilers, including Visual Studio, let you specify the stack size. On some (all?) Linux flavours the stack size isn't part of the executable but a resource limit on the process, inherited from the shell. You can check the stack size with ulimit -s and set it to a new value with, for example, ulimit -s 16384.
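The same limit is reachable from C on POSIX systems via getrlimit()/setrlimit(); a quick sketch:

#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;

    /* Query the current stack-size limit (what `ulimit -s` reports). */
    if (getrlimit(RLIMIT_STACK, &rl) == 0)
        printf("soft limit: %lld bytes, hard limit: %lld bytes\n",
               (long long)rl.rlim_cur, (long long)rl.rlim_max);

    /* Raise the soft limit to 16 MB (cannot exceed the hard limit). */
    rl.rlim_cur = 16L * 1024 * 1024;
    if (setrlimit(RLIMIT_STACK, &rl) != 0)
        perror("setrlimit");
    return 0;
}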


DFS without recursion, using an explicit stack (fleshed out here as compilable C++; the int-indexed adjacency list is an illustrative assumption, since the original Node type was left abstract):

#include <stack>
#include <vector>

// Iterative DFS: returns true if target is reachable from start.
// The explicit std::stack replaces the call stack of the recursive version.
bool dfs_find(const std::vector<std::vector<int>> &adj, int start, int target) {
    std::vector<bool> visited(adj.size(), false);
    std::stack<int> dfs;
    dfs.push(start);
    while (!dfs.empty()) {
        int top = dfs.top();
        dfs.pop();
        if (top == target)
            return true;            // top is what we are looking for
        if (visited[top])
            continue;               // already expanded this node
        visited[top] = true;
        for (int next : adj[top])   // outgoing nodes from top
            if (!visited[next])
                dfs.push(next);
    }
    return false;
}

.NET stack memory limit

You're asking the wrong question. If stack size matters, you're doing something wrong.

If you use many datapoints, you'll put them in a collection, such as an array. Arrays are always allocated on the heap. An array of structs embeds the individual structs and forms a contiguous memory block. (If you have more than 2GB of data, you need several arrays.)

Whereas with reference types, the array will only contain the references, and the objects get allocated individually on the heap. A heap allocation has about 16 bytes of overhead; the reference in the array accounts for another 8.

You'll also get worse cache locality due to the indirections, and the GC has to do more work to crawl all those references.

My conclusion is that if you have many small datapoints, make them a struct, and put them in an array.
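The same trade-off shows up in native code. A sketch of the idea in C terms (my analogy, not from the answer): an array of structs is one contiguous block, while an array of pointers scatters the payloads across the heap.

#include <stdlib.h>

typedef struct { double x, y; } Point;

int main(void) {
    size_t n = 1000000;

    /* Array of structs: one allocation, one contiguous block,
       good cache locality when scanning the data. */
    Point *values = malloc(n * sizeof *values);

    /* Array of pointers: n + 1 allocations, per-block allocator
       overhead, and an extra indirection on every access -- the
       C analogue of an array of reference-type objects. */
    Point **refs = malloc(n * sizeof *refs);
    for (size_t i = 0; i < n; i++)
        refs[i] = malloc(sizeof **refs);

    /* ... use the data ... */

    for (size_t i = 0; i < n; i++)
        free(refs[i]);
    free(refs);
    free(values);
    return 0;
}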


