Arrays, Heap and Stack and Value Types

Arrays, heap and stack and value types

Your array is allocated on the heap, and the ints are not boxed.

The source of your confusion is likely because people have said that reference types are allocated on the heap, and value types are allocated on the stack. This is not an entirely accurate representation.

All local variables and parameters are allocated on the stack. This includes both value types and reference types. The difference between the two is only what is stored in the variable. Unsurprisingly, for a value type, the value of the type is stored directly in the variable, and for a reference type, the value of the type is stored on the heap, and a reference to this value is what is stored in the variable.

The same holds for fields. When memory is allocated for an instance of an aggregate type (a class or a struct), it must include storage for each of its instance fields. For reference-type fields, this storage holds just a reference to the value, which would itself be allocated on the heap later. For value-type fields, this storage holds the actual value.

So, given the following types:

class RefType{
public int I;
public string S;
public long L;
}

struct ValType{
public int I;
public string S;
public long L;
}

The values of each of these types would require 16 bytes of memory (assuming a 32-bit word size). The field I in each case takes 4 bytes to store its value, the field S takes 4 bytes to store its reference, and the field L takes 8 bytes to store its value. So the memory for the value of both RefType and ValType looks like this:


0 ┌───────────────────┐
│ I │
4 ├───────────────────┤
│ S │
8 ├───────────────────┤
│ L │
│ │
16 └───────────────────┘

Now if you had three local variables in a function, of types RefType, ValType, and int[], like this:

RefType refType;
ValType valType;
int[] intArray;

then your stack might look like this:


0 ┌───────────────────┐
│ refType │
4 ├───────────────────┤
│ valType │
│ │
│ │
│ │
20 ├───────────────────┤
│ intArray │
24 └───────────────────┘

If you assigned values to these local variables, like so:

refType = new RefType();
refType.I = 100;
refType.S = "refType.S";
refType.L = 0x0123456789ABCDEF;

valType = new ValType();
valType.I = 200;
valType.S = "valType.S";
valType.L = 0x0011223344556677;

intArray = new int[4];
intArray[0] = 300;
intArray[1] = 301;
intArray[2] = 302;
intArray[3] = 303;

Then your stack might look something like this:


0 ┌───────────────────┐
│ 0x4A963B68 │ -- heap address of `refType`
4 ├───────────────────┤
│ 200 │ -- value of `valType.I`
│ 0x4A984C10 │ -- heap address of `valType.S`
│ 0x44556677 │ -- low 32-bits of `valType.L`
│ 0x00112233 │ -- high 32-bits of `valType.L`
20 ├───────────────────┤
│ 0x4AA4C288 │ -- heap address of `intArray`
24 └───────────────────┘

Memory at address 0x4A963B68 (value of refType) would be something like:


0 ┌───────────────────┐
│ 100 │ -- value of `refType.I`
4 ├───────────────────┤
│ 0x4A984D88 │ -- heap address of `refType.S`
8 ├───────────────────┤
│ 0x89ABCDEF │ -- low 32-bits of `refType.L`
│ 0x01234567 │ -- high 32-bits of `refType.L`
16 └───────────────────┘

Memory at address 0x4AA4C288 (value of intArray) would be something like:


0 ┌───────────────────┐
│ 4 │ -- length of array
4 ├───────────────────┤
│ 300 │ -- `intArray[0]`
8 ├───────────────────┤
│ 301 │ -- `intArray[1]`
12 ├───────────────────┤
│ 302 │ -- `intArray[2]`
16 ├───────────────────┤
│ 303 │ -- `intArray[3]`
20 └───────────────────┘

Now, if you passed intArray to another function, the value pushed onto the stack would be 0x4AA4C288, the address of the array, not a copy of the array.

How is value type array allocated in a heap?


How can array require only a single heap allocation?... or what does it mean by single heap allocation

First of all, let's clarify what we mean by "heap" vs "stack".

Most programming environments today are stack-based. As you run a program, each time you call a method a new entry is pushed onto a special stack provided for your program. This stack entry (or frame) tells the system where to look for the method's executable code, what arguments were passed, and exactly where to return to in the calling code after the method exits. When a method finishes, it's entry is removed (popped) from the stack, so the program can go back to the previous method. When the stack is empty, the program has finished.1 There is often support for this special stack directly in the CPU.

The memory for the stack is allocated when the program is first launched, which means the stack itself has a fixed (limited) size. This is where "Stack Overflows" come from; get too deep down too many method calls, and the stack will run out of space. Each frame on the stack also has a certain amount of space for local variables for the method, and this is the memory we're talking about when we say value types live on the stack. The local variables stored on the stack do not require new allocations; the memory is already there. Just remember: this only applies in the context of local variables in a method.

The heap, on the other hand, is memory not automatically granted to the program. It is memory the program must request above and beyond it's core allocation. Heap memory has to be managed more carefully (so it doesn't leak — but we have a garbage collector to help with this), but there is (usually) much more of it available. Because it has to be granted by the operating system on request, initial allocations for the heap are also a little slower than memory used from the stack.2 For reference types, you can think of the new keyword as requesting a new heap memory allocation.

As a broad generalization, we say reference types are allocated on the heap, and value types are allocated on the stack (though there are plenty of exceptions for this3).


Now we understand this much, we can start to look at how .Net handles arrays.

The core array type itself is a reference type. What I mean is, for any given type T, the T may (or may not) be a value type, but an array of T (T[]) is always a reference type. In the "stack vs heap" context, this means creating a new array is a heap allocation, even if T is a value type. Arrays in .Net also have a fixed size4.

An additional attribute of value types is they also have a known/fixed size, based on the members. Therefore, an array of value types has a fixed number of elements, each with a known fixed size. That's enough information so allocating a new array of value types will get all the space for the array object and it's elements in single heap allocation. The value of each item (not just a reference) is held right there with the array's core memory. Now we have a bunch of value-type objects, but their memory is on the heap, rather than the stack.

This can be further complicated by a value type with one or more reference type members. In this situation, the space for the value type is allocated as normal, but the the part of the value for the reference members is just a reference. It still requires separate allocations or assignments to populate those reference members.

For an array holding reference types, the initial heap allocation for the array still allocates space for all the elements, but space reserved for each element is only enough for the reference. That is, initially each element in the array is still null. To populate the array you must still set those references to objects in memory, either by assigning existing objects or allocating new ones.

Finally, just as we were able to use arrays to get a value-type onto the heap, instead of the stack, there are also ways to force reference types to allocate from the stack. However, generally you should not do this unless you really understand the full implications.


1) There are different conventions on exactly when a frame is pushed/popped for each method call, depending on the platform, compiler configuration, and more, so only look at this paragraph for the general idea; the exact specifics will be incorrect in some particulars on any given platform.

2) For future reading, it is also useful to understand how programs handle addressing for heap memory.

3) Eric Lippert has an excellent write-up of this topic.

4) That is, arrays in .Net they are true arrays in the full formal computer science sense, unlike the associative array-like collection types in many other platforms. .Net has these, too, but it calls them what they are: collections rather than arrays.

Array of ValueType in C# goes to Heap or Stack?

The array itself is memory allocated on the heap. People get confused and think value types are always on the stack and this is not always correct.

For example, in this class:

public class X
{
public int Y;
}

Even though Y is an int (value type), since it is contained in a class (reference type), the int Y will be stored with the object itself, which is most likely on the heap.

The array is similar. The ints are contained in the array as value types, but the array itself is on the heap, and thus the ints are too.

How is an array of value types stored in .NET object heap?

Yes, you are right. I suggest you read this:

https://ericlippert.com/2010/09/30/the-truth-about-value-types/

It's very very good, and it explains nearly everything you'll ever want to know.

Stack and heap misunderstanding in Swift


I've always known that reference type variables are stored in the heap while value type variables are stored in the stack.

This is only partially true in Swift. In general, Swift makes no guarantees about where objects and values are stored, except that:

  1. Reference types have a stable location in memory, so that all references to the same object point to exactly the same place, and
  2. Value types are not guaranteed to have a stable location in memory, and can be copied arbitrarily as the compiler sees fit

This technically means that object types can be stored on the stack if the compiler knows that an object is created and destructed within the same stack frame with no escaping references to it, but in practice, you can basically assume that all objects are allocated on the heap.

For value types, the story is a little more complicated:

  • Unless a location-based reference is required of a value (e.g., taking a reference to a struct with &), a struct may be located entirely in registers: operating on small structs may place its members in CPU registers so it never even lives in memory. (This is especially the case for small, possibly short-lived value types like Ints and Doubles, which are guaranteed to fit in registers)
  • Large value types do actually get heap-allocated: although this is an implementation detail of Swift that theoretically could change in the future, structs which are larger than 3 machine words (e.g., larger than 12 bytes on a 32-bit machine, or 24 bytes on a 64-bit machine) are pretty much guaranteed to be allocated and stored on the heap. This doesn't conflict with the value-ness of a value type: it can still be copied arbitrarily as the compiler wishes, and the compiler does a really good job of avoiding unnecessary allocations where it can

So where are ints, doubles, strings, etc. are kept when they are defined inside a class, aka reference type?

This is an excellent question that gets at the heart of what a value type is. One way to think of the storage of a value type is inline, wherever it needs to be. Imagine a

struct Point {
var x: Double
var y: Double
}

structure, which is laid out in memory. Ignoring the fact that Point itself is a struct for a second, where are x and y stored relative to Point? Well, inline wherever Point goes:

┌───────────┐
│ Point │
├─────┬─────┤
│ x │ y │
└─────┴─────┘

When you need to store a Point, the compiler ensures that you have enough space to store both x and y, usually one immediately following the other. If a Point is stored on the stack, then x and y are stored on the stack, one after the other; if Point is stored on the heap, then x and y live on the heap as part of Point. Wherever Swift places a Point, it always ensures you have enough space, and when you assign to x and y, they are written to that space. It doesn't terribly matter where that is.

And when Point is part of another object? e.g.

class Location {
var name: String
var point: Point
}

Then Point is also laid out inline wherever it is stored, and its values are laid out inline as well:

┌──────────────────────┐
│ Location │
├──────────┬───────────┤
│ │ Point │
│ name ├─────┬─────┤
│ │ x │ y │
└──────────┴─────┴─────┘

In this case, when you create a Location object, the compiler ensures that there's enough space to store a String and two Doubles, and lays them out one after another. Where that is, again, doesn't matter, but in this case, it's all on the heap (because Location is a reference type, which happens to contain values).


As for the other way around, object storage has to components:

  1. The variable you use to access the object, and
  2. The actual storage for the object

Let's say that we changed Point from being a struct to being a class. When before, Location stored the contents of Point directly, now, it only stores a reference to their actual storage in memory:

┌──────────────────────┐      ┌───────────┐
│ Location │ ┌───▶│ Point │
├──────────┬───────────┤ │ ├─────┬─────┤
│ name │ point ──┼─┘ │ x │ y │
└──────────┴───────────┘ └─────┴─────┘

Before, when Swift laid out space to create a Location, it was storing one String and two Doubles; now, it stores one String and one pointer to a Point. Unlike in languages like C or C++, you don't actually need to be aware of the fact that Location.point is now a pointer, and it doesn't actually change how you access the object; but under the hood, the size and "shape" of Location has changed.

The same goes for storing all other reference types, including closures. A variable holding a closure is largely just a pointer to some metadata for the closure, and a way to execute the closure's code (though the specifics of this are out of scope for this answer):

┌───────────────────────────────┐     ┌───────────┐
│ MyStruct │ │ closure │
├─────────┬─────────┬───────────┤ ┌──▶│ storage │
│ prop1 │ prop2 │ closure ─┼─┘ │ + code │
└─────────┴─────────┴───────────┘ └───────────┘

Memory address for arrays allocated on heap and on stack

In your first example, a is a 1-dimensional array with 3 elements, where each element is a char[6] array. Since the elements of a given array are sequential in memory, the entire array is sequential:

-------------------------------------------------------------------------------------
| ------------------------- | ------------------------- | ------------------------- |
| | 0 | 1 | 2 | 3 | 4 | 5 | | | 0 | 1 | 2 | 3 | 4 | 5 | | | 0 | 1 | 2 | 3 | 4 | 5 | |
| ------------------------- | ------------------------- | ------------------------- |
-------------------------------------------------------------------------------------
a[0] a[1] a[2]

However, in your second example, a is a pointer to a 1-dimensional array with 3 elements, where each element is a pointer to a separate char[6] array located elsewhere in memory, wherever malloc() decided to allocate them:

                                 -------------------------
| 0 | 1 | 2 | 3 | 4 | 5 |
------------------------- -------------------------
| 0 | 1 | 2 | 3 | 4 | 5 | ^
------------------------- a[1]
^ |
a[0] ----------------
| |
| | -------------------------
| | | 0 | 1 | 2 | 3 | 4 | 5 |
| | -------------------------
| | ^
|__________ | a[2]
| | |
-------------------
| 0 | 1 | 2 |
-------------------
^
a


Related Topics



Leave a reply



Submit