Are Structs Always Stack Allocated or Sometimes Heap Allocated

Are Structs always stack allocated or sometimes heap allocated?

First, read Eric Lippert's post The Stack is an Implementation Detail, then follow it with The Truth About Value Types.
As for your specific question:

Are struct instances sometimes allocated on the heap?

Yes, they are sometimes allocated on the heap. There are lots of cases where this can happen: if they are boxed, if they are fields in a class, if they are elements of an array, if they are the value of a variable of value type that has been closed over, and so on.

But what happens if I place the struct values in a list and return that? The elements survive.

You're thinking about this the right way, and this is one of the salient cases where a value type might be allocated on the heap. See the second post that I referred to, The Truth About Value Types, for more details. But just keep The Stack is an Implementation Detail in mind. The key takeaway is that you really don't need to concern yourself with this stuff. You should be concerned with the semantic difference between value types and reference types.

Does using new on a struct allocate it on the heap or stack?

Okay, let's see if I can make this any clearer.

Firstly, Ash is right: the question is not about where value type variables are allocated. That's a different question - and one to which the answer isn't just "on the stack". It's more complicated than that (and made even more complicated by C# 2). I have an article on the topic and will expand on it if requested, but let's deal with just the new operator.

Secondly, all of this really depends on what level you're talking about. I'm looking at what the compiler does with the source code, in terms of the IL it creates. It's more than possible that the JIT compiler will do clever things in terms of optimising away quite a lot of "logical" allocation.

Thirdly, I'm ignoring generics, mostly because I don't actually know the answer, and partly because it would complicate things too much.

Finally, all of this is just with the current implementation. The C# spec doesn't specify much of this - it's effectively an implementation detail. There are those who believe that managed code developers really shouldn't care. I'm not sure I'd go that far, but it's worth imagining a world where in fact all local variables live on the heap - which would still conform with the spec.


There are two different situations with the new operator on value types: you can either call a parameterless constructor (e.g. new Guid()) or a parameterful constructor (e.g. new Guid(someString)). These generate significantly different IL. To understand why, you need to compare the C# and CLI specs: according to C#, all value types have a parameterless constructor. According to the CLI spec, no value types have parameterless constructors. (Fetch the constructors of a value type with reflection some time - you won't find a parameterless one.)

It makes sense for C# to treat the "initialize a value with zeroes" as a constructor, because it keeps the language consistent - you can think of new(...) as always calling a constructor. It makes sense for the CLI to think of it differently, as there's no real code to call - and certainly no type-specific code.

It also makes a difference what you're going to do with the value after you've initialized it. The IL used for

Guid localVariable = new Guid(someString);

is different to the IL used for:

myInstanceOrStaticVariable = new Guid(someString);

In addition, if the value is used as an intermediate value, e.g. an argument to a method call, things are slightly different again. To show all these differences, here's a short test program. It doesn't show the difference between static variables and instance variables: the IL would differ between stfld and stsfld, but that's all.

using System;

public class Test
{
    static Guid field;

    static void Main() {}
    static void MethodTakingGuid(Guid guid) {}

    static void ParameterisedCtorAssignToField()
    {
        field = new Guid("");
    }

    static void ParameterisedCtorAssignToLocal()
    {
        Guid local = new Guid("");
        // Force the value to be used
        local.ToString();
    }

    static void ParameterisedCtorCallMethod()
    {
        MethodTakingGuid(new Guid(""));
    }

    static void ParameterlessCtorAssignToField()
    {
        field = new Guid();
    }

    static void ParameterlessCtorAssignToLocal()
    {
        Guid local = new Guid();
        // Force the value to be used
        local.ToString();
    }

    static void ParameterlessCtorCallMethod()
    {
        MethodTakingGuid(new Guid());
    }
}

Here's the IL for the class, excluding irrelevant bits (such as nops):

.class public auto ansi beforefieldinit Test extends [mscorlib]System.Object
{
    // Removed Test's constructor, Main, and MethodTakingGuid.

    .method private hidebysig static void ParameterisedCtorAssignToField() cil managed
    {
        .maxstack 8
        L_0001: ldstr ""
        L_0006: newobj instance void [mscorlib]System.Guid::.ctor(string)
        L_000b: stsfld valuetype [mscorlib]System.Guid Test::field
        L_0010: ret
    }

    .method private hidebysig static void ParameterisedCtorAssignToLocal() cil managed
    {
        .maxstack 2
        .locals init ([0] valuetype [mscorlib]System.Guid guid)
        L_0001: ldloca.s guid
        L_0003: ldstr ""
        L_0008: call instance void [mscorlib]System.Guid::.ctor(string)
        // Removed ToString() call
        L_001c: ret
    }

    .method private hidebysig static void ParameterisedCtorCallMethod() cil managed
    {
        .maxstack 8
        L_0001: ldstr ""
        L_0006: newobj instance void [mscorlib]System.Guid::.ctor(string)
        L_000b: call void Test::MethodTakingGuid(valuetype [mscorlib]System.Guid)
        L_0011: ret
    }

    .method private hidebysig static void ParameterlessCtorAssignToField() cil managed
    {
        .maxstack 8
        L_0001: ldsflda valuetype [mscorlib]System.Guid Test::field
        L_0006: initobj [mscorlib]System.Guid
        L_000c: ret
    }

    .method private hidebysig static void ParameterlessCtorAssignToLocal() cil managed
    {
        .maxstack 1
        .locals init ([0] valuetype [mscorlib]System.Guid guid)
        L_0001: ldloca.s guid
        L_0003: initobj [mscorlib]System.Guid
        // Removed ToString() call
        L_0017: ret
    }

    .method private hidebysig static void ParameterlessCtorCallMethod() cil managed
    {
        .maxstack 1
        .locals init ([0] valuetype [mscorlib]System.Guid guid)
        L_0001: ldloca.s guid
        L_0003: initobj [mscorlib]System.Guid
        L_0009: ldloc.0
        L_000a: call void Test::MethodTakingGuid(valuetype [mscorlib]System.Guid)
        L_0010: ret
    }

    .field private static valuetype [mscorlib]System.Guid field
}

As you can see, there are lots of different instructions used for calling the constructor:

  • newobj: Allocates the value on the stack, calls a parameterised constructor. Used for intermediate values, e.g. for assignment to a field or use as a method argument.
  • call instance: Uses an already-allocated storage location (whether on the stack or not). This is used in the code above for assigning to a local variable. If the same local variable is assigned a value several times using several new calls, it just initializes the data over the top of the old value - it doesn't allocate more stack space each time.
  • initobj: Uses an already-allocated storage location and just wipes the data. This is used for all our parameterless constructor calls, including those which assign to a local variable. For the method call, an intermediate local variable is effectively introduced, and its value wiped by initobj.

I hope this shows how complicated the topic is, while shining a bit of light on it at the same time. In some conceptual senses, every call to new allocates space on the stack - but as we've seen, that isn't what really happens even at the IL level. I'd like to highlight one particular case. Take this method:

void HowManyStackAllocations()
{
    Guid guid = new Guid();
    // [...] Use guid
    guid = new Guid(someBytes);
    // [...] Use guid
    guid = new Guid(someString);
    // [...] Use guid
}

That "logically" has 4 stack allocations - one for the variable, and one for each of the three new calls - but in fact (for that specific code) the stack is only allocated once, and then the same storage location is reused.

EDIT: Just to be clear, this is only true in some cases... in particular, the value of guid won't be visible if the Guid constructor throws an exception, which is why the C# compiler is able to reuse the same stack slot. See Eric Lippert's blog post on value type construction for more details and a case where it doesn't apply.

I've learned a lot in writing this answer - please ask for clarification if any of it is unclear!

struct with reference members : heap or stack?

I suppose items is allocated on the heap, isn't it?

Yes. Memory for items will be allocated on the heap.

Does g "go" on the heap as well?

No, the struct stays on the stack. It just has a field which holds a reference to the items list on the heap.

What happens for items when g goes out of scope (i.e. does the GC have to do its job for items)?

If g goes out of scope, there will be no references to items from the application roots. items will become garbage and will be collected by the GC during the next garbage collection. Until then, items will stay in memory (the struct instance will be removed when you exit the method where you used it).

What if instead of a List we had a variable of a type implementing IDisposable, what would be the best course of action?

The best course of action is to have your struct implement IDisposable. UPDATE: Actually, as @MarcGravell pointed out, if possible it's better not to use a struct in this case.

Are global structs allocated on the stack or on the heap?

It's implementation-defined (the C++ standard doesn't really talk about stack and heap).

Typically, objects with static storage duration (such as globals) will end up in a special segment of address space that is neither stack nor heap. But the specifics vary from platform to platform.

Creating classes in C, on the stack vs the heap?

There are several reasons for this.

  1. Using "opaque" pointers
  2. Lack of destructors
  3. Embedded systems (stack overflow problem)
  4. Containers
  5. Inertia
  6. "Laziness"

Let's discuss them briefly.

For opaque pointers, it enables you to do something like:

struct CClass_;
typedef struct CClass_ CClass;
// the rest as in your example

So, the user doesn't see the definition of struct CClass_, insulating her from the changes to it and enabling other interesting stuff, like implementing the class differently for different platforms.

Of course, this prohibits using stack variables of CClass. But, OTOH, one can see that this doesn't prohibit allocating CClass objects statically (from some pool) - returned by CClass_create or maybe another function like CClass_create_static.
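To make the opaque-pointer idea concrete, here is a minimal single-file sketch. The split into "header" and "source" parts is only marked by comments, and the value field plus the CClass_set_value / CClass_get_value accessors are hypothetical names invented for illustration; only CClass_, CClass and CClass_create come from the discussion above.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- what would live in the public header ---- */
struct CClass_;                       /* incomplete type: layout is hidden */
typedef struct CClass_ CClass;

CClass *CClass_create(void);          /* returns NULL when out of memory */
void    CClass_set_value(CClass *me, int value);   /* hypothetical accessor */
int     CClass_get_value(const CClass *me);        /* hypothetical accessor */
void    CClass_destroy(CClass *me);

/* ---- what would live in the private source file ---- */
struct CClass_ {
    int value;                        /* hypothetical field; users never see it */
};

CClass *CClass_create(void)
{
    CClass *me = malloc(sizeof *me);
    if (me == NULL)
        return NULL;                  /* the caller can handle the failure */
    me->value = 0;
    return me;
}

void CClass_set_value(CClass *me, int value)
{
    assert(me != NULL);
    me->value = value;
}

int CClass_get_value(const CClass *me)
{
    assert(me != NULL);
    return me->value;
}

void CClass_destroy(CClass *me)
{
    free(me);                         /* free(NULL) is a no-op */
}
```

Code that only sees the "header" part cannot declare a CClass variable on the stack, because the type is incomplete there; it can only hold CClass pointers.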

Lack of destructors - since the C compiler will not automatically destruct your CClass stack objects, you need to do it yourself (manually calling the destructor function). So, the only benefit left is the fact that stack allocation is, in general, faster than heap allocation. OTOH, you don't have to use the heap - you can allocate from a pool, or an arena, or some such thing, and that may be almost as fast as stack allocation, without the potential problems of stack allocation discussed below.
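As a rough illustration of the pool idea, the sketch below hands out objects from a statically allocated array instead of calling malloc. POOL_CAPACITY, the in_use flag and CClass_release are assumptions made up for this example, and the struct is left non-opaque only to keep everything in one file.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAPACITY 4   /* hypothetical size, chosen for the example */

typedef struct CClass_ {
    int value;
    int in_use;           /* bookkeeping flag owned by the pool */
} CClass;

/* Statically allocated backing store: no malloc and no stack growth. */
static CClass pool[POOL_CAPACITY];

/* Hands out the first free slot, or NULL when the pool is exhausted. */
CClass *CClass_create_static(void)
{
    for (size_t i = 0; i < POOL_CAPACITY; ++i) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            pool[i].value = 0;
            return &pool[i];
        }
    }
    return NULL;
}

/* The manually called "destructor": returns the slot to the pool. */
void CClass_release(CClass *me)
{
    assert(me != NULL && me->in_use);
    me->in_use = 0;
}
```

The linear scan keeps the sketch short; a real pool would more likely keep a free list so that both operations are constant time.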

Embedded systems - the stack is not an "infinite" resource, you know. Sure, for most applications on today's "regular" OSes (POSIX, Windows...), it almost is. But on embedded systems, the stack may be as small as a few KBs. That's extreme, but even "big" embedded systems have stacks measured in MBs. So, it will run out if over-used. When it does, there is mostly no guarantee what will happen - AFAIK, in both C and C++ that's "undefined behaviour". OTOH, CClass_create() can return a NULL pointer when you're out of memory, and you can handle that.

Containers - C++ users like stack allocation, but if you create a std::vector on the stack, its contents will be heap allocated. You can tweak that, of course, but that is the default behaviour, and it makes one's life much easier to say "all members of a container are heap-allocated" rather than trying to figure out how to handle it if they are not.

Inertia - well, OO came from Smalltalk. Everything is dynamic there, so the "natural" translation to C is the "put everything on the heap" way. So, the first examples were like that and they inspired others for many years.

"Laziness" - if you know you only want stack objects, you need something like:

CClass CClass_make();
void CClass_deinit(CClass *me);

But, if you want to allow both stack and heap, you need to add:

CClass *CClass_create();
void CClass_destroy(CClass *me);

This is more work for the implementer, and it is also confusing to the user. One can make slightly different interfaces, but it doesn't change the fact that you need two sets of functions.
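One way the two sets of functions might fit together is sketched below, with the heap pair layered on top of the stack-friendly pair. The data and count fields and the CClass_reserve helper are hypothetical, added only so that the deinit function has something to clean up.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct CClass_ {
    int *data;            /* hypothetical owned resource */
    size_t count;
} CClass;

/* Stack-friendly pair: the caller provides storage for the struct itself. */
CClass CClass_make(void)
{
    CClass me = { NULL, 0 };
    return me;
}

void CClass_deinit(CClass *me)
{
    assert(me != NULL);
    free(me->data);       /* release what the object owns, not the object */
    me->data = NULL;
    me->count = 0;
}

/* Hypothetical operation that makes the object own heap memory.
   Returns 0 on success, -1 on allocation failure. */
int CClass_reserve(CClass *me, size_t count)
{
    assert(me != NULL && count > 0);
    int *grown = realloc(me->data, count * sizeof *grown);
    if (grown == NULL)
        return -1;
    me->data = grown;
    me->count = count;
    return 0;
}

/* Heap pair, built on the stack-friendly pair. */
CClass *CClass_create(void)
{
    CClass *me = malloc(sizeof *me);
    if (me != NULL)
        *me = CClass_make();
    return me;
}

void CClass_destroy(CClass *me)
{
    if (me == NULL)
        return;
    CClass_deinit(me);
    free(me);
}
```

The user still has to remember which cleanup function matches which constructor: CClass_deinit for objects from CClass_make, CClass_destroy for objects from CClass_create.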

Of course, the "containers" reason is also partially a "laziness" reason.

Struct on the heap?

you CAN create a struct type object on the heap by just using new, right?

No, if you do that inside of Main, in general, it won't get allocated on the heap. You can allocate a struct on the heap in many ways, though, including using it as a field in a class, using it in a closure, etc.

The new Date syntax just initializes the struct to its default (zeroed out) value, but doesn't actually change how or where it's allocated.

C# does new on structs ALWAYS allocate on the stack?

It will be "boxed" and will be allocated on the heap.

Richter, CLR via C#:

It's possible to convert a value type to a reference type by using a mechanism called boxing. Internally, here's what happens when an instance of a value type is boxed:

  1. Memory is allocated from the managed heap. The amount of memory allocated is the size required by the value type's fields plus the two additional overhead members (the type object pointer and the sync block index) required by all objects on the managed heap.

  2. The value type's fields are copied to the newly allocated heap memory.

  3. The address of the object is returned. This address is now a reference to an object; the value type is now a reference type.

Structure Elements On The Heap vs The Stack

It is better to keep the simple members as direct values, and allocate just the array. Using the extra two pointers just slows down access for no benefit.

One other option to consider if you have C99 or C11 is to use a flexible array member (FAM).

You'd define your structure using the notation:

typedef struct my_structure_s
{
    int bounds[2];
    int num_values;
    int values[];
} my_structure_t;

You'd allocate enough memory for the structure and an N-element array in values all in a single operation, using:

my_structure_t *np = malloc(sizeof(*np) + N * sizeof(np->values[0]));

This means you only have one block of memory to free.

You can find references to the 'struct hack' if you search. This notation is effectively the standardized form of the struct hack.
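Putting the definition and the single allocation together, a minimal sketch might look like this. The my_structure_new helper and its parameters are hypothetical names used for illustration; the struct and the malloc expression follow the ones shown above.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct my_structure_s
{
    int bounds[2];
    int num_values;
    int values[];         /* flexible array member: must be the last member */
} my_structure_t;

/* Allocates the header and n values in one block.
   Returns NULL on allocation failure. */
my_structure_t *my_structure_new(int lower, int upper, int n)
{
    assert(n >= 0);
    my_structure_t *np = malloc(sizeof(*np) + (size_t)n * sizeof(np->values[0]));
    if (np == NULL)
        return NULL;
    np->bounds[0] = lower;
    np->bounds[1] = upper;
    np->num_values = n;
    for (int i = 0; i < n; ++i)
        np->values[i] = 0;    /* malloc does not zero the memory */
    return np;
}
```

Because everything lives in one block, a single free(np) releases both the fixed members and the array.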


In comments, the discussion continued:

This is an interesting approach; however, I can't guarantee I will have C99.

If need be, you can use the 'struct hack' version of the code, which would look like:

typedef struct my_structure_s
{
    int bounds[2];
    int num_values;
    int values[1];
} my_structure_t;

The rest of the code remains unchanged. This uses slightly more memory (4-8 bytes more) than the FAM solution, and isn't strictly supported by the standard, but it was used extensively before the C99 standard so it is unlikely that a compiler would invalidate such code.

Okay, but how about:

typedef struct my_structure_s
{
    int bounds[2];
    int num_values;
    int values[MAX_SIZE];
} my_structure_t;

And then: my_structure_t *the_structure = malloc(sizeof(my_structure_t));

This will also give me a fixed block size on the heap, right? (Except here, my block size will be bigger than it needs to be in some instances, because I won't always get to MAX_SIZE.)

If there is not too much wasted space on average, then the fixed-size array in the structure is simpler still. Further, it means that if the MAX_SIZE is not too huge, you can allocate on the stack or on the heap, whereas the FAM approach mandates dynamic (heap) allocation. The issue is whether the wasted space is enough of a problem, and what you do if MAX_SIZE isn't big enough after all. Otherwise, this is much the simplest approach; I simply assumed you'd already ruled it out.
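The sketch below shows why the fixed-size variant works with either kind of storage: the functions only take a pointer and do not care where the struct lives. The MAX_SIZE value and the my_structure_init / my_structure_append helpers are assumptions made for this example, including one possible answer to "what if MAX_SIZE isn't big enough" (report an error).

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_SIZE 16       /* hypothetical upper bound */

typedef struct my_structure_s
{
    int bounds[2];
    int num_values;       /* how many of the MAX_SIZE slots are in use */
    int values[MAX_SIZE];
} my_structure_t;

/* Works on any storage: automatic, static, or malloc'd. */
void my_structure_init(my_structure_t *s, int lower, int upper)
{
    assert(s != NULL);
    s->bounds[0] = lower;
    s->bounds[1] = upper;
    s->num_values = 0;
}

/* Returns 0 on success, -1 when MAX_SIZE would be exceeded. */
int my_structure_append(my_structure_t *s, int value)
{
    assert(s != NULL);
    if (s->num_values >= MAX_SIZE)
        return -1;
    s->values[s->num_values++] = value;
    return 0;
}
```

A caller can write my_structure_t local; my_structure_init(&local, 0, 100); for a stack object, or pass the result of malloc(sizeof(my_structure_t)) for a heap object, and use the same functions on both.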

Note that every one of the suggested solutions avoids the pointers to bounds and num_values suggested in option 2 in the question.

c# structs/classes stack/heap control?

While in the general case it's true that objects are always allocated on the heap, C# does let you drop down to the pointer level for heavy interop or for very high performance critical code.

In unsafe blocks, you can use stackalloc to allocate objects on the stack and use them as pointers.

To quote their example:

// cs_keyword_stackalloc.cs
// compile with: /unsafe
using System;

class Test
{
    public static unsafe void Main()
    {
        int* fib = stackalloc int[100];
        int* p = fib;
        *p++ = *p++ = 1;
        for (int i = 2; i < 100; ++i, ++p)
            *p = p[-1] + p[-2];
        for (int i = 0; i < 10; ++i)
            Console.WriteLine(fib[i]);
    }
}

Note, however, that you don't need to declare an entire method unsafe; you can just use an unsafe {...} block for it.


