Boxing Occurrence in C#

That’s a great question!

Boxing occurs for exactly one reason: when we need a reference to a value type. Everything you listed falls into this rule.

For example since object is a reference type, casting a value type to object requires a reference to a value type, which causes boxing.

If you wish to list every possible scenario, you should also include derivatives, such as returning a value type from a method that returns object or an interface type, because this automatically casts the value type to the object / interface.
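A minimal sketch of the "derivative" case just described: returning a value type through a method whose return type is object. The names BoxingBasics, AsObject and BoxedValueAfterChange are made up for illustration. It also shows that the box holds a copy, so changing the original variable afterwards does not affect the boxed value.

```csharp
using System;

static class BoxingBasics
{
    // The return type is object, so returning an int boxes it.
    public static object AsObject(int value) => value;

    // Returns the boxed value after the original variable has been changed.
    public static int BoxedValueAfterChange()
    {
        int n = 1;
        object boxed = AsObject(n); // box allocated here, holding a copy of n
        n = 2;                      // changes only the local variable
        return (int)boxed;          // unboxing still yields 1
    }
}
```

Calling BoxingBasics.BoxedValueAfterChange() returns 1, not 2, because the box was filled before n changed.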

By the way, the string concatenation case you astutely identified also derives from casting to object. The + operator is translated by the compiler to a call to the Concat method of string, which accepts an object for the value type you pass, so casting to object and hence boxing occurs.

Over the years I’ve always advised developers to remember the single reason for boxing (specified above) instead of memorizing every single case, because the list is long and hard to remember. This also promotes understanding of what IL code the compiler generates for our C# code (for example, + on string yields a call to String.Concat). When you’re in doubt about what the compiler generates and whether boxing occurs, you can use the IL Disassembler (ILDASM.exe). Typically you should look for the box opcode (there is just one case where boxing might occur even though the IL doesn't include the box opcode; more detail below).

But I do agree that some boxing occurrences are less obvious. You listed one of them: calling a non-overridden method of a value type. In fact, this is less obvious for another reason: when you check the IL code you don’t see the box opcode but the constrained prefix on the call, so even in the IL it’s not obvious that boxing happens! I won't go into the exact details of why, to keep this answer from becoming even longer...

Another case for less obvious boxing is when calling a base class method from a struct. Example:

struct MyValType
{
    public override string ToString()
    {
        return base.ToString();
    }
}

Here ToString is overridden, so calling ToString on MyValType won’t generate boxing. However, the implementation calls the base ToString and that causes boxing (check the IL!).

By the way, these two non-obvious boxing scenarios also derive from the single rule above. When a method is invoked on the base class of a value type, there must be something for the this keyword to refer to. Since the base class of a value type is (always) a reference type, the this keyword must refer to a reference type, and so we need a reference to a value type and so boxing occurs due to the single rule.

Here is a direct link to the section of my online .NET course that discusses boxing in detail: http://motti.me/mq

If you are only interested in more advanced boxing scenarios here is a direct link there (though the link above will take you there as well once it discusses the more basic stuff): http://motti.me/mu

I hope this helps!

Motti

Does boxing occur when reference type parameters are used in struct methods?

Reference types can't be boxed, only value types may. I don't think the allocation of memory for local variables and parameters within a method has anything to do with whether the method is declared in a reference type or value type.

Boxing and unboxing: when does it come up?

It's much less of an issue now than it was prior to generics. Now, for example, we can use:

List<int> x = new List<int>();
x.Add(10);
int y = x[0];

No boxing or unboxing required at all.

Previously, we'd have had:

ArrayList x = new ArrayList();
x.Add(10); // Boxing
int y = (int) x[0]; // Unboxing

That was my most common experience of boxing and unboxing, at least.

Without generics getting involved, I think I'd probably say that reflection is the most common cause of boxing in the projects I've worked on. The reflection APIs always use "object" for things like the return value for a method - because they have no other way of knowing what to use.
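As a rough illustration of the reflection case, here is a sketch that invokes Math.Abs(int) through MethodInfo.Invoke. The wrapper class ReflectionBoxing and the method InvokeAbs are hypothetical names. Both the argument and the result travel as object, so the int is boxed on the way in and on the way out.

```csharp
using System;
using System.Reflection;

static class ReflectionBoxing
{
    public static object InvokeAbs(int input)
    {
        // Look up the Math.Abs(int) overload by name and parameter types.
        MethodInfo abs = typeof(Math).GetMethod("Abs", new[] { typeof(int) });

        // The argument is boxed into the object[] array,
        // and the int result comes back as a boxed object.
        return abs.Invoke(null, new object[] { input });
    }
}
```

The caller has to unbox to get the number back, for example (int)ReflectionBoxing.InvokeAbs(-5).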

Another cause which could catch you out if you're not aware of it is if you use a value type which implements an interface, and pass that value to another method which has the interface type as its parameter. Again, generics make this less of a problem, but it can be a nasty surprise if you're not aware of it.
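A small sketch of that last case, using made-up names (IShape, Square, InterfaceBoxing). Passing the struct where the interface type is expected boxes it; one way to see this is that the same struct passed twice arrives as two different references. The generic method is the alternative mentioned above.

```csharp
using System;

interface IShape
{
    double Area();
}

struct Square : IShape
{
    public double Side;
    public double Area() => Side * Side;
}

static class InterfaceBoxing
{
    // Each IShape parameter receives a reference, so a struct argument is boxed.
    public static bool SameReference(IShape a, IShape b) => ReferenceEquals(a, b);

    // Generic alternative: T is the struct type itself, so the call needs no box.
    public static double AreaOf<T>(T shape) where T : IShape => shape.Area();
}
```

With var sq = new Square { Side = 2 }, the call InterfaceBoxing.SameReference(sq, sq) returns false because sq is boxed once per argument, while InterfaceBoxing.AreaOf(sq) works on the struct directly.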

Boxing Memory concerns

Yes, the code snippet that you've shown will indeed cause boxing. You're forcing the run-time to convert a double into an object. If you want to prove it to yourself or a colleague, check the compiled IL for the tell-tale box and unbox instructions.

However, while you're correct in looking out to avoid boxing wherever possible, the actual performance penalty is not always as significant as the hype makes it out to be. Before making breaking changes to your codebase, invest some time profiling to make sure that the code you're spending your time on is really a performance bottleneck.

It's not particularly clear given the specific example above why you need to return type object in the first place. Since you're just returning the value of a private field, you could simply change the property to return type double, instead.

Alternatively, you could convert the property to a generic method. (Properties can't be generic, but methods can, and if you're doing computationally-intense work inside the getter, it probably should be a method anyway.) Generics alleviate the problem of boxing, but still allow you an immense degree of flexibility in what type is returned (similar to returning type object).
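The original snippet isn't reproduced here, so the following is only a sketch of the idea with hypothetical types. It uses a generic class rather than a generic method (an assumption about what fits best), because the type parameter then carries the real type of the stored value.

```csharp
using System;

// Hypothetical "before": the value is exposed as object,
// so the double is boxed on every read.
class ObjectSetting
{
    private readonly double _value;
    public ObjectSetting(double value) { _value = value; }
    public object Value => _value; // boxing conversion from double to object
}

// Hypothetical "after": the type parameter carries the real type,
// so reading the value involves no conversion to object.
class Setting<T>
{
    private readonly T _value;
    public Setting(T value) { _value = value; }
    public T Value => _value;
}
```

A caller of Setting<double> gets a double back directly, whereas a caller of ObjectSetting has to cast the result with (double).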

Use cases for boxing a value type in C#?

In general, you typically will want to avoid boxing your value types.

However, there are rare occurrences where this is useful. If you need to target the 1.1 framework, for example, you will not have access to the generic collections. Any use of the collections in .NET 1.1 would require treating your value type as a System.Object, which causes boxing/unboxing.

There are still cases for this to be useful in .NET 2.0+. Any time you want to take advantage of the fact that all types, including value types, can be treated as an object directly, you may need to use boxing/unboxing. This can be handy at times, since it allows you to save any type in a collection (by using object instead of T in a generic collection), but in general, it is better to avoid this, as you're losing type safety. The one case where boxing frequently occurs, though, is when you're using Reflection - many of the calls in reflection will require boxing/unboxing when working with value types, since the type is not known in advance.

C# boxing when casting to ValueType

Yes, it gets boxed.

Think about it... for the value not to get boxed there would have to be some common binary representation that can hold any value type, including all the built-in ones and any struct you may define in the future.

Since such a binary representation doesn't exist the value must be boxed.

Explanation:

When you call a method with parameters, the caller places a sequence of bits at an agreed-upon location and in an agreed-upon format; for example, an int is 32 bits with negative numbers encoded in two's complement, a double is 64 bits encoded in IEEE floating-point format, etc.

You can't have one method that can accept both an unboxed int and an unboxed double, because it wouldn't know how many bits to read or how to decode them.

If you do want a method to accept both, you can give the function the memory location of the value (the location itself is of known size and format, so the method knows how to decode it) and some metadata so the method knows the actual type of the value. Wrapping the value with metadata and providing its memory location is called (surprise, surprise) "boxing".

So, any time you pass a value using a parameter/variable/whatever that is not the exact type, the system has to box the value, or the receiver wouldn't know how much memory the value really uses and how to decode that memory from a sequence of bits back to a number or structure.

This only applies to value types because reference types are always passed by using the memory location (the memory location is called a "reference" in .net).
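To make the "value plus metadata" point concrete, here is a small sketch with hypothetical helper names (ValueTypeBoxDemo, TypeNameOf, UnboxToLongThrows). A ValueType parameter accepts any value type, and the box remembers the exact type that was put into it, which is why unboxing to a different type fails.

```csharp
using System;

static class ValueTypeBoxDemo
{
    // Accepts any value type; the argument arrives as a reference to a box.
    public static string TypeNameOf(ValueType value) => value.GetType().Name;

    // Unboxing requires the exact type stored in the box.
    public static bool UnboxToLongThrows(ValueType value)
    {
        try
        {
            long unused = (long)value; // succeeds only if the box holds a long
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

For example, TypeNameOf(42) gives "Int32" and TypeNameOf(1.5) gives "Double", while UnboxToLongThrows(42) is true because the box holds an int, not a long.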

Is boxing involved when calling ToString for integer types?

You've already got answers telling you that when ToString() is overridden for a value type, there will be no boxing when you call it, but it's nice to have some way of actually seeing that.

Take the type int? (Nullable<int>). This is a useful type because it is a value type, yet boxing may produce a null reference, and instance methods cannot be called through a null reference. It does have an overridden ToString() method. It does not have (and cannot have) an overridden GetType() method.

int? i = null;
var s = i.ToString(); // okay: initialises s to ""
var t = i.GetType(); // not okay: throws NullReferenceException

This shows that there is no boxing in the call i.ToString(), but there is boxing in the call i.GetType().

Is boxing going to happen here if T is int?

There has been some confusion in the comments to the original question and to Rango's (basically correct) answer, so I thought I'd clear those up.

First off, a note about how generics work in C#. Generics are not templates!

In C#, generics are compiled once by the C# compiler into generic IL, and that IL is then recompiled into specialized forms by the jitter. For example, if we have a method M<T>(T t), then the C# compiler will compile that method and its body once, into IL.

When the jitter comes along, a call to M<string>, M<object> or M<IEnumerable> will trigger exactly one compilation; the jitter is very clever and it can compile the body into a form where it works no matter what the type argument is, provided that the type argument is a reference type. But M<int> and M<double> will each be compiled into their own assembly code body.

Note that the jitter does not know the rules of C#, and C# does overload resolution. By the time the C# compiler generates the IL, the exact method for every method call has already been chosen. So if you have:

static bool X(object a, object b) => object.Equals(a, b);
static bool X(int a, int b) => a == b;
static bool M<T>(T v, T m) => X(v, m);

then overload resolution chooses X(object, object) and compiles the code as though you wrote:

static bool M<T>(T v, T m) => X((object)v, (object)m);

If T turns out to be int, then both ints are boxed to object.

Let me re-emphasize that. By the time we get to the jitter, we already know which X is going to be called; that decision was made at C# compile time. The C# compiler reasons "I've got two Ts, I do not know that they are convertible to int, so I've got to choose the object version".

This is in contrast to C++ template code, which re-compiles the code for each template instantiation, and re-does overload resolution.
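Here is a variation on the code above that makes the chosen overload visible. It is only a sketch: the class name OverloadDemo and the string return values are mine, added so the result can be observed.

```csharp
using System;

static class OverloadDemo
{
    static string X(object a, object b) => "X(object, object)";
    static string X(int a, int b) => "X(int, int)";

    // Overload resolution happens here, once, while T is still unknown,
    // so the compiler binds this call to X(object, object).
    public static string ViaGeneric<T>(T v, T m) => X(v, m);

    // Here the argument types are known to be int at compile time.
    public static string Direct(int v, int m) => X(v, m);
}
```

OverloadDemo.ViaGeneric(1, 2) returns "X(object, object)" even though T is int, while OverloadDemo.Direct(1, 2) returns "X(int, int)".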

So that answers the original question that was asked.

Now let's get into the weird details.

When jit compiling M<int>, is the jitter permitted to notice that M<int> calls X(object, object), which then calls object.Equals(object, object), which is known to compare two boxed ints for equality, and generate the code directly that compares the two ints in their unboxed form?

Yes, the jitter is permitted to perform that optimization.

Does it in practice perform that optimization?

Not to my knowledge. The jitter does perform some inlining optimizations, but to my knowledge it does not perform any inlinings that advanced.

Are there situations in which the jitter does in practice elide a boxing?

Yes!

Can you give some examples?

Sure thing. Consider the following terrible code:

struct S
{
    public int x;
    public void M()
    {
        this.x += 1;
    }
}

When we do:

S s = whatever;
s.M();

What happens? this in a value type is equivalent to a parameter of type ref S. So we take a ref to s, pass it to M, and so on.

Now consider the following:

interface I
{
    void M();
}
struct S : I { /* body as before */ }

Now suppose we do this:

S s = whatever;
I i = s;
i.M();

What happens?

  • Converting s to I is a boxing conversion, so we allocate a box, make the box implement I, and make a copy of s in the box.
  • Calling i.M() passes the box as the receiver to the implementation of I in the box. That then takes a ref to the copy of s in the box, and passes that ref as the this to M.

All right, now comes the bit that will confuse the heck out of you.

void Q<T>(T t) where T : I
{
    t.M();
}
...
S s = whatever;
Q<S>(s);

Now what happens? Obviously we make a copy of s into t and there is no boxing; both are of type S. But: I.M is expecting a receiver of type I, and t is of type S. Do we have to do what we did before? Do we box t to a box that implements I, and then the box calls S.M with the this being a ref to the box?

No. The jitter generates code that elides the boxing and calls S.M directly with ref t as this.

What does this mean? This means that:

void Q<T>(T t) where T : I
{
    t.M();
}

and

void Q<T>(T t) where T : I
{
    I i = t;
    i.M();
}

Are different! The former mutates t because the boxing is skipped. The latter boxes and then mutates the box.

The takeaway here should be mutable value types are pure evil and you should avoid them at all costs. As we've seen, you can very easily get into situations where you think you should be mutating a copy, but you're mutating the original, or worse, situations where you think you are mutating an original, but you're mutating a copy.
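The difference between the two versions of Q can be observed with a sketch like the following. The names ICounter, Counter and ConstrainedCallDemo are invented for the demonstration, and each method returns the count seen through t afterwards so the effect is visible.

```csharp
using System;

interface ICounter
{
    void Increment();
    int Count { get; }
}

struct Counter : ICounter
{
    private int _count;
    public void Increment() => _count++;
    public int Count => _count;
}

static class ConstrainedCallDemo
{
    // Call through the type parameter: Increment runs on t itself, no box.
    public static int Direct<T>(T t) where T : ICounter
    {
        t.Increment();
        return t.Count;
    }

    // Converting to the interface first boxes a copy of t.
    public static int ViaInterface<T>(T t) where T : ICounter
    {
        ICounter boxed = t;
        boxed.Increment(); // mutates the copy inside the box
        return t.Count;    // t itself was never incremented
    }
}
```

With a fresh Counter, Direct returns 1 and ViaInterface returns 0, which is exactly the "mutating the original versus mutating a copy" trap described above.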

What bizarre magic makes this work?

Use sharplab.io and disassemble the methods I've given into IL. Read the IL very carefully; if there is anything you don't understand, look it up. All the magical mechanisms that make this optimization work are well-documented.

Does the jitter always do this?

No! (As you would know if you read all the documentation as I just suggested.)

However, it is slightly tricky to construct a scenario where the optimization cannot be performed. I will leave that as a puzzle:

Write me a program where we have a struct type S that implements an interface I. We constrain type parameter T to I, construct T with S, and pass in a T t. We call a method with t as the receiver, and the jitter always causes the receiver to be boxed.

Hint: I predict that the called method's name has seven letters in it. Was I right?

Challenge #2: Is it possible to also demonstrate that the boxing occurred using the same technique that I suggested before? (That technique being: show that a boxing must have occurred because a mutation happened to a copy, not to the original.)

Are there scenarios where the jitter boxes unnecessarily?

Yes! When I was working on the compiler, the jitter did not optimize away "box T to O, immediately unbox O back to T" instruction sequences, and sometimes the C# compiler is required to generate such sequences to make the verifier happy. We requested that the optimization be implemented; I do not know if it ever was.

Can you give an example?

Sure. Suppose we have

class C<T>
{
    public virtual void M<U>(T t, U u) where U : T { }
}
class D : C<int>
{
    public override void M<U>(int t, U u)
    {
        // ... we are at this point in the method body
    }
}

OK, now at this point you know that the only possible type for U is int, and so t should be assignable to u, and u should be assignable to t, right? But the CLR verifier does not see it that way, and you can then run into situations where the compiler must generate code that causes int to be boxed to object and then unboxed to U, which is int, so the round-trip is pointless.
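A sketch of what such a round-trip can look like in source. This is my own variation on the classes above, not the compiler's actual output: the method is renamed Pick and returns U so the result can be checked. As far as I can tell the compiler does not treat int and U as interchangeable inside the override, so the conversion is written via object.

```csharp
using System;

class C<T>
{
    public virtual U Pick<U>(T t, U u) where U : T => u;
}

class D : C<int>
{
    // U can only be int here, but the conversion from t to U is written
    // through object: box the int, then unbox it to U.
    public override U Pick<U>(int t, U u) => (U)(object)t;
}
```

Calling new D().Pick<int>(7, 0) returns 7 after a box and an immediate unbox that accomplish nothing.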

What's the takeaway here?

  • Do not mutate value types.
  • Generics are not templates. Overload resolution only happens once.
  • The jitter works very hard to eliminate boxing in generics, but if a T is converted to object, then that T is really, truly converted to object.

Fundamental question about boxing / c#

To do that you have to set the value, like this:

string foo = "aaaaaaa";
var bar = new System.Web.UI.HtmlControls.HtmlGenericControl("div") { InnerHtml = foo };
bar.InnerHtml = "zzzzzz";
plcBody.Controls.Add(bar);

Strings themselves are immutable (in .NET at least; this isn't universally true), so you can't change one after it's been passed. You passed the value of the variable, which is a string reference; you haven't passed a reference to the original variable, so changing the original variable to refer to a different string doesn't do anything. When you change the variable, you're changing which string foo refers to, not editing the original string, as that's immutable.

If it's easier to think of, you're passing "what foo means" not "foo itself", so once that string goes into whatever you're passing it into, it has no relation to the original variable.


