Has Foreach's Use of Variables Been Changed in C# 5

Has foreach's use of variables been changed in C# 5?

This is a change to the C# language, not the .NET framework. Therefore, it only affects code compiled under C# 5.0, regardless of the .NET framework version on which that code will execute.

C# 5.0

Section 8.8.4 of the specification makes it clear that this change has been made. Specifically, page 249 of the C# 5.0 specification states:

foreach (V v in x) embedded-statement

is then expanded to:

{
    E e = ((C)(x)).GetEnumerator();
    try {
        while (e.MoveNext()) {
            V v = (V)(T)e.Current;
            embedded-statement
        }
    }
    finally {
        … // Dispose e
    }
}

And later:

The placement of v inside the while loop is important for how it is
captured by any anonymous function occurring in the
embedded-statement.

C# 4.0

The change becomes clear when compared with the C# 4.0 specification, which states (again in section 8.8.4, but this time on page 247):

foreach (V v in x) embedded-statement

is then expanded to:

{
    E e = ((C)(x)).GetEnumerator();
    try {
        V v;
        while (e.MoveNext()) {
            v = (V)(T)e.Current;
            embedded-statement
        }
    }
    finally {
        … // Dispose e
    }
}

Note that here the variable v is declared outside the while loop, whereas in the C# 5.0 expansion it is declared inside it.
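To make the difference observable, both expansions can be written out by hand, which gives the same result under any compiler version. The following is only a sketch: the class and method names (ForeachExpansionDemo, SharedVariable, FreshVariable) are made up for illustration, and a using statement stands in for the try/finally disposal shown in the specification.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ForeachExpansionDemo
{
    // C# 4.0-style expansion: a single v, declared outside the while loop.
    public static List<int> SharedVariable(IEnumerable<int> source)
    {
        var actions = new List<Func<int>>();
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            int v;
            while (e.MoveNext())
            {
                v = e.Current;
                actions.Add(() => v); // every lambda captures the same v
            }
        }
        // The lambdas run after the loop, so each sees the last value assigned to v.
        return actions.Select(a => a()).ToList();
    }

    // C# 5.0-style expansion: a fresh v is declared on every iteration.
    public static List<int> FreshVariable(IEnumerable<int> source)
    {
        var actions = new List<Func<int>>();
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            while (e.MoveNext())
            {
                int v = e.Current;
                actions.Add(() => v); // each lambda captures its own v
            }
        }
        return actions.Select(a => a()).ToList();
    }
}
```

For the input { 1, 2, 3 }, SharedVariable returns 3, 3, 3 while FreshVariable returns 1, 2, 3.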

Note

You can find the C# specification in the installation folder of Visual Studio under VC#\Specifications\1033. This is the case for VS2005, VS2008, VS2010 and VS2012, giving you access to specifications for C# 1.2, 2.0, 3.0, 4.0 and 5.0. You can also find the specifications on MSDN by searching for C# Specification.

Is there a reason for C#'s reuse of the variable in a foreach?

The compiler declares the variable in a way that makes it highly prone to an error that is often difficult to find and debug, while producing no perceivable benefits.

Your criticism is entirely justified.

I discuss this problem in detail here:

Closing over the loop variable considered harmful

Is there something you can do with foreach loops this way that you couldn't if they were compiled with an inner-scoped variable? or is this just an arbitrary choice that was made before anonymous methods and lambda expressions were available or common, and which hasn't been revised since then?

The latter. The C# 1.0 specification actually did not say whether the loop variable was inside or outside the loop body, as it made no observable difference. When closure semantics were introduced in C# 2.0, the choice was made to put the loop variable outside the loop, consistent with the "for" loop.

I think it is fair to say that all regret that decision. This is one of the worst "gotchas" in C#, and we are going to take the breaking change to fix it. In C# 5 the foreach loop variable will be logically inside the body of the loop, and therefore closures will get a fresh copy every time.

The for loop will not be changed, and the change will not be "back ported" to previous versions of C#. You should therefore continue to be careful when using this idiom.
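As a small sketch of why care is still needed with for loops, the example below captures the loop variable directly and then via a per-iteration copy. The names (ForLoopCaptureDemo, CaptureLoopVariable, CaptureLocalCopy) are invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ForLoopCaptureDemo
{
    // All lambdas capture the one loop variable i.
    public static List<int> CaptureLoopVariable()
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            actions.Add(() => i);
        }
        // By the time the lambdas run, the loop has finished and i is 3.
        return actions.Select(a => a()).ToList();
    }

    // Each lambda captures a separate copy declared inside the body.
    public static List<int> CaptureLocalCopy()
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            actions.Add(() => copy);
        }
        return actions.Select(a => a()).ToList();
    }
}
```

CaptureLoopVariable returns 3, 3, 3 and CaptureLocalCopy returns 0, 1, 2; this holds in C# 5 and later as well, since only foreach was changed.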

What's going on behind the scenes of the 'foreach' loop?

I encourage you to read section 8.8.4 of the C# specification, which answers your question in detail. Quoting from it here for your convenience:


A foreach statement of the form

foreach (V v in x) embedded-statement

is then expanded to:

{
    E e = ((C)(x)).GetEnumerator();
    try
    {
        V v;
        while (e.MoveNext())
        {
            v = (V)(T)e.Current;
            embedded-statement
        }
    }
    finally
    {
        code to Dispose e if necessary
    }
}

The types E, C, V and T are the enumerator, collection, loop variable and collection element types deduced by the semantic analyzer; see the spec for details.

So there you go. A "foreach" is just a more convenient way of writing a "while" loop that calls MoveNext until MoveNext returns false.

A few subtle things:

  • This need not be the code that is generated; all that is required is that we generate code that produces the same result. For example, if you "foreach" over an array or a string, we just generate a "for" loop (or loops, in the case of multi-d arrays) that indexes the array or the chars of the string, rather than taking on the expense of allocating an enumerator.

  • If the enumerator is of value type then the disposal code might or might not choose to box the enumerator before disposing it. Don't rely on that one way or the other. (See http://blogs.msdn.com/b/ericlippert/archive/2011/03/14/to-box-or-not-to-box-that-is-the-question.aspx for a related issue.)

  • Similarly, if the casts automatically inserted above are determined to be identity conversions then the casts might be elided even if doing so would normally cause a value type to be copied.

  • Future versions of C# are likely to put the declaration of loop variable v inside the while loop body; this will prevent the common "modified closure" bug that is reported about once a day on Stack Overflow. [Update: This change has indeed been implemented in C# 5.]
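For a concrete feel of the expansion quoted above, here is a hand-written equivalent of foreach (int v in x) total += v; for an IEnumerable<int>. It is a sketch under stated assumptions, not what the compiler necessarily emits: ManualForeach and Sum are hypothetical names, the casts are omitted because they are identity conversions here, and v is placed inside the loop as in C# 5.

```csharp
using System;
using System.Collections.Generic;

static class ManualForeach
{
    // Hand-written version of: foreach (int v in x) total += v;
    public static int Sum(IEnumerable<int> x)
    {
        int total = 0;
        IEnumerator<int> e = x.GetEnumerator();
        try
        {
            // Keep calling MoveNext until it returns false.
            while (e.MoveNext())
            {
                int v = e.Current; // C# 5 placement: declared inside the loop
                total += v;        // the embedded statement
            }
        }
        finally
        {
            // IEnumerator<T> implements IDisposable, so disposal is unconditional here.
            e.Dispose();
        }
        return total;
    }
}
```

Calling ManualForeach.Sum(new[] { 1, 2, 3 }) returns 6, the same result the foreach form would give.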

Foreach variable in closure

The .NET version (4.0 or otherwise) is irrelevant here; what matters is the C# compiler. The behavior changed starting with C# 5.0, and I presume you are using a C# 5.0 compiler.

This means the code will work even when targeting .NET 2.0, as long as you compile with Visual Studio 2012 or newer, where the default C# compiler is version 5.0 or later. That is why you don't see the bug.

What's the difference between for and foreach in respect to closure

This is the result of an unfortunate decision that was later regretted by the C# team. A breaking change introduced by C# 5 finally changed that behavior. Quoting Eric Lippert:

In C# 5 the foreach loop variable will be logically inside the body of the loop, and therefore closures will get a fresh copy every time.

Before C# 5, the closures all referenced the same variable binding. Therefore, the output was the same for all invocations because they were all accessing the latest value of the same variable.

Starting with C# 5, however, a new variable binding is created for each iteration. This is the behavior that was most likely intended by the programmer.

Reproducing the close over the variable of a foreach gotcha

Scott's answer is correct, but could use some additional clarification.

The problem here is that the "language version" switch doesn't do what you think it does. This is in my opinion a bit of a misfeature, since it is quite misleading. The "language version" switch does not mean "use the old compiler"; it is not a compatibility mode.

Rather, it means "use the current compiler, and produce an error if I use a feature that was not available in the selected version of the language."

The reason for this switch is so that one person on a dev team can "try out" a new version of the compiler to make sure that their code still works, but know before they check in that they have not accidentally used a language feature that their teammates' compilers will choke on. So if you set the language version to 3.0 then "dynamic" will not work (because it was added in C# 4.0), but you are still running whatever version of the compiler you have installed.

As Scott points out, if you want to use the old compiler you'll have to actually find a copy of the old compiler on your machine somewhere and use it explicitly.

See http://ericlippert.com/2013/04/04/what-does-the-langversion-switch-do/ for some more examples of what this switch does and does not do.

Why can't we change the iteration variable in a foreach loop

Question 1: Why was I not able to change any attribute of the iteration variable?

From the documentation on Anonymous Types:

Anonymous types provide a convenient way to encapsulate a set of read-only properties

You cannot change the values of the properties in your anonymous type, so

name.Age = 1;
// and
names[i].Age = 1;

are equally invalid.


Question 2: I was only able to assign a new object to the iteration variable in a for loop, not in a foreach loop. Why?

From the documentation on IEnumerable:

An enumerator remains valid as long as the collection remains unchanged.

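You would invalidate the iterator if you changed the backing list in any way. Consider what would happen if the iterator returned the items in a specific order based on the Age field, for example. In addition, the foreach iteration variable is itself read-only, so the compiler rejects any assignment to it, whereas names[i] in a for loop is an ordinary list element that can be replaced.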
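A minimal sketch of what does work: since the properties of an anonymous type cannot be set, the element is replaced with a new anonymous object inside a for loop. The data and the names (AnonymousTypeDemo, ReplaceAges, "Ann", "Bob") are invented for this example; the commented-out lines are the forms the compiler rejects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AnonymousTypeDemo
{
    public static List<int> ReplaceAges()
    {
        var names = new[]
        {
            new { Name = "Ann", Age = 30 },
            new { Name = "Bob", Age = 40 }
        }.ToList();

        for (int i = 0; i < names.Count; i++)
        {
            // names[i].Age = 1;   // does not compile: Age is read-only
            // Same property names, types and order, so this is the same anonymous type.
            names[i] = new { Name = names[i].Name, Age = 1 };
        }

        // foreach (var name in names) { name = ...; }   // does not compile:
        // the foreach iteration variable cannot be assigned to.

        return names.Select(n => n.Age).ToList();
    }
}
```

ReplaceAges returns 1, 1. Replacing elements by index like this is only safe because no enumerator over the list is active at the time.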

The foreach identifier and closures

Edit: this all changes in C# 5, with a change to where the variable is defined (in the eyes of the compiler). From C# 5 onwards, they are the same.


Before C# 5

The second is safe; the first isn't.

With foreach, the variable is declared outside the loop - i.e.

Foo f;
while(iterator.MoveNext())
{
    f = iterator.Current;
    // do something with f
}

This means that there is only one f in terms of the closure scope, and the threads may well get confused, calling the method multiple times on some instances and not at all on others. You can fix this with a second variable declaration inside the loop:

foreach(Foo f in ...) {
    Foo tmp = f;
    // do something with tmp
}

This then has a separate tmp in each closure scope, so there is no risk of this issue.

Here's a simple proof of the problem:

static void Main()
{
    int[] data = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    foreach (int i in data)
    {
        new Thread(() => Console.WriteLine(i)).Start();
    }
    Console.ReadLine();
}

Outputs (at random):

1
3
4
4
5
7
7
8
9
9

Add a temp variable and it works:

foreach (int i in data)
{
    int j = i;
    new Thread(() => Console.WriteLine(j)).Start();
}

(each number once, but of course the order isn't guaranteed)

Captured Closure (Loop Variable) in C# 5.0

What is the reasoning behind this?

I'm going to assume you mean "why wasn't it changed for for loops as well?"

The answer is that for for loops, the existing behaviour makes perfect sense. If you break a for loop into:

  • initializer
  • condition
  • iterator
  • body

... then the loop is roughly:

{
    initializer;
    while (condition)
    {
        body;
        iterator;
    }
}

(Except that the iterator is executed at the end of a continue; statement as well, of course.)

The initialization part logically only happens once, so it's entirely logical that there's only one "variable instantiation". Furthermore, there's no natural "initial" value of the variable on each iteration of the loop - there's nothing to say that a for loop has to be of a form declaring a variable in the initializer, testing it in the condition and modifying it in the iterator. What would you expect a loop like this to do:

for (int i = 0, j = 10; i < j; i++)
{
    if (someCondition)
    {
        j++;
    }
    actions.Add(() => Console.WriteLine("{0}, {1}", i, j));
}

Compare that with a foreach loop which looks like you're declaring a separate variable for every iteration. Heck, the variable is read-only, making it even more odd to think of it being one variable which changes between iterations. It makes perfect sense to think of a foreach loop as declaring a new read-only variable on each iteration with its value taken from the iterator.
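To see what the for loop above actually does, here is a runnable sketch of it. The condition i % 2 == 0 is an arbitrary stand-in for someCondition, the lambdas return strings instead of printing so the result can be inspected, and the names SharedForVariablesDemo and Run are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SharedForVariablesDemo
{
    public static List<string> Run()
    {
        var actions = new List<Func<string>>();
        for (int i = 0, j = 10; i < j; i++)
        {
            if (i % 2 == 0) // assumed stand-in for someCondition
            {
                j++;
            }
            // Both i and j are single variables shared by every lambda.
            actions.Add(() => i + ", " + j);
        }
        // Invoked after the loop, so each lambda reads the final i and j.
        return actions.Select(a => a()).ToList();
    }
}
```

With this condition the loop runs 20 times and ends with i and j both equal to 20, so every one of the 20 lambdas returns "20, 20". There is no obvious per-iteration value that each closure could have been given instead, which is the point of the argument above.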

Foreach Variable in Closure. Why Results Differ for These Snippets?

Resharper gives this description: "Access to foreach variable in closure. May have different behaviour when compiled with different versions of compiler." Why may it have a different behaviour?

There was a breaking change between C# 4 and C# 5 in how closures capture the foreach loop variable, an issue that became much more visible once lambda expressions were introduced in C# 3. Resharper is warning you about this in case you depend on, or have come to expect, the former semantics.

The quick upshot is that in C# 4, the loop variable was shared between each iteration of the loop, and closures capture the variable, so it led to unexpected results for most people when they closed over the loop variable.

In C# 5, each iteration of the loop gets its own variable, so closures in one iteration do not close over the same variable as other iterations, leading to more expected outcomes (for most people).

That gets us to the heart of your problem:

In the first snippet all tasks start with their own message, while in the second one all tasks start with the same message?

In your first snippet, you are creating a copy of the loop variable inside your loop, and the closure occurs over that inner variable. In the second, you close over the loop variable directly. Presumably you are running under C# 4, so the former semantics apply. Under C# 5, the output from both versions should be consistent. This is the change Resharper refers to, and it should also show you how to structure your code in C# 4 (namely, use the first version you have written).

As Justin Pihony points out in the comments, Eric Lippert has written a very useful blog article on the former semantics that also alludes to the change for C# 5.


