Does "Foreach" Cause Repeated Linq Execution

Does foreach cause repeated Linq execution?

In general, LINQ uses deferred execution. If you use methods like First() or FirstOrDefault(), the query is executed immediately. When you do something like this:

foreach(string s in MyObjects.Select(x => x.AStringProp))

The results are retrieved in a streaming manner, meaning one by one. Each time the iterator calls MoveNext, the projection is applied to the next object. If you had a Where before the Select, each element would first go through the filter, and only the matching elements would then be projected.
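The sketch below makes the streaming order visible. It is a minimal C# program using top-level statements. The words array and the logging lambdas are hypothetical stand-ins for MyObjects and AStringProp. Each lambda records when it runs, so the log shows that every element is filtered, then projected, then handed to the loop body before the next element is touched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data standing in for MyObjects / AStringProp.
var words = new[] { "apple", "kiwi", "banana" };
var log = new List<string>();

var query = words
    .Where(w => { log.Add($"filter {w}"); return w.Length > 4; })
    .Select(w => { log.Add($"project {w}"); return w.ToUpperInvariant(); });

// Each MoveNext pulls exactly one element through Where and then Select.
foreach (string s in query)
{
    log.Add($"loop {s}");
}

// filter apple, project apple, loop APPLE, filter kiwi, filter banana, project banana, loop BANANA
log.ForEach(Console.WriteLine);
```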

If you do something like this:

List<string> names = People.Select(x => x.Name).ToList();
foreach (string name in names)

Then I believe this is a wasteful operation. ToList() forces the query to execute, enumerating the People list and applying the x => x.Name projection to every element. Afterwards, you enumerate the resulting list again. So unless you have a good reason to have the data in a list (rather than an IEnumerable), you're just wasting CPU cycles.
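For contrast, here is a sketch of the ToList() version. It uses hypothetical data: a list of name/age tuples stands in for People, and the projection writes to a log. Every projection runs before the loop body sees the first name, and the loop then walks the buffered list in a second pass.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for People.
var people = new List<(string Name, int Age)> { ("Ann", 31), ("Bob", 45), ("Cy", 28) };
var log = new List<string>();

// ToList() enumerates People and applies the projection to every element right away.
List<string> names = people
    .Select(x => { log.Add($"project {x.Name}"); return x.Name; })
    .ToList();

// The foreach then enumerates the buffered list a second time.
foreach (string name in names)
{
    log.Add($"loop {name}");
}

// project Ann, project Bob, project Cy, loop Ann, loop Bob, loop Cy
log.ForEach(Console.WriteLine);
```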

Generally speaking, using a LINQ query on the collection you're enumerating with a foreach will not perform worse than other similar, practical options.

Also, it's worth noting that people implementing LINQ providers are encouraged to make the common methods work as they do in the Microsoft-provided providers, but they're not required to. If I were to write a LINQ to HTML or LINQ to My Proprietary Data Format provider, there would be no guarantee that it behaves in this manner. Perhaps the nature of the data would make immediate execution the only practical option.

Also, a final edit: if you're interested in this, Jon Skeet's C# in Depth is very informative and a great read. My answer summarizes a few pages of the book (hopefully with reasonable accuracy), but if you want more details on how LINQ works under the covers, it's a good place to look.

Does foreach execute the query only once?

Now, with LINQ's deferred execution, would a subsequent foreach loop execute the query only once or for each turn in the loop?

Yes, once for the loop. Actually, it may execute the query less than once: if you abort the loop partway through, the (num % 2) == 0 test won't be performed on any of the remaining items.
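Here is a small sketch of the early-exit case. It assumes a hypothetical numbers array and a counter that the predicate increments. Breaking out after the second match means the remaining three elements are never tested.

```csharp
using System;
using System.Linq;

// Hypothetical source; the filter mirrors the (num % 2) == 0 test from the question.
int[] numbers = { 2, 4, 5, 6, 8 };
int predicateCalls = 0;

var numQuery = numbers.Where(num => { predicateCalls++; return (num % 2) == 0; });

foreach (int n in numQuery)
{
    if (n == 4) break; // abort the loop partway through
}

// Only 2 and 4 were tested; 5, 6 and 8 never reached the predicate.
Console.WriteLine($"predicate calls: {predicateCalls}"); // predicate calls: 2
```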

Or, in other words, would there be any difference if I had:

foreach (int num in numQuery.ToList())

Two differences:

  1. In the case above, ToList() wastes time and memory, because it first does the same thing as the initial foreach, builds a list from the results, and then iterates over that list with foreach. Depending on the size of the results, the difference will be anywhere from trivial to preventing the code from working at all (for example, if the results don't fit in memory).

  2. However, if you are going to foreach over the same results repeatedly, or otherwise use them more than once, note that while each foreach runs the query only once, the next foreach runs it again. If the query is expensive, the ToList() approach (storing that list) can be a massive saving.
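The second point can be checked by counting predicate calls. In this sketch (the data and counters are hypothetical), looping twice over the deferred query runs the filter over the source twice. The ToList() version runs it only once.

```csharp
using System;
using System.Linq;

// Hypothetical data.
int[] numbers = { 1, 2, 3, 4 };

// Deferred: each foreach re-runs the filter over the whole source.
int deferredCalls = 0;
var deferred = numbers.Where(x => { deferredCalls++; return x % 2 == 0; });
foreach (int n in deferred) { }
foreach (int n in deferred) { }

// Cached: ToList() runs the filter once; both loops reuse the stored list.
int cachedCalls = 0;
var cached = numbers.Where(x => { cachedCalls++; return x % 2 == 0; }).ToList();
foreach (int n in cached) { }
foreach (int n in cached) { }

Console.WriteLine($"deferred: {deferredCalls}, cached: {cachedCalls}"); // deferred: 8, cached: 4
```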

When do LINQ Lambdas execute in a foreach loop

Here is an example of LINQ's lazy evaluation at work.

List<int> vals = new List<int> { 1, 1, 2, 2, 3, 4 };
foreach (int s in vals.Where(x =>
{
    Console.WriteLine("lambda");
    return x % 2 == 0;
}))
{
    Console.WriteLine("loop");
}

And the output will be:

lambda
lambda
lambda
loop
lambda
loop
lambda
lambda
loop

As you can see, the lambda is evaluated only when the foreach loop requests the next element.

Is ToList required when using foreach with LINQ to Entities

If you only need to iterate through your elements, it is better not to call ToList(). Calling it triggers immediate execution of the corresponding query and creates an in-memory collection.

If you don't call ToList(), you avoid creating the in-memory collection that would hold the results of your query.

Whichever way you choose, you will make one round trip to the database.

(foreach & isAssignableFrom) versus (OfType & foreach)

First, in terms of performance: in the first snippet your loop body only sees Foos (OfType<Foo>() still checks every element internally, but it does the type test for you), whereas in the second you're iterating over everything and checking during the iteration whether each item is a Foo.

Second, in terms of readability, I would be a bit surprised if I found the second option, whereas the first is quite normal.

Finally, as Peter Duniho noted in the comments, thanks to deferred execution you can avoid the overhead of creating a new array altogether:

foreach(Foo foo in foosAndBars.OfType<Foo>())
{ }
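Here is a runnable sketch of that loop. The real Foo and Bar types aren't shown, so it assumes that strings play the role of Foo and ints the role of Bar. No intermediate array is created, and the loop body only ever receives the matching elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for foosAndBars: strings play Foo, ints play Bar.
object[] foosAndBars = { "foo1", 42, "foo2", 7 };
var seen = new List<string>();

// OfType<T>() is deferred, so no intermediate array is built.
// It still inspects every element, but only matching ones reach the loop body.
foreach (string foo in foosAndBars.OfType<string>())
{
    seen.Add(foo);
}

Console.WriteLine(string.Join(", ", seen)); // foo1, foo2
```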

Why am I able to edit a LINQ list while iterating over it?

The answer to your first question, why your LINQ query re-runs every time it's iterated over, is LINQ's deferred execution.

This line just declares the LINQ expression and does not execute it:

var linqList = aArray.Where(x => x == "a");

and this is where it gets executed:

foreach (var arrItem in linqList)

and

Console.WriteLine(linqList.Count());

An explicit call to ToList() runs the LINQ expression immediately. Use it like this:

var linqList = aArray.Where(x => x == "a").ToList();
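The difference between the two declarations shows up as soon as the array changes. In this sketch (the variable names deferred and snapshot are hypothetical), calling Count() re-runs the deferred query against the modified array. The list keeps the results from the moment ToList() ran.

```csharp
using System;
using System.Linq;

var aArray = new[] { "a", "a", "a", "a" };

// Hypothetical names: deferred and snapshot.
var deferred = aArray.Where(x => x == "a");          // declared, not executed
var snapshot = aArray.Where(x => x == "a").ToList(); // executed now, results copied

aArray[0] = "b";

Console.WriteLine(deferred.Count()); // 3: the query re-runs against the modified array
Console.WriteLine(snapshot.Count);   // 4: the list still holds the original results
```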

Regarding the edited question:

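Of course, the LINQ expression is evaluated lazily during every foreach iteration: each MoveNext call resumes scanning the array and applies the Where predicate to its current contents. The issue is not the Count(); rather, every enumeration of the LINQ expression re-evaluates it. As mentioned above, materialize it into a List and iterate over the list.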

Late edit:

In response to @Eric Lippert's critique, I will also go into detail on the rest of the OP's questions.

//Why does this only print out 2 a's and 2 b's, rather than 4 b's?

In the first loop iteration i = 3, so after aArray[3] = "b"; your array will look like this:

{ "a", "a", "a", "b" }

In the second loop iteration, i has been decremented to 2, and after executing aArray[i] = "b"; your array will be:

{ "a", "a", "b", "b" }

At this point there are still a's in your array, but the enumerator used internally has already moved past them. The next MoveNext() call resumes at the third position (index 2) of the array. The elements there and at index 3 are now "b", so they no longer match the x == "a" condition. MoveNext() therefore returns false and the loop exits.
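The original loop isn't quoted here, so the following is a hypothetical reconstruction based on the description above: i starts at 3 and is decremented after each write. Running it reproduces the a, a, b, b result.

```csharp
using System;
using System.Linq;

var aArray = new[] { "a", "a", "a", "a" };
var linqList = aArray.Where(x => x == "a");

// Hypothetical reconstruction: write "b" from the end of the array backwards.
int i = aArray.Length - 1;
foreach (var arrItem in linqList)
{
    aArray[i] = "b"; // iteration 1 writes index 3, iteration 2 writes index 2
    i--;
}

// The Where iterator resumed at index 2, found only "b"s there and stopped.
Console.WriteLine(string.Join(", ", aArray)); // a, a, b, b
```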

Why am I able to change what I'm looping over as I'm looping over it?

You are able to do so because neither the compiler nor Visual Studio's built-in code analysis detects that you modify the collection within the loop. At runtime the array is modified, which changes the outcome of the LINQ query, but the array iterator does not check for modifications, so no exception is thrown.
This missing check seems to be by design, as arrays are of fixed size, as opposed to lists, where such an exception is thrown at runtime.
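The following sketch contrasts the two cases using hypothetical data. Modifying an array while a Where query over it is being enumerated passes silently. Adding to a List<T> during enumeration makes the next MoveNext throw InvalidOperationException.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data. Array: no modification check, so changing it mid-enumeration is allowed.
var array = new[] { "a", "a" };
foreach (var item in array.Where(x => x == "a"))
{
    array[1] = "b"; // no exception; the query simply sees the new value
}

// List<T>: a mutation bumps its internal version, which the enumerator checks.
var list = new List<string> { "a", "a" };
bool threw = false;
try
{
    foreach (var item in list.Where(x => x == "a"))
    {
        list.Add("b");
    }
}
catch (InvalidOperationException ex)
{
    threw = true;
    Console.WriteLine(ex.Message);
}

Console.WriteLine(string.Join(", ", array)); // a, b
Console.WriteLine(threw);                     // True
```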

Consider the following example code, which should be equivalent to your initial code example (before the edit):

using System;
using System.Linq;

namespace MyTest {
    class Program {
        static void Main () {
            var aArray = new string[] {
                "a", "a", "a", "a"
            };
            var iterationList = aArray.Where(x => x == "a").ToList();
            foreach (var item in iterationList)
            {
                var index = iterationList.IndexOf(item);
                iterationList.Remove(item);
                iterationList.Insert(index, "b");
            }
            foreach (var arrItem in aArray)
            {
                Console.WriteLine(arrItem);
            }
            Console.ReadKey();
        }
    }
}

This code will compile and run the loop body once before throwing a System.InvalidOperationException with the message:

Collection was modified; enumeration operation may not execute.

The reason the List implementation throws this error while it is being enumerated comes down to a basic concept: for and foreach are iterative control flow statements that need to be deterministic at runtime. Furthermore, the foreach statement is a C#-specific implementation of the iterator pattern, which defines an algorithm for sequential traversal of a collection that is not expected to change during the traversal. List<T> enforces this with an internal version number that every modification increments; its enumerator checks that number on each MoveNext and throws when it has changed.

You found one of the ways to modify a collection while looping over it and have the query re-evaluated on each iteration. This is a bad design choice, because you might run into an infinite loop if the LINQ expression keeps changing its results and never meets an exit condition for the loop. That makes the code hard to debug, and the behavior is not obvious when reading it.

In contrast, the while statement is a conditional construct that is meant to be non-deterministic at runtime, with a specific exit condition that is expected to change during execution.
Consider this rewrite based on your example:

using System;
using System.Linq;

namespace MyTest {
    class Program {
        static void Main () {
            var aArray = new string[] {
                "a", "a", "a", "a"
            };
            bool arrayHasACondition(string x) => x == "a";
            while (aArray.Any(arrayHasACondition))
            {
                var index = Array.FindIndex(aArray, arrayHasACondition);
                aArray[index] = "b";
            }
            foreach (var arrItem in aArray)
            {
                Console.WriteLine(arrItem); // prints "b" four times
            }
            Console.ReadKey();
        }
    }
}

I hope this outlines the technical background and explains why your expectations did not match the actual behavior.

Foreach using LINQ Efficiency

Looking at this (admittedly old) answer: Does "foreach" cause repeated Linq execution?

It depends on the dataset to an extent, but because of how LINQ and IEnumerables work, A & C are functionally the same. Instead of executing the query in a single hit, the results are retrieved in a streaming manner, meaning one by one. Each time the iterator calls MoveNext, the projection is applied to the next object; because there's a where clause in your example, the filter is applied before the projection.

By calling the .ToList() method in examples B & D, you're forcing the query to execute and the results to be cached. As for the "which one is better" question, that's where the answer becomes "it depends".

If the data is already in memory as objects, A & C both save a bit of memory and are slightly quicker than B & D, because they don't have to build and resize a list.

If you're querying a database, A & C save on memory. However (you'd have to test this bit, because it seems hit and miss), it's possible that it would go back to the DB each time MoveNext is hit. On a small table it wouldn't make much difference, but I have encountered instances with large tables where just creating a local list of the query results saved several minutes of execution time.

EDIT for clarity:

Adding in some pseudocode to elaborate on this point. The premise behind how A & C work is as follows:

  1. Look for an element that meets the criteria.
  2. Get the first element that meets the selection criteria.
  3. Do whatever is within the loop.
  4. Look for another element.
  5. Get the next element.
  6. Do whatever is within the loop.
  7. Repeat steps 4-6 until no further matching element is found.

Whereas B & D work more along the lines of the following:

  1. Find all elements that match the selection criteria.
  2. Create a list from the results of step 1.
  3. Assign a pointer that points at the first element in the list.
  4. Do the code within the loop.
  5. Move the pointer to the next item in the list.
  6. Do the code within the loop.
  7. Repeat steps 5 and 6 for all items in the list.
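The two pseudocode lists can be written out roughly as follows. This is a sketch with a hypothetical source array and selection criteria. The first loop drives the enumerator by hand, roughly the way a foreach over the query does. The second builds the list first and then walks it by index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] source = { 3, 8, 1, 6 };
Func<int, bool> criteria = n => n > 2; // hypothetical selection criteria

// A & C: pull one matching element at a time.
var streamed = new List<int>();
using (IEnumerator<int> e = source.Where(criteria).GetEnumerator())
{
    while (e.MoveNext())          // look for and get the next matching element
    {
        streamed.Add(e.Current);  // do whatever is within the loop
    }
}

// B & D: find all matches first, then walk the list with an index as the "pointer".
List<int> matches = source.Where(criteria).ToList();
var buffered = new List<int>();
for (int i = 0; i < matches.Count; i++)
{
    buffered.Add(matches[i]);     // do the code within the loop
}

Console.WriteLine(string.Join(", ", streamed)); // 3, 8, 6
Console.WriteLine(string.Join(", ", buffered)); // 3, 8, 6
```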

A more real-life scenario that roughly explains it is going shopping. Suppose you have the shopping list in your hand because you've already spent the time figuring out what you need (B & D). Then you just need to look at the list and grab the next item. If you don't have the shopping list (A & C), you have the extra step in the store of thinking "what do I need?" before retrieving each item.

Does LINQ deferred execution occur when rendering the view, or earlier?

MSDN documentation addresses this question under the deferred query execution section (emphasis mine).

In a query that returns a sequence of values, the query variable
itself never holds the query results and only stores the query
commands. Execution of the query is deferred until the query variable
is iterated over in a foreach or For Each loop
...

That narrows down the answer to options 2 and 3.

foreach is just syntactic sugar: underneath, the compiler rewrites it as a while loop. There's a pretty thorough explanation of what happens here. Basically, your loop will end up looking something like this:

{
    IEnumerator<?> e = ((IEnumerable<?>)Model).GetEnumerator();
    try
    {
        int m; // this is inside the loop in C# 5
        while (e.MoveNext())
        {
            m = (?)e.Current;
            // your code goes here
        }
    }
    finally
    {
        if (e != null) ((IDisposable)e).Dispose();
    }
}

The enumerator is advanced before your code inside the loop runs, so slightly before you get to @item.Bar. That only leaves option 2, the @foreach (var item in Model) line (though technically that line doesn't exist after the compiler is done with your code).

I'm not sure whether the query will execute on the call to GetEnumerator() or on the first call to e.MoveNext().
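For LINQ to Objects, at least, this can be checked directly. In the sketch below (the data and counter are hypothetical), the predicate has not run after GetEnumerator(); it runs only when MoveNext() is called. Other providers, including IQueryable-based ones such as the one behind Model, may make a different choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data and counter.
var items = new[] { 1, 2, 3 };
int predicateCalls = 0;
var query = items.Where(x => { predicateCalls++; return x > 1; });

IEnumerator<int> e = query.GetEnumerator();
int afterGetEnumerator = predicateCalls; // nothing has been tested yet

bool found = e.MoveNext();               // scans 1 (rejected), then 2 (accepted)
int afterFirstMoveNext = predicateCalls;
e.Dispose();

Console.WriteLine($"{afterGetEnumerator} {afterFirstMoveNext} {found}"); // 0 2 True
```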


As @pst points out in the comments, there are other ways to trigger execution of a query, such as by calling ToList, and it may not internally use a foreach loop. MSDN documentation sort of addresses this here:

The IQueryable interface inherits the IEnumerable interface so that if
it represents a query, the results of that query can be enumerated.
Enumeration causes the expression tree associated with an IQueryable
object to be executed.
The definition of "executing an expression
tree" is specific to a query provider. For example, it may involve
translating the expression tree to an appropriate query language for
the underlying data source. Queries that do not return enumerable
results are executed when the Execute method is called.

My understanding of that is an attempt to enumerate the expression will cause it to execute (be it through a foreach or some other way). How exactly that happens will depend on the implementation of the provider.

Why does LINQ take more time to execute than foreach?

There are several things wrong with your example:

  • Tiny sample size. Four elements in the array?? Try 1,000,000
  • In the first example, the dictionary object is created outside of the stopwatch. Object creation is a factor in speed, particularly with such a tiny example.
  • The LINQ code uses a delegate. Admittedly this is a common usage in LINQ but to get a true comparison either both should use methods or both should use delegates.

You should check out Jon Skeet's blog post on this subject.
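Here is a rough sketch of a fairer comparison. The workload and names are hypothetical, not the original benchmark. It uses a million elements, builds the data before either stopwatch starts, and gives both versions the same delegate. A single Stopwatch run is still noisy, and the first run includes JIT compilation, so repeat each measurement several times before drawing conclusions.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical workload: sum the even numbers in a large array.
// The data is built before either stopwatch starts.
int[] data = Enumerable.Range(0, 1_000_000).ToArray();
Func<int, bool> isEven = n => n % 2 == 0; // same delegate for both versions

var sw = Stopwatch.StartNew();
long loopSum = 0;
foreach (int value in data)
{
    if (isEven(value)) loopSum += value;
}
sw.Stop();
Console.WriteLine($"foreach: {sw.ElapsedMilliseconds} ms");

sw.Restart();
long linqSum = data.Where(isEven).Sum(v => (long)v);
sw.Stop();
Console.WriteLine($"LINQ:    {sw.ElapsedMilliseconds} ms");

Console.WriteLine(loopSum == linqSum); // True
```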

Does For loop check linq expression at each iteration

1: No, because it is only the starting value. This part is visited only once, at the start of the loop.

2: Yes, because the LINQ expression has to be evaluated on every iteration to get the result value. If you want to avoid that, execute the LINQ expression once before the loop and store the result in a separate variable:

int end = list.OrderBy(x => x.Key).First().Key;
for (var i = list.Count() - 1; i >= end; i--)
    if (list.ContainsKey(i))
        list.RemoveAt(i);
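To see point 2 in action, this sketch uses a hypothetical Smallest() helper that counts how often it is called. When the LINQ call sits in the loop condition, it runs before every iteration, plus once more to exit. When it is hoisted out of the loop, it runs once.

```csharp
using System;
using System.Linq;

int[] keys = { 5, 3, 9, 1 };
int calls = 0;

// Hypothetical helper wrapping the LINQ expression so its calls can be counted.
int Smallest()
{
    calls++;
    return keys.OrderBy(k => k).First();
}

// Condition in the loop header: evaluated before every iteration, plus once to exit.
for (int i = 3; i >= Smallest(); i--) { }
int inCondition = calls;

// Hoisted: evaluated once before the loop.
calls = 0;
int end = Smallest();
for (int i = 3; i >= end; i--) { }
int hoisted = calls;

Console.WriteLine($"{inCondition} {hoisted}"); // 4 1
```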

EDIT:

If your list might change dynamically, then of course the way you use it now would be preferable, since it accommodates the changes.

As for question 3: this answer says that the Add method is not thread-safe.


