How Does Deferred Linq Query Execution Actually Work

How does deferred LINQ query execution actually work?

Your query can be written like this in method syntax:

var query = numbers.Where(value => value >= threshold);

Or:

Func<int, bool> predicate = delegate(int value) {
    return value >= threshold;
};
IEnumerable<int> query = numbers.Where(predicate);

These pieces of code (including your own query in query syntax) are all equivalent.

When you unroll the query like that, you see that predicate is an anonymous method and threshold is a variable captured by that method (a closure). That means the method reads the value threshold has at the time it executes, not the value it had when the query was defined. The compiler generates an actual (non-anonymous) method, on a hidden class that holds threshold as a field, to take care of that. The method is not executed when it is declared, but once for each item when query is enumerated (the execution is deferred). Since the enumeration happens after the value of threshold is changed, the new value is used.

When you set numbers to null, you only clear that one reference; the array object still exists. The IEnumerable returned by Where (and referenced in query) holds its own reference to the array, so it does not matter that the original variable is null now.

That explains the behavior: numbers and threshold play different roles in the deferred execution. numbers is a reference to the array that is enumerated, and that reference is copied when Where is called, while threshold is a local variable whose scope is "forwarded" to the anonymous method.
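The behavior described above can be reproduced with a small console program. This is only a sketch: the array contents and the initial threshold are assumptions (picked to match the values 3, 5, 7 and 9 used in the extension below), and the class name DeferredDemo is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static List<int> Run()
    {
        int[] numbers = { 1, 3, 5, 7, 9 };   // assumed sample data
        int threshold = 6;                   // assumed initial value

        // Nothing is filtered here; Where only stores the array reference and the predicate.
        IEnumerable<int> query = numbers.Where(value => value >= threshold);

        threshold = 3;    // the predicate sees this new value when it finally runs
        numbers = null;   // the query still holds its own reference to the array

        return query.ToList();   // the predicate runs now, once per element
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));   // prints "3, 5, 7, 9"
    }
}
```

The query is defined while threshold is 6, yet the result contains everything from 3 upwards, and setting numbers to null does not cause an exception.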

Extension, part 1: Modification of the closure during the enumeration

You can take your example one step further when you replace the line...

var result = query.ToList();

...with:

List<int> result = new List<int>();
foreach(int value in query) {
    threshold = 8;
    result.Add(value);
}

What you are doing here is changing the value of threshold while the array is being iterated. When you hit the body of the loop the first time (when value is 3), you change the threshold to 8, which means the values 5 and 7 will be skipped and the next value added to the list is 9. The reason is that threshold is read again for each element, and whatever value it holds at that moment is used. Since threshold has changed to 8, the numbers 5 and 7 no longer evaluate as greater than or equal to it.
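As a runnable sketch of that loop, assuming the same sample array as before and a threshold of 3 when the enumeration starts (both are assumptions, as is the class name):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosureMutationDemo
{
    public static List<int> Run()
    {
        int[] numbers = { 1, 3, 5, 7, 9 };   // assumed sample data
        int threshold = 3;                   // assumed value when enumeration starts
        IEnumerable<int> query = numbers.Where(value => value >= threshold);

        var result = new List<int>();
        foreach (int value in query)
        {
            // Changing the captured variable affects every later predicate call.
            threshold = 8;
            result.Add(value);
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));   // prints "3, 9"
    }
}
```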

Extension, part 2: Entity Framework is different

To make things more complicated, things are slightly different when you use a LINQ provider that builds a different query from your original one and then executes it. The most common examples are Entity Framework (EF) and LINQ to SQL (now largely superseded by EF). These providers translate the original query into an SQL query before the enumeration starts. Here the compiler generates an expression tree rather than an anonymous method, and the value of threshold is read only once, when the SQL is built. Changes to threshold during the enumeration therefore have no effect on the result: they happen after the query has been submitted to the database.

The lesson from this is that you always have to be aware of which flavor of LINQ you are using, and that some understanding of its inner workings is an advantage.

What are the benefits of a Deferred Execution in LINQ?

The main benefit is that this allows filtering operations, the core of LINQ, to be much more efficient. (This is effectively your item #1).

For example, take a LINQ query like this:

 var results = collection.Select(item => item.Foo).Where(foo => foo < 3).ToList();

With deferred execution, the above iterates your collection one time, and each time an item is requested during the iteration, performs the map operation, filters, then uses the results to build the list.

If you were to make LINQ fully execute each time, each operation (Select / Where) would have to iterate through the entire sequence. This would make chained operations very inefficient.

Personally, I'd say your item #2 above is more of a side effect rather than a benefit - while it's, at times, beneficial, it also causes some confusion at times, so I would just consider this "something to understand" and not tout it as a benefit of LINQ.


In response to your edit:

In your particular example, in both cases Select would iterate collection and return an IEnumerable I1 of type item.Foo. Where() would then enumerate I1 and return IEnumerable<> I2 of type item.Foo. I2 would then be converted to List.

This is not true - deferred execution prevents this from occurring.

In my example, the return type is IEnumerable<T>, which means that it's a collection that can be enumerated, but, due to deferred execution, it isn't actually enumerated.

When you call ToList(), the entire collection is enumerated. The result ends up looking conceptually something more like (though, of course, different):

List<Foo> results = new List<Foo>();
foreach(var item in collection)
{
    // "Select" does a mapping
    var foo = item.Foo;

    // "Where" filters
    if (!(foo < 3))
        continue;

    // "ToList" builds results
    results.Add(foo);
}

Deferred execution causes the sequence itself to only be enumerated (foreach) one time, when it's used (by ToList()). Without deferred execution, it would look more like (conceptually):

// Select
List<Foo> foos = new List<Foo>();
foreach(var item in collection)
{
    foos.Add(item.Foo);
}

// Where
List<Foo> foosFiltered = new List<Foo>();
foreach(var foo in foos)
{
    if (foo < 3)
        foosFiltered.Add(foo);
}

// ToList
List<Foo> results = new List<Foo>();
foreach(var item in foosFiltered)
{
    results.Add(item);
}
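One way to see the single pass for yourself is to log every Select and Where call. The following is a minimal sketch, with a made-up class name and a plain int array standing in for collection:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PipelineTrace
{
    public static List<string> Run()
    {
        var log = new List<string>();
        int[] source = { 1, 5, 2 };   // stand-in for "collection"

        List<int> results = source
            .Select(x => { log.Add($"select {x}"); return x; })
            .Where(x => { log.Add($"where {x}"); return x < 3; })
            .ToList();

        log.Add("results: " + string.Join(",", results));
        return log;
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```

The log shows select and where alternating element by element (select 1, where 1, select 5, where 5, select 2, where 2), not all selects followed by all wheres.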

Trying to understand how linq/deferred execution works

Let's consider what the fold variable is:

var fold = enumerators.SelectMany(e => e.Current).OrderBy(x => random.Next());

It is not the result of a query execution. It is a query definition, because both SelectMany and OrderBy are operators with deferred execution. So it just stores the knowledge of how to flatten the current items from all enumerators and return them in random order. Note the word current: it means the current item at the time the query is executed.

Now let's consider when this query is executed. The result of the GenerateFolds method is an IEnumerable of IEnumerable<int> queries. The following code does not execute any of the queries:

var folds = GenerateFolds(indices, values, numberOfFolds);

Again, it is just a query. You can execute it by calling ToList() or by enumerating it:

var f = folds.ToList();

But even now the inner queries are not executed. They have all been returned, but not executed. That is, the while loop in GenerateFolds ran to completion while you saved the queries to the list f, and e.MoveNext() was called several times until the loop exited:

while (enumerators.All(e => e.MoveNext()))
{
    var fold = enumerators.SelectMany(e => e.Current).OrderBy(x => random.Next());
    yield return fold;
}

So what does f hold? It holds a list of queries. By the time you have them all, the current item is the last item from each enumerator (remember, the while loop has been iterated completely at this point). But none of these queries has been executed yet! Here you execute the first of them:

f[0].Count() 

You get the count of items returned by the first query (defined at the top of the question). But since you have already enumerated all the queries, the current item is the last item, so you get the count of indices in the last item.

Now take a look at

folds.First().Count()

Here you don't enumerate all the queries to save them in a list. The while loop body is executed only once, so the current item is the first item. That's why you get the count of indices in the first item, and that's why the two values are different.

The last question: why does everything work fine when you add ToList() inside your while loop? The answer is very simple: that executes each query, so you have a list of indices instead of a query definition. Each query is executed on its own iteration, so the current item is always the right one, and your code works fine.
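The effect can be reproduced with a simplified stand-in for GenerateFolds. Everything here is an assumption made for illustration: the signature, the sample data, the materialize flag, and the omission of the random ordering. The sketch also takes only the first two folds with Take(2) instead of listing all of them, so the enumerators are never read after they have run past their end.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FoldDemo
{
    // Hypothetical, simplified version of the method from the question.
    static IEnumerable<IEnumerable<int>> GenerateFolds(
        List<IEnumerator<int[]>> enumerators, bool materialize)
    {
        while (enumerators.All(e => e.MoveNext()))
        {
            // Deferred: e.Current is read when the fold is enumerated, not here.
            IEnumerable<int> fold = enumerators.SelectMany(e => e.Current);
            yield return materialize ? fold.ToList() : fold;
        }
    }

    static List<IEnumerator<int[]>> MakeEnumerators()
    {
        var a = new List<int[]> { new[] { 1 }, new[] { 2, 3 }, new[] { 4, 5, 6 } };
        var b = new List<int[]> { new[] { 10 }, new[] { 20, 30 }, new[] { 40, 50, 60 } };
        return new List<IEnumerator<int[]>> { a.GetEnumerator(), b.GetEnumerator() };
    }

    public static (int viaFirst, int viaList, int viaListMaterialized) Run()
    {
        // Only the first fold is produced, so Current is still the first item.
        int viaFirst = GenerateFolds(MakeEnumerators(), false).First().Count();

        // Two folds are produced before the first one is counted,
        // so the first query now sees the second item of each enumerator.
        int viaList = GenerateFolds(MakeEnumerators(), false).Take(2).ToList()[0].Count();

        // ToList() inside the loop executes each fold while Current is still right.
        int viaListMaterialized = GenerateFolds(MakeEnumerators(), true).Take(2).ToList()[0].Count();

        return (viaFirst, viaList, viaListMaterialized);
    }

    public static void Main()
    {
        Console.WriteLine(Run());   // prints "(2, 4, 2)"
    }
}
```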

Understanding Deferred Execution: Is a Linq Query Re-executed Everytime its collection of anonymous objects is referred to?

Your LINQ query ceases to be deferred when you call

ToDataTable();

At that point the results are snapshotted, done and dusted forever; later changes to the source are not reflected.

The same is true of foamOrders and webOrders when you convert them with

ToList();

You could do it as one query. I don't have MySQL to check it on.

Does LINQ deferred execution occur when rendering the view, or earlier?

MSDN documentation addresses this question under the deferred query execution section (emphasis mine).

In a query that returns a sequence of values, the query variable
itself never holds the query results and only stores the query
commands. Execution of the query is deferred until the query variable
is iterated over in a foreach or For Each loop
...

That narrows down the answer to options 2 and 3.

foreach is just syntactic sugar, underneath the compiler re-writes that as a while loop. There's a pretty thorough explanation of what happens here. Basically your loop will end up looking something like this

{
    IEnumerator<?> e = ((IEnumerable<?>)Model).GetEnumerator();
    try
    {
        int m; // this is inside the loop in C# 5
        while(e.MoveNext())
        {
            m = (?)e.Current;
            // your code goes here
        }
    }
    finally
    {
        if (e != null) ((IDisposable)e).Dispose();
    }
}

The enumerator is advanced before it reaches your code inside the loop, so slightly before you get to @item.Bar. That only leaves option 2, the @foreach (var item in Model) line (though technically that line doesn't exist after the compiler is done with your code).

I'm not sure if the query will execute on the call to GetEnumerator() or on the first call to e.MoveNext().
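For in-memory LINQ to Objects, at least, this can be checked directly by counting predicate calls. The sketch below (class name made up) only says something about Enumerable.Where over an array; a database-backed provider may behave differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhenDoesItRun
{
    public static (int afterGetEnumerator, int afterMoveNext) Run()
    {
        int calls = 0;
        IEnumerable<int> query = new[] { 1, 2, 3 }.Where(x => { calls++; return x > 1; });

        using (IEnumerator<int> e = query.GetEnumerator())
        {
            int afterGetEnumerator = calls;   // predicate not called yet
            e.MoveNext();                     // advances to the first match (2)
            int afterMoveNext = calls;        // called for 1 and for 2
            return (afterGetEnumerator, afterMoveNext);
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run());   // prints "(0, 2)"
    }
}
```

Here the work starts on the first MoveNext() call, not on GetEnumerator().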


As @pst points out in the comments, there are other ways to trigger execution of a query, such as by calling ToList, and it may not internally use a foreach loop. MSDN documentation sort of addresses this here:

The IQueryable interface inherits the IEnumerable interface so that if
it represents a query, the results of that query can be enumerated.
Enumeration causes the expression tree associated with an IQueryable
object to be executed.
The definition of "executing an expression
tree" is specific to a query provider. For example, it may involve
translating the expression tree to an appropriate query language for
the underlying data source. Queries that do not return enumerable
results are executed when the Execute method is called.

My understanding of that is an attempt to enumerate the expression will cause it to execute (be it through a foreach or some other way). How exactly that happens will depend on the implementation of the provider.

How to maintain LINQ deferred execution?

  1. You have to be really careful about passing around IQueryables when you're using a DataContext, because once the context gets disposed you won't be able to execute that IQueryable anymore. If you're not using a context then you might be OK, but be aware of that.

  2. .Any() and .FirstOrDefault() are not deferred. When you call them they will cause execution to occur. However, this may not do what you think it does. For instance, in LINQ to SQL, if you perform an .Any() on an IQueryable it basically acts as an IF EXISTS( SQL HERE ).

You can chain IQueryables along like this if you want to:

var firstQuery = from f in context.Foos
                 where f.Bar == bar
                 select f;

var secondQuery = from f in firstQuery
                  where f.Bar == anotherBar
                  orderby f.SomeDate
                  select f;

if (secondQuery.Any()) //immediately executes IF EXISTS( second query in SQL )
{
    //causes execution on second query
    //and allows you to enumerate through the results
    foreach (var foo in secondQuery)
    {
        //do something
    }

    //or

    //immediately executes second query in SQL with a TOP 1
    //or something like that
    var foo = secondQuery.FirstOrDefault();
}
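The same pattern can be tried without a database. The sketch below uses an in-memory array as a stand-in for context.Foos (an assumption, as is the class name), so it shows when execution happens but not the SQL that a real provider would generate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ComposeDemo
{
    public static (int afterCompose, int afterAny, int afterFirst, int first) Run()
    {
        int[] data = { 1, 2, 3, 4 };   // in-memory stand-in for a table
        int evaluations = 0;

        // Composing queries executes nothing.
        IEnumerable<int> firstQuery = data.Where(x => { evaluations++; return x > 1; });
        IEnumerable<int> secondQuery = firstQuery.Where(x => x % 2 == 0);
        int afterCompose = evaluations;

        // Any() executes immediately and stops at the first match.
        bool any = secondQuery.Any();
        int afterAny = evaluations;

        // FirstOrDefault() executes the query again from the start.
        int first = secondQuery.FirstOrDefault();
        int afterFirst = evaluations;

        return (afterCompose, afterAny, afterFirst, first);
    }

    public static void Main()
    {
        Console.WriteLine(Run());   // prints "(0, 2, 4, 2)"
    }
}
```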

Does Deferred Execution works in the same way for method syntax and query syntax in LINQ/Entity Framework?

Actually, there is only method syntax. When you write a query in query syntax, the compiler translates it to method syntax (in fact, to static method calls).

Example:

Extension method call (method syntax):

var query = sequence.Select(x => x.Property);

is compiled as (yes, extension methods are just syntactic sugar for calls to static class methods):

var query = Queryable.Select(sequence, x => x.Property);

The same result is produced by (this is syntactic sugar for the same Queryable/Enumerable method calls):

var query = from x in sequence
            select x.Property;

So both syntaxes produce the same code. It therefore makes no difference which syntax you use: deferred execution (and everything else) works the same way.
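A quick in-memory check of that claim, using Enumerable instead of Queryable and a made-up list of words: all three forms are defined before an item is added to the source, and all three still see it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SyntaxDemo
{
    public static List<int>[] Run()
    {
        var words = new List<string> { "a", "bb" };

        IEnumerable<int> extensionCall = words.Select(w => w.Length);
        IEnumerable<int> staticCall = Enumerable.Select(words, w => w.Length);
        IEnumerable<int> querySyntax = from w in words
                                       select w.Length;

        // Added after all three queries were defined; each one is still deferred.
        words.Add("ccc");

        return new[] { extensionCall.ToList(), staticCall.ToList(), querySyntax.ToList() };
    }

    public static void Main()
    {
        foreach (List<int> lengths in Run())
            Console.WriteLine(string.Join(", ", lengths));   // prints "1, 2, 3" three times
    }
}
```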

How does LINQ defer execution when in a using statement

I would expect that to simply not work; the Select is deferred, so no data has been consumed at this point. However, since you have disposed the data-context (before leaving MyFunc), it will never be able to get data. A better option is to pass the data-context into the method, so that the consumer can choose the lifetime. Also, I would recommend returning IQueryable<T> so that the consumer can "compose" the result (i.e. add OrderBy / Skip / Take / Where etc, and have it impact the final query):

// this could also be an instance method on the data-context
internal static IQueryable<SomeType> MyFunc(
    this MyDataContext dc, parameter a)
{
    return dc.tablename.Where(row => row.parameter == a);
}

private void UsingFunc()
{
    using(MyDataContext dc = new MyDataContext()) {
        var result = dc.MyFunc(new a());

        foreach(var row in result)
        {
            //Do something
        }
    }
}

Update: if you (comments) don't want to defer execution (i.e. you don't want the caller dealing with the data-context), then you need to evaluate the results. You can do this by calling .ToList() or .ToArray() on the result to buffer the values.

private IEnumerable<SomeType> MyFunc(parameter a)
{
    using(MyDataContext dc = new MyDataContext())
    {
        // or ToList() etc
        return dc.tablename.Where(row => row.parameter == a).ToArray();
    }
}

If you want to keep it deferred in this case, then you need to use an "iterator block":

private IEnumerable<SomeType> MyFunc(parameter a)
{
    using(MyDataContext dc = new MyDataContext())
    {
        foreach(SomeType row in dc
            .tablename.Where(row => row.parameter == a))
        {
            yield return row;
        }
    }
}

This is now deferred without passing the data-context around.
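To try the difference without a database, here is a sketch with a fake context. FakeContext, its Rows property and the method names are all invented for illustration; a real data-context will fail in its own way, but the lifetime issue is the same.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a data-context: rows cannot be read after Dispose().
public sealed class FakeContext : IDisposable
{
    private bool _disposed;

    public IEnumerable<int> Rows
    {
        get
        {
            foreach (int value in new[] { 1, 2, 3 })
            {
                if (_disposed) throw new ObjectDisposedException(nameof(FakeContext));
                yield return value;
            }
        }
    }

    public void Dispose() => _disposed = true;
}

public static class UsingDemo
{
    // Deferred query escapes the using block: the context is gone before any row is read.
    public static IEnumerable<int> Broken()
    {
        using (var dc = new FakeContext())
        {
            return dc.Rows.Where(row => row > 1);
        }
    }

    // Iterator block: the context is created on the first MoveNext()
    // and disposed when the caller finishes (or abandons) the enumeration.
    public static IEnumerable<int> Deferred()
    {
        using (var dc = new FakeContext())
        {
            foreach (int item in dc.Rows.Where(row => row > 1))
            {
                yield return item;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Deferred()));   // prints "2, 3"
        try { Broken().ToList(); }
        catch (ObjectDisposedException) { Console.WriteLine("Broken() failed on enumeration"); }
    }
}
```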

Linq - What is the quickest way to find out deferred execution or not?

Generally methods that return a sequence use deferred execution:

IEnumerable<X> ---> Select ---> IEnumerable<Y>

and methods that return a single object don't:

IEnumerable<X> ---> First ---> Y

So, methods like Where, Select, Take, Skip, GroupBy and OrderBy use deferred execution because they can, while methods like First, Single, ToList and ToArray don't because they can't.

There are also two types of deferred execution, usually called streaming and non-streaming. For example, the Select method only fetches one item at a time when it's asked to produce an item, while the OrderBy method has to consume the entire source when asked to return the first item. So, if you chain an OrderBy after a Select, the execution will be deferred until you ask for the first item, but then the OrderBy will ask the Select for all the items.
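The difference shows up if you count how often the Select projection runs. This is a small sketch with made-up names and data:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingDemo
{
    public static (int selectOnly, int orderedBefore, int orderedAfter) Run()
    {
        int[] source = { 3, 1, 2 };
        int projected = 0;

        // Select alone: asking for one item projects one item.
        IEnumerable<int> streaming = source.Select(x => { projected++; return x; });
        streaming.First();
        int selectOnly = projected;

        // Select followed by OrderBy: still deferred until the first item is requested...
        projected = 0;
        IEnumerable<int> ordered = source
            .Select(x => { projected++; return x; })
            .OrderBy(x => x);
        int orderedBefore = projected;

        // ...but then OrderBy has to pull every item from Select.
        ordered.First();
        int orderedAfter = projected;

        return (selectOnly, orderedBefore, orderedAfter);
    }

    public static void Main()
    {
        Console.WriteLine(Run());   // prints "(1, 0, 3)"
    }
}
```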


