Linq - What is the quickest way to find out deferred execution or not?
Generally methods that return a sequence use deferred execution:
IEnumerable<X> ---> Select ---> IEnumerable<Y>
and methods that return a single object don't:
IEnumerable<X> ---> First ---> Y
So, methods like Where, Select, Take, Skip, GroupBy and OrderBy use deferred execution because they can, while methods like First, Single, ToList and ToArray don't because they can't.
There are also two types of deferred execution. For example, the Select method will only get one item at a time when it's asked to produce an item, while the OrderBy method will have to consume the entire source when asked to return the first item. So, if you chain an OrderBy after a Select, the execution will be deferred until you get the first item, but then the OrderBy will ask the Select for all the items.
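The difference between the two kinds of deferred execution can be made visible by logging each time Select is asked for an item. This is only a minimal sketch: StreamingDemo, Trace and the sample array are hypothetical names and data chosen for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class StreamingDemo
{
    // Hypothetical helper: returns a log showing when Select is asked for items.
    public static List<string> Trace(bool sort)
    {
        var log = new List<string>();

        // Select is deferred: this lambda only runs when an item is pulled.
        IEnumerable<int> projected = new[] { 3, 1, 2 }.Select(x =>
        {
            log.Add($"select {x}");
            return x;
        });

        // Optionally chain an OrderBy after the Select.
        IEnumerable<int> query = sort ? projected.OrderBy(x => x) : projected;

        log.Add("query defined");   // nothing has been pulled from the source yet
        int first = query.First();  // ask for exactly one result
        log.Add($"first = {first}");
        return log;
    }
}
```

Trace(false) logs "query defined", "select 3", "first = 3": only one item is pulled. Trace(true) logs "query defined", "select 3", "select 1", "select 2", "first = 1": nothing runs until the first item is requested, but then OrderBy pulls every item from Select.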
Trying to understand how linq/deferred execution works
Let's think about what the fold variable is:
var fold = enumerators.SelectMany(e => e.Current).OrderBy(x => random.Next());
It is not the result of executing a query. It's a query definition, because both SelectMany and OrderBy are operators with deferred execution. So it just stores the knowledge of how to flatten the current items from all enumerators and return them in random order. I emphasize the word "current" because it means the current item at the time the query is executed.
Now let's think about when this query will be executed. The result of the GenerateFolds method is an IEnumerable of IEnumerable<int> queries. The following code does not execute any of the queries:
var folds = GenerateFolds(indices, values, numberOfFolds);
It's again just a query. You can execute it by calling ToList() or enumerating it:
var f = folds.ToList();
But even now the inner queries are not executed. They are all returned, but not executed. That is, the while loop in GenerateFolds ran to completion while you saved the queries to the list f, and e.MoveNext() was called several times until you exited the loop:
while (enumerators.All(e => e.MoveNext()))
{
    var fold = enumerators.SelectMany(e => e.Current).OrderBy(x => random.Next());
    yield return fold;
}
So, what does f hold? It holds a list of queries. And since you have got them all, the current item is the last item from each enumerator (remember, we have iterated the while loop completely at this point). But none of these queries has been executed yet! Here you execute the first of them:
f[0].Count()
You get the count of items returned by the first query (defined at the top of the question). But since you have already enumerated all the queries, the current item is the last item, and you get the count of indexes in the last item.
Now take a look at
folds.First().Count()
Here you don't enumerate all the queries to save them in a list. That is, the body of the while loop is executed only once and the current item is the first item. That's why you get the count of indexes in the first item, and that's why these values are different.
Last question - why does everything work fine when you add ToList() inside your while loop? The answer is very simple - that executes each query, so you have a list of indexes instead of a query definition. Each query is executed on its own iteration, so the current item is always different, and your code works fine.
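The effect described above can be reproduced without the original GenerateFolds, which is not shown here. The sketch below is a hypothetical stand-in: FoldDemo, LazyFolds, EagerFolds and the shared variable current are invented for illustration, with current playing the role of e.Current.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FoldDemo
{
    // Hypothetical stand-in for GenerateFolds. Each yielded query reads the
    // shared variable `current` when it is enumerated, not when it is yielded.
    public static IEnumerable<IEnumerable<int>> LazyFolds(int[][] chunks)
    {
        int current = 0;
        for (int i = 0; i < chunks.Length; i++)
        {
            current = i;
            yield return chunks.Where((chunk, index) => index == current)
                               .SelectMany(chunk => chunk);
        }
    }

    // Same loop, but ToList() executes each query before the loop moves on.
    public static IEnumerable<IEnumerable<int>> EagerFolds(int[][] chunks)
    {
        int current = 0;
        for (int i = 0; i < chunks.Length; i++)
        {
            current = i;
            yield return chunks.Where((chunk, index) => index == current)
                               .SelectMany(chunk => chunk)
                               .ToList();
        }
    }
}
```

With chunks = { {1}, {2, 3}, {4, 5, 6} }, LazyFolds(chunks).ToList()[0].Count() is 3 (the loop has finished, so the query sees the last chunk), LazyFolds(chunks).First().Count() is 1 (the loop has only reached its first yield), and EagerFolds(chunks).ToList()[0].Count() is 1.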
deferred execution or not
Every statement in your question is an example of deferred execution. The contents of the Select and Where statements have no effect on whether the resulting value is deferred or not. The Select + Where statements themselves dictate that.
As a counterexample, consider the Sum method. This is always eagerly executed irrespective of what the input is.
var sum = dc.myTables.Sum(...); // Always eager
What are the benefits of a Deferred Execution in LINQ?
The main benefit is that this allows filtering operations, the core of LINQ, to be much more efficient. (This is effectively your item #1).
For example, take a LINQ query like this:
var results = collection.Select(item => item.Foo).Where(foo => foo < 3).ToList();
With deferred execution, the above iterates your collection one time, and each time an item is requested during the iteration, performs the map operation, filters, then uses the results to build the list.
If you were to make LINQ fully execute each time, each operation (Select / Where) would have to iterate through the entire sequence. This would make chained operations very inefficient.
Personally, I'd say your item #2 above is more of a side effect rather than a benefit - while it's, at times, beneficial, it also causes some confusion at times, so I would just consider this "something to understand" and not tout it as a benefit of LINQ.
In response to your edit:
In your particular example, in both cases Select would iterate collection and return an IEnumerable I1 of type item.Foo. Where() would then enumerate I1 and return IEnumerable<> I2 of type item.Foo. I2 would then be converted to List.
This is not true - deferred execution prevents this from occurring.
In my example, the return type is IEnumerable<T>, which means that it's a collection that can be enumerated, but, due to deferred execution, it isn't actually enumerated.
When you call ToList(), the entire collection is enumerated. The result ends up looking conceptually something more like (though, of course, different):
List<Foo> results = new List<Foo>();
foreach(var item in collection)
{
    // "Select" does a mapping
    var foo = item.Foo;
    // "Where" filters
    if (!(foo < 3))
        continue;
    // "ToList" builds results
    results.Add(foo);
}
Deferred execution causes the sequence itself to only be enumerated (foreach) one time, when it's used (by ToList()). Without deferred execution, it would look more like (conceptually):
// Select
List<Foo> foos = new List<Foo>();
foreach(var item in collection)
{
    foos.Add(item.Foo);
}
// Where
List<Foo> foosFiltered = new List<Foo>();
foreach(var foo in foos)
{
    if (foo < 3)
        foosFiltered.Add(foo);
}
List<Foo> results = new List<Foo>();
foreach(var item in foosFiltered)
{
    results.Add(item);
}
LINQ deferred (or immediate?) execution
If you are concerned with generating multiple calls, I would consider using EntityFramework Extensions.
You can batch queries together by adding .Future() to the end of a query.
Example:
db.BlogPosts.Where(x => x.Category.Any(y => y.Name.Contains("EntityFramework"))).Future();
So to answer your question you could combine these into one call to the database.
To check the SQL/batching you can also include this before your query:
db.Database.Log = s => System.Diagnostics.Debug.WriteLine($"SQL: {s}");
and the log will be displayed in your output window.
Shouldn't sum method be deferred in LINQ
Only functions that return an IEnumerable<T> can be deferred in Linq (since they can be wrapped in an object that allows deferring).
The result of Sum is an int, so it can't possibly defer it in any meaningful way:
var res2 = no.Sum(a => a * a);
// res2 is now an integer with a value of 55
Console.WriteLine(res2);
no.Add(100);
// how are you expecting an integer to change its value here?
Console.WriteLine(res2);
You can defer the execution (not really defer, but explicitly call it), by assigning the lambda to, for example, a Func<T>:
List<int> no = new List<int>() { 1, 2, 3, 4, 5 };
Func<int> res2 = () => no.Sum(a => a * a);
Console.WriteLine(res2());
no.Add(100);
Console.WriteLine(res2());
This should correctly give 55 and 10055.
How does deferred LINQ query execution actually work?
Your query can be written like this in method syntax:
var query = numbers.Where(value => value >= threshold);
Or:
Func<int, bool> predicate = delegate(int value) {
    return value >= threshold;
};
IEnumerable<int> query = numbers.Where(predicate);
These pieces of code (including your own query in query syntax) are all equivalent.
When you unroll the query like that, you see that predicate is an anonymous method and threshold is a variable captured (closed over) by that method. That means the method sees whatever value threshold has at the time it executes. The compiler will generate an actual (non-anonymous) method that takes care of that. The method is not executed when it's declared, but once for each item when query is enumerated (the execution is deferred). Since the enumeration happens after the value of threshold is changed (and threshold is captured), the new value is used.
When you set numbers to null, you set the reference to nowhere, but the object still exists. The IEnumerable returned by Where (and referenced in query) still references it, and it does not matter that the initial reference is null now.
That explains the behavior: numbers and threshold play different roles in the deferred execution. numbers is a reference to the array that is enumerated, while threshold is a local variable whose scope is "forwarded" to the anonymous method.
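The different roles of numbers and threshold can be checked with a small sketch. The question's original code is not reproduced here, so the array contents and the two threshold values below are assumptions, and ClosureDemo is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ClosureDemo
{
    public static List<int> Run()
    {
        int[] numbers = { 1, 3, 5, 7, 9 };   // assumed sample data
        int threshold = 6;                   // assumed initial value

        // Defines the query only; the predicate has not run yet.
        IEnumerable<int> query = numbers.Where(value => value >= threshold);

        threshold = 3;    // captured variable: the predicate will see 3
        numbers = null;   // the query still references the original array

        // Enumeration happens here, using threshold == 3.
        return query.ToList();
    }
}
```

Run() returns 3, 5, 7, 9: the new threshold is used, and setting numbers to null has no effect on the query.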
Extension, part 1: Modification of the closure during the enumeration
You can take your example one step further when you replace the line...
var result = query.ToList();
...with:
List<int> result = new List<int>();
foreach(int value in query) {
    threshold = 8;
    result.Add(value);
}
What you are doing is changing the value of threshold during the iteration of your array. When you hit the body of the loop the first time (when value is 3), you change the threshold to 8, which means the values 5 and 7 will be skipped and the next value to be added to the list is 9. The reason is that the value of threshold is evaluated again on each iteration, and the value valid at that moment is used. And since the threshold has changed to 8, the numbers 5 and 7 no longer evaluate as greater than or equal to it.
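As a runnable sketch of this behavior with LINQ to Objects: the array and the initial threshold of 3 are assumptions chosen to match the values mentioned above (3, 5, 7, 9), and MutationDemo is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MutationDemo
{
    public static List<int> Run()
    {
        int[] numbers = { 1, 3, 5, 7, 9 };   // assumed sample data
        int threshold = 3;                   // assumed initial value
        IEnumerable<int> query = numbers.Where(value => value >= threshold);

        var result = new List<int>();
        foreach (int value in query)
        {
            // From the first matching item on, the predicate compares against 8.
            threshold = 8;
            result.Add(value);
        }
        return result;
    }
}
```

Run() returns 3, 9: the value 3 passes against the original threshold, then 5 and 7 are tested against 8 and skipped.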
Extension, part 2: Entity Framework is different
To make things more complicated, when you use LINQ providers that create a different query from your original and then execute it, things are slightly different. The most common examples are Entity Framework (EF) and LINQ2SQL (now largely superseded by EF). These providers create an SQL query from the original query before the enumeration. Since this time the value of the closure is evaluated only once (it actually is not a closure, because the compiler generates an expression tree and not an anonymous method), changes in threshold during the enumeration have no effect on the result. These changes happen after the query is submitted to the database.
The lesson from this is that you always have to be aware of which flavor of LINQ you are using, and that some understanding of its inner workings is an advantage.
How to tell if an IEnumerable<T> is subject to deferred execution?
Deferred execution of LINQ has trapped a lot of people, you're not alone.
The approach I've taken to avoiding this problem is as follows:
Parameters to methods - use IEnumerable<T> unless there's a need for a more specific interface.
Local variables - usually at the point where I create the LINQ, so I'll know whether lazy evaluation is possible.
Class members - never use IEnumerable<T>, always use List<T>. And always make them private.
Properties - use IEnumerable<T>, and convert for storage in the setter.
public IEnumerable<Person> People
{
    get { return people; }
    set { people = value.ToList(); }
}
private List<Person> people;
While there are theoretical cases where this approach wouldn't work, I've not run into one yet, and I've been enthusiastically using the LINQ extension methods since late Beta.
BTW: I'm curious why you use ToArray(); instead of ToList(); - to me, lists have a much nicer API, and there's (almost) no performance cost.
Update: A couple of commenters have rightly pointed out that arrays have a theoretical performance advantage, so I've amended my statement above to "... there's (almost) no performance cost."
Update 2: I wrote some code to do some micro-benchmarking of the difference in performance between Arrays and Lists. On my laptop, and in my specific benchmark, the difference is around 5ns (that's nanoseconds) per access. I guess there are cases where saving 5ns per loop would be worthwhile ... but I've never come across one. I had to hike my test up to 100 million iterations before the runtime became long enough to accurately measure.
If result of the LINQ has an interface type it means it has a deferred execution
Not correct; even an array implements IEnumerable<int>. You can't tell from the type whether it's using deferred execution, since there is no IDeferred interface.
I think the best you could do is to try to cast it to ICollection<T> or ICollection:
public static bool IsDeferred<T>(this IEnumerable<T> source) {
    if (source == null) throw new ArgumentNullException(nameof(source));
    ICollection<T> genCollection = source as ICollection<T>;
    if (genCollection != null) return false;
    ICollection collection = source as ICollection;
    if (collection != null) return false;
    return true;
}
var arr = new int[5];
bool deferred = arr.IsDeferred(); // false
IEnumerable<int> seq = arr.Where(i => i != 0);
deferred = seq.IsDeferred(); // true