Linq - What Is the Quickest Way to Find Out Deferred Execution or Not

Linq - What is the quickest way to find out deferred execution or not?

Generally methods that return a sequence use deferred execution:

IEnumerable<X> ---> Select ---> IEnumerable<Y>

and methods that return a single object don't:

IEnumerable<X> ---> First ---> Y

So methods like Where, Select, Take, Skip, GroupBy and OrderBy use deferred execution because they can, while methods like First, Single, ToList and ToArray don't because they can't: they have to hand back a finished value or collection.

There are also two types of deferred execution: streaming and non-streaming. For example, the Select method only fetches one item at a time when it's asked to produce an item, while the OrderBy method has to consume the entire source when asked to return the first item. So if you chain an OrderBy after a Select, the execution is deferred until you ask for the first item, but at that point the OrderBy asks the Select for all the items.
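The difference between the two kinds can be observed with a source that records what it hands out. This is only a minimal sketch: Source and log are hypothetical names made up for the illustration, and the snippet is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var log = new List<string>();

// Hypothetical source that records every item it hands out.
IEnumerable<int> Source()
{
    foreach (var n in new[] { 3, 1, 2 })
    {
        log.Add($"produce {n}");
        yield return n;
    }
}

// Select streams: one source item is pulled per requested result.
var streamed = Source().Select(n => n * 10);
int pulledAfterDefinition = log.Count;   // defining the query pulls nothing
int firstStreamed = streamed.First();
int pulledByStreaming = log.Count;       // only the first item was produced

log.Clear();

// OrderBy is non-streaming: asking for the first result consumes the whole source.
var ordered = Source().Select(n => n * 10).OrderBy(n => n);
int firstOrdered = ordered.First();
int pulledByOrdering = log.Count;        // all three items were produced

Console.WriteLine($"{pulledAfterDefinition} {firstStreamed} {pulledByStreaming}"); // 0 30 1
Console.WriteLine($"{firstOrdered} {pulledByOrdering}");                           // 10 3
```

Both queries are deferred; they differ only in how much of the source they need before they can return their first element.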

Trying to understand how linq/deferred execution works

Let's think about what the fold variable is:

var fold = enumerators.SelectMany(e => e.Current).OrderBy(x => random.Next());

It is not the result of executing a query. It's a query definition, because both SelectMany and OrderBy are deferred operators. So it just stores the instructions for flattening the current items from all enumerators and returning them in random order. I emphasize the word current because it means the current item at the time the query is executed.

Now let's think about when this query will be executed. The result of the GenerateFolds method is an IEnumerable of IEnumerable<int> queries. The following code does not execute any of them:

var folds = GenerateFolds(indices, values, numberOfFolds);

It's again just a query. You can execute it by calling ToList() or by enumerating it:

var f = folds.ToList();

But even now the inner queries are not executed. They have all been returned, but not executed. That is, the while loop in GenerateFolds ran to completion while you saved the queries to the list f, and e.MoveNext() was called repeatedly until the loop exited:

while (enumerators.All(e => e.MoveNext()))
{
    var fold = enumerators.SelectMany(e => e.Current).OrderBy(x => random.Next());
    yield return fold;
}

So what does f hold? It holds a list of queries. Because you have retrieved them all, the current item is the last item from each enumerator (remember, the while loop has run to completion at this point). But none of these queries has been executed yet! Here you execute the first of them:

f[0].Count()

You get the count of items returned by the first query (defined at the top of the question). But because you have already enumerated all the queries, the current item is the last item, so you get the count of indexes in the last item.

Now take a look at

folds.First().Count()

Here you don't enumerate all the queries to save them in a list. The while loop body runs only once, so the current item is the first item. That's why you get the count of indexes in the first item, and that's why the two values differ.

Last question: why does everything work when you add ToList() inside your while loop? The answer is simple: that call executes each query, so you get a list of indexes instead of a query definition. Each query is executed during its own iteration, while the current item is still the right one, and your code works.
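The same effect can be reproduced without enumerators. The sketch below is a simplified analogue, not the code from the question: Pairs is a hypothetical stand-in for GenerateFolds, and its shared local variable start plays the role of e.Current. It is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for GenerateFolds: yields one lazy query per step.
// Every query's lambda captures the same local variable, start.
IEnumerable<IEnumerable<int>> Pairs(int[] data, bool materialize)
{
    int start = 0;
    while (start < data.Length)
    {
        var pair = data.Where((x, i) => i >= start && i < start + 2);
        yield return materialize ? pair.ToList() : pair;
        start += 2;
    }
}

int[] values = { 10, 20, 30, 40 };

// The outer ToList runs the loop to completion, so start is already 4
// by the time any stored inner query executes.
var stored = Pairs(values, materialize: false).ToList();
int countFromStored = stored[0].Count();

// First() stops the loop after one step, so start is still 0 when the query runs.
int countFromFirst = Pairs(values, materialize: false).First().Count();

// ToList inside the loop executes each query while start still has its own value.
int countMaterialized = Pairs(values, materialize: true).ToList()[0].Count();

Console.WriteLine($"{countFromStored} {countFromFirst} {countMaterialized}"); // 0 2 2
```

In this analogue the stored query comes back empty because the window has moved past the end of the array; the mechanism is the same as in the question, where the stored queries see whatever the enumerators point at when they finally run.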

deferred execution or not

Every statement in your question is an example of deferred execution. The contents of the Select and Where calls have no effect on whether the result is deferred. The Select and Where methods themselves dictate that.

As a counterexample, consider the Sum method. It is always executed eagerly, irrespective of the input.

var sum = dc.myTables.Sum(...);  // Always eager

What are the benefits of a Deferred Execution in LINQ?

The main benefit is that this allows filtering operations, the core of LINQ, to be much more efficient. (This is effectively your item #1).

For example, take a LINQ query like this:

var results = collection.Select(item => item.Foo).Where(foo => foo < 3).ToList();

With deferred execution, the above iterates your collection one time, and each time an item is requested during the iteration, performs the map operation, filters, then uses the results to build the list.

If you were to make LINQ fully execute each time, each operation (Select / Where) would have to iterate through the entire sequence. This would make chained operations very inefficient.

Personally, I'd say your item #2 above is more of a side effect than a benefit. While it is beneficial at times, it also causes confusion at times, so I would just consider it "something to understand" and not tout it as a benefit of LINQ.

In response to your edit:

"In your particular example, in both cases Select would iterate collection and return an IEnumerable I1 of type item.Foo. Where() would then enumerate I1 and return IEnumerable<> I2 of type item.Foo. I2 would then be converted to List."

This is not true - deferred execution prevents this from occurring.

In my example, the return type is IEnumerable<T>, which means that it's a sequence that can be enumerated, but, due to deferred execution, it isn't actually enumerated yet.

When you call ToList(), the entire collection is enumerated. Conceptually, the result ends up looking something like this (though, of course, the real implementation differs):

List<Foo> results = new List<Foo>();
foreach(var item in collection)
{
    // "Select" does a mapping
    var foo = item.Foo; 

    // "Where" filters
    if (!(foo < 3))
         continue;

    // "ToList" builds results
    results.Add(foo);
}

Deferred execution causes the sequence itself to only be enumerated (foreach) one time, when it's used (by ToList()). Without deferred execution, it would look more like (conceptually):

// Select
List<Foo> foos = new List<Foo>();
foreach(var item in collection)
{
    foos.Add(item.Foo);
}

// Where
List<Foo> foosFiltered = new List<Foo>();
foreach(var foo in foos)
{
    if (foo < 3)
        foosFiltered.Add(foo);
}    

// ToList
List<Foo> results = new List<Foo>();
foreach(var item in foosFiltered)
{
    results.Add(item);
}
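The interleaving described above can be made visible by tracing each lambda. This is a minimal sketch with made-up data (collection and trace are hypothetical names), written as top-level statements (C# 9 or later); it also shows that a deferred pipeline stops early when the consumer needs only one element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var trace = new List<string>();
int[] collection = { 5, 1, 4, 2 };

// Nothing is traced here: the query is only a definition.
var query = collection
    .Select(item => { trace.Add($"select {item}"); return item; })
    .Where(foo => { trace.Add($"where {foo}"); return foo < 3; });

int tracedAfterDefinition = trace.Count;

// First() pulls items through the pipeline one at a time and stops at the first match.
int firstMatch = query.First();

Console.WriteLine(tracedAfterDefinition);       // 0
Console.WriteLine(firstMatch);                  // 1
Console.WriteLine(string.Join(", ", trace));    // select 5, where 5, select 1, where 1
```

The items 4 and 2 are never touched, which an eager, step-by-step implementation could not avoid.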

LINQ deferred (or immediate?) execution

If you are concerned about generating multiple calls, I would consider using EntityFramework Extensions.

You can batch queries together by adding .Future() to the end of a query.

Example:

db.BlogPosts.Where(x => x.Category.Any(y => y.Name.Contains("EntityFramework"))).Future();

So, to answer your question: you could combine these into one call to the database.

To check the SQL/batching you can also include this before your query:

db.Database.Log = s => System.Diagnostics.Debug.WriteLine($"SQL: {s}");

and the log will be displayed in your output window.

Shouldn't sum method be deferred in LINQ

Only functions that return an IEnumerable<T> can be deferred in LINQ (since the result can be wrapped in an object that allows deferring).

The result of Sum is an int, so Sum can't possibly be deferred in any meaningful way:

var res2 = no.Sum(a => a * a);
// res2 is now an integer with a value of 55
Console.WriteLine(res2);
no.Add(100);

// how are you expecting an integer to change its value here?
Console.WriteLine(res2);

You can defer the execution (not really defer, but explicitly call it later) by assigning the lambda to a delegate, for example a Func<int>:

List<int> no = new List<int>() { 1, 2, 3, 4, 5 };
Func<int> res2 = () => no.Sum(a => a * a);
Console.WriteLine(res2());
no.Add(100);
Console.WriteLine(res2());

This should correctly give 55 and 10055.
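For contrast, a sequence-returning operator over the same list is deferred without any delegate wrapping. A minimal sketch, reusing the list from the example and written as top-level statements (C# 9 or later); the variable names other than no are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var no = new List<int> { 1, 2, 3, 4, 5 };

int eagerSum = no.Sum(a => a * a);                  // computed right here
IEnumerable<int> squares = no.Select(a => a * a);   // only a definition so far

no.Add(100);

// The Select runs now, so it sees the element that was added after it was defined.
int sumAfterAdd = squares.Sum();

Console.WriteLine(eagerSum);      // 55
Console.WriteLine(sumAfterAdd);   // 10055
```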

How does deferred LINQ query execution actually work?

Your query can be written like this in method syntax:

var query = numbers.Where(value => value >= threshold);

Or:

Func<int, bool> predicate = delegate(int value) {
    return value >= threshold;
};
IEnumerable<int> query = numbers.Where(predicate);

These pieces of code (including your own query in query syntax) are all equivalent.

When you unroll the query like that, you see that predicate is an anonymous method and threshold is a variable captured by that method (the method closes over it). That means the predicate uses whatever value threshold has at the time of execution. The compiler generates an actual (non-anonymous) method that takes care of that. The method is not executed when it's declared, but once for each item when query is enumerated (the execution is deferred). Since the enumeration happens after the value of threshold is changed (and threshold is captured, not copied), the new value is used.

When you set numbers to null, you only clear that reference; the array object still exists. The IEnumerable returned by Where (and referenced in query) still references it, so it does not matter that the initial reference is null now.

That explains the behavior: numbers and threshold play different roles in the deferred execution. numbers is a reference to the array that is enumerated, while threshold is a local variable whose scope is "forwarded" to the anonymous method.
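Both roles can be seen in one short run. The concrete values below (the array contents and the thresholds 6 and 3) are assumptions chosen to match the discussion, since the question's code is not reproduced here; the snippet is written as top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] numbers = { 1, 3, 5, 7, 9 };   // assumed sample data
int threshold = 6;                    // assumed initial value

// Where receives the array reference now; the lambda captures the variable threshold.
var query = numbers.Where(value => value >= threshold);

threshold = 3;    // seen by the query, because the variable itself is captured
numbers = null;   // not seen by the query, which already holds the array reference

var result = query.ToList();

Console.WriteLine(string.Join(", ", result));   // 3, 5, 7, 9
```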

Extension, part 1: Modification of the closure during the enumeration

You can take your example one step further when you replace the line...

var result = query.ToList();

...with:

List<int> result = new List<int>();
foreach(int value in query) {
    threshold = 8;
    result.Add(value);
}

What you are doing here is changing the value of threshold while the array is being iterated. When you hit the body of the loop the first time (when value is 3), you change the threshold to 8, which means the values 5 and 7 will be skipped and the next value added to the list is 9. The reason is that threshold is evaluated again for each item, and whatever value it holds at that moment is used. Since the threshold has changed to 8, the numbers 5 and 7 no longer count as greater than or equal to it.
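A self-contained version of that loop, under the same assumption about the data (the array 1, 3, 5, 7, 9 with threshold at 3 when enumeration starts), written as top-level statements (C# 9 or later):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] numbers = { 1, 3, 5, 7, 9 };   // assumed sample data
int threshold = 3;                    // assumed value when enumeration starts

var query = numbers.Where(value => value >= threshold);

var result = new List<int>();
foreach (int value in query)
{
    // Changing the captured variable affects every item that is tested afterwards.
    threshold = 8;
    result.Add(value);
}

Console.WriteLine(string.Join(", ", result));   // 3, 9
```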

Extension, part 2: Entity Framework is different

To make things more complicated, when you use LINQ providers that create a different query from your original and then execute it, things are slightly different. The most common examples are Entity Framework (EF) and LINQ to SQL (now largely superseded by EF). These providers create an SQL query from the original query before the enumeration. Since this time the value of the captured variable is evaluated only once (it actually is not a closure in the same sense, because the compiler generates an expression tree and not an anonymous method), changes to threshold during the enumeration have no effect on the result. These changes happen after the query is submitted to the database.

The lesson from this is that you always have to be aware of which flavor of LINQ you are using, and that some understanding of its inner workings is an advantage.

How to tell if an IEnumerable<T> is subject to deferred execution?

Deferred execution of LINQ has trapped a lot of people; you're not alone.

The approach I've taken to avoiding this problem is as follows:

Parameters to methods - use IEnumerable<T> unless there's a need for a more specific interface.

Local variables - these are usually declared at the point where I create the LINQ query, so I'll know whether lazy evaluation is possible.

Class members - never use IEnumerable<T>, always use List<T>. And always make them private.

Properties - use IEnumerable<T>, and convert for storage in the setter.

public IEnumerable<Person> People 
{
    get { return people; }
    set { people = value.ToList(); }
}
private List<Person> people;

While there are theoretical cases where this approach wouldn't work, I've not run into one yet, and I've been enthusiastically using the LINQ extension methods since late Beta.
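To see what the setter buys you, here is a minimal sketch of the property pattern in use. Team, Person and the sample names are hypothetical types and data made up for the illustration; the snippet uses top-level statements (C# 9 or later), so the type declarations come last.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var names = new List<string> { "Ann", "Bob" };
var team = new Team();

// Assign a lazy query; the setter takes a snapshot of it with ToList().
team.People = names.Where(n => n.Length == 3).Select(n => new Person(n));

// Later changes to the source do not leak into the stored list.
names.Add("Cyd");

int storedCount = team.People.Count();
Console.WriteLine(storedCount);   // 2

// Hypothetical types, for illustration only.
public class Person
{
    public Person(string name) { Name = name; }
    public string Name { get; }
}

public class Team
{
    private List<Person> people = new List<Person>();

    public IEnumerable<Person> People
    {
        get { return people; }
        set { people = value.ToList(); }
    }
}
```

Without the ToList() in the setter, the property would hold the query itself and the count would become 3 after the source list changes.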

BTW: I'm curious why you use ToArray(); instead of ToList(); - to me, lists have a much nicer API, and there's (almost) no performance cost.

Update: A couple of commenters have rightly pointed out that arrays have a theoretical performance advantage, so I've amended my statement above to "... there's (almost) no performance cost."

Update 2: I wrote some code to do some micro-benchmarking of the difference in performance between Arrays and Lists. On my laptop, and in my specific benchmark, the difference is around 5ns (that's nanoseconds) per access. I guess there are cases where saving 5ns per loop would be worthwhile ... but I've never come across one. I had to hike my test up to 100 million iterations before the runtime became long enough to accurately measure.

"If result of the LINQ has an interface type it means it has a deferred execution"

Not correct, even the array implements IEnumerable<int>. You don't know from the type if it's using deferred execution since there is no IDeferred interface.

I think the best you can do is to try to cast it to ICollection<T> or ICollection:

using System;
using System.Collections;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Heuristic: materialized collections implement ICollection<T> or ICollection;
    // anything else is assumed to be lazily evaluated.
    public static bool IsDeferred<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (source is ICollection<T>) return false;
        if (source is ICollection) return false;
        return true;
    }
}

var arr = new int[5];
bool deferred = arr.IsDeferred(); // false
IEnumerable<int> seq = arr.Where(i => i != 0);
deferred = seq.IsDeferred();      // true
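Keep in mind that this check is a heuristic, not a guarantee. The sketch below applies the same test as IsDeferred, rewritten as a local function (LooksDeferred is a made-up name) so the snippet runs on its own as top-level statements (C# 9 or later). It shows one case where the answer is misleading: a string is fully in memory, but it implements neither collection interface.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Same check as IsDeferred, as a local function so the snippet is self-contained.
bool LooksDeferred<T>(IEnumerable<T> source) =>
    !(source is ICollection<T>) && !(source is ICollection);

bool forList = LooksDeferred(new List<int> { 1, 2 });
bool forQuery = LooksDeferred(new[] { 1, 2 }.Where(i => i > 1));
bool forString = LooksDeferred("abc");   // a string is an IEnumerable<char>

Console.WriteLine(forList);     // False
Console.WriteLine(forQuery);    // True
Console.WriteLine(forString);   // True, although nothing about a string is deferred
```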
