What's the Hardest or Most Misunderstood Aspect of LINQ

What's the hardest or most misunderstood aspect of LINQ?

Deferred (delayed) execution

Understanding LINQ thoroughly

Read Jon Skeet's Edulinq series.

Note: it is long.

Reason not to use LINQ

Personally, I can't see why you wouldn't want to use it; it makes code much more readable.

That said, LINQ can hurt performance in certain scenarios, and that would really be my only reason for not using it.

What is the Big O of LINQ's Where()?

Calling Where() is O(1); it only builds the query and doesn't actually do any work.

Looping through the collection returned by Where() is O(n).

The O(n) that you're seeing is the result of ToList(), which is O(n).

If you pass a Where() query to an O(n²) algorithm, you will see the callback execute n² times (assuming the algorithm doesn't cache anywhere).

This is called deferred execution.

This is true about most if not all LINQ providers; it wouldn't make sense for a LINQ provider to eagerly execute all calls.
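
One way to see this for yourself is to count how often the predicate runs. The following is only a minimal sketch for LINQ to Objects; the class and method names (WhereCostDemo, CountPredicateCalls) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class WhereCostDemo
{
    // Returns how many times the predicate had run after building the query,
    // after one enumeration, and after a second enumeration.
    public static (int afterBuild, int afterFirst, int afterSecond) CountPredicateCalls(int n)
    {
        var source = Enumerable.Range(1, n).ToList();
        int calls = 0;

        // Building the query does not invoke the predicate at all.
        IEnumerable<int> evens = source.Where(x => { calls++; return x % 2 == 0; });
        int afterBuild = calls;

        // Enumerating (here via ToList) runs the predicate once per element.
        evens.ToList();
        int afterFirst = calls;

        // Enumerating again re-runs the whole query; nothing was cached.
        evens.ToList();
        int afterSecond = calls;

        return (afterBuild, afterFirst, afterSecond);
    }

    static void Main()
    {
        var (build, first, second) = CountPredicateCalls(1000);
        Console.WriteLine($"{build} {first} {second}"); // prints "0 1000 2000"
    }
}
```

The second enumeration doubling the count is the reason an algorithm that walks the query n times ends up calling the callback n² times.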


In the case of LINQ to objects, this assumes that the source collection's enumerator is O(n).

If you're using some strange collection which iterates in worse than O(n) (in other words, if its MoveNext() is worse than O(1)), Where() will be bounded by that.

To be more precise, the time complexity of enumerating a Where() query is the same as the time complexity of the original enumeration.

Similarly, I'm assuming that the callback is O(1).

If it isn't, you'll need to multiply the complexity of the callback by the complexity of the original enumeration.
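
To make the multiplication concrete, here is a rough sketch with a deliberately slow callback: a linear scan over a second list of m items, used as the predicate for an n-element source. The names (CallbackCostDemo, CountSteps, isInLookup) are hypothetical, and the step counter exists only for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CallbackCostDemo
{
    // Counts the loop steps performed by an O(m) predicate over an n-element source.
    public static long CountSteps(int n, int m)
    {
        var source = Enumerable.Range(0, n).ToList();   // values 0 .. n-1
        var lookup = Enumerable.Range(-m, m).ToList();  // values -m .. -1, so nothing ever matches
        long steps = 0;

        // Hypothetical O(m) callback: scans 'lookup' from start to end for each element.
        Func<int, bool> isInLookup = value =>
        {
            foreach (var item in lookup)
            {
                steps++;
                if (item == value) return true;
            }
            return false;
        };

        // Enumerating the query calls the predicate n times; each call takes m steps.
        source.Where(isInLookup).Count();
        return steps;
    }

    static void Main()
    {
        Console.WriteLine(CountSteps(100, 50)); // prints "5000", i.e. n * m
    }
}
```

Swapping the list for a HashSet<int> would bring the callback back to roughly O(1) per call, and the whole enumeration back to O(n).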

Why and When to use LINQ?

Just to clarify, LINQ as a concept and LINQ to SQL are different things.

LINQ is a query syntax, not a language or an O/RM. You can build an O/RM on top of the syntax provided by LINQ.

Since I gather that your question is really about when to use LINQ to SQL, I'll just address that.

LINQ to SQL is best used when you are:

  • Only ever targeting Microsoft SQL Server 2000 or later
  • Doing rapid application development (RAD)

I've used LINQ to SQL on a couple of commercial products and quite a few of my own products and found these benefits:

  • Familiar language to code in (C# / VB.NET)
  • Easier to maintain (we have more .NET than SQL gurus on staff)
  • SQL generated is well structured and efficient
  • Allows direct translation of business rules to SQL while still keeping all business logic in a single project

As for LINQ as a concept, I use it all the time, because I understand what it can and can't do and how to use it properly. Like any language feature, it can easily be misused if people don't understand what it is and how to use it. I recommend the following blogs to get some of the concepts of LINQ down:

  • Bart De Smet - advanced
  • Charlie Calvert
  • Wriju

How does the following LINQ statement work?

The output is 2,4,6,8 because of deferred execution.

The query is actually executed when the query variable is iterated over, not when the query variable is created. This is called deferred execution.

-- Suprotim Agarwal, "Deferred vs Immediate Query Execution in LINQ"

The alternative is immediate query execution, which is useful for caching query results. From Suprotim Agarwal again:

To force immediate execution of a query that does not produce a singleton value, you can call the ToList(), ToDictionary(), ToArray(), Count(), Average() or Max() method on a query or query variable. These are called conversion operators which allow you to make a copy/snapshot of the result and access it as many times as you want, without the need to re-execute the query.

If you want the output to be 2,4,6, use .ToList():

var list = new List<int> { 1, 2, 4, 5, 6 };
var even = list.Where(m => m % 2 == 0).ToList();
list.Add(8);
foreach (var i in even)
{
    Console.WriteLine(i);
}
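
To put the two behaviours side by side, here is a small self-contained sketch. The query without ToList() is my assumption of what the original question looked like; the class name DeferredVsImmediate is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredVsImmediate
{
    // Returns the results of the deferred query and of the ToList() snapshot,
    // both read after the source list has been modified.
    public static (string deferred, string snapshot) Run()
    {
        var list = new List<int> { 1, 2, 4, 5, 6 };

        // Deferred: only describes the query; runs when it is enumerated.
        IEnumerable<int> deferred = list.Where(m => m % 2 == 0);

        // Immediate: ToList() runs the query now and stores a copy of the results.
        List<int> snapshot = list.Where(m => m % 2 == 0).ToList();

        list.Add(8); // happens after both variables were assigned

        return (string.Join(",", deferred), string.Join(",", snapshot));
    }

    static void Main()
    {
        var (deferred, snapshot) = Run();
        Console.WriteLine(deferred); // prints "2,4,6,8"
        Console.WriteLine(snapshot); // prints "2,4,6"
    }
}
```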

Slow LINQ query for .ToArray()

"Up to here, this query runs in no time."

Up to here, it hasn't actually done anything except build a deferred-execution model that represents the pending query. It doesn't start iterating until something calls MoveNext() on the iterator, for example a foreach loop or, in your case, .ToArray().

So: it takes time because it is doing work.

Consider:

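using System;
using System.Collections.Generic;

static class Program
{
    // Iterator method: none of the body runs until the caller starts enumerating.
    static IEnumerable<int> GetData()
    {
        Console.WriteLine("a");
        yield return 0;
        Console.WriteLine("b");
        yield return 1;
        Console.WriteLine("c");
        yield return 2;
        Console.WriteLine("d");
    }

    static void Main()
    {
        Console.WriteLine("start");
        var data = GetData(); // only creates the iterator; prints nothing yet
        Console.WriteLine("got data");
        foreach (var item in data)
            Console.WriteLine(item);
        Console.WriteLine("end");
    }
}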

This outputs:

start
got data
a
0
b
1
c
2
d
end

Note how the work doesn't all happen at once: it is both deferred (a comes after got data) and spooling, that is, streamed one item at a time (we don't get a,...,d followed by 0,...,2).
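
A side effect of this item-by-item behaviour is that work nobody asks for is never done. As a rough sketch, the same iterator can write to a list instead of the console (the log parameter and the StreamingDemo and LogAfterFirst names are my own additions), and the caller can stop after the first item:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class StreamingDemo
{
    // Same shape as GetData() above, but records progress in 'log' instead of printing.
    static IEnumerable<int> GetData(List<string> log)
    {
        log.Add("a");
        yield return 0;
        log.Add("b");
        yield return 1;
        log.Add("c");
        yield return 2;
        log.Add("d");
    }

    // Pulls only the first item and reports it together with the log so far.
    public static string LogAfterFirst()
    {
        var log = new List<string>();
        int first = GetData(log).First(); // one MoveNext(), then the iterator is disposed
        return $"{first}:{string.Join(",", log)}";
    }

    static void Main()
    {
        Console.WriteLine(LogAfterFirst()); // prints "0:a"
    }
}
```

The steps that would have logged b, c and d never run, because First() stops pulling after one item.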


Related: this is roughly how Distinct() works, from comments:

public static IEnumerable<T> Distinct<T>(this IEnumerable<T> source) {
    var seen = new HashSet<T>();
    foreach(var item in source) {
        if(seen.Add(item)) yield return item;
    }
}

...

and a new Join operation:

public static string Join(this IEnumerable<string> source, string separator) {
    using(var iter = source.GetEnumerator()) {
        if(!iter.MoveNext()) return "";
        var sb = new StringBuilder(iter.Current);
        while(iter.MoveNext())
            sb.Append(separator).Append(iter.Current);
        return sb.ToString();
    }
}

and use:

string s = d.Join(",");
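
For completeness, a self-contained sketch of that call might look like the following. The contents of d are made-up sample data (the original doesn't show them), and the class names are hypothetical. The built-in string.Join overload that takes an IEnumerable<string> gives the same result, so the extension is mostly a convenience for method-chaining style.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class StringSequenceExtensions
{
    // The Join extension from above, wrapped in a static class so it compiles.
    public static string Join(this IEnumerable<string> source, string separator)
    {
        using (var iter = source.GetEnumerator())
        {
            if (!iter.MoveNext()) return ""; // empty sequence gives an empty string
            var sb = new StringBuilder(iter.Current);
            while (iter.MoveNext())
                sb.Append(separator).Append(iter.Current);
            return sb.ToString();
        }
    }
}

static class JoinDemo
{
    static void Main()
    {
        // Hypothetical sample data standing in for 'd'.
        IEnumerable<string> d = new[] { "a", "b", "c" };

        Console.WriteLine(d.Join(","));         // prints "a,b,c"
        Console.WriteLine(string.Join(",", d)); // built-in equivalent, prints "a,b,c"
    }
}
```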

