Why Is Linq .Where(Predicate).First() Faster Than .First(Predicate)

Why is LINQ .Where(predicate).First() faster than .First(predicate)?

I got the same results: where+first was quicker than first.

As Jon noted, Linq uses lazy evaluation so the performance should be (and is) broadly similar for both methods.

Looking in Reflector, First uses a simple foreach loop to iterate through the collection but Where has a variety of iterators specialised for different collection types (arrays, lists, etc.). Presumably this is what gives Where the small advantage.

Difference between using .First() and .Where().First()

It depends. If the LINQ is translating to a piece of SQL then it depends on how that translation is handled. If you are using LINQ to objects (e.g. you are looking at an existing in memory array) then, while the end result in the same, the performance is markedly different. I ran some bench marks and was actually surprised at the result. I would have assumed that array.First() would be more efficient than array.Where(...).First(), but I found it to be the other way around.

I created a test, and to see how long it would take to traverse the array, I put the search item the last on the array. I did 200 tests of each, each test consisting of 1000 iterations. The average result, in Ticks, was:

First()         = 2655969
Where().First() = 1455211

As you can see Where().First() takes roughly about half the time of First() alone.

My benchmarking application is as follows:

class Program
{
private const int internalIterations = 1000;
private const int externalIterations = 100;
private const int dataSize = 100000;
private const int search = dataSize - 1;

private static readonly long[] resultsFirst = new long[externalIterations*2];
private static readonly long[] resultsWhereFirst = new long[externalIterations*2];
private static readonly int[] data = Enumerable.Range(0, dataSize).ToArray();

static void Main(string[] args)
{
Stopwatch sw = new Stopwatch();
for (int i = 0; i < externalIterations; i++)
{
Console.WriteLine("Iteration {0} of {1}", i+1, externalIterations);
sw.Restart();
First();
sw.Stop();
resultsFirst[i*2] = sw.ElapsedTicks;
Console.WriteLine(" First : {0}", sw.ElapsedTicks);

sw.Restart();
WhereFirst();
sw.Stop();
resultsWhereFirst[i*2] = sw.ElapsedTicks;
Console.WriteLine("WhereFirst : {0}", sw.ElapsedTicks);

sw.Restart();
WhereFirst();
sw.Stop();
resultsWhereFirst[(i*2)+1] = sw.ElapsedTicks;
Console.WriteLine("WhereFirst : {0}", sw.ElapsedTicks);

sw.Restart();
First();
sw.Stop();
resultsFirst[(i*2)+1] = sw.ElapsedTicks;
Console.WriteLine(" First : {0}", sw.ElapsedTicks);
}

Console.WriteLine("Done!");
Console.WriteLine("Averages:");
Console.WriteLine(" First Average: {0:0.00}", resultsFirst.Average());
Console.WriteLine("WhereFirst Average: {0:0.00}", resultsWhereFirst.Average());

}

private static void WhereFirst()
{
for (int i = 0; i < internalIterations; i++)
{
int item = data.Where(d => d == search).First();
}
}

private static void First()
{
for (int i = 0; i < internalIterations; i++)
{
int item = data.First(d => d == search);
}
}
}

Update

I tried using a List instead of an array as the source of the data and found that it was slower.

The data creation line looks like this:

private static readonly List<int> data = Enumerable.Range(0, dataSize).ToList();

And the end result was:

First()         = 3222609
Where().First() = 2124652

Which is faster: Single(predicate) or Where(predicate).Single()

LINQ-to-Objects

Nothing answers a question like this like a benchmark:

(Updated)

class Program
{
const int N = 10000;
volatile private static int s_val;

static void DoTest(IEnumerable<int> data, int[] selectors) {
Stopwatch s;

// Using .Single(predicate)
s = Stopwatch.StartNew();
foreach (var t in selectors) {
s_val = data.Single(x => x == t);
}
s.Stop();
Console.WriteLine(" {0} calls to Single(predicate) took {1} ms.",
selectors.Length, s.ElapsedMilliseconds);

// Using .Where(predicate).Single()
s = Stopwatch.StartNew();
foreach (int t in selectors) {
s_val = data.Where(x => x == t).Single();
}
s.Stop();
Console.WriteLine(" {0} calls to Where(predicate).Single() took {1} ms.",
selectors.Length, s.ElapsedMilliseconds);
}

public static void Main(string[] args) {
var R = new Random();
var selectors = Enumerable.Range(0, N).Select(_ => R.Next(0, N)).ToArray();

Console.WriteLine("Using IEnumerable<int> (Enumerable.Range())");
DoTest(Enumerable.Range(0, 10 * N), selectors);

Console.WriteLine("Using int[]");
DoTest(Enumerable.Range(0, 10*N).ToArray(), selectors);

Console.WriteLine("Using List<int>");
DoTest(Enumerable.Range(0, 10 * N).ToList(), selectors);

Console.ReadKey();
}
}

Somewhat shockingly, .Where(predicate).Single() wins by a factor of about two. I even ran both cases twice to make sure caching, etc. was not a factor.

1) 10000 calls to Single(predicate) took 7938 ms.
1) 10000 calls to Where(predicate).Single() took 3795 ms.
2) 10000 calls to Single(predicate) took 8132 ms.
2) 10000 calls to Where(predicate).Single() took 4318 ms.

Updated Results:

Using IEnumerable<int>  (Enumerable.Range())
10000 calls to Single(predicate) took 7838 ms.
10000 calls to Where(predicate).Single() took 8104 ms.
Using int[]
10000 calls to Single(predicate) took 8859 ms.
10000 calls to Where(predicate).Single() took 2970 ms.
Using List<int>
10000 calls to Single(predicate) took 9523 ms.
10000 calls to Where(predicate).Single() took 3781 ms.

using LINQ where(lamdaexp).First and First(lambdaexp)

(I assume that you are using Linq to .Net)
Firstly let's look at their source codes:

Here is Where():

public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
if (source == null)
throw Error.ArgumentNull("source");
if (predicate == null)
throw Error.ArgumentNull("predicate");
if (source is Iterator<TSource>)
return ((Iterator<TSource>)source).Where(predicate);
if (source is TSource[])
return new WhereArrayIterator<TSource>((TSource[])source, predicate);
if (source is List<TSource>)
return new WhereListIterator<TSource>((List<TSource>)source, predicate);
return new WhereEnumerableIterator<TSource>(source, predicate);
}

And here is First()

   public static TSource First<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
if (source == null) throw Error.ArgumentNull("source");
if (predicate == null) throw Error.ArgumentNull("predicate");
foreach (TSource element in source) {
if (predicate(element)) return element;
}
throw Error.NoMatch();
}

And let's find what does each code do:

  1. products.First(p => p.ProductID == 12);

Based on source code of the First() we can say, First() will iterate over collection, and will stop iteration when it finds first item in a collection which mets the criteria.


  1. products.Where(p => p.ProductID == 12).First();

First it will create iterator in Where method where elements meet the criteria. Then, again it will add get the first elemnt to that iterator. And, again will returm the first element as soon as it finds it.

As an additional note, LINQ uses deferred execution in some methods. And it has some relationship with the result of your question.

Deferred execution is a pattern of the execution model by which the
CLR ensures a value will be extracted only when it is required from
the IEnumerable-based information source. When any Linq operator
uses the deferred execution, the CLR encapsulates the related
information, such as the original sequence, predicate, or selector (if
any), into an iterator, which will be used when the information is
extracted from the original sequence using ToList method or
ForEachmethod or manually using the underlying GetEnumerator and
MoveNext methods in C#.

And, the question is. Which one is faster?


The main point is that, Where().First is quicker than First. in case of List and Array. Otherwise, First() will be faster. Here is the detailed exlpanation in @Akash Kava answer.

Let's pay an attention to Where() implementation. It will return WhereListIterator() if your collection is List, but First() will just iterate over source.
And in my opinion they have made some speed up in the implementation of WhereListIterator. And after this we are calling First() method which takes no predicate as input and only will iterate on filtered collection.

Also, as I understand, The Where iterator avoids indirect virtual table call, but calls iterator methods directly. And, this is the reason of this speed-up.

Why is this function faster and why are multiple enumerations of it faster than the first?

You're not materializing the results of your query until after you print out the stopwatch's elapsed time. LINQ queries use deferred execution to avoid actually executing the query until they are enumerated. In the case of your second method you're calling Count before building the query. Count needs to actually enumerate the entire result set to compute it's value. This means that your second method needs to iterate the sequence each time, whereas the first query is able to successfully defer its work until after you display the time. I would expect it to have more work to do, in many situations, it just succeeds at waiting until after you're done timing it.

As for why the first is faster when called multiple times, that pretty much just comes down to the fact that the JIT warmup that takes place when executing any code. The second method is does get that speedup, but since it doesn't get to omit the iteration of the query each time (which is a large portion of its cost) the percent speedup is much smaller.

Note that your second query iterates the source sequence twice (unless the enumerable happens to implement ICollection). This means that if the object is something that can be efficient iterated multiple times, it's not a problem. If it implements IList it will in fact be much faster, but if it is something like say an IQueryable that needs to execute an expensive query against a database each time its iterated it will need to do that twice, not once. If it's a query that doesn't even have the same contents when iterated multiple times then that can cause all sorts of problems.

Why first linq act 20 times faster than second one [Getting Max value]

The second has to compute the Max() for every single item in your collection. This effectively makes the brd.MohreHa.First call quadratic, since it's going to be checking against every item once for each item.

The first option only does the Count() call each time, and then does a single ordering at the end. This avoids the need to enumerate for Max() N times.

Linq query - Tidy Up/Optimization

You can call it without where:

ListOfMyClass.OrderByDescending(x => x.DateCreated).First(x => !x.Deleted);

First accepts predicates also. It really does not matter if you write Where and First in this case or just First it's evaluated to the same query at all :) But First looks cleaner.

As Jeroen van Langen said Where is slightly faster than First. So I assume that using Where.OrderByDescending.First is the best solution so far.



Related Topics



Leave a reply



Submit