Order of Linq Extension Methods Does Not Affect Performance

I would have guessed that it would be slower to prepend Where, since it must find all matching items before taking the first, whereas a preceding FirstOrDefault could yield the first item found. Can somebody explain why I'm on the wrong track?

You are on the wrong track because your first statement is simply incorrect. Where is not required to find all matching items before fetching the first matching item. Where fetches matching items "on demand"; if you only ask for the first one, it only fetches the first one. If you only ask for the first two, it only fetches the first two.

Jon Skeet does a nice bit on stage. Imagine you have three people. The first person has a shuffled pack of cards. The second person has a t-shirt that says "where card is red". The third person pokes the second person and says "give me the first card". The second person pokes the first person over and over again until the first person hands over a red card, which the second person then hands to the third person. The second person has no reason to keep poking the first person; the task is done!

Now, if the second person's t-shirt says "order by rank ascending" then we have a very different situation. Now the second person really does need to get every card from the first person, in order to find the lowest card in the deck, before handing the first card to the third person.

This should now give you the necessary intuition to tell when order does matter for performance reasons. The net result of "give me the red cards and then sort them" is exactly the same as "sort all the cards then give me the red ones", but the former is much faster because you do not have to spend any time sorting the black cards that you are going to discard.
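The card analogy can be sketched in a few lines (a minimal console example; the counter and names are mine): counting how many elements Where actually examines shows that First stops the pipeline at the first match.

```csharp
// Counts how many elements Where examines before First returns.
using System;
using System.Linq;

class LazyWhereDemo
{
    static void Main()
    {
        int examined = 0;
        var numbers = Enumerable.Range(1, 1000);

        // "Poke the dealer until a red card appears": Where yields matches
        // on demand, so First stops asking after the first match.
        var first = numbers
            .Where(n => { examined++; return n % 10 == 0; })
            .First();

        Console.WriteLine($"first = {first}, examined = {examined}");
        // first = 10, examined = 10 -- not 1000.
    }
}
```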

C# LINQ performance when extension method called inside where clause

I re-created the scenarios you talked about in your question. I tried the following code and got the output shown below.

This is how you can debug it. The second version is faster because it calls ConvertToString(searchId) exactly once, while the first version re-evaluates it inside the lambda for every item that Any examines.

static List<string> itemCollection = new List<string>();

static void Main(string[] args)
{
    for (int i = 0; i < 10000000; i++)
    {
        itemCollection.Add(i.ToString());
    }

    var watch = new Stopwatch();
    watch.Start();

    Console.WriteLine(CheckIdExists(580748));
    watch.Stop();
    Console.WriteLine($"Took {watch.ElapsedMilliseconds}");

    var watch1 = new Stopwatch();
    watch1.Start();

    Console.WriteLine(CheckIdExists1(580748));
    watch1.Stop();
    Console.WriteLine($"Took {watch1.ElapsedMilliseconds}");

    Console.ReadLine();
}

public static bool CheckIdExists(int searchId)
{
    // ConvertToString runs once for every item Any examines.
    return itemCollection.Any(item => item.Equals(ConvertToString(searchId)));
}

public static bool CheckIdExists1(int searchId)
{
    // ConvertToString runs exactly once, outside the lambda.
    string sId = ConvertToString(searchId);

    return itemCollection.Any(item => item.Equals(sId));
}

public static string ConvertToString(int input)
{
    return Convert.ToString(input, CultureInfo.InvariantCulture);
}

OUTPUT:

True
Took 170
True
Took 11

Linq IEnumerable Extension Method - How to improve performance?

I believe this solution will provide the best performance and will scale better as the sequences get larger, because it doesn't allocate any additional buffers (lists or queues), doesn't have to convert the result to a list or count the result buffer, and only iterates over the sequence once.

public static IEnumerable<IEnumerable<T>> FindSequenceConsecutive<T>(this IEnumerable<T> sequence,
    Predicate<T> predicate, int sequenceSize)
{
    IEnumerable<T> window = Enumerable.Repeat(default(T), 0);

    int count = 0;

    foreach (var item in sequence)
    {
        if (predicate(item))
        {
            window = window.Concat(Enumerable.Repeat(item, 1));
            count++;

            if (count == sequenceSize)
            {
                yield return window;
                window = window.Skip(1);
                count--;
            }
        }
        else
        {
            count = 0;
            window = Enumerable.Repeat(default(T), 0);
        }
    }
}
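A usage sketch for the extension (the class names `SequenceExtensions` and `Demo` are mine; the extension body is copied unchanged so the example is self-contained): finding runs of three consecutive even numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SequenceExtensions
{
    // The extension from the answer above, unchanged.
    public static IEnumerable<IEnumerable<T>> FindSequenceConsecutive<T>(this IEnumerable<T> sequence,
        Predicate<T> predicate, int sequenceSize)
    {
        IEnumerable<T> window = Enumerable.Repeat(default(T), 0);
        int count = 0;

        foreach (var item in sequence)
        {
            if (predicate(item))
            {
                window = window.Concat(Enumerable.Repeat(item, 1));
                count++;
                if (count == sequenceSize)
                {
                    yield return window;
                    window = window.Skip(1);
                    count--;
                }
            }
            else
            {
                count = 0;
                window = Enumerable.Repeat(default(T), 0);
            }
        }
    }
}

class Demo
{
    static void Main()
    {
        var data = new[] { 2, 4, 6, 1, 8, 10, 12, 14 };

        // Overlapping runs are reported: the window slides by one after each hit.
        foreach (var run in data.FindSequenceConsecutive(n => n % 2 == 0, 3))
        {
            Console.WriteLine(string.Join(",", run));
        }
        // 2,4,6
        // 8,10,12
        // 10,12,14
    }
}
```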

Different LINQ command execution and performance

If you use LINQ-to-Entities (or whatever database provider), there might be no difference at all, because the database will take care of the optimization. But in general, and especially with LINQ-to-Objects, the latter is better, because it is more efficient to order fewer items.

Also, Guru is right with his comment that the result could be different in theory. Enumerable.Where does not guarantee the order in the documentation, but they won't change it, because doing so would break a lot of code (it's mentioned here that the order is stable if the input sequence was ordered).
Edit: Actually, some providers already change that order arbitrarily (for example PLINQ).

I asked a similar question a long time ago whose answer you might find helpful:
Order of LINQ extension methods does not affect performance? Note that it's just about LINQ-to-Objects.

How does OrderBy extension method orders collection

LINQ believes in deferred execution. This means the expression will only be evaluated when you start iterating over or accessing the result.
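Deferred execution of OrderBy can be demonstrated in a few lines (a minimal console sketch; the names are mine): building the query does no work, and the sort even sees elements added to the source afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredOrderByDemo
{
    static void Main()
    {
        var items = new List<int> { 3, 1, 2 };

        var ordered = items.OrderBy(n => n); // nothing is sorted yet

        items.Add(0); // mutate the source *after* building the query

        // The sort runs here, when the result is iterated,
        // and it includes the element added above.
        Console.WriteLine(string.Join(",", ordered));
        // 0,1,2,3
    }
}
```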

LINQ c# efficiency

I can't speak much to the AsEnumerable() call or the field conversions, but for the LINQ side of things, the orderby is a stable quicksort and should be O(n log n). If I had to guess, everything but the orderby should be O(n), so overall you're still just O(n log n).

Update: the LINQ Distinct() call should also be O(n).

So altogether, the big-O for this thing is still O(Kn log n), where K is some constant.

Sort an array and return all number that are smaller then X

Why do you want to sort before you filter? That is less efficient than filtering first and ordering the rest:

var sorted = array.Where(i => i < x).OrderBy(i => i).ToArray(); 

or in query syntax:

var sorted = (from i in array
              where i < x
              orderby i
              select i).ToArray();

Order of LINQ extension methods does not affect performance? (the title is misleading; actually OrderBy is one of the exceptions where the order does matter, as E. Lippert explains in his answer)
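The "order fewer items" claim can be made visible by instrumenting the OrderBy key selector, which LINQ-to-Objects invokes once per element being sorted (a sketch; the counters and names are mine):

```csharp
using System;
using System.Linq;

class FilterThenSortDemo
{
    static void Main()
    {
        var array = Enumerable.Range(1, 1000).ToArray();
        int x = 10;

        // Filter first: only the 9 surviving elements reach the sort.
        int keyedFilterFirst = 0;
        var a = array.Where(i => i < x)
                     .OrderBy(i => { keyedFilterFirst++; return i; })
                     .ToArray();

        // Sort first: all 1000 elements must be keyed and sorted.
        int keyedSortFirst = 0;
        var b = array.OrderBy(i => { keyedSortFirst++; return i; })
                     .Where(i => i < x)
                     .ToArray();

        Console.WriteLine($"filter first keyed {keyedFilterFirst} elements");  // 9
        Console.WriteLine($"sort first keyed {keyedSortFirst} elements");      // 1000
    }
}
```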

How efficient are chained LINQ statements?

First off, yes, that is creating an iterator and does not actually do any iterating until you materialize the query in a foreach or by calling ToList on it. When you do, the number of iterations that occur is based on the underlying type. Reverse will create a buffer array for whatever source you give it and iterate over it backwards. If the source is an ICollection<T>, it will use its CopyTo method to populate the array, which should usually result in a single bulk copy of contiguous data in constant time. If it's not an ICollection<T>, it will iterate the source into the buffer and then iterate over that backwards. With that in mind, here's what happens for your specific query when you iterate.

First, the last Reverse will start iterating its source (which is not an ICollection<T>).

Then the Skip will start iterating its source.

Then the first Reverse will either do a CopyTo if its source is an ICollection<T>, or it will iterate the source into a buffer array that it resizes as needed.

Then the first Reverse will iterate over its buffer array backwards.

Then the Skip will take the results, skipping the first two and yielding the rest.

Then the last Reverse will take the results and add them to its own buffer array, resizing it as needed.

Finally, the last Reverse will iterate its buffer array backwards.

So if you're dealing with an ICollection<T>, that's one CopyTo, then one iteration of all of the values, and then one iteration of all but 2 of the values. If it's not an ICollection<T>, that's basically 3 iterations of the values (really, the last iteration covers all but 2). And either way, it also uses two intermediate arrays in the process.

And to prove that the query does no iterating until you materialize it, you can check out this example:

void Main()
{
    var query = MyValues().Reverse().Skip(2).Reverse();
    Console.WriteLine("After query before materialization");
    var results = query.ToList();
    Console.WriteLine(string.Join(",", results));
}

public IEnumerable<int> MyValues()
{
    for (int i = 0; i < 10; i++)
    {
        Console.WriteLine($"yielding {i}");
        yield return i;
    }
}

Which produces the output

After query before materialization
yielding 0
yielding 1
yielding 2
yielding 3
yielding 4
yielding 5
yielding 6
yielding 7
yielding 8
yielding 9
0,1,2,3,4,5,6,7

When compared to the other example you had, x.Take(x.Count() - 2), that one will iterate the source once for the Count before you materialize it (unless the source is an ICollection or ICollection<T>, in which case it will just use the Count property), and then it will iterate it again when you materialize it.

Here's the same example with the different code and the resulting output.

void Main()
{
    var x = MyValues();
    var query = x.Take(x.Count() - 2);
    Console.WriteLine("After query before materialization");
    var results = query.ToList();
    Console.WriteLine(string.Join(",", results));
}

public IEnumerable<int> MyValues()
{
    for (int i = 0; i < 10; i++)
    {
        Console.WriteLine($"yielding {i}");
        yield return i;
    }
}

With the output

yielding 0
yielding 1
yielding 2
yielding 3
yielding 4
yielding 5
yielding 6
yielding 7
yielding 8
yielding 9
After query before materialization
yielding 0
yielding 1
yielding 2
yielding 3
yielding 4
yielding 5
yielding 6
yielding 7
0,1,2,3,4,5,6,7

So which one is better depends completely on the source. For an ICollection<T> or ICollection, the Take and Count version would be preferred (unless the source might change between when the query is created and when it is materialized). If it's neither of those, you might prefer the double Reverse to avoid iterating the source twice (especially if the source can change between creating the query and materializing it, since the size might change as well), but that has to be weighed against the increase in total iterations and memory usage.


