Order of LINQ extension methods does not affect performance?
I would have guessed that it would be slower to prepend Where since it must find all matching items and then take the first and a preceded FirstOrDefault could yield the first found item. Can somebody explain why i'm on the wrong track?
You are on the wrong track because your first statement is simply incorrect. Where
is not required to find all matching items before fetching the first matching item. Where
fetches matching items "on demand"; if you only ask for the first one, it only fetches the first one. If you only ask for the first two, it only fetches the first two.
Jon Skeet does a nice bit on stage. Imagine you have three people. The first person has a shuffled pack of cards. The second person has a t-shirt that says "where card is red". The third person pokes the second person and says "give me the first card". The second person pokes the first person over and over again until the first person hands over a red card, which the second person then hands to the third person. The second person has no reason to keep poking the first person; the task is done!
Now, if the second person's t-shirt says "order by rank ascending" then we have a very different situation. Now the second person really does need to get every card from the first person, in order to find the lowest card in the deck, before handing the first card to the third person.
This should now give you the necessary intuition to tell when order does matter for performance reasons. The net result of "give me the red cards and then sort them" is exactly the same as "sort all the cards then give me the red ones", but the former is much faster because you do not have to spend any time sorting the black cards that you are going to discard.
C# LINQ performance when extension method called inside where clause
I re-generated the scenarios you talked about in your question. I tried following code and got this output.
But this is how you can debug this.
static List<string> itemCollection = new List<string>();
static void Main(string[] args)
{
for (int i = 0; i < 10000000; i++)
{
itemCollection.Add(i.ToString());
}
var watch = new Stopwatch();
watch.Start();
Console.WriteLine(CheckIdExists(580748));
watch.Stop();
Console.WriteLine($"Took {watch.ElapsedMilliseconds}");
var watch1 = new Stopwatch();
watch1.Start();
Console.WriteLine(CheckIdExists1(580748));
watch1.Stop();
Console.WriteLine($"Took {watch1.ElapsedMilliseconds}");
Console.ReadLine();
}
public static bool CheckIdExists(int searchId)
{
return itemCollection.Any(item => item.Equals(ConvertToString(searchId)));
}
public static bool CheckIdExists1(int searchId)
{
string sId =ConvertToString(searchId);
return itemCollection.Any(item => item.Equals(sId));
}
public static string ConvertToString(int input)
{
return Convert.ToString(input, CultureInfo.InvariantCulture);
}
OUTPUT:
True
Took 170
True
Took 11
Linq IEnumerable Extension Method - How to improve performance?
I believe this solution will provide the best performance and will scale better as the sequences get larger because it doesn't allocate any additional buffers (Lists or Queues), nor does it have to convert the result to a List or do any counts on the result buffer. Plus, it only iterates over the sequence once.
public static IEnumerable<IEnumerable<T>> FindSequenceConsecutive<T>(this IEnumerable<T> sequence,
Predicate<T> predicate, int sequenceSize)
{
IEnumerable<T> window = Enumerable.Repeat(default(T), 0);
int count = 0;
foreach (var item in sequence)
{
if (predicate(item))
{
window = window.Concat(Enumerable.Repeat(item, 1));
count++;
if (count == sequenceSize)
{
yield return window;
window = window.Skip(1);
count--;
}
}
else
{
count = 0;
window = Enumerable.Repeat(default(T), 0);
}
}
}
Different LINQ command execution and performance
If you use Linq-To-Entities(or whatever database provider), there might be no difference at all, because the database will take care of the optimization. But in general and especially with Linq-To-Objects the latter is better, because it is more efficient to order less.
Also Guru is right with his comment that the result could be different in theory. Enumerable.Where
does not guarantee the order in the documentation. But they won't change it because they would break a lot of code(it's mentioned here that the order ist stable if the input sequence was ordered).
Edit: Actually some providers already change that order arbitrarily(for example PLINQ).
I have asked a similar question long time ago which answer you might find helpful:
Order of LINQ extension methods does not affect performance? Note that it's just about Linq-To-Objects.
How does OrderBy extension method orders collection
LINQ believe in deferred execution. This means the expression will only be evaluated when you started iterating or accessing the result.
LINQ c# efficiency
I can't speak much to the AsEnumerable()
call or the field conversions, but for the LINQ side of things, the orderby
is a stable quick sort and should be O(n log n)
. If I had to guess, everything but the orderby
should be O(n)
, so overall you're still just O(n log n)
.
Update: the LINQ Distinct()
call should also be O(n)
.
So altogether, the Big-Oh for this thing is still O(Kn log n)
, where K is some constant.
Sort an array and return all number that are smaller then X
Why do you want to sort before you filter? That is less efficient than to filter first and order the rest:
var sorted = array.Where(i => i < x).OrderBy(i => i).ToArray();
or in query syntax:
var sorted = (from i in array
where i < x
orderby i
select i).ToArray();
Order of LINQ extension methods does not affect performance? (the title is misleading, actually OrderBy
is one of the exceptions where the order matters as E.Lippert explains in his answer)
How efficient are chained LINQ statements?
First off yes that is creating a "iterator" and does not actually do any iterating until you materialized the query in a foreach
or by calling ToList
on it. When you do that the number of iterations that occur are based on the underlying type. Reverse
will create a buffer array for whatever source you give it and iterate over it backwards. If the source is ICollection<T>
then it will use its CopyTo
method to populate the array which should usually result in a single bulk copy of contiguous data in constant time. If it's not a ICollection<T>
then it will iterate the source into the buffer and then iterate over it backwards. With that in mind here's what happens for your specific query when you iterate.
First the last Reverse
will start iterating its source (which is not an ICollection<T>
).
Then the Skip
will start iterating its source
Then the first Reverse will either do a CopyTo
if its source is a ICollection<T>
or it will iterate the source into a buffer array that it resizes as needed.
Then the first Reverse will iterate over its buffer array backwards
Then the Skip will take the results skipping the first two and yielding the rest
Then the last Reverse will take the result and add them to its buffer array and resize it as needed.
Finally the last Reverse will iterate the buffer array backwards.
So if you're dealing with a ICollecion<T>
that's one CopyTo
and then 1 iteration of all of the values and then 1 iteration of all but 2 of the values. If it's not a ICollection<T>
that's basically 3 iterations of the values (really the last iteration is of all but 2). And either way it's also using two intermediate arrays in the process.
And to prove that the query does no iterations until you materialize it you can check out this example
void Main()
{
var query = MyValues().Reverse().Skip(2).Reverse();
Console.WriteLine($"After query before materialization");
var results = query.ToList();
Console.WriteLine(string.Join(",", results));
}
public IEnumerable<int> MyValues()
{
for(int i = 0; i < 10; i ++)
{
Console.WriteLine($"yielding {i}");
yield return i;
}
}
Which produces the output
After query before materialization
yielding 0
yielding 1
yielding 2
yielding 3
yielding 4
yielding 5
yielding 6
yielding 7
yielding 8
yielding 9
0,1,2,3,4,5,6,7
When compared to the other example you had x.Take(x.Count() - 2)
, that will iterate the source before you materialize it once for the Count
(unless it's ICollection
or ICollection<T>
in which case it will just use the Count
property) then it will iterate it again when you materialize it.
Here's the same example with the different code and the resulting output.
void Main()
{
var x = MyValues();
var query = x.Take(x.Count() - 2);
Console.WriteLine($"After query before materialization");
var results = query.ToList();
Console.WriteLine(string.Join(",", results));
}
public IEnumerable<int> MyValues()
{
for(int i = 0; i < 10; i ++)
{
Console.WriteLine($"yielding {i}");
yield return i;
}
}
With the output
yielding 0
yielding 1
yielding 2
yielding 3
yielding 4
yielding 5
yielding 6
yielding 7
yielding 8
yielding 9
After query before materialization
yielding 0
yielding 1
yielding 2
yielding 3
yielding 4
yielding 5
yielding 6
yielding 7
0,1,2,3,4,5,6,7
So which one is better is completely dependent on the source. For a ICollection<T>
or ICollection
the Take
and Count
would be preferred (unless the source might change between when the query is created and when it is materialized), but if it's neither of those you might prefer the double Reverse
to avoid iterating the source twice (especially if the source can change between when you create the query and actually materialize it as the size might change as well), but that has to be weighted against the increase in total iterations done and memory usage.
Related Topics
Run Two Async Tasks in Parallel and Collect Results in .Net 4.5
Capture Stored Procedure Print Output in .Net
How to Assign a Base Class Object to a Derived Class Reference with an Explicit Typecast
How to Deserialize JSON into Ienumerable<Basetype> with Newtonsoft JSON.Net
C# Okay with Comparing Value Types to Null
Conditional Operator Assignment with Nullable<Value> Types
ASP.NET Core Web API Exception Handling
Change Current Linux User in a C# Application Running with Mono
Does Mono Support System.Drawing and System.Drawing.Printing
Mono Shared Library Under Linux Location
How to Create a Navigation Menu in Dotnet Application
How to Use Visual Studio Code to Develop Unity3D Projects in Ubuntu
Embedding JavaScript Engine into .Net
Difference Between Equals/Equals and == Operator
Most Elegant Way to Generate Prime Numbers
Show a Form Without Stealing Focus