Performance of Find() vs. FirstOrDefault()
I was able to reproduce your results, so I decompiled your program, and there is a difference between Find and FirstOrDefault.
First off, here is the decompiled program. I made your data object an anonymous type just so it would compile:
List<\u003C\u003Ef__AnonymousType0<string>> source = Enumerable.ToList(Enumerable.Select(Enumerable.Range(0, 1000000), i =>
{
var local_0 = new
{
Name = Guid.NewGuid().ToString()
};
return local_0;
}));
source.Insert(999000, new
{
Name = diana
});
stopwatch.Restart();
Enumerable.FirstOrDefault(source, c => c.Name == diana);
stopwatch.Stop();
Console.WriteLine("Diana was found in {0} ms with System.Linq.Enumerable.FirstOrDefault().", (object) stopwatch.ElapsedMilliseconds);
stopwatch.Restart();
source.Find(c => c.Name == diana);
stopwatch.Stop();
Console.WriteLine("Diana was found in {0} ms with System.Collections.Generic.List<T>.Find().", (object) stopwatch.ElapsedMilliseconds);
The key thing to notice here is that FirstOrDefault is called on Enumerable, whereas Find is called as a method on the source list.
So, what is Find doing? This is the decompiled Find method:
private T[] _items;
[__DynamicallyInvokable]
public T Find(Predicate<T> match)
{
if (match == null)
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
for (int index = 0; index < this._size; ++index)
{
if (match(this._items[index]))
return this._items[index];
}
return default (T);
}
So it's iterating over an array of items, which makes sense, since a list is a wrapper around an array.
However, FirstOrDefault, on the Enumerable class, uses foreach to iterate the items. This goes through the list's enumerator and its MoveNext calls. I think what you are seeing is the overhead of the enumerator:
[__DynamicallyInvokable]
public static TSource FirstOrDefault<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
if (source == null)
throw Error.ArgumentNull("source");
if (predicate == null)
throw Error.ArgumentNull("predicate");
foreach (TSource source1 in source)
{
if (predicate(source1))
return source1;
}
return default (TSource);
}
Foreach is just syntactic sugar over the enumerator pattern.
When I click on foreach in dotPeek to see what it's doing, it offers to take me to the enumerator's Current/MoveNext implementations, which makes sense.
Other than that, the two methods are basically the same: each tests the passed-in predicate to see whether an item is the one you want.
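To make the difference concrete, here is a minimal sketch of the three loop shapes discussed above. The class and method names (ForeachDemo, FirstViaForeach, FirstViaEnumerator, FirstViaIndex) are made up for illustration, and FirstViaIndex goes through the public indexer rather than the private _items array that the real Find uses, so it is only an approximation.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachDemo
{
    // Same shape as the decompiled FirstOrDefault: foreach over IEnumerable<T>.
    public static T FirstViaForeach<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
                return item;
        }
        return default(T);
    }

    // Roughly what the compiler generates for the foreach above:
    // get an enumerator, call MoveNext, read Current, dispose at the end.
    public static T FirstViaEnumerator<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            while (e.MoveNext())
            {
                T item = e.Current;
                if (predicate(item))
                    return item;
            }
        }
        return default(T);
    }

    // Same shape as the decompiled Find: a plain indexed loop
    // (via the public indexer here, not the private backing array).
    public static T FirstViaIndex<T>(List<T> source, Predicate<T> match)
    {
        for (int i = 0; i < source.Count; i++)
        {
            if (match(source[i]))
                return source[i];
        }
        return default(T);
    }

    public static void Main()
    {
        var names = new List<string> { "alice", "bob", "diana" };
        Console.WriteLine(FirstViaForeach(names, n => n == "diana"));
        Console.WriteLine(FirstViaEnumerator(names, n => n == "diana"));
        Console.WriteLine(FirstViaIndex(names, n => n == "diana"));
    }
}
```

All three return the same item; only the iteration mechanics differ, which is where the measured gap would come from.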
Should we always use .Find() rather than .FirstOrDefault() when we have the primary key in Entity Framework Core?
According to the reference source DbSet.Find will not access the database if an object with the same keyValues is already fetched in the DbContext:
/// Finds an entity with the given primary key values.
/// If an entity with the given primary key values exists in the context, then it is
/// returned immediately without making a request to the store.
public abstract object Find(params object[] keyValues);
FirstOrDefault and similar functions take the query provider from the IQueryable.Provider property and call IQueryProvider.Execute(Expression) to get the data defined by the Expression.
This will always access the database.
Suppose you have Schools with their Students, a simple one-to-many relationship. You also have procedures to change Student data:
Student ChangeAddress(SchoolDbContext dbContext, int studentId, Address address);
Student ChangeSchool(SchoolDbContext dbContext, int studentId, int schoolId);
You have these as procedures because they check the validity of the changes: Eton Students are probably not allowed to live on the Oxford Campus, and there might be Schools that only allow Students of a certain age.
You have the following code that uses these procedures:
void ChangeStudent(int studentId, Address address, int schoolId)
{
using (var dbContext = new SchoolDbContext())
{
ChangeAddress(dbContext, studentId, address);
ChangeSchool(dbContext, studentId, schoolId);
dbContext.SaveChanges();
}
}
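If the Change... functions used FirstOrDefault(), each of them would query the database again, even though the Student is already tracked by the dbContext. With Find, the second procedure gets the tracked Student, including the changes made by the first one, without another round trip.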
However, sometimes you want to re-fetch the database data, for instance because others might have changed it, or because some changes you just made are invalid:
Student student = dbContext.Students.Find(10);
// let user change student attributes
...
bool changesAccepted = AskIfChangesOk();
if (!changesAccepted)
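{ // Discard the changes and refetch the student.
// Can't use Find, because that would give the changed Student. A tracking query such as
// Where(...).FirstOrDefault() also returns the already tracked instance, so reload the entity instead.
dbContext.Entry(student).Reload();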
}
// now use the refetched Student with the original data
Find() vs. Where().FirstOrDefault()
Where is the Find method on IEnumerable<T>? (Rhetorical question.)
The Where and FirstOrDefault methods are applicable to multiple kinds of sequences, including List<T>, T[], Collection<T>, etc. Any sequence that implements IEnumerable<T> can use these methods. Find is available only on List<T>. Methods that are more generally applicable are more reusable and have a greater impact.
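As a small illustration of that point, the sketch below runs the same Where(...).FirstOrDefault() helper against an array, a List<T>, and a Collection<T>, while Find is only reachable on the list. The ApplicabilityDemo class and FirstEven helper are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

public static class ApplicabilityDemo
{
    // Accepts any IEnumerable<int>: arrays, lists, collections, iterators...
    public static int FirstEven(IEnumerable<int> numbers)
    {
        return numbers.Where(n => n % 2 == 0).FirstOrDefault();
    }

    public static void Main()
    {
        int[] array = { 1, 3, 4, 6 };
        var list = new List<int>(array);
        var collection = new Collection<int>(list);

        Console.WriteLine(FirstEven(array));       // works on T[]
        Console.WriteLine(FirstEven(list));        // works on List<T>
        Console.WriteLine(FirstEven(collection));  // works on Collection<T>

        // Find is an instance method of List<T>, so only the list offers it.
        Console.WriteLine(list.Find(n => n % 2 == 0));
    }
}
```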
I guess my next question would be why they added Find at all. That is a good tip. The only thing I can think of is that FirstOrDefault could return a default value other than null. Otherwise it just seems like a pointless addition.
Find on List<T> predates the other methods. List<T> was added with generics in .NET 2.0, and Find was part of the API for that class. Where and FirstOrDefault were added as extension methods for IEnumerable<T> with LINQ, which came in a later .NET version. I cannot say with certainty that Find would never have been added if LINQ had existed at the 2.0 release, but that is arguably the case for many other features from earlier .NET versions that were made obsolete or redundant by later ones.
EF Core: FirstOrDefault() vs SingleOrDefault() performance comparison
Can we assume that, in a scenario with only one existing result, SingleOrDefault() scans the whole table and is ~2x slower than FirstOrDefault()?
It depends, so generally no.
For unique indexed column criteria (like PK), it doesn't matter - the SQL optimizers are smart enough to use the information about value cardinality from the index definition.
For others it's similar to LINQ to Objects. In general (when there are no value distribution statistics), the database has to perform a full table scan, so the question is whether it can stop the search early. If no matching item exists, both operations are the same O(N). If an item exists, FirstOrDefault can stop early, while SingleOrDefault has to keep scanning to rule out a second match. Hence it depends on how close the matching item is to the "beginning" of the scan, and on whether a second matching item exists and how close it is. If only one matching value can exist, we could expect the latter method to be on average 2 times slower.
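The LINQ to Objects analogy can be checked by counting predicate evaluations. This is only a sketch of the in-memory case (the ScanCostDemo class and its helpers are hypothetical names); a database engine decides its own scan strategy, so it says nothing definitive about SQL execution plans.

```csharp
using System;
using System.Linq;

public static class ScanCostDemo
{
    // Counts how many elements FirstOrDefault inspects before returning.
    public static int CountFirstOrDefaultCalls(int[] data, int target)
    {
        int calls = 0;
        data.FirstOrDefault(x => { calls++; return x == target; });
        return calls;
    }

    // Counts how many elements SingleOrDefault inspects before returning.
    public static int CountSingleOrDefaultCalls(int[] data, int target)
    {
        int calls = 0;
        data.SingleOrDefault(x => { calls++; return x == target; });
        return calls;
    }

    public static void Main()
    {
        // 1000 distinct values, so exactly one element matches the target.
        int[] data = Enumerable.Range(0, 1000).ToArray();
        int target = 499;

        Console.WriteLine(CountFirstOrDefaultCalls(data, target));
        Console.WriteLine(CountSingleOrDefaultCalls(data, target));
    }
}
```

With a single match in the middle of the sequence, FirstOrDefault stops at the match (500 evaluations here), while SingleOrDefault has to inspect every element (1000) to confirm there is no duplicate, which is the "about 2 times slower on average" intuition.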
Is there any recommendation which method should be used? For instance, when we are querying by PK and are sure that there's only one result?
This is opinion-based, but I would say FirstOrDefault, and not only because of the performance. SingleOrDefault is a strange method in my opinion, since it serves two different purposes: finding a match and validating that there are no duplicates. The second part is usually ensured by the proper data structure in memory and by a unique constraint/index in the database.
Finally, since there are some comments/discussions about the Find method versus the aforementioned two methods when searching by PK: the main purpose of the Find method is that it first searches the local cache and returns the object from there if it exists. If it doesn't, guess what it executes internally: FirstOrDefault. This should answer what the EF Core team thinks is preferable.
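The "local cache first, query second" behaviour described above can be sketched without Entity Framework. TinyContext below is a hypothetical stand-in written for this example, not EF Core's implementation: the list plays the role of the database table and StoreQueries counts simulated round trips.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Student
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical miniature "context"; it only imitates the lookup order of Find.
public class TinyContext
{
    private readonly List<Student> _store;  // pretend database table
    private readonly Dictionary<int, Student> _tracked = new Dictionary<int, Student>();

    public int StoreQueries { get; private set; }

    public TinyContext(List<Student> store)
    {
        _store = store;
    }

    public Student Find(int id)
    {
        Student cached;
        if (_tracked.TryGetValue(id, out cached))
            return cached;                  // already tracked: no store access

        StoreQueries++;                     // simulated round trip
        Student row = _store.FirstOrDefault(s => s.Id == id);
        if (row != null)
            _tracked[id] = row;             // start tracking the fetched entity
        return row;
    }
}

public static class FindDemo
{
    public static void Main()
    {
        var store = new List<Student> { new Student { Id = 10, Name = "Diana" } };
        var context = new TinyContext(store);

        context.Find(10);   // first call goes to the store
        context.Find(10);   // second call is served from the tracked entities
        Console.WriteLine(context.StoreQueries);
    }
}
```

Two lookups of the same key cost one simulated query, which is the saving Find offers over calling FirstOrDefault each time.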
Performance of LINQ Any vs FirstOrDefault != null
The enumeration in Any() stops as soon as it finds a matching item as well:
https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.any
I would expect the performance to be very similar. Note that the FirstOrDefault version won't work with a collection of value types (since the default isn't null), but the Any version would.
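A short sketch of the value-type caveat, using int as the element type (the class and helper names are made up for this example): when nothing matches, FirstOrDefault returns default(int), which is 0 rather than null, so a null check cannot tell "not found" apart from a real result.

```csharp
using System;
using System.Linq;

public static class AnyVersusFirstOrDefault
{
    // Answers "is there a match?" directly, for any element type.
    public static bool HasMatchViaAny(int[] numbers, Func<int, bool> predicate)
    {
        return numbers.Any(predicate);
    }

    // Returns the first match, or default(int) == 0 when there is none.
    public static int FirstOrDefaultValue(int[] numbers, Func<int, bool> predicate)
    {
        return numbers.FirstOrDefault(predicate);
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3 };

        // No element is greater than 10.
        Console.WriteLine(HasMatchViaAny(numbers, n => n > 10));       // False
        Console.WriteLine(FirstOrDefaultValue(numbers, n => n > 10));  // 0, not null
    }
}
```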
LINQ: When to use SingleOrDefault vs. FirstOrDefault() with filtering criteria
Whenever you use SingleOrDefault, you clearly state that the query should produce at most a single result. On the other hand, when FirstOrDefault is used, the query can return any number of results, but you state that you only want the first one.
I personally find the semantics very different and using the appropriate one, depending on the expected results, improves readability.
FirstOrDefault() || Select() which is faster?
I'm 90% certain that
var test = dc.TableBlah.FirstOrDefault(_blah => _blah.id == 1);
sets up the exact same expression tree as
var test = (from blah in dc.TableBlah
where blah.id == 1
select blah).FirstOrDefault();
So your second example simply lacks the benefit of getting a single record by calling FirstOrDefault(). In terms of performance, they'll be identical.
Personally, I would use SingleOrDefault() instead, since you're looking for a single item. SingleOrDefault() will throw if you receive more than one record.
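The throwing behaviour is easy to see in LINQ to Objects. The sketch below uses a hypothetical Describe helper over a plain int array rather than the dc.TableBlah table from the question, so treat it as an in-memory illustration only.

```csharp
using System;
using System.Linq;

public static class SingleDemo
{
    // Reports what SingleOrDefault does for zero, one, or several matches.
    public static string Describe(int[] ids, int wanted)
    {
        try
        {
            int match = ids.SingleOrDefault(id => id == wanted);
            return "single: " + match;
        }
        catch (InvalidOperationException)
        {
            // Thrown when more than one element satisfies the predicate.
            return "more than one match";
        }
    }

    public static void Main()
    {
        int[] ids = { 1, 2, 2, 3 };

        Console.WriteLine(Describe(ids, 1));  // exactly one match
        Console.WriteLine(Describe(ids, 2));  // duplicate: SingleOrDefault throws
        Console.WriteLine(Describe(ids, 9));  // no match: default(int)

        // FirstOrDefault accepts the duplicate and returns the first one.
        Console.WriteLine(ids.FirstOrDefault(id => id == 2));
    }
}
```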