Performance of Find() vs. FirstOrDefault()

I was able to reproduce your results, so I decompiled your program, and there is a difference between Find and FirstOrDefault.

First off, here is the decompiled program. I made your data object an anonymous type just so it would compile.

    List<<>f__AnonymousType0<string>> source = Enumerable.ToList(Enumerable.Select(Enumerable.Range(0, 1000000), i =>
    {
        var local_0 = new
        {
            Name = Guid.NewGuid().ToString()
        };
        return local_0;
    }));
    source.Insert(999000, new
    {
        Name = diana
    });
    stopwatch.Restart();
    Enumerable.FirstOrDefault(source, c => c.Name == diana);
    stopwatch.Stop();
    Console.WriteLine("Diana was found in {0} ms with System.Linq.Enumerable.FirstOrDefault().", (object) stopwatch.ElapsedMilliseconds);
    stopwatch.Restart();
    source.Find(c => c.Name == diana);
    stopwatch.Stop();
    Console.WriteLine("Diana was found in {0} ms with System.Collections.Generic.List<T>.Find().", (object) stopwatch.ElapsedMilliseconds);

The key thing to notice here is that FirstOrDefault is called on Enumerable whereas Find is called as a method on the source list.

So, what is Find doing? This is the decompiled Find method:

    private T[] _items;

    [__DynamicallyInvokable]
    public T Find(Predicate<T> match)
    {
        if (match == null)
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
        for (int index = 0; index < this._size; ++index)
        {
            if (match(this._items[index]))
                return this._items[index];
        }
        return default (T);
    }

So it's iterating over an array of items which makes sense, since a list is a wrapper on an array.

However, FirstOrDefault, on the Enumerable class, uses foreach to iterate the items. Because it only sees the list as an IEnumerable<TSource>, this goes through the list's enumerator and calls MoveNext and Current for every item. I think what you are seeing is the overhead of the enumerator.

    [__DynamicallyInvokable]
    public static TSource FirstOrDefault<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
    {
        if (source == null)
            throw Error.ArgumentNull("source");
        if (predicate == null)
            throw Error.ArgumentNull("predicate");
        foreach (TSource source1 in source)
        {
            if (predicate(source1))
                return source1;
        }
        return default (TSource);
    }

foreach is just syntactic sugar over the enumerator pattern.

[Image: dotPeek screenshot]

I clicked on foreach to see what it's doing, and dotPeek wants to take me to the enumerator's GetEnumerator/Current/MoveNext implementations, which makes sense.

Other than that, they are basically the same: both test the passed-in predicate against each item until one matches.
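To make the enumerator overhead concrete, here is a minimal sketch of roughly what the foreach in the decompiled FirstOrDefault expands to. The names ForeachSketch and FirstOrDefaultExpanded are made up for illustration, and newer .NET versions may take different code paths internally, so treat this as a model of the decompiled code shown above rather than of every runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ForeachSketch
{
    // Hypothetical helper: a hand-expanded version of the foreach loop.
    public static T FirstOrDefaultExpanded<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            while (e.MoveNext())        // one interface call per item
            {
                T item = e.Current;     // and another one to read it
                if (predicate(item))
                    return item;
            }
        }
        return default(T);
    }

    static void Main()
    {
        List<string> names = Enumerable.Range(0, 1000).Select(i => "name" + i).ToList();
        names.Insert(900, "diana");

        string viaFind = names.Find(n => n == "diana");
        string viaLinq = names.FirstOrDefault(n => n == "diana");
        string viaExpanded = FirstOrDefaultExpanded(names, n => n == "diana");

        // All three approaches locate the same item.
        Console.WriteLine(viaFind == viaLinq && viaLinq == viaExpanded); // True
    }
}
```

Find skips the two interface calls per item and indexes the backing array directly, which is where the measured difference would come from.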

Should we always use .Find() rather than .FirstOrDefault() when we have the primary key in Entity Framework Core?

According to the reference source DbSet.Find will not access the database if an object with the same keyValues is already fetched in the DbContext:

    /// Finds an entity with the given primary key values.
    /// If an entity with the given primary key values exists in the context, then it is
    /// returned immediately without making a request to the store.
    public abstract object Find(params object[] keyValues);

FirstOrDefault and similar functions ask the IQueryable for its query provider (the IQueryable.Provider property) and then call IQueryProvider.Execute(Expression) to get the data defined by the Expression.
This will always access the database.

Suppose you have Schools with their Students, a simple one-to-many relationship. You also have procedures to change Student data.

    Student ChangeAddress(SchoolDbContext dbContext, int studentId, Address address);
    Student ChangeSchool(SchoolDbContext dbContext, int studentId, int schoolId);

You put this in procedures because these procedures check the validity of the changes: perhaps Eton Students are not allowed to live on the Oxford campus, and there might be Schools that only allow Students of a certain age.

You have the following code that uses these procedures:

    void ChangeStudent(int studentId, Address address, int schoolId)
    {
        using (var dbContext = new SchoolDbContext())
        {
            ChangeAddress(dbContext, studentId, address);
            ChangeSchool(dbContext, studentId, schoolId);
            dbContext.SaveChanges();
        }
    }

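If the Change... functions used FirstOrDefault(), each of them would send another query to the database, even though the Student is already in the DbContext. With Find(), the second procedure gets the already fetched Student, including the changes made by the first procedure, without a round trip.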

However, sometimes you want to re-fetch the database data, for instance because others might have changed the data, or because some changes you just made are invalid:

    Student student = dbContext.Students.Find(10);
    // let user change student attributes
    ...

    bool changesAccepted = AskIfChangesOk();
    if (!changesAccepted)
    {
        // Refetch the student.
        // Can't use Find, because that would give the changed Student.
        // A plain tracking query would also hand back the tracked (changed) instance,
        // so bypass the change tracker with AsNoTracking().
        student = dbContext.Students.AsNoTracking().Where(s => s.Id == 10).FirstOrDefault();
    }

    // now use the refetched Student with the original data

Find() vs. Where().FirstOrDefault()

Where is the Find method on IEnumerable<T>? (Rhetorical question.)

The Where and FirstOrDefault methods are applicable to many kinds of sequences, including List<T>, T[], Collection<T>, etc. Any sequence that implements IEnumerable<T> can use these methods. Find, by contrast, is an instance method of List<T> only (arrays have the separate static Array.Find). Methods that are more generally applicable are more reusable and have a greater impact.
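As a small illustration of that reach, the sketch below runs the same FirstOrDefault call against a list, an array, and a queue, and then shows the List<T> and array-specific Find calls for comparison. The class name SequenceKinds and the helper FirstEven are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SequenceKinds
{
    // Works for anything that implements IEnumerable<int>.
    public static int FirstEven(IEnumerable<int> numbers)
    {
        return numbers.FirstOrDefault(n => n % 2 == 0);
    }

    static void Main()
    {
        var list = new List<int> { 1, 3, 4, 6 };
        int[] array = { 1, 3, 4, 6 };
        var queue = new Queue<int>(array);

        Console.WriteLine(FirstEven(list));   // 4
        Console.WriteLine(FirstEven(array));  // 4
        Console.WriteLine(FirstEven(queue));  // 4

        // Find exists only as a List<T> instance method and as the static Array.Find.
        Console.WriteLine(list.Find(n => n % 2 == 0));         // 4
        Console.WriteLine(Array.Find(array, n => n % 2 == 0)); // 4
    }
}
```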

I guess my next question would be why they added Find at all. That is a good tip. The only thing I can think of is that FirstOrDefault could return a default value other than null. Otherwise it just seems like a pointless addition.

Find on List<T> predates the other methods. List<T> was added with generics in .NET 2.0, and Find was part of the API for that class. Where and FirstOrDefault were added as extension methods for IEnumerable<T> with LINQ, which arrived later, in .NET 3.5. I cannot say with certainty that Find would never have been added if LINQ had existed at the 2.0 release, but that is arguably the case for many features from earlier .NET versions that were made obsolete or redundant by later ones.

EF Core: FirstOrDefault() vs SingleOrDefault() performance comparison

Can we assume that, when exactly one matching result exists, SingleOrDefault() scans the whole table and is ~2x slower than FirstOrDefault()?

It depends, so generally no.

For unique indexed column criteria (like PK), it doesn't matter - the SQL optimizers are smart enough to use the information about value cardinality from the index definition.

For others it's similar to LINQ to Objects. In general (when there are no value distribution statistics), the database has to perform a full table scan. So the question is whether it can stop the search early. If no matching item exists, both operations are the same O(N). If an item exists, FirstOrDefault can stop at the first match, while SingleOrDefault has to keep scanning to make sure there is no second one. Hence it depends on how close the matching item is to the "beginning" of the scan, whether a second matching item exists, and how close that one is. If only one matching value can exist, we could expect the latter method to be about 2 times slower on average.
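The LINQ to Objects analogy can be checked by counting predicate evaluations. This is a minimal in-memory sketch, not a database measurement; the class ScanCost and the helper CountPredicateCalls are invented for the example, and the single match is placed in the middle of 1000 items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ScanCost
{
    // Runs the given operator and reports how often the predicate was evaluated.
    public static int CountPredicateCalls(
        Func<IEnumerable<int>, Func<int, bool>, int> op,
        IEnumerable<int> source,
        int target)
    {
        int calls = 0;
        op(source, x => { calls++; return x == target; });
        return calls;
    }

    static void Main()
    {
        int[] data = Enumerable.Range(0, 1000).ToArray();

        int first = CountPredicateCalls((s, p) => s.FirstOrDefault(p), data, 500);
        int single = CountPredicateCalls((s, p) => s.SingleOrDefault(p), data, 500);

        Console.WriteLine(first);   // 501: stops at the match
        Console.WriteLine(single);  // 1000: must rule out a second match
    }
}
```

With the match in the middle, SingleOrDefault does about twice the work, which is the "2 times slower on average" estimate for a uniformly placed unique match.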

Is there any recommendation which method should be used? For instance, when we are querying by PK and are sure that there's only one result?

This is opinion-based, but I would say FirstOrDefault, and not only because of the performance. SingleOrDefault is a strange method in my opinion, since it serves two different purposes: finding a match and validating that there are no duplicates. The second part is usually ensured by the proper data structure in memory and a unique constraint/index in the database.

Finally, there have been some comments and discussions about the Find method versus the aforementioned two methods when searching by PK. The main purpose of the Find method is that it first searches the local cache and returns the object from there if it exists. If it doesn't, Find internally executes FirstOrDefault. This should tell you which one the EF Core team thinks is preferable.
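The "local cache first, then query" behaviour described above can be modelled with a toy class. Everything here (ToyContext, Student, StoreQueries) is hypothetical and written only to illustrate the lookup order; it is not EF Core's implementation or API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy types, for illustration only.
class Student
{
    public int Id;
    public string Name;
}

class ToyContext
{
    private readonly List<Student> _store;  // stands in for the database table
    private readonly Dictionary<int, Student> _tracked = new Dictionary<int, Student>();

    public int StoreQueries { get; private set; }

    public ToyContext(List<Student> store) { _store = store; }

    public Student Find(int id)
    {
        // 1. Look in the local cache first.
        Student cached;
        if (_tracked.TryGetValue(id, out cached))
            return cached;

        // 2. Otherwise query the store, here with FirstOrDefault.
        StoreQueries++;
        Student row = _store.FirstOrDefault(s => s.Id == id);
        if (row == null)
            return null;

        // Track a copy so later Find calls are served locally.
        var copy = new Student { Id = row.Id, Name = row.Name };
        _tracked[id] = copy;
        return copy;
    }
}

static class ToyDemo
{
    static void Main()
    {
        var ctx = new ToyContext(new List<Student> { new Student { Id = 10, Name = "Ann" } });

        Student a = ctx.Find(10);
        Student b = ctx.Find(10);

        Console.WriteLine(ReferenceEquals(a, b)); // True: second call hit the cache
        Console.WriteLine(ctx.StoreQueries);      // 1
    }
}
```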

Performance of LINQ Any vs FirstOrDefault != null

The enumeration in Any() stops as soon as it finds a matching item as well:

https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.any

I would expect the performance to be very similar. Note that the FirstOrDefault version won't work with a collection of value types (since the default isn't null) but the Any version would.
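The value-type caveat is easy to see with a list of int, where default(int) is 0 and therefore indistinguishable from a genuine match on 0. A minimal sketch (the class AnyVsFirst and helper ContainsZero are made up for the example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AnyVsFirst
{
    public static bool ContainsZero(IEnumerable<int> numbers)
    {
        return numbers.Any(n => n == 0);
    }

    static void Main()
    {
        var withZero = new List<int> { 3, 0, 7 };
        var withoutZero = new List<int> { 3, 7 };

        // FirstOrDefault returns 0 in both cases: a real match in the first,
        // default(int) in the second, so the result cannot tell them apart.
        Console.WriteLine(withZero.FirstOrDefault(n => n == 0));    // 0
        Console.WriteLine(withoutZero.FirstOrDefault(n => n == 0)); // 0

        // Any answers the actual question.
        Console.WriteLine(ContainsZero(withZero));    // True
        Console.WriteLine(ContainsZero(withoutZero)); // False
    }
}
```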

LINQ: When to use SingleOrDefault vs. FirstOrDefault() with filtering criteria

Whenever you use SingleOrDefault, you clearly state that the query should return at most a single result. On the other hand, when FirstOrDefault is used, the query can return any number of results, but you state that you only want the first one.

I personally find the semantics very different and using the appropriate one, depending on the expected results, improves readability.
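A short in-memory sketch of that difference in intent: with duplicate matches, FirstOrDefault quietly returns the first one, while SingleOrDefault throws InvalidOperationException. The class SingleVsFirst and the helper SingleThrows are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SingleVsFirst
{
    // Reports whether SingleOrDefault rejects the sequence for the given name.
    public static bool SingleThrows(IEnumerable<string> names, string wanted)
    {
        try
        {
            names.SingleOrDefault(n => n == wanted);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true; // more than one match
        }
    }

    static void Main()
    {
        var names = new List<string> { "ann", "bob", "ann" };

        Console.WriteLine(names.FirstOrDefault(n => n == "ann"));          // ann
        Console.WriteLine(SingleThrows(names, "ann"));                     // True
        Console.WriteLine(SingleThrows(names, "bob"));                     // False
        Console.WriteLine(names.SingleOrDefault(n => n == "zoe") == null); // True
    }
}
```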

FirstOrDefault() || Select() which is faster?

I'm 90% certain that

var test = dc.TableBlah.FirstOrDefault(_blah => _blah.id == 1);

ends up as the same query against the database as

    var test = (from blah in dc.TableBlah
                where blah.id == 1
                select blah).FirstOrDefault();

So your second example simply lacks the benefit of getting a single record by calling FirstOrDefault(). In terms of performance, they'll be identical.

Personally, I would use SingleOrDefault() instead, since you're looking for a single item. SingleOrDefault() will throw if you receive more than one record.
