Find() vs. Where().FirstOrDefault()

Where is the Find method on IEnumerable<T>? (Rhetorical question.)

The Where and FirstOrDefault methods apply to many kinds of sequences, including List<T>, T[], Collection<T>, etc. Any sequence that implements IEnumerable<T> can use them. Find is available only on List<T>. Methods that are more generally applicable are more reusable and have a greater impact.
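To make the point concrete, here is a minimal sketch (the sample names are made up) showing the same LINQ call compiling against three different sequence types, while Find is an instance method of List<T> only. Arrays get a separate static helper, Array.Find.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

var list = new List<string> { "ann", "bob", "diana" };
string[] array = list.ToArray();
var collection = new Collection<string>(list);

// Where/FirstOrDefault are extension methods on IEnumerable<T>,
// so the same call compiles for every sequence type.
var fromList = list.Where(n => n.StartsWith("d")).FirstOrDefault();
var fromArray = array.Where(n => n.StartsWith("d")).FirstOrDefault();
var fromCollection = collection.Where(n => n.StartsWith("d")).FirstOrDefault();

// Find is an instance method of List<T>; arrays use the static Array.Find.
var viaFind = list.Find(n => n.StartsWith("d"));
var viaArrayFind = Array.Find(array, n => n.StartsWith("d"));

Console.WriteLine($"{fromList} {fromArray} {fromCollection} {viaFind} {viaArrayFind}");
// prints: diana diana diana diana diana
```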

I guess my next question would be: why did they add Find at all? That is a good tip. The only reason I can think of is that FirstOrDefault could return a default value other than null. Otherwise it just seems like a pointless addition.

Find on List<T> predates the other methods. List<T> was added with generics in .NET 2.0, and Find was part of that class's API. Where and FirstOrDefault were added later as extension methods on IEnumerable<T> with LINQ, in .NET 3.5. I cannot say with certainty that Find would never have been added had LINQ existed at the 2.0 release, but that is arguably true of many features from earlier .NET versions that later versions made obsolete or redundant.
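On the default-value guess: as far as the in-memory versions go, both methods fall back to default(T) when nothing matches, so that is not where they differ. A small sketch with made-up data:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3 };
var names = new List<string> { "ann", "bob" };

// No element matches, so both methods fall back to default(T).
int findMiss = numbers.Find(n => n > 10);              // default(int) is 0
int linqMiss = numbers.FirstOrDefault(n => n > 10);    // also 0
var findNameMiss = names.Find(n => n == "diana");      // default(string) is null
var linqNameMiss = names.FirstOrDefault(n => n == "diana");

Console.WriteLine($"{findMiss} {linqMiss} {findNameMiss is null} {linqNameMiss is null}");
// prints: 0 0 True True
```

For a value type such as int, a miss is therefore indistinguishable from a genuine 0 with either method.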

Performance of Find() vs. FirstOrDefault()

I was able to reproduce your results, so I decompiled your program, and there is a difference between Find and FirstOrDefault.

First off, here is the decompiled program. I made your data object an anonymous type just so that it compiles.

// Decompiler output, tidied so that it compiles: the compiler-generated type
// name (<>f__AnonymousType0<string>) is replaced by var, the string literals
// are quoted, and the stopwatch declaration is included.
var source = Enumerable.ToList(Enumerable.Select(Enumerable.Range(0, 1000000), i =>
{
    var local_0 = new
    {
        Name = Guid.NewGuid().ToString()
    };
    return local_0;
}));
source.Insert(999000, new
{
    Name = "diana"
});
var stopwatch = new Stopwatch();
stopwatch.Restart();
Enumerable.FirstOrDefault(source, c => c.Name == "diana");
stopwatch.Stop();
Console.WriteLine("Diana was found in {0} ms with System.Linq.Enumerable.FirstOrDefault().", (object) stopwatch.ElapsedMilliseconds);
stopwatch.Restart();
source.Find(c => c.Name == "diana");
stopwatch.Stop();
Console.WriteLine("Diana was found in {0} ms with System.Collections.Generic.List<T>.Find().", (object) stopwatch.ElapsedMilliseconds);

The key thing to notice here is that FirstOrDefault is called on Enumerable whereas Find is called as a method on the source list.

So, what is Find doing? This is the decompiled Find method:

private T[] _items;

[__DynamicallyInvokable]
public T Find(Predicate<T> match)
{
    if (match == null)
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
    for (int index = 0; index < this._size; ++index)
    {
        if (match(this._items[index]))
            return this._items[index];
    }
    return default (T);
}

So it's iterating over an array of items, which makes sense, since a list is a wrapper around an array.

FirstOrDefault, on the Enumerable class, however, uses foreach to iterate over the items. That goes through the list's enumerator and its MoveNext method. I think what you are seeing is the overhead of the enumerator.

[__DynamicallyInvokable]
public static TSource FirstOrDefault<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null)
        throw Error.ArgumentNull("source");
    if (predicate == null)
        throw Error.ArgumentNull("predicate");
    foreach (TSource source1 in source)
    {
        if (predicate(source1))
            return source1;
    }
    return default (TSource);
}

foreach is just syntactic sugar over the enumerator pattern. Clicking on foreach in dotPeek confirms this: it offers to navigate to the GetEnumerator/Current/MoveNext implementations, which makes sense.

Other than that, they are basically the same: both test the passed-in predicate to see whether an item is the one you want.
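To illustrate what the foreach sugar hides, here is a hand-expanded version of the loop. FirstOrDefaultExpanded is a hypothetical name used only for this sketch, and the expansion is approximately what the compiler produces rather than the exact generated code. Since source is typed as IEnumerable<T>, each MoveNext and Current call goes through the interface, which is the overhead Find avoids by indexing the array directly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the foreach inside FirstOrDefault, written out by hand.
static T FirstOrDefaultExpanded<T>(IEnumerable<T> source, Func<T, bool> predicate)
{
    using (IEnumerator<T> enumerator = source.GetEnumerator())
    {
        while (enumerator.MoveNext())        // one interface call per element
        {
            T item = enumerator.Current;     // and another one to read it
            if (predicate(item))
                return item;
        }
    }
    return default(T);                       // nothing matched
}

var people = new List<string> { "ann", "bob", "diana" };
Console.WriteLine(FirstOrDefaultExpanded(people, p => p == "diana")); // prints: diana
```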

Should we always use .Find() rather than .FirstOrDefault() when we have the primary key in Entity Framework Core?

According to the reference source, DbSet.Find will not access the database if an object with the same keyValues has already been fetched into the DbContext:

/// Finds an entity with the given primary key values.
/// If an entity with the given primary key values exists in the context, then it is
/// returned immediately without making a request to the store.
public abstract object Find(params object[] keyValues);

FirstOrDefault and similar methods take the query provider from the IQueryable.Provider property and call IQueryProvider.Execute(Expression) to fetch the data described by the expression.
This always queries the database.
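The cache-first behaviour can be pictured with a toy model. This is not Entity Framework code: the dictionaries, the Find function and the round-trip counter below are hypothetical stand-ins, and each entity is a plain property bag so that the sketch needs no type declarations.

```csharp
using System;
using System.Collections.Generic;

// Toy model only; none of these names come from Entity Framework.
var database = new Dictionary<int, string> { [10] = "1 Old Road" }; // id -> stored address
var tracked = new Dictionary<int, Dictionary<string, string>>();    // stand-in for the context's cache
int roundTrips = 0;

// Mimics the documented contract of Find: check the context first,
// go to the store only on a miss.
Dictionary<string, string> Find(int id)
{
    if (tracked.TryGetValue(id, out var hit))
        return hit;                              // served without a round trip
    roundTrips++;                                // simulated database request
    if (!database.TryGetValue(id, out var address))
        return null;                             // no such key
    var entity = new Dictionary<string, string> { ["Address"] = address };
    tracked[id] = entity;
    return entity;
}

var first = Find(10);               // cache miss: one round trip
first["Address"] = "2 New Road";    // pending, unsaved change
var second = Find(10);              // cache hit: same instance, no round trip

Console.WriteLine(ReferenceEquals(first, second)); // prints: True
Console.WriteLine(second["Address"]);              // prints: 2 New Road
Console.WriteLine(roundTrips);                     // prints: 1
```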

Suppose you have Schools with their Students, a simple one-to-many relationship. You also have procedures to change Student data.

Student ChangeAddress(SchoolDbContext dbContext, int studentId, Address address);
Student ChangeSchool(SchoolDbContext dbContext, int studentId, int schoolId);

These live in their own procedures because each one checks the validity of the change: perhaps Eton Students are not allowed to live on the Oxford campus, and there might be Schools that only allow Students from a certain age.

You have the following code that uses these procedures:

void ChangeStudent(int studentId, Address address, int schoolId)
{
    using (var dbContext = new SchoolDbContext())
    {
        ChangeAddress(dbContext, studentId, address);
        ChangeSchool(dbContext, studentId, schoolId);
        dbContext.SaveChanges();
    }
}

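If the Change... functions used FirstOrDefault(), each of them would send its own query to the database. With Find(), the second procedure gets the Student that the first one already loaded, including its pending changes, without another round trip. (A tracking query would still hand back the same tracked instance, so the pending changes are not lost, but you pay for a query whose result is discarded.)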

However, sometimes you want to re-fetch the database data, for instance because others might have changed it, or because some changes you just made are invalid.

Student student = dbContext.Students.Find(10);
// let user change student attributes
...

bool changesAccepted = AskIfChangesOk();
if (!changesAccepted)
{
    // Refetch the student.
    // Can't use Find, because that would give the changed Student. A plain
    // tracking query returns that same tracked instance too, so bypass the
    // change tracker with AsNoTracking().
    student = dbContext.Students.AsNoTracking().Where(s => s.Id == 10).FirstOrDefault();
}

// now use the refetched Student with the original data (it is not tracked by the context)

.Where(condition).FirstOrDefault() vs .FirstOrDefault(condition)

Both generate the same SQL statement. The second approach is shorter, while the first might be clearer to some developers. Ultimately it's a matter of personal preference.

You can inspect the SQL using the ObjectQuery.ToTraceString method.

Difference between LINQ FirstOrDefault vs. Where(...).FirstOrDefault?

If we are talking about LINQ to Objects, then there is one notable difference. The second statement,

Where(someField => someField.Name.Equals(settings.Text)).FirstOrDefault()

creates a WhereEnumerableIterator internally and then starts enumerating it to take the first item. (The listings below show First; FirstOrDefault differs only in returning default(TSource) instead of throwing.)

// argument checks and collection optimizations removed
public static IEnumerable<TSource> Where<TSource>(
    this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    // it enumerates source and returns items which match predicate
    return new WhereEnumerableIterator<TSource>(source, predicate);
}

public static TSource First<TSource>(this IEnumerable<TSource> source)
{
    using (IEnumerator<TSource> enumerator = source.GetEnumerator())
    {
        if (enumerator.MoveNext())
            return enumerator.Current;
    }

    throw Error.NoElements();
}

The first statement, by contrast, simply returns the first item in the source that matches the predicate, without creating an additional enumerator:

// argument checks removed
public static TSource First<TSource>(
    this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    foreach (TSource local in source)
    {
        if (predicate(local))
            return local;
    }

    throw Error.NoMatch();
}

So, on paper, the first form does less work, because it allocates no extra iterator (measure before relying on this):

FirstOrDefault(someField => someField.Name.Equals(settings.Text))
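A small experiment, assuming LINQ to Objects and made-up data, suggests how little the two forms differ in behaviour: both return the same item and stop evaluating the predicate at the first match. What differs is only the extra iterator object that Where() allocates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var names = new List<string> { "ann", "bob", "diana", "eve" };
int calls = 0;
// The predicate counts how often it is invoked.
Func<string, bool> isDiana = n => { calls++; return n == "diana"; };

var viaWhere = names.Where(isDiana).FirstOrDefault();
int callsViaWhere = calls;

calls = 0;
var direct = names.FirstOrDefault(isDiana);
int callsDirect = calls;

Console.WriteLine($"{viaWhere} {callsViaWhere} {direct} {callsDirect}");
// prints: diana 3 diana 3
```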

Is it better to use Find() or Single() to select an item from the database?

The main difference is that Find() searches the first-level cache of the DbContext first and queries the database only if there is no hit. Single() always queries the database. In addition, Find() returns null when no entity is found, whereas Single() throws an exception. So the closer equivalent of Find() is SingleOrDefault().
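The null-versus-exception behaviour can be tried out without a database. The sketch below is only an in-memory analogy using List<T> and LINQ to Objects with made-up ids; note that DbSet.Find takes key values rather than a predicate, so List<T>.Find merely stands in for it here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var ids = new List<string> { "a1", "b2" };

var found = ids.Find(id => id == "zz");                 // null: no match
var orDefault = ids.SingleOrDefault(id => id == "zz");  // null: no match

bool singleThrew = false;
try
{
    ids.Single(id => id == "zz");    // Single() insists on exactly one match
}
catch (InvalidOperationException)
{
    singleThrew = true;
}

Console.WriteLine($"{found is null} {orDefault is null} {singleThrew}");
// prints: True True True
```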

Entity Framework Find vs. Where

The point is that Find() starts by searching the local cache of the context and then, if there is no match, sends a query to the DB.

A Where() query always goes to the DB once it is enumerated.

With EF 4, I used to think that the SQL generated by Find() was too complex and, in some cases, led to performance issues. So I have always used Where(), even with EF 5. I should check the SQL that Find() generates with EF 5.

So on paper, Find() is better because it uses the cache.


