Entity Framework Very Slow to Load for First Time After Every Compilation


On the first query EF compiles the model. This can take some serious time for a model this large.

Here are 3 suggestions: http://www.fusonic.net/en/blog/2014/07/09/three-steps-for-fast-entityframework-6.1-first-query-performance/

A summary:

  1. Using a cached db model store (see the sketch after this list)
  2. Generating pre-compiled views
  3. Generating a pre-compiled native image of Entity Framework with NGen to avoid JIT compilation
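
For the first suggestion, EF 6.2 later shipped a built-in implementation: DefaultDbModelStore persists the compiled model as an .edmx file and loads it from disk on subsequent startups. A minimal sketch, assuming EF 6.2+ (the configuration class name and cache directory are illustrative); note that the cached file is regenerated whenever the context assembly is rebuilt:

using System.Data.Entity;
using System.Data.Entity.Infrastructure;
using System.IO;

// EF 6.2+: persist the compiled model to disk so later startups can skip
// the expensive model-building step.
public class CachedModelConfiguration : DbConfiguration
{
    public CachedModelConfiguration()
    {
        SetModelStore(new DefaultDbModelStore(Directory.GetCurrentDirectory()));
    }
}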

I would also make sure to compile the application in release mode when running the benchmarks.

Another solution is to look at splitting the DbContext. 400 entities is a lot, and it would be nicer to work with smaller chunks anyway. I haven't tried it, but I assume you could build the models one by one, meaning no single load takes 15 seconds. See this post by Julie Lerman on bounded contexts: https://msdn.microsoft.com/en-us/magazine/jj883952.aspx
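
A rough sketch of that idea, with illustrative entity and connection-string names: each bounded context maps only the handful of tables one feature needs, so EF builds (and caches) a small model per context type instead of one 400-entity model.

using System.Data.Entity;

// Two narrow contexts instead of one huge one; each first query only pays
// for the entities its own context maps.
public class OrderingContext : DbContext
{
    public OrderingContext() : base("name=AppConnection") { }

    public DbSet<Order> Orders { get; set; }
    public DbSet<OrderLine> OrderLines { get; set; }
}

public class CustomerContext : DbContext
{
    public CustomerContext() : base("name=AppConnection") { }

    public DbSet<Customer> Customers { get; set; }
}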

Entity-framework code is slow when using Include() many times

tl;dr Multiple Includes blow up the SQL result set. Soon it becomes cheaper to load data by multiple database calls instead of running one mega statement. Try to find the best mixture of Include and Load statements.

it does seem that there is a performance penalty when using Include

That's an understatement! Multiple Includes quickly blow up the SQL query result both in width and in length. Why is that?

Growth factor of Includes

(This part applies to Entity Framework classic, v6 and earlier.)

Let's say we have

  • root entity Root
  • parent entity Root.Parent
  • child entities Root.Children1 and Root.Children2
  • a LINQ statement Root.Include("Parent").Include("Children1").Include("Children2")

This builds a SQL statement that has the following structure:

SELECT *, <PseudoColumns>
FROM Root
JOIN Parent
JOIN Children1

UNION

SELECT *, <PseudoColumns>
FROM Root
JOIN Parent
JOIN Children2

These <PseudoColumns> consist of expressions like CAST(NULL AS int) AS [C2], and they serve to have the same number of columns in all UNION-ed queries. The first part adds pseudo columns for Children2, the second part adds pseudo columns for Children1.

This is what it means for the size of the SQL result set:

  • Number of columns in the SELECT clause is the sum of all columns in the four tables
  • The number of rows is the sum of records in included child collections

Since the total number of data points is columns * rows, each additional Include increases the total number of data points in the result set superlinearly: the column count and the row count both grow linearly, so their product grows quadratically. Let me demonstrate that by taking Root again, now with an additional Children3 collection. If all tables have 5 columns and 100 rows, we get:

One Include (Root + 1 child collection): 10 columns * 100 rows = 1000 data points.

Two Includes (Root + 2 child collections): 15 columns * 200 rows = 3000 data points.

Three Includes (Root + 3 child collections): 20 columns * 300 rows = 6000 data points.

With 12 Includes this would amount to 78000 data points!

Conversely, if you get all records for each table separately instead of 12 Includes, you have 13 * 5 * 100 data points: 6500, less than 10%!

Now these numbers are somewhat exaggerated in that many of these data points will be null, so they don't contribute much to the actual size of the result set that is sent to the client. But the query size and the task for the query optimizer certainly get affected negatively by increasing numbers of Includes.

Balance

So using Includes is a delicate balance between the cost of database calls and data volume. It's hard to give a rule of thumb, but by now you can imagine that the data volume generally quickly outgrows the cost of extra calls if there are more than ~3 Includes for child collections (but quite a bit more for parent Includes, which only widen the result set).

Alternative

The alternative to Include is to load data in separate queries:

context.Configuration.LazyLoadingEnabled = false;
var rootId = 1;

// Load both child collections into the context's cache with separate queries.
context.Children1.Where(c => c.RootId == rootId).Load();
context.Children2.Where(c => c.RootId == rootId).Load();

// Relationship fixup has populated root.Children1/Children2 from the cache.
return context.Roots.Find(rootId);

This loads all required data into the context's cache. During this process, EF executes relationship fixup, by which it auto-populates navigation properties (Root.Children1 etc.) with the loaded entities. The end result is identical to the statement with Includes, except for one important difference: the child collections are not marked as loaded in the entity state manager, so EF would still trigger lazy loading if you accessed them. That's why it's important to turn off lazy loading.

In reality, you will have to figure out which combination of Include and Load statements work best for you.

Other aspects to consider

Each Include also increases query complexity, so the database's query optimizer will have to make increasingly more effort to find the best query plan. At some point this may no longer succeed. Also, when some vital indexes are missing (esp. on foreign keys) performance may suffer by adding Includes, even with the best query plan.
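
When such an index is missing, EF6 code first can declare one on the foreign key directly in the model. A sketch reusing the entity names from the example above (index name illustrative; requires EF 6.1+ for the Index attribute):

using System.ComponentModel.DataAnnotations.Schema;

public class Children1
{
    public int ID { get; set; }

    // Index on the foreign key so the JOIN produced by Include("Children1")
    // can seek instead of scanning the whole table.
    [Index("IX_Children1_RootId")]
    public int RootId { get; set; }

    public virtual Root Root { get; set; }
}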

Entity Framework Core

Cartesian explosion

For some reason, the behavior described above, UNION-ed queries, was abandoned as of EF Core 3. It now builds one query with joins. When the query is "star"-shaped¹ this leads to Cartesian explosion (in the SQL result set). I can only find a note announcing this breaking change, but it doesn't say why.

Split queries

To counter this Cartesian explosion, Entity Framework Core 5 introduced the concept of split queries, which loads related data in multiple queries. This prevents building one massive, multiplied SQL result set. Also, because of lower query complexity, it may reduce the time it takes to fetch data even with multiple roundtrips. However, it may lead to inconsistent data when concurrent updates occur.
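
Opting in is done per query via AsSplitQuery(); a short sketch reusing the entity names from the example above:

using Microsoft.EntityFrameworkCore;

// EF Core 5+: each included collection is loaded by its own SQL statement
// instead of being multiplied into one big join result.
var roots = context.Roots
    .Include(r => r.Children1)
    .Include(r => r.Children2)
    .AsSplitQuery()
    .ToList();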


¹ Multiple 1:n relationships off of the query root.

Entity Framework: Count() very slow on large DbSet and complex WHERE clause

If you can possibly avoid it, don't count for pagination; just return the first page. Counting is always expensive and adds little to the user experience.

And in any case you're building the dynamic search wrong.

You're calling IEnumerable.Count(Func<ParcelOrder,bool>), which forces client-side evaluation, when you should be calling IQueryable.Count(Expression<Func<ParcelOrder,bool>>). Here:

Func<ParcelOrder, bool> completeExpression = order => userPrivilegeValidation(order) && criteriaMatchValidation(order);
searchModel.PaginationTotalCount = db.ParcelOrder.Count(completeExpression);
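
The direct fix is to type the predicate as an expression tree so it resolves to the IQueryable overload and gets translated to a SQL COUNT. A sketch with an illustrative predicate body (the original helper methods can't be used here, because EF cannot translate calls to arbitrary C# methods; OwnerID, Status, currentUserID and wantedStatus are hypothetical):

using System.Linq.Expressions;

// Expression<Func<...>> resolves to IQueryable.Count, which runs as a SQL
// COUNT on the server instead of pulling every row to the client.
Expression<Func<ParcelOrder, bool>> completeExpression =
    order => order.OwnerID == currentUserID && order.Status == wantedStatus;
searchModel.PaginationTotalCount = db.ParcelOrder.Count(completeExpression);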

But there's a simpler, better pattern for this in EF: just conditionally add criteria to your IQueryable.

E.g., put a method on your DbContext like this:

public IQueryable<ParcelOrder> SearchParcels(ParcelOrderSearchModel searchModel)
{
    // ParcelOrders is the DbSet<ParcelOrder> property; widen it to IQueryable
    // so the filtered query can be assigned back to q.
    IQueryable<ParcelOrder> q = this.ParcelOrders;
    if (searchModel.KeyUploadID != null)
    {
        q = q.Where(po => po.UploadID == searchModel.KeyUploadID);
    }
    if (searchModel.KeyCustomerID != null)
    {
        q = q.Where(po => po.CustomerID == searchModel.KeyCustomerID);
    }
    //. . .
    return q;
}
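
Usage then looks something like this (the ordering and paging property names are illustrative):

var query = db.SearchParcels(searchModel);

// Count and page both run as small server-side SQL queries over the same filter.
searchModel.PaginationTotalCount = query.Count();
var page = query
    .OrderBy(po => po.ID)
    .Skip(searchModel.PageIndex * searchModel.PageSize)
    .Take(searchModel.PageSize)
    .ToList();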

