.Include() vs .Load() Performance in Entity Framework

.Include() vs .Load() performance in EntityFramework

It depends, try both

When using Include(), you get the benefit of loading all of your data in a single call to the underlying data store. If this is a remote SQL Server, for example, that can be a major performance boost.

The downside is that Include() queries tend to get really complicated, especially if you have any filters (Where() calls, for example) or try to do any grouping. EF will generate very heavily nested queries using sub-SELECT and APPLY statements to get the data you want. The result set is also much less efficient: every row carries every possible child-object column, so the data for your top-level objects is repeated many times. (For example, a single parent object with 10 children will produce 10 rows, each with the same data in the parent object's columns.) I've had single EF queries get so complex that they caused deadlocks when running at the same time as EF update logic.

The Load() method is much simpler. Each query is a single, easy, straightforward SELECT statement against a single table. These are much easier in every possible way, except you have to do many of them (possibly many times more). If you have nested collections of collections, you may even need to loop through your top level objects and Load their sub-objects. It can get out of hand.

Quick rule-of-thumb

Try to avoid having any more than three Include calls in a single query. I find that EF's queries get too ugly to recognize beyond that; it also matches my rule-of-thumb for SQL Server queries, that up to four JOIN statements in a single query works very well, but after that it's time to consider refactoring.

However, all of that is only a starting point.

It depends on your schema, your environment, your data, and many other factors.

In the end, you will just need to try it out each way.

Pick a reasonable "default" pattern to use, see if it's good enough, and if not, optimize to taste.

What is the difference between load and include in sql query

var query = db.Customer
    .Include(c => c.Address)
    .Where(c => c.Address.Id > 10)
    .ToList();

The query above brings back the customers and all of their related address data in a single database trip.

var query = db.Customer
    .Where(c => c.Address.Id > 10)
    .ToList();

db.Address
    .Where(a => a.Id > 10)
    .Load();

Here it takes two database trips to bring back the same data.

Load:

There are several scenarios where you may want to load entities from
the database into the context without immediately doing anything with
those entities. A good example of this is loading entities for data
binding as described in Local Data. One common way to do this is to
write a LINQ query and then call ToList on it, only to immediately
discard the created list. The Load extension method works just like
ToList except that it avoids the creation of the list altogether.

Note: We cannot say which one is better. Most of the time we use the eager loading method (Include). It is nice and simple, but sometimes it is slow, so you need to decide which one to use according to your data size and other factors.

Entity Framework Include performance

Your second approach relies on the EF navigation property fixup process. The problem is though that every

query.Include(q => q.ItemNavN).Load();

statement will also include all the master record data along with the related entity data.

Using the same basic idea, one potential improvement could be to execute one Load per navigation property, replacing the Include with either Select (for references) or SelectMany (for collections), which is similar to how EF Core processes the Includes internally.

Taking your second approach example, you could try the following and compare the performance:

var query = ctx.Filters.Where(x => x.SessionId == id)
    .Join(ctx.Items, f => f.ItemId, i => i.Id, (f, i) => i);

query.Select(x => x.ItemNav1).Load();
query.Select(x => x.ItemNav2).Load();
query.Select(x => x.ItemNav3).Load();
query.Select(x => x.ItemNav4).Load();
query.Select(x => x.ItemNav5).Load();
query.Select(x => x.ItemNav6).Load();

var result = query.ToList();
// here all the navigation properties should be populated

Do you have to use .Include to load child objects in EF Core 5?

There are a few separate issues here, however the overall problem is that you are stuck trying to pick either the database (SQL) to manage the DTO projection OR the local C# runtime.

  • Your deferred SQL version (the initial attempt) failed due to the incorrect DateTime.ToString() syntax.
  • The C# version is verbose in terms of .Include() calls and mixed with magic strings, but it is also very inefficient: you are pulling back potentially hundreds of columns across the wire, only to ignore most of them when you project the results into the DTO.

For assistance specifically on .Include please review these previous posts:

  • Lazy Loading not working in EntityFramework Core
  • Entity Framework - what's the difference between using Include/eager loading and lazy loading?
  • .Include() vs .Load() performance in EntityFramework
  • Lazy Loading vs Eager Loading

For problems like this you should consider a Hybrid approach, one that respects and exploits the strengths of both environments.

  1. Project the DB elements that your logic needs, but keep them in the raw format, let EF and SQL work together to optimise the output.

    NOTE: There is no need to formalise this projection model, instead we can simply use an anonymous type!

    List<StatementModel> myStatements = new List<StatementModel>();

    var dbQuery = db.Statements
        // Filter first: StatementStatusId is not part of the projection below,
        // so it cannot be referenced after the Select.
        .Where(s => s.StatementStatusId == "PAA")
        .Select(s => new {
            s.StatementId,
            s.EmployeeNumber,
            s.Employee.FirstName,
            s.Employee.LastName,
            s.Employee.PlanType.PlanTypeName,
            s.FiscalPeriod.StartDate,
            s.FiscalPeriod.EndDate,
            s.Employee.CostCenterId,
            Evp = new {
                s.Employee.CostCenter.Evp.FirstName,
                s.Employee.CostCenter.Evp.LastName
            },
            Svp = new {
                s.Employee.CostCenter.Svp.FirstName,
                s.Employee.CostCenter.Svp.LastName
            },
            LOBMgr = new {
                s.Employee.CostCenter.Lobmgr.FirstName,
                s.Employee.CostCenter.Lobmgr.LastName
            },
            s.AdminApprovalStatus.ApprovalStatusName,
            s.StatementStatus.StatementStatusName,
            s.AmountDue
        });

    It is not necessary to nest the Evp, Svp and LOBMgr projections. You could flatten the entire result set, which would be ever so slightly more efficient (each FirstName/LastName pair would then need its own alias), but nesting shows the possibilities.

  2. Now project the result set into DTOs in memory, in this way you get total C# control over the type casting and string formatting.

    myStatements = dbQuery.ToList()
        .Select(s => new StatementModel
        {
            StatmentId = s.StatementId,
            EmployeeNumber = s.EmployeeNumber,
            FirstName = s.FirstName,
            LastName = s.LastName,
            PlanType = s.PlanTypeName ?? "",
            FiscalPeriod = $"{s.StartDate:yyyy-MM-dd} - {s.EndDate:yyyy-MM-dd}",
            CostCenterId = s.CostCenterId,
            RVPName = $"{s.Evp.FirstName} {s.Evp.LastName}".Trim(),
            SVPName = $"{s.Svp.FirstName} {s.Svp.LastName}".Trim(),
            LOBMgrName = $"{s.LOBMgr.FirstName} {s.LOBMgr.LastName}".Trim(),
            AdminApprovalStatus = s.ApprovalStatusName,
            StatementStatus = s.StatementStatusName,
            AmountDue = s.AmountDue
        }).ToList();

Notice there are NO Includes! We replace the Includes with the initial projection. IMO this code is more natural: Includes can be quite indirect, and it's not always obvious why we need them or when we've forgotten to add them at all. This code feels a bit like double handling, but we only bring back the specific columns we need, and we don't get in the way of clean SQL by trying to make the database format the data values when we can do that with minimal effort in C#.

Date formatting is trivial, but this technique can be very powerful if you need to perform complex formatting or other processing logic on results that you already have in C#, without replicating that logic in SQL-friendly LINQ.



Avoid Magic strings in Includes

If you are going to use Includes, you should try to avoid the string variant of Include and use the lambdas instead. This allows the compiler to notify you or later devs when schema changes might invalidate your query:

    .Include(x => x.Employee.CostCenter.Evp)
    .Include(x => x.Employee.CostCenter.Svp)
    .Include(x => x.Employee.CostCenter.Lobmgr)
    .Include(x => x.FiscalPeriod)
    .Include(x => x.AdminApprovalStatus)
    .Include(x => x.StatementStatus)

Entity-framework code is slow when using Include() many times

tl;dr Multiple Includes blow up the SQL result set. Soon it becomes cheaper to load data by multiple database calls instead of running one mega statement. Try to find the best mixture of Include and Load statements.

"it does seem that there is a performance penalty when using Include"

That's an understatement! Multiple Includes quickly blow up the SQL query result both in width and in length. Why is that?

Growth factor of Includes

(This part applies to Entity Framework classic, v6 and earlier)

Let's say we have

  • root entity Root
  • parent entity Root.Parent
  • child entities Root.Children1 and Root.Children2
  • a LINQ statement Root.Include("Parent").Include("Children1").Include("Children2")

This builds a SQL statement that has the following structure:

SELECT *, <PseudoColumns>
FROM Root
JOIN Parent
JOIN Children1

UNION

SELECT *, <PseudoColumns>
FROM Root
JOIN Parent
JOIN Children2

These <PseudoColumns> consist of expressions like CAST(NULL AS int) AS [C2], and they serve to give all UNION-ed queries the same number of columns. The first part adds pseudo columns for Children2, the second part adds pseudo columns for Children1.

This is what it means for the size of the SQL result set:

  • Number of columns in the SELECT clause is the sum of all columns in the four tables
  • The number of rows is the sum of records in included child collections

Since the total number of data points is columns * rows, and each additional Include adds both columns and rows, the total number of data points in the result set grows roughly quadratically with the number of Includes. Let me demonstrate that by taking Root again, now with an additional Children3 collection. If all tables have 5 columns and 100 rows, we get:

One Include (Root + 1 child collection): 10 columns * 100 rows = 1000 data points.

Two Includes (Root + 2 child collections): 15 columns * 200 rows = 3000 data points.

Three Includes (Root + 3 child collections): 20 columns * 300 rows = 6000 data points.

With 12 Includes this would amount to 78000 data points!

Conversely, if you get all records for each table separately instead of 12 Includes, you have 13 * 5 * 100 data points: 6500, less than 10%!
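The figures above can be reproduced with a short calculation. This is only a sketch under the same simplifying assumptions (every table has 5 columns and 100 rows, and every Include targets a child collection); the class and method names are made up for illustration and are not part of EF.

```csharp
using System;

public static class IncludeGrowth
{
    // Data points in the single UNION-ed result set for a given number of
    // child-collection Includes (assumes identically sized tables).
    public static long SingleQuery(int includes, int columnsPerTable = 5, int rowsPerTable = 100)
    {
        long columns = (long)columnsPerTable * (includes + 1); // root + every child table
        long rows = (long)rowsPerTable * includes;             // sum of the child rows
        return columns * rows;
    }

    // Data points when the root and each child table are fetched separately.
    public static long SeparateQueries(int includes, int columnsPerTable = 5, int rowsPerTable = 100)
        => (long)(includes + 1) * columnsPerTable * rowsPerTable;

    public static void Main()
    {
        foreach (var n in new[] { 1, 2, 3, 12 })
        {
            Console.WriteLine($"{n} Include(s): {SingleQuery(n)} vs {SeparateQueries(n)} data points");
        }
    }
}
```

The single-query total works out to 500 * n * (n + 1) for n Includes, while the separate-query total is 500 * (n + 1), which is why the gap widens so quickly.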

Now these numbers are somewhat exaggerated in that many of these data points will be null, so they don't contribute much to the actual size of the result set that is sent to the client. But the query size and the task for the query optimizer certainly get affected negatively by increasing numbers of Includes.

Balance

So using Includes is a delicate balance between the cost of database calls and data volume. It's hard to give a rule of thumb, but by now you can imagine that the data volume generally outgrows the cost of extra calls quickly if there are more than ~3 Includes for child collections (but quite a bit more for parent Includes, which only widen the result set).

Alternative

The alternative to Include is to load data in separate queries:

context.Configuration.LazyLoadingEnabled = false;
var rootId = 1;
context.Children1.Where(c => c.RootId == rootId).Load();
context.Children2.Where(c => c.RootId == rootId).Load();
return context.Roots.Find(rootId);

This loads all required data into the context's cache. During this process, EF executes relationship fixup, by which it auto-populates navigation properties (Root.Children1 etc.) with the loaded entities. The end result is identical to the statement with Includes, except for one important difference: the child collections are not marked as loaded in the entity state manager, so EF will try to trigger lazy loading if you access them. That's why it's important to turn off lazy loading.

In reality, you will have to figure out which combination of Include and Load statements works best for you.

Other aspects to consider

Each Include also increases query complexity, so the database's query optimizer will have to make increasingly more effort to find the best query plan. At some point this may no longer succeed. Also, when some vital indexes are missing (esp. on foreign keys) performance may suffer by adding Includes, even with the best query plan.

Entity Framework Core

Cartesian explosion

For some reason, the behavior described above, UNIONed queries, was abandoned as of EF Core 3. It now builds one query with joins. When the query is "star" shaped [1], this leads to Cartesian explosion (in the SQL result set). I can only find a note announcing this breaking change, but it doesn't say why.

Split queries

To counter this Cartesian explosion, Entity Framework Core 5 introduced the concept of split queries, which enables loading related data in multiple queries. It prevents building one massive, multiplied SQL result set. Also, because of lower query complexity, it may reduce the time it takes to fetch data even with multiple roundtrips. However, it may lead to inconsistent data when concurrent updates occur.
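As a rough illustration of a split query, the sketch below loads one root with two child collections using AsSplitQuery(). The model (Root, Child1, Child2), the DemoContext class and the in-memory SQLite database are assumptions made up for this example, and it presumes the Microsoft.EntityFrameworkCore.Sqlite package, version 5.0 or later, is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Data.Sqlite;
using Microsoft.EntityFrameworkCore;

// Hypothetical "star" shaped model: one root with two 1:n collections.
public class Root
{
    public int Id { get; set; }
    public List<Child1> Children1 { get; set; } = new();
    public List<Child2> Children2 { get; set; } = new();
}
public class Child1 { public int Id { get; set; } public int RootId { get; set; } }
public class Child2 { public int Id { get; set; } public int RootId { get; set; } }

public class DemoContext : DbContext
{
    public DemoContext(DbContextOptions<DemoContext> options) : base(options) { }
    public DbSet<Root> Roots => Set<Root>();
    public DbSet<Child1> Children1 => Set<Child1>();
    public DbSet<Child2> Children2 => Set<Child2>();
}

public static class SplitQueryDemo
{
    public static (int children1, int children2) Run()
    {
        // An in-memory SQLite database lives only while this connection is open.
        using var connection = new SqliteConnection("DataSource=:memory:");
        connection.Open();
        var options = new DbContextOptionsBuilder<DemoContext>().UseSqlite(connection).Options;

        // Seed one root with 3 + 4 children.
        using (var seed = new DemoContext(options))
        {
            seed.Database.EnsureCreated();
            var root = new Root();
            root.Children1.AddRange(Enumerable.Range(0, 3).Select(_ => new Child1()));
            root.Children2.AddRange(Enumerable.Range(0, 4).Select(_ => new Child2()));
            seed.Roots.Add(root);
            seed.SaveChanges();
        }

        // Same Includes as a single query, but EF Core issues one SQL
        // statement per included collection instead of one joined statement.
        using var ctx = new DemoContext(options);
        var loaded = ctx.Roots
            .Include(r => r.Children1)
            .Include(r => r.Children2)
            .AsSplitQuery()
            .ToList()
            .Single();

        return (loaded.Children1.Count, loaded.Children2.Count);
    }
}
```

With a single joined query, this root would come back as 3 * 4 = 12 rows. The split version reads the two collections separately, at the cost of extra roundtrips and the consistency caveat mentioned above.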


[1] Multiple 1:n relationships off of the query root.

Which is the better way to load a single entity with its related data with EF Core in ASP.NET Core?

Eager/lazy is not relevant here, since it's still you who decides the exact time when the requests will be done. Three requests will always be longer than one, so the question is, how often the linked arrays are needed. If they are needed every time, there is definitely no point in splitting the requests. If not... Well, you need to profile it yourself then.

But performance should not be your main concern. Loading everything in one request is simpler and easier to understand. So I would suggest always going with the simpler solution, even if it's not the best performance-wise, and starting to tweak and optimize only if the performance turns out to be too low, and only if profiling shows that bad SQL is the reason for that. Spoiler: it probably won't.

Entity Framework on SQL Server CE - lazy vs eager loading, performance considerations

Open a connection to the database in your application startup code, and leave it open for the lifetime of your app. Do not use this connection for any data access.
That will open the SQL Compact file and load the SQL Compact dll files at startup (and only at startup).

It is unclear if your application is a web app or desktop app, but you can use code similar to this and call it from Application_Start / App_Startup etc.:

public static class ContextHelper
{
    private static ChinookEntities context;
    private static object objLock = new object();

    public static void Open()
    {
        lock (objLock)
        {
            if (context != null)
                throw new InvalidOperationException("Already opened");
            context = new ChinookEntities();
            context.Connection.Open();
        }
    }
}

See the deployment section in my blog post here: http://erikej.blogspot.dk/2011/01/entity-framework-with-sql-server.html


