Why is inserting entities in EF 4.1 so slow compared to ObjectContext?
As already indicated by Ladislav in the comment, you need to disable automatic change detection to improve performance:
context.Configuration.AutoDetectChangesEnabled = false;
This change detection is enabled by default in the DbContext API.
The reason why the DbContext API behaves so differently from the ObjectContext API is that many more functions of the DbContext API call DetectChanges internally when automatic change detection is enabled.
Here is a list of the functions which call DetectChanges by default:
- The Add, Attach, Find, Local, or Remove members on DbSet
- The GetValidationErrors, Entry, or SaveChanges members on DbContext
- The Entries method on DbChangeTracker
In particular, Add calls DetectChanges, which is responsible for the poor performance you experienced.
In contrast to this, the ObjectContext API calls DetectChanges automatically only in SaveChanges, but not in AddObject and the other corresponding methods mentioned above. That's the reason why the default performance of ObjectContext is faster.
Why did they introduce this default automatic change detection in DbContext in so many functions? I am not sure, but it seems that disabling it and calling DetectChanges manually at the proper points is considered advanced and can easily introduce subtle bugs into your application, so use it with care.
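Putting the advice above together, the safe pattern is to disable detection only for the bulk operation, call DetectChanges once by hand, and re-enable it afterwards. A minimal sketch, assuming a hypothetical MyContext with a Users DbSet:

```csharp
using (var context = new MyContext())
{
    // Turn off automatic change detection for the bulk operation
    context.Configuration.AutoDetectChangesEnabled = false;
    try
    {
        foreach (var user in newUsers)
        {
            // Add no longer triggers DetectChanges on every call
            context.Users.Add(user);
        }
        // With auto-detection off, SaveChanges won't detect changes
        // for us, so call DetectChanges once, manually, before saving
        context.ChangeTracker.DetectChanges();
        context.SaveChanges();
    }
    finally
    {
        // Re-enable detection so the rest of the app behaves normally
        context.Configuration.AutoDetectChangesEnabled = true;
    }
}
```

The try/finally guarantees the flag is restored even if SaveChanges throws, which avoids the subtle bugs mentioned above.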
DbContext is very slow when adding and deleting
Try to add this to your DbContext tests:
dbContext.Configuration.AutoDetectChangesEnabled = false;
// Now do all your changes
dbContext.ChangeTracker.DetectChanges();
dbContext.SaveChanges();
and try to run your tests again.
There was an architectural change in the DbContext API: it checks for changes in entities every time you Add, Attach or Delete anything from the context. In the ObjectContext API this detection ran only when you triggered SaveChanges. This is a better solution for the most common scenarios, but it requires special handling for mass data processing.
Adding an object to the entity framework context takes about 1.5 seconds
As requested in the comments (in case this isn't closed as a duplicate), the slowdown was related to automatic change detection, which is on by default in the DbContext API.
To disable automatic change detection:
context.Configuration.AutoDetectChangesEnabled = false;
A much more complete/full description (which I certainly can't better here) can be found in this accepted answer:
Why is inserting entities in EF 4.1 so slow compared to ObjectContext?
What causes .Attach() to be slow in EF4?
I can confirm this slow behaviour and I also found the main reason. I've made a little test with the following model ...
public class MyClass
{
public int Id { get; set; }
public string P1 { get; set; }
// ... properties P2 to P49, all of type string
public string P50 { get; set; }
}
public class MyContext : DbContext
{
public DbSet<MyClass> MyClassSet { get; set; }
}
... and this test program ...
using (var context = new MyContext())
{
var list = new List<MyClass>();
for (int i = 0; i < 1000; i++)
{
var m = new MyClass()
{
Id = i+1,
P1 = "Some text ....................................",
// ... initialize P2 to P49, all with the same text
P50 = "Some text ...................................."
};
list.Add(m);
}
Stopwatch watch = new Stopwatch();
watch.Start();
foreach (var entity in list)
{
context.Set<MyClass>().Attach(entity);
context.Entry(entity).State = System.Data.EntityState.Modified;
}
watch.Stop();
long time = watch.ElapsedMilliseconds;
}
Test 1
Exactly the code above:
--> time = 29.2 sec
Test 2
Comment out the line ...
//context.Entry(entity).State = System.Data.EntityState.Modified;
--> time = 15.3 sec
Test 3
Comment out the line ...
//context.Set<MyClass>().Attach(entity);
--> time = 57.3 sec
This result is very strange, because I expected that calling Attach would not be necessary, since changing the state attaches the entity anyway.
Test 4
Remove properties P6 to P50 (so we only have 5 strings in the entity), original code:
--> time = 3.4 sec
So, yes, obviously the number of properties strongly matters.
Test 5
Add the following line before the loop (model again with all 50 properties):
context.Configuration.AutoDetectChangesEnabled = false;
--> time = 1.4 sec
Test 6
Again with AutoDetectChangesEnabled = false
but with only 5 properties:
--> time = 1.3 sec
So, without change tracking the number of properties doesn't matter so much anymore.
Conclusion
By far most of the time seems to be spent taking the snapshot of the attached object's properties for the change tracking mechanism. If you don't need it, disable change tracking for this code snippet. (I guess in your code you really don't need change tracking, because by setting the entity's state to Modified you basically mark all properties as changed anyway, so all columns get sent to the database in an update statement.)
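Applied to the test code itself, the conclusion looks like this. A sketch, assuming the MyClass/MyContext model and the list from the test program above:

```csharp
using (var context = new MyContext())
{
    // Disable snapshot change tracking: Attach and Entry no longer
    // take a property snapshot of every entity (Test 5 scenario)
    context.Configuration.AutoDetectChangesEnabled = false;

    foreach (var entity in list)
    {
        context.Set<MyClass>().Attach(entity);
        // Marks ALL properties as modified; a full UPDATE statement is
        // sent anyway, so per-property change tracking adds no value here
        context.Entry(entity).State = System.Data.EntityState.Modified;
    }
    context.SaveChanges();
}
```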
Edit
The test times above are in Debug mode, but Release mode doesn't make a big difference (for instance: Test 1 = 28.7 sec, Test 5 = 0.9 sec).
Inserting many rows with Entity Framework is extremely slow
One easy method is by using the EntityFramework.BulkInsert extension.
You can then do:
// Add all workers to database
var workforce = allWorkers.Values
.Select(i => new Worker
{
Reference = i.EMPLOYEE_REF,
Skills = i.GetSkills().Select(s => dbSkills[s]).ToArray(),
DefaultRegion = "wa",
DefaultEfficiency = i.TECH_EFFICIENCY
});
db.BulkInsert(workforce);
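If you are on EF6 and would rather avoid a third-party extension, DbSet.AddRange is a built-in alternative: it calls DetectChanges only once for the whole collection instead of once per entity, as Add does. A sketch, assuming a hypothetical Workers set on the context (note it still issues one INSERT per row, so a true bulk insert remains faster):

```csharp
// EF6+: DetectChanges runs once for the whole batch,
// not once per entity as with repeated Add calls
db.Workers.AddRange(workforce);
db.SaveChanges();
```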
Entity Framework is Too Slow. What are my options?
You should start by profiling the SQL commands actually issued by the Entity Framework. Depending on your configuration (POCO, self-tracking entities) there is a lot of room for optimization. You can inspect the SQL commands (which shouldn't differ between debug and release mode) using the ObjectSet<T>.ToTraceString() method. If you encounter a query that requires further optimization, you can use projections to give EF more information about what you are trying to accomplish.
Example:
Product product = db.Products.SingleOrDefault(p => p.Id == 10);
// executes SELECT * FROM Products WHERE Id = 10
ProductDto dto = new ProductDto();
foreach (Category category in product.Categories)
// executes SELECT * FROM Categories WHERE ProductId = 10
{
dto.Categories.Add(new CategoryDto { Name = category.Name });
}
Could be replaced with:
var query = from p in db.Products
where p.Id == 10
select new
{
p.Name,
Categories = from c in p.Categories select c.Name
};
ProductDto dto = new ProductDto();
foreach (var categoryName in query.Single().Categories)
// Executes SELECT p.Id, c.Name FROM Products as p, Categories as c WHERE p.Id = 10 AND p.Id = c.ProductId
{
dto.Categories.Add(new CategoryDto { Name = categoryName });
}
I just typed that out of my head, so this isn't exactly how it would be executed, but EF actually does some nice optimizations if you tell it everything you know about the query (in this case, that we will need the category-names). But this isn't like eager-loading (db.Products.Include("Categories")) because projections can further reduce the amount of data to load.
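For comparison, the eager-loading variant mentioned above would be a one-liner; it avoids the second query but pulls every column of both tables, which is what the projection avoids:

```csharp
// Eager loading: a single query, but ALL columns of Products
// and Categories are materialized, not just the names we need
Product product = db.Products
    .Include("Categories")
    .SingleOrDefault(p => p.Id == 10);
```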
Fastest Way of Inserting in Entity Framework
To your remark in the comments to your question:
"...SavingChanges (for each record)..."
That's the worst thing you can do! Calling SaveChanges() for each record slows bulk inserts down extremely. I would run a few simple tests which will very likely improve the performance:
- Call SaveChanges() once after ALL records.
- Call SaveChanges() after, for example, 100 records.
- Call SaveChanges() after, for example, 100 records, then dispose the context and create a new one.
- Disable automatic change detection.
For bulk inserts I am working and experimenting with a pattern like this:
using (TransactionScope scope = new TransactionScope())
{
MyDbContext context = null;
try
{
context = new MyDbContext();
context.Configuration.AutoDetectChangesEnabled = false;
int count = 0;
foreach (var entityToInsert in someCollectionOfEntitiesToInsert)
{
++count;
context = AddToContext(context, entityToInsert, count, 100, true);
}
context.SaveChanges();
}
finally
{
if (context != null)
context.Dispose();
}
scope.Complete();
}
private MyDbContext AddToContext(MyDbContext context,
Entity entity, int count, int commitCount, bool recreateContext)
{
context.Set<Entity>().Add(entity);
if (count % commitCount == 0)
{
context.SaveChanges();
if (recreateContext)
{
context.Dispose();
context = new MyDbContext();
context.Configuration.AutoDetectChangesEnabled = false;
}
}
return context;
}
I have a test program which inserts 560,000 entities (9 scalar properties, no navigation properties) into the DB. With this code it completes in less than 3 minutes.
For the performance it is important to call SaveChanges() after "many" records ("many" meaning around 100 or 1,000). It also improves performance to dispose the context after SaveChanges and create a new one. This clears the context of all entities; SaveChanges doesn't do that, and the entities are still attached to the context in state Unchanged. It is the growing number of attached entities in the context that slows down the insertion step by step. So, it is helpful to clear the context after some time.
Here are a few measurements for my 560,000 entities:
- commitCount = 1, recreateContext = false: many hours (That's your current procedure)
- commitCount = 100, recreateContext = false: more than 20 minutes
- commitCount = 1000, recreateContext = false: 242 sec
- commitCount = 10000, recreateContext = false: 202 sec
- commitCount = 100000, recreateContext = false: 199 sec
- commitCount = 1000000, recreateContext = false: out of memory exception
- commitCount = 1, recreateContext = true: more than 10 minutes
- commitCount = 10, recreateContext = true: 241 sec
- commitCount = 100, recreateContext = true: 164 sec
- commitCount = 1000, recreateContext = true: 191 sec
The behaviour in the first test above is that the performance is very non-linear and decreases extremely over time. ("Many hours" is an estimate; I never finished this test, I stopped at 50,000 entities after 20 minutes.) This non-linear behaviour is not as significant in the other tests.