Fastest Way of Inserting in Entity Framework

To your remark in the comments to your question:

"...SavingChanges (for each record)..."

That's the worst thing you can do! Calling SaveChanges() for each record slows bulk inserts down enormously. I would run a few simple tests, any of which will very likely improve performance:

  • Call SaveChanges() once after ALL records.
  • Call SaveChanges() after, for example, every 100 records.
  • Call SaveChanges() after, for example, every 100 records, then dispose the context and create a new one.
  • Disable change detection.

For bulk inserts I am working and experimenting with a pattern like this:

using (TransactionScope scope = new TransactionScope())
{
    MyDbContext context = null;
    try
    {
        context = new MyDbContext();
        // Skip the automatic change detection scan on every Add().
        context.Configuration.AutoDetectChangesEnabled = false;

        int count = 0;
        foreach (var entityToInsert in someCollectionOfEntitiesToInsert)
        {
            ++count;
            // AddToContext may return a fresh context, so reassign it.
            context = AddToContext(context, entityToInsert, count, 100, true);
        }

        // Save whatever remains after the last full batch.
        context.SaveChanges();
    }
    finally
    {
        if (context != null)
            context.Dispose();
    }

    scope.Complete();
}

private MyDbContext AddToContext(MyDbContext context,
    Entity entity, int count, int commitCount, bool recreateContext)
{
    context.Set<Entity>().Add(entity);

    // Commit every commitCount entities.
    if (count % commitCount == 0)
    {
        context.SaveChanges();
        if (recreateContext)
        {
            // A new context starts with no attached entities.
            context.Dispose();
            context = new MyDbContext();
            context.Configuration.AutoDetectChangesEnabled = false;
        }
    }

    return context;
}

I have a test program which inserts 560,000 entities (9 scalar properties, no navigation properties) into the DB. With this code, the insert completes in less than 3 minutes.

For performance it is important to call SaveChanges() after "many" records ("many" being around 100 or 1000). It also improves performance to dispose the context after SaveChanges() and create a new one. This clears all entities from the context. SaveChanges() doesn't do that: the entities remain attached to the context in state Unchanged. It is the growing number of attached entities in the context that slows the insertion down step by step, so it helps to clear it from time to time.

Here are a few measurements for my 560,000 entities:

  • commitCount = 1, recreateContext = false: many hours (That's your current procedure)
  • commitCount = 100, recreateContext = false: more than 20 minutes
  • commitCount = 1000, recreateContext = false: 242 sec
  • commitCount = 10000, recreateContext = false: 202 sec
  • commitCount = 100000, recreateContext = false: 199 sec
  • commitCount = 1000000, recreateContext = false: out of memory exception
  • commitCount = 1, recreateContext = true: more than 10 minutes
  • commitCount = 10, recreateContext = true: 241 sec
  • commitCount = 100, recreateContext = true: 164 sec
  • commitCount = 1000, recreateContext = true: 191 sec

In the first test above, performance is very non-linear and degrades extremely over time. ("Many hours" is an estimate; I never finished this test and stopped at 50,000 entities after 20 minutes.) This non-linear behaviour is less significant in all the other tests.

Bulk Insert with Entity Framework 6

You can use the following library:

https://github.com/MikaelEliasson/EntityFramework.Utilities

It works well for simple bulk inserts and updates.

You should also look at the following post if you want to find out about other options to achieve bulk insert:

Fastest Way of Inserting in Entity Framework

How to increase insert speed using Bulk insert using AddRange and then SaveChanges in Entity Framework

It doesn't matter whether you use Add in a foreach or AddRange. The problem lies in the SaveChanges method, as it stores the changes in tracked entities one by one, I think. There are libraries that allow a real bulk insert of entities, using SqlBulkCopy under the hood.

Link to EF Core library: EFCore.BulkExtensions

EDIT:
For EF6 I found this NuGet package: EntityFramework6.BulkInsert, but I haven't used it personally, so I can't say anything about it.

EDIT 2: I simplified this. Using AddRange over Add will reduce the time needed to add entities to the change tracker, but SaveChanges can still take a very long time, so it's not a solution.
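
To give an idea of what such libraries do under the hood, here is a minimal sketch of a direct SqlBulkCopy insert. It is not the code of any of the libraries mentioned. The Person class, the dbo.People table, and its column names are hypothetical, and the sketch assumes System.Data.SqlClient (Microsoft.Data.SqlClient offers the same types on newer .NET). A real library also has to deal with mappings, identity values, and transactions.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Hypothetical entity, used for illustration only.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class BulkInsertSketch
{
    // Copies the entities into a DataTable whose columns mirror the target table.
    public static DataTable ToDataTable(IEnumerable<Person> people)
    {
        var table = new DataTable();
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Age", typeof(int));
        foreach (var person in people)
        {
            table.Rows.Add(person.Name, person.Age);
        }
        return table;
    }

    // Streams all rows to the server without going through the EF change tracker.
    public static void BulkInsert(string connectionString, IEnumerable<Person> people)
    {
        using (var bulkCopy = new SqlBulkCopy(connectionString))
        {
            bulkCopy.DestinationTableName = "dbo.People"; // hypothetical table name
            bulkCopy.BatchSize = 1000;                    // rows sent per batch
            bulkCopy.ColumnMappings.Add("Name", "Name");
            bulkCopy.ColumnMappings.Add("Age", "Age");
            bulkCopy.WriteToServer(ToDataTable(people));
        }
    }
}
```

Because the rows bypass the context entirely, nothing inserted this way is tracked by EF afterwards.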

Improving bulk insert performance in Entity framework

There is room for several improvements (if you are using DbContext):

Set:

yourContext.Configuration.AutoDetectChangesEnabled = false;
yourContext.Configuration.ValidateOnSaveEnabled = false;

Call SaveChanges() in batches of 100 inserts, or try batches of 1000 items and see how performance changes.

Since the context stays the same during all these inserts and keeps getting bigger, you can rebuild your context object every 1000 inserts: var yourContext = new YourContext(); I think this is the big gain.

Making these improvements in a data import process of mine took it from 7 minutes to 6 seconds.

The best numbers may not be 100 or 1000 in your case. Try it and tweak it.
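
The batching and context rebuilding can be sketched roughly like this. InBatches is a hypothetical helper, not part of Entity Framework, and YourContext and YourEntity in the usage comment are placeholders for your own types. The batch size of 1000 is only a starting point.

```csharp
using System;
using System.Collections.Generic;

public static class BatchingSketch
{
    // Splits a sequence into consecutive lists of at most batchSize items.
    // This is an iterator, so the argument check runs when enumeration starts.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Return the final, possibly smaller, batch.
        if (batch.Count > 0) yield return batch;
    }
}

// Usage with EF6 (YourContext and YourEntity are placeholders):
//
// foreach (var batch in BatchingSketch.InBatches(entities, 1000))
// {
//     using (var yourContext = new YourContext()) // fresh context per batch
//     {
//         yourContext.Configuration.AutoDetectChangesEnabled = false;
//         yourContext.Configuration.ValidateOnSaveEnabled = false;
//         yourContext.Set<YourEntity>().AddRange(batch);
//         yourContext.SaveChanges();
//     }
// }
```

Creating the context inside the loop means each SaveChanges() only ever sees one batch of tracked entities.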

Insert huge number of rows into database using Entity Framework

When I add my seeding method to Configuration.cs and run update-database command it takes less than 5 minutes to insert all rows.

It works best when AddRange() is called only once.

dbContext.Configuration.AutoDetectChangesEnabled = false;
dbContext.Configuration.ValidateOnSaveEnabled = false;
dbContext.ReportData.AddRange(recordsList);
dbContext.SaveChanges();

Entity Framework insertion performance

All common tricks like:

  • AutoDetectChangesEnabled = false
  • Use AddRange over Add
  • Etc.

will not work, as you have already noticed, because the performance problem is not within Entity Framework but with SQL Azure.

SQL Azure may look pretty cool at first, but it's slow as hell unless you pay for a very good Premium Database Tier.

As Evk recommended, you should try executing a simple SQL command like "SELECT 1". You will probably notice that it takes more than 100ms, which is ridiculously slow.
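
One way to measure that round-trip is sketched below, assuming System.Data.SqlClient and a connection string of your own. The class and method names are made up for this example. Opening the connection and a first warm-up call are deliberately left out of the timing.

```csharp
using System;
using System.Data.SqlClient;
using System.Diagnostics;

public static class RoundTripSketch
{
    // Runs the action once and returns how long it took.
    public static TimeSpan Time(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Times a single "SELECT 1" round-trip on an already open connection.
    public static TimeSpan MeasureSelectOne(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open(); // connection setup is not part of the measurement
            using (var command = new SqlCommand("SELECT 1", connection))
            {
                command.ExecuteScalar(); // warm-up call, not timed
                return Time(() => command.ExecuteScalar());
            }
        }
    }
}
```

If the returned value is consistently in the hundreds of milliseconds, the database tier or the network is the bottleneck, and no Entity Framework setting will change that.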

Solution:

  • Move to a better SQL Azure Tier
  • Move away from SQL Azure

Disclaimer: I'm the owner of the project Entity Framework Extensions

Another solution is to use this library, which batches multiple queries/bulk operations. However, even though this library is very fast, you will still need a better SQL Azure Tier, since it looks like every database round-trip takes more than 200ms in your case.

Fastest way of inserting many parent and child records

Disclaimer: I'm the owner of the project Entity Framework Extensions

Here is the fastest way of inserting, updating, deleting, and merging. You can even make it easier and use BulkSaveChanges over SaveChanges.

// Using BulkSaveChanges
using (var db = new MyDBContext())
{
    db.ScenarioCategory.AddRange(categories);
    db.BulkSaveChanges();
}

// Using BulkInsert on parent then child
using (var db = new MyDBContext())
{
    db.BulkInsert(categories);
    db.BulkInsert(categories.SelectMany(x => x.Items));
}

