What's the Fastest Way to Bulk Insert a Lot of Data in SQL Server (C# Client)

Insert 2 million rows into SQL Server quickly

You can try the SqlBulkCopy class.

Lets you efficiently bulk load a SQL Server table with data from
another source.

There is a cool blog post about how you can use it.
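
For illustration, a minimal sketch of that usage (System.Data / System.Data.SqlClient); the destination table, column names, and connectionString are placeholders you would adapt to your schema:

// A minimal sketch, assuming a dbo.Target table with matching Id and Name columns.
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Name", typeof(string));
// ... fill the DataTable with your 2 million rows ...

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "dbo.Target";
        bulk.BatchSize = 10000;      // rows sent to the server per batch
        bulk.BulkCopyTimeout = 0;    // no timeout for large loads
        bulk.WriteToServer(table);
    }
}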

Best/fastest way to bulk insert data into a sql server database for load testing?

You can speed this exact code up by wrapping a transaction around the loop. That way SQL Server does not have to harden the log to disk on each iteration (possibly multiple times depending on how often you issue a DML statement in that proc).
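
As a sketch of that from the C# side, assuming a hypothetical single-row stored procedure dbo.InsertRow; the point is that the synchronous log flush happens once at Commit instead of once per statement:

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var tran = conn.BeginTransaction())
    using (var cmd = new SqlCommand("dbo.InsertRow", conn, tran))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        var p = cmd.Parameters.Add("@Value", SqlDbType.Int);

        for (int i = 1; i <= 10000; i++)
        {
            p.Value = i;
            cmd.ExecuteNonQuery();   // log records accumulate; the flush wait happens at Commit
        }

        tran.Commit();
    }
}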

That said, the fastest way to go is to insert all records at once. Something like

INSERT INTO Target
SELECT someComputedColumns
FROM Numbers n
WHERE n.ID <= 10000

This should execute in well under one second for typical cases. It breaks the encapsulation of using that procedure, though.

Most performant way to insert thousands of rows of data into Azure SQL DB?

How else can I optimize the performance?

SqlBulkCopy is the best. You can load a DataTable with in-memory data, or use an adapter like this to convert a collection of in-memory objects to an IDataReader for use with SqlBulkCopy.

You can also send each batch as a JSON document in a parameter to a SQL query, where you read it with OPENJSON.

Both of these should be faster than single-row inserts.
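
A rough sketch of the JSON route, assuming SQL Server 2016+ and made-up table/column names (System.Text.Json for serialization):

string json = JsonSerializer.Serialize(batch);   // batch: your in-memory collection for this round trip

const string sql = @"
    INSERT INTO dbo.Target (Id, Name)
    SELECT Id, Name
    FROM OPENJSON(@json)
         WITH (Id int '$.Id', Name nvarchar(100) '$.Name');";

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(sql, conn))
{
    cmd.Parameters.Add("@json", SqlDbType.NVarChar, -1).Value = json;
    conn.Open();
    cmd.ExecuteNonQuery();
}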

Client-side loading methods in (rough) order of slowest to fastest are:

  • single-row inserts, no transaction
  • single-row inserts, with transaction
  • single-row inserts with TSQL batching
  • single-row inserts with TDS batching (SqlDataAdapter)
  • bulk insert with XML or JSON
  • bulk insert with Table-valued Parameters (see the sketch after this list)
  • bulk insert with SqlBulkCopy
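
For reference, a minimal Table-valued Parameter sketch; it assumes a user-defined table type already exists on the server, e.g. CREATE TYPE dbo.TargetType AS TABLE (Id int, Name nvarchar(100)):

var table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Name", typeof(string));
// ... fill the DataTable from your in-memory objects ...

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "INSERT INTO dbo.Target (Id, Name) SELECT Id, Name FROM @rows;", conn))
{
    var p = cmd.Parameters.AddWithValue("@rows", table);
    p.SqlDbType = SqlDbType.Structured;   // marks the parameter as a TVP
    p.TypeName = "dbo.TargetType";        // the user-defined table type
    conn.Open();
    cmd.ExecuteNonQuery();
}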

Fastest way of performing Bulk Update in C# / .NET

The fastest way would be to bulk insert the data into a temporary table using the built-in SqlBulkCopy class, and then update using a join to that table.
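
A sketch of that pattern with hypothetical names (a dbo.Items table keyed on ItemID, and an itemsTable DataTable holding the new quantities):

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    // 1. Stage the new values in a temp table on the same connection.
    using (var create = new SqlCommand(
        "CREATE TABLE #ItemsStaging (ItemID int PRIMARY KEY, QuantitySold int);", conn))
    {
        create.ExecuteNonQuery();
    }

    using (var bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "#ItemsStaging";
        bulk.WriteToServer(itemsTable);   // DataTable with ItemID and QuantitySold columns
    }

    // 2. Update the real table with a join to the staged rows.
    using (var update = new SqlCommand(@"
        UPDATE i
        SET    i.QuantitySold = s.QuantitySold
        FROM   dbo.Items i
        JOIN   #ItemsStaging s ON s.ItemID = i.ItemID;", conn))
    {
        update.ExecuteNonQuery();
    }
}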

Or you can use a tool such as SqlBulkTools which does exactly this in an easy way.

var bulk = new BulkOperations();

using (TransactionScope trans = new TransactionScope())
{
    using (SqlConnection conn = new SqlConnection(
        "Data Source=.;Initial Catalog=mydb;Integrated Security=SSPI"))
    {
        bulk.Setup()
            .ForCollection(items)
            .WithTable("Items")
            .AddColumn(x => x.QuantitySold)
            .BulkUpdate()
            .MatchTargetOn(x => x.ItemID)
            .Commit(conn);
    }

    trans.Complete();
}

Fastest way to insert 1 million rows in SQL Server

I think what you are looking for is BULK INSERT, if you prefer using SQL.

Or there is also the ADO.NET batch operations option, so you keep the logic in your C# application. This article is also very complete.
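
The ADO.NET batch-operations route looks roughly like this; it is a sketch with placeholder table and column names, and the batching happens because UpdateBatchSize makes SqlDataAdapter send many INSERTs per round trip:

var table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Name", typeof(string));
// ... add your rows to the DataTable; new rows are in the Added state ...

using (var conn = new SqlConnection(connectionString))
using (var adapter = new SqlDataAdapter())
{
    var insert = new SqlCommand(
        "INSERT INTO dbo.Target (Id, Name) VALUES (@Id, @Name);", conn);
    insert.Parameters.Add("@Id", SqlDbType.Int, 0, "Id");
    insert.Parameters.Add("@Name", SqlDbType.NVarChar, 100, "Name");
    insert.UpdatedRowSource = UpdateRowSource.None;   // required when batching

    adapter.InsertCommand = insert;
    adapter.UpdateBatchSize = 1000;                   // rows per round trip

    conn.Open();
    adapter.Update(table);                            // sends the Added rows in batches
}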

Update

Yes, I'm afraid BULK INSERT will only work with imported files (read from the database server's side).

I have experience from a Java project where we needed to insert millions of rows (the data came from outside the application, by the way).

The database was Oracle, so of course we used Oracle's multi-row insert. It turned out that the Java batch update was much faster than Oracle's multi-valued insert (so-called "bulk updates").

My suggestion is:

  • Compare the performance of the SQL Server multi-value insert (then you can read the data from inside your database, in a procedure if you like) against the ADO.NET batch insert.

If the data you are going to manipulate is coming from outside your application (if it is not already in the database), I would say just go for the ADO.NET batch inserts. I think that is your case.

Note: Keep in mind that batch inserts usually operate with the same query. That is what makes them so fast.

Fastest way to insert many records in the database

Check out SqlBulkCopy.

It's designed for fast insertion of bulk data. I've found it to be fastest when using the TableLock option and setting a BatchSize of around 10,000, but it's best to test the different scenarios with your own data.
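
For illustration, those two settings look like this (destination name and connection string are placeholders):

using (var bulk = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
{
    bulk.DestinationTableName = "dbo.Target";
    bulk.BatchSize = 10000;        // tune this against your own data
    bulk.BulkCopyTimeout = 0;
    bulk.WriteToServer(reader);    // any IDataReader or DataTable source
}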

You may also find the following useful.

SQLBulkCopy Performance Analysis

Fastest Way of Inserting in Entity Framework

To your remark in the comments to your question:

"...SavingChanges (for each
record
)..."

That's the worst thing you can do! Calling SaveChanges() for each record slows bulk inserts down dramatically. I would try a few simple tests which will very likely improve the performance:

  • Call SaveChanges() once after ALL records.
  • Call SaveChanges() after for example 100 records.
  • Call SaveChanges() after for example 100 records and dispose the context and create a new one.
  • Disable change detection

For bulk inserts I am working and experimenting with a pattern like this:

using (TransactionScope scope = new TransactionScope())
{
    MyDbContext context = null;
    try
    {
        context = new MyDbContext();
        context.Configuration.AutoDetectChangesEnabled = false;   // speeds up Add() for many entities

        int count = 0;
        foreach (var entityToInsert in someCollectionOfEntitiesToInsert)
        {
            ++count;
            context = AddToContext(context, entityToInsert, count, 100, true);
        }

        context.SaveChanges();
    }
    finally
    {
        if (context != null)
            context.Dispose();
    }

    scope.Complete();
}

private MyDbContext AddToContext(MyDbContext context,
    Entity entity, int count, int commitCount, bool recreateContext)
{
    context.Set<Entity>().Add(entity);

    // Commit every commitCount entities and optionally recreate the context
    // to keep the change tracker small.
    if (count % commitCount == 0)
    {
        context.SaveChanges();
        if (recreateContext)
        {
            context.Dispose();
            context = new MyDbContext();
            context.Configuration.AutoDetectChangesEnabled = false;
        }
    }

    return context;
}

I have a test program which inserts 560,000 entities (9 scalar properties, no navigation properties) into the DB. With this code it runs in less than 3 minutes.

For the performance it is important to call SaveChanges() after "many" records ("many" meaning around 100 or 1,000). It also improves performance to dispose the context after SaveChanges and create a new one. This clears the context of all entities; SaveChanges doesn't do that, as the entities remain attached to the context in state Unchanged. It is the growing number of attached entities in the context that slows down insertion step by step, so it is helpful to clear it after some time.

Here are a few measurements for my 560,000 entities:

  • commitCount = 1, recreateContext = false: many hours (That's your current procedure)
  • commitCount = 100, recreateContext = false: more than 20 minutes
  • commitCount = 1000, recreateContext = false: 242 sec
  • commitCount = 10000, recreateContext = false: 202 sec
  • commitCount = 100000, recreateContext = false: 199 sec
  • commitCount = 1000000, recreateContext = false: out of memory exception
  • commitCount = 1, recreateContext = true: more than 10 minutes
  • commitCount = 10, recreateContext = true: 241 sec
  • commitCount = 100, recreateContext = true: 164 sec
  • commitCount = 1000, recreateContext = true: 191 sec

The behaviour in the first test above is that the performance is very non-linear and degrades severely over time. ("Many hours" is an estimate; I never finished this test, I stopped at 50,000 entities after 20 minutes.) This non-linear behaviour is not as significant in any of the other tests.

What would be the fastest way to insert 2.5 million rows of data parsed from a text file, into Sql server

SqlBulkCopy. Read about it. In the documentation.

Faster (because SqlBulkCopy is not really written smartly) is to load into a temp/staging table first, then at the end insert from it into the final table. SqlBulkCopy locks the whole table; staging bypasses that and allows the final table to be used during the upload.

Then use multiple threads, each inserting blocks of well over 10,000 rows per go.

I manage more than 100,000 rows per second on a lower-end database server (that is 48 GB of memory and about a dozen SAS discs; and yes, that is lower end).
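
A rough sketch of that staging-plus-threads idea; the Chunk and ToDataTable helpers, the degree of parallelism, and the table names are all hypothetical, and a real (not #temp) staging table is used so that multiple connections can load into it:

// Load chunks in parallel into a permanent staging table...
Parallel.ForEach(Chunk(rows, 50000), chunk =>         // Chunk: hypothetical helper splitting the parsed rows
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var bulk = new SqlBulkCopy(conn))
        {
            bulk.DestinationTableName = "dbo.TargetStaging";
            bulk.BatchSize = 50000;
            bulk.WriteToServer(ToDataTable(chunk));   // ToDataTable: hypothetical conversion helper
        }
    }
});

// ...then move everything into the final table in one statement.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "INSERT INTO dbo.Target SELECT * FROM dbo.TargetStaging;", conn))
{
    conn.Open();
    cmd.ExecuteNonQuery();
}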


