Writing a large number of records (bulk insert) to Access in .NET/C#

I found that using DAO in a specific manner is roughly 30 times faster than using ADO.NET. I share the code and results in this answer. In all of the tests below, the task is to write out 100 000 records to a table with 20 columns.

A summary of the techniques and times, from best to worst:

  1. 02.8 seconds: Use DAO, use DAO.Field objects to refer to the table columns
  2. 02.8 seconds: Write out to a text file, then use Automation to import the text into Access
  3. 11.0 seconds: Use DAO, use the column index to refer to the table columns
  4. 17.0 seconds: Use DAO, refer to the columns by name
  5. 79.0 seconds: Use ADO.NET, generate INSERT statements for each row
  6. 86.0 seconds: Use ADO.NET, use a DataTable with a DataAdapter for "batch" insert

As background, occasionally I need to perform analysis of reasonably large amounts of data, and I find that Access is the best platform. The analysis involves many queries, and often a lot of VBA code.

For various reasons, I wanted to use C# instead of VBA. The typical way is to use OleDB to connect to Access. I used an OleDbDataReader to grab millions of records, and it worked quite well. But when outputting results to a table, it took a long, long time. Over an hour.

First, let's discuss the two typical ways to write records to Access from C#. Both use OleDB and ADO.NET. The first is to generate INSERT statements one at a time and execute them, which took 79 seconds for the 100 000 records. The code is:

public static double TestADONET_Insert_TransferToAccess()
{
    StringBuilder names = new StringBuilder();
    for (int k = 0; k < 20; k++)
    {
        string fieldName = "Field" + (k + 1).ToString();
        if (k > 0)
        {
            names.Append(",");
        }
        names.Append(fieldName);
    }

    DateTime start = DateTime.Now;
    using (OleDbConnection conn = new OleDbConnection(Properties.Settings.Default.AccessDB))
    {
        conn.Open();
        OleDbCommand cmd = new OleDbCommand();
        cmd.Connection = conn;

        cmd.CommandText = "DELETE FROM TEMP";
        int numRowsDeleted = cmd.ExecuteNonQuery();
        Console.WriteLine("Deleted {0} rows from TEMP", numRowsDeleted);

        for (int i = 0; i < 100000; i++)
        {
            StringBuilder insertSQL = new StringBuilder("INSERT INTO TEMP (")
                .Append(names)
                .Append(") VALUES (");

            for (int k = 0; k < 19; k++)
            {
                insertSQL.Append(i + k).Append(",");
            }
            insertSQL.Append(i + 19).Append(")");
            cmd.CommandText = insertSQL.ToString();
            cmd.ExecuteNonQuery();
        }
        cmd.Dispose();
    }
    double elapsedTimeInSeconds = DateTime.Now.Subtract(start).TotalSeconds;
    Console.WriteLine("Append took {0} seconds", elapsedTimeInSeconds);
    return elapsedTimeInSeconds;
}

Note that I found no method in Access that allows a bulk insert; a possible SQL-only workaround is sketched below.
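
One SQL-only workaround sometimes suggested, which I have not benchmarked here, is to let the Jet/ACE engine ingest a whole text file in a single INSERT ... SELECT via its Text ISAM. A hedged sketch; the folder and file name are hypothetical:

using System.Data.OleDb;

// Untested sketch: the Jet/ACE Text ISAM can pull an entire CSV in one
// statement. C:\temp and data.csv are hypothetical; DATABASE= names the
// folder and the bracketed "table" names the file.
static void BulkInsertFromCsv(string connectionString)
{
    using (var conn = new OleDbConnection(connectionString))
    using (var cmd = conn.CreateCommand())
    {
        conn.Open();
        cmd.CommandText =
            "INSERT INTO TEMP SELECT * FROM " +
            "[Text;FMT=Delimited;HDR=Yes;DATABASE=C:\\temp].[data.csv]";
        cmd.ExecuteNonQuery();
    }
}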

I then thought that using a DataTable with a DataAdapter might prove useful, especially since I thought I could do batch inserts using the UpdateBatchSize property of the DataAdapter. However, apparently only SQL Server and Oracle support that property, and Access does not. This approach took the longest: 86 seconds. The code I used was:

public static double TestADONET_DataTable_TransferToAccess()
{
    StringBuilder names = new StringBuilder();
    StringBuilder values = new StringBuilder();
    DataTable dt = new DataTable("TEMP");
    for (int k = 0; k < 20; k++)
    {
        string fieldName = "Field" + (k + 1).ToString();
        dt.Columns.Add(fieldName, typeof(int));
        if (k > 0)
        {
            names.Append(",");
            values.Append(",");
        }
        names.Append(fieldName);
        values.Append("@" + fieldName);
    }

    DateTime start = DateTime.Now;
    OleDbConnection conn = new OleDbConnection(Properties.Settings.Default.AccessDB);
    conn.Open();
    OleDbCommand cmd = new OleDbCommand();
    cmd.Connection = conn;

    cmd.CommandText = "DELETE FROM TEMP";
    int numRowsDeleted = cmd.ExecuteNonQuery();
    Console.WriteLine("Deleted {0} rows from TEMP", numRowsDeleted);

    OleDbDataAdapter da = new OleDbDataAdapter("SELECT * FROM TEMP", conn);

    da.InsertCommand = new OleDbCommand("INSERT INTO TEMP (" + names.ToString() + ") VALUES (" + values.ToString() + ")");
    for (int k = 0; k < 20; k++)
    {
        string fieldName = "Field" + (k + 1).ToString();
        da.InsertCommand.Parameters.Add("@" + fieldName, OleDbType.Integer, 4, fieldName);
    }
    da.InsertCommand.UpdatedRowSource = UpdateRowSource.None;
    da.InsertCommand.Connection = conn;
    //da.UpdateBatchSize = 0;

    for (int i = 0; i < 100000; i++)
    {
        DataRow dr = dt.NewRow();
        for (int k = 0; k < 20; k++)
        {
            dr["Field" + (k + 1).ToString()] = i + k;
        }
        dt.Rows.Add(dr);
    }
    da.Update(dt);
    conn.Close();

    double elapsedTimeInSeconds = DateTime.Now.Subtract(start).TotalSeconds;
    Console.WriteLine("Append took {0} seconds", elapsedTimeInSeconds);
    return elapsedTimeInSeconds;
}

Then I tried non-standard approaches. First, I wrote out to a text file and then used Automation to import it. This was fast (2.8 seconds) and tied for first place, but I consider it fragile for a number of reasons. Outputting date fields is tricky: I had to format them specially (someDate.ToString("yyyy-MM-dd HH:mm")) and then set up a special "import specification" that encodes this format. The import specification also had to have the "quote" delimiter set correctly. In the example below, with only integer fields, there was no need for an import specification.

Text files are also fragile for "internationalization": commas used as decimal separators, different date formats, and possibly Unicode content (see the invariant-culture sketch below).
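
As a partial mitigation, every value can be written with CultureInfo.InvariantCulture so the output does not change with the machine's regional settings. A minimal sketch, assuming the import specification on the Access side expects these invariant formats (the helper name WriteInvariantRow is made up for illustration):

using System;
using System.Globalization;
using System.IO;

// Illustrative sketch only: culture-invariant formatting for one CSV row.
static void WriteInvariantRow(StreamWriter sw, double amount, DateTime when)
{
    // Always "." as the decimal separator, regardless of the current culture
    sw.Write(amount.ToString(CultureInfo.InvariantCulture));
    sw.Write(",");
    // A fixed date format that the import specification must match
    sw.WriteLine(when.ToString("yyyy-MM-dd HH:mm", CultureInfo.InvariantCulture));
}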

Notice that the first record contains the field names, so that the column order isn't dependent on the table, and that we use Automation to do the actual import of the text file.

public static double TestTextTransferToAccess()
{
    StringBuilder names = new StringBuilder();
    for (int k = 0; k < 20; k++)
    {
        string fieldName = "Field" + (k + 1).ToString();
        if (k > 0)
        {
            names.Append(",");
        }
        names.Append(fieldName);
    }

    DateTime start = DateTime.Now;
    StreamWriter sw = new StreamWriter(Properties.Settings.Default.TEMPPathLocation);

    sw.WriteLine(names);
    for (int i = 0; i < 100000; i++)
    {
        for (int k = 0; k < 19; k++)
        {
            sw.Write(i + k);
            sw.Write(",");
        }
        sw.WriteLine(i + 19);
    }
    sw.Close();

    ACCESS.Application accApplication = new ACCESS.Application();
    string databaseName = Properties.Settings.Default.AccessDB
        .Split(new char[] { ';' }).First(s => s.StartsWith("Data Source=")).Substring(12);

    accApplication.OpenCurrentDatabase(databaseName, false, "");
    accApplication.DoCmd.RunSQL("DELETE FROM TEMP");
    accApplication.DoCmd.TransferText(TransferType: ACCESS.AcTextTransferType.acImportDelim,
                                      TableName: "TEMP",
                                      FileName: Properties.Settings.Default.TEMPPathLocation,
                                      HasFieldNames: true);
    accApplication.CloseCurrentDatabase();
    accApplication.Quit();
    accApplication = null;

    double elapsedTimeInSeconds = DateTime.Now.Subtract(start).TotalSeconds;
    Console.WriteLine("Append took {0} seconds", elapsedTimeInSeconds);
    return elapsedTimeInSeconds;
}

Finally, I tried DAO. Lots of sites give dire warnings about using DAO. However, it turns out that it is simply the best way to interact between Access and .NET, especially when you need to write out a large number of records. Also, it gives access to all the properties of a table. I read somewhere that it's easiest to program transactions using DAO instead of ADO.NET.

Notice that several lines of code are commented out; they are explained after the code.

public static double TestDAOTransferToAccess()
{
    string databaseName = Properties.Settings.Default.AccessDB
        .Split(new char[] { ';' }).First(s => s.StartsWith("Data Source=")).Substring(12);

    DateTime start = DateTime.Now;
    DAO.DBEngine dbEngine = new DAO.DBEngine();
    DAO.Database db = dbEngine.OpenDatabase(databaseName);

    db.Execute("DELETE FROM TEMP");

    DAO.Recordset rs = db.OpenRecordset("TEMP");

    DAO.Field[] myFields = new DAO.Field[20];
    for (int k = 0; k < 20; k++) myFields[k] = rs.Fields["Field" + (k + 1).ToString()];

    //dbEngine.BeginTrans();
    for (int i = 0; i < 100000; i++)
    {
        rs.AddNew();
        for (int k = 0; k < 20; k++)
        {
            //rs.Fields[k].Value = i + k;
            myFields[k].Value = i + k;
            //rs.Fields["Field" + (k + 1).ToString()].Value = i + k;
        }
        rs.Update();
        //if (0 == i % 5000)
        //{
        //    dbEngine.CommitTrans();
        //    dbEngine.BeginTrans();
        //}
    }
    //dbEngine.CommitTrans();
    rs.Close();
    db.Close();

    double elapsedTimeInSeconds = DateTime.Now.Subtract(start).TotalSeconds;
    Console.WriteLine("Append took {0} seconds", elapsedTimeInSeconds);
    return elapsedTimeInSeconds;
}

In this code, we created DAO.Field variables for each column (myFields[k]) and then used them; this took 2.8 seconds. Alternatively, one could refer to the fields by name, as in the commented line rs.Fields["Field" + (k + 1).ToString()].Value = i + k;, which increased the time to 17 seconds. Wrapping that code in a transaction (see the commented lines) dropped it to 14 seconds. Using an integer index, rs.Fields[k].Value = i + k;, dropped it to 11 seconds. Combining the DAO.Field variables (myFields[k]) with a transaction actually took slightly longer, increasing the time from 2.8 to 3.1 seconds.

Lastly, for completeness, all of this code was in a simple static class, and the using statements are:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using ACCESS = Microsoft.Office.Interop.Access; // USED ONLY FOR THE TEXT FILE METHOD
using DAO = Microsoft.Office.Interop.Access.Dao; // USED ONLY FOR THE DAO METHOD
using System.Data; // USED ONLY FOR THE ADO.NET/DataTable METHOD
using System.Data.OleDb; // USED FOR BOTH ADO.NET METHODS
using System.IO; // USED ONLY FOR THE TEXT FILE METHOD

Writing a large number of records takes too much time using Entity Framework

Entity Framework is not a good fit for bulk operations, even when you call Context.SaveChanges() only at the end (or every 100 records or so). Disabling change tracking and validation can help a little:

using (var context = new MyContext())
{
    // This might speed things up a little as well
    context.Configuration.AutoDetectChangesEnabled = false;

    // As might this (if applicable to your situation)
    context.Configuration.ValidateOnSaveEnabled = false;
}
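
If you do stay with Entity Framework, a commonly suggested pattern is to batch the SaveChanges() calls and periodically recreate the context so the change tracker stays small. A rough sketch, assuming an EF6-style context; MyContext and MyEntity are placeholder names:

using System;
using System.Collections.Generic;

// Hedged sketch only: MyContext and MyEntity stand in for your own
// EF6 context and entity types.
static void BulkInsert(IEnumerable<MyEntity> items)
{
    var context = new MyContext();
    context.Configuration.AutoDetectChangesEnabled = false;
    try
    {
        int count = 0;
        foreach (var item in items)
        {
            context.Set<MyEntity>().Add(item);
            if (++count % 100 == 0)
            {
                context.SaveChanges();
                context.Dispose();              // discard the tracked entities
                context = new MyContext();      // fresh, empty change tracker
                context.Configuration.AutoDetectChangesEnabled = false;
            }
        }
        context.SaveChanges();                  // flush the final partial batch
    }
    finally
    {
        context.Dispose();
    }
}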

For best performance, you might want to use something close to the database, like a stored procedure, or, if it has to be in code, use ADO.NET directly.

SqlBulkCopy might be exactly what you want.
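
A minimal SqlBulkCopy sketch, assuming a SQL Server table dbo.SomeTable whose columns match the DataTable built below (the table and column names are illustrative):

using System;
using System.Data;
using System.Data.SqlClient;

// Illustrative sketch: bulk-load 100 000 rows in one WriteToServer call
// instead of 100 000 individual INSERTs. dbo.SomeTable is a made-up name.
static void BulkCopyExample(string connectionString)
{
    var rows = new DataTable();
    rows.Columns.Add("Id", typeof(int));
    rows.Columns.Add("Name", typeof(string));
    for (int i = 0; i < 100000; i++)
    {
        rows.Rows.Add(i, "Name" + i);
    }

    using (var bcp = new SqlBulkCopy(connectionString))
    {
        bcp.DestinationTableName = "dbo.SomeTable";
        bcp.BatchSize = 5000; // send rows to the server in chunks
        bcp.WriteToServer(rows);
    }
}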

How can I bulk insert into an MS-Access table using C# from a gridview control?

My best guess is that dbChequeGrid.Rows[i].Cells[0].Value is null, and since there is no default value it is blowing up.

Something like

    comm.Parameters.AddWithValue("@id", dbChequeGrid.Rows[i].Cells[0].Value == null ? "someDefaultValue" : dbChequeGrid.Rows[i].Cells[0].Value);

should do the trick for you. You could also do some error checking before doing the insert. Something like

for (int i = 0; i < rowsCount; i++)
{
    if (dbChequeGrid.Rows[i].Cells[0].Value != null)
    {
        string StrQuery = "INSERT INTO cust (clientname, clientaddress, clientfathername) VALUES (@id, @CourseName, @Credits)";
        comm.CommandText = StrQuery;
        comm.Parameters.AddWithValue("@id", dbChequeGrid.Rows[i].Cells[0].Value);
        comm.Parameters.AddWithValue("@CourseName", dbChequeGrid.Rows[i].Cells[1].Value);
        comm.Parameters.AddWithValue("@Credits", dbChequeGrid.Rows[i].Cells[2].Value);

        if (comm.ExecuteNonQuery() > 0)
        {
            rowsInserted++;
        }
        comm.Parameters.Clear();
    }
    else
    {
        // do some sort of error handling
    }
} // end of for loop

Either way would prevent NULLs from ever showing up in your @id parameter. I would check all of your parameters to see if they are correct before ever attempting to insert them into the database.

Using LINQ2SQL to insert a large number of records

If you are writing a large volume of homogeneous data, SqlBulkCopy may be a more appropriate tool, for example with CsvReader to read lines (since SqlBulkCopy can accept an IDataReader, you don't have to buffer all 30k lines into memory).

If the data is CSV, this can be as simple as:

using (CsvReader reader = new CsvReader(path))
using (SqlBulkCopy bcp = new SqlBulkCopy(CONNECTION_STRING))
{
    bcp.DestinationTableName = "SomeTable";
    bcp.WriteToServer(reader);
}

If the data is more complex (not CSV), then SimpleDataReader might be useful: you just subclass it and add the code to represent your data per row. A sketch of the underlying idea follows.
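
Since SimpleDataReader's exact API isn't shown here, the sketch below implements IDataReader directly instead; the idea is the same. Only a handful of members do real work (Read, FieldCount, GetValue), the rest throw, and the two-column row shape (an integer Id and a string Name) is purely illustrative:

using System;
using System.Collections.Generic;
using System.Data;

// Illustrative only: a bare-bones IDataReader over in-memory rows, suitable
// for feeding SqlBulkCopy without buffering everything in a DataTable.
sealed class PersonReader : IDataReader
{
    private readonly IEnumerator<KeyValuePair<int, string>> rows;
    public PersonReader(IEnumerable<KeyValuePair<int, string>> source)
    {
        rows = source.GetEnumerator();
    }

    // The members a bulk copy typically exercises:
    public int FieldCount { get { return 2; } }
    public bool Read() { return rows.MoveNext(); }
    public object GetValue(int i)
    {
        return i == 0 ? (object)rows.Current.Key : rows.Current.Value;
    }
    public int GetOrdinal(string name) { return name == "Id" ? 0 : 1; }
    public bool IsDBNull(int i) { return GetValue(i) == null; }
    public object this[int i] { get { return GetValue(i); } }
    public object this[string name] { get { return GetValue(GetOrdinal(name)); } }

    // Housekeeping:
    public void Close() { }
    public void Dispose() { rows.Dispose(); }
    public bool IsClosed { get { return false; } }
    public int Depth { get { return 0; } }
    public int RecordsAffected { get { return -1; } }
    public bool NextResult() { return false; }

    // Not needed for a straight bulk copy; throw if ever called:
    public DataTable GetSchemaTable() { throw new NotSupportedException(); }
    public string GetName(int i) { throw new NotSupportedException(); }
    public string GetDataTypeName(int i) { throw new NotSupportedException(); }
    public Type GetFieldType(int i) { throw new NotSupportedException(); }
    public int GetValues(object[] values) { throw new NotSupportedException(); }
    public bool GetBoolean(int i) { throw new NotSupportedException(); }
    public byte GetByte(int i) { throw new NotSupportedException(); }
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length) { throw new NotSupportedException(); }
    public char GetChar(int i) { throw new NotSupportedException(); }
    public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length) { throw new NotSupportedException(); }
    public Guid GetGuid(int i) { throw new NotSupportedException(); }
    public short GetInt16(int i) { throw new NotSupportedException(); }
    public int GetInt32(int i) { throw new NotSupportedException(); }
    public long GetInt64(int i) { throw new NotSupportedException(); }
    public float GetFloat(int i) { throw new NotSupportedException(); }
    public double GetDouble(int i) { throw new NotSupportedException(); }
    public string GetString(int i) { throw new NotSupportedException(); }
    public decimal GetDecimal(int i) { throw new NotSupportedException(); }
    public DateTime GetDateTime(int i) { throw new NotSupportedException(); }
    public IDataReader GetData(int i) { throw new NotSupportedException(); }
}

An instance of this reader can then be handed straight to bcp.WriteToServer(...), just like the CsvReader above.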

What is the fastest way to insert 100 000 records into an MDB file in C#

If you happen to already have a "numbers table" available (with at least 100,000 rows) then Remou's answer will almost certainly get the job done fastest. I tried a quick test in VBA and the query

Dim t0 As Single
t0 = Timer
CurrentDb.Execute _
        "INSERT INTO tblBooks (Title, Price, Tag, Author) " & _
        "SELECT 'Dummy Text 1', 10, 'Dummy Text 2', 'Dummy Text 3' FROM Numbers", _
        dbFailOnError
Debug.Print Format(Timer - t0, "0.0") & " seconds"

created the 100,000 rows in less than 2 seconds.

However, if you don't already have a numbers table, you would need to create it first, so if this is a one-time requirement you might be better off just optimizing your code. One way to build such a table is sketched below.
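
For reference, here is one way such a table could be built from C# via OleDb, by cross-joining five copies of a ten-row Digits table. This is only a sketch: the Digits/Numbers table names and the d/n column names are made up.

using System.Data.OleDb;

// One-time setup of a 100,000-row Numbers table (values 0..99999).
// All table and column names here are illustrative.
static void CreateNumbersTable(string connectionString)
{
    using (var conn = new OleDbConnection(connectionString))
    using (var cmd = conn.CreateCommand())
    {
        conn.Open();
        cmd.CommandText = "CREATE TABLE Digits (d LONG)";
        cmd.ExecuteNonQuery();
        for (int d = 0; d <= 9; d++)
        {
            cmd.CommandText = "INSERT INTO Digits (d) VALUES (" + d + ")";
            cmd.ExecuteNonQuery();
        }
        cmd.CommandText = "CREATE TABLE Numbers (n LONG)";
        cmd.ExecuteNonQuery();
        // Cross-join five copies of Digits: 10^5 = 100,000 rows.
        cmd.CommandText =
            "INSERT INTO Numbers (n) " +
            "SELECT D1.d + 10*D2.d + 100*D3.d + 1000*D4.d + 10000*D5.d " +
            "FROM Digits AS D1, Digits AS D2, Digits AS D3, Digits AS D4, Digits AS D5";
        cmd.ExecuteNonQuery();
    }
}

After this one-time setup, the INSERT ... SELECT ... FROM Numbers query above can produce the 100,000 rows in a single statement.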

The code as posted in your question took 45 seconds on my machine. Two enhancements that significantly reduced the execution time were:

  1. Use .Prepare(): that alone reduced the elapsed time to 16 seconds

  2. Use an OleDbTransaction: Wrapping the inserts in a transaction (in addition to using .Prepare()) further reduced the elapsed time to 10 seconds.

The modified code looks like this:

var sw = new System.Diagnostics.Stopwatch();
sw.Start();
OleDbConnection con = new OleDbConnection();
string dbProvider = "PROVIDER=Microsoft.Jet.OLEDB.4.0;";
string dbSource = "Data Source = C:/Users/Gord/Desktop/speed.mdb";
con.ConnectionString = dbProvider + dbSource;
con.Open();
OleDbCommand cmd = new OleDbCommand();
cmd.Connection = con;
cmd.CommandText = "INSERT INTO tblBooks (Title, Price, Tag, Author) VALUES (?,?,?,?)";
cmd.Parameters.Add("?", OleDbType.VarWChar, 255);
cmd.Parameters.Add("?", OleDbType.Currency);
cmd.Parameters.Add("?", OleDbType.VarWChar, 255);
cmd.Parameters.Add("?", OleDbType.VarWChar, 255);
cmd.Prepare();
cmd.Parameters[0].Value = "Dummy Text 1";
cmd.Parameters[1].Value = 10;
cmd.Parameters[2].Value = "Dummy Text 2";
cmd.Parameters[3].Value = "Dummy Text 3";
OleDbTransaction trn = con.BeginTransaction();
cmd.Transaction = trn;
for (int i = 0; i < 100000; i++)
{
    cmd.ExecuteNonQuery();
}
trn.Commit();
con.Close();
sw.Stop();
Console.WriteLine(String.Format("{0:0.0} seconds", sw.ElapsedMilliseconds / 1000.0));

Insert 2 million rows into SQL Server quickly

You can try the SqlBulkCopy class, which "lets you efficiently bulk load a SQL Server table with data from another source."

There is a cool blog post about how you can use it.


