Best /Fastest Way to Read an Excel Sheet into a Datatable

Best /Fastest way to read an Excel Sheet into a DataTable?

I have always used OLEDB for this, something like...

    Dim sSheetName As String
Dim sConnection As String
Dim dtTablesList As DataTable
Dim oleExcelCommand As OleDbCommand
Dim oleExcelReader As OleDbDataReader
Dim oleExcelConnection As OleDbConnection

sConnection = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Test.xls;Extended Properties=""Excel 12.0;HDR=No;IMEX=1"""

oleExcelConnection = New OleDbConnection(sConnection)
oleExcelConnection.Open()

dtTablesList = oleExcelConnection.GetSchema("Tables")

If dtTablesList.Rows.Count > 0 Then
sSheetName = dtTablesList.Rows(0)("TABLE_NAME").ToString
End If

dtTablesList.Clear()
dtTablesList.Dispose()

If sSheetName <> "" Then

oleExcelCommand = oleExcelConnection.CreateCommand()
oleExcelCommand.CommandText = "Select * From [" & sSheetName & "]"
oleExcelCommand.CommandType = CommandType.Text

oleExcelReader = oleExcelCommand.ExecuteReader

nOutputRow = 0

While oleExcelReader.Read

End While

oleExcelReader.Close()

End If

oleExcelConnection.Close()

The ACE.OLEDB provider will read both .xls and .xlsx files and I have always found the speed quite good.

Importing Excel into a DataTable Quickly

Caling .Value2 is an expensive operation because it's a COM-interop call. I would instead read the entire range into an array and then loop through the array:

object[,] data = Range.Value2;

// Create new Column in DataTable
for (int cCnt = 1; cCnt <= Range.Columns.Count; cCnt++)
{
textBox3.Text = cCnt.ToString();

var Column = new DataColumn();
Column.DataType = System.Type.GetType("System.String");
Column.ColumnName = cCnt.ToString();
DT.Columns.Add(Column);

// Create row for Data Table
for (int rCnt = 1; rCnt <= Range.Rows.Count; rCnt++)
{
textBox2.Text = rCnt.ToString();

string CellVal = String.Empty;
try
{
cellVal = (string)(data[rCnt, cCnt]);
}
catch (Microsoft.CSharp.RuntimeBinder.RuntimeBinderException)
{
ConvertVal = (double)(data[rCnt, cCnt]);
cellVal = ConvertVal.ToString();
}

DataRow Row;

// Add to the DataTable
if (cCnt == 1)
{

Row = DT.NewRow();
Row[cCnt.ToString()] = cellVal;
DT.Rows.Add(Row);
}
else
{

Row = DT.Rows[rCnt + 1];
Row[cCnt.ToString()] = cellVal;

}
}
}

Get Excel data in DataTable

While the JET/Access driver can read Excel files as if they were databases, it has several problems, especially in the .NET Core era:

  1. It's Windows-only
  2. It needs to be installed. It can't be packaged with your application
  3. The installed version must match the bitness (32/64-bit) of any Office components. This in turn means that your application must match Office's bitness.

There are libraries that can read Excel files directly. One such option is ExcelDataReader, which opens a DbDataReader over an Excel sheet. It can handle both the obsolete xls format and the 16 year old xlsx format (yes, the "new" xlsx format was introduce in 2006, 16 years ago).

The generated data reader can be used to read the data or load a DataTable the same as any other data reader.

using (var stream = File.Open(filePath, FileMode.Open, FileAccess.Read))
{
using (var reader = ExcelReaderFactory.CreateReader(stream))
{
var table=new DataTable();
table.Load(reader);
...
}
}

ExcelDataReader has an extension that allows reading all sheets in a workbook into a DataSet, with one DataTable for each sheet.

using (var stream = File.Open(filePath, FileMode.Open, FileAccess.Read))
{
using (var reader = ExcelReaderFactory.CreateReader(stream))
{
var dataset= reader.AsDataSet();

// The result of each spreadsheet is in dataset.Tables
}
}

Faster way of storing Excel worksheet to system.data.datatable using C#

For others who are encountering the same issue, what I did was:

  • save the xml as an xlsx file
  • use oledb to read the xlsx file
  • store in dataset using OleDbAdapter (Fill() method)
  • bulk insert

Here is the code that I used to do this (change the connection string):

Stopwatch s = new Stopwatch();
s.Start();
string sSheetName = null;
string sConnection = null;
System.Data.DataTable sheetData = new System.Data.DataTable();
System.Data.DataTable dtTablesList = default(System.Data.DataTable);
OleDbConnection oleExcelConnection = default(OleDbConnection);
sConnection = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + @"C:\Users\YOURUSERNAME\Documents\Visual Studio 2012\Projects\TestXmlParser\TestXmlParser\bin\Debug\ConsolidatedSSMFiles.xlsx" + ";Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1\"";
oleExcelConnection = new OleDbConnection(sConnection);
oleExcelConnection.Open();
dtTablesList = oleExcelConnection.GetSchema("Tables");

if (dtTablesList.Rows.Count > 0)
{
sSheetName = dtTablesList.Rows[0]["TABLE_NAME"].ToString();
}
dtTablesList.Clear();
dtTablesList.Dispose();

if (!string.IsNullOrEmpty(sSheetName))
{
OleDbDataAdapter sheetAdapter = new OleDbDataAdapter("select * from [TEST$]", oleExcelConnection);
sheetAdapter.Fill(sheetData);
} s.Stop();
var duration = s.Elapsed;


oleExcelConnection.Close();
dataGridView1.DataSource = sheetData;
MessageBox.Show(sheetData.Rows.Count.ToString()+"rows - "+ duration.ToString());

This reads 25000+ rows of excel data to a datable in approx. 1.9 to 2.0 seconds.

Read excel spreadsheet into a datatable

Here is example how to get all columns and rows from special Sheet from xlsx file. This code takes all data from Sheet2 from xlsx file and fill the DataTable with that values.

Hopefully this will help you.

using System;
using System.Data;
using System.Data.OleDb;

namespace ConsoleApp4
{
class Program
{
static void Main(string[] args)
{
DataTable rs = new DataTable();

using (var odConnection = new OleDbConnection(@"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=c:\Users\IIG\Desktop\test.xlsx;Extended Properties='Excel 12.0;HDR=YES;IMEX=1;';"))
{
odConnection.Open();

using (OleDbCommand cmd = new OleDbCommand())
{
cmd.Connection = odConnection;
cmd.CommandType = CommandType.Text;
cmd.CommandText = "SELECT * FROM [Sheet2$]";
using (OleDbDataAdapter oleda = new OleDbDataAdapter(cmd))
{
oleda.Fill(rs);
}
}
odConnection.Close();
}
foreach(DataRow row in rs.Rows)
{
foreach(object item in row.ItemArray)
{
Console.Write(item +"\t");
}
Console.WriteLine();
}
}
}
}

How to read all data rows in a specific Excel sheet into an ADO.NET Datatable with out using ODBC in C#

in case ms office is installed on the server, the fastest way would be to export the excel sheet into a CSV file and then using file IO to import it into your database.

you could open excel via COM Interop and export the workbook to CSV. others here on stackoverflow discussed this proceeding: Save an excel file to a csv file in C# code

i've got the experience that getting data out of excel directly via com interop is very slow. excel exports csv really fast instead.

[edit]
a very good office wrapper is NetOffice. It works similar like the COM Interop interface but with more speed and more function: http://netoffice.codeplex.com/

as you prefer stream reader, there's an excel binary reader on codeplex too: http://exceldatareader.codeplex.com/
[/edit]



Related Topics



Leave a reply



Submit