Insert Large Amounts of Data Efficiently with SQL

Insert large amount of data efficiently with SQL

Use Oracle external tables.

See also, for example:

  • OraFaq about external tables
  • What Tom thinks about external tables
  • René Nyffenegger's notes about external tables

A simple example that should get you started

You need a file located in a server directory (get familiar with directory objects):

SQL> select directory_path from all_directories where directory_name = 'JTEST';

DIRECTORY_PATH
--------------------------------------------------------------------------------
c:\data\jtest

SQL> !cat ~/.gvfs/jtest\ on\ 192.168.xxx.xxx/exttable-1.csv
1,a
3,bsdf
4,sdkfj
5,something
129,else

Create an external table:

create table so13t (
  id   number(4),
  data varchar2(20)
)
organization external (
  type oracle_loader
  default directory jtest /* jtest is an existing directory object */
  access parameters (
    records delimited by newline
    fields terminated by ','
    missing field values are null
  )
  location ('exttable-1.csv') /* the file located in the jtest directory */
)
reject limit unlimited;

Now you can use all the powers of SQL to access the data:

SQL> select * from so13t order by data;

        ID DATA
---------- ------------------------------------------------------------
         1 a
         3 bsdf
       129 else
         4 sdkfj
         5 something

Efficiency when inserting huge data into a database

The most efficient approach would be a SQL bulk insert.

To improve performance further, you can use SqlBulkCopy to SQL Server in parallel.
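
For illustration, a minimal SqlBulkCopy sketch; the connection string, the destination table dbo.TargetTable, and the DataTable holding your rows are assumptions, not names taken from the question:

using System.Data;
using System.Data.SqlClient;

// Minimal sketch: "dbo.TargetTable" and the DataTable "rows" are placeholders.
static void BulkLoad(string connectionString, DataTable rows)
{
    using (var bulk = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
    {
        bulk.DestinationTableName = "dbo.TargetTable";
        bulk.BatchSize = 10000;   // send rows in batches of 10,000
        bulk.BulkCopyTimeout = 0; // no timeout for very large loads
        bulk.WriteToServer(rows); // streams the DataTable to SQL Server
    }
}

To parallelize, run several such copies on separate connections, each writing a distinct partition of the data.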

Fastest way for inserting very large number of records into a Table in SQL

Use BULK INSERT - it is designed for exactly what you are asking and significantly increases the speed of inserts.

Also (just in case you really do have no indexes), you may want to consider adding an index; some indexes (most notably one on the primary key) can improve the performance of inserts.

The actual rate at which you should be able to insert records will depend on the exact data, the table structure, and also on the hardware / configuration of the SQL Server itself, so I can't really give you any numbers.
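
As a hedged illustration, issuing BULK INSERT from client code could look like the sketch below; the table name, file path, and delimiters are assumptions, and the file must be readable by the SQL Server instance itself, not just by the client:

using System.Data.SqlClient;

// Sketch only: dbo.TargetTable and the CSV path are placeholders, and the path
// refers to the server's filesystem, not the client's.
static void BulkInsertFromFile(string connectionString)
{
    const string sql = @"
        BULK INSERT dbo.TargetTable
        FROM 'C:\data\records.csv'
        WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.CommandTimeout = 0; // large loads can exceed the default timeout
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}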

How to insert large amount of data into SQL Server 2008 with c# code?

Is there a db.closeconn(); somewhere after the try block that was pasted into the question? If not, that is a huge issue (i.e. connections keep being opened and never closed, which could explain why it freezes after opening 200+ of them). If a close-connection method is being called, then great, but still, opening and closing the connection for each INSERT is unnecessary, let alone horribly inefficient.

At the very least you can:

  • define the query string, SqlParameters, and SqlCommand once
  • in the loop, set the parameter values and call ExecuteNonQuery();
  • (it is also preferable not to use AddWithValue() anyway)

Example:

// this should be in a try block

strSQL = "INSERT...";                 // define the INSERT statement once
db.openconn("MOMT_Report", "Report");
cmd = new SqlCommand(strSQL, db.cn);

SqlParameter _Rptdate = new SqlParameter("@Rptdate", SqlDbType.Int);
cmd.Parameters.Add(_Rptdate);

...{repeat for remaining params}...

// optional: begin a transaction here

foreach (var row in rows)   // "rows" stands in for your existing for / while loop
{
    _Rptdate.Value = Rptdate;
    // set the other parameter values here
    cmd.ExecuteNonQuery();
}

// if the optional transaction was started, commit it here

db.closeconn(); // this should be in a finally block

However, the fastest and cleanest way to get this data inserted is to use Table-Valued Parameters (TVPs), which were introduced in SQL Server 2008. You need to create a User-Defined Table Type (one time) to define the structure, and then you can use it either in an ad hoc INSERT like you currently have, or pass it to a stored procedure. But this way you don't need to export to a file just to import; there is no need for those additional steps.

Rather than copy/paste a large code block, I have noted three links below where I have posted the code to do this. The first two links are the full code (SQL and C#) to accomplish this. Each is a slight variation on the theme (which shows the flexibility of using TVPs). The third is another variation, but not the full code, as it just shows the differences from one of the first two in order to fit that particular situation. But in all 3 cases, the data is streamed from the app into SQL Server. There is no creating of any additional collection or external file; you use what you currently have and only need to duplicate the values of a single row at a time to be sent over. And on the SQL Server side, it all comes through as a populated table variable. This is far more efficient than taking data you already have in memory, converting it to a file (takes time and disk space) or XML (takes CPU and memory) or a DataTable (for SqlBulkCopy; takes CPU and memory) or something else, only to rely on an external factor such as the filesystem (the files will need to be cleaned up, right?) or needing to parse out of XML. A minimal sketch of the TVP call is shown after the links below.

  • How can I insert 10 million records in the shortest time possible?
  • Pass Dictionary<string,int> to Stored Procedure T-SQL
  • Storing a Dictionary<int,string> or KeyValuePair in a database
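
For orientation only (the linked answers contain the complete, streaming versions), here is a minimal TVP sketch using a DataTable; the type name dbo.ImportRow, the target table dbo.Target, and the columns are assumptions:

using System.Data;
using System.Data.SqlClient;

// One-time setup on the SQL Server side (placeholder names):
//   CREATE TYPE dbo.ImportRow AS TABLE (Id INT, Data VARCHAR(50));
static void InsertViaTvp(string connectionString, DataTable rows)
{
    const string sql = "INSERT INTO dbo.Target (Id, Data) SELECT Id, Data FROM @rows;";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        SqlParameter p = cmd.Parameters.Add("@rows", SqlDbType.Structured);
        p.TypeName = "dbo.ImportRow"; // the user-defined table type
        p.Value = rows;               // a DataTable whose columns match the type
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}

The DataTable here is only the simplest way to show the shape of a TVP call; the linked answers stream rows via IEnumerable<SqlDataRecord>, so no intermediate collection is built.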

MSSQL 2017 - Inserting large data sets efficiently

Use set-based operations instead of line-by-line (or cursor/foreach) statements. You are currently executing a stored procedure that executes another 4 stored procedures inside a transaction, just to insert 4 rows into 4 different tables. Executing this thousands of times is extremely slow compared to a set-based solution, in which you process all rows at the same time without sacrificing your business rules.

Use a bulk insert (from your C# app) into a staging table, which is a table that holds temporary data before it is inserted into your final tables. You would have 4 different staging tables in your case.

These staging tables will have no constraints, keys, triggers or any other mechanism that might delay the insert operation. Bulk inserts work very fast on such tables.

After you insert your CSV into the staging tables, you can use SQL to validate and insert the records into your final tables (as a set, not one by one). You can create or enable (rebuild) indexes on your staging tables if you need to do joins or filter on particular columns.

Staging tables can be treated as disposable, so you may truncate them on each run if you need to.
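
A hedged sketch of that flow (the staging table, final table, columns, and the validation rule are all assumptions):

using System.Data;
using System.Data.SqlClient;

// Sketch: bulk copy into an unconstrained staging table, then move the rows into
// the final table with one set-based statement. All object names are placeholders.
static void LoadViaStaging(string connectionString, IDataReader csvRows)
{
    using (var bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "dbo.Staging_Orders";
        bulk.BatchSize = 50000;
        bulk.WriteToServer(csvRows); // stream the parsed CSV rows into staging
    }

    const string sql = @"
        INSERT INTO dbo.Orders (Id, CustomerId, Amount)
        SELECT s.Id, s.CustomerId, s.Amount
        FROM dbo.Staging_Orders AS s
        WHERE s.Amount IS NOT NULL;        -- example validation rule

        TRUNCATE TABLE dbo.Staging_Orders; -- staging data is disposable";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}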

Better way to insert large number of rows in sql server using java?

It depends, but you will need to use batching and call executeBatch() after something like 50-100 rows. This should be the most effective way; you can try different batch sizes, because this also depends on your DB instance.

Something like this:

...
pstmt.setString(12, temp.get(i).get(11).toString());
pstmt.setString(13, temp.get(i).get(12).toString());
pstmt.setString(14, temp.get(i).get(13).toString());
pstmt.addBatch();
if ((i + 1) % 100 == 0) {
    pstmt.executeBatch(); // Execute every 100 items.
}
// after the loop, call pstmt.executeBatch() once more to flush any remaining rows

EDIT:

In addition, if this isn't good enough, I would suggest trying this:

if ((i + 1) % 100 == 0) {
    pstmt.executeBatch();     // Execute every 100 items.
    pstmt.clearParameters();  // reset parameters and the batch before the next chunk
    pstmt.clearBatch();
}

There is also tuning at the DB level; you can try reading this, it might help, but it requires a better understanding of SQL Server:

https://blogs.datadirect.com/2012/05/how-to-bulk-insert-jdbc-batches-into-microsoft-sql-server-oracle-sybase.html

What is the most efficient way to insert a large number of rows from MS SQL to MySQL?

The problem is that the table you are selecting from is on the local server and the table you are inserting into is on the remote server. As such, the linked server is going to have to translate each row into an INSERT INTO Table (Field1, Field2) VALUES ('VALUE1','VALUE2') or similar on the MySQL server. What you could do is keep a checksum on each row in the SQL Server table. Instead of truncating and reinserting the entire table, you can simply delete and reinsert the changed and new records. Unless most of your records change every day, this should cut the amount of data you have to transfer down enormously, without having to mess about with exporting and reimporting text files.

What is an efficient way to insert large amounts of data into a MySQL table using python?

As mentioned above, a MySQL dump is one option.
If you still want to use Python, you should use cursor.executemany(operation, seq_of_params); it is a more efficient way to insert a lot of values into a table.


