SQLite .NET Performance, How to Speed Up Things

SQLite .NET performance, how to speed up things?

You definitely need a transaction. If you don't use one, SQLite starts its own transaction for every insert command, so you're effectively doing 86,000 transactions as it is.

It looks like you're also opening and closing the connection each time, along with resetting the CommandText each time. This is unnecessary and doubtless slowing you down; it'll go much faster if you:

  • Open the connection once
  • Build the command once, adding the parameters to it once.
  • Start the transaction
  • Loop through, changing the parameter values only before calling ExecuteNonQuery
  • Commit the transaction.
  • Close the connection.

I think you could reduce your 20 minutes down to just a few seconds this way.

Edit: this is what I mean:

public void InsertItems()
{
    SQLiteConnection connection = new SQLiteConnection(SomeConnectionString);
    connection.Open(); // the connection must be open before a transaction can begin

    SQLiteCommand command = connection.CreateCommand();
    SQLiteTransaction transaction = connection.BeginTransaction();
    command.Transaction = transaction;

    command.CommandText = "INSERT OR IGNORE INTO Result "
        + "(RunTag, TopicId, DocumentNumber, Rank, Score) "
        + "VALUES (@RunTag, @TopicId, @DocumentNumber, @Rank, @Score)";

    // Add each parameter once; only its value changes inside the loop.
    command.Parameters.AddWithValue("@RunTag", "");
    command.Parameters.AddWithValue("@TopicId", "");
    command.Parameters.AddWithValue("@DocumentNumber", "");
    command.Parameters.AddWithValue("@Rank", "");
    command.Parameters.AddWithValue("@Score", "");

    foreach ( /* item to loop through and add to db */ )
    {
        InsertResultItem(runTag, topicId, documentNumber, rank, score, command);
    }

    transaction.Commit();
    command.Dispose();
    connection.Dispose();
}

public int InsertResultItem(string runTag, int topicId, string documentNumber, int rank, double score, SQLiteCommand command)
{
    // Reuse the prepared command; only the parameter values change.
    command.Parameters["@RunTag"].Value = runTag;
    command.Parameters["@TopicId"].Value = topicId;
    command.Parameters["@DocumentNumber"].Value = documentNumber;
    command.Parameters["@Rank"].Value = rank;
    command.Parameters["@Score"].Value = score;
    return command.ExecuteNonQuery();
}

It only uses one connection, one transaction and one command, so all you're changing is the parameter values each time.

SQLite Part of C# Application takes too long

A few things I notice.

  • You're never clearing out the list of parameters before you move on to the next insert statement. I'd run 'sqlite_cmd.Parameters.Clear()' between each query where you're using parameters.

  • You're running a ton of insert statements, a few 'create table' statements, and a 'delete everything from the table' statement. These could all be done in a single transaction to greatly speed things up.

To use transactions, run this SQL query at the beginning:

BEGIN TRANSACTION

And after your inserts are done, run this query:

COMMIT TRANSACTION
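
Putting both points together, here is a minimal sketch. It assumes sqlite_cmd is a SQLiteCommand on an open connection (as in the question); SomeTable, items, and the column names are placeholders:

// Group everything into one transaction, clearing parameters between inserts.
sqlite_cmd.CommandText = "BEGIN TRANSACTION";
sqlite_cmd.ExecuteNonQuery();

sqlite_cmd.CommandText = "INSERT INTO SomeTable (Name, Value) VALUES (@name, @value)";
foreach (var item in items)
{
    sqlite_cmd.Parameters.Clear(); // drop the previous row's parameters
    sqlite_cmd.Parameters.AddWithValue("@name", item.Name);
    sqlite_cmd.Parameters.AddWithValue("@value", item.Value);
    sqlite_cmd.ExecuteNonQuery();
}

sqlite_cmd.CommandText = "COMMIT TRANSACTION";
sqlite_cmd.ExecuteNonQuery();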

Also, if you're new to SQLite, this link, which lists all the SQLite commands and what they do, might come in handy:

http://www.sqlite.org/lang.html

Improve large data import performance into SQLite with C#

This is quite fast for 6 million records.

It seems that you are doing it the right way. Some time ago I read on sqlite.org that when inserting records you need to put the inserts inside a transaction; if you don't, your inserts will be limited to only 60 per second! That is because each insert is treated as a separate transaction, and each transaction must wait for the disk to rotate fully. You can read the full explanation here:

http://www.sqlite.org/faq.html#q19

Actually, SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second. Transaction speed is limited by the rotational speed of your disk drive. A transaction normally requires two complete rotations of the disk platter, which on a 7200RPM disk drive limits you to about 60 transactions per second.

Comparing your time against the average stated above: at 50,000 inserts per second, 6 million records should take 6,000,000 / 50,000 = 120 seconds, or 2 minutes, which is only a little faster than your time.

Transaction speed is limited by disk drive speed because (by default) SQLite actually waits until the data really is safely stored on the disk surface before the transaction is complete. That way, if you suddenly lose power or if your OS crashes, your data is still safe. For details, read about atomic commit in SQLite.

By default, each INSERT statement is its own transaction. But if you surround multiple INSERT statements with BEGIN...COMMIT then all the inserts are grouped into a single transaction. The time needed to commit the transaction is amortized over all the enclosed insert statements and so the time per insert statement is greatly reduced.

The next paragraph offers a hint about something you could try to speed up the inserts:

Another option is to run PRAGMA synchronous=OFF. This command will cause SQLite to not wait on data to reach the disk surface, which will make write operations appear to be much faster. But if you lose power in the middle of a transaction, your database file might go corrupt.
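
If you want to try that from C#, the pragma can be issued like any other command. A minimal sketch, assuming connection is an open SQLiteConnection (weigh the corruption risk first):

using (var pragma = connection.CreateCommand())
{
    // SQLite stops waiting for data to reach the disk surface:
    // much faster writes, but a power loss can corrupt the database file.
    pragma.CommandText = "PRAGMA synchronous = OFF";
    pragma.ExecuteNonQuery();
}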

I always thought that SQLite was designed for "simple things"; 6 million records seems to me like a job for a real database server like MySQL.

Just for your information, counting records in a table with so many rows can take a long time. Instead of SELECT COUNT(*), you can always use SELECT MAX(rowid), which is very fast, though it is not accurate if you have been deleting records from that table.
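
For example, a sketch assuming an open connection and the Result table from the first answer:

using (var count = connection.CreateCommand())
{
    // Fast approximation: overstates the count if rows were ever deleted,
    // and returns NULL (DBNull) for an empty table.
    count.CommandText = "SELECT MAX(rowid) FROM Result";
    object result = count.ExecuteScalar();
    long approximateCount = result is DBNull ? 0 : Convert.ToInt64(result);
}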

EDIT.

As Mike Woodhouse stated, creating the index after you have inserted the records should speed the whole thing up. That is common advice with other databases, but I can't say for sure how well it works in SQLite.

Improve INSERT-per-second performance of SQLite

Several tips:

  1. Put inserts/updates in a transaction.
  2. For older versions of SQLite, consider a less paranoid journal mode (PRAGMA journal_mode). There is MEMORY, and then there is OFF, which can significantly increase insert speed if you're not too worried about the database possibly getting corrupted if the OS crashes. If your application crashes, the data should be fine. Note that in newer versions, the OFF/MEMORY settings are not safe for application-level crashes. (See the sketch after this list.)
  3. Playing with page sizes makes a difference as well (PRAGMA page_size). Having larger page sizes can make reads and writes go a bit faster as larger pages are held in memory. Note that more memory will be used for your database.
  4. If you have indices, consider calling CREATE INDEX after doing all your inserts. This is significantly faster than creating the index and then doing your inserts.
  5. You have to be quite careful if you have concurrent access to SQLite, as the whole database is locked when writes are done, and although multiple readers are possible, writes will be locked out. This has been improved somewhat with the addition of a WAL in newer SQLite versions.
  6. Take advantage of saving space...smaller databases go faster. For instance, if you have key value pairs, try making the key an INTEGER PRIMARY KEY if possible, which will replace the implied unique row number column in the table.
  7. If you are using multiple threads, you can try using the shared page cache, which will allow loaded pages to be shared between threads, which can avoid expensive I/O calls.
  8. Don't use !feof(file)!
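
As a rough illustration of tips 2, 3, and 4 in C#: this is a sketch, not a recommendation; the pragma values and the index name/columns are made-up examples:

using (var cmd = connection.CreateCommand())
{
    // Tip 2: keep the journal in memory - fast, but an OS crash can corrupt the file
    cmd.CommandText = "PRAGMA journal_mode = MEMORY";
    cmd.ExecuteNonQuery();

    // Tip 3: larger pages; only takes effect on a new database or after VACUUM
    cmd.CommandText = "PRAGMA page_size = 8192";
    cmd.ExecuteNonQuery();
}

// ... run all the inserts here, inside a single transaction (tip 1) ...

using (var cmd = connection.CreateCommand())
{
    // Tip 4: build the index only after the bulk insert has finished
    cmd.CommandText = "CREATE INDEX idx_result_topic ON Result (TopicId)";
    cmd.ExecuteNonQuery();
}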

I've also asked similar questions here and here.

Improve performance of SQLite bulk inserts using Dapper ORM

So I finally found the trick to high performance bulk inserts in SQLite using .NET. This trick improved insert performance by a factor of 4.1! My total save time went from 27 seconds to 6.6 seconds. wow!

This article explains the fastest way to do bulk inserts into SQLite. The key is reusing the same parameter objects, assigning each record's values to them just before inserting it. The time that .NET takes to construct all those DbParameter objects really adds up. For example, with 100k rows and 30 columns you'd have 3 million parameter objects to create. Instead, creating and reusing just 30 parameter objects is much faster.

New performance:

  • 55,000 rows (19 columns) in 0.53 seconds = 100k inserts/second

    internal const string PeakResultsInsert = @"INSERT INTO PeakResult values(@Id,@PeakID,@QuanPeakID,@ISTDRetentionTimeDiff)";

    // cnn is the open SQLiteConnection from the surrounding code.
    var command = cnn.CreateCommand();
    command.CommandText = BatchConstants.PeakResultsInsert;

    string[] parameterNames = new[]
    {
        "@Id",
        "@PeakID",
        "@QuanPeakID",
        "@ISTDRetentionTimeDiff"
    };

    // Create each DbParameter exactly once and keep a reference to it.
    DbParameter[] parameters = parameterNames.Select(pn =>
    {
        DbParameter parameter = command.CreateParameter();
        parameter.ParameterName = pn;
        command.Parameters.Add(parameter);
        return parameter;
    }).ToArray();

    // Reuse the same parameter objects; only the values change per row.
    foreach (var peakResult in peakResults)
    {
        parameters[0].Value = peakResult.Id;
        parameters[1].Value = peakResult.PeakID;
        parameters[2].Value = peakResult.QuanPeakID;
        parameters[3].Value = peakResult.ISTDRetentionTimeDiff;

        command.ExecuteNonQuery();
    }

It turns out that I could not use Dapper for inserting into my large tables. (For my small tables, I still use Dapper.)
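
For reference, the Dapper version of the same insert is a one-liner: passing a sequence to Execute runs the statement once per element, but Dapper builds a fresh set of parameters for every row, which is exactly the overhead the loop above avoids. A sketch, assuming the same cnn and peakResults:

    // Convenient, but creates new parameter objects for each row.
    cnn.Execute(BatchConstants.PeakResultsInsert, peakResults);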

Some other things that I found:

  • I tried using multiple threads to insert data into the same database; this did not make any improvement.

  • Upgraded from System.Data.Sqlite 1.0.69 to 1.0.79. (didn't make a difference in performance that I could see)

  • I am not assigning a Type to the DbParameter; it doesn't seem to make a performance difference either way.

  • For reads, I could not improve on Dapper's performance.


