SQLite Insert Very Slow

SQLite Insert very slow?

Wrap BEGIN/END statements around your bulk inserts. SQLite is optimized for transactions.

dbcon = new SQLiteConnection(connectionString);
dbcon.Open();

// Open a single transaction for the whole batch of inserts
SQLiteCommand sqlComm = new SQLiteCommand("begin", dbcon);
sqlComm.ExecuteNonQuery();

// ---INSIDE LOOP
sqlComm = new SQLiteCommand(sqlQuery, dbcon);
nRowUpdatedCount = sqlComm.ExecuteNonQuery();
// ---END LOOP

// Commit all the inserts at once
sqlComm = new SQLiteCommand("end", dbcon);
sqlComm.ExecuteNonQuery();
dbcon.Close();

SQLite insert speed slows as number of records increases due to an index

If your requirement is to find a particular z_id and the x_ids and y_ids linked to it (as distinct from quickly selecting a range of z_ids), you could look into a non-indexed hash-table nested-relational db that would allow you to seek instantly to a particular z_id in order to get its y_ids and x_ids -- without the indexing overhead and the concomitant degraded insert performance as the index grows. To avoid clumping (aka bucket collisions), choose a key hashing algorithm that puts greatest weight on the digits of z_id with greatest variation (right-weighted).
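
A minimal sketch of one way to realise that advice (the digit-reversal scheme and the names are my own illustration, not from the original answer): reversing the digits puts the most-variable rightmost digits of a sequential key into the most significant positions, so consecutive z_ids land far apart in the table.

// Sketch: a "right-weighted" bucket hash for a numeric z_id.
// Reversing the digits makes the rightmost (most variable) digits
// dominate, so sequential ids scatter instead of clumping.
static long BucketFor(long zId, long bucketCount)
{
    long reversed = 0;
    while (zId > 0)
    {
        reversed = reversed * 10 + zId % 10;   // pull digits off the right
        zId /= 10;
    }
    return reversed % bucketCount;             // bucketCount ideally prime
}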

P.S. A database that uses a b-tree may at first appear faster than a db that uses linear hashing, say, but insert performance on the linear hash will remain level while performance on the b-tree begins to degrade.

P.P.S. To answer @kawing-chiu's question: the core feature relevant here is that such a database relies on so-called "sparse" tables in which the physical location of a record is determined by a hashing algorithm which takes the record key as input. This approach permits a seek directly to the record's location in the table without the intermediary of an index. As there is no need to traverse indexes or to re-balance indexes, insert-times remain constant as the table becomes more densely populated. With a b-tree, by contrast, insert times degrade as the index tree grows. OLTP applications with large numbers of concurrent inserts can benefit from such a sparse-table approach. The records are scattered throughout the table. The downside of records being scattered across the "tundra" of the sparse table is that gathering large sets of records which have a value in common, such as a postal code, can be slower. The hashed sparse-table approach is optimized to insert and retrieve individual records, and to retrieve networks of related records, not large sets of records that have some field value in common.
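
To make that concrete, here is a minimal sketch (an entirely hypothetical layout; collision handling and variable-length records are omitted) of how a hash-addressed sparse table turns a key into a fixed file offset, so a read or an insert is a single seek with no index to traverse or re-balance:

using System.IO;

const int RecordSize = 256;            // fixed-size slots in the sparse table
const long SlotCount = 1_000_003;      // prime, to spread keys across slots

// The key alone determines the physical location of the record
static long OffsetFor(long zId) => (zId % SlotCount) * RecordSize;

static byte[] ReadRecord(FileStream table, long zId)
{
    var buf = new byte[RecordSize];
    table.Seek(OffsetFor(zId), SeekOrigin.Begin);   // jump straight to the slot
    table.Read(buf, 0, RecordSize);
    return buf;
}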

A nested relational database is one that permits tuples within a column of a row.

Why might SQLite insert be very slow? (Transactions used)

I don't know why pst deleted his answer so I'll re-post the same information from it as this appears to be the correct answer.

According to the SQLite FAQ - INSERT is really slow - I can only do few dozen INSERTs per second

Actually, SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second.

...

By default, each INSERT statement is its own transaction. But if you surround multiple INSERT statements with BEGIN...COMMIT then all the inserts are grouped into a single transaction.

So basically you need to group your INSERTs into fewer transactions.

Update: So the problem is probably mostly due to the sheer size of the SQL script - SQLite needs to parse the entire script before it can execute it, but the parser is designed to parse small scripts, not massive ones! This is why you are seeing so much time spent in the SQLite3.Prepare method.

Instead you should use a parameterised query and insert records in a loop in your C# code, for example if your data was in CSV format in your text file you could use something like this:

using (TransactionScope txn = new TransactionScope())
{
    using (DbCommand cmd = Banco.GetSqlStringCommand(sql))
    {
        string line = null;
        while ((line = reader.ReadLine()) != null)
        {
            // Set the parameters for the command at this point based on the current line
            Banco.ExecuteNonQuery(cmd);
        }

        // Complete the transaction once, after all rows have been inserted
        txn.Complete();
    }
}

Qt Creator + SQLite. Insert Very Slow

Batch several INSERT operations into a single transaction.

Doing the inserts individually will limit your throughput to about 60 inserts per second, because each safe commit has to wait for the hard drive's platters to rotate completely around while SQLite does a read-after-write verification: a 7200 RPM disk makes 120 rotations per second, and at roughly two rotations per transaction that caps you at around 60 transactions per second.

Further Reading
INSERT is really slow - I can only do few dozen INSERTs per second

Why is SQLite so slow (~2 q/s) on a specific machine?

I have done a similar test on a 64-bit Linux machine, using strace -C -tt to get an idea of where SQLite3 spends its time.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.03    0.004000          32       124           fsync
  0.64    0.000026           0       222           mprotect
  0.32    0.000013           0       216           munmap

The obvious delay is in the fsync function, which is:

  • configurable
  • depends on general disk I/O (check out iotop, iostat)
  • heavily depends on the IOSS (the I/O subsystem - therefore on the file system and disk allocation: you might get one value on ext3, a different one on xfs, and a third one on btrfs)
  • depends, of course, indirectly on the underlying hardware and its quirks or tunings.

By turning syncing off, my SQLite3 performance increases by a factor of around three thousand:

$db = new PDO('sqlite:test.db');

// Durability trade-off: with synchronous = off, an OS crash or power
// loss can leave the database corrupted
$db->exec('pragma synchronous = off;');

I also got two different values on two very similar machines (one has ext4, the other XFS, but I'm not positive this is the main reason; their load profiles are also different).

By the way, using prepared statements just about doubles the execution speed at the fastest level (from 45k to 110k INSERTs, in batches of 3000 since at that speed 30 INSERTs are bound to give spurious timings), and raises the lowest speed from about 6 to about 150.

So this (using prepared statements) might be a good way to speed up repeated operations without touching file synchronization, i.e., while remaining demonstrably sure that the data risk level stays the same. After that, I'd try transactions or fsync tuning (maybe even memory journaling), depending on the risk and the cost of a data outage.
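
As an illustration of combining the two techniques (a sketch in C# with System.Data.SQLite for consistency with the earlier examples; the table and column names are made up), one transaction wraps the batch while a single parameterised INSERT is prepared once and reused for every row:

using System.Data;
using System.Data.SQLite;

using (var conn = new SQLiteConnection("Data Source=test.db"))
{
    conn.Open();
    using (var txn = conn.BeginTransaction())
    using (var cmd = new SQLiteCommand("INSERT INTO items (name) VALUES (@name)", conn, txn))
    {
        var p = cmd.Parameters.Add("@name", DbType.String);
        for (int i = 0; i < 3000; i++)     // batch size used in the timings above
        {
            p.Value = "item" + i;
            cmd.ExecuteNonQuery();         // statement parsed once, executed many times
        }
        txn.Commit();                      // one sync for the whole batch
    }
}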

When designing a system from the ground up, some tests on different FS's are surely advisable.

Tests on different file systems (same machine)

ext4 (acl,user_xattr,data=ordered)            5.5 queries/s
using transactions                            170 queries/s
disabling fsync                             16000 queries/s
using transactions and disabling fsync      47200 queries/s

On a temporary file system, fsync is cheap, so turning it off yields little benefit; most of the overhead comes from per-statement transaction handling, so grouping statements into transactions is what matters.

tmpfs                                       13700 queries/s
disabling fsync                             15350 queries/s
enabling transactions                       47900 queries/s
using transactions and disabling fsync      48200 queries/s

Of course, proper data organization and indexing has to be taken into account and, for large data sets, might well turn out to be more important.


UPDATE: To squeeze out some more performance, one can also put the SQLite journal into memory with pragma journal_mode=MEMORY;

Also, you can tell ext3/4 not to bother updating atime on SQLite databases (though this hugely depends on the implementation). You can try adding noatime to the file system where the database resides, and if it works, you can put it into /etc/fstab (you can also use relatime instead of the more extreme noatime):

sudo mount /var -o remount,noatime
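
A hypothetical /etc/fstab entry for the same effect (the device and mount point are placeholders for your own setup):

/dev/sda3  /var  ext4  defaults,noatime  0  2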

