Speeding Up Large Numbers of MySQL Updates and Inserts

Speeding up large numbers of mysql updates and inserts

Some useful links:

  • 32 Tips To Speed Up Your MySQL Queries
  • Turn on MySQL query cache to speed up query performance?
  • Multiple Insert in Single Query – PHP/MySQL
  • 3 Ways to Speed Up MySQL

From MySQL Documentation:

Speed of INSERT Statements says:

  • If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster.

  • If multiple clients are inserting a lot of rows, you can get higher speed by using the INSERT DELAYED statement.

  • For a MyISAM table, you can use concurrent inserts to add rows at the same time that SELECT statements are running, if there are no deleted rows in the middle of the data file.

  • When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements.

  • With some extra work, it is possible to make LOAD DATA INFILE run even faster for a MyISAM table when the table has many indexes.
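
For the LOAD DATA INFILE point above, a minimal Python sketch might look like the following. Everything here is illustrative: the connection details, file path, table, and column list are made up, and LOCAL loading has to be enabled both on the server (local_infile) and in the client connection.

import mysql.connector

# Hypothetical connection; allow_local_infile is needed for LOAD DATA LOCAL INFILE.
cnx = mysql.connector.connect(
    host="localhost", user="user", password="secret",
    database="mydb", allow_local_infile=True,
)
cur = cnx.cursor()

# Bulk-load a tab-delimited text file into a hypothetical table in one statement.
cur.execute("""
    LOAD DATA LOCAL INFILE '/tmp/rows.tsv'
    INTO TABLE temp
    FIELDS TERMINATED BY '\\t'
    LINES TERMINATED BY '\\n'
    (profile_id, landing_page, keyword, position, impressions, clicks, ctr)
""")
cnx.commit()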

How to speed up MySQL INSERTs/UPDATEs?

It turns out the reason for the slowness was from the FileMaker side of things. Exporting the FileMaker records to a CSV and running INSERT/UPDATE commands resulted in very fast execution.

Speed up MySQL Update/Insert Statement

There's a bunch of performance problems here if you need to do this millions of times.

  • You're preparing the same SQL statement over and over again, millions of times. It would work better to prepare it once and execute it millions of times.

  • You're disconnecting from the database on every function call after a single query. That means you need to reconnect each time and any cached information is thrown away. Don't do that, leave it connected.

  • You're committing after each row. This will slow things down. Instead, commit after doing a batch.

  • The select + update or insert can probably be done as a single upsert.

  • The fact that you're inserting so much into a temp table is probably a performance issue in itself.

  • If the table has too many indexes that can slow inserts. Sometimes it's best to drop indexes, do a big batch update, and recreate them.

  • Because you're putting values directly into your SQL, your SQL is open to a SQL injection attack.


Instead...

  • Use prepared statements and bind parameters
  • Leave the database connected
  • Do updates in bulk
  • Only commit at the end of a run of updates
  • Do all the math in the UPDATE rather than SELECT + math + UPDATE.
  • Use an "UPSERT" instead of SELECT then UPDATE or INSERT

First off, prepared statements. These let MySQL compile the statement once and then reuse it. The idea is you write a statement with placeholders for the values.

select id, position, impressions, clicks, ctr
from temp
where profile_id=%s and
      keyword=%s and
      landing_page=%s

Then you execute that with the values as arguments, not as part of the string.

self.cursor.execute(
    'select id, position, impressions, clicks, ctr from temp where profile_id=%s and keyword=%s and landing_page=%s',
    (profile_id, keyword, landing_page)
)

This allows the database to cache the prepared statement and not have to recompile it each time. It also avoids a SQL injection attack where a clever attacker can craft a value that is actually more SQL like " MORE SQL HERE ". It is a very, very, very common security hole.

Note, you might need to use MySQL's own Python database library to get true prepared statements. Don't worry about it too much; using prepared statements is not your biggest performance problem.
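
If you do want true server-side prepared statements, MySQL Connector/Python (MySQL's own library) exposes them through a prepared cursor. A minimal sketch, assuming the temp table from the question and made-up connection details and lookup values:

import mysql.connector

cnx = mysql.connector.connect(host="localhost", user="user",
                              password="secret", database="mydb")

# prepared=True gives a cursor backed by a real server-side prepared statement,
# so the SQL is parsed once and then re-executed with new bound values.
cur = cnx.cursor(prepared=True)

query = ("select id, position, impressions, clicks, ctr "
         "from temp "
         "where profile_id=%s and keyword=%s and landing_page=%s")

# Made-up rows to look up; the first execute prepares, later ones only bind values.
rows_to_check = [(1, "red shoes", "/landing/a"), (1, "blue shoes", "/landing/b")]
for profile_id, keyword, landing_page in rows_to_check:
    cur.execute(query, (profile_id, keyword, landing_page))
    row = cur.fetchone()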


Next, what you're basically doing is adding to an existing row, or if there is no existing row, inserting a new one. This can be done more efficiently in a single statement with an UPSERT, a combined INSERT and UPDATE. MySQL has it as INSERT ... ON DUPLICATE KEY UPDATE.

To see how this is done, we can write your SELECT then UPDATE as a single UPDATE. The calculations are done in the SQL.

    update temp
    set impressions = impressions + %s,
        clicks = clicks + %s,
        ctr = (ctr + %s / 2)
    where profile_id=%s and
          keyword=%s and
          landing_page=%s

Your INSERT remains the same...

    insert into temp
    (profile_id, landing_page, keyword, position, impressions, clicks, ctr)
    values (%s, %s, %s, %s, %s, %s, %s)

Combine them into one INSERT ON DUPLICATE KEY UPDATE.

    insert into temp
    (profile_id, landing_page, keyword, position, impressions, clicks, ctr)
    values (%s, %s, %s, %s, %s, %s, %s)
    on duplicate key update
        impressions = impressions + %s,
        clicks = clicks + %s,
        ctr = (ctr + %s / 2)

This depends on what the keys of the table are defined as. If you have unique( profile_id, landing_page, keyword ) then it should work the same as your code.
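
For the update branch to fire, those three columns need to be covered by a unique key. A hypothetical definition (the index name is made up; this reuses the cur cursor from the sketches above and is a one-time DDL step, not something to run per batch):

# One-time DDL: the composite unique key the UPSERT relies on.
cur.execute(
    "alter table temp "
    "add unique key uniq_profile_page_keyword (profile_id, landing_page, keyword)"
)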

Even if you can't do the upsert, you can eliminate the SELECT by trying the UPDATE, checking if it updated anything, and if it didn't doing an INSERT.
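
A rough sketch of that fallback pattern, checking the cursor's rowcount to see whether the UPDATE matched anything (column names and arithmetic follow the example above):

def upsert_row(cur, profile_id, keyword, landing_page, position,
               impressions, clicks, ctr):
    """Try the UPDATE first; fall back to INSERT if no row matched."""
    cur.execute(
        "update temp "
        "set impressions = impressions + %s, clicks = clicks + %s, ctr = (ctr + %s / 2) "
        "where profile_id=%s and keyword=%s and landing_page=%s",
        (impressions, clicks, ctr, profile_id, keyword, landing_page)
    )
    if cur.rowcount == 0:
        # No existing row, so insert a new one. Caveat: some drivers report
        # *changed* rows rather than *matched* rows here, so an update that
        # changes nothing can fall through to this insert.
        cur.execute(
            "insert into temp "
            "(profile_id, landing_page, keyword, position, impressions, clicks, ctr) "
            "values (%s, %s, %s, %s, %s, %s, %s)",
            (profile_id, landing_page, keyword, position, impressions, clicks, ctr)
        )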


Do the updates in bulk. Instead of calling a subroutine which does one update and commits, pass it a big list of things to be updated and work on them in a loop. You can even take advantage of executemany to run the same statement with multiple values. Then commit.
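
For example, executemany (part of the Python DB-API) runs one statement over a whole sequence of parameter tuples, followed by a single commit. A sketch reusing the cur and cnx objects from above, with made-up data:

# Each tuple is (profile_id, landing_page, keyword, position, impressions, clicks, ctr).
batch = [
    (1, "/landing/a", "red shoes", 3, 100, 7, 0.07),
    (1, "/landing/b", "blue shoes", 5, 80, 2, 0.025),
]

cur.executemany(
    "insert into temp "
    "(profile_id, landing_page, keyword, position, impressions, clicks, ctr) "
    "values (%s, %s, %s, %s, %s, %s, %s)",
    batch
)

# One commit for the whole batch instead of one per row.
cnx.commit()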

You might be able to do the UPSERT in bulk. INSERT can take multiple rows at once. For example, this inserts three rows.

insert into whatever
(foo, bar, baz)
values (1, 2, 3),
(4, 5, 6),
(7, 8, 9)

You can likely do the same with your INSERT ON DUPLICATE KEY UPDATE reducing the amount of overhead to talk to the database. See this post for an example (in PHP, but you should be able to adapt).

This sacrifices returning the ID of the last inserted row, but them's the breaks.
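
In Python terms, a bulk INSERT ... ON DUPLICATE KEY UPDATE could be built up roughly like this. It uses the VALUES() pseudo-function so the update side reuses the inserted values instead of extra bind parameters; the data is made up and cur/cnx come from the sketches above.

rows = [
    (1, "/landing/a", "red shoes", 3, 100, 7, 0.07),
    (1, "/landing/b", "blue shoes", 5, 80, 2, 0.025),
]

# One "(%s, ...)" group per row in the batch.
placeholders = ", ".join(["(%s, %s, %s, %s, %s, %s, %s)"] * len(rows))

sql = (
    "insert into temp "
    "(profile_id, landing_page, keyword, position, impressions, clicks, ctr) "
    "values " + placeholders + " "
    "on duplicate key update "
    "impressions = impressions + values(impressions), "
    "clicks = clicks + values(clicks), "
    "ctr = (ctr + values(ctr) / 2)"
)

# Flatten the row tuples into one flat parameter sequence.
params = tuple(value for row in rows for value in row)
cur.execute(sql, params)
cnx.commit()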

What is the best way to achieve speedy inserts of large amounts of data in MySQL?

  • Use the mysqlimport tool or the LOAD DATA INFILE command.
  • Temporarily disable indices that you don't need for data integrity
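
One way to read the second point, sketched under the assumption of a MyISAM table named temp and the cur cursor from above (ALTER TABLE ... DISABLE KEYS only skips maintenance of non-unique indexes; for InnoDB, people typically drop and re-create secondary indexes instead):

# Skip non-unique index maintenance during the bulk load (MyISAM only).
cur.execute("alter table temp disable keys")

# ... run the bulk LOAD DATA INFILE or batched INSERTs here ...

# Rebuild the non-unique indexes in one pass after the load.
cur.execute("alter table temp enable keys")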

Improving Speed of SQL 'Update' function - break into Insert/ Delete?

There are many possible 'answers' to your questions.

13/second -- there is a lot that can be done...

INSERT ... ON DUPLICATE KEY UPDATE ... ('IODKU') is usually the best way to do "update, else insert" (unless I'm misunderstanding what you mean by that).

Batched inserts are much faster than inserting one row at a time; the optimum is around 100 rows per batch, giving roughly a 10x speedup. IODKU can (usually) be batched, too; see the VALUES() pseudo-function.

BEGIN; ...lots of writes... COMMIT; cuts back significantly on transaction overhead.

Using a "staging" table for gathering things up update can have a significant benefit. My blog discussing that. That also covers batch "normalization".

Building Summary Tables on the fly interferes with high speed data ingestion. Another blog covers Summary tables.

Normalization can be used for de-dupping, hence shrinking the disk footprint. This can be important for decreasing I/O for the 'Fact' table in Data Warehousing. (I am referring to your 20 x VARCHAR(50).)

RAID striping is a hardware help.

A battery-backed write cache on a RAID controller makes writes seem instantaneous.

SSDs speed up I/O.

If you provide some more specifics (SHOW CREATE TABLE, SQL, etc), I can be more specific.


