Speeding up large numbers of MySQL updates and inserts
Some useful links:
- 32 Tips To Speed Up Your MySQL Queries
- Turn on MySQL query cache to speed up query performance?
- Multiple Insert in Single Query – PHP/MySQL
- 3 Ways to Speed Up MySQL
Speed of INSERT Statements says:

If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster.

If multiple clients are inserting a lot of rows, you can get higher speed by using the INSERT DELAYED statement.

For a MyISAM table, you can use concurrent inserts to add rows at the same time that SELECT statements are running, if there are no deleted rows in the middle of the data file.

When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements.

With some extra work, it is possible to make LOAD DATA INFILE run even faster for a MyISAM table when the table has many indexes.
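The multiple-VALUES-lists advice above can be sketched as a small Python helper that builds one INSERT covering many rows (the helper name and table/column names here are illustrative, assuming a DB-API driver that uses %s placeholders, such as MySQLdb):

```python
def build_multi_row_insert(table, columns, row_count):
    """Build a single INSERT with one VALUES group per row.

    Each group is a tuple of %s placeholders, so the statement is
    executed once with all the row values flattened into one sequence.
    """
    group = "(" + ", ".join(["%s"] * len(columns)) + ")"
    return "INSERT INTO %s (%s) VALUES %s" % (
        table,
        ", ".join(columns),
        ", ".join([group] * row_count),
    )

sql = build_multi_row_insert("temp", ["foo", "bar"], 3)
# One round trip to the server inserts all three rows:
# cursor.execute(sql, (1, 2, 3, 4, 5, 6))
```

Batching like this amortizes the per-statement parsing and network overhead, which is where most of the "many times faster" speedup comes from.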
How to speed up MySQL INSERTs/UPDATEs?
It turns out the reason for the slowness was from the FileMaker side of things. Exporting the FileMaker records to a CSV and running INSERT/UPDATE commands resulted in very fast execution.
Speed up MySQL Update/Insert Statement
There's a bunch of performance problems here if you need to do this millions of times.
You're preparing the same SQL statement over and over again, millions of times. It would work better to prepare it once and execute it millions of times.
You're disconnecting from the database on every function call after a single query. That means you need to reconnect each time and any cached information is thrown away. Don't do that, leave it connected.
You're committing after each row. This will slow things down. Instead, commit after doing a batch.
The select + update or insert can probably be done as a single upsert.
That you're inserting so much into a temp table is probably a performance issue.
If the table has too many indexes that can slow inserts. Sometimes it's best to drop indexes, do a big batch update, and recreate them.
Because you're putting values directly into your SQL, your SQL is open to a SQL injection attack.
Instead...
- Use prepared statements and bind parameters
- Leave the database connected
- Do updates in bulk
- Only commit at the end of a run of updates
- Do all the math in the UPDATE rather than SELECT + math + UPDATE.
- Use an "UPSERT" instead of SELECT then UPDATE or INSERT.
select id, position, impressions, clicks, ctr
from temp
where profile_id=%s and
      keyword=%s and
      landing_page=%s

Then you execute that with the values as arguments, not as part of the string.

self.cursor.execute(
    'select id, position, impressions, clicks, ctr from temp where profile_id=%s and keyword=%s and landing_page=%s',
    (profile_id, keyword, landing_page)
)
This allows the database to cache the prepared statement and not have to recompile it each time. It also avoids a SQL injection attack where a clever attacker can craft a value that is actually more SQL, like " MORE SQL HERE ". It is a very, very, very common security hole.

Note, you might need to use MySQL's own Python database library to get true prepared statements. Don't worry about it too much; using prepared statements is not your biggest performance problem.
Next, what you're basically doing is adding to an existing row, or if there is no existing row, inserting a new one. This can be done more efficiently in a single statement with an UPSERT, a combined INSERT and UPDATE. MySQL has it as INSERT ... ON DUPLICATE KEY UPDATE.

To see how this is done, we can write your SELECT then UPDATE as a single UPDATE. The calculations are done in the SQL.
update temp
set impressions = impressions + %s,
clicks = clicks + %s,
    ctr = (ctr + %s) / 2
where profile_id=%s and
keyword=%s and
landing_page=%s
Your INSERT remains the same:

insert into temp
(profile_id, landing_page, keyword, position, impressions, clicks, ctr)
values (%s, %s, %s, %s, %s, %s, %s)
Combine them into one INSERT ... ON DUPLICATE KEY UPDATE:

insert into temp
(profile_id, landing_page, keyword, position, impressions, clicks, ctr)
values (%s, %s, %s, %s, %s, %s, %s)
on duplicate key update
    impressions = impressions + %s,
    clicks = clicks + %s,
    ctr = (ctr + %s) / 2
This depends on what the keys of the table are defined as. If you have unique(profile_id, landing_page, keyword) then it should work the same as your code.

Even if you can't do the upsert, you can eliminate the SELECT by trying the UPDATE, checking if it updated anything, and if it didn't, doing an INSERT.
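That update-else-insert fallback looks roughly like this (a sketch against a generic DB-API cursor; the function name is made up here, and the table and columns follow the examples above):

```python
def update_else_insert(cursor, profile_id, keyword, landing_page, row):
    """Try the UPDATE first; fall back to INSERT only when no row matched."""
    cursor.execute(
        "update temp set impressions = impressions + %s, "
        "clicks = clicks + %s, ctr = (ctr + %s) / 2 "
        "where profile_id=%s and keyword=%s and landing_page=%s",
        (row["impressions"], row["clicks"], row["ctr"],
         profile_id, keyword, landing_page),
    )
    if cursor.rowcount == 0:  # no existing row, so insert one
        cursor.execute(
            "insert into temp "
            "(profile_id, landing_page, keyword, position, impressions, clicks, ctr) "
            "values (%s, %s, %s, %s, %s, %s, %s)",
            (profile_id, landing_page, keyword, row["position"],
             row["impressions"], row["clicks"], row["ctr"]),
        )
```

One caveat: MySQL drivers typically report the number of *changed* rows in rowcount, so an UPDATE that matches a row but changes nothing also reports 0; if that can happen with your data, connect with the FOUND_ROWS client flag so matched rows are counted instead.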
Do the updates in bulk. Instead of calling a subroutine which does one update and commits, pass it a big list of things to be updated and work on them in a loop. You can even take advantage of executemany to run the same statement with multiple values. Then commit.

You might be able to do the UPSERT in bulk. INSERT can take multiple rows at once. For example, this inserts three rows.

insert into whatever
(foo, bar, baz)
values (1, 2, 3),
       (4, 5, 6),
       (7, 8, 9)

You can likely do the same with your INSERT ... ON DUPLICATE KEY UPDATE, reducing the amount of overhead to talk to the database. See this post for an example (in PHP, but you should be able to adapt). This sacrifices returning the ID of the last inserted row, but them's the breaks.
What is the best way to achieve speedy inserts of large amounts of data in MySQL?
- Use the mysqlimport tool or the LOAD DATA INFILE command.
- Temporarily disable indices that you don't need for data integrity
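Those two tips combine naturally: disable the indexes, load the file, then rebuild. A sketch of the statements involved (file path and table name are illustrative; LOAD DATA LOCAL INFILE must be enabled on both client and server, and DISABLE KEYS only affects non-unique indexes on MyISAM tables):

```python
def bulk_load_statements(table, csv_path):
    """Return the statements for a bulk load: pause non-unique index
    maintenance, load the file, then rebuild the indexes in one pass."""
    return [
        "ALTER TABLE %s DISABLE KEYS" % table,
        "LOAD DATA LOCAL INFILE '%s' INTO TABLE %s "
        "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'" % (csv_path, table),
        "ALTER TABLE %s ENABLE KEYS" % table,
    ]

# for stmt in bulk_load_statements("temp", "/tmp/export.csv"):
#     cursor.execute(stmt)
```

Rebuilding the indexes once at the end is much cheaper than updating them incrementally for every loaded row.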
Improving Speed of SQL 'Update' function - break into Insert/ Delete?
There are many possible 'answers' to your questions.
13/second -- there's a lot that can be done...
INSERT ... ON DUPLICATE KEY UPDATE ... ('IODKU') is usually the best way to do "update, else insert" (unless I don't know what you mean by it).

Batched inserts are much faster than inserting one row at a time. Optimal is around 100 rows, giving a 10x speedup. IODKU can (usually) be batched, too; see the VALUES() pseudo-function.
BEGIN; ...lots of writes... COMMIT; cuts back significantly on the overhead of transactions.
Using a "staging" table for gathering things up can have a significant benefit. My blog discusses that; it also covers batch "normalization".
Building Summary Tables on the fly interferes with high speed data ingestion. Another blog covers Summary tables.
Normalization can be used for de-dupping, hence shrinking the disk footprint. This can be important for decreasing I/O for the 'Fact' table in Data Warehousing. (I am referring to your 20 x VARCHAR(50).)
RAID striping is a hardware help.
A battery-backed write cache on a RAID controller makes writes seem instantaneous.
SSDs speed up I/O.
If you provide some more specifics (SHOW CREATE TABLE, SQL, etc.), I can be more specific.