Is InnoDB Sorting Really That Slow?


That is a large result set (66,424 rows) that MySQL must sort manually. Try adding an index on metaward_achiever.modified.

There is a limitation in MySQL 4.x that allows MySQL to use only one index per table in a query. Since it is using the index on the metaward_achiever.award_id column for the WHERE selection, it cannot also use the index on metaward_achiever.modified for the sort. I hope you're using MySQL 5.x, which may have improved this.

You can see this by running EXPLAIN on this simplified query:

SELECT * FROM `metaward_achiever` 
WHERE `metaward_achiever`.`award_id` = 1507
ORDER BY `metaward_achiever`.`modified` DESC
LIMIT 100

If you can get this query to use indexes for both the WHERE selection and the sort, then you're set.
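If the sort is not using an index, EXPLAIN will report "Using filesort" in the Extra column:

EXPLAIN SELECT * FROM `metaward_achiever`
WHERE `metaward_achiever`.`award_id` = 1507
ORDER BY `metaward_achiever`.`modified` DESC
LIMIT 100;

Once the ORDER BY is satisfied by an index, that "Using filesort" note disappears.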

You could also create a compound index on both metaward_achiever.award_id and metaward_achiever.modified. If MySQL doesn't use it, you can hint at it or remove the index on just award_id.
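A sketch of that compound index (the index name here is illustrative):

ALTER TABLE metaward_achiever
  ADD INDEX award_id_modified (award_id, modified);

With this index, MySQL can use the leading award_id column for the WHERE clause and read the rows in modified order directly from the index, avoiding the filesort entirely.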

Alternatively, if you can get rid of metaward_achiever.id, make metaward_achiever.award_id your primary key and add a key on metaward_achiever.modified. Better yet, make the combination of metaward_achiever.award_id and metaward_achiever.modified your primary key. Then you'll be in really good shape.

You can also try to tune the filesort itself via server settings. Unfortunately, I'm not experienced with this, as our DBA handles the configuration, but you might want to check out this great blog:
http://www.mysqlperformanceblog.com/

Here's an article about filesort in particular:
http://s.petrunia.net/blog/?p=24

MySQL InnoDB indexes slowing down sorts

In the slow case, MySQL is making an assumption that the index on STATUS will greatly limit the number of users it has to sort through. MySQL is wrong. Presumably most of your users are ACTIVE. MySQL is picking up 50k user rows, checking their ACCESS_ID, joining to MIGHT_FLOCK, sorting the results and taking the first 100 (out of 50k).

In the fast case, you have told MySQL it can't use either index on USERS. MySQL is using its next-best index, it is taking the first 100 rows from MIGHT_FLOCK using the STREAK index (which is already sorted), then joining to USERS and picking up the user rows, then checking that your users are ACTIVE and have an ACCESS_ID at or above 8. This is much faster because only 100 rows are read from disk (x2 for the two tables).

I would recommend:

  • Drop the index on STATUS unless you frequently need to retrieve INACTIVE users (not ACTIVE users). This index is not helping you.
  • Read this question to understand why your sorts are so slow. You can probably tune InnoDB for better sort performance to prevent these kinds of problems.
  • If you have very few users with ACCESS_ID at or above 8, you should see a dramatic improvement already. If not, you might have to use STRAIGHT_JOIN in your select clause.

Example below:

SELECT *
FROM MIGHT_FLOCK mf
STRAIGHT_JOIN USERS u ON (u.USER_ID = mf.USER_ID)
WHERE u.STATUS = 'ACTIVE'
  AND u.ACCESS_ID >= 8
ORDER BY mf.STREAK DESC
LIMIT 0, 100

STRAIGHT_JOIN forces MySQL to access the MIGHT_FLOCK table before the USERS table based on the order in which you specify those two tables in the query.

To answer the question "Why did the behaviour change" you should start by understanding the statistics that MySQL keeps on each index: http://dev.mysql.com/doc/refman/5.6/en/myisam-index-statistics.html. If statistics are not up to date or if InnoDB is not providing sufficient information to MySQL, the query optimiser can (and does) make stupid decisions about how to join tables.
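If the statistics look stale, ANALYZE TABLE recalculates them, and SHOW INDEX displays the cardinality estimates the optimiser is working from:

ANALYZE TABLE USERS, MIGHT_FLOCK;
SHOW INDEX FROM USERS;

A Cardinality figure that is wildly off from the real number of distinct values in a column is a good sign the optimiser is being misled.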

Why is InnoDB so painfully slow on full table scans even though entire data is in buffer pool?

There is quite a bit of overhead in the interface between InnoDB and the SQL interpreter of the MySQL or MariaDB server.

In InnoDB, each access must be protected by a buffer pool page latch. A mini-transaction object will keep track of the acquired latches. Basically, for every fetched row, InnoDB will start a mini-transaction, look up the B-tree leaf page in the buffer pool, acquire the page latch, copy the data, and finally commit the mini-transaction and release the page latch.

There are a couple of optimizations on top of this, but this is insufficient, and it would be better to implement MDEV-16232 to allow a mini-transaction to persist across the entire range scan. In that way, we would only acquire and release page latches when advancing to the next page.

In range scans, a persistent cursor (btr_pcur_t) will store the current position. When the cursor position is restored at the start of the next mini-transaction (to fetch the next record), an optimistic restore will be attempted, with the assumption that the old pointer to the buffer pool page is still valid.

InnoDB also implements a prefetch buffer. After 4 next-record read operations, InnoDB will copy 8 records at a time to the buffer, within a single mini-transaction. Subsequent requests will then be satisfied from this buffer. This mechanism would be made redundant by MDEV-16232 and should be removed as part of implementing it.

Implementing MDEV-16232 would also speed up UPDATE and DELETE operations, by removing the need to acquire explicit record locks. If we continuously hold the page latch for the whole duration of deleting or updating a row, we can rely on implicit locking whenever no conflicts exist, just like we do in the INSERT case.

Why is MySQL InnoDB insert so slow?

InnoDB doesn't cope well with 'random' primary keys. Try a sequential key or auto-increment, and I believe you'll see better performance. Your 'real' key field could still be indexed, but for a bulk insert you might be better off dropping and recreating that index in one hit after the insert is complete. Would be interested to see your benchmarks for that!
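A sketch of that load pattern (the table and index names are illustrative):

ALTER TABLE mytable DROP INDEX idx_realkey;
-- ... run the bulk INSERTs, letting the auto-increment primary key grow sequentially ...
ALTER TABLE mytable ADD INDEX idx_realkey (realkey);

Building the secondary index once at the end is generally cheaper than maintaining it row by row during the load.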

Some related questions

  • Slow INSERT into InnoDB table with random PRIMARY KEY column's value
  • Why do MySQL InnoDB inserts / updates on large tables get very slow when there are a few indexes?
  • InnoDB inserts very slow and slowing down

mysql is slow with InnoDB during insert compared to MYISAM

As noted, flushing the log at every commit often causes a lot of disk stress, which greatly reduces data throughput on MySQL instances running InnoDB.

Setting innodb_flush_log_at_trx_commit = 0 (or 2) in your mysql.ini often solves this issue.

Please note, ACID rules would love that value to be at 1...
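In the [mysqld] section of my.cnf / mysql.ini, that looks like:

[mysqld]
innodb_flush_log_at_trx_commit = 2

With 2, the log is written at each commit but flushed to disk only about once per second, so up to roughly a second of transactions can be lost if the OS crashes; with 0, even the write happens only once per second. Only 1 gives full durability.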

Are full count queries really so slow on a large MySQL InnoDB tables?

In addition to what Bill says...

Smallest index

InnoDB picks the 'smallest' index for doing COUNT(*). It could be that all of the indexes of communication are bigger than the smallest of transaction, hence the time difference. When judging the size of an index, include the PRIMARY KEY column(s) with any secondary index:

PRIMARY KEY(id),  -- INT (4 bytes)
INDEX(flag),      -- TINYINT (1 byte)
INDEX(name),      -- VARCHAR(255) (? bytes)

For measuring size, the PRIMARY KEY is big since it includes (due to clustering) all the columns of the table. INDEX(flag) is "5 bytes" (the 1-byte flag plus the 4-byte id). INDEX(name) probably averages a few dozen bytes. SELECT COUNT(*) will clearly pick INDEX(flag).

Apparently transaction has a 'small' index, but communication does not.

TEXT/BLOB columns are sometimes stored "off-record". Hence, they do not count toward the size of the PK index.

Query Cache

If the "Query cache" is turned on, the second running of a query may be immensely faster than the first. But that is only if there were no changes to the table in the mean time. Since any change to the table invalidates all QC entries for that table, the QC is rarely useful in production systems. By "faster" I mean on the order of 0.001 seconds; not 1.44 seconds.

The difference between 1m38s and 1.44s is probably due to what was cached in the buffer_pool -- the general caching area for InnoDB. The first run probably found none of the 'smallest' index in RAM so it did a lot of I/O, taking 98 seconds to fetch all 4.5M rows of that index. The second run found all that data cached in the buffer_pool, so it ran at CPU speed (no I/O), hence much faster.
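When timing such runs yourself, SQL_NO_CACHE takes the Query cache out of the equation, so the difference you measure reflects only the buffer_pool:

SELECT SQL_NO_CACHE COUNT(*) FROM `transaction`;

Run it twice: the first timing shows the cold (I/O-bound) cost, the second the warm (CPU-bound) cost.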

Good Enough

In situations like this, I question the necessity of doing the COUNT(*) at all. Notice how you said "2.8 mio entries", as if 2 significant digits was "good enough". If you are displaying the count to users on a UI, won't that be "good enough"? If so, one solution to the performance is to do the count once a day and store it some place. This would allow instantaneous access to a "good enough" value.

There are other techniques. One is to keep the counter updated, either with active code, or with some form of Summary Table.
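A minimal sketch of the once-a-day approach (table and event names are illustrative, and it assumes the event scheduler is enabled via event_scheduler=ON):

CREATE TABLE row_counts (
  tbl VARCHAR(64) PRIMARY KEY,
  cnt BIGINT,
  updated TIMESTAMP
);

CREATE EVENT refresh_counts
ON SCHEDULE EVERY 1 DAY DO
  REPLACE INTO row_counts
  SELECT 'communication', COUNT(*), NOW() FROM communication;

The UI then reads the stored count instantly instead of scanning millions of index entries.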

Throwing hardware at it

You already found that changing the hardware did not help.

  • The 98s was as fast as any of RDS's I/O offerings can run.
  • The 1.44s was as fast as any one RDS CPU can run.
  • MySQL (and its variants) do not use more than one CPU per query.
  • You had enough RAM that the entire 'small' index stayed in the buffer_pool until your second SELECT COUNT(*). (Too little RAM would have made the second run very slow as well.)

MYSQL, very slow order by

Seems you're suffering from MySQL's inability to do late row lookups:

  • MySQL ORDER BY / LIMIT performance: late row lookups
  • Late row lookups: InnoDB

Try this:

SELECT  p.*, u.*
FROM    (
        SELECT  id
        FROM    photo
        ORDER BY
                uploaddate DESC, id DESC
        LIMIT   10
        OFFSET  100000
        ) pi
JOIN    photo p
ON      p.id = pi.id
JOIN    user u
ON      u.user_id = p.user_id

InnoDB inserts very slow and slowing down

InnoDB provides a more complex key structure than MyISAM (FOREIGN KEYs), and regenerating keys is really slow in InnoDB. You should enclose all your update/insert statements in one transaction (those are actually quite fast in InnoDB). I once had about 300,000 INSERT queries on an InnoDB table with 2 indexes and it took around 30 minutes; once I enclosed every 10,000 inserts in BEGIN TRANSACTION and COMMIT, it took less than 2 minutes.

I recommend to use:

BEGIN TRANSACTION;
SELECT ... FROM products;
UPDATE ...;
INSERT INTO ...;
INSERT INTO ...;
INSERT INTO ...;
COMMIT;

This will cause InnoDB to refresh indexes just once, not a few hundred times.

Let me know if it worked

Why is myisam slower than Innodb

Executive Summary: Use InnoDB, and change the my.cnf settings accordingly.

Details:

"MyISAM is faster" -- This is an old wives' tale. Today, InnoDB is faster in most situations.

Assuming you have at least 4GB of RAM...

  • If all-MyISAM, key_buffer_size should be about 20% of RAM; innodb_buffer_pool_size should be 0.
  • If all-InnoDB, key_buffer_size should be, say, only 20MB; innodb_buffer_pool_size should be about 70% of RAM.
  • If a mixture, do something in between. More discussion.
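For example, on a 16GB server running only InnoDB, those guidelines would translate into something like this in my.cnf (the exact numbers depend on your RAM and workload):

[mysqld]
key_buffer_size = 20M
innodb_buffer_pool_size = 11G

The point is that each engine has its own main cache, and the memory should go to the engine you actually use.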

Let's look at how things are handled differently by the two Engines.

  • MyISAM puts the entire BLOB 'inline' with the other columns.
  • InnoDB puts most or all of each blob in other blocks.

Conclusion:

A table scan in a MyISAM table spends a lot of time stepping over cow paddies; InnoDB is much faster if you don't touch the BLOB.

This makes InnoDB a clear winner for SELECT SUM(x) FROM tbl; when there is no index on x. With INDEX(x), either engine will be fast.

Because of the BLOB being inline, MyISAM has fragmentation issues if you update records in the table; InnoDB has much less fragmentation. This impacts all operations, making InnoDB the winner again.

The order of the columns in the CREATE TABLE has no impact on performance in either engine.

Because the BLOB dominates the size of each row, the tweaks to the other columns will have very little impact on performance.

If you decide to go with MyISAM, I would recommend a 'parallel' table ('vertical partitioning'). Put the BLOB and the id in a separate table. This would help MyISAM come closer to InnoDB's model and performance, but would add complexity to your code.
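A sketch of that split (table and column names are illustrative):

CREATE TABLE tbl_meta (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  x INT
) ENGINE=MyISAM;

CREATE TABLE tbl_blob (
  id INT UNSIGNED PRIMARY KEY,  -- same id as tbl_meta
  payload MEDIUMBLOB
) ENGINE=MyISAM;

Scans of tbl_meta no longer have to step over the BLOBs; you fetch the payload with a join only when you actually need it.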

For "point queries" (looking up a single row via an index), there won't be much difference in performance between the engines.

Your my.cnf seems antique; set-variable has not been necessary in a long time.


