How to Avoid "Using Temporary" in Many-To-Many Queries


Here's a simplified example I put together for a similar performance-related question some time ago. It takes advantage of InnoDB's clustered primary key indexes (obviously only available with InnoDB!):

  • http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
  • http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/

You have 3 tables: category, product and product_category as follows:

drop table if exists product;
create table product
(
  prod_id int unsigned not null auto_increment primary key,
  name varchar(255) not null unique
)
engine = innodb;

drop table if exists category;
create table category
(
  cat_id mediumint unsigned not null auto_increment primary key,
  name varchar(255) not null unique
)
engine = innodb;

drop table if exists product_category;
create table product_category
(
  cat_id mediumint unsigned not null,
  prod_id int unsigned not null,
  primary key (cat_id, prod_id) -- note the clustered composite index!
)
engine = innodb;

The most important thing is the order of the product_category clustered composite primary key, as typical queries for this scenario always lead with cat_id = x or cat_id in (x,y,z...).
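To illustrate the point about column order (hypothetical IDs), queries that lead with cat_id can seek directly into the clustered index, while a query that leads with prod_id alone cannot:

```sql
-- Can seek on the clustered PK (cat_id, prod_id): the leading column is bound.
select prod_id from product_category where cat_id = 4104;
select prod_id from product_category where cat_id in (4104, 4105, 4106);

-- Cannot seek on the clustered PK: prod_id is the second column.
-- This shape would need a secondary index such as key (prod_id)
-- to avoid scanning the whole table.
select cat_id from product_category where prod_id = 993561;
```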

We have 500K categories, 1 million products and 125 million product categories.

select count(*) from category;
+----------+
| count(*) |
+----------+
|   500000 |
+----------+

select count(*) from product;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+

select count(*) from product_category;
+-----------+
| count(*)  |
+-----------+
| 125611877 |
+-----------+

So let's see how this schema performs for a query similar to yours. All queries are run cold (after mysql restart) with empty buffers and no query caching.

select
p.*
from
product p
inner join product_category pc on
pc.cat_id = 4104 and pc.prod_id = p.prod_id
order by
p.prod_id desc -- sorry, no date field in this sample table; won't make any difference though
limit 20;

+---------+----------------+
| prod_id | name |
+---------+----------------+
| 993561 | Product 993561 |
| 991215 | Product 991215 |
| 989222 | Product 989222 |
| 986589 | Product 986589 |
| 983593 | Product 983593 |
| 982507 | Product 982507 |
| 981505 | Product 981505 |
| 981320 | Product 981320 |
| 978576 | Product 978576 |
| 973428 | Product 973428 |
| 959384 | Product 959384 |
| 954829 | Product 954829 |
| 953369 | Product 953369 |
| 951891 | Product 951891 |
| 949413 | Product 949413 |
| 947855 | Product 947855 |
| 947080 | Product 947080 |
| 945115 | Product 945115 |
| 943833 | Product 943833 |
| 942309 | Product 942309 |
+---------+----------------+
20 rows in set (0.70 sec)

explain
select
p.*
from
product p
inner join product_category pc on
pc.cat_id = 4104 and pc.prod_id = p.prod_id
order by
p.prod_id desc -- sorry, no date field in this sample table; won't make any difference though
limit 20;

+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref              | rows | Extra                                        |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
|  1 | SIMPLE      | pc    | ref    | PRIMARY       | PRIMARY | 3       | const            |  499 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | p     | eq_ref | PRIMARY       | PRIMARY | 4       | vl_db.pc.prod_id |    1 |                                              |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
2 rows in set (0.00 sec)

So that's 0.70 seconds cold - ouch.

Hope this helps :)

EDIT

Having just read your reply to my comment above it seems you have one of two choices to make:

create table articles_to_categories
(
article_id int unsigned not null,
category_id mediumint unsigned not null,
primary key(article_id, category_id), -- good for queries that lead with article_id = x
key (category_id)
)
engine=innodb;

or:

create table categories_to_articles
(
article_id int unsigned not null,
category_id mediumint unsigned not null,
primary key(category_id, article_id), -- good for queries that lead with category_id = x
key (article_id)
)
engine=innodb;

How you define your clustered PK depends on your typical queries.
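For example (hypothetical IDs), each layout favours the opposite access pattern:

```sql
-- Fast with primary key (article_id, category_id):
-- the clustered index leads with article_id.
select category_id from articles_to_categories where article_id = 42;

-- Fast with primary key (category_id, article_id):
-- the clustered index leads with category_id.
select article_id from categories_to_articles where category_id = 7;
```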

How to avoid Using index; Using temporary; Using filesort in MySQL, 21 table JOIN

  • Using index (also type=index) -- This is a 'good' thing. It says that the work for that table can be done entirely in the BTree of the index, without touching the data.
  • Using temporary -- This says that at least one temp table is needed for the execution of the query. There are queries that absolutely must have a tmp table. So live with it. Note: the position of this phrase in the EXPLAIN does not indicate which table really needs the tmp or filesort. See EXPLAIN FORMAT=JSON SELECT ... for such detail.
  • Using filesort -- This says that at least one step required sorting. It does not say whether that sort had to hit the disk. Again, it may be absolutely necessary.
  • Using where -- does not mean much of anything
  • Using index condition -- This is different than "Using index". ICP means that a certain efficiency is being used for a complex WHERE; this is available only in newer versions.
  • eq_ref - One row in the next table is all that is needed. Good.
  • ref - Not 1:1, but 1:many. From the EXPLAIN, it seems to often be closer to 1:1.

As for speeding it up...

  • Do you need LEFT? If not, get rid of it; the optimizer may prefer some other order of tables.
  • The EXPLAIN says that not many rows needed to be fetched. (Caution: EXPLAIN's "rows" numbers are approximate.) So, I don't see much help here.
  • See if you can make "covering indexes" for some of the JOINs -- but do this only for the "ref" cases, not the "eq_ref + PRIMARY" cases. It looks like you have already done a thorough job of this.
  • What will you do with 11000 rows? This seems 'unreasonable' for a UI? If it is destined for some further processing, how often is it done? (That is, does 10 seconds really matter?)
  • What Engine are you using? How much RAM do you have? SHOW VARIABLES LIKE '%buffer%'; -- I am fishing for memory utilization.
  • JOIN + GROUP BY -- This often means an explosion of rows, followed by an implosion. Changing some of the JOIN to subqueries in the SELECT may improve things: ( SELECT ... FROM ... WHERE ... LIMIT 1 ) AS whatever.
  • Are any of the fields TEXT? That forces a "filesort" out of memory (hence, slower). Give me more details and perhaps we can work around it.
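To sketch the covering-index idea from the list above (table and column names are hypothetical): if a JOIN only needs a couple of columns from a table, an index containing all of them lets MySQL answer from the index B-tree alone, which EXPLAIN reports as Using index.

```sql
-- Hypothetical lookup table joined on genre_id but selecting only genre_name.
-- INDEX(genre_id, genre_name) "covers" the join: no lookup into the row data.
alter table genres add index idx_genre_cover (genre_id, genre_name);

select i.item_id, g.genre_name
from items i
inner join genres g on g.genre_id = i.genre_id;
```

As the bullet says, do this only for the "ref" joins; an eq_ref lookup on the PRIMARY key is already as cheap as it gets.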

In summary, the EXPLAIN looks pretty clean. The huge number of JOINs is another matter.

Two potentially bad schema designs:

  • Arrays across tables (*_price)
  • Arrays across columns (level*, sku*, user*, genre*)

Addenda

For 16GB of RAM and a mixture of InnoDB and MyISAM, suggest key_buffer_size = 1500M and innodb_buffer_pool_size = 5G. Set those in my.cnf (or my.ini) and restart mysqld.
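As a config fragment, those settings would look like this in my.cnf (values from the advice above; adjust to your own RAM and workload):

```ini
[mysqld]
# MyISAM index cache (shared across all MyISAM tables)
key_buffer_size = 1500M
# InnoDB data and index cache
innodb_buffer_pool_size = 5G
```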

Optimise query to avoid using temporary tables

OK, I found the solution!

Update the index like this:

KEY `pages_objects_likes_page_id_created_time_index` (`page_id`,`user_id`,`created_time`)

And update the query by inverting the group by statement:

GROUP BY c.page_id, c.user_id

The index is now used everywhere ;)
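Putting the two changes together (index and column names as given above; the query shape is a hedged reconstruction, since the original query isn't shown):

```sql
-- Redefine the index so (page_id, user_id) leads, matching the GROUP BY,
-- with created_time last for the range/aggregate part.
alter table pages_objects_likes
  drop index pages_objects_likes_page_id_created_time_index,
  add index pages_objects_likes_page_id_created_time_index
    (page_id, user_id, created_time);

-- Hypothetical query shape: the GROUP BY now leads with the index columns,
-- so no temporary table or filesort is needed for the grouping.
select c.page_id, c.user_id, max(c.created_time) as last_like
from pages_objects_likes c
group by c.page_id, c.user_id;
```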

Is it always bad to have Using temporary and filesort?

Do what Gordon says, but use

ORDER BY last_order_date DESC

order by date_created desc does not make sense.

It may switch to a table scan if the list is "too long". This may be a difference in EXPLAIN between MySQL and MariaDB. (The resultsets will be identical.)

If you do EXPLAIN FORMAT=JSON SELECT ..., you may find that there are two filesorts.

Back to your original question...

"filesort" and "using temporary" are necessary in some cases -- especially like your case. After it GROUPs the results, the ORDER BY calls for sorting in a way that was not specified by the GROUP BY. This necessitates storing the data and sorting it.

"FILEsort" is a misnomer. In most cases, the rows are sitting in RAM and can be very quickly sorted. For very large resultsets and other complex situations, a "temporary" "file" will actually be used.

The Optimizer turns your list of ORs into an IN like Gordon's answer. So, there is essentially no difference between the two ways of writing it. (I find IN to be cleaner and more concise.)
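For instance, these two WHERE clauses are treated identically by the optimizer (orders and product_id are taken from the surrounding discussion; the values are made up):

```sql
-- Equivalent: the optimizer converts the OR list into an IN list.
select * from orders where product_id = 1 or product_id = 2 or product_id = 3;
select * from orders where product_id in (1, 2, 3);
```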

Using index condition means that InnoDB is taking on some of the work that the generic "Handler" normally does. (This is good, but not a big deal.) However, replacing INDEX(product_id) by INDEX(product_id, date_created) is likely to be even better because it is "covering", which will be indicated by Using index.

"I have index for both fields" -- That is not the same as the composite index I am recommending.
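To make that distinction concrete (orders from the discussion; index names hypothetical), two single-column indexes are not the same thing as one composite index:

```sql
-- Two separate single-column indexes: NOT equivalent to a composite index.
alter table orders add index idx_product (product_id);
alter table orders add index idx_created (date_created);

-- The recommended composite index: filters on product_id, keeps rows
-- pre-sorted by date_created within each product, and is "covering"
-- if the query only touches these columns.
alter table orders add index idx_product_created (product_id, date_created);
```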

You say "100 primary keys", but I suspect you mean "secondary" keys. Please provide SHOW CREATE TABLE orders to discuss this.

I disagree with the old wives' tale: "should avoid a query with Using temporary; Using filesort". Those are just clues that you are doing something that needs such complexity. It can rarely be "avoided".

Slow MySQL query, EXPLAIN shows Using temporary; Using filesort

As already mentioned in a comment, the main cause of a slow query is that you seem to have single column indexes only, while you would need multi-column indexes to cover the joins, the filters, and the group by.

Also, your query has 2 other issues:

  1. Even though you group by on 2 fields only, several other fields are listed in the select list without being subject to an aggregate function, such as min(). MySQL does allow such queries to run under certain sql_mode settings, but they are still against the SQL standard and may have unexpected side effects, unless you really know what you are doing.

  2. You have filters on the ppi_loan table in the join condition, even though it is the left table in a left join. Due to the nature of the left join, these records will not be eliminated from the result set; MySQL simply will not join any values to them. These criteria should be moved to the where clause.
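A sketch of that second fix (ppi_loan and ppi_sar_status come from the question; the filter column is hypothetical):

```sql
-- Before: a filter on the LEFT table buried in the ON clause.
-- In a LEFT JOIN this does not eliminate rows from ppi_loan at all.
select l.loanID, s.status
from ppi_loan l
left join ppi_sar_status s
       on s.loanID = l.loanID
      and l.loan_type = 'PPI';   -- hypothetical filter, wrongly placed

-- After: the left-table filter moved to WHERE, where it actually filters.
select l.loanID, s.status
from ppi_loan l
left join ppi_sar_status s on s.loanID = l.loanID
where l.loan_type = 'PPI';
```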

The indexes I would create:

  • ppi_sar_status: multi-column index on loanID, status, history fields - I would consider moving this to the join section because this table is not there

  • ppi_ques_status: multi-column index on loanID, status, timestamp fields - this would support both the subquery and the join. Remember, the subquery also has filesort in the explain.

  • ppi_loan: as a minimum a multi-column index on customerID, loan_number fields to support the group by clause, therefore avoiding the filesort as a minimum. You may consider adding the other fields in the join criteria based on their selectivity to this index.

I'm also not sure why you have the last 2 status tables in the join, since you are not retrieving any values from them. If you are using these tables to eliminate certain records, then consider using an exists() subquery instead of a join. In a join, MySQL needs to fetch data from all joined tables, whereas in an exists() subquery it only checks whether at least 1 record exists in the result set, without retrieving any actual data from the underlying tables.
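A hedged sketch of the exists() rewrite (table names loosely based on the question):

```sql
-- Join version: fetches rows from the status table even though none of its
-- columns are selected, and can duplicate ppi_loan rows when several
-- status rows match.
select l.*
from ppi_loan l
inner join ppi_ques_status q on q.loanID = l.loanID;

-- exists() version: only checks that at least one matching row exists,
-- returns each loan at most once, and reads no data from the status table.
select l.*
from ppi_loan l
where exists (select 1 from ppi_ques_status q where q.loanID = l.loanID);
```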


