How to avoid Using temporary in many-to-many queries?
Here's a simplified example I did for a similar performance-related question some time ago. It takes advantage of InnoDB clustered primary key indexes (only available with InnoDB!):
- http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
- http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/
You have 3 tables: category, product and product_category as follows:
drop table if exists product;
create table product
(
prod_id int unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb;
drop table if exists category;
create table category
(
cat_id mediumint unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb;
drop table if exists product_category;
create table product_category
(
cat_id mediumint unsigned not null,
prod_id int unsigned not null,
primary key (cat_id, prod_id) -- **note the clustered composite index** !!
)
engine = innodb;
The most important thing is the column order of the product_category clustered composite primary key, as typical queries for this scenario always lead with cat_id = x or cat_id in (x,y,z...).
We have 500K categories, 1 million products and 125 million product categories.
select count(*) from category;
+----------+
| count(*) |
+----------+
| 500000 |
+----------+
select count(*) from product;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
select count(*) from product_category;
+-----------+
| count(*) |
+-----------+
| 125611877 |
+-----------+
So let's see how this schema performs for a query similar to yours. All queries are run cold (after mysql restart) with empty buffers and no query caching.
select
p.*
from
product p
inner join product_category pc on
pc.cat_id = 4104 and pc.prod_id = p.prod_id
order by
p.prod_id desc -- sorry, no date field in this sample table; won't make any difference though
limit 20;
+---------+----------------+
| prod_id | name |
+---------+----------------+
| 993561 | Product 993561 |
| 991215 | Product 991215 |
| 989222 | Product 989222 |
| 986589 | Product 986589 |
| 983593 | Product 983593 |
| 982507 | Product 982507 |
| 981505 | Product 981505 |
| 981320 | Product 981320 |
| 978576 | Product 978576 |
| 973428 | Product 973428 |
| 959384 | Product 959384 |
| 954829 | Product 954829 |
| 953369 | Product 953369 |
| 951891 | Product 951891 |
| 949413 | Product 949413 |
| 947855 | Product 947855 |
| 947080 | Product 947080 |
| 945115 | Product 945115 |
| 943833 | Product 943833 |
| 942309 | Product 942309 |
+---------+----------------+
20 rows in set (0.70 sec)
explain
select
p.*
from
product p
inner join product_category pc on
pc.cat_id = 4104 and pc.prod_id = p.prod_id
order by
p.prod_id desc -- sorry, no date field in this sample table; won't make any difference though
limit 20;
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
| 1 | SIMPLE | pc | ref | PRIMARY | PRIMARY | 3 | const | 499 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | p | eq_ref | PRIMARY | PRIMARY | 4 | vl_db.pc.prod_id | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
2 rows in set (0.00 sec)
So that's 0.70 seconds cold - ouch.
Hope this helps :)
EDIT
Having just read your reply to my comment above, it seems you have one of two choices to make:
create table articles_to_categories
(
article_id int unsigned not null,
category_id mediumint unsigned not null,
primary key(article_id, category_id), -- good for queries that lead with article_id = x
key (category_id)
)
engine=innodb;
or:
create table categories_to_articles
(
article_id int unsigned not null,
category_id mediumint unsigned not null,
primary key(category_id, article_id), -- good for queries that lead with category_id = x
key (article_id)
)
engine=innodb;
How you define your clustered PK depends on your typical queries.
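To illustrate (a sketch using the two alternative tables above), each choice of clustered PK makes one access pattern cheap, because the matching rows are stored physically together:

```sql
-- served efficiently by categories_to_articles (PK leads with category_id):
select article_id
from categories_to_articles
where category_id = 4104       -- one contiguous range scan in the clustered index
order by article_id desc
limit 20;

-- served efficiently by articles_to_categories (PK leads with article_id):
select category_id
from articles_to_categories
where article_id = 12345;      -- all of this article's categories sit together
```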
How to avoid Using index; Using temporary; Using filesort in MySQL, 21 table JOIN
- Using index (also type=index) -- This is a 'good' thing. It says that the work for that table can be done entirely in the BTree of the index, without touching the data.
- Using temporary -- This says that at least one temp table is needed for the execution of the query. There are queries that absolutely must have a tmp table, so live with it. Note: the position of this phrase in the EXPLAIN does not indicate which table really needs the tmp or filesort. See EXPLAIN FORMAT=JSON SELECT ... for such detail.
- Using filesort -- This says that at least one step required sorting. It does not say whether that sort had to hit the disk. Again, it may be absolutely necessary.
- Using where -- does not mean much of anything.
- Using index condition -- This is different from "Using index". ICP (Index Condition Pushdown) means that a certain efficiency is being used for a complex WHERE; this is available only in newer versions.
- eq_ref -- One row in the next table is all that is needed. Good.
- ref -- Not 1:1, but 1:many. From the EXPLAIN, it seems to often be closer to 1:1.
As for speeding it up...
- Do you need LEFT? If not, get rid of it; the optimizer may prefer some other order of tables.
- The EXPLAIN says that not many rows needed to be fetched. (Caution: Explain's "rows" numbers are approximate.) So, I don't see much help here.
- See if you can make "covering indexes" for some of the JOINs -- but do this only for the "ref" cases, not the "eq_ref + PRIMARY" cases. It looks like you have already done a thorough job of this.
- What will you do with 11000 rows? This seems 'unreasonable' for a UI. If it is destined for some further processing, how often is it done? (That is, does 10 seconds really matter?)
- What Engine are you using? How much RAM do you have? SHOW VARIABLES LIKE '%buffer%'; -- I am fishing for memory utilization.
- JOIN + GROUP BY -- This often means an explosion of rows, followed by an implosion. Changing some of the JOINs to subqueries in the SELECT may improve things: ( SELECT ... FROM ... WHERE ... LIMIT 1 ) AS whatever.
- Are any of the fields TEXT? That forces a "filesort" out of memory (hence, slower). Give me more details and perhaps we can work around it.
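The JOIN + GROUP BY rewrite mentioned above can be sketched like this (the table and column names are made up for illustration):

```sql
-- Before: the join explodes to one row per (order, note) pair,
-- then GROUP BY implodes them back to one row per order
select o.id, max(n.created) as last_note
from orders o
left join notes n on n.order_id = o.id
group by o.id;

-- After: a correlated subquery in the SELECT list avoids the explosion
select o.id,
       ( select max(n.created)
         from notes n
         where n.order_id = o.id ) as last_note
from orders o;
```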
In summary, the EXPLAIN looks pretty clean. The huge number of JOINs is another matter.
Two potentially bad schema designs:
- Arrays across tables (*_price)
- Arrays across columns (level*, sku*, user*, genre*)
Addenda
For 16GB of RAM and a mixture of InnoDB and MyISAM, I suggest key_buffer_size = 1500M and innodb_buffer_pool_size = 5G. Set those in my.cnf (or my.ini) and restart mysqld.
Optimise query to avoid using temporary tables
OK, I found the solution!
Update the index like this:
KEY `pages_objects_likes_page_id_created_time_index` (`page_id`,`user_id`,`created_time`)
And update the query by inverting the group by statement:
GROUP BY c.page_id, c.user_id
The index is now used everywhere ;)
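For context, a sketch of what this looks like put together (the original query isn't shown here, so the SELECT list and the MAX() aggregate are assumptions):

```sql
ALTER TABLE pages_objects_likes
  ADD KEY pages_objects_likes_page_id_created_time_index (page_id, user_id, created_time);

-- The GROUP BY columns now follow the index's column order (page_id, user_id),
-- so MySQL can read the index in order instead of building a temporary table
SELECT c.page_id, c.user_id, MAX(c.created_time)
FROM pages_objects_likes c
GROUP BY c.page_id, c.user_id;
```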
is it always bad to have Using temporary and filesort?
Do what Gordon says, but use ORDER BY last_order_date DESC -- order by date_created desc does not make sense.
It may switch to a table scan if the list is "too long". This may be a difference in EXPLAIN between MySQL and MariaDB. (The resultsets will be identical.)
If you do EXPLAIN FORMAT=JSON SELECT ..., you may find that there are two filesorts.
Back to your original question...
"filesort" and "using temporary" are necessary in some cases -- especially like your case. After it GROUPs
the results, the ORDER BY
calls for sorting in a way that was not specified by the GROUP BY
. This necessitates storing the data and sorting it.
"FILEsort" is a misnomer. In most cases, the rows are sitting in RAM and can be very quickly sorted. For very large resultsets and other complex situations, a "temporary" "file" will actually be used.
The Optimizer turns your list of ORs into an IN, like Gordon's answer. So, there is essentially no difference between the two ways of writing it. (I find IN to be cleaner and more concise.)
Using index condition means that InnoDB is taking on some of the work that the generic "Handler" normally does. (This is good, but not a big deal.) However, replacing INDEX(product_id) with INDEX(product_id, date_created) is likely to be even better because it is "covering", which will be indicated by Using index.
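As a sketch (the actual table name isn't shown in the question, so order_items is an assumption):

```sql
ALTER TABLE order_items
  DROP INDEX product_id,
  ADD INDEX prod_created (product_id, date_created);

-- A query like this can now be answered entirely from the index ("Using index"):
SELECT date_created
FROM order_items
WHERE product_id = 1234
ORDER BY date_created;
```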
"I have index for both fields" -- That is not the same as the composite index I am recommending.
You say "100 primary keys", but I suspect you mean "secondary" keys. Please provide SHOW CREATE TABLE orders
to discuss this.
I disagree with the old wives' tale: "should avoid a query with Using temporary; Using filesort". Those are just clues that you are doing something that needs such complexity. It can rarely be "avoided".
Slow MySQL query, EXPLAIN shows Using temporary; Using filesort
As already mentioned in a comment, the main cause of the slow query is that you seem to have single-column indexes only, while you would need multi-column indexes to cover the joins, the filters, and the group by.
Also, your query has 2 other issues:
- Even though you group by on 2 fields only, several other fields are listed in the select list without being subject to an aggregate function, such as min(). MySQL does allow such queries to be run under certain sql_mode settings, but they are still against the SQL standard and may have unexpected side effects, unless you really know what you are doing.
- You have filters on the ppi_loan table in the join condition, and ppi_loan is the left table in a left join. Due to the nature of the left join, these records will not be eliminated from the resultset, but MySQL will not join any values on them. These criteria should be moved to the where clause.
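A sketch of that second fix (the joined table and the filter column are assumptions, since the full query isn't reproduced here):

```sql
-- Before: the filter on the left table sits in the ON clause,
-- so it cannot eliminate any ppi_loan rows from the resultset
select l.loanID
from ppi_loan l
left join ppi_assessment a
       on a.loanID = l.loanID
      and l.date_completed is not null;

-- After: moved to WHERE, where it actually filters rows
select l.loanID
from ppi_loan l
left join ppi_assessment a
       on a.loanID = l.loanID
where l.date_completed is not null;
```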
The indexes I would create:
ppi_sar_status: multi-column index on the loanID, status, and history fields - I would also consider moving this table's criteria into the join condition, since this table is not referenced anywhere else.
ppi_ques_status: multi-column index on loanID, status, timestamp fields - this would support both the subquery and the join. Remember, the subquery also has filesort in the explain.
ppi_loan: as a minimum, a multi-column index on the customerID and loan_number fields to support the group by clause, thereby avoiding the filesort at least. You may consider adding the other fields in the join criteria to this index, based on their selectivity.
I'm also not sure why you have the last 2 status tables in the join, since you are not retrieving any values from them. If you're using these tables to eliminate certain records, then consider using an exists() subquery instead of a join. In a join, MySQL needs to fetch data from all joined tables, whereas in an exists() subquery it would only check whether at least 1 record exists in the resultset, without retrieving any actual data from the underlying tables.
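The exists() rewrite can be sketched as follows (the status column and its value are assumptions):

```sql
-- join version: MySQL fetches rows from ppi_sar_status even though
-- nothing from that table appears in the select list
select distinct l.loanID
from ppi_loan l
join ppi_sar_status s
  on s.loanID = l.loanID and s.status = 'open';

-- exists() version: only checks that at least one matching row exists
select l.loanID
from ppi_loan l
where exists ( select 1
               from ppi_sar_status s
               where s.loanID = l.loanID
                 and s.status = 'open' );
```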