Rewriting MySQL Select to Reduce Time and Writing Tmp to Disk

Hope you find this helpful: http://pastie.org/1105206

drop table if exists poster;
create table poster
(
poster_id int unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb;


drop table if exists category;
create table category
(
cat_id mediumint unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb;

drop table if exists poster_category;
create table poster_category
(
cat_id mediumint unsigned not null,
poster_id int unsigned not null,
primary key (cat_id, poster_id) -- note the clustered composite index !!
)
engine = innodb;

-- FYI http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html

select count(*) from category
count(*)
========
500,000


select count(*) from poster
count(*)
========
1,000,000

select count(*) from poster_category
count(*)
========
125,675,688

select count(*) from poster_category where cat_id = 623
count(*)
========
342,820

explain
select
p.*,
c.*
from
poster_category pc
inner join category c on pc.cat_id = c.cat_id
inner join poster p on pc.poster_id = p.poster_id
where
pc.cat_id = 623
order by
p.name
limit 32;

id  select_type  table  type    possible_keys  key      key_len  ref                       rows
==  ===========  =====  ======  =============  =======  =======  ========================  ====
1   SIMPLE       c      const   PRIMARY        PRIMARY  3        const                     1
1   SIMPLE       p      index   PRIMARY        name     257      null                      32
1   SIMPLE       pc     eq_ref  PRIMARY        PRIMARY  7        const,foo_db.p.poster_id  1

select
p.*,
c.*
from
poster_category pc
inner join category c on pc.cat_id = c.cat_id
inner join poster p on pc.poster_id = p.poster_id
where
pc.cat_id = 623
order by
p.name
limit 32;

Statement: 21/08/2010
0:00:00.021: Query OK

How can I make this MySQL query perform better?

A different approach, if there are very few distinct values of attribute1, is to add an index on attribute1 to take advantage of a loose index scan.
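As a rough sketch, assuming a hypothetical table t1 with a column attribute1 (both names are placeholders): a loose index scan lets MySQL jump between distinct values in the index instead of reading every row, and EXPLAIN reports "Using index for group-by" when it applies:

CREATE INDEX idx_attribute1 ON t1 (attribute1);

EXPLAIN
SELECT DISTINCT attribute1
FROM t1;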

How can I rewrite a DELETE query so I can limit the rows affected?

Add ORDER BY and LIMIT in the part of the code that selects the rows to be deleted, then place it inside a derived table and join back to the table to be deleted:

DELETE f_del
FROM field_data_body AS f_del
JOIN
( SELECT f.PK -- the primary key of the table
FROM field_data_body f
INNER JOIN node n
ON f.entity_id = n.nid
LEFT JOIN content_to_keep k
ON n.nid = k.nid
WHERE n.type = 'article'
AND k.nid IS NULL
ORDER BY some_column
LIMIT 100
) AS tmp
ON tmp.PK = f_del.PK ;
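
For comparison, when no join is involved, MySQL allows ORDER BY and LIMIT directly on a single-table DELETE; the derived-table trick above is only needed because the multi-table form forbids them:

DELETE FROM field_data_body
ORDER BY some_column
LIMIT 100;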

What should be indexed to improve performance?

In general, the selection filters can use indexes on user_id or activity_type_id or both (in either order).

The ordering operation might be able to use an index on created_at.

It is likely that for this query, a composite index on (user_id, activity_type_id) would give the best result, assuming that MySQL can actually make use of it. Failing that, it is likely to be better to index user_id than activity_type_id because it is likely to provide better selectivity. One reason for thinking that is that there would be 4 subsections of the index to scan if it uses an index on activity_type_id, compared with just one subsection to scan if it uses an index on user_id alone.

Trying to rely on an index for the sort order is likely to mean a full table scan, so it is less likely to be beneficial. I would not create an index on created_at just to support this query, although there might be other queries where such an index would help.
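
As a concrete sketch of the recommendation above (the activities table name and the exact query shape are assumptions for illustration; user_id, activity_type_id, and created_at come from the discussion):

ALTER TABLE activities ADD INDEX user_type (user_id, activity_type_id);

EXPLAIN
SELECT *
FROM activities
WHERE user_id = 123
  AND activity_type_id IN (1, 2, 3, 4)
ORDER BY created_at DESC;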

Need basic advice regarding mysql indexes and query performance

The best way to think about indexes is how you expect to query the data.

Let's assume that products_id and categories_id are PRIMARY KEYs in their own tables, which means they are indexed automatically. If not, start with that.

For many-to-many join tables, if you want to be paranoid, create two composite indexes so the IDs can be looked up in either direction, e.g.

CREATE TABLE products_to_categories (
products_id integer unsigned NOT NULL,
categories_id integer unsigned NOT NULL,
INDEX p_to_c (products_id,categories_id),
INDEX c_to_p (categories_id,products_id)
) ENGINE=MyISAM;

This takes a lot of space, but it will be really, really fast. Unless you query in both directions (from products to categories and then the reverse), though, it's probably overkill. Alternatively, by default, I do:

CREATE TABLE products_to_categories (
products_id integer unsigned NOT NULL,
categories_id integer unsigned NOT NULL,
INDEX p (products_id),
INDEX c (categories_id)
) ENGINE=MyISAM;

If you need some sort of constraint (many-to-one, one-to-many) then change your index types to UNIQUE etc.
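
For example (a sketch built on the definition above): a UNIQUE index on the pair stops duplicate links, while a UNIQUE index on products_id alone enforces many-to-one, i.e. each product in at most one category:

CREATE TABLE products_to_categories (
products_id integer unsigned NOT NULL,
categories_id integer unsigned NOT NULL,
UNIQUE INDEX p_to_c (products_id, categories_id)
-- or, for many-to-one:
-- UNIQUE INDEX one_category_per_product (products_id)
) ENGINE=MyISAM;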

In general, start with the latter definition, run your query, and then run EXPLAIN on it. If it shows more than 1 for the number of matched rows on any table except the first, re-work the indexes.

Database indexing is really more a matter of testing and diagnostics than many think. I didn't know how to do this for a while, until I actually had a problem. In short:

  1. Create your indexes
  2. Determine your queries
  3. Run EXPLAIN on your queries, and run timing tests to determine query speed!
  4. Adjust your indexes
  5. Go back to 3

As one commenter noted, EXPLAIN is a good starting point before running timing tests, but nothing beats actual timing tests in the wild.
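
A minimal sketch of steps 3 and 4, using the join table above (the categories_id value is just an example):

EXPLAIN
SELECT pc.products_id
FROM products_to_categories pc
WHERE pc.categories_id = 42;

-- then time the real query; SHOW PROFILES needs profiling enabled first
SET profiling = 1;
SELECT pc.products_id
FROM products_to_categories pc
WHERE pc.categories_id = 42;
SHOW PROFILES;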

MySQL optimization of huge table

Here are some InnoDB examples that work on large tables of approximately 60 to 500 million rows. They demonstrate the advantages of a well-designed InnoDB table and how best to use clustered indexes (only available with InnoDB):

MySQL and NoSQL: Help me to choose the right one

60 million entries, select entries from a certain month. How to optimize database?

Rewriting mysql select to reduce time and writing tmp to disk

You will also want to read the following:

http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html

http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/

Once you've sorted out your table designs and optimised your innodb config:

http://www.mysqlperformanceblog.com/2006/09/29/what-to-tune-in-mysql-server-after-installation/

http://www.mysqlperformanceblog.com/2007/11/03/choosing-innodb_buffer_pool_size/

You can try something like:

start transaction;

insert into target_table (x,y) select x,y from source_table order by x,y;

commit;
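
For the ordered insert above to pay off, target_table's primary key should match the insert order, so rows arrive in clustered-index order (a sketch; the column types are assumptions):

create table target_table
(
x int unsigned not null,
y int unsigned not null,
primary key (x, y) -- clustered: ordered inserts append instead of splitting pages
)
engine = innodb;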

Hope this helps.

complex query takes too much time transferring

You need a compound index on the intersection table:

ALTER TABLE instruments_tools ADD KEY (id_instrument, id_tool);

The order of columns in that index is important!

What you're hoping for is that the join will start with the instruments table, then look up the matching entry in the compound index based on id_instrument. Once it finds that index entry, it has the related id_tool for free, so it doesn't have to read the instruments_tools table at all; it only needs to read the index entry. That should produce the "Using index" note in your EXPLAIN for the instruments_tools table.
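
A sketch of the join shape being described (the instruments and tools tables and their column layout are assumptions based on the discussion; id_instrument and id_tool come from the index above):

EXPLAIN
SELECT i.*, t.*
FROM instruments i
INNER JOIN instruments_tools it ON it.id_instrument = i.id_instrument
INNER JOIN tools t ON t.id_tool = it.id_tool;

-- check the Extra column for instruments_tools: it should say "Using index"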

That should help, but you can't avoid the temp table and filesort, because the columns you're grouping and sorting by cannot make use of an index.

You can try to make MySQL avoid writing the temp table to disk by increasing the size of memory it can use for temporary tables:

mysql> SET GLOBAL tmp_table_size = 256*1024*1024;      -- 256MB
mysql> SET GLOBAL max_heap_table_size = 256*1024*1024; -- 256MB

That figure is just an example. I have no idea how large it would have to be for the temp table in your case.
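
Whether the change was enough shows up in the server's status counters, which you can compare before and after:

SHOW GLOBAL STATUS LIKE 'Created_tmp%tables';
-- if Created_tmp_disk_tables keeps climbing relative to Created_tmp_tables,
-- the in-memory limit is still too small for this workload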

How to create a SQL test environment?

What you really need is a data-generation tool that can populate a database with thousands or millions of records. Once you have a large database filled with meaningful data, you can start your performance tests, experimenting with relationships, indexes, and joins to detect what really needs to be optimized.

One that I personally used in the past is GenerateData, but there are others.


