Two Single-Column Indexes VS One Two-Column Index in MySQL

Two single-column indexes vs one two-column index in MySQL?

If you have two single-column indexes, only one of them will be used in your example.

If you have an index with two columns, the query might be faster (you should measure). A two-column index can also be used as a single-column index, but only for the column listed first.

Sometimes it can be useful to have an index on (A,B) and another index on (B). This makes queries using either or both of the columns fast, but of course it also uses more disk space.
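As a rough sketch of that arrangement, assuming a hypothetical table t with columns a and b (these names are made up for illustration, they are not from the question):

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE t (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    a  INT NOT NULL,
    b  INT NOT NULL
);

-- Can serve WHERE a = ? and WHERE a = ? AND b = ?
CREATE INDEX idx_a_b ON t (a, b);

-- Can serve WHERE b = ? on its own, which idx_a_b cannot seek on
CREATE INDEX idx_b ON t (b);

-- Check which index the optimizer actually picks for each query shape
EXPLAIN SELECT * FROM t WHERE a = 1;
EXPLAIN SELECT * FROM t WHERE b = 2;
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;
```

Look at the key column of the EXPLAIN output to see which index was chosen; on a tiny or empty table the optimizer may well choose differently than it would on real data.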

When choosing the indexes, you also need to consider the effect on inserting, deleting and updating. More indexes = slower updates.

MySQL multiple indexes vs multi-column index for searching

You have to understand that MySQL (in the case of InnoDB) only uses the leftmost prefix of your indexes. So if the index is on Field1 + Field2, a query with only Field2 in the WHERE clause cannot use the index to look up rows.

And if the index is on Field1 + Field2 + Field3, the same happens for a query with only Field1 and Field3 in the WHERE clause: only the Field1 part of the index can be used for the lookup.
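One way to see the leftmost-prefix rule for yourself is to compare EXPLAIN output for the three query shapes. This is only a sketch; the table name demo and the column types are assumptions, and the column names simply mirror the Field1..Field3 placeholders above:

```sql
-- Hypothetical table; names and types are assumptions for illustration.
CREATE TABLE demo (
    id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    field1  INT NOT NULL,
    field2  INT NOT NULL,
    field3  INT NOT NULL,
    payload VARCHAR(100),
    INDEX idx_f1_f2_f3 (field1, field2, field3)
) ENGINE=InnoDB;

-- All three key parts are usable for the lookup
EXPLAIN SELECT * FROM demo WHERE field1 = 1 AND field2 = 2 AND field3 = 3;

-- Only the field1 part is usable for the lookup (field2 is skipped)
EXPLAIN SELECT * FROM demo WHERE field1 = 1 AND field3 = 3;

-- No leftmost prefix, so the index cannot be used to seek
EXPLAIN SELECT * FROM demo WHERE field2 = 2;
```

The key_len column is the useful one here: it should be larger for the first query than for the second, reflecting how many key parts were used.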

Option 1

You would have to create a separate index for each search scenario if you want to optimize each search. However, if the tables are large, then the indexes would become extremely large as well. If those search queries are made very often this would be worth it.

Option 2

You can use a nifty trick if your searches involve a low-selectivity column (e.g. Gender): put the low-selectivity columns leftmost in the index, and add Gender IN ('M', 'F') to the WHERE clause along with the other column(s) so that the query can make use of the whole index.
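A minimal sketch of that trick, assuming a hypothetical people table with gender and age columns (none of these names come from the question):

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE people (
    id     INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    gender CHAR(1) NOT NULL,
    age    TINYINT UNSIGNED NOT NULL,
    INDEX idx_gender_age (gender, age)
);

-- The search only cares about age, but listing every gender value
-- lets the optimizer treat the leading column as a set of equality
-- ranges and so use both parts of idx_gender_age.
EXPLAIN
SELECT id
FROM people
WHERE gender IN ('M', 'F')
  AND age = 30;
```

This only pays off when the leading column has a handful of distinct values; the IN list has to enumerate all of them, otherwise the query changes meaning.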

Is a single-column index needed when having multicolumn index?

A short excerpt from the documentation page about how MySQL uses indexes:

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). For more information, see Section 8.3.5, “Multiple-Column Indexes”.

You'd better remove the indexes on (c1) and (c1,c2). They are not used, but they take up storage space and consume processor time to be kept up to date when the table data changes.

Multiple Column Index vs Multiple Indexes

You should use a multi-column index on (primaryId, imgDate) so that MySQL is able to use it for selecting the rows and sorting.

If not all the columns used for sorting are in the index used for selection, MySQL uses the "filesort" strategy, which consists of sorting all the rows (in memory if there are not too many rows; otherwise on disk).

If all columns used for sorting are in the index, MySQL uses the index to get the rows order (with some restrictions).

MySQL uses a tree structure for the indexes. This allows keys to be accessed in order directly, without sorting.

A multi-column index is basically an index of the concatenation of the columns. This allows MySQL to find the first row matching primaryId={$imgId}, and then access all the other rows directly in the right order.

With a single-column index on primaryId, MySQL can find all the rows matching primaryId={$imgId}, but it will find the rows in no particular order, so it will have to sort them after that.
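To check this on your own data, something along these lines should work. The table name images and the column types are assumptions; only primaryId and imgDate come from the question:

```sql
-- Hypothetical table; only primaryId and imgDate are from the question.
CREATE TABLE images (
    id        INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    primaryId INT NOT NULL,
    imgDate   DATETIME NOT NULL,
    INDEX idx_primary_date (primaryId, imgDate)
);

-- With the two-column index, rows for one primaryId are already
-- stored in imgDate order inside the index.
EXPLAIN
SELECT *
FROM images
WHERE primaryId = 42
ORDER BY imgDate;
```

If the index is being used for ordering, the Extra column of EXPLAIN should not mention "Using filesort". With only a single-column index on primaryId you would expect to see it.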

See EXPLAIN and ORDER BY Optimization.

Multiple indexes vs single index on multiple columns in postgresql

Regardless of how many indexes you have created on a relation, only one of them will be used in a given query (which one depends on the query, statistics, etc.). So in your case you wouldn't get a cumulative advantage from creating two single-column indexes. To get the most performance from an index, I would suggest a composite index on (location, timestamp).

Note that queries like ... WHERE timestamp BETWEEN smth AND smth will not use the index above, while queries like ... WHERE location = 'smth' or ... WHERE location = 'smth' AND timestamp BETWEEN smth AND smth will. That is because the first attribute in the index is crucial for searching and sorting.

Don't forget to perform

ANALYZE;

after index creation in order to collect statistics.
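Put together, the steps might look like this in PostgreSQL. The table name readings, the value column, the types, and the date literals are all assumptions made for the sake of a complete example:

```sql
-- Hypothetical table; only location and timestamp are from the question.
CREATE TABLE readings (
    id          bigserial PRIMARY KEY,
    location    text NOT NULL,
    "timestamp" timestamptz NOT NULL,
    value       double precision
);

-- Composite index with the equality column first
CREATE INDEX readings_location_ts_idx
    ON readings (location, "timestamp");

-- Refresh planner statistics for this table
ANALYZE readings;

-- Inspect the plan for the query shape the index is meant to serve
EXPLAIN
SELECT *
FROM readings
WHERE location = 'smth'
  AND "timestamp" BETWEEN '2020-01-01' AND '2020-02-01';
```

On an empty or very small table the planner may still prefer a sequential scan, so test against a realistic volume of data.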

Update:
As @MondKin mentioned in the comments, certain queries can actually use several indexes on the same relation, for example a query with OR clauses like a = 123 OR b = 456 (assuming there are indexes on both columns). In this case Postgres would perform bitmap index scans on both indexes, build a union of the resulting bitmaps, and use it for a bitmap heap scan. Under certain conditions the same scheme may be used for AND queries, but with an intersection instead of a union.
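Here is a small sketch for trying that out. The table name t2 and the generated data are assumptions; whether the planner actually chooses a bitmap plan depends on the statistics and table size:

```sql
-- Hypothetical table with two separately indexed columns.
CREATE TABLE t2 (a integer, b integer);

-- Some synthetic rows so the planner has something to work with
INSERT INTO t2
SELECT i % 1000, i % 777
FROM generate_series(1, 100000) AS i;

CREATE INDEX t2_a_idx ON t2 (a);
CREATE INDEX t2_b_idx ON t2 (b);
ANALYZE t2;

-- The plan may show a BitmapOr over two Bitmap Index Scans,
-- feeding a Bitmap Heap Scan.
EXPLAIN SELECT * FROM t2 WHERE a = 123 OR b = 456;

-- For AND, the planner may choose a BitmapAnd, or just one index.
EXPLAIN SELECT * FROM t2 WHERE a = 123 AND b = 456;
```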

Using two single-column indexes in where and orderby clause

This is your query:

SELECT *
FROM mytable
WHERE user_id = 123
ORDER BY date_created

If you have two distinct indexes, then MySQL might use the index on user_id to apply the where predicate (if it believes that will speed up the query, depending on the cardinality of your data and other factors). It will not use the index on date_created, because it has no way to relate the intermediate result set that satisfies the where predicate to that index.

For this query, you want a compound index on (user_id, date_created). The database uses the first key in the index to filter the dataset; within the index B-tree, the matching rows are already sorted by date, so the order by operation becomes a no-op.

I notice that you are using select *; this is not good practice in general, and not good for performance. If the table has columns other than the user and date, this forces the database to look up the table to fetch the corresponding rows after filtering and ordering through the index, which can be more expensive than not using the index at all. If you just need a few columns, then enumerate them:

SELECT date_created, first_name, last_name 
FROM mytable
WHERE user_id = 123
ORDER BY date_created

And have an index on (user_id, date_created, first_name, last_name). That's a covering index: the database can execute the whole query using only the index, without looking up the table itself.
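For completeness, a sketch of creating that index and checking that it covers the query. The column types and the index name are assumptions; the table and column names are the ones used above:

```sql
-- Column types are guesses, for illustration only.
CREATE TABLE mytable (
    id           INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id      INT NOT NULL,
    date_created DATETIME NOT NULL,
    first_name   VARCHAR(50),
    last_name    VARCHAR(50)
);

-- Filter column first, then the sort column, then the selected columns
CREATE INDEX idx_user_created_names
    ON mytable (user_id, date_created, first_name, last_name);

EXPLAIN
SELECT date_created, first_name, last_name
FROM mytable
WHERE user_id = 123
ORDER BY date_created;
```

When the index covers the query, the Extra column of EXPLAIN should show "Using index" and should not show "Using filesort".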

Multiple Indexes vs Multi-Column Indexes

I agree with Cade Roux.

These articles should get you on the right track:

  • Indexes in SQL Server 2005/2008 – Best Practices, Part 1
  • Indexes in SQL Server 2005/2008 – Part 2 – Internals

One thing to note: clustered indexes should have a unique key (I would recommend an identity column) as the first column.
Basically, it makes new rows insert at the end of the index rather than causing lots of disk I/O and page splits.

Secondly, if you are creating other indexes on your data and they are constructed cleverly, they will be reused.

e.g. imagine you search a table on three columns

state, county, zip.

  • you sometimes search by state only.
  • you sometimes search by state and county.
  • you frequently search by state, county, zip.

Then an index with state, county, zip. will be used in all three of these searches.

If you search by zip alone quite a lot then the above index will not be used (by SQL Server anyway) as zip is the third part of that index and the query optimiser will not see that index as helpful.

You could then create an index on Zip alone that would be used in this instance.
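In T-SQL, the setup described above could be sketched roughly like this. The table name, column types, and index names are assumptions; only state, county, and zip come from the example:

```sql
-- Hypothetical SQL Server table; names and types are assumptions.
CREATE TABLE dbo.Addresses (
    [AddressId] INT IDENTITY(1,1) NOT NULL,
    [State]     CHAR(2)     NOT NULL,
    [County]    VARCHAR(50) NOT NULL,
    [Zip]       CHAR(5)     NOT NULL,
    -- Identity column as the clustered key, as recommended above
    CONSTRAINT PK_Addresses PRIMARY KEY CLUSTERED ([AddressId])
);

-- Serves searches by state, by state + county, and by all three
CREATE NONCLUSTERED INDEX IX_Addresses_State_County_Zip
    ON dbo.Addresses ([State], [County], [Zip]);

-- Serves searches by zip alone
CREATE NONCLUSTERED INDEX IX_Addresses_Zip
    ON dbo.Addresses ([Zip]);
```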

By the way, we can take advantage of the fact that with a multi-column index the first index column is always usable for searching. When you search only by 'state' it is efficient, though not quite as efficient as a single-column index on 'state'.

I guess the answer you are looking for is that it depends on your where clauses of your frequently used queries and also your group by's.

The article will help a lot. :-)

MySQL Indexes: Why does multi-column index perform worse than single column index?

Your 'expectations' are right. EXPLAIN is imprecise; do not trust it too far.

WHERE release_year < 2010
AND rating = 'R'

is (usually) best optimized with

INDEX(rating,   -- first, because it is tested with '='
release_year) -- last, because it is a range.

If you can afford to run it both ways, watching SHOW SESSION STATUS LIKE 'Handler%' gives you a precise view into rows read (and perhaps written to temp tables). I discuss that technique here. That blog also explains that the composite index is best.
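The measuring technique goes roughly like this. The movies table, its title column, and the index name are assumptions; rating and release_year are from the query above:

```sql
-- Hypothetical table; only rating and release_year are from the question.
CREATE TABLE IF NOT EXISTS movies (
    id           INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title        VARCHAR(200) NOT NULL,
    release_year SMALLINT NOT NULL,
    rating       VARCHAR(5) NOT NULL
);

-- Run 1: without the composite index
FLUSH STATUS;  -- zeroes the session counters; may need extra privileges
SELECT title FROM movies WHERE release_year < 2010 AND rating = 'R';
SHOW SESSION STATUS LIKE 'Handler%';

-- Run 2: with the composite index ('=' column first, range column last)
ALTER TABLE movies ADD INDEX idx_rating_year (rating, release_year);
FLUSH STATUS;
SELECT title FROM movies WHERE release_year < 2010 AND rating = 'R';
SHOW SESSION STATUS LIKE 'Handler%';
```

Compare the Handler_read_* counters between the two runs; the run that touches fewer rows is doing less work, whatever EXPLAIN estimated.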

Exceptions on that being best:

  • Perhaps the statistics say that using an index is not worth the effort; simply scanning the table might be better.
  • Perhaps extending it to be "covering" would be better. (Not for the query in question.)
  • Perhaps the PRIMARY KEY should be that pair of columns, or at least start with them. This avoids bouncing between the index BTree and the Data BTree.

If the table has only a thousand rows, you may not be able to see the difference between this index, that index, or even no index. But if you expect the table to grow, it is best to establish the best indexes now, not next year in the middle of the night when your web site has a performance problem and you have forgotten the details.

A side note... If you tack on ORDER BY release_year LIMIT 5, the composite index really shines. This is because the index can be used for all the WHERE, all the ORDER BY, and get to the LIMIT, and touch only 5 rows. Almost any other scenario will need to collect lots of rows in a temp table, sort it, then peel off 5 rows.
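A sketch of that side note, again assuming a hypothetical movies table with a title column (rating and release_year are from the query above):

```sql
-- Hypothetical table; names other than rating/release_year are assumptions.
CREATE TABLE IF NOT EXISTS movies_demo (
    id           INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title        VARCHAR(200) NOT NULL,
    release_year SMALLINT NOT NULL,
    rating       VARCHAR(5) NOT NULL,
    INDEX idx_rating_year (rating, release_year)
);

-- Within rating = 'R', index entries are already in release_year
-- order, so the scan can stop once 5 matching rows are found.
EXPLAIN
SELECT title
FROM movies_demo
WHERE rating = 'R'
  AND release_year < 2010
ORDER BY release_year
LIMIT 5;
```

If the composite index is doing all the work, EXPLAIN should show no "Using filesort" or "Using temporary" in the Extra column.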


