How Important Is the Order of Columns in Indexes

How important is the order of columns in indexes?

Look at an index like this:

    Cols
  1   2   3
-------------
|   | 1 |   |
| A |---|   |
|   | 2 |   |
|---|---|   |
|   |   |   |
|   | 1 | 9 |
| B |   |   |
|   |---|   |
|   | 2 |   |
|   |---|   |
|   | 3 |   |
|---|---|   |

See how restricting on A first, as your first column, eliminates more results than restricting on your second column first? It's easier if you picture how the index must be traversed: column 1, then column 2, and so on. You can see that lopping off most of the results in the first pass makes the second step that much faster.

Another case: if you queried only on column 3, the optimizer wouldn't even use the index, because it doesn't help at all in narrowing down the result set. Any time you're in a query, narrowing down the number of results to deal with before the next step means better performance.

Since the index is also stored this way, there's no backtracking across the index to find the first column when you're querying on it.
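To make the traversal concrete, here is a toy model (an assumption, not any engine's real B-tree layout) of the diagram above: a composite index on (col1, col2, col3) kept as a sorted list of key tuples. Restricting on column 1 is a binary search that yields one contiguous slice; column 2 is then searched only inside that slice. Restricting on column 2 alone forces a check of every entry.

```python
from bisect import bisect_left, bisect_right

# Toy composite index on (col1, col2, col3): key tuples kept in sorted order.
index = sorted([
    ("B", 3, 9), ("A", 1, 9), ("B", 1, 9), ("A", 2, 9), ("B", 2, 9),
])


def seek_col1(index, v1):
    """Restrict on column 1: two binary searches return one contiguous slice."""
    leading = [key[0] for key in index]
    return index[bisect_left(leading, v1):bisect_right(leading, v1)]


def seek_col1_col2(index, v1, v2):
    """Restrict on column 1, then binary-search column 2 inside that sorted slice."""
    slice1 = seek_col1(index, v1)
    second = [key[1] for key in slice1]
    return slice1[bisect_left(second, v2):bisect_right(second, v2)]


def filter_col2_only(index, v2):
    """No restriction on column 1: matches are scattered, so every entry is read."""
    return [key for key in index if key[1] == v2]


print(seek_col1(index, "A"))          # [('A', 1, 9), ('A', 2, 9)]
print(seek_col1_col2(index, "B", 3))  # [('B', 3, 9)]
print(filter_col2_only(index, 1))     # [('A', 1, 9), ('B', 1, 9)]
```

The second step only ever looks at the entries the first step left behind, which is the "lopping off" effect described above.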

In short: no, it's not just for show; there are real performance benefits.

Unique index column order - is it important?

Uniqueness doesn't depend on column order. If you state that the combination of columns 1, 2 and 3 is unique, and someone else states that the combination of columns 2, 1 and 3 is unique, you're stating the same fact.

But that's not to say that all indexes are equal, or equally useful. An index is only (potentially) useful if your query can use its left-most column, and the more of its left-most columns the query uses, the more it helps. If you have a query that doesn't use the left-most column of any index, the only way to satisfy the query is generally to scan the whole table.
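The uniqueness half of this answer can be checked directly. The sketch below uses hypothetical three-column rows and tests whether the combination of all three columns is unique when the key is declared in every possible column order; the verdict never changes, because uniqueness is a property of the set of columns, not of their order.

```python
from itertools import permutations


def combination_is_unique(rows, column_order):
    """True if no two rows share the same values for the given columns, in that order."""
    keys = [tuple(row[i] for i in column_order) for row in rows]
    return len(keys) == len(set(keys))


# Hypothetical rows with three columns (positions 0, 1, 2).
distinct_rows = [(1, "x", 10), (1, "y", 10), (2, "x", 10)]
duplicate_rows = distinct_rows + [(2, "x", 10)]

# Declaring the key as (1, 2, 3), (2, 1, 3), or any other order gives the same verdict.
for rows in (distinct_rows, duplicate_rows):
    verdicts = {order: combination_is_unique(rows, order) for order in permutations(range(3))}
    print(set(verdicts.values()))  # {True}, then {False}
```

What the order does change is which column leads, and therefore which queries can seek on the index.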

How does column order matter when creating an index in SQL Server?

For your particular query, either index can be used. SQL Server arbitrarily chooses one of them -- I don't think there is a preference for one over the other.

On the other hand, if you had a query like this:

where name = ? and address like 'A%'

Then the best index is (name, address).

Or like this:

where address = ? and name like 'A%'

Then the best index is (address, name).
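A rough way to see why: on a sorted index, an equality on the first column followed by a prefix match (LIKE 'A%') on the second column covers one contiguous range, while the reverse order has to read the whole prefix range and filter it. The sketch below assumes hypothetical (name, address) rows, plain ASCII ordering (no collation rules), and models each index as a sorted list of tuples.

```python
from bisect import bisect_left


def prefix_upper_bound(prefix):
    """Smallest string that sorts after every string starting with `prefix` (ASCII assumed)."""
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


# Hypothetical (name, address) rows.
people = [("Ann", "Apple St"), ("Bob", "Ash Rd"), ("Ann", "Birch Ln"),
          ("Bob", "Acorn Ave"), ("Ann", "Alder Way"), ("Cy", "Aspen Ct")]

name_address = sorted(people)                     # index on (name, address)
address_name = sorted((a, n) for n, a in people)  # index on (address, name)


def via_name_address(index, name, prefix):
    """name = ? AND address LIKE 'prefix%': one contiguous slice, nothing to filter."""
    lo = bisect_left(index, (name, prefix))
    hi = bisect_left(index, (name, prefix_upper_bound(prefix)))
    return index[lo:hi], hi - lo


def via_address_name(index, name, prefix):
    """Same predicate on (address, name): the prefix range must be filtered for the name."""
    lo = bisect_left(index, (prefix,))
    hi = bisect_left(index, (prefix_upper_bound(prefix),))
    return [key for key in index[lo:hi] if key[1] == name], hi - lo


print(via_name_address(name_address, "Ann", "A"))  # 2 matches, 2 entries read
print(via_address_name(address_name, "Ann", "A"))  # 2 matches, 5 entries read
```

Swapping the roles (address = ? AND name LIKE 'A%') flips the result, which is why the second query prefers (address, name).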

The order of the comparisons in the WHERE clause has no bearing on index usage (apart from, perhaps, some insignificant effect on which of two equivalent indexes the optimizer picks).

How can I test the importance of column order in indexing in PostgreSQL?

Indexes are good for finding needles in a haystack. They are not so spectacular at finding needles in a pin cushion.

You should make the conditions more selective, as well as make the table larger.

INSERT INTO dev.table_name(column_1, column_2)
SELECT RANDOM() * 1000, RANDOM() * 1000 FROM generate_series(1, 1000000);
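The needle/pin-cushion point can be illustrated without PostgreSQL. The sketch below is a stand-in, not a PostgreSQL benchmark: it builds a hypothetical sorted index on (column_1, column_2) from random integers and counts how many index entries an equality on column_1 touches compared with a full scan. It ignores the planner, page I/O and statistics entirely.

```python
import random
from bisect import bisect_left, bisect_right


def entries_read(n_rows, distinct_values, seed=0):
    """Entries read for `column_1 = 0` via an index on (column_1, column_2) vs. a full scan."""
    rng = random.Random(seed)
    # Hypothetical stand-in for dev.table_name: random integer pairs in [0, distinct_values).
    index = sorted((rng.randrange(distinct_values), rng.randrange(distinct_values))
                   for _ in range(n_rows))
    leading = [c1 for c1, _ in index]
    via_index = bisect_right(leading, 0) - bisect_left(leading, 0)
    return via_index, n_rows


print(entries_read(100_000, 1000))  # needle in a haystack: roughly 100 vs 100,000
print(entries_read(100_000, 2))     # pin cushion: roughly 50,000 vs 100,000
```

With only two distinct values the index still has to hand back about half the table, so a planner has little reason to prefer it; with many distinct values and many rows, the gap (and any effect of column order) becomes measurable.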

Does order of columns of Multi-Column Indexes in where clause in MySQL matter?

The order of columns in a multi-column index matters.

The MySQL documentation on multiple-column indexes reads:

MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.

This means an index on columns name and city can be used when an index on column name is needed but it cannot be used instead of an index on column city.

The order of conditions in the WHERE clause doesn't matter. The MySQL optimizer does a lot of work on the WHERE clause conditions to eliminate as many candidate rows as possible, as early as possible, and to read as little data as possible from the tables and indexes (any data that is read but then fails the rest of the WHERE clause is wasted work).
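A simplified model of both points, assuming only equality conditions (real optimizers also weigh ranges, statistics and costs): the WHERE clause is turned into a mapping of column to value, which keeps the conditions but forgets the order they were written in, and the index on (name, city) is usable only for as many leading columns as that mapping covers. The helper `seekable_prefix` is hypothetical.

```python
def seekable_prefix(index_columns, where):
    """Count the leading index columns covered by equality conditions in `where`.

    `where` is a list of (column, value) pairs in the order they were written;
    turning it into a dict keeps the conditions but forgets their order.
    """
    constrained = dict(where)
    count = 0
    for column in index_columns:
        if column not in constrained:
            break
        count += 1
    return count


index = ("name", "city")
written_one_way = [("name", "Ann"), ("city", "Oslo")]
written_other_way = [("city", "Oslo"), ("name", "Ann")]

print(seekable_prefix(index, written_one_way), seekable_prefix(index, written_other_way))  # 2 2
print(seekable_prefix(index, [("name", "Ann")]))   # 1: works as an index on name
print(seekable_prefix(index, [("city", "Oslo")]))  # 0: cannot stand in for an index on city
```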

Does the order of columns in a multicolumn non-clustered index matter in SQL Server?

YES it matters!

The index might be used if your query includes the n left-most columns of that index.

So with your first version of index MultiFieldIndex_1, it might be used if you

  • use all four columns
  • use columns A, B, C
  • use columns A, B
  • use column A

but it will NOT ever be considered if you use

  • just column D
  • columns C and D, etc.

However, your second version of the index might be used if you specify just D, or D and C, but it will never ever be used if you specify just A and B.

Only if you always use all the columns defined in the index does the order in which they are defined become (almost) irrelevant. There are still some nuances, such as ordering by highest selectivity, but those are much less important than the fact that an index will never be used if you don't specify its n left-most columns in your SELECT statements.
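The rule can be tabulated for every combination of columns. The question's index definitions aren't shown here, so this sketch assumes MultiFieldIndex_1 is on (A, B, C, D) and treats the second version as a hypothetical (D, C, B, A). For each set of queried columns it reports how many left-most columns each index can seek on; 0 means the index is not considered for a seek. Skip-scan style access paths are deliberately ignored.

```python
from itertools import combinations

# Assumed definitions: the source only implies these column orders.
MULTI_FIELD_INDEX_1 = ("A", "B", "C", "D")
SECOND_VERSION = ("D", "C", "B", "A")  # hypothetical reversed variant


def leading_columns_used(index_columns, query_columns):
    """How many left-most index columns the query constrains; 0 means no index seek."""
    used = 0
    for column in index_columns:
        if column not in query_columns:
            break
        used += 1
    return used


for size in range(1, 5):
    for combo in combinations("ABCD", size):
        query = set(combo)
        print("".join(combo).ljust(4),
              "v1:", leading_columns_used(MULTI_FIELD_INDEX_1, query),
              "v2:", leading_columns_used(SECOND_VERSION, query))
```

Note that a query on A and C gets only 1 from the first version: C can't be used for the seek until B is constrained too.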

Does Order of Fields of Multi-Column Index in MySQL Matter

When discussing multi-column indexes, I use an analogy to a telephone book. A telephone book is basically an index on last name, then first name. So the sort order is determined by which "column" is first. Searches fall into a few categories:

  1. If you look up people whose last name is Smith, you can find them easily because the book is sorted by last name.

  2. If you look up people whose first name is John, the telephone book doesn't help because the Johns are scattered throughout the book. You have to scan the whole telephone book to find them all.

  3. If you look up people with a specific last name Smith and a specific first name John, the book helps because you find the Smiths sorted together, and within that group of Smiths, the Johns are also found in sorted order.

If you had a telephone book sorted by first name then by last name, the sorting of the book would assist you in the above cases #2 and #3, but not case #1.
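The three exact-value cases can be played out on a tiny, made-up telephone book sorted by last name, then first name. Cases 1 and 3 are binary searches over a contiguous range; case 2 has no such range and must read every entry.

```python
from bisect import bisect_left, bisect_right

# A toy telephone book: entries are (last_name, first_name), sorted last name first.
book = sorted([
    ("Smith", "John"), ("Jones", "Mary"), ("Smith", "Anna"),
    ("Brown", "John"), ("Smith", "John"), ("Adams", "Zoe"),
])


def find_last(book, last):
    """Case 1: all entries with this last name are adjacent."""
    lasts = [entry[0] for entry in book]
    return book[bisect_left(lasts, last):bisect_right(lasts, last)]


def find_first(book, first):
    """Case 2: first names are scattered, so every entry has to be checked."""
    return [entry for entry in book if entry[1] == first]


def find_last_first(book, last, first):
    """Case 3: within one last name, first names are sorted too."""
    return book[bisect_left(book, (last, first)):bisect_right(book, (last, first))]


print(find_last(book, "Smith"))
print(find_first(book, "John"))
print(find_last_first(book, "Smith", "John"))
```

Re-sorting the same entries as (first_name, last_name) would turn case 2 into the binary search and case 1 into the full scan.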

That explains cases for looking up exact values, but what if you're looking up by ranges of values? Say you wanted to find all people whose first name is John and whose last name begins with 'S' (Smith, Saunders, Staunton, Sherman, etc.). The Johns are sorted under 'J' within each last name, but if you want all Johns for all last names starting with 'S', the Johns are not grouped together. They're scattered again, so you end up having to scan through all the names with last name starting with 'S'. Whereas if the telephone book were organized by first name then by last name, you'd find all the Johns together, then within the Johns, all the 'S' last names would be grouped together.
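The range case can be measured the same way. This sketch uses hypothetical entries and assumes simple uppercase ASCII ordering, so "last name starts with S" is the range from "S" up to (but not including) "T". It counts how many entries each book ordering has to read to find the Johns with an 'S' last name.

```python
from bisect import bisect_left

# Hypothetical entries as (last_name, first_name).
entries = [("Smith", "John"), ("Saunders", "Alice"), ("Staunton", "John"),
           ("Sherman", "Bea"), ("Sherman", "John"), ("Taylor", "John"),
           ("Stone", "Cal")]

by_last_then_first = sorted(entries)
by_first_then_last = sorted((first, last) for last, first in entries)


def johns_s_via_last_first(book):
    """Last names starting with 'S' are adjacent, but the Johns among them are not."""
    lo, hi = bisect_left(book, ("S",)), bisect_left(book, ("T",))
    return [e for e in book[lo:hi] if e[1] == "John"], hi - lo


def johns_s_via_first_last(book):
    """All Johns are adjacent, and within them the 'S' last names form one sub-range."""
    lo, hi = bisect_left(book, ("John", "S")), bisect_left(book, ("John", "T"))
    return book[lo:hi], hi - lo


print(johns_s_via_last_first(by_last_then_first))  # 3 Johns found, 6 entries read
print(johns_s_via_first_last(by_first_then_last))  # 3 Johns found, 3 entries read
```

The rule of thumb this illustrates: put the equality column first and the range column after it.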

So the order of columns in a multi-column index definitely matters. One type of query may need a certain column order for the index. If you have several types of queries, you might need several indexes to help them, with columns in different orders.

You can read my presentation "How to Design Indexes, Really" for more information.

Oracle: does the column order matter in an index?

  1. If a and b both have 1000 distinct values and they are always queried together then the order of columns in the index doesn't really matter. But if a has only 10 distinct values or you have queries which use just one of the columns then it does matter; in these scenarios the index may not be used if the column ordering does not suit the query.
  2. The column with the least distinct values ought to be first and the column with the most distinct values last. This not only maximises the utility of the index it also increases the potential gains from index compression.
  3. The datatype and length of the column have an impact on the return we can get from index compression but not on the best order of columns in an index.
  4. Arrange the columns with the least selective column first and the most selective column last. In the case of a tie lead with the column which is more likely to be used on its own.

The one potential exception to 2. and 3. is with DATE columns. Because Oracle DATE columns include a time element, they might have 86400 distinct values per day. However, most queries on a date column are only interested in the day element, so you might want to consider only the number of distinct days in your calculations. That said, I suspect this will affect the relative selectivity in only a handful of cases.
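The arithmetic is easy to check. Assuming hypothetical data with one row per second over three days, a raw distinct count on the timestamps sees 86400 values per day, while a day-level view (the equivalent of truncating the DATE to the day) sees only three.

```python
from datetime import datetime, timedelta

# Hypothetical data: one row per second for three days, like Oracle DATE values
# that include a time-of-day component.
start = datetime(2024, 1, 1)
stamps = [start + timedelta(seconds=s) for s in range(3 * 86_400)]

distinct_values = len(set(stamps))                 # what a raw cardinality count sees
distinct_days = len({ts.date() for ts in stamps})  # what day-level queries care about

print(distinct_values, distinct_days)  # 259200 3
```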

edit (in response to Nick Pierpoint's comment)

The two main reasons for leading with the least selective column are

  1. Index compression
  2. Index Skip reads

Both of these work their magic by knowing that the value in the current slot is the same as the value in the previous slot. Consequently, we can maximise the return from these techniques by minimising the number of times the value changes. In the following example, A has four distinct values and B has six. The dittos represent a compressible value or a skippable index block.

Least selective column leads ...

A         B
--------- -
AARDVARK  1
"         2
"         3
"         4
"         5
"         6
DIFFVAL   1
"         2
"         3
"         4
"         5
"         6
OTHERVAL  1
"         2
"         3
"         4
"         5
"         6
WHATEVER  1
"         2
"         3
"         4
"         5
"         6

Most selective column leads ...

B  A
-  --------
1  AARDVARK
"  DIFFVAL
"  OTHERVAL
"  WHATEVER
2  AARDVARK
"  DIFFVAL
"  OTHERVAL
"  WHATEVER
3  AARDVARK
"  DIFFVAL
"  OTHERVAL
"  WHATEVER
4  AARDVARK
"  DIFFVAL
"  OTHERVAL
"  WHATEVER
5  AARDVARK
"  DIFFVAL
"  OTHERVAL
"  WHATEVER
6  AARDVARK
"  DIFFVAL
"  OTHERVAL
"  WHATEVER

Even in this trivial example, (A, B) has 20 skippable slots compared to the 18 of (B, A). A wider disparity would generate a greater ROI on index compression or better utility from Index Skip reads.
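The ditto counts can be reproduced mechanically. This sketch only counts repeats of the leading column, as the two tables do; it is not a model of Oracle's actual compressed block format. In general the count is the number of rows minus the number of distinct leading values, so the column with fewer distinct values leaves more repeats when it leads.

```python
from itertools import product

A_VALUES = ["AARDVARK", "DIFFVAL", "OTHERVAL", "WHATEVER"]  # 4 distinct values
B_VALUES = [1, 2, 3, 4, 5, 6]                              # 6 distinct values


def ditto_slots(sorted_keys):
    """Count entries whose leading value repeats the previous entry's (the dittos)."""
    return sum(1 for prev, cur in zip(sorted_keys, sorted_keys[1:]) if prev[0] == cur[0])


a_leads = sorted(product(A_VALUES, B_VALUES))  # index on (A, B)
b_leads = sorted(product(B_VALUES, A_VALUES))  # index on (B, A)

print(ditto_slots(a_leads), ditto_slots(b_leads))  # 20 18
```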

As with most tuning heuristics, we need to benchmark using actual values and realistic volumes. This is definitely a scenario where data skew could have a dramatic impact on the effectiveness of different approaches.


"I think if you have a highly selective first index then - from a performance perspective - you'll do well to put it first."

If we have a highly selective column, then we should build it an index of its own. The additional benefit of avoiding a FILTER operation on a handful of rows is unlikely to outweigh the overhead of maintaining a composite index.

Multi-column indexes are most useful when we have:

  • two or more columns of middling selectivity,
  • which are frequently used in the same query.

