What Are Covering Indexes and Covered Queries in SQL Server

What are Covering Indexes and Covered Queries in SQL Server?

A covering index is one which can satisfy all requested columns in a query without performing a further lookup into the clustered index.

There is no such thing as a covering query.

Have a look at this Simple-Talk article: Using Covering Indexes to Improve Query Performance.

What is a Covered Index?

A covering index is an index that contains all of, and possibly more, the columns you need for your query.

For instance, this:

SELECT *
FROM tablename
WHERE criteria

will typically use indexes to speed up the resolution of which rows to retrieve using criteria, but then it will go to the full table to retrieve the rows.

However, if the index contained the columns column1, column2 and column3, then this sql:

SELECT column1, column2
FROM tablename
WHERE criteria

and, provided that particular index could be used to speed up the resolution of which rows to retrieve, the index already contains the values of the columns you're interested in, so it won't have to go to the table to retrieve the rows, but can produce the results directly from the index.

This can also be used if you see that a typical query uses 1-2 columns to resolve which rows, and then typically adds another 1-2 columns, it could be beneficial to append those extra columns (if they're the same all over) to the index, so that the query processor can get everything from the index itself.

Here's an article: Index Covering Boosts SQL Server Query Performance on the subject.

How does SQL covering index work?

Any index, at all levels, stores the values for the column(s) defining that index's key. In addition, at the leaf level of the non-clustered index, the leaves store the values for any additional column(s) which are part of the clustered index key1 and were not part of the non-clustered index key, because that is how the clustered index lookup is then performed.

If the only column(s) that the query needs to retrieve are either part of the non-clustered index key or are part of the clustered index key then we've already obtained all of those column values by navigating the non-clustered index.

Queries, in general, are not trying to retrieve rows, only row values from particular columns.


As an analogy, consider that you're running a census for an entire town, and are storing all of the data and physical cards. These cards contain the person's name, address, date of birth, current occupation, etc. Assume further that every individual has a unique address and so you decide to store all of these cards in address order, in a big box file. This is your clustered index.

You frequently want to locate people based on their names. So you create another set of index cards that tell you, for any particular combination of surname and firstname all of the addresses at which someone with that name resides. You put these cards in a second box file and sort them by surname, firstname values. This is your non-clusered index.

Finally, suppose your task is to identify the street on which all people with the surname Radish live. You can obviously use your non-clustered index to identify all of the people with the surname Radish. But remember, the cards in this secondary index gives you the addresses for these people. If our only task is to identify their street, we already have that information at hand. There's no need for us to go and look up all of the original census cards, containing all kinds of information that we've not been asked for, just to complete this query.


1And since 2012, any additional columns identified in an INCLUDE clause for the index definition.

Database covering a query

Only the third example is covered. To be covered, a query must be fully satisfied from the index. Your first example produces results that are entirely within the index, but it needs information that is not part of the index to complete, and so is not covered. To match your first example, you need an index that lists Col3 first.

One important feature of indexes is the ability to include a set of column in the index without actually indexing those columns. So an index example for your table might look like this:

CREATE INDEX [ix_MyTable] ON [MyTable] 
(
[Col1] ASC,
[Col2] ASC
)
INCLUDE ( [Col3])

Now samples 2 and 3 are both covered. Sample 1 is still not covered, because the index is still not useful for the WHERE clause.

Why INCLUDE Col3, rather than just listing it with the others? It's important to remember that as you add indexes or make them more complex, operations that change data using those indexes will require more and more work, because each change will also require updating the indexes. If you include a column in an index, without actually indexing it, an update to that column still needs to go back and update the index as well, so that the data in the index is accurate... but it doesn't also need to re-order the index based on the new value. So this saves some work for our database server. To put it another way, if a column will only be in the select list, and not in the where clause, you might get a small performance benefit by including it in an index to get the benefit of covering a query from the index, without actually indexing on the column.

How can a covering index satisfy more than one query?

Order matters. Since your proposed index begins with [payment_date] but the query predicate does not include [payment_date], the index is not likely to be more advantageous than a table scan.

It is possible to have a single index cover multiple queries. The first listed field of the index will almost always need be in the predicate of all the queries. For improved results, also apply this logic to the 2nd field, 3rd field, etc.

When there is more than one choice for a position, one choice may perform better than another.

Side note: Oracle has a feature called "index skip scan" that allows an index to be used even if the leading column is not in the predicate. It is effective when the leading column has few distinct values (from learningintheopen.org).

Usefulness of Covering indexes

Its very much a case of understanding your usage. It would be really easy to add indexes for everything you could possibly query against but as with everything its a trade-off. Every index costs you time and storage - so it will likely slow down your inserts/updates, and the more you index the higher this cost.

If your usage highly favours reads over writes, then all is well and all you need to do is pay for some storage. If you need decent performance for writes too, then all you can really do is understand your application and index your most important stuff.

I highly recommend the "inside sql server" series of books (Kalen Delaney et al) - lots of reading to get through but I guarantee they'll help you understand the trade-offs you're making.

What is a MySQL covering index?

Let's say that "covering" is "an attribute of an INDEX relative to a particular SELECT.

Some examples:

select id, name from table where id = 1

INDEX(id, name) -- covering; best index
INDEX(id, name, age) -- covering, but overkill
INDEX(age, name, id) -- covering, but inefficient (might not be used)

select id, name, age from table where id = 1

INDEX(id, name, age) -- Having `id` first is optimal, but any order is "covering"

As already pointed out, if this is InnoDB and the table has PRIMARY KEY(id), then none of these secondary indexes are worth having.

SELECT a FROM tbl GROUP BY b ORDER BY c

No index is very useful since the GROUP BY and ORDER BY are not the same.
INDEX(a,b,c) -- in any order, is "covering"
INDEX(b,c,a) -- "covering", and perhaps optimal.
INDEX(b,c,a,d) -- "covering", but 'bigger'

Bigger matters in small ways. When doing SELECT COUNT(*) FROM ..., InnoDB will (usually) pick the 'smallest' index to do the counting.

Another 'rule' is to avoid redundant indexes.

    INDEX(a,b)  -- Let's say you 'need' this one.
INDEX(a) -- Then this one is redundant and should be dropped.

When to use composite index and covering index in SQL Server?

  1. The composite index (where all of the columns are “key” columns) will carry the data in all levels of the index; the INCLUDE index will only carry the “non key” columns in the leaf node. Take away: The INCLUDE index will be smaller than the equivalent composite index.

  2. The INCLUDE columns(non-key) will not count towards the max index size (900 bytes).

  3. Updates to non-key columns in INCLUDE indexes will not contribute to index fragmentation; updates to unused “key” columns in the composite index will result in fragmentation… IMO, this is a biggie.

  4. Assuming low index fragmentation, both composite and INCLUDE indexes will perform comparably. All columns in the composite index need not be in the query predicate.



Related Topics



Leave a reply



Submit