Thoughts on Index Creation for SQL Server for Missing Indexes

Order matters for key columns, but not for included columns. This is because only the key columns are used for lookups; included columns are stored only at the leaf level of the index.

You have to "read between the lines" of most of the documentation, but that's what they're implying in this BOL (Books Online) article.

This discussion is a bit more explicit in saying order isn't important.

Pretty much the only concern with included columns in indexes is space: the indexes can get quite large if you include lots of columns, or very large columns. But this also means you definitely don't want two separate indexes, each with the same key columns but different included ones; that just compounds your space problem. One index that includes the union of those columns serves both sets of queries.
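As a rough illustration of that consolidation, here is a minimal sketch. The table dbo.Orders, its columns and the index names are all hypothetical, chosen only for this example.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE dbo.Orders (
    OrderId     int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    CustomerId  int           NOT NULL,
    OrderDate   date          NOT NULL,
    Status      tinyint       NOT NULL,
    TotalAmount decimal(12,2) NOT NULL
);
GO

-- Wasteful: two indexes with identical keys that differ only in INCLUDE.
-- CREATE NONCLUSTERED INDEX IX_Orders_Cust_Date_A
--     ON dbo.Orders (CustomerId, OrderDate) INCLUDE (Status);
-- CREATE NONCLUSTERED INDEX IX_Orders_Cust_Date_B
--     ON dbo.Orders (CustomerId, OrderDate) INCLUDE (TotalAmount);

-- Better: one index whose INCLUDE list is the union of both.
-- The key order (CustomerId, OrderDate) matters; the INCLUDE order does not.
CREATE NONCLUSTERED INDEX IX_Orders_Cust_Date
    ON dbo.Orders (CustomerId, OrderDate)
    INCLUDE (Status, TotalAmount);
GO
```

Swapping Status and TotalAmount in the INCLUDE list would give an equivalent index, whereas swapping CustomerId and OrderDate in the key would not.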

SQL Server - Missing Indexes - What would use the index?

Here is what finally worked:

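-- Pull missing-index suggestions, and the statements that triggered them,
-- out of the showplan XML of every cached plan.
-- NOTE: dbo.concat is a user-defined string-concatenation aggregate that must
-- already exist in the database; it is not a built-in function. On SQL Server
-- 2017 or later, the built-in STRING_AGG(expr, ', ') is a possible substitute.
with xmlnamespaces(default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan') , CachedPlans as (
select 
 query_plan,
 -- Walk up from MissingIndexGroup to the enclosing statement element.
 n.value('../../../@StatementText' ,'varchar(1000)') as [Statement],
 n.value('../../../@StatementSubTreeCost' ,'varchar(1000)') as [Cost],
 n.value('../../../@StatementEstRows' ,'varchar(1000)') as [Rows],
 -- Estimated percentage improvement if the index existed.
 n.value('@Impact' ,'float') as Impact,
 n.value('MissingIndex[1]/@Database' ,'varchar(128)') as [Database],
 n.value('MissingIndex[1]/@Table' ,'varchar(128)') as [TableName],
 -- One concatenated column list per usage type.
 (
  select dbo.concat(c.value('@Name' ,'varchar(128)'))
  from n.nodes('MissingIndex/ColumnGroup[@Usage="EQUALITY"][1]') as t(cg)
  cross apply cg.nodes('Column') as r(c)
 ) as equality_columns,
 (
  select dbo.concat(c.value('@Name' ,'varchar(128)'))
  from n.nodes('MissingIndex/ColumnGroup[@Usage="INEQUALITY"][1]') as t(cg)
  cross apply cg.nodes('Column') as r(c)
 ) as inequality_columns,
 (
  select dbo.concat(c.value('@Name' ,'varchar(128)'))
  from n.nodes('MissingIndex/ColumnGroup[@Usage="INCLUDE"][1]') as t(cg)
  cross apply cg.nodes('Column') as r(c)
 ) as include_columns
from (
 -- Every plan currently in the plan cache, as XML.
 select query_plan
 from sys.dm_exec_cached_plans p
 outer apply sys.dm_exec_query_plan(p.plan_handle) tp
) as tab(query_plan)
-- One row per missing-index suggestion found in a plan.
cross apply query_plan.nodes('//MissingIndexGroup') as q(n)
)
select *
from CachedPlans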

Understanding Indexes and Missing Index Recommendations in SSMS

The first thing I'd ask is whether there is a good reason why a table of that size doesn't have a clustered index. A clustered key doesn't even have to be unique (SQL Server will add a 'uniquifier' to it if not, although it's usually best to use an IDENTITY column).
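For instance, a heap can be given a non-unique clustered index along these lines. This is only a sketch: the table dbo.EventLog, its columns and the index name are hypothetical.

```sql
-- Hypothetical heap (a table with no clustered index).
CREATE TABLE dbo.EventLog (
    LoggedAt datetime2(3)  NOT NULL,
    Message  nvarchar(400) NULL
);
GO

-- No UNIQUE keyword: duplicate LoggedAt values are allowed, and
-- SQL Server adds a hidden uniquifier to rows that share a key value.
CREATE CLUSTERED INDEX CIX_EventLog_LoggedAt
    ON dbo.EventLog (LoggedAt);
GO
```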

To answer your two questions:

1) The index recommendation is related to the query you are running. As a rule of thumb, the suggested columns will match the columns the query optimiser is using to probe into the table, so if you have a query like:

SELECT field1, field2, field3
FROM   table1
WHERE  field4 = 1 AND field5 = 'bob'

The suggested index is likely to be on the field4 and field5 columns. The suggestion itself lists equality columns before inequality columns rather than ranking them, so it is up to you to put the most selective column (i.e. the one with the most variation in values) first. It may include other columns (for instance field1, field2, field3) because then the query optimiser will only have to visit the index to get that data, and not visit the data page.
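For the example query, such an index might look like the sketch below. The column types and the index name are assumptions, since the original only gives the column names.

```sql
-- Assumed shape of table1; the types are guesses for illustration.
CREATE TABLE table1 (
    field1 int         NOT NULL,
    field2 int         NOT NULL,
    field3 int         NOT NULL,
    field4 int         NOT NULL,
    field5 varchar(50) NOT NULL
);
GO

-- Key columns match the WHERE clause; INCLUDE covers the SELECT list,
-- so the query can be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_table1_field4_field5
    ON table1 (field4, field5)
    INCLUDE (field1, field2, field3);
GO
```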

Note also that the suggested index is not always the one you might choose yourself. When joining several tables, the query optimiser will choose the execution plan that it thinks best suits the data, based on available indexes and statistics. It might loop over one table and probe into another, when the best possible plan might do it the other way around. You have to inspect the actual query execution plan to see what is going on.

If you know your query is selective enough to drill down to a small range of records (for instance has a where clause like WHERE table1.field1 = 1 AND table1.field2 = 'abc' AND table1.field3 = '2015-07-01' ...), you can add an index that covers all the referenced columns. This might influence the query optimiser to seek into this index to get a small number of rows to join to another table, rather than performing scans.

As a rule of thumb, a good place to start when examining the execution plans is trying to eliminate scans, where the server will be reading a large range of rows, and provide indexes that narrow down the amount of data that has to be processed.
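Alongside reading individual plans, one way to spot scan-heavy indexes is the index usage DMV. The query below is a minimal sketch, not a complete analysis script; the counters it reads are reset when the SQL Server service restarts.

```sql
-- Seeks vs. scans per index in the current database since the last restart.
SELECT  OBJECT_NAME(s.object_id) AS table_name,
        i.name                   AS index_name,   -- NULL for a heap
        s.user_seeks,
        s.user_scans,
        s.user_lookups,
        s.user_updates                            -- maintenance cost of the index
FROM    sys.dm_db_index_usage_stats AS s
JOIN    sys.indexes AS i
        ON  i.object_id = s.object_id
        AND i.index_id  = s.index_id
WHERE   s.database_id = DB_ID()
  AND   OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY s.user_scans DESC;
```

Indexes with many scans and few seeks are candidates for a closer look at the plans that use them.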

2) I think others have probably explained this well enough by now - the included columns are there so that when the index is read, the server doesn't then have to read the data page to get those values; they are stored on the index as well.

The initial response a lot of people may have when they read about such 'covering indexes' is "why don't I add a whole bunch of indexes that do this", or "why don't I add an index that covers all the columns".

In some situations (usually small tables with narrow columns, such as many-to-many joining tables), this is useful. However, with each index you add come some costs:

Firstly, every time you insert, update or delete a row in your table, each affected index has to be updated too. This means you will have to contend with locking, lock escalation issues (possibly deadlocking), page splits, and the associated fragmentation. There are various ways to mitigate these issues, such as using an appropriate fill-factor to allow more values to be inserted into an index page without having to split it.
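Setting a fill-factor could look like the following sketch. The index and table names are placeholders, and 90 is only an illustrative value, not a recommendation; the right number depends on the insert pattern.

```sql
-- Rebuild an existing index leaving roughly 10% free space in each leaf page,
-- so new rows can be added to a page before it has to split.
ALTER INDEX IX_Address_UserId
    ON dbo.address
    REBUILD WITH (FILLFACTOR = 90);
```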

Secondly, indexes take up space. At the very least, an index is going to contain the key values you use and either the RID (in a heap) or clustering key (in a table with a clustered index). Covering indexes also contain a copy of the included columns. If these are large columns (such as big varchars) then the index can be quite large, and it is not unheard of for a table's indexes to add up to be bigger than the table itself. Note that there are also limits on the size of an index, both in terms of columns and total size.

Because the clustering key is always included in non-clustered indexes on a table with a clustered index (the clustered index is the table's data pages themselves), a smaller clustered key is better. Whilst you can use a composite key, this is likely to be several bytes wide, and whilst you can use a non-unique key, SQL Server will add that uniquifier to it, which is another 4 bytes. Best practice is to use an identity column (int, or bigint if you envisage ever having more than 2 billion rows in the table). Identities also always increment, so you won't get page splits in your data pages when inserting a new record, as it will always go on the end of the table.
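The sketch below puts those two points together: a narrow identity clustering key, and a way to see how much space each index on a table actually uses. The table dbo.Customer and everything in it are hypothetical.

```sql
-- Hypothetical table with a narrow int IDENTITY column as the clustered key.
CREATE TABLE dbo.Customer (
    CustomerId int IDENTITY(1,1) NOT NULL,
    Email      varchar(320)      NOT NULL,
    Notes      varchar(4000)     NULL,
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerId)
);
GO

-- A covering index that copies a wide column; CustomerId is carried
-- in it implicitly as the row locator.
CREATE NONCLUSTERED INDEX IX_Customer_Email
    ON dbo.Customer (Email)
    INCLUDE (Notes);
GO

-- Space used per index, in MB (pages are 8 KB).
SELECT  i.name                               AS index_name,
        i.type_desc,
        SUM(ps.used_page_count) * 8 / 1024.0 AS used_mb
FROM    sys.indexes AS i
JOIN    sys.dm_db_partition_stats AS ps
        ON  ps.object_id = i.object_id
        AND ps.index_id  = i.index_id
WHERE   i.object_id = OBJECT_ID('dbo.Customer')
GROUP BY i.name, i.type_desc
ORDER BY used_mb DESC;
```

Comparing the CLUSTERED row with the sum of the NONCLUSTERED rows shows whether the indexes have outgrown the table itself.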

So the tl;dr is:

The suggested indexes can be useful, but often don't give the best index. If you know the structure of your data and how it will be queried, you can construct indexes that contain the commonly used probing keys.

Order the key columns in your index by selectivity (i.e. the column with the most distinct values first). This might seem counter-intuitive, but it allows SQL Server to find the data you want faster, with fewer reads.
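A quick way to compare the selectivity of candidate key columns is to count their distinct values. This sketch assumes a table shaped like table1 from the earlier example query.

```sql
-- The column with the higher distinct count relative to total_rows
-- is the more selective one and would go first in the index key.
SELECT  COUNT(*)               AS total_rows,
        COUNT(DISTINCT field4) AS distinct_field4,
        COUNT(DISTINCT field5) AS distinct_field5
FROM    table1;
```

On a large table this reads every row, so it is better run off-peak or against a copy.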

Included columns are useful, but usually only when they are small columns (e.g. integers). If your query needs six columns from a table and the index covers only five of them, SQL Server will still have to visit the data page, so in this case you're better off without the included columns because they just take up space and have a maintenance cost.

Why is SSMS-produced script missing indexes?

Scripting out indexes is turned off by default in SSMS. Personally, that is one of the first things I turn back on, along with scripting permissions and triggers. You can find this setting by:

In SSMS, open the Tools menu and pick Options
Scroll down to SQL Server Object Explorer and expand the tree
Click on the Scripting node and change Script indexes to true

Missing Index Details SQL

That means SQL Server is suggesting that your query could run faster with an index. Indexes add overhead and disk storage, so you should ignore this hint unless the query is causing performance problems in production.

To create the index, uncomment the statement after the USE line, replace [<Name of Missing Index, sysname,>] with a real name, and run it:

USE [phsprod]
GO
CREATE NONCLUSTERED INDEX IX_Address_UserId
ON [dbo].[address] ([userid])

How to determine if an Index is required or necessary

I use Jason Strate's index analysis scripts. They tell you how much your existing indexes are used as well as how much missing indexes would have been used. I typically don't add an index unless it would serve more than 5 to 10% of the queries on a table.
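The scripts themselves aren't reproduced here. As a rough stand-in for the "how much would a missing index have been used" part, the missing-index DMVs can be queried directly. This is a minimal sketch, not Jason Strate's script, and the ranking expression in the ORDER BY is just one common heuristic.

```sql
-- Missing-index suggestions for the current database, with how often the
-- optimiser wanted each one since the last restart.
SELECT TOP (25)
        d.statement          AS table_name,
        d.equality_columns,
        d.inequality_columns,
        d.included_columns,
        gs.user_seeks,             -- times a seek on this index would have been used
        gs.user_scans,
        gs.avg_total_user_cost,    -- average cost of the queries that wanted it
        gs.avg_user_impact         -- estimated percentage improvement
FROM    sys.dm_db_missing_index_group_stats AS gs
JOIN    sys.dm_db_missing_index_groups      AS g
        ON g.index_group_handle = gs.group_handle
JOIN    sys.dm_db_missing_index_details     AS d
        ON d.index_handle = g.index_handle
WHERE   d.database_id = DB_ID()
ORDER BY gs.user_seeks * gs.avg_total_user_cost * gs.avg_user_impact DESC;
```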

Most importantly, though, it's about making sure the application responds fast enough for the users.

Jason Strate's index analysis blog articles

These days, I use sp_BlitzIndex® when performing index analysis.
