Difference of Create Index by Using Include Column or Not Using

Difference of create Index by using include column or not using

In the first one Column2 gets added to the index key. In the second one it might not (*) get added to the key in which case it will only appear in the index leaf pages. This can allow the index to be searched by Column1 but avoid the need to go back to the base table (bookmark lookup/key lookup) to retrieve the value for Column2.

i.e. it makes index2 "covering" for queries such as

SELECT Column1,Column2
FROM [dbo].[MyTable]
WHERE Column1 = 'X'

And it also covers queries such as

SELECT Column1,Column2
FROM [dbo].[MyTable]
WHERE Column1 = 'X' AND Column2 = 'Y'

But index1 may well perform better for the second query as it can seek on the two columns directly (as opposed to only being able to search on Column1 then needing to evaluate all matching rows at the index leaf level to see if they meet the Column2 predicate). If Column2 is never used as a search predicate against that index and your queries against that index wouldn't benefit from having Column2 ordered then it should be added as an INCLUDE-d column to keep the size of the key down and reduce the number of pages in the index.

(*) The reason I say "might not" above is because if Column2 is (part of) the clustered index key it will still be added there anyway for a non clustered index not created with the UNIQUE option.

Indexes with included columns, what's the difference?

The internal storage of indexes uses a B-Tree structure and consists of "index pages" (the root and all intermediate pages) and "index data pages" (the leaf pages only).

Note do not confuse "index data pages" with the "data pages" (leaf pages of clustered indexes) which store most of the columns of actual data.

  • Only the index columns are stored on the index pages.
  • By placing some columns in the INCLUDE section, less data per index key is stored on each page.
  • Meaning fewer pages are needed to hold the index keys. (Making it easier to cache these frequently used pages in memory for longer.)
  • And possibly fewer levels in the tree. (In such a case performance benefits can be much bigger because every tree level traversal is another disk access.)

When an index is used, the index key is used to navigate through the index pages to the correct index data page.

  • If the index has INCLUDE columns, that data is immediately available should the query need it.
  • If the query requires columns not available in either the index keys or the INCLUDE columns, then an additional "bookmark lookup" is required to the correct row in the clustered index (or heap if no clustered index defined).

Some things to note that hopefully addresses some of your confusion:

  • If the keys of your index and filters in your query are not selective enough, then the index will be ignored (regardless of what's in your INCLUDE columns).
  • Every index you create has overhead for INSERT and UPDATE statements; more so for "bigger" indexes. (Bigger applies to INCLUDE columns as well.)
  • So while you could in theory create a multitude of big indexes with include columns to match all the permutations of access paths: it would be very counter-productive.

It's worth noting that before INCLUDE columns were added as a feature:

  • It was a common index tuning 'trick' to expand the keys of an index to include columns that weren't needed in the index/filter. (Known as a covering index.)
  • These columns were commonly required in output columns or as reference columns for joins to other tables.
  • This would avoid the infamous "bookmark lookups", but had the disadvantage of making the index 'wider' than strictly necessary.
  • In fact very often the earlier columns in the index would already identify a unique row meaning the extra included columns would be completely redundant if not for the "avoiding bookmark lookups" benefit.
  • INCLUDE columns basically allow the same benefit more efficiently.

NB Something very important to point out. You generally get zero benefit out of INCLUDE columns in your indexes if you're in the lazy habit of always writing your queries as SELECT * .... By returning all columns you're basically ensuring a bookmark lookup is required in any case.

What INCLUDE() function does when creating index in MS SQL Server?

The INCLUDE clause adds the data at the lowest/leaf level, rather than in the index tree. This makes the index smaller because it's not part of the tree.

INCLUDE columns are not key columns in the index, so they are not ordered. This means it isn't really useful for predicates, sorting etc.. However, it may be useful if you have a residual lookup in a few rows from the key columns.

INCLUDE columns are not key columns in the index, so they are not ordered. This makes them not typically useful for JOINs or sorting. And because they are not key columns, they don't sit in the whole B-tree structure like key columns

By adding Include (or nonkey)columns, you can create nonclustered indexes that cover more queries. This is because the nonkey columns have the following benefits:

  • They can be data types not allowed as index key columns.
  • They are not considered by the Database Engine when calculating the number of index key columns or index key size.

An index with Included columns can significantly improve query performance when all columns in the query are included in the index either as key or nonkey columns. Performance gains are achieved because the query optimizer can locate all the column values within the index; table or clustered index data is not accessed resulting in fewer disk I/O operations.

For more info refer Microsoft docs: Create Indexes with Included Columns

Why use the INCLUDE clause when creating an index?

If the column is not in the WHERE/JOIN/GROUP BY/ORDER BY, but only in the column list in the SELECT clause is where you use INCLUDE.

The INCLUDE clause adds the data at the lowest/leaf level, rather than in the index tree.
This makes the index smaller because it's not part of the tree

INCLUDE columns are not key columns in the index, so they are not ordered.
This means it isn't really useful for predicates, sorting etc as I mentioned above. However, it may be useful if you have a residual lookup in a few rows from the key column(s)

Another MSDN article with a worked example

difference between creating index with two columns and creating separate index on two columns

You might experience a difference if you have any queries that select based on col2 alone.

SELECT (list of columns)
FROM dbo.YourTable
WHERE col2 = 'someValue'

If you have two separate indices, then there is a chance that ix_index2 might be used to speed up this query.

However, if you only have a compound index on (col1, col2), then that index cannot ever be used for this query. A compound index can only ever be used if the n left-most columns are being referenced in a query.

So your compound index might be used

  • if your query uses both col1 and col2 in a WHERE clause
  • if your query uses only col1 in a WHERE clause

but it can NEVER be used if your query only uses col2 in the WHERE clause

Index Key Column VS Index Included Column

Index key columns are part of the b-tree of the index. Included columns are not.

Take two indexes:

CREATE INDEX index1 ON table1 (col1, col2, col3)
CREATE INDEX index2 ON table1 (col1) INCLUDE (col2, col3)

index1 is better suited for this kind of query:

SELECT * FROM table1 WHERE col1 = x AND col2 = y AND col3 = z

Whereas index2 is better suited for this kind of query:

SELECT col2, col3 FROM table1 WHERE col1 = x

In the first query, index1 provides a mechanism for quickly identifying the rows of interest. The query will (probably) execute as an index seek, followed by a bookmark lookup to retrieve the full row(s).

In the second query, index2 acts as a covering index. SQL Server doesn't have to hit the base table at all, since the index provides all the data it needs to satisfy the query. index1 could also act as a covering index in this case.

If you want a covering index, but don't want to add all columns to the b-tree because you don't seek on them, or can't because they aren't an allowed datatype (eg, XML), use the INCLUDE clause.

Proper way of creating indexes for same column when include columns are different

Definitely create one index that includes all three columns!

The fewer indexes you have, the better - index maintenance is a cost factor - more indices require more maintenance.

And the included columns are included in the leaf level of the index only - the have only a very marginal impact on performance.

Update: if you have a single index on (tblID, column_A), then you can use this for queries that use only tblID in their WHERE clause, or you can use it for queries that use both columns in their WHERE clause.

HOWEVER: this index is useless for queries that use only column_A in their WHERE clause. A compound index (index made up from multiple columns) is only ever useful if a given query uses the n left-most columns as specified in the index.

So in your case, one query seems to indicate tblID, while the other needs (tblID, column_A) - so yes, in this case, I would argue a single index on (tblID, column_A) would work for both queries.

Hard and Fast rule for include columns in index

That index suggestion by marc_s is wrong. I've added a comment. (And it was my answer accepted too!)

The index for this query would be

CREATE NONCLUSTERED INDEX NC_EmpDep 
ON Employee(DepartmentID)
INCLUDE (Lastname, EmployeeID)

An index is typically

CREATE INDEX <name> ON <table> (KeyColList) INCLUDE (NonKeyColList)

Where:

  • KeyColList = Key columns = used for row restriction and processing

    WHERE, JOIN, ORDER BY, GROUP BY etc
  • NonKeyColList = Non-key columns = used in SELECT and aggregation (e.g. SUM(col)) after selection/restriction

What is the difference between MySQL's create index and alter add index?

The implementation is the same on the server-side.

The only difference is that with CREATE INDEX syntax, you must specify a name for the index.

Whereas with ALTER TABLE, you may specify a name for the index, but you don't have to.

If you don't specify a name, the server generates a default name, as the name of the first column in the index, with a number suffix if necessary.



Related Topics



Leave a reply



Submit