Clustered vs Non-Clustered

What do Clustered and Non-Clustered index actually mean?

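With a clustered index the rows themselves are stored in the leaf level of the index, ordered by the index key. The order is logical; the pages are not guaranteed to be physically contiguous on disk. Because the rows can only be stored in one order, there can be only one clustered index per table.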

With a non-clustered index there is a second, separate structure whose entries point to the rows (a row identifier for a heap, or the clustering key when the table has a clustered index). You can have many non-clustered indexes, although each new index increases the time it takes to write new records.

It is generally faster to read from a clustered index if you want to get back all the columns. You do not have to go first to the index and then to the table.

Writing to a table with a clustered index can be slower if the data has to be rearranged, for example when a row must go into the middle of a full page and the page has to be split.
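To make the difference concrete, here is a toy Python model. It is not how SQL Server actually lays out pages. The "clustered" structure is the row list itself kept sorted by key, while the "non-clustered" index is a separate sorted list of (key, row locator) pairs. The table contents and the functions seek_clustered and seek_by_city are hypothetical. A seek through the non-clustered index needs a second lookup to fetch the rest of the row.

```python
from bisect import bisect_left

# Hypothetical rows (id, name, city). The "clustered" structure keeps the
# whole rows sorted by the clustering key, so its leaf level IS the data.
clustered = sorted([
    (30, "Carol", "Oslo"),
    (10, "Alice", "Lima"),
    (20, "Bob", "Pune"),
])
clustered_keys = [row[0] for row in clustered]

# A "non-clustered" index on city: a separate sorted list of
# (city, row locator). The locator is the clustering key, which is what
# SQL Server stores when the table has a clustered index.
nc_city = sorted((row[2], row[0]) for row in clustered)
nc_keys = [entry[0] for entry in nc_city]


def seek_clustered(row_id):
    """Seek the clustered structure; returns (row or None, structures touched)."""
    i = bisect_left(clustered_keys, row_id)
    if i < len(clustered) and clustered_keys[i] == row_id:
        return clustered[i], 1
    return None, 1


def seek_by_city(city):
    """Seek the non-clustered index, then look the row up by its locator."""
    i = bisect_left(nc_keys, city)
    if i < len(nc_city) and nc_keys[i] == city:
        row, _ = seek_clustered(nc_city[i][1])  # the extra "key lookup"
        return row, 2
    return None, 1


print(seek_clustered(20))    # ((20, 'Bob', 'Pune'), 1)
print(seek_by_city("Oslo"))  # ((30, 'Carol', 'Oslo'), 2)
```

The second number is the count of structures touched. Going through the non-clustered index costs one extra lookup whenever the query needs columns the index does not hold.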

What are the differences between a clustered and a non-clustered index?

Clustered Index

  • Only one per table
  • Typically faster to read than a non-clustered index when you need whole rows, as the data is stored in index order

Non Clustered Index

  • Many per table (up to 999 in SQL Server 2008 and later)
  • Can be quicker for insert and update operations than a clustered index

Both types of index will improve performance when selecting data filtered on the indexed columns, but will slow down insert and update operations.

Because of the slower inserts and updates, the clustered index should be set on a column whose values normally increase, e.g. an Id or a timestamp.
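The effect of the key choice on page splits can be sketched with a toy leaf level made of fixed-size pages. This is a simplified, hypothetical model: PAGE_CAPACITY and load are made up for illustration. It assumes that an insert past the last key of a full last page simply starts a new page, while an insert into a full page elsewhere splits it and moves half of its rows.

```python
from bisect import bisect_right, insort

PAGE_CAPACITY = 4  # hypothetical; a real 8 KB page holds many more rows


def load(keys, capacity=PAGE_CAPACITY):
    """Insert keys one by one into a toy leaf level.

    Returns (pages, rows_moved), where rows_moved counts rows relocated
    by mid-index page splits.
    """
    pages = [[]]
    rows_moved = 0
    for key in keys:
        # Target page: the last page whose first key is <= key (else page 0).
        idx = bisect_right([p[0] for p in pages[1:]], key)
        page = pages[idx]
        if len(page) < capacity:
            insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            # Past the end of the index: start a new page, nothing moves.
            pages.append([key])
        else:
            # Full page in the middle: split it, moving the upper half.
            new_page = page[capacity // 2:]
            del page[capacity // 2:]
            rows_moved += len(new_page)
            pages.insert(idx + 1, new_page)
            insort(page if key < new_page[0] else new_page, key)
    return pages, rows_moved


_, moved_ascending = load(range(1, 21))       # like an IDENTITY column
_, moved_descending = load(range(20, 0, -1))  # keys arriving out of order
print(moved_ascending, moved_descending)      # 0 16
```

Loading ever-increasing keys moves no rows. Keys that arrive out of order, such as descending values or random GUIDs, keep landing in full pages in the middle of the index and force splits.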

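As a rough rule of thumb, SQL Server will often only use a non-covering non-clustered index when the predicate is highly selective (a selectivity above 95% is a commonly quoted figure). The actual choice is cost-based, not a fixed threshold.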

Clustered Index Vs Non-Clustered Index Usage

As many here have said, it's hard to say definitively what the best solution will be without testing. However, you say that you are filtering by col2 before choosing to move data. Depending on what percentage of those records are moved, I would suggest starting with a clustered index on the unique col1, then creating a non-clustered index on col2. One advantage of the non-clustered index is that you can make it a filtered index with a WHERE clause. So, for example, if only 10% of your records have a col2 value from the few choices that you care about, an index with WHERE col2 IN (val1, val2, val3) will be about 10x smaller and therefore faster to access.

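If you go this route, make sure the WHERE clause in your SELECT matches the WHERE clause you specify on the index. Otherwise the optimizer cannot prove that the filtered index contains every row the query needs, and it will not use the index.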
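A filtered index can be tried out with SQLite, which calls the same idea a partial index. In SQL Server the corresponding statement has the form CREATE NONCLUSTERED INDEX ... ON ... (col2) WHERE col2 IN (...). The sketch below uses Python's built-in sqlite3 module as a stand-in, with a hypothetical table t and made-up values. The optimizer rules shown are SQLite's, not SQL Server's, and the exact EXPLAIN QUERY PLAN wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col1 INTEGER PRIMARY KEY, col2 TEXT);
    -- Partial (filtered) index: only rows with the values we care about.
    CREATE INDEX ix_col2_filtered ON t(col2)
        WHERE col2 IN ('val1', 'val2', 'val3');
""")
# Hypothetical data: 300 rows spread over 30 col2 values (val0..val29).
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, f"val{i % 30}") for i in range(1, 301)])


def plan(sql):
    """Join the 'detail' column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# The query repeats the index's WHERE clause, so the index is usable.
matching = plan("SELECT col1 FROM t WHERE col2 IN ('val1', 'val2', 'val3')")
# A predicate outside the filter cannot use the partial index.
not_matching = plan("SELECT col1 FROM t WHERE col2 = 'val7'")
print(matching)
print(not_matching)
```

The first plan names ix_col2_filtered. The second query asks for rows the index does not contain, so SQLite falls back to scanning the table.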

Difference between clustered and nonclustered index

You really need to keep two issues apart:

1) the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.

2) the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.

By default, the primary key on a SQL Server table is also used as the clustering key - but that doesn't need to be that way!
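SQLite happens to show the separation between the two concepts out of the box. An ordinary (rowid) table is stored in a B-tree keyed by a hidden, ever-increasing integer rowid, and a non-integer PRIMARY KEY such as a GUID string is enforced by a separate automatic unique index. The Python sketch below, with a hypothetical customer table, checks both facts. In SQL Server you make the same split explicitly, for example by declaring the key as PRIMARY KEY NONCLUSTERED and creating the clustered index on another column.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Rowid table: rows are stored in a B-tree keyed by the hidden, ever-increasing
# rowid, while the GUID-style primary key gets its own unique index.
conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [(str(uuid.uuid4()), f"customer {n}") for n in range(5)],
)

# PRAGMA index_list rows are (seq, name, unique, origin, partial);
# origin 'pk' marks the index created to enforce the PRIMARY KEY.
pk_indexes = [row[1] for row in conn.execute("PRAGMA index_list('customer')")
              if row[3] == "pk"]
# Rowids are assigned in insertion order, regardless of the random GUIDs.
rowids = [row[0] for row in conn.execute("SELECT rowid FROM customer ORDER BY rowid")]
print(pk_indexes)  # ['sqlite_autoindex_customer_1']
print(rowids)      # [1, 2, 3, 4, 5]
```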

One rule of thumb I would apply is this: any "regular" table (one that you use to store data in, or a lookup table, etc.) should have a clustering key. There's really no point in not having one. In fact, contrary to common belief, having a clustering key speeds up all the common operations, even inserts and deletes, since the table organization is different and usually better than with a heap (a table without a clustering key).

Kimberly Tripp, the Queen of Indexing, has a great many excellent articles on why to have a clustering key and which kinds of columns are best used as your clustering key. Since you only get one per table, it's of utmost importance to pick the right clustering key, not just any clustering key.

  • GUIDs as PRIMARY KEY and/or clustered key
  • The clustered index debate continues
  • Ever-increasing clustering key - the Clustered Index Debate..........again!
  • Disk space is cheap - that's not the point!

Marc

What is best among clustered index scan vs non-clustered index seek

First of all, there is no 'best' operator. Sometimes reading more data is more efficient than reading less data and then massaging it to get the result. 'Best', like almost everything, is relative.

Let's try to understand what happened in the comments...

The query

select 
min(CampaignID),
max(CampaignID)
from Campaign
where datecreated < dateadd(day, -90, getutcdate())

Which says:

I want the first and the last ID (min/max) of any record where the date is less than a constant date.

Clustered

The first query, without the index or index hint, did what SQL Server thought was cheaper than reading any index, even though it requires more IO (disk usage). This is because finding the minimum and maximum while validating the records in the table is cheaper than selecting half of the table and then reordering/aggregating it to find the exact same information.

The clustered index stores all the data and is logically ordered by the key columns, in this case CampaignID (I assume). This means that finding the minimum and maximum ID is easy. The minimum is the first ID that matches the criteria: we check each ID starting from the first one and stop once we find a record whose date matches the condition (this will most probably be the first one). The maximum is the first record matching the condition when reading from the end of the index.
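The early-exit reasoning can be simulated in Python over rows sorted by CampaignID, which stand in for the leaf level of the clustered index. The data is hypothetical: 1,000 campaigns created one day apart, and a cutoff 90 days before the newest one. The function name is also made up. The function stops reading from each end as soon as it finds a qualifying row and reports how many rows it examined.

```python
from datetime import date, timedelta

START = date(2020, 1, 1)  # hypothetical: campaign N was created N days after START
rows = [(cid, START + timedelta(days=cid)) for cid in range(1, 1001)]  # sorted by id
cutoff = START + timedelta(days=910)  # 90 days before the newest campaign


def min_max_by_ordered_scan(rows, cutoff):
    """Return (min_id, max_id, rows_examined) for rows with date < cutoff.

    `rows` must be sorted by id, like the leaf level of a clustered index
    on CampaignID. Each end is read only until the first qualifying row.
    """
    examined = 0
    lo = hi = None
    for cid, created in rows:            # forward from the lowest id
        examined += 1
        if created < cutoff:
            lo = cid
            break
    if lo is None:                       # nothing qualifies at all
        return None, None, examined
    for cid, created in reversed(rows):  # backward from the highest id
        examined += 1
        if created < cutoff:
            hi = cid
            break
    return lo, hi, examined


print(min_max_by_ordered_scan(rows, cutoff))  # (1, 909, 93)
```

With this data only 93 of the 1,000 rows are read: one from the start and 92 from the end.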

Index with the date as key

CREATE NONCLUSTERED INDEX [NCIX] 
ON [dbo].[Campaign](DateCreated)
INCLUDE (Campaignid)

With the first index (date as the key column), SQL Server can use the date to filter the data, true, but it did not help in sorting. It still has to check every record in that index and figure out the minimum and maximum from a possibly unordered set of values.
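For contrast, here is the same kind of hypothetical data seen through an index keyed on DateCreated with CampaignID included, modeled as a Python list sorted by (date, id). The seek finds the end of the qualifying range quickly. However, the IDs inside that range come back in date order, so every qualifying entry has to be read to find the minimum and maximum. The function name is made up for illustration.

```python
from bisect import bisect_left
from datetime import date, timedelta

START = date(2020, 1, 1)  # hypothetical: campaign N was created N days after START
rows = [(cid, START + timedelta(days=cid)) for cid in range(1, 1001)]
cutoff = START + timedelta(days=910)  # 90 days before the newest campaign

# Index on DateCreated INCLUDE (CampaignID): entries sorted by date.
by_date = sorted((created, cid) for cid, created in rows)


def min_max_via_date_index(index, cutoff):
    """Return (min_id, max_id, entries_examined) for entries with date < cutoff."""
    end = bisect_left(index, (cutoff,))  # entries before `end` have date < cutoff
    if end == 0:
        return None, None, 0
    ids = [cid for _, cid in index[:end]]  # IDs come back in date order
    return min(ids), max(ids), end         # every qualifying entry is read


print(min_max_via_date_index(by_date, cutoff))  # (1, 909, 909)
```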

Index with the ID as key

CREATE NONCLUSTERED INDEX [NCIX] 
ON [dbo].[Campaign](Campaignid)
INCLUDE (DateCreated)

With the second index, where the ID is the key column, SQL Server can use the same trick as with the clustered key. The only difference is that there is no other data to read besides the ID and the date, which is much smaller than the whole record. The index therefore fits in fewer pages and requires less IO.

SQL Server will most probably choose the second index even if there is no index hint.

How the second index works (approximation by query)

You can get the minimum Campaignid by

SELECT TOP(1)
Campaignid
FROM
[dbo].[Campaign]
WHERE
datecreated < dateadd(day, -90, getutcdate())
ORDER BY
Campaignid ASC

and the maximum with a very similar query

SELECT TOP(1)
Campaignid
FROM
[dbo].[Campaign]
WHERE
datecreated < dateadd(day, -90, getutcdate())
ORDER BY
Campaignid DESC

If you cross join them as subqueries, you get pretty much what the execution plan describes.
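The cross join can be checked with Python's sqlite3 module as a stand-in for SQL Server. SQLite has no TOP(1), so LIMIT 1 plays that role, and dates are stored as ISO strings, which compare correctly as text. The table contents are hypothetical. Declaring CampaignID as INTEGER PRIMARY KEY makes it SQLite's rowid, so the table is stored in CampaignID order, roughly like a clustered index.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Campaign (CampaignID INTEGER PRIMARY KEY, DateCreated TEXT)")
start = date(2020, 1, 1)  # hypothetical data: campaign N created N days after start
conn.executemany(
    "INSERT INTO Campaign VALUES (?, ?)",
    [(cid, (start + timedelta(days=cid)).isoformat()) for cid in range(1, 1001)],
)
cutoff = (start + timedelta(days=910)).isoformat()

# The original MIN/MAX query.
aggregate = conn.execute(
    "SELECT MIN(CampaignID), MAX(CampaignID) FROM Campaign WHERE DateCreated < ?",
    (cutoff,),
).fetchone()

# The two TOP(1) queries, cross joined as subqueries (LIMIT 1 instead of TOP(1)).
cross_joined = conn.execute("""
    SELECT lo.CampaignID, hi.CampaignID
    FROM (SELECT CampaignID FROM Campaign WHERE DateCreated < ?
          ORDER BY CampaignID ASC LIMIT 1) AS lo
    CROSS JOIN
         (SELECT CampaignID FROM Campaign WHERE DateCreated < ?
          ORDER BY CampaignID DESC LIMIT 1) AS hi
""", (cutoff, cutoff)).fetchone()

print(aggregate, cross_joined)  # (1, 909) (1, 909)
```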

Notes

Here I would add a note: optimizing for only one query is not always the best tactic. You can't optimize for everything. If this query runs once a day/week/quarter, the 14-15 second runtime with the clustered key will most probably do no harm. If the index does not help other queries, I would not create it unless it is a mission-critical query.

Difference between Cluster and Non-cluster index in SQL

Links describing the two:

http://www.mssqlcity.com/FAQ/General/clustered_vs_nonclustered_indexes.htm

http://www.sql-server-performance.com/articles/per/index_data_structures_p1.aspx

The difference is in the order of the records in the table relative to the index. With a clustered index, the table's records are stored in the order of the index.

Why NonClustered index scan faster than Clustered Index scan?

SQL Server indexes are B-trees. A non-clustered index contains just the indexed (and any included) columns, and the leaf nodes of its B-tree hold row locators pointing to the appropriate data row: a RID for a heap, or the clustering key when the table has a clustered index. A clustered index is different: its leaf nodes are the data pages themselves, and the clustered index's B-tree becomes the backing store for the table; the heap ceases to exist for the table.

Your non-clustered index contains a single, presumably integer, column. It's a small, compact index to start with. Your query, select id from scan, has a covering index: the query can be satisfied just by examining the index, which is what is happening. If, however, your query included columns not in the index, and the optimizer still elected to use the non-clustered index, an additional lookup would be required to fetch the data pages, either from the clustered index or from the heap.
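The covering-versus-lookup distinction shows up directly in a query plan. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server. The name and notes columns and the index name ix_scan_id are hypothetical. The wording of EXPLAIN QUERY PLAN differs slightly between SQLite versions, but a covering index is reported as USING COVERING INDEX.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scan (id INTEGER, name TEXT, notes TEXT);
    CREATE INDEX ix_scan_id ON scan(id);  -- narrow index on one integer column
""")


def plan(sql):
    """Join the 'detail' column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Only the indexed column is needed: the index alone answers the query.
covered = plan("SELECT id FROM scan WHERE id = 42")
# 'name' is not in the index, so each match needs a lookup into the table.
not_covered = plan("SELECT id, name FROM scan WHERE id = 42")
print(covered)
print(not_covered)
```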

To understand what's going on, you need to examine the execution plan selected by the optimizer:

  • See Displaying Graphical Execution Plans
  • See Red Gate's SQL Server Execution Plans, by Grant Fritchey

Using clustered vs non-clustered index on large data in SQL

Which clustering or non-clustering index should I use in these two cases?

With SSN as the primary key clustered index, a non-clustered index on dept will cover the query and be the most efficient regardless of the number of rows returned. Remember that the clustered index key (the primary key here) is implicitly included in non-clustered index leaf nodes as the row locator. This will avoid the need to access the separate data pages containing columns not needed by the query.

The execution plan should show only an index seek using the dept non-clustered index, touching only the data needed by the query.
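SQLite's WITHOUT ROWID tables behave much like a SQL Server table clustered on its primary key. Each secondary index entry carries the primary key columns, much as a SQL Server non-clustered index carries the clustering key as its row locator. The sketch below uses Python's sqlite3 module and assumes the query selects SSN filtered by dept; the table, column, and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- WITHOUT ROWID: the table itself is a B-tree clustered on ssn.
    CREATE TABLE employee (
        ssn  TEXT PRIMARY KEY,
        name TEXT,
        dept TEXT
    ) WITHOUT ROWID;
    -- Secondary index on dept; SQLite stores the primary key (ssn)
    -- in each index entry, much like SQL Server's row locator.
    CREATE INDEX ix_employee_dept ON employee(dept);
""")
detail = " | ".join(
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT ssn FROM employee WHERE dept = 'Sales'"
    )
)
print(detail)  # the plan reports a covering index seek on ix_employee_dept
```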


