SQL Server Plans:Difference Between Index Scan/Index Seek

SQL Server Plans : difference between Index Scan / Index Seek

An index scan is where SQL server reads the whole of the index looking for matches - the time this takes is proportional to the size of the index.

An index seek is where SQL server uses the b-tree structure of the index to seek directly to matching records (see http://mattfleming.com/node/192 for an idea on how this works) - time taken is only proportional to the number of matching records.

  • In general an index seek is preferable to an index scan (when the number of matching records is proprtionally much lower than the total number of records), as the time taken to perform an index seek is constant regardless of the toal number of records in your table.
  • Note however that in certain situations an index scan can be faster than an index seek (sometimes significantly faster) - usually when the table is very small, or when a large percentage of the records match the predicate.

Why index scan instead of seek while using comparison operator

Well, I mean, you're selecting all the rows (except maybe one). There really is no difference between a seek and a scan here. SQL Server is choosing to perform a single scan of the skinniest index instead of doing 80,000 seeks (or however many orders are in the table).

A seek is not always the best choice, but this is a common misconception. In fact sometimes you absolutely want a scan.

  • Scans are better than seeks. Really.
  • Why the SQL Server FORCESCAN hint exists
  • Why isn't SQL Server using my non-clustered index and doing a clustered index scan?
  • T-SQL Tuesday #56 : SQL Server Assumptions (see #2)
  • “Tipping point” posts by Kimberly Tripp

What is best among clustered index scan vs non-clustered index seek

First of all, there is no 'best' operator. Sometimes reading more data is more efficient than reading some data and massage them to get our results. 'Best' as almost everything is relative.

Lets try to understand what happened in the comments...

The query

select 
min(CampaignID),
max(CampaignID)
from Campaign
where datecreated < dateadd(day, -90, getutcdate())

Which says:

I want the first and the last ID (min/max) of any record where the date is less than a constant date.

Clustered

The first query without the index/index hint did what SQL Server thought is cheaper than reading any index even if it requires more IO (disk usage). This is because finding the minimum and maximum while validating the records in the table is cheaper than selecting half of the table, then reordering/aggregating them find the exact same info.

The clustered index stores all data on disk and is logically ordered by the key columns, in this case CampaignID (I assume). This means, that to find the minimum and maximum ID is easy: The minimum is the first ID which matches the criteria -> lets check each ID from the first one and stop once we find a record where the date is in place (this will most probably be the first one). The maximum is the first record matching the condition from the end of the index.

Index with the date as key

CREATE NONCLUSTERED INDEX [NCIX] 
ON [dbo].[Campaign](DateCreated)
INCLUDE (Campaignid)

With the first index (date as the key column), SQL Server can use the date to filter the data, true, but it did not help in sorting. It still has to check every record in that index and figure out the minimum and maximum from a possibly unordered set of values.

Index with the ID as key

CREATE NONCLUSTERED INDEX [NCIX] 
ON [dbo].[Campaign](Campaignid)
INCLUDE (DateCreated)

With the second index where the ID was the key column, SQL Server can use the same trick as with the clustered key. The only difference is that there is no other data to read, but the ID and the date, which is much smaller than the whole record would be, therefore it can fit in less pages and requires less IO.

SQL Server will most probably choose the second index even if there is no index hint.

How the second index works (approximation by query)

You can get the minimum Campaignid by

SELECT TOP(1)
Campaignid
FROM
[dbo].[Campaign]
WHERE
datecreated < dateadd(day, -90, getutcdate())
ORDER BY
Campaignid ASC

and the maximum with a very similar query

SELECT TOP(1)
Campaignid
FROM
[dbo].[Campaign]
WHERE
datecreated < dateadd(day, -90, getutcdate())
ORDER BY
Campaignid DESC

If you cross join them as subqueries, you pretty much got what the execution plan describes.

Notes

Here I would add a note: optimizing for only one query is not always the best tactic. You can't optimize for everything, if this query runs once a day/week/quarter, that 14-15 seconds runtime with the clustered key will most probably do no harm. If the index does not help other queries, I would not create it, unless it is a mission critical query.

SQL Server Index Scan and Index Seek having the same performance

It is not really a seek. You just seek the beginning - and do a range scan from there. As Your range is (almost) identical to the whole table, there is no real difference. The count has to go through 100M of records, either in clustered or in nonclustered index. You can not expect it would be fast. And no, there is no rowcount stored for the table You can easily read.

Is an Index seek operation more costly compared to index scan when the data volume is high?

You will never see SQL Server choose a scan for this query if you have an index on StockItemID as this covers the query and there is no "tipping point" issue.

It will always choose a seek even if it estimates that 100% of the rows match.

Example

CREATE TABLE OrderLines
(
OrderID INT IDENTITY PRIMARY KEY,
StockItemID INT INDEX IX1
);

INSERT INTO OrderLines
(StockItemID)
SELECT 1
FROM sys.all_objects

SELECT StockItemID
FROM OrderLines
WHERE StockItemID = 1;

Sample Image

In the case that the seek returns all the rows in the table the only difference between a seek and an index ordered scan is how the first row is located (by navigating the depth of the B tree or simply going to the first index page from metadata). This is likely to be negligible.

One edge case where a scan may perform better would be if an allocation ordered scan was preferable and you are running with a table lock or nolock so this becomes a viable option.



Related Topics



Leave a reply



Submit