Why Is There a Scan on My Clustered Index

Why is there a scan on my clustered index?

Here's a good blog post about when SQL Server reaches the "tipping point" and switches from an index seek to an index/table scan:

http://www.sqlskills.com/BLOGS/KIMBERLY/post/The-Tipping-Point-Query-Answers.aspx

You may want to look at the way your queries are filtering, as the tipping point is often much fewer rows than people expect.

What Clustered Index Scan (Clustered) means on SQL Server execution plan?

I would appreciate any explanations to "Clustered Index Scan
(Clustered)"

I will try to put in the easiest manner, for better understanding you need to understand both index seek and scan.

SO lets build the table

use tempdb GO

create table scanseek (id int , name varchar(50) default ('some random names') )

create clustered index IX_ID_scanseek on scanseek(ID)

declare @i int
SET @i = 0
while (@i <5000)
begin
insert into scanseek
select @i, 'Name' + convert( varchar(5) ,@i)
set @i =@i+1
END

An index seek is where SQL server uses the b-tree structure of the index to seek directly to matching records

Sample Image

you can check your table root and leaf nodes using the DMV below

-- check index level 
SELECT
index_level
,record_count
,page_count

,avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID('tempdb'),OBJECT_ID('scanseek'),NULL,NULL,'DETAILED')
GO

Now here we have clustered index on column "ID"

lets look for some direct matching records

select * from scanseek where id =340

and look at the Execution plan

Sample Image

you've requested rows directly in the query that's why you got a clustered index SEEK .

Clustered index scan: When Sql server reads through for the Row(s) from top to bottom in the clustered index.
for example searching data in non key column. In our table NAME is non key column so if we will search some data in the name column we will see clustered index scan because all the rows are in clustered index leaf level.

Example

select * from scanseek where name = 'Name340'

Sample Image

please note: I made this answer short for better understanding only, if you have any question or suggestion please comment below.

Why isn't SQL Server using my clustered index and doing a non-clustered index scan?

Basically, your second query wants to get all ID values from the table (no WHERE clause or anything).

SQL Server can do this two ways:

  • clustered index scan - basically a full table scan to read all the data from all rows, and extract the ID from each row - would work, but it loads the WHOLE table, one by one
  • do a scan across the non-clustered index, because each non-clustered index also includes the clustering column(s) on its leaf level. Since this is a index that is much smaller than the full table, to do this, SQL Server will need to load fewer data pages and thus can provide the answer - all ID values from all rows - faster than when doing a full table scan (clustered index scan)

The cost-based optimizer in SQL Server just picks the more efficient route to get the answer to the question you've asked with your second query.

TOP(1) vs MIN problem - Why does SQL Server use a clustered scan rather than seeking the covering index?

Even though RowNumber, the clustered index key, is a key value on IX_Transactions_TransactionDate, the index keys are ordered first by TransactionDate, then by RowNumber. The MIN(RowNumber) may not be on the first row with TransactionDate >= '20191002 04:00:00.000'.

Consider if the IX_Transactions_TransactionDate contained the key values:

(20191002 04:00:00.000,10),
(20191002 05:00:00.000,11),
(20191002 06:00:00.000,1)

The result of

SELECT MIN(RowNumber) FROM FintracTransactions WHERE TransactionDate >= '20191002 04:00:00.000' OPTION(RECOMPILE)

is 1. While the result of:

SELECT TOP(1) RowNumber FROM FintracTransactions WHERE TransactionDate >= '20191002 04:00:00.000' ORDER BY TransactionDate OPTION(RECOMPILE)

is 10.

So the optimizer's real choice is to scan every value from IX_Transactions_TransactionDate after the target date, or to scan the clustered index from the beginning until it finds the first row with a qualifying TransactionDate.

You should see that the execution plan for:

SELECT MIN(RowNumber) FROM [Transactions] with (index=[IX_Transactions_TransactionDate]) WHERE TransactionDate >= '20191002 04:00:00.000' OPTION(RECOMPILE)

has a higher estimated cost than the execution plan for:

SELECT MIN(RowNumber) FROM [Transactions]  WHERE TransactionDate >= '20191002 04:00:00.000' OPTION(RECOMPILE)

Why is my query doing a clustered index scan when it has an index

I have figured it out, as mentioned in some articles I read the selectivity has to be under 1%. I was mislead by an article about the tipping points thinking it would only go into scan mode if it was returning 30%+

But actual testing shows that anything returning 6+ rows is a scan, and anything less is a seek.

So if it returns about 0.5% it will do a seek, else a scan. 0.5% of my 830 rows is 4.15 rows.

Hope this helps someone.

Why/when/how is whole clustered index scan chosen rather than full table scan?

Please read my answer under "No direct access to data row in clustered table - why?", first.

"the leaf of clustered index contains the real table row, so full clustered index, with intermediate leaves, contain much more data than the full table(?)"

See you are mixing up "Table" with storage structures. In the context of your question, eg. thinking about the size of the CI as opposed to the "table", well then you must think about the CI minus the leaf level (which is the data row). The CI, index portion only, is tiny. The intermediate levels (like any B-Tree) contain partial (not full) key entries; it excludes the lowest level, which is the full key entry, which sits in the row itself, and is not duplicated.

The table (full CI) may be 10GB. The CI only may be 10MB. There is an awful lot that can be determined from the 10MB without having to go to the 100GB.

For understanding: the equivalent NCI on the same table (CI) may be 22MB; the equivalent NCI on the same table if you removed the CI may be 21.5MB (assuming the CI key is reasonable, not fat wide).

"Why/when/how is ever whole clustered index scan chosen over the full table scan?"

Quite often. Again the context is, we are talking about the CI-minus-Leaf levels. For queries that use only the columns in the CI, the presence of those columns in the CI (any index actually) allow the query to be a "covered query", which means it can by serviced wholly from the index, no need to go to the data rows. Think range scans on partial keys: BETWEEN x AND yY; x <= y; etc.

(There is always the chance that the optimiser will choose a table scan, when you think it should choose an index scan, bu t that is a different story.)

"I still do not understand how/why clustered index full scan can be "better" over full table scan."

(The terms used by MS are less precise than my answers here.) For any query that can be answered from the 10MB CI, I would much rather churn 10MB through the data cache, than 100GB. For the same queries, bounded by a range on the CI key, that's a fraction of the 10MB.

For queries that requires a "full table scan", well yes, you must read all the Leaf pages of the CI, which is the 100GB.

SQL Server: Why Clustered Index Scan and not Table Scan?

A clustered index is the index that stores all the table data. So a table scan is the same as a clustered index scan.

In a table without a clustered index (a "heap"), a table scan requires crawling through all data pages. That is what the query optimizer calls a "table scan".

SQL Server : expected clustered index scan, but got non clustered index scan

The non-clustered index is a covering index for the query. That is, the index contains all of the columns needed to satisfy the query.

The execution plan is showing that SQL Server is using the non-clustered index.

For the given query, it seems like a reasonable execution plan.

If there were some predicate (a WHERE clause condition on a column) or an ORDER BY clause, then we would expect that to influence which index is used.

But in this case, retrieving two columns (a2 and coda) for every row in the table with the rows returned in an unspecified order, then a full scan of either index is a suitable plan.



Related Topics



Leave a reply



Submit