What's the difference between a Table Scan and a Clustered Index Scan?
In a table without a clustered index (a heap table), data pages are not linked together, so traversing pages requires a lookup into the Index Allocation Map (IAM).
A clustered table, however, has its data pages linked in a doubly linked list, making sequential scans a bit faster. Of course, in exchange, you have the overhead of keeping the data pages in order on INSERT, UPDATE, and DELETE. A heap table, on the other hand, requires a second write to the IAM.
If your query has a range predicate (e.g. SELECT * FROM TABLE WHERE Id BETWEEN 1 AND 100) and the table is clustered on Id, then the clustered table (being in a guaranteed order) would be more efficient, as it could use the index pages to find the relevant data page(s). A heap would have to scan all rows, since it cannot rely on ordering.
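To make the ordering argument concrete, here is a toy Python model (not how SQL Server actually stores pages) that counts pages read for the same range query against rows kept in Id order and against the same rows in arbitrary heap order. The page size `ROWS_PER_PAGE` and the helper names are hypothetical, and the `bisect` over each page's first key merely stands in for walking the upper levels of the clustered index.

```python
import bisect
import random
from typing import List, Tuple

ROWS_PER_PAGE = 50  # hypothetical page capacity, not a real SQL Server figure


def build_pages(ids: List[int]) -> List[List[int]]:
    """Split row ids into fixed-size 'pages' in the order given."""
    return [ids[i:i + ROWS_PER_PAGE] for i in range(0, len(ids), ROWS_PER_PAGE)]


def clustered_range(pages: List[List[int]], low: int, high: int) -> Tuple[List[int], int]:
    """Range seek on pages kept in id order: find the first candidate page,
    then follow the page chain until the ids pass `high`."""
    first_keys = [page[0] for page in pages]  # stands in for the upper index levels
    start = max(bisect.bisect_right(first_keys, low) - 1, 0)
    found, pages_read = [], 0
    for page in pages[start:]:
        pages_read += 1
        found.extend(i for i in page if low <= i <= high)
        if page[-1] >= high:  # later pages only hold larger ids
            break
    return found, pages_read


def heap_range(pages: List[List[int]], low: int, high: int) -> Tuple[List[int], int]:
    """A heap has no ordering guarantee, so every page must be read."""
    found, pages_read = [], 0
    for page in pages:
        pages_read += 1
        found.extend(i for i in page if low <= i <= high)
    return found, pages_read


ids = list(range(1, 10_001))          # 10,000 rows with Id 1..10000
clustered = build_pages(ids)          # stored in Id order
shuffled = ids[:]
random.Random(42).shuffle(shuffled)
heap = build_pages(shuffled)          # same rows, arbitrary order

c_rows, c_pages = clustered_range(clustered, 1, 100)
h_rows, h_pages = heap_range(heap, 1, 100)
print(sorted(c_rows) == sorted(h_rows))  # True
print(c_pages, h_pages)                  # 2 200
```

With 10,000 rows, the ordered layout stops after the two pages that hold Ids 1 to 100, while the heap has to read all 200 pages to be sure it found every match.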
And, of course, a clustered index lets you do a CLUSTERED INDEX SEEK, which is pretty much optimal for performance, whereas a heap with no indexes would always result in a table scan.
So:
- For your example query where you select all rows, the only difference is the doubly linked list a clustered index maintains. This should make your clustered table just a tiny bit faster than a heap with a large number of rows.
- For a query with a WHERE clause that can be (at least partially) satisfied by the clustered index, you'll come out ahead because of the ordering, so you won't have to scan the entire table.
- For a query that is not satisfied by the clustered index, you're pretty much even; again, the only difference is the doubly linked list for sequential scanning. In either case, you're suboptimal.
- For INSERT, UPDATE, and DELETE, a heap may or may not win. The heap doesn't have to maintain order, but does require a second write to the IAM. I think the relative performance difference would be negligible, but also pretty data dependent.
Microsoft has a whitepaper which compares a clustered index to an equivalent non-clustered index on a heap (not exactly the same as I discussed above, but close). Their conclusion is basically to put a clustered index on all tables. I'll do my best to summarize their results (again, note that they're really comparing a non-clustered index to a clustered index here - but I think it's relatively comparable):
- INSERT performance: clustered index wins by about 3% due to the second write needed for a heap.
- UPDATE performance: clustered index wins by about 8% due to the second lookup needed for a heap.
- DELETE performance: clustered index wins by about 18% due to the second lookup needed and the second delete needed from the IAM for a heap.
- Single SELECT performance: clustered index wins by about 16% due to the second lookup needed for a heap.
- Range SELECT performance: clustered index wins by about 29% due to the random ordering for a heap.
- Concurrent INSERT: heap table wins by 30% under load due to page splits for the clustered index.
Index scan, Index seek and table scan
In case you do a SELECT *, you want all columns, so in the end SQL Server must go back to the base table data. In such a case, it's often cheaper to just do a table scan (or clustered index scan) rather than an index seek with an expensive key lookup (or RID lookup, if no clustered index is present).
If you have a lot of rows and your query matches only a few of them, then at some point it becomes more efficient for SQL Server to do an index seek and a single (or a few) key/RID lookups. So if you have thousands of rows in your sample table, at some point (the "tipping point") SQL Server will start using your nonclustered index.
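The tipping point can be sketched as a simple page-count comparison. This is a toy model under loud assumptions: a scan reads every data page once, each key/RID lookup costs exactly one page read, and the index depth and rows per page are made-up defaults. SQL Server's real costing is more involved (random versus sequential I/O, caching, statistics), so treat the numbers as illustrative only.

```python
import math


def scan_cost(total_rows: int, rows_per_page: int) -> int:
    """Pages read by a full table (or clustered index) scan."""
    return math.ceil(total_rows / rows_per_page)


def seek_lookup_cost(matching_rows: int, index_depth: int) -> int:
    """Pages read by a nonclustered index seek plus one key/RID lookup per
    matching row, assuming each lookup lands on a different data page."""
    return index_depth + matching_rows


def preferred_plan(total_rows: int, matching_rows: int,
                   rows_per_page: int = 100, index_depth: int = 3) -> str:
    """Pick whichever plan reads fewer pages in this toy model
    (rows_per_page and index_depth are hypothetical defaults)."""
    if seek_lookup_cost(matching_rows, index_depth) < scan_cost(total_rows, rows_per_page):
        return "index seek + lookups"
    return "table scan"


for matching in (10, 500, 5_000):
    print(matching, preferred_plan(100_000, matching))
print(preferred_plan(200, 1))  # tiny two-page table: scan wins even for one row
```

In this model, seeks win while the number of matching rows stays below roughly the number of data pages, and a tiny two-page table is cheaper to scan even for a single row, which mirrors why the nonclustered index only starts being used once the table grows.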
In the second case, when you do SELECT id, you only want the id column, and that column is in the index pages, so an index seek on that index will give SQL Server all that it needs to satisfy this query. Therefore, an index seek is typically much faster and will be preferred over the table scan.
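A small Python sketch of why `SELECT id` can be answered from the index alone while `SELECT *` cannot. The base table (a dict keyed by a row locator) and the index (a sorted list of `(id, locator)` pairs) are hypothetical stand-ins, and the model only counts lookups rather than timing anything.

```python
from bisect import bisect_left, bisect_right

# Hypothetical base table: row locator -> full row with all columns.
table = {
    rid: {"id": rid * 10, "name": f"name{rid}", "notes": "x" * 20}
    for rid in range(1, 1001)
}
# Hypothetical nonclustered index on id: sorted (id, row locator) pairs.
index = sorted((row["id"], rid) for rid, row in table.items())
keys = [key for key, _ in index]


def select_ids(low: int, high: int):
    """SELECT id ... WHERE id BETWEEN low AND high.
    The index alone covers the query, so no lookups are needed."""
    lo, hi = bisect_left(keys, low), bisect_right(keys, high)
    return [key for key, _ in index[lo:hi]], 0  # (result, lookups)


def select_star(low: int, high: int):
    """SELECT * ... WHERE id BETWEEN low AND high.
    Each matching index entry needs a lookup to fetch the other columns."""
    lo, hi = bisect_left(keys, low), bisect_right(keys, high)
    rows = [table[rid] for _, rid in index[lo:hi]]
    return rows, len(rows)  # (result, lookups)


ids, id_lookups = select_ids(100, 500)
rows, star_lookups = select_star(100, 500)
print(len(ids), id_lookups)      # 41 0
print(len(rows), star_lookups)   # 41 41
```

Both queries seek into the index the same way; the difference is that `select_star` pays one lookup per matching row to fetch the columns the index does not contain.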
This is one of the many reasons why you should try to avoid using SELECT * FROM dbo.Table as much as you can. With a SELECT *, more often than not, nonclustered indexes are not used, and a table (or clustered index) scan is used instead.
Why index scan instead of seek while using comparison operator
Well, I mean, you're selecting all the rows (except maybe one). There really is no difference between a seek and a scan here. SQL Server is choosing to perform a single scan of the skinniest index instead of doing 80,000 seeks (or however many orders are in the table).
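A back-of-the-envelope comparison of the two options, assuming (hypothetically) a B-tree depth of 3 and 400 rows per leaf page for a narrow index versus 40 rows per page for a wide clustered index. These figures are invented for illustration; only the 80,000-row count comes from the answer.

```python
import math


def scan_pages(rows: int, rows_per_leaf_page: int) -> int:
    """Logical reads to walk every leaf page of an index once."""
    return math.ceil(rows / rows_per_leaf_page)


def seek_pages(seeks: int, index_depth: int) -> int:
    """Logical reads for separate seeks, each descending from the root."""
    return seeks * index_depth


ROWS = 80_000   # order count used in the answer
DEPTH = 3       # hypothetical B-tree depth

print(scan_pages(ROWS, 400))        # 200    (hypothetical narrow index)
print(scan_pages(ROWS, 40))         # 2000   (hypothetical wide clustered index)
print(seek_pages(ROWS - 1, DEPTH))  # 239997 (one seek per row, all but one row)
```

Even with these rough assumptions, one pass over the narrowest index reads orders of magnitude fewer pages than tens of thousands of individual seeks, which is also why the optimizer picks the skinniest index available for the scan.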
It's a common misconception that a seek is always the best choice. In fact, sometimes you absolutely want a scan.
- Scans are better than seeks. Really.
- Why the SQL Server FORCESCAN hint exists
- Why isn't SQL Server using my non-clustered index and doing a clustered index scan?
- T-SQL Tuesday #56 : SQL Server Assumptions (see #2)
- “Tipping point” posts by Kimberly Tripp
How does the query optimizer determine when to perform a full table scan or an index scan?
The most common reason is that the optimizer estimates that using the index will in fact be more costly than simply reading all the rows.
If the specific value you're searching for (in this case, an officeCode value of 1) occurs in a large enough subset of rows, the optimizer decides that reading the index entries only to then be redirected to the table rows is a waste of time. It's the same reason very common words are not included in the index at the back of a book.
Another factor is that data is read into RAM in pages, so if your table is quite small, all of its rows are likely to fit on a single page. Once the search is narrowed down to a single page, the benefit of an index is trivial. Since data is stored on a different page than the index, using an index could even result in reading more pages than just doing a table scan of that single page.
Your visual EXPLAIN shows that the table scan examines about 23 rows, so I would guess that these might reside on one page.
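The small-table argument can be illustrated with a toy page count. It assumes a hypothetical capacity of 100 rows per page, that each matching row lives on its own data page (capped at the table's page count), and a made-up number of index pages per lookup path. It is not how MySQL's optimizer actually computes costs; the MySQL cost model documentation describes that.

```python
import math


def table_scan_pages(total_rows: int, rows_per_page: int) -> int:
    """A table scan reads every data page once."""
    return math.ceil(total_rows / rows_per_page)


def index_path_pages(matching_rows: int, total_rows: int,
                     rows_per_page: int, index_pages: int) -> int:
    """Index pages walked, plus the data pages the matches live on.
    Assumes matches are spread out, so each touches its own data page,
    capped at the number of pages in the table."""
    data_pages = min(matching_rows, table_scan_pages(total_rows, rows_per_page))
    return index_pages + data_pages


ROWS_PER_PAGE = 100  # hypothetical; real capacity depends on row size

# Small table (~23 rows): everything fits on one page.
print(table_scan_pages(23, ROWS_PER_PAGE),
      index_path_pages(5, 23, ROWS_PER_PAGE, 1))           # 1 2
# Large table with a very selective value.
print(table_scan_pages(1_000_000, ROWS_PER_PAGE),
      index_path_pages(5, 1_000_000, ROWS_PER_PAGE, 3))    # 10000 8
```

With about 23 rows on one page, going through the index reads two pages where a scan reads one, whereas for a large table a selective index lookup touches only a handful of the 10,000 pages.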
You might like to read https://dev.mysql.com/doc/refman/8.0/en/cost-model.html