Performance Difference Between Primary Key and Unique Clustered Index in SQL Server

What’s the difference between a primary key and a clustered index?

Well, for starters, one is a key, and the other one is an index.

In most database lingo, key is something that somehow identifies the data, with no explicit relation to the storage or performance of the data. And a primary key is a piece of data that uniquely identifies that data.

An index on the other hand is something that describes a (faster) way to access data. It does not (generally) concern itself with the integrity and meaning of the data, it's just concerned with performance and storage. In SQL Server specifically, a clustered index is an index that dictates the physical order of storage of the rows. The things that it does are quite complex, but a useful approximation is that the rows are ordered by the value of the clustered index. This means that when you do not specify a order clause, the data is likely to be sorted by the value of the clustered index.

So, they are completely different things, that kinda-sorta compliment each other. That is why SQL Server, when you create a primary key via the designer, throws in a free clustered index along with it.

Difference in performance between clustered and non-clustered indexes

Maybe it is for efficiency. The nonclustered index is normally smaller than the clustered index, because the clustered index at the leaf level contains all the (non-LOB) fields. So maybe it prefers to use the nonclustered index to enforce the foreign key constraints.

Update: I have done some further tests using AdventureWorks database, which bear out this theory. See below.

I can reproduce the problem using two tables T1 and T2. T1 is the parent and there is a foreign key relationship from T2 to T1.

When T1 has a clustered primary key constraint and a nonclustered unique index Ix-T1, I can alter the table and drop the clustered primary key constraint, but I can't drop Ix-T1 as you found.

If I make T1 with a nonclustered primary key constraint and a clustered unique index Ix_T1, then the situation is reversed: I can drop Ix-T1, but I can't remove the primary key constraint.

CREATE TABLE T1
(
id int NOT NULL CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED
);

CREATE UNIQUE NONCLUSTERED INDEX Ix_T1
ON T1(id);

CREATE TABLE T2
(
id2 int NOT NULL PRIMARY KEY CLUSTERED,
id1 int NOT NULL FOREIGN KEY REFERENCES dbo.T1(id)
);

INSERT INTO T1 (id)
VALUES (1), (2), (3), (4);

INSERT INTO T2 (id2, id1)
VALUES (11, 1), (12, 2), (13, 3);

Try to drop the nonclustered index. This fails.

DROP INDEX Ix_T1
ON dbo.T1;

Sample Image

However I can drop the clustered primary key constraint.

ALTER TABLE dbo.T1
DROP CONSTRAINT PK_T1;

Sample Image

Repeat the test with T1 having a nonclustered primary key and a clustered unique index.

CREATE TABLE T1
(
id int NOT NULL CONSTRAINT PK_T1 PRIMARY KEY NONCLUSTERED
);

CREATE UNIQUE CLUSTERED INDEX Ix_T1
ON T1(id);

This time, I cannot drop the primary key constraint.

ALTER TABLE dbo.T1
DROP CONSTRAINT PK_T1;

Sample Image

However I can drop the clustered index.

DROP INDEX Ix_T1
ON dbo.T1;

Sample Image

So, if my theory is correct, the performance could suffer if you remove the nonclustered index. You might want to do some investigation and tests.

Is there any documentation for the database schema explaining why the index exists? Or can you ask the person who designed the database?

I've done some further tests using AdventureWorks2014, which bear out my theory.

USE AdventureWorks2014;
GO
CREATE SCHEMA test;
GO

-- Create two test tables
SELECT *
INTO test.SalesOrderHeader
FROM Sales.SalesOrderHeader;

SELECT *
INTO test.SalesOrderDetail
FROM Sales.SalesOrderDetail;

-- Test 1 - Clustered primary key and nonclustered index
ALTER TABLE test.SalesOrderHeader
ADD CONSTRAINT PK_Test_SalesOrderHeader PRIMARY KEY CLUSTERED (SalesOrderID);

CREATE UNIQUE NONCLUSTERED INDEX Ix_Test_SalesOrderHeader
ON test.SalesOrderHeader(SalesOrderID);

-- Test 2 - Nonclustered primary key and clustered index
CREATE UNIQUE CLUSTERED INDEX Ix_Test_SalesOrderHeader
ON test.SalesOrderHeader(SalesOrderID);

ALTER TABLE test.SalesOrderHeader
ADD CONSTRAINT PK_Test_SalesOrderHeader PRIMARY KEY NONCLUSTERED (SalesOrderID);

-- Test 3 - Clustered primary key only
ALTER TABLE test.SalesOrderHeader
ADD CONSTRAINT PK_Test_SalesOrderHeader PRIMARY KEY CLUSTERED (SalesOrderID);

-- Same for all tests
ALTER TABLE test.SalesOrderDetail
ADD CONSTRAINT PK_Test_SalesOrderDetail PRIMARY KEY CLUSTERED (SalesOrderDetailID);

ALTER TABLE test.SalesOrderDetail
ADD CONSTRAINT FK_Test_SalesOrderDetail_SalesOrderHeader FOREIGN KEY (SalesOrderID) REFERENCES test.SalesOrderHeader(SalesOrderID);

-- Update 100 records in SalesOrderDetail
UPDATE test.SalesOrderDetail
SET SalesOrderID = SalesOrderID + 1
WHERE SalesOrderDetailID BETWEEN 57800 AND 57899;

Actual execution plan for test 1.

Sample Image

Actual execution plan for test 2. The estimated subtree cost for the Index Seek operator is almost identical to test 1.

Sample Image

Actual execution plan for test 3. The estimated subtree cost for the Index Seek is more than double test 1 or test 2.

Sample Image

And here is a query that measures the sizes of the indexes. (Test 1 configuration.) You can clearly see that the clustered index is much bigger.

-- Measure sizes of indexes
SELECT I.object_id, I.name, I.index_id, I.[type], I.[type_desc], SUM(s.used_page_count) * 8 AS 'IndexSizeKB'
FROM sys.indexes AS I
INNER JOIN sys.dm_db_partition_stats AS S
ON S.[object_id] = I.[object_id] AND S.index_id = I.index_id
WHERE I.[object_id] = OBJECT_ID('test.SalesOrderHeader')
GROUP BY I.object_id, I.name, I.index_id, I.[type], I.[type_desc];

Sample Image

Here are some references that explain clustered indexes and nonclustered indexes.

TechNet > Tables and Index Data Structures Architecture: https://technet.microsoft.com/en-us/library/ms180978(v=sql.105).aspx

Training kit 70-462 Administering Microsoft SQL Server 2012 Databases > Chapter 10: Indexes and Concurrency > Lesson 1: Implementing and Maintaining Indexes

Microsoft SQL Server 2012 Internals by Kalen Delaney > Chapter 7: Indexes: internals and management

Difference between Primary Key and Unique Index in SQL Server

From SQL UNIQUE Constraint

The UNIQUE constraint uniquely identifies each record in a database
table.

The UNIQUE and PRIMARY KEY constraints both provide a
guarantee for uniqueness for a column or set of columns.

A PRIMARY
KEY constraint automatically has a UNIQUE constraint defined on it.

Note that you can have many UNIQUE constraints per table, but only one
PRIMARY KEY constraint per table.

Also, from Create Unique Indexes

You cannot create a unique index on a single column if that column
contains NULL in more than one row. Similarly, you cannot create a
unique index on multiple columns if the combination of columns
contains NULL in more than one row. These are treated as duplicate
values for indexing purposes.

Whereas from Create Primary Keys

All columns defined within a PRIMARY KEY constraint must be defined as
NOT NULL. If nullability is not specified, all columns participating
in a PRIMARY KEY constraint have their nullability set to NOT NULL.

Relationship of Primary Key and Clustered Index

A primary key is a logical concept - it's the unique identifier for a row in a table. As such, it has a bunch of attributes - it may not be null, and it must be unique. Of course, as you're likely to be searching for records by their unique identifier a lot, it would be good to have an index on the primary key.

A clustered index is a physical concept - it's an index that affects the order in which records are stored on disk. This makes it a very fast index when accessing data, though it may slow down writes if your primary key is not a sequential number.

Yes, you can have a primary key without a clustered index - and sometimes, you may want to (for instance when your primary key is a combination of foreign keys on a joining table, and you don't want to incur the disk shuffle overhead when writing).

Yes, you can create a clustered index on columns that aren't a primary key.

Performance between non-clustered index and composite primary key

This is too long for a comment.

How much slower is "slower"? When searching through a non-clustered index, the database engine needs to find the row references in the index (quite fast) and then load the data pages to fetch the row.

When searching using a clustered index, there is no need to load the data pages.

The difference is likely to be much more noticeable when fetching multiple rows, because the clustered index will have the data on the same data pages. The non-clustered index is likely to be fetching from a different page for each item being retrieved (up to a point).

You can compare the difference in performance by fetching only columns in the index. This might not be what you want, but it is a viable performance comparison. These should be similar between the two indexes.

This might explain the difference in performance. If so, then this is nothing to worry about, because it is the expected overhead when not using a clustered index. In general, this is relatively big for queries that are fast and less important for queries that are slower.



Related Topics



Leave a reply



Submit