Relationship of Primary Key and Clustered Index

Relationship of Primary Key and Clustered Index

A primary key is a logical concept - it's the unique identifier for a row in a table. As such, it has a bunch of attributes - it may not be null, and it must be unique. Of course, as you're likely to be searching for records by their unique identifier a lot, it would be good to have an index on the primary key.

A clustered index is a physical concept - it's an index that affects the order in which records are stored on disk. This makes it a very fast index when accessing data, though it may slow down writes if your primary key is not a sequential number.

Yes, you can have a primary key without a clustered index - and sometimes, you may want to (for instance when your primary key is a combination of foreign keys on a joining table, and you don't want to incur the disk shuffle overhead when writing).

Yes, you can create a clustered index on columns that aren't a primary key.

What’s the difference between a primary key and a clustered index?

Well, for starters, one is a key, and the other one is an index.

In most database lingo, key is something that somehow identifies the data, with no explicit relation to the storage or performance of the data. And a primary key is a piece of data that uniquely identifies that data.

An index on the other hand is something that describes a (faster) way to access data. It does not (generally) concern itself with the integrity and meaning of the data, it's just concerned with performance and storage. In SQL Server specifically, a clustered index is an index that dictates the physical order of storage of the rows. The things that it does are quite complex, but a useful approximation is that the rows are ordered by the value of the clustered index. This means that when you do not specify a order clause, the data is likely to be sorted by the value of the clustered index.

So, they are completely different things, that kinda-sorta compliment each other. That is why SQL Server, when you create a primary key via the designer, throws in a free clustered index along with it.

Clustered index on foreign key or primary key?

Having the clustered index on the identity field has the advantage that the records will be stored in the order that they are created. New records are added at the end of the table.

If you use the foreign key as clustered index, the records will be stored in that order instead. When you create new records the data will be fragmented as records are inserted in the middle, which can reduce performance.

If you want an index on the foreign key, then just add a non-clustered index for it.

Should primary keys be always assigned as clustered index

The ideal clustered index key is:

  1. Sequential
  2. Selective (no dupes, unique for each record)
  3. Narrow
  4. Used in Queries

In general it is a very bad idea to use a GUID as a clustered index key, since it leads to mucho fragmentation as rows are added.

EDIT FOR CLARITY:

PK and Clustered key are indeed separate concepts. Your PK does not need to be your clustered index key.

In practical applications in my own experience, the same field that is your PK should/would be your clustered key since it meets the same criteria listed above.

Can I have a primary key and a separate clustered index together?

It depends on your dbms. Not all of them implement clustered indexes. Those that do are liable to implement them in different ways. As far as I know, every platform that implements clustered indexes also provides ways to choose which columns are in the clustered index, although often the primary key is the default.

In SQL Server, you can create a nonclustered primary key and a separate clustered index like this.

create table test (
test_id integer primary key nonclustered,
another_column char(5) not null unique clustered
);

I think that the closest thing to this in Oracle is an index organized table. I could be wrong. It's not quite the same as creating a table with a clustered index in SQL Server.

You can't have multiple clustered indexes on a single table in SQL Server. A table's rows can only be stored in one order at a time. Actually, I suppose you could store rows in multiple, distinct orders, but you'd have to essentially duplicate all or part of the table for each order. (Although I didn't know it at the time I wrote this answer, DB2 UDB supports multiple clustered indexes, and it's quite an old feature. Its design and implementation is quite different from SQL Server.)

A primary key's job is to guarantee uniqueness. Although that job is often done by creating a unique index on the primary key column(s), strictly speaking uniqueness and indexing are two different things with two different aims. Uniqueness aims for data integrity; indexing aims for speed.

A primary key declaration isn't intended to give you any information about the order of rows on disk. In practice, it usually gives you some information about the order of index entries on disk. (Because primary keys are usually implemented using a unique index.)

If you SELECT rows from a table that has a clustered index, you still can't be assured that the rows will be returned to the user in the same order that they're stored on disk. Loosely speaking, the clustered index helps the query optimizer find rows faster, but it doesn't control the order in which those rows are returned to the user. The only way to guarantee the order in which rows are returned to the user is with an explicit ORDER BY clause. (This seems to be a fairly frequent point of confusion. A lot of people seem surprised when a bare SELECT on a clustered index doesn't return rows in the order they expect.)

Why a primary key automatically creates a clustered index

Oracle will create an index to police an unique constraint where no pre-existing index is suitable. Without the index, Oracle would need to serialize operations (such as a table lock) whenever someone tries to insert or delete a row (or update the PK).

Contrarily to MS-SQL Server, this index is not clustered on heap tables (default table organization), i.e. this index won't change the underlying table structure and natural order. The rows won't be reordered when Oracle creates the index. The index will be a B-tree index and will exist as a separate entity where each entry points to a row in the main table.

Oracle doesn't have clustered index as MS SQL, however indexed-organized tables share some properties with cluster-indexed tables. The PK is an integral part of such tables and has to be specified during creation.

(Oracle also has table clusters, but they are a completely different concept).



Related Topics



Leave a reply



Submit