When Should I Use Primary Key or Index

When should I use primary key or index?

Basically, a primary key is (at the implementation level) a special kind of index. Specifically:

  • A table can have only one primary key, and with very few exceptions, every table should have one.
  • A primary key is implicitly UNIQUE - you cannot have more than one row with the same primary key, since its purpose is to uniquely identify rows.
  • A primary key can never be NULL, so the column(s) it consists of must be NOT NULL.

A table can have multiple indexes, and indexes are not necessarily UNIQUE. Indexes exist for two reasons:

  • To enforce a uniqueness constraint (these can be created implicitly when you declare a column UNIQUE)
  • To improve performance. Comparisons for equality or "greater/smaller than" in WHERE clauses, as well as JOINs, are much faster on columns that have an index. But note that each index decreases update/insert/delete performance, so you should only have them where they're actually needed.
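The rules above can be tried out in a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, so that it runs without a server), and the `account` table and its columns are hypothetical. SQLite differs from MySQL in some details, so every column is declared NOT NULL explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        account_id INTEGER NOT NULL PRIMARY KEY,  -- only one primary key per table
        email      TEXT NOT NULL UNIQUE,          -- UNIQUE creates an index implicitly
        country    TEXT NOT NULL
    );
    -- A plain (non-unique) index, added only to speed up lookups by country.
    CREATE INDEX idx_account_country ON account (country);
""")

conn.execute("INSERT INTO account VALUES (1, 'a@example.com', 'NL')")
# Duplicate values are allowed in a non-unique index.
conn.execute("INSERT INTO account VALUES (2, 'b@example.com', 'NL')")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_pk = try_insert((1, "c@example.com", "DE"))     # repeats a primary key
duplicate_email = try_insert((3, "a@example.com", "DE"))  # repeats a UNIQUE value
new_row = try_insert((3, "c@example.com", "DE"))          # conflicts with nothing
print(duplicate_pk, duplicate_email, new_row)
```

Only the last insert succeeds: the primary key and the UNIQUE index each reject their duplicate, while the plain index on `country` accepts repeated values.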

What index should I use when using JOIN on PRIMARY KEY

MySQL's Optimizer does not care which table comes first in a JOIN. It will look at statistics (etc) to decide for itself whether to start with Hotel or HotelRoom. So, you should write indexes for both cases, so as not to restrict the Optimizer.

MySQL almost always performs a JOIN by scanning one table and then, for each row in that table, looking up the necessary row(s) in the other table. See "Nested Loop Join" or "NLJ". This implies that the optimal indexes are (often) as follows: for the 'first' table, the columns of the WHERE clause involving the first table; for the second table, the columns from both the WHERE and ON clauses involving the second table.

Assuming that the Optimizer started with Hotel:

Hotel: INDEX(IsClosed, Enabled)   -- in either order
HotelRoom: INDEX(Deleted, Enabled, HotelId) -- in any order

If it started with HotelRoom:

HotelRoom:  INDEX(Deleted, Enabled)  -- in either order
Hotel: PRIMARY KEY(HotelId) -- which you already have?

If there are a lot of closed/disabled hotels, then this may be beneficial:

Hotel: INDEX(IsClosed, Enabled, HotelId)

As Tim mentioned, it may be beneficial to augment an index to include the rest of the columns mentioned, thereby making the index "covering". (But don't do this with the PRIMARY KEY or any UNIQUE key.)
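The original query is not shown here, so the sketch below assumes a hypothetical one that joins Hotel to HotelRoom on HotelId and filters on IsClosed, Enabled and Deleted. It runs on Python's built-in sqlite3 as a stand-in for MySQL. The `HotelRoomId` column and the index names are made up for illustration, and SQLite's planner is not MySQL's, so the printed plan is only indicative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Hotel (
        HotelId  INTEGER NOT NULL PRIMARY KEY,
        IsClosed INTEGER NOT NULL,
        Enabled  INTEGER NOT NULL
    );
    CREATE TABLE HotelRoom (
        HotelRoomId INTEGER NOT NULL PRIMARY KEY,  -- hypothetical column
        HotelId     INTEGER NOT NULL,
        Deleted     INTEGER NOT NULL,
        Enabled     INTEGER NOT NULL
    );
    -- Useful when the optimizer starts with Hotel.
    CREATE INDEX idx_hotel_state ON Hotel (IsClosed, Enabled);
    -- Useful as the lookup index when HotelRoom is the second table. Its
    -- leftmost prefix (Deleted, Enabled) also serves the case where
    -- HotelRoom is read first.
    CREATE INDEX idx_room_state_hotel ON HotelRoom (Deleted, Enabled, HotelId);
""")
conn.executemany("INSERT INTO Hotel VALUES (?, ?, ?)",
                 [(1, 0, 1), (2, 1, 1), (3, 0, 0)])
conn.executemany("INSERT INTO HotelRoom VALUES (?, ?, ?, ?)",
                 [(10, 1, 0, 1), (11, 1, 1, 1), (12, 2, 0, 1), (13, 3, 0, 1)])

# Assumed query shape; the real one may differ.
query = """
    SELECT h.HotelId, r.HotelRoomId
    FROM Hotel AS h
    JOIN HotelRoom AS r ON r.HotelId = h.HotelId
    WHERE h.IsClosed = 0 AND h.Enabled = 1
      AND r.Deleted = 0 AND r.Enabled = 1
    ORDER BY r.HotelRoomId
"""
rows = conn.execute(query).fetchall()
print(rows)

# The fourth column of each plan row is the human-readable step.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step[3])
```

On a real MySQL server, running EXPLAIN on the actual query is the way to see which table the Optimizer starts with and which index it picks.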

If you provide SHOW CREATE TABLE and the sizes of the tables, we might have further suggestions.

What is the difference between a primary key and an index key

A primary key is a special kind of index in that:

  • there can be only one;
  • it cannot be nullable; and
  • it must be unique.

You tend to use the primary key as the most natural unique identifier for a row (such as social security number, employee ID and so forth, although there is a school of thought that you should always use an artificial surrogate key for this).

Indexes, on the other hand, can be used for fast retrieval based on other columns. For example, an employee database may have your employee number as the primary key but it may also have an index on your last name or your department.

Both of these indexes (last name and department) would disallow NULLs (probably) and allow duplicates (almost certainly), and they would be useful to speed up queries looking for anyone with (for example) the last name 'Corleone' or working in the 'HitMan' department.
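The employee example can be sketched as follows, using Python's built-in sqlite3 as a stand-in (an assumption; the table, column and index names are hypothetical). It creates the two secondary indexes and asks the engine which one it uses for a last-name lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER NOT NULL PRIMARY KEY,
        last_name   TEXT NOT NULL,
        department  TEXT NOT NULL
    );
    -- Non-unique secondary indexes: duplicates are expected in both.
    CREATE INDEX idx_employee_last_name  ON employee (last_name);
    CREATE INDEX idx_employee_department ON employee (department);
""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Corleone", "HitMan"), (2, "Corleone", "Accounting"), (3, "Hagen", "Legal")],
)

by_name = conn.execute(
    "SELECT employee_id FROM employee WHERE last_name = ? ORDER BY employee_id",
    ("Corleone",),
).fetchall()

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT employee_id FROM employee WHERE last_name = ?",
    ("Corleone",),
).fetchall()
plan_text = " ".join(row[3] for row in plan)

print(by_name)
print(plan_text)
```

The plan text should mention `idx_employee_last_name`, which shows that the lookup goes through the secondary index rather than scanning the whole table.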

Does MySQL create an extra index for the primary key or use the data itself as an index

Clustered and Secondary Indexes

Every InnoDB table has a special index called the clustered index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key. To get the best performance from queries, inserts, and other database operations, you must understand how InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table.

  • When you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index.

  • If you do not define a PRIMARY KEY for your table, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.

  • If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.

How the Clustered Index Speeds Up Queries

Accessing a row through the clustered index is fast because the index search leads directly to the page with all the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record.
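The difference between the two kinds of index can be pictured with a toy model in plain Python. This is not how InnoDB is implemented (it uses B-trees over pages), and the data is made up. The sketch only shows the shape of the lookups: the clustered index holds the whole row, while a secondary index holds primary key values and therefore needs a second step.

```python
# Clustered index: primary key -> full row (the row data lives in the index).
clustered = {
    1: {"employee_id": 1, "last_name": "Corleone", "department": "HitMan"},
    2: {"employee_id": 2, "last_name": "Hagen", "department": "Legal"},
    3: {"employee_id": 3, "last_name": "Corleone", "department": "Accounting"},
}

# Secondary index: indexed value -> primary keys of the matching rows.
secondary_last_name = {}
for pk, row in clustered.items():
    secondary_last_name.setdefault(row["last_name"], []).append(pk)


def find_by_pk(pk):
    """One lookup: the clustered index entry already contains the row."""
    return clustered[pk]


def find_by_last_name(last_name):
    """Two steps: the secondary index yields primary keys, then the
    clustered index yields the rows themselves."""
    return [clustered[pk] for pk in secondary_last_name.get(last_name, [])]


print(find_by_pk(2)["last_name"])
print([row["employee_id"] for row in find_by_last_name("Corleone")])
```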

Is primary key also index?

  1. In MySQL, a PRIMARY or UNIQUE KEY creates an index on the columns defined in the constraint. If there are multiple columns, a composite index is created.

    If it's an InnoDB table, the PRIMARY KEY also becomes the clustered index for the table.

  2. It doesn't make sense to add additional indexes with the same definitions as a PRIMARY/UNIQUE.

For other RDBMS an index will be required for these constraints. Even if you are allowed to create a constraint without an appropriate index, it will be required to get any reasonable performance.

Is there any benefit to creating an index on a primary key?

To your first question: yes, you're safe to assume that.

To the second question:

Indexes help to speed up searching - it's like an index in a book. They can help the DB engine jump to the correct record, just as an index can help you jump to the right page in a book.

The benefit of indexes that you might create yourself depends on how you intend to search the data.

In your example, I'd create an INDEX on the name fields if you're going to search on them in your app.

Should each and every table have a primary key?

Short answer: yes.

Long answer:

  • You need your table to be joinable on something
  • If you want your table to be clustered, you need some kind of a primary key.
  • If your table design does not need a primary key, rethink your design: most probably, you are missing something. Why keep identical records?

In MySQL, the InnoDB storage engine always needs a clustered index: if you didn't specify a primary key explicitly and there is no suitable UNIQUE index on NOT NULL columns, it generates a hidden one, thus making an extra column you don't have access to.

Note that a primary key can be composite.

If you have a many-to-many link table, you create the primary key on all fields involved in the link. Thus you ensure that you don't have two or more records describing one link.

Besides the logical consistency issues, most RDBMS engines will benefit from including these fields in a unique index.

And since any primary key involves creating a unique index, you should declare it and get both logical consistency and performance.
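A composite primary key on a link table can be sketched like this, again using Python's built-in sqlite3 as a stand-in for MySQL (an assumption). The `student_course` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student_course (
        student_id INTEGER NOT NULL,
        course_id  INTEGER NOT NULL,
        PRIMARY KEY (student_id, course_id)  -- composite key over both link columns
    );
""")
conn.execute("INSERT INTO student_course VALUES (1, 100)")
conn.execute("INSERT INTO student_course VALUES (1, 101)")  # same student, other course
conn.execute("INSERT INTO student_course VALUES (2, 100)")  # other student, same course

try:
    conn.execute("INSERT INTO student_course VALUES (1, 100)")  # repeats an existing link
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

link_count = conn.execute("SELECT COUNT(*) FROM student_course").fetchone()[0]
print(duplicate_rejected, link_count)
```

Each column may repeat on its own; only the pair must be unique, so the same link cannot be recorded twice.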

See this article in my blog for why you should always create a unique index on unique data:

  • Making an index UNIQUE

P.S. There are some very, very special cases where you don't need a primary key.

Mostly they include log tables which don't have any indexes for performance reasons.


