What's the Best Practice For Primary Keys in Tables

What's the best practice for primary keys in tables?

I follow a few rules:

  1. Primary keys should be as small as possible. Prefer a numeric type, because numeric types are stored far more compactly than character types. Size matters because most primary keys are also foreign keys in other tables and appear in multiple indexes. The smaller your key, the smaller the index, and the fewer pages in the cache you will use.
  2. Primary keys should never change. Updating a primary key should always be out of the question, because it is most likely used in multiple indexes and as a foreign key. Updating a single primary key could cause a ripple effect of changes.
  3. Do NOT use a key from the problem domain as your logical model primary key. Passport numbers, social security numbers, and employee contract numbers are examples: these "natural keys" can change in real-world situations. Make sure to add UNIQUE constraints for them where necessary to enforce consistency.

On surrogate vs natural keys, I refer to the rules above. If the natural key is small and will never change, it can be used as a primary key. If the natural key is large or likely to change, I use a surrogate key. If the table has no natural key at all, I still add a surrogate key, because experience shows you will always add tables to your schema later and wish you had put a primary key in place.
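
A minimal sketch of rules 2 and 3, assuming SQLite driven from Python's built-in sqlite3 module rather than any particular server. The employee and timesheet tables and their columns are hypothetical names invented for illustration. The surrogate employee_id is what other tables reference, while the natural key (passport number) keeps a UNIQUE constraint and can change without touching any referencing row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

# Hypothetical schema: surrogate key as PK, natural key kept UNIQUE.
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,      -- small surrogate key, never updated
    passport_no TEXT NOT NULL UNIQUE,     -- natural key, may change over time
    full_name   TEXT NOT NULL
);
CREATE TABLE timesheet (
    timesheet_id INTEGER PRIMARY KEY,
    employee_id  INTEGER NOT NULL REFERENCES employee(employee_id),
    hours        REAL NOT NULL
);
""")

cur = conn.execute(
    "INSERT INTO employee (passport_no, full_name) VALUES (?, ?)",
    ("P1234567", "Ada Example"),
)
emp_id = cur.lastrowid
conn.execute(
    "INSERT INTO timesheet (employee_id, hours) VALUES (?, ?)", (emp_id, 7.5)
)

# The passport is renewed: one row changes, no foreign key is touched.
conn.execute(
    "UPDATE employee SET passport_no = ? WHERE employee_id = ?",
    ("P7654321", emp_id),
)

# The UNIQUE constraint still rejects a second employee with that passport.
try:
    conn.execute(
        "INSERT INTO employee (passport_no, full_name) VALUES (?, ?)",
        ("P7654321", "Bob Example"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The timesheet is still reachable through the unchanged surrogate key.
hours = conn.execute(
    "SELECT t.hours FROM timesheet t JOIN employee e USING (employee_id) "
    "WHERE e.passport_no = ?",
    ("P7654321",),
).fetchone()[0]
```

Had passport_no been the primary key, the renewal would have required updating every referencing timesheet row as well.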

Best practice to choose primary keys in a relational DB, what is the smartest solution?

Even if a table has a good natural key, it is still generally preferable to assign a surrogate key (usually a numeric auto increment column).

First, as jarlh points out, even countries can and do change their names from time to time, which you can handle easily with a CountryID value.

Also, a natural key is often composed of character data. Database engines generally compare and index numbers faster than character strings, so numeric ID values usually give a performance boost.

And it's currently the standard practice in data warehousing, so developers are accustomed to seeing those SK columns.

Best practice? Probably. Standard practice? Definitely. Go with the autoincrements.

Best Practices for Primary Keys

so that data is spread more evenly through the pages of the database

So you recognize the primary key's properties that will affect performance.

what would be a good primary key combination...?

Given your data size and the key combinations you mention, did you see any performance differences? If so, how significant? Ultimately you have to find the right combination based on your app's needs for size, speed, load, and the other characteristics that Bogdan Sahlean, Martin Smith, and Dan Guzman have raised in the comments above.

Would there be any benefit of making the Department the first column in the PK, followed by Request_ID, and then Condition_ID?

It depends on the queries and the size of the data. The PK choice also affects how the data is stored, which can remove the need for separate indexes, because you get a clustered index (for free) based on the PK.

Is an incremental key the same thing as an aggregate key?

No. If you are speaking of SQL Server, incremental refers to the auto-increment value that the server manages for that field as records are added. An aggregate key (more commonly called a composite key) is a combination of column values that together form a key for the row, often for uniqueness. In your example, all these aggregates are unique (and therefore candidates for PK/clustering):

Condition_ID

Condition_ID + Department

Condition_ID + Department + Request_ID

Should each and every table have a primary key?

Short answer: yes.

Long answer:

  • You need your table to be joinable on something
  • If you want your table to be clustered, you need some kind of a primary key.
  • If your table design does not need a primary key, rethink your design: most probably, you are missing something. Why keep identical records?

In MySQL, the InnoDB storage engine always creates a clustered key. If you did not declare a primary key and there is no unique index on NOT NULL columns to use instead, it adds a hidden row ID column that you don't have access to.

Note that a primary key can be composite.

If you have a many-to-many link table, you create the primary key on all fields involved in the link. Thus you ensure that you don't have two or more records describing one link.

Besides the logical consistency issues, most RDBMS engines will benefit from including these fields in a unique index.

And since any primary key involves creating a unique index, you should declare it and get both logical consistency and performance.
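
The many-to-many case above can be sketched as follows, assuming SQLite through Python's built-in sqlite3 module. The student, course, and enrollment tables are hypothetical names used only for illustration. The link table has no column of its own to identify a row: the composite primary key on both foreign keys is what rejects a duplicate link.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: the link table's PK covers every field in the link.
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  INTEGER NOT NULL REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)   -- one row per link, enforced
);
INSERT INTO student VALUES (1, 'Ada');
INSERT INTO course  VALUES (10, 'Databases');
INSERT INTO enrollment VALUES (1, 10);
""")

# A second row describing the same link violates the primary key.
try:
    conn.execute("INSERT INTO enrollment VALUES (1, 10)")
    link_duplicated = True
except sqlite3.IntegrityError:
    link_duplicated = False

link_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
```

The same declaration also gives the engine the unique index on (student_id, course_id) that joins through the link table will use.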

See this article in my blog for why you should always create a unique index on unique data:

  • Making an index UNIQUE

P.S. There are some very, very special cases where you don't need a primary key.

Mostly they include log tables which don't have any indexes for performance reasons.

Best practice: foreign keys as primary keys or unique constraint

The first you listed because it is simpler.
Both accomplish the same task, so why complicate your life?
Read this: KISS

The best choice for Person table primary key

As mentioned above, use an auto-increment as your primary key. But I don't believe this is your real question.

Your real question is how to avoid duplicate entries. In theory, there is no way: two people could be born on the same day, have the same name, live in the same household, and one of them might have no social insurance number available (one might be a foreigner visiting the country).

However, the combination of full name, birthdate, address, and telephone number is usually sufficient to avoid duplication. Note that addresses may be entered differently, people may have multiple phone numbers, and people may choose to omit their middle name or use an initial. It depends on how important it is to avoid duplicate entries and how large your user base is (and thus how likely a collision is).

Of course, if you can get the SSN/SIN then use that to determine uniqueness.

Composite Primary Keys : Good or Bad?

There is no conclusion that composite primary keys are bad.

The best practice is to have some column or columns that uniquely identify a row. But in some tables a single column is not enough by itself to uniquely identify a row.

SQL (and the relational model) allows a composite primary key. It is a good practice in some cases. Or, another way of looking at it is that it's not a bad practice in all cases.

Some people have the opinion that every table should have an integer column that automatically generates unique values, and that this column should serve as the primary key. Some people also claim that this primary key column should always be called id. But those are conventions, not necessarily best practices. Conventions have some benefit, because they simplify certain decisions. But conventions are also restrictive.

You may have an order with multiple payments because some people purchase on layaway, or else they have multiple sources of payment (two credit cards, for instance), or two different people want to pay for a share of the order (I frequently go to a restaurant with a friend, and we each pay for our own meal, so the staff process half of the order on each of our credit cards).

I would design the system you describe as follows:

Products  : product_id (PK)

Orders    : order_id (PK)

LineItems : product_id is (FK) to Products
            order_id is (FK) to Orders
            (product_id, order_id) is (PK)

Payments  : order_id (FK)
            payment_id - ordinal for each order_id
            (order_id, payment_id) is (PK)

This is also related to the concept of an identifying relationship. If it's definitional that a payment exists only because an order exists, then make the order part of the primary key.

Note the LineItems table also lacks its own auto-increment, single-column primary key. A many-to-many table is a classic example of a good use of a composite primary key.
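
Here is a rough sketch of the Payments part of that design, assuming SQLite through Python's built-in sqlite3 module. The amount_cents column and the add_payment helper are hypothetical additions for illustration. The payment_id is an ordinal that restarts at 1 for each order, and the foreign key makes sure a payment cannot exist without its order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
CREATE TABLE payments (
    order_id     INTEGER NOT NULL REFERENCES orders(order_id),
    payment_id   INTEGER NOT NULL,        -- ordinal within one order
    amount_cents INTEGER NOT NULL,        -- hypothetical column
    PRIMARY KEY (order_id, payment_id)
);
INSERT INTO orders VALUES (100), (101);
""")


def add_payment(conn, order_id, amount_cents):
    """Insert a payment under the next ordinal for this order; return it."""
    with conn:  # commit on success, roll back on error
        next_id = conn.execute(
            "SELECT COALESCE(MAX(payment_id), 0) + 1 "
            "FROM payments WHERE order_id = ?",
            (order_id,),
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO payments (order_id, payment_id, amount_cents) "
            "VALUES (?, ?, ?)",
            (order_id, next_id, amount_cents),
        )
    return next_id


first = add_payment(conn, 100, 2500)   # split bill, first card
second = add_payment(conn, 100, 2500)  # split bill, second card
other = add_payment(conn, 101, 4000)   # ordinal restarts for another order

# A payment for an order that does not exist is rejected by the FK.
try:
    add_payment(conn, 999, 100)
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Computing the ordinal with MAX + 1 is fine for a single connection like this one. With concurrent writers on a real server you would need locking or a retry on primary key conflict.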

Best practices on primary key, auto-increment, and UUID in SQL databases

It's a matter of choice, really, and from my point of view this question invites opinion-based answers. What I always do, even if it is redundant, is create the primary key on an auto-increment column (I call it a technical key). This keeps things consistent within the database, allows the "primary key" to change in case something went wrong at the design phase, and consumes less space when that key is referenced by a foreign key constraint in any other table. I also make the candidate key unique and not null.

A technical key is something you don't normally show to end users, unless you decide to. The same goes for other technical columns that you keep only at the database level for whatever purpose you need, such as modify date, create date, version, the user who changed the record, and more.

In this case I would go for your second option, but slightly modified:

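CREATE TABLE users(
    -- technical key: compact, referenced by foreign keys, not shown to users
    pk INT NOT NULL AUTO_INCREMENT,
    -- candidate key: MySQL has no native UUID type, so BINARY(16) or
    -- CHAR(36) is the usual substitute there
    id UUID NOT NULL,
    .....
    PRIMARY KEY(pk),
    UNIQUE(id)
);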
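
For a runnable variant of the same idea, here is a sketch assuming SQLite through Python's built-in sqlite3 and uuid modules. SQLite has no UUID type either, so the UUID is stored as TEXT here. The name column and the create_user helper are hypothetical. Callers only ever see the UUID, while pk stays internal.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    pk   INTEGER PRIMARY KEY AUTOINCREMENT,  -- technical key, internal only
    id   TEXT NOT NULL UNIQUE,               -- UUID stored as text
    name TEXT NOT NULL                       -- hypothetical column
);
""")


def create_user(conn, name):
    """Insert a user and return the public UUID, never the technical key."""
    public_id = str(uuid.uuid4())
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO users (id, name) VALUES (?, ?)", (public_id, name)
        )
    return public_id


alice_id = create_user(conn, "alice")
bob_id = create_user(conn, "bob")

# Lookups from outside go through the UUID; joins inside the database
# would use the compact integer pk.
row = conn.execute(
    "SELECT pk, name FROM users WHERE id = ?", (bob_id,)
).fetchone()
```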

