Database Design and The Use of Non-Numeric Primary Keys

There are two reasons I would always add an ID number to a lookup / ENUM table:

  1. If you are referencing a single-column table that holds only the name, you may be better served by using a constraint instead.
  2. What happens if you want to rename one of the client_status entries? For example, to change the name from 'affiliate' to 'affiliate user' you would also need to update the client table, which should not be necessary. The ID number serves as the reference and the name is the description.
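
To make the second point concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL, purely so it runs anywhere. The table names client and client_status come from the question; the column names and sample rows are my own assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE client_status (
    id   INTEGER PRIMARY KEY,        -- the reference
    name TEXT NOT NULL UNIQUE        -- the description
);
CREATE TABLE client (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    status_id INTEGER NOT NULL REFERENCES client_status(id)
);
INSERT INTO client_status (id, name) VALUES (1, 'affiliate'), (2, 'customer');
INSERT INTO client (name, status_id) VALUES ('Acme', 1), ('Globex', 1);
""")

# Rename the status: exactly one row changes, and the client table is untouched.
conn.execute("UPDATE client_status SET name = 'affiliate user' WHERE id = 1")

rows = conn.execute("""
    SELECT c.name, s.name
    FROM client c
    JOIN client_status s ON s.id = c.status_id
    ORDER BY c.id
""").fetchall()
print(rows)
```

Had client stored the status name itself as the foreign key, the same rename would have had to touch every matching client row.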

In the website table, if you are confident that the name will be unique then it is fine to use as a primary key. Personally I would still assign a numeric ID as it reduces the space used in foreign key tables and I find it easier to manage.

EDIT:
As stated above, you will run into problems if the website is ever renamed. By making the name the primary key you make it very difficult, if not impossible, to change it at a later date.

Can I use a non-numerical primary key for a MySQL table?

There are two schools of thought on this topic.

There are some who hold strongly to the belief that using a "natural key" as the primary key for an entity table is desirable, because it has significant advantages over a surrogate key.

There are others who believe that a "surrogate" key can provide some desirable properties which a "natural" key may not.

Let's summarize some of the most important and desirable properties of a primary key:

  • minimal - fewest possible number of attributes
  • simple - native datatypes, ideally a single column
  • available - the value will always be available when the entity is created
  • unique - absolutely no duplicates, no two rows will ever have the same value
  • anonymous - carries no hidden "meaningful" information
  • immutable - once assigned, it will never be modified

(There are some other properties that could be listed, but some of them can be derived from the properties above: not null, can be indexed, etc.)


I break the two schools of thought regarding "natural" and "surrogate" keys as the "best" primary keys into two camps:

1) Those who have been badly burned by an earlier decision to elect a natural key as the primary key, and

2) Those who have not yet been burned by that decision.

Must database primary keys be integers?

You can use a varchar as well, as long as you make sure that each value is unique. This, however, isn't ideal (see the links below for more info).

What you are describing is called a natural key. A primary key that is auto-incremented and managed by the RDBMS is called a surrogate key, which is the preferred approach. If you want the RDBMS to generate the key by auto-increment, the column has to be an integer.

Learn more:

  • Surrogate Keys vs Natural Keys for Primary Key?
  • Why I prefer surrogate keys instead of natural keys in database design

What's the best practice for primary keys in tables?

I follow a few rules:

  1. Primary keys should be as small as necessary. Prefer a numeric type, because numeric types are stored in a much more compact format than character types. Size matters because most primary keys will also be foreign keys in other tables and will appear in multiple indexes. The smaller your key, the smaller the index, and the fewer pages in the cache you will use.
  2. Primary keys should never change. Updating a primary key should always be out of the question, because it is most likely used in multiple indexes and as a foreign key. Updating a single primary key could cause a ripple effect of changes.
  3. Do NOT use your "problem domain primary key" as your logical model primary key. Examples are passport number, social security number, or employee contract number: these "natural keys" can change in real-world situations. Make sure to add UNIQUE constraints for them where necessary to enforce consistency.
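
Rules 2 and 3 together might look like the sketch below. It uses Python's sqlite3 module so it is runnable as-is; the person and booking tables, their columns, and the passport numbers are all hypothetical and only there for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person (
    id              INTEGER PRIMARY KEY,    -- surrogate key, never changes
    passport_number TEXT NOT NULL UNIQUE,   -- natural key, still enforced
    full_name       TEXT NOT NULL
);
CREATE TABLE booking (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id)
);
INSERT INTO person (id, passport_number, full_name)
    VALUES (1, 'P1234567', 'Ada Example');
INSERT INTO booking (person_id) VALUES (1), (1);
""")

# The passport is renewed: one row is updated, no foreign key is touched.
conn.execute("UPDATE person SET passport_number = 'P7654321' WHERE id = 1")
booking_count = conn.execute(
    "SELECT COUNT(*) FROM booking WHERE person_id = 1"
).fetchone()[0]

# The UNIQUE constraint still rejects a second person with the same passport.
try:
    conn.execute(
        "INSERT INTO person (passport_number, full_name) "
        "VALUES ('P7654321', 'Someone Else')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(booking_count, duplicate_rejected)
```

The surrogate id carries the relationships, while the UNIQUE constraint keeps the consistency guarantee you would otherwise have wanted from making the natural key the primary key.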

On surrogate vs natural key, I refer to the rules above. If the natural key is small and will never change it can be used as a primary key. If the natural key is large or likely to change I use surrogate keys. If there is no primary key I still make a surrogate key because experience shows you will always add tables to your schema and wish you'd put a primary key in place.

Is it ok to use character values for primary keys?

I'd stay away from using text as your key. What happens in the future when you want to change the team ID for some team? You'd have to cascade that key change all through your data, which is exactly what a stable numeric primary key lets you avoid. Also, though I don't have any empirical evidence, I'd expect the INT key to be faster than the text one.

Perhaps you can create views for your data that make it easier to consume, while still using a numeric primary key.
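
As a rough sketch of the view idea (sqlite3 again, so that it runs without a server): the team, player, and player_with_team names and the team codes are invented for this example and are not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE team (
    id   INTEGER PRIMARY KEY,      -- numeric key used for all references
    code TEXT NOT NULL UNIQUE      -- the readable text identifier
);
CREATE TABLE player (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    team_id INTEGER NOT NULL REFERENCES team(id)
);
-- The view exposes the readable code so consumers never see team_id.
CREATE VIEW player_with_team AS
    SELECT p.id AS player_id, p.name AS player_name, t.code AS team_code
    FROM player p
    JOIN team t ON t.id = p.team_id;
INSERT INTO team (id, code) VALUES (1, 'NYY');
INSERT INTO player (name, team_id) VALUES ('Sam', 1);
""")

# Changing the text code is a one-row update; the view picks it up.
conn.execute("UPDATE team SET code = 'NYK' WHERE id = 1")
view_rows = conn.execute(
    "SELECT player_name, team_code FROM player_with_team"
).fetchall()
print(view_rows)
```

Queries against the view read as if the text code were the key, but nothing in player has to change when a code does.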

Why is the Primary Key often an integer in a Relational Database Management System?

An integer will use less disk space than a string, thus giving you a smaller index file to search through. This is important for large tables where you want to have as much of the index as possible cached in RAM.

Also, they can be autoincremented so you don't need to write your own routines to generate keys.

You often want to have a technical key (also called a surrogate key), a key that is only used to identify the row and not used for anything else. Most data may change sooner or later for reasons you can't control and you don't want to update it everywhere. Even such seemingly static data as a nation-assigned personal id number can change (if you get a new identity) or there may be laws prohibiting their use. A key generated by you, however, is in your own control. For such surrogate keys it's useful to have a small key that is easily generated.

As for "floats as primary keys": Don't do this. A primary key should uniquely identify a row. Floating-point values are binary approximations, so two values that should be equal can differ in their last bits after rounding, which means you cannot safely compare them for equality. This is an inherent shortcoming of floating-point values. If you need decimals, use a fixed-point number type instead.
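
The equality problem is easy to reproduce outside a database. The snippet below is plain Python, with decimal.Decimal standing in for a fixed-point column type such as DECIMAL; that mapping is only an analogy, not a statement about how any particular RDBMS stores the value.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly,
# so the computed sum is not the same value as the literal 0.3.
float_sum = 0.1 + 0.2
float_equal = (float_sum == 0.3)

# Fixed-point decimal arithmetic keeps the exact decimal digits.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_equal = (decimal_sum == Decimal("0.3"))

print(float_equal)    # False
print(decimal_equal)  # True
```

A key lookup is an equality test, so a float key computed one way may fail to match the "same" key computed another way.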

Should I have a dedicated primary key field?

I would use a generated PK myself, just for the reasons you mentioned. Also, indexing and comparing by integer is faster than comparing by strings. You can put a unique index on the name field too without making it a primary key.

Should each and every table have a primary key?

Short answer: yes.

Long answer:

  • You need your table to be joinable on something
  • If you want your table to be clustered, you need some kind of a primary key.
  • If your table design does not need a primary key, rethink your design: most probably, you are missing something. Why keep identical records?

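In MySQL, the InnoDB storage engine always needs a clustered index. If you don't declare a primary key and the table has no unique index on NOT NULL columns that it can use instead, InnoDB generates a hidden clustered index on an internal row ID column that you don't have access to.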

Note that a primary key can be composite.

If you have a many-to-many link table, you create the primary key on all fields involved in the link. Thus you ensure that you don't have two or more records describing one link.

Besides the logical consistency issues, most RDBMS engines will benefit from including these fields in a unique index.

And since any primary key involves creating a unique index, you should declare it and get both logical consistency and performance.
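
Here is a small sketch of a many-to-many link table with a composite primary key, using sqlite3 for portability. The student, course, and enrollment tables are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(id),
    course_id  INTEGER NOT NULL REFERENCES course(id),
    PRIMARY KEY (student_id, course_id)   -- composite key over both link fields
);
INSERT INTO student (id, name)  VALUES (1, 'Ada');
INSERT INTO course  (id, title) VALUES (10, 'Databases');
INSERT INTO enrollment (student_id, course_id) VALUES (1, 10);
""")

# A second row describing the same link violates the primary key.
try:
    conn.execute("INSERT INTO enrollment (student_id, course_id) VALUES (1, 10)")
    duplicate_link_rejected = False
except sqlite3.IntegrityError:
    duplicate_link_rejected = True

link_count = conn.execute("SELECT COUNT(*) FROM enrollment").fetchone()[0]
print(duplicate_link_rejected, link_count)
```

Declaring the key gives you the uniqueness check and the index in one step, instead of relying on application code to avoid duplicate links.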

See this article in my blog for why you should always create a unique index on unique data:

  • Making an index UNIQUE

P.S. There are some very, very special cases where you don't need a primary key.

Mostly they include log tables which don't have any indexes for performance reasons.

What are the design criteria for primary keys?

The criteria for consideration of a primary key are:

  • Uniqueness
  • Irreducibility (no subset of the key uniquely identifies a row in the table)
  • Simplicity (so that relational representation & manipulation can be simpler)
  • Stability (should not be altered frequently)
  • Familiarity (meaningful to the user)

