What Are the Pros and Cons of Using Multi Column Primary Keys

What are the pros and cons of using multi column primary keys?

This really seems to be a question about surrogate keys, which are always either an auto-incrementing number or GUID and hence a single column, vs. natural keys, which often require multiple pieces of information in order to be truly unique. If you are able to have a natural key that is only one column, then the point is obviously moot anyway.

Some people will insist on only using one or the other. Spend sufficient time working with production databases and you'll learn that there is no context-independent best practice.

Some of these answers use SQL Server terminology but the concepts are generally applicable to all DBMS products:


Reasons to use single-column surrogate keys:

  • Clustered indexes. A clustered index always performs best when the database can merely append to it - otherwise, the DB has to do page splits. Note that this only applies if the key is sequential, i.e. either an auto-increment sequence or a sequential GUID. Arbitrary GUIDs will probably be much worse for performance.

  • Relationships. If your key is 3, 4, 5 columns long, including character types and other non-compact data, you end up wasting enormous amounts of space and subsequently reduce performance if you have to create foreign key relationships to this key in 20 other tables.

  • Uniqueness. Sometimes you don't have a true natural key. Maybe your table is some sort of log, and it's possible for you to get two of the same event at the same time. Or maybe your real key is something like a materialized path that can only be determined after the row is already inserted. Either way, you always want your clustered index and/or primary key to be unique, so if you have no other truly unique information, you have no choice but to employ a surrogate key.

  • Compatibility. Most people will never have to deal with this, but if the natural key contains something like a hierarchyid, it's possible that some systems can't even read it. In this case, again you must create a simple auto-generated surrogate key for use by these applications. Even if you don't have any "weird" data in the natural key, some DB libraries have a lot of trouble dealing with multi-column primary keys, although this problem is quickly going away.
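To make the "Relationships" point concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The shipment/parcel tables and their columns are hypothetical, invented only for illustration. It shows that a child table referencing a three-column natural key has to carry all three columns, while one referencing a surrogate key carries a single integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Natural-key design: the parent is identified by three columns,
-- so every child table must repeat all three (hypothetical schema).
CREATE TABLE shipment_nat (
    country_code   TEXT    NOT NULL,
    warehouse_code TEXT    NOT NULL,
    shipment_no    INTEGER NOT NULL,
    PRIMARY KEY (country_code, warehouse_code, shipment_no)
);
CREATE TABLE parcel_nat (
    parcel_no      INTEGER PRIMARY KEY,
    country_code   TEXT    NOT NULL,
    warehouse_code TEXT    NOT NULL,
    shipment_no    INTEGER NOT NULL,
    FOREIGN KEY (country_code, warehouse_code, shipment_no)
        REFERENCES shipment_nat (country_code, warehouse_code, shipment_no)
);

-- Surrogate-key design: the natural key is still enforced with UNIQUE,
-- but children reference one compact integer.
CREATE TABLE shipment_sur (
    shipment_id    INTEGER PRIMARY KEY,
    country_code   TEXT    NOT NULL,
    warehouse_code TEXT    NOT NULL,
    shipment_no    INTEGER NOT NULL,
    UNIQUE (country_code, warehouse_code, shipment_no)
);
CREATE TABLE parcel_sur (
    parcel_no   INTEGER PRIMARY KEY,
    shipment_id INTEGER NOT NULL REFERENCES shipment_sur (shipment_id)
);
""")

def fk_column_count(table):
    # PRAGMA foreign_key_list returns one row per column of each foreign key.
    return len(conn.execute(f"PRAGMA foreign_key_list({table})").fetchall())

nat_fk_cols = fk_column_count("parcel_nat")
sur_fk_cols = fk_column_count("parcel_sur")
print("FK columns with natural key:", nat_fk_cols)
print("FK columns with surrogate key:", sur_fk_cols)
```

Multiply that difference by 20 referencing tables and the space argument becomes clearer, although how much it matters depends on the column types and row counts involved.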

Reasons to use multi-column natural keys:

  • Storage. Many people who work with databases never work with large enough ones to have to care about this factor. But when a table has billions or trillions of rows, you are going to want to keep the absolute minimum amount of data in it that you possibly can, and a surrogate key adds a column (and usually an index) that the natural key makes unnecessary.

  • Replication. Yes, you can use a GUID, or a sequential GUID. But GUIDs have their own trade-offs, and if you can't or don't want to use a GUID for some reason, a multi-column natural key is a much better choice for replication scenarios because it is intrinsically globally unique - that is, you don't need a special algorithm to make it unique, it's unique by definition. This makes it very easy to reason about distributed architectures.

  • Insert/Update Performance. Surrogate keys aren't free. If you have a set of columns that are unique and frequently queried on, you will need a covering index on those columns in addition to the surrogate key. That index ends up being almost as large as the table, which wastes space and means a second index must be updated every time you make any modifications. If it is ever possible for you to have only one index (the clustered index) on a table, you should do it!


That's what comes to mind right off the bat. I'll update if I suddenly remember anything else.

Advantages and disadvantages of having composite primary key

There are lots of tables where you may want to have an identity column as a primary key. However, in the case of an M:M relationship table like the one you describe, best practice is NOT to use a new identity column for the primary key.

RThomas's link in his comment provides excellent reasons why the best practice is NOT to add an identity column.

The cons will outweigh the pros in pretty much every case, but since you asked for pros and cons I put a couple of unlikely pros in as well.

Cons

  • Adds complexity

  • Can lead to duplicate relationships unless you separately enforce uniqueness on the relationship (which a composite primary key would do by default).

  • Likely slower: db must maintain two indexes rather than one.

Pros

All the pros are pretty sketchy

  • If you had a situation where you needed to use the primary key of the relationship table as a join to a separate table (e.g. an audit table?) the join would likely be faster. (As noted though--adding and removing records will likely be slower. Further, if your relationship table is a relationship between tables that themselves use unique IDs, the speed increase from using one identity column in the join vs two will be minimal.)

  • The application, for simplicity, may assume that every table it works with has a unique ID as its primary key. (That's poor design in the app but you may not have control over it.) You could imagine a scenario where it is better to introduce some extra complexity in the DB than the extra complexity into such an app.
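The "duplicate relationships" con is easy to see in a small sketch. This uses Python's sqlite3 module and a hypothetical student/course junction table (none of these names come from the question). One version uses a composite primary key, the other an identity-style column with no extra unique constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Junction table keyed by the relationship itself.
CREATE TABLE student_course_pk (
    student_id INTEGER NOT NULL,
    course_id  INTEGER NOT NULL,
    PRIMARY KEY (student_id, course_id)
);
-- Junction table with an identity-style surrogate and nothing else.
CREATE TABLE student_course_id (
    id         INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL,
    course_id  INTEGER NOT NULL
);
""")

def insert_twice(table):
    """Insert the same relationship twice; return True if the duplicate is rejected."""
    sql = f"INSERT INTO {table} (student_id, course_id) VALUES (1, 10)"
    conn.execute(sql)
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

composite_rejects = insert_twice("student_course_pk")
surrogate_rejects = insert_twice("student_course_id")
print("composite PK rejects duplicate:", composite_rejects)
print("identity PK rejects duplicate:", surrogate_rejects)
```

To get the same protection in the identity version you would have to add a UNIQUE (student_id, course_id) constraint, which is the second index mentioned in the cons.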

Composite Primary Keys : Good or Bad?

There is no conclusion that composite primary keys are bad.

The best practice is to have some column or columns that uniquely identify a row. But in some tables a single column is not enough by itself to uniquely identify a row.

SQL (and the relational model) allows a composite primary key. It is a good practice in some cases. Or, another way of looking at it is that it's not a bad practice in all cases.

Some people have the opinion that every table should have an integer column that automatically generates unique values, and that should serve as the primary key. Some people also claim that this primary key column should always be called id. But those are conventions, not necessarily best practices. Conventions have some benefit, because they simplify certain decisions. But conventions are also restrictive.

You may have an order with multiple payments because some people purchase on layaway, or else they have multiple sources of payment (two credit cards, for instance), or two different people want to pay for a share of the order (I frequently go to a restaurant with a friend, and we each pay for our own meal, so the staff process half of the order on each of our credit cards).

I would design the system you describe as follows:

Products  : product_id (PK)

Orders    : order_id (PK)

LineItems : product_id (FK to Products)
            order_id (FK to Orders)
            (product_id, order_id) is the PK

Payments  : order_id (FK to Orders)
            payment_id (ordinal within each order_id)
            (order_id, payment_id) is the PK

This is also related to the concept of an identifying relationship. If it's definitional that a payment exists only because an order exists, then make the order part of the primary key.

Note the LineItems table also lacks its own auto-increment, single-column primary key. A many-to-many table is a classic example of a good use of a composite primary key.
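As a rough sketch of that design, here it is in SQLite via Python's sqlite3 module. The amount column and the add_payment helper are my own assumptions for illustration, and computing the next ordinal with MAX + 1 is only safe here because there is a single connection. A real system would need locking or a retry on conflict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Products (product_id INTEGER PRIMARY KEY);
CREATE TABLE Orders   (order_id   INTEGER PRIMARY KEY);
CREATE TABLE LineItems (
    product_id INTEGER NOT NULL REFERENCES Products (product_id),
    order_id   INTEGER NOT NULL REFERENCES Orders (order_id),
    PRIMARY KEY (product_id, order_id)
);
CREATE TABLE Payments (
    order_id   INTEGER NOT NULL REFERENCES Orders (order_id),
    payment_id INTEGER NOT NULL,   -- ordinal within the order
    amount     NUMERIC NOT NULL,   -- hypothetical column, not in the design above
    PRIMARY KEY (order_id, payment_id)
);
INSERT INTO Orders (order_id) VALUES (1), (2);
""")

def add_payment(order_id, amount):
    """Assign the next ordinal for this order and insert the payment."""
    (next_id,) = conn.execute(
        "SELECT COALESCE(MAX(payment_id), 0) + 1 FROM Payments WHERE order_id = ?",
        (order_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO Payments (order_id, payment_id, amount) VALUES (?, ?, ?)",
        (order_id, next_id, amount),
    )
    return next_id

first = add_payment(1, 20.0)   # first payment on order 1
second = add_payment(1, 22.5)  # second payment on order 1
other = add_payment(2, 9.0)    # numbering restarts for order 2
print(first, second, other)
```

The payment ordinals restart for each order, which is exactly what makes (order_id, payment_id) the natural identifier of a payment.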

MySQL - Should I use multi-column primary keys on every child table?

This data is normalized

TABLE { FIELDS }
-----------------------------------------------------------------------
building { id, data }
floor { id, building_id, data }
room { id, floor_id, data }
bed { id, room_id, data }

This table is not (bad idea)

TABLE { FIELDS }
-----------------------------------------------------------------------
building { id, data }
floor { id, building_id, data }
room { id, building_id, floor_id, data }
bed { id, building_id, floor_id, room_id, data }

  1. In the first (good) design, Model A, you do not have unneeded duplicated data.
  2. Inserts in Model A will be much faster.
  3. Model A's tables will fit more easily in memory, speeding up your queries.
  4. InnoDB is optimized with Model A in mind, not with Model B.
  5. The latter (bad) design, Model B, has duplicated data; if that gets out of sync, you will have a mess. Model A is much harder to get out of sync, because the data is only listed once.
  6. If I want to combine data from the building, floor, room and bed, I will need to join all four tables in Model A as well as in Model B, so how are you saving time here?
  7. InnoDB keeps secondary indexes in structures separate from the table rows; if you select only indexed columns, the rows themselves will never be accessed. So why are you duplicating the indexes? MySQL will never need to read the main table anyway.
  8. InnoDB stores the PK in each and every secondary index; with a composite and thus long PK, you are slowing down every select that uses an index and ballooning the file size, for no gain whatsoever.
  9. Do you have a serious speed problem? If not, why are you denormalizing your tables?
  10. Don't even think about using MyISAM, which suffers less from these issues: it is not optimized for multi-join databases, does not support referential integrity or transactions, and is a poor match for this workload.
  11. When using a composite key you can only ever use a leftmost prefix of the key, i.e. you cannot use floor_id in table bed other than by using id+building_id+floor_id. This means that you may have to use much more key space than needed in Model A. Either that or you need to add an extra index (which will drag around a full copy of the PK).

In short
I see absolutely zero benefit and a whole lot of drawbacks in Model B, never use it!
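Point 11 (a multi-column index is only usable through its leading columns) can be checked with a query planner. The sketch below uses SQLite through Python's sqlite3 module as a stand-in, not MySQL/InnoDB, and a secondary index named bed_location that I made up for the demonstration. The exact plan text differs between engines and versions, so the code only checks whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bed (
    id          INTEGER PRIMARY KEY,
    building_id INTEGER NOT NULL,
    floor_id    INTEGER NOT NULL,
    room_id     INTEGER NOT NULL,
    data        TEXT
);
-- Hypothetical multi-column index; floor_id is its second column.
CREATE INDEX bed_location ON bed (building_id, floor_id, room_id);
""")

def plan(sql):
    """Return the human-readable plan lines SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Filters on the leading columns: the index can be used.
prefix_plan = plan("SELECT * FROM bed WHERE building_id = 1 AND floor_id = 2")

# Filters on floor_id alone: it is not a leading column, so the index is skipped.
middle_plan = plan("SELECT * FROM bed WHERE floor_id = 2")

print("prefix:", prefix_plan)
print("middle:", middle_plan)
```

MySQL's EXPLAIN reports the same kind of information in a different format. The general rule about leading columns holds there too, but check the plan on your own server before relying on it.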

Are Multi-column Primary Keys in MySQL an optimisation problem?

I wouldn't think that there would be any performance problems with multi-column primary keys. It's more or less equivalent to having multiple indexes (you will spend a little bit more time computing index values when doing inserts).

Sometimes the data model makes more sense with multi-column keys. I'd worry about being straightforward first and worry about performance second. You can always add more indexes, improve your queries, or twiddle server settings.

I think the most I've encountered was a 4-column primary key. Makes me cringe a little bit, but it worked¹.


[1] "worked" is defined to mean "the application performed to specification", and is not meant to imply that actual tasks were accomplished using said application. :)

What are the pros and cons for choosing a character varying data type for primary key in SQL?

The advantage of choosing a character datatype as a primary key field is that you may choose what data it can show. As an example, you could have the email address as the key field for a users table. This eliminates the need for an additional column. Another advantage is if you have a common data table that holds indexes of multiple other tables (think a NOTES table with an external reference to FINANCE, CONTACT, and ADMIN tables), you can easily know what table this came from (e.g. your FINANCE table has an index of F00001, CONTACT table has an index of C00001, etc). I'm afraid the disadvantages are going to be greater in this reply as I'm against such an approach.

The disadvantages are as follows:

  1. The serial datatype exists for exactly this reason in PostgreSQL
  2. Numeric indexes will be entered in order and minimal reindexing will need to be done (i.e. if you have a table with keys Apple, Carrot and want to insert Banana, the table will have to move around the indexes so that Banana is inserted in the middle. You will rarely insert data in the middle of an index if the index is numeric).
  3. Numeric indexes unlinked from data are not going to change.
  4. Numeric indexes are shorter and their length can be fixed (4 bytes vs whatever you pick as your varchar length).

In your case you can still put a foreign key on a numeric index, so I'm not sure why you would want to force it to be a varchar type. Searching and filtering on a numeric field is theoretically faster than on a text field, as text comparison has to work through variable-length, collation-aware values while an integer comparison is a single fixed-width operation. Generally speaking, you would have a numeric primary key that is non-clustered, and then create a clustered key on your data column that you are going to filter a lot.

Those are general standards when writing SQL, but when it comes to benchmarking, you will find that varchar columns are only a little slower on joining and filtering than integer columns. As long as your primary keys are not changing EVER then you're alright.
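The "not changing EVER" caveat is the main practical argument for a numeric key that is unlinked from the data. Here is a minimal sketch, written with Python's sqlite3 module rather than PostgreSQL, using hypothetical users and notes tables. The email stays unique through a constraint, but child rows point at the integer, so changing the email touches exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,      -- numeric key, unlinked from the data
    email   TEXT NOT NULL UNIQUE      -- natural candidate key, still enforced
);
CREATE TABLE notes (
    note_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users (user_id),
    body    TEXT NOT NULL
);
INSERT INTO users (user_id, email) VALUES (1, 'ann@example.com');
INSERT INTO notes (user_id, body) VALUES (1, 'first note'), (1, 'second note');
""")

# The user changes their email address; no row in notes has to be rewritten.
conn.execute("UPDATE users SET email = 'ann@example.org' WHERE user_id = 1")

rows = conn.execute("""
    SELECT u.email, COUNT(*)
    FROM users u
    JOIN notes n ON n.user_id = u.user_id
    GROUP BY u.email
""").fetchall()
print(rows)
```

Had email been the primary key, the same change would need to cascade into every table that references it.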

Why Composite Primary key when I can use Single Primary key with Unique constraints on composite columns?

You do not need a primary key to enforce uniqueness. You can use a unique constraint or index instead.

I am not a fan of composite primary keys. Here are some reasons:

  • All foreign key references have to include all the keys in the correct order and matching types. This makes it slightly more cumbersome to define those tables.
  • Because the composite keys are included in all referencing tables, those tables are often larger, which results in worse performance.
  • If you decide that you want to change the type of one of the component keys -- say the length of a string or an int to a numeric -- you have to modify lots and lots of tables.
  • When joining tables, you have to include all the keys. If you miss one . . . well, the code is syntactically correct but the results are wrong.

There are occasions where composite keys are acceptable, such as tables that have no foreign key references. Even in those cases, I use synthetic keys, but I totally understand the other perspective.
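The last bullet, about forgetting one of the key columns in a join, is the one that tends to bite silently. A small sketch of it using Python's sqlite3 module follows; the Payments and Refunds tables are hypothetical and chosen only to show the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Payments (
    order_id   INTEGER NOT NULL,
    payment_id INTEGER NOT NULL,
    PRIMARY KEY (order_id, payment_id)
);
CREATE TABLE Refunds (
    refund_id  INTEGER PRIMARY KEY,
    order_id   INTEGER NOT NULL,
    payment_id INTEGER NOT NULL,
    FOREIGN KEY (order_id, payment_id)
        REFERENCES Payments (order_id, payment_id)
);
INSERT INTO Payments (order_id, payment_id) VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO Refunds (order_id, payment_id) VALUES (1, 2);
""")

# Correct join: every column of the composite key is matched.
full_join_rows = conn.execute("""
    SELECT COUNT(*)
    FROM Refunds r
    JOIN Payments p
      ON p.order_id = r.order_id AND p.payment_id = r.payment_id
""").fetchone()[0]

# Buggy join: payment_id was forgotten. Still valid SQL, but the single
# refund now matches every payment on the same order.
partial_join_rows = conn.execute("""
    SELECT COUNT(*)
    FROM Refunds r
    JOIN Payments p
      ON p.order_id = r.order_id
""").fetchone()[0]

print(full_join_rows, partial_join_rows)
```

Nothing in the database complains about the second query, which is why this kind of mistake usually shows up as inflated totals in a report rather than as an error.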


