What Are the Down Sides of Using a Composite/Compound Primary Key

Advantages and disadvantages of having composite primary key

There are lots of tables where you may want to have an identity column as a primary key. However, in the case of a M:M relationship table you describe, best practice is NOT to use a new identity column for the primary key.

RThomas's link in his comment provides the excellent reasons why the best practice is to NOT add an identity column. Here's that link.

The cons will outweigh the pros in pretty much every case, but since you asked for pros and cons I put a couple of unlikely pros in as well.

Cons

  • Adds complexity

  • Can lead to duplicate relationships unless you enforce uniqueness on the relationship (which a primary key would do by default).

  • Likely slower: db must maintain two indexes rather than one.

Pros

All the pros are pretty sketchy

  • If you had a situation where you needed to use the primary key of the relationship table as a join to a separate table (e.g. an audit table?) the join would likely be faster. (As noted though--adding and removing records will likely be slower. Further, if your relationship table is a relationship between tables that themselves use unique IDs, the speed increase from using one identity column in the join vs two will be minimal.)

  • The application, for simplicity, may assume that every table it works with has a unique ID as its primary key. (That's poor design in the app but you may not have control over it.) You could imagine a scenario where it is better to introduce some extra complexity in the DB than the extra complexity into such an app.

Composite Primary Keys : Good or Bad?

There is no conclusion that composite primary keys are bad.

The best practice is to have some column or columns that uniquely identify a row. But in some tables a single column is not enough by itself to uniquely identify a row.

SQL (and the relational model) allows a composite primary key. It is a good practice is some cases. Or, another way of looking at it is that it's not a bad practice in all cases.

Some people have the opinion that every table should have an integer column that automatically generates unique values, and that should serve as the primary key. Some people also claim that this primary key column should always be called id. But those are conventions, not necessarily best practices. Conventions have some benefit, because it simplifies certain decisions. But conventions are also restrictive.

You may have an order with multiple payments because some people purchase on layaway, or else they have multiple sources of payment (two credit cards, for instance), or two different people want to pay for a share of the order (I frequently go to a restaurant with a friend, and we each pay for our own meal, so the staff process half of the order on each of our credit cards).

I would design the system you describe as follows:

Products  : product_id (PK)

Orders : order_id (PK)

LineItems : product_id is (FK) to Products
order_id is (FK) to Orders
(product_id, order_id) is (PK)

Payments : order_id (FK)
payment_id - ordinal for each order_id
(order_id, payment_id) is (PK)

This is also related to the concept of identifying relationship. If it's definitional that a payment exists only because an order exist, then make the order part of the primary key.

Note the LineItems table also lacks its own auto-increment, single-column primary key. A many-to-many table is a classic example of a good use of a composite primary key.

What are the pros and cons of using multi column primary keys?

This really seems to be a question about surrogate keys, which are always either an auto-incrementing number or GUID and hence a single column, vs. natural keys, which often require multiple pieces of information in order to be truly unique. If you are able to have a natural key that is only one column, then the point is obviously moot anyway.

Some people will insist on only using one or the other. Spend sufficient time working with production databases and you'll learn that there is no context-independent best practice.

Some of these answers use SQL Server terminology but the concepts are generally applicable to all DBMS products:


Reasons to use single-column surrogate keys:

  • Clustered indexes. A clustered index always performs best when the database can merely append to it - otherwise, the DB has to do page splits. Note that this only applies if the key is sequential, i.e. either an auto-increment sequence or a sequential GUID. Arbitrary GUIDs will probably be much worse for performance.

  • Relationships. If your key is 3, 4, 5 columns long, including character types and other non-compact data, you end up wasting enormous amounts of space and subsequently reduce performance if you have to create foreign key relationships to this key in 20 other tables.

  • Uniqueness. Sometimes you don't have a true natural key. Maybe your table is some sort of log, and it's possible for you to get two of the same event at the same time. Or maybe your real key is something like a materialized path that can only be determined after the row is already inserted. Either way, you always want your clustered index and/or primary key to be unique, so if you have no other truly unique information, you have no choice but to employ a surrogate key.

  • Compatibility. Most people will never have to deal with this, but if the natural key contains something like a hierarchyid, it's possible that some systems can't even read it. In this case, again you must create a simple auto-generated surrogate key for use by these applications. Even if you don't have any "weird" data in the natural key, some DB libraries have a lot of trouble dealing with multi-column primary keys, although this problem is quickly going away.

Reasons to use multi-column natural keys

  • Storage. Many people who work with databases never work with large enough ones to have to care about this factor. But when a table has billions or trillions of rows, you are going to want to keep the absolute minimum amount of data in this table that you possibly can.

  • Replication. Yes, you can use a GUID, or a sequential GUID. But GUIDs have their own trade-offs, and if you can't or don't want to use a GUID for some reason, a multi-column natural key is a much better choice for replication scenarios because it is intrinsically globally unique - that is, you don't need a special algorithm to make it unique, it's unique by definition. This makes it very easy to reason about distributed architectures.

  • Insert/Update Performance. Surrogate keys aren't free. If you have a set of columns that are unique and frequently queried on, and you therefore need to create a covering index on these columns; the index ends up being almost as large as the table, which wastes space and requires that a second index be updated every time you make any modifications. If it is ever possible for you to have only one index (the clustered index) on a table, you should do it!


That's what comes to mind right off the bat. I'll update if I suddenly remember anything else.

Question about composite primary keys

IF your Ticket table has a primary key on (TicketID, TenantID), then any table referencing the Ticket table would also have to reference both columns, e.g.

TicketItem(TicketID,TenantID) ==> Ticket(TicketID,TenantID)

You cannot have a reference to just parts of a (compound) primary key, e.g. you cannot have TicketID in TicketItem reference the Ticket table - you need both parts of a compound primary key in every single foreign key referencing it (one of the major drawbacks of compound indices, in my opinion - it makes joins cumbersome)

Composite primary keys versus unique object ID field

Most of the commonly used engines (MS SQL Server, Oracle, DB2, MySQL, etc.) would not experience noticeable issues using a surrogate key system. Some may even experience a performance boost from the use of a surrogate, but performance issues are highly platform-specific.

In general terms, the natural key (and by extension, composite key) verses surrogate key debate has a long history with no likely “right answer” in sight.

The arguments for natural keys (singular or composite) usually include some the following:

1) They are already available in the data model. Most entities being modeled already include one or more attributes or combinations of attributes that meet the needs of a key for the purposes of creating relations. Adding an additional attribute to each table incorporates an unnecessary redundancy.

2) They eliminate the need for certain joins. For example, if you have customers with customer codes, and invoices with invoice numbers (both of which are "natural" keys), and you want to retrieve all the invoice numbers for a specific customer code, you can simply use "SELECT InvoiceNumber FROM Invoice WHERE CustomerCode = 'XYZ123'". In the classic surrogate key approach, the SQL would look something like this: "SELECT Invoice.InvoiceNumber FROM Invoice INNER JOIN Customer ON Invoice.CustomerID = Customer.CustomerID WHERE Customer.CustomerCode = 'XYZ123'".

3) They contribute to a more universally-applicable approach to data modeling. With natural keys, the same design can be used largely unchanged between different SQL engines. Many surrogate key approaches use specific SQL engine techniques for key generation, thus requiring more specialization of the data model to implement on different platforms.

Arguments for surrogate keys tend to revolve around issues that are SQL engine specific:

1) They enable easier changes to attributes when business requirements/rules change. This is because they allow the data attributes to be isolated to a single table. This is primarily an issue for SQL engines that do not efficiently implement standard SQL constructs such as DOMAINs. When an attribute is defined by a DOMAIN statement, changes to the attribute can be performed schema-wide using an ALTER DOMAIN statement. Different SQL engines have different performance characteristics for altering a domain, and some SQL engines do not implement DOMAINS at all, so data modelers compensate for these situations by adding surrogate keys to improve the ability to make changes to attributes.

2) They enable easier implementations of concurrency than natural keys. In the natural key case, if two users are concurrently working with the same information set, such as a customer row, and one of the users modifies the natural key value, then an update by the second user will fail because the customer code they are updating no longer exists in the database. In the surrogate key case, the update will process successfully because immutable ID values are used to identify the rows in the database, not mutable customer codes. However, it is not always desirable to allow the second update – if the customer code changed it is possible that the second user should not be allowed to proceed with their change because the actual “identity” of the row has changed – the second user may be updating the wrong row. Neither surrogate keys nor natural keys, by themselves, address this issue. Comprehensive concurrency solutions have to be addressed outside of the implementation of the key.

3) They perform better than natural keys. Performance is most directly affected by the SQL engine. The same database schema implemented on the same hardware using different SQL engines will often have dramatically different performance characteristics, due to the SQL engines data storage and retrieval mechanisms. Some SQL engines closely approximate flat-file systems, where data is actually stored redundantly when the same attribute, such as a Customer Code, appears in multiple places in the database schema. This redundant storage by the SQL engine can cause performance issues when changes need to be made to the data or schema. Other SQL engines provide a better separation between the data model and the storage/retrieval system, allowing for quicker changes of data and schema.

4) Surrogate keys function better with certain data access libraries and GUI frameworks. Due to the homogeneous nature of most surrogate key designs (example: all relational keys are integers), data access libraries, ORMs, and GUI frameworks can work with the information without needing special knowledge of the data. Natural keys, due to their heterogeneous nature (different data types, size etc.), do not work as well with automated or semi-automated toolkits and libraries. For specialized scenarios, such as embedded SQL databases, designing the database with a specific toolkit in mind may be acceptable. In other scenarios, databases are enterprise information resources, accessed concurrently by multiple platforms, applications, report systems, and devices, and therefore do not function as well when designed with a focus on any particular library or framework. In addition, databases designed to work with specific toolkits become a liability when the next great toolkit is introduced.

I tend to fall on the side of natural keys (obviously), but I am not fanatical about it. Due to the environment I work in, where any given database I help design may be used by a variety of applications, I use natural keys for the majority of the data modeling, and rarely introduce surrogates. However, I don’t go out of my way to try to re-implement existing databases that use surrogates. Surrogate-key systems work just fine – no need to change something that is already functioning well.

There are some excellent resources discussing the merits of each approach:

http://www.google.com/search?q=natural+key+surrogate+key

http://www.agiledata.org/essays/keys.html

http://www.informationweek.com/news/software/bi/201806814

Why are composite primary keys still around?

Personally I prefer the use of surrogate keys. However, in joining tables that consist only of the ids from two other tables (to create a many-to-many relationships) composite keys are the way to go and thus taking them out would make things more difficult.

There is a school of thought that surrogate keys are always bad and that if you don't have uniqueness to record through the use of natural keys you have a bad design. I strongly disagree with this (if you aren't storing SSN or some other unique value I defy you to come up with a natural key for a person table for instance.) But many people feel that it is necessary for proper normalization.

Sometimes having a composite key reduces the need to join to another table. Sometimes it doesn't. So there are times when a composite key can boost performance as well as times when it can harm performance. If the key is relatively stable, you may be fine with faster performance on select queries. However, if it is something that is subject to change like a company name, you could be in a world of hurt when company A changes it's name and you have to update a million associated records.

There is no one size fits all in database design. There are time when composite keys are helpful and times when they are horrible. There are times when surrogate keys are helpful and times when they are not.



Related Topics



Leave a reply



Submit