Composite Primary Keys:Good or Bad

Composite Primary Keys : Good or Bad?

There is no conclusion that composite primary keys are bad.

The best practice is to have some column or columns that uniquely identify a row. But in some tables a single column is not enough by itself to uniquely identify a row.

SQL (and the relational model) allows a composite primary key. It is a good practice is some cases. Or, another way of looking at it is that it's not a bad practice in all cases.

Some people have the opinion that every table should have an integer column that automatically generates unique values, and that should serve as the primary key. Some people also claim that this primary key column should always be called id. But those are conventions, not necessarily best practices. Conventions have some benefit, because it simplifies certain decisions. But conventions are also restrictive.

You may have an order with multiple payments because some people purchase on layaway, or else they have multiple sources of payment (two credit cards, for instance), or two different people want to pay for a share of the order (I frequently go to a restaurant with a friend, and we each pay for our own meal, so the staff process half of the order on each of our credit cards).

I would design the system you describe as follows:

Products  : product_id (PK)

Orders : order_id (PK)

LineItems : product_id is (FK) to Products
order_id is (FK) to Orders
(product_id, order_id) is (PK)

Payments : order_id (FK)
payment_id - ordinal for each order_id
(order_id, payment_id) is (PK)

This is also related to the concept of identifying relationship. If it's definitional that a payment exists only because an order exist, then make the order part of the primary key.

Note the LineItems table also lacks its own auto-increment, single-column primary key. A many-to-many table is a classic example of a good use of a composite primary key.

Advantages and disadvantages of having composite primary key

There are lots of tables where you may want to have an identity column as a primary key. However, in the case of a M:M relationship table you describe, best practice is NOT to use a new identity column for the primary key.

RThomas's link in his comment provides the excellent reasons why the best practice is to NOT add an identity column. Here's that link.

The cons will outweigh the pros in pretty much every case, but since you asked for pros and cons I put a couple of unlikely pros in as well.

Cons

  • Adds complexity

  • Can lead to duplicate relationships unless you enforce uniqueness on the relationship (which a primary key would do by default).

  • Likely slower: db must maintain two indexes rather than one.

Pros

All the pros are pretty sketchy

  • If you had a situation where you needed to use the primary key of the relationship table as a join to a separate table (e.g. an audit table?) the join would likely be faster. (As noted though--adding and removing records will likely be slower. Further, if your relationship table is a relationship between tables that themselves use unique IDs, the speed increase from using one identity column in the join vs two will be minimal.)

  • The application, for simplicity, may assume that every table it works with has a unique ID as its primary key. (That's poor design in the app but you may not have control over it.) You could imagine a scenario where it is better to introduce some extra complexity in the DB than the extra complexity into such an app.

Should I use composite primary keys or not?

I think there's no problem using a composite key.

To me the database it's a component on its own, that should be treated the same way we treat code : for instance we want clean code, that communicates clearly its intent, that does one thing and does it well, that doesn't add any uneeded level of complexity, etc.

Same thing with the db, if the PK is composite, this is the reality, so the model should be kept clean and clear. A composite PK it's clearer than the mix auto-increment + constraint. When you see an ID column that does nothing you need to ask what's the real PK, are there any other hidden things that you should be aware of, etc. A clear PK doesn't leave any doubts.

The db is the base of your app, to me we need the most solid base that we can have. On this base we'll build the app ( web or not ). So I can't see why we should bend the db model to conform to some specific in one development tool/framework/language. The data is directing the application, not the other way around. What if the ORM changes in the future and becomes obsolete and a better solution appears that imposes another model ? We can't play with the db model to fit this or that framework, the model should stay the same, it should not depend on what tool we're using to access the data ...

If the db model change in the future, it should change because functionality changed. If we would know today how this functionality will change, we'll be modeling this already. ANd any future change will be dealt with when the time comes, we can't predict for instance the impact on existing data, so one extra column doesn't guarantee that it will withold any future change ...

We should design for today's functionality, and keep the db model the simplest possible, this way it will be easy to change/evolve in the future.

Why are composite primary keys still around?

Personally I prefer the use of surrogate keys. However, in joining tables that consist only of the ids from two other tables (to create a many-to-many relationships) composite keys are the way to go and thus taking them out would make things more difficult.

There is a school of thought that surrogate keys are always bad and that if you don't have uniqueness to record through the use of natural keys you have a bad design. I strongly disagree with this (if you aren't storing SSN or some other unique value I defy you to come up with a natural key for a person table for instance.) But many people feel that it is necessary for proper normalization.

Sometimes having a composite key reduces the need to join to another table. Sometimes it doesn't. So there are times when a composite key can boost performance as well as times when it can harm performance. If the key is relatively stable, you may be fine with faster performance on select queries. However, if it is something that is subject to change like a company name, you could be in a world of hurt when company A changes it's name and you have to update a million associated records.

There is no one size fits all in database design. There are time when composite keys are helpful and times when they are horrible. There are times when surrogate keys are helpful and times when they are not.

Project rework : Should I use primary composite keys and json for translations with Laravel?

You're raising a lot of good questions that needs to be answered as well. We do not have any context of your "issues" exactly precise but you gave us quite a bit of information that can be useful to understand what you're asking for.

So here are my two cents :

Problem 1

You're pointing both of the composite primary keys and composite unique keys SQL concepts which are great to handle combined keys to refer to related to a Model of your app. As firstly pointed 4 years ago (that much!) by this topic, there's indeed no way today yet to get this natively working from Eloquent. But there are workaround to make this working great.

I'd prefer to use the composite primary keys to ensure that the base is really solid and avoid making mistakes while pushing rows in the database. I'd say that it's up to you accordingly to some kind of tolerance score towards potential mistakes or gap you may have forgotten while developing your app.

I don't really like the idea n°1 because in my opinion there is no wonderful and pragmatic reason to have an auto-incremented key here. I'm always keeping the "KISS" principle in mind ("Keep it stupid simple"). SQL is offering a way to combine keys natively (and it's working very well in general purposes), "why would I go with one more key if it doesn't fit with a real useful feature of my app ?"

Laravel has grown a lot but is constantly updated and improved each month. I like this framework a lot but it's always missing valuable things and that's why open-source is so much important. It offers some way to create Traits, Interface, Helpers either to retrieve, update, and find rows by a couple of primary keys as you may have known, so don't spend too much time ; just be sure that it fit to your needs and you can manipulate your model easily through it. Create your own HasCompositePrimaryKey trait on which you can define a new primary key property : protected $primaryKey = array('key1', 'key2'); (please check this answer).

Problem 2

Indeed I also think that json is the best field type to handle multilang string values. The latest version of MySQL (8.0) has made a lot of improvements towards searching content in JSON objects that you may want to take a look at. I'd store too all the strings within the same row for multiple languages as you did for your comments table. This will allow you to easily retrieve, display and manipulate content easily according to your user or visitor's language choice. You may store the 2-letters iso lang code into the user session cookie and use it in your Blade template, where it'd be much simpler to use : $comment[$user_lang_key] where the $user_lang_key could be retrieved from some kind of Session::get('settings_lang_key').

I don't really know what is the purpose of displaying a different comment to the "all other languages" which are not available. I'd prefer to use an existing lang as the fallback for my app, like the english one which is a lot used and displayed for a lot of default values. You'd have to simply fallback the session lang key to en to make this working everywhere. Putting some kind of "alias" like the '*' can make this quite a bit confusing and will probably need you to make some pre-verification to use it properly. If you want to give the choice to your user to display the comments in lang he want to as you said in comments, I'd prefer to save this settings value somewhere in your database and use it to point to the selected language.

Also, in case your needing it (because the user of your app could setup some behaviors of your app), here's how I would handle a multi-lingual setting he could create in my e-commerce app :

{
"fr": {
"title": "Choix d'un T-Shirt",
"fields": [
{
"label": "Taille de T-Shirt",
"type": "single-choice",
"choices": [
{ "name": "S", "price": 10 },
{ "name": "M", "price": 15 }
]
},
{
"label": "Couleur",
[...]
}
]
},
"en": {
"title": "T-Shirt Selection",
"fields": [
{
"label": "T-Shirt Size",
"type": "single-choice",
"choices": [
{ "name": "S", "price": 10 },
{ "name": "M", "price": 15 }
]
},
{
"label": "Color",
[...]
}
]
}
}

Finally, everything is up to you but I wanted to share my knowledge and how I'm currently doing by my side without any pain to handle multilingual features. I hope this will give you some ideas which fit to your needs in the best way you imagine it.

Is it a bad idea to have a composite primary key that contains a foreign key?

The biggest problem is if the FK field could also be null in your data because it could cause uniqueness issues. However, given your field structure, I would find it unlikely that you would allow nulls in that field in any event.

Composite Primary Keys and auto increment? What is a good practices?

You definitely want to use autoincrementing id values as primary keys. There happen to be many reasons for this. Here are some.

  1. Avoiding race conditions (accidental id duplication) requires great care if you generate them yourself. Spend that mental energy -- development, QA, operations -- on making your SaaS excellent instead of reinventing the flat tire on primary keys.
  2. You can still put an index on (client_id, id) even if it isn't the PK.
  3. Your JOIN operations will be easier to write, test, and maintain.
  4. This query pattern is great for getting the latest row for each client from a table. It performs very well. It's harder to do this kind of thing if you generate your own pks.

        SELECT t.*
    FROM table t
    JOIN (SELECT MAX(id) id
    FROM table
    GROUP BY client_id
    ) m ON t.id = m.id

What are the pros and cons of using multi column primary keys?

This really seems to be a question about surrogate keys, which are always either an auto-incrementing number or GUID and hence a single column, vs. natural keys, which often require multiple pieces of information in order to be truly unique. If you are able to have a natural key that is only one column, then the point is obviously moot anyway.

Some people will insist on only using one or the other. Spend sufficient time working with production databases and you'll learn that there is no context-independent best practice.

Some of these answers use SQL Server terminology but the concepts are generally applicable to all DBMS products:


Reasons to use single-column surrogate keys:

  • Clustered indexes. A clustered index always performs best when the database can merely append to it - otherwise, the DB has to do page splits. Note that this only applies if the key is sequential, i.e. either an auto-increment sequence or a sequential GUID. Arbitrary GUIDs will probably be much worse for performance.

  • Relationships. If your key is 3, 4, 5 columns long, including character types and other non-compact data, you end up wasting enormous amounts of space and subsequently reduce performance if you have to create foreign key relationships to this key in 20 other tables.

  • Uniqueness. Sometimes you don't have a true natural key. Maybe your table is some sort of log, and it's possible for you to get two of the same event at the same time. Or maybe your real key is something like a materialized path that can only be determined after the row is already inserted. Either way, you always want your clustered index and/or primary key to be unique, so if you have no other truly unique information, you have no choice but to employ a surrogate key.

  • Compatibility. Most people will never have to deal with this, but if the natural key contains something like a hierarchyid, it's possible that some systems can't even read it. In this case, again you must create a simple auto-generated surrogate key for use by these applications. Even if you don't have any "weird" data in the natural key, some DB libraries have a lot of trouble dealing with multi-column primary keys, although this problem is quickly going away.

Reasons to use multi-column natural keys

  • Storage. Many people who work with databases never work with large enough ones to have to care about this factor. But when a table has billions or trillions of rows, you are going to want to keep the absolute minimum amount of data in this table that you possibly can.

  • Replication. Yes, you can use a GUID, or a sequential GUID. But GUIDs have their own trade-offs, and if you can't or don't want to use a GUID for some reason, a multi-column natural key is a much better choice for replication scenarios because it is intrinsically globally unique - that is, you don't need a special algorithm to make it unique, it's unique by definition. This makes it very easy to reason about distributed architectures.

  • Insert/Update Performance. Surrogate keys aren't free. If you have a set of columns that are unique and frequently queried on, and you therefore need to create a covering index on these columns; the index ends up being almost as large as the table, which wastes space and requires that a second index be updated every time you make any modifications. If it is ever possible for you to have only one index (the clustered index) on a table, you should do it!


That's what comes to mind right off the bat. I'll update if I suddenly remember anything else.



Related Topics



Leave a reply



Submit