Composite VS Surrogate Keys for Referential Integrity in 6Nf

Composite vs Surrogate keys for Referential Integrity in 6NF

Here's some trade-offs:

Single Surrogate (artificially created):

  • All child tables foreign keys only need a single column to reference the primary key.

  • Very easy to update the natural keys in table without needing to update every child table with foreign keys

  • Smaller primary/foreign key indexes (ie. not a wide) This can make the database run faster, for example when a record is deleted in a parent table, the child tables need to be searched to make sure this will not create orphans. Narrow indexes are faster to scan (just sightly).

  • you will have more indexes because you most likely will also want to index whatever natural keys exists in the data.

Natural composite keyed tables:

  • fewer indexes in the database

  • less columns in the database

  • easier/faster to insert a ton of records as you will not need to grab the sequence generator

  • updating one of the keys in the compound requires that every child table also be updated.

Then there is another category: artificial composite primary keys

I've only found one instance where this makes sense. When you need to tag every record in every table for row level security.

For example, suppose you had an database which stored data for 50,000 clients and each client was not supposed to see other client's data--very common in web application development.

If each record was tagged with a client_id field, you are creating a row level security environment. Most databases have the tools to enforce row level security when setup correctly.

First thing to do is setup primary and foreign keys. Normally a table with have an id field as the primary key. By adding client_id the key is now composite key. And it is necessary to carry client_id to all child table.

The composite key is based on 2 surrogate keys and is a bulletproof way to ensure data integrity among clients and within the database a whole.

After this you would create views (or if using Oracle EE setup Virtual Private Database) and other various structures to allow the database to enforce row level security (which is a topic all it own).

Granted that this data structure is no longer normalized to the nth degree. The client_id field in each pk/fk denormalizes an otherwise normal model. The benefit of the model is the ease of enforcing row level security at the database level (which is what databases should do). Every select, insert, update, delete is restricted to whatever client_id your session is currently set. The database has session awareness.

Summary

Surrogate keys are always the safe bet. They require a little more work to setup and require more storage.

The biggest benefit in my opinion is:

  • Being able to update the PK in one table and all other child tables are instantaneously changed without ever being touched.

  • When data gets messed up--and it will at some point due to a programming mistake, surrogate keys make the clean up much much easier and in some cases only possible to do because there are surrogate keys.

  • Query performance is improved as the db is able to search attributes to locate the s.key and then join all child table by a single numeric key.

Natural Keys especially composite NKeys make writing code a pain. When you need to join 4 tables the "where clause" will be much longer (and easier to mess up) than when single SKeys were used.

Surrogate keys are the "safe" route. Natural keys are beneficial in a few places, I'd say around 1% of the tables in a db.

natural key vs surrogate key an innodb foreign key

Opinions don't matter when you can measure.

I implemented this on PostgreSQL using both natural keys and surrogates. I used 300,000 total products, 180 ingredients, and populated two "product ingredient" tables with 3 to 17 ingredients per product, for 100,000 randomly selected products (1053462 rows).

Selecting all the ingredients for a single product using natural keys returned in 0.067 ms. Using surrogates, 0.199ms.

Returning all the non-id columns for a single product using natural keys returned in 0.145 ms. Using surrogates, 0.222 ms

So natural keys were about 2 to 3 times faster on this data set.

Natural keys don't require any joins to return this data. Surrogate keys require two joins.

The actual performance difference depends on the width of your tables, number of rows, page size, and length of names, and things like that. There will be a point where surrogate keys start outperforming natural keys, but few people try to measure that.

When I was designing the database for my employer's operational database, I built a testbed with tables designed around natural keys and with tables designed around id numbers. Both those schemas have more than 13 million rows of computer-generated sample data. In a few cases, queries on the id number schema outperformed the natural key schema by 50%. (So a complex query that took 20 seconds with id numbers took 30 seconds with natural keys.) But 80% of the test queries had faster SELECT performance against the natural key schema. And sometimes it was staggeringly faster--a difference of 30 to 1.

We expect natural keys to outperform surrogates in our database for years to come. (Unless we move certain tables over to an SSD, in which case natural keys will probably outperform surrogates forever.)

Postgresql: Primary key for table with one column

there is always a debate between using surrogate keys and composite keys as primary key. using composite primary keys always introduces some complexity to your database design so to your application.

think that you have another table which is needed to have direct relationship between your resulting table (billing table). For the composite key scenario you need to have 4 columns in your related table in order to connect with the billing table. On the other hand, if you use surrogate keys, you will have one identity column (simplicity) and you can create unique constraint on (country_id, product_id, company_id, addresstype_id)

but it is hard to say this approach is better then the other one because they both have Pros and Cons.

You can check This for more information

Should referential integrity be enforced?

The database is responsible for data. That's it. Period.

If referential integrity is not done in the database, then it's not integrity. It's just trusting people not to do bad things, in which case you probably shouldn't even worry about password-protecting your data either :-)

Who's to say you won't get someone writing their own JDBC-connected client to totally screw up the data, despite your perfectly crafted and bug-free business layer (the fact that it probably won't be bug-free is another issue entirely, mandating that the DB should protect itself).

Generating surrogate keys remotely

This explains the hilo algorithm which you refer to: What's the Hi/Lo algorithm?

It's the often-used solution to "disconnect" problems such as yours. For i.e. if you're using Hibernate/nHibernate, it's one of the recommended primary key options.



Related Topics



Leave a reply



Submit