Maintaining Referential Integrity - Good or Bad

When discussing auditing, I would go back to the purpose behind it. It isn't really a backup but rather a history of what has been. For example, for StudentScore, you would want to be sure not to lose the fact that the student originally had a 65% when they now have a 95%. This audit trail would allow you to walk back through the changes to see what happened and who did it. From this, you could identify what a particular user did to abuse the system. In some ways this could be a type of backup since you could roll back these changes to their previous states without rolling back entire tables.

With this in mind (if my assumptions about what you are using this for are correct), the only FK/PK relationship you might want to consider is between the history table and its "live" counterpart. Your audit (history) table should not refer to any other table, because it is no longer a part of that system; it is simply a record of what has happened in one table. Period. Even that one relationship is optional: if you allow records to be deleted from the live table, don't include the FK in the history table, because with the FK in place the database would either block the delete or (with a cascade) remove the history along with it. Without the FK, the history table can keep the deleted records, which is what you want if you allow deletions.
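To make the stand-alone history table concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for whatever database you actually use. The table names (StudentScore, StudentScoreHistory) and the trigger-based capture are assumptions for illustration. The point is that the history table has no foreign key back to the live table, so its rows survive a delete. A real audit trail would also record who made each change, which SQLite triggers cannot capture on their own.

```python
import sqlite3

# In-memory database for illustration; all table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StudentScore (
    id         INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL,
    score      INTEGER NOT NULL
);

-- Stand-alone history table: deliberately NO foreign key to StudentScore,
-- so history rows are kept even after the live row is deleted.
CREATE TABLE StudentScoreHistory (
    history_id INTEGER PRIMARY KEY,
    score_id   INTEGER NOT NULL,   -- id of the live row at the time of the change
    student_id INTEGER NOT NULL,
    old_score  INTEGER,
    action     TEXT NOT NULL,      -- 'UPDATE' or 'DELETE'
    changed_at TEXT NOT NULL DEFAULT (datetime('now'))
);

-- Triggers copy the OLD values into the history table on every change.
CREATE TRIGGER StudentScore_audit_update AFTER UPDATE ON StudentScore
BEGIN
    INSERT INTO StudentScoreHistory (score_id, student_id, old_score, action)
    VALUES (OLD.id, OLD.student_id, OLD.score, 'UPDATE');
END;

CREATE TRIGGER StudentScore_audit_delete AFTER DELETE ON StudentScore
BEGIN
    INSERT INTO StudentScoreHistory (score_id, student_id, old_score, action)
    VALUES (OLD.id, OLD.student_id, OLD.score, 'DELETE');
END;
""")

conn.execute("INSERT INTO StudentScore (id, student_id, score) VALUES (1, 42, 65)")
conn.execute("UPDATE StudentScore SET score = 95 WHERE id = 1")  # 65 -> 95
conn.execute("DELETE FROM StudentScore WHERE id = 1")            # live row gone

history = conn.execute(
    "SELECT score_id, old_score, action FROM StudentScoreHistory ORDER BY history_id"
).fetchall()
print(history)  # [(1, 65, 'UPDATE'), (1, 95, 'DELETE')]
```

The live table is empty afterwards, but the history still shows that the score was once 65.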

Don't confuse the referential integrity of the main database with this history table. The history tables are all stand-alone; each one serves only as the history of a single table (not of a set of tables).

Relating two history tables to each other is possible, as are even more advanced relations between live and history tables (Students and Courses, each with a live and a history table, for example). That way you can even deal with the possibility that a student was deleted (shudder), since the record would still be in the history table. The only issue arises if you don't keep history for a particular table, in which case you are choosing to lose that data (if you allow deletes).

When is referential integrity not appropriate?

Referential integrity is typically not used in data warehouses, where the data is a read-only copy of a transactional database. Another case where you don't need RI is when you log information that includes row ids: maintaining referential integrity for a read-only log table is a waste of database overhead.

Is it appropriate to have fields containing multiple and/or possibly incomplete subsets of a foreign key's list?

Sometimes you care more about capturing data than about data quality. Imagine you are aggregating a large amount of data from disparate systems, each of which suffers from its own data quality issues. Sometimes you are after the greater good of data quality, and having everything in one place, even with broken keys, is a starting point for moving towards true data quality. It's not ideal, but it does happen, because the benefits can outweigh the trade-offs.
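As a rough illustration of "load first, clean up later", the sketch below uses Python's sqlite3 with made-up table names. It stages rows without a foreign key and then reports the keys that don't resolve. An orphan report like this is one possible starting point for the clean-up, not a prescribed method.

```python
import sqlite3

# Hypothetical consolidation scenario; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (customer_id TEXT PRIMARY KEY, name TEXT);

-- Staging table deliberately has NO foreign key, so rows from messy
-- source systems load even when their customer_id does not resolve.
CREATE TABLE StagedOrder (
    order_id    INTEGER PRIMARY KEY,
    source      TEXT NOT NULL,
    customer_id TEXT               -- may be NULL or point at nothing
);
""")
conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                 [("C1", "Alice"), ("C2", "Bob")])
conn.executemany("INSERT INTO StagedOrder (source, customer_id) VALUES (?, ?)",
                 [("legacy_a", "C1"), ("legacy_b", "C9"), ("legacy_b", None)])

# Anti-join: staged rows whose customer_id has no matching Customer row.
orphans = conn.execute("""
    SELECT o.order_id, o.source, o.customer_id
    FROM StagedOrder AS o
    LEFT JOIN Customer AS c ON c.customer_id = o.customer_id
    WHERE c.customer_id IS NULL
    ORDER BY o.order_id
""").fetchall()
print(orphans)  # [(2, 'legacy_b', 'C9'), (3, 'legacy_b', None)]
```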

Typically, should this be a schema structure design decision or an interface design decision? (Or possibly neither or both)

Everything about systems development is centered around information security, and a key element of that is data integrity. The database structure should lean towards enforcing these things when possible; however, you are often not dealing with modern database systems. Sometimes your data source is an old-school AS400 running long-antiquated apps. Sometimes you have to build a data and business layer that provides data integrity.

Just my thoughts.
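Here is a minimal sketch of what such a data layer might do when the underlying store can't enforce relationships itself. Everything in it is hypothetical: the OrderDataLayer class, the dict-backed "store" standing in for the legacy system, and the restrict-on-delete rule.

```python
from dataclasses import dataclass, field


# Hypothetical data layer in front of a store that cannot enforce foreign keys
# itself (e.g. files behind a legacy app). Plain dicts stand in for that store.
@dataclass
class OrderDataLayer:
    customers: dict = field(default_factory=dict)  # customer_id -> name
    orders: dict = field(default_factory=dict)     # order_id -> customer_id

    def add_order(self, order_id: int, customer_id: str) -> None:
        # Application-level equivalent of a foreign key check.
        if customer_id not in self.customers:
            raise ValueError(f"unknown customer_id {customer_id!r}")
        self.orders[order_id] = customer_id

    def delete_customer(self, customer_id: str) -> None:
        # Application-level equivalent of ON DELETE RESTRICT.
        if customer_id in self.orders.values():
            raise ValueError(f"customer {customer_id!r} still has orders")
        self.customers.pop(customer_id, None)


layer = OrderDataLayer(customers={"C1": "Alice", "C2": "Bob"})
layer.add_order(1, "C1")          # accepted
try:
    layer.add_order(2, "C9")      # rejected: no such customer
except ValueError as exc:
    print(exc)                    # unknown customer_id 'C9'
layer.delete_customer("C2")       # allowed: C2 has no orders
```

These checks only protect writes that go through this layer. They are also not safe against concurrent writers unless you add some form of locking.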

Should referential integrity be enforced?

The database is responsible for data. That's it. Period.

If referential integrity is not done in the database, then it's not integrity. It's just trusting people not to do bad things, in which case you probably shouldn't even worry about password-protecting your data either :-)

Who's to say someone won't write their own JDBC-connected client and totally screw up the data, despite your perfectly crafted and bug-free business layer? (The fact that it probably won't be bug-free is another issue entirely, and yet another reason the DB should protect itself.)
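Here is a small sketch of the database protecting itself. It uses Python's sqlite3 in place of a JDBC client, which is an assumption for illustration, and the table names are hypothetical. The rogue_insert helper goes straight to the database with no business layer in between, yet the declared foreign key still rejects the orphan row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
CREATE TABLE Student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE StudentScore (
    id         INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL REFERENCES Student(id),
    score      INTEGER NOT NULL
);
""")
conn.execute("INSERT INTO Student (id, name) VALUES (1, 'Ann')")


def rogue_insert(connection, student_id):
    """Hypothetical client issuing raw SQL, bypassing any business layer."""
    try:
        connection.execute(
            "INSERT INTO StudentScore (student_id, score) VALUES (?, ?)",
            (student_id, 100))
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


ok_result = rogue_insert(conn, 1)     # student 1 exists
bad_result = rogue_insert(conn, 999)  # no student 999
print(ok_result)   # accepted
print(bad_result)  # rejected: FOREIGN KEY constraint failed
```

One caveat is specific to SQLite: enforcement is a per-connection setting, so a client that never enables the pragma is not checked. Client-server databases such as PostgreSQL apply declared foreign keys to every ordinary connection.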

How should I keep referential integrity to users in a separate membership provider?

This question is actually about two different things.

1. Should a foreign key to a remote database also be a foreign key to a local table?
2. Can you maintain referential integrity to a foreign database?

These are two different things entirely, although the quick answer to both is actually the same:

No.

But let me get into the details a bit.

1. Use of foreign keys to a remote database

To reduce dependency on the remote database, you should store those foreign keys in only one location in your database.

Example: Let's say you had a blog where users could post comments. These users log in through Facebook. You now have a remote database (Facebook) and a local one that stores your users' comments. You could follow one of two designs:

a comments table that stores the facebook_id as a foreign key

a separate users table storing the facebook_id along with a local id, and a comments table that uses your local id as a foreign key.

You should not use the facebook_id in both tables. While that would actually work, you'd be introducing an unnecessary dependency on the remote database: you wouldn't be able to add a comment from a non-Facebook user, since that would break your design.
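The second design might look roughly like this in Python's sqlite3. The table and column names and the fb_1001 id format are hypothetical. The remote identifier lives only in users, while comments reference the local id, so a non-Facebook user is simply a row with a NULL facebook_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- The only place the remote (Facebook) identifier is stored.
CREATE TABLE users (
    id           INTEGER PRIMARY KEY,   -- local id
    facebook_id  TEXT UNIQUE,           -- NULL for non-Facebook users
    display_name TEXT NOT NULL
);

-- Comments reference the LOCAL id, never facebook_id.
CREATE TABLE comments (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    body    TEXT NOT NULL
);
""")

conn.execute("INSERT INTO users VALUES (1, 'fb_1001', 'Facebook user')")
conn.execute("INSERT INTO users VALUES (2, NULL, 'Local user')")
conn.executemany("INSERT INTO comments (user_id, body) VALUES (?, ?)",
                 [(1, "Nice post"), (2, "Agreed")])

rows = conn.execute("""
    SELECT c.body, u.facebook_id
    FROM comments AS c JOIN users AS u ON u.id = c.user_id
    ORDER BY c.id
""").fetchall()
print(rows)  # [('Nice post', 'fb_1001'), ('Agreed', None)]
```

SQLite allows several NULLs in a UNIQUE column. Some engines, SQL Server for example, allow only one, so check how yours behaves before relying on this.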

2. Referential integrity with remote databases

You might not have intended to ask this, but the term referential integrity implies that all foreign keys to the remote database actually refer to an existing remote record (e.g. a user). The only way to maintain that integrity would be for the remote database to inform you of changes to a remote record or of its deletion, which usually is not the case.

Example: Let's go back to the hypothetical blog mentioned above. Some Facebook user posted a comment. Later, the same person decides to delete their Facebook account. Facebook is unlikely to inform you that this happened, leaving you with "dead" records in your database that no longer link to a valid record in the remote database. This breaks referential integrity. So unless you have a good way of actually maintaining that integrity, such as receiving deletion notifications, you should design your application so that it won't break if the Facebook user is deleted.
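If you do receive deletion notifications, one possible way to handle them is sketched below. It detaches the remote id but keeps the local user row, so existing comments stay valid locally. The handler name on_remote_user_deleted and the "Deleted user" convention are hypothetical. Whether you anonymize, keep, or delete the comments is a policy decision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    id           INTEGER PRIMARY KEY,
    facebook_id  TEXT UNIQUE,
    display_name TEXT NOT NULL
);
CREATE TABLE comments (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    body    TEXT NOT NULL
);
INSERT INTO users VALUES (1, 'fb_1001', 'Jane');
INSERT INTO comments (user_id, body) VALUES (1, 'Great article');
""")


def on_remote_user_deleted(connection, facebook_id):
    """Hypothetical handler, only useful if the remote side notifies you."""
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "UPDATE users SET facebook_id = NULL, display_name = 'Deleted user' "
            "WHERE facebook_id = ?", (facebook_id,))


on_remote_user_deleted(conn, "fb_1001")
rows = conn.execute("""
    SELECT c.body, u.display_name, u.facebook_id
    FROM comments AS c JOIN users AS u ON u.id = c.user_id
""").fetchall()
print(rows)  # [('Great article', 'Deleted user', None)]
```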

Does creating a foreign key automatically mean referential integrity?

Yes, in most database engines referential integrity is enforced as soon as you create a foreign key. You can also define other options, such as cascading updates and deletes. These are usually part of the same syntax used to create the foreign key and normally have to be specified when the foreign key is created; at least that's how it works in standard SQL.
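The sketch below shows the cascading options in Python's sqlite3, using hypothetical Course/Enrollment tables. It also illustrates one engine-specific exception to "enforced as soon as you create it": SQLite parses foreign key declarations but enforces them only on connections that have run PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # turn enforcement on for this connection
conn.executescript("""
CREATE TABLE Course (code TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE Enrollment (
    student_id  INTEGER NOT NULL,
    course_code TEXT NOT NULL
        REFERENCES Course(code)
        ON UPDATE CASCADE   -- renaming a course code propagates to enrollments
        ON DELETE CASCADE,  -- deleting a course deletes its enrollments
    PRIMARY KEY (student_id, course_code)
);
INSERT INTO Course VALUES ('CS101', 'Intro to CS');
INSERT INTO Enrollment VALUES (1, 'CS101'), (2, 'CS101');
""")

conn.execute("UPDATE Course SET code = 'CS100' WHERE code = 'CS101'")
codes = conn.execute("SELECT DISTINCT course_code FROM Enrollment").fetchall()
conn.execute("DELETE FROM Course WHERE code = 'CS100'")
remaining = conn.execute("SELECT COUNT(*) FROM Enrollment").fetchone()[0]
print(codes, remaining)  # [('CS100',)] 0

# Same kind of declaration on a connection with enforcement switched off:
# the FK is parsed but not checked, so an orphan row is accepted.
lax = sqlite3.connect(":memory:")
lax.execute("PRAGMA foreign_keys = OFF")  # also SQLite's usual default
lax.executescript("""
CREATE TABLE Course (code TEXT PRIMARY KEY);
CREATE TABLE Enrollment (course_code TEXT REFERENCES Course(code));
""")
lax.execute("INSERT INTO Enrollment VALUES ('NOPE')")
orphans = lax.execute("SELECT COUNT(*) FROM Enrollment").fetchone()[0]
print(orphans)  # 1
```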

Are foreign keys really necessary in a database design?

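Foreign keys help enforce referential integrity at the data level. Whether they also help performance depends on the engine. The referenced column is normally a primary key or unique key and is therefore indexed. Some engines (MySQL's InnoDB, for example) also index the referencing column automatically, but others (PostgreSQL, SQL Server, SQLite) do not, so you usually have to add that index yourself.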
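Here is a quick way to check the indexing point in SQLite, via Python's sqlite3; the table and index names are hypothetical. Declaring the foreign key does not create an index on the referencing column, so you add one yourself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE StudentScore (
    id         INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL REFERENCES Student(id),
    score      INTEGER
);
""")


def index_names(connection, table):
    # PRAGMA index_list returns one row per index; column 1 is the index name.
    # (Table name is interpolated directly: fine for a demo, not for user input.)
    return [row[1] for row in connection.execute(f"PRAGMA index_list({table})")]


before = index_names(conn, "StudentScore")  # the FK alone created no index
conn.execute("CREATE INDEX ix_StudentScore_student_id ON StudentScore(student_id)")
after = index_names(conn, "StudentScore")
print(before, after)  # [] ['ix_StudentScore_student_id']
```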