How Far to Take Normalization

How far to take normalization?

Denormalization has the advantage of fast SELECTs on large queries.

Disadvantages are:

  • It takes more coding and time to ensure integrity (which is most important in your case)

  • It's slower on DML (INSERT/UPDATE/DELETE)

  • It takes more space

As for optimization, you may optimize either for faster querying or for faster DML (as a rule, these two are antagonists).

Optimizing for faster querying often implies duplicating data, be it through denormalization, indices, extra tables, or whatever.

In the case of indices, the RDBMS maintains the duplication for you, but in the case of denormalization you'll need to code it yourself. What if a Department moves to another Office? You'll need to fix it in three tables instead of one.
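The Department/Office update problem above can be sketched with SQLite via Python's `sqlite3` module; the table and column names are illustrative, not from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized: the office is copied onto every employee row.
    CREATE TABLE employees_denorm (
        name TEXT, department TEXT, office TEXT
    );
    INSERT INTO employees_denorm VALUES
        ('Ann', 'Sales', 'Building A'),
        ('Bob', 'Sales', 'Building A');

    -- Normalized: the office is stored exactly once, in the department row.
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT, office TEXT);
    CREATE TABLE employees (name TEXT, dept_id INTEGER REFERENCES departments(id));
    INSERT INTO departments VALUES (1, 'Sales', 'Building A');
    INSERT INTO employees VALUES ('Ann', 1), ('Bob', 1);
""")

# The department moves office. Denormalized, you must find and update every
# copy (and every other table holding the office); normalized, it's one row.
conn.execute("UPDATE employees_denorm SET office = 'Building B' WHERE department = 'Sales'")
conn.execute("UPDATE departments SET office = 'Building B' WHERE name = 'Sales'")
```

Miss one copy in the denormalized version and the two rows silently disagree, which is exactly the integrity problem described above.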

As I can see from the names of your tables, there won't be millions of records there, so you'd better normalize your data; it will be simpler to manage.

How do you determine how far to normalize a database?

You want to start by designing a normalized database, up to 3rd normal form. As you develop the business logic layer you may decide you have to denormalize a bit, but never, never go below 3rd normal form. Always keep 1st and 2nd form compliant. You want to denormalize for simplicity of code, not for performance. Use indexes and stored procedures for that :)
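As a minimal sketch of "use indexes for performance instead of denormalizing": the schema below is made up, and SQLite (through Python's `sqlite3`) stands in for whichever RDBMS you use. `EXPLAIN QUERY PLAN` shows the same query going from a full table scan to an index search:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 100, float(i)) for i in range(1000)])

# Without an index, the lookup scans every row.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42").fetchone()

conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

# With the index, the same (still fully normalized) query is a direct search.
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42").fetchone()
```

The schema never changed; only an index was added, which is the point of the advice above.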

The reason not to "normalize as you go" is that you would have to modify the code you have already written almost every time you modify the database design.

There's a good article on this:

http://www.agiledata.org/essays/dataNormalization.html

When is a good time to break normalization rules?

The rule is: normalize till it hurts, then denormalize till it works. (Who said that?)

In general, I often denormalize when I have a lot of parent-child relationships and I know I would often have to join five or six large tables to get one piece of data (say, the client id) and will not need any of the information from the intermediate tables much of the time. If at all possible, I try to denormalize things that will not change frequently (such as id fields).

But any time you denormalize, you have to write triggers or some other process (normally triggers, if it isn't something that can be handled through a PK/FK relationship and cascading updates) to make sure the data stays in sync. If you fail to do this at the database level, you will have data integrity problems and your data becomes useless. Do not think you can maintain the denormalization through application code. That is a recipe for disaster, as databases are often updated from places other than the application.
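A minimal sketch of the trigger approach described above, using SQLite via Python's `sqlite3`; the `clients`/`invoices` schema and the denormalized `client_name` column are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        client_id INTEGER REFERENCES clients(id),
        client_name TEXT  -- denormalized copy, kept to avoid a join on hot queries
    );

    -- The trigger keeps the copy in sync at the database level, so it holds
    -- no matter which application (or ad-hoc script) updates the client.
    CREATE TRIGGER sync_client_name AFTER UPDATE OF name ON clients
    BEGIN
        UPDATE invoices SET client_name = NEW.name WHERE client_id = NEW.id;
    END;
""")
conn.execute("INSERT INTO clients VALUES (1, 'Acme Ltd')")
conn.execute("INSERT INTO invoices VALUES (100, 1, 'Acme Ltd')")

# An update arriving from *anywhere* still propagates to the copy.
conn.execute("UPDATE clients SET name = 'Acme Corp' WHERE id = 1")
```

Because the rule lives in the database rather than the application, there is no code path that can bypass it, which is exactly why the answer warns against maintaining denormalized data in application code.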

Denormalizing can slow inserts, updates, and deletes, especially if you need to do large batches of data. It may or may not improve select query speed, depending on how you need to query the data. If you end up needing a lot of self-joins to get the data, it is possible you would have been better off not denormalizing. Never denormalize without testing to see whether you have improved performance. Remember that slowing inserts/updates/deletes will have an overall effect on the system when many users are using it. By denormalizing to fix one problem, you may introduce a worse problem in the overall system. Don't just test the one query you are trying to speed up; test the performance of the whole system. You might speed up a query that runs once a month and slow down other queries that run thousands of times a day.

Denormalizing is often done for data warehouses, which are a special case, as they are generally updated automatically on a schedule rather than one record at a time by a user. They also tend to be built by DBAs who specialize in data warehousing and know how to avoid the data integrity issues.

Another common denormalizing technique is to create a staging table for data related to a complex report that doesn't need to run against real-time data. This is a sort of poor man's data warehouse and should never be done without a way to update the staging table on a schedule (as infrequently as you can get away with; this uses server resources that could usually be better spent elsewhere). Often these types of tables are updated when there are few users on the system, and they lag a full day behind the real-time data. Don't consider doing this unless the query you are staging the data for is truly slow and cannot otherwise be optimized. Many slow queries can be optimized without denormalization, as developers often use the easiest-to-understand rather than the most performant way to select data.
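The staging-table pattern can be sketched as follows, again with SQLite via `sqlite3`; the `sales` schema, the `refresh_staging` function name, and the "nightly job" framing are all illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO sales (region, amount) VALUES
        ('east', 100.0), ('east', 50.0), ('west', 75.0);

    -- Staging table for a slow report; rebuilt on a schedule, not per-row.
    CREATE TABLE sales_by_region_staging (region TEXT PRIMARY KEY, total REAL);
""")

def refresh_staging(conn):
    """Rebuild the report table wholesale -- invoked from a scheduled job."""
    with conn:  # one transaction, so readers never see a half-built table
        conn.execute("DELETE FROM sales_by_region_staging")
        conn.execute("""
            INSERT INTO sales_by_region_staging
            SELECT region, SUM(amount) FROM sales GROUP BY region
        """)

refresh_staging(conn)
```

The report then selects from the small pre-aggregated table, and the data is only as stale as the refresh schedule allows, matching the "lags a full day behind" trade-off described above.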

Is normalizing a person's name going too far?

Database normalization usually refers to normalizing the field, not its content. In other words, you would normalize that there only be one first name field in the database. That is generally worthwhile. However the data content should not be normalized, since it is individual to that person - you are not picking from a list, and you are not changing a list in one place to affect everybody - that would be a bug, not a feature.

MySQL Database: How far to Normalize / Queries VS Join / Unique Index

Your tables are already in 2NF. The condition for 2NF is that there should be no partial dependency. For example, take your users table: user-id is the primary key, and another key (more appropriately called a candidate key) is (cityid, A), with which you can uniquely identify a row in the table. Your table would not be in 2NF if cityid or A alone were enough to uniquely retrieve B, C, D, or E; but in your case you need both (cityid, A) to retrieve a unique record, and hence it's already normalized.

Note:

Your tables are not in 3NF. The condition for 3NF is no transitive dependency. Take the users table: userid is the primary key, from it you can get a unique (cityid, A) pair, and in turn you can get a unique (B, C, D, E) record from that (cityid, A). In short, if A -> B and B -> C, then indirectly A -> C, which is called a transitive dependency; it's present in your users table, and hence the table is not in 3NF.
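The standard fix for that transitive dependency is to split the dependent attributes into their own table keyed by (cityid, A). A minimal sketch with SQLite via `sqlite3` (the `city_details` table name and lowercase column names are assumptions, since the question only names the columns A–E):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        userid INTEGER PRIMARY KEY,
        cityid INTEGER,
        a      TEXT,
        UNIQUE (cityid, a)        -- the candidate key from the answer above
    );
    CREATE TABLE city_details (    -- holds the attributes that depend only
        cityid INTEGER,            -- on (cityid, a), not on userid
        a      TEXT,
        b TEXT, c TEXT, d TEXT, e TEXT,
        PRIMARY KEY (cityid, a)
    );
""")
conn.execute("INSERT INTO users VALUES (1, 10, 'x')")
conn.execute("INSERT INTO city_details VALUES (10, 'x', 'b1', 'c1', 'd1', 'e1')")

# (B, C, D, E) are now stored exactly once and reached via a join.
row = conn.execute("""
    SELECT b, c, d, e FROM users
    JOIN city_details USING (cityid, a)
    WHERE userid = 1
""").fetchone()
```

After the split, userid determines (cityid, a), and (cityid, a) determines (b, c, d, e), but each dependency lives in its own table, which is what 3NF requires.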

Is normalizing the gender table going too far?

Whether or not you choose to normalize your table structure to accommodate gender is going to depend on the requirements of your application and your business requirements.

I would normalize if:

  • You want to be able to manage the "description" of a gender in the database, and not in code.

    • This allows you to quickly change the description from Man/Woman to Male/Female, for example.
  • Your application currently must handle, or may possibly handle in the future, localization requirements, i.e. being able to specify gender in different languages.
  • Your business requires that everything be normalized.

I would not normalize if:

  • You have a relatively simple application where you can easily manage the description of the gender in code rather than in the database.
  • You have tight programmatic control of the data going in and out of the gender field such that you can ensure consistency of the data in that field.
  • You only care about the gender field for information capture, meaning, you don't have a lot of programmatic need to update this field once it is set the first time.
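The "normalize" branch of the decision above amounts to a small lookup table with a foreign key. A sketch with SQLite via `sqlite3`; the `genders`/`people` names and sample descriptions are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    PRAGMA foreign_keys = ON;
    CREATE TABLE genders (
        id INTEGER PRIMARY KEY,
        description TEXT NOT NULL   -- managed in data, not in code
    );
    CREATE TABLE people (
        id INTEGER PRIMARY KEY,
        name TEXT,
        gender_id INTEGER REFERENCES genders(id)
    );
    INSERT INTO genders VALUES (1, 'Man'), (2, 'Woman');
    INSERT INTO people VALUES (1, 'Ann', 2);
""")

# Changing the description touches one row; every query picks it up.
conn.execute("UPDATE genders SET description = 'Female' WHERE id = 2")
label = conn.execute("""
    SELECT g.description FROM people p
    JOIN genders g ON g.id = p.gender_id
    WHERE p.id = 1
""").fetchone()[0]
```

The non-normalized alternative is simply a text or enum column on `people`, with consistency enforced in application code, which is the trade-off the two lists above lay out.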

