What Is the Resource Impact from Normalizing a Database

What is the resource impact from normalizing a database?

This cannot really be answered in a general manner, as the impact will vary heavily depending on the specifics of the database in question and the apps using it.

So you have basically already stated the general expectations concerning the impact:

  1. Overall storage requirements should go down, as redundant data gets removed
  2. CPU needs might go up, as queries might get more expensive (note that in many cases queries on a normalized database will actually be faster, even though they are more complex, because the query engine has more optimization options)
  3. Development effort might go up, as developers might need to construct more elaborate queries (on the other hand, you need less development effort to maintain data integrity) - a toy sketch of these trade-offs follows below
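
To make these three points concrete, here is a minimal, hypothetical sketch in Python/SQLite (the customers/orders schema and all names are invented for illustration, not taken from any real database): the flat table repeats customer data on every order row, while the normalized pair stores it once but needs a join to answer the same question.

    import sqlite3

    con = sqlite3.connect(":memory:")

    # Denormalized: customer details repeated on every order row.
    con.execute("""CREATE TABLE orders_flat (
        order_id INTEGER PRIMARY KEY,
        customer_name TEXT,
        customer_city TEXT,
        product TEXT)""")

    # Normalized: customer details stored once, referenced by key.
    con.execute("""CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT,
        city TEXT)""")
    con.execute("""CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        product TEXT)""")

    # Point 1: the flat table stores 'Alice'/'Berlin' once per order,
    # the normalized pair stores them exactly once.
    con.executemany("INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
                    [(1, "Alice", "Berlin", "Widget"),
                     (2, "Alice", "Berlin", "Gadget")])
    con.execute("INSERT INTO customers VALUES (1, 'Alice', 'Berlin')")
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                    [(1, 1, "Widget"), (2, 1, "Gadget")])

    # Points 2 and 3: the normalized version needs a join, but the
    # optimizer can use the customer key to do it cheaply.
    print(con.execute("""SELECT c.name, c.city, o.product
                         FROM orders o
                         JOIN customers c USING (customer_id)""").fetchall())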

So the only real answer is the usual: it depends ;)

Note: This assumes that we are talking about cautious and intentional denormalization. If you are referring to the 'just throw some tables together as data comes along' approach, way too common with inexperienced developers, I'd risk the statement that normalization will reduce resource needs on all levels ;)


Edit: Concerning the specific context added by cdeszaq, I'd say 'Good luck getting your point across' ;)

Obviously, with over 300 tables and no constraints (!), the answer to your question is definitely 'normalizing will reduce resource needs on all levels' (and probably very substantially), but:

Refactoring such a mess will be a major undertaking. If there is only one app using this database, it is already dreadful - if there are many, it might become a nightmare!

So even if normalizing would substantially reduce resource needs in the long run, it might not be worth the trouble, depending on circumstances. The main questions here are about long-term scope: how important is this database, how long will it be used, will there be more apps using it in the future, is the current maintenance effort constant or increasing, and so on.

Don't ignore that it is a running system - even if it's ugly and horrible, according to your description it is not (yet) broken ;-)

Is database normalization still necessary?

It depends on what type of application(s) are using the database.

For OLTP apps (principally data entry, with many INSERTs, UPDATEs and DELETEs, along with SELECTs), a normalized schema is generally a good thing.

For OLAP and reporting apps, normalization is not as helpful: SELECT queries generally run much more quickly against a denormalized schema, which can be provided as views over the normalized base tables.
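
As a rough, hypothetical sketch of that approach (the schema and names are made up, not taken from the linked questions): keep the base tables normalized for OLTP work and let a view present the pre-joined, report-friendly shape.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(customer_id),
                         product TEXT, amount REAL);

    -- Reporting view: exposes a denormalized, join-free shape to report
    -- queries while the underlying OLTP tables stay normalized.
    CREATE VIEW order_report AS
    SELECT o.order_id, c.name AS customer, c.city, o.product, o.amount
    FROM orders o JOIN customers c USING (customer_id);
    """)

    con.execute("INSERT INTO customers VALUES (1, 'Alice', 'Berlin')")
    con.execute("INSERT INTO orders VALUES (1, 1, 'Widget', 9.99)")

    # Report writers just SELECT from the view as if it were one wide table.
    print(con.execute("SELECT customer, city, product, amount FROM order_report").fetchall())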

You might also find some helpful information in these very popular similar questions:

Should I normalize my DB or not?

In terms of databases, is “Normalize for correctness, denormalize for performance” a right mantra?

What is the resource impact from normalizing a database?

How to convince someone to normalize a database?

Is it really better to use normalized tables?

Does normalization really hurt performance in high traffic sites?

I quote: "normalize for correctness, denormalize for speed - and only when necessary"

I refer you to: In terms of databases, is "Normalize for correctness, denormalize for performance" a right mantra?

HTH.

Data normalization and writing queries

The general principle behind data normalization is to create an RDBMS where data redundancy is kept to a minimum.

Only partly true.

Normalization is not about "redundancy".

It's about "update anomalies".

1NF is the "don't use arrays" rule. Breaking 1NF means a row isn't atomic but a collection, and independent updates within that collection won't work out well: there'd be locking and slowness.
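
A minimal sketch (with a made-up users/phones schema) of what the "array in a column" problem looks like, and the 1NF-friendly alternative:

    import sqlite3

    con = sqlite3.connect(":memory:")

    # 1NF violation: a whole collection crammed into one column.
    con.execute("CREATE TABLE users_bad (user_id INTEGER PRIMARY KEY, phones TEXT)")
    con.execute("INSERT INTO users_bad VALUES (1, '555-0100,555-0101')")
    # Updating just one phone number means rewriting (and locking) the whole row.

    # 1NF-friendly: each phone number is its own atomic row.
    con.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY)")
    con.execute("""CREATE TABLE user_phones (
        user_id INTEGER REFERENCES users(user_id),
        phone TEXT,
        PRIMARY KEY (user_id, phone))""")
    con.execute("INSERT INTO users VALUES (1)")
    con.executemany("INSERT INTO user_phones VALUES (1, ?)",
                    [("555-0100",), ("555-0101",)])

    # Independent updates now touch only the row they concern.
    con.execute("UPDATE user_phones SET phone = '555-0199' WHERE phone = '555-0101'")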

2NF is the "one key" rule. Each row has exactly one key and everything in the row depends on the key. There are no dependencies on part of the key. Some folks like to talk about candidate keys and natural keys and foreign keys; they may exist or they may not. 2NF is satisfied when all attributes depend on one key. If the key is a single-column surrogate key, this normal form is trivially satisfied.

If 2NF is violated, you've got columns which depend on part of the key, but not the whole key. Say you have a table with (Part Number, Revision Number) as the key, and attributes of color and weight, where weight depends on the whole key but color depends only on the part number. You have a 2NF problem: you could update some part colors but not others, creating data anomalies.
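
Roughly in code (a hypothetical sketch of the part/revision example above, not an actual schema from the question):

    import sqlite3

    con = sqlite3.connect(":memory:")

    # 2NF violation: color depends only on part_no, not on the whole
    # (part_no, revision_no) key, so it gets repeated for every revision
    # and the copies can drift apart.
    con.execute("""CREATE TABLE part_revisions_bad (
        part_no TEXT,
        revision_no INTEGER,
        color TEXT,     -- depends on part_no only
        weight REAL,    -- depends on the whole key
        PRIMARY KEY (part_no, revision_no))""")

    # Fix: move color into a table keyed by part_no alone.
    con.execute("CREATE TABLE parts (part_no TEXT PRIMARY KEY, color TEXT)")
    con.execute("""CREATE TABLE part_revisions (
        part_no TEXT REFERENCES parts(part_no),
        revision_no INTEGER,
        weight REAL,
        PRIMARY KEY (part_no, revision_no))""")

    # Now a part's color lives in exactly one row, so "some revisions say
    # red, others say blue" simply cannot be stored.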

3NF is the "only the key" rule. If you put derived data in a row and then change the derived result, it no longer matches the source columns; if you change a source column without updating the derived value, you have a problem, too. Yes, triggers are a bad hackaround to allow 3NF design violations. That's not the point. The point is merely to define 3NF and show that it prevents an update problem.
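
For example (a hypothetical order-lines schema, invented for illustration), storing a derived line_total column invites exactly that anomaly, while the 3NF-friendly version just computes it on demand:

    import sqlite3

    con = sqlite3.connect(":memory:")

    # 3NF problem: line_total is derived from other non-key columns.
    con.execute("""CREATE TABLE order_lines_bad (
        line_id INTEGER PRIMARY KEY,
        qty INTEGER,
        unit_price REAL,
        line_total REAL   -- derived from qty * unit_price, not from the key
    )""")
    con.execute("INSERT INTO order_lines_bad VALUES (1, 2, 5.0, 10.0)")
    # Update anomaly: qty changes but the stored derived value does not.
    con.execute("UPDATE order_lines_bad SET qty = 3 WHERE line_id = 1")
    print(con.execute("SELECT * FROM order_lines_bad").fetchall())  # (1, 3, 5.0, 10.0) - inconsistent

    # 3NF-friendly: don't store the derived value; compute it when needed.
    con.execute("""CREATE TABLE order_lines (
        line_id INTEGER PRIMARY KEY,
        qty INTEGER,
        unit_price REAL)""")
    con.execute("INSERT INTO order_lines VALUES (1, 3, 5.0)")
    print(con.execute(
        "SELECT line_id, qty * unit_price AS line_total FROM order_lines").fetchall())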

Each query involves combing through several different tables and joining them together. I was wondering if this is a side effect of data normalization?

It is.
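
For illustration, this is what those multi-table queries typically look like against a (hypothetical, made-up) normalized schema: each fact lives in its own table, so the query walks the key relationships instead of reading one wide row.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE customers   (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders      (order_id INTEGER PRIMARY KEY,
                              customer_id INTEGER REFERENCES customers(customer_id));
    CREATE TABLE order_lines (line_id INTEGER PRIMARY KEY,
                              order_id INTEGER REFERENCES orders(order_id),
                              product TEXT, qty INTEGER);
    INSERT INTO customers   VALUES (1, 'Alice');
    INSERT INTO orders      VALUES (10, 1);
    INSERT INTO order_lines VALUES (100, 10, 'Widget', 2);
    """)

    # "What did each customer buy?" touches three tables joined on their keys.
    print(con.execute("""
        SELECT c.name, ol.product, ol.qty
        FROM customers c
        JOIN orders o       ON o.customer_id = c.customer_id
        JOIN order_lines ol ON ol.order_id   = o.order_id
    """).fetchall())

The joins are the direct consequence of splitting the data out; with proper keys (and indexes on them) they are usually cheap, which is the optimization point made at the top of this page.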


