What Are the [Dis]Advantages of Using a Key/Value Table Over Nullable Columns or Separate Tables

Perhaps you should look at this question.

The accepted answer from Bill Karwin goes into specific arguments against the key/value table, usually known as Entity-Attribute-Value (EAV):

... Although many people seem to favor EAV, I don't. It seems like the most flexible solution, and therefore the best. However, keep in mind the adage TANSTAAFL. Here are some of the disadvantages of EAV:

  • No way to make a column mandatory (equivalent of NOT NULL).
  • No way to use SQL data types to validate entries.
  • No way to ensure that attribute names are spelled consistently.
  • No way to put a foreign key on the values of any given attribute, e.g. for a lookup table.
  • Fetching results in a conventional tabular layout is complex and expensive, because to get attributes from multiple rows you need to do a JOIN for each attribute.
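
To make the last two points concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table product_eav and its attribute names are made up for illustration; they are not from the quoted answer, and other DBMSs will differ in syntax details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical EAV table: one row per (product, attribute) pair.
    CREATE TABLE product_eav (
        product_id INTEGER NOT NULL,
        attr_name  TEXT    NOT NULL,
        attr_value TEXT,               -- every value is stored as text
        PRIMARY KEY (product_id, attr_name)
    );
    INSERT INTO product_eav VALUES
        (1, 'name',   'Widget'),
        (1, 'color',  'red'),
        (1, 'price',  '9.99'),
        (2, 'name',   'Gadget'),
        (2, 'colour', 'blue');         -- misspelled attribute name is accepted silently
""")

# Rebuilding one row per product needs one extra self-join per attribute.
pivot_sql = """
    SELECT n.product_id,
           n.attr_value AS name,
           c.attr_value AS color,
           p.attr_value AS price
    FROM product_eav AS n
    LEFT JOIN product_eav AS c
           ON c.product_id = n.product_id AND c.attr_name = 'color'
    LEFT JOIN product_eav AS p
           ON p.product_id = n.product_id AND p.attr_name = 'price'
    WHERE n.attr_name = 'name'
    ORDER BY n.product_id
"""
eav_rows = conn.execute(pivot_sql).fetchall()
print(eav_rows)
```

Product 2 comes back without a color because its attribute was stored as 'colour', and nothing in the schema could have caught that.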

The degree of flexibility EAV gives you requires sacrifices in other areas, probably making your code as complex as (or worse than) it would have been to solve the original problem in a more conventional way.

And in most cases, it's unnecessary to have that degree of flexibility. In the OP's question about product types, it's much simpler to create a table per product type for product-specific attributes, so you have some consistent structure enforced at least for entries of the same product type.

I'd use EAV only if every row must be permitted to potentially have a distinct set of attributes. When you have a finite set of product types, EAV is overkill. Class Table Inheritance would be my first choice.
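
For readers who have not seen Class Table Inheritance, a rough sketch of the idea follows, again with sqlite3 in memory. The product types (book, shirt) and all column names are assumptions chosen for the example. Each subtype table shares the base table's primary key, so per-type attributes get real columns with real constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
    -- Base table: attributes common to every product (hypothetical columns).
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT    NOT NULL,
        price      NUMERIC NOT NULL
    );
    -- One table per product type; its primary key is also a foreign key to product.
    CREATE TABLE product_book (
        product_id INTEGER PRIMARY KEY REFERENCES product (product_id),
        isbn       TEXT    NOT NULL,
        page_count INTEGER NOT NULL
    );
    CREATE TABLE product_shirt (
        product_id INTEGER PRIMARY KEY REFERENCES product (product_id),
        size       TEXT NOT NULL
    );
""")


def try_insert(sql):
    """Run one INSERT and report whether the schema accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


conn.execute("INSERT INTO product VALUES (1, 'Some Book', 20)")
conn.execute("INSERT INTO product VALUES (2, 'Other Book', 15)")

cti_valid = try_insert("INSERT INTO product_book VALUES (1, 'ISBN-0001', 300)")
# A mandatory attribute cannot be left out, unlike in EAV.
cti_missing_isbn = try_insert("INSERT INTO product_book VALUES (2, NULL, 100)")
print(cti_valid, cti_missing_isbn)
```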

Nullable foreign keys vs relational tables for N:M relations

Avoid nullable "foreign keys". They have multiple disadvantages.

The constraint on a referencing row is not always enforced when the foreign key contains a null. However, that default behaviour is not consistent between different DBMSs. Some DBMSs support configuration options to change the behaviour of nullable foreign keys and some do not. SQL developers and users may therefore be unclear about what a nullable foreign key constraint actually means from a data integrity perspective. Porting the database between DBMS products or even between different servers using the same product could give inconsistent results.

Database design tools, integration tools and other software don't always support them correctly and the results they produce may be wrong.

Foreign keys are frequently used in joins and other query logic, compounding the problems for users who think the constraint is in effect when it isn't.

A nullable "foreign key" constraint doesn't make much logical sense. According to the SQL standard, such a constraint is not considered violated by a null even if the table being referenced is empty. That contradicts one of the most common alleged justifications for using a null: that it represents the "unknown" case. If there are no valid values of X then any "unknown" X certainly cannot be a valid value, and yet SQL will permit it.
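
The behaviour described here can be observed in SQLite, used below only because it ships with Python; other DBMSs may behave differently, which is part of the point. The parent and child tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE parent (parent_id INTEGER PRIMARY KEY);   -- stays empty
    CREATE TABLE child (
        child_id  INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent (parent_id)    -- nullable foreign key
    );
""")

# A NULL reference is accepted although parent contains no rows at all.
conn.execute("INSERT INTO child VALUES (1, NULL)")

# A non-NULL reference to a missing parent row is rejected.
try:
    conn.execute("INSERT INTO child VALUES (2, 42)")
    non_null_accepted = True
except sqlite3.IntegrityError:
    non_null_accepted = False

null_fk_rows = conn.execute("SELECT child_id, parent_id FROM child").fetchall()
print(null_fk_rows, non_null_accepted)
```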

It's unnecessary. You can always construct the tables so that a null isn't needed. In the interests of simplicity and accuracy it is therefore better to leave nulls out than put them in.

Advantage of nullable Foreign Keys

If you have a foreign key with a default value of 0, that means you must at all times maintain the existence of a workOrder with an ID of 0, as a fake workOrder for unassigned components to reference. This is a pretty ugly hack.

(This is assuming your FOREIGN KEY actually is a proper, enforced foreign key, which is definitely desirable, but wouldn't happen if you were, say, using MyISAM. The example doesn't work for me, as just saying FOREIGN KEY on its own without specifying what column it REFERENCES isn't valid SQL.)

If you don't like NULLs, the alternative solution is a join table mapping components to workOrders, with a UNIQUE constraint on the component_Id.
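
A sketch of that join-table alternative, assuming hypothetical table names work_order, component and component_work_order, and run against in-memory SQLite rather than MySQL: the UNIQUE constraint keeps each component on at most one work order, and an unassigned component is simply one with no row in the join table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE work_order (work_order_id INTEGER PRIMARY KEY);
    CREATE TABLE component  (component_id  INTEGER PRIMARY KEY);
    -- Join table: no NULLs anywhere, UNIQUE keeps the relation many-to-one.
    CREATE TABLE component_work_order (
        component_id  INTEGER NOT NULL UNIQUE REFERENCES component (component_id),
        work_order_id INTEGER NOT NULL REFERENCES work_order (work_order_id)
    );
    INSERT INTO work_order VALUES (10), (11);
    INSERT INTO component  VALUES (1), (2);
    INSERT INTO component_work_order VALUES (1, 10);
""")

# Assigning component 1 to a second work order violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO component_work_order VALUES (1, 11)")
    second_assignment = "ok"
except sqlite3.IntegrityError:
    second_assignment = "rejected"

# Unassigned components are those without a row in the join table.
unassigned = [
    row[0]
    for row in conn.execute("""
        SELECT c.component_id
        FROM component AS c
        LEFT JOIN component_work_order AS cw ON cw.component_id = c.component_id
        WHERE cw.component_id IS NULL
    """)
]
print(second_assignment, unassigned)
```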

SQL one to one relationship vs. single table

You would normally split a table into two or more 1:1 related tables when it gets very wide (i.e. has many columns). Tables with too many columns are hard for programmers to deal with, and in big companies such tables can easily have more than 100 columns.

So imagine a product table. There is a selling price and maybe another price which was used for calculation and estimation only. Wouldn't it be good to have two tables, one for the real values and one for the planning phase? So a programmer would never confuse the two prices. Or take logistic settings for the product. You want to insert into the products table, but with all these logistic attributes in it, do you need to set some of these? If it were two tables, you would insert into the product table, and another programmer responsible for logistics data would care about the logistic table. No more confusion.

Another thing with many-column tables is that a full table scan is of course slower for a table with 150 columns than for a table with just half of this or less.

A last point is access rights. With separate tables you can grant different rights on the product's main table and the product's logistic table.

So all in all, it is rather rare to see 1:1 relations, but they can give a clearer view on data and even help with performance issues and data access.

EDIT: I'm taking Mike Sherrill's advice and (hopefully) clarifying the point about normalization.

Normalization is mainly about avoiding redundancy and the related lack of consistency. The decision whether to hold data in only one table or in several 1:1 related tables has nothing to do with this. You can decide to split a user table into one table for personal information like first and last name and another for the user's school, graduation and job. Both tables would stay in the same normal form as the original table, because no data is more or less redundant than before. The only column used twice would be the user id, but this is not redundant, because it is needed in both tables to identify a record.

So asking "Is it considered correct to normalize the settings into a separate table?" is not a valid question, because you don't normalize anything by putting data into a 1:1 related separate table.

Should each and every table have a primary key?

Short answer: yes.

Long answer:

  • You need your table to be joinable on something
  • If you want your table to be clustered, you need some kind of a primary key.
  • If your table design does not need a primary key, rethink your design: most probably, you are missing something. Why keep identical records?

In MySQL, the InnoDB storage engine always needs a clustered index: if you define neither a primary key nor a suitable unique NOT NULL index, it generates a hidden one on an internal row ID column that you have no access to.

Note that a primary key can be composite.

If you have a many-to-many link table, you create the primary key on all fields involved in the link. Thus you ensure that you don't have two or more records describing one link.
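
As a small illustration, assuming a made-up student_course link table and in-memory SQLite, a composite primary key rejects a repeated link while still allowing the same student in several courses:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical many-to-many link table with a composite primary key.
    CREATE TABLE student_course (
        student_id INTEGER NOT NULL,
        course_id  INTEGER NOT NULL,
        PRIMARY KEY (student_id, course_id)
    );
""")

conn.execute("INSERT INTO student_course VALUES (1, 100)")
conn.execute("INSERT INTO student_course VALUES (1, 101)")  # same student, other course

# Repeating an existing (student, course) pair violates the primary key.
try:
    conn.execute("INSERT INTO student_course VALUES (1, 100)")
    duplicate_link_accepted = True
except sqlite3.IntegrityError:
    duplicate_link_accepted = False

link_count = conn.execute("SELECT COUNT(*) FROM student_course").fetchone()[0]
print(duplicate_link_accepted, link_count)
```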

Besides the logical consistency issues, most RDBMS engines will benefit from including these fields in a unique index.

And since any primary key involves creating a unique index, you should declare it and get both logical consistency and performance.

See this article in my blog for why you should always create a unique index on unique data:

  • Making an index UNIQUE

P.S. There are some very, very special cases where you don't need a primary key.

Mostly they include log tables which don't have any indexes for performance reasons.

Can a foreign key be NULL and/or duplicate?

Short answer: Yes, it can be NULL or duplicate.

I want to explain why a foreign key might need to be null, and why it might or might not need to be unique. First, remember that a foreign key simply requires that the value in that field must exist first in a different table (the parent table). That is all an FK is by definition. Null by definition is not a value. Null means that we do not yet know what the value is.

Let me give you a real life example. Suppose you have a database that stores sales proposals. Suppose further that each proposal only has one sales person assigned and one client. So your proposal table would have two foreign keys, one with the client ID and one with the sales rep ID. However, at the time the record is created, a sales rep is not always assigned (because no one is free to work on it yet), so the client ID is filled in but the sales rep ID might be null. In other words, usually you need the ability to have a null FK when you may not know its value at the time the data is entered, but you do know other values in the table that need to be entered. To allow nulls in an FK generally all you have to do is allow nulls on the field that has the FK. The null value is separate from the idea of it being an FK.

Whether it is unique or not relates to whether the table has a one-to-one or a one-to-many relationship to the parent table. Now if you have a one-to-one relationship, it is possible that you could have the data all in one table, but if the table is getting too wide or if the data is on a different topic (the employee - insurance example @tbone gave for instance), then you want separate tables with a FK. You would then want to make this FK either also the PK (which guarantees uniqueness) or put a unique constraint on it.

Most FKs are for a one to many relationship and that is what you get from a FK without adding a further constraint on the field. So you have an order table and the order details table for instance. If the customer orders ten items at one time, he has one order and ten order detail records that contain the same orderID as the FK.
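
Here is a rough sketch of both cases in in-memory SQLite. The table and column names (orders, order_detail, employee, employee_insurance, policy_no) are assumptions for the example. A plain FK allows the ten detail rows per order, while an FK that is also the PK allows only one child row per parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE order_detail (
        order_detail_id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders (order_id)   -- one-to-many
    );
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY);
    CREATE TABLE employee_insurance (
        -- FK that is also the PK, so the relationship is one-to-one
        employee_id INTEGER PRIMARY KEY REFERENCES employee (employee_id),
        policy_no   TEXT NOT NULL
    );
    INSERT INTO orders   VALUES (1);
    INSERT INTO employee VALUES (7);
""")

# Ten detail rows may share the same order_id.
conn.executemany("INSERT INTO order_detail (order_id) VALUES (?)", [(1,)] * 10)
detail_count = conn.execute(
    "SELECT COUNT(*) FROM order_detail WHERE order_id = 1"
).fetchone()[0]

# A second insurance row for the same employee is rejected.
conn.execute("INSERT INTO employee_insurance VALUES (7, 'P-1')")
try:
    conn.execute("INSERT INTO employee_insurance VALUES (7, 'P-2')")
    second_insurance = "ok"
except sqlite3.IntegrityError:
    second_insurance = "rejected"
print(detail_count, second_insurance)
```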

MySQL - A lot of columns in a Table - needs suggestions

You should not store computed values in your table. This violates database normalization, because you store values redundantly. Example: If a table contains the columns x and y and x_plus_y and their values in a table row are 10, 12, and 13, then some person or process has inserted invalid data, because 10+12=22, not 13. Maybe the values were correct at first, but then one of the values was updated and the updating person or process was not aware that they had to update the dependent column, too. Anyway, now some queries may use x_plus_y and others may calculate the result from x and y, and thus they give different results. That must not be.

The solution to that: Don't store values that you can always calculate ad hoc. You can, however, write a view or add generated columns to your table. Generated columns are mere calculations that are done either when the column is queried or when its base values change. E.g.

create table io_generated
(
...
total_late decimal(10,2) generated always as (late1 + late2 + late3 + late4) virtual,
...
);
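
The view variant can be tried out with the x and y example from above. This is only a sketch against in-memory SQLite with hypothetical names (measurement, measurement_v). The sum is computed at query time, so it cannot go stale when a base value is updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Only the base values are stored (hypothetical table).
    CREATE TABLE measurement (
        id INTEGER PRIMARY KEY,
        x  INTEGER NOT NULL,
        y  INTEGER NOT NULL
    );
    -- The dependent value is derived in a view instead of being stored.
    CREATE VIEW measurement_v AS
        SELECT id, x, y, x + y AS x_plus_y
        FROM measurement;
    INSERT INTO measurement VALUES (1, 10, 3);
""")

sum_before = conn.execute(
    "SELECT x_plus_y FROM measurement_v WHERE id = 1"
).fetchone()[0]

# Updating a base value needs no second update for the derived value.
conn.execute("UPDATE measurement SET y = 12 WHERE id = 1")
sum_after = conn.execute(
    "SELECT x_plus_y FROM measurement_v WHERE id = 1"
).fetchone()[0]
print(sum_before, sum_after)
```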

(Exception to the rule: In data warehouses we often accept redundancy. We usually get our data from a database without redundancies and introduce the redundancies in order to gain access speed.)

Apart from that your table looks okay. We cannot know, however, if its design is appropriate or not, because we know too little about your data. A more typical design would be:

CREATE TABLE io_generated
(
io_generated_id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
employee_id INT(11) UNSIGNED NOT NULL,
date DATE NOT NULL,
branch_id MEDIUMINT(8) UNSIGNED NOT NULL,
-- MySQL requires an AUTO_INCREMENT column to be part of a key
PRIMARY KEY (io_generated_id)
);

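CREATE TABLE io_detail
(
io_detail_id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
-- same type as io_generated.io_generated_id so that it can reference it
io_generated_id INT(11) UNSIGNED NOT NULL,
in_datetime DATETIME NOT NULL,
out_datetime DATETIME NOT NULL,
in_branch_id MEDIUMINT(8) UNSIGNED NOT NULL,
out_branch_id MEDIUMINT(8) UNSIGNED NOT NULL,
in_edited TINYINT(1) UNSIGNED NOT NULL DEFAULT 0,
out_edited TINYINT(1) UNSIGNED NOT NULL DEFAULT 0,
PRIMARY KEY (io_detail_id),
-- each detail row belongs to exactly one parent row
FOREIGN KEY (io_generated_id) REFERENCES io_generated (io_generated_id)
);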

This design has advantages and disadvantages compared to yours.

  • It's very easy for instance to tell whether there are IOs after 10 p.m., because it's just one database column we must look at. It's very difficult on the other hand to tell whether there are IOs after 10 p.m. for the third IO, because we'd first have to determine which detail row is third.
  • It's very easy to extend this and have five IOs some day instead of only four. Just add a row; we would not have to change the table designs at all. It's very hard or impossible on the other hand to guarantee to have exactly four IOs.
  • It's very easy to count how many distinct in-branches are involved (COUNT(DISTINCT in_branch_id)).
  • It's impossible to guarantee all detail rows' dates match the parent row's date. This, however, can easily be solved by switching from surrogate keys to natural composite keys.
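
One possible reading of the natural composite key idea from the last point, sketched in in-memory SQLite: the column names io_date, in_time and out_time are assumptions, and storing only a time of day in the detail row ignores shifts that cross midnight. The detail row carries the parent's (employee_id, io_date) as part of its own key, so it cannot point at a day for which no parent row exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Parent identified by a natural composite key instead of a surrogate id.
    CREATE TABLE io_generated (
        employee_id INTEGER NOT NULL,
        io_date     TEXT    NOT NULL,   -- ISO date, e.g. '2024-01-15'
        branch_id   INTEGER NOT NULL,
        PRIMARY KEY (employee_id, io_date)
    );
    -- The detail row inherits the date through the composite foreign key
    -- and stores only the time of day itself (an assumption of this sketch).
    CREATE TABLE io_detail (
        employee_id INTEGER NOT NULL,
        io_date     TEXT    NOT NULL,
        in_time     TEXT    NOT NULL,
        out_time    TEXT    NOT NULL,
        PRIMARY KEY (employee_id, io_date, in_time),
        FOREIGN KEY (employee_id, io_date)
            REFERENCES io_generated (employee_id, io_date)
    );
    INSERT INTO io_generated VALUES (1, '2024-01-15', 3);
    INSERT INTO io_detail    VALUES (1, '2024-01-15', '08:00', '12:00');
""")

# A detail row for a date that has no parent row is rejected.
try:
    conn.execute("INSERT INTO io_detail VALUES (1, '2024-01-16', '08:00', '12:00')")
    mismatched_date = "ok"
except sqlite3.IntegrityError:
    mismatched_date = "rejected"

io_detail_count = conn.execute("SELECT COUNT(*) FROM io_detail").fetchone()[0]
print(mismatched_date, io_detail_count)
```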

I hope this helps you get an idea of what to consider when deciding on one design or the other.
