SQL One to One Relationship VS. Single Table

You would normally split a table into two or more 1:1 related tables when it gets very wide (i.e. has many columns). Tables with too many columns are hard for programmers to work with, and in big companies such tables can easily have more than 100 columns.

So imagine a product table. There is a selling price and maybe another price that was used only for calculation and estimation. Wouldn't it be good to have two tables, one for the real values and one for the planning phase, so that a programmer would never confuse the two prices? Or take the logistics settings for the product. You want to insert into the product table, but with all these logistics attributes in it, do you need to set some of them? With two tables, you would insert into the product table, and another programmer responsible for logistics data would take care of the logistics table. No more confusion.
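
As a rough sketch of the split described above, here is how it could look, run through Python's built-in sqlite3 module so the SQL can be tried as is. The table and column names (product, product_logistics, weight_kg, pallet_qty) and the sample values are made up for illustration; they are not from the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE product (
    product_id    INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    selling_price NUMERIC NOT NULL
);
-- Hypothetical logistics attributes, keyed by the same product_id (1:1).
CREATE TABLE product_logistics (
    product_id INTEGER PRIMARY KEY REFERENCES product(product_id),
    weight_kg  NUMERIC,
    pallet_qty INTEGER
);
""")

# The product programmer inserts without touching any logistics column.
conn.execute("INSERT INTO product VALUES (1, 'Widget', 9.99)")
# The logistics programmer fills in the other table later.
conn.execute("INSERT INTO product_logistics VALUES (1, 0.25, 480)")

# Joining on the shared key gives back the wide view when it is needed.
row = conn.execute("""
    SELECT p.name, p.selling_price, l.weight_kg
    FROM product p JOIN product_logistics l USING (product_id)
""").fetchone()
print(row)
```

The price you pay is the JOIN whenever both halves are needed together.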

Another issue with many-column tables is that a full table scan is of course slower for a table with 150 columns than for one with half as many or fewer, because (with the usual row-oriented storage) every row takes more space to read.

A last point is access rights. With separate tables you can grant different rights on the product's main table and the product's logistics table.

So all in all, it is rather rare to see 1:1 relations, but they can give a clearer view on data and even help with performance issues and data access.

EDIT: I'm taking Mike Sherrill's advice and (hopefully) clarifying the point about normalization.

Normalization is mainly about avoiding redundancy and the related lack of consistency. The decision whether to hold data in only one table or in several 1:1 related tables has nothing to do with this. You can decide to split a user table into one table for personal information like first and last name and another for their school, graduation and job. Both tables would stay in the same normal form as the original table, because the data is no more or less redundant than before. The only column used twice would be the user id, but this is not redundant, because it is needed in both tables to identify a record.

So asking "Is it considered correct to normalize the settings into a separate table?" is not a valid question, because you don't normalize anything by putting data into a 1:1 related separate table.
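
To make that concrete, here is a small sketch (Python's built-in sqlite3, with invented table names and sample rows) that splits a user table into two 1:1 tables and joins them back. The rejoined rows are identical to the original ones, which is the sense in which nothing was normalized: no redundancy was removed or added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_wide (
    user_id    INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT,
    school     TEXT,
    job        TEXT
);
INSERT INTO user_wide VALUES
    (1, 'Ada',  'Lovelace', 'Home tutoring', 'Mathematician'),
    (2, 'Alan', 'Turing',   'Cambridge',     'Cryptanalyst');

-- Split into two 1:1 tables; only user_id appears in both.
-- (CREATE TABLE ... AS copies the data but not the key constraints.)
CREATE TABLE user_name   AS SELECT user_id, first_name, last_name FROM user_wide;
CREATE TABLE user_career AS SELECT user_id, school, job FROM user_wide;
""")

wide = conn.execute("SELECT * FROM user_wide ORDER BY user_id").fetchall()
rejoined = conn.execute("""
    SELECT n.user_id, n.first_name, n.last_name, c.school, c.job
    FROM user_name n JOIN user_career c ON c.user_id = n.user_id
    ORDER BY n.user_id
""").fetchall()

# Same rows before and after: the split only moved columns around.
print(rejoined == wide)
```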

Single table vs two one-to-one related tables performance

Obviously you can see the first design isn't in third normal form. Performance-wise, a properly normalized design will be on par with a flat table like the one in the first example, even when dealing with tens or hundreds of millions of records. The flat table will always be slightly faster, but only by a trivial amount if the tables are properly related. The issue with the first design is scalability over time: you're accepting an unstable foundation, should growth be needed, in exchange for a slight gain in performance.

It's a marginal difference at best. The single table will always have a slight advantage, which becomes more pronounced once you're dealing with hundreds of millions of records or more. But there are ways around that, such as partitioning the table into relevant blocks so the engine can multi-thread the results gathering and eliminate lots of unneeded records based on join and filter criteria.

As with any other kind of development, there is no single silver bullet. There are always exceptions to the rules; context matters for each question. However, the broad-brush approach says: normalize unless you KNOW there will NEVER be growth. (Never is a long time! But then maybe the system has a known shelf life and will never be around long enough for it to matter.)

Database design: One to one relationship from a single table?

The short answer is: not really.

A one to one relationship is what you get when you establish a relationship between the primary keys of two tables.

Taking your idea: say we have a bunch of users, but we track their log-in information in a table like this:

CREATE TABLE users (
    user_id  int NOT NULL PRIMARY KEY,
    username varchar(255),
    password varchar(255)
);

And another table like this:

CREATE TABLE user_personal (
    user_id   int NOT NULL PRIMARY KEY,
    age       int,
    firstname varchar(255),
    lastname  varchar(255)
);

Once you've made the two tables, it's easy to see that each table has a column with the same name and data type (user_id) that is used as the Primary Key for that table.

If you establish a relationship between the two tables using user_id as the key (for example, a foreign key from user_personal.user_id to users.user_id), you've established a one-to-one relationship, because any given user_id can appear at most once in each table.
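
Here is a small sketch of that behaviour, run through Python's built-in sqlite3 module. The REFERENCES clause is my addition to make the relationship explicit, and the sample rows and the try_insert helper are invented for the demo. The primary key rejects a second personal row for the same user, and the foreign key rejects a personal row for a user that doesn't exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE users (
    user_id  int NOT NULL PRIMARY KEY,
    username varchar(255),
    password varchar(255)
);
CREATE TABLE user_personal (
    user_id   int NOT NULL PRIMARY KEY REFERENCES users(user_id),
    age       int,
    firstname varchar(255),
    lastname  varchar(255)
);
INSERT INTO users VALUES (1, 'alice', 'not-a-real-hash');
INSERT INTO user_personal VALUES (1, 30, 'Alice', 'Example');
""")

def try_insert(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Same user_id a second time: blocked by the primary key.
second_row = try_insert("INSERT INTO user_personal VALUES (1, 31, 'Al', 'Other')")
# user_id with no matching users row: blocked by the foreign key.
orphan_row = try_insert("INSERT INTO user_personal VALUES (99, 40, 'No', 'User')")
print(second_row, orphan_row)
```

Strictly speaking this is "1 to 0..1", since nothing forces a user_personal row to exist for every user.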

This may seem a little weird, because you could ask why you would want to separate the data out like that. Why draw a distinction between the data fields you include in one table and those in the other? One simple reason is data segregation. Say we modified our second table to look like this:

CREATE TABLE user_personal (
    user_id                int NOT NULL PRIMARY KEY,
    age                    int,
    firstname              varchar(255),
    lastname               varchar(255),
    home_address           varchar(255),
    social_security_number char(11)  -- text, so leading zeros and dashes survive
);

We might want to give some people access to some of the data, but not to the rest. If you had your user-relationship table, it might be good to show some people what family/friendship relationships people have to each other, but you wouldn't want to be giving away their social security numbers and home addresses.

Another big reason to separate out the data is if your tables have a lot of columns (think in the hundreds), and only some of them get updated at any particular time. If you have millions of records that get updated, but only certain columns get updated together while the others remain pretty static, you could split that table into two tables with a one-to-one relationship to keep database performance from bogging down during updates.

When I should use one to one relationship?

1 to 0..1

  • The "1 to 0..1" between super and sub-classes is used as a part of "all classes in separate tables" strategy for implementing inheritance.

  • A "1 to 0..1" can be represented in a single table with the "0..1" portion covered by NULL-able fields. However, if the relationship is mostly "1 to 0" with only a few "1 to 1" rows, splitting off the "0..1" portion into a separate table might bring some storage (and cache performance) benefits. Some databases are thriftier at storing NULLs than others, so the "cut-off point" where this strategy becomes viable can vary considerably.
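
A minimal sketch of the split-off "0..1" portion, using Python's built-in sqlite3 module. The employee/company_car tables and their rows are hypothetical, chosen only to show the shape: the optional table holds a row only where the optional data exists, and a LEFT JOIN gives back the single-table view with NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- Optional part: a row exists only for the few employees with a company car.
CREATE TABLE company_car (
    employee_id INTEGER PRIMARY KEY REFERENCES employee(employee_id),
    plate       TEXT NOT NULL
);
INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO company_car VALUES (2, 'XYZ-123');
""")

# LEFT JOIN restores the single-table shape, with NULL where no car row exists.
rows = conn.execute("""
    SELECT e.name, c.plate
    FROM employee e LEFT JOIN company_car c USING (employee_id)
    ORDER BY e.employee_id
""").fetchall()
print(rows)
```

Whether this actually saves anything over NULL-able columns depends on the DBMS, as noted above.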

1 to 1

  • The real "1 to 1" vertically partitions the data, which may have implications for caching. Databases typically implement caches at the page level, not at the level of individual fields, so even if you select only a few fields from a row, typically the whole page that row belongs to will be cached. If a row is very wide and the selected fields relatively narrow, you'll end up caching a lot of information you don't actually need. In a situation like that, it may be useful to vertically partition the data, so only the narrower, more frequently used portion of the rows gets cached; more of them then fit into the cache, making the cache effectively "larger".

  • Another use of vertical partitioning is to change the locking behavior: databases typically cannot lock at the level of individual fields, only whole rows. By splitting the row, you allow a lock to be taken on only one of its halves.

  • Triggers are also typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do that could make this impractical. For example, Oracle doesn't let you modify the mutating table - by having separate tables, only one of them may be mutating so you can still modify the other one from your trigger.

  • Separate tables may allow more granular security.

These considerations are irrelevant in most cases, so in most cases you should consider merging the "1 to 1" tables into a single table.

See also: Why use a 1-to-1 relationship in database design?

Why use a 1-to-1 relationship in database design?

From the logical standpoint, a 1:1 relationship should always be merged into a single table.

On the other hand, there may be physical considerations for such "vertical partitioning" or "row splitting", especially if you know you'll access some columns more frequently or in a different pattern than the others, for example:

  • You might want to cluster or partition the two "endpoint" tables of a 1:1 relationship differently.
  • If your DBMS allows it, you might want to put them on different physical disks (e.g. more performance-critical on an SSD and the other on a cheap HDD).
  • You have measured the effect on caching and you want to make sure the "hot" columns are kept in cache, without "cold" columns "polluting" it.
  • You need a concurrency behavior (such as locking) that is "narrower" than the whole row. This is highly DBMS-specific.
  • You need different security on different columns, but your DBMS does not support column-level permissions.
  • Triggers are typically table-specific. While you can theoretically have just one table and have the trigger ignore the "wrong half" of the row, some databases may impose additional limits on what a trigger can and cannot do. For example, Oracle doesn't let you modify the so-called "mutating" table from a row-level trigger - by having separate tables, only one of them may be mutating, so you can still modify the other from your trigger (but there are other ways to work around that).
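
To illustrate the trigger point only, here is a sketch in SQLite (via Python's built-in sqlite3). SQLite does not have Oracle's mutating-table restriction, so this shows just the shape of the idea: the trigger fires on one half of the split and writes only to the other half. The account/account_stats tables and the trigger name are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    email      TEXT NOT NULL
);
CREATE TABLE account_stats (
    account_id    INTEGER PRIMARY KEY REFERENCES account(account_id),
    email_changes INTEGER NOT NULL DEFAULT 0
);
-- The trigger fires on account but only writes to account_stats.
CREATE TRIGGER account_email_changed
AFTER UPDATE OF email ON account
FOR EACH ROW
BEGIN
    UPDATE account_stats
    SET email_changes = email_changes + 1
    WHERE account_id = NEW.account_id;
END;
INSERT INTO account VALUES (1, 'old@example.com');
INSERT INTO account_stats (account_id) VALUES (1);
UPDATE account SET email = 'new@example.com' WHERE account_id = 1;
""")

changes = conn.execute(
    "SELECT email_changes FROM account_stats WHERE account_id = 1"
).fetchone()[0]
print(changes)
```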

Databases are very good at manipulating the data, so I wouldn't split the table just for the update performance, unless you have performed the actual benchmarks on representative amounts of data and concluded the performance difference is actually there and significant enough (e.g. to offset the increased need for JOINing).


On the other hand, if you are talking about "1:0 or 1" (and not a true 1:1), this is a different question entirely, deserving a different answer...

See also: When I should use one to one relationship?

Difference between one-to-one and one-to-many relationship in database

In a sense, the relationships we talk about are not known to the database; they are constructs we have invented to better understand how to design the tables.

The big difference in terms of table structure between one-to-one and one-to-many is that in one-to-one it is possible (but not necessary) to have a bidirectional relationship, meaning table A can have a foreign key into table B, and table B can have a foreign key into the associated record in table A. This is not possible with a one-to-many relationship.

One-to-one relationships associate one record in one table with a single record in the other table. One-to-many relationships associate one record in one table with many records in the other table.
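
In table terms, one common way to express the difference is a UNIQUE constraint on the foreign key. The sketch below (Python's built-in sqlite3; the person/passport/phone tables are hypothetical) shows the same foreign key twice: with UNIQUE it behaves as one-to-one, without it as one-to-many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
-- One-to-one: the foreign key is also UNIQUE, so at most one passport per person.
CREATE TABLE passport (
    passport_id INTEGER PRIMARY KEY,
    person_id   INTEGER NOT NULL UNIQUE REFERENCES person(person_id)
);
-- One-to-many: the same foreign key without UNIQUE, so many phones per person.
CREATE TABLE phone (
    phone_id  INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(person_id)
);
INSERT INTO person VALUES (1, 'Ann');
INSERT INTO passport VALUES (10, 1);
INSERT INTO phone VALUES (20, 1), (21, 1);
""")

# A second passport for the same person violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO passport VALUES (11, 1)")
    second_passport_ok = True
except sqlite3.IntegrityError:
    second_passport_ok = False

phone_count = conn.execute(
    "SELECT COUNT(*) FROM phone WHERE person_id = 1"
).fetchone()[0]
print(second_passport_ok, phone_count)
```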

One-to-One Relation or Use the Same Table?

I'm not exactly sure what your requirements are, but the choices are as follows:

  1. Give Reviews 2 columns, each a foreign key to the applicable table, and each allowed to be NULL. This is really for when a single review can be about both.

  2. Have a ReviewsComics and a ReviewsAnime table. You'd then have all the fields from Reviews in each table (and no Reviews table).

  3. An alternative to (2) is to use those two tables in conjunction with a Reviews table. Each of them then has only 2 fields, which are foreign keys to Reviews and Comics/Anime respectively (thus no direct link between Reviews and Comics/Anime).

  4. Have a base table to which Anime and Comics are each linked 1-to-1, and have reviews link to that table instead.

  5. (Edit) If the fields are all going to be the same (or similar) for Anime/Comics, you can merge them into a single table and add a type field indicating Anime/Comics; then the problem goes away. This is similar to the base table option.

EDIT: The 2 reviews tables will probably give the best performance (unless you want to select all reviews for either, often), but with proper indices the performance shouldn't be an issue with any of the above.
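
For what it's worth, here is a rough sketch of the base table option (4), run through Python's built-in sqlite3. The Titles table name, the episodes/issues columns and the sample rows are all my own placeholders, since I don't know your actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
-- Hypothetical base table shared by both kinds of title.
CREATE TABLE Titles (
    title_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
-- Anime and Comics each link 1-to-1 to the base table through a shared key.
CREATE TABLE Anime (
    title_id INTEGER PRIMARY KEY REFERENCES Titles(title_id),
    episodes INTEGER
);
CREATE TABLE Comics (
    title_id INTEGER PRIMARY KEY REFERENCES Titles(title_id),
    issues   INTEGER
);
-- Reviews point at the base table, so one Reviews table serves both.
CREATE TABLE Reviews (
    review_id INTEGER PRIMARY KEY,
    title_id  INTEGER NOT NULL REFERENCES Titles(title_id),
    body      TEXT
);
INSERT INTO Titles VALUES (1, 'Some Anime'), (2, 'Some Comic');
INSERT INTO Anime  VALUES (1, 12);
INSERT INTO Comics VALUES (2, 40);
INSERT INTO Reviews VALUES (100, 1, 'Great'), (101, 2, 'Fine'), (102, 2, 'Meh');
""")

# Reviews for comics only: join through the shared key.
comic_reviews = conn.execute("""
    SELECT r.review_id
    FROM Reviews r JOIN Comics c ON c.title_id = r.title_id
    ORDER BY r.review_id
""").fetchall()
print(comic_reviews)
```

Nothing in this sketch stops a title from being both an Anime and a Comic; if that matters, it needs an extra constraint or the type field from option 5.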


