How to Deal with M..N Relationships in a Relational Database

How do you deal with m..n relationships in a relational database?

Add another table called BookAuthors with columns for BookID, AuthorID, and NameUsed. A NULL value for NameUsed would mean to pull it from the Author's table instead. This is called an Intersection table.

Why does the m: n relationship in the DB design have to create a new relation?

Originally, Chen's method for mapping the entity-relationship model to the relational model prescribed that every relationship map to a separate relation (table). However, it's become common practice to denormalize one-to-one and one-to-many relationship relations into entity relations with the same determinant/key, to limit the number of tables in a database. The persistence of network data model thinking (records with references) also helps to popularize this approach. However, not all relationship relations can be denormalized - M:N binary as well as ternary and higher relationships have keys that involve multiple entity sets, and so necessarily require their own relations/tables.

For example, consider the following conceptual models:

ER diagram

A direct mapping of each entity relation and relationship relation to separate tables would give the following physical models:

Table diagram 1

Note that the one-to-one relationship Manages has two candidate keys, but the physical schema requires that we choose one as primary key. I chose department_id since denormalization of the relationship relation into the Employee entity relation would require a nullable column (since not every Employee is a manager).

More importantly, note that the N:1 and 1:1 relationship tables have the same keys as one of the entity tables. We can take advantage of this to combine those relations:

Denormalized relationships table diagram

However, the M:N and M:N:P relationships have composite keys, and can't be combined with other tables (unless, in a larger model, you coincidentally have multiple relationships with the same cardinalities between the same entity sets).

I also wonder why I place a foreign key in n relations instead of 1 when I have a 1: n relationship.

It's more correct to say that a relationship is determined by the combination of its independent entity sets. Values of these sets can appear any number of times, as indicated by M, N, etc. We indicate dependent entity sets with a 1 - these act similar to attributes, in that each unique combination of values in the independent roles determines a single value for each dependent role. This is the logical concept of functional dependency.

When we denormalize tables as above, we do so based on matching keys, which means only dependent columns get transferred from one table to another. In the end, it may look like we added a single FK column to the entity table on the many side of the relationship, but we actually added a mapping from the key to the dependent column, and we just didn't need to repeat the key.

Meaning of n:m and 1:n in database design

m:n is used to denote a many-to-many relationship (m objects on the other side related to n on the other) while 1:n refers to a one-to-many relationship (1 object on the other side related to n on the other).

How to model a database with many m:n relations on a table

Your design violates Fourth Normal Form. You're trying to store multiple "facts" in one table, and it leads to anomalies.

The Person_Attributes table should look something like this: personId jobId houseId restaurantId

So if I associate with one job, one house, but two restaurants, do I store the following?

personId jobId houseId restaurantId
1234 42 87 5678
1234 42 87 9876

And if I add a third restaurant, I copy the other columns?

personId jobId houseId restaurantId
1234 123 87 5678
1234 123 87 9876
1234 42 87 13579

Done! Oh, wait, what happened there? I changed jobs at the same time as adding the new restaurant. Now I'm incorrectly associated with two jobs, but there's no way to distinguish between that and correctly being associated with two jobs.

Also, even if it is correct to be associated with two jobs, shouldn't the data look like this?

personId jobId houseId restaurantId
1234 123 87 5678
1234 123 87 9876
1234 123 87 13579
1234 42 87 5678
1234 42 87 9876
1234 42 87 13579

It starts looking like a Cartesian product of all distinct values of jobId, houseId, and restaurantId. In fact, it is -- because this table is trying to store multiple independent facts.

Correct relational design requires a separate intersection table for each many-to-many relationship. Sorry, you have not found a shortcut.

(Many articles about normalization say the higher normal forms past 3NF are esoteric, and one never has to worry about 4NF or 5NF. Let this example disprove that claim.)


Re your comment about using NULL: Then you have a problem enforcing uniqueness, because a PRIMARY KEY constraint requires that all columns be NOT NULL.

personId jobId houseId restaurantId
1234 123 87 5678
1234 NULL NULL 9876
1234 NULL NULL 13579

Also, if I add a second house or a second jobId to the above table, which row do I put it in? You could end up with this:

personId jobId houseId restaurantId
1234 123 87 5678
1234 NULL NULL 9876
1234 42 NULL 13579

Now if I disassociate restaurantId 9876, I could update it to NULL. But that leaves a row of all NULLs, which I really should just delete.

personId jobId houseId restaurantId
1234 123 87 5678
1234 NULL NULL NULL
1234 42 NULL 13579

Whereas if I had disassociated restaurant 13579, I could update it to NULL and leave the row in place.

personId jobId houseId restaurantId
1234 123 87 5678
1234 NULL NULL 9876
1234 42 NULL NULL

But shouldn't I consolidate rows, moving the jobId to another row, provided there's a vacancy in that column?

personId jobId houseId restaurantId
1234 123 87 5678
1234 42 NULL 9876

The trouble is, now it's getting more and more complex to add or remove associations, requiring multiple SQL statements for changes. You're going to have to write a lot of tedious application code to handle this complexity.

However, all the various changes are easy if you define one table per many-to-many relationship. You do need the complexity of having that many more tables, but by doing that you will simplify your application code.

Adding an association to a restaurant is simply an INSERT to the Person_Restaurant table. Removing that association is simply a DELETE. It doesn't matter how many associations there are to jobs or houses. And you can define a primary key constraint in each of these intersection tables to enforce uniqueness.

Why no many-to-many relationships?

Think about a simple relationship like the one between Authors and Books. An author can write many books. A book could have many authors. Now, without a bridge table to resolve the many-to-many relationship, what would the alternative be? You'd have to add multiple Author_ID columns to the Books table, one for each author. But how many do you add? 2? 3? 10? However many you choose, you'll probably end up with a lot of sparse rows where many of the Author_ID values are NULL and there's a good chance that you'll run across a case where you need "just one more." So then you're either constantly modifying the schema to try to accommodate or you're imposing some artificial restriction ("no book can have more than 3 authors") to force things to fit.

How to normalize relationships between tables with references to a parent table

As you have it set up, the company owns the book and the author. This doesn't match reality; author(s) write books, companies publish books, and a book can have many editions published by many companies.

Formally:

  • A book has many authors.
  • A book has many editions.
  • An edition has a company (the publisher).
  • A book has many companies through their editions.

And from that we can derive...

  • An author has many books.
  • An author has many editions of their books.
  • An author has many companies through their book's editions.
  • A company has many editions.
  • A company has many books through their editions.
  • A company has many authors through their editions' books.
-- These can be publishers.
create table companies (
id bigserial primary key,
name text not null
);

-- These can be authors.
create table people (
id bigserial primary key,
name text not null
);

create table books (
id bigserial primary key,
title text not null
);

-- This can also include their role in the authoring.
-- For example: illustrator, editor
create table book_authors (
book_id bigint not null references books,
person_id bigint not null references people,

-- Can't author the same book twice.
unique(book_id, person_id)
);

-- This can also include information about the publishing such
-- as its year and language.
create table book_editions (
book_id bigint not null references books,
company_id bigint not null references companies,
edition text not null,

-- The same company can't publish the same edition twice.
unique(book_id, company_id, edition)
);

This is a generic, flexible book/author/publisher schema. However, the schema should be primarily defined by how you intend to use it.

many-to-many relationship in database design

There's nothing inherently wrong with having a many-to-many relationship, you'll just need to create a Junction Table (which is what it sounds like you're referring to with articlesTags) to facilitate that relationship.



Related Topics



Leave a reply



Submit