SQL - Many-To-Many Table Primary Key

SQL - many-to-many table primary key

With a simple two-column many-to-many mapping, I see no real advantage to having a surrogate key. Having a primary key on (col1,col2) is guaranteed unique (assuming your col1 and col2 values in the referenced tables are unique) and a separate index on (col2,col1) will catch those cases where the opposite order would execute faster. The surrogate is a waste of space.

You won't need indexes on the individual columns since the table should only ever be used to join the two referenced tables together.
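As a minimal sketch of that layout, assuming two hypothetical parent tables `table1` and `table2` keyed on `col1` and `col2` (none of these names come from the question):

```sql
-- Hypothetical parent tables; only their keys matter here.
CREATE TABLE table1 (col1 INT PRIMARY KEY);
CREATE TABLE table2 (col2 INT PRIMARY KEY);

-- Junction table: the composite primary key enforces uniqueness
-- and serves lookups that start from col1.
CREATE TABLE table1_table2 (
    col1 INT NOT NULL,
    col2 INT NOT NULL,
    PRIMARY KEY (col1, col2),
    FOREIGN KEY (col1) REFERENCES table1 (col1),
    FOREIGN KEY (col2) REFERENCES table2 (col2)
);

-- Reverse-order index for lookups that start from col2.
CREATE INDEX table1_table2_rev ON table1_table2 (col2, col1);
```

Each of the two indexes covers both columns, so a join in either direction can usually be answered from an index alone.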

That comment you refer to in the question is not worth the electrons it uses, in my opinion. It sounds like the author thinks the table is stored in an array rather than an extremely high performance balanced multi-way tree structure.

For a start, it's never necessary to store or get at the table sorted, just the index. And the index won't be stored sequentially, it'll be stored in an efficient manner to be able to be retrieved quickly.

In addition, the vast majority of database tables are read far more often than written. That makes anything you do on the select side far more relevant than anything on the insert side.

Should many to many tables have a primary key?

I agree with everything Oded said except

"It can't reasonably be used as a foreign key either."

In this case it's a matter of picking your poison. The mapping table absolutely can be a parent; it's just a question of whether the child uses a multicolumn FK or not.

Take a simple case of cars and colors. Each year, auto makers have a certain palette of colors, and each model comes in only a limited number of those colors. That is a many-to-many relationship between colors and car models.

So now design the Order table where new car orders are stored. Clearly Color and Model will be on the Order table. If you make an FK to each of those tables, the database will permit an incorrect model/color combination to be selected. (Of course you can enforce this in code, but you can't do so declaratively.) If you make the many-to-many table the parent, you'll only get combinations that have been specified.

So would you rather have a multicolumn FK that points to a PK built on both ModelID and ColorID, or do you want a single-column FK?

Pick your poison.
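Here is a rough sketch of the multicolumn FK variant. The table names (`Model`, `Color`, `ModelColor`, `CarOrder`) and the `OrderID` and `Name` columns are assumptions made for illustration; only ModelID and ColorID come from the discussion above.

```sql
CREATE TABLE Model (ModelID INT PRIMARY KEY, Name VARCHAR(50) NOT NULL);
CREATE TABLE Color (ColorID INT PRIMARY KEY, Name VARCHAR(50) NOT NULL);

-- Many-to-many table: lists the model/color combinations that exist.
CREATE TABLE ModelColor (
    ModelID INT NOT NULL,
    ColorID INT NOT NULL,
    PRIMARY KEY (ModelID, ColorID),
    FOREIGN KEY (ModelID) REFERENCES Model (ModelID),
    FOREIGN KEY (ColorID) REFERENCES Color (ColorID)
);

-- The order table uses the mapping table as its parent, so only
-- combinations listed in ModelColor can be ordered.
CREATE TABLE CarOrder (
    OrderID INT PRIMARY KEY,
    ModelID INT NOT NULL,
    ColorID INT NOT NULL,
    FOREIGN KEY (ModelID, ColorID) REFERENCES ModelColor (ModelID, ColorID)
);
```

The single-column alternative would give ModelColor its own surrogate key and have CarOrder reference that instead, at the cost of an extra join to find out which model and color were ordered.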

EDIT

But if it's not a parent of something, no table needs a surrogate key.

Primary key in many-to-many table

This is a "hard" question in the sense that there are pretty good arguments on both sides. I have a bias toward putting in auto-incremented ids in all tables that I use. Over time, I have found that this simply helps with the development process and I don't have to think about whether they are necessary.

A big reason for this is so foreign key references to the table can use only one column.

In a many-to-many junction table (aka "association table"), this probably isn't necessary:

It is unlikely that you will add a table with a foreign key relationship to a junction table.
You are going to want a unique index on the columns anyway.
They will probably be declared not null anyway.

Some databases actually store data based on the primary key. So, when you do an insert, data must be moved on pages to accommodate the new values. Postgres is not one of those databases. It treats the primary key index just like any other index. In other words, you are not incurring "extra" work by declaring one or more columns as a primary key.

My conclusion is that having the composite primary key is fine, even though I would probably have an auto-incremented primary key with separate constraints. The composite primary key will occupy less space, so it will probably be more efficient than an auto-incremented id. However, if there is any chance that this table would be used for a foreign key relationship, then add in another id field.
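The two layouts being compared might look like this in Postgres. All table and column names are hypothetical, and the identity column syntax assumes PostgreSQL 10 or later.

```sql
CREATE TABLE a (id INT PRIMARY KEY);
CREATE TABLE b (id INT PRIMARY KEY);

-- Option 1: composite primary key, no extra column.
CREATE TABLE a_b (
    a_id INT NOT NULL REFERENCES a (id),
    b_id INT NOT NULL REFERENCES b (id),
    PRIMARY KEY (a_id, b_id)
);

-- Option 2: auto-incremented primary key plus a separate unique constraint.
CREATE TABLE a_b_with_id (
    id   BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    a_id INT NOT NULL REFERENCES a (id),
    b_id INT NOT NULL REFERENCES b (id),
    UNIQUE (a_id, b_id)
);
```

Option 2 carries one more column and one more index than option 1, which is where the space difference comes from.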

One or Two Primary Keys in Many-to-Many Table?

You only have one primary key in either case. The second one is what's called a compound key. There's no good reason for introducing a new column. In practise, you will have to keep a unique index on all candidate keys. Adding a new column buys you nothing but maintenance overhead.

Go with option 2.

primary key in many to many table

Personally, I would go with solution 4:


user_language
---------------
user_id (FK)
lang_code (FK)

with composite PRIMARY KEY (user_id, lang_code)

I don't think adding the surrogate key (user_lang_id in solution 3) really adds any value to the schema and simply adds another column to have to worry about. The primary key is a good idea though to maintain uniqueness - without it, you could add the same user_id/lang_code combination multiple times.
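As a sketch, the layout above could be declared as follows. The parent tables `users` and `languages` and the column types are assumptions, since the question's other tables are not shown here.

```sql
-- Hypothetical parent tables.
CREATE TABLE users     (user_id   INT     PRIMARY KEY);
CREATE TABLE languages (lang_code CHAR(2) PRIMARY KEY);

CREATE TABLE user_language (
    user_id   INT     NOT NULL,
    lang_code CHAR(2) NOT NULL,
    PRIMARY KEY (user_id, lang_code),
    FOREIGN KEY (user_id)   REFERENCES users (user_id),
    FOREIGN KEY (lang_code) REFERENCES languages (lang_code)
);

INSERT INTO users (user_id) VALUES (1);
INSERT INTO languages (lang_code) VALUES ('en');

INSERT INTO user_language (user_id, lang_code) VALUES (1, 'en');
-- Repeating the same pair is rejected by the composite primary key.
INSERT INTO user_language (user_id, lang_code) VALUES (1, 'en');
```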

Primary key in complicated many to many MySQL table

Keep it very simple. Do not mix normalisation concepts with business/application requirements. It's best to have an integer-based, auto-incremented column as the primary key in each table and reference it in other tables.

If you need to enforce uniqueness of a combination of columns, use a composite index with a unique constraint instead. Business or application requirements keep on changing. You wouldn't want to make changes in the primary key when such times come.
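A minimal MySQL sketch of this approach, using a made-up `enrollment` table (the table, columns, and index names are illustrative and not taken from the question; foreign keys are left out to keep the example short):

```sql
CREATE TABLE enrollment (
    id         INT NOT NULL AUTO_INCREMENT,
    student_id INT NOT NULL,
    course_id  INT NOT NULL,
    PRIMARY KEY (id),
    -- Uniqueness of the combination lives in its own index.
    UNIQUE KEY ux_student_course (student_id, course_id)
);

-- Hypothetical later requirement: a student may retake a course in
-- another term. Only the unique index changes; the primary key and
-- anything referencing it stay as they are.
ALTER TABLE enrollment
    ADD COLUMN term SMALLINT NOT NULL DEFAULT 1,
    DROP INDEX ux_student_course,
    ADD UNIQUE KEY ux_student_course_term (student_id, course_id, term);
```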

In a many-to-many relationship in MySQL, do the foreign keys in the connecting table have to be the primary keys of both tables?

Your Existof Table is not flexible enough. The way most order processing systems deal with this situation is to add a column, which we can call Quantity, to the Existof table. The default value is 1, but other quantities can be put in as well.

So if a given order wants, say, 5 reams of paper, and a ream of paper is a product, the entry for this item in Existof will have a quantity of 5.

This assumes that all 5 reams are interchangeable, and therefore described by the same data. If some of the paper reams are of different colors, then they ought to be different products.
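A sketch of the adjusted table might look like the following. The key column names `order_id` and `product_id` and the sample values are assumptions, because the original Existof definition is not reproduced here.

```sql
CREATE TABLE Existof (
    order_id   INT NOT NULL,
    product_id INT NOT NULL,
    Quantity   INT NOT NULL DEFAULT 1,
    PRIMARY KEY (order_id, product_id)
);

-- Five interchangeable reams of paper on one order: one row, quantity 5.
INSERT INTO Existof (order_id, product_id, Quantity) VALUES (1001, 42, 5);

-- A product ordered once relies on the default quantity of 1.
INSERT INTO Existof (order_id, product_id) VALUES (1001, 43);
```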

Many-to-many relationship with compound primary key constraint

If you have to enforce a one-to-one relationship between (t1.id,t2.id) and t4.id, that seems to indicate that (t1.id,t2.id) would be unique.

If that's the case, if (t1.id,t2.id) should be UNIQUE in t3, then you could make that the PRIMARY KEY for t3.

You could add a foreign key reference to t4, and make that UNIQUE so no more than one row in t3 can be related to t4. Or, you can make the foreign key reference go the other way, and store the primary key values of t3 in t4.

For example, to put the foreign key to t4 in t3:

 CREATE TABLE t3
 ( t1_id  INT NOT NULL       COMMENT 'pk, fk ref t1'
 , t2_id  INT NOT NULL       COMMENT 'pk, fk ref t2'
 , t4_id  INT                COMMENT 'fk ref t4'
 , PRIMARY KEY (t1_id,t2_id)
 , CONSTRAINT t3_ux2 UNIQUE KEY (t4_id)
 , CONSTRAINT fk_t3_t1 FOREIGN KEY (t1_id) REFERENCES t1(id)
     ON DELETE CASCADE ON UPDATE CASCADE
 , CONSTRAINT fk_t3_t2 FOREIGN KEY (t2_id) REFERENCES t2(id)
     ON DELETE CASCADE ON UPDATE CASCADE
 , CONSTRAINT fk_t3_t4 FOREIGN KEY (t4_id) REFERENCES t4(id)
     ON DELETE RESTRICT ON UPDATE CASCADE
 ) ...

Or, you could introduce a surrogate primary key on t3. (We typically only do this if there are other tables that have a foreign key reference to t3, because now t3 is acting more like an actual entity than a pure relationship. It also saves us from having to use a composite key as a foreign key in another table to reference t3.)

For example:

 CREATE TABLE t3
 ( id     INT NOT NULL       COMMENT 'pk'
 , t1_id  INT NOT NULL       COMMENT 'fk ref t1'
 , t2_id  INT NOT NULL       COMMENT 'fk ref t2'
 , PRIMARY KEY (id)
 , CONSTRAINT t3_ux1 UNIQUE KEY (t1_id,t2_id)
 , CONSTRAINT fk_t3_t1 FOREIGN KEY (t1_id) REFERENCES t1(id) 
     ON DELETE CASCADE ON UPDATE CASCADE
 , CONSTRAINT fk_t3_t2 FOREIGN KEY (t2_id) REFERENCES t2(id) 
     ON DELETE CASCADE ON UPDATE CASCADE
 ) ...

You could add a foreign key in this table t3 to reference t4, as in the previous example.

Or you could implement the relationship from t4 to reference t3.

Either way works.
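For the second direction, a sketch of t4 referencing t3 could look like this, assuming t3 uses the composite PRIMARY KEY (t1_id,t2_id) from the first definition. The constraint names and the choice of ON DELETE CASCADE are illustrative only.

```sql
CREATE TABLE t4
( id     INT NOT NULL       COMMENT 'pk'
, t1_id  INT NOT NULL       COMMENT 'fk ref t3'
, t2_id  INT NOT NULL       COMMENT 'fk ref t3'
, PRIMARY KEY (id)
, CONSTRAINT t4_ux1 UNIQUE KEY (t1_id,t2_id)
, CONSTRAINT fk_t4_t3 FOREIGN KEY (t1_id,t2_id) REFERENCES t3(t1_id,t2_id)
    ON DELETE CASCADE ON UPDATE CASCADE
);
```

The UNIQUE KEY keeps the relationship one-to-one: no two rows in t4 can point at the same row in t3.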

My decision would be primarily based on

1) avoiding composite keys as foreign keys (if no entity tables have composite keys)

2) whether we want or need ON DELETE CASCADE functionality, and which way it needs to work: should a delete from t3 cascade to t4, or the other way around?

I showed ON DELETE RESTRICT in the example I gave adding t4_id fk column to t3. I figured that a delete from t4 should probably not "break" the relationship between t1 and t2 by removing rows from t3.
