SQL - Should I Use a Junction Table or Not

SQL - Should I use a junction table or not?

If a model belongs to only one brand then you can put the FK to brand on the model table (your second approach). The first way, with the junction table, is for a many-to-many relation.

In a junction table, should I use a Primary key and a unique constraint, or a compound/composite key?

As a staunch proponent of, and proselytizer for, the benefits of surrogate keys, I none-the-less make an exception for all-key join tables such as your first example. One of the benefits of surrogate keys is that engines are generally optimized for joining on single integer fields, as the default and most common circumstance.

Your first proposal still obtains this benefit, but also has a 50% greater fan-put on each index level, reducing both the overall size and height of the indices on the join table. Although the performance benefits of this are likely negligible for anything smaller than a massive table it is best practice and comes at no cost.

When I might opt for the other design is if the relation were to accrue additional columns. At that point it is no longer strictly a join table.

Use (or not) of junction tables

You are dealing with a many to many relationship - a student can subscribe to more than one magazine, and a magazine can have more than one subscriber.

The way to deal with this is with a junction table, otherwise you end up with duplicated data within either the student or magazine table.

Consider:

student
id name mag_id
---------------------
1 Joe 1
1 Joe 2
1 Joe 3
2 Tom 1
etc

There's needless duplication of entries here in order to accommodate the M-N relationship.

A junction table is going to make things way easier in the long run:

student
id name
1 Joe
2 Tom

magazine
id name
1 Weekly
2 Sports

junction
id stud_id mag_id
1 1 1
2 1 2
etc

This way you have unique entries for each student-magazine pairing, less duplication of other field entries, and an easier time querying the data.

What is the necessity of junction tables?

The way I see it, you have have several entities

CREATE TABLE StudentType
(
Id Int PRIMARY KEY,
Name NVarChar(50)
);

INSERT StudentType VALUES
(
(1, 'Basic'),
(2, 'Advanced'),
(3, 'SomeOtherCategory')
);

CREATE TABLE Student
(
Id Int PRIMARY KEY,
Name NVarChar(200),
OtherAttributeCommonToAllStudents Int,
Type Int,
CONSTRAINT FK_Student_StudentType
FOREIGN KEY (Type) REFERENCES StudentType(Id)
)

CREATE TABLE StudentAdvanced
(
Id Int PRIMARY KEY,
AdvancedOnlyAttribute Int,
CONSTRIANT FK_StudentAdvanced_Student
FOREIGN KEY (Id) REFERENCES Student(Id)
)

CREATE TABLE StudentSomeOtherCategory
(
Id Int PRIMARY KEY,
SomeOtherCategoryOnlyAttribute Int,
CONSTRIANT FK_StudentSomeOtherCategory_Student
FOREIGN KEY (Id) REFERENCES Student(Id)
)
  1. Any attributes that are common to all students have columns on the Student table.
  2. Types of student that have extra attributes are added to the StudentType table.
  3. Each extra student type gets a Student<TypeName> table to store its specific attributes. These tables have an optional one-to-one relationship with Student.

I think that your "straw-man" junction table is a partial implementation of an EAV anti-pattern, the only time this is sensible, is when you can't know what attributes you need to model, i.e. your data will be entirely unstructured. When this is a real requirment, relational databases start to look less desirable. On those occasions consider a NOSQL/Document database alternative.


A junction table would be useful in the following scenario.

Say we add a Class entity to the model.

CREATE TABLE Class
(
Id Int PRIMARY KEY,
...
)

Its concievable that we would like to store the many-to-many realtionship between students and classes.

CREATE TABLE Registration
(
Id Int PRIMARY KEY,
StudentId Int,
ClassId Int,
CONSTRAINT FK_Registration_Student
FOREIGN KEY (StudentId) REFERENCES Student(Id),
CONSTRAINT FK_Registration_Class
FOREIGN KEY (ClassId) REFERENCES Class(Id)
)

This entity would be the right place to store attributes that relate specifically to a student's registration to a class, perhaps a completion flag for instance. Other data would naturally relate to this junction, pehaps a class specific attendance record or a grade history.

If you don't relate Class and Student in this way, how would you select both, all the students in a class, and all the classes a student reads. Performance wise, this is easily optimised by indices on key columns.


When a many-to-many realtionships exists without any attributes I agree that logically, the junction table needn't exist. However, in a relational database, junction tables are still a useful physical implmentaion, perhaps like this,

CREATE TABLE StudentClass
(
StudentId Int,
ClassId Int,
CONSTRAINT PK_StudentClass PRIMARY KEY (ClassId, StudentId),
CONSTRAINT FK_Registration_Student
FOREIGN KEY (StudentId) REFERENCES Student(Id),
CONSTRAINT FK_Registration_Class
FOREIGN KEY (ClassId) REFERENCES Class(Id)
)

this allows simple queries like

// students in a class?
SELECT StudentId
FROM StudentClass
WHERE ClassId = @classId

// classes read by a student?
SELECT ClassId
FROM StudentClass
WHERE StudentId = @studentId

additionaly, this enables a simple way to manage the relationship, partially or completely from either aspect, that will be familar to relational database developers and sargeable by query optimisers.



Related Topics



Leave a reply



Submit