Explanation of Self-Joins

Explanation of self-joins

You can view self-join as two identical tables. But in normalization, you cannot create two copies of the table so you just simulate having two tables with self-join.

Suppose you have two tables:

Table emp1

Id Name Boss_id            
1 ABC 3
2 DEF 1
3 XYZ 2

Table emp2

Id Name Boss_id            
1 ABC 3
2 DEF 1
3 XYZ 2

Now, if you want to get the name of each employee with his or her boss' names:

select c1.Name , c2.Name As Boss
from emp1 c1
inner join emp2 c2 on c1.Boss_id = c2.Id

Which will output the following table:

Name  Boss
ABC XYZ
DEF ABC
XYZ DEF

What is SELF JOIN and when would you use it?

You use a self join when a table references data in itself.

E.g., an Employee table may have a SupervisorID column that points to the employee that is the boss of the current employee.

To query the data and get information for both people in one row, you could self join like this:

select e1.EmployeeID, 
e1.FirstName,
e1.LastName,
e1.SupervisorID,
e2.FirstName as SupervisorFirstName,
e2.LastName as SupervisorLastName
from Employee e1
left outer join Employee e2 on e1.SupervisorID = e2.EmployeeID

What is a self join for? (in english)

The reason why the employee-manager example is so common, is because it hits the nail on the head. A self join on a table looks for pairs of rows, like any join, but with both rows coming from the same table. Nothing special really.

Self joins. Does Inner, Outer, or Left matter?

It all depends on what you want to do with the data. This answer does a great job of detailing what a self inner join might look like. I recently wrote a report that required comparing grades from two courses a student took in succession. It went something like this:

Given a table student_course:

STUDENT_ID  COURSE  GRADE
1 MTH251 A
1 MTH252 B
2 MTH251 A
2 MTH252 A
3 MTH251 B
3 MTH252 C

Query:

SELECT course1.student_id
, course1.course AS course1
, course1.grade AS grade1
, course2.course AS course2
, course2.grade AS grade2
FROM student_course course1
INNER JOIN student_course course2
ON course1.student_id = course2.student_id
WHERE course1.course = 'MTH251'
AND course2.course = 'MTH252';

Fiddle here. Sorry, the PostgreSQL fiddle wasn't working for me so I used Oracle for testing. The PostgreSQL equivalent should look roughly the same.

Now say I wanted to see a student who may not have taken MTH252. You could do this:

SELECT course1.student_id
, course1.course AS course1
, course1.grade AS grade1
, course2.course AS course2
, course2.grade AS grade2
FROM student_course course1
LEFT OUTER JOIN student_course course2
ON course1.student_id = course2.student_id
AND course2.course = 'MTH252'
WHERE course1.course = 'MTH251';

Other Fiddle

The former displays students who have taken BOTH MTH251 and MTH252, and the latter shows students who have taken MTH251, regardless of their completion of MTH252.

As noted by Nick.McDermaid, a self join works exactly like joining two tables with different data.

Understanding Order of ON Clause in Self-Joins (SQL)

The functionally the order doesn't matter (so, 'ON e.ManagerId = m.Id' is the same as 'ON m.Id = e.ManagerId').

What you are doing here is joining on different columns, which represent different things.

In the incorrect query, you are saying "the managers managerID is the same as the employees ID", which isn't true. Managers (as you've got it in your table) don't have managers themselves.

What you've essentially done is inverse the join. If you were to swap your sign around in you where statement, so WHERE e.Salary > m.Salary to WHERE e.Salary < m.Salary you'd get the same answer as your correct query



Related Topics



Leave a reply



Submit