When Is a Good Situation to Use a Full Outer Join

When is a good situation to use a full outer join?

It's rare, but I have a few cases where it's used. Typically in exception reports or ETL or other very peculiar situations where both sides have data you are trying to combine.

The alternative is to use an INNER JOIN, a LEFT JOIN (with right side IS NULL) and a RIGHT JOIN (with left side IS NULL) and do a UNION - sometimes this approach is better because you can customize each individual join more obviously (and add a derived column to indicate which side is found or whether it's found in both and which one is going to win).

What is the reason behind a Full outer Join resulting in more rows than either parent data set?

Check this out:

TblJob
Name, Surname, Job
John, White, Developer
John, Black, Tester
John, Grey, Manager

TblDrinksPref
Name, Surname, Drink
John, White, Coffee
John, Black, Tea
John, Grey, Orange Juice

SELECT * FROM tbljob j JOIN tbldrinkspref p ON j.name = p.name

John, White, Developer, John, White, Coffee
John, White, Developer, John, Black, Tea
John, White, Developer, John, Grey, Orange Juice
John, Black, Tester, John, White, Coffee
John, Black, Tester, John, Black, Tea
John, Black, Tester, John, Grey, Orange Juice
John, Grey, Manager, John, White, Coffee
John, Grey, Manager, John, Black, Tea
John, Grey, Manager, John, Grey, Orange Juice

By joining in just first name every row in each table matches the other. 3 rows in each table result in 3x3 tables output; more than the sum of the rows. The most rows you'll get out of a join is the multiplication of the numbers of rows heading into the join. We call this a Cartesian product and it's usually an indication that there is a bug in your SQL joins. Any join can do this, not just outer ones. There exists a join (called a CROSS JOIN) whose sole purpose is to produce an output that is a perfect Cartesian product because sometimes we do want to do it but mostly it indicates a problem

What can you do about it? Don't join rows to rows that are unrelated, by making your join conditions better/more accurate:

SELECT * 
FROM tbljob j JOIN tbldrinkspref p
ON j.name = p.name
--the last name is vital to associate rows correctly in this case
AND j.surname = p.surname

If you have written a large SQL and are getting certain rows unexpectedly duplicated it means one of your joins is faulty. Comment them all out back to just the first table, and comment the select block, then keep rerunning the sql as you add joins back in. When you see the rowcount increase unexpectedly that's possibly the fail but be aware that a join may cause rows to disappear too, and you might be in a situation where adding a join may cause half the rows to disappear because they don't match the join predicate but the other half of the rows to double because the join is faulty. You must keep in mind the data you're joining when assessing how the row count should change as a result of adding a table in, versus how it does actually change

What is the purpose (or use case) for an outer join in SQL?

An example use case would be to produce a report that shows ALL customers and their purchases. That is, show even customers who have not purchased anything. If you do an ordinary join of customers and purchases, the report would show only those customers with at least one purchase.

Why does OUTER JOIN need an equality rule?

You are confusing a cross join with a inner/outer(left,right,full) joins. A cross join will match each row with every other row without a condition. The inner/outer joins match on those satisfying the condition in the ON clause. The primary difference among the inner and outer joins is how a non-match is handled. The non-match concept does not apply to a cross join.

Try running these queries as an example. Note the last couple of queries using a full outer join. The ON clause with 1=1 simulates the cross join. The ON clause with 1=0 returns one record from each table.

BTW, I added a ORDER BY in the ON clause with 1=1 because the order was not that of the cross join. (Wanted it easy to compare.) Even without the ORDER BY clause, the cost of this trick is about 100x using a cross join for this very small example.

set nocount on

print 'Inner join:'

select *
from (
values (1,'A'),(2,'B'),(3,'C'),(25,'Y')
) t1 (id, val)
inner join (
values (1, 'A'),(2,'B'),(3,'C'),(26,'Z')
) t2 (id, val)
ON t1.id = t2.id

print 'Left outer join:'

select *
from (
values (1,'A'),(2,'B'),(3,'C'),(25,'Y')
) t1 (id, val)
left outer join (
values (1, 'A'),(2,'B'),(3,'C'),(26,'Z')
) t2 (id, val)
ON t1.id = t2.id

print 'Right outer join:'

select *
from (
values (1,'A'),(2,'B'),(3,'C'),(25,'Y')
) t1 (id, val)
right outer join (
values (1, 'A'),(2,'B'),(3,'C'),(26,'Z')
) t2 (id, val)
ON t1.id = t2.id

print 'Full outer join:'

select *
from (
values (1,'A'),(2,'B'),(3,'C'),(25,'Y')
) t1 (id, val)
full outer join (
values (1, 'A'),(2,'B'),(3,'C'),(26,'Z')
) t2 (id, val)
ON t1.id = t2.id

print 'Cross join:'

select *
from (
values (1,'A'),(2,'B'),(3,'C'),(25,'Y')
) t1 (id, val)
cross join (
values (1, 'A'),(2,'B'),(3,'C'),(26,'Z')
) t2 (id, val)

print 'Full outer join with everything matching anything - a fake cross join:'

select *
from (
values (1,'A'),(2,'B'),(3,'C'),(25,'Y')
) t1 (id, val)
full outer join (
values (1, 'A'),(2,'B'),(3,'C'),(26,'Z')
) t2 (id, val)
ON 1 = 1
ORDER BY t1.id, t2.id

print 'Full outer join with no matches ever:'

select *
from (
values (1,'A'),(2,'B'),(3,'C'),(25,'Y')
) t1 (id, val)
full outer join (
values (1, 'A'),(2,'B'),(3,'C'),(26,'Z')
) t2 (id, val)
ON 1 = 0
Inner join:

id val id val
----------- ---- ----------- ----
1 A 1 A
2 B 2 B
3 C 3 C

Left outer join:

id val id val
----------- ---- ----------- ----
1 A 1 A
2 B 2 B
3 C 3 C
25 Y NULL NULL

Right outer join:

id val id val
----------- ---- ----------- ----
1 A 1 A
2 B 2 B
3 C 3 C
NULL NULL 26 Z

Full outer join:

id val id val
----------- ---- ----------- ----
1 A 1 A
2 B 2 B
3 C 3 C
NULL NULL 26 Z
25 Y NULL NULL

Cross join:

id val id val
----------- ---- ----------- ----
1 A 1 A
1 A 2 B
1 A 3 C
1 A 26 Z
2 B 1 A
2 B 2 B
2 B 3 C
2 B 26 Z
3 C 1 A
3 C 2 B
3 C 3 C
3 C 26 Z
25 Y 1 A
25 Y 2 B
25 Y 3 C
25 Y 26 Z

Full outer join with everything matching anything - a fake cross join:

id val id val
----------- ---- ----------- ----
1 A 1 A
1 A 2 B
1 A 3 C
1 A 26 Z
2 B 1 A
2 B 2 B
2 B 3 C
2 B 26 Z
3 C 1 A
3 C 2 B
3 C 3 C
3 C 26 Z
25 Y 1 A
25 Y 2 B
25 Y 3 C
25 Y 26 Z

Full outer join with no matches ever:

id val id val
----------- ---- ----------- ----
1 A NULL NULL
2 B NULL NULL
3 C NULL NULL
25 Y NULL NULL
NULL NULL 1 A
NULL NULL 2 B
NULL NULL 3 C
NULL NULL 26 Z

What is a good way to make multiple full outer join?

This fiddle illustrates the problem.

If you want the rows from tables B and C to join, you need to accomodate the fact that maybe the data comes from table B and not A. The easiest is probably to use COALESCE.

Your join should therefore look like:

SELECT a.*, b.*, c.*
FROM tableA a
FULL JOIN tableB b ON a.id = b.id
FULL JOIN tableC c ON COALESCE(a.id, b.id) = c.id
-- FULL JOIN tableD d ON COALESCE(a.id, b.id, c.id) = d.id
-- FULL JOIN tableE e ON COALESCE(a.id, b.id, c.id, d.id) = e.id

Is FULL OUTER JOIN really such a bad thing here?

You want a full join here, so I don't think it's considered harmful at all. That's the best way to get to your solution set. I'd run the query execution play on each way to see which would be fastest, but my guess is that the full join would be.

Just remember that there is a distinct difference between a condition in a where clause and a condition in a join clause when dealing with outer joins. Where limits the whole result set, join limits the rows that get matched.

Should I use an SQL full outer join for this?

You should write your query as follows:

SELECT <columns>
FROM A INNER JOIN B
ON
A.RELATED_DOC_NUM = B.DOC_NUM
WHERE
A.DOC_TYPE IN ('ST') AND
B.DOC_TYPE IN ('OT') AND
(A.NEXT_STATUS < 999 OR B.NEXT_STATUS < 999)

LEFT join is a type of OUTER join; LEFT JOIN is typically a contraction of LEFT OUTER JOIN). OUTER means "one side might have nulls in every column because there was no match". Most critically, the code as posted in the question (with a LEFT JOIN, but then has WHERE some_column_from_the_right_table = some_value) runs as an INNER join, because any NULLs inserted by the LEFT OUTER process, are then quashed by the WHERE clause

why ON clause used on full outer join?

A cross join produces a Cartesian product between the two tables, returning all possible combinations of all rows. It has no on clause because you're just joining everything to everything.

A full outer join is a combination of a left outer and right outer join. It returns all rows in both tables that match the query's where clause, and in cases where the on condition can't be satisfied for those rows it puts null values in for the un-populated fields.

From @Donnie answer



Related Topics



Leave a reply



Submit