SQL JOIN and different types of JOINs
What is SQL JOIN
?
SQL JOIN
is a method to retrieve data from two or more database tables.
What are the different SQL JOIN
s ?
There are a total of five JOIN
s. They are :
1. JOIN or INNER JOIN
2. OUTER JOIN
2.1 LEFT OUTER JOIN or LEFT JOIN
2.2 RIGHT OUTER JOIN or RIGHT JOIN
2.3 FULL OUTER JOIN or FULL JOIN
3. NATURAL JOIN
4. CROSS JOIN
5. SELF JOIN
1. JOIN or INNER JOIN :
In this kind of a JOIN
, we get all records that match the condition in both tables, and records in both tables that do not match are not reported.
In other words, INNER JOIN
is based on the single fact that: ONLY the matching entries in BOTH the tables SHOULD be listed.
Note that a JOIN
without any other JOIN
keywords (like INNER
, OUTER
, LEFT
, etc) is an INNER JOIN
. In other words, JOIN
is
a Syntactic sugar for INNER JOIN
(see: Difference between JOIN and INNER JOIN).
2. OUTER JOIN :
OUTER JOIN
retrieves
Either,
the matched rows from one table and all rows in the other table
Or,
all rows in all tables (it doesn't matter whether or not there is a match).
There are three kinds of Outer Join :
2.1 LEFT OUTER JOIN or LEFT JOIN
This join returns all the rows from the left table in conjunction with the matching rows from the
right table. If there are no columns matching in the right table, it returns NULL
values.
2.2 RIGHT OUTER JOIN or RIGHT JOIN
This JOIN
returns all the rows from the right table in conjunction with the matching rows from the
left table. If there are no columns matching in the left table, it returns NULL
values.
2.3 FULL OUTER JOIN or FULL JOIN
This JOIN
combines LEFT OUTER JOIN
and RIGHT OUTER JOIN
. It returns rows from either table when the conditions are met and returns NULL
value when there is no match.
In other words, OUTER JOIN
is based on the fact that: ONLY the matching entries in ONE OF the tables (RIGHT or LEFT) or BOTH of the tables(FULL) SHOULD be listed.
Note that `OUTER JOIN` is a loosened form of `INNER JOIN`.
3. NATURAL JOIN :
It is based on the two conditions :
- the
JOIN
is made on all the columns with the same name for equality. - Removes duplicate columns from the result.
This seems to be more of theoretical in nature and as a result (probably) most DBMS
don't even bother supporting this.
4. CROSS JOIN :
It is the Cartesian product of the two tables involved. The result of a CROSS JOIN
will not make sense
in most of the situations. Moreover, we won't need this at all (or needs the least, to be precise).
5. SELF JOIN :
It is not a different form of JOIN
, rather it is a JOIN
(INNER
, OUTER
, etc) of a table to itself.
JOINs based on Operators
Depending on the operator used for a JOIN
clause, there can be two types of JOIN
s. They are
- Equi JOIN
- Theta JOIN
1. Equi JOIN :
For whatever JOIN
type (INNER
, OUTER
, etc), if we use ONLY the equality operator (=), then we say that
the JOIN
is an EQUI JOIN
.
2. Theta JOIN :
This is same as EQUI JOIN
but it allows all other operators like >, <, >= etc.
Many consider both
EQUI JOIN
and ThetaJOIN
similar toINNER
,OUTER
etcJOIN
s. But I strongly believe that its a mistake and makes the
ideas vague. BecauseINNER JOIN
,OUTER JOIN
etc are all connected with
the tables and their data whereasEQUI JOIN
andTHETA JOIN
are only
connected with the operators we use in the former.Again, there are many who consider
NATURAL JOIN
as some sort of
"peculiar"EQUI JOIN
. In fact, it is true, because of the first
condition I mentioned forNATURAL JOIN
. However, we don't have to
restrict that simply toNATURAL JOIN
s alone.INNER JOIN
s,OUTER JOIN
s
etc could be anEQUI JOIN
too.
What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?
Reading this original article on The Code Project will help you a lot: Visual Representation of SQL Joins.
Also check this post: SQL SERVER – Better Performance – LEFT JOIN or NOT IN?.
Find original one at: Difference between JOIN and OUTER JOIN in MySQL.
Joining Different Data Types
If ProductNumber can contain MORE than the StoreProductNumber:
LEFT JOIN b
ON a.ProductNumber LIKE '%' + CAST(b.StoreProductNumber as varchar(n)) + '%'
Obviously you can modify n
to make sure your StoreProductNumber isn't truncated.
However, if they're guaranteed to be the SAME (just different datatypes), you can just compare them directly:
LEFT JOIN b
on CAST(a.ProductNumber as BIGINT) = b.StoreProductNumber
And if you don't need a BIGINT
, you can use an INT
or whatever datatype you require.
Lastly, as HLGEM pointed out, SQL will do implicit conversions for you, so technically this would also work:
LEFT JOIN b
on a.ProductNumber = b.StoreProductNumber
But, I prefer to do all conversions explicitly for clarity, so I suggest against this approach.
What is the difference between INNER JOIN and OUTER JOIN?
Assuming you're joining on columns with no duplicates, which is a very common case:
An inner join of A and B gives the result of A intersect B, i.e. the inner part of a Venn diagram intersection.
An outer join of A and B gives the results of A union B, i.e. the outer parts of a Venn diagram union.
Examples
Suppose you have two tables, with a single column each, and data as follows:
A B
- -
1 3
2 4
3 5
4 6
Note that (1,2) are unique to A, (3,4) are common, and (5,6) are unique to B.
Inner join
An inner join using either of the equivalent queries gives the intersection of the two tables, i.e. the two rows they have in common.
select * from a INNER JOIN b on a.a = b.b;
select a.*, b.* from a,b where a.a = b.b;
a | b
--+--
3 | 3
4 | 4
Left outer join
A left outer join will give all rows in A, plus any common rows in B.
select * from a LEFT OUTER JOIN b on a.a = b.b;
select a.*, b.* from a,b where a.a = b.b(+);
a | b
--+-----
1 | null
2 | null
3 | 3
4 | 4
Right outer join
A right outer join will give all rows in B, plus any common rows in A.
select * from a RIGHT OUTER JOIN b on a.a = b.b;
select a.*, b.* from a,b where a.a(+) = b.b;
a | b
-----+----
3 | 3
4 | 4
null | 5
null | 6
Full outer join
A full outer join will give you the union of A and B, i.e. all the rows in A and all the rows in B. If something in A doesn't have a corresponding datum in B, then the B portion is null, and vice versa.
select * from a FULL OUTER JOIN b on a.a = b.b;
a | b
-----+-----
1 | null
2 | null
3 | 3
4 | 4
null | 6
null | 5
LEFT JOIN vs. LEFT OUTER JOIN in SQL Server
As per the documentation: FROM (Transact-SQL):
<join_type> ::=
[ { INNER | { { LEFT | RIGHT | FULL } [ OUTER ] } } [ <join_hint> ] ]
JOIN
The keyword OUTER
is marked as optional (enclosed in square brackets). In this specific case, whether you specify OUTER
or not makes no difference. Note that while the other elements of the join clause is also marked as optional, leaving them out will make a difference.
For instance, the entire type-part of the JOIN
clause is optional, in which case the default is INNER
if you just specify JOIN
. In other words, this is legal:
SELECT *
FROM A JOIN B ON A.X = B.Y
Here's a list of equivalent syntaxes:
A LEFT JOIN B A LEFT OUTER JOIN B
A RIGHT JOIN B A RIGHT OUTER JOIN B
A FULL JOIN B A FULL OUTER JOIN B
A INNER JOIN B A JOIN B
Also take a look at the answer I left on this other SO question: SQL left join vs multiple tables on FROM line?.
Difference between JOIN and INNER JOIN
They are functionally equivalent, but INNER JOIN
can be a bit clearer to read, especially if the query has other join types (i.e. LEFT
or RIGHT
or CROSS
) included in it.
Is the order of joining tables indifferent as long as we chose proper join types?
In an inner join, the ordering of the tables in the join doesn't matter - the same rows will make up the result set regardless of the order they are in the join statement.
In either a left or right outer join, the order DOES matter. In A left join B, your result set will contain one row for every record in table A, irrespective of whether there is a matching row in table B. If there are non matching rows, this is likely to be a different result set to B left join A.
In a full outer join, the order again doesn't matter - rows will be produced for each row in each joined table no matter what their order.
Regarding A left join B vs B right join A - these will produce the same results. In simple cases with 2 tables, swapping the tables and changing the direction of the outer join will result in the same result set.
This will also apply to 3 or more tables if all of the outer joins are in the same direction - A left join B left join C will give the same set of results as C right join B right join A.
If you start mixing left and right joins, then you will need to start being more careful. There will almost always be a way to make an equivalent query with re-ordered tables, but at that point sub-queries or bracketing off expressions might be the best way to clarify what you are doing.
As another commenter states, using whatever makes your purpose most clear is usually the best option. The ordering of the tables in your query should make little or no difference performance wise, as the query optimiser should work this out (although the only way to be sure of this would be to check the execution plans for each option with your own queries and data).
SQL Join Types and Performance: Cross vs Inner
Your first example is normally called an explicit join and the second one an implicit join. Performance-wise, they should be equivalent, at least in the popular DBMSes.
Related Topics
Why Is Select * Considered Harmful
Simple Way to Transpose Columns and Rows in Sql
Select Top 10 Records For Each Category
Index For Finding an Element in a Json Array
Select Values That Meet Different Conditions on Different Rows
Listagg in Oracle to Return Distinct Values
How to For SQL Output Clause to Return a Column Not Being Inserted
SQL Server Recursive Self Join
What Is the Expected Behaviour For Multiple Set-Returning Functions in Select Clause
Calculate a Running Total in SQL Server
How to Return Only the Date from a SQL Server Datetime Datatype
Using Column Alias in Where Clause of MySQL Query Produces an Error
Referring to a Column Alias in a Where Clause
How to Do a Batch Insert in MySQL