SQL Join and Different Types of Joins

What is SQL JOIN ?

SQL JOIN is a method to retrieve data from two or more database tables.

What are the different SQL JOINs ?

There are a total of five JOINs. They are :

  1. JOIN or INNER JOIN
  2. OUTER JOIN
     2.1 LEFT OUTER JOIN or LEFT JOIN
     2.2 RIGHT OUTER JOIN or RIGHT JOIN
     2.3 FULL OUTER JOIN or FULL JOIN
  3. NATURAL JOIN
  4. CROSS JOIN
  5. SELF JOIN

1. JOIN or INNER JOIN :

In this kind of JOIN, we get only the records that satisfy the join condition in both tables; records in either table that have no match are not returned.

In other words, INNER JOIN is based on the single fact that: ONLY the matching entries in BOTH the tables SHOULD be listed.

Note that a JOIN without any other JOIN keyword (like INNER, OUTER, LEFT, etc.) is an INNER JOIN. In other words, JOIN is
syntactic sugar for INNER JOIN (see: Difference between JOIN and INNER JOIN).

2. OUTER JOIN :

OUTER JOIN retrieves either the matched rows from one table and all rows in the other table,
or all rows in all tables (it doesn't matter whether or not there is a match).

There are three kinds of Outer Join :

2.1 LEFT OUTER JOIN or LEFT JOIN

This JOIN returns all the rows from the left table together with the matching rows from the
right table. If a left-table row has no matching row in the right table, the right table's columns are returned as NULL.

2.2 RIGHT OUTER JOIN or RIGHT JOIN

This JOIN returns all the rows from the right table together with the matching rows from the
left table. If a right-table row has no matching row in the left table, the left table's columns are returned as NULL.

2.3 FULL OUTER JOIN or FULL JOIN

This JOIN combines LEFT OUTER JOIN and RIGHT OUTER JOIN. It returns all rows from both tables: rows that satisfy the join condition are combined, and where a row has no match, the columns of the other table are NULL.

In other words, OUTER JOIN is based on the fact that: ALL the entries in ONE OF the tables (RIGHT or LEFT) or in BOTH of the tables (FULL) SHOULD be listed, together with the matching entries from the other table where they exist.

Note that `OUTER JOIN` is a loosened form of `INNER JOIN`.

3. NATURAL JOIN :

It is based on the two conditions :

  1. The JOIN compares all the columns that have the same name in both tables for equality.
  2. Duplicate columns are removed from the result.

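NATURAL JOIN is part of standard SQL and is supported by several DBMSs (PostgreSQL, MySQL, Oracle and SQLite,
for example), although SQL Server does not support it. Because the join columns are chosen implicitly by name
rather than spelled out, it tends to be used less than the other JOINs.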
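
The two conditions above can be sketched with Python's built-in sqlite3 module (SQLite is one engine that accepts NATURAL JOIN). The employee and department tables and their contents are hypothetical, made up only for this illustration; the only thing that matters is that both tables have a column named dept_id.

```python
import sqlite3

# Hypothetical tables; both contain a column called dept_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_name TEXT, dept_id INTEGER);
    CREATE TABLE department (dept_id INTEGER, dept_name TEXT);
    INSERT INTO employee VALUES ('Ann', 1), ('Bob', 2), ('Cid', 9);
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Support');
""")

# No ON clause: the join condition is implied by the shared column name.
cur = conn.execute(
    "SELECT * FROM employee NATURAL JOIN department ORDER BY emp_name"
)
natural_columns = [d[0] for d in cur.description]
natural_rows = cur.fetchall()

print(natural_columns)  # dept_id is listed once, not twice
print(natural_rows)     # 'Cid' is absent: dept_id 9 has no match
```

Note that renaming a column in either table silently changes the join condition, which is one reason to prefer an explicit ON clause.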

4. CROSS JOIN :

It is the Cartesian product of the two tables involved: every row of the first table is paired with every row
of the second. The result of a CROSS JOIN will not make sense in most situations, so it is the JOIN we need least often.
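
As a small sketch of the Cartesian product, the following uses Python's sqlite3 module with two hypothetical tables (size and colour, invented for this example). With 3 rows in one table and 2 in the other, the CROSS JOIN yields 3 x 2 = 6 rows.

```python
import sqlite3

# Hypothetical tables used only to show the row count of a CROSS JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE size (size_name TEXT);
    CREATE TABLE colour (colour_name TEXT);
    INSERT INTO size VALUES ('S'), ('M'), ('L');
    INSERT INTO colour VALUES ('red'), ('blue');
""")

# No join condition at all: every size is paired with every colour.
cross_rows = conn.execute(
    "SELECT size_name, colour_name FROM size CROSS JOIN colour"
).fetchall()

print(len(cross_rows))  # 6
```

Generating every combination like this is one of the few cases where a CROSS JOIN is what you actually want.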

5. SELF JOIN :

It is not a different form of JOIN, rather it is a JOIN (INNER, OUTER, etc) of a table to itself.
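
A minimal sketch of a SELF JOIN, using Python's sqlite3 module. The staff table and its manager_id column are hypothetical; the point is only that the same table appears twice under two aliases (e for the employee, m for the manager).

```python
import sqlite3

# Hypothetical table: each row optionally points at another row of the same table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO staff VALUES (1, 'Dana', NULL), (2, 'Eli', 1), (3, 'Fay', 1);
""")

# LEFT JOIN of staff to itself, so that people without a manager are kept.
self_join_rows = conn.execute("""
    SELECT e.name, m.name
    FROM staff AS e
    LEFT JOIN staff AS m ON e.manager_id = m.id
    ORDER BY e.id
""").fetchall()

print(self_join_rows)  # [('Dana', None), ('Eli', 'Dana'), ('Fay', 'Dana')]
```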

JOINs based on Operators

Depending on the operator used for a JOIN clause, there can be two types of JOINs. They are

  1. Equi JOIN
  2. Theta JOIN

1. Equi JOIN :

For whatever JOIN type (INNER, OUTER, etc), if we use ONLY the equality operator (=), then we say that
the JOIN is an EQUI JOIN.

2. Theta JOIN :

This is the same as EQUI JOIN, but it allows all the other comparison operators as well, such as >, <, >=, etc.
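
To make the difference concrete, here is a sketch of a Theta JOIN run through Python's sqlite3 module. The item and budget tables are hypothetical; the join condition uses <= instead of =, pairing each owner with every item they can afford.

```python
import sqlite3

# Hypothetical tables for a join condition that is not an equality.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item_name TEXT, price INTEGER);
    CREATE TABLE budget (owner TEXT, max_price INTEGER);
    INSERT INTO item VALUES ('pen', 2), ('book', 15), ('lamp', 40);
    INSERT INTO budget VALUES ('Gus', 10), ('Hal', 20);
""")

# An INNER JOIN whose ON clause uses <=, which makes it a Theta JOIN.
theta_rows = conn.execute("""
    SELECT owner, item_name
    FROM budget
    INNER JOIN item ON item.price <= budget.max_price
    ORDER BY owner, item_name
""").fetchall()

print(theta_rows)  # [('Gus', 'pen'), ('Hal', 'book'), ('Hal', 'pen')]
```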

Many consider both EQUI JOIN and Theta JOIN to be join types of the same
kind as INNER, OUTER, etc. But I strongly believe that this is a mistake
that makes the ideas vague, because INNER JOIN, OUTER JOIN, etc. are all
concerned with the tables and their data, whereas EQUI JOIN and THETA JOIN
are only concerned with the operators we use in the former.

Again, there are many who consider NATURAL JOIN as some sort of
"peculiar" EQUI JOIN. In fact, it is true, because of the first
condition I mentioned for NATURAL JOIN. However, we don't have to
restrict that simply to NATURAL JOINs alone. INNER JOINs, OUTER JOINs
etc could be an EQUI JOIN too.

What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?

Reading this original article on The Code Project will help you a lot: Visual Representation of SQL Joins.

Also check this post: SQL SERVER – Better Performance – LEFT JOIN or NOT IN?.

Find original one at: Difference between JOIN and OUTER JOIN in MySQL.

Joining Different Data Types

If ProductNumber can contain MORE than just the StoreProductNumber (for example, extra characters before or after it):

LEFT JOIN b 
ON a.ProductNumber LIKE '%' + CAST(b.StoreProductNumber as varchar(n)) + '%'

Obviously you can modify n to make sure your StoreProductNumber isn't truncated.

However, if they're guaranteed to be the SAME (just different datatypes), you can just compare them directly:

LEFT JOIN b
ON CAST(a.ProductNumber AS BIGINT) = b.StoreProductNumber

And if you don't need a BIGINT, you can use an INT or whatever datatype you require.

Lastly, as HLGEM pointed out, SQL will do implicit conversions for you, so technically this would also work:

LEFT JOIN b
ON a.ProductNumber = b.StoreProductNumber

But, I prefer to do all conversions explicitly for clarity, so I suggest against this approach.
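
The explicit-conversion variant can be tried out with Python's sqlite3 module. This is only a sketch under assumptions: the table contents are invented, ProductNumber is assumed to be stored as text and StoreProductNumber as an integer, and SQLite's INTEGER type stands in for BIGINT.

```python
import sqlite3

# Assumed schema: the same product number stored as TEXT in a and as INTEGER in b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (ProductNumber TEXT);
    CREATE TABLE b (StoreProductNumber INTEGER);
    INSERT INTO a VALUES ('00123'), ('456'), ('789');
    INSERT INTO b VALUES (123), (456);
""")

# The CAST makes the conversion visible in the query text.
cast_rows = conn.execute("""
    SELECT a.ProductNumber, b.StoreProductNumber
    FROM a
    LEFT JOIN b ON CAST(a.ProductNumber AS INTEGER) = b.StoreProductNumber
    ORDER BY a.ProductNumber
""").fetchall()

print(cast_rows)  # [('00123', 123), ('456', 456), ('789', None)]
```

The row with leading zeros shows why the comparison has to happen on the numeric value rather than on the text.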

What is the difference between INNER JOIN and OUTER JOIN?

Assuming you're joining on columns with no duplicates, which is a very common case:

  • An inner join of A and B gives the result of A intersect B, i.e. the inner part of a Venn diagram intersection.

  • An outer join of A and B gives the results of A union B, i.e. the outer parts of a Venn diagram union.

Examples

Suppose you have two tables, with a single column each, and data as follows:

A    B
-    -
1    3
2    4
3    5
4    6

Note that (1,2) are unique to A, (3,4) are common, and (5,6) are unique to B.

Inner join

An inner join using either of the equivalent queries gives the intersection of the two tables, i.e. the two rows they have in common.

select * from a INNER JOIN b on a.a = b.b;
select a.*, b.* from a,b where a.a = b.b;

a | b
--+--
3 | 3
4 | 4
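
The two inner-join queries can be checked with Python's sqlite3 module. The tables a and b are the ones from the example above; the ORDER BY is added here only to make the output order deterministic.

```python
import sqlite3

# Tables a and b from the example, each with a single column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a INTEGER);
    CREATE TABLE b (b INTEGER);
    INSERT INTO a VALUES (1), (2), (3), (4);
    INSERT INTO b VALUES (3), (4), (5), (6);
""")

# Explicit join syntax (INNER JOIN ... ON).
explicit_rows = conn.execute(
    "SELECT * FROM a INNER JOIN b ON a.a = b.b ORDER BY a.a"
).fetchall()

# Implicit join syntax (comma-separated tables plus WHERE).
implicit_rows = conn.execute(
    "SELECT a.*, b.* FROM a, b WHERE a.a = b.b ORDER BY a.a"
).fetchall()

print(explicit_rows)  # [(3, 3), (4, 4)]
print(implicit_rows)  # [(3, 3), (4, 4)]
```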

Left outer join

A left outer join will give all rows in A, plus any common rows in B.

select * from a LEFT OUTER JOIN b on a.a = b.b;
-- Oracle's legacy (+) outer-join notation; other DBMSs generally do not accept it:
select a.*, b.* from a,b where a.a = b.b(+);

a | b
--+-----
1 | null
2 | null
3 | 3
4 | 4

Right outer join

A right outer join will give all rows in B, plus any common rows in A.

select * from a RIGHT OUTER JOIN b on a.a = b.b;
-- Oracle's legacy (+) outer-join notation; other DBMSs generally do not accept it:
select a.*, b.* from a,b where a.a(+) = b.b;

a    | b
-----+----
3    | 3
4    | 4
null | 5
null | 6

Full outer join

A full outer join will give you the union of A and B, i.e. all the rows in A and all the rows in B. If something in A doesn't have a corresponding datum in B, then the B portion is null, and vice versa.

select * from a FULL OUTER JOIN b on a.a = b.b;

a    | b
-----+-----
1    | null
2    | null
3    | 3
4    | 4
null | 6
null | 5
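
Not every engine accepts FULL OUTER JOIN (older SQLite releases did not, for instance). The sketch below, run through Python's sqlite3 module, emulates it for the tables a and b above. This is a workaround rather than the query shown earlier: it takes a LEFT JOIN b and appends the rows of b that found no partner in a.

```python
import sqlite3

# Tables a and b from the example, each with a single column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a INTEGER);
    CREATE TABLE b (b INTEGER);
    INSERT INTO a VALUES (1), (2), (3), (4);
    INSERT INTO b VALUES (3), (4), (5), (6);
""")

# Emulated full outer join: all of a with its matches, plus the unmatched rows of b.
full_rows = conn.execute("""
    SELECT a.a, b.b FROM a LEFT JOIN b ON a.a = b.b
    UNION ALL
    SELECT a.a, b.b FROM b LEFT JOIN a ON a.a = b.b WHERE a.a IS NULL
""").fetchall()

for row in full_rows:
    print(row)  # six rows; Python shows SQL NULL as None
```

The second SELECT keeps only rows where a.a IS NULL, so the matched rows (3, 3) and (4, 4) are not listed twice.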

LEFT JOIN vs. LEFT OUTER JOIN in SQL Server

As per the documentation: FROM (Transact-SQL):

<join_type> ::= 
[ { INNER | { { LEFT | RIGHT | FULL } [ OUTER ] } } [ <join_hint> ] ]
JOIN

The keyword OUTER is marked as optional (enclosed in square brackets). In this specific case, whether you specify OUTER or not makes no difference. Note that while the other elements of the join clause are also marked as optional, leaving them out will make a difference.

For instance, the entire type-part of the JOIN clause is optional, in which case the default is INNER if you just specify JOIN. In other words, this is legal:

SELECT *
FROM A JOIN B ON A.X = B.Y

Here's a list of equivalent syntaxes:

A LEFT JOIN B            A LEFT OUTER JOIN B
A RIGHT JOIN B           A RIGHT OUTER JOIN B
A FULL JOIN B            A FULL OUTER JOIN B
A INNER JOIN B           A JOIN B
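
Some of these equivalences can be checked quickly with Python's sqlite3 module. Note the assumptions: this runs on SQLite rather than SQL Server, the contents of A and B are invented, and only the JOIN / INNER JOIN and LEFT JOIN / LEFT OUTER JOIN pairs are compared, because older SQLite releases do not accept RIGHT or FULL joins.

```python
import sqlite3

# Hypothetical contents for tables A and B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (X INTEGER);
    CREATE TABLE B (Y INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (2), (3), (4);
""")

def rows(join_keywords):
    """Run the same query with a different spelling of the join type."""
    sql = f"SELECT A.X, B.Y FROM A {join_keywords} B ON A.X = B.Y ORDER BY A.X"
    return conn.execute(sql).fetchall()

plain_join = rows("JOIN")
inner_join = rows("INNER JOIN")
left_join = rows("LEFT JOIN")
left_outer_join = rows("LEFT OUTER JOIN")

print(plain_join == inner_join)      # True
print(left_join == left_outer_join)  # True
```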

Also take a look at the answer I left on this other SO question: SQL left join vs multiple tables on FROM line?.

Difference between JOIN and INNER JOIN

They are functionally equivalent, but INNER JOIN can be a bit clearer to read, especially if the query has other join types (e.g. LEFT or RIGHT or CROSS) included in it.

Does the order of the joined tables matter as long as we choose the proper join types?

In an inner join, the ordering of the tables in the join doesn't matter - the same rows will make up the result set regardless of the order they are in the join statement.

In either a left or right outer join, the order DOES matter. In A left join B, your result set will contain at least one row for every record in table A, irrespective of whether there is a matching row in table B. If there are non-matching rows, this is likely to be a different result set from B left join A.
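
The effect of table order can be sketched with Python's sqlite3 module. The single-column tables a and b and their contents are assumptions made for this illustration; each query selects the columns in the same order, so only the order of the tables in the join differs.

```python
import sqlite3

# Hypothetical single-column tables with partly overlapping values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a INTEGER);
    CREATE TABLE b (b INTEGER);
    INSERT INTO a VALUES (1), (2), (3), (4);
    INSERT INTO b VALUES (3), (4), (5), (6);
""")

def fetch(from_clause):
    """Select the same two columns from a differently ordered join."""
    return set(conn.execute(f"SELECT a.a, b.b FROM {from_clause}").fetchall())

inner_ab = fetch("a INNER JOIN b ON a.a = b.b")
inner_ba = fetch("b INNER JOIN a ON a.a = b.b")
left_ab = fetch("a LEFT JOIN b ON a.a = b.b")
left_ba = fetch("b LEFT JOIN a ON a.a = b.b")

print(inner_ab == inner_ba)  # True: order is irrelevant for an inner join
print(left_ab == left_ba)    # False: each left join keeps a different table whole
```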

In a full outer join, the order again doesn't matter - rows will be produced for each row in each joined table no matter what their order.

Regarding A left join B vs B right join A - these will produce the same results. In simple cases with 2 tables, swapping the tables and changing the direction of the outer join will result in the same result set.

This will also apply to 3 or more tables if all of the outer joins are in the same direction - A left join B left join C will give the same set of results as C right join B right join A.

If you start mixing left and right joins, then you will need to start being more careful. There will almost always be a way to make an equivalent query with re-ordered tables, but at that point sub-queries or bracketing off expressions might be the best way to clarify what you are doing.

As another commenter states, using whatever makes your purpose most clear is usually the best option. The ordering of the tables in your query should make little or no difference performance-wise, as the query optimiser should work this out (although the only way to be sure of this would be to check the execution plans for each option with your own queries and data).

SQL Join Types and Performance: Cross vs Inner

Your first example is normally called an explicit join and the second one an implicit join. Performance-wise, they should be equivalent, at least in the popular DBMSes.


