Left Inner Join VS. Left Outer Join - Why Does the Outer Take Longer

LEFT INNER JOIN vs. LEFT OUTER JOIN - Why does the OUTER take longer?

The fact that the same number of rows is returned is an after fact, the query optimizer cannot know in advance that every row in Accepts has a matching row in Marker, can it?

If you join two tables A and B, say A has 1 million rows and B has 1 row. If you say A LEFT INNER JOIN B it means only rows that match both A and B can result, so the query plan is free to scan B first, then use an index to do a range scan in A, and perhaps return 10 rows. But if you say A LEFT OUTER JOIN B then at least all rows in A have to be returned, so the plan must scan everything in A no matter what it finds in B. By using an OUTER join you are eliminating one possible optimization.

If you do know that every row in Accepts will have a match in Marker, then why not declare a foreign key to enforce this? The optimizer will see the constraint, and if is trusted, will take it into account in the plan.

INNER JOIN vs LEFT JOIN performance in SQL Server

A LEFT JOIN is absolutely not faster than an INNER JOIN. In fact, it's slower; by definition, an outer join (LEFT JOIN or RIGHT JOIN) has to do all the work of an INNER JOIN plus the extra work of null-extending the results. It would also be expected to return more rows, further increasing the total execution time simply due to the larger size of the result set.

(And even if a LEFT JOIN were faster in specific situations due to some difficult-to-imagine confluence of factors, it is not functionally equivalent to an INNER JOIN, so you cannot simply go replacing all instances of one with the other!)

Most likely your performance problems lie elsewhere, such as not having a candidate key or foreign key indexed properly. 9 tables is quite a lot to be joining so the slowdown could literally be almost anywhere. If you post your schema, we might be able to provide more details.


Edit:

Reflecting further on this, I could think of one circumstance under which a LEFT JOIN might be faster than an INNER JOIN, and that is when:

  • Some of the tables are very small (say, under 10 rows);
  • The tables do not have sufficient indexes to cover the query.

Consider this example:

CREATE TABLE #Test1
(
ID int NOT NULL PRIMARY KEY,
Name varchar(50) NOT NULL
)
INSERT #Test1 (ID, Name) VALUES (1, 'One')
INSERT #Test1 (ID, Name) VALUES (2, 'Two')
INSERT #Test1 (ID, Name) VALUES (3, 'Three')
INSERT #Test1 (ID, Name) VALUES (4, 'Four')
INSERT #Test1 (ID, Name) VALUES (5, 'Five')

CREATE TABLE #Test2
(
ID int NOT NULL PRIMARY KEY,
Name varchar(50) NOT NULL
)
INSERT #Test2 (ID, Name) VALUES (1, 'One')
INSERT #Test2 (ID, Name) VALUES (2, 'Two')
INSERT #Test2 (ID, Name) VALUES (3, 'Three')
INSERT #Test2 (ID, Name) VALUES (4, 'Four')
INSERT #Test2 (ID, Name) VALUES (5, 'Five')

SELECT *
FROM #Test1 t1
INNER JOIN #Test2 t2
ON t2.Name = t1.Name

SELECT *
FROM #Test1 t1
LEFT JOIN #Test2 t2
ON t2.Name = t1.Name

DROP TABLE #Test1
DROP TABLE #Test2

If you run this and view the execution plan, you'll see that the INNER JOIN query does indeed cost more than the LEFT JOIN, because it satisfies the two criteria above. It's because SQL Server wants to do a hash match for the INNER JOIN, but does nested loops for the LEFT JOIN; the former is normally much faster, but since the number of rows is so tiny and there's no index to use, the hashing operation turns out to be the most expensive part of the query.

You can see the same effect by writing a program in your favourite programming language to perform a large number of lookups on a list with 5 elements, vs. a hash table with 5 elements. Because of the size, the hash table version is actually slower. But increase it to 50 elements, or 5000 elements, and the list version slows to a crawl, because it's O(N) vs. O(1) for the hashtable.

But change this query to be on the ID column instead of Name and you'll see a very different story. In that case, it does nested loops for both queries, but the INNER JOIN version is able to replace one of the clustered index scans with a seek - meaning that this will literally be an order of magnitude faster with a large number of rows.

So the conclusion is more or less what I mentioned several paragraphs above; this is almost certainly an indexing or index coverage problem, possibly combined with one or more very small tables. Those are the only circumstances under which SQL Server might sometimes choose a worse execution plan for an INNER JOIN than a LEFT JOIN.

What is the difference between INNER JOIN and OUTER JOIN?

Assuming you're joining on columns with no duplicates, which is a very common case:

  • An inner join of A and B gives the result of A intersect B, i.e. the inner part of a Venn diagram intersection.

  • An outer join of A and B gives the results of A union B, i.e. the outer parts of a Venn diagram union.

Examples

Suppose you have two tables, with a single column each, and data as follows:

A    B
- -
1 3
2 4
3 5
4 6

Note that (1,2) are unique to A, (3,4) are common, and (5,6) are unique to B.

Inner join

An inner join using either of the equivalent queries gives the intersection of the two tables, i.e. the two rows they have in common.

select * from a INNER JOIN b on a.a = b.b;
select a.*, b.* from a,b where a.a = b.b;

a | b
--+--
3 | 3
4 | 4

Left outer join

A left outer join will give all rows in A, plus any common rows in B.

select * from a LEFT OUTER JOIN b on a.a = b.b;
select a.*, b.* from a,b where a.a = b.b(+);

a | b
--+-----
1 | null
2 | null
3 | 3
4 | 4

Right outer join

A right outer join will give all rows in B, plus any common rows in A.

select * from a RIGHT OUTER JOIN b on a.a = b.b;
select a.*, b.* from a,b where a.a(+) = b.b;

a | b
-----+----
3 | 3
4 | 4
null | 5
null | 6

Full outer join

A full outer join will give you the union of A and B, i.e. all the rows in A and all the rows in B. If something in A doesn't have a corresponding datum in B, then the B portion is null, and vice versa.

select * from a FULL OUTER JOIN b on a.a = b.b;

a | b
-----+-----
1 | null
2 | null
3 | 3
4 | 4
null | 6
null | 5

LEFT JOIN vs. LEFT OUTER JOIN in SQL Server

As per the documentation: FROM (Transact-SQL):

<join_type> ::= 
[ { INNER | { { LEFT | RIGHT | FULL } [ OUTER ] } } [ <join_hint> ] ]
JOIN

The keyword OUTER is marked as optional (enclosed in square brackets). In this specific case, whether you specify OUTER or not makes no difference. Note that while the other elements of the join clause is also marked as optional, leaving them out will make a difference.

For instance, the entire type-part of the JOIN clause is optional, in which case the default is INNER if you just specify JOIN. In other words, this is legal:

SELECT *
FROM A JOIN B ON A.X = B.Y

Here's a list of equivalent syntaxes:

A LEFT JOIN B            A LEFT OUTER JOIN B
A RIGHT JOIN B A RIGHT OUTER JOIN B
A FULL JOIN B A FULL OUTER JOIN B
A INNER JOIN B A JOIN B

Also take a look at the answer I left on this other SO question: SQL left join vs multiple tables on FROM line?.

Why is LEFT JOIN slower than INNER JOIN?

With INNER JOIN, MySQL generally will start with the table with the smallest number of rows. In this case, it starts with table finished and does a look up for the corresponding record in saved using the index on saved.email.

For a LEFT JOIN, (excluding some optimizations) MySQL generally joins the records in order (starting with the left most table). In this case, MySQL starts with the table saved, then attempts to find each corresponding record in finished. Since there is no usable index on finished.email, it must do a full scan for each look up.

Edit

Now that you posted your schema, I can see that MySQL is ignoring the index (finished.email) when going from utf8 to latin1 character set. You've not posted the character sets and collations for each column, so I'm going by the default character set for the table. The collations must be compatible in order for MySQL to use the index.

MySQL can coerce (upgrade) a latin1 collation, which is very limited, up to a utf8 collation such as unicode_ci (so the first query can use the index on saved.email by upgrading latin1 collation to utf8), but the opposite is not true (the second query can't use the index on finished.email since it can't downgrade a utf8 collation down to latin1).

The solution is to change both email columns to a compatible collation, perhaps most easily by making them identical character sets and collations.

SQL: JOIN vs LEFT OUTER JOIN?

As previously noted above:

JOIN is synonym of INNER JOIN. It's definitively different from all
types of OUTER JOIN

So the question is "When should I use an outer join?"

Here's a good article, with several great diagrams:

https://www.sqlshack.com/sql-outer-join-overview-and-examples/

Sample Image

The short answer your your question is:

  • Prefer JOIN (aka "INNER JOIN") to link two related tables. In practice, you'll use INNER JOIN most of the time.
  • INNER JOIN is the intersection of the two tables. It's represented by the "green" section in the middle of the Venn diagram above.
  • Use an "Outer Join" when you want the left, right or both outer regions.
  • In your example, the result set happens to be the same: the two expressions happen to be equivalent.
  • ALSO: be sure to familiarize yourself with "Show Plan" (or equivalent) for your RDBMS: https://www.sqlshack.com/execution-plans-in-sql-server/

'Hope that helps...



Related Topics



Leave a reply



Submit