Inner Join VS Left Join Performance in SQL Server

INNER JOIN vs LEFT JOIN performance in SQL Server

A LEFT JOIN is absolutely not faster than an INNER JOIN. In fact, it's slower; by definition, an outer join (LEFT JOIN or RIGHT JOIN) has to do all the work of an INNER JOIN plus the extra work of null-extending the results. It would also be expected to return more rows, further increasing the total execution time simply due to the larger size of the result set.

(And even if a LEFT JOIN were faster in specific situations due to some difficult-to-imagine confluence of factors, it is not functionally equivalent to an INNER JOIN, so you cannot simply go replacing all instances of one with the other!)

Most likely your performance problems lie elsewhere, such as not having a candidate key or foreign key indexed properly. 9 tables is quite a lot to be joining so the slowdown could literally be almost anywhere. If you post your schema, we might be able to provide more details.


Edit:

Reflecting further on this, I could think of one circumstance under which a LEFT JOIN might be faster than an INNER JOIN, and that is when:

  • Some of the tables are very small (say, under 10 rows);
  • The tables do not have sufficient indexes to cover the query.

Consider this example:

CREATE TABLE #Test1
(
ID int NOT NULL PRIMARY KEY,
Name varchar(50) NOT NULL
)
INSERT #Test1 (ID, Name) VALUES (1, 'One')
INSERT #Test1 (ID, Name) VALUES (2, 'Two')
INSERT #Test1 (ID, Name) VALUES (3, 'Three')
INSERT #Test1 (ID, Name) VALUES (4, 'Four')
INSERT #Test1 (ID, Name) VALUES (5, 'Five')

CREATE TABLE #Test2
(
ID int NOT NULL PRIMARY KEY,
Name varchar(50) NOT NULL
)
INSERT #Test2 (ID, Name) VALUES (1, 'One')
INSERT #Test2 (ID, Name) VALUES (2, 'Two')
INSERT #Test2 (ID, Name) VALUES (3, 'Three')
INSERT #Test2 (ID, Name) VALUES (4, 'Four')
INSERT #Test2 (ID, Name) VALUES (5, 'Five')

SELECT *
FROM #Test1 t1
INNER JOIN #Test2 t2
ON t2.Name = t1.Name

SELECT *
FROM #Test1 t1
LEFT JOIN #Test2 t2
ON t2.Name = t1.Name

DROP TABLE #Test1
DROP TABLE #Test2

If you run this and view the execution plan, you'll see that the INNER JOIN query does indeed cost more than the LEFT JOIN, because it satisfies the two criteria above. It's because SQL Server wants to do a hash match for the INNER JOIN, but does nested loops for the LEFT JOIN; the former is normally much faster, but since the number of rows is so tiny and there's no index to use, the hashing operation turns out to be the most expensive part of the query.

You can see the same effect by writing a program in your favourite programming language to perform a large number of lookups on a list with 5 elements, vs. a hash table with 5 elements. Because of the size, the hash table version is actually slower. But increase it to 50 elements, or 5000 elements, and the list version slows to a crawl, because it's O(N) vs. O(1) for the hashtable.

But change this query to be on the ID column instead of Name and you'll see a very different story. In that case, it does nested loops for both queries, but the INNER JOIN version is able to replace one of the clustered index scans with a seek - meaning that this will literally be an order of magnitude faster with a large number of rows.

So the conclusion is more or less what I mentioned several paragraphs above; this is almost certainly an indexing or index coverage problem, possibly combined with one or more very small tables. Those are the only circumstances under which SQL Server might sometimes choose a worse execution plan for an INNER JOIN than a LEFT JOIN.

inner join vs left join - which is faster for the same result?

By adding the WHERE A.PK_A = B.PK_B the query optimizer is smart enough to understand that you wanted an inner join and use that instead.
So they will have the same performance since the same execution plan will be created.


Sample Image
So never use a left join with where on the keys when you want an inner join, it will just be frustrating to maintain and understand.

INNER JOIN vs LEFT JOIN performance in SQL

With this statement, the predicates in the WHERE clause would negate the "outerness" of a LEFT JOIN, making equivalent to an INNER JOIN.

I'd expect the optimizer would recognize that, and choose the most cost efficient plan in either case. I'd expect equivalent execution plans, and I wouldn't expect to observe a performance difference in this query, when the INNER keyword is replaced with LEFT.

But without looking at the actual execution plans, that's just conjecture. To see the execution plan, we can use EXPLAIN (for MySQL) or SET SHOWPLAN_ALL ON (for SQL Server, or STATISTICS PROFILE ON, et al.)

The actual performance of the different statements can be observed and measured.


I'm not understanding why you would want to use a LEFT JOIN at all.

(And the mixing of the old-school comma syntax for the join operation with the newer JOIN keyword... what's up with that?)


As far as improving performance, we'd really need to make sure that suitable indexes are available, and are being used. We need to look at the execution plans. We may need to make sure that statistics are collected and up-to-date, and we may need to make some tweaks to the SQL to get an more efficient plan.

It's possible that replacing INNER with LEFT may make a difference in performance of the existing query, but it's unlikely to be a cure to a performance problem.

Personally, I'd prefer to investigate performance of an equivalent query, with this as a starting point:

UPDATE poker_hands f
JOIN poker_cards a ON a.card_name = f.r1 AND a.game_value = 14
JOIN poker_cards b ON b.card_name = f.r2 AND b.game_value = 13 AND b.suit = a.suit
JOIN poker_cards c ON c.card_name = f.r3 AND c.game_value = 12 AND c.suit = a.suit
JOIN poker_cards d ON d.card_name = f.r4 AND d.game_value = 11 AND d.suit = a.suit
JOIN poker_cards e ON e.card_name = f.r5 AND e.game_value = 10 AND e.suit = a.suit
SET f.hand_type = 'Royal flush'

LEFT JOIN Significantly faster than INNER JOIN

The Left join seems to be faster because SQL is forced to do the smaller select first and then join to this smaller set of records. For some reason the optimiser doesn't want to do this naturally.

3 ways to force the joins to happen in the right order:

  1. Select the first subset of data into a temporary table (or table variable) then join on it
  2. Use left joins (and remember that this could return different data because it's a left join not an inner join)
  3. use the FORCE ORDER keyword. Note that if table sizes or schemas change then the query plan may not be correct (see https://dba.stackexchange.com/questions/45388/forcing-join-order)

Performance difference between left join and inner join

There is at least one case where LEFT [OUTER] JOIN is a better option than [INNER] JOIN. I talk about getting the same results using OUTER instead of INNER.

Example (I am using AdventureWorks 2008 database):

-- Some metadata infos
SELECT fk.is_not_trusted, fk.name
FROM sys.foreign_keys fk
WHERE fk.parent_object_id=object_id('Sales.SalesOrderDetail');
GO

CREATE VIEW View1
AS
SELECT h.OrderDate, d.SalesOrderDetailID, o.ModifiedDate
FROM Sales.SalesOrderDetail d
INNER JOIN Sales.SalesOrderHeader h ON d.SalesOrderID = h.SalesOrderID
INNER JOIN Sales.SpecialOfferProduct o ON d.SpecialOfferID=o.SpecialOfferID AND d.ProductID=o.ProductID;
GO

CREATE VIEW View2
AS
SELECT h.OrderDate, d.SalesOrderDetailID, o.ModifiedDate
FROM Sales.SalesOrderDetail d
INNER JOIN Sales.SalesOrderHeader h ON d.SalesOrderID = h.SalesOrderID
LEFT JOIN Sales.SpecialOfferProduct o ON d.SpecialOfferID=o.SpecialOfferID AND d.ProductID=o.ProductID;
GO

SELECT SalesOrderDetailID
FROM View1;

SELECT SalesOrderDetailID
FROM View2;

Results for the first query:

is_not_trusted name
-------------- ---------------------------------------------------------------
0 FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID
0 FK_SalesOrderDetail_SpecialOfferProduct_SpecialOfferIDProductID

Execution plans for the last two queries:
Sample Image

Note 1 / View 1: If we look at the execution plan for SELECT SalesOrderDetailID FROM View1
we see a FK elimination because the FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID constraint is trusted and it has a single column. But, the server is forced (because of INNER JOIN Sales.SpecialOfferProduct) to read data from the third table (SpecialOfferProduct) even the SELECT/WHERE clauses doesn't contain any columns from this table and the FK constraint (FK_SalesOrderDetail_SpecialOfferProduct_SpecialOfferIDProductID) is (also) trusted. This happens because this last FK is multicolumn.

Note 2 / View 2: What if we want to remove the read (Scan/Seek) on the Sales.SpecialOfferProduct? This second FK is multicolumn and for such cases the SQL Server cannot eliminates the FK (see previous Conor Cunnigham blog post). In this case we need to replace the INNER JOIN Sales.SpecialOfferProduct with LEFT OUTER JOIN Sales.SpecialOfferProduct in order to get FK elimination. Both SpecialOfferID and ProductID columns are NOT NULL and we a have a trusted FK referencing SpecialOfferProduct table.

Mysql - LEFT JOIN way faster than INNER JOIN

The reason for the different plans is that LEFT JOIN will force the join order of your tables to match the order they appear in your query. Without the left join, the optimizer will choose the join order for you, and in this case it will choose the very small table first. (You can see this in your explain by looking at the order the tables are listed.) Once your join order is switched, the index for X changes to KEY d which must have a much larger data set than the compound key.

To fix this, change your select to SELECT STRAIGHT_JOIN *. This is preferred over USE INDEX so that the optimizer can still choose the best key for table X... You might find a better compound key than a,b,c,d, or if your data in X changes dramatically, one of your other keys may be better after a point.

I have to point out, that you normally can't just switch to a LEFT JOIN. The data returned will usually be different!

Explain JOIN vs. LEFT JOIN and WHERE condition performance suggestion in more detail

Consider the following example. We have two tables, DEPARTMENTS and EMPLOYEES.

Some departments do not yet have any employees.

This query uses an inner join that finds the department employee 999 works at, if any, otherwise it shows nothing (not even the employee or his or her name):

select a.department_id, a.department_desc, b.employee_id, b.employee_name
from departments a
join employees b
on a.department_id = b.department_id
where b.employee_id = '999'

This next query uses an outer join (left between departments and employees) and finds the department that employee 999 works for. However it too will not show the employee's ID or his or her name, if they do not work at any departments. That is because of the outer joined table being used in the WHERE clause. If there is no matching department, it will be null (not 999, even though 999 exists in employees).

select a.department_id, a.department_desc, b.employee_id, b.employee_name
from departments a
left join employees b
on a.department_id = b.department_id
where b.employee_id = '999'

But consider this query:

select a.department_id, a.department_desc, b.employee_id, b.employee_name
from departments a
left join employees b
on a.department_id = b.department_id
and b.employee_id= '999'

Now the criteria is in the on clause. So even if this employee works at no departments, he will still be returned (his ID and name). The department columns will be null, but we get a result (the employee side).

You might think you would never want to use the outer joined table in the WHERE clause, but that is not necessarily the case. Normally it is, for the reason described above, though.

Suppose you want all departments with no employees. Then you could run the following, which does use an outer join, and the outer joined table is used in the where clause:

select a.department_id, a.department_desc, b.employee_id
from departments a
left join employees b
on a.department_id = b.department_id
where b.employee_id is null

^^ Shows departments with no employees.

The above is likely the only legitimate reason you would want to use an outer joined table in the WHERE clause rather than the ON clause (which I think is what your question is; the difference between inner and outer joins is an entirely different topic).

A good way to look at is this: You use outer joins to allow nulls. Why would you then use an outer join and say that a field should not be null and should be equal to 'XYZ'? If a value has to be 'XYZ' (not null), then why instruct the database to allow nulls to come back? It's like saying one thing and then overriding it later.

INNER JOIN vs LEFT JOIN performance in SQL Server

A LEFT JOIN is absolutely not faster than an INNER JOIN. In fact, it's slower; by definition, an outer join (LEFT JOIN or RIGHT JOIN) has to do all the work of an INNER JOIN plus the extra work of null-extending the results. It would also be expected to return more rows, further increasing the total execution time simply due to the larger size of the result set.

(And even if a LEFT JOIN were faster in specific situations due to some difficult-to-imagine confluence of factors, it is not functionally equivalent to an INNER JOIN, so you cannot simply go replacing all instances of one with the other!)

Most likely your performance problems lie elsewhere, such as not having a candidate key or foreign key indexed properly. 9 tables is quite a lot to be joining so the slowdown could literally be almost anywhere. If you post your schema, we might be able to provide more details.


Edit:

Reflecting further on this, I could think of one circumstance under which a LEFT JOIN might be faster than an INNER JOIN, and that is when:

  • Some of the tables are very small (say, under 10 rows);
  • The tables do not have sufficient indexes to cover the query.

Consider this example:

CREATE TABLE #Test1
(
ID int NOT NULL PRIMARY KEY,
Name varchar(50) NOT NULL
)
INSERT #Test1 (ID, Name) VALUES (1, 'One')
INSERT #Test1 (ID, Name) VALUES (2, 'Two')
INSERT #Test1 (ID, Name) VALUES (3, 'Three')
INSERT #Test1 (ID, Name) VALUES (4, 'Four')
INSERT #Test1 (ID, Name) VALUES (5, 'Five')

CREATE TABLE #Test2
(
ID int NOT NULL PRIMARY KEY,
Name varchar(50) NOT NULL
)
INSERT #Test2 (ID, Name) VALUES (1, 'One')
INSERT #Test2 (ID, Name) VALUES (2, 'Two')
INSERT #Test2 (ID, Name) VALUES (3, 'Three')
INSERT #Test2 (ID, Name) VALUES (4, 'Four')
INSERT #Test2 (ID, Name) VALUES (5, 'Five')

SELECT *
FROM #Test1 t1
INNER JOIN #Test2 t2
ON t2.Name = t1.Name

SELECT *
FROM #Test1 t1
LEFT JOIN #Test2 t2
ON t2.Name = t1.Name

DROP TABLE #Test1
DROP TABLE #Test2

If you run this and view the execution plan, you'll see that the INNER JOIN query does indeed cost more than the LEFT JOIN, because it satisfies the two criteria above. It's because SQL Server wants to do a hash match for the INNER JOIN, but does nested loops for the LEFT JOIN; the former is normally much faster, but since the number of rows is so tiny and there's no index to use, the hashing operation turns out to be the most expensive part of the query.

You can see the same effect by writing a program in your favourite programming language to perform a large number of lookups on a list with 5 elements, vs. a hash table with 5 elements. Because of the size, the hash table version is actually slower. But increase it to 50 elements, or 5000 elements, and the list version slows to a crawl, because it's O(N) vs. O(1) for the hashtable.

But change this query to be on the ID column instead of Name and you'll see a very different story. In that case, it does nested loops for both queries, but the INNER JOIN version is able to replace one of the clustered index scans with a seek - meaning that this will literally be an order of magnitude faster with a large number of rows.

So the conclusion is more or less what I mentioned several paragraphs above; this is almost certainly an indexing or index coverage problem, possibly combined with one or more very small tables. Those are the only circumstances under which SQL Server might sometimes choose a worse execution plan for an INNER JOIN than a LEFT JOIN.



Related Topics



Leave a reply



Submit