Does Except Execute Faster Than a Join When the Table Columns Are the Same

Does EXCEPT execute faster than a JOIN when the table columns are the same

There is no way anyone can tell you that EXCEPT will always or never out-perform an equivalent OUTER JOIN. The optimizer will choose an appropriate execution plan regardless of how you write your intent.

That said, here is my guideline:


Use EXCEPT when at least one of the following is true:

  1. The query is more readable (this will almost always be true).
  2. Performance is improved.

And BOTH of the following are true:

  1. The query produces semantically identical results, and you can demonstrate this through sufficient regression testing, including all edge cases.
  2. Performance is not degraded (again, in all edge cases, as well as environmental changes such as clearing buffer pool, updating statistics, clearing plan cache, and restarting the service).

It is important to note that it can be a challenge to write an equivalent EXCEPT query as the JOIN becomes more complex and/or you are relying on duplicates in part of the columns but not others. Writing a NOT EXISTS equivalent, while slightly less readable than EXCEPT should be far more trivial to accomplish - and will often lead to a better plan (but note that I would never say ALWAYS or NEVER, except in the way I just did).

In this blog post I demonstrate at least one case where EXCEPT is outperformed by both a properly constructed LEFT OUTER JOIN and of course by an equivalent NOT EXISTS variation.

Left join vs EXCEPT

LEFT JOIN and EXCEPT do not produce the same results.

EXCEPT is set operator that eliminates duplicates. LEFT JOIN is a type of join, that can actually produce duplicates. It is not unusual in SQL that two different things produce the same result set for a given set of input data.

I would suggest that you use the one that best fits your use-case. If both work, test which one is faster and use that.

Condition in HAVING faster than same condition in WHERE?

The difference is the execution plan. You would have to look at the execution plans for two two queries to spot the differences.

In my experience, the difference is often due to the ability to use an index for the GROUP BY. The filtering in the WHERE prevents the use of the index. However, that is not the case for your query because it is aggregating by columns from multiple tables.

Another possibility is that the filter removes relatively few records, but affects the execution plan of the JOINs. I suspect this is the cause of what you are seeing. You would need to look at the execution plan to see if the joins are the same.

Use A Union Or A Join - What Is Faster

Union will be faster, as it simply passes the first SELECT statement, and then parses the second SELECT statement and adds the results to the end of the output table.

The Join will go through each row of both tables, finding matches in the other table therefore needing a lot more processing due to searching for matching rows for each and every row.

EDIT

By Union, I mean Union All as it seemed adequate for what you were trying to achieve. Although a normal Union is generally faster then Join.

EDIT 2 (Reply to @seebiscuit 's comment)

I don't agree with him. Technically speaking no matter how good your join is, a "JOIN" is still more expensive than a pure concatenation. I made a blog post to prove it at my blog codePERF[dot]net. Practically speaking they serve 2 completely different purposes and it is more important to ensure your indexing is right and using the right tool for the job.

Technically, I think it can be summed using the following 2 execution plans taken from my blog post:

UNION ALL Execution Plan

UNION ALL Execution Plan

JOIN Execution Plan

JOIN Execution Plan

Practical Results

Practically speaking the difference on a clustered index lookup is negligible:

Benchmark Results

Which is faster/best? SELECT * or SELECT column1, colum2, column3, etc

One reason that selecting specific columns is better is that it raises the probability that SQL Server can access the data from indexes rather than querying the table data.

Here's a post I wrote about it: The real reason select queries are bad index coverage

It's also less fragile to change, since any code that consumes the data will be getting the same data structure regardless of changes you make to the table schema in the future.

SQL JOIN vs IN performance?

Generally speaking, IN and JOIN are different queries that can yield different results.

SELECT  a.*
FROM a
JOIN b
ON a.col = b.col

is not the same as

SELECT  a.*
FROM a
WHERE col IN
(
SELECT col
FROM b
)

, unless b.col is unique.

However, this is the synonym for the first query:

SELECT  a.*
FROM a
JOIN (
SELECT DISTINCT col
FROM b
)
ON b.col = a.col

If the joining column is UNIQUE and marked as such, both these queries yield the same plan in SQL Server.

If it's not, then IN is faster than JOIN on DISTINCT.

See this article in my blog for performance details:

  • IN vs. JOIN vs. EXISTS

Alternate to 'Except' in SQL with performance

Is the combination (TrnId,flgStatus) unique?

Then you might switch to EXCEPT ALL, similar to UNION ALL which might be more efficient than UNION because it avoids the DISTINCT operation.

Another solution which accesses the base table only once:

Select TrnId 
From TableA Where flgStatus in (0,3)
group by TrnId
having MIN(flgStatus) = 3

Is a JOIN faster than a WHERE?

Theoretically, no, it shouldn't be any faster. The query optimizer should be able to generate an identical execution plan. However, some database engines can produce better execution plans for one of them (not likely to happen for such a simple query but for complex enough ones). You should test both and see (on your database engine).



Related Topics



Leave a reply



Submit