Does EXCEPT execute faster than a JOIN when the table columns are the same
There is no way anyone can tell you that EXCEPT
will always or never out-perform an equivalent OUTER JOIN
. The optimizer will choose an appropriate execution plan regardless of how you write your intent.
That said, here is my guideline:
Use EXCEPT
when at least one of the following is true:
- The query is more readable (this will almost always be true).
- Performance is improved.
And BOTH of the following are true:
- The query produces semantically identical results, and you can demonstrate this through sufficient regression testing, including all edge cases.
- Performance is not degraded (again, in all edge cases, as well as environmental changes such as clearing buffer pool, updating statistics, clearing plan cache, and restarting the service).
It is important to note that it can be a challenge to write an equivalent EXCEPT
query as the JOIN
becomes more complex and/or you are relying on duplicates in part of the columns but not others. Writing a NOT EXISTS
equivalent, while slightly less readable than EXCEPT
should be far more trivial to accomplish - and will often lead to a better plan (but note that I would never say ALWAYS
or NEVER
, except in the way I just did).
In this blog post I demonstrate at least one case where EXCEPT
is outperformed by both a properly constructed LEFT OUTER JOIN
and of course by an equivalent NOT EXISTS
variation.
Left join vs EXCEPT
LEFT JOIN
and EXCEPT
do not produce the same results.
EXCEPT
is set operator that eliminates duplicates. LEFT JOIN
is a type of join, that can actually produce duplicates. It is not unusual in SQL that two different things produce the same result set for a given set of input data.
I would suggest that you use the one that best fits your use-case. If both work, test which one is faster and use that.
Condition in HAVING faster than same condition in WHERE?
The difference is the execution plan. You would have to look at the execution plans for two two queries to spot the differences.
In my experience, the difference is often due to the ability to use an index for the GROUP BY
. The filtering in the WHERE
prevents the use of the index. However, that is not the case for your query because it is aggregating by columns from multiple tables.
Another possibility is that the filter removes relatively few records, but affects the execution plan of the JOIN
s. I suspect this is the cause of what you are seeing. You would need to look at the execution plan to see if the joins are the same.
Use A Union Or A Join - What Is Faster
Union will be faster, as it simply passes the first SELECT statement, and then parses the second SELECT statement and adds the results to the end of the output table.
The Join will go through each row of both tables, finding matches in the other table therefore needing a lot more processing due to searching for matching rows for each and every row.
EDIT
By Union, I mean Union All as it seemed adequate for what you were trying to achieve. Although a normal Union is generally faster then Join.
EDIT 2 (Reply to @seebiscuit 's comment)
I don't agree with him. Technically speaking no matter how good your join is, a "JOIN" is still more expensive than a pure concatenation. I made a blog post to prove it at my blog codePERF[dot]net. Practically speaking they serve 2 completely different purposes and it is more important to ensure your indexing is right and using the right tool for the job.
Technically, I think it can be summed using the following 2 execution plans taken from my blog post:
UNION ALL
Execution Plan
JOIN
Execution Plan
Practical Results
Practically speaking the difference on a clustered index lookup is negligible:
Which is faster/best? SELECT * or SELECT column1, colum2, column3, etc
One reason that selecting specific columns is better is that it raises the probability that SQL Server can access the data from indexes rather than querying the table data.
Here's a post I wrote about it: The real reason select queries are bad index coverage
It's also less fragile to change, since any code that consumes the data will be getting the same data structure regardless of changes you make to the table schema in the future.
SQL JOIN vs IN performance?
Generally speaking, IN
and JOIN
are different queries that can yield different results.
SELECT a.*
FROM a
JOIN b
ON a.col = b.col
is not the same as
SELECT a.*
FROM a
WHERE col IN
(
SELECT col
FROM b
)
, unless b.col
is unique.
However, this is the synonym for the first query:
SELECT a.*
FROM a
JOIN (
SELECT DISTINCT col
FROM b
)
ON b.col = a.col
If the joining column is UNIQUE
and marked as such, both these queries yield the same plan in SQL Server
.
If it's not, then IN
is faster than JOIN
on DISTINCT
.
See this article in my blog for performance details:
IN
vs.JOIN
vs.EXISTS
Alternate to 'Except' in SQL with performance
Is the combination (TrnId,flgStatus)
unique?
Then you might switch to EXCEPT ALL
, similar to UNION ALL
which might be more efficient than UNION
because it avoids the DISTINCT operation.
Another solution which accesses the base table only once:
Select TrnId
From TableA Where flgStatus in (0,3)
group by TrnId
having MIN(flgStatus) = 3
Is a JOIN faster than a WHERE?
Theoretically, no, it shouldn't be any faster. The query optimizer should be able to generate an identical execution plan. However, some database engines can produce better execution plans for one of them (not likely to happen for such a simple query but for complex enough ones). You should test both and see (on your database engine).
Related Topics
Moving a Point Along a Path in SQL Server 2008
MySQL Join/Group_Concat Second Table
Access SQL How to Make an Increment in Select Query
How to Properly Trigger an Insert to a Linked SQL Server
Update Columns with Null Values
Paging with Oracle and SQL Server and Generic Paging Method
SQL Syntax to Pivot Multiple Tables
How to Insert Values into the Database Table Using Vba in Ms Access
Get the Id of Last Inserted Records
Only Show Effective SQL String P6Spy
Preserve Parent-Child Relationships When Copying Hierarchical Data
Recursive Query for Bill of Materials
SQL Server Rounding Issue Where There Is 5
Retrieve Aggregates for Arbitrary Time Intervals