Subqueries VS Joins

Join vs. sub-query

Taken from the MySQL manual (13.2.10.11 Rewriting Subqueries as Joins):

A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better—a fact that is not specific to MySQL Server alone.

So subqueries can be slower than a LEFT [OUTER] JOIN, but in my opinion their advantage is that they are often slightly more readable.
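To make the trade-off concrete, here is a minimal sketch of one such rewrite: a NOT IN subquery and the equivalent LEFT JOIN ... IS NULL form. It uses Python's built-in sqlite3 module only so that it can be run anywhere. The customers and orders tables are made up for illustration, and SQLite's optimizer is not MySQL's, so this shows that the two forms return the same rows, not which one is faster.

```python
import sqlite3

# Hypothetical schema, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# Subquery form: customers who have no orders.
subquery_form = """
    SELECT name FROM customers
    WHERE id NOT IN (SELECT customer_id FROM orders)
    ORDER BY name
"""

# Join form: keep every customer, then keep only those with no matching order.
join_form = """
    SELECT c.name FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.customer_id IS NULL
    ORDER BY c.name
"""

subquery_rows = conn.execute(subquery_form).fetchall()
join_rows = conn.execute(join_form).fetchall()
print(subquery_rows, join_rows)
```

Note that customer_id is declared NOT NULL here. If the subquery could return a NULL, NOT IN would match no rows at all and the two forms would stop being equivalent.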

Subqueries vs joins

A "correlated subquery" (i.e., one in which the WHERE condition depends on values obtained from the rows of the containing query) is, conceptually, executed once for each row of the containing query. A non-correlated subquery (one in which the WHERE condition is independent of the containing query) can be executed once at the beginning. The SQL engine makes this distinction automatically, and an optimizer may still rewrite either form into something cheaper.

But, yeah, the explain plan will give you the dirty details.
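As a rough illustration of the distinction, the sketch below runs one non-correlated and one correlated subquery against a made-up employees table and prints the plan for each. It uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in for whatever plan command your database offers. The plan text differs between engines and versions, so treat the printed output as something to read, not something to rely on.

```python
import sqlite3

# Hypothetical table, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'A', 'eng', 100), (2, 'B', 'eng', 200),
        (3, 'C', 'ops', 50), (4, 'D', 'ops', 70), (5, 'E', 'ops', 60);
""")

# Non-correlated: the inner query does not reference the outer row,
# so its result can be computed once and reused.
non_correlated = """
    SELECT name FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
"""

# Correlated: the inner query depends on e1.dept from the outer row,
# so conceptually it is re-evaluated for each outer row.
correlated = """
    SELECT e1.name FROM employees e1
    WHERE e1.salary > (SELECT AVG(e2.salary) FROM employees e2 WHERE e2.dept = e1.dept)
    ORDER BY e1.name
"""

above_overall = [row[0] for row in conn.execute(non_correlated)]
above_dept = [row[0] for row in conn.execute(correlated)]
print("above overall average:", above_overall)
print("above department average:", above_dept)

for label, sql in (("non-correlated", non_correlated), ("correlated", correlated)):
    print(label)
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("   ", row[-1])  # the last column holds the human-readable detail
```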

Understanding when to use a subquery over a join

A good read on subquery vs. inner join:

https://www.essentialsql.com/subquery-versus-inner-join/

Normal Join vs Join with Subqueries

In a decent database, there should be no difference between the two queries. Remember, SQL is a declarative language, not a procedural language. That is, a SQL SELECT statement describes the result set that should be returned. It does not specify the steps for creating it.

Your two queries are semantically equivalent and the SQL optimizer should be able to recognize that.

Of course, SQL optimizers are not omniscient. So, sometimes how you write a query does affect the execution plan. However, the queries that you are describing are turned into execution plans that have no concept of "subquery", so it is reasonable that they would produce the same execution plan.
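The original pair of queries is not shown here, so the following is only a sketch of the general idea with made-up authors and books tables: a plain join with the filter in WHERE, and a join against a filtering subquery. Both describe the same result set. It runs on Python's sqlite3 module and prints SQLite's plan for each form so you can compare them on your own version; other engines may plan them differently.

```python
import sqlite3

# Hypothetical tables, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, year INTEGER);
    INSERT INTO authors VALUES (1, 'Le Guin'), (2, 'Borges');
    INSERT INTO books VALUES (1, 1, 1969), (2, 1, 1974), (3, 2, 1944);
""")

# Plain join, filter applied in WHERE.
plain_join = """
    SELECT a.name, b.year
    FROM authors a
    JOIN books b ON b.author_id = a.id
    WHERE b.year > 1950
    ORDER BY b.year
"""

# Join against a subquery (derived table) that applies the same filter.
join_with_subquery = """
    SELECT a.name, recent.year
    FROM authors a
    JOIN (SELECT author_id, year FROM books WHERE year > 1950) AS recent
      ON recent.author_id = a.id
    ORDER BY recent.year
"""

plain_rows = conn.execute(plain_join).fetchall()
subquery_rows = conn.execute(join_with_subquery).fetchall()
print(plain_rows == subquery_rows)

for label, sql in (("plain join", plain_join), ("join with subquery", join_with_subquery)):
    print(label)
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("   ", row[-1])  # human-readable plan step
```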

Note: Some databases, such as MS Access and older versions of MySQL, do not have very good optimizers, and such queries do produce different execution plans. Alas.

SQL Joins Vs SQL Subqueries (Performance)?

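I would EXPECT the first query to be quicker, mainly because you have an equivalence and an explicit JOIN. In my experience IN can be a very slow operator: with a list of values, SQL normally evaluates it as a series of conditions separated by "OR" (WHERE x=Y OR x=Z OR...). With a subquery, though, many optimizers turn it into a semi-join instead, so check the plan before assuming.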

As with ALL THINGS SQL though, your mileage may vary. The speed will depend a lot on indexes (do you have indexes on both ID columns? That will help a lot...) among other things.

The only REAL way to tell with 100% certainty which is faster is to turn on performance tracking (IO Statistics is especially useful) and run them both. Make sure to clear your cache between runs!
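A minimal sketch of "run them both and measure", using Python's sqlite3 module and wall-clock timing as a stand-in for your database's own statistics output. The tables a and b, the index name, and the time_query helper are all invented for this example, and the numbers it prints say nothing about any other engine. It also checks that the two forms return the same rows, which only holds here because b.a_id is unique; with duplicates, the JOIN would repeat rows that IN would not.

```python
import sqlite3
import time

# Hypothetical tables and index, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER);
    CREATE INDEX idx_b_a_id ON b (a_id);
""")
conn.executemany("INSERT INTO a VALUES (?, ?)", ((i, f"row {i}") for i in range(10000)))
# Every b.a_id is distinct and matches an even id in a.
conn.executemany("INSERT INTO b VALUES (?, ?)", ((i, i * 2) for i in range(5000)))

join_sql = "SELECT a.id FROM a JOIN b ON b.a_id = a.id ORDER BY a.id"
in_sql = "SELECT a.id FROM a WHERE a.id IN (SELECT a_id FROM b) ORDER BY a.id"


def time_query(sql, repeats=5):
    """Run sql several times; return (rows, best wall-clock seconds)."""
    best = float("inf")
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return rows, best


join_rows, join_seconds = time_query(join_sql)
in_rows, in_seconds = time_query(in_sql)
print(f"JOIN: {len(join_rows)} rows, best of 5: {join_seconds:.6f}s")
print(f"IN:   {len(in_rows)} rows, best of 5: {in_seconds:.6f}s")
```

Taking the best of several runs reduces noise but also means the later runs hit warm caches, which is the opposite of clearing the cache between runs. On a real server, measure both ways.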

Sub query vs joins performance

I would probably write this query using joins:

SELECT
    s.siteid,
    COALESCE(si.CountUniquePermissions, 0) AS CountUniquePermissions,
    COALESCE(si.CountNotModified30Days, 0) AS CountNotModified30Days
FROM sites s
LEFT JOIN
(
    SELECT
        siteid,
        COUNT(CASE WHEN CountUniqueRoleAssignments > 0 THEN 1 END)
            AS CountUniquePermissions,
        COUNT(CASE WHEN Modified < DATEADD(day, -30, GETDATE()) THEN 1 END)
            AS CountNotModified30Days
    FROM ScannedItems
    GROUP BY siteid
) si
    ON si.siteid = s.siteid
ORDER BY
    s.siteid;

The above query has no WHERE or HAVING clauses, and so I don't see any obvious way to tune it further using indices. But it at least has the potential advantage over your current query that it doesn't involve N^2 behavior with correlated subqueries in the select clause.
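The original query being replaced is not shown, so the sketch below assumes it used correlated subqueries in the SELECT list and checks that the LEFT JOIN version returns the same counts. It runs on Python's sqlite3 module with invented sample rows. Because SQLite has no DATEADD or GETDATE, a fixed cutoff date passed as a parameter stands in for "30 days ago".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table and column names follow the query above; the rows are invented sample data.
conn.executescript("""
    CREATE TABLE sites (siteid INTEGER PRIMARY KEY);
    CREATE TABLE ScannedItems (
        siteid INTEGER, CountUniqueRoleAssignments INTEGER, Modified TEXT);
    INSERT INTO sites VALUES (1), (2), (3);
    INSERT INTO ScannedItems VALUES
        (1, 2, '2024-01-01'), (1, 0, '2024-06-20'), (1, 1, '2024-06-25'),
        (2, 0, '2024-02-01'), (2, 0, '2024-03-01');
""")
cutoff = "2024-06-01"  # assumption: fixed stand-in for DATEADD(day, -30, GETDATE())

# Assumed shape of the original: one correlated subquery per output column.
correlated_sql = """
    SELECT s.siteid,
           (SELECT COUNT(*) FROM ScannedItems i
             WHERE i.siteid = s.siteid AND i.CountUniqueRoleAssignments > 0),
           (SELECT COUNT(*) FROM ScannedItems i
             WHERE i.siteid = s.siteid AND i.Modified < ?)
    FROM sites s
    ORDER BY s.siteid
"""

# The join version: aggregate ScannedItems once, then join the result to sites.
join_sql = """
    SELECT s.siteid,
           COALESCE(si.CountUniquePermissions, 0),
           COALESCE(si.CountNotModified30Days, 0)
    FROM sites s
    LEFT JOIN (
        SELECT siteid,
               COUNT(CASE WHEN CountUniqueRoleAssignments > 0 THEN 1 END)
                   AS CountUniquePermissions,
               COUNT(CASE WHEN Modified < ? THEN 1 END)
                   AS CountNotModified30Days
        FROM ScannedItems
        GROUP BY siteid
    ) si ON si.siteid = s.siteid
    ORDER BY s.siteid
"""

correlated_rows = conn.execute(correlated_sql, (cutoff,)).fetchall()
join_rows = conn.execute(join_sql, (cutoff,)).fetchall()
print(correlated_rows)
print(join_rows)
```

Site 3 has no scanned items, which is why the join version needs both the LEFT JOIN and the COALESCE to report zeros instead of dropping the row or returning NULLs.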


