SQL Joins VS SQL Subqueries (Performance)

SQL Joins Vs SQL Subqueries (Performance)?

I would EXPECT the first query to be quicker, mainly because it uses an equality comparison and an explicit JOIN. In my experience IN is a very slow operator, since the engine normally evaluates it as a series of equality conditions separated by "OR" (WHERE x=Y OR x=Z OR...).

As with ALL THINGS SQL though, your mileage may vary. The speed will depend a lot on indexes (do you have indexes on both ID columns? That will help a lot...) among other things.

The only REAL way to tell with 100% certainty which is faster is to turn on performance tracking (IO Statistics is especially useful) and run them both. Make sure to clear your cache between runs!
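To make the "run them both" advice concrete, here is a minimal sketch that times a JOIN and an IN version of the same lookup and checks that they return the same rows. The original queries are not shown in the question, so the `customers` and `orders` tables, their columns, and the index name are all hypothetical, and SQLite (via Python's standard `sqlite3` module) only stands in for your real database. On SQL Server you would instead enable `SET STATISTICS IO ON` and `SET STATISTICS TIME ON` and compare the output for each query.

```python
import sqlite3
import time

# Hypothetical schema: the question's real tables are not shown.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE INDEX idx_orders_customer ON orders(customer_id);
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"customer{i}") for i in range(1, 1001)],
)
# Only even-numbered customers get orders (three each).
conn.executemany(
    "INSERT INTO orders (customer_id) VALUES (?)",
    [(i,) for i in range(2, 1001, 2) for _ in range(3)],
)

JOIN_SQL = """
    SELECT DISTINCT c.id FROM customers c
    JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id
"""
IN_SQL = """
    SELECT c.id FROM customers c
    WHERE c.id IN (SELECT customer_id FROM orders)
    ORDER BY c.id
"""

def timed(sql):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

join_rows, join_secs = timed(JOIN_SQL)
in_rows, in_secs = timed(IN_SQL)

print(f"JOIN: {len(join_rows)} rows in {join_secs:.6f}s")
print(f"IN:   {len(in_rows)} rows in {in_secs:.6f}s")
```

A single run like this is noisy, so repeat each query several times and compare against realistic data volumes before drawing conclusions.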

Join vs. sub-query

Taken from the MySQL manual (13.2.10.11 Rewriting Subqueries as Joins):

A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better—a fact that is not specific to MySQL Server alone.

So subqueries can be slower than LEFT [OUTER] JOIN, but in my opinion their strength is slightly higher readability.
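As an illustration of the kind of rewrite that manual section describes, the sketch below runs a NOT IN subquery and the equivalent LEFT JOIN ... IS NULL form and confirms they return the same rows. The `table1` and `table2` names are placeholders, and SQLite is used here only so the example runs anywhere. It says nothing about which form is faster on your server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY);
    INSERT INTO table1 VALUES (1), (2), (3), (4);
    INSERT INTO table2 VALUES (2), (4);
""")

# Subquery form: rows of table1 with no match in table2.
subquery_rows = conn.execute(
    "SELECT id FROM table1 WHERE id NOT IN (SELECT id FROM table2) ORDER BY id"
).fetchall()

# Join form: keep every table1 row, then filter to the unmatched ones.
join_rows = conn.execute("""
    SELECT table1.id FROM table1
    LEFT JOIN table2 ON table1.id = table2.id
    WHERE table2.id IS NULL
    ORDER BY table1.id
""").fetchall()

print(subquery_rows, join_rows)
```

Note that the two forms are only interchangeable when the subquery column cannot be NULL: if `table2.id` could contain a NULL, the NOT IN version would return no rows at all.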

Performance on subqueries and JOINS?

There are a lot of opinions about JOINs vs Subqueries.

Chris London has a great article on this subject.

So it seems like the verdict is to do subqueries. The reason the
subquery in the join is faster than the subquery in the where clause
is, I believe, because when it’s in the where it has to run that
condition for every row whereas it only has to run it once for the
subquery/join. Like I said before different RDBMSs handle things
differently but even if your RDBMS doesn’t handle subqueries as well
as others, to me, they are more readable. So now I recommend subqueries!

Source: http://www.chrislondon.co/joins-vs-subqueries/

Subquery v/s inner join in sql server

Usually joins will run faster than inner queries, but in practice it depends on the execution plan generated by SQL Server. No matter how you write your query, SQL Server will always transform it into an execution plan. If the optimizer is "smart" enough to generate the same plan for both queries, you will get the same performance.


Why is subquery join much faster than direct join

The problem with your long-running queries is that you lack an index on the page_id column of the comments table. Hence, for each row from the pages table, you need to check all rows of the comments table. Since you are using LEFT JOIN, this is the only possible join order. What happens in 5.6 is that when you use a subquery in the FROM clause (aka derived table), MySQL will create an index on the temporary table used for the result of the derived table (auto_key0 in the EXPLAIN output). The reason it is faster when you only select one column is that the temporary table will be smaller.

In MySQL 5.7, such derived tables will be automatically merged into the main query, if possible. This is done to avoid the extra temporary tables. However, this means that you no longer have an index to use for the join. (See this blog post for details.)

You have two options to improve the query time in 5.7:

  1. You can create an index on comments(page_id)
  2. You can prevent the subquery from being merged by rewriting it as a query that cannot be merged. Subqueries with aggregation, LIMIT, or UNION will not be merged (see the blog post for details). One way to do this is to add a LIMIT clause to the subquery. So that no rows are removed from the result, the limit must be larger than the number of rows in the table.
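Option 1 is easy to try out. The sketch below checks the query plan before and after adding an index on comments(page_id). It runs on SQLite through Python's `sqlite3` module, not on MySQL, so the plan text will look different from MySQL's EXPLAIN output. The query shape, the extra columns, and the index name `idx_comments_page_id` are assumptions, since the original query is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, page_id INTEGER, body TEXT);
""")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [(i, f"page{i}") for i in range(1, 101)],
)
conn.executemany(
    "INSERT INTO comments (page_id, body) VALUES (?, ?)",
    [(i % 100 + 1, "text") for i in range(1000)],
)

# Assumed query shape: count comments per page with a LEFT JOIN.
QUERY = """
    SELECT p.id, COUNT(c.id) FROM pages p
    LEFT JOIN comments c ON c.page_id = p.id
    GROUP BY p.id
"""

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(QUERY)
conn.execute("CREATE INDEX idx_comments_page_id ON comments(page_id)")
plan_after = plan(QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

On MySQL the equivalent check would be to run EXPLAIN on the query before and after the CREATE INDEX statement and see whether the join on comments now uses the new index.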

In MySQL 8.0, you can also use an optimizer hint to avoid the merging. In your case, that would be something like

SELECT /*+ NO_MERGE(c) */ ... FROM

See slides 34-37 of this presentation for examples of how to use such hints.

Performance: Subquery or Joining

Modern RDBMSs, including Oracle, optimize most joins and subqueries down to the same execution plan.

Therefore, I would go ahead and write your query in the way that is simplest for you and focus on ensuring that you've fully optimized your indexes.

If you provide your final query and your database schema, we might be able to offer detailed suggestions, including information regarding potential locking issues.

Edit

Here are some general tips that apply to your query:

  • For joins, ensure that you have an index on the columns that you are joining on. Be sure to apply an index to the joined columns in both tables. You might think you only need the index in one direction, but you should index both, since sometimes the database determines that it's better to join in the opposite direction.
  • For WHERE clauses, ensure that you have indexes on the columns mentioned in the WHERE.
  • For inserting many rows, it's best if you can insert them all in a single query.
  • For inserting on a table with a clustered index, it's best if you insert with incremental values for the clustered index so that the new rows are appended to the end of the data. This avoids rebuilding the index and often avoids locks on the existing records, which would slow down SELECT queries against existing rows. Basically, inserts become less painful to other users of the system.
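As a rough sketch of the "single query" tip, the snippet below builds one multi-row INSERT instead of issuing one statement per row. The `events` table and its columns are made up for illustration, and SQLite is used only so the example is self-contained. The same VALUES (...), (...) syntax is accepted by MySQL, SQL Server, and PostgreSQL, though the parameter placeholder style depends on the driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, label TEXT)")

# Hypothetical batch with ascending keys, so new rows land at the end.
rows = [(1, "a"), (2, "b"), (3, "c")]

# One "(?, ?)" group per row, all sent in a single INSERT statement.
placeholders = ", ".join(["(?, ?)"] * len(rows))
params = [value for row in rows for value in row]

with conn:  # commit once for the whole batch
    conn.execute(
        f"INSERT INTO events (id, label) VALUES {placeholders}", params
    )

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
stored = conn.execute("SELECT id, label FROM events ORDER BY id").fetchall()
print(count, stored)
```

Most engines cap the number of bound parameters per statement, so very large batches need to be split into chunks.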

