Difference Between Subquery and Correlated Subquery

Difference between Subquery and Correlated Subquery

Correlated Subquery is a sub-query that uses values from the outer query. In this case the inner query has to be executed for every row of outer query.

See example here http://en.wikipedia.org/wiki/Correlated_subquery

Simple subquery doesn't use values from the outer query and is being calculated only once:

SELECT id, first_name 
FROM student_details
WHERE id IN (SELECT student_id
FROM student_subjects
WHERE subject= 'Science');

CoRelated Subquery Example -

Query To Find all employees whose salary is above average for their department

 SELECT employee_number, name
FROM employees emp
WHERE salary > (
SELECT AVG(salary)
FROM employees
WHERE department = emp.department);

Mysql Exists vs IN -- correlated subquery vs subquery?

This is a RDBMS-agnostic answer, but may help nonetheless. In my understanding, the correlated (aka, dependent) subquery is perhaps the most often falsely accused culprit for bad performance.

The problem (as it is most often described) is that it processes the inner query for every row of the outer query. Therefore, if the outer query returns 1,000 rows, and the inner query returns 10,000, then your query has to slog through 10,000,000 rows (outer×inner) to produce a result. Compared to the 11,000 rows (outer+inner) from a non-correlated query over the same result sets, that ain't good.

However, this is just the worst case scenario. In many cases, the DBMS will be able to exploit indexes to drastically reduce the rowcount. Even if only the inner query can use an index, the 10,000 rows becomes ~13 seeks, which drops the total down to 13,000.

The exists operator can stop processing rows after the first, cutting down the query cost further, especially when most outer rows match at least one inner row.

In some rare cases, I have seen SQL Server 2008R2 optimise correlated subqueries to a merge join (which traverses both sets only once - best possible scenario) where a suitable index can be found in both inner and outer queries.

The real culprit for bad performance is not necessarily correlated subqueries, but nested scans.

Inner query aliasing in a correlated subquery

True, the correlated subquery is executed before the outer SELECT. But that is orthogonal to the visibility of table names and aliases.

The whole query is parsed and planned before any of it is executed. Details in the chapter The Path of a Query in the manual.

Why to use Correlated Subqueries?

Oracle has a good optimizer, but correlated subqueries are sometimes the most efficient way to express a query. For instance:

select t.*,
(select count(*) from z where z.t_id = t.id)
from t;

can be quite efficient with an index on z(t_id) because it avoids the outer aggregation.

There are other cases where they are both efficient and directly translate into a question: Fetch all the ts that don't exist in z.

select t.*
from t
where not exists (select 1 from z where z.id = t.id);

And finally, correlated subqueries are just an example of lateral joins. And lateral joins can be quite powerful. For instance, to get all the columns from the previous row, you might use:

select t.*, t2.*
from t cross join lateral
(select t2.*
from t t2
where t2.date < t.date
order by t2.date desc
fetch first 1 row only
) t2;

Use Of Correlated Subquery

Well, firstly it doesn't have a performance issue. It is what it is, and it will be executed as well as possible given the performance constraints of the hardware and database structure.

As for what it is useful for, it is just a way of expressing particular logical conditions.



Related Topics



Leave a reply



Submit