MySQL Correlated Subquery in Join Syntax

MySQL correlated subquery in JOIN syntax

The answer to your question is no, it is not possible to reference correlation names as you are doing. The derived table is produced by your inner query before the outer query starts evaluating joins. So the correlation names like t, tp, and u are not available to the inner query.

To solve this, I'd recommend using the same constant integer value in the inner query, and then join the derived table in the outer query using a real condition instead of 1=1.

SELECT t.ticketid, u.userid, t.fullname, u.loginapi_userid, t.email,
tp.subject, tp.contents, a.PhoneNumber, a.Location, a.Extension,
a.BusinessUnit, a.Department
FROM swtickets t
INNER JOIN swticketposts tp ON (t.ticketid = tp.ticketid)
INNER JOIN swusers u ON (t.userid = u.userid)
LEFT OUTER JOIN (
SELECT cfv.typeid,
MIN(CASE cfv.customfieldid WHEN 1 THEN cfv.fieldvalue END) AS 'PhoneNumber',
MIN(CASE cfv.customfieldid WHEN 3 THEN cfv.fieldvalue END) AS 'Location',
MIN(CASE cfv.customfieldid WHEN 5 THEN cfv.fieldvalue END) AS 'Extension',
MIN(CASE cfv.customfieldid WHEN 8 THEN cfv.fieldvalue END) AS 'BusinessUnit',
MIN(CASE cfv.customfieldid WHEN 9 THEN cfv.fieldvalue END) AS 'Department'
FROM swcustomfieldvalues cfv
WHERE cfv.typeid = 2458
GROUP BY cfv.typeid
) AS a ON (a.typeid = t.ticketid)
WHERE t.ticketid = 2458;

Evaluating a correlated subquery in SQL

Conceptually...

To understand this, first ignore the bit about correlated subquery.

Consider the order of operations for a statement like this:

SELECT t.foo FROM mytable t

MySQL prepares an empty resultset. Rows in the resultset will consist of one column, because there is one expression in the SELECT list. A row is retrieved from mytable. MySQL puts a row into the resultset, using the value from the foo column from the mytable row, assigning it to the foo column in the resultset. Fetch the next row, repeat that same process, until there are no more rows to fetch from the table.

Pretty easy stuff. But bear with me.

Consider this statement:

SELECT t.foo AS fi, 'bar' AS fo FROM mytable t

MySQL process that the same way. Prepare an empty resultset. Rows in the resultset are going to have two columns this time. The first column is given the name fi (because we assigned the name fi with an alias). The second column in rows of the resultset will be named fo, because (again) we assigned an alias.

Now we etch a row from mytable, and insert a row into the resultset. The value of the foo column goes into the column name fi, and the literal string 'bar' goes into the column named fo. Continue fetching rows and inserting rows into the resultset, until no more rows to fetch.

Not too hard.

Next, consider this statement, which looks a little more tricky:

SELECT t.foo AS fi, (SELECT 'bar') AS fo FROM mytable t

Same thing happens again. Empty resultset. Rows have two columns, name fi and fo.

Fetch a row from mytable, and insert a row into the resultset. The value of foo goes into column fi (just like before.) This is where it gets tricky... for the second column in the resultset, MySQL executes the query inside the parens. In this case it's a pretty simple query, we can test that pretty easily to see what it returns. Take the result from that query and assign that to the fo column, and insert the row into the resultset.

Still with me?

SELECT t.foo AS fi, (SELECT q.tip FROM bartab q LIMIT 1) AS fo FROM mytable 

This is starting to look more complicated. But it's not really that much different. The same things happen again. Prepare the empty resultset. Rows will have two columns, one name fi, the other named fo. Fetch a row from mytable. Get the value from foo column, and assign it to the fi column in the result row. For the fo column, execute the query, and assign the result from the query to the fo column. Insert the result row into the resultset. Fetch another row from mytable, a repeat the process.

Here we should stop and notice something. MySQL is picky about that query in the SELECT list. Really really picky. MySQL has restrictions on that. The query must return exactly one column. And it cannot return more than one row.

In that last example, for the row being inserted into the resultset, MySQL is looking for a single value to assign to the fo column. When we think about it that way, it makes sense that the query can't return more than one column... what would MySQL do with the value from the second column? And it makes sense that we don't want to return more than one row... what would MySQL do with multiple rows?

MySQL will allow the query to return zero rows. When that happens, MySQL assigns a NULL to the fo column.

If you have an understanding of that, your 95% of the way there to understanding the correlated subquery.

Let's look at another example. Our single line of SQL is getting a little unweildy, so we'll just add some line breaks and spaces to make it easier for us to work with. The extra spaces and linebreaks don't change the meaning of our statement.

SELECT t.foo AS fi
, ( SELECT q.tip
FROM bartab q
WHERE q.col = t.foo
ORDER BY q.tip DESC
LIMIT 1
) AS fo
FROM mytable t

Okay, that looks a lot more complicated. But is it really? It's the same thing again. Prepare an empty resultset. Rows will have two columns, fi and fo. Fetch a row from mytable, and get a row ready to insert into the resultset. Copy the value from the foo column, assign it to the fi column. And for the fo column, execute the query, take the single value returned by the query to the fo column, and push the row into the resultset. Fetch the next row from mytable, and repeat.

To explain (finall!) the part about "correlated".

That query we are going to run to get the result for the fo column. That contains a reference to a column from the outer table. t.foo. In this example that appears in the WHERE clause; it doesn't have to, it could appear anywhere in the statement.

What MySQL does with that, when it runs that subquery, it passes in the value of the foo column, into the query. If the row we just fetched from mytable has a value of 42 in the foo column... that subquery is equivalent to

         SELECT q.tip
FROM bartab q
WHERE q.col = 42
ORDER BY q.tip DESC
LIMIT 1

But since we're not passing in the literal value of 42, what we're passing in is values from the row in the outer query, the result returned by our subquery is "related" to the row we're processing in the outer query.

We could be a lot more complicated in our subquery, as long as we remember the rule about the subquery in the SELECT list... it has to return exactly one column, and at most one row. It returns at most one value.

Correlated subqueries can appear in parts of the statement other than the SELECT list, such as the WHERE clause. The same general concept applies. For each row processed by the outer query, the values of the column(s) from that row are passed in to the subquery. The result returned from the subquery is related to the row being processed in the outer query.


The discussion omits all the steps before the actual execution... parsing the statament into tokens, performing the syntax check (keywords and identifiers in the right place). Then performing the semantics check (does mytable exist, does the user have select privilege on it, does the column foo exist in mytable). Then determining the access plan. And in the execution, obtaining the required locks, and so on. All that happens with every statement we execute.)

And we're going to not discuss the kinds of horrendous performance issues we can create with correlated subqueries. Though the previous discussion should give a clue. Since the subquery is executed for every row we're putting into the resultset (if it's in the SELECT list of our outer query), or is being executed for every row that is accessed by the outer query... if the outer query is returning 40,000 rows, that means our correlated subquery is going to be executed 40,000 times. So we better well make sure that subquery executes fast. Even when it executes fast, we're still going to execute it 40,000 times.

MySQL/SQL: Update with correlated subquery from the updated table itself

Following the two answers I received (none of which was complete so I wrote my own), what I eventually did is as follows:

UPDATE Table AS target
INNER JOIN
(
select category, appearances_sum
from Table T inner join (
select category as cat, sum(appearances) as appearances_sum
from Table
group by cat
) as agg
where T.category = agg.cat
group by category
) as source
ON target.category = source.category
SET target.probability = target.appearances / source.appearances_sum

It works very quickly. I also tried with correlated subquery but it was much slower (orders of magnitude), so I'm sticking with the join.

Correlated SQL Join Query from multiple tables

If your exchangeRate table doesn't have any gaps, this is pretty simple:

select dateOfPurchase, costOfPurchase, countryOfPurchase, exchangeRate
from purchasesTable p
inner join
exchangeRateTable e
on p.dateofpurchase = e.effectivedate
and p.countryofpurchase = e.countrycode

If it does have gaps (the effective rate is set on 1/1 and doesn't change until 1/3, so the 1/1 rate applies to 1/2), then it gets a little more complicated because the end date is only ever implied. In that case, the following should work:

select dateOfPurchase, costOfPurchase, countryOfPurchase, exchangeRate
from purchasesTable p
left outer join
(select e1.exchangeRate, e1.countrycode,
e1.effectivedate, min(e2.effectivedate) as enddate
from exchangeRateTable e1
left outer join
exchangeRateTable e2
on e1.effective_date < e2.effective_date
and e1.countrycode = e2.countrycode
group by e1.exchangeRate, e1.countrycode,
e1.effectivedate) e
on p.dateofpurchase >= e.effectivedate
and (p.dateofpurchase < e.enddate
or e.enddate is null)
and p.countryofpurchase = e.countrycode

If you need to use this solution, you might want to put the innermost query in a stored query, both to simplify this and to make the end-date available to other processes.


What we're doing is getting each of the records from the exchange rate table (e1) and joining it to all of the entries in the same table (e2) that occur later in time. The we take the smallest of that second set of values (min(e2.effectivedate)).

Let's say you have only three values:

1/1/2000
1/3/2000
1/5/2000

The join will give you the following results (each value combined with all greater values):

1/1/2000 < 1/3/2000
1/1/2000 < 1/5/2000
1/3/2000 < 1/5/2000
1/5/2000 < [null]

Since there's no value that 1/5/2000 is lesser than and we specified an outer join, that row will have an empty value for the second table. We then specified that we only wanted the smallest value from the second table, so the result set is reduced to:

1/1/2000 < 1/3/2000
1/3/2000 < 1/5/2000
1/5/2000 < [null]

Finally, in the outermost join, we tell the query to join all dates between those two values. However, because one set has a null end date, we add an or condition to ignore the upper bound in that case.


I got started learning SQL from Litwin, et.al.'s "Access 95 Developer's Handbook" and reading Usenet a lot, so my sources are bit out of date...

correlated subquery between two tables

should be

SELECT 
id,
project_name,
(SELECT slug FROM
project_slugs ps
WHERE ps.project_id = p.id
ORDER BY created ASC LIMIT 1) as slug
FROM projects p;

where the subquery is known as slug which is an alias, this can be any name.

JOIN or Correlated subquery with exists clause, which one is better

Generally, the EXISTS clause because you may need DISTINCT for a JOIN for it to give the expected output. For example, if you have multiple Department rows for a ContactInformation row.

In your example above, the SELECT *:

  • means different output too so they are not actually equivalent
  • less chance of a index being used because you are pulling all columns out

Saying that, even with a limited column list, they will give the same plan: until you need DISTINCT... which is why I say "EXISTS"



Related Topics



Leave a reply



Submit