Misnamed Field in Subquery Leads to Join

Misnamed field in subquery leads to join

Authoritative quote for Postgres

The scope of a subquery includes all visible columns of the outer query. Unqualified names are resolved to the inner query first, then extending the search outward.

Assign table aliases and use theses aliases to table-qualify column names to remove any ambiguity - as you hinted yourself already.

Here is an example in the Postgres manual with a definitive statement explaining the scope:

SELECT ... FROM fdt WHERE c1 IN (SELECT c3 FROM t2 WHERE c2 = fdt.c1 + 10)

[...]

Qualifying c1 as fdt.c1 is only necessary if c1 is also the name of a
column in the derived input table of the subquery. But qualifying the
column name adds clarity even when it is not needed. This example
shows how the column naming scope of an outer query extends into its inner queries.

Bold emphasis mine.

There is also an example with an EXISTS semi-join in the list of examples in the same chapter of the manual. That's typically the superior alternative to WHERE x IN (subquery). But in this particular case you don't need either. See below.

One example:

  • sql query to extract new records

DB design, naming convention

This disaster happened because of confusion about column names. A clear and consistent naming convention in your table definitions would go a long way to make that a lot less likely to happen. This is true for any RDBMS. Make them as long as necessary to be clear, but as short as possible otherwise. Whatever your policy, be consistent.

For Postgres I would suggest:

CREATE TABLE colorcat (
colorcat_id integer NOT NULL PRIMARY KEY,
colorcat text UNIQUE NOT NULL
);

CREATE TABLE color (
color_id integer NOT NULL PRIMARY KEY,
color text NOT NULL,
colorcat_id integer REFERENCES colorcat -- assuming an FK
);
  • You already had legal, lower-case, unquoted identifiers. That's good.

  • Use a consistent policy. An inconsistent policy is worse than a bad policy. Not color_name (with underscore) vs. catname.

  • I rarely use 'name' in identifiers. It doesn't add information, just makes them longer. All identifiers are names. You chose cat_name, leaving away color, which actually carries information, and added name, which doesn't. If you have other "categories" in your DB, which is common, you'll have multiple cat_name which easily collide in bigger queries. I'd rather use colorcat (just like the table name).

  • Make the name indicate what's in the column. For the ID of a color category, colorcat_id is a good choice. id is not descriptive, colorcat would be misleading.

  • The FK column colorcat_id can have the same name as the referenced column. Both have exactly the same content. Also allows short syntax with USING in joins.

Related answer with more details:

  • How to implement a many-to-many relationship in PostgreSQL?

Better query

Building on my supposed design:

SELECT c.*
FROM colorcat cc
JOIN color c USING (colorcat_id)
WHERE cc.colorcat = 'magic color';

This is assuming a 1:n relationship between colorcat and color (which you did not specify, but seems likely).

Less publicly known (since the syntax is different in other RDBMS like SQL Server), you can join in additional tables in a DELETE as well:

DELETE FROM color c
USING colorcat cc
WHERE cc.colorcat = 'magic color'
AND cc.colorcat_id = c.colorcat_id;

MySQL: Using WHERE clause in subquery in JOIN gives error when outer table's column is referenced

To achieve the expected result. we could start with a query like this:

 SELECT m.id
, m.mode
, m.type
, m.other
FROM master m
WHERE m.mode = 2
ORDER BY ...

And for each row returned by that query, it looks like we want to get a single value. We can use a correlated subquery to get the value, for example:

 SELECT t.amount
FROM transactions t
WHERE t.id = m.id
AND t.type = IF(m.type=2,1,2)
ORDER BY t.amount DESC
LIMIT 1

The correlated subquery returns a single column, and returns no more than one row, so we can include that in the SELECT list of the original query.

Something like this:

 SELECT m.id
, m.mode
, m.type
, m.other
, ( SELECT t.amount
FROM transactions t
WHERE t.id = m.id
AND t.type = IF(m.type=2,1,2)
ORDER BY t.amount DESC
LIMIT 1
) AS `Amount`
FROM master m
WHERE m.mode = 2
ORDER BY ...

The way this works is that the outer query (query from master) is executed. For each row returned, the expressions in the SELECT list are evaluated. Evaluating expressions like m.id, m.mode, m.type are pretty straightforward, and each returns a scalar value which gets put into the resultset.

Evaluating the correlated subquery works the same way. It's a little more complicated... MySQL executes the subquery, using the values of the current m row, and returns a single value, which gets put into the resultset.

NOTE:

If the correlated subquery doesn't return a row, a NULL gets put into the resultset.

If the specification is to return only rows with a non-NULL value in the amount column, we can add a HAVING clause

 WHERE m.mode = 2 
HAVING `Amount` IS NOT NULL
ORDER BY ...

Also note:

Because that correlated subquery is executed for each row returned by the outer query, for a lot of rows from master, thats a lot of subquery executions, which can drag down performance for large sets.

That makes it very important to have a suitable index available for subquery.
Ideally, a covering index ...

ON transaction (id, type, amount)

mysql: duplicate column name on join with subquery

In the subselect of your join, you are selecting a.controlno and by t.* t.controlno.
You should provide an alias for one of the selected columns. In your case a.controlno. This is necessary, because the table aliases of the inner select are lost, when accessing it from the outer one.

The statement below should work, if there aren't any other duplicate column names in test and the set of used columns from applicant.

SELECT sp.testno, sp.companyid, st.* 
FROM sponsor AS sp
LEFT JOIN
(
SELECT a.sponsorempno, (CASE WHENt.companyid IS NULL OR t.companyid = '' THEN'aa' ELSE t.companyid END) agncy, a.controlno as a_controlno, a.tnpl, t.*
FROM applicant AS a
LEFT JOIN
test AS t
ON a.controlno = t.controlno
) AS st
ON sp.testno = st.testno

Reuse of a field from a joined table inside a subquery in FROM clause

Recall SQL's logical order of operations that differ from its lexical order (i.e., order in how it is written). Usually the first step in query processing is the FROM clause, then JOIN, ON, WHERE, GROUP BY, etc. and usually ending with ORDER BY and SELECT (ironically one of the last clauses processed though written first).

Technically, your queries do not involve correlated subqueries since there are no inner or outer levels. Specifically, the derived table t3 and base table t4 are at the same level. The query engine evaluates t3 in isolation by itself during FROM clause step. Then, it evaluates JOIN table, t4, in isolation by itself and finally applies the matching ON logic.

Because t4 is not defined in the universe of t3, MS Access via GUI prompts for that parameter value (where MS Access via ODBC will raise an error). To resolve you have to include all necessary data sources in each table scope:

SELECT t1.field1, t1.field2 - IIF(t3.calcfield IS NULL, 0, t3.calc) As Diff
FROM
(SELECT t2.fieldid, SUM(t2.field3) AS fsum
FROM t2
INNER JOIN table4 sub_t4
ON t2.fieldid = sub_t4.fieldid
WHERE t2.date > sub_t4.date
GROUP BY t2.fieldid
) t3
LEFT JOIN table4 t4
ON t3.fieldid = t4.fieldid

Often, too, using layered queries is beneficial in Access and can help with final, compact queries:

t3 query (save below as a query object)

SELECT t2.fieldid, SUM(t2.field3) AS fsum
FROM t2
INNER JOIN table4 sub_t4
ON t2.fieldid = sub_t4.fieldid
WHERE t2.date > sub_t4.date
GROUP BY t2.fieldid

Final query (join saved query)

SELECT t1.field1, t1.field2 - IIF(t3.calcfield IS NULL, 0, t3.calc) As Diff
FROM my_saved_query t3
LEFT JOIN table4 t4
ON t3.fieldid = t4.fieldid

SQL sub query with left join syntax error

I figured out why this wasn't working! Although it doesn't explain why using the actual query objects (Query1 and Query2) worked, it fixed me problem so I'm okay with it.

The grouping in the second sub-query was causing the issue. I removed the grouping in the second sub-query and then performed the grouping in the final query which is the aggregate of the two sub queries. Below is the working query that resolved the issue:

SELECT A.Employee, A.Event, A.Process, A.Priority, A.Minutes, Sum(B.Total) AS Assigned
FROM (
SELECT tblSkillSets.employeeName AS Employee, tblSkillSets.eventName AS Event, tblProcesses.ID AS Process, tblProcesses.Priority.Value AS Priority, tblProcesses.minutesPerItem AS Minutes
FROM tblEvents INNER JOIN (tblSkillSets INNER JOIN tblProcesses ON tblSkillSets.eventName = tblProcesses.eventName.Value) ON tblEvents.ID = tblSkillSets.eventName
WHERE tblProcesses.ID=27
) AS A
LEFT JOIN (
SELECT tblWorkflow.assignedTo, [itemCount]*[minutesPerItem] AS Total
FROM tblProcesses RIGHT JOIN tblWorkflow ON tblProcesses.ID = tblWorkflow.processName
WHERE (tblWorkflow.Status=1 Or tblWorkflow.Status=2)
AND DateDiff('d',[openedDate],Date())=0
) AS B
ON A.Employee = B.assignedTo
GROUP BY A.Employee, A.Event, A.Process, A.Priority, A.Minutes, A.Employee
ORDER BY Sum(B.Total), A.Employee;

So basically Access cannot combine two sub queries where one is an aggregate and the other is not. In order to fix this make both sub queries non-aggregate queries, combine them, and perform the aggregate on the final query.

Join vs. sub-query

Taken from the MySQL manual (13.2.10.11 Rewriting Subqueries as Joins):

A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better—a fact that is not specific to MySQL Server alone.

So subqueries can be slower than LEFT [OUTER] JOIN, but in my opinion their strength is slightly higher readability.

SQL LEFT JOIN Subquery Alias

You didn't select post_id in the subquery. You have to select it in the subquery like this:

SELECT wp_woocommerce_order_items.order_id As No_Commande
FROM wp_woocommerce_order_items
LEFT JOIN
(
SELECT meta_value As Prenom, post_id -- <----- this
FROM wp_postmeta
WHERE meta_key = '_shipping_first_name'
) AS a
ON wp_woocommerce_order_items.order_id = a.post_id
WHERE wp_woocommerce_order_items.order_id =2198


Related Topics



Leave a reply



Submit