Procedurally Transform Subquery into Join

Procedurally transform subquery into join

Converting a subquery into a JOIN can be pretty straightforward:

IN clause

 FROM TABLE_X x
WHERE x.col IN (SELECT y.col FROM TABLE_Y y)

...can be converted to:

FROM TABLE_X x
JOIN TABLE_Y y ON y.col = x.col

Your JOIN criteria is where you have direct comparison.

EXISTS clause

But there are complications when you look at the EXISTS clause. EXISTS are typically correllated, where the subquery is filtered by criteria from the table(s) outside the subquery. But the EXISTS is only for returning a boolean based on the criteria.

 FROM TABLE_X x
WHERE EXISTS (SELECT NULL
FROM TABLE_Y y
WHERE y.col = x.col)

...converted:

FROM TABLE_X x
JOIN TABLE_Y y ON y.col = x.col

Because of the boolean, there's a risk of more rows turning up in the resultset.

SELECTs in the SELECT clause

These should always be changed, with prejudice:

SELECT x.*,
(SELECT MAX(y.example_col)
FROM TABLE_Y y
WHERE y.col = x.col)
FROM TABLE_X x

You're probably noticing a patter now, but I made this a little different for an inline view example:

SELECT x.*,
z.mc
FROM TABLE_X x
JOIN (SELECT y.col, --inline view within the brackets
MAX(y.example_col) 'mc'
FROM TABLE_Y y
GROUP BY y.col) z ON z.col = x.col

The key is making sure the inline view resultset includes the column(s) needed to join to, along with the columns.

LEFT JOINs

You might've noticed I didn't have any LEFT JOIN examples - this would only be necessary if columns from the subquery use NULL testing (COALESCE on almost any db these days, Oracle's NVL or NVL2, MySQLs IFNULL, SQL Server's ISNULL, etc...):

SELECT x.*,
COALESCE((SELECT MAX(y.example_col)
FROM TABLE_Y y
WHERE y.col = x.col), 0)
FROM TABLE_X x

Converted:

   SELECT x.*,
COALESCE(z.mc, 0)
FROM TABLE_X x
LEFT JOIN (SELECT y.col,
MAX(y.example_col) 'mc'
FROM TABLE_Y y
GROUP BY y.col) z ON z.col = x.col

Conclusion

I'm not sure if that will satisfy your typographic needs, but hope I've demonstrated that the key is determining what the JOIN criteria is. Once you know the column(s) involved, you know the table(s) involved.

How to convert JOINs to INs clauses?

I don't know why you try do do this.

Because i don't think that will be faster.

I' added also the GROUP By Clause.

SELECT DISTINCT * FROM membership
WHERE membership.MEM_NUM IN (SELECT rental.MEM_NUM
FROM rental
WHERE rental.RENT_NUM IN (SELECT detailrental.RENT_NUM
FROM detailrental
WHERE DETAIL_DUEDATE < DETAIL_RETURNDATE)
)
GROUP BY membership.MEM_NUM;

An example based on your data

CREATE TABLE `price` (
`PRICE_CODE` decimal(2,0) NOT NULL,
`PRICE_DESCRIPTION` varchar(20) NOT NULL,
`PRICE_RENTFEE` decimal(5,2) DEFAULT NULL,
`PRICE_DAILYLATEFEE` decimal(5,2) DEFAULT NULL,
PRIMARY KEY (`PRICE_CODE`)
);


CREATE TABLE `movie` (
`MOVIE_NUM` decimal(8,0) NOT NULL,
`MOVIE_TITLE` varchar(75) NOT NULL,
`MOVIE_YEAR` decimal(4,0) DEFAULT NULL,
`MOVIE_COST` decimal(5,2) DEFAULT NULL,
`MOVIE_GENRE` varchar(50) DEFAULT NULL,
`PRICE_CODE` decimal(2,0) DEFAULT NULL,
PRIMARY KEY (`MOVIE_NUM`),
KEY `PRICE_CODE` (`PRICE_CODE`),
CONSTRAINT `movie_ibfk_1` FOREIGN KEY (`PRICE_CODE`) REFERENCES `price` (`PRICE_CODE`)
);


CREATE TABLE `video` (
`VID_NUM` decimal(8,0) NOT NULL,
`VID_INDATE` date DEFAULT NULL,
`MOVIE_NUM` decimal(8,0) DEFAULT NULL,
PRIMARY KEY (`VID_NUM`),
KEY `MOVIE_NUM` (`MOVIE_NUM`),
CONSTRAINT `video_ibfk_1` FOREIGN KEY (`MOVIE_NUM`) REFERENCES `movie` (`MOVIE_NUM`)
);


CREATE TABLE `membership` (
`MEM_NUM` decimal(8,0) NOT NULL,
`MEM_FNAME` varchar(30) NOT NULL,
`MEM_LNAME` varchar(30) NOT NULL,
`MEM_STREET` varchar(120) DEFAULT NULL,
`MEM_CITY` varchar(50) DEFAULT NULL,
`MEM_STATE` char(2) DEFAULT NULL,
`MEM_ZIP` char(5) DEFAULT NULL,
`MEM_BALANCE` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`MEM_NUM`)
);


CREATE TABLE `rental` (
`RENT_NUM` decimal(8,0) NOT NULL,
`RENT_DATE` date DEFAULT NULL,
`MEM_NUM` decimal(8,0) DEFAULT NULL,
PRIMARY KEY (`RENT_NUM`),
KEY `MEM_NUM` (`MEM_NUM`),
CONSTRAINT `rental_ibfk_1` FOREIGN KEY (`MEM_NUM`) REFERENCES `membership` (`MEM_NUM`)
);


CREATE TABLE `detailrental` (
`RENT_NUM` decimal(8,0) NOT NULL,
`VID_NUM` decimal(8,0) NOT NULL,
`DETAIL_FEE` decimal(5,2) DEFAULT NULL,
`DETAIL_DUEDATE` date DEFAULT NULL,
`DETAIL_RETURNDATE` date DEFAULT NULL,
`DETAIL_DAILYLATEFEE` decimal(5,2) DEFAULT NULL,
PRIMARY KEY (`RENT_NUM`,`VID_NUM`),
KEY `VID_NUM` (`VID_NUM`),
CONSTRAINT `detailrental_ibfk_1` FOREIGN KEY (`RENT_NUM`) REFERENCES `rental` (`RENT_NUM`),
CONSTRAINT `detailrental_ibfk_2` FOREIGN KEY (`VID_NUM`) REFERENCES `video` (`VID_NUM`)
);


INSERT INTO `membership` VALUES (102,'TAMI','DAWSON','2632 TAKLI CIRCLE','NORENE','TN','37136',11.00),(103,'CURT','KNIGHT','4025 CORNELL COURT','FLATGAP','KY','41219',6.00),(104,'JAMAL','MELENDEZ','788 EAST 145TH AVENUE','QUEBECK','TN','38579',0.00),(105,'IVA','MCCLAIN','6045 MUSKET BALL CIRCLE','SUMMIT','KY','42783',15.00),(106,'MIRANDA','PARKS','4469 MAXWELL PLACE','GERMANTOWN','TN','38183',0.00),(107,'ROSARIO','ELLIOTT','7578 DANNER AVENUE','COLUMBIA','TN','38402',5.00),(108,'MATTIE','GUY','4390 EVERGREEN STREET','LILY','KY','40740',0.00),(109,'CLINT','OCHOA','1711 ELM STREET','GREENEVILLE','TN','37745',10.00),(110,'LEWIS','ROSALES','4524 SOUTHWIND CIRCLE','COUNCE','TN','38326',0.00),(111,'STACY','MANN','2789 EAST COOK AVENUE','MURFREESBORO','TN','37132',8.00),(112,'LUIS','TRUJILLO','7267 MELVIN AVENUE','HEISKELL','TN','37754',3.00),(113,'MINNIE','GONZALES','6430 VASILI DRIVE','WILLISTON','TN','38076',0.00);
INSERT INTO `price` VALUES (1,'Standard',3.00,1.00),(2,'New Release',4.50,3.00),(3,'Discount',2.50,1.00),(4,'Weekly Special',2.00,0.50);
INSERT INTO `movie` VALUES (1234,'The Cesar Family Christmas',2014,39.95,'FAMILY',2),(1235,'Smokey Mountain Wildlife',2011,59.95,'ACTION',3),(1236,'Richard Goodhope',2015,59.95,'DRAMA',2),(1237,'Beatnik Fever',2014,29.95,'COMEDY',2),(1238,'Constant Companion',2015,89.95,'DRAMA',NULL),(1239,'Where Hope Dies',2005,25.49,'DRAMA',3),(1245,'Time to Burn',2015,45.49,'ACTION',3),(1246,'What He Doesn\'t Know',2013,58.29,'COMEDY',1);
INSERT INTO `video` VALUES (34341,'2014-01-22',1235),(34342,'2014-01-22',1235),(34366,'2016-03-02',1236),(34367,'2016-03-02',1236),(34368,'2016-03-02',1236),(34369,'2016-03-02',1236),(44392,'2015-10-21',1237),(44397,'2015-10-21',1237),(54321,'2015-06-18',1234),(54324,'2015-06-18',1234),(54325,'2015-06-18',1234),(59237,'2016-02-14',1237),(61353,'2013-01-28',1245),(61354,'2013-01-28',1245),(61367,'2015-07-30',1246),(61369,'2015-07-30',1246),(61388,'2014-01-25',1239);
INSERT INTO `rental` VALUES (1001,'2016-03-01',103),(1002,'2016-03-01',105),(1003,'2016-03-02',102),(1004,'2016-03-02',110),(1005,'2016-03-02',111),(1006,'2016-03-02',107),(1007,'2016-03-02',104),(1008,'2016-03-03',105),(1009,'2016-03-03',111);
INSERT INTO `detailrental` VALUES (1001,34342,2.00,'2016-03-04','2016-03-02',NULL),(1001,34366,3.50,'2016-03-04','2016-03-02',3.00),(1001,61353,2.00,'2016-03-04','2016-03-03',1.00),(1002,59237,3.50,'2016-03-04','2016-03-04',3.00),(1003,54325,3.50,'2016-03-04','2016-03-09',3.00),(1003,61369,2.00,'2016-03-06','2016-03-09',1.00),(1003,61388,0.00,'2016-03-06','2016-03-09',1.00),(1004,34341,2.00,'2016-03-07','2016-03-07',1.00),(1004,34367,3.50,'2016-03-05','2016-03-07',3.00),(1004,44392,3.50,'2016-03-05','2016-03-07',3.00),(1005,34342,2.00,'2016-03-07','2016-03-05',1.00),(1005,44397,3.50,'2016-03-05','2016-03-05',3.00),(1006,34366,3.50,'2016-03-05','2016-03-04',3.00),(1006,61367,2.00,'2016-03-07',NULL,1.00),(1007,34368,3.50,'2016-03-05',NULL,3.00),(1008,34369,3.50,'2016-03-05','2016-03-05',3.00),(1009,54324,3.50,'2016-03-05',NULL,3.00);












SELECT DISTINCT * FROM membership
WHERE membership.MEM_NUM IN (SELECT rental.MEM_NUM
FROM rental
WHERE rental.RENT_NUM IN (SELECT detailrental.RENT_NUM
FROM detailrental
WHERE DETAIL_DUEDATE < DETAIL_RETURNDATE)
)
GROUP BY membership.MEM_NUM;

MEM_NUM | MEM_FNAME | MEM_LNAME | MEM_STREET | MEM_CITY | MEM_STATE | MEM_ZIP | MEM_BALANCE
------: | :-------- | :-------- | :-------------------- | :------- | :-------- | :------ | ----------:
102 | TAMI | DAWSON | 2632 TAKLI CIRCLE | NORENE | TN | 37136 | 11.00
110 | LEWIS | ROSALES | 4524 SOUTHWIND CIRCLE | COUNCE | TN | 38326 | 0.00

db<>fiddle here

Join vs. sub-query

Taken from the MySQL manual (13.2.10.11 Rewriting Subqueries as Joins):

A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better—a fact that is not specific to MySQL Server alone.

So subqueries can be slower than LEFT [OUTER] JOIN, but in my opinion their strength is slightly higher readability.

How to express this SQL query without subqueries

You can JOIN all four tables to get the same result. Note that comma joins have been superseded for decades, you should write ANSI joins instead:

SELECT DISTINCT id, summary
FROM article a
JOIN articles_categories ac ON ac.article_id = a.id
JOIN articles_keyphrases ak ON ak.article_id = a.id
JOIN projects_categories pc ON pc.category_id = ac.category_id AND pc.project_id = 26
JOIN projects_keyphrases pk ON pk.keyphrase_id = ak.keyphrase_id AND pk.project_id = 26

Subqueries vs joins

A "correlated subquery" (i.e., one in which the where condition depends on values obtained from the rows of the containing query) will execute once for each row. A non-correlated subquery (one in which the where condition is independent of the containing query) will execute once at the beginning. The SQL engine makes this distinction automatically.

But, yeah, explain-plan will give you the dirty details.

How to join self join with subquery?

You are mixing the pre-1992 comma-separated join syntax with proper ANSI joins, which you shouldn't. The docs (https://dev.mysql.com/doc/refman/5.7/en/join.html) state clearly:

However, the precedence of the comma operator is less than that of INNER JOIN, CROSS JOIN, LEFT JOIN, and so on. If you mix comma joins with the other join types when there is a join condition, an error of the form Unknown column 'col_name' in 'on clause' may occur.

Replacing the comma with CROSS JOIN makes it work: FROM `values` vs1 CROSS JOIN `values` vs2.

However, I find most of your conditions superfluous. Plus changing the order of joins for readability (especially in order not to pretend to be cross-joining):

SELECT
vs1.time as t1,
vs1.value as v1,
vs2.time as t2,
vs2.value as v2
FROM
(
SELECT MIN(time) as mint, MAX(time) as maxt
FROM `values`
GROUP BY hour(time), minute(time)
) as te
JOIN `values` vs1 ON vs1.time = te.mint
JOIN `values` vs2 ON vs2.time = te.maxt
ORDER BY vs1.time;

How to eliminate cartesian product with joins while using subquery?

This can be surprisingly simple with array_agg() as window aggregate function combined with array_remove() (introduced with pg 9.3):

CREATE TABLE npatest AS
SELECT paperid, name AS author
, array_to_string(array_remove(array_agg(name) OVER (PARTITION BY paperid), name), ', ') AS coauthors
FROM newpaperauthor n;

If author names are not unique, there are complications.

Then again, if author names are not unique, your whole operation is flawed.

Using array_agg() and array_remove() instead of string_agg() and regexp_replace(), because the latter would fail easily for similar names like 'Jon Fox' and 'Jon Foxy', and also be messy with delimiters.

array_to_string() transforms the array to a string. I used ', ' as separator, which seems more sensible to me than just a space.

The use of SELECT INTO is discouraged. Use the superior CREATE TABLE AS instead. Per documentation:

CREATE TABLE AS is the recommended syntax, since this form of
SELECT INTO is not available in ECPG or PL/pgSQL, because they
interpret the INTO clause differently. Furthermore, CREATE TABLE AS
offers a superset of the functionality provided by SELECT INTO.

SQL Fiddle.

What is a better alternative to procedural approach to produce a string based on columns?

The easiest way IMO is to move your criteria being found via EXISTS to an OUTER APPLY. Like so:

SELECT 
CONVERT(NVARCHAR(50), STUFF(calc.ReturnString, LEN(calc.ReturnString), 1, ''))
FROM
AUTHORDATA AS A
OUTER APPLY (SELECT TOP (1)
1 AS Found
FROM
AUHTORPROFILE AS B
INNER JOIN AUTHORHISTORY AS C ON B.AUTHORPROFILEID = C.AUTHORPROFILEID
WHERE
B.authorId = A.authorId) AS lookup_author
/*outer apply here just for readibility in final select*/
OUTER APPLY (SELECT
CONCAT(CASE WHEN A.hasRegistered = 1 THEN '0,' ELSE '' END
,CASE WHEN A.hasRegistered = 1 THEN '1,' ELSE '' END
,CASE WHEN lookup_author.Found = 1 THEN '10,' ELSE '' END) AS ReturnString) AS calc;

Then you can use lookup_author.Found = 1 to determine that it was found in your lookup. From there, you just have to apply the rest of your conditions correctly via CASE statements and then use your final SELECT over the result.



Related Topics



Leave a reply



Submit