SQL select where not in subquery returns no results
Update:
These articles in my blog describe the differences between the methods in more detail:
NOT IN
vs.NOT EXISTS
vs.LEFT JOIN / IS NULL
:SQL Server
NOT IN
vs.NOT EXISTS
vs.LEFT JOIN / IS NULL
:PostgreSQL
NOT IN
vs.NOT EXISTS
vs.LEFT JOIN / IS NULL
:Oracle
NOT IN
vs.NOT EXISTS
vs.LEFT JOIN / IS NULL
:MySQL
There are three ways to do such a query:
LEFT JOIN / IS NULL
:SELECT *
FROM common
LEFT JOIN
table1 t1
ON t1.common_id = common.common_id
WHERE t1.common_id IS NULLNOT EXISTS
:SELECT *
FROM common
WHERE NOT EXISTS
(
SELECT NULL
FROM table1 t1
WHERE t1.common_id = common.common_id
)NOT IN
:SELECT *
FROM common
WHERE common_id NOT IN
(
SELECT common_id
FROM table1 t1
)
When table1.common_id
is not nullable, all these queries are semantically the same.
When it is nullable, NOT IN
is different, since IN
(and, therefore, NOT IN
) return NULL
when a value does not match anything in a list containing a NULL
.
This may be confusing but may become more obvious if we recall the alternate syntax for this:
common_id = ANY
(
SELECT common_id
FROM table1 t1
)
The result of this condition is a boolean product of all comparisons within the list. Of course, a single NULL
value yields the NULL
result which renders the whole result NULL
too.
We never cannot say definitely that common_id
is not equal to anything from this list, since at least one of the values is NULL
.
Suppose we have these data:
common
--
1
3
table1
--
NULL
1
2
LEFT JOIN / IS NULL
and NOT EXISTS
will return 3
, NOT IN
will return nothing (since it will always evaluate to either FALSE
or NULL
).
In MySQL
, in case on non-nullable column, LEFT JOIN / IS NULL
and NOT IN
are a little bit (several percent) more efficient than NOT EXISTS
. If the column is nullable, NOT EXISTS
is the most efficient (again, not much).
In Oracle
, all three queries yield same plans (an ANTI JOIN
).
In SQL Server
, NOT IN
/ NOT EXISTS
are more efficient, since LEFT JOIN / IS NULL
cannot be optimized to an ANTI JOIN
by its optimizer.
In PostgreSQL
, LEFT JOIN / IS NULL
and NOT EXISTS
are more efficient than NOT IN
, sine they are optimized to an Anti Join
, while NOT IN
uses hashed subplan
(or even a plain subplan
if the subquery is too large to hash)
SQL subquery returns no results
You're trying to find a department ID which matches a number of employees; not the department those employees belong to.
If you're on 12c or higher you can use the row limiting clause:
select d.department_id, d.department_name, count(e.employee_id) as employee_count
from departments d
join employees e on e.department_id = d.department_id
group by d.department_id, d.department_name
order by count(e.employee_id) desc
fetch first row only;
DEPARTMENT_ID DEPARTMENT_NAME EMPLOYEE_COUNT
------------- ------------------------------ --------------
50 Shipping 45
If you can have ties and want to show all tied rows, change only
to with ties
; otherwise you'll get an indeterminate row from those that are tied.
In earlier versions the equivalent would be:
select department_id, department_name, employee_count
from (
select d.department_id, d.department_name, count(e.employee_id) as employee_count
from departments d
join employees e on e.department_id = d.department_id
group by d.department_id, d.department_name
order by count(e.employee_id) desc
)
where rownum = 1;
DEPARTMENT_ID DEPARTMENT_NAME EMPLOYEE_COUNT
------------- ------------------------------ --------------
50 Shipping 45
which is also indeterminate if there are ties.
To handle ties properly pre-12c, you can use a query similar to @Littlefoot's, but using rank()
:
select department_id, department_name, employee_count
from (
select d.department_id, d.department_name, count(e.employee_id) as employee_count,
rank() over (order by count(*) desc) as rnk
from departments d
join employees e on e.department_id = d.department_id
group by d.department_id, d.department_name
order by count(e.employee_id) desc
)
where rnk = 1;
... or use @TheImpaler's approach.
the teacher wants subquery only
Always fun to have articifial restrictions... but you could do:
select d.department_id, d.department_name, count(e.employee_id) as employee_count
from departments d
join employees e on e.department_id = d.department_id
group by d.department_id, d.department_name
having count(e.employee_id) = (
select max(count(employee_id))
from employees
group by department_id
);
which will show ties; or
select d.department_id, d.department_name, count(e.employee_id) as employee_count
from departments d
join employees e on e.department_id = d.department_id
where d.department_id in (
select department_id
from (
select department_id
from employees
group by department_id
order by count(employee_id) desc
)
where rownum = 1
)
group by d.department_id, d.department_name;
which won't, unless you changerownum
to rank()
again.
no explicit joins if thats possible
If you aren't allowed any joins I suppose you could do:
select d.department_id, d.department_name,
(select count(e.employee_id)
from employees e
where e.department_id = d.department_id) as employee_count
from departments d
where d.department_id in (
select department_id
from (
select department_id
from employees
group by department_id
order by count(employee_id) desc
)
where rownum = 1
)
group by d.department_id, d.department_name;
DEPARTMENT_ID DEPARTMENT_NAME EMPLOYEE_COUNT
------------- ------------------------------ --------------
50 Shipping 45
which is getting a bit ridiculous *8-)
Query with NOT IN subquery returning 0 rows
Based on the provided SQLFiddle in the edited question, this works.
SELECT p.*, 'BBCode' AS Format,
FROM_UNIXTIME(TIME) AS DateInserted,
FROM_UNIXTIME(editTime) AS DateUpdated
FROM et_post p
INNER JOIN et_conversation c
ON c.conversationId = p.conversationId
and c.private = 0
join (
select conversationId,min(postId) as m
from et_post
group by conversationId
) r
on r.conversationId = c.conversationId
where p.postId<>r.m
12,15,18 disappear as requested in your edit ... so too does NOT IN
madness
If subquery returns no rows, return 0 else return value of the subquery
In the example you posted you could simply wrap the subquery in COALESCE and get your desired result.
SELECT
*
FROM
#Purchase
WHERE
Total < 5000
AND
Total > COALESCE(
(
SELECT TOP 1
buyamount
FROM
#employee
WHERE
manager = 'TST'
ORDER BY
buyamount ASC
), 0)
ORDER BY Date DESC
You could also use a variable in your provided example instead of a subquery. I prefer this approach because it separates the logic, making it easier to read. Subqueries can also be a performance killer. Having it broken out like this, it should perform better.
DECLARE @BuyAmount INT
SET @BuyAmount = (SELECT TOP 1 BuyAmount FROM #Employee WHERE Manager = 'TST' ORDER BY BuyAmount ASC)
IF @BuyAmount IS NULL SET @BuyAmount = 0
SELECT *
FROM #Purchase
WHERE Total < 5000 AND Total > @BuyAmount
ORDER BY Date DESC
MySQL and subquery returns no results
So either skk.sarjan_kilpailukausi_id = sr.vaadittu_pelaajastatus_id
or NOT EXISTS (SELECT * FROM skp WHERE skp.kayttaja_id = ki.kayttaja_id)
? Okay, I've applied this condition in the following query.
I'm using table aliases to make it all more radable. I've also re-written your joins. Please check them.
SELECT
sr.montako_huonointa_pois_lkm AS ei_huomioitavat_kilpailut_lkm
, sr.montako_suora_nousu_lkm AS montako_suora_nousu_lkm
, sr.montako_suora_pudotus_lkm AS montako_suora_pudotus_lkm
, sr.montako_nousukarsinta_lkm AS montako_nousukarsinta_lkm
, sr.montako_pudotuskarsinta_lkm AS montako_pudotuskarsinta_lkm
, sr.vaadittu_pelaajastatus_id AS vaadittu_pelaajastatus_id
, sr.epakelpo_pelaajastatus_id AS epakelpo_pelaajastatus_id
, sro.monesko_kilpailu AS monesko_kilpailu
, sro.sarakeotsikko AS kilpailu_sarakeotsikko
, sor.kayttaja_id AS kayttaja_id
, sor.pistemaara AS pisteet
, sor.kompensaatiopistemaara AS kompensaatiopisteet
, kk.id AS kilpailu_id
, kk.ajankohta_alkamispaiva AS kilpailu_alkamispaiva
, kk.kilpailunimi_virallinen AS kilpailu_nimi
, ki.pelaaja_oma_nimi_txt AS kayttaja_nimiteksti
, ki.pelaaja_oma_seura_lyh_txt AS kayttaja_seurateksti
, ki.pelaaja_oma_kunta_txt AS kayttaja_kuntateksti
, sk.kausi_id AS kausi_id
FROM sarjojen_rglistat sr
JOIN sarjojen_rglistojen_osakilpailut sro ON sro.rglista_id = sr.id
JOIN kilpailukausien_kilpailut kk ON kk.id = sro.kilpailu_id
AND kk.kausi_id IS NULL
AND kk.seura_id IS NULL
JOIN sarjojen_osakilpailuiden_rgpisteet sor ON sor.kilpailu_id = kk.id
JOIN sarjojen_kilpailukausien_kilpailusysteemit skk
ON skk.id = kk.kauden_kilpailusysteemi_id
JOIN sarjojen_kilpailukaudet sk ON sk.id = skk.sarjan_kilpailukausi_id
LEFT JOIN sarjojen_kilpailukausien_pelaajastatukset skp
ON skp.sarjan_kilpailukausi_id = sr.vaadittu_pelaajastatus_id
AND skp.kayttaja_id = sor.kayttaja_id
LEFT JOIN kayttajien_ilmoittautumiset ki ON ki.kayttaja_id = sor.kayttaja_id
AND ki.kilpailu_id = sor.kilpailu_id
WHERE sr.id = 4
AND
(
(
sr.vaadittu_pelaajastatus_id IS NULL
AND
sr.epakelpo_pelaajastatus_id IS NULL
)
OR
(
skp.sarjan_kilpailukausi_id = sr.vaadittu_pelaajastatus_id
AND
skp.kayttaja_id = ki.kayttaja_id
)
OR
(
skk.sarjan_kilpailukausi_id = sr.vaadittu_pelaajastatus_id
OR
NOT EXISTS
(
SELECT *
FROM skp
WHERE skp.kayttaja_id = ki.kayttaja_id
)
)
)
ORDER BY sk.kausi_id ASC, kk.ajankohta_alkamispaiva ASC, kk.id ASC;
SQL select with IN subquery returns no records if the sub-query contains NULL
This is by design. If the match fails and the set contains NULL the result is NULL, as specified by the SQL standard.
'1' IN ('1', '3') => true
'2' IN ('1', '3') => false
'1' IN ('1', NULL) => true
'2' IN ('1', NULL) => NULL
'1' NOT IN ('1', '3') => false
'2' NOT IN ('1', '3') => true
'1' NOT IN ('1', NULL) => false
'2' NOT IN ('1', NULL) => NULL
Informally, the logic behind this is that NULL can be thought of as an unknown value. For example here it doesn't matter what the unknown value is - '1' is clearly in the set, so the result is true.
'1' IN ('1', NULL) => true
In the following example we can't be sure that '2' is in the set, but since we don't know all the values we also can't be sure that it isn't in the set. So the result is NULL.
'2' IN ('1', NULL) => NULL
Another way of looking at it is by rewriting x NOT IN (Y, Z)
as X <> Y AND X <> Z
. Then you can use the rules of three-valued logic:
true AND NULL => NULL
false AND NULL => false
NOT IN subquery fails when there are NULL-valued results
It is better to use NOT EXISTS
anyway:
WHERE NOT EXISTS(
SELECT 1 FROM protocol_application_log_devl pal
WHERE pal.person_id = person_id
AND pal.set_id = @set_id
)
Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?
A pattern I see quite a bit, and wish that I didn't, is NOT IN. When
I see this pattern, I cringe. But not for performance reasons – after
all, it creates a decent enough plan in this case:The main problem is that the results can be surprising if the target
column is NULLable (SQL Server processes this as a left anti semi
join, but can't reliably tell you if a NULL on the right side is equal
to – or not equal to – the reference on the left side). Also,
optimization can behave differently if the column is NULLable, even if
it doesn't actually contain any NULL valuesInstead of NOT IN, use a correlated NOT EXISTS for this query pattern.
Always. Other methods may rival it in terms of performance, when all
other variables are the same, but all of the other methods introduce
either performance problems or other challenges.
Combine scalar query and results from a subquery that returns multiple columns
Literally just add them to your sub-query and make it your main query.
SELECT
DB_NAME() DB
, (SELECT SettingValue FROM Settings WHERE SettingName = 'XYZ') 'Is XYZ'
, D.DeptName, COUNT(*)
FROM (
SELECT DISTINCT substring(Source, len(Source) - 17, 14) AS DeptId
FROM AuditLog
) LogDept AS LD
INNER JOIN Department AS D ON D.DeptId = LD.DeptId
GROUP BY D.DeptName;
And I recommend the use of short table aliases as I have demonstated.
Why different results from WHERE EXISTS() and WHERE id IN
They are not identical.
When the subquery returns NULL
, then the outer query returns no results at all. The reason is that NULL
could match any value -- it means "unknown value" not "missing value" semantically. And an unknown value could match anything. In actual fact, the condition returns NULL
in this case, which is treated as false
.
For this reason, I strongly, strongly recommend never using NOT IN
with a subquery. Use NOT EXISTS
.
If you do, then you can do a NULL
check:
SELECT *
FROM table1
WHERE table1.id NOT IN (
SELECT table2.id FROM table2 WHERE table2.id IS NOT NULL
);
However, in my experience, you'll leave out the NULL
check at some crucial time, and think that there are no non-matches when there are.
Related Topics
Why Is Null Not Equal to Null False
Conditional Sum in Group by Query Mssql
Preventing Adjacent/Overlapping Entries with Exclude in Postgresql
Transpose Rows into Columns in Bigquery (Pivot Implementation)
Change Postgresql Columns Used in Views
Does Sparksql Support Subquery
Calendar Table - Week Number of Month
How to Set Variable from a SQL Query
What's the Difference Between Rank() and Dense_Rank() Functions in Oracle
Why Do People Hate SQL Cursors So Much
Partition Function Count() Over Possible Using Distinct
Oracle Query to Fetch Column Names
Get the Records of Last Month in SQL Server
SQL - Select First 10 Rows Only
Write a Number with Two Decimal Places SQL Server
How to Search Multiple Columns in MySQL
Oracle Joins - Comparison Between Conventional Syntax VS Ansi Syntax