SQL "Select Where Not in Subquery" Returns No Results

SQL select where not in subquery returns no results

Update:

These articles in my blog describe the differences between the methods in more detail:

  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: PostgreSQL
  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle
  • NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL

There are three ways to do such a query:

  • LEFT JOIN / IS NULL:

    SELECT  *
    FROM common
    LEFT JOIN
    table1 t1
    ON t1.common_id = common.common_id
    WHERE t1.common_id IS NULL
  • NOT EXISTS:

    SELECT  *
    FROM common
    WHERE NOT EXISTS
    (
    SELECT NULL
    FROM table1 t1
    WHERE t1.common_id = common.common_id
    )
  • NOT IN:

    SELECT  *
    FROM common
    WHERE common_id NOT IN
    (
    SELECT common_id
    FROM table1 t1
    )

When table1.common_id is not nullable, all these queries are semantically the same.

When it is nullable, NOT IN is different, since IN (and, therefore, NOT IN) return NULL when a value does not match anything in a list containing a NULL.

This may be confusing but may become more obvious if we recall the alternate syntax for this:

common_id = ANY
(
SELECT common_id
FROM table1 t1
)

The result of this condition is a boolean product of all comparisons within the list. Of course, a single NULL value yields the NULL result which renders the whole result NULL too.

We never cannot say definitely that common_id is not equal to anything from this list, since at least one of the values is NULL.

Suppose we have these data:

common

--
1
3

table1

--
NULL
1
2

LEFT JOIN / IS NULL and NOT EXISTS will return 3, NOT IN will return nothing (since it will always evaluate to either FALSE or NULL).

In MySQL, in case on non-nullable column, LEFT JOIN / IS NULL and NOT IN are a little bit (several percent) more efficient than NOT EXISTS. If the column is nullable, NOT EXISTS is the most efficient (again, not much).

In Oracle, all three queries yield same plans (an ANTI JOIN).

In SQL Server, NOT IN / NOT EXISTS are more efficient, since LEFT JOIN / IS NULL cannot be optimized to an ANTI JOIN by its optimizer.

In PostgreSQL, LEFT JOIN / IS NULL and NOT EXISTS are more efficient than NOT IN, sine they are optimized to an Anti Join, while NOT IN uses hashed subplan (or even a plain subplan if the subquery is too large to hash)

SQL subquery returns no results

You're trying to find a department ID which matches a number of employees; not the department those employees belong to.

If you're on 12c or higher you can use the row limiting clause:

select d.department_id, d.department_name, count(e.employee_id) as employee_count
from departments d
join employees e on e.department_id = d.department_id
group by d.department_id, d.department_name
order by count(e.employee_id) desc
fetch first row only;

DEPARTMENT_ID DEPARTMENT_NAME EMPLOYEE_COUNT
------------- ------------------------------ --------------
50 Shipping 45

If you can have ties and want to show all tied rows, change only to with ties; otherwise you'll get an indeterminate row from those that are tied.

In earlier versions the equivalent would be:

select department_id, department_name, employee_count
from (
select d.department_id, d.department_name, count(e.employee_id) as employee_count
from departments d
join employees e on e.department_id = d.department_id
group by d.department_id, d.department_name
order by count(e.employee_id) desc
)
where rownum = 1;

DEPARTMENT_ID DEPARTMENT_NAME EMPLOYEE_COUNT
------------- ------------------------------ --------------
50 Shipping 45

which is also indeterminate if there are ties.

To handle ties properly pre-12c, you can use a query similar to @Littlefoot's, but using rank():

select department_id, department_name, employee_count
from (
select d.department_id, d.department_name, count(e.employee_id) as employee_count,
rank() over (order by count(*) desc) as rnk
from departments d
join employees e on e.department_id = d.department_id
group by d.department_id, d.department_name
order by count(e.employee_id) desc
)
where rnk = 1;

... or use @TheImpaler's approach.


the teacher wants subquery only

Always fun to have articifial restrictions... but you could do:

select d.department_id, d.department_name, count(e.employee_id) as employee_count
from departments d
join employees e on e.department_id = d.department_id
group by d.department_id, d.department_name
having count(e.employee_id) = (
select max(count(employee_id))
from employees
group by department_id
);

which will show ties; or

select d.department_id, d.department_name, count(e.employee_id) as employee_count
from departments d
join employees e on e.department_id = d.department_id
where d.department_id in (
select department_id
from (
select department_id
from employees
group by department_id
order by count(employee_id) desc
)
where rownum = 1
)
group by d.department_id, d.department_name;

which won't, unless you changerownum to rank() again.

no explicit joins if thats possible

If you aren't allowed any joins I suppose you could do:

select d.department_id, d.department_name,
(select count(e.employee_id)
from employees e
where e.department_id = d.department_id) as employee_count
from departments d
where d.department_id in (
select department_id
from (
select department_id
from employees
group by department_id
order by count(employee_id) desc
)
where rownum = 1
)
group by d.department_id, d.department_name;

DEPARTMENT_ID DEPARTMENT_NAME EMPLOYEE_COUNT
------------- ------------------------------ --------------
50 Shipping 45

which is getting a bit ridiculous *8-)

Query with NOT IN subquery returning 0 rows

Based on the provided SQLFiddle in the edited question, this works.

SELECT p.*, 'BBCode' AS Format,
FROM_UNIXTIME(TIME) AS DateInserted,
FROM_UNIXTIME(editTime) AS DateUpdated
FROM et_post p
INNER JOIN et_conversation c
ON c.conversationId = p.conversationId
and c.private = 0
join (
select conversationId,min(postId) as m
from et_post
group by conversationId
) r
on r.conversationId = c.conversationId
where p.postId<>r.m

12,15,18 disappear as requested in your edit ... so too does NOT IN madness

If subquery returns no rows, return 0 else return value of the subquery

In the example you posted you could simply wrap the subquery in COALESCE and get your desired result.

SELECT 
*
FROM
#Purchase
WHERE
Total < 5000
AND
Total > COALESCE(
(
SELECT TOP 1
buyamount
FROM
#employee
WHERE
manager = 'TST'
ORDER BY
buyamount ASC
), 0)
ORDER BY Date DESC

You could also use a variable in your provided example instead of a subquery. I prefer this approach because it separates the logic, making it easier to read. Subqueries can also be a performance killer. Having it broken out like this, it should perform better.

DECLARE @BuyAmount INT
SET @BuyAmount = (SELECT TOP 1 BuyAmount FROM #Employee WHERE Manager = 'TST' ORDER BY BuyAmount ASC)
IF @BuyAmount IS NULL SET @BuyAmount = 0

SELECT *
FROM #Purchase
WHERE Total < 5000 AND Total > @BuyAmount
ORDER BY Date DESC

MySQL and subquery returns no results

So either skk.sarjan_kilpailukausi_id = sr.vaadittu_pelaajastatus_id or NOT EXISTS (SELECT * FROM skp WHERE skp.kayttaja_id = ki.kayttaja_id)? Okay, I've applied this condition in the following query.

I'm using table aliases to make it all more radable. I've also re-written your joins. Please check them.

SELECT
sr.montako_huonointa_pois_lkm AS ei_huomioitavat_kilpailut_lkm
, sr.montako_suora_nousu_lkm AS montako_suora_nousu_lkm
, sr.montako_suora_pudotus_lkm AS montako_suora_pudotus_lkm
, sr.montako_nousukarsinta_lkm AS montako_nousukarsinta_lkm
, sr.montako_pudotuskarsinta_lkm AS montako_pudotuskarsinta_lkm
, sr.vaadittu_pelaajastatus_id AS vaadittu_pelaajastatus_id
, sr.epakelpo_pelaajastatus_id AS epakelpo_pelaajastatus_id
, sro.monesko_kilpailu AS monesko_kilpailu
, sro.sarakeotsikko AS kilpailu_sarakeotsikko
, sor.kayttaja_id AS kayttaja_id
, sor.pistemaara AS pisteet
, sor.kompensaatiopistemaara AS kompensaatiopisteet
, kk.id AS kilpailu_id
, kk.ajankohta_alkamispaiva AS kilpailu_alkamispaiva
, kk.kilpailunimi_virallinen AS kilpailu_nimi
, ki.pelaaja_oma_nimi_txt AS kayttaja_nimiteksti
, ki.pelaaja_oma_seura_lyh_txt AS kayttaja_seurateksti
, ki.pelaaja_oma_kunta_txt AS kayttaja_kuntateksti
, sk.kausi_id AS kausi_id
FROM sarjojen_rglistat sr
JOIN sarjojen_rglistojen_osakilpailut sro ON sro.rglista_id = sr.id
JOIN kilpailukausien_kilpailut kk ON kk.id = sro.kilpailu_id
AND kk.kausi_id IS NULL
AND kk.seura_id IS NULL
JOIN sarjojen_osakilpailuiden_rgpisteet sor ON sor.kilpailu_id = kk.id
JOIN sarjojen_kilpailukausien_kilpailusysteemit skk
ON skk.id = kk.kauden_kilpailusysteemi_id
JOIN sarjojen_kilpailukaudet sk ON sk.id = skk.sarjan_kilpailukausi_id
LEFT JOIN sarjojen_kilpailukausien_pelaajastatukset skp
ON skp.sarjan_kilpailukausi_id = sr.vaadittu_pelaajastatus_id
AND skp.kayttaja_id = sor.kayttaja_id
LEFT JOIN kayttajien_ilmoittautumiset ki ON ki.kayttaja_id = sor.kayttaja_id
AND ki.kilpailu_id = sor.kilpailu_id
WHERE sr.id = 4
AND
(
(
sr.vaadittu_pelaajastatus_id IS NULL
AND
sr.epakelpo_pelaajastatus_id IS NULL
)
OR
(
skp.sarjan_kilpailukausi_id = sr.vaadittu_pelaajastatus_id
AND
skp.kayttaja_id = ki.kayttaja_id
)
OR
(
skk.sarjan_kilpailukausi_id = sr.vaadittu_pelaajastatus_id
OR
NOT EXISTS
(
SELECT *
FROM skp
WHERE skp.kayttaja_id = ki.kayttaja_id
)
)
)
ORDER BY sk.kausi_id ASC, kk.ajankohta_alkamispaiva ASC, kk.id ASC;

SQL select with IN subquery returns no records if the sub-query contains NULL

This is by design. If the match fails and the set contains NULL the result is NULL, as specified by the SQL standard.


'1' IN ('1', '3') => true
'2' IN ('1', '3') => false
'1' IN ('1', NULL) => true
'2' IN ('1', NULL) => NULL

'1' NOT IN ('1', '3') => false
'2' NOT IN ('1', '3') => true
'1' NOT IN ('1', NULL) => false
'2' NOT IN ('1', NULL) => NULL

Informally, the logic behind this is that NULL can be thought of as an unknown value. For example here it doesn't matter what the unknown value is - '1' is clearly in the set, so the result is true.

'1' IN ('1', NULL) => true

In the following example we can't be sure that '2' is in the set, but since we don't know all the values we also can't be sure that it isn't in the set. So the result is NULL.

'2' IN ('1', NULL) => NULL

Another way of looking at it is by rewriting x NOT IN (Y, Z) as X <> Y AND X <> Z. Then you can use the rules of three-valued logic:

true AND NULL => NULL
false AND NULL => false

NOT IN subquery fails when there are NULL-valued results

It is better to use NOT EXISTS anyway:

WHERE NOT EXISTS(
SELECT 1 FROM protocol_application_log_devl pal
WHERE pal.person_id = person_id
AND pal.set_id = @set_id
)

Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?

A pattern I see quite a bit, and wish that I didn't, is NOT IN. When
I see this pattern, I cringe. But not for performance reasons – after
all, it creates a decent enough plan in this case:

The main problem is that the results can be surprising if the target
column is NULLable (SQL Server processes this as a left anti semi
join, but can't reliably tell you if a NULL on the right side is equal
to – or not equal to – the reference on the left side). Also,
optimization can behave differently if the column is NULLable, even if
it doesn't actually contain any NULL values

Instead of NOT IN, use a correlated NOT EXISTS for this query pattern.
Always. Other methods may rival it in terms of performance, when all
other variables are the same, but all of the other methods introduce
either performance problems or other challenges.

Combine scalar query and results from a subquery that returns multiple columns

Literally just add them to your sub-query and make it your main query.

SELECT
DB_NAME() DB
, (SELECT SettingValue FROM Settings WHERE SettingName = 'XYZ') 'Is XYZ'
, D.DeptName, COUNT(*)
FROM (
SELECT DISTINCT substring(Source, len(Source) - 17, 14) AS DeptId
FROM AuditLog
) LogDept AS LD
INNER JOIN Department AS D ON D.DeptId = LD.DeptId
GROUP BY D.DeptName;

And I recommend the use of short table aliases as I have demonstated.

Why different results from WHERE EXISTS() and WHERE id IN

They are not identical.

When the subquery returns NULL, then the outer query returns no results at all. The reason is that NULL could match any value -- it means "unknown value" not "missing value" semantically. And an unknown value could match anything. In actual fact, the condition returns NULL in this case, which is treated as false.

For this reason, I strongly, strongly recommend never using NOT IN with a subquery. Use NOT EXISTS.

If you do, then you can do a NULL check:

SELECT *
FROM table1
WHERE table1.id NOT IN (
SELECT table2.id FROM table2 WHERE table2.id IS NOT NULL
);

However, in my experience, you'll leave out the NULL check at some crucial time, and think that there are no non-matches when there are.



Related Topics



Leave a reply



Submit