SQL "Select Where Not in Subquery" Returns No Results

SQL select where not in subquery returns no results

Update:

These articles in my blog describe the differences between the methods in more detail:

NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: SQL Server
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: PostgreSQL
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle
NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL

There are three ways to do such a query:

LEFT JOIN / IS NULL:

SELECT  *
FROM    common
LEFT JOIN
        table1 t1
ON      t1.common_id = common.common_id
WHERE   t1.common_id IS NULL

NOT EXISTS:

SELECT  *
FROM    common
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    table1 t1
        WHERE   t1.common_id = common.common_id
        )

NOT IN:

SELECT  *
FROM    common
WHERE   common_id NOT IN
        (
        SELECT  common_id
        FROM    table1 t1
        )

When table1.common_id is not nullable, all these queries are semantically the same.

When it is nullable, NOT IN is different, since IN (and, therefore, NOT IN) return NULL when a value does not match anything in a list containing a NULL.

This may be confusing but may become more obvious if we recall the alternate syntax for this:

common_id = ANY
(
SELECT  common_id
FROM    table1 t1
)

The result of this condition is a boolean product of all comparisons within the list. Of course, a single NULL value yields the NULL result which renders the whole result NULL too.

We never cannot say definitely that common_id is not equal to anything from this list, since at least one of the values is NULL.

Suppose we have these data:

common

--
1
3

table1

--
NULL
1
2

LEFT JOIN / IS NULL and NOT EXISTS will return 3, NOT IN will return nothing (since it will always evaluate to either FALSE or NULL).

In MySQL, in case on non-nullable column, LEFT JOIN / IS NULL and NOT IN are a little bit (several percent) more efficient than NOT EXISTS. If the column is nullable, NOT EXISTS is the most efficient (again, not much).

In Oracle, all three queries yield same plans (an ANTI JOIN).

In SQL Server, NOT IN / NOT EXISTS are more efficient, since LEFT JOIN / IS NULL cannot be optimized to an ANTI JOIN by its optimizer.

In PostgreSQL, LEFT JOIN / IS NULL and NOT EXISTS are more efficient than NOT IN, sine they are optimized to an Anti Join, while NOT IN uses hashed subplan (or even a plain subplan if the subquery is too large to hash)

SQL subquery returns no results

You're trying to find a department ID which matches a number of employees; not the department those employees belong to.

If you're on 12c or higher you can use the row limiting clause:

select d.department_id, d.department_name, count(e.employee_id) as employee_count
from departments d
join employees e on e.department_id = d.department_id
group by d.department_id, d.department_name
order by count(e.employee_id) desc
fetch first row only;

DEPARTMENT_ID DEPARTMENT_NAME                EMPLOYEE_COUNT
------------- ------------------------------ --------------
           50 Shipping                                   45

If you can have ties and want to show all tied rows, change only to with ties; otherwise you'll get an indeterminate row from those that are tied.

In earlier versions the equivalent would be:

select department_id, department_name, employee_count
from (
  select d.department_id, d.department_name, count(e.employee_id) as employee_count
  from departments d
  join employees e on e.department_id = d.department_id
  group by d.department_id, d.department_name
  order by count(e.employee_id) desc
)
where rownum = 1;

DEPARTMENT_ID DEPARTMENT_NAME                EMPLOYEE_COUNT
------------- ------------------------------ --------------
           50 Shipping                                   45

which is also indeterminate if there are ties.

To handle ties properly pre-12c, you can use a query similar to @Littlefoot's, but using rank():

select department_id, department_name, employee_count
from (
  select d.department_id, d.department_name, count(e.employee_id) as employee_count,
    rank() over (order by count(*) desc) as rnk
  from departments d
  join employees e on e.department_id = d.department_id
  group by d.department_id, d.department_name
  order by count(e.employee_id) desc
)
where rnk = 1;

... or use @TheImpaler's approach.

the teacher wants subquery only

Always fun to have articifial restrictions... but you could do:

select d.department_id, d.department_name, count(e.employee_id) as employee_count
from departments d
join employees e on e.department_id = d.department_id
group by d.department_id, d.department_name
having count(e.employee_id) = (
  select max(count(employee_id))
  from employees
  group by department_id
);

which will show ties; or

select d.department_id, d.department_name, count(e.employee_id) as employee_count
from departments d
join employees e on e.department_id = d.department_id
where d.department_id in (
  select department_id
  from (
    select department_id
    from employees
    group by department_id
    order by count(employee_id) desc
  )
  where rownum = 1
)
group by d.department_id, d.department_name;

which won't, unless you changerownum to rank() again.

no explicit joins if thats possible

If you aren't allowed any joins I suppose you could do:

select d.department_id, d.department_name,
  (select count(e.employee_id)
    from employees e
    where e.department_id = d.department_id) as employee_count
from departments d
where d.department_id in (
  select department_id
  from (
    select department_id
    from employees
    group by department_id
    order by count(employee_id) desc
  )
  where rownum = 1
)
group by d.department_id, d.department_name;

DEPARTMENT_ID DEPARTMENT_NAME                EMPLOYEE_COUNT
------------- ------------------------------ --------------
           50 Shipping                                   45

which is getting a bit ridiculous *8-)

Query with NOT IN subquery returning 0 rows

Based on the provided SQLFiddle in the edited question, this works.

SELECT p.*, 'BBCode' AS Format,
FROM_UNIXTIME(TIME) AS DateInserted,
FROM_UNIXTIME(editTime) AS DateUpdated
FROM et_post p
INNER JOIN et_conversation c 
ON c.conversationId = p.conversationId
and c.private = 0
join (
select conversationId,min(postId) as m
from et_post
group by conversationId
) r
on r.conversationId = c.conversationId
where p.postId<>r.m

12,15,18 disappear as requested in your edit ... so too does NOT IN madness

If subquery returns no rows, return 0 else return value of the subquery

In the example you posted you could simply wrap the subquery in COALESCE and get your desired result.

SELECT 
    * 
FROM 
    #Purchase
WHERE
    Total < 5000
AND 
    Total > COALESCE(
            (
                SELECT TOP 1
                   buyamount
                FROM
                   #employee
                WHERE
                   manager = 'TST'
                ORDER BY
                   buyamount ASC
            ), 0)
ORDER BY Date DESC

You could also use a variable in your provided example instead of a subquery. I prefer this approach because it separates the logic, making it easier to read. Subqueries can also be a performance killer. Having it broken out like this, it should perform better.

DECLARE @BuyAmount INT
SET @BuyAmount = (SELECT TOP 1 BuyAmount FROM #Employee WHERE Manager = 'TST' ORDER BY BuyAmount ASC)
IF @BuyAmount IS NULL SET @BuyAmount = 0

SELECT * 
FROM #Purchase
WHERE Total < 5000 AND Total > @BuyAmount
ORDER BY Date DESC

MySQL and subquery returns no results

So either skk.sarjan_kilpailukausi_id = sr.vaadittu_pelaajastatus_id or NOT EXISTS (SELECT * FROM skp WHERE skp.kayttaja_id = ki.kayttaja_id)? Okay, I've applied this condition in the following query.

I'm using table aliases to make it all more radable. I've also re-written your joins. Please check them.

SELECT
  sr.montako_huonointa_pois_lkm AS ei_huomioitavat_kilpailut_lkm
  , sr.montako_suora_nousu_lkm AS montako_suora_nousu_lkm
  , sr.montako_suora_pudotus_lkm AS montako_suora_pudotus_lkm
  , sr.montako_nousukarsinta_lkm AS montako_nousukarsinta_lkm
  , sr.montako_pudotuskarsinta_lkm AS montako_pudotuskarsinta_lkm
  , sr.vaadittu_pelaajastatus_id AS vaadittu_pelaajastatus_id
  , sr.epakelpo_pelaajastatus_id AS epakelpo_pelaajastatus_id
  , sro.monesko_kilpailu AS monesko_kilpailu
  , sro.sarakeotsikko AS kilpailu_sarakeotsikko
  , sor.kayttaja_id AS kayttaja_id
  , sor.pistemaara AS pisteet
  , sor.kompensaatiopistemaara AS kompensaatiopisteet
  , kk.id AS kilpailu_id
  , kk.ajankohta_alkamispaiva AS kilpailu_alkamispaiva
  , kk.kilpailunimi_virallinen AS kilpailu_nimi
  , ki.pelaaja_oma_nimi_txt AS kayttaja_nimiteksti
  , ki.pelaaja_oma_seura_lyh_txt AS kayttaja_seurateksti
  , ki.pelaaja_oma_kunta_txt AS kayttaja_kuntateksti
  , sk.kausi_id AS kausi_id
FROM sarjojen_rglistat sr
JOIN sarjojen_rglistojen_osakilpailut sro ON sro.rglista_id = sr.id
JOIN kilpailukausien_kilpailut kk ON  kk.id = sro.kilpailu_id
                                  AND kk.kausi_id IS NULL
                                  AND kk.seura_id IS NULL
JOIN sarjojen_osakilpailuiden_rgpisteet sor ON sor.kilpailu_id = kk.id
JOIN sarjojen_kilpailukausien_kilpailusysteemit skk 
                ON skk.id = kk.kauden_kilpailusysteemi_id
JOIN sarjojen_kilpailukaudet sk ON sk.id = skk.sarjan_kilpailukausi_id
LEFT JOIN sarjojen_kilpailukausien_pelaajastatukset skp
                ON  skp.sarjan_kilpailukausi_id = sr.vaadittu_pelaajastatus_id
                AND skp.kayttaja_id  = sor.kayttaja_id
LEFT JOIN kayttajien_ilmoittautumiset ki ON  ki.kayttaja_id = sor.kayttaja_id
                                         AND ki.kilpailu_id = sor.kilpailu_id
WHERE sr.id = 4
AND 
(
  (
    sr.vaadittu_pelaajastatus_id IS NULL 
    AND 
    sr.epakelpo_pelaajastatus_id IS NULL
  ) 
  OR 
  (
    skp.sarjan_kilpailukausi_id = sr.vaadittu_pelaajastatus_id 
    AND 
    skp.kayttaja_id = ki.kayttaja_id
  )
  OR
  (
    skk.sarjan_kilpailukausi_id = sr.vaadittu_pelaajastatus_id
    OR 
    NOT EXISTS 
    (
      SELECT *
      FROM skp
      WHERE skp.kayttaja_id = ki.kayttaja_id
    )
  )
)
ORDER BY sk.kausi_id ASC, kk.ajankohta_alkamispaiva ASC, kk.id ASC;

SQL select with IN subquery returns no records if the sub-query contains NULL

This is by design. If the match fails and the set contains NULL the result is NULL, as specified by the SQL standard.


'1' IN ('1', '3') => true
'2' IN ('1', '3') => false
'1' IN ('1', NULL) => true
'2' IN ('1', NULL) => NULL

'1' NOT IN ('1', '3') => false
'2' NOT IN ('1', '3') => true
'1' NOT IN ('1', NULL) => false
'2' NOT IN ('1', NULL) => NULL

Informally, the logic behind this is that NULL can be thought of as an unknown value. For example here it doesn't matter what the unknown value is - '1' is clearly in the set, so the result is true.

'1' IN ('1', NULL) => true

In the following example we can't be sure that '2' is in the set, but since we don't know all the values we also can't be sure that it isn't in the set. So the result is NULL.

'2' IN ('1', NULL) => NULL

Another way of looking at it is by rewriting x NOT IN (Y, Z) as X <> Y AND X <> Z. Then you can use the rules of three-valued logic:

true AND NULL => NULL
false AND NULL => false

NOT IN subquery fails when there are NULL-valued results

It is better to use NOT EXISTS anyway:

WHERE NOT EXISTS(
    SELECT 1 FROM protocol_application_log_devl pal
    WHERE pal.person_id = person_id
     AND  pal.set_id = @set_id
)

Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?

A pattern I see quite a bit, and wish that I didn't, is NOT IN. When
I see this pattern, I cringe. But not for performance reasons – after
all, it creates a decent enough plan in this case:

The main problem is that the results can be surprising if the target
column is NULLable (SQL Server processes this as a left anti semi
join, but can't reliably tell you if a NULL on the right side is equal
to – or not equal to – the reference on the left side). Also,
optimization can behave differently if the column is NULLable, even if
it doesn't actually contain any NULL values

Instead of NOT IN, use a correlated NOT EXISTS for this query pattern.
Always. Other methods may rival it in terms of performance, when all
other variables are the same, but all of the other methods introduce
either performance problems or other challenges.

Combine scalar query and results from a subquery that returns multiple columns

Literally just add them to your sub-query and make it your main query.

SELECT
    DB_NAME() DB
    , (SELECT SettingValue FROM Settings WHERE SettingName = 'XYZ') 'Is XYZ'
    , D.DeptName, COUNT(*) 
FROM (
    SELECT DISTINCT substring(Source, len(Source) - 17, 14) AS DeptId
    FROM AuditLog
) LogDept AS LD
INNER JOIN Department AS D ON D.DeptId = LD.DeptId
GROUP BY D.DeptName;

And I recommend the use of short table aliases as I have demonstated.

Why different results from WHERE EXISTS() and WHERE id IN

They are not identical.

When the subquery returns NULL, then the outer query returns no results at all. The reason is that NULL could match any value -- it means "unknown value" not "missing value" semantically. And an unknown value could match anything. In actual fact, the condition returns NULL in this case, which is treated as false.

For this reason, I strongly, strongly recommend never using NOT IN with a subquery. Use NOT EXISTS.

If you do, then you can do a NULL check:

SELECT *
FROM table1
WHERE table1.id NOT IN (
        SELECT table2.id FROM table2 WHERE table2.id IS NOT NULL
                       );

However, in my experience, you'll leave out the NULL check at some crucial time, and think that there are no non-matches when there are.

SQL "Select Where Not in Subquery" Returns No Results