How can I perform a SQL 'NOT IN' query faster?
You can use a left outer join, or a not exists
clause.
Left outer join:
select E.EmailAddress
from EMAIL E left outer join BLACKLIST B on (E.EmailAddress = B.EmailAddress)
where B.EmailAddress is null;
Not Exists:
select E.EmailAddress
from EMAIL E where not exists
(select EmailAddress from BLACKLIST B where B.EmailAddress = E.EmailAddress)
Both are quite generic SQL solutions (don't depend on a specific DB engine). I would say that the latter is a little bit more performant (not by much though). But definitely more performant than the not in
one.
As commenters stated, you can also try creating an index on BLACKLIST(EmailAddress)
, that should help speed up the execution of your query.
Is there a way to make an SQL NOT IN query faster?
A NOT IN
can be rewritten as a NOT EXISTS
query which is very often faster (unfortunately the Postgres optimizer isn't smart enough to detect this).
SELECT COUNT(DISTINCT l1.mobile_number)
FROM log_entries as l1
WHERE l1.created_at >= '2015-04-09 00:00:00'
AND l1.created_at <= '2015-04-09 23:59:59'
AND NOT EXISTS (SELECT *
FROM log_entries l2
WHERE l2.created_at < '2015-04-09 00:00:00'
AND l2.mobile_number = l1.mobile_number);
An index on (mobile_number, created_at)
should further improve the performance.
A side note: created_at <= '2015-04-09 23:59:59'
will not include rows with fractional seconds, e.g. 2015-04-09 23:59:59.789
. When dealing with timestamps it's better to use a "lower than" with the "next day" instead of a "lower or equal" with the day in question.
So better use: created_at < '2015-04-10 00:00:00'
instead to also "catch" rows on that day with fractional seconds.
SQL (NOT IN) Query takes forever to execute
Please try the following. I simplified the inner queries by eliminating unnecessary IN
, UNION
, and EXCEPT
clauses.
WITH frm
AS (SELECT product_id AS PId,
Min(Cast(product_startdate AS DATETIME)) AS PStartDate
FROM products
WHERE product_status IN ( 'F', 'R', 'M' )
GROUP BY product_id),
firstcount
AS (SELECT pid,
pstartdate,
(SELECT Count(*)
FROM products
WHERE product_status IN ( 'OR', 'OP' )
AND product_comments LIKE '%CANCELLED%'
AND product_id = pid) AS v_count
FROM frm),
secondcount
AS (SELECT pid,
pstartdate,
CASE
WHEN v_count = 0 THEN (SELECT Count(*)
FROM products
WHERE product_status IN ( 'F', 'R', 'M' )
AND product_startdate != '.'
AND product_id = pid)
ELSE v_count
END AS v_count
FROM firstcount)
INSERT INTO products_del
(product_id,
product_startdate,
productdel_status)
SELECT pid,
pstartdate,
CASE
WHEN v_count != 0 THEN 'UNKNOWN'
ELSE NULL
END
FROM secondcount
SELECT *
FROM products_del
Which is faster - NOT IN or NOT EXISTS?
Usually it does not matter if NOT IN
is slower / faster than NOT EXISTS
, because they are NOT equivalent in presence of NULL
. Read:
NOT IN vs NOT EXISTS
In these cases you almost always want NOT EXISTS
, because it has the usually expected behaviour.
If they are equivalent, it is likely that your database already has figured that out and will generate the same execution plan for both.
In the few cases where both options are aquivalent and your database is not able to figure that out, it is better to analyze both execution plans and choose the best options for your specific case.
NOT IN query performance issue with large data
You can write it with NOT EXISTS instead, although these queries will have different results if any value of id is NULL (in which case, NOT IN probably yields not the answer you want, so NOT EXISTS is better from that perspective as well.)
select id,number from tmp_t where not exists
(select 1 from tmp_t a where a.id=tmp_t.number);
But your formulation is also efficient as long as work_mem is large enough.
Related Topics
Sql Server 2008 Hierarchy Data Type Performance
How to Reuse a Large Query Without Repeating It
If I Update a View, Will My Original Tables Get Updated
Weighted Average in T-Sql (Like Excel's Sumproduct)
How This SQL Injection Works? Explanation Needed
Bigquery Select _Tables_ from All Tables Within Project
Sql - Stdevp or Stdev and How to Use It
Sql Server Store Multiple Values in SQL Variable
Resources for Database Sharding and Partitioning
Oracle SQL Group by "Not a Group by Expression" Help
Presto Check If Null and Return Default (Nvl Analog)
Transpose/Pivot Distinct Row Attribute as Column and Group Another Attribute
Sql Server: Arithmetic Overflow Error Converting Expression to Data Type Int
Sql - How to Find The Highest Number in a Column