How to Delete Records Not In

How to Delete Records NOT IN

I would like to start with assumptions.

  1. You have a chainlike data model:
    Projects --* ProjectSchemes --* Schemes
  2. Your target is to have only valid chains, so no ProjectSchemes without Project, no Schemes without ProjectSchemes.
  3. NULL is not a valid value for one of your ids.
  4. All ids are unique in their table
  5. You don't use referential integrity mechanisms of your database

As a result your SELECT would list the scheme_id for all Schemes in the Schemes table.

Said that, you should start to delete all ProjectSchemes without a corresponding Project. These are ProjectSchemes with an id of NULL or an id which does not exists in the Projects Table:

DELETE ProjectSchemes WHERE (Project_Id is NULL) OR 
(NOT EXISTS (SELECT * FROM Projects WHERE
Projects.Project_Id = ProjectSchemes.Project_Id))

After deleting the ProjectsSchemes without a Project we now may have some new orphans in the Schemes Table. The next thing is now to delete all Schemes which have an id of NULL or an id which does not exists in the ProjectsSchemes Table:

DELETE Schemes WHERE (Scheme_Id is NULL) OR 
(NOT EXISTS (SELECT * FROM ProjectSchemes WHERE
ProjectSchemes.Scheme_Id = Schemes.Scheme_Id))

There is still a chance to have schemes which are not connected to a project without deleting the ProjectSchemes.

Delete rows from table which does not exists in other table by picking eligible records only

Edit: I misread your query originally. I have changed my answer to remove the join to Sample_Table in the first condition.

Delete from FINAL_TABLE FT
where not exists
(select 1 from
Latest_table LT
where LT.ID = FT.ID
and LT.NAME = FT.NAME
and LT.SUB_ID = FT.SUB_ID)
AND FT.id IN
(
SELECT id FROM Sample_Table
)

This will only delete from Final_Table if the id appears in Sample_Table and the record (all 3 columns) does not appear in the Latest_table.

An alternative way of writing this is

Delete from FINAL_TABLE FT
where
(FT.ID, FT.SUB_ID, FT.NAME) NOT IN
(SELECT LT.ID, LT.SUB_ID, LT.NAME FROM Latest_table LT)
AND FT.ID IN
(SELECT ST.id FROM Sample_Table ST)

Delete rows from a table with not in ( another table )

My first recommendation is to try not exists rather than not in:

DELETE a FROM a
WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.foreign_key_of_A_in_B = a.id_A);

NOT IN returns false or NULL if any value in the subquery is NULL. That is how the operator is defined. NOT EXISTS has more expected behavior. So, if you have any NULL values in the subquery, this will work (i.e. delete rows) but the NOT IN version will not.

I would recommend that you try the logic out using SELECT before doing a DELETE:

SELECT A.*
FROM A
WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.foreign_key_of_A_in_B = A.id_A);

SQL delete rows not in another table

decode finds sameness (even if both values are null):

decode( field1, field2, 1, 0 ) = 1  

To delete rows in table1 not found in table2:

delete table1 t
where t.rowid in (select t1.rowid
from table1 t1
left outer join table2 t2
on decode(t1.field1, t2.field1, 1, 0) = 1
and decode(t1.field2, t2.field2, 1, 0) = 1
and decode(t1.field3, t2.field3, 1, 0) = 1
/* ... */
where t2.rowid is null /* no matching row found */
)

to use existing indexes

                      ...
left outer join table2 t2
on (t1.index_field1=t2.index_field1 or
t1.index_field1 is null and t2.index_field1 is null)
and ...

MySQL - How to delete rows where column does not contain certain values?

use

DELETE FROM TABLE_NAME WHERE Disease not in ('Malaria', 'HIV', 'E. coli 0157');

put your actual name of the table in place of TABLE_NAME in the above statement.

DELETE records which do not have a match in another table

I benchmarked four typical queries, with different settings for {work_mem, effective_cache_size, random_page_cost}, these settings have the largest influence on the selected plan. I first did a "run in" with my default settings to warm the cache.
Note: the test-set is small enough to allow all needed pages to be present in cache.

The test-set

SET search_path=tmp;

/************************/
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;

CREATE TABLE one
( id SERIAL NOT NULL PRIMARY KEY
, payload varchar
);

CREATE TABLE two
( id SERIAL NOT NULL PRIMARY KEY
, one_id INTEGER REFERENCES one
, payload varchar
);

INSERT INTO one (payload) SELECT 'Text_' || gs::text FROM generate_series(1,30000) gs;
INSERT INTO two (payload) SELECT 'Text_' || gs::text FROM generate_series(1,30000) gs;

UPDATE two t
SET one_id = o.id
FROM one o
WHERE o.id = t.id
AND random() < 0.1;

INSERT INTO two (one_id,payload) SELECT one_id,payload FROM two;
INSERT INTO two (one_id,payload) SELECT one_id,payload FROM two;
INSERT INTO two (one_id,payload) SELECT one_id,payload FROM two;

VACUUM ANALYZE one;
VACUUM ANALYZE two;
/***************/

The queries:

\echo NOT EXISTS()
EXPLAIN ANALYZE
DELETE FROM one o
WHERE NOT EXISTS ( SELECT * FROM two t
WHERE t.one_id = o.id
);

\echo NOT IN()
EXPLAIN ANALYZE
DELETE FROM one o
WHERE o.id NOT IN ( SELECT one_id FROM two t)
;

\echo USING (subquery self LEFT JOIN two where NULL)
EXPLAIN ANALYZE
DELETE FROM one o
USING (
SELECT o2.id
FROM one o2
LEFT JOIN two t ON t.one_id = o2.id
WHERE t.one_id IS NULL
) sq
WHERE sq.id = o.id
;

\echo USING (subquery self WHERE NOT EXISTS(two)))
EXPLAIN ANALYZE
DELETE FROM one o
USING (
SELECT o2.id
FROM one o2
WHERE NOT EXISTS ( SELECT *
FROM two t WHERE t.one_id = o2.id
)
) sq
WHERE sq.id = o.id
;

The result (summarised)

                        NOT EXISTS()    NOT IN()        USING(LEFT JOIN NULL)   USING(NOT EXISTS)
1) rpc=4.0.csz=1M wmm=64 80.358 14389.026 77.620 72.917
2) rpc=4.0.csz=1M wmm=64000 60.527 69.104 51.851 51.004
3) rpc=1.5.csz=1M wmm=64 69.804 10758.480 80.402 77.356
4) rpc=1.5.csz=1M wmm=64000 50.872 69.366 50.763 53.339
5) rpc=4.0.csz=1G wmm=64 84.117 7625.792 69.790 69.627
6) rpc=4.0.csz=1G wmm=64000 49.964 67.018 49.968 49.380
7) rpc=1.5.csz=1G wmm=64 68.567 3650.008 70.283 69.933
8) rpc=1.5.csz=1G wmm=64000 49.800 67.298 50.116 50.345

legend:
rpc := "random_page_cost"
csz := "effective_cache_size"
wmm := "work_mem"

As you can see, the NOT IN() variant is very sensitive to shortage of work_mem. Agreed, the setting 64(KB) is very low, but this `more or less* corresponds to large data sets, which won't fit in hashtables, either.

EXTRA: during the warm-in phase, the NOT EXISTS() query suffered from extreme FK-trigger contention. This apears to be a result of a conflict with the vacuum deamon, which is still active after the table set-up.:

PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit
NOT EXISTS()
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Delete on one o (cost=6736.00..7623.94 rows=27962 width=12) (actual time=80.596..80.596 rows=0 loops=1)
-> Hash Anti Join (cost=6736.00..7623.94 rows=27962 width=12) (actual time=49.174..61.327 rows=27050 loops=1)
Hash Cond: (o.id = t.one_id)
-> Seq Scan on one o (cost=0.00..463.00 rows=30000 width=10) (actual time=0.003..5.156 rows=30000 loops=1)
-> Hash (cost=3736.00..3736.00 rows=240000 width=10) (actual time=49.121..49.121 rows=23600 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 1015kB
-> Seq Scan on two t (cost=0.00..3736.00 rows=240000 width=10) (actual time=0.006..33.790 rows=240000 loops=1)
Trigger for constraint two_one_id_fkey: time=467720.117 calls=27050
Total runtime: 467824.652 ms
(9 rows)

Deleting records with T-SQL efficiently without using IN clause

This works fine...

DELETE IH
FROM IndexHistory IH
Left JOIN IndexedLineItems LI
ON IH.Docid = LI.DocId
WHERE LI.DocId is Null

How to delete records from a table if they don't meet a condition in another table

With NOT EXISTS:

delete c from company c
where not exists (
select 1 from company_name
where company_id = c.company_id and company_name = c.company_name and is_proper = 1
)

See the demo.

Results:

> company_id | company_name 
> ---------: | :------------
> 2 | Apple Inc.
> 5 | Facebook Inc.
> 6 | Google Inc.


Related Topics



Leave a reply



Submit