Delete sql rows where IDs do not have a match from another table
Using LEFT JOIN/IS NULL:
DELETE b FROM BLOB b
LEFT JOIN FILES f ON f.id = b.fileid
WHERE f.id IS NULL
Using NOT EXISTS:
DELETE FROM BLOB
WHERE NOT EXISTS(SELECT NULL
FROM FILES f
WHERE f.id = fileid)
Using NOT IN:
DELETE FROM BLOB
WHERE fileid NOT IN (SELECT f.id
FROM FILES f)
Warning
Whenever possible, perform DELETEs within a transaction (assuming supported - IE: Not on MyISAM) so you can use rollback to revert changes in case of problems.
Delete rows from a table with not in ( another table )
My first recommendation is to try not exists
rather than not in
:
DELETE a FROM a
WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.foreign_key_of_A_in_B = a.id_A);
NOT IN
returns false or NULL
if any value in the subquery is NULL
. That is how the operator is defined. NOT EXISTS
has more expected behavior. So, if you have any NULL
values in the subquery, this will work (i.e. delete rows) but the NOT IN
version will not.
I would recommend that you try the logic out using SELECT
before doing a DELETE
:
SELECT A.*
FROM A
WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.foreign_key_of_A_in_B = A.id_A);
Delete from a table where value doesn't exist in another table
You need to invert the active
test, and move it into the ON
clause. Then check for a non-matching element in the WHERE
clause.
DELETE t1 FROM `table_1` t1
LEFT JOIN `table_2` t2 ON t1.id = t2.table_1_id AND t2.active = 1
WHERE t2.table_1_id IS NULL
You only need to delete from table_1
here, since the LEFT JOIN
won't include any existing rows from table_2
. If table_1_id
is declared as a foreign key with ON DELETE CASCADE
, the related inactive rows in table_2
will be deleted automatically.
How do I delete rows in one table where the ID matches another table row where a field is a certain value?
try apply join
permissions
and user
and where status = 'T'
Example:
DELETE p
FROM permissions p
INNER JOIN users u
ON p.user_id=u.id
WHERE u.status = 'T'
DELETE records which do not have a match in another table
I benchmarked four typical queries, with different settings for {work_mem, effective_cache_size, random_page_cost}, these settings have the largest influence on the selected plan. I first did a "run in" with my default settings to warm the cache.
Note: the test-set is small enough to allow all needed pages to be present in cache.
The test-set
SET search_path=tmp;
/************************/
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE one
( id SERIAL NOT NULL PRIMARY KEY
, payload varchar
);
CREATE TABLE two
( id SERIAL NOT NULL PRIMARY KEY
, one_id INTEGER REFERENCES one
, payload varchar
);
INSERT INTO one (payload) SELECT 'Text_' || gs::text FROM generate_series(1,30000) gs;
INSERT INTO two (payload) SELECT 'Text_' || gs::text FROM generate_series(1,30000) gs;
UPDATE two t
SET one_id = o.id
FROM one o
WHERE o.id = t.id
AND random() < 0.1;
INSERT INTO two (one_id,payload) SELECT one_id,payload FROM two;
INSERT INTO two (one_id,payload) SELECT one_id,payload FROM two;
INSERT INTO two (one_id,payload) SELECT one_id,payload FROM two;
VACUUM ANALYZE one;
VACUUM ANALYZE two;
/***************/
The queries:
\echo NOT EXISTS()
EXPLAIN ANALYZE
DELETE FROM one o
WHERE NOT EXISTS ( SELECT * FROM two t
WHERE t.one_id = o.id
);
\echo NOT IN()
EXPLAIN ANALYZE
DELETE FROM one o
WHERE o.id NOT IN ( SELECT one_id FROM two t)
;
\echo USING (subquery self LEFT JOIN two where NULL)
EXPLAIN ANALYZE
DELETE FROM one o
USING (
SELECT o2.id
FROM one o2
LEFT JOIN two t ON t.one_id = o2.id
WHERE t.one_id IS NULL
) sq
WHERE sq.id = o.id
;
\echo USING (subquery self WHERE NOT EXISTS(two)))
EXPLAIN ANALYZE
DELETE FROM one o
USING (
SELECT o2.id
FROM one o2
WHERE NOT EXISTS ( SELECT *
FROM two t WHERE t.one_id = o2.id
)
) sq
WHERE sq.id = o.id
;
The result (summarised)
NOT EXISTS() NOT IN() USING(LEFT JOIN NULL) USING(NOT EXISTS)
1) rpc=4.0.csz=1M wmm=64 80.358 14389.026 77.620 72.917
2) rpc=4.0.csz=1M wmm=64000 60.527 69.104 51.851 51.004
3) rpc=1.5.csz=1M wmm=64 69.804 10758.480 80.402 77.356
4) rpc=1.5.csz=1M wmm=64000 50.872 69.366 50.763 53.339
5) rpc=4.0.csz=1G wmm=64 84.117 7625.792 69.790 69.627
6) rpc=4.0.csz=1G wmm=64000 49.964 67.018 49.968 49.380
7) rpc=1.5.csz=1G wmm=64 68.567 3650.008 70.283 69.933
8) rpc=1.5.csz=1G wmm=64000 49.800 67.298 50.116 50.345
legend:
rpc := "random_page_cost"
csz := "effective_cache_size"
wmm := "work_mem"
As you can see, the NOT IN()
variant is very sensitive to shortage of work_mem
. Agreed, the setting 64(KB) is very low, but this `more or less* corresponds to large data sets, which won't fit in hashtables, either.
EXTRA: during the warm-in phase, the NOT EXISTS()
query suffered from extreme FK-trigger contention. This apears to be a result of a conflict with the vacuum deamon, which is still active after the table set-up.:
PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit
NOT EXISTS()
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Delete on one o (cost=6736.00..7623.94 rows=27962 width=12) (actual time=80.596..80.596 rows=0 loops=1)
-> Hash Anti Join (cost=6736.00..7623.94 rows=27962 width=12) (actual time=49.174..61.327 rows=27050 loops=1)
Hash Cond: (o.id = t.one_id)
-> Seq Scan on one o (cost=0.00..463.00 rows=30000 width=10) (actual time=0.003..5.156 rows=30000 loops=1)
-> Hash (cost=3736.00..3736.00 rows=240000 width=10) (actual time=49.121..49.121 rows=23600 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 1015kB
-> Seq Scan on two t (cost=0.00..3736.00 rows=240000 width=10) (actual time=0.006..33.790 rows=240000 loops=1)
Trigger for constraint two_one_id_fkey: time=467720.117 calls=27050
Total runtime: 467824.652 ms
(9 rows)
Delete rows which is not match from another table
First of all, test data for SRC:
create table src(empid, ENTERTAINMENT) as
select 101, 'BaseBall,Cricket' from dual union all
select 102, 'Badminton,Chess' from dual union all
select 103, 'Golf,Reading Books' from dual;
Now create a view that corresponds to what TGT should be:
create or replace view v_tgt as
select empid, x.entertainment
from src, xmltable(
'if (contains($X,",")) then ora:tokenize($X,",") else $X'
passing entertainment as X
columns entertainment varchar2(64) path '.'
) x;
This is one of many methods to split strings. See https://stewashton.wordpress.com/category/splitting-strings/ for some of the others.
Now create the TGT table from the view:
create table tgt as select * from v_tgt;
Finally, here is a MERGE statement that will compare TGT to V_TGT and make TGT identical:
merge /*+ qb_name(SYNC_PARTITION) USE_NL(O) */ into (
select /*+ qb_name(target) */
"EMPID", "ENTERTAINMENT", rowid Z##RID
from TGT
) O
using (
select /*+ qb_name(CDC_PARTITION) */ * from (
select /*+ qb_name(before_filter) */
"EMPID", "ENTERTAINMENT",
case
when Z##NEW = 1
and sum(Z##NEW) over(partition by
"EMPID", "ENTERTAINMENT"
order by null rows unbounded preceding) > sum(Z##OLD) over(partition by
"EMPID", "ENTERTAINMENT"
)
then 'I'
when Z##OLD = 1
and sum(Z##OLD) over(partition by
"EMPID", "ENTERTAINMENT"
order by null rows unbounded preceding) > sum(Z##NEW) over(partition by
"EMPID", "ENTERTAINMENT"
)
then 'D'
end Z##OP, Z##RID
FROM (
select /*+ qb_name(old) */
"EMPID", "ENTERTAINMENT",
1 Z##OLD, 0 Z##NEW, rowid Z##RID
from TGT O
union all
select /*+ qb_name(new) */
"EMPID", "ENTERTAINMENT",
0, 1, null
from v_tgt N
)
)
where Z##OP is not null
) N
on (
O.Z##RID = n.Z##RID
)
when matched then update set
"EMPID"=N."EMPID"
delete where N.Z##OP = 'D'
when not matched then insert (
"EMPID", "ENTERTAINMENT"
) values(
N."EMPID", N."ENTERTAINMENT"
);
I generated this MERGE statement using a tool I present here: https://stewashton.wordpress.com/2018/02/12/comp_sync-1-a-new-table-compare-sync-package/
Best regards,
Stew Ashton
Related Topics
Log Record Changes in SQL Server in an Audit Table
How to Query a Comma Separated Column for a Specific Value
How to Sort a Varchar Column in SQL Server That Contains Numbers
Postgres Column "X" Does Not Exist
How to Convert a SQL Server 2008 Datetimeoffset to a Datetime
How to Version Your Database Schema
Combining the Results of Two SQL Queries as Separate Columns
Postgres Dynamic Query Function
How to Insert a Blob into a Database Using SQL Server Management Studio
How to Get Second Largest or Third Largest Entry from a Table
Partition Function Count() Over Possible Using Distinct
Postgresql Multi Insert...Returning with Multiple Columns
How to Design a Database for User Defined Fields
What Does Delimiter // Do in a Trigger
How to Query for All Dates Greater Than a Certain Date in SQL Server