Equivalent of on Conflict Do Nothing for Update Postgres

Equivalent of ON CONFLICT DO NOTHING for UPDATE postgres

AFAIK, there is no such equivalent.

Let us say you are developing an application that connects to a postgresql database, there are a few things you need to keep in mind, in the context of your question:

  • It may be counter-intuitive but you should consider errors being thrown by the DB as a good thing.

    This is just about getting a status, it does not mean application crash.
  • For insert, there is an alternative choice of action on conflict (update or nothing) so it makes sense to have a syntax to let you decide.

    For updates, the only thing you can do is ... nothing.

    So why would SQL let you ask to do something specific since there is no choice? Remember that DB reporting errors is good, so let the DB do nothing and tell you why.
  • Last, it is a bad practice to update primary keys.

    The ON CONFLICT ... for inserts is not intended to update the primary key fields. The very opposite in fact: it is intended to update all the fields except the ones from the primary key in a single record.

While I am on that point, please note that there was no need for a conflict on primary key for the query to fail

1 record with the "convenient" ON UPDATE NO ACTION foreign key would have made it fail too (which is still better than updating 10M+ records in 50 tables with a ON UPDATE CASCADE ...). BTW, did you know Oracle does not even have the ON UPDATE CASCADE clause? What do you think is the reason for that?


What can you/should not do in that situation?

  1. Do not update the primary key, like I said. Your question is still valid for UNIQUE constraints but please please please, NEVER update primary keys.
  2. Do not attempt to see if a conflicting record already exists. It may take a long time and still be unreliable.

    Do you really want to select millions of records just to avoid the error codes?

    Also, when you extend to other constraints (CHECK or EXCLUSION), will you really type the additional code it takes with no error in order to, once again, only avoid an error code?

    Last, if you have implemented row-level security, the conflict may arise from a record you cannot see.
  3. Handle the error code in your app. Receiving status is GOOD.
  4. Use save points if you are in the middle of a transaction.

    This is the only annoying thing with DB errors: if you get one in the middle of a transaction, you will start getting current transaction is aborted, commands ignored until end of transaction block for everything.

    Hopefully, you do not need to roll the entire transaction back and redo everything from scratch. You can get away using the following piece of code.

Here you go:

BEGIN;
SAVEPOINT MySavepoint;
UPDATE mytable set myuniquefield = 3; /*2+ records are going to be updated */
rollback to savepoint MySavepoint;
/*Insert Some more queries here*/
COMMIT;

Postgres update column, on conflict ignore this row

You need table aliases to fix your query:

UPDATE users u
SET email = u.secondary_email
WHERE NOT EXISTS (SELECT 1 FROM users u2 WHERE u2.email = u.secondary_email);

For your overall problem, check for no duplicates within the column as well:

UPDATE users u
SET email = u.secondary_email
FROM (SELECT secondary_email, COUNT(*) as cnt
FROM users u
GROUP BY secondary_email
HAVING COUNT(*) = 1
) s
WHERE s.secondary_email = u.secondary_email AND
NOT EXISTS (SELECT 1 FROM users u2 WHERE u2.email = u.secondary_email);

Or choose the first one:

UPDATE users u
SET email = u.secondary_email
FROM (SELECT u.*,
ROW_NUMBER() OVER (PARTITION BY secondary_email ORDER BY user_id) as seqnum
FROM users u
) s
WHERE s.user_id = u.user_id AND
s.seqnum = 1 AND
NOT EXISTS (SELECT 1 FROM users u2 WHERE u2.email = u.secondary_email);

Note: This will also filter out NULL values which seems like a good idea.

Here is a db<>fiddle.

What to use instead of INSERT ... ON CONFLICT DO NOTHING on postgres

Just do what you're doing, without the loop:

INSERT INTO knowledge_state (lo_id, learner_id) 
SELECT a.lo_id, a.learnerid
FROM qb_lo_tag a
WHERE a.qb_id = NEW.qb_id
and NOT EXISTS (SELECT * FROM knowledge_state b
WHERE b.lo_id = a.lo_id AND b.learner_id = a.learnerid);

Of course, you can add an index on knowledge_state (lo_id, learner_id) to make it faster (On Conflict implies a unique constraint or other constraint, and a unique constraint implies an index).

Postgres INSERT ON CONFLICT DO NOTHING vs SELECT + INSERT query

query 2 will work if a row gets inserted between SELECT 'MCD'
WHERE NOT EXISTS (SELECT * FROM stock_price_code WHERE value = 'MCD')
and INSERT INTO stock_price_code (value), while query 1 will fail with duplicate.

The overhead of ON CONFLICT I believe is smaller then WHERE NOT EXISTS, not sure though

Race conditions between INSERT ON CONFLICT DO NOTHING and SELECT

To make absolutely sure that the single row in the first table is there, and it's ID returned, you could create a function like outlined here:

  • Is SELECT or INSERT in a function prone to race conditions?

To make sure the row also stays there for the duration of the transaction, just make sure it's locked. If you INSERT the row, it's locked anyway. If you SELECT an existing id, you have to lock it explicitly - just like you suggested. FOR KEY SHARE is strong enough for our purpose as long as there is a (non-partial, non-functional) UNIQUE index on (scope, name), which is safe to assume given your ON CONFLICT clause.

CREATE OR REPLACE FUNCTION f_object_id(_scope text, _name text, OUT _object_id int)
LANGUAGE plpgsql AS
$func$
BEGIN
LOOP
SELECT id FROM object
WHERE scope = $1
AND name = $2
-- lock to prevent deletion in the tiny time frame before the next INSERT
FOR KEY SHARE
INTO _object_id;

EXIT WHEN FOUND;

INSERT INTO object AS o (scope, name)
VALUES ($1, $2)
ON CONFLICT (scope, name) DO NOTHING
RETURNING o.id
INTO _object_id;

EXIT WHEN FOUND;
END LOOP;
END
$func$;

You really only need to lock the row if it's conceivable that a concurrent transaction might DELETE it (you don't UPDATE) in the tiny time frame between the SELECT and the next INSERT statement.

Also, if you have a FOREIGN KEY constraint from object_member.object_id to object.id (which seems likely), referential integrity is guaranteed anyway. If you don't add the explicit lock, and the row is deleted in between, you get a foreign key violation, and the INSERT to object_member is cancelled, along with the whole transaction. Else, the other transaction with the DELETE has to wait until your transaction is done, and is then cancelled by the same FK constraint since depending rows are now there (unless it's defined to CASCADE ...) So by locking (or not) you can decide whether to prevent the DELETE or the INSERT in this scenario.

Then your call burns down to just:

query(
`WITH o(id) AS (SELECT f_object_id($1, $2))
INSERT INTO object_member (object_id, key, value)
SELECT o.id, UNNEST($3::text[]), UNNEST($4::int[])
FROM o;`
[$scope, $name, $keys, $values]
)

Since you obviously insert multiple rows into object_member, I moved f_object_id($1, $2) to a CTE to avoid repeated execution - which would work, but pointlessly expensive.

In Postgres 12 or later I would make that explicit by adding MATERIALIZED (since the INSERT is hidden in a function):

WITH o(id) AS MATERIALIZED (SELECT f_object_id($1, $2)) ...

Aside: For the multiple unnest() in the SELECT list, make sure you are on Postgres 10 or later. See:

  • What is the expected behaviour for multiple set-returning functions in SELECT clause?

Matters of detail

Will it make any difference (apart from execution time) to do this in the application logic with multiple queries in the same transaction?

Basically no. The only difference is performance. Well, and short code and reliability. It's objectively more error prone to go back and forth between db and client for each loop. But unless you have extremely competitive transactions, you would hardly ever be looping anyway.

The other consideration is this: the matter is tricky, and most developers do not understand it. Encapsulated in a server-side function, it's less likely to be broken by the next application programmer (or yourself). You have to make sure that it's actually used, too. Either way, properly document the reasons you are doing it one way or another ...

I really wonder whether my second snippet is safe, or why not (given the quote about visibility in the SELECT after the INSERT).

Mostly safe, but not absolutely. While the next separate SELECT will see (now committed) rows of a transactions competing with the previous UPSERT, there is nothing to keep a third transaction from deleting it again in the meantime. The row has not been locked, and you have no way to do that while it's not visible, and there is no generic predicate locking available in Postgres.

Consider this (T1, T2, T3 are concurrent transactions):

                               T2: BEGIN transaction
T1: BEGIN transaction
T2: INSERT object 666
T1: UPSERT object 666
unique violation?
-> wait for T2
T2: COMMIT
T1: unique violation -> NO ACTION
finish statement
can't return invisible object 666
T3: DELETE object 666 & COMMIT
T1: SELECT object 666 -> no row!
BOOM!

Typically it's extremely unlikely that it ever happens.

But it's possible. Hence the loop.

The other option is SERIALIZABLE transaction isolation. Typically more expensive, and you need to prepare for serialization failures. Catch 22.

update on conflict do nothing postgres

Example data:

create table oh_person(identifier text unique);
insert into oh_person
values ('012'), ('0012'), ('0015'), ('015');

Use anonymous code block:

do $$
declare
r record;
begin
for r in
select identifier
from oh_person
where left(identifier, 1) = '0'
loop
begin
update oh_person
set identifier = trim(leading '0' from identifier)
where identifier = r.identifier;
exception
when unique_violation then
raise notice '% not updated', r.identifier;
end;
end loop;
end $$;

NOTICE: 0012 not updated
NOTICE: 015 not updated

Result:

select * from oh_person;

identifier
------------
0012
015
12
15
(4 rows)

Postgres INSERT ON CONFLICT DO UPDATE vs INSERT or UPDATE

Query 3 is the Postgres syntax for "UPSERT" (= UPDATE or INSERT), introduced in Postgres 9.5.

From the documentation:

ON CONFLICT DO UPDATE guarantees an atomic INSERT or UPDATE outcome;
provided there is no independent error, one of those two outcomes is
guaranteed, even under high concurrency. This is also known as UPSERT
UPDATE or INSERT”.

This is the best practice for what you are trying to achieve.



Related Topics



Leave a reply



Submit