How to Use RETURNING With ON CONFLICT in PostgreSQL

How to use RETURNING with ON CONFLICT in PostgreSQL?

I had exactly the same problem, and I solved it using 'do update' instead of 'do nothing', even though I had nothing to update. In your case it would be something like this:

INSERT INTO chats ("user", "contact", "name") 
VALUES ($1, $2, $3),
($2, $1, NULL)
ON CONFLICT("user", "contact")
DO UPDATE SET
name=EXCLUDED.name
RETURNING id;

This query returns all the rows, regardless of whether they have just been inserted or already existed.
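Note that the statement above overwrites the stored name with EXCLUDED.name (NULL for the second row) whenever a conflict occurs. If existing names should be kept, one possible variant is to assign the column to itself. The following is only a sketch: the table definition and the sample values are assumptions made up for illustration, not taken from the question.

```sql
-- Hypothetical table definition, assumed for this sketch only.
CREATE TEMP TABLE chats (
    id        serial PRIMARY KEY,
    "user"    int NOT NULL,
    "contact" int NOT NULL,
    "name"    text,
    UNIQUE ("user", "contact")
);

-- Pre-existing row that the next statement will conflict with.
INSERT INTO chats ("user", "contact", "name") VALUES (1, 2, 'old name');

INSERT INTO chats ("user", "contact", "name")
VALUES (1, 2, 'new name'),
       (2, 1, NULL)
ON CONFLICT ("user", "contact")
DO UPDATE SET name = chats.name   -- self-assignment: keeps the stored value
RETURNING id, name;               -- returns the existing row and the new row
```

The self-assignment still writes a new row version, so it only avoids changing the data, not the cost of the update.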

Return rows from INSERT with ON CONFLICT without needing to update

It's the recurring problem of SELECT or INSERT, related to (but different from) an UPSERT. The new UPSERT functionality in Postgres 9.5 is still instrumental.

WITH ins AS (
   INSERT INTO names(name)
   VALUES ('bob')
   ON CONFLICT ON CONSTRAINT names_name_key DO UPDATE
   SET    name = NULL
   WHERE  FALSE      -- never executed, but locks the row
   RETURNING id
   )
SELECT id FROM ins
UNION  ALL
SELECT id FROM names
WHERE  name = 'bob'  -- only executed if no INSERT
LIMIT  1;

This way you do not actually write a new row version without need.

I assume you are aware that in Postgres every UPDATE writes a new version of the row due to its MVCC model - even if name is set to the same value as before. This would make the operation more expensive, add to possible concurrency issues / lock contention in certain situations and bloat the table additionally.
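To see the effect for yourself, you can compare the system column ctid before and after an "empty" update. This is a minimal sketch on a throwaway table; the table definition is an assumption for illustration.

```sql
-- Hypothetical demo table; the UNIQUE column gets the default
-- constraint name names_name_key.
CREATE TEMP TABLE names (id serial PRIMARY KEY, name text UNIQUE);
INSERT INTO names (name) VALUES ('bob');

-- Physical location of the current row version and the creating transaction.
SELECT ctid, xmin FROM names WHERE name = 'bob';

-- "Empty" update: the value does not change.
UPDATE names SET name = name WHERE name = 'bob';

-- ctid differs now because a new row version was written
-- (xmin differs too when each statement runs in its own transaction).
SELECT ctid, xmin FROM names WHERE name = 'bob';
```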

However, there is still a tiny corner case for a race condition. Concurrent transactions may have added a conflicting row, which is not yet visible in the same statement. Then INSERT and SELECT come up empty.

Proper solution for single-row UPSERT:

  • Is SELECT or INSERT in a function prone to race conditions?

General solutions for bulk UPSERT:

  • How to use RETURNING with ON CONFLICT in PostgreSQL?
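For the single-row case, one common way to close the race is to retry in a loop inside a function. The following is a sketch under assumptions, not a definitive implementation: the function name f_name_id is made up, the names table is assumed to have a unique constraint on name, and the default READ COMMITTED isolation level is assumed so that each statement in the loop sees newly committed rows.

```sql
-- Hypothetical helper: returns the id for _name, inserting the row if needed.
CREATE OR REPLACE FUNCTION f_name_id(_name text, OUT _id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      -- Try to find an existing row first.
      SELECT id FROM names WHERE name = _name INTO _id;
      EXIT WHEN FOUND;

      -- Not found: try to insert. A concurrent insert makes this a no-op.
      INSERT INTO names (name)
      VALUES (_name)
      ON CONFLICT (name) DO NOTHING
      RETURNING id INTO _id;
      EXIT WHEN FOUND;

      -- Neither found nor inserted: a concurrent transaction got in between,
      -- so loop and try the SELECT again.
   END LOOP;
END
$func$;
```

Usage would look like: SELECT f_name_id('bob');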

Without concurrent write load

If concurrent writes (from a different session) are not possible you don't need to lock the row and can simplify:

WITH ins AS (
   INSERT INTO names(name)
   VALUES ('bob')
   ON CONFLICT ON CONSTRAINT names_name_key DO NOTHING  -- no lock needed
   RETURNING id
   )
SELECT id FROM ins
UNION  ALL
SELECT id FROM names
WHERE  name = 'bob'  -- only executed if no INSERT
LIMIT  1;

query of type INSERT ON CONFLICT DO NOTHING RETURNING returns nothing

Changing DO NOTHING to a DO UPDATE SET clause that does not change the stored values gives the wanted result:

xsignalsbot=# insert into users_strategies (id_strategy,id_account) 
values (1,48) on conflict (id_strategy,id_account) do update set
id_strategy=excluded.id_strategy returning users_strategies.active, users_strategies.risk;

 active | risk
--------+------
 t      | 0.50
(1 row)

Postgresql upsert return id on conflict?

Use ON CONFLICT DO UPDATE instead to ensure that the query operates on a row and so can return something:

CREATE TABLE match(id serial PRIMARY KEY);

INSERT INTO match (id) VALUES (1)
ON CONFLICT (id) DO UPDATE
SET id = excluded.id
RETURNING id;

How to include excluded rows in RETURNING from INSERT ... ON CONFLICT

The error you get:

ON CONFLICT DO UPDATE command cannot affect row a second time

... indicates you are trying to upsert the same row more than once in a single command. In other words: you have dupes on (name, url, email) in your VALUES list. Fold duplicates (if that's an option) and the error goes away. This chooses an arbitrary row from each set of dupes:

INSERT INTO feeds_person (created, modified, name, url, email)
SELECT DISTINCT ON (name, url, email) *
FROM (
VALUES
('blah', 'blah', 'blah', 'blah', 'blah')
-- ... more rows
) AS v(created, modified, name, url, email) -- match column list
ON CONFLICT (name, url, email) DO UPDATE
SET url = feeds_person.url
RETURNING id;

Since we use a free-standing VALUES expression now, you have to add explicit type casts for non-default types. Like:

VALUES
(timestamptz '2016-03-12 02:47:56+01'
, timestamptz '2016-03-12 02:47:56+01'
, 'n3', 'u3', 'e3')
...

Your timestamptz columns need an explicit type cast, while the string types can operate with default text. (You could still cast to varchar(n) right away.)

If you want to have a say in which row to pick from each set of dupes, there are ways to do that:

  • Select first row in each GROUP BY group?
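As a sketch of one such way: DISTINCT ON keeps the first row of each group according to ORDER BY, so adding a sort column decides which duplicate survives. The sample rows below are made up for illustration, and preferring the latest modified value is an assumption about what you might want.

```sql
SELECT DISTINCT ON (name, url, email) *
FROM  (
   VALUES
      (timestamptz '2016-03-12 02:47:56+01', timestamptz '2016-03-12 02:47:56+01', 'n3', 'u3', 'e3')
    , (timestamptz '2016-03-12 02:47:56+01', timestamptz '2016-03-13 10:00:00+01', 'n3', 'u3', 'e3')
   ) AS v(created, modified, name, url, email)
-- The leading ORDER BY expressions must match the DISTINCT ON list;
-- the extra sort column picks the newest "modified" within each group.
ORDER  BY name, url, email, modified DESC;
```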

You are right, there is (currently) no way to use excluded columns in the RETURNING clause. I quote the Postgres Wiki:

Note that RETURNING does not make visible the "EXCLUDED.*" alias
from the UPDATE (just the generic "TARGET.*" alias is visible
there). Doing so is thought to create annoying ambiguity for the
simple, common cases [30] for little to no benefit. At some
point in the future, we may pursue a way of exposing if
RETURNING-projected tuples were inserted and updated, but this
probably doesn't need to make it into the first committed iteration of
the feature [31].

However, you shouldn't be updating rows that are not supposed to be updated. Empty updates are almost as expensive as regular updates and might have unintended side effects. You don't strictly need UPSERT to begin with; your case looks more like "SELECT or INSERT". Related:

  • Is SELECT or INSERT in a function prone to race conditions?

One cleaner way to insert a set of rows would be with data-modifying CTEs:

WITH val AS (
   SELECT DISTINCT ON (name, url, email) *
   FROM  (
      VALUES
        (timestamptz '2016-1-1 0:0+1', timestamptz '2016-1-1 0:0+1', 'n', 'u', 'e')
      , ('2016-03-12 02:47:56+01', '2016-03-12 02:47:56+01', 'n1', 'u3', 'e3')
      -- more (type cast only needed in 1st row)
      ) v(created, modified, name, url, email)
   )
, ins AS (
   INSERT INTO feeds_person (created, modified, name, url, email)
   SELECT created, modified, name, url, email FROM val
   ON CONFLICT (name, url, email) DO NOTHING
   RETURNING id, name, url, email
   )
SELECT 'inserted' AS how, id FROM ins  -- inserted
UNION  ALL
SELECT 'selected' AS how, f.id         -- not inserted
FROM   val v
JOIN   feeds_person f USING (name, url, email);

The added complexity should pay for big tables where INSERT is the rule and SELECT the exception.

Originally, I had added a NOT EXISTS predicate on the last SELECT to prevent duplicates in the result. But that was redundant. All CTEs of a single query see the same snapshots of tables. The set returned with ON CONFLICT (name, url, email) DO NOTHING is mutually exclusive to the set returned after the INNER JOIN on the same columns.

Unfortunately this also opens a tiny window for a race condition. If ...

  • a concurrent transaction inserts conflicting rows
  • has not committed yet
  • but commits eventually

... some rows may be lost.

To overcome this, you might just run INSERT .. ON CONFLICT DO NOTHING, followed by a separate SELECT query for all rows, within the same transaction. This in turn opens another tiny window for a race condition if concurrent transactions can commit writes to the table between INSERT and SELECT (in the default READ COMMITTED isolation level). That can be avoided with REPEATABLE READ transaction isolation (or stricter), or with a (possibly expensive or even unacceptable) write lock on the whole table. You can get any behavior you need, but there may be a price to pay.
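A rough sketch of the two-statement variant under REPEATABLE READ, using a single made-up row for feeds_person. As far as I understand, at this isolation level a conflict with a row committed after the transaction's snapshot makes the INSERT fail with a serialization error (SQLSTATE 40001) instead of silently doing nothing, so the caller should be prepared to retry the whole transaction.

```sql
BEGIN ISOLATION LEVEL REPEATABLE READ;

-- Insert what is missing; rows that already exist are skipped.
INSERT INTO feeds_person (created, modified, name, url, email)
VALUES (timestamptz '2016-1-1 0:0+1', timestamptz '2016-1-1 0:0+1', 'n', 'u', 'e')
ON CONFLICT (name, url, email) DO NOTHING;

-- Same snapshot as the INSERT, plus the transaction's own writes.
SELECT id
FROM   feeds_person
WHERE  (name, url, email) = ('n', 'u', 'e');

COMMIT;
```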

Related:

  • How to use RETURNING with ON CONFLICT in PostgreSQL?
  • Return rows from INSERT with ON CONFLICT without needing to update

PostgreSQL Upsert differentiate inserted and updated rows using system columns XMIN, XMAX and others

I think that this is an interesting question that deserves an in-depth answer; please bear with me if it is a little bit lengthy.

In short: Your guess is right, and you can use the following RETURNING clause to determine if the row was inserted and not updated:

RETURNING (xmax = 0) AS inserted

Now the detailed explanation:

When a row is updated, PostgreSQL does not modify the data, but creates a new version of the row; the old version will be deleted by autovacuum when it is no longer needed. A version of a row is called a tuple, so in PostgreSQL there can be more than one tuple per row.

xmax serves two different purposes:

  1. As stated in the documentation, it can be the transaction ID of the transaction that deleted (or updated) the tuple (the documentation uses “row” where this answer says “tuple”). Only transactions with a transaction ID between xmin and xmax can see the tuple. An old tuple can be deleted safely if there is no transaction with a transaction ID less than xmax.

  2. xmax is also used to store row locks. In PostgreSQL, row locks are not stored in the lock table, but in the tuple to avoid overflow of the lock table.

    If only one transaction has a lock on the row, xmax will contain the transaction ID of the locking transaction. If more than one transaction has a lock on the row, xmax contains the number of a so-called multixact, which is a data structure that in turn contains the transaction IDs of the locking transactions.

The documentation of xmax is not complete, because the exact meaning of this field is considered an implementation detail and cannot be understood without knowing t_infomask of the tuple, which is not immediately visible via SQL.

You can install the contrib module pageinspect to view this and other fields of a tuple.

I ran your example, and this is what I see when I use the heap_page_items function to examine details (the transaction ID numbers are of course different in my case):

SELECT *, ctid, xmin, xmax FROM t;

┌───┬────┬───────┬────────┬────────┐
│ i │ x  │ ctid  │  xmin  │  xmax  │
├───┼────┼───────┼────────┼────────┤
│ 1 │ 11 │ (0,2) │ 102508 │ 102508 │
│ 2 │ 22 │ (0,3) │ 102508 │      0 │
└───┴────┴───────┴────────┴────────┘
(2 rows)

SELECT lp, lp_off, t_xmin, t_xmax, t_ctid,
to_hex(t_infomask) AS t_infomask, to_hex(t_infomask2) AS t_infomask2
FROM heap_page_items(get_raw_page('laurenz.t', 0));

┌────┬────────┬────────┬────────┬────────┬────────────┬─────────────┐
│ lp │ lp_off │ t_xmin │ t_xmax │ t_ctid │ t_infomask │ t_infomask2 │
├────┼────────┼────────┼────────┼────────┼────────────┼─────────────┤
│  1 │   8160 │ 102507 │ 102508 │ (0,2)  │ 500        │ 4002        │
│  2 │   8128 │ 102508 │ 102508 │ (0,2)  │ 2190       │ 8002        │
│  3 │   8096 │ 102508 │      0 │ (0,3)  │ 900        │ 2           │
└────┴────────┴────────┴────────┴────────┴────────────┴─────────────┘
(3 rows)

The meanings of t_infomask and t_infomask2 can be found in src/include/access/htup_details.h. lp_off is the offset of the tuple data in the page, and t_ctid is the current tuple ID which consists of the page number and a tuple number within the page. Since the table was newly created, all data are in page 0.
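If you do not want to decode the hexadecimal values by hand, the relevant bits can be tested directly in SQL. This is a sketch that assumes pageinspect is installed (CREATE EXTENSION pageinspect; it needs superuser rights) and that the flag values match the header file of your PostgreSQL version; the output column names are made up.

```sql
SELECT lp,
       -- bits of t_infomask
       (t_infomask  & x'0010'::int) <> 0 AS xmax_keyshr_lock,  -- HEAP_XMAX_KEYSHR_LOCK
       (t_infomask  & x'0080'::int) <> 0 AS xmax_lock_only,    -- HEAP_XMAX_LOCK_ONLY
       (t_infomask  & x'0100'::int) <> 0 AS xmin_committed,    -- HEAP_XMIN_COMMITTED
       (t_infomask  & x'0400'::int) <> 0 AS xmax_committed,    -- HEAP_XMAX_COMMITTED
       (t_infomask  & x'0800'::int) <> 0 AS xmax_invalid,      -- HEAP_XMAX_INVALID
       (t_infomask  & x'2000'::int) <> 0 AS updated,           -- HEAP_UPDATED
       -- bits of t_infomask2
       (t_infomask2 & x'4000'::int) <> 0 AS hot_updated,       -- HEAP_HOT_UPDATED
       (t_infomask2 & x'8000'::int) <> 0 AS heap_only_tuple,   -- HEAP_ONLY_TUPLE
       (t_infomask2 & x'07FF'::int)      AS natts              -- number of attributes
FROM   heap_page_items(get_raw_page('laurenz.t', 0));
```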

Let me discuss the three rows returned by heap_page_items.

  1. At line pointer (lp) 1 we find the old, updated tuple. It originally had ctid = (0,1), but that got modified to contain the tuple ID of the current version during update. The tuple was created by transaction 102507 and invalidated by transaction 102508 (the transaction that issued the INSERT ... ON CONFLICT). This tuple is not visible any more and will get removed during VACUUM.

    t_infomask shows that both xmin and xmax belong to committed transactions and consequently show when the tuple was created and deleted. t_infomask2 shows that the tuple was updated with a HOT (heap only tuple) update, which means that the updated tuple is in the same page as the original tuple and no indexed column was modified (see src/backend/access/heap/README.HOT).

  2. At line pointer 2 we see the new, updated tuple that was created by the transaction that ran the INSERT ... ON CONFLICT (transaction 102508).

    t_infomask shows that this tuple is the result of an update, xmin is valid, and xmax contains a KEY SHARE row lock (which is no longer relevant since the transaction has completed). This row lock was taken during INSERT ... ON CONFLICT processing. t_infomask2 shows that this is a HOT tuple.

  3. At line pointer 3 we see the newly inserted row.

    t_infomask shows that xmin is valid and xmax is invalid. xmax is set to 0 because this value is always used for newly inserted tuples.

So the nonzero xmax of the updated row is an implementation artifact caused by a row lock. It is conceivable that INSERT ... ON CONFLICT is reimplemented one day so that this behaviour changes, but I think that is unlikely.
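Put together, a complete statement using this trick might look like the sketch below. The table t with columns i and x is modelled on the example discussed above; its definition and the pre-existing row are assumptions. Keep in mind that this relies on the implementation artifact just described rather than on documented behaviour.

```sql
-- Hypothetical setup: one row already exists.
CREATE TEMP TABLE t (i int PRIMARY KEY, x int);
INSERT INTO t VALUES (1, 1);

INSERT INTO t (i, x)
VALUES (1, 11), (2, 22)
ON CONFLICT (i) DO UPDATE
SET x = EXCLUDED.x
-- inserted should be false for i = 1 (updated) and true for i = 2 (new row)
RETURNING i, x, (xmax = 0) AS inserted;
```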

Use INSERT ... ON CONFLICT DO NOTHING RETURNING failed rows

A bit verbose, but I can't think of anything else:

with all_tags (name) as (
  values ('tag10'), ('tag6'), ('tag11')
), inserted (id, name) as (
  INSERT INTO tags (name)
  select name
  from all_tags
  ON CONFLICT DO NOTHING
  returning id, name
)
select t.id, t.name, 'already there'
from tags t
  join all_tags at on at.name = t.name
union all
select id, name, 'inserted'
from inserted;

The outer select from tags sees the snapshot of the table as it was before the new tags were inserted. The third column with the constant is only there to test the query, so that one can identify which rows were inserted and which were not.

postgres: update a row on conflict insert and return old values

Somehow, this seems to work:

insert into t (x)
values ('a0'), ('b')
on conflict (x) do update
set x = excluded.x || '0'
returning i, x, (select x from t t2 where t2.i = t.i);

I am surprised because t is in scope in the subquery, but excluded is not. Hmmm . . . perhaps that is because it is not part of the on conflict clause but part of the overall insert. That starts to make sense.

Here is a db<>fiddle for this version.

I think your code would look like:

INSERT INTO tbl AS x (...)
VALUES (...)
ON CONFLICT (tbl_id) DO UPDATE  -- DO UPDATE needs a conflict target; tbl_id is assumed here
SET tbl_id = 24,
    name = 'New Gal'
RETURNING (SELECT t2.tbl_id FROM tbl t2 WHERE t2.tbl_id = x.tbl_id) AS old_id,
          (SELECT t2.name FROM tbl t2 WHERE t2.tbl_id = x.tbl_id) AS old_name,
          x.tbl_id, x.name;

