How to include excluded rows in RETURNING from INSERT ... ON CONFLICT

The error you get:

ON CONFLICT DO UPDATE command cannot affect row a second time

... indicates you are trying to upsert the same row more than once in a single command. In other words: you have dupes on (name, url, email) in your VALUES list. Fold duplicates (if that's an option) and the error goes away. This chooses an arbitrary row from each set of dupes:

INSERT INTO feeds_person (created, modified, name, url, email)
SELECT DISTINCT ON (name, url, email) *
FROM (
VALUES
('blah', 'blah', 'blah', 'blah', 'blah')
-- ... more rows
) AS v(created, modified, name, url, email) -- match column list
ON CONFLICT (name, url, email) DO UPDATE
SET url = feeds_person.url
RETURNING id;

Since we use a free-standing VALUES expression now, you have to add explicit type casts for non-default types. Like:

VALUES
(timestamptz '2016-03-12 02:47:56+01'
, timestamptz '2016-03-12 02:47:56+01'
, 'n3', 'u3', 'e3')
...

Your timestamptz columns need an explicit type cast, while the string types can operate with default text. (You could still cast to varchar(n) right away.)
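To make the failure mode concrete, here is a rough plain-Python model (not Postgres itself; the helper names `key_of`, `upsert_batch` and `fold_duplicates` are made up for illustration). It assumes a statement applies atomically and that each conflict-target key may be affected at most once per DO UPDATE statement, and it shows that folding duplicates first lets the batch go through.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str, str]            # conflict target: (name, url, email)
Row = Tuple[str, str, str, str, str]  # (created, modified, name, url, email)


def key_of(row: Row) -> Key:
    return row[2], row[3], row[4]


def upsert_batch(table: Dict[Key, Row], rows: Iterable[Row]) -> None:
    """One INSERT ... ON CONFLICT DO UPDATE may affect each target row
    only once; a key repeated within the same batch aborts the statement."""
    staged = dict(table)  # work on a copy so a failure changes nothing
    touched = set()
    for row in rows:
        k = key_of(row)
        if k in touched:
            raise ValueError(
                "ON CONFLICT DO UPDATE command cannot affect row a second time")
        touched.add(k)
        staged[k] = row  # insert a new row, or update the existing one
    table.update(staged)  # "commit" only when every key was unique


def fold_duplicates(rows: Iterable[Row]) -> List[Row]:
    """Keep one row per key, like SELECT DISTINCT ON (name, url, email).
    Without ORDER BY, Postgres keeps an arbitrary row of each group;
    this model simply keeps the first one it sees."""
    picked: Dict[Key, Row] = {}
    for row in rows:
        picked.setdefault(key_of(row), row)
    return list(picked.values())


batch = [
    ("t1", "t1", "n", "u", "e"),
    ("t2", "t2", "n", "u", "e"),  # duplicate key
    ("t3", "t3", "n2", "u2", "e2"),
]
table: Dict[Key, Row] = {}
upsert_batch(table, fold_duplicates(batch))  # succeeds: each key appears once
```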

If you want to have a say in which row to pick from each set of dupes, there are ways to do that:

  • Select first row in each GROUP BY group?

You are right, there is (currently) no way to use excluded columns in the RETURNING clause. I quote the Postgres Wiki:

Note that RETURNING does not make visible the "EXCLUDED.*" alias
from the UPDATE (just the generic "TARGET.*" alias is visible
there). Doing so is thought to create annoying ambiguity for the
simple, common cases [30] for little to no benefit. At some
point in the future, we may pursue a way of exposing if
RETURNING-projected tuples were inserted and updated, but this
probably doesn't need to make it into the first committed iteration of
the feature [31].

However, you shouldn't be updating rows that are not supposed to be updated. Empty updates are almost as expensive as regular updates and might have unintended side effects. You don't strictly need UPSERT to begin with; your case looks more like "SELECT or INSERT". Related:

  • Is SELECT or INSERT in a function prone to race conditions?

One cleaner way to insert a set of rows would be with data-modifying CTEs:

WITH val AS (
SELECT DISTINCT ON (name, url, email) *
FROM (
VALUES
(timestamptz '2016-1-1 0:0+1', timestamptz '2016-1-1 0:0+1', 'n', 'u', 'e')
, ('2016-03-12 02:47:56+01', '2016-03-12 02:47:56+01', 'n1', 'u3', 'e3')
-- more (type cast only needed in 1st row)
) v(created, modified, name, url, email)
)
, ins AS (
INSERT INTO feeds_person (created, modified, name, url, email)
SELECT created, modified, name, url, email FROM val
ON CONFLICT (name, url, email) DO NOTHING
RETURNING id, name, url, email
)
SELECT 'inserted' AS how, id FROM ins -- inserted
UNION ALL
SELECT 'selected' AS how, f.id -- not inserted
FROM val v
JOIN feeds_person f USING (name, url, email);

The added complexity should pay for big tables where INSERT is the rule and SELECT the exception.

Originally, I had added a NOT EXISTS predicate on the last SELECT to prevent duplicates in the result. But that was redundant. All CTEs of a single query see the same snapshot of the tables, so the final SELECT cannot see rows inserted by the ins CTE. The set returned with ON CONFLICT (name, url, email) DO NOTHING is therefore mutually exclusive with the set returned after the INNER JOIN on the same columns.
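A plain-Python sketch of why the two result sets cannot overlap. It assumes a dict stands in for feeds_person and an `itertools.count` stands in for the serial id; both are assumptions, as is the helper name `insert_or_select`. The JOIN branch reads a copy of the table taken before the INSERT branch runs.

```python
from itertools import count
from typing import Dict, Iterator, List, Tuple

Key = Tuple[str, str, str]  # (name, url, email)


def insert_or_select(table: Dict[Key, int], val: List[Key],
                     ids: Iterator[int]) -> List[Tuple[str, int]]:
    """Model of the WITH val / ins / UNION ALL query above.
    `val` is assumed to be deduplicated already (the DISTINCT ON step)."""
    snapshot = dict(table)  # all CTEs read the table as of statement start
    ins = []
    for k in val:
        if k not in table:          # ON CONFLICT (name, url, email) DO NOTHING
            table[k] = next(ids)    # stand-in for a serial id
            ins.append(("inserted", table[k]))
    # The JOIN reads the snapshot, so it cannot see the rows just inserted
    selected = [("selected", snapshot[k]) for k in val if k in snapshot]
    return ins + selected


ids = count(1)
feeds_person = {("n", "u", "e"): next(ids)}  # pre-existing row with id 1
result = insert_or_select(feeds_person, [("n", "u", "e"), ("n1", "u3", "e3")], ids)
# each key shows up exactly once: either as "inserted" or as "selected"
```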

Unfortunately this also opens a tiny window for a race condition. If ...

  • a concurrent transaction inserts conflicting rows
  • has not committed yet
  • but commits eventually

... some rows may be lost.

To overcome this, you might just run INSERT ... ON CONFLICT DO NOTHING, followed by a separate SELECT query for all rows, within the same transaction. That in turn opens another tiny window for a race condition if concurrent transactions can commit writes to the table between INSERT and SELECT (in the default READ COMMITTED isolation level). This can be avoided with REPEATABLE READ transaction isolation (or stricter), or with a (possibly expensive or even unacceptable) write lock on the whole table. You can get any behavior you need, but there may be a price to pay.
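A simplified model of the interleaving described above, in plain Python. The helper `reported_keys` and its parameters are hypothetical, and it only covers a concurrent insert that commits while our INSERT waits on it, not the later window between INSERT and SELECT.

```python
from typing import Iterable, Set


def reported_keys(val: Iterable[str], before: Set[str], concurrent: Set[str],
                  separate_select: bool) -> Set[str]:
    """Which keys from `val` come back to the client (simplified model).

    before:          keys committed before our statement starts
    concurrent:      keys another transaction inserted but has not committed;
                     it commits while our INSERT waits on those keys
    separate_select: False = the single-statement CTE query;
                     True  = INSERT ... ON CONFLICT DO NOTHING followed by a
                             separate SELECT in READ COMMITTED, which takes
                             a fresh snapshot
    """
    keys = set(val)
    # The waiting INSERT finds the now-committed conflict and skips it
    inserted = {k for k in keys if k not in before and k not in concurrent}
    if separate_select:
        visible = before | inserted | concurrent  # fresh snapshot sees the commit
        return {k for k in keys if k in visible}
    # The JOIN uses the statement's snapshot, which predates the commit
    return inserted | (keys & before)
```

With `before={"a"}` and `concurrent={"b"}`, the single-statement variant loses "b", while the two-statement variant reports it.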

Related:

  • How to use RETURNING with ON CONFLICT in PostgreSQL?
  • Return rows from INSERT with ON CONFLICT without needing to update

How to use RETURNING with ON CONFLICT in PostgreSQL?

I had exactly the same problem, and I solved it using 'do update' instead of 'do nothing', even though I had nothing to update. In your case it would be something like this:

INSERT INTO chats ("user", "contact", "name") 
VALUES ($1, $2, $3),
($2, $1, NULL)
ON CONFLICT("user", "contact")
DO UPDATE SET
name=EXCLUDED.name
RETURNING id;

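This query will return all the rows, regardless of whether they have just been inserted or existed before. Note that SET name = EXCLUDED.name is not a no-op, though: it overwrites the name of an existing row (for the second row, with NULL).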
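A plain-Python model of what RETURNING id yields under each variant. The helper `upsert_returning` is hypothetical; a dict stands in for chats, and the dict's length stands in for a serial id. It also shows that the DO UPDATE variant overwrites an existing name.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, int]  # ("user", "contact")


def upsert_returning(chats: Dict[Key, dict],
                     rows: List[Tuple[int, int, Optional[str]]],
                     do_update: bool) -> List[int]:
    """Ids that RETURNING id would produce for DO UPDATE vs DO NOTHING."""
    returned = []
    for user, contact, name in rows:
        key = (user, contact)
        if key not in chats:
            chats[key] = {"id": len(chats) + 1, "name": name}  # stand-in serial id
            returned.append(chats[key]["id"])
        elif do_update:
            chats[key]["name"] = name  # SET name = EXCLUDED.name
            returned.append(chats[key]["id"])
        # DO NOTHING: the conflicting row is left alone and not returned
    return returned
```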

Return rows from INSERT with ON CONFLICT without needing to update

It's the recurring problem of SELECT or INSERT, related to (but different from) an UPSERT. The new UPSERT functionality in Postgres 9.5 is still instrumental.

WITH ins AS (
INSERT INTO names(name)
VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO UPDATE
SET name = NULL
WHERE FALSE -- never executed, but locks the row
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM names
WHERE name = 'bob' -- only executed if no INSERT
LIMIT 1;

This way you do not actually write a new row version without need.

I assume you are aware that in Postgres every UPDATE writes a new version of the row due to its MVCC model - even if name is set to the same value as before. This would make the operation more expensive, add to possible concurrency issues / lock contention in certain situations and bloat the table additionally.
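A plain-Python model of the query above. The helper `select_or_insert` is hypothetical; a dict stands in for names, and the `writes` list records row versions written. Repeated calls for the same name return the same id without writing anything.

```python
from typing import Dict, List


def select_or_insert(names: Dict[str, int], name: str, writes: List[str]) -> int:
    """Model of: WITH ins AS (INSERT ... ON CONFLICT ... DO UPDATE ...
    WHERE FALSE RETURNING id) SELECT id FROM ins UNION ALL
    SELECT id FROM names WHERE name = ... LIMIT 1."""
    ins = []
    if name not in names:
        names[name] = len(names) + 1  # stand-in for a serial id
        writes.append(f"insert {name}")
        ins.append(names[name])
    # On conflict, WHERE FALSE skips the UPDATE: no new row version, no id
    if ins:
        return ins[0]      # LIMIT 1 is satisfied by the first branch
    return names[name]     # second branch: plain SELECT of the existing row
```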

However, there is still a tiny corner case for a race condition. Concurrent transactions may have added a conflicting row, which is not yet visible in the same statement. Then INSERT and SELECT come up empty.

Proper solution for single-row UPSERT:

  • Is SELECT or INSERT in a function prone to race conditions?

General solutions for bulk UPSERT:

  • How to use RETURNING with ON CONFLICT in PostgreSQL?

Without concurrent write load

If concurrent writes (from a different session) are not possible, you don't need to lock the row and can simplify:

WITH ins AS (
INSERT INTO names(name)
VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO NOTHING -- no lock needed
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM names
WHERE name = 'bob' -- only executed if no INSERT
LIMIT 1;

query of type INSERT ON CONFLICT DO NOTHING RETURNING returns nothing

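Changing DO NOTHING to a DO UPDATE SET clause that does not modify the stored values gives the wanted results. Setting id_strategy to excluded.id_strategy keeps the value it had, because the conflict was detected on that very column: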

xsignalsbot=# insert into users_strategies (id_strategy,id_account) 
values (1,48) on conflict (id_strategy,id_account) do update set
id_strategy=excluded.id_strategy returning users_strategies.active, users_strategies.risk;

 active | risk
--------+------
 t      | 0.50
(1 row)

PostgreSQL INSERT ON CONFLICT UPDATE (upsert) use all excluded values

Postgres hasn't implemented an equivalent to INSERT OR REPLACE. From the ON CONFLICT docs:

It can be either DO NOTHING, or a DO UPDATE clause specifying the exact details of the UPDATE action to be performed in case of a conflict.

Though it doesn't give you a shorthand for replacement, ON CONFLICT DO UPDATE applies more generally, since it lets you set new values based on preexisting data. For example:

INSERT INTO users (id, level)
VALUES (1, 0)
ON CONFLICT (id) DO UPDATE
SET level = users.level + 1;
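A minimal Python model of this counter upsert, with a made-up `replace_level` for contrast: replace semantics would simply store the proposed row and lose the old level. Both helpers are hypothetical and use a dict in place of the users table.

```python
from typing import Dict


def upsert_level(users: Dict[int, int], user_id: int) -> int:
    """INSERT INTO users (id, level) VALUES (user_id, 0)
    ON CONFLICT (id) DO UPDATE SET level = users.level + 1"""
    if user_id in users:
        users[user_id] += 1  # users.level is the value already stored
    else:
        users[user_id] = 0   # the proposed row from VALUES
    return users[user_id]


def replace_level(users: Dict[int, int], user_id: int) -> int:
    """For contrast: replace semantics always store the proposed row,
    discarding whatever was there."""
    users[user_id] = 0
    return users[user_id]
```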

Use INSERT ... ON CONFLICT DO NOTHING RETURNING failed rows

A bit verbose, but I can't think of anything else:

with all_tags (name) as (
values ('tag10'), ('tag6'), ('tag11')
), inserted (id, name) as (
INSERT INTO tags (name)
select name
from all_tags
ON CONFLICT DO NOTHING
returning id, name
)
select t.id, t.name, 'already there'
from tags t
join all_tags at on at.name = t.name
union all
select id, name, 'inserted'
from inserted;

The outer select from tags sees the snapshot of the table as it was before the new tags were inserted. The third column with the constant is only there to test the query, so that one can identify which rows were inserted and which were not.

How can I return EXCLUDED from WITH in PostgreSQL?

A more brute-force approach should work. Store the rows being inserted in a separate CTE. Then you can remove the ones that are inserted:

WITH to_insert (some_num) as (
VALUES (1), (2), (3)
),
i AS (
INSERT INTO test_table (some_num)
SELECT *
FROM to_insert
ON CONFLICT (some_num) DO NOTHING
RETURNING *
)
SELECT ti.some_num
FROM to_insert ti
WHERE NOT EXISTS (SELECT 1 FROM i WHERE i.some_num = ti.some_num);

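The same idea as a plain-Python model (hypothetical helper `not_inserted`, with a set standing in for test_table). A value repeated inside to_insert is not reported at all, because one copy of it was inserted and therefore appears in i.

```python
from typing import List, Set


def not_inserted(test_table: Set[int], to_insert: List[int]) -> List[int]:
    """Model of the to_insert / i CTEs: report the values that
    ON CONFLICT (some_num) DO NOTHING skipped."""
    inserted = []  # what RETURNING * hands to the CTE i
    for n in to_insert:
        if n not in test_table:
            test_table.add(n)
            inserted.append(n)
    # WHERE NOT EXISTS (SELECT 1 FROM i WHERE i.some_num = ti.some_num)
    return [n for n in to_insert if n not in inserted]
```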

Postgresql EXCLUDE constraint not triggering ON CONFLICT when INSERT

The documentation somewhat tersely remarks:

Note that exclusion constraints are not supported as arbiters with ON CONFLICT DO UPDATE.

Looking at the source code makes the case clearer:

  • You can never use an exclusion constraint with ON CONFLICT DO UPDATE.

  • You can, however, use

    ON CONFLICT ON CONSTRAINT price_history_item_id_valid_time_excl DO NOTHING

    That is, you can use a named exclusion constraint with DO NOTHING.

  • There is no “constraint inference” with exclusion constraints, i.e., even in the DO NOTHING case you cannot just specify the indexed expressions in parentheses and have PostgreSQL find the corresponding exclusion constraint.

How can I get the INSERTED and UPDATED rows for an UPSERT operation in postgres

If you add a boolean updated column to the people table:

ALTER TABLE people ADD COLUMN updated bool DEFAULT FALSE;

then you could identify updated rows by setting updated = TRUE in the DO UPDATE SET clause:

INSERT INTO people (SELECT * FROM people_update)
ON CONFLICT (name,surname)
DO UPDATE SET age = EXCLUDED.age , street = EXCLUDED.street , city = EXCLUDED.city
, postal = EXCLUDED.postal
, updated = TRUE
WHERE
(people.age,people.street,people.city,people.postal) IS DISTINCT FROM
(EXCLUDED.age,EXCLUDED.street,EXCLUDED.city,EXCLUDED.postal)
RETURNING *;
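The WHERE clause relies on IS DISTINCT FROM rather than <>, so rows whose values are unchanged are neither updated nor returned, while a change from or to NULL still counts. A plain-Python sketch of the difference (hypothetical helpers, with None standing in for NULL): with <>, a change from NULL to a value yields NULL, which WHERE treats as false, so that update would be skipped.

```python
from typing import Any, Optional, Sequence


def is_distinct_from(a: Sequence[Any], b: Sequence[Any]) -> bool:
    """Row-wise IS DISTINCT FROM; None plays the role of NULL and two
    NULLs count as equal, so the result is always True or False."""
    return any(x != y for x, y in zip(a, b))


def sql_row_ne(a: Sequence[Any], b: Sequence[Any]) -> Optional[bool]:
    """Row-wise <> under SQL three-valued logic: True if some non-NULL
    pair differs, otherwise NULL (None) if any pair involves NULL."""
    saw_null = False
    for x, y in zip(a, b):
        if x is None or y is None:
            saw_null = True
        elif x != y:
            return True
    return None if saw_null else False
```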

For example,

CREATE TABLE people (
name text
, surname text
, age float
, street text
, city text
, postal int
);
CREATE UNIQUE INDEX people_idx on people (name, surname);
ALTER TABLE people ADD COLUMN updated bool DEFAULT FALSE;
ALTER TABLE people ADD COLUMN prior_age float;
ALTER TABLE people ADD COLUMN prior_street text;
ALTER TABLE people ADD COLUMN prior_city text;
ALTER TABLE people ADD COLUMN prior_postal int;

INSERT INTO people (name, surname, age, street, city, postal) VALUES
('Sancho', 'Panza', 414, '1 Manchego', 'Barcelona', 01605)
, ('Oliver', 'Twist', 182, '2 Stilton', 'London', 01837)
, ('Quasi', 'Modo', 188, $$3 Rue d'Arcole$$, 'Paris' , 01831 )
;

CREATE TABLE people_update (
name text
, surname text
, age float
, street text
, city text
, postal int
);

INSERT INTO people_update (name, surname, age, street, city, postal) VALUES
('Sancho', 'Panza', 4140, '10 Idiazabal', 'Montserrat', 16050)
, ('Quasi', 'Modo', 1880, $$30 Champs Elysée$$ , 'Paris', 18310 )
, ('Pinocchio', 'Geppetto', 1380, '40 Nerbone', 'Florence', 18810)
;

INSERT INTO people (SELECT * FROM people_update)
ON CONFLICT (name,surname)
DO UPDATE SET
updated = TRUE
, prior_age = (CASE WHEN people.age = EXCLUDED.age THEN NULL ELSE people.age END)
, prior_street = (CASE WHEN people.street = EXCLUDED.street THEN NULL ELSE people.street END)
, prior_city = (CASE WHEN people.city = EXCLUDED.city THEN NULL ELSE people.city END)
, prior_postal = (CASE WHEN people.postal = EXCLUDED.postal THEN NULL ELSE people.postal END)
, age = EXCLUDED.age
, street = EXCLUDED.street
, city = EXCLUDED.city
, postal = EXCLUDED.postal
WHERE
(people.age,people.street,people.city,people.postal) IS DISTINCT FROM
(EXCLUDED.age,EXCLUDED.street,EXCLUDED.city,EXCLUDED.postal)
RETURNING *;

yields

| name      | surname  |  age | street           | city       | postal | updated | prior_age | prior_street   | prior_city | prior_postal |
|-----------+----------+------+------------------+------------+--------+---------+-----------+----------------+------------+--------------|
| Sancho    | Panza    | 4140 | 10 Idiazabal     | Montserrat |  16050 | t       |       414 | 1 Manchego     | Barcelona  |         1605 |
| Quasi     | Modo     | 1880 | 30 Champs Elysée | Paris      |  18310 | t       |       188 | 3 Rue d'Arcole |            |         1831 |
| Pinocchio | Geppetto | 1380 | 40 Nerbone       | Florence   |  18810 | f       |           |                |            |              |

The updated column shows that the ('Sancho', 'Panza') and ('Quasi', 'Modo') rows have been updated, while ('Pinocchio', 'Geppetto') is a new insert. ('Oliver', 'Twist') has no match in people_update, so it is left alone and not returned.
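As a cross-check of the output table, here is a plain-Python model of the second statement. The function and variable names are hypothetical, dicts stand in for the tables, and the float ages are written as integers. Running the same batch a second time returns nothing, because the WHERE clause finds no differences.

```python
from typing import Any, Dict, List, Optional, Tuple

COLS = ("age", "street", "city", "postal")


def prior(old: Any, new: Any) -> Optional[Any]:
    """CASE WHEN old = new THEN NULL ELSE old END. If either side is NULL,
    old = new yields NULL, so the ELSE branch returns old."""
    if old is not None and new is not None and old == new:
        return None
    return old


def upsert_people(people: Dict[Tuple[str, str], dict],
                  updates: List[dict]) -> List[dict]:
    """Rows that RETURNING * would produce for the statement above."""
    returned = []
    for new in updates:
        key = (new["name"], new["surname"])
        old = people.get(key)
        if old is None:  # no conflict: plain insert, updated takes its default
            row = dict(new, updated=False, **{f"prior_{c}": None for c in COLS})
        elif any(old[c] != new[c] for c in COLS):  # IS DISTINCT FROM
            row = dict(old, updated=True)
            for c in COLS:  # every SET expression reads the pre-update row
                row[f"prior_{c}"] = prior(old[c], new[c])
                row[c] = new[c]
        else:
            continue  # WHERE clause false: not updated, not returned
        people[key] = row
        returned.append(row)
    return returned


blank = {"updated": False, "prior_age": None, "prior_street": None,
         "prior_city": None, "prior_postal": None}
people = {
    ("Sancho", "Panza"): dict(blank, name="Sancho", surname="Panza", age=414,
                              street="1 Manchego", city="Barcelona", postal=1605),
    ("Oliver", "Twist"): dict(blank, name="Oliver", surname="Twist", age=182,
                              street="2 Stilton", city="London", postal=1837),
    ("Quasi", "Modo"): dict(blank, name="Quasi", surname="Modo", age=188,
                            street="3 Rue d'Arcole", city="Paris", postal=1831),
}
people_update = [
    dict(name="Sancho", surname="Panza", age=4140, street="10 Idiazabal",
         city="Montserrat", postal=16050),
    dict(name="Quasi", surname="Modo", age=1880, street="30 Champs Elysée",
         city="Paris", postal=18310),
    dict(name="Pinocchio", surname="Geppetto", age=1380, street="40 Nerbone",
         city="Florence", postal=18810),
]
returned = upsert_people(people, people_update)
```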


