Insert or Select Strategy to Always Return a Row

INSERT or SELECT strategy to always return a row?

Your observation seems impossible. The above command should always return an id, either for the newly inserted row or for the pre-existing row. Concurrent writes cannot mess with this since existing conflicting rows are locked. Explanation in this related answer:

  • How to use RETURNING with ON CONFLICT in PostgreSQL?

Unless an exception is raised, of course. You get an error message instead of a result in that case. Did you check that? Do you have error-handling in place? (In case your app somehow discards error messages: 1) Fix that. 2) There is an additional entry in the DB log with default logging settings.)

I do see a FK constraint in your table definition:

prop_type text not null references prop_type(name),

If you try to insert a row that violates the constraint, that's exactly what happens. If there is no row with name = 'jargon' in table prop_type, that's what you get:

ERROR:  insert or update on table "prop" violates foreign key constraint "prop_prop_type_fkey"
DETAIL: Key (prop_type)=(jargon) is not present in table "prop_type".

Demo:

dbfiddle here

Your observation would fit the crime:

If I change prop_type = 'jargon' to prop_type = 'foo' it works!

But your explanation is based on misconceptions:

It would seem the lock isn't taken if the expression wouldn't change anything even given the where false clause.

That's not how Postgres works. The lock is taken either way (explanation in above linked answer), and the Postgres locking mechanism never even considers how the new row compares to the old.

Does this really need to depend on my guessing a value that wouldn't be in the row, though? Or is there a better way to ensure you get the lock?

No. And no.

If missing FK values are indeed the problem, you might add missing (distinct) values in a single statement with rCTEs. Simple for single-row inserts like you demonstrate, but works for inserting many rows at once, too. Related:

  • How do I insert a row which contains a foreign key?
  • INSERT rows into multiple tables in a single query, selecting from an involved table
  • Can INSERT [...] ON CONFLICT be used for foreign key violations?

INSERTING and RETURNING the ID of a new or existing record, need a better strategy in Postgres 12.5

That just how on conflict works. When no row is inserted, nothing is returned.

A workaround is to rephrase the logic with a CTE: first insert (or do nothing), then select again from the table, using the input data for filtering.

with 
data as (
select *
from (values
('hello','world','passthrough.hello.world')
) v(schema_name, target_name, target_path)
),
ins as (
insert into pgconfig_target (schema_name, target_name, target_path)
select *
from data
on conflict (schema_name, target_name, target_path) do nothing
)
select c.id
from pgconfig_target c
inner join data d using (schema_name, target_name, target_path)

Return rows from INSERT with ON CONFLICT without needing to update

It's the recurring problem of SELECT or INSERT, related to (but different from) an UPSERT. The new UPSERT functionality in Postgres 9.5 is still instrumental.

WITH ins AS (
INSERT INTO names(name)
VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO UPDATE
SET name = NULL
WHERE FALSE -- never executed, but locks the row
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM names
WHERE name = 'bob' -- only executed if no INSERT
LIMIT 1;

This way you do not actually write a new row version without need.

I assume you are aware that in Postgres every UPDATE writes a new version of the row due to its MVCC model - even if name is set to the same value as before. This would make the operation more expensive, add to possible concurrency issues / lock contention in certain situations and bloat the table additionally.

However, there is still a tiny corner case for a race condition. Concurrent transactions may have added a conflicting row, which is not yet visible in the same statement. Then INSERT and SELECT come up empty.

Proper solution for single-row UPSERT:

  • Is SELECT or INSERT in a function prone to race conditions?

General solutions for bulk UPSERT:

  • How to use RETURNING with ON CONFLICT in PostgreSQL?

Without concurrent write load

If concurrent writes (from a different session) are not possible you don't need to lock the row and can simplify:

WITH ins AS (
INSERT INTO names(name)
VALUES ('bob')
ON CONFLICT ON CONSTRAINT names_name_key DO NOTHING -- no lock needed
RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM names
WHERE name = 'bob' -- only executed if no INSERT
LIMIT 1;

INSERT rows into multiple tables in a single query, selecting from an involved table

Final version

... after some more info from OP. Consider this demo:

-- DROP TABLE foo; DROP TABLE bar;

CREATE TEMP TABLE bar (
id serial PRIMARY KEY -- using a serial column!
,z integer NOT NULL
);

CREATE TEMP TABLE foo (
id serial PRIMARY KEY -- using a serial column!
,x integer NOT NULL
,y integer NOT NULL
,bar_id integer UNIQUE NOT NULL REFERENCES bar(id)
);

Insert values - bar first.

It would be very helpful if you provided test data in your question like this!

INSERT INTO bar (id,z) VALUES
(100, 7)
,(101,16)
,(102,21);

INSERT INTO foo (id, x, y, bar_id) VALUES
(1, 3,4,100)
,(2, 9,6,101)
,(3,18,0,102);

Set sequences to current values or we get duplicate key violations:

SELECT setval('foo_id_seq', 3);
SELECT setval('bar_id_seq', 102);

Checks:

-- SELECT nextval('foo_id_seq')
-- SELECT nextval('bar_id_seq')
-- SELECT * from bar;
-- SELECT * from foo;

Query:

WITH a AS (
SELECT f.x, f.y, bar_id, b.z
FROM foo f
JOIN bar b ON b.id = f.bar_id
WHERE x > 3
),b AS (
INSERT INTO bar (z)
SELECT z
FROM a
RETURNING z, id AS bar_id
)
INSERT INTO foo (x, y, bar_id)
SELECT a.x, a.y, b.bar_id
FROM a
JOIN b USING (z);

This should do what your last update describes.

The query assumes that z is UNIQUE. If z is not unique, it gets more complex. Refer to Query 2 in this related answer for a ready solution using the window function row_number() in this case.

Also, consider replacing the 1:1 relation between foo and bar with a single united table.



Data modifying CTE

Second answer after more info.

If you want to add rows to foo and bar in a single query, you can use a data modifying CTE since PostgreSQL 9.1:

WITH x AS (
INSERT INTO bar (col1, col2)
SELECT f.col1, f.col2
FROM foo f
WHERE f.id BETWEEN 12 AND 23 -- some filter
RETURNING col1, col2, bar_id -- assuming bar_id is a serial column
)
INSERT INTO foo (col1, col2, bar_id)
SELECT col1, col2, bar_id
FROM x;

I draw values from foo, insert them in bar, have them returned together with an auto-generated bar_id and insert that into foo. You can use any other data, too.

Here is a working demo to play with on sqlfiddle.



Basics

Original answer with basic information before clarifications.

The basic form is:

INSERT INTO foo (...)
SELECT ... FROM foo WHERE ...

No parenthesis needed.
You can do the same with any table

INSERT INTO foo (...)
SELECT ... FROM bar WHERE ...

And you can join to the table you insert into in the SELECT:

INSERT INTO foo (...)
SELECT f.col1, f.col2, .. , b.bar_id
FROM foo f
JOIN bar b USING (foo_id); -- present in foo and bar

It's just a SELECT like any other - that can include the table you are inserting into. The rows are first read, and then inserted.

Only inserting a row if it's not already there

What about the "JFDI" pattern?

BEGIN TRY
INSERT etc
END TRY
BEGIN CATCH
IF ERROR_NUMBER() <> 2627
RAISERROR etc
END CATCH

Seriously, this is quickest and the most concurrent without locks, especially at high volumes.
What if the UPDLOCK is escalated and the whole table is locked?

Read lesson 4:

Lesson 4: When developing the upsert proc prior to tuning the indexes, I first trusted that the If Exists(Select…) line would fire for any item and would prohibit duplicates. Nada. In a short time there were thousands of duplicates because the same item would hit the upsert at the same millisecond and both transactions would see a not exists and perform the insert. After much testing the solution was to use the unique index, catch the error, and retry allowing the transaction to see the row and perform an update instead an insert.

Solutions for INSERT OR UPDATE on SQL Server

don't forget about transactions. Performance is good, but simple (IF EXISTS..) approach is very dangerous.

When multiple threads will try to perform Insert-or-update you can easily
get primary key violation.

Solutions provided by @Beau Crawford & @Esteban show general idea but error-prone.

To avoid deadlocks and PK violations you can use something like this:

begin tran
if exists (select * from table with (updlock,serializable) where key = @key)
begin
update table set ...
where key = @key
end
else
begin
insert into table (key, ...)
values (@key, ...)
end
commit tran

or

begin tran
update table with (serializable) set ...
where key = @key

if @@rowcount = 0
begin
insert into table (key, ...) values (@key,..)
end
commit tran


Related Topics



Leave a reply



Submit