Is SELECT or INSERT in a function prone to race conditions?
It's the recurring problem of SELECT or INSERT under possible concurrent write load, related to (but different from) UPSERT (which is INSERT or UPDATE).
This PL/pgSQL function uses INSERT ... ON CONFLICT ... DO NOTHING to INSERT or SELECT a single row:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   SELECT tag_id  -- only if row existed before
   FROM   tag
   WHERE  tag = _tag
   INTO   _tag_id;

   IF NOT FOUND THEN
      INSERT INTO tag AS t (tag)
      VALUES (_tag)
      ON     CONFLICT (tag) DO NOTHING
      RETURNING t.tag_id
      INTO   _tag_id;
   END IF;
END
$func$;
There is still a tiny window for a race condition. To make absolutely sure we get an ID:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      SELECT tag_id
      FROM   tag
      WHERE  tag = _tag
      INTO   _tag_id;

      EXIT WHEN FOUND;

      INSERT INTO tag AS t (tag)
      VALUES (_tag)
      ON     CONFLICT (tag) DO NOTHING
      RETURNING t.tag_id
      INTO   _tag_id;

      EXIT WHEN FOUND;
   END LOOP;
END
$func$;
db<>fiddle here
This keeps looping until either INSERT or SELECT succeeds.
Call:
SELECT f_tag_id('possibly_new_tag');
If subsequent commands in the same transaction rely on the existence of the row, and it is actually possible that other transactions update or delete it concurrently, you can lock an existing row in the SELECT statement with FOR SHARE.
If the row gets inserted instead, it is locked (or not visible for other transactions) until the end of the transaction anyway.
Start with the common case (INSERT vs. SELECT) to make it faster.
Related:
- Get Id from a conditional INSERT
- How to include excluded rows in RETURNING from INSERT ... ON CONFLICT
Related (pure SQL) solution to INSERT or SELECT multiple rows (a set) at once:
- How to use RETURNING with ON CONFLICT in PostgreSQL?
What's wrong with this pure SQL solution?
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
  LANGUAGE sql AS
$func$
WITH ins AS (
   INSERT INTO tag AS t (tag)
   VALUES (_tag)
   ON     CONFLICT (tag) DO NOTHING
   RETURNING t.tag_id
   )
SELECT tag_id FROM ins
UNION  ALL
SELECT tag_id FROM tag WHERE tag = _tag
LIMIT  1;
$func$;
Not entirely wrong, but it fails to seal a loophole, as @FunctorSalad worked out. The function can come up with an empty result if a concurrent transaction tries to do the same at the same time. The manual:
All the statements are executed with the same snapshot
If a concurrent transaction inserts the same new tag a moment earlier, but hasn't committed, yet:
The UPSERT part comes up empty, after waiting for the concurrent transaction to finish. (If the concurrent transaction should roll back, it still inserts the new tag and returns a new ID.)
The SELECT part also comes up empty, because it's based on the same snapshot, where the new tag from the (yet uncommitted) concurrent transaction is not visible.
We get nothing. Not as intended. That's counter-intuitive to naive logic (and I got caught there), but that's how the MVCC model of Postgres works - has to work.
So do not use this if multiple transactions can try to insert the same tag at the same time. Or loop until you actually get a row. The loop will hardly ever be triggered in common workloads anyway.
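The select-or-insert loop is not Postgres-specific; it can also be driven from the client. Here is a minimal sketch in Python, using sqlite3 purely as a self-contained stand-in (INSERT OR IGNORE plays the role of ON CONFLICT (tag) DO NOTHING; the schema mirrors the tag table above):

```python
import sqlite3

def f_tag_id(conn, tag):
    """Return tag_id for tag, inserting the row if needed.

    Loops until either the SELECT or the INSERT yields a row,
    mirroring the PL/pgSQL function above.
    """
    while True:
        row = conn.execute(
            "SELECT tag_id FROM tag WHERE tag = ?", (tag,)
        ).fetchone()
        if row:                      # row existed before
            return row[0]
        cur = conn.execute(          # like ON CONFLICT (tag) DO NOTHING
            "INSERT OR IGNORE INTO tag(tag) VALUES (?)", (tag,)
        )
        if cur.rowcount == 1:        # we inserted it; fetch the new id
            return cur.lastrowid
        # another writer won the insert but its row was not visible
        # to our SELECT -> loop and try the SELECT again

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, tag TEXT UNIQUE)")
a = f_tag_id(conn, "possibly_new_tag")
b = f_tag_id(conn, "possibly_new_tag")
print(a == b)  # True: second call selects the existing row
```

As in the PL/pgSQL version, the loop body only repeats in the rare window where a concurrent writer wins the insert before its row becomes visible.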
Postgres 9.4 or older
Given this (slightly simplified) table:
CREATE TABLE tag (
  tag_id serial PRIMARY KEY
, tag    text   UNIQUE
);
An almost 100% secure function to insert a new tag / select an existing one could look like this:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT tag_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      BEGIN
         WITH sel AS (SELECT t.tag_id FROM tag t WHERE t.tag = _tag FOR SHARE)
            , ins AS (INSERT INTO tag(tag)
                      SELECT _tag
                      WHERE  NOT EXISTS (SELECT 1 FROM sel)  -- only if not found
                      RETURNING tag.tag_id)                  -- qualified so no conflict with param
         SELECT sel.tag_id FROM sel
         UNION  ALL
         SELECT ins.tag_id FROM ins
         INTO   tag_id;

      EXCEPTION WHEN UNIQUE_VIOLATION THEN     -- insert in concurrent session?
         RAISE NOTICE 'It actually happened!'; -- hardly ever happens
      END;

      EXIT WHEN tag_id IS NOT NULL;  -- else keep looping
   END LOOP;
END
$func$;
db<>fiddle here
Old sqlfiddle
Why not 100%? Consider the notes in the manual for the related UPSERT example:
- https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE
Explanation
- Try the SELECT first. This way you avoid the considerably more expensive exception handling 99.99% of the time.
- Use a CTE to minimize the (already tiny) time slot for the race condition.
- The time window between the SELECT and the INSERT within one query is super tiny. If you don't have heavy concurrent load, or if you can live with an exception once a year, you could just ignore the case and use the SQL statement, which is faster.
- No need for FETCH FIRST ROW ONLY (= LIMIT 1). The tag name is obviously UNIQUE.
- Remove FOR SHARE in my example if you don't usually have concurrent DELETE or UPDATE on the table tag. Costs a tiny bit of performance.
- Never quote the language name: 'plpgsql'. plpgsql is an identifier. Quoting may cause problems and is only tolerated for backwards compatibility.
- Don't use non-descriptive column names like id or name. When joining a couple of tables (which is what you do in a relational DB) you end up with multiple identical names and have to use aliases.
Built into your function
Using this function you could largely simplify your FOREACH LOOP to:
...
FOREACH TagName IN ARRAY $3
LOOP
   INSERT INTO taggings (PostId, TagId)
   VALUES (InsertedPostId, f_tag_id(TagName));
END LOOP;
...
Faster, though, as a single SQL statement with unnest():
INSERT INTO taggings (PostId, TagId)
SELECT InsertedPostId, f_tag_id(tag)
FROM unnest($3) tag;
Replaces the whole loop.
Alternative solution
This variant builds on the behavior of UNION ALL with a LIMIT clause: as soon as enough rows are found, the rest is never executed:
- Way to try multiple SELECTs till a result is available?
Building on this, we can outsource the INSERT into a separate function. Only there do we need exception handling. Just as safe as the first solution.
CREATE OR REPLACE FUNCTION f_insert_tag(_tag text, OUT tag_id int)
  RETURNS int
  LANGUAGE plpgsql AS
$func$
BEGIN
   INSERT INTO tag(tag) VALUES (_tag) RETURNING tag.tag_id INTO tag_id;
EXCEPTION WHEN UNIQUE_VIOLATION THEN
   NULL;  -- catch exception, do nothing: NULL is returned
END
$func$;
Which is used in the main function:
CREATE OR REPLACE FUNCTION f_tag_id(_tag text, OUT _tag_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      SELECT tag_id FROM tag WHERE tag = _tag
      UNION  ALL
      SELECT f_insert_tag(_tag)  -- only executed if tag not found
      LIMIT  1                   -- not strictly necessary, just to be clear
      INTO   _tag_id;

      EXIT WHEN _tag_id IS NOT NULL;  -- else keep looping
   END LOOP;
END
$func$;
- This is a bit cheaper if most of the calls only need the SELECT, because the more expensive block with INSERT containing the EXCEPTION clause is rarely entered. The query is also simpler.
- FOR SHARE is not possible here (not allowed in a UNION query).
- LIMIT 1 would not be necessary (tested in pg 9.4). Postgres derives LIMIT 1 from INTO _tag_id and only executes until the first row is found.
Can an INSERT-SELECT query be subject to race conditions?
Yes, it will fail with a duplicate key value violates unique constraint error. What I do is place the insertion code in a try/except block; when the exception is thrown, I catch it and retry. That simple. Unless the application has a huge number of users, it will work flawlessly.
In your query the default isolation level is enough since it is a single insert statement and there is no risk of phantom reads.
Notice that even when setting the isolation level to serializable the try/except block is not avoidable. From the manual about serializable:
like the Repeatable Read level, applications using this level must be prepared to retry transactions due to serialization failures
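The catch-and-retry idea can be sketched in Python. sqlite3's IntegrityError stands in here for the driver's unique-violation error (with psycopg2 it would be psycopg2.errors.UniqueViolation); the tag table is just an illustrative schema:

```python
import sqlite3

def insert_or_select(conn, tag):
    """INSERT the row; on a unique violation, fall back to a SELECT."""
    try:
        cur = conn.execute("INSERT INTO tag(tag) VALUES (?)", (tag,))
        return cur.lastrowid             # our insert won
    except sqlite3.IntegrityError:       # unique constraint violated
        # someone inserted the same tag first: retry as a plain SELECT
        row = conn.execute(
            "SELECT tag_id FROM tag WHERE tag = ?", (tag,)
        ).fetchone()
        return row[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, tag TEXT UNIQUE)")
first = insert_or_select(conn, "some_tag")
second = insert_or_select(conn, "some_tag")  # hits the except branch
print(first == second)  # True
```

In a real multi-session setup the except branch would also need to loop (the competing row may be deleted again before our SELECT), which is exactly what the PL/pgSQL functions above do.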
Is (insert into ... select) statement safe about race conditions - SQL Server?
Look at the locking on the table(s), and the hints used. Based on the information provided, I expect this to be open to race conditions (i.e. a lock is not maintained on the table). You can look at setting up locks with SQL hints, or transactions. There are a few posts out there that discuss locking. An example of one that looks pertinent to your question is:
SQL Server Table Lock during bulk insert
Without a continuous lock on the table(s) affected, from the SELECT through the INSERT, yes, you are open to a race condition.
NOTE - locking the table for extended periods will impact performance to varying degrees depending on the volume of traffic and the time locked (if a lot of users are trying to book at the same time, a 1-second difference can become significant). Therefore, if you do lock the table, I recommend spending some time tuning your queries for performance. I won't say much on this as I do not have any metrics on your data, or know your indexes. Probably best to start by simply restructuring your queries to hit the most restrictive table first. I assume this would be the table holding alloted_room_id (pass the ID so it can be used in the WHERE clause).
Race Condition between SELECT and INSERT for multiple columns
The wrench in the works: instead of SELECT f_insert_tag(tag_p_id, _tag), use SELECT * FROM f_insert_tag(tag_p_id, _tag).
For Postgres 9.4
CREATE FUNCTION f_insert_tag(_tag_id int, _tag text, OUT _tag_id_ int, OUT _tag_ text) AS
$func$
BEGIN
   INSERT INTO t(tag_id, tag)
   VALUES (_tag_id, _tag)
   RETURNING t.tag_id, t.tag
   INTO   _tag_id_, _tag_;
EXCEPTION WHEN UNIQUE_VIOLATION THEN
   NULL;  -- catch exception, return NULL
END
$func$ LANGUAGE plpgsql;
CREATE FUNCTION f_tag_id(_tag_id int, _tag text, OUT _tag_id_ int, OUT _tag_ text) AS
$func$
BEGIN
   LOOP
      SELECT t.tag_id, t.tag
      FROM   t
      WHERE  t.tag = _tag
      UNION  ALL
      SELECT *  -- !!!
      FROM   f_insert_tag(_tag_id, _tag)
      LIMIT  1
      INTO   _tag_id_, _tag_;

      EXIT WHEN _tag_id_ IS NOT NULL;  -- else keep looping
   END LOOP;
END
$func$ LANGUAGE plpgsql;
db<>fiddle here
For Postgres 9.5 or later:
CREATE FUNCTION f_tag_id(_tag_id int, _tag text, OUT _tag_id_ int, OUT _tag_ text) AS
$func$
BEGIN
   LOOP
      SELECT t.tag_id, t.tag
      FROM   t
      WHERE  t.tag = _tag
      INTO   _tag_id_, _tag_;

      EXIT WHEN FOUND;

      INSERT INTO t (tag_id, tag)
      VALUES (_tag_id, _tag)
      ON     CONFLICT (tag) DO NOTHING
      RETURNING t.tag_id, t.tag
      INTO   _tag_id_, _tag_;

      EXIT WHEN FOUND;
   END LOOP;
END
$func$ LANGUAGE plpgsql;
db<>fiddle here
Basics here:
- Is SELECT or INSERT in a function prone to race conditions?
Race conditions between INSERT ON CONFLICT DO NOTHING and SELECT
To make absolutely sure that the single row in the first table is there, and it's ID returned, you could create a function like outlined here:
- Is SELECT or INSERT in a function prone to race conditions?
To make sure the row also stays there for the duration of the transaction, just make sure it's locked. If you INSERT the row, it's locked anyway. If you SELECT an existing id, you have to lock it explicitly - just like you suggested. FOR KEY SHARE is strong enough for our purpose as long as there is a (non-partial, non-functional) UNIQUE index on (scope, name), which is safe to assume given your ON CONFLICT clause.
CREATE OR REPLACE FUNCTION f_object_id(_scope text, _name text, OUT _object_id int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   LOOP
      SELECT id FROM object
      WHERE  scope = $1
      AND    name  = $2
      -- lock to prevent deletion in the tiny time frame before the next INSERT
      FOR KEY SHARE
      INTO   _object_id;

      EXIT WHEN FOUND;

      INSERT INTO object AS o (scope, name)
      VALUES ($1, $2)
      ON     CONFLICT (scope, name) DO NOTHING
      RETURNING o.id
      INTO   _object_id;

      EXIT WHEN FOUND;
   END LOOP;
END
$func$;
You really only need to lock the row if it's conceivable that a concurrent transaction might DELETE it (you don't UPDATE) in the tiny time frame between the SELECT and the next INSERT statement.

Also, if you have a FOREIGN KEY constraint from object_member.object_id to object.id (which seems likely), referential integrity is guaranteed anyway. If you don't add the explicit lock, and the row is deleted in between, you get a foreign key violation, and the INSERT to object_member is cancelled, along with the whole transaction. Else, the other transaction with the DELETE has to wait until your transaction is done, and is then cancelled by the same FK constraint since dependent rows are now there (unless it's defined to CASCADE ...). So by locking (or not) you can decide whether to prevent the DELETE or the INSERT in this scenario.
Then your call burns down to just:
query(
  `WITH o(id) AS (SELECT f_object_id($1, $2))
   INSERT INTO object_member (object_id, key, value)
   SELECT o.id, UNNEST($3::text[]), UNNEST($4::int[])
   FROM   o;`,
  [$scope, $name, $keys, $values]
)
Since you obviously insert multiple rows into object_member, I moved f_object_id($1, $2) to a CTE to avoid repeated execution - which would work, but would be pointlessly expensive.
In Postgres 12 or later I would make that explicit by adding MATERIALIZED (since the INSERT is hidden in a function):
WITH o(id) AS MATERIALIZED (SELECT f_object_id($1, $2)) ...
Aside: for the multiple unnest() in the SELECT list, make sure you are on Postgres 10 or later. See:
- What is the expected behaviour for multiple set-returning functions in SELECT clause?
Matters of detail
Will it make any difference (apart from execution time) to do this in the application logic with multiple queries in the same transaction?
Basically no. The only difference is performance. Well, and short code and reliability. It's objectively more error prone to go back and forth between db and client for each loop. But unless you have extremely competitive transactions, you would hardly ever be looping anyway.
The other consideration is this: the matter is tricky, and most developers do not understand it. Encapsulated in a server-side function, it's less likely to be broken by the next application programmer (or yourself). You have to make sure that it's actually used, too. Either way, properly document the reasons you are doing it one way or another ...
I really wonder whether my second snippet is safe, or why not (given the quote about visibility in the SELECT after the INSERT).
Mostly safe, but not absolutely. While the next separate SELECT will see (now committed) rows of a transaction competing with the previous UPSERT, there is nothing to keep a third transaction from deleting it again in the meantime. The row has not been locked, you have no way to do that while it's not visible to you, and there is no generic predicate locking available in Postgres.
Consider this (T1, T2, T3 are concurrent transactions):

T2: BEGIN transaction
T1: BEGIN transaction
T2: INSERT object 666
T1: UPSERT object 666
    unique violation?
    -> wait for T2
T2: COMMIT
T1: unique violation -> NO ACTION
    finish statement
    can't return invisible object 666
T3: DELETE object 666 & COMMIT
T1: SELECT object 666 -> no row!
    BOOM!
Typically it's extremely unlikely that it ever happens.
But it's possible. Hence the loop.
The other option is SERIALIZABLE transaction isolation. Typically more expensive, and you need to prepare for serialization failures. Catch-22.
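Preparing for serialization failures usually means wrapping the whole transaction in a retry loop. A schematic Python sketch - the exception class and the transaction body are placeholders; with psycopg2 you would catch a serialization failure (SQLSTATE 40001) and issue ROLLBACK before retrying:

```python
class SerializationFailure(Exception):
    """Stand-in for the driver's serialization-failure error (SQLSTATE 40001)."""

def with_retry(txn, max_attempts=5):
    """Run txn(); on a serialization failure, try again (up to max_attempts)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # real code would ROLLBACK here, and perhaps sleep with backoff

# demo: a "transaction" that fails twice before succeeding
attempts = {"n": 0}
def txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure
    return "committed"

print(with_retry(txn))  # committed
```

The wrapper must cover the whole transaction, not a single statement: after a serialization failure the transaction is aborted and everything has to be replayed.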
How can I deal with a race condition in PostgreSQL?
Do it all in a single command:
result= """
INSERT INTO shortened_link ( url_shortened ...
SELECT %(url)s
where not exists (
select 1
from shortened_link
WHERE url_shortened = %(url)s
);"""
It will only insert if that link does not exist.
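To see the single-command pattern in action in a self-contained way, here is a sketch using sqlite3 as a stand-in (the shortened_link schema is assumed from the snippet above; note that the earlier caveats about concurrent snapshots in Postgres still apply):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shortened_link (id INTEGER PRIMARY KEY, url_shortened TEXT UNIQUE)"
)

def insert_if_absent(conn, url):
    # single statement: inserts only when no row with this url exists yet
    conn.execute(
        """INSERT INTO shortened_link (url_shortened)
           SELECT :url
           WHERE NOT EXISTS (
              SELECT 1 FROM shortened_link WHERE url_shortened = :url
           )""",
        {"url": url},
    )

insert_if_absent(conn, "https://example.com/a")
insert_if_absent(conn, "https://example.com/a")  # second call inserts nothing
n = conn.execute("SELECT COUNT(*) FROM shortened_link").fetchone()[0]
print(n)  # 1
```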
Insert row if not exists leads to race condition?
There is an unavoidable "race" here, since two sessions cannot "see" each other's uncommitted rows. On a conflict, a session can only roll back (maybe to a savepoint) and retry. That would typically mean: referring to the other's freshly inserted row, instead of creating a private duplicate.
There is a data-modelling problem here: feed_channel appears to have a lot of candidate keys, and the cascading rule from feed_content could orphan a lot of feed_content's rows (I suppose content -> channel is a 1::M relation; more than one content row could refer to the same channel).
Finally, the feed_channel table at least needs the natural key {link,title}. That is what the insert/not-exists is all about (and the whole purpose of this function).
I cleaned up the function a bit. The IF construct is not needed; doing an INSERT ... WHERE NOT EXISTS first works just as well, and maybe even better.
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE feed_channel
( id SERIAL PRIMARY KEY
, name TEXT
, link TEXT NOT NULL
, title TEXT NOT NULL -- part of PK :: must be not nullable
, CONSTRAINT feed_channel_nat UNIQUE (link,title) -- the natural key
);
CREATE TABLE feed_content
( id SERIAL PRIMARY KEY
, channel INTEGER REFERENCES feed_channel(id) ON DELETE CASCADE ON UPDATE CASCADE
, guid TEXT UNIQUE NOT NULL -- yet another primary key
, title TEXT --
, link TEXT -- title && link appear to be yet another candidate key
, description TEXT
, pubdate TIMESTAMP
);
-- NOTE: omitted original function channel_insert() for brevity
CREATE OR REPLACE FUNCTION channel_insert_wp(
   p_link  feed_channel.link%TYPE,
   p_title feed_channel.title%TYPE
) RETURNS feed_channel.id%TYPE AS $body$
DECLARE
   v_id feed_channel.id%TYPE;
BEGIN
   INSERT INTO feed_channel(link, title)
   SELECT p_link, p_title
   WHERE  NOT EXISTS ( SELECT *
                       FROM   feed_channel nx
                       WHERE  nx.link  = p_link
                       AND    nx.title = p_title
                     );

   SELECT id INTO v_id
   FROM   feed_channel ex
   WHERE  ex.link  = p_link
   AND    ex.title = p_title;

   RETURN v_id;
END;
$body$ LANGUAGE plpgsql;
SELECT channel_insert('Bogus_link', 'Bogus_title');
SELECT channel_insert_wp('Bogus_link2', 'Bogus_title2');
SELECT * FROM feed_channel;
Results:
DROP SCHEMA
CREATE SCHEMA
SET
NOTICE: CREATE TABLE will create implicit sequence "feed_channel_id_seq" for serial column "feed_channel.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "feed_channel_pkey" for table "feed_channel"
NOTICE: CREATE TABLE / UNIQUE will create implicit index "feed_channel_nat" for table "feed_channel"
CREATE TABLE
NOTICE: CREATE TABLE will create implicit sequence "feed_content_id_seq" for serial column "feed_content.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "feed_content_pkey" for table "feed_content"
NOTICE: CREATE TABLE / UNIQUE will create implicit index "feed_content_guid_key" for table "feed_content"
CREATE TABLE
NOTICE: type reference feed_channel.link%TYPE converted to text
NOTICE: type reference feed_channel.title%TYPE converted to text
NOTICE: type reference feed_channel.id%TYPE converted to integer
CREATE FUNCTION
NOTICE: type reference feed_channel.link%TYPE converted to text
NOTICE: type reference feed_channel.title%TYPE converted to text
NOTICE: type reference feed_channel.id%TYPE converted to integer
CREATE FUNCTION
channel_insert
----------------
1
(1 row)
channel_insert_wp
-------------------
2
(1 row)
id | name | link | title
----+------+-------------+--------------
1 | | Bogus_link | Bogus_title
2 | | Bogus_link2 | Bogus_title2
(2 rows)
Race condition when insert a row
Yes, your situation may well provoke a race condition.
Do you need these counters at all? You can easily replace them with appropriate queries:
SELECT COUNT(*) FROM ItemCatalog WHERE ID=100;
SELECT COUNT(*) FROM Item WHERE ID=100;
For successive field contents it is advisable to use AUTO_INCREMENT columns. But it seems that doesn't apply in your case.
But nevertheless, you can use the COUNT(*) approach from above:
insert into Item(ItemCatalogID, Name, Description, OtherValue) values (100, 'MyItem', 'Short Description', (select count(*) from Item where ID=100));
It might be that you'll have to alias one of the occurrences of your table:
insert into Item(ItemCatalogID, Name, Description, OtherValue) values (100, 'MyItem', 'Short Description', (select count(*) from Item AS I where ID=100))
This executes in one step and you won't have to worry about a race condition.
If you cannot change this due to whatever reasons, there is another solution: use table locking.
Prefix your statements with
LOCK TABLES Counters WRITE, Item WRITE
and suffix them with
UNLOCK TABLES
in order to have exclusive write access to them.
Handling race conditions in PostgreSQL
You have to care about the transaction isolation level. It should be set to SERIALIZABLE.
The reason is phantom reads: the transaction doesn't lock the whole table, only the rows it has already read. So if another transaction inserts new data, those rows haven't been locked yet, and the error appears. SERIALIZABLE avoids this by forcing conflicting concurrent transactions to fail with a serialization error, so that one of them must be retried after the other has finished.
You can do this via
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
The documentation: http://www.postgresql.org/docs/9.1/static/transaction-iso.html
If you want to learn more about this topic, I can highly recommend you this video: http://www.youtube.com/watch?v=zz-Xbqp0g0A