Generate DEFAULT values in a CTE UPSERT using PostgreSQL 9.3
Postgres 9.5 implemented UPSERT. See below.
Postgres 9.4 or older
This is a tricky problem. You are running into this restriction (per documentation):
In a VALUES list appearing at the top level of an INSERT, an expression can be replaced by DEFAULT to indicate that the destination column's default value should be inserted. DEFAULT cannot be used when VALUES appears in other contexts.
Bold emphasis mine. Default values are not defined without a table to insert into. So there is no direct solution to your question, but there are a number of possible alternative routes, depending on exact requirements.
Fetch default values from the system catalog?
You could fetch those from the system catalog pg_attrdef like @Patrick commented, or from information_schema.columns. Complete instructions here:
- Get the default values of table columns in Postgres?
But then you still only have a list of rows with a text representation of the expression used to compute the default value. You would have to build and execute statements dynamically to get values to work with. Tedious and messy. Instead, we can let built-in Postgres functionality do that for us:
Simple shortcut
Insert a dummy row and have it returned to use generated defaults:
INSERT INTO playlist_items DEFAULT VALUES RETURNING *;
Problems / scope of the solution
- This is only guaranteed to work for STABLE or IMMUTABLE default expressions. Most VOLATILE functions will work just as well, but there are no guarantees. The current_timestamp family of functions qualifies as stable, since their values do not change within a transaction.
- In particular, this has side effects on serial columns (or any other defaults drawing from a sequence). But that should not be a problem, because you don't normally write to serial columns directly. Those shouldn't be listed in INSERT statements at all.
- Remaining flaw for serial columns: the sequence is still advanced by the single call to get a default row, producing a gap in the numbering. Again, that should not be a problem, because gaps are generally to be expected in serial columns.
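The gap can be demonstrated with a minimal toy table (hypothetical names, not from the question):

```sql
-- Toy demonstration of the sequence gap left by the dummy row:
CREATE TABLE demo (id serial PRIMARY KEY, note text DEFAULT 'n/a');

INSERT INTO demo DEFAULT VALUES RETURNING *;  -- dummy row consumes id 1
DELETE FROM demo;                             -- discard the dummy row

INSERT INTO demo (note) VALUES ('real row')
RETURNING id;                                 -- returns 2, leaving a gap at 1
```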
Two more problems can be solved:
- If you have columns defined NOT NULL, you have to insert dummy values and replace with NULL in the result.
- We do not actually want to insert the dummy row. We could delete it later (in the same transaction), but that may have more side effects, like triggers ON DELETE. There is a better way:
Avoid dummy row
Clone a temporary table including column defaults and insert into that:
BEGIN;
CREATE TEMP TABLE tmp_playlist_items (LIKE playlist_items INCLUDING DEFAULTS)
ON COMMIT DROP; -- drop at end of transaction
INSERT INTO tmp_playlist_items DEFAULT VALUES RETURNING *;
...
Same result, fewer side effects. Since default expressions are copied verbatim, the clone draws from the same sequences if any. But other side effects from the unwanted row or triggers are avoided completely.
Credit to Igor for the idea:
- Postgresql, select a "fake" row
Remove NOT NULL constraints
You would have to provide dummy values for NOT NULL columns, because (per documentation):
Not-null constraints are always copied to the new table.
Either accommodate those in the INSERT statement or (better) eliminate the constraints:
ALTER TABLE tmp_playlist_items
ALTER COLUMN foo DROP NOT NULL
, ALTER COLUMN bar DROP NOT NULL;
There is a quick and dirty way with superuser privileges:
UPDATE pg_attribute
SET attnotnull = FALSE
WHERE attrelid = 'tmp_playlist_items'::regclass
AND attnotnull
AND attnum > 0;
It is just a temporary table with no data and no other purpose, and it's dropped at the end of the transaction. So the shortcut is tempting. Still, the basic rule is: never tamper with system catalogs directly.
So, let's look into a clean way:
Automate with dynamic SQL in a DO statement. You just need regular privileges, which you are guaranteed to have since the same role created the temp table.
DO $$BEGIN
EXECUTE (
SELECT 'ALTER TABLE tmp_playlist_items ALTER '
|| string_agg(quote_ident(attname), ' DROP NOT NULL, ALTER ')
|| ' DROP NOT NULL'
FROM pg_catalog.pg_attribute
WHERE attrelid = 'tmp_playlist_items'::regclass
AND attnotnull
AND attnum > 0
);
END$$
Much cleaner and still very fast. Exercise care with dynamic commands and be wary of SQL injection. This statement is safe. I have posted several related answers with more explanation.
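An equivalent variant builds the same command with format() and its %I specifier instead of quote_ident(); which one you use is a matter of taste:

```sql
DO $$BEGIN
EXECUTE (
   SELECT 'ALTER TABLE tmp_playlist_items '
       || string_agg(format('ALTER COLUMN %I DROP NOT NULL', attname), ', ')
   FROM   pg_catalog.pg_attribute
   WHERE  attrelid = 'tmp_playlist_items'::regclass
   AND    attnotnull
   AND    attnum > 0
);
END$$;
```

Like the original, this errors out if the table has no NOT NULL columns at all (string_agg() returns NULL and EXECUTE rejects a NULL command string), which is harmless here since the clone is known to have such columns.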
General solution (9.4 and older)
BEGIN;
CREATE TEMP TABLE tmp_playlist_items
(LIKE playlist_items INCLUDING DEFAULTS) ON COMMIT DROP;
DO $$BEGIN
EXECUTE (
SELECT 'ALTER TABLE tmp_playlist_items ALTER '
|| string_agg(quote_ident(attname), ' DROP NOT NULL, ALTER ')
|| ' DROP NOT NULL'
FROM pg_catalog.pg_attribute
WHERE attrelid = 'tmp_playlist_items'::regclass
AND attnotnull
AND attnum > 0
);
END$$;
LOCK TABLE playlist_items IN EXCLUSIVE MODE; -- forbid concurrent writes
WITH default_row AS (
INSERT INTO tmp_playlist_items DEFAULT VALUES RETURNING *
)
, new_values (id, playlist, item, group_name, duration, sort, legacy) AS (
VALUES
(651, 21, 30012, 'a', 30, 1, FALSE)
, (NULL, 21, 1, 'b', 34, 2, NULL)
, (668, 21, 30012, 'c', 30, 3, FALSE)
, (7428, 21, 23068, 'd', 0, 4, FALSE)
)
, upsert AS ( -- *not* replacing existing values in UPDATE (?)
UPDATE playlist_items m
SET ( playlist, item, group_name, duration, sort, legacy)
= (n.playlist, n.item, n.group_name, n.duration, n.sort, n.legacy)
-- ..., COALESCE(n.legacy, m.legacy) -- see below
FROM new_values n
WHERE n.id = m.id
RETURNING m.id
)
INSERT INTO playlist_items
(playlist, item, group_name, duration, sort, legacy)
SELECT n.playlist, n.item, n.group_name, n.duration, n.sort
, COALESCE(n.legacy, d.legacy)
FROM new_values n, default_row d -- single row can be cross-joined
WHERE NOT EXISTS (SELECT 1 FROM upsert u WHERE u.id = n.id)
RETURNING id;
COMMIT;
You only need the LOCK if you have concurrent transactions trying to write to the same table.
As requested, this only replaces NULL values in the column legacy in the input rows for the INSERT case. It can easily be extended to work for other columns or for the UPDATE case as well. For instance, you could UPDATE conditionally: only if the input value is NOT NULL. I added a commented line to the UPDATE above.
Aside: You do not need to cast values in any row but the first in a VALUES expression, since types are derived from the first row.
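To illustrate, casts in the first row of a standalone VALUES expression determine the column types for all following rows (a generic sketch, not from the question):

```sql
-- Only the first row needs explicit casts:
SELECT *
FROM  (
   VALUES
     (651::int, 'a'::text, 30::numeric)
   , (668     , 'c'      , 30)          -- no casts needed here
   ) v(id, group_name, duration);
```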
Postgres 9.5
implements UPSERT with INSERT .. ON CONFLICT .. DO NOTHING | UPDATE. This largely simplifies the operation:
INSERT INTO playlist_items AS m (id, playlist, item, group_name, duration, sort, legacy)
VALUES (651, 21, 30012, 'a', 30, 1, FALSE)
, (DEFAULT, 21, 1, 'b', 34, 2, DEFAULT) -- !
, (668, 21, 30012, 'c', 30, 3, FALSE)
, (7428, 21, 23068, 'd', 0, 4, FALSE)
ON CONFLICT (id) DO UPDATE
SET (playlist, item, group_name, duration, sort, legacy)
= (EXCLUDED.playlist, EXCLUDED.item, EXCLUDED.group_name
, EXCLUDED.duration, EXCLUDED.sort, EXCLUDED.legacy)
-- (..., COALESCE(m.legacy, EXCLUDED.legacy)) -- see below
RETURNING m.id;
We can attach the VALUES clause to the INSERT directly, which allows the DEFAULT keyword. In case of unique violations on (id), Postgres updates instead. We can use excluded rows in the UPDATE. The manual:
The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the existing row using the table's name (or an alias), and to rows proposed for insertion using the special excluded table.
And:
Note that the effects of all per-row BEFORE INSERT triggers are reflected in excluded values, since those effects may have contributed to the row being excluded from insertion.
Remaining corner case
You have various options for the UPDATE. You can ...

- ... not update at all: add a WHERE clause to the UPDATE to only write to selected rows.
- ... only update selected columns.
- ... only update if the column is currently NULL: COALESCE(m.legacy, EXCLUDED.legacy)
- ... only update if the new value is NOT NULL: COALESCE(EXCLUDED.legacy, m.legacy)
But there is no way to discern DEFAULT values from values actually provided in the INSERT. Only the resulting EXCLUDED rows are visible. If you need the distinction, fall back to the previous solution, where you have both at your disposal.
How to use column default in the where clause to identify default value
What you retrieve from the information schema (or the system catalogs in my solution) is just a string literal representing the expression. You need to actually execute it to get the value. It can be a trivial cast like in your case or any other expression. That's where you need dynamic SQL. (Or concatenate a second query in your client from the results of the first query.)
Detailed explanation in this related answer:
Generate DEFAULT values in a CTE UPSERT using PostgreSQL 9.3
(You'll also find instructions for an alternative route without dynamic SQL over there.)
This DO statement does the trick.
DO
$do$
DECLARE
_data text := 'data:image/png;base64,iVBO...QmCC';
_answer bool;
BEGIN
EXECUTE (
SELECT format('SELECT %s = $1', d.adsrc)
FROM pg_attribute a
JOIN pg_attrdef d ON (d.adrelid, d.adnum) = (a.attrelid, a.attnum)
WHERE a.attrelid = 'a.t'::regclass -- schema.table
AND a.attname = 's'
)
USING _data
INTO _answer;
   RAISE NOTICE '%', _answer;
END
$do$;
For repeated use I would wrap this into a plpgsql function. There are many related answers.
Also be aware that column defaults can have side effects like increasing a sequence. Not in this particular case, but generally I'd advise to check the default before executing.
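To check the default without executing it, you can retrieve its text representation first. Note that pg_attrdef.adsrc (used above) was removed in Postgres 12; pg_get_expr() works on all versions. A sketch against the same hypothetical table a.t and column s:

```sql
-- Inspect the default expression before evaluating it:
SELECT a.attname, pg_get_expr(d.adbin, d.adrelid) AS default_expr
FROM   pg_attribute a
JOIN   pg_attrdef  d ON (d.adrelid, d.adnum) = (a.attrelid, a.attnum)
WHERE  a.attrelid = 'a.t'::regclass  -- schema.table
AND    a.attname = 's';
```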
How to retrieve the actual default value for a column before insertion
There's no way to do what you want directly - you can't preview the value.
Imagine:
regress=> CREATE TABLE crazy (blah integer, rand float4 default random());
CREATE TABLE
regress=> insert into crazy(blah, rand) values (1, DEFAULT);
INSERT 0 1
regress=> select * from crazy;
blah | rand
------+----------
1 | 0.932575
(1 row)
random() is a volatile function that returns a different value each time. So any attempt to preview the value would only get you a different value than the one that'll be inserted.

The same is true of nextval, as concurrent transactions can affect the value - even if you directly read the current sequence position, which PostgreSQL tries to prevent you from doing (because it'll produce wrong results). It's just more obvious to think about this problem with random than nextval.
So, with a volatile default, all you can do is:
- Evaluate the default expression yourself, then supply the value in the INSERT, i.e. call SELECT nextval('statistical_outputs_id_seq') then INSERT INTO ... VALUES (..., 'the value from nextval()');
- Use RETURNING to obtain the generated value
I suggest the latter. The former is annoying and difficult in the general case, since a default can be any arbitrary expression.
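For completeness, the former can be spelled out in a single statement by attaching a CTE to the INSERT (assuming a table statistical_outputs whose id column is fed by that sequence):

```sql
-- Fetch the next sequence value first, then supply it explicitly:
WITH v AS (SELECT nextval('statistical_outputs_id_seq') AS id)
INSERT INTO statistical_outputs (id)
SELECT id FROM v
RETURNING id;
```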
Example for RETURNING:
regress=> insert into crazy(blah, rand) values (1, DEFAULT) RETURNING rand;
rand
----------
0.975092
(1 row)
INSERT 0 1
Inserting a COALESCE(NULL, default)
There's no way to re-use the defined default on the column. The default only defines what happens if an INSERT doesn't specify a value for the column. By this definition, a NULL value is still "specified", so the default can't be used.
Your comment that someone might not use the function indicates that a trigger is better for your requirements than a simple function.
https://www.postgresql.org/docs/current/static/plpgsql-trigger.html
CREATE OR REPLACE FUNCTION default_id() RETURNS TRIGGER AS $default_id$
BEGIN
IF (NEW.id IS NULL) THEN
NEW.id := gen_random_uuid();
END IF;
RETURN NEW;
END;
$default_id$ LANGUAGE plpgsql;
CREATE TRIGGER default_id_trigger
BEFORE INSERT OR UPDATE ON person
FOR EACH ROW EXECUTE PROCEDURE default_id();
If you do want to do this with a function then the simplest way is just to assign the value before inserting:
CREATE OR REPLACE FUNCTION create_person(
id UUID
) RETURNS BOOLEAN LANGUAGE plpgsql SECURITY DEFINER AS $$
BEGIN
IF id IS NULL THEN
id := gen_random_uuid();
END IF;
-- OR
-- id := coalesce(id, gen_random_uuid());
INSERT INTO person( id )
VALUES (id);
RETURN FOUND;
END;
$$;
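Usage then looks like this (gen_random_uuid() requires the pgcrypto extension, or Postgres 13+, which the function body above already assumes):

```sql
-- Let the function generate the id:
SELECT create_person(NULL);

-- Or supply one explicitly:
SELECT create_person(gen_random_uuid());
```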