Casting Null Type When Updating Multiple Rows

Casting NULL type when updating multiple rows

With a standalone VALUES expression PostgreSQL has no idea what the data types should be. With simple numeric literals the system is happy to assume matching types. But with other input (like NULL) you would need to cast explicitly - as you already have found out.

You can query pg_catalog (fast, but PostgreSQL-specific) or the information_schema (slow, but standard SQL) to find out and prepare your statement with appropriate types.

Or you can use one of these simple "tricks" (I saved the best for last):

0. Select row with LIMIT 0, append rows with UNION ALL VALUES

UPDATE foo f
SET x = t.x
, y = t.y
FROM (
(SELECT pkid, x, y FROM foo LIMIT 0) -- parenthesis needed with LIMIT
UNION ALL
VALUES
(1, 20, NULL) -- no type casts here
, (2, 50, NULL)
) t -- column names and types are already defined
WHERE f.pkid = t.pkid;

The first sub-select of the subquery:

(SELECT x, y, pkid  FROM foo LIMIT 0)

gets names and types for the columns, but LIMIT 0 prevents it from adding an actual row. Subsequent rows are coerced to the now well-defined row type - and checked immediately whether they match the type. Should be a subtle additional improvement over your original form.

While providing values for all columns of the table this short syntax can be used for the first row:

(TABLE foo LIMIT 0)

Major limitation: Postgres casts the input literals of the free-standing VALUES expression to a "best-effort" type immediately. When it later tries to cast to the given types of the first SELECT, it may already be too late for some types if there is no registered assignment cast between the assumed type and the target type. Examples: text -> timestamp or text -> json.

Pro:

  • Minimum overhead.
  • Readable, simple and fast.
  • You only need to know relevant column names of the table.

Con:

  • Type resolution can fail for some types.

1. Select row with LIMIT 0, append rows with UNION ALL SELECT

UPDATE foo f
SET x = t.x
, y = t.y
FROM (
(SELECT pkid, x, y FROM foo LIMIT 0) -- parenthesis needed with LIMIT
UNION ALL SELECT 1, 20, NULL
UNION ALL SELECT 2, 50, NULL
) t -- column names and types are already defined
WHERE f.pkid = t.pkid;

Pro:

  • Like 0., but avoids failing type resolution.

Con:

  • UNION ALL SELECT is slower than VALUES expression for long lists of rows, as you found in your test.
  • Verbose syntax per row.

2. VALUES expression with per-column type

...
FROM (
VALUES
((SELECT pkid FROM foo LIMIT 0)
, (SELECT x FROM foo LIMIT 0)
, (SELECT y FROM foo LIMIT 0)) -- get type for each col individually
, (1, 20, NULL)
, (2, 50, NULL)
) t (pkid, x, y) -- columns names not defined yet, only types.
...

Contrary to 0. this avoids premature type resolution.

The first row in the VALUES expression is a row of NULL values which defines the type for all subsequent rows. This leading noise row is filtered by WHERE f.pkid = t.pkid later, so it never sees the light of day. For other purposes you can eliminate the added first row with OFFSET 1 in a subquery.

Pro:

  • Typically faster than 1. (or even 0.)
  • Short syntax for tables with many columns and only few are relevant.
  • You only need to know relevant column names of the table.

Con:

  • Verbose syntax for only few rows
  • Less readable (IMO).

3. VALUES expression with row type

UPDATE foo f
SET x = (t.r).x -- parenthesis needed to make syntax unambiguous
, y = (t.r).y
FROM (
VALUES
('(1,20,)'::foo) -- columns need to be in default order of table
,('(2,50,)') -- nothing after the last comma for NULL
) t (r) -- column name for row type
WHERE f.pkid = (t.r).pkid;

You obviously know the table name. If you also know the number of columns and their order you can work with this.

For every table in PostgreSQL a row type is registered automatically. If you match the number of columns in your expression, you can cast to the row type of the table ('(1,50,)'::foo) thereby assigning column types implicitly. Put nothing behind a comma to enter a NULL value. Add a comma for every irrelevant trailing column.

In the next step you can access individual columns with the demonstrated syntax. More about Field Selection in the manual.

Or you could add a row of NULL values and use uniform syntax for actual data:

...
VALUES
((NULL::foo)) -- row of NULL values
, ('(1,20,)') -- uniform ROW value syntax for all
, ('(2,50,)')
...

Pro:

  • Fastest (at least in my tests with few rows and columns).
  • Shortest syntax for few rows or tables where you need all columns.
  • You don't have to spell out columns of the table - all columns automatically have the matching name.

Con:

  • Not so well known syntax for field selection from record / row / composite type.
  • You need to know number and position of relevant columns in default order.

4. VALUES expression with decomposed row type

Like 3., but with decomposed rows in standard syntax:

UPDATE foo f
SET x = t.x
, y = t.y
FROM (
VALUES
(('(1,20,)'::foo).*) -- decomposed row of values
, (2, 50, NULL)
) t(pkid, x, y) -- arbitrary column names (I made them match)
WHERE f.pkid = t.pkid; -- eliminates 1st row with NULL values

Or, with a leading row of NULL values again:

...
VALUES
((NULL::foo).*) -- row of NULL values
, (1, 20, NULL) -- uniform syntax for all
, (2, 50, NULL)
...

Pros and cons like 3., but with more commonly known syntax.

And you need to spell out column names (if you need them).

5. VALUES expression with types fetched from row type

Like Unril commented, we can combine the virtues of 2. and 4. to provide only a subset of columns:

UPDATE foo f
SET ( x, y)
= (t.x, t.y) -- short notation, see below
FROM (
VALUES
((NULL::foo).pkid, (NULL::foo).x, (NULL::foo).y) -- subset of columns
, (1, 20, NULL)
, (2, 50, NULL)
) t(pkid, x, y) -- arbitrary column names (I made them match)
WHERE f.pkid = t.pkid;

Pros and cons like 4., but we can work with any subset of columns and don't have to know the full list.

Also displaying short syntax for the UPDATE itself that's convenient for cases with many columns. Related:

  • Bulk update of all columns

4. and 5. are my favorites.

db<>fiddle here - demonstrating all

How to avoid explicit casting NULL during INSERT in Postgresql

Instead of inserting from a SELECT, you can attach a VALUES clause directly to the INSERT, i.e.:

INSERT INTO A ({col_list}) 
VALUES (row_1), (row_2), ...

When you insert from a query, Postgres examines the query in isolation when trying to infer the column types, and then tries to coerce them to match the target table (only to find out that it can't).

When you insert directly from a VALUES list, it knows about the target table when performing the type inference, and can then assume that any untyped NULL matches the corresponding column.

PostgreSQL: update multiple records (handle NULL values)

The problem is that Postgres doesn't know the type of the my_value column in your VALUES expression, so it defaults them to text. You can avoid this by annotating at least one NULL value with the desired type smallint:

update person as db_record_to_update
set my_value = db_record_new.my_value
from (values
('1', NULL::smallint),
('2', NULL)
) as db_record_new(id, my_value)
where db_record_new.id = db_record_to_update.id;

(online demo)

UPSERT based on UNIQUE constraint with NULL values

If you can find a value that can never legally exist in col3 (make sure with a check constraint), you could use a unique index:

CREATE UNIQUE INDEX ON my_table (
col2,
coalesce(col3, -1.0)
);

and use that in your INSERT:

INSERT INTO my_table (col2, col3, col4)
VALUES (p_col2, p_col3, p_col4)
ON CONFLICT (col2, coalesce(col3, -1.0))
DO UPDATE SET col4 = excluded.col4;

Why does the compiler let me cast a null to a specific type in C#?

In IL on this level, null is just null. The compiler knew it was null because that is what you wrote, as such the compiler does not need to call the cast operator at all. Casting null to an object will just yield null.

So this is a compile-time "optimization" or simplification if you will.

Since this is legal, to cast null to another object type, there is neither a warning nor an error reported from this.

Note that apparently the compiler will not do this even thought it may be able to verify that the value being cast is indeed guaranteed to be null, if it isn't a literal.

Your example:

void Main()
{
var s = (string)null;
GC.KeepAlive(s);
}

IL:

IL_0000:  ldnull      
IL_0001: stloc.0 // s
IL_0002: ldloc.0 // s
IL_0003: call System.GC.KeepAlive

(I added the call to GC.KeepAlive to avoid the compiler dropping the entire variable due to it not being used anywhere.)

If I stuff the null into an object first, with no possibility of it changing:

void Main()
{
object o = null;
var s = (string)o;
GC.KeepAlive(s);
}

IL:

IL_0000:  ldnull      
IL_0001: stloc.0 // o
IL_0002: ldloc.0 // o
IL_0003: castclass System.String
IL_0008: stloc.1 // s
IL_0009: ldloc.1 // s
IL_000A: call System.GC.KeepAlive

Conversion data type issue when UPDATE a NULL table

Reason for the error is correctly put by Damien. Below is just another flavor of the possible fix.
Modify your upper part of script to this:

SELECT * into #Temp1
from (
SELECT 1 tempid, cast (NULL as varchar(19)) tasknr , cast(NULL as decimal(13,4)) devcat, cast(NULL as int) taskop, cast(NULL as xml) taskcl
UNION ALL SELECT 2, NULL, NULL, NULL, NULL
UNION ALL SELECT 3, NULL, NULL, NULL, NULL
UNION ALL SELECT 4, NULL, NULL, NULL, NULL
UNION ALL SELECT 5, NULL, NULL, NULL, NULL
UNION ALL SELECT 6, NULL, NULL, NULL, NULL
UNION ALL SELECT 7, NULL, NULL, NULL, NULL
UNION ALL SELECT 8, NULL, NULL, NULL, NULL
UNION ALL SELECT 9, NULL, NULL, NULL, NULL
)

.......

Edit: To DELETE those rows which have every column as NULL, use this script:

DELETE FROM #temp1 WHERE
COALESCE(tasknr,devcat,taskop,taskcl) IS NULL

How to know which column has changed on UPDATE?

Basically, you need the pre-UPDATE values of updated rows to compare. That's kind of hard as RETURNING only returns post-UPDATE state. But can be worked around. See:

  • Return pre-UPDATE column values using SQL only

So this does the basic trick:

WITH input(col1, col2) AS (
SELECT 1, text 'post_up' -- "whole row"
)
, pre_upd AS (
UPDATE tab1 x
SET (col1, col2) = (i.col1, i.col2)
FROM input i
JOIN tab1 y USING (col1)
WHERE x.col1 = y.col1
RETURNING y.*
)
TABLE pre_upd
UNION ALL
TABLE input;

db<>fiddle here

This is assuming that col1 in your example is the PRIMARY KEY. We need some way to identify rows unambiguously.

Note that this is not safe against race conditions between concurrent writes. You need to do more to be safe. See related answer above.

The explicit cast to text I added in the CTE above is redundant as text is the default type for string literals anyway. (Like integer is the default for simple numeric literals.) For other data types, explicit casting may be necessary. See:

  • Casting NULL type when updating multiple rows

Also be aware that all updates write a new row version, even if nothing changes at all. Typically, you'd want to suppress such costly empty updates with appropriate WHERE clauses. See:

  • How do I (or can I) SELECT DISTINCT on multiple columns?

While "passing whole rows", you'll have to check on all columns that might change, to achieve that.



Related Topics



Leave a reply



Submit