Computed/Calculated/Virtual/Derived Columns in Postgresql

Computed / calculated / virtual / derived columns in PostgreSQL

Up to Postgres 11 generated columns are not supported - as defined in the SQL standard and implemented by some RDBMS including DB2, MySQL and Oracle. Nor the similar "computed columns" of SQL Server.

STORED generated columns are introduced with Postgres 12. Trivial example:

CREATE TABLE tbl (
int1 int
, int2 int
, product bigint GENERATED ALWAYS AS (int1 * int2) STORED
);

db<>fiddle here

VIRTUAL generated columns may come with one of the next iterations. (Not in Postgres 14, yet).

Related:

  • Attribute notation for function call gives error

Until then, you can emulate VIRTUAL generated columns with a function using attribute notation (tbl.col) that looks and works much like a virtual generated column. That's a bit of a syntax oddity which exists in Postgres for historic reasons and happens to fit the case. This related answer has code examples:

  • Store common query as column?

The expression (looking like a column) is not included in a SELECT * FROM tbl, though. You always have to list it explicitly.

Can also be supported with a matching expression index - provided the function is IMMUTABLE. Like:

CREATE FUNCTION col(tbl) ... AS ...  -- your computed expression here
CREATE INDEX ON tbl(col(tbl));

Alternatives

Alternatively, you can implement similar functionality with a VIEW, optionally coupled with expression indexes. Then SELECT * can include the generated column.

"Persisted" (STORED) computed columns can be implemented with triggers in a functionally identical way.

Materialized views are a closely related concept, implemented since Postgres 9.3.

In earlier versions one can manage MVs manually.

Does PostgreSQL only support STORED generated columns?

The documentation clearly states that virtual generated columns are not supported.

PostgreSQL currently implements only stored generated columns.

This appears just one sentence after the part of the documnetation that you quoted.

PostgreSQL Calculated Column with values of another table referenced by foreign key

You cannot define a generated column based on values from other tables. Per the documentation:

The generation expression can refer to other columns in the table, but not other generated columns. Any functions and operators used must be immutable. References to other tables are not allowed.

You can achieve the expected behavior by creating two triggers on both tables but usually creating a view based on the tables is a simpler and more efficient solution.

Is there a way to define a column as the sum of two others?

I don't think you can use computed columns in Postgres, a view would be a better option than a trigger:

CREATE VIEW viewname
AS
SELECT a,
b,
(a + b) AS Sum_A_B
FROM foo;

Or can you not just query it directly? does it need to be stored in the table:

 SELECT a,
b,
(a + b) AS Sum_A_B
FROM foo;

How to create a GENERATED column containing the MD5 of multiple columns?

I suggest an immutable helper function:

CREATE OR REPLACE FUNCTION f_request_md5(_request text, _request_body bytea)
RETURNS uuid
LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT md5(textin(record_out((md5(_request_body), _request))))::uuid';

And a table like this:

CREATE TABLE client_cache (
id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY
, request text COMPRESSION lz4 NOT NULL CHECK (length(request) <= 10240)
, request_body bytea COMPRESSION lz4 NOT NULL CHECK (length(request_body) <= 1048576)
, request_hash uuid GENERATED ALWAYS AS (f_request_md5(request, request_body)) STORED
);

db<>fiddle here

Note the more efficient uuid instead of varchar. See:

  • What is the optimal data type for an MD5 field?

Background

There are two overloaded variants of md5() in Postgres 14 (or any supported version):

test=> SELECT (proargtypes::regtype[])[0], prorettype::regtype, provolatile
test-> FROM pg_proc
test-> WHERE proname = 'md5';
proargtypes | prorettype | provolatile
-------------+------------+-------------
bytea | text | i
text | text | i
(2 rows)

One takes bytea, one text, both are IMMUTABLE and return text. So this expression is immutable:

ROW(MD5(request), MD5(request_body))

But this is not, like you found out the hard way:

MD5(ROW(A, B)::varchar)

The text representation of a record is not immutable. There are many reasons. One obvious reason for the case at hand: bytea output can be in (default) hex format or in the obsolescent escape format. A plain

SET bytea_output = 'escape'; 

... would break your generated column.

To get an immutable text representation of a bytea value, you'd run it through encode(request_body, 'hex'). But don't go there. md5(request_body) gives us a faster immutable text "representation" for our purpose.

We still can't cast a record. So I created the wrapper function. Be sure to read this related answer for more explanation:

  • Why doesn't my UNIQUE constraint trigger?

Like discussed in that answer, the new, built-in function hash_record_extended() would be much more efficient for the purpose. So if a bigint is good enough, consider this:

CREATE TABLE client_cache2 (
id bigint PRIMARY KEY GENERATED ALWAYS AS IDENTITY
, request text COMPRESSION lz4 NOT NULL CHECK (length(request) <= 10240)
, request_body bytea COMPRESSION lz4 NOT NULL CHECK (length(request_body) <= 1048576)
, request_hash bigint GENERATED ALWAYS AS (hash_record_extended((request, request_body), 0)) STORED
);

same db<>fiddle here

Works out of the box in Postgres 14 or later.

Related:

  • Why am I getting a an error when creating a generated column in PostgreSQL?
  • Computed / calculated / virtual / derived columns in PostgreSQL

How to use a custom function as column default in a PostgreSQL table

If monthlyPayment is fixed as per your definition, hence completely functionally dependent, then consider not persisting the value at all. Keep using your cheap (!) function instead. Much cleaner and cheaper overall. Like:

SELECT *, f_monthly_payment(l) AS monthly_payment
FROM loan l;

Assuming the function is defined as f_monthly_payment(loan) (taking the row type of the table as argument). Else adapt accordingly.

Postgres 12 or later has STORED generated columns, but those only pay for expensive calculations. Persisted columns occupy space and slow down all operations on the table.

See:

  • Computed / calculated / virtual / derived columns in PostgreSQL

If you want to allow manual changes, a column default is the way to go (like you actually asked). But you cannot use your function because, quoting the manual:

The DEFAULT clause assigns a default data value for the column whose
column definition it appears within. The value is any variable-free
expression (in particular, cross-references to other columns in the
current table are not allowed).

The remaining solution for this is a trigger BEFORE INSERT on the table like:

CREATE OR REPLACE FUNCTION trg_loan_default_rate()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
NEW.monthlyPayment := (NEW.loanAmount * monthlyInterestRate())
/ (1 - (1/power(1 + monthlyInterestRate(), NEW.numberOfYears * 12)));

RETURN NEW;
END
$func$;

CREATE TRIGGER loan_default_rate
BEFORE INSERT ON loan
FOR EACH ROW EXECUTE FUNCTION trg_loan_default_rate();

Assuming monthlyInterestRate() is a custom function.

And I replaced Math.pow with the built-in Postgres function power().

NEW is a special record variable in trigger functions, referring to the newly inserted row. See:

  • FOR EACH STATEMENT trigger example

EXECUTE FUNCTION requires Postgres 11. See:

  • Trigger function does not exist, but I am pretty sure it does

Related:

  • PostgreSQL - set a default cell value according to another cell value

Aside: consider legal, lower-case, unquoted identifiers in Postgres. See:

  • Are PostgreSQL column names case-sensitive?


Related Topics



Leave a reply



Submit