Detecting Column Changes in a Postgres Update Trigger

Read up on the hstore extension. In particular, you can create an hstore from a row, which means you can do something like:

changes := hstore(NEW) - hstore(OLD);
...pg_notify(... changes::text ...)

That's slightly more information than you wanted (it includes the new values). Use akeys(changes) if you just want the column names.
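Put together, a minimal sketch of this idea (assuming `CREATE EXTENSION hstore` has been run; the table name `tbl` and channel name `my_channel` are illustrative):

```sql
-- Sketch only: assumes the hstore extension is installed.
CREATE OR REPLACE FUNCTION notify_changes()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
DECLARE
   changes hstore;
BEGIN
   changes := hstore(NEW) - hstore(OLD);  -- keeps only pairs that differ from OLD
   IF changes <> ''::hstore THEN
      PERFORM pg_notify('my_channel', changes::text);
   END IF;
   RETURN NEW;
END
$func$;

CREATE TRIGGER tbl_notify_changes
AFTER UPDATE ON tbl
FOR EACH ROW EXECUTE PROCEDURE notify_changes();
```

The `-` operator on two hstore values deletes pairs from the left operand that match the right operand in both key and value, which is exactly "changed columns with their new values".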

Within a trigger function, how to get which fields are being updated

If a "source" doesn't "send an identifier", the column will be unchanged. Then you cannot detect whether the current UPDATE was done by the same source as the last one or by a source that did not change the column at all. In other words: this does not work properly.

If the "source" is identifiable by any session information function, you can work with that. Like:

NEW.column := session_user;

Unconditionally for every update.
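A minimal sketch of that unconditional stamp (the table name `tbl` and column name `source` are illustrative):

```sql
CREATE OR REPLACE FUNCTION trg_stamp_source()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   NEW.source := session_user;  -- record which role performed the UPDATE
   RETURN NEW;
END
$func$;

CREATE TRIGGER tbl_stamp_source
BEFORE UPDATE ON tbl
FOR EACH ROW EXECUTE PROCEDURE trg_stamp_source();
```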

General Solution

I found a way to solve the original problem.

Set the column to a default value if it's not targeted in an UPDATE (not in the SET list). The key element is a column-specific trigger, introduced with PostgreSQL 9.0, using the UPDATE OF column_name clause. The manual:

The trigger will only fire if at least one of the listed columns is
mentioned as a target of the UPDATE command.

That's the only simple way I found to distinguish whether a column was updated with a new value identical to the old, versus not updated at all.

One could also parse the text returned by current_query(). But that seems cumbersome, tricky and unreliable.

Trigger functions

I assume a column source defined NOT NULL.

Step 1: Set source to NULL if unchanged:

CREATE OR REPLACE FUNCTION trg_tbl_upbef_step1()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   IF NEW.source = OLD.source THEN
      NEW.source := NULL;  -- "impossible" value (source is NOT NULL)
   END IF;

   RETURN NEW;
END
$func$;

Step 2: Revert to the old value. This trigger only fires if the column was actually listed as a target of the UPDATE (see below):

CREATE OR REPLACE FUNCTION trg_tbl_upbef_step2()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   IF NEW.source IS NULL THEN
      NEW.source := OLD.source;
   END IF;

   RETURN NEW;
END
$func$;

Step 3: Now we can identify the missing update and set a default value instead:

CREATE OR REPLACE FUNCTION trg_tbl_upbef_step3()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   IF NEW.source IS NULL THEN
      NEW.source := 'UPDATE default source';  -- optionally same as column default
   END IF;

   RETURN NEW;
END
$func$;

Triggers

The trigger for Step 2 is column-specific: it only fires when source is a target of the UPDATE!

CREATE TRIGGER upbef_step1
BEFORE UPDATE ON tbl
FOR EACH ROW
EXECUTE PROCEDURE trg_tbl_upbef_step1();

CREATE TRIGGER upbef_step2
BEFORE UPDATE OF source ON tbl -- key element!
FOR EACH ROW
EXECUTE PROCEDURE trg_tbl_upbef_step2();

CREATE TRIGGER upbef_step3
BEFORE UPDATE ON tbl
FOR EACH ROW
EXECUTE PROCEDURE trg_tbl_upbef_step3();

Trigger names are relevant, because they are fired in alphabetical order (all being BEFORE UPDATE)!

The procedure could be simplified with something like "per-not-column triggers" or any other way to check the target-list of an UPDATE in a trigger. But I see no handle for this, currently (unchanged as of Postgres 14).

If source can be NULL, use any other "impossible" intermediate value and check for NULL additionally in trigger function 1:

IF OLD.source IS NOT DISTINCT FROM NEW.source THEN
NEW.source := '#impossible_value#';
END IF;

Adapt the rest accordingly.

PostgreSQL: NOTIFY when specific column is updated

It is simpler than you think:

IF OLD IS DISTINCT FROM NEW THEN
PERFORM pg_notify('my_table_update',output);
END IF;

Use IS DISTINCT FROM rather than <> to handle NULL values correctly.
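A fuller sketch for watching one specific column (the column name status is an assumption, since the question's payload variable output is not shown; table and channel names are illustrative):

```sql
CREATE OR REPLACE FUNCTION notify_status_change()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- NULL-safe comparison of just the column of interest
   IF OLD.status IS DISTINCT FROM NEW.status THEN
      PERFORM pg_notify('my_table_update', NEW.status::text);
   END IF;
   RETURN NEW;
END
$func$;

CREATE TRIGGER tbl_notify_status
AFTER UPDATE ON tbl
FOR EACH ROW EXECUTE PROCEDURE notify_status_change();
```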

PL/PGSQL trigger how to detect if a column is provided in the SQL statement on UPDATE?

Write two AFTER UPDATE triggers:

  • an AFTER UPDATE OF user_id trigger called trigger1 that only contains RETURN NULL

  • an AFTER UPDATE trigger called trigger2 that unconditionally throws an error

Then the triggers will be executed in that order, because trigger1 is alphabetically before trigger2.

If user_id was in the SET list, trigger1 will execute and terminate processing because it returns NULL, so that trigger2 won't run and no error is thrown.

If user_id is not in the SET list, trigger1 won't run and trigger2 will throw an error.

See the documentation for an explanation of trigger execution order.

Postgres function to show new column values after update as JSON

Create a function that compares two JSONB values:

create or replace function jsonb_diff(jsonb, jsonb)
returns jsonb language sql immutable as $$
select jsonb_object_agg(n.key, n.value)
from jsonb_each($1) o
join jsonb_each($2) n on o.key = n.key
where o.value <> n.value;
$$;

and use it in your trigger function:

updates := jsonb_diff(to_jsonb(OLD), to_jsonb(NEW));
RAISE NOTICE 'Logging update on relation (%.%) %', TG_TABLE_SCHEMA, TG_TABLE_NAME, updates;
RETURN NEW;

By the way, in Postgres 11+ you can use function arguments of the record type.
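For context, the fragment above could sit in a complete trigger function like this sketch (the function and table names are illustrative):

```sql
CREATE OR REPLACE FUNCTION log_updates()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
DECLARE
   updates jsonb;
BEGIN
   -- jsonb_diff() as defined above: keys whose values changed, with new values
   updates := jsonb_diff(to_jsonb(OLD), to_jsonb(NEW));
   RAISE NOTICE 'Logging update on relation (%.%) %',
                TG_TABLE_SCHEMA, TG_TABLE_NAME, updates;
   RETURN NEW;
END
$func$;

CREATE TRIGGER tbl_log_updates
BEFORE UPDATE ON tbl
FOR EACH ROW EXECUTE PROCEDURE log_updates();
```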

PostgreSQL trigger if one of the columns have changed

You can compare whole records:

if new <> old then ...

or

if new is distinct from old then ...

The second option is more general. Use the first one only when you are sure that the records cannot contain nulls.

Is there a more elegant way to detect changes in a large SQL table without altering it?

Adding columns and triggers is really quite safe

While I realise you've said it's a large table in a production DB so you say you can't modify it, I want to explain how you can make a very low impact change.

In PostgreSQL, an ALTER TABLE ... ADD COLUMN of a nullable column takes only moments and doesn't require a table re-write. It does require an exclusive lock, but the main consequence of that is that the ALTER TABLE may have to wait a long time before it can proceed; it won't hold up work already in progress while it waits (though queries issued after it will queue behind its lock request).

The same is true of creating a trigger on the table.

This means it's quite safe to add a modified_at or created_at column, and an associated trigger function to maintain it, to a live table that's in intensive real-world use. Rows added before the column was created will be null, which makes perfect sense since you don't know when they were added/modified. Your trigger will set the modified_at field whenever a row changes, so the values will get progressively filled in.

For your purposes it's probably more useful to have a trigger-maintained side-table that tracks the timestamp of the last change (insert/update/delete) anywhere in the table. That'll save you from storing a whole bunch of timestamps on disk and will let you discover when deletes have happened. A single-row side-table with a row you update on each change using a FOR EACH STATEMENT trigger will be quite low-cost. It's not a good idea for most tables because of contention - it essentially serializes all transactions that attempt to write to the table on the row update lock. In your case that might well be fine, since the table is large and rarely updated.
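A sketch of such a single-row side-table and its statement-level trigger (all names are illustrative):

```sql
CREATE TABLE tbl_last_change (
   id         boolean PRIMARY KEY DEFAULT true CHECK (id),  -- PK on a boolean enforces a single row
   changed_at timestamptz NOT NULL DEFAULT now()
);
INSERT INTO tbl_last_change DEFAULT VALUES;

CREATE OR REPLACE FUNCTION touch_last_change()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   UPDATE tbl_last_change SET changed_at = now();
   RETURN NULL;  -- return value is ignored for statement-level triggers
END
$func$;

CREATE TRIGGER tbl_touch
AFTER INSERT OR UPDATE OR DELETE ON tbl
FOR EACH STATEMENT EXECUTE PROCEDURE touch_last_change();
```

A client can then poll `SELECT changed_at FROM tbl_last_change` cheaply instead of scanning the large table.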

A third alternative is to have the side table accumulate a running log of the timestamps of insert/update/delete statements, or even of the individual rows. This allows your client to read the change-log table instead of the main table and make small changes to its cached data rather than invalidating and re-reading the whole cache. The downside is that you have to have a way to periodically purge old and unwanted change log records.

So... there's really no operational reason why you can't change the table. There may well be business policy reasons that prevent you from doing so even though you know it's quite safe, though.

... but if you really, really, really can't:

Another option is to use the existing "md5agg" extension: http://llg.cubic.org/pg-mdagg/ . Or to apply the patch currently circulating pgsql-hackers to add an "md5_agg" to the next release to your PostgreSQL install if you built from source.

Logical replication

The bi-directional replication for PostgreSQL project has produced functionality that allows you to listen for and replay logical changes (row inserts/updates/deletes) without requiring triggers on tables. The pg_receivellog tool would likely suit your purposes well when wrapped with a little scripting.

The downside is that you'd have to run a patched PostgreSQL 9.3, so I'm guessing if you can't change a table, running a bunch of experimental code that's likely to change incompatibly in future isn't going to be high on your priority list ;-) . It's included in the stock release of 9.4 though, see "changeset extraction".

Testing the relfilenode timestamp won't work

You might think you could look at the modified timestamp(s) of the file(s) that back the table on disk. This won't be very useful:

  • The table is split into extents, individual files that by default are 1GB each. So you'd have to find the most recent timestamp across them all.
  • Autovacuum activity will cause these timestamps to change, possibly quite a while after corresponding writes happened.
  • Autovacuum must periodically do an automatic 'freeze' of table contents to prevent transaction ID wrap-around. This involves progressively rewriting the table and will naturally change the timestamp. This happens even if nothing's been added for potentially quite a long time.
  • Hint-bit setting results in small writes during SELECT. These writes will also affect the file timestamps.

Examine the transaction logs

In theory you could attempt to decode the transaction logs with pg_xlogreader and find records that affect the table of interest. You'd have to try to exclude activity caused by vacuum, full page writes after hint bit setting, and of course the huge amount of activity from every other table in the entire database cluster.

The performance impact of this is likely to be huge, since every change to every database on the entire system must be examined.

All in all, adding a trigger on a table is trivial in comparison.


