Rails, PostgreSQL, and History Triggers

If you need or want database-specific features that ActiveRecord doesn't understand then you should switch to db/structure.sql for keeping track of your schema. db/structure.sql is pretty much a raw dump of your schema made using the database's native tools so it will contain triggers, CHECK constraints, indexes on function results, and everything else.

Switching is easy:

  1. Update your config/application.rb to contain config.active_record.schema_format = :sql.
  2. Do a rake db:structure:dump to get an initial db/structure.sql.
  3. Delete db/schema.rb from your directory tree and revision control.
  4. Add db/structure.sql to revision control.
  5. Adjust your rake habits:

    • Use db:structure:dump instead of db:schema:dump
    • Use db:structure:load instead of db:schema:load

Everything else should work as usual (assuming, of course, that you're sane and using PostgreSQL for development, testing, and production).

With this change made, your triggers will be tracked in db/structure.sql and recreating the database won't lose them.
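
For illustration, here is the kind of thing that survives in db/structure.sql but would never appear in schema.rb. This is a sketch only; the widgets table and the touch_updated_at() function are invented names:

-- A trigger like this, created from a migration with execute, shows up
-- verbatim in db/structure.sql, while schema.rb would silently drop it.
CREATE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER widgets_touch_updated_at
    BEFORE UPDATE ON widgets
    FOR EACH ROW EXECUTE PROCEDURE touch_updated_at();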

Insert values on trigger in temporal tables in Postgres

https://www.postgresql.org/docs/current/static/plpgsql-trigger.html

A trigger procedure is created with the CREATE FUNCTION command,
declaring it as a function with no arguments and a return type of
trigger

and also

  1. the same trigger can't fire both before and after an event; just create two triggers if you really need both (see the sketch below)

https://www.postgresql.org/docs/current/static/sql-createtrigger.html

Determines whether the function is called before, after, or instead of
the event.
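
As a sketch of that workaround, you wire the same function up twice with different timings. The orders table and the log_changes() function here are hypothetical; log_changes() would be an ordinary RETURNS trigger function defined elsewhere:

-- One CREATE TRIGGER can only pick one timing, so use two of them.
CREATE TRIGGER orders_before_update
    BEFORE UPDATE ON orders
    FOR EACH ROW EXECUTE PROCEDURE log_changes();

CREATE TRIGGER orders_after_update
    AFTER UPDATE ON orders
    FOR EACH ROW EXECUTE PROCEDURE log_changes();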


  2. use NEW instead of OLD for new values (see the sketch below)

https://www.postgresql.org/docs/current/static/plpgsql-trigger.html

NEW

Data type RECORD; variable holding the new database row for
INSERT/UPDATE operations in row-level triggers. This variable is
unassigned in statement-level triggers and for DELETE operations.
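
As a small illustration of the difference, a BEFORE row trigger can read OLD and even rewrite NEW before the row is stored. The products table and its price/previous_price columns are invented for this sketch:

-- Hypothetical: keep a note of the previous price on each update.
CREATE OR REPLACE FUNCTION remember_old_price() RETURNS trigger AS $$
BEGIN
    IF NEW.price IS DISTINCT FROM OLD.price THEN
        NEW.previous_price := OLD.price;  -- NEW can be modified in a BEFORE trigger
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER products_remember_old_price
    BEFORE UPDATE ON products
    FOR EACH ROW EXECUTE PROCEDURE remember_old_price();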

Postgres - Track changes

For your specific problem, I guess that you could use NEW.mytable_id and a trigger AFTER INSERT ...

Here is an example; I originally wrote only the first IF branch (for INSERT), but the function below also handles UPDATE and DELETE.

CREATE TABLE IF NOT EXISTS sales.history (
    history_id serial PRIMARY KEY NOT NULL,
    mytable_id integer,
    table_name varchar(30),
    new_val    json,
    old_val    json,
    operation  varchar(10),
    CONSTRAINT "fk sales.mytable_id to history.mytable_id"
        FOREIGN KEY (mytable_id) REFERENCES sales.sales (mytable_id)
) TABLESPACE pg_default;


-- Trigger function: write one sales.history row for every change on sales.sales.
-- TG_TABLE_NAME replaces the deprecated TG_RELNAME.
CREATE OR REPLACE FUNCTION core.func_store_history_changes() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO sales.history (table_name, operation, mytable_id, new_val)
        VALUES (TG_TABLE_NAME, TG_OP, NEW.mytable_id, row_to_json(NEW));
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO sales.history (table_name, operation, mytable_id, new_val, old_val)
        VALUES (TG_TABLE_NAME, TG_OP, NEW.mytable_id, row_to_json(NEW), row_to_json(OLD));
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        -- no mytable_id here: the referenced sales row is gone, so the FK would fail
        INSERT INTO sales.history (table_name, operation, old_val)
        VALUES (TG_TABLE_NAME, TG_OP, row_to_json(OLD));
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

-- The function must exist before the trigger that calls it.
CREATE TRIGGER t_history AFTER INSERT OR UPDATE OR DELETE ON sales.sales
    FOR EACH ROW EXECUTE PROCEDURE core.func_store_history_changes();

Result:

# insert into sales.sales (value) values ('2.3');
INSERT 0 1
# select * from sales.sales;
 mytable_id | value
------------+-------
          5 |  2.30
(2 rows)

# select * from sales.history;
 history_id | mytable_id | table_name |            new_val            | old_val | operation
------------+------------+------------+-------------------------------+---------+-----------
          3 |          5 | sales      | {"mytable_id":5,"value":2.30} |         | INSERT
(2 rows)

But maybe you could instead try this generic audit trigger, which works for any table:

https://github.com/2ndQuadrant/audit-trigger
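
If I recall the project's README correctly, once its audit.sql has been loaded, enabling auditing on a table is a one-liner; treat the exact function name as an assumption to confirm against the repository:

-- Assumed usage per the audit-trigger README: load audit.sql first,
-- then turn on row-level auditing for a table.
SELECT audit.audit_table('sales.sales');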

Method to track price drops for a product on a user's wishlist?

You could use an update trigger on Product, but it sounds like the Product table is not actually being updated; it is being replaced. If that is the case, there is no UPDATE to fire the trigger, so you'll have to do it the hard way: cache the price in Wishlist (as user247245 notes) and scan Product for price changes.

If Products is being updated rather than replaced wholesale, then an update trigger could be used to note when a price changes and arrange to have the interested parties notified. The trigger would probably queue up the notifications by inserting them into a separate table (to avoid locking up the Products update) called, say, Product_price_changes; then, when the Products update is done, a separate task can compare Product_price_changes to Wishlist, notify the interested users, and finish off by deleting everything in Product_price_changes.

Note that in recent versions of PostgreSQL you can restrict an update trigger to fire only when a particular column is updated, rather than firing on every row change.
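
A rough sketch of that approach; the products table, its price and product_id columns, and the layout of the queue table are all assumed names here:

-- Queue a row whenever a product's price drops; a separate job later
-- joins this against Wishlist, notifies users, and clears the queue.
CREATE TABLE IF NOT EXISTS product_price_changes (
    product_id integer NOT NULL,
    old_price  numeric NOT NULL,
    new_price  numeric NOT NULL,
    changed_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION queue_price_drop() RETURNS trigger AS $$
BEGIN
    IF NEW.price < OLD.price THEN
        INSERT INTO product_price_changes (product_id, old_price, new_price)
        VALUES (NEW.product_id, OLD.price, NEW.price);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- UPDATE OF price keeps the trigger from firing on unrelated column updates.
CREATE TRIGGER products_queue_price_drop
    AFTER UPDATE OF price ON products
    FOR EACH ROW EXECUTE PROCEDURE queue_price_drop();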

Is there any way to see when my attribute was set?

Postgres does keep some column-level statistics, updated when an ANALYZE is done or an auto-analyze occurs. These include things like the Most Common Values and Histogram Bounds.

However, these are for the column as a whole, across all rows. If you want to have information about an individual row's columns, you'll need to either add additional columns to the table, or create another table (or several) to store the additional information, and in either case update the tracking information accordingly.

These types of audit trails can be automated to some degree by using trigger functions to examine what's being changed: if one of the columns you're interested in has changed, you can automatically update the timestamp (if it's inline in the table) or insert a row into your audit table(s).

However, there is nothing built in at the granularity you appear to be looking for.

If you are just looking for this for one or a handful of attributes, adding a column inline to the table and updating it (in the insert/update themselves, or via a trigger function) is likely your best bet.
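
A sketch of that inline-column variant, assuming a hypothetical accounts table where the attribute you care about is status:

-- Track when the "status" attribute last changed by keeping a
-- status_changed_at column inline in the table.
ALTER TABLE accounts ADD COLUMN IF NOT EXISTS status_changed_at timestamptz;

CREATE OR REPLACE FUNCTION touch_status_changed_at() RETURNS trigger AS $$
BEGIN
    IF NEW.status IS DISTINCT FROM OLD.status THEN
        NEW.status_changed_at := now();
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER accounts_touch_status_changed_at
    BEFORE UPDATE ON accounts
    FOR EACH ROW EXECUTE PROCEDURE touch_status_changed_at();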

If you are looking for something more systemic, i.e. you want to know this for all columns in a bunch of tables, then a more full-fledged auditing system implemented via trigger functions and stored in separate tables is probably a better bet.

How to incrementally populate a data warehouse?

Here's the way I typically handle updates and deletes.

Updates in the Source System

Some DBMSs provide a column which, if added to all the tables, gives the warehouse a unique identifier that is always increasing. SQL Server has the TIMESTAMP (rowversion) column, and Oracle provides the ora_rowscn pseudocolumn, which is only accurate at the block level by default.

While I haven't used it, Postgres has the xmin pseudocolumn, which I believe could be used in a similar fashion. There are some concerns over it but I think for data warehouse change tracking purposes, it may do the trick.

UPDATE triggers in the source system that maintain a last-modified date are another option. Keep this date at a very high precision to reduce the risk of "missing" records if something is running massive updates on the ODS when you do your extract.
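
A sketch of the trigger plus the incremental extract that reads the watermark; the ods.orders table, its last_modified column, and the :last_extracted_at placeholder are assumptions for illustration:

-- clock_timestamp() gives the actual wall-clock time; now() is fixed at
-- transaction start, so a bulk update would stamp every row identically.
CREATE OR REPLACE FUNCTION ods.touch_last_modified() RETURNS trigger AS $$
BEGIN
    NEW.last_modified := clock_timestamp();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_touch_last_modified
    BEFORE UPDATE ON ods.orders
    FOR EACH ROW EXECUTE PROCEDURE ods.touch_last_modified();

-- Incremental extract: pull everything modified since the previous high-water mark.
-- :last_extracted_at is a placeholder supplied by the ETL job.
SELECT *
FROM ods.orders
WHERE last_modified > :last_extracted_at;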

Deletes in the Source System

As for deleted records, my preferred solution is to ensure that all of the source tables have a primary key (preferably one column, though multiple is doable). I extract the entirety of this column into a stage table daily, then identify the target-table rows whose keys are missing from the source, and update a "source deleted" flag or something on those target records. I typically only do this for dimension tables, since fact tables should retain history even if the original transaction is gone.
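
A sketch of that daily comparison, with invented names (ods.customers standing in for the source extract, stage.customer_keys for the key dump, dw.dim_customer for the target dimension):

-- Refresh the daily key dump; the INSERT ... SELECT stands in for the
-- cross-system extract of the source primary keys.
TRUNCATE stage.customer_keys;

INSERT INTO stage.customer_keys (customer_id)
SELECT customer_id FROM ods.customers;

-- Flag dimension rows whose key no longer exists in the source.
UPDATE dw.dim_customer d
SET    source_deleted = true
WHERE  NOT EXISTS (
    SELECT 1 FROM stage.customer_keys s
    WHERE  s.customer_id = d.customer_id
);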


