Rails, PostgreSQL, and History Triggers
If you need or want database-specific features that ActiveRecord doesn't understand, you should switch to db/structure.sql for keeping track of your schema. db/structure.sql is essentially a raw dump of your schema made with the database's native tools, so it will contain triggers, CHECK constraints, indexes on expressions, and everything else.
Switching is easy:
- Update your config/application.rb to contain config.active_record.schema_format = :sql.
- Run rake db:structure:dump to get an initial db/structure.sql.
- Delete db/schema.rb from your directory tree and revision control.
- Add db/structure.sql to revision control.
- Adjust your rake habits:
  - Use db:structure:dump instead of db:schema:dump
  - Use db:structure:load instead of db:schema:load
Everything else should work as usual (assuming, of course, that you're sane and using PostgreSQL for development, testing, and production).
With this change made, your triggers will be tracked in db/structure.sql
and recreating the database won't lose them.
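The application.rb change from the first step looks like this (a minimal sketch; the module name is whatever your application is called):

```ruby
# config/application.rb
module MyApp  # hypothetical application name
  class Application < Rails::Application
    # Dump the schema with the database's native tools (pg_dump for
    # PostgreSQL) instead of generating db/schema.rb.
    config.active_record.schema_format = :sql
  end
end
```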
insert values on trigger in temporal tables in postgres
https://www.postgresql.org/docs/current/static/plpgsql-trigger.html
A trigger procedure is created with the CREATE FUNCTION command, declaring it as a function with no arguments and a return type of trigger.
Also note:
- The same trigger can't fire both before and after an event; just create two triggers if you really need that.
https://www.postgresql.org/docs/current/static/sql-createtrigger.html
Determines whether the function is called before, after, or instead of
the event.
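The two-triggers workaround is straightforward. A minimal sketch, with hypothetical table and function names (accounts, log_before(), log_after()):

```sql
-- Two separate triggers on the same table: one BEFORE, one AFTER.
-- Each trigger has exactly one timing; you cannot combine them.
CREATE TRIGGER t_accounts_before
BEFORE UPDATE ON accounts
FOR EACH ROW EXECUTE PROCEDURE log_before();

CREATE TRIGGER t_accounts_after
AFTER UPDATE ON accounts
FOR EACH ROW EXECUTE PROCEDURE log_after();
```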
- Use NEW instead of OLD for new values.
https://www.postgresql.org/docs/current/static/plpgsql-trigger.html
NEW: Data type RECORD; variable holding the new database row for INSERT/UPDATE operations in row-level triggers. This variable is unassigned in statement-level triggers and for DELETE operations.
Postgres - Track changes
For your specific problem, I guess that you could use NEW.mytable_id and an AFTER INSERT trigger.
Here is an example; I've written out only the first branch (INSERT) in detail.
CREATE TABLE IF NOT EXISTS sales.history (
history_id serial PRIMARY KEY NOT NULL,
mytable_id integer,
table_name varchar(30),
new_val json,
old_val json,
operation varchar(10),
CONSTRAINT "fk sales.mytable_id to history.mytable_id" FOREIGN KEY (mytable_id) REFERENCES sales.sales (mytable_id)
) TABLESPACE pg_default;
CREATE OR REPLACE FUNCTION core.func_store_history_changes() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    INSERT INTO sales.history (table_name, operation, mytable_id, new_val)
    VALUES (TG_TABLE_NAME, TG_OP, NEW.mytable_id, row_to_json(NEW));
    RETURN NEW;
  ELSIF TG_OP = 'UPDATE' THEN
    INSERT INTO sales.history (table_name, operation, new_val, old_val)
    VALUES (TG_TABLE_NAME, TG_OP, row_to_json(NEW), row_to_json(OLD));
    RETURN NEW;
  ELSIF TG_OP = 'DELETE' THEN
    -- mytable_id is omitted here: the source row is already gone,
    -- so the foreign key back to sales.sales could not be satisfied.
    INSERT INTO sales.history (table_name, operation, old_val)
    VALUES (TG_TABLE_NAME, TG_OP, row_to_json(OLD));
    RETURN OLD;
  END IF;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

-- The function must exist before the trigger that references it:
CREATE TRIGGER t_history AFTER INSERT OR UPDATE OR DELETE ON sales.sales
FOR EACH ROW EXECUTE PROCEDURE core.func_store_history_changes();
Result:
# insert into sales.sales (value) values ('2.3');
INSERT 0 1
# select * from sales.sales;
 mytable_id | value
------------+-------
          5 |  2.30
(1 row)
# select * from sales.history;
 history_id | mytable_id | table_name | new_val                       | old_val | operation
------------+------------+------------+-------------------------------+---------+-----------
          3 |          5 | sales      | {"mytable_id":5,"value":2.30} |         | INSERT
(1 row)
Alternatively, you could try this generic audit trigger, which works for any table:
https://github.com/2ndQuadrant/audit-trigger
Method to track price drops for a product on a user's wishlist?
You could use an update trigger on Product, but it sounds like the Product table is not actually being updated; it is being replaced. If that is the case, there is no UPDATE to fire the trigger, so you'll have to do it the hard way: cache the price in Wishlist (as user247245 notes) and scan Product for price changes.
If Products is being updated rather than replaced wholesale, then an update trigger could be used to note when a price changes and arrange to have the interested parties notified. The trigger would probably queue up the notifications by inserting them into a separate table (to avoid locking up the Products update) called, say, Product_price_changes; then, when the Products update is done, a separate task can compare Product_price_changes to Wishlist, notify the interested users, and finish off by deleting everything in Product_price_changes.
Note that in recent versions of PostgreSQL you can restrict an update trigger so it fires only when a particular column changes, rather than on every row change.
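A price-drop trigger restricted this way might look like the following sketch (table, column, and function names are hypothetical):

```sql
-- Fires only when an UPDATE statement mentions the price column,
-- and only for rows where the price actually dropped.
CREATE TRIGGER t_price_drop
AFTER UPDATE OF price ON products
FOR EACH ROW
WHEN (NEW.price < OLD.price)
EXECUTE PROCEDURE queue_price_change();
```

The UPDATE OF clause skips the trigger entirely for statements that don't touch price, and the WHEN clause avoids even calling the function for rows where the price stayed the same or went up.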
Is there any way to see when my attribute was set?
Postgres does keep some column-level statistics, updated when an ANALYZE
is done or an auto-analyze occurs. These include things like the Most Common Values and Histogram Bounds.
However, these are for the column as a whole, across all rows. If you want to have information about an individual row's columns, you'll need to add either additional columns to the table, or create another table (or several) to store the additional information, and in either case update the tracking information accordingly.
These types of audit trails can be automated to some degree by the use of trigger functions to examine what's being changed, and if one of the columns you're interested in has changed, you can automatically update the timestamp (if inline in the table) or insert a row in your audit table(s).
However, there is not anything built in at the granularity at which you appear to be looking.
If you are just looking for this for one or a handful of attributes, adding a column inline to the table and updating it (in the insert/update themselves, or via a trigger function), is likely your best bet.
If you are looking for something more systemic, i.e. you want to know this for all columns in a bunch of tables, then a more full-fledged auditing system implemented via trigger functions and stored in separate tables is probably a better bet.
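The inline-column approach from the paragraph above can be sketched with a BEFORE UPDATE trigger like this (the table, the status column, and the status_changed_at column are all hypothetical):

```sql
-- Stamp status_changed_at only when the status column actually changes.
CREATE OR REPLACE FUNCTION touch_status_changed_at() RETURNS trigger AS $$
BEGIN
  -- IS DISTINCT FROM treats NULLs sanely, unlike <>.
  IF NEW.status IS DISTINCT FROM OLD.status THEN
    NEW.status_changed_at := now();
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_touch_status
BEFORE UPDATE ON orders
FOR EACH ROW EXECUTE PROCEDURE touch_status_changed_at();
```

Because the trigger is BEFORE UPDATE, it can modify NEW in place; no second write is needed.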
How to incrementally populate a data warehouse?
Here's the way I typically handle updates and deletes.
Updates in the Source System
Some DBMS's provide a column, which, if added to all the tables, provides the warehouse with a unique identifier that is always increasing. SQL Server has the TIMESTAMP column. Oracle provides the ora_rowscn pseudocolumn, which is good at the block level.
While I haven't used it, Postgres has the xmin pseudocolumn, which I believe could be used in a similar fashion. There are some concerns over it but I think for data warehouse change tracking purposes, it may do the trick.
UPDATE triggers in the source system that set a last-modified date are another option. Keep this date at a very high precision to reduce the risk of "missing" records if something is doing massive updates on the ODS when you do your extract.
Deletes in the Source System
As for deleted records, my preferred solution is to ensure that all of the source tables have a primary key (preferably one column, though multiple is doable). I extract the entirety of this column into a stage table daily, then identify the rows that are "missing" from the source compared to the target, and update a "source deleted" flag or something on the target records. I typically only do this for dimension tables, since fact tables should retain history even if the original transaction is gone.
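The "missing keys" comparison described above can be sketched as follows (all table and column names are hypothetical):

```sql
-- stage.source_keys: today's full extract of primary keys from the source.
-- Flag target rows whose key no longer exists in the source system.
UPDATE dw.dim_customer AS d
SET source_deleted = true
WHERE NOT EXISTS (
    SELECT 1
    FROM stage.source_keys AS s
    WHERE s.customer_id = d.customer_id
);
```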