How to Get a Real Time Within a PostgreSQL Transaction

How to get a real time within a PostgreSQL transaction?

timeofday() may work for you. Unlike now(), it is not frozen at the start of the transaction; clock_timestamp() behaves the same way but returns a timestamp with time zone instead of formatted text.
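A minimal sketch of the difference, run inside a single transaction (pg_sleep() just makes the gap visible):

BEGIN;
SELECT now(), timeofday();  -- note both times
SELECT pg_sleep(1);
SELECT now(), timeofday();  -- now() is unchanged; timeofday() has moved on
COMMIT;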

How to find out when data was inserted into Postgres?

Postgres 9.5 or later

You can enable track_commit_timestamp in postgresql.conf (and restart) to start tracking commit timestamps. Then you can retrieve a commit timestamp for a row's xmin with pg_xact_commit_timestamp() -- see the sketch after the link below. Related answer:

  • Atomically set SERIAL value when committing transaction
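A minimal sketch, assuming track_commit_timestamp was already on when the rows were inserted (rows committed before that yield NULL):

SELECT pg_xact_commit_timestamp(xmin) AS committed_at, *
FROM   tbl
LIMIT  10;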

Postgres 9.4 or older

There is no such metadata in PostgreSQL unless you record it yourself.

You may be able to deduce some information from the row headers (HeapTupleHeaderData), in particular from the insert transaction id xmin. It holds the ID of the transaction in which the row was inserted (needed to decide visibility in PostgreSQL's MVCC model). Try (for any table):

SELECT xmin, * FROM tbl LIMIT 10;

Some limitations apply:

  • If the database was dumped and restored then, obviously, the information is gone: all rows were inserted in the same transaction.
  • If the database is huge / very old / very heavily written, then it may have gone through transaction ID wraparound, and the order of numbers in xmin is ambiguous.

But for most databases you should be able to derive (see the sketch below):

  • the chronological order of INSERTs
  • which rows were inserted together
  • when there (probably) was a long period of time between inserts

No timestamp, though.
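A hedged sketch of that derivation, assuming the table is called tbl and no wraparound has occurred (the xid type has no ordering operator, hence the cast):

SELECT xmin::text::bigint AS inserting_xid,
       count(*)           AS rows_inserted_together
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Rows sharing an inserting_xid were inserted in the same transaction; large gaps in the xid sequence suggest (but do not prove) long periods between inserts.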

App to monitor PostgreSQL queries in real time?

With PostgreSQL 8.4 or higher you can use the contrib module pg_stat_statements to gather query execution statistics of the database server.

Run the SQL script of this contrib module, pg_stat_statements.sql (on Ubuntu it can be found in /usr/share/postgresql/<version>/contrib), in your database and add this sample configuration to your postgresql.conf (requires a restart):

custom_variable_classes = 'pg_stat_statements'
pg_stat_statements.max = 1000
pg_stat_statements.track = top # top,all,none
pg_stat_statements.save = off
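Once the server is back up you can query the collected statistics, e.g. the ten most expensive statements so far (a minimal sketch; note that PostgreSQL 13 renamed total_time to total_exec_time):

SELECT query, calls, total_time
FROM   pg_stat_statements
ORDER  BY total_time DESC
LIMIT  10;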

To see what queries are executed in real time you might want to just configure the server log to show all queries or queries with a minimum execution time. To do so set the logging configuration parameters log_statement and log_min_duration_statement in your postgresql.conf accordingly.
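For example (illustrative values; these two settings only need a reload, not a restart):

log_statement = 'all'              # log every statement: none, ddl, mod, all
log_min_duration_statement = 250   # also log statements running longer than 250 ms; -1 disables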

Postgres now() timestamp doesn't change while the script runs

From TFM, highlights mine:

9.9.4. Current Date/Time


PostgreSQL provides a number of functions that return values related
to the current date and time. These SQL-standard functions all
return values based on *the start time of the current transaction*:

CURRENT_DATE
CURRENT_TIME
CURRENT_TIMESTAMP
CURRENT_TIME(precision)
CURRENT_TIMESTAMP(precision)
LOCALTIME
LOCALTIMESTAMP
LOCALTIME(precision)
LOCALTIMESTAMP(precision)

...

Since these functions return the start time of the current
transaction, their values do not change during the transaction. This
is considered a feature: the intent is to allow a single transaction
to have a consistent notion of the "current" time, so that multiple
modifications within the same transaction bear the same time stamp.

PostgreSQL also provides functions that return the start time of the
current statement, as well as the actual current time at the instant
the function is called. The complete list of non-SQL-standard time
functions is:

transaction_timestamp()
statement_timestamp()
clock_timestamp()
timeofday()
now()

transaction_timestamp() is equivalent to CURRENT_TIMESTAMP, but is
named to clearly reflect what it returns. statement_timestamp()
returns the start time of the current statement (more specifically,
the time of receipt of the latest command message from the client).
statement_timestamp() and transaction_timestamp() return the same
value during the first command of a transaction, but might differ
during subsequent commands. clock_timestamp() returns *the actual
current time*, and therefore its value changes even within a single SQL
command. timeofday() is a historical PostgreSQL function. Like
clock_timestamp(), it returns the actual current time, but as a
formatted text string rather than a timestamp with time zone value.
now() is a traditional PostgreSQL equivalent to transaction_timestamp().
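A minimal sketch making the distinction visible (pg_sleep() just spaces the commands out):

BEGIN;
SELECT transaction_timestamp(), statement_timestamp(), clock_timestamp();
SELECT pg_sleep(1);
SELECT transaction_timestamp(), statement_timestamp(), clock_timestamp();
-- transaction_timestamp() is identical in both rows; statement_timestamp()
-- and clock_timestamp() are about a second later the second time around.
COMMIT;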

PostgreSQL to Data-Warehouse: Best approach for near-real-time ETL / extraction of data

Assuming that your tables of interest have (or can be augmented with) a unique, indexed, sequential key, you will get much, much better value out of simply issuing SELECT ... FROM table ... WHERE key > :last_max_key with the output directed to a file (sketched below), where last_max_key is the last key value from the previous extraction (0 on the first extraction). This incremental, decoupled approach avoids introducing trigger latency into the insertion datapath (be it custom triggers or modified Slony) and, depending on your setup, could scale better with the number of CPUs.

(However, if you also have to track UPDATEs, and the sequential key was added by you, then your UPDATE statements should SET the key column to NULL so that it gets a new value and gets picked up by the next extraction. You would not be able to track DELETEs without a trigger.) Is this what you had in mind when you mentioned Talend?
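A minimal sketch of the incremental pull, using hypothetical names (a table orders with a bigserial key order_id; :last_max_key is supplied by the extraction job):

SELECT *
FROM   orders
WHERE  order_id > :last_max_key  -- highest key seen by the previous run; 0 on the first run
ORDER  BY order_id;

From psql the result can go straight to a file, e.g. \copy (SELECT ... ) TO 'extract.csv' CSV.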

I would not use the logging facility unless you cannot implement the solution above; logging most likely involves locking overhead to ensure log lines are written sequentially and do not overlap/overwrite each other when multiple backends write to the log (check the Postgres source.) The locking overhead may not be catastrophic, but you can do without it if you can use the incremental SELECT alternative. Moreover, statement logging would drown out any useful WARNING or ERROR messages, and the parsing itself will not be instantaneous.

Unless you are willing to parse WALs (including transaction state tracking, and being ready to rewrite the code every time you upgrade Postgres) I would not necessarily use the WALs either -- that is, unless you have extra hardware available, in which case you could ship WALs to another machine for extraction (on the second machine you can use triggers shamelessly, or even statement logging, since whatever happens there does not affect INSERT/UPDATE/DELETE performance on the primary machine). Note that, performance-wise on the primary machine, unless you can write the logs to a SAN, you would take a comparable performance hit (mostly from thrashing the filesystem cache) from shipping WALs to a different machine as from running the incremental SELECT.


