How to get a real time within PostgreSQL transaction?
timeofday() may work for you (clock_timestamp() is the modern equivalent that returns a proper timestamp with time zone instead of text).
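A quick sketch of the difference: now() is frozen at transaction start, while clock_timestamp() (and the older timeofday()) keep advancing.

```sql
BEGIN;
SELECT now(), clock_timestamp();  -- now() = transaction start time
SELECT pg_sleep(2);
SELECT now(), clock_timestamp();  -- now() unchanged; clock_timestamp() roughly 2 s later
COMMIT;
```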
How to find out when data was inserted to Postgres?
Postgres 9.5 or later
You can enable track_commit_timestamp in postgresql.conf (and restart) to start tracking commit timestamps. Then you can get a timestamp for your xmin. Related answer:
- Atomically set SERIAL value when committing transaction
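With track_commit_timestamp enabled, pg_xact_commit_timestamp() maps a row's xmin to its commit time (tbl is a placeholder table name; the function returns NULL for rows committed before the setting was enabled):

```sql
SELECT pg_xact_commit_timestamp(xmin) AS committed_at, *
FROM   tbl
LIMIT  10;
```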
Postgres 9.4 or older
There is no such metadata in PostgreSQL unless you record it yourself.
You may be able to deduce some information from the row headers (HeapTupleHeaderData), in particular from the insert transaction id xmin. It holds the ID of the transaction in which the row was inserted (needed to decide visibility in PostgreSQL's MVCC model). Try (for any table):
SELECT xmin, * FROM tbl LIMIT 10;
Some limitations apply:
- If the database was dumped and restored then, obviously, the information is gone - all rows are inserted in the same transaction.
- If the database is huge / very old / very heavily written, then it may have gone through transaction ID wraparound, and the order of numbers in xmin is ambiguous.
But for most databases you should be able to derive:
- the chronological order of INSERTs
- which rows were inserted together
- when there (probably) was a long period of time between inserts
No timestamp, though.
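A sketch of deriving that information (tbl is a placeholder; the cast via text works on common versions but is not wraparound-safe):

```sql
-- rows inserted in the same transaction share an xmin
SELECT xmin, count(*) AS rows_inserted_together
FROM   tbl
GROUP  BY xmin
ORDER  BY xmin::text::bigint;  -- approximate chronological order, pre-wraparound
```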
App to monitor PostgreSQL queries in real time?
With PostgreSQL 8.4 or higher you can use the contrib module pg_stat_statements to gather query execution statistics of the database server.
Run the SQL script of this contrib module, pg_stat_statements.sql (on Ubuntu it can be found in /usr/share/postgresql/<version>/contrib), in your database and add this sample configuration to your postgresql.conf (requires restart):
custom_variable_classes = 'pg_stat_statements'
pg_stat_statements.max = 1000
pg_stat_statements.track = top # top,all,none
pg_stat_statements.save = off
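Once loaded, the statistics are exposed through the pg_stat_statements view, e.g. the most time-consuming queries (column names vary by version; total_time was renamed to total_exec_time in PostgreSQL 13):

```sql
SELECT query, calls, total_time, rows
FROM   pg_stat_statements
ORDER  BY total_time DESC
LIMIT  10;
```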
To see what queries are executed in real time you might want to just configure the server log to show all queries or queries with a minimum execution time. To do so, set the logging configuration parameters log_statement and log_min_duration_statement in your postgresql.conf accordingly.
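A minimal postgresql.conf sketch for the logging approach (the threshold value is illustrative):

```
log_statement = 'none'              # or 'all' to log every statement
log_min_duration_statement = 200    # log statements running longer than 200 ms
```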
Postgres now() timestamp doesn't change, when script works
From TFM, highlights mine:
9.9.4. Current Date/Time
PostgreSQL provides a number of functions that return values related to the current date and time. These SQL-standard functions all return values based on the start time of the current transaction:
CURRENT_DATE
CURRENT_TIME
CURRENT_TIMESTAMP
CURRENT_TIME(precision)
CURRENT_TIMESTAMP(precision)
LOCALTIME
LOCALTIMESTAMP
LOCALTIME(precision)
LOCALTIMESTAMP(precision)
...
Since these functions return the start time of the current transaction, their values do not change during the transaction. This is considered a feature: the intent is to allow a single transaction to have a consistent notion of the "current" time, so that multiple modifications within the same transaction bear the same time stamp.
PostgreSQL also provides functions that return the start time of the current statement, as well as the actual current time at the instant the function is called. The complete list of non-SQL-standard time functions is:
transaction_timestamp()
statement_timestamp()
clock_timestamp()
timeofday()
now()
transaction_timestamp() is equivalent to CURRENT_TIMESTAMP, but is named to clearly reflect what it returns. statement_timestamp() returns the start time of the current statement (more specifically, the time of receipt of the latest command message from the client). statement_timestamp() and transaction_timestamp() return the same value during the first command of a transaction, but might differ during subsequent commands. clock_timestamp() returns the actual current time, and therefore its value changes even within a single SQL command. timeofday() is a historical PostgreSQL function. Like clock_timestamp(), it returns the actual current time, but as a formatted text string rather than a timestamp with time zone value. now() is a traditional PostgreSQL equivalent to transaction_timestamp().
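The distinction is visible even within a single statement (a sketch; exact values depend on execution timing):

```sql
SELECT statement_timestamp() AS stmt_start,
       clock_timestamp()     AS t1,
       pg_sleep(1),
       clock_timestamp()     AS t2;  -- t1 and t2 typically differ by about 1 s
```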
PostgreSQL to Data-Warehouse: Best approach for near-real-time ETL / extraction of data
Assuming that your tables of interest have (or can be augmented with) a unique, indexed, sequential key, then you will get much better value out of simply issuing SELECT ... FROM table ... WHERE key > :last_max_key with output to a file, where last_max_key is the last key value from the last extraction (0 if first extraction). This incremental, decoupled approach avoids introducing trigger latency in the insertion datapath (be it custom triggers or modified Slony), and depending on your setup could scale better with the number of CPUs etc. (However, if you also have to track UPDATEs, and the sequential key was added by you, then your UPDATE statements should SET the key column to NULL so it gets a new value and gets picked up by the next extraction. You would not be able to track DELETEs without a trigger.) Is this what you had in mind when you mentioned Talend?
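The incremental pull itself can be as simple as the following (events and id are hypothetical names; the server process writes the file, so the path is server-side):

```sql
-- extract everything newer than the last key we saw (here 123456)
COPY (
    SELECT *
    FROM   events
    WHERE  id > 123456
    ORDER  BY id
) TO '/tmp/delta_123456.csv' WITH CSV HEADER;
```

On the next run, replace 123456 with the maximum id seen in this extraction.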
I would not use the logging facility unless you cannot implement the solution above; logging most likely involves locking overhead to ensure log lines are written sequentially and do not overlap/overwrite each other when multiple backends write to the log (check the Postgres source.) The locking overhead may not be catastrophic, but you can do without it if you can use the incremental SELECT alternative. Moreover, statement logging would drown out any useful WARNING or ERROR messages, and the parsing itself will not be instantaneous.
Unless you are willing to parse WALs (including transaction state tracking, and being ready to rewrite the code every time you upgrade Postgres) I would not necessarily use the WALs either -- that is, unless you have the extra hardware available, in which case you could ship WALs to another machine for extraction (on the second machine you can use triggers shamelessly -- or even statement logging -- since whatever happens there does not affect INSERT/UPDATE/DELETE performance on the primary machine.) Note that performance-wise (on the primary machine), unless you can write the logs to a SAN, you'd get a comparable performance hit (in terms of thrashing the filesystem cache, mostly) from shipping WALs to a different machine as from running the incremental SELECT.