Merge a table and a change log into a view in PostgreSQL
Assuming Postgres 9.1 or later.
I simplified / optimized your basic query to retrieve the latest values:
SELECT DISTINCT ON (1,2)
c.unique_id, a.attname AS col, c.value
FROM pg_attribute a
LEFT JOIN changes c ON c.column_name = a.attname
AND c.table_name = 'instances'
-- AND c.unique_id = 3 -- uncomment to fetch single row
WHERE a.attrelid = 'instances'::regclass -- schema-qualify to be clear?
AND a.attnum > 0 -- no system columns
AND NOT a.attisdropped -- no deleted columns
ORDER BY 1, 2, c.updated_at DESC;
I query the PostgreSQL catalog instead of the standard information schema because that is faster. Note the special cast to ::regclass
.
Now, that gives you a table. You want all values for one unique_id
in a row.
To achieve that you have basically three options:
One subselect (or join) per column. Expensive and unwieldy. But a valid option for only a few columns.
A big
CASE
statement.A pivot function. PostgreSQL provides the
crosstab()
function in the additional moduletablefunc
for that.
Basic instructions:- PostgreSQL Crosstab Query
Basic pivot table with crosstab()
I completely rewrote the function:
SELECT *
FROM crosstab(
$x$
SELECT DISTINCT ON (1, 2)
unique_id, column_name, value
FROM changes
WHERE table_name = 'instances'
-- AND unique_id = 3 -- un-comment to fetch single row
ORDER BY 1, 2, updated_at DESC;
$x$,
$y$
SELECT attname
FROM pg_catalog.pg_attribute
WHERE attrelid = 'instances'::regclass -- possibly schema-qualify table name
AND attnum > 0
AND NOT attisdropped
AND attname <> 'unique_id'
ORDER BY attnum
$y$
)
AS tbl (
unique_id integer
-- !!! You have to list all columns in order here !!! --
);
I separated the catalog lookup from the value query, as the crosstab()
function with two parameters provides column names separately. Missing values (no entry in changes) are substituted with NULL
automatically. A perfect match for this use case!
Assuming that attname
matches column_name
. Excluding unique_id
, which plays a special role.
Full automation
Addressing your comment: There is a way to supply the column definition list automatically. It's not for the faint of heart, though.
I use a number of advanced Postgres features here: crosstab()
, plpgsql function with dynamic SQL, composite type handling, advanced dollar quoting, catalog lookup, aggregate function, window function, object identifier type, ...
Test environment:
CREATE TABLE instances (
unique_id int
, col1 text
, col2 text -- two columns are enough for the demo
);
INSERT INTO instances VALUES
(1, 'foo1', 'bar1')
, (2, 'foo2', 'bar2')
, (3, 'foo3', 'bar3')
, (4, 'foo4', 'bar4');
CREATE TABLE changes (
unique_id int
, table_name text
, column_name text
, value text
, updated_at timestamp
);
INSERT INTO changes VALUES
(1, 'instances', 'col1', 'foo11', '2012-04-12 00:01')
, (1, 'instances', 'col1', 'foo12', '2012-04-12 00:02')
, (1, 'instances', 'col1', 'foo1x', '2012-04-12 00:03')
, (1, 'instances', 'col2', 'bar11', '2012-04-12 00:11')
, (1, 'instances', 'col2', 'bar17', '2012-04-12 00:12')
, (1, 'instances', 'col2', 'bar1x', '2012-04-12 00:13')
, (2, 'instances', 'col1', 'foo2x', '2012-04-12 00:01')
, (2, 'instances', 'col2', 'bar2x', '2012-04-12 00:13')
-- NO change for col1 of row 3 - to test NULLs
, (3, 'instances', 'col2', 'bar3x', '2012-04-12 00:13');
-- NO changes at all for row 4 - to test NULLs
Automated function for one table
CREATE OR REPLACE FUNCTION f_curr_instance(int, OUT t public.instances) AS
$func$
BEGIN
EXECUTE $f$
SELECT *
FROM crosstab($x$
SELECT DISTINCT ON (1,2)
unique_id, column_name, value
FROM changes
WHERE table_name = 'instances'
AND unique_id = $f$ || $1 || $f$
ORDER BY 1, 2, updated_at DESC;
$x$
, $y$
SELECT attname
FROM pg_catalog.pg_attribute
WHERE attrelid = 'public.instances'::regclass
AND attnum > 0
AND NOT attisdropped
AND attname <> 'unique_id'
ORDER BY attnum
$y$) AS tbl ($f$
|| (SELECT string_agg(attname || ' ' || atttypid::regtype::text
, ', ' ORDER BY attnum) -- must be in order
FROM pg_catalog.pg_attribute
WHERE attrelid = 'public.instances'::regclass
AND attnum > 0
AND NOT attisdropped)
|| ')'
INTO t;
END
$func$ LANGUAGE plpgsql;
The table instances
is hard-coded, schema qualified to be unambiguous. Note the use of the table type as return type. There is a row type registered automatically for every table in PostgreSQL. This is bound to match the return type of the crosstab()
function.
This binds the function to the type of the table:
- You will get an error message if you try to
DROP
the table - Your function will fail after an
ALTER TABLE
. You have to recreate it (without changes). I consider this a bug in 9.1.ALTER TABLE
shouldn't silently break the function, but raise an error.
This performs very well.
Call:
SELECT * FROM f_curr_instance(3);
unique_id | col1 | col2
----------+-------+-----
3 |<NULL> | bar3x
Note how col1
is NULL
here.
Use in a query to display an instance with its latest values:
SELECT i.unique_id
, COALESCE(c.col1, i.col1)
, COALESCE(c.col2, i.col2)
FROM instances i
LEFT JOIN f_curr_instance(3) c USING (unique_id)
WHERE i.unique_id = 3;
Full automation for any table
(Added 2016. This is dynamite.)
Requires Postgres 9.1 or later. (Could be made out to work with pg 8.4, but I didn't bother to backpatch.)
CREATE OR REPLACE FUNCTION f_curr_instance(_id int, INOUT _t ANYELEMENT) AS
$func$
DECLARE
_type text := pg_typeof(_t);
BEGIN
EXECUTE
(
SELECT format
($f$
SELECT *
FROM crosstab(
$x$
SELECT DISTINCT ON (1,2)
unique_id, column_name, value
FROM changes
WHERE table_name = %1$L
AND unique_id = %2$s
ORDER BY 1, 2, updated_at DESC;
$x$
, $y$
SELECT attname
FROM pg_catalog.pg_attribute
WHERE attrelid = %1$L::regclass
AND attnum > 0
AND NOT attisdropped
AND attname <> 'unique_id'
ORDER BY attnum
$y$) AS ct (%3$s)
$f$
, _type, _id
, string_agg(attname || ' ' || atttypid::regtype::text
, ', ' ORDER BY attnum) -- must be in order
)
FROM pg_catalog.pg_attribute
WHERE attrelid = _type::regclass
AND attnum > 0
AND NOT attisdropped
)
INTO _t;
END
$func$ LANGUAGE plpgsql;
Call (providing the table type with NULL::public.instances
:
SELECT * FROM f_curr_instance(3, NULL::public.instances);
Related:
- Refactor a PL/pgSQL function to return the output of various SELECT queries
- How to set value of composite variable field using dynamic SQL
How to merge multiple tables in postgreSQL (using wildcard)
As others have mentioned, there is no mechanism to prepend columns without dropping/recreating tables or individual columns.
To append columns to every table that starts with "m", you can use an anonymous plpgsql code block:
DO
$$
DECLARE tbl_ident regclass;
BEGIN
FOR tbl_ident IN
SELECT
oid::regclass fqn --this will give you schema qualified, properly quoted table name
FROM
pg_class
WHERE
'my_schema'::regnamespace::oid = relnamespace --pick schema(s) you care about
and relkind = 'r' --ordinary tables only. Does not include views (v),materialized views (m),partitioned tables (p) or foreign tables (f)
and relname ilike 'm%'
LOOP
-- append the columns you want to the tables
EXECUTE 'ALTER TABLE ' || tbl_ident || ' ADD COLUMN col1 integer, ADD COLUMN col2 varchar(1337)';
END LOOP;
END$$
PostgreSQL Table merge
To "merge" this tables you need:
1) Add "machinetype" and "location" columns to the "Network Status" table:
ALTER TABLE "Network Status"
ADD COLUMN "machinetype" TEXT;
ALTER TABLE "Network Status"
ADD COLUMN "location" TEXT;
2) Populate them with data
UPDATE "Network Status" NS
SET "machinetype" = FL."machinetype",
"location" = FL."location"
FROM "Firewall Log" FL
WHERE FL."ipaddress" = NS."ipaddress"
Or you can create a VIEW
and use it to SELECT
data as if it is a table:
CREATE VIEW "Status and log" AS
SELECT NS."ipaddress", NS."time", NS."violation", FL."machinetype", FL."location"
FROM "Network Status" NS
JOIN "Firewall Log" FL ON FL."ipaddress" = NS."ipaddress"
Combining 2 views into one in PostgreSQL
"from my understanding UNION and UNION ALL only work when combining 2 SELECT Queries"
Ummmm...no. You can do something like the following:
select col1,col2
from table
union all
select col1,col2
from some_other_table
union all
select col1,col2
from yet_another_table;
etc, etc.
updating table rows in postgres using subquery
Postgres allows:
UPDATE dummy
SET customer=subquery.customer,
address=subquery.address,
partn=subquery.partn
FROM (SELECT address_id, customer, address, partn
FROM /* big hairy SQL */ ...) AS subquery
WHERE dummy.address_id=subquery.address_id;
This syntax is not standard SQL, but it is much more convenient for this type of query than standard SQL. I believe Oracle (at least) accepts something similar.
How to update a table that is built from an inner join of two other tables
According to Postgres document, You can make query like below:
[ WITH [ RECURSIVE ] with_query [, ...] ]
UPDATE [ ONLY ] table_name [ * ] [ [ AS ] alias ]
SET { column_name = { expression | DEFAULT } |
( column_name [, ...] ) = [ ROW ] ( { expression | DEFAULT } [, ...] ) |
( column_name [, ...] ) = ( sub-SELECT )
} [, ...]
[ FROM from_item [, ...] ]
[ WHERE condition | WHERE CURRENT OF cursor_name ]
[ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ]
Sample query:
UPDATE
customer u_c
SET
customer_id = c.customer_id,
first_name = c.first_name,
last_name = c.last_name,
payment_id = p.payment_id,
amount = p.amount
FROM
payment p
INNER JOIN customer c ON p.customer_id = c.customer_id
WHERE
c.id = u_c.id
How to UPSERT (MERGE, INSERT ... ON DUPLICATE UPDATE) in PostgreSQL?
9.5 and newer:
PostgreSQL 9.5 and newer support INSERT ... ON CONFLICT (key) DO UPDATE
(and ON CONFLICT (key) DO NOTHING
), i.e. upsert.
Comparison with ON DUPLICATE KEY UPDATE
.
Quick explanation.
For usage see the manual - specifically the conflict_action clause in the syntax diagram, and the explanatory text.
Unlike the solutions for 9.4 and older that are given below, this feature works with multiple conflicting rows and it doesn't require exclusive locking or a retry loop.
The commit adding the feature is here and the discussion around its development is here.
If you're on 9.5 and don't need to be backward-compatible you can stop reading now.
9.4 and older:
PostgreSQL doesn't have any built-in UPSERT
(or MERGE
) facility, and doing it efficiently in the face of concurrent use is very difficult.
This article discusses the problem in useful detail.
In general you must choose between two options:
- Individual insert/update operations in a retry loop; or
- Locking the table and doing batch merge
Individual row retry loop
Using individual row upserts in a retry loop is the reasonable option if you want many connections concurrently trying to perform inserts.
The PostgreSQL documentation contains a useful procedure that'll let you do this in a loop inside the database. It guards against lost updates and insert races, unlike most naive solutions. It will only work in READ COMMITTED
mode and is only safe if it's the only thing you do in the transaction, though. The function won't work correctly if triggers or secondary unique keys cause unique violations.
This strategy is very inefficient. Whenever practical you should queue up work and do a bulk upsert as described below instead.
Many attempted solutions to this problem fail to consider rollbacks, so they result in incomplete updates. Two transactions race with each other; one of them successfully INSERT
s; the other gets a duplicate key error and does an UPDATE
instead. The UPDATE
blocks waiting for the INSERT
to rollback or commit. When it rolls back, the UPDATE
condition re-check matches zero rows, so even though the UPDATE
commits it hasn't actually done the upsert you expected. You have to check the result row counts and re-try where necessary.
Some attempted solutions also fail to consider SELECT races. If you try the obvious and simple:
-- THIS IS WRONG. DO NOT COPY IT. It's an EXAMPLE.
BEGIN;
UPDATE testtable
SET somedata = 'blah'
WHERE id = 2;
-- Remember, this is WRONG. Do NOT COPY IT.
INSERT INTO testtable (id, somedata)
SELECT 2, 'blah'
WHERE NOT EXISTS (SELECT 1 FROM testtable WHERE testtable.id = 2);
COMMIT;
then when two run at once there are several failure modes. One is the already discussed issue with an update re-check. Another is where both UPDATE
at the same time, matching zero rows and continuing. Then they both do the EXISTS
test, which happens before the INSERT
. Both get zero rows, so both do the INSERT
. One fails with a duplicate key error.
This is why you need a re-try loop. You might think that you can prevent duplicate key errors or lost updates with clever SQL, but you can't. You need to check row counts or handle duplicate key errors (depending on the chosen approach) and re-try.
Please don't roll your own solution for this. Like with message queuing, it's probably wrong.
Bulk upsert with lock
Sometimes you want to do a bulk upsert, where you have a new data set that you want to merge into an older existing data set. This is vastly more efficient than individual row upserts and should be preferred whenever practical.
In this case, you typically follow the following process:
CREATE
aTEMPORARY
tableCOPY
or bulk-insert the new data into the temp tableLOCK
the target tableIN EXCLUSIVE MODE
. This permits other transactions toSELECT
, but not make any changes to the table.Do an
UPDATE ... FROM
of existing records using the values in the temp table;Do an
INSERT
of rows that don't already exist in the target table;COMMIT
, releasing the lock.
For example, for the example given in the question, using multi-valued INSERT
to populate the temp table:
BEGIN;
CREATE TEMPORARY TABLE newvals(id integer, somedata text);
INSERT INTO newvals(id, somedata) VALUES (2, 'Joe'), (3, 'Alan');
LOCK TABLE testtable IN EXCLUSIVE MODE;
UPDATE testtable
SET somedata = newvals.somedata
FROM newvals
WHERE newvals.id = testtable.id;
INSERT INTO testtable
SELECT newvals.id, newvals.somedata
FROM newvals
LEFT OUTER JOIN testtable ON (testtable.id = newvals.id)
WHERE testtable.id IS NULL;
COMMIT;
Related reading
- UPSERT wiki page
- UPSERTisms in Postgres
- Insert, on duplicate update in PostgreSQL?
- http://petereisentraut.blogspot.com/2010/05/merge-syntax.html
- Upsert with a transaction
- Is SELECT or INSERT in a function prone to race conditions?
- SQL
MERGE
on the PostgreSQL wiki - Most idiomatic way to implement UPSERT in Postgresql nowadays
What about MERGE
?
SQL-standard MERGE
actually has poorly defined concurrency semantics and is not suitable for upserting without locking a table first.
It's a really useful OLAP statement for data merging, but it's not actually a useful solution for concurrency-safe upsert. There's lots of advice to people using other DBMSes to use MERGE
for upserts, but it's actually wrong.
Other DBs:
INSERT ... ON DUPLICATE KEY UPDATE
in MySQLMERGE
from MS SQL Server (but see above aboutMERGE
problems)MERGE
from Oracle (but see above aboutMERGE
problems)
Concatenate multiple result rows of one column into one, group by another column
Simpler with the aggregate function string_agg()
(Postgres 9.0 or later):
SELECT movie, string_agg(actor, ', ') AS actor_list
FROM tbl
GROUP BY 1;
The 1
in GROUP BY 1
is a positional reference and a shortcut for GROUP BY movie
in this case.
string_agg()
expects data type text
as input. Other types need to be cast explicitly (actor::text
) - unless an implicit cast to text
is defined - which is the case for all other string types (varchar
, character
, name
, ...) and some other types.
As isapir commented, you can add an ORDER BY
clause in the aggregate call to get a sorted list - should you need that. Like:
SELECT movie, string_agg(actor, ', ' ORDER BY actor) AS actor_list
FROM tbl
GROUP BY 1;
But it's typically faster to sort rows in a subquery. See:
- Create array in SELECT
Lock level on MERGE statement
From the manual:
ROW EXCLUSIVE Conflicts with the SHARE, SHARE ROW EXCLUSIVE,
EXCLUSIVE, and ACCESS EXCLUSIVE lock modes.The commands UPDATE, DELETE, INSERT, and MERGE acquire this lock mode
on the target table (in addition to ACCESS SHARE locks on any other
referenced tables). In general, this lock mode will be acquired by any
command that modifies data in a table.
Related Topics
Strpos() with Multiple Needles
Easiest Way to Implode() a Two-Dimensional Array
Wrapping a Div Around Every Third Item in a Foreach Loop PHP
Extracting Matches from PHP Regex
Sharing PHP Session ($_Session) Across Multiple Domain
Get Last Word from Url After a Slash in PHP
Does PHP Feature Short Hand Syntax for Objects
Can Not Increase File Upload Size Wamp
Laravel 5 - Session Doesn't Work
How to Send Https Posts Using PHP
How to Select a Result from the Select2 Search Results
Selecting Rows Where a Field Is Null Using PHP Pdo Prepared Statements and MySQL
How to Get a List of MySQL Databases in PHP Using Pdo
PHP Domdocument - Get HTML Source of Body
Php: How to Check If the Client Is Local