Bulk/Batch Update/Upsert in PostgreSQL

I've used 3 strategies for batch transactional work:

  1. Generate SQL statements on the fly, concatenate them with semicolons, and then submit them in one shot. I've done up to 100 inserts this way against Postgres, and it was quite efficient.
  2. JDBC has batching built in: you queue statements with addBatch() and send them with executeBatch(), so they are executed as one batch. This tactic requires fewer database calls.
  3. Hibernate also supports JDBC batching along the lines of the previous example (enabled with the hibernate.jdbc.batch_size setting), but here you call flush() on the Hibernate Session rather than working with the underlying JDBC connection. It accomplishes the same thing as JDBC batching.
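Strategies 1 and 2 have a close analogue in Python's DB-API. The sketch below is a stand-in, not the author's code: it uses the stdlib sqlite3 module so it runs without a PostgreSQL server, and the users table is hypothetical. executemany() runs one parameterized INSERT per row inside a single transaction, so either the whole batch lands or none of it does, much like JDBC's addBatch()/executeBatch() pair.

```python
import sqlite3
from typing import Iterable, Tuple


def insert_batch(conn: sqlite3.Connection, rows: Iterable[Tuple[int, str]]) -> None:
    """Insert every row in one transaction via a single executemany() call."""
    # `with conn` commits if the block succeeds and rolls back if any row fails,
    # so a failing row undoes the rows inserted before it in the same batch.
    with conn:
        conn.executemany(
            "INSERT INTO users (id, username) VALUES (?, ?)",  # hypothetical table
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
insert_batch(conn, [(1, "Peter"), (2, "Anna"), (3, "Ravi")])

try:
    # Row 4 is fine, but id 1 already exists: the whole batch is rolled back.
    insert_batch(conn, [(4, "Lee"), (1, "duplicate")])
except sqlite3.IntegrityError:
    pass

print(conn.execute("SELECT count(*) FROM users").fetchone()[0])  # prints 3
```

With PostgreSQL and psycopg2 the placeholder becomes %s. psycopg2's plain executemany() still sends one statement per row, so psycopg2.extras.execute_values(), which folds many rows into one multi-row VALUES statement, is the usual faster choice there.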

Incidentally, Hibernate also supports a batching strategy in collection fetching. If you annotate a collection with @BatchSize, when fetching associations, Hibernate will use IN instead of =, leading to fewer SELECT statements to load up the collections.
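To see why @BatchSize cuts the number of queries, here is the shape of the SQL it trades for, sketched in Python: instead of one "WHERE customer_id = ?" query per parent, the ids are grouped into IN (...) lists of at most batch_size entries. The orders table, the customer_id column and the batched_in_queries function are hypothetical illustrations; Hibernate generates its own SQL, which this does not reproduce.

```python
from typing import List, Sequence, Tuple


def batched_in_queries(
    ids: Sequence[int], batch_size: int
) -> List[Tuple[str, Tuple[int, ...]]]:
    """Split ids into parameterized IN (...) queries of at most batch_size ids."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    queries = []
    for start in range(0, len(ids), batch_size):
        chunk = tuple(ids[start:start + batch_size])
        placeholders = ", ".join(["%s"] * len(chunk))  # psycopg2-style placeholders
        sql = f"SELECT * FROM orders WHERE customer_id IN ({placeholders})"
        queries.append((sql, chunk))
    return queries


# 7 parents with batch_size=3 need 3 SELECTs instead of 7.
for sql, params in batched_in_queries([11, 12, 13, 14, 15, 16, 17], 3):
    print(sql, params)
```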

Bulk insert, update if on conflict (bulk upsert) on Postgres

It turns out that a special table named excluded contains the row proposed for insertion (a strange name, though):

insert into USERS(
id, username, profile_picture)
select unnest(array['12345']),
unnest(array['Peter']),
unnest(array['someURL'])
on conflict (id) do
update set
username = excluded.username,
profile_picture = excluded.profile_picture;

http://www.postgresql.org/docs/9.5/static/sql-insert.html#SQL-ON-CONFLICT

The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the existing row using the table's name (or an alias), and to rows proposed for insertion using the special excluded table...
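From application code, the arrays in the statement above are usually passed as parameters rather than spelled out as literals. A minimal Python sketch of a helper that builds such a statement, assuming psycopg2-style %s placeholders and trusted (unquoted) identifiers; the helper name and its arguments are hypothetical. It uses the multi-argument form of unnest in FROM (available since PostgreSQL 9.4), which zips the arrays row by row, and it updates every non-key column from excluded.

```python
from typing import Sequence


def build_unnest_upsert(
    table: str, key: str, columns: Sequence[str], pg_types: Sequence[str]
) -> str:
    """Build an INSERT ... SELECT FROM unnest(...) ... ON CONFLICT upsert.

    One array parameter per column. Identifiers are interpolated as-is, so they
    must come from trusted code, never from user input.
    """
    if len(columns) != len(pg_types):
        raise ValueError("need exactly one PostgreSQL type per column")
    if key not in columns:
        raise ValueError("the conflict key must be one of the inserted columns")
    non_key = [c for c in columns if c != key]
    if not non_key:
        raise ValueError("nothing to update; use ON CONFLICT DO NOTHING instead")
    # Multi-argument unnest zips the arrays (shorter ones are padded with NULLs).
    arrays = ", ".join(f"%s::{t}[]" for t in pg_types)
    updates = ",\n    ".join(f"{c} = excluded.{c}" for c in non_key)
    return (
        f"INSERT INTO {table} ({', '.join(columns)})\n"
        f"SELECT * FROM unnest({arrays})\n"
        f"ON CONFLICT ({key}) DO UPDATE SET\n"
        f"    {updates}"
    )


print(build_unnest_upsert(
    "users", "id", ["id", "username", "profile_picture"], ["int", "text", "text"]
))
```

psycopg2 adapts Python lists to PostgreSQL arrays, so with that driver a call could look like cur.execute(sql, ([12345], ["Peter"], ["someURL"])).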

Most efficient way to do a bulk UPDATE with pairs of input

Normally you want to batch-update from a staging table with a suitable index, which makes the join with the target table cheap:

CREATE TEMP TABLE updates_table
( id integer not null primary key
, val varchar
);
INSERT into updates_table(id, val) VALUES
( 1, 'foo' ) ,( 2, 'bar' ) ,( 3, 'baz' )
;

UPDATE target_table t
SET value = u.val
FROM updates_table u
WHERE t.id = u.id
;
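The same staging-table pattern as a minimal Python sketch that runs without a server: it uses the stdlib sqlite3 module as a stand-in (SQLite accepts UPDATE ... FROM since version 3.33.0) and reuses the table and column names from the answer. Against PostgreSQL the SQL is unchanged apart from the placeholder style (%s instead of ? for psycopg2); the apply_updates helper is hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target_table (id INTEGER PRIMARY KEY, value TEXT);
    INSERT INTO target_table (id, value) VALUES (1, 'old'), (2, 'old'), (3, 'old'), (4, 'keep');
    -- Staging table keyed on id, so the join to target_table is an index lookup.
    CREATE TEMP TABLE updates_table (id INTEGER NOT NULL PRIMARY KEY, val TEXT);
""")


def apply_updates(conn: sqlite3.Connection, pairs: Iterable[Tuple[int, str]]) -> int:
    """Load (id, val) pairs into the staging table, then update the target in one statement."""
    with conn:  # one transaction for load + update
        conn.execute("DELETE FROM updates_table")  # start from an empty staging table
        conn.executemany("INSERT INTO updates_table (id, val) VALUES (?, ?)", pairs)
        cur = conn.execute("""
            UPDATE target_table AS t
            SET value = u.val
            FROM updates_table AS u
            WHERE t.id = u.id
        """)
    return cur.rowcount  # number of target rows updated


print(apply_updates(conn, [(1, "foo"), (2, "bar"), (3, "baz")]))  # prints 3
```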

So you should probably populate your updates_table with something like:

INSERT into updates_table(id, val)
SELECT
split_part(x,',',1)::INT AS id,
split_part(x,',',2)::VARCHAR AS val
FROM (
SELECT UNNEST(ARRAY['1,foo','2,bar','3,baz']) AS x
) AS s
;
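To make the split_part step concrete, here is a small Python mirror of what that INSERT ... SELECT computes for each 'id,val' string. The split_part function below is a hypothetical re-implementation of PostgreSQL's split_part for positive field numbers only (PostgreSQL returns an empty string when the requested field does not exist). It also exposes a caveat of this encoding: a value that itself contains a comma is silently cut at that comma, which passing ids and values as two separate arrays would avoid.

```python
from typing import List, Tuple


def split_part(s: str, delimiter: str, n: int) -> str:
    """Mimic PostgreSQL split_part() for n >= 1: the n-th field, or '' if there is none."""
    if n < 1:
        raise ValueError("this sketch only handles n >= 1")
    parts = s.split(delimiter)
    return parts[n - 1] if n <= len(parts) else ""


def parse_pairs(encoded: List[str]) -> List[Tuple[int, str]]:
    """Rows the INSERT ... SELECT would produce: (id, val) from each 'id,val' string."""
    return [(int(split_part(x, ",", 1)), split_part(x, ",", 2)) for x in encoded]


print(parse_pairs(["1,foo", "2,bar", "3,baz"]))  # [(1, 'foo'), (2, 'bar'), (3, 'baz')]
print(parse_pairs(["4,a,b"]))                    # [(4, 'a')]: ',b' is lost
```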

Remember: an index (or the primary key) on the id field in the updates_table is important (although for small sets like this one, the optimiser will probably choose a hash join anyway).

In addition, avoid updates that set a column to the value it already has: they still create extra row versions and cause the resulting VACUUM activity after the update is committed. Filter them out in the WHERE clause:

UPDATE target_table t
SET value = u.val
FROM updates_table u
WHERE t.id = u.id
AND t.value IS DISTINCT FROM u.val
;
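A runnable Python sketch of why the extra predicate matters, again using the stdlib sqlite3 module as a stand-in for PostgreSQL (the sample rows are made up). SQLite's IS NOT is null-safe, playing the role of PostgreSQL's IS DISTINCT FROM, so a NULL on either side still counts as a change. The rowcount shows how many rows each variant actually rewrites; in PostgreSQL each of those rewrites creates a new row version.

```python
import sqlite3

SETUP = """
    CREATE TABLE target_table (id INTEGER PRIMARY KEY, value TEXT);
    INSERT INTO target_table (id, value) VALUES (1, 'foo'), (2, NULL), (3, 'baz');
    CREATE TABLE updates_table (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO updates_table (id, val) VALUES (1, 'foo'), (2, 'bar'), (3, 'BAZ');
"""

UPDATE_ALL = """
    UPDATE target_table AS t
    SET value = u.val
    FROM updates_table AS u
    WHERE t.id = u.id
"""

# Skip rows whose value would not change; IS NOT treats NULLs as comparable values.
UPDATE_CHANGED = UPDATE_ALL + "    AND t.value IS NOT u.val\n"


def rows_updated(sql: str) -> int:
    """Run sql against a fresh copy of the sample data and return the rows it touched."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    with conn:
        return conn.execute(sql).rowcount


print(rows_updated(UPDATE_ALL))      # 3: row 1 is rewritten with the same value
print(rows_updated(UPDATE_CHANGED))  # 2: only rows 2 and 3 really change
```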

How do I increase the speed of a bulk UPSERT in PostgreSQL?

Sorting arglist by "variant_name" and "start" (the first two columns in the index) should ensure that most of the index lookups hit already-cached pages. Having the table also clustered on that index would help make sure the table pages are also accessed in a cache-friendly way (although it won't stay clustered very well in the face of new data).
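Sorting the batch client-side is a one-liner. A Python sketch, assuming (hypothetically, since the question's code is not shown here) that each row in arglist is a tuple whose first two fields are variant_name and start, in index order:

```python
from operator import itemgetter
from typing import List, Sequence


def sort_for_index(arglist: Sequence[tuple]) -> List[tuple]:
    """Order rows by (variant_name, start), assumed to be fields 0 and 1."""
    return sorted(arglist, key=itemgetter(0, 1))


rows = [("v2", 10, "x"), ("v1", 30, "y"), ("v1", 5, "z")]
print(sort_for_index(rows))  # [('v1', 5, 'z'), ('v1', 30, 'y'), ('v2', 10, 'x')]
```

This only helps to the extent that the client-side order matches the index order; Python compares strings by code point, which may not match the database collation of a text column exactly.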

Also, your index is gratuitously double the size it needs to be. There is no point in listing a column in INCLUDE when it is already one of the index's key columns. That costs CPU and I/O to format and write the data (and the WAL), and it also reduces the amount of data that fits in cache.

PostgreSQL batch insert and ON CONFLICT batch update

You are using obviously incorrect syntax. Given the table

create table a_table(id serial primary key, x1 int, x2 int);

try this in psql:

insert into a_table (x1, x2) 
values (1,2), (3,4)
on conflict do
update set (x1, x2) = (1,2), (3,4);

to get

ERROR:  syntax error at or near "3"
LINE 4: update set (x1, x2) = (1,2), (3,4);

Moreover, ON CONFLICT makes no sense in this case: a conflict will never happen, because none of the inserted columns (alone or as a group) is unique.

Check the INSERT syntax, and read more about UPSERT in the wiki.
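For comparison, a version where a conflict can actually happen. This is a hypothetical variant, not part of the original answer: x1 is made UNIQUE, and the conflict target (x1) is named explicitly, which PostgreSQL requires for DO UPDATE. The sketch runs on the stdlib sqlite3 module, whose ON CONFLICT ... DO UPDATE and excluded syntax match PostgreSQL's for this statement. One difference to keep in mind: PostgreSQL rejects a single statement that proposes the same key twice ("ON CONFLICT DO UPDATE command cannot affect row a second time"), so deduplicate a batch before sending it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical change: x1 is UNIQUE so that a conflict can actually occur.
conn.execute("""
    CREATE TABLE a_table (
        id INTEGER PRIMARY KEY,
        x1 INTEGER UNIQUE,
        x2 INTEGER
    )
""")

# One statement, two rows; each row that collides on x1 takes the proposed x2.
UPSERT = """
    INSERT INTO a_table (x1, x2)
    VALUES (?, ?), (?, ?)
    ON CONFLICT (x1) DO UPDATE SET x2 = excluded.x2
"""

with conn:
    conn.execute(UPSERT, (1, 2, 3, 4))   # both rows are new
with conn:
    conn.execute(UPSERT, (1, 20, 5, 6))  # x1 = 1 conflicts, so its x2 becomes 20

print(conn.execute("SELECT x1, x2 FROM a_table ORDER BY x1").fetchall())
# [(1, 20), (3, 4), (5, 6)]
```

In PostgreSQL the table would be declared as create table a_table(id serial primary key, x1 int unique, x2 int); and the same INSERT works with literal values in place of the ? placeholders.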


