PostgreSQL Batch Insert or Ignore

How to emulate "insert ignore" and "on duplicate key update" (SQL MERGE) with PostgreSQL?

Try to do an UPDATE. If it doesn't modify any row that means it didn't exist, so do an insert. Obviously, you do this inside a transaction.

You can of course wrap this in a function if you don't want to put the extra code on the client side. You also need a retry loop for the very rare race condition in this approach: another transaction may insert the same row between your UPDATE and your INSERT.

There's an example of this in the documentation: http://www.postgresql.org/docs/9.3/static/plpgsql-control-structures.html, example 40-2 right at the bottom.

That's usually the easiest way. You can do some magic with rules, but it's likely going to be a lot messier. I'd recommend the wrap-in-function approach over that any day.

This works for single-row, or few-row, values. If you're dealing with a large number of rows, for example from a subquery, you're best off splitting it into two queries, one for INSERT and one for UPDATE (as an appropriate join/subselect of course, so there is no need to write your main filter twice).
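
As a rough sketch of the two-query approach for larger sets, the following runs one UPDATE for rows that already exist and one INSERT for the rest, inside a single transaction. The target and staging tables are hypothetical, and Python's built-in sqlite3 module stands in for PostgreSQL only so the example runs without a server. Both statements use syntax that PostgreSQL also accepts, but treat this as an illustration under those assumptions rather than a tuned implementation.

```python
import sqlite3

# Stand-in database: sqlite3 is used only so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target  (a INTEGER PRIMARY KEY, b TEXT);
    CREATE TABLE staging (a INTEGER PRIMARY KEY, b TEXT);  -- hypothetical source rows
    INSERT INTO target  VALUES (1, 'old'), (2, 'keep');
    INSERT INTO staging VALUES (1, 'new'), (3, 'added');
""")

def bulk_upsert(conn):
    """Update rows that already exist, then insert the ones that do not."""
    with conn:  # one transaction: commit on success, roll back on error
        # Statement 1: UPDATE rows whose key is present in staging.
        conn.execute("""
            UPDATE target
            SET b = (SELECT s.b FROM staging s WHERE s.a = target.a)
            WHERE EXISTS (SELECT 1 FROM staging s WHERE s.a = target.a)
        """)
        # Statement 2: INSERT only the rows that are still missing.
        conn.execute("""
            INSERT INTO target (a, b)
            SELECT s.a, s.b
            FROM staging s
            WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.a = s.a)
        """)

bulk_upsert(conn)
rows = conn.execute("SELECT a, b FROM target ORDER BY a").fetchall()
print(rows)
```

Row 1 is updated, row 2 is left alone, and row 3 is inserted. The same race condition as in the single-row case still applies between the two statements under concurrent writes.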

PostgreSQL batch insert or ignore

There are 3 challenges.

  1. Your query has no JOIN condition between the tables phones and groups, making this effectively a limited CROSS JOIN - which you most probably do not intend. I.e. every phone that qualifies is combined with every group that qualifies. If you have 100 phones and 100 groups that's already 10,000 combinations.

  2. Insert distinct combinations of (group_id, phone_name)

  3. Avoid inserting rows that are already there in table group_phones.

All things considered it could look like this:

INSERT INTO group_phones(group_id, phone_name)
SELECT i.id, i.name
FROM (
    SELECT DISTINCT g.id, p.name -- get distinct combinations
    FROM phones p
    JOIN groups g ON ??how are p & g connected??
    WHERE g.id IN ($add_groups)
    AND p.name IN ($phones)
) i
LEFT JOIN group_phones gp ON (gp.group_id, gp.phone_name) = (i.id, i.name)
WHERE gp.group_id IS NULL; -- avoid duping existing rows
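
To see challenges 2 and 3 in action without guessing the missing join condition, here is a small runnable sketch. It assumes a hypothetical candidate_pairs table that plays the role of the joined phones/groups result, and it uses Python's built-in sqlite3 module in place of PostgreSQL so it runs anywhere. The join condition is spelled out column by column instead of using the row-value comparison from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE group_phones (
        group_id   INTEGER,
        phone_name TEXT,
        PRIMARY KEY (group_id, phone_name)
    );
    -- hypothetical stand-in for the joined phones/groups result
    CREATE TABLE candidate_pairs (id INTEGER, name TEXT);
    INSERT INTO group_phones VALUES (1, 'alpha');
    INSERT INTO candidate_pairs VALUES
        (1, 'alpha'),   -- already present in group_phones
        (1, 'beta'),
        (1, 'beta'),    -- duplicate candidate
        (2, 'alpha');
""")

with conn:
    conn.execute("""
        INSERT INTO group_phones (group_id, phone_name)
        SELECT i.id, i.name
        FROM (SELECT DISTINCT id, name FROM candidate_pairs) i
        LEFT JOIN group_phones gp
               ON gp.group_id = i.id AND gp.phone_name = i.name
        WHERE gp.group_id IS NULL
    """)

rows = conn.execute(
    "SELECT group_id, phone_name FROM group_phones ORDER BY group_id, phone_name"
).fetchall()
print(rows)
```

DISTINCT collapses the repeated (1, 'beta') candidate, and the anti-join skips (1, 'alpha'), so no primary key violation is raised.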

Concurrency

This form minimizes the chance of a race condition with concurrent write operations. If your table has heavy concurrent write load, you may want to lock the table exclusively or use serializable transaction isolation. This safeguards against the extremely unlikely case that a row is altered by a concurrent transaction in the tiny time slot between the constraint verification (row isn't there) and the write operation in the query.

BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT ...
COMMIT;

Be prepared to repeat the transaction if it rolls back with a serialization error.
For more on that topic good starting points could be this blog post by @depesz or this related question on SO.
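
Repeating the transaction usually means a small retry loop on the client. The sketch below is one possible shape for it, not a definitive implementation. run_with_retry and the SerializationFailure class are hypothetical names defined here so the example is self-contained. With a real driver you would pass that driver's serialization error instead (for psycopg2 that is assumed to be psycopg2.errors.SerializationFailure, SQLSTATE 40001), and the callable would open the SERIALIZABLE transaction, run the INSERT and commit.

```python
import time

class SerializationFailure(Exception):
    """Placeholder for the driver's serialization error (SQLSTATE 40001)."""

def run_with_retry(transaction, retry_on=SerializationFailure,
                   max_attempts=5, base_delay=0.0):
    """Call transaction() until it succeeds or max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            time.sleep(base_delay * attempt)  # simple linear backoff

# Usage with a fake transaction that fails twice before succeeding.
calls = {"n": 0}

def flaky_insert():
    calls["n"] += 1
    if calls["n"] < 3:
        raise SerializationFailure("could not serialize access")
    return "committed"

result = run_with_retry(flaky_insert)
print(result, calls["n"])
```

Only the serialization error is retried; any other exception propagates immediately, which keeps genuine bugs from being hidden by the loop.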

Normally, though, you needn't even bother with any of this.

Performance

LEFT JOIN tbl ON right_col = left_col WHERE right_col IS NULL

is generally the fastest method with distinct columns in the right table. If you have dupes in the column (especially if there are many),

WHERE NOT EXISTS (SELECT 1 FROM tbl WHERE right_col = left_col)

may be faster because it can stop scanning as soon as the first row is found.

You can also use IN, like @dezso demonstrates, but it is usually slower in PostgreSQL.

How to ignore errors in a batch insert in PostgreSQL

Solution

You could insert using the WHERE NOT EXISTS clause.

For example, consider a test table with a numeric id as primary key and a textual name.

Code

db=> CREATE TABLE test(id BIGSERIAL PRIMARY KEY, name TEXT);
CREATE TABLE

-- Insertion will work - empty table
db=> INSERT INTO test(id, name)
SELECT 1, 'Partner number 1'
WHERE NOT EXISTS (SELECT 1,2 FROM test WHERE id=1);
INSERT 0 1

-- Insertion will NOT work - duplicate id
db=> INSERT INTO test(id, name)
SELECT 1, 'Partner number 1'
WHERE NOT EXISTS (SELECT 1,2 FROM test WHERE id=1);
INSERT 0 0

-- After two insertions, the table contains only one row
db=> SELECT * FROM test;
 id |       name
----+------------------
  1 | Partner number 1
(1 row)

Difference from ON CONFLICT

Quoting the documentation:

ON CONFLICT can be used to specify an alternative action to raising a unique constraint or exclusion constraint violation error.

The action can be DO NOTHING, or a DO UPDATE. The second approach is often referred to as Upsert - a portmanteau of Insert and Update.

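Within a single session, WHERE NOT EXISTS gives the same result as ON CONFLICT DO NOTHING. The two are not equivalent under concurrent writes, though: the NOT EXISTS check and the insert are not atomic, so two sessions can both pass the check and one of them can still fail with a unique violation, whereas ON CONFLICT DO NOTHING handles that case without raising an error. See the query plans for a deeper dive.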

How to have Postgres ignore inserts with a duplicate key but keep going

If you're using Postgres 9.5 or newer (which I assume you are, since it was released back in January 2016), there's a very useful ON CONFLICT clause you can use:

INSERT INTO mytable (id, col1, col2)
VALUES (123, 'some_value', 'some_other_value')
ON CONFLICT (id) DO NOTHING
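
For the batch case, the same clause lets a multi-row insert continue past duplicates. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.24 or newer accepts the same ON CONFLICT ... DO NOTHING form). The extra rows in the batch are made up for illustration. With a PostgreSQL driver the placeholders would typically be %s instead of ?.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)"
)

batch = [
    (123, "some_value", "some_other_value"),
    (123, "duplicate", "skipped"),       # same key: ignored, no error raised
    (124, "another_value", "kept"),
]

with conn:  # commit once for the whole batch
    conn.executemany(
        "INSERT INTO mytable (id, col1, col2) VALUES (?, ?, ?) "
        "ON CONFLICT (id) DO NOTHING",
        batch,
    )

rows = conn.execute("SELECT id, col1 FROM mytable ORDER BY id").fetchall()
print(rows)
```

The second row collides with the first on id and is dropped, while the third row is still inserted.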

PostgreSQL insert into multiple tables using WITH and ignore errors

How about changing the second insert to this?

INSERT INTO d_security_ticker (ticker, security_id )
SELECT %s, fi.security_id
FROM first_insert fi
WHERE fi.security_id IS NOT NULL;

Insert, on duplicate update in PostgreSQL?

PostgreSQL has supported UPSERT since version 9.5 through the ON CONFLICT clause, with the following syntax (similar to MySQL):

INSERT INTO the_table (id, column_1, column_2) 
VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (3, 'C', 'Z')
ON CONFLICT (id) DO UPDATE
SET column_1 = excluded.column_1,
column_2 = excluded.column_2;
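
To show what the excluded pseudo-table does, here is a small sketch that runs the statement above against a table which already contains id 1. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.24 or newer supports the same ON CONFLICT ... DO UPDATE syntax, including excluded); the pre-existing row is invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE the_table (id INTEGER PRIMARY KEY, column_1 TEXT, column_2 TEXT)"
)
# Pre-existing row that the upsert will overwrite.
conn.execute("INSERT INTO the_table VALUES (1, 'old', 'old')")

with conn:
    conn.execute("""
        INSERT INTO the_table (id, column_1, column_2)
        VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (3, 'C', 'Z')
        ON CONFLICT (id) DO UPDATE
        SET column_1 = excluded.column_1,
            column_2 = excluded.column_2
    """)

rows = conn.execute(
    "SELECT id, column_1, column_2 FROM the_table ORDER BY id"
).fetchall()
print(rows)
```

The row with id 1 takes the new values from excluded, and ids 2 and 3 are inserted as usual.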

Searching PostgreSQL's mailing list archives for "upsert" leads to an example of doing what you possibly want to do, in the manual:

Example 38-2. Exceptions with UPDATE/INSERT

This example uses exception handling to perform either UPDATE or INSERT, as appropriate:

CREATE TABLE db (a INT PRIMARY KEY, b TEXT);

CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
        -- first try to update the key
        -- note that "a" must be unique
        UPDATE db SET b = data WHERE a = key;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert the key
        -- if someone else inserts the same key concurrently,
        -- we could get a unique-key failure
        BEGIN
            INSERT INTO db(a,b) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, and loop to try the UPDATE again
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;

SELECT merge_db(1, 'david');
SELECT merge_db(1, 'dennis');
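
If you would rather keep this logic on the client side, the same loop can be sketched in Python. This is only an illustration under stated assumptions: it runs against the built-in sqlite3 module instead of PostgreSQL, the function name merge_db is reused for readability, and max_attempts is a made-up safety limit. On PostgreSQL a failed INSERT aborts the surrounding transaction, so a real client would need a savepoint or a fresh transaction before retrying; the PL/pgSQL block above gets that implicitly from its BEGIN ... EXCEPTION block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db (a INTEGER PRIMARY KEY, b TEXT)")

def merge_db(conn, key, data, max_attempts=3):
    """UPDATE first; INSERT if nothing matched; retry on a key collision."""
    for _ in range(max_attempts):
        with conn:  # each attempt is its own transaction
            cur = conn.execute("UPDATE db SET b = ? WHERE a = ?", (data, key))
            if cur.rowcount > 0:
                return "updated"
            try:
                conn.execute("INSERT INTO db (a, b) VALUES (?, ?)", (key, data))
                return "inserted"
            except sqlite3.IntegrityError:
                # Another writer inserted the key first: loop and UPDATE again.
                continue
    raise RuntimeError("merge_db: gave up after repeated conflicts")

first = merge_db(conn, 1, "david")
second = merge_db(conn, 1, "dennis")
row = conn.execute("SELECT a, b FROM db").fetchone()
print(first, second, row)
```

The first call inserts the row and the second call updates it, mirroring the two SELECT merge_db(...) calls above.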

There's possibly an example of how to do this in bulk, using CTEs in 9.1 and above, in the hackers mailing list:

WITH foos AS (SELECT (UNNEST(%foo[])).*),
updated AS (UPDATE foo SET a = foos.a ... RETURNING foo.id)
INSERT INTO foo SELECT foos.* FROM foos LEFT JOIN updated USING(id)
WHERE updated.id IS NULL;

See a_horse_with_no_name's answer for a clearer example.

SQLAlchemy PostgreSQL insert statement ignore duplicates

SQLAlchemy has no ignore_duplicates option by default, so we add a custom method to the SQLAlchemy query compiler that simply appends ON CONFLICT DO NOTHING to the end of the query. (SQLAlchemy 1.1 and later also offer on_conflict_do_nothing() on the PostgreSQL insert() construct.)

from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert

# Append ON CONFLICT DO NOTHING when the statement was built with
# postgresql_ignore_duplicates=True.
@compiles(Insert, 'postgresql')
def ignore_duplicates(insert, compiler, **kw):
    s = compiler.visit_insert(insert, **kw)
    ignore = insert.kwargs.get('postgresql_ignore_duplicates', False)
    return s if not ignore else s + ' ON CONFLICT DO NOTHING'

# Register the custom dialect argument so insert() accepts it.
Insert.argument_for('postgresql', 'ignore_duplicates', None)

# User (the mapped table) and session are assumed to be defined elsewhere.
stmt = insert(User, postgresql_ignore_duplicates=True, inline=True)

values = [{"id": 1, "name": "David"}, {"id": 2, "name": "Cris"}]

session.execute(stmt, values)
session.commit()

