How to Update Selected Rows With Values from a CSV File in Postgres

How to update selected rows with values from a CSV file in Postgres?

COPY the file to a temporary staging table and update the actual table from there. Like:

CREATE TEMP TABLE tmp_x (id int, apple text, banana text); -- but see below

COPY tmp_x FROM '/absolute/path/to/file' (FORMAT csv);

UPDATE tbl
SET banana = tmp_x.banana
FROM tmp_x
WHERE tbl.id = tmp_x.id;

DROP TABLE tmp_x; -- else it is dropped at end of session automatically

If the imported table matches the table to be updated exactly, this may be convenient:

CREATE TEMP TABLE tmp_x AS SELECT * FROM tbl LIMIT 0;

Creates an empty temporary table matching the structure of the existing table, without constraints.
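
An alternative sketch (not from the answer above) uses LIKE, which also copies only the column definitions and lets you opt in to defaults or other properties explicitly, assuming the same tbl as above:

CREATE TEMP TABLE tmp_x (LIKE tbl INCLUDING DEFAULTS);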

Privileges

Up to Postgres 10, SQL COPY requires superuser privileges for this.

In Postgres 11 or later, there are also some predefined roles (formerly "default roles") to allow it. The manual:

COPY naming a file or command is only allowed to database superusers
or users who are granted one of the roles pg_read_server_files,
pg_write_server_files, or pg_execute_server_program [...]

The psql meta-command \copy works for any db role. The manual:

Performs a frontend (client) copy. This is an operation that runs an
SQL COPY command, but instead of the server reading or writing the
specified file, psql reads or writes the file and routes the data
between the server and the local file system. This means that file
accessibility and privileges are those of the local user, not the
server, and no SQL superuser privileges are required.

The scope of temporary tables is limited to a single session of a single role, so the above has to be executed in the same psql session:

CREATE TEMP TABLE ...;
\copy tmp_x FROM '/absolute/path/to/file' (FORMAT csv);
UPDATE ...;

If you are scripting this from a shell, be sure to wrap it all in a single psql call. Like:

echo 'CREATE TEMP TABLE tmp_x ...; \copy tmp_x FROM ...; UPDATE ...;' | psql

Normally, you need the separator meta-command \\ to mix psql meta-commands and SQL commands on the same line, but \copy is an exception to this rule. The manual again:

special parsing rules apply to the \copy meta-command. Unlike most other meta-commands, the entire remainder of the line is always taken to be the arguments of \copy, and neither variable interpolation nor backquote expansion are performed in the arguments.

Big tables

If the imported table is big, it may pay to increase temp_buffers for the session. Set it before the first use of temporary tables in the session, or the setting has no effect:

SET temp_buffers = '500MB';  -- example value

Add an index to the temporary table:

CREATE INDEX tmp_x_id_idx ON tmp_x(id);

And run ANALYZE manually, since temporary tables are not covered by autovacuum / auto-analyze.

ANALYZE tmp_x;

Related answers:

  • Best way to delete millions of rows by ID
  • How can I insert common data into a temp table from disparate schemas?
  • How to delete duplicate entries?

Update table rows based on values in a text file

The simple way is to load the file with COPY into a temporary table and then use UPDATE ... FROM or INSERT ... ON CONFLICT to update your table.

Alternatively, you can use file_fdw to define the file as a foreign table and UPDATE using that.
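
A minimal sketch of the file_fdw route, assuming the same tbl(id, banana) layout as above and a hypothetical file path. Since file_fdw reads files on the server, similar server-side file-access considerations apply as for COPY:

CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE tmp_csv (id int, banana text)
SERVER csv_files
OPTIONS (filename '/absolute/path/to/file', format 'csv');

UPDATE tbl
SET    banana = tmp_csv.banana
FROM   tmp_csv
WHERE  tbl.id = tmp_csv.id;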

How to update a column in Postgres with data from CSV file using matching values

You need to create an intermediate table (aka "staging table").

Then import the CSV file into that table. After that you can update the target table from the imported data:

update target_table
set email = st.user_email
from staging_table st
where st."user" = target_table.name;

This assumes that name is unique in your target table and that every user appears exactly once in the input file.
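
For completeness, a sketch of the import step that the UPDATE above assumes. The staging table and column names are taken from the query; the CSV path and the HEADER option are assumptions:

CREATE TEMP TABLE staging_table ("user" text, user_email text);
\copy staging_table FROM '/path/to/users.csv' (FORMAT csv, HEADER)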

update and insert records from a csv file in postgresql

You should use the keyword EXCLUDED:

CREATE TEMP TABLE CUSTOMER_TEMP (LIKE migration.customer);

COPY CUSTOMER_TEMP(CUST_ID,CUST_CONTACT,CUST_COUNTRY,CUST_DETAIL) from 'C:\demo_dir\CUSTOMER_DATA_CHANGE.csv' DELIMITER '|';

INSERT INTO MIGRATION.CUSTOMER(CUST_ID,CUST_CONTACT,CUST_COUNTRY,CUST_DETAIL)
(SELECT T.CUST_ID,T.CUST_CONTACT,T.CUST_COUNTRY,T.CUST_DETAIL FROM CUSTOMER_TEMP AS T)
on conflict on constraint pk_cust
do update set CUST_CONTACT = EXCLUDED.CUST_CONTACT, CUST_COUNTRY = EXCLUDED.CUST_COUNTRY, CUST_DETAIL = EXCLUDED.CUST_DETAIL
;

DROP TABLE CUSTOMER_TEMP;
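
Equivalently, assuming pk_cust is the primary key constraint on CUST_ID, the conflict target can be written as a column list instead of naming the constraint:

INSERT INTO MIGRATION.CUSTOMER(CUST_ID,CUST_CONTACT,CUST_COUNTRY,CUST_DETAIL)
SELECT T.CUST_ID,T.CUST_CONTACT,T.CUST_COUNTRY,T.CUST_DETAIL FROM CUSTOMER_TEMP AS T
ON CONFLICT (CUST_ID)
DO UPDATE SET CUST_CONTACT = EXCLUDED.CUST_CONTACT, CUST_COUNTRY = EXCLUDED.CUST_COUNTRY, CUST_DETAIL = EXCLUDED.CUST_DETAIL;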

INSERT or UPDATE bulk data from dataframe/CSV to PostgreSQL database

In this particular case it is better to drop down to the DB-API level, because you need tools that are not exposed even by SQLAlchemy Core directly, such as copy_expert(). That is available via raw_connection(). If your source data is a CSV file, you do not need pandas at all. Start by creating a temporary staging table, copy the data into it, and then insert into the destination table with conflict handling:

conn = engine.raw_connection()

try:
    with conn.cursor() as cur:
        cur.execute("""CREATE TEMPORARY TABLE TEST_STAGING ( LIKE TEST_TABLE )
                       ON COMMIT DROP""")

        with open("your_source.csv") as data:
            cur.copy_expert("""COPY TEST_STAGING ( itemid, title, street, pincode )
                               FROM STDIN WITH CSV""", data)

        cur.execute("""INSERT INTO TEST_TABLE ( itemid, title, street, pincode )
                       SELECT itemid, title, street, pincode
                       FROM TEST_STAGING
                       ON CONFLICT ( itemid )
                       DO UPDATE SET title = EXCLUDED.title
                                   , street = EXCLUDED.street
                                   , pincode = EXCLUDED.pincode""")

except:
    conn.rollback()
    raise

else:
    conn.commit()

finally:
    conn.close()

If on the other hand your source data is a DataFrame, you can still use COPY by passing a callable as method= to to_sql(). That callable can even hide all of the above logic:

import csv

from io import StringIO
from psycopg2 import sql

def psql_upsert_copy(table, conn, keys, data_iter):
    dbapi_conn = conn.connection

    buf = StringIO()
    writer = csv.writer(buf)
    writer.writerows(data_iter)
    buf.seek(0)

    if table.schema:
        table_name = sql.SQL("{}.{}").format(
            sql.Identifier(table.schema), sql.Identifier(table.name))
    else:
        table_name = sql.Identifier(table.name)

    tmp_table_name = sql.Identifier(table.name + "_staging")
    columns = sql.SQL(", ").join(map(sql.Identifier, keys))

    with dbapi_conn.cursor() as cur:
        # Create the staging table
        stmt = "CREATE TEMPORARY TABLE {} ( LIKE {} ) ON COMMIT DROP"
        stmt = sql.SQL(stmt).format(tmp_table_name, table_name)
        cur.execute(stmt)

        # Populate the staging table
        stmt = "COPY {} ( {} ) FROM STDIN WITH CSV"
        stmt = sql.SQL(stmt).format(tmp_table_name, columns)
        cur.copy_expert(stmt, buf)

        # Upsert from the staging table to the destination. First find
        # out what the primary key columns are.
        stmt = """
               SELECT kcu.column_name
               FROM information_schema.table_constraints tco
               JOIN information_schema.key_column_usage kcu
                 ON kcu.constraint_name = tco.constraint_name
                AND kcu.constraint_schema = tco.constraint_schema
               WHERE tco.constraint_type = 'PRIMARY KEY'
                 AND tco.table_name = %s
               """
        args = (table.name,)

        if table.schema:
            stmt += "AND tco.table_schema = %s"
            args += (table.schema,)

        cur.execute(stmt, args)
        pk_columns = {row[0] for row in cur.fetchall()}
        # Separate "data" columns from (primary) key columns
        data_columns = [k for k in keys if k not in pk_columns]
        # Build conflict_target
        pk_columns = sql.SQL(", ").join(map(sql.Identifier, pk_columns))

        set_ = sql.SQL(", ").join([
            sql.SQL("{} = EXCLUDED.{}").format(k, k)
            for k in map(sql.Identifier, data_columns)])

        stmt = """
               INSERT INTO {} ( {} )
               SELECT {}
               FROM {}
               ON CONFLICT ( {} )
               DO UPDATE SET {}
               """
        stmt = sql.SQL(stmt).format(
            table_name, columns, columns, tmp_table_name, pk_columns, set_)
        cur.execute(stmt)

You would then insert the new DataFrame using

df.to_sql("test_table", engine,
          method=psql_upsert_copy,
          index=False,
          if_exists="append")

Using this method upserting ~1,000,000 rows took about 16s on this machine with a local database.

Import CSV into Postgres and update/replace any fields if need so

You can do this by defining a trigger function which tries to update the existing records and only allows insertion to go ahead if none are found.

For this to work, you need to have a primary key or other criteria for uniquely identifying the rows, of course.

Suppose your table is defined like this:

CREATE TABLE TEST(
    id INT PRIMARY KEY,
    name TEXT,
    amount INT
);

The trigger function might look like this:

CREATE OR REPLACE FUNCTION test_insert_before_func()
RETURNS TRIGGER
AS $BODY$
DECLARE
    exists INTEGER;
BEGIN

    UPDATE test SET name=new.name, amount=new.amount
        WHERE id=new.id
        RETURNING id INTO exists;

    -- If the above was successful, it would return non-null
    -- in that case we return NULL so that the triggered INSERT
    -- does not proceed
    IF exists is not null THEN
        RETURN NULL;
    END IF;

    -- Otherwise, return the new record so that triggered INSERT
    -- goes ahead
    RETURN new;

END;
$BODY$
LANGUAGE 'plpgsql' SECURITY DEFINER;

CREATE TRIGGER test_insert_before_trigger
BEFORE INSERT
ON test
FOR EACH ROW
EXECUTE PROCEDURE test_insert_before_func();

Now, if I insert a row which does not already exist, it is inserted:

test=> insert into test(id,name,amount) values (1,'Mary',100);
INSERT 0 1
test=> select * from test;
 id | name | amount
----+------+--------
  1 | Mary |    100
(1 row)

If I try to insert a row with the same ID:

test=> insert into test(id,name,amount) values (1,'Mary',200);
INSERT 0 0
test=> select * from test;
 id | name | amount
----+------+--------
  1 | Mary |    200
(1 row)

this time the row is updated instead of inserted.

It works just as well if I load the rows from a CSV file.
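
For example, since COPY fires the same per-row BEFORE INSERT trigger, loading a file straight into test upserts each row (the file path is hypothetical):

\copy test (id, name, amount) FROM '/path/to/test_data.csv' (FORMAT csv)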

However, one thing you may not have considered: this will not delete any records that exist in the database but do not exist in the CSV file. If you wanted that to happen, you would need a more complex solution, perhaps a sequence like this (a combined sketch follows the list):

  1. Load the CSV file into a temporary table.
  2. Delete all rows from the real table that do not exist in the temp table:

    DELETE FROM test WHERE id NOT IN (SELECT id FROM temp);
  3. Finally, insert the rows from the temp table into the real table (the trigger takes care of updating existing rows):

    INSERT INTO test(id,name,amount) (SELECT id,name,amount FROM temp);
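
Putting those steps together in a single psql session, a minimal sketch could look like this (the temp table definition and the file path are assumptions; the trigger above handles updates of existing ids):

BEGIN;
CREATE TEMP TABLE temp (LIKE test) ON COMMIT DROP;
\copy temp FROM '/path/to/test_data.csv' (FORMAT csv)
DELETE FROM test WHERE id NOT IN (SELECT id FROM temp);
INSERT INTO test (id, name, amount) SELECT id, name, amount FROM temp;
COMMIT;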

This answer does not consider concurrency issues, in case the table might be updated by other users concurrently. However, if you only ever load it from the CSV file, that is not likely to be an issue.

Is it possible to export certain values from specific rows to a CSV in Datagrip

  1. Write the query that retrieves the needed records. In your case it could be something like:

    SELECT clock_in_time, clock_out_time
    FROM table
    WHERE user_id = 'Bob';

  2. Export the result to CSV via the Export button on the result's toolbar.

