How to update selected rows with values from a CSV file in Postgres?
COPY the file to a temporary staging table and update the actual table from there. Like:
CREATE TEMP TABLE tmp_x (id int, apple text, banana text); -- but see below
COPY tmp_x FROM '/absolute/path/to/file' (FORMAT csv);
UPDATE tbl
SET banana = tmp_x.banana
FROM tmp_x
WHERE tbl.id = tmp_x.id;
DROP TABLE tmp_x; -- else it is dropped at end of session automatically
If the imported table matches the table to be updated exactly, this may be convenient:
CREATE TEMP TABLE tmp_x AS SELECT * FROM tbl LIMIT 0;
Creates an empty temporary table matching the structure of the existing table, without constraints.
Privileges
Up to Postgres 10, SQL COPY requires superuser privileges for this.
In Postgres 11 or later, there are also some predefined roles (formerly "default roles") to allow it. The manual:
COPY naming a file or command is only allowed to database superusers or users who are granted one of the roles pg_read_server_files, pg_write_server_files, or pg_execute_server_program [...]
The psql meta-command \copy works for any db role. The manual:
Performs a frontend (client) copy. This is an operation that runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required.
The scope of temporary tables is limited to a single session of a single role, so the above has to be executed in the same psql session:
CREATE TEMP TABLE ...;
\copy tmp_x FROM '/absolute/path/to/file' (FORMAT csv);
UPDATE ...;
If you are scripting this in a bash command, be sure to wrap it all in a single psql call. Like:
echo 'CREATE TEMP TABLE tmp_x ...; \copy tmp_x FROM ...; UPDATE ...;' | psql
Normally, you need the separator meta-command \\ to switch between psql meta-commands and SQL commands in psql, but \copy is an exception to this rule. The manual again:
Special parsing rules apply to the \copy meta-command. Unlike most other meta-commands, the entire remainder of the line is always taken to be the arguments of \copy, and neither variable interpolation nor backquote expansion are performed in the arguments.
Big tables
If the import table is big, it may pay to increase temp_buffers temporarily for the session (first thing in the session):
SET temp_buffers = '500MB'; -- example value
Add an index to the temporary table:
CREATE INDEX tmp_x_id_idx ON tmp_x(id);
And run ANALYZE manually, since temporary tables are not covered by autovacuum / auto-analyze:
ANALYZE tmp_x;
Related answers:
- Best way to delete millions of rows by ID
- How can I insert common data into a temp table from disparate schemas?
- How to delete duplicate entries?
Update table rows based on values in a text file
The simple way is to load the file with COPY into a temporary table and then use UPDATE ... FROM or INSERT ... ON CONFLICT to update your table.
Alternatively, you can use file_fdw to define the file as a foreign table and UPDATE using that.
How to update a column in Postgres with data from CSV file using matching values
You need to create an intermediate table (aka "staging table").
Then import the CSV file into that table. After that you can update the target table from the imported data:
update target_table
set email = st.user_eamil
from staging_table st
where st."user" = target_table.name;
This assumes that name is unique in your target table and that every user appears exactly once in the input file.
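As a self-contained illustration of that update-by-matching-key step, here is a sketch using Python's sqlite3 with hypothetical data (the column names, including the question's misspelled user_eamil, are kept from the example above; a correlated subquery stands in for Postgres's UPDATE ... FROM):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_table (name TEXT PRIMARY KEY, email TEXT)")
conn.execute('CREATE TEMP TABLE staging_table ("user" TEXT, user_eamil TEXT)')
conn.executemany("INSERT INTO target_table VALUES (?, ?)",
                 [("bob", None), ("ann", "ann@example.com")])
conn.execute("INSERT INTO staging_table VALUES (?, ?)",
             ("bob", "bob@example.com"))

# Update only the rows that have a match in the staging table;
# the WHERE clause keeps unmatched rows (ann) from being set to NULL.
conn.execute("""
    UPDATE target_table
    SET email = (SELECT user_eamil FROM staging_table st
                 WHERE st."user" = target_table.name)
    WHERE name IN (SELECT "user" FROM staging_table)
""")
print(conn.execute("SELECT name, email FROM target_table ORDER BY name").fetchall())
# [('ann', 'ann@example.com'), ('bob', 'bob@example.com')]
```

Note the outer WHERE clause: without it, the correlated-subquery form would overwrite non-matching rows with NULL, a pitfall the Postgres UPDATE ... FROM form avoids automatically.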
update and insert records from a csv file in postgresql
You should use the keyword EXCLUDED:
CREATE TEMP TABLE CUSTOMER_TEMP (LIKE migration.customer);
COPY CUSTOMER_TEMP(CUST_ID,CUST_CONTACT,CUST_COUNTRY,CUST_DETAIL) from 'C:\demo_dir\CUSTOMER_DATA_CHANGE.csv' DELIMITER '|';
INSERT INTO MIGRATION.CUSTOMER(CUST_ID,CUST_CONTACT,CUST_COUNTRY,CUST_DETAIL)
(SELECT T.CUST_ID,T.CUST_CONTACT,T.CUST_COUNTRY,T.CUST_DETAIL FROM CUSTOMER_TEMP AS T)
on conflict on constraint pk_cust
do update set CUST_CONTACT = EXCLUDED.CUST_CONTACT, CUST_COUNTRY = EXCLUDED.CUST_COUNTRY, CUST_DETAIL = EXCLUDED.CUST_DETAIL
;
DROP TABLE CUSTOMER_TEMP;
INSERT or UPDATE bulk data from dataframe/CSV to PostgreSQL database
In this particular case it is better to drop down to DB-API level, because you need some tools that are not exposed even by SQLAlchemy Core directly, such as copy_expert(). That can be done using raw_connection(). If your source data is a CSV file, you do not need pandas in this case at all. Start by creating a temporary staging table, copy data to the temp table, and insert to the destination table with conflict handling:
conn = engine.raw_connection()
try:
    with conn.cursor() as cur:
        cur.execute("""CREATE TEMPORARY TABLE TEST_STAGING ( LIKE TEST_TABLE )
                       ON COMMIT DROP""")
        with open("your_source.csv") as data:
            cur.copy_expert("""COPY TEST_STAGING ( itemid, title, street, pincode )
                               FROM STDIN WITH CSV""", data)
        cur.execute("""INSERT INTO TEST_TABLE ( itemid, title, street, pincode )
                       SELECT itemid, title, street, pincode
                       FROM TEST_STAGING
                       ON CONFLICT ( itemid )
                       DO UPDATE SET title = EXCLUDED.title
                                   , street = EXCLUDED.street
                                   , pincode = EXCLUDED.pincode""")
except:
    conn.rollback()
    raise
else:
    conn.commit()
finally:
    conn.close()
If on the other hand your source data is a DataFrame, you can still use COPY by passing a function as method= to to_sql(). The function could even hide all of the above logic:
import csv
from io import StringIO

from psycopg2 import sql

def psql_upsert_copy(table, conn, keys, data_iter):
    dbapi_conn = conn.connection
    buf = StringIO()
    writer = csv.writer(buf)
    writer.writerows(data_iter)
    buf.seek(0)

    if table.schema:
        table_name = sql.SQL("{}.{}").format(
            sql.Identifier(table.schema), sql.Identifier(table.name))
    else:
        table_name = sql.Identifier(table.name)

    tmp_table_name = sql.Identifier(table.name + "_staging")
    columns = sql.SQL(", ").join(map(sql.Identifier, keys))

    with dbapi_conn.cursor() as cur:
        # Create the staging table
        stmt = "CREATE TEMPORARY TABLE {} ( LIKE {} ) ON COMMIT DROP"
        stmt = sql.SQL(stmt).format(tmp_table_name, table_name)
        cur.execute(stmt)

        # Populate the staging table
        stmt = "COPY {} ( {} ) FROM STDIN WITH CSV"
        stmt = sql.SQL(stmt).format(tmp_table_name, columns)
        cur.copy_expert(stmt, buf)

        # Upsert from the staging table to the destination. First find
        # out what the primary key columns are.
        stmt = """
            SELECT kcu.column_name
            FROM information_schema.table_constraints tco
            JOIN information_schema.key_column_usage kcu
            ON kcu.constraint_name = tco.constraint_name
            AND kcu.constraint_schema = tco.constraint_schema
            WHERE tco.constraint_type = 'PRIMARY KEY'
            AND tco.table_name = %s
            """
        args = (table.name,)

        if table.schema:
            stmt += "AND tco.table_schema = %s"
            args += (table.schema,)

        cur.execute(stmt, args)
        pk_columns = {row[0] for row in cur.fetchall()}
        # Separate "data" columns from (primary) key columns
        data_columns = [k for k in keys if k not in pk_columns]
        # Build conflict_target
        pk_columns = sql.SQL(", ").join(map(sql.Identifier, pk_columns))
        set_ = sql.SQL(", ").join([
            sql.SQL("{} = EXCLUDED.{}").format(k, k)
            for k in map(sql.Identifier, data_columns)])

        stmt = """
            INSERT INTO {} ( {} )
            SELECT {}
            FROM {}
            ON CONFLICT ( {} )
            DO UPDATE SET {}
            """
        stmt = sql.SQL(stmt).format(
            table_name, columns, columns, tmp_table_name, pk_columns, set_)
        cur.execute(stmt)
You would then insert the new DataFrame using:
df.to_sql("test_table", engine,
          method=psql_upsert_copy,
          index=False,
          if_exists="append")
Using this method upserting ~1,000,000 rows took about 16s on this machine with a local database.
Import CSV into Postgres and update/replace any fields if need so
You can do this by defining a trigger function which tries to update the existing records and only allows insertion to go ahead if none are found.
For this to work, you need to have a primary key or other criteria for uniquely identifying the rows, of course.
Suppose your table is defined like this:
CREATE TABLE TEST(
id INT PRIMARY KEY,
name TEXT,
amount INT
);
The trigger function might look like this:
CREATE OR REPLACE FUNCTION test_insert_before_func()
RETURNS TRIGGER
AS $BODY$
DECLARE
exists INTEGER;
BEGIN
UPDATE test SET name=new.name, amount=new.amount
WHERE id=new.id
RETURNING id INTO exists;
-- If the above was successful, it would return non-null
-- in that case we return NULL so that the triggered INSERT
-- does not proceed
IF exists is not null THEN
RETURN NULL;
END IF;
-- Otherwise, return the new record so that triggered INSERT
-- goes ahead
RETURN new;
END;
$BODY$
LANGUAGE plpgsql SECURITY DEFINER;
CREATE TRIGGER test_insert_before_trigger
BEFORE INSERT
ON test
FOR EACH ROW
EXECUTE PROCEDURE test_insert_before_func();
Now, if I insert a row which does not already exist, it is inserted:
test=> insert into test(id,name,amount) values (1,'Mary',100);
INSERT 0 1
test=> select * from test;
id | name | amount
----+------+--------
1 | Mary | 100
(1 row)
If I try to insert a row with the same ID:
test=> insert into test(id,name,amount) values (1,'Mary',200);
INSERT 0 0
test=> select * from test;
id | name | amount
----+------+--------
1 | Mary | 200
(1 row)
this time the row is updated instead of inserted.
It works just as well if I load the rows from a CSV file.
However, one thing you may not have considered: this will not delete any records that exist in the database and do not exist in the CSV file. If you wanted that to work, you would need a more complex solution - perhaps a sequence like this:
- Load the CSV file into a temporary table.
- Delete all rows from the real table that do not exist in the temp table:
DELETE FROM test WHERE id NOT IN (SELECT id FROM temp);
- Finally, insert rows from the temp table into the real table:
INSERT INTO test(id,name,amount) (SELECT id,name,amount FROM temp);
This answer does not consider concurrency issues, in case the table might be updated by other users concurrently. However, if you only ever load it from the CSV file, that is not likely to be an issue.
Is it possible to export certain values from specific rows to a CSV in Datagrip
- Write the query that retrieves the needed records. In your case it can be something like:
SELECT clock_in_time, clock_out_time
FROM your_table
WHERE user_id = 'Bob';
- Export the result to CSV via the Export button on the result's toolbar.