Inserting multiple rows using psycopg2
To use the execute method, place the data to be inserted in a list. psycopg2 adapts a Python list to a PostgreSQL array, so you can then unnest the array and cast the values as necessary:
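import psycopg2

# The list parameter is adapted to a PostgreSQL array; unnest() turns it
# back into rows, and the column definition casts each value to timestamp.
insert = """
insert into history ("timestamp")
select value
from unnest(%s) s(value timestamp)
returning *
;"""

data = [('2014-04-27 14:07:30.000000',), ('2014-04-27 14:07:35.000000',)]

conn = psycopg2.connect("host=localhost4 port=5432 dbname=cpn")
cursor = conn.cursor()
# Wrap the list in a one-element tuple so it is passed as a single parameter.
cursor.execute(insert, (data,))
print(cursor.fetchall())
conn.commit()
conn.close()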
I'm not sure whether the performance difference from executemany will be significant, but I think the above is neater. The returning clause will, as the name suggests, return the inserted tuples.
BTW, timestamp is a keyword (reserved in the SQL standard) and is best avoided as a column name; that is why it has to be double-quoted in the query above.
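A variation on the same idea, as a minimal sketch: if the values are kept in a flat list of strings rather than a list of one-element tuples, the array can be cast as a whole with %s::timestamp[] before unnesting. The function name insert_timestamps is made up for illustration; the history table and its "timestamp" column are taken from the example above, and the cursor is assumed to be an open psycopg2 cursor.

```python
def insert_timestamps(cursor, timestamps):
    """Insert every timestamp in one statement and return the inserted rows.

    `cursor` is assumed to be an open psycopg2 cursor; the table and column
    names come from the example above.
    """
    sql = (
        'insert into history ("timestamp") '
        "select unnest(%s::timestamp[]) "
        "returning *"
    )
    # A Python list is adapted to a PostgreSQL array; wrapping it in a
    # one-element tuple passes it as a single query parameter.
    cursor.execute(sql, (list(timestamps),))
    return cursor.fetchall()
```

Called as insert_timestamps(cursor, ['2014-04-27 14:07:30', '2014-04-27 14:07:35']), followed by conn.commit() as in the original snippet.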
psycopg2: update multiple rows with one query
By splitting the list into chunks of size equal to page_size, it worked well:
import psycopg2.extras
from tqdm import tqdm

def update_items(rows_to_update):
    sql_query = """UPDATE contact as t SET
                    name = data.name
                    FROM (VALUES %s) AS data (id, name)
                    WHERE t.id = data.id"""
    # get_db_connection() is the poster's own helper and is not shown here.
    conn = get_db_connection()
    cur = conn.cursor()
    n = 100  # chunk size, also passed as page_size
    with tqdm(total=len(rows_to_update)) as pbar:
        for i in range(0, len(rows_to_update), n):
            psycopg2.extras.execute_values(
                cur, sql_query, rows_to_update[i:i + n], template=None, page_size=n
            )
            conn.commit()
            # Each call sends exactly one page, so rowcount covers the whole chunk.
            pbar.update(cur.rowcount)
    cur.close()
    conn.close()
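The chunking step can also be pulled out into a small helper, which keeps the update loop shorter and works for any iterable, not only lists. This is just a sketch; the name chunked is made up for illustration and is not part of psycopg2.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        # islice pulls the next `size` items without copying the whole input.
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

The loop then becomes "for chunk in chunked(rows_to_update, n):", with chunk passed to execute_values in place of rows_to_update[i:i + n].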
Efficiently insert massive amount of rows in Psycopg2
Based on the answers given here, COPY is the fastest method. COPY reads from a file or file-like object.
Since memory I/O is many orders of magnitude faster than disk I/O, it is faster to write the data to a StringIO file-like object than to write to an actual file.
The psycopg docs show an example of calling copy_from with a StringIO as input.
Therefore, you could use something like:
try:
    # Python2
    from cStringIO import StringIO
except ImportError:
    # Python3
    from io import StringIO

def db_insert_spectrum(curs, visual_data, recording_id):
    f = StringIO()
    # visual_data is a 2D array (a nx63 matrix)
    for rowIndex, rowData in enumerate(visual_data):
        items = []
        for colIndex, colData in enumerate(rowData):
            value = (rowIndex, colIndex, colData, recording_id)
            # One tab-separated line per cell, the default format copy_from expects
            items.append('\t'.join(map(str, value))+'\n')
        f.writelines(items)
    # Rewind so copy_from reads the buffer from the start
    f.seek(0)
    curs.copy_from(f, 'spectrums', columns=('row', 'col', 'value', 'recording_id'))
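The snippet above is fine for purely numeric data. If the rows may contain None or strings with tabs, newlines, or backslashes, the values need escaping before they go into the buffer, because copy_from reads tab-separated text with \N as the NULL marker by default. A rough sketch of that, where copy_field and rows_to_copy_buffer are hypothetical helper names rather than psycopg2 functions:

```python
from io import StringIO

# Backslash is escaped first so the backslashes added by the later
# replacements are not escaped a second time.
_ESCAPES = (("\\", "\\\\"), ("\t", "\\t"), ("\n", "\\n"), ("\r", "\\r"))


def copy_field(value):
    """Render one value for COPY text format (default NULL marker is \\N)."""
    if value is None:
        return "\\N"
    text = str(value)
    for raw, escaped in _ESCAPES:
        text = text.replace(raw, escaped)
    return text


def rows_to_copy_buffer(rows):
    """Build a rewound in-memory buffer with one tab-separated line per row."""
    buf = StringIO()
    for row in rows:
        buf.write("\t".join(copy_field(v) for v in row) + "\n")
    buf.seek(0)
    return buf
```

The result can be passed straight to copy_from, e.g. curs.copy_from(rows_to_copy_buffer(rows), 'spectrums', columns=('row', 'col', 'value', 'recording_id')).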