How to Upsert Pandas Dataframe to Microsoft SQL Server Table

How to upsert pandas DataFrame to Microsoft SQL Server table?

Update, July 2022: You can save some typing by using this function to build the MERGE statement and perform the upsert for you.


SQL Server offers the MERGE statement:

import pandas as pd
import sqlalchemy as sa

connection_string = (
"Driver=ODBC Driver 17 for SQL Server;"
"Server=192.168.0.199;"
"UID=scott;PWD=tiger^5HHH;"
"Database=test;"
"UseFMTONLY=Yes;"
)
connection_url = sa.engine.URL.create(
"mssql+pyodbc",
query={"odbc_connect": connection_string}
)

engine = sa.create_engine(connection_url, fast_executemany=True)

with engine.begin() as conn:
# step 0.0 - create test environment
conn.exec_driver_sql("DROP TABLE IF EXISTS main_table")
conn.exec_driver_sql(
"CREATE TABLE main_table (id int primary key, txt varchar(50))"
)
conn.exec_driver_sql(
"INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
)
# step 0.1 - create DataFrame to UPSERT
df = pd.DataFrame(
[(2, "new row 2 text"), (1, "row 1 new text")], columns=["id", "txt"]
)

# step 1 - upload DataFrame to temporary table
df.to_sql("#temp_table", conn, index=False, if_exists="replace")

# step 2 - merge temp_table into main_table
conn.exec_driver_sql(
"""\
MERGE main_table WITH (HOLDLOCK) AS main
USING (SELECT id, txt FROM #temp_table) AS temp
ON (main.id = temp.id)
WHEN MATCHED THEN
UPDATE SET txt = temp.txt
WHEN NOT MATCHED THEN
INSERT (id, txt) VALUES (temp.id, temp.txt);
"""
)

# step 3 - confirm results
result = conn.exec_driver_sql(
"SELECT * FROM main_table ORDER BY id"
).fetchall()
print(result)
# [(1, 'row 1 new text'), (2, 'new row 2 text')]

How to update db faster using pyodbc

Perhaps you can preprocess your data_json into two groups: one that needs to be inserted (does not already exist in the database) and one that needs to be updated (already exists in the database). Then you can use your df.to_sql call with method="multi" and chunksize=50. You may be able to convert your data_json directly into a Pandas DataFrame with pandas.DataFrame(data_json) (here I am assuming data_json is a Python object, probably a dictionary, instead of a string in JSON format).

If you can convert your data_json into a data frame first you may be able to run

df = pd.DataFrame(data_json)
df.loc[lambda D: D.item_id.isin(setid_db), :].to_sql(<UPDATE>)
df.loc[lambda D: ~D.item_id.isin(setid_db), :].to_sql(<INSERT>)


Related Topics



Leave a reply



Submit