pandas.to_sql replace old data with new data based on 'unique id'
There doesn't seem to be a feature that handles this directly. Currently, I just drop the entire table and recreate it:
from sqlalchemy import MetaData, Table

meta = MetaData()
table_to_drop = Table('Database count details',
                      meta, autoload=True, autoload_with=engine)
table_to_drop.drop(engine)
Pandas to_sql() to update unique values in DB?
In pandas, there is no convenient argument in to_sql to append only non-duplicates to a final table. Consider using a staging temp table that pandas always replaces, then run a final append query to migrate the temp-table records into the final table, keeping only unique primary keys via the NOT EXISTS clause.
engine = sqlalchemy.create_engine(...)
df.to_sql(name='myTempTable', con=engine, if_exists='replace')

with engine.begin() as cn:
    sql = """INSERT INTO myFinalTable (Col1, Col2, Col3, ...)
             SELECT t.Col1, t.Col2, t.Col3, ...
             FROM myTempTable t
             WHERE NOT EXISTS
                 (SELECT 1 FROM myFinalTable f
                  WHERE t.MatchColumn1 = f.MatchColumn1
                    AND t.MatchColumn2 = f.MatchColumn2)"""
    cn.execute(sql)
This is an ANSI SQL solution, not restricted to vendor-specific methods like UPSERT, and so works in practically all SQL-compliant relational databases.
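The staging-table pattern above can be exercised end to end with an in-memory SQLite database, which also understands NOT EXISTS; the table and column names here are illustrative stand-ins for the placeholders above.

```python
import pandas as pd
import sqlalchemy

# in-memory SQLite stands in for any SQL backend; names are illustrative
engine = sqlalchemy.create_engine('sqlite://')

final = pd.DataFrame({'id': [1, 2], 'val': ['a', 'b']})
final.to_sql('myFinalTable', con=engine, index=False)

# the new batch overlaps the final table on id=2
batch = pd.DataFrame({'id': [2, 3], 'val': ['b2', 'c']})
batch.to_sql('myTempTable', con=engine, index=False, if_exists='replace')

with engine.begin() as cn:
    # append only rows whose key is not already in the final table
    cn.execute(sqlalchemy.text("""
        INSERT INTO myFinalTable (id, val)
        SELECT t.id, t.val FROM myTempTable t
        WHERE NOT EXISTS
            (SELECT 1 FROM myFinalTable f WHERE f.id = t.id)
    """))

result = pd.read_sql('SELECT id, val FROM myFinalTable ORDER BY id', con=engine)
print(result['id'].tolist())  # only id=3 was appended
```

Note that the existing row for id=2 keeps its old value; the NOT EXISTS filter prioritises the final table over the incoming batch.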
REPLACE rows in mysql database table with pandas DataFrame
As of this version (0.17.1), I am unable to find any direct way to do this in pandas, so I reported a feature request for it.
I did this in my project by executing some queries using MySQLdb and then using DataFrame.to_sql(if_exists='append').
Suppose
1) product_id is my primary key in table PRODUCT
2) feed_id is my primary key in table XML_FEED.
SIMPLE VERSION
import MySQLdb
import sqlalchemy
import pandas
con = MySQLdb.connect('localhost','root','my_password', 'database_name')
con_str = 'mysql+mysqldb://root:my_password@localhost/database_name'
engine = sqlalchemy.create_engine(con_str) #because I am using mysql
df = pandas.read_sql('SELECT * from PRODUCT', con=engine)
df_product_id = df['product_id']
product_id_str = (str(list(df_product_id.values))).strip('[]')
delete_str = 'DELETE FROM XML_FEED WHERE feed_id IN ({0})'.format(product_id_str)
cur = con.cursor()
cur.execute(delete_str)
con.commit()
df.to_sql('XML_FEED', if_exists='append', con=engine)  # flavor='mysql' would avoid creating a SQLAlchemy engine, but it is deprecated
Please note: the REPLACE [INTO] syntax allows us to INSERT a row into a table, except that if a UNIQUE KEY (including PRIMARY KEY) violation occurs, the old row is deleted prior to the new INSERT, hence no violation.
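SQLite happens to implement the same REPLACE semantics as MySQL, so the delete-then-insert behaviour can be demonstrated with the standard-library sqlite3 module (table and column names here are made up):

```python
import sqlite3

# SQLite's REPLACE shares MySQL's delete-then-insert semantics on key conflict
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE XML_FEED (feed_id INTEGER PRIMARY KEY, payload TEXT)')
con.execute("INSERT INTO XML_FEED VALUES (1, 'old')")

# same key: the old row is deleted and the new one inserted, no violation
con.execute("REPLACE INTO XML_FEED VALUES (1, 'new')")

rows = con.execute('SELECT feed_id, payload FROM XML_FEED').fetchall()
print(rows)  # [(1, 'new')]
```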
pandas dataframe to_sql for replace and add new using sqlalchemy
The if_exists='replace' option is not a row-wise operation. It does not check whether each row already exists and replace only that row; it checks whether the whole table already exists, and if it finds the table, it drops the old table and inserts your new one.
Quoted from the docs:
replace: Drop the table before inserting new values.
What I think you should do is use if_exists='append' and then check for duplicate rows and remove them. That would, for now, be the safest approach.
The method you are looking for is being worked on at the moment and is called upsert; it will only insert records which do not "clash", and you can prioritise the new or old records. See the GitHub ticket.
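Until upsert lands, the append-then-deduplicate approach can be sketched as below (the table, key column, and in-memory SQLite engine are illustrative; keep='first' prioritises the old records, keep='last' the new ones):

```python
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine('sqlite://')  # illustrative in-memory DB

existing = pd.DataFrame({'pk': [1, 2], 'val': ['a', 'b']})
existing.to_sql('my_table', con=engine, index=False)

# append the new batch wholesale, duplicates included
new = pd.DataFrame({'pk': [2, 3], 'val': ['b', 'c']})
new.to_sql('my_table', con=engine, index=False, if_exists='append')

# read back, drop duplicate keys (keeping the old row), and rewrite
df = pd.read_sql('SELECT * FROM my_table', con=engine)
df = df.drop_duplicates(subset='pk', keep='first')
df.to_sql('my_table', con=engine, index=False, if_exists='replace')

deduped = pd.read_sql('SELECT pk FROM my_table ORDER BY pk', con=engine)
print(deduped['pk'].tolist())  # [1, 2, 3]
```

The full read-back and rewrite makes this simple but expensive for large tables, which is why the answers above prefer doing the deduplication inside the database.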
Pandas to_sql - Increase table's index when appending DataFrame
Even though pandas has a lot of export options, its main purpose is not to serve as a database-management API. Managing indexes is typically something the database should take care of.
I would suggest to set index=False, if_exists='append'
and create the table with an auto-increment index:
CREATE TABLE AReg (
    id INT NOT NULL AUTO_INCREMENT,
    # your fields here
    PRIMARY KEY (id)
);
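The combination can be tried out with SQLite, whose INTEGER PRIMARY KEY column auto-increments just like MySQL's AUTO_INCREMENT (the table and column names are illustrative):

```python
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine('sqlite://')

# SQLite's INTEGER PRIMARY KEY auto-increments, standing in for AUTO_INCREMENT
with engine.begin() as cn:
    cn.execute(sqlalchemy.text(
        'CREATE TABLE AReg (id INTEGER PRIMARY KEY, name TEXT)'))

# index=False: pandas sends only the data columns, the DB assigns ids
df = pd.DataFrame({'name': ['x', 'y']})
df.to_sql('AReg', con=engine, index=False, if_exists='append')
df.to_sql('AReg', con=engine, index=False, if_exists='append')  # second batch

out = pd.read_sql('SELECT id, name FROM AReg ORDER BY id', con=engine)
print(out['id'].tolist())  # ids assigned by the database
```

Because the id column never appears in the INSERT, the sequence keeps counting across batches without pandas having to track it.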
to_sql() method of pandas sends primary key column as NULL even if the column is not present in dataframe
Please note that pandas.DataFrame.to_sql() has the parameter index=True by default, which means it will add an extra column (df.index) when inserting the data.
Some Databases like PostgreSQL have a data type serial
which allows you to sequentially fill the column with incremental numbers.
Snowflake DB doesn't have that concept but instead, there are other ways to handle it:
First Option:
You can use a CREATE SEQUENCE statement and create a sequence directly in the DB (see the official documentation on this topic). The downside of this approach is that you would need to convert your DataFrame into a proper SQL statement:
db preparation part:
CREATE OR REPLACE SEQUENCE schema.my_sequence START = 1 INCREMENT = 1;
CREATE OR REPLACE TABLE schema.my_table (i bigint, b text);
You would need to convert the DataFrame into Snowflake's INSERT
statement and use schema.my_sequence.nextval
to get the next ID value
INSERT INTO schema.my_table VALUES
(schema.my_sequence.nextval, 'string_1'),
(schema.my_sequence.nextval, 'string_2');
The result will be:
i b
1 string_1
2 string_2
Please note that there are some limitations to this approach: you need to ensure that each insert statement succeeds, since calling schema.my_sequence.nextval without inserting the value will leave gaps in the numbering.
To avoid this you can have a separate script that checks whether the current insert was successful and, if not, recreates the sequence by calling:
CREATE OR REPLACE SEQUENCE schema.my_sequence START = (SELECT max(i) FROM schema.my_table) INCREMENT = 1;
Alternative Option:
You would need to create an extra function that runs the SQL to get the last i you inserted previously.
SELECT max(i) AS max_i FROM schema.my_table;
and then update the index
in your DataFrame before running to_sql()
df.index = range(max_i+1, len(df)+max_i+1)
This will ensure that your DataFrame index continues from the last i in your table.
Once that is done you can use
df.to_sql(index_label='i', name='my_table', con=connection_object)
It will use your index as one of the columns you insert allowing you to maintain the unique index in the table.
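The whole alternative flow can be sketched with an in-memory SQLite database standing in for Snowflake (table and column names follow the examples above and are illustrative):

```python
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine('sqlite://')  # stand-in for Snowflake

seed = pd.DataFrame({'i': [1, 2], 'b': ['string_1', 'string_2']})
seed.to_sql('my_table', con=engine, index=False)

# fetch the last id used, then shift the new frame's index past it
max_i = pd.read_sql('SELECT max(i) AS max_i FROM my_table',
                    con=engine)['max_i'].iloc[0]
df = pd.DataFrame({'b': ['string_3', 'string_4']})
df.index = range(max_i + 1, len(df) + max_i + 1)

# the shifted index is written as column 'i'
df.to_sql('my_table', con=engine, if_exists='append', index_label='i')

out = pd.read_sql('SELECT i, b FROM my_table ORDER BY i', con=engine)
print(out['i'].tolist())
```

Note that this is racy if two writers run concurrently, since both could read the same max(i); the sequence-based option avoids that.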
How to insert a pandas dataframe to an already existing table in a database?
Make use of the if_exists parameter:
df.to_sql('db_table2', engine, if_exists='replace')
or
df.to_sql('db_table2', engine, if_exists='append')
from docstring:
"""
if_exists : {'fail', 'replace', 'append'}, default 'fail'
- fail: If table exists, do nothing.
- replace: If table exists, drop it, recreate it, and insert data.
- append: If table exists, insert data. Create if does not exist.
"""
How do I perform an UPDATE of existing rows of a db table using a Pandas DataFrame?
I think the easiest way would be to first delete the rows that are going to be "upserted". This can be done in a loop, but it's not very efficient for bigger data sets (5K+ rows), so I'd save this slice of the DF into a temporary MySQL table:
# assuming we have already changed values in the rows and saved those changed rows in a separate DF: `x`
x = df[mask] # `mask` should help us to find changed rows...
# make sure `x` DF has a Primary Key column as index
x = x.set_index('a')
# dump a slice with changed rows to temporary MySQL table
x.to_sql('my_tmp', engine, if_exists='replace', index=True)
conn = engine.connect()
trans = conn.begin()

try:
    # delete those rows that we are going to "upsert";
    # use `conn` (not `engine`) so the delete runs inside this transaction
    conn.execute('delete from test_upsert where a in (select a from my_tmp)')
    trans.commit()

    # insert changed rows
    x.to_sql('test_upsert', engine, if_exists='append', index=True)
except:
    trans.rollback()
    raise
P.S. I didn't test this code, so it might have some small bugs, but it should give you an idea...
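A runnable version of the delete-then-append upsert, using an in-memory SQLite database in place of MySQL (table and column names follow the answer above and are illustrative):

```python
import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine('sqlite://')  # stand-in for MySQL

base = pd.DataFrame({'a': [1, 2, 3],
                     'v': ['old', 'old', 'old']}).set_index('a')
base.to_sql('test_upsert', engine, index=True)

# changed rows, keyed by the primary-key column 'a'
x = pd.DataFrame({'a': [2, 3], 'v': ['new', 'new']}).set_index('a')
x.to_sql('my_tmp', engine, if_exists='replace', index=True)

# delete the rows being upserted, then append the changed versions
with engine.begin() as conn:
    conn.execute(sqlalchemy.text(
        'delete from test_upsert where a in (select a from my_tmp)'))
x.to_sql('test_upsert', engine, if_exists='append', index=True)

out = pd.read_sql('SELECT a, v FROM test_upsert ORDER BY a', con=engine)
print(out['v'].tolist())  # row 1 untouched, rows 2 and 3 replaced
```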