Writing to MySQL Database with Pandas Using SQLAlchemy, to_sql


Using the engine in place of the raw_connection() worked:

import pandas as pd
import mysql.connector
from sqlalchemy import create_engine

engine = create_engine('mysql+mysqlconnector://[user]:[pass]@[host]:[port]/[schema]', echo=False)
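# data is an existing pandas DataFrame (defined elsewhere)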
data.to_sql(name='sample_table2', con=engine, if_exists='append', index=False)

It's not clear why the same code raised the earlier error when I tried it yesterday.

Writing a Pandas Dataframe to MySQL

Besides an SQLAlchemy engine, to_sql can also be used in legacy mode via the flavor argument; this will be deprecated in a future release, but as of pandas 0.18.1 it is still documented and working.

According to the pandas documentation for pandas.DataFrame.to_sql, you can use the following syntax:

DataFrame.to_sql(name, con, flavor='sqlite', schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None)

You specify the con type/mode and the flavor 'mysql'; here is the relevant description from the docs:

con : SQLAlchemy engine or DBAPI2 connection (legacy mode)
Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported.

flavor : {‘sqlite’, ‘mysql’}, default ‘sqlite’
The flavor of SQL to use. Ignored when using SQLAlchemy engine. ‘mysql’ is deprecated and will be removed in future versions, but it will be further supported through SQLAlchemy engines.
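
To make the two modes concrete, here is a minimal sketch (placeholder credentials and table names; assumes pandas 0.18.x, where the legacy path still exists):

import sqlite3
import pandas as pd
from sqlalchemy import create_engine

df = pd.DataFrame({'a': [1, 2, 3]})

# SQLAlchemy engine: any DB the library supports; flavor is ignored
engine = create_engine('mysql+mysqlconnector://[user]:[pass]@[host]:[port]/[schema]')
df.to_sql('sample_table', con=engine, if_exists='append', index=False)

# legacy DBAPI2 mode: per the docs above, only sqlite3 connections work here
con = sqlite3.connect('sample.db')
df.to_sql('sample_table', con=con, if_exists='append', index=False)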

Pandas can't write all tables and rows to MySQL using SQLAlchemy

It looks like the process might take too long and times out the connection. Based on the time frame you're giving for how long it takes to write this data, this seems to be the case. There are two options that aren't mutually exclusive, so you can do both if needed. The first is to increase the wait_timeout for your MySQL server. You can do this in Python or in MySQL Workbench, and this answer tells you how to do it both ways; the Python route is sketched below.
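
A minimal sketch of the Python route (the timeout value and connection details are placeholders, and the change applies only to that session):

from sqlalchemy import create_engine, text

engine = create_engine('mysql+mysqldb://[user]:[pass]@[host]:[port]/[schema]')
with engine.connect() as conn:
    # value is in seconds; 28800 (8 hours) is MySQL's default
    conn.execute(text("SET SESSION wait_timeout = 28800"))
# the equivalent in MySQL Workbench (needs privileges) affects all sessions:
#   SET GLOBAL wait_timeout = 28800;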

The second option, which is a little more involved, is that you can switch from using SQLAlchemy to directly using pyodbc to perform the insert operations. That's not faster in itself, but it allows you to turn the DataFrame you are trying to push into a list of tuples, which is faster to push in an insert query than it is to push a DataFrame with .to_sql(). There might be some additional coding to do for things that SQLAlchemy handles automatically, but it is an option for increasing your performance. That said, I would highly recommend trying the first by itself to make sure that this would even be necessary before you try incorporating this strategy.
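
A rough sketch of that pyodbc approach (the DSN, table, and column names here are hypothetical):

import pyodbc

conn = pyodbc.connect('DSN=my_mysql_dsn')  # assumes a configured MySQL ODBC DSN
cursor = conn.cursor()

# turn the DataFrame into a list of plain tuples
rows = list(df.itertuples(index=False, name=None))

cursor.executemany(
    "INSERT INTO my_table (col_a, col_b) VALUES (?, ?)",
    rows,
)
conn.commit()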

For the missing rows, I don't really know why a set of records would only be partially inserted into a table. Is autocommit set to True anywhere?
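
For context on why that matters, a hedged sketch with a raw MySQLdb connection (credentials are placeholders):

import MySQLdb

con = MySQLdb.connect(host='[host]', user='[user]', passwd='[pass]', db='[schema]')
con.autocommit(True)  # each statement commits as it runs, so a mid-run failure
                      # leaves behind exactly the rows inserted so far
# with autocommit left off (the default), nothing persists until con.commit(),
# so a dropped connection loses the whole batch rather than part of it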

Lastly, this answer covers a lot of ground in case I didn't accurately assess the issue.

How to insert pandas dataframe via mysqldb into database?

Update:

There is now a to_sql method, which is the preferred way to do this, rather than write_frame:

df.to_sql(con=con, name='table_name_for_df', if_exists='replace', flavor='mysql')

Also note: the syntax may change in pandas 0.14...

You can set up the connection with MySQLdb:

from pandas.io import sql
import MySQLdb

con = MySQLdb.connect() # may need to add some other options to connect
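# e.g. MySQLdb.connect(host='localhost', user='[user]', passwd='[pass]', db='[schema]')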

Setting the flavor of write_frame to 'mysql' means you can write to MySQL:

sql.write_frame(df, con=con, name='table_name_for_df',
                if_exists='replace', flavor='mysql')

The argument if_exists tells pandas how to behave if the table already exists (a short sketch follows the options):

if_exists: {'fail', 'replace', 'append'}, default 'fail'
     fail: If table exists, do nothing.

     replace: If table exists, drop it, recreate it, and insert data.

     append: If table exists, insert data. Create if does not exist.
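
For instance, with placeholder DataFrames df and df2 and the MySQLdb connection con from above:

# first call creates the table, since it does not exist yet
sql.write_frame(df, con=con, name='table_name_for_df', if_exists='fail', flavor='mysql')

# a later call adds rows to the now-existing table
sql.write_frame(df2, con=con, name='table_name_for_df', if_exists='append', flavor='mysql')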

Although the write_frame docs currently suggest it only works on sqlite, mysql appears to be supported and in fact there is quite a bit of mysql testing in the codebase.

Pandas 0.20.2 to_sql() using MySQL

Thanks to a tip from @AndyHayden, this answer was the trick. Basically replacing mysqlconnector with mysqldb was the linchpin.

from sqlalchemy import create_engine

engine = create_engine('mysql+mysqldb://[user]:[pass]@[host]:[port]/[schema]', echo=False)
df.to_sql(name='my_table', con=engine, if_exists='append', index=False)

Here [schema] is the database name; in my particular case, :[port] was omitted and [host] was localhost.


