How to Insert Pandas Dataframe via MySQLdb into Database

How to insert pandas dataframe via mysqldb into database?

Update:

There is now a to_sql method, which is the preferred way to do this, rather than write_frame:

df.to_sql(con=con, name='table_name_for_df', if_exists='replace', flavor='mysql')

Also note: the syntax may change in pandas 0.14...

You can set up the connection with MySQLdb:

from pandas.io import sql
import MySQLdb

con = MySQLdb.connect() # may need to add some other options to connect

Setting the flavor of write_frame to 'mysql' means you can write to mysql:

sql.write_frame(df, con=con, name='table_name_for_df',
                if_exists='replace', flavor='mysql')

The argument if_exists tells pandas how to deal if the table already exists:

if_exists: {'fail', 'replace', 'append'}, default 'fail'
     fail: If table exists, do nothing.

     replace: If table exists, drop it, recreate it, and insert data.

     append: If table exists, insert data. Create if does not exist.
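For instance, a job that loads new rows every day would use 'append' so existing rows are kept (a sketch against the same legacy write_frame API; daily_df is a hypothetical DataFrame):

sql.write_frame(daily_df, con=con, name='table_name_for_df',
                if_exists='append', flavor='mysql')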

Although the write_frame docs currently suggest it only works on sqlite, mysql appears to be supported and in fact there is quite a bit of mysql testing in the codebase.

Write Pandas DataFrame into an existing MySQL Database Table

In an ideal situation, any database operation needs:

  • A database engine
  • A connection
  • A cursor created from the connection
  • An INSERT SQL statement
  • The CSV data, read row by row or all at once, and inserted into the table

That is just a concept.

import pymysql

# Connect to the database
connection = pymysql.connect(host='localhost',
                             user='<user>',
                             password='<pass>',
                             db='<db_name>')

# create cursor
cursor = connection.cursor()

# Insert DataFrame records one by one
sql = "INSERT INTO client_info (code, name, nac) VALUES (%s, %s, %s)"
for i, row in Client_Table1.iterrows():
    cursor.execute(sql, tuple(row))

# the connection is not autocommitted by default, so we must commit to save our changes
connection.commit()

connection.close()

Again, this is just a concept; I cannot test the code I have written, so there may be errors you need to debug. For example, there could be a data-type mismatch, since I am passing every column as a string with %s. Please read more in detail here.
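If you do hit a type mismatch, one workaround (a sketch; treating code as an integer and name/nac as strings is my guess, not something stated above) is to cast each value explicitly before executing:

sql = "INSERT INTO client_info (code, name, nac) VALUES (%s, %s, %s)"
for i, row in Client_Table1.iterrows():
    # cast to the types the columns are assumed to expect: int, str, str
    cursor.execute(sql, (int(row['code']), str(row['name']), str(row['nac'])))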

Edit Based on Comment:

You can create a separate method for each table, each with its own SQL statement, and then run them all at the end. Again, that is just a concept and can be generalised further.

def insert_into_client_info():
    # create cursor
    cursor = connection.cursor()

    # Insert DataFrame records one by one
    sql = "INSERT INTO client_info (code, name, nac) VALUES (%s, %s, %s)"
    for i, row in Client_Table1.iterrows():
        cursor.execute(sql, tuple(row))

    # the connection is not autocommitted by default, so we must commit to save our changes
    connection.commit()
    cursor.close()

def insert_into_any_table():
    # create a cursor
    cursor = connection.cursor()
    # build the SQL statement for that table and loop over its DataFrame
    # ...
    connection.commit()
    cursor.close()

## Call all the functions one after another
insert_into_client_info()
insert_into_any_table()

# close the connection at the end
connection.close()
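If one of the inserts raises, the plain sequence above never reaches connection.close(); a try/finally variant guards against that (my addition, not part of the original concept):

try:
    insert_into_client_info()
    insert_into_any_table()
finally:
    # runs even if an insert fails, so the connection is always closed
    connection.close()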

Insert Python Dataframes into MySQL

You probably don't need to iterate over the DataFrame; just use the to_sql method:

import sqlalchemy as sa

e = sa.create_engine(...)
df.to_sql("table_name", e, if_exists="replace", index=False)

Here's an example for MySQL: Writing to MySQL database with pandas using SQLAlchemy, to_sql
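For completeness, a fuller sketch; the connection string, credentials, and table name here are placeholders, and it assumes the pymysql driver is installed:

import pandas as pd
import sqlalchemy as sa

df = pd.DataFrame({"code": [1, 2], "name": ["a", "b"]})

# placeholder credentials -- replace with your own
e = sa.create_engine("mysql+pymysql://user:password@localhost:3306/dbname")
df.to_sql("table_name", e, if_exists="replace", index=False)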

Pandas dataframe insert every value in mysql database

It should be something like this:

mycursor = mydb.cursor()
mycursor.execute("INSERT INTO table_name (domain, date, company) VALUES ('0vh-cl0ud.sg', '2017-10-12', 'KEY-SYSTEMS GMBH')")

This piece should be put inside the loop, after the data is scraped. Please go through the aforementioned links in the comments to get a better understanding of the process.
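Rather than pasting scraped values into the SQL string by hand, a safer sketch uses placeholders and lets the driver do the quoting (the df variable and its columns are my assumption for where the scraped rows live):

sql = "INSERT INTO table_name (domain, date, company) VALUES (%s, %s, %s)"
for _, row in df.iterrows():
    # the driver escapes each value, avoiding quoting bugs and SQL injection
    mycursor.execute(sql, (row['domain'], row['date'], row['company']))
mydb.commit()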

Writing a Pandas Dataframe to MySQL

The alternative to SQLAlchemy is to use to_sql in legacy mode; that option will be deprecated in a future release, but as of pandas 0.18.1 it is still documented and working.

According to the pandas documentation for pandas.DataFrame.to_sql, you can use the following syntax:

DataFrame.to_sql(name, con, flavor='sqlite', schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None)

You specify the con type/mode and the flavor ‘mysql’; here is the relevant description:

con : SQLAlchemy engine or DBAPI2 connection (legacy mode)
Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported.

flavor : {‘sqlite’, ‘mysql’}, default ‘sqlite’ The flavor of SQL to
use. Ignored when using SQLAlchemy engine. ‘mysql’ is deprecated and
will be removed in future versions, but it will be further supported
through SQLAlchemy engines.
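Per that description, the ‘mysql’ flavor survives only through SQLAlchemy engines, so on 0.18.1 the forward-compatible call looks like this (the connection string is a placeholder):

from sqlalchemy import create_engine

engine = create_engine('mysql://username:password@localhost/dbname')
# flavor is ignored (and unnecessary) when con is an SQLAlchemy engine
df.to_sql('table_name', engine, if_exists='fail', index=True)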

Pandas Insert data into MySQL

I think your code should read like this:

import pandas as pd
from pandas.io import sql
from sqlalchemy import create_engine

df = pd.read_csv('File.csv', usecols=['ID', 'START_DATE'], skiprows=skip)
print(df)

engine = create_engine('mysql://username:password@localhost/dbname')
with engine.connect() as conn, conn.begin():
    df.to_sql('Table1', conn, if_exists='replace')

But, regarding your question: unless I am mistaken in my understanding of pandas, whatever columns df presently has will be written to the columns of the same name in the MySQL table.

If you need different column names, you'll want to rename those in the DataFrame first.
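A minimal sketch of the rename (the target names id and start_date are hypothetical):

df = df.rename(columns={'ID': 'id', 'START_DATE': 'start_date'})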

Or use the parameters, as mentioned,

index : boolean, default True

Write DataFrame index as a column.

index_label : string or sequence, default None

Column label for index column(s). If None is given (default) and index is True, then the index names are used.

Insert pandas DataFrame() into an MySQL table raises ProgrammingError

Try converting it all to regular strings. Just apply a lambda function on your name column. For example:

df = pd.DataFrame({'a': [1, 2, 3, 4],
                   'b': [['a', 'b', 'v'], ['s', 'e', 'r'],
                         ['a', 'j', 'k'], ['f', 'g', 'd']]})

df.b.apply(lambda x: ", ".join(x))
Out[31]:
0    a, b, v
1    s, e, r
2    a, j, k
3    f, g, d
Name: b, dtype: object

Strings should work fine with SQL.
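Note that apply returns a new Series rather than changing the column in place, so assign it back before writing (the engine here is assumed to be an SQLAlchemy engine as in the earlier answers):

df['b'] = df.b.apply(lambda x: ", ".join(x))
df.to_sql('table_name', engine, if_exists='append', index=False)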

How to iterate a dataframe and insert into mysql db

You could do this in a few different ways, but probably the easiest would be to start with a merge.

Bring your dataframes together with a left join of the data you want to write to SQL:

merged_df = df1.merge(df2, how='left', on='uid')

Then I'd just filter that dataframe so the rows with a flag of 1 are removed:

merged_df = merged_df[merged_df['flag'] != 1]

and write the columns you want to SQL:

merged_df[['Place', 'uid', 'sal']].to_sql('sql_table', con_engine, index=False, if_exists='append')

(the con_engine, df1 and df2 are all pseudo-code, but shouldn't be too tricky to fill in)

edit: I've just seen that you mentioned adding in a flag for the blank flags; you could do that with an apply if you were wondering:

merged_df['flag'] = merged_df['flag'].apply(lambda x: x if x == 1 else 0)
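Putting the pieces together, a self-contained sketch with made-up data and a placeholder engine, just to show the order of operations:

import pandas as pd
from sqlalchemy import create_engine

# made-up inputs mirroring the pseudo-code above
df1 = pd.DataFrame({'uid': [1, 2, 3], 'Place': ['x', 'y', 'z']})
df2 = pd.DataFrame({'uid': [1, 2, 3], 'sal': [100, 200, 300], 'flag': [1, None, 0]})

con_engine = create_engine('mysql://username:password@localhost/dbname')

merged_df = df1.merge(df2, how='left', on='uid')
# blank flags become 0, as in the edit above
merged_df['flag'] = merged_df['flag'].apply(lambda x: x if x == 1 else 0)
merged_df = merged_df[merged_df['flag'] != 1]
merged_df[['Place', 'uid', 'sal']].to_sql('sql_table', con_engine,
                                          index=False, if_exists='append')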

