How to insert pandas dataframe via mysqldb into database?
Update:
There is now a to_sql method, which is the preferred way to do this, rather than write_frame:
df.to_sql(con=con, name='table_name_for_df', if_exists='replace', flavor='mysql')
Also note: the syntax may change in pandas 0.14...
You can set up the connection with MySQLdb:
from pandas.io import sql
import MySQLdb
con = MySQLdb.connect() # may need to add some other options to connect
Setting the flavor of write_frame to 'mysql' means you can write to mysql:
sql.write_frame(df, con=con, name='table_name_for_df',
if_exists='replace', flavor='mysql')
The argument if_exists tells pandas how to deal with the table if it already exists:

if_exists: {'fail', 'replace', 'append'}, default 'fail'
    fail: If table exists, do nothing.
    replace: If table exists, drop it, recreate it, and insert data.
    append: If table exists, insert data. Create if does not exist.
Although the write_frame docs currently suggest it only works on sqlite, mysql appears to be supported; in fact there is quite a bit of mysql testing in the codebase.
Write Pandas DataFrame into an existing MySQL Database Table
In an ideal situation, for any database operation you need:
- A database engine
- A connection
- A cursor created from the connection
- An INSERT SQL statement
- To read the csv data row by row, or all together, and insert it into the table
That is just a concept.
import pymysql
# Connect to the database
connection = pymysql.connect(host='localhost',
                             user='<user>',
                             password='<pass>',
                             db='<db_name>')
# create cursor
cursor = connection.cursor()
# Insert DataFrame records one by one.
sql = "INSERT INTO client_info (code, name, nac) VALUES (%s, %s, %s)"
for i, row in Client_Table1.iterrows():
    cursor.execute(sql, tuple(row))
# the connection is not autocommitted by default, so we must commit to save our changes
connection.commit()
connection.close()
That is just a concept; I cannot test the code I have written, so there might be some errors you need to debug. For example, a data type mismatch, since I am treating every column as a string with %s. Please read more in detail here.
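The row-by-row loop above can also be collapsed into a single executemany call. The sketch below uses SQLite only so it is self-contained and the frame is a hypothetical stand-in for Client_Table1; with pymysql the placeholders would be %s instead of ?, but the pattern is identical:

```python
import sqlite3
import pandas as pd

# Hypothetical frame standing in for Client_Table1
df = pd.DataFrame({"code": [1, 2], "name": ["x", "y"], "nac": ["p", "q"]})

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE client_info (code INTEGER, name TEXT, nac TEXT)")

# One call for all rows; with pymysql the placeholders would be %s
rows = list(df.itertuples(index=False, name=None))
cursor = connection.cursor()
cursor.executemany("INSERT INTO client_info (code, name, nac) VALUES (?, ?, ?)", rows)
connection.commit()

count = cursor.execute("SELECT COUNT(*) FROM client_info").fetchone()[0]
```

executemany avoids building a Python-level loop around each execute and lets the driver batch the work.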
Edit Based on Comment:
You can create separate methods for each table with a sql statement and then run them at the end. Again that is just a concept and can be generalised more.
def insert_into_client_info():
    # create cursor
    cursor = connection.cursor()
    # Insert DataFrame records one by one.
    sql = "INSERT INTO client_info (code, name, nac) VALUES (%s, %s, %s)"
    for i, row in Client_Table1.iterrows():
        cursor.execute(sql, tuple(row))
    # the connection is not autocommitted by default, so we must commit to save our changes
    connection.commit()
    cursor.close()

def insert_into_any_table():
    "a_cursor"
    "a_sql"
    "a_for_loop"
    connection.commit()
    cursor.close()
## Pile all the functions one after another
insert_into_client_info()
insert_into_any_table()
# close the connection at the end
connection.close()
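The "can be generalised more" remark might look like the sketch below: one function that builds the INSERT statement from the DataFrame's own columns, so a separate method per table is no longer needed. The table and frame here are hypothetical, and SQLite is used only to make the snippet runnable (swap ? for %s under pymysql):

```python
import sqlite3
import pandas as pd

def insert_frame(connection, table, frame):
    # Generalised version of the per-table functions above:
    # one placeholder per DataFrame column (use %s instead of ? with pymysql)
    cols = ", ".join(frame.columns)
    marks = ", ".join("?" for _ in frame.columns)
    sql = f"INSERT INTO {table} ({cols}) VALUES ({marks})"
    cursor = connection.cursor()
    cursor.executemany(sql, frame.itertuples(index=False, name=None))
    connection.commit()
    cursor.close()

# In-memory SQLite demo so the sketch is self-contained
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE client_info (code INTEGER, name TEXT, nac TEXT)")
insert_frame(connection, "client_info",
             pd.DataFrame({"code": [1], "name": ["x"], "nac": ["y"]}))
inserted = connection.execute("SELECT COUNT(*) FROM client_info").fetchone()[0]
connection.close()
```

Note this interpolates the table and column names into the SQL string, so those must come from trusted code, never from user input.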
Insert Python Dataframes into MySQL
You probably don't need to iterate over the DataFrame; just use the to_sql method:
import sqlalchemy as sa
e = sa.create_engine(...)
df.to_sql("table_name", e, if_exists="replace", index=False)
Here's an example for MySQL: Writing to MySQL database with pandas using SQLAlchemy, to_sql
Pandas dataframe insert every value in mysql database
It should be something like this:
mycursor = mydb.cursor()
mycursor.execute("INSERT INTO table_name (domain, date, company) VALUES ('0vh-cl0ud.sg', '2017-10-12', 'KEY-SYSTEMS GMBH')")
This piece should be put in the loop after the data is scraped. Please go through the aforementioned links in comments to have a better understanding of the process.
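A hedged sketch of what that loop might look like with parameterized values rather than a hard-coded string (the scraped rows are hypothetical, and SQLite stands in for the MySQL connection so the snippet runs; mysql-connector uses %s placeholders rather than ?):

```python
import sqlite3

# SQLite stand-in for the MySQL connection, so the sketch is runnable
mydb = sqlite3.connect(":memory:")
mydb.execute("CREATE TABLE table_name (domain TEXT, date TEXT, company TEXT)")
mycursor = mydb.cursor()

# Hypothetical scraped rows; in the real script these come out of the scraping loop
scraped = [("0vh-cl0ud.sg", "2017-10-12", "KEY-SYSTEMS GMBH"),
           ("example.org", "2017-10-13", "EXAMPLE LLC")]
for row in scraped:
    mycursor.execute("INSERT INTO table_name (domain, date, company) VALUES (?, ?, ?)", row)
mydb.commit()

stored = mycursor.execute("SELECT COUNT(*) FROM table_name").fetchone()[0]
```

Passing the values as a tuple instead of formatting them into the string also protects against quoting problems and SQL injection.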
Writing a Pandas Dataframe to MySQL
The other option to sqlalchemy is to_sql's legacy (DBAPI) mode, which will be deprecated in a future release, but as of pandas 0.18.1 it is still documented and active.
According to pandas documentation pandas.DataFrame.to_sql you can use following syntax:
DataFrame.to_sql(name, con, flavor='sqlite', schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None)
you specify the con type/mode and the flavor 'mysql'; here is some description:

con : SQLAlchemy engine or DBAPI2 connection (legacy mode)
    Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported.
flavor : {'sqlite', 'mysql'}, default 'sqlite'
    The flavor of SQL to use. Ignored when using SQLAlchemy engine. 'mysql' is deprecated and will be removed in future versions, but it will be further supported through SQLAlchemy engines.
Pandas Insert data into MySQL
I think your code should read like this
import pandas as pd
from pandas.io import sql
from sqlalchemy import create_engine
df = pd.read_csv('File.csv', usecols=['ID', 'START_DATE'], skiprows=skip)
print(df)
engine = create_engine('mysql://username:password@localhost/dbname')
with engine.connect() as conn, conn.begin():
    df.to_sql('Table1', conn, if_exists='replace')
But, regarding your question: unless I am mistaken in my understanding of Pandas, whatever columns df presently has are going to be written to the columns of the same name in the mysql table. If you need different column names, you'll want to rename those in the DataFrame, or use the parameters, as mentioned:
index : boolean, default True
    Write DataFrame index as a column.
index_label : string or sequence, default None
    Column label for index column(s). If None is given (default) and index is True, then the index names are used.
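A small demonstration of index_label, using an in-memory SQLite engine as a stand-in for the MySQL engine above; the index ends up as a column named ID rather than the default "index":

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite stand-in for the MySQL engine above
engine = create_engine("sqlite://")
df = pd.DataFrame({"START_DATE": ["2020-01-01", "2020-02-01"]})

# Write the index as a column named ID
df.to_sql("Table1", engine, if_exists="replace", index=True, index_label="ID")
cols = list(pd.read_sql("SELECT * FROM Table1", engine).columns)
```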
Insert pandas DataFrame() into an MySQL table raises ProgrammingError
Try converting it all to regular strings. Just apply a lambda function on your name column. For example:
df = pd.DataFrame({'a':[1,2,3,4], 'b':[['a', 'b','v'], ['s', 'e','r'], ['a', 'j','k'], ['f','g','d']]})
df.b.apply(lambda x: ", ".join(x))
Out[31]:
0 a, b, v
1 s, e, r
2 a, j, k
3 f, g, d
Name: b, dtype: object
Strings should definitely work fine with sql.
How to iterate a dataframe and insert into mysql db
You could do this in a few different ways, but probably the easiest would be to start with a merge.
Bring your dataframes together with a left join of the data you want to write to SQL:
merged_df = df1.merge(df2, how='left', on='uid')
Then I'd just filter that dataframe so the flags of 1 are removed
merged_df = merged_df[merged_df['flag'] != 1]
and write the columns you want to sql:
merged_df[['Place', 'uid', 'sal']].to_sql('sql_table', con_engine, index=False, if_exists='append')
(the con_engine, df1 and df2 are all pseudo-code, but shouldn't be too tricky to fill in)
edit: I've just seen that you mentioned adding in a flag for the blank flags, you could do that with an apply if you were wondering:
merged_df['flag'] = merged_df['flag'].apply(lambda x: x if x == 1 else 0)
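A self-contained check of that apply on a hypothetical merged frame: the NaN flags produced by the left join compare unequal to 1, so they become 0:

```python
import pandas as pd

# Hypothetical merged frame: NaN flags come from rows of df1 with no match in df2
merged_df = pd.DataFrame({"uid": [1, 2, 3], "flag": [1.0, None, None]})
merged_df["flag"] = merged_df["flag"].apply(lambda x: x if x == 1 else 0)
flags = merged_df["flag"].tolist()
```

merged_df["flag"].fillna(0) would achieve the same thing here, as long as the only non-1 values really are the blanks.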