How to Convert SQL Query Result to Pandas Data Structure

Here's the shortest code that will do the job:

from pandas import DataFrame
df = DataFrame(resoverall.fetchall())
df.columns = resoverall.keys()

You can go fancier and parse the types as in Paul's answer.
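If you are not working from a SQLAlchemy result object, the same pattern works with any DB-API cursor by pulling the column names from cursor.description instead of .keys(). A minimal self-contained sketch using an in-memory sqlite3 database (the table and column names here are made up for illustration):

    import sqlite3
    from pandas import DataFrame

    # In-memory database as a stand-in for a real connection
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(1, "alice"), (2, "bob")])

    cur = conn.execute("SELECT id, name FROM users")
    df = DataFrame(cur.fetchall())
    # DB-API cursors expose column names as the first field of cursor.description
    df.columns = [col[0] for col in cur.description]
    conn.close()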

Trying to find the most efficient way to convert a SQL query to a pandas DataFrame with a large number of records

Just for reference to others (we already talked about this).

The slow part in the lines of code above is the conversion of the SQL return to a pandas data frame. This step is not only slow but single-threaded given Python's default behavior.

To get around this behavior, one way to brute-force the processing is to split the work into x subqueries and run them in separate processes.

Once we have the results of the subqueries, assembling the individual dataframes with pd.concat is actually fast.

Since you are looking at parallelizing tasks, consider the following "distributed computing" libraries:

  • Dask: http://dask.pydata.org/en/latest/
  • Distarray: http://docs.enthought.com/distarray/
  • Ray: https://ray-project.github.io/2017/05/20/announcing-ray.html

All enable you to parallelize tasks with a bit more automation, if you are willing to trade adding more libraries to your list of dependencies for that convenience.

The alternative is to use the multiprocessing functionality in Python's standard library itself.
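A rough sketch of that brute-force approach with the standard library's multiprocessing module: split the query into range-based subqueries, fetch each one in its own process with its own connection, then pd.concat the pieces. The sqlite3 database file and the measurements table here are stand-ins for illustration, not from the original question:

    import multiprocessing as mp
    import os
    import sqlite3
    import tempfile

    import pandas as pd

    DB_PATH = os.path.join(tempfile.gettempdir(), "demo_chunks.db")

    def fetch_chunk(bounds):
        """Run one subquery in its own process, with its own connection."""
        lo, hi = bounds
        conn = sqlite3.connect(DB_PATH)
        chunk = pd.read_sql(
            "SELECT id, value FROM measurements WHERE id >= ? AND id < ?",
            conn, params=(lo, hi))
        conn.close()
        return chunk

    def build_db():
        # Populate a throwaway table so the sketch is runnable end to end
        conn = sqlite3.connect(DB_PATH)
        conn.execute("DROP TABLE IF EXISTS measurements")
        conn.execute("CREATE TABLE measurements (id INTEGER, value REAL)")
        conn.executemany("INSERT INTO measurements VALUES (?, ?)",
                         [(i, i * 0.5) for i in range(1000)])
        conn.commit()
        conn.close()

    if __name__ == "__main__":
        build_db()
        ranges = [(0, 250), (250, 500), (500, 750), (750, 1000)]
        with mp.Pool(4) as pool:
            frames = pool.map(fetch_chunk, ranges)
        # Assembling the per-process results is the cheap part
        df = pd.concat(frames, ignore_index=True)
        os.remove(DB_PATH)

In a real setting the split key should be an indexed column so each subquery stays cheap on the database side.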

sql output to pandas dataframe using python in pycharm IDE

Although pandas has direct read methods such as pandas.read_sql(), you should be able to take your working cursor object, define new variables as empty Python lists, append the rows, and then create a pandas dataframe. Assuming your table is set up with columns as separate variables, here is some example code:

import pandas as pd

# create some empty lists:
var1 = []
var2 = []
var3 = []

# append rows from the cursor object:
for row in cursor:
    var1.append(row[0])
    var2.append(row[1])
    var3.append(row[2])

# Create a dictionary with header names if desired:
my_data = {'header1': var1,
           'header2': var2,
           'header3': var3}

# Make a pandas dataframe:
df = pd.DataFrame(data=my_data)

How to store mySQL query result into pandas DataFrame with pymysql?

Use pandas.read_sql() for this:

# pymysql uses %s placeholders, not the ? qmark style
query = "SELECT * FROM orders WHERE date_time BETWEEN %s AND %s"
df = pd.read_sql(query, connection, params=(start_date, end_date))
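As a self-contained illustration of the params mechanism (using sqlite3 here, since it needs no server; with a pymysql connection the only difference is the %s placeholder style):

    import sqlite3

    import pandas as pd

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, date_time TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, "2023-01-05"), (2, "2023-02-10"), (3, "2023-03-15")])

    # sqlite3's DB-API paramstyle is '?'; substitute '%s' for pymysql
    query = "SELECT * FROM orders WHERE date_time BETWEEN ? AND ?"
    df = pd.read_sql(query, conn, params=("2023-01-01", "2023-02-28"))
    conn.close()

Passing values through params instead of string formatting also protects against SQL injection.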

How to execute mysql query and get the output as data frame in python

You could use pandas read_sql if you want a pandas dataframe:

import pandas as pd
import mysql.connector as sql  # assuming the mysql-connector-python driver

db_connection = sql.connect(host='10.10.10.10', database='cd', user='root',
                            password='', charset='utf8')
raw_data_query = """
    select date(originating_date_time), count(*) as Calls,
           sum(if(call_duration > 0, 1, 0)) as Duration,
           sum(CEILING(call_duration / 100)) / 60
    from calldetailrecs
    where term_trunk_group in (986, 985, 984, 983)
    group by date(originating_date_time)
"""
df = pd.read_sql(raw_data_query, db_connection)

It is easy and fast, and you don't have to manage a cursor or fetch the rows yourself.

How can I write a function to convert sql query to a dataframe

Just use read_sql() as below:

def _get_data(self):
    df = pd.read_sql("select col1, col2 from table_name", connection)
    return df

It will return a DataFrame.

SQLAlchemy ORM conversion to pandas DataFrame

Below should work in most cases:

df = pd.read_sql(query.statement, query.session.bind)

See pandas.read_sql documentation for more information on the parameters.
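For a runnable end-to-end sketch (the model, data, and in-memory SQLite engine here are invented for illustration):

    import pandas as pd
    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.orm import Session, declarative_base

    Base = declarative_base()

    class User(Base):
        __tablename__ = "users"
        id = Column(Integer, primary_key=True)
        name = Column(String(50))

    engine = create_engine("sqlite://")  # in-memory database
    Base.metadata.create_all(engine)

    session = Session(engine)
    session.add_all([User(id=1, name="alice"), User(id=2, name="bob")])
    session.commit()

    query = session.query(User).filter(User.name != "bob")
    # query.statement is the underlying SELECT; query.session.bind is the engine
    df = pd.read_sql(query.statement, query.session.bind)
    session.close()

This way any filters, joins, or ordering already attached to the ORM query carry over into the DataFrame.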

Using Jupyter notebook to convert SQL into Panda Data Frame

The error you are receiving is being caused by the order of your code:

1  import pandas as pd
2  df = pd.read_sql(sql, cnxn)  ## You call the variable sql here, but don't assign it until line 6
3
4  cnxn = pyodbc.connect(connection_info)
5  cursor = cnxn.cursor()
6  sql = """SELECT * FROM AdventureWorks2012.Person.Address
7           WHERE City = 'Bothell'
8           ORDER BY AddressID ASC"""
9  df = psql.frame_query(sql, cnxn)
10 cnxn.close()
  • You are calling the variable sql on line 2, but you don't actually define the variable until line 6.
  • You are also missing a few libraries, and based off beardc's code it looks like you've meshed some of the wrong parts of his two answers together.

Try arranging the code like this:

(Please note this code is untested, and see the other issues described above)

#Import the libraries
import pandas as pd
import pyodbc
#Give the connection info
cnxn = pyodbc.connect(connection_info)
#Assign the SQL query to a variable
sql = "SELECT * FROM AdventureWorks2012.Person.Address WHERE City = 'Bothell' ORDER BY AddressID ASC"
#Read the SQL to a Pandas dataframe
df = pd.read_sql(sql, cnxn)

In answer to your questions:

  1. Yes, you need to change connection_info to the connection details for your own database; the pyodbc documentation has good examples of the connection string format
  2. This specific issue isn't being caused by your network restrictions.

