Python db-api: fetchone vs fetchmany vs fetchall
I think it does depend on the implementation, but you can get an idea of the differences by looking at the MySQLdb sources. Depending on the options, MySQLdb's fetch* methods keep the current set of rows either in memory or on the server side, so fetchmany and fetchone give some flexibility over what is kept in (Python's) memory and what is kept on the database server.
PEP 249 does not go into much detail, so I guess this leaves room to optimize for each database, with the exact semantics being implementation-defined.
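To see the three methods side by side, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQLdb, with a made-up in-memory table `t`. Per PEP 249, fetchone() returns a single row or None, fetchmany(size) returns a list of up to size rows (size defaults to cursor.arraysize), and fetchall() returns a list of all remaining rows. How much each call buffers is still up to the driver.

```python
import sqlite3

# Hypothetical in-memory table with five rows, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t (n) VALUES (?)", [(i,) for i in range(1, 6)])

cur = conn.cursor()
cur.execute("SELECT n FROM t ORDER BY n")

first = cur.fetchone()    # a single row (a tuple), or None when exhausted
batch = cur.fetchmany(2)  # a list of up to 2 rows
rest = cur.fetchall()     # a list of all remaining rows
after = cur.fetchone()    # None: the result set is exhausted

print(first, batch, rest, after)
conn.close()
```

Each call continues from where the previous one stopped, so the three methods can be combined on the same result set.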
Python: does cursor.execute load all data?
Taken from the MySQL Connector/Python documentation:
The fetchone() method is used by fetchall() and fetchmany(). It is also used when a cursor is used as an iterator.
The following example shows two equivalent ways to process a query result. The first uses fetchone() in a while loop, the second uses the cursor as an iterator:
# Using a while loop
cursor.execute("SELECT * FROM employees")
row = cursor.fetchone()
while row is not None:
    print(row)
    row = cursor.fetchone()
# Using the cursor as iterator
cursor.execute("SELECT * FROM employees")
for row in cursor:
    print(row)
It also states that:
You must fetch all rows for the current query before executing new statements using the same connection.
If you are worried about performance issues, you should use fetchmany(n) in a while loop until you have fetched all of the results, like so:
# The def line is reconstructed: the name result_iter and the default arraysize are assumed
def result_iter(cursor, arraysize=1000):
    'An iterator that uses fetchmany to keep memory usage down'
    while True:
        results = cursor.fetchmany(arraysize)
        if not results:
            break
        for result in results:
            yield result
This behavior adheres to PEP 249, which describes which methods DB connectors should implement and how they should behave. A partial answer is given in this thread.
Basically, how fetchall, fetchmany and fetchone are implemented is up to the library's developers and depends on what the database supports, but for fetchmany and fetchone it would make sense for the unfetched results to be kept server side until they are requested by another call or the cursor object is destroyed.
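So, in conclusion, I think it is safe to assume that calling execute on such an unbuffered cursor does not dump all of the query's data into memory. Note, however, that MySQLdb's default cursor does buffer the whole result set client-side, as explained further below.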
SQLAlchemy `.fetchmany()` vs `.limit()`
LIMIT becomes part of the SQL query sent to the database server.
With fetchmany, the query is executed without any limit, but the client (Python code) requests only a certain number of rows.
Therefore, using limit should be faster in most cases.
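The difference can be illustrated without SQLAlchemy, using plain sqlite3 as a stdlib-only stand-in (the table `items` is hypothetical). With LIMIT the database itself only produces five rows; with fetchmany(5) the query is unbounded and the client simply stops reading after five.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 1001)])

# LIMIT: the row count is part of the SQL, so the engine stops after 5 rows
limited = conn.execute("SELECT id FROM items ORDER BY id LIMIT 5").fetchall()

# fetchmany: the query has no limit; the client just asks for the first 5 rows
cur = conn.execute("SELECT id FROM items ORDER BY id")
first_five = cur.fetchmany(5)
cur.close()  # discard the rows that were never fetched

print(limited == first_five)
conn.close()
```

Both give the same five rows; the difference is how much work the database does and, for a networked server, how many rows it may prepare or send before the client stops reading.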
What's the difference between using "c.fetchall()" vs. just assigning "c.execute(SELECT...." to a variable?
The Python DBAPI doesn't define what execute returns: it could be a generator, it could be a list of results, it could be some custom object representing the results, etc. It only promises that the database query will be made; how the results are presented or made available is not defined.
The fetchall method, however, is defined to return a "sequence of sequences", which means you have the actual, instantiated result in memory immediately. Iterating over the result, in particular, isn't going to trigger a delayed database connection to execute a query or fetch more results.
One consequence of this is that an implementation could define your two approaches to be equivalent, but it is not required to.
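As one concrete data point, Python's sqlite3 module happens to make execute() return the cursor itself, which is lazily iterable, whereas fetchall() returns a list. This is sqlite3's own choice rather than a DB-API guarantee, so other drivers may behave differently; the table `t` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE t (x INTEGER)")
c.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (2,), (3,)])

result = c.execute("SELECT x FROM t ORDER BY x")
print(result is c)       # sqlite3 returns the cursor itself, not the rows

rows = c.fetchall()      # a real list, fully materialized in memory
print(type(rows), rows)

# The cursor is now exhausted: iterating it again yields nothing
leftover = list(result)
print(leftover)
conn.close()
```

So with sqlite3, holding on to the return value of execute() is not the same as holding the results: once the rows have been consumed, the "variable" is just an exhausted cursor.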
cursor.fetchall() vs list(cursor) in Python
If you are using the default cursor, a MySQLdb.cursors.Cursor, the entire result set will be stored on the client side (i.e. in a Python list) by the time the cursor.execute() is completed.
Therefore, even if you use
for row in cursor:
you will not be getting any reduction in memory footprint. The entire result set has already been stored in a list (see self._rows in MySQLdb/cursors.py).
However, if you use an SSCursor or SSDictCursor:
import MySQLdb
import MySQLdb.cursors as cursors
conn = MySQLdb.connect(..., cursorclass=cursors.SSCursor)
then the result set is stored in the server, mysqld. Now you can write
cursor = conn.cursor()
cursor.execute('SELECT * FROM HUGETABLE')
for row in cursor:
    print(row)
and the rows will be fetched one-by-one from the server, thus not requiring Python to build a huge list of tuples first, and thus saving on memory.
Otherwise, as others have already stated, cursor.fetchall() and list(cursor) are essentially the same.
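A quick check that both spellings give the same rows, again using sqlite3 as a stand-in for MySQLdb and a hypothetical table `t`. Each call consumes the cursor, so the query is re-executed before the second one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany("INSERT INTO t (a, b) VALUES (?, ?)", [(1, "x"), (2, "y")])

cur = conn.cursor()
query = "SELECT a, b FROM t ORDER BY a"

cur.execute(query)
via_fetchall = cur.fetchall()    # list of row tuples

cur.execute(query)               # re-run: the previous result was consumed
via_list = list(cur)             # iterates the cursor, building the same list

print(via_fetchall == via_list)  # True
conn.close()
```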
SQLite3, Python: fetchone() works on table1 but not table2 but fetchall() works on both
You are looping over the cursor, and that already yields the data. By the time you call fetchone(), the row has already been served.
Just use the loop variable, it contains each row result as you iterate:
cur.execute('select * from ' + tablename1)
for row in cur:
    print(row)
Your loop over tablename3 only sees half the rows; you fetch one row by iterating, ignore that row, fetch the next with cur.fetchone() and print that one, repeating the process in a loop.
Use either iteration or fetchone() and fetchall(). Don't mix the two.
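The row-skipping effect is easy to reproduce with sqlite3 and a hypothetical six-row table: each pass through the for loop consumes one row, and the fetchone() inside the loop consumes the next, so only every second row is seen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t (n) VALUES (?)", [(i,) for i in range(1, 7)])
cur = conn.cursor()

# Buggy: mixing iteration with fetchone()
cur.execute("SELECT n FROM t ORDER BY n")
mixed = []
for row in cur:                   # the loop consumes rows 1, 3, 5 ...
    mixed.append(cur.fetchone())  # ... and fetchone() consumes rows 2, 4, 6
print(mixed)                      # [(2,), (4,), (6,)]

# Correct: just use the loop variable
cur.execute("SELECT n FROM t ORDER BY n")
correct = []
for row in cur:
    correct.append(row)           # the row the loop already fetched
print(correct)                    # [(1,), (2,), (3,), (4,), (5,), (6,)]
conn.close()
```

With an odd number of rows, the final fetchone() in the buggy loop would return None instead of a row.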
fetchone() would be used to fetch just one result row, for example:
cur.execute('select * from ' + tablename1 + ' WHERE unique_column=?', ('somevalue',))
row = cur.fetchone()
if row is not None:
    # there was a matching row, rejoice
    print(row)