Python DB-API: fetchone vs fetchmany vs fetchall

I think it indeed depends on the implementation, but you can get an idea of the differences by looking at the MySQLdb sources. Depending on the cursor options, MySQLdb keeps the current result set either in client memory or on the server side, so fetchmany and fetchone give you some flexibility over what is held in (Python's) memory and what stays on the database server.

PEP 249 does not give much detail, so I guess the exact semantics are left implementation-defined so that each driver can optimize for its database.
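To see the three fetch methods side by side, here is a minimal sketch using the standard library's sqlite3 module, which is itself a PEP 249 driver; the table name `t` and the sample data are invented for illustration:

```python
import sqlite3

# sqlite3 implements the PEP 249 cursor interface; the same fetch
# methods exist in MySQLdb, psycopg2, and other drivers.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

cur.execute("SELECT n FROM t ORDER BY n")
first = cur.fetchone()         # one row tuple, or None when exhausted
next_three = cur.fetchmany(3)  # a list of up to 3 rows
rest = cur.fetchall()          # all remaining rows in one list

print(first)       # (0,)
print(next_three)  # [(1,), (2,), (3,)]
print(len(rest))   # 6
```

Each call continues from where the previous one stopped; the three methods all consume the same underlying result set.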

Python: does cursor.execute load all the data?

Taken from MySQL documentation:

The fetchone() method is used by fetchall() and fetchmany(). It is also used when a cursor is used as an iterator.

The following example shows two equivalent ways to process a query result. The first uses fetchone() in a while loop, the second uses the cursor as an iterator:

# Using a while loop
cursor.execute("SELECT * FROM employees")
row = cursor.fetchone()
while row is not None:
    print(row)
    row = cursor.fetchone()

# Using the cursor as an iterator
cursor.execute("SELECT * FROM employees")
for row in cursor:
    print(row)

The documentation also states that:

You must fetch all rows for the current query before executing new statements using the same connection.

If you are worried about memory usage, you can wrap fetchmany(n) in a generator that loops until all of the results have been fetched, like so:

def result_iter(cursor, arraysize=1000):
    'An iterator that uses fetchmany to keep memory usage down'
    while True:
        results = cursor.fetchmany(arraysize)
        if not results:
            break
        for result in results:
            yield result

This behavior adheres to PEP 249, which describes how and which methods database connectors should implement. A partial answer is given in this thread.

Basically, the implementation of fetchall vs fetchmany vs fetchone is up to the developers of the library and depends on the database's capabilities. For fetchmany and fetchone, though, it would make sense for the unfetched/remaining results to be kept server side until they are requested by another call or the cursor object is destroyed.

So in conclusion, I think it is safe to assume that calling the execute method does not, in this case (MySQLdb), load all of the query's data into memory.

SQLAlchemy `.fetchmany()` vs `.limit()`

limit will be part of the SQL query sent to the database server.

With fetchmany the query is executed without any limit, but the client (the Python code) requests only a certain number of rows.

Therefore using limit should be faster in most cases.
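The same distinction can be shown with plain DB-API calls: SQLAlchemy's .limit() compiles into a SQL LIMIT clause that the server applies, while .fetchmany() is purely client-side batching. A hedged sketch using sqlite3, with an invented table `t`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

# LIMIT: the restriction is part of the query itself, so the
# server never produces more than 5 rows.
cur.execute("SELECT n FROM t LIMIT 5")
limited = cur.fetchall()

# fetchmany: the query runs without any limit; the client simply
# stops asking after the first batch of 5.
cur.execute("SELECT n FROM t")
batch = cur.fetchmany(5)

print(len(limited), len(batch))  # 5 5
```

Both return five rows, but with LIMIT the server can stop work early, which is why limit is usually faster.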

What's the difference between using "c.fetchall()" vs. just assigning "c.execute(SELECT...." to a variable?

The Python DBAPI doesn't define what execute returns: it could be a generator, it could be a list of results, it could be some custom object representing the results, etc. It only promises that the database query will be made; how the results are presented or made available is not defined.

The fetchall method, however, is defined to return a "sequence of sequences", which means you have the actual, instantiated result in memory immediately. Iterating over that result, in particular, will not lazily trigger a database round-trip to execute the query or fetch more rows.

One consequence of this is that an implementation could define your two approaches to be equivalent, but it is not required to.
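A portable script should therefore rely only on the fetch methods PEP 249 defines, rather than on what execute() happens to return. A small sketch with sqlite3 (whose execute() happens to return the cursor itself, which is convenient but not guaranteed by the spec; table `t` is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])

# Portable: PEP 249 guarantees fetchall() returns a sequence of rows.
cur.execute("SELECT n FROM t ORDER BY n")
rows = cur.fetchall()

# Driver-specific: sqlite3 returns the cursor from execute(), so it
# can be iterated directly -- but PEP 249 does not require this.
rows2 = list(cur.execute("SELECT n FROM t ORDER BY n"))

print(rows == rows2)  # True
```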

cursor.fetchall() vs list(cursor) in Python

If you are using the default cursor, a MySQLdb.cursors.Cursor, the entire result set will have been stored on the client side (i.e. in a Python list) by the time cursor.execute() completes.

Therefore, even if you use

for row in cursor:

you will not be getting any reduction in memory footprint. The entire result set has already been stored in a list (See self._rows in MySQLdb/cursors.py).

However, if you use an SSCursor or SSDictCursor:

import MySQLdb
import MySQLdb.cursors as cursors

conn = MySQLdb.connect(..., cursorclass=cursors.SSCursor)

then the result set is stored in the server, mysqld. Now you can write

cursor = conn.cursor()
cursor.execute('SELECT * FROM HUGETABLE')
for row in cursor:
    print(row)

and the rows will be fetched one-by-one from the server, thus not requiring Python to build a huge list of tuples first, and thus saving on memory.

Otherwise, as others have already stated, cursor.fetchall() and list(cursor) are essentially the same.
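That equivalence is easy to check with any DB-API driver; here is a minimal sketch with sqlite3 (table `t` and its rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(4)])

# Consume the result set by iterating over the cursor...
cur.execute("SELECT n FROM t ORDER BY n")
via_list = list(cur)

# ...or all at once with fetchall(); both yield the same rows.
cur.execute("SELECT n FROM t ORDER BY n")
via_fetchall = cur.fetchall()

print(via_list == via_fetchall)  # True
```

The memory difference only appears with a server-side cursor such as MySQLdb's SSCursor, where iteration pulls rows on demand while fetchall() still materializes everything at once.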

SQLite3, Python: fetchone() works on table1 but not table2 but fetchall() works on both

You are looping over the cursor. This yields the data already. By the time you call fetchone() the row has already been served.

Just use the loop variable, it contains each row result as you iterate:

cur.execute('select * from ' + tablename1)
for row in cur:
    print(row)

Your loop over tablename3 only sees half the rows; you fetch one row by iterating, ignore that row, fetch the next with cur.fetchone() and print that one, repeating the process in a loop.

Use either iteration or fetchone() and fetchall(). Don't mix the two.
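The row-skipping effect of mixing the two is easy to reproduce; a sketch with sqlite3 and an invented table `t` of six rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (n INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(6)])

# Buggy pattern: each loop iteration consumes one row, and the
# fetchone() inside consumes another, so only every second row
# is actually seen.
cur.execute("SELECT n FROM t ORDER BY n")
seen = []
for _ in cur:
    seen.append(cur.fetchone())

print(seen)  # [(1,), (3,), (5,)]
```

Rows (0,), (2,) and (4,) are swallowed by the loop itself, which is exactly the "half the rows" symptom described above.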

fetchone() would be used to fetch just one result row, for example:

cur.execute('select * from ' + tablename1 + ' WHERE unique_column=?', ('somevalue',))
row = cur.fetchone()
if row is not None:
    # there was a matching row, rejoice
    print(row)
