Streaming Large Result Sets with MySQL

Don't close your ResultSets twice.

Apparently, closing a Statement also attempts to close the corresponding ResultSet, as these two lines from the stack trace show:

DelegatingResultSet.close() line: 152

DelegatingPreparedStatement(DelegatingStatement).close() line: 163

I had thought the hang was in ResultSet.close(), but it was actually in Statement.close(), which calls ResultSet.close(). Because the ResultSet was already closed, the call just hung.

We replaced every ResultSet.close() with results.getStatement().close() and removed all the separate Statement.close() calls, and the problem is now solved.
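
The fix described above can be sketched as a small helper. This is only an illustration of the idea, not the original code: the class and method names (JdbcCleanup, closeViaStatement) are made up, and the null check is an extra precaution, since the JDBC Javadoc says getStatement() may return null for result sets produced in other ways (for example by DatabaseMetaData methods).

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; names are illustrative and not from the original post.
public final class JdbcCleanup {
    private JdbcCleanup() {}

    /**
     * Closes a ResultSet through its owning Statement so that each resource
     * is closed exactly once. Returns true if the Statement was closed,
     * false otherwise.
     */
    public static boolean closeViaStatement(ResultSet results) throws SQLException {
        if (results == null) {
            return false;
        }
        // getStatement() may return null, e.g. for DatabaseMetaData results.
        Statement owner = results.getStatement();
        if (owner != null) {
            owner.close(); // closing the Statement also closes its current ResultSet
            return true;
        }
        results.close(); // no owning Statement, so close the ResultSet directly
        return false;
    }
}
```

Call it wherever the code previously called results.close(), and do not close the Statement again afterwards.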

Streaming a MySQL ResultSet with a fixed number of results at a time

I will assume that you are using Connector/J, the official JDBC driver provided by MySQL.

You are explicitly telling JDBC (and MySQL) to stream the results row-by-row with statement.setFetchSize(Integer.MIN_VALUE);

From the MySQL docs:

By default, ResultSets are completely retrieved and stored in memory.
In most cases this is the most efficient way to operate, and due to
the design of the MySQL network protocol is easier to implement. If
you are working with ResultSets that have a large number of rows or
large values, and can not allocate heap space in your JVM for the
memory required, you can tell the driver to stream the results back
one row at a time.

To enable this functionality, you need to create a Statement instance
in the following manner:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
                            java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

The combination of a forward-only, read-only result set, with a fetch
size of Integer.MIN_VALUE serves as a signal to the driver to stream
result sets row-by-row. After this any result sets created with the
statement will be retrieved row-by-row.

By default, Connector/J ignores any fetch size other than Integer.MIN_VALUE and applies the standard behavior: the JDBC driver fetches the entire result set.

To have the whole result set fetched up front, either don't call setFetchSize() at all, so the JDBC driver uses the default value (0), or set the value to 0 explicitly. A value of 0 also ensures that JDBC doesn't use MySQL cursors, which it otherwise might, depending on your MySQL and Connector/J versions and configuration.
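
As a rough sketch, the two modes can be put behind one helper. The class and method names (FetchModes, fetchSizeFor, newStatement) are hypothetical; the only driver-specific parts are the values already mentioned above, Integer.MIN_VALUE for streaming and 0 for the default fully buffered behavior.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; names are illustrative and not part of Connector/J.
public final class FetchModes {
    private FetchModes() {}

    /** Integer.MIN_VALUE asks Connector/J to stream row-by-row; 0 keeps the default. */
    public static int fetchSizeFor(boolean stream) {
        return stream ? Integer.MIN_VALUE : 0;
    }

    /** Creates a forward-only, read-only Statement configured for the chosen mode. */
    public static Statement newStatement(Connection conn, boolean stream) throws SQLException {
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        try {
            stmt.setFetchSize(fetchSizeFor(stream));
            return stmt;
        } catch (SQLException e) {
            stmt.close(); // do not leak the Statement if configuration fails
            throw e;
        }
    }
}
```

With stream set to true, keep in mind that the Connector/J documentation asks you to read all rows (or close the result set) before issuing another query on the same connection.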

Streaming a MySQL ResultSet for a large, complex query

Streaming starts only once the results are ready to be transferred. With a complex query, it can still take minutes before the first row is streamed.

Streaming is advantageous because it uses less memory for large result sets. In terms of speed, it is generally better to read all the results into memory before processing them, provided they fit.
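
To see how much of the total time is spent waiting for the first row, something like the following can be used. It is a minimal sketch under the assumption that Connector/J is the driver (so Integer.MIN_VALUE enables streaming); FirstRowTimer and Timing are made-up names, and the measured values will depend entirely on your query and server.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical measuring helper; names are illustrative only.
public final class FirstRowTimer {
    private FirstRowTimer() {}

    /** Timings for one streamed query; firstRowNanos is -1 if no row was returned. */
    public static final class Timing {
        public final long firstRowNanos;
        public final long totalNanos;
        public final long rows;

        Timing(long firstRowNanos, long totalNanos, long rows) {
            this.firstRowNanos = firstRowNanos;
            this.totalNanos = totalNanos;
            this.rows = rows;
        }
    }

    /** Streams the query row-by-row and records when the first row arrived. */
    public static Timing timeQuery(Connection conn, String sql) throws SQLException {
        long start = System.nanoTime();
        long firstRow = -1;
        long rows = 0;
        try (Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(Integer.MIN_VALUE); // Connector/J streaming signal
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    if (rows == 0) {
                        firstRow = System.nanoTime() - start;
                    }
                    rows++; // process the current row here instead of collecting all rows
                }
            }
        }
        return new Timing(firstRow, System.nanoTime() - start, rows);
    }
}
```

A large firstRowNanos with a comparatively small remainder suggests the time goes into executing the query on the server, which streaming cannot shorten.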

How to stream a result set with the MySQL .NET connector for a large dataset

I think you are looking for a class derived from DbDataReader, likely OdbcDataReader or OleDbDataReader. These classes give forward-only access to a result set. See the links below for more information:

http://msdn.microsoft.com/en-us/library/haa3afyz(v=vs.110).aspx ("The DataReader is a good choice when retrieving large amounts of data because the data is not cached in memory.")

http://msdn.microsoft.com/en-us/library/system.data.common.dbdatareader(v=vs.110).aspx

http://msdn.microsoft.com/en-us/library/system.data.odbc.odbcdatareader(v=vs.110).aspx

http://msdn.microsoft.com/en-us/library/system.data.oledb.oledbdatareader(v=vs.110).aspx

Does MySQL Connector/J buffer rows when streaming a ResultSet?

It seems MySQL does some buffering automatically when fetchSize is set to Integer.MIN_VALUE.

It does, at least sometimes. I tested the behaviour of MySQL Connector/J version 5.1.37 using Wireshark. For the table ...

CREATE TABLE lorem (
    id INT AUTO_INCREMENT PRIMARY KEY,
    tag VARCHAR(7),
    text1 VARCHAR(255),
    text2 VARCHAR(255)
)

... with test data ...

 id  tag      text1            text2
---  -------  ---------------  ---------------
  0  row_000  Lorem ipsum ...  Lorem ipsum ...
  1  row_001  Lorem ipsum ...  Lorem ipsum ...
  2  row_002  Lorem ipsum ...  Lorem ipsum ...
...
999  row_999  Lorem ipsum ...  Lorem ipsum ...

(where both `text1` and `text2` actually contain 255 characters in each row)

... and the code ...

try (Statement s = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY)) {
    s.setFetchSize(Integer.MIN_VALUE);
    String sql = "SELECT * FROM lorem ORDER BY id";
    try (ResultSet rs = s.executeQuery(sql)) {
        // observation point: rs.next() has not been called yet
    }
}

... immediately after the s.executeQuery(sql) – i.e., before rs.next() is even called – MySQL Connector/J had retrieved the first ~140 rows from the table.

In fact, when querying just the tag column

    String sql = "SELECT tag FROM lorem ORDER BY id";

MySQL Connector/J immediately retrieved all 1000 rows as shown by the Wireshark list of network frames:

[Screenshot: framelist.png]

Frame 19, which sent the query to the server, looked like this:

[Screenshot: frame19.png]

The MySQL server responded with frame 20, which started with ...

[Screenshot: frame20.png]

... and was immediately followed by frame 21, which began with ...

[Screenshot: frame21.png]

... and so on until the server had sent frame 32, which ended with

[Screenshot: frame32.png]

Since the only difference between the two queries was the amount of data returned for each row, it appears that the number of rows retrieved up front depends on the size of each row rather than on a fixed row count. MySQL Connector/J seems to decide on an appropriate buffer size based on the maximum length of each returned row and the amount of free memory available.

What does MySQL do if the result set has more elements than the fetchSize? E.g., the result set has 10M rows and fetchSize is set to 1000. What happens then?

MySQL Connector/J initially retrieves the first group of fetchSize rows; then, as rs.next() moves through them, it eventually retrieves the next group of rows. That is true even for setFetchSize(1) which, incidentally, is the way to really get only one row at a time.

(Note that setFetchSize(n) for n>0 requires useCursorFetch=true in the connection URL. That is apparently not required for setFetchSize(Integer.MIN_VALUE).)
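
A minimal sketch of the setFetchSize(n) variant follows, assuming useCursorFetch=true is passed as a URL parameter as described in the note above. CursorFetch, withCursorFetch and countRows are hypothetical names, and the URL helper only does simple string handling; it is not a full Connector/J URL parser.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper; names are illustrative and not part of Connector/J.
public final class CursorFetch {
    private CursorFetch() {}

    /**
     * Adds useCursorFetch=true to a Connector/J URL. A URL that already
     * mentions useCursorFetch is returned unchanged, whatever its value.
     */
    public static String withCursorFetch(String jdbcUrl) {
        if (jdbcUrl.contains("useCursorFetch=")) {
            return jdbcUrl;
        }
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + "useCursorFetch=true";
    }

    /** Reads the result in groups of batchSize rows and returns the row count. */
    public static long countRows(Connection conn, String sql, int batchSize) throws SQLException {
        long rows = 0;
        try (Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(batchSize); // n > 0: relies on useCursorFetch=true on the connection
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    rows++; // the driver requests the next group once this one is exhausted
                }
            }
        }
        return rows;
    }
}
```

The connection would then be opened with something like DriverManager.getConnection(CursorFetch.withCursorFetch("jdbc:mysql://localhost:3306/test"), user, password), where the host, port and database name are placeholders.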


