Using Hibernate's ScrollableResults to Slowly Read 90 Million Records

Using Hibernate's ScrollableResults to slowly read 90 million records

Using setFirstResult and setMaxResults is your only option that I'm aware of.

Traditionally, a scrollable result set transfers rows to the client only as they are required. Unfortunately, MySQL Connector/J fakes it: it executes the entire query and transports the full result to the client, so the driver holds the whole result set in RAM and drip-feeds it to you (as evidenced by your out-of-memory problems). You had the right idea; this is a shortcoming of the MySQL Java driver.

I found no way to get around this, so I went with loading large chunks using the regular setFirstResult/setMaxResults methods. Sorry to be the bearer of bad news.

Just make sure to use a stateless session so there's no session level cache or dirty tracking etc.
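
The chunked read boils down to a loop like the one below. This is only a minimal sketch, not Hibernate code: OffsetChunkReader, PageFetcher and readInChunks are hypothetical names, and the fetcher stands in for a query that calls setFirstResult(offset) and setMaxResults(pageSize) on a stateless session before listing its results.

```java
import java.util.List;
import java.util.function.Consumer;

final class OffsetChunkReader {

    /**
     * Hypothetical stand-in for a paged query, e.g. one that calls
     * setFirstResult(offset) and setMaxResults(pageSize) before listing.
     */
    interface PageFetcher<T> {
        List<T> fetch(int offset, int pageSize);
    }

    /** Reads page after page until one comes back short; returns the row count. */
    static <T> long readInChunks(PageFetcher<T> fetcher, int pageSize, Consumer<T> handler) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long total = 0;
        int offset = 0;
        while (true) {
            List<T> page = fetcher.fetch(offset, pageSize);
            page.forEach(handler);
            total += page.size();
            if (page.size() < pageSize) {
                return total; // a short (or empty) page means there are no more rows
            }
            offset += pageSize;
        }
    }
}
```

The underlying query needs a stable ORDER BY, otherwise consecutive pages can overlap or skip rows.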

EDIT:

Your UPDATE 2 is the best you're going to get unless you break out of MySQL Connector/J, though there's no reason you can't raise the limit on the query. Provided you have enough RAM to hold the index, this should be a fairly cheap operation. I'd modify it slightly: grab a batch at a time, and use the highest id of that batch to grab the next batch.

Note: this will only work if other_conditions use equality (no range conditions allowed) and the index has id as its last column.

select * 
from person
where id > <max_id_of_last_batch> and <other_conditions>
order by id asc
limit <batch_size>
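
Driving that query from code could look roughly like this. It's a sketch under assumptions: KeysetBatchReader, BatchFetcher and readAll are hypothetical names, the fetcher stands in for running the query above with the given max id and batch size, and every real id is assumed to be greater than Long.MIN_VALUE.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.ToLongFunction;

final class KeysetBatchReader {

    /**
     * Hypothetical stand-in for the query above: returns up to batchSize rows
     * with id > afterId, ordered by id ascending.
     */
    interface BatchFetcher<T> {
        List<T> fetch(long afterId, int batchSize);
    }

    /** Walks the table batch by batch; returns the number of rows handled. */
    static <T> long readAll(BatchFetcher<T> fetcher, ToLongFunction<T> idOf,
                            int batchSize, Consumer<T> handler) {
        long lastId = Long.MIN_VALUE; // assumption: all real ids are greater than this
        long total = 0;
        while (true) {
            List<T> batch = fetcher.fetch(lastId, batchSize);
            if (batch.isEmpty()) {
                return total;
            }
            batch.forEach(handler);
            total += batch.size();
            // Rows arrive ordered by id, so the last row carries the highest id.
            lastId = idOf.applyAsLong(batch.get(batch.size() - 1));
            if (batch.size() < batchSize) {
                return total; // short batch: nothing left after it
            }
        }
    }
}
```

Unlike an offset, the starting point here is a value the index can seek to directly, so later batches should cost about the same as the first one.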

Hibernate: Walk millions of rows and don't leak memory

I think one of my problems was that

if (sr.isLast()) {
advanceScroll();
//...

combined with

((Session) Main.em.getDelegate()).clear();
//Also, "Main.em.clear()" should do...

resulted in flushing the database out one run too early. That was the cause of exceptions regarding collections. Collections cannot be handled in a StatelessSession, so that's off the table. I don't know why session.evict(currentObject) fails to work when Session.clear() does work, but that's the way I'll have to handle it for now. I'll toss the answer points to whoever can figure that one out.

So, for now, there we have an answer. A manual scrolling window is required, closing the ScrollableResults doesn't help, and I need to properly run a Session.clear().

Why does .scroll with a stateless session cause OutOfMemoryException?

Here's an excerpt from the documentation of the MySQL JDBC driver:

By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate, and due to the design of the MySQL network protocol is easier to implement. If you are working with ResultSets that have a large number of rows or large values, and cannot allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time.

To enable this functionality, create a Statement instance in the following manner:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

The combination of a forward-only, read-only result set, with a fetch size of Integer.MIN_VALUE serves as a signal to the driver to stream result sets row-by-row. After this, any result sets created with the statement will be retrieved row-by-row.

There are some caveats with this approach. You must read all of the rows in the result set (or close it) before you can issue any other queries on the connection, or an exception will be thrown.
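
Put together as a complete read loop, the documented setup might look like the sketch below. It uses only the standard JDBC interfaces; StreamingRead, streamRows and RowHandler are hypothetical names, and rows are only streamed one at a time if the driver honors the Integer.MIN_VALUE hint, as the MySQL documentation quoted above describes.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class StreamingRead {

    /** Hypothetical per-row callback; it may read columns from the current row. */
    interface RowHandler {
        void accept(ResultSet row) throws SQLException;
    }

    /** Streams the result of sql through handler; returns the number of rows seen. */
    static long streamRows(Connection conn, String sql, RowHandler handler) throws SQLException {
        long count = 0;
        // Forward-only + read-only + Integer.MIN_VALUE is the combination the
        // MySQL documentation names as the signal to stream row by row.
        try (Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                                   ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(Integer.MIN_VALUE);
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    handler.accept(rs);
                    count++;
                }
            }
        }
        return count;
    }
}
```

Because try-with-resources closes the result set and the statement even when the handler throws, the connection is free for other queries once the method returns.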

Given that you get an OutOfMemoryError when just reading from a MySQL result set, my guess is that this result set is read completely into memory, and is large enough to consume all the available memory of the JVM.


