What Does Statement.Setfetchsize(Nsize) Method Really Do in SQL Server Jdbc Driver

What is the difference between Statement.setMaxRows vs Statement.setFetchsize in Hive

No, you can't use them interchangeably. They do different things. The setMaxRows = number of rows that can be returned overall. setFetchSize = number that will be returned in each database roundtrip i.e.

setFetchSize Gives the JDBC driver a hint as to the
number of rows that should be fetched from the database when more rows
are needed for ResultSet objects genrated by this Statement.

setMaxRows Sets the limit for the maximum number of rows
that any ResultSet object generated by this Statement object can
contain to the given number.

In fact since setFetchSize is a hint the driver is free to ignore this and do what it sees fit. So don't worry about Hive JDBC not supporting this.

Note that all that setMaxRows is doing is

reducing the size of the ResultSet object. It won't affect the speed
of the query. setMaxRows doesn't change the actual SQL - using
top/limit/rownum e.g. - so it doesn't change the work the DB does. The
query will return more results than your limit if there are more
results to return, then truncate them to fit your ResultSet.

This answer does a good job of explaining how setFetchSize is important:

very important to performance and memory-management within the JVM as
it controls the number of network calls from the JVM to the database
and correspondingly the amount of RAM used for ResultSet processing.


Btw, setFetchSize can be set on java.sql.Statement as well as java.sql.ResultSet. The default value is set by the Statement object that created the result set. The fetch size may be changed at any time. And Hive JDBC has it's own HiveQueryResultSet with a setFetchSize method.

Set a default row prefetch in SQL Server using JDBC driver

The usual ways to set row fetch size are:

  1. Via java.sql.Connection vendor implementation class custom method (e.g. OracleConnection.setDefaultRowPrefetch)
  2. Via java.sql.Statement.setFetchSize(int): gives a hint to the driver as to the row fetch size for all ResultSets obtained from this Statement. This method is inherited by PreparedStatement and CallableStatement. Most JDBC drivers support it.
  3. Via java.sql.ResultSet.setFetchSize(int): gives a hint to the driver as to the row fetch size for all this ResultSet.

MS SQL Server JDBC driver does not support any of these ways:

  1. MSSQL driver has no such a method.
  2. Unfortunately, while most drivers respect the hint, the MSSQL driver does not. So not useful for you. See What does Statement.setFetchSize(nSize) method really do in SQL Server JDBC driver?
  3. Same problem as Statement.

By default it retrieves all the rows from database unless you specify cursor type in the JDBC driver. MSSQL driver can't directly control the fetch size using the usual methods.

Solutions:

  • Cast your Statement to SQLServerStatement and use the method setMaxRows(int). Why they didn't implement this within the standard method Steve Ballmer only knows ;^)
  • Create your driver with a cursor type. The default fetch size for a cursor is 1. Set the Connection string property selectMethod=cursor. Alternatively, you can create the Statement with com.microsoft.sqlserver.jdbc.SQLServerResultSet.TYPE_SS_SERVER_CURSOR_FORWARD_ONLY scrollability for forward-only, read-only access, and then use the setFetchSize method to tune performance. http://technet.microsoft.com/en-us/library/aa342344%28SQL.90%29.aspx
  • Use (proprietary) SQL to limit the number of rows returned (not the same thing as setting the fetch size): SET ROWCOUNT or SELECT TOP N
  • Switch to open source jTDS driver, specially made to overcome the problems of the SQL Server driver. It's a superior driver.

How JDBC Statement.setFetchsize exactly works

From Statement.setFetchSize:

Gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed for ResultSet objects generated by this Statement. If the value specified is zero, then the hint is ignored. The default value is zero.

So if you have a total of 1000 rows and the fetch size is 100 (and assuming the driver doesn't ignore the hint or fetches an approximate nr), then this means that the driver will fetch all rows in 10 batches of 100 rows each. A batch of rows is fetched when there is no row available locally to fulfill a call to next(), this is done until all rows have been fetched or maxRows has been reached.

This results in something like:

Statement stmt = ...;
stmt.setFetchSize(100);
ResultSet rs = ...;
rs.next(); // Triggers fetch of 100 rows, positions on row 1 / 100
rs.next(); // Positions on row 2 / 100
// 3-99 rs.next();
rs.next(); // Positions on row 100 / 100
rs.next(); // Triggers fetch of 100 rows, positions on row 101 / 200
// etc

This is speaking ideally from the perspective of the JDBC specification. There are drivers that ignore the hint (eg MySQL abuses it to fetch row by row if you set it to Integer.MIN_VALUE and ignores all other values), and some databases will take the hint, but then return less (sometimes more) rows than requested.

I find the documentation of com.google.api.services.bigquery.model.QueryRequest.setMaxResults you link not entirely clear, but it seems to do the same thing.

How many times does a Java ResultSet ask for data from database?

The "Result Set" documentation states:

Fetch Size

By default, when Oracle JDBC runs a query, it retrieves a result set of 10 rows at a time from the database cursor. This is the default Oracle row fetch size value. You can change the number of rows retrieved with each trip to the database cursor by changing the row fetch size value.

Standard JDBC also enables you to specify the number of rows fetched with each database round-trip for a query, and this number is referred to as the fetch size. In Oracle JDBC, the row-prefetch value is used as the default fetch size in a statement object. Setting the fetch size overrides the row-prefetch setting and affects subsequent queries run through that statement object.

Fetch size is also used in a result set. When the statement object run a query, the fetch size of the statement object is passed to the result set object produced by the query. However, you can also set the fetch size in the result set object to override the statement fetch size that was passed to it.

Note:

Changes made to the fetch size of a statement object after a result set is produced will have no affect on that result set.

The result set fetch size, either set explicitly, or by default equal to the statement fetch size that was passed to it, determines the number of rows that are retrieved in any subsequent trips to the database for that result set. This includes any trips that are still required to complete the original query, as well as any refetching of data into the result set. Data can be refetched, either explicitly or implicitly, to update a scroll-sensitive or scroll-insensitive/updatable result set.

Setting the Fetch Size

The following methods are available in all Statement, PreparedStatement, CallableStatement, and ResultSet objects for setting and getting the fetch size:

void setFetchSize(int rows) throws SQLException
int getFetchSize() throws SQLException

To set the fetch size for a query, call setFetchSize on the statement object prior to running the query. If you set the fetch size to N, then N rows are fetched with each trip to the database.

After you have run the query, you can call setFetchSize on the result set object to override the statement object fetch size that was passed to it. This will affect any subsequent trips to the database to get more rows for the original query, as well as affecting any later refetching of rows.

If you want to know the number of trips to the database then call getFetchSize() on the statement and then divide the total number of rows in the result set by the fetch size (and round up).

jtds ignores setFetchSize. How can I limit the fetch size?

The solution is to insert ;useCursors=true into the url.

 url="jdbc:jtds:sqlserver://myserver:1433;DatabaseName=myDb;useCursors=true";

Spark JDBC fetchsize option

Fetch Size It's just a value for JDBC PreparedStatement.

You can see it in JDBCRDD.scala:

 stmt.setFetchSize(options.fetchSize)

You can read more about JDBC FetchSize here

One thing you can also improve is to set all 4 parameters, that will cause parallelization of reading. See more here. Then your reading can be splitted into many machines, so memory usage for every of them may be smaller.

For details which JDBC Options are supported and how, you must search for your Driver documentation - every driver may have it's own behaviour



Related Topics



Leave a reply



Submit