Speed of Paged Queries in Oracle

Speed of paged queries in Oracle

The main problem with Case 2 is that in many cases the whole query result set has to be obtained and then sorted before the first N rows can be returned - unless the ORDER BY columns are indexed and Oracle can use the index to avoid a sort. For a complex query and a large set of data this can take some time. However, there may be some things you can do to improve the speed:

  1. Try to ensure that no functions are called in the inner SQL - these may be called 5 million times just to return the first 20 rows. If you can move these function calls to the outer query, they will be called far fewer times.
  2. Use a FIRST_ROWS_n hint to nudge Oracle into optimising for the fact that you will never fetch all the data - see the sketch after this list.
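A minimal sketch combining both ideas - the table ORDERS, the column ORDER_DATE, the function some_expensive_function and the binds :startrow/:endrow are illustrative assumptions, not taken from the original question:

SELECT o.*,
       some_expensive_function(o.id) AS extra     -- hypothetical function, now called only for the returned page
FROM (
    SELECT /*+ FIRST_ROWS(20) */                  -- optimise for returning the first rows quickly
           ROWNUM rnum, a.*
    FROM (
        SELECT *
        FROM   orders
        ORDER  BY order_date                      -- ideally backed by an index so the sort can be avoided
    ) a
    WHERE ROWNUM <= :endrow
) o
WHERE o.rnum > :startrow;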

EDIT:

Another thought: you are currently presenting the user with a report that could return thousands or millions of rows, but the user is never realistically going to page through them all. Can you not force them to select a smaller amount of data, e.g. by limiting the selected date range to 3 months (or whatever)?
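A sketch of that idea, with the same illustrative names as above (ORDERS and ORDER_DATE are assumptions) - the inner query is simply restricted before paging is applied:

SELECT *
FROM (
    SELECT ROWNUM rnum, a.*
    FROM (
        SELECT *
        FROM   orders
        WHERE  order_date >= ADD_MONTHS(TRUNC(SYSDATE), -3)   -- only the last 3 months
        ORDER  BY order_date
    ) a
    WHERE ROWNUM <= :endrow
)
WHERE rnum > :startrow;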

oracle faster paging query

Take a look at the execution plans; here is an example with 1000 rows:

SELECT *
FROM  (SELECT ROWNUM rnum,
              a.*
       FROM   (SELECT *
               FROM   members) a
       WHERE  ROWNUM <= endrow#)
WHERE rnum > startrow#;

----------------------------------------------------------------------------------
| Id  | Operation              | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |         |  1000 | 39000 |     3   (0)| 00:00:01 |
|*  1 |  VIEW                  |         |  1000 | 39000 |     3   (0)| 00:00:01 |
|   2 |   COUNT                |         |       |       |            |          |
|*  3 |    FILTER              |         |       |       |            |          |
|   4 |     TABLE ACCESS FULL  | MEMBERS |  1000 | 26000 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("RNUM">"STARTROW#")
3 - filter("MEMBERS"."ENDROW#">=ROWNUM)

And here is version 2:

SELECT *
FROM  (SELECT ROWNUM rnum,
              a.*
       FROM   (SELECT *
               FROM   members) a)
WHERE rnum BETWEEN startrow# AND endrow#;

--------------------------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         |  1000 | 39000 |     3   (0)| 00:00:01 |
|*  1 |  VIEW                |         |  1000 | 39000 |     3   (0)| 00:00:01 |
|   2 |   COUNT              |         |       |       |            |          |
|   3 |    TABLE ACCESS FULL | MEMBERS |  1000 | 26000 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("RNUM"<="ENDROW#" AND "RNUM">="STARTROW#")

Out of that I'd say version 2 could be slightly faster, as it involves one step fewer. But I don't know about your indexes and data distribution, so it's up to you to get these execution plans yourself and judge the situation for your data. Or simply test it.

Oracle Pagination strategy

The pagination pattern was invented for website presentation (as opposed to scrolling navigation), and works best there. In short, a live user is practically unable to view thousands or millions of records at once, so the information is divided into short pages (50~200 records), and usually one query is sent to the database per page. The user typically clicks through only a few pages rather than browsing all of them, and needs a little time to read each page, so the queries are not sent to the database back to back but at long intervals. The time to retrieve one chunk of data is much shorter than the time to retrieve all the millions of records, so the user is happy because they do not have to wait long for subsequent pages, and the overall system load is lower.


But it seems from the question that the nature of your application is oriented to batch processing rather than web presentation. The application must fetch all records and perform some operations/transformations (calculations) on each of them. In that case completely different design patterns are used (stream/pipelined processing, a sequence of steps, parallel steps/operations, etc.), and pagination will not work - if you go that way you will kill your system's performance.


Instead of fancy theory, let's look at a simple and practical example that shows the kind of speed difference we are talking about here.
Let's say there is a table PAGINATION with about 7 million records:

create table pagination as
select sysdate - 200 * dbms_random.value as my_date, t.*
from (
    select o.* from all_objects o
    cross join (select * from dual connect by level <= 100)
    fetch first 10000000 rows only
) t;

select count(*) from pagination;

COUNT(*)
----------
7369600

Let's say there is an index created on the MY_DATE column, and the statistics are fresh:

create index PAGINATION_IX on pagination( my_date );

BEGIN dbms_stats.gather_table_stats( 'TEST', 'PAGINATION', method_opt => 'FOR ALL COLUMNS' ); END;
/

Let's say that we are going to process about 10% of the records from the table, between the dates below:

select count(*) from pagination
where my_date between date '2017-10-01' and date '2017-10-21';

COUNT(*)
----------
736341

and finally let's say that our "processing", for simplicity, will consist of summing the lengths of one of the fields.

This is a simple paging implementation:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import oracle.jdbc.pool.OracleDataSource;

public class Pagination {

    public static class RecordPojo {
        Date myDate;
        String objectName;

        public Date getMyDate() {
            return myDate;
        }
        public RecordPojo setMyDate(Date myDate) {
            this.myDate = myDate;
            return this;
        }
        public String getObjectName() {
            return objectName;
        }
        public RecordPojo setObjectName(String objectName) {
            this.objectName = objectName;
            return this;
        }
    }

    static class MyPaginator {

        private Connection conn;
        private int pageSize;
        private int currentPage = 0;

        public MyPaginator(Connection conn, int pageSize) {
            this.conn = conn;
            this.pageSize = pageSize;
        }

        static final String QUERY = ""
                + "SELECT my_date, object_name FROM pagination "
                + "WHERE my_date between date '2017-10-01' and date '2017-10-21' "
                + "ORDER BY my_date "
                + "OFFSET ? ROWS FETCH NEXT ? ROWS ONLY";

        List<RecordPojo> getNextPage() {
            List<RecordPojo> list = new ArrayList<>();
            try (PreparedStatement ps = conn.prepareStatement(QUERY)) {
                ps.setInt(1, pageSize * currentPage++);
                ps.setInt(2, pageSize);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        list.add(new RecordPojo().setMyDate(rs.getDate(1)).setObjectName(rs.getString(2)));
                    }
                }
            } catch (SQLException e) {
                e.printStackTrace();
            }
            return list;
        }

        public int getCurrentPage() {
            return currentPage;
        }
    }

    public static void main(String... x) throws SQLException {
        OracleDataSource ds = new OracleDataSource();
        ds.setURL("jdbc:oracle:thin:test/test@//localhost:1521/orcl");
        long startTime = System.currentTimeMillis();
        long value = 0;
        int pageSize = 1000;

        try (Connection conn = ds.getConnection()) {
            MyPaginator p = new MyPaginator(conn, pageSize);
            List<RecordPojo> list;
            while ((list = p.getNextPage()).size() > 0) {
                value += list.stream().map(y -> y.getObjectName().length()).mapToLong(Integer::longValue).sum();
                System.out.println("Page: " + p.getCurrentPage());
            }
            System.out.format("==================\nValue = %d, Pages = %d, time = %d seconds",
                    value, p.getCurrentPage(), (System.currentTimeMillis() - startTime) / 1000);
        }
    }
}

The result is:

Value = 18312338, Pages = 738,  time = 2216 seconds

Now let's test a very simple stream-based solution - take only one record at a time, process it, discard it (freeing up memory), and take the next one.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import oracle.jdbc.pool.OracleDataSource;

public class NoPagination {

    static final String QUERY = ""
            + "SELECT my_date, object_name FROM pagination "
            + "WHERE my_date between date '2017-10-01' and date '2017-10-21' "
            + "ORDER BY my_date ";

    public static void main(String[] args) throws SQLException {
        OracleDataSource ds = new OracleDataSource();
        ds.setURL("jdbc:oracle:thin:test/test@//localhost:1521/orcl");
        long startTime = System.currentTimeMillis();
        long count = 0;

        try (Connection conn = ds.getConnection();
             PreparedStatement ps = conn.prepareStatement(QUERY);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // processing - RecordPojo is the class defined in Pagination above
                Pagination.RecordPojo r = new Pagination.RecordPojo()
                        .setMyDate(rs.getDate(1)).setObjectName(rs.getString(2));
                count += r.getObjectName().length();
            }
            System.out.format("==================\nValue = %d, time = %d seconds",
                    count, (System.currentTimeMillis() - startTime) / 1000);
        }
    }
}

The result is:

Value = 18312328, time = 11 seconds

Yes - 2216 seconds / 11 seconds is about 201 times faster, that is roughly 20,100% faster!
Unbelievable? You can test it yourself.

This example shows how important it is to choose the right solution (right design patterns) to solve the problem.

pagination slow with user-defined function and left join oracle 10g

Calling a PL/SQL function from inside a SQL query can be quite expensive, as it causes a context switch from SQL to PL/SQL for each row.

ad_users.person_id seems to be nullable? I'd skip the calls to ps_fio when u.personid or sr.personid is NULL, depending on how frequent those NULLs are.

Next, I'd remove the calculation of gender_int from the function ps_fio as it seems not to be used.

Next, I'd try to replace the function call with a join to the persons table and compute the full name directly in the query. Ugly, but may be faster - roughly along the lines of the sketch below.
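A hedged sketch of that rewrite - the table and column names (ps_persons, person_id, firstname, familyname) are guesses based on the surrounding text, not taken from the original query:

SELECT u.*,
       TRIM(p.familyname) || ' ' || TRIM(p.firstname) AS full_name   -- built inline instead of calling ps_fio
FROM   ad_users u
LEFT JOIN ps_persons p
       ON p.person_id = u.person_id;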

Lastly, your calls to trim(familyname) etc. suggest that there are trailing and/or leading spaces in the name columns of the ps_persons table. The speed benefit would be minimal, but maybe this clean-up could be done once in the table, so that the trim calls are no longer necessary? See the sketch below.
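A one-off clean-up might look like this (again, ps_persons and the column names are assumptions):

UPDATE ps_persons
SET    familyname = TRIM(familyname),
       firstname  = TRIM(firstname)
WHERE  familyname <> TRIM(familyname)
   OR  firstname  <> TRIM(firstname);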

Alternatives to LIMIT and OFFSET for paging in Oracle

You will need to use the rownum pseudocolumn to limit results. See here:

http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html

http://www.oracle.com/technetwork/issue-archive/2006/06-sep/o56asktom-086197.html
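The pattern described in those articles looks roughly like this (a sketch - your_table, sort_column and the binds :first_row/:last_row are placeholders):

SELECT *
FROM (
    SELECT /*+ FIRST_ROWS(25) */ a.*, ROWNUM rnum   -- hint value = expected page size
    FROM (
        SELECT *
        FROM   your_table
        ORDER  BY sort_column
    ) a
    WHERE ROWNUM <= :last_row
)
WHERE rnum >= :first_row;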

(Oracle) How get total number of results when using a pagination query?

I think you have to modify your query to something like this to get all the information you want in a single query.

SELECT *
FROM  (SELECT r.*, ROWNUM RNUM, COUNT(*) OVER () RESULT_COUNT
       FROM (SELECT t0.ABC_SEQ_ID AS c0, t0.FIRST_NAME, t1.SCORE
             FROM   ABC t0, XYZ t1
             WHERE  (t0.XYZ_ID = 751)
             AND    t0.XYZ_ID = t1.XYZ_ID
             ORDER BY t0.RANK ASC) R)
WHERE RNUM BETWEEN 1 AND 15

The reason is that the COUNT(*) OVER () window function gets evaluated after the WHERE clause of the query block it appears in, so if it sits next to the row-limiting filter it does not give the total count of records but only the count of records that satisfy the ROWNUM <= 30 condition.
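To illustrate the placement difference (a sketch with a hypothetical table T and column ID, not the exact query from the question):

-- Wrong placement: the window function sees only the rows that survive the filter,
-- so RESULT_COUNT is capped at 30
SELECT r.*, ROWNUM RNUM, COUNT(*) OVER () RESULT_COUNT
FROM   (SELECT * FROM t ORDER BY id) r
WHERE  ROWNUM <= 30;

-- Right placement: count in the inner block, filter in the outer one,
-- so RESULT_COUNT still reflects every row in T
SELECT *
FROM  (SELECT r.*, ROWNUM RNUM, COUNT(*) OVER () RESULT_COUNT
       FROM (SELECT * FROM t ORDER BY id) r)
WHERE RNUM BETWEEN 1 AND 15;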

If you cannot accept the performance of this query, or of executing 2 separate queries, maybe you should think about a solution like the one proposed by FrustratedWithFormsDesigner in his/her comment about caching the count of records.

If you work with databases on a regular basis I recommend you get a copy of SQL Cookbook. It is an exceptional book with lots of useful tips.

How can I speed up row_number in Oracle?

ROW_NUMBER is quite inefficient in Oracle.

See the article in my blog for performance details:

  • Oracle: ROW_NUMBER vs ROWNUM

For your specific query, I'd recommend replacing it with ROWNUM and making sure that the index is used:

SELECT  *
FROM    (
        SELECT  /*+ INDEX_ASC(t index_on_column) NOPARALLEL_INDEX(t index_on_column) */
                t.*, ROWNUM AS rn
        FROM    table t
        ORDER BY
                column
        )
WHERE   rn >= :start
        AND rownum <= :end - :start + 1

This query will use a COUNT STOPKEY operation.

Also, either make sure your column is not nullable, or add a WHERE column IS NOT NULL condition.

Otherwise the index cannot be used to retrieve all values.
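For the nullable case the extra condition goes into the inner query (same placeholder names as in the query above); B-tree indexes do not store rows whose key is entirely NULL, which is why the filter is needed before the index can return every row:

SELECT  *
FROM    (
        SELECT  /*+ INDEX_ASC(t index_on_column) */
                t.*, ROWNUM AS rn
        FROM    table t
        WHERE   column IS NOT NULL   -- without this, the index cannot cover all rows
        ORDER BY
                column
        )
WHERE   rn >= :start
        AND rownum <= :end - :start + 1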

Note that you cannot use ROWNUM BETWEEN :start and :end without a subquery.

ROWNUM is always assigned last and checked last; that's why ROWNUMs always come in order without gaps.

If you use ROWNUM BETWEEN 10 AND 20, the first row that satisfies all other conditions becomes a candidate for returning, is temporarily assigned ROWNUM = 1, and fails the ROWNUM BETWEEN 10 AND 20 test.

Then the next row becomes a candidate, is assigned ROWNUM = 1 and fails too, and so on, so in the end no rows are returned at all.

This has to be worked around by putting the ROWNUM assignment into a subquery, as in the example below.
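A minimal illustration (T is a placeholder table):

-- Returns no rows: each candidate row gets ROWNUM = 1 and immediately fails the filter
SELECT * FROM t WHERE ROWNUM BETWEEN 10 AND 20;

-- Works: ROWNUM is materialised in the subquery first, then filtered in the outer query
SELECT *
FROM  (SELECT t.*, ROWNUM rn FROM t)
WHERE rn BETWEEN 10 AND 20;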

Oracle low speed in first execute

The reason why it takes only a few milliseconds to execute the query after the first run is that Oracle caches the results. If you change the SQL, Oracle considers it a different query and won't serve the results from the cache but executes the new query.
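If you want to check how much of the speed-up really comes from caching, on a test system (and with the necessary privileges) you can flush the caches between runs - a rough diagnostic only, not something to run in production:

ALTER SYSTEM FLUSH BUFFER_CACHE;   -- drops cached data blocks
ALTER SYSTEM FLUSH SHARED_POOL;    -- drops cached cursors / parsed statements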

It is a hard question how to speed up the first execution. You'll need to post the explain plan, and you'll probably have to answer further questions, if you want help with that.

My first try would be to eliminate the subqueries, although these days Oracle optimizes this quite well:

SELECT DISTINCT VW1.Id, VW1.Qflag, VW1.LetterId, VW1.LetterNo, VW1.LetterDate,
       VW1.InstanceDate, VW1.ViewDate, VW1.DeadlineDate
FROM   VWLETTERINSTANCESEARCH VW1,
       VWLETTERINSTANCESEARCH VW2,
       LetterInstances, ChildInstance, Folders,
       vwStaffUsers SU
WHERE  (QFLAG IS NULL OR QFLAG = 0)
AND    (VW1.PARENTINSTANCEID = VW2.Id AND VW2.refSenderId = 1947)
AND    (ChildInstance.ParentInstanceId = VW1.Id AND
        ChildInstance.FolderId = Folders.Id AND Folders.OwnerId = 23)
AND    (VW1.OwnerId = SU.StaffId AND SU.UserId = 2 AND SU.Deleted = 0)

EDIT:
Also, I'd try to run the query on the main table only (i.e. VWLETTERINSTANCESEARCH VW1) and see if it is fast enough. Then I'd add the rest of the tables gradually, one by one, and see where the performance starts to degrade. A starting point might look like the sketch below.
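A sketch of that first step, derived from the query above (it assumes QFLAG belongs to VW1, which the original query leaves unqualified):

SELECT DISTINCT VW1.Id, VW1.Qflag, VW1.LetterId, VW1.LetterNo, VW1.LetterDate,
       VW1.InstanceDate, VW1.ViewDate, VW1.DeadlineDate
FROM   VWLETTERINSTANCESEARCH VW1
WHERE  (VW1.QFLAG IS NULL OR VW1.QFLAG = 0)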


