Why Can't I Seem to Force Oracle 11G to Consume More Cpus for a Single SQL Query

Parallelism doesn't provide any performance gains

A parallel index fast full scan is probably the fastest way to count the number of records of a very large table with a primary key.

There are dozens of reasons why the parallel hint did not improve performance in the original query. The most likely reason, as shown below, is that your queries are comparing a parallel full table scan with a serial index fast full scan.

Sample Schema

--Sample schema: Create table, insert 30M rows, add a primary key, gather stats.
--This takes a few minutes to setup.
create table test1(c1 number,c2 number,c3 number,c4 number,c5 number) nologging;
begin
for i in 1 .. 300 loop
insert /*+ append */ into test1
select level+(i*100000),level,level,level,level
from dual connect by level <= 100000;
commit;
end loop;
end;
/
alter table test1 add constraint test1_pk primary key(c1);
begin
dbms_stats.gather_table_stats(user, 'test1');
end;
/

Explain Plans

--#1: Parallel hint - parallel full table scan.
explain plan for select /*+ parallel(test1) */ count(*) from test1;
select * from table(dbms_xplan.display);

--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 73 (22)| 00:00:01 | | | |
| 1 | SORT AGGREGATE | | 1 | | | | | |
| 2 | PX COORDINATOR | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | | | Q1,00 | P->S | QC (RAND) |
| 4 | SORT AGGREGATE | | 1 | | | Q1,00 | PCWP | |
| 5 | PX BLOCK ITERATOR | | 30M| 73 (22)| 00:00:01 | Q1,00 | PCWC | |
| 6 | TABLE ACCESS FULL| TEST1 | 30M| 73 (22)| 00:00:01 | Q1,00 | PCWP | |
--------------------------------------------------------------------------------------------------------


--#2: Regular count(*) - serial index fast full scan.
explain plan for select count(*) from test1;
select * from table(dbms_xplan.display);

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 1270 (32)| 00:00:11 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| TEST1_PK | 30M| 1270 (32)| 00:00:11 |
--------------------------------------------------------------------------

--#3: Parallel_index - parallel index fast full scan.
explain plan for select /*+ parallel_index(test1) */ count(*) from test1;
select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 1270 (32)| 00:00:11 | | | |
| 1 | SORT AGGREGATE | | 1 | | | | | |
| 2 | PX COORDINATOR | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | | | Q1,00 | P->S | QC (RAND) |
| 4 | SORT AGGREGATE | | 1 | | | Q1,00 | PCWP | |
| 5 | PX BLOCK ITERATOR | | 30M| 1270 (32)| 00:00:11 | Q1,00 | PCWC | |
| 6 | INDEX FAST FULL SCAN| TEST1_PK | 30M| 1270 (32)| 00:00:11 | Q1,00 | PCWP | |
-----------------------------------------------------------------------------------------------------------

Performance Comparison

This block of code was run several times. The first, highest, and lowest results were thrown out, and the rest averaged.

alter system flush buffer_cache;
select /*+ parallel(test1) */ count(*) from test1;
alter system flush buffer_cache;
select count(*) from test1;
alter system flush buffer_cache;
select /*+ parallel_index(test1) */ count(*) from test1;


Count(*) (serial index) - 6.34 seconds
Parallel (parallel full table scan) - 6.20 seconds
Parallel_index (parallel index) - 2.35 seconds

A thousand other reasons, some warnings

Maybe the queries ran the same on your system because parallelism isn't enabled, wasn't used, or wasn't useful. Check the explain plans, SQL Monitoring, or v$px_process to get an idea of how much parallelism is used. If the degree of parallelism doesn't seem right, I've put together a list of factors influencing the DOP in this answer.

Parallelism can make a huge difference with many queries, for both I/O and CPU. But 35 seconds is definitely on the small end of the scale. Parallel queries are normally used for statements that take several minutes or hours. There's going to be a large percentage of overhead with such a small query. Even if it does run faster you should consider whether it's worth running one query faster at the expense of additional resources taken away from other queries.

Slow performance for deeply nested subquery factoring (CTE)

Q1: Seems that there are nothing about calculation time, just bug in optimizer algorithm which make it mad while calculating a best execution plan.

Q2: There are a number of known and fixed bugs in Oracle 11.X.0.X related to optimization of nested queries and query factoring. But it's very hard to find a concrete issue.

Q3: There are two undocumented hints: materialize and inline but no one of them works for me while I tried your example. It's possible that some changes in server configuration or upgrading to 11.2.0.3 may increase limit of nested with clauses: for me (on 11.2.0.3 Win7/x86) your example works fine, but increasing number of nested tables to 30 hangs a session.

Workaround may look like this:

select k from (
select k, avg(k) over (partition by null) k_avg from ( --t16
select k, avg(k) over (partition by null) k_avg from ( --t15
select k, avg(k) over (partition by null) k_avg from ( --t14
select k, avg(k) over (partition by null) k_avg from ( --t13
select k, avg(k) over (partition by null) k_avg from ( --t12
select k, avg(k) over (partition by null) k_avg from ( --t11
select k, avg(k) over (partition by null) k_avg from ( --t10
select k, avg(k) over (partition by null) k_avg from ( --t9
select k, avg(k) over (partition by null) k_avg from ( --t8
select k, avg(k) over (partition by null) k_avg from ( --t7
select k, avg(k) over (partition by null) k_avg from ( --t6
select k, avg(k) over (partition by null) k_avg from ( --t5
select k, avg(k) over (partition by null) k_avg from ( --t4
select k, avg(k) over (partition by null) k_avg from ( --t3
select k, avg(k) over (partition by null) k_avg from ( --t2
select k, avg(k) over (partition by null) k_avg from ( -- t1
select k, avg(k) over (partition by null) k_avg from (select 0 as k from dual) t0
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
)

At least it works for me on nesting level of 30 and produces totally different execution plan with WINDOW BUFFER and VIEW instead of LOAD TABLE AS SELECT, SORT AGGREGATE and TABLE ACCESS FULL.

Update

  1. Just installed 11.2.0.4 (Win7/32bit) and test it against initial query. Nothing changed in optimizer behavior.

  2. There are no possibility to directly affect a CBO behavior, even with use of inline (undocumented) or RULE (deprecated) hints. May be some Guru knows a some variant, but it's a Top Secret for me (and Google too :-) .

  3. Doing things in a one select statement in reasonable time is possible if a main select statement separated into a parts and placed into the function which returns a set of rows (function returning sys_refcursor or strong typed cursor), but it's not a choice if a query constructed at runtime.

  4. Workaround with usage of XML is possible, but this variant looks like removing a tonsil through the ass hole (sorry):

.

select
extractvalue(column_value,'/t/somevalue') abc
from
table(xmlsequence((
select t2 from (
select
t0,
t1,
(
select xmlagg(
xmlelement("t",
xmlelement("k1",extractvalue(t1t.column_value,'/t/k1')),
xmlelement("somevalue", systimestamp))
)
from
table(xmlsequence(t0)) t0t,
table(xmlsequence(t1)) t1t
where
extractvalue(t1t.column_value,'/t/k1') >= (
select avg(extractvalue(t1t.column_value, '/t/k1')) from table(xmlsequence(t1))
)
and
extractvalue(t0t.column_value,'/t/k2') > 6
) t2
from (
select
t0,
(
select xmlagg(
xmlelement("t",
xmlelement("k1",extractvalue(column_value,'/t/k1')),
xmlelement("somevalue", sysdate))
)
from table(xmlsequence(t0))
where
extractvalue(column_value,'/t/k1') >= (
select avg(extractvalue(column_value, '/t/k1')) from table(xmlsequence(t0))
)
) t1
from (
select
xmlagg(xmlelement("t", xmlelement("k1", level), xmlelement("k2", level + 3))) t0
from dual connect by level < 5
)
)
)
)))

Another thing about a strange code above is that this variant applicable only if with data sets didn't have a big number of rows.



Related Topics



Leave a reply



Submit