Slow performance for deeply nested subquery factoring (CTE)

Q1: It seems that this has nothing to do with calculation time; it's just a bug in the optimizer algorithm which drives it mad while it calculates the best execution plan.

Q2: There are a number of known and fixed bugs in Oracle 11.X.0.X related to the optimization of nested queries and query factoring, but it's very hard to pin down the concrete issue.

Q3: There are two undocumented hints, materialize and inline, but neither of them worked for me when I tried your example. It's possible that some changes in server configuration or an upgrade to 11.2.0.3 may raise the limit on nested with clauses: for me (on 11.2.0.3, Win7/x86) your example works fine, but increasing the number of nested tables to 30 hangs the session.
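For reference, a minimal sketch of how these hints are placed: each one goes inside the factored subquery it should affect. The table names here are made up, and since the hints are undocumented their behavior may differ between versions:

with t0 as (select /*+ materialize */ 0 as k from dual),
     t1 as (select /*+ inline */ k from t0)
select * from t1;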

A workaround may look like this:

select k from (
select k, avg(k) over (partition by null) k_avg from ( --t16
select k, avg(k) over (partition by null) k_avg from ( --t15
select k, avg(k) over (partition by null) k_avg from ( --t14
select k, avg(k) over (partition by null) k_avg from ( --t13
select k, avg(k) over (partition by null) k_avg from ( --t12
select k, avg(k) over (partition by null) k_avg from ( --t11
select k, avg(k) over (partition by null) k_avg from ( --t10
select k, avg(k) over (partition by null) k_avg from ( --t9
select k, avg(k) over (partition by null) k_avg from ( --t8
select k, avg(k) over (partition by null) k_avg from ( --t7
select k, avg(k) over (partition by null) k_avg from ( --t6
select k, avg(k) over (partition by null) k_avg from ( --t5
select k, avg(k) over (partition by null) k_avg from ( --t4
select k, avg(k) over (partition by null) k_avg from ( --t3
select k, avg(k) over (partition by null) k_avg from ( --t2
select k, avg(k) over (partition by null) k_avg from ( -- t1
select k, avg(k) over (partition by null) k_avg from (select 0 as k from dual) t0
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
) where k >= k_avg
)

At least it works for me at a nesting level of 30, and it produces a totally different execution plan, with WINDOW BUFFER and VIEW operations instead of LOAD TABLE AS SELECT, SORT AGGREGATE and TABLE ACCESS FULL.
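If you want to compare the two plans yourself, the standard EXPLAIN PLAN facility is enough; a minimal sketch with a single nesting level:

explain plan for
select k from (
    select k, avg(k) over (partition by null) k_avg from (select 0 as k from dual) t0
) where k >= k_avg;

select * from table(dbms_xplan.display);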

Update

  1. Just installed 11.2.0.4 (Win7/32bit) and tested it against the initial query. Nothing changed in the optimizer's behavior.

  2. There is no way to directly affect the CBO behavior, even with the use of the inline (undocumented) or RULE (deprecated) hints. Maybe some guru knows some variant, but it's top secret for me (and for Google too :-) ).

  3. Doing things in a single select statement in reasonable time is possible if the main select statement is separated into parts and placed into a function which returns a set of rows (a function returning sys_refcursor or a strongly typed cursor), but that's not an option if the query is constructed at runtime. (A sketch of this approach is shown after the XML example below.)

  4. A workaround using XML is possible, but this variant looks like removing a tonsil through the wrong end (sorry):


select
    extractvalue(column_value, '/t/somevalue') abc
from
    table(xmlsequence((
        select t2 from (
            select
                t0,
                t1,
                (
                    select xmlagg(
                        xmlelement("t",
                            xmlelement("k1", extractvalue(t1t.column_value, '/t/k1')),
                            xmlelement("somevalue", systimestamp))
                        )
                    from
                        table(xmlsequence(t0)) t0t,
                        table(xmlsequence(t1)) t1t
                    where
                        extractvalue(t1t.column_value, '/t/k1') >= (
                            -- average k1 over all rows of t1; the unqualified column_value
                            -- refers to the inner xmlsequence, mirroring the level below
                            select avg(extractvalue(column_value, '/t/k1')) from table(xmlsequence(t1))
                        )
                        and extractvalue(t0t.column_value, '/t/k2') > 6
                ) t2
            from (
                select
                    t0,
                    (
                        select xmlagg(
                            xmlelement("t",
                                xmlelement("k1", extractvalue(column_value, '/t/k1')),
                                xmlelement("somevalue", sysdate))
                            )
                        from table(xmlsequence(t0))
                        where
                            extractvalue(column_value, '/t/k1') >= (
                                select avg(extractvalue(column_value, '/t/k1')) from table(xmlsequence(t0))
                            )
                    ) t1
                from (
                    select
                        xmlagg(xmlelement("t", xmlelement("k1", level), xmlelement("k2", level + 3))) t0
                    from dual connect by level < 5
                )
            )
        )
    )))

Another caveat about the strange code above: this variant is applicable only if the with data sets don't contain a large number of rows.
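As for item 3, a minimal sketch of the function-returning-a-cursor approach. The function name is made up, and only one nesting level is shown; in a real implementation the intermediate levels would be materialized step by step inside the function, e.g. into collections or a temporary table:

create or replace function nested_avg_rows return sys_refcursor
as
    c sys_refcursor;
begin
    -- the optimizer only ever sees this small statement,
    -- never the whole deeply nested query at once
    open c for
        select k from (
            select k, avg(k) over (partition by null) k_avg
            from (select 0 as k from dual) t0
        ) where k >= k_avg;
    return c;
end;
/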

Does having too many subqueries in the from clause slow down the query?

[TL;DR] It depends on the queries you are using in the sub-queries.


In your case:

select id,
       name,
       annual_income * 0.10 AS tax
from (
       select id,
              name,
              annual_income
       from (
              select id,
                     first_name || ' ' || last_name AS name,
                     income * 12 AS annual_income
              from table_name
            )
     );

Will get rewritten by the SQL engine to:

select id,
       first_name || ' ' || last_name AS name,
       income * 1.2 AS tax
from table_name;

There will be no difference in performance between the two queries. If it is easier for you to understand and/or maintain the query in its expanded form, then use that format and don't worry about the nested sub-queries.


However, there are some cases where sub-queries can affect performance. For example, this question was a particularly complex issue where the sub-query factoring clause was being materialized by the inclusion of the ROWNUM pseudo-column; that forced the SQL engine to execute in a particular order, prevented it from rewriting the query into a more optimal form, and prevented it from using an index, which made the query very slow.
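A minimal sketch of that effect (the table and column names are made up): referencing ROWNUM inside a factored subquery pins the evaluation order, so Oracle materializes it and can no longer merge the subquery or push the outer predicate down to an index.

with numbered as (
    select rownum rn, t.*      -- ROWNUM forces materialization
    from some_table t
)
select *
from numbered
where indexed_col = 42         -- predicate can no longer be pushed into the view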

Execute Subquery refactoring first before any other SQL

Are these transformations really so complex that you have to use UNION ALL? It's really hard to optimize something you can't see, but have you tried getting rid of the CTE and implementing your calculations inline?

CREATE OR REPLACE VIEW loan_vw AS
SELECT loan.contract_id
     , CASE commission.type -- or wherever this comes from
         WHEN 'PRINCIPAL'
         THEN SUM(whatever) OVER (PARTITION BY loan.contract_id, loan.type)       -- total_whatever
         ELSE SUM(something_else) OVER (PARTITION BY loan.contract_id, loan.type) -- total_something_else
       END AS whatever_something
  FROM loan_table loan
 INNER JOIN commission_table commission
    ON loan.contract_id = commission.commission_id

Note that if your analytic functions don't have PARTITION BY contract_id you won't be able to use an index on that contract_id column at all.

Take a look at this db fiddle (you'll have to click on ... on the last result table to expand the results). Here, the loan table has an indexed (PK) contract_id column, but also a some_other_id column that is also unique but not indexed, and the predicate on the outer query is still on contract_id. If you compare the plans for partition by contract and partition by other id, you'll see that the index is not used at all in the partition by other id plan: there's a TABLE ACCESS with FULL options on the loan table, compared to an INDEX - UNIQUE SCAN in partition by contract. That's obviously because the optimizer cannot resolve the relation between contract_id and some_other_id on its own, so it needs to run SUM or AVG over the entire window instead of limiting the window row counts through index usage.

What you can also try - if you have a dimension table with those contracts - is to join it to your results and expose the contract_id from the dimension table instead of from the (most likely huge) loan fact table. Sometimes this can lead to an improvement in cardinality estimates through the use of a unique index on the dimension table.
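A minimal sketch of that idea, assuming a made-up dimension table dim_contract with a unique index on contract_id:

SELECT d.contract_id
     , v.whatever_something
  FROM dim_contract d                  -- small dimension table, unique index on contract_id
 INNER JOIN loan_vw v
    ON v.contract_id = d.contract_id
 WHERE d.contract_id = :contract_id    -- the predicate now hits the dimension's unique index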

Again, it's really hard to optimize a black box without a query or even a plan, so we don't know what's going on. A CTE or a subquery can get materialized unnecessarily, for example.

Is there a performance difference between CTE, Sub-Query, Temporary Table or Table Variable?

SQL is a declarative language, not a procedural language. That is, you construct a SQL statement to describe the results that you want. You are not telling the SQL engine how to do the work.

As a general rule, it is a good idea to let the SQL engine and SQL optimizer find the best query plan. There are many person-years of effort that go into developing a SQL engine, so let the engineers do what they know how to do.

Of course, there are situations where the query plan is not optimal. Then you want to use query hints, restructure the query, update statistics, use temporary tables, add indexes, and so on to get better performance.

As for your question: the performance of CTEs and subqueries should, in theory, be the same, since both provide the same information to the query optimizer. One difference is that a CTE used more than once could easily be identified and calculated once; the results could then be stored and read multiple times. Unfortunately, SQL Server does not seem to take advantage of this basic optimization method (you might call it common subquery elimination).

Temporary tables are a different matter, because you are providing more guidance on how the query should be run. One major difference is that the optimizer can use statistics from the temporary table to establish its query plan. This can result in performance gains. Also, if you have a complicated CTE (subquery) that is used more than once, then storing it in a temporary table will often give a performance boost. The query is executed only once.
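A minimal sketch in SQL Server (the table and column names are made up): the expensive intermediate result is computed once into a temporary table and then read as many times as needed, and the optimizer gets real statistics on it.

-- compute the expensive intermediate result once
SELECT customer_id, SUM(amount) AS total_amount
INTO   #customer_totals
FROM   orders
WHERE  order_date >= '2024-01-01'
GROUP  BY customer_id;

-- each subsequent read is cheap
SELECT * FROM #customer_totals WHERE total_amount > 1000;
SELECT AVG(total_amount) FROM #customer_totals;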

The answer to your question is that you need to play around to get the performance you expect, particularly for complex queries that are run on a regular basis. In an ideal world, the query optimizer would find the perfect execution path. Although it often does, you may be able to find a way to get better performance.


