Cycle Detection with Recursive Subquery Factoring

From the documentation on CONNECT_BY_ISCYCLE:

The CONNECT_BY_ISCYCLE pseudocolumn returns 1 if the current row has a child which is also its ancestor

and that on CYCLE:

A row is considered to form a cycle if one of its ancestor rows has the same values for the cycle columns.

In your example, row 2 does have a child which is also its ancestor, but its id has not been returned yet.

In other words, CONNECT_BY_ISCYCLE checks the children (which are yet to be returned), while CYCLE checks the current row (which is already returned).
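A rough way to see the difference is to model both checks in Python. This is only a sketch of the two rules as quoted above, not of Oracle's internals; the two-row loop 1 -> 2 -> 1, the CHILDREN mapping and both function names are made up for illustration.

```python
# Hypothetical edge list: parent id -> child ids. 1 -> 2 -> 1 forms a loop.
CHILDREN = {1: [2], 2: [1]}


def connect_by_style(start):
    """CONNECT BY NOCYCLE rule: flag a row if one of its children is its ancestor."""
    rows = []

    def walk(node, path):
        path = path + [node]
        kids = CHILDREN.get(node, [])
        # The check looks ahead at children that have not been returned yet.
        is_cycle = int(any(k in path for k in kids))
        rows.append((node, is_cycle))
        for k in kids:
            if k not in path:  # the looping child itself is never returned
                walk(k, path)

    walk(start, [])
    return rows


def cycle_clause_style(start):
    """CYCLE rule: flag a row if an ancestor already has the same value."""
    rows = []
    level = [(start, [])]  # (node, list of its ancestors)
    while level:
        next_level = []
        for node, ancestors in level:
            # The check looks back at rows that were already returned.
            is_cycle = int(node in ancestors)
            rows.append((node, is_cycle))
            if not is_cycle:  # flagged rows are returned but not expanded
                for k in CHILDREN.get(node, []):
                    next_level.append((k, ancestors + [node]))
        level = next_level
    return rows


print(connect_by_style(1))    # [(1, 0), (2, 1)]
print(cycle_clause_style(1))  # [(1, 0), (2, 0), (1, 1)]
```

With the first rule the flag lands on row 2 and the repeated row is never produced; with the second rule row 1 is produced a second time and that repeat carries the flag.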

CONNECT BY is row-based, while recursive CTEs are set-based.

Note that Oracle's documentation on CYCLE mentions an "ancestor row". However, generally speaking, there is no concept of an "ancestor row" in a recursive CTE. It's a set-based operation which can yield results that lie completely outside the tree. The anchor part and the recursive part can even use different tables.

Since recursive CTEs are usually used to build hierarchy trees, Oracle decided to add a cycle check. But due to the set-based way recursive CTEs operate, it's generally impossible to tell whether the next step will generate a cycle or not, because without a clear definition of the "ancestor row" the cycle condition cannot be defined either.

To perform the "next" step, the whole "current" set needs to be available, but to generate each row of the current set (which includes the cycle column) we would first need the results of the "next" operation.

It's not a problem if the current set always consists of a single row (like in CONNECT BY), but it is a problem if the recursive operation is defined on a set as a whole.

I haven't looked into Oracle 11 yet, but SQL Server implements recursive CTEs by just hiding a CONNECT BY behind them, which requires placing numerous restrictions (all of which effectively forbid all set-based operations).

PostgreSQL's implementation, on the other hand, is truly set-based: you can do any operation with the anchor part in the recursive part. At the time of writing, though, it does not have any means to detect cycles, because cycles are not defined in the first place.

As was mentioned before, MySQL at the time of writing does not implement CTEs at all (nor does it implement HASH JOINs or MERGE JOINs, only nested loops, so this should not come as much of a surprise).

Ironically, I received a letter today on this very subject, which I will cover in my blog.

Update:

Recursive CTEs in SQL Server are no more than CONNECT BY in disguise. See this article in my blog for shocking details:

SQL Server: are the recursive CTE’s really set-based?

Oracle Recursive Subquery Factoring convert

This is quite a bit late, but I'm not sure this can be done using a recursive CTE. I did, however, come up with a solution using the MODEL clause:

WITH SAMPLE (ID,GRP_ID,SCORE,RANK) AS (
  SELECT 1,1,100,NULL FROM DUAL UNION
  SELECT 2,1,90,NULL FROM DUAL UNION
  SELECT 3,1,70,NULL FROM DUAL UNION
  SELECT 4,2,95,NULL FROM DUAL UNION
  SELECT 5,2,70,NULL FROM DUAL UNION
  SELECT 6,2,60,NULL FROM DUAL
)
SELECT ID,GRP_ID,SCORE,RANK FROM SAMPLE
MODEL
  DIMENSION BY (ID,GRP_ID)
  MEASURES (SCORE,0 RANK,0 LAST_RANKED_GRP,0 ITEM_COUNT,0 HAS_RANK)
  RULES
  -- Iterate once for each item to be ranked.
  ITERATE (1000) UNTIL (ITERATION_NUMBER = ITEM_COUNT[1,1])
  (
    -- If the current item's score equals the max score of the unranked items, assign a rank.
    RANK[ANY,ANY] = CASE WHEN SCORE[CV(),CV()] = MAX(SCORE) OVER (PARTITION BY HAS_RANK) THEN RANK() OVER (ORDER BY SCORE DESC,ID) ELSE RANK[CV(),CV()] END,
    LAST_RANKED_GRP[ANY,ANY] = FIRST_VALUE(GRP_ID) OVER (ORDER BY RANK DESC),
    SCORE[ANY,ANY] = CASE WHEN RANK[CV(),CV()] = 0 AND CV(GRP_ID) = LAST_RANKED_GRP[CV(),CV()] THEN SCORE[CV(),CV()]+10 ELSE SCORE[CV(),CV()] END,
    ITEM_COUNT[ANY,ANY] = COUNT(*) OVER (),
    -- Separates ranked items from unranked ones.
    HAS_RANK[ANY,ANY] = CASE WHEN RANK[CV(),CV()] <> 0 THEN 1 ELSE 0 END
  )
ORDER BY RANK;

It's not very pretty, and I suspect there is a better way to go about this, but it does give the expected output.

Caveats:

You'd have to increase the iteration count in ITERATE (1000) if you have more than 1000 rows to rank.

This does a full re-ranking based on the score after each iteration. So if we took your sample data, but changed the initial score of item 2 to 95 rather than 90: after ranking item 1 and giving the 10 point bonus to item 2, it now has a score of 105. So we rank it as 1st and move item 1 down to 2nd. You'd have to make a few modifications if this is not the desired behavior.
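If the re-ranking is not what you want, the alternative behavior can be sketched in Python. This is only my reading of the procedure from the caveat above (rank the best unranked item, then give a 10 point bonus to the unranked items in its group), with a rank never changing once it has been assigned. The function name and the tie-break on the lower ID are assumptions for illustration, and this does not reproduce what the MODEL query does.

```python
def rank_with_bonus(items, bonus=10):
    """items: list of (id, grp_id, score). Returns [(rank, id, score_when_ranked)]."""
    scores = {item_id: score for item_id, _, score in items}
    groups = {item_id: grp for item_id, grp, _ in items}
    unranked = set(scores)
    result = []
    while unranked:
        # Highest current score wins; ties go to the lower id (an assumption).
        best = min(unranked, key=lambda i: (-scores[i], i))
        unranked.remove(best)
        result.append((len(result) + 1, best, scores[best]))
        # Bonus for the remaining items in the group that was just ranked.
        for i in unranked:
            if groups[i] == groups[best]:
                scores[i] += bonus
    return result


sample = [(1, 1, 100), (2, 1, 90), (3, 1, 70), (4, 2, 95), (5, 2, 70), (6, 2, 60)]
print([item_id for _, item_id, _ in rank_with_bonus(sample)])  # [1, 2, 4, 3, 5, 6]

# Same data with item 2 starting at 95: it ends on 105 but stays in 2nd place.
changed = [(1, 1, 100), (2, 1, 95), (3, 1, 70), (4, 2, 95), (5, 2, 70), (6, 2, 60)]
print(rank_with_bonus(changed)[:2])  # [(1, 1, 100), (2, 2, 105)]
```

In this variant item 1 keeps 1st place even though item 2 overtakes it on points after the bonus, which is the opposite of the full re-ranking described in the caveat.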

How does the recursive WITH query work in oracle? When does it go into a cycle?

Oracle Setup:

CREATE TABLE lines ( Item, Qty ) AS
SELECT 'abc', 2 FROM DUAL UNION ALL 
SELECT 'cde', 1 FROM DUAL;

CREATE TABLE pick ( part, delivery ) AS
SELECT 'abc', 2 FROM DUAL UNION ALL
SELECT 'cde', 2 FROM DUAL;

Query 1: Using a hierarchical query:

SELECT Item,
       COLUMN_VALUE AS qty
FROM   lines l
       CROSS JOIN
       TABLE(
         CAST(
           MULTISET(
             SELECT 1
             FROM   DUAL
             CONNECT BY LEVEL <= l.Qty
           )
           AS SYS.ODCINUMBERLIST
         )
       ) t
WHERE  item IN ( SELECT part FROM pick WHERE delivery = 2 )

Query 2: Using a recursive sub-query factoring clause:

WITH rsqfc ( item, qty ) AS (
  SELECT item, qty
  FROM   lines l
  WHERE  item IN ( SELECT part FROM pick WHERE delivery = 2 )
UNION ALL
  SELECT item, qty - 1
  FROM   rsqfc
  WHERE  qty > 1
)
SELECT item, 1 AS qty
FROM   rsqfc;

Output:


ITEM | QTY
:--- | --:
abc  |   1
abc  |   1
cde  |   1
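To see why Query 2 stops instead of looping, it can help to trace the evaluation by hand. Below is a small Python sketch of that trace, assuming the usual reading of a recursive WITH: the anchor runs once, and the recursive branch only sees the rows produced by the previous step. The function name expand_rows is made up for illustration.

```python
def expand_rows(lines, picked_parts):
    """Trace the recursive WITH step by step. Returns (iterations, final_rows)."""
    # Anchor member: rows of lines whose item appears in pick.
    current = [(item, qty) for item, qty in lines if item in picked_parts]
    iterations = []
    while current:
        iterations.append(current)
        # Recursive member: works only on the rows from the previous step.
        current = [(item, qty - 1) for item, qty in current if qty > 1]
    # Outer SELECT: every collected row is emitted with qty = 1.
    final_rows = [(item, 1) for step in iterations for item, _ in step]
    return iterations, final_rows


lines = [("abc", 2), ("cde", 1)]
picked = {"abc", "cde"}  # parts in pick with delivery = 2
iterations, final_rows = expand_rows(lines, picked)
for n, step in enumerate(iterations):
    print(n, step)
# 0 [('abc', 2), ('cde', 1)]
# 1 [('abc', 1)]
print(sorted(final_rows))  # [('abc', 1), ('abc', 1), ('cde', 1)]
```

Because qty strictly decreases and the WHERE qty > 1 filter eventually rejects every row, the working set becomes empty and the recursion ends on its own, so no row can ever repeat the values of one of its ancestors.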


How to write a recursive query which does not add already visited values?

You were on the right track, and you seem to just need a little help dealing with cycles. See the CYCLE clause right at the end of the recursive CTE definition (even though the CYCLE clause comes AFTER the closing parenthesis for the recursive CTE, it is still part of it):

with
-- Begin simulated data.
  client_box ( box_id, item_id, sub_box_id ) as (
    select 'BoxA', 'Item1', null   from dual union all
    select 'BoxA', null   , 'BoxB' from dual union all
    select 'BoxA', null   , 'BoxC' from dual union all
    select 'BoxB', null   , 'BoxA' from dual union all
    select 'BoxB', null   , 'BoxD' from dual
  ),
-- End of simulated data (for testing only, not part of the solution).
-- SQL query consists of the keyword WITH (above) and the lines below.
-- Use your actual table and column names.
-- Use whatever mechanism works for you in the ANCHOR branch of r (below).
  r ( box_id ) as (
    select  'BoxA' from dual   --  Modify this for inputs
    union all
    select  c.sub_box_id
      from  client_box c join r on c.box_id = r.box_id
      where c.sub_box_id is not null
  )
  cycle box_id set cycle to 1 default 0
select box_id
from   r
where  cycle = 0
;

BOX_ID
------
BoxA
BoxB
BoxC
BoxD
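For an engine without a CYCLE clause, the same idea can be approximated by carrying the path along by hand. The sketch below uses SQLite through Python's sqlite3 module purely as a stand-in so that it can be run anywhere; it is not Oracle syntax, and the path and is_cycle columns are names I made up to play the role of the cycle mark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client_box (box_id TEXT, item_id TEXT, sub_box_id TEXT)")
conn.executemany(
    "INSERT INTO client_box VALUES (?, ?, ?)",
    [
        ("BoxA", "Item1", None),
        ("BoxA", None, "BoxB"),
        ("BoxA", None, "BoxC"),
        ("BoxB", None, "BoxA"),
        ("BoxB", None, "BoxD"),
    ],
)

# path holds the ancestors as '/BoxA/BoxB/'; is_cycle is 1 when the new box
# is already on that path. Flagged rows are kept but never expanded further.
QUERY = """
WITH RECURSIVE r (box_id, path, is_cycle) AS (
    SELECT 'BoxA', '/BoxA/', 0
    UNION ALL
    SELECT c.sub_box_id,
           r.path || c.sub_box_id || '/',
           instr(r.path, '/' || c.sub_box_id || '/') > 0
    FROM   client_box c JOIN r ON c.box_id = r.box_id
    WHERE  c.sub_box_id IS NOT NULL
    AND    r.is_cycle = 0
)
SELECT box_id FROM r WHERE is_cycle = 0 ORDER BY box_id
"""

boxes = [row[0] for row in conn.execute(QUERY)]
print(boxes)  # ['BoxA', 'BoxB', 'BoxC', 'BoxD']
```

Note that both this sketch and the CYCLE clause only look at the ancestors of a row. A box that can be reached through two different parents would still be listed twice, so add DISTINCT to the final SELECT if that can happen in your data.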
