Why Does Connect by Level on a Table Return Extra Rows

Why does CONNECT BY LEVEL on a table return extra rows?

In the first query, you connect by just the level.
So if level <= 1, you get each of the records 1 time. If level <= 2, you get each record 1 time (at level 1) plus N times (at level 2), where N is the number of records in the table. It is like a cross join, because you're picking all records from the table again at every level until the level limit is reached, with no other condition to limit the result. For level <= 3, this is done again for each of those results.

So for 3 records:

  • Lvl 1: 3 records (all having level 1)
  • Lvl 2: 3 records having level 1 + 3*3 records having level 2 = 12
  • Lvl 3: 3 + 3*3 + 3*3*3 = 39 (indeed, 13 records each).
  • Lvl 4: starting to see a pattern? :)

It's not exactly a cross join, though. A cross join would return only the records that have level 2 in this query result, whereas this connect by returns the records having level 1 as well as those having level 2, giving 3 + 3*3 records instead of just 3*3.
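The arithmetic above can be checked with a small simulation. This is a minimal Python sketch rather than Oracle SQL, so that it runs without a database; the function name connect_by_level and the record labels r1 to r3 are made up for illustration. It builds the result level by level as described: every record is a root, and with no PRIOR condition every record becomes a child of every row on the previous level.

```python
def connect_by_level(rows, max_level):
    """Simulate CONNECT BY LEVEL <= max_level with no START WITH
    and no PRIOR condition. Returns a list of (level, row) pairs."""
    result = []
    current = [(1, row) for row in rows]  # every row is a root at level 1
    while current:
        result.extend(current)
        level = current[0][0]
        if level >= max_level:
            break
        # No PRIOR condition: each row is a child of every row above it.
        current = [(level + 1, child) for _parent in current for child in rows]
    return result


records = ["r1", "r2", "r3"]  # hypothetical 3-record table
for limit in (1, 2, 3):
    out = connect_by_level(records, limit)
    times_r1 = sum(1 for _level, row in out if row == "r1")
    print(limit, len(out), times_r1)
```

For limits 1, 2 and 3 this gives 3, 12 and 39 rows in total, with r1 appearing 1, 4 and 13 times, matching the list above.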

Oracle SQL: CONNECT BY LEVEL returning many rows

The connect by condition is based only on the time, so every row for route A is connected to every row for route B and vice versa.

The simple fix seems to be to make it:

CONNECT BY (LEVEL - 1) <= (END_TIME - START_TIME) / TIME_PERIOD
AND ROUTE_NAME = PRIOR ROUTE_NAME

to restrict it to a single route at a time; but that then forms a loop, so you also need to add a non-deterministic function call to prevent that; for example:

CONNECT BY (LEVEL - 1) <= (END_TIME - START_TIME) / TIME_PERIOD
AND ROUTE_NAME = PRIOR ROUTE_NAME
AND PRIOR DBMS_RANDOM.VALUE() IS NOT NULL

which gets:

ROUTE_NAME OUTPUT_MOMENT      
---------- -------------------
ROUTE A 2018-03-09 05:00:08
ROUTE A 2018-03-09 06:00:08
ROUTE A 2018-03-09 07:00:08
ROUTE A 2018-03-09 08:00:08
ROUTE A 2018-03-09 09:00:08
ROUTE A 2018-03-09 10:00:08
ROUTE A 2018-03-09 11:00:08
ROUTE A 2018-03-09 12:00:08
ROUTE A 2018-03-09 13:00:08
ROUTE A 2018-03-09 14:00:08
ROUTE A 2018-03-09 15:00:08
ROUTE B 2018-03-09 05:00:08
ROUTE B 2018-03-09 05:30:08
ROUTE B 2018-03-09 06:00:08
ROUTE B 2018-03-09 06:30:08
ROUTE B 2018-03-09 07:00:08
ROUTE B 2018-03-09 07:30:08
ROUTE B 2018-03-09 08:00:08
ROUTE B 2018-03-09 08:30:08
ROUTE B 2018-03-09 09:00:08
ROUTE B 2018-03-09 09:30:08
ROUTE B 2018-03-09 10:00:08
ROUTE B 2018-03-09 10:30:08
ROUTE B 2018-03-09 11:00:08
ROUTE B 2018-03-09 11:30:08
ROUTE B 2018-03-09 12:00:08
ROUTE B 2018-03-09 12:30:08
ROUTE B 2018-03-09 13:00:08
ROUTE B 2018-03-09 13:30:08
ROUTE B 2018-03-09 14:00:08
ROUTE B 2018-03-09 14:30:08
ROUTE B 2018-03-09 15:00:08
ROUTE B 2018-03-09 15:30:08
ROUTE B 2018-03-09 16:00:08

34 rows selected.
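To see why the ROUTE_NAME = PRIOR ROUTE_NAME condition matters, here is a rough Python simulation of the two variants. It is only a sketch under assumptions: the function name expand, the 2-hour span and the 60/30-minute periods are invented for illustration, and whole minutes are used so that no rounding comes into play. It has no cycle detection, so the PRIOR DBMS_RANDOM.VALUE() guard has no counterpart here.

```python
def expand(routes, span_minutes, same_route_only):
    """routes: list of (route_name, period_minutes).
    Returns the (route_name, level) rows a CONNECT BY would generate."""
    result = []
    current = [(name, 1) for name, _period in routes]  # every row is a root
    while current:
        result.extend(current)
        next_rows = []
        for prior_name, prior_level in current:
            level = prior_level + 1
            for name, period in routes:
                # (LEVEL - 1) <= (END_TIME - START_TIME) / TIME_PERIOD
                if (level - 1) * period > span_minutes:
                    continue
                # AND ROUTE_NAME = PRIOR ROUTE_NAME
                if same_route_only and name != prior_name:
                    continue
                next_rows.append((name, level))
        current = next_rows
    return result


routes = [("ROUTE A", 60), ("ROUTE B", 30)]  # hypothetical periods in minutes
print(len(expand(routes, 120, same_route_only=False)))  # time condition only
print(len(expand(routes, 120, same_route_only=True)))   # restricted per route
```

With the time condition alone this produces 30 rows; with the route restriction it produces 8 (3 for route A and 5 for route B), one per output moment.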

You could also do two connect by queries and union the results together, possibly pulling the time range into a CTE to avoid duplicating that:

WITH START_END AS (
SELECT SYSDATE - 8 / 24 AS START_TIME,
SYSDATE + 3 / 24 AS END_TIME
FROM DUAL
)
SELECT 'ROUTE A' ROUTE_NAME,
START_TIME + (LEVEL - 1) / 24 AS OUTPUT_MOMENT
FROM START_END
CONNECT BY (LEVEL - 1) <= (END_TIME - START_TIME) / (1 / 24)
UNION ALL
SELECT 'ROUTE B' ROUTE_NAME,
START_TIME + (LEVEL - 1) / 48 AS OUTPUT_MOMENT
FROM START_END
CONNECT BY (LEVEL - 1) <= (END_TIME - START_TIME) / (1 / 48)

Using / (1 / 24) looks odd when you could instead use * 24, but you actually get a slightly different result because of rounding errors; with * 24 you get an extra row for route A. You could rearrange the logic further to avoid that confusion, though.

ORACLE CONNECT BY LEVEL Producing Duplicate rows

Currently, your CONNECT BY only limits the hierarchical level, and doesn't provide any condition for matching child rows to parent rows. This means that in a table with multiple rows, every row is a child of every row (itself included). This is going to produce a massive result set.

If I understand correctly, you are trying to use the hierarchical functionality to pull multiple values from each individual row. So you really want each row to be parent and child to itself. I suggest trying:

CONNECT BY id = PRIOR id
AND prior sys_guid() is not null
AND level <= regexp_count(VALUE,CHR(10)||CHR(13))

Thanks to @kfinity for pointing out the need for the sys_guid() to prevent a CONNECT BY LOOP.

Oracle Connect By seems to produce too many rows

With no condition other than "level <= 4", every row from the original table, view etc. (from the join, in this case, which evidently returns two rows) will produce two rows at level 2, then four more rows at level 3, and eight more at level 4. "Connect by" is essentially a succession of joins, and you are doing cross joins if you have no condition with the PRIOR operator.

You probably want to add "and prior a.id = a.id". This will lead to Oracle complaining about cycles (because Oracle decides a cycle is reached when it sees the same values in the columns subject to PRIOR). That, in turn, is solved by adding a third condition, usually "and prior sys_guid() is not null".

(Edited; the original answer made reference to NOCYCLE, which is not needed when using the "prior sys_guid() is not null" approach.)

This has been discussed recently on OTN: https://community.oracle.com/thread/3999985

Same question discussed here: https://community.oracle.com/thread/2526535

Duplicate rows using Connect by level

Oracle Setup:

CREATE TABLE My_SQL_table ( Site_NUM, start_week, end_week ) AS
SELECT 'France', 50, 52 FROM DUAL UNION ALL
SELECT 'Germany', 41, 43 FROM DUAL UNION ALL
SELECT 'USA', 12, 13 FROM DUAL;

Query: Using CONNECT BY

SELECT site_num,
COLUMN_VALUE wks_inbtwn
FROM My_SQL_table tbl1
CROSS JOIN
TABLE(
CAST(
MULTISET(
SELECT tbl1.START_WEEK + LEVEL
FROM DUAL
CONNECT BY tbl1.START_WEEK + LEVEL <= tbl1.END_WEEK
)
AS SYS.ODCINUMBERLIST
)
)

Output:


SITE_NUM | WKS_INBTWN
:------- | ---------:
France | 51
France | 52
Germany | 42
Germany | 43
USA | 13

Query 2: Using a recursive sub-query factoring clause

WITH rsqfc ( site_num, start_week, end_week ) AS (
SELECT site_num, start_week + 1, end_week
FROM my_sql_table
UNION ALL
SELECT site_num, start_week + 1, end_week
FROM rsqfc
WHERE start_week < end_week
)
SELECT site_num, start_week AS wks_inbtwn
FROM rsqfc
ORDER BY site_num, wks_inbtwn

Output:


SITE_NUM | WKS_INBTWN
:------- | ---------:
France | 51
France | 52
Germany | 42
Germany | 43
USA | 13
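The recursive query can be tried without an Oracle instance. The following is a sketch that uses Python's built-in sqlite3 module as a stand-in; that substitution is an assumption of this example, not part of the original answer. SQLite spells the clause WITH RECURSIVE, whereas the Oracle query above has no RECURSIVE keyword, and the table is created with plain INSERTs instead of CREATE TABLE ... AS SELECT ... FROM DUAL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_sql_table (site_num TEXT, start_week INTEGER, end_week INTEGER)"
)
conn.executemany(
    "INSERT INTO my_sql_table VALUES (?, ?, ?)",
    [("France", 50, 52), ("Germany", 41, 43), ("USA", 12, 13)],
)

# Same logic as Query 2: the anchor member emits start_week + 1, and the
# recursive member keeps adding 1 while the previous value is below end_week.
rows = conn.execute(
    """
    WITH RECURSIVE rsqfc (site_num, start_week, end_week) AS (
        SELECT site_num, start_week + 1, end_week
        FROM my_sql_table
        UNION ALL
        SELECT site_num, start_week + 1, end_week
        FROM rsqfc
        WHERE start_week < end_week
    )
    SELECT site_num, start_week AS wks_inbtwn
    FROM rsqfc
    ORDER BY site_num, wks_inbtwn
    """
).fetchall()

for site_num, wks_inbtwn in rows:
    print(site_num, wks_inbtwn)
```

It returns the same five rows as the output shown above.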


Why my hierarchy query is showing duplicate records?

You are not understanding how CONNECT BY works. Here is a walkthrough of how Oracle is evaluating your 2nd query.

Without a START WITH clause, every row in your table will be used as a starting point, or "root" in your hierarchy.

Since you have no CONNECT BY conditions (like "columnA = PRIOR columnB"), every row in your table will be considered a child of every row (itself included). This repeats at each level until your LEVEL <= 4 limit is reached.

So,

LEVEL 1
--------
SNO 1
SNO 2

Explanation: Each row in your table is a starting point of its own hierarchy (because you have no START WITH conditions).

LEVEL 2
--------
SNO 1 -> SNO 1
SNO 1 -> SNO 2
SNO 2 -> SNO 1
SNO 2 -> SNO 2

Explanation of those 4 rows -- both SNO 1 and SNO 2 are roots, and for each root, SNO 1 and SNO 2 are children. So, 2x2 rows = 4 rows.

LEVEL 3 
-------
SNO 1 -> SNO 1 -> SNO 1
SNO 1 -> SNO 1 -> SNO 2
SNO 1 -> SNO 2 -> SNO 1
SNO 1 -> SNO 2 -> SNO 2
SNO 2 -> SNO 1 -> SNO 1
SNO 2 -> SNO 1 -> SNO 2
SNO 2 -> SNO 2 -> SNO 1
SNO 2 -> SNO 2 -> SNO 2

Explanation of those 8 rows -- starting with the 4 rows from level 2, both SNO 1 and SNO 2 are children of each, giving 4x2 = 8 rows at level 3.

Level 4, which I won't draw out, will similarly give 8x2 = 16 rows.

So, in total, you have 2 + 4 + 8 + 16 = 30 rows. (That's level 1 + level 2 + level 3 + level 4).

Then, after your CONNECT BY processing (shown above), the WHERE clause is applied, limiting your final results to rows where the value (at the lowest level of the hierarchy) is SNO = 1. That is exactly half of the 30 rows, or 15 rows, which is what you are getting.
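The counts in this walkthrough can be reproduced by enumerating the paths directly. This is an illustrative Python sketch, not what Oracle does internally; the names snos, paths and filtered are made up for the example. Each hierarchy row corresponds to one path from a root down to the row itself.

```python
from itertools import product

snos = [1, 2]   # the two rows in the table
max_level = 4   # LEVEL <= 4

# One hierarchy row per path: all level-1 paths, then level-2 paths, and so on.
paths = [
    path
    for level in range(1, max_level + 1)
    for path in product(snos, repeat=level)
]

# The WHERE clause is applied after the hierarchy is built and only looks at
# the row itself, which is the last element of the path.
filtered = [path for path in paths if path[-1] == 1]

print(len(paths))     # 2 + 4 + 8 + 16 = 30
print(len(filtered))  # 15
```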

Confusion with Oracle CONNECT BY

How a CONNECT BY query is executed and evaluated - step by step (by example).

Say we have the following table and a connect by query:

select * from mytable;

X
----------
1
2
3
4

SELECT level, m.*
FROM mytable m
START with x = 1
CONNECT BY PRIOR x +1 = x OR PRIOR x + 2 = x
ORDER BY level;

Step 1

Select rows from table mytable that meet a START WITH condition, assign LEVEL = 1 to the returned result set:

CREATE TABLE step1 AS
SELECT 1 "LEVEL", X from mytable
WHERE x = 1;

SELECT * FROM step1;

LEVEL X
---------- ----------
1 1

Step 2

Increase level by 1:

LEVEL = LEVEL + 1

Join the result set returned in previous step with mytable using CONNECT BY conditions as the join conditions.

In this clause PRIOR column-name refers to the resultset returned by previous step, and simple column-name refers to the mytable table:

CREATE TABLE step2 AS
SELECT 2 "LEVEL", mytable.X from mytable
JOIN step1 "PRIOR"
ON "PRIOR".x +1 = mytable.x or "PRIOR".x + 2 = mytable.x;

select * from step2;

LEVEL X
---------- ----------
2 2
2 3

Step x+1

Repeat step 2 until the last operation returns an empty result set.

Step 3

CREATE TABLE step3 AS
SELECT 3 "LEVEL", mytable.X from mytable
JOIN step2 "PRIOR"
ON "PRIOR".x +1 = mytable.x or "PRIOR".x + 2 = mytable.x;

select * from step3;

LEVEL X
---------- ----------
3 3
3 4
3 4

Step 4

CREATE TABLE step4 AS
SELECT 4 "LEVEL", mytable.X from mytable
JOIN step3 "PRIOR"
ON "PRIOR".x +1 = mytable.x or "PRIOR".x + 2 = mytable.x;

select * from step4;

LEVEL X
---------- ----------
4 4

Step 5

CREATE TABLE step5 AS
SELECT 5 "LEVEL", mytable.X from mytable
JOIN step4 "PRIOR"
ON "PRIOR".x +1 = mytable.x or "PRIOR".x + 2 = mytable.x;

select * from step5;

no rows selected

Step 5 returned no rows, so now we finalize the query.

Last step

UNION ALL results of all steps and return it as the final result:

SELECT * FROM step1
UNION ALL
SELECT * FROM step2
UNION ALL
SELECT * FROM step3
UNION ALL
SELECT * FROM step4
UNION ALL
SELECT * FROM step5;

LEVEL X
---------- ----------
1 1
2 2
2 3
3 3
3 4
3 4
4 4
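The whole procedure (start rows, repeated join against the previous step, union of all steps) fits in a few lines of Python. This is a sketch of the evaluation model described above, not of Oracle's actual implementation; connect_by, start_with and connect are hypothetical names. Like the steps above it has no cycle detection, so it would not terminate on data that loops.

```python
def connect_by(table, start_with, connect):
    """Evaluate a hierarchical query level by level.

    start_with(row)     -> True for root rows (the START WITH condition)
    connect(prior, row) -> True when row is a child of prior (CONNECT BY)
    Returns (level, row) pairs, grouped by level.
    """
    steps = []
    current = [row for row in table if start_with(row)]  # step 1
    while current:  # stop when a step returns no rows
        steps.append(current)
        # Join the previous step ("PRIOR") with the table.
        current = [row for prior in current for row in table if connect(prior, row)]
    # UNION ALL of all steps, tagging each row with its level.
    return [(level, row) for level, rows in enumerate(steps, start=1) for row in rows]


mytable = [1, 2, 3, 4]
result = connect_by(
    mytable,
    start_with=lambda x: x == 1,
    connect=lambda prior, x: prior + 1 == x or prior + 2 == x,
)
for level, x in result:
    print(level, x)
```

The printed rows are the same seven (level, x) pairs as in the final result above.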

Now let's apply the above procedure to your query:

SELECT * FROM dual;

DUMMY
-----
X

SELECT LEVEL FROM DUAL CONNECT BY rownum>5;

Step 1

Since the query does not contain the START WITH clause, Oracle selects all records from the source table:

CREATE TABLE step1 AS
SELECT 1 "LEVEL" FROM dual;

select * from step1;

LEVEL
----------
1

Step 2

CREATE TABLE step2 AS
SELECT 2 "LEVEL" from dual
JOIN step1 "PRIOR"
ON rownum > 5;

select * from step2;

no rows selected

Since the last step returned no rows, we are going to finalize our query.

Last step

SELECT * FROM step1
UNION ALL
SELECT * FROM step2;

LEVEL
----------
1

The analysis of the last query:

select level from dual connect by rownum<10;

I leave to you as a homework assignment.

Inner join returning more rows than regular select

You have duplicate rows in feed_id_types. Run this to find which IDs are duplicated:

select
types.feed_type_id
from feed_id_types types
group by types.feed_type_id
having count(*) > 1

The IN() clause only checks whether a match exists, so duplicates in feed_id_types make no difference to it. The inner join matches each row from daily_run to every matching row in feed_id_types and returns one row per match, which creates the extra results.
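The difference is easy to reproduce on a toy data set. The sketch below uses Python's built-in sqlite3 module in place of the real database; the run_id and label columns and all of the sample values are invented for the example, and only the table names and feed_type_id come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE daily_run (run_id INTEGER, feed_type_id INTEGER);
    CREATE TABLE feed_id_types (feed_type_id INTEGER, label TEXT);
    INSERT INTO daily_run VALUES (1, 10), (2, 20);
    -- feed_type_id 10 is present twice
    INSERT INTO feed_id_types VALUES (10, 'a'), (10, 'a again'), (20, 'b');
    """
)

in_rows = conn.execute(
    """
    SELECT run_id FROM daily_run
    WHERE feed_type_id IN (SELECT feed_type_id FROM feed_id_types)
    ORDER BY run_id
    """
).fetchall()

join_rows = conn.execute(
    """
    SELECT d.run_id
    FROM daily_run d
    INNER JOIN feed_id_types t ON t.feed_type_id = d.feed_type_id
    ORDER BY d.run_id
    """
).fetchall()

duplicated = conn.execute(
    """
    SELECT feed_type_id FROM feed_id_types
    GROUP BY feed_type_id
    HAVING COUNT(*) > 1
    """
).fetchall()

print(len(in_rows), len(join_rows), duplicated)
```

Here the IN() version returns 2 rows, the join returns 3 because run 1 matches both copies of feed_type_id 10, and the GROUP BY / HAVING check reports 10 as the duplicated ID.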


