Detect Duplicate Items in Recursive Cte

Detect duplicate items in recursive CTE

You can use the connectby function which exists in the tablefunc module.

First you need to enable the module

CREATE EXTENSION tablefunc;

Then you can use the connectby function (based on the sample table you provided in the question it will as follows):

SELECT distinct id
FROM connectby('objectdependencies', 'id', 'dependson', '4', 0)
AS t(id int, dependson int, level int)
where id != 4;

This will return:
1
2
3

Here is an explanation of the parameters from documentation:

connectby(text relname, text keyid_fld, text parent_keyid_fld
[, text orderby_fld ], text start_with, int max_depth
[, text branch_delim ])
  • relname Name of the source relation
  • keyid_fld Name of the key field
  • parent_keyid_fld Name of the parent-key field
  • orderby_fld Name of the field to order siblings by (optional)
  • start_with Key value of the row to start at
  • max_depth Maximum depth to descend to, or zero for unlimited depth
  • branch_delim String to separate keys with in branch output (optional)

please consult the documentation for more information.
https://www.postgresql.org/docs/9.5/static/tablefunc.html

How does a recursive CTE eliminate duplicates?

The difference is that when you populate your #orgchart2, you are including all the rows from #orgchart. So now when you create #orgchart3 (which represents a 3rd level of recursion), you are joining on the rows from #orgchart as well as #orgchart2.

So when you create the third level in #orgchart3, it is related to rows in both #orgchart and #orgchart2, when it should only be related to #orgchart2. Instead your third level includes rows that are one level beyond the 2nd level, but also one level beyond the anchor level, so you are duplicating rows, since you already have rows in the second level that are one level beyond the anchor level.

The optimizer knows not to do that with recursive CTEs. Each level of recursion only looks at the previous one and ignores all the ones that came before it. So no duplicates are created.

You would simulate what the optimizer does if you left out the top half of the UNION ALL when you populated #orgchart2 and #orgchart3, and then finally produced a single UNION ALL of all three temp tables.

Why does a recursive cte not return duplicates?

Don't think of recursive CTEs as actually being "recursive". They are inductive.

The anchor portion of the CTE runs. Then the "recursive" part keeps adding rows -- based on the most recently added rows. So, the first iteration adds the first date. The second iteration takes the just-added date and adds one to it.

The third iteration only considers the second (last) date, not both.

The terminology is not great, but they do not generate duplicates by reprocessing a given row more than once.

Ensuring uniqueness in recursive query

-- Data
CREATE TABLE node
( id integer NOT NULL PRIMARY KEY
, parentid integer REFERENCES node(id)
);

INSERT INTO node(id,parentid) VALUES
(5, NULL)
, (9,5), (10,5), (11,5)
, (15,9), (16,10), (17,11)
;

-- query
WITH RECURSIVE tree AS (
SELECT id, 0 AS depth
FROM node WHERE id IN (5, 9, 15)
UNION
SELECT node.id, tree.depth + 1
FROM tree JOIN node ON node.parentid = tree.id
)
SELECT *
FROM tree tr
WHERE NOT EXISTS ( -- trivial way to suppress duplicates with longer path
SELECT *
FROM tree nx
WHERE nx.id = tr.id
AND nx.depth < tr.depth
)
ORDER BY id
;

UPDATE: This looks less costly. It is correct for the given data (but not in the general case IIUC):

WITH RECURSIVE tree AS (
SELECT id, 0 AS depth
FROM node WHERE id IN (5, 9, 15)
UNION
SELECT node.id, tree.depth + 1
FROM tree JOIN node ON node.parentid = tree.id
WHERE node.id NOT IN (5, 9, 15)
)
SELECT *
FROM tree tr
ORDER BY id
;

Prevent recursive CTE visiting nodes multiple times

The CTE's are recursive.

When your CTE's have multiple initial conditions, that means they also have different recursion stacks, and there is no way to use information from one stack in another stack.

In your example, the recursion stacks will go as follows:

(1) - first IN condition
(1, 2)
(1, 2, 3)
(1, 2, 3, 4)
(1, 2, 3) - no more children
(1, 2) - no more children
(1) - no more children, going to second IN condition

(3) - second condition
(3, 4)
(3) - no more children, returning

As you can see, these recursion stack do not intersect.

You could probably record the visited values in a temporary table, JOIN each value with the temptable and do not follow this value it if it's found, but SQL Server does not support these things.

So you just use SELECT DISTINCT.

Recursive CTE to find all ancestors OF ALL ITEMS

Assuming I understand you right, it should be as simple as recursing backwards from the leaf nodes (which is easy since the table Items is storing only the leaf nodes):

;with AncestryTree as (
select Item, Parent
from Items
where Parent is not null
union all
select Items.Item, t.Parent
from AncestryTree t
join Items on t.Item = Items.Parent
)
select * from AncestryTree
order by Item, Parent

SQL Fiddle demo



Related Topics



Leave a reply



Submit