Visiting a Directed Graph as If It Were an Undirected One, Using a Recursive Query

Visiting a directed graph as if it were an undirected one, using a recursive query

Could work like this:

WITH RECURSIVE graph AS (
SELECT parent
,child
,',' || parent::text || ',' || child::text || ',' AS path
,0 AS depth
FROM ownership
WHERE parent = 1

UNION ALL
SELECT o.parent
,o.child
,g.path || o.child || ','
,g.depth + 1
FROM graph g
JOIN ownership o ON o.parent = g.child
WHERE g.path !~~ ('%,' || o.parent::text || ',' || o.child::text || ',%')
)
SELECT *
FROM graph

You mentioned performance, so I optimized in that direction.

Major points:

  • Traverse the graph only in the defined direction.

  • No need for a column cycle, make it an exclusion condition instead. One less step to go. That is also the direct answer to:

How can I do to stop cycles one step before the node that closes the
cycle?

  • Use a string to record the path. Smaller and faster than an array of rows. Still contains all necessary information. Might change with very big bigint numbers, though.

  • Check for cycles with the LIKE operator (~~), should be much faster.

  • If you don't expect more that 2147483647 rows over the course of time, use plain integer columns instead of bigint. Smaller and faster.

  • Be sure to have an index on parent. Index on child is irrelevant for my query. (Other than in your original where you traverse edges in both directions.)

  • For huge graphs I would switch to a plpgsql procedure, where you can maintain the path as temp table with one row per step and a matching index. A bit of an overhead, that will pay off with huge graphs, though.


Problems in your original query:

WHERE (g.parent = o.child or g.child = o.parent) 

There is only one endpoint of your traversal at any point in the process. As you wlak the directed graph in both directions, the endpoint can be either parent or child - but not both of them. You have to save the endpoint of every step, and then:

WHERE g.child IN (o.parent, o.child) 

The violation of the direction also makes your starting condition questionable:

WHERE parent = 1

Would have to be

WHERE 1 IN (parent, child)

And the two rows (1,2) and (2,1) are effectively duplicates this way ...


Additional solution after comment

  • Ignore direction
  • Still walk any edge only once per path.
  • Use ARRAY for path
  • Save original direction in path, not actual direction.

Note, that this way (2,1) and (1,2) are effective duplicates, but both can be used in the same path.

I introduce the column leaf which saves the actual endpoint of every step.

WITH RECURSIVE graph AS (
SELECT CASE WHEN parent = 1 THEN child ELSE parent END AS leaf
,ARRAY[ROW(parent, child)] AS path
,0 AS depth
FROM ownership
WHERE 1 in (child, parent)

UNION ALL
SELECT CASE WHEN o.parent = g.leaf THEN o.child ELSE o.parent END -- AS leaf
,path || ROW(o.parent, o.child) -- AS path
,depth + 1 -- AS depth
FROM graph g
JOIN ownership o ON g.leaf in (o.parent, o.child)
AND ROW(o.parent, o.child) <> ALL(path)
)
SELECT *
FROM graph

Directed graph in Oracle SQL using recursive query visiting each node only once

For keeping the traversing algorithm out of returning to already visited edges, one can indeed keep the visited edges somewhere. As you already found out, you won't get much of a success with a string concatenation. However, there are other usable "value concatenation" techniques available...

You have to have one handy schema-level collection of scalars at your disposal:

create or replace type arr_strings is table of varchar2(64);

And then you can collect the visited edges to that collection in each iteration:

with nondirected$ as (
select from_id, to_id, from_id||'-'||to_id as edge_desc
from edge
where from_id != to_id
union all
select to_id, from_id, from_id||'-'||to_id as edge_desc
from edge
where (to_id, from_id) not in (
select from_id, to_id
from edge
)
),
graph$(lvl, from_id, to_id, edge_desc, visited_edges) as (
select 1, from_id, to_id, edge_desc,
arr_strings(edge_desc)
from nondirected$ R
where from_id in (&nodes)
--
union all
--
select
lvl+1,
Y.from_id, Y.to_id, Y.edge_desc,
X.visited_edges multiset union arr_strings(Y.edge_desc)
from graph$ X
join nondirected$ Y
on Y.from_id = X.to_id
where not exists (
select 1
from table(X.visited_edges) Z
where Y.edge_desc = Z.column_value
)
)
search breadth first by edge_desc set order_id
cycle edge_desc set is_cycle to 1 default 0,
ranked_graph$ as (
select C.*,
row_number() over (partition by edge_desc order by lvl, order_id) as rank$
from graph$ C
-- where is_cycle = 0
)
select *
from ranked_graph$
--where rank$ <= 1
order by lvl, order_id
;

Notes

  1. I pre-processed the directed graph to a nondirected one by union-ing a set of reverse edges to the input. That should make the recursive traversal predicates easier to read. Solely for my purposes of easier reading+writing of the SQL. You don't have to do that, of course.
  2. I remember trying something like this a few years ago on Oracle 11.2. And I remember that it was failing, though I don't remember why. On 12.2, it ran OK. Just give it a try on 11g, too; I don't have one available.
  3. Since each iteration does, in addition to the traversal inner join, also an anti-join, I sincerely doubt that this is going to be more performant. However, it for sure solves the problem of lowering the number of recursive nestings.
  4. You'll have to resolve the desired ordering on your own, as you probably understood from my comments. :-)

Limiting the revisited edges to zero

In SQL, you can't. The PostgreSQL solution you mentioned does do it. In Oracle, however, you can't. You would have to, for each traversal join, test rows for all other traversal joins. And that would mean some kind of aggregation or analytics... which Oracle forbids and throws out an ORA exception.

PLSQL to the rescue?

You can do it in PL/SQL, though. How much performant it is supposed to be, depends on how much memory you want to spend on prefetching edges from DB vs. how many SQL roundtrips you are willing to take to traverse the graph from "current" nodes or if you are willing to use even more memory to keep the visited nodes in a fancy indexed-by-edge collection vs. if you rather anti-join against a regular arr_output collection l_visited_nodes. You have multiple choices, choose wisely.

Anyway, for the simplest scenario with heavier use of SQL engine, this might be the code you're looking for...

create or replace
package pkg_so_recursive_traversal
is

type rec_output is record (
from_id edge.from_id%type,
to_id edge.to_id%type,
lvl integer
);
type arr_output is table of rec_output;

function traverse_a_graph
( i_from in arr_strings
, i_is_directed in varchar2 default 'NO' )
return arr_output
pipelined;

end pkg_so_recursive_traversal;
/
create or replace
package body pkg_so_recursive_traversal
is

function traverse_a_graph
( i_from in arr_strings
, i_is_directed in varchar2 )
return arr_output
pipelined
is
l_next_edges arr_output;
l_current_edges arr_output;
l_visited_edges arr_output := arr_output();
l_out rec_output;
i pls_integer;
l_is_directed varchar2(32) := case when i_is_directed = 'YES' then 'YES' else 'NO' end;
begin
select E.from_id, E.to_id, 0
bulk collect into l_next_edges
from table(i_from) F
join edge E
on F.column_value in (E.from_id, case when l_is_directed = 'YES' then null else E.to_id end)
where E.from_id != E.to_id;

l_out.lvl := 0;

loop
dbms_output.put_line(l_next_edges.count());
exit when l_next_edges.count() <= 0;
l_out.lvl := l_out.lvl + 1;

-- spool the edges to output
i := l_next_edges.first();
while i is not null loop
l_out.from_id := l_next_edges(i).from_id;
l_out.to_id := l_next_edges(i).to_id;
pipe row(l_out);
i := l_next_edges.next(i);
end loop;

l_current_edges := l_next_edges;
l_visited_edges := l_visited_edges multiset union l_current_edges;

-- find next edges
select unique E.from_id, E.to_id, 0
bulk collect into l_next_edges
from table(l_current_edges) CE
join edge E
on CE.to_id in (E.from_id, case when l_is_directed = 'YES' then null else E.to_id end)
or l_is_directed = 'NO' and CE.from_id in (E.from_id, E.to_id)
where E.from_id != E.to_id
and not exists (
select 1
from table(l_visited_edges) VE
where VE.from_id = E.from_id
and VE.to_id = E.to_id
);
end loop;

return;
end;

end pkg_so_recursive_traversal;

/

When called for the starting node of A and considering the graph to be undirected...

select *
from table(pkg_so_recursive_traversal.traverse_a_graph(
i_from => arr_strings('A'),
i_is_directed => 'NO'
));

... it yields...

FROM_ID    TO_ID             LVL
---------- ---------- ----------
A B 1
C A 1
C E 2
B D 2
C F 2
E B 2
G D 3
H F 3
G I 4

Notes

  1. Again, I did not put any effort to keep the ordering you requested, as you said it's not that important.
  2. This is doing multiple (exactly 5 for your example inputs) SQL roundtrips to the edge table. That may or may not be a bigger performance hit when compared to the pure-SQL solution with redundant edge visiting. Test more solutions properly, see which one works for you the best.
  3. This particular piece of code will work on 12c and higher. For 11g and lower you'll have to declare the rec_output and arr_output types on the schema level.

WITH RECURSIVE query to choose the longest paths

You already have a solution at your fingertips with cycle, just add a predicate at the end.

But adjust your break condition by one level, currently you are appending one node too many:

WITH RECURSIVE search AS (
SELECT id, link, data, ARRAY[g.id] AS path, (link = id) AS cycle
FROM graph g
WHERE NOT EXISTS (
SELECT 1
FROM graph
WHERE link = g.id
)

UNION ALL
SELECT g.id, g.link, g.data, s.path || g.id, g.link = ANY(s.path)
FROM search s
JOIN graph g ON g.id = s.link
WHERE NOT s.cycle
)
SELECT *
FROM search
WHERE cycle;
-- WHERE cycle IS NOT FALSE; -- alternative if link can be NULL
  • Also including a start condition like mentioned by @wildplasser.

  • Init condition for cycle is (link = id) to catch shortcut cycles. Not necessary if you have a CHECK constraint to disallow that in your table.

  • The exact implementation depends on the missing details.

  • This is assuming all graphs are terminated with a cycle or link IS NULL and there is a FK constraint from link to id in the same table.
    The exact implementation depends on missing details. If link is not actually a link (no referential integrity), you need to adapt ...

PostgreSQL SQL query for traversing an entire undirected graph and returning all edges found

I got to this, it should not get into infinite loops with any kind of data:

--create temp table edges ("from" text, "to" text);
--insert into edges values ('initial_node', 'a'), ('a', 'b'), ('a', 'c'), ('c', 'd');

with recursive graph(points) as (
select array(select distinct "to" from edges where "from" = 'initial_node')
union all
select g.points || e1.p || e2.p
from graph g
left join lateral (
select array(
select distinct "to"
from edges
where "from" =any(g.points) and "to" <>all(g.points) and "to" <> 'initial_node') AS p) e1 on (true)
left join lateral (
select array(
select distinct "from"
from edges
where "to" =any(g.points) and "from" <>all(g.points) and "from" <> 'initial_node') AS p) e2 on (true)
where e1.p <> '{}' OR e2.p <> '{}'
)
select distinct unnest(points)
from graph
order by 1

Recursive queries are very limiting in terms of what can be selected, and since they don't allow using the recursive results inside a subselect, one can't use NOT IN (select * from recursive where...). Storing results in an array, using LEFT JOIN LATERAL and using =ANY() and <>ALL() solved this conundrum.

Counting distinct undirected edges in a directed graph in SQL

select count(*) from (
select to_there from edges where from_here = 1
union
select from_here from edges where to_there = 1
) as whatever


Related Topics



Leave a reply



Submit