Flatten Adjacency List Hierarchy to a List of All Paths

Flatten Adjacency List Hierarchy To A List Of All Paths

To do multi-level queries across a simple adjacency-list invariably involves self-left-joins. It's easy to make a right-aligned table:

SELECT category.category_id,
ancestor4.category_id AS lvl4,
ancestor3.category_id AS lvl3,
ancestor2.category_id AS lvl2,
ancestor1.category_id AS lvl1
FROM categories AS category
LEFT JOIN categories AS ancestor1 ON ancestor1.category_id=category.category_id
LEFT JOIN categories AS ancestor2 ON ancestor2.category_id=ancestor1.parent
LEFT JOIN categories AS ancestor3 ON ancestor3.category_id=ancestor2.parent
LEFT JOIN categories AS ancestor4 ON ancestor4.category_id=ancestor3.parent;

To left-align it like your example is a bit more tricky. This comes to mind:

SELECT category.category_id,
ancestor1.category_id AS lvl1,
ancestor2.category_id AS lvl2,
ancestor3.category_id AS lvl3,
ancestor4.category_id AS lvl4
FROM categories AS category
LEFT JOIN categories AS ancestor1 ON ancestor1.parent IS NULL
LEFT JOIN categories AS ancestor2 ON ancestor1.category_id<>category.category_id AND ancestor2.parent=ancestor1.category_id
LEFT JOIN categories AS ancestor3 ON ancestor2.category_id<>category.category_id AND ancestor3.parent=ancestor2.category_id
LEFT JOIN categories AS ancestor4 ON ancestor3.category_id<>category.category_id AND ancestor4.parent=ancestor3.category_id
WHERE
ancestor1.category_id=category.category_id OR
ancestor2.category_id=category.category_id OR
ancestor3.category_id=category.category_id OR
ancestor4.category_id=category.category_id;

would work for n-tier hierarchies.

Sorry, arbitrary-depth queries are not possible in the adjacency-list model. If you are doing this kind of query a lot, you should change your schema to one of the other models of storing hierarchical information: full adjacency relation (storing all ancestor-descendent relationships), materialised path, or nested sets.

If the categories don't move around a lot (which is usually the case for a store like your example), I would tend towards nested sets.

Query an adjacency list in SQL via a materialized path

Is there a way to limit the rCTE so that it only needs to evaluate the nodes in the hierarchy, instead of reconstructing the entire hierarchy each time?

Limiting the rCTE

There are a handful of approaches to limiting the scope of each recursive query so that it's only evaluating the relevant nodes in the hierarchy. A fairly efficient approach is to simply restrict the rCTE to records which the source path (let's call that @Path) starts with:

INNER JOIN  RCTE recursive
ON this.ParentId = recursive.Id
AND @Path LIKE CAST(recursive.Path + '/' + this.Name AS VARCHAR(MAX)) + '%'

This will limit the query to each record in your path:

Id    Path
1 /Root
2 /Root/FolderA
3 /Root/FolderA/SubfolderA

Which can then be easily filtered down to the final record based on a simple WHERE clause:

WHERE Path = @Path

Packaging it as a function

We can combine this with the original rCTE into a function. Putting it all together, it might look like:

CREATE
FUNCTION [dbo].[GetIdFromPath]
(
@Path VARCHAR(MAX)
)
RETURNS INT
AS

BEGIN

DECLARE @Id INT = -1

;WITH RCTE AS (

SELECT Id,
ParentId,
CAST('/' + Name AS VARCHAR(MAX)) AS Path
FROM [dbo].[Tree] root
WHERE root.Id = 1

UNION ALL

SELECT this.Id,
this.ParentId,
CAST(parent.Path + '/' + this.Name AS VARCHAR(MAX)) AS Path
FROM [dbo].[Tree] AS this
INNER JOIN RCTE parent
ON Tree.ParentId = parent.Id
AND @Path LIKE CAST(parent.Path + '/' + this.Name AS VARCHAR(MAX)) + '%'
)
SELECT @Id = Id
FROM RCTE as hierarchy
WHERE Path = @Path

RETURN @Id

END

Querying by path

Given the above function, you can now query the adjacency list by simply passing the full path to the GetIdFromPath() function:

SELECT          dbo.GetIdFromPath('/Root/FolderA/SubfolderA') AS Id

Which, given the sample data from the original post, will return 3.

Performance

I've tested this approach against a table of comparable size, with 2,500 sample records, and it consistently executes well under a second, which is a dramatic improvement over the naïve approach. Obviously, you'll need to evaluate this against your own database and performance requirements to determine if its efficient enough.

flatten tree structure into array

Here's a solution using flatMap() and recursion:

const flatten = (array) => array.flatMap(({ type, name, path, children }) => [
{ type, name, path },
...flatten(children || [])
]);

Complete snippet:

const data = [{
type: "folder",
name: "animals",
path: "/animals",
children: [{
type: "folder",
name: "cat",
path: "/animals/cat",
children: [{
type: "folder",
name: "images",
path: "/animals/cat/images",
children: [{
type: "file",
name: "cat001.jpg",
path: "/animals/cat/images/cat001.jpg"
}, {
type: "file",
name: "cat001.jpg",
path: "/animals/cat/images/cat002.jpg"
}
]
}]
}]
}];

const flatten = (array) => array.flatMap(({type, name, path, children}) => [
{ type, name, path },
...flatten(children || [])
]);

console.log(flatten(data));

Flatten an irregular (arbitrarily nested) list of lists

Using generator functions can make your example easier to read and improve performance.

Python 2

Using the Iterable ABC added in 2.6:

from collections import Iterable

def flatten(xs):
for x in xs:
if isinstance(x, Iterable) and not isinstance(x, basestring):
for item in flatten(x):
yield item
else:
yield x

Python 3

In Python 3, basestring is no more, but the tuple (str, bytes) gives the same effect. Also, the yield from operator returns an item from a generator one at a time.

from collections.abc import Iterable

def flatten(xs):
for x in xs:
if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
yield from flatten(x)
else:
yield x

How to flatten tree via LINQ?

You can flatten a tree like this:

IEnumerable<MyNode> Flatten(IEnumerable<MyNode> e) =>
e.SelectMany(c => Flatten(c.Elements)).Concat(new[] { e });

You can then filter by group using Where(...).

To earn some "points for style", convert Flatten to an extension function in a static class.

public static IEnumerable<MyNode> Flatten(this IEnumerable<MyNode> e) =>
e.SelectMany(c => c.Elements.Flatten()).Concat(e);

To earn more points for "even better style", convert Flatten to a generic extension method that takes a tree and a function that produces descendants from a node:

public static IEnumerable<T> Flatten<T>(
this IEnumerable<T> e
, Func<T,IEnumerable<T>> f
) => e.SelectMany(c => f(c).Flatten(f)).Concat(e);

Call this function like this:

IEnumerable<MyNode> tree = ....
var res = tree.Flatten(node => node.Elements);

If you would prefer flattening in pre-order rather than in post-order, switch around the sides of the Concat(...).



Related Topics



Leave a reply



Submit