Managing Hierarchies in SQL: Mptt/Nested Sets VS Adjacency Lists VS Storing Paths

Managing hierarchies in SQL: MPTT/nested sets vs adjacency lists vs storing paths

You might also consider the Closure Table design I describe in my answer to What is the most efficient/elegant way to parse a flat table into a tree?

Calls required to create/delete/move a node:

  • Closure = 1

Calls required to get a tree:

  • Closure = 1

Calls required to get path to a node / ancestry:

  • Closure = 1

Calls required to get number of subnodes:

  • Closure = 1

Calls required to get depth of node:

  • Closure = 1

DB fields required:

  • Adjancency = 1 more field / row
  • Path = 1 more field / row
  • MPTT = 2 or 3 more fields / row
  • Closure = 2 or 3 fields in extra table. This table has O(n^2) rows worst case but far fewer than that in most practical cases.

There are a couple of other considerations:

Supports unlimited depth:

  • Adjacency = yes
  • MPTT = yes
  • Path = no
  • Closure = yes

Supports referential integrity:

  • Adjacency = yes
  • MPTT = no
  • Path = no
  • Closure = yes

I also cover Closure Table in my presentation Models for Hierarchical Data with SQL and PHP, and my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.

Hierarchical Data Models: Adjacency List vs. Nested Sets

Nested sets are better for performance, if you don't need frequent updates or hierarchical ordering.

If you need either tree updates or hierarchical ordering, it's better to use parent-child data model.

It's easily constructed in Oracle and SQL Server 2005+, and not so easily (but still possible) in MySQL.

Doctrine nested set vs adjacency list

Read this article about Nested Set vs Adjacency List. You'll see that Nested Set makes the queries faaaaar easier to write.
Also read this § about the nested set hierarchy hydration method, and you'll figure out how to get several doctrine objects in a hierarchical form with a single query.

adjacency list vs mptt for designing access control list

In real life situations you use far more "read" operations than "write" operations. So, best thing is to go with the modified preorder tree traversal (MPTT) solution which is very elegant once you understand it. Here's a PHP class providing an implementation of the modified preorder tree traversal algorithm which is very well documented and easy to get going with. Also, on the page, you'll find links to read more about the algorithm.

Storing hierarchical (parent/child) data in Python/Django: MPTT alternative?

django-treebeard is another option. It has great documentation. I believe it meets all of your above requirements and includes some functions for checking the tree for problems and fixing those problems in the tree.

Node.find_problems() https://tabo.pe/projects/django-treebeard/docs/1.60/api.html#treebeard.models.Node.find_problems

Node.fix_tree() https://tabo.pe/projects/django-treebeard/docs/1.60/api.html#treebeard.models.Node.fix_tree

What is the most efficient/elegant way to parse a flat table into a tree?

Now that MySQL 8.0 supports recursive queries, we can say that all popular SQL databases support recursive queries in standard syntax.

WITH RECURSIVE MyTree AS (
SELECT * FROM MyTable WHERE ParentId IS NULL
UNION ALL
SELECT m.* FROM MyTABLE AS m JOIN MyTree AS t ON m.ParentId = t.Id
)
SELECT * FROM MyTree;

I tested recursive queries in MySQL 8.0 in my presentation Recursive Query Throwdown in 2017.

Below is my original answer from 2008:


There are several ways to store tree-structured data in a relational database. What you show in your example uses two methods:

  • Adjacency List (the "parent" column) and
  • Path Enumeration (the dotted-numbers in your name column).

Another solution is called Nested Sets, and it can be stored in the same table too. Read "Trees and Hierarchies in SQL for Smarties" by Joe Celko for a lot more information on these designs.

I usually prefer a design called Closure Table (aka "Adjacency Relation") for storing tree-structured data. It requires another table, but then querying trees is pretty easy.

I cover Closure Table in my presentation Models for Hierarchical Data with SQL and PHP and in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.

CREATE TABLE ClosureTable (
ancestor_id INT NOT NULL REFERENCES FlatTable(id),
descendant_id INT NOT NULL REFERENCES FlatTable(id),
PRIMARY KEY (ancestor_id, descendant_id)
);

Store all paths in the Closure Table, where there is a direct ancestry from one node to another. Include a row for each node to reference itself. For example, using the data set you showed in your question:

INSERT INTO ClosureTable (ancestor_id, descendant_id) VALUES
(1,1), (1,2), (1,4), (1,6),
(2,2), (2,4),
(3,3), (3,5),
(4,4),
(5,5),
(6,6);

Now you can get a tree starting at node 1 like this:

SELECT f.* 
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1;

The output (in MySQL client) looks like the following:

+----+
| id |
+----+
| 1 |
| 2 |
| 4 |
| 6 |
+----+

In other words, nodes 3 and 5 are excluded, because they're part of a separate hierarchy, not descending from node 1.


Re: comment from e-satis about immediate children (or immediate parent). You can add a "path_length" column to the ClosureTable to make it easier to query specifically for an immediate child or parent (or any other distance).

INSERT INTO ClosureTable (ancestor_id, descendant_id, path_length) VALUES
(1,1,0), (1,2,1), (1,4,2), (1,6,1),
(2,2,0), (2,4,1),
(3,3,0), (3,5,1),
(4,4,0),
(5,5,0),
(6,6,0);

Then you can add a term in your search for querying the immediate children of a given node. These are descendants whose path_length is 1.

SELECT f.* 
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
AND path_length = 1;

+----+
| id |
+----+
| 2 |
| 6 |
+----+

Re comment from @ashraf: "How about sorting the whole tree [by name]?"

Here's an example query to return all nodes that are descendants of node 1, join them to the FlatTable that contains other node attributes such as name, and sort by the name.

SELECT f.name
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
ORDER BY f.name;

Re comment from @Nate:

SELECT f.name, GROUP_CONCAT(b.ancestor_id order by b.path_length desc) AS breadcrumbs
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
JOIN ClosureTable b ON (b.descendant_id = a.descendant_id)
WHERE a.ancestor_id = 1
GROUP BY a.descendant_id
ORDER BY f.name

+------------+-------------+
| name | breadcrumbs |
+------------+-------------+
| Node 1 | 1 |
| Node 1.1 | 1,2 |
| Node 1.1.1 | 1,2,4 |
| Node 1.2 | 1,6 |
+------------+-------------+

A user suggested an edit today. SO moderators approved the edit, but I am reversing it.

The edit suggested that the ORDER BY in the last query above should be ORDER BY b.path_length, f.name, presumably to make sure the ordering matches the hierarchy. But this doesn't work, because it would order "Node 1.1.1" after "Node 1.2".

If you want the ordering to match the hierarchy in a sensible way, that is possible, but not simply by ordering by the path length. For example, see my answer to MySQL Closure Table hierarchical database - How to pull information out in the correct order.

How to repair a corrupted MPTT tree (nested set) in the database using SQL?

Using SQL Server, following script seems to be working for me.

Output testscript

category_id name                 parent      lft         rgt         lftcalc     rgtcalc
----------- -------------------- ----------- ----------- ----------- ----------- -----------
1 ELECTRONICS NULL 1 20 1 20
2 TELEVISIONS 1 2 9 2 9
3 TUBE 2 3 4 3 4
4 LCD 2 5 6 5 6
5 PLASMA 2 7 8 7 8
6 PORTABLE ELECTRONICS 1 10 19 10 19
7 MP3 PLAYERS 6 11 14 11 14
8 FLASH 7 12 13 12 13
9 CD PLAYERS 6 15 16 15 16
10 2 WAY RADIOS 6 17 18 17 18

Script

SET NOCOUNT ON
GO

DECLARE @nested_category TABLE (
category_id INT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
parent INT,
lft INT,
rgt INT
);

DECLARE @current_Category_ID INTEGER
DECLARE @current_parent INTEGER
DECLARE @SafeGuard INTEGER
DECLARE @myLeft INTEGER
SET @SafeGuard = 100

INSERT INTO @nested_category
SELECT 1,'ELECTRONICS',NULL,NULL,NULL
UNION ALL SELECT 2,'TELEVISIONS',1,NULL,NULL
UNION ALL SELECT 3,'TUBE',2,NULL,NULL
UNION ALL SELECT 4,'LCD',2,NULL,NULL
UNION ALL SELECT 5,'PLASMA',2,NULL,NULL
UNION ALL SELECT 6,'PORTABLE ELECTRONICS',1,NULL,NULL
UNION ALL SELECT 7,'MP3 PLAYERS',6,NULL,NULL
UNION ALL SELECT 8,'FLASH',7,NULL,NULL
UNION ALL SELECT 9,'CD PLAYERS',6,NULL,NULL
UNION ALL SELECT 10,'2 WAY RADIOS',6,NULL,NULL

/* Initialize */
UPDATE @nested_category
SET lft = 1
, rgt = 2
WHERE parent IS NULL

UPDATE @nested_category
SET lft = NULL
, rgt = NULL
WHERE parent IS NOT NULL

WHILE EXISTS (SELECT * FROM @nested_category WHERE lft IS NULL) AND @SafeGuard > 0
BEGIN
SELECT @current_Category_ID = MAX(nc.category_id)
FROM @nested_category nc
INNER JOIN @nested_category nc2 ON nc2.category_id = nc.parent
WHERE nc.lft IS NULL
AND nc2.lft IS NOT NULL

SELECT @current_parent = parent
FROM @nested_category
WHERE category_id = @current_category_id

SELECT @myLeft = lft
FROM @nested_category
WHERE category_id = @current_parent

UPDATE @nested_category SET rgt = rgt + 2 WHERE rgt > @myLeft;
UPDATE @nested_category SET lft = lft + 2 WHERE lft > @myLeft;
UPDATE @nested_category SET lft = @myLeft + 1, rgt = @myLeft + 2 WHERE category_id = @current_category_id

SET @SafeGuard = @SafeGuard - 1
END

SELECT * FROM @nested_category ORDER BY category_id

SELECT COUNT(node.name), node.name, MIN(node.lft)
FROM @nested_category AS node,
@nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY
node.name
ORDER BY
3, 1

Testscript ##

SET NOCOUNT ON
GO

DECLARE @nested_category TABLE (
category_id INT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
parent INT,
lft INT,
rgt INT,
lftcalc INT,
rgtcalc INT
);

INSERT INTO @nested_category
SELECT 1,'ELECTRONICS',NULL,1,20,NULL,NULL
UNION ALL SELECT 2,'TELEVISIONS',1,2,9,NULL,NULL
UNION ALL SELECT 3,'TUBE',2,3,4,NULL,NULL
UNION ALL SELECT 4,'LCD',2,5,6,NULL,NULL
UNION ALL SELECT 5,'PLASMA',2,7,8,NULL,NULL
UNION ALL SELECT 6,'PORTABLE ELECTRONICS',1,10,19,NULL,NULL
UNION ALL SELECT 7,'MP3 PLAYERS',6,11,14,NULL,NULL
UNION ALL SELECT 8,'FLASH',7,12,13,NULL,NULL
UNION ALL SELECT 9,'CD PLAYERS',6,15,16,NULL,NULL
UNION ALL SELECT 10,'2 WAY RADIOS',6,17,18,NULL,NULL

/* Initialize */
UPDATE @nested_category
SET lftcalc = 1
, rgtcalc = 2
WHERE parent IS NULL

DECLARE @current_Category_ID INTEGER
DECLARE @current_parent INTEGER
DECLARE @SafeGuard INTEGER
DECLARE @myRight INTEGER
DECLARE @myLeft INTEGER
SET @SafeGuard = 100
WHILE EXISTS (SELECT * FROM @nested_category WHERE lftcalc IS NULL) AND @SafeGuard > 0
BEGIN
SELECT @current_Category_ID = MAX(nc.category_id)
FROM @nested_category nc
INNER JOIN @nested_category nc2 ON nc2.category_id = nc.parent
WHERE nc.lftcalc IS NULL
AND nc2.lftcalc IS NOT NULL

SELECT @current_parent = parent
FROM @nested_category
WHERE category_id = @current_category_id

SELECT @myLeft = lftcalc
FROM @nested_category
WHERE category_id = @current_parent

UPDATE @nested_category SET rgtcalc = rgtcalc + 2 WHERE rgtcalc > @myLeft;
UPDATE @nested_category SET lftcalc = lftcalc + 2 WHERE lftcalc > @myLeft;
UPDATE @nested_category SET lftcalc = @myLeft + 1, rgtcalc = @myLeft + 2 WHERE category_id = @current_category_id

SELECT * FROM @nested_category WHERE category_id = @current_parent
SELECT * FROM @nested_category ORDER BY category_id
SET @SafeGuard = @SafeGuard - 1
END

SELECT * FROM @nested_category ORDER BY category_id

SELECT COUNT(node.name), node.name, MIN(node.lft)
FROM @nested_category AS node,
@nested_category AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY
node.name
ORDER BY
3, 1


Related Topics



Leave a reply



Submit