How to Represent a Data Tree in SQL

How to represent a tree like structure in a db

I showed a solution similar to your nodes & edges tables, in my answer to the StackOverflow question: What is the most efficient/elegant way to parse a flat table into a tree? I call this solution "Closure Table".

I did a presentation on different methods of storing and using trees in SQL, Models for Hierarchical Data with SQL and PHP. I demonstrated that with the right indexes (depending on the queries you need to run), the Closure Table design can have very good performance, even over large collections of edges (about 500K edges in my demo).

I also covered the design in my book, SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

Database Structure for Tree Data Structure

You mention the most commonly implemented, which is Adjacency List:
https://blogs.msdn.microsoft.com/mvpawardprogram/2012/06/25/hierarchies-convert-adjacency-list-to-nested-sets

There are other models as well, including materialized path and nested sets:
http://communities.bmc.com/communities/docs/DOC-9902

Joe Celko has written a book on this subject, which is a good reference from a general SQL perspective (it is mentioned in the nested set article link above).

Also, Itzik Ben-Gann has a good overview of the most common options in his book "Inside Microsoft SQL Server 2005: T-SQL Querying".

The main things to consider when choosing a model are:

1) Frequency of structure change - how frequently does the actual structure of the tree change. Some models provide better structure update characteristics. It is important to separate structure changes from other data changes however. For example, you may want to model a company's organizational chart. Some people will model this as an adjacency list, using the employee ID to link an employee to their supervisor. This is usually a sub-optimal approach. An approach that often works better is to model the org structure separate from employees themselves, and maintain the employee as an attribute of the structure. This way, when an employee leaves the company, the organizational structure itself does not need to be changes, just the association with the employee that left.

2) Is the tree write-heavy or read-heavy - some structures work very well when reading the structure, but incur additional overhead when writing to the structure.

3) What types of information do you need to obtain from the structure - some structures excel at providing certain kinds of information about the structure. Examples include finding a node and all its children, finding a node and all its parents, finding the count of child nodes meeting certain conditions, etc. You need to know what information will be needed from the structure to determine the structure that will best fit your needs.

How to represent a tree structure with ORDER BY OR GROUP BY

You could do this:

select *
from templates
order by nvl(master_id, id);

This would tell the database to sort by master_id, and if that column is empty (NULL) to use the id. This way the master and its children are sorted together.

Alternatively you can use a hierarchical query:

select *
from templates
start with master_id is null
connect by master_id = prior id;

SQL Tree Structure

You didn't specify your DBMS but with standard SQL (supported by all modern DBMS) you can easily do a recursive query to get the full tree:

with recursive full_tree as (
select id, name, parent, 1 as level
from departments
where parent is null
union all
select c.id, c.name, c.parent, p.level + 1
from departments c
join full_tree p on c.parent = p.id
)
select *
from full_tree;

If you need a sub-tree, just change the starting condition in the common table expression. To get e.g. all "categories":

with recursive all_categories as (
select id, name, parent, 1 as level
from departments
where id = 2 --- << start with a different node
union all
select c.id, c.name, c.parent, p.level + 1
from departments c
join all_categories p on c.parent = p.id
)
select *
from all_categories;

Getting all leafs is straightforward: it's all nodes where their ID does not appear as a parent:

select *
from departments
where id not in (select parent
from departments
where parent is not null);

SQLFiddle example: http://sqlfiddle.com/#!15/414c9/1


Edit after DBMS has been specified.

Oracle does support recursive CTEs (although you need 11.2.x for that) you just need to leave out the keyword recursive. But you can also use the CONNECT BY operator:

select id, name, parent, level
from departments
start with parent is null
connect by prior id = parent;

select id, name, parent, level
from departments
start with id = 2
connect by prior id = parent;

SQLFiddle for Oracle: http://sqlfiddle.com/#!4/6774ee/3

See the manual for details: https://docs.oracle.com/database/121/SQLRF/queries003.htm#i2053935

Tree structure data query in SQL Server

I don't think there's anything wrong with the design, assuming you have a limited level of parent-child relationships. Here is a quick example of retrieving the relationship using a recursive CTE:

USE tempdb;
GO

CREATE TABLE dbo.tree
(
ID INT PRIMARY KEY,
name VARCHAR(32),
ParentID INT FOREIGN KEY REFERENCES dbo.tree(ID)
);

INSERT dbo.tree SELECT 1, 'grandpa', NULL
UNION ALL SELECT 2, 'dad', 1
UNION ALL SELECT 3, 'me', 2
UNION ALL SELECT 4, 'mom', 1
UNION ALL SELECT 5, 'grandma', NULL;

;WITH x AS
(
-- anchor:
SELECT ID, name, ParentID, [level] = 0
FROM dbo.tree WHERE ParentID IS NULL
UNION ALL
-- recursive:
SELECT t.ID, t.name, t.ParentID, [level] = x.[level] + 1
FROM x INNER JOIN dbo.tree AS t
ON t.ParentID = x.ID
)
SELECT ID, name, ParentID, [level] FROM x
ORDER BY [level]
OPTION (MAXRECURSION 32);
GO

Don't forget to clean up:

DROP TABLE dbo.tree;

This might be a useful article. An alternative is hierarchyid but I find it overly complex for most scenarios.



Related Topics



Leave a reply



Submit