How to Store a Tree in SQL Database

How to store a tree in SQL database

I found the discussion in the SQL Anti-patterns very helpfull, as it also focuses on the drawbacks of every implementation.

Also, The slides 48-77 in this presentation reiterate that analisys.

Bottom line, there is no such thing as a generic tree, and no silver bullet for SQL trees.
You'll have to ask yourself about the data, how and how much will they be selected, modified, will branches be moved around, etc, and based on those answers implement a suitable solution.

How to store directory / hierarchy / tree structure in the database?

There are many ways to store hierarchies in SQL databases. Which one to choose depends on which DBMS product you use, and how the data will be used. As you have used the MSSQL2005 tag, I think you should start considering the "Adjacency List" model; if you find that it doesn't perform well for your application, then have a look at Vadim Tropashko's comparison which highlights differences between models with a focus on multiple performance characteristics.

Best practice for storing tree data in a database

i agree with others.. recursion is available.

I would not place the manager_id in the emp table (despite the SCOTT/TIGER ancient wisdom). The real world breaks this business rule all the time, and it is not well normalized.

instead think of a person_to_person type link table, where two people are related to each other during a time period with a role... for example person1 is related to person2 from January through March as a manager. this allows you great flexibility in assigning people to projects, departments, arbitrary groups through time, even with the reality of having multiple managers at some points in time.

also consider that people to department relationships are similar - people may be related in subtle ways to multiple departments at the same time.

How can a tree be stored in a database?

Storing the Tree

There are several options:

  1. It is just a tree, after all, so you could store it the same way you would store any other tree (essentially via a recursive FOREIGN KEY).
  2. Or, convert the suffix tree into suffix array and store that into the the database.
  3. Or, you could serialize it to (say) XML, then store it to a single CLOB.
  4. Or, since a suffix tree is roughly 20 times larger than the "target" string it indexes, you could simply store the string and calculate the suffix tree on as-needed basis (e.g. using Ukkonen's algorithm).

NOTE: In the case of the suffix array, you won't be storing any characters, you'd just store the indexes describing each element, something like this:

CREATE TABLE SUFFIX_ARRAY (
ORDER INT PRIMARY KEY, -- Position in the suffix array.
START INT NOT NULL, -- Position of the starting character of the suffix within the target string.
LONGEST_COMMON_PREFIX INT NOT NULL -- If useful for your application.
)

You'd also have to separately store the "target" string (e.g. in a CLOB in another table).

Using the Tree

  1. If you store the suffix tree directly, you should be able search it directly using SQL.
  2. If you store it as suffix array, you'll have to juggle a bit to implement the binary search through the SQL, but should be possible.
  3. (and 4) If you store it in CLOB (or don't store it at all and just store the target string), then obviously you won't be able to access it directly in the database (not efficiently anyway) - your only option would be to load (or re-create) it in memory.

Database Structure for Tree Data Structure

You mention the most commonly implemented, which is Adjacency List:
https://blogs.msdn.microsoft.com/mvpawardprogram/2012/06/25/hierarchies-convert-adjacency-list-to-nested-sets

There are other models as well, including materialized path and nested sets:
http://communities.bmc.com/communities/docs/DOC-9902

Joe Celko has written a book on this subject, which is a good reference from a general SQL perspective (it is mentioned in the nested set article link above).

Also, Itzik Ben-Gann has a good overview of the most common options in his book "Inside Microsoft SQL Server 2005: T-SQL Querying".

The main things to consider when choosing a model are:

1) Frequency of structure change - how frequently does the actual structure of the tree change. Some models provide better structure update characteristics. It is important to separate structure changes from other data changes however. For example, you may want to model a company's organizational chart. Some people will model this as an adjacency list, using the employee ID to link an employee to their supervisor. This is usually a sub-optimal approach. An approach that often works better is to model the org structure separate from employees themselves, and maintain the employee as an attribute of the structure. This way, when an employee leaves the company, the organizational structure itself does not need to be changes, just the association with the employee that left.

2) Is the tree write-heavy or read-heavy - some structures work very well when reading the structure, but incur additional overhead when writing to the structure.

3) What types of information do you need to obtain from the structure - some structures excel at providing certain kinds of information about the structure. Examples include finding a node and all its children, finding a node and all its parents, finding the count of child nodes meeting certain conditions, etc. You need to know what information will be needed from the structure to determine the structure that will best fit your needs.



Related Topics



Leave a reply



Submit