Database Structure for Tree Data Structure
You mention the most commonly implemented, which is Adjacency List:
https://blogs.msdn.microsoft.com/mvpawardprogram/2012/06/25/hierarchies-convert-adjacency-list-to-nested-sets
There are other models as well, including materialized path and nested sets:
http://communities.bmc.com/communities/docs/DOC-9902
Joe Celko has written a book on this subject, which is a good reference from a general SQL perspective (it is mentioned in the nested set article link above).
Also, Itzik Ben-Gann has a good overview of the most common options in his book "Inside Microsoft SQL Server 2005: T-SQL Querying".
The main things to consider when choosing a model are:
1) Frequency of structure change - how frequently does the actual structure of the tree change. Some models provide better structure update characteristics. It is important to separate structure changes from other data changes however. For example, you may want to model a company's organizational chart. Some people will model this as an adjacency list, using the employee ID to link an employee to their supervisor. This is usually a sub-optimal approach. An approach that often works better is to model the org structure separate from employees themselves, and maintain the employee as an attribute of the structure. This way, when an employee leaves the company, the organizational structure itself does not need to be changes, just the association with the employee that left.
2) Is the tree write-heavy or read-heavy - some structures work very well when reading the structure, but incur additional overhead when writing to the structure.
3) What types of information do you need to obtain from the structure - some structures excel at providing certain kinds of information about the structure. Examples include finding a node and all its children, finding a node and all its parents, finding the count of child nodes meeting certain conditions, etc. You need to know what information will be needed from the structure to determine the structure that will best fit your needs.
Store tree data structure in database
You could store the data in a table using nested sets.
http://en.wikipedia.org/wiki/Nested_set_model#Example
I worry that your millions of nodes may make life difficult if you intend to constantly add new items. Perhaps that concern could be mitigated by using rational numbers instead of integers as the left and right values. Add a column for depth to speed up your desire to ask for descendants. I wrote some SQL to create the table and the stored procedures you asked for. I did it in SQL Server do the syntax might be slightly different but it's all standard SQL statements being executed. Also I just manually decided the upper and lower bounds for each Node. Obviously you'd have to deal with writing the code to get these nodes inserted (and maintained) in your database.
CREATE TABLE Tree(
Node nvarchar(10) NOT NULL,
Value int NOT NULL,
L int NOT NULL,
R int NOT NULL,
Depth int NOT NULL,
);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('A', 100, 1, 28, 0);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('B', 100, 2, 3, 1);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('C', 300, 4, 5, 1);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('D', 150, 6, 25, 1);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('E', 200, 26, 27, 1);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('F', 400, 7, 8, 2);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('G', 250, 9, 10, 2);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('H', 500, 11, 12, 2);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('I', 350, 13, 21, 2);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('J', 100, 21, 22, 2);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('K', 50, 23, 24, 2);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('L', 100, 14, 15, 3);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('M', 300, 16, 17, 3);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('N', 200, 18, 19, 3);
CREATE PROCEDURE grandValue
@Node NVARCHAR(10)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @lbound INT;
DECLARE @ubound INT;
SELECT @lbound = L, @ubound = R FROM Tree WHERE Node = @Node
SELECT SUM(Value) AS Total FROM TREE WHERE L >= @lbound AND R <= @ubound
RETURN
END;
EXECUTE grandValue 'C';
EXECUTE grandValue 'I';
EXECUTE grandValue 'A';
CREATE PROCEDURE children
@Node NVARCHAR(10)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @lbound INT;
DECLARE @ubound INT;
DECLARE @depth INT;
SELECT @lbound = L, @ubound = R, @depth=Depth FROM Tree WHERE Node = @Node
SELECT Node FROM TREE WHERE L > @lbound AND R < @ubound AND Depth = (@depth + 1)
RETURN
END;
EXECUTE children 'C';
EXECUTE children 'I';
EXECUTE children 'A';
CREATE PROCEDURE family
@Node NVARCHAR(10)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @lbound INT;
DECLARE @ubound INT;
SELECT @lbound = L, @ubound = R FROM Tree WHERE Node = @Node
SELECT Node FROM TREE WHERE L > @lbound AND R < @ubound
RETURN
END;
EXECUTE family 'C';
EXECUTE family 'I';
EXECUTE family 'A';
CREATE PROCEDURE parent
@Node NVARCHAR(10)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @lbound INT;
DECLARE @ubound INT;
DECLARE @depth INT;
SELECT @lbound = L, @ubound = R, @depth = Depth FROM Tree WHERE Node = @Node
SELECT Node FROM TREE WHERE L < @lbound AND R > @ubound AND Depth = (@depth - 1)
RETURN
END;
EXECUTE parent 'C';
EXECUTE parent 'I';
EXECUTE parent 'A';
CREATE PROCEDURE ancestor
@Node NVARCHAR(10)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @lbound INT;
DECLARE @ubound INT;
SELECT @lbound = L, @ubound = R FROM Tree WHERE Node = @Node
SELECT Node FROM TREE WHERE L < @lbound AND R > @ubound
RETURN
END;
EXECUTE ancestor 'C';
EXECUTE ancestor 'I';
EXECUTE ancestor 'A';
For creating the nested sets in the table in the first place you can run some code to generate the inserts or start with the first node and then successively add each additional node - although since each add potentially modifies many of the nodes in the set there can be a lot of thrashing of the database as you build this.
Here's a stored procedure for adding a node as a child of another node:
CREATE PROCEDURE insertNode
@ParentNode NVARCHAR(10), @NewNodeName NVARCHAR(10), @NewNodeValue INT
AS
BEGIN
SET NOCOUNT ON;
DECLARE @ubound INT;
DECLARE @depth INT;
SELECT @ubound = R, @depth = Depth FROM Tree WHERE Node = @ParentNode
UPDATE Tree SET L = L + 2 WHERE L >= @ubound
UPDATE Tree SET R = R + 2 WHERE R >= @ubound
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES (@NewNodeName, @NewNodeValue, @ubound, @ubound + 1, @depth + 1);
RETURN
END;
I got this from http://www.evanpetersen.com/item/nested-sets.html who also shows a nice graph walking algorithm for creating the initial L and R values. You'd have to enhance this to keep track of the depth as well but that's be easy.
How to represent a tree like structure in a db
I showed a solution similar to your nodes & edges tables, in my answer to the StackOverflow question: What is the most efficient/elegant way to parse a flat table into a tree? I call this solution "Closure Table".
I did a presentation on different methods of storing and using trees in SQL, Models for Hierarchical Data with SQL and PHP. I demonstrated that with the right indexes (depending on the queries you need to run), the Closure Table design can have very good performance, even over large collections of edges (about 500K edges in my demo).
I also covered the design in my book, SQL Antipatterns: Avoiding the Pitfalls of Database Programming.
Database structure and querying hierarchal data and trees of data
There are already good approaches out there which are more simple than the solution you propose.
Here are a couple of links which explain how to do it (we use this ourselves for much the same problem you describe and it works well).
- Managing Hierarchical Data in MySQL (from MySQL)
- Storing Hierarchical Data in a Database (from Sitepoint, but a clearer explanation, I think)
This makes inserting/updating more complex, but selecting portions of the tree structure far faster (with only one query). It allows finding all children of any given node in one query, and finding all the ancestors of a given node with one query.
Related Topics
How to Import a Json File into Postgresql
Getting Result of Dynamic SQL into a Variable For Sql-Server
How to Get Column Names from a Table in Oracle
How to Specify Condition in Count()
Safely Rename Tables Using Serial Primary Key Columns
Pass Table as Parameter into SQL Server Udf
Split Column into Multiple Rows in Postgres
Oracle: What Does '(+)' Do in a Where Clause
Add Foreign Key Relationship Between Two Databases
Using the Result of an Expression (E.G. Function Call) in a Stored Procedure Parameter List
How to Delete Duplicate Rows Without Unique Identifier
Update a Column Value, Replacing Part of a String
Constraint Defined Deferrable Initially Immediate Is Still Deferred
How to Determine the Last Day of the Previous Month Using Postgresql