Creating a Flattened Table/View of a Hierarchically-Defined Set of Data

Creating a flattened table/view of a hierarchically-defined set of data

So what you want is to materialize the transitive closures. That is, given this application table ...

 ID   | PARENT_ID
------+----------
1 |
2 | 1
3 | 2
4 | 2
5 | 4

... the graph table would look like this:

 PARENT_ID | CHILD_ID
-----------+----------
1 | 2
1 | 3
1 | 4
1 | 5
2 | 3
2 | 4
2 | 5
4 | 5

It is possible to maintain a table like this in Oracle, although you will need to roll your own framework for it. The question is whether it is worth the overhead. If the source table is volatile then keeping the graph data fresh may cost more cycles than you will save on the queries. Only you know your data's profile.

I don't think you can maintain such a graph table with CONNECT BY queries and cascading foreign keys. Too much indirect activity, too hard to get right. Also a materialized view is out, because we cannot write a SQL query which will zap the 1->5 record when we delete the source record for ID=4.

So what I suggest you read a paper called Maintaining Transitive Closure of Graphs in SQL by Dong, Libkin, Su and Wong. This contains a lot of theory and some gnarly (Oracle) SQL but it will give you the grounding to build the PL/SQL you need to maintain a graph table.


"can you expand on the part about it
being too difficult to maintain with
CONNECT BY/cascading FKs? If I control
access to the table and all
inserts/updates/deletes take place via
stored procedures, what kinds of
scenarios are there where this would
break down?"

Consider the record 1->5 which is a short-circuit of 1->2->4->5. Now what happens if, as I said before, we delete the the source record for ID=4? Cascading foreign keys could delete the entries for 2->4 and 4->5. But that leaves 1->5 (and indeed 2->5) in the graph table although they no longer represent a valid edge in the graph.

What might work (I think, I haven't done it) would be to use an additional synthetic key in the source table, like this.

 ID   | PARENT_ID | NEW_KEY
------+-----------+---------
1 | | AAA
2 | 1 | BBB
3 | 2 | CCC
4 | 2 | DDD
5 | 4 | EEE

Now the graph table would look like this:

 PARENT_ID | CHILD_ID | NEW_KEY
-----------+----------+---------
1 | 2 | BBB
1 | 3 | CCC
1 | 4 | DDD
1 | 5 | DDD
2 | 3 | CCC
2 | 4 | DDD
2 | 5 | DDD
4 | 5 | DDD

So the graph table has a foreign key referencing the relationship in the source table which generated it, rather than linking to the ID. Then deleting the record for ID=4 would cascade deletes of all records in the graph table where NEW_KEY=DDD.

This would work if any given ID can only have zero or one parent IDs. But it won't work if it is permissible for this to happen:

 ID   | PARENT_ID
------+----------
5 | 2
5 | 4

In other words the edge 1->5 represents both 1->2->4->5 and 1->2->5. So, what might work depends on the complexity of your data.

Flattening of hierarchy in SQL Server Dynamically

This should do what you're looking for.
There is a lot going on here and trying to break it all down into minute details would would quickly become a TLDR. So, if you have specific questions about why I did something or how something works, ask it in the comments and I will update the answer to include those specific details.

USE tempdb;
GO
IF OBJECT_ID('tempdb.dbo.LedgerAccounts', 'U') IS NOT NULL
BEGIN DROP TABLE tempdb.dbo.LedgerAccounts; END;
GO

CREATE TABLE tempdb.dbo.LedgerAccounts (
ledger_key int NOT NULL,
Ledger nvarchar (12) NULL,
LedgerLevel int NULL,
ParentAccount nvarchar (12) NULL,
LedgerDescription nvarchar (30) NULL,
CONSTRAINT PK_LedgerAccount
PRIMARY KEY CLUSTERED (ledger_key ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
GO

INSERT INTO tempdb.dbo.LedgerAccounts
VALUES (40, '020000', 0, '020999', 'Participation'),
(41, '020999', 20, '021000', 'Participation in Group'),
(42, '021000', 0, '021999', 'Loans to..'),
(43, '021999', 20, '022000', 'Loans to group company'),
(44, '022000', 0, '022999', 'Participation in'),
(45, '022999', 20, '029999', 'Other Participation'),
(46, '029999', 30, '059999', 'Financial Fixed Assets'),
(47, '059999', 50, 'TOT.BALANS', 'Fixed Assets'),
(48, 'TOT.BALANS', 90, 'TOT.GB', 'Total Balance sheet'),
(49, 'TOT.GB', 99, 'NULL', 'Total GL');
GO

-- SELECT * FROM tempdb.dbo.LedgerAccounts la;

--=====================================================================================================================
--=====================================================================================================================

IF OBJECT_ID('tempdb..#build_path', 'U') IS NOT NULL
BEGIN DROP TABLE #build_path; END;
GO

CREATE TABLE #build_path (
ledger_key int NOT NULL,
Ledger nvarchar(12) NOT NULL,
ParentAccount nvarchar(30) NOT NULL,
h_level int NOT NULL,
h_path nvarchar(4000) NOT NULL
);
GO

WITH
cte_build_path AS (
SELECT
la.ledger_key,
la.Ledger,
la.ParentAccount,
h_level = 0,
h_path = CONVERT(nvarchar(4000), RIGHT(REPLICATE(N' ', 50) + la.ledger + N'-' + la.LedgerDescription, 50))
FROM
dbo.LedgerAccounts la
WHERE
la.Ledger LIKE '[0-9][0-9][0-9][0-9][0-9][0-9]'
AND la.ParentAccount NOT LIKE '[0-9][0-9][0-9][0-9][0-9][0-9]'
UNION ALL
SELECT
la.ledger_key,
la.Ledger,
la.ParentAccount,
h_level = bp.h_level + 1,
h_path = CONVERT(nvarchar(4000), bp.h_path + RIGHT(REPLICATE(N' ', 50) + la.ledger + N'-' + la.LedgerDescription, 50))
FROM
dbo.LedgerAccounts la
JOIN cte_build_path bp
ON la.ParentAccount = bp.Ledger
)
INSERT #build_path (ledger_key, Ledger, ParentAccount, h_level, h_path)
SELECT
bp .ledger_key,
bp.Ledger,
bp.ParentAccount,
bp.h_level,
bp.h_path
FROM
cte_build_path bp;
GO

-- SELECT * FROM #build_path bp

--=====================================================================================================================

DECLARE
@d_col_count int = (SELECT MAX(bp.h_level) FROM #build_path bp) + 1,
@d_sql nvarchar(MAX) = N'',
@debug bit = 0;

SELECT TOP (@d_col_count)
@d_sql = CONCAT(@d_sql, N',
[level', x.rn, N'] = CASE WHEN bp.h_level >= ', x.rn, N' THEN LTRIM(SUBSTRING(bp.h_path, ', x.rn * 50 + 1, N', 50)) ELSE N''---'' END'
)
FROM
(
SELECT TOP (@d_col_count)
rn = ROW_NUMBER() OVER (ORDER BY ac.object_id) - 1
FROM
sys.all_columns ac
) x
ORDER BY
x.rn ASC;

SELECT @d_sql = CONCAT(N'
SELECT
bp.ledger_key,
bp.Ledger',
@d_sql, N'

FROM
#build_path bp;');

IF @debug = 1
BEGIN
PRINT(@d_sql);
END;
ELSE
BEGIN
EXEC sys.sp_executesql @d_sql
END;
GO

And the results...

ledger_key  Ledger       level0                                             level1                                             level2                                             level3                                             level4                                             level5                                             level6                                             level7
----------- ------------ -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------
47 059999 059999-Fixed Assets --- --- --- --- --- --- ---
46 029999 059999-Fixed Assets 029999-Financial Fixed Assets --- --- --- --- --- ---
45 022999 059999-Fixed Assets 029999-Financial Fixed Assets 022999-Other Participation --- --- --- --- ---
44 022000 059999-Fixed Assets 029999-Financial Fixed Assets 022999-Other Participation 022000-Participation in --- --- --- ---
43 021999 059999-Fixed Assets 029999-Financial Fixed Assets 022999-Other Participation 022000-Participation in 021999-Loans to group company --- --- ---
42 021000 059999-Fixed Assets 029999-Financial Fixed Assets 022999-Other Participation 022000-Participation in 021999-Loans to group company 021000-Loans to.. --- ---
41 020999 059999-Fixed Assets 029999-Financial Fixed Assets 022999-Other Participation 022000-Participation in 021999-Loans to group company 021000-Loans to.. 020999-Participation in Group ---
40 020000 059999-Fixed Assets 029999-Financial Fixed Assets 022999-Other Participation 022000-Participation in 021999-Loans to group company 021000-Loans to.. 020999-Participation in Group 020000-Participation

SQL - Flattening a tree view

What you are looking for is a recursive CTE of the form below. The nutshell summary is that the query repeats until it hits the limit set in the anchor definition. See Recursive Queries Using Common Table Expressions for more information on how it works.

;WITH CallReasonTree (ID, ParentID, ReasonText, TreePath)
AS
(
-- Anchor member definition
SELECT ChildReason.ID, ChildReason.ParentID, ChildReason.ReasonText,
CONVERT(nvarchar(MAX), ChildReason.ReasonText) AS TreePath
FROM CLG_CallReasonTree AS ChildReason
WHERE ChildReason.ParentID = 0
UNION ALL
-- Recursive member definition
SELECT ChildReason.ID, ChildReason.ParentID, ChildReason.ReasonText,
TreePath + '>' + ChildReason.ReasonText AS TreePath
FROM CLG_CallReasonTree AS ChildReason
INNER JOIN CallReasonTree AS ParentReason ON ChildReason.ParentID = ParentReason.ID
)
-- Statement that executes the CTE
SELECT ID, ParentID, ReasonText, TreePath
FROM CallReasonTree
ORDER BY ID

This is the resulting output:

ID  ParentID  ReasonText  TreePath
1 0 Level1A Level1A
2 0 Level1B Level1B
3 1 Level2AA Level1A>Level2AA
4 1 Level2AB Level1A>Level2AB
5 2 Level2BA Level1B>Level2BA
6 4 Level3ABA Level1A>Level2AB>Level3ABA

What is the most efficient/elegant way to parse a flat table into a tree?

Now that MySQL 8.0 supports recursive queries, we can say that all popular SQL databases support recursive queries in standard syntax.

WITH RECURSIVE MyTree AS (
SELECT * FROM MyTable WHERE ParentId IS NULL
UNION ALL
SELECT m.* FROM MyTABLE AS m JOIN MyTree AS t ON m.ParentId = t.Id
)
SELECT * FROM MyTree;

I tested recursive queries in MySQL 8.0 in my presentation Recursive Query Throwdown in 2017.

Below is my original answer from 2008:


There are several ways to store tree-structured data in a relational database. What you show in your example uses two methods:

  • Adjacency List (the "parent" column) and
  • Path Enumeration (the dotted-numbers in your name column).

Another solution is called Nested Sets, and it can be stored in the same table too. Read "Trees and Hierarchies in SQL for Smarties" by Joe Celko for a lot more information on these designs.

I usually prefer a design called Closure Table (aka "Adjacency Relation") for storing tree-structured data. It requires another table, but then querying trees is pretty easy.

I cover Closure Table in my presentation Models for Hierarchical Data with SQL and PHP and in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.

CREATE TABLE ClosureTable (
ancestor_id INT NOT NULL REFERENCES FlatTable(id),
descendant_id INT NOT NULL REFERENCES FlatTable(id),
PRIMARY KEY (ancestor_id, descendant_id)
);

Store all paths in the Closure Table, where there is a direct ancestry from one node to another. Include a row for each node to reference itself. For example, using the data set you showed in your question:

INSERT INTO ClosureTable (ancestor_id, descendant_id) VALUES
(1,1), (1,2), (1,4), (1,6),
(2,2), (2,4),
(3,3), (3,5),
(4,4),
(5,5),
(6,6);

Now you can get a tree starting at node 1 like this:

SELECT f.* 
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1;

The output (in MySQL client) looks like the following:

+----+
| id |
+----+
| 1 |
| 2 |
| 4 |
| 6 |
+----+

In other words, nodes 3 and 5 are excluded, because they're part of a separate hierarchy, not descending from node 1.


Re: comment from e-satis about immediate children (or immediate parent). You can add a "path_length" column to the ClosureTable to make it easier to query specifically for an immediate child or parent (or any other distance).

INSERT INTO ClosureTable (ancestor_id, descendant_id, path_length) VALUES
(1,1,0), (1,2,1), (1,4,2), (1,6,1),
(2,2,0), (2,4,1),
(3,3,0), (3,5,1),
(4,4,0),
(5,5,0),
(6,6,0);

Then you can add a term in your search for querying the immediate children of a given node. These are descendants whose path_length is 1.

SELECT f.* 
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
AND path_length = 1;

+----+
| id |
+----+
| 2 |
| 6 |
+----+

Re comment from @ashraf: "How about sorting the whole tree [by name]?"

Here's an example query to return all nodes that are descendants of node 1, join them to the FlatTable that contains other node attributes such as name, and sort by the name.

SELECT f.name
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
ORDER BY f.name;

Re comment from @Nate:

SELECT f.name, GROUP_CONCAT(b.ancestor_id order by b.path_length desc) AS breadcrumbs
FROM FlatTable f
JOIN ClosureTable a ON (f.id = a.descendant_id)
JOIN ClosureTable b ON (b.descendant_id = a.descendant_id)
WHERE a.ancestor_id = 1
GROUP BY a.descendant_id
ORDER BY f.name

+------------+-------------+
| name | breadcrumbs |
+------------+-------------+
| Node 1 | 1 |
| Node 1.1 | 1,2 |
| Node 1.1.1 | 1,2,4 |
| Node 1.2 | 1,6 |
+------------+-------------+

A user suggested an edit today. SO moderators approved the edit, but I am reversing it.

The edit suggested that the ORDER BY in the last query above should be ORDER BY b.path_length, f.name, presumably to make sure the ordering matches the hierarchy. But this doesn't work, because it would order "Node 1.1.1" after "Node 1.2".

If you want the ordering to match the hierarchy in a sensible way, that is possible, but not simply by ordering by the path length. For example, see my answer to MySQL Closure Table hierarchical database - How to pull information out in the correct order.

Oracle SQL Hierarchical Query: Flatten Hierarchy and Perform Aggregation

You can use SYS_CONNECT_BY_PATH to generate an expression, the product of all quantities in a branch. Then use a function to dynamically execute that expression to get the final quantity.

It's not an ideal solution. The context switches between SQL and PL/SQL will take some time. And you'll need to worry about SQL injection. But at least you can avoid querying the same table twice.

(A recursive CTE, as Dan A. and podiluska suggested, would very likely be the best solution. In my experience, even when the two syntaxes are doing the same thing and using similar access paths, the recursive CTE can be significantly faster than connect by. But you'll need to wait until an upgrade to 11gR2.)

CREATE OR REPLACE FUNCTION EVALUATE_EXPRESSION(P_EXPRESSION IN VARCHAR2) RETURN NUMBER AS
R_QTY INTEGER;
BEGIN
EXECUTE IMMEDIATE 'SELECT '||P_EXPRESSION||' FROM DUAL' INTO R_QTY;
RETURN R_QTY;
END;
/

SELECT 'ASSY001' || SYS_CONNECT_BY_PATH(CHILD,'/') NAV_PATH,
GETQTY(SYS_CONNECT_BY_PATH(CHILD,'/'), 'ASSY001') QTY,
SUBSTR(SYS_CONNECT_BY_PATH(QUANTITY,'*'), 2) QTY_EXPRESSION,
EVALUATE_EXPRESSION(SUBSTR(SYS_CONNECT_BY_PATH(QUANTITY,'*'), 2)) QTY2,
CHILD
FROM ITEMHIER
WHERE ISLEAF = 1
START WITH PARENT = 'ASSY001'
CONNECT BY PRIOR CHILD = PARENT;

Also, you mentioned that the table has indexes. But are the queries using the indexes? Can you post the explain plan?

Finally, with a query this slow you may need to look into parallelism. Unfortunately, I've never had much luck using parallelism and connect by.



Related Topics



Leave a reply



Submit