Sql Server 2008 Hierarchy Data Type Performance

SQL Server 2008 R2 - select hierarchical data

Assuming at least SQL Server 2005 I'd use a single query against a recursive common table expression.

Hierarchical data in Linq - options and performance

I would set up a view and an associated table-based function based on the CTE. My reasoning for this is that, while you could implement the logic on the application side, this would involve sending the intermediate data over the wire for computation in the application. Using the DBML designer, the view translates into a Table entity. You can then associate the function with the Table entity and invoke the method created on the DataContext to derive objects of the type defined by the view. Using the table-based function allows the query engine to take your parameters into account while constructing the result set rather than applying a condition on the result set defined by the view after the fact.

CREATE TABLE [dbo].[hierarchical_table](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [parent_id] [int] NULL,
    [data] [varchar](255) NOT NULL,
 CONSTRAINT [PK_hierarchical_table] PRIMARY KEY CLUSTERED 
(
    [id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE VIEW [dbo].[vw_recursive_view]
AS
WITH hierarchy_cte(id, parent_id, data, lvl) AS
(SELECT     id, parent_id, data, 0 AS lvl
      FROM         dbo.hierarchical_table
      WHERE     (parent_id IS NULL)
      UNION ALL
      SELECT     t1.id, t1.parent_id, t1.data, h.lvl + 1 AS lvl
      FROM         dbo.hierarchical_table AS t1 INNER JOIN
                            hierarchy_cte AS h ON t1.parent_id = h.id)
SELECT     id, parent_id, data, lvl
FROM         hierarchy_cte AS result

CREATE FUNCTION [dbo].[fn_tree_for_parent] 
(
    @parent int
)
RETURNS 
@result TABLE 
(
    id int not null,
    parent_id int,
    data varchar(255) not null,
    lvl int not null
)
AS
BEGIN
    WITH hierarchy_cte(id, parent_id, data, lvl) AS
   (SELECT     id, parent_id, data, 0 AS lvl
        FROM         dbo.hierarchical_table
        WHERE     (id = @parent OR (parent_id IS NULL AND @parent IS NULL))
        UNION ALL
        SELECT     t1.id, t1.parent_id, t1.data, h.lvl + 1 AS lvl
        FROM         dbo.hierarchical_table AS t1 INNER JOIN
            hierarchy_cte AS h ON t1.parent_id = h.id)
    INSERT INTO @result
    SELECT     id, parent_id, data, lvl
    FROM         hierarchy_cte AS result
RETURN 
END

ALTER TABLE [dbo].[hierarchical_table]  WITH CHECK ADD  CONSTRAINT [FK_hierarchical_table_hierarchical_table] FOREIGN KEY([parent_id])
REFERENCES [dbo].[hierarchical_table] ([id])

ALTER TABLE [dbo].[hierarchical_table] CHECK CONSTRAINT [FK_hierarchical_table_hierarchical_table]

To use it you would do something like -- assuming some reasonable naming scheme:

using (DataContext dc = new HierarchicalDataContext())
{
    HierarchicalTableEntity h = (from e in dc.HierarchicalTableEntities
                                 select e).First();
    var query = dc.FnTreeForParent( h.ID );
    foreach (HierarchicalTableViewEntity entity in query) {
        ...process the tree node...
    }
}

Question about SQL Server HierarchyID depth-first performance

Found workaround here:
http://connect.microsoft.com/SQLServer/feedback/details/532406/performance-issue-with-hierarchyid-fun-isdescendantof-in-where-clause#

Just reminding that I started with a heirarchyID passed in from the application and my goal is to retrieve any and all relatives of that value (both Ancestors and Descendants).

In my specific example, I had to add the following declarations before the SELECT statement:

declare @topNode hierarchyid = (select @messageID.GetAncestor((@messageID.GetLevel()-1)))
declare @topNodeParent hierarchyid = (select @topNode.GetAncestor(1))
declare @leftNode hierarchyid= (select @topNodeParent.GetDescendant (null, @topNode))
declare @rightNode hierarchyid= (select @topNodeParent.GetDescendant (@topNode, null))

The WHERE clause has been changed to:

messageid.IsDescendantOf(@topNode)=1 AND (messageid > @leftNode ) AND (messageid < @rightNode )

The querying performance increase is very significant:

For every result passed in, seek time is now 20ms on average (was from 120 to 420).

When querying 25 values, it previously took 25 - 35 seconds to return all related nodes (in some cases each value had many relatives, in some there were none). It now takes only 2 seconds.

Thank you very much to all who have contributed to this issue on this site and on others.

Anyone used SQl Server 2008 HierarchialID type to store genealogy data

I can't see how it would work; in a regular hierarchy, there is a single chain to the root, so it can store the path (which is what the binary is) to each node. However, with multiple parents, this isn't possible: even if you split matriarchy and partiarchy, you still have 1 mother, 2 grandmothers, 4 great-grand-mothers, etc (not even getting into some of the more "interesting" scanerios possible, especially with livestock). There is no single logical path to encode, so no: I don't think that this can work in your case.

I'm happy to be corrected, though.

Hierarchical SQL data (Recursive CTE vs HierarchyID vs closure table)

I have used all three methods. It's mostly a question of taste.

I agree that hierarchy with parent-child relationships in the table is the simplest. Moving a subtree is simple and it's easy to code the recursive access with CTEs. Performance is only going to be an issue if you have very large tree structures and you are frequently accessing the hierarchical data. For the most part, recursive CTEs are very fast when you have the correct indexes on the table.

The closure table is more like a supplement to the above. Finding all the descendants of a given node is lightning fast, you don't need the CTEs, just one extra join, so it's sweet. Yes, the number of records blows up, but I think it is no more than N-1 times the number of nodes for a tree of depth N (e.g. a tertiary tree of depth 5 would require 1 + 3 + 9 + 27 + 81 = 121 connections when storing only the parent-child relationship vs. 1 + 3 + (9 * 2) + (27 * 3) + (81 * 4) = 427 for the closure table). In addition, the closure table records are so narrow (just 2 ints at a minimum) that they take up almost no space. Generating the list of records to insert into the closure table when a new record is inserted into the hierarchy takes a tiny bit of overhead.

I personally like HierarchyId since it really combines the benefit of the above two, which is compact storage, and lightning fast access. Once you get it set up, it is easy to query and takes very little space. As you mentioned, it's a little tricky to move subtrees around, but it's manageable. Anyway, how often do you really move a subtree in a hierarchy? There are some links you can find that will suggest some methods, e.g.:

http://sqlblogcasts.com/blogs/simons/archive/2008/03/31/SQL-Server-2008---HierarchyId---How-do-you-move-nodes-subtrees-around.aspx

The main drawback I have found to hierarchyId is the learning curve. It's not as obvious how to work with it as the other two methods. I have worked with some very bright SQL developers who would frequently get snagged on it, so you end up with one or two resident experts who have to field questions from everyone else.

Materialized path pattern VS Hierarchyid

The chapter explains three methods for designing and querying hierarchies: Adjacency Pairs, Materialized Path, and HierarchyID. These are three solutions to the same problem so yes, it makes perfect sense to compare these three methods. The truth is that Materialized path is the fastest but Adjacency Pairs can solve more types of hierarchy problems. HierarchyID is clumsy, difficult to query, and, if you follow MSFT’s recommendation, it only stores the relative position, not the key, so it’s less robust.