Hierarchical Data in Linq - Options and Performance

Hierarchical data in Linq - options and performance

I would set up a view and an associated table-based function based on the CTE. My reasoning for this is that, while you could implement the logic on the application side, this would involve sending the intermediate data over the wire for computation in the application. Using the DBML designer, the view translates into a Table entity. You can then associate the function with the Table entity and invoke the method created on the DataContext to derive objects of the type defined by the view. Using the table-based function allows the query engine to take your parameters into account while constructing the result set rather than applying a condition on the result set defined by the view after the fact.

CREATE TABLE [dbo].[hierarchical_table](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [parent_id] [int] NULL,
    [data] [varchar](255) NOT NULL,
 CONSTRAINT [PK_hierarchical_table] PRIMARY KEY CLUSTERED 
(
    [id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

CREATE VIEW [dbo].[vw_recursive_view]
AS
WITH hierarchy_cte(id, parent_id, data, lvl) AS
(SELECT     id, parent_id, data, 0 AS lvl
      FROM         dbo.hierarchical_table
      WHERE     (parent_id IS NULL)
      UNION ALL
      SELECT     t1.id, t1.parent_id, t1.data, h.lvl + 1 AS lvl
      FROM         dbo.hierarchical_table AS t1 INNER JOIN
                            hierarchy_cte AS h ON t1.parent_id = h.id)
SELECT     id, parent_id, data, lvl
FROM         hierarchy_cte AS result


CREATE FUNCTION [dbo].[fn_tree_for_parent] 
(
    @parent int
)
RETURNS 
@result TABLE 
(
    id int not null,
    parent_id int,
    data varchar(255) not null,
    lvl int not null
)
AS
BEGIN
    WITH hierarchy_cte(id, parent_id, data, lvl) AS
   (SELECT     id, parent_id, data, 0 AS lvl
        FROM         dbo.hierarchical_table
        WHERE     (id = @parent OR (parent_id IS NULL AND @parent IS NULL))
        UNION ALL
        SELECT     t1.id, t1.parent_id, t1.data, h.lvl + 1 AS lvl
        FROM         dbo.hierarchical_table AS t1 INNER JOIN
            hierarchy_cte AS h ON t1.parent_id = h.id)
    INSERT INTO @result
    SELECT     id, parent_id, data, lvl
    FROM         hierarchy_cte AS result
RETURN 
END

ALTER TABLE [dbo].[hierarchical_table]  WITH CHECK ADD  CONSTRAINT [FK_hierarchical_table_hierarchical_table] FOREIGN KEY([parent_id])
REFERENCES [dbo].[hierarchical_table] ([id])

ALTER TABLE [dbo].[hierarchical_table] CHECK CONSTRAINT [FK_hierarchical_table_hierarchical_table]

To use it you would do something like -- assuming some reasonable naming scheme:

using (DataContext dc = new HierarchicalDataContext())
{
    HierarchicalTableEntity h = (from e in dc.HierarchicalTableEntities
                                 select e).First();
    var query = dc.FnTreeForParent( h.ID );
    foreach (HierarchicalTableViewEntity entity in query) {
        ...process the tree node...
    }
}

Recursive/Hierarchical Query in LINQ

Try this:

var geo_hierarchy = new []
{
    new { cat_id = "Root", geo_id = 1, parent_id = (int?)null, geo_name = "Pakistan", },
    new { cat_id = "Province", geo_id = 2, parent_id = (int?)1, geo_name = "Punjab", },
    new { cat_id = "District", geo_id = 3, parent_id = (int?)2, geo_name = "Attock", },
    new { cat_id = "City", geo_id = 4, parent_id = (int?)3, geo_name = "Attock", },
    new { cat_id = "City", geo_id = 5, parent_id = (int?)3, geo_name = "Fateh Jang", },
    new { cat_id = "City", geo_id = 6, parent_id = (int?)3, geo_name = "Hasan Abdal", },
};

var map = geo_hierarchy.ToDictionary(x => x.geo_id);

Func<int, string, string> up = null;
up = (geo_id, cat_id) =>
{
    var record = map[geo_id];
    return
        record.cat_id == cat_id
            ? record.geo_name
            : (record.parent_id.HasValue
                ? up(record.parent_id.Value, cat_id)
                : null);
};

var result = up(5, "Province"); // -> "Punjab"

How to retrieve hierarchical data using Linq-to-entities?

I'm afraid you'll have to recurse over the LINQ results in your code. Depending on table size and structure you may want to pull down the whole table into memory (as it seems you are doing) and do the hierarchy walk there. If you have a very large table, you may want to make repeated trips to the DB.

For more information read this great article: Storing Hierarchical Data in a Database

Your code is already doing the flattening in memory. I would suggest using Skip(pSize * pIndex).Take(pSize) on your list2 object (which contains the flattened) data. Be aware that this may lead to page breaks deep in the hierarchy.

Multi-level hierarchical data sorting C#

So, given:

public class Document
{
    public int id;
    public int? ParentId;
    public string name;
}

And:

var documents = new[]
{
    new Document() { id = 1, ParentId = null, name = "fruits" },
    new Document() { id = 2, ParentId = null, name = "vegetables" },
    new Document() { id = 3, ParentId = 2, name = "tomatoes" },
    new Document() { id = 4, ParentId = 1, name = "apples" },
    new Document() { id = 5, ParentId = 4, name = "golden apples" },
    new Document() { id = 6, ParentId = 4, name = "red apples" },
};

Then this works:

var lookup = documents.ToLookup(x => x.ParentId);

Func<int?, IEnumerable<Document>> heirarchySort = null;
heirarchySort = pid =>
    lookup[pid].SelectMany(x => new[] { x }.Concat(heirarchySort(x.id)));

IEnumerable<Document> sorted = heirarchySort(null);

It gives you:

Sorted Documents

Local functions now make this a little nicer on the eye.

IEnumerable<Document> heirarchySort(int? pid) =>
    lookup[pid].SelectMany(x => new[] { x }.Concat(heirarchySort(x.id)));

How do I find the root of a hierarchy using LINQ and EF?

Is this the kind of thing you're looking for

CREATE TABLE [dbo].[hierarchical_table](
    [id] INT,
    [parent_id] INT,
    [data] VARCHAR(255)
)
GO

INSERT [dbo].[hierarchical_table]
SELECT 1,NULL,'/1' UNION ALL
SELECT 2,1,'/1/2' UNION ALL
SELECT 3,2,'/1/2/3' UNION ALL
SELECT 4,2,'/1/2/4' UNION ALL
SELECT 5,NULL, '/5' UNION ALL
SELECT 6,5,'/5/6'
GO

CREATE FUNCTION [dbo].[fn_root_for_node] 
(
    @id int
)
RETURNS TABLE AS
RETURN (
    WITH [hierarchy_cte](id, parent_id, data, [Level]) AS
    (
        SELECT
            id,
            parent_id,
            data,
            0 AS [Level]
        FROM dbo.hierarchical_table
        WHERE id = @id
        UNION ALL
        SELECT t1.id, t1.parent_id, t1.data, h.[Level] + 1 AS [Level]
        FROM  dbo.hierarchical_table AS t1
        INNER JOIN hierarchy_cte AS h
            ON t1.id = h.parent_id
    )
    SELECT TOP 1
        id, parent_id, data, [Level]
    FROM [hierarchy_cte]
    ORDER BY [Level] DESC
)
GO

SELECT * FROM [dbo].[fn_root_for_node] (6)
GO

Rebuild Parent- Child- GrandChild hierarchy with LINQ performance

Why is the lambda so much slower?

projectSections.Select(section => new SectionViewModel(project, section))

from section in projectSections
select new SectionViewModel(
    project,
    section,
    bidItemCollection.Where(...).ToList()

These two call the different constructor, hence the difference in the execution time.

As long as the logic and method is the same, both way of writing it should give the same result and at the same time. Because, the compiler generate the same IL.

How can I optimize it?

Since I cannot do the benchmark on your machine, I'll just use general assumption.

Usually performing the pull all the required data is better than pulling it multiple times.
So I would avoid calling to SectionViewModel.ctor(Project,Section) and BidItemViewModel.ctor(Project,BidItem) as they will perform more queries in the database.

With that said, I would write my lambda as following : //actually this is just your 3rd piece of code cleaned up

sectionViewModels = new List<SectionViewModel>(
        projectSections.Select(
            s => new SectionViewModel(project, s, bidItemCollection.Where(b => b.SectionId == s.SectionId).Select(
                    b => new BidItemViewModel(project, b, subItemCollection.Where(si => si.BidItemId == b.BidItemId))))));

Also, for prettiness, I've changed the following contructors to avoid having ToList in middle of Lambda :

public class SectionViewModel
{
    private readonly Section section;
    private readonly List<BidItemViewModel> bidItems;

    public SectionViewModel(Project project, Section section, IEnumerable<BidItemViewModel> bidItemsForSection)
    {
        this.section = section;
        this.bidItems = bidItemsForSection.ToList();
    }
}

public class BidItemViewModel
{
    private BidItem bidItem;
    private List<SubItem> subItems;

    public BidItemViewModel(Project project, BidItem bidItem, IEnumerable<SubItem> subItems)
    {
        this.bidItem = bidItem;
        this.subItems = subItems.ToList();
    }
}

Sort hierarchy with path and depth fields using Linq

Ok, first you should really add the following constraint

interface ITree<T>
    where T : class, ITree<T>
{
    // ...
}

so we can safely navigate the hierarchy using Parent and Children properties w/o casting.

Second, instead of reinventing the wheel, I'll reuse the general tree in order traversal helper method from my answer to How to flatten tree via LINQ? (and couple others):

public static partial class TreeUtils
{
    public static IEnumerable<T> Expand<T>(this IEnumerable<T> source, Func<T, IEnumerable<T>> elementSelector)
    {
        var stack = new Stack<IEnumerator<T>>();
        var e = source.GetEnumerator();
        try
        {
            while (true)
            {
                while (e.MoveNext())
                {
                    var item = e.Current;
                    yield return item;
                    var elements = elementSelector(item);
                    if (elements == null) continue;
                    stack.Push(e);
                    e = elements.GetEnumerator();
                }
                if (stack.Count == 0) break;
                e.Dispose();
                e = stack.Pop();
            }
        }
        finally
        {
            e.Dispose();
            while (stack.Count != 0) stack.Pop().Dispose();
        }
    }
}

With that helper in the pocket, the method in question is simple as that:

partial class TreeUtils
{
    public static IEnumerable<T> Ordered<T>(this IEnumerable<T> source, Func<IEnumerable<T>, IEnumerable<T>> order = null)
        where T : class, ITree<T>
    {
        if (order == null) order = items => items.OrderBy(item => item.Name);
        return order(source.Where(item => item.Parent == null))
            .Expand(item => item.Children != null && item.Children.Any() ? order(item.Children) : null);
    }
}

Sample usage:

List<TreeItem> flatList = ...;
var orderedList = flatList.Ordered().ToList();

UPDATE: Here is the same by using only Path and Id properties:

public static partial class TreeUtils
{
    public static IEnumerable<T> Ordered<T>(this IEnumerable<T> source, Func<IEnumerable<T>, IEnumerable<T>> order = null)
        where T : class, ITree<T>
    {
        if (order == null) order = items => items != null && items.Any() ?  items.OrderBy(item => item.Name) : null;
        var chldrenByParentId = source
            .Select(item => new { item, path = item.Path.Split('.') })
            .ToLookup(e => e.path.Length >= 2 ? int.Parse(e.path[e.path.Length - 2]) : (int?)null, e => e.item);
        return (order(chldrenByParentId[null]) ?? Enumerable.Empty<T>())
            .Expand(item => order(chldrenByParentId[item.Id]));
    }
}

Hierarchical Data in Linq - Options and Performance