Help with Writing a SQL Query for Nested Sets

Writing Nested SQL queries

Something like this? Lines #1 - 26 represent sample data (you don't type that); query you need begins at line #28.

SQL> with
  2  -- sample data
  3  studentdata (studentid, name, course) as
  4    (select 1, 'olivier', 'it'    from dual union all
  5     select 2, 'noah', 'business' from dual union all
  6     select 3, 'jack', 'business' from dual union all
  7     select 4, 'mason', 'it'      from dual union all
  8     select 5, 'julion', 'it'     from dual),
  9  subjects (subjectid, name) as
 10    (select 1, 'java'           from dual union all
 11     select 2, 'business stg'   from dual union all
 12     select 3, 'python'         from dual union all
 13     select 4, 'statistics'     from dual union all
 14     select 5, 'mgt accounting' from dual union all
 15     select 7, 'social studies' from dual union all
 16     select 8, 'ess english'    from dual),
 17  marks (id, studentid, subjectid, marks) as
 18    (select 1, 1, 1, 56 from dual union all
 19     select 2, 1, 2, 78 from dual union all
 20     select 3, 1, 7, 83 from dual union all
 21     select 4, 1, 3, 45 from dual union all
 22     select 5, 1, 5, 63 from dual union all
 23     --
 24     select 6, 2, 1, 99 from dual union all
 25     select 7, 3, 2, 10 from dual union all
 26     select 8, 4, 7, 83 from dual)
 27  --
 28  select b.name subject, s.name student, m.marks
 29  from marks m join subjects b on b.subjectid = m.subjectid
 30  join studentdata s on s.studentid = m.studentid
 31  where m.marks = (select max(m1.marks)
 32                   from marks m1
 33                   where m1.subjectid = m.subjectid
 34                  )
 35  order by b.name, s.name;

SUBJECT        STUDENT      MARKS
-------------- ------- ----------
business stg   olivier         78
java           noah            99
mgt accounting olivier         63
python         olivier         45
social studies mason           83
social studies olivier         83

6 rows selected.

SQL>

SQL - nested set model - how do I find the deepest parents (not leaf nodes) of each branch

You're right, the first step is to identify the leaf nodes through their property right equals left + 1. Next step is to find all parent nodes for those leaf nodes: these have a left smaller than that of the leaf and right larger than that of the leaf. And the final step is to exclude all parents but the one that has the largest left value (i.e. is closest to the leaf node).

An example in T-SQL, where l is the leaf node, p1 is the direct parent we're looking for and p2 is used to weed out all the non-direct parents.

First set up a test table with your example data in it:

if object_id('tempdb..#nsm') is not null
    drop table #nsm;

select v.node, v.parent, v.lft, v.rgt
into #nsm
from (
    values 
       ('A'  ,     NULL,     1,   16),
       ('B'  ,    'A'   ,    2,   15),
       ('E'  ,    'B'   ,    3,    8),
       ('F'  ,    'E'   ,    4,    5),
       ('H'  ,    'E'   ,    6,    7),
       ('C'  ,    'B'   ,    9,   14),
       ('D'  ,    'C'   ,   10,   11),
       ('G'  ,    'C'   ,   12,   13)
   ) v (node, parent, lft, rgt)

And here's the query that retrieves the both nodes C and E you requested:

select p1.node, p1.parent, p1.lft, p1.rgt
from #nsm p1
where exists (
            select *
            from #nsm l
            where l.rgt = l.lft + 1
                and p1.lft < l.lft
                and p1.rgt > l.rgt
                and not exists (
                        select *
                        from #nsm p2
                        where p2.lft < l.lft
                            and p2.rgt > l.rgt
                            and p2.lft > p1.lft
                    )
        )

Nested Set Query to retrieve all ancestors of each node

First it's important to understand that you have an implicit GROUP BY

If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.

To make the point more understandable I'll leave out subqueries and reduce the problem to the banana. Banana is the set [10, 11]. The correct sorted ancestors are those:

SELECT "banana" as node, GROUP_CONCAT(title ORDER by `left`)
  FROM Tree WHERE `left` < 10 AND `right` > 11
  GROUP BY node;

The ORDER BY must be in GROUP_CONCAT() as you want the aggregation function to sort. ORDER BY outside sorts by the aggregated results (i.e. the result of GROUP_CONCAT()). The fact that it worked until level 4 is just luck. ORDER BY has no effect on an aggregate function. You would get the same results with or without the ORDER BY:

SELECT GROUP_CONCAT(title)
  FROM Tree WHERE `left` < 10 AND `right` > 11
  /* ORDER BY `left` */

It might help to understand what
SELECT GROUP_CONCAT(title ORDER BY left) FROM Tree WHERE … ORDER BY left does:

Get a selection (WHERE) which results in three rows in an undefined order:
```
("Food")
("Yellow")
("Fruit")
```
Aggregate the result into one row (implicit GROUP BY) in order to be able to use an aggregate function:
```
(("Food","Yellow", "Fruit"))
```
Fire the aggregate function (GROUP_CONCAT(title, ORDER BY link)) on it. I.e. order by link and then concatenate:
```
("Food,Fruit,Yellow")
```
And now finally it sorts that result (ORDER BY). As it's only one row, sorting changes nothing.
```
("Food,Fruit,Yellow")
```

Nested Set SQL Query to retrieve the first ancestor of each node

As I understand your question, you want to display the level 1 ancestor only. Starting from your existing query, you could just use conditional aggregation:

MAX(CASE WHEN p.level = 1 THEN p.sid END) as l1_ancestor

This is a bit closer to what you ask for, and would produce the result that you want:

CASE WHEN n.level = 1 THEN n.sid ELSE MAX(CASE WHEN p.level = 1 THEN p.sid END) END res l1_ancestor

Demo on DB Fiddle:


sid | name | ancestors | l1_ancestor
--: | :--- | :-------- | ----------:
  1 | Root | null      |        null
  2 | A    | 1         |           2
  3 | B    | 1,2       |           2
  4 | C    | 1,2       |           2
  5 | D    | 1,2,4     |           2
  6 | E    | 1         |           6
  7 | F    | 1,6       |           6
  8 | G    | 1,6,7     |           6
  9 | H    | 1,6,7     |           6
 10 | I    | 1,6,7,9   |           6
 11 | J    | 1         |          11
 12 | K    | 1,11      |          11

Materializing the path of Nested Set hierarchy in T-SQL

After your comments, I realized no need for the CTE... you already have the range keys.

Example

Select A.*
      ,Path = Replace(Path,'>','>')
 From YourTable A
 Cross Apply (
                Select Path = Stuff((Select ' > ' +Name 
                                      From (
                                            Select LeftAnchor,Name
                                             From  YourTable
                                             Where A.LeftAnchor between LeftAnchor and RightAnchor 
                                           ) B1
                                      Order By LeftAnchor
                                      For XML Path (''))
                                    ,1,6,'')
             ) B
 Order By LeftAnchor

Returns

Sample Image

Searching a nested set

I don't believe there's a perfectly clean and efficient way to do this with nested sets. Storing a list of the node's ancestors in a denormalized column would provide this efficiently, but I don't suggest implementing it.

There is an ok'ish method though, which is 1 query and will conveniently hit the index you already have. You're looking at one join for each level of depth of the target node.

For your example foo-bar-baz

select c3.* from categories c1 inner join categories c2 on c2.parent_id = c1.id AND c2.name = 'bar' inner join categories c3 on c3.parent_id = c2.id AND c2.name = 'baz' where c1.name = 'foo'

It's not the greatest, but it's probably your best bet unless you want to go to the effort of storing a bunch of denormalized information. It's fairly straight forward to generate the SQL in code as well.