Sorting Tree with a Materialized Path

Sorting tree with a materialized path?

I typically create an additional columnn for this, called something like SortPath. It would contain the data that you need to sort by, concatenated together. That column would be of type varchar, and get sorted as a string. Something like this:

id | parent_id | matpath |          created            |                   sortpath
---+-----------+---------+-----------------------------+--------------------------------------------------------------------------------------
2 | 1 | 1 | 2010-05-08 15:18:37.987544 | 2010-05-08 15:18:37.987544-2
6 | 2 | 1.2 | 2010-05-08 17:50:43.288759 | 2010-05-08 15:18:37.987544-2.2010-05-08 17:50:43.288759-6
8 | 6 | 1.2.6 | 2010-05-09 14:01:17.632695 | 2010-05-08 15:18:37.987544-2.2010-05-08 17:50:43.288759-6.2010-05-09 14:01:17.632695-8
3 | 1 | 1 | 2010-05-08 17:38:14.125377 | 2010-05-08 17:38:14.125377-3
4 | 1 | 1 | 2010-05-08 17:38:57.26743 | 2010-05-08 17:38:57.267430-4
5 | 1 | 1 | 2010-05-08 17:43:28.211708 | 2010-05-08 17:43:28.211708-5
9 | 5 | 1.5 | 2010-05-09 14:02:43.818646 | 2010-05-08 17:43:28.211708-5.2010-05-09 14:02:43.818646-9
7 | 1 | 1 | 2010-05-08 18:18:11.849735 | 2010-05-08 18:18:11.849735-7

A couple of things to note here:

  • sortpath will be sorted as a string, so it is important all dates have the same length for it to correctly sort. E.g., observe how 2010-05-08 17:38:57.26743 has an extra zero added in the sortpath column.
  • I have appended the PK of each node to the end of its date. This is so that if you happen to have two rows with the exact same date, they will always get returned in the same order due to the additional data we are appending.
  • To me, the data looks asymmetrical the way I have written it, because we are showing the current node's date in sortpath, but it is not in matpath. I would prefer to see it in both.
  • You may want to put the date of node ID 1 at the beginning of each sortcolumn as well. This is so that if you ever want to query for more than one forum at a time (you probably won't), then it will still sort correctly.

How do I properly sort a materialized paths in SQL?

Try this:

SELECT * FROM categories ORDER BY path + '/';

Produces:

/foo-it
/foo-it/boy
/foo
/foo/bar

/foo is sorted after /foo-it because /foo/ comes after /foo-.

You can fiddle around a bit like replacing - with something that comes after / in ordering and not allowed in paths or file name.

SELECT * FROM categories ORDER BY replace(path,'-','?') + '/';

Produces:

/foo
/foo/bar
/foo-it
/foo-it/boy

The best way to generate path pattern for materialized path tree structures

You are correct -- zero-padding each node ID would allow you to sort the entire tree quite simply. However, you have to make the padding width match the upper limit of digits of the ID field, as you have pointed out in your last example. E.g., if you're using an int unsigned field for your ID, the highest value would be 4,294,967,295. This is ten digits, meaning that the record set from your last example might look like:

uid   | name | tree_id
9999 | Tar | 0000000001.0000009999
10000 | Tor | 0000000001.0000010000

As long as you know you're not going to need to change your ID field to bigint unsigned in the future, this will continue work, though it might be a bit data-hungry depending on how huge your tables get. You could shave off two bytes per node ID by storing the values in hexadecimal, which would still be sorted correctly in a string sort:

uid   | name | tree_id
9999 | Tar | 00000001.0000270F
10000 | Tor | 00000001.00002710

I can imagine this would make things a real headache when trying to update the paths (pruning nodes, etc) though.

You can also create extra fields for sorting, e.g.:

uid   | name | tree_id           | name_sort
9999 | Tar | 00000001.0000270F | Ali.Tar
10000 | Tor | 00000001.00002710 | Ali.Tor

There are limitations, however, as laid out by this guy's answer to a similar materialized path sorting question. The name field would have to be padded to a set length (fortunately, in your example, each name seems to be three characters long), and it would take up a lot of space.

In conclusion, given the above issues, I've found that the most versatile way to do sorting like this is to simply do it in your application logic -- say, using a recursive function that builds a nested array, sorting the children of each node as it goes.

Creating a specific tree structure from materialized path

Finally I managed to code this, with a lot of debugging because there are a lot of traps.

the idea is to make the json in several passes, counting each time the number of remaining elements.

if this number does not decrease from one pass to another, then the pgm sends an error and stops.

for each element to add we calculate the supposed parent getParentKey() which returns a table of the list of all its parents

You must then find the direct parent in the table, using this list, starting with the root, if the parent does not exist then the element is kept in a table and will be retried next.

const entities = 

[ { id: 'xi32', name: 'Some name', path: ',templateId1,' }

, { id: 'x382', name: 'Some name 2', path: ',templateId1,xi32,' }

, { id: '2oxwo', name: 'Some name 3', path: ',templateId1,xi32,x382,' }

, { id: '2', name: '2', path: ',templateId1,' }

, { id: '2-2', name: '2-2', path: ',templateId1,2,' }

, { id: '3-3', name: '3-3', path: ',templateId1,3,' }

, { id: '3-3-3', name: '3-3-3', path: ',templateId1,3,3-3,' }

, { id: '3-3-3-3', name: '3-3-3-3', path: ',templateId1,3,3-3,3-3-3,'}

, { id: '3', name: '3', path: ',templateId1,' }

];

const Result = []

;

let unDone = []

, source = entities

;

let cpt = 0 // just for stoping infinite loop on error

;

do

{

unDone = setResult( source, unDone.length )

source = unDone;

if (++cpt > 10) throw 'mince! something is rotten in the state of Denmark...'

}

while

(unDone.length>0)

;

/* --------------------------------------------------------*/

console.log( 'result===', JSON.stringify(Result,0,2 ) )

/* --------------------------------------------------------*/

function setResult( arrayIn, nb_rej )

{

let orphans = [];

for (let elData of arrayIn)

{

let newEl = { ...elData, children: null }

, parAr = getParentKey(elData.path)

if (parAr.length===0)

{ Result.push(newEl) }

else

{

let resParent = Result;

do

{

let rech = parAr.pop()

, fPar = resParent.find(treeElm=>(treeElm.id===rech.id && treeElm.path===rech.path))

;

if (fPar)

{

if (fPar.children===null)

fPar.children = []

;

resParent = fPar.children

}

else // throw `parent element not found : id:'${rech.id}', path:'${rech.path}'` ;

{

orphans.push( { ...elData } )

resParent = null

parAr.length = 0

}

}

while

(parAr.length>0)

;

if (resParent) resParent.push(newEl);

}

}

if ( orphans.length>0 && orphans.length == nb_rej )

throw ` ${nb_rej} children element(s) without parent !'`;

return orphans

}

function getParentKey( path )

{ // return array of parent element

let rep = []

, par = path

, lev, bKey, xCom, idK;

do

{

bKey = par.substring(0, par.lastIndexOf(',')) // remove last ','

lev = bKey.match(/,/g).length -1

if (lev>0)

{

xCom = bKey.lastIndexOf(',')

par = bKey.substring(0, xCom) +','

idK = bKey.substring(++xCom)

rep.push({ path:par, id:idK })

} }

while

(lev>0)

return rep

}
.as-console-wrapper { max-height: 100% !important; top: 0; }

Sorting a tree while maintaining structure

You're going to need a recursive query and a window function. It'll look similar to this:

with recursive
ordered_tree as (
select tree.*,
array[row_number() over w] as score_path
from tree
where depth = 1
window w as (order by tree.score desc)
union all
select tree.*,
parent.score_path || array[row_number() over w] as score_path
from tree
join ordered_tree as parent on parent.id = tree.parent_id
window w as (partition by tree.parent_id order by tree.score desc)
)
select *
from ordered_tree
order by score_path

Note: the above will be rather slow if your set is large...

Searching for the right-most node of a materialized path tree

Short answer: No.

Here is a SQLFiddle demonstrating the problem I described in my comment.

For this simple setup:

id, path
1, '1'
2, '1\2'
3, '1\3'
4, '1\4'
5, '1\5'
6, '1\6'
7, '1\7'
8, '1\8'
9, '1\9'
10, '1\10'

attempting to get the rightmost leaf (id = 10) with a simple sort will fail:

SELECT TOP 1
id,
path
FROM hierarchy
ORDER BY path DESC

returns:

id, path
9, 1\9

Because path is a text-based column, 1\10 will come after 1\9 in the descending sort (See the results of the second query in the fiddle).

Even if you began tracking depth and path length, which are generally cheap and easy to keep up with, it would be entirely possible to get paths like this:

path       depth  length
12\3\11\2 4 9
5\17\10\1 4 9

which would still not sort properly.

Even if you are using letters instead of numbers, this only pushes the problem horizon out to the 26th child instead of the 10th:

SQLFiddle using letters

I am not as familiar with materialized path operations as I am with nested set and adjacency lists, and have no experience with django, so I'll defer to others if there are methods of which I am unaware, but you will almost certainly have to perform some sort of parsing on the path column to consistently get the correct leaf.

EDIT - Having addressed the question of whether sorting is a valid solution, here are some additional notes on other potential solutions after a bit of discussion and thinking on the problem a bit:

-"Rightmost" is a vague term when nodes can have more than two children (i.e., the tree is not a binary tree). If a node has 10 children, which are to the left of the parent, and which are to the right? You must define this condition before you can define a solution to the problem.

-Once "rightmost" is properly defined for your problem space, understand that the rightmost node will not necessarily be on the lowest level of the tree:

        1
/ \
1\1 1\2 <= This is the rightmost node
/
1\1\1 <= This is the lowest node

-Once "rightmost" is defined, a simple loop can be used to programatically find the rightmost node:

//in pseudocode
function GetRightmostNode(Node startNode)
{
Node currentNode = startNode;

while(currentNode.RightChildren != null)
{
currentNode = maximum of currentNode.RightChildren;
}

return currentNode;
}

This loop will look for children of the current node to the right of the current node. If they exist, it selects the rightmost of the right children and repeats. Once it reaches a node with no children to its right, it returns the current node, as it has found the rightmost node of the tree (or subtree) with startNode as its root.

Materialized path sorting order by date + path

You just have to find the part after the first point.

Then you order DESC until the first part, and ASC after the last part.

SELECT * FROM comments 
WHERE post_id=10
ORDER by substring_index(path, '.', 1) DESC,
path ASC

Note that you have an error in your attached file in the third column, inverting 9972 and 9974.

I don't know if, in this case, the MySQL optimization engine uses the index set on path to sort the result. It should be more efficient to add a column to your model.



Related Topics



Leave a reply



Submit