Hierarchical Data in MySQL

Retrieving Tree in Hierarchical data in MySQL

Assuming the data structure is fixed with a good PK, in MySQL 8.x you can do:

with recursive
n (id, name, ref_id, lvl) as (
select id, name, ref_id, 1 from category where id = 2 -- starting node
union all
select c.id, c.name, c.ref_id, n.lvl + 1
from n
join category_relation r on r.parent_ref_id = n.ref_id
join category c on c.ref_id = r.child_ref_id
)
select * from n where lvl <= 3

Result:

 id   name                                    ref_id  lvl 
---- --------------------------------------- ------- ---
2 computing 0 1
3 artificial intelligence 1 2
4 data science 2 2
7 web technologies 5 2
9 content technologies 8 2
10 operating systems 9 2
11 algorithms 10 2
62 information science 61 2
103 software / systems development 102 2
165 scientific computing 165 2
296 image processing 316 2
297 text processing 317 2
301 Google 321 2
322 computer vision 343 2
5 machine learning (ML) 3 3
5 machine learning (ML) 3 3
6 programming 4 3
18 models 17 3
21 classification 20 3
27 data preparation 26 3
28 data analysis 27 3
29 imbalanced datasets 28 3
50 visualization 49 3
61 information retrieval 60 3
68 k-means 67 3
71 Random Forest algorithm 70 3
104 project management 103 3
105 software development methodologies 104 3
107 web development 106 3
113 kNN model 112 3
132 CRISP-DM methodology 131 3
143 data 142 3
153 SMOTE 153 3
154 MSMOTE 154 3
157 backward feature elimination 157 3
158 forward feature selection 158 3
176 deep feature synthesis (DFS) 177 3
196 unsupervised learning 197 3
210 mean-shift 211 3
212 DBSCAN 213 3
246 naïve Bayes algorithm 247 3
248 decision tree algorithm 249 3
249 support vector machine (SVM) algorithm 250 3
251 neural networks 252 3
252 artificial neural networks (ANN) 253 3
281 deep learning 300 3
281 deep learning 300 3
285 image classification 304 3
285 image classification 304 3
286 natural language processing (NLP) 305 3
286 natural language processing (NLP) 305 3
288 text representation 307 3
294 visual recognition 314 3
295 optical character recognition (OCR) 315 3
295 optical character recognition (OCR) 315 3
296 image processing 316 3
298 machine translation (MT) 318 3
299 speech recognition 319 3
300 TensorFlow 320 3
302 R 322 3
304 Android 324 3
322 computer vision 343 3
323 object detection 344 3
324 instance segmentation 345 3
325 edge detection 346 3
326 image filters 347 3
327 feature maps 348 3
328 stride 349 3
329 padding 350 3
335 text preprocessing 356 3
336 tokenization 357 3
337 case normalization 358 3
338 removing punctuation 359 3
339 stop words 360 3
340 stemming 361 3
341 lemmatization 362 3
342 Porter algorithm 363 3
350 word2vec 371 3
351 Skip-gram 372 3
364 convnets 385 3
404 multiplicative update algorithm 716 3

If you want to remove duplicates you can use DISTINCT. For example:

with recursive
n (id, name, ref_id, lvl) as (
select id, name, ref_id, 1 from category where id = 2 -- starting node
union all
select c.id, c.name, c.ref_id, n.lvl + 1
from n
join category_relation r on r.parent_ref_id = n.ref_id
join category c on c.ref_id = r.child_ref_id
)
select distinct * from n where lvl <= 3

See running example at DB Fiddle.

MySQL - Best method to handle this hierarchical data?

Quassnoi has run some performance tests on the nested sets model and the adjacency list model and documented the results and recommendations in his blog post Adjacency list vs. nested sets: MySQL. The executive summary is:

  • Nested sets is faster for fetching all child nodes or all parent nodes.
  • Nested sets is a bad idea if you frequently need to update the table.

Here is the conclusion from his article:

In MySQL, the nested sets model should be preferred if the updates to the hierarhical structure are infrequent and it is affordable to lock the table for the duration of an update (which can take minutes on a long table).

This implies creating the table using MyISAM storage engine, creating the bounding box of a GEOMETRY type as described above, indexing it with a SPATIAL index and persisting the level in the table.

If the updates to the table are frequent or it is inaffordable to lock the table for a long period of time implied by an update, then the adjacency list model should be used to store the hierarchical data.

This requires creating a function to query the table.

The rest of the article shows how to define the table, implement the queries and gives performance measurements. The use of the spatial index is a clever idea to improve the performance of the nested set model that might be new to you.


If you're also considering approaches without MySQL then you might want to look at PostgreSQL which is another free and open-source database. PostgreSQL supports recursive queries in the form of recursive common table expressions which make querying heirarchical data easier than in MySQL and also give better performance. Quassnoi has also written an article Adjacency list vs. nested sets: PostgreSQL that shows the details.

While we are talking about looking at other approaches, Oracle's database is also worth a mention. Oracle also have a custom extension CONNECT BY which make querying heirarchical data very easy and fast. Quassnoi's article Adjacency list vs. nested sets: Oracle again covers the performance details. The query you need to get all children is extremely simple in this case:

SELECT *
FROM yourtable
START WITH id = 42
CONNECT BY parent = PRIOR id

Fetching hierarchical data in mysql

the version i checked is 8.0.22 – Yogus

WITH RECURSIVE
cte AS ( SELECT *
FROM user
WHERE IMMEDIATE_SUPERIOR_ID = 432
UNION ALL
SELECT user.*
FROM user
JOIN cte ON cte.user_id = user.IMMEDIATE_SUPERIOR_ID )
SELECT cte.user_id, 432 IMMEDIATE_SUPERIOR_ID
FROM cte;

https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=3acc566a4a7f356c0c48764be600c69a


If you need to gather the data for more than one root user then

WITH RECURSIVE
cte AS ( SELECT *, user_id UPPER_SUPERIOR_ID
FROM user
WHERE IMMEDIATE_SUPERIOR_ID IS NULL
-- AND user_id IN ( {needed root users list} )
UNION ALL
SELECT user.user_id, user.IMMEDIATE_SUPERIOR_ID, cte.UPPER_SUPERIOR_ID
FROM user
JOIN cte ON cte.user_id = user.IMMEDIATE_SUPERIOR_ID )
SELECT cte.user_id, UPPER_SUPERIOR_ID
FROM cte
-- WHERE IMMEDIATE_SUPERIOR_ID IS NOT NULL
-- ORDER BY UPPER_SUPERIOR_ID, user_id
;

https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=de8997ad5d484eec9d6c7a8c1844f1c5

Mysql hierarchical data question. Read in the description

To make the queries (given in the post you mentioned) work to solve your problem, you have to make a small change as given below:

For MySQL 8+: In where clause use IN instead of = and give any number of values you want desired result for.

with recursive cte (id, name, group_id) as (
select id,
name,
group_id
from Table1
where group_id IN (1,2)
union all
select p.id,
p.name,
p.group_id
from Table1 p
inner join cte
on p.group_id = cte.id
)
select * from cte;

Link for Demo: DB_Fiddle

For MySQL 5.x: In select @pv you have to provide list of ids for which you want the desired result like given below:

select  id,
name,
group_id
from (select * from Table1
order by group_id, id) products_sorted,
(select @pv := "1,2") initialisation
where find_in_set(group_id, @pv)
and length(@pv := concat(@pv, ',', id))

Link for Demo: SQL_Fiddle



Related Topics



Leave a reply



Submit