Roll Your Own Linked List/Tree in R

Roll Your Own Linked List/Tree in R?

A linked list in R can be represented as a vector, typically a list. You don't need to write special code to reference the next and previous items, because R does it for you via indexing.

To add a new item to the list, just keep track of its length and assign to the next in line.

lst <- list() # creates an empty (length zero) list
lst[[1]] <- 1 # automagically extends the lst
lst[[2]] <- 2 # ditto

This can be inefficient for long lists because of the way R handles memory. If possible, create the list in advance, and assign their contents as they become available.

lst <- list(1, 2, 3, 4, 5)    # a list of 5 items

lst <- vector("list", 10000) # 10000 NULLs
lst[[1]] <- 1
lst[[10000]] <- 10000 # lst now contains 1, NULL, ..., NULL, 10000

Deleting an item from the list can be accomplished with negative indexes.

lst <- list(1, 2, 3, 4, 5)
lst <- lst[-2] # now contains 1, 3, 4, 5

A tree is just a list containing other lists.

tree <- list(list(1, 2), list(3, list(4, 5)))

# left child: list(1, 2)
tree[[1]]

# right child
tree[[2]]

# right child of right child:list(4, 5)
tree[[2]][[2]]

By default there is no built-in enforcing of the structure, eg only two children per node for a binary tree. More structured approaches are available via S4 classes, but this will do the job in a pinch.

Implementing classification tree with lists and vectors

By playing a little bit with data.tree in R, I constructed this

library(data.tree)
my.tree <- Node$new('my tree')
my.tree$key <- 1
my.tree$var.name <- 'blahblah'

function <- insert.Node(tree=NULL, key=1, var.name='abcd'){

if (is.null(tree$key)){
# Creation of root
tree = Node$new(paste(var.name, " < ", key, sep = ''))
tree$key <- key
} else if (key < tree$key) {
# Left child
tree$AddChildNode(insert.Node(tree$children[[1]], key, var.name))
} else {
# Right child
tree$AddChildNode(insert.Node(tree$children[[2]], key, var.name))
}

}

tree <- insert.Node(tree=my.tree, key = 4, var.name = 'hello world')

Hopefully, this helps.

R - partykit - custom model tree nodes and splits

In principle, it is possible to build a "modelparty" object (as returned by mob()) by hand. However, there is not a simple coercion function like as.constparty() in the example you cite. The reason is that for building the model trees, and for printing and especially predicting with them, much more detailed information about the model is needed.

I would recommend to build the tree structure ("partynode") first, then fill the overall $info slot (with call, formula, Formula, terms, fit, control, dots, and nreg). And then it should be possible to call refit.modelparty() to refit the models in all the terminal nodes. And this can then be used to fill the $node$info (with coefficients, objfun, nobs, ...).

Disclaimer: All of this is completely untested. But instead of mimicking "modelparty" you could, of course, also create your own way of storing the models in the "party" object and just use more basic building blocks provided by partykit.

Node link diagram in R using Rpart.plot and rattle

If you are going tidymodels and parsnip to fit your model, it's better to use that actual fitted model for any visualizations like this. You can get the underlying engine object from a parsnip model using $fit.

library(tidyverse)
library(tidymodels)
library(rattle)
#> Loading required package: bitops
#> Rattle: A free graphical interface for data science with R.
#> Version 5.4.0 Copyright (c) 2006-2020 Togaware Pty Ltd.
#> Type 'rattle()' to shake, rattle, and roll your data.
data(kyphosis, package = "rpart")

tree_fit <- decision_tree(min_n = 20) %>%
set_engine("rpart") %>%
set_mode("classification") %>%
fit(Kyphosis ~ Age + Number + Start,
data = kyphosis)

fancyRpartPlot(tree_fit$fit, sub = "")

Sample Image

Created on 2021-05-25 by the reprex package (v2.0.0)

For some kinds of visualizations, you will need to use repair_call().

Is there a collections library in R?

As far as I know, there is no such package that implements all the features of collections. Dataframes are the easiest and fun way to manipulate data. However, we can use lists for most of the functionality of linked-lists, stack and queues. This is how it is done.

Edit: For optimal results, implementation of linkedlists using lists is not recommended because of the way in which R allocates memory.

Hope it helped!

How to save a data.tree in R?

Your question is a bit vague as I have no clue whatsoever what type of object you are referring to. However, this seems to work for me.

data("acme")
a = acme
class(a)
[1] "Node" "R6"

x = tempdir()
setwd(tempdir())
save(a, file = 'test.Rdata')
rm(a)
load('test.Rdata')
a

Calculate number of observations in each node in a decision tree in R?

It seems that we can do "rolling splits" to get what you are looking for. The logic is as follows.

  1. Start with a stack with only one dataframe dat.
  2. For each pair of variableName and splitValue, if they are not NAs, split the top dataframe on that stack into two sub dataframes identified by variableName <= splitValue and variableName > splitValue (the former on top of the latter); if they are NAs, then simply pop the top dataframe.

Here is the code. Note that this kind of state-dependent computation is hard to vectorize. It's thus not what R is good at. If you have a lot of trees and the code performance becomes a serious concern, I'd suggest rewriting the code below using Rcpp.

eval_node <- function(df, x, v) {
out <- vector("list", length(x))
stk <- vector("list", sum(is.na(x)))
pos <- 1L
stk[[pos]] <- df
for (i in seq_along(x)) {
if (!is.na(x[[i]])) {
subs <- pos + c(0L, 1L)
stk[subs] <- split(stk[[pos]], stk[[pos]][[x[[i]]]] <= v[[i]])
names(stk)[subs] <- trimws(paste0(
names(stk[pos]), ",", x[[i]], c(">", "<="), v[[i]]
), "left", ",")
out[[i]] <- rev(stk[subs])
pos <- pos + 1L
} else {
out[[i]] <- stk[pos]
stk[[pos]] <- NULL
pos <- pos - 1L
}
}
out
}

Then you can apply the function like this.

library(dplyr)

df %>% group_by(treeNo) %>% mutate(node = eval_node(dat, variableName, splitValue))

Output

# A tibble: 15 x 4
# Groups: treeNo [3]
variableName splitValue treeNo node
<chr> <dbl> <dbl> <list>
1 x2 0.542 1 <named list [2]>
2 x1 0.126 1 <named list [2]>
3 NA NA 1 <named list [1]>
4 NA NA 1 <named list [1]>
5 NA NA 1 <named list [1]>
6 x2 0.655 2 <named list [2]>
7 NA NA 2 <named list [1]>
8 NA NA 2 <named list [1]>
9 x5 0.418 3 <named list [2]>
10 x4 0.234 3 <named list [2]>
11 NA NA 3 <named list [1]>
12 NA NA 3 <named list [1]>
13 x3 0.747 3 <named list [2]>
14 NA NA 3 <named list [1]>
15 NA NA 3 <named list [1]>

, where node looks like this

[[1]]
[[1]]$`x2<=0.542`
x1 x2 x3 x4 x5
3 0.55232243 0.2803538 0.5383487 0.3486920 0.7775844
4 0.05638315 0.3984879 0.7489722 0.9541577 0.8273034
7 0.81240262 0.2046122 0.7703016 0.1804072 0.7803585
8 0.37032054 0.3575249 0.8819536 0.6293909 0.8842270
9 0.54655860 0.3594751 0.5490967 0.9895641 0.2077139

[[1]]$`x2>0.542`
x1 x2 x3 x4 x5
1 0.3077661 0.6249965 0.5358112 0.4883060 0.3306605
2 0.2576725 0.8821655 0.7108038 0.9285051 0.8651205
5 0.4685493 0.7625511 0.4201015 0.6952741 0.6033244
6 0.4837707 0.6690217 0.1714202 0.8894535 0.4912318
10 0.1702621 0.6902905 0.2777238 0.1302889 0.3070859

[[2]]
[[2]]$`x2<=0.542,x1<=0.126`
x1 x2 x3 x4 x5
4 0.05638315 0.3984879 0.7489722 0.9541577 0.8273034

[[2]]$`x2<=0.542,x1>0.126`
x1 x2 x3 x4 x5
3 0.5523224 0.2803538 0.5383487 0.3486920 0.7775844
7 0.8124026 0.2046122 0.7703016 0.1804072 0.7803585
8 0.3703205 0.3575249 0.8819536 0.6293909 0.8842270
9 0.5465586 0.3594751 0.5490967 0.9895641 0.2077139

[[3]]
[[3]]$`x2<=0.542,x1<=0.126`
x1 x2 x3 x4 x5
4 0.05638315 0.3984879 0.7489722 0.9541577 0.8273034

[[4]]
[[4]]$`x2<=0.542,x1>0.126`
x1 x2 x3 x4 x5
3 0.5523224 0.2803538 0.5383487 0.3486920 0.7775844
7 0.8124026 0.2046122 0.7703016 0.1804072 0.7803585
8 0.3703205 0.3575249 0.8819536 0.6293909 0.8842270
9 0.5465586 0.3594751 0.5490967 0.9895641 0.2077139

[[5]]
[[5]]$`x2>0.542`
x1 x2 x3 x4 x5
1 0.3077661 0.6249965 0.5358112 0.4883060 0.3306605
2 0.2576725 0.8821655 0.7108038 0.9285051 0.8651205
5 0.4685493 0.7625511 0.4201015 0.6952741 0.6033244
6 0.4837707 0.6690217 0.1714202 0.8894535 0.4912318
10 0.1702621 0.6902905 0.2777238 0.1302889 0.3070859

[[6]]
[[6]]$`x2<=0.6547`
x1 x2 x3 x4 x5
1 0.30776611 0.6249965 0.5358112 0.4883060 0.3306605
3 0.55232243 0.2803538 0.5383487 0.3486920 0.7775844
4 0.05638315 0.3984879 0.7489722 0.9541577 0.8273034
7 0.81240262 0.2046122 0.7703016 0.1804072 0.7803585
8 0.37032054 0.3575249 0.8819536 0.6293909 0.8842270
9 0.54655860 0.3594751 0.5490967 0.9895641 0.2077139

[[6]]$`x2>0.6547`
x1 x2 x3 x4 x5
2 0.2576725 0.8821655 0.7108038 0.9285051 0.8651205
5 0.4685493 0.7625511 0.4201015 0.6952741 0.6033244
6 0.4837707 0.6690217 0.1714202 0.8894535 0.4912318
10 0.1702621 0.6902905 0.2777238 0.1302889 0.3070859

[[7]]
[[7]]$`x2<=0.6547`
x1 x2 x3 x4 x5
1 0.30776611 0.6249965 0.5358112 0.4883060 0.3306605
3 0.55232243 0.2803538 0.5383487 0.3486920 0.7775844
4 0.05638315 0.3984879 0.7489722 0.9541577 0.8273034
7 0.81240262 0.2046122 0.7703016 0.1804072 0.7803585
8 0.37032054 0.3575249 0.8819536 0.6293909 0.8842270
9 0.54655860 0.3594751 0.5490967 0.9895641 0.2077139

[[8]]
[[8]]$`x2>0.6547`
x1 x2 x3 x4 x5
2 0.2576725 0.8821655 0.7108038 0.9285051 0.8651205
5 0.4685493 0.7625511 0.4201015 0.6952741 0.6033244
6 0.4837707 0.6690217 0.1714202 0.8894535 0.4912318
10 0.1702621 0.6902905 0.2777238 0.1302889 0.3070859

[[9]]
[[9]]$`x5<=0.418`
x1 x2 x3 x4 x5
1 0.3077661 0.6249965 0.5358112 0.4883060 0.3306605
9 0.5465586 0.3594751 0.5490967 0.9895641 0.2077139
10 0.1702621 0.6902905 0.2777238 0.1302889 0.3070859

[[9]]$`x5>0.418`
x1 x2 x3 x4 x5
2 0.25767250 0.8821655 0.7108038 0.9285051 0.8651205
3 0.55232243 0.2803538 0.5383487 0.3486920 0.7775844
4 0.05638315 0.3984879 0.7489722 0.9541577 0.8273034
5 0.46854928 0.7625511 0.4201015 0.6952741 0.6033244
6 0.48377074 0.6690217 0.1714202 0.8894535 0.4912318
7 0.81240262 0.2046122 0.7703016 0.1804072 0.7803585
8 0.37032054 0.3575249 0.8819536 0.6293909 0.8842270

[[10]]
[[10]]$`x5<=0.418,x4<=0.234`
x1 x2 x3 x4 x5
10 0.1702621 0.6902905 0.2777238 0.1302889 0.3070859

[[10]]$`x5<=0.418,x4>0.234`
x1 x2 x3 x4 x5
1 0.3077661 0.6249965 0.5358112 0.4883060 0.3306605
9 0.5465586 0.3594751 0.5490967 0.9895641 0.2077139

[[11]]
[[11]]$`x5<=0.418,x4<=0.234`
x1 x2 x3 x4 x5
10 0.1702621 0.6902905 0.2777238 0.1302889 0.3070859

[[12]]
[[12]]$`x5<=0.418,x4>0.234`
x1 x2 x3 x4 x5
1 0.3077661 0.6249965 0.5358112 0.4883060 0.3306605
9 0.5465586 0.3594751 0.5490967 0.9895641 0.2077139

[[13]]
[[13]]$`x5>0.418,x3<=0.747`
x1 x2 x3 x4 x5
2 0.2576725 0.8821655 0.7108038 0.9285051 0.8651205
3 0.5523224 0.2803538 0.5383487 0.3486920 0.7775844
5 0.4685493 0.7625511 0.4201015 0.6952741 0.6033244
6 0.4837707 0.6690217 0.1714202 0.8894535 0.4912318

[[13]]$`x5>0.418,x3>0.747`
x1 x2 x3 x4 x5
4 0.05638315 0.3984879 0.7489722 0.9541577 0.8273034
7 0.81240262 0.2046122 0.7703016 0.1804072 0.7803585
8 0.37032054 0.3575249 0.8819536 0.6293909 0.8842270

[[14]]
[[14]]$`x5>0.418,x3<=0.747`
x1 x2 x3 x4 x5
2 0.2576725 0.8821655 0.7108038 0.9285051 0.8651205
3 0.5523224 0.2803538 0.5383487 0.3486920 0.7775844
5 0.4685493 0.7625511 0.4201015 0.6952741 0.6033244
6 0.4837707 0.6690217 0.1714202 0.8894535 0.4912318

[[15]]
[[15]]$`x5>0.418,x3>0.747`
x1 x2 x3 x4 x5
4 0.05638315 0.3984879 0.7489722 0.9541577 0.8273034
7 0.81240262 0.2046122 0.7703016 0.1804072 0.7803585
8 0.37032054 0.3575249 0.8819536 0.6293909 0.8842270


Related Topics



Leave a reply



Submit