Determine Level of Nesting in R

Determine level of nesting in R?

A little recursive function can do this for you:

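depth <- function(this, thisdepth = 0) {
  if (!is.list(this)) {
    return(thisdepth)   # a leaf: report the depth reached so far
  } else {
    # recurse into each element one level deeper and keep the deepest result
    return(max(unlist(lapply(this, depth, thisdepth = thisdepth + 1))))
  }
}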

If you've got package:testthat, here's a test set:

l1=list(1,2,3)
l2=list(1,2,l1,4)
l3=list(1,l1,l2,5)

require(testthat)
expect_equal(depth(l1),1)
expect_equal(depth(l2),2)
expect_equal(depth(l3),3)

Apologies for using lower-case L in variable names. Readability fail.
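One edge case: for an empty list, `lapply` returns nothing, so `max()` falls back to `-Inf` with a warning. Below is a sketch of a variant, here called `depth2` (a hypothetical name), that assumes an empty list should count as one level. It also swaps `unlist` + `lapply` for `vapply`, which checks that every recursive call returns a single number.

```r
# depth2 is a hypothetical variant of depth(); it assumes list() adds one level
depth2 <- function(this, thisdepth = 0) {
  if (!is.list(this)) {
    return(thisdepth)              # a leaf: report the depth reached so far
  }
  if (length(this) == 0) {
    return(thisdepth + 1)          # assumption: an empty list is one level deep
  }
  # recurse one level deeper; vapply insists on one number per element
  max(vapply(this, depth2, numeric(1), thisdepth = thisdepth + 1))
}

depth2(list())                  # 1 (depth() gives -Inf plus a warning)
depth2(list(1, list()))         # 2
depth2(list(1, list(2, 3)))     # 2
```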

How to run function on the deepest level only in a nested list?

We can recursively descend lst to find the maximum depth, then descend again, applying unique only at that depth. No packages are used.

maxDepth <- function(x, depth = 0) {
  if (is.list(x)) max(sapply(x, maxDepth, depth + 1))   # deepest child
  else depth                                            # leaf
}

lstUnique <- function(x, depth = maxDepth(x)) {
  if (depth == 0) unique(x)                             # reached the maximum depth
  else if (is.list(x)) lapply(x, lstUnique, depth - 1)  # go one level down
  else x                                                # shallower leaf: unchanged
}

lstUnique(lst)
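Since `lst` itself is not shown, here is a small made-up `lst` (an assumption, not the original data) with one leaf at depth 1 and two leaves at depth 2. Only the depth-2 leaves get deduplicated.

```r
maxDepth <- function(x, depth = 0) {
  if (is.list(x)) max(sapply(x, maxDepth, depth + 1))
  else depth
}

lstUnique <- function(x, depth = maxDepth(x)) {
  if (depth == 0) unique(x)
  else if (is.list(x)) lapply(x, lstUnique, depth - 1)
  else x
}

# Hypothetical input: a sits at depth 1, b$c and b$d at depth 2
lst <- list(a = c(1, 1, 2),
            b = list(c = c(3, 3, 4), d = c(5, 5)))

maxDepth(lst)       # 2
res <- lstUnique(lst)
res$a               # 1 1 2  (depth 1, left alone)
res$b$c             # 3 4    (depth 2, deduplicated)
res$b$d             # 5
```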

Variation using rapply

A variation of the above is to recursively add a class to each leaf equal to its depth. Then we can use rapply three times: first, to extract the classes and take the maximum to find the maximum depth; second, to apply unique only to the leaves whose class equals the maximum depth; and third, to remove any classes that unique did not strip because those leaves were not at the maximum depth. (The third rapply, i.e. the last line of code below, can be omitted if it is acceptable to leave the added classes on some leaves.)

addDepth <- function(x, depth = 0) {
  if (is.list(x)) lapply(x, addDepth, depth + 1)
  else structure(x, class = format(depth))   # tag the leaf with its depth, e.g. "2"
}
lst2 <- addDepth(lst)

mx <- max(as.numeric(rapply(lst2, class))) # max depth
lst3 <- rapply(lst2, unique, classes = format(mx), how = "replace")
rapply(lst3, as.vector, how = "replace")
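To make the class trick concrete, here is the same pipeline on a small made-up `lst` (an assumption). `rapply(lst2, class)` returns one depth tag per leaf, `unique` on the deepest leaves drops their class as a side effect, and `as.vector` cleans up the rest.

```r
addDepth <- function(x, depth = 0) {
  if (is.list(x)) lapply(x, addDepth, depth + 1)
  else structure(x, class = format(depth))   # class "1", "2", ... = leaf depth
}

# Hypothetical input: a at depth 1, b$c at depth 2
lst <- list(a = c(1, 1, 2), b = list(c = c(3, 3, 4)))
lst2 <- addDepth(lst)

rapply(lst2, class)                          # a = "1", b.c = "2"
mx <- max(as.numeric(rapply(lst2, class)))   # 2
lst3 <- rapply(lst2, unique, classes = format(mx), how = "replace")
out <- rapply(lst3, as.vector, how = "replace")
out$a     # 1 1 2
out$b$c   # 3 4
```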

Note on rapply

If you instead want to run unique on all leaves rather than only on the maximum-depth leaves, base R's rapply alone is enough:

rapply(lst, unique, how = "replace")
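For contrast, on a small made-up `lst` (an assumption), plain `rapply` deduplicates the shallower leaf too:

```r
# Hypothetical input: a at depth 1, b$c at depth 2
lst <- list(a = c(1, 1, 2), b = list(c = c(3, 3, 4)))

all_leaves <- rapply(lst, unique, how = "replace")
all_leaves$a     # 1 2  (depth-1 leaf is deduplicated as well)
all_leaves$b$c   # 3 4
```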

data.tree

This alternative requires a package. First we create a data.tree object dt, then traverse it, applying unique to the nodes that satisfy filterFun (nodes whose level equals the height of the tree).

library(data.tree)

dt <- as.Node(lst)
dt$Do(function(x) x$"1" <- unique(x$"1"),
      filterFun = function(x) x$level == dt$height)
print(dt, "1")

rrapply

The rrapply package provides an enhancement to rapply that can also pass a position vector, .xpos, whose length equals the depth of the current node. We use it first to calculate the maximum depth mx and then again to apply unique only at that depth. (The rrapply call has been updated to use how = "unlist" instead of applying unlist afterwards, as suggested in the comments.)

library(rrapply)

mx <- max(rrapply(lst, f = function(x, .xpos) length(.xpos), how = "unlist"))
uniq_mx <- function(x, .xpos) if (length(.xpos) == mx) unique(x) else x
rrapply(lst, condition = is.numeric, f = uniq_mx)

Test whether one factor is nested in another

Suppose you have two factors f and g, and want to know whether g is nested in f.

Method 1: For people who love linear algebra

Consider the design matrix for two factors:

Xf <- model.matrix(~ f + 0)
Xg <- model.matrix(~ g + 0)

If g is nested in f, then the column space of Xf must be a subspace of the column space of Xg. In other words, for any linear combination of Xf's columns, y = Xf %*% bf, the equation Xg %*% bg = y has an exact solution.

y <- Xf %*% rnorm(ncol(Xf))  ## some random linear combination of `Xf`'s columns
c(crossprod(round(.lm.fit(Xg, y)$residuals, 8))) ## least squares residuals
## if this is 0, you have nesting.
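Here is Method 1 wrapped in a function, as a sketch; `is_nested_lm` and the example factors are made up for illustration. In `g1` every level occurs under a single level of `f`; in `g2` both levels occur under both `a` and `b`, so the random `y` cannot be reproduced from `Xg`.

```r
# is_nested_lm is a hypothetical helper wrapping Method 1
is_nested_lm <- function(f, g, digits = 8) {
  Xf <- model.matrix(~ f + 0)
  Xg <- model.matrix(~ g + 0)
  y <- Xf %*% rnorm(ncol(Xf))     # random vector in the column space of Xf
  rss <- c(crossprod(round(.lm.fit(Xg, y)$residuals, digits)))
  rss == 0                        # zero residual: y lies in the column space of Xg
}

set.seed(1)
f  <- factor(c("a", "a", "b", "b"))
g1 <- factor(c("w", "x", "y", "z"))   # each g1 level sits inside one f level
g2 <- factor(c("w", "x", "w", "x"))   # w and x each appear with both a and b

is_nested_lm(f, g1)   # TRUE
is_nested_lm(f, g2)   # FALSE
```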

Method 2: For people who love statistics

We check the contingency table:

M <- table(f, g)

If every column has exactly one non-zero entry, g is nested in f. In other words:

all(colSums(M > 0L) == 1L)
## `TRUE` if you have nesting
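A sketch of Method 2 as a function, assuming `f` and `g` are factors; `is_nested_tab` is a hypothetical name. The `droplevels()` calls are an addition to the original: an unused level of `g` produces an all-zero column in `table(f, g)`, which would make the check return `FALSE` even when the observed data are nested.

```r
# is_nested_tab is a hypothetical helper; f and g are assumed to be factors
is_nested_tab <- function(f, g) {
  M <- table(droplevels(f), droplevels(g))   # drop unused levels first
  all(colSums(M > 0L) == 1L)                 # each g level under exactly one f level
}

f  <- factor(c("a", "a", "b", "b"))
g1 <- factor(c("w", "x", "y", "z"))
g2 <- factor(c("w", "x", "w", "x"))
g3 <- factor(c("w", "x", "y", "z"), levels = c("w", "x", "y", "z", "unused"))

is_nested_tab(f, g1)   # TRUE
is_nested_tab(f, g2)   # FALSE
is_nested_tab(f, g3)   # TRUE (FALSE without droplevels)
```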

Comment: Either method can easily be squeezed into one line.

Check if a list is nested or not

You can use the is.list function:

any(sapply(x.1, is.list))
[1] FALSE

any(sapply(x.2, is.list))
[1] TRUE

As a function isNested:

isNested <- function(l) {
  stopifnot(is.list(l))
  for (i in l) {
    if (is.list(i)) return(TRUE)
  }
  return(FALSE)
}

Instead of testing all list elements, the function stops as soon as it detects a nested list.
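Since `x.1` and `x.2` are not shown, here are made-up stand-ins (assumptions) together with the function, so the example runs on its own:

```r
isNested <- function(l) {
  stopifnot(is.list(l))
  for (i in l) {
    if (is.list(i)) return(TRUE)   # stop at the first nested element
  }
  FALSE
}

# Hypothetical inputs standing in for x.1 and x.2
x.1 <- list(1, "a", TRUE)
x.2 <- list(1, list(2, 3))

isNested(x.1)   # FALSE
isNested(x.2)   # TRUE
```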

`group_by` and keep grouping levels as nested data frame's name

You need to add setNames() in the map() step:

library(tidyverse)

warpbreaks %>%
  group_by(tension) %>%
  nest() %>%
  ungroup() %>%
  mutate(models = map(data, ~ glm(breaks ~ wool, data = .x)),
         jt = map(models, ~ emmeans::joint_tests(.x, data = .x$data)),
         means = map(models, ~ emmeans::emmeans(.x, "wool", data = .x$data)),
         p_cont = setNames(map(means,
                               ~ emmeans::contrast(.x, "pairwise", infer = c(TRUE, TRUE))),
                           .$tension))

If you want to name all of the list-columns, use across() with a lambda (passing extra arguments through across()'s ... is deprecated as of dplyr 1.1.0):

warpbreaks %>%
  group_by(tension) %>%
  nest() %>%
  ungroup() %>%
  mutate(models = map(data, ~ glm(breaks ~ wool, data = .x)),
         jt = map(models, ~ emmeans::joint_tests(.x, data = .x$data)),
         means = map(models, ~ emmeans::emmeans(.x, "wool", data = .x$data)),
         p_cont = map(means, ~ emmeans::contrast(.x, "pairwise", infer = c(TRUE, TRUE))),
         across(models:p_cont, ~ setNames(.x, tension))) -> result

result$jt

#$L
# model term df1 df2 F.ratio p.value
# wool         1 Inf   5.653  0.0174

#$M
# model term df1 df2 F.ratio p.value
# wool         1 Inf   1.253  0.2630

#$H
# model term df1 df2 F.ratio p.value
# wool         1 Inf   2.321  0.1277
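For comparison, a swapped-in base R sketch (not the tidyverse/emmeans approach above, and it only fits the glm models): `split()` names each piece by its tension level and `lapply()` carries those names through, so no `setNames()` step is needed.

```r
# Base R alternative: one glm per tension level, named automatically by split()
fits <- lapply(split(warpbreaks, warpbreaks$tension),
               function(d) glm(breaks ~ wool, data = d))

names(fits)                                     # "L" "M" "H"
sapply(fits, function(m) coef(m)[["woolB"]])    # wool B effect, named by tension
```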

Finding first place in nested list where some member doesn't exist

We can loop through the nested list with lapply, keep an element only when "occupation" is not among its names (tested with %in%), and then use Filter(length, ...) to drop the resulting zero-length elements at both levels:

Filter(length,                      # drop groups that ended up empty
       lapply(li, function(x)
         Filter(length,             # drop members that have an occupation
                lapply(x, function(nm) nm[!"occupation" %in% names(nm)]))))
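Since `li` is not shown, here is a made-up `li` (an assumption): a list of teams, each a list of people. Note that `!` binds more loosely than `%in%`, so `!"occupation" %in% names(nm)` means "occupation is not among the names", and `nm[FALSE]` yields an empty list that `Filter(length, ...)` then removes.

```r
# Hypothetical data: li is a list of teams, each team a list of people
li <- list(
  team1 = list(
    ann = list(name = "Ann", occupation = "chemist"),
    bob = list(name = "Bob")                 # no occupation
  ),
  team2 = list(
    cat = list(name = "Cat", occupation = "pilot")
  )
)

missing_occ <- Filter(length, lapply(li, function(x)
  Filter(length, lapply(x, function(nm) nm[!"occupation" %in% names(nm)]))))

str(missing_occ)   # only team1 -> bob survives
```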

Nested lists: how to define the size before entering data

You can do that using a recursive function.

rec.list <- function(len) {
  if (length(len) == 1) {
    vector("list", len)
  } else {
    lapply(1:len[1], function(...) rec.list(len[-1]))
  }
}

l <- rec.list(c(2, 3, 3, 4, 2, 3, 3))
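A quick check of the shape that comes out. Each level is a plain list, so you index one level at a time with `[[`, and the innermost level holds `NULL` slots ready to be filled. (This sketch uses `seq_len(len[1])`, which behaves like `1:len[1]` for positive sizes.)

```r
rec.list <- function(len) {
  if (length(len) == 1) {
    vector("list", len)                               # innermost level: NULL slots
  } else {
    lapply(seq_len(len[1]), function(...) rec.list(len[-1]))
  }
}

l <- rec.list(c(2, 3, 3, 4, 2, 3, 3))
length(l)                                   # 2
length(l[[1]])                              # 3
length(l[[2]][[3]][[3]][[4]][[2]][[3]])     # 3 (innermost list)
l[[2]][[3]][[3]][[4]][[2]][[3]][[1]] <- "content"
```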

Alternatively, you could use a 7-dimensional list array. It may look bizarre at first, but it is a perfectly valid data structure.

l <- vector("list", 2*3*3*4*2*3*3)
dim(l) <- c(2, 3, 3, 4, 2, 3, 3)
l[[1,1,1,1,1,1,1]] <- "content"
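With the list array, all seven indices go inside a single `[[ ]]`, every cell starts out as `NULL`, and a cell can hold any R object. A short sketch:

```r
l <- vector("list", 2*3*3*4*2*3*3)
dim(l) <- c(2, 3, 3, 4, 2, 3, 3)
l[[1, 1, 1, 1, 1, 1, 1]] <- "content"
l[[2, 3, 3, 4, 2, 3, 3]] <- 1:3             # a cell can hold a whole vector

dim(l)                                      # 2 3 3 4 2 3 3
is.null(l[[1, 2, 1, 1, 1, 1, 1]])           # TRUE: untouched cells are NULL
sum(!vapply(l, is.null, logical(1)))        # 2 filled cells
```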

