Extracting data from deeply nested list
I was going to suggest purrr::pluck(), but after reading through the documentation I discovered you can just use purrr::map().
You're very close: you need to pass a list of accessors to map() rather than a character vector, and there's an accessor you've missed.
library(purrr)
nestedlist %>% map(list('data', 1, 'name'))
[[1]]
[[1]][[1]]
[1] "john"
[[1]][[2]]
[1] "litz"
[[2]]
[[2]][[1]]
[1] "frank"
[[2]][[2]]
[1] "doe"
Get names at deepest level of a nested list in R
Here is one possible approach, using only base R. The following function f replaces each terminal node (or "leaf") of a recursive list x with the sequence of names leading up to it. It treats unnamed lists like named lists with all names equal to "", which is a useful generalization.
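f <- function(x, s = NULL) {
  # Leaf reached: return the accumulated path of names
  if (!is.list(x)) {
    return(s)
  }
  nms <- names(x)
  # Unnamed lists are treated as if every name were ""
  if (is.null(nms)) {
    nms <- character(length(x))
  }
  # Recurse into each element, appending its name to the path
  Map(f, x = x, s = Map(c, list(s), nms))
}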
f(lst)
$title
[1] "title"
$author
[1] "author"
$date
[1] "date"
$`header-includes`
[1] "header-includes"
$output
$output$pdf_document
$output$pdf_document$citation_package
[1] "output" "pdf_document" "citation_package"
$`biblio-style`
[1] "biblio-style"
$bibliography
[1] "bibliography"
$papersize
[1] "papersize"
How to get common elements in a deep nested list: my two solutions work but take some time
You can convert each second-level nested list into a set of tuples, where each lowest-level list (e.g. [0, 4]) becomes one element of the set.
The conversion to tuples is required because lists are not hashable.
Once each nested list of lists is a set, simply take their intersection.
set.intersection(*[set(tuple(elem) for elem in sublist) for sublist in ary])
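The one-liner above can be tried on a small input. This is only a sketch: the sample `ary` and the helper name `common_pairs` are assumptions for illustration, since the original question's data is not shown here.

```python
# Hypothetical sample data: three groups of [a, b] pairs.
ary = [
    [[0, 4], [1, 2], [3, 5]],
    [[1, 2], [0, 4], [7, 8]],
    [[0, 4], [9, 9], [1, 2]],
]


def common_pairs(groups):
    """Return the set of pairs (as tuples) present in every group."""
    # Lists are unhashable, so each pair becomes a tuple before it goes in a set.
    sets = [set(tuple(elem) for elem in sublist) for sublist in groups]
    # set.intersection() raises TypeError when called with no arguments,
    # so an empty input is handled explicitly.
    if not sets:
        return set()
    return set.intersection(*sets)


common = common_pairs(ary)
print(sorted(common))  # [(0, 4), (1, 2)]
```

Building the sets is linear in the total number of pairs, which is typically where the speed-up over nested loops comes from.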
How to access very first object in differently deep nested lists?
Using a while loop:
x <- list1
while (inherits(x <- x[[1]], "list")) {}
x
#> Time Series:
#> Start = 1
#> End = 100
#> Frequency = 1
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#> [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
#> [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
#> [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
#> [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
#> [91] 91 92 93 94 95 96 97 98 99 100
x <- list2
while (inherits(x <- x[[1]], "list")) {}
x
#> Time Series:
#> Start = 1
#> End = 100
#> Frequency = 1
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
#> [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
#> [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
#> [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
#> [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
#> [91] 91 92 93 94 95 96 97 98 99 100
Extract and align specific elements from deeply nested list into R dataframe
Using tidyr, we can unnest the list by chaining calls to unnest_wider() and unnest_longer():
library(tidyr)
tibble(conditions) |>
  unnest_wider(conditions) |>
  unnest_longer(Phrases) |>
  unnest_wider(Phrases) |>
  unnest_longer(Mappings) |>
  unnest_wider(Mappings) |>
  unnest_longer(MappingCandidates) |>
  unnest_wider(MappingCandidates) |>
  unnest_longer(MatchedWords)
#> # A tibble: 4 × 8
#> PMID PhraseText MappingScore CandidateScore CandidateCUI CandidateMatched CandidatePreferred MatchedWords
#> <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <list>
#> 1 1 Hodgkin Lymphoma 1000 1000 C075655 Hodgkins Lymphoma Hodgkins Lymphoma <chr [2]>
#> 2 1 Hodgkin Lymphoma 1000 850 C095659 Lymphoma Lymphoma <chr [1]>
#> 3 2 Plaque Psoriasis 1000 1000 C0125609 Plaque Psoriasis Plaque Psoriasis <chr [2]>
#> 4 2 Plaque Psoriasis 1000 750 C0320011 Psoriasis Psoriasis <chr [1]>
Another approach (perhaps easier to generalize) uses rrapply() from the rrapply package. Here rrapply() is called twice with the option how = "bind": once to bind together all repeated MappingCandidates, and once to bind the other nodes (PMID, Phrases, PhraseText, MappingScore):
library(rrapply)
## bind MappingCandidates
candidateNodes <- rrapply(
  conditions,
  how = "bind",
  options = list(namecols = TRUE, coldepth = 8)
)
candidateNodes
#> L1 L2 L3 L4 L5 L6 L7 CandidateScore CandidateCUI CandidateMatched CandidatePreferred MatchedWords.1
#> 1 1 Phrases 1 Mappings 1 MappingCandidates 1 1000 C075655 Hodgkins Lymphoma Hodgkins Lymphoma hodgkin, lymphoma
#> 2 1 Phrases 1 Mappings 1 MappingCandidates 2 850 C095659 Lymphoma Lymphoma lymphoma
#> 3 2 Phrases 1 Mappings 1 MappingCandidates 1 1000 C0125609 Plaque Psoriasis Plaque Psoriasis plaque, psoriasis
#> 4 2 Phrases 1 Mappings 1 MappingCandidates 2 750 C0320011 Psoriasis Psoriasis psoriasis
## bind other nodes
otherNodes <- rrapply(
  conditions,
  condition = \(x, .xparents) !"MappingCandidates" %in% .xparents,
  how = "bind",
  options = list(namecols = TRUE)
)
otherNodes
#> L1 PMID Phrases.1.PhraseText Phrases.1.Mappings.1.MappingScore
#> 1 1 1 Hodgkin Lymphoma 1000
#> 2 2 2 Plaque Psoriasis 1000
## merge into single data.frame
allNodes <- merge(candidateNodes, otherNodes, by = "L1")
allNodes
#> L1 L2 L3 L4 L5 L6 L7 CandidateScore CandidateCUI CandidateMatched CandidatePreferred MatchedWords.1 PMID Phrases.1.PhraseText Phrases.1.Mappings.1.MappingScore
#> 1 1 Phrases 1 Mappings 1 MappingCandidates 1 1000 C075655 Hodgkins Lymphoma Hodgkins Lymphoma hodgkin, lymphoma 1 Hodgkin Lymphoma 1000
#> 2 1 Phrases 1 Mappings 1 MappingCandidates 2 850 C095659 Lymphoma Lymphoma lymphoma 1 Hodgkin Lymphoma 1000
#> 3 2 Phrases 1 Mappings 1 MappingCandidates 1 1000 C0125609 Plaque Psoriasis Plaque Psoriasis plaque, psoriasis 2 Plaque Psoriasis 1000
#> 4 2 Phrases 1 Mappings 1 MappingCandidates 2 750 C0320011 Psoriasis Psoriasis psoriasis 2 Plaque Psoriasis 1000
R: Find object by name in deeply nested list
Here's a function that will return the first match if found:
find_name <- function(haystack, needle) {
  if (hasName(haystack, needle)) {
    haystack[[needle]]
  } else if (is.list(haystack)) {
    for (obj in haystack) {
      ret <- Recall(obj, needle)
      if (!is.null(ret)) return(ret)
    }
  } else {
    NULL
  }
}
find_name(my_list, "XY01")
We avoid lapply so the loop can exit early as soon as a match is found.
List pruning is really a separate issue, better handled by a different function. This should work:
list_prune <- function(list, depth=1) {
  if (!is.list(list)) return(list)
  if (depth>1) {
    lapply(list, list_prune, depth = depth-1)
  } else {
    Filter(function(x) !is.list(x), list)
  }
}
Then you could do
list_prune(find_name(my_list, "XY01"), 1)
or with pipes (%>% comes from magrittr, which packages such as dplyr re-export)
find_name(my_list, "XY01") %>% list_prune(1)
Extracting values from complex and deeply nested list of dictionaries using Python?
The main idea is to convert the nested data to a DataFrame, filter the nested column down to the wanted key, and then convert the DataFrame back to a list of rows.
Code:
Step 1:
import pandas as pd
df = pd.json_normalize(complex_data)
df[2] = df[2].apply(lambda x: {k:v for k , v in dict(map(dict.popitem, x['B']))['C'].items() if k=='test456'})
df
#Output
0 1 2
0 {'A': 'test1'} {'A': 'test2'} {'test456': {'A': '111def'}}
1 {'A': 'test3'} {'A': 'test4'} {'test456': {'A': '999def'}}
Step 2:
desired_output = df.values.tolist()
desired_output
#output
[[{'A': 'test1'}, {'A': 'test2'}, {'test456': {'A': '111def'}}],
[{'A': 'test3'}, {'A': 'test4'}, {'test456': {'A': '999def'}}]]
Update: you can guard against empty or missing entries (None or {}) with a conditional expression, as below:
df[2].apply(lambda x: {} if len(x['B'])==0 else({} if not x['B'][-1] else ({'test456':x['B'][-1]['C']['test456']} if 'test456' in x['B'][-1]['C'].keys() else {})))
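For comparison, the same guard logic can be written without pandas as a plain helper function. This is a minimal sketch and not the answer's method: the sample `complex_data` is hypothetical (the original input is not shown, and values such as 'test123' and '000abc' are made up), and the helper name `pick_key` is an assumption.

```python
# Hypothetical input shaped to match the output shown above; the third
# row has an empty 'B' list to exercise the guard.
complex_data = [
    [{"A": "test1"}, {"A": "test2"},
     {"B": [{"C": {"test123": {"A": "000abc"}, "test456": {"A": "111def"}}}]}],
    [{"A": "test3"}, {"A": "test4"},
     {"B": [{"C": {"test456": {"A": "999def"}}}]}],
    [{"A": "test5"}, {"A": "test6"}, {"B": []}],
]


def pick_key(cell, key="test456"):
    """Return {key: value} taken from the last entry of cell['B'], else {}."""
    entries = cell.get("B") or []
    # Empty list, or an empty/None last entry: nothing to extract.
    if not entries or not entries[-1]:
        return {}
    inner = entries[-1].get("C", {})
    return {key: inner[key]} if key in inner else {}


desired_output = [[row[0], row[1], pick_key(row[2])] for row in complex_data]
for row in desired_output:
    print(row)
```

A named helper keeps each guard on its own line, which may be easier to extend than the nested conditional expression when more edge cases appear.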