How to Flatten R Data Frame That Contains Lists

Here is one way to do it in base R:

df <- data.frame(CAT = c("A", "B"))
df$COUNT <- list(1:3, 4:5)
df$TREAT <- list(paste0("Treat-", letters[1:2]), paste0("Treat-", letters[3:5]))

Create a helper function to do the work:

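f <- function(l) {
  # Leave ordinary (non-list) columns untouched
  if (!is.list(l)) return(l)
  # Pad each element with NA to the longest length, then stack them as matrix rows
  do.call('rbind', lapply(l, function(x) `length<-`(x, max(lengths(l)))))
}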

Always test your code, here on a single column:

f(df$TREAT)

#      [,1]      [,2]      [,3]
# [1,] "Treat-a" "Treat-b" NA
# [2,] "Treat-c" "Treat-d" "Treat-e"

Apply it to every column:

df[] <- lapply(df, f)
df

#   CAT COUNT.1 COUNT.2 COUNT.3 TREAT.1 TREAT.2 TREAT.3
# 1   A       1       2       3 Treat-a Treat-b    <NA>
# 2   B       4       5      NA Treat-c Treat-d Treat-e
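
Note that COUNT and TREAT are now matrix columns inside df, which is why the printed names carry the .1, .2, .3 suffixes. If separate ordinary columns are wanted, one possible follow-up (a sketch, not part of the original answer; the name flat is only illustrative) is to rebuild the data frame with do.call(data.frame, ...):

```r
# Rebuild the example from above so this block is self-contained
df <- data.frame(CAT = c("A", "B"))
df$COUNT <- list(1:3, 4:5)
df$TREAT <- list(paste0("Treat-", letters[1:2]), paste0("Treat-", letters[3:5]))

f <- function(l) {
  if (!is.list(l)) return(l)
  do.call("rbind", lapply(l, function(x) `length<-`(x, max(lengths(l)))))
}

df[] <- lapply(df, f)
ncol(df)   # CAT plus two matrix columns

# data.frame() expands each matrix argument into one column per matrix column
flat <- do.call(data.frame, df)
ncol(flat)
names(flat)
```

After this step each COUNT.* and TREAT.* entry is a regular column that can be selected or renamed on its own.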

Flatten list column within dataframe in R

Here is one option: split by 'var_1', unlist 'var_2' and 'var_3' within each group, and then unnest:

library(dplyr)
library(purrr)
library(tidyr)
test %>%
  group_split(var_1) %>%
  map_dfr(~ .x %>%
    mutate_at(-1, ~ list(unlist(.))) %>%
    unnest(c(var_2, var_3)))
# A tibble: 5 x 3
# var_1 var_2 var_3
# <fct> <fct> <fct>
#1 ONE Date 1 Name 1
#2 ONE Date 2 Name 2
#3 TWO Date 3 Name 3
#4 TWO Date 4 Name 4
#5 TWO Date 5 Name 5

Or we can work row by row:

test %>%
  rowwise() %>%
  summarise_all(~ list(unlist(.))) %>%
  unnest(cols = everything())
# A tibble: 5 x 3
# var_1 var_2 var_3
# <fct> <fct> <fct>
#1 ONE Date 1 Name 1
#2 ONE Date 2 Name 2
#3 TWO Date 3 Name 3
#4 TWO Date 4 Name 4
#5 TWO Date 5 Name 5

Or by nesting within each group:

test %>%
  group_by(var_1) %>%
  nest() %>%
  mutate(data = map(data, ~ summarise_all(.x, ~ list(unlist(.))) %>%
    unnest(everything()))) %>%
  unnest(data)
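
The test object itself is not shown here. As a rough sketch, assuming each cell of var_2 and var_3 holds a (possibly nested) list of strings, the same flattening can also be written in base R. The input data and the names rows and flat below are made up for illustration:

```r
# Hypothetical input: one row per var_1, list columns holding nested lists
test <- data.frame(var_1 = c("ONE", "TWO"), stringsAsFactors = FALSE)
test$var_2 <- list(list("Date 1", "Date 2"),
                   list("Date 3", list("Date 4", "Date 5")))
test$var_3 <- list(list("Name 1", "Name 2"),
                   list("Name 3", list("Name 4", "Name 5")))

# For each row, unlist both cells; the single var_1 value is recycled
rows <- lapply(seq_len(nrow(test)), function(i) {
  data.frame(var_1 = test$var_1[i],
             var_2 = unlist(test$var_2[[i]]),
             var_3 = unlist(test$var_3[[i]]),
             stringsAsFactors = FALSE)
})
flat <- do.call(rbind, rows)
flat
```

This relies on var_2 and var_3 unlisting to the same length within each row; if they do not, data.frame() will either recycle or stop with an error.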

Convert a list to a data frame

Update July 2020:

As of R 4.0.0, the stringsAsFactors parameter of data.frame() defaults to FALSE, so character columns are no longer converted to factors automatically.


Assuming your list of lists is called l:

df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=TRUE))

In R versions before 4.0.0, the above converts all character columns to factors. To avoid this, pass stringsAsFactors = FALSE to the data.frame() call:

df <- data.frame(matrix(unlist(l), nrow = length(l), byrow = TRUE), stringsAsFactors = FALSE)
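
To make the reshaping concrete, here is a small sketch with a made-up list of lists (the contents of l are assumptions, not the original data). It also shows one caveat: unlist() coerces everything to a common type, so numbers mixed with strings come back as character and need converting afterwards.

```r
# Hypothetical list of lists: each inner list is one record
l <- list(list("a", 1, "x"),
          list("b", 2, "y"))

# unlist() flattens the records into one character vector;
# byrow = TRUE puts each inner list on its own row
df <- data.frame(matrix(unlist(l), nrow = length(l), byrow = TRUE),
                 stringsAsFactors = FALSE)
str(df)

# The numeric field was coerced to character, so convert it back
df$X2 <- as.numeric(df$X2)
```

This approach also assumes every inner list has the same number of elements; otherwise the matrix rows will not line up with the records.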

Flatten a data frame, combine the values of a column into lists to populate individual cells

We can use aggregate

aggregate(Value ~ Color, df1, FUN = toString)

If we need a list

aggregate(Value ~ Color, df1, FUN = list)

Or with dplyr

library(dplyr)
df1 %>%
group_by(Color) %>%
summarise(Value = toString(Value))

Or as a list

df1 %>%
group_by(Color) %>%
summarise(Value = list(Value))
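
As an illustration with made-up data (df1 is not shown in the original, so the values below are assumptions), the base R toString variant collapses each group into a single comma-separated string:

```r
# Hypothetical input with a grouping column and a value column
df1 <- data.frame(Color = c("red", "red", "blue"),
                  Value = c(1, 2, 3),
                  stringsAsFactors = FALSE)

# One row per Color; the values of each group are pasted together with ", "
collapsed <- aggregate(Value ~ Color, df1, FUN = toString)
collapsed
```

The collapsed Value column is character, so this form suits display; the list variants above are the better choice if the values are needed for further computation.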

Flatten nested list with uneven column numbers into data frame in R

Tibbles are a nice format, as they support nested data frames. I would aim for a wide tibble with 2 rows, in which each nested list element is its own data frame that can be manipulated later when needed. I would do something like this:

library(tidyverse)
l <- unlist(l, recursive = FALSE)
ind_to_nest <- which(map_lgl(l[[1]], is.list))
non_tbl <- map(l, ~ .x[-ind_to_nest])
tbl <- map(l, ~ .x[ind_to_nest])

df <- bind_rows(non_tbl) %>%
  mutate(n = 1:n(), .before = 1) %>%
  mutate(data = map(tbl, ~ map(.x, ~ flatten(.x) %>% bind_cols))) %>%
  unnest_wider(data, simplify = FALSE)

Note that this throws a number of warnings, caused by name conflicts within the list:

#> New names:
#> * id -> id...5
#> * id -> id...10

These can be resolved by specifying a naming policy, or by rethinking how the data is read into R so that naming conflicts are resolved early.

#> Outer names are only allowed for unnamed scalar atomic inputs 

This is a bit tougher to resolve, but this issue is a starting point.

For analysis, the sub-tibbles can be cleaned up as needed, since different tasks require different shapes.

Flattening a list of data frames into one data frame with purrr::flatten_dfr

Try:

library(tidyverse)

df <- bind_rows(test_list)

I'm not sure there is a way to solve this with flatten_dfr.

For example, even if all the data frames had the same length, flatten_dfr would just return one of them.

If the column names were different and the lengths the same, flatten_dfr would bind the columns with completely different names, thereby mimicking the behaviour of bind_cols.

Perhaps someone else has a specific use case for flatten_dfr, but I think in the end you will use either bind_rows or bind_cols.

Flatten list column in data frame with ID column

You can just use unnest from "tidyr":

library(tidyr)
unnest(df, b)
#   a b
# 1 1 1
# 2 2 1
# 3 2 2
# 4 3 1
# 5 3 2
# 6 3 3
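
For comparison, here is a base R sketch of the same operation. It assumes df has an ID column a and a list column b of integer vectors, as the output above suggests; the construction of df and the name flat are assumptions, since the original data is not shown:

```r
# Hypothetical input matching the printed result
df <- data.frame(a = 1:3)
df$b <- list(1L, 1:2, 1:3)

# Repeat each ID once per element of its list cell, then flatten the cells
flat <- data.frame(a = rep(df$a, lengths(df$b)),
                   b = unlist(df$b))
flat
```

The rep()/lengths() pairing is what keeps each value of b aligned with its original ID.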

Unnest list of lists of data frames, containing NAs

I create a helper function to combine p and c:

foo <- function(x) {
  a <- x[[1]]
  b <- x[[2]]
  # An empty second data frame gets one row of NAs so the entry is not lost
  if (nrow(b) == 0) b[1, ] <- NA
  return(cbind(a, b))
}

Then I run the helper function on each element and bind the rows:

do.call(rbind, lapply(mylist, foo))

The result:

#   id  text from
# 1 01   one    A
# 2 01   two    B
# 3 02 three    C
# 4 02  four    D
# 5 02  five    E
# 6 03  <NA> <NA>

P.S. The same result using the base R pipe (available since R 4.1.0):

lapply(mylist, foo) |> do.call(what = rbind)
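
The structure of mylist is not shown above. As a minimal sketch, assume each element is a list holding a one-row data frame p (with id) and a data frame c (with text and from) that may have zero rows. The values are taken from the result above; the construction itself, including the entry() helper, is hypothetical:

```r
foo <- function(x) {
  a <- x[[1]]
  b <- x[[2]]
  if (nrow(b) == 0) b[1, ] <- NA
  cbind(a, b)
}

# Hypothetical constructor for one list entry
entry <- function(id, text, from) {
  list(p = data.frame(id = id, stringsAsFactors = FALSE),
       c = data.frame(text = text, from = from, stringsAsFactors = FALSE))
}

mylist <- list(
  entry("01", c("one", "two"), c("A", "B")),
  entry("02", c("three", "four", "five"), c("C", "D", "E")),
  entry("03", character(0), character(0))  # empty c: becomes a row of NAs
)

out <- do.call(rbind, lapply(mylist, foo))
out
```

Without the nrow(b) == 0 check, the third entry could not be combined, because cbind() cannot match a one-row data frame with a zero-row one.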

How can I best flatten a nested list to a data.frame in R?

You can unlist the result and extract x and y like this:

res <- unlist(result)
res['results.attrs.x']
# results.attrs.x
# "151398.09375"

res['results.attrs.y']
# results.attrs.y
# "540429.3125"

You can get the names of all other values like this:

names(res)
#  [1] "results.id" "results.weight" "results.attrs.origin" "results.attrs.geom_quadindex" "results.attrs.zoomlevel"
#  [6] "results.attrs.featureId" "results.attrs.lon" "results.attrs.detail" "results.attrs.rank" "results.attrs.geom_st_box2d"
# [11] "results.attrs.lat" "results.attrs.num" "results.attrs.y" "results.attrs.x" "results.attrs.label"

Then you can combine them in a dataframe:

res_df <- data.frame(
  X = res['results.attrs.x'],
  Y = res['results.attrs.y']
)
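
Because unlist() coerces every leaf to a common type, X and Y above are character strings. A small sketch with a made-up nested list (only the x and y values are taken from the output above; the rest of the structure is an assumption) shows the lookup together with a conversion to numeric:

```r
# Hypothetical nested list shaped like a parsed JSON response
result <- list(results = list(
  id = "1",
  attrs = list(label = "example", x = 151398.09375, y = 540429.3125)
))

# Nested names are joined with "." when flattening; mixing strings and
# numbers turns every value into character
res <- unlist(result)
names(res)

# [[ ]] drops the element name; as.numeric() restores the numeric type
res_df <- data.frame(
  X = as.numeric(res[["results.attrs.x"]]),
  Y = as.numeric(res[["results.attrs.y"]])
)
res_df
```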

