How to Flatten R Data Frame That Contains Lists

How to flatten R data frame that contains lists?

Here is a way to do it in base R, starting from a data frame with two list columns:

df<-data.frame(CAT=c("A","B"))
df$COUNT <-list(1:3,4:5)
df$TREAT <-list(paste("Treat-", letters[1:2],sep=""),paste("Treat-", letters[3:5],sep=""))

Create a helper function that pads each list element with NA up to the length of the longest one and row-binds the results into a matrix:

f <- function(l) {
  if (!is.list(l)) return(l)
  do.call('rbind', lapply(l, function(x) `length<-`(x, max(lengths(l)))))
}

Test it on a single list column first:

f(df$TREAT)

#           [,1]      [,2]      [,3]     
# [1,] "Treat-a" "Treat-b" NA       
# [2,] "Treat-c" "Treat-d" "Treat-e"

Then apply it to every column:

df[] <- lapply(df, f)
df

#     CAT COUNT.1 COUNT.2 COUNT.3 TREAT.1 TREAT.2 TREAT.3
#   1   A       1       2       3 Treat-a Treat-b    <NA>
#   2   B       4       5      NA Treat-c Treat-d Treat-e
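
After this step COUNT and TREAT are stored as matrix columns, so the printout shows seven columns while ncol(df) should still report 3. If separate columns are needed, one option is to hand the padded pieces to data.frame(), which expands each matrix into columns of its own. This is a minimal sketch that goes beyond the original answer; it repeats the example data and helper so that it runs on its own.

```r
# Same example data and helper as above.
df <- data.frame(CAT = c("A", "B"))
df$COUNT <- list(1:3, 4:5)
df$TREAT <- list(paste0("Treat-", letters[1:2]), paste0("Treat-", letters[3:5]))

f <- function(l) {
  if (!is.list(l)) return(l)
  n <- max(lengths(l))  # length of the longest element
  do.call(rbind, lapply(l, function(x) `length<-`(x, n)))
}

# data.frame() splits each matrix into one column per matrix column.
flat <- do.call(data.frame, lapply(df, f))
names(flat)
ncol(flat)
```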

Flatten list column within data frame in R

Here is one option: split by 'var_1', unlist 'var_2' and 'var_3' within each group, and then unnest both columns:

library(dplyr)
library(purrr)
library(tidyr)
test %>% 
    group_split(var_1) %>%
    map_dfr(~ .x %>% 
                mutate_at(-1, ~ list(unlist(.))) %>% 
                unnest(c(var_2, var_3)))
# A tibble: 5 x 3
#  var_1 var_2  var_3 
#  <fct> <fct>  <fct> 
#1 ONE   Date 1 Name 1
#2 ONE   Date 2 Name 2
#3 TWO   Date 3 Name 3
#4 TWO   Date 4 Name 4
#5 TWO   Date 5 Name 5

Or we can do the same row by row:

test %>%
     rowwise %>%
     summarise_all(~ list(unlist(.))) %>%
     unnest(cols = everything())
# A tibble: 5 x 3
#  var_1 var_2  var_3 
#  <fct> <fct>  <fct> 
#1 ONE   Date 1 Name 1
#2 ONE   Date 2 Name 2
#3 TWO   Date 3 Name 3
#4 TWO   Date 4 Name 4
#5 TWO   Date 5 Name 5

Or with nest:

test  %>% 
    group_by(var_1) %>%
    nest %>% 
    mutate(data = map(data, ~ summarise_all(.x, ~ list(unlist(.))) %>% 
    unnest(everything())))   %>% 
    unnest(data)
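
The input test is not shown in the answer. The sketch below builds a hypothetical version of it (one row per group, with list columns that hold nested lists) and reproduces the five-row result in base R. It is a swapped-in base R equivalent, not the dplyr/tidyr approach above, and it assumes var_2 and var_3 hold the same number of values in every row.

```r
# Hypothetical input: one row per group, list columns holding nested lists.
test <- data.frame(var_1 = c("ONE", "TWO"))
test$var_2 <- list(list("Date 1", "Date 2"),
                   list("Date 3", "Date 4", "Date 5"))
test$var_3 <- list(list("Name 1", "Name 2"),
                   list("Name 3", "Name 4", "Name 5"))

# Number of values stored in each row of the list columns.
n <- lengths(test$var_2)

# Repeat the key once per value and flatten the list columns.
flat <- data.frame(
  var_1 = rep(test$var_1, times = n),
  var_2 = unlist(test$var_2),
  var_3 = unlist(test$var_3)
)
flat
```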

Convert a list to a data frame

Update July 2020:

As of R 4.0.0, the default for the stringsAsFactors parameter is default.stringsAsFactors(), which in turn returns FALSE by default.

Assuming your list of lists is called l:

df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=TRUE))

In R versions before 4.0.0, the call above converts all character columns to factors. To avoid this, add the stringsAsFactors parameter to the data.frame() call:

df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=TRUE), stringsAsFactors=FALSE)
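
One thing to keep in mind is that unlist() coerces mixed values to a single type, so every column comes out as character when the records mix numbers and strings. A small sketch with a hypothetical three-record list l shows this, with one possible way to recover the column types afterwards:

```r
# Hypothetical input: three records, each with a number and a label.
l <- list(list(1, "a"), list(2, "b"), list(3, "c"))

# unlist() flattens everything into one character vector; the matrix is
# filled row by row so that each record becomes one row.
df <- data.frame(matrix(unlist(l), nrow = length(l), byrow = TRUE),
                 stringsAsFactors = FALSE)

# Every column is character at this point; let R guess better types.
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```

This approach also assumes that every record has the same number of fields; otherwise the matrix is filled incorrectly.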

Flatten a data frame, combine the values of a column into lists to populate individual cells

We can use aggregate

aggregate(Value ~ Color, df1, FUN = toString)

If we need a list

aggregate(Value ~ Color, df1, FUN = list)

Or with dplyr

library(dplyr)
df1 %>%
   group_by(Color) %>%
   summarise(Value = toString(Value))

Or as a list

df1 %>%
   group_by(Color) %>%
   summarise(Value = list(Value))
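
To try the base R variants, a small hypothetical df1 with the column names used above is enough (the real data from the question is not shown here):

```r
# Hypothetical input with the column names used in the answer.
df1 <- data.frame(Color = c("red", "blue", "red", "blue", "red"),
                  Value = c(1, 2, 3, 4, 5))

# One comma-separated string per colour.
as_text <- aggregate(Value ~ Color, df1, FUN = toString)

# One vector of values per colour, kept in a single cell.
as_list <- aggregate(Value ~ Color, df1, FUN = list)

as_text
lengths(as_list$Value)
```

The string form is easier to print or export, while the list form keeps the values numeric for later computation.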

Flatten nested list with uneven column numbers into data frame in R

Tibbles are a nice format, as they support nested data frames. I would aim for a wide tibble with 2 rows, in which each nested list element is its own data frame that can be manipulated later when needed. I would do something like this:

library(tidyverse)
l <- unlist(l, recursive = FALSE)
ind_to_nest <- which(map_lgl(l[[1]], is.list))
non_tbl <- map(l, ~ .x[-ind_to_nest])
tbl <- map(l, ~ .x[ind_to_nest])

df <- bind_rows(non_tbl) %>%
  mutate(n = 1:n(), .before = 1) %>%
  mutate(data =  map(tbl,  ~ map(.x, ~flatten(.x) %>% bind_cols))) %>%
  unnest_wider(data, simplify = FALSE)

Note that this does throw a bunch of warnings. This is because of the name conflicts present within the list.

#> New names:
#> * id -> id...5
#> * id -> id...10

This can be resolved by specifying a naming policy, or by rethinking how the data is read into R so that naming conflicts are resolved early.

#> Outer names are only allowed for unnamed scalar atomic inputs

This is a bit tougher to resolve, but this issue is a starting point.

For analysis, the sub-tibbles can be cleaned up when needed, as different tasks require different shapes.

Flattening a list of data frames into one data frame with purrr::flatten_dfr

Try:

library(tidyverse)

df <- bind_rows(test_list)

I'm not sure there is a way to solve this with flatten_dfr.

For example, even if all the data frames had the same length, flatten_dfr would just return one of them.

If the column names were different and the lengths the same, flatten_dfr would bind columns with completely different names, mimicking the behaviour of bind_cols.

Perhaps someone else has a specific use case for flatten_dfr, but I think in the end you'll use either bind_rows or bind_cols.
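
As an illustration of the bind_rows route, here is a minimal sketch with a hypothetical test_list (the real list is not shown). The .id argument of bind_rows() records which list element each row came from:

```r
library(dplyr)

# Hypothetical input: data frames with partly different columns.
test_list <- list(
  first  = data.frame(id = 1:2, x = c("a", "b")),
  second = data.frame(id = 3L, y = TRUE)
)

# Rows are stacked, columns are matched by name, and gaps are filled
# with NA. .id stores the list names in a new column called "source".
df <- bind_rows(test_list, .id = "source")
df
```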

Flatten list column in data frame with ID column

You can just use unnest from "tidyr":

library(tidyr)
unnest(df, b)
#   a b
# 1 1 1
# 2 2 1
# 3 2 2
# 4 3 1
# 5 3 2
# 6 3 3
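
The df used here is not shown. A hypothetical input that matches the printed result, where row a holds the values 1 to a in its list column, could be built like this:

```r
library(tidyr)

# Hypothetical input: an ID column and a list column of varying length.
df <- data.frame(a = 1:3)
df$b <- lapply(df$a, seq_len)

# Each element of b becomes its own row; a is repeated as needed.
flat <- unnest(df, b)
flat
```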

Unnest list of lists of data frames, containing NAs

I create a helper function that combines the two data frames in each element (p and c) and fills an empty c with a single row of NA:

foo <- function(x) {
  a <- x[[1]]
  b <- x[[2]]
  if (nrow(b) == 0) b[1, ] <- NA
  return(cbind(a, b))
}

Then I run the helper function on each element and bind the rows:

do.call(rbind, lapply(mylist, foo))

The result:

> do.call(rbind, lapply(mylist, foo))
  id  text from
1 01   one    A
2 01   two    B
3 02 three    C
4 02  four    D
5 02  five    E
6 03  <NA> <NA>

P.S. The same result using the base R pipe (available from R 4.1.0):

lapply(mylist, foo) |> do.call(what = rbind)
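
The structure of mylist is not shown. The following sketch assumes each element is a list of two data frames, p (one row with the id) and c (zero or more rows of details), and repeats the helper so that it runs on its own. The names and values are hypothetical.

```r
# Hypothetical input: the second element has an empty c.
mylist <- list(
  list(p = data.frame(id = "01"),
       c = data.frame(text = c("one", "two"), from = c("A", "B"))),
  list(p = data.frame(id = "03"),
       c = data.frame(text = character(0), from = character(0)))
)

foo <- function(x) {
  a <- x[[1]]
  b <- x[[2]]
  if (nrow(b) == 0) b[1, ] <- NA  # keep the id even without details
  cbind(a, b)
}

out <- do.call(rbind, lapply(mylist, foo))
out
```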

How can I best flatten a nested list to a data.frame in R?

You can unlist the result and extract x and y like this:

res <- unlist(result)
res['results.attrs.x']
# results.attrs.x 
#  "151398.09375"

res['results.attrs.y']
# results.attrs.y 
#  "540429.3125"

You can get the names of all other values like this:

names(res)
#[1] "results.id"  "results.weight"  "results.attrs.origin"         
#    "results.attrs.geom_quadindex" "results.attrs.zoomlevel"     
#[6] "results.attrs.featureId" "results.attrs.lon" "results.attrs.detail"   
#    "results.attrs.rank" "results.attrs.geom_st_box2d" "results.attrs.lat"
#    "results.attrs.num" "results.attrs.y" "results.attrs.x"  "results.attrs.label"

Then you can combine them in a dataframe:

res_df <- data.frame(
  X = res['results.attrs.x'],
  Y = res['results.attrs.y']
)
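
Because unlist() coerces mixed values to one type, x and y are character strings in res (as the quoted output above shows). If numbers are needed, they can be converted when building the data frame. The sketch below uses a hypothetical, much smaller result with the same naming pattern; extracting with [[ ]] also drops the element name, so the data frame does not inherit it as a row name.

```r
# Hypothetical nested result with the same pattern of names as above.
result <- list(results = list(
  id = 1,
  attrs = list(label = "somewhere", x = 151398.09375, y = 540429.3125)
))

res <- unlist(result)  # mixed types, so everything becomes character

res_df <- data.frame(
  X = as.numeric(res[["results.attrs.x"]]),
  Y = as.numeric(res[["results.attrs.y"]])
)
str(res_df)
```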
