How to flatten R data frame that contains lists?
Here is another way in base R:
df<-data.frame(CAT=c("A","B"))
df$COUNT <-list(1:3,4:5)
df$TREAT <-list(paste("Treat-", letters[1:2],sep=""),paste("Treat-", letters[3:5],sep=""))
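Create a helper function that pads each element of a list column with NA up to the length of the longest one and then binds the elements into a matrix, one row per element: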
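f <- function(l) {
  if (!is.list(l)) return(l)  # leave ordinary columns unchanged
  n <- max(lengths(l))        # length of the longest element
  # pad every element with NA to length n and stack them as matrix rows
  do.call(rbind, lapply(l, function(x) `length<-`(x, n)))
}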
Test it on a single column first:
f(df$TREAT)
# [,1] [,2] [,3]
# [1,] "Treat-a" "Treat-b" NA
# [2,] "Treat-c" "Treat-d" "Treat-e"
Then apply it to every column:
df[] <- lapply(df, f)
df
# CAT COUNT.1 COUNT.2 COUNT.3 TREAT.1 TREAT.2 TREAT.3
# 1 A 1 2 3 Treat-a Treat-b <NA>
# 2 B 4 5 NA Treat-c Treat-d Treat-e
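The names printed above, such as COUNT.1, come from matrix columns created by df[] <- lapply(df, f). If you would rather end up with ordinary columns, a minimal variant is to call the same helper on each list column inside data.frame(), which splits a matrix argument into numbered columns. This sketch rebuilds the example data; stringsAsFactors = FALSE is set only so that the result is the same on R versions before 4.0.0.

```r
# Rebuild the example data from above
df <- data.frame(CAT = c("A", "B"))
df$COUNT <- list(1:3, 4:5)
df$TREAT <- list(paste0("Treat-", letters[1:2]), paste0("Treat-", letters[3:5]))

# Same helper as above: pad with NA to the longest element, stack as rows
f <- function(l) {
  if (!is.list(l)) return(l)
  n <- max(lengths(l))
  do.call(rbind, lapply(l, function(x) `length<-`(x, n)))
}

# data.frame() splits each matrix argument into columns COUNT.1, COUNT.2, ...
flat <- data.frame(
  CAT = df$CAT,
  COUNT = f(df$COUNT),
  TREAT = f(df$TREAT),
  stringsAsFactors = FALSE
)
print(flat)
```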
flatten list column within dataframe in R
Here is one option where we unlist 'var_2' and 'var_3' within each group and then unnest:
library(dplyr)
library(purrr)
library(tidyr)
test %>%
group_split(var_1) %>%
map_dfr(~ .x %>%
mutate_at(-1, ~ list(unlist(.))) %>%
unnest(c(var_2, var_3)))
# A tibble: 5 x 3
# var_1 var_2 var_3
# <fct> <fct> <fct>
#1 ONE Date 1 Name 1
#2 ONE Date 2 Name 2
#3 TWO Date 3 Name 3
#4 TWO Date 4 Name 4
#5 TWO Date 5 Name 5
Or we can use rowwise:
test %>%
rowwise %>%
summarise_all(~ list(unlist(.))) %>%
unnest(cols = everything())
# A tibble: 5 x 3
# var_1 var_2 var_3
# <fct> <fct> <fct>
#1 ONE Date 1 Name 1
#2 ONE Date 2 Name 2
#3 TWO Date 3 Name 3
#4 TWO Date 4 Name 4
#5 TWO Date 5 Name 5
Or with nest:
test %>%
group_by(var_1) %>%
nest %>%
mutate(data = map(data, ~ summarise_all(.x, ~ list(unlist(.))) %>%
unnest(everything()))) %>%
unnest(data)
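Without the tidyverse, the same reshaping can be sketched in base R by repeating var_1 once per element and unlisting the other two columns. The test data below is a hypothetical stand-in (the question's data is not shown here), and the sketch assumes var_2 and var_3 hold the same number of elements in every row.

```r
# Hypothetical stand-in for the question's `test` data
test <- data.frame(var_1 = c("ONE", "TWO"))
test$var_2 <- list(c("Date 1", "Date 2"), c("Date 3", "Date 4", "Date 5"))
test$var_3 <- list(c("Name 1", "Name 2"), c("Name 3", "Name 4", "Name 5"))

# Number of elements per row; var_2 and var_3 are assumed to agree
n <- lengths(test$var_2)
stopifnot(identical(n, lengths(test$var_3)))

flat <- data.frame(
  var_1 = rep(test$var_1, n),  # repeat each id once per element
  var_2 = unlist(test$var_2),
  var_3 = unlist(test$var_3)
)
print(flat)
```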
Convert a list to a data frame
Update July 2020: The default for the parameter stringsAsFactors is now default.stringsAsFactors(), which in turn yields FALSE as its default.
Assuming your list of lists is called l:
df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=TRUE))
In R versions before 4.0.0, the above converts all character columns to factors. To avoid this, you can add a parameter to the data.frame() call:
df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=TRUE), stringsAsFactors=FALSE)
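One caveat worth checking: a matrix holds a single type, so when the inner lists mix numbers and strings, unlist() turns everything into character, and matrix() also drops the field names. A minimal sketch, assuming a small hypothetical l whose inner lists all share the same names, restores the names with names() and the column types with type.convert():

```r
# Hypothetical list of lists with the same fields in every element
l <- list(list(a = 1, b = "x"), list(a = 2, b = "y"), list(a = 3, b = "z"))

df <- data.frame(matrix(unlist(l), nrow = length(l), byrow = TRUE),
                 stringsAsFactors = FALSE)
names(df) <- names(l[[1]])  # replace the default X1, X2 with the field names
class(df$a)                 # "character": unlist() coerced the numbers

# Re-detect column types; as.is = TRUE keeps strings as character
df <- type.convert(df, as.is = TRUE)
str(df)
```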
Flatten a data frame, combine the values of a column into lists to populate individual cells
We can use aggregate with toString, which pastes the values of each group into one comma-separated string:
aggregate(Value ~ Color, df1, FUN = toString)
If we need a list column instead:
aggregate(Value ~ Color, df1, FUN = list)
Or with dplyr:
library(dplyr)
df1 %>%
group_by(Color) %>%
summarise(Value = toString(Value))
Or as a list column:
df1 %>%
group_by(Color) %>%
summarise(Value = list(Value))
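To see the string form on concrete data, here is a small self-contained run with a hypothetical df1 (the question's data is not shown). It also uses split(), a base R alternative not used in the answer above, for when a named list with one vector per group is enough instead of a data frame:

```r
# Hypothetical example data
df1 <- data.frame(Color = c("red", "blue", "red", "blue", "red"),
                  Value = c(1, 2, 3, 4, 5))

# One comma-separated string per Color; groups come out sorted (blue, red)
collapsed <- aggregate(Value ~ Color, df1, FUN = toString)
print(collapsed)

# A named list holding one vector of values per Color
by_color <- split(df1$Value, df1$Color)
by_color$red
```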
flatten nested list with uneven column numbers into data frame in R
Tibbles are a nice format, as they support nested data frames. I would aim for a tibble with 2 rows in a wide format, in which each nested list element is its own data frame that we can manipulate later when needed. I would do something like this:
library(tidyverse)
l <- unlist(l, recursive = FALSE)
ind_to_nest <- which(map_lgl(l[[1]], is.list))
non_tbl <- map(l, ~ .x[-ind_to_nest])
tbl <- map(l, ~ .x[ind_to_nest])
df <- bind_rows(non_tbl) %>%
  mutate(n = 1:n(), .before = 1) %>%
  mutate(data = map(tbl, ~ map(.x, ~ flatten(.x) %>% bind_cols))) %>%
  unnest_wider(data, simplify = FALSE)
Note that this throws a number of warnings. Some of them come from name conflicts within the list:
#> New names:
#> * id -> id...5
#> * id -> id...10
These can be resolved by specifying a naming policy, or by rethinking how the data is read into R so that naming conflicts are resolved early. The other warning is:
#> Outer names are only allowed for unnamed scalar atomic inputs
This is a bit tougher to resolve, but this issue is a starting point.
For analysis, the sub-tibbles can be cleaned when needed, as different tasks require different shapes.
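The first steps of that pipeline (dropping one level of nesting, then separating the scalar fields from the nested ones) can also be sketched in base R, with vapply() and lapply() standing in for purrr's map_lgl() and map(). The record structure below is a hypothetical guess, since the question's list is not shown here:

```r
# Hypothetical input: each element wraps one record that mixes
# scalar fields with a nested list
l <- list(
  list(list(name = "a", score = 1, detail = list(id = 10, note = "x"))),
  list(list(name = "b", score = 2, detail = list(id = 20, note = "y")))
)

# Drop the outer wrapper so that l holds one record per element
l <- unlist(l, recursive = FALSE)

# Positions of the fields that are themselves lists (here: detail)
ind_to_nest <- which(vapply(l[[1]], is.list, logical(1)))

non_tbl <- lapply(l, function(r) r[-ind_to_nest])  # scalar fields only
tbl     <- lapply(l, function(r) r[ind_to_nest])   # nested fields only

# One row per record for the scalar fields
flat <- do.call(rbind, lapply(non_tbl, as.data.frame))
print(flat)
```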
Flattening a list of data frames into one data frame with purrr::flatten_dfr
Try:
library(tidyverse)
df <- bind_rows(test_list)
I'm not sure there is a way you can solve this with flatten_dfr.
For example, even if all the data frames had the same length, flatten_dfr would just return one of them.
If the column names were different and the lengths the same, flatten_dfr would bind the columns under their different names, thereby mimicking the behaviour of bind_cols.
Perhaps someone else has a specific use case for flatten_dfr, but I think in the end you'll end up using either bind_rows or bind_cols.
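For comparison, base R's do.call(rbind, ...) also stacks a list of data frames. A minimal sketch with a hypothetical test_list; note that, unlike bind_rows, which fills missing columns with NA, rbind() requires every data frame to have the same column names:

```r
# Hypothetical list of data frames with the same columns but different lengths
test_list <- list(
  data.frame(id = 1:2, value = c("a", "b")),
  data.frame(id = 3L, value = "c")
)

df <- do.call(rbind, test_list)  # columns are matched up by name
print(df)
```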
Flatten list column in data frame with ID column
You can just use unnest from "tidyr":
library(tidyr)
unnest(df, b)
# a b
# 1 1 1
# 2 2 1
# 3 2 2
# 4 3 1
# 5 3 2
# 6 3 3
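A base R alternative for a single list column is stack(), which turns a named list into a data frame with columns values and ind. This sketch reconstructs a hypothetical df that matches the output above and uses column a as the list names; because ind comes back as a factor of those names, it is converted back to integer:

```r
# Hypothetical data matching the unnest() output above
df <- data.frame(a = 1:3)
df$b <- list(1, 1:2, 1:3)

# Name each list element by its id, then stack into long format
long <- stack(setNames(df$b, df$a))

# stack() returns `values` and `ind` (a factor of the list names)
res <- data.frame(a = as.integer(as.character(long$ind)), b = long$values)
print(res)
```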
Unnest list of lists of data frames, containing NAs
I create a helper function to combine p and c:
foo <- function(x) {
a <- x[[1]]
b <- x[[2]]
if (nrow(b) == 0) b[1, ] <- NA
return(cbind(a, b))
}
Then I run the helper function on each element and bind the rows:
do.call(rbind, lapply(mylist, foo))
The result:
> do.call(rbind, lapply(mylist, foo))
id text from
1 01 one A
2 01 two B
3 02 three C
4 02 four D
5 02 five E
6 03 <NA> <NA>
P.S. The same result using the native R pipe (available since R 4.1.0):
lapply(mylist, foo) |> do.call(what = rbind)
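To try the helper end to end, here is a hypothetical mylist inferred from the printed result, not taken from the question: each element holds a one-row data frame p with the id and a data frame c with text and from, and the third c has zero rows. foo is repeated so that the block runs on its own:

```r
foo <- function(x) {
  a <- x[[1]]
  b <- x[[2]]
  # an empty second data frame gets one all-NA row so the id is kept
  if (nrow(b) == 0) b[1, ] <- NA
  cbind(a, b)  # the one-row `a` is recycled to the number of rows of `b`
}

# Hypothetical input inferred from the printed result
mylist <- list(
  list(p = data.frame(id = "01"),
       c = data.frame(text = c("one", "two"), from = c("A", "B"))),
  list(p = data.frame(id = "02"),
       c = data.frame(text = c("three", "four", "five"), from = c("C", "D", "E"))),
  list(p = data.frame(id = "03"),
       c = data.frame(text = character(0), from = character(0)))
)

res <- do.call(rbind, lapply(mylist, foo))
print(res)
```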
How can I best flatten a nested list to a data.frame in R?
You can unlist the result and extract x and y like this:
res <- unlist(result)
res['results.attrs.x']
# results.attrs.x
# "151398.09375"
res['results.attrs.y']
# results.attrs.y
# "540429.3125"
You can get the names of all other values like this:
names(res)
#[1] "results.id" "results.weight" "results.attrs.origin"
# "results.attrs.geom_quadindex" "results.attrs.zoomlevel"
#[6] "results.attrs.featureId" "results.attrs.lon" "results.attrs.detail"
# "results.attrs.rank" "results.attrs.geom_st_box2d" "results.attrs.lat"
# "results.attrs.num" "results.attrs.y" "results.attrs.x" "results.attrs.label"
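Then you can combine them in a data frame. Here unlist() returned a character vector (note the quotes in the output above), so convert the values back to numbers:
res_df <- data.frame(
  X = as.numeric(res[['results.attrs.x']]),
  Y = as.numeric(res[['results.attrs.y']])
)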
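The same unlist() approach also copes with several results, because unnamed list levels add nothing to the generated names, so every record yields the same "results.attrs.x" name. Indexing with a single name such as res['results.attrs.x'] returns only the first match, so this sketch selects with names(res) == ... instead. The parsed response below is hypothetical, assuming results is an unnamed list of records:

```r
# Hypothetical parsed response with two records under `results`
result <- list(results = list(
  list(id = "a", attrs = list(x = 151398.09375, y = 540429.3125, label = "first")),
  list(id = "b", attrs = list(x = 10.5, y = 20.25, label = "second"))
))

res <- unlist(result)  # named character vector: "results.id", "results.attrs.x", ...

# Select every match by name, then convert back to numbers
res_df <- data.frame(
  X = as.numeric(res[names(res) == "results.attrs.x"]),
  Y = as.numeric(res[names(res) == "results.attrs.y"])
)
print(res_df)
```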