How to Unnest Column-List

Unnest a list column directly into several columns

with tidyr 1.0.0 you can do :

library(tidyr)
df1 <- tibble(
gr = c('a', 'b', 'c'),
values = list(1:2, 3:4, 5:6)
)

unnest_wider(df1, values)
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> # A tibble: 3 x 3
#> gr ...1 ...2
#> <chr> <int> <int>
#> 1 a 1 2
#> 2 b 3 4
#> 3 c 5 6

Created on 2019-09-14 by the reprex package (v0.3.0)

The output is verbose here because the elements that were unnested horizontally (the vector elements) were not named, and unnest_wider doesn't want to guess silently.

We can name them beforehand to avoid it :

df1 %>%
dplyr::mutate(values = purrr::map(values, setNames, c("V1","V2"))) %>%
unnest_wider(values)
#> # A tibble: 3 x 3
#> gr V1 V2
#> <chr> <int> <int>
#> 1 a 1 2
#> 2 b 3 4
#> 3 c 5 6

Or just use suppressMessages() or purrr::quietly()

How to unnest column-list?

You can do this by coercing the elements in the list column to data frames arranged as you like, which will unnest nicely:

library(tidyverse)

tibble(a = c('first', 'second'),
b = list(c('colA' = 1, 'colC' = 2), c('colA'= 3, 'colB'=2))) %>%
mutate(b = invoke_map(tibble, b)) %>%
unnest()
#> # A tibble: 2 x 4
#> a colA colC colB
#> <chr> <dbl> <dbl> <dbl>
#> 1 first 1. 2. NA
#> 2 second 3. NA 2.

Doing the coercion is a little tricky, though, as you don't want to end up with a 2x1 data frame. There are various ways around this, but a direct route is purrr::invoke_map, which calls a function with purrr::invoke (like do.call) on each element in a list.

How to unnest a single column from a nested list without unnesting all

You can use map() to grab out that column and then unnest that.

nested %>% 
mutate(qsec = map(extra, "qsec")) %>%
unnest_longer(qsec)

Notice though that this might create problems depending on what you do. This is because you'll now potentially have duplicates with the rest of the nested data. Might be safer to just unnest and nest again.

# A tibble: 32 x 6
mpg cyl disp hp extra qsec
<dbl> <dbl> <dbl> <dbl> <list> <dbl>
1 21 6 160 110 <tibble [2 x 7]> 16.5
2 21 6 160 110 <tibble [2 x 7]> 17.0
3 22.8 4 108 93 <tibble [1 x 7]> 18.6
4 21.4 6 258 110 <tibble [1 x 7]> 19.4
5 18.7 8 360 175 <tibble [1 x 7]> 17.0
6 18.1 6 225 105 <tibble [1 x 7]> 20.2
7 14.3 8 360 245 <tibble [1 x 7]> 15.8
8 24.4 4 147. 62 <tibble [1 x 7]> 20
9 22.8 4 141. 95 <tibble [1 x 7]> 22.9
10 19.2 6 168. 123 <tibble [1 x 7]> 18.3

Unnesting a list of lists in a data frame column

Note: Ignore the original and Update 1; Update 2 is better with the current state of the tidyverse.


Original:

With purrr, which is nice for lists,

library(purrr)

df %>% dmap(unlist)
## # A tibble: 2 x 2
## x y
## <dbl> <dbl>
## 1 1 1
## 2 1 2

which is more or less equivalent to

as.data.frame(lapply(df, unlist))
## x y
## a 1 1
## b 1 2

Update 1:

dmap has been deprecated and moved to purrrlyr, the home of interesting but ill-fated functions that will now shout lots of deprecation warnings at you. You could translate the base R idiom to tidyverse:

df %>% map(unlist) %>% as_tibble()

which will work fine for this case, but not for more than one row (a problem all these approaches face). A more robust solution might be

library(tidyverse)

df %>% bind_rows(df) %>% # make larger sample data
mutate_if(is.list, simplify_all) %>% # flatten each list element internally
unnest() # expand
#> # A tibble: 4 × 2
#> x y
#> <dbl> <dbl>
#> 1 1 1
#> 2 1 2
#> 3 1 1
#> 4 1 2

Update 2:

At some point since this was asked, tidyr::unnest() got updated such that it doesn't error anymore, so you can just do

df %>%
unnest(y) %>%
unnest(y)
#> # A tibble: 2 × 2
#> x y
#> <dbl> <dbl>
#> 1 1 1
#> 2 1 2

If you care about the names in the list, pull them out first and then unnest the names and the list at the same time:

df %>%
mutate(label = map(y, names)) %>%
unnest(c(y, label)) %>%
unnest(y)
#> # A tibble: 2 × 3
#> x y label
#> <dbl> <dbl> <chr>
#> 1 1 1 a
#> 2 1 2 b

I'll leave the previous answers for continuity, but this is simpler.

Unnest one of several list columns in dataframe

According to unnest, the argument ...

Specification of columns to nest. Use bare variable names or
functions of variables. If omitted, defaults to all list-cols.

Therefore, we could specify the column name to be unnested after the rename_all

iris %>
... #op's code
...

rename_all(funs(str_c("Mean.", .))))) %>%
unnest(sum_data)
# A tibble: 3 x 6
# Species data Mean.Sepal.Length Mean.Sepal.Width Mean.Petal.Length Mean.Petal.Width
# <fctr> <list> <dbl> <dbl> <dbl> <dbl>
#1 setosa <tibble [50 x 4]> 5.01 3.43 1.46 0.246
#2 versicolor <tibble [50 x 4]> 5.94 2.77 4.26 1.33
#3 virginica <tibble [50 x 4]> 6.59 2.97 5.55 2.03

Unlist/unnest list column into several columns

It is easier to do this rowSums i.e. divide the 'product1' by the rowSums on the columns that starts with key word 'product'. Instead of doing rowwise with c_across, this is vectorized and should be fast as well

library(dplyr)
dat %>%
mutate(sum_product = product1/rowSums(select(., starts_with('product'))))

NOTE: There is a mixing of base R code (apply) and the tidyverse option with across which doesn't seem to be the optimal way


If we need to do this for all the 'product' columns, create a sum column first with mutate and then use across on the columns that starts with 'product' to divide the column by 'Sum_col'

dat %>%
mutate(Sum_col = rowSums(select(., starts_with('product'))),
across(starts_with('product'),
~ ./Sum_col, .names = '{.col}_sum_product')) %>%
select(-Sum_col)

-output

#ysRespNum  product1 product2 product3 product1_sum_product product2_sum_product product3_sum_product
#1 1 23.766555 13.46907 24.32327 0.3860783 0.2187998 0.3951219
#2 2 30.071773 15.98740 11.39922 0.5233660 0.2782431 0.1983909
#3 3 18.224328 11.03880 20.67063 0.3649701 0.2210688 0.4139610
#4 4 30.140839 19.78984 19.62087 0.4333597 0.2845348 0.2821054
#5 5 8.915628 30.75021 24.29150 0.1393996 0.4807925 0.3798079
#6 6 23.791981 11.14885 21.72450 0.4198684 0.1967490 0.3833826

Or using base R

nm1 <- startsWith(names(dat), 'product')
dat[paste0('sum_product', seq_along(nm1))] <- dat[nm1]/rowSums(dat[nm1])

Unnest multiple columns-list from tibble with tidyr unnes_wider

We could use unlist2d with map

library(purrr)
library(collapse)
map_dfc(my_tibble, ~ unlist2d(.x) %>%
select(-1)) %>%
set_names(paste0(rep(names(my_tibble), each = 2), "_", 1:2))

-output

#    Level_1 Level_2 Level2_1 Level2_2 Level3_1 Level3_2
#1 10 20 30 40 330 430
#2 10 20 50 20 530 33

Or loop over the names of the data in map and apply unnest_wider

library(tidyr)
map_dfc(names(my_tibble),
~ my_tibble %>%
select(.x) %>%
unnest_wider(.x)) %>%
set_names(paste0(rep(names(my_tibble), each = 2), "_", 1:2))

According to documentation from ?unnest_wider

.col, col -
List-column to extract components from.

So, it is just a single column that we can specify where as in ?unnest, it is cols

cols - Columns to unnest.



Related Topics



Leave a reply



Submit