Unnesting a list of lists in a data frame column
Note: Ignore the original and Update 1; Update 2 is better with the current state of the tidyverse.
Original:
With purrr, which is nice for lists,
library(purrr)
df %>% dmap(unlist)
## # A tibble: 2 x 2
## x y
## <dbl> <dbl>
## 1 1 1
## 2 1 2
which is more or less equivalent to
as.data.frame(lapply(df, unlist))
## x y
## a 1 1
## b 1 2
Update 1:
dmap has been deprecated and moved to purrrlyr, the home of interesting but ill-fated functions that will now shout lots of deprecation warnings at you. You could translate the base R idiom to tidyverse:
df %>% map(unlist) %>% as_tibble()
which will work fine for this case, but not for more than one row (a problem all these approaches face). A more robust solution might be
library(tidyverse)
df %>% bind_rows(df) %>% # make larger sample data
mutate_if(is.list, simplify_all) %>% # flatten each list element internally
unnest() # expand
#> # A tibble: 4 × 2
#> x y
#> <dbl> <dbl>
#> 1 1 1
#> 2 1 2
#> 3 1 1
#> 4 1 2
Update 2:
At some point since this was asked, tidyr::unnest() got updated such that it doesn't error anymore, so you can just do
df %>%
unnest(y) %>%
unnest(y)
#> # A tibble: 2 × 2
#> x y
#> <dbl> <dbl>
#> 1 1 1
#> 2 1 2
If you care about the names in the list, pull them out first and then unnest the names and the list at the same time:
df %>%
mutate(label = map(y, names)) %>%
unnest(c(y, label)) %>%
unnest(y)
#> # A tibble: 2 × 3
#> x y label
#> <dbl> <dbl> <chr>
#> 1 1 1 a
#> 2 1 2 b
I'll leave the previous answers for continuity, but this is simpler.
Unnest list of lists of data frames, containing NAs
I create a helper function to combine p and c:
foo <- function(x) {
  a <- x[[1]]
  b <- x[[2]]
  if (nrow(b) == 0) b[1, ] <- NA  # keep one row of NAs when b is empty
  return(cbind(a, b))
}
Then I run the helper function on each element and bind the rows:
do.call(rbind, lapply(mylist, foo))
The result:
> do.call(rbind, lapply(mylist, foo))
id text from
1 01 one A
2 01 two B
3 02 three C
4 02 four D
5 02 five E
6 03 <NA> <NA>
P.S. The same result using the base R pipe:
lapply(mylist, foo) |> do.call(what = rbind)
R: How to unnest a nested list into data.frame?
We could also use as.data.frame, which gets the correct column types:
out <- map_dfr(l12, as.data.frame)
str(out)
#'data.frame': 2 obs. of 3 variables:
# $ SeriousDlqin2yrs.prediction : chr "0" "1"
# $ SeriousDlqin2yrs.prediction_probs.0: num 0.5 0.6
# $ SeriousDlqin2yrs.prediction_probs.1: num 0.5 0.4
Or in base R
do.call(rbind, lapply(l12, as.data.frame))
# SeriousDlqin2yrs.prediction SeriousDlqin2yrs.prediction_probs.0 SeriousDlqin2yrs.prediction_probs.1
#1 0 0.5 0.5
#2 1 0.6 0.4
unnest list of lists of different lengths to dataframe
Would this work for you?
library(jsonlite)
library(tidyverse)
data = fromJSON("http://search.worldbank.org/api/v2/wds?format=json&fl=abstracts,admreg,alt_title,authr,available_in,bdmdt,chronical_docm_id,closedt,colti,count,credit_no,disclosure_date,disclosure_type,disclosure_type_date,disclstat,display_title,docdt,docm_id,docna,docty,dois,entityid,envcat,geo_reg,geo_reg,geo_reg_and_mdk,guid,historic_topic,id,isbn,ispublicdocs,issn,keywd,lang,listing_relative_url,lndinstr,loan_no,majdocty,majtheme,ml_abstract,ml_display_title,new_url,owner,pdfurl,prdln,projectid,projn,publishtoextweb_dt,repnb,repnme,seccl,sectr,src_cit,subsc,subtopic,teratopic,theme,topic,topicv3,totvolnb,trustfund,txturl,unregnbr,url_friendly_title,versiontyp,versiontyp_key,virt_coll,vol_title,volnb&str_docdt=1986-01-01&end_docdt=2000-12-31&rows=500&os=1&srt=docdt&order=desc")
df <-
data$documents %>%
head(-1) %>% # remove facet element
transpose %>% # transpose so each subelement is now a main element
as_tibble %>% # convert to table
purrr::modify(~replace(.x,lengths(.x)==0,list(NA))) %>% # replace empty elements by list(NA) so they have length 1 too
modify_if(~all(lengths(.x)==1),unlist) # unlist lists that contain only items of length 1
Only one list column remains:
names(df)[map_chr(df,class) == "list"]
# [1] "keywd"
It contains items of length 1 or 2:
table(lengths(df$keywd))
# 1 2
# 224 276
Here's what the output looks like:
glimpse(df)
# Observations: 500
# Variables: 38
# $ url <chr> "http://documents.worldbank.org/curated/en/903231468764970044/Attacking-rural-poverty-strategy-and-public-actions", "...
# $ available_in <chr> "English", "English", "English", "English", "English", "English,French,Spanish,Portuguese", "Portuguese,Chinese,Engli...
# $ url_friendly_title <chr> "http://documents.worldbank.org/curated/en/903231468764970044/Attacking-rural-poverty-strategy-and-public-actions", "...
# $ new_url <chr> "2000/12/1000476/Attacking-rural-poverty-strategy-and-public-actions", "2000/12/1000501/State-policies-and-womens-aut...
# $ guid <chr> "903231468764970044", "429001468753367328", "985531468746683502", "890081468757236671", "922151468776107524", "324581...
# $ disclosure_date <chr> "2010-07-01T00:00:00Z", "2010-07-01T00:00:00Z", "2010-07-01T00:00:00Z", "2010-07-01T00:00:00Z", "2010-07-01T00:00:00Z...
# $ disclosure_type <chr> "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA", "NA...
# $ disclosure_type_date <chr> "2010-07-01T00:00:00Z", "2010-07-01T00:00:00Z", "2010-07-01T00:00:00Z", "2010-07-01T00:00:00Z", "2010-07-01T00:00:00Z...
# $ publishtoextweb_dt <chr> "2010-07-01T00:00:00Z", "2010-07-01T00:00:00Z", "2010-07-01T00:00:00Z", "2010-07-01T00:00:00Z", "2010-07-01T00:00:00Z...
# $ docm_id <chr> "090224b0828c737a", "090224b0828ac316", "090224b0828bd3f7", "090224b0828ac343", "090224b0828cf43d", "090224b0828cf42b...
# $ chronical_docm_id <chr> "090224b0828c737a", "090224b0828ac316", "090224b0828bd3f7", "090224b0828ac343", "090224b0828cf43d", "090224b0828cf42b...
# $ txturl <chr> "http://documents.worldbank.org/curated/en/903231468764970044/text/multi-page.txt", "http://documents.worldbank.org/c...
# $ pdfurl <chr> "http://documents.worldbank.org/curated/en/903231468764970044/pdf/multi-page.pdf", "http://documents.worldbank.org/cu...
# $ docdt <chr> "2000-12-31T00:00:00Z", "2000-12-31T00:00:00Z", "2000-12-31T00:00:00Z", "2000-12-31T00:00:00Z", "2000-12-31T00:00:00Z...
# $ totvolnb <chr> "1", "1", "1", "1", "5", "1", "1", "14", "1", "1", "1", "1", "14", "14", "14", "14", "14", "14", "14", "14", "14", "1...
# $ versiontyp <chr> "Final", "Final", "Final", "Final", "Final", "Final", "Final", "Final", "Final", "Final", "Final", "Final", "Final", ...
# $ versiontyp_key <chr> "1309935", "1309935", "1309935", "1309935", "1309935", "1309935", "1309935", "1309935", "1309935", "1309935", "130993...
# $ volnb <chr> "1", "1", "1", "1", "4", "1", "1", "8", "1", "1", "1", "1", "13", "4", "9", "12", "3", "2", "7", "10", "1", "6", "11"...
# $ repnme <chr> "Attacking rural poverty : strategy and\n public actions", "State policies and women's autonomy in\n ...
# $ abstracts <chr> "Poverty remains pervasive, and its\n incidence and intensity are usually higher in rural than in\n ...
# $ display_title <chr> "Attacking rural poverty :\n strategy and public actions", "State policies and women's\n autono...
# $ listing_relative_url <chr> "/research/2000/12/1000476/attacking-rural-poverty-strategy-public-actions", "/research/2000/12/1000501/state-policie...
# $ docty <chr> "Newsletter", "Working Paper (Numbered Series)", "Publication", "Poverty Reduction Strategy Paper (PRSP)", "Environme...
# $ subtopic <chr> "Economic Theory & Research,Rural Settlements,Industrial Economics,Nutrition,Educational Sciences,Economic Growth,Agr...
# $ docna <chr> "Attacking rural poverty : strategy and\n public actions", "State policies and women's autonomy in\n ...
# $ teratopic <chr> "Poverty Reduction", "Education", "Energy", "Poverty Reduction", "Industry,Transport,Water Resources", NA, "Governanc...
# $ authors <chr> "Okidegbe, Nwanze", "Zhang, Xiaodan", "Bogach, V. Susan", NA, "Carl Brothers International Inc.", "World Bank", "Mann...
# $ entityids <chr> "000094946_01022305364180", "000094946_01022705322025", "000094946_01011005520622", "000094946_0102240538258", "00009...
# $ subsc <chr> "Macro/Non-Trade", "Human Development", "(Historic)Other power and energy conversion", "(Historic)Macro/non-trade", "...
# $ lang <chr> "English", "English", "English", "English", "English", "Portuguese", "English", "English", "Chinese", "English", "Eng...
# $ historic_topic <chr> "Poverty Reduction", "Education", "Energy", "Poverty Reduction", "Industry,Transport,Water Resources", NA, "Governanc...
# $ seccl <chr> "Public", "Public", "Public", "Public", "Public", "Public", "Public", "Public", "Public", "Public", "Public", "Public...
# $ sectr <chr> "(Historic)Economic Policy", "(Historic)Multisector", "(Historic)Electric Power & Other Energy", "(Historic)Economic ...
# $ majdocty <chr> "Publications & Research", "Publications & Research", "Publications,Publications & Research", "Country Focus", "Proje...
# $ src_cit <chr> "Rural development note. -- No. 6 (December 2000)", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
# $ keywd <list> [[["Rural Poor;medium term expenditure\n framework;rural poverty reduction strategy;rural\n ar...
# $ owner <chr> "Environ & Soc Sustainable Dev VP (ESD)", "Off of Sr VP Dev Econ/Chief Econ (DECVP)", "Energy & Mining Sector Unit (E...
# $ repnb <chr> "21649", "21743", "WTP492", "21834", "E287", "27779", "21604", "E425", "21604", "22194", "21837", "22903", "E425", "E...
How to unnest a list containing data frames
One issue is that there are nested data frames within each column:
library(tidyverse)
df %>%
mutate(json = map(json, ~ if(is.null(.x))
tibble(attributes.StreetName = NA_character_, attributes.Match_addr = NA_character_)
else do.call(data.frame, c(.x, stringsAsFactors = FALSE)))) %>%
unnest
# A tibble: 5 x 7
# full_address attributes.StreetNa… attributes.Match_ad… address location.x location.y score
# <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#1 2379 ADDISON BLVD, HIGH POINT, … <NA> <NA> <NA> NA NA NA
#2 1751 W LEXINGTON AVE, HIGH POIN… <NA> <NA> <NA> NA NA NA
#3 2514 WILLARD DAIRY RD, HIGH POI… WILLARD DAIRY 2514 WILLARD DAIRY 2514 WILLARD DAI… -80.0 36.0 92.8
#4 126 MARYWOOD DR, HIGH POINT, NC… MARYWOOD 126 MARYWOOD, HIGH … 126 MARYWOOD, HI… -80.0 36.0 97.2
#5 508 EDNEY RIDGE RD, GREENSBORO,… EDNEY RIDGE 508 EDNEY RIDGE RD 508 EDNEY RIDGE … -79.8 36.1 100
Or using map_if
f1 <- function(dat) {
dat %>%
flatten
}
f2 <- function(dat) {
tibble(attributes.StreetName = NA_character_,
attributes.Match_addr = NA_character_)
}
df %>%
mutate(json = map_if(json, is.data.frame, f1, .else = f2)) %>%
unnest
# A tibble: 5 x 7
# full_address attributes.StreetNa… attributes.Match_ad… address score location.x location.y
# <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#1 2379 ADDISON BLVD, HIGH POINT, … <NA> <NA> <NA> NA NA NA
#2 1751 W LEXINGTON AVE, HIGH POIN… <NA> <NA> <NA> NA NA NA
#3 2514 WILLARD DAIRY RD, HIGH POI… WILLARD DAIRY 2514 WILLARD DAIRY 2514 WILLARD DAI… 92.8 -80.0 36.0
#4 126 MARYWOOD DR, HIGH POINT, NC… MARYWOOD 126 MARYWOOD, HIGH … 126 MARYWOOD, HI… 97.2 -80.0 36.0
#5 508 EDNEY RIDGE RD, GREENSBORO,… EDNEY RIDGE 508 EDNEY RIDGE RD 508 EDNEY RIDGE … 100 -79.8 36.1
How to turn a list of lists into columns of a pandas dataframe?
You can try using df.explode and df.apply:
import pandas as pd
df = pd.DataFrame(data= {'Generation': 0, 'Route_set':[[[20., 19., 47., 56.], [21., 34., 78., 34.]]]})
df['route1']=df['Route_set'].apply(lambda x: x[0])
df['route2']=df['Route_set'].apply(lambda x: x[1])
df = df.explode(['route1', 'route2'], ignore_index=True)
df2 = df[df.columns.difference(['Route_set', 'Generation'])]
| | route1 | route2 |
|---:|---------:|---------:|
| 0 | 20 | 21 |
| 1 | 19 | 34 |
| 2 | 47 | 78 |
| 3 | 56 | 34 |
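The same explode-based idea can be sketched as a small function that handles any number of inner lists instead of hard-coding route1 and route2. The helper name routes_to_columns and the route{n} naming are illustrative assumptions, not part of the original answer, and exploding several columns at once needs pandas 1.3 or later.

```python
import pandas as pd

def routes_to_columns(df, col="Route_set"):
    """Hypothetical helper: one column per inner list, one row per element."""
    n_routes = len(df[col].iloc[0])  # assumes every row holds the same number of routes
    out = pd.DataFrame(index=df.index)
    names = []
    for i in range(n_routes):
        name = f"route{i + 1}"
        # pull the i-th inner list out of each row's nested list
        out[name] = df[col].apply(lambda routes, i=i: routes[i])
        names.append(name)
    # explode all route columns together so their elements stay aligned row by row
    return out.explode(names, ignore_index=True)

df = pd.DataFrame({"Generation": 0,
                   "Route_set": [[[20., 19., 47., 56.], [21., 34., 78., 34.]]]})
wide = routes_to_columns(df)
```

Exploding several columns together only works when the lists in each row have matching lengths, which holds for this sample.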
Or you can just create a new dataframe with the values like this:
import pandas as pd
df = pd.DataFrame(data= {'Generation': 0, 'Route_set':[[[20., 19., 47., 56.], [21., 34., 78., 34.]]]})
df1 = pd.DataFrame.from_dict(dict(zip(['route1', 'route2'], df.Route_set.to_numpy()[0])), orient='index').transpose()
| | route1 | route2 |
|---:|---------:|---------:|
| 0 | 20 | 21 |
| 1 | 19 | 34 |
| 2 | 47 | 78 |
| 3 | 56 | 34 |
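One reason to go through orient='index' and a transpose, rather than passing the dict straight to pd.DataFrame, is that it tolerates inner lists of unequal length. A minimal sketch with made-up ragged data (the values below are not from the question):

```python
import pandas as pd

# Made-up routes of different lengths, for illustration only
routes = {"route1": [20.0, 19.0, 47.0], "route2": [21.0, 34.0]}

# Each route becomes a row first, so shorter routes are padded with NaN;
# the transpose then turns every route into a column.
wide = pd.DataFrame.from_dict(routes, orient="index").transpose()

# By contrast, pd.DataFrame(routes) raises a ValueError here,
# because the lists are not all the same length.
```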
Update 1:
import pandas as pd
df = pd.DataFrame(data= {'Generation': 0, 'Route_set':[
[[20.0, 19.0, 47.0, 56.0, 43.0, 53.0, 18.0, -1.0, -1.0, -1.0, -1.0, -1.0], [20.0, 51.0, 46.0, 37.0, 2.0, 57.0, 49.0, 36.0, 25.0, 5.0, 4.0, 34.0], [54.0, 23.0, 5.0, 46.0, 34.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [57.0, 48.0, 46.0, 35.0, 25.0, 27.0, 52.0, 8.0, 39.0, 22.0, 51.0, 28.0], [57.0, 16.0, 45.0, 25.0, 49.0, 38.0, 0.0, 46.0, 13.0, 18.0, 19.0, 20.0], [21.0, 11.0, 6.0, 33.0, 25.0, 49.0, 57.0, 29.0, 12.0, 3.0, -1.0, -1.0], [9.0, 15.0, 47.0, 42.0, 25.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [51.0, 25.0, 22.0, 14.0, 39.0, 8.0, 40.0, 0.0, 10.0, 26.0, 32.0, 47.0], [1.0, 33.0, 24.0, 46.0, 56.0, 30.0, 48.0, 51.0, -1.0, -1.0, -1.0, -1.0], [25.0, 31.0, 50.0, 17.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [57.0, 12.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [20.0, 41.0, 47.0, 15.0, 46.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [14.0, 44.0, 39.0, 25.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [20.0, 51.0, 25.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [57.0, 49.0, 5.0, 20.0, 37.0, 46.0, 36.0, 25.0, 39.0, 51.0, 48.0, -1.0], [5.0, 0.0, 33.0, 55.0, 25.0, 48.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [51.0, 32.0, 33.0, 24.0, 35.0, 8.0, 25.0, 4.0, 46.0, 1.0, 7.0, -1.0], [5.0, 25.0, 34.0, 46.0, 1.0, 9.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [38.0, 57.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [12.0, 57.0, 49.0, 25.0, 9.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0]],
]})
data = df.Route_set.to_numpy()[0]  # the single nested list of routes
names = ['route{}'.format(i) for i in range(1, len(data) + 1)]
# orient='index' makes each route a row; transpose turns each route into a column
df = pd.DataFrame.from_dict(dict(zip(names, data)), orient='index').transpose()
print(df.to_markdown())
| | route1 | route2 | route3 | route4 | route5 | route6 | route7 | route8 | route9 | route10 | route11 | route12 | route13 | route14 | route15 | route16 | route17 | route18 | route19 | route20 |
|---:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|
| 0 | 20 | 20 | 54 | 57 | 57 | 21 | 9 | 51 | 1 | 25 | 57 | 20 | 14 | 20 | 57 | 5 | 51 | 5 | 38 | 12 |
| 1 | 19 | 51 | 23 | 48 | 16 | 11 | 15 | 25 | 33 | 31 | 12 | 41 | 44 | 51 | 49 | 0 | 32 | 25 | 57 | 57 |
| 2 | 47 | 46 | 5 | 46 | 45 | 6 | 47 | 22 | 24 | 50 | -1 | 47 | 39 | 25 | 5 | 33 | 33 | 34 | -1 | 49 |
| 3 | 56 | 37 | 46 | 35 | 25 | 33 | 42 | 14 | 46 | 17 | -1 | 15 | 25 | -1 | 20 | 55 | 24 | 46 | -1 | 25 |
| 4 | 43 | 2 | 34 | 25 | 49 | 25 | 25 | 39 | 56 | -1 | -1 | 46 | -1 | -1 | 37 | 25 | 35 | 1 | -1 | 9 |
| 5 | 53 | 57 | -1 | 27 | 38 | 49 | -1 | 8 | 30 | -1 | -1 | -1 | -1 | -1 | 46 | 48 | 8 | 9 | -1 | -1 |
| 6 | 18 | 49 | -1 | 52 | 0 | 57 | -1 | 40 | 48 | -1 | -1 | -1 | -1 | -1 | 36 | -1 | 25 | -1 | -1 | -1 |
| 7 | -1 | 36 | -1 | 8 | 46 | 29 | -1 | 0 | 51 | -1 | -1 | -1 | -1 | -1 | 25 | -1 | 4 | -1 | -1 | -1 |
| 8 | -1 | 25 | -1 | 39 | 13 | 12 | -1 | 10 | -1 | -1 | -1 | -1 | -1 | -1 | 39 | -1 | 46 | -1 | -1 | -1 |
| 9 | -1 | 5 | -1 | 22 | 18 | 3 | -1 | 26 | -1 | -1 | -1 | -1 | -1 | -1 | 51 | -1 | 1 | -1 | -1 | -1 |
| 10 | -1 | 4 | -1 | 51 | 19 | -1 | -1 | 32 | -1 | -1 | -1 | -1 | -1 | -1 | 48 | -1 | 7 | -1 | -1 | -1 |
| 11 | -1 | 34 | -1 | 28 | 20 | -1 | -1 | 47 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
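If the columns ever need to be reordered, note that sorting the names as plain strings puts route10 before route2. A minimal sketch of a number-aware sort follows; natural_key is a hypothetical helper and the frame holds placeholder values only.

```python
import re
import pandas as pd

def natural_key(name):
    """Split 'route10' into ('route', 10) so the numeric part compares as a number."""
    match = re.fullmatch(r"(\D*)(\d+)", name)
    return (match.group(1), int(match.group(2))) if match else (name, 0)

cols = [f"route{i}" for i in (10, 2, 1, 20, 3)]
frame = pd.DataFrame([[0] * len(cols)], columns=cols)  # placeholder values

# Reorder columns using the number-aware key instead of plain string order
frame = frame[sorted(frame.columns, key=natural_key)]
```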
How to unnest column-list?
You can do this by coercing the elements in the list column to data frames arranged as you like, which will unnest nicely:
library(tidyverse)
tibble(a = c('first', 'second'),
b = list(c('colA' = 1, 'colC' = 2), c('colA'= 3, 'colB'=2))) %>%
mutate(b = invoke_map(tibble, b)) %>%
unnest()
#> # A tibble: 2 x 4
#> a colA colC colB
#> <chr> <dbl> <dbl> <dbl>
#> 1 first 1. 2. NA
#> 2 second 3. NA 2.
Doing the coercion is a little tricky, though, as you don't want to end up with a 2x1 data frame. There are various ways around this, but a direct route is purrr::invoke_map, which calls a function with purrr::invoke (like do.call) on each element in a list.
Unnest one of several list columns in dataframe
According to unnest, the argument ... is:
Specification of columns to nest. Use bare variable names or
functions of variables. If omitted, defaults to all list-cols.
Therefore, we could specify the column name to be unnested after the rename_all
iris %>%
... #op's code
...
rename_all(funs(str_c("Mean.", .))))) %>%
unnest(sum_data)
# A tibble: 3 x 6
# Species data Mean.Sepal.Length Mean.Sepal.Width Mean.Petal.Length Mean.Petal.Width
# <fctr> <list> <dbl> <dbl> <dbl> <dbl>
#1 setosa <tibble [50 x 4]> 5.01 3.43 1.46 0.246
#2 versicolor <tibble [50 x 4]> 5.94 2.77 4.26 1.33
#3 virginica <tibble [50 x 4]> 6.59 2.97 5.55 2.03