Convert R list to dataframe with missing/NULL elements
A comment mentioned wanting only a single loop, which can be achieved with @flodel's answer just by putting the body of the two loops together:
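library(plyr)  # provides rbind.fill()
rbind.fill(lapply(alist, function(f) {
  # drop NULL fields, then turn the remaining fields into a one-row data frame
  as.data.frame(Filter(Negate(is.null), f))
}))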
giving
name age
1 Foo 22
2 Bar NA
3 Baz NA
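For readers without plyr, the same single-pass idea can be sketched in base R. The input `alist` below is a hypothetical stand-in (the original question's data is not shown here), and the approach fills NULL or absent fields with NA rather than relying on rbind.fill:

```r
# Hypothetical input: some records have a NULL field, some omit it entirely
alist <- list(
  list(name = "Foo", age = 22),
  list(name = "Bar", age = NULL),
  list(name = "Baz")
)

# Union of all field names seen across the records
fields <- unique(unlist(lapply(alist, names)))

rows <- lapply(alist, function(rec) {
  # rec[[nm]] is NULL both for NULL fields and for absent ones; use NA instead
  vals <- lapply(fields, function(nm) if (is.null(rec[[nm]])) NA else rec[[nm]])
  as.data.frame(setNames(vals, fields), stringsAsFactors = FALSE)
})

# Every one-row data frame now has the same columns, so rbind() is safe
result <- do.call(rbind, rows)
```

This assumes each field holds at most one value per record; longer fields would produce multi-row data frames.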
How to convert a list of lists containing NULL values to a data frame
One option involving dplyr, tidyr and purrr could be:
map_depth(.x = my_ls, 2, ~ replace(.x, is.null(.x), NA), .ragged = TRUE) %>%
bind_rows() %>%
mutate(items = map_depth(items, 2, ~ replace(.x, is.null(.x), NA))) %>%
rename(`original_id` = id) %>%
unnest_wider(items)
original_id user_id user_name organization_id checkout_at currency bulk_discount
<int> <int> <chr> <chr> <lgl> <chr> <lgl>
1 406962 132786 Visitor … <NA> NA USD NA
2 406962 132786 Visitor … <NA> NA USD NA
3 407178 132786 Visitor … 00001 NA USD NA
# … with 11 more variables: coupon_codes <lgl>, id <int>, quantity <int>, unit_cost <dbl>,
# used <int>, item_id <int>, item_type <chr>, item_name <chr>, discount_type <chr>,
# discount <lgl>, coupon_id <lgl>
Or an option using rrapply, dplyr and tidyr:
rrapply(my_ls, f = function(x) if(is.null(x)) NA else x, how = "replace") %>%
bind_rows() %>%
rename(`original_id` = id) %>%
unnest_wider(items)
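Both pipelines start by replacing every NULL in the nested list with NA. As a rough illustration of that first step only, here is a base R sketch; `null_to_na` and `my_ls_demo` are hypothetical names, and the demo data merely imitates the shape of the original `my_ls`:

```r
# Hypothetical helper: walk a nested list and swap each NULL for NA
null_to_na <- function(x) {
  if (is.null(x)) return(NA)
  # Note: data frames are lists too and would come back as plain lists
  if (is.list(x)) return(lapply(x, null_to_na))
  x
}

# Hypothetical data with NULLs at two nesting depths
my_ls_demo <- list(
  list(id = 1L,
       organization_id = NULL,
       items = list(list(id = 10L, discount = NULL)))
)

cleaned <- null_to_na(my_ls_demo)
```

Once the NULLs are gone, each record has a value for every field, which is what lets bind_rows() and unnest_wider() line the columns up.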
Convert list with NULL entries to data.frame in R
Is this what you are trying to do?
> data.frame(do.call(rbind, z))
a b
1 1 2
2 2 3
3 NULL 4
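Note that the printed NULL in row 3 suggests the columns here are list columns rather than atomic vectors. If ordinary numeric columns with NA are wanted instead, one possible sketch is to replace the NULLs first; the list `z` below is a hypothetical reconstruction of the input based on the output shown:

```r
# Hypothetical input matching the printed result above
z <- list(list(a = 1, b = 2), list(a = 2, b = 3), list(a = NULL, b = 4))

# Replace each NULL field with NA so every record has a value per field
z_clean <- lapply(z, function(rec) {
  lapply(rec, function(v) if (is.null(v)) NA else v)
})

# One-row data frame per record, then stack them
df <- do.call(rbind, lapply(z_clean, as.data.frame))
```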
Converting an R list with NULL sub-elements to a data frame
I think I've found a solution myself.
My approach is to first convert all the sub-lists into data frames, so I have a list of data frames instead of a list of lists. These data frames simply drop the NULL variables.
ldf <- lapply(lll, function(x) {
nonnull <- sapply(x, typeof)!="NULL" # find all NULLs to omit
do.call(data.frame, c(x[nonnull], stringsAsFactors=FALSE))
})
The resultant list of dataframes:
> str(ldf)
List of 2
$ :'data.frame': 1 obs. of 2 variables:
..$ Name : chr "Sghokbt"
..$ Value: int 7
$ :'data.frame': 1 obs. of 3 variables:
..$ Name : chr "Sgnglio"
..$ Title: chr "Mr"
..$ Value: num 5
From here I get a little help from plyr.
require(plyr)
df <- ldply(ldf)
The result has the columns out of order, but I'm happy enough with it.
> str(df)
'data.frame': 2 obs. of 3 variables:
$ Name : chr "Sghokbt" "Sgnglio"
$ Value: num 7 5
$ Title: chr NA "Mr"
I won't accept this as the answer for now, in case there is a better solution.
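If the column order matters, it can be restored afterwards by indexing with the desired names. A small sketch, where `df` is rebuilt by hand from the str() output above and `wanted` is a hypothetical vector of column names:

```r
# Rebuild the combined result shown above (values copied from the str() output)
df <- data.frame(
  Name  = c("Sghokbt", "Sgnglio"),
  Value = c(7, 5),
  Title = c(NA, "Mr"),
  stringsAsFactors = FALSE
)

# Hypothetical target order: the order the fields had in the fuller record
wanted <- c("Name", "Title", "Value")
df <- df[, wanted]
```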
Converting nested list with missing values to data frame in R
We'll let jsonlite::flatten do most of the work:
Put your two example results in one list (hopefully this is faithful to your actual data structure):
first100geocode <- list(
structure(list(results = structure(list(address_components = list(
structure(list(long_name = c(
"11А", "улица Гоголя", "Зеленоградский административный округ",
"Зеленоград", "Москва", "Москва", "Россия", "124575"), short_name = c(
"11А", "ул. Гоголя", "Зеленоградский административный округ", "Зеленоград",
"Москва", "Москва", "RU", "124575"), types = list(
"street_number", "route", c("political", "sublocality", "sublocality_level_1"
), c("locality", "political"), c(
"administrative_area_level_2",
"political"), c("administrative_area_level_1", "political"
), c("country", "political"), "postal_code")), .Names = c(
"long_name",
"short_name", "types"), class = "data.frame", row.names = c(NA, 8L))),
formatted_address = "ул. Гоголя, 11А, Зеленоград, Москва, Россия, 124575",
geometry = structure(list(location = structure(
list(lat = 55.987567, lng = 37.17152),
.Names = c("lat", "lng"), class = "data.frame", row.names = 1L),
location_type = "ROOFTOP", viewport = structure(list(
northeast = structure(list(
lat = 55.9889159802915, lng = 37.1728689802915), .Names = c("lat", "lng"
), class = "data.frame", row.names = 1L), southwest = structure(list(
lat = 55.9862180197085, lng = 37.1701710197085), .Names = c("lat", "lng"),
class = "data.frame", row.names = 1L)), .Names = c("northeast", "southwest"),
class = "data.frame", row.names = 1L)),
.Names = c("location", "location_type", "viewport"),
class = "data.frame", row.names = 1L),
place_id = "ChIJzXSgUeQUtUYREIohzQOG--A", types = list("street_address")),
.Names = c("address_components",
"formatted_address", "geometry", "place_id", "types"),
class = "data.frame", row.names = 1L),
status = "OK"), .Names = c("results", "status")),
structure(list(results = list(), status = "ZERO_RESULTS"),
.Names = c("results", "status"))
)
Do the actual flattening (and filter out address_components and types, which are a bit trickier and of no interest to you):
flatten_googleway <- function(df) {
res <- jsonlite::flatten(df)
res[, !names(res) %in% c("address_components", "types")]
}
Prepare a zero-row template data frame to stand in for "missing" results, then substitute a single all-NA row from it wherever a result is empty:
template_res <- flatten_googleway(first100geocode[[1]]$results)[FALSE, ]
do.call(rbind, lapply(first100geocode, function(x) {
if (length(x$results) == 0) template_res[1, ] else flatten_googleway(x$results)
}))
# formatted_address place_id
# 1 ул. Гоголя, 11А, Зеленоград, Москва, Россия, 124575 ChIJzXSgUeQUtUYREIohzQOG--A
# NA <NA> <NA>
# geometry.location_type geometry.location.lat geometry.location.lng
# 1 ROOFTOP 55.98757 37.17152
# NA <NA> NA NA
# geometry.viewport.northeast.lat geometry.viewport.northeast.lng
# 1 55.98892 37.17287
# NA NA NA
# geometry.viewport.southwest.lat geometry.viewport.southwest.lng
# 1 55.98622 37.17017
# NA NA NA
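The template trick can be seen in isolation with a much smaller example. The sketch below uses hypothetical stand-in data (`found`, `results`) instead of real geocoding output, and skips the jsonlite flattening step entirely:

```r
# Hypothetical stand-ins for one successful and one empty result
found <- data.frame(address = "Some street 1", lat = 55.98, lng = 37.17,
                    stringsAsFactors = FALSE)
results <- list(found, list())

# Zero-row template: keeps the column names and types but holds no data
template <- found[FALSE, ]

rows <- lapply(results, function(x) {
  # Asking a zero-row data frame for row 1 yields a single all-NA row
  if (length(x) == 0) template[1, ] else x
})

combined <- do.call(rbind, rows)
```

Because the NA row comes from the template, it carries the same column types as the real rows, so rbind() does not need to coerce anything.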
R: removing NULL elements from a list
The closest you'll be able to get is to first name the list elements and then remove the NULLs.
names(x) <- seq_along(x)
## Using some higher-order convenience functions
Filter(Negate(is.null), x)
# $`11`
# [1] 123
#
# $`13`
# [1] 456
# Or, using a slightly more standard R idiom
x[sapply(x, is.null)] <- NULL
x
# $`11`
# [1] 123
#
# $`13`
# [1] 456
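Since the names now record where each surviving element used to sit, the original positions can be recovered from them afterwards. A short sketch, with a hypothetical `x` built to match the output above:

```r
# Hypothetical input: 13 slots, only positions 11 and 13 are filled
x <- vector("list", 13)
x[[11]] <- 123
x[[13]] <- 456

# Label each slot with its position, then drop the NULLs
names(x) <- seq_along(x)
kept <- Filter(Negate(is.null), x)

# The names are character strings; convert back to get integer positions
positions <- as.integer(names(kept))
```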
rbinding with a list containing empty lists for NAs in R
Assuming that
- All the data frames in your list share the same variable names and number of columns.
- Your first element in that nested list is not an empty list. (This is just for my convenience later, you can pick any one element which is a data frame as you like.)
My approach is to replace each element that is not a data frame with a one-row data frame of NAs that has the same column names as the other data frames.
change_others_to_dataframe <- function(x) {
# If x is a data frame, do nothing and return x
# Otherwise, return a data frame with 1 row of NAs
if (is.data.frame(x)) {return(x)}
else {return(setNames(data.frame(matrix(ncol = ncol(myList[[1]]), nrow = 1)),
names(myList[[1]])))}
}
# Apply the written function above to every element in myList
mynewList <- lapply(myList, change_others_to_dataframe)
# Combine with dplyr::bind_rows(); .id = "id" records each element's position
library(dplyr)
df <- bind_rows(mynewList, .id = "id")
I believe that this would solve your problem.
For creating data frames with no data, you may refer to these threads on SO:
- Create empty data frame with column names by assigning a string vector
- Create an empty data frame
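The second assumption above can be relaxed by taking the template from the first element that actually is a data frame. The following is a base R sketch of that variation, not the answer's own code; `myList` is hypothetical sample data, and the `id` column is built by hand to mimic what `.id` provides:

```r
# Hypothetical input: the middle element is an empty list standing in for NA
myList <- list(
  data.frame(a = 1, b = "x", stringsAsFactors = FALSE),
  list(),
  data.frame(a = 2, b = "y", stringsAsFactors = FALSE)
)

# Use the first real data frame as the template, wherever it sits in the list
template <- Filter(is.data.frame, myList)[[1]]

# Zero-row copy of the template, then row 1 of that: a single all-NA row
na_row <- template[FALSE, , drop = FALSE][1, , drop = FALSE]

filled <- lapply(myList, function(x) if (is.data.frame(x)) x else na_row)
combined <- do.call(rbind, filled)

# Record which list element each row came from
combined$id <- rep(seq_along(filled), vapply(filled, nrow, integer(1)))
rownames(combined) <- NULL
```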
Convert a list to a data frame
Update July 2020: The default for the parameter stringsAsFactors is now default.stringsAsFactors(), which in turn yields FALSE as its default.
Assuming your list of lists is called l:
df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=TRUE))
In R versions before 4.0.0, the above converts all character columns to factors; to avoid this, pass stringsAsFactors = FALSE to the data.frame() call:
df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=TRUE), stringsAsFactors=FALSE)
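One caveat worth illustrating: unlist() flattens everything into a single atomic vector, so mixed character and numeric fields all end up as character. It also drops NULL entries, which would shift values between columns for records with NULL fields. The sketch below uses a hypothetical two-record list `l` and contrasts the matrix route with a per-record conversion that keeps the types:

```r
# Hypothetical list of records mixing character and numeric fields
l <- list(list(name = "a", n = 1), list(name = "b", n = 2))

# Matrix route: unlist() coerces every value to character
df_chr <- data.frame(matrix(unlist(l), nrow = length(l), byrow = TRUE),
                     stringsAsFactors = FALSE)

# Per-record route: one-row data frames keep each field's own type
df_typed <- do.call(rbind, lapply(l, function(rec) {
  as.data.frame(rec, stringsAsFactors = FALSE)
}))
```

The per-record route is slower on large lists, so the matrix route remains a reasonable choice when all fields share one type and none are NULL.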