Convert R List to Dataframe with Missing/Null Elements

Convert R list to dataframe with missing/NULL elements

A comment mentioned wanting only a single loop, which can be achieved with @flodel's answer just by putting the body of the two loops together:

library(plyr)  # provides rbind.fill

rbind.fill(lapply(alist, function(f) {
  # drop NULL fields so each record becomes a one-row data frame
  as.data.frame(Filter(Negate(is.null), f))
}))

giving

  name age
1 Foo 22
2 Bar NA
3 Baz NA
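
For readers without plyr, the same idea can be sketched in base R. This is only an illustration under assumptions: the alist below is a hypothetical input chosen to reproduce the output above, and NULL fields are turned into NA (rather than dropped) so that a plain rbind() can line the columns up.

```r
# Hypothetical input: three records, two of them with a NULL age
alist <- list(
  list(name = "Foo", age = 22),
  list(name = "Bar", age = NULL),
  list(name = "Baz", age = NULL)
)

# Every field name that occurs in any record
cols <- unique(unlist(lapply(alist, names)))

# Turn each record into a one-row data frame, using NA for NULL or absent fields
rows <- lapply(alist, function(rec) {
  vals <- lapply(cols, function(col) {
    v <- rec[[col]]
    if (is.null(v)) NA else v
  })
  as.data.frame(setNames(vals, cols), stringsAsFactors = FALSE)
})

df <- do.call(rbind, rows)
df
```

Unlike rbind.fill, this sketch needs the full set of column names up front, which is why cols is computed before the loop.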

How to convert a list of lists containing NULL values to a data frame

One option involving dplyr, tidyr and purrr could be:

library(dplyr)
library(tidyr)
library(purrr)

map_depth(.x = my_ls, 2, ~ replace(.x, is.null(.x), NA), .ragged = TRUE) %>%
  bind_rows() %>%
  mutate(items = map_depth(items, 2, ~ replace(.x, is.null(.x), NA))) %>%
  rename(`original_id` = id) %>%
  unnest_wider(items)

original_id user_id user_name organization_id checkout_at currency bulk_discount
<int> <int> <chr> <chr> <lgl> <chr> <lgl>
1 406962 132786 Visitor … <NA> NA USD NA
2 406962 132786 Visitor … <NA> NA USD NA
3 407178 132786 Visitor … 00001 NA USD NA
# … with 11 more variables: coupon_codes <lgl>, id <int>, quantity <int>, unit_cost <dbl>,
# used <int>, item_id <int>, item_type <chr>, item_name <chr>, discount_type <chr>,
# discount <lgl>, coupon_id <lgl>

Or an option using rrapply, dplyr and tidyr:

rrapply(my_ls, f = function(x) if (is.null(x)) NA else x, how = "replace") %>%
  bind_rows() %>%
  rename(`original_id` = id) %>%
  unnest_wider(items)
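
Both options rely on first replacing every NULL with NA at any depth. As a rough illustration of that step alone, here is a base R sketch. It is not how purrr or rrapply implement it, and both the helper null_to_na and the small my_ls are hypothetical stand-ins for the real data.

```r
# Recursively replace NULL with NA in a nested plain list.
# Assumption: the input contains only plain lists and atomic values
# (a data frame would be returned as a plain list by lapply).
null_to_na <- function(x) {
  if (is.null(x)) return(NA)
  if (is.list(x)) return(lapply(x, null_to_na))
  x
}

# Hypothetical, heavily simplified stand-in for my_ls
my_ls <- list(
  list(
    id = 406962L,
    organization_id = NULL,
    items = list(list(id = 1L, discount = NULL))
  )
)

cleaned <- null_to_na(my_ls)
str(cleaned)
```

Once the NULLs are gone, every record has the same length, which is what lets the row-binding step work.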

Convert list with NULL entries to data.frame in R

Is this what you are trying to do?

> data.frame(do.call(rbind, z))
a b
1 1 2
2 2 3
3 NULL 4
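
Note that the NULL survives as a cell value in this output, which suggests the columns are list columns rather than atomic vectors. Below is a hedged sketch of checking that, and of replacing NULL with NA first so the columns come out atomic. The z here is a hypothetical input chosen to match the printout, and fill_nulls is an illustrative helper.

```r
# Hypothetical input matching the printed result above
z <- list(list(a = 1, b = 2), list(a = 2, b = 3), list(a = NULL, b = 4))

# The approach from the answer: columns end up as lists
df_list <- data.frame(do.call(rbind, z))
is.list(df_list$a)

# Alternative: replace NULL with NA per record, then bind one-row data frames
fill_nulls <- function(rec) lapply(rec, function(v) if (is.null(v)) NA else v)
df <- do.call(rbind, lapply(z, function(rec) as.data.frame(fill_nulls(rec))))
df
```

With atomic columns, functions such as mean(df$a, na.rm = TRUE) work directly, which is usually not the case for list columns.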

Converting an R list with NULL sub-elements to a data frame

I think I've found a solution myself.

My approach is to first convert all the sub-lists into dataframes, so I have a list of dataframes instead of list of lists. These dataframes will just drop the NULL variables.

ldf <- lapply(lll, function(x) {
  nonnull <- sapply(x, typeof) != "NULL" # find all NULLs to omit
  do.call(data.frame, c(x[nonnull], stringsAsFactors = FALSE))
})

The resultant list of dataframes:

> str(ldf)
List of 2
$ :'data.frame': 1 obs. of 2 variables:
..$ Name : chr "Sghokbt"
..$ Value: int 7
$ :'data.frame': 1 obs. of 3 variables:
..$ Name : chr "Sgnglio"
..$ Title: chr "Mr"
..$ Value: num 5

From here I get a little help from plyr.

require(plyr)
df <- ldply(ldf)

The result has the columns out of order, but I'm happy enough with it.

> str(df)
'data.frame': 2 obs. of 3 variables:
$ Name : chr "Sghokbt" "Sgnglio"
$ Value: num 7 5
$ Title: chr NA "Mr"

I won't accept this as the answer for now, in case there is a better solution.
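
If the column order matters, or plyr is not available, the combining step could be sketched in base R along these lines. The lll below is a hypothetical reconstruction from the str() output above, and the sketch assumes every record lists the same fields in the same order, so the first record can supply the column order.

```r
# Hypothetical reconstruction of the input
lll <- list(
  list(Name = "Sghokbt", Title = NULL, Value = 7L),
  list(Name = "Sgnglio", Title = "Mr", Value = 5)
)

# Same first step as above: one data frame per record, NULL fields dropped
ldf <- lapply(lll, function(x) {
  nonnull <- !vapply(x, is.null, logical(1))
  do.call(data.frame, c(x[nonnull], stringsAsFactors = FALSE))
})

# Assumed: the first record names every field in the desired order
all_cols <- names(lll[[1]])

# Add each missing column as NA, then put the columns in a common order
filled <- lapply(ldf, function(d) {
  for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
  d[all_cols]
})

df <- do.call(rbind, filled)
str(df)
```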

Converting nested list with missing values to data frame in R

We'll let jsonlite::flatten do most of the work:

Put your two example results in one list (hopefully this is faithful to your actual data structure):

first100geocode <- list(
structure(list(results = structure(list(address_components = list(
structure(list(long_name = c(
"11А", "улица Гоголя", "Зеленоградский административный округ",
"Зеленоград", "Москва", "Москва", "Россия", "124575"), short_name = c(
"11А", "ул. Гоголя", "Зеленоградский административный округ", "Зеленоград",
"Москва", "Москва", "RU", "124575"), types = list(
"street_number", "route", c("political", "sublocality", "sublocality_level_1"
), c("locality", "political"), c(
"administrative_area_level_2",
"political"), c("administrative_area_level_1", "political"
), c("country", "political"), "postal_code")), .Names = c(
"long_name",
"short_name", "types"), class = "data.frame", row.names = c(NA, 8L))),
formatted_address = "ул. Гоголя, 11А, Зеленоград, Москва, Россия, 124575",
geometry = structure(list(location = structure(
list(lat = 55.987567, lng = 37.17152),
.Names = c("lat", "lng"), class = "data.frame", row.names = 1L),
location_type = "ROOFTOP", viewport = structure(list(
northeast = structure(list(
lat = 55.9889159802915, lng = 37.1728689802915), .Names = c("lat", "lng"
), class = "data.frame", row.names = 1L), southwest = structure(list(
lat = 55.9862180197085, lng = 37.1701710197085), .Names = c("lat", "lng"),
class = "data.frame", row.names = 1L)), .Names = c("northeast", "southwest"),
class = "data.frame", row.names = 1L)),
.Names = c("location", "location_type", "viewport"),
class = "data.frame", row.names = 1L),
place_id = "ChIJzXSgUeQUtUYREIohzQOG--A", types = list("street_address")),
.Names = c("address_components",
"formatted_address", "geometry", "place_id", "types"),
class = "data.frame", row.names = 1L),
status = "OK"), .Names = c("results", "status")),
structure(list(results = list(), status = "ZERO_RESULTS"),
.Names = c("results", "status"))
)

Do the actual flattening (and filter out address_components and types that are a bit trickier and of no interest to you):

flatten_googleway <- function(df) {
  res <- jsonlite::flatten(df)
  # drop the nested columns that are not needed here
  res[, !names(res) %in% c("address_components", "types")]
}

Prepare a zero-row template data frame with the same columns, then use one NA row taken from it wherever a result is "missing":

template_res <- flatten_googleway(first100geocode[[1]]$results)[FALSE, ]
do.call(rbind, lapply(first100geocode, function(x) {
  if (length(x$results) == 0) template_res[1, ] else flatten_googleway(x$results)
}))
# formatted_address place_id
# 1 ул. Гоголя, 11А, Зеленоград, Москва, Россия, 124575 ChIJzXSgUeQUtUYREIohzQOG--A
# NA <NA> <NA>
# geometry.location_type geometry.location.lat geometry.location.lng
# 1 ROOFTOP 55.98757 37.17152
# NA <NA> NA NA
# geometry.viewport.northeast.lat geometry.viewport.northeast.lng
# 1 55.98892 37.17287
# NA NA NA
# geometry.viewport.southwest.lat geometry.viewport.southwest.lng
# 1 55.98622 37.17017
# NA NA NA
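
The template trick works because indexing row 1 of a zero-row data frame returns a single row of NAs that keeps the column names and types. Here is a small base R sketch of just that mechanism, using a hypothetical two-column result instead of the real geocoding output:

```r
# Hypothetical flattened result with one row
found <- data.frame(
  formatted_address = "some address",
  lat = 55.98,
  stringsAsFactors = FALSE
)

# Zero-row template: same columns and types, no data
template <- found[FALSE, ]

# Asking for a row that does not exist yields one row of NAs
na_row <- template[1, ]

combined <- rbind(found, na_row)
combined
```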

R: removing NULL elements from a list

The closest you'll be able to get is to first name the list elements and then remove the NULLs.

names(x) <- seq_along(x)

## Using some higher-order convenience functions
Filter(Negate(is.null), x)
# $`11`
# [1] 123
#
# $`13`
# [1] 456

# Or, using a slightly more standard R idiom
x[sapply(x, is.null)] <- NULL
x
# $`11`
# [1] 123
#
# $`13`
# [1] 456
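
To try this end to end, here is a self-contained sketch. The x below is a hypothetical 13-element list with values only at positions 11 and 13, chosen to match the output shown above.

```r
# Hypothetical input: 13 slots, all NULL except positions 11 and 13
x <- vector("list", 13)
x[[11]] <- 123
x[[13]] <- 456

# Record the original positions as names before dropping anything
names(x) <- seq_along(x)

kept <- Filter(Negate(is.null), x)

# The names still carry the original positions
original_positions <- as.integer(names(kept))
original_positions
```

Naming first is what preserves the positions; once the NULLs are removed, the numeric indices of the remaining elements are simply 1 and 2.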

rbinding with a list containing empty lists for NAs in R

Assuming that:

  1. All the data frames in your list share the same variable names and number of columns.
  2. The first element of the nested list is not an empty list. (This is only for convenience later; any element that is a data frame would do.)

My approach is to replace each element that is not a data frame with a one-row data frame of NAs that has the same column names as the other data frames.

change_others_to_dataframe <- function(x) {
  # If x is a data frame, return it unchanged.
  # Otherwise, return a one-row data frame of NAs with the columns of myList[[1]].
  if (is.data.frame(x)) {
    return(x)
  }
  setNames(data.frame(matrix(ncol = ncol(myList[[1]]), nrow = 1)),
           names(myList[[1]]))
}

# Apply the function above to every element of myList
mynewList <- lapply(myList, change_others_to_dataframe)
# Bind the rows together; bind_rows() comes from dplyr
df <- dplyr::bind_rows(mynewList, .id = "id")

I believe that this would solve your problem.
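
A dependency-free variant of the same idea, as a sketch only: myList here is a hypothetical example, na_row_like is an illustrative helper, and the id column is built by hand as an integer (bind_rows would produce it as character).

```r
# Hypothetical input: data frames mixed with an empty list
myList <- list(
  data.frame(x = 1, y = "a", stringsAsFactors = FALSE),
  list(),
  data.frame(x = 3, y = "c", stringsAsFactors = FALSE)
)

# Build a one-row, all-NA data frame with the columns and types of `template`
na_row_like <- function(template) {
  row <- template[1, , drop = FALSE]
  row[1, ] <- NA
  row
}

# Assumption 2 from above: the first element is a data frame
filled <- lapply(myList, function(el) {
  if (is.data.frame(el)) el else na_row_like(myList[[1]])
})

# One id per source element, repeated for each of its rows
ids <- rep(seq_along(filled), vapply(filled, nrow, integer(1)))
df <- cbind(id = ids, do.call(rbind, filled))
df
```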

For creating data frames with no data, you may refer to these threads on SO:

  • Create empty data frame with column names by assigning a string vector
  • Create an empty data frame

Convert a list to a data frame

Update July 2020:

The default for the parameter stringsAsFactors is now default.stringsAsFactors() which in turn yields FALSE as its default.


Assuming your list of lists is called l:

df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=TRUE))

In versions of R where stringsAsFactors defaults to TRUE, the above converts all character columns to factors. To avoid this, add a parameter to the data.frame() call:

df <- data.frame(matrix(unlist(l), nrow=length(l), byrow=TRUE), stringsAsFactors=FALSE)
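
A caveat worth checking before using this on lists with missing values: unlist() silently drops NULL elements, so the matrix only lines up when every sub-list has the same number of non-NULL entries. A minimal sketch with a hypothetical l:

```r
# Hypothetical input: every sub-list is complete
l <- list(list("Foo", 22), list("Bar", 31))

df <- data.frame(matrix(unlist(l), nrow = length(l), byrow = TRUE),
                 stringsAsFactors = FALSE)
df

# unlist() drops NULLs: this record contributes one value instead of two
n_values <- length(unlist(list("Baz", NULL)))
n_values
```

Because unlist() coerces everything to a common type, both columns in df are character here; numeric columns need converting afterwards.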

