Is There a More Efficient Way to Replace NULL with NA in a List

Is there a more efficient way to replace NULL with NA in a list?

Many efficiency problems in R are solved by first reshaping the data into a form that makes the subsequent processing as fast and simple as possible. Often, that form is a matrix.

If you first bring all the data together with rbind (which gives a matrix of mode list), your nullToNA function no longer has to search through nested lists, so sapply can do its job (looking through the elements of a single matrix) more efficiently. In theory, this should make the process faster.

Good question, by the way.

> dat <- do.call(rbind, lapply(employees, rbind))
> dat
     id dept age sportsteam
[1,] 1  "IT" 29  "softball"
[2,] 2  "IT" 30  NULL
[3,] 3  "IT" 29  "hockey"
[4,] 4  NULL 29  "softball"

> nullToNA(dat)
     id dept age sportsteam
[1,] 1  "IT" 29  "softball"
[2,] 2  "IT" 30  NA
[3,] 3  "IT" 29  "hockey"
[4,] 4  NA   29  "softball"
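To try the rbind idea end to end, here is a minimal sketch. The employees list is hypothetical (rebuilt from the printed output above), and the body of nullToNA is an assumption, since its original definition isn't shown here: it sets every NULL element to NA using sapply() over the list matrix.

```r
# Hypothetical input rebuilt from the printed output: one list per employee,
# with some fields explicitly set to NULL.
employees <- list(
  list(id = 1, dept = "IT", age = 29, sportsteam = "softball"),
  list(id = 2, dept = "IT", age = 30, sportsteam = NULL),
  list(id = 3, dept = "IT", age = 29, sportsteam = "hockey"),
  list(id = 4, dept = NULL, age = 29, sportsteam = "softball")
)

# Assumed definition of nullToNA (not shown in the original answer):
# flag every NULL element of a list (or list matrix) and set it to NA.
nullToNA <- function(x) {
  x[sapply(x, is.null)] <- NA
  x
}

# Bind the records into a 4 x 4 list matrix, then replace NULLs in one pass
dat <- do.call(rbind, lapply(employees, rbind))
dat <- nullToNA(dat)
```

Because the list matrix is indexed linearly, a single logical vector from sapply() covers every cell, with no recursion into nested lists.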

Change NULL to NA in list without transformation

Your example does work with the development version of purrr.

The NULL rows are causing problems for approaches, such as dplyr::bind_rows, that would otherwise collapse a list of lists into a tibble. One workaround, which gets rid of the NULL entries, is to loop through the list and flatten each element. Looping via map_df binds the rows and gives your desired result.

map_df(example, flatten)

# A tibble: 3 x 4
  ID    Name   Middle_name Surname
  <chr> <chr>  <chr>       <chr>
1 1     Joe    Alan        Smith
2 2     Sarah  <NA>        Jones
3 3     Robert Myles       McDonnell
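For comparison, a base R sketch (swapped in for purrr, not the answer's method) that fills the NULLs explicitly instead of relying on flatten() and map_df() to drop and refill them. It assumes each record is a flat named list, which may not match the question's actual `example`; the data below is rebuilt from the tibble above, and null_to_na and people_df are hypothetical names.

```r
# Hypothetical flat records rebuilt from the printed tibble
example <- list(
  list(ID = "1", Name = "Joe", Middle_name = "Alan", Surname = "Smith"),
  list(ID = "2", Name = "Sarah", Middle_name = NULL, Surname = "Jones"),
  list(ID = "3", Name = "Robert", Middle_name = "Myles", Surname = "McDonnell")
)

# Replace zero-length (NULL) fields with a typed NA so every record has all columns
null_to_na <- function(rec) {
  rec[lengths(rec) == 0L] <- NA_character_
  rec
}

# Turn each record into a one-row data frame, then stack them by column name
people_df <- do.call(rbind, lapply(example, function(rec) {
  as.data.frame(null_to_na(rec), stringsAsFactors = FALSE)
}))
```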

Replace NULL with NA in r data.table with lists

My toy example is too small to compare timings, but combining both solutions suggested by @B. Christian Kamgang and @Ronak Shah works well for me:

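library(data.table)

# Function to replace NULL with NA in lists:
null2na <- function(dtcol) {
  # lengths() is 0 only for zero-length elements such as NULL;
  # atomic columns have no such elements, so they pass through unchanged
  replace(dtcol, lengths(dtcol) == 0L, NA)
}

# Apply the function to every column of the data.table, by reference:
dt[, names(dt) := lapply(.SD, null2na)]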

Three things I found advantageous with this approach (thanks to both respondents for suggesting it):

  1. Avoiding base R ifelse(), dplyr::if_else() and data.table::fifelse(). Base R ifelse() converts all columns to lists unless you specify the relevant columns beforehand, and while the dplyr and data.table versions respect the original column classes, they don't work in this scenario because NA is interpreted as a different type from the other values in the list.

  2. The condition lengths(dtcol) == 0L targets only zero-length elements, which in practice means the NULL entries of list columns (an empty vector such as character(0) would also match), and leaves other columns and values untouched. Atomic columns never contain zero-length elements, so it isn't necessary to specify beforehand which columns are lists; the function inherently acts only on those.

  3. I've used replace() rather than subset-assigning dtcol inside the function, as I think the former might be slightly faster on larger datasets (but I have yet to test that).
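Point 2 can be checked without data.table. This base R sketch applies the same null2na() to a plain named list standing in for .SD (the columns id and tags are hypothetical): the atomic column comes back unchanged, while the NULL in the list column becomes NA.

```r
# null2na as defined above, repeated so this block runs on its own
null2na <- function(dtcol) replace(dtcol, lengths(dtcol) == 0L, NA)

# Hypothetical stand-in for .SD: one atomic column and one list column
cols <- list(id = 1:3, tags = list("a", NULL, "c"))

# Same call pattern as lapply(.SD, null2na), assigned back in place
cols[] <- lapply(cols, null2na)
```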

NA to replace NULL in list/for loop

Try this:

library(httr)
resUrl <- "http://api.eia.gov/series/?api_key=2B5239FA427673D22505DBF45664B12E&series_id=NG.N3010CO3.M"
x <- GET(resUrl)
y <- content(x)
str(head(y$series[[1]]$data))
# List of 6
# $ :List of 2
# ..$ : chr "201701"
# ..$ : NULL
# $ :List of 2
# ..$ : chr "201612"
# ..$ : num 6.48
# $ :List of 2
# ..$ : chr "201611"
# ..$ : num 7.42
# $ :List of 2
# ..$ : chr "201610"
# ..$ : num 9.75
# $ :List of 2
# ..$ : chr "201609"
# ..$ : num 12.1
# $ :List of 2
# ..$ : chr "201608"
# ..$ : num 14.3

In this first URL, only the first element within $series[[1]]$data contained a NULL. By the way, be careful to distinguish between NULL (the literal) and "NULL" (a character string with 4 letters).

Here are some ways (with various data types) to check for NULLs:

is.null(NULL)
# [1] TRUE
length(NULL)
# [1] 0

Simple enough so far. Now let's try a list containing NULLs:

l <- list(NULL, 1)
is.null(l)
# [1] FALSE
sapply(l, is.null)
# [1] TRUE FALSE
length(l)
# [1] 2
lengths(l)
# [1] 0 1
sapply(l, length)
# [1] 0 1
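One subtlety worth knowing before choosing between the two checks: a length of 0 flags any empty element, not only NULL. This small sketch (l2 is a hypothetical list) contrasts is.null() with lengths() when an empty character vector is present.

```r
# Hypothetical list with a true NULL, an empty character vector, and a number
l2 <- list(NULL, character(0), 1)

by_is_null <- vapply(l2, is.null, logical(1))   # flags only the true NULL
by_lengths <- lengths(l2) == 0L                 # flags NULL and character(0)
```

For API data like this, where empty values arrive as NULL, both checks give the same answer; lengths() is simply the vectorized one.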

(The "0" lengths indicate NULLs.) I'll use lengths here:

y$series[[1]]$data <- lapply(y$series[[1]]$data, function(z) { z[ lengths(z) == 0 ] <- NA; z; })
str(head(y$series[[1]]$data))
# List of 6
# $ :List of 2
# ..$ : chr "201701"
# ..$ : logi NA
# $ :List of 2
# ..$ : chr "201612"
# ..$ : num 6.48
# $ :List of 2
# ..$ : chr "201611"
# ..$ : num 7.42
# $ :List of 2
# ..$ : chr "201610"
# ..$ : num 9.75
# $ :List of 2
# ..$ : chr "201609"
# ..$ : num 12.1
# $ :List of 2
# ..$ : chr "201608"
# ..$ : num 14.3
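Once the NULLs are gone, the date/value pairs can be flattened into a data frame. This sketch runs offline on a small hand-built list that mimics the structure printed above (values copied from that output; series_data and df are hypothetical names). Without the NULL-to-NA step, the vapply() call would fail, because as.numeric(NULL) has length 0 rather than 1.

```r
# Hypothetical stand-in for y$series[[1]]$data: each element is list(date, value)
series_data <- list(
  list("201701", NULL),
  list("201612", 6.48),
  list("201611", 7.42)
)

# Same NULL-to-NA step as above
series_data <- lapply(series_data, function(z) { z[lengths(z) == 0] <- NA; z })

# With every element now length 1, each column can be extracted with vapply()
df <- data.frame(
  date  = vapply(series_data, function(z) z[[1]], character(1)),
  value = vapply(series_data, function(z) as.numeric(z[[2]]), numeric(1)),
  stringsAsFactors = FALSE
)
```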

How to replace NULL values over arbitrarily nested list in R?

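A recursive function can be created:

replace_null <- function(x) {
  # Replace NULL elements at this level with NA_character_
  x <- purrr::map(x, ~ replace(.x, is.null(.x), NA_character_))
  # Recurse into any elements that are themselves lists
  purrr::map(x, ~ if (is.list(.x)) replace_null(.x) else .x)
}

Checking: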

replace_null(myList)
#$elem1
#[1] "first"

#$elem2
#$elem2$elem2.1
#[1] "second1"

#$elem2$elem2.2
#[1] NA

#$elem3
#$elem3$elem3.1
#[1] "third1"

#$elem3$elem3.2
#$elem3$elem3.2$elem3.2.1
#[1] NA

#$elem3$elem3.2$elem3.2.2
#[1] NA

#$elem3$elem3.2$elem3.2.3
#[1] "third2.3"

#$elem3$elem4
#[1] "fourth"
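The same recursion can be written in base R with lapply(), which also preserves names. This is a sketch of that alternative, not the answer's code; myList is reconstructed from the printed output above, and replace_null_base is a hypothetical name.

```r
# Hypothetical myList rebuilt from the printed output
myList <- list(
  elem1 = "first",
  elem2 = list(elem2.1 = "second1", elem2.2 = NULL),
  elem3 = list(
    elem3.1 = "third1",
    elem3.2 = list(elem3.2.1 = NULL, elem3.2.2 = NULL, elem3.2.3 = "third2.3"),
    elem4 = "fourth"
  )
)

replace_null_base <- function(x) {
  lapply(x, function(el) {
    if (is.null(el)) NA_character_            # NULL leaf becomes NA
    else if (is.list(el)) replace_null_base(el)  # descend into sublists
    else el                                    # keep other leaves as-is
  })
}

out <- replace_null_base(myList)
```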

Replace NULL in a list in R?

You can try a tidyverse solution:

library(tidyverse)
# a list
a <- list(NULL, data.frame(x=T, y=F), NULL)
str(a)
List of 3
$ : NULL
$ :'data.frame': 1 obs. of 2 variables:
..$ x: logi TRUE
..$ y: logi FALSE
$ : NULL

# and replace
modify_if(a, is.null, ~compact(a) %>% unlist())
[[1]]
x y
TRUE FALSE

[[2]]
x y
1 TRUE FALSE

[[3]]
x y
TRUE FALSE

# or
modify_if(a, is.null, ~data.frame(x=NA,y=NA))
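A base R equivalent of the second modify_if() call (swapped in for purrr, as a sketch): assign a one-element list to the NULL positions so it is recycled to each of them. Stacking the result with rbind is an extra illustrative step; is_null and combined are hypothetical names.

```r
a <- list(NULL, data.frame(x = TRUE, y = FALSE), NULL)

# Logical index of the NULL elements
is_null <- vapply(a, is.null, logical(1))

# The one-element list on the right-hand side is recycled to every NULL position
a[is_null] <- list(data.frame(x = NA, y = NA))

# Every element is now a data frame with the same columns, so they stack cleanly
combined <- do.call(rbind, a)
```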

Replacing NULL value in Dataframe in R with Median of Column

If you wish to keep the entries as length-one lists you can do:

pivot_table_1[] <- lapply(pivot_table_1, function(x) {
ifelse(lengths(x) == 1, x, list(median(unlist(x))))})

pivot_table_1
#> # A tibble: 31 x 7
#> Day `Start Sleeping` `Stop Sleeping` `Start Working` `Stop Working`
#> <int> <list> <list> <list> <list>
#> 1 1 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 2 2 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 3 3 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 4 4 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 5 5 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 6 6 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 7 7 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 8 8 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 9 9 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 10 10 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> # ... with 21 more rows, and 2 more variables: Breakfast <list>, Dinner <list>

Or, if you want them as numeric columns, do:

pivot_table_1[] <- lapply(pivot_table_1, function(x) {
unlist(ifelse(lengths(x) == 1, x, list(median(unlist(x)))))})

pivot_table_1
#> # A tibble: 31 x 7
#> Day `Start Sleeping` `Stop Sleeping` `Start Working` `Stop Working`
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 440 490 1005
#> 2 2 20 440 490 1005
#> 3 3 35 440 490 1005
#> 4 4 40 440 490 1005
#> 5 5 50 440 0 965
#> 6 6 0 440 0 965
#> 7 7 40 440 490 965
#> 8 8 0 440 490 965
#> 9 9 0 440 490 965
#> 10 10 40 440 490 965
#> # ... with 21 more rows, and 2 more variables: Breakfast <dbl>, Dinner <dbl>

Created on 2022-05-22 by the reprex package (v2.0.1)
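If you'd rather avoid ifelse(), the same median fill can be done with a logical index. This sketch uses a small hypothetical list of columns in place of pivot_table_1, and tests for length 0 (NULL) rather than the answer's length != 1; fill_median, pivot and filled are illustrative names.

```r
# Hypothetical stand-in for two columns of pivot_table_1
pivot <- list(Day = 1:4, start = list(0, NULL, 35, 40))

fill_median <- function(x) {
  if (!is.list(x)) return(x)         # atomic columns are left as they are
  is_empty <- lengths(x) == 0L       # NULL entries have length 0
  x[is_empty] <- median(unlist(x))   # unlist() drops NULLs before the median
  unlist(x)                          # flatten to a numeric vector
}

filled <- lapply(pivot, fill_median)
```

Dropping the final unlist() keeps the entries as length-one lists, matching the first variant above. The sketch assumes each list column has at least one non-NULL value.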

Unlist a list without losing NULL's

One option could be:

foo[sapply(foo, is.null)] <- NA
unlist(foo, use.names = FALSE)

[1] 1 2 NA
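The NULLs need replacing first because unlist() silently drops them, which shifts positions. This sketch uses a hypothetical foo and also shows a one-step vapply() alternative, assuming every non-NULL element is a single number.

```r
foo <- list(1, 2, NULL)                 # hypothetical input

dropped <- unlist(foo)                  # NULL disappears: length 2, not 3

foo[sapply(foo, is.null)] <- NA
res <- unlist(foo, use.names = FALSE)   # positions preserved

# One-step alternative, assuming each non-NULL element is a single number
foo2 <- list(1, 2, NULL)
res2 <- vapply(foo2, function(v) if (is.null(v)) NA_real_ else v, numeric(1))
```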

