Is there a more efficient way to replace NULL with NA in a list?
Many efficiency problems in R are solved by first changing the original data into a form that makes the processes that follow as fast and easy as possible. Usually, this is matrix form.
If you bring all the data together with rbind, your nullToNA function no longer has to search through nested lists, and therefore sapply serves its purpose (looking through a matrix) more efficiently. In theory, this should make the process faster.
Good question, by the way.
> dat <- do.call(rbind, lapply(employees, rbind))
> dat
id dept age sportsteam
[1,] 1 "IT" 29 "softball"
[2,] 2 "IT" 30 NULL
[3,] 3 "IT" 29 "hockey"
[4,] 4 NULL 29 "softball"
> nullToNA(dat)
id dept age sportsteam
[1,] 1 "IT" 29 "softball"
[2,] 2 "IT" 30 NA
[3,] 3 "IT" 29 "hockey"
[4,] 4 NA 29 "softball"
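For anyone who wants to reproduce the steps above, here is a minimal sketch. The employees list is hypothetical, reconstructed from the printed output, and this nullToNA is a stand-in written for illustration (the asker's own definition is not shown here), so treat both as assumptions.

```r
# Hypothetical input, reconstructed from the printed output above
employees <- list(
  list(id = 1, dept = "IT", age = 29, sportsteam = "softball"),
  list(id = 2, dept = "IT", age = 30, sportsteam = NULL),
  list(id = 3, dept = "IT", age = 29, sportsteam = "hockey"),
  list(id = 4, dept = NULL, age = 29, sportsteam = "softball")
)

# Stand-in for nullToNA (an assumption, not the asker's original definition):
# every zero-length cell is overwritten with NA
nullToNA <- function(x) {
  x[lengths(x) == 0L] <- NA
  x
}

# Each record becomes a one-row list-matrix; stacking them gives a 4 x 4 list-matrix
dat <- do.call(rbind, lapply(employees, rbind))
clean <- nullToNA(dat)
```

Because dat is a list-matrix, each cell can still hold a value of any type, which is why the NULL entries survive rbind and can be replaced in a single vectorised assignment.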
Change NULL to NA in list without transformation
Your example does work with the development version of purrr.
The NULL rows are causing problems for approaches, such as using dplyr::bind_rows, that would otherwise work to collapse a list of lists into a tibble. A workaround to remove the NULL rows is to loop through and flatten each list. Looping via map_df binds the rows and gives your desired result.
map_df(example, flatten)
# A tibble: 3 x 4
ID Name Middle_name Surname
<chr> <chr> <chr> <chr>
1 1 Joe Alan Smith
2 2 Sarah <NA> Jones
3 3 Robert Myles McDonnell
Replace NULL with NA in r data.table with lists
My toy example is too small to compare timings, but combining both solutions suggested by @B. Christian Kamgang and @Ronak Shah works well for me:
# Function to replace NULL with NA in lists:
null2na <- function(dtcol){
  fullcol = replace(dtcol, lengths(dtcol) == 0L, NA)
  return(fullcol)
}
# Apply function to dataset:
dt[, names(dt) := lapply(.SD, null2na)]
Two things I found advantageous with this approach (thanks to both respondents for suggesting):
Avoiding use of base R ifelse, dplyr::if_else and data.table::fifelse; base R ifelse converts all columns to a list unless you specify them beforehand, and the dplyr and data.table versions of ifelse, while they respect the original column classes, don't work in this scenario because NA is interpreted as differing in type from the other values in the list.
The use of lengths(dtcol) == 0L targets specifically only the list elements that are NULL and doesn't do anything to the other columns or values. This means that it is not necessary to specify the subset of columns that are lists beforehand, as inherently it deals only with those.
I've gone with replace() rather than subsetting dtcol in the function as I think with larger datasets the former might be slightly faster (but have yet to test that).
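To see why the lengths() test leaves ordinary columns alone, here is a small base R sketch outside of data.table. The vectors list_col and num_col are made up for illustration; only the replace() and lengths() calls come from the answer above.

```r
# Same idea as null2na above, written as a one-liner
null2na <- function(col) replace(col, lengths(col) == 0L, NA)

# Made-up columns: one list column containing a NULL, one plain numeric column
list_col <- list(1.5, NULL, "a")
num_col <- c(10, 20, 30)

filled <- null2na(list_col)  # the NULL element becomes NA, the others are kept
same <- null2na(num_col)     # every element has length 1, so nothing is replaced
```

Every element of an atomic vector has length 1, so the condition is FALSE throughout and the column comes back unchanged, class included.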
NA to replace NULL in list/for loop
Try this:
library(httr)
resUrl <- "http://api.eia.gov/series/?api_key=2B5239FA427673D22505DBF45664B12E&series_id=NG.N3010CO3.M"
x <- GET(resUrl)
y <- content(x)
str(head(y$series[[1]]$data))
# List of 6
# $ :List of 2
# ..$ : chr "201701"
# ..$ : NULL
# $ :List of 2
# ..$ : chr "201612"
# ..$ : num 6.48
# $ :List of 2
# ..$ : chr "201611"
# ..$ : num 7.42
# $ :List of 2
# ..$ : chr "201610"
# ..$ : num 9.75
# $ :List of 2
# ..$ : chr "201609"
# ..$ : num 12.1
# $ :List of 2
# ..$ : chr "201608"
# ..$ : num 14.3
In this first URL, only the first element within $series[[1]]$data contained a NULL. BTW: be clear to distinguish between NULL (the literal) and "NULL" (a character string with 4 letters).
Here are some ways (with various data types) to check for NULLs:
is.null(NULL)
# [1] TRUE
length(NULL)
# [1] 0
Simple enough so far, let's try a list with NULLs:
l <- list(NULL, 1)
is.null(l)
# [1] FALSE
sapply(l, is.null)
# [1] TRUE FALSE
length(l)
# [1] 2
lengths(l)
# [1] 0 1
sapply(l, length)
# [1] 0 1
(The "0" lengths indicate NULLs.) I'll use lengths here:
y$series[[1]]$data <- lapply(y$series[[1]]$data, function(z) { z[ lengths(z) == 0 ] <- NA; z; })
str(head(y$series[[1]]$data))
# List of 6
# $ :List of 2
# ..$ : chr "201701"
# ..$ : logi NA
# $ :List of 2
# ..$ : chr "201612"
# ..$ : num 6.48
# $ :List of 2
# ..$ : chr "201611"
# ..$ : num 7.42
# $ :List of 2
# ..$ : chr "201610"
# ..$ : num 9.75
# $ :List of 2
# ..$ : chr "201609"
# ..$ : num 12.1
# $ :List of 2
# ..$ : chr "201608"
# ..$ : num 14.3
How to replace NULL values over arbitrarily nested list in R?
A recursive function can be created:
replace_null <- function(x){
x <- purrr::map(x, ~ replace(.x, is.null(.x), NA_character_))
purrr::map(x, ~ if(is.list(.x)) replace_null(.x) else .x)
}
Checking:
replace_null(myList)
#$elem1
#[1] "first"
#$elem2
#$elem2$elem2.1
#[1] "second1"
#$elem2$elem2.2
#[1] NA
#$elem3
#$elem3$elem3.1
#[1] "third1"
#$elem3$elem3.2
#$elem3$elem3.2$elem3.2.1
#[1] NA
#$elem3$elem3.2$elem3.2.2
#[1] NA
#$elem3$elem3.2$elem3.2.3
#[1] "third2.3"
#$elem3$elem4
#[1] "fourth"
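If you would rather not depend on purrr, the same recursion can be sketched in base R. This is an alternative, not the code from the answer above: replace_null_base and its fill argument are hypothetical names, and myList is reconstructed from the printed output, so its exact contents are an assumption.

```r
# Hypothetical input, reconstructed from the printed result above
myList <- list(
  elem1 = "first",
  elem2 = list(elem2.1 = "second1", elem2.2 = NULL),
  elem3 = list(
    elem3.1 = "third1",
    elem3.2 = list(elem3.2.1 = NULL, elem3.2.2 = NULL, elem3.2.3 = "third2.3"),
    elem4 = "fourth"
  )
)

# Base R variant (hypothetical helper): walk the list, replace NULL leaves
# with `fill`, recurse into sub-lists, and leave everything else untouched
replace_null_base <- function(x, fill = NA_character_) {
  lapply(x, function(el) {
    if (is.null(el)) {
      fill
    } else if (is.list(el)) {
      replace_null_base(el, fill)
    } else {
      el
    }
  })
}

res <- replace_null_base(myList)
```

lapply keeps the element names, so the result has the same shape as the input. One caveat: data frames are lists too, so this sketch would descend into them and return plain lists.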
Replace null in a list in R?
You can try a tidyverse solution:
library(tidyverse)
# a list
a <- list(NULL, data.frame(x=T, y=F), NULL)
str(a)
List of 3
$ : NULL
$ :'data.frame': 1 obs. of 2 variables:
..$ x: logi TRUE
..$ y: logi FALSE
$ : NULL
# and replace
modify_if(a, is.null, ~compact(a) %>% unlist())
[[1]]
x y
TRUE FALSE
[[2]]
x y
1 TRUE FALSE
[[3]]
x y
TRUE FALSE
# or
modify_if(a, is.null, ~data.frame(x=NA,y=NA))
Replacing NULL value in Dataframe in R with Median of Column
If you wish to keep the entries as length-one lists you can do:
pivot_table_1[] <- lapply(pivot_table_1, function(x) {
ifelse(lengths(x) == 1, x, list(median(unlist(x))))})
pivot_table_1
#> # A tibble: 31 x 7
#> Day `Start Sleeping` `Stop Sleeping` `Start Working` `Stop Working`
#> <int> <list> <list> <list> <list>
#> 1 1 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 2 2 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 3 3 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 4 4 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 5 5 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 6 6 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 7 7 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 8 8 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 9 9 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> 10 10 <dbl [1]> <dbl [1]> <dbl [1]> <dbl [1]>
#> # ... with 21 more rows, and 2 more variables: Breakfast <list>, Dinner <list>
Or, if you want them as numeric columns, do:
pivot_table_1[] <- lapply(pivot_table_1, function(x) {
unlist(ifelse(lengths(x) == 1, x, list(median(unlist(x)))))})
pivot_table_1
#> # A tibble: 31 x 7
#> Day `Start Sleeping` `Stop Sleeping` `Start Working` `Stop Working`
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 440 490 1005
#> 2 2 20 440 490 1005
#> 3 3 35 440 490 1005
#> 4 4 40 440 490 1005
#> 5 5 50 440 0 965
#> 6 6 0 440 0 965
#> 7 7 40 440 490 965
#> 8 8 0 440 490 965
#> 9 9 0 440 490 965
#> 10 10 40 440 490 965
#> # ... with 21 more rows, and 2 more variables: Breakfast <dbl>, Dinner <dbl>
Created on 2022-05-22 by the reprex package (v2.0.1)
Unlist a list without losing NULL's
One option could be:
foo[sapply(foo, is.null)] <- NA
unlist(foo, use.names = FALSE)
[1] 1 2 NA