Cleaning 'Inf' Values from an R Dataframe

Option 1

Use the fact that a data.frame is a list of columns: apply the replacement to each column with lapply, then use do.call to rebuild a data.frame.

do.call(data.frame,lapply(DT, function(x) replace(x, is.infinite(x),NA)))
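As a minimal sketch of Option 1 on a small, made-up data frame (the column names and values are illustrative only, not from the original question), the rebuild leaves non-numeric columns untouched because is.infinite() is FALSE for character values:

```r
# Illustrative data: two numeric columns with infinities and one character column
dat <- data.frame(a = c(1, Inf), b = c(-Inf, 2), c = c("x", "y"),
                  stringsAsFactors = FALSE)

# Replace +/-Inf with NA column by column, then rebuild the data frame
clean <- do.call(data.frame,
                 lapply(dat, function(x) replace(x, is.infinite(x), NA)))

print(clean)
```

The original `dat` is not modified; the cleaned copy is returned as a new object.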

Option 2 -- data.table

With data.table you can use set(), which updates the table by reference and so avoids some internal copying.

library(data.table)
DT <- data.table(dat)
invisible(lapply(names(DT), function(.name) set(DT, which(is.infinite(DT[[.name]])), j = .name, value = NA)))

Or using column numbers (possibly faster if there are a lot of columns):

for (j in seq_len(ncol(DT))) set(DT, which(is.infinite(DT[[j]])), j, NA)

Timings

# some `big(ish)` data
dat <- data.frame(a = rep(c(1,Inf), 1e6), b = rep(c(Inf,2), 1e6),
                  c = rep(c('a','b'),1e6), d = rep(c(1,Inf), 1e6),
                  e = rep(c(Inf,2), 1e6))
# create data.table
library(data.table)
DT <- data.table(dat)

# replace (@mnel)
system.time(na_dat <- do.call(data.frame,lapply(dat, function(x) replace(x, is.infinite(x),NA))))
#    user  system elapsed
#    0.52    0.01    0.53

# is.na (@dwin)
system.time(is.na(dat) <- sapply(dat, is.infinite))
#    user  system elapsed
#   32.96    0.07   33.12

# modified is.na
system.time(is.na(dat) <- do.call(cbind,lapply(dat, is.infinite)))
#    user  system elapsed
#    1.22    0.38    1.60

# data.table (@mnel)
system.time(invisible(lapply(names(DT),function(.name) set(DT, which(is.infinite(DT[[.name]])), j = .name,value =NA))))
#    user  system elapsed
#    0.29    0.02    0.31

data.table is the quickest (0.31 s elapsed), with the replace approach close behind (0.53 s). Using sapply slows things down dramatically (about 33 s in this run).

Replace -Inf in dataframe with NA in R

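One option is to apply replace() to each column with Map and assign the result back into dat[], so that dat stays a data frame: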

dat[] <- Map(function(x) replace(x, is.infinite(x), NA), dat)

Or with sapply, which builds a logical matrix to index with:

dat[sapply(dat, is.infinite)] <- NA
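If the data frame mixes numeric and non-numeric columns, it can be tidier to touch only the numeric ones. The following is a sketch under that assumption; `replace_inf` is a hypothetical helper name and the sample data is made up for illustration:

```r
# Hypothetical helper: replace +/-Inf in numeric columns only
replace_inf <- function(df, value = NA) {
  num <- vapply(df, is.numeric, logical(1))   # which columns are numeric?
  if (any(num)) {
    df[num] <- lapply(df[num], function(x) replace(x, is.infinite(x), value))
  }
  df
}

# Illustrative data with one character column
dat <- data.frame(id = c("a", "b", "c"),
                  x  = c(1, -Inf, 3),
                  y  = c(Inf, 2, NA),
                  stringsAsFactors = FALSE)

out <- replace_inf(dat)
print(out)
```

Existing NA values are left as they are, and passing `value = 0` would substitute zeros instead.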

How to remove rows with inf from a dataframe in R

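To remove the rows containing +/-Inf, I'd suggest the following (it relies on rowSums(), so all columns must be numeric):

df <- df[!is.infinite(rowSums(df)),]

or, nearly equivalently,

df <- df[is.finite(rowSums(df)),]

The second option (the one with is.finite() and without the negation) also removes rows containing NA or NaN values, in case this has not already been done. It also catches a row that contains both Inf and -Inf: such a row sums to NaN rather than to an infinite value, so the first option would keep it.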
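The difference between the two filters is easiest to see on a small example. The data below is made up for illustration, and the per-cell check at the end is an alternative sketch, not part of the original answer:

```r
# Illustrative data: row 2 has Inf, row 3 has NA, row 4 has both Inf and -Inf
df <- data.frame(x = c(1, Inf, NA, Inf, 5),
                 y = c(2, 3, 4, -Inf, 6))

rs <- rowSums(df)   # row 4 adds Inf and -Inf, which gives NaN

kept_infinite <- df[!is.infinite(rs), ]   # drops only row 2
kept_finite   <- df[is.finite(rs), ]      # drops rows 2, 3 and 4

# Per-cell check: drops every row with an infinite cell,
# but keeps rows whose only problem is an NA
inf_cells  <- do.call(cbind, lapply(df, is.infinite))
kept_cells <- df[rowSums(inf_cells) == 0, ]
```

Which variant is appropriate depends on whether rows with NA should survive the cleaning step.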

Cleaning Inf values from an R list of dataframes

If K is a list of data frames this produces a new cleaned list of data.frames. The Inf2NA function replaces all infinite values in a vector v with NA.

Inf2NA <- function(v) replace(v, is.infinite(v), NA)
lapply(K, function(d) replace(d, TRUE, sapply(d, Inf2NA)))

If a list of matrices is enough, this shorter version will do:

lapply(K, function(d) sapply(d, Inf2NA))
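For a quick check, here is a sketch on a made-up list K (the element names and values are assumptions for illustration). It uses a slightly different assignment, `d[] <- lapply(d, Inf2NA)`, which is a swapped-in variant rather than the code above; it is chosen here because it keeps each column's type when a data frame also holds character columns:

```r
Inf2NA <- function(v) replace(v, is.infinite(v), NA)

# Illustrative list of data frames
K <- list(
  first  = data.frame(id = c("a", "b"), x = c(1, Inf),  stringsAsFactors = FALSE),
  second = data.frame(id = c("c", "d"), x = c(-Inf, 4), stringsAsFactors = FALSE)
)

# Clean every column of every data frame, keeping the data.frame structure
K_clean <- lapply(K, function(d) {
  d[] <- lapply(d, Inf2NA)
  d
})

print(K_clean)
```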

Replace infinite values in an R data frame [why doesn't `is.infinite()` behave like `is.na()`]

is.infinite expects its input x to be an atomic vector, according to ?is.infinite:

x: object to be tested: the default methods handle atomic vectors.

whereas is.na can take a vector, matrix, or data.frame as input. From ?is.na:

an R object to be tested: the default method for is.na and anyNA handle atomic vectors, lists, pairlists, and NULL

Also, by checking the methods,

methods('is.na')
#[1] is.na.data.frame is.na.data.table* is.na.numeric_version is.na.POSIXlt is.na.raster* is.na.vctrs_vctr*

methods('is.infinite') # only for vectors
#[1] is.infinite.vctrs_vctr*

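We can modify the replace step in the code so that is.infinite() is applied to each numeric column separately (replace_na() comes from tidyr):

library(dplyr)
library(tidyr)
df %>%
  mutate_if(is.numeric, ~ replace_na(., 0) %>%
              replace(., is.infinite(.), 1))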
# A tibble: 3 x 2
# col1 col2
# <chr> <dbl>
#1 A 0
#2 B 1
#3 C 5
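The difference between the two functions can be checked directly in base R. This is a sketch with an assumed input (col2 = NA, Inf, 5 is a guess consistent with the output shown, not the original data), followed by a base R version of the same replacement:

```r
# Assumed input, consistent with the printed result above
df <- data.frame(col1 = c("A", "B", "C"),
                 col2 = c(NA, Inf, 5),
                 stringsAsFactors = FALSE)

na_mat <- is.na(df)                        # works: there is a data.frame method
res    <- try(is.infinite(df), silent = TRUE)
failed <- inherits(res, "try-error")       # TRUE: no method for a list/data.frame

# Column-wise workaround without dplyr: NA -> 0, +/-Inf -> 1 in numeric columns
num <- vapply(df, is.numeric, logical(1))
df[num] <- lapply(df[num], function(x) {
  x[is.na(x)] <- 0
  x[is.infinite(x)] <- 1
  x
})

print(df)
```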

How to remove NaN and Inf values from data.table where all columns are character types in R

One way would be to find the index of the rows containing "NaN" or "Inf":

unique(which(data == "NaN" | data == "Inf", arr.ind=T)[,1])
[1]  1  2  7  8  9 10 11

And then negate that index in i (data.table then returns all rows except the listed ones) to remove these rows:

data[!unique(which(data == "NaN" | data == "Inf", arr.ind=T)[,1])]
         date open high  low close volume
1: 2021-11-26 0.43 0.43 0.43 0.43 2
2: 2021-11-24 0.17 0.17 0.17 0.17 10
3: 2021-11-26 0.19 0.19 0.19 0.19 75
4: 2021-11-24 0.15 0.15 0.15 0.15 1
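An alternative is to convert the character columns to numeric first and then filter with is.finite(), which also catches "-Inf" (the string comparison above does not). The sketch below uses a plain data.frame and made-up values so that it runs without extra packages; the column names mirror the question, and `price_cols` is a name introduced here for illustration:

```r
# Illustrative data: price columns stored as character
data <- data.frame(
  date   = c("2021-11-24", "2021-11-26", "2021-11-24", "2021-11-26"),
  open   = c("NaN", "0.43", "-Inf", "0.19"),
  close  = c("NaN", "0.43", "0.17", "Inf"),
  volume = c(0L, 2L, 10L, 75L),
  stringsAsFactors = FALSE
)

price_cols <- c("open", "close")

# as.numeric() parses "NaN", "Inf" and "-Inf" into the corresponding numeric values
data[price_cols] <- lapply(data[price_cols], as.numeric)

# Keep rows where every price column is finite (not NA, NaN, Inf or -Inf)
keep    <- Reduce(`&`, lapply(data[price_cols], is.finite))
cleaned <- data[keep, ]

print(cleaned)
```

Converting the columns has the side benefit that later arithmetic on the prices works without further casting.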

Some benchmarks

Unit: milliseconds
     expr        min         lq       mean     median         uq       max neval cld
       me   4.513141   5.545293   7.068744   6.707279   8.356170  31.30188   100 a
 langtang   3.535727   3.646819   8.718629   6.318445   6.983275  59.76049   100 a
    akrun  51.169168 195.102026 208.889413 204.564707 216.545022 274.02575   100   c
     paul  11.235627 145.195062 146.721146 146.670909 148.432261 200.56718   100  b
  Macosso 370.269687 448.143027 468.074160 457.499264 497.636319 553.70491   100    d

The data and code used for the benchmark:

data = structure(list(date = c("2021-11-24", "2021-11-24", "2021-11-26", 
"2021-11-24", "2021-11-26", "2021-11-24", "2021-11-24", "2021-11-26",
"2021-11-26", "2021-11-26", "2021-11-26"), open = c("NaN", "NaN",
"0.43", "0.17", "0.19", "0.15", "NaN", "NaN", "NaN", "NaN", "NaN"
), high = c("NaN", "NaN", "0.43", "0.17", "0.19", "0.15", "NaN",
"NaN", "NaN", "NaN", "NaN"), low = c("NaN", "NaN", "0.43", "0.17",
"0.19", "0.15", "NaN", "NaN", "NaN", "NaN", "NaN"), close = c("NaN",
"NaN", "0.43", "0.17", "0.19", "0.15", "NaN", "NaN", "NaN", "NaN",
"NaN"), volume = c(0L, 0L, 2L, 10L, 75L, 1L, 0L, 0L, 0L, 0L,
0L)), row.names = c(NA, -11L), class = c("data.table", "data.frame"
))
data = do.call("rbind", replicate(1000, data, simplify = FALSE))

library(data.table)
library(dplyr)   # provides %>%, filter(), across(), pull(), select()
library(dtplyr)  # provides lazy_dt()

res = microbenchmark::microbenchmark(
  # note: this compares against the numeric NaN, which yields NA for every cell;
  # the answer above compares against the strings "NaN" and "Inf"
  me = data[!unique(which(data == NaN, arr.ind = TRUE)[, 1])],

  langtang = na.omit(cbind(data[, .(date, volume)],
                           data[, lapply(.SD, as.numeric), .SDcols = 2:5])),

  akrun = {
    data <- type.convert(data, as.is = TRUE)
    data[data[, Reduce(`&`, lapply(.SD, function(x)
      !is.nan(x) & is.finite(x))), .SDcols = -1]]
  },

  paul = data %>%
    lazy_dt %>%
    filter(across(2:5, ~ .x != "NaN")) %>%
    as.data.table,

  Macosso = {
    data$Row <- row.names(data)
    rm_rw <- data[apply(data, 1,
                        function(X) any(X == "NaN" | X == "Inf")), ] %>% pull(Row)
    data[!row.names(data) %in% rm_rw, ] %>% select(-Row)
  }
)

R: Remove -Inf and Inf from a vector

Remember that is.na and is.infinite operate on vectors and return logical vectors, so you can filter the vector like so:

> x <- c(1, 2, NA, Inf, -Inf)
> x[!is.na(x) & !is.infinite(x)]
[1] 1 2

If this needs to be done inline, consider putting the above in a function.
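One possible wrapper is sketched below. `drop_nonfinite` is a hypothetical name, and it uses is.finite(), which is FALSE for NA, NaN, Inf and -Inf alike, so a single test replaces the two combined above. It is meant for numeric vectors:

```r
# Hypothetical helper: keep only the finite elements of a numeric vector
drop_nonfinite <- function(x) x[is.finite(x)]

x <- c(1, 2, NA, Inf, -Inf, NaN)
result <- drop_nonfinite(x)
print(result)
```

This makes inline use straightforward, for example `mean(drop_nonfinite(x))`.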

Remove infinite values from a matrix in R

Use is.finite. I presume this is how you wish to "remove" those -Inf values:

m[!is.finite(m)] <- NA
colMeans(m, na.rm=TRUE)
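A small worked example, with a made-up matrix m, shows the effect on the column means:

```r
# Illustrative 3 x 2 matrix, filled column by column
m <- matrix(c(1, -Inf, 3,
              4, 5, Inf), nrow = 3)

raw_means <- colMeans(m)               # -Inf and Inf: the infinities dominate

m[!is.finite(m)] <- NA                 # also turns any NaN into NA
clean_means <- colMeans(m, na.rm = TRUE)

print(clean_means)
```

After the replacement the means are computed from the remaining finite entries only: (1 + 3) / 2 for the first column and (4 + 5) / 2 for the second.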

Replace -inf, NaN and NA values with zero in a dataset in R

As per ?zoo:

Subscripting by a zoo object whose data contains
logical values is undefined.

So you need to wrap the subsetting in a which call:

log_ret[which(!is.finite(log_ret))] <- 0
log_ret
               x      y z s     p t
2005-01-01 0.234 -0.012 0 0 0.454 0

Remove rows with Inf and NaN in R

You can't check for NaN with the normal comparison operators. You can do so for Inf, but you would also have to check for the negative case. It is better to use one of the functions documented here: https://stat.ethz.ch/R-manual/R-devel/library/base/html/is.finite.html

Edit: tonytonov pointed out that is.finite(NaN) is FALSE, which makes is.finite() sufficient in this case. You therefore just need

dat[is.finite(dat$Value1) & is.finite(dat$Value2), ]
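The points about comparison operators can be verified on a toy example. The values below are made up; only the column names Value1 and Value2 come from the answer:

```r
nan_cmp <- NaN == NaN          # NA, so `==` cannot be used to detect NaN

# Illustrative data: NaN in row 2, -Inf in row 3, Inf in row 4
dat <- data.frame(Value1 = c(1, NaN, 3, Inf),
                  Value2 = c(2, 5, -Inf, 8))

eq_inf <- dat$Value2 == Inf    # all FALSE: comparing with Inf misses -Inf

# is.finite() handles NaN, Inf and -Inf (and NA) in one test
cleaned <- dat[is.finite(dat$Value1) & is.finite(dat$Value2), ]

print(cleaned)
```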

