Cleaning `Inf` values from an R dataframe
Option 1
Use the fact that a data.frame is a list of columns, then use do.call to recreate a data.frame.
do.call(data.frame,lapply(DT, function(x) replace(x, is.infinite(x),NA)))
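A minimal sketch of Option 1 on a small made-up data frame (the name `toy` and its values are illustrative assumptions, not from the original question). No special handling is needed for the character column, because `is.infinite()` returns `FALSE` for non-numeric vectors:

```r
# Toy data (illustrative): two numeric columns and one character column
toy <- data.frame(a = c(1, Inf, 3), b = c(-Inf, 2, 3), c = c("x", "y", "z"))

# A data.frame is a list of columns, so lapply() visits each column in turn;
# do.call(data.frame, ...) then rebuilds a data.frame from the cleaned list
cleaned <- do.call(data.frame,
                   lapply(toy, function(x) replace(x, is.infinite(x), NA)))

cleaned$a  # 1 NA 3
cleaned$b  # NA 2 3
```

Note that this builds a new object rather than modifying `toy` in place.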
Option 2 -- data.table
You could use data.table and its set() function, which avoids some internal copying.
library(data.table)
DT <- data.table(dat)
invisible(lapply(names(DT),function(.name) set(DT, which(is.infinite(DT[[.name]])), j = .name,value =NA)))
Or using column numbers (possibly faster if there are a lot of columns):
for (j in 1:ncol(DT)) set(DT, which(is.infinite(DT[[j]])), j, NA)
Timings
# some `big(ish)` data
dat <- data.frame(a = rep(c(1,Inf), 1e6), b = rep(c(Inf,2), 1e6),
c = rep(c('a','b'),1e6),d = rep(c(1,Inf), 1e6),
e = rep(c(Inf,2), 1e6))
# create data.table
library(data.table)
DT <- data.table(dat)
# replace (@mnel)
system.time(na_dat <- do.call(data.frame,lapply(dat, function(x) replace(x, is.infinite(x),NA))))
# user system elapsed
# 0.52 0.01 0.53
# is.na (@dwin)
system.time(is.na(dat) <- sapply(dat, is.infinite))
# user system elapsed
# 32.96 0.07 33.12
# modified is.na
system.time(is.na(dat) <- do.call(cbind,lapply(dat, is.infinite)))
# user system elapsed
# 1.22 0.38 1.60
# data.table (@mnel)
system.time(invisible(lapply(names(DT),function(.name) set(DT, which(is.infinite(DT[[.name]])), j = .name,value =NA))))
# user system elapsed
# 0.29 0.02 0.31
data.table is the quickest. Using sapply slows things down noticeably.
Replace -Inf in dataframe with NA in R
We need to do
dat[] <- Map(function(x) replace(x, is.infinite(x), NA), dat)
Or, indexing with a logical matrix built by sapply:
dat[sapply(dat, is.infinite)] <- NA
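The two in-place variants above can be compared on a toy data frame. This is only a sketch; the copies `d1` and `d2` and the sample values are assumptions made for illustration:

```r
# Toy data (illustrative)
dat <- data.frame(x = c(1, -Inf, 3), y = c(Inf, 5, 6))

# Variant 1: assigning into dat[] keeps the data.frame while every column is replaced
d1 <- dat
d1[] <- Map(function(col) replace(col, is.infinite(col), NA), d1)

# Variant 2: sapply() builds a logical matrix that indexes the cells directly
d2 <- dat
d2[sapply(d2, is.infinite)] <- NA

# Check that both variants agree
all.equal(d1, d2)
```

The `dat[] <-` form matters: a plain `dat <- Map(...)` would turn `dat` into a list.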
How to remove rows with inf from a dataframe in R
To remove the rows with +/-Inf
I'd suggest the following:
df <- df[!is.infinite(rowSums(df)),]
or, alternatively,
df <- df[is.finite(rowSums(df)),]
The second option (the one with is.finite() and without the negation) also removes rows containing NA values, in case this has not already been done.
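The difference between the two filters can be seen on a toy data frame (the data and the names `rs` and `keep` are illustrative assumptions). The sketch also shows one caveat of filtering on row sums: a row holding both `Inf` and `-Inf` sums to `NaN`, so `is.infinite()` does not flag it. A cell-wise check, added here as a suggestion rather than taken from the answer, avoids that:

```r
# Toy data (illustrative): row 2 has Inf, row 3 has NA, row 4 has Inf and -Inf
df <- data.frame(u = c(1, Inf, NA, Inf), v = c(2, 3, 4, -Inf))

rs <- rowSums(df)  # row 1 is 3, row 2 is Inf, row 3 is missing, row 4 is NaN

nrow(df[!is.infinite(rs), ])  # 3: only the row whose sum is +/-Inf is dropped
nrow(df[is.finite(rs), ])     # 1: missing and NaN row sums are dropped as well

# Cell-wise alternative: drop exactly the rows containing any +/-Inf
keep <- rowSums(sapply(df, is.infinite)) == 0
nrow(df[keep, ])              # 2: rows 1 and 3 remain
```

Both row-sum variants assume every column is numeric, since `rowSums()` fails on character columns.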
Cleaning Inf values from an R list of dataframes
If K is a list of data frames this produces a new cleaned list of data.frames. The Inf2NA function replaces all infinite values in a vector v with NA.
Inf2NA <- function(v) replace(v, is.infinite(v), NA)
lapply(K, function(d) replace(d, TRUE, sapply(d, Inf2NA)))
If a list of matrices is sufficient, this shorter version will do:
lapply(K, function(d) sapply(d, Inf2NA))
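As a sketch on a made-up list (the contents of `K` and the name `K_clean` are assumptions), the variant below swaps in `d[] <- lapply(d, Inf2NA)` instead of going through `sapply()`. It avoids an intermediate matrix, which can hold only one type and would therefore coerce columns of mixed types:

```r
Inf2NA <- function(v) replace(v, is.infinite(v), NA)

# Toy list of data frames (illustrative)
K <- list(
  first  = data.frame(a = c(1, Inf), b = c(-Inf, 2)),
  second = data.frame(a = c(Inf, 4), b = c(5, 6))
)

# Each element stays a data.frame; only the cell values are replaced
K_clean <- lapply(K, function(d) {
  d[] <- lapply(d, Inf2NA)
  d
})

# Count the NA cells per data frame
sapply(K_clean, function(d) sum(is.na(d)))  # first: 2, second: 1
```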
Replace infinite values in an R data frame [why doesn't `is.infinite()` behave like `is.na()`]
is.infinite expects the input x to be an atomic vector, according to ?is.infinite:
x - object to be tested: the default methods handle atomic vectors.
whereas is.na can take a vector, matrix, or data.frame as input, according to ?is.na:
x - an R object to be tested: the default method for is.na and anyNA handle atomic vectors, lists, pairlists, and NULL.
Also, by checking the methods:
methods('is.na')
#[1] is.na.data.frame is.na.data.table* is.na.numeric_version is.na.POSIXlt is.na.raster* is.na.vctrs_vctr*
methods('is.infinite') # only for vectors
#[1] is.infinite.vctrs_vctr*
We can modify the replace call in the code to:
library(dplyr)
library(tidyr)  # provides replace_na()
df %>%
mutate_if(is.numeric, ~ replace_na(., 0) %>%
replace(., is.infinite(.), 1))
# A tibble: 3 x 2
# col1 col2
# <chr> <dbl>
#1 A 0
#2 B 1
#3 C 5
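The same difference can be shown in base R. In this sketch the helper `is_infinite_df` is a hypothetical name, and the data frame is reconstructed from the output above under the assumption that `col2` originally held `NA`, `Inf`, and `5`:

```r
# Assumed input, reconstructed from the printed result above
df <- data.frame(col1 = c("A", "B", "C"), col2 = c(NA, Inf, 5))

# is.na() has a data.frame method and returns a logical matrix
na_cells <- is.na(df)

# is.infinite() has no data.frame method, so calling it on df raises an error
res <- tryCatch(is.infinite(df), error = function(e) "error")

# Hypothetical helper: apply is.infinite() per column and bind into a matrix
is_infinite_df <- function(d) do.call(cbind, lapply(d, is.infinite))

out <- df
out[is.na(out)] <- 0           # replace NA first
out[is_infinite_df(out)] <- 1  # then replace +/-Inf
out$col2  # 0 1 5
```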
How to remove NaN and Inf values from data.table where all columns are character types in R
One way would be to find the indices of the rows containing NaN or Inf:
unique(which(data == "NaN" | data == "Inf", arr.ind=T)[,1])
[1] 1 2 7 8 9 10 11
And then drop these rows. In data.table, a leading ! in i selects all rows except the given indices:
data[!unique(which(data == "NaN" | data == "Inf", arr.ind=T)[,1])]
date open high low close volume
1: 2021-11-26 0.43 0.43 0.43 0.43 2
2: 2021-11-24 0.17 0.17 0.17 0.17 10
3: 2021-11-26 0.19 0.19 0.19 0.19 75
4: 2021-11-24 0.15 0.15 0.15 0.15 1
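The same idea can be sketched in base R on a plain data.frame rather than a data.table. The object `chr`, its values, and the extra `"-Inf"` string are assumptions added for illustration:

```r
# Toy character data (illustrative), stored as a plain data.frame
chr <- data.frame(
  date = c("2021-11-24", "2021-11-26", "2021-11-24", "2021-11-26"),
  open = c("NaN", "0.43", "Inf", "0.19"),
  high = c("NaN", "0.43", "0.20", "-Inf"),
  stringsAsFactors = FALSE
)

# Strings treated as non-finite; "-Inf" is an addition to the answer above
bad_values <- c("NaN", "Inf", "-Inf")

# TRUE for every row in which at least one column holds a bad string
bad_row <- Reduce(`|`, lapply(chr, function(col) col %in% bad_values))

chr[!bad_row, ]  # keeps only row 2
```

Using `%in%` rather than `==` means an `NA` cell yields `FALSE` instead of `NA`, so the row filter never contains missing values.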
Some benchmarks
Unit: milliseconds
expr min lq mean median uq max neval cld
me 4.513141 5.545293 7.068744 6.707279 8.356170 31.30188 100 a
langtang 3.535727 3.646819 8.718629 6.318445 6.983275 59.76049 100 a
akrun 51.169168 195.102026 208.889413 204.564707 216.545022 274.02575 100 c
paul 11.235627 145.195062 146.721146 146.670909 148.432261 200.56718 100 b
Macosso 370.269687 448.143027 468.074160 457.499264 497.636319 553.70491 100 d
data = structure(list(date = c("2021-11-24", "2021-11-24", "2021-11-26",
"2021-11-24", "2021-11-26", "2021-11-24", "2021-11-24", "2021-11-26",
"2021-11-26", "2021-11-26", "2021-11-26"), open = c("NaN", "NaN",
"0.43", "0.17", "0.19", "0.15", "NaN", "NaN", "NaN", "NaN", "NaN"
), high = c("NaN", "NaN", "0.43", "0.17", "0.19", "0.15", "NaN",
"NaN", "NaN", "NaN", "NaN"), low = c("NaN", "NaN", "0.43", "0.17",
"0.19", "0.15", "NaN", "NaN", "NaN", "NaN", "NaN"), close = c("NaN",
"NaN", "0.43", "0.17", "0.19", "0.15", "NaN", "NaN", "NaN", "NaN",
"NaN"), volume = c(0L, 0L, 2L, 10L, 75L, 1L, 0L, 0L, 0L, 0L,
0L)), row.names = c(NA, -11L), class = c("data.table", "data.frame"
))
data = do.call("rbind", replicate(1000, data, simplify = FALSE))
library(data.table)
library(dplyr)  # provides %>%, filter(), across(), pull() and select()
library(dtplyr)
res = microbenchmark::microbenchmark(
me = data[!unique(which(data == NaN, arr.ind=T)[,1])],
langtang = na.omit(cbind(data[, .(date,volume)], data[, lapply(.SD, as.numeric), .SDcols = 2:5])),
akrun = {data <- type.convert(data, as.is = TRUE);
data[data[, Reduce(`&`, lapply(.SD, function(x)
!is.nan(x) & is.finite(x))), .SDcols = -1]]},
paul = data %>%
lazy_dt %>%
filter(across(2:5, ~ .x != "NaN")) %>%
as.data.table,
Macosso = {data$Row <- row.names(data);
rm_rw <- data[apply(data, 1,
function(X) any(X== "NaN"|X== "Inf")),] %>% pull(Row);
data[!row.names(data) %in% rm_rw ,] %>% select(-Row)
}
)
R: Remove -Inf and Inf from a vector
Remember that is.na and is.infinite operate on vectors and return logical vectors, so you can filter the vector like so:
> x <- c(1, 2, NA, Inf, -Inf)
> x[!is.na(x) & !is.infinite(x)]
[1] 1 2
If this needs to be done inline, consider putting the above in a function.
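One way to wrap this in a function is sketched below; the name `keep_finite` is hypothetical. Because `is.finite()` is `FALSE` for `NA`, `NaN`, `Inf`, and `-Inf`, a single test covers both conditions used above:

```r
# Hypothetical helper: keep only the finite elements of a numeric vector
keep_finite <- function(x) x[is.finite(x)]

x <- c(1, 2, NA, Inf, -Inf, NaN)
keep_finite(x)  # 1 2
```

It can then be used inline, for example `mean(keep_finite(x))`.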
Remove infinite values from a matrix in R
Use is.finite. I presume this is how you wish to "remove" those -Inf values:
m[!is.finite(m)] <- NA
colMeans(m, na.rm=TRUE)
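A small worked example, with a made-up matrix `m`, shows why the replacement is needed before averaging:

```r
# Toy matrix (illustrative) with -Inf entries, such as log(0) would produce
m <- matrix(c(1, -Inf, 3, 4, 5, -Inf), nrow = 3)

before <- colMeans(m)      # -Inf for both columns

m[!is.finite(m)] <- NA     # this would also turn NaN and +Inf into NA
after <- colMeans(m, na.rm = TRUE)
after  # 2.0 4.5
```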
Replace -inf, NaN and NA values with zero in a dataset in R
As per ?zoo:
Subscripting by a zoo object whose data contains logical values is undefined.
So you need to wrap the subsetting in a which call:
log_ret[which(!is.finite(log_ret))] <- 0
log_ret
x y z s p t
2005-01-01 0.234 -0.012 0 0 0.454 0
Remove rows with Inf and NaN in R
You can't check for NaN with the normal comparison operators. You can do so for Inf, but you would also have to check for the negative case. It is better to use one of these functions: https://stat.ethz.ch/R-manual/R-devel/library/base/html/is.finite.html
Edit: tonytonov pointed out that is.finite(NaN) is FALSE, which makes is.finite sufficient in this case. You therefore just need:
dat[is.finite(dat$Value1) & is.finite(dat$Value2), ]
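The points above can be checked on a toy data frame. The values are made up for illustration; the column names `Value1` and `Value2` follow the answer:

```r
nan_cmp <- NaN == NaN  # NA, so `==` cannot be used to detect NaN
inf_cmp <- Inf == Inf  # TRUE, but -Inf would need a separate comparison

# Toy data (illustrative)
dat <- data.frame(Value1 = c(1, NaN, 3, 4), Value2 = c(Inf, 2, -Inf, 8))

# is.finite() is FALSE for NaN, Inf and -Inf, so one test per column is enough
clean <- dat[is.finite(dat$Value1) & is.finite(dat$Value2), ]
clean  # only the last row (4, 8) remains
```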