What is the difference between NaN and Inf, and NULL and NA in R?
In R, there are two closely related null-like values: NA and NULL. Both are used to represent missing or undefined values.
NULL represents the null object; it is a reserved word. NULL is often returned by expressions and functions whose value is undefined.
NA is a logical constant of length 1, which contains a missing value indicator. NA can be freely coerced to any other vector type except raw.
There are also constants NA_integer_, NA_real_, NA_complex_ and NA_character_ of the other atomic vector types which support missing values: all of these are reserved words in the R language.
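A short base-R session can make these distinctions concrete, and it also covers NaN and Inf from the question title. This is only an illustrative sketch; the variable names are made up for the example.

```r
# NULL is the absence of an object; NA is a missing value inside a vector.
length(NULL)          # 0
length(NA)            # 1
typeof(NA)            # "logical"
typeof(NA_integer_)   # "integer"
typeof(NA_real_)      # "double"

# NULL entries vanish when combined; NA entries keep their slot.
v <- c(1, NULL, NA, 3)
length(v)             # 3
is.na(v)              # FALSE TRUE FALSE

# NaN and Inf are ordinary double values produced by arithmetic.
pos_inf <- 1 / 0      # Inf
not_num <- 0 / 0      # NaN
is.na(not_num)        # TRUE: NaN counts as missing
is.nan(NA)            # FALSE: NA is not NaN
is.finite(pos_inf)    # FALSE
```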
Why do the hash values differ for NaN and Inf - Inf?
tl;dr this has to do with very deep details of how NaNs are represented in binary. You could work around it by using digest(.,ascii=TRUE) ...
Following up on @Jozef's answer: note the first of the final eight bytes in each output (ff vs. 7f) ...
> base::serialize(Inf-Inf,connection=NULL)
[1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
[26] 00 0e 00 00 00 01 ff f8 00 00 00 00 00 00
> base::serialize(NaN,connection=NULL)
[1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
[26] 00 0e 00 00 00 01 7f f8 00 00 00 00 00 00
Alternatively, using pryr::bytes() ...
> bytes(NaN)
[1] "7F F8 00 00 00 00 00 00"
> bytes(Inf-Inf)
[1] "FF F8 00 00 00 00 00 00"
The Wikipedia article on floating point format/NaNs says:
Some operations of floating-point arithmetic are invalid, such as taking the square root of a negative number. The act of reaching an invalid result is called a floating-point exception. An exceptional result is represented by a special code called a NaN, for "Not a Number". All NaNs in IEEE 754-1985 have this format:
- sign = either 0 or 1.
- biased exponent = all 1 bits.
- fraction = anything except all 0 bits (since all 0 bits represents infinity).
The sign is the first bit; the exponent is the next 11 bits; the fraction is the last 52 bits. Translating the first four hex digits given above to binary, Inf-Inf (FF F8) is 1111 1111 1111 1000 (sign=1; exponent is all ones, as required; fraction starts with 1000) whereas NaN (7F F8) is 0111 1111 1111 1000 (the same, but with sign=0).
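The field layout described above can be checked from within R. The following is a minimal sketch using only base R; decode_double() is a hypothetical helper written for this illustration, not part of any package.

```r
# Split a single double into its sign, biased exponent and fraction fields.
decode_double <- function(x) {
  stopifnot(is.double(x), length(x) == 1)
  bytes <- writeBin(x, raw(), endian = "big")          # 8 bytes, most significant first
  m <- matrix(as.integer(rawToBits(bytes)), nrow = 8)  # one column per byte, low bit first
  bits <- as.vector(m[8:1, ])                          # 64 bits, most significant first
  list(
    hex              = paste(as.character(bytes), collapse = " "),
    sign             = bits[1],
    exponent         = sum(bits[2:12] * 2^(10:0)),
    fraction_is_zero = all(bits[13:64] == 0)
  )
}

decode_double(1)          # sign 0, exponent 1023, fraction all zero
decode_double(Inf)        # exponent 2047 (all ones), fraction all zero
decode_double(NaN)        # exponent 2047 (all ones), fraction not zero
decode_double(Inf - Inf)  # also a NaN; its sign bit may differ by platform
```

Comparing the hex element for NaN and Inf - Inf shows whether the two NaNs differ on a given machine.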
To understand why Inf-Inf ends up with sign bit 1 and NaN has sign bit 0 you'd probably have to dig more deeply into the way floating point arithmetic is implemented on this platform ...
It might be worth raising an issue on the digest GitHub repo about this; I can't think of an elegant way to do it, but it seems reasonable that objects where identical(x,y) is TRUE in R should have identical hashes ... Note that identical() specifically ignores these differences in bit patterns via the single.NA (default TRUE) argument:
single.NA: logical indicating if there is conceptually just one numeric
‘NA’ and one ‘NaN’; ‘single.NA = FALSE’ differentiates bit
patterns.
Within the C code, it looks like R treats any two NaN values as equal (and otherwise compares numbers with C's != operator) unless bitwise comparison is enabled, in which case it explicitly compares the bytes stored in memory: see here. Note that C's != on its own reports any two NaN values as different, so it is the NaN check, not the operator, that makes different kinds of NaN values equivalent ...
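The behaviour of identical() described above can be tried directly. The sketch below also includes canonicalize_nan(), a hypothetical helper (not part of digest or base R) that overwrites every NaN with R's NaN constant, so that all NaNs should share one bit pattern before hashing or serializing. Treat it as one possible workaround under that assumption, not an established fix.

```r
x <- NaN
y <- Inf - Inf

identical(x, y)                     # TRUE: one conceptual NaN by default
identical(x, y, single.NA = FALSE)  # platform-dependent: compares bit patterns
x == y                              # NA: == never reports two NaN values as equal

# Hypothetical helper: give every NaN the same representation.
# is.nan() is FALSE for NA, so genuine NA_real_ values are left untouched.
canonicalize_nan <- function(v) {
  stopifnot(is.double(v))
  v[is.nan(v)] <- NaN
  v
}

identical(canonicalize_nan(y), x, single.NA = FALSE)
```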
Replace -inf, NaN and NA values with zero in a dataset in R
As per ?zoo:
Subscripting by a zoo object whose data contains
logical values is undefined.
So you need to wrap the subsetting in a which() call:
log_ret[which(!is.finite(log_ret))] <- 0
log_ret
               x      y z s     p t
2005-01-01 0.234 -0.012 0 0 0.454 0
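The same which() idiom can be tried without zoo installed. The sketch below uses a plain numeric matrix with made-up values as a stand-in for log_ret; on a plain matrix a logical index would also work, so the which() wrapper matters specifically for zoo objects.

```r
# Hypothetical log returns containing -Inf, NaN and NA.
log_ret <- matrix(c(0.234, -Inf, NaN, -0.012, NA, 0.454),
                  nrow = 2,
                  dimnames = list(NULL, c("x", "y", "z")))

bad <- which(!is.finite(log_ret))  # integer positions of NA, NaN, Inf and -Inf
log_ret[bad] <- 0
log_ret
```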
Error saying I have NA/NaN/Inf value when there seems to be none present. Error in hclustfun(distc) : NA/NaN/Inf in foreign function call (arg 11)
Check this line:
ConfP <- Conf.t/rowSums(Conf.t)
It is most likely dividing by zero somewhere, i.e. at least one row of Conf.t sums to zero:
> 0/0
[1] NaN
> 1/0
[1] Inf
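To see how a zero row sum produces the error, and how to find the offending rows, here is a small sketch. The matrix values are invented for illustration; only the names Conf.t and ConfP come from the question.

```r
# Hypothetical matrix whose row "b" is entirely zero.
Conf.t <- matrix(c(3, 0, 1,
                   1, 0, 4),
                 nrow = 3,
                 dimnames = list(c("a", "b", "c"), c("p", "q")))

ConfP <- Conf.t / rowSums(Conf.t)   # row "b" becomes 0/0 = NaN

# Locate the offending cells before passing the result on to clustering.
which(!is.finite(ConfP), arr.ind = TRUE)
rownames(Conf.t)[rowSums(Conf.t) == 0]   # "b"
```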
making the distinction between missing value types (non-response vs skip patterns)
Using NA, Inf, -Inf and NaN we can represent 4 categories of numeric missing values. Below we show the use of NA with Inf and then NA with NaN. In the third approach we discuss the use of the naniar package.
1) Recode q2 values of Yes, No, structural missing and missing to 1, 0, Inf and NA respectively. Note that is.na(x) will only report TRUE for an actual NA, is.infinite(x) will only report TRUE for an Inf and !is.finite(x) will report TRUE for NA or Inf in case you need to perform tests. Optionally recode the output back.
df %>%
count(q2 = recode(q2, Yes = 1, No = 0, .missing = ifelse(q1 == "No", Inf, NA)))
giving:
# A tibble: 3 x 2
# Groups: q2 [3]
q2 n
<dbl> <int>
1 1 1
2 Inf 2
3 NA 1
2) A variation on this is to use NaN in place of Inf. In that case tests can use is.na(x) (TRUE for both NA and NaN), is.nan(x) (TRUE only for NaN) and !is.finite(x) (TRUE for both).
df %>%
count(q2 = recode(q2, Yes = 1, No = 0, .missing = ifelse(q1 == "No", NaN, NA)))
giving:
# A tibble: 3 x 2
q2 n
<dbl> <int>
1 1 1
2 NA 1
3 NaN 2
3) The naniar package can create auxiliary columns that define the type of each NA using bind_shadow. We can then recode the auxiliary columns using recode_shadow and then use those in our counting.
library(dplyr)   # provides %>% and count()
library(naniar)
df %>%
  bind_shadow %>%
  recode_shadow(q2 = .where(is.na(q2) & q1 == "No" ~ "struct")) %>%
  count(q2, q2_NA)
giving:
# A tibble: 3 x 3
q2 q2_NA n
<chr> <fct> <int>
1 Yes !NA 1
2 <NA> NA 1
3 <NA> NA_struct 2
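The tests mentioned in approaches 1 and 2 can be compared side by side. The following base-R sketch builds a small truth table on a made-up vector; the column name not_finite is just a label chosen for this example.

```r
x <- c(1, NA, NaN, Inf, -Inf)

data.frame(
  x           = x,
  is.na       = is.na(x),        # TRUE for NA and NaN
  is.nan      = is.nan(x),       # TRUE for NaN only
  is.infinite = is.infinite(x),  # TRUE for Inf and -Inf
  not_finite  = !is.finite(x)    # TRUE for everything except ordinary numbers
)
```

So with the Inf coding, is.na() and is.infinite() separate the two kinds of missingness; with the NaN coding, is.nan() picks out the structural ones and is.na(x) & !is.nan(x) the ordinary ones.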
Dealing with TRUE, FALSE, NA and NaN
To answer your questions in order:
1) The == operator does indeed not treat NAs as you would expect it to. A very useful function is this compareNA function from cookbook-r.com:
compareNA <- function(v1,v2) {
    # This function returns TRUE wherever elements are the same, including NA's,
    # and false everywhere else.
    same <- (v1 == v2) | (is.na(v1) & is.na(v2))
    same[is.na(same)] <- FALSE
    return(same)
}
2) NA stands for "Not available", and is not the same as the general NaN ("not a number"). NA is generally used as a placeholder for missing data; NaNs are normally generated because of a numerical issue (taking the log of -1 or similar).
3) I'm not really sure what you mean by "logical things"--many different data types, including numeric vectors, can be used as input to logical operators. You might want to try reading the R logical operators page: http://stat.ethz.ch/R-manual/R-patched/library/base/html/Logic.html.
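To make points 1 and 2 concrete, here is a short usage sketch. The function body is the compareNA shown above, repeated so the snippet runs on its own; the vectors a and b are made up.

```r
compareNA <- function(v1, v2) {
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  same[is.na(same)] <- FALSE
  same
}

a <- c(1, NA, 3, NA)
b <- c(1, NA, 4, 2)

a == b            # TRUE NA FALSE NA
compareNA(a, b)   # TRUE TRUE FALSE FALSE

# NaN arises from arithmetic, NA from missing data; is.na() is TRUE for both.
suppressWarnings(log(-1))   # NaN
is.nan(NA)                  # FALSE
```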
Hope this helps!
How to remove rows with inf from a dataframe in R
To remove the rows with +/-Inf I'd suggest the following:
df <- df[!is.infinite(rowSums(df)),]
or, alternatively,
df <- df[is.finite(rowSums(df)),]
Both assume that all columns are numeric. The second option (the one with is.finite() and without the negation) also removes rows containing NA values, in case that has not already been done.
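The difference between the two options shows up on a small made-up data frame. The sketch also adds a column-wise check as one possible alternative (an assumption of this example, not part of the original answer), because a row containing both Inf and -Inf sums to NaN and so slips past is.infinite(rowSums(df)).

```r
df <- data.frame(a = c(1, Inf, 3, NA, Inf),
                 b = c(2, 1, -Inf, 4, -Inf))

# Row 5 holds both Inf and -Inf, so its row sum is NaN rather than Inf.
keep_not_inf <- !is.infinite(rowSums(df))  # keeps rows 1, 4 and 5
keep_finite  <- is.finite(rowSums(df))     # keeps row 1 only

# Column-wise check: flag a row if any cell is infinite, leaving NA rows alone.
has_inf <- Reduce("|", lapply(df, is.infinite))
df[!has_inf, ]                             # keeps rows 1 and 4
```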
Replace -Inf in dataframe with NA in R
We can do
dat[] <- Map(function(x) replace(x, is.infinite(x), NA), dat)
Or with sapply
dat[sapply(dat, is.infinite)] <- NA
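Here is a minimal worked example of the column-wise replacement, using lapply in place of Map (the two behave the same for a single argument). The data frame is invented for illustration. Note that NaN is left alone, since is.infinite() is FALSE for it.

```r
dat <- data.frame(id = c("a", "b", "c"),
                  x  = c(1, -Inf, 3),
                  y  = c(Inf, 2, NaN),
                  stringsAsFactors = FALSE)

# dat[] keeps the data.frame structure while every column is replaced.
dat[] <- lapply(dat, function(col) replace(col, is.infinite(col), NA))
dat
```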
DTWCLUST Shape Based Cluster Analysis in R: NA/NaN/Inf in Foreign Function Call Despite Complete Dataset
You have empty series, i.e. series whose values are all zero.
For example anger[1949,].
According to the definition of SBD, the distance between such series and any other is infinite.
You'll probably have to remove them with something like anger[rowSums(anger) != 0,].
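A small sketch of the filtering step, on a made-up stand-in for the anger matrix (one series per row). For non-negative data the rowSums(anger) != 0 filter is enough; if values can be negative, counting non-zero entries is a safer test, which is the variant assumed here.

```r
# Hypothetical stand-in for the anger matrix.
anger <- rbind(s1 = c(0, 2, 1, 0),
               s2 = c(0, 0, 0, 0),    # empty series
               s3 = c(1, -1, 0, 0))   # sums to zero but is not empty

is_empty <- rowSums(anger != 0) == 0   # TRUE only when every value is zero
anger_clean <- anger[!is_empty, , drop = FALSE]
rownames(anger_clean)                  # "s1" "s3"

rownames(anger)[rowSums(anger) != 0]   # "s1": the sum-based filter also drops s3
```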