Floating Point Issue in R

Floating point issue in R?

From the Floating-Point Guide:

Why don’t my numbers, like 0.1 + 0.2 add up to a nice round 0.3, and
instead I get a weird result like 0.30000000000000004?

Because internally, computers use a format (binary floating-point)
that cannot accurately represent a number like 0.1, 0.2 or 0.3 at all.

When the code is compiled or interpreted, your “0.1” is already
rounded to the nearest number in that format, which results in a small
rounding error even before the calculation happens.
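You can see this directly in R by printing more digits than the default:

print(0.1 + 0.2, digits = 17)
# [1] 0.30000000000000004
print(0.3, digits = 17)
# [1] 0.29999999999999999
0.1 + 0.2 == 0.3
# [1] FALSE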

What can I do to avoid this problem?

That depends on what kind of calculations you’re doing.

  • If you really need your results to add up exactly, especially when
    you work with money: use a special decimal datatype.
  • If you just don’t want to see all those extra decimal places: simply
    format your result rounded to a fixed number of decimal places when
    displaying it.
  • If you have no decimal datatype available, an alternative is to work
    with integers, e.g. do money calculations entirely in cents. But this
    is more work and has some drawbacks. (See the sketch after this list.)
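A small sketch of the last two options in R (the variable names are purely illustrative):

amount <- 0.1 + 0.2
sprintf("%.2f", amount)      # "0.30": round only when displaying

# money kept entirely in integer cents: addition is then exact
cents_a <- 10L               # 0.10 as cents
cents_b <- 20L               # 0.20 as cents
cents_a + cents_b == 30L     # TRUE
(cents_a + cents_b) / 100    # 0.3, converted back only for display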

How to deal with floating point errors in R

You can use all.equal() in your function, which "tests if two objects are 'nearly' equal":

is.sqrt <- function(x, y) {
  isTRUE(all.equal(x^2, y))
}


is.sqrt(sqrt(2), 2)
# TRUE

is.sqrt(sqrt(2), 3)
# FALSE
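By default all.equal() compares numbers with a tolerance of about 1.5e-8 (sqrt(.Machine$double.eps)), so tiny representation errors are ignored while genuinely different values are not:

all.equal(0.1 + 0.2, 0.3)     # TRUE, despite the representation error
isTRUE(all.equal(1, 1.001))   # FALSE, the difference is far above the tolerance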

R: Computing maximum floating-point error for log(exp(...))

I think you got the right answer. Here I refine the step size down to sqrt(.Machine$double.eps), and you will see

> x <- seq(0, 2, by = sqrt(.Machine$double.eps))

> max(abs(log(exp(x)) - x))
[1] 1.110725e-16

However, once x is large enough that exp(x) overflows, the error becomes Inf, e.g.,

> (x <- .Machine$double.xmax)
[1] 1.797693e+308

> max(abs(log(exp(x)) - x))
[1] Inf
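The Inf comes from overflow rather than from log() itself: exp(x) already returns Inf once x exceeds log(.Machine$double.xmax), roughly 709.78.

log(.Machine$double.xmax)
# [1] 709.7827
exp(710)
# [1] Inf
log(exp(710)) - 710
# [1] Inf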

Managing floating point accuracy

Floating point precision has always generated lots of confusion. The crucial idea to remember is: when you work with doubles, there is no way to store each real number "as is", or "exactly right" -- the best you can store is the closest available approximation. So when you type (in R or any other modern language) something like x = 99.93029, you'll get this number represented by 99.930289999999999.

Now when you expect a + b to be "exactly 100", you're being imprecise in your terms. The best you can get is "100 up to N digits after the decimal point" and hope that N is big enough. In your case it would be correct to say 99.9302900 + 0.0697122 is 100 with 5 decimal places of accuracy. Naturally, by multiplying that equality by 10^k you lose an additional k digits of accuracy.

So, there are two solutions here:

a. To get more precision in the output, provide more precision in the input.

bb <- c(99.93029, 0.06971) 
print(bb[2]/(100-bb[1])*100, digits = 20)
[1] 99.999999999999119

b. If double precision is not enough (this can happen in complex algorithms), use packages that provide extra-precision arithmetic, for instance the gmp package.
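As an illustration (assuming the gmp package is installed), you can keep these decimal values as exact rationals so the arithmetic carries no rounding error at all; the numbers are just the ones from this example:

library(gmp)
a <- as.bigq(9993029, 100000)    # 99.93029 as an exact rational
b <- as.bigq(6971, 100000)       # 0.06971 as an exact rational
b / (as.bigq(100) - a) * as.bigq(100)   # exactly 100, no rounding error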

Numerical issues in R

It's unclear how you define d. Here I assume it's defined based on exact algebra.

# check whether 1/(8 * 10^i) equals its value rounded to 3 + i decimal places
for (i in 1:20) print(1/(8 * 10^i) == round(1/(8 * 10^i), 3 + i))

# 1/8e20 cannot be stored exactly; the closest double is:
sprintf("%.50f", 1/8e20)
#[1] "0.00000000000000000000124999999999999993158684291616"

Extreme numerical values in floating-point precision in R

R uses IEEE 754 double-precision floating-point numbers.

Floating-point numbers are more dense near zero. This is a result of their being designed to compute accurately (the equivalent of about 16 significant decimal digits, as you have noticed) over a very wide range.

Perhaps you expected a fixed-point system with uniform absolute precision. In practice fixed-point is either wasteful or the ranges of each intermediate computation must be carefully estimated beforehand, with drastic consequences if they are wrong.

Positive floating-point numbers look like this, schematically:


+-+-+-+--+--+--+----+----+----+--------+--------+--------+--
0

The smallest positive normal double-precision number is 2 to the power of the minimal exponent. Near one, the double-precision floating-point numbers are already spread quite wide apart. There is a distance of 2^-53 from one to the number below it, and a distance of 2^-52 from one to the number above it.
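You can ask R for these quantities directly:

.Machine$double.xmin               # 2.225074e-308, smallest positive normal double (2^-1022)
.Machine$double.eps                # 2.220446e-16, spacing just above 1 (2^-52)
1 + .Machine$double.eps / 2 == 1   # TRUE: half that spacing rounds back down to 1
(1 - 2^-53) < 1                    # TRUE: 1 - 2^-53 is the double immediately below 1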

Rounding error in R?

This is a combination of two extremely Frequently A'd Qs.

  • finite floating-point precision: this is R FAQ 7.31; see e.g. Why are these numbers not equal?. The value gets rounded to 178379.5. It won't help to set options(digits=22) to print numbers to more decimal places; the precision has been lost because (as you suggested) R only stores values up to 53 binary digits, roughly 16 decimal digits, of precision.
  • round to even: R "rounds to even"; see Is there an error in round function in R?. That means the value will be rounded up, to 178380.

This is not about printing precision.

If you had used fewer '9's, you would have seen what you expected (which would be a combination of R's limited printing precision plus the expected rounding):

> x <- 178379.49
>
> x
[1] 178379.5 ## prints as .5, but full precision is present
> round(x)
[1] 178379
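Conversely, a sketch of what presumably happened in the original question: with enough trailing 9s the literal is already stored as exactly 178379.5, and round() then goes to the even neighbour, i.e. up.

y <- 178379.4999999999999   # nearest double is exactly 178379.5
print(y, digits = 22)
# [1] 178379.5
round(y)
# [1] 178380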

How to detect floating point errors in R

I'm not sure what kind of normalization you want to do, but if you want rowsums to be 1, then what you did seems right to me.

I'm not an expert, but the below gives you a reproducible example where R fails in the way you describe.

set.seed(2)
M1 <- matrix(runif(10000, min = 0, max = 10), nrow = 2000)
rowSums(M1)  # raw row sums, before normalization

# normalize each row so that it sums to 1
M2 <- t(apply(M1, 1, function(x) x / sum(x)))

length(which(rowSums(M2) != 1))  # gives 10 "unnormalized" rows
all.equal(rowSums(M2), rep(1, 2000), tolerance = .Machine$double.eps)  # TRUE

for (idx in which(rowSums(M2) != 1)) {
  print(idx)
  print(sum(M2[idx, ]), digits = 20)
  print(sum(M2[idx, ]) == 1)
  print(all.equal(sum(M2[idx, ]), 1, tolerance = .Machine$double.eps))
}

This prints

[1] 422
[1] 0.99999999999999988898
[1] FALSE
[1] TRUE
...

.Machine$double.eps is the smallest positive floating-point number x such that 1 + x != 1. You could retry this experiment with bigger numbers, and it looks like all.equal (which checks whether two arrays are equal, entry by entry, within some tolerance) is always TRUE.

If you want to check if all of the rows after your normalization step sum to 1 then you could do so with the all.equal function.
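For example, to assert that every row sums to 1 within the default tolerance:

stopifnot(isTRUE(all.equal(rowSums(M2), rep(1, nrow(M2)))))   # passes silently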

Floating point less-than-equal comparisons after addition and subtraction

No, there is no best practice. Unfortunately, there cannot be, because almost all floating-point calculations introduce some rounding error, and the consequences of the errors are different for different applications.

Typically, software will perform some calculations that ideally would yield some exact mathematical result x but, due to rounding errors (or other issues), produce an approximation x'. When comparing floating-point numbers, you want to ask some question about x, such as “Is x < 1?” or “Is x = 3.1415926…?” So the problem you want to solve is “How do I use x' to answer this question about x?”

There is no general solution for this. Some errors may produce an x' that is greater than 1 even though x is less than 1. Some errors may produce an x' that is less than 1 even though x is greater than 1. The solution in any specific instance depends on information about the errors that were generated while calculating x' and the specific question to be answered.

Sometimes a thorough analysis can demonstrate that certain questions about x can be answered using x'. For example, in some situations, we might craft calculations so that we know that, if x' < 1, then x < 1. Or perhaps that, if x' < .99875, then x < 1. Say we analyze the calculations we used to calculate x' and can show that the final error is less than .00125. Then, if x' < .99875, then we know x < 1, and, if x' > 1.00125, then x > 1. But, if .99875 < x' < 1.00125, then we do not know whether x > 1 or x < 1. What do we do in that situation? Is it then better for your application to take the path where x < 1 or the path where x > 1? The answer is specific to each application, and there is no general best practice.
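As a rough sketch of that idea in R (the threshold, the error bound err, and the function name are illustrative, not a general recipe):

lt_with_margin <- function(x_approx, threshold, err) {
  if (x_approx < threshold - err) return(TRUE)    # certainly x < threshold
  if (x_approx > threshold + err) return(FALSE)   # certainly x > threshold
  NA                                              # error band straddles the threshold: undecidable
}

lt_with_margin(0.9980, 1, 0.00125)   # TRUE
lt_with_margin(1.0030, 1, 0.00125)   # FALSE
lt_with_margin(0.9999, 1, 0.00125)   # NA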

I will add to this that the amount of rounding error that occurs varies hugely from application to application. This is because rounding error can be compounded in various ways. Some applications with a few floating-point operations will achieve results with small errors. Some applications with many floating-point operations will also achieve results with modest errors. But certain behaviors can lead calculations astray and produce catastrophic errors. So dealing with rounding error is a custom problem for each program.


