Dealing with Very Small Numbers in R

Dealing with very small numbers in R

Mathematically speaking, one of those ratios will be approximately zero and the other approximately one. The difference between your numbers is huge, so I even wonder whether this makes sense.

But to do that in general, you can borrow the idea behind logspace_add, a C function that R uses under the hood. With lx = log(x) and ly = log(y), one can define logxpy ( = log(x+y) ) as :

logxpy <- function(lx,ly) max(lx,ly) + log1p(exp(-abs(lx-ly)))

Which means that we can use :

> la1 <- 1000*log(0.1)
> la2 <- 1200*log(0.2)

> exp(la1 - logxpy(la1,la2))
[1] 5.807714e-162

> exp(la2 - logxpy(la1,la2))
[1] 1

This function can be called recursively as well if you have more numbers. Mind you, 1 is still 1, and not 1 minus 5.807...e-162. If you really need more precision and your platform supports long double types, you could code everything in e.g. C or C++ and return the results afterwards. But if I'm right, R can - for the moment - only deal with normal doubles, so ultimately you'll lose the precision again when the result is shown.
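Extending logxpy() to more than two numbers can be sketched by folding it over a vector of logs. The helper name logsum and the third value in lv are made up for illustration; this is a sketch, not the logspace_add implementation itself.

```r
# Pairwise log(x + y) from lx = log(x) and ly = log(y), as defined above
logxpy <- function(lx, ly) max(lx, ly) + log1p(exp(-abs(lx - ly)))

# Hypothetical helper: log(sum(exp(lv))) obtained by folding logxpy over lv
logsum <- function(lv) Reduce(logxpy, lv)

# Three logs whose exponentials all underflow to 0 as doubles
lv <- c(1000 * log(0.1), 1200 * log(0.2), 1100 * log(0.15))

# Share of each number in the total, computed entirely on the log scale
shares <- exp(lv - logsum(lv))
shares
sum(shares)
```

The shares are finite and sum to 1 up to rounding, even though exp(lv) itself is 0 for every element.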


EDIT :

to do the math for you :

log(x+y) = log( exp(lx) + exp(ly) )
= log( exp(lx) * (1 + exp(ly-lx)) )
= lx + log( 1 + exp(ly-lx) )

Now take the larger of the two as lx, and you arrive at the expression in logxpy().

EDIT 2 : Why take the maximum then? Easy: to ensure that the argument of exp() is never positive. If lx is the larger value, ly-lx is negative and exp(ly-lx) lies between 0 and 1; at worst it underflows to 0, which still gives a good result. With the roles reversed, the argument can get too big and exp() returns Inf. That's not a correct result:

Say lx=1 and ly=1000, then :

> 1+log1p(exp(1000-1))
[1] Inf
> 1000+log1p(exp(1-1000))
[1] 1000

Very small numbers in R

You can use the gmp package:

http://cran.r-project.org/web/packages/gmp/

Example (Large Numbers)

install.packages("gmp")
library(gmp)
largevalue <- as.bigz("2305843009213694080000000")  # quoted so the digits are not rounded to a double first
largevalue

Example (Small Numbers)

smallvalues <- asNumeric(cbind(0.0000000000000000000001,0.0000000000000000000003))
smallvalues

Handling very small numbers in ratio and how to keep exponential value

As @minem suggested, you can use the Rmpfr package. Here's one way to apply it to your case.

First move the multipliers inside the exponential of the numerator, using the fact that a*exp(b) = exp(b + log(a)). Then re-write your density function to compute the log numerator:

log_numerator <- function(nc, yc, X, beta, sig, k, lambda){
  v <- yc - X %*% beta[,k]
  res <- -sum(v*v)/(2*sig[k]) - (nc/2)*log(2*pi*sig[k]) + log(lambda[k])
  drop(res)
}

Note that lambda is now passed to this function. Also note that the dot product of the vector yc - X %*% beta with itself can be computed more efficiently as sum(v*v), as shown.

Now we can generate some data. Here I fix c and just have k = 1:2.

set.seed(1)
n_c <- 340
y_c <- rnorm(340)
dat <- data.frame(fac = sample(letters[1:11], 340, replace = TRUE))
X_c <- model.matrix(~ fac, data = dat)
beta <- matrix(runif(22, -10, 10), 11, 2)
sigma <- c(21.694381, 4.267277)
lambda <- c(0.5, 0.5)

Using your density function we have

x1 <- lambda[1] *density(n_c, y_c,X_c,beta,sigma,1)
y1 <- lambda[2] *density(n_c, y_c,X_c,beta,sigma,2)
x1
# [1] +exp(-1738.4)
y1
# [1] +exp(-1838.7)
as.numeric(y1/sum(x1, y1))
# [1] 2.780805e-44

Using the log-numerator function we have

library(Rmpfr)
p <- 40  # precision in bits
x <- mpfr(log_numerator(n_c, y_c,X_c,beta,sigma,1, lambda), p)
y <- mpfr(log_numerator(n_c, y_c,X_c,beta,sigma,2, lambda), p)
x
# 1 'mpfr' number of precision 40 bits
# [1] -1738.379327798
y
# 1 'mpfr' number of precision 40 bits
# [1] -1838.67033143
exp(y)/sum(exp(x), exp(y))
# 1 'mpfr' number of precision 53 bits
# [1] 2.780805017186589e-44

So certainly mpfr can be used to produce equivalent results, but without better test code it's hard to check timings.

You could also improve efficiency by using more vectorization. E.g. we can vectorize log_numerator over k:

log_numerator2 <- function(nc, yc, X, beta, sig, lambda){
  M <- yc - X %*% beta
  res <- -colSums(M*M)/(2*sig) - (nc/2)*log(2*pi*sig) + log(lambda)
  drop(res)
}
z <- log_numerator2(n_c, y_c, X_c, beta, sigma, lambda)
z
# [1] -1738.379 -1838.670

Now suppose we have the log numerators in a c by k matrix, for illustration suppose all c have the same values as z,

log_num <- mpfr(matrix(z, byrow = TRUE, 3, 2), p)

you can compute the ratios as follows

num <- exp(log_num)
denom <- apply(num, 1, sum) # rowSums not implemented for mpfr
num/denom
# 'mpfrMatrix' of dim(.) = (3, 2) of precision 53 bits
# [,1] [,2]
# [1,] 1.000000000000000 2.780805017186589e-44
# [2,] 1.000000000000000 2.780805017186589e-44
# [3,] 1.000000000000000 2.780805017186589e-44
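For comparison, the same ratios can be sketched in base R without Rmpfr by subtracting each row's maximum before exponentiating (the log-sum-exp trick). The two log numerators below are copied from the output above; the variable names are illustrative only.

```r
# Log numerators in a c by k matrix (values taken from the mpfr output above)
log_num <- matrix(c(-1738.379327798, -1838.67033143),
                  nrow = 3, ncol = 2, byrow = TRUE)

# Subtract the row maximum so the largest term in each row becomes exp(0) = 1
row_max <- apply(log_num, 1, max)
shifted <- exp(log_num - row_max)  # row_max is recycled down the columns, i.e. per row

# Ratios numerator / sum of numerators, row by row
ratios <- shifted / rowSums(shifted)
ratios
```

The shift cancels in the ratio, so the result agrees with the mpfr computation to double precision while using only ordinary doubles.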

Arithmetic with very small numbers in R

The case of very small probabilities comes up often in machine learning and other statistical computing topics. You are getting a precision error because of the limitations of the internal representation of floating point numbers. This can be solved using arbitrary precision arithmetic, but that is not commonly done.

The most popular solution is to use a log transformation to represent your probabilities and then use addition instead of multiplication. This is referred to as log-likelihood. This transformation avoids the problem of very small numbers, and in addition, the log-likelihood values can be used directly to compare the probability of things (lower log-likelihood always means lower probability).

Note that there is a subtle distinction between likelihood and probability, but the log transformation works either way: it turns very small numbers into negative numbers of moderate size that doubles represent without trouble.
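A small sketch of the idea in R, using 500 made-up probabilities: the direct product underflows to zero, while the sum of logs stays finite and usable.

```r
# 500 hypothetical independent probabilities, each between 0.001 and 0.01
set.seed(42)
p <- runif(500, min = 0.001, max = 0.01)

# Multiplying directly underflows: the true product is far below the double range
direct <- prod(p)

# Adding logs gives the log of the same product without underflow
loglik <- sum(log(p))

direct
loglik
```

Two such log values can be compared directly; the larger one corresponds to the larger probability.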

Reading very small numbers in R

The standard numeric data type in R (8-byte double precision) does not support such small numbers. The smallest positive normalized number is about 2.2e-308:

.Machine$double.xmin
# [1] 2.225074e-308

Can you convince whatever program generates your input data to save it in, say, logarithms?
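If the generating program cannot be changed, one possible workaround is to read the values as text and build their base-10 logarithms from the mantissa and the exponent. A minimal sketch, assuming every value is a positive number written in scientific notation; the example strings are made up.

```r
# Values as they might appear in the input file, read as character
x_chr <- c("3.2e-400", "1.5e-12")

# Hypothetical helper: log10 of a positive number given as "<mantissa>e<exponent>"
log10_from_string <- function(s) {
  parts <- strsplit(s, "[eE]")
  sapply(parts, function(p) log10(as.numeric(p[1])) + as.numeric(p[2]))
}

# log10(m * 10^k) = log10(m) + k, so no tiny double is ever created
log10_x <- log10_from_string(x_chr)
log10_x
```

Strings without an exponent part would need extra handling; the sketch does not cover them.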

Presenting very small numbers in R effectively

For using scientific notation in xtable, you need to use negative numbers in the digits option.

Part of the documentation for the digits option of xtable:

If values of digits are negative, the corresponding values of x are
displayed in scientific format with abs(digits) digits.

Example:

xtable::xtable(data.frame(coeffs, pv), digits = -2)

\begin{table}[ht]
\centering
\begin{tabular}{rrr}
\hline
& coeffs & pv \\
\hline
1 & -8.13E-02 & 1.96E-02 \\
2 & -9.65E-03 & 0.00E+00 \\
3 & -1.09E+00 & 0.00E+00 \\
4 & 1.15E-02 & 7.96E-01 \\
5 & -1.83E-03 & 7.44E-01 \\
6 & -1.95E-01 & 6.70E-06 \\
7 & -3.71E-02 & 6.26E-01 \\
8 & 6.44E-02 & 1.01E-01 \\
9 & -1.11E-01 & 4.20E-05 \\
10 & -4.34E-03 & 0.00E+00 \\
11 & 9.26E-03 & 1.06E-06 \\
12 & -1.36E-04 & 3.66E-01 \\
13 & -1.32E-04 & 4.47E-01 \\
14 & -1.37E-04 & 3.75E-01 \\
15 & -5.02E-04 & 1.79E-03 \\
\hline
\end{tabular}
\end{table}
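The same scientific style can also be sketched in base R with formatC(), for cases where the values are formatted before a table is built. This is only an illustration with three values copied from the table above; it is not part of xtable.

```r
# Three values taken from the table above
vals <- c(-8.13e-02, 6.70e-06, 0)

# format = "E" gives scientific notation with an upper-case E;
# digits = 2 keeps two digits after the decimal point
out <- formatC(vals, format = "E", digits = 2)
out
```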

Adding and printing very small numbers in R

The Rmpfr package gives you more bits of precision. In the example below I use 100 bits of precision (roughly 30 decimal digits).

library(Rmpfr)

d <- 100
zero <- mpfr(0, precBits = d)
one <- mpfr(1, precBits = d)
powers <- mpfr(2^(1:60), precBits = d)

result <- vector("list", length = length(powers))
temp2 <- zero
for(i in seq_along(powers)) {
  temp2 <- temp2 + one/powers[i]
  iszero <- mpfrIs0(one/powers[i])
  result[[i]] <- list(Exp = i, InvIsZero = iszero, Sum = temp2)
}

In the question the OP says that

past the 54th iteration, it does reach one, but it's apparently not
adding things any more (even increasing the %.40f parameter).

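Apparently this has to do with sprintf, not with the results: sprintf seems to convert the mpfr values to ordinary doubles before formatting them, and in double precision every sum from the 54th iteration on rounds to 1.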
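The rounding can be checked in plain double precision, without Rmpfr. A minimal sketch, using the fact that the sum of 1/2^i for i = 1..n equals 1 - 2^-n:

```r
# Partial sums 1/2 + 1/4 + ... + 1/2^n in ordinary double precision
s <- cumsum(1 / 2^(1:60))

# A double has a 53-bit significand: 1 - 2^-53 is representable, 1 - 2^-54 is not
below_one_53 <- (1 - 2^-53) < 1
equals_one_54 <- (1 - 2^-54) == 1

# First iteration at which the double-precision sum is exactly 1
first_one <- which(s == 1)[1]

below_one_53
equals_one_54
first_one
```

This is consistent with the sprintf output, where the 54th line is the first to show exactly 1.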

With the print.mpfr method:

for(i in 52:60){
  print(result[[i]][[3]], digits = 60)
  flush.console()
}
#1 'mpfr' number of precision 100 bits
#[1] 0.9999999999999997779553950749686919152736663818359375
#1 'mpfr' number of precision 100 bits
#[1] 0.99999999999999988897769753748434595763683319091796875
#1 'mpfr' number of precision 100 bits
#[1] 0.999999999999999944488848768742172978818416595458984375
#1 'mpfr' number of precision 100 bits
#[1] 0.9999999999999999722444243843710864894092082977294921875
#1 'mpfr' number of precision 100 bits
#[1] 0.99999999999999998612221219218554324470460414886474609375
#1 'mpfr' number of precision 100 bits
#[1] 0.999999999999999993061106096092771622352302074432373046875
#1 'mpfr' number of precision 100 bits
#[1] 0.9999999999999999965305530480463858111761510372161865234375
#1 'mpfr' number of precision 100 bits
#[1] 0.99999999999999999826527652402319290558807551860809326171875
#1 'mpfr' number of precision 100 bits
#[1] 0.999999999999999999132638262011596452794037759304046630859375

With sprintf:

for(i in 52:60){
  print(sprintf("%d - %.60f", i, result[[i]][[3]]))
  flush.console()
}
#[1] "52 - 0.999999999999999777955395074968691915273666381835937500000000"
#[1] "53 - 0.999999999999999888977697537484345957636833190917968750000000"
#[1] "54 - 1.000000000000000000000000000000000000000000000000000000000000"
#[1] "55 - 1.000000000000000000000000000000000000000000000000000000000000"
#[1] "56 - 1.000000000000000000000000000000000000000000000000000000000000"
#[1] "57 - 1.000000000000000000000000000000000000000000000000000000000000"
#[1] "58 - 1.000000000000000000000000000000000000000000000000000000000000"
#[1] "59 - 1.000000000000000000000000000000000000000000000000000000000000"
#[1] "60 - 1.000000000000000000000000000000000000000000000000000000000000"

