Dealing with very small numbers in R
Mathematically speaking, one of those numbers will be approximately zero and the other approximately one. The difference between your numbers is so large that I wonder whether this calculation makes sense at all.
But to do that in general, you can use the idea from logspace_add, the C function under the hood of R. With lx = log(x) and ly = log(y), one can define logxpy ( = log(x+y) ) as:
logxpy <- function(lx,ly) max(lx,ly) + log1p(exp(-abs(lx-ly)))
This means that we can use:
> la1 <- 1000*log(0.1)
> la2 <- 1200*log(0.2)
> exp(la1 - logxpy(la1,la2))
[1] 5.807714e-162
> exp(la2 - logxpy(la1,la2))
[1] 1
This function can be called recursively as well if you have more numbers. Mind you, 1 is still 1, and not 1 minus 5.807...e-162. If you really need more precision and your platform supports long double types, you could code everything in, e.g., C or C++ and return the results afterwards. But if I'm right, R can, for the moment, only deal with ordinary doubles, so ultimately you'll lose the precision again when the result is shown.
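The recursive use mentioned above can be sketched with Reduce(). This is only an illustration: logsum is a hypothetical helper name, and the third number is made up so that there are more than two values to combine.

```r
# Same definition as in the answer: log(x + y) from lx = log(x) and ly = log(y).
logxpy <- function(lx, ly) max(lx, ly) + log1p(exp(-abs(lx - ly)))

# Hypothetical helper: fold logxpy over a vector of logs to get log(sum(x)).
logsum <- function(lv) Reduce(logxpy, lv)

# Logs of three numbers that would all underflow to 0 as ordinary doubles.
# The third value is invented for this example.
lv <- c(1000 * log(0.1), 1200 * log(0.2), 1100 * log(0.15))

# Each number's share of the total, computed entirely on the log scale.
shares <- exp(lv - logsum(lv))
shares
```

The shares still add up to 1 within double precision, but the same caveat applies: the dominant share is reported as 1, not as 1 minus something tiny.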
EDIT: to do the math for you:
log(x+y) = log( exp(lx) + exp(ly) )
= log( exp(lx) * (1 + exp(ly - lx)) )
= lx + log( 1 + exp(ly - lx) )
Now you just take the largest as lx, and you arrive at the expression in logxpy().
EDIT 2: Why take the maximum then? To ensure that the argument passed to exp() is negative. If lx-ly gets too big, exp(lx-ly) returns Inf, which is not a correct result. With the larger value taken out in front, exp(ly-lx) returns 0 instead, which gives a far better result:
Say lx=1 and ly=1000, then :
> 1+log1p(exp(1000-1))
[1] Inf
> 1000+log1p(exp(1-1000))
[1] 1000
Very small numbers in R
You can use the gmp library:
http://cran.r-project.org/web/packages/gmp/
Example (Large Numbers)
install.packages("gmp")
library(gmp)
largevalue <- as.bigz("2305843009213694080000000")  # pass a string so no digits are lost to double rounding
largevalue
Example (Small Numbers)
smallvalues <- asNumeric(cbind(0.0000000000000000000001,0.0000000000000000000003))
smallvalues
Handling very small numbers in ratio and how to keep exponential value
As @minem suggested, you can use the Rmpfr package. Here's one way to apply it to your case.
First move the multipliers inside the exponential of the numerator, using the fact that a*exp(b) = exp(b + log(a)). Then rewrite your density function to compute the log numerator:
log_numerator <- function(nc, yc, X, beta, sig, k, lambda){
v <- yc - X %*% beta[,k]
res <- -sum(v*v)/(2*sig[k]) - (nc/2)*log(2*pi*sig[k]) + log(lambda[k])
drop(res)
}
Note that lambda is now passed to this function. Also note that we can compute the dot product of the vector Y - X*beta more efficiently, as shown.
Now we can generate some data. Here I fix c and just have k = 1:2.
set.seed(1)
n_c <- 340
y_c <- rnorm(340)
dat <- data.frame(fac = sample(letters[1:11], 340, replace = TRUE))
X_c <- model.matrix(~ fac, data = dat)
beta <- matrix(runif(22, -10, 10), 11, 2)
sigma <- c(21.694381, 4.267277)
lambda <- c(0.5, 0.5)
Using your density function we have
x1 <- lambda[1] *density(n_c, y_c,X_c,beta,sigma,1)
y1 <- lambda[2] *density(n_c, y_c,X_c,beta,sigma,2)
x1
# [1] +exp(-1738.4)
y1
# [1] +exp(-1838.7)
as.numeric(y1/sum(x1, y1))
# [1] 2.780805e-44
Using the log-numerator function we have
library(Rmpfr)
p <- 40  # precision in bits
x <- mpfr(log_numerator(n_c, y_c,X_c,beta,sigma,1, lambda), p)
y <- mpfr(log_numerator(n_c, y_c,X_c,beta,sigma,2, lambda), p)
x
# 1 'mpfr' number of precision 40 bits
# [1] -1738.379327798
y
# 1 'mpfr' number of precision 40 bits
# [1] -1838.67033143
exp(y)/sum(exp(x), exp(y))
# 1 'mpfr' number of precision 53 bits
# [1] 2.780805017186589e-44
So mpfr can certainly be used to produce equivalent results, but without better test code it's hard to check timings.
You could also improve efficiency by using more vectorization. E.g. we can vectorize log_numerator over k:
log_numerator2 <- function(nc, yc, X, beta, sig, lambda){
M <- yc - X %*% beta
res <- -colSums(M*M)/(2*sig) - (nc/2)*log(2*pi*sig) + log(lambda)
drop(res)
}
z <- log_numerator2(n_c, y_c, X_c, beta, sigma, lambda)
z
# [1] -1738.379 -1838.670
Now suppose we have the log numerators in a c by k matrix. For illustration, suppose all c have the same values as z,
log_num <- mpfr(matrix(z, byrow = TRUE, 3, 2), p)
you can compute the ratios as follows
num <- exp(log_num)
denom <- apply(num, 1, sum) # rowSums not implemented for mpfr
num/denom
# 'mpfrMatrix' of dim(.) = (3, 2) of precision 53 bits
# [,1] [,2]
# [1,] 1.000000000000000 2.780805017186589e-44
# [2,] 1.000000000000000 2.780805017186589e-44
# [3,] 1.000000000000000 2.780805017186589e-44
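For comparison, the same ratios can be sketched in plain doubles, without Rmpfr, by subtracting the largest log numerator before exponentiating. This is the max-subtraction (log-sum-exp) trick rather than the mpfr approach used above. The helper name ratios is hypothetical, and z is re-typed here from the values printed above so the snippet runs on its own.

```r
# Log numerators copied from the printed mpfr values above.
z <- c(-1738.379327798, -1838.67033143)

# Hypothetical helper: shift by the maximum so the largest term becomes
# exp(0) = 1 and the denominator cannot underflow to zero.
ratios <- function(logv) {
  w <- exp(logv - max(logv))
  w / sum(w)
}

r <- ratios(z)
r

# Row-wise version for a c by k matrix of log numerators.
log_num <- matrix(z, byrow = TRUE, nrow = 3, ncol = 2)
t(apply(log_num, 1, ratios))
```

The second ratio comes out at about 2.78e-44, in line with the mpfr result, because exp(-100.29) is still well inside the range of a double even though the individual numerators are not.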
Arithmetic with very small numbers in R
The case of very small probabilities comes up often in machine learning and other statistical computing topics. You are getting a precision error because of the limitations of the internal representation of floating point numbers. This can be solved using arbitrary precision arithmetic, but that is not commonly done.
The most popular solution is to represent your probabilities by their logarithms and then use addition instead of multiplication. The resulting quantity is usually called the log-likelihood. This transformation avoids the problem of very small numbers, and in addition, log-likelihood values can be compared directly (a lower log-likelihood always means a lower probability, because log is monotonic).
Note that there is a subtle distinction between likelihood and probability, but the log transformation, which turns very small numbers into moderately sized negative ones, works either way.
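A minimal sketch of the idea, using made-up probabilities (the vector p and the seed are assumptions for illustration only):

```r
set.seed(42)

# Made-up per-observation probabilities, each between 0.01 and 0.1.
p <- runif(2000, 0.01, 0.1)

# The direct product underflows: 2000 factors of at most 0.1 cannot be
# represented as a double.
direct <- prod(p)
direct

# Summing the logs gives a finite log-likelihood instead.
loglik <- sum(log(p))
loglik

# Log-likelihoods can be compared directly: scaling every probability
# down by 10% lowers the total.
loglik_worse <- sum(log(0.9 * p))
loglik_worse < loglik
```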
Reading very small numbers in R
The standard numeric data type in R (8-byte double precision) does not support such small numbers. The smallest positive normalized number is about 2.2e-308:
.Machine$double.xmin
# [1] 2.225074e-308
Can you convince whatever program generates your input data to save it in, say, logarithms?
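If the input cannot be changed, one possible workaround is to read the values as text and convert them to logarithms yourself. The sketch below assumes every value is a string of the form <mantissa>e<exponent> with a positive mantissa. The sample strings and the helper name log10_from_text are hypothetical.

```r
# Hypothetical input: values read as character, e.g. via
# read.table(..., colClasses = "character").
txt <- c("3.2e-400", "1.5e-350", "7e-10")

# Converting directly is expected to lose the first two values to underflow.
as.numeric(txt)

# Hypothetical helper: split mantissa and exponent, return log10 of the value.
log10_from_text <- function(s) {
  parts <- strsplit(tolower(s), "e", fixed = TRUE)
  mant <- as.numeric(vapply(parts, `[`, character(1), 1))
  expo <- as.numeric(vapply(parts, `[`, character(1), 2))
  log10(mant) + expo
}

lg <- log10_from_text(txt)
lg
```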
Presenting very small numbers in R effectively
To use scientific notation in xtable, you need to pass negative numbers in the digits option.
Part of the documentation for the digits option of xtable:
If values of digits are negative, the corresponding values of x are
displayed in scientific format with abs(digits) digits.
Example:
xtable::xtable(data.frame(coeffs, pv), digits = -2)
\begin{table}[ht]
\centering
\begin{tabular}{rrr}
\hline
& coeffs & pv \\
\hline
1 & -8.13E-02 & 1.96E-02 \\
2 & -9.65E-03 & 0.00E+00 \\
3 & -1.09E+00 & 0.00E+00 \\
4 & 1.15E-02 & 7.96E-01 \\
5 & -1.83E-03 & 7.44E-01 \\
6 & -1.95E-01 & 6.70E-06 \\
7 & -3.71E-02 & 6.26E-01 \\
8 & 6.44E-02 & 1.01E-01 \\
9 & -1.11E-01 & 4.20E-05 \\
10 & -4.34E-03 & 0.00E+00 \\
11 & 9.26E-03 & 1.06E-06 \\
12 & -1.36E-04 & 3.66E-01 \\
13 & -1.32E-04 & 4.47E-01 \\
14 & -1.37E-04 & 3.75E-01 \\
15 & -5.02E-04 & 1.79E-03 \\
\hline
\end{tabular}
\end{table}
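Outside xtable, the same kind of scientific formatting can be sketched in base R. The p-values below are made up, and format.pval is shown only as one possible way to print values below a threshold as "<eps" instead of as zero.

```r
# Made-up p-values, including one that is extremely small.
pv <- c(1.96e-02, 6.70e-06, 3e-20)

# Fixed scientific notation with two digits after the decimal point.
sci <- formatC(pv, format = "e", digits = 2)
sci

# Values below eps are reported as "<eps" rather than as 0.
shown <- format.pval(pv, eps = 1e-16)
shown
```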
Adding and printing very small numbers in R
The Rmpfr package can provide more precision. In the example below I use 100 bits of precision (precBits is given in bits, not decimal digits).
library(Rmpfr)
d <- 100
zero <- mpfr(0, precBits = d)
one <- mpfr(1, precBits = d)
powers <- mpfr(2^(1:60), precBits = d)
result <- vector("list", length = length(powers))
temp2 <- zero
for(i in seq_along(powers)) {
temp2 = temp2 + one/powers[i]
iszero <- mpfrIs0(one/powers[i])
result[[i]] <- list(Exp = i, InvIsZero = iszero, Sum = temp2)
}
In the question the OP says that
past the 54th iteration, it does reach one, but it's apparently not
adding things any more (even increasing the %.40f parameter).
Apparently this has to do with sprintf, not with the results.
With the print.mpfr method:
for(i in 52:60){
print(result[[i]][[3]], digits = 60)
flush.console()
}
#1 'mpfr' number of precision 100 bits
#[1] 0.9999999999999997779553950749686919152736663818359375
#1 'mpfr' number of precision 100 bits
#[1] 0.99999999999999988897769753748434595763683319091796875
#1 'mpfr' number of precision 100 bits
#[1] 0.999999999999999944488848768742172978818416595458984375
#1 'mpfr' number of precision 100 bits
#[1] 0.9999999999999999722444243843710864894092082977294921875
#1 'mpfr' number of precision 100 bits
#[1] 0.99999999999999998612221219218554324470460414886474609375
#1 'mpfr' number of precision 100 bits
#[1] 0.999999999999999993061106096092771622352302074432373046875
#1 'mpfr' number of precision 100 bits
#[1] 0.9999999999999999965305530480463858111761510372161865234375
#1 'mpfr' number of precision 100 bits
#[1] 0.99999999999999999826527652402319290558807551860809326171875
#1 'mpfr' number of precision 100 bits
#[1] 0.999999999999999999132638262011596452794037759304046630859375
With sprintf:
for(i in 52:60){
print(sprintf("%d - %.60f", i, result[[i]][[3]]))
flush.console()
}
#[1] "52 - 0.999999999999999777955395074968691915273666381835937500000000"
#[1] "53 - 0.999999999999999888977697537484345957636833190917968750000000"
#[1] "54 - 1.000000000000000000000000000000000000000000000000000000000000"
#[1] "55 - 1.000000000000000000000000000000000000000000000000000000000000"
#[1] "56 - 1.000000000000000000000000000000000000000000000000000000000000"
#[1] "57 - 1.000000000000000000000000000000000000000000000000000000000000"
#[1] "58 - 1.000000000000000000000000000000000000000000000000000000000000"
#[1] "59 - 1.000000000000000000000000000000000000000000000000000000000000"
#[1] "60 - 1.000000000000000000000000000000000000000000000000000000000000"
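The sprintf output is consistent with the values being converted to ordinary doubles before formatting (an assumption about what happens internally, not something verified here). A plain-double sketch of the same sum shows where a 53-bit mantissa runs out:

```r
# Accumulate 1/2 + 1/4 + ... in ordinary double precision.
s <- 0
below_one <- logical(60)
for (i in 1:60) {
  s <- s + 1 / 2^i
  below_one[i] <- s < 1   # can a double still tell the sum apart from 1?
}

# Last iteration at which the running sum is still strictly below 1.
last_below <- max(which(below_one))
last_below
```

In this sketch the running sum stops being distinguishable from 1 after the 53rd term, which lines up with the sprintf lines above, while the mpfr values keep changing through iteration 60.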