How to Read Large Numbers Precisely in R and Perform Arithmetic on Them

How do I read large numbers precisely in R and perform arithmetic on them?

That's not large for R; it is merely a representation (printing) problem. Try this:

options(digits=22)

options('digits') defaults to 7, which is why you are seeing what you do. All of the digits are being read and stored, just not printed by default.
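
For instance, a quick illustration (the 12-digit value here is made up):

> x <- 123456789012     # a 12-digit number
> x                     # default options('digits') is 7
[1] 1.234568e+11
> options(digits = 22)
> x
[1] 123456789012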

Preserving large numbers

It's not stored in a "1.67E+12 format"; it just won't print in full under the default options. R is reading it in just fine and the whole number is there.

> x <- 1665535004661
> x
[1] 1.665535e+12
> print(x, digits = 16)
[1] 1665535004661

See, the numbers were there all along. They don't get lost unless you exceed the roughly 15 significant digits a double can hold. Sorting on what you brought in will work fine, and you can call print() explicitly with the digits argument to see your data.frame in full, instead of printing it implicitly by typing its name, as sketched below.
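
For example, a minimal sketch with a made-up data.frame (exact column spacing may differ slightly):

> df <- data.frame(id = c(1665535004661, 1665535004662))
> df                        # implicit print uses options('digits')
            id
1 1.665535e+12
2 1.665535e+12
> print(df, digits = 16)    # explicit print shows everything
             id
1 1665535004661
2 1665535004662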

Dealing with large numbers in R [Inf] and Python

In answer to your questions:

a) They use different representations for numbers. Most numbers in R are represented as double precision floating point values. These are all 64 bits long and give about 15 digits of precision throughout the range, which runs from -.Machine$double.xmax to .Machine$double.xmax; beyond that, values switch to signed infinities (-Inf and Inf). R also uses 32 bit integer values sometimes; these cover a range of roughly +/- 2 billion. R chooses these types because it is geared towards statistical and numerical methods, which rarely need more precision than double precision gives. (They often need a bigger range, but usually taking logs solves that problem.)
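
You can inspect these limits directly; a quick illustration:

> .Machine$double.xmax       # largest finite double
[1] 1.797693e+308
> .Machine$double.xmax * 2   # beyond the range: signed infinity
[1] Inf
> .Machine$integer.max       # 32-bit integer limit, about 2 billion
[1] 2147483647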

Python is more of a general-purpose platform, and it has the types discussed in MichaelChirico's comment; Python's built-in integers, for instance, have arbitrary precision.

b) Besides Brobdingnag, the gmp package can handle arbitrarily large integers. For example,

> as.bigz(2)^1500
Big Integer ('bigz') :
[1] 35074662110434038747627587960280857993524015880330828824075798024790963850563322203657080886584969261653150406795437517399294548941469959754171038918004700847889956485329097264486802711583462946536682184340138629451355458264946342525383619389314960644665052551751442335509249173361130355796109709885580674313954210217657847432626760733004753275317192133674703563372783297041993227052663333668509952000175053355529058880434182538386715523683713208549376
> nchar(as.character(as.bigz(2)^1500))
[1] 452

I imagine the as.character() call would also be needed with Brobdingnag.
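
Ordinary arithmetic works on bigz values too; as a small sketch (assuming gmp is installed):

> library(gmp)
> as.bigz("123456789012345678901234567890") + 1   # construct from a string to stay exact
Big Integer ('bigz') :
[1] 123456789012345678901234567891
> mod.bigz(as.bigz(2)^1500, 17)                   # exact modular arithmetic
Big Integer ('bigz') :
[1] 16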

How do I read csv with large numbers (probably scientific notation) in R?

The numbers are probably all there; by default R just doesn't always show them in full. See:

> a <- 1329100082670
> a
[1] 1.3291e+12
> dput(a)
1329100082670

dput() shows that all the digits are retained even when they display on screen in scientific notation. You can discourage R from using scientific notation by setting the scipen option. Something like

options(scipen=999)

will turn off most scientific notation, as sketched below. But really, how the number happens to display on screen shouldn't matter much; the stored value is the same either way.
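
Continuing the example above:

> options(scipen = 999)
> a
[1] 1329100082670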

Complete loss of accuracy in modulus when calculating with very large numbers

Your numbers are too large to be handled exactly by R's numeric data type. See this answer, which explains how to see the minimum and maximum values that can be represented in the numeric type. A double carries a 53-bit mantissa, so whole numbers are guaranteed exact only up to 2^53 (about 9e15), and your number 1e20 is far beyond that.
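
A quick demonstration of where exactness ends:

> print(2^53, digits = 17)   # largest integer with guaranteed exactness
[1] 9007199254740992
> 2^53 == 2^53 + 1           # TRUE: the +1 is lost to rounding
[1] TRUE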

Also see CRAN's R FAQ (section 7.31), which explains a little more about floating-point representation in R.

As for how to handle large numbers in R, have a look at this blog post that describes the "gmp" package that may be helpful to you.
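
For example, a small sketch of an exact modulus with gmp (assuming the package is installed):

> library(gmp)
> mod.bigz(as.bigz(10)^20, 3)   # 1e20 %% 3, computed exactly
Big Integer ('bigz') :
[1] 1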

long/bigint/decimal equivalent datatype in R

See help(integer):

    Note that on almost all implementations of R the range of
    representable integers is restricted to about +/-2*10^9:
    ‘double’s can hold much larger integers exactly.

so I would recommend using numeric (i.e. 'double') -- a double-precision number.

Updated in 2022: This issue still stands and is unlikely to ever change: integer in R is a (signed) int32_t (and hence range limited), while double in R is a proper double. Package int64 aimed to overcome this by using S4 and a complex (integer) type to give us 64-bit resolution (as in int64_t). Package bit64 does the same by using a double internally, and many packages from data.table to database interfaces or JSON parsers (including our RcppSimdJson) use it. Our package nanotime relies on it to provide int64_t-based timestamps (i.e. nanoseconds since the epoch). In short, there is no other way. Some JSON packages stick with a string representation too ("expensive", as you need to convert later).
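
As a minimal sketch of bit64 in action (the value is made up; it sits just above 2^53, where a double would start rounding):

> library(bit64)
> x <- as.integer64("9007199254740993")   # parse from a string so no double rounding occurs
> x + 1L
integer64
[1] 9007199254740994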


