Weird Error in R When Importing (64-Bit) Integer with Many Digits

As others have noted, R's integer type can't represent integers that large. But R isn't reading those values into integers; it's reading them into double-precision numerics.

Double precision can represent only about 16 significant digits accurately, which is why you see your numbers rounded after 16 places. See the gmp, Rmpfr, and int64 packages for potential solutions. I don't see a function in any of them for reading from a file, but you could probably cook something up by looking at their sources.
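To make the precision limit concrete, here is a minimal base-R sketch. It relies only on the fact that an IEEE double has a 53-bit significand; the variable name exact_limit is just for illustration.

```r
# Doubles have a 53-bit significand, so whole numbers are exact only up to 2^53.
exact_limit <- 2^53

# Print the limit in full rather than in scientific notation.
print(format(exact_limit, scientific = FALSE))  # "9007199254740992" (16 digits)

# Past that point, neighbouring whole numbers collapse onto the same double.
print(exact_limit + 1 == exact_limit)           # TRUE
```

Any whole number with more than 16 digits is beyond this limit, so it is stored as the nearest representable double rather than exactly.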

UPDATE:
Here's how you can get your file into an int64 object:

library(int64)  # provides as.int64()
# This assumes your numbers are the only column in the file.
# Read them in however you like; just ensure they're read in as character.
a <- scan("temp.csv", what = "")
ia <- as.int64(a)
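The key step above is reading the values as character so that no digits are lost before conversion. A small base-R sketch of that step follows; the temporary file and the sample values are made up for illustration and stand in for temp.csv.

```r
# Hypothetical sample data standing in for temp.csv (one number per line).
path <- tempfile(fileext = ".csv")
ids <- c("1504615865460506123", "9007199254740993", "42")
writeLines(ids, path)

# what = "" tells scan() to return character strings, so every digit survives.
as_text <- scan(path, what = "", quiet = TRUE)
print(identical(as_text, ids))  # TRUE

# By default scan() reads doubles, which cannot hold the 19-digit value exactly.
as_num <- scan(path, quiet = TRUE)
print(format(as_num, scientific = FALSE))
```

The character vector can then be handed to whichever big-integer package you choose.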

R is turning large numbers to negative random numbers

For example, a value that should be 1504615865460506 comes out as -1372641510.

Looks like an overflow error.

From help(integer) in R:

Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9: ‘double’s can hold much larger integers exactly.

So, you'll need to use a larger type such as double for the values in question.
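The numbers in the question are consistent with a 32-bit wraparound. The sketch below checks the arithmetic in base R using doubles, which hold both values exactly; it does not show where in the original pipeline the truncation happened, which the question does not say.

```r
# The intended value fits in a double exactly (it is below 2^53).
expected <- 1504615865460506

# Keep only the low 32 bits, as a 32-bit integer type would.
low32 <- expected %% 2^32

# Reinterpret those 32 bits as a signed (two's complement) value.
wrapped <- if (low32 >= 2^31) low32 - 2^32 else low32
print(wrapped)                # -1372641510

# For comparison, the largest value R's integer type can hold.
print(.Machine$integer.max)   # 2147483647
```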

r - Error: Text after processing all cols in fread (data.table)

Actually, there is a difference between the two files that you provided, and I think this is the cause of the different outputs from fread.

The first file has an end of line after the 3rd column, except on line 258088, where there is a tab, a 4th column, and then the end of the line. (You can use your editor's 'show all characters' option to confirm that.)

The second file, on the other hand, has an extra tab in every row, i.e. a new empty column.
So in the first case fread expects 3 columns and then encounters a 4th. In the second file, by contrast, fread expects 4 columns from the start.
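One way to confirm this kind of difference without opening the file in an editor is to count the fields on each line with count.fields() from base R. The sketch below uses a small made-up tab-separated file, not the files from the question.

```r
# Hypothetical tab-separated file: every line has 3 fields except the third.
path <- tempfile(fileext = ".tsv")
writeLines(c("a\tb\tc",
             "d\te\tf",
             "g\th\ti\tx",
             "j\tk\tl"), path)

# count.fields() returns the number of fields found on each line.
n_fields <- count.fields(path, sep = "\t")
print(n_fields)

# Line numbers whose field count differs from the first line's count.
odd_lines <- which(n_fields != n_fields[1])
print(odd_lines)  # 3
```

Running the same check on a real file shows whether the extra column appears on a single line or on every line.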

I checked read.table with fill=TRUE and it worked with both files. So I think fread's fill option works differently.

I would expect that, with fill=TRUE, all the lines would be used to infer the number of columns (at a cost in computation time).

In the comments there are some nice workarounds you can use.

R unexpected rounding by writing in .csv and in a database

R uses 64-bit IEEE double precision as its base numeric format. This gives a precision of about 15-16 significant figures (not decimal places). So R is just writing out the numbers to the limit of its accuracy.
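A small base-R sketch of what this looks like in practice: write.csv() writes numeric columns with 15 significant digits, while sprintf() can show the extra digits the stored double carries. The file and column names here are only for illustration.

```r
# Write a single double to a temporary CSV file.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1/3), path, row.names = FALSE)

# The second line of the file holds the value as R wrote it.
written <- readLines(path)[2]
print(written)               # "0.333333333333333" (15 significant digits)

# Asking for 17 significant digits shows what the double actually stores.
print(sprintf("%.17g", 1/3)) # "0.33333333333333331"
```

Digits beyond roughly the 16th are a by-product of the binary representation and carry no real information.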

If you need more digits than that, you can use a package for arbitrary-precision arithmetic:

http://cran.r-project.org/web/packages/Rmpfr/index.html

http://cran.r-project.org/web/packages/gmp/index.html


