Weird error in R when importing (64-bit) integer with many digits
As others have noted, you can't represent integers that large. But R isn't reading those values as integers; it's reading them as double-precision numerics.
Double precision can only represent numbers to ~16 places accurately, which is why you see your numbers rounded after 16 places. See the gmp, Rmpfr, and int64 packages for potential solutions. Though I don't see a function to read from a file in any of them, maybe you could cook something up by looking at their sources.
UPDATE:
Here's how you can get your file into an int64 object:
# This assumes your numbers are the only column in the file.
# Read them in however you like, just ensure they're read in as character.
library(int64)  # provides as.int64()
a <- scan("temp.csv", what = "")
ia <- as.int64(a)
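The int64 package has since been archived on CRAN; a similar sketch using the bit64 package (assuming it is installed) would look like this. The file written here is a made-up stand-in for the `temp.csv` above:

```r
# Sketch using the bit64 package instead of int64 (assumes bit64 is installed).
library(bit64)

# Illustration file standing in for the one-column temp.csv above
f <- tempfile(fileext = ".csv")
writeLines(c("1504615865460506", "9007199254740993"), f)

a <- scan(f, what = "")   # read as character, as in the answer above
ia <- as.integer64(a)     # exact 64-bit integers, no rounding past 16 digits
print(ia)
```

`as.integer64()` parses the character strings directly, so no precision is lost to an intermediate double.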
R is turning large numbers to negative random numbers
The value should be 1504615865460506 but comes out as -1372641510, for example.
Looks like an overflow error.
From help(integer) in R:
Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9: ‘double’s can hold much larger integers exactly.
So you'll need to use a larger type, such as double, for the values in question.
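A quick base-R illustration of that limit (nothing package-specific here):

```r
# The largest representable 32-bit integer in R
print(.Machine$integer.max)    # 2147483647

# Doubles hold integers exactly up to 2^53, so this value is safe as a double
x <- 1504615865460506
print(x < 2^53)                # TRUE

# Coercing a value beyond the 32-bit integer range yields NA (with a warning)
print(suppressWarnings(as.integer(2^31)))  # NA
```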
r - Error: Text after processing all cols in fread (data.table)
Actually there is a difference between the two files that you provide, and I think this is the cause of the different outputs of the fread.
The first file has an end-of-line after the 3rd column, except on line 258088, where there is a tab, a 4th column, and then the end of the line. (You can use your editor's 'show all characters' option to confirm that.)
On the other hand, the second file has an extra tab in every row, i.e. a new empty column.
So in the first case fread expects 3 columns and then encounters a 4th; in the second file, fread expects 4 columns from the start.
I checked read.table with fill=TRUE and it worked with both files, so I think the fill option of fread behaves differently.
With fill=TRUE, I would expect all the lines to be used to infer the number of columns (at some cost in computation time).
In the comments there are some nice workarounds you can use.
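A minimal sketch of the two file shapes described above, using base read.table with fill=TRUE (the data and file contents here are made up to mimic the structure, not taken from the original files):

```r
f1 <- tempfile()
f2 <- tempfile()

# First file: 3 columns, plus one stray row carrying a tab and a 4th field
writeLines(c("a\tb\tc", "a\tb\tc\td"), f1)

# Second file: every row ends with a trailing tab, i.e. an empty 4th column
writeLines(c("a\tb\tc\t", "a\tb\tc\t"), f2)

# read.table with fill=TRUE pads short rows and parses both files
d1 <- read.table(f1, sep = "\t", fill = TRUE)
d2 <- read.table(f2, sep = "\t", fill = TRUE)
print(ncol(d1))  # 4
print(ncol(d2))  # 4
```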
R unexpected rounding by writing in .csv and in a database
R uses 64-bit IEEE double precision as its base numeric format. This has a precision limit of about 15-16 significant figures (not decimal places). So R is just writing out the numbers to the limit of its accuracy.
If you want more decimals, you can use a package for arbitrary-precision arithmetic:
http://cran.r-project.org/web/packages/Rmpfr/index.html
http://cran.r-project.org/web/packages/gmp/index.html
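A short base-R illustration of that precision limit:

```r
# Above 2^53, consecutive integers collapse to the same double,
# so digits past roughly the 16th significant figure are lost.
print(2^53 == 2^53 + 1)       # TRUE
print(sprintf("%.0f", 2^53))  # "9007199254740992"
```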