data.table is not handling integer64 in by statement
Update: This is now implemented in v1.9.3 (available from R-Forge); see NEWS:
o bit64::integer64 now works in grouping and joins, #5369. Thanks to James Sams for highlighting UPCs and Clayton Stanley.
Reminder: fread() has been able to detect and read integer64 for a while.
On OP's example above:
test[, .N, by=ID]
# ID N
# 1: 432706205348805058 2
# 2: 432706205348805059 1
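The grouping above can be reproduced with a minimal sketch, assuming data.table v1.9.3 or later and bit64 are installed. The original `test` table is not shown here, so the one below is a reconstruction for illustration only, and the `val` column is hypothetical. The IDs are passed as character strings so that no precision is lost before conversion.

```r
library(data.table)
library(bit64)

# Reconstructed example table (assumption: the original has an integer64 ID column).
test <- data.table(
  ID  = as.integer64(c("432706205348805058",
                       "432706205348805058",
                       "432706205348805059")),
  val = 1:3   # hypothetical value column
)

# Count rows per integer64 ID; groups appear in order of first occurrence.
res <- test[, .N, by = ID]
print(res)
```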
integer64 isn't yet implemented for data.table operations such as setkey or by. It was implemented in fread only (first released to CRAN on 6 March 2013) as a first step. It could be useful as a value column, for example.
I may have confused matters by filing a bug report relating to this (the one @Arun linked to). Strictly speaking, it isn't a bug but a feature request. I think of the bug list more like 'important things to resolve before the next release'.
Contributions are very welcome.
datatable.integer64 argument is not working for me; should it?
This is implemented in v1.8.11, on R-Forge but not yet on CRAN. From NEWS :
o fread's integer64 argument implemented. Allows reading of integer64 data as 'double' or 'character'
instead of bit64::integer64 (which remains the default as before). Thanks to Chris Neff for the
suggestion. The default can be changed globally; e.g, options(datatable.integer64="character")
Regarding :
If colClasses is the answer, I think it does not allow to specify a single column name or index and the table I load has tens of columns so unpracticable...
colClasses in fread does let you override the type for one or a few columns (by name or by number), and the rest will be automatically detected, for exactly the reason you state. If it doesn't, please report it as a bug. An alternative to colClasses is the datatable.integer64 global option, which tells fread to load any column it detects as integer64 as character or double instead (also in v1.8.11).
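The three routes described above (the integer64 argument, a colClasses override for a single column, and the global option) might look like the following sketch. It assumes a data.table version recent enough for fread to accept the `text=` argument; the two-column input and the column names `id` and `x` are made up for illustration.

```r
library(data.table)

# Hypothetical input; `id` is too large for a 32-bit integer.
txt <- "id,x\n432706205348805058,1\n432706205348805059,2\n"

# Per call: load anything detected as integer64 as character instead.
dt_arg <- fread(text = txt, integer64 = "character")

# Per column: override `id` by name; the remaining columns are auto-detected.
dt_col <- fread(text = txt, colClasses = c(id = "character"))

# Globally: change the default, as in the NEWS item quoted above,
# then restore the previous setting.
old <- options(datatable.integer64 = "character")
dt_opt <- fread(text = txt)
options(old)

str(dt_arg)
```

Reading the column as character keeps every digit; it can be converted later with bit64::as.integer64() if arithmetic is needed.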
Dealing with large integers in R
You are passing a floating point number to as.integer64. The loss of precision is already in your input to as.integer64:
is.double(18495608239531729)
#[1] TRUE
sprintf("%20.5f", 18495608239531729)
#[1] "18495608239531728.00000"
Pass a character string to avoid that:
library(bit64)
as.integer64("18495608239531729")
#integer64
#[1] 18495608239531729
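The same effect can be seen right at the limit of double precision. This is a small sketch, assuming bit64 is installed; the values 2^53 and 2^53 + 1 are chosen for illustration and are not from the original question.

```r
library(bit64)

# Doubles have a 53-bit significand, so 2^53 + 1 cannot be represented
# and is rounded back to 2^53.
lost <- (2^53 + 1) == 2^53
print(lost)

# Parsed from character strings, both values are exact 64-bit integers.
a <- as.integer64("9007199254740993")   # 2^53 + 1
b <- as.integer64("9007199254740992")   # 2^53
print(a - b)
```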
fread() fails with missing values in integer64 columns
This apparently is an issue with the bit64 package, not fread() or data.table. From the bit64 documentation (http://cran.r-project.org/web/packages/bit64/bit64.pdf):
"Subscripting non-existing elements and subscripting with NAs is currently not supported. Such subscripting currently returns 9218868437227407266 instead of NA (the NA value of the underlying double code). Following the full R behaviour here would either destroy performance or require extensive C-coding."
I tried reassigning the 9218868437227407266 value to NA, thinking it would work. For example:
DT[V8==9218868437227407266, ]
#actually returns nothing, but
DT[V8==max(V8), ]
#returns the rows with 9218868437227407266 in V8
#but this does not reassign the value
DT[V8==max(V8), V8:=NA]
#not that this makes sense, but I tried just in case...
DT[V8==max(V8), V8:=NA_character_]
So, as the documentation pretty clearly states, a vector of class integer64 won't recognize NA or missing values. I'm going to avoid bit64 just so I don't have to deal with this...