R data.table breaks in exported functions
As @GSee pointed out in the comments, this still seems to be the same known issue.
To find out whether a package is data.table-aware, data.table calls the function cedta(), which is:
> data.table:::cedta
function (n = 2L)
{
    te = topenv(parent.frame(n))
    if (!isNamespace(te))
        return(TRUE)
    nsname = getNamespaceName(te)
    ans = nsname == "data.table" || "data.table" %chin% names(getNamespaceImports(te)) ||
        "data.table" %chin% tryCatch(get(".Depends", paste("package",
            nsname, sep = ":"), inherits = FALSE), error = function(e) NULL) ||
        (nsname == "utils" && exists("debugger.look", parent.frame(n +
            1L))) || nsname %chin% cedta.override || identical(TRUE,
        tryCatch(get(".datatable.aware", asNamespace(nsname),
            inherits = FALSE), error = function(e) NULL))
    if (!ans && getOption("datatable.verbose"))
        cat("cedta decided '", nsname, "' wasn't data.table aware\n",
            sep = "")
    ans
}
<bytecode: 0x7ff67b9ca190>
<environment: namespace:data.table>
The relevant check here is:
"data.table" %chin% get(".Depends", paste("package", nsname, sep=":"), inherits=FALSE)
When a package depends on data.table, the above command should return TRUE, at least if you installed the package via R CMD INSTALL and then loaded it. This is because, when you attach the package, R by default also creates a ".Depends" variable in the attached package environment (package:test in this example). You can see it with:
ls("package:test", all=TRUE)
# [1] ".Depends" "foo"
However, when you do devtools:::load_all(), this variable doesn't seem to be set.
# new session + set path to package's dir
devtools:::load_all()
ls("package:test", all=TRUE)
# [1] "foo"
So cedta() doesn't get to know that this package does indeed depend on data.table. However, when you manually set .datatable.aware=TRUE, the line:
identical(TRUE, get(".datatable.aware", asNamespace(nsname), inherits = FALSE))
gets executed, which returns TRUE and therefore works around the issue. But the fact remains that devtools doesn't place the .Depends variable in the package environment.
All in all, this is really not an issue with data.table.
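The effect of the .datatable.aware clause can be sketched in base R alone. The helper is_marked_aware() below is hypothetical (it is not part of data.table) and mirrors only the last clause of cedta(); plain environments stand in for package namespaces.

```r
# Hypothetical helper mirroring only the last clause of cedta():
# TRUE if and only if `env` itself defines .datatable.aware as TRUE.
is_marked_aware <- function(env) {
  identical(TRUE, tryCatch(
    get(".datatable.aware", envir = env, inherits = FALSE),
    error = function(e) NULL
  ))
}

# Plain environments stand in for package namespaces here.
ns_plain  <- new.env(parent = emptyenv())
ns_marked <- new.env(parent = emptyenv())
assign(".datatable.aware", TRUE, envir = ns_marked)

aware_plain  <- is_marked_aware(ns_plain)   # variable missing, get() errors
aware_marked <- is_marked_aware(ns_marked)  # variable found and TRUE
print(c(plain = aware_plain, marked = aware_marked))
```

Because the lookup uses inherits = FALSE, the variable has to live in the namespace itself; a variable of the same name in an enclosing environment would not count.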
Data.table aggregate function doesn't work when build in package
There still seems to be an issue with the devtools package, as you can read here. What gave me a good hint was this earlier Stack Overflow question.
In summary, the approach is as follows:
- Add #' @import data.table to the script file of the R package where the function lies.
- Add an import(data.table) statement to the NAMESPACE file.
- Although I already had Imports: data.table, I additionally added Depends: data.table to the DESCRIPTION file.
- Then I rebuilt and reinstalled the package.
'split' is not an exported object from 'namespace:data.table'
split is a generic function defined in base. Package data.table just adds a new "data.table" method to it.
## S3 method for class 'data.table'
split(x, f, drop = FALSE,
by, sorted = FALSE, keep.by = TRUE, flatten = TRUE,
..., verbose = getOption("datatable.verbose"))
You can go for data.table:::split.data.table.
So this behavior is expected? It seems weird to me. Shouldn't data.table::split point to data.table:::split.data.table?
It might be easier for you to consider print. This is also a generic function. Does every package need to redefine it and make an R session a mess? No. A common practice is to add a new method by defining print.xxx.
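As a minimal sketch of that practice, the following defines a print method for a made-up class. The class name "temperature" and the constructor new_temperature() are hypothetical and used only for illustration.

```r
# Hypothetical class "temperature"; the constructor just tags a numeric vector.
new_temperature <- function(x) structure(x, class = "temperature")

# Adding a method: no new generic is created, print() itself is untouched.
print.temperature <- function(x, ...) {
  cat(format(unclass(x)), "degrees\n")
  invisible(x)
}

t1 <- new_temperature(21.5)
print(t1)  # the base generic dispatches to print.temperature
```

The same pattern applies to split: the generic stays in base, and data.table only contributes split.data.table.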
`data.table::unique` errors: is not an exported object from namespace
The function in question is really unique.data.table, an S3 method defined in the data.table package. That method is not really intended to be called directly, so it isn't exported. This is typically the case with S3 methods. Instead, the package registers the method as an S3 method, which then allows the S3 generic, base::unique in this case, to dispatch on it. So the right way to call the function is:
library(data.table)
irisDT <- data.table(iris)
unique(irisDT)
We use base::unique, which is exported, and it dispatches to data.table:::unique.data.table, which is not exported. The function data.table:::unique does not actually exist (nor does it need to).
As eddi points out, base::unique dispatches based on the class of the object it is called on. So base::unique will call data.table:::unique.data.table only if the object is a data.table. You can force a call to that method directly with something like data.table:::unique.data.table(iris), but internally that will most likely result in the next method getting called unless your object is actually a data.table.
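The same situation can be reproduced with base packages only. The sketch below assumes that stats registers print.htest as an S3 method without exporting it, which is what a typical R installation does; getS3method() retrieves a registered method without the ::: operator.

```r
# Check whether print.htest is in the exported names of stats
# (expected to be FALSE on a typical installation, like unique.data.table).
exported <- "print.htest" %in% getNamespaceExports("stats")

# getS3method() looks the method up in the S3 registry instead.
method <- getS3method("print", "htest")

res <- t.test(c(5.1, 4.9, 5.3, 5.0, 5.2))
class(res)  # the class that print() dispatches on
print(res)  # calls the registered method through the generic
```

Calling the generic and letting it dispatch is normally preferable; fetching the method directly is mostly useful for reading its source.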
Is a copy made when function returns a data.table?
Thanks to Arun for his answer in the comments. I will use the example from his comments to answer the question.
One can check whether copies are being made by using the tracemem function to track an object in R. From the help file of the function, ?tracemem, the description says:
This function marks an object so that a message is printed whenever the internal code copies the object. It is a major cause of hard-to-predict memory use in R.
For example:
# Using a data.frame
df <- data.frame(x=1:5, y=6:10)
tracemem(df)
## [1] "<0x32618220>"
df$y[2L] <- 11L
## tracemem[0x32618220 -> 0x32661a98]:
## tracemem[0x32661a98 -> 0x32661b08]: $<-.data.frame $<-
## tracemem[0x32661b08 -> 0x32661268]: $<-.data.frame $<-
df
## x y
## 1 1 6
## 2 2 11
## 3 3 8
## 4 4 9
## 5 5 10
# Using a data.table
library(data.table)
dt <- data.table(x=1:5, y=6:10)
tracemem(dt)
## [1] "<0x5fdab40>"
set(dt, i=2L, j=2L, value=11L) # No memory output!
address(dt) # Verify the address in memory is the same
## [1] "0x5fdab40"
dt
## x y
## 1: 1 6
## 2: 2 11
## 3: 3 8
## 4: 4 9
## 5: 5 10
It appears that the data.frame object is copied twice when changing one element in the data.frame, while the data.table is modified in place without making copies!
For my question, I can just track the data.table or data.frame object, d, before passing it on to the function, foo, to check if any copies were made.
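A minimal sketch of that check for a data.table, comparing addresses instead of reading tracemem output. The body of foo() here is hypothetical (the original function is not shown); it adds a column by reference with := and returns the table.

```r
library(data.table)

# Hypothetical function: adds a column by reference and returns the table.
foo <- function(d) {
  d[, z := x + y]
  d
}

d <- data.table(x = 1:5, y = 6:10)
addr_before <- address(d)
res <- foo(d)

# Same address before and after means foo() returned the original object.
same_object <- identical(addr_before, address(res))
print(same_object)
```

If foo() instead built a new table, for example with copy(d), the two addresses would differ.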
Import from `.` `data.table` so that lintr recognizes it
What if you try quoting the dot with double quotes?
importFrom data.table "."
I know this is how I've done imports for the magrittr pipe operator.
If that doesn't work, you can always add the . to a globals.R file that defines your global variables using utils::globalVariables():
if(getRversion() >= "2.15.1") utils::globalVariables(c("."))
data.table not returning the correct splinefun by group
As I've answered in https://github.com/Rdatatable/data.table/issues/4298#issuecomment-597737776 , adding copy() on the x and y variables will solve this issue.
The reason is that splinefun() tries to store the values of x and y. However, the internal objects of data.table are always passed by reference (for speed). In this case, you may have to explicitly copy() the variables in order to get the expected answers.
In short, changing
mod_splines <- dt[, .(Spline = list(splinefun(x=x, y=y, method = "natural"))),
by = c("cat")]
to
mod_splines <- dt[, .(Spline = list(splinefun(x=copy(x), y=copy(y), method = "natural"))),
by = c("cat")]
or this (you can ignore this, but it may give you a better understanding)
mod_splines <- dt[, .(Spline = list(splinefun(x=x+0, y=y+0, method = "natural"))),
by = cat]
is enough.
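One way to confirm that the copy() version behaves as expected is to evaluate each stored function at its own knots. The data below are made up for illustration; only the mod_splines call follows the answer.

```r
library(data.table)

# Hypothetical data: two groups with different y values on the same x grid.
dt <- data.table(
  cat = rep(c("a", "b"), each = 4),
  x   = rep(1:4, times = 2),
  y   = c(1, 4, 9, 16, 10, 20, 30, 40)
)

mod_splines <- dt[, .(Spline = list(splinefun(x = copy(x), y = copy(y),
                                              method = "natural"))),
                  by = "cat"]

# A natural spline interpolates its knots, so each function should
# reproduce the y values of its own group.
f_a <- mod_splines$Spline[[which(mod_splines$cat == "a")]]
f_b <- mod_splines$Spline[[which(mod_splines$cat == "b")]]
ok_a <- isTRUE(all.equal(f_a(1:4), c(1, 4, 9, 16)))
ok_b <- isTRUE(all.equal(f_b(1:4), c(10, 20, 30, 40)))
print(c(a = ok_a, b = ok_b))
```

Running the same check without copy() is a quick way to see whether a given data.table version is affected.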
Why is this code working in the R console, but not as part of an R package?
The data.table package does some strange non-standard evaluation. It tries to figure out whether your package wants to support that or not, and in your case, decided "not".
I think this is documented behaviour, but I'd call it a design flaw, if not a bug.
You can force it to support the NSE by putting
.datatable.aware <- TRUE
somewhere in your package source code.
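As a rough sketch, a file in the package's R/ directory could look like the following. The file name R/zzz.R, the function count_by_group() and the column grp are hypothetical; run as a plain script, the code works either way, because the check only applies to code living in a package namespace.

```r
library(data.table)

# In a package, this would sit at top level in any file under R/
# (for example a hypothetical R/zzz.R).
.datatable.aware <- TRUE

# Hypothetical package function that relies on data.table's `[` syntax.
count_by_group <- function(d) {
  d[, .N, by = grp]
}

counts <- count_by_group(data.table(grp = c("a", "a", "b")))
print(counts)
```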