R Data.Table Breaks in Exported Functions

As @GSee pointed out in the comments, this still appears to be the same known issue.

In order to find out if a package is data.table aware, data.table calls the function cedta(), which is:

> data.table:::cedta
function (n = 2L)
{
    te = topenv(parent.frame(n))
    if (!isNamespace(te))
        return(TRUE)
    nsname = getNamespaceName(te)
    ans = nsname == "data.table" || "data.table" %chin% names(getNamespaceImports(te)) ||
        "data.table" %chin% tryCatch(get(".Depends", paste("package",
            nsname, sep = ":"), inherits = FALSE), error = function(e) NULL) ||
        (nsname == "utils" && exists("debugger.look", parent.frame(n +
            1L))) || nsname %chin% cedta.override || identical(TRUE,
        tryCatch(get(".datatable.aware", asNamespace(nsname),
            inherits = FALSE), error = function(e) NULL))
    if (!ans && getOption("datatable.verbose"))
        cat("cedta decided '", nsname, "' wasn't data.table aware\n",
            sep = "")
    ans
}
<bytecode: 0x7ff67b9ca190>
<environment: namespace:data.table>

The relevant check here is:

"data.table" %chin% get(".Depends", paste("package", nsname, sep=":"), inherits=FALSE)

When a package depends on data.table, the above check should return TRUE, provided you installed the package via R CMD INSTALL and then loaded it. This is because, when you load the package with library(), R by default also creates a .Depends variable in the attached package environment. You can see it by listing that environment:

ls("package:test", all=TRUE)
# [1] ".Depends" "foo"

However, when you do devtools:::load_all(), this variable doesn't seem to be set.

# new session + set path to package's dir
devtools:::load_all()
ls("package:test", all=TRUE)
# [1] "foo"

So, cedta() doesn't get to know that this package indeed depends on data.table. However, when you manually set .datatable.aware=TRUE, the line:

identical(TRUE, get(".datatable.aware", asNamespace(nsname), inherits = FALSE))

is evaluated and returns TRUE, which works around the problem. The underlying cause remains, though: devtools does not create the .Depends variable in the package environment.

All in all, this is really not an issue with data.table.
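To see which of these conditions hold for a given package, the three package-related checks in cedta() can be repeated by hand. The helper below is only a sketch: cedta_report() is a hypothetical name, not part of data.table, and it covers just the imports, .Depends and .datatable.aware checks shown above.

```r
# Hypothetical helper (not part of data.table): repeats three of the checks
# that cedta() performs, for a package that is already loaded.
cedta_report <- function(pkg) {
  ns <- asNamespace(pkg)
  pkg_env <- paste("package", pkg, sep = ":")
  # .Depends lives in the attached package environment, if it exists at all
  depends <- tryCatch(get(".Depends", pkg_env, inherits = FALSE),
                      error = function(e) NULL)
  c(
    imports_data_table   = "data.table" %in% names(getNamespaceImports(ns)),
    depends_data_table   = "data.table" %in% depends,
    datatable_aware_flag = isTRUE(get0(".datatable.aware", envir = ns,
                                       inherits = FALSE))
  )
}

# Example with a base package, which has nothing to do with data.table
report <- cedta_report("stats")
print(report)
```

If all three entries are FALSE for your own package after devtools::load_all(), that would be consistent with the explanation above. Setting options(datatable.verbose = TRUE) also makes cedta() print its decision, as the source shows.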

Data.table aggregate function doesn't work when build in package

There still seems to be an issue with the devtools package, as described in the linked report. What gave me a good hint was an earlier Stack Overflow question.

In summary the approach is as follows:

  1. Add #' @import data.table to the R script in the package that defines the function.
  2. Add an import(data.table) statement to the NAMESPACE file.
  3. Although I already had Imports: data.table, I also added Depends: data.table to the DESCRIPTION file.
  4. Rebuild and reinstall the package.
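As a rough sketch of what step 1 can look like in practice, here is a single source file with the roxygen tag in place. The file name, the function sum_by_group() and the column names are all made up for illustration. The sketch assumes roxygen2 generates the NAMESPACE entry from step 2.

```r
# R/sum_by_group.R in a hypothetical package (file and function names are
# illustrative). Running devtools::document() should turn the @import tag
# into import(data.table) in NAMESPACE. DESCRIPTION still needs data.table
# listed under Imports: (or Depends:).

#' Sum a value column by group
#'
#' @param dt A data.table with columns `group` and `value`.
#' @return A data.table with one row per group.
#' @import data.table
#' @export
sum_by_group <- function(dt) {
  dt[, .(total = sum(value)), by = group]
}

# Quick check from the global environment, which cedta() always treats as
# data.table aware because it is not a namespace
library(data.table)
res <- sum_by_group(data.table(group = c("a", "a", "b"), value = 1:3))
print(res)
```

The last three lines are only there to try the function interactively and would not be part of the package file.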

'split' is not an exported object from 'namespace:data.table'

split is a generic function defined in base. Package data.table just adds a new "data.table" method to it.

## S3 method for class 'data.table'
split(x, f, drop = FALSE,
by, sorted = FALSE, keep.by = TRUE, flatten = TRUE,
..., verbose = getOption("datatable.verbose"))

You can go for data.table:::split.data.table.



So this behavior is expected? It seems weird to me. Shouldn't data.table::split point to data.table:::split.data.table?

It might be easier for you to consider print. This is also a generic function. Does every package need to redefine it and make an R session a mess? No. A common practice is to add a new method by defining print.xxx.
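A minimal base R sketch of that practice follows. The class name "temperature" is invented for the example. Only a method is defined, and the existing generic print() finds it through S3 dispatch.

```r
# Hypothetical class "temperature": add a print method rather than a new print()
print.temperature <- function(x, ...) {
  cat("Temperature:", format(unclass(x)), "degrees C\n")
  invisible(x)
}

reading <- structure(21.5, class = "temperature")

# base::print dispatches to print.temperature because of the class attribute
print(reading)

# Capture what was printed so it can be inspected programmatically
printed <- capture.output(print(reading))
```

data.table does the same thing for split: it supplies split.data.table and leaves the generic in base.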

`data.table::unique` errors: is not an exported object from namespace

The function in question is really unique.data.table, an S3 method defined in the data.table package. That method is not really intended to be called directly, so it isn't exported. This is typically the case with S3 methods. Instead, the package registers the method as an S3 method, which then allows the S3 generic, base::unique in this case, to dispatch on it. So the right way to call the function is:

library(data.table)
irisDT <- data.table(iris)
unique(irisDT)

We use base::unique, which is exported, and it dispatches to data.table:::unique.data.table, which is not exported. The function data.table:::unique does not actually exist (nor does it need to).

As eddi points out, base::unique dispatches based on the class of the object passed to it. So base::unique will call data.table:::unique.data.table only if the object is a data.table. You can force a call to that method directly with something like data.table:::unique.data.table(iris), but internally that will most likely result in the next method getting called unless your object is actually a data.table.
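If you want to inspect the registered method without reaching for ':::', base R's getS3method() looks it up in the S3 registry. The following is a sketch that assumes data.table is installed. The variable names are illustrative.

```r
library(data.table)

irisDT <- data.table(iris)

# Look up the registered method instead of using ':::'
unique_dt <- getS3method("unique", "data.table")

# Calling the generic and calling the registered method should agree
via_generic <- unique(irisDT)
via_method  <- unique_dt(irisDT)
same_result <- isTRUE(all.equal(via_generic, via_method))

# Is a function called 'unique' among data.table's exports? Given the error
# message in the question title, this is expected to be FALSE.
exported <- "unique" %in% getNamespaceExports("data.table")
```

In everyday code the plain unique(irisDT) call is still the one to use. The lookup is mainly useful for reading the method's source or arguments.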

Is a copy made when function returns a data.table?

Thanks to Arun for his answer in the comments. I will be using his example in his comments to answer the question.

One can check if copies are being made by using the tracemem function to track an object in R. From the help file of the function, ?tracemem, the description says:

This function marks an object so that a message is printed whenever the internal code copies the object. It is a major cause of hard-to-predict memory use in R.

For example:

# Using a data.frame
df <- data.frame(x=1:5, y=6:10)
tracemem(df)
## [1] "<0x32618220>"
df$y[2L] <- 11L
## tracemem[0x32618220 -> 0x32661a98]:
## tracemem[0x32661a98 -> 0x32661b08]: $<-.data.frame $<-
## tracemem[0x32661b08 -> 0x32661268]: $<-.data.frame $<-
df
## x y
## 1 1 6
## 2 2 11
## 3 3 8
## 4 4 9
## 5 5 10

# Using a data.table
dt <- data.table(x=1:5, y=6:10)
tracemem(dt)
## [1] "<0x5fdab40>"
set(dt, i=2L, j=2L, value=11L) # No memory output!
address(dt) # Verify the address in memory is the same
## [1] "0x5fdab40"
dt
## x y
## 1: 1 6
## 2: 2 11
## 3: 3 8
## 4: 4 9
## 5: 5 10

The tracemem output shows the data.frame being copied several times when a single element is changed, while the data.table is modified in place without any copies.

For the case in my question, I can call tracemem on the data.table or data.frame object d before passing it to the function foo, and so check whether any copies are made.
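The same check can be sketched with address(), which was used above to verify the in-place update. The body of foo() here is an assumption, since the original function from the question is not shown. It simply modifies its argument with set() and returns it.

```r
library(data.table)

# Hypothetical stand-in for foo(): modifies its argument by reference with
# set(), then returns it
foo <- function(d) {
  set(d, i = 2L, j = "y", value = 11L)
  d
}

dt <- data.table(x = 1:5, y = 6:10)
addr_before <- address(dt)

res <- foo(dt)

# If no copy was made, the returned object sits at the same address
same_object <- identical(addr_before, address(res))
same_object

# The caller's dt reflects the change made inside foo()
dt$y
```

A function that builds a new table inside its body, for example with copy(d), would be expected to return an object at a different address.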

Import `.` from `data.table` so that lintr recognizes it

What if you try quoting the dot with double quotes?
importFrom data.table "."

I know this is how I've done imports for the magrittr pipe operator.

If that doesn't work, you can always add the . to a globals.R file that declares your global variables using utils::globalVariables():

if(getRversion() >= "2.15.1")  utils::globalVariables(c("."))

data.table not returning the correct splinefun by group

As I answered in https://github.com/Rdatatable/data.table/issues/4298#issuecomment-597737776, wrapping the x and y variables in copy() solves this issue.

The reason is that splinefun() stores the values of x and y in the function it returns. data.table, however, always passes its internal objects by reference (for speed), so the stored values can be overwritten afterwards. In this case you have to copy() the variables explicitly to get the expected results.

In short, changing

mod_splines <- dt[, .(Spline = list(splinefun(x=x, y=y, method = "natural"))),
by = c("cat")]

to

mod_splines <- dt[, .(Spline = list(splinefun(x=copy(x), y=copy(y), method = "natural"))),
by = c("cat")]

or this (you can ignore this, but it may give you a better understanding)

mod_splines <- dt[, .(Spline = list(splinefun(x=x+0, y=y+0, method = "natural"))),
by = cat]

is enough.
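To convince yourself that the copy() version behaves as intended, you can evaluate each stored spline at one of its own knots. The data below is made up for illustration, with two groups whose y values are easy to tell apart.

```r
library(data.table)

# Toy data (invented for this check): two groups with clearly different y values
dt <- data.table(
  cat = rep(c("a", "b"), each = 4),
  x   = rep(c(1, 2, 3, 4), times = 2),
  y   = c(1, 2, 3, 4, 10, 20, 30, 40)
)

# copy() gives each splinefun() call its own x and y vectors
mod_splines <- dt[, .(Spline = list(splinefun(x = copy(x), y = copy(y),
                                              method = "natural"))),
                  by = cat]

# Groups are returned in order of first appearance: "a", then "b"
fit_a <- mod_splines$Spline[[1]]
fit_b <- mod_splines$Spline[[2]]

# An interpolating spline passes through its own data points
value_a <- fit_a(2)  # group "a" has the point (2, 2)
value_b <- fit_b(3)  # group "b" has the point (3, 30)
```

If the first spline returned values that belong to the last group, that would be the symptom described in the question.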

Why is this code working in the R console, but not as part of an R package?

The data.table package does some strange non-standard evaluation. It tries to figure out whether your package wants to support that or not, and in your case, decided "not".

I think this is documented behaviour, but I'd call it a design flaw, if not a bug.

You can force it to support the NSE by putting

.datatable.aware <- TRUE

somewhere in your package source code.


