Using Data.Table Package Inside My Own Package

Using data.table package inside my own package

Andrie's guess is right, +1. There is a FAQ on it (see vignette("datatable-faq")), as well as a new vignette on importing data.table:

FAQ 6.9: I have created a package that depends on data.table. How do I
ensure my package is data.table-aware so that inheritance from
data.frame works?

Either i) include data.table in the Depends: field of your DESCRIPTION file, or ii) include data.table in the Imports: field of your DESCRIPTION file AND import(data.table) in your NAMESPACE file.
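As a rough illustration of option ii), the two files might contain the fragments below. Only the data.table lines come from the FAQ; the rest of each file is whatever your package already has.

DESCRIPTION (excerpt):

```
Imports:
    data.table
```

NAMESPACE (excerpt):

```
import(data.table)
```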

Further background ... at the top of [.data.table (and other data.table functions), you'll see a switch depending on the result of a call to cedta(). This stands for Calling Environment Data Table Aware. Typing data.table:::cedta reveals how it's done. It relies on the calling package having a namespace, and, that namespace Import'ing or Depend'ing on data.table. This is how data.table can be passed to non-data.table-aware packages (such as functions in base) and those packages can use absolutely standard [.data.frame syntax on the data.table, blissfully unaware that the data.frame is() a data.table, too.

This is also why data.table inheritance didn't use to be compatible with namespaceless packages, and why, upon user request, we had to ask the authors of such packages to add a namespace for compatibility. Happily, now that R adds a default namespace to packages missing one (from v2.14.0), that problem has gone away:

CHANGES IN R VERSION 2.14.0

* All packages must have a namespace, and one is created on installation if not supplied in the sources.

R: Using data table inside my own package: Error in lapply(.SD, mean) : object '.SD' not found

As there is a reproducible example in your question now, I was able to dig into it.

I downloaded the zip file from your link, unzipped it, and renamed myexample-package to myexample. Then...

R CMD build myexample
R CMD INSTALL myexample_0.0.0.9000.tar.gz
R -q

Then, in R:

mymat <- cbind(matrix(rexp(100), 10), IN=c(rep(1,2), rep(2,3), rep(3,2), rep(4,1), rep(5,2)))
mymat
# [1,] 0.83010264 0.4778802 1.15826121 0.304299143 0.5781483 1.81660550
# [2,] 0.03895798 2.3709480 0.69694839 0.730800823 0.3319984 0.53348461
# [3,] 0.03383199 0.2842029 1.74151827 1.019573035 0.1863635 0.89487309
# [4,] 0.53533254 0.2814782 0.78563371 0.309180422 1.4393098 1.07450638
# [5,] 0.53010142 1.3132409 0.67072292 1.212244007 0.1984360 0.06208641
# [6,] 0.45916016 0.5576434 0.67770401 0.056270575 0.5065829 0.83416626
# [7,] 0.25404953 0.2730706 0.01070633 0.132406274 1.6186573 0.37083771
# [8,] 3.42254715 0.6489950 0.81291881 0.003048744 1.3522848 0.18066361
# [9,] 1.29994319 0.3170614 1.71145805 1.141222719 1.1832478 0.18091907
#[10,] 0.23622615 0.4473482 3.07774816 1.441207092 0.9761152 0.28132707
# IN
# [1,] 6.1868517 2.44880203 0.55261438 0.3459453 1
# [2,] 0.8177218 0.90554629 1.00106158 1.0427756 1
# [3,] 4.3910329 0.56068780 0.24262243 1.7036666 2
# [4,] 0.8712083 0.02439399 0.80927766 1.6596570 2
# [5,] 0.6613734 0.12954737 1.01661648 1.2446795 2
# [6,] 0.2858442 2.32610958 0.26553789 0.4856818 3
# [7,] 3.6628536 0.26447698 0.70633274 2.0283399 3
# [8,] 0.0515666 0.99916985 0.06102821 0.9413485 4
# [9,] 4.7781407 1.47764414 1.92598562 0.4581395 5
#[10,] 0.8770661 2.78552022 0.07543095 0.6886183 5
mynewmat <- myexample::aggregate_mean(mymat, "IN")
mynewmat
# get V1 V2 V3 V4 V5 V6 V7
#1: 1 0.4345303 1.4244141 0.9276048 0.517549983 0.4550734 1.1750451 3.5022868
#2: 2 0.3664220 0.6263073 1.0659583 0.846999155 0.6080364 0.6771553 1.9745382
#3: 3 0.3566048 0.4153570 0.3442052 0.094338425 1.0626201 0.6025020 1.9743489
#4: 4 3.4225471 0.6489950 0.8129188 0.003048744 1.3522848 0.1806636 0.0515666
#5: 5 0.7680847 0.3822048 2.3946031 1.291214905 1.0796815 0.2311231 2.8276034
# V8 V9 V10 IN
#1: 1.6771742 0.77683798 0.6943604 1
#2: 0.2382097 0.68950553 1.5360010 2
#3: 1.2952933 0.48593531 1.2570109 3
#4: 0.9991699 0.06102821 0.9413485 4
#5: 2.1315822 1.00070829 0.5733789 5

So I am not able to reproduce your problem. I encourage you to follow the same steps as described above, to narrow down whether the issue lies in the way you install your package.
If you have follow-up questions, it is best to put them in comments under my answer rather than editing the question.

Hope that helps!

How can I use data.table in a package without importing all functions?

The (documented) solution I found is to set .datatable.aware <- TRUE somewhere in the package source code. According to the documentation, if you're using data.table in a package without importing the whole thing, you should do this so that [.data.table() does not revert to calling [.data.frame(). From the docs:

...please define .datatable.aware = TRUE anywhere in your R source code (no need to export). This tells data.table that you as a package developer have designed your code to intentionally rely on data.table functionality even though it may not be obvious from inspecting your NAMESPACE file.
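A minimal sketch of what that could look like in a package that calls data.table only through `data.table::` rather than importing it wholesale. The helper name count_rows_by and its arguments are made up for illustration; only `.datatable.aware` comes from the documentation quoted above.

```r
# Anywhere in the package's R/ directory (no need to export).
.datatable.aware <- TRUE

# Hypothetical helper: counts rows per group using namespace-qualified calls.
count_rows_by <- function(df, group_col) {
  dt <- data.table::as.data.table(df)
  # With the flag set, `[` keeps data.table semantics inside the package
  # instead of falling back to [.data.frame.
  dt[, list(n_rows = .N), by = group_col]
}
```

For example, `count_rows_by(mtcars, "cyl")` should give one row per distinct value of cyl.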

How to use data.table::setDTthreads() in my own package?

Very good question.
Yes, it will affect all data.table calls in the user's session (including those from other packages), not just those from your package.
The general advice is not to set this value in your package, but to let users know that they can set it themselves. If you do want to set it in your package, you should document it really well.
Note that using 50% versus 100% of the available threads often makes very little difference (it can be less than 5%, or even a slowdown in a shared environment), so I suggest you measure whether it is really worth changing the user's environment for a small benefit.
See these timings, for example:
https://github.com/h2oai/db-benchmark/issues/202

You could also file a feature request for the ability to set the number of threads just for calls from a single package. It is technically possible by checking the top environment of a call.
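If you do decide to change the thread count inside a package function, one way to limit the impact is to restore the previous setting when the function exits. This is only a sketch: with_dt_threads is a hypothetical helper, not part of data.table, and the setting is still global while the code runs. It relies on setDTthreads() returning the old value.

```r
# Hypothetical wrapper: run `code` with a temporary data.table thread count.
with_dt_threads <- function(n_threads, code) {
  # setDTthreads() returns the previous setting, so keep it for later
  old_threads <- data.table::setDTthreads(n_threads)
  # Restore the user's setting even if `code` throws an error
  on.exit(data.table::setDTthreads(old_threads), add = TRUE)
  # `code` is evaluated lazily, i.e. only now, under the temporary setting
  force(code)
}
```

Usage could look like `with_dt_threads(1L, data.table::getDTthreads())`, after which `data.table::getDTthreads()` should report the user's original value again.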

data.table := not working in a package function

Thanks to jangorecki for pointing out the Importing data.table vignette.

The issue was declaring data.table's special symbols in the NAMESPACE.

The Importing data.table vignette does not mention that if you are using roxygen2 to generate the NAMESPACE, then you can't add import(data.table) to the NAMESPACE by hand, because roxygen2 regenerates that file. But as always, the excellent usethis package has it covered with usethis::use_data_table(). This creates all the boilerplate, and it now works :)
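If you would rather write the roxygen2 tags yourself, a selective import might look roughly like the sketch below. This is not the exact output of usethis::use_data_table(); the file name and the particular list of symbols are assumptions, and you would list whichever functions and special symbols your own code uses.

```r
# R/mypkg-package.R (hypothetical file name)

# Import only what the package uses, including the special symbols
# that appear inside `[.data.table` calls.
#' @importFrom data.table data.table as.data.table := .SD .N
NULL
```

After re-running roxygen2, the generated NAMESPACE should contain the corresponding importFrom(data.table, ...) directives.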

Local package dependency to R data.table :=

data.table should be imported in the NAMESPACE file of the package:

import(data.table)

With roxygen2, you can request this import in the function's documentation header, and it will be added to the NAMESPACE automatically:

#' Your function title & description
#'
#' @param data A data.table; it is modified by reference.
#' @import data.table
#'
DTfunction <- function(data) {
  # Add a column that copies the first column of `data`
  data[, newcol := .SD[, 1]]
}

Test after loading the function:

DTfunction(as.data.table(mtcars[,1:2]))

mpg cyl newcol
<num> <num> <num>
1: 21.0 6 21.0
2: 21.0 6 21.0
3: 22.8 4 22.8
4: 21.4 6 21.4
...

