Using data.table package inside my own package
Andrie's guess is right, +1. There is a FAQ on it (see vignette("datatable-faq")), as well as a new vignette on importing data.table:
FAQ 6.9: I have created a package that depends on data.table. How do I ensure my package is data.table-aware so that inheritance from data.frame works?
Either i) include data.table in the Depends: field of your DESCRIPTION file, or ii) include data.table in the Imports: field of your DESCRIPTION file AND import(data.table) in your NAMESPACE file.
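As a minimal sketch of option ii), assuming a hypothetical package called mypkg, the two files would contain something like the following. Only the Imports: entry and the import() directive come from the FAQ; the package name is a placeholder.

```
# --- DESCRIPTION (excerpt) ---
Package: mypkg
Imports:
    data.table

# --- NAMESPACE ---
import(data.table)
```

These are two separate files; the marker lines only label which excerpt belongs where.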
Further background ... at the top of [.data.table (and other data.table functions), you'll see a switch depending on the result of a call to cedta(). This stands for Calling Environment Data Table Aware. Typing data.table:::cedta reveals how it's done. It relies on the calling package having a namespace, and on that namespace Import'ing or Depend'ing on data.table. This is how a data.table can be passed to non-data.table-aware packages (such as functions in base) and those packages can use absolutely standard [.data.frame syntax on the data.table, blissfully unaware that the data.frame is() a data.table, too.
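To see why the switch matters, here is a small sketch run from the global environment, which data.table treats as aware. The object names are illustrative only. The same [ call means different things under data.frame and data.table semantics:

```r
library(data.table)

DT <- data.table(a = 1:3, b = c("x", "y", "z"))
DF <- as.data.frame(DT)

# data.frame semantics: a single index selects a column
col_df <- DF[1]
# data.table semantics: a single index selects a row
row_dt <- DT[1]

dim(col_df)  # 3 rows, 1 column
dim(row_dt)  # 1 row, 2 columns

# A data.table is still a data.frame, which is why unaware code can handle it
is.data.frame(DT)
```

Code in a package that is not data.table-aware gets the first behaviour even when it is handed a data.table.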
This is also why data.table inheritance didn't use to be compatible with namespaceless packages, and why, upon user request, we had to ask authors of such packages to add a namespace to their package to be compatible. Happily, now that R adds a default namespace for packages missing one (from v2.14.0), that problem has gone away:
CHANGES IN R VERSION 2.14.0
* All packages must have a namespace, and one is created on installation if not supplied in the sources.
R: Using data table inside my own package: Error in lapply(.SD, mean) : object '.SD' not found
As there is a reproducible example in your question now, I was able to dig into it.
I downloaded the zip file from your link, unzipped it, and renamed myexample-package to myexample. Then...
R CMD build myexample
R CMD INSTALL myexample_0.0.0.9000.tar.gz
R -q
Then, in R:
mymat <- cbind(matrix(rexp(100), 10), IN=c(rep(1,2), rep(2,3), rep(3,2), rep(4,1), rep(5,2)))
mymat
# [1,] 0.83010264 0.4778802 1.15826121 0.304299143 0.5781483 1.81660550
# [2,] 0.03895798 2.3709480 0.69694839 0.730800823 0.3319984 0.53348461
# [3,] 0.03383199 0.2842029 1.74151827 1.019573035 0.1863635 0.89487309
# [4,] 0.53533254 0.2814782 0.78563371 0.309180422 1.4393098 1.07450638
# [5,] 0.53010142 1.3132409 0.67072292 1.212244007 0.1984360 0.06208641
# [6,] 0.45916016 0.5576434 0.67770401 0.056270575 0.5065829 0.83416626
# [7,] 0.25404953 0.2730706 0.01070633 0.132406274 1.6186573 0.37083771
# [8,] 3.42254715 0.6489950 0.81291881 0.003048744 1.3522848 0.18066361
# [9,] 1.29994319 0.3170614 1.71145805 1.141222719 1.1832478 0.18091907
#[10,] 0.23622615 0.4473482 3.07774816 1.441207092 0.9761152 0.28132707
# IN
# [1,] 6.1868517 2.44880203 0.55261438 0.3459453 1
# [2,] 0.8177218 0.90554629 1.00106158 1.0427756 1
# [3,] 4.3910329 0.56068780 0.24262243 1.7036666 2
# [4,] 0.8712083 0.02439399 0.80927766 1.6596570 2
# [5,] 0.6613734 0.12954737 1.01661648 1.2446795 2
# [6,] 0.2858442 2.32610958 0.26553789 0.4856818 3
# [7,] 3.6628536 0.26447698 0.70633274 2.0283399 3
# [8,] 0.0515666 0.99916985 0.06102821 0.9413485 4
# [9,] 4.7781407 1.47764414 1.92598562 0.4581395 5
#[10,] 0.8770661 2.78552022 0.07543095 0.6886183 5
mynewmat <- myexample::aggregate_mean(mymat, "IN")
mynewmat
# get V1 V2 V3 V4 V5 V6 V7
#1: 1 0.4345303 1.4244141 0.9276048 0.517549983 0.4550734 1.1750451 3.5022868
#2: 2 0.3664220 0.6263073 1.0659583 0.846999155 0.6080364 0.6771553 1.9745382
#3: 3 0.3566048 0.4153570 0.3442052 0.094338425 1.0626201 0.6025020 1.9743489
#4: 4 3.4225471 0.6489950 0.8129188 0.003048744 1.3522848 0.1806636 0.0515666
#5: 5 0.7680847 0.3822048 2.3946031 1.291214905 1.0796815 0.2311231 2.8276034
# V8 V9 V10 IN
#1: 1.6771742 0.77683798 0.6943604 1
#2: 0.2382097 0.68950553 1.5360010 2
#3: 1.2952933 0.48593531 1.2570109 3
#4: 0.9991699 0.06102821 0.9413485 4
#5: 2.1315822 1.00070829 0.5733789 5
So I am not able to reproduce your problem. I encourage you to follow the same steps as described above, to narrow down whether the issue lies in the way you install your package.
If you have follow-up questions, it is best to put them in comments under my answer rather than editing the question.
Hope that helps!
How can I use data.table in a package without importing all functions?
The (documented) solution I found is to set .datatable.aware <- TRUE somewhere in the package source code. According to the documentation, if you're using data.table in a package without importing the whole thing, you should do this so that [.data.table() does not revert to calling [.data.frame(). From the docs:
...please define .datatable.aware = TRUE anywhere in your R source code (no need to export). This tells data.table that you as a package developer have designed your code to intentionally rely on data.table functionality even though it may not be obvious from inspecting your NAMESPACE file.
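A minimal sketch of this approach, assuming data.table is listed under Imports: in DESCRIPTION but nothing is imported in NAMESPACE. The helper mean_by_group() and its arguments are hypothetical, made up for illustration:

```r
# Anywhere in the package's R/ directory; no need to export it
.datatable.aware <- TRUE

# Hypothetical helper: column means per group, using data.table:: prefixes only
mean_by_group <- function(x, group_col) {
  dt <- data.table::as.data.table(x)
  # .SD is evaluated by [.data.table itself; group_col holds a column name
  dt[, lapply(.SD, mean), by = group_col]
}

res <- mean_by_group(mtcars[, c("mpg", "cyl")], "cyl")
```

Without the .datatable.aware line, the same call inside such a package would be routed to [.data.frame, which does not know about .SD.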
How to use data.table::setDTthreads() in my own package?
Very good question.
Yes, it will affect all data.table calls (including those from other packages) in the user's environment, not just those from your package.
General advice is not to set this value in your package but to let users know that they can set it themselves. If you do want to set it in your package, document it really well.
Note that 50% vs. 100% often makes a very small difference (it can be less than 5%, or even a slowdown in a shared environment), so I suggest you measure whether it is really worth messing with the user's environment if the benefits are small.
Check these timings, for example:
https://github.com/h2oai/db-benchmark/issues/202
You could also file a feature request for the ability to set the number of threads just for calls from a single package. It is technically possible by checking the top environment of a call.
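If a package does change the thread count, one way to limit the side effect described above is to restore the previous value on exit. This is a minimal sketch, not a recommendation from the data.table authors: with_dt_threads() is a hypothetical helper (not part of data.table), and it relies on setDTthreads() returning the previous setting.

```r
# Hypothetical helper: run `code` with `n` data.table threads, then restore
with_dt_threads <- function(n, code) {
  # setDTthreads() returns the old value invisibly
  old <- data.table::setDTthreads(n)
  # Put the user's setting back even if `code` throws an error
  on.exit(data.table::setDTthreads(old), add = TRUE)
  force(code)
}

before <- data.table::getDTthreads()
inside <- with_dt_threads(1L, data.table::getDTthreads())
after  <- data.table::getDTthreads()
```

The change is still global while the code runs, so other packages called inside the block see it too; the helper only guarantees that the user's setting comes back afterwards.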
data.table := not working in a package function
Thanks to jangorecki for pointing out the Importing data.table vignette.
The issue was declaring data.table's special symbols in the NAMESPACE.
The Importing data.table vignette does not mention that if you are using roxygen2 to generate the NAMESPACE then you can't use import(data.table) in the NAMESPACE. But as always the excellent usethis package has it covered, with usethis::use_data_table(). This creates all the boilerplate and it now works :)
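For reference, a hand-written roxygen2 declaration of the special symbols might look like the sketch below. This is an assumption about the kind of boilerplate involved, not a copy of what usethis::use_data_table() writes; the file name is hypothetical and the list of symbols is only a selection.

```r
# R/mypkg-package.R (hypothetical file name)
# roxygen2 turns the tag below into importFrom() lines in NAMESPACE

#' @importFrom data.table data.table := .SD .N .I .GRP .BY
NULL
```

data.table also has to appear under Imports: in DESCRIPTION for these directives to be valid.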
Local package dependency to R data.table :=
data.table should be imported in the NAMESPACE file of the package:
import(data.table)
With roxygen2, you can request this import in the function's documentation block; it will be added to NAMESPACE automatically:
#' Your function title & description
#'
#' @param data A data.table.
#' @import data.table
#'
DTfunction <- function(data) {
  data[, newcol := .SD[, 1]]
}
Test after loading the function:
DTfunction(as.data.table(mtcars[,1:2]))
mpg cyl newcol
<num> <num> <num>
1: 21.0 6 21.0
2: 21.0 6 21.0
3: 22.8 4 22.8
4: 21.4 6 21.4
...