data.table error when used through knitr, gWidgetsWWW
This seems to be an environment issue, probably an interaction between data.table and gWidgetsWWW. On knitr's side, there is at least one solution: tell knitr to use the global environment as its evaluation environment, e.g.
knit2html("test_report.Rmd", envir = globalenv())
Edit:
To show that this issue has nothing to do with knitr, try this:
library(gWidgetsWWW)
w <- gwindow("Test Window")
g <- ggroup(horizontal = FALSE, cont = w)
# The handler runs when the button is clicked; it uses data.table's := there
b <- gbutton("Report Button", cont = g, handler = function(h, ...) {
  library(data.table)
  df <- data.frame(State = rownames(USArrests), USArrests)
  print(data.table(df)[, State := tolower(State)])
})
visible(w) <- TRUE
Save it as test_gui.R, and run:
library(gWidgetsWWW)
localServerOpen('test_gui.R')
Click the button and you will see the same error.
RTVS: Unable to Knit Document with data.table
This is confirmed as fixed in data.table 1.11.8 and later, per @Hugh Ugh's comment above.
However, if you are constrained to an earlier version of data.table with RTVS for some reason, the workaround is to add:
assignInNamespace("cedta.pkgEvalsUserCode", c(data.table:::cedta.pkgEvalsUserCode, "rtvs"), "data.table")
in a code chunk, like so:
```{r additional-libraries, echo=FALSE}
library(data.table, quietly = TRUE, warn.conflicts = FALSE)
assignInNamespace("cedta.pkgEvalsUserCode", c(data.table:::cedta.pkgEvalsUserCode, "rtvs"), "data.table")
```
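If the same document has to knit on machines with both old and new data.table versions, the patch can be guarded by a version check. This is a sketch that assumes the 1.11.8 cutoff stated above; `cedta.pkgEvalsUserCode` is an internal object reached with `:::`, so the workaround relies on undocumented behaviour that may differ between versions.

```r
library(data.table, quietly = TRUE, warn.conflicts = FALSE)

# Patch only versions older than 1.11.8, where the RTVS fix is reported to have landed.
# cedta.pkgEvalsUserCode is internal (not exported), hence the ::: access.
if (packageVersion("data.table") < "1.11.8") {
  assignInNamespace(
    "cedta.pkgEvalsUserCode",
    c(data.table:::cedta.pkgEvalsUserCode, "rtvs"),
    "data.table"
  )
}
```

On 1.11.8 and later the `if` body is skipped, so the chunk is harmless to leave in place.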
Advanced `data.table` working fine when run in chunk but error when `knit2html`
seasonal_m1 <- data.table(seasonal_m1)
setorder(seasonal_m1, index)
Solved: seasonal_m1 needed to be converted to a data.table (first line above) before calling setorder(). See https://github.com/yihui/knitr/issues/1941#issuecomment-759275616
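The fix above uses `data.table(seasonal_m1)`, which builds a new copy. `setDT()` converts a data.frame in place instead. A minimal sketch, assuming `seasonal_m1` starts out as a plain data.frame with an `index` column (the values below are invented for illustration):

```r
library(data.table)

# Hypothetical stand-in for seasonal_m1: a plain data.frame
seasonal_m1 <- data.frame(index = c(3, 1, 2), value = c(30, 10, 20))

# setDT() turns the data.frame into a data.table by reference (no copy)
if (!is.data.table(seasonal_m1)) setDT(seasonal_m1)

# setorder() then reorders the rows by reference, ascending by default
setorder(seasonal_m1, index)
```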
Using data.table package in R to sum over columns - getting GForce sum(gsum) error
The error says that you cannot sum a character, so I'd guess that colA is a character column. You can use str(DT) to see the types of the variables in your data.
I created a similar dataset, used the code you provided, and it worked for me:
library(data.table)
DT = data.table("Date" = c('01/23/15', '01/24/15', '02/23/15', '02/24/15'),
"colA" = c(2323, 1212, 1234, 2345),
"colB" = c(2323, 1112, 1134, 2245),
"colC" = c(2323, 1012, 1434, 2445),
"month" = c('january', 'january', 'february', 'february'),
"year" = c(2015, 2015, 2015, 2015)
)
setkey(DT, month, year)
DT[ ,lapply(.SD, sum, na.rm=TRUE), by=.(month , year), .SDcols= 2:(length(colnames(DT))-2) ]
month year colA colB colC
1: february 2015 3579 3379 3879
2: january 2015 3535 3435 3335
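If str(DT) does show a character column, coercing it to numeric before the grouped sum should remove the error. A sketch with invented data in which colA was read in as character; the column list `num_cols` is an assumption about which columns hold numbers in your data:

```r
library(data.table)

# Invented data: colA arrived as character, colB is already numeric
DT <- data.table(month = c("january", "january", "february", "february"),
                 colA  = c("2323", "1212", "1234", "2345"),
                 colB  = c(2323, 1112, 1134, 2245))

str(DT)  # colA is listed as chr

# Coerce the value columns to numeric, by reference
num_cols <- c("colA", "colB")
DT[, (num_cols) := lapply(.SD, as.numeric), .SDcols = num_cols]

# The grouped sum now works on every selected column
res <- DT[, lapply(.SD, sum, na.rm = TRUE), by = month, .SDcols = num_cols]
```

Naming the columns in `.SDcols` also avoids the position arithmetic `2:(length(colnames(DT))-2)`, which silently changes meaning if a column is added.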
Using data.table package inside my own package
Andrie's guess is right, +1. There is a FAQ on it (see vignette("datatable-faq")), as well as a new vignette on importing data.table (vignette("datatable-importing")):
FAQ 6.9: I have created a package that depends on data.table. How do I ensure my package is data.table-aware so that inheritance from data.frame works?
Either i) include data.table in the Depends: field of your DESCRIPTION file, or ii) include data.table in the Imports: field of your DESCRIPTION file AND import(data.table) in your NAMESPACE file.
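With roxygen2, option ii) is commonly expressed as an `@import` tag, which roxygen2 writes into NAMESPACE as `import(data.table)`. A minimal sketch, assuming a hypothetical package function `lower_state()`; the `Imports:` entry in DESCRIPTION is still needed separately (for example via `usethis::use_package("data.table")`):

```r
# Only needed to run this file as a plain script; inside a package the
# @import tag below (plus Imports: data.table in DESCRIPTION) does this job.
library(data.table)

#' Lower-case the State column of a data.table, by reference
#'
#' Hypothetical example function for a package that imports data.table.
#'
#' @param DT A data.table with a character column named State.
#' @import data.table
#' @export
lower_state <- function(DT) {
  State <- NULL  # avoids the R CMD check note "no visible binding for global variable"
  DT[, State := tolower(State)]
  DT[]           # return DT so it prints when called at the console
}
```

Usage: `lower_state(data.table(State = c("Alabama", "Alaska")))`.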
Further background: at the top of [.data.table (and other data.table functions), you'll see a switch depending on the result of a call to cedta(). This stands for Calling Environment Data Table Aware. Typing data.table:::cedta reveals how it's done. It relies on the calling package having a namespace, and on that namespace importing or depending on data.table. This is how a data.table can be passed to non-data.table-aware packages (such as functions in base) and those packages can use absolutely standard [.data.frame syntax on the data.table, blissfully unaware that the data.frame is() a data.table, too.
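The "it is a data.frame, too" point can be checked directly. A small sketch (the helper `column_types()` is hypothetical); note that code typed at the console or run as a script is always data.table-aware, so this shows the inheritance rather than the `cedta()` switch itself:

```r
library(data.table)

DT <- data.table(a = 1:3, b = c("x", "y", "z"))

class(DT)                   # "data.table" "data.frame"
inherits(DT, "data.frame")  # TRUE, so data.frame-only code accepts it

# Hypothetical helper written with no knowledge of data.table: it treats its
# input purely as a data.frame (a list of columns)
column_types <- function(x) {
  stopifnot(is.data.frame(x))
  vapply(x, function(col) class(col)[1], character(1))
}
column_types(DT)            # a = "integer", b = "character"
```

To see the awareness check itself, print `data.table:::cedta`; it is internal, so its body may differ between versions.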
This is also why data.table inheritance was not compatible with namespaceless packages in the past, and why, upon user request, we had to ask authors of such packages to add a namespace to be compatible. Happily, now that R adds a default namespace for packages missing one (from v2.14.0), that problem has gone away:
CHANGES IN R VERSION 2.14.0
* All packages must have a namespace, and one is created on installation if not supplied in the sources.
What you can do with a data.frame that you can't with a data.table?
From the data.table FAQ
FAQ 1.8 OK, I'm starting to see what data.table is about, but why didn't you enhance data.frame in R? Why does it have to be a new package?
As FAQ 1.1 highlights, j in [.data.table is fundamentally different from j in [.data.frame. Even something as simple as DF[,1] would break existing code in many packages and user code. This is by design, and we want it to work this way for more complicated syntax to work. There are other differences, too (see FAQ 2.17).
Furthermore, data.table inherits from data.frame. It is a data.frame, too. A data.table can be passed to any package that only accepts data.frame and that package can use [.data.frame syntax on the data.table.
We have proposed enhancements to R wherever possible, too. One of these was accepted as a new feature in R 2.12.0:
unique() and match() are now faster on character vectors where all elements are in the global CHARSXP cache and have unmarked encoding (ASCII). Thanks to Matthew Dowle for suggesting improvements to the way the hash code is generated in unique.c.
A second proposal was to use memcpy in duplicate.c, which is much faster than a for loop in C. This would improve the way that R copies data internally (on some measures by 13 times). The thread on r-devel is here: http://tolstoy.newcastle.edu.au/R/e10/devel/10/04/0148.html.
What are the smaller syntax differences between data.frame and data.table?
- DT[3] refers to the 3rd row, but DF[3] refers to the 3rd column
- DT[3, ] == DT[3], but DF[ , 3] == DF[3] (somewhat confusingly in data.frame, whereas data.table is consistent)
- For this reason we say the comma is optional in DT, but not optional in DF
- DT[[3]] == DF[, 3] == DF[[3]]
- DT[i, ], where i is a single integer, returns a single row, just like DF[i, ], but unlike a matrix single-row subset which returns a vector.
- DT[ , j] where j is a single integer returns a one-column data.table, unlike DF[, j] which returns a vector by default
- DT[ , "colA"][[1]] == DF[ , "colA"].
- DT[ , colA] == DF[ , "colA"] (currently in data.table v1.9.8 but is about to change, see release notes)
- DT[ , list(colA)] == DF[ , "colA", drop = FALSE]
- DT[NA] returns 1 row of NA, but DF[NA, ] returns an entire copy of DF containing NA throughout. The symbol NA is type logical in R and is therefore recycled by [.data.frame. The user's intention was probably DF[NA_integer_, ]. [.data.table diverts to this probable intention automatically, for convenience.
- DT[c(TRUE, NA, FALSE)] treats the NA as FALSE, but DF[c(TRUE, NA, FALSE), ] returns NA rows for each NA
- DT[ColA == ColB] is simpler than DF[!is.na(ColA) & !is.na(ColB) & ColA == ColB, ]
- data.frame(list(1:2, "k", 1:4)) creates 3 columns, data.table creates one list column.
- check.names is by default TRUE in data.frame but FALSE in data.table, for convenience.
- stringsAsFactors is by default TRUE in data.frame but FALSE in data.table, for efficiency (since R 4.0.0 the data.frame default is also FALSE). Since a global string cache was added to R, character items are a pointer to the single cached string and there is no longer a performance benefit of converting to factor.
- Atomic vectors in list columns are collapsed when printed using ", " in data.frame, but "," in data.table with a trailing comma after the 6th item to avoid accidental printing of large embedded objects.
In [.data.frame we very often set drop = FALSE. When we forget, bugs can arise in edge cases where single columns are selected and all of a sudden a vector is returned rather than a single-column data.frame. In [.data.table we took the opportunity to make it consistent and dropped drop.
When a data.table is passed to a data.table-unaware package, that package is not concerned with any of these differences; it just works.
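Several of the differences above, including the `drop` behaviour, can be seen side by side on a small made-up table. This is a sketch for illustration only: the column names follow the FAQ's colA/colB examples, the values are invented, and it assumes a data.table version in which a single integer `j` returns a one-column data.table, as described in the list.

```r
library(data.table)

# Made-up data; the NA in colA is deliberate
DF <- data.frame(colA = c(1, NA, 3), colB = c(1, 2, 4), colC = c("p", "q", "r"))
DT <- as.data.table(DF)

# A single index without a comma: a row in DT, a column in DF
DT[3]        # 1-row data.table (row 3)
DF[3]        # 1-column data.frame (colC)

# A single integer j: one-column data.table vs. a plain vector
DT[, 2]      # data.table with the single column colB
DF[, 2]      # numeric vector 1 2 4

# NA in a logical row filter (colA == colB is TRUE, NA, FALSE here)
DT[colA == colB]            # row 1 only; the NA is treated as FALSE
DF[DF$colA == DF$colB, ]    # row 1 plus an all-NA row for the NA

# drop: the data.frame result type depends on how many columns are selected
cols <- "colB"
DF[, cols]                  # vector, silently dropped
DF[, cols, drop = FALSE]    # one-column data.frame
DT[, ..cols]                # always a data.table (.. looks up cols in the calling scope)
```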
Small caveat
There may be cases where some packages use code that falls down when given a data.table (even though it is a data.frame); however, since data.table is actively maintained to avoid such problems, any that arise tend to be fixed promptly.
For example, see this question and the prompt response.
From the NEWS for v1.8.2:
- base::unname(DT) now works again, as needed by plyr::melt(). Thanks to Christoph Jaeckel for reporting. Test added.
- An as.data.frame method has been added for ITime, so that ITime can be passed to ggplot2 without error, #1713. Thanks to Farrel Buchinsky for reporting. Tests added. ITime axis labels are still displayed as integer seconds from midnight; we don't know why ggplot2 doesn't invoke ITime's as.character method. Converting ITime to POSIXct for ggplot2 is one approach.
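The suggested conversion of ITime to POSIXct could look like this. It is a sketch that assumes any fixed reference date is acceptable for the plot; the date and times below are made up, and only base R arithmetic is used for the conversion itself.

```r
library(data.table)

times <- as.ITime(c("09:15:00", "10:30:00"))

# ITime stores integer seconds since midnight, which is why those raw
# numbers appear on an axis that does not know about the class
as.integer(times)   # 33300 37800

# Add those seconds to midnight of an arbitrary reference date; the date
# only anchors the clock time and carries no meaning here
ct <- as.POSIXct("2012-08-01", tz = "UTC") + as.integer(times)
format(ct, "%H:%M")  # "09:15" "10:30"
```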