data.table error when used through knitr, gWidgetsWWW

This seems to be an environment issue, most likely an interaction between data.table and gWidgetsWWW rather than a knitr bug. On knitr's side there is at least one workaround: specify the global environment as the environment knitr evaluates code in, e.g.

knit2html("test_report.Rmd", envir = globalenv())

Edit:

To show that this issue is unrelated to knitr, try this:

library(gWidgetsWWW)

w <- gwindow("Test Window")
g <- ggroup(horizontal = FALSE, cont = w)
b <- gbutton("Report Button", cont = g, handler = function(h, ...) {
  library(data.table)
  df <- data.frame(State = rownames(USArrests), USArrests)
  print(data.table(df)[, State := tolower(State)])
})

visible(w) <- TRUE

Save it as test_gui.R, then run:

library(gWidgetsWWW)
localServerOpen('test_gui.R')

Click the button and you will see the same error.

RTVS: Unable to Knit Document with data.table

This is confirmed as fixed in data.table 1.11.8 and later, per @Hugh Ugh's comment.

However, if you are constrained to an earlier version of data.table with RTVS, the workaround is to add:

assignInNamespace("cedta.pkgEvalsUserCode", c(data.table:::cedta.pkgEvalsUserCode, "rtvs"), "data.table")

to a code chunk, like so:

```{r additional-libraries, echo=FALSE}
library(data.table, quietly = TRUE, warn.conflicts = FALSE)

# Let data.table treat code evaluated through RTVS as data.table-aware
assignInNamespace("cedta.pkgEvalsUserCode", c(data.table:::cedta.pkgEvalsUserCode, "rtvs"), "data.table")
```
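If one setup chunk has to work against both older and newer data.table installs, the patch can be guarded by a version check so that it only runs where the fix is missing. This is a minimal sketch that assumes the 1.11.8 cut-off mentioned above; `packageVersion()` (from utils) returns a version object that compares directly against a version string.

```r
library(data.table, quietly = TRUE, warn.conflicts = FALSE)

# TRUE only for data.table versions older than 1.11.8, which lack the RTVS fix
needs_rtvs_patch <- packageVersion("data.table") < "1.11.8"

if (needs_rtvs_patch) {
  # Add "rtvs" to the packages whose evaluated code data.table treats as
  # data.table-aware user code
  assignInNamespace(
    "cedta.pkgEvalsUserCode",
    c(data.table:::cedta.pkgEvalsUserCode, "rtvs"),
    "data.table"
  )
}
```

On 1.11.8 and later the `if` body is skipped, so the chunk is harmless to leave in place after upgrading.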

Advanced `data.table` working fine when run in chunk but error when `knit2html`

seasonal_m1 <- data.table(seasonal_m1)
setorder(seasonal_m1, index)

Solved: seasonal_m1 needs to be converted to a data.table (as above) before data.table functions such as setorder() are called on it. See https://github.com/yihui/knitr/issues/1941#issuecomment-759275616
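Note that data.table(seasonal_m1) builds a copy. Below is a sketch of the same fix using setDT(), which converts a data.frame to a data.table by reference; the seasonal_m1 contents are a hypothetical stand-in, since the real columns are not shown.

```r
library(data.table)

# Hypothetical stand-in for seasonal_m1: a plain data.frame with an 'index'
# column (the real object's columns are not shown in the question)
seasonal_m1 <- data.frame(index = c(3, 1, 2), value = c(30, 10, 20))

# setDT() converts the data.frame to a data.table in place, without a copy
setDT(seasonal_m1)

# setorder() then reorders the rows by reference
setorder(seasonal_m1, index)
print(seasonal_m1)
```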

Using data.table package in R to sum over columns - getting GForce sum(gsum) error

The error says that you cannot sum a character vector, so colA is most likely stored as character. Use str(DT) to check the types of the columns in your data.

I created a similar dataset and used the code you provided and it worked for me:

library(data.table)
DT <- data.table(Date  = c('01/23/15', '01/24/15', '02/23/15', '02/24/15'),
                 colA  = c(2323, 1212, 1234, 2345),
                 colB  = c(2323, 1112, 1134, 2245),
                 colC  = c(2323, 1012, 1434, 2445),
                 month = c('january', 'january', 'february', 'february'),
                 year  = c(2015, 2015, 2015, 2015))

setkey(DT, month, year)

# Sum columns 2 to ncol(DT) - 2 (colA, colB, colC) within each month/year group
DT[, lapply(.SD, sum, na.rm = TRUE), by = .(month, year), .SDcols = 2:(ncol(DT) - 2)]
      month year colA colB colC
1: february 2015 3579 3379 3879
2:  january 2015 3535 3435 3335
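To reproduce the failing case instead, here is a sketch with a hypothetical table in which colA was read in as character (the values are made up for illustration): str() reveals the type, and converting the column with := before aggregating avoids summing a character vector.

```r
library(data.table)

# Hypothetical data in which colA was read in as character, e.g. from a CSV
# where the column contained quoted numbers
DT <- data.table(
  colA  = c("2323", "1212", "1234", "2345"),
  month = c("january", "january", "february", "february")
)

str(DT)  # colA is reported as chr

# Convert the column to numeric by reference, then aggregate as before
DT[, colA := as.numeric(colA)]
res <- DT[, .(colA = sum(colA, na.rm = TRUE)), by = month]
print(res)
```

Groups come out in order of first appearance (january, then february) because this table has no key.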

Using data.table package inside my own package

Andrie's guess is right, +1. There is an FAQ on it (see vignette("datatable-faq")), as well as a newer vignette on importing data.table. From the FAQ:

FAQ 6.9: I have created a package that depends on data.table. How do I ensure my package is data.table-aware so that inheritance from data.frame works?

Either i) include data.table in the Depends: field of your DESCRIPTION file, or ii) include data.table in the Imports: field of your DESCRIPTION file AND import(data.table) in your NAMESPACE file.
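If the package is documented with roxygen2 (an assumption, the FAQ does not mention it), option ii) can be expressed as a tag that roxygen2 writes into NAMESPACE as import(data.table). The file and package names below are hypothetical, and DESCRIPTION still needs data.table listed under Imports:.

```r
# R/mypkg-package.R (hypothetical file in a hypothetical package "mypkg").
# Assumes DESCRIPTION contains:  Imports: data.table
# Running roxygen2::roxygenise() writes `import(data.table)` into NAMESPACE,
# which makes the package data.table-aware (option ii above).
#' @import data.table
NULL
```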

Further background: at the top of [.data.table (and other data.table functions), you'll see a switch that depends on the result of a call to cedta(). This stands for Calling Environment Data Table Aware. Typing data.table:::cedta reveals how it's done. It relies on the calling package having a namespace, and on that namespace importing or depending on data.table. This is how a data.table can be passed to non-data.table-aware packages (such as functions in base), and those packages can use absolutely standard [.data.frame syntax on it, blissfully unaware that the data.frame is() a data.table, too.

This is also why data.table inheritance used not to be compatible with namespaceless packages, and why, upon user request, we had to ask authors of such packages to add a namespace to be compatible. Happily, now that R adds a default namespace for packages missing one (from v2.14.0), that problem has gone away:

CHANGES IN R VERSION 2.14.0

* All packages must have a namespace, and one is created on installation if not supplied in the sources.

What you can do with a data.frame that you can't with a data.table?

From the data.table FAQ

FAQ 1.8 OK, I'm starting to see what data.table is about, but why didn't you enhance data.frame in R? Why does it have to be a new package?

As FAQ 1.1 highlights, j in [.data.table is fundamentally different from j in [.data.frame. Even something as simple as DF[,1] would break existing code in many packages and user code. This is by design, and we want it to work this way for more complicated syntax to work. There are other differences, too (see FAQ 2.17).

Furthermore, data.table inherits from data.frame. It is a data.frame, too. A data.table can be passed to any package that only accepts data.frame, and that package can use [.data.frame syntax on the data.table.
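A quick check of the inheritance described above, using a small example table: the class vector lists data.frame after data.table, so base functions that expect a data.frame accept it unchanged.

```r
library(data.table)

DT <- data.table(a = 1:3, b = c("x", "y", "z"))

# data.frame follows data.table in the class vector, so S3 dispatch falls
# back to data.frame methods wherever no data.table method exists
class(DT)          # "data.table" "data.frame"
is.data.frame(DT)  # TRUE

# Base functions written for data.frames work unchanged
nrow(DT)           # 3
sapply(DT, class)  # a = "integer", b = "character"
```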

We have proposed enhancements to R wherever possible, too. One of these was accepted as a new feature in R 2.12.0:

unique() and match() are now faster on character vectors where all elements are in the global CHARSXP cache and have unmarked encoding (ASCII). Thanks to Matthew Dowle for suggesting improvements to the way the hash code is generated in unique.c.

A second proposal was to use memcpy in duplicate.c, which is much faster than a for loop in C. This would improve the way that R copies data internally (on some measures by 13 times). The thread on r-devel is here: http://tolstoy.newcastle.edu.au/R/e10/devel/10/04/0148.html.

What are the smaller syntax differences between data.frame and data.table

  • DT[3] refers to the 3rd row, but DF[3] refers to the 3rd column
  • DT[3, ] == DT[3], but DF[ , 3] == DF[3] (somewhat confusingly in data.frame, whereas data.table is consistent)
  • For this reason we say the comma is optional in DT, but not optional in DF
  • DT[[3]] == DF[, 3] == DF[[3]]
  • DT[i, ], where i is a single integer, returns a single row, just like DF[i, ], but unlike a matrix single-row subset which returns a vector.
  • DT[ , j] where j is a single integer returns a one-column data.table, unlike DF[, j] which returns a vector by default
  • DT[ , "colA"][[1]] == DF[ , "colA"].
  • DT[ , colA] == DF[ , "colA"] (currently in data.table v1.9.8 but is about to change, see release notes)
  • DT[ , list(colA)] == DF[ , "colA", drop = FALSE]
  • DT[NA] returns 1 row of NA, but DF[NA, ] returns an entire copy of DF containing NA throughout. The symbol NA is type logical in R and is therefore recycled by [.data.frame. The user's intention was probably DF[NA_integer_, ]. [.data.table diverts to this probable intention automatically, for convenience.
  • DT[c(TRUE, NA, FALSE)] treats the NA as FALSE, but DF[c(TRUE, NA, FALSE), ] returns an NA row for each NA
  • DT[ColA == ColB] is simpler than DF[!is.na(DF$ColA) & !is.na(DF$ColB) & DF$ColA == DF$ColB, ]
  • data.frame(list(1:2, "k", 1:4)) creates 3 columns, data.table creates one list column.
  • check.names is by default TRUE in data.frame but FALSE in data.table, for convenience.
  • stringsAsFactors was TRUE by default in data.frame before R 4.0.0, but is FALSE in data.table, for efficiency. Since a global string cache was added to R, character items are pointers to the single cached string, and there is no longer a performance benefit from converting to factor.
  • Atomic vectors in list columns are collapsed when printed using ", " in data.frame, but "," in data.table, with a trailing comma after the 6th item to avoid accidental printing of large embedded objects.
  • In [.data.frame we very often set drop = FALSE. When we forget, bugs can arise in edge cases where single columns are selected and all of a sudden a vector is returned rather than a single-column data.frame. In [.data.table we took the opportunity to make it consistent and dropped drop.

When a data.table is passed to a data.table-unaware package, that package is not concerned with any of these differences; it just works.
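A few of the differences above, checked side by side on a small example table. This is a sketch assuming a recent data.table (1.9.8 or later, where a numeric j returns a data.table); each comment gives the value printed in that version.

```r
library(data.table)

DF <- data.frame(a = 1:3, b = 4:6, c = 7:9)
DT <- as.data.table(DF)

# Single index: the 3rd row for DT, the 3rd column for DF
nrow(DT[3])   # 1
ncol(DF[3])   # 1

# [[ ]] extracts the same column vector from both
identical(DT[[3]], DF[[3]])   # TRUE

# A single integer j: one-column data.table vs. a plain vector
is.data.table(DT[, 3])   # TRUE
is.vector(DF[, 3])       # TRUE

# NA in a logical i: treated as FALSE by DT, an extra all-NA row in DF
nrow(DT[c(TRUE, NA, FALSE)])    # 1
nrow(DF[c(TRUE, NA, FALSE), ])  # 2

# DT[NA] returns a single row of NA
nrow(DT[NA])  # 1
```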

Small caveat

There may be cases where some packages use code that falls down when given a data.table. However, because data.table is constantly being maintained to avoid such problems, any problems that arise are fixed promptly.

For example:

  • see this question and prompt response
  • from the NEWS for v1.8.2:
      • base::unname(DT) now works again, as needed by plyr::melt(). Thanks to Christoph Jaeckel for reporting. Test added.
      • An as.data.frame method has been added for ITime, so that ITime can be passed to ggplot2 without error, #1713. Thanks to Farrel Buchinsky for reporting. Tests added. ITime axis labels are still displayed as integer seconds from midnight; we don't know why ggplot2 doesn't invoke ITime's as.character method. Converting ITime to POSIXct for ggplot2 is one approach.

