knitr gets tricked by data.table `:=` assignment
Update Oct 2014. Now in data.table v1.9.5: `:=` no longer prints in knitr, for consistency with behaviour at the prompt (#505). Output of a test `knit("knitr.Rmd")` is now in data.table's unit tests.
And related: `if (TRUE) DT[,LHS:=RHS]` now doesn't print (thanks to Jureiss, #869). Test added. To get this to work we've had to live with one downside: if a `:=` is used inside a function with no `DT[]` before the end of the function, then the next time `DT` is typed at the prompt, nothing will be printed. A repeated `DT` will print. To avoid this: include a `DT[]` after the last `:=` in your function. If that is not possible (e.g., it's not a function you can change) then `print(DT)` and `DT[]` at the prompt are guaranteed to print. As before, adding an extra `[]` on the end of a `:=` query is a recommended idiom to update and then print; e.g. `> DT[,foo:=3L][]`
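A minimal sketch of the `DT[]` idiom described above. The helper name `add_foo()` is hypothetical and used only for illustration; the sketch assumes data.table v1.9.5 or later is installed.

```r
library(data.table)

# Hypothetical helper: adds a column by reference inside a function.
add_foo <- function(DT) {
  DT[, foo := 3L]
  DT[]            # DT[] after the last := , as the update note recommends
  invisible(DT)   # return quietly; the caller decides when to print
}

DT <- data.table(a = 1:3)
add_foo(DT)

# print(DT) and DT[] are the two forms the note describes as guaranteed to print.
print(DT)
DT[]
```

Because `:=` updates by reference, `DT` in the calling environment already carries the new `foo` column after `add_foo(DT)` returns; no reassignment is needed.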
Previous answer kept for posterity (the `global$depthtrigger` business is no longer done as from data.table v1.9.5, so this is no longer true) ...
Just to be clear I understand then: knitr is printing when you don't want it to.
Try increasing `data.table:::.global$depthtrigger` a little bit at the start of the script.
This will be 3 for you currently:
data.table:::.global$depthtrigger
[1] 3
I don't know how much eval depth knitr adds to the stack. But try changing the trigger to 4 first; i.e.
assign("depthtrigger", 4, data.table:::.global)
and at the end of the knitr script be sure to set it back to 3. If 4 doesn't work, try 5, then 6. If you get to 10 give up and I'll think again. ;-P
Why might this work?
See NEWS from v1.8.4:
`DT[,LHS:=RHS,...]` no longer prints `DT`. This implements #2128 "Try again to get `DT[i,j:=value]` to return invisibly". Thanks to discussions here:
how to suppress output when using `:=` in R {data.table}, prior to v1.8.3?
http://r.789695.n4.nabble.com/Avoiding-print-when-using-tp4643076.html
FAQs 2.21 and 2.22 have been updated.
FAQ 2.21 Why does DT[i,col:=value] return the whole of DT? I expected either no visible value (consistent with <-), or a message or return
value containing how many rows were updated. It isn't obvious that the
data has indeed been updated by reference.
This has changed in v1.8.3
to meet your expectations. Please upgrade. The whole of DT is returned
(now invisibly) so that compound syntax can work; e.g.,
DT[i,done:=TRUE][,sum(done)]. The number of rows updated is returned
when verbosity is on, either on a per query basis or globally using
options(datatable.verbose=TRUE).
FAQ 2.22 Ok, thanks. What was so difficult about the result of DT[i,col:=value] being returned invisibly?
R internally forces
visibility on for [. The value of FunTab's eval column (see
src/main/names.c) for [ is 0 meaning force R_Visible on (see
R-Internals section 1.6). Therefore, when we tried invisible() or
setting R_Visible to 0 directly ourselves, eval in src/main/eval.c
would force it on again. To solve this problem, the key was to stop
trying to stop the print method running after a :=. Instead, inside :=
we now (from v1.8.3) set a global flag which the print method uses to
know whether to actually print or not.
That global flag is `data.table:::.global$print`. At the top of `data.table:::print.data.table` you'll see the method checking it. That's because there is no known way to suppress printing from `[` (as FAQ 2.22 explains).
So, inside `:=` inside `[.data.table`, it looks to see how "deep" this call is:
if (Cstack_info()[["eval_depth"]] <= .global$depthtrigger) {
    suppPrint = function(x) { .global$print=FALSE; x }
    # Suppress print when returns ok not on error, bug #2376.
    # Thanks to: https://stackoverflow.com/a/13606880/403310
    # All appropriate returns following this point are
    # wrapped i.e. return(suppPrint(x)).
}
Essentially that's just saying: if `DT[,x:=y]` is running at the prompt, then I know the REPL is going to call the print method on my result, beyond my control. Ok, so given the print method is going to run, I'm going to suppress it inside that print method by setting a flag (since the print method that runs, i.e. `print.data.table`, is something I can control).
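The flag mechanism can be sketched with a toy S3 class in base R. Everything here (`.state`, `update_toy()`, `print.toy()`) is hypothetical and only mirrors the idea described above; it is not data.table's actual code.

```r
# Toy illustration only: an environment holding a shared "should I print?" flag.
.state <- new.env()
.state$print <- TRUE

# Stand-in for := : does its work, then asks the next print call to stay silent.
update_toy <- function(x) {
  x$value <- x$value + 1
  .state$print <- FALSE
  x
}

# Stand-in for print.data.table: consults the flag before printing.
print.toy <- function(x, ...) {
  if (!.state$print) {
    .state$print <- TRUE   # reset so later prints behave normally
    return(invisible(x))
  }
  cat("<toy value:", x$value, ">\n")
  invisible(x)
}

obj <- structure(list(value = 1), class = "toy")
```

The print method cannot be prevented from running, so the suppression happens inside it: the first print after `update_toy()` is swallowed and resets the flag, and any later print goes through as usual.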
In knitr's case it's simulating the REPL in a clever way. It isn't really a script, iiuc, otherwise `DT[,x:=y]` wouldn't print anyway for that reason. But because it's simulating the REPL via an `eval` there is an extra level of eval depth for code run from knitr. Or something similar (I don't know knitr).
Which is why I'm thinking increasing the `depthtrigger` might do the trick.
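The effect of an extra `eval` on the evaluation depth can be seen with base R alone. This is a small sketch, not a measurement of what knitr actually adds; `depth_here()` is a hypothetical helper, and the absolute numbers depend on how the code is run, so only the comparison is meaningful.

```r
# Report the evaluation depth at the point where this function's body runs.
depth_here <- function() Cstack_info()[["eval_depth"]]

direct  <- depth_here()               # called directly
wrapped <- eval(quote(depth_here()))  # the same call, routed through eval()

# The wrapped call runs deeper on the evaluation stack than the direct one.
wrapped > direct
```

Any front end that runs user code through `eval()` shifts the depth seen by the code in the same direction, which is why a fixed threshold tuned for the prompt can misfire.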
Hacky/crufty, I agree. But if it works, and you let me know which value works, I can change data.table to be knitr aware and change the `depthtrigger` automatically. Or any better solutions are most welcome.
why does knitr caching fail for data.table `:=`?
Speculation:
Here is what appears to be going on.
knitr quite sensibly caches objects as soon as they are created. It then updates their cached value whenever it detects that they have been altered.
data.table, though, bypasses R's normal copy-by-value assignment and replacement mechanisms, and uses a `:=` operator rather than a `=`, `<<-`, or `<-`. As a result knitr isn't picking up the signals that `DT` has been changed by `DT[, c:=5]`.
Solution:
Just add this block to your code wherever you'd like the current value of `DT` to be re-cached. It won't cost you anything memory- or time-wise (since nothing except a reference is copied by `DT <- DT`) but it does effectively send a (fake) signal to knitr that `DT` has been updated:
```{r, cache=TRUE, echo=FALSE}
DT <- DT
```
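Both halves of that claim can be checked outside knitr with data.table's exported `address()` function. This is a quick sketch run in a plain R session, assuming data.table is installed; it says nothing about knitr's cache internals.

```r
library(data.table)

DT <- data.table(a = rnorm(10))
addr_start <- address(DT)

# := adds the column in place; DT is never rebound with <- or =.
DT[, c := 5]
addr_after_set <- address(DT)

# DT <- DT rebinds the name to the same object; no data is copied.
DT <- DT
addr_after_rebind <- address(DT)

c(addr_start, addr_after_set, addr_after_rebind)
```

All three addresses are the same object: the column was added without an ordinary assignment, and the `DT <- DT` line is a real assignment that costs nothing.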
Working version of example doc:
Check that it works by running this edited version of your doc:
```{r}
library(data.table)
```
Data.Table Markdown
========================================================
Suppose we make a `data.table` in **R Markdown**
```{r, cache=TRUE}
DT = data.table(a = rnorm(10))
```
Then add a column using `:=`
```{r, cache=TRUE}
DT[, c:=5]
```
```{r, cache=TRUE, echo=FALSE}
DT <- DT
```
Then we display that in a non-cached block
```{r, cache=FALSE}
DT
```
The first time you run this, the above will show a `c` column.
The second, third, and nth times, it will as well.
How to avoid data.table to be displayed in RMarkdown HTML output when column is added by reference
The best solution would be to put the assignment in a new chunk, which is evaluated but whose results are not shown:
---
output: html_document
---
```{r}
library(data.table)
myDT <- data.table(mtcars)
```
```{r, results='hide'}
myDT[,log.mpg:=log(mpg)] # This line knits the data.table to HTML
```
```{r}
plot(myDT$log.mpg,myDT$wt)
newDT <- myDT[,sqrt.mpg:=sqrt(mpg)] # This avoids HTML output, but it's not data.table style, i.e. not elegant
plot(myDT$sqrt.mpg,myDT$wt)
```
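If a separate chunk is not convenient, two other ways to keep the result from being shown are sketched below as plain R code. This is an alternative to the answer's approach, not part of it; the column names `sqrt.mpg` and `hp.per.cyl` are illustrative.

```r
library(data.table)

myDT <- data.table(mtcars)

# invisible() marks the result as not visible, so whatever decides
# whether to print (the prompt, or knitr) has nothing to show.
invisible(myDT[, log.mpg := log(mpg)])

# set() also updates by reference and returns its result invisibly.
set(myDT, j = "sqrt.mpg", value = sqrt(myDT$mpg))

# withVisible() reports the visibility flag without printing the table.
vis <- withVisible(invisible(myDT[, hp.per.cyl := hp / cyl]))$visible
vis
```

Both forms still modify `myDT` in place, so later chunks can use the new columns directly.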
data.table column expression breaks in easyHtmlReport, works in plain knitr
@Ramnath is correct. Line 40 of EasyHTMLReport.R is:
knit(input=f,output=md.file)
Update this line with:
knit(input = f, output = md.file, envir = envir)
Update the signature of the function from:
easyHtmlReport <-
function(rmd.file,from,to,subject,headers=list(),control=list(),
markdown.options=c("hard_wrap","use_xhtml","smartypants"),
stylesheet="", echo.disable=TRUE, is.debug=F){
to:
easyHtmlReport <-
function(rmd.file,from,to,subject,headers=list(),control=list(),
markdown.options=c("hard_wrap","use_xhtml","smartypants"),
stylesheet="", echo.disable=TRUE, is.debug=FALSE, envir = parent.frame()){
If you don't want to rebuild the package you should be able to make this change using the `edit` function.
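Why the `envir` argument matters can be illustrated without either package. The sketch below uses hypothetical stand-ins (`run_code()` for `knit()`, `report_old()` and `report_new()` for the two `easyHtmlReport()` signatures). It only demonstrates how a default of `parent.frame()` resolves and is not the packages' actual code.

```r
# Hypothetical stand-in for knit(): evaluates code in `envir`.
run_code <- function(code, envir = parent.frame()) {
  eval(parse(text = code), envir = envir)
}

# Like the original signature: no envir, so code runs in the wrapper's own frame.
report_old <- function(code) {
  run_code(code)
}

# Like the updated signature: envir defaults to the caller and is passed on.
report_new <- function(code, envir = parent.frame()) {
  run_code(code, envir = envir)
}

caller <- function() {
  local_value <- "defined in caller"
  list(
    old = tryCatch(report_old("local_value"), error = function(e) "not found"),
    new = report_new("local_value")
  )
}

res <- caller()
res
```

With the old wrapper the code is evaluated inside the wrapper's frame and cannot see the caller's variables; forwarding `envir` restores the environment the user expects.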
ESS does not deal well with data.table and knitr
Most likely an environment issue. This should solve it:
(setq ess-swv-processing-command "%s(%s)")
print the data.table package's .onAttach messages with knitr
Don't. Anything hacked in for this would be fragile and arguably not terribly useful.
Yihui Xie (knitr's author) makes a good case. My synopsis:
- This is not useful. You're writing a tutorial, so why include dynamic content (that may change when the package changes)? Moreover, why not point to resources directly rather than to the list of resources printed there?
- This is very hard. It is not just a matter of output streams. The messages don't print because they are walled behind an `interactive()` check. It's not obvious how this should be overridden and, supposing it could be done, what weird side effects that might introduce.
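The kind of gate described in the second point can be sketched as follows. `fake_on_attach()` is a hypothetical stand-in for a package's `.onAttach` hook, not data.table's code, and the message text is invented.

```r
# Hypothetical stand-in for a package's .onAttach hook.
fake_on_attach <- function() {
  if (interactive()) {
    packageStartupMessage("Hypothetical startup note: see the package help pages.")
  }
  invisible(NULL)
}

# In a non-interactive R process the condition is FALSE, so no message
# is ever signalled and there is nothing for a report to capture.
fake_on_attach()
```

Because the message is never created in a non-interactive session, redirecting or capturing output streams cannot recover it; the check itself would have to be bypassed.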