General Suggestions For Debugging in R


I'd say that debugging is an art form, so there's no silver bullet. There are good strategies for debugging in any language, and they apply here too (e.g. read this nice article). For instance, the first thing is to reproduce the problem; if you can't do that, you need to get more information (e.g. with logging). Once you can reproduce it, you need to narrow it down to the source.

Rather than a "trick", I would say that I have a favorite debugging routine:

When an error occurs, the first thing I usually do is look at the stack trace by calling traceback(). It shows where the error occurred, which is especially useful when you have several nested functions.
Next I set options(error=recover) and rerun the code. When the error occurs again, R lists the active call frames and lets you enter browser mode in any of them, so you can inspect the workspace at the point of failure.
If I still don't have enough information, I use the debug() function and step through the offending function line by line.
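The routine above is interactive. When code runs unattended (e.g. under Rscript), an uncaught error simply stops the script, so there is no chance to call traceback() or pick a frame in recover. A similar record can be captured while the error is being signalled. The following is a minimal sketch using base R only. The helper capture_calls() and the functions outer_fun() and inner_fun() are hypothetical names chosen for illustration.

```r
# Hypothetical helper: evaluate expr and, if it fails, also return the
# call stack as it was at the moment the error was signalled
capture_calls <- function(expr) {
  calls <- NULL
  result <- tryCatch(
    withCallingHandlers(
      expr,
      error = function(e) {
        # Calling handlers run before the stack is unwound, so the frames
        # of the failing functions are still part of sys.calls() here
        calls <<- as.list(sys.calls())
      }
    ),
    error = function(e) e  # then stop the error and return the condition
  )
  list(result = result, calls = calls)
}

outer_fun <- function(x) inner_fun(x)
inner_fun <- function(x) stop("something went wrong in inner_fun")

res <- capture_calls(outer_fun(1))
conditionMessage(res$result)
# Outermost call first (traceback() lists them in the opposite order);
# the last entries are stop() and the handler itself
for (cl in res$calls) print(cl)
```

Because the handler is a calling handler rather than an exiting one, the stack is recorded before tryCatch() unwinds it. This is why inner_fun() still shows up in the result.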

The best new trick in R 2.10 (when working with script files) is to use the findLineNum() and setBreakpoint() functions.
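The following sketch shows how the two functions fit together. It assumes a throwaway script written to a temporary file; the file contents and the name scale_and_shift() are made up for illustration. findLineNum() reports which function, and which step inside it, a source line belongs to. setBreakpoint() uses the same lookup and then calls trace() to insert a browser() call at that step. That only makes sense in an interactive session, so those lines are left as comments.

```r
# Hypothetical throwaway script, written to a temporary file for the demo
script <- tempfile(fileext = ".R")
writeLines(c(
  "scale_and_shift <- function(x) {",
  "  y <- x * 2",
  "  z <- y + 1",
  "  z",
  "}"
), script)

# keep.source = TRUE keeps the source references that findLineNum() relies on
source(script, keep.source = TRUE)

# Which function, and which step inside it, does line 3 of the file belong to?
hits <- findLineNum(script, 3)
hits

# Interactive use only: stop in the browser at line 3 on the next call,
# then remove the breakpoint again once you are done.
# setBreakpoint(script, 3)
# scale_and_shift(1)
# setBreakpoint(script, 3, clear = TRUE)
```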

As a final comment: depending on the error, it is also very helpful to wrap external function calls in try() or tryCatch() (especially when dealing with S4 classes). That sometimes provides more information, and it also gives you more control over how errors are handled at run time.
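Here is a small sketch of both wrappers. risky_log() is a made-up stand-in for an external call that can fail. tryCatch() lets you decide what happens for each condition class. try() simply returns an object of class "try-error" that still carries the original condition.

```r
# Made-up stand-in for an external call that can fail
risky_log <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric, got ", class(x)[1])
  log(x)
}

# tryCatch(): choose a response per condition class
safe_log <- function(x) {
  tryCatch(
    risky_log(x),
    warning = function(w) {
      message("warning caught: ", conditionMessage(w))
      NA_real_
    },
    error = function(e) {
      message("error caught: ", conditionMessage(e))
      NA_real_
    }
  )
}

safe_log(100)   # about 4.605, no condition raised
safe_log(-1)    # log(-1) warns "NaNs produced"; returns NA instead of NaN
safe_log("a")   # stop() is caught; returns NA

# try(): keep going, then inspect what went wrong
r <- try(risky_log("a"), silent = TRUE)
inherits(r, "try-error")                  # TRUE
conditionMessage(attr(r, "condition"))    # "x must be numeric, got character"
```

An exiting handler in tryCatch() abandons the computation, which is why safe_log(-1) returns NA. If you only want to log a warning and carry on, withCallingHandlers() is the better fit.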

These related questions have a lot of suggestions:

Debugging tools for the R language
Debugging lapply/sapply calls
Getting the state of variables after an error occurs in R
R script line numbers at error?

Debugging options in R

(This is too long/formatted for a comment.) I'm not sure what you mean by the following (referring to options(error=recover)):

"it will offer to stop before the loop started instead of offering to jump into the code at the iteration that caused the bug."

Here's an example where the break seems to occur at the iteration that caused the error, as requested:

options(error=recover)
f <- function(x) { for (i in 1:x) if (i==2) stop() }
f(5)
Error in f(5) : 
Enter a frame number, or 0 to exit   
1: f(5)
Selection: 1
Called from: top level 
Browse[1]> print(i)
[1] 2

This breaks at a specific iteration of the loop, not (as suggested above) before the loop starts, where i would be undefined.
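The same inspection can be scripted, which is handy when you can't sit in the browser. The following sketch uses base R only. A calling handler, which runs before the stack unwinds, grabs the environment of the f() frame, so the loop variable can still be read after the error has been caught. Locating the frame by matching the function name f is an illustrative shortcut, not how recover() works internally.

```r
f <- function(x) { for (i in 1:x) if (i == 2) stop("failed at this iteration") }

failing_frame <- NULL
try(withCallingHandlers(
  f(5),
  error = function(e) {
    # f's frame still exists here because the stack has not been unwound yet
    calls <- as.list(sys.calls())
    is_f <- vapply(calls, function(cl) identical(cl[[1]], as.name("f")), logical(1))
    # sys.frame(n) with a positive n is the n-th frame from the top level,
    # matching the n-th entry of sys.calls()
    failing_frame <<- sys.frame(which(is_f)[1])
  }
), silent = TRUE)

ls(failing_frame)                # "i" "x"
get("i", envir = failing_frame)  # [1] 2
```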

Can you please give a reproducible example to clarify the difference between the behaviour that happens and what you'd prefer?

For what it's worth, the RStudio front-end offers a slightly more visual debugging experience that you might prefer.

Recommendations for Dynamic/interactive debugging of functions in R?

Not entirely sure about the use case, but when you encounter a problem, you can call the function traceback(). That will show the path of your function call through the stack until it hit its problem. You could, if you were inclined to work your way down from the top, call debug on each of the functions given in the list before making your function call. Then you would be walking through the entire process from the beginning.

Here's an example of how you could do this in a more systematic way, by creating a function to step through it:

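## debug() every function named in the most recent traceback
walk.through <- function() {
  ## .traceback() returns the deparsed calls of the last uncaught error
  ## (recent R versions may store call objects, not strings, in .Traceback)
  tb <- .traceback()
  if (is.null(tb)) stop("no traceback to use for debugging")
  ## first deparsed line of each call, with everything from "(" onwards removed
  funs <- unique(sub("\\(.*$", "", vapply(tb, `[`, "", 1)))
  assign("debug.fun.list", funs, envir = .GlobalEnv)
  invisible(lapply(funs, function(x) debug(get(x))))
  print(paste("Now debugging functions:", paste(funs, collapse = ",")))
}

## undo walk.through(): undebug the same functions and clean up
unwalk.through <- function() {
  invisible(lapply(debug.fun.list, function(x) undebug(get(as.character(x)))))
  print(paste("Now undebugging functions:", paste(debug.fun.list, collapse = ",")))
  rm(list = "debug.fun.list", envir = .GlobalEnv)
}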

Here's a dummy example of using it:

foo <- function(x) { print(1); bar(2) }
bar <- function(x) { x + a.variable.which.does.not.exist }
foo(2)

# now step through the functions
walk.through() 
foo(2)

# undebug those functions again...
unwalk.through()
foo(2)

IMO, that doesn't seem like the most sensible thing to do. It makes more sense to simply go into the function where the problem occurs (i.e. at the lowest level) and work your way backwards.

I've already outlined the logic behind this basic routine in "favorite debugging trick".
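For the top-down approach described above, you may already know which functions you want to step through. In that case the same idea works without reading the traceback, and isdebugged() confirms which flags are set. The following is a sketch; debug_all() and undebug_all() are hypothetical helper names.

```r
# Hypothetical helpers: flag or unflag functions for debugging by name
debug_all <- function(fun_names, envir = parent.frame()) {
  for (nm in fun_names) debug(get(nm, envir = envir))
  invisible(fun_names)
}

undebug_all <- function(fun_names, envir = parent.frame()) {
  for (nm in fun_names) {
    fn <- get(nm, envir = envir)
    if (isdebugged(fn)) undebug(fn)  # undebug() warns on unflagged functions
  }
  invisible(fun_names)
}

foo <- function(x) { print(1); bar(2) }
bar <- function(x) { x + a.variable.which.does.not.exist }

debug_all(c("foo", "bar"))
c(foo = isdebugged(foo), bar = isdebugged(bar))   # both TRUE
undebug_all(c("foo", "bar"))
c(foo = isdebugged(foo), bar = isdebugged(bar))   # both FALSE
```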

How do I track down where a R package function fails?

traceback() prints the call stack of the most recent uncaught error.

traceback()

Also, have a look at the online help for the debug function. Although I have seen better interactive debuggers, debug(), debugonce() and undebug() provide some basic functionality.

?base::debug
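When the failing function lives in a package, the condition object itself also records where stop() was called. The following sketch uses stats::weighted.mean() with mismatched lengths as an example of an error raised inside a package. conditionCall() returns the call of the function that raised it. For an S3 generic, that is typically the method (weighted.mean.default) rather than the generic you typed.

```r
# Catch the error instead of letting it reach the top level
err <- tryCatch(
  weighted.mean(1:3, w = 1:2),   # x and w have different lengths
  error = function(e) e
)

conditionMessage(err)   # what went wrong
conditionCall(err)      # which function raised it
```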

Debugging in R- How do I locate the error?

Here I rewrite your code using the vectorized version of SnowballStemmer. There is no need for a for loop.

library(plyr)   # for dlply() and .()
## SnowballStemmer() comes from a separate stemming package; load it as in your original code
stemMAP <- function(text) {
  flatText <- unlist(strsplit(text, " "))
  ## here I use the vectorized version: one call stems every word
  wordStem <- as.character(SnowballStemmer(flatText))
  hh <- data.frame(ff = flatText, sn = wordStem)
  ## I use plyr to transform the result into a list
  ## dlply: data.frame to list apply
  ## we group hh by the column sn, and apply the
  ## function as.character(x$ff) to each group (x here is a subset data.frame)
  stemList <- dlply(hh, .(sn), function(x) as.character(x$ff))
  stemList
}

stemList
$I
[1] "I"

$a
[1] "a"

$activ
[1] "active"     "activates"  "activation"

$and
[1] "and" "and"

$be
[1] "being"
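If plyr isn't available, the grouping step can also be done in base R with split(). In the following sketch the stems are passed in directly: the word/stem pairs are copied from the output above rather than produced by a stemmer, so no stemming package is assumed. group_by_stem() is a hypothetical helper name.

```r
group_by_stem <- function(words, stems) {
  # split() returns a named list with one element per distinct stem,
  # much like the dlply(hh, .(sn), ...) result
  split(words, stems)
}

words <- c("I", "a", "active", "activates", "activation", "and", "and", "being")
stems <- c("I", "a", "activ", "activ", "activ", "and", "and", "be")
stemList <- group_by_stem(words, stems)
stemList$activ   # "active" "activates" "activation"
```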

Debugging simple R code

A workaround is to specify defaults for alpha2 and x0, as I've done below, so that R will use these values if they aren't specified.

Pareto <- function(x, alpha2 = 1 + sqrt(2), x0 = 10 - 5 * sqrt(2)) {
  ifelse(x < x0, 0, 1 - (x0 / x)^alpha2)
}
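Here is a quick check of what the defaults buy you. Without them, a call like Pareto(5) fails as soon as ifelse() needs x0. With them, the function works with one argument, and the defaults can still be overridden. Pareto_nodefault() is a hypothetical copy without defaults, added only for the comparison.

```r
Pareto <- function(x, alpha2 = 1 + sqrt(2), x0 = 10 - 5 * sqrt(2)) {
  ifelse(x < x0, 0, 1 - (x0 / x)^alpha2)
}
# Same formula without defaults (for comparison only)
Pareto_nodefault <- function(x, alpha2, x0) ifelse(x < x0, 0, 1 - (x0 / x)^alpha2)

msg <- tryCatch(Pareto_nodefault(5), error = conditionMessage)
msg                              # complains that argument "x0" is missing

Pareto(c(1, 5, 50))              # default alpha2 and x0; 0 below x0
Pareto(10, alpha2 = 2, x0 = 5)   # overriding both: 1 - (5/10)^2 = 0.75
```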

R Debugger doesn't stop at breakpoints

debug is very convenient for problems of that sort. Say you want to go through the function myfun step by step: just run debug(myfun) before you run your code, and it will behave as if you had a breakpoint on the first line of that function.

This also works if the function is called from within other functions or if it lives inside a package. The latter case is where it is particularly useful, because you cannot simply change the code of a function that comes from a package.
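The following sketch shows the package case, using stats::sd() as an arbitrary example. debug() flags the function object that the package provides, isdebugged() confirms the flag, and undebug() clears it. Nothing here actually calls sd(), so no browser opens. In an interactive session, the next sd() call would stop at its first line.

```r
# stats::sd is just an arbitrary package function to demonstrate on
debug(stats::sd)
isdebugged(stats::sd)   # TRUE: the next sd() call would open the browser
undebug(stats::sd)
isdebugged(stats::sd)   # FALSE

# For a single call, debugonce(stats::sd) sets a flag that clears itself
```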

Saving workspace (in a particular frame) for post-mortem debugging in R

save(list=ls(), file="mylocals.Rda")

The hurdle I had to get over to realize this was the way forward was the name of that argument in save. Why did the authors use the argument name, "list", when it was a character vector (and not a list)? Same whine applies to the rm function argument names.
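Here is a minimal sketch of the idea. It assumes you can add the save() call inside the function, or type it at the browser prompt. ls() and save() both default to the current frame, so inside f() they pick up f()'s locals. The function f(), its variables and the file location are made up for illustration. A built-in alternative is dump.frames(to.file = TRUE), which saves every frame of the failing call stack so it can be examined later with debugger().

```r
# Hypothetical location for the saved frame
rda_path <- file.path(tempdir(), "mylocals.Rda")

f <- function(x) {
  y <- x * 2
  z <- "some intermediate state"
  # Inside f, ls() lists f's locals and save() reads them from f's frame
  # (save's envir argument defaults to the calling frame)
  save(list = ls(), file = rda_path)
  stop("something went wrong after the save")
}

try(f(21), silent = TRUE)

# Later, possibly in a fresh session: load the locals into their own environment
post_mortem <- new.env()
load(rda_path, envir = post_mortem)
ls(post_mortem)          # "x" "y" "z"
get("y", post_mortem)    # 42
```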
