Writing Robust R Code: Namespaces, Masking and Using the '::' Operator

GREAT question.

Validation

Writing robust, stable, and production-ready R code IS hard. You said: "Surprisingly, this doesn't seem to bother a lot of programmers out there". That's because most R programmers are not writing production code. They are performing one-off academic/research tasks. I would seriously question the skillset of any coder who claims that R is easy to put into production. Aside from my post on the search/find mechanism, which you have already linked to, I also wrote a post about the dangers of warning. The suggestions below will help reduce complexity in your production code.

Tips for writing robust/production R code

  1. Avoid packages that use Depends and favor packages that use Imports. A package with dependencies stuffed into Imports only is completely safe to use. If you absolutely must use a package that employs Depends, then email the author immediately after you call install.packages().

Here's what I tell authors: "Hi Author, I'm a fan of the XYZ package. I'd like to make a request. Could you move ABC and DEF from Depends to Imports in the next update? I cannot add your package to my own package's Imports until this happens. With R 2.14 enforcing NAMESPACE for every package, the general message from R Core is that packages should try to be 'good citizens'. If I have to load a Depends package, it adds a significant burden: I have to check for conflicts every time I take a dependency on a new package. With Imports, the package is free of side effects. I understand that you might break other people's packages by doing this. I think it's the right thing to do to demonstrate a commitment to Imports, and in the long run it will help people produce more robust R code."


  2. Use importFrom. Don't add an entire package to Imports; add only those specific functions that you require. I accomplish this with roxygen2 function documentation and roxygenize(), which automatically generates the NAMESPACE file. In this way, you can import two packages that have conflicts, as long as the conflicts aren't in the functions you actually need to use. Is this tedious? Only until it becomes a habit. The benefit: you can quickly identify all of your 3rd-party dependencies. That helps with...

  3. Don't upgrade packages blindly. Read the changelog line by line and consider how the updates will affect the stability of your own package. Most of the time, the updates don't touch the functions you actually use.

  4. Avoid S4 classes. I'm doing some hand-waving here. I find S4 to be complex, and it takes enough brain power to deal with the search/find mechanism on the functional side of R. Do you really need these OO features? Managing state = managing complexity - leave that for Python or Java =)

  5. Write unit tests. Use the testthat package.

  6. Whenever you run R CMD build/check on your package, parse the output and look for NOTE, INFO, WARNING. Also, physically scan with your own eyes. There's a part of the build step that notes conflicts but doesn't attach a WARNING, etc. to it.

  7. Add assertions and invariants right after a call to a 3rd-party package. In other words, don't fully trust what someone else gives you. Probe the result a little bit and stop() if the result is unexpected. You don't have to go crazy - pick one or two assertions that imply valid/high-confidence results.
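As a rough illustration of the importFrom and assertion tips, here is a minimal sketch. The function name safe_slope and the choice of stats::lm as the "3rd-party" call are assumptions made for this example only; the roxygen2 tags are plain comments when the code is run outside a package.

```r
#' Slope of a simple linear fit (hypothetical helper, for illustration only)
#'
#' Only the two functions actually used are imported, not all of stats.
#' @importFrom stats lm coef
#' @export
safe_slope <- function(x, y) {
  fit <- lm(y ~ x)            # call into code we did not write
  slope <- coef(fit)[["x"]]   # pull out the one number we care about

  # One or two cheap assertions right after the external call:
  # stop early instead of passing a bad value downstream.
  if (length(slope) != 1L || !is.finite(slope)) {
    stop("lm() returned an unusable slope; check the inputs", call. = FALSE)
  }
  slope
}

safe_slope(1:5, 2 * (1:5) + 1)
```

With a constant x the fitted coefficient is NA, so the assertion fires instead of returning NA to the caller.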

I think there's more but this has become muscle memory now =) I'll augment if more comes to me.

R package scope and masking

If you want to specify a particular method for "[" then you should be able to use:

 `[.data.frame`(x, TRUE, j)

Or test for data.tables using inherits and trap that as an edge case?
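A small sketch of both ideas using only base R. The class odd_frame and the helper all_rows are made up for this example and stand in for a class such as data.table whose "[" method behaves differently.

```r
# A toy subclass of data.frame with its own "[" method (hypothetical).
df  <- data.frame(a = 1:3, b = letters[1:3])
odd <- structure(df, class = c("odd_frame", "data.frame"))
`[.odd_frame` <- function(x, i, j, ...) "custom method called"

# Normal subsetting dispatches on the class and hits the custom method.
odd[TRUE, "a"]

# Calling the data.frame method by name bypasses dispatch entirely.
`[.data.frame`(odd, TRUE, "a")

# Alternative: detect the special class with inherits() and trap it.
all_rows <- function(x, j) {
  if (inherits(x, "odd_frame")) {
    `[.data.frame`(x, TRUE, j)   # edge case: force data.frame semantics
  } else {
    x[TRUE, j]                   # ordinary dispatch is fine here
  }
}
all_rows(odd, "a")
```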

Is there an equivalent to using the double colon operator (::) with source() in R?

You can use environments to effect this, using $ in lieu of ::.

If you have files:

  • file1.R

    func1 <- function(x) x + 1
    func2 <- function(y) y + 2
  • file2.R

    func3 <- function(x) x + 3
    func4 <- function(y) y + 4

then you can create an environment for each file and source the file into it with the local= argument:

e1 <- new.env()
source("file1.R", local = e1)
e2 <- new.env()
source("file2.R", local = e2)
ls()
# [1] "e1" "e2"
e1$func1(1)
# [1] 2
e1$func2(1)
# [1] 3
e2$func3(1)
# [1] 4
e2$func4(1)
# [1] 5

Note: functions defined in file2.R will not "see" functions in file1.R. This has some pros and cons:

  • Pro: namespace pollution is reduced. If you have constants defined in a file that the functions within it must be able to reference, then this works well. Those constants are in a sense "private" (very loosely speaking) to functions in that same file.

  • Con: unlike a "package", functions that must see each other must either be defined in the same file or must have another mechanism for determining where to find the other functions.
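One possible "other mechanism" for the con above is to chain the environments, so that the second file's environment has the first as its parent. This is only a sketch: the temporary files and the modified func3, which calls func1, are assumptions for the example and not the files listed earlier.

```r
# Write two throwaway files; func3 here deliberately depends on func1.
dir <- tempfile("modules")
dir.create(dir)
f1 <- file.path(dir, "file1.R")
f2 <- file.path(dir, "file2.R")
writeLines("func1 <- function(x) x + 1", f1)
writeLines("func3 <- function(x) func1(x) + 3", f2)

e1 <- new.env()
source(f1, local = e1)

# Isolated environment: func3 cannot find func1
# (unless a func1 happens to exist further up the search path).
isolated <- new.env()
source(f2, local = isolated)
failed <- inherits(try(isolated$func3(1), silent = TRUE), "try-error")

# Chained environment: lookup falls through from e2 to its parent e1.
e2 <- new.env(parent = e1)
source(f2, local = e2)
e2$func3(1)
```

The trade-off is that the load order now matters, and everything in file1.R becomes visible to file2.R, not just the functions it needs.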

R writing style - require vs. ::


"Why should one prefer require over :: when writing a function?"

I usually prefer require due to the nice TRUE/FALSE return value that lets me deal with the possibility of the package not being available up front before getting into the code. Crash as early as possible instead of halfway through your analysis.

I only use :: when I need to make sure I am using the correct version of a function, not a version from some other package that is masking the name.
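Both habits can be sketched in a few lines. The helper name needs is hypothetical, and the masking sd function is defined here purely to simulate a name clash with stats::sd.

```r
# Fail early: use require()'s TRUE/FALSE return value before doing any work.
needs <- function(pkg) {
  ok <- suppressWarnings(
    require(pkg, character.only = TRUE, quietly = TRUE)
  )
  if (!ok) {
    stop("Package '", pkg, "' is required but not available", call. = FALSE)
  }
  invisible(TRUE)
}
needs("stats")

# Simulate masking: a local sd() now hides stats::sd on the search path.
sd <- function(x) "masked"
sd(c(1, 2, 3))         # the masking definition is what gets called
stats::sd(c(1, 2, 3))  # :: always reaches the version exported by stats
```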

"On the other hand, :: operator gets the variable from the package, while require loads whole package (at least I hope so), so speed differences came first to my mind. :: must be faster than require."

I think you may be ignoring the effects of lazy loading, which is used by the foreign package according to the first page of its manual. Essentially, packages that use lazy loading defer the loading of objects, such as functions, until the objects are called upon for the first time. So your argument that ":: must be faster than require" is not necessarily true, as foreign is not loading all of its contents into memory when you attach it with require. For full details on lazy loading, see Prof. Ripley's article in R News, Volume 4, Issue 2.
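What :: does to the session can be checked directly. The sketch below uses the tools package only because it ships with R; that choice is an assumption for the example (the discussion above is about foreign). It says nothing about speed or lazy loading as such; it only shows that :: loads the package's namespace without attaching it to the search path.

```r
# Is package:tools on the search path before we touch it?
attached_before <- "package:tools" %in% search()

# Using :: loads the namespace (if needed) and fetches one exported function.
ext <- tools::file_ext("analysis.R")

# The namespace is now loaded...
loaded <- isNamespaceLoaded("tools")

# ...but the search path is unchanged: nothing was attached.
attached_after <- "package:tools" %in% search()
```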

Namespaces without packages

I’ve implemented a comprehensive solution and published it as a package, ‘box’.

Internally, ‘box’ modules use an approach similar to packages; that is, ‘box’ loads the code inside a dedicated namespace environment and then exports selected symbols into a module environment, which is returned to the user and optionally attached. The main difference from packages is that modules are more lightweight and easier to write (each R file is its own module), and can be nested.

Usage of the package is described in detail on its website.


