What Best Practices Do You Use for Programming in R

How to apply Clean Code to R and what are some alternatives specific to R?

R is a multi-paradigm language, so directly adopting an OOP-style way of writing code isn't the best way to improve your R skills. 16 of the usual GoF design patterns are irrelevant or trivial in FP (and hence in R), so limiting yourself to an OO mindset would be counterproductive.

I do recommend reading "Clean Code" (the cosmetics of OO code aren't much different from what FP programmers would consider "clean"), but to really optimize your R code you need to supplement it with FP. Plenty of books are available for that: Michaelson's "An Introduction to Functional Programming Through Lambda Calculus" (if you are completely new to FP), "Functional Thinking" by Neal Ford, and "Becoming Functional" by Joshua Backfield.

FP is pure untapped power that you can use in R. Why limit yourself to OOP?
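As a rough illustration of that point, here is a minimal sketch in base R of how a pattern such as Strategy or Factory collapses into ordinary functions. The names `summarise_with`, `make_scaler`, `twice` and the sample data are made up for this example.

```r
# Strategy pattern as a plain higher-order function: the "strategy" is
# simply a function passed as an argument.
summarise_with <- function(x, strategy) strategy(x)

scores <- c(3, 9, 4, 10)
summarise_with(scores, mean)   # 6.5
summarise_with(scores, max)    # 10

# A closure replaces a small "factory" class: make_scaler() returns a
# function that remembers the value of `factor`.
make_scaler <- function(factor) function(x) x * factor
twice <- make_scaler(2)
twice(scores)                  # 6 18 8 20

# Filter(), Map() and Reduce() are functionals that ship with base R.
evens <- Filter(function(v) v %% 2 == 0, scores)    # 4 10
total <- Reduce(`+`, Map(function(v) v^2, evens))   # 116
```

No classes or interfaces are needed here; swapping behaviour means passing a different function.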

What is the most useful R trick?

str() tells you the structure of any object.
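A quick sketch of what that looks like in practice. The object `fit_info` is hypothetical and only stands in for something unfamiliar you have been handed.

```r
# A nested list standing in for an unfamiliar object (made-up data).
fit_info <- list(
  model = "lm",
  coefs = c(intercept = 1.5, slope = 0.25),
  data  = data.frame(x = 1:3, y = c(2.0, 2.5, 3.1))
)

# Prints a compact summary: type, size and a preview of each component.
str(fit_info)

# max.level limits how deep str() descends, which helps with big objects.
str(fit_info, max.level = 1)

# It works on functions too, showing the argument list.
str(median)
```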

How to learn R as a programming language

For starters, you might want to look at this article by John Cook. Also make sure that you read "The R Inferno".

There are many good resources on the R homepage, but in particular, read "An Introduction to R" and "The R Language Definition".

Some very closely related Stack Overflow questions:

  • books-for-learning-the-r-language
  • what-are-some-good-books-web-resources-and-projects-for-learning-r
  • suggestions-on-way-resources-to-start-learning-statistical-language-r

My favorite book on the subject: "Software for Data Analysis: Programming with R", by John Chambers, the creator of the S language.

Updating R, R packages best practices

For your programming environment, update unless you have good reason not to, and maintain a good test suite for your in-house code. For projects with special needs, use renv to control versions of packages.

For your production environment, use the renv package to keep package versions locked down, and upgrade in a controlled manner if there is an explicit need.
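The renv calls involved might look roughly like the sketch below. These are meant to be run one at a time at the console as the need arises, not as a script; the guard is only there so that nothing executes by accident in a non-interactive session or on a machine without renv.

```r
# Typical renv steps, shown together for reference only.
if (interactive() && requireNamespace("renv", quietly = TRUE)) {
  renv::init()      # once per project: private library plus renv.lock
  renv::snapshot()  # record the package versions currently in use
  renv::restore()   # elsewhere: reinstall exactly what renv.lock lists
  renv::update()    # a deliberate upgrade; re-run tests, then snapshot()
}
```

Committing `renv.lock` to version control is what makes the "controlled upgrade" reviewable: a package bump shows up as a diff.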

Writing robust R code: namespaces, masking and using the `::` operator

GREAT question.

Validation

Writing robust, stable, and production-ready R code IS hard. You said: "Surprisingly, this doesn't seem to bother a lot of programmers out there". That's because most R programmers are not writing production code. They are performing one-off academic/research tasks. I would seriously question the skillset of any coder who claims that R is easy to put into production. Aside from my post on the search/find mechanism, which you have already linked to, I also wrote a post about the dangers of warning. The suggestions there will help reduce complexity in your production code.

Tips for writing robust/production R code

  1. Avoid packages that use Depends and favor packages that use Imports. A package with dependencies stuffed into Imports only is completely safe to use. If you absolutely must use a package that employs Depends, then email the author immediately after you call install.packages().

Here's what I tell authors: "Hi Author, I'm a fan of the XYZ package. I'd like to make a request. Could you move ABC and DEF from Depends to Imports in the next update? I cannot add your package to my own package's Imports until this happens. With R 2.14 enforcing NAMESPACE for every package, the general message from R Core is that packages should try to be 'good citizens'. If I have to load a Depends package, it adds a significant burden: I have to check for conflicts every time I take a dependency on a new package. With Imports, the package is free of side-effects. I understand that you might break other people's packages by doing this. I think it's the right thing to do to demonstrate a commitment to Imports, and in the long run it will help people produce more robust R code."
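To make the conflict burden concrete, here is a minimal sketch of masking using only base R. The hand-written `filter()` is hypothetical; it plays the role of a function that an attached package (for example one pulled in via Depends) puts on the search path.

```r
# stats::filter() is on the search path in a default R session.
x <- c(1, 2, 3, 4, 5)

# A later definition with the same name masks it. Attaching a package
# that exports its own filter() has the same effect.
filter <- function(x, keep) x[keep]

# An unqualified call now finds the new definition first...
filter(x, x > 3)

# ...while `::` always names the intended package explicitly.
moving_avg <- stats::filter(x, rep(1 / 3, 3))

# conflicts() lists names that occur more than once on the search path.
conflicts()
```

Qualifying calls with `::` in scripts, and using Imports/importFrom in packages, keeps the result independent of what happens to be attached.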


  2. Use importFrom. Don't add an entire package to Imports; add only those specific functions that you require. I accomplish this with Roxygen2 function documentation and roxygenize(), which automatically generates the NAMESPACE file. In this way, you can import two packages that have conflicts, as long as the conflicts aren't in the functions you actually need to use. Is this tedious? Only until it becomes a habit. The benefit: you can quickly identify all of your 3rd-party dependencies. That helps with...

  3. Don't upgrade packages blindly. Read the changelog line-by-line and consider how the updates will affect the stability of your own package. Most of the time, the updates don't touch the functions you actually use.

  4. Avoid S4 classes. I'm doing some hand-waving here. I find S4 to be complex, and it takes enough brain power to deal with the search/find mechanism on the functional side of R. Do you really need these OO features? Managing state = managing complexity - leave that for Python or Java =)

  5. Write unit tests. Use the testthat package.

  6. Whenever you R CMD build/check your package, parse the output and look for NOTE, INFO, WARNING. Also, physically scan with your own eyes. There's a part of the build step that notes conflicts but doesn't attach a WARN, etc. to it.

  7. Add assertions and invariants right after a call to a 3rd-party package. In other words, don't fully trust what someone else gives you. Probe the result a little bit and stop() if the result is unexpected. You don't have to go crazy - pick one or two assertions that imply valid/high-confidence results.
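For the importFrom tip in the list above, a roxygen2-documented function might look something like this. `robust_centre()` is a hypothetical function, and `stats::median` is just a convenient stand-in for a 3rd-party function.

```r
#' Robust centre of a numeric vector
#'
#' Hypothetical example function, used only to show the roxygen2 tags.
#'
#' @param x A numeric vector.
#' @return The median of `x`, ignoring missing values.
#' @importFrom stats median
#' @export
robust_centre <- function(x) {
  median(x, na.rm = TRUE)
}

# After running roxygen2 on the package, the generated NAMESPACE should
# contain entries along these lines:
#   export(robust_centre)
#   importFrom(stats,median)
```

The package still has to be listed under Imports in DESCRIPTION; the `@importFrom` tag only controls which of its functions are brought into your namespace.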

I think there's more but this has become muscle memory now =) I'll augment if more comes to me.
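As a sketch of the assertion and unit-testing tips from the list, under the assumption that `lm()` stands in for a call into someone else's package: `safe_fit()` and the two data frames are made up for illustration.

```r
# safe_fit() is a hypothetical wrapper around a "3rd-party" call.
safe_fit <- function(data) {
  fit <- stats::lm(y ~ x, data = data)
  slope <- unname(stats::coef(fit)["x"])

  # Invariants: one or two cheap checks that imply a sane result.
  if (!is.finite(slope)) {
    stop("slope is not finite; check the input data", call. = FALSE)
  }
  stopifnot(length(stats::residuals(fit)) == nrow(data))

  slope
}

good <- data.frame(x = 1:5, y = 2 * (1:5) + 1)
bad  <- data.frame(x = rep(1, 5), y = 1:5)  # constant x: slope comes back NA

# Unit tests for both paths, run only if testthat is installed.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("safe_fit returns the slope or fails loudly", {
    testthat::expect_equal(safe_fit(good), 2)
    testthat::expect_error(safe_fit(bad), "not finite")
  })
}
```

The point is that the failure surfaces at the call site with a clear message, instead of an `NA` quietly propagating through later code.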

Coding principles in R - Looking for a book/web tutorial for writing complex programs in R

UPDATE:

There are two more recent books that you definitely need to check out when writing packages:

Advanced R by Hadley Wickham, which explains environments and other advanced topics.

R Packages by Hadley Wickham, a great guide to package writing.


There isn't one book or style guide for writing R packages; there are numerous books about R that cover package writing, and the R Internals manual gives you a style guide as well.

R coding standards from R internals

The books that contain the most advanced information about R as a programming language are in my view the following two:

R Programming for Bioinformatics by Robert Gentleman

Software for Data Analysis: Programming with R by John Chambers

Both books give a lot of insight into R itself and contain useful style tips. Gentleman focuses on object-oriented programming (as Bioconductor is largely S4 based), and Chambers is difficult to read but a rich mine of information.

Beyond that, there is a lot of information on Stack Overflow to get ideas from:

Coding practice in R: what are the advantages and disadvantages of different styles?

Function commenting conventions in R

any R style guide / checker?

What is your preferred style for naming variables in R?

Common R idioms

But basically you'll have to sit down with your team and agree on a standard. There's no 'best' way, so you just have to agree on one good way that everyone follows, in order to keep the code consistent.



