Organizing R Source Code

Organizing R Source Code

This question is very closely related to: "How to organize large R programs?"

You should consider creating an R package. You can use the package.skeleton function to start with given a set of R files. I also strongly recommend using roxygen to document the package at the beginning, because it's much more difficult to do it after the fact.

Read "Writing R Extensions". The online book "Statistics with R" has a section on this subject. Also take a look at Creating R Packages: A Tutorial by Friedrich Leisch. Lastly, if you're in NY, come to the upcoming NY use-R group meeting on "Authoring R Packages: a gentle introduction with examples".

Just to rehash some suggestions about good practices:

  • A package allows you to use R CMD check which is very helpful at catching bugs; separately you can look at using the codetools package.
  • A package also forces you to do a minimal amount of documentation, which leads to better practices in the long run.
  • You should also consider doing unit testing (e.g. with RUnit) if you want your code to be robust/maintainable.
  • You should consider using a style guide (e.g. Google Style Guide).
  • Use a version control system from the beginning, and if you're going to make your code open source, then consider using github or r-forge.

Edit:

Regarding how do make incremental changes without rebuilding and installing the full package: I find the easiest thing to do is to make changes in your relevant R file and then use the source command to load those changes. Once you load your library into an R session, it will always be lower in the environment (and lower in priority) than the .GlobalEnv, so any changes that you source or load in directly will be used first (use the search command to see this). That way you can have your package underlying and you are overwriting changes as you're testing them in the environment.

Alternatively, you can use an IDE like StatET or ESS. They make loading individual lines or functions out of an R package very easy. StatET is particularly well designed to handle managing packages in a directory-like structure.

How to organize large R programs?

The standard answer is to use packages -- see the Writing R Extensions manual as well as different tutorials on the web.

It gives you

  • a quasi-automatic way to organize your code by topic
  • strongly encourages you to write a help file, making you think about the interface
  • a lot of sanity checks via R CMD check
  • a chance to add regression tests
  • as well as a means for namespaces.

Just running source() over code works for really short snippets. Everything else should be in a package -- even if you do not plan to publish it as you can write internal packages for internal repositories.

As for the 'how to edit' part, the R Internals manual has excellent R coding standards in Section 6. Otherwise, I tend to use defaults in Emacs' ESS mode.

Update 2008-Aug-13: David Smith just blogged about the Google R Style Guide.

Proper R Markdown Code Organization

Often times, I have many reports that need to run the same code with slightly different parameters. Calling all my "stats" functions separately, generating the results and then just referencing is what I typically do. The way to do this is as follows:

---
title: "Untitled"
author: "Author"
date: "August 4, 2015"
output: html_document
---

```{r, echo=FALSE, message=FALSE}
directoryPath <- "rawPath" ##Something like /Users/userid/RDataFile
fullPath <- file.path(directoryPath,"myROutputFile.RData")
load(fullPath)
```

Some Text, headers whatever

```{r}
summary(myStructure$value1) #Where myStructure was saved to the .RData file
```

You can save an RData file by using the save.image() command.

Hope that helps!

Code organisation in R package development

You can't use subfolders without additional setup (like defining a custom makefile). The best you can do is to use prefixes: client-a.r, client-b.r, server-a.r, server-b.r, etc.

How to organize big R functions?

Option 1

One option is to use switch instead of multiple if statements:

myfun <- function(y, type=c("aa", "bb", "cc", "dd" ... "zz")){
switch(type,
"aa" = sub_fun_aa(y),
"bb" = sub_fun_bb(y),
"bb" = sub_fun_cc(y),
"dd" = sub_fun_dd(y)
)
}

Option 2

In your edited question you gave far more specific information. Here is a general design pattern that you might want to consider. The key element in this pattern is that there is not a single if in sight. I replace it with match.function, where the key idea is that the type in your function is itself a function (yes, since R supports functional programming, this is allowed).:

sharpening <- function(x){
paste(x, "General sharpening", sep=" - ")
}

unsharpMask <- function(x){
y <- sharpening(x)
#... Some specific stuff here...
paste(y, "Unsharp mask", sep=" - ")
}

hiPass <- function(x) {
y <- sharpening(x)
#... Some specific stuff here...
paste(y, "Hipass filter", sep=" - ")
}

generalMethod <- function(x, type=c(hiPass, unsharpMask, ...)){
match.fun(type)(x)
}

And call it like this:

> generalMethod("stuff", "unsharpMask")
[1] "stuff - General sharpening - Unsharp mask"
> hiPass("mystuff")
[1] "mystuff - General sharpening - Hipass filter"

How to include (source) R script in other scripts

Here is one possible way. Use the exists function to check for something unique in your util.R code.

For example:

if(!exists("foo", mode="function")) source("util.R")

(Edited to include mode="function", as Gavin Simpson pointed out)

Where in R do I permanently store my custom functions?

Yes, create a package. There are numerous tutorials as well as the Writing R Extensions manual that came with your copy of R.

It may seem like too much work at first, but you will probably be glad that you did this in the longer run.

PS And you can then load that package from ~/.Rprofile. For really short code, you can also define it there.

Document/Scripts management for R code

R comes with several mechanisms for searching for help, most of which naturally use CRAN. Some examples: the sos package, cranberries, crantastic, and rseek. In many cases, these could be adapted to use a local repository (you can find out how to create a local repository in the R manual, which is very easy to do). Otherwise, if you package your scripts and submit them to CRAN, you will naturally have these available to you. I would also highly recommend this presentation on the subject: Creating R Packages, Using CRAN, R-Forge, And Local R Archive Networks And Subversion (SVN) Repositories from Spencer Graves and Sundar Dorai-Raj.

These would require you to put your code in packages, and create documentation, all of which is worth doing anyway. The package documentation turns out to be very useful for both documenting what things do, and helping your find them in the future. You can use roxygen to create this documentation in-line with your code. Also read this related question: Organizing R Source Code.

Alternatively, the help.search() function can be very useful for searching local packages, regardless of whether you have a repository set up.



Related Topics



Leave a reply



Submit