Code Organisation in R Package Development

Code organisation in R package development

You can't use subfolders inside a package's R/ directory without additional setup (like defining a custom makefile). The best you can do is to use prefixes: client-a.r, client-b.r, server-a.r, server-b.r, etc.

How to organize large R programs?

The standard answer is to use packages -- see the Writing R Extensions manual as well as different tutorials on the web.

A package gives you

  • a quasi-automatic way to organize your code by topic
  • a strong incentive to write help files, which makes you think about the interface
  • a lot of sanity checks via R CMD check
  • a chance to add regression tests
  • namespaces.

Just running source() over code works for really short snippets. Everything else should be in a package -- even if you do not plan to publish it, since you can write internal packages for internal repositories.

As for the 'how to edit' part, the R Internals manual has excellent R coding standards in Section 6. Otherwise, I tend to use defaults in Emacs' ESS mode.

Update 2008-Aug-13: David Smith just blogged about the Google R Style Guide.

Organizing R Source Code

This question is very closely related to: "How to organize large R programs?"

You should consider creating an R package. You can use the package.skeleton function to start with given a set of R files. I also strongly recommend using roxygen to document the package at the beginning, because it's much more difficult to do it after the fact.
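As a minimal sketch of that starting point, the snippet below writes two tiny R files into a temporary directory and hands them to package.skeleton() through its code_files argument. The file names, the function bodies, and the package name mypkg are all made up for illustration; your own files would take their place.

```r
# Two throwaway source files standing in for your existing code
# (file names and contents are hypothetical).
src_dir <- tempfile("src")
dir.create(src_dir)
writeLines("client_a <- function(x) x + 1", file.path(src_dir, "client-a.R"))
writeLines("server_a <- function(x) x * 2", file.path(src_dir, "server-a.R"))

# Parent directory in which the skeleton will be created.
pkg_parent <- tempfile("pkgs")
dir.create(pkg_parent)

# Build the skeleton from the existing files.
utils::package.skeleton(
  name = "mypkg",
  code_files = list.files(src_dir, full.names = TRUE),
  path = pkg_parent
)

# Inspect what was generated (DESCRIPTION, NAMESPACE, R/, man/, ...).
skeleton_files <- list.files(file.path(pkg_parent, "mypkg"), recursive = TRUE)
print(skeleton_files)
```

The generated man/ files are only stubs, which is one reason to switch to roxygen comments early rather than filling them in by hand.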

Read "Writing R Extensions". The online book "Statistics with R" has a section on this subject. Also take a look at Creating R Packages: A Tutorial by Friedrich Leisch. Lastly, if you're in NY, come to the upcoming NY use-R group meeting on "Authoring R Packages: a gentle introduction with examples".

Just to rehash some suggestions about good practices:

  • A package allows you to use R CMD check which is very helpful at catching bugs; separately you can look at using the codetools package.
  • A package also forces you to do a minimal amount of documentation, which leads to better practices in the long run.
  • You should also consider doing unit testing (e.g. with RUnit) if you want your code to be robust/maintainable.
  • You should consider using a style guide (e.g. Google Style Guide).
  • Use a version control system from the beginning, and if you're going to make your code open source, then consider using GitHub or R-Forge.
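To give a feel for the codetools suggestion in the list above, here is a small sketch using codetools::checkUsage() on a deliberately sloppy function. The function rescale and its variables are invented for the example; the messages are collected through the report argument rather than printed directly.

```r
# A deliberately sloppy function (hypothetical): `unused` is never read
# and `scale_factor` is not defined anywhere.
rescale <- function(x) {
  unused <- 10
  x * scale_factor
}

# Collect the messages in a character vector instead of printing them.
findings <- character()
codetools::checkUsage(
  rescale,
  name = "rescale",
  all = TRUE,
  report = function(msg) findings <<- c(findings, msg)
)

# Show whatever codetools reported.
cat(findings, sep = "")
```

R CMD check runs similar static checks across the whole package, so this is mostly useful for a quick look at a single function.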

Edit:

Regarding how to make incremental changes without rebuilding and installing the full package: I find the easiest thing to do is to make changes in the relevant R file and then use the source command to load those changes. Once you attach your package in an R session with library, it always sits after .GlobalEnv on the search path (and so has lower priority), so any changes that you source or load in directly will be found first (use the search command to see this). That way the installed package stays underneath, and the definitions you are testing override it in your workspace.
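A rough illustration of that lookup order, using sd() from the stats package as a stand-in for a function in your own package. The "edited" definition is obviously made up; the point is only which version gets found.

```r
# .GlobalEnv is always the first entry on the search path.
search_path <- search()
print(search_path)

# sd() normally lives in the stats package.
original_home <- environmentName(environment(stats::sd))

# Simulate source()-ing an edited copy of the function (hypothetical edit).
edited_file <- tempfile(fileext = ".R")
writeLines('sd <- function(x, na.rm = FALSE) "edited version"', edited_file)
source(edited_file)  # by default this defines sd in .GlobalEnv

# The workspace copy is found before the one in the attached package.
result_after_edit <- sd(c(1, 2, 3))
print(result_after_edit)

# Remove the override to fall back to the package version.
rm("sd", envir = globalenv())
```

One caveat, as far as I understand R's scoping: other functions inside the package still find the original through the package namespace, so the sourced copy only affects calls made from your workspace.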

Alternatively, you can use an IDE like StatET or ESS. They make loading individual lines or functions out of an R package very easy. StatET is particularly well designed to handle managing packages in a directory-like structure.

How to develop a package in R?

Consult the following reference material:

  1. Chapter 1, Creating R packages, of the Writing R Extensions manual. This is the canonical source. It's the ultimate reference point, but not necessarily the best starting point.
  2. A short presentation outlining the key ideas in package development and using the devtools package for development.
  3. Hadley's devtools wiki, particularly the Package basics section.
  4. The R help for ?package.skeleton and ?create in devtools.
  5. The presentation by Uwe Ligges at useR!2010 on package development.
  6. R Packages by Hadley Wickham.

Questions about R package publishing and code visibility

I guess you have a few different questions here. Let's take them in the order you asked them:

What if I add a package which uses another package? Is this package automatically downloaded and loaded too? Or is it in general forbidden for a R package to use another package?

It is certainly not forbidden for an R package to use another R package. In fact, the majority of R packages rely on other packages.

The source code for each R package must include a text-based DESCRIPTION file in the root directory. In this file you will find (among other things) a "Depends" field, and an "Imports" field. Together, these two fields list all the other packages required to use this package. If a user doesn't already have these other packages installed in their local library, R will install them automatically when it installs the requested package.

If your package lists a dependency in "Depends", then the dependency package is attached whenever your package is attached. Thus if you looked at the source code for a package called "foo" and you see that its DESCRIPTION file contains the line

Depends: bar,

you know that when you call library(foo) in your R console, you have effectively done library(bar); library(foo).

This isn't always ideal. The package foo might only need a couple of functions from package bar, and bar might contain some other functions whose names could clash with other commonly used functions. Therefore, in general, if you are writing a package and you only want to use a few functions from another package, it would be better to use "Imports" rather than "Depends" to limit the number of unnecessary symbols being added to your user's search path.
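If you want to see which of the two fields a package uses, read.dcf() can parse a DESCRIPTION file. The sketch below writes a made-up DESCRIPTION for the hypothetical package foo (using "Imports" rather than "Depends" for bar) and reads the two dependency fields back.

```r
# Hypothetical DESCRIPTION for package "foo", which imports "bar"
# instead of depending on it.
description_lines <- c(
  "Package: foo",
  "Version: 0.1.0",
  "Title: Example Package",
  "Imports: bar"
)
description_path <- tempfile("DESCRIPTION")
writeLines(description_lines, description_path)

# read.dcf() parses the "Field: value" format used by DESCRIPTION files.
# Fields that are absent come back as NA.
dependency_fields <- read.dcf(description_path, fields = c("Depends", "Imports"))
print(dependency_fields)
```

With this DESCRIPTION, library(foo) would load bar but not attach it, so none of bar's functions would appear on the user's search path.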



Suppose I want to publish a R package. Within my code, can I use functions from other packages and install and load these packages

Yes, you can use functions from other packages. The simplest way to do this is to include the name of the package in the Depends field of your DESCRIPTION file.

However, when using just a few functions from another package inside your own package, best practice is to use the "Imports" field in the DESCRIPTION file, and use a namespace qualifier for the imported function in your actual R code. For example, if you wanted to use ggplot from the ggplot2 package, then inside your function you would call it ggplot2::ggplot rather than just ggplot.
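Here is a small sketch of that pattern. To keep it runnable without installing anything, it uses sd from the stats package in place of ggplot2::ggplot; the function standardise is hypothetical, and the package containing it would list stats under "Imports" in its DESCRIPTION.

```r
# Hypothetical function from your own package. Its DESCRIPTION would contain
#   Imports: stats
# and the body calls the imported function with a namespace qualifier.
standardise <- function(x) {
  (x - mean(x)) / stats::sd(x)
}

z <- standardise(c(2, 4, 6))
print(z)
```

Because the call is written as stats::sd, it keeps working even if the user has defined or attached something else called sd.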

If you publish your package for others to use, the dependencies will be installed automatically along with your package if the user calls install.packages with the default settings. For example, when I did:

install.packages("fGarch")

I got the associated message:

#> also installing the dependencies ‘timeSeries’, ‘fBasics’, ‘fastICA’


Do I have to implement a message that this and that package is needed and that the user has to install and load it prior to it and I need to implement error catching functions in case the package cannot be found on the pc system?

No, not in general. R will take care of this as long as you have listed the correct packages in your DESCRIPTION file.



When I want to publish a R package, can I use/call Java code within my package/code?

R does not have a native Java API, but you can use your own Java code via the rJava package, which you can list as a dependency for your package. However, there are some users who have difficulty getting Java to run, for example business and academic users who may use R but do not have Java installed and do not have admin rights to install it, so this is something to bear in mind when writing a package.
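For a rough idea of what calling Java through rJava looks like, here is a sketch that wraps a call to the length() method of java.lang.String. The wrapper name java_string_length is invented for the example, and it assumes rJava is installed and can find a working Java installation.

```r
# Sketch only: needs the rJava package and a working Java installation.
# The function name is hypothetical.
java_string_length <- function(text) {
  if (!requireNamespace("rJava", quietly = TRUE)) {
    stop("The rJava package is required for this function.")
  }
  rJava::.jinit()                                    # start the JVM if needed
  jstring <- rJava::.jnew("java/lang/String", text)  # create a java.lang.String
  rJava::.jcall(jstring, "I", "length")              # call int length()
}
```

On a machine with Java set up, java_string_length("hello") should return the string's length as an integer. Inside a real package you would list rJava in DESCRIPTION instead of checking for it at run time.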



For a package which was already published - so let's take just as an example the fGarch package - I would like to see the complete code. How can I see this?

Every package available for download from CRAN has its source code available. In the case of fGarch, its CRAN page contains a link to the gzipped tarball of the source code. You can download this and use untar in R to review all the source code. Alternatively, many packages will have an easily-found repository on GitHub or other source-control sites where you can examine the source code via a browser. For example, you can browse the fGarch source on GitHub here.
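The download-and-untar route can also be done entirely from R. The helper below is a sketch: the function name fetch_package_source and its arguments are made up, and the repos URL is just one CRAN mirror I am assuming you can reach. It needs a network connection when called.

```r
# Hypothetical helper: download a package's source tarball and unpack it.
fetch_package_source <- function(pkg,
                                 dest = tempfile("src"),
                                 repos = "https://cloud.r-project.org") {
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)

  # download.packages() returns a matrix of package names and file paths.
  downloaded <- utils::download.packages(
    pkg, destdir = dest, repos = repos, type = "source"
  )
  if (nrow(downloaded) == 0) {
    stop("Could not download source for package: ", pkg)
  }

  # Unpack the tarball next to it and list the extracted files.
  utils::untar(downloaded[1, 2], exdir = dest)
  list.files(file.path(dest, pkg), recursive = TRUE)
}
```

Calling fetch_package_source("fGarch") should then list the unpacked files, including DESCRIPTION and everything under R/.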



For a package which was already published, is it possible to see and look into all files which were submitted? So like a repository as git where all files are submitted - the code itself and further files which are needed like description files or whatever - and I can see these files and look into them?

Yes, you can look at all the source files for all the packages uploaded to CRAN on GitHub at the unofficial GitHub CRAN mirror here.



Is there code in a R package which I cannot see as an end user? This refers also to my previous question, how can I or which way can I see the whole code in a R package?

As above, you can get the source code for any package via CRAN or GitHub. As you said, you can look at the source code for exported functions just by typing the name of that function into R. For unexported functions, you can do the same with a triple colon. For example, ggplot2:::adjust_breaks allows you to see the function body of the unexported function adjust_breaks from ggplot2. There are some complexities when an object-oriented system like S4, ggproto or R6 is used, or when the source code includes compiled C or C++ code, but I haven't yet come across a situation in which I was not able to find the relevant source code after a minute or two with an R console and a good search engine.
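The same idea can be tried without installing ggplot2. The sketch below uses t.test.default from the stats package, which is registered as an S3 method but not exported, as a stand-in for an unexported function.

```r
# t.test.default is registered by stats as an S3 method but not exported.
is_exported <- "t.test.default" %in% getNamespaceExports("stats")
print(is_exported)

# The triple colon reaches into the namespace regardless of export status.
hidden_fun <- stats:::t.test.default
print(args(hidden_fun))

# getAnywhere() looks the name up in attached packages, loaded namespaces
# and registered S3 methods, and records where it was found.
found <- utils::getAnywhere("t.test.default")
print(found$where)
```

Typing hidden_fun at the console prints the full function body, just as it would for an exported function.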

Where to include text notes in R package directory structure?

The package subdirectories section of Writing R Extensions is the canonical reference (although rather dense and technical).

You could put it in inst/doc, for example; I don't think anything in inst/ is specifically checked.

Alternatively, if you want it on GitHub but not included in the built package, add the file name to .Rbuildignore (your .git, .Rproj.user, etc. should already be there; I'm not sure why you have a Meta directory, as that's usually found in installed packages). That said, I'd suggest putting it in inst/doc so that end users who've installed the package can find it if they want.
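Each line of .Rbuildignore is a Perl-style regular expression, matched case-insensitively against paths relative to the package root. The sketch below imitates that matching with grepl() to show which paths a pair of patterns would exclude; the file name dev-notes.md is hypothetical, and this only approximates what R CMD build does internally.

```r
# Patterns as they would appear, one per line, in .Rbuildignore
# (dev-notes.md is a made-up file name).
ignore_patterns <- c("^dev-notes\\.md$", "^\\.Rproj\\.user$")

# Paths relative to the package root.
candidate_paths <- c(
  "dev-notes.md",
  "inst/doc/dev-notes.md",
  "DESCRIPTION",
  ".Rproj.user"
)

# A path is ignored if any pattern matches it.
is_ignored <- vapply(
  candidate_paths,
  function(path) {
    any(vapply(ignore_patterns, grepl, logical(1),
               x = path, perl = TRUE, ignore.case = TRUE))
  },
  logical(1)
)
print(is_ignored)
```

Anchoring with ^ and $ matters here: the top-level dev-notes.md is excluded, while a copy under inst/doc would still be shipped with the package.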


