Function Commenting Conventions in R


Update, December 2019: the R universe has changed since this answer was originally written in 2011.

My recommended resource is now http://r-pkgs.had.co.nz/

Original answer (links are mostly out of date)

The canonical way to document your functions and make them accessible to others is to make a package. In order for your package to pass the build checks, you have to supply sufficiently detailed help files for each of your functions / datasets.

Check out http://cran.r-project.org/doc/manuals/R-exts.html#Creating-R-packages

This blog post from Rob J Hyndman was very useful and one of the easiest for me to follow: http://robjhyndman.com/researchtips/building-r-packages-for-windows/

I've recently started using roxygen to help with documenting and building packages: http://roxygen.org/

Lots of good resources and people to help when you have questions!

how to make comments appear from custom functions

I see what you mean. If you write a customised function

foo = function(x,y) { ... }

Then you go foo( and hit tab, the code completion pop-up menu will give you the options x = and y =. However, when you type an existing R function such as round(, not only does tab give you the options, but there's an explanation beneath each variable, telling you its role in the function:

[Image: RStudio autocomplete pop-up for round( with a description under each argument]

The only way I could think of doing this for your own functions is to package them in your own customised package, and to make sure the help documentation covers your functions' parameters. This is getting beyond the realm of a Stack Overflow question, but I'll point you to a couple of blogs where I learned the basics of R packages.

The Not So Standard Deviation blog explains how to write a simple package with help documentation, which is precisely what you need to see your customised functions appear with explanations inside RStudio's autocomplete. In a nutshell, you'll need to install roxygen2 and devtools and, for each customised function, comment it thoroughly like this:

[Image: an example function from the blog post, annotated with roxygen2 comments]

(disclaimer: the goofy cat example is the blogger's, not mine)
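
As a rough illustration of what roxygen2-style comments look like (this is not the blogger's example), here is a minimal sketch. The function add_numbers and its arguments are hypothetical; the #' prefix and the @param, @return, @examples and @export tags are standard roxygen2 tags.

```r
#' Add two numbers
#'
#' @param x A numeric vector.
#' @param y A numeric vector to add to `x`.
#' @return The element-wise sum of `x` and `y`.
#' @examples
#' add_numbers(1, 2)
#' @export
add_numbers <- function(x, y) {
  # The roxygen2 block above is what becomes the help page;
  # this ordinary comment is not included in it.
  x + y
}
```

Inside a package directory, running devtools::document() turns these comments into help files, which is what RStudio's autocomplete then draws on.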

Here's a more detailed tutorial on creating R packages, and here's another blog on getting organised with R packages. Good luck!

Documenting functions in an R script

You could do what you'd like with the help of the docstring package https://cran.r-project.org/package=docstring

It allows you to add roxygen style documentation within a function and view that documentation using the typical help file viewer all without needing to convert your code into a full package.

The vignette provides a good introduction to how to use the package https://cran.r-project.org/web/packages/docstring/vignettes/docstring_intro.html

Note: I am the author of the package so this is a bit of self promotion but it seems to be incredibly relevant to the question asked.
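
A minimal sketch of the idea, assuming the docstring package is installed: the function square is a hypothetical example, and the roxygen-style comments sit inside the function body rather than above it. The call to docstring::docstring() is guarded so the script still runs where the package is missing or the session is not interactive.

```r
square <- function(x) {
  #' Square a number
  #'
  #' Returns the square of its input.
  #'
  #' @param x A numeric vector.
  x^2
}

# Only try to open the help viewer in an interactive session
# where docstring is available (an assumption about your setup).
if (interactive() && requireNamespace("docstring", quietly = TRUE)) {
  docstring::docstring(square)
}
```

With library(docstring) loaded, ?square should show the same help page, as described in the vignette.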

Best practices to comment R pipeline %>%

Not really an answer, but too long for a comment--

I personally just put my comments in between commands in the pipe. For example:

object %>%
  command1 %>%

  #* Comment

  command2 %>%
  command3 %>%

  #* Perhaps a
  #* Really long
  #* Comment

  command4

The key, for me, is indenting your comment to the same level as the code it discusses so that I can visualize that it is part of a single block.
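
Here is a small runnable version of the same layout. It is only a sketch: it assumes R 4.1 or later and uses the native pipe |> in place of %>% so that it runs without extra packages, and the data and steps are made up for illustration.

```r
total <- c(4, 9, 16) |>
  sqrt() |>

  #* Round before summing so tiny floating-point
  #* differences cannot leak into the total

  round() |>
  sum()

print(total)
```

The parser skips blank lines and comments after a trailing pipe operator, so the comments do not break the chain.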

What does the dot mean in R – personal preference, naming convention or more?

A dot in a function name can mean any of the following:

  • nothing at all
  • a separator between method and class in S3 methods
  • a way to hide the function from default listings

Possible meanings

1. Nothing at all

The dot in data.frame doesn't separate data from frame, other than visually.

2. Separation of methods and classes in S3 methods

plot is one example of an S3 generic function. Thus plot.lm and plot.glm are the underlying function definitions that are used when calling plot(lm(...)) or plot(glm(...)).
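
The same mechanism can be sketched with a made-up generic. The names describe, describe.lm and describe.default are hypothetical; UseMethod() is the base R function that performs the dispatch on the class of the first argument.

```r
# A generic: it only decides which method to call.
describe <- function(x, ...) UseMethod("describe")

# Methods are named generic.class, which is where the dot comes from.
describe.lm <- function(x, ...) "a linear model"
describe.default <- function(x, ...) "something else"

fit <- lm(mpg ~ wt, data = mtcars)

print(describe(fit))  # dispatches to describe.lm
print(describe(1:3))  # no integer method, so falls back to describe.default
```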

3. To hide internal functions

When writing packages, it is sometimes useful to use leading dots in function names because these functions are somewhat hidden from general view. Functions that are meant to be purely internal to a package sometimes use this.

In this context, "somewhat hidden" simply means that the variable (or function) won't normally show up when you list objects with ls(). To force ls to show these variables, use ls(all.names=TRUE). Using a dot as the first character of a name does not change the variable's scope or value; it only affects whether ls() lists it by default. For example:

x <- 3
.x <- 4

ls()
[1] "x"

ls(all.names=TRUE)
[1] ".x" "x"

x
[1] 3
.x
[1] 4

4. Other possible reasons

Hadley's plyr package uses the convention of leading dots in argument names (for example .data and .fun). This is a mechanism to try to ensure that, when resolving variable names, the values resolve to the user's variables rather than to internal function variables.
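
One way to picture the name clash this convention avoids, assuming it refers to dot-prefixed argument names such as .data and .fun: the wrapper apply_to_each below is hypothetical and is not plyr code. Because its own arguments start with a dot, a user argument called data passes through ... untouched.

```r
# Hypothetical wrapper in the plyr style: its own arguments are dot-prefixed.
apply_to_each <- function(.data, .fun, ...) {
  lapply(.data, .fun, ...)
}

# The user's function happens to have an argument called `data`.
scale_by <- function(v, data) v * data

# `data = 2` does not match `.data`, so it is forwarded to scale_by via `...`.
scaled <- apply_to_each(list(1:3, 4:6), scale_by, data = 2)
print(scaled)
```

Had the wrapper's first argument been called data, the call above would have bound 2 to the wrapper itself instead of forwarding it.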


Complications

This mishmash of different uses can lead to very confusing situations, because these different uses can all get mixed up in the same function name.

For example, to convert a data.frame to a list you use as.list(..)

as.list(iris)

In this case as.list is an S3 generic function, and you are passing a data.frame to it. Thus the S3 method that gets called is as.list.data.frame:

> as.list.data.frame
function (x, ...)
{
    x <- unclass(x)
    attr(x, "row.names") <- NULL
    x
}
<environment: namespace:base>
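
When the dots make it hard to tell where the generic ends and the class begins, you can ask R for the method directly instead of guessing from the name. A small sketch using utils::getS3method(), which takes the generic and the class as separate strings:

```r
# The class of the object drives the dispatch.
print(class(iris))

# Look up the method by generic name and class name, given separately.
method <- utils::getS3method("as.list", "data.frame")
print(is.function(method))

# Calling the generic gives a plain list, no longer a data.frame.
result <- as.list(iris)
print(is.data.frame(result))
print(names(result))
```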

And for something truly spectacular, load the data.table package and look at the function as.data.table.data.frame:

> library(data.table)

> methods(as.data.table)
[1] as.data.table.data.frame* as.data.table.data.table* as.data.table.matrix*

Non-visible functions are asterisked

> data.table:::as.data.table.data.frame
function (x, keep.rownames = FALSE)
{
    if (keep.rownames)
        return(data.table(rn = rownames(x), x, keep.rownames = FALSE))
    attr(x, "row.names") = .set_row_names(nrow(x))
    class(x) = c("data.table", "data.frame")
    x
}
<environment: namespace:data.table>

Are there any official naming conventions for R?

The R Developer Page contains "more or less finalized ideas and plans for the R statistical system" from R-core. It does not contain any information about naming conventions. A brief look at the core R code will confirm this.
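
As a quick illustration of that brief look, the sketch below checks that functions in several different naming styles all exist in a default R session. The selection of names and the style labels are my own choice, not an official list.

```r
# One function per naming style, all from packages attached by default.
styles <- c(
  dotted      = "read.csv",   # utils
  camelCase   = "colMeans",   # base
  underscore  = "seq_along",  # base
  capital_dot = "Sys.time",   # base
  lowercase   = "nchar"       # base
)

# Check that each name resolves to an existing object.
found <- vapply(styles, exists, logical(1))
print(found)
```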

What is your preferred style for naming variables in R?

Good previous answers so just a little to add here:

  • underscores are really annoying for ESS users, because ESS has traditionally mapped the underscore key to the assignment operator <-; given that ESS is pretty widely used you won't see many underscores in code authored by ESS users (and that set includes a bunch of R Core as well as CRAN authors, exceptions like Hadley notwithstanding);

  • dots are evil too because they can get mixed up in simple method dispatch; I believe I once read comments to this effect on one of the R mailing lists: dots are a historical artifact and no longer encouraged;

  • so we have a clear winner still standing in the last round: camelCase. I am also not sure if I really agree with the assertion of 'lacking precedent in the R community'.

And yes: pragmatism and consistency trump dogma. So whatever works and is used by colleagues and co-authors. After all, we still have white-space and braces to argue about :)


