Scoping of variables in aes(...) inside a function in ggplot

Let's look at the structure of a non-rendered ggplot object to see what's going on:

gg.str <- function() {
     i=2
     str(ggplot(df,aes(x=x,y=df[,i]))+geom_line())
}

gg.str()
List of 9
 $ data       :'data.frame':    91 obs. of  3 variables:
  ..$ x : num [1:91] 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 ...
  ..$ y1: num [1:91] 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 ...
  ..$ y2: num [1:91] -0.208 -0.28 -0.335 -0.373 -0.393 ...
 $ layers     :List of 1
  ..$ :Classes 'proto', 'environment' <environment: 0x0000000009886ca0> 
 $ scales     :Reference class 'Scales' [package "ggplot2"] with 1 fields
  ..$ scales: list()
  ..and 21 methods, of which 9 are possibly relevant:
  ..  add, clone, find, get_scales, has_scale, initialize, input, n, non_position_scales
 $ mapping    :List of 2
  ..$ x: symbol x
  ..$ y: language df[, i]
 $ theme      : list()
 $ coordinates:List of 1
  ..$ limits:List of 2
  .. ..$ x: NULL
  .. ..$ y: NULL
  ..- attr(*, "class")= chr [1:2] "cartesian" "coord"
 $ facet      :List of 1
  ..$ shrink: logi TRUE
  ..- attr(*, "class")= chr [1:2] "null" "facet"
 $ plot_env   :<environment: R_GlobalEnv> 
 $ labels     :List of 2
  ..$ x: chr "x"
  ..$ y: chr "df[, i]"
 - attr(*, "class")= chr [1:2] "gg" "ggplot"

As we can see, the mapping for y is simply an unevaluated expression, df[, i]. When we ask for the actual plot, that expression is evaluated within plot_env, which here is the global environment, so the local i defined inside the function is not visible. I do not know why it is done this way; I believe there are reasons for it.

Here's a demo that can override this behaviour:

gg.envir <- function(envir=environment()) {
    i=2
    p <- ggplot(df,aes(x=x,y=df[,i]))+geom_line()
    p$plot_env <- envir
    plot(p)
}
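# evaluation in the function's local environment (the default argument); ok
gg.envir()
# evaluation in the global environment when called from the top level
# (same as ggplot's default); fails if there is no global i
gg.envir(environment())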
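
The mechanism described above can be reproduced without ggplot2. This is only a base-R sketch of the idea, not ggplot2's actual code: outer_env, local_env and col_idx are made-up names, with outer_env standing in for the global environment and local_env for the function's own environment.

```r
# Stand-in for the global environment: it knows df but not col_idx.
outer_env <- new.env(parent = baseenv())
outer_env$df <- data.frame(x = 1:3, y1 = c(10, 20, 30))

# Stand-in for the function's local environment: it also knows col_idx.
local_env <- new.env(parent = outer_env)
local_env$col_idx <- 2

# An unevaluated expression, like the y mapping shown by str() above.
mapping_y <- quote(df[, col_idx])

# Evaluating where col_idx is visible works ...
in_local <- eval(mapping_y, local_env)

# ... while evaluating one level up fails, because col_idx is not defined there.
in_outer <- tryCatch(eval(mapping_y, outer_env), error = function(e) e)
```

Here in_local holds the second column of df, while in_outer is an error condition. This mirrors why overriding plot_env makes the plot work.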

Local Variables Within aes

I would capture the local environment and pass it to ggplot():

xy <- data.frame(x=1:10,y=1:10)

plotfunc <- function(Data, YMul = 2){
    .e <- environment()
    ggplot(Data, aes(x = x, y = y*YMul), environment = .e) + geom_line()
}

plotfunc(xy)

ggplot2 variables within function

You're doing several things wrong.

First, everything specified inside aes() should be columns in your data frame. Do not reference separate vectors, or redundantly call columns via data_df[,1]. The whole point of specifying data = data_df is that then everything inside aes() is evaluated within that data frame.

Second, to write functions to create ggplots on different columns based on arguments, you should be using aes_string so that you can pass the aesthetic mappings as characters explicitly and avoid problems with non-standard evaluation.

Similarly, I would not rely on deparse(substitute()) for the plot title. Use some other variable built into the data frame, or some other data structure.

For instance, I would do something more like this:

data_df = data.frame(matrix(rnorm(200), nrow=20))
time=1:nrow(data_df)
data_df$time <- time

graphit <- function(data,column){
    ggplot(data=data, aes_string(x="time", y=column)) + 
        geom_point(alpha=1/4) + 
        ggtitle(column)
}

graphit(data_df,"X1")
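
As an alternative to aes_string, ggplot2 3.0.0 and later accept the .data pronoun inside aes(), which looks a column up by a character name. A minimal sketch under that version assumption; graphit_data and demo_df are made-up names for illustration:

```r
library(ggplot2)

# Same idea as graphit() above, but the column name is resolved through
# the .data pronoun instead of aes_string().
graphit_data <- function(data, column) {
  ggplot(data, aes(x = time, y = .data[[column]])) +
    geom_point(alpha = 1/4) +
    ggtitle(column)
}

set.seed(1)
demo_df <- data.frame(time = 1:20, X1 = rnorm(20))

p <- graphit_data(demo_df, "X1")

# layer_data() returns the data frame the first layer actually draws.
drawn <- layer_data(p, 1)
```

The y values in drawn should match demo_df$X1, which confirms that the string "X1" was resolved to the column of that name.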

ggplot2 inside function with a 2nd aesthetic: scoping issue

When I run into trouble with inherited aesthetics, I always go back and remove anything from the main ggplot() call that doesn't need to be there. Try this:

testFunc <- function(formula = NULL, data = NULL) {
    res <- as.character(formula[[2]])
    fac1 <- as.character(formula[[3]][2])
    fac2 <- as.character(formula[[3]][3])

    # Now add points
    p <- ggplot() + geom_point(data = data, aes_string(x = fac1, y = res, color = fac2,
        group = fac2)) # works fine if we stop here

    # Due to a bug in ggplot2_0.9.3, we must calc some quantities
    # and put them in a separate data frame for a new aesthetic
    avg <- aggregate(data[,res] ~ data[,fac1]*data[, fac2], data, FUN = mean)
    names(avg) <- c("factor1", "factor2", "mean")   
    p <- p + geom_line(aes_string(x = 'factor1', y = 'mean', group = 'factor2'), data = avg)
    p  # return the plot explicitly so that calling testFunc() prints it
}

Use of ggplot() within another function in R

As Joris and Chase have already correctly answered, standard best practice is to simply omit the meansdf$ part and directly refer to the data frame columns.

testplot <- function(meansdf)
{
  p <- ggplot(meansdf, 
              aes(fill = condition,
                  y = means,
                  x = condition))
  p + geom_bar(position = "dodge", stat = "identity")
}

This works, because the variables referred to in aes are looked for either in the global environment or in the data frame passed to ggplot. That is also the reason why your example code - using meansdf$condition etc. - did not work: meansdf is neither available in the global environment, nor is it available inside the data frame passed to ggplot, which is meansdf itself.

The fact that variables are looked up in the global environment instead of the calling environment is actually a known bug in ggplot2 that Hadley does not consider fixable at the moment. This leads to problems if one wishes to use a local variable, say scale, to influence the data used for the plot:

testplot <- function(meansdf)
{
  scale <- 0.5
  p <- ggplot(meansdf, 
              aes(fill = condition,
                  y = means * scale,   # does not work, since scale is not found
                  x = condition))
  p + geom_bar(position = "dodge", stat = "identity")
}

A very nice workaround for this case is provided by Winston Chang in the referenced GitHub issue: explicitly setting the environment parameter to the current environment in the call to ggplot.
Here's what that would look like for the above example:

testplot <- function(meansdf)
{
  scale <- 0.5
  p <- ggplot(meansdf, 
              aes(fill = condition,
                  y = means * scale,
                  x = condition),
              environment = environment())   # This is the only line changed / added
  p + geom_bar(position = "dodge", stat = "identity")
}

## Now, the following works
testplot(means)
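
For what it's worth, ggplot2 3.0.0 and later capture each aes() expression together with the environment it was written in, so the local-variable case should work there without the environment argument. A minimal sketch, assuming a recent ggplot2; testplot_local, scale_factor and means_demo are made-up names, and geom_col() stands in for geom_bar(stat = "identity"):

```r
library(ggplot2)

testplot_local <- function(meansdf) {
  scale_factor <- 0.5   # local variable, not visible from the global environment
  ggplot(meansdf,
         aes(fill = condition,
             y = means * scale_factor,
             x = condition)) +
    geom_col(position = "dodge")
}

means_demo <- data.frame(condition = c("a", "b"), means = c(2, 4))
p <- testplot_local(means_demo)

# Inspect the values the bar layer actually uses.
bars <- layer_data(p, 1)
```

If the local variable was found, the bar heights in bars$y are the original means multiplied by 0.5.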

How to use earlier declared variables within aes in ggplot with special operators (..count.., etc.)

There seems to be a bug in ggplot() when a mapping uses a computed statistic (for example y=..count..). Normally ggplot() carries an environment with it, so aes() can use variables defined outside the call, but that lookup fails for expressions that involve a stat.

For example this will work because k is used only to change x variable:

k<-5
ggplot(dframe,aes(val/k,y=..count..))+geom_bar()

This will give an error because k is used to change y that is calculated with stat y=..count..

k<-5
ggplot(dframe,aes(val,y=..count../k))+geom_bar()
Error in eval(expr, envir, enclos) : object 'k' not found

To solve this problem you can define k inside the aes().

k <- 5
ggplot(dframe,aes(val,k=k,y=..count../k))+geom_bar()
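
In ggplot2 3.3.0 and later, after_stat() is the documented replacement for the ..count.. notation, and an outside variable such as k should be found without the k=k trick. A minimal sketch under that version assumption; dframe_demo is a made-up stand-in for the dframe used above:

```r
library(ggplot2)

dframe_demo <- data.frame(val = c(1, 1, 2, 2, 2, 3))
k <- 5

# count is computed by the stat; k comes from the surrounding environment.
p <- ggplot(dframe_demo, aes(val, y = after_stat(count / k))) +
  geom_bar()

# The built layer data keeps both the raw count and the rescaled y.
bars <- layer_data(p, 1)
```

Comparing bars$y with bars$count / k is a quick way to check that the division was applied to the computed counts.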

R ggplot2 - Understanding the parameters of the aes function

Consider the code chunk below:

library(ggplot2)

df <- data.frame(
  x = c(1, 2), y = c(2, 1)
)

ggplot(df, aes(x, y + 1)) +
  geom_point(colour = "green") +
  geom_line(aes(colour = "blue"))

Here, aes(x, y + 1) means aes(x = x, y = y + 1), which maps the x and y aesthetics that some layers understand to the x column and to the y column plus one. This works because aes() has three arguments: x, y and .... Since x = x is not spelled out, the first argument x is matched to the x parameter by its position in the call. Aesthetics other than x or y must be named, for example aes(size = 10); they are passed through ... and become part of the mapping (a set of name-expression pairs).

Because the expression y = y + 1 is evaluated using 'non standard evaluation' in aes(), the scoping rules change and the variable y will first be attempted to be evaluated in the context of the data columns and not in the global environment, and hence we can 'calculate' the +1 on the dataframe columns.

It's not the aes() function that determines what are valid argument = value mappings; it is the layers that accept or reject parameters. You can find the parameters a layer accepts in the documentation of the layer. For example, in ?geom_point you see that it understands x, y, alpha, colour, fill, group, shape, size and stroke. You should be able to retrieve the same list by calling your_geom_layer$geom$aesthetics(). Extension packages can define their own layers with their own aesthetics, such as area in the {treemapify} package.
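
As a quick check of the point about layers deciding which aesthetics are valid, the list can be pulled from a layer object. A small sketch; point_aes and required are just illustrative variable names:

```r
library(ggplot2)

# Every aesthetic the point geom understands, as a character vector.
point_aes <- geom_point()$geom$aesthetics()

# The subset that must be supplied for the layer to draw anything.
required <- GeomPoint$required_aes

print(point_aes)
print(required)
```

The exact contents and order of point_aes may vary between ggplot2 versions, so it is safer to test for membership than to compare against a fixed vector.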

Additionally, because we've defined aes(x, y + 1) in the main ggplot() call, it will be applied to every geometry or stat layer in that plot, in this case the points and the line. Hence, we do not need to repeat the same mapping in every layer; it is inherited unless you set inherit.aes = FALSE in a layer.

In the point layer we've defined colour = "green" outside the aes() function, so it will be interpreted literally (and follows standard evaluation with the usual scoping rules). People also call this a 'static' mapping, and you can only use this in layers and not globally. In contrast, because we've defined aes(colour = "blue") in the line layer, the "blue" will be interpreted as a categorical variable that participates in a colour scale that has its own palette (a 'dynamic' mapping). If you execute the code, you'll see that the line is not blue, but a salmon-ish colour with a legend that maps the categorical value "blue" to a discrete scale with a 1-colour palette. Because "blue" is a string constant rather than a column in the dataframe or a variable in the global environment, it will be interpreted as a length 1 vector that will be recycled to fit the number of rows in the dataframe.

In general, if you want to map something to a scale (including position scales such as x and y), you put it inside aes(). If you want to have a literal interpretation, you put it outside aes() at the relevant layer.
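
The difference between setting and mapping can be seen in the data each layer ends up drawing. A minimal sketch that rebuilds the plot from the code chunk above and inspects it with layer_data(); df_demo, point_layer and line_layer are illustrative names:

```r
library(ggplot2)

df_demo <- data.frame(x = c(1, 2), y = c(2, 1))

p <- ggplot(df_demo, aes(x, y + 1)) +
  geom_point(colour = "green") +      # set: used literally
  geom_line(aes(colour = "blue"))     # mapped: goes through a colour scale

point_layer <- layer_data(p, 1)
line_layer  <- layer_data(p, 2)

# The points keep the literal value; the line gets whatever colour the
# discrete scale assigns to the single category "blue".
print(unique(point_layer$colour))
print(unique(line_layer$colour))
```

Both layers inherit the y + 1 mapping, so point_layer$y holds the original y values plus one.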

Pass variable or expression into `aes`

Converted to an answer, as it seems useful

aes is designed to evaluate unquoted column names within the scope of the provided dataset (dset in your case). dset[, i] is not a column name, rather a whole column which aes wasn't designed to deal with.

Fortunately, you can pass quoted column names to aes_string. Thus, using

aes_string(x = names(dset)[i])

instead of

aes(x = dset[, i])

should solve your problem.

When does the argument go inside or outside aes()?

This issue and more specifically the difference in the output from the two mentioned commands are explicitly dealt with in Section 5.4.2 of the 2nd edition of "ggplot2. Elegant graphics for data analysis", by Hadley Wickham himself:

Either:

you can map (inside aes) a variable of your data to an aesthetic, e.g., aes(..., color = VarX), or ...
you can set (outside aes, but inside a geom element) an aesthetic to a constant value e.g. "blue"

In the first case, mapping an aesthetic such as color, ggplot2 chooses a color based on a kind of uniform average of all available colors (on the color wheel), because the values of the mapped variable are all constant; why should the chosen color coincide with the constant value you happened to choose to map? More explicitly, if you try the command:

ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y =hwy, color = "foo"))

you get exactly the same output plot as in the first command of the original question.
