Scoping of variables in aes(...) inside a function in ggplot
Let's return a non-rendered ggplot object to see what's going on:
gg.str <- function() {
i=2
str(ggplot(df,aes(x=x,y=df[,i]))+geom_line())
}
gg.str()
List of 9
$ data :'data.frame': 91 obs. of 3 variables:
..$ x : num [1:91] 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 ...
..$ y1: num [1:91] 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 ...
..$ y2: num [1:91] -0.208 -0.28 -0.335 -0.373 -0.393 ...
$ layers :List of 1
..$ :Classes 'proto', 'environment' <environment: 0x0000000009886ca0>
$ scales :Reference class 'Scales' [package "ggplot2"] with 1 fields
..$ scales: list()
..and 21 methods, of which 9 are possibly relevant:
.. add, clone, find, get_scales, has_scale, initialize, input, n, non_position_scales
$ mapping :List of 2
..$ x: symbol x
..$ y: language df[, i]
$ theme : list()
$ coordinates:List of 1
..$ limits:List of 2
.. ..$ x: NULL
.. ..$ y: NULL
..- attr(*, "class")= chr [1:2] "cartesian" "coord"
$ facet :List of 1
..$ shrink: logi TRUE
..- attr(*, "class")= chr [1:2] "null" "facet"
$ plot_env :<environment: R_GlobalEnv>
$ labels :List of 2
..$ x: chr "x"
..$ y: chr "df[, i]"
- attr(*, "class")= chr [1:2] "gg" "ggplot"
As we can see, the mapping for y is simply an unevaluated expression. When we ask for the actual plotting, the expression is evaluated within plot_env, which here is the global environment. I do not know why it is done this way; I assume there are reasons for it.
Here's a demo that can override this behaviour:
gg.envir <- function(envir=environment()) {
i=2
p <- ggplot(df,aes(x=x,y=df[,i]))+geom_line()
p$plot_env <- envir
plot(p)
}
# evaluation in local environment; ok
gg.envir()
# evaluation in global environment (same as default); fails if no i
gg.envir(environment())
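The str() output above comes from an older ggplot2 release. As far as I know, since ggplot2 3.0.0 aes() stores each mapping as a quosure, which records the environment it was written in, so the lookup no longer depends on plot_env. A minimal sketch for checking this on an installed version (capture_mapping and the small df are hypothetical names made up for illustration; rlang is assumed to be available, as it is a ggplot2 dependency):

```r
library(ggplot2)

# Hypothetical stand-in for the df used above.
df <- data.frame(x = 1:3, y1 = 1:3, y2 = c(3, 1, 2))

capture_mapping <- function() {
  i <- 3  # local variable referenced inside aes()
  # Return the mapping together with this function's environment.
  list(mapping = aes(x = x, y = df[, i]), env = environment())
}

res <- capture_mapping()

# Environment recorded by the y mapping (assumes ggplot2 >= 3.0.0).
mapping_env <- rlang::quo_get_env(res$mapping$y)

# TRUE when the mapping remembers the function's local environment.
same_env <- identical(mapping_env, res$env)
```

If same_env is TRUE, the local i will be found at plotting time without touching plot_env.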
Local Variables Within aes
I would capture the local environment and pass it to ggplot():
xy <- data.frame(x=1:10,y=1:10)
plotfunc <- function(Data, YMul = 2){
.e <- environment()
ggplot(Data, aes(x = x, y = y*YMul), environment = .e) + geom_line()
}
plotfunc(xy)
ggplot2 variables within function
You're doing several things wrong.
First, everything specified inside aes() should be a column in your data frame. Do not reference separate vectors, or redundantly call columns via data_df[,1]. The whole point of specifying data = data_df is that everything inside aes() is then evaluated within that data frame.
Second, to write functions that create ggplots on different columns based on arguments, you should use aes_string so that you can pass the aesthetic mappings explicitly as character strings and avoid problems with non-standard evaluation.
Similarly, I would not rely on deparse(substitute()) for the plot title. Use some other variable built into the data frame, or some other data structure.
For instance, I would do something more like this:
data_df = data.frame(matrix(rnorm(200), nrow=20))
time=1:nrow(data_df)
data_df$time <- time
graphit <- function(data,column){
ggplot(data=data, aes_string(x="time", y=column)) +
geom_point(alpha=1/4) +
ggtitle(column)
}
graphit(data_df,"X1")
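In newer ggplot2 releases aes_string is, to my knowledge, soft-deprecated (since 3.0.0), and the .data pronoun is the suggested replacement for passing a column name as a string. A sketch of the same function under that assumption; the set.seed call and the variable p are additions for illustration:

```r
library(ggplot2)

set.seed(1)  # only to make the random data reproducible
data_df <- data.frame(matrix(rnorm(200), nrow = 20))
data_df$time <- seq_len(nrow(data_df))

graphit <- function(data, column) {
  # .data[[column]] looks up the column whose name is stored in `column`.
  ggplot(data, aes(x = time, y = .data[[column]])) +
    geom_point(alpha = 1/4) +
    ggtitle(column)
}

p <- graphit(data_df, "X1")
```

The call stays the same as before: the column is still passed as a character string.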
ggplot2 inside function with a 2nd aesthetic: scoping issue
When I run into trouble with inherited aesthetics, I always go back and remove anything from the main ggplot() call that doesn't need to be there. Try this:
testFunc <- function(formula = NULL, data = NULL) {
res <- as.character(formula[[2]])
fac1 <- as.character(formula[[3]][2])
fac2 <- as.character(formula[[3]][3])
# Now add points
p <- ggplot() + geom_point(data = data, aes_string(x = fac1, y = res, color = fac2,
group = fac2)) # works fine if we stop here
# Due to a bug in ggplot2_0.9.3, we must calc some quantities
# and put them in a separate data frame for a new aesthetic
avg <- aggregate(data[,res] ~ data[,fac1]*data[, fac2], data, FUN = mean)
names(avg) <- c("factor1", "factor2", "mean")
p <- p + geom_line(aes_string(x = 'factor1', y = 'mean', group = 'factor2'), data = avg)
p # return the plot visibly so it prints when the function is called at top level
}
Use of ggplot() within another function in R
As Joris and Chase have already correctly answered, standard best practice is to simply omit the meansdf$ part and refer directly to the data frame columns.
testplot <- function(meansdf)
{
p <- ggplot(meansdf,
aes(fill = condition,
y = means,
x = condition))
p + geom_bar(position = "dodge", stat = "identity")
}
This works because the variables referred to in aes are looked for either in the global environment or in the data frame passed to ggplot. That is also the reason why your example code, which used meansdf$condition etc., did not work: meansdf is neither available in the global environment, nor is it available inside the data frame passed to ggplot, which is meansdf itself.
The fact that the variables are looked for in the global environment instead of in the calling environment is actually a known bug in ggplot2 that Hadley does not consider fixable at the moment.
This leads to problems if one wishes to use a local variable, say scale, to influence the data used for the plot:
testplot <- function(meansdf)
{
scale <- 0.5
p <- ggplot(meansdf,
aes(fill = condition,
y = means * scale, # does not work, since scale is not found
x = condition))
p + geom_bar(position = "dodge", stat = "identity")
}
A very nice workaround for this case is provided by Winston Chang in the referenced GitHub issue: explicitly setting the environment parameter to the current environment during the call to ggplot.
Here's what that would look like for the above example:
testplot <- function(meansdf)
{
scale <- 0.5
p <- ggplot(meansdf,
aes(fill = condition,
y = means * scale,
x = condition),
environment = environment()) # This is the only line changed / added
p + geom_bar(position = "dodge", stat = "identity")
}
## Now, the following works
testplot(means)
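For readers on a recent ggplot2: as far as I can tell, from version 3.0.0 on aes() remembers the environment it was called in, so the local variable is found without the environment argument (which the current documentation lists as deprecated). A sketch under that assumption; the means data frame below is hypothetical, since the original one is not shown:

```r
library(ggplot2)

# Hypothetical data standing in for the `means` data frame used above.
means <- data.frame(condition = c("a", "b", "c"), means = c(2, 4, 6))

testplot <- function(meansdf) {
  scale <- 0.5  # local variable referenced inside aes()
  ggplot(meansdf, aes(x = condition, y = means * scale, fill = condition)) +
    geom_bar(position = "dodge", stat = "identity")
}

p <- testplot(means)

# Bar heights as computed by ggplot2 when the plot is built.
bar_heights <- layer_data(p)$y
```

Here bar_heights holds the means column multiplied by the local scale, which shows that the local variable was picked up.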
How to use earlier declared variables within aes in ggplot with special operators (..count.., etc.)
There seems to be a bug in the ggplot() function when you use a stat for plotting (for example y=..count..). The ggplot() function already has an environment argument, so it can normally use variables defined outside the function.
For example, this will work because k is used only to change the x variable:
k<-5
ggplot(dframe,aes(val/k,y=..count..))+geom_bar()
This will give an error because k is used to change y, which is calculated by the stat (y=..count..):
k<-5
ggplot(dframe,aes(val,y=..count../k))+geom_bar()
Error in eval(expr, envir, enclos) : object 'k' not found
To solve this problem you can define k inside aes():
k <- 5
ggplot(dframe,aes(val,k=k,y=..count../k))+geom_bar()
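On ggplot2 3.3.0 or later, the ..count.. notation has a replacement, after_stat(), and in my understanding a variable defined outside aes() is then found without the k=k trick. A small sketch under those assumptions (the contents of dframe are made up for illustration):

```r
library(ggplot2)

# Hypothetical data: three rows of "a" and one row of "b".
dframe <- data.frame(val = c("a", "a", "a", "b"))
k <- 2

# after_stat() marks the expression as using variables computed by the stat.
p <- ggplot(dframe, aes(val, y = after_stat(count / k))) + geom_bar()

# Bar heights after the stat has been applied: counts divided by k.
bar_heights <- layer_data(p)$y
```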
R ggplot2 - Understanding the parameters of the aes function
Consider the code chunk below:
library(ggplot2)
df <- data.frame(
x = c(1, 2), y = c(2, 1)
)
ggplot(df, aes(x, y + 1)) +
geom_point(colour = "green") +
geom_line(aes(colour = "blue"))
Here, aes(x, y + 1) means aes(x = x, y = y + 1), which sets the x and y aesthetics that some layers understand to the x and y columns of the dataframe. This is because aes() has three arguments: x, y and ... (the dots). Because x = x is not spelled out, the first argument x is matched to the x parameter by its position in the function call. Parameters other than x or y must be named, for example aes(size = 10), and get passed through ... to become part of the mapping (a set of name-expression pairs).
Because the expression y = y + 1 is evaluated using 'non-standard evaluation' in aes(), the scoping rules change: the variable y is first looked up among the data columns and not in the global environment, and hence we can 'calculate' the + 1 on the dataframe column.
It's not the aes() function that determines which argument = value mappings are valid; it is the layers that accept or reject parameters. You can find the aesthetics a layer accepts in its documentation. For example, ?geom_point shows that it understands x, y, alpha, colour, fill, group, shape, size and stroke. You should be able to find these again by calling your_geom_layer$geom$aesthetics(). Extension packages can define their own layers with their own aesthetics, such as area in the {treemapify} package.
Additionally, because we've defined aes(x, y + 1) in the main ggplot() call, it will be applied to every geometry or stat layer in that plot, in this case the points and the line. Hence, we do not need to repeat the same mapping in every layer; it is inherited unless you set inherit.aes = FALSE in a layer.
In the point layer we've defined colour = "green" outside the aes() function, so it will be interpreted literally (and follows standard evaluation with the usual scoping rules). People also call this a 'static' mapping, and you can only use this in layers and not globally. In contrast, because we've defined aes(colour = "blue") in the line layer, the "blue" will be interpreted as a categorical variable that participates in a colour scale that has its own palette (a 'dynamic' mapping). If you execute the code, you'll see that the line is not blue, but a salmon-ish colour with a legend that maps the categorical value "blue" to a discrete scale with a 1-colour palette. Because "blue" is neither a column in the dataframe nor a variable in the global environment, it will be interpreted as a length 1 vector that is recycled to fit the number of rows in the dataframe.
In general, if you want to map something to a scale (including position scales such as x and y), you put it inside aes(). If you want a literal interpretation, you put it outside aes() in the relevant layer.
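The difference between mapping and setting can also be checked without looking at the rendered plot. The sketch below rebuilds the plot from this answer and inspects the colours ggplot2 actually assigns to each layer with layer_data(); the variable names point_colours and line_colours are only for illustration:

```r
library(ggplot2)

df <- data.frame(x = c(1, 2), y = c(2, 1))

p <- ggplot(df, aes(x, y + 1)) +
  geom_point(colour = "green") +   # set: used literally
  geom_line(aes(colour = "blue"))  # mapped: goes through a colour scale

# Colours computed for the point layer (layer 1) and the line layer (layer 2).
point_colours <- layer_data(p, 1)$colour
line_colours <- layer_data(p, 2)$colour

# The y positions show that y + 1 was computed on the data column.
point_y <- layer_data(p, 1)$y
```

point_colours contains the literal "green", while line_colours contains whatever colour the discrete scale picked for the single level "blue".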
Pass variable or expression into `aes`
Converted to an answer, as it seems useful
aes is designed to evaluate unquoted column names within the scope of the provided dataset (dset in your case). dset[, i] is not a column name but a whole column, which aes wasn't designed to deal with.
Fortunately, you can pass quoted column names to aes_string. Thus, using
aes_string(x = names(dset)[i])
instead of
aes(x = dset[, i])
should solve your problem.
When does the argument go inside or outside aes()?
This issue, and more specifically the difference in the output of the two commands mentioned, is dealt with explicitly in Section 5.4.2 of the 2nd edition of "ggplot2: Elegant Graphics for Data Analysis", by Hadley Wickham himself:
Either:
- you can map (inside aes) a variable of your data to an aesthetic, e.g., aes(..., color = VarX), or
- you can set (outside aes, but inside a geom element) an aesthetic to a constant value, e.g. "blue".
In the first case, mapping an aesthetic such as color, ggplot2 chooses a color based on a kind of uniform average of all available colors (on the color wheel), because the values of the mapped variable are all constant; why should the chosen color coincide with the constant value you happened to choose to map from? More explicitly, if you try the command:
ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = "foo"))
you get exactly the same output plot as in the first command of the original question.