How to Write an R Function That Evaluates an Expression Within a Data-Frame

How to write an R function that evaluates an expression within a data-frame

The lattice package does this sort of thing in a different way. See, e.g., lattice:::xyplot.formula.

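df <- data.frame(a = 1:5, b = 1:5)   # example data

fn <- function(dat, expr) {
  # capture the unevaluated argument and evaluate it with dat's columns in scope
  eval(substitute(expr), dat)
}
fn(df, a)             # 1 2 3 4 5
fn(df, 2 * a + b)     # 3 6 9 12 15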
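With eval(substitute(expr), dat), any name that is not a column of dat is looked up in fn's own frame and then in its enclosing (global) environment, not in the function that called fn. A common remedy, sketched here on the assumption that you want the caller's local variables to be visible, is to pass parent.frame() as eval()'s third (enclos) argument. The names fn_caller and scale_by are hypothetical.

```r
# Example data, as above
df <- data.frame(a = 1:5, b = 1:5)

# Hypothetical variant of fn(): names that are not columns of `dat`
# are looked up in the caller's environment via eval()'s `enclos` argument
fn_caller <- function(dat, expr) {
  eval(substitute(expr), dat, parent.frame())
}

# Hypothetical caller with a local variable `k`
scale_by <- function(k) {
  fn_caller(df, k * a)   # `a` comes from df, `k` from scale_by's frame
}

scale_by(10)   # 10 20 30 40 50
```

With the original fn, the same call inside scale_by would fail with "object 'k' not found" unless a global k happened to exist.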

R: Evaluate an expression in a data frame with arguments that are passed as an object

From ?lm, on the data argument:

If not found in data, the variables are taken from environment(formula)

In your first case, the formula is created in your eval(expr, df, pf) call, so the environment of the formula is an environment based on df. In the second case, the formula is created in the global environment, which is why it doesn't work.

Because formulas carry their own environment, they can be tricky to handle with non-standard evaluation (NSE).

You could try:

with(mydf,
  {
    print(lm(y~x))
    fml <- y~x
    print(lm(fml))
  }
)

but that probably isn't ideal for you. Short of checking whether any names in the captured parameter resolve to formulas, and re-assigning their environments, you'll have some trouble. Worse, it isn't even necessarily obvious that re-assigning the environment is the right thing to do. In many cases, you do want to look in the formula environment.
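To make the idea of re-assigning a formula's environment concrete, here is a minimal sketch, assuming you do want variables to be found in the data first. It uses base R's environment<- and list2env(); the data frame mydf is made up for illustration (an exact line, so the coefficients are known).

```r
# Made-up data: y is exactly 1 + 2 * x
mydf <- data.frame(x = 1:10, y = 1 + 2 * (1:10))

fml <- y ~ x                      # formula created at top level
environment(fml)                  # the environment where it was created

# Build an environment holding the columns of mydf, whose parent is the
# formula's original environment, and attach it to the formula
environment(fml) <- list2env(as.list(mydf), parent = environment(fml))

fit <- lm(fml)                    # x and y are now found via environment(fml)
coef(fit)
```

Making the original environment the parent keeps the fallback lookup intact, which matters because, as noted above, you often do want names to resolve in the formula's own environment.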

There was a loosely related discussion of this issue on R Chat:

Ben Bolker outlines an issue
Josh O'Brien points to some old references

Evaluation of multiple expression on data frame

This can be done with the following adjustments:

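createFactor <- function(df, column, condition, label){
  df[column] <- NA                       # start the new column as all NA
  for(i in seq_along(label)) {
    # rows where condition i is TRUE get label i; NA entries select no rows,
    # and a later condition overwrites an earlier one on the same row
    df[,column][condition[[i]]] <- label[i]
  }
  return(df)
}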

set.seed(26)
dataset <- matrix(sample(c(NA, 1:5), 25, replace = TRUE), 5)
df <- as.data.frame(dataset)
tempMerge_TEST <- createFactor(df,
                               column='V2',
                               condition=list(df$V1==1, df$V5==5),
                               label=c('medium', 'high'))

Note one important change to how the function is called: condition=c(df$V1==1, df$V5==5) became condition=list(df$V1==1, df$V5==5), i.e. c() was replaced by list(). This is necessary because c() concatenates the two logical vectors into a single vector (of length 10 here), whereas the function needs a list holding two separate conditions.
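A quick check of why the switch to list() matters: condition[[i]] inside createFactor has to return a whole logical vector, and only a list provides that. The two conditions below are made up.

```r
# Two made-up logical conditions over the same three rows
cond_a <- c(TRUE, FALSE, NA)
cond_b <- c(FALSE, FALSE, TRUE)

flat   <- c(cond_a, cond_b)     # one logical vector of length 6
nested <- list(cond_a, cond_b)  # a list with one element per condition

length(flat)     # 6
length(nested)   # 2
nested[[2]]      # FALSE FALSE TRUE: the second condition, intact
flat[[2]]        # FALSE: just the second element of cond_a
```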

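Finally, while you wanted a base R solution, the case_when function from dplyr is pretty helpful for situations like this. Note that case_when uses the first matching condition, whereas createFactor lets later conditions overwrite earlier ones, so a row meeting both conditions is labelled "medium" here but "high" by createFactor: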

library(dplyr)
df %>%
  mutate(V2 = case_when(V1 == 1 ~ "medium",
                        V5 == 5 ~ "high"))
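If you want to stay in base R but keep case_when's first-match behaviour, nested ifelse() calls are one option. This sketch uses made-up data and %in%, so that a missing value in a condition counts as a non-match; that is intended to mirror how case_when treats NA conditions, but treat the equivalence as an assumption.

```r
# Made-up data; NA marks a missing value
df <- data.frame(V1 = c(1, 2, NA, 1, 3),
                 V5 = c(5, 5, 5, 2, NA))

# First matching condition wins; `%in%` returns FALSE (not NA) for
# missing values, so rows with NA fall through to the next condition
df$V2 <- ifelse(df$V1 %in% 1, "medium",
                ifelse(df$V5 %in% 5, "high", NA))

df$V2   # "medium" "high" "high" "medium" NA
```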

Pass expressions to function to evaluate within data.table to allow for internal optimisation

No need for fancy tools; just use base R metaprogramming features.

library(data.table)

my_fun2 = function(my_i, my_j, by, my_data) {
  # build the call my_data[i, j, by] from the caller's unevaluated arguments
  dtq = substitute(
    my_data[.i, .j, .by],
    list(.i=substitute(my_i), .j=substitute(my_j), .by=substitute(by))
  )
  print(dtq)
  eval(dtq)
}

my_fun2(Species == "setosa", sum(Sepal.Length), my_data=as.data.table(iris))
my_fun2(my_j = "Sepal.Length", my_data=as.data.table(iris))

This way data.table receives exactly the same call as when you type the [ call by hand, so it can apply all of its internal optimisations.
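To see that the substitute() approach really produces the call you would type by hand, you can compare the constructed call with a quoted one. Nothing is evaluated, so this sketch does not need data.table at all; build_call and DT are hypothetical names.

```r
# Hypothetical helper: splice the caller's unevaluated arguments into DT[i, j]
build_call <- function(my_i, my_j) {
  substitute(DT[.i, .j],
             list(.i = substitute(my_i), .j = substitute(my_j)))
}

built <- build_call(Species == "setosa", sum(Sepal.Length))
built   # DT[Species == "setosa", sum(Sepal.Length)]

# Same language object as the hand-written call
identical(built, quote(DT[Species == "setosa", sum(Sepal.Length)]))   # TRUE
```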

Note that in data.table we are planning to make substitution easier; see the solution proposed in PR Rdatatable/data.table#4304.

Then, using the extra env argument, substitution will be handled internally for you:

my_fun3 = function(my_i, my_j, by, my_data) {
  my_data[.i, .j, .by, env=list(.i=substitute(my_i), .j=substitute(my_j), .by=substitute(by)), verbose=TRUE]
}
my_fun3(Species == "setosa", sum(Sepal.Length), my_data=as.data.table(iris))
#Argument 'j'  after substitute: sum(Sepal.Length)
#Argument 'i'  after substitute: Species == "setosa"
#...
my_fun3(my_j = "Sepal.Length", my_data=as.data.table(iris))
#Argument 'j'  after substitute: Sepal.Length
#...

In R, how do I evaluate an expression in a specific environment within a function?

I am not sure why expression() doesn't work in this context. However, it works if you write expr as a string and replace expression(expr) with parse(text=expr):

loopFunction <- function(expr,
                         ...) {

  ### Get all 'dots' in a named list
  arguments <- list(...);
  argNames <- names(arguments);

  # lengths() gives the length of each argument after the first two
  if (any(lengths(tail(arguments, -2)) > 1)) {
    stop("Only the first two arguments may have length > 1!");
  }

  for (esIndex in seq_along(arguments[[1]])) {
    for (pwrIndex in seq_along(arguments[[2]])) {
      tempEnvironment <-
        new.env();
      assign(argNames[1], arguments[[1]][esIndex],
             envir = tempEnvironment);
      assign(argNames[2], arguments[[2]][pwrIndex],
             envir = tempEnvironment);
      if (length(arguments) > 2) {
        for (i in 3:length(arguments)) {
          assign(argNames[i], arguments[[i]],
                 envir = tempEnvironment);
        }
      }
      print(argNames);
      print(as.list(tempEnvironment));
      print(ls(tempEnvironment));
      print(get('x', envir=tempEnvironment));
      print(get('n', envir=tempEnvironment));
      # return() exits on the first (x, n) combination, so only that one is evaluated
      return(eval(expr = parse(text = expr), envir = tempEnvironment)$estimate)
    }
  }
}

loopFunction("binom.test(x, n)", x=10, n=30)

Result:

> loopFunction("binom.test(x, n)", x=10, n=30)
[1] "x" "n"
$`x`
[1] 10

$n
[1] 30

[1] "n" "x"
[1] 10
[1] 30
probability of success 
             0.3333333
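An alternative that avoids strings and parse() altogether is to capture the call with substitute() and evaluate it against a list of values, looping over every combination of the first two arguments. This is a sketch of that swapped-in technique rather than a fix of the code above; loop_estimates is a hypothetical name, and it assumes at least two named arguments, of which the first two are the ones to vary.

```r
loop_estimates <- function(expr, ...) {
  ex <- substitute(expr)      # the unevaluated call, e.g. binom.test(x, n)
  caller <- parent.frame()    # where functions such as binom.test are looked up
  args <- list(...)
  if (any(lengths(args[-(1:2)]) > 1)) {
    stop("Only the first two arguments may have length > 1!")
  }
  # Every combination of the first two arguments (the first varies fastest)
  grid <- expand.grid(args[1:2], KEEP.OUT.ATTRS = FALSE)
  estimate <- numeric(nrow(grid))
  for (r in seq_len(nrow(grid))) {
    # Values for this combination plus the fixed remaining arguments
    vals <- c(as.list(grid[r, , drop = FALSE]), args[-(1:2)])
    estimate[r] <- eval(ex, vals, caller)$estimate
  }
  cbind(grid, estimate = estimate)
}

res <- loop_estimates(binom.test(x, n), x = c(10, 20), n = c(30, 40))
res
```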

R: passing expression to an inner function

This is most easily avoided by passing strings into topfn instead of expressions.

topfn <- function(df, ex_txt) 
{
  fn(df, ex_txt) 
}

fn <- function(dfr, expr_txt) 
{        
   eval(parse(text = expr_txt), dfr) 
}

df <- data.frame(a = 1:5, b = 1:5 )
fn(df, "a")                              
fn(df, "2 * a + b")
topfn(df, "a")             
topfn(df, "2 * a + b")

EDIT:

You could let the user pass expressions in, but use strings underneath for your convenience.

Change topfn to

topfn <- function(df, ex) 
{
  ex_txt <- deparse(substitute(ex))
  fn(df, ex_txt) 
}
topfn(df, a)             
topfn(df, 2 * a + b)

ANOTHER EDIT:

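This seems to work. Here substitute() rewrites the call fn(df, ex) so that fn receives the expression the user typed (e.g. 2 * a + b) rather than the symbol ex: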

topfn <- function(df, ex) 
{
  eval(substitute(fn(df, ex)))
}

fn <- function(dfr, expr) 
{        
   eval(substitute(expr), dfr) 
}
fn(df, a)                              
fn(df, 2 * a + b)
topfn(df, a)             
topfn(df, 2 * a + b)
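To see why the substitute() wrapper is needed, compare it with a naive pass-through. In the naive version fn() captures only the symbol ex, which is neither a column of the data frame nor visible from fn(), so evaluation fails. naive_topfn is a hypothetical name used for contrast.

```r
df <- data.frame(a = 1:5, b = 1:5)

fn <- function(dfr, expr) {
  eval(substitute(expr), dfr)
}

# Naive wrapper: fn() sees the symbol `ex`, not the user's expression
naive_topfn <- function(df, ex) fn(df, ex)

# Wrapper from the answer: substitute() splices the user's expression into the call
topfn <- function(df, ex) eval(substitute(fn(df, ex)))

naive_res <- tryCatch(naive_topfn(df, 2 * a + b),
                      error = function(e) conditionMessage(e))
naive_res                 # object 'ex' not found
topfn(df, 2 * a + b)      # 3 6 9 12 15
```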

Evaluate values within expression function

Then you do not need to evaluate anything, since what you need is an expression; you need to substitute:

a <- 2
b <- 1
substitute(expression(b + a), list(b = 1))
# expression(1 + a)
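Base R's bquote() is another way to do the same partial substitution: values wrapped in .() are inserted, and everything else stays symbolic. This sketch swaps in bquote() for substitute(); the list(a = 2) passed to eval() at the end is just a made-up value for a.

```r
b <- 1
call_ab <- bquote(.(b) + a)    # the call 1 + a: b's value inlined, a left as a symbol
call_ab
as.expression(call_ab)         # expression(1 + a)
eval(call_ab, list(a = 2))     # 3
```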

Evaluate expression with in-function variable calculation

Using substitute rather than quote:

add_trend <- function(.data, .f = NULL) {
  .t <- 1:NROW(.data)
  .expr <- substitute(.f)
  eval(.expr)
}
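A short usage sketch with a made-up data frame shows the difference. With substitute(), the expression is evaluated inside add_trend(), where .t exists; with quote(.f), evaluating .f forces the promise in the caller's environment, where .t does not exist. add_trend_quote is a hypothetical name for the quote() version.

```r
add_trend <- function(.data, .f = NULL) {
  .t <- 1:NROW(.data)
  .expr <- substitute(.f)   # the caller's expression, e.g. .t^2
  eval(.expr)               # evaluated in this function's frame, where .t exists
}

add_trend_quote <- function(.data, .f = NULL) {
  .t <- 1:NROW(.data)
  .expr <- quote(.f)        # just the symbol .f
  eval(.expr)               # forces the promise .f in the caller's environment
}

dat <- data.frame(y = c(5, 7, 9))
add_trend(dat, .t^2)        # 1 4 9

quote_res <- tryCatch(add_trend_quote(dat, .t^2),
                      error = function(e) conditionMessage(e))
quote_res                   # object '.t' not found
```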

From help("quote"):

substitute returns the parse tree for the (unevaluated) expression
expr, substituting any variables bound in env.

quote simply returns its argument. The argument is not evaluated and
can be any R expression.
