Update() Inside a Function Only Searches the Global Environment

update() inside a function only searches the global environment?

I've been bitten by this behaviour before too, so I wrote my own version of update. It evaluates everything in the environment of the formula, so it should be fairly robust.

my_update <- function(mod, formula = NULL, data = NULL) {
  call <- getCall(mod)
  if (is.null(call)) {
    stop("Model object does not support updating (no call)", call. = FALSE)
  }
  term <- terms(mod)
  if (is.null(term)) {
    stop("Model object does not support updating (no terms)", call. = FALSE)
  }

  if (!is.null(data)) call$data <- data
  if (!is.null(formula)) call$formula <- update.formula(call$formula, formula)
  env <- attr(term, ".Environment")

  eval(call, env, parent.frame())
}

library(lme4)  # provides lmer()

fake <- data.frame(
  subj = rep(1:5, 4), 
  factor1 = rep(LETTERS[c(1,2,1,2)], each = 5), 
  factor2 = rep(letters[1:2], each = 10), 
  data = sort(rlnorm(20)))

foo <- function() {
  temp <- fake
  model1 <- lmer(data ~ factor1 * factor2 + (1 | subj), temp)
  model1a <- my_update(model1, ~ . - factor1:factor2)
  model1a
}
foo()
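
To see the same idea without lme4, here is a minimal sketch using lm. The names fit_local and local_data are made up for illustration, and the modified call is evaluated by hand rather than through my_update. The data frame exists only inside the fitting function, so a plain update() from the top level cannot find it, while evaluating the call in the formula's environment can.

```r
# Hypothetical example: the data lives only inside the fitting function.
fit_local <- function() {
  local_data <- data.frame(
    x = 1:10,
    y = (1:10)^2 + c(0.3, -0.2, 0.1, 0.4, -0.5, 0.2, -0.1, 0.3, -0.4, 0.1)
  )
  lm(y ~ x, data = local_data)
}
mod <- fit_local()

# update() re-evaluates the stored call in the caller's frame, where
# local_data does not exist, so this attempt fails.
plain <- try(update(mod, . ~ . + I(x^2)), silent = TRUE)
inherits(plain, "try-error")

# Evaluating the modified call in the formula's environment finds local_data.
cl <- getCall(mod)
cl$formula <- update.formula(formula(mod), . ~ . + I(x^2))
refit <- eval(cl, environment(formula(mod)))
names(coef(refit))
```

This relies on the formula keeping a reference to the environment in which it was created, which is the same property my_update uses.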

Function input not recognised - local & global environment issue

I could make it work with the following code.

library(forecast)  # provides tslm()

plottime <- function(x) {  # takes a timetrend list as input
  y <- x[[3]]              # bind the series to a local name for the formula
  t <- tslm(formula = y ~ trend)
  plot(x[[3]])
  lines(t$fitted.values)
  return(t)
}

Not sure why it is happening, maybe the use of indexing x[[3]] in the formula argument is a problem?

Overwriting an object in the global environment after using deparse(substitute()) in a function call

You can fix this by using deparse(substitute(data)) before you do anything to data:

# Let's change your function just a bit
change.name <- function(data){
    # call deparse(substitute()) *before* you do anything to data
    object_name <- deparse(substitute(data))
    for (i in seq_along(data)){
        names(data[[i]]) <- c("a", "b", "c", "d", "e")
    }
    assign(object_name, value = data, envir = globalenv())
}

# Create sample data
my_object1 <- lapply(1:12, function(x) {
    data.frame(u = 1, v = 2, x = 3, y = 4, z = 5)
})
names(my_object1) <- month.name

change.name(my_object1)
ls()
#> [1] "change.name" "my_object1"
head(my_object1, 2)
#> $January
#>   a b c d e
#> 1 1 2 3 4 5
#> 
#> $February
#>   a b c d e
#> 1 1 2 3 4 5

Created on 2018-12-20 by the reprex package (v0.2.1)
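
The order matters because substitute() can only recover the caller's expression while the argument is still an untouched promise. A small sketch of the difference, using the made-up function names name_before and name_after:

```r
# Capture the argument name first, then modify the argument.
name_before <- function(data) {
  nm <- deparse(substitute(data))
  data <- rev(data)
  nm
}

# Modify the argument first: data is now an ordinary local variable,
# so substitute() returns its value instead of the caller's expression.
name_after <- function(data) {
  data <- rev(data)
  deparse(substitute(data))
}

my_vec <- c(1, 2, 3)
name_before(my_vec)  # "my_vec"
name_after(my_vec)   # the deparsed value, not the name
```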

Changing options() in a function environment without changing options() in global environment in R?

This is a good place to use on.exit(). It has the virtue of ensuring that the options get reset to their original values (stored in oo) before the evaluation frame of the function call is exited -- even if that exit is the result of an error.

f <- function(x) {
    oo <- options(scipen = -100)
    on.exit(options(oo))
    print(x)
}

## Try it out
1111
## [1] 1111
f(1111)
## [1] 1.111e+03
1111
## [1] 1111
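
To check the claim about errors, here is a small sketch with a hypothetical function g that changes an option and then fails. The digits option is used only as an example.

```r
g <- function() {
  old <- options(digits = 3)   # options() returns the previous values
  on.exit(options(old))        # restore them however the function exits
  stop("something went wrong")
}

before <- getOption("digits")
try(g(), silent = TRUE)        # the error is caught; on.exit() still ran
identical(getOption("digits"), before)
```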

Update data frame via function doesn't work

test in your function is a copy of the object from your global environment (I'm assuming that's where it is defined). Assignment happens in the current environment unless specified otherwise, so any changes that happen inside the function apply only to the copy inside the function, not the object in your global environment.

And it's good form to pass all necessary objects as arguments to the function.

Personally, I would return(test) at the end of your function and make the assignment outside of the function, but I'm not sure if you can do this in your actual situation.

test.fun <- function (x, test) {
    test[test$v1==x,"v2"] <- 10
    return(test)
}
test <- data.frame(v1=c(rep(1,3),rep(2,3)),v2=0)
(test <- test.fun(1, test))
#  v1 v2
#1  1 10
#2  1 10
#3  1 10
#4  2  0
#5  2  0
#6  2  0

If it is absolutely necessary to modify an object outside your function directly, you need to tell R that you want to assign the local copy of test to the test in the .GlobalEnv.

test.fun <- function (x, test) {
    test[test$v1==x,"v2"] <- 10
    assign('test',test,envir=.GlobalEnv)
    #test <<- test  # This also works, but the above is more explicit.
}
(test.fun(1, test))
#  v1 v2
#1  1 10
#2  1 10
#3  1 10
#4  2  0
#5  2  0
#6  2  0

Using assign or <<- in this fashion is fairly uncommon, though, and many experienced R programmers will recommend against it.
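
One reason assign(..., envir = .GlobalEnv) is more explicit: <<- does not necessarily write to the global environment. It walks up the enclosing environments and modifies the first existing binding it finds. A minimal sketch, with make_counter as a made-up example:

```r
make_counter <- function() {
  count <- 0
  function() {
    # <<- finds count in make_counter's frame and updates it there,
    # not in the global environment.
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()
second <- counter()
second  # 2
```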

update() a model inside a function with local covariate

The problem is that var1 is looked up in the data frame and the model's environment but not within the environment in MyUpdate.

1) To avoid this problem, update the model with not only the revised formula but also a revised data frame containing var1:

MyUpdate <- function(model) {
     mf <- model.frame(model)
     n <- nrow(mf)
     var1 <- rnorm(n)
     update(model, formula = . ~ . + var1, data = data.frame(mf, var1))
}

The above is probably the best of the solutions presented in this answer, as it avoids mucking around with internal structures. It seems to work for lm, glm, multinom and clm. The other solutions below do muck around with internal structures and are therefore less general across model-fitting routines. They all work with lm but may not work with other model-fitting functions.

Test. Here is a test that runs without errors on each of the model-fitting functions mentioned in the question when MyUpdate is defined as above. The solutions in (2) also run all the tests without error. Solution (3) works at least with lm.

model.lm <- lm(Sepal.Length~Species, data=iris)
MyUpdate(model.lm)

model.glm <- glm(Sepal.Length~Species, data=iris)
MyUpdate(model.glm)

library(nnet)
example(multinom)
MyUpdate(bwt.mu)

library(ordinal)
model.clm <- clm(rating ~ temp * contact, data = wine)
MyUpdate(model.clm)

The remaining solutions perform more direct access of internals making them less robust to changing the model function.

2) Messing with Environments

In addition, here are three solutions that involve messing with environments. The first is the cleanest, followed by the second and then the third. The third is the least acceptable, since it actually writes var1 into the model's environment (dangerously overwriting any var1 already there), but it is the shortest. They work with lm, glm, multinom and clm.

Note that we do not really need to put var1 into a data frame nor is it necessary to put the updating formula in quotes and we have changed both in all examples below. Also the return statement can be removed and we have done that too.

2a) The following modifies the environment of the original model to point to a new proxy proto object containing var1 whose parent is the original model environment. Here proto(p, var1 = rnorm(n)) is the proxy proto object (a proto object is an environment with differing semantics) and p is the parent of the proxy.

library(proto)

MyUpdate <- function(model){

     mf <- model.frame(model)
     n <- nrow(mf)
     var1 <- rnorm(n)
     p <- environment(formula(model))

     if (is.null(model$formula)) {
           attr(model$terms, ".Environment") <- proto(p, var1 = var1)
     } else environment(model$formula) <- proto(p, var1 = var1)

     update(model, . ~ . + var1) 
}

# Note: the period is shorthand for "keep everything on this side of the
# formula" (i.e., the left- or right-hand side of the ~), and the + or -
# signs are used to add or remove model terms.

For more information read the Proxies section in this document: http://r-proto.googlecode.com/files/prototype_approaches.pdf

2b) This could alternatively be done without proto, at the expense of replacing the proto(...) call with a few lines of uglier environment manipulation. Here e is the proxy environment.

MyUpdate <- function(model){
     mf <- model.frame(model)
     n <- nrow(mf)
     var1 <- rnorm(n)
     p <- environment(formula(model))

     e <- new.env(parent = p)
     e$var1 <- var1

     if (is.null(model$formula)) attr(model$terms, ".Environment") <- e
     else environment(model$formula) <- e

     update(model, . ~ . + var1)
}
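
The proxy trick in 2b rests on how environments chain to their parents. A minimal sketch, independent of any model object (the names p, e and a are only for illustration):

```r
p <- new.env()            # stands in for the model's original environment
assign("a", 1, envir = p)

e <- new.env(parent = p)  # proxy environment whose parent is p
e$var1 <- 42

# Lookups starting in e fall through to p, so existing variables stay visible.
get("a", envir = e)

# var1 lives only in the proxy; the original environment is left untouched.
exists("var1", envir = p, inherits = FALSE)
```

This is why 2b is safer than 2c: nothing is written into the model's own environment.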

2c) Shortest but the most hackish is to stick var1 into the original model environment:

MyUpdate <- function(model){
     mf <- model.frame(model)
     n <- nrow(mf)
     var1 <- rnorm(n)       

     if (is.null(model$formula)) attr(model$terms, ".Environment")$var1 <- var1
     else environment(model$formula)$var1 <- var1

     update(model, . ~ . + var1)
}

3) eval/substitute

This solution uses eval, which is sometimes frowned upon. It works with lm and glm. With clm it works, except that the output displays the expression that computes var1 rather than var1 itself.

MyUpdate <- function(model) {
     m <- eval.parent(substitute(update(model, . ~ . + rnorm(nrow(model.frame(model))))))
     m$call$formula <- update(formula(model), . ~ . + var1)
     names(m$coefficients)[length(m$coefficients)] <- "var1"
     m
}

REVISED Added additional solutions, simplified (1), got solutions in (2) to run all examples in test section.

R scoping: disallow global variables in function

My other answer is more about what approach you can take inside your function. Now I'll provide some insight on what to do once your function is defined.

To ensure that your function is not using global variables when it shouldn't be, use the codetools package.

library(codetools)

sUm <- 10
f <- function(x, y) {
    sum = x + y
    return(sUm)
}

checkUsage(f)

This will print the message:

<anonymous> local variable ‘sum’ assigned but may not be used (:1)

To see if any global variables were used in your function, you can compare the output of the findGlobals() function with the variables in the global environment.

> findGlobals(f)
[1] "{"  "+"  "="  "return"  "sUm"

> intersect(findGlobals(f), ls(envir=.GlobalEnv))
[1] "sUm"

That tells you that the global variable sUm was used inside f() when it probably shouldn't have been.
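
If you only care about variables and not functions such as + or return, findGlobals() can separate the two with merge = FALSE. A short sketch, where scale_by and factor_global are made-up names and factor_global does not need to exist for the check to run:

```r
library(codetools)

# Hypothetical function that silently depends on a global variable.
scale_by <- function(x) x * factor_global

globals <- findGlobals(scale_by, merge = FALSE)
globals$variables  # includes "factor_global"
globals$functions  # the functions called in the body
```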
