Call by Reference in R (Using Function to Modify an Object)

The two paradigms are replacing the whole object, as you indicate, or writing 'replacement' functions such as

`updt<-` <- function(x, ..., value) {
    ## x is the object to be manipulated, value the object to be assigned
    x$lbl <- paste0(x$lbl, value)
    x
}

with

> d <- data.frame(x=1:3, lbl=letters[1:3])
> d
  x lbl
1 1   a
2 2   b
3 3   c
> updt(d) <- "*"
> d
  x lbl
1 1  a*
2 2  b*
3 3  c*

This is the behavior of, for instance, $<-: it updates in place the element accessed by $. One could think of replacement functions as syntactic sugar for

updt1 <- function(x, ..., value) {
    x$lbl <- paste0(x$lbl, value)
    x
}
d <- updt1(d, value="*")

but the label 'syntactic sugar' doesn't really do justice, in my mind, to the central paradigm involved. Replacement functions enable convenient in-place updates, which differs from the copy-on-change illusion that R usually maintains, and they are really the 'R' way of updating objects. The alternative, ?ReferenceClasses, has more of the feel of other languages but will surprise R users expecting copy-on-change semantics.
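To make the equivalence concrete, here is a minimal sketch that applies the replacement-call syntax to one data frame and the explicit function call to another, then compares them. The names d1 and d2 are made up for the illustration, and the second form is only roughly what R evaluates internally.

```r
# Replacement function from the answer above
`updt<-` <- function(x, ..., value) {
    x$lbl <- paste0(x$lbl, value)
    x
}

d1 <- data.frame(x = 1:3, lbl = letters[1:3])
d2 <- d1

updt(d1) <- "*"                   # replacement-call syntax
d2 <- `updt<-`(d2, value = "*")   # roughly what R evaluates for the line above

identical(d1, d2)
# [1] TRUE
```

In both forms the function returns a modified copy and the result is bound back to the original name; the replacement syntax just hides the reassignment.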

Modify contents of object with call by reference

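For <- (replacement) functions, the last argument must be named 'value': R evaluates add(foo) <- bar as foo <- `add<-`(foo, value=bar), so the right-hand side is always passed by that name.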
So in your case:

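## Assumed from the question: a class "test" with a numeric slot 'val'
setClass("test", representation(val="numeric"))
setGeneric("add<-", function(testA,value) standardGeneric("add<-"))
setReplaceMethod("add",signature = c("test","test"),
  definition=function(testA,value) {
    ## add the value's slot to the target's slot and return the target
    testA@val <- testA@val + value@val
    testA
  })
foo = new("test", val=1)   # example values, chosen only for illustration
bar = new("test", val=2)
add(foo)<-bar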

You may also use a Reference class if you want to avoid the traditional pass-arguments-by-value semantics.
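For comparison, a Reference class version might look like the sketch below. The class name TestRef and the method name add are hypothetical, chosen to mirror the S4 example; this is one possible layout, not the code from the question.

```r
library(methods)

# Hypothetical Reference class with one numeric field, mirroring the S4 class
TestRef <- setRefClass("TestRef",
  fields = list(val = "numeric"),
  methods = list(
    add = function(other) {
      # <<- modifies the field of this object in place
      val <<- val + other$val
      invisible(.self)
    }
  )
)

foo <- TestRef$new(val = 1)
bar <- TestRef$new(val = 2)
foo$add(bar)   # no reassignment of foo needed
foo$val
# [1] 3
```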

Can you pass-by-reference in R?

No.

Objects in assignment statements are immutable. R behaves as if it copies the object, not just the reference (the actual copy is deferred until one of the copies is modified).

> v = matrix(1:12, nrow=4)
> v
           [,1] [,2] [,3]
     [1,]    1    5    9
     [2,]    2    6   10
     [3,]    3    7   11
     [4,]    4    8   12
> v1 = v
> v1[,1]     # fetch the first column 
     [1] 1 2 3 4

(Proviso: the statement above is true for R primitives (e.g., vectors, matrices) and also for functions; I cannot say for certain whether it is true for all R objects, just most of them, including the vast majority of the ones most often used.)
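A short sketch of what this means in practice: modifying the copy, or modifying the argument inside a function, leaves the original matrix untouched. The helper zero_first is a made-up name for the illustration.

```r
v <- matrix(1:12, nrow = 4)
v1 <- v

v1[1, 1] <- 100L     # modifies v1 only
v[1, 1]
# [1] 1

# Hypothetical helper: changes its local copy and returns it
zero_first <- function(m) {
  m[1, 1] <- 0L
  m
}

res <- zero_first(v)
v[1, 1]              # the caller's matrix is unchanged
# [1] 1
res[1, 1]
# [1] 0
```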

If you don't like this behavior, you can opt out of it with the help of an R package. For example, the R.oo package, available on CRAN, lets you mimic pass-by-reference behavior.

modify variable within R function

There are ways as @Dason showed, but really - you shouldn't!

The whole paradigm of R is to "pass by value". @Rory just posted the normal way to handle it - just return the modified value...

Environments are typically the only objects that can be passed by reference in R.

But lately new objects called reference classes have been added to R (they use environments). They can modify their values (but in a controlled way). You might want to look into using them if you really feel the need...
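As a small illustration of the environment case, the sketch below passes an environment to a function that changes it; the names counter and increment are invented for the example.

```r
counter <- new.env()
counter$n <- 0

# The function receives a reference to the same environment, not a copy
increment <- function(env) {
  env$n <- env$n + 1
  invisible(NULL)
}

increment(counter)
increment(counter)
counter$n
# [1] 2
```

Nothing is returned or reassigned here, which is exactly why this style can surprise readers who expect R's usual value semantics.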

Change arguments in a call object

I believe that the pryr package provides some useful functions for manipulating calls:

lma <- lm(mpg ~ cyl, data=mtcars)
lm_call <- lma$call

library(pryr)
modify_call(lm_call,list(weights = runif(32)))

> lm_call2 <- modify_call(lm_call,list(weights = runif(32)))
> eval(lm_call2)

Call:
lm(formula = mpg ~ cyl, data = mtcars, weights = c(0.934802365722135, 
0.983909613220021, 0.762353664264083, 0.23217184189707, 0.850970500381663, 
0.430563687346876, 0.962665138067678, 0.318865151610225, 0.697970792884007, 
0.389103061752394, 0.824285467388108, 0.676439745584503, 0.344414771301672, 
0.292265978176147, 0.925716639030725, 0.517001488478854, 0.726312294835225, 
0.842773627489805, 0.669753148220479, 0.618112818570808, 0.139365098671988, 
0.843711007386446, 0.851153723662719, 0.134744396666065, 0.92681276681833, 
0.00274682720191777, 0.732672147220001, 0.4184603120666, 0.0912447033915669, 
0.427389309043065, 0.721000595251098, 0.614837386412546))

Coefficients:
(Intercept)          cyl  
     38.508       -2.945

You can look inside pryr::modify_call to see what it's doing if you'd like to do it manually, I suppose.
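One way to do it manually in base R, sketched here without claiming it is what pryr does internally, is to assign into the call with $<-, since call objects support list-style modification. The variable w and the seed are assumptions made for the example.

```r
lma <- lm(mpg ~ cyl, data = mtcars)
lm_call <- lma$call

set.seed(1)                      # arbitrary seed, only for reproducibility
w <- runif(nrow(mtcars))

lm_call2 <- lm_call              # lm_call itself is left unchanged
lm_call2$weights <- quote(w)     # add a 'weights' argument that refers to w
lm_call2

fit2 <- eval(lm_call2)           # refit using the modified call
coef(fit2)
```

Storing the symbol w rather than the numeric vector keeps the printed call short, at the cost of requiring w to exist wherever the call is evaluated.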

R: Pass data.frame by reference to a function

Actually, in R (almost) every modification is performed on a copy of the previous data (copy-on-write behavior).

So, for example, inside your function, when you do d$value[i] <- 0, some copies are actually created. You usually won't notice this since it's well optimized, but you can trace it with the tracemem function.

That being said, if your data.frame is not really big, you can stick with your function returning the modified object, since it's just one more copy after all.
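A rough sketch of the return-the-modified-object approach, with tracemem switched on to report copies. The function name zero_even is made up, and tracemem only works if R was built with memory profiling, hence the capabilities() guard.

```r
d <- data.frame(value = c(1, 2, 3, 4))

# Hypothetical function: modifies its local copy and returns it
zero_even <- function(d) {
  d$value[d$value %% 2 == 0] <- 0
  d
}

# tracemem reports whenever the traced object is duplicated
if (capabilities("profmem")) tracemem(d)
d_new <- zero_even(d)
if (capabilities("profmem")) untracemem(d)

d$value       # original is untouched
# [1] 1 2 3 4
d_new$value   # modified copy
# [1] 1 0 3 0
```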

But if your dataset is really big and making a copy every time would be really expensive, you can use data.table, which allows in-place modifications, e.g.:

library(data.table)
d <- data.table(value=c(1,2,3,4))
f <- function(d){
  for(i in 1:nrow(d)) {
    if(d$value[i] %% 2 == 0){
      set(d,i,1L,0) # special function of data.table (see also ?`:=` )
    }
  }
  print(d)
}

f(d)
print(d)

# results :
> f(d)
   value
1:     1
2:     0
3:     3
4:     0
> 
> print(d)
   value
1:     1
2:     0
3:     3
4:     0

N.B.

In this specific case, the loop can be replaced with a "vectorized" and more efficient version e.g. :

d[d$value %% 2 == 0,'value'] <- 0

but maybe your real loop code is much more convoluted and cannot be vectorized easily.
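For completeness, a vectorized update that also modifies by reference could be sketched with data.table's := operator, as below. The wrapper function zero_even is a made-up name, and the data.table package is assumed to be installed.

```r
library(data.table)

d <- data.table(value = c(1, 2, 3, 4))

# := updates the column by reference, so the caller's table changes too
zero_even <- function(dt) {
  dt[value %% 2 == 0, value := 0]
  invisible(dt)
}

zero_even(d)
d$value
# [1] 1 0 3 0
```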

Modify S3 object without returning it?

Here is a reference class implementation, as suggested in one of the comments. The basic idea is to set up a reference class called Stores that has three fields: apples, pears and fruits (edited to be an accessor method). The initialize method is used to initialize a new store, the addApples method adds apples to the store, while the show method is equivalent to print for other objects.

Stores = setRefClass("Stores", 
  fields = list(
    apples = "numeric",
    pears  = "numeric",
    fruits = function(){apples + pears}
  ), 
  methods = list(
    initialize = function(apples, pears){
      apples <<- apples
      pears <<- pears
    },
    addApples = function(i){
      apples <<- apples + i
    },
    show = function(){
      cat(apples, "apples and", pears, "pears")
    }
  )
)

If we initialize a new store and call it, here is what we get

FruitStore = Stores$new(apples = 3, pears = 4)
FruitStore

# 3 apples and 4 pears

Now, invoking the addApples method, let us add 4 apples to the store

FruitStore$addApples(4)
FruitStore

# 7 apples and 4 pears

EDIT. As per Hadley's suggestion, I have updated my answer so that fruits is now an accessor method. It remains updated as we add more apples to the store. Thanks @hadley.
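One consequence of reference semantics worth seeing once: plain assignment of a reference object creates an alias, not a copy, while the built-in copy() method creates an independent object. The sketch below uses a deliberately smaller, hypothetical class (Basket) rather than Stores.

```r
library(methods)

# Hypothetical minimal class, used only to show alias vs. copy
Basket <- setRefClass("Basket",
  fields = list(apples = "numeric"),
  methods = list(
    addApples = function(i) {
      apples <<- apples + i
      invisible(.self)
    }
  )
)

a <- Basket$new(apples = 3)
b <- a            # alias: b and a are the same object
c2 <- a$copy()    # independent copy

a$addApples(4)
a$apples
# [1] 7
b$apples          # changed along with a
# [1] 7
c2$apples         # unaffected
# [1] 3
```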

Replacement functions in R that don't take input

After further information from the OP, it looks as if what is needed is a way to write to the existing variable in the environment that calls the function. This can be done with non-standard evaluation:

check_result <- function(process_list) 
{ 
  # Capture the name of the passed object as a string
  list_name <- deparse(substitute(process_list))

  # Check the object exists in the calling environment
  if(!exists(list_name, envir = parent.frame()))
     stop("Object '", list_name, "' not found")

  # Create a local copy of the passed object in function scope
  copy_of_process_list <- get(list_name, envir = parent.frame())

  # If the process has completed, write its output to the copy
  # and assign the copy to the name of the object in the calling frame
  if(length(copy_of_process_list$process$get_exit_status()) > 0)
  {
    copy_of_process_list$output <- copy_of_process_list$process$read_all_output_lines()
    assign(list_name, copy_of_process_list, envir = parent.frame()) 
  }
  print(copy_of_process_list)
}

This will update res if the process has completed; otherwise it leaves it alone. In either case it prints out the current contents. If this is client-facing code you will want further type-checking logic on the object passed in.

So I can do

res <- run_sh(c("naw.sh", "hello"))

and when I check the contents of res, I have:

res
#> $`process`
#> PROCESS 'sh', running, pid 1112.
#> 
#> $orig_args
#> [1] "naw.sh" "hello" 
#> 
#> $output
#> NULL

and if I immediately run:

check_result(res)
#> $`process`
#> PROCESS 'sh', running, pid 1112.
#> 
#> $orig_args
#> [1] "naw.sh" "hello" 
#> 
#> $output
#> NULL

we can see that the process hasn't completed yet. However, if I wait a few seconds and call check_result again, I get:

check_result(res)
#> $`process`
#> PROCESS 'sh', finished.
#> 
#> $orig_args
#> [1] "naw.sh" "hello" 
#> 
#> $output
#> [1] "hello"     "naw 1"     "naw 2"     "naw 3"     "naw 4"     "naw 5"    
#> [7] "All done."

and without explicitly writing to res, it has updated via the function:

res
#> $`process`
#> PROCESS 'sh', finished.
#> 
#> $orig_args
#> [1] "naw.sh" "hello" 
#> 
#> $output
#> [1] "hello"     "naw 1"     "naw 2"     "naw 3"     "naw 4"     "naw 5"    
#> [7] "All done."
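The same write-back pattern, stripped of the process handling, can be sketched in a few lines. The function append_item and the list basket are hypothetical names invented for this illustration.

```r
append_item <- function(lst, item) {
  # Capture the name of the passed object as a string
  name <- deparse(substitute(lst))

  # Check the object exists in the calling environment
  if (!exists(name, envir = parent.frame()))
    stop("Object '", name, "' not found")

  # Modify a local copy, then assign it back under the caller's name
  updated <- get(name, envir = parent.frame())
  updated$items <- c(updated$items, item)
  assign(name, updated, envir = parent.frame())
  invisible(updated)
}

basket <- list(items = character(0))
append_item(basket, "apple")
basket$items
# [1] "apple"
```

This only works when the argument is a plain variable name; passing an expression such as a list element would need extra handling.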
