Can You Pass-By-Reference in R

Can you pass-by-reference in R?

No.

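R has value semantics for assignment: assigning an object to a new name behaves as if R copied the object, not just the reference, so changing the object through one name never changes it through the other. (Internally, the actual copy is deferred until one of them is modified.)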

> v = matrix(1:12, nrow=4)
> v
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
> v1 = v
> v1[,1] # fetch the first column
[1] 1 2 3 4

(Proviso: the statement above is true for R primitives (e.g., vectors, matrices) and also for functions. I cannot say for certain whether it's true for all R objects, just most of them, including the vast majority of the ones most often used.)
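The transcript above only reads from v1. As a minimal sketch (the modification step is added here purely for illustration), this is what happens when the second name is actually modified:

```r
v <- matrix(1:12, nrow = 4)
v1 <- v          # v1 behaves as an independent copy of v
v1[1, 1] <- 100L # modify through the new name only

v[1, 1]   # still 1: the original is untouched
v1[1, 1]  # 100
```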

If you don't like this behavior, you can opt out of it with the help of an R package. For example, the R.oo package, available on CRAN, lets you mimic pass-by-reference behavior.

R: Pass data.frame by reference to a function

Actually, in R (almost) every modification is performed on a copy of the previous data (copy-on-modify behavior).

So, for example, inside your function, when you do d$value[i] <- 0, some copies are actually created. You usually won't notice this since it's well optimized, but you can trace it with the tracemem function.
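A rough sketch of tracing such copies follows. The function name zero_evens is hypothetical, and tracemem is only available when R was built with memory profiling, hence the capabilities check. The exact messages tracemem prints depend on the R version, so none are shown here.

```r
d <- data.frame(value = c(1, 2, 3, 4))

# hypothetical helper: returns a modified copy of its argument
zero_evens <- function(df) {
  df$value[df$value %% 2 == 0] <- 0
  df
}

# tracemem needs an R build with memory profiling enabled
if (capabilities("profmem")) tracemem(d)

d2 <- zero_evens(d)  # any copy of d made here is reported by tracemem

if (capabilities("profmem")) untracemem(d)

d$value   # the caller's data frame is unchanged
d2$value  # the returned copy carries the modification
```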

That being said, if your data.frame is not really big, you can stick with your function returning the modified object, since it's just one more copy after all.

But if your dataset is really big and making a copy every time would be really expensive, you can use data.table, which allows in-place modifications, e.g.:

library(data.table)
d <- data.table(value=c(1,2,3,4))
f <- function(d){
  for(i in 1:nrow(d)) {
    if(d$value[i] %% 2 == 0){
      set(d,i,1L,0) # special function of data.table (see also ?`:=` )
    }
  }
  print(d)
}

f(d)
print(d)

# results :
> f(d)
   value
1:     1
2:     0
3:     3
4:     0
>
> print(d)
   value
1:     1
2:     0
3:     3
4:     0

N.B.

In this specific case, the loop can be replaced with a "vectorized" and more efficient version e.g. :

d[d$value %% 2 == 0,'value'] <- 0

but maybe your real loop code is much more convoluted and cannot be vectorized easily.
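For comparison, a sketch of the same update written with data.table's := operator, which modifies the column by reference. The wrapper name zero_evens is hypothetical; the point is only that the caller's table changes without being returned and reassigned.

```r
library(data.table)

d <- data.table(value = c(1, 2, 3, 4))

# hypothetical wrapper around an in-place update
zero_evens <- function(dt) {
  # := updates the column in place, so the caller's table changes too
  dt[value %% 2 == 0, value := 0]
  invisible(dt)
}

zero_evens(d)
d$value  # 1 0 3 0
```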

Is R data.table documented to pass by reference as argument?

I think what you're being surprised about is actually R behavior, which is why it's not specifically documented in data.table (maybe it should be anyway, as the implications are more important for data.table).

You were surprised that the object passed to a function had the same address, but this is the same for base R as well:

x = 1:10
address(x)
# [1] "0x7fb7d4b6c820"
(function(y) {print(address(y))})(x)
# [1] "0x7fb7d4b6c820"

What's being copied in the function environment is the pointer to x. Moreover, for base R, the parent x is immutable:

foo = function(y) {
  print(address(y))
  y[1L] = 2L
  print(address(y))
}
foo(x)
# [1] "0x7fb7d4b6c820"
# [1] "0x7fb7d4e11d28"

That is, as soon as we try to edit y, a copy is made. This is related to reference counting; see Luke Tierney's work on this.

The difference for data.table is that data.table enables edit permissions for the parent object -- a double-edged sword as I think you know.
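A small base R sketch of the consequence (no data.table needed; the names are illustrative): the edit inside the function lands on a copy, so the caller's object keeps its value.

```r
x <- 1:10

foo <- function(y) {
  y[1L] <- 2L  # triggers a copy; only the local y changes
  y
}

z <- foo(x)
x[1L]  # 1: the caller's x is unchanged
z[1L]  # 2: the modified copy was returned
```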

R: Passing a data frame by reference

The premise of the question is (partly) incorrect. R works as pass-by-promise, and repeated copying in the manner you outline happens only when further assignments and alterations to the data frame are made as the promise is passed on. So the number of copies will not be N*size where N is the stack depth, but rather where N is the number of levels at which assignments are made. You are correct, however, that environments can be useful. I see on following the link that you have already found the 'proto' package. There is also the relatively recent introduction of "reference classes", sometimes referred to as "R5", where S3 is the original class system of S that was copied in R, and S4 is the more recent class system that seems mostly to support Bioconductor package development.
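As a minimal sketch of the reference-class approach (the class name Counter, its field, and the helper bump are all made up for illustration), an object created with setRefClass can be modified inside a function and the caller sees the change:

```r
library(methods)

# hypothetical reference class with one numeric field
Counter <- setRefClass("Counter",
  fields = list(count = "numeric"),
  methods = list(
    increment = function() {
      count <<- count + 1  # modifies the object's field in place
      invisible(.self)
    }
  )
)

# the function receives a reference, not a copy
bump <- function(obj) obj$increment()

cnt <- Counter$new(count = 0)
bump(cnt)
bump(cnt)
cnt$count  # 2
```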

Here is a link to an example by Steve Lianoglou (in a thread discussing the merits of reference classes) of embedding an environment inside an S4 object to avoid the copying costs:

https://stat.ethz.ch/pipermail/r-help/2011-September/289987.html

Matthew Dowle's 'data.table' package creates a new class of data object whose access semantics using "[" are different from those of regular R data.frames, and which really works as pass-by-reference. It has superior speed of access and processing. It can also fall back on data.frame semantics, since such objects now inherit the 'data.frame' class.

You may also want to investigate Hesterberg's dataframe package.

R, pass-by-value inside a function

You can create a function that stores the current value of a in an environment and then returns a second function that uses the stored value. The value a had at that point is saved as w, and it is not replaced when a changes later.

a <- 1
j <- new.env() # create a new environment
create.func <- function () {
  j$w <<- a
  function (x) {
    x + j$w
  }
}
f <- create.func()
a <- 2
f(2)
[1] 3 # if w was changed this should be 4

Credits to Andrew Taylor (see comments)

EDIT: BE CAREFUL: f will change if you call create.func, even if you do not store it into f. To avoid this, you could write this code (it clearly depends on what you want).

a <- 1
create.func <- function () {
  j <- new.env() # each call gets its own environment
  j$w <- a
  function (x) {
    x + j$w
  }
}
f <- create.func()
f(1)
[1] 2
a <- 2
q <- create.func()
q(1)
[1] 3
f(1)
[1] 2

EDIT 2: Lazy evaluation doesn't apply here because a is evaluated by being set to j$w. If you had used it as an argument say:

function(a)
function(x)
#use a here

you would have to use force before defining the second function, because otherwise a wouldn't be evaluated until the inner function is first called.
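A small sketch of the difference, assuming two hypothetical factories that differ only in the call to force. Without it, the argument remains an unevaluated promise and picks up whatever value a has when the inner function first runs.

```r
# hypothetical factories, identical except for force()
make_adder_lazy <- function(a) {
  function(x) x + a
}

make_adder_forced <- function(a) {
  force(a)  # evaluate the promise now
  function(x) x + a
}

a <- 1
f_lazy <- make_adder_lazy(a)
f_forced <- make_adder_forced(a)
a <- 2

f_lazy(1)    # 3: the promise is evaluated only now, when a is 2
f_forced(1)  # 2: a was captured while it was still 1
```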

EDIT 3: I removed the foo <- etc. The function will return as soon as it is declared, since you want it to be similar to the code factories defined in your link.

EDIT by OP: Just to add to the accepted answer that, in the spirit of "Function Factory in R", the code below works:

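funs.gen <- function(n) {
  force(n) # evaluate n now so each closure keeps its own value
  function(x) {
    x + n
  }
}

# `names` is assumed to be a character vector defined elsewhere
funs = list()
for (i in seq_along(names)) {
  n = names[i]
  funs[[n]] = funs.gen(i)
}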

Rcpp and R: pass by reference

No, R does not make a copy immediately, only if it is necessary, i.e., copy-on-modify:

x <- 1
tracemem(x)
#[1] "<0000000009A57D78>"
y <- x
tracemem(x)
#[1] "<0000000009A57D78>"
x <- 2
tracemem(x)
#[1] "<00000000099E9900>"

Since you modify M by reference outside R, R can't know that a copy is necessary. If you want to ensure a copy is made, you can use data.table::copy. Or avoid the side effect in your C++ code, e.g., make a deep copy there (by using clone).

pass a list by reference in R function

There is no easy way to get pass-by-reference behaviour in R, with one exception: environments. I'm not sure whether an environment suits your needs, but you can give it a try:

modify_input <- function(x){
  x$z <- 1
}

x <- new.env(parent = emptyenv())
modify_input(x)
x$z

As for usage, an environment supports e$z, e[["z"]] and length(e) just like a list, but it doesn't support e[[1]] and the like. You can think of an environment as a dictionary whose elements have no order. If you want to list all the elements in an environment, you can use ls. There are also ways to convert an environment to a list (as.list) and vice versa (list2env). Hope it helps.
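A short sketch of those operations together (the element names and the helper add_field are made up for illustration):

```r
# build an environment from a list
e <- list2env(list(a = 1, b = "two"))

# hypothetical helper: modifies the environment it is given
add_field <- function(env) {
  env$z <- 3
  invisible(NULL)
}

add_field(e)  # the caller's environment is changed in place

ls(e)         # "a" "b" "z" (ls sorts the names)
length(e)     # 3
e[["z"]]      # 3

l <- as.list(e)  # back to an ordinary list; element order is not guaranteed
l$z              # 3
```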


