Copy-On-Modify Semantic on a Vector Does Not Append in a Loop. Why

Copy-on-modify semantic on a vector does not append in a loop. Why?

I complete the @MikeH. awnser with this code

library(pryr)

x = runif(10)
y = numeric(length(x))
print(c(address(y), refs(y)))

for(i in 2:length(x))
{
y[i] = x[i] - x[i-1]
print(c(address(y), refs(y)))
}

print(c(address(y), refs(y)))

The output shows clearly what happened

[1] "0x7872180" "2"        
[1] "0x765b860" "1"
[1] "0x765b860" "1"
[1] "0x765b860" "1"
[1] "0x765b860" "1"
[1] "0x765b860" "1"
[1] "0x765b860" "1"
[1] "0x765b860" "1"
[1] "0x765b860" "1"
[1] "0x765b860" "1"
[1] "0x765b860" "2"

There is a copy at the first iteration. Indeed because of Rstudio there are 2 refs. But after this first copy y belongs in the loops and is not available into the global environment. Then, Rstudio does not create any additional refs and thus no copy is made during the next updates. y is updated by reference. On loop exit y become available in the global environment. Rstudio creates an extra refs but this action does not change the address obviously.

Stop copy-on-modify behavior in R in a while loop

I don't think that's possible for base R, but you can optimise this procedure using the data.table package. Try this and you will see no copy is made (I also did some other minor changes to further optimise your code)

library(data.table)

N <- 1e7; i <- 1:N
dt <- list(x = double(N), todo = logical(N)); setDT(dt)
cat(tracemem(dt), "\n")
cat(tracemem(N), "\n")
cat(tracemem(i), "\n")

while(N > 0L){
set(dt, i, "x", rexp(N, 3) + rexp(N, 3))
set(dt, i, "todo", runif(N, -1, 1) < cos(3.2*pi*dt[i]$x))
N <- length(i <- which(dt$todo))
}

It takes about 3 seconds to run, which is a bit too long. I think there is room for further improvement.

system.time({
N <- 1e7; i <- 1:N
dt <- list(x = double(N), todo = logical(N)); setDT(dt)

while(N > 0L){
set(dt, i, "x", rexp(N, 3) + rexp(N, 3))
set(dt, i, "todo", runif(N, -1, 1) < cos(3.2*pi*dt[i]$x))
N <- length(i <- which(dt$todo))
}
})

Result

 user  system elapsed 
3.26 0.10 3.34

What exactly is copy-on-modify semantics in R, and where is the canonical source?

Call-by-value

The R Language Definition says this (in section 4.3.3 Argument Evaluation)

The semantics of invoking a function in R argument are call-by-value. In general, supplied arguments behave as if they are local variables initialized with the value supplied and the name of the corresponding formal argument. Changing the value of a supplied argument within a function will not affect the value of the variable in the calling frame. [Emphasis added]

Whilst this does not describe the mechanism by which copy-on-modify works, it does mention that changing an object passed to a function doesn't affect the original in the calling frame.

Additional information, particularly on the copy-on-modify aspect are given in the description of SEXPs in the R Internals manual, section 1.1.2 Rest of Header. Specifically it states [Emphasis added]

The named field is set and accessed by the SET_NAMED and NAMED
macros, and take values 0, 1 and 2. R has a 'call by value'
illusion, so an assignment like

b <- a

appears to make a copy of a and refer to it as b. However, if
neither a nor b are subsequently altered there is no need to copy.

What really happens is that a new symbol b is bound to the same
value as a and the named field on the value object is set (in this
case to 2). When an object is about to be altered, the named field
is consulted. A value of 2 means that the object must be duplicated
before being changed. (Note that this does not say that it is
necessary to duplicate, only that it should be duplicated whether
necessary or not.) A value of 0 means that it is known that no other
SEXP shares data with this object, and so it may safely be altered.
A value of 1 is used for situations like

dim(a) <- c(7, 2)

where in principle two copies of a exist for the duration of the
computation as (in principle)

a <- `dim<-`(a, c(7, 2))

but for no longer, and so some primitive functions can be optimized to
avoid a copy in this case.

Whilst this doesn't describe the situation whereby objects are passed to functions as arguments, we might deduce that the same process operates, especially given the information from the R Language definition quoted earlier.

Promises in function evaluation

I don't think it is quite correct to say that a promise is passed to the function. The arguments are passed to the function and the actual expressions used are stored as promises (plus a pointer to the calling environment). Only when an argument gets evaluated is the expression stored in the promise retrieved and evaluated within the environment indicated by the pointer, a process known as forcing.

As such, I don't believe it is correct to talk about pass-by-reference in this regard. R has call-by-value semantics but tries to avoid copying unless a value passed to an argument is evaluated and modified.

The NAMED mechanism is an optimisation (as noted by @hadley in the comments) which allows R to track whether a copy needs to be made upon modification. There are some subtleties involved with exactly how the NAMED mechanism operates, as discussed by Peter Dalgaard (in the R Devel thread @mnel cites in their comment to the question)

Replacement functions in R

There was a change in how R 3.5 stores values in the form a:b. If you try the same example with

library(pryr)
x <- c(1,2,3,4,5,6,7,8,9,10)
address(x)
x[2] <- 7L
address(x)

You should get the same address. Now the 1:10 isn't full expanded until it has to be. And changing an element inside the vector will cause it to expand.

How do I clone a list so that it doesn't change unexpectedly after assignment?

new_list = my_list doesn't actually create a second list. The assignment just copies the reference to the list, not the actual list, so both new_list and my_list refer to the same list after the assignment.

To actually copy the list, you have several options:

  • You can use the builtin list.copy() method (available since Python 3.3):

    new_list = old_list.copy()
  • You can slice it:

    new_list = old_list[:]

    Alex Martelli's opinion (at least back in 2007) about this is, that it is a weird syntax and it does not make sense to use it ever. ;) (In his opinion, the next one is more readable).

  • You can use the built in list() constructor:

    new_list = list(old_list)
  • You can use generic copy.copy():

    import copy
    new_list = copy.copy(old_list)

    This is a little slower than list() because it has to find out the datatype of old_list first.

  • If you need to copy the elements of the list as well, use generic copy.deepcopy():

    import copy
    new_list = copy.deepcopy(old_list)

    Obviously the slowest and most memory-needing method, but sometimes unavoidable. This operates recursively; it will handle any number of levels of nested lists (or other containers).

Example:

import copy

class Foo(object):
def __init__(self, val):
self.val = val

def __repr__(self):
return f'Foo({self.val!r})'

foo = Foo(1)

a = ['foo', foo]
b = a.copy()
c = a[:]
d = list(a)
e = copy.copy(a)
f = copy.deepcopy(a)

# edit orignal list and instance
a.append('baz')
foo.val = 5

print(f'original: {a}\nlist.copy(): {b}\nslice: {c}\nlist(): {d}\ncopy: {e}\ndeepcopy: {f}')

Result:

original: ['foo', Foo(5), 'baz']
list.copy(): ['foo', Foo(5)]
slice: ['foo', Foo(5)]
list(): ['foo', Foo(5)]
copy: ['foo', Foo(5)]
deepcopy: ['foo', Foo(1)]

How to use c++11 move semantics to append vector contents to another vector?

Just do:

#include <iterator>
#include <algorithm>

// ...

void MoveAppend(std::vector<X>& src, std::vector<X>& dst)
{
if (dst.empty())
{
dst = std::move(src);
}
else
{
dst.reserve(dst.size() + src.size());
std::move(std::begin(src), std::end(src), std::back_inserter(dst));
src.clear();
}
}

If dst is empty, a move-assignment from src to dst will do the job - that will be as cheap as it can be, just "stealing" the array encapsulated by src so that dst will point to it afterwards.

If dst is not empty, elements appended to dst will be move-constructed from elements in src. After the call to std::move(), src will not be empty - it will contain "zombie" moved-from elements. That's why the call to clear() is still necessary.

Slice of pointers and loop

When you call range on a collection, the go runtime initialises 2 memory locations; one for the index (in this case _), and one for the value cmd.

What range then does, is take each of the items in the collection and copy them into the memory location that it created when you called range.

This means that each of the items in the slice get put into that memory location one by one.

When you do &cmd you are taking a pointer. This pointer points to the shared memory location that each of the slice items are being copied into.

All of your pointers created using &cmd all point to the same memory location.

This means that after the range is done, the only value left in that memory location that your pointers are pointing to is the last value from the range iteration.

This is why you get the output

update
update
update


Related Topics



Leave a reply



Submit