Avoid Copying the Whole Vector When Replacing an Element (A[1] <- 2)

Avoid copying the whole vector when replacing an element (a[1] - 2)

The tracemem function (R needs to be compiled to support it) provides an indication of when copying occurs. Here's what you do

> a <- 1:1000000; tracemem(a)
[1] "<0x7f791b39e010>"
> a[1] = 2
tracemem[0x7f791b39e010 -> 0x7f791a9d4010]:

and indeed there's a copy. But this is because you're coercing a from an integer vector (1:1000000 creates a sequence of integers) to a numeric vector (because 2 is a numeric value, and R coerces to a common type). If instead you update your integer vector with an integer value, or a numeric vector with a numeric value, there is no copying

> a <- 1:1000000; tracemem(a)
[1] "<0x7f791a4ef010>"
> a[1] = 2L
> a = c(1, 2, 3); tracemem(a)
[1] "<0x5180470>"
> a[1] = 2
>

A little bit further insight comes from understanding at a superficial level how R's memory management works. Each allocation has a NAMED level associated with it. NAMED=0 or 1 indicates that there is at most 1 symbol that refers to it; it is therefore safe to copy in place. NAMED=2 means that there is, or has been, at least 2 symbols pointing to the same location, and that any attempt to update the value requires a duplication to preserve R's illusion of 'copy on change'. The following reveals some of the internal structure of a, including that it of type INTSXP (integer) with NAM(1) (NAMED level 1) and that it's being TRaced. Hence updating (with an integer!) does not require a copy.

> a = 1:10; tracemem(a); .Internal(inspect(a))
[1] "<0x5170818>"
@5170818 13 INTSXP g0c4 [NAM(1),TR] (len=10, tl=0) 1,2,3,4,5,...
> a[1] = 2L
>

On the other had, here two symbols refer to the location in memory, hence NAMED is 2 and a copy is required

> a = b = 1:10; tracemem(a); .Internal(inspect(a))
[1] "<0x576d1a0>"
@576d1a0 13 INTSXP g0c4 [NAM(2),TR] (len=10, tl=0) 1,2,3,4,5,...
> a[1] = 2L
tracemem[0x576d1a0 -> 0x576d148]:

It is difficult to reason about NAMED, so at some level these types of games have a level of futility about them.

inspect returns other information. Each R type is represented internally as an 'SEXP' (S-expression) type. These are enumerate, and the 13th SEXP type is an integer SEXP -- hence 13 INTSXP. Check out .Internal(inspect(...)) for a numeric vector, character vector, or even function .Internal(inspect(function() {})).

R manages memory by periodically running a 'garbage collector' that checks to see if memory is currently referenced; if it is not, then it is reclaimed for use by another symbol. The garbage collector is 'generational', which means that recently allocated memory is checked for reclamation more frequently than older memory (this is because, empirically, variables tend to have a short half-life, e.g., for the duration of a function call, so recently allocated memory is more likely to be available for reclamation than memory that has been in use for a longer time). The g0c4 and similar annotations are providing information about the generation the SEXP belongs to.

The TR represents a 'bit' set in the SEXP to indicate that the variable is being traced; it was set when we said tracemem(a).

Some of these topics are discussed in the documentation of R's internal implementation RShowDoc("R-ints") and in the C header file Rinternals.h.

Unable to avoid copying while pushing objects with copy-construcor into a vector

std::move isn't useful here. The temporary Vertex object is already a prvalue, so casting it to an xvalue doesn't change anything. The class has no move constructor, so copy initialisation cannot move; it has to copy. The implicit move constructor has been inhibited by the user defined copy constructor. Although, the move constructor couldn't be any faster than the copy constructor for this class anyway.

The way that emplace_back works is that it forwards the arguments to the constructor of the element. If the argument that you pass is an object of the element type, then you invoke the constructor that accepts another instance of the class - that is the copy constructor (or the move constructor for classes that have it).

Instead of creating a temporary Vertex object, and passing it as an argument to emplace_back, you should pass the three size_t objects that can be forwarded to the constructor Vertex(size_t, size_t, size_t). This way you can avoid copying (and moving) of the Vertex object entirely:

vert.emplace_back(1, 2, 3);
vert.emplace_back(4, 5, 6);
vert.emplace_back(7, 8, 9);

Replace given value in vector

Perhaps replace is what you are looking for:

> x = c(3, 2, 1, 0, 4, 0)
> replace(x, x==0, 1)
[1] 3 2 1 1 4 1

Or, if you don't have x (any specific reason why not?):

replace(c(3, 2, 1, 0, 4, 0), c(3, 2, 1, 0, 4, 0)==0, 1)

Many people are familiar with gsub, so you can also try either of the following:

as.numeric(gsub(0, 1, x))
as.numeric(gsub(0, 1, c(3, 2, 1, 0, 4, 0)))

Update

After reading the comments, perhaps with is an option:

with(data.frame(x = c(3, 2, 1, 0, 4, 0)), replace(x, x == 0, 1))

R programming: avoid copying the whole data frame when amending?

You should use data.table for this, it is done for exactly this purpose. And read this reference post

R) dt<-data.table(x=1:10,y=1:10)
R) .Internal(inspect(dt))
@0x000000000dce56c0 19 VECSXP g0c7 [OBJ,NAM(1),ATT] (len=2, tl=100)
@0x000000000ebc4100 13 INTSXP g0c4 [NAM(2)] (len=10, tl=0) 1,2,3,4,5,...
@0x000000000ebc41b0 13 INTSXP g0c4 [NAM(2)] (len=10, tl=0) 1,2,3,4,5,...
ATTRIB:
@0x000000000e6c2d00 02 LISTSXP g0c0 []
TAG: @0x00000000003b0088 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "names" (has value)
@0x000000000cc99fd0 16 STRSXP g0c7 [NAM(2)] (len=2, tl=100)
@0x00000000003ddbb8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "x"
@0x000000000734f4d8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "y"
TAG: @0x00000000003b1d98 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "row.names" (has value)
@0x0000000014487f98 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,-10
TAG: @0x00000000003b0558 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "class" (has value)
@0x000000000ead1910 16 STRSXP g0c2 [] (len=2, tl=0)
@0x000000000753f440 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached] "data.table"
@0x000000000715f398 09 CHARSXP g1c2 [MARK,gp=0x61,ATT] [ASCII] [cached] "data.frame"
TAG: @0x000000000c3d7cc0 01 SYMSXP g1c0 [MARK] ".internal.selfref"
@0x000000000e6c1e80 22 EXTPTRSXP g0c0 []
R) dt[,y:=y+1]
R) .Internal(inspect(dt))
@0x000000000dce56c0 19 VECSXP g0c7 [OBJ,NAM(2),ATT] (len=2, tl=100)
@0x000000000ebc4100 13 INTSXP g0c4 [NAM(2)] (len=10, tl=0) 1,2,3,4,5,...
@0x000000000ebc6728 14 REALSXP g0c6 [NAM(1)] (len=10, tl=0) 2,3,4,5,6,...
ATTRIB:
@0x000000000e6c2d00 02 LISTSXP g0c0 []
TAG: @0x00000000003b0088 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "names" (has value)
@0x000000000cc99fd0 16 STRSXP g0c7 [NAM(2)] (len=2, tl=100)
@0x00000000003ddbb8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "x"
@0x000000000734f4d8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "y"
TAG: @0x00000000003b1d98 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "row.names" (has value)
@0x0000000014487f98 13 INTSXP g0c1 [] (len=2, tl=0) -2147483648,-10
TAG: @0x00000000003b0558 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "class" (has value)
@0x000000000ead1910 16 STRSXP g0c2 [NAM(2)] (len=2, tl=0)
@0x000000000753f440 09 CHARSXP g1c2 [MARK,gp=0x61] [ASCII] [cached] "data.table"
@0x000000000715f398 09 CHARSXP g1c2 [MARK,gp=0x61,ATT] [ASCII] [cached] "data.frame"
TAG: @0x000000000c3d7cc0 01 SYMSXP g1c0 [MARK] ".internal.selfref"
@0x000000000e6c1e80 22 EXTPTRSXP g0c0 []

2D std::vector replace values - need for delete to avoid memory leaks?

You don't need to delete anything in your code sice you didn't create something with new.

If you want too do all this in a single line you can use std::for_each and std::fill:

#include <algorithm>
#include <vector>
std::vector<std::vector<int>> a;

// set every element in two dimensional vector to 5
std::for_each(a.begin(), a.end(), [](std::vector<int> &x){std::fill(x.begin(), x.end(), 5);});

Addendum related to your comment:
Yes, your original vector is stored on the stack. Since you don't pass a custom allocator, the vector will use the std::allocator to allocate its elements (which in your case are vectors of int). The std::allocator allocates these elements in dynamic memory aka the heap but you don't need to worry about freeing or deleting this memory because it is handeled by the internals of the vector. This means that the memory is deleted at the latest if the destructor of the vector is called (e.g. because it goes out of scope) or possibly earlier if you change the size of the vector.

Is it more efficient to copy a vector by reserving and copying, or by creating and swapping?

Your second example does not work if you send the argument by reference. Did you mean

void copyVecFast(vec<int> original) // no reference
{

vector<int> new_;
new_.swap(original);
}

That would work, but an easier way is

vector<int> new_(original);

Append value to empty vector in R?

Appending to an object in a for loop causes the entire object to be copied on every iteration, which causes a lot of people to say "R is slow", or "R loops should be avoided".

As BrodieG mentioned in the comments: it is much better to pre-allocate a vector of the desired length, then set the element values in the loop.

Here are several ways to append values to a vector. All of them are discouraged.

Appending to a vector in a loop

# one way
for (i in 1:length(values))
vector[i] <- values[i]
# another way
for (i in 1:length(values))
vector <- c(vector, values[i])
# yet another way?!?
for (v in values)
vector <- c(vector, v)
# ... more ways

help("append") would have answered your question and saved the time it took you to write this question (but would have caused you to develop bad habits). ;-)

Note that vector <- c() isn't an empty vector; it's NULL. If you want an empty character vector, use vector <- character().

Pre-allocate the vector before looping

If you absolutely must use a for loop, you should pre-allocate the entire vector before the loop. This will be much faster than appending for larger vectors.

set.seed(21)
values <- sample(letters, 1e4, TRUE)
vector <- character(0)
# slow
system.time( for (i in 1:length(values)) vector[i] <- values[i] )
# user system elapsed
# 0.340 0.000 0.343
vector <- character(length(values))
# fast(er)
system.time( for (i in 1:length(values)) vector[i] <- values[i] )
# user system elapsed
# 0.024 0.000 0.023

How do I print out the contents of a vector?

If you have a C++11 compiler, I would suggest using a range-based for-loop (see below); or else use an iterator. But you have several options, all of which I will explain in what follows.

Range-based for-loop (C++11)

In C++11 (and later) you can use the new range-based for-loop, which looks like this:

std::vector<char> path;
// ...
for (char i: path)
std::cout << i << ' ';

The type char in the for-loop statement should be the type of the elements of the vector path and not an integer indexing type. In other words, since path is of type std::vector<char>, the type that should appear in the range-based for-loop is char. However, you will likely often see the explicit type replaced with the auto placeholder type:

for (auto i: path)
std::cout << i << ' ';

Regardless of whether you use the explicit type or the auto keyword, the object i has a value that is a copy of the actual item in the path object. Thus, all changes to i in the loop are not preserved in path itself:

std::vector<char> path{'a', 'b', 'c'};

for (auto i: path) {
i = '_'; // 'i' is a copy of the element in 'path', so although
// we can change 'i' here perfectly fine, the elements
// of 'path' have not changed
std::cout << i << ' '; // will print: "_ _ _"
}

for (auto i: path) {
std::cout << i << ' '; // will print: "a b c"
}

If you would like to proscribe being able to change this copied value of i in the for-loop as well, you can force the type of i to be const char like this:

for (const auto i: path) {
i = '_'; // this will now produce a compiler error
std::cout << i << ' ';
}

If you would like to modify the items in path so that those changes persist in path outside of the for-loop, then you can use a reference like so:

for (auto& i: path) {
i = '_'; // changes to 'i' will now also change the
// element in 'path' itself to that value
std::cout << i << ' ';
}

and even if you don't want to modify path, if the copying of objects is expensive you should use a const reference instead of copying by value:

for (const auto& i: path)
std::cout << i << ' ';

Iterators

Before C++11 the canonical solution would have been to use an iterator, and that is still perfectly acceptable. They are used as follows:

std::vector<char> path;
// ...
for (std::vector<char>::const_iterator i = path.begin(); i != path.end(); ++i)
std::cout << *i << ' ';

If you want to modify the vector's contents in the for-loop, then use iterator rather than const_iterator.

Supplement: typedef / type alias (C++11) / auto (C++11)

This is not another solution, but a supplement to the above iterator solution. If you are using the C++11 standard (or later), then you can use the auto keyword to help the readability:

for (auto i = path.begin(); i != path.end(); ++i)
std::cout << *i << ' ';

Here the type of i will be non-const (i.e., the compiler will use std::vector<char>::iterator as the type of i). This is because we called the begin method, so the compiler deduced the type for i from that. If we call the cbegin method instead ("c" for const), then i will be a std::vector<char>::const_iterator:

for (auto i = path.cbegin(); i != path.cend(); ++i) {
*i = '_'; // will produce a compiler error
std::cout << *i << ' ';
}

If you're not comfortable with the compiler deducing types, then in C++11 you can use a type alias to avoid having to type the vector out all the time (a good habit to get into):

using Path = std::vector<char>; // C++11 onwards only
Path path; // 'Path' is an alias for std::vector<char>
// ...
for (Path::const_iterator i = path.begin(); i != path.end(); ++i)
std::cout << *i << ' ';

If you do not have access to a C++11 compiler (or don't like the type alias syntax for whatever reason), then you can use the more traditional typedef:

typedef std::vector<char> Path; // 'Path' now a synonym for std::vector<char>
Path path;
// ...
for (Path::const_iterator i = path.begin(); i != path.end(); ++i)
std::cout << *i << ' ';

Side note:

At this point, you may or may not have come across iterators before, and you may or may not have heard that iterators are what you are "supposed" to use, and may be wondering why. The answer is not easy to appreciate, but, in brief, the idea is that iterators are an abstraction that shield you from the details of the operation.

It is convenient to have an object (the iterator) that does the operation you want (like sequential access) rather than you writing the details yourself (the "details" being the code that does the actual accessing of the elements of the vector). You should notice that in the for-loop you are only ever asking the iterator to return you a value (*i, where i is the iterator) -- you are never interacting with path directly itself. The logic goes like this: you create an iterator and give it the object you want to loop over (iterator i = path.begin()), and then all you do is ask the iterator to get the next value for you (*i); you never had to worry exactly how the iterator did that -- that's its business, not yours.

OK, but what's the point? Well, imagine if getting a value wasn't simple. What if it involves a bit of work? You don't need to worry, because the iterator has handled that for you -- it sorts out the details, all you need to do is ask it for a value. Additionally, what if you change the container from std::vector to something else? In theory, your code doesn't change even if the details of how accessing elements in the new container does: remember, the iterator sorts all the details out for you behind the scenes, so you don't need to change your code at all -- you just ask the iterator for the next value in the container, same as before.

So, whilst this may seem like confusing overkill for looping through a vector, there are good reasons behind the concept of iterators and so you might as well get used to using them.

Indexing

You can also use a integer type to index through the elements of the vector in the for-loop explicitly:

for (int i=0; i<path.size(); ++i)
std::cout << path[i] << ' ';

If you are going to do this, it's better to use the container's member types, if they are available and appropriate. std::vector has a member type called size_type for this job: it is the type returned by the size method.

typedef std::vector<char> Path; // 'Path' now a synonym for std::vector<char>
for (Path::size_type i=0; i<path.size(); ++i)
std::cout << path[i] << ' ';

Why not use this in preference to the iterator solution? For simple cases, you can do that, but using an iterator brings several advantages, which I have briefly outlined above. As such, my advice would be to avoid this method unless you have good reasons for it.

std::copy (C++11)

See Joshua's answer. You can use the STL algorithm std::copy to copy the vector contents onto the output stream. I don't have anything to add, except to say that I don't use this method; but there's no good reason for that besides habit.

std::ranges::copy (C++20)

For completeness, C++20 introduced ranges, which can act on the whole range of a std::vector, so no need for begin and end:

#include <iterator> // for std::ostream_iterator
#include <algorithm> // for std::ranges::copy depending on lib support

std::vector<char> path;
// ...
std::ranges::copy(path, std::ostream_iterator<char>(std::cout, " "));

Unless you have a recent compiler (on GCC apparently at least version 10.1), likely you will not have ranges support even if you might have some C++20 features available.

Overload std::ostream::operator<<

See also Chris's answer below. This is more a complement to the other answers since you will still need to implement one of the solutions above in the overloading, but the benefit is much cleaner code. This is how you could use the std::ranges::copy solution above:

#include <iostream>
#include <vector>
#include <iterator> // for std::ostream_iterator
#include <algorithm> // for std::ranges::copy depending on lib support

using Path = std::vector<char>; // type alias for std::vector<char>

std::ostream& operator<< (std::ostream& out, const Path& v) {
if ( !v.empty() ) {
out << '[';
std::ranges::copy(v, std::ostream_iterator<char>(out, ", "));
out << "\b\b]"; // use two ANSI backspace characters '\b' to overwrite final ", "
}
return out;
}

int main() {
Path path{'/', 'f', 'o', 'o'};

// will output: "path: [/, f, o, o]"
std::cout << "path: " << path << std::endl;

return 0;
}

Now you can pass your Path objects to your output stream just like fundamental types. Using any of the other solutions above should also be equally straightforward.

Conclusion

Any of the solutions presented here will work. It's up to you (and context or your coding standards) on which one is the "best". Anything more detailed than this is probably best left for another question where the pros/cons can be properly evaluated, but as always user preference will always play a part: none of the solutions presented are objectively wrong, but some will look nicer to each coder.

Addendum

This is an expanded solution of an earlier one I posted. Since that post kept getting attention, I decided to expand on it and refer to the other excellent solutions posted here, at least those that I have personally used in the past at least once. I would, however, encourage the reader to look at the answers below because there are probably good suggestions that I have forgotten, or do not know, about.



Related Topics



Leave a reply



Submit