Index Element from List in Rcpp

Index element from list in Rcpp

Quick ones:

  1. Compound expression in C++ can bite at times; the template magic gets in the way. So just assign from the List object to a whatever the element is, eg a NumericMatrix.

  2. Then pick from the NumericMatrix as you see fit. We have row, col, element, ... access.

  3. Printing can be easier using Rcpp::Rcout << anElement but note that we currently cannot print entire matrices or vectors -- but the int or double types are fine.

Edit:

Here is a sample implementation.

#include <Rcpp.h>

// [[Rcpp::export]]
double sacha(Rcpp::List L) {
double sum = 0;
for (int i=0; i<L.size(); i++) {
Rcpp::NumericMatrix M = L[i];
double topleft = M(0,0);
sum += topleft;
Rcpp::Rcout << "Element is " << topleft << std::endl;
}
return sum;
}

/*** R
set.seed(42)
L <- list(matrix(rnorm(9),3), matrix(1:9,3), matrix(sqrt(1:4),2))
sacha(L) # fix typo
*/

And its result:

R> Rcpp::sourceCpp('/tmp/sacha.cpp')

R> set.seed(42)

R> L <- list(matrix(rnorm(9),3), matrix(1:9,3), matrix(sqrt(1:4),2))

R> sacha(L)
Element is 1.37096
Element is 1
Element is 1
[1] 3.37096
R>

How to correctly change a list of a ListMatrix in rcpp in R

TL;DR: Welcome to shared environments and Rcpp's use of pointers.

There are two ways to address R's treatment of objects in a shared environment to avoid the domino update associated with the function.

  1. perform a deep copy of the object and then update the matrix list position or
  2. have only unique elements inside the list (e.g. no duplicated matrices)

To begin, let's look at the memory allocation for this shared memory list. For convenience -- and to avoid tracemem() -- we'll use the lobstr package to explore the shared environment. The package is currently GitHub-only and can be obtained using:

install.packages("devtools")
devtools::install_github("r-lib/lobstr")

With lobstr in hand, let's take a look at the underlying memory addresses of the matrix list in R...

x = matrix(list(matrix(0,3,2)),2,2)
lobstr::obj_addrs(x)
# [1] "0x7fceb69757c8" "0x7fceb69757c8" "0x7fceb69757c8" "0x7fceb69757c8"
# ^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^
# identical memory addresses for all objects

Notice that all objects share the same memory location. Thus, R is treating each of these elements inside the list as being the same. R opts for this behavior to reduce the size of data in memory as each element belongs to the same shared environment.

This is particularly problematic for Rcpp as it is manipulating pointers, e.g. a location in memory that shows where a value is stored, and not values, e.g. a variable that holds a value like an int or a double.

Due to this pointer behavior, two actions occur:

  1. all of the matrices in the matrix list are updated simultaneously, and
  2. the overall object x is updated without requiring a return statement.

If we modify your function slightly, the second point is more apparent.

Modified Function to Illustrate Pointers

Note: This function no longer has a return type, yet we are still seeing the x object change in R's environment.

#include<Rcpp.h>

// [[Rcpp::export]]
void ListMatrixType_pointer(Rcpp::ListMatrix x){
Rcpp::NumericMatrix a = x(0, 0);
a(0, 0) = 100;
}

Output

ListMatrixType_pointer(x)
str(x)
# List of 4
# $ : num [1:3, 1:2] 100 0 0 0 0 0
# $ : num [1:3, 1:2] 100 0 0 0 0 0
# $ : num [1:3, 1:2] 100 0 0 0 0 0
# $ : num [1:3, 1:2] 100 0 0 0 0 0
# - attr(*, "dim")= int [1:2] 2 2

Notice, we do not have to return a value as x is automatically updated. Furthermore, we still have the same memory location for each of the elements, e.g.

lobstr::obj_addrs(x)
# [1] "0x7fceb69757c8" "0x7fceb69757c8" "0x7fceb69757c8" "0x7fceb69757c8"
# ^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^
# identical memory addresses for all objects

Deep Copies of Matrix

To get around the identical memory addresses, you can deeply clone the object and then save it back into the ListMatrix. The clone() function instantiates a new block of memory to store the values to. Thus, modifying one element no longer will trigger a domino update. Furthermore, when you add a cloned object back to a list, the memory address changes for only that element.

Using clone() to make a deep copy

#include<Rcpp.h>

// [[Rcpp::export]]
void ListMatrixType_clone(Rcpp::ListMatrix x){
Rcpp::NumericMatrix a = x(0, 0);

// Perform a deep copy of a into b
// and, thus, changing the memory address
Rcpp::NumericMatrix b = Rcpp::clone(a);
b(0, 0) = 100;

// Update x with b
x(0, 0) = b;
}
Output

ListMatrixType_clone(x)
str(x)
# List of 4
# $ : num [1:3, 1:2] 100 0 0 0 0 0
# $ : num [1:3, 1:2] 0 0 0 0 0 0
# $ : num [1:3, 1:2] 0 0 0 0 0 0
# $ : num [1:3, 1:2] 0 0 0 0 0 0
# - attr(*, "dim")= int [1:2] 2 2

lobstr::obj_addrs(x)
# [1] "0x7fceb811a7e8" "0x7fceb69757c8" "0x7fceb69757c8" "0x7fceb69757c8"
# ^^^^^^^ ^^^^^^^
# different memory addresses

Unique Elements in a List

To emphasize the need for unique elements, consider modifying the matrix in the first position of the matrix list, the memory address for only the first element will change; the rest of the addresses will remain the same. e.g.

x[[1, 1]] = matrix(1, 2, 2)
lobstr::obj_addrs(x)
# [1] "0x7fceb811a7e8" "0x7fceb69757c8" "0x7fceb69757c8" "0x7fceb69757c8"
# ^^^^^^^ ^^^^^^^
# different memory addresses

Additional Resources

For further reading on this topic please see any of the linked posts below that emphasize pointer behavior associated with Rcpp (Disclaimer: I wrote them):

  • Assigning Rcpp objects into an Rcpp List yields duplicates of the last element
  • Declare a variable as a reference in Rcpp
  • Deciding between NumericVector and arma::vec in Rcpp
  • Should I prefer Rcpp::NumericVector over std::vector?

Miscellaneous note on Object Sizes

Note: object.size() from utils in Base R provides an incorrect calculation of the shared environment sizes. c.f. ?utils::object.size

This function merely provides a rough indication: it should be reasonably accurate for atomic vectors, but does not detect if elements of a list are shared, for example. (Sharing amongst elements of a character vector is taken into account, but not that between character vectors in a single object.)

lobstr::obj_size(x)
# 424 B
utils::object.size(x)
# 1304 bytes

So, due to the shared environment, the object size of the list is about 1/3.

Extracting element in a nested list in Rcpp

It appears to mostly be a scoping issue and a variable type issue. I believe the std::string b declarations in the if then statements are local. Any changes there do not last.

Then, the errors in your comments are trying to assign the LHS std::string with RHS Rcpp:Vector. While I am sure there are better ways to convert and/or simplify, the solution below just declares Rcpp::CharacterVector b.

library(Rcpp)
cppFunction('int foo(Rcpp::List x, std::string step_name) {
Rcpp::List steps = x("steps");
Rcpp::List step = steps(step_name);
int a = step("a");
//// Not declaring b causes "not declared in this scope" error
//// so I initialize it with a random value.
CharacterVector b; #############
if (step.containsElementNamed("b")){
Rcout << "b is in List!" << "\\n";
if (!Rf_isNull(step("b"))) {
Rcout << "b is not NULL!" << "\\n";
if (TYPEOF(step("b")) == STRSXP)
Rcout << "b is character!" << "\\n";
b = step("b");
} else {
Rcout << "b is NULL!" << "\\n";
}
} else {
Rcout << "b is not in List!" << "\\n";
}
Rcout << "The size of b is " << b.size() << ".\\n"; #########
if (b.size() > 0 && b[0] == "abc") { ################
Rcout << "Do something with b.\\n";
//// Do something with the value of b
}
return a;
}')

mylist = list(steps = list(`1` = list(a = 7, b = "abc"),
`2` = list(a = 3),
`3` = list(a = 5, b = NULL)))

foo(mylist, "1")
# b is in List!
# b is not NULL!
# b is character!
# The size of b is 1.
# Do something with b.
# [1] 7
foo(mylist, "2")
# b is not in List!
# The size of b is 0.
# [1] 3
foo(mylist, "3")
# b is in List!
# b is NULL!
# The size of b is 0.
# [1] 5

Find index of all max/min values in vector in Rcpp

I don't know about a native function, but a loop is fairly straight-forward to write.

Here are three versions.

Two which find the Rcpp::max() of the vector, then find the indices of the vector which match this max. One uses a pre-allocated Rcpp::IntegerVector() to store the result, which is then subset to remove the extra 'unused' zeroes. The other uses a std::vector< int > with a .push_back() to store the results.

library(Rcpp)

cppFunction('IntegerVector which_maxCpp1(NumericVector v) {
double m = Rcpp::max(v);
Rcpp::IntegerVector res( v.size() ); // pre-allocate result vector

int i;
int counter = 0;
for( i = 0; i < v.size(); ++i) {
if( v[i] == m ) {
res[ counter ] = i;
counter++;
}
}
counter--;
Rcpp::Range rng(0, counter);
return res[rng];
}')

v = c(1,2,3,1,2,3)

which_maxCpp(v)
# [1] 2 5
cppFunction('IntegerVector which_maxCpp2(NumericVector v) {
double m = Rcpp::max(v);
std::vector< int > res;

int i;
for( i = 0; i < v.size(); ++i) {
if( v[i] == m ) {
res.push_back( i );
}
}
Rcpp::IntegerVector iv( res.begin(), res.end() );
return iv;
}')

which_maxCpp(v)
# [1] 2 5

The third option avoids the double-pass over the vector by finding both the max, and keeping track of the indices in the one loop at the same time.

cppFunction('IntegerVector which_maxCpp3(NumericVector v) {

double current_max = v[0];
int n = v.size();
std::vector< int > res;
res.push_back( 0 );
int i;

for( i = 1; i < n; ++i) {
double x = v[i];
if( x > current_max ) {
res.clear();
current_max = x;
res.push_back( i );
} else if ( x == current_max ) {
res.push_back( i );
}
}
Rcpp::IntegerVector iv( res.begin(), res.end() );
return iv;
}')

Benchmarking

Here are some benchmarks showing how these functions stack-up against the base R approach.

library(microbenchmark)

x <- sample(1:100, size = 1e6, replace = T)

microbenchmark(
iv = { which_maxCpp1(x) },
stl = { which_maxCpp2(x) },
max = { which_maxCpp3(x) },
r = { which( x == max(x)) }
)

# Unit: milliseconds
# expr min lq mean median uq max neval
# iv 6.638583 10.617945 14.028378 10.956616 11.63981 165.719783 100
# stl 6.830686 9.506639 9.787291 9.744488 10.17247 11.275061 100
# max 3.161913 5.690886 5.926433 5.913899 6.19489 7.427020 100
# r 4.044166 5.558075 5.819701 5.719940 6.00547 7.080742 100

It seems it is a bit slow to extract element from a List in Rcpp

I suspect two main problems concerning performance:

  • Lots of string comparisons (of the order of 1e9)
  • Lots of cache misses for the matrices, since in general two consecutive xy-pairs won't be from the same category and will therefore need a different matrix

Both indicate into the same direction: Do not try to implement your own GROUP BY operations. Database engines and packages like data.table know better how to do that. For example, when using data.table we need a much simpler function that expects the x and y for one category and outputs a single matrix:

#include <Rcpp.h>
using namespace Rcpp;

//[[Rcpp::export]]
NumericMatrix getMat(NumericVector x, NumericVector y,
double xmin, double xmax, double ymin, double ymax,
int plot_width = 600, int plot_height = 600) {
int n = x.size();
NumericMatrix M(plot_height, plot_width);

for(int i=0; i < n; i++) {
int id_x = floor((x[i] - xmin)/(xmax - xmin) * (plot_width - 1));
int id_y = floor((y[i] - ymin)/(ymax - ymin) * (plot_height - 1));
M(id_y, id_x) += 1;
}
return M;
}

/***R
n <- 1e8
class <- 20

library("data.table")
foo <- data.table(x = rnorm(n),
y = rnorm(n),
category = sample(as.factor(1:class), size = n, replace = TRUE))

xmin <- min(foo$x)
xmax <- max(foo$x)
ymin <- min(foo$y)
ymax <- max(foo$y)

system.time(bar <- foo[,
list(baz = list(getMat(x, y, xmin, xmax, ymin, ymax))),
by = category])
*/

Notes:

  • On my system the aggregation takes less than 6 seconds.
  • It is even faster if one does a setkey(foo, category) before the aggregation. That physically alters the order of the rows, though. Use with care!
  • data.table syntax is a bit terse, but one gets used to it ...
  • The structure of the output is different, but can be converted if needed.

Iterate over named List in Rcpp

Many of these use cases are in fact visible in the unit tests.

Here I am quoting from tinytest/cpp/Vector.cpp:

Get names:

// [[Rcpp::export]]
CharacterVector integer_names_get( IntegerVector y ){
return y.names() ;
}

Index by name

// [[Rcpp::export]]
int integer_names_indexing( IntegerVector y ){
return y["foo"] ;
}

That should be enough to get you the vector you aim to iterate over.



Related Topics



Leave a reply



Submit