Index element from list in Rcpp
Quick ones:
Compound expression in C++ can bite at times; the template magic gets in the way. So just assign from the
List
object to a whatever the element is, eg aNumericMatrix
.Then pick from the
NumericMatrix
as you see fit. We have row, col, element, ... access.Printing can be easier using
Rcpp::Rcout << anElement
but note that we currently cannot print entire matrices or vectors -- but theint
ordouble
types are fine.
Edit:
Here is a sample implementation.
#include <Rcpp.h>
// [[Rcpp::export]]
double sacha(Rcpp::List L) {
double sum = 0;
for (int i=0; i<L.size(); i++) {
Rcpp::NumericMatrix M = L[i];
double topleft = M(0,0);
sum += topleft;
Rcpp::Rcout << "Element is " << topleft << std::endl;
}
return sum;
}
/*** R
set.seed(42)
L <- list(matrix(rnorm(9),3), matrix(1:9,3), matrix(sqrt(1:4),2))
sacha(L) # fix typo
*/
And its result:
R> Rcpp::sourceCpp('/tmp/sacha.cpp')
R> set.seed(42)
R> L <- list(matrix(rnorm(9),3), matrix(1:9,3), matrix(sqrt(1:4),2))
R> sacha(L)
Element is 1.37096
Element is 1
Element is 1
[1] 3.37096
R>
How to correctly change a list of a ListMatrix in rcpp in R
TL;DR: Welcome to shared environments and Rcpp's use of pointers.
There are two ways to address R's treatment of objects in a shared environment to avoid the domino update associated with the function.
- perform a deep copy of the object and then update the matrix list position or
- have only unique elements inside the list (e.g. no duplicated matrices)
To begin, let's look at the memory allocation for this shared memory list. For convenience -- and to avoid tracemem()
-- we'll use the lobstr
package to explore the shared environment. The package is currently GitHub-only and can be obtained using:
install.packages("devtools")
devtools::install_github("r-lib/lobstr")
With lobstr
in hand, let's take a look at the underlying memory addresses of the matrix list in R...
x = matrix(list(matrix(0,3,2)),2,2)
lobstr::obj_addrs(x)
# [1] "0x7fceb69757c8" "0x7fceb69757c8" "0x7fceb69757c8" "0x7fceb69757c8"
# ^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^
# identical memory addresses for all objects
Notice that all objects share the same memory location. Thus, R is treating each of these elements inside the list as being the same. R opts for this behavior to reduce the size of data in memory as each element belongs to the same shared environment.
This is particularly problematic for Rcpp as it is manipulating pointers, e.g. a location in memory that shows where a value is stored, and not values, e.g. a variable that holds a value like an int
or a double
.
Due to this pointer behavior, two actions occur:
- all of the matrices in the matrix list are updated simultaneously, and
- the overall object
x
is updated without requiring a return statement.
If we modify your function slightly, the second point is more apparent.
Modified Function to Illustrate Pointers
Note: This function no longer has a return type, yet we are still seeing the x
object change in R's environment.
#include<Rcpp.h>
// [[Rcpp::export]]
void ListMatrixType_pointer(Rcpp::ListMatrix x){
Rcpp::NumericMatrix a = x(0, 0);
a(0, 0) = 100;
}
Output
ListMatrixType_pointer(x)
str(x)
# List of 4
# $ : num [1:3, 1:2] 100 0 0 0 0 0
# $ : num [1:3, 1:2] 100 0 0 0 0 0
# $ : num [1:3, 1:2] 100 0 0 0 0 0
# $ : num [1:3, 1:2] 100 0 0 0 0 0
# - attr(*, "dim")= int [1:2] 2 2
Notice, we do not have to return a value as x
is automatically updated. Furthermore, we still have the same memory location for each of the elements, e.g.
lobstr::obj_addrs(x)
# [1] "0x7fceb69757c8" "0x7fceb69757c8" "0x7fceb69757c8" "0x7fceb69757c8"
# ^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^
# identical memory addresses for all objects
Deep Copies of Matrix
To get around the identical memory addresses, you can deeply clone the object and then save it back into the ListMatrix
. The clone()
function instantiates a new block of memory to store the values to. Thus, modifying one element no longer will trigger a domino update. Furthermore, when you add a cloned object back to a list, the memory address changes for only that element.
Using clone()
to make a deep copy
#include<Rcpp.h>
// [[Rcpp::export]]
void ListMatrixType_clone(Rcpp::ListMatrix x){
Rcpp::NumericMatrix a = x(0, 0);
// Perform a deep copy of a into b
// and, thus, changing the memory address
Rcpp::NumericMatrix b = Rcpp::clone(a);
b(0, 0) = 100;
// Update x with b
x(0, 0) = b;
}
OutputListMatrixType_clone(x)
str(x)
# List of 4
# $ : num [1:3, 1:2] 100 0 0 0 0 0
# $ : num [1:3, 1:2] 0 0 0 0 0 0
# $ : num [1:3, 1:2] 0 0 0 0 0 0
# $ : num [1:3, 1:2] 0 0 0 0 0 0
# - attr(*, "dim")= int [1:2] 2 2
lobstr::obj_addrs(x)
# [1] "0x7fceb811a7e8" "0x7fceb69757c8" "0x7fceb69757c8" "0x7fceb69757c8"
# ^^^^^^^ ^^^^^^^
# different memory addresses
Unique Elements in a List
To emphasize the need for unique elements, consider modifying the matrix in the first position of the matrix list, the memory address for only the first element will change; the rest of the addresses will remain the same. e.g.
x[[1, 1]] = matrix(1, 2, 2)
lobstr::obj_addrs(x)
# [1] "0x7fceb811a7e8" "0x7fceb69757c8" "0x7fceb69757c8" "0x7fceb69757c8"
# ^^^^^^^ ^^^^^^^
# different memory addresses
Additional Resources
For further reading on this topic please see any of the linked posts below that emphasize pointer behavior associated with Rcpp (Disclaimer: I wrote them):
- Assigning Rcpp objects into an Rcpp List yields duplicates of the last element
- Declare a variable as a reference in Rcpp
- Deciding between NumericVector and arma::vec in Rcpp
- Should I prefer Rcpp::NumericVector over std::vector?
Miscellaneous note on Object Sizes
Note: object.size()
from utils
in Base R provides an incorrect calculation of the shared environment sizes. c.f. ?utils::object.size
This function merely provides a rough indication: it should be reasonably accurate for atomic vectors, but does not detect if elements of a list are shared, for example. (Sharing amongst elements of a character vector is taken into account, but not that between character vectors in a single object.)
lobstr::obj_size(x)
# 424 B
utils::object.size(x)
# 1304 bytes
So, due to the shared environment, the object size of the list is about 1/3.
Extracting element in a nested list in Rcpp
It appears to mostly be a scoping issue and a variable type issue. I believe the std::string b
declarations in the if then
statements are local. Any changes there do not last.
Then, the errors in your comments are trying to assign the LHS std::string
with RHS Rcpp:Vector
. While I am sure there are better ways to convert and/or simplify, the solution below just declares Rcpp::CharacterVector b
.
library(Rcpp)
cppFunction('int foo(Rcpp::List x, std::string step_name) {
Rcpp::List steps = x("steps");
Rcpp::List step = steps(step_name);
int a = step("a");
//// Not declaring b causes "not declared in this scope" error
//// so I initialize it with a random value.
CharacterVector b; #############
if (step.containsElementNamed("b")){
Rcout << "b is in List!" << "\\n";
if (!Rf_isNull(step("b"))) {
Rcout << "b is not NULL!" << "\\n";
if (TYPEOF(step("b")) == STRSXP)
Rcout << "b is character!" << "\\n";
b = step("b");
} else {
Rcout << "b is NULL!" << "\\n";
}
} else {
Rcout << "b is not in List!" << "\\n";
}
Rcout << "The size of b is " << b.size() << ".\\n"; #########
if (b.size() > 0 && b[0] == "abc") { ################
Rcout << "Do something with b.\\n";
//// Do something with the value of b
}
return a;
}')
mylist = list(steps = list(`1` = list(a = 7, b = "abc"),
`2` = list(a = 3),
`3` = list(a = 5, b = NULL)))
foo(mylist, "1")
# b is in List!
# b is not NULL!
# b is character!
# The size of b is 1.
# Do something with b.
# [1] 7
foo(mylist, "2")
# b is not in List!
# The size of b is 0.
# [1] 3
foo(mylist, "3")
# b is in List!
# b is NULL!
# The size of b is 0.
# [1] 5
Find index of all max/min values in vector in Rcpp
I don't know about a native function, but a loop is fairly straight-forward to write.
Here are three versions.
Two which find the Rcpp::max()
of the vector, then find the indices of the vector which match this max. One uses a pre-allocated Rcpp::IntegerVector()
to store the result, which is then subset to remove the extra 'unused' zeroes. The other uses a std::vector< int >
with a .push_back()
to store the results.
library(Rcpp)
cppFunction('IntegerVector which_maxCpp1(NumericVector v) {
double m = Rcpp::max(v);
Rcpp::IntegerVector res( v.size() ); // pre-allocate result vector
int i;
int counter = 0;
for( i = 0; i < v.size(); ++i) {
if( v[i] == m ) {
res[ counter ] = i;
counter++;
}
}
counter--;
Rcpp::Range rng(0, counter);
return res[rng];
}')
v = c(1,2,3,1,2,3)
which_maxCpp(v)
# [1] 2 5
cppFunction('IntegerVector which_maxCpp2(NumericVector v) {
double m = Rcpp::max(v);
std::vector< int > res;
int i;
for( i = 0; i < v.size(); ++i) {
if( v[i] == m ) {
res.push_back( i );
}
}
Rcpp::IntegerVector iv( res.begin(), res.end() );
return iv;
}')
which_maxCpp(v)
# [1] 2 5
The third option avoids the double-pass over the vector by finding both the max, and keeping track of the indices in the one loop at the same time.
cppFunction('IntegerVector which_maxCpp3(NumericVector v) {
double current_max = v[0];
int n = v.size();
std::vector< int > res;
res.push_back( 0 );
int i;
for( i = 1; i < n; ++i) {
double x = v[i];
if( x > current_max ) {
res.clear();
current_max = x;
res.push_back( i );
} else if ( x == current_max ) {
res.push_back( i );
}
}
Rcpp::IntegerVector iv( res.begin(), res.end() );
return iv;
}')
Benchmarking
Here are some benchmarks showing how these functions stack-up against the base R approach.
library(microbenchmark)
x <- sample(1:100, size = 1e6, replace = T)
microbenchmark(
iv = { which_maxCpp1(x) },
stl = { which_maxCpp2(x) },
max = { which_maxCpp3(x) },
r = { which( x == max(x)) }
)
# Unit: milliseconds
# expr min lq mean median uq max neval
# iv 6.638583 10.617945 14.028378 10.956616 11.63981 165.719783 100
# stl 6.830686 9.506639 9.787291 9.744488 10.17247 11.275061 100
# max 3.161913 5.690886 5.926433 5.913899 6.19489 7.427020 100
# r 4.044166 5.558075 5.819701 5.719940 6.00547 7.080742 100
It seems it is a bit slow to extract element from a List in Rcpp
I suspect two main problems concerning performance:
- Lots of string comparisons (of the order of
1e9
) - Lots of cache misses for the matrices, since in general two consecutive xy-pairs won't be from the same category and will therefore need a different matrix
Both indicate into the same direction: Do not try to implement your own GROUP BY operations. Database engines and packages like data.table
know better how to do that. For example, when using data.table
we need a much simpler function that expects the x and y for one category and outputs a single matrix:
#include <Rcpp.h>
using namespace Rcpp;
//[[Rcpp::export]]
NumericMatrix getMat(NumericVector x, NumericVector y,
double xmin, double xmax, double ymin, double ymax,
int plot_width = 600, int plot_height = 600) {
int n = x.size();
NumericMatrix M(plot_height, plot_width);
for(int i=0; i < n; i++) {
int id_x = floor((x[i] - xmin)/(xmax - xmin) * (plot_width - 1));
int id_y = floor((y[i] - ymin)/(ymax - ymin) * (plot_height - 1));
M(id_y, id_x) += 1;
}
return M;
}
/***R
n <- 1e8
class <- 20
library("data.table")
foo <- data.table(x = rnorm(n),
y = rnorm(n),
category = sample(as.factor(1:class), size = n, replace = TRUE))
xmin <- min(foo$x)
xmax <- max(foo$x)
ymin <- min(foo$y)
ymax <- max(foo$y)
system.time(bar <- foo[,
list(baz = list(getMat(x, y, xmin, xmax, ymin, ymax))),
by = category])
*/
Notes:
- On my system the aggregation takes less than 6 seconds.
- It is even faster if one does a
setkey(foo, category)
before the aggregation. That physically alters the order of the rows, though. Use with care! data.table
syntax is a bit terse, but one gets used to it ...- The structure of the output is different, but can be converted if needed.
Iterate over named List in Rcpp
Many of these use cases are in fact visible in the unit tests.
Here I am quoting from tinytest/cpp/Vector.cpp:
Get names:
// [[Rcpp::export]]
CharacterVector integer_names_get( IntegerVector y ){
return y.names() ;
}
Index by name
// [[Rcpp::export]]
int integer_names_indexing( IntegerVector y ){
return y["foo"] ;
}
That should be enough to get you the vector you aim to iterate over.
Related Topics
How to Ddply() Without Sorting
R How to Extract First Row of Each Matrix Within a List
S3 Method Consistency Warning When Building R Package with Roxygen
How to Store R Ggplot Graph as HTML Code Snippet
How to Pass Data Between Functions in a Shiny App
What Are the Caveats of Using Source Versus Parse & Eval
Lme4::Glmer VS. Stata's Melogit Command
Shiny R - Download the Result of a Table
Find *All* Duplicated Records in Data.Table (Not All-But-One)
Saving a Data Frame as a Binary File
Generating a Very Large Matrix of String Combinations Using Combn() and Bigmemory Package
How Does Branch Prediction Affect Performance in R
References Truncated in Beamer Presentation Prepared in Knitr/Rmarkdown
Function Commenting Conventions in R