How to Create a List of Vectors in Rcpp

How do I create a list of vectors in Rcpp?

[ Nice to see this here, but Romain and I generally recommend the rcpp-devel list for questions. Please post there going forward, as the project is not yet so large that it warrants having questions scattered all over the web. ]

RcppResultSet is part of the older classic API whereas a lot of work has gone into what we call the new API (starting with the 0.7.* releases). Have a look at the current Rcpp page on CRAN and the list of vignettes -- six and counting.

With the new API you would return something like

return Rcpp::List::create(Rcpp::Named("vec")  = someVector,
                          Rcpp::Named("lst")  = someList,
                          Rcpp::Named("vec2") = someOtherVector);

all in one statement (and possibly using explicit Rcpp::wrap() calls), creating what in R would be

list(vec=someVector, lst=someList, vec2=someOtherVector)

And Rcpp::List should also be able to do lists of lists of lists, though I am not sure we have unit tests for this. There are, however, numerous examples in the 500+ unit tests.

As it happens, I spent the last few days converting a lot of RQuantLib code from the classic API to the new API. This will probably get released once we get version 0.8.3 of Rcpp out (hopefully in a few days). In the meantime, you can look at the RQuantLib SVN archive.

Creating a Large List of (Large) Vectors with Rcpp

In the end, the solution I went with is the one seen above:

// [[Rcpp::export]]
List permute_data(NumericMatrix mat1, NumericMatrix mat2, int B) {

  List out(B);          // Will be large, ~5000 elements
  int N1 = mat1.rows();
  int N2 = mat2.rows();
  int m  = mat1.cols(); // Will be large, ~10000 elements

  // Row labels to be permuted: 0..N1-1 refer to mat1, N1..N1+N2-1 to mat2
  IntegerVector permindx = seq(0, N1 + N2 - 1);
  NumericMatrix M1 = no_init_matrix(N1, m);
  NumericMatrix M2 = no_init_matrix(N2, m);

  for (int b = 0; b < B; ++b) {
    // Permute the N1+N2 rows
    permindx = sample(permindx, N1 + N2); // Use Rcpp's function to work with R's RNG
    for (int j = 0; j < m; ++j) {
      // Pick out first N1 elements of permindx
      for (int i = 0; i < N1; ++i) {
        if (permindx[i] >= N1) { // Label belongs to mat2: shift it into mat2's row range
          M1(i, j) = mat2(permindx[i] - N1, j);
        } else {
          M1(i, j) = mat1(permindx[i], j);
        }
      }
      // Pick out last N2 elements of permindx
      for (int k = 0; k < N2; ++k) {
        if (permindx[k + N1] < N1) { // Label belongs to mat1
          M2(k, j) = mat1(permindx[k + N1], j);
        } else {                     // Label belongs to mat2: shift by N1
          M2(k, j) = mat2(permindx[k + N1] - N1, j);
        }
      }
    }
    // vecmin() and ColMax() are helpers defined elsewhere (not shown here)
    out[b] = vecmin(ColMax(M1), ColMax(M2)); // a vector of length m
  }
  return out;
}
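
One detail worth spelling out: the permuted labels run over the stacked range 0..N1+N2-1, so a label at or above N1 has to be shifted down by N1 before it can index a row of mat2. Here is a minimal sketch of that mapping in plain C++ (no Rcpp headers, so it compiles on its own). The names locate_row and Source are illustrative only and are not part of the original code.

```cpp
#include <utility>

// Which of the two input matrices a stacked row label refers to.
// (Hypothetical helper type, introduced only for this sketch.)
enum class Source { Mat1, Mat2 };

// Maps a label from the stacked range 0..n1+n2-1 to (source matrix, local row).
// Labels below n1 address mat1 directly; the rest address mat2 after
// subtracting n1. The caller is assumed to pass a label inside the range.
std::pair<Source, int> locate_row(int label, int n1) {
    if (label < n1) {
        return std::make_pair(Source::Mat1, label);
    }
    return std::make_pair(Source::Mat2, label - n1);
}
```

With N1 = 3, for instance, label 4 resolves to row 1 of mat2, which is the index the inner loops need.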

Return a list of NumericVectors from Rcpp function

The solution to this problem is to define the vector sim inside the first for loop, like this:

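#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List gowerSim(CharacterMatrix inp) {

  int n_row = inp.nrow(), n_col = inp.ncol();
  int sumRow = 0, colLen;
  List out(n_row);

  for (int i = 0; i < n_row; i++) {

    // A fresh vector per row, so each list element gets its own data
    NumericVector sim(n_row);

    for (int j = 0; j < n_row; j++) {
      sumRow = 0;
      colLen = n_col;
      for (int k = 0; k < n_col; k++) {
        if (inp(i, k) != "NA" && inp(j, k) != "NA") {
          if (inp(i, k) != inp(j, k)) {
            sumRow = sumRow + 1;
          }
        } else {
          colLen = colLen - 1;
        }
      }
      if (colLen > 0) {
        sim[j] = (double) sumRow / colLen;
        //printf("%f",sim[j]);
      } else {
        sim[j] = NA_REAL; // sim is a NumericVector, so use the double NA
      }
    }
    out[i] = sim;
    if (i < 3) {
      print(out); // debugging output for the first few rows
    }
  }

  return out;
}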

A little example:

mat <- matrix( as.character(c(rep(1,5),sample(3,15,repl=TRUE),rep(5,5))),5)
clust <- gowerSim(mat)
clust


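Or you can declare the vector where you did and either reset it to a fresh vector at the top of the first for loop or store a copy with out[i] = clone(sim).

Why this approach works and yours does not: Rcpp vectors are thin handles to R objects, so out[i] = sim stores a reference to the same underlying vector rather than a copy. With a single sim declared outside the loop, every list element ends up pointing at the same data and shows the values from the last iteration.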
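
For what it is worth, the behaviour can be imitated in plain C++ with shared handles. This is only an analogy, under the assumption that an Rcpp vector acts like a handle to shared data and that assigning it into a list copies the handle, not the data. Handle, fill_shared and fill_fresh are made-up names for this sketch.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for an Rcpp vector: a handle to data that may be shared.
using Handle = std::shared_ptr<std::vector<double> >;

// One buffer declared outside the loop: every slot ends up sharing it,
// so all slots show whatever was written last.
std::vector<Handle> fill_shared(std::size_t n) {
    Handle sim = std::make_shared<std::vector<double> >(1);
    std::vector<Handle> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        (*sim)[0] = static_cast<double>(i);
        out[i] = sim;  // copies the handle, not the data
    }
    return out;
}

// A fresh buffer per iteration: every slot keeps its own values.
std::vector<Handle> fill_fresh(std::size_t n) {
    std::vector<Handle> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        Handle sim = std::make_shared<std::vector<double> >(1);
        (*sim)[0] = static_cast<double>(i);
        out[i] = sim;
    }
    return out;
}
```

Calling fill_shared(3) leaves all three slots pointing at one buffer holding the last value written, while fill_fresh(3) keeps a separate value in each slot.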

My first approach to solving your problem was the following: instead of filling a list, we fill a matrix, and this works fine, see here:

NumericMatrix gowerSim(CharacterMatrix inp) {

  int n_row = inp.nrow(), n_col = inp.ncol();
  int sumRow = 0, colLen;
  NumericMatrix out(n_row, n_row); // one column of n_row similarities per row of inp
  NumericVector sim(n_row);

  for (int i = 0; i < n_row; i++) {

    for (int j = 0; j < n_row; j++) {
      sumRow = 0;
      colLen = n_col;
      for (int k = 0; k < n_col; k++) {
        if (inp(i, k) != "NA" && inp(j, k) != "NA") {
          if (inp(i, k) != inp(j, k)) {
            sumRow = sumRow + 1;
          }
        } else {
          colLen = colLen - 1;
        }
      }
      if (colLen > 0) {
        sim[j] = (double) sumRow / colLen;
        //printf("%f",sim[j]);
      } else {
        sim[j] = NA_REAL; // sim is a NumericVector, so use the double NA
      }
    }
    out(_, i) = sim; // copies the values of sim into column i
    if (i < 3) {
      print(out);
    }
  }

  return out;
}

R List of numeric vectors - C++ 2d array with Rcpp

I will give you bonus points for a reproducible example, and of course for using Rcpp :) And then I will take those away for not asking on the rcpp-devel list...

As for converting STL types: you don't have to, but when you decide to do it, the as<>() idiom is correct. The only 'better way' I can think of is to do name lookup as you would in R itself:

require(inline)
require(Rcpp)

set.seed(42)
xl <- list(U=runif(4), N=rnorm(4), T2df=rt(4,2))

fun <- cxxfunction(signature(x="list"), plugin="Rcpp", body = '
Rcpp::List xl(x);
std::vector<double> u = Rcpp::as<std::vector<double> >(xl["U"]);
std::vector<double> n = Rcpp::as<std::vector<double> >(xl["N"]);
std::vector<double> t2 = Rcpp::as<std::vector<double> >(xl["T2df"]);
// do something clever here
return(R_NilValue);
')

Hope that helps. Otherwise, the list is always open...

PS As for the two-dim array, that is trickier as there is no native C++ two-dim array. If you actually want to do linear algebra, look at RcppArmadillo and RcppEigen.
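
If linear algebra is not needed, one common workaround is a single flat buffer with row-major indexing. Below is a minimal sketch in plain C++ (no Rcpp headers), assuming each vector pulled out of the list with as<>() becomes one row and that all vectors have the same length. FlatMatrix and from_rows are hypothetical names, not part of Rcpp.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// A two-dimensional view over one contiguous buffer (hypothetical helper).
struct FlatMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;  // row-major: element (r, c) is data[r * cols + c]

    double& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
};

// Builds a FlatMatrix from equally long vectors, one vector per row.
// Throws std::invalid_argument when the vectors differ in length.
FlatMatrix from_rows(const std::vector<std::vector<double> >& input) {
    FlatMatrix m;
    m.rows = input.size();
    m.cols = input.empty() ? 0 : input[0].size();
    m.data.reserve(m.rows * m.cols);
    for (std::size_t r = 0; r < input.size(); ++r) {
        if (input[r].size() != m.cols) {
            throw std::invalid_argument("all vectors must have the same length");
        }
        m.data.insert(m.data.end(), input[r].begin(), input[r].end());
    }
    return m;
}
```

For the example list above, the vectors u, n and t2 would form a 3 x 4 matrix, with m.at(1, 0) giving the first element of n.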

Keeping vectors (from list of vectors) whose elements do not have a proper subset within that same list (using RCPP)

The idea here is to avoid the O(N^3) approach and use a lower-order one instead. The other answer provided here will still be slow, since it is worse than O(N^2). Here is a solution that usually stays below O(N^2); the worst case is O(N^2), when all the elements are unique.

onlySet <- function(x){
  i <- 1
  repeat{
    y <- sapply(x[-1], function(el) !all(is.element(x[[1]], el)))
    if(all(y)){
      if(i == length(x)) break
      else i <- i + 1
    }
    x <- c(x[-1][y], x[1])
  }
  x
}

Now to show the time difference, check out the following:

match_fun <- Vectorize(function(s1, s2) all(s1 %in% s2))
method1 <- function(a){
  mat <- outer(a, a, match_fun)
  a[colSums(mat) == 1]
}

poss <- rep(possibilities, 100)

microbenchmark::microbenchmark(method1(poss), onlySet(poss))

Unit: milliseconds
         expr      min        lq       mean    median        uq       max neval cld
method1(poss) 840.7919 880.12635 932.255030 889.36380 923.32555 1420.1077   100   b
onlySet(poss)   1.9845   2.07005   2.191647   2.15945   2.24245    3.3656   100   a
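
Since the question asks about Rcpp, the subset test itself can also be written in C++. The sketch below is plain C++ (no Rcpp headers) and is the straightforward pairwise check, not a port of onlySet(): it keeps every vector that has no proper subset elsewhere in the list. It assumes integer elements, and unlike onlySet() it keeps exact duplicates, because a set is not a proper subset of itself. IntSet, is_subset and keep_minimal are names chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using IntSet = std::vector<int>;

// True when every element of a also occurs in b.
// Both inputs must be sorted, as std::includes requires.
bool is_subset(const IntSet& a, const IntSet& b) {
    return std::includes(b.begin(), b.end(), a.begin(), a.end());
}

// Keeps the sets that have no proper subset among the other sets.
// Each set is sorted and de-duplicated first; the result preserves input order.
std::vector<IntSet> keep_minimal(std::vector<IntSet> sets) {
    for (std::size_t i = 0; i < sets.size(); ++i) {
        std::sort(sets[i].begin(), sets[i].end());
        sets[i].erase(std::unique(sets[i].begin(), sets[i].end()), sets[i].end());
    }
    std::vector<IntSet> kept;
    for (std::size_t i = 0; i < sets.size(); ++i) {
        bool has_proper_subset = false;
        for (std::size_t j = 0; j < sets.size() && !has_proper_subset; ++j) {
            // A strictly smaller set contained in sets[i] is a proper subset.
            has_proper_subset = j != i &&
                                sets[j].size() < sets[i].size() &&
                                is_subset(sets[j], sets[i]);
        }
        if (!has_proper_subset) kept.push_back(sets[i]);
    }
    return kept;
}
```

Inside an Rcpp function, each list element would first be converted with as<std::vector<int> >() before being passed to keep_minimal().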

Fast way to convert a list of character vectors to a list of numeric vectors

Inspired by @Konrad's answer, I coded up the following using Rcpp.

#include <Rcpp.h>
#include <stdexcept>
#include <string>
using namespace Rcpp;

NumericVector char_to_num(CharacterVector x) {
  std::size_t n = x.size();
  if (n == 0) return NumericVector(0);
  NumericVector out(n);
  for (std::size_t i = 0; i != n; ++i) {
    std::string x_i(x[i]);
    double number = NA_REAL;
    try {
      std::size_t pos;
      number = std::stod(x_i, &pos);
      // Accept the value only if the whole string was consumed
      number = ((pos == x_i.size()) ? number : NA_REAL);
    } catch (const std::invalid_argument& e) {
      ; // do nothing: number stays NA_REAL
    }
    out[i] = number;
  }
  return out;
}

// [[Rcpp::export]]
List lst_char_to_num(List x) {
  std::size_t n = x.size();
  List out(n);
  for (std::size_t i = 0; i != n; ++i)
    out[i] = char_to_num(x[i]);
  return out;
}

This lst_char_to_num() turns out to be the best answer. I compare it to my favourite answers so far, which are try_1, try_2 and try_3 from @rmflight. try_1 was the fastest so far (on a big dataset, which is what I'm worried about). I've taken the stringr operation out of the timings because I want to evaluate purely the speed of the list conversion.

character_vector <- rep(c("a1b2", "c3d4e5", "xyz"), 1000)
extracted_numbers <- stringr::str_extract_all(character_vector, "\\d")

try_1 <- function(char_list) {
  lapply(char_list, as.numeric)
}

try_2 <- function(char_list) {
  purrr::map(char_list, as.numeric)
}

try_3 <- function(char_list) {
  relist(as.numeric(unlist(char_list)), char_list)
}

microbenchmark::microbenchmark(try_1(extracted_numbers),
                               try_2(extracted_numbers),
                               try_3(extracted_numbers),
                               lst_char_to_num(extracted_numbers),
                               times = 1000)

Unit: microseconds
                              expr       min         lq       mean     median        uq        max neval cld
          try_1(extracted_numbers)  1068.823  1334.9060  1518.7589  1477.7825  1559.791   5318.318  1000   b
          try_2(extracted_numbers)  2029.832  2581.6655  2974.4126  2856.8560  3057.930   9846.862  1000   c
          try_3(extracted_numbers) 10015.929 12261.6405 14043.5922 13188.8465 14802.795 165217.152  1000   d
lst_char_to_num(extracted_numbers)   500.858   681.5895   827.5021   765.9505   830.311   6744.985  1000   a
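
The parsing rule inside char_to_num() (accept a string only when std::stod consumes all of it) can be tried out on its own. Here is a minimal sketch in plain C++ without Rcpp headers. The name parse_strict and the bool-plus-output-parameter interface are assumptions for illustration, and the extra catch for std::out_of_range is an addition that the code above does not have.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Parses text as a double, succeeding only when the entire string is numeric.
// On success stores the result in value and returns true; on failure leaves
// value untouched and returns false.
bool parse_strict(const std::string& text, double& value) {
    try {
        std::size_t pos = 0;
        double parsed = std::stod(text, &pos);
        if (pos != text.size()) {
            return false;  // trailing characters, e.g. "12abc"
        }
        value = parsed;
        return true;
    } catch (const std::invalid_argument&) {
        return false;      // no number at the start, e.g. "xyz" or ""
    } catch (const std::out_of_range&) {
        return false;      // magnitude does not fit in a double (added here)
    }
}
```

In char_to_num() a failed parse corresponds to storing NA_REAL in the output vector.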

