How do I create a list of vectors in Rcpp?
[ Nice to see this here, but Romain and I generally recommend the rcpp-devel list for questions. Please post there going forward, as the project is not yet so large that it warrants having questions scattered all over the web. ]
RcppResultSet is part of the older classic API, whereas a lot of work has gone into what we call the new API (starting with the 0.7.* releases). Have a look at the current Rcpp page on CRAN and the list of vignettes -- six and counting.
With the new API you would return something like
return Rcpp::List::create(Rcpp::Named("vec") = someVector,
Rcpp::Named("lst") = someList,
Rcpp::Named("vec2") = someOtherVector);
all in one statement (and possibly using explicit Rcpp::wrap() calls), creating what in R would be
list(vec=someVector, lst=someList, vec2=someOtherVector)
And Rcpp::List should also be able to do lists of lists of lists... though I am not sure we have unit tests for this --- but there are numerous examples in the 500+ unit tests.
As it happens, I spent the last few days converting a lot of RQuantLib code from the classic API to the new API. This will probably get released once we get version 0.8.3 of Rcpp out (hopefully in a few days). In the meantime, you can look at the RQuantLib SVN archive.
Creating a Large List of (Large) Vectors with Rcpp
In the end, the solution I went with is the one seen above:
// [[Rcpp::export]]
List permute_data(NumericMatrix mat1,NumericMatrix mat2,int B) {
List out(B); // Will be large ~5000 elements
int N1 = mat1.rows();
int N2 = mat2.rows();
int m = mat1.cols(); //Will be large ~10000 elements
// Row labels to be permuted
IntegerVector permindx = seq(0,N1+N2-1);
NumericMatrix M1 = no_init_matrix(N1,m);
NumericMatrix M2 = no_init_matrix(N2,m);
for(int b = 0; b<B; ++b){
// Permute the N1+N2 rows
permindx = sample(permindx,N1+N2); //Use Rcpp's function to work with R's RNG
for(int j=0; j<m; ++j){
// Pick out first N1 elements of permindx
for(int i=0; i<N1; ++i){
if(permindx[i]>=N1){ //Labels N1..N1+N2-1 refer to rows of mat2
M1(i,j) = mat2(permindx[i]-N1,j); //Shift the label into mat2's row range
} else{
M1(i,j) = mat1(permindx[i],j);
}
}
// Pick out last N2 elements of permindx
for(int k=0; k<N2; ++k){
if(permindx[k+N1]<N1){ //Labels 0..N1-1 refer to rows of mat1
M2(k,j) = mat1(permindx[k+N1],j);
} else{
M2(k,j) = mat2(permindx[k+N1]-N1,j); //Shift the label into mat2's row range
}
}
}
out[b] = vecmin(ColMax(M1),ColMax(M2)); //a vector of length m
}
return(out);
}
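The row bookkeeping in permute_data() is easy to get wrong, so here is a minimal sketch of the same idea in plain C++ without Rcpp. The names pooled_row, col_max and one_replicate are hypothetical. The helpers ColMax and vecmin are not shown above, so I am assuming they compute column-wise maxima and an element-wise minimum. std::mt19937 stands in for R's RNG, and both matrices are assumed to have at least one row.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>; // rows of equal length

// Map a combined row label (0 .. N1+N2-1) to the matching row:
// labels below N1 address mat1, the rest address mat2 after subtracting N1.
const std::vector<double>& pooled_row(const Matrix& mat1, const Matrix& mat2,
                                      std::size_t label) {
    const std::size_t N1 = mat1.size();
    return label < N1 ? mat1[label] : mat2[label - N1];
}

// Column-wise maximum over the rows selected by labels[first, last).
// Assumption: this is what the ColMax helper in the answer does.
std::vector<double> col_max(const Matrix& mat1, const Matrix& mat2,
                            const std::vector<std::size_t>& labels,
                            std::size_t first, std::size_t last) {
    std::vector<double> best = pooled_row(mat1, mat2, labels[first]);
    for (std::size_t i = first + 1; i < last; ++i) {
        const std::vector<double>& row = pooled_row(mat1, mat2, labels[i]);
        for (std::size_t j = 0; j < best.size(); ++j)
            best[j] = std::max(best[j], row[j]);
    }
    return best;
}

// One permutation replicate: shuffle the labels, split them into groups of
// N1 and N2, and return the element-wise minimum of the two column maxima
// (assumption: this is what vecmin does).
std::vector<double> one_replicate(const Matrix& mat1, const Matrix& mat2,
                                  std::mt19937& rng) {
    const std::size_t N1 = mat1.size(), N2 = mat2.size();
    std::vector<std::size_t> labels(N1 + N2);
    std::iota(labels.begin(), labels.end(), std::size_t{0});
    std::shuffle(labels.begin(), labels.end(), rng);

    std::vector<double> a = col_max(mat1, mat2, labels, 0, N1);
    const std::vector<double> b = col_max(mat1, mat2, labels, N1, N1 + N2);
    for (std::size_t j = 0; j < a.size(); ++j) a[j] = std::min(a[j], b[j]);
    return a;
}
```

Calling one_replicate() B times and storing each result gives the same shape of output as the list returned above. Note that this sketch never materialises the permuted matrices, which may or may not be faster than filling M1 and M2.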
Return a list of NumericVectors from Rcpp function
The solution for this problem is to define the vector sim after the first for statement, like this:
List gowerSim(CharacterMatrix inp) {
int n_row = inp.nrow(), n_col = inp.ncol();
int sumRow=0,colLen;
List out(n_row);
for(int i=0;i<n_row;i++){
NumericVector sim(n_row);
for(int j=0;j<n_row;j++){
sumRow=0;
colLen=n_col;
for(int k=0; k<n_col;k++){
if(inp(i,k)!="NA" && inp(j,k)!="NA"){
if(inp(i,k)!=inp(j,k)){
sumRow=sumRow+1;
}
}else{
colLen=colLen-1;
}
}
if(colLen>0){
sim[j] = (double) sumRow/colLen;
//printf("%f",sim[j]);
}else{
sim[j] = NA_REAL; // sim is a NumericVector, so use the double NA
}
}
out[i] = sim;
if(i<3){
print(out);
}
}
return out;
}
A little example:
mat <- matrix( as.character(c(rep(1,5),sample(3,15,repl=TRUE),rep(5,5))),5)
clust <- gowerSim(mat)
clust
Or you can define the vector as you did and reset it inside the first for loop.
I don't know exactly why this approach works and yours does not, but I think it has to do with how the list stores its elements in C++.
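A likely explanation: an Rcpp vector is a thin handle to an R object, and out[i] = sim stores that object in the list without copying it. If sim is created once outside the loop, every list element refers to the same storage and shows the values from the last iteration. The sketch below is only an analogy in plain C++, with std::shared_ptr standing in for the handle; Vec, fill_shared and fill_fresh are hypothetical names.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

using Vec = std::shared_ptr<std::vector<double>>; // stand-in for a handle type

// Buffer created once outside the loop: every slot refers to the same
// storage, so each slot ends up showing the last iteration's values.
std::vector<Vec> fill_shared(std::size_t n) {
    std::vector<Vec> out(n);
    Vec sim = std::make_shared<std::vector<double>>(n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j)
            (*sim)[j] = static_cast<double>(i * n + j);
        out[i] = sim; // copies the handle, not the data
    }
    return out;
}

// Buffer created inside the loop: each slot gets its own storage.
std::vector<Vec> fill_fresh(std::size_t n) {
    std::vector<Vec> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        Vec sim = std::make_shared<std::vector<double>>(n);
        for (std::size_t j = 0; j < n; ++j)
            (*sim)[j] = static_cast<double>(i * n + j);
        out[i] = sim;
    }
    return out;
}
```

With n = 3, every element of fill_shared(3) points at the same vector, while fill_fresh(3) holds three distinct vectors. In Rcpp, Rcpp::clone(sim) is the usual way to force a copy if the vector has to live outside the loop.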
My first approach to your problem was the following: instead of filling a list, we fill a matrix, and this works fine, see here:
NumericMatrix gowerSim(CharacterMatrix inp) {
int n_row = inp.nrow(), n_col = inp.ncol();
int sumRow=0,colLen;
NumericMatrix out(n_row, n_row); // one column of length n_row per row of inp
NumericVector sim(n_row);
for(int i=0;i<n_row;i++){
for(int j=0;j<n_row;j++){
sumRow=0;
colLen=n_col;
for(int k=0; k<n_col;k++){
if(inp(i,k)!="NA" && inp(j,k)!="NA"){
if(inp(i,k)!=inp(j,k)){
sumRow=sumRow+1;
}
}else{
colLen=colLen-1;
}
}
if(colLen>0){
sim[j] = (double) sumRow/colLen;
//printf("%f",sim[j]);
}else{
sim[j] = NA_REAL; // sim is a NumericVector, so use the double NA
}
}
out(_,i) = sim;
if(i<3){
print(out);
}
}
return out;
}
R List of numeric vectors - C++ 2d array with Rcpp
I will give you bonus points for a reproducible example, and of course for using Rcpp :) And then I will take those away for not asking on the rcpp-devel list...
As for converting STL types: you don't have to, but when you decide to do it, the as<>() idiom is correct. The only 'better way' I can think of is to do name lookup as you would in R itself:
require(inline)
require(Rcpp)
set.seed(42)
xl <- list(U=runif(4), N=rnorm(4), T2df=rt(4,2))
fun <- cxxfunction(signature(x="list"), plugin="Rcpp", body = '
Rcpp::List xl(x);
std::vector<double> u = Rcpp::as<std::vector<double> >(xl["U"]);
std::vector<double> n = Rcpp::as<std::vector<double> >(xl["N"]);
std::vector<double> t2 = Rcpp::as<std::vector<double> >(xl["T2df"]);
// do something clever here
return(R_NilValue);
')
Hope that helps. Otherwise, the list is always open...
PS As for the two-dim array, that is trickier as there is no native C++ two-dim array. If you actually want to do linear algebra, look at RcppArmadillo and RcppEigen.
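If all you need is indexed access rather than linear algebra, one common workaround is to pack the converted vectors into a single contiguous buffer and index it by row and column. The following is a minimal sketch in plain C++, assuming the list elements have already been converted to std::vector<double> and all have the same length. Dense2D and pack_rows are hypothetical names, not part of Rcpp.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Row-major matrix stored in one contiguous buffer.
struct Dense2D {
    std::size_t rows, cols;
    std::vector<double> data;
    double& operator()(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    double operator()(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Pack equally long vectors (one per list element) into the rows of a Dense2D.
// Throws if the vectors differ in length.
Dense2D pack_rows(const std::vector<std::vector<double>>& vecs) {
    const std::size_t rows = vecs.size();
    const std::size_t cols = rows ? vecs[0].size() : 0;
    Dense2D m{rows, cols, std::vector<double>(rows * cols)};
    for (std::size_t r = 0; r < rows; ++r) {
        if (vecs[r].size() != cols)
            throw std::invalid_argument("all vectors must have the same length");
        for (std::size_t c = 0; c < cols; ++c)
            m(r, c) = vecs[r][c];
    }
    return m;
}
```

For the example above you would call something like pack_rows({u, n, t2}) and then read m(1, 0) to get the first element of n.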
Keeping vectors (from list of vectors) whose elements do not have a proper subset within that same list (using RCPP)
The idea here is to avoid O(N^3) and use a lower order instead. The other answer provided here will still be slow, since it is greater than O(N^2). Here is a solution that stays below O(N^2) in most cases; the worst case is O(N^2), when all the elements are unique.
onlySet <- function(x){
i <- 1
repeat{
y <- sapply(x[-1], function(el)!all(is.element(x[[1]], el)))
if(all(y)){
if(i==length(x)) break
else i <- i+1
}
x <- c(x[-1][y], x[1])
}
x
}
Now to show the time difference, check out the following:
match_fun <- Vectorize(function(s1, s2) all(s1 %in% s2))
method1 <- function(a){
mat <- outer(a, a, match_fun)
a[colSums(mat) == 1]
}
poss <- rep(possibilities, 100)
microbenchmark::microbenchmark(method1(poss), onlySet(poss))
Unit: milliseconds
expr min lq mean median uq max neval cld
method1(poss) 840.7919 880.12635 932.255030 889.36380 923.32555 1420.1077 100 b
onlySet(poss) 1.9845 2.07005 2.191647 2.15945 2.24245 3.3656 100 a
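Since the question asks about Rcpp, here is a rough sketch of the subset test itself in plain C++. It is not the pruning scheme used by onlySet() above: it is a plain pairwise check that always does about N^2 subset tests, and it only assumes the goal is the same, namely to keep every vector that has no other vector in the list as a subset, keeping the first of any identical vectors. Set, normalise and keep_minimal are hypothetical names, and the elements are assumed to be integers.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Set = std::vector<int>;

// Sorted, de-duplicated copy, as required by std::includes.
static Set normalise(Set s) {
    std::sort(s.begin(), s.end());
    s.erase(std::unique(s.begin(), s.end()), s.end());
    return s;
}

// Keep every vector that has no other vector in the list as a subset.
// Among identical vectors only the first one is kept.
std::vector<Set> keep_minimal(const std::vector<Set>& x) {
    std::vector<Set> norm;
    norm.reserve(x.size());
    for (const Set& s : x) norm.push_back(normalise(s));

    std::vector<Set> kept;
    for (std::size_t i = 0; i < x.size(); ++i) {
        bool drop = false;
        for (std::size_t j = 0; j < x.size() && !drop; ++j) {
            if (i == j) continue;
            // Is set j contained in set i?
            const bool j_in_i = std::includes(norm[i].begin(), norm[i].end(),
                                              norm[j].begin(), norm[j].end());
            // Drop i for a smaller subset, or for an earlier duplicate.
            if (j_in_i && (norm[j] != norm[i] || j < i)) drop = true;
        }
        if (!drop) kept.push_back(x[i]);
    }
    return kept;
}
```

For the input {1,2}, {1,2,3}, {4}, {2,1}, {4,5} this keeps {1,2} and {4}. Wrapping it for R would still need the usual Rcpp conversions, which are left out here.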
Fast way to convert a list of character vectors to a list of numeric vectors
Inspired by @Konrad's answer, I coded up the following using Rcpp.
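#include <Rcpp.h>
#include <stdexcept>
#include <string>
using namespace Rcpp;
// [[Rcpp::plugins(cpp11)]]
NumericVector char_to_num(CharacterVector x) {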
std::size_t n = x.size();
if (n == 0) return NumericVector(0);
NumericVector out(n);
for (std::size_t i = 0; i != n; ++i) {
std::string x_i(x[i]);
double number = NA_REAL;
try {
std::size_t pos;
number = std::stod(x_i, &pos);
number = ((pos == x_i.size()) ? number : NA_REAL);
} catch (const std::invalid_argument& e) {
; // do nothing
}
out[i] = number;
}
return out;
}
// [[Rcpp::export]]
List lst_char_to_num(List x) {
std::size_t n = x.size();
List out(n);
for (std::size_t i = 0; i != n; ++i)
out[i] = char_to_num(x[i]);
return out;
}
This lst_char_to_num() turns out to be the best answer. I compare it to my favourite answers so far, which are try1, try2 and try3 from @rmflight. try1 was the fastest so far (on a big dataset, which is what I'm worried about). I've taken the stringr operation out of the timings because I want to evaluate purely the speed of the list conversion.
character_vector <- rep(c("a1b2", "c3d4e5", "xyz"), 1000)
extracted_numbers <- stringr::str_extract_all(character_vector, "\\d")
try_1 <- function(char_list) {
lapply(char_list, as.numeric)
}
try_2 <- function(char_list) {
purrr::map(char_list, as.numeric)
}
try_3 <- function(char_list) {
relist(as.numeric(unlist(char_list)), char_list)
}
microbenchmark::microbenchmark(try_1(extracted_numbers),
try_2(extracted_numbers),
try_3(extracted_numbers),
lst_char_to_num(extracted_numbers),
times = 1000)
Unit: microseconds
expr min lq mean median uq max neval cld
try_1(extracted_numbers) 1068.823 1334.9060 1518.7589 1477.7825 1559.791 5318.318 1000 b
try_2(extracted_numbers) 2029.832 2581.6655 2974.4126 2856.8560 3057.930 9846.862 1000 c
try_3(extracted_numbers) 10015.929 12261.6405 14043.5922 13188.8465 14802.795 165217.152 1000 d
lst_char_to_num(extracted_numbers) 500.858 681.5895 827.5021 765.9505 830.311 6744.985 1000 a
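The parsing rule inside char_to_num() can be tried out on its own, without R. Below is a minimal sketch in plain C++ of the same check: accept the result of std::stod only if the whole string was consumed. parse_or_nan and parse_all are hypothetical names, NaN stands in for NA_REAL, and unlike the code above this version also catches std::out_of_range, which std::stod throws for values too large for a double.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Parse s as a double; return NaN unless the whole string is consumed.
double parse_or_nan(const std::string& s) {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    try {
        std::size_t pos = 0;
        const double value = std::stod(s, &pos);
        return pos == s.size() ? value : nan; // reject trailing junk like "1a"
    } catch (const std::invalid_argument&) {
        return nan; // no leading number at all, e.g. "abc" or ""
    } catch (const std::out_of_range&) {
        return nan; // magnitude not representable as a double
    }
}

// Apply parse_or_nan to every string in a list of string vectors.
std::vector<std::vector<double>> parse_all(
        const std::vector<std::vector<std::string>>& x) {
    std::vector<std::vector<double>> out;
    out.reserve(x.size());
    for (const std::vector<std::string>& strings : x) {
        std::vector<double> numbers;
        numbers.reserve(strings.size());
        for (const std::string& s : strings) numbers.push_back(parse_or_nan(s));
        out.push_back(numbers);
    }
    return out;
}
```

For example, parse_or_nan("3.5") gives 3.5, while "1a", "abc" and the empty string all give NaN.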