Ordering Permutation in Rcpp i.e. base::order()
Here's a simple version leveraging Rcpp sugar to implement an order
function. We put in a check for duplicates so that we guarantee that things work 'as expected'. (There is also a bug with Rcpp's sort
method when there are NA
s, so that may want to be checked as well -- this will be fixed by the next release).
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector order_(NumericVector x) {
if (is_true(any(duplicated(x)))) {
Rf_warning("There are duplicates in 'x'; order not guaranteed to match that of R's base::order");
}
NumericVector sorted = clone(x).sort();
return match(sorted, x);
}
/*** R
library(microbenchmark)
x <- runif(1E5)
identical( order(x), order_(x) )
microbenchmark(
order(x),
order_(x)
)
*/
gives me
> Rcpp::sourceCpp('~/test-order.cpp')
> set.seed(456)
> library(microbenchmark)
> x <- runif(1E5)
> identical( order(x), order_(x) )
[1] TRUE
> microbenchmark(
+ order(x),
+ order_(x)
+ )
Unit: milliseconds
expr min lq median uq max neval
order(x) 15.48007 15.69709 15.86823 16.21142 17.22293 100
order_(x) 10.81169 11.07167 11.40678 11.87135 48.66372 100
>
Of course, if you're comfortable with the output not matching R, you can remove the duplicated check -- x[order_(x)]
will still be properly sorted; more specifically, all(x[order(x)] == x[order_(x)])
should return TRUE
.
Permutation order in rcpp
If I understand what you asking, one approach would be to leverage the comparison operators of std::tuple
, which should "do the right thing".
// [[Rcpp::plugins(cpp11)]]
#include <Rcpp.h>
#include <tuple>
template <typename... Ts>
struct Point {
std::size_t index;
using tuple_t = std::tuple<Ts...>;
tuple_t data;
template <typename... Vs>
Point(std::size_t i, const Vs&... vs)
: index(i),
data(std::forward<decltype(vs[i])>(vs[i])...)
{}
bool operator<(const Point& other) const {
return data < other.data;
}
};
template <typename... Ts>
class PointVector {
public:
std::size_t sz;
using vector_t = std::vector<Point<Ts...>>;
vector_t data;
template <typename... Vs>
PointVector(const Vs&... vs)
: sz(min_size(vs...))
{
data.reserve(sz);
for (std::size_t i = 0; i < sz; i++) {
data.emplace_back(i, vs...);
}
}
Rcpp::IntegerVector sorted_index() const {
vector_t tmp(data);
std::stable_sort(tmp.begin(), tmp.end());
Rcpp::IntegerVector res(sz);
for (std::size_t i = 0; i < sz; i++) {
res[i] = tmp[i].index + 1;
}
return res;
}
private:
template <typename V>
std::size_t min_size(const V& v) {
return v.size();
}
template <typename T, typename S, typename... Vs>
std::size_t min_size(const T& t, const S& s, const Vs&... vs) {
return t.size() < s.size() ?
min_size(t, vs...) :
min_size(s, vs...);
}
};
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector order2(NumericVector x, NumericVector y) {
PointVector<double, double> pv(x, y);
return pv.sorted_index();
}
// [[Rcpp::export]]
IntegerVector order3(NumericVector x, NumericVector y, NumericVector z) {
PointVector<double, double, double> pv(x, y, z);
return pv.sorted_index();
}
Using this data to demonstrate,
x <- rep(1:2 + 0.5, 10)
y <- rep(1:4 + 0.5, 5)
z <- rep(1:5 + 0.5, 4)
(df <- data.frame(x, y, z))
# x y z
# 1 1.5 1.5 1.5
# 2 2.5 2.5 2.5
# 3 1.5 3.5 3.5
# 4 2.5 4.5 4.5
# 5 1.5 1.5 5.5
# 6 2.5 2.5 1.5
# 7 1.5 3.5 2.5
# 8 2.5 4.5 3.5
# 9 1.5 1.5 4.5
# 10 2.5 2.5 5.5
# 11 1.5 3.5 1.5
# 12 2.5 4.5 2.5
# 13 1.5 1.5 3.5
# 14 2.5 2.5 4.5
# 15 1.5 3.5 5.5
# 16 2.5 4.5 1.5
# 17 1.5 1.5 2.5
# 18 2.5 2.5 3.5
# 19 1.5 3.5 4.5
# 20 2.5 4.5 5.5
the C++ version yields the same results as the base R equivalent:
all.equal(base::order(x, y, z), order3(x, y, z))
# [1] TRUE
df[order3(x, y, z),]
# x y z
# 1 1.5 1.5 1.5
# 17 1.5 1.5 2.5
# 13 1.5 1.5 3.5
# 9 1.5 1.5 4.5
# 5 1.5 1.5 5.5
# 11 1.5 3.5 1.5
# 7 1.5 3.5 2.5
# 3 1.5 3.5 3.5
# 19 1.5 3.5 4.5
# 15 1.5 3.5 5.5
# 6 2.5 2.5 1.5
# 2 2.5 2.5 2.5
# 18 2.5 2.5 3.5
# 14 2.5 2.5 4.5
# 10 2.5 2.5 5.5
# 16 2.5 4.5 1.5
# 12 2.5 4.5 2.5
# 8 2.5 4.5 3.5
# 4 2.5 4.5 4.5
# 20 2.5 4.5 5.5
Edit: Here's a slightly more palatable version which allows you to write auto pv = MakePointVector(...)
rather than specifying all of the type parameters:
// Point class as before
template <typename... Vecs>
class PointVector {
public:
std::size_t sz;
using point_t = Point<typename std::remove_reference<decltype(Vecs{}[0])>::type...>;
using vector_t = std::vector<point_t>;
vector_t data;
template <typename... Vs>
PointVector(const Vecs&... vs)
: sz(min_size(vs...))
{
data.reserve(sz);
for (std::size_t i = 0; i < sz; i++) {
data.emplace_back(i, vs...);
}
}
Rcpp::IntegerVector sorted_index() const {
vector_t tmp(data);
std::stable_sort(tmp.begin(), tmp.end());
Rcpp::IntegerVector res(sz);
for (std::size_t i = 0; i < sz; i++) {
res[i] = tmp[i].index + 1;
}
return res;
}
private:
template <typename V>
std::size_t min_size(const V& v) {
return v.size();
}
template <typename T, typename S, typename... Vs>
std::size_t min_size(const T& t, const S& s, const Vs&... vs) {
return t.size() < s.size() ?
min_size(t, vs...) :
min_size(s, vs...);
}
};
template <typename... Vecs>
PointVector<Vecs...> MakePointVector(const Vecs&... vecs) {
return PointVector<Vecs...>(vecs...);
}
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector order2(NumericVector x, NumericVector y) {
auto pv = MakePointVector(x, y);
return pv.sorted_index();
}
// [[Rcpp::export]]
IntegerVector order3(NumericVector x, NumericVector y, NumericVector z) {
auto pv = MakePointVector(x, y, z);
return pv.sorted_index();
}
/*** R
all.equal(base::order(x, y), order2(x, y))
# [1] TRUE
all.equal(base::order(x, y, z), order3(x, y, z))
# [1] TRUE
*/
Rcpp unique order output
If you look at the (short) source file you see that it uses an internal class IndexHash
. I suspect this one sorts by default.
If original order is paramount, I supposed you could write yourself a new convenience wrapper. It cannot be all this hard: at the risk of wasting a few bytes of memory, allocate a temporary logical vector, use a standard hashmap and loop over the incoming vector. For each value, ask if the hashmap has seen this value, store the boolean answer. Then use that to index the original vector.
Chances are this is even implemented somewhere. Also look at Armadillo and Eigen for utility functions.
Reorder a vector by order of another
We could use rank
vec1[rank(vec2)]
#[1] 0 1 6 4 2 5 9 3 7
Or with order
vec1[order(order(vec2))]
#[1] 0 1 6 4 2 5 9 3 7
Or as @markus suggested an option with frank
from data.table
library(data.table)
vec1[frank(vec2)]
rcpp updating data in base environment
Your size
variable is changing because you are changing it in your C++ code. In particular this line:
emp(apply(j)) = emp(apply(j)) - 1;
Rcpp
passes variables by reference so anything you do to them inside will be reflected in your top R variables. If you want to avoid this, then you want to clone
your variable. Changing your code to the following corrects the problem.
#include <Rcpp.h>
using namespace Rcpp;
// Note the change in the name of 'emp' to 'emp_'!!!
//[[Rcpp::export]]
NumericVector gs2(int I, int J, int nc, NumericVector pos, NumericVector emp_, NumericMatrix choices) {
NumericVector admits(J);
NumericVector out(I);
// clone your emp
NumericVector emp = clone(emp_);
std::fill(out.begin(),out.end(),J+1);
for (int i=0;i<I;i++){
NumericVector apply = choices(pos(i),_)-1;
for (int j=0;j<nc;j++){
if (emp(apply(j))>0)
{
out(pos(i)) = apply(j)+1;
admits(apply(j)) = admits(apply(j)) + 1;
emp(apply(j)) = emp(apply(j)) - 1;
break;
}
}
}
return out;
}
Test
library(Rcpp)
sourceCpp("test.cpp")
set.seed(123)
rank = (1:20)-1
stuchoice = matrix(sample(1:3,6*20,replace=T),byrow=T,ncol=6,nrow=20)
size = c(7,11,4)
gs2(20,3,6,rank,size,stuchoice)
size
[1] 7 11 4
Related Topics
Put Multiple Data Frames into List (Smart Way)
Fastest Way to Detect If Vector Has at Least 1 Na
Error Calling Serialize R Function
Copying and Modifying a Default Theme
Handling Missing/Incomplete Data in R--Is There Function to Mask But Not Remove Nas
Use R to Convert PDF Files to Text Files for Text Mining
Draw a Chronological Timeline with Ggplot2
Name Columns Within Aggregate in R
Controlling the 'Alpha' Level in a Ggplot2 Legend
Add Author Affiliation in R Markdown Beamer Presentation
How to Build a Dendrogram from a Directory Tree
Knitr: Run All Chunks in an Rmarkdown Document
How to Get Currency Exchange Rates in R
R Glmnet:"(List) Object Cannot Be Coerced to Type 'Double' "
Comparison Between Dplyr::Do/Purrr::Map, What Advantages
Why Can't I Get a P-Value Smaller Than 2.2E-16