Ordering Permutation in Rcpp I.E. Base::Order()

Ordering Permutation in Rcpp i.e. base::order()

Here's a simple version leveraging Rcpp sugar to implement an order function. We put in a check for duplicates so that we guarantee that things work 'as expected'. (There is also a bug with Rcpp's sort method when there are NAs, so that may want to be checked as well -- this will be fixed by the next release).

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
IntegerVector order_(NumericVector x) {
if (is_true(any(duplicated(x)))) {
Rf_warning("There are duplicates in 'x'; order not guaranteed to match that of R's base::order");
}
NumericVector sorted = clone(x).sort();
return match(sorted, x);
}

/*** R
library(microbenchmark)
x <- runif(1E5)
identical( order(x), order_(x) )
microbenchmark(
order(x),
order_(x)
)
*/

gives me

> Rcpp::sourceCpp('~/test-order.cpp')

> set.seed(456)

> library(microbenchmark)

> x <- runif(1E5)

> identical( order(x), order_(x) )
[1] TRUE

> microbenchmark(
+ order(x),
+ order_(x)
+ )
Unit: milliseconds
expr min lq median uq max neval
order(x) 15.48007 15.69709 15.86823 16.21142 17.22293 100
order_(x) 10.81169 11.07167 11.40678 11.87135 48.66372 100
>

Of course, if you're comfortable with the output not matching R, you can remove the duplicated check -- x[order_(x)] will still be properly sorted; more specifically, all(x[order(x)] == x[order_(x)]) should return TRUE.

Permutation order in rcpp

If I understand what you asking, one approach would be to leverage the comparison operators of std::tuple, which should "do the right thing".

// [[Rcpp::plugins(cpp11)]]
#include <Rcpp.h>
#include <tuple>

template <typename... Ts>
struct Point {
std::size_t index;

using tuple_t = std::tuple<Ts...>;
tuple_t data;

template <typename... Vs>
Point(std::size_t i, const Vs&... vs)
: index(i),
data(std::forward<decltype(vs[i])>(vs[i])...)
{}

bool operator<(const Point& other) const {
return data < other.data;
}
};

template <typename... Ts>
class PointVector {
public:
std::size_t sz;
using vector_t = std::vector<Point<Ts...>>;
vector_t data;

template <typename... Vs>
PointVector(const Vs&... vs)
: sz(min_size(vs...))
{
data.reserve(sz);
for (std::size_t i = 0; i < sz; i++) {
data.emplace_back(i, vs...);
}
}

Rcpp::IntegerVector sorted_index() const {
vector_t tmp(data);
std::stable_sort(tmp.begin(), tmp.end());

Rcpp::IntegerVector res(sz);
for (std::size_t i = 0; i < sz; i++) {
res[i] = tmp[i].index + 1;
}

return res;
}

private:
template <typename V>
std::size_t min_size(const V& v) {
return v.size();
}

template <typename T, typename S, typename... Vs>
std::size_t min_size(const T& t, const S& s, const Vs&... vs) {
return t.size() < s.size() ?
min_size(t, vs...) :
min_size(s, vs...);
}
};

using namespace Rcpp;

// [[Rcpp::export]]
IntegerVector order2(NumericVector x, NumericVector y) {
PointVector<double, double> pv(x, y);
return pv.sorted_index();
}

// [[Rcpp::export]]
IntegerVector order3(NumericVector x, NumericVector y, NumericVector z) {
PointVector<double, double, double> pv(x, y, z);
return pv.sorted_index();
}

Using this data to demonstrate,

x <- rep(1:2 + 0.5, 10)
y <- rep(1:4 + 0.5, 5)
z <- rep(1:5 + 0.5, 4)
(df <- data.frame(x, y, z))
# x y z
# 1 1.5 1.5 1.5
# 2 2.5 2.5 2.5
# 3 1.5 3.5 3.5
# 4 2.5 4.5 4.5
# 5 1.5 1.5 5.5
# 6 2.5 2.5 1.5
# 7 1.5 3.5 2.5
# 8 2.5 4.5 3.5
# 9 1.5 1.5 4.5
# 10 2.5 2.5 5.5
# 11 1.5 3.5 1.5
# 12 2.5 4.5 2.5
# 13 1.5 1.5 3.5
# 14 2.5 2.5 4.5
# 15 1.5 3.5 5.5
# 16 2.5 4.5 1.5
# 17 1.5 1.5 2.5
# 18 2.5 2.5 3.5
# 19 1.5 3.5 4.5
# 20 2.5 4.5 5.5

the C++ version yields the same results as the base R equivalent:

all.equal(base::order(x, y, z), order3(x, y, z))
# [1] TRUE

df[order3(x, y, z),]
# x y z
# 1 1.5 1.5 1.5
# 17 1.5 1.5 2.5
# 13 1.5 1.5 3.5
# 9 1.5 1.5 4.5
# 5 1.5 1.5 5.5
# 11 1.5 3.5 1.5
# 7 1.5 3.5 2.5
# 3 1.5 3.5 3.5
# 19 1.5 3.5 4.5
# 15 1.5 3.5 5.5
# 6 2.5 2.5 1.5
# 2 2.5 2.5 2.5
# 18 2.5 2.5 3.5
# 14 2.5 2.5 4.5
# 10 2.5 2.5 5.5
# 16 2.5 4.5 1.5
# 12 2.5 4.5 2.5
# 8 2.5 4.5 3.5
# 4 2.5 4.5 4.5
# 20 2.5 4.5 5.5

Edit: Here's a slightly more palatable version which allows you to write auto pv = MakePointVector(...) rather than specifying all of the type parameters:

// Point class as before 

template <typename... Vecs>
class PointVector {
public:
std::size_t sz;
using point_t = Point<typename std::remove_reference<decltype(Vecs{}[0])>::type...>;
using vector_t = std::vector<point_t>;
vector_t data;

template <typename... Vs>
PointVector(const Vecs&... vs)
: sz(min_size(vs...))
{
data.reserve(sz);
for (std::size_t i = 0; i < sz; i++) {
data.emplace_back(i, vs...);
}
}

Rcpp::IntegerVector sorted_index() const {
vector_t tmp(data);
std::stable_sort(tmp.begin(), tmp.end());

Rcpp::IntegerVector res(sz);
for (std::size_t i = 0; i < sz; i++) {
res[i] = tmp[i].index + 1;
}

return res;
}

private:
template <typename V>
std::size_t min_size(const V& v) {
return v.size();
}

template <typename T, typename S, typename... Vs>
std::size_t min_size(const T& t, const S& s, const Vs&... vs) {
return t.size() < s.size() ?
min_size(t, vs...) :
min_size(s, vs...);
}
};

template <typename... Vecs>
PointVector<Vecs...> MakePointVector(const Vecs&... vecs) {
return PointVector<Vecs...>(vecs...);
}

using namespace Rcpp;

// [[Rcpp::export]]
IntegerVector order2(NumericVector x, NumericVector y) {
auto pv = MakePointVector(x, y);
return pv.sorted_index();
}

// [[Rcpp::export]]
IntegerVector order3(NumericVector x, NumericVector y, NumericVector z) {
auto pv = MakePointVector(x, y, z);
return pv.sorted_index();
}

/*** R

all.equal(base::order(x, y), order2(x, y))
# [1] TRUE

all.equal(base::order(x, y, z), order3(x, y, z))
# [1] TRUE

*/

Rcpp unique order output

If you look at the (short) source file you see that it uses an internal class IndexHash. I suspect this one sorts by default.

If original order is paramount, I supposed you could write yourself a new convenience wrapper. It cannot be all this hard: at the risk of wasting a few bytes of memory, allocate a temporary logical vector, use a standard hashmap and loop over the incoming vector. For each value, ask if the hashmap has seen this value, store the boolean answer. Then use that to index the original vector.

Chances are this is even implemented somewhere. Also look at Armadillo and Eigen for utility functions.

Reorder a vector by order of another

We could use rank

vec1[rank(vec2)]
#[1] 0 1 6 4 2 5 9 3 7

Or with order

vec1[order(order(vec2))]
#[1] 0 1 6 4 2 5 9 3 7

Or as @markus suggested an option with frank from data.table

library(data.table)
vec1[frank(vec2)]

rcpp updating data in base environment

Your size variable is changing because you are changing it in your C++ code. In particular this line:

emp(apply(j))  = emp(apply(j)) - 1;

Rcpp passes variables by reference so anything you do to them inside will be reflected in your top R variables. If you want to avoid this, then you want to clone your variable. Changing your code to the following corrects the problem.

#include <Rcpp.h>
using namespace Rcpp;

// Note the change in the name of 'emp' to 'emp_'!!!

//[[Rcpp::export]]
NumericVector gs2(int I, int J, int nc, NumericVector pos, NumericVector emp_, NumericMatrix choices) {
NumericVector admits(J);
NumericVector out(I);

// clone your emp
NumericVector emp = clone(emp_);

std::fill(out.begin(),out.end(),J+1);
for (int i=0;i<I;i++){
NumericVector apply = choices(pos(i),_)-1;
for (int j=0;j<nc;j++){
if (emp(apply(j))>0)
{
out(pos(i)) = apply(j)+1;
admits(apply(j)) = admits(apply(j)) + 1;
emp(apply(j)) = emp(apply(j)) - 1;
break;
}
}
}
return out;
}

Test

library(Rcpp)
sourceCpp("test.cpp")

set.seed(123)
rank = (1:20)-1
stuchoice = matrix(sample(1:3,6*20,replace=T),byrow=T,ncol=6,nrow=20)

size = c(7,11,4)

gs2(20,3,6,rank,size,stuchoice)
size
[1] 7 11 4


Related Topics



Leave a reply



Submit