Rcpp Armadillo, submatrices and subvectors - rcpp

I am trying to translate some R code into RcppArmadillo, and as part of that I would like to do the following:
Assume there is a nonnegative vector v and a matrix M, both with m rows. I would like to drop every row of M whose corresponding entry in v is zero, and afterwards drop the zero entries from v itself. In R this is simply:
M = M[v>0,]
v = v[v>0]
So my question is whether there is a way to do this in RcppArmadillo. Since I am quite new to programming, I was not able to find anything that solves my problem, although I doubt I am the first to ask this probably quite easy question.

Of course there is a way to go about subsetting elements in both Rcpp (subsetting with Rcpp) and RcppArmadillo (Armadillo subsetting).
Here is a way to replicate the behavior of R subsets in Armadillo.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// Isolate by Row: keep the rows of x whose matching entry in idx is positive
// [[Rcpp::export]]
arma::mat vec_subset_mat(const arma::mat& x, const arma::uvec& idx) {
  return x.rows(arma::find(idx > 0));
}
// Isolate by Element: keep the positive entries of x
// [[Rcpp::export]]
arma::vec subset_vec(const arma::vec& x) {
  return x.elem(arma::find(x > 0));
}
/*** R
set.seed(1334)
m = matrix(rnorm(100), 10, 10)
v = sample(0:1, 10, replace = TRUE)
all.equal(m[v>0,], vec_subset_mat(m,v))
all.equal(v[v>0], as.numeric(subset_vec(v)))
*/
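As a library-free illustration of what the two calls above do, here is a minimal sketch in plain C++ using std::vector instead of Armadillo types. The names positive_indices, select_rows and select_elems are hypothetical helpers invented for this example; they stand in for find(v > 0), M.rows(idx) and v.elem(idx) respectively. The point it tries to show is that the index set is computed once from v and then applied to both M and v, which mirrors doing M = M[v>0,] before v = v[v>0] in R.

```cpp
#include <cstddef>
#include <vector>

using Row = std::vector<double>;

// Positions i with v[i] > 0 (the role played by find(v > 0)).
std::vector<std::size_t> positive_indices(const std::vector<double>& v) {
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] > 0) idx.push_back(i);
    }
    return idx;
}

// Gather the selected rows (the role played by M.rows(idx)).
std::vector<Row> select_rows(const std::vector<Row>& M,
                             const std::vector<std::size_t>& idx) {
    std::vector<Row> out;
    out.reserve(idx.size());
    for (std::size_t i : idx) out.push_back(M.at(i));  // at() checks bounds
    return out;
}

// Gather the selected elements (the role played by v.elem(idx)).
std::vector<double> select_elems(const std::vector<double>& v,
                                 const std::vector<std::size_t>& idx) {
    std::vector<double> out;
    out.reserve(idx.size());
    for (std::size_t i : idx) out.push_back(v.at(i));
    return out;
}
```

Computing idx before touching v matters: if v were shrunk first, its remaining entries would no longer line up with the rows of M.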

Related

Extract a data.frame from a list within Rcpp

This is probably a really simple question, but I can't figure out what's wrong.
I have a list that I pass to an Rcpp function, and the first element of that list is a data.frame.
How do I get that data.frame?
bar = list(df = data.frame(A = 1:3,B=letters[1:3]),some_other_variable = 2)
foo(bar)
And the following C++ code:
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericVector bar(Rcpp::List test){
  Rcpp::DataFrame df_test = test["df"];
  Rcpp::NumericVector result = df_test["A"];
  return result;
}
I get the following error on the line DataFrame df_test = test["df"]:
error: conversion from 'Rcpp::Vector<19>::NameProxy' {aka 'Rcpp::internal::generic_name_proxy<19, Rcpp::PreserveStorage>'} to 'Rcpp::DataFrame' {aka 'Rcpp::DataFrame_Impl<Rcpp::PreserveStorage>'} is ambiguous
Anyone know what I'm missing? Thanks.
There may be a combination of issues going on with the instantiation and construction of List and DataFrame objects. See the (old !!) RcppExamples package for working examples.
Here is a repaired version of your code that works and does something with the vector inside the data.frame:
Code
#include <Rcpp.h>
// [[Rcpp::export]]
int bar(Rcpp::List test){
  Rcpp::DataFrame df(test["df"]);
  Rcpp::IntegerVector ivec = df["A"];
  return Rcpp::sum(ivec);
}
/*** R
zz <- list(df = data.frame(A = 1:3,B=letters[1:3]),some_other_variable = 2)
bar(zz)
*/
Demo
> Rcpp::sourceCpp("~/git/stackoverflow/70035630/answer.cpp")
> zz <- list(df = data.frame(A = 1:3,B=letters[1:3]),some_other_variable = 2)
> bar(zz)
[1] 6
>
Edit: For completeness, the assignment op can be used with a SEXP, as in SEXP df2 = test["df"];, which can then be used to instantiate a data.frame. Template programming is difficult and not all corners are completely smoothed.

Check BIC via RcppArmadillo(Rcpp)

I want to check the BIC usage in statistics.
My small example, saved as check_bic.cpp, is as follows:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
List check_bic(const int N = 10, const int p = 20, const double seed=0){
  arma_rng::set_seed(seed); // for reproducibility
  arma::mat Beta = randu(p,N); // randu/randn: random values (uniform and normal distributions)
  arma::vec Bic = randu(N);
  uvec ii = find(Bic == min(Bic)); // may be just one or several elements
  int id = ii(ii.n_elem); // fetch the last one element
  vec behat = Beta.col(id); // fetch the id column of matrix Beta
  List ret;
  ret["Bic"] = Bic;
  ret["ii"] = ii;
  ret["id"] = id;
  ret["Beta"] = Beta;
  ret["behat"] = behat;
  return ret;
}
Then I compile check_bic.cpp in R by
library(Rcpp)
library(RcppArmadillo);
sourceCpp("check_bic.cpp")
and the compilation can pass successfully.
However, when I ran
check_bic(10,20,0)
in R, it shows errors as
error: Mat::operator(): index out of bounds
Error in check_bic(10, 20, 0) : Mat::operator(): index out of bounds
I checked the .cpp code line by line, and I guess the problem probably occurs at
uvec ii = find(Bic == min(Bic)); // may be just one or several elements
int id = ii(ii.n_elem); // fetch the last one element
since if uvec ii has only one element, then ii.n_elem may be NaN or something else in Rcpp (while it's OK in Matlab), and I don't know how to deal with this case. Any help?
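The error message is consistent with zero-based indexing rather than with n_elem being NaN: Armadillo, like C++ in general, numbers elements from 0, so the last element of ii sits at index ii.n_elem - 1, whereas Matlab counts from 1. In the code above that would mean writing ii(ii.n_elem - 1). The following is a minimal sketch of the same logic in plain C++ with std::vector; argmin_all and last_index are hypothetical helper names used only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// All positions holding the minimum value (the role of find(Bic == min(Bic))).
std::vector<std::size_t> argmin_all(const std::vector<double>& x) {
    std::vector<std::size_t> idx;
    if (x.empty()) return idx;
    const double lo = *std::min_element(x.begin(), x.end());
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] == lo) idx.push_back(i);
    }
    return idx;
}

// Last stored index: with zero-based containers it lives at size() - 1,
// never at size(), which is one past the end.
std::size_t last_index(const std::vector<std::size_t>& idx) {
    if (idx.empty()) throw std::out_of_range("no indices to choose from");
    return idx[idx.size() - 1];
}
```

With a single minimum, idx.size() is 1 and the only valid position is 0, which is exactly the case that fails when the count itself is used as the index.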

random shuffle in RcppArrayFire

I am experiencing trouble getting this example to run correctly. Currently it produces the same random sample for every iteration and seed input, despite the seed changing as shown by af::getSeed().
#include "RcppArrayFire.h"
#include <random>
using namespace Rcpp;
using namespace RcppArrayFire;
// [[Rcpp::export]]
af::array random_test(RcppArrayFire::typed_array<f64> theta, int counts, int seed){
  const int theta_size = theta.dims()[0];
  af::array out(counts, theta_size, f64);
  for(int f = 0; f < counts; f++){
    af::randomEngine engine;
    af_set_seed(seed + f);
    //Rcpp::Rcout << af::getSeed();
    af::array out_temp = af::randu(theta_size, u8, engine);
    out(f, af::span) = out_temp;
    // out(f, af::span) = theta(out_temp);
  }
  return out;
}
/*** R
theta <- 1:10
random_test(theta, 5, 1)
random_test(theta, 5, 2)
*/
The immediate problem is that you create a local random engine within each iteration of the loop but set the seed of the global random engine. Either set the seed of the local engine via engine.setSeed(seed), or get rid of the local engine altogether, letting af::randu default to the global engine.
However, it would still be "unusual" to change the seed during each step of the loop. Normally one sets the seed only once, e.g.:
// [[Rcpp::depends(RcppArrayFire)]]
#include "RcppArrayFire.h"
// [[Rcpp::export]]
af::array random_test(RcppArrayFire::typed_array<f64> theta, int counts, int seed){
  const int theta_size = theta.dims()[0];
  af::array out(counts, theta_size, f64);
  af::setSeed(seed);
  for(int f = 0; f < counts; f++){
    af::array out_temp = af::randu(theta_size, u8);
    out(f, af::span) = out_temp;
  }
  return out;
}
BTW, it makes sense to parallelize this as long as your device has enough memory. For example, you could generate a random counts x theta_size matrix in one go using af::randu(counts, theta_size, u8).
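The local-versus-global engine mix-up can be reproduced without ArrayFire. The sketch below uses std::mt19937 from the standard library as a stand-in, under the assumption that a default-constructed engine always starts from the same fixed state (true for std::mt19937; the ArrayFire engine is assumed here to behave analogously). The function names are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Draw n raw values from the engine that is passed in.
std::vector<std::uint32_t> draw(std::mt19937& engine, int n) {
    std::vector<std::uint32_t> out;
    for (int i = 0; i < n; ++i) {
        out.push_back(static_cast<std::uint32_t>(engine()));
    }
    return out;
}

// Mirrors the bug: a fresh default engine is built on every call and the
// seed never reaches it, so every call returns the same values.
std::vector<std::uint32_t> draw_with_fresh_engine(unsigned /*seed, ignored*/, int n) {
    std::mt19937 engine;  // default-constructed: always the same start state
    return draw(engine, n);
}

// Mirrors the fix: the seed is applied to the engine that is actually used.
std::vector<std::uint32_t> draw_with_seeded_engine(unsigned seed, int n) {
    std::mt19937 engine(seed);
    return draw(engine, n);
}
```

Calling draw_with_fresh_engine with seeds 1 and 2 gives identical vectors, while draw_with_seeded_engine gives different vectors for different seeds and the same vector for the same seed.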

Rcpp sugar unique of List

I have a List of NumericVector elements and I need a List of its unique elements. I tried the Rcpp::unique function. It works very well when applied to a NumericVector but not to a List. This is the code and the error I got.
List h(List x){
  return Rcpp::unique(x);
}
Error in dyn.load("/tmp/RtmpDdKvcH/sourceCpp-x86_64-pc-linux-gnu-1.0.0/sourcecpp_272635d5289/sourceCpp_10.so") :
unable to load shared object '/tmp/RtmpDdKvcH/sourceCpp-x86_64-pc-linux-gnu-1.0.0/sourcecpp_272635d5289/sourceCpp_10.so':
/tmp/RtmpDdKvcH/sourceCpp-x86_64-pc-linux-gnu-1.0.0/sourcecpp_272635d5289/sourceCpp_10.so: undefined symbol: _ZNK4Rcpp5sugar9IndexHashILi19EE8get_addrEP7SEXPREC
It is unclear what you are doing wrong, and it is an incomplete / irreproducible question.
But there is a unit test that does just what you do, and we can do it by hand too:
R> Rcpp::cppFunction("NumericVector uq(NumericVector x) { return Rcpp::unique(x); }")
R> uq(c(1.1, 2.2, 2.2, 3.3, 27))
[1] 27.0 1.1 3.3 2.2
R>
Even if there isn't a matching Rcpp sugar function, you can call R functions from within C++. Example:
#include <Rcpp.h>
using namespace Rcpp;
Rcpp::Environment base("package:base");
Function do_unique = base["unique"];
// [[Rcpp::export]]
List myfunc(List x) {
  return do_unique(x);
}
Thank you for your interest in this issue.
As I mentioned, my List contains only NumericVector elements. I propose the code below, which works well and is faster than R's unique function on small lists. However, its efficiency decreases when the list is large. Maybe this can help someone, and perhaps someone can optimise it further.
List uniqueList(List& x) {
  int xsize = x.size();
  List xunique(x);
  int s = 1;
  for(int i(1); i<xsize; ++i){
    NumericVector xi = x[i];
    int l = 0;
    for(int j(0); j<s; ++j){
      NumericVector xj = x[j];
      int xisize = xi.size();
      int xjsize = xj.size();
      if(xisize != xjsize){
        ++l;
      }
      else{
        if((sum(xi == xj) == xisize)){
          goto notkeep;
        }
        else{
          ++l;
        }
      }
    }
    if(l == s){
      xunique[s] = xi;
      ++s;
    }
    notkeep: 0;
  }
  return head(xunique, s);
}
/***R
x <- list(1,42, 1, 1:3, 42)
uniqueList(x)
[[1]]
[1] 1
[[2]]
[1] 42
[[3]]
[1] 1 2 3
microbenchmark::microbenchmark(uniqueList(x), unique(x))
Unit: microseconds
expr min lq mean median uq max neval
uniqueList(x) 2.382 2.633 3.05103 2.720 2.8995 29.307 100
unique(x) 2.864 3.110 3.50900 3.254 3.4145 24.039 100
But the R function becomes faster when the List is large. I am sure that someone can optimise this code.
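One possible direction for the slowdown on large lists: the nested loop compares each element against every kept element, which is quadratic in the list length. A minimal sketch of an alternative, written in plain C++ with std::vector<double> standing in for NumericVector, keeps a std::set of vectors already seen so that each lookup is logarithmic. The name unique_vectors is hypothetical, the sketch assumes the vectors contain no NaN values (NaN would break the ordering std::set relies on), and it has not been benchmarked against R's unique.

```cpp
#include <set>
#include <vector>

using Vec = std::vector<double>;

// Keep the first occurrence of each distinct vector, preserving input order.
std::vector<Vec> unique_vectors(const std::vector<Vec>& x) {
    std::set<Vec> seen;  // ordered by lexicographic comparison of the vectors
    std::vector<Vec> out;
    for (const Vec& v : x) {
        // insert() reports through .second whether v was newly added.
        if (seen.insert(v).second) out.push_back(v);
    }
    return out;
}
```

For the input used above, list(1, 42, 1, 1:3, 42), this keeps the vectors (1), (42) and (1, 2, 3) in that order.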

Most efficient way to translate C++ classes/objects into Rcpp?

I am new to Rcpp and ran into a problem translating C++ code into the Rcpp environment, and I could not find a solution so far (this is an edited version of my original post, which I think was unclear):
Background: I have multiple parameters and large matrices/arrays that need to be transferred to the C++ level. In C++, I have several functions that need to access these parameters and matrices and, in some cases, change values etc. In C++ I would create classes that combine all parameters and matrices as well as the functions that need to access them. By doing so, I do not need to pass them to the function each time.
Issue: I could not figure out how that may work with Rcpp. In the example below (the function is stupid, but hopefully an easy way to illustrate my issue), I create a matrix in R that is then used in C++. However, I then hand the entire matrix over to a sub-function in order to use the matrix within this function. This seems a very bad idea and I would rather like to have the matrix M in the namespace memory and access it in the sub-function without cloning it.
#include <RcppArmadillo.h>
//[[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
double fnc1 (int t, int s, arma::mat M) // I would prefer not to have M in the arguments but rather available in the namespace
{
  double out = M(t,s) - M(t,s);
  return out;
}
// [[Rcpp::export]]
arma::mat Rout (arma::mat M)
{
  int ncol = M.n_cols;
  int nrow = M.n_rows;
  for(int c = 0; c<ncol; ++c)
  {
    for(int r = 0; r<nrow; ++r)
    {
      M(r,c) = fnc1(r, c, M);
    }
  }
  return M;
}
/*** R
m <- matrix(runif(50), ncol = 10, nrow = 5)
Rout(m)
*/
Okay, let's talk about R to C++. At some point, you have to have a function exported to R that will receive an R object and pass it back to C++. Once inside C++, the sky's the limit as to how you want to structure the interaction with that object. The thought process of:
However, I then hand the entire matrix over to a sub-function in order to use the matrix within this function. This seems a very bad idea and I would rather like to have the matrix M in the namespace memory and access it in the sub-function without cloning it.
is slightly problematic, as you have now just introduced a global variable called M to handle your data. If M is not initialized, then the routine will falter. If you inadvertently modify M, then the data will change for all routines. So, I'm not sure the global variable approach is the solution you desire.
The main issue you seem to have is the emphasized portion regarding a "clone". In C++, arguments are passed by value by default, which copies the object. However, unlike in R, it is very easy to pass by reference instead by adding & to the parameter declaration (for example const arma::mat& M), which avoids the copy entirely. This localizes the process.
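To make the copy-versus-reference distinction concrete, here is a minimal sketch in plain C++, independent of Rcpp and Armadillo. Tracked is a hypothetical stand-in for a large matrix whose copy constructor counts how often it runs; the three free functions are likewise invented for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a large matrix that counts its own copies.
struct Tracked {
    static int copies;
    std::vector<double> data;
    explicit Tracked(std::size_t n) : data(n, 0.0) {}
    Tracked(const Tracked& other) : data(other.data) { ++copies; }
};
int Tracked::copies = 0;

// Pass by value: the argument is copied into m on every call.
double first_by_value(Tracked m) { return m.data.at(0); }

// Pass by const reference: no copy, and the callee cannot modify the object.
double first_by_const_ref(const Tracked& m) { return m.data.at(0); }

// Pass by non-const reference: no copy, and changes are visible to the caller.
void set_first(Tracked& m, double v) { m.data.at(0) = v; }
```

Passing an existing object to first_by_value raises Tracked::copies by one, whereas the two reference versions leave the counter unchanged.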
Pass-by-Reference Demo
#include <RcppArmadillo.h>
//[[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
double fnc1 (int t, int s, const arma::mat& M) {
  double out = M(t,s) - M(t,s);
  return out;
}
// [[Rcpp::export]]
arma::mat Rout (arma::mat& M) {
  int ncol = M.n_cols;
  int nrow = M.n_rows;
  for(int c = 0; c<ncol; ++c) {
    for(int r = 0; r<nrow; ++r) {
      M(r,c) = fnc1(r, c, M);
    }
  }
  return M;
}
/*** R
m <- matrix(runif(50), ncol = 10, nrow = 5)
Rout(m)
*/
Global Variable Demo
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// Create a namespace to store M
namespace toad {
  arma::mat M;
}
double fnc1 (int t, int s)
{
  double out = toad::M(t,s) - toad::M(t,s);
  return out;
}
// [[Rcpp::export]]
void Rin (arma::mat M)
{
  toad::M = M;
}
// [[Rcpp::export]]
void Rmanipulate()
{
  int ncol = toad::M.n_cols;
  int nrow = toad::M.n_rows;
  for(int c = 0; c<ncol; ++c)
  {
    for(int r = 0; r<nrow; ++r)
    {
      toad::M(r,c) = fnc1(r, c);
    }
  }
}
// [[Rcpp::export]]
arma::mat Rout (){
  return toad::M;
}
/*** R
m <- matrix(runif(50), ncol = 10, nrow = 5)
Rin(m)
Rmanipulate()
Rout()
*/
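Since the question mentions classes, a third option between passing M everywhere and a namespace-level global is to let one object own the data together with the functions that use it. The following is only a sketch in plain C++: MatrixModel is a hypothetical class, it stores the matrix as a row-major std::vector<double> rather than an arma::mat, and exposing such a class to R (for example through Rcpp Modules or an external pointer) is not shown here.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical class: owns a row-major matrix and the routines that use it.
class MatrixModel {
public:
    MatrixModel(std::size_t nrow, std::size_t ncol, std::vector<double> values)
        : nrow_(nrow), ncol_(ncol), data_(std::move(values)) {}

    // Read one entry; at() throws only if the flat index leaves the storage.
    double at(std::size_t r, std::size_t c) const {
        return data_.at(r * ncol_ + c);
    }

    // Modify the stored matrix in place; no argument carries the data.
    void scale(double factor) {
        for (double& x : data_) x *= factor;
    }

    // Sum of all entries.
    double total() const {
        return std::accumulate(data_.begin(), data_.end(), 0.0);
    }

    std::size_t nrow() const { return nrow_; }
    std::size_t ncol() const { return ncol_; }

private:
    std::size_t nrow_;
    std::size_t ncol_;
    std::vector<double> data_;
};
```

Member functions reach the matrix through the object itself, so the data is neither copied per call nor visible to unrelated code the way a global is.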
