Most efficient way to translate C++ classes/objects into Rcpp? - rcpp

I am new to Rcpp and ran into a problem while translating C++ code into the Rcpp environment, and I have not found a solution so far (this is an edited version of my original post, which I think was unclear):
Background: I have multiple parameters and large matrices/arrays that need to be transferred to the C++ level. In C++, I have several functions that need to access these parameters and matrices and, in some cases, change their values. In plain C++ I would create classes that combine all parameters and matrices with the functions that need to access them. By doing so, I do not need to pass them to the functions each time.
Issue: I could not figure out how that would work with Rcpp. In the example below (the function is stupid, but hopefully an easy way to illustrate my issue), I create a matrix in R that is then used in C++. However, I then hand the entire matrix over to a sub-function in order to use the matrix within this function. This seems a very bad idea and I would rather like to have the matrix M in the namespace memory and access it in the sub function without cloning it.
#include <RcppArmadillo.h>
//[[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
double fnc1 (int t, int s, arma::mat M) // I would prefer not to have M in the arguments but rather available in the namespace
{
double out = M(t,s) - M(t,s);
return out;
}
// [[Rcpp::export]]
arma::mat Rout (arma::mat M)
{
int ncol = M.n_cols;
int nrow = M.n_rows;
for(int c = 0; c<ncol; ++c)
{
for(int r = 0; r<nrow; ++r)
{
M(r,c) = fnc1(r, c, M);
}
}
return M;
}
/*** R
m <- matrix(runif(50), ncol = 10, nrow = 5)
Rout(m)
*/

Okay, let's talk about R to C++. At some point, you have to have a function exported to R that will receive an R object and pass it to C++. Once inside C++, the sky's the limit as to how you want to structure the interaction with that object. The thought process of:
However, I then hand the entire matrix over to a sub-function in order to use the matrix within this function. This seems a very bad idea and I would rather like to have the matrix M in the namespace memory and access it in the sub function without cloning it.
is slightly problematic as you have now just introduced a global variable called M to handle your data. If M is not initialized, then the routine will falter. If you inadvertently modify M, then the data will change for all routines. So, I'm not sure going the global variable approach is the solution you desire.
The main issue you seem to have is the emphasized portion regarding a "clone". In C++, arguments are passed by value by default, which copies the object. However, unlike in R, it is very easy to pass by reference by adding & to the parameter type and, thus, avoid the copy entirely. This keeps the data local to the functions that use it.
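To make the difference between the two calling conventions visible outside of Rcpp, here is a minimal standalone sketch. The Matrix struct and the helper functions are hypothetical stand-ins (not Armadillo types); the struct simply counts how often its copy constructor runs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a large matrix; it counts how often it is copied.
struct Matrix {
    static int copies;              // number of copy constructions so far
    std::vector<double> data;

    explicit Matrix(std::size_t n) : data(n, 1.0) {}
    Matrix(const Matrix& other) : data(other.data) { ++copies; }
};
int Matrix::copies = 0;

// Pass-by-value: the caller's matrix is copied into M on every call.
double first_by_value(Matrix M) { return M.data[0]; }

// Pass-by-reference: M is an alias for the caller's matrix, nothing is copied.
double first_by_reference(const Matrix& M) { return M.data[0]; }

// Returns how many copies `calls` invocations of the by-value variant trigger.
int copies_by_value(int calls) {
    Matrix m(1000);
    Matrix::copies = 0;
    for (int i = 0; i < calls; ++i) first_by_value(m);
    return Matrix::copies;
}

// Same measurement for the by-reference variant.
int copies_by_reference(int calls) {
    Matrix m(1000);
    Matrix::copies = 0;
    for (int i = 0; i < calls; ++i) first_by_reference(m);
    return Matrix::copies;
}
```

Calling the by-value variant inside a double loop over all elements, as in the question, therefore copies the whole matrix once per element, while the const reference version copies nothing.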
Pass-by-Reference Demo
#include <RcppArmadillo.h>
//[[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
double fnc1 (int t, int s, const arma::mat& M) {
double out = M(t,s) - M(t,s);
return out;
}
// [[Rcpp::export]]
arma::mat Rout (arma::mat& M) {
int ncol = M.n_cols;
int nrow = M.n_rows;
for(int c = 0; c<ncol; ++c) {
for(int r = 0; r<nrow; ++r) {
M(r,c) = fnc1(r, c, M);
}
}
return M;
}
/*** R
m <- matrix(runif(50), ncol = 10, nrow = 5)
Rout(m)
*/
Global Variable Demo
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// Create a namespace to store M
namespace toad {
arma::mat M;
}
double fnc1 (int t, int s)
{
double out = toad::M(t,s) - toad::M(t,s);
return out;
}
// [[Rcpp::export]]
void Rin (arma::mat M)
{
toad::M = M;
}
// [[Rcpp::export]]
void Rmanipulate()
{
int ncol = toad::M.n_cols;
int nrow = toad::M.n_rows;
for(int c = 0; c<ncol; ++c)
{
for(int r = 0; r<nrow; ++r)
{
toad::M(r,c) = fnc1(r, c);
}
}
}
// [[Rcpp::export]]
arma::mat Rout (){
return toad::M;
}
/*** R
m <- matrix(runif(50), ncol = 10, nrow = 5)
Rin(m)
Rmanipulate()
Rout()
*/
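A third option, closer to the class-based design described in the question, is to make the matrix a member of a C++ class so that member functions can reach it without an argument and without a global. The following is only a standalone sketch under assumptions: the Model class is hypothetical, and a column-major std::vector stands in for arma::mat so that it compiles without Rcpp. Exposing such a class to R (for example via Rcpp modules or an external pointer) is a separate step not shown here.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical class bundling the matrix with the functions that use it.
// A column-major std::vector stands in for arma::mat to keep this standalone.
class Model {
public:
    Model(std::size_t nrow, std::size_t ncol, std::vector<double> values)
        : nrow_(nrow), ncol_(ncol), M_(std::move(values)) {}

    // Mirrors fnc1 from the demos: reads M_ directly, no matrix argument.
    double fnc1(std::size_t t, std::size_t s) const {
        return at(t, s) - at(t, s);
    }

    // Mirrors the loop in Rout/Rmanipulate: overwrite every element in place.
    void manipulate() {
        for (std::size_t c = 0; c < ncol_; ++c)
            for (std::size_t r = 0; r < nrow_; ++r)
                M_[r + c * nrow_] = fnc1(r, c);
    }

    const std::vector<double>& values() const { return M_; }

private:
    // Column-major element access, the same layout R and Armadillo use.
    double at(std::size_t r, std::size_t c) const { return M_[r + c * nrow_]; }

    std::size_t nrow_, ncol_;
    std::vector<double> M_;
};
```

Compared with the global variable demo, the data cannot be used before it is initialized (the constructor requires it), and only the member functions can modify it.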

Related

RcppArrayFire passing a matrix row as af::array input

In this simple example I would like to subset a matrix by row and pass it to another cpp function; the example demonstrates this works by passing an input array to the other function first.
#include "RcppArrayFire.h"
using namespace Rcpp;
af::array theta_check_cpp( af::array theta){
if(*theta(1).host<double>() >= 1){
theta(1) = 0;
}
return theta;
}
// [[Rcpp::export]]
af::array theta_check(RcppArrayFire::typed_array<f64> theta){
const int theta_size = theta.dims()[0];
af::array X(2, theta_size);
X(0, af::seq(theta_size)) = theta_check_cpp( theta );
X(1, af::seq(theta_size)) = theta;
// return X;
Rcpp::Rcout << " works till here";
return theta_check_cpp( X.row(1) );
}
/*** R
theta <- c( 2, 2, 2)
theta_check(theta)
*/
The constructor you are using to create X has an argument ty for the data type, which defaults to f32. Therefore X uses 32 bit floats and you cannot extract a 64 bit host pointer from that. Either use
af::array X(2, theta_size, f64);
to create an array using 64 bit doubles, or extract a 32 bit host pointer via
if(*theta(1).host<float>() >= 1){
...

random shuffle in RcppArrayFire

I am experiencing trouble getting this example to run correctly. Currently it produces the same random sample for every iteration and seed input, despite the seed changing as shown by af::getSeed().
#include "RcppArrayFire.h"
#include <random>
using namespace Rcpp;
using namespace RcppArrayFire;
// [[Rcpp::export]]
af::array random_test(RcppArrayFire::typed_array<f64> theta, int counts, int seed){
const int theta_size = theta.dims()[0];
af::array out(counts, theta_size, f64);
for(int f = 0; f < counts; f++){
af::randomEngine engine;
af_set_seed(seed + f);
//Rcpp::Rcout << af::getSeed();
af::array out_temp = af::randu(theta_size, u8, engine);
out(f, af::span) = out_temp;
// out(f, af::span) = theta(out_temp);
}
return out;
}
/*** R
theta <- 1:10
random_test(theta, 5, 1)
random_test(theta, 5, 2)
*/
The immediate problem is that you are creating a random engine within each iteration of the loop but setting the seed of the global random engine. Either set the seed of the local engine via engine.setSeed(seed), or get rid of the local engine altogether, letting af::randu default to using the global engine.
However, it would still be "unusual" to change the seed during each step of the loop. Normally one sets the seed only once, e.g.:
// [[Rcpp::depends(RcppArrayFire)]]
#include "RcppArrayFire.h"
// [[Rcpp::export]]
af::array random_test(RcppArrayFire::typed_array<f64> theta, int counts, int seed){
const int theta_size = theta.dims()[0];
af::array out(counts, theta_size, f64);
af::setSeed(seed);
for(int f = 0; f < counts; f++){
af::array out_temp = af::randu(theta_size, u8);
out(f, af::span) = out_temp;
}
return out;
}
BTW, it makes sense to parallelize this as long as your device has enough memory. For example, you could generate a random counts x theta_size matrix in one go using af::randu(counts, theta_size, u8).
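The pitfall described above (a fresh engine per iteration versus one engine seeded once) is not specific to ArrayFire. As an analogy only, here is a sketch that reproduces it with the standard library's <random>; the function names are hypothetical and no ArrayFire calls are involved.

```cpp
#include <random>
#include <vector>

// Pitfall: a fresh, default-seeded engine per iteration restarts the same
// sequence every time, so every draw is identical.
std::vector<unsigned> draws_with_fresh_engine(int n) {
    std::vector<unsigned> out;
    for (int i = 0; i < n; ++i) {
        std::mt19937 engine;            // default seed on every iteration
        out.push_back(static_cast<unsigned>(engine()));
    }
    return out;
}

// Usual pattern: seed one engine once and keep drawing from it.
std::vector<unsigned> draws_seeded_once(unsigned seed, int n) {
    std::mt19937 engine(seed);
    std::vector<unsigned> out;
    for (int i = 0; i < n; ++i)
        out.push_back(static_cast<unsigned>(engine()));
    return out;
}
```

The first function returns n copies of the same number; the second returns a sequence that is reproducible for a given seed and changes when the seed changes.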

af::array conversion to float or double

I have been experimenting with the RcppArrayFire package, mostly rewriting some cost functions from RcppArmadillo, and can't seem to get past "no viable conversion from 'af::array' to 'float'". I have also been getting some backend errors; the example below seems free of these.
This cov-var example is written poorly just to use all relevant coding pieces from my actual cost function. As of now it is the only addition in a package generated by "RcppArrayFire.package.skeleton".
#include "RcppArrayFire.h"
#include <Rcpp.h>
// [[Rcpp::depends(RcppArrayFire)]]
// [[Rcpp::export]]
float example_ols(const RcppArrayFire::typed_array<f32>& X_vect, const RcppArrayFire::typed_array<f32>& Y_vect){
int Len = X_vect.dims()[0];
int Len_Y = Y_vect.dims()[0];
while( Len_Y < Len){
Len --;
}
float mean_X = af::sum(X_vect)/Len;
float mean_Y = af::sum(Y_vect)/Len;
RcppArrayFire::typed_array<f32> temp(Len);
RcppArrayFire::typed_array<f32> temp_x(Len);
for( int f = 0; f < Len; f++){
temp(f) = (X_vect(f) - mean_X)*(Y_vect(f) - mean_Y);
temp_x(f) = af::pow(X_vect(f) -mean_X, 2);
}
return af::sum(temp)/af::sum(temp_x);
}
/*** R
X <- 1:10
Y <- 2*X +rnorm(10, mean = 0, sd = 1)
example_ols(X, Y)
*/
The first thing to consider is the af::sum function, which comes in different forms: an af::sum(af::array) that returns an af::array in device memory and a templated af::sum<T>(af::array) that returns a T in host memory. So the minimal change to your example would be using af::sum<float>:
#include "RcppArrayFire.h"
#include <Rcpp.h>
// [[Rcpp::depends(RcppArrayFire)]]
// [[Rcpp::export]]
float example_ols(const RcppArrayFire::typed_array<f32>& X_vect,
const RcppArrayFire::typed_array<f32>& Y_vect){
int Len = X_vect.dims()[0];
int Len_Y = Y_vect.dims()[0];
while( Len_Y < Len){
Len --;
}
float mean_X = af::sum<float>(X_vect)/Len;
float mean_Y = af::sum<float>(Y_vect)/Len;
RcppArrayFire::typed_array<f32> temp(Len);
RcppArrayFire::typed_array<f32> temp_x(Len);
for( int f = 0; f < Len; f++){
temp(f) = (X_vect(f) - mean_X)*(Y_vect(f) - mean_Y);
temp_x(f) = af::pow(X_vect(f) -mean_X, 2);
}
return af::sum<float>(temp)/af::sum<float>(temp_x);
}
/*** R
set.seed(1)
X <- 1:10
Y <- 2*X +rnorm(10, mean = 0, sd = 1)
example_ols(X, Y)
*/
However, there are more things one can improve. In no particular order:
You don't need to include Rcpp.h.
There is an af::mean function for computing the mean of an af::array.
In general RcppArrayFire::typed_array<T> is only needed for getting arrays from R into C++. Within C++ and for the way back you can use af::array.
Even when your device does not support double, you can still use double values on the host.
In order to get good performance, you should avoid for loops and use vectorized functions, just like in R. You have to impose equal dimensions for X and Y, though.
Interestingly I get a different result when I use vectorized functions. Right now I am not sure why this is the case, but the following form makes more sense to me. You should verify that the result is what you want to get:
#include <RcppArrayFire.h>
// [[Rcpp::depends(RcppArrayFire)]]
// [[Rcpp::export]]
double example_ols(const RcppArrayFire::typed_array<f32>& X_vect,
const RcppArrayFire::typed_array<f32>& Y_vect){
double mean_X = af::mean<double>(X_vect);
double mean_Y = af::mean<double>(Y_vect);
af::array temp = (X_vect - mean_X) * (Y_vect - mean_Y);
af::array temp_x = af::pow(X_vect - mean_X, 2.0);
return af::sum<double>(temp)/af::sum<double>(temp_x);
}
/*** R
set.seed(1)
X <- 1:10
Y <- 2*X +rnorm(10, mean = 0, sd = 1)
example_ols(X, Y)
*/
BTW, an even shorter version would be:
#include <RcppArrayFire.h>
// [[Rcpp::depends(RcppArrayFire)]]
// [[Rcpp::export]]
af::array example_ols(const RcppArrayFire::typed_array<f32>& X_vect,
const RcppArrayFire::typed_array<f32>& Y_vect){
return af::cov(X_vect, Y_vect) / af::var(X_vect);
}
Generally it is a good idea to use the built-in functions as much as possible.
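For checking results on the host, the quantity all of the versions above aim at is the slope of a simple regression of Y on X, sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2). The ratio cov/var gives the same number as long as both use the same normalization, because the normalizing factor cancels; that af::cov and af::var do so is an assumption here. A plain C++ reference sketch (the function name ols_slope is hypothetical, no ArrayFire involved):

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Slope of a simple regression of y on x:
// sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2).
// Assumes x is not constant, otherwise the denominator is zero.
double ols_slope(const std::vector<double>& x, const std::vector<double>& y) {
    if (x.size() != y.size() || x.size() < 2)
        throw std::invalid_argument("x and y need equal length >= 2");

    const double n = static_cast<double>(x.size());
    const double mean_x = std::accumulate(x.begin(), x.end(), 0.0) / n;
    const double mean_y = std::accumulate(y.begin(), y.end(), 0.0) / n;

    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - mean_x) * (y[i] - mean_y);
        sxx += (x[i] - mean_x) * (x[i] - mean_x);
    }
    return sxy / sxx;
}
```

For data lying exactly on the line y = 2x + 1 this returns 2, which makes it a convenient check for the device versions.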

Rcpp and "optional" arguments on functions [duplicate]

I created a cumsum function in an R package with Rcpp which will cumulatively sum a vector until it hits the user-defined ceiling or floor. However, if one wants the cumsum to be bounded above only, the user must still specify a floor.
Example:
a = c(1, 1, 1, 1, 1, 1, 1)
If I wanted to cumsum a and have an upper bound of 3, I could call cumsum_bounded(a, lower = 1, upper = 3). I would rather not have to specify the lower bound.
My code:
#include <Rcpp.h>
#include <float.h>
#include <cmath>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector cumsum_bounded(NumericVector x, int upper, int lower) {
NumericVector res(x.size());
double acc = 0;
for (int i=0; i < x.size(); ++i) {
acc += x[i];
if (acc < lower) acc = lower;
else if (acc > upper) acc = upper;
res[i] = acc;
}
return res;
}
What I would like:
#include <Rcpp.h>
#include <float.h>
#include <cmath>
#include <climits> //for LLONG_MIN and LLONG_MAX
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector cumsum_bounded(NumericVector x, long long int upper = LLONG_MAX, long long int lower = LLONG_MIN) {
NumericVector res(x.size());
double acc = 0;
for (int i=0; i < x.size(); ++i) {
acc += x[i];
if (acc < lower) acc = lower;
else if (acc > upper) acc = upper;
res[i] = acc;
}
return res;
}
In short, yes, it's possible, but it requires some finesse: either creating an intermediary function or embedding the logic that sorts out which arguments were supplied within the main function.
In long, Rcpp attributes support only a limited set of default argument values. These values are listed in the Rcpp FAQ 3.12 entry:
String literals delimited by quotes (e.g. "foo")
Integer and Decimal numeric values (e.g. 10 or 4.5)
Pre-defined constants including:
Booleans: true and false
Null Values: R_NilValue, NA_STRING, NA_INTEGER, NA_REAL, and NA_LOGICAL.
Selected vector types can be instantiated using the empty form of the ::create static member function: CharacterVector, IntegerVector, and NumericVector
Matrix types instantiated using the rows, cols constructor Rcpp::Matrix n(rows,cols): CharacterMatrix, IntegerMatrix, and NumericMatrix
If you were to specify numerical values for LLONG_MAX and LLONG_MIN, this would meet the criteria to directly use Rcpp attributes on the function. However, these values are implementation specific, so it would not be ideal to hardcode them. Instead, we have to seek an outside solution: the Rcpp::Nullable<T> class, which enables a default value of NULL. The reason why we have to wrap the parameter type with Rcpp::Nullable<T> is that NULL is a very special value and can cause heartache if not handled carefully.
Unlike any number on the real line, NULL will never be a legitimate bound for your values. As a result, it is the perfect candidate to use as the default in the function call. There are two choices you then have to make: use Rcpp::Nullable<T> as the parameters on the main function, or create a "logic" helper function that has the correct parameters and can be used elsewhere within your application without worry. I've opted for the latter below.
#include <Rcpp.h>
#include <float.h>
#include <cmath>
#include <climits> //for LLONG_MIN and LLONG_MAX
using namespace Rcpp;
NumericVector cumsum_bounded_logic(NumericVector x,
long long int upper = LLONG_MAX,
long long int lower = LLONG_MIN) {
NumericVector res(x.size());
double acc = 0;
for (int i=0; i < x.size(); ++i) {
acc += x[i];
if (acc < lower) acc = lower;
else if (acc > upper) acc = upper;
res[i] = acc;
}
return res;
}
// [[Rcpp::export]]
NumericVector cumsum_bounded(NumericVector x,
Rcpp::Nullable<long long int> upper = R_NilValue,
Rcpp::Nullable<long long int> lower = R_NilValue) {
if(upper.isNotNull() && lower.isNotNull()){
return cumsum_bounded_logic(x, Rcpp::as< long long int >(upper), Rcpp::as< long long int >(lower));
} else if(upper.isNull() && lower.isNotNull()){
return cumsum_bounded_logic(x, LLONG_MAX, Rcpp::as< long long int >(lower));
} else if(upper.isNotNull() && lower.isNull()) {
return cumsum_bounded_logic(x, Rcpp::as< long long int >(upper), LLONG_MIN);
} else {
return cumsum_bounded_logic(x, LLONG_MAX, LLONG_MIN);
}
// Required to quiet compiler
return x;
}
Test Output
cumsum_bounded(a, 5)
## [1] 1 2 3 4 5 5 5
cumsum_bounded(a, 5, 2)
## [1] 2 3 4 5 5 5 5
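The idea behind the Nullable pattern ("no value supplied means no bound") can also be sketched in plain C++17 with std::optional. This is an analogy under assumptions, not Rcpp code: the function name cumsum_bounded_opt is hypothetical, and double bounds with infinite defaults replace the long long bounds used above. It also shows that each bound can be resolved on its own instead of branching over all four combinations.

```cpp
#include <limits>
#include <optional>
#include <vector>

// Plain C++17 analogue of the Nullable pattern: an empty optional plays the
// role of NULL and means "no bound on this side".
std::vector<double> cumsum_bounded_opt(
        const std::vector<double>& x,
        std::optional<double> upper = std::nullopt,
        std::optional<double> lower = std::nullopt) {
    // Resolve each bound on its own; no four-way branching is needed.
    const double hi = upper.value_or(std::numeric_limits<double>::infinity());
    const double lo = lower.value_or(-std::numeric_limits<double>::infinity());

    std::vector<double> res;
    res.reserve(x.size());
    double acc = 0.0;
    for (double v : x) {
        acc += v;
        if (acc < lo) acc = lo;         // clamp to the floor
        else if (acc > hi) acc = hi;    // clamp to the ceiling
        res.push_back(acc);
    }
    return res;
}
```

With a vector of seven ones, cumsum_bounded_opt(a, 5.0) and cumsum_bounded_opt(a, 5.0, 2.0) give the same sequences as the test output above.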

Rcpp Armadillo, submatrices and subvectors

I try to translate some R code into RcppArmadillo and therefore I would also like to do the following:
Assume there is a nonnegative vector v and a matrix M, both with for example m rows. I would like to get rid of all rows in the matrix M whenever there is a zero in the corresponding row of the vector v and afterwards also get rid of all entries that are zero in the vector v. Using R this is simply just the following:
M = M[v>0,]
v = v[v>0]
So my question is if there is a way to do this in RcppArmadillo. Since I am quite new to any programming language I was not able to find anything that could solve my problem, although I think that I am not the first one who asks this maybe quite easy question.
Of course there is a way to go about subsetting elements in both Rcpp (subsetting with Rcpp) and RcppArmadillo (Armadillo subsetting).
Here is a way to replicate the behavior of R subsets in Armadillo.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// Isolate by Row
// [[Rcpp::export]]
arma::mat vec_subset_mat(const arma::mat& x, const arma::uvec& idx) {
return x.rows(find(idx > 0));
}
// Isolate by Element
// [[Rcpp::export]]
arma::vec subset_vec(const arma::vec& x) {
return x.elem(find(x > 0));
}
/*** R
set.seed(1334)
m = matrix(rnorm(100), 10, 10)
v = sample(0:1, 10, replace = T)
all.equal(m[v>0,], vec_subset_mat(m,v))
all.equal(v[v>0], as.numeric(subset_vec(v)))
*/
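For readers who want to see what the find() plus rows()/elem() combination does step by step, here is a plain C++ sketch of the same filtering without Armadillo. The names Row, keep_rows and keep_positive are hypothetical. As in the R code, the matrix has to be filtered before v is overwritten, since both steps depend on the original v.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using Row = std::vector<double>;

// Keep the rows of M whose matching entry in v is positive (M[v > 0, ] in R).
std::vector<Row> keep_rows(const std::vector<Row>& M,
                           const std::vector<double>& v) {
    if (M.size() != v.size())
        throw std::invalid_argument("M needs one row per element of v");
    std::vector<Row> out;
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] > 0) out.push_back(M[i]);
    return out;
}

// Keep the positive entries of v (v[v > 0] in R).
std::vector<double> keep_positive(const std::vector<double>& v) {
    std::vector<double> out;
    for (double e : v)
        if (e > 0) out.push_back(e);
    return out;
}
```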
