Check BIC via RcppArmadillo(Rcpp) - rcpp

I want to check the BIC usage in statistics.
My little example, which is saved as check_bic.cpp, is presented as follows:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
List check_bic(const int N = 10, const int p = 20, const double seed=0){
arma_rng::set_seed(seed); // for reproducibility
arma::mat Beta = randu(p,N); //randu/randn:random values(uniform and normal distributions)
arma::vec Bic = randu(N);
uvec ii = find(Bic == min(Bic)); // may be just one or several elements
int id = ii(ii.n_elem); // fetch the last one element
vec behat = Beta.col(id); // fetch the id column of matrix Beta
List ret;
ret["Bic"] = Bic;
ret["ii"] = ii;
ret["id"] = id;
ret["Beta"] = Beta;
ret["behat"] = behat;
return ret;
}
Then I compile check_bic.cpp in R by
library(Rcpp)
library(RcppArmadillo);
sourceCpp("check_bic.cpp")
and the compilation can pass successfully.
However, when I ran
check_bic(10,20,0)
in R, it shows errors as
error: Mat::operator(): index out of bounds
Error in check_bic(10, 20, 0) : Mat::operator(): index out of bounds
I checked the .cpp code line by line, and I guess the problem probably
happens at
uvec ii = find(Bic == min(Bic)); // may be just one or several elements
int id = ii(ii.n_elem); // fetch the last one element
since if uvec ii has only one element, then ii.n_elem may be NaN or something
else in Rcpp (while it's OK in Matlab), but I don't know how to
deal with this case. Any help?
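A likely cause, offered as a sketch rather than a verified fix for the code above: Armadillo indexes from zero like the rest of C++ (Matlab indexes from one), so the valid positions of ii run from 0 to ii.n_elem - 1. n_elem is an ordinary unsigned count and never NaN, so ii(ii.n_elem) reads one past the end even when ii holds a single element. Under that reading, the last match would be ii(ii.n_elem - 1). The standard-library analogue below shows the same indexing rule without Armadillo; the helper name last_min_index is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: position of the LAST element equal to the minimum.
std::size_t last_min_index(const std::vector<double>& values) {
    assert(!values.empty());
    const double lowest = *std::min_element(values.begin(), values.end());

    // Collect every position holding the minimum (one or several).
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] == lowest) hits.push_back(i);
    }

    // Zero-based indexing: the last valid position is size() - 1,
    // not size(). hits[hits.size()] would be out of bounds.
    return hits[hits.size() - 1];
}
```

For example, last_min_index({1.0, 5.0, 1.0}) gives 2, and a one-element input gives 0.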

Related

Extract a data.frame from a list within Rcpp

This is probably a really simple question, but I can't figure out what's wrong.
I have a list that I pass to an Rcpp function, and the first element of that list is a data.frame.
How do I get that data.frame?
bar = list(df = data.frame(A = 1:3,B=letters[1:3]),some_other_variable = 2)
foo(bar)
And the following C++ code:
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericVector bar(Rcpp::List test){
Rcpp::DataFrame df_test = test["df"];
Rcpp::NumericVector result = df_test["A"];
return result;
}
I get the following error on the line DataFrame df_test = test["df"]:
error: conversion from 'Rcpp::Vector<19>::NameProxy' {aka 'Rcpp::internal::generic_name_proxy<19, Rcpp::PreserveStorage>'} to 'Rcpp::DataFrame' {aka 'Rcpp::DataFrame_Impl<Rcpp::PreserveStorage>'} is ambiguous
Anyone know what I'm missing? Thanks.
There may be a combination of issues going on with the instantiation and construction of List and DataFrame objects. See the (old !!) RcppExamples package for working examples.
Here is a repaired version of your code that works and does something with the vector inside the data.frame:
Code
#include <Rcpp.h>
// [[Rcpp::export]]
int bar(Rcpp::List test){
Rcpp::DataFrame df(test["df"]);
Rcpp::IntegerVector ivec = df["A"];
return Rcpp::sum(ivec);
}
/*** R
zz <- list(df = data.frame(A = 1:3,B=letters[1:3]),some_other_variable = 2)
bar(zz)
*/
Demo
> Rcpp::sourceCpp("~/git/stackoverflow/70035630/answer.cpp")
> zz <- list(df = data.frame(A = 1:3,B=letters[1:3]),some_other_variable = 2)
> bar(zz)
[1] 6
>
Edit: For completeness, the assignment operator can be used with a SEXP, as in SEXP df2 = test["df"];, which can then be used to instantiate a data.frame. Template programming is difficult and not all corners are completely smoothed.

RcppArrayFire passing a matrix row as af::array input

In this simple example I would like to subset a matrix by row and pass it to another cpp function; the example demonstrates this works by passing an input array to the other function first.
#include "RcppArrayFire.h"
using namespace Rcpp;
af::array theta_check_cpp( af::array theta){
if(*theta(1).host<double>() >= 1){
theta(1) = 0;
}
return theta;
}
// [[Rcpp::export]]
af::array theta_check(RcppArrayFire::typed_array<f64> theta){
const int theta_size = theta.dims()[0];
af::array X(2, theta_size);
X(0, af::seq(theta_size)) = theta_check_cpp( theta );
X(1, af::seq(theta_size)) = theta;
// return X;
Rcpp::Rcout << " works till here";
return theta_check_cpp( X.row(1) );
}
/*** R
theta <- c( 2, 2, 2)
theta_check(theta)
*/
The constructor you are using to create X has an argument ty for the data type, which defaults to f32. Therefore X uses 32 bit floats and you cannot extract a 64 bit host pointer from that. Either use
af::array X(2, theta_size, f64);
to create an array using 64 bit doubles, or extract a 32 bit host pointer via
if(*theta(1).host<float>() >= 1){
...

random shuffle in RcppArrayFire

I am experiencing trouble getting this example to run correctly. Currently it produces the same random sample for every iteration and seed input, despite the seed changing as shown by af::getSeed().
#include "RcppArrayFire.h"
#include <random>
using namespace Rcpp;
using namespace RcppArrayFire;
// [[Rcpp::export]]
af::array random_test(RcppArrayFire::typed_array<f64> theta, int counts, int seed){
const int theta_size = theta.dims()[0];
af::array out(counts, theta_size, f64);
for(int f = 0; f < counts; f++){
af::randomEngine engine;
af_set_seed(seed + f);
//Rcpp::Rcout << af::getSeed();
af::array out_temp = af::randu(theta_size, u8, engine);
out(f, af::span) = out_temp;
// out(f, af::span) = theta(out_temp);
}
return out;
}
/*** R
theta <- 1:10
random_test(theta, 5, 1)
random_test(theta, 5, 2)
*/
The immediate problem is that you create a new random engine in each iteration of the loop but set the seed of the global random engine. Either set the seed of the local engine via engine.setSeed(seed), or get rid of the local engine altogether and let af::randu default to the global engine.
However, it would still be "unusual" to change the seed during each step of the loop. Normally one sets the seed only once, e.g.:
// [[Rcpp::depends(RcppArrayFire)]]
#include "RcppArrayFire.h"
// [[Rcpp::export]]
af::array random_test(RcppArrayFire::typed_array<f64> theta, int counts, int seed){
const int theta_size = theta.dims()[0];
af::array out(counts, theta_size, f64);
af::setSeed(seed);
for(int f = 0; f < counts; f++){
af::array out_temp = af::randu(theta_size, u8);
out(f, af::span) = out_temp;
}
return out;
}
BTW, it makes sense to parallelize this as long as your device has enough memory. For example, you could generate a random counts x theta_size matrix in one go using af::randu(counts, theta_size, u8).
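The engine pitfall described in this answer is not specific to ArrayFire. As a rough analogue only, the sketch below uses std::mt19937 from the standard library instead of af::randomEngine; the function names are hypothetical. A default-constructed engine created inside the loop starts from the same state every time, while one engine seeded once and reused keeps advancing.

```cpp
#include <cassert>
#include <random>
#include <vector>

using draw_t = std::mt19937::result_type;

// Pitfall: a fresh default-constructed engine per iteration always
// begins in the same state, so every draw is identical.
std::vector<draw_t> draws_with_fresh_engine(int counts) {
    assert(counts >= 0);
    std::vector<draw_t> out;
    for (int f = 0; f < counts; ++f) {
        std::mt19937 engine;        // same default seed each time
        out.push_back(engine());
    }
    return out;
}

// Usual pattern: seed one engine once, then reuse it for every draw.
std::vector<draw_t> draws_with_shared_engine(int counts, draw_t seed) {
    assert(counts >= 0);
    std::mt19937 engine(seed);      // seeded once, outside the loop
    std::vector<draw_t> out;
    for (int f = 0; f < counts; ++f) {
        out.push_back(engine());    // engine state advances each call
    }
    return out;
}
```

Calling draws_with_shared_engine twice with the same seed reproduces the same sequence, which is usually what a seed argument is meant to provide.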

af::array conversion to float or double

I have been experimenting with the RcppArrayFire package, mostly rewriting some cost functions from RcppArmadillo, and can't seem to get past "no viable conversion from 'af::array' to 'float'". I have also been getting some backend errors; the example below seems free of these.
This cov-var example is written poorly just to use all relevant coding pieces from my actual cost function. As of now it is the only addition in a package generated by "RcppArrayFire.package.skeleton".
#include "RcppArrayFire.h"
#include <Rcpp.h>
// [[Rcpp::depends(RcppArrayFire)]]
// [[Rcpp::export]]
float example_ols(const RcppArrayFire::typed_array<f32>& X_vect, const RcppArrayFire::typed_array<f32>& Y_vect){
int Len = X_vect.dims()[0];
int Len_Y = Y_vect.dims()[0];
while( Len_Y < Len){
Len --;
}
float mean_X = af::sum(X_vect)/Len;
float mean_Y = af::sum(Y_vect)/Len;
RcppArrayFire::typed_array<f32> temp(Len);
RcppArrayFire::typed_array<f32> temp_x(Len);
for( int f = 0; f < Len; f++){
temp(f) = (X_vect(f) - mean_X)*(Y_vect(f) - mean_Y);
temp_x(f) = af::pow(X_vect(f) -mean_X, 2);
}
return af::sum(temp)/af::sum(temp_x);
}
/*** R
X <- 1:10
Y <- 2*X +rnorm(10, mean = 0, sd = 1)
example_ols(X, Y)
*/
The first thing to consider is the af::sum function, which comes in different forms: an af::sum(af::array) that returns an af::array in device memory and a templated af::sum<T>(af::array) that returns a T in host memory. So the minimal change to your example would be using af::sum<float>:
#include "RcppArrayFire.h"
#include <Rcpp.h>
// [[Rcpp::depends(RcppArrayFire)]]
// [[Rcpp::export]]
float example_ols(const RcppArrayFire::typed_array<f32>& X_vect,
const RcppArrayFire::typed_array<f32>& Y_vect){
int Len = X_vect.dims()[0];
int Len_Y = Y_vect.dims()[0];
while( Len_Y < Len){
Len --;
}
float mean_X = af::sum<float>(X_vect)/Len;
float mean_Y = af::sum<float>(Y_vect)/Len;
RcppArrayFire::typed_array<f32> temp(Len);
RcppArrayFire::typed_array<f32> temp_x(Len);
for( int f = 0; f < Len; f++){
temp(f) = (X_vect(f) - mean_X)*(Y_vect(f) - mean_Y);
temp_x(f) = af::pow(X_vect(f) -mean_X, 2);
}
return af::sum<float>(temp)/af::sum<float>(temp_x);
}
/*** R
set.seed(1)
X <- 1:10
Y <- 2*X +rnorm(10, mean = 0, sd = 1)
example_ols(X, Y)
*/
However, there are more things one can improve. In no particular order:
You don't need to include Rcpp.h.
There is an af::mean function for computing the mean of an af::array.
In general RcppArrayFire::typed_array<T> is only needed for getting arrays from R into C++. Within C++ and for the way back you can use af::array.
Even when your device does not support double, you can still use double values on the host.
In order to get good performance, you should avoid for loops and use vectorized functions, just like in R. You have to impose equal dimensions for X and Y, though.
Interestingly I get a different result when I use vectorized functions. Right now I am not sure why this is the case, but the following form makes more sense to me. You should verify that the result is what you want to get:
#include <RcppArrayFire.h>
// [[Rcpp::depends(RcppArrayFire)]]
// [[Rcpp::export]]
double example_ols(const RcppArrayFire::typed_array<f32>& X_vect,
const RcppArrayFire::typed_array<f32>& Y_vect){
double mean_X = af::mean<double>(X_vect);
double mean_Y = af::mean<double>(Y_vect);
af::array temp = (X_vect - mean_X) * (Y_vect - mean_Y);
af::array temp_x = af::pow(X_vect - mean_X, 2.0);
return af::sum<double>(temp)/af::sum<double>(temp_x);
}
/*** R
set.seed(1)
X <- 1:10
Y <- 2*X +rnorm(10, mean = 0, sd = 1)
example_ols(X, Y)
*/
BTW, an even shorter version would be:
#include <RcppArrayFire.h>
// [[Rcpp::depends(RcppArrayFire)]]
// [[Rcpp::export]]
af::array example_ols(const RcppArrayFire::typed_array<f32>& X_vect,
const RcppArrayFire::typed_array<f32>& Y_vect){
return af::cov(X_vect, Y_vect) / af::var(X_vect);
}
Generally it is a good idea to use the built-in functions as much as possible.
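For checking results on the host, the arithmetic behind the slope can be written in plain C++ without ArrayFire. This is only a reference sketch with a hypothetical function name, ols_slope. It computes sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2), which equals cov(x, y) / var(x) whenever both use the same normalisation, since the factor cancels in the ratio.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical reference implementation of the OLS slope of y on x.
double ols_slope(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() > 1);
    const double n = static_cast<double>(x.size());
    const double mean_x = std::accumulate(x.begin(), x.end(), 0.0) / n;
    const double mean_y = std::accumulate(y.begin(), y.end(), 0.0) / n;

    double sxy = 0.0;  // sum of cross deviations
    double sxx = 0.0;  // sum of squared deviations of x
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mean_x;
        sxy += dx * (y[i] - mean_y);
        sxx += dx * dx;
    }
    return sxy / sxx;
}
```

With x = {1, 2, 3, 4} and y = {3, 5, 7, 9} (that is, y = 2x + 1 without noise) the slope comes out as 2.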

Most efficient way to translate C++ classes/objects into Rcpp?

I am new to Rcpp and came across a problem when translating C++ code into the Rcpp environment, and I could not find a solution so far (this is an edited version of my original post, which I think was unclear):
Background: I have multiple parameters and large matrices/arrays that need to be transferred to the C++ level. In C++, I have several functions that need to access these parameters and matrices and, in some cases, change values etc. In C++ I would create classes that combine all parameters and matrices as well as the functions that need to access them. By doing so, I do not need to pass them (each time) to the function.
Issue: I could not figure out how that may work with Rcpp. In the example below (the function is stupid, but hopefully an easy way to illustrate my issue), I create a matrix in R that is then used in C++. However, I then hand the entire matrix over to a sub-function in order to use the matrix within this function. This seems a very bad idea and I would rather like to have the matrix M in the namespace memory and access it in the sub function without cloning it.
#include <RcppArmadillo.h>
//[[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
double fnc1 (int t, int s, arma::mat M) // I would prefer not to have M in the arguments but rather available in the namespace
{
double out = M(t,s) - M(t,s);
return out;
}
// [[Rcpp::export]]
arma::mat Rout (arma::mat M)
{
int ncol = M.n_cols;
int nrow = M.n_rows;
for(int c = 0; c<ncol; ++c)
{
for(int r = 0; r<nrow; ++r)
{
M(r,c) = fnc1(r, c, M);
}
}
return M;
}
/*** R
m <- matrix(runif(50), ncol = 10, nrow = 5)
Rout(m)
*/
Okay, let's talk about R to C++. At some point, you have to have a function exported to R that will receive an R object and pass it to C++. Once inside C++, the sky's the limit as to how you want to structure the interaction with that object. The thought process of:
However, I then hand the entire matrix over to a sub-function in order to use the matrix within this function. This seems a very bad idea and I would rather like to have the matrix M in the namespace memory and access it in the sub function without cloning it.
is slightly problematic, as you have now just introduced a global variable called M to handle your data. If M is not initialized, then the routine will falter. If you inadvertently modify M, then the data will change for all routines. So, I'm not sure the global variable approach is the solution you desire.
The main issue you seem to have is the emphasized portion regarding a "clone". In C++, the default is to pass arguments by value, which copies the object. However, unlike in R, it is very easy to pass by reference by adding & to the parameter type, which avoids the copy entirely. This localizes the process.
Pass-by-Reference Demo
#include <RcppArmadillo.h>
//[[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
double fnc1 (int t, int s, const arma::mat& M) {
double out = M(t,s) - M(t,s);
return out;
}
// [[Rcpp::export]]
arma::mat Rout (arma::mat& M) {
int ncol = M.n_cols;
int nrow = M.n_rows;
for(int c = 0; c<ncol; ++c) {
for(int r = 0; r<nrow; ++r) {
M(r,c) = fnc1(r, c, M);
}
}
return M;
}
/*** R
m <- matrix(runif(50), ncol = 10, nrow = 5)
Rout(m)
*/
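The difference between the two parameter styles can be observed directly in plain C++. The sketch below is only an illustration with hypothetical function names, using std::vector in place of arma::mat: each function reports whether the argument it received uses the same underlying storage as the caller's object.

```cpp
#include <cassert>
#include <vector>

// By value: the parameter is a copy with its own buffer.
bool shares_storage_by_value(std::vector<double> m, const double* original) {
    assert(original != nullptr);
    return m.data() == original;
}

// By const reference: the parameter is the caller's object itself,
// so no copy is made and the buffer is the same.
bool shares_storage_by_reference(const std::vector<double>& m,
                                 const double* original) {
    assert(original != nullptr);
    return m.data() == original;
}
```

Given std::vector<double> v{1, 2, 3}, the call shares_storage_by_value(v, v.data()) is false because v was copied, while shares_storage_by_reference(v, v.data()) is true.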
Global Variable Demo
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// Create a namespace to store M
namespace toad {
arma::mat M;
}
double fnc1 (int t, int s)
{
double out = toad::M(t,s) - toad::M(t,s);
return out;
}
// [[Rcpp::export]]
void Rin (arma::mat M)
{
toad::M = M;
}
// [[Rcpp::export]]
void Rmanipulate()
{
int ncol = toad::M.n_cols;
int nrow = toad::M.n_rows;
for(int c = 0; c<ncol; ++c)
{
for(int r = 0; r<nrow; ++r)
{
toad::M(r,c) = fnc1(r, c);
}
}
}
// [[Rcpp::export]]
arma::mat Rout (){
return toad::M;
}
/*** R
m <- matrix(runif(50), ncol = 10, nrow = 5)
Rin(m)
Rmanipulate()
Rout()
*/
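Since the question mentions classes, a third option is to let a class own the matrix, so member functions reach it without a global and without passing it around. The sketch below is plain C++ with a hypothetical MatrixModel class and a row-major std::vector standing in for arma::mat. How such a class would be exposed to R (for instance through Rcpp modules or an external pointer) is not shown here and would need to be checked against the Rcpp documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical class that owns the data and the functions using it.
class MatrixModel {
public:
    MatrixModel(std::size_t rows, std::size_t cols, std::vector<double> values)
        : rows_(rows), cols_(cols), values_(std::move(values)) {
        assert(values_.size() == rows_ * cols_);
    }

    // Bounds-checked read access (row-major layout).
    double at(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return values_[r * cols_ + c];
    }

    // Same toy computation as fnc1 in the question; the matrix is a
    // member, so it does not appear in the argument list.
    double fnc1(std::size_t t, std::size_t s) const {
        return at(t, s) - at(t, s);
    }

    // Counterpart of Rmanipulate(): overwrite every entry in place.
    void manipulate() {
        for (std::size_t r = 0; r < rows_; ++r) {
            for (std::size_t c = 0; c < cols_; ++c) {
                values_[r * cols_ + c] = fnc1(r, c);
            }
        }
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> values_;
};
```

Compared with the namespace-level M, the data here lives only as long as the object does, and two models cannot overwrite each other's matrix by accident.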
