Extract a data.frame from a list within Rcpp - rcpp

This is probably a really simple question, but I can't figure out what's wrong.
I have a list that I pass to an Rcpp function, and the first element of that list is a data.frame.
How do I get that data.frame?
bar = list(df = data.frame(A = 1:3,B=letters[1:3]),some_other_variable = 2)
foo(bar)
And the following C++ code:
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::NumericVector bar(Rcpp::List test){
Rcpp::DataFrame df_test = test["df"];
Rcpp::NumericVector result = df_test["A"];
return result;
}
I get the following error on the line DataFrame df_test = test["df"]:
error: conversion from 'Rcpp::Vector<19>::NameProxy{aka 'Rcpp::internal::generic_name_proxy<19, Rcpp::PreserveStorage> to 'Rcpp::DataFrame{aka 'Rcpp::DataFrame_ImplRcpp::PreserveStorage ambiguous
Anyone know what I'm missing? Thanks.

There may be a combination of issues going on with the instantiation and construction of List and DataFrame objects. See the (old !!) RcppExamples package for working examples.
Here is a repaired version of your code that works and does something with the vector inside the data.frame:
Code
#include <Rcpp.h>
// [[Rcpp::export]]
int bar(Rcpp::List test){
Rcpp::DataFrame df(test["df"]);
Rcpp::IntegerVector ivec = df["A"];
return Rcpp::sum(ivec);
}
/*** R
zz <- list(df = data.frame(A = 1:3,B=letters[1:3]),some_other_variable = 2)
bar(zz)
*/
Demo
> Rcpp::sourceCpp("~/git/stackoverflow/70035630/answer.cpp")
> zz <- list(df = data.frame(A = 1:3,B=letters[1:3]),some_other_variable = 2)
> bar(zz)
[1] 6
>
Edit: For completeness, the assignment op can be used with a SEXP as in SEXP df2 = test["df"]; which can then used to instantiate a data.frame. Template programming is difficult and not all corners are completely smoothed.

Related

Conversion of Rcpp::NumericMatrix to Eigen::MatrixXd

I am facing a very similar issue to these questions:
convert Rcpp::NumericVector to Eigen::VectorXd
Converting between NumericVector/Matrix and VectorXd/MatrixXd in Rcpp(Eigen) to perform Cholesky solve
I am writing an R-package, that uses the RcppEigen library for Matrix arithmetic. My problem is that the project is not compiling because of an error in the conversion of the Rcpp::NumericMatrix input to an Eigen::MatrixXd.
The file looks like this:
#include <map>
#include <Rcpp.h>
...
using namespace Eigen;
...
// [[Rcpp::export]]
Rcpp::List my_function(Rcpp::NumericMatrix input_matrix)
{
...
Map<MatrixXd> GATE_matrixx(Rcpp::as<Map<MatrixXd> >(GATE_matrix));
...
}
This gives me the following error:
Myfile.cpp:40:65: required from here
C:/Users/User/AppData/Local/Programs/R/R-4.2.2/library/Rcpp/include/Rcpp/internal/Exporter.h:31:31:error:
matching function for call to 'Eigen::Map<Eigen::Matrix<double, -1,
-1> >::Map(SEXPREC*&) 31 | Exporter( SEXP x ) : t(x) | ^ In file included from
C:/Users/User/AppData/Local/Programs/R/R-4.2.2/library/RcppEigen/include/Eigen/Core:19
from
C:/Users/User/AppData/Local/Programs/R/R-4.2.2/library/RcppEigen/include/Eigen/SparseCore:11
from
C:/Users/User/AppData/Local/Programs/R/R-4.2.2/library/RcppEigen/include/Eigen/Sparse:26
from Myfile.cpp:8
I have also tried to change the line to:
MatrixXd input_matrix_eigen(Rcpp::as\<MatrixXd\>(input_matrix));
This gives me the equivalent error :
Myfile.cpp:40:54: required from here
C:/Users/User/AppData/Local/Programs/R/R
4.2.2/library/RcppEigen/include/Eigen/src/Core/Matrix.h:332:31:error: matching function for call to 'Eigen::Matrix<double, -1,
-1>::_init1<SEXPREC*>(SEXPREC* const&)
Do you have any ideas?
If more information is required to evaluate the issue, just let me know.
If you use the command
RcppEigen::RcppEigen.package.skeleton("demoPackage")
an example package demoPackage is created for you which you can install the usual way. It contains example functions to create amd return a matrix:
// [[Rcpp::export]]
Eigen::MatrixXd rcppeigen_hello_world() {
Eigen::MatrixXd m1 = Eigen::MatrixXd::Identity(3, 3);
// Eigen::MatrixXd m2 = Eigen::MatrixXd::Random(3, 3);
// Do not use Random() here to not promote use of a non-R RNG
Eigen::MatrixXd m2 = Eigen::MatrixXd::Zero(3, 3);
for (auto i=0; i<m2.rows(); i++)
for (auto j=0; j<m2.cols(); j++)
m2(i,j) = R::rnorm(0, 1);
return m1 + 3 * (m1 + m2);
}
If you set the same seed as I do you should get the same matrix
> set.seed(42)
> demoPackage::rcppeigen_hello_world()
[,1] [,2] [,3]
[1,] 8.11288 -1.694095 1.089385
[2,] 1.89859 5.212805 -0.318374
[3,] 4.53457 -0.283977 10.055271
>

Check BIC via RcppArmadillo(Rcpp)

I want check the BIC usage in statistics.
My little example, which is saved as check_bic.cpp, is presented as follows:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
List check_bic(const int N = 10, const int p = 20, const double seed=0){
arma_rng::set_seed(seed); // for reproducibility
arma::mat Beta = randu(p,N); //randu/randn:random values(uniform and normal distributions)
arma::vec Bic = randu(N);
uvec ii = find(Bic == min(Bic)); // may be just one or several elements
int id = ii(ii.n_elem); // fetch the last one element
vec behat = Beta.col(id); // fetch the id column of matrix Beta
List ret;
ret["Bic"] = Bic;
ret["ii"] = ii;
ret["id"] = id;
ret["Beta"] = Beta;
ret["behat"] = behat;
return ret;
}
Then I compile check_bic.cpp in R by
library(Rcpp)
library(RcppArmadillo);
sourceCpp("check_bic.cpp")
and the compilation can pass successfully.
However, when I ran
check_bic(10,20,0)
in R, it shows errors as
error: Mat::operator(): index out of bounds
Error in check_bic(10, 20, 0) : Mat::operator(): index out of bounds
I check the .cpp code line by line, and guess the problems probably
happen at
uvec ii = find(Bic == min(Bic)); // may be just one or several elements
int id = ii(ii.n_elem); // fetch the last one element
since if uvec ii only has one element, then ii.n_elem may be NaN or something
else in Rcpp (while it's ok in Matlab), while I dont konw how to
deal with case. Any help?

af::array conversion to float or double

I have been experimenting with the RcppArrayFire Package, mostly rewriting some cost functions from RcppArmadillo and can't seem to get over "no viable conversion from 'af::array' to 'float'. I have also been getting some backend errors, the example below seems free of these.
This cov-var example is written poorly just to use all relevant coding pieces from my actual cost function. As of now it is the only addition in a package generated by, "RcppArrayFire.package.skeleton".
#include "RcppArrayFire.h"
#include <Rcpp.h>
// [[Rcpp::depends(RcppArrayFire)]]
// [[Rcpp::export]]
float example_ols(const RcppArrayFire::typed_array<f32>& X_vect, const RcppArrayFire::typed_array<f32>& Y_vect){
int Len = X_vect.dims()[0];
int Len_Y = Y_vect.dims()[0];
while( Len_Y < Len){
Len --;
}
float mean_X = af::sum(X_vect)/Len;
float mean_Y = af::sum(Y_vect)/Len;
RcppArrayFire::typed_array<f32> temp(Len);
RcppArrayFire::typed_array<f32> temp_x(Len);
for( int f = 0; f < Len; f++){
temp(f) = (X_vect(f) - mean_X)*(Y_vect(f) - mean_Y);
temp_x(f) = af::pow(X_vect(f) -mean_X, 2);
}
return af::sum(temp)/af::sum(temp_x);
}
/*** R
X <- 1:10
Y <- 2*X +rnorm(10, mean = 0, sd = 1)
example_ols(X, Y)
*/
The first thing to consider is the af::sum function, which comes in different forms: An sf::sum(af::array) that returns an af::array in device memory and a templated af::sum<T>(af::array) that returns a T in host memory. So the minimal change to your example would be using af::sum<float>:
#include "RcppArrayFire.h"
#include <Rcpp.h>
// [[Rcpp::depends(RcppArrayFire)]]
// [[Rcpp::export]]
float example_ols(const RcppArrayFire::typed_array<f32>& X_vect,
const RcppArrayFire::typed_array<f32>& Y_vect){
int Len = X_vect.dims()[0];
int Len_Y = Y_vect.dims()[0];
while( Len_Y < Len){
Len --;
}
float mean_X = af::sum<float>(X_vect)/Len;
float mean_Y = af::sum<float>(Y_vect)/Len;
RcppArrayFire::typed_array<f32> temp(Len);
RcppArrayFire::typed_array<f32> temp_x(Len);
for( int f = 0; f < Len; f++){
temp(f) = (X_vect(f) - mean_X)*(Y_vect(f) - mean_Y);
temp_x(f) = af::pow(X_vect(f) -mean_X, 2);
}
return af::sum<float>(temp)/af::sum<float>(temp_x);
}
/*** R
set.seed(1)
X <- 1:10
Y <- 2*X +rnorm(10, mean = 0, sd = 1)
example_ols(X, Y)
*/
However, there are more things one can improve. In no particular order:
You don't need to include Rcpp.h.
There is an af::mean function for computing the mean of an af::array.
In general RcppArrayFire::typed_array<T> is only needed for getting arrays from R into C++. Within C++ and for the way back you can use af::array.
Even when your device does not support double, you can still use double values on the host.
In order to get good performance, you should avoid for loops and use vectorized functions, just like in R. You have to impose equal dimensions for X and Y, though.
Interestingly I get a different result when I use vectorized functions. Right now I am not sure why this is the case, but the following form makes more sense to me. You should verify that the result is what you want to get:
#include <RcppArrayFire.h>
// [[Rcpp::depends(RcppArrayFire)]]
// [[Rcpp::export]]
double example_ols(const RcppArrayFire::typed_array<f32>& X_vect,
const RcppArrayFire::typed_array<f32>& Y_vect){
double mean_X = af::mean<double>(X_vect);
double mean_Y = af::mean<double>(Y_vect);
af::array temp = (X_vect - mean_X) * (Y_vect - mean_Y);
af::array temp_x = af::pow(X_vect - mean_X, 2.0);
return af::sum<double>(temp)/af::sum<double>(temp_x);
}
/*** R
set.seed(1)
X <- 1:10
Y <- 2*X +rnorm(10, mean = 0, sd = 1)
example_ols(X, Y)
*/
BTW, an even shorter version would be:
#include <RcppArrayFire.h>
// [[Rcpp::depends(RcppArrayFire)]]
// [[Rcpp::export]]
af::array example_ols(const RcppArrayFire::typed_array<f32>& X_vect,
const RcppArrayFire::typed_array<f32>& Y_vect){
return af::cov(X_vect, Y_vect) / af::var(X_vect);
}
Generally it is a good idea to use the in-build functions as much as possible.

Rcpp sugar unique of List

I have a list of Numeric Vector and I need a List of unique elements. I tried Rcpp:unique fonction. It works very well when apply to a Numeric Vector but not to List. This is the code and the error I got.
List h(List x){
return Rcpp::unique(x);
}
Error in dyn.load("/tmp/RtmpDdKvcH/sourceCpp-x86_64-pc-linux-gnu-1.0.0/sourcecpp_272635d5289/sourceCpp_10.so") :
unable to load shared object '/tmp/RtmpDdKvcH/sourceCpp-x86_64-pc-linux-gnu-1.0.0/sourcecpp_272635d5289/sourceCpp_10.so':
/tmp/RtmpDdKvcH/sourceCpp-x86_64-pc-linux-gnu-1.0.0/sourcecpp_272635d5289/sourceCpp_10.so: undefined symbol: _ZNK4Rcpp5sugar9IndexHashILi19EE8get_addrEP7SEXPREC
It is unclear what you are doing wrong, and it is an incomplete / irreproducible question.
But there is a unit test that does just what you do, and we can do it by hand too:
R> Rcpp::cppFunction("NumericVector uq(NumericVector x) { return Rcpp::unique(x); }")
R> uq(c(1.1, 2.2, 2.2, 3.3, 27))
[1] 27.0 1.1 3.3 2.2
R>
Even if there isn't a matching Rcpp sugar function, you can call R functions from within C++. Example:
#include <Rcpp.h>
using namespace Rcpp;
Rcpp::Environment base("package:base");
Function do_unique = base["unique"];
// [[Rcpp::export]]
List myfunc(List x) {
return do_unique(x);
}
Thank you for being interested to this issue.
As I notified that, my List contains only NumericVector. I propose this code that works very well and faster than unique function in R. However its efficiency decreases when the list is large. Maybe this can help someone. Moreover, someone can also optimise this code.
List uniqueList(List& x) {
int xsize = x.size();
List xunique(x);
int s = 1;
for(int i(1); i<xsize; ++i){
NumericVector xi = x[i];
int l = 0;
for(int j(0); j<s; ++j){
NumericVector xj = x[j];
int xisize = xi.size();
int xjsize = xj.size();
if(xisize != xjsize){
++l;
}
else{
if((sum(xi == xj) == xisize)){
goto notkeep;
}
else{
++l;
}
}
}
if(l == s){
xunique[s] = xi;
++s;
}
notkeep: 0;
}
return head(xunique, s);
}
/***R
x <- list(1,42, 1, 1:3, 42)
uniqueList(x)
[[1]]
[1] 1
[[2]]
[1] 42
[[3]]
[1] 1 2 3
microbenchmark::microbenchmark(uniqueList(x), unique(x))
Unit: microseconds
expr min lq mean median uq max neval
uniqueList(x) 2.382 2.633 3.05103 2.720 2.8995 29.307 100
unique(x) 2.864 3.110 3.50900 3.254 3.4145 24.039 100
But R function becomes faster when the List is large. I am sure that someone can optimise this code.

Rcpp Armadillo, submatrices and subvectors

I try to translate some R code into RcppArmadillo and therefore I would also like to do the following:
Assume there is a nonnegative vector v and a matrix M, both with for example m rows. I would like to get rid of all rows in the matrix M whenever there is a zero in the corresponding row of the vector v and afterwards also get rid of all entries that are zero in the vector v. Using R this is simply just the following:
M = M[v>0,]
v = v[v>0]
So my question is if there is a way to do this in RcppArmadillo. Since I am quite new to any programming language I was not able to find anything that could solve my problem, although I think that I am not the first one who asks this maybe quite easy question.
Of course there is a way to go about subsetting elements in both Rcpp (subsetting with Rcpp) and RcppArmadillo (Armadillo subsetting).
Here is a way to replicate the behavior of R subsets in Armadillo.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// Isolate by Row
// [[Rcpp::export]]
arma::mat vec_subset_mat(const arma::mat& x, const arma::uvec& idx) {
return x.rows(find(idx > 0));
}
// Isolate by Element
// [[Rcpp::export]]
arma::vec subset_vec(const arma::vec& x) {
return x.elem(find(x > 0));
}
/*** R
set.seed(1334)
m = matrix(rnorm(100), 10, 10)
v = sample(0:1, 10, replace = T)
all.equal(m[v>0,], vec_subset_mat(m,v))
all.equal(v[v>0], as.numeric(subset_vec(v)))
*/

Resources