Calling 'mypackage' function within public worker - rcpp

I know the problem I have is a thread-safety issue. As the code I have now will execute with 'seThreadOptions(1)'. My question is what would be a good practice to overcome this.
I know this: Threadsafe function pointer with Rcpp and RcppParallel via std::shared_ptr Will come into play somehow. And I have also been thinking/playing around with making the internal function part of the structure for the parallel worker. Realistically, I am calling two internal functions and I would like one to be variable and the other to be constant, this tends me to think that i will need 2 solutions.
The error is that the R session, in rstudio, crashes.
Two things of note here:
1. if I 'setThreadOptions(1)' this runs fine.
2. if I move 'myfunc' into the main cpp file and make the call simply 'myfunc' this also runs fine.
Here is a detailed example:
First cpp file:
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::interfaces(cpp)]]
// [[Rcpp::plugins(cpp11)]]
#include "RcppArmadillo.h"
using namespace arma;
using namespace std;
// [[Rcpp::export]]
double myfunc(arma::vec vec_in){
int Len = arma::size(vec_in)[0];
return (vec_in[0] +vec_in[1])/Len;
}
Second,cpp file:
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(RcppParallel)]]
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(ParallelExample)]]
#include "RcppArmadillo.h"
#include "RcppParallel.h"
#include "ParallelExample.h"
#include <random>
#include <memory>
#include <math.h>
using namespace Rcpp;
using namespace arma;
using namespace RcppParallel;
using namespace std;
struct PARALLEL_WORKER : public Worker{
const arma::vec &input;
arma::vec &output;
PARALLEL_WORKER(const arma::vec &input, arma::vec &output) : input(input), output(output) {}
void operator()(std::size_t begin, std::size_t end){
std::mt19937 engine(1);
// Create a loop that runs through a selected section of the total Boot_reps
for( int k = begin; k < end; k ++){
engine.seed(k);
arma::vec index = input;
std::shuffle( index.begin(), index.end(), engine);
output[k] = ParallelExample::myfunc(index);
}
}
};
// [[Rcpp::export]]
arma::vec Parallelfunc(int Len_in){
arma::vec input = arma::regspace(0, 500);
arma::vec output(Len_in);
PARALLEL_WORKER parallel_woker(input, output);
parallelFor( 0, Len_in, parallel_woker);
return output;
}
Makevars, as I am using a macintosh:
CXX_STD = CXX11
PKG_CXXFLAGS += -I../inst/include
And Namespace:
exportPattern("^[[:alpha:]]+")
importFrom(Rcpp, evalCpp)
importFrom(RcppParallel,RcppParallelLibs)
useDynLib(ParallelExample, .registration = TRUE)
export(Parallelfunc)

When you call ParallelExample::myfunc, you are calling a function defined in inst/include/ParallelExample_RcppExport.h, which uses the R API. This is something one must not do in a parallel context. I see two possibilities:
Convert myfunc to header-only and include it in int/include/ParallelExample.h.
If the second cpp file is within the same package, put a suitable declaration for myfunc into src/first.h, include that file in both src/first.cpp and src/second.cpp, and call myfunc instead of ParallelExample::myfunc. After all, it is not necessary to register a function with R if you only want to call it within the same package. Registring with R is for functions that are called from the outside.

In some ways this kinda defeats the purpose of the built in interface cpp feature of Rcpp.
First, cpp saved as 'ExampleInternal.h':
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp11)]]
#include "RcppArmadillo.h"
using namespace arma;
using namespace std;
namespace ExampleInternal
{
double myfunc3(arma::vec vec_in){
int Len = arma::size(vec_in)[0];
return (vec_in[0] +vec_in[1])/Len;
}
}
and second:
#include "ParallelExample.h"
#include "ExampleInternal.h"
#include <random>
#include <memory>
#include <math.h>
using namespace Rcpp;
using namespace arma;
using namespace RcppParallel;
using namespace ExampleInternal;
using namespace std;
struct PARALLEL_WORKER : public Worker{
const arma::vec &input;
arma::vec &output;
PARALLEL_WORKER(const arma::vec &input, arma::vec &output) : input(input), output(output) {}
void operator()(std::size_t begin, std::size_t end){
std::mt19937 engine(1);
// Create a loop that runs through a selected section of the total Boot_reps
for( int k = begin; k < end; k ++){
engine.seed(k);
arma::vec index = input;
std::shuffle( index.begin(), index.end(), engine);
output[k] = ExampleInternal::myfunc3(index);
}
}
};
// [[Rcpp::export]]
arma::vec Parallelfunc(int Len_in){
arma::vec input = arma::regspace(0, 500);
arma::vec output(Len_in);
PARALLEL_WORKER parallel_woker(input, output);
parallelFor( 0, Len_in, parallel_woker);
return output;
}

Related

Pointer of arma::mat or arma::vet as arguments of Rcpp function for internal purpose only

I am developing an R package using RcppArmadillo. I was writing a few util functions, which manipulate arma::mat and arma::vec objects. So I was trying to use pointer of arma::mat (or arma::vec) as arguments of those functions. Just like the following C++ example (https://onlinegdb.com/mNczwaPaV), I just want to pass the address of object, then manipulate the object value:
#include <iostream>
using namespace std;
void plus_one(int *x){
*x = *x + 1;
}
int main(){
int x = 1;
plus_one(&x);
printf("%d", x);
return 0;
}
2
...Program finished with exit code 0
Press ENTER to exit console.
Here is a toy example I was trying. RStudio gave me the error message "called object type 'arma::vec *' (aka 'Col *') is not a function or function pointer."
#include <RcppArmadillo.h>
using namespace Rcpp;
//
// [[Rcpp::depends(RcppArmadillo)]]
void f2(arma::vec *v){
*v = (*v)%log(*v) + (1-(*v))*log(1-(*v));
}
void trim(arma::vec *v, double tol){
*v(find(*v<=0.0)).fill(tol);
*v(find(*v>=1.0)).fill(1-tol);
}
// [[Rcpp::export]]
arma::vec f1(arma::vec v){
trim(&v, 1e-8);
return(f2(&v));
}
/*** R
f1(seq(0,1,0.2))
*/
I don't think v.memptr() allows me to manipulate the vector by R-like vector operations. For example,
double* v_mem = v.memptr();
*v_mem+1;
does not give the entrywise addition result. (Here, I want is v+1 in R). Do you have any suggestions?
Thank you!

RcppParallel using References in workers

I have been experiment with using references for speed improvements. The (not working) trivial example below, probably won't see any improvements. However, I think by not copying the data into a separate (cost) function, should save some time for a non trivial example.
As for now I have the example in an R-project as three c++ files:
header
#ifndef ExampleInternal_H
#define ExampleInternal_H
namespace ExampleInternal{
#include <RcppArmadillo.h>
#include <RcppParallel.h>
#include <Rcpp.h>
void myfuncA(arma::rowvec &vec_in, arma::colvec& data){
vec_in.at(1) = vec_in.at(0)*arma::accu(data);
}
struct PARALLEL_WORKER : RcppParallel::Worker{
arma::mat &input_output;
const arma::colvec &data_in;
PARALLEL_WORKER(arma::mat &input_output, const arma::colvec &data_in);
void operator()(std::size_t begin, std::size_t end);
};
}
#endif
Function
#include <RcppArmadillo.h>
#include <RcppParallel.h>
#include "ExampleInternal.h"
using namespace ExampleInternal;
// [[Rcpp::export]]
arma::mat Parallelfunc(int Len_in, const arma::colvec data_in){
arma::mat input(Len_in, 2, arma::fill::zeros);
for(unsigned int i = 0; i < Len_in; i ++){
input.at(i, 0) =i;
}
ExampleInternal::PARALLEL_WORKER worker(input, data_in);
parallelFor(0, Len_in, worker);
return input;
}
Parallel Worker
#include <RcppArmadillo.h>
#include <RcppParallel.h>
#include "ExampleInternal.h"
using namespace RcppParallel;
using namespace ExampleInternal;
namespace ExampleInternal{
PARALLEL_WORKER::PARALLEL_WORKER(arma::mat &input_output, const arma::colvec &data_in) : input_output(input_output), data_in(data_in) {}
void PARALLEL_WORKER::operator()(std::size_t begin, std::size_t end){
for(unsigned int k = begin; k < end; k ++){
ExampleInternal::myfuncA(input_output.row(k), data_in);
}
}
}

Adding function pointers to Parallel Worker

Building off the example here Parallel Worker in namespace, I would like to employ function pointers with the Parallel Worker.
The code below produces an error along the lines of: "cannot initialize a new value of type (**) with a return value of (*)"
ExampleInternal.h
#ifndef ExampleInternal_H
#define ExampleInternal_H
namespace ExampleInternal{
#include <RcppArmadillo.h>
#include <RcppParallel.h>
#include <Rcpp.h>
#include <memory>
double myfuncA(arma::vec vec_in);
double myfuncB(arma::vec vec_in);
typedef double (*funcPtr)(arma::vec);
std::shared_ptr<funcPtr> selectfunc(std::string abc){
if(abc == "A"){
return std::make_shared<funcPtr>(new funcPtr(&ExampleInternal::myfuncA));
}else {
return std::make_shared<funcPtr>(new funcPtr(&ExampleInternal::myfuncB));
}
}
struct PARALLEL_WORKER : RcppParallel::Worker{
const arma::vec &input;
std::shared_ptr<funcPtr> &Ptr;
arma::vec &output;
PARALLEL_WORKER( const arma::vec &input, std::shared_ptr<funcPtr> &Ptr, arma::vec &output);
void operator()(std::size_t begin, std::size_t end);
};
}
#endif
myfuncA.cpp
#include <RcppArmadillo.h>
#include <RcppParallel.h>
#include "ExampleInternal.h"
using namespace arma;
namespace ExampleInternal{
double myfuncA(arma::vec vec_in){
int Len = arma::size(vec_in)[0];
return (vec_in[0] +vec_in[1])/Len;
}
} // Close namespace
myfuncB.cpp
#include <RcppArmadillo.h>
#include <RcppParallel.h>
#include "ExampleInternal.h"
using namespace arma;
namespace ExampleInternal{
double myfuncB(arma::vec vec_in){
int Len = arma::size(vec_in)[0];
return (vec_in[0] +vec_in[1])*Len;
}
} // Close namespace
Parallel_worker.cpp
#include <RcppArmadillo.h>
#include <RcppParallel.h>
#include <random>
#include <memory>
#include "ExampleInternal.h"
using namespace RcppParallel;
using namespace ExampleInternal;
namespace ExampleInternal{
PARALLEL_WORKER::PARALLEL_WORKER(const arma::vec &input, std::shared_ptr<ExampleInternal::funcPtr> &Ptr, arma::vec &output) : input(input), Ptr(Ptr), output(output) {}
void PARALLEL_WORKER::operator()(std::size_t begin, std::size_t end){
std::mt19937 engine(1);
// Create a loop that runs through a selected section of the total Boot_reps
for( int k = begin; k < end; k ++){
engine.seed(k);
arma::vec index = input;
std::shuffle( index.begin(), index.end(), engine);
output[k] = Ptr(index);
}
}
} //Close Namespace
Parallel_func.cpp
#include <RcppArmadillo.h>
#include <RcppParallel.h>
#include <memory>
#include "ExampleInternal.h"
using namespace ExampleInternal;
// [[Rcpp::export]]
arma::vec Parallelfunc(int Len_in, std::string func_letter){
std::shared_ptr<funcPtr> df = ExampleInternal::selectfunc(func_letter);
arma::vec input = arma::regspace(0, 500);
arma::vec output(Len_in);
ExampleInternal::PARALLEL_WORKER parallel_woker(input, df, output);
parallelFor( 0, Len_in, parallel_woker);
return output;
}
You may also need a makevars to specify c++11, on mac:
CXX_STD = CXX11
PKG_LIBS += $(shell ${R_HOME}/bin/Rscript -e "RcppParallel::RcppParallelLibs()")
Ok this works. And the shared_ptr is not needed.
ExampleInternal.h
#ifndef ExampleInternal_H
#define ExampleInternal_H
namespace ExampleInternal{
#include <RcppArmadillo.h>
#include <RcppParallel.h>
#include <Rcpp.h>
#include <memory>
double myfuncA(arma::vec vec_in);
double myfuncB(arma::vec vec_in);
typedef double (*funcPtr)(arma::vec);
ExampleInternal::funcPtr func_select(int abc);
struct PARALLEL_WORKER : RcppParallel::Worker{
const arma::vec &input;
ExampleInternal::funcPtr &Ptr;
arma::vec &output;
PARALLEL_WORKER( const arma::vec &input, ExampleInternal::funcPtr &Ptr, arma::vec &output);
void operator()(std::size_t begin, std::size_t end);
};
}
#endif
func_select.cpp
#include <RcppArmadillo.h>
#include <RcppParallel.h>
#include "ExampleInternal.h"
namespace ExampleInternal{
ExampleInternal::funcPtr func_select(int abc){
ExampleInternal::funcPtr df;
if(abc == 1){
df = &ExampleInternal::myfuncA;
}else{
df = &ExampleInternal::myfuncB;
}
return df;
}
} //close namespace
Parallel_func.cpp
#include <RcppArmadillo.h>
#include <RcppParallel.h>
#include <memory>
#include "ExampleInternal.h"
using namespace ExampleInternal;
// [[Rcpp::export]]
arma::vec Parallelfunc(int Len_in, int func_num){
ExampleInternal::funcPtr point = ExampleInternal::func_select(func_num);
arma::vec input = arma::regspace(0, 500);
arma::vec output(Len_in);
ExampleInternal::PARALLEL_WORKER parallel_woker(input, point, output);
parallelFor( 0, Len_in, parallel_woker);
return output;
}

Issue in predicate function in wait in thread C++

I am trying to put the condition inside a function but it is throwing confusing compile time error . While if I write it in lambda function like this []{ retur i == k;} it is showing k is unidentified . Can anybody tell How to solve this problem .
#include <iostream>
#include <mutex>
#include <sstream>
#include <thread>
#include <chrono>
#include <condition_variable>
using namespace std;
condition_variable cv;
mutex m;
int i;
bool check_func(int i,int k)
{
return i == k;
}
void print(int k)
{
unique_lock<mutex> lk(m);
cv.wait(lk,check_func(i,k)); // Line 33
cout<<"Thread no. "<<this_thread::get_id()<<" parameter "<<k<<"\n";
i++;
return;
}
int main()
{
thread threads[10];
for(int i = 0; i < 10; i++)
threads[i] = thread(print,i);
for(auto &t : threads)
t.join();
return 0;
}
Compiler Error:
In file included from 6:0:
/usr/include/c++/4.9/condition_variable: In instantiation of 'void std::condition_variable::wait(std::unique_lock<std::mutex>&, _Predicate) [with _Predicate = bool]':
33:30: required from here
/usr/include/c++/4.9/condition_variable:97:14: error: '__p' cannot be used as a function
while (!__p())
^
wait() takes a predicate, which is a callable unary function returning bool. wait() uses that predicate like so:
while (!pred()) {
wait(lock);
}
check_func(i,k) is a bool. It's not callable and it's a constant - which defeats the purpose. You're waiting on something that can change. You need to wrap it in something that can be repeatedly callable - like a lambda:
cv.wait(lk, [&]{ return check_func(i,k); });

Problems when reading integers with scanf

I know that it might be a stupid question but can you tell me why the following patch of code fails? I see nothing wrong. I am trying to read integers using scanf. I have included the necessary library, but when I run the program it crashes after I read the first s. Thank you.
#include <iostream>
#include <ctime>
#include <cstdlib>
#include <cmath>
#include <cstdio>
#include <vector>
using namespace std;
int main()
{
int n, x;
scanf("%d", &n); scanf("%d", &x);
vector< pair<int, int> > moments;
for(int i = 0; i < n; ++i)
{
int f, s;
scanf("%d", &f);
scanf("%d", &s );
moments[i].first = f;
moments[i].second = s;
}
return 0;
}
That is not the way to assign values to moments since moments[i] does not yet exist. Try:
pair<int, int> thing;
thing = make_pair(f,s);
moments.push_back(thing);
instead of your assignements to moments elements.

Resources