I have never used C++ before, so this may be a dumb question.
In R I use this function to read from a socket:
socket_bin_reader <- function(in_sock) {
string_read <- raw(0)
while((rd <- readBin(in_sock, what = "raw", n=1)) > 0) {
if (rd == 0xff) rd <- readBin(in_sock, what = "raw", n =1)
string_read <- c(string_read, rd)
}
return(string_read)
}
This function does exactly what I need, but it is slow when reading large amounts of data. I am therefore looking for a way to do the same in C++.
I found this example of how to read byte-by-byte from a file (cpp-byte-file-reading).
The body of my function will probably be based on this example. My guess is that it will look like:
// [[Rcpp::export]]
NumericVector socket_bin_reader_C(??? in_sock) {
NumericVector out = NumericVector::create(??);
ifstream infile(in_sock, ios::in | ios::binary);
while (rd = infile.read(char*) > 0) {
if (rd == 0xff) rd = infile.read(char*);
add rd to out;
}
}
But I have two questions:
In Rcpp you have to provide a class for each parameter. What is the class for a socketConnection?
I know that in C or C++ you have to allocate memory. How can I dynamically reallocate more memory for the return vector?
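On the second question, a std::vector grows on demand, so no manual reallocation is needed. The following is a minimal plain C++ sketch of the same loop as the R function. It assumes the bytes arrive through a std::istream and does not address how to hand an R socketConnection to C++. The name read_escaped is hypothetical.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>   // only needed for the in-memory usage shown below
#include <string>
#include <vector>

// Hypothetical helper mirroring the R loop: read bytes until a zero byte
// or end of input; a 0xff byte is dropped and the byte after it is kept
// verbatim (even if that byte is zero).
std::vector<std::uint8_t> read_escaped(std::istream& in) {
    std::vector<std::uint8_t> out;   // grows on demand, no manual realloc
    out.reserve(4096);               // optional: avoids early reallocations
    char ch;
    while (in.get(ch)) {
        std::uint8_t b = static_cast<std::uint8_t>(ch);
        if (b == 0x00) break;        // same stop condition as `rd > 0` in R
        if (b == 0xff) {
            if (!in.get(ch)) break;  // escape byte with nothing after it
            b = static_cast<std::uint8_t>(ch);
        }
        out.push_back(b);
    }
    return out;
}
```

To try it without a socket, wrap a byte string in a std::istringstream and pass that to read_escaped. push_back runs in amortized constant time, which avoids the repeated copying that c(string_read, rd) causes in the R version.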
Ben
Related
I am training a linear model using PyTorch and saving it to a file with the "save" function call. I have another program that loads the model in C++ and performs inference.
I would like to instruct the Torch C++ library to use a specific memory blob for the final output tensor. Is this even possible? If yes, how? Below you can see a small example of what I am trying to achieve.
#include <cstdlib>
#include <iostream>
#include <memory>
#include <torch/script.h>
int main(int argc, const char* argv[]) {
if (argc != 3) {
std::cerr << "usage: example-app <path-to-exported-script-module> <size-in-MiB>\n";
return -1;
}
long numElements = (1024*1024)/sizeof(float) * atoi(argv[2]);
float *a = new float[numElements];
float *b = new float[numElements];
float *c = new float[numElements*4];
for (int i = 0; i < numElements; i++){
a[i] = i;
b[i] = -i;
}
//auto options = torch::TensorOptions().dtype(torch::kFloat64);
at::Tensor a_t = torch::from_blob((float*) a, {numElements,1});
at::Tensor b_t = torch::from_blob((float*) b, {numElements,1});
at::Tensor out = torch::from_blob((float*) c, {numElements,4});
at::Tensor c_t = at::cat({a_t,b_t}, 1);
at::Tensor d_t = at::reshape(c_t, {numElements,2});
torch::jit::script::Module module;
try {
module = torch::jit::load(argv[1]);
}
catch (const c10::Error& e) {
return -1;
}
out = module.forward({d_t}).toTensor();
std::cout<< out.sizes() << "\n";
delete [] a;
delete [] b;
delete [] c;
return 0;
}
So, I allocate memory for "c" and then create a tensor named "out" on top of that memory. I then load the model and call its forward method. I observe that the resulting data is copied/moved into the "out" tensor. However, I would like to instruct Torch to write the result directly into the memory behind "out". Is this possible?
Somewhere in the libtorch source code (I don't remember where, I'll try to find the file), there is an operator which looks something like the one below (notice the trailing &&)
torch::Tensor& operator=(torch::Tensor rhs) &&;
and which does what you need, if I remember correctly. Basically, torch assumes that if you assign a tensor rhs to an rvalue tensor, you actually mean to copy rhs into the underlying storage.
So in your case, that would be
std::move(out) = module.forward({d_t}).toTensor();
or
torch::from_blob((float*) c, {numElements,4}) = module.forward({d_t}).toTensor();
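To illustrate the language mechanism this answer relies on, here is a minimal sketch of a ref-qualified assignment operator in plain C++. The View class is hypothetical and is not libtorch code. It only shows how assigning to an lvalue can rebind a handle while assigning to an rvalue copies elements into the existing buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical non-owning handle over a caller-provided float buffer.
class View {
public:
    View(float* data, std::size_t n) : data_(data), n_(n) {}

    // Assignment to an lvalue: rebind the handle, the old buffer is untouched.
    View& operator=(const View& rhs) & {
        data_ = rhs.data_;
        n_ = rhs.n_;
        return *this;
    }

    // Assignment to an rvalue: copy the elements into the existing buffer.
    View& operator=(const View& rhs) && {
        std::copy_n(rhs.data_, std::min(n_, rhs.n_), data_);
        return *this;
    }

    float* data() const { return data_; }

private:
    float* data_;
    std::size_t n_;
};
```

With View out(c, 3) and View result(r, 3), the statement std::move(out) = result; fills c with the contents of r, whereas out = result; merely makes out point at r. Either way the result is first produced elsewhere and then copied, so this pattern controls where the data ends up but does not avoid the copy.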
In this simple example I would like to subset a matrix by row and pass it to another cpp function; the example demonstrates this works by passing an input array to the other function first.
#include "RcppArrayFire.h"
using namespace Rcpp;
af::array theta_check_cpp( af::array theta){
if(*theta(1).host<double>() >= 1){
theta(1) = 0;
}
return theta;
}
// [[Rcpp::export]]
af::array theta_check(RcppArrayFire::typed_array<f64> theta){
const int theta_size = theta.dims()[0];
af::array X(2, theta_size);
X(0, af::seq(theta_size)) = theta_check_cpp( theta );
X(1, af::seq(theta_size)) = theta;
// return X;
Rcpp::Rcout << " works till here";
return theta_check_cpp( X.row(1) );
}
/*** R
theta <- c( 2, 2, 2)
theta_check(theta)
*/
The constructor you are using to create X has an argument ty for the data type, which defaults to f32. Therefore X holds 32-bit floats, and you cannot extract a 64-bit host pointer from it. Either use
af::array X(2, theta_size, f64);
to create an array using 64 bit doubles, or extract a 32 bit host pointer via
if(*theta(1).host<float>() >= 1){
...
I have a List of NumericVector and I need a List of its unique elements. I tried the Rcpp::unique function. It works very well when applied to a NumericVector but not to a List. This is the code and the error I got.
List h(List x){
return Rcpp::unique(x);
}
Error in dyn.load("/tmp/RtmpDdKvcH/sourceCpp-x86_64-pc-linux-gnu-1.0.0/sourcecpp_272635d5289/sourceCpp_10.so") :
unable to load shared object '/tmp/RtmpDdKvcH/sourceCpp-x86_64-pc-linux-gnu-1.0.0/sourcecpp_272635d5289/sourceCpp_10.so':
/tmp/RtmpDdKvcH/sourceCpp-x86_64-pc-linux-gnu-1.0.0/sourcecpp_272635d5289/sourceCpp_10.so: undefined symbol: _ZNK4Rcpp5sugar9IndexHashILi19EE8get_addrEP7SEXPREC
It is unclear what you are doing wrong, and it is an incomplete / irreproducible question.
But there is a unit test that does just what you do, and we can do it by hand too:
R> Rcpp::cppFunction("NumericVector uq(NumericVector x) { return Rcpp::unique(x); }")
R> uq(c(1.1, 2.2, 2.2, 3.3, 27))
[1] 27.0 1.1 3.3 2.2
R>
Even if there isn't a matching Rcpp sugar function, you can call R functions from within C++. Example:
#include <Rcpp.h>
using namespace Rcpp;
Rcpp::Environment base("package:base");
Function do_unique = base["unique"];
// [[Rcpp::export]]
List myfunc(List x) {
return do_unique(x);
}
Thank you for your interest in this issue.
As I mentioned, my List contains only NumericVectors. I propose the code below, which works well and is faster than R's unique function on small lists. However, its efficiency decreases when the list is large. Maybe this can help someone, and perhaps someone can optimise it further.
List uniqueList(List& x) {
int xsize = x.size();
List xunique(x);
int s = 1;
for(int i(1); i<xsize; ++i){
NumericVector xi = x[i];
int l = 0;
for(int j(0); j<s; ++j){
NumericVector xj = x[j];
int xisize = xi.size();
int xjsize = xj.size();
if(xisize != xjsize){
++l;
}
else{
if((sum(xi == xj) == xisize)){
goto notkeep;
}
else{
++l;
}
}
}
if(l == s){
xunique[s] = xi;
++s;
}
notkeep: 0;
}
return head(xunique, s);
}
/***R
x <- list(1,42, 1, 1:3, 42)
uniqueList(x)
[[1]]
[1] 1
[[2]]
[1] 42
[[3]]
[1] 1 2 3
microbenchmark::microbenchmark(uniqueList(x), unique(x))
Unit: microseconds
expr min lq mean median uq max neval
uniqueList(x) 2.382 2.633 3.05103 2.720 2.8995 29.307 100
unique(x) 2.864 3.110 3.50900 3.254 3.4145 24.039 100
However, R's unique becomes faster when the list is large, because the code above compares every element against all the unique elements found so far. I am sure that someone can optimise this code.
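One possible direction for the optimisation is to remember already seen vectors in an ordered set instead of rescanning the kept elements. The sketch below is plain C++ and uses std::vector<double> as a hypothetical stand-in for NumericVector. It is not an Rcpp implementation and has not been benchmarked against R's unique.

```cpp
#include <set>
#include <vector>

// Hypothetical plain C++ stand-in for a List of NumericVector.
using Vec = std::vector<double>;

// Keep the first occurrence of each vector, preserving input order.
// Assumes no NaN values: NaN breaks the ordering std::set relies on.
std::vector<Vec> unique_vectors(const std::vector<Vec>& x) {
    std::set<Vec> seen;             // vectors are compared lexicographically
    std::vector<Vec> out;
    for (const Vec& v : x) {
        if (seen.insert(v).second)  // true only the first time v is seen
            out.push_back(v);
    }
    return out;
}
```

Each lookup costs a logarithmic number of vector comparisons, so the work grows roughly as n log n rather than quadratically with the number of list elements.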
I have a named list in R:
l = list(a=1, b=2)
I would like to use this list in Rcpp, and iterate over both the values and the names. Ideally, it could be something like (using C++11 formatting for the sake of concision):
void print_list (List l)
for (pair < String, int > &p: l)
cout << p.first << ": " << p.second;
Is there a way to do it (even using plain C++)?
Many of these use cases are in fact visible in the unit tests.
Here I am quoting from tinytest/cpp/Vector.cpp:
Get names:
// [[Rcpp::export]]
CharacterVector integer_names_get( IntegerVector y ){
return y.names() ;
}
Index by name:
// [[Rcpp::export]]
int integer_names_indexing( IntegerVector y ){
return y["foo"] ;
}
That should be enough to get you the vector you aim to iterate over.
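Once the names are available as a vector, iterating over names and values together is a plain index loop over two parallel sequences. The sketch below shows that loop in plain C++, with std::vector as a hypothetical stand-in for the Rcpp types. The function name format_named is made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a named R list: names and values kept in
// parallel, as they are after fetching the names vector separately.
std::string format_named(const std::vector<std::string>& names,
                         const std::vector<double>& values) {
    std::ostringstream os;
    // Guard against a length mismatch by stopping at the shorter vector.
    const std::size_t n = std::min(names.size(), values.size());
    for (std::size_t i = 0; i < n; ++i) {
        os << names[i] << ": " << values[i] << '\n';
    }
    return os.str();
}
```

For the list from the question, format_named({"a", "b"}, {1, 2}) yields one "name: value" line per element.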
Is there a way to use pointer arithmetic on a large malloc block, so that you can place multiple structs or primitive data types in the area already allocated? I'm writing something like this, but it isn't working (trying to place 200 structs in a 15000-byte malloc area):
char *primDataPtr = NULL;
typedef struct Metadata METADATA;
struct Metadata {
.
.
.
};/*struct Metadata*/
.
.
.
primDataPtr = (void*)(malloc(15000));
if(primDataPtr == NULL) {
exit(1);
}
char *tempPtr = primDataPtr;
int x;
for(x=0;x<200;x++) {
METADATA *md = (void*)(primDataPtr + (sizeof(METADATA) * x));
}//end x -for
The only thing I can see is that:
METADATA *md = (void*)(primDataPtr + (sizeof(METADATA) * x));
should be:
METADATA *md = (METADATA *)(primDataPtr + (sizeof(METADATA) * x));
I think?
PS: your malloc could also just allocate 200 * sizeof(METADATA).
In C, a pointer to something can be indexed with the same syntax as an array of something. You just need to be careful with the index ranges:
#include <assert.h>
#include <stdlib.h>
#define ARRAY_SIZE_IN_BYTES (15000)
void *primDataPtr = malloc(ARRAY_SIZE_IN_BYTES);
assert(primDataPtr);
METADATA *md = (METADATA *)primDataPtr;
/* visit only as many whole METADATA structs as fit in the block */
for (size_t x = 0; x < ARRAY_SIZE_IN_BYTES / sizeof(METADATA); x++) {
    do_something_with(md[x]);
}
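A self-contained variant of the same idea is sketched below. It is written in C++ with std::malloc, and the fields of Metadata are hypothetical because the original struct body is elided. The function name fill_block is also made up. Indexing through a typed pointer already scales by sizeof(Metadata), so no manual byte arithmetic is needed.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical layout; the fields of the original struct are not shown.
struct Metadata {
    int id;
    double value;
};

// Write up to `wanted` records into a raw malloc'd block of `bytes` bytes
// and return how many were written. Never writes past the end of the block.
std::size_t fill_block(void* block, std::size_t bytes, std::size_t wanted) {
    Metadata* md = static_cast<Metadata*>(block);
    const std::size_t capacity = bytes / sizeof(Metadata);  // whole structs only
    const std::size_t n = wanted < capacity ? wanted : capacity;
    for (std::size_t i = 0; i < n; ++i) {
        md[i].id = static_cast<int>(i);            // md[i] is *(md + i)
        md[i].value = 0.5 * static_cast<double>(i);
    }
    return n;
}
```

A caller would allocate with std::malloc(15000), check the result for a null pointer, call fill_block(block, 15000, 200), and release the block with std::free. Memory returned by malloc is suitably aligned for any fundamental type, which is why the cast to Metadata* is safe for a struct like this one.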