I have a list of Numeric Vector and I need a List of unique elements. I tried Rcpp:unique fonction. It works very well when apply to a Numeric Vector but not to List. This is the code and the error I got.
List h(List x){
return Rcpp::unique(x);
}
Error in dyn.load("/tmp/RtmpDdKvcH/sourceCpp-x86_64-pc-linux-gnu-1.0.0/sourcecpp_272635d5289/sourceCpp_10.so") :
unable to load shared object '/tmp/RtmpDdKvcH/sourceCpp-x86_64-pc-linux-gnu-1.0.0/sourcecpp_272635d5289/sourceCpp_10.so':
/tmp/RtmpDdKvcH/sourceCpp-x86_64-pc-linux-gnu-1.0.0/sourcecpp_272635d5289/sourceCpp_10.so: undefined symbol: _ZNK4Rcpp5sugar9IndexHashILi19EE8get_addrEP7SEXPREC
It is unclear what you are doing wrong, and it is an incomplete / irreproducible question.
But there is a unit test that does just what you do, and we can do it by hand too:
R> Rcpp::cppFunction("NumericVector uq(NumericVector x) { return Rcpp::unique(x); }")
R> uq(c(1.1, 2.2, 2.2, 3.3, 27))
[1] 27.0 1.1 3.3 2.2
R>
Even if there isn't a matching Rcpp sugar function, you can call R functions from within C++. Example:
#include <Rcpp.h>
using namespace Rcpp;
Rcpp::Environment base("package:base");
Function do_unique = base["unique"];
// [[Rcpp::export]]
List myfunc(List x) {
return do_unique(x);
}
Thank you for being interested to this issue.
As I notified that, my List contains only NumericVector. I propose this code that works very well and faster than unique function in R. However its efficiency decreases when the list is large. Maybe this can help someone. Moreover, someone can also optimise this code.
List uniqueList(List& x) {
int xsize = x.size();
List xunique(x);
int s = 1;
for(int i(1); i<xsize; ++i){
NumericVector xi = x[i];
int l = 0;
for(int j(0); j<s; ++j){
NumericVector xj = x[j];
int xisize = xi.size();
int xjsize = xj.size();
if(xisize != xjsize){
++l;
}
else{
if((sum(xi == xj) == xisize)){
goto notkeep;
}
else{
++l;
}
}
}
if(l == s){
xunique[s] = xi;
++s;
}
notkeep: 0;
}
return head(xunique, s);
}
/***R
x <- list(1,42, 1, 1:3, 42)
uniqueList(x)
[[1]]
[1] 1
[[2]]
[1] 42
[[3]]
[1] 1 2 3
microbenchmark::microbenchmark(uniqueList(x), unique(x))
Unit: microseconds
expr min lq mean median uq max neval
uniqueList(x) 2.382 2.633 3.05103 2.720 2.8995 29.307 100
unique(x) 2.864 3.110 3.50900 3.254 3.4145 24.039 100
But R function becomes faster when the List is large. I am sure that someone can optimise this code.
Related
I never have used C++ before so this may be a dumb question.
In R I use this function to read from a socket:
socket_bin_reader <- function(in_sock) {
string_read <- raw(0)
while((rd <- readBin(in_sock, what = "raw", n=1)) > 0) {
if (rd == 0xff) rd <- readBin(in_sock, what = "raw", n =1)
string_read <- c(string_read, rd)
}
return(string_read)
}
This functions does exactly what I need, but has the disadvantage that it takes a lot of time to read large quantities of data. Therefore I am looking for ways to use C++.
I found this example on how to read byte-by-byte from a file (cpp-byte-file-reading)
The body from my function will probably be based on this example. My guess is that it will look like:
// [[Rcpp::export]]
NumericVector socket_bin_reader_C(??? in_sock) {
NumericVector out = NumericVector::create(??);
ifstream infile(in_sock, ios::in | ios::binary);
while (rd = infile.read(char*) > 0) {
if (rd == 0xff) rd = infile.read(char*);
add rd to out;
}
}
But I have two questions:
In Rcpp you have to provide a class for each parameter. What is the class for a socketConnection?
I know that in C or C++ you have to allocate memory. How can I dynamically reallocate more memory for the return vector?
Ben
I created a cumsum function in an R package with rcpp which will cumulatively sum a vector until it hits the user defined ceiling or floor. However, if one wants the cumsum to be bounded above, the user must still specify a floor.
Example:
a = c(1, 1, 1, 1, 1, 1, 1)
If i wanted to cumsum a and have an upper bound of 3, I could cumsum_bounded(a, lower = 1, upper = 3). I would rather not have to specify the lower bound.
My code:
#include <Rcpp.h>
#include <float.h>
#include <cmath>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector cumsum_bounded(NumericVector x, int upper, int lower) {
NumericVector res(x.size());
double acc = 0;
for (int i=0; i < x.size(); ++i) {
acc += x[i];
if (acc < lower) acc = lower;
else if (acc > upper) acc = upper;
res[i] = acc;
}
return res;
}
What I would like:
#include <Rcpp.h>
#include <float.h>
#include <cmath>
#include <climits> //for LLONG_MIN and LLONG_MAX
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector cumsum_bounded(NumericVector x, long long int upper = LLONG_MAX, long long int lower = LLONG_MIN) {
NumericVector res(x.size());
double acc = 0;
for (int i=0; i < x.size(); ++i) {
acc += x[i];
if (acc < lower) acc = lower;
else if (acc > upper) acc = upper;
res[i] = acc;
}
return res;
}
In short, yes its possible but it requires finesse that involves creating an intermediary function or embedding sorting logic within the main function.
In long, Rcpp attributes only supports a limit feature set of values. These values are listed in the Rcpp FAQ 3.12 entry
String literals delimited by quotes (e.g. "foo")
Integer and Decimal numeric values (e.g. 10 or 4.5)
Pre-defined constants including:
Booleans: true and false
Null Values: R_NilValue, NA_STRING, NA_INTEGER, NA_REAL, and NA_LOGICAL.
Selected vector types can be instantiated using the
empty form of the ::create static member function.
CharacterVector, IntegerVector, and NumericVector
Matrix types instantiated using the rows, cols constructor Rcpp::Matrix n(rows,cols)
CharacterMatrix, IntegerMatrix, and NumericMatrix)
If you were to specify numerical values for LLONG_MAX and LLONG_MIN this would meet the criteria to directly use Rcpp attributes on the function. However, these values are implementation specific. Thus, it would not be ideal to hardcode them. Thus, we have to seek an outside solution: the Rcpp::Nullable<T> class to enable the default NULL value. The reason why we have to wrap the parameter type with Rcpp::Nullable<T> is that NULL is a very special and can cause heartache if not careful.
The NULL value, unlike others on the real number line, will not be used to bound your values in this case. As a result, it is the perfect candidate to use on the function call. There are two choices you then have to make: use Rcpp::Nullable<T> as the parameters on the main function or create a "logic" helper function that has the correct parameters and can be used elsewhere within your application without worry. I've opted for the later below.
#include <Rcpp.h>
#include <float.h>
#include <cmath>
#include <climits> //for LLONG_MIN and LLONG_MAX
using namespace Rcpp;
NumericVector cumsum_bounded_logic(NumericVector x,
long long int upper = LLONG_MAX,
long long int lower = LLONG_MIN) {
NumericVector res(x.size());
double acc = 0;
for (int i=0; i < x.size(); ++i) {
acc += x[i];
if (acc < lower) acc = lower;
else if (acc > upper) acc = upper;
res[i] = acc;
}
return res;
}
// [[Rcpp::export]]
NumericVector cumsum_bounded(NumericVector x,
Rcpp::Nullable<long long int> upper = R_NilValue,
Rcpp::Nullable<long long int> lower = R_NilValue) {
if(upper.isNotNull() && lower.isNotNull()){
return cumsum_bounded_logic(x, Rcpp::as< long long int >(upper), Rcpp::as< long long int >(lower));
} else if(upper.isNull() && lower.isNotNull()){
return cumsum_bounded_logic(x, LLONG_MAX, Rcpp::as< long long int >(lower));
} else if(upper.isNotNull() && lower.isNull()) {
return cumsum_bounded_logic(x, Rcpp::as< long long int >(upper), LLONG_MIN);
} else {
return cumsum_bounded_logic(x, LLONG_MAX, LLONG_MIN);
}
// Required to quiet compiler
return x;
}
Test Output
cumsum_bounded(a, 5)
## [1] 1 2 3 4 5 5 5
cumsum_bounded(a, 5, 2)
## [1] 2 3 4 5 5 5 5
Is there a base function in Rcpp that:
Fills entirely by a single value if size of a vector is 1.
Fills the other vector completely if same length.
Fills with an NA value if neither Vector are the same length nor a vector is of size 1.
I've written the above criteria as a function below using a NumericVector as an example. If there isn't a base function in Rcpp that performs said operations there should be a way to template the function so that given any type of vector (e.g. numeric, character and so on) the above logic would be able to be executed.
// [[Rcpp::export]]
NumericVector cppvectorize(NumericVector x,NumericVector y) {
NumericVector y_out(y.size());
if(x.size() == 1) {
for(int i = 0; i < y_out.size(); i++) {
y_out[i] = x[0];
}
} else if(x.size() == y_out.size()) {
for(int i = 0; i < y_out.size(); i++) {
y_out[i] = x[i];
}
} else {
for(int i = 0; i < y_out.size(); i++) {
y_out[i] = NA_REAL;
}
}
return y_out;
}
Unfortunately, the closest you will come to such a function is one of the rep variants that Rcpp supports. However, none of the variants match the desired output. Therefore, the only option is to really implement a templated version of your desired function.
To create the templated function, we will first create a routing function that handles the dispatch of SEXP objects. The rationale behind the routing function is SEXP objects are able to be retrieved from and surfaced into R using Rcpp Attributes whereas a templated version is not. As a result, we need to specify the SEXTYPE (used as RTYPE) dispatches that are possible. The TYPEOF() macro retrieves the coded number. Using a switch statement, we can dispatch this number into the appropriate cases.
After dispatching, we arrive at the templated function. The templated function makes use of the base Vector class of Rcpp to simplify the data flow. From here, the notable novelty will be the use of ::traits::get_na<RTYPE>() to dynamically retrieve the appropriate NA value and fill it.
With the plan in place, let's look at the code:
#include <Rcpp.h>
using namespace Rcpp;
// ---- Templated Function
template <int RTYPE>
Vector<RTYPE> vec_helper(const Vector<RTYPE>& x, const Vector<RTYPE>& y) {
Vector<RTYPE> y_out(y.size());
if(x.size() == 1){
y_out.fill(x[0]);
} else if (x.size() == y.size()) {
y_out = x;
} else {
y_out.fill(::traits::get_na<RTYPE>());
}
return y_out;
}
// ---- Dispatch function
// [[Rcpp::export]]
SEXP cppvectorize(SEXP x, SEXP y) {
switch (TYPEOF(x)) {
case INTSXP: return vec_helper<INTSXP>(x, y);
case REALSXP: return vec_helper<REALSXP>(x, y);
case STRSXP: return vec_helper<STRSXP>(x, y);
default: Rcpp::stop("SEXP Type Not Supported.");
}
// Need to return a value even though this will never be triggered
// to quiet the compiler.
return R_NilValue;
}
Sample Tests
Here we conduct a few sample tests on each of the supported data
# Case 1: x == 1
x = 1:5
y = 2
cppvectorize(x, y)
## [1] NA
# Case 2: x == y
x = letters[1:5]
y = letters[6:10]
cppvectorize(x, y)
## [1] "a" "b" "c" "d" "e"
# Case 3: x != y && x > 1
x = 1.5
y = 2.5:6.5
cppvectorize(x, y)
## [1] 1.5 1.5 1.5 1.5 1.5
From R, I'm trying to run sourceCpp on this file:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace arma;
using namespace Rcpp;
// [[Rcpp::export]]
vec dnormLog(vec x, vec means, vec sds) {
int n = x.size();
vec res(n);
for(int i = 0; i < n; i++) {
res[i] = log(dnorm(x[i], means[i], sds[i]));
}
return res;
}
See this answer to see where I got the function from. This throws the error:
no matching function for call to 'dnorm4'
Which is the exact error I was hoping to prevent by using the loop, since the referenced answer mentions that dnorm is only vectorized with respect to its first argument. I fear the answer is obvious, but I've tried adding R:: before the dnorm, tried using NumericVector instead of vec, without using log() in front. No luck. However, adding R:: before dnorm does produce a separate error:
too few arguments to function call, expected 4, have 3; did you mean '::dnorm4'?
Which is not fixed by replacing dnorm above with R::dnorm4.
There are two nice teachable moments here:
Pay attention to namespaces. If in doubt, don't go global.
Check the headers for the actual definitions. You missed the fourth argument in the scalar version R::dnorm().
Here is a repaired version, included is a second variant you may find interesting:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec dnormLog(arma::vec x, arma::vec means, arma::vec sds) {
int n = x.size();
arma::vec res(n);
for(int i = 0; i < n; i++) {
res[i] = std::log(R::dnorm(x[i], means[i], sds[i], FALSE));
}
return res;
}
// [[Rcpp::export]]
arma::vec dnormLog2(arma::vec x, arma::vec means, arma::vec sds) {
int n = x.size();
arma::vec res(n);
for(int i = 0; i < n; i++) {
res[i] = R::dnorm(x[i], means[i], sds[i], TRUE);
}
return res;
}
/*** R
dnormLog( c(0.1,0.2,0.3), rep(0.0, 3), rep(1.0, 3))
dnormLog2(c(0.1,0.2,0.3), rep(0.0, 3), rep(1.0, 3))
*/
When we source this, both return the same result because the R API allows us to ask for logarithms to be taken.
R> sourceCpp("/tmp/dnorm.cpp")
R> dnormLog( c(0.1,0.2,0.3), rep(0.0, 3), rep(1.0, 3))
[,1]
[1,] -0.923939
[2,] -0.938939
[3,] -0.963939
R> dnormLog2(c(0.1,0.2,0.3), rep(0.0, 3), rep(1.0, 3))
[,1]
[1,] -0.923939
[2,] -0.938939
[3,] -0.963939
R>
We need to find the maximum element in an array which is also equal to product of two elements in the same array. For example [2,3,6,8] , here 6=2*3 so answer is 6.
My approach was to sort the array and followed by a two pointer method which checked whether the product exist for each element. This is o(nlog(n)) + O(n^2) = O(n^2) approach. Is there a faster way to this ?
There is a slight better solution with O(n * sqrt(n)) if you are allowed to use O(M) memory M = max number in A[i]
Use an array of size M to mark every number while you traverse them from smaller to bigger number.
For each number try all its factors and see if those were already present in the array map.
Here is a pseudo code for that:
#define M 1000000
int array_map[M+2];
int ans = -1;
sort(A,A+n);
for(i=0;i<n;i++) {
for(j=1;j<=sqrt(A[i]);j++) {
int num1 = j;
if(A[i]%num1==0) {
int num2 = A[i]/num1;
if(array_map[num1] && array_map[num2]) {
if(num1==num2) {
if(array_map[num1]>=2) ans = A[i];
} else {
ans = A[i];
}
}
}
}
array_map[A[i]]++;
}
There is an ever better approach if you know how to find all possible factors in log(M) this just becomes O(n*logM). You have to use sieve and backtracking for that
#JerryGoyal 's solution is correct. However, I think it can be optimized even further if instead of using B pointer, we use binary search to find the other factor of product if arr[c] is divisible by arr[a]. Here's the modification for his code:
for(c=n-1;(c>1)&& (max==-1);c--){ // loop through C
for(a=0;(a<c-1)&&(max==-1);a++){ // loop through A
if(arr[c]%arr[a]==0) // If arr[c] is divisible by arr[a]
{
if(binary_search(a+1, c-1, (arr[c]/arr[a]))) //#include<algorithm>
{
max = arr[c]; // if the other factor x of arr[c] is also in the array such that arr[c] = arr[a] * x
break;
}
}
}
}
I would have commented this on his solution, unfortunately I lack the reputation to do so.
Try this.
Written in c++
#include <vector>
#include <algorithm>
using namespace std;
int MaxElement(vector< int > Input)
{
sort(Input.begin(), Input.end());
int LargestElementOfInput = 0;
int i = 0;
while (i < Input.size() - 1)
{
if (LargestElementOfInput == Input[Input.size() - (i + 1)])
{
i++;
continue;
}
else
{
if (Input[i] != 0)
{
LargestElementOfInput = Input[Input.size() - (i + 1)];
int AllowedValue = LargestElementOfInput / Input[i];
int j = 0;
while (j < Input.size())
{
if (Input[j] > AllowedValue)
break;
else if (j == i)
{
j++;
continue;
}
else
{
int Product = Input[i] * Input[j++];
if (Product == LargestElementOfInput)
return Product;
}
}
}
i++;
}
}
return -1;
}
Once you have sorted the array, then you can use it to your advantage as below.
One improvement I can see - since you want to find the max element that meets the criteria,
Start from the right most element of the array. (8)
Divide that with the first element of the array. (8/2 = 4).
Now continue with the double pointer approach, till the element at second pointer is less than the value from the step 2 above or the match is found. (i.e., till second pointer value is < 4 or match is found).
If the match is found, then you got the max element.
Else, continue the loop with next highest element from the array. (6).
Efficient solution:
2 3 8 6
Sort the array
keep 3 pointers C, B and A.
Keeping C at the last and A at 0 index and B at 1st index.
traverse the array using pointers A and B till C and check if A*B=C exists or not.
If it exists then C is your answer.
Else, Move C a position back and traverse again keeping A at 0 and B at 1st index.
Keep repeating this till you get the sum or C reaches at 1st index.
Here's the complete solution:
int arr[] = new int[]{2, 3, 8, 6};
Arrays.sort(arr);
int n=arr.length;
int a,b,c,prod,max=-1;
for(c=n-1;(c>1)&& (max==-1);c--){ // loop through C
for(a=0;(a<c-1)&&(max==-1);a++){ // loop through A
for(b=a+1;b<c;b++){ // loop through B
prod=arr[a]*arr[b];
if(prod==arr[c]){
System.out.println("A: "+arr[a]+" B: "+arr[b]);
max=arr[c];
break;
}
if(prod>arr[c]){ // no need to go further
break;
}
}
}
}
System.out.println(max);
I came up with below solution where i am using one array list, and following one formula:
divisor(a or b) X quotient(b or a) = dividend(c)
Sort the array.
Put array into Collection Col.(ex. which has faster lookup, and maintains insertion order)
Have 2 pointer a,c.
keep c at last, and a at 0.
try to follow (divisor(a or b) X quotient(b or a) = dividend(c)).
Check if a is divisor of c, if yes then check for b in col.(a
If a is divisor and list has b, then c is the answer.
else increase a by 1, follow step 5, 6 till c-1.
if max not found then decrease c index, and follow the steps 4 and 5.
Check this C# solution:
-Loop through each element,
-loop and multiply each element with other elements,
-verify if the product exists in the array and is the max
private static int GetGreatest(int[] input)
{
int max = 0;
int p = 0; //product of pairs
//loop through the input array
for (int i = 0; i < input.Length; i++)
{
for (int j = i + 1; j < input.Length; j++)
{
p = input[i] * input[j];
if (p > max && Array.IndexOf(input, p) != -1)
{
max = p;
}
}
}
return max;
}
Time complexity O(n^2)