It seems that the Rcpp sugar function setdiff sorts its result, unlike the standard R function setdiff, which keeps the values in the order they appear in its first argument.
As an example, consider the following code:
src <-
"IntegerVector setdiff_Rcpp(IntegerVector x, IntegerVector y){
IntegerVector d = setdiff(x,y);
return(d);
}"
Rcpp::cppFunction(src)
setdiff(15:11, c(13,12))
# [1] 15 14 11
setdiff_Rcpp(15:11, c(13,12))
# [1] 11 14 15
Is it possible to obtain the same result as the standard R function?
[Edit]
I was able to solve my problem. Here is the Rcpp function I used:
// [[Rcpp::export]]
IntegerVector setdiff_R(IntegerVector x, IntegerVector y)
{
// difference of sets x & y (without reordering)
x = x[duplicated(x) == 0]; x = na_omit(x);
y = y[duplicated(y) == 0]; y = na_omit(y);
IntegerVector out(0, NA_INTEGER);
for(int i=0; i < x.length(); i++)
{
if(is_false(any(x[i] == y)))
out.push_back(x[i]);
}
return(out);
}
It only works for the IntegerVector type and is probably not optimised, but it gets the job done.
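For comparison, here is a minimal sketch of the same idea in plain C++, assuming std::vector<int> in place of IntegerVector and ignoring NA handling entirely. The name setdiff_ordered is hypothetical. It uses a hash set so that each membership test is constant time on average, instead of scanning y for every element of x:

```cpp
#include <unordered_set>
#include <vector>

// Order-preserving set difference: elements of x that are not in y,
// in first-occurrence order and without duplicates.
// Assumption: plain ints, no notion of NA.
std::vector<int> setdiff_ordered(const std::vector<int>& x,
                                 const std::vector<int>& y) {
    // Start with everything that must be excluded.
    std::unordered_set<int> seen(y.begin(), y.end());
    std::vector<int> out;
    for (int v : x) {
        // insert() reports whether v was new; a repeated or excluded
        // value is already in the set and is skipped.
        if (seen.insert(v).second) out.push_back(v);
    }
    return out;
}
```

The same set does double duty: it filters out the values of y and also removes duplicates from x, which is why no separate duplicated() step is needed.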
Let x be a vector and M a matrix.
In R, I can do
D <- diag(exp(x))
crossprod(M, D%M)
and in RcppArmadillo, I have the following which is much slower.
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat multiple_mnv(const arma::vec& x, const arma::mat& M) {
arma::colvec diagonal(x.size())
for (int i = 0; i < x.size(); i++)
{
diagonal(i) = exp(x[i]);
}
arma::mat D = diagmat(diagonal);
return M.t()*D*M;
}
Why is this so slow? How can I speed this up?
Welcome to Stack Overflow manju. For future questions, please be advised that a minimal reproducible example is expected, and in fact is in your best interest to provide; it helps others help you. Here's an example of how you could provide example data for others to work with:
## Set seed for reproducibility
set.seed(123)
## Generate data
x <- rnorm(10)
M <- matrix(rnorm(100), nrow = 10, ncol = 10)
## Output code for others to copy your objects
dput(x)
dput(M)
This is the data I will work with to show that your C++ code is in fact not slower than R. I used your C++ code (adding in a missing semicolon):
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat foo(const arma::vec& x, const arma::mat& M) {
arma::colvec diagonal(x.size());
for ( int i = 0; i < x.size(); i++ )
{
diagonal(i) = exp(x[i]);
}
arma::mat D = diagmat(diagonal);
return M.t() * D * M;
}
Note also that I had to make some of my own choices about the type of the return object and types of the function arguments (this is one of the places where a minimal reproducible example could help you: What if these choices affect my results?) I then create an R function to do what foo() does:
bar <- function(v, M) {
D <- diag(exp(v))
return(crossprod(M, D %*% M))
}
Note also that I had to fix a typo you had, changing D%M to D %*% M. Let's double check they give the same results:
all.equal(foo(x, M), bar(x, M))
# [1] TRUE
Now let's explore how fast they are:
library(microbenchmark)
bench <- microbenchmark(cpp = foo(x, M), R = bar(x, M), times = 1e5)
bench
# Unit: microseconds
# expr min lq mean median uq max
# cpp 22.185 23.015 27.00436 23.204 23.461 31143.30
# R 22.126 23.028 25.48256 23.216 23.475 29628.86
Those look pretty much the same to me! We can also look at a density plot of the times (throwing out the extreme value outliers to make things a little clearer):
cpp_times <- with(bench, time[expr == "cpp"])
R_times <- with(bench, time[expr == "R"])
cpp_time_dens <- density(cpp_times[cpp_times < quantile(cpp_times, 0.95)])
R_time_dens <- density(R_times[R_times < quantile(R_times, 0.95)])
plot(cpp_time_dens, col = "blue", xlab = "Time (in nanoseconds)", ylab = "",
main = "Comparing C++ and R execution time")
lines(R_time_dens, col = "red")
legend("topright", col = c("blue", "red"), bty = "n", lty = 1,
legend = c("C++ function (foo)", "R function (bar)"))
Why?
As helpfully pointed out by Dirk Eddelbuettel in the comments, in the end both R and Armadillo are going to call a LAPACK or BLAS routine anyway, so you shouldn't expect much difference unless you can give Armadillo a hint on how to be more efficient.
Can we make the Armadillo code faster?
Yes! As pointed out by mtall in the comments, we can give Armadillo the hint that we're dealing with a diagonal matrix. Let's try; we'll use the following code:
// [[Rcpp::export]]
arma::mat baz(const arma::vec& x, const arma::mat& M) {
return M.t() * diagmat(arma::exp(x)) * M;
}
And benchmark it:
all.equal(foo(x, M), baz(x, M))
# [1] TRUE
library(microbenchmark)
bench <- microbenchmark(cpp = foo(x, M), R = bar(x, M),
cpp2 = baz(x, M), times = 1e5)
bench
# Unit: microseconds
# expr min lq mean median uq max
# cpp 22.822 23.757 27.57015 24.118 24.632 26600.48
# R 22.855 23.771 26.44725 24.124 24.638 30619.09
# cpp2 20.035 21.218 25.49863 21.587 22.123 36745.72
We see a small but sure improvement; let's take a look graphically as we did before:
cpp_times <- with(bench, time[expr == "cpp"])
cpp2_times <- with(bench, time[expr == "cpp2"])
R_times <- with(bench, time[expr == "R"])
cpp_time_dens <- density(cpp_times[cpp_times < quantile(cpp_times, 0.95)])
cpp2_time_dens <- density(cpp2_times[cpp2_times < quantile(cpp2_times, 0.95)])
R_time_dens <- density(R_times[R_times < quantile(R_times, 0.95)])
xlims <- range(c(cpp_time_dens$x, cpp2_time_dens$x, R_time_dens$x))
ylims <- range(c(cpp_time_dens$y, cpp2_time_dens$y, R_time_dens$y))
ylims <- ylims * c(1, 1.15)
cols <- c("#e69f00", "#56b4e9", "#009e73")
labs <- c("C++ original", "C++ improved", "R")
plot(cpp_time_dens, col = cols[1], xlim = xlims, ylim = ylims,
xlab = "Time (in nanoseconds)", ylab = "",
main = "Comparing C++ and R execution time")
lines(cpp2_time_dens, col = cols[2])
lines(R_time_dens, col = cols[3])
legend("topleft", col = cols, bty = "n", lty = 1, legend = labs, horiz = TRUE)
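To illustrate what the diagonal hint can save in principle, here is a rough sketch in plain C++ that computes M^T diag(exp(x)) M without ever forming the dense diagonal matrix. The Matrix alias and the name crossprod_weighted are assumptions made for this example, and the naive triple loop is not expected to compete with a tuned BLAS; it only shows that the weights can be folded into the product directly:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major dense matrix (hypothetical helper type for this sketch).
using Matrix = std::vector<std::vector<double>>;

// Computes M^T * diag(exp(x)) * M without materialising the diagonal:
// out(j, k) = sum_i exp(x[i]) * M(i, j) * M(i, k).
// Assumption: x.size() equals the number of rows of M.
Matrix crossprod_weighted(const std::vector<double>& x, const Matrix& M) {
    const std::size_t n = M.size();
    const std::size_t p = n ? M[0].size() : 0;
    Matrix out(p, std::vector<double>(p, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        const double w = std::exp(x[i]);       // weight for row i
        for (std::size_t j = 0; j < p; ++j) {
            const double wm = w * M[i][j];     // scale once per (i, j)
            for (std::size_t k = 0; k < p; ++k) out[j][k] += wm * M[i][k];
        }
    }
    return out;
}
```

Forming D explicitly costs an extra n x n allocation and a full dense product D * M, most of which multiplies by zero; scaling each row of M by its weight avoids both.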
I'd like to use the fact that pmax(x, 0) = (x + abs(x)) / 2 on an integer vector using Rcpp for performance.
I've written a naive implementation:
IntegerVector do_pmax0_abs_int(IntegerVector x) {
R_xlen_t n = x.length();
IntegerVector out(clone(x));
for (R_xlen_t i = 0; i < n; ++i) {
int oi = out[i];
out[i] += abs(oi);
out[i] /= 2;
}
return out;
}
which is indeed performant; however, it invokes undefined behaviour should x contain any element larger than .Machine$integer.max / 2.
Is there a way to quickly determine whether every element of the vector is less than .Machine$integer.max / 2? I considered bit-shifting, but this would not be valid for negative numbers.
As mentioned in the comments, you can make use of int64_t for intermediate results. In addition, it makes sense not to copy x into out and not to initialize out to zero everywhere:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector do_pmax0_abs_int(IntegerVector x) {
R_xlen_t n = x.length();
IntegerVector out(clone(x));
for (R_xlen_t i = 0; i < n; ++i) {
int oi = out[i];
out[i] += abs(oi);
out[i] /= 2;
}
return out;
}
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
IntegerVector do_pmax0_abs_int64(IntegerVector x) {
R_xlen_t n = x.length();
IntegerVector out = no_init(n);
for (R_xlen_t i = 0; i < n; ++i) {
int64_t oi = x[i];
oi += std::abs(oi);
out[i] = static_cast<int>(oi / 2);
}
return out;
}
/***R
ints <- as.integer(sample.int(.Machine$integer.max, 1e6) - 2^30)
bench::mark(do_pmax0_abs_int(ints),
do_pmax0_abs_int64(ints),
pmax(ints, 0))[, 1:5]
ints <- 2L * ints
bench::mark(#do_pmax0_abs_int(ints),
do_pmax0_abs_int64(ints),
pmax(ints, 0))[, 1:5]
*/
Result:
> Rcpp::sourceCpp('57310889/code.cpp')
> ints <- as.integer(sample.int(.Machine$integer.max, 1e6) - 2^30)
> bench::mark(do_pmax0_abs_int(ints),
+ do_pmax0_abs_int64(ints),
+ pmax(ints, 0))[, 1:5]
# A tibble: 3 x 5
expression min median `itr/sec` mem_alloc
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt>
1 do_pmax0_abs_int(ints) 1.91ms 3.31ms 317. 3.82MB
2 do_pmax0_abs_int64(ints) 1.28ms 2.67ms 432. 3.82MB
3 pmax(ints, 0) 9.85ms 10.68ms 86.9 15.26MB
> ints <- 2L * ints
> bench::mark(#do_pmax0_abs_int(ints),
+ do_pmax0_abs_int64(ints),
+ pmax(ints, 0))[, 1:5]
# A tibble: 2 x 5
expression min median `itr/sec` mem_alloc
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt>
1 do_pmax0_abs_int64(ints) 1.28ms 2.52ms 439. 3.82MB
2 pmax(ints, 0) 9.88ms 10.83ms 89.5 15.26MB
Notes:
Without no_init the two C++ methods are equally fast.
I have removed the original method from the second benchmark, since bench::mark compares the results by default, and the original method produces wrong results for that particular input.
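If one preferred to keep the 32-bit arithmetic and guard it instead, the range check asked about in the question could be sketched as follows in plain C++. The function name int_identity_is_safe is hypothetical and std::vector<int> stands in for IntegerVector. Only large positive values overflow in x + abs(x), while INT_MIN (which R uses to encode NA_integer_) must be excluded because abs(INT_MIN) is itself undefined:

```cpp
#include <algorithm>
#include <climits>
#include <vector>

// True when (x + abs(x)) / 2 can be evaluated in plain int for every
// element without signed overflow.
bool int_identity_is_safe(const std::vector<int>& x) {
    return std::all_of(x.begin(), x.end(), [](int v) {
        // v > INT_MAX / 2 would overflow in v + abs(v);
        // v == INT_MIN would overflow inside abs(v) itself.
        return v <= INT_MAX / 2 && v != INT_MIN;
    });
}
```

std::all_of stops at the first offending element, but in the worst case this is still a full extra pass over the data, so widening to int64_t as shown above is likely the simpler choice.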
I have a List of NumericVector objects and I need a List of its unique elements. I tried the Rcpp::unique function. It works very well when applied to a NumericVector, but not to a List. This is the code and the error I got.
List h(List x){
return Rcpp::unique(x);
}
Error in dyn.load("/tmp/RtmpDdKvcH/sourceCpp-x86_64-pc-linux-gnu-1.0.0/sourcecpp_272635d5289/sourceCpp_10.so") :
unable to load shared object '/tmp/RtmpDdKvcH/sourceCpp-x86_64-pc-linux-gnu-1.0.0/sourcecpp_272635d5289/sourceCpp_10.so':
/tmp/RtmpDdKvcH/sourceCpp-x86_64-pc-linux-gnu-1.0.0/sourcecpp_272635d5289/sourceCpp_10.so: undefined symbol: _ZNK4Rcpp5sugar9IndexHashILi19EE8get_addrEP7SEXPREC
It is unclear what you are doing wrong, and it is an incomplete / irreproducible question.
But there is a unit test that does just what you do, and we can do it by hand too:
R> Rcpp::cppFunction("NumericVector uq(NumericVector x) { return Rcpp::unique(x); }")
R> uq(c(1.1, 2.2, 2.2, 3.3, 27))
[1] 27.0 1.1 3.3 2.2
R>
Even if there isn't a matching Rcpp sugar function, you can call R functions from within C++. Example:
#include <Rcpp.h>
using namespace Rcpp;
Rcpp::Environment base("package:base");
Function do_unique = base["unique"];
// [[Rcpp::export]]
List myfunc(List x) {
return do_unique(x);
}
Thank you for your interest in this issue.
As I noted, my List contains only NumericVector elements. I propose the code below, which works very well and is faster than the unique function in R on small inputs. However, its efficiency decreases when the list is large. Maybe this can help someone; someone may also be able to optimise it.
List uniqueList(List& x) {
int xsize = x.size();
List xunique(x);
int s = 1;
for(int i(1); i<xsize; ++i){
NumericVector xi = x[i];
int l = 0;
for(int j(0); j<s; ++j){
NumericVector xj = x[j];
int xisize = xi.size();
int xjsize = xj.size();
if(xisize != xjsize){
++l;
}
else{
if((sum(xi == xj) == xisize)){
goto notkeep;
}
else{
++l;
}
}
}
if(l == s){
xunique[s] = xi;
++s;
}
notkeep: 0;
}
return head(xunique, s);
}
/***R
x <- list(1,42, 1, 1:3, 42)
uniqueList(x)
[[1]]
[1] 1
[[2]]
[1] 42
[[3]]
[1] 1 2 3
microbenchmark::microbenchmark(uniqueList(x), unique(x))
Unit: microseconds
expr min lq mean median uq max neval
uniqueList(x) 2.382 2.633 3.05103 2.720 2.8995 29.307 100
unique(x) 2.864 3.110 3.50900 3.254 3.4145 24.039 100
However, the R function becomes faster when the List is large. I am sure that someone can optimise this code.
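One possible direction for the optimisation is to replace the pairwise comparison, which is quadratic in the number of list elements, with an ordered set of already-seen vectors. The following is a minimal sketch in plain C++, assuming std::vector<double> in place of NumericVector and assuming the data contain no NaN or NA values (NaN breaks the ordering that std::set relies on). The names Vec and unique_vectors are hypothetical:

```cpp
#include <set>
#include <vector>

using Vec = std::vector<double>;  // stand-in for NumericVector

// Keeps the first occurrence of each distinct vector, in input order.
// Assumption: no NaN values, so operator< is a valid strict weak ordering.
std::vector<Vec> unique_vectors(const std::vector<Vec>& xs) {
    std::set<Vec> seen;   // std::vector compares lexicographically
    std::vector<Vec> out;
    for (const Vec& v : xs) {
        // insert() returns false in .second when v was seen before.
        if (seen.insert(v).second) out.push_back(v);
    }
    return out;
}
```

Each lookup costs a logarithmic number of vector comparisons rather than one comparison per element kept so far, which should matter most for long lists with many distinct entries.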
Is there a base function in Rcpp that:
Fills the output entirely with a single value if the input vector has size 1.
Copies the input vector into the output if both vectors have the same length.
Fills the output with NA if the vectors differ in length and the input vector is not of size 1.
I've written the above criteria as a function below using a NumericVector as an example. If there isn't a base function in Rcpp that performs said operations there should be a way to template the function so that given any type of vector (e.g. numeric, character and so on) the above logic would be able to be executed.
// [[Rcpp::export]]
NumericVector cppvectorize(NumericVector x,NumericVector y) {
NumericVector y_out(y.size());
if(x.size() == 1) {
for(int i = 0; i < y_out.size(); i++) {
y_out[i] = x[0];
}
} else if(x.size() == y_out.size()) {
for(int i = 0; i < y_out.size(); i++) {
y_out[i] = x[i];
}
} else {
for(int i = 0; i < y_out.size(); i++) {
y_out[i] = NA_REAL;
}
}
return y_out;
}
Unfortunately, the closest you will come to such a function is one of the rep variants that Rcpp supports. However, none of the variants match the desired output. Therefore, the only real option is to implement a templated version of your desired function.
To create the templated function, we first create a routing function that handles the dispatch of SEXP objects. The rationale behind the routing function is that SEXP objects can be received from and returned to R using Rcpp Attributes, whereas a templated function cannot be exported directly. As a result, we need to specify the SEXPTYPE (used as RTYPE) dispatches that are possible. The TYPEOF() macro retrieves the type code, and a switch statement dispatches on that code to the appropriate case.
After dispatching, we arrive at the templated function. The templated function makes use of the base Vector class of Rcpp to simplify the data flow. From here, the notable novelty will be the use of ::traits::get_na<RTYPE>() to dynamically retrieve the appropriate NA value and fill it.
With the plan in place, let's look at the code:
#include <Rcpp.h>
using namespace Rcpp;
// ---- Templated Function
template <int RTYPE>
Vector<RTYPE> vec_helper(const Vector<RTYPE>& x, const Vector<RTYPE>& y) {
Vector<RTYPE> y_out(y.size());
if(x.size() == 1){
y_out.fill(x[0]);
} else if (x.size() == y.size()) {
y_out = x;
} else {
y_out.fill(::traits::get_na<RTYPE>());
}
return y_out;
}
// ---- Dispatch function
// [[Rcpp::export]]
SEXP cppvectorize(SEXP x, SEXP y) {
switch (TYPEOF(x)) {
case INTSXP: return vec_helper<INTSXP>(x, y);
case REALSXP: return vec_helper<REALSXP>(x, y);
case STRSXP: return vec_helper<STRSXP>(x, y);
default: Rcpp::stop("SEXP Type Not Supported.");
}
// Need to return a value even though this will never be triggered
// to quiet the compiler.
return R_NilValue;
}
Sample Tests
Here we conduct a few sample tests on each of the supported data types.
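# Case 1: length(x) != length(y) and length(x) > 1 (output filled with NA)
x = 1:5
y = 2
cppvectorize(x, y)
## [1] NA
# Case 2: length(x) == length(y) (x copied through)
x = letters[1:5]
y = letters[6:10]
cppvectorize(x, y)
## [1] "a" "b" "c" "d" "e"
# Case 3: length(x) == 1 (single value recycled)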
x = 1.5
y = 2.5:6.5
cppvectorize(x, y)
## [1] 1.5 1.5 1.5 1.5 1.5
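Stripped of the Rcpp machinery, the three-way rule that vec_helper applies can be sketched as a plain C++ template. This is only an analogue under stated assumptions: standard containers have no built-in NA, so the fill value is passed in explicitly, and the name recycle_or_na and its signature are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Plain C++ analogue of the recycling rule:
//  - x has one element   -> repeat it n times
//  - x has n elements    -> return x unchanged
//  - anything else       -> n copies of the caller-supplied NA value
template <typename T>
std::vector<T> recycle_or_na(const std::vector<T>& x, std::size_t n,
                             const T& na) {
    if (x.size() == 1) return std::vector<T>(n, x[0]);
    if (x.size() == n) return x;
    return std::vector<T>(n, na);
}
```

In the Rcpp version the template parameter is the SEXP type code rather than the element type, which is what lets traits::get_na<RTYPE>() supply the matching NA value instead of an extra argument.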
I have an array (object[,,]); let us assume for argument's sake that the array is 1000 x 1000 x 1000 and represents a matrix of points in x, y, z space.
At a given position, for example x = 50, y = 10, z = 199, I want to extract another object[,,] containing a smaller cube, say a sub-array of 100 cubed (or whatever is available, with nulls where empty?), from the parent array, using the reference point as the centre point. Is this possible? I was hoping I could do it in LINQ but got hopelessly lost. How would you, or should I, tackle this? One thought was to do the following:
1. Create a new 3D array sized to the number of items I want to retrieve (x, y, z).
2. Iterate over each axis (x, y, z).
3. Copy the value from the source array to the target array (offsetX+x, offsetY+y, offsetZ+z).
4. Return the new array.
However, if this is called a lot, I can see it becoming quite a bottleneck. Any ideas?
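The four steps above can be sketched as follows, shown here in C++ rather than C#; the loop structure should carry over directly to a T[,,] array. Grid3 and extract_cube are hypothetical names introduced for this example, and cells that fall outside the parent are left at the default value T{}:

```cpp
#include <cstddef>
#include <vector>

// Minimal dense 3-D grid stored in one flat buffer (hypothetical helper).
template <typename T>
struct Grid3 {
    int nx, ny, nz;
    std::vector<T> data;
    Grid3(int nx_, int ny_, int nz_)
        : nx(nx_), ny(ny_), nz(nz_),
          data(static_cast<std::size_t>(nx_) * ny_ * nz_, T{}) {}
    std::size_t index(int x, int y, int z) const {
        return (static_cast<std::size_t>(x) * ny + y) * nz + z;
    }
    T& at(int x, int y, int z) { return data[index(x, y, z)]; }
    const T& at(int x, int y, int z) const { return data[index(x, y, z)]; }
    bool contains(int x, int y, int z) const {
        return x >= 0 && y >= 0 && z >= 0 && x < nx && y < ny && z < nz;
    }
};

// Copies the cube of side 2*r+1 centred on (cx, cy, cz).
// Cells that map outside src keep the default value T{}.
template <typename T>
Grid3<T> extract_cube(const Grid3<T>& src, int cx, int cy, int cz, int r) {
    const int side = 2 * r + 1;
    Grid3<T> out(side, side, side);                 // step 1: allocate target
    for (int x = 0; x < side; ++x)                  // step 2: iterate each axis
        for (int y = 0; y < side; ++y)
            for (int z = 0; z < side; ++z) {
                const int sx = cx - r + x, sy = cy - r + y, sz = cz - r + z;
                if (src.contains(sx, sy, sz))       // step 3: copy with offset
                    out.at(x, y, z) = src.at(sx, sy, sz);
            }
    return out;                                     // step 4: return the copy
}
```

The cost is proportional to the volume of the extracted cube on every call, which is the bottleneck the question anticipates when the extraction is repeated often.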
Depending on your usage of the smaller array, this may or may not suit your needs.
To represent a subsection (chunk) of the array without creating a new array or copying anything, you could write your own class that serves as a view onto that chunk of the array.
Note that this example has the following properties:
No guard clauses in constructor
Chunk is always cube (x, y, z length are equal)
Chunk length is always odd (since we are expanding out from point of reference)
public class ArrayChunk<T>
{
// Array this chunk is from.
private readonly T[,,] _parentArray;
// Point of reference.
private readonly int _x, _y, _z;
// How many elements to move outwards in each direction from point of reference.
private readonly int _numToExpand;
public ArrayChunk(T[,,] parentArray, int x, int y, int z, int numToExpand)
{
_parentArray = parentArray;
_x = x;
_y = y;
_z = z;
_numToExpand = numToExpand;
}
public int Length => _numToExpand*2 + 1;
public T this[int x, int y, int z]
{
get
{
// Make sure index is within chunk range.
EnsureInChunkRange(x, y, z);
// Map chunk index to parent array index.
int parentX = MapToParent(_x, x),
parentY = MapToParent(_y, y),
parentZ = MapToParent(_z, z);
// If parent array index is in parent array range, return element from parent array.
if (IsInRangeOfParent(parentX, parentY, parentZ))
return _parentArray[parentX, parentY, parentZ];
// Otherwise return default element for type T.
return default(T);
}
set
{
EnsureInChunkRange(x, y, z);
int parentX = MapToParent(_x, x),
parentY = MapToParent(_y, y),
parentZ = MapToParent(_z, z);
if (IsInRangeOfParent(parentX, parentY, parentZ))
_parentArray[parentX, parentY, parentZ] = value;
else
throw new InvalidOperationException();
}
}
private void EnsureInChunkRange(int x, int y, int z)
{
if (x < 0 || y < 0 || z < 0 ||
x >= Length || y >= Length || z >= Length)
{
throw new IndexOutOfRangeException();
}
}
private int MapToParent(int referenceIndex, int index)
{
return referenceIndex - _numToExpand + index;
}
private bool IsInRangeOfParent(int parentX, int parentY, int parentZ)
{
return
parentX >= 0 &&
parentY >= 0 &&
parentZ >= 0 &&
parentX < _parentArray.GetLength(0) &&
parentY < _parentArray.GetLength(1) &&
parentZ < _parentArray.GetLength(2);
}
}
To easily get a chunk from an array, you could declare an extension method:
public static class ArrayChunkExtensions
{
public static ArrayChunk<T> GetChunk<T>(this T[,,] array, int x, int y, int z, int numToExpand)
{
return new ArrayChunk<T>(array, x, y, z, numToExpand);
}
}
Here's a sample usage:
Action<int, Action<int, int, int>> iterate = (length, action) =>
{
for (int x = 0; x < length; x++)
for (int y = 0; y < length; y++)
for (int z = 0; z < length; z++)
action(x, y, z);
};
// Create 5x5x5 parent array.
const int size = 5;
var array = new string[size, size, size];
iterate(size, (x, y, z) => array[x, y, z] = $"x:{x} y:{y} z:{z}");
// Take 3x3x3 chunk from parent array center.
const int indexOfReference = 2;
const int numToExpand = 1;
ArrayChunk<string> chunk = array.GetChunk(indexOfReference, indexOfReference, indexOfReference, numToExpand);
iterate(chunk.Length, (x, y, z) => Console.WriteLine(chunk[x, y, z]));