Problem when using gsl_ran_multinomial in Rcpp - rcpp

I'm trying to generate multinomial random variables as fast as possible, and I learned that gsl_ran_multinomial could be a good choice. However, when I tried to use it based on the answer in this post: https://stackoverflow.com/a/23100665/21039115, the results were always wrong.
In detail, my code is
// [[Rcpp::export]]
arma::ivec rmvn_gsl(int K, arma::vec prob) {
    gsl_rng *s = gsl_rng_alloc(gsl_rng_mt19937); // Create RNG seed
    arma::ivec temp(K);
    gsl_ran_multinomial(s, K, 1, prob.begin(), (unsigned int *) temp.begin());
    gsl_rng_free(s); // Free memory
    return temp;
}
And the result was something like
rmvn_gsl(3, c(0.2, 0.7, 0.1))
     [,1]
[1,]    1
[2,]    0
[3,]    0
which is ridiculous.
I was wondering whether there is a problem in the code. I couldn't find any other examples to compare against. I appreciate any help!
UPDATE:
I found that the primary problem is that I didn't seed the generator: GSL uses its own default seed unless told otherwise (FYI: https://stackoverflow.com/a/32939816/21039115), so every call started from the same state. Once I seeded the generator from the current time, the code worked. I will go with rmultinom anyway, since it can even be faster than gsl_ran_multinomial based on microbenchmark.
Anyway, @Dirk Eddelbuettel provided a great example of implementing gsl_ran_multinomial below. Just pay attention to the random seed issue if you run into the same problem I did.
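The seed issue can be reproduced without GSL. The sketch below is a stand-in, not the GSL API: it assumes the C++ standard library's std::mt19937 and std::discrete_distribution in place of gsl_rng and gsl_ran_multinomial, and draw_multinomial and same_seed_repeats are hypothetical helper names. It illustrates that two generators built from the same seed replay the same stream, which is what happens when a fresh, default-seeded generator is allocated on every call. Note also that the call in the question passes 1 as the number of trials, so each result is necessarily a one-hot vector.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper (not part of GSL): draw one multinomial sample of
// n trials by tallying n independent categorical draws.
std::vector<unsigned int> draw_multinomial(std::mt19937 &rng, unsigned int n,
                                           const std::vector<double> &prob) {
    assert(!prob.empty());
    std::discrete_distribution<int> pick(prob.begin(), prob.end());
    std::vector<unsigned int> counts(prob.size(), 0u);
    for (unsigned int i = 0; i < n; ++i) {
        ++counts[pick(rng)];
    }
    return counts;
}

// Two generators built from the same seed produce the same sample, so a
// function that re-creates its generator on every call keeps returning
// the same result.
bool same_seed_repeats(unsigned int seed) {
    const std::vector<double> prob = {0.2, 0.7, 0.1};
    std::mt19937 first(seed);
    std::mt19937 second(seed);
    return draw_multinomial(first, 400, prob) == draw_multinomial(second, 400, prob);
}
```

In practice one would probably create the generator once, seed it (for example from std::random_device or a clock), and reuse it across calls rather than rebuilding it each time.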

Here is a complete example taking a double vector of probabilities and returning an unsigned integer vector (at the compiled level) that is mapped to an integer vector by the time we are back in R:
Code
#include <RcppGSL.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
// [[Rcpp::depends(RcppGSL)]]
// [[Rcpp::export]]
std::vector<unsigned int> foo(int n, const std::vector <double> p) {
    int k = p.size();
    std::vector<unsigned int> nv(k);
    gsl_rng_env_setup();
    gsl_rng *s = gsl_rng_alloc(gsl_rng_mt19937); // Create RNG instance
    gsl_ran_multinomial(s, k, n, &(p[0]), &(nv[0]));
    gsl_rng_free(s);
    return nv;
}
/*** R
foo(400, c(0.1, 0.2, 0.3, 0.4))
*/
Output
> Rcpp::sourceCpp("~/git/stackoverflow/75165241/answer.cpp")
> foo(400, c(0.1, 0.2, 0.3, 0.4))
[1] 37 80 138 145
>

Related

GMP setting last digit to zero

I’m looking for the fastest way to set the last digit of a positive number l, declared as mpz_t, to zero. I didn’t find a function that does this already. For example, 6531489321483 should be changed to 6531489321480.
Update
It appears that subtraction and modulo is the superior method for zeroing out the last digit with mpz_t types. Just as @MarkDickinson and @MarcGlisse pointed out, the asymptotic behavior greatly favors using mpz_tdiv_r_ui (or mpz_fdiv_r_ui) over mpz_tdiv_q_ui followed by mpz_mul_ui. My original benchmarks were on relatively small numbers (25 digits). I retested on a 175-digit number, and the sub_mod method was nearly 40% faster.
Test value: 1234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789
Result with div_mul: 1234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456780
Result with sub_mod: 1234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456780
time with division followed by multiplication: 6.145656
time with subtraction and modulo: 4.413998
And with a 350 digit number we see that sub_mod is around 85% faster:
Test value: 12345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789
Result with div_mul: 12345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456780
Result with sub_mod: 12345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456780
time with division followed by multiplication: 10.256122
time with subtraction and modulo: 5.522990
It should be noted that whether we use mpz_tdiv_r_ui or mpz_fdiv_r_ui, the results were almost identical.
Since the sub_mod method was only marginally slower with smaller numbers, it seems reasonable to use this method in all cases.
It would be nice to test this on different compilers. I'm currently using clang 5.0.1.
Original
Benchmarks on my machine show that division followed by multiplication is faster than finding the remainder via modulo operator and subtracting.
#include <stdio.h>
#include <time.h>
#include <gmp.h>
void div_mul(mpz_t x) {
    mpz_tdiv_q_ui(x, x, 10u);
    mpz_mul_ui(x, x, 10u);
}
// Maybe this could be simpler?
void sub_mod(mpz_t x, mpz_t y) {
    // N.B. mpz_mod_ui is equivalent to mpz_fdiv_r_ui. Changed to
    // mpz_tdiv_r_ui for consistency with div_mul.
    mpz_tdiv_r_ui(y, x, 10u);
    mpz_sub(x, x, y);
}
Main:
int main() {
    mpz_t testVal;
    mpz_init(testVal);
    mpz_set_str(testVal, "1234567898765432123456789", 10);
    gmp_printf("Test value: %Zd\n", testVal);
    mpz_t x;
    mpz_t y;
    mpz_init(x);
    mpz_init(y);
    mpz_set(x, testVal);
    div_mul(x);
    gmp_printf("Result with div_mul: %Zd\n", x);
    mpz_set(x, testVal);
    sub_mod(x, y);
    gmp_printf("Result with sub_mod: %Zd\n", x);
    const int limit = 100000000;
    const double checkPoint0 = (double) clock() / CLOCKS_PER_SEC;
    for (int i = 0; i < limit; ++i) {
        mpz_set(x, testVal);
        div_mul(x);
    }
    const double checkPoint1 = (double) clock() / CLOCKS_PER_SEC;
    const double time_div_mul = checkPoint1 - checkPoint0;
    printf("time with division followed by multiplication: %f\n", time_div_mul);
    const double checkPoint2 = (double) clock() / CLOCKS_PER_SEC;
    for (int i = 0; i < limit; ++i) {
        mpz_set(x, testVal);
        sub_mod(x, y);
    }
    const double checkPoint3 = (double) clock() / CLOCKS_PER_SEC;
    const double time_sub_mod = checkPoint3 - checkPoint2;
    printf("time with subtraction and modulo: %f\n", time_sub_mod);
    mpz_clear(testVal);
    mpz_clear(x);
    mpz_clear(y);
    return 0;
}
Output:
Test value: 1234567898765432123456789
Result with div_mul: 1234567898765432123456780
Result with sub_mod: 1234567898765432123456780
time with division followed by multiplication: 2.941251
time with subtraction and modulo: 3.171949
I suspect that one reason the latter method is slightly slower is that it needs two variables, because compound expressions on a single line are not available in the C API. If we could use gmpxx, we could write x - x % 10.
Another thought as to why the first method is faster is that div_mul involves two operations with unsigned integers, while sub_mod involves an operation with an unsigned integer followed by an operation with an mpz_t.
I tried to reproduce this on ideone.com but could not get gmp.h loaded, so I opted to implement a similar benchmark with type long long int just for fun. You will note the presence of volatile and that the limit is one billion instead of the one hundred million seen above. The volatile was needed to keep the for loop from being optimized away.
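The built-in-integer benchmark mentioned above is not reproduced in this text. As a minimal sketch of the two arithmetic routes on a built-in type (not the author's benchmark, and with hypothetical function names chosen to mirror div_mul and sub_mod), the following checks that both approaches give the same result:

```cpp
#include <cassert>

// Counterpart of div_mul: integer division truncates, which drops the
// last decimal digit; multiplying by 10 puts a zero in its place.
unsigned long long zero_last_digit_div_mul(unsigned long long x) {
    return (x / 10ULL) * 10ULL;
}

// Counterpart of sub_mod: subtract the remainder modulo 10.
unsigned long long zero_last_digit_sub_mod(unsigned long long x) {
    return x - x % 10ULL;
}

// Cross-check wrapper: both routes must agree for any non-negative input.
unsigned long long zero_last_digit(unsigned long long x) {
    const unsigned long long result = zero_last_digit_sub_mod(x);
    assert(result == zero_last_digit_div_mul(x));
    return result;
}
```

For instance, zero_last_digit(6531489321483ULL) yields 6531489321480, the value asked for in the question. Timing conclusions drawn from built-in integers should not be assumed to carry over to mpz_t, where the cost depends on the number of limbs.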
Wouldn't converting the number to a string and changing the last character be the fastest way?

Rcpp subsetting contiguous StringVector

Good afternoon,
I have been trying to use a similar method to subsetting x[200:300] in R while using Rcpp. (Note, this is not the problem I am trying to solve, but I need to subset many ranges within the functions I am trying to write in C++, and I found that this was the bottleneck of my performance)
However, although I have tried using the methods in Rcpp, iterators, and other approaches, I just can't seem to find a solution that is even minimally "fast." Most of the solutions I find are very slow.
Looking through the Rcpp reference, I can't find anything, nor can I find anything on StackExchange.
I know this code is pretty ugly right now... But I am just clueless
// [[Rcpp::export]]
StringVector range_test_( StringVector& x, int i, int j){
    StringVector vect(x.begin()+i, x.begin()+j);
    return vect;
}
With this, it is about 800 times slower. I have been trying to find, within Rcpp, an equivalent of R's x[i:j], which is very fast, but I can't find it.
tests_range <- rbenchmark::benchmark(
  x[200:3000],
  range_test_(x, 200, 3000),
  order = NULL,
  replications = 80
)[,1:4]
Gives as result
                       test replications elapsed relative
1               x[200:3000]           80   0.001        1
3 range_test_(x, 200, 3000)           80   0.822      822
If anybody knows how to access the subsetting function x[i:j] or something as fast within Rcpp I would really appreciate it. I just can't seem to find the tool I am missing.
The issue is that the iterator constructor makes a copy. The documentation describes that constructor as follows:
"Copy the data between iterators first and last to the created vector"
However, you can try this instead
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::StringVector in_range(Rcpp::StringVector &x, int i, int j) {
    return x[Rcpp::Range(i - 1, j - 1)]; // zero indexed
}
The time taken is a lot closer
> set.seed(20597458)
> x <- replicate(1e3, paste0(sample(LETTERS, 5), collapse = ""))
> head(x)
[1] "NHVFQ" "XMEOF" "DABUT" "XKTAZ" "NQXZL" "NPJLM"
>
> stopifnot(all.equal(in_range(x, 100, 200), x[100:200]))
>
> library(microbenchmark)
> microbenchmark(in_range(x, 100, 200), x[100:200], times = 1e4)
Unit: nanoseconds
                  expr  min   lq     mean median   uq     max neval
 in_range(x, 100, 200) 1185 1580 3669.780   1581 1976 3263205 10000
            x[100:200]  790  790 1658.571   1185 1186 2331256 10000
Note that there is a page here on subsetting. I could not find a relevant example there, though.

Generating a random sample of integers with unequal probability in Rcpp

I am trying to use the Rcpp sample function, but I want to sample without replacement and with unequal probability. However I am struggling to understand what form the argument sugar::probs_t probs should take, because I cannot find the definition.
sugar::probs_t is a typedef of Nullable< Vector<REALSXP> > (see the inst/include/Rcpp/sugar/functions/sample.h code file). So, if you pass it a Rcpp::NumericVector, then everything should be fine.
As you didn't provide any example code, let's look at an implementation in the unit test file for sugar:
// [[Rcpp::export]]
NumericVector sample_dbl(NumericVector x, int sz, bool rep = false, sugar::probs_t p = R_NilValue)
{
    return sample(x, sz, rep, p);
}
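To make "without replacement and with unequal probability" concrete outside of Rcpp, here is a minimal standard-library sketch. It is not how Rcpp's sample is implemented; it assumes one common scheme (draw one index according to the current weights, remove it, repeat), and weighted_sample_no_replace is a hypothetical name. All weights are assumed to be strictly positive.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: pick `size` distinct zero-based indices, where the
// chance of each remaining index is proportional to its weight.
std::vector<int> weighted_sample_no_replace(std::mt19937 &rng,
                                            std::vector<double> weights,
                                            std::size_t size) {
    assert(size <= weights.size());
    for (double w : weights) {
        assert(w > 0.0);  // assumption of this sketch: no zero weights
    }
    std::vector<int> candidates(weights.size());
    std::iota(candidates.begin(), candidates.end(), 0);  // 0, 1, 2, ...
    std::vector<int> picked;
    picked.reserve(size);
    for (std::size_t k = 0; k < size; ++k) {
        // Rebuild the distribution over the candidates that are left.
        std::discrete_distribution<int> pick(weights.begin(), weights.end());
        const int j = pick(rng);
        picked.push_back(candidates[j]);
        // Remove the chosen candidate so it cannot be drawn again.
        candidates.erase(candidates.begin() + j);
        weights.erase(weights.begin() + j);
    }
    return picked;
}
```

With Rcpp itself, the equivalent call is simply sample(x, sz, false, p) with p given as a NumericVector, as described above.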

Writing a C program to solve an equation numerically

I am trying to solve x-cos(x)=0 numerically.
I need the program to accept one argument on the command line that becomes the desired accuracy of the solution.
The solution should make x-cos(x) evaluate to within +/- the specified accuracy (epsilon) of 0.
The maximum number of iterations should be set to 100.
The program should start with a first guess value of x=0.
The desired accuracy should accept both floating and scientific notation formats.
There should be a warning message if too few or too many arguments are supplied, and the program should then exit.
If a solution is found within the max iterations, it should print the solution, accuracy and number of iterations.
If no solution is achieved within the max iterations, the program should print a message saying so and then exit.
Find the smallest accuracy that can be achieved in max iterations in powers of 10.
I know that there are loops involved. I've started it like this:
#include<stdio.h>
#include<math.h>
int
main(void)
{
int MAX_ITERATIONS[100],x=0;
float epsilon;
double epsilon;
x=cos[x];
for (x=0; x<MAX_ITERATIONS; ++x)
if (MAX_ITERATIONS < x)
x=MAX_ITERATIONS[100];
}
I am not sure where to go from here or if I am even on the right track.
Here is some code to help you get started. My philosophy is to make only small changes and always keep a copy of what was working before. That way, when I break something, I know exactly where I broke it. This code does not do everything you want, but you can make those changes yourself. To compile the code, I used cc -lm progname.c. To execute it, I used ./a.out 0.002.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main(int argc, char *argv[]) {
    double delta, x;
    double epsilon=0.001;
    double previous = 1.0;
    if (argc > 1)
        epsilon = strtod(argv[1], NULL); /* strtod parses to double; strtof would round to float first */
    printf("Using epsilon = %12.8f\n", epsilon);
    /* Scan x upward in steps of epsilon until |x - cos(x)| stops shrinking. */
    for (x=0.1; x<1.0; x+=epsilon) {
        delta = fabs(x-cos(x));
        if (delta < previous)
            previous = delta;
        else
            break;
    }
    printf("%12.8f %12.8f %12.8f\n", x, cos(x), delta);
}
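The requirements in the question (first guess x = 0, at most 100 iterations, stop once |x - cos(x)| is below epsilon) can also be sketched with a fixed-point iteration that repeatedly replaces x with cos(x). This is a different technique from the linear scan in the answer above, and SolveResult and solve_x_minus_cos are hypothetical names introduced for illustration. Command-line parsing and the argument-count warning are left out.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type for this sketch.
struct SolveResult {
    bool converged;   // true if |x - cos(x)| < epsilon was reached
    double x;         // last iterate
    double residual;  // x - cos(x) at the last iterate
    int iterations;   // number of updates performed
};

// Fixed-point iteration x <- cos(x), starting from x = 0.
SolveResult solve_x_minus_cos(double epsilon, int max_iterations = 100) {
    assert(epsilon > 0.0);
    double x = 0.0;  // first guess required by the task
    double residual = x - std::cos(x);
    for (int i = 1; i <= max_iterations; ++i) {
        x = std::cos(x);  // fixed-point update
        residual = x - std::cos(x);
        if (std::fabs(residual) < epsilon) {
            return {true, x, residual, i};
        }
    }
    return {false, x, residual, max_iterations};
}
```

A caller would read epsilon from argv[1], call solve_x_minus_cos(epsilon), and print either the solution, accuracy and iteration count, or a message that no solution was found within the iteration limit. Trying successively smaller powers of ten for epsilon is one way to approach the last requirement.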

Convert a text to lowercase but keep uppercase for first letter in word (with R, if possible in tm package)

Is there an R function that converts text to lowercase except for the first letter of each word, i.e. one that changes
"You live NEAR Chicago"
to
"You live Near Chicago"
The point is to benefit from a quite efficient implementation, if possible.
Could this be integrated to the tm R package (or is already available there), so that it could be applied to a corpus directly?
(The goal is to build a simple location detector for text, cross-referencing with geonames.)
If you're handling the bit where the word(s) (like "near") are next to the geographic location(s), then there are existing code snippets for something like a ucfirst bit of functionality. However, you mentioned speed, so here's a comparison between an Rcpp implementation and a basic/straight R implementation (both are vectorized):
library(Rcpp)
library(microbenchmark)
# pure Rcpp/C++ implementation
sourceCpp("
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
std::vector< std::string > ucfirst( std::vector< std::string > strings ) {
    int len = strings.size();
    for( int i=0; i < len; i++ ) {
        std::transform(strings[i].begin(), strings[i].end(), strings[i].begin(), ::tolower);
        strings[i][0] = toupper( strings[i][0] );
    }
    return strings;
}")
r_ucfirst <- function (str) {
  paste(toupper(substring(str, 1, 1)), tolower(substring(str, 2)), sep = "")
}
print(ucfirst("hello"))
## [1] "Hello"
print(r_ucfirst("hello"))
## [1] "Hello"
mb <- microbenchmark(ucfirst("hello"), r_ucfirst("hello"), times=1000)
print(mb)
## Unit: microseconds
##                expr   min    lq median     uq    max neval
##    ucfirst("hello") 1.925 2.123 2.2765 2.4025 20.844  1000
##  r_ucfirst("hello") 6.199 7.059 7.5285 7.9555 41.473  1000
Both should be cross-platform. You can get even faster in C++ with some C hacks, but a median of roughly 2.3 μs per call over 1,000 timed runs isn't exactly bad (neither is 7.5 μs for the pure-R version :-)
Having said that, you could try implementing the "pure R" version with the stringi package, which uses Rcpp/C++/C-backed functions.
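Note that both functions above treat each input string as a single word. For the sentence in the question, the rule "lowercase everything except the first letter of each word" can be sketched in plain standard C++ as follows. The name lower_keep_initial is hypothetical, words are assumed to be separated by whitespace, and only single-byte characters are handled (no UTF-8 awareness).

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: leave the first character of every
// whitespace-separated word untouched and lowercase the rest.
std::string lower_keep_initial(const std::string &text) {
    std::string out = text;
    bool at_word_start = true;
    for (char &c : out) {
        const unsigned char uc = static_cast<unsigned char>(c);
        if (std::isspace(uc)) {
            at_word_start = true;  // the next non-space character starts a word
            continue;
        }
        if (!at_word_start) {
            c = static_cast<char>(std::tolower(uc));
        }
        at_word_start = false;
    }
    assert(out.size() == text.size());  // the transformation preserves length
    return out;
}
```

Applied to "You live NEAR Chicago", this returns "You live Near Chicago". The same loop could presumably be dropped into an Rcpp function like ucfirst above to make it vectorized.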
