How to calculate an integral numerically in Rcpp

I've searched for an hour for methods that do numerical integration. I'm new to Rcpp and am rewriting my old programs now. What I had in R was:
x = smpl.x(n, theta.true)
joint = function(theta) {  # the joint density of all the random variables
  d = c()
  for (i in 1:n) {
    d[i] = den(x[i], theta)
  }
  return(prod(d) * dbeta(theta, a, b))
}
joint.vec = Vectorize(joint)  # vectorize the function, as required by integrate()
margin = integrate(joint.vec, 0, 1)$value        # the normalizing constant in the denominator
area = integrate(joint.vec, 0, theta.true)$value # the value in the numerator
The integrate() function in R will be slow, and since I am doing the integration for the posterior distribution of a sample of size n, the value of the integrand will be huge, with a large error.
I am trying to rewrite my code with the help of Rcpp, but I don't know how to deal with the integration. Should I include a C++ header file? Or any suggestions?

You can code your function in C++ and call it, for instance, via the sourceCpp function, and then integrate it in R. Alternatively, you can call R's integrate function from within your C++ code by using the Function class of Rcpp. See Dirk's book (Seamless R and C++ Integration with Rcpp), page 56, for an example of how to call R functions from C++. Another alternative (which I believe is the best for most cases) is to integrate your function, written in C++, directly in C++ using the RcppGSL package.
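For the second route, here is a minimal sketch (mine, not from the book) of calling R's integrate() through Rcpp::Function; the wrapper name integrate_from_cpp is illustrative:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double integrate_from_cpp(Function f, double lower, double upper) {
  // Look up R's own integrate() and call it on the R function f; the
  // result is an R list whose "value" component holds the estimate.
  Function integrate("integrate");
  List res = integrate(f, Named("lower") = lower, Named("upper") = upper);
  return as<double>(res["value"]);
}

From R you would then call integrate_from_cpp(joint.vec, 0, 1). Keep in mind this still pays the price of calling back into R for every evaluation; the RcppGSL route avoids that.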
As for the huge normalizing constant, it is sometimes better to rescale the function at its mode before integrating it (you can find modes with, e.g., nlminb, optim, etc.). Then you integrate the rescaled function, and to recover the original normalizing constant you multiply the resulting constant by the rescaling factor. Hope this may help!
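In symbols (my own gloss on that rescaling trick): if theta_hat is the mode and m = f(theta_hat), then

integral of f(theta) dtheta = m * integral of f(theta)/m dtheta

so you integrate f/m, whose maximum is 1 and which stays in a comfortable floating-point range, and multiply the result by m at the end.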

After reading @utobi's advice, I felt that programming it on my own might be easier. I simply use Simpson's rule to approximate the integral:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double den_cpp(double x, double theta) {
  return 2*x/theta * (x <= theta) + 2*(1-x)/(1-theta) * (theta < x);
}

// [[Rcpp::export]]
double joint_cpp(double theta, int n, NumericVector x, double a, double b) {
  double val = 1.0;
  for (int i = 0; i < n; i++) {
    val *= den_cpp(x[i], theta);
  }
  val *= R::dbeta(theta, a, b, 0);
  return val;
}
// [[Rcpp::export]]
List Cov_rate_raw(double theta_true, int n, double a, double b, NumericVector x) {
  // This function is used for testing; it is not used in the final one
  int steps = 1000;
  double s = 0;
  double start = 1.0e-4;
  Rcout << start << " ";
  double end = 1 - start;
  Rcout << end << " ";
  double h = (end - start) / steps;
  Rcout << "1st h = " << h << " ";
  double area = 0;
  double margin = 0;
  // composite Simpson's rule over [start, end]: the normalizing constant
  for (int i = 0; i < steps; i++) {
    double at_x = start + h * i;
    double f_val = (joint_cpp(at_x, n, x, a, b)
                    + 4 * joint_cpp(at_x + h/2, n, x, a, b)
                    + joint_cpp(at_x + h, n, x, a, b)) / 6;
    s = s + f_val;
  }
  margin = h * s;
  s = 0;
  h = (theta_true - start) / steps;
  Rcout << "2nd h = " << h << " ";
  // composite Simpson's rule over [start, theta_true]: the numerator
  for (int i = 0; i < steps; i++) {
    double at_x = start + h * i;
    double f_val = (joint_cpp(at_x, n, x, a, b)
                    + 4 * joint_cpp(at_x + h/2, n, x, a, b)
                    + joint_cpp(at_x + h, n, x, a, b)) / 6;
    s = s + f_val;
  }
  area = h * s;
  double r = area / margin;
  int cover = (r >= 0.025) && (r <= 0.975);
  List ret;
  ret["s"] = s;
  ret["margin"] = margin;
  ret["area"] = area;
  ret["ratio"] = r;
  ret["if_cover"] = cover;
  return ret;
}
I'm not that good at C++, so the two for loops look kind of silly.
It generally works, but there are still several potential problems:
I don't really know how to choose steps, i.e., how many subintervals are needed to approximate the integrals well. I took numerical analysis as an undergraduate, and I think I may need to check my book for the expression of the error term to decide the step length.
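For reference, the error bound from standard numerical analysis texts for the composite Simpson rule on [a, b] with step length h is

|E| <= ((b - a) * h^4 / 180) * max |f''''(xi)|, xi in [a, b]

so halving h cuts the error by roughly a factor of 16, and steps can be chosen to push this bound below a given tolerance.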
I compared my results with those from R. The integrate() function in R can take care of the integral over the whole interval [0,1]. That helps, because my function is undefined at 0 and 1, where it takes an infinite value. In my C++ code, I can only integrate over [1e-4, 1-1e-4]. I tried different values like 1e-7 and 1e-10; however, 1e-4 was the one closest to R's results. What should I do about it?

Related

Math.Net Exponential Moving Average

I'm using the simple moving average in Math.NET, but now that I also need to calculate an EMA (exponential moving average) or any kind of weighted moving average, I can't find it in the library.
I looked over all the methods under MathNet.Numerics.Statistics and beyond, but didn't find anything similar.
Is it missing from the library, or do I need to reference some additional package?
I don't see any EMA in MathNet.Numerics; however, it's trivial to program. The routine below is based on the definition at Investopedia.
public double[] EMA(double[] x, int N)
{
    // x is the input series
    // N is the notional age of the data used
    // k is the smoothing constant
    double k = 2.0 / (N + 1);
    double[] y = new double[x.Length];
    y[0] = x[0];
    for (int i = 1; i < x.Length; i++)
        y[i] = k * x[i] + (1 - k) * y[i - 1];
    return y;
}
Incidentally, I found this package: https://daveskender.github.io/Stock.Indicators/docs/INDICATORS.html It targets the latest .NET Framework and has very detailed documentation.
Try this:
public IEnumerable<double> EMA(IEnumerable<double> items, int notationalAge)
{
    double k = 2.0d / (notationalAge + 1), prev = 0.0d;
    var e = items.GetEnumerator();
    if (!e.MoveNext()) yield break;
    yield return prev = e.Current;
    while (e.MoveNext())
    {
        yield return prev = (k * e.Current) + (1 - k) * prev;
    }
}
It will still work with arrays, but also with List, Queue, Stack, IReadOnlyCollection, etc.
Although it's not explicitly stated, I also get the sense this is working with money, in which case it really ought to use decimal instead of double.

Object array positioning - LibGDX

In my game, if I touch a particular object, coin objects come out of it at random speeds and occupy random positions.
public void update(float delta) {
    if (isTouched() && getY() < Constants.WORLD_HEIGHT / 2) {
        setY(getY() + (randomSpeed * delta));
        setX(getX() - (randomSpeed / 4 * delta));
    }
}
Now I want to make these coins occupy positions in certain patterns: if 3 coins come out, a triangle pattern; if 4 coins, a rectangular pattern, and so on.
I tried to make it work, but the coins come out and move while overlapping each other; I'm not able to create any patterns.
This is what I tried:
int a = Math.abs(rndNo.nextInt() % 3) + 1; // number of coins
int no = 0;
float coinxPos = player.getX() - coins[0].getWidth() / 2;
float coinyPos = player.getY();
int minCoinGap = 20;
switch (a) {
    case 1:
        for (int i = 0; i < coins.length; i++) {
            if (!coins[i].isCoinVisible() && no < a) {
                coins[i].setCoinVisible(true);
                coinxPos = coinxPos + rndNo.nextInt() % 70;
                coinyPos = coinyPos + rndNo.nextInt() % 70;
                coins[i].setPosition(coinxPos, coinyPos);
                no++;
            }
        }
        break;
    case 2:
        for (int i = 0; i < coins.length; i++) {
            if (!coins[i].isCoinVisible() && no < a) {
                coins[i].setCoinVisible(true);
                coinxPos = coinxPos + minCoinGap + rndNo.nextInt() % 70;
                coinyPos = coinyPos + rndNo.nextInt() % 150;
                coins[i].setPosition(coinxPos, coinyPos);
                no++;
            }
        }
        break;
    ......
    ......
    default:
        break;
}
Maybe this is simple logic to implement, but I wasted a lot of time on it and got confused about how to make it work.
Any help would be appreciated.
In my game, when I want some object at X,Y to reach specific coordinates Xe,Ye, at every frame I add to its coordinates the difference between the current and wanted position, divided by a constant and multiplied by the time passed since the last frame. That way it starts moving quickly and slows down as it gets closer, which looks kind of cool.
X += ((Xe - X)* dt)/ CONST;
Y += ((Ye - Y)* dt)/ CONST;
You'll find that CONST value experimentally; a bigger value means slower movement. If you want it to look even cooler, you can add a velocity variable and, instead of changing the coordinates directly based on the distance from the end position, adjust that velocity. That way, even if the object reaches the end position at some point, it will still have some velocity and keep moving: it has inertia. It's a bit more complex to achieve, but the movement is even wilder; see the sketch below.
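A rough sketch of that velocity idea (mine; DAMPING is an illustrative constant, e.g. 0.9f), in the same style as the snippet above:

// accelerate toward the target instead of moving there directly
velX += ((Xe - X) * dt) / CONST;
velY += ((Ye - Y) * dt) / CONST;
// bleed off some speed every frame so the object eventually settles
velX *= DAMPING;
velY *= DAMPING;
X += velX * dt;
Y += velY * dt;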
And if you want Xe,Ye to be some specific position (not random), then just set those values; no need to make it more complicated than that. Define another constant OFFSET:
static final int OFFSET = 100;
Xe1 = X - OFFSET; // for first coin
Ye1 = Y - OFFSET;
Xe2 = X + OFFSET; // for second coin
Ye2 = Y - OFFSET;
...

Asymmetric Levenshtein distance

Given two bit strings, x and y, with x longer than y, I'd like to compute a kind of asymmetric variant of the Levenshtein distance between them. Starting with x, I'd like to know the minimum number of deletions and substitutions it takes to turn x into y.
Can I just use the usual Levenshtein distance for this, or do I need to modify the algorithm somehow? In other words, with the usual set of edits of deletion, substitution, and addition, is it ever beneficial to delete more than the difference in lengths between the two strings and then add some bits back? I suspect the answer is no, but I'm not sure. If I'm wrong, and I do need to modify the definition of Levenshtein distance to disallow additions, how do I do so?
Finally, I would intuitively expect to get the same distance if I started with y (the shorter string) and only allowed additions and substitutions. Is this right? I've got a sense of what these answers are; I just can't prove them.
If I understand you correctly, I think the answer is yes: the Levenshtein edit distance can differ from a distance computed by an algorithm that only allows deletions and substitutions on the larger string. Because of this, you would need to modify the algorithm, or create a different one, to get your limited version.
Consider the two strings "ABCD" and "ACDEF". The Levenshtein distance is 3 (ABCD->ACD->ACDE->ACDEF). If we start with the longer string and limit ourselves to deletions and substitutions, we must use 4 edits (1 deletion and 3 substitutions). The reason is that an efficient path may delete characters from the smaller string (here, dropping B from ABCD); viewed from the longer string, that step is an insertion, which you have disallowed.
Your last paragraph is true. If the path from shorter to longer uses only insertions and substitutions, then any allowed path can simply be reversed from the longer to the shorter. Substitutions are the same regardless of direction, but the inserts when going from small to large become deletions when reversed.
I haven't tested this thoroughly, but this modification shows the direction I would take, and it appears to work with the values I've tested. It's written in C# and follows the pseudocode in the Wikipedia entry for Levenshtein distance. There are obvious optimizations that could be made, but I refrained from making them so that the changes from the standard algorithm are more obvious. An important observation is that (using your constraints) if the strings are the same length, then substitution is the only operation allowed.
static int LevenshteinDistance(string s, string t) {
    int i, j;
    int m = s.Length;
    int n = t.Length;
    // for all i and j, d[i,j] will hold the Levenshtein distance between
    // the first i characters of s and the first j characters of t;
    // note that d has (m+1)*(n+1) values
    var d = new int[m + 1, n + 1];
    // no need to set each element to zero:
    // C# creates the array already initialized to zero

    // source prefixes can be transformed into the empty string by
    // dropping all characters
    for (i = 0; i <= m; i++) d[i, 0] = i;
    // target prefixes can be reached from the empty source prefix
    // by inserting every character
    for (j = 0; j <= n; j++) d[0, j] = j;
    for (j = 1; j <= n; j++) {
        for (i = 1; i <= m; i++) {
            if (s[i - 1] == t[j - 1])
                d[i, j] = d[i - 1, j - 1]; // no operation required
            else {
                int del = d[i - 1, j] + 1;     // a deletion
                int ins = d[i, j - 1] + 1;     // an insertion
                int sub = d[i - 1, j - 1] + 1; // a substitution
                // the next two lines are the modification I've made
                //int insDel = (i < j) ? ins : del;
                //d[i, j] = (i == j) ? sub : Math.Min(insDel, sub);
                // the following lines are a clearer version of the two above
                if (i == j) {
                    d[i, j] = sub;
                } else {
                    int insDel;
                    if (i < j) insDel = ins; else insDel = del;
                    // assign the smaller of insDel or sub
                    d[i, j] = Math.Min(insDel, sub);
                }
            }
        }
    }
    return d[m, n];
}

Finding the local maxima/peaks and minima/valleys of histograms

Ok, so I have a histogram (represented by an array of ints), and I'm looking for the best way to find local maxima and minima. Each histogram should have 3 peaks, one of them (the first one) probably much higher than the others.
I want to do several things:
Find the first "valley" following the first peak (in order to get rid of the first peak altogether in the picture)
Find the optimum "valley" value in between the remaining two peaks to separate the picture
I already know how to do step 2 by implementing a variant of Otsu.
But I'm struggling with step 1
In case the valley in between the two remaining peaks is not low enough, I'd like to give a warning.
Also, the image is quite clean with little noise to account for
What would be the brute-force algorithms for steps 1 and 3? I could find a way to implement Otsu, but the brute force escapes me, math-wise. As it turns out, there is more documentation on methods like Otsu and less on simply finding peaks and valleys. I am not looking for anything more than whatever gets the job done (i.e., a temporary solution that just has to be implementable in a reasonable timeframe, until I can spend more time on it).
I am doing all this in C#.
Any help on which steps to take would be appreciated!
Thank you so much!
EDIT: some more data:
Most histograms are likely to be like the first one, with the first peak representing the background.
Use the peakiness test. It's a method that finds all possible peaks between two local minima and measures their peakiness with a formula; if the peakiness is higher than a threshold, the peak is accepted.
Source: UCF CV CAP5415 lecture 9 slides
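The formula, as used in the code below, is

peakiness = (1 - (Va + Vb) / (2 * P)) * (1 - N / (W * P))

where Va and Vb are the histogram heights at the two ends of the valley-to-valley span, P is the peak height in between, W is the width of the span, and N is the sum of the histogram counts over it.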
Below is my code:
public static List<int> PeakinessTest(int[] histogram, double peakinessThres)
{
    int j = 0;
    List<int> valleys = new List<int>();
    // The start of the valley
    int vA = histogram[j];
    int P = vA;
    // The end of the valley
    int vB = 0;
    // The width of the valley; default width is 1
    int W = 1;
    // The sum of the pixels between vA and vB
    int N = 0;
    // The measure of the peak's peakiness
    double peakiness = 0.0;
    int peak = 0;
    bool l = false;
    try
    {
        while (j < 254)
        {
            l = false;
            vA = histogram[j];
            P = vA;
            W = 1;
            N = vA;
            int i = j + 1;
            // To find the peak
            while (P < histogram[i])
            {
                P = histogram[i];
                W++;
                N += histogram[i];
                i++;
            }
            // To find the border of the valley on the other side
            peak = i - 1;
            vB = histogram[i];
            N += histogram[i];
            i++;
            W++;
            l = true;
            while (vB >= histogram[i])
            {
                vB = histogram[i];
                W++;
                N += histogram[i];
                i++;
            }
            // Calculate peakiness
            peakiness = (1 - (double)((vA + vB) / (2.0 * P))) * (1 - ((double)N / (double)(W * P)));
            if (peakiness > peakinessThres & !valleys.Contains(j))
            {
                //peaks.Add(peak);
                valleys.Add(j);
                valleys.Add(i - 1);
            }
            j = i - 1;
        }
    }
    catch (Exception)
    {
        // Running off the end of the histogram lands here;
        // handle the final bin and return what we have
        if (l)
        {
            vB = histogram[255];
            peakiness = (1 - (double)((vA + vB) / (2.0 * P))) * (1 - ((double)N / (double)(W * P)));
            if (peakiness > peakinessThres)
                valleys.Add(255);
            //peaks.Add(255);
            return valleys;
        }
    }
    //if (!valleys.Contains(255))
    //    valleys.Add(255);
    return valleys;
}

Is there a circular hash function?

Thinking about this question on testing string rotation, I wondered: is there such a thing as a circular/cyclic hash function? E.g.
h(abcdef) = h(bcdefa) = h(cdefab) etc
Uses for this include scalable algorithms which can check n strings against each other to see where some are rotations of others.
I suppose the essence of the hash is to extract information which is order-specific but not position-specific. Maybe something that finds a deterministic 'first position', rotates to it and hashes the result?
It all seems plausible, but slightly beyond my grasp at the moment; it must be out there already...
I'd go along with your deterministic "first position" - find the "least" character; if it appears twice, use the next character as the tie breaker (etc). You can then rotate to a "canonical" position, and hash that in a normal way. If the tie breakers run for the entire course of the string, then you've got a string which is a rotation of itself (if you see what I mean) and it doesn't matter which you pick to be "first".
So:
"abcdef" => hash("abcdef")
"defabc" => hash("abcdef")
"abaac" => hash("aacab") (tie-break between aa, ac and ab)
"cabcab" => hash("abcabc") (it doesn't matter which "a" comes first!)
Update: As Jon pointed out, the first approach doesn't handle strings with repetition very well. Problems arise as duplicate pairs of letters are encountered and the resulting XOR is 0. Here is a modification that I believe fixes the original algorithm. It uses Euclid-Fermat sequences to generate pairwise coprime integers for each additional occurrence of a character in the string. The result is that the XOR for duplicate pairs is non-zero.
I've also cleaned up the algorithm slightly. Note that the array containing the EF sequences only supports characters in the range 0x00 to 0xFF. This was just a cheap way to demonstrate the algorithm. Also, the algorithm still has runtime O(n) where n is the length of the string.
static int Hash(string s)
{
    int H = 0;
    if (s.Length > 0)
    {
        // any arbitrary coprime numbers
        int a = s.Length, b = s.Length + 1;
        // an array of Euclid-Fermat sequences to generate additional
        // coprimes for each duplicate character occurrence
        int[] c = new int[0xFF];
        for (int i = 1; i < c.Length; i++)
        {
            c[i] = i + 1;
        }
        Func<char, int> NextCoprime = (x) => c[x] = (c[x] - x) * c[x] + x;
        Func<char, char, int> NextPair = (x, y) => a * NextCoprime(x) * x.GetHashCode() + b * y.GetHashCode();
        // for i=0 we need to wrap around to the last character
        H = NextPair(s[s.Length - 1], s[0]);
        // for i=1...n we use the previous character
        for (int i = 1; i < s.Length; i++)
        {
            H ^= NextPair(s[i - 1], s[i]);
        }
    }
    return H;
}

static void Main(string[] args)
{
    Console.WriteLine("{0:X8}", Hash("abcdef"));
    Console.WriteLine("{0:X8}", Hash("bcdefa"));
    Console.WriteLine("{0:X8}", Hash("cdefab"));
    Console.WriteLine("{0:X8}", Hash("cdfeab"));
    Console.WriteLine("{0:X8}", Hash("a0a0"));
    Console.WriteLine("{0:X8}", Hash("1010"));
    Console.WriteLine("{0:X8}", Hash("0abc0def0ghi"));
    Console.WriteLine("{0:X8}", Hash("0def0abc0ghi"));
}
The output is now:
7F7D7F7F
7F7D7F7F
7F7D7F7F
7F417F4F
C796C7F0
E090E0F0
A909BB71
A959BB71
First version (which isn't complete): use XOR, which is commutative (order doesn't matter), and another little trick involving coprimes to combine ordered hashes of pairs of letters in the string. Here is an example in C#:
static int Hash(char[] s)
{
    // any arbitrary coprime numbers
    const int a = 7, b = 13;
    int H = 0;
    if (s.Length > 0)
    {
        // for i=0 we need to wrap around to the last character
        H ^= (a * s[s.Length - 1].GetHashCode()) + (b * s[0].GetHashCode());
        // for i=1...n we use the previous character
        for (int i = 1; i < s.Length; i++)
        {
            H ^= (a * s[i - 1].GetHashCode()) + (b * s[i].GetHashCode());
        }
    }
    return H;
}

static void Main(string[] args)
{
    Console.WriteLine(Hash("abcdef".ToCharArray()));
    Console.WriteLine(Hash("bcdefa".ToCharArray()));
    Console.WriteLine(Hash("cdefab".ToCharArray()));
    Console.WriteLine(Hash("cdfeab".ToCharArray()));
}
The output is:
4587590
4587590
4587590
7077996
You could find a deterministic first position by always starting at the position with the "lowest" (in terms of alphabetical ordering) substring. So in your case, you'd always start at "a". If there were multiple "a"s, you'd have to take two characters into account etc.
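A minimal sketch of that idea (mine, not the poster's): comparing whole rotations through the doubled string handles the multi-character tie-breaking automatically. O(n^2) worst case, but simple; hash the canonical form with any ordinary string hash.

#include <string>

// Return the lexicographically smallest rotation of s.
std::string canonicalRotation(const std::string& s) {
    std::string doubled = s + s;
    std::string best = s;
    for (size_t i = 1; i < s.size(); ++i) {
        std::string rot = doubled.substr(i, s.size());
        if (rot < best) best = rot;
    }
    return best;
}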
I am sure that you could find a function that generates the same hash regardless of character position in the input; however, how will you ensure that h(abc) != h(efg) for every conceivable input? (Collisions occur for all hash algorithms, so what I mean is: how do you minimize that risk?)
You'd need some additional checks even after generating the hash to ensure that the strings contain the same characters.
Here's an implementation using LINQ:
public string ToCanonicalOrder(string input)
{
    char first = input.OrderBy(x => x).First();
    string doubledForRotation = input + input;
    string canonicalOrder = (-1)
        .GenerateFrom(x => doubledForRotation.IndexOf(first, x + 1))
        .Skip(1) // the -1
        .TakeWhile(x => x < input.Length)
        .Select(x => doubledForRotation.Substring(x, input.Length))
        .OrderBy(x => x)
        .First();
    return canonicalOrder;
}
assuming this generic generator extension method:
public static class TExtensions
{
    public static IEnumerable<T> GenerateFrom<T>(this T initial, Func<T, T> next)
    {
        var current = initial;
        while (true)
        {
            yield return current;
            current = next(current);
        }
    }
}
sample usage:
var sequences = new[]
{
    "abcdef", "bcdefa", "cdefab",
    "defabc", "efabcd", "fabcde",
    "abaac", "cabcab"
};
foreach (string sequence in sequences)
{
    Console.WriteLine(ToCanonicalOrder(sequence));
}
output:
abcdef
abcdef
abcdef
abcdef
abcdef
abcdef
aacab
abcabc
then call .GetHashCode() on the result if necessary.
sample usage if ToCanonicalOrder() is converted to an extension method:
sequence.ToCanonicalOrder().GetHashCode();
One possibility is to combine the hash functions of all circular shifts of your input into one meta-hash which does not depend on the order of the inputs.
More formally, consider
for (int i = 0; i < string.length; i++) {
    result ^= string.rotatedBy(i).hashCode();
}
Where you could replace the ^= with any other commutative operation.
More concretely, consider the input
"abcd"
to get the hash we take
hash("abcd") ^ hash("dabc") ^ hash("cdab") ^ hash("bcda").
As we can see, hashing any of these rotations only changes the order in which the XOR is evaluated, which doesn't change its value.
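A runnable C++ rendering of that pseudocode (my sketch; std::hash stands in for hashCode, and the doubled string supplies the rotations):

#include <functional>
#include <string>

// XOR the hashes of all rotations; XOR is commutative, so every rotation
// of the input yields the same combined value.
size_t rotationInvariantHash(const std::string& s) {
    std::string doubled = s + s;
    size_t result = 0;
    for (size_t i = 0; i < s.size(); ++i) {
        result ^= std::hash<std::string>{}(doubled.substr(i, s.size()));
    }
    return result;
}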
I did something like this for a project in college. There were 2 approaches I used to try to optimize a Travelling-Salesman problem. I think if the elements are NOT guaranteed to be unique, the second solution would take a bit more checking, but the first one should work.
If you can represent the string as a matrix of associations, then abcdef would look like:
a b c d e f
a x
b x
c x
d x
e x
f x
But any rotation of the string would produce the same set of associations. It would be trivial to compare those matrices.
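A rough sketch of how I read the matrix idea (the representation details are mine): mark which character follows which, wrap-around pair included, and compare the matrices.

#include <array>
#include <string>

// m[a][b] is true when character b immediately follows character a
// (wrapping around at the end). Every rotation of a string marks exactly
// the same cells; equal matrices therefore signal a likely rotation,
// though with repeated characters a direct check is still prudent.
using AssocMatrix = std::array<std::array<bool, 256>, 256>;

AssocMatrix buildAssociations(const std::string& s) {
    AssocMatrix m{}; // value-initialized to all false
    for (size_t i = 0; i < s.size(); ++i) {
        unsigned char from = s[i];
        unsigned char to = s[(i + 1) % s.size()];
        m[from][to] = true;
    }
    return m;
}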
Another quicker trick would be to rotate the string so that the "first" letter is first. Then if you have the same starting point, the same strings will be identical.
Here is some Ruby code:
def normalize_string(string)
  myarray = string.split(//)          # split into an array
  index = myarray.index(myarray.min)  # find the index of the minimum element
  index.times do
    myarray.push(myarray.shift)       # move stuff from the front to the back
  end
  return myarray.join
end

p normalize_string('abcdef').eql?(normalize_string('defabc')) # should return true
Maybe use a rolling hash for each offset (Rabin-Karp style) and return the minimum hash value? There could be collisions, though.
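A sketch of that idea (mine; the base and modulus are arbitrary choices): slide a Rabin-Karp window of length n over s + s, so each window is one rotation, and keep the minimum hash.

#include <algorithm>
#include <cstdint>
#include <string>

// Minimum rolling hash over all rotations of s; O(n) after the first
// window. Unrelated strings can still collide, as noted above.
uint64_t minRotationHash(const std::string& s) {
    const uint64_t B = 257, M = 1000000007ULL;
    const size_t n = s.size();
    if (n == 0) return 0;
    std::string d = s + s;
    uint64_t h = 0, power = 1; // power ends up as B^(n-1) mod M
    for (size_t i = 0; i < n; ++i) {
        h = (h * B + (unsigned char)d[i]) % M;
        if (i + 1 < n) power = (power * B) % M;
    }
    uint64_t best = h;
    for (size_t i = 1; i < n; ++i) {
        // drop d[i-1] from the front, append d[i+n-1] at the back
        h = (h + M - ((unsigned char)d[i - 1] * power) % M) % M;
        h = (h * B + (unsigned char)d[i + n - 1]) % M;
        best = std::min(best, h);
    }
    return best;
}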
