I have a approximation (round) error issue in R

I have a approximation (round) error issue in R - decimal

I have some simple error coming from approximation of decimal numbers in R. I guess it's due to the reserved bytes in memory or something like that...
I print those intermediate values for the variables while executing my R code, just to check where the error comes from:
prop_individues_stades[S] = 4.626294e-05
repart_pop_total = 1
pb1 = 0.9999537
pb2 = 1
frac = 4.626294e-05
The strange thing is that frac is calculated like this :
frac = repart_pop_total * (pb2 - pb1)
If i check directly in R, I obtain :
frac = 4.63e-05
I understand that it comes from round decimal places, but as I obtain prop_individues_stades[S] = 4.626294e-05 and frac = 4.626294e-05, and my next step is prop_individues_stades[S] - frac, then I obtain zero and my program stops...
The ideal values without approximation errors suppose to be :
prop_individues_stades[S] = 4.6262936325314E-5
repart_pop_total = 1
pb1 = 0.99995373706367
pb2 = 1
frac = 4.6262936325259E-5
and then prop_individues_stades[S] - frac = 5.5511151231258E-17 and the program will continue...
I tried with the function round(frac, ...) but it doesn't help.
Sometimes small values makes things important :) Thank's in advance for all your ideas/solutions/advices...

Related

Splitting an int64 into two int32, performing math, then re-joining

I am working within constraints of hardware that has 64bit integer limit. Does not support floating point. I am dealing with very large integers that I need to multiply and divide. When multiplying I encounter an overflow of the 64bits. I am prototyping a solution in python. This is what I have in my function:
upper = x >> 32 #x is cast as int64 before being passed to this function
lower = x & 0x00000000FFFFFFFF
temp_upper = upper * y // z #Dividing first is not an option, as this is not the actual equation I am working with. This is just to make sure in my testing I overflow unless I do the splitting.
temp_lower = lower * y // z
return temp_upper << 32 | lower
This works, somewhat, but I end up losing a lot of precision (my result is off by sometimes a few million). From looking at it, it appears that this is happening because of the division. If sufficient enough it shifts the upper to the right. Then when I shift it back into place I have a gap of zeroes.
Unfortunately this topic is very hard to google, since anything with upper/lower brings up results about rounding up/down. And anything about splitting ints returns results about splitting them into a char array. Anything about int arithmetic bring up basic algebra with integer math. Maybe I am just not good at googling. But can you guys give me some pointers on how to do this?
Splitting like this is just a thing I am trying, it doesnt have to be the solution. All I need to be able to do is to temporarily go over 64bit integer limit. The final result will be under 64bit (After the division part). I remember learning in college about splitting it up like this and then doing the math and re-combining. But unfortunately as I said I am having trouble finding anything online on how to do the actual math on it.
Lastly, my numbers are sometimes small. So I cant chop off the right bits. I need the results to basically be equivalent to if I used something like int128 or something.
I suppose a different way to look at this problem is this. Since I have no problem with splitting the int64, we can forget about that part. So then we can pretend that two int64's are being fed to me, one is upper and one is lower. I cant combine them, because they wont fit into a single int64. So I need to divide them first by Z. Combining step is easy. How do I do the division?
Thanks.

As I understand it, you want to perform (x*y)//z.
Your numbers x,y,z all fit on 64bits, except that you need 128 bits for intermediate x*y.
The problem you have is indeed related to division: you have
h * y = qh * z + rh
l * y = ql * z + rl
h * y << 32 + l*y = (qh<<32 + ql) * z + (rh<<32 + rl)
but nothing says that (rh<<32 + rl) < z, and in your case high bits of l*y overlap low bits of h * y, so you get the wrong quotient, off by potentially many units.
What you should do as second operation is rather:
rh<<32 + l * y = ql' * z + rl'
Then get the total quotient qh<<32 + ql'
But of course, you must care to avoid overflow when evaluating left operand...
Since you are splitting only one of the operands of x*y, I'll assume that the intermediate result always fits on 96 bits.
If that is correct, then your problem is to divide a 3 32bits limbs x*y by a 2 32bits limbs z.
It is thus like Burnigel - Ziegler divide and conquer algorithm for division.
The algorithm can be decomposed like this:
obtain the 3 limbs a2,a1,a0 of multiplication x*y by using karatsuba for example
split z into 2 limbs z1,z0
perform the div32( (a2,a1,a0) , (z1,z0) )
here is some pseudo code, only dealing with positive operands, and with no guaranty to be correct, but you get an idea of implementation:
p = 1<<32;
function (a1,a0) = split(a)
a1 = a >> 32;
a0 = a - (a1 * p);
function (a2,a1,a0) = mul22(x,y)
(x1,x0) = split(x) ;
(y1,y0) = split(y) ;
(h1,h0) = split(x1 * y1);
assert(h1 == 0); -- assume that results fits on 96 bits
(l1,l0) = split(x0 * y0);
(m1,m0) = split((x1 - x0) * (y0 - y1)); -- karatsuba trick
a0 = l0;
(carry,a1) = split( l1 + l0 + h0 + m0 );
a2 = l1 + m1 + h0 + carry;
function (q,r) = quorem(a,b)
q = a // b;
r = a - (b * q);
function (q1,q0,r0) = div21(a1,a0,b0)
(q1,r1) = quorem(a1,b0);
(q0,r0) = quorem( r1 * p + a0 , b0 );
(q1,q0) = split( q1 * p + q0 );
function q = div32(a2,a1,a0,b1,b0)
(q,r) = quorem(a2*p+a1,b1*p+b0);
q = q * p;
(a2,a1)=split(r);
if a2<b1
(q1,q0,r)=div21(a2,a1,b1);
assert(q1==0); -- since a2<b1...
else
q0=p-1;
r=(a2-b1)*p+a1+b1;
(d1,d0) = split(q0*b0);
r = (r-d1)*p + a0 - d0;
while(r < 0)
q = q - 1;
r = r + b1*p + b0;
function t=muldiv(x,y,z)
(a2,a1,a0) = mul22(x,y);
(z1,z0) = split(z);
if z1 == 0
(q2,q1,r1)=div21(a2,a1,z0);
assert(q2==0); -- otherwise result will not fit on 64 bits
t = q1*p + ( ( r1*p + a0 )//z0);
else
t = div32(a2,a1,a0,z1,z0);

Iterations over 2d numpy arrays with while and for statements

In the code supplied below I am trying to iterate over 2D numpy array [i][k]
Originally it is a code which was written in Fortran 77 which is older than my grandfather. I am trying to adapt it to python.
(for people interested whatabouts: it is a simple hydraulics transients event solver)
Bear in mind that all variables are introduced in my code which I don't paste here.
H = np.zeros((NS,50))
Q = np.zeros((NS,50))
Here I am assigning the first row values:
for i in range(NS):
H[0][i] = HR-i*R*Q0**2
Q[0][i] = Q0
CVP = .5*Q0**2/H[N]
T = 0
k = 0
TAU = 1
#Interior points:
HP = np.zeros((NS,50))
QP = np.zeros((NS,50))
while T<=Tmax:
T += dt
k += 1
for i in range(1,N):
CP = H[k][i-1]+Q[k][i-1]*(B-R*abs(Q[k][i-1]))
CM = H[k][i+1]-Q[k][i+1]*(B-R*abs(Q[k][i+1]))
HP[k][i-1] = 0.5*(CP+CM)
QP[k][i-1] = (HP[k][i-1]-CM)/B
#Boundary Conditions:
HP[k][0] = HR
QP[k][0] = Q[k][1]+(HP[k][0]-H[k][1]-R*Q[k][1]*abs(Q[k][1]))/B
if T == Tc:
TAU = 0
CV = 0
else:
TAU = (1.-T/Tc)**Em
CV = CVP*TAU**2
CP = H[k][N-1]+Q[k][N-1]*(B-R*abs(Q[k][N-1]))
QP[k][N] = -CV*B+np.sqrt(CV**2*(B**2)+2*CV*CP)
HP[k][N] = CP-B*QP[k][N]
for i in range(NS):
H[k][i] = HP[k][i]
Q[k][i] = QP[k][i]
Remember i is for rows and k is for columns
What I am expecting is that for all k number of columns the values should be calculated until T<=Tmax condition is met. I cannot figure out what my mistake is, I am getting the following errors:
RuntimeWarning: divide by zero encountered in true_divide
CVP = .5*Q0**2/H[N]
RuntimeWarning: invalid value encountered in multiply
QP[N][k] = -CV*B+np.sqrt(CV**2*(B**2)+2*CV*CP)
QP[N][k] = -CV*B+np.sqrt(CV**2*(B**2)+2*CV*CP)
ValueError: setting an array element with a sequence.

Looking at your first iteration:
H = np.zeros((NS,50))
Q = np.zeros((NS,50))
for i in range(NS):
H[0][i] = HR-i*R*Q0**2
Q[0][i] = Q0
The shape of H is (NS,50), but when you iterate over a range(NS) you apply that index to the 2nd dimension. Why? Shouldn't it apply to the dimension with size NS?
In numpy arrays have 'C' order by default. Last dimension is inner most. They can have a F (fortran) order, but let's not go there. Thinking of the 2d array as a table, we typically talk of rows and columns, though they don't have a formal definition in numpy.
Lets assume you want to set the first column to these values:
for i in range(NS):
H[i, 0] = HR - i*R*Q0**2
Q[i, 0] = Q0
But we can do the assignment whole rows or columns at a time. I believe new versions of Fortran also have these 'whole-array' functions.
Q[:, 0] = Q0
H[:, 0] = HR - np.arange(NS) * R * Q0**2
One point of caution when translating to Python. Indexing starts with 0; so does ranges and np.arange(...).
H[0][i] is functionally the same as H[0,i]. But when using slices you have to use the H[:,i] format.
I suspect your other iterations have similar problems, but I'll stop here for now.

Regarding the errors:
The first:
RuntimeWarning: divide by zero encountered in true_divide
CVP = .5*Q0**2/H[N]
You initialize H as zeros so it is normal that it complains of division by zero. Maybe you should add a conditional.
The third:
QP[N][k] = -CV*B+np.sqrt(CV**2*(B**2)+2*CV*CP)
ValueError: setting an array element with a sequence.
You define CVP = .5*Q0**2/H[N] and then CV = CVP*TAU**2 which is a sequence. And then you try to assign a derivate form it to QP[N][K] which is an element. You are trying to insert an array to a value.
For the second error I think it might be related to the third. If you could provide more information I would like to try to understand what happens.
Hope this has helped.

python scipy fmin not completing succesfully

I have a function that I am attempting to minimize for multiple values. For some values it terminates successfully however for others the error
Warning: Maximum number of function evaluations has been exceeded.
Is the error that is given. I am unsure of the role of maxiter and maxfun and how to increase or decrease these in order to successfully get to the minimum. My understanding is that these values are optional so I am unsure of what the default values are.
# create starting parameters, parameters equal to sin(x)
a = 1
k = 0
h = 0
wave_params = [a, k, h]
def wave_func(func_params):
"""This function calculates the difference between a sinewave (sin(x)) and raw_data (different sin wave)
This is the function that will be minimized by modulating a, b, k, and h parameters in order to minimize
the difference between curves."""
a = func_params[0]
b = 1
k = func_params[1]
h = func_params[2]
y_wave = a * np.sin((x_vals-h)/b) + k
error = np.sum((y_wave - raw_data) * (y_wave - raw_data))
return error
wave_optimized = scipy.optimize.fmin(wave_func, wave_params)

You can try using scipy.optimize.minimize with method='Nelder-Mead' https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.
https://docs.scipy.org/doc/scipy/reference/optimize.minimize-neldermead.html#optimize-minimize-neldermead
Then you can just do
minimum = scipy.optimize.minimize(wave_func, wave_params, method='Nelder-Mead')
n_function_evaluations = minimum.nfev
n_iterations = minimum.nit
or you can customize the search algorithm like this:
minimum = scipy.optimize.minimize(
wave_func, wave_params, method='Nelder-Mead',
options={'maxiter': 10000, 'maxfev': 8000}
)
I don't know anything about fmin, but my guess is that it behaves extremely similarly.

Color Histogram

I'm trying to calculate histogram for an image. I'm using the following formula to calculate the bin
%bin = red*(N^2) + green*(N^1) + blue;
I have to implement the following Matlab functions.
[row, col, noChannels] = size(rgbImage);
hsvImage = rgb2hsv(rgbImage); % Ranges from 0 to 1.
H = zeros(4,4,4);
for col = 1 : columns
for row = 1 : rows
hBin = floor(hsvImage(row, column, 1) * 15);
sBin = floor(hsvImage(row, column, 2) * 4);
vBin = floor(hsvImage(row, column, 3) * 4);
F(hBin, sBin, vBin) = hBin, sBin, vBin + 1;
end
end
When I run the code I get the following error message "Subscript indices must either be real positive integers or logical."
As I am new to Matlab and Image processing, I'm not sure if the problem is with implementing the algorithm or a syntax error.

There are 3 problems with your code. (Four if you count that you changed from H to F your accumulator vector, but I'll assume that's a typo.)
First one, your variable bin can be zero at any moment if the values of a giving pixel are low. And F(0) is not a valid index for a vector or matrix. This is why you are getting that error.
You can solve easily by doing F(bin+1) and keep in mind that your F vector will have your values shifted one position over.
Second error, you are assigning the value bin + 1 to your accumulator vector F, which is not what you want, you want to add 1 every time a pixel in that range is found, what you should do is F(bin+1) = F(bin+1) + 1;. This way the values of F will be increasing all the time.
Third error is simpler, you forgot to implement your bin = red*(N^2) + green*(N^1) + blue; equation

setting point sizes when using Gadfly in Julia

In my attempts to practice Julia, I've made a program which draws a bifurcation diagram. My code is as follows:
function bifur(x0,y0,a=1.3,b=0.4,n=1000,m=10000)
i,x,y=1,x0,y0
while i < n && abs(x) < m
x,y = a - x^2 + y, b * x
i += 1
end
if abs(x) < m
return x
else
return 1000
end
end
la = Float64[];
lx = Float64[];
for a=0:400
for j = 1:1000
x0 = rand()
y0 = rand()
x = bifur(x0,y0,a/100)
if x != 1000
push!(la,a/100)
push!(lx,x)
end
end
end
using Gadfly
myplot = Gadfly.plot( x=la, y=lx , Scale.x_discrete, Scale.y_continuous, Geom.point)
draw(PNG("myplot.png",10inch,8inch),myplot)
The output I get is this image:
In order to make my plot look more like this:
I need to be able to set point sizes to as small as one pixel. Then by increasing the iteration length I should be able to get a better bifurcation diagram. Does anyone know how to set the point sizes in Gadfly diagrams in Julia?

[Just to encapsulate the comments as an answer...]
Gadfly's Theme defaults can be changed. In particular, point_size is probably what you are looking for.
For changing the automatic plot scale/range settings, have a look at Gadfly's Scale params.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string