I am trying to fit the beneath data to the form - I am most interested in 'c' (I know that c ≈ 1/8, b ≈ 3) but would like to extract all these values from the data.
Formula:
y = a*(x-b)**c
Values.txt:
# "values.txt"
2.000000e+00 6.058411e-04
2.200000e+00 5.335520e-04
2.400000e+00 3.509583e-03
2.600000e+00 1.655943e-03
2.800000e+00 1.995418e-03
3.000000e+00 9.437851e-04
3.200000e+00 5.516159e-04
3.400000e+00 6.765981e-04
3.600000e+00 3.860859e-04
3.800000e+00 2.942881e-04
4.000000e+00 5.039975e-04
4.200000e+00 3.962199e-04
4.400000e+00 4.659717e-04
4.600000e+00 2.892683e-04
4.800000e+00 2.248839e-04
5.000000e+00 2.536980e-04
I have tried using the following commands in gnuplot however I am not meaningful results
f(x) = a*(x-b)**c
b = 3
c = 1/8
fit f(x) "values.txt" via a,b,c
Does anyone know the best way to extract these values? I would rather not provide initial guesses for 'b' & 'c' if possible.
Thanks,
J
The main problem with your fitting function is finding b. You can express your equation as a linear function in log(x-b), after which the fitting is trivial:
b = 3
f(x) = c0 + c1 * x
fit f(x) "values.txt" using (log($1-b)):(log($2)) via c0, c1
a = exp(c0)
c = c1
As you see, you need to provide b but do not need initial guesses for the other parameters because it's a trivial linear fit.
Now, I would suggest that you provide a series of values of b and check how good the fitting is for each value. gnuplot gives you the error in the fitting parameter. Then you can plot the overall error (error_c0 + error_c1) as a function of b and figure out for which b the error is minimum. About the optimum b the curve error_c0 + error_c1 vs b should be quadratic and have the minimum at b_opt. Then run the fitting as in the code above with this b = b_opt and get a and c.
Related
I'm plotting this dataset and making a logarithmic fit, but, for some reason, the fit seems to be strongly wrong, at some point I got a good enough fit, but then I re ploted and there were that bad fit. At the very beginning there were a 0.0 0.0076 but I changed that to 0.001 0.0076 to avoid the asymptote.
I'm using (not exactly this one for the image above but now I'm testing with this one and there is that bad fit as well) this for the fit
f(x) = a*log(k*x + b)
fit = fit f(x) 'R_B/R_B.txt' via a, k, b
And the output is this
Also, sometimes it says 7 iterations were as is the case shown in the screenshot above, others only 1, and when it did the "correct" fit, it did like 35 iterations or something and got a = 32 if I remember correctly
Edit: here is again the good one, the plot I got is this one. And again, I re ploted and get that weird fit. It's curious that if there is the 0.0 0.0076 when the good fit it's about to be shown, gnuplot says "Undefined value during function evaluation", but that message is not shown when I'm getting the bad one.
Do you know why do I keep getting this inconsistence? Thanks for your help
As I already mentioned in comments the method of fitting antiderivatives is much better than fitting derivatives because the numerical calculus of derivatives is strongly scattered when the data is slightly scatered.
The principle of the method of fitting an integral equation (obtained from the original equation to be fitted) is explained in https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales . The application to the case of y=a.ln(c.x+b) is shown below.
Numerical calculus :
In order to get even better result (according to some specified criteria of fitting) one can use the above values of the parameters as initial values for iterarive method of nonlinear regression implemented in some convenient software.
NOTE : The integral equation used in the present case is :
NOTE : On the above figure one can compare the result with the method of fitting an integral equation to the result with the method of fitting with derivatives.
Acknowledgements : Alex Sveshnikov did a very good work in applying the method of regression with derivatives. This allows an interesting and enlightening comparison. If the goal is only to compute approximative values of parameters to be used in nonlinear regression software both methods are quite equivalent. Nevertheless the method with integral equation appears preferable in case of scattered data.
UPDATE (After Alex Sveshnikov updated his answer)
The figure below was drawn in using the Alex Sveshnikov's result with further iterative method of fitting.
The two curves are almost indistinguishable. This shows that (in the present case) the method of fitting the integral equation is almost sufficient without further treatment.
Of course this not always so satisfying. This is due to the low scatter of the data.
In ADDITION , answer to a question raised in comments by CosmeticMichu :
The problem here is that the fit algorithm starts with "wrong" approximations for parameters a, k, and b, so during the minimalization it finds a local minimum, not the global one. You can improve the result if you provide the algorithm with starting values, which are close to the optimal ones. For example, let's start with the following parameters:
gnuplot> a=47.5087
gnuplot> k=0.226
gnuplot> b=1.0016
gnuplot> f(x)=a*log(k*x+b)
gnuplot> fit f(x) 'R_B.txt' via a,k,b
....
....
....
After 40 iterations the fit converged.
final sum of squares of residuals : 16.2185
rel. change during last iteration : -7.6943e-06
degrees of freedom (FIT_NDF) : 18
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf) : 0.949225
variance of residuals (reduced chisquare) = WSSR/ndf : 0.901027
Final set of parameters Asymptotic Standard Error
======================= ==========================
a = 35.0415 +/- 2.302 (6.57%)
k = 0.372381 +/- 0.0461 (12.38%)
b = 1.07012 +/- 0.02016 (1.884%)
correlation matrix of the fit parameters:
a k b
a 1.000
k -0.994 1.000
b 0.467 -0.531 1.000
The resulting plot is
Now the question is how you can find "good" initial approximations for your parameters? Well, you start with
If you differentiate this equation you get
or
The left-hand side of this equation is some constant 'C', so the expression in the right-hand side should be equal to this constant as well:
In other words, the reciprocal of the derivative of your data should be approximated by a linear function. So, from your data x[i], y[i] you can construct the reciprocal derivatives x[i], (x[i+1]-x[i])/(y[i+1]-y[i]) and the linear fit of these data:
The fit gives the following values:
C*k = 0.0236179
C*b = 0.106268
Now, we need to find the values for a, and C. Let's say, that we want the resulting graph to pass close to the starting and the ending point of our dataset. That means, that we want
a*log(k*x1 + b) = y1
a*log(k*xn + b) = yn
Thus,
a*log((C*k*x1 + C*b)/C) = a*log(C*k*x1 + C*b) - a*log(C) = y1
a*log((C*k*xn + C*b)/C) = a*log(C*k*xn + C*b) - a*log(C) = yn
By subtracting the equations we get the value for a:
a = (yn-y1)/log((C*k*xn + C*b)/(C*k*x1 + C*b)) = 47.51
Then,
log(k*x1+b) = y1/a
k*x1+b = exp(y1/a)
C*k*x1+C*b = C*exp(y1/a)
From this we can calculate C:
C = (C*k*x1+C*b)/exp(y1/a)
and finally find the k and b:
k=0.226
b=1.0016
These are the values used above for finding the better fit.
UPDATE
You can automate the process described above with the following script:
# Name of the file with the data
data='R_B.txt'
# The coordinates of the last data point
xn=NaN
yn=NaN
# The temporary coordinates of a data point used to calculate a derivative
x0=NaN
y0=NaN
linearFit(x)=Ck*x+Cb
fit linearFit(x) data using (xn=$1,dx=$1-x0,x0=$1,$1):(yn=$2,dy=$2-y0,y0=$2,dx/dy) via Ck, Cb
# The coordinates of the first data point
x1=NaN
y1=NaN
plot data using (x1=$1):(y1=$2) every ::0::0
a=(yn-y1)/log((Ck*xn+Cb)/(Ck*x1+Cb))
C=(Ck*x1+Cb)/exp(y1/a)
k=Ck/C
b=Cb/C
f(x)=a*log(k*x+b)
fit f(x) data via a,k,b
plot data, f(x)
pause -1
Could someone tell me if I've coded this correctly? This is my code for solving for the sides of a triangle given its perimeter, altitude, and angle (for the algebra see http://www.analyzemath.com/Geometry/challenge/triangle_per_alt_angle.html)
Prompt P
Prompt H
Prompt L [the angle]
(HP^2)/(2H(1+cos(L))+2Psin(L))→Y
(-P^2-2(1+cos(L))Y/(-2P)→Z
(Z+sqrt(Z^2-4Y))/2→N
[The same as above but Z-sqrt...]→R
If N>0
N→U
If R>0
R→U
Y/U→V
sqrt(U^2+V^2-2UVcos(L))→W
Disp U
Disp V
Disp W
Also, how would I fix this so that I can input angle = 90?
Also, in this code does it matter if the altitude is the one between b and c (refer to the website again)?
Thanks in advance
The code already works with L=90°.
Yes, the altitude must be the distance from point A to the base a between points B and C, forming a right-angle with that base. The derivation made that assumption, specifically with respect to the way it used h and a in the second area formula 1/2 h a. That exact formula would not apply if h was drawn differently.
The reason your second set of inputs resulted in a non-real answer is that sometimes a set of mathematical parameters can be inconsistent with each other and describe an impossible construct, and your P, h, and L values do exactly that. Specifically, they describe an impossible triangle.
Given an altitude h and angle L, the smallest perimeter P that can be achieved is an isosceles triangle split down the middle by h. With L=30, this would have perimeter P = a + b + c = 2h tan15 + h/cos15 + h/cos15, which, plugging in your h=3, results in P=7.819. You instead tried to use P=3+sqrt(3)=4.732. Try using various numbers less than 7.819 (plus a little; I've rounded here) and you'll see they all result in imaginary results. That's math telling you you're calculating something that cannot exist in reality.
If you fill in the missing close parenthesis between the Y and the / in line 5, then your code works perfectly.
I wrote the code slightly differently from you, here's what I did:
Prompt P
Prompt H
Prompt L
HP²/(2H(1+cos(L))+2Psin(L))→Y
(HP-Ysin(L))/H→Z
Z²-4Y→D
If D<0:Then
Disp "IMAGINARY"
Stop
End
(Z+√(D))/2→C
Y/C→B
P-(B+C)→A
Disp A
Disp B
Disp C
Edit: #Gabriel, there's nothing special (with respect to this question) about the angles 30-60-90; there is an infinite number of sets of P, h, and L inputs that describe such triangles. However, if you actually want to arrive at such triangles in the answer, you've actually changed the question; instead of just knowing one angle L plus P and h, you now know three angles (30-60-90) plus P and h. You've now over-specified the triangle, so that it is pretty well certain that a randomly generated set of inputs will describe an impossible triangle. As a contrived example, if you specified h as 0.0001 and P as 99999, then that's clearly impossible, because a triangle with a tiny altitude and fairly unextreme angles (which 30-60-90 are) cannot possibly achieve a perimeter many times its altitude.
If you want to start with just one of P or h, then you can derive equations to calculate all parameters of the triangle from the known P or h plus the knowledge of the 30-60-90 angles.
To give one example of this, if we assume that side a forms the base of the triangle between the 90° and 60° angles, then we have L=30 and (labelling the 60° angle as B) we have h=b, and you can get simple equations for all parameters:
P = a + h + c
sin60 = h/c
cos60 = a/c
=> P = c cos60 + c sin60 + c
P = c(cos60 + sin60 + 1)
c = P/(cos60 + sin60 + 1)
b = h = c sin60
a = c cos60
Plugging in P=100 we have
c = 100/(cos60 + sin60 + 1) = 42.265
b = h = 36.603
a = 21.132
If you plug in P=100, h=36.603, and L=30 into the code, you'll see you get these exact results.
Always optimize for speed, then size.
Further optimizing bgoldst's code:
Prompt P,H,L
HP²/(2H(1+cos(L))+2Psin(L
.5(Z+√((HP-sin(L)Ans)/H)²-4Ans
{Y/C→B,P-B-Ans,Ans
How can I transform the blue curve values into linear (red curve)? I am doing some tests in excel, but basically I have those blue line values inside a 3D App that I want to manipulate with python so I can make those values linear. Is there any mathematical approach that I am missing?
The x axis goes from 0 to 90, and the y axis from 0 to 1.
For example: in the middle of the graph the blue line gives me a value of "0,70711", and I know that in linear it is "0,5". I was wondering if there's an easy formula to transform all the incoming non-linear values into linear.
I have no idea what "formula" is creating that non-linear blue line, also ignore the yellow line since I was just trying to "reverse engineer" to see if would lead me to any conclusion.
Thank you
Find a linear function y = ax + b that for x = 0 gives the value 1 and for x = 90 gives 0, just like the function that is represented by a blue curve.
In that case, your system of equations is the following:
1 = b // for x = 0
0 = a*90 + b // for x = 90
Solution provided by solver is the following : { a = -1/90, b = 1 }, the red linear function will have form y = ax + b, we put the values of a and b we found from the solver and we discover that the linear function you are looking for is y = -x/90 + 1 .
The tool I used to solve the system of equations:
http://wims.unice.fr/wims/en_tool~linear~linsolver.en.html
What exactly do you mean? You can calculate points on the red line like this:
f(x) = 1-x/90
and the point then is (x,f(x)) = (x, 1-x/90). But to be honest, I think your question is still rather unclear.
I have a class implementing an audio stream that can be read at varying speed (including reverse and fast varying / "scratching")... I use linear interpolation for the read part and everything works quite decently..
But now I want to implement writing to the stream at varying speed as well and that requires me to implement a kind of "reverse interpolation" i.e. Deduce the input sample vector Z that, interpolated with vector Y will produce the output X (which I'm trying to write)..
I've managed to do it for constant speeds, but generalising for varying speeds (e.g accelerating or decelerating) is proving more complicated..
I imagine this problem has been solved repeatedly, but I can't seem to find many clues online, so my specific question is if anyone has heard of this problem and can point me in the right direction (or, even better, show me a solution :)
Thanks!
I would not call it "reverse interpolation" as that does not exists (my first thought was you were talking about extrapolation!). What you are doing is still simply interpolation, just at an uneven rate.
Interpolation: finding a value between known values
Extrapolation: finding a value beyond known values
Interpolating to/from constant rates is indeed much much simpler than the generic quest of "finding a value between known values". I propose 2 solutions.
1) Interpolate to a significantly higher rate, and then just sub-sample to the nearest one (try adding dithering)
2) Solve the generic problem: for each point you need to use the neighboring N points and fit a order N-1 polynomial to them.
N=2 would be linear and would add overtones (C0 continuity)
N=3 could leave you with step changes at the halfway point between your source samples (perhaps worse overtones than N=2!)
N=4 will get you C1 continuity (slope will match as you change to the next sample), surely enough for your application.
Let me explain that last one.
For each output sample use the 2 previous and 2 following input samples. Call them S0 to S3 on a unit time scale (multiply by your sample period later), and you are interpolating from time 0 to 1. Y is your output and Y' is the slope.
Y will be calculated from this polynomial and its differential (slope)
Y(t) = At^3 + Bt^2 + Ct + D
Y'(t) = 3At^2 + 2Bt + C
The constraints (the values and slope at the endpoints on either side)
Y(0) = S1
Y'(0) = (S2-S0)/2
Y(1) = S2
Y'(1) = (S3-S1)/2
Expanding the polynomial
Y(0) = D
Y'(0) = C
Y(1) = A+B+C+D
Y'(1) = 3A+2B+C
Plugging in the Samples
D = S1
C = (S2-S0)/2
A + B = S2 - C - D
3A+2B = (S3-S1)/2 - C
The last 2 are a system of equations that are easily solvable. Subtract 2x the first from the second.
3A+2B - 2(A+B)= (S3-S1)/2 - C - 2(S2 - C - D)
A = (S3-S1)/2 + C - 2(S2 - D)
Then B is
B = S2 - A - C - D
Once you have A, B, C and D you can put in an time 't' in the polynomial to find a sample value between your known samples.
Repeat for every output sample, reuse A,B,C&D if the next output sample is still between the same 2 input samples. Calculating t each time is similar to Bresenham's line algorithm, you're just advancing by a different amount each time.
I've generated a plot of the attenutation seen in an electrical trace up to a frequency of 14e10 rad/s. The ydata ranges from approximately 1-10 Np/m. I'm trying to generate a fit of the form
y = A*sqrt(x) + B*x + C*x^2.
I expect A to be around 10^-6, B to be around 10^-11, and C to be around 10^-23. However, the smallest coefficient lsqcurvefit will return is 10^-7. Also, its will only return a nonzero coefficient for A, while returning 0 for B and C. The fit actually looks really good however the physics indicates that B and C should not be 0.
Here is how I'm calling the function
% measurement estimate
x_alpha = [1e-6 1e-11 1e-23];
lb = [1e-7, 1e-13, 1e-25];
ub = [1e-3, 1e-6, 1e-15];
x_alpha = lsqcurvefit(#modelfun, x_alpha, omega, alpha_t, lb,ub)
Here is the model function
function [ yhat ] = modelfun( x, xdata )
yhat = x(1)*xdata.^.5 + x(2)*xdata + x(3)*xdata.^2;
end
Is it possible to get lsqcurvefit to return such small coefficients? Is the error in rounding or is it something else? Any ways I can change the tolerance to see a fit closer to what I expect?
Found a stackoverflow page that seems to address this issue!
fit using lsqcurvefit