I have two data sets, x F(x) and x G(x), and want to find, for instance, the optimal a such that F(x)=G(a.x) (the same approach could be applied to any functional of G).
Now I understand that the fit function of gnuplot relies on fitting an analytical form of a function to a data set, so I guess my exact question is :
Can I 'spawn' a function with my first data set that I would use to fit on the second ?
Something like (pseudo-code) :
f(x)='data_set1.dat'(x)
g(x)=f(a*x)
fit g(x) 'data_set2.dat' via a
I guess any kind of interpolation (polynomial, etc.) could give me the ability to spawn such a function, but it seems like an awful lot of trouble (and approximations).
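A minimal sketch of the interpolation idea, written in Python rather than gnuplot (my choice, not something from the question). It tabulates G by piecewise-linear interpolation and searches for the a that minimises the mean squared difference between F(x) and G(a*x). The function names, the golden-section search and the search bounds a_lo and a_hi are all assumptions for illustration; the search only finds the right answer if the residual has a single minimum inside the bounds.

```python
from bisect import bisect_right


def make_interpolator(xs, ys):
    """Piecewise-linear function through (xs, ys); xs must be strictly increasing."""
    def g(x):
        if x < xs[0] or x > xs[-1]:
            return None  # outside the tabulated range
        i = min(bisect_right(xs, x), len(xs) - 1)
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return g


def mean_sq_residual(a, xs_f, ys_f, g):
    """Mean of (F(x) - G(a*x))**2 over the points where a*x is inside G's range."""
    total, n = 0.0, 0
    for x, y in zip(xs_f, ys_f):
        gy = g(a * x)
        if gy is not None:
            total += (y - gy) ** 2
            n += 1
    return total / n if n else float("inf")


def golden_section_min(func, lo, hi, tol=1e-8):
    """Minimise a unimodal function of one variable on [lo, hi]."""
    inv_phi = (5 ** 0.5 - 1) / 2
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if func(c) < func(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2


def fit_scale(xs_f, ys_f, xs_g, ys_g, a_lo, a_hi):
    """Return the a in [a_lo, a_hi] that best satisfies F(x) = G(a*x)."""
    g = make_interpolator(xs_g, ys_g)
    return golden_section_min(lambda a: mean_sq_residual(a, xs_f, ys_f, g), a_lo, a_hi)
```

Linear interpolation keeps the approximation error easy to reason about; a spline would be smoother but adds the approximations the question is worried about.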
I have a problem fitting an exponential function
f(x) = A*exp(-b*x)*sin(2*pi*x/T + phi) + S
data
The fit kept coming out as a straight line. Then I tried giving it some starting values for A, b, T, phi and S, and it became something closer to the data, but the fit was still poor.
Multidimensional fitting is far from trivial, and algorithms often fail on problems like this one. Try to help the algorithm by giving a better initial guess. You can also try to fit the variables one by one, e.g., the average S first, then the period length, then these two together, etc.
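One possible way to get such a first guess directly from the data is sketched below in Python. It takes S as the mean of the y values, A as half the peak-to-peak range, and T as twice the average spacing between the points where the data cross the mean. The function name guess_damped_sine is hypothetical, and the whole recipe is a rough heuristic of mine: on noisy data the crossing count can be inflated, so the data may need smoothing first.

```python
def guess_damped_sine(xs, ys):
    """Crude starting values (A, T, S) for A*exp(-b*x)*sin(2*pi*x/T + phi) + S.

    xs must be sorted ascending. T is None if fewer than two crossings are found.
    """
    s0 = sum(ys) / len(ys)              # offset: mean level of the data
    a0 = (max(ys) - min(ys)) / 2.0      # amplitude: half the peak-to-peak range

    # Locate the x positions where the data cross the mean level,
    # using linear interpolation between neighbouring samples.
    crossings = []
    for k in range(len(xs) - 1):
        d0 = ys[k] - s0
        d1 = ys[k + 1] - s0
        if d0 * d1 < 0:
            crossings.append(xs[k] + (xs[k + 1] - xs[k]) * d0 / (d0 - d1))

    if len(crossings) < 2:
        return a0, None, s0

    # Successive crossings are about half a period apart.
    half_period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return a0, 2.0 * half_period, s0
```

The returned values can be assigned to A, T and S before calling fit; b and phi still need a guess of their own.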
Please also show how you tried to fit the function and say which version of gnuplot you used. If the 3rd column consists of 0s and you provided it as error values for fit in gnuplot v4, the fit fails completely.
On this given set of data, using a bad guess, the fit fails. But a better guess can succeed:
f(x)=A*exp(-b*x)*sin(2.*pi*x/T+phi)+S
A = 40.
b = 1/500.
T = 400.
phi = 1.
S = 170.
f_bad_guess(x) = 40. * exp(-x/500.) * sin(2.*pi*x/150+3.) + 170.
f_good_guess(x) = 40. * exp(-x/500.) * sin(2.*pi*x/400+1.) + 170.
fit f(x) "data.txt" via A,b,T,phi,S
p "data.txt" t "data", f(x) t "fitted function", f_good_guess(x) t "good initial guess set manually", f_bad_guess(x) t "bad initial guess set manually"
Non-linear regression is an iterative calculation that starts from "guessed" initial values of the parameters. Especially when the model involves sinusoidal functions, the key point is to start with guessed values close enough to the correct values, which are not known.
Probably your difficulty is in guessing good enough values (or the software's difficulty in trying good enough initial values).
A non-conventional method which is not iterative and which doesn't need initial values is explained in this paper: https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales. The application of this method in the present case is shown below:
If more accuracy is wanted, one has to try a non-linear regression (with any available software). Using the numerical values of the parameters found above as initial values increases the chances of good convergence.
I have to fit the following exponential function to time-series data (data).
$C(t) = C_{\infty} \left(1-\exp\left(-\frac{t}{\tau}\right)\right)$
I want to compute the time scale $\tau$ at which C(t) reaches $C_{\infty}$. I would like to ask for suggestions on how $\tau$ can be computed. I found an example here that uses curve fitting. But I am not sure how to use curve_fit from scipy to set up the problem described above.
One cannot expect a good fit along the whole curve with the function that you chose.
This is because, especially at t=0, this function returns C=0 while the data value is C=2.5. This is very far off considering the order of magnitude.
Nevertheless, one can try to fit this function for a rough result. A non-linear regression is necessary: this is the usual approach, using available software. This is the recommended method in the context of academic exercises.
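Since the question asks about scipy, here is a minimal sketch of how the non-linear regression could be set up with scipy.optimize.curve_fit. The helper names saturating_exp and fit_tau and the way the starting values are picked are assumptions for illustration, not part of the question. Note that for this model C(t) only approaches $C_{\infty}$ asymptotically; at t = $\tau$ it has reached about 63% of $C_{\infty}$.

```python
import numpy as np
from scipy.optimize import curve_fit


def saturating_exp(t, c_inf, tau):
    """Model C(t) = C_inf * (1 - exp(-t / tau))."""
    return c_inf * (1.0 - np.exp(-t / tau))


def fit_tau(t, c, p0=None):
    """Fit C_inf and tau; returns (popt, pcov) exactly as curve_fit does."""
    t = np.asarray(t, dtype=float)
    c = np.asarray(c, dtype=float)
    if p0 is None:
        # Rough starting values (an assumption of this sketch):
        # plateau ~ largest observed value,
        # tau ~ first time at which the data reach about 63% of that plateau.
        c_inf0 = c.max()
        reached = np.nonzero(c >= 0.632 * c_inf0)[0]
        if reached.size and t[reached[0]] > 0:
            tau0 = t[reached[0]]
        else:
            tau0 = t.mean()
        p0 = (c_inf0, tau0)
    return curve_fit(saturating_exp, t, c, p0=p0)
```

With the time column in t and the measurements in c, popt, pcov = fit_tau(t, c) gives popt[1] as the estimate of $\tau$, and the square roots of the diagonal of pcov as the estimated standard errors.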
Alternatively, and more simply, a linear regression can be used thanks to a non-conventional method explained in https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales .
The result is shown below.
For a better fit one has to take into account the almost constant value of the data in the neighborhood of t=0. Choosing a function made of two logistic functions would be recommended, but the calculation is more complicated.
IN ADDITION, AFTER THE OP CHANGED THE DATA:
The change of data makes the above answer out of date.
In fact, artificially changing the origin of the y-scale so that y=0 at t=0 changes nothing. The slope at t=0 of the chosen function is far from zero, while the slope of the data curve is almost 0. This remains incompatible.
Definitely, the chosen function y=C*(1-exp(-t/tau)) cannot fit the data correctly (neither the preceding data nor the new data).
As already pointed out, for a better fit one has to take into account the almost constant value of the data in the neighborhood of t=0. Choosing a function made of two logistic functions would be recommended, but the calculation is more complicated.
I have the datafile:
10.0000 -330.12684910
15.0000 -332.85109334
20.0000 -333.85785274
25.0000 -334.18315783
30.0000 -334.28078907
35.0000 -334.30486903
40.0000 -334.30824069
45.0000 -334.30847874
50.0000 -334.30940105
55.0000 -334.31091085
60.0000 -334.31217217
The commands I used to fit this
f(x) = a+b*exp(c*x)
fit f(x) datafile via a, b, c
didn't get the negative exponential that I expected, then just to see how the hyperbola fitted I tried
f(x) = a+b/x
fit f(x) datafile via a, b
but decided to do this:
f(x) = a+b*exp(-c*x)
fit f(x) datafile via a, b, c
and it worked. I continued doing fits, but at some point it started to give this error: undefined value during function evaluation.
I restarted the session and deleted the fit.log file. I thought it was a gnuplot bug, but since then I always receive the undefined value error. I've been reading about similar issues. It could be the a, b, c seeds, but I have entered values very similar to the ones I received the time it fitted well, and it still didn't work. I'm thinking the problem might be chaotic or I'm doing something wrong.
Thank you for your method. I understand gnuplot uses a non-linear least-squares method for fitting.
I found that one solution is to use the model y=a+b*exp(-c*c*x); I also found better initial values and it worked. Anyway, I have this other dataset:
2 -878.11598213
6 -878.08846509
10 -878.08105262
19 -878.07882425
28 -878.07793702
44 -878.07755010
60 -878.07738151
85 -878.07729504
110 -878.07725107
And gnuplot fit does the job, but really badly. Instead I used your method; here I show a comparison:
[Figure: gnuplot fit and jjaquelin comparison]
It is way better.
I don't know the algorithm used by gnuplot. Probably an iterative method starting from guessed values of the parameters. The difficulty might come from unsuitable initial values and/or from non-convergence of the process.
From my own calculation the result is close to the values below. The method, which is not iterative and doesn't require an initial guess, is explained in the paper: https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales
FOR INFORMATION :
The linearisation of the regression is obtained thanks to the integral equation
to which the function to be fitted is a solution.
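To make the idea concrete for the model y = a + b*exp(c*x) from the question: differentiating gives y' = c*(y - a), and integrating from x_1 to x gives y(x) - y(x_1) = c * (integral of y from x_1 to x) - a*c*(x - x_1), which is linear in the unknowns c and a*c. The Python sketch below turns this into two linear regressions. It is my own illustration under stated assumptions, not code from the paper: the names fit_exp_integral and solve_2x2 are hypothetical, and the integral is approximated with the trapezoidal rule, so the accuracy depends on how finely the data are sampled.

```python
import math


def solve_2x2(m11, m12, m22, r1, r2):
    """Solve the symmetric system [[m11, m12], [m12, m22]] * [u, v] = [r1, r2]."""
    det = m11 * m22 - m12 * m12
    return (r1 * m22 - r2 * m12) / det, (m11 * r2 - m12 * r1) / det


def fit_exp_integral(xs, ys):
    """Non-iterative fit of y = a + b*exp(c*x); xs must be sorted ascending."""
    # Cumulative trapezoidal integral S_k of y from x_1 to x_k.
    s = [0.0]
    for k in range(1, len(xs)):
        s.append(s[-1] + 0.5 * (ys[k] + ys[k - 1]) * (xs[k] - xs[k - 1]))

    # Step 1: linear least squares for y_k - y_1 = A*(x_k - x_1) + c*S_k,
    # where A stands for -a*c. Only c is kept from this step.
    dx = [x - xs[0] for x in xs]
    dy = [y - ys[0] for y in ys]
    m11 = sum(u * u for u in dx)
    m12 = sum(u * v for u, v in zip(dx, s))
    m22 = sum(v * v for v in s)
    r1 = sum(u * w for u, w in zip(dx, dy))
    r2 = sum(v * w for v, w in zip(s, dy))
    _, c = solve_2x2(m11, m12, m22, r1, r2)

    # Step 2: with c fixed, y = a + b*exp(c*x) is linear in a and b.
    e = [math.exp(c * x) for x in xs]
    n = float(len(xs))
    se = sum(e)
    see = sum(v * v for v in e)
    sy = sum(ys)
    sey = sum(v * w for v, w in zip(e, ys))
    a, b = solve_2x2(n, se, see, sy, sey)
    return a, b, c
```

Calling fit_exp_integral on the two columns of the data file returns (a, b, c) directly; the values can also serve as starting values for gnuplot's fit.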
The paper referenced above is mainly written in French. It is partially translated in: https://scikit-guess.readthedocs.io/en/latest/appendices/references.html
I want to plot a function y=f(x) in Excel and do some operations on it. The function is defined on an interval (x1,x2) with a defined step xs. Obviously I can define the vector x by hand, but I cannot manage to define it automatically, something like what I do in Matlab using (x1:xs:x2). Is there any way to do that?
I'm having trouble fitting the following data in gnuplot 4.4
0.0007629768 -0.1256279199 0.0698209297
0.0007565689 0.5667065856 0.0988522507
0.00071274 1.3109126758 0.7766233743
f1(x) = -a1 * x + b1
a1 = 28000
fit f1(x) "56demo.csv" using 1:2:3 via a1, b1
plot "56demo.csv" using 1:2:3 with yerrorbars title "56%", \
f1(x) notitle
This converges to values of a1 and b1 which are higher than I would like.
Several similar tests converge to values in the range in which they should be, but for some reason these don't.
Specifically, I'd like to have
a1 = 28000, approximately.
I'm looking for some way to hit a local minimum. I've tried making the fit limit smaller, but I haven't had much luck that way.
Is it possible to set an upper limit to the values of a1 and b1? That is one way I'd like to try.
Thanks
The most common method of fitting is the chi-square (χ²) method. Chi-square is the expression
$\chi^2 = \sum_i \left(\frac{y_i - f(x_i)}{\sigma_i}\right)^2$
where $x_i$, $y_i$ and $\sigma_i$ are the data points with error in y, and f(x) is a model function which describes your data. This function has some parameters, and the goal is to find the parameter values for which this expression has a global minimum. A program like gnuplot will try several sets of values for these parameters to find the one set for which χ² is minimal.
In general, several things can go wrong, which usually means that the algorithm has found a local minimum, not the global one. This happens, for example, when the initial values for the parameters are bad. It helps to estimate the initial values as well as possible.
Another problem arises when the algorithm takes steps that are too large between the sets of parameter values. This often happens, for example, if you have a very narrow peak on a broader peak. Usually, you will end up with a parameter set describing a sum of two identical peaks, which describes the broad peak well and ignores the narrow one. Again, a good initial value set will help. You may also first keep the peak positions fixed (i.e. not in the via-list in gnuplot) and fit all other parameters, and then fit all parameters in a second command.
But if f(x) is a linear function, these problems do not exist!
You can replace f(x) by m*x+b and do the math. The result is that χ² is a parabola in the parameter space, which has a single, unique minimum, which can also be calculated explicitly.
So, if gnuplot gives you a set of parameters for that data, this result is absolutely correct, even if you don't like that result.
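The explicit minimum mentioned above can be written down in a few lines. Setting the derivatives of χ² with respect to m and b to zero gives two linear equations, whose solution is the standard weighted least-squares line. The sketch below is in Python for illustration; the function name weighted_line_fit is hypothetical.

```python
def weighted_line_fit(xs, ys, sigmas):
    """Return (m, b) minimising sum(((y - (m*x + b)) / sigma)**2)."""
    w = [1.0 / (s * s) for s in sigmas]          # weights 1/sigma_i^2
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    delta = sw * sxx - sx * sx                   # zero only if all x are equal
    m = (sw * sxy - sx * sy) / delta
    b = (sxx * sy - sx * sxy) / delta
    return m, b
```

For the model f1(x) = -a1 * x + b1 from the question, a1 corresponds to -m and b1 to b. Because the minimum is unique, any correct fitting program should arrive at the same pair of values up to numerical precision.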