Fitting an exponential function in gnuplot

I have a problem fitting the exponential function
f(x) = A*exp(-b*x)*sin(2*pi*x/T + phi) + S
to my data (data).
The fit kept coming out as a straight line. I then tried giving it some starting values for A, b, T, phi and S, and the curve got closer to the data, but the fit is still poor.

Multidimensional fitting is very non-trivial and algorithms often fail on this one. Help the algorithm by giving it a better initial guess. You can also fit the variables one by one, e.g. the average S first, then the period T, then these two together, and so on; a sketch of this staged approach follows below.
Please also show how you tried to fit the function and which version of gnuplot you used. In gnuplot 4, if the third column consists of zeros and you pass it to fit as error values, the fit fails completely.
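The question is about gnuplot, but as an illustration, here is a minimal sketch of that fit-one-parameter-at-a-time idea in Python/SciPy (the file name data.txt is an assumption, and the starting values are borrowed from the answer below):
import numpy as np
from scipy.optimize import curve_fit

# f(x) = A*exp(-b*x)*sin(2*pi*x/T + phi) + S
def f(x, A, b, T, phi, S):
    return A * np.exp(-b * x) * np.sin(2 * np.pi * x / T + phi) + S

x, y = np.loadtxt("data.txt", usecols=(0, 1), unpack=True)  # assumed file name

A0, b0, T0, phi0 = 40.0, 1.0 / 500.0, 400.0, 1.0  # rough guesses (see below)

# stage 1: fit only the offset S, everything else frozen
(S0,), _ = curve_fit(lambda x, S: f(x, A0, b0, T0, phi0, S), x, y, p0=[y.mean()])

# stage 2: fit the period T and the offset S together
(T0, S0), _ = curve_fit(lambda x, T, S: f(x, A0, b0, T, phi0, S), x, y, p0=[T0, S0])

# stage 3: release all parameters, starting from the staged results
popt, pcov = curve_fit(f, x, y, p0=[A0, b0, T0, phi0, S0])
print(popt)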
On the given set of data the fit fails with a bad initial guess, but a better guess can succeed:
f(x)=A*exp(-b*x)*sin(2.*pi*x/T+phi)+S
A = 40.
b = 1/500.
T = 400.
phi = 1.
S = 170.
f_bad_guess(x) = 40. * exp(-x/500.) * sin(2.*pi*x/150+3.) + 170.
f_good_guess(x) = 40. * exp(-x/500.) * sin(2.*pi*x/400+1.) + 170.
fit f(x) "data.txt" via A,b,T,phi,S
p "data.txt" t "data", f(x) t "fitted function", f_good_guess(x) t "good initial guess set manually", f_bad_guess(x) t "bad initial guess set manually"

Non-linear regression is an iterative calculation that starts from "guessed" initial values of the parameters. Especially when the model involves sinusoidal functions, the key point is to start with guesses close enough to the correct values, which are not known.
Your difficulty is probably in guessing good enough values (or the software's difficulty in choosing good enough initial values on its own).
A non-conventional method which is not iterative and which doesn't need initial values is explained in this paper: https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales. The application of this method to the present case is shown below:
If more accuracy is wanted, one has to run a non-linear regression (with available software). Using the numerical values of the parameters found above as initial values increases the chances of good convergence.

Related

Fit an exponential function to time-series data

I have to fit the following exponential function to time-series data (data).
$C(t) = C_{\infty}\,(1 - \exp(-t/\tau))$
I want to compute the time scale $\tau$ at which $C(t)$ reaches $C_{\infty}$, and would like suggestions on how $\tau$ can be computed. I found an example here that uses curve fitting, but I am not sure how to use scipy's curve_fit to set up the problem described above.
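For what it's worth, a minimal curve_fit sketch for this model (the file name data.txt and the initial guesses are assumptions; adapt them to your data):
import numpy as np
from scipy.optimize import curve_fit

# C(t) = C_inf * (1 - exp(-t/tau))
def model(t, C_inf, tau):
    return C_inf * (1.0 - np.exp(-t / tau))

t, C = np.loadtxt("data.txt", unpack=True)   # assumed file with columns t, C(t)

# initial guesses: the plateau level for C_inf, a fraction of the time span for tau
p0 = [C.max(), t.max() / 5.0]
(C_inf, tau), pcov = curve_fit(model, t, C, p0=p0)
perr = np.sqrt(np.diag(pcov))                # asymptotic standard errors
print(C_inf, tau, perr)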
One cannot expect a good fit along the whole curve with the function that you chose.
This is because at t=0 this function returns C=0, while the data value is C=2.5, which is very far off considering the order of magnitude.
Nevertheless, one can try to fit this function for a rough result. A non-linear regression calculation is necessary: this is the usual approach with available software, and the recommended method in the context of academic exercises.
Alternatively and more simply, a linear regression can be used, thanks to a non-conventional method explained in https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales .
The result is shown below.
For a better fit one has to take into account the almost constant value of the data in the neighbourhood of t=0. A function made of two logistic functions would be recommended, but the calculation is more complicated.
IN ADDITION, AFTER THE OP CHANGED THE DATA:
The change of data makes the above answer out of date.
In fact, artificially changing the origin of the y-scale so that y=0 at t=0 changes nothing. The slope at t=0 of the chosen function is far from zero, while the slope of the data curve is almost 0. This remains incompatible.
Definitely, the chosen function y=C*(1-exp(-t/tau)) cannot fit the data correctly (neither the preceding data nor the new data).
As already pointed out, for a better fit one has to take into account the almost constant value of the data in the neighbourhood of t=0. A function made of two logistic functions would be recommended, but the calculation is more complicated.

gnuplot fit undefined value during function evaluation

I have the datafile:
10.0000 -330.12684910
15.0000 -332.85109334
20.0000 -333.85785274
25.0000 -334.18315783
30.0000 -334.28078907
35.0000 -334.30486903
40.0000 -334.30824069
45.0000 -334.30847874
50.0000 -334.30940105
55.0000 -334.31091085
60.0000 -334.31217217
The commands I used to fit this were
f(x) = a+b*exp(c*x)
fit f(x) datafile via a, b, c
I didn't get the negative exponential that I expected, so just to see how a hyperbola fitted I tried
f(x) = a+b/x
fit f(x) datafile via a, b
but then decided to try this:
f(x) = a+b*exp(-c*x)
fit f(x) datafile via a, b, c
and it worked. I continued doing fits, but at some point it started to report the error undefined value during function evaluation.
I restarted the session and deleted the fit.log file. I thought it was a gnuplot bug, but since then I always get the undefined value error. I've been reading about similar issues. It could be the a, b, c seeds, but I have supplied values very similar to the ones I got the time it fitted well, and it didn't work. I'm thinking the problem might be chaotic, or I'm doing something wrong.
Thank you for your method. I understand gnuplot uses a non-linear least-squares method for fitting.
I found that one solution is to use the model y=a+b*exp(-c*c*x) (squaring c keeps the exponent negative for positive x, so exp cannot blow up during the iterations); I also found better initial values, and it worked. Anyway, I have this other dataset:
2 -878.11598213
6 -878.08846509
10 -878.08105262
19 -878.07882425
28 -878.07793702
44 -878.07755010
60 -878.07738151
85 -878.07729504
110 -878.07725107
And gnuplot's fit does the job, but really badly. Instead I used your method; here is a comparison:
[plot: gnuplot fit and jjacquelin method comparison]
It is way better.
I don't know the algorithm used by gnuplot. It is probably an iterative method starting from guessed values of the parameters. The difficulty might come from unsuitable initial values and/or from non-convergence of the process.
From my own calculation the result is close to the values below. The method, which is not iterative and doesn't require initial guesses, is explained in the paper: https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales
FOR INFORMATION:
The linearisation of the regression is obtained thanks to the integral equation to which the function to be fitted is a solution: for $y = a + b\,e^{c x}$, integrating from $x_1$ gives $y - y_1 = -a\,c\,(x - x_1) + c\int_{x_1}^{x} y\,dt$, which is linear in the unknowns.
The paper referenced above is mainly written in French. It is partially translated in: https://scikit-guess.readthedocs.io/en/latest/appendices/references.html
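For the curious, here is a short Python/NumPy transcription of that linearisation for y = a + b*exp(c*x), applied to the second dataset above. This is my own sketch of the paper's recipe, not the author's code:
import numpy as np

# second dataset from the question
x = np.array([2, 6, 10, 19, 28, 44, 60, 85, 110], dtype=float)
y = np.array([-878.11598213, -878.08846509, -878.08105262, -878.07882425,
              -878.07793702, -878.07755010, -878.07738151, -878.07729504,
              -878.07725107])

# cumulative trapezoidal integral S_k of y from x_1 to x_k
S = np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))))

# linearised form: y - y1 = -a*c*(x - x1) + c*S ; least squares gives c
M = np.column_stack((x - x[0], S))
(_, c), *_ = np.linalg.lstsq(M, y - y[0], rcond=None)

# with c fixed, y = a + b*exp(c*x) is linear in a and b
N = np.column_stack((np.ones_like(x), np.exp(c * x)))
(a, b), *_ = np.linalg.lstsq(N, y, rcond=None)
print(a, b, c)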

Improving fit in gnuplot (by limiting parameter size?)

I'm having trouble fitting the following data in gnuplot 4.4
0.0007629768 -0.1256279199 0.0698209297
0.0007565689 0.5667065856 0.0988522507
0.00071274 1.3109126758 0.7766233743
f1(x) = -a1 * x + b1
a1 = 28000
fit f1(x) "56demo.csv" using 1:2:3 via a1, b1
plot "56demo.csv" using 1:2:3 with yerrorbars title "56%", \
f1(x) notitle
This converges to values of a1 and b1 which are higher than I would like.
Several similar tests converge to values in the range in which they should be, but for some reason these don't.
Specifically, I'd like to have
a1 = 28000, approximately.
I'm looking for some way to hit a local minimum. I've tried making the fit limit smaller, but I haven't had much luck that way.
Is it possible to set an upper limit to the values of a1 and b1? That is one way I'd like to try.
Thanks
The most common method of fitting is the chi-square (χ²) method. Chi-square is the expression
$\chi^2 = \sum_i \left( \frac{y_i - f(x_i)}{\sigma_i} \right)^2$
where $x_i$, $y_i$ and $\sigma_i$ are the data points with errors in y, and f(x) is a model function which describes your data. This function has some parameters, and the goal is to find those values of the parameters for which this expression has a global minimum. A program like gnuplot tries several sets of values for these parameters to find the one for which χ² is minimal.
In general, several things can go wrong, which usually means that the algorithm has found a local minimum, not the global one. This happens, for example, when the initial values of the parameters are bad. It helps to estimate the initial values as well as possible.
Another problem arises when the algorithm takes steps that are too big between sets of parameter values. This often happens, for example, if you have a very narrow peak on top of a broader peak: usually you will end up with a parameter set describing a sum of two identical peaks, which fits the broad peak well and ignores the narrow one. Again, a good set of initial values will help. You may also first keep the peak positions fixed (i.e. not in the via list in gnuplot) and fit all other parameters, and then fit all parameters in a second command.
But if f(x) is a linear function, these problems do not exist!
You can replace f(x) by m*x+b and do the math. The result is that χ² is a parabola in parameter space, with a single, unique minimum that can also be calculated explicitly (see below).
So, if gnuplot gives you a set of parameters for that data, the result is absolutely correct, even if you don't like it.
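For completeness, here is the explicit minimum for f(x) = m*x + b, i.e. the standard weighted least-squares formulas (with weights $w_i = 1/\sigma_i^2$, obtained by setting the derivatives of χ² with respect to m and b to zero):
$m = \dfrac{\sum w_i \,\sum w_i x_i y_i \;-\; \sum w_i x_i \,\sum w_i y_i}{\sum w_i \,\sum w_i x_i^2 \;-\; \left(\sum w_i x_i\right)^2}, \qquad b = \dfrac{\sum w_i y_i - m \sum w_i x_i}{\sum w_i}.$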

How to find a regression line for a closed set of data with 4 parameters in matlab or excel?

I have a set of data I have acquired from simulations. There are 3 parameters that go into my simulations and I get one result out.
I can graph the data from the small subset I have and see the trends for each input, but I need to be able to extrapolate this and get some form of regression equation, seeing as the simulation takes a long time.
In MATLAB or Excel, is it possible to list the inputs and outputs to obtain a 4-parameter regression line for a given set of information?
Before this gets flagged as a duplicate: I understand polyfit will give me an equation of best fit, as accurate as I want it, but I need the equation to correspond to the inputs, not just a regression line.
In other words, if I have 20 simulations with inputs a, b, c and output y, is there a way to obtain a "best fit":
y=B0+B1*a+B2*b+B3*c
using the data?
My usual recommendation for higher-dimensional curve fitting is to pose the problem as a minimization problem (that may be unneeded here with the nice linear model you've proposed, but I'm a hammer-nail guy sometimes).
It starts by creating a correlation function (the functional form you think maps your inputs to the output) given a vector of fit parameters p and input data xData:
correl = @(p,xData) p(1) + p(2)*xData(:,1) + p(3)*xData(:,2) + p(4)*xData(:,3);
Then you need to define a function to minimize given the parameter vector, which I call the objective; this is typically your correlation minus your output data.
The details of this function depend on the solver you'll use (see below).
All of the methods need a starting vector pGuess, which depends on the trends you see.
For a nonlinear correlation function, finding a good pGuess can be a trial, but it is necessary for a good solution.
fminsearch
To use fminsearch, the objective must return a scalar, so the residual is collapsed using some norm (the 2-norm here):
x = [a,b,c]; % your input data as columns of x
objective = @(p) norm(correl(p,x) - y, 2);
p = fminsearch(objective,pGuess); % you need to define a good pGuess
lsqnonlin
To use lsqnonlin (which solves the same problem as above in a different way), no norm is needed; the objective returns the residual vector directly:
objective = @(p) correl(p,x) - y;
p = lsqnonlin(objective,pGuess); % you need to define a good pGuess
(You can also specify lower and upper bounds on the parameter solution, which is nice.)
lsqcurvefit
To use lsqcurvefit (which is simply a wrapper for lsqnonlin), only the correlation function is needed along with the data:
p = lsqcurvefit(correl,pGuess,x,y); % you need to define a good pGuess
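Side note: since the proposed model is linear in B0..B3, plain least squares solves it in one shot, with no pGuess at all (in MATLAB the backslash operator does the same). A sketch in Python/NumPy with made-up data, just to show the shape of the computation:
import numpy as np

# made-up inputs a, b, c and output y standing in for the 20 simulations
rng = np.random.default_rng(1)
a, b, c = rng.random((3, 20))
y = 1.0 + 2.0*a - 3.0*b + 0.5*c + 0.01*rng.normal(size=20)

# design matrix [1, a, b, c]; lstsq returns B0..B3 directly
X = np.column_stack((np.ones_like(a), a, b, c))
B, *_ = np.linalg.lstsq(X, y, rcond=None)
print(B)  # approximately [1.0, 2.0, -3.0, 0.5]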

Gnuplot Curve Fitting With Time-Offset

I have an issue with the curve-fitting process in Gnuplot. I have data whose time axis starts at 0.5024. I want to use a linear sin/cos combination to fit a value M over time (M = a + b*sin(w*t) + c*cos(w*t)). For further processing I only need the c value.
My code is
f(x)=a+b*sin(w*x)+c*cos(w*x)
fit f(x) "data.dat" using 1:2 via a,b,c,w
the asymptotic standard error is 66% for parameter c, which seems quite high. I suspect it has to do with the fact that the time starts at 0.5024 instead of 0. What I could do, of course, is
fit f(x) "data.dat" using ($1-0.5024):2 via a,b,c,w
with an asymptotic error of about 10%, which is way lower. The question is: can I do that? Does my new fit with the time offset still represent the original curve? Any other ideas?
Thanks in advance for your help :-)
It's a bit difficult to answer this without having seen your data, but your observation is typical.
The problem is an effect of the fit itself, or even of your formula. Let me explain it using an example data set. (Well, this will become off-topic...)
A statistics excursion
The data follows the function f(x)=x, and all y-values have been shifted by Gaussian random numbers. In addition, the data lies in the x-range [600:800].
You can now simply apply a linear fit f(x)=m*x+b. By Gaussian error propagation, the error is df(x)=sqrt((dm*x)²+(db)²). So you can plot the data, the linear function and the error margin f(x) +/- df(x).
Here is the result:
The parameters:
m = 0.981822 +/- 0.1212 (12.34%)
b = 0.974375 +/- 85.13 (8737%)
The correlation matrix:
m b
m 1.000
b -0.997 1.000
You may notice three things:
The error for b is very large!
The error margin is small at x=0, but increases with x. Shouldn't it be smallest where the data is, i.e. at x=700?
The correlation between m and b is -0.997, which is near the maximum (absolute) value of 1.
The third point can be understood from the plot: if you increase the slope m, the y-offset b decreases, too. The two parameters are strongly correlated, and an error in one of them is distributed to the other!
From statistics you may know that a linear regression line always goes through the center of gravity (cog) of your data (a short derivation follows below). So let's shift the data so that the cog is at the origin (it would be enough to shift it so that the cog is on the y-axis, but I shifted it fully).
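Why the line goes through the cog: in the unweighted case, setting $\partial \chi^2/\partial b = -2\sum_i (y_i - m x_i - b) = 0$ gives $b = \bar{y} - m\bar{x}$, so the point $(\bar{x}, \bar{y})$ always lies on the fitted line.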
Result:
m = 1.0465 +/- 0.1211 (11.57%)
b = -12.0611 +/- 7.027 (58.26%)
Correlation:
m b
m 1.000
b -0.000 1.000
Compared to the first plot, the value and error of m are almost the same, but the very large error of b is much smaller now. The reason is that m and b are no longer correlated, so a (tiny) variation of m does not produce a (very big) variation of b. It is also nice to see that the error margin has shrunk a lot.
Here is a last plot with the original data, the first fit function, and the back-shifted version of the function fitted to the shifted data:
About your fit function:
First, there is a big correlation problem: b and c are extremely correlated, since together they define the phase and amplitude of your oscillation. It would help a lot to use another, equivalent function:
f(x)=a+N*sin(w*x+p)
Here, phase and amplitude are separated. You can still calculate your c from the fit results (see the identity below), and I guess its error will be much better.
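The two forms are linked by the standard identity $b\sin(wx) + c\cos(wx) = N\sin(wx + p)$ with $N = \sqrt{b^2 + c^2}$, $b = N\cos p$ and $c = N\sin p$; so c is recovered from the fitted N and p as $c = N\sin p$.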
As in my example, if the data is far away from the y-axis, a small variation of w will have a big impact on p. So I would suggest shifting your data so that its cog is on the y-axis, to get almost rid of this effect.
Is this shift allowed?
Yes. You do not alter the data; you simply change your coordinate system to get better errors. Also, the fit function should describe the data, so it should be most accurate in the range where your data is. In my first plot, the highest accuracy is at the y-axis, not where the data is.
Important
You should always note which tricks you applied. Otherwise, someone may check your results, fit the data without the tricks, see the red curve instead of your green one, and accuse you of cheating...
Whether you can do that depends on whether the curve you're fitting represents the physical phenomena you're studying and is consistent with the physical model you need to comply with. My suggestion is that you provide those details and ask this question again in a physics forum (or chemistry, biology, etc., depending on your field).
