Wrong fit with error bars in Gnuplot

Fitting without errors (works)
I made a simple linear fit in Gnuplot 5.0 using the command:
f(x)=a*x+b
fit f(x) 'file.dat' using 1:2 via a,b
I get the output:
degrees of freedom (FIT_NDF) : 6
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf) : 0.00794747
variance of residuals (reduced chisquare) = WSSR/ndf : 6.31623e-05
Final set of parameters Asymptotic Standard Error
======================= ==========================
p1 = -0.00964423 +/- 0.0004976 (5.159%)
p2 = 1.07794 +/- 0.01908 (1.77%)
The result is this:
Fitting with errors
Then I added very tiny error bars just to see how they influence the fitting results (expecting the difference from the previous case to be very small), but using the command
f(x)=a*x+b
fit f(x) 'file.dat' using 1:2:3 yerrors via a,b
I get a completely wrong fit. The output is:
degrees of freedom (FIT_NDF) : 6
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf) : 750.565
variance of residuals (reduced chisquare) = WSSR/ndf : 563348
p-value of the Chisq distribution (FIT_P) : 0
Final set of parameters Asymptotic Standard Error
======================= ==========================
p1 = -0.0115247 +/- 0.0003419 (2.967%)
p2 = 1.15636 +/- 0.01483 (1.282%)
Furthermore, if I set the errors to be much larger, the output stays the same as with the tiny errors.
Does anyone have suggestions? What did I do wrong?
Data
y x dy
0.64345112296614271 45.082768716145587 6.6513808914832773E-004
0.71703932263695935 38.322543680055119 1.8140129703996476E-004
0.62214826712778870 46.283953074076770 1.2093419803380392E-004
0.70999997854232788 39.152893923419398 3.9303614359375108E-004
0.75723404482236245 33.204658354605364 6.6513808915369822E-004
0.69366599317566635 39.410047372618159 5.8653086043387384E-003
0.75948892906677234 33.491967428263528 6.6513808915369822E-004
0.79365751671683227 28.533494222921814 1.2758557891916475E-002
For the first fit I used only the first two columns; for the second one I also used the third column as the error in y.
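With `yerrors`, fit minimizes the sum of ((y - f(x))/dy)², so every point is weighted by 1/dy². In this data dy ranges from about 1.2e-4 to 1.3e-2, so the weights differ by a factor of roughly 10^4, and the two points with the smallest error bars dominate the result. Multiplying every dy by the same constant rescales all weights equally and leaves the best-fit a and b unchanged, which would explain an unchanged result if all errors were enlarged by the same factor. The plain-Python sketch below (a closed-form least-squares solution, not gnuplot's iterative algorithm) is meant to reproduce both parameter sets from the outputs above, assuming the columns are y, x, dy as in the header; `linear_fit` is an illustrative helper.

```python
# Closed-form (weighted) least squares for y = a*x + b.
# Assumption: columns are y, x, dy, following the header "y x dy".
DATA = [
    (0.64345112296614271, 45.082768716145587, 6.6513808914832773e-04),
    (0.71703932263695935, 38.322543680055119, 1.8140129703996476e-04),
    (0.62214826712778870, 46.283953074076770, 1.2093419803380392e-04),
    (0.70999997854232788, 39.152893923419398, 3.9303614359375108e-04),
    (0.75723404482236245, 33.204658354605364, 6.6513808915369822e-04),
    (0.69366599317566635, 39.410047372618159, 5.8653086043387384e-03),
    (0.75948892906677234, 33.491967428263528, 6.6513808915369822e-04),
    (0.79365751671683227, 28.533494222921814, 1.2758557891916475e-02),
]


def linear_fit(xs, ys, weights):
    """Return (a, b) minimizing sum(w * (y - a*x - b)**2)."""
    sw = sum(weights)
    x_mean = sum(w * x for w, x in zip(weights, xs)) / sw
    y_mean = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(weights, xs, ys))
    a = sxy / sxx
    return a, y_mean - a * x_mean


ys = [row[0] for row in DATA]
xs = [row[1] for row in DATA]
dys = [row[2] for row in DATA]

# Unweighted fit: every point counts equally (like "using 1:2").
a_plain, b_plain = linear_fit(xs, ys, [1.0] * len(xs))
# Chi-square fit: each point weighted by 1/dy**2 (like "yerrors").
a_weighted, b_weighted = linear_fit(xs, ys, [1.0 / dy ** 2 for dy in dys])
# Same fit with every error bar made 100 times larger.
a_scaled, b_scaled = linear_fit(xs, ys, [1.0 / (100 * dy) ** 2 for dy in dys])

print(f"unweighted:  a = {a_plain:.6g}, b = {b_plain:.6g}")
print(f"weighted:    a = {a_weighted:.6g}, b = {b_weighted:.6g}")
print(f"errors x100: a = {a_scaled:.6g}, b = {b_scaled:.6g}")
```

If the weighted parameters match gnuplot's p1 and p2, the second fit is doing what was asked of it; the question then becomes whether error bars that differ by two orders of magnitude are realistic for this data.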

Related

Why isn't GNUPlot drawing a trendline that's fitted to the points in my dataset?

I have the following GNUPlot sequence of commands:
$ cat bb.gnuplot
set datafile separator ","
set autoscale x
set autoscale y
set xdata time
set timefmt "%Y%m%d"
set format x "%Y%m%d"
set key left top
set grid
m=1
b=1
f(x) = m*x + b
fit f(x) "bb" using 1:2 via m,b
plot "bb" using 1:2 title "filebeat-6.5.1", f(x) title "fit"
Along with this sample data:
$ cat bb
20190416,0
20190417,0
20190418,0
20190419,0
20190420,0
20190423,0
20190424,0
20190425,0
20190426,0
20190509,0
20190510,72
20190511,62
20190512,63
20190513,108
20190514,78
20190515,66
20190516,59
20190517,86
20190518,57
20190519,57
20190520,62
20190521,78
20190522,95
20190523,104
20190524,22
20190525,128
20190526,96
20190527,125
20190528,129
20190529,152
20190530,160
20190531,148
20190601,136
20190602,178
20190603,198
20190604,148
20190605,140
20190606,142
20190607,171
20190608,205
20190609,174
20190610,198
20190611,208
20190612,205
20190613,13
I'm trying to get GNUPlot to draw a trend line in the same plot, but where the line ends up in my plot doesn't make sense to me.
$ gnuplot < bb.gnuplot
iter chisq delta/lim lambda m b
0 1.0926745428e+20 0.00e+00 1.10e+09 1.000000e+00 1.000000e+00
1 1.3194958855e+16 -8.28e+08 1.10e+08 1.098907e-02 1.000000e+00
2 1.6307478323e+08 -8.09e+12 1.10e+07 1.279057e-06 1.000000e+00
3 2.1025098835e+05 -7.75e+07 1.10e+06 5.819285e-08 1.000000e+00
4 2.1025098815e+05 -9.56e-05 1.10e+05 5.819150e-08 1.000000e+00
iter chisq delta/lim lambda m b
After 4 iterations the fit converged.
final sum of squares of residuals : 210251
rel. change during last iteration : -9.56318e-10
degrees of freedom (FIT_NDF) : 43
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf) : 69.9254
variance of residuals (reduced chisquare) = WSSR/ndf : 4889.56
Final set of parameters Asymptotic Standard Error
======================= ==========================
m = 5.81915e-08 +/- 7.064e-06 (1.214e+04%)
b = 1 +/- 1.101e+04 (1.101e+06%)
correlation matrix of the fit parameters:
m b
m 1.000
b -1.000 1.000
Resulting graph:
I'm expecting the line to cut through my points and show me the optimally fitted line for the data points I've provided.
What am I missing here?
I can't find the appropriate section in the manual and can't explain it well, but replace your function with:
f(x) = m*(x-strptime("%Y%m%d","20190509")) + b
I guess it has something to do with offset/prescaling, because time/date data is handled internally as seconds elapsed since January 1st, 1970. So today, June 13th, 2019, is approx. 1,560,000,000 seconds, while your data spans only 58 days, about 5,000,000 seconds. This makes it difficult to find proper parameters. If I find a better explanation, I will add it (or maybe somebody else can explain it better).
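The correlation of -1.000 between m and b in the fit output is consistent with this. For an ordinary unweighted straight-line fit, the estimates of m and b have correlation -mean(x)/sqrt(mean(x²)), which depends only on the x values. The Python sketch below evaluates it for the question's dates converted to seconds since 1970-01-01 (assumed UTC here), once as raw timestamps and once shifted by the 20190509 reference date used above. `to_seconds` and `slope_intercept_correlation` are illustrative helpers, not gnuplot functions, and this illustrates the conditioning problem rather than reproducing gnuplot's algorithm.

```python
import math
from datetime import datetime, timezone

# Dates from the question's "bb" file; the y values are not needed because
# the correlation of the m and b estimates depends only on the x values.
DATES = """
20190416 20190417 20190418 20190419 20190420 20190423 20190424 20190425
20190426 20190509 20190510 20190511 20190512 20190513 20190514 20190515
20190516 20190517 20190518 20190519 20190520 20190521 20190522 20190523
20190524 20190525 20190526 20190527 20190528 20190529 20190530 20190531
20190601 20190602 20190603 20190604 20190605 20190606 20190607 20190608
20190609 20190610 20190611 20190612 20190613
""".split()


def to_seconds(stamp):
    """Seconds since 1970-01-01 (assumption: dates interpreted as UTC)."""
    return datetime.strptime(stamp, "%Y%m%d").replace(tzinfo=timezone.utc).timestamp()


def slope_intercept_correlation(xs):
    """Correlation of the least-squares estimates of m and b in y = m*x + b.

    Cov(m, b) = -sigma^2 * mean(x) / Sxx, which gives
    corr(m, b) = -mean(x) / sqrt(Sxx/n + mean(x)**2) = -mean(x) / sqrt(mean(x**2)).
    """
    n = len(xs)
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return -x_mean / math.sqrt(sxx / n + x_mean ** 2)


raw = [to_seconds(d) for d in DATES]
offset = to_seconds("20190509")  # same reference date as in the answer
shifted = [x - offset for x in raw]

print(f"{len(raw)} points, span = {(raw[-1] - raw[0]) / 86400:.0f} days")
print(f"corr(m, b), raw seconds:     {slope_intercept_correlation(raw):.9f}")
print(f"corr(m, b), shifted seconds: {slope_intercept_correlation(shifted):.3f}")
```

With raw timestamps the correlation is within about 5e-7 of -1, so a change in m can be almost perfectly compensated by a change in b; after the shift it drops to about -0.52.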
Result:

Gnuplot: Fitting asymptotic curve to data

I am trying to fit an asymptotic curve to my data using gnuplot. It is a dataset showing reaction time results over a testing period. I have been able to plot the data and fit a straight line through it using the following code.
f(x) = a*x + c;
fit f(x) 'ReactionLearning.txt' using 1:2 via a,c
plot 'ReactionLearning.txt' using 1:2 with points lt 1 pt 3 notitle, \
f(x) with lines notitle
Which gives the following result:
http://imgur.com/PlQmalX.jpg
However, as this is supposed to show a learning effect, an asymptotic curve would make a lot more sense because the increase in performance caused by a learning effect will eventually stop, making the line even out.
From what I understand, asymptotic curves are created with f(x) = 1/x. So I changed my code to:
f(x) = 1/(a*x)
fit f(x) 'ReactionLearning.txt' using 1:2 via a
plot 'ReactionLearning.txt' using 1:2 with points lt 1 pt 3 notitle, \
f(x) with lines notitle
However, I get this output: http://imgur.com/PimTa1T
Could someone explain what I am doing wrong here?
Thanks
There are many curves that show asymptotic behavior, and 1/x is probably not the one that comes up most often when describing physical or biological processes. Usually, these processes show some sort of exponential decay. With the data that you show, I don't think you can conclude anything about which model you should use, other than "it decays". If you already know what functional behavior to expect, that changes things. That said, the general form of your 1/x curve should be f(x) = a/(x-x0) + c, which will probably give you meaningful results when you fit to it:
f(x) = a/(x-x0) + c
fit f(x) "data" via a,c,x0
Since fitting this kind of function can be unstable if the initial values are bad, you may need to provide sensible initial values or reformulate the problem as a linear relation. You can do the latter with the change of variable u = 1/(x - x0), which makes the model linear in u, and by doing the fit for several values of x0. Record the error in the fit (which gnuplot outputs) for each of them and see how the error is minimized as a function of x0: it should be quadratic about the optimum value. Something like this:
f(x) = a*x + c
x0 = 1. # give some value for x0
fit f(x) "data" u (1./($1-x0)):2 via a,c # record fit errors for a and c
x0 = 3. # give some other value for x0
fit f(x) "data" u (1./($1-x0)):2 via a,c # record fit errors for a and c
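The same x0 scan can be sketched in Python. This is an illustration under stated assumptions: the data are hypothetical, generated without noise from y = a/(x - x0) + c with a = 2, x0 = -1 and c = 0.5, and the sum of squared residuals of each linear fit is used as the error measure instead of the parameter errors that gnuplot reports. `linear_fit` and `scan_x0` are illustrative helpers.

```python
# Hypothetical noise-free data from y = a/(x - x0) + c.
TRUE_A, TRUE_X0, TRUE_C = 2.0, -1.0, 0.5
xs = [float(i) for i in range(1, 21)]
ys = [TRUE_A / (x - TRUE_X0) + TRUE_C for x in xs]


def linear_fit(us, ys):
    """Ordinary least squares for y = a*u + c; returns (a, c, residual sum of squares)."""
    n = len(us)
    u_mean = sum(us) / n
    y_mean = sum(ys) / n
    suu = sum((u - u_mean) ** 2 for u in us)
    suy = sum((u - u_mean) * (y - y_mean) for u, y in zip(us, ys))
    a = suy / suu
    c = y_mean - a * u_mean
    rss = sum((y - (a * u + c)) ** 2 for u, y in zip(us, ys))
    return a, c, rss


def scan_x0(xs, ys, candidates):
    """Fit y against u = 1/(x - x0) for each trial x0; return the best (rss, x0, a, c)."""
    results = []
    for x0 in candidates:
        us = [1.0 / (x - x0) for x in xs]  # change of variable
        a, c, rss = linear_fit(us, ys)
        results.append((rss, x0, a, c))
    return min(results)  # tuples compare by rss first


# Trial x0 from -3.0 to 0.5 in steps of 0.1, all below min(xs) to avoid division by zero.
candidates = [k / 10 for k in range(-30, 6)]
best_rss, best_x0, best_a, best_c = scan_x0(xs, ys, candidates)
print(f"best x0 = {best_x0}, a = {best_a:.4f}, c = {best_c:.4f}, RSS = {best_rss:.3g}")
```

On real, noisy data the minimum would not be exactly zero; a finer grid around the best candidate, or a final non-linear fit started from it, would refine x0.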

What does the error message "w = 0 in Givens();" mean when trying curve fitting in gnuplot?

I keep getting the error message w = 0 in Givens(); when I try to use gnuplot's built-in curve-fitting feature.
I am trying to fit experimental data to a certain mathematical model in gnuplot.
I define the model function s(x):
gnuplot> z(x)=(x-mu)/be
gnuplot> s(x)=(k/be)*exp(-z(x)-exp(-z(x)))
Then I plot the actual data and the model function to get an initial guess for the model parameters:
Then I adjust the initial guess:
gnuplot> k=2.6; mu=-8.8;
gnuplot> replot
This gives a pretty good match:
Then I try to precisely fit the curve:
gnuplot> fit s(x) '701_707_TRACtdetq.log30.hist1.txt' u 2:6 via k,be,mu
What I get is a single iteration and an error message:
Iteration 0
WSSR : 3.85695 delta(WSSR)/WSSR : 0
delta(WSSR) : 0 limit for stopping : 1e-05
lambda : 0.223951
initial set of free parameter values
k = 2.6
be = 1
mu = -8.8
/
Iteration 1
WSSR : 0.0720502 delta(WSSR)/WSSR : -52.5315
delta(WSSR) : -3.7849 limit for stopping : 1e-05
lambda : 0.0223951
resultant parameter values
k = 2.03996
be = 0.777868
mu = -8.87082
w = 0 in Givens(); Cjj = 3.37383e-196, Cij = 2.54469e-192
And yet the curve fits pretty well:
What does that error mean, and how can I get the fit process going?
What I'm just about to say might seem strange but it works!
When I run into the 'w = 0 in Givens()' error I use:
gnuplot> set xrange [a,b]
where 'a' and 'b' are chosen to window the 'most interesting' parts. If you now do the fitting command that you have:
gnuplot> fit s(x) '701_707_TRACtdetq.log30.hist1.txt' u 2:6 via k,be,mu
You might find that your fit now converges. I'm not sure why 'set xrange' affects the fitting algorithm, but it does! In your example, I might let:
a = -12
b = -2
The error message w = 0 in Givens(); seems to be related to the inability of fit to perform the next iteration of the parameter estimation. The message is accompanied by the values of a certain matrix C[][] that is related to the direction of the next step of the fit iterations. Those values are usually very small, as in the example: Cjj = 3.37383e-196, Cij = 2.54469e-192. This means that the fit process has converged to a state where every other nearby set of fit parameters is less optimal than the current one (a local extremum), but the current delta is still above the convergence limit, in this case delta(WSSR) : -3.7849 limit for stopping : 1e-05. This happens when the data to be fitted exhibits a disturbance (at approximately x = -13 in this case) that yields a significant delta even though the fit looks good.
Long story short: the error usually happens when the fit is fine but the delta is still high.

get fit data out of gnuplot

I often use Octave to create data that I can plot from my lab results. That data is then fitted with some function in gnuplot:
f1(x) = a * exp(-x*g);
fit f1(x) "c_1.dat" using 1:2:3 via a,g
That creates a fit.log like this one (this excerpt is from a different fit):
*******************************************************************************
Tue May 8 19:13:39 2012
FIT: data read from "e_schwach.dat" using 1:2:3
format = x:z:s
#datapoints = 16
function used for fitting: schwach(x)
fitted parameters initialized with current variable values
Iteration 0
WSSR : 12198.7 delta(WSSR)/WSSR : 0
delta(WSSR) : 0 limit for stopping : 1e-05
lambda : 14.2423
initial set of free parameter values
mu2 = 1
omega2 = 1
Q2 = 1
After 70 iterations the fit converged.
final sum of squares of residuals : 46.0269
rel. change during last iteration : -2.66463e-06
degrees of freedom (FIT_NDF) : 13
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf) : 1.88163
variance of residuals (reduced chisquare) = WSSR/ndf : 3.54053
Final set of parameters Asymptotic Standard Error
======================= ==========================
mu2 = 0.120774 +/- 0.003851 (3.188%)
omega2 = 0.531482 +/- 0.0006112 (0.115%)
Q2 = 17.6593 +/- 0.7416 (4.199%)
correlation matrix of the fit parameters:
mu2 omega2 Q2
mu2 1.000
omega2 -0.139 1.000
Q2 -0.915 0.117 1.000
Is there some way to get the parameters and their error back into Octave? I mean I can write a Python program that parses that, but I hoped to avoid that.
Update
This question no longer applies to me, since I now use Python and matplotlib for my lab work, and they can do all this from a single program. I'm leaving this question open in case somebody else has the same problem.
I don't know much about the gnuplot-Octave interface, but here is something that can make your (parsing) life easier:
set fit errorvariables
fit a*x+g "c_1.dat" via a,g
set print "fit_parameters.txt"
print a,a_err
print g,g_err
set print
Now your variables and their respective errors are in the file "fit_parameters.txt", with no parsing needed from Python.
From the documentation on fit:
If gnuplot was built with this option, and you activated it using set fit errorvariables, the error for each fitted parameter will be stored in a variable named like the parameter, but with _err appended. Thus the errors can be used as input for further computations.
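Reading that file back is then just a matter of splitting lines. A minimal Python sketch, assuming each `print a,a_err` line holds the value and its error separated by whitespace; the file contents below are made-up numbers standing in for real fit output, and `read_fit_parameters` is an illustrative helper.

```python
import os
import tempfile


def read_fit_parameters(path):
    """Return a list of (value, error) pairs, one per line, in the order printed."""
    params = []
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) == 2:  # skip blank or unexpected lines
                params.append((float(fields[0]), float(fields[1])))
    return params


# Hypothetical contents of fit_parameters.txt (not real fit output).
example = "1.234 0.056\n-0.789 0.012\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fit_parameters.txt")
    with open(path, "w") as fh:
        fh.write(example)
    (a, a_err), (g, g_err) = read_fit_parameters(path)

print(f"a = {a} +/- {a_err}, g = {g} +/- {g_err}")
```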

gnuplot fit line to two points

Consider the data file with two columns and two rows:
3869. 1602.
3882. 9913.
I'd like to fit a line using gnuplot:
gnuplot> f(x) = a * x + b
gnuplot> fit f(x) './data.txt' u 1:2 via a, b
Iteration 0
WSSR : 3.43474e+07 delta(WSSR)/WSSR : 0
delta(WSSR) : 0 limit for stopping : 1e-05
lambda : 2740.4
initial set of free parameter values
a = 1.7524
b = -1026.99
/
Iteration 1
WSSR : 3.43474e+07 delta(WSSR)/WSSR : -1.49847e-12
delta(WSSR) : -5.14686e-05 limit for stopping : 1e-05
lambda : 274.04
resultant parameter values
a = 1.7524
b = -1026.99
After 1 iterations the fit converged.
final sum of squares of residuals : 3.43474e+07
rel. change during last iteration : -1.49847e-12
Exactly as many data points as there are parameters.
In this degenerate case, all errors are zero by definition.
Final set of parameters
=======================
a = 1.7524
b = -1026.99
gnuplot>
This gives wrong values for the fit parameters. Why is this happening? My gnuplot version is 4.4 patchlevel 0.
It looks to me like the curve-fitting function is struggling to find the true parameters. This could be associated with the magnitude of your data points and/or with trying to fit a line with two parameters to only two data points.
In any case, doing the calculation of a and b in Excel or equivalent yields:
a= 577.769
b = -2233787
If you give gnuplot a good guess at what they should be, e.g. a=500 and b=-2233700 and repeat the procedure, it should successfully find the correct solution:
Final set of parameters
=======================
a = 577.769
b = -2.23379e+06
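Of course, if you're fitting a two-parameter straight line to two points, it's much easier to calculate the values of a and b by hand:
a = (9913-1602) / (3882-3869)
b = 1602 - a * 3869
Note that the numbers quoted above (a = 577.769, b = -2233787) correspond to a second point at y = 9113; with y = 9913, as in the data file, these formulas give a ≈ 639.308 and b ≈ -2471879.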
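The by-hand calculation can be checked with exact rational arithmetic. A small Python sketch using the two points from the question's data file, (3869, 1602) and (3882, 9913); `line_through` is an illustrative helper, and `fractions.Fraction` keeps the result free of rounding.

```python
from fractions import Fraction


def line_through(p1, p2):
    """Exact slope and intercept of the line through two points (requires x1 != x2)."""
    (x1, y1), (x2, y2) = p1, p2
    a = Fraction(y2 - y1) / (x2 - x1)
    b = y1 - a * x1
    return a, b


a, b = line_through((3869, 1602), (3882, 9913))
print(f"a = {float(a):.6g}, b = {float(b):.7g}")
```

This gives a = 8311/13 ≈ 639.308 and b = -32134433/13 ≈ -2471879.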
Gnuplot uses a non-linear (iterative) method to determine the parameters of your function f, and it stops once the change between iterations falls below a certain limit: limit for stopping : 1e-05.
If you lower that limit, your function will be fitted exactly. The limit can be specified with the FIT_LIMIT variable like so:
FIT_LIMIT = 1e-8
With this setting your points will be exactly matched after 12 iterations. (At least on my machine^^)

Resources