I am trying to fit an asymptotic curve to my data using gnuplot. It is a dataset showing reaction time results over a testing period. I have been able to plot the data and fit a straight line through it using the following code.
f(x) = a*x + c;
fit f(x) 'ReactionLearning.txt' using 1:2 via a,c
plot 'ReactionLearning.txt' using 1:2 with points lt 1 pt 3 notitle, \
f(x) with lines notitle
Which gives the following result:
http://imgur.com/PlQmalX.jpg
However, as this is supposed to show a learning effect, an asymptotic curve would make a lot more sense because the increase in performance caused by a learning effect will eventually stop, making the line even out.
From what I understand, asymptotic curves are created with f(x) = 1/x, so I changed my code to:
f(x) = 1/(a*x)
fit f(x) 'ReactionLearning.txt' using 1:2 via a
plot 'ReactionLearning.txt' using 1:2 with points lt 1 pt 3 notitle, \
f(x) with lines notitle
However, I get this output: http://imgur.com/PimTa1T
Could someone explain what I am doing wrong here?
Thanks
There are many curves that show asymptotic behavior, and 1/x is probably not the one that comes up most often when describing physical or biological processes. Often, these processes show some sort of exponential decay. With the data that you show, I don't think you can conclude anything about which model you should use, other than "it decays". If you already know what functional behavior to expect, that changes things. That said, the general form of your 1/x curve should be f(x) = a/(x-x0) + c, which will probably give you some meaningful results when you fit to it:
f(x) = a/(x-x0) + c
fit f(x) "data" via a,c,x0
Since fitting this kind of function can be unstable when the initial values are bad, you may need to provide sensible initial values or reformulate the problem as a linear relation. You can do the latter with the change of variable u = 1/(x - x0), so that the model becomes a*u + c, and repeat the fit for different values of x0. Record the error in the fit (which gnuplot prints) for each of them and see how the error changes as a function of x0: near the optimum it should be roughly quadratic in x0. Something like this:
f(x) = a*x + c
x0 = 1. # give some value for x0
fit f(x) "data" u (1./($1-x0)):2 via a,c # record fit errors for a and c
x0 = 3. # give some other value for x0
fit f(x) "data" u (1./($1-x0)):2 via a,c # record fit errors for a and c
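The scan over x0 described above can also be sketched outside gnuplot. The Python snippet below is only an illustration under stated assumptions: the data are synthetic (generated from made-up values a = 50, c = 300, x0 = -2), and the helper name `linear_fit` is hypothetical. For each trial x0 it fits a straight line to y against 1/(x - x0) in closed form and records the sum of squared residuals (SSR).

```python
def linear_fit(u, y):
    """Closed-form least squares for y = a*u + c; returns (a, c, ssr)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(y) / n
    suu = sum((ui - mean_u) ** 2 for ui in u)
    suy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    a = suy / suu
    c = mean_y - a * mean_u
    ssr = sum((yi - (a * ui + c)) ** 2 for ui, yi in zip(u, y))
    return a, c, ssr

# Synthetic, noise-free data (assumed values, not taken from the question).
true_a, true_c, true_x0 = 50.0, 300.0, -2.0
xs = [float(i) for i in range(1, 21)]
ys = [true_a / (x - true_x0) + true_c for x in xs]

# Trial values for x0: -5.0, -4.5, ..., 0.0 (all outside the data range,
# so x - x0 never becomes zero).
candidates = [-5.0 + 0.5 * k for k in range(11)]

scan = []
for x0 in candidates:
    u = [1.0 / (x - x0) for x in xs]      # change of variable
    a, c, ssr = linear_fit(u, ys)
    scan.append((ssr, x0, a, c))
    print(f"x0={x0:5.1f}  SSR={ssr:.6g}")

# The trial x0 with the smallest SSR is the best candidate.
best_ssr, best_x0, best_a, best_c = min(scan)
print("best x0:", best_x0, "a:", best_a, "c:", best_c)
```

With real, noisy data the minimum will not be exactly zero, and a finer grid around the best candidate may be needed.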
Related
I am trying to use GNUplot to calculate the best-fit line for some time-series data. The data is just about linear already with a negative slope. The input data looks something like:
1615840396,138849,510249
1615840406,139011,511152
1615840416,137580,510330
1615840426,137493,510501
1615840436,137261,510186
1615840447,137435,511026
1615840456,137054,510252
1615840466,136955,510174
1615840476,136922,510540
1615840486,136970,510999
The first column is a Unix timestamp. A graph of column 2 vs. time looks like this:
I'm trying to produce a best-fit line like this:
gnuplot> set xdata time
gnuplot> set timefmt "%s"
gnuplot> set datafile separator comma
gnuplot> f(x) = m*x + b
gnuplot> fit f(x) 'data.csv' using 1:2 via m,b
Which produces:
Final set of parameters            Asymptotic Standard Error
=======================            ==========================
m               = 8.08062e-05      +/- 1.633        (2.021e+06%)
b               = 1                +/- 2.639e+09    (2.639e+11%)
The resulting best-fit line has a positive slope and doesn't really fit the data at all:
What am I doing wrong?
This is a recurring question about fitting time data. I guess there should be similar questions here on SO, but I can't find them right now. I'm not sure if there is an example of fitting time data on the gnuplot homepage.
I guess the problem is the following: if you assume a linear function f(x) = a*x + b with time data, the origin will be at January 1st, 1970.
Typically, this is very far from your actual data, and the range of your data is small compared to the distance to the origin. So I guess the fit cannot deliver really good values.
It is better to fit a function that is shifted by your start date.
You can either set this start date manually or spend a few lines of code to find it automatically.
Additionally, it will help if you give some starting values for the fitting parameters.
Here, it seems that a is found without a starting value. Setting b=1 does not give a good result, but b=10 seems to be fine as a starting value.
Code:
### fitting time data
reset session
# create some random test data
set print $Data
do for [i=1:100] {
print sprintf("%.0f,%g",time(0)+i*86400,i+rand(0)*10 )
}
set print
set datafile separator comma
# find out the StartDate
StartDate = 16158768671 # manually by setting a value
# or automatically by using stats
stats $Data u 1 index 0 every ::0:0:0:0 nooutput
StartDate = STATS_min
f(x) = a*(x-StartDate) + b
set fit brief nolog
b=10
fit f(x) $Data u 1:2 via a,b
set key top left
set format x "%b %d" timedate
plot $Data u 1:2 ti "Data", \
f(x) w l lc rgb "red" ti "Fit"
### end of code
Result:
Final set of parameters            Asymptotic Standard Error
=======================            ==========================
a               = 1.16005e-05      +/- 1.163e-07    (1.003%)
b               = 6.1323           +/- 0.5759       (9.39%)
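To see why shifting the time origin matters, here is a small Python sketch. It uses only the ten rows quoted in the question (columns 1 and 2), computes the straight-line fit in closed form with time measured from the first sample, and then extrapolates the intercept back to the Unix epoch. The variable names are illustrative and not part of any gnuplot API.

```python
# First two columns of the ten rows shown in the question.
rows = [
    (1615840396, 138849), (1615840406, 139011), (1615840416, 137580),
    (1615840426, 137493), (1615840436, 137261), (1615840447, 137435),
    (1615840456, 137054), (1615840466, 136955), (1615840476, 136922),
    (1615840486, 136970),
]

start = rows[0][0]                     # plays the role of StartDate
t = [ts - start for ts, _ in rows]     # seconds since the first sample
y = [float(v) for _, v in rows]

n = len(rows)
mean_t = sum(t) / n
mean_y = sum(y) / n
stt = sum((ti - mean_t) ** 2 for ti in t)
sty = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))

slope = sty / stt                      # same slope for shifted and raw time
b_shifted = mean_y - slope * mean_t    # intercept at the first sample
b_epoch = b_shifted - slope * start    # intercept at 1970-01-01 (raw time)

print("slope per second      :", slope)
print("intercept at StartDate:", b_shifted)
print("intercept at the epoch:", b_epoch)
```

The slope is identical in both formulations, but the intercept at the epoch is many orders of magnitude larger than the data, so a default starting value of b = 1 is very far from the solution. In the shifted formulation the intercept is of the same size as the data.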
I am trying to fit the function f(x)=exp(a*x) on Gnuplot. It keeps giving me the error 'undefined value during function evaluation'. I use the following code:
y(x)=exp(a*x)
a = 60
fit y(x) 'data.txt' using 1:2 via a
plot y(x), 'data.txt' using 1:2 notitle
The error is coming from the fourth line in the above bit of code. I have set the directory properly but did not include that in the piece of code above.
Where am I going wrong?
Assuming your data looks like this:
8,701 1032,000 1025,000
9,701 974,000 963,000
...
26,701 609,000 603,000
First, by default gnuplot expects decimal numbers to be written with '.' as decimal sign. To change this, use:
set decimalsign ','
Second, and more important to your question, gnuplot internally uses double precision numbers. They go up to about 1e308. In the first iteration of the fit there are calculations like exp(a*x) with a=60 and x=26, which results in exp(1560) = 3e677 - way too large, hence the error message.
Third, an exponential function f(x) = exp(a*x) starts at f(0) = 1 and increases for positive a, whereas your data starts above 1000 and decreases. Therefore I would try a setup like this:
set decimalsign ','
y(x)=b*exp(-a*x)
a = 0.1
b = 1000
fit y(x) 'data.txt' using 1:2 via a,b
plot y(x), 'data.txt' using 1:2 notitle
Result:
Final set of parameters            Asymptotic Standard Error
=======================            ==========================
a               = 0.0286709        +/- 0.0005953    (2.076%)
b               = 1256.51          +/- 12.12        (0.9647%)
It's up to you to decide if the function really represents the underlying data.
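The overflow argument above can be checked with a few lines of Python, which also uses IEEE double precision for its floats. This is only a sketch: a = 60 and x = 26 are the values quoted in the answer, and the variable names are made up for the illustration.

```python
import math
import sys

a, x = 60.0, 26.0          # starting value and largest x from the answer

# Largest finite double, roughly 1.8e308.
largest = sys.float_info.max
print("largest double:", largest)

# Decimal exponent of exp(a*x), computed without evaluating exp itself.
log10_value = a * x / math.log(10)
print("exp(a*x) is about 10 **", log10_value)

# Evaluating exp(1560) directly exceeds the double range.
try:
    math.exp(a * x)
    overflowed = False
except OverflowError:
    overflowed = True
print("overflow:", overflowed)

# With the suggested starting values the model stays well within range.
safe_value = 1000.0 * math.exp(-0.1 * x)
print("b*exp(-a*x) with a=0.1, b=1000:", safe_value)
```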
I have this set of data that I want to fit with gnuplot using the function f(x) = exp(A + B/(x-x0)), where A, B and x0 are the parameters to fit:
# x f(x)
0.382 8.29023731095968
0.509 6.36124122026352
0.637 4.66938977764103
0.764 3.3194714217965
0.891 2.15140777817893
1.019 1.15428884806615
1.146 0.262232461832655
I have tried it with
fit log(f(x)) 'data.dat' using 1:(log($2)) via A, B, x0
I have also defined the function as f(x) = A+ B/(x-x0) and tried
fit f(x) 'data.dat' using 1:(log($2)) via A, B, x0
and then plotted exp(f(x)).
The code runs, but the fitted parameters are not good: when I plot the curve and the points together, the result does not make sense. Is this fit too complicated for gnuplot?
Fitting can fail if you have an inappropriate function or if you have starting values which might make it difficult for the fitting procedure to converge.
In your case, I guess x0 is an important parameter. You should help the gnuplot fitting algorithm a little to have a chance to find reasonable values. Here, I guess x0=1.5 is a reasonable starting value. If this is not sufficient and if your model permits you might want to add additional variables or terms to get a better fit.
Code:
### fitting with appropriate starting values
reset session
$Data <<EOD
0.382 8.29023731095968
0.509 6.36124122026352
0.637 4.66938977764103
0.764 3.3194714217965
0.891 2.15140777817893
1.019 1.15428884806615
1.146 0.262232461832655
EOD
A = 1
B = 1
x0 = 1.5
f(x) = exp(A + B/(x-x0))
set fit nolog
fit f(x) $Data u 1:2 via A,B,x0
plot $Data u 1:2 w lp pt 7 ti "Data",\
f(x) w l lc rgb "red" ti "Fit"
### end of code
Result:
Final set of parameters            Asymptotic Standard Error
=======================            ==========================
A               = 4.61445          +/- 0.3907       (8.466%)
B               = 3.57094          +/- 0.8876       (24.86%)
x0              = 1.80616          +/- 0.1371       (7.593%)
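One way to judge a result like this is to evaluate the fitted function at the data points and look at the residuals. The Python sketch below does that with the seven data points from the question and the parameter values printed above. It is a plain re-evaluation of the model, not a re-run of the fit.

```python
import math

# Data points copied from the question.
data = [
    (0.382, 8.29023731095968),
    (0.509, 6.36124122026352),
    (0.637, 4.66938977764103),
    (0.764, 3.3194714217965),
    (0.891, 2.15140777817893),
    (1.019, 1.15428884806615),
    (1.146, 0.262232461832655),
]

# Fitted parameters copied from the gnuplot output above.
A, B, x0 = 4.61445, 3.57094, 1.80616

def f(x):
    """Model from the question: exp(A + B/(x - x0))."""
    return math.exp(A + B / (x - x0))

residuals = [y - f(x) for x, y in data]
max_abs_residual = max(abs(r) for r in residuals)

for (x, y), r in zip(data, residuals):
    print(f"x={x:.3f}  data={y:.4f}  model={f(x):.4f}  residual={r:+.4f}")
print("largest |residual|:", max_abs_residual)
```

Note that the fitted x0 lies to the right of all data points, so the pole of B/(x - x0) is outside the fitted range.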
Is there a way to constrain the values that fitting parameters can take in gnuplot?
f(x) = A/(x**2) + B/(x**4)
A = 1
B = 0.01
fit f(x) 'data.dat' u 1:2 via A,B
I know that B < 0 doesn't make any sense. Is there a way to impose B > 0?
Since gnuplot supports non-linear fitting you can use B**2 (or sqrt(B**2)) in your function to constrain your variable to be positive.
You could change your function to something like this:
minB = 0.001
f(x) = A*x**-2 + (B<minB ? minB : B)*x**-4
But I'm not sure how the NLLS algorithm reacts to this. Beware.
Or you might think about something like this:
f(x) = A*x**-2 + 10**B*x**-4
This will probably behave much more smoothly and be closer to an actual physical model of your data.
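The reparameterizations suggested above work because the fit varies an unconstrained parameter while the model only ever sees a non-negative coefficient. The short Python sketch below illustrates this. The trial values are arbitrary, and 0.01 is the starting value for B taken from the question.

```python
import math

# Arbitrary values the optimizer might try for the free parameter.
trial_values = [-3.0, -0.5, 0.0, 0.5, 3.0]

# Effective coefficient seen by the model under each reparameterization.
squared = [p ** 2 for p in trial_values]       # B**2 is never negative
power10 = [10.0 ** p for p in trial_values]    # 10**B is strictly positive

print("p    :", trial_values)
print("p**2 :", squared)
print("10**p:", power10)

# Starting values must be transformed too. To start the effective
# coefficient at 0.01 (the value used in the question):
desired_B = 0.01
start_for_square = math.sqrt(desired_B)     # use with B**2
start_for_power10 = math.log10(desired_B)   # use with 10**B

print("start for B**2 :", start_for_square)
print("start for 10**B:", start_for_power10)
```

After the fit, the reported parameter has to be transformed back (squared, or raised as a power of ten) to obtain the coefficient of the original model.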
I'm using the following gnuplot script to plot a linear fit:
#!/usr/bin/gnuplot
set term cairolatex
set output "linear_fit.tex"
c = 299792458.
x(x) = c / x
y(x) = x
h(x) = a * x + b
fit h(x) "linear_fit.dat" u (x($1)):(y($2)) via a,b
plot "linear_fit.dat" u (x($1)):(y($2)) w points title "", \
(h(x)) with lines linecolor rgb "black" title "Linear Fit"
However, after the iterations converge, b is always 1.0: https://dpaste.de/ozReq/
How can I get gnuplot to adjust b as well as a?
Update: Repeating the fit command a few hundred times with alternating via a/via b does give pretty good results, but that just can't be how it's supposed to be done.
Update 2: Here's the data in linear_fit.dat:
# lambda, V
360e-9 1.119
360e-9 1.148
360e-9 1.145
400e-9 0.949
400e-9 0.993
400e-9 0.971
440e-9 0.883
440e-9 0.875
440e-9 0.863
490e-9 0.737
490e-9 0.728
490e-9 0.755
540e-9 0.575
540e-9 0.571
540e-9 0.592
590e-9 0.457
590e-9 0.455
590e-9 0.482
I think your troubles stem from the fact that your x-values are very large (c/lambda ranges from about 5e14 to 8e14).
If you do not provide gnuplot with an initial guess for a and b, it will assume a=1 and b=1 as starting points for the fit. However, this is a poor initial guess:
Please note the log scale on both the x- and y-axis.
From the gnuplot documentation:
fit may, and often will get "lost" if started far from a solution, where SSR is large and changing slowly as the parameters are varied, or it may reach a numerically unstable region (e.g., too large a number causing a floating point overflow) which results in an "undefined value" message or gnuplot halting.
To improve the chances of finding the global optimum, you should set the starting values at least roughly in the vicinity of the solution, e.g., within an order of magnitude, if possible. The closer your starting values are to the solution, the less chance of stopping at another minimum. One way to find starting values is to plot data and the fitting function on the same graph and change parameter values and replot until reasonable similarity is reached. The same plot is also useful to check whether the fit stopped at a minimum with a poor fit.
In your case, such starting values could be:
a = 1e-15
b = -0.5
I obtained these values by eye-balling your range of values.
With those starting values, the linear fit results in:
Final set of parameters            Asymptotic Standard Error
=======================            ==========================
a               = 1.97355e-015     +/- 6.237e-017   (3.161%)
b               = -0.5             +/- 0.04153      (8.306%)
Which looks like this:
You can play with the control setting of fit (such as setting FIT_LIMIT = 1.e-35) or the starting values to achieve a better fit than this.
EDIT
While I still have not been able to coax gnuplot into modifying both parameters a, b at the same time, I found an alternate approach using R. I am aware that there are many other (scripting) languages that can perform a linear fit and this question was about gnuplot. However, the required effort with R appeared to be minimal.
Here's an example, which, when saved as linear_fit.R and called with
R CMD BATCH linear_fit.R
will provide the two coefficients of the linear fit, that gnuplot failed to provide.
y <- c(1.119, 1.148, 1.145, 0.949, 0.993, 0.971, 0.883, 0.875, 0.863,
0.737, 0.728, 0.755, 0.575, 0.571, 0.592, 0.457, 0.455, 0.482)
x <- c(3.60E-007, 3.60E-007, 3.60E-007, 4.00E-007, 4.00E-007,
4.00E-007, 4.40E-007, 4.40E-007, 4.40E-007, 4.90E-007,
4.90E-007, 4.90E-007, 5.40E-007, 5.40E-007, 5.40E-007,
5.90E-007, 5.90E-007, 5.90E-007)
c = 299792458.
x <- c/x
lm.out <- lm(y ~ x)
svg("linear_fit.svg")
plot(x,y)
abline(lm.out,col="red")
summary(lm.out)
You will end up with an svg-file that contains the plot and a linear_fit.Rout text file. In there you'll find the following coefficients:
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.429e-01  4.012e-02  -13.53 3.55e-10 ***
x            2.037e-15  6.026e-17   33.80 2.61e-16 ***
So, in the terminology of the original question, we obtain:
a = 2.037e-15
b = -5.429e-01
These values are very close to the values you quoted from alternating the fit.
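Because the model is linear in a and b, the coefficients can also be computed in closed form. The Python sketch below does this for the data in linear_fit.dat, once with x = c/lambda as in the question and once with x rescaled by 1e15. The helper name `least_squares` and the choice of scale factor are assumptions made for this illustration.

```python
c = 299792458.0

# (lambda, V) pairs copied from linear_fit.dat in the question.
data = [
    (360e-9, 1.119), (360e-9, 1.148), (360e-9, 1.145),
    (400e-9, 0.949), (400e-9, 0.993), (400e-9, 0.971),
    (440e-9, 0.883), (440e-9, 0.875), (440e-9, 0.863),
    (490e-9, 0.737), (490e-9, 0.728), (490e-9, 0.755),
    (540e-9, 0.575), (540e-9, 0.571), (540e-9, 0.592),
    (590e-9, 0.457), (590e-9, 0.455), (590e-9, 0.482),
]

def least_squares(xs, ys):
    """Closed-form fit of y = a*x + b using centered sums."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

freq = [c / lam for lam, _ in data]    # x values, roughly 5e14 to 8e14
volt = [v for _, v in data]

a, b = least_squares(freq, volt)
print("a =", a, " b =", b)

# Rescaling x brings both parameters to order one. In an iterative fit
# that makes default starting values of 1 far less problematic.
scale = 1e15
a_scaled, b_scaled = least_squares([f / scale for f in freq], volt)
print("a_scaled =", a_scaled, " b_scaled =", b_scaled)
```

The rescaled slope is simply a multiplied by the scale factor, and the intercept is unchanged, so the result can be converted back after the fit.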
In case the comments get purged, these questions were identified as related:
What is gnuplot's internal representation of floating point numbers?
Gnuplot behaves oddly in polynomial fit. Why is that?