Gnuplot: undefined value during function evaluation

I am trying to fit the function f(x)=exp(a*x) in gnuplot. It keeps giving me the error 'undefined value during function evaluation'. I use the following code:
y(x)=exp(a*x)
a = 60
fit y(x) 'data.txt' using 1:2 via a
plot y(x), 'data.txt' using 1:2 notitle
The error is coming from the fourth line in the above bit of code. I have set the directory properly but did not include it in the piece of code above.
Where am I going wrong?

Assuming your data looks like this:
8,701 1032,000 1025,000
9,701 974,000 963,000
...
26,701 609,000 603,000
First, by default gnuplot expects decimal numbers to be written with '.' as the decimal sign. To change this, use:
set decimalsign ','
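One caveat, as far as the gnuplot documentation describes it: an explicit string such as ',' changes only how gnuplot prints numbers (tic labels, gprintf), while the decimal sign expected in input data follows the locale. A minimal sketch, assuming a locale named de_DE.UTF-8 is installed (the name is an assumption and differs between systems):

```gnuplot
# read and write numbers with the decimal sign of the given locale
# (the locale name is an assumption; list the installed ones with `locale -a`)
set decimalsign locale "de_DE.UTF-8"
```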
Second, and more important to your question, gnuplot internally uses double precision numbers. They go up to about 1e308. In the first iteration of the fit there are calculations like exp(a*x) with a=60 and x=26, which results in exp(1560) = 3e677 - way too large, hence the error message.
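The limit can be checked directly in gnuplot. The sketch below assumes the largest x value is 26.701, taken from the sample data above; the variable names are only illustrative.

```gnuplot
# largest argument exp() can take before the result exceeds about 1e308
maxarg = log(1e308)        # natural logarithm, about 709
xmax   = 26.701            # largest x in the sample data (assumption)
print "largest usable exponent: ", maxarg
print "a*x with a = 60:         ", 60*xmax
print "a has to stay below:     ", maxarg/xmax
```

Any starting value of a that keeps a*x well below that bound for all x in the file avoids the overflow in the first iteration.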
Third, an exponential function f(x) = exp(a*x) starts at f(0) = 1 and increases for positive a, whereas your data starts above 1000 and decreases. Therefore I would try a setup like this:
set decimalsign ','
y(x)=b*exp(-a*x)
a = 0.1
b = 1000
fit y(x) 'data.txt' using 1:2 via a,b
plot y(x), 'data.txt' using 1:2 notitle
Result:
Final set of parameters Asymptotic Standard Error
======================= ==========================
a = 0.0286709 +/- 0.0005953 (2.076%)
b = 1256.51 +/- 12.12 (0.9647%)
It's up to you to decide if the function really represents the underlying data.
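If good starting values are hard to guess, one common trick is to fit a straight line to log(y) first and reuse the result as starting values. This is only a sketch, assuming all y values are positive and the file is read correctly; g(x) and lnb are names introduced here for illustration.

```gnuplot
# straight-line fit to log(y): log(b*exp(-a*x)) = log(b) - a*x
g(x) = lnb - a*x
a = 0.1
lnb = 7
fit g(x) 'data.txt' using 1:(log($2)) via a,lnb

# reuse the result as starting values for the exponential fit
b = exp(lnb)
y(x) = b*exp(-a*x)
fit y(x) 'data.txt' using 1:2 via a,b
```

The linear fit cannot overflow, so it is a safe first step even when the exponential fit is sensitive to its starting point.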


Gnuplot fit function against time x-axis and real number y-axis gives "singular matrix in invert_RtR" error? [duplicate]

I am trying to use GNUplot to calculate the best-fit line for some time-series data. The data is just about linear already with a negative slope. The input data looks something like:
1615840396,138849,510249
1615840406,139011,511152
1615840416,137580,510330
1615840426,137493,510501
1615840436,137261,510186
1615840447,137435,511026
1615840456,137054,510252
1615840466,136955,510174
1615840476,136922,510540
1615840486,136970,510999
The first column is a Unix timestamp. A graph of column 2 vs. time looks like this:
I'm trying to produce a best-fit line like this:
gnuplot> set xdata time
gnuplot> set timefmt "%s"
gnuplot> set datafile separator comma
gnuplot> f(x) = m*x + b
gnuplot> fit f(x) 'data.csv' using 1:2 via m,b
Which produces:
Final set of parameters Asymptotic Standard Error
======================= ==========================
m = 8.08062e-05 +/- 1.633 (2.021e+06%)
b = 1 +/- 2.639e+09 (2.639e+11%)
The resulting best-fit line has a positive slope, and doesn't really fit the data at all:
What am I doing wrong?
This is a recurring question about fitting time data. I guess there should be similar questions here on SO, but I can't find them right now. I'm not sure if there is an example of fitting time data on the gnuplot homepage.
I guess the problem is the following: if you assume a linear function f(x) = a*x + b with time data, the origin will be at January 1st, 1970.
Typically, this is very far from your actual data and, furthermore, your data covers only a small range compared to the distance from the origin. So, I guess the fit cannot deliver really good values.
It is better to fit a function that is shifted by your start date.
You can either set this start date manually or spend a few lines of code to find it automatically.
Additionally, it helps to give some starting values for the fitting parameters.
Here, it seems that a is found without a starting value. With b=1 the fit does not give a good result, but b=10 seems to be fine as a starting value.
Code:
### fitting time data
reset session
# create some random test data
set print $Data
do for [i=1:100] {
print sprintf("%.0f,%g",time(0)+i*86400,i+rand(0)*10 )
}
set print
set datafile separator comma
# find out the StartDate
StartDate = 16158768671 # manually by setting a value
# or automatically by using stats
stats $Data u 1 index 0 every ::0:0:0:0 nooutput
StartDate = STATS_min
f(x) = a*(x-StartDate) + b
set fit brief nolog
b=10
fit f(x) $Data u 1:2 via a,b
set key top left
set format x "%b %d" timedate
plot $Data u 1:2 ti "Data", \
f(x) w l lc rgb "red" ti "Fit"
### end of code
Result:
Final set of parameters Asymptotic Standard Error
======================= ==========================
a = 1.16005e-05 +/- 1.163e-07 (1.003%)
b = 6.1323 +/- 0.5759 (9.39%)
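Applied to the data.csv from the question, the same idea could look as follows. This is a sketch: t0 is a name introduced here, the starting values are rough guesses read off the sample rows, and `set xdata time` is deliberately left out so that the timestamps are handled as plain numbers by stats and fit.

```gnuplot
set datafile separator comma
# first column is a Unix timestamp; take the smallest one as the origin
stats 'data.csv' using 1 nooutput
t0 = STATS_min

f(x) = m*(x - t0) + b
m = -20        # rough guess: about -2000 counts over 90 s
b = 138000     # rough guess: value near the first timestamp
fit f(x) 'data.csv' using 1:2 via m,b
```

With the shift, b is the fitted value at the first timestamp rather than the extrapolated value in 1970, which keeps both parameters at a comparable, well-determined scale.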

Using the correlation matrix after a fit in Gnuplot

Say I need to fit some data to a parabola, and then perform some calculations involving the correlation matrix elements of the fit parameters: is there a way to use these parameters directly in gnuplot after the fit converges? Are they stored in some variable like the error estimates?
Here is the explicit problem I'm having. All of this is written to a plot.gp text file and run with gnuplot plot.gp.
I include set fit errorvariables at the beginning, and then proceed with:
f(x)=a+b*x+c*x*x
fit f(x) 'file.dat' u 1:2:3 yerrors via a,b,c
Once the fit is done, I can use the values of a,b,c and their errors a_err, b_err and c_err directly in the plot.gp script; my question is: can I do the same with the correlation matrix of the parameters?
The problem is that the matrix is printed to the terminal once the script finishes running:
correlation matrix of the fit parameters:
a b c
a 1.000
b 0.910 1.000
c -0.956 -0.987 1.000
Are the entries of the matrix stored in some variable (like a_err, b_err) that I can access after the fit is done but before the script ends?
I think the command you are looking for is
set fit covariancevariables
If the `covariancevariables` option is turned on, the covariances between
final parameters will be saved to user-defined variables. The variable name
for a certain parameter combination is formed by prepending "FIT_COV_" to
the name of the first parameter and combining the two parameter names by
"_". For example given the parameters "a" and "b" the covariance variable is
named "FIT_COV_a_b".
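With both options enabled, a correlation coefficient can be rebuilt from the saved covariance and the parameter errors. A minimal sketch, assuming gnuplot 5.0 or later and that FIT_COV_a_b and a_err, b_err are scaled consistently (corr_ab is a name introduced here):

```gnuplot
set fit errorvariables
set fit covariancevariables
f(x) = a + b*x + c*x*x
fit f(x) 'file.dat' u 1:2:3 yerrors via a,b,c

# correlation = covariance / (sigma_a * sigma_b)
corr_ab = FIT_COV_a_b / (a_err * b_err)
print "corr(a,b) = ", corr_ab
```

Comparing the printed value with the matrix gnuplot writes to the terminal is a quick way to check that scaling assumption.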
Edit: I certainly missed gnuplot's intended way via the option covariancevariables (apparently available since gnuplot 5.0). Ethan's answer is the way to go. I nevertheless leave my answer; with some modifications it might be useful for extracting something else from the fit output.
Maybe I missed it, but I am not aware that you can directly store the elements of the correlation matrix in variables. However, you can do it with a workaround.
You can set the output file for your fit results (check help set fit). The shortest output will be created with the option results. The results will be written to this file (actually, appended if the file already exists).
Example:
After 5 iterations the fit converged.
final sum of squares of residuals : 0.45
rel. change during last iteration : -3.96255e-10
degrees of freedom (FIT_NDF) : 1
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf) : 0.67082
variance of residuals (reduced chisquare) = WSSR/ndf : 0.45
Final set of parameters Asymptotic Standard Error
======================= ==========================
a = 1.75 +/- 0.3354 (19.17%)
b = -2.65 +/- 1.704 (64.29%)
c = 1.75 +/- 1.867 (106.7%)
correlation matrix of the fit parameters:
a b c
a 1.000
b -0.984 1.000
c 0.898 -0.955 1.000
Now, you can read this file back into a datablock (check gnuplot: load datafile 1:1 into datablock) and extract the values from the last lines (here: 3); check help word and help real.
Script:
### get fit correlation matrix into variables
reset session
$Data <<EOD
1 1
2 3
3 10
4 19
EOD
f(x) = a*x**2 + b*x + c
myFitFILE = "SO71788523_fit.dat"
set fit results logfile myFitFILE
fit f(x) $Data u 1:2 via a,b,c
set key top left
set grid x,y
# load file 1:1 into datablock
FileToDatablock(f,d) = GPVAL_SYSNAME[1:7] eq "Windows" ? \
sprintf('< echo %s ^<^<EOD & type "%s"',d,f) : \
sprintf('< echo "\%s <<EOD" & cat "%s"',d,f) # Linux/MacOS
load FileToDatablock(myFitFILE,'$FIT')
# extract parameters into variables
N = 3 # number of parameters
getValue(p1,p2) = real(word($FIT[|$FIT|-N+p1],p2+1)) # extract value as floating point number
aa = getValue(1,1)
ba = getValue(2,1)
bb = getValue(2,2)
ca = getValue(3,1)
cb = getValue(3,2)
cc = getValue(3,3)
set label 1 at graph 0.1,graph 0.8 \
sprintf("Correlation matrix:\naa: %g\nba: %g\nbb: %g\nca: %g\ncb: %g\ncc: %g",aa,ba,bb,ca,cb,cc)
plot $Data u 1:2 w lp pt 7 lc "red", \
f(x) w l lc "blue" title sprintf("fit: a=%g, b=%g, c=%g",a,b,c)
### end of script
Result:

How to plot lines parallel to the x-axis with a certain offset given by data in an input file with gnuplot

I calculated the eigenvalues of the Hamiltonian for the 1D-hydrogen atom in atomic units with the Fourier-Grid-Hamiltonian method in a nice little Fortran program.
All the eigenvalues found between -1 and 0 (the bound states) are saved into a file line by line like this:
-0.50016671392950229
-0.18026105614262633
-0.11485673263086937
-4.7309305955423042E-002
-4.7077108902158216E-002
As the number of eigenvalues found depends on the step size my program uses, the number of entries in the file can vary (in theory, there are infinitely many).
I now want to plot each value from the file as a line parallel to the x-axis, with the offset given by the value read from the file.
I also want to be able to plot the data only up to a certain line number, as the values get really close to each other the closer they are to zero and cannot be distinguished by eye anymore.
(Here, e.g., it would make sense to plot the first four entries; the fifth is already too close to the previous one.)
I know that one can plot lines parallel to the x-axis with the command plot *offset*, but I don't know how to tell gnuplot to use the data from the file. So far I have had to plot the values manually.
As a second step I would like to plot each line only in a certain x range, more concretely between its points of intersection with the potential used for the numerical solution, V(x) = -1/(1+abs(x)).
The result should look like this:
scheme of the desired plot (lookalike)
The closest I got was with
plot -1/(1+abs(x)),-0.5 title 'E0',-0.18 title 'E1', -0.11 title 'E2'
which got me the following result:
my plot
Hope you guys can help me, and I'm really curious whether gnuplot actually can do the second step I described!
As for the first part of your question, you can for example use the xerrorbars plotting style as:
set terminal pngcairo
set output 'fig.png'
unset key
set xr [-1:1]
set yr [-1:0]
unset bars
plot '-' u (0):($1<-0.1?$1:1/0):(1) w xerrorbars pt 0 lc rgb 'red'
-0.50016671392950229
-0.18026105614262633
-0.11485673263086937
-4.7309305955423042E-002
-4.7077108902158216E-002
e
The idea here is to:
interpret the energies E as points with coordinates (0,E) and assign to each of them an x-errorbar of width 1 (via the third part of the specification (0):($1<-0.1?$1:1/0):(1))
"simulate" the horizontal lines with x-errorbars. To this end, unset bars and pt 0 ensure that Gnuplot displays just plain lines.
consider only energies E<-0.1; otherwise the expression $1<-0.1?$1:1/0 evaluates to the undefined value 1/0, so nothing is plotted for such E.
plot '-' with explicit values can be of course replaced with, e.g., plot 'your_file.dat'
This produces:
For the second part, it mostly depends on how complicated your function V(x) is. In the particular case of V(x)=-1/(1+|x|), one can infer directly that it is symmetric around x=0 and calculate the turning points explicitly, e.g.,
set terminal pngcairo
set output 'fig.png'
fName = 'test.dat'
unset key
set xr [-10:10]
set yr [-1:0]
unset bars
f(x) = -1 / (1+abs(x))
g(y) = (-1/y - 1)
plot \
f(x) w l lc rgb 'black', \
fName u (0):($1<-0.1?$1:1/0):(g($1)) w xerrorbars pt 0 lc rgb 'red', \
fName u (0):($1<-0.1?$1:1/0):(sprintf("E%d", $0)) w labels offset 0, char 0.75
which yields
The idea is basically the same as before, just the width of the errorbar now depends on the y-coordinate (the energy). Also, the labels style is used in order to produce explicit labels.
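To limit the plot by line number instead of by an energy threshold, the every option can select the first few records. A sketch, assuming the same file name, f(x) and g(y) as in the script above and that the first four eigenvalues are wanted:

```gnuplot
fName = 'test.dat'
unset key
unset bars
set xr [-10:10]
set yr [-1:0]
f(x) = -1 / (1+abs(x))
g(y) = (-1/y - 1)
# every ::0::3 keeps points 0 through 3, i.e. the first four lines of the file
plot f(x) w l lc rgb 'black', \
     fName every ::0::3 u (0):1:(g($1)) w xerrorbars pt 0 lc rgb 'red'
```

Changing the last number in every ::0::3 adjusts how many levels are drawn without touching the data file.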
Another approach may be to get data from "energy.dat" (as given in the question) with system and cat commands (so assuming a Un*x-like system...) and select V(x) and E at each x via max:
set key bottom right
set yr [-1:0.2]
set samples 1000
Edat = system( "cat energy.dat" )
max(a,b) = ( a > b ) ? a : b
V(x) = -1/(1+abs(x))
plot for [ E in Edat ] \
max(V(x),real(E)) title sprintf("E = %8.6f", real(E)) lw 2, \
V(x) title "V(x) = -1/(1+|x|)" lc rgb "red" lw 2
If we change the potential to V(x) = -abs(cos(x)), the plot looks pretty funny (and the energy levels are of course not correct!)
More details about the script:
max is not a built-in function in Gnuplot, but a user-defined function having two formal arguments. So for example, we may define it as
mymax( p, q ) = ( p > q ) ? p : q
with any other names (and use mymax in the plot command). Next, the ? symbol is a ternary operator that gives a short-hand notation for an if...else construct. In pseudocode, it works as
function max( a, b ) {
if ( a > b ) then
return a
else
return b
end
}
This way, max(V(x),real(E)) selects the greater value between V(x) and real(E) for any given x and E.
Next, Edat = system( "cat energy.dat" ) tells Gnuplot to run the shell command "cat energy.dat" and assign the output to a new variable Edat. In the above case, Edat becomes a string that contains a sequence of energy values read in from "energy.dat". You can check the contents of Edat by print( Edat ). For example, it may be something like
Edat = "-0.11 -0.22 ... -0.5002"
plot for [ E in Edat ] ... loops over words contained in a string Edat. In the above case, E takes a string "-0.11", "-0.22", ..., "-0.5002" one-by-one. real(E) converts this string to a floating-point value. It is used to pass E (a character string) to any mathematical function.
The basic idea is to draw a truncated potential above E, max(V(x),E), for each value of E. (You can check the shape of such potential by plot max(V(x),-0.5), for example). After plotting such curves, we redraw the potential V(x) to make it appear as a single potential curve with a different color.
set samples 1000 increases the resolution of the plot with 1000 points per curve. 1000 is arbitrary, but this seems to be sufficient to make the figure pretty smooth.
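Because Edat is just a string of words, the loop can also be limited to the first few levels with words() and word(). A sketch, assuming the same energy.dat and a Un*x-like system; N is a name introduced here:

```gnuplot
Edat = system( "cat energy.dat" )
max(a,b) = ( a > b ) ? a : b
V(x) = -1/(1+abs(x))
set yr [-1:0.2]
set samples 1000

N = 4                                        # number of levels to draw
N = ( N > words(Edat) ) ? words(Edat) : N    # do not run past the file

plot for [ i = 1:N ] \
     max(V(x),real(word(Edat,i))) title sprintf("E%d", i-1) lw 2, \
     V(x) title "V(x) = -1/(1+|x|)" lc rgb "red" lw 2
```

This keeps the closely spaced levels near zero out of the figure while leaving energy.dat untouched.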

Gnuplot: Fitting asymptotic curve to data

I am trying to fit an asymptotic curve to my data using gnuplot. It is a dataset showing reaction time results over a testing period. I have been able to plot the data and fit a straight line through it using the following code.
f(x) = a*x + c;
fit f(x) 'ReactionLearning.txt' using 1:2 via a,c
plot 'ReactionLearning.txt' using 1:2 with points lt 1 pt 3 notitle, \
f(x) with lines notitle
Which gives the following result:
http://imgur.com/PlQmalX.jpg
However, as this is supposed to show a learning effect, an asymptotic curve would make a lot more sense because the increase in performance caused by a learning effect will eventually stop, making the line even out.
From what I understand, asymptotic curves are created with f(x) = 1/x. So I changed my code to:
f(x) = 1/(a*x)
fit f(x) 'ReactionLearning.txt' using 1:2 via a
plot 'ReactionLearning.txt' using 1:2 with points lt 1 pt 3 notitle, \
f(x) with lines notitle
However, I get this output: http://imgur.com/PimTa1T
Could someone explain what I am doing wrong here?
Thanks
There are many curves that show an asymptotic behavior, and 1/x is probably not the one that comes most often when describing physical or biological processes. Usually, these processes might show some sort of exponential decay. With the data that you show I don't think you can conclude anything about which model you should use, other than "it decays". If you already know what is the functional behavior you expect, that makes things different. That said, the general form of your 1/x curve should be f(x) = a/(x-x0) + c, which will probably give you some meaningful results when you fit to it:
f(x) = a/(x-x0) + c
fit f(x) "data" via a,c,x0
Since fitting might show instabilities for this kind of function if the initial values are bad, you should/might need to provide sensible initial values or reformulate the problem as a linear relation. You can do the latter by a change of variable y = 1/(x - x0) and do the fitting for different values of x0. Record the error in the fit (which is output by gnuplot) for each of them and see how the error gets minimized as a function of x0: it should be quadratic about the optimum value. Something like this:
f(x) = a*x + c
x0 = 1. # give some value for x0
fit f(x) "data" u (1./($1-x0)):2 via a,c # record fit errors for a and c
x0 = 3. # give some other value for x0
fit f(x) "data" u (1./($1-x0)):2 via a,c # record fit errors for a and c
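The scan over x0 can be automated with a loop. A sketch, assuming gnuplot 5, the file name "data" from above, and that every trial x0 lies below the smallest x in the data so that 1/(x - x0) never divides by zero; the range of x0 values is arbitrary.

```gnuplot
f(x) = a*x + c
a = 1; c = 1
set fit quiet
do for [i = 0:10] {
    x0 = -5.0 + 0.5*i                        # trial values -5, -4.5, ..., 0
    fit f(x) "data" u (1./($1-x0)):2 via a,c
    print x0, FIT_STDFIT                     # x0 and rms of residuals
}
```

The x0 with the smallest printed residual is a reasonable starting value for the full three-parameter fit of f(x) = a/(x-x0) + c.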

reduce datapoints when using logscale in gnuplot

I have a large set of data points from x = 1 to x = 10e13 (step size is fixed to about 3e8).
When I try to plot them using a logscale I naturally get an incredibly high point density towards the end. Of course this affects my output plots, since postscript and svg files (holding each and every data point) get really big.
Is there a way to tell gnuplot to decrease the data density dynamically?
Sample data here. Shows a straight line using logarithmic x-axis.
Usually, for this kind of plot, one can use a filter function which selects the desired points and discards all others (sets their value to 1/0).
Something like:
plot 'sample.dat' using (filter($1) ? $1 : 1/0):2
Now you must define an appropriate filter function to change the data density. Here is a proposal, with pseudo-data, although you might for sure find a better one, which doesn't show this typical logarithmic pattern:
set logscale x
reduce(x) = x/(10**(floor(log10(x))))
filterfunc(x) = abs(log10(sc)+(log10(x) - floor(log10(x))) - log10(floor(sc*reduce(x))))
filter(x) = filterfunc(x) < 1e-5 ? x : 1/0
set multiplot layout 1,2
sc = 1
plot 'sample.data' using (filter($1)):2 notitle
sc = 10
replot
The variable sc allows you to change the density. The result (with 4.6.5) is:
Inspired by Christoph's answer, I was able to get equal spacing on a log scale. The filter works like this: if your x values form a sequence, you can use the greatest integer (floor) function and pick the points nearest to each step on the log scale by comparing the fractional part. The precision is tuned by precision_parameter:
precision_parameter=100
function(x)=(-floor(precision_parameter*log10(x))+(precision_parameter*log10(x)))
Now filter using the filter function defined below:
density_parameter = 3.5
filter(x)=(function(x) < 1/(log10(x))**density_parameter & function(x-1) > 1/(log10(x))**density_parameter ) ? x : 1/0
set datafile missing "NaN"
The last line helps when plotting with linespoints. I used x and x-1 assuming the x data is an arithmetic progression with a common difference of 1; change it according to your data. Just replace x by filter(x) in the plot command.
plot 'sample_data.dat' u (filter($1)):2 w lp
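A simpler variant of the same filtering idea is to keep only the first point that falls into each logarithmic bin. This is a sketch rather than a drop-in replacement: it assumes the x values are positive and sorted in ascending order, and k, bin and prev are names introduced here.

```gnuplot
set logscale x
k = 50          # bins per decade; larger k keeps more points
prev = -1e9     # bin index of the last point that was kept
# serial evaluation: compute the bin, keep the point only if the bin is new
plot 'sample.dat' using \
     (bin = floor(k*log10($1)), bin != prev ? (prev = bin, $1) : 1/0):2 \
     with points notitle
```

Because prev keeps the last bin index from the previous pass, it has to be reset before the plot command is issued again.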
