Gnuplot fit function against time x-axis and real number y-axis gives "singular matrix in invert_RtR" error? [duplicate]

I am trying to use gnuplot to calculate the best-fit line for some time-series data. The data is already nearly linear, with a negative slope. The input data looks something like:
1615840396,138849,510249
1615840406,139011,511152
1615840416,137580,510330
1615840426,137493,510501
1615840436,137261,510186
1615840447,137435,511026
1615840456,137054,510252
1615840466,136955,510174
1615840476,136922,510540
1615840486,136970,510999
The first column is a Unix timestamp. A graph of column 2 vs. time looks like this:
I'm trying to produce a best-fit line like this:
gnuplot> set xdata time
gnuplot> set timefmt "%s"
gnuplot> set datafile separator comma
gnuplot> f(x) = m*x + b
gnuplot> fit f(x) 'data.csv' using 1:2 via m,b
Which produces:
Final set of parameters Asymptotic Standard Error
======================= ==========================
m = 8.08062e-05 +/- 1.633 (2.021e+06%)
b = 1 +/- 2.639e+09 (2.639e+11%)
The resulting best-fit line has a positive slope and doesn't really fit the data at all:
What am I doing wrong?

This is a recurring question about fitting time data. There are probably similar questions here on SO, but I can't find them right now, and I'm not sure whether the gnuplot homepage has an example of fitting time data.
I guess the problem is the following: if you assume a linear function f(x) = a*x + b with time data, the origin is at January 1st, 1970.
Typically, this is far from your actual data, and the range of your data is small compared to its distance from the origin. So the fit cannot determine good values for the parameters.
It is better to fit a function that is shifted by your start date.
You can either set this start date manually or spend a few lines of code to find it automatically.
Additionally, it helps to give starting values for the fitting parameters.
Here, a seems to be found without a starting value. With b=1 the fit does not give a good result, but b=10 seems to be OK as a starting value.
Code:
### fitting time data
reset session
# create some random test data
set print $Data
do for [i=1:100] {
    print sprintf("%.0f,%g",time(0)+i*86400,i+rand(0)*10 )
}
set print
set datafile separator comma
# find out the StartDate
StartDate = 16158768671 # manually by setting a value
# or automatically by using stats
stats $Data u 1 index 0 every ::0:0:0:0 nooutput
StartDate = STATS_min
f(x) = a*(x-StartDate) + b
set fit brief nolog
b=10
fit f(x) $Data u 1:2 via a,b
set key top left
set format x "%b %d" timedate
plot $Data u 1:2 ti "Data", \
f(x) w l lc rgb "red" ti "Fit"
### end of code
Result:
Final set of parameters Asymptotic Standard Error
======================= ==========================
a = 1.16005e-05 +/- 1.163e-07 (1.003%)
b = 6.1323 +/- 0.5759 (9.39%)
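As a rough sketch, the same shift can be applied to the sample rows from the question. The variable name t0 and the starting values for m and b are assumptions read off those rows (about 1900 units lost over 90 seconds), not values from the original post. set xdata time is left out on purpose, since neither stats nor fit needs it.

```gnuplot
### sketch: shifted-origin fit on the sample rows from the question
$Data <<EOD
1615840396,138849,510249
1615840406,139011,511152
1615840416,137580,510330
1615840426,137493,510501
1615840436,137261,510186
1615840447,137435,511026
1615840456,137054,510252
1615840466,136955,510174
1615840476,136922,510540
1615840486,136970,510999
EOD
set datafile separator comma

# smallest timestamp becomes the new origin (t0 is a hypothetical name)
stats $Data using 1 nooutput
t0 = STATS_min

f(x) = m*(x - t0) + b
m = -20        # assumed starting value: rough slope per second
b = 139000     # assumed starting value: rough level at t0
fit f(x) $Data using 1:2 via m,b

print sprintf("slope: %g per second, value at t0: %g", m, b)
### end of sketch
```

With the origin moved to t0, the intercept b describes the level at the start of the data instead of an extrapolation back to 1970, so both parameters are of a manageable size.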

Related

Using the correlation matrix after a fit in Gnuplot

Say I need to fit some data to a parabola, and then perform some calculations involving the correlation matrix elements of the fit parameters: is there a way to use these elements directly in gnuplot after the fit converges? Are they stored in some variable, like the error estimates?
I quote the explicit problem I'm having. All of this is written to a plot.gp text file and run with gnuplot plot.gp.
I include set fit errorvariables at the beginning, and then proceed with:
f(x)=a+b*x+c*x*x
fit f(x) 'file.dat' u 1:2:3 yerrors via a,b,c
Once the fit is done, I can use the values of a,b,c and their errors a_err, b_err and c_err directly in the plot.gp script; my question is: can I do the same with the correlation matrix of the parameters?
The problem is that the matrix is only printed to the terminal once the script finishes running:
correlation matrix of the fit parameters:
a b c
a 1.000
b 0.910 1.000
c -0.956 -0.987 1.000
Are the entries of the matrix stored in some variable (like a_err, b_err) that I can access after the fit is done but before the script ends?
I think the command you are looking for is
set fit covariancevariables
If the `covariancevariables` option is turned on, the covariances between
final parameters will be saved to user-defined variables. The variable name
for a certain parameter combination is formed by prepending "FIT_COV_" to
the name of the first parameter and combining the two parameter names by
"_". For example given the parameters "a" and "b" the covariance variable is
named "FIT_COV_a_b".
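A minimal sketch of how these variables might be turned back into a correlation coefficient, using the small test data set from the other answer. It assumes that the diagonal entries (FIT_COV_a_a and so on) are also defined; the help text quoted above does not say so explicitly, so the script lists the variables first and only computes the ratio if they exist. The name corr_ab is hypothetical.

```gnuplot
### sketch: correlation coefficient from covariance variables
$Data <<EOD
1 1
2 3
3 10
4 19
EOD
f(x) = a*x**2 + b*x + c
set fit errorvariables covariancevariables
fit f(x) $Data using 1:2 via a,b,c

# list whatever FIT_COV_* variables this gnuplot version has created
show variables FIT_COV

# assumption: the variances are stored under FIT_COV_a_a and FIT_COV_b_b
if (exists("FIT_COV_a_a") && exists("FIT_COV_b_b")) {
    corr_ab = FIT_COV_a_b / sqrt(FIT_COV_a_a * FIT_COV_b_b)
    print sprintf("correlation(a,b) = %.3f", corr_ab)
}
### end of sketch
```

Dividing by the square roots of the two variances makes the result independent of any overall scaling of the covariance matrix, so it should match the corresponding entry of the printed correlation matrix.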
Edit: I missed gnuplot's intended way via the option covariancevariables (apparently available since gnuplot 5.0). Ethan's answer is the way to go. I nevertheless leave my answer here; with some modifications it might be useful for extracting something else from the fit output.
Maybe I missed it, but I am not aware that you can store the elements of the correlation matrix directly in variables. However, you can do it with a workaround.
You can set the output file for your fit results (check help set fit). The shortest output is created with the option results. The results are written to this file (actually, appended if the file already exists).
Example:
After 5 iterations the fit converged.
final sum of squares of residuals : 0.45
rel. change during last iteration : -3.96255e-10
degrees of freedom (FIT_NDF) : 1
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf) : 0.67082
variance of residuals (reduced chisquare) = WSSR/ndf : 0.45
Final set of parameters Asymptotic Standard Error
======================= ==========================
a = 1.75 +/- 0.3354 (19.17%)
b = -2.65 +/- 1.704 (64.29%)
c = 1.75 +/- 1.867 (106.7%)
correlation matrix of the fit parameters:
a b c
a 1.000
b -0.984 1.000
c 0.898 -0.955 1.000
Now you can read this file back into a datablock (see "gnuplot: load datafile 1:1 into datablock") and extract the values from the last lines (here: 3); check help word and help real.
Script:
### get fit correlation matrix into variables
reset session
$Data <<EOD
1 1
2 3
3 10
4 19
EOD
f(x) = a*x**2 + b*x + c
myFitFILE = "SO71788523_fit.dat"
set fit results logfile myFitFILE
fit f(x) $Data u 1:2 via a,b,c
set key top left
set grid x,y
# load file 1:1 into datablock
FileToDatablock(f,d) = GPVAL_SYSNAME[1:7] eq "Windows" ? \
sprintf('< echo %s ^<^<EOD & type "%s"',d,f) : \
sprintf('< echo "\%s <<EOD" & cat "%s"',d,f) # Linux/MacOS
load FileToDatablock(myFitFILE,'$FIT')
# extract parameters into variables
N = 3 # number of parameters
getValue(p1,p2) = real(word($FIT[|$FIT|-N+p1],p2+1)) # extract value as floating point number
aa = getValue(1,1)
ba = getValue(2,1)
bb = getValue(2,2)
ca = getValue(3,1)
cb = getValue(3,2)
cc = getValue(3,3)
set label 1 at graph 0.1,graph 0.8 \
sprintf("Correlation matrix:\naa: %g\nba: %g\nbb: %g\nca: %g\ncb: %g\ncc: %g",aa,ba,bb,ca,cb,cc)
plot $Data u 1:2 w lp pt 7 lc "red", \
f(x) w l lc "blue" title sprintf("fit: a=%g, b=%g, c=%g",a,b,c)
### end of script
Result:

Gnuplot smoothing data in loglog plot

I would like to plot a smoothed curve based on a dataset which spans over 13 orders of magnitude [1E-9:1E4] in x and 4 orders of magnitude [1E-6:1e-2] in y.
MWE:
set log x
set log y
set xrange [1E-9:1E4]
set yrange [1E-6:1e-2]
set samples 1000
plot 'data.txt' u 1:3:(1) smooth csplines not
The smooth curve looks nice above x=10. Below, it is just a straight line down to the point at x=1e-9.
When increasing samples to 1e4, smoothing works well above x=1. For samples 1e5, smoothing works well above x=0.1 and so on.
Any idea on how to apply smoothing to lower data points without setting samples to 1e10 (which does not work anyway...)?
Thanks and best regards!
JP
To my understanding, sampling in gnuplot is linear. Maybe there is logarithmic sampling in gnuplot, but I haven't found it yet.
Here is a suggestion for a workaround, which is not yet perfect but may serve as a starting point.
The idea is to split your data into ranges, for example decades, and to smooth them separately.
The drawback is that there might be some overlaps between the ranges. You can minimize or hide these by playing with set samples and every ::n, or maybe there is another way to eliminate the overlaps.
Code:
### smoothing over several orders of magnitude
reset session
# create some random test data
set print $Data
do for [p=-9:3] {
    do for [m=1:9:3] {
        print sprintf("%g %g", m*10**p, (1+rand(0))*10**(p/12.*3.-2))
    }
}
set print
set logscale x
set logscale y
set format x "%g"
set format y "%g"
set samples 100
pMin = -9
pMax = 3
set table $Smoothed
myFilter(col,p) = (column(col)/10**p-1) < 10 ? column(col) : NaN
plot for [i=pMin:pMax] $Data u (myFilter(1,i)):2 smooth cspline
unset table
plot $Data u 1:2 w p pt 7 ti "Data", \
$Smoothed u 1:2 every ::3 w l ti "cspline"
### end of code
Result:
Addition:
Thanks to @maij, who pointed out that this can be simplified by mapping the whole range into linear space. In contrast to @maij's solution, I would let gnuplot handle the logarithmic axes and keep the actual plot command as simple as possible, at the cost of some extra table plots.
Code:
### smoothing in loglog plot
reset session
# create some random test data
set print $Data
do for [p=-9:3] {
    do for [m=1:9:3] {
        print sprintf("%g %g", m*10**p, (1+rand(0))*10**(p/12.*3.-2))
    }
}
set print
set samples 500
set table $SmoothedLog
plot $Data u (log10($1)):(log10($2)) smooth csplines
set table $Smoothed
plot $SmoothedLog u (10**$1):(10**$2) w table
unset table
set logscale x
set logscale y
set format x "%g"
set format y "%g"
set key top left
plot $Data u 1:2 w p pt 7 ti "Data", \
$Smoothed u 1:2 w l lc "red" ti "csplines"
### end of code
Result:
Using a logarithmic scale basically means plotting the logarithm of a value instead of the value itself. The set logscale command tells gnuplot to do this automatically:
1. read the data, still linear world, no logarithm yet
2. calculate the splines on an equidistant grid (smooth csplines), still linear world
3. calculate and plot the logarithms (set logscale)
The key point is the equidistant grid. Let's say one chooses set xrange [1E-9:10000] and set samples 101. In the linear world 1e-9 compared to 10000 is approximately 0, and the resulting grid will be 1E-9 ~ 0, 100, 200, 300, ..., 9800, 9900, 10000. The first grid point is at 0, the second one at 100, and gnuplot is going to draw a straight line between them. This does not change when afterwards logarithms of the numbers are plotted.
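The grid spacing described above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the variable names are made up, and the log-spaced column shows what an equidistant grid in log10(x) would look like, not something gnuplot's smooth option produces by itself.

```gnuplot
### sketch: first grid points, equidistant in x versus equidistant in log10(x)
xmin = 1e-9
xmax = 1e4
n    = 101                          # same number as "set samples 101"

dx   = (xmax - xmin) / (n - 1)      # step of the linear grid
lmin = log10(xmin)
lmax = log10(xmax)
dl   = (lmax - lmin) / (n - 1)      # step in orders of magnitude

do for [i=0:3] {
    print sprintf("i=%d  linear: %g  log-spaced: %g", i, xmin + i*dx, 10**(lmin + i*dl))
}
### end of sketch
```

The linear grid jumps from the first point straight to roughly 100, while the log-spaced grid is still below 1e-8 after three steps, which is why almost all of the lower decades contain no sample points.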
This is what you already noted in your question: you need 10 times more points to get a smooth curve for each smaller exponent.
As a solution, I would suggest swapping the calculation of the logarithms and the calculation of the splines.
# create some random test data, code "stolen" from @theozh (https://stackoverflow.com/a/66690491)
set print $Data
do for [p=-9:3] {
    do for [m=1:9:3] {
        print sprintf("%g %g", m*10**p, (1+rand(0))*10**(p/12.*3.-2))
    }
}
set print
# this makes the splines smoother
set samples 1000
# manually account for the logarithms in the tic labels
set format x "10^{%.0f}" # for example this format
set format y "1e{%+03.0f}" # or this one
set xtics 2 # logarithmic world, tic distance in orders of magnitude
set ytics 1
# just "read logarithm of values" from file, before calculating splines
plot $Data u (log10($1)):(log10($2)) w p pt 7 ti "Data" ,\
$Data u (log10($1)):(log10($2)) ti "cspline" smooth cspline
This is the result:

Gnuplot: undefined value during function evaluation

I am trying to fit the function f(x)=exp(a*x) on Gnuplot. It keeps giving me the error 'undefined value during function evaluation'. I use the following code:
y(x)=exp(a*x)
a = 60
fit y(x) 'data.txt' using 1:2 via a
plot y(x), 'data.txt' using 1:2 notitle
The error is coming from the fourth line in the above bit of code. I have set the directory properly but did not include it in the piece of code above.
Where am I going wrong?
Assuming your data looks like this:
8,701 1032,000 1025,000
9,701 974,000 963,000
...
26,701 609,000 603,000
First, by default gnuplot expects decimal numbers to be written with '.' as decimal sign. To change this, use:
set decimalsign ','
Second, and more important to your question, gnuplot internally uses double precision numbers. They go up to about 1e308. In the first iteration of the fit there are calculations like exp(a*x) with a=60 and x=26, which results in exp(1560) = 3e677 - way too large, hence the error message.
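The size of that number can be estimated before running the fit, without evaluating exp() itself. A small sketch, using a=60 and x=26 from the paragraph above; the variable names a0 and xmax are hypothetical.

```gnuplot
### sketch: estimate the magnitude of exp(a*x) for the starting value
a0   = 60.0     # starting value of a from the question
xmax = 26.0     # largest x, rounded as in the text above

arg = a0 * xmax
print sprintf("argument of exp(): %g", arg)
print sprintf("decimal exponent of exp(%g): about %.1f", arg, arg / log(10))

# doubles go up to about 1e308, so the argument has to stay below roughly this value
print sprintf("largest usable argument: about %.1f", 308 * log(10))
### end of sketch
```

Any starting value for which a*x exceeds that limit somewhere in the data will overflow in the first iteration, so it is worth picking a starting value with a much smaller magnitude.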
Third, an exponential function f(x) = exp(a*x) starts at f(0) = 1 and increases for positive a, whereas your data starts at f(0) > 1000 and decreases. Therefore I would try a setup like this:
set decimalsign ','
y(x)=b*exp(-a*x)
a = 0.1
b = 1000
fit y(x) 'data.txt' using 1:2 via a,b
plot y(x), 'data.txt' using 1:2 notitle
Result:
Final set of parameters Asymptotic Standard Error
======================= ==========================
a = 0.0286709 +/- 0.0005953 (2.076%)
b = 1256.51 +/- 12.12 (0.9647%)
It's up to you to decide if the function really represents the underlying data.

How to plot lines parallel to the x-axis with a certain offset given by data in an input file with gnuplot

I calculated the eigenvalues of the Hamiltonian for the 1D-hydrogen atom in atomic units with the Fourier-Grid-Hamiltonian method in a nice little Fortran program.
All the eigenvalues found between -1 and 0 (the bound states) are saved into a file line by line like this:
-0.50016671392950229
-0.18026105614262633
-0.11485673263086937
-4.7309305955423042E-002
-4.7077108902158216E-002
As the number of eigenvalues found depends on the step size my program uses, the number of entries in the file can vary (in theory, there are infinitely many).
I now want to plot each value from the file as a line parallel to the x-axis, with the offset given by the value read from the file.
I also want to be able to plot the data only up to a certain line number, as the values get really close to each other the closer they come to zero, and they can no longer be distinguished by eye.
(Here, e.g., it would make sense to plot the first four entries; the fifth is already too close to the previous one.)
I know that one can plot lines parallel to the x-axis with the command plot *offset*, but I don't know how to tell gnuplot to use the data from the file. So far I have had to plot the values manually.
As a second step, I would like to plot the data only in a certain x range, more concretely between the points of intersection with the harmonic potential used for the numerical solution, V(x) = -1/(1+abs(x)).
The result should look like this:
scheme of the desired plot (lookalike)
The closest I got to, was with
plot -1/(1+abs(x)),-0.5 title 'E0',-0.18 title 'E1', -0.11 title 'E2'
which got me the following result:
my plot
Hope you guys can help me, and I'm really curious whether gnuplot can actually do the second step I described!
As for the first part of your question, you can for example use the xerrorbars plotting style as:
set terminal pngcairo
set output 'fig.png'
unset key
set xr [-1:1]
set yr [-1:0]
unset bars
plot '-' u (0):($1<-0.1?$1:1/0):(1) w xerrorbars pt 0 lc rgb 'red'
-0.50016671392950229
-0.18026105614262633
-0.11485673263086937
-4.7309305955423042E-002
-4.7077108902158216E-002
e
The idea here is to:
- interpret the energies E as points with coordinates (0,E) and assign to each of them an x-errorbar of width 1 (via the third part of the specification (0):($1<-0.1?$1:1/0):(1))
- "simulate" the horizontal lines with x-errorbars. To this end, unset bars and pt 0 ensure that gnuplot displays just plain lines.
- consider only energies E<-0.1; otherwise the expression $1<-0.1?$1:1/0 evaluates to the undefined value 1/0, so nothing is plotted for such E.
- plot '-' with explicit values can of course be replaced with, e.g., plot 'your_file.dat'
This produces:
For the second part, it mostly depends on how complicated your function V(x) is. In the particular case of V(x)=-1/(1+|x|), one can see directly that it is symmetric around x=0 and calculate the turning points explicitly, e.g.,
set terminal pngcairo
set output 'fig.png'
fName = 'test.dat'
unset key
set xr [-10:10]
set yr [-1:0]
unset bars
f(x) = -1 / (1+abs(x))
g(y) = (-1/y - 1)
plot \
f(x) w l lc rgb 'black', \
fName u (0):($1<-0.1?$1:1/0):(g($1)) w xerrorbars pt 0 lc rgb 'red', \
fName u (0):($1<-0.1?$1:1/0):(sprintf("E%d", $0)) w labels offset 0, char 0.75
which yields
The idea is basically the same as before, just the width of the errorbar now depends on the y-coordinate (the energy). Also, the labels style is used in order to produce explicit labels.
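The half-width function g(y) used above follows from solving -1/(1+|x|) = E for |x|, which gives |x| = -1/E - 1. A short sketch that checks this for the three levels plotted manually in the question; the loop variable E and the helper name xt are only for illustration.

```gnuplot
### sketch: check the turning points of V(x) = -1/(1+|x|)
f(x) = -1.0 / (1 + abs(x))
g(y) = -1.0 / y - 1          # half-width: |x| at which f(x) equals y

do for [E in "-0.5 -0.18 -0.11"] {
    xt = g(real(E))
    # f(xt) should reproduce E if g really inverts f
    print sprintf("E = %s: turning points at x = +/-%g, V(xt) = %g", E, xt, f(xt))
}
### end of sketch
```

For a potential without a closed-form inverse, this shortcut is not available and the truncation approach with max() may be the simpler route.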
Another approach may be to get data from "energy.dat" (as given in the question) with system and cat commands (so assuming a Un*x-like system...) and select V(x) and E at each x via max:
set key bottom right
set yr [-1:0.2]
set samples 1000
Edat = system( "cat energy.dat" )
max(a,b) = ( a > b ) ? a : b
V(x) = -1/(1+abs(x))
plot for [ E in Edat ] \
max(V(x),real(E)) title sprintf("E = %8.6f", real(E)) lw 2, \
V(x) title "V(x) = -1/(1+|x|)" lc rgb "red" lw 2
If we change the potential to V(x) = -abs(cos(x)), the plot looks pretty funny (and the energy levels are of course not correct!)
More details about the script:
max is not a built-in function in Gnuplot, but a user-defined function having two formal arguments. So for example, we may define it as
mymax( p, q ) = ( p > q ) ? p : q
with any other names (and use mymax in the plot command). Next, the ? symbol is a ternary operator that gives a short-hand notation for an if...else construct. In a pseudo-code, it works as
function max( a, b ) {
    if ( a > b ) then
        return a
    else
        return b
    end
}
This way, max(V(x),real(E)) selects the greater value between V(x) and real(E) for any given x and E.
Next, Edat = system( "cat energy.dat" ) tells Gnuplot to run the shell command "cat energy.dat" and assign the output to a new variable Edat. In the above case, Edat becomes a string that contains a sequence of energy values read in from "energy.dat". You can check the contents of Edat by print( Edat ). For example, it may be something like
Edat = "-0.11 -0.22 ... -0.5002"
plot for [ E in Edat ] ... loops over words contained in a string Edat. In the above case, E takes a string "-0.11", "-0.22", ..., "-0.5002" one-by-one. real(E) converts this string to a floating-point value. It is used to pass E (a character string) to any mathematical function.
The basic idea is to draw a truncated potential above E, max(V(x),E), for each value of E. (You can check the shape of such potential by plot max(V(x),-0.5), for example). After plotting such curves, we redraw the potential V(x) to make it appear as a single potential curve with a different color.
set samples 1000 increases the resolution of the plot with 1000 points per curve. 1000 is arbitrary, but this seems to be sufficient to make the figure pretty smooth.

Is there a way to have gnuplot use xaxis time data, but skip certain intervals (e.g. non-trading hours)

I'm collecting pricing data on stocks and options during trading hours and appending them to a data file that I plot with gnuplot. The file looks like:
2013-01-30--15:58:14 38.68 0.64
2013-01-30--15:58:44 38.70 0.64
2013-01-30--15:59:15 38.70 0.64
2013-01-30--15:59:45 38.69 0.64
I end up with large periods of time for which I don't collect any data, since the markets are closed.
When I plot this data with gnuplot, with the x-axis set up as time data (set xdata time plus a timefmt), it displays large gaps from the end of one day to the start of the next.
I'd prefer to have it skip those times during the days where there is no actual data... Is there a way to do this?
I've been able to come close by not plotting the data against the time value in the first column, but I'd like to show the time data AS WELL AS skip those times when the data was not collected.
I hope this makes sense and appreciate your help.
If I understood correctly, you can make good use of a broken x-axis.
There are two ways to obtain a broken axis. The first one relies on ternary operators to plot the data only in the region of interest (which in your case should not even be necessary) and on shifting the xtics left in order to shrink the empty region.
This is a nice tutorial:
http://gnuplot-tricks.blogspot.com/2009/06/broken-axis-revisited.html
The second one makes use of multiplot instead. This is probably better suited to your needs.
http://gnuplot-tricks.blogspot.com/2010/06/broken-axis-once-more.html
Hope it helps.
There are similar but slightly different questions:
GNUPLOT Plotting 5 day financial week
I have non-contiguous date/time X data and want non-contiguous X scale
The question is not about breaking the axis, but skipping time intervals with no data.
This can simply be done by plotting the y-data versus the row index (i.e. pseudocolumn 0) (check pseudocolumns), however, then the challenge is to get some reasonable xtics. Here are two suggestions.
Script: (works for gnuplot>=5.0.0, Jan. 2015)
### skip non-trading hours
reset session
FILE = "SO14618708.dat"
myTimeFmt = "%Y-%m-%d--%H:%M:%S"
# create some random test data
set print FILE
t0 = time(0)
y0 = 100
do for [i=0:400] {
    t = t0 + i*1800
    isOpen(t) = tm_wday(t)>0 && tm_wday(t)<6 && tm_hour(t)>=9 && tm_hour(t)<=17
    if (isOpen(t)) {
        print sprintf("%s %g",strftime(myTimeFmt,t),y0=y0+rand(0)*2-1)
    }
}
set print
set format x "%a\n%d" timedate
set grid x,y
set ytics 5
set key noautotitle
set multiplot layout 3,1
set title "with non-trading hours"
plot FILE u (timecolumn(1,myTimeFmt)):2 w l lc "red"
set title "without non-trading hours, but possible duplicates in day tics"
set format x "\n" timedate
myXtic(col) = strftime("%a\n%d",strptime(myTimeFmt,strcol(col)))
N = 15
plot FILE u 0:2 w l lc "web-green", \
'' u ($0*N):(NaN):xtic(myXtic(1)) every N
N = 1
set title sprintf("with tics only every Nth day (here: N=%d)",N)
SecPerDay = 3600*24
isNewDay(col) = (t0=t1,t1=timecolumn(col,myTimeFmt),t0!=t0 || int(t1)/SecPerDay-int(t0)/SecPerDay>0)
everyNthNewDay(col) = (isNewDay(col) ? d0=d0+1 : 0, d0==N ? (d0=0,1) : 0)
myXtic(col) = everyNthNewDay(col) ? strftime("%a\n%d",t1) : NaN
plot FILE u 0:2 w l lc "blue", \
t1=(d0=0,NaN) '' u 0:(NaN):xtic(myXtic(1))
unset multiplot
### end of script
Result:
Script: (version for the time of OP's question. Works for gnuplot>=4.6.0, March 2012)
Creating reasonable time and string data files is difficult in gnuplot 4.6, so this part is skipped; it is assumed that you have a suitable data file.
However, in the lowest plot, I have only managed either to not display the very first tic (Thu 22) or to show it incorrectly.
### skip non-trading hours
reset
FILE = "SO14618708.dat"
myTimeFmt = "%Y-%m-%d--%H:%M:%S"
set format x "%a\n%d"
set grid x,y
set ytics 5
set key noautotitle
set timefmt "%Y-%m-%d--%H:%M:%S"
set xdata time
set multiplot layout 3,1
set title "with non-trading hours"
plot FILE u (timecolumn(1)):2 w l lc rgb "red"
set title "without non-trading hours, but possible duplicates in day tics"
set format x "\n"
myXtic(col) = strftime("%a\n%d",strptime(myTimeFmt,strcol(col)))
N = 15
plot FILE u 0:2 w l lc rgb "web-green", \
'' u ($0*N):(NaN):xtic(myXtic(1)) every N
N = 1
set title sprintf("with tics only every Nth day (here: N=%d)",N)
SecPerDay = 3600*24
isNewDay(col) = (t0=t1,t1=strptime(myTimeFmt,strcol(col)),(t0!=t0) || ((int(t1)/SecPerDay-int(t0)/SecPerDay)>0))
everyNthNewDay(col) = (isNewDay(col) ? d0=d0+1 : 0, d0==N ? (d0=0,1) : 0)
myXtic(c) = c ? strftime("%a\n%d",t1) : ' '
plot FILE u 0:2 w l lc rgb "blue", \
t1=(d0=0,NaN) '' u ((c=everyNthNewDay(1)) ? $0 : NaN):(NaN):xtic(myXtic(c)) w p
unset multiplot
### end of script
Result: (created with gnuplot4.6.0)
