fit to time series using Gnuplot - gnuplot

I am a big fan of Gnuplot and now I would like to use the fit-function for time series.
My data set is like:
1.000000 1.000000 0.999795 0.000000 0.000000 0.421927 0.654222 -25.127700 1.000000 1994-08-12
1.000000 2.000000 0.046723 -0.227587 -0.689491 0.328387 1.000000 0.000000 1.000000 1994-08-12
2.000000 1.000000 0.945762 0.000000 0.000000 0.400038 0.582360 -8.624480 1.000000 1995-04-19
2.000000 2.000000 0.060228 -0.056367 -0.680224 0.551019 1.000000 0.000000 1.000000 1995-04-19
3.000000 1.000000 1.016430 0.000000 0.000000 0.574478 0.489638 -3.286880 1.000000 1995-07-15
And my fitting script:
set timefmt "%Y-%m-%d"
set xdata time
set format x "%Y-%m-%d"
f(x)=a+b*x
fit f(x) "model_fit.dat" u 10:($2==2?$4:1/0) via a,b
So I am doing a conditional fit on time data.
My problem is that gnuplot's fit function doesn't seem to work on time data.
I found a similar question here: Linear regression for time series with Gnuplot, but I don't want to use other software. I also don't know how to convert time values to numbers and back again.
Can anyone help me solve this with Gnuplot?
Thanks a lot!

In fact, gnuplot's fit works fine on time data; you only need to pay attention to a few details.
A linear fit through two data points could in principle be solved exactly, but gnuplot always performs an iterative nonlinear least-squares fit, so the initial values matter.
The x-values passed to f(x) are in seconds, so they are very large in magnitude (hundreds of millions) and the slope b must be correspondingly tiny. An initial value of b = 1e-8 works fine here:
set timefmt "%Y-%m-%d"
set xdata time
set format x "%Y-%m-%d"
f(x)=a+b*x
a = 1
b = 1e-8
fit f(x) "model_fit.dat" u 10:($2==2?$4:1/0) via a,b
plot "model_fit.dat" u 10:($2==2?$4:1/0) lt 1 pt 7 title 'data',\
f(x) w l lt 1 title 'fit'
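One way to make the fit less sensitive to the starting values is to shift and rescale the time axis inside the model function, so that the slope comes out per day instead of per second. The following is only a sketch under assumptions: the names t0 and g are introduced here for illustration, t0 is taken to be the first date in the sample data, and the starting values are rough guesses.

```gnuplot
set timefmt "%Y-%m-%d"
set xdata time
set format x "%Y-%m-%d"

# Reference date in seconds (assumption: first date in the sample data).
t0 = strptime("%Y-%m-%d", "1994-08-12")

# x still arrives in seconds; g() converts it to days since t0,
# so b is a slope per day and a is the fitted value at t0.
g(x) = a + b*(x - t0)/86400.0

# Rough, non-zero starting values (assumed, adjust for your data).
a = 1
b = 1e-3
fit g(x) "model_fit.dat" u 10:($2==2?$4:1/0) via a,b

plot "model_fit.dat" u 10:($2==2?$4:1/0) lt 1 pt 7 title 'data', \
     g(x) w l lt 1 title 'fit'
```

Because g(x) still takes seconds as its argument, it can be drawn directly on the time axis next to the data.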

Related

Display changing column value in Gnuplot animation

I am making a gnuplot animation of a satellite going around a planet. My task is to display its XY trajectory and the associated values of velocity and energy versus time. I know how to plot the path, but I've been having problems displaying velocity etc.
The code below plots the following:
satellite track and time steps -- column 3:4;
satellite position -- column 3:4;
planet position -- column 6:7.
do for [n=0:int(STATS_records)] {
plot "sat.dat" u 3:4 every ::0::n w lp ls 2 t sprintf("steps=%i", n), \
"sat.dat" u 3:4 every ::n::n w lp ls 4 notitle, \
"sat.dat" u 6:7 every ::0::n w lp ls 3 notitle
}
How do I display the associated velocity values for each sprintf ? The velocity values are in column 5. Thank you everyone in advance.
It seems that you want to put everything in the "key" (legend), but another option is to use labels, which can easily be placed anywhere. There are two kinds: labels that you place one at a time with set label, and the with labels plotting style, which draws text at data positions. Don't confuse the two.
Your main issue seems to be how to pull out the velocity value from column 5. My first instinct (which is quite hacky) is to use some external program, like awk:
v = system(sprintf("awk 'NR==%d{print $5}' '%s'", n+1, infile))
set label 1 sprintf("v=%.3f", v+0) at screen 0.2,0.9
This is also an example of a label (with tag 1). The screen keyword means the position is relative to the whole screen rather than to the graph. Putting this inside your for loop reassigns label 1 on every iteration, so it replaces the label from the previous iteration. Without the tag, each iteration would add another label on top of the last one, which would get messy.
Using an external command line like this isn't very portable. (I don't think it would work on Windows.) I saw this question that shows how to pull a value from a specific row and column of a file. The problem I had with using this is that stats implicitly filters according to whatever xrange is set. When making animations like this, I've noticed that the camera can jump around too much from autoranging, so it's nice to have tight control over the plotting range. Defining an xrange at the top of the file interfered with a subsequent stats command to read a velocity value.
You can, however, specify a range for stats (before the file name, such as stats [*:*] infile). But I had issues using this in combination with a predefined xrange based on position. I found that it did work if I specified the desired plotting range on the plot line instead of using set xrange. Here is another (full script) version using only gnuplot:
set terminal pngcairo
infile = 'anim.dat'
stats infile using 3:4 name 'data' nooutput
set key font 'Courier'
do for [n=0:data_records-1] {
set output sprintf('frame-%03d.png', n)
stats [*:*] infile every ::n::n using 5 name 'velocity' nooutput
plot [data_min_x:1.1*data_max_x][data_min_y:1.1*data_max_y] \
infile u 3:4 every ::0::n w linespoints ls 2 t \
sprintf("steps =%6d\nvelocity =%6.3f", n, velocity_min), \
'' u 3:4 every ::n::n w points pt 7 ps 3 notitle
}
Notice that you could easily change this to a set label if you want. Another option is to plot
'' u (x):(y):5 every ::n::n w labels
to place a label at the position (x,y), given in axis coordinates.
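For illustration, the set label variant could look roughly like the sketch below. It reuses the file name anim.dat and the stats prefixes from the script above; the label text and its screen position are assumptions chosen for this example.

```gnuplot
set terminal pngcairo
infile = 'anim.dat'
stats infile using 3:4 name 'data' nooutput

do for [n=0:data_records-1] {
    set output sprintf('frame-%03d.png', n)
    # Read the velocity (column 5) of row n only.
    stats [*:*] infile every ::n::n using 5 name 'velocity' nooutput
    # Tag 1 is reused, so each frame replaces the previous label.
    set label 1 sprintf("v = %.3f", velocity_min) at screen 0.2,0.9
    plot [data_min_x:1.1*data_max_x][data_min_y:1.1*data_max_y] \
         infile u 3:4 every ::0::n w linespoints notitle, \
         '' u 3:4 every ::n::n w points pt 7 ps 3 notitle
}
```

Since only a single row is selected, velocity_min is simply the value in that row.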
I don't have your data, but I made my own file with what I hope is a similar format to yours:
anim.dat
0 0.0 0.0 0.0 1.11803398875 0.625
1 0.05 0.05 0.02375 1.09658560997 0.625
2 0.1 0.1 0.045 1.07703296143 0.625
3 0.15 0.15 0.06375 1.05948100502 0.625
4 0.2 0.2 0.08 1.04403065089 0.625
5 0.25 0.25 0.09375 1.0307764064 0.625
6 0.3 0.3 0.105 1.01980390272 0.625
7 0.35 0.35 0.11375 1.01118742081 0.625
8 0.4 0.4 0.12 1.00498756211 0.625
9 0.45 0.45 0.12375 1.00124921973 0.625
10 0.5 0.5 0.125 1.0 0.625
11 0.55 0.55 0.12375 1.00124921973 0.625
12 0.6 0.6 0.12 1.00498756211 0.625
13 0.65 0.65 0.11375 1.01118742081 0.625
14 0.7 0.7 0.105 1.01980390272 0.625
15 0.75 0.75 0.09375 1.0307764064 0.625
16 0.8 0.8 0.08 1.04403065089 0.625
17 0.85 0.85 0.06375 1.05948100502 0.625
18 0.9 0.9 0.045 1.07703296143 0.625
19 0.95 0.95 0.02375 1.09658560997 0.625

Best visualization approach to plot a dataset when there is a big difference between the values (GNUPLOT)

I am using the following gnuplot script in order to plot a dataset composed of 400 lines
set title "Learning time for the proposed approachs (Freebase)"
set term png
set boxwidth 3
set style fill solid
set output "dbpedia.png"
set ylabel "Learning time (seconds)"
set xlabel "increasing size of the training dataset"
set xtics font ", 9"
set grid
everyfifth(col) = (int(column(col))%10 ==0)?stringcolumn(1):""
plot for [col=2:4] "dbpedia_duration.txt" every 10 using col:xticlabels(everyfifth(0)) with lines lw 2 title columnheader
Sample dataset
size DDS-rand DDS-ambig DDS-ambig-NN
10 0.003 0.01 0.046
20 0.004 0.423 2.094
30 0.004 1.768 9.262
40 0.004 5.933 30.649
50 0.003 0.586 2.871
60 0.007 2.282 14.226
70 0.005 0.512 2.707
80 0.007 0.089 0.468
90 0.006 4.61 24.471
100 0.006 3.013 16.411
110 0.006 1.578 8.244
120 0.006 1.194 6.418
130 0.008 2.401 12.398
140 0.008 0.014 0.027
150 0.007 0.284 1.541
160 0.009 1.25 7.598
170 0.012 2.027 11.149
Problem and questions
As you can see there is a big difference between the blue curve on one side and the red and green curves on the other side. It's hard to see the other curves on a black and white paper.
Is there a better way to plot this dataset? It is really annoying because we can barely see the red and green curves.
update
If we use set logscale y as suggested by @Daniel, we do get a clear graph.
A standard approach would be to plot the y axis logarithmically, such that the tics for 0.001, 0.01, 0.1, 1, 10, ... are equidistant.
set logscale y
Note: this does not work if your data set contains values exactly equal to zero. In this case, you could use
plot 'data.txt' using 1:($2>0? $2 : 1/0)
to skip non-positive values (1/0 is undefined, and gnuplot skips undefined points). Adjust the column numbers for x and y to match your data file.
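Applied to the three-column data file from the question, the two ideas might be combined as in this sketch. The helper name positive is made up for this example, and the first column is assumed to hold the x values.

```gnuplot
set logscale y
set ylabel "Learning time (seconds)"
set grid

# Hypothetical helper: return the value of column c if it is positive,
# otherwise an undefined value that gnuplot will skip.
positive(c) = (column(c) > 0 ? column(c) : 1/0)

plot for [col=2:4] "dbpedia_duration.txt" using 1:(positive(col)) \
     with lines lw 2 title columnhead(col)
```

Note that a skipped point interrupts the line at that position, so a gap in a curve indicates a value that could not be drawn on the logarithmic axis.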

Multiple plots with gnuplot by grouping columns

I have a data file with the schema "object parameter output1 output2 ... outputk". For example:
A 0.1 0.2 0.43 0.81 0.60
A 0.2 0.1 0.42 0.83 0.62
A 0.3 0.5 0.48 0.84 0.65
B 0.1 0.1 0.42 0.83 0.62
B 0.2 0.1 0.82 0.93 0.61
B 0.3 0.5 0.48 0.34 0.15
...
I want to create multiple plots, each plot corresponding to an object, with x axis being the parameter and series being the outputs. Currently, I've written a python script which dumps the rows for each object in different files and then calls gnuplot. Is there a more elegant way to plot it?
You are looking for this:
plot 'data.txt' using (strcol(1) eq "A" ? $2 : 1/0):4 with line
which plots column 4 against the parameter for object A only.
If you would like to create plots for every object use:
do for [object in "A B"] {
reset
set title sprintf("Object %s",object)
plot 'data.txt' using (strcol(1) eq object ? $2 : 1/0):4 notitle with line
pause -1
}
Just press Enter for the next plot.
Of course, you can also export these plots to files.
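A minimal sketch of the file export, assuming the pngcairo terminal is available and using an output file name pattern invented for this example:

```gnuplot
set terminal pngcairo
do for [object in "A B"] {
    # One PNG per object; the name pattern is an assumption.
    set output sprintf("object_%s.png", object)
    set title sprintf("Object %s", object)
    plot 'data.txt' using (strcol(1) eq object ? $2 : 1/0):4 notitle with lines
}
# Close the last output file.
set output
```

No pause is needed here, because nothing is shown interactively.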

Gnuplot Graph Is Disjointed (Data Seems Shifted/Misordered)

Gnuplot gives me the following picture with an odd disjoint in the second graph, whose origin I cannot determine. I've included the data below, in which the x-values are monotonically increasing, which should rule out the possibility of such a disjoint. Any help appreciated!
Generated from the following script:
set size 0.8,0.4
set lmargin 1
set terminal png
set output "test.png"
set multiplot
set origin 0.1,0.1
set xtics 5
set xrange[0:25]
set xlabel "Year"
plot "./g1" u ($1+1):2 w lines t "4 years"
set xlabel ""
set origin 0.1,0.5
set xtics format ""
set x2tics 5
plot "./g2" u ($1+1):2 w lines t "5 years"
unset multiplot
Data for g1 is:
0.000000 1.000000
1.000000 3.000000
2.000000 9.000000
3.000000 27.000000
4.000000 0.809131
5.000000 2.427394
6.000000 7.282183
7.000000 21.846549
8.000000 0.654694
9.000000 1.964081
10.000000 5.892243
11.000000 8.935199
12.000000 0.529733
13.000000 1.589200
14.000000 3.983240
15.000000 2.509780
16.000000 0.428624
17.000000 1.233139
18.000000 1.951804
19.000000 0.595792
20.000000 0.343980
21.000000 0.809600
22.000000 0.729229
23.000000 0.171423
24.000000 0.258384
25.000000 0.426250
Data for g2 is:
0.000000 1.000000
1.000000 3.000000
2.000000 9.000000
3.000000 27.000000
4.000000 81.000000
5.000000 2.427394
6.000000 7.282183
7.000000 21.846549
8.000000 65.539647
9.000000 196.618942
10.000000 5.892243
11.000000 17.676730
12.000000 53.030190
13.000000 159.090569
14.000000 241.250367
15.000000 14.302798
16.000000 42.908394
17.000000 128.725182
18.000000 322.642448
19.000000 203.292210
20.000000 34.718531
21.000000 104.155593
22.000000 299.652772
23.000000 474.288428
24.000000 144.777335
25.000000 84.275565
That's strange. On my system (ubuntu 11.10 64bit) I don't see the problem you have:
$ gnuplot --version
gnuplot 4.4 patchlevel 3
$ gnuplot < a.gnuplot # a.gnuplot is your script, unmodified
And it produces this:
If I were you, I'd check:
the gnuplot version;
the input files (in vim, use :set list to see whether any stray hidden characters are present).

Histogram with numeric x-axis in gnuplot?

I have this file as data.dat:
Xstep Y1 Y2 Y3 Y4
332 1.22 0.00 0.00 1.43
336 5.95 12.03 6.11 10.41
340 81.05 81.82 81.92 81.05
394 11.76 6.16 10.46 5.87
398 0.00 0.00 1.51 1.25
1036 0.03 0.00 0.00 0.00
I can plot this data as histogram with this script, hist-v1.gplot (using set style data histogram):
set xlabel "X values"
set ylabel "Occurence"
set style data histogram
set style histogram cluster gap 1
set style fill solid border -1
set term png
set output 'hist-v1.png'
set boxwidth 0.9
# attempt to set xtics so they are positioned numerically on x axis:
set xtics ("332" 332, "336" 336, "340" 340, "394" 394, "398" 398, "1036" 1036)
# ti col reads the first entry of the column, uses it as title name
plot 'data.dat' using 2:xtic(1) ti col, '' u 3 ti col, '' u 4 ti col, '' u 5 ti col
And by calling:
gnuplot hist-v1.gplot && eog hist-v1.png
this image is generated:
However, you can notice that the X axis is not scaled numerically - it understands the X values as categories (i.e. it is a category axis).
I can get a more numerical X axis with the following script, hist-v2.gplot (using with boxes):
set xlabel "X values"
set ylabel "Occurence"
# in this case, histogram commands have no effect
set style data histogram
set style histogram cluster gap 1
set style fill solid border -1
set term png
set output 'hist-v2.png'
set boxwidth 0.9
set xr [330:400]
# here, setting xtics makes them positioned numerically on x axis:
set xtics ("332" 332, "336" 336, "340" 340, "394" 394, "398" 398, "1036" 1036)
# 1:2 will ONLY work with proper xr; since we have x>300; xr[0:10] generates "points y value undefined"!
plot 'data.dat' using 1:2 ti col smooth frequency with boxes, '' u 1:3 ti col smooth frequency with boxes
And by calling:
gnuplot hist-v2.gplot && eog hist-v2.png
this image is generated:
image hist-v2.png http://img266.imageshack.us/img266/6717/histv2.png
Unfortunately, the bars 'overlap' here, so it is hard to read the graph.
Is there a way to keep the numerical X axis scale as in hist-v2.png, but keep the 'bars' side by side as in hist-v1.png? This thread, "Re: Histogram with x axis date error", says you cannot:
But it will be hard to pull the x-coordinate date out of the data file, ...
but then, it refers to a different problem...
Thanks,
Cheers!
OK, after reading the gnuplot help for a bit, it seems that the histogram style will always interpret the x axis as sequential entries/categories, so indeed there seems to be no way to get a numerical axis with the histogram style.
However, it turns out that $ can refer to a column, and those can be used to actually 'reposition' the bars in the second (frequency with boxes style) example; so with this code as hist-v2b.gplot:
set xlabel "X values"
set ylabel "Occurence"
set style fill solid border -1
set term png
set output 'hist-v2b.png'
set boxwidth 0.9
set xr [330:400]
# here, setting xtics makes them positioned numerically on x axis:
set xtics ("332" 332, "336" 336, "340" 340, "394" 394, "398" 398, "1036" 1036)
# 1:2 will ONLY work with proper xr; since we have x>300; xr[0:10] generates "points y value undefined"!
plot 'data.dat' using ($1-0.5):2 ti col smooth frequency with boxes, '' u ($1-0.25):3 ti col smooth frequency with boxes, '' u ($1+0.25):4 ti col smooth frequency with boxes, '' u ($1+0.5):5 ti col smooth frequency with boxes
And by calling:
gnuplot hist-v2b.gplot && eog hist-v2b.png
this image is generated:
image hist-v2b.png http://img823.imageshack.us/img823/805/histv2b.png
... which is pretty much what I wanted in the first place.
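The same repositioning idea can be written as a loop, with the offsets derived from the box width so that the boxes sit side by side without overlapping. This is only a sketch: the variable names n and bw and the width of 0.8 are assumptions, and smooth frequency is left out because each x value occurs only once in this file.

```gnuplot
set style fill solid border -1
set xrange [330:400]

n  = 4      # number of data columns (Y1..Y4)
bw = 0.8    # width of one box in x units (assumed value)
set boxwidth bw

# Columns 2..5 are shifted by -1.5, -0.5, +0.5 and +1.5 box widths,
# so each group of four boxes is centred on its x value.
plot for [i=2:n+1] 'data.dat' using ($1 + (i - (n+3)/2.0)*bw):i \
     title columnhead(i) with boxes
```

With x values spaced 4 apart, a group of four boxes of width 0.8 occupies 3.2 units, which leaves a small gap between neighbouring groups.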
Just a small note - I originally wanted to use the script with inline data; for a setup like this, it would have to be written as
plot '-' using ($1-0.5):2 ti col smooth frequency with boxes, '-' u ($1-0.25):3 ti col smooth frequency with boxes
Xstep Y1 Y2 Y3 Y4
332 1.22 0.00 0.00 1.43
336 5.95 12.03 6.11 10.41
340 81.05 81.82 81.92 81.05
394 11.76 6.16 10.46 5.87
398 0.00 0.00 1.51 1.25
1036 0.03 0.00 0.00 0.00
end
Xstep Y1 Y2 Y3 Y4
332 1.22 0.00 0.00 1.43
336 5.95 12.03 6.11 10.41
340 81.05 81.82 81.92 81.05
394 11.76 6.16 10.46 5.87
398 0.00 0.00 1.51 1.25
1036 0.03 0.00 0.00 0.00
end
... that is, the data would have to be entered multiple times, as it comes in from stdin - this problem is discussed in gnuplot - do multiple plots from data file with built-in commands.
Cheers!
PS: As there is quite a bit of space on the diagram, it would be nice if we could somehow specify separate x-axis ranges; that is discussed in:
Gnuplot tricks
Gnuplot tricks: Broken axis revisited
Setting the box width properly is very important when you plot a histogram using the "boxes" plot style. I have written about this in one of my blog articles.

Resources