Cumulative data and extrapolation with gnuplot

I have a list of dates and events that is not necessarily sorted by date, for example:
# Date Event
04.12.2018 -4
23.06.2018 5
04.10.2018 3
11.11.2018 -9
08.03.2018 -4
08.03.2018 2
11.11.2018 -3
I would like to sum up the events and do an extrapolation (e.g. linear), for example to find out when the data will hit a certain threshold (e.g. zero).
The options smooth frequency and smooth cumulative seem to be made for this.
But I am struggling with the following:
a) How can I add a start value (offset), e.g. StartValue = 500? The following command doesn't do it:
plot $Data u (strftime("%d.%m.%Y",timecolumn(1,"%d.%m.%Y"))):($2+StartValue) smooth cumulative w l t "Cumulated Events"
b) How can I get the cumulative data, especially if the data is not sorted by date?
set table "DataCumulative.dat"
plot $Data u (strftime("%d.%m.%Y",timecolumn(1,"%d.%m.%Y"))):2 smooth cumulative with table
unset table
This looks similar to this question (GNUPLOT: saving data from smooth cumulative), but I don't get the expected numbers. In my example below, I expected the file "DataCumulative.dat" to contain unique dates and basically the data from the lower plot. How can I get this?
The code:
### start code
reset session
set colorsequence classic
# function for creating a random date between two dates
t(date_str) = strptime("%d.%m.%Y", date_str)
date_random(d0,d1) = strftime("%d.%m.%Y",rand(0)*(t(d1)-t(d0)) + t(d0))
# create some random date data
date_start = "01.01.2018"
date_end = "30.06.2018"
set print $Data
do for [i=1:1000] {
print sprintf("%s\t%g", date_random(date_start,date_end), floor(rand(0)*10-6))
}
set print
set xdata time
set timefmt "%d.%m.%Y"
set xtics format "%b"
set xrange[date_start:"31.12.2018"]
set multiplot layout 2,1
plot $Data u (strftime("%d.%m.%Y",timecolumn(1,"%d.%m.%Y"))):2 smooth frequency with impulses t "Events"
plot $Data u (strftime("%d.%m.%Y",timecolumn(1,"%d.%m.%Y"))):2 smooth cumulative w l t "Cumulated Events"
unset multiplot
# attempt to get cumulative data into a file
set table "DataCumulative.dat"
plot $Data u (strftime("%d.%m.%Y",timecolumn(1,"%d.%m.%Y"))):2 smooth cumulative with table
unset table
### end of code
The plots:

I think I finally got it now. However, there are a few things I learned that I still don't completely understand.
1.
In order to get the cumulative data, you should not use
set table $DataCumulative
plot $Data u (stringcolumn(1)):2 smooth cumulative with table
unset table
but instead:
set table $DataCumulative
plot $Data u (stringcolumn(1)):2 smooth cumulative
unset table
Note the missing "with table" in the plot command.
The first version gives you the original data, the second one the desired cumulative data. But I don't yet understand why.
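To check which numbers $DataCumulative should contain, here is a minimal Python sketch (not gnuplot) of what smooth frequency and smooth cumulative are assumed to do: sort by x, merge identical x values by summing their y values, then accumulate. It uses the sample events from the question; the function names are hypothetical, and start_value stands for the offset from question a).

```python
from collections import defaultdict
from datetime import datetime

# Sample events from the question: (date as "%d.%m.%Y", event value)
EVENTS = [
    ("04.12.2018", -4), ("23.06.2018", 5), ("04.10.2018", 3),
    ("11.11.2018", -9), ("08.03.2018", -4), ("08.03.2018", 2),
    ("11.11.2018", -3),
]

def frequency(events):
    """Sum the values of identical dates and sort by date (like 'smooth frequency')."""
    totals = defaultdict(int)
    for date_str, value in events:
        totals[datetime.strptime(date_str, "%d.%m.%Y")] += value
    return sorted(totals.items())

def cumulative(events, start_value=0):
    """Running sum over the unique, sorted dates, starting from start_value."""
    running = start_value
    result = []
    for date, total in frequency(events):
        running += total
        result.append((date.strftime("%d.%m.%Y"), running))
    return result

if __name__ == "__main__":
    for row in cumulative(EVENTS, start_value=500):
        print(*row)
```

The offset is added once to the running sum. In question a), ($2+StartValue) adds it to every data line before accumulating, so the curve grows by 500 per line; adding it after accumulation, as ($2+Level_Start) does on $DataCumulative in the modified code, gives the intended result.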
2.
With the default datafile separator setting, which is
set datafile separator whitespace
it doesn't seem to work. It gives an error message like "line xxx: No data to fit".
Instead, you have to set
set datafile separator " \t" # space and TAB
But I don't understand why.
3.
Fitting time data:
f_lin(x) = m*x + c
won't give a good fit at all. Apparently, you have to subtract the start date before fitting:
f_lin(x) = m*(x-strptime("%d.%m.%Y", Date_Start)) + c
I remember reading this a long time ago in the gnuplot documentation, but I can't find it anymore.
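A plausible reason for point 3 is numerical conditioning: in gnuplot 5, time values are seconds since 1 January 1970 (about 1.5e9 for 2018), so in m*x + c the slope is tiny compared to x and strongly correlated with the intercept. Subtracting the start date keeps the numbers small and makes c the value at the start date. The Python sketch below fits y = m*(x - t0) + c with the closed-form least-squares solution; it is not gnuplot's iterative fit, only an illustration of the parametrization, and the helper names are hypothetical.

```python
from datetime import datetime, timezone

def to_seconds(date_str, fmt="%d.%m.%Y"):
    """Seconds since 1970-01-01 UTC, comparable to gnuplot 5's strptime()."""
    return datetime.strptime(date_str, fmt).replace(tzinfo=timezone.utc).timestamp()

def fit_line_with_offset(xs, ys, t0):
    """Least-squares fit of y = m*(x - t0) + c; needs at least two distinct x."""
    us = [x - t0 for x in xs]          # shifted abscissa: small numbers
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    sxx = sum((u - mean_u) ** 2 for u in us)
    sxy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    m = sxy / sxx                      # slope per second
    c = mean_y - m * mean_u            # fitted value at x = t0
    return m, c

if __name__ == "__main__":
    t0 = to_seconds("01.01.2018")
    xs = [t0 + k * 86400 for k in range(10)]   # ten consecutive days
    ys = [100 - 2 * k for k in range(10)]      # decreasing by 2 per day
    m, c = fit_line_with_offset(xs, ys, t0)
    print(m * 86400, c)                        # slope per day, value at t0
```

Because c is the value at the start date, it stays in the range of the data instead of being an extrapolation back to 1970.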
For the time being, I am happy with the following.
The modified code:
### generate random date between two dates
reset session
# function for creating a random date between two dates
t(date_str) = strptime("%d.%m.%Y", date_str)
date_random(d0,d1) = strftime("%d.%m.%Y",rand(0)*(t(d1)-t(d0)) + t(d0))
# create some random date data
Date_Start = "01.01.2018"
Date_End = "30.06.2018"
set print $Data
do for [i=1:100] {
print sprintf("%s\t%g", date_random(Date_Start,Date_End), floor(rand(0)*10-6))
}
set print
set xdata time
set timefmt "%d.%m.%Y"
# get cumulative data into datablock
set xtics format "%d.%m.%Y"
set table $DataCumulative
plot $Data u (stringcolumn(1)):2 smooth cumulative
unset table
set xtics format "%b"
set datafile separator " \t" # space and TAB
# linear function and fitting
f_lin(x) = m*(x-strptime("%d.%m.%Y", Date_Start)) + c
set fit nolog quiet
fit f_lin(x) $DataCumulative u 1:2 via m,c
Level_Start = 500
Level_End = 0
x0 = (Level_End - Level_Start - c)/m + strptime("%d.%m.%Y", Date_Start)
set multiplot layout 3,1
# event plot & cumulative plot
set xrange[Date_Start:"31.12.2018"]
set xtics format ""
set lmargin 7
set bmargin 0
plot $Data u (timecolumn(1,"%d.%m.%Y")):2 smooth frequency with impulses lc rgb "red" t "Events 2018"
set xtics format "%b"
set bmargin
plot $Data u (timecolumn(1,"%d.%m.%Y")):2 smooth cumulative w l lc rgb "web-green" t "Cumulated Events 2018"
# fit & extrapolation plot
set label 1 at x0, graph 0.8 strftime("%d.%m.%Y",x0) center
set arrow 1 from x0, graph 0.7 to x0, Level_End
set key at graph 0.30, graph 0.55
set xrange[Date_Start:x0+3600*24*50] # end range = extrapolated date + 50 days
set xtics format "%m.%y"
set yrange [-90:]
plot $DataCumulative u (timecolumn(1,"%d.%m.%Y")):($2+Level_Start) w l lc rgb "blue" t "Cumulated Events",\
Level_End w l lc rgb "red" not,\
f_lin(x)+Level_Start w l ls 0 t "Fitting \\& Extrapolation"
unset multiplot
### end of code
will result in:

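The extrapolated date x0 in the modified code comes from solving f_lin(x) + Level_Start = Level_End for x, which gives x0 = t0 + (Level_End - Level_Start - c)/m with t0 the start date. A small Python sketch of that step; the helper names and the parameter values in the example are made up for illustration.

```python
from datetime import datetime, timezone

def crossing_time(m, c, t0, level_start, level_end):
    """Solve m*(x - t0) + c + level_start == level_end for x."""
    if m == 0:
        raise ValueError("a horizontal line never reaches the threshold")
    return t0 + (level_end - level_start - c) / m

def format_date(seconds):
    """Format seconds since 1970 (UTC) as "%d.%m.%Y", like strftime() in gnuplot."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%d.%m.%Y")

if __name__ == "__main__":
    t0 = 1514764800                   # 01.01.2018 00:00 UTC
    m = -10 / 86400                   # hypothetical slope: -10 per day
    x0 = crossing_time(m, 0, t0, level_start=500, level_end=0)
    print(format_date(x0))            # 500 / 10 = 50 days after t0
```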
Related

Indicating weekends in timeseries plot and setting xrange in timeseries gnuplot

Using the excellent answer to "gnuplot - Read Double Quoted datetime stamp", I have been able to plot my time series data.
I am now trying to indicate weekends (or other interesting time blocks) in my plot and to set the visible xrange to 31/1 to 28/2.
Weekends in February this year were 2/5/22 to 2/6/22, 2/12/22 to 2/13/22, etc. How could I draw a shaded vertical column to indicate a weekend or another interesting time block? I tried to plot a rectangle using time series points (i.e. $weekend1), but I was unable to fill that shape. Then I tried to draw a rectangle object, but could not work out how to specify its corners in the time format.
Since my x axis is a time series:
How could I indicate all weekends in the diagram, kind of like in a calendar or timesheet?
How do I define the xrange to be 1/31/22 to 2/28/22?
reset session
set datafile separator comma
myTimeFmt = "%m/%d/%y, %H:%M %p"
set format x "%d" time
#
# Gives error all points y value undefined!
#
# set xrange ["1/31/22, 12:01 AM":"2/28/22, 11:59 PM"] #
#
# Trying to draw a series to fill to indicate a weekend range - vertically
#
$weekend1 <<EOD
"2/5/22, 12:01 AM",0
"2/5/22, 12:01 AM",600
"2/6/22, 11:59 PM",600
"2/6/22, 11:59 PM",0
EOD
$account <<EOD
"1/31/22, 5:07 PM",1
"1/31/22, 8:01 PM",100
"2/1/22, 11:10 AM",200
"2/6/22, 12:25 PM",300
"2/9/22, 2:02 PM",400
"2/24/22, 4:22 PM",500
EOD
set object 1 rect from 1,1 to 2,2
plot $account u (timecolumn(1,myTimeFmt)):2 w lp pt 1 ps 1 lc "red" lw 1 ti "Account"
#plot $weekend1 u (timecolumn(1,myTimeFmt)):2 w lp pt 1 ps 1 lc "grey"
Here is what I've understood from your question: plot some time series data and highlight the weekends by coloring the background.
One possible way to get this is to create a datablock with all days within your time range and draw boxes (check help boxxyerror) which are colored (check help lc variable) depending on the weekday (check help tm_wday).
First, you have to plot the boxes in the background and then the data.
The background color should span the whole vertical graph size. For this you need to know the y-range of the data. You can get STATS_min and STATS_max from stats (check help stats).
In order to span the whole graph, you can extend the y-range of the boxes (by adding the data range again on top and at the bottom), but do not apply autoscale to the boxes (check help noautoscale). Autoscale will then only be used for the data.
If you have a fixed, known y-range, you can simply set it via set yrange and choose a suitable size for the boxes.
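The day-by-day coloring and the extended box height can be checked outside gnuplot. This Python sketch mirrors DayColor() and the idea of extending the data range by its own height on both sides; the helper names are hypothetical. Note that gnuplot's tm_wday() counts Sunday as 0, whereas Python's weekday() counts Monday as 0, hence the conversion.

```python
from datetime import date, timedelta

def gnuplot_wday(d):
    """Weekday numbered like gnuplot's tm_wday(): 0 = Sunday ... 6 = Saturday."""
    return (d.weekday() + 1) % 7       # Python: Monday = 0 ... Sunday = 6

def day_color(d):
    """Same mapping as DayColor(t): Sunday red, Saturday yellow, else light grey."""
    wday = gnuplot_wday(d)
    return 0xff0000 if wday == 0 else 0xffdd00 if wday == 6 else 0xdddddd

def days(start, end):
    """Every date from start to end, inclusive (like the $Days datablock)."""
    for k in range((end - start).days + 1):
        yield start + timedelta(days=k)

def box_y_extent(ymin, ymax):
    """Extend the data range by its own height below and above."""
    span = ymax - ymin
    return ymin - span, ymax + span

if __name__ == "__main__":
    for d in days(date(2022, 1, 31), date(2022, 2, 28)):
        if gnuplot_wday(d) in (0, 6):
            print(d.isoformat(), hex(day_color(d)))
```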
I hope you can adapt the following example to your needs.
Script:
### highlight weekends
reset session
myTimeFmt = "%d.%m.%Y"
DateStart = "01.01.2022"
DateEnd = "28.02.2022"
SecsPerDay = 24*3600
# create some random test data
set print $Data
y=50
do for [t=strptime(myTimeFmt,DateStart):strptime(myTimeFmt,DateEnd):SecsPerDay] {
print sprintf('"%s", %g', strftime(myTimeFmt,t),y=y+rand(0)*10-5)
}
set print
# datablock with every day between start and end date
set print $Days
do for [t=strptime(myTimeFmt,DateStart):strptime(myTimeFmt,DateEnd):SecsPerDay] {
print strftime(myTimeFmt,t)
}
set print
set datafile separator comma
set key noautotitle
set style fill solid 0.4 border
set format x "%d %b\n%Y" timedate
set xtics out scale 2, 1
DayColor(t) = tm_wday(t)==0 ? 0xff0000 : tm_wday(t)==6 ? 0xffdd00 : 0xdddddd
stats $Data u 2 nooutput # get min and max from column 2
plot $Days u (t=timecolumn(1,myTimeFmt)):(0):(t):(t+SecsPerDay):\
(2*STATS_min-STATS_max):(2*STATS_max-STATS_min):(DayColor(t)) w boxxy lc rgb var noautoscale, \
$Data u (timecolumn(1,myTimeFmt)):2 w lp pt 7 lc "black"
### end of code
Result:
NB: At first I thought you wanted to plot a calendar highlighting the weekends, but that was not your question. Since I already had the following code (which plots a calendar in two different versions), I will post it nevertheless. Maybe it is useful to you or others for further adaptations and optimizations.
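The second calendar version places each day with two small integer computations: the row is Month(t) = tm_year*12 + tm_mon (tm_mon counts from 0), and the column is the day of the month plus MonthOffset(t), the tm_wday() of the first day of the month with Sunday mapped to 7. A Python sketch of the same arithmetic (hypothetical helper names), useful for checking where a given date ends up:

```python
from datetime import date

def gnuplot_wday(d):
    """0 = Sunday ... 6 = Saturday, as gnuplot's tm_wday()."""
    return (d.weekday() + 1) % 7

def month_row(d):
    """Like Month(t): tm_year*12 + tm_mon, where tm_mon is 0 for January."""
    return d.year * 12 + (d.month - 1)

def month_offset(d):
    """Like MonthOffset(t): weekday of the 1st of the month, Sunday counted as 7."""
    wday = gnuplot_wday(d.replace(day=1))
    return 7 if wday == 0 else wday

def calendar_cell(d):
    """(column, row) of a date in the weekday-aligned calendar."""
    return d.day + month_offset(d), month_row(d)

if __name__ == "__main__":
    print(calendar_cell(date(2022, 1, 1)))
```

Since every month starts at column 1 + offset, a given weekday always lands in columns that are congruent modulo 7, which is what lines the days up under the weekday tic labels.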
Script:
### plot a calendar
reset session
myTimeFmt = "%d.%m.%Y"
DateStart = "01.01.2022"
DateEnd = "31.12.2022"
SecsPerDay = 24*3600
set print $Calendar
do for [t=strptime(myTimeFmt,DateStart):strptime(myTimeFmt,DateEnd):SecsPerDay] {
print strftime(myTimeFmt,t)
}
set print
set xrange[0.5:31.5]
set xtics 1 scale 0 offset 0,0.5 font ",8"
set link x2 via x inverse x
set x2tics 1 out scale 0 offset 0,-0.5 font ",8"
set yrange [:] reverse noextend
set ytics 1 scale 0
set key noautotitle
set style fill solid 0.4 border lc "black"
WeekDay(t) = strftime("%a",t)[1:1]
DayColor(t) = tm_wday(t) == 0 ? 0xff0000 : tm_wday(t) == 6 ? 0xffdd00 : 0xdddddd
Month(t) = int(tm_year(t)*12 + tm_mon(t))
MonthLabel(t,y) = strftime( y ? "%B %Y" : "%Y", t) # y=0 only month, y=1 month+year
plot $Calendar u (t=timecolumn(1,myTimeFmt), tm_mday(t)):(Month(t)):(0.5):(0.5):(DayColor(t)): \
xtic(tm_mday(t)):ytic(MonthLabel(t,1)) w boxxy lc rgb var, \
'' u (t=timecolumn(1,myTimeFmt), tm_mday(t)):(Month(t)):(WeekDay(t)) w labels
pause -1
MonthFirst(t) = int(strptime("%Y%m%d",sprintf("%04d%02d01",tm_year(t),tm_mon(t)+1)))
MonthOffset(t) = tm_wday(MonthFirst(t))==0 ? 7 : tm_wday(MonthFirst(t))
set xrange[*:*]
plot $Calendar u (t=timecolumn(1,myTimeFmt), tm_mday(t)+MonthOffset(t)):(Month(t)):(0.5):(0.5):(DayColor(t)): \
xtic(WeekDay(t)):x2tic(WeekDay(t)):ytic(MonthLabel(t,1)) w boxxy lc rgb var, \
'' u (t=timecolumn(1,myTimeFmt), tm_mday(t)+MonthOffset(t)):(Month(t)):(sprintf("%d",tm_mday(t))) w labels font ",8"
### end of script
Result:
Addition: (calendar with events from a datafile/datablock)
Script:
### plot a calendar with events
reset session
myTimeFmt = "%d.%m.%Y"
DateStart = "01.01.2022"
DateEnd = "31.12.2022"
SecsPerDay = 24*3600
set print $Calendar
do for [t=strptime(myTimeFmt,DateStart):strptime(myTimeFmt,DateEnd):SecsPerDay] {
print strftime(myTimeFmt,t)
}
set print
$Events <<EOD
01.01.2022 A 0xff0000
23.04.2022 B 0x00ff00
03.06.2022 C 0x0000ff
12.08.2022 A 0xffff00
05.09.2022 B 0xff00ff
10.10.2022 X 0x00ffff
12.02.2022 Y 0xffa500
EOD
set xrange[0.5:31.5]
set xtics 1 scale 0 offset 0,0.5 font ",8"
set link x2 via x inverse x
set x2tics 1 out scale 0 offset 0,-0.5 font ",8"
set yrange [:] reverse noextend
set ytics 1 scale 0
set key noautotitle
set style fill solid 0.4 border lc "black"
Month(t) = int(tm_year(t)*12 + tm_mon(t))
MonthLabel(t,y) = strftime( y ? "%B %Y" : "%Y", t) # y=0 only month, y=1 month+year
plot $Calendar u (t=timecolumn(1,myTimeFmt), tm_mday(t)):(Month(t)):(0.5):(0.5): \
xtic(tm_mday(t)):ytic(MonthLabel(t,1)) w boxxy lc "light-grey", \
$Events u (t=timecolumn(1,myTimeFmt), tm_mday(t)):(Month(t)):(0.5):(0.5):3 w boxxy lc rgb var, \
'' u (t=timecolumn(1,myTimeFmt), tm_mday(t)):(Month(t)):2 w labels
### end of script
Result:

Gnuplot multi column plot using CSV headings

I'm struggling to get a multi-column bar chart/histogram working with a CSV file with headings as input, as well as getting the key to show the {wcfiles,wclines,clocfiles,cloclines} attributes.
$summary << EOD
browser,wcfiles,wclines,clocfiles,cloclines
webkitgtk-2.28.2,19472,4710385,18620,3120740
firefox-78.0.1,289298,43627834,240137,24371602
chromium-83.0.4103.116,420343,100340817,269434,49597826
EOD
set datafile separator ','
set yrange [0:*] # start at zero, find max from the data
set style fill solid border -1
set ytics format "%.0s%c" # will generate labels 100k 200k 300k ... 1M
set title 'sloc the Web'
plot '$summary' using 0:2:($0+1):xtic(1) with boxes lc variable,\
"" u 3 title "wclines",\
"" u 4 title "clocfiles"
Check the examples Ethan mentioned.
In your case you should use set logscale y; otherwise it will be difficult to visualize values that differ by several orders of magnitude.
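To see why a log scale helps: in this data the smallest value (18620) and the largest (100340817) are more than three orders of magnitude apart, so on a linear axis the file counts are barely visible next to the line counts. The Python sketch below computes that span and produces labels in the spirit of the "%.0s%c" format (mantissa plus a k/M/G prefix); si_label() is a hypothetical approximation of gnuplot's formatting, not its actual implementation.

```python
import math

# Numeric columns of the datablock above
VALUES = [19472, 4710385, 18620, 3120740,
          289298, 43627834, 240137, 24371602,
          420343, 100340817, 269434, 49597826]

def decades(values):
    """Number of powers of ten between the smallest and the largest value."""
    return math.log10(max(values) / min(values))

def si_label(value):
    """Approximate '%.0s%c': mantissa below 1000 and an SI prefix character."""
    prefixes = ["", "k", "M", "G", "T"]
    power = 0
    while abs(value) >= 1000 and power < len(prefixes) - 1:
        value /= 1000
        power += 1
    return f"{value:.0f}{prefixes[power]}"

if __name__ == "__main__":
    print(round(decades(VALUES), 2))
    print([si_label(v) for v in (1000, 100000, 1000000, 10000000, 100000000)])
```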
Code:
### histogram clustered
reset session
$Data <<EOD
browser,wcfiles,wclines,clocfiles,cloclines
webkitgtk-2.28.2,19472,4710385,18620,3120740
firefox-78.0.1,289298,43627834,240137,24371602
chromium-83.0.4103.116,420343,100340817,269434,49597826
EOD
set datafile separator ','
set title 'sloc the Web'
set yrange [1000:*]
set logscale y
set ytics format "%.0s%c"
set style data histogram
set style histogram cluster gap 1
set style fill solid border -1
set boxwidth 0.9
plot $Data u 2:xtic(1) ti col,\
'' u 3 ti col,\
'' u 4 ti col
### end of code
Result:

Plot a forecast line with Gnuplot?

I have this data:
Serv1;2019-10;2561.36
Serv1;2019-11;3292.65
Serv1;2019-12;3077.58
Serv1;2020-01;3369.98
Serv1;2020-02;3134.53
Serv1;2020-03;593.332
With Excel, I'm able to create a graph with a forecast line like this:
With gnuplot, I'm able to create a graph with this script:
set title "test"
set terminal png truecolor size 960,720 background rgb "#eff1f0"
set output "/xxx/xxx/xxx/xxx/xxx/test.png"
set grid
set style line 1 \
linecolor rgb '#0060ad' \
linetype 1 linewidth 2 \
pointtype 7 pointsize 1.5
set offsets 0.5,0.5,0,0.5
set datafile separator ";"
set key left
plot "test.txt" using 3:xtic(2) with linespoints linestyle 1
But I don't know how to plot a forecast line with gnuplot.
Could you show me how to do that?
Assuming you are looking for a linear fit and an extension of this linear function, you can try the following.
Edit:
There is no gnuplot function to get the data value of a certain row and column, e.g. something like a = value(row,column). You have to use a somewhat strange workaround: you plot your data into a dummy table, but only the first data point of the first block of the first dataset (counting starts at 0). Check help every and help index.
set table $Dummy
plot $Data u (StartDate=timecolumn(1,myTimeFmt)) index 0 every ::0:0:0:0 w table
unset table
print sprintf("StartDate: %s",strftime(myTimeFmt,StartDate))
Result: StartDate: 01/03/2020
Code:
### linear fit and extrapolation
reset session
$Data <<EOD
01/03/2020,100
02/03/2020,150
03/03/2020,125
04/03/2020,150
05/03/2020,175
06/03/2020,200
07/03/2020,220
08/03/2020,150
09/03/2020,175
10/03/2020,125
11/03/2020,150
12/03/2020,200
13/03/2020,210
14/03/2020,230
EOD
set datafile separator comma
myTimeFmt = "%d/%m/%Y"
set format x "%d.%m." time
# put start date into variable StartDate
set table $Dummy
plot $Data u (StartDate=timecolumn(1,myTimeFmt)) index 0 every ::0:0:0:0 w table
unset table
EndDate = strptime(myTimeFmt,"30/04/2020")
f(x) = a*(x-StartDate)+ b
set fit quiet nolog
fit f(x) $Data u (timecolumn(1,myTimeFmt)):2 via a,b
set xrange[StartDate:EndDate]
set grid xtics, ytics
plot $Data u (timecolumn(1,myTimeFmt)):2 w lp pt 7 lc rgb "red" notitle, \
[StartDate:EndDate] f(x) ti "linear fit with extrapolation"
### end of code
Result:
Edit 2: (version for gnuplot 4.6)
Modified for gnuplot 4.6. What caused problems, as I found out later, was the fit parameter FIT_LIMIT, which you need to set to 1e-8 for fitting time data.
Data: (Data.dat)
Serv1;2019-10;2561.36
Serv1;2019-11;3292.65
Serv1;2019-12;3077.58
Serv1;2020-01;3369.98
Serv1;2020-02;3134.53
Serv1;2020-03;593.332
Code:
### linear fit and extrapolation, version for gnuplot 4.6
reset
FILE = "Data.dat"
set datafile separator ";"
set xdata time
set timefmt "%Y-%m"
set format x "%Y\n%m"
# put start date into variable StartDate, dummy plot
plot FILE u (StartDate=timecolumn(2)):0 index 0 every ::0:0:0:0
EndDate = strptime("%Y-%m","2020-09")
f(x) = a*(x-StartDate) + b
FIT_LIMIT = 1e-8
fit f(x) FILE u (timecolumn(2)):3 via a,b
set xrange[StartDate:EndDate]
set grid xtics, ytics
set yrange[0:4000]
plot FILE u (timecolumn(2)):3 w lp pt 7 lc rgb "red" notitle, \
f(x) ti "linear fit with extrapolation"
### end of code
Result:

Problem with gnuplot - timestamp data mapping to xrange

What I have:
CSV data with timestamps in the first column, followed by columns I want to plot selectively.
Data points are roughly ten minutes apart, and the data covers 24 hours. Everything else is set up nicely; examples below.
What I want:
To map the formatted time data onto the x-axis (xrange?), e.g. xtics every n hours in a given format (like "%T, %A"), ideally configurable per column I want to plot (thinking about multiplot).
Data:
1545389400,39,0,0,1,664,2493,31.7
1545390000,37,0,0,1,736,3093,32.5
1545391200,33,0,0,1,664,4293,32.6
1545392400,28,0,0,1,704,5493,31.3
1545393000,26,0,0,0,649,6093,30.8
1545393600,24,0,0,0,632,6693,30.5
Code:
set title "Battery Log"
set datafile separator ','
set key center bottom outside
set border lw 0.5 lc '#959595'
set terminal svg dynamic rounded mouse lw 1 background '#272822'
set grid ytics
set ytics nomirror in
set yrange [0:100]
set xtics nomirror
set xtics rotate
set xdata time
set timefmt "%s"
set format x "%T, %A"
plot 'stats.csv' \
u 0:2 w l lc '#f92783' t columnheader, '' \
u 0:8 w l lc '#a6e22a' t columnheader
What about this?
### set time xtics
N = 3 # every n-th hour
set samples 100
set xdata time
set format x "%a, %H:%M"
set xtics rotate
set xtics N*3600
plot '+' u ($0*1200):(3*sin(x)+rand(0)) w lp pt 7 not
### end of code
which should give something like this, ticks every 3rd hour.
Set your N depending on the column you want to plot.
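A Python sketch of where such tics would land, assuming that set xtics N*3600 places major tics at multiples of the increment in seconds since 1970 and that gnuplot formats times as UTC. The helper names are hypothetical; the timestamps are taken from the question's data.

```python
import math
from datetime import datetime, timezone

def hourly_tics(t_start, t_end, n_hours):
    """Multiples of n_hours*3600 s inside [t_start, t_end] (assumed tic placement)."""
    step = n_hours * 3600
    first = math.ceil(t_start / step) * step
    return list(range(first, int(t_end) + 1, step))

def tic_label(seconds, fmt="%a, %H:%M"):
    """Format seconds since 1970 as UTC text, like set format x "%a, %H:%M"."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)

if __name__ == "__main__":
    for t in hourly_tics(1545389400, 1545393600, 3):
        print(t, tic_label(t))
```

Note that the question's plot uses 0:2, i.e. the record number (pseudo-column 0) as x; time tics like these only make sense when x is the timestamp column, e.g. using 1:2.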

Gnuplot interchanging Axes

I would like to reproduce this plot with gnuplot:
My data has this format:
Data
1: time
2: price
3: volume
I tried this:
plot file using 1:2 with lines, '' using 1:3 axes x1y2 with impulses
Which gives a normal time series chart with y1 as price and y2 as volume.
Next, I tried:
plot file using 2:1 with lines, '' using 2:3 axes x1y2 with impulses
Which gives a price series with y1 as time and y2 as volume.
However, I need the price to remain at y1 and volume at x2.
Maybe something like:
plot file using 1:2 with lines,' ' using 2:3 axes y1x2 with impulses
However, that does not give what I want.
Gnuplot has no official way to draw this kind of horizontal box plot. However, you can use the boxxyerrorbars style (shorthand boxxy) to achieve this.
As I don't have any test data for your actual example, I generated a data file from a Gaussian random walk. To generate the data, run the following Python script:
from numpy import zeros, savetxt, random
N = 500
g = zeros(N)
for i in range(1, N):
    g[i] = g[i-1] + random.normal()
savetxt('randomwalk.dat', g, delimiter='\t', fmt='%.3f')
Next, I bin the 'position data' (which in your case would be the price data). For this, one can use smooth frequency, which computes the sum of the y values for equal x values. So first I use a proper binning function, which returns the same value for a certain range (x +- binwidth/2). The output data is saved in a file, because for plotting we must exchange the x and y values:
binwidth = 2
hist(x) = floor(x/binwidth + 0.5)   # bin index, bin center = index*binwidth
set output "| head -n -2 > randomwalk.hist"
set table
plot 'randomwalk.dat' using (hist($1)):(1) smooth frequency
unset table
unset output
Normally one should be able to use set table "randomwalk.hist", but due to a bug, one needs this workaround to filter out the last entry of the table output, see my answer to Why does the 'set table' option in Gnuplot re-write the first entry in the last line?.
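The binning step can be checked in Python. This sketch assumes the intended rule "same value for x within +- binwidth/2", i.e. bin index floor(x/binwidth + 0.5) with bin center index*binwidth, and counts entries per bin the way smooth frequency sums a constant y of 1. The helper names are hypothetical.

```python
import math
from collections import Counter

def bin_index(x, binwidth):
    """Index of the bin whose center index*binwidth is nearest to x."""
    return math.floor(x / binwidth + 0.5)

def histogram(values, binwidth):
    """(bin center, count) pairs sorted by center, like smooth frequency with y = 1."""
    counts = Counter(bin_index(v, binwidth) for v in values)
    return [(i * binwidth, counts[i]) for i in sorted(counts)]

if __name__ == "__main__":
    print(histogram([-1.2, -0.9, 0.3, 0.99, 1.01, 2.9], binwidth=2))
```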
Now the actual plotting part is:
unset key
set x2tics
set xtics nomirror
set xlabel 'time step'
set ylabel 'position value'
set x2label 'frequency'
set style fill solid 1.0 border lt -1
set terminal pngcairo
set output 'randomwalk.png'
plot 'randomwalk.hist' using ($2/2.0):($1*binwidth):($2/2.0):(binwidth/2.0) with boxxy lc rgb '#00cc00' axes x2y1,\
'randomwalk.dat' with lines lc rgb 'black'
which gives the result (with 4.6.3, depends of course on your random data):
So, for your data structure, the following script should work:
reset
binwidth = 2
hist(x) = floor(x/binwidth + 0.5)   # bin index, bin center = index*binwidth
file = 'data.txt'
histfile = 'pricevolume.hist'
set table histfile
plot file using (hist($2)):($3) smooth unique
unset table
# get the number of records to skip the last one
stats histfile using 1 nooutput
unset key
set x2tics
set xtics nomirror
set xlabel 'time'
set ylabel 'price'
set x2label 'volume'
set style fill solid 1.0 border lt -1
plot histfile using ($2/2.0):($1*binwidth):($2/2.0):(binwidth/2.0) every ::::(STATS_records-2) with boxxy lc rgb '#00cc00' axes x2y1,\
file using 1:2 with lines lc rgb 'black'
Note that this time the last table entry is skipped by counting all entries with the stats command and limiting the plot with every (yes, STATS_records-2 is correct, because point numbering starts at 0). This variant doesn't need any external tool.
I also use smooth unique, which computes the average of the y values for equal x values instead of their sum (which is what smooth frequency does).
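The difference between the two options, sketched in Python for (bin, volume) pairs: smooth frequency adds up the y values of identical x, smooth unique replaces them by their mean. The helper names are hypothetical and the numbers are made up.

```python
from collections import defaultdict

def group_by_x(points):
    """Collect the y values of each distinct x, sorted by x."""
    groups = defaultdict(list)
    for x, y in points:
        groups[x].append(y)
    return sorted(groups.items())

def smooth_frequency(points):
    """Sum of y per distinct x."""
    return [(x, sum(ys)) for x, ys in group_by_x(points)]

def smooth_unique(points):
    """Mean of y per distinct x."""
    return [(x, sum(ys) / len(ys)) for x, ys in group_by_x(points)]

if __name__ == "__main__":
    pts = [(1, 10), (0, 4), (1, 30), (0, 6), (2, 5)]
    print(smooth_frequency(pts))
    print(smooth_unique(pts))
```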
