Gnuplot: sum occurrences in time interval - statistics

I have a CSV file like this:
2021-10-31;20:30:26
2021-10-31;20:32:15
2021-10-31;20:39:17
2021-10-31;20:40:15
2021-10-31;20:42:13
2021-11-01;08:37:15
...
I would like to count the entries within each 10-minute interval and display the counts in a bar graph. In the example above there are 3 hits from 20:30 till 20:40, 2 hits from 20:40 till 20:50, and so on.
Is there any way to get this done with gnuplot? Or do I have to prepare the data?
Thank you, Martin

You can try the smooth frequency option like this:
reset
# formatting of output data (graph)
set format x "%Y-%m-%d\n%H:%M" timedate
# y-axis, bar graph should start at 0
set yrange [0:*]
set ylabel "Occurrences"
set ytics 1
# make some space for large x axis labels
set rmargin at screen 0.95
# put input values into bins/time intervals
binwidth=10*60 # 10 minutes in seconds
bin(val) = binwidth * floor(val/binwidth)
# configure bar graph
set boxwidth binwidth
# final plot command
plot "a.dat" using (bin(timecolumn(1, "%Y-%m-%d;%H:%M:%S"))):(1) smooth freq with boxes fs solid 0.25 notitle
Documentation from help smooth freq:
The `frequency` option makes the data monotonic in x; points with the same
x-value are replaced by a single point having the summed y-values.
To plot a histogram of the number of data values in equal size bins,
set the y-value to 1.0 so that the sum is a count of occurances in that bin:
Example:
binwidth = <something> # set width of x values in each bin
bin(val) = binwidth * floor(val/binwidth)
plot "datafile" using (bin(column(1))):(1.0) smooth frequency
You have time data, so column must be replaced by timecolumn, see
help timecolumn for details.
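To see what bin() does with the timestamps from the question, a minimal sketch like the following can be run on its own. It assumes the same time format as the plot command above; the variable names fmt and stamps are only illustrative.

```gnuplot
# Sketch: check the 10-minute binning on three timestamps from the question
fmt = "%Y-%m-%d;%H:%M:%S"                  # same format as in the plot command
binwidth = 10*60                           # 10 minutes in seconds
bin(val) = binwidth * floor(val/binwidth)  # start of the bin that val falls into
stamps = "2021-10-31;20:30:26 2021-10-31;20:39:17 2021-10-31;20:40:15"
do for [s in stamps] {
    # strptime converts the string to seconds, strftime formats the bin start
    print s, " -> ", strftime("%H:%M", bin(strptime(fmt, s)))
}
```

The first two timestamps should map to 20:30 and the third to 20:40, which matches the counts described in the question.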
The command set boxwidth is used by the boxes plotting style, see help plotting styles boxes for details.
This is the result:

Related

Gnuplot histogram with boxes and a color per value

I would like to create a histogram with boxes using three pieces of data, first the number of iterations as the x-axis, then the execution time as the y-axis and finally the number of processes used.
I would like to see a bar for each number of processes used, and with a color specific to the value of the number of processes. How can I do this?
My test data is defined as:
"iterations" "processes" "time_execution"
1000 1 14
1000 2 10
1000 4 9
4000 1 60
4000 2 42
4000 4 45
7000 1 80
7000 2 70
7000 4 50
And here is my script so far, but I can't get it to place the three bars side by side:
set term svg
set output 'out.svg'
set boxwidth 1
set style fill solid 1.00 border 0
set style histogram
set size ratio 0.8
set xlabel 'Number of iterations'
set ylabel offset 2 'Time execution in seconds'
set key left Right
set key samplen 2 spacing .8 height 3 font ',10'
set title 'Time execution per iterations and processus used'
plot 'test.data' u 1:3:2 w boxes
Thanks!
Your data format probably doesn't fit what gnuplot's histogram style expects. Check the examples on the gnuplot homepage, although I think they are too crowded, which can be confusing and may be the reason why there are so many histogram questions on SO.
If you modify your data format (see below) it will be easy to plot the histogram.
You can probably use any format, but the effort to prepare the data will be higher (see for example here: Gnuplot: How to plot a bar graph from flattened tables).
Script:
### plotting histogram requires suitable input data format
reset session
$Data <<EOD
xxx 1 2 4
1000 14 10 9
4000 60 42 45
7000 80 70 50
EOD
set style histogram clustered gap 1
set style data histogram
set boxwidth 0.8 relative
set style fill solid 0.3
set xlabel 'Number of iterations'
set xtics out
set ylabel 'Time execution in seconds'
set grid x,y
set key top center title "Processors"
set offset 0,0,0.5,0
plot for [col=2:4] $Data u col:xtic(1) ti col
### end of script
Result:
You can use lc variable
plot 'test.data' u 1:3:2 w boxes lc variable notitle
EDIT
notitle is not necessary, but it makes the plot look better.

plotting the total monthly amount of rain from a daily data file with gnuplot

I've got a data file with daily values for the amount of rain in the 4th column, for each day of the year.
I'd like to plot a bar graph with the months on the x-axis and the total monthly amount of rain on the y-axis: that is, to plot "January" (with %B or %b format) vs the sum of the first 31 values of the 4th column, then "February" vs the sum of the next 28 values of the 4th column, and so on. Do you know how to do that with gnuplot? Besides, is it possible to write the numerical value of the monthly amount of rain on top of each bar?
I can imagine and understand that for a gnuplot beginner it will not be easy to find and combine the necessary commands to realize your task. If you do a search you will most probably not find exactly your case, but there should be very similar questions and examples around. The key search would be "creating a histogram".
Check help smooth frequency, help strftime, help strptime, help datablocks, help table, basically for every command or keyword there should be a help entry.
The following example is one way to achieve what you are asking for. It is basically binning data, like creating a histogram. Here, your bins will be the months in the following numerical format, e.g. 202109, 202110, 202111, 202112, 202201, etc.
In the example below, some random test data (mm of rain per day) will be created in order to illustrate the result with a graph.
Example data in $Data:
2021-12-01 66
2021-12-02 0
2021-12-03 0
2021-12-04 17
2021-12-05 52
Plot your data into a datablock $Monthly using the option smooth frequency. It will sum up all values per month.
The result in $Monthly will be something like this:
202107 368
202108 622
202109 557
202110 361
202111 628
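The month keys in the first column are built from tm_year and tm_mon. As a minimal sketch of that step alone (MonthKey is just an illustrative helper name, not part of the code in this answer), the key can be checked for single dates:

```gnuplot
# Sketch: build the numerical month key YYYYMM from a date string
TimeFmtInput = "%Y-%m-%d"
# tm_mon counts months from 0 (January) to 11 (December), hence the +1
MonthKey(s) = tm_year(strptime(TimeFmtInput, s))*100 + tm_mon(strptime(TimeFmtInput, s)) + 1
print MonthKey("2021-12-05")
print MonthKey("2022-01-01")
```

This should print 202112 and 202201, so all days of one month share the same key and smooth frequency can sum them up.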
I hope you can adapt the code to your data and needs.
Edit: the previous version of the code used the plotting style with boxes for the monthly plot. However, this style centers the box at the beginning of the month, which is undesired here (especially when plotting together with the daily rain). The modified code uses the plotting style with boxxyerror, which draws the boxes from the beginning of the month to the beginning of the next month. Check help boxes and help boxxyerror.
Code:
### sum up monthly rainfall
reset session
TimeFmtInput = "%Y-%m-%d"
# create some random test data
set print $Data
StartDate = strptime(TimeFmtInput,"2021-04-01")
do for [i=0:280] {
RainMM = int(rand(0)+0.3) * rand(0)*100
print sprintf("%s %.0f",strftime(TimeFmtInput,StartDate+3600*24*i),RainMM)
}
set print
set table $Monthly
plot $Data u (tm_year(timecolumn(1,TimeFmtInput))*100+tm_mon(timecolumn(1,TimeFmtInput))+1):2 smooth freq
unset table
set style fill solid 0.3
set format x "%Y\n%b" timedate
set key out top center
set grid x,y
set xtics out
NextMonth(t) = strptime("%Y%m",sprintf("%04d%02d",tm_year(t),tm_mon(t)+2))
NextDay(t) = t + 24*3600
set multiplot layout 2,1
plot $Data u (t0=timecolumn(1,TimeFmtInput)):2:(t0):(NextDay(t0)):(0):2 w boxxy lc "blue" title "Daily rain / mm"
set xrange[GPVAL_X_MIN:GPVAL_X_MAX] # take the same xrange as the previous plot
plot $Monthly u (t0=timecolumn(1,"%Y%m")):2:(t0):(NextMonth(t0)):(0):2 w boxxy lc "blue" title "Monthly rain / mm"
unset multiplot
### end of code
Result:

Horizontal bar chart in gnuplot

When Googling "horizontal gnuplot bar chart", the first result I could find http://www.phyast.pitt.edu/~zov1/gnuplot/html/histogram.html suggests rotating (!) the final bar chart which seems rather baroque. Nonetheless I tried the approach but the labels are cut off.
reset
$heights << EOD
dad 181
mom 170
son 100
daughter 60
EOD
set yrange [0:*] # start at zero, find max from the data
set boxwidth 0.5 # use a fixed width for boxes
unset key # turn off all titles
set style fill solid # solid color boxes
set colors podo
set xtic rotate by 90 scale 0
unset ytics
set y2tics rotate by 90
plot '$heights' using 0:2:($0+1):xtic(1) with boxes lc variable
Is there a better approach?
The link you are referring to is from approx. 2009. gnuplot has developed since then. As @Christoph suggested, check help boxxyerror.
Script: (edit: shortened by using 4-columns syntax for boxxyerror, i.e. x:y:+/-dx:+/-dy)
### horizontal bar graph
reset session
$Data << EOD
dad 181
mom 170
son 100
daughter 60
EOD
set yrange [0:*] # start at zero, find max from the data
set style fill solid # solid color boxes
unset key # turn off all titles
myBoxWidth = 0.8
set offsets 0,0,0.5-myBoxWidth/2.,0.5
plot $Data using (0.5*$2):0:(0.5*$2):(myBoxWidth/2.):($0+1):ytic(1) with boxxy lc var
### end of script
Result:
Addition:
what does
2:0:(0):2:($0-myBoxWidth/2.):($0+myBoxWidth/2.):($0+1):ytic(1) mean?
Well, it looks more complicated than it is. Check help boxxyerror. From the manual:
6 columns: x y xlow xhigh ylow yhigh
So, altogether:
x take value from column 2, but not so relevant here since we will use the xyerror box
y take pseudocolumn 0 which is line number starting from zero, check help pseudocolumns, but not so relevant here as well
xlow (0) means fixed value of zero
xhigh value from column 2
ylow ($0-myBoxWidth/2.), line number minus half of the boxwidth
yhigh ($0+myBoxWidth/2.), line number plus half of the boxwidth
($0+1) together with ... lc var: color depending on line number starting from 1
ytic(1): column 1 as ytic label
For some reason (which I don't know) gnuplot still doesn't seem to have a convenient horizontal histogram plotting style, but at least it offers this boxxyerror workaround.
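For comparison, the 6-column form explained above can be put into a complete script. This is only a sketch: it reuses the data and the using specification discussed in this answer, and it fixes the x-range at zero instead of the y-range, on the assumption that the bars should start at x=0.

```gnuplot
### horizontal bar graph, 6-column boxxyerror variant (sketch)
reset session
$Data << EOD
dad 181
mom 170
son 100
daughter 60
EOD
set xrange [0:*]         # bars are horizontal, so x starts at zero
set style fill solid     # solid color boxes
unset key                # turn off all titles
myBoxWidth = 0.8
# columns: x y xlow xhigh ylow yhigh, then the color index for lc var
plot $Data using 2:0:(0):2:($0-myBoxWidth/2.):($0+myBoxWidth/2.):($0+1):ytic(1) with boxxy lc var
### end of script
```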

gnuplot: xtics not shown on X-axis

I am trying to plot a graph with some fixed values on the x-axis and the corresponding values on the y-axis. With my script below, no values are labelled on the x-axis, and the values on the y-axis are labelled with powers of ten.
How can I make the xtics values (1000, 10000, 100000, 1000000, 10000000) appear on the x-axis?
How can I get rid of the powers on the y-axis? (Example: I want 4000000 on the y-axis instead of 4x10^6.)
set xrange [0:]
set output "macs.png"
set ylabel "Flows/sec"
set xlabel "MACS per Switch"
set grid
set xtics (1000, 10000, 100000, 1000000, 10000000)
set style line 2 lt 1 lw 2 pt 1 linecolor 1
plot "macs.data" using :1 with linespoints linestyle 0 title "Floodlight" # Using ":1" as X-axis data is supplied in xtics
Here is my data file :
# Not Supplying X-axis data here as it is supplied through xtics
400
60000
700000
800000
900000
I want my graph, with only one line, to look like this:
You have to supply an x and a y value for each point. Fortunately, gnuplot supports some special column numbers such as column 0, which is a counter of valid data sets, i.e. here a line number that ignores comments. It starts at zero.
Next, your x-axis uses a log scale, so your x-values should follow it, too. The formula to convert the line number to the correct x-value is 10^(column_0 + 3), which translates to 10**($0+3) in gnuplot.
Here is the code:
# Logarithmic scale for x axis
set log x
# get rid of scientific formatting of numbers,
# plain format also for large numbers
set format x "%.0f"
# If you don't like the small ticks between the large ones
set mxtics 1
# put the key (legend) outside, right of the plot area,
# vertically centered (as in your picture)
set key outside right center
# only horizontal grid lines
set grid y
plot "macs.data" using (10**($0+3)):1 title "foo" with linespoints
And here the result:
Alternative:
Your approach plots the data as if it were given like
0 400
1 60000
2 700000
3 800000
4 900000
In this case, you need to label the x-axis on your own, the correct syntax is
set xtics("1000" 0, "10000" 1, "100000" 2, "1000000" 3, "10000000" 4)
This will not draw any automatic labels, but it will put e.g. your string 10000 at x=1
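Put together, the alternative could look like the following sketch. The datablock $Macs is a stand-in for the file macs.data so that the script runs on its own, and set format y is an assumption about how to get plain numbers on the y-axis, as asked in the question.

```gnuplot
# Sketch of the alternative: plot against the line number and label the tics by hand
$Macs << EOD
400
60000
700000
800000
900000
EOD
# one manual label per data line; no automatic x tics are drawn
set xtics ("1000" 0, "10000" 1, "100000" 2, "1000000" 3, "10000000" 4)
set format y "%.0f"      # plain numbers instead of powers of ten on the y-axis
set grid y               # only horizontal grid lines
plot $Macs using 0:1 with linespoints title "Floodlight"
```

Here the x-axis stays linear and the points are evenly spaced, so no log scale is needed.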

Plotting filled region between two dates in Gnuplot

On the x-axis, I have time as a date %Y-%m-%d. On the y axis I have integers.
Basically, I have a date range for each data point, usually given by a target date and a two-week window on either side. I then plot the data point relative to that target window, using vertical lines for the low and high ends of the window.
I would like to shade the region between the low and high end.
I tried adding " with filledcurves x1='2000-01-01'"
Thanks
I think you have a few options here. If there are only a few shaded regions that you want to draw, you can use a rectangle (I imagine that this will work -- though I haven't tested it):
set xdata time
set timefmt '%Y-%m-%d'
set object rectangle from first '2000-01-01',graph 0 to first '2001-01-14',graph 1 fc rgb "red" solid back
Another option is that you format your datafile like this:
#date value low-date high-date
2000-01-12 12 2000-01-01 2000-01-26


2000-02-12 12 2000-02-01 2000-02-26


2000-03-12 12 2000-03-01 2000-03-26
Note that there are two blank lines between each "record" (triple spaced). If your file isn't triple spaced, you can do this easily (in gnuplot) using sed:
plot "< sed 'G;G' datafile.dat" ...
In the special case where low-date and high-date are exactly 3600*24*14 (number of seconds in two weeks) lower/higher than date, you can skip the last two columns and plot it like this:
NPOINTS=3 #Number of points in datafile.
YHIGH=15
set xdata time
set timefmt '%Y-%m-%d'
set style fill solid .5 noborder #somewhat transparent -- see "help fillstyle"
set yrange [0:YHIGH]
plot for [I=0:NPOINTS-1] 'test.dat' i I u 1:(YHIGH):(3600*24*14*2) w boxes,\
for [I=0:NPOINTS-1] 'test.dat' i I u 1:2 w points ls I+1
The first pass draws the rectangles, the second pass draws the points. This only works if the point is in the center of the range, and each range is exactly 3600*24*14 seconds (2 weeks). Note that you'll have to set the number of points and the YHIGH to some value which works for your data.
If the ranges can be skewed -- e.g. the range isn't centered on the point in question, you can probably do something like this:
NPOINTS=3
YHIGH=15
TIMEFMT='%Y-%m-%d'
set xdata time
set timefmt TIMEFMT
set style fill solid .5 noborder #somewhat transparent -- see "help fillstyle"
set yrange [0:YHIGH]
#difference between two times in seconds
boxwidth(s1,s2)=strptime(TIMEFMT,s1)-strptime(TIMEFMT,s2)
#average of two times -- number of seconds since 2000 epoch.
boxmidpoint(s1,s2)=(strptime(TIMEFMT,s1)+strptime(TIMEFMT,s2))/2
set macro #just to make it a little easier to read.
BOXARGS='stringcolumn(4),stringcolumn(3)'
plot for [I=0:NPOINTS-1] 'test.dat' i I u (boxmidpoint(@BOXARGS)):(YHIGH):(boxwidth(@BOXARGS)) w boxes,\
for [I=0:NPOINTS-1] 'test.dat' i I u 1:2 ls I+1
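The two helper functions can be tried on the first record of the sample data before using them in the plot. This is just a sketch for checking the arithmetic, with the date strings typed in by hand instead of read through stringcolumn.

```gnuplot
# Sketch: check box width and midpoint for the first record of the sample data
TIMEFMT = '%Y-%m-%d'
# difference between two times in seconds
boxwidth(s1,s2) = strptime(TIMEFMT,s1) - strptime(TIMEFMT,s2)
# average of two times in seconds
boxmidpoint(s1,s2) = (strptime(TIMEFMT,s1) + strptime(TIMEFMT,s2))/2
print boxwidth("2000-01-26", "2000-01-01") / (3600*24)             # width in days
print strftime(TIMEFMT, boxmidpoint("2000-01-26", "2000-01-01"))   # date of the box center
```

For this record the width should come out as 25 days and the center as 2000-01-13, i.e. the box is not centered on the data point at 2000-01-12.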

Resources