Gnuplot plot points in certain interval with autoupdate - gnuplot

Say, I have a large data file that starts at index 1 and ends at more than 10000, like this:
1 -35000 44312 53750 97500 67687 5000 1.64
2 33500 -12937 -68000 -37250 -35937 -96750 1.64
3 -37750 43125 53500 95250 66937 4500 1.64
4 29000 -15437 -69000 -39750 -36562 -97250 1.64
5 -39000 43062 52250 93000 65750 3750 1.64
.
.
.
100000 29250 -14250 -69250 -41500 -37500 -98000 1.64
I use this command to monitor the data online:
plot 'data.raw' using 0:3 title 'Reference' w lp ls 1, \
'data.raw' using 0:7 title 'Temperature' w lp ls 7
set xrange [0: ]
pause 0.5
replot
reread
As the number of data points increases, I can barely see any change in the graph, because I plot the whole file from X=0. How can I plot only a certain interval, e.g. deltaX = 300 points, with autoupdate? I would then see 0-300, 300-600, and so on in the gnuplot plot window.
Thank You!

Not sure if this is what you're after. Say that I have some data file with 1000 entries (generated with bash):
for i in `seq 1 1 1000`; do echo $i $RANDOM >> data; done
Now I plot in intervals of 100 points and show each interval for 2 seconds:
do for [i=1:10] {
set xrange[100*(i-1):100*i]
set title "Interval no. ".i
plot "data" w l
pause 2
}
This looks like so:
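To follow a file that keeps growing, rather than replaying fixed intervals, the interval that contains the newest point has to be computed on every update. The following is only a minimal sketch of that arithmetic in Python; the function name `current_window` and the idea of printing a `set xrange` line for gnuplot are assumptions for illustration, not part of the answer above.

```python
def current_window(n_points, width=300):
    """Return (lo, hi) of the width-sized interval containing the newest point.

    Points are assumed to be numbered from 0, as with gnuplot's pseudo-column 0.
    """
    if n_points <= 0:
        return (0, width)
    k = (n_points - 1) // width   # index of the interval holding the last point
    return (k * width, (k + 1) * width)


# With 650 points so far, the newest point (index 649) lies in 600-900.
print(current_window(650))   # (600, 900)
print("set xrange [%d:%d]" % current_window(650))
```

The same integer division could also be written directly in a gnuplot script once the number of records is known, for example from a `stats` call.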

Related

Plot HTTP Status Codes Grouped by Days

I have a stream of timestamped HTTP status codes:
2021-02-09T10:54:00 200 50
2021-02-09T10:57:00 200 35
2021-02-09T11:00:00 200 50
2021-02-09T11:03:00 500 150
2021-02-09T11:06:00 500 350
2021-02-09T11:09:00 500 450
2021-02-09T11:12:00 500 1000
2021-02-09T11:15:00 404 35
2021-02-09T11:18:00 404 50
2021-02-09T11:21:00 200 50
2021-02-09T11:24:00 200 35
2021-02-09T11:27:00 200 50
2021-02-09T11:30:00 200 50
I already managed to set up gnuplot to group the days:
set xdata time
set ydata time
set format y "%H:%M"
set timefmt "%Y-%m-%dT%H:%M:%S"
set xrange ["2021-02-08T00:00:00":"2021-02-14T23:59:59"]
plot 'availability.csv' using (timecolumn(1,"%Y-%m-%d")):(timecolumn(1,"%H-%M")):2…
I already found a lot of samples like summing over the day (boxes/ histogram) or marking the point in time per day (point). But none of them match my goal of availability over time.
My goal is to have a bar per day binned to 15min blocks. Each block should be colored according to the max status code, e.g. HTTP.500=red, HTTP.404=yellow, HTTP.200=green (only these 3, no teapot/redirect/spooky ones, and the colors as a sort of traffic light). Y-axis is the hour of the day, x-axis is the day.
Am I on the right track, is this possible at all with gnuplot?
What does the using clause look like?
How is binning to 15min intervals merged into the second column?
How to color the specific codes? (It is not like a heatmap calculating color from frequency)
I would start with something like the following.
timecolumn(1,"%H-%M") does not extract hour and minute from timestrings like "2021-02-08T12:34:56". As far as I know, first we have to extract the 12:34 part and then convert this to hours and minutes:
strptime("%H:%M", strcol(1)[12:17])
Timestamps are internally stored as seconds, so binning into 15-minute (= 900-second) bins can be achieved with integer division: int(<seconds>)/900*900.0
A gnuplot command like plot "a.dat" using 1:(<expression>, value) evaluates expression and plots value. This is used to ...
"manually" select the max value within a bin. The script goes through all points within a bin and remembers the max value. Please read help ternary. I use the ternary operator twice: once for checking the bin and once for checking the max value
for color, please read help set palette
This is the complete script:
set xdata time
set ydata time
set format y "%H:%M"
set timefmt "%Y-%m-%dT%H:%M:%S"
set xrange ["2021-02-08T00:00:00":"2021-02-14T23:59:59"]
set palette defined (200 "green", 400 "yellow", 500 "red")
unset colorbox
bin = 0
bin_before = 0
max_value = 0
plot 'availability.csv' using \
(timecolumn(1,"%Y-%m-%d")):\
(bin = (int(strptime("%H:%M", strcol(1)[12:17]))/900*900), bin):\
(y = $2, bin == bin_before ? (y>max_value ? max_value = y : max_value = max_value) \
: (max_value = y, bin_before = bin), max_value ) \
linecolor palette pt 5 ps 2 notitle
This is the result:
I think we are not finished yet: one should add a legend, and it might be interesting to check the possibilities with splot and pm3d.
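The binning and "remember the maximum" logic can also be checked outside gnuplot. The Python sketch below is only an illustration under assumptions: the function name `max_status_per_bin` is made up, and a dictionary replaces the running variables, so the input does not need to be sorted.

```python
from datetime import datetime


def max_status_per_bin(lines, bin_seconds=900):
    """Map (date, bin start in seconds of the day) -> highest status code in that bin."""
    result = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        stamp = datetime.strptime(fields[0], "%Y-%m-%dT%H:%M:%S")
        status = int(fields[1])
        seconds = stamp.hour * 3600 + stamp.minute * 60 + stamp.second
        # Integer division, the same trick as int(<seconds>)/900*900 in gnuplot.
        bin_start = seconds // bin_seconds * bin_seconds
        key = (stamp.date().isoformat(), bin_start)
        result[key] = max(status, result.get(key, 0))
    return result


sample = [
    "2021-02-09T10:54:00 200 50",
    "2021-02-09T10:57:00 200 35",
    "2021-02-09T11:00:00 200 50",
    "2021-02-09T11:03:00 500 150",
    "2021-02-09T11:15:00 404 35",
    "2021-02-09T11:18:00 404 50",
    "2021-02-09T11:21:00 200 50",
]
for (day, start), status in sorted(max_status_per_bin(sample).items()):
    print(day, "%02d:%02d" % (start // 3600, start % 3600 // 60), status)
```

Each printed line gives the day, the start of the 15-minute bin, and the highest status code seen in it.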
Interesting challenge. My suggestion would be the following. It's probably not the easiest, but I would say the result looks reasonable. It uses the plotting style with boxxyerror (see help boxxyerror).
From your question, I get that you want to have a binning of 15 minutes and display only the color of the maximum status in that interval. Why not show a histogram of the different states for each interval? For example: if in the interval there are the following HTTP states: 2x 200, 1x 404 and 2x 500, then the horizontal bar in this interval will be split into 40% green, 20% yellow and 40% red.
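The split of one interval into relative shares is easy to check by hand. The Python sketch below is not the gnuplot code of this answer, just the arithmetic behind it; `status_shares` is a hypothetical helper name, and the status order 200, 404, 500 is assumed.

```python
from collections import Counter


def status_shares(statuses, order=(200, 404, 500)):
    """Return [(status, start, end), ...] with cumulative relative shares in [0, 1]."""
    counts = Counter(statuses)
    total = sum(counts[s] for s in order)
    segments = []
    if total == 0:
        return segments
    start, running = 0.0, 0
    for s in order:
        if counts[s] == 0:
            continue  # a status that does not occur gets no segment
        running += counts[s]
        end = running / total
        segments.append((s, start, end))
        start = end
    return segments


print(status_shares([200, 200, 404, 500, 500]))
# [(200, 0.0, 0.4), (404, 0.4, 0.6), (500, 0.6, 1.0)]
```

Each tuple holds the start and end of one colored segment of the bar, as a fraction of the full bar width.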
What the following code basically does:
creating some random test data (just for illustration)
binning of the data using smooth freq (check help smooth) with adding a little offset of 1,2,3 seconds for the 3 different states.
do some table rearrangements
create the final table with the x,y positions of the boxes and corresponding to the relative contribution of each status within the binning interval.
In order to get a better understanding:
Example data of datablock $Data:
2021-02-10T12:30:00 200 407
2021-02-10T12:33:00 200 922
2021-02-10T12:36:00 404 615
2021-02-10T12:39:00 200 689
2021-02-10T12:42:00 200 628
2021-02-10T12:45:00 500 10
2021-02-10T12:48:00 200 185
2021-02-10T12:51:00 200 2
2021-02-10T12:54:00 404 743
2021-02-10T12:57:00 200 618
Example data of datablock $Histo3:
1612960200 5 i
1612960201 4 i
1612960202 1 i
1612961100 5 i
1612961101 3 i
1612961102 1 i
1612961103 1 i
Example data of datablock $Histo4:
NaN 0 nan 12:30 0
2021-02-10 0 0.8 12:30 1
2021-02-10 0.8 1 12:30 2
NaN 0 nan 12:45 0
2021-02-10 0 0.6 12:45 1
2021-02-10 0.6 0.8 12:45 2
2021-02-10 0.8 1 12:45 3
The code can certainly be optimized. So, look at it as a starting point...
Code:
### status overview as date/time dependent histograms
reset session
# general settings
myDateFmt = "%Y-%m-%d" # date only format
myTimeFmt = "%H:%M:%S" # time only format
myDateTimeFmt = myDateFmt."T".myTimeFmt # datetime format
SecPerDay = 24*3600 # seconds per day
myStatusList = "200 404 500" # possible states
myColorList = "0x00ff00 0xffff00 0xff0000" # green, yellow, red
# create some random test data
set print $Data
myTime = time(0) # now
myRandomStatus(x) = x<0.70 ? 1 : x<0.95 ? 2 : 3 # random status
myInterval = 3 # interval in minutes
do for [i=1:5000] {
myTime = myTime + myInterval*60
myStatus = word(myStatusList,myRandomStatus(rand(0))) # random status
myValue = int(rand(0)*1000) # random value 0-999
print sprintf("%s %s %g",strftime("%Y-%m-%dT%H:%M:00",myTime),myStatus,myValue)
}
set print
# functions
myStatusNo(col) = column(col)==200 ? 1 : column(col)==404 ? 2 : 3
myColor(i) = int(i) ? int(word(myColorList,int(i))) : 1
myDayTime(t) = tm_hour(t)*3600 + tm_min(t)*60 + tm_sec(t)
# binning
BinWidthSec = 900 # in seconds 900 sec = 15 min
BinTime(col) = floor(myDayTime(timecolumn(col,myDateTimeFmt))/BinWidthSec)*BinWidthSec
set table $Histo1
set format x "%.0f"
plot $Data u (timecolumn(1,myDateFmt)+BinTime(1)):(1) smooth freq
plot $Data u (timecolumn(1,myDateFmt)+BinTime(1)+myStatusNo(2)):(1) smooth freq
set table $Histo2
plot $Histo1 u (sprintf("%.0f",$1)):2 w table # remove empty lines etc.
set table $Histo3
set format x "%.0f"
plot $Histo2 u 1:2 smooth freq # sort the events by time
unset table
# create final table
myX(col1,col2) = int(column(col1))%4==0 ? (Sum=0.0, Total=column(col2),"NaN") : \
strftime(myDateFmt,column(col1))
myXRelStart(col1,col2) = Sum/Total
myXRelEnd(col1,col2) = int(column(col1))%4==0 ? NaN : (Sum=Sum+column(col2), Sum/Total)
BinTimeT(col) = strftime("%H:%M",column(col))
set table $Histo4
plot $Histo3 u (sprintf("% 10s % 5g % 5g % 7s % 3d", \
myX(1,2), myXRelStart(1,2), myXRelEnd(1,2), BinTimeT(1), tm_sec($1))) w table
unset table
# plot settings
set format x "%d.%m." timedate
set format y "%H:%M" timedate
set style fill transparent solid 0.5 noborder
set yrange [0:SecPerDay]
set tics out
set key out title "HTTP status"
plot $Histo4 u (timecolumn(1,myDateFmt)+($3+$2)/2*SecPerDay) : \
(timecolumn(4,myTimeFmt)+BinWidthSec/2) : \
(($3-$2)/2*SecPerDay) : (BinWidthSec/2.):(myColor($5)) \
w boxxy lc rgb var notitle, \
for [i=1:3] keyentry w boxes lc rgb myColor(i) title word(myStatusList,i)
### end of code
Result:

How can I skip empty lines in gnuplot

I would like to plot the file below using gnuplot with a continuous line. The problem is that there is an empty line after each point, so I only get a graph with points. Could you please help me?
x y type
0 -1866.47 i
100 -1866.52 i
200 -1867.11 i
300 -1868.78 i
400 -1871.58 i
500 -1875.4 i
600 -1880.12 i
700 -1885.62 i
800 -1891.81 i
900 -1898.63 i
1000 -1906.02 i
1100 -1913.94 i
1200 -1922.33 i
1300 -1931.17 i
1400 -1940.43 i
1500 -1950.08 i
1600 -1960.11 i
1700 -1970.49 i
1800 -1981.22 i
1900 -1992.27 i
2000 -2003.63 i
You can filter the file using an external command. E.g., on a *nix OS, you can use awk:
plot "< awk 'NF!=0 { print $0 }' file.dat" w l
(in awk syntax, NF gives the number of fields in a given line, and $0 contains the entire line)
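Where awk is not available, the same filter can be written in a few lines of Python. This is only a sketch of the equivalent logic; `non_empty_lines` is a made-up name, and the sample data is shortened.

```python
def non_empty_lines(lines):
    """Yield only lines that contain at least one field (awk: NF != 0)."""
    for line in lines:
        if line.split():
            yield line.rstrip("\n")


sample = ["0 -1866.47\n", "\n", "100 -1866.52\n", "   \n", "200 -1867.11\n"]
print(list(non_empty_lines(sample)))
# ['0 -1866.47', '100 -1866.52', '200 -1867.11']
```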
You can cheat with splot :D
set ticslevel 0
set view 90,0
unset ytics
set xtics offset 0,-1
splot 'empt.txt' u 1:1:2 w l t 'title'
A rather late answer, but already from gnuplot 5.0.0 on (2015), you have the plotting style with table (check help with table). So, you can do it without external tools (platform-independently) and without splot-"cheating".
Simply plot your file (or datablock) into a datablock (or file), which will remove the empty lines.
Script:
### remove empty lines in data
reset session
$Data <<EOD
x y type
0 -1866.47 i
100 -1866.52 i
200 -1867.11 i
300 -1868.78 i
400 -1871.58 i
500 -1875.4 i
600 -1880.12 i
700 -1885.62 i
800 -1891.81 i
900 -1898.63 i
1000 -1906.02 i
1100 -1913.94 i
1200 -1922.33 i
1300 -1931.17 i
1400 -1940.43 i
1500 -1950.08 i
1600 -1960.11 i
1700 -1970.49 i
1800 -1981.22 i
1900 -1992.27 i
2000 -2003.63 i
EOD
set table $NoEmptyLines
plot $Data u 1:2 w table
unset table
plot $NoEmptyLines u 1:2 w l lc rgb "red"
### end of script
Result:

gnuplot : using a logarithmic axis for a histogram

I have a data file that I am creating a histogram from.
The data file is :
-0.1 0 0 JANE
1 1 1 BILL
2 2 1 BILL
1 3 1 BILL
6 4 0 JANE
35 5 0 JANE
9 6 1 BILL
4 7 1 BILL
24 8 1 BILL
28 9 1 BILL
9 10 0 JANE
16 11 1 BILL
4 12 0 JANE
45 13 1 BILL
My gnuplot script is :
file='test.txt'
binwidth=10
bin(x,width)=width*floor(x/width)
set boxwidth 1
plot file using (bin($1,binwidth)):(1.0) smooth freq with boxes, \
file using (1+(bin($2,binwidth))):(1.0) smooth freq with boxes
I would like to plot this data on a logscale in y. However there are some 0 values (because some of the bins are empty) that cannot be handled by set logscale y. I get the error Warning: empty y range [1:1], adjusting to [0.99:1.01].
According to gnuplot's help, "The frequency option makes the data monotonic in x; points with the same x-value are replaced by a single point having the summed y-values."
How can I take the log10() of the summed y-values computed by smooth freq with boxes?
There are at least two things that you could do. One is to use a linear axis between 0 and 1 and then use the logarithmic one as explained in this answer. The other one is to plot to a table first and then set the log scale ignoring the points with zero value.
With a normal linear axis and your code (plus set yrange [0:11]) your data looks like this:
Now let's plot to a table, then set the log scale, then plot ignoring the zero values:
file='test.txt'
binwidth=10
bin(x,width)=width*floor(x/width)
set table "data"
plot file using (bin($1,binwidth)):(1.0) smooth freq, \
file using (1+(bin($2,binwidth))):(1.0) smooth freq
unset table
set boxwidth 1
set logscale y
set yrange [0.1:11]
plot "data" index 0 using ($1):($2 == 0 ? 1/0 : $2) with boxes lc 1, \
"data" index 1 using ($1):($2 == 0 ? 1/0 : $2) with boxes lc 2
set table sometimes generates some undesirable points in the plot, which you can see at x = 0. To get rid of them you can use "< grep -v u data" instead of "data".
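To see which counts the binning produces before any log scale is applied, the steps can be reproduced in Python. The sketch below is an illustration only: it uses the same `width*floor(x/width)` formula as `bin(x,width)`, but the function names are assumptions and only the first column of the sample file is used.

```python
import math
from collections import Counter


def bin_value(x, width):
    """Left edge of the bin containing x (same formula as bin(x,width) in the script)."""
    return width * math.floor(x / width)


def histogram(values, width):
    """Count values per bin; only bins that actually occur are returned."""
    return dict(sorted(Counter(bin_value(v, width) for v in values).items()))


def log10_counts(counts):
    """log10 of each non-zero count; zero counts are dropped instead of plotted."""
    return {b: math.log10(n) for b, n in counts.items() if n > 0}


col1 = [-0.1, 1, 2, 1, 6, 35, 9, 4, 24, 28, 9, 16, 4, 45]
counts = histogram(col1, 10)
print(counts)
print(log10_counts(counts))
```

Dropping the zero counts plays the same role as the `$2 == 0 ? 1/0 : $2` expression in the gnuplot script.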

Specific gnuplot by data grouping

I'm new to gnuplot, and I'm sorry if my problem formulation is imprecise, but I don't know how to find the tools/commands needed to solve my problem. I would like to integrate the plotting code into my bash file.
I have a data set like:
285 1 50 7.35092
265 1 50 7.35092
259 1 50 7.35092
258 1 50 7.35092
264 1 50 7.35092
491 5 50 33.97
488 5 50 33.97
495 5 50 33.97
492 5 50 25.1649
495 5 50 33.0725
500 5 50 13.6176
507 5 50 32.2502
489 5 50 33.0725
494 5 50 33.97
491 5 50 33.97
746 10 50 34.6007
746 10 50 34.6007
767 10 50 30.858
745 10 50 34.8789
746 10 50 34.6007
747 10 50 34.6007
758 10 50 34.6007
772 10 50 34.60
I already grouped the data by inserting a blank line between blocks. I would like to calculate for each block the mean and standard deviation of the 4th column.
Then I would like to plot the mean with the confidence interval (standard deviation) on the Y axis and the value from the second column on the X axis.
Each data block has a unique number in the 2nd column.
My attempt so far: I get the values for a point from the first block, but when I try to plot them I get an error:
#myBash code for plotting.sh
FILEIN=simulationR.txt
rm plotTestR.png
gnuplot << EOF
reset
set terminal png
set output 'plotTestR.png'
set ylabel 'reward'
set xlabel 'Nr of simulation'
set title 'Simulation duration'
set grid
stats "$FILEIN" using 4 every :::0::0 nooutput
mean1 = sprintf('%.3f', STATS_mean)
std1 = sprintf('%.3f', STATS_stddev)
stats "$FILEIN" using 2 every :::0::0 nooutput
x1 = sprintf('%.3f', STATS_max)
plot '-' w yerrorbars title std1
x1 mean1 std1
exit
EOF
and the error:
gnuplot> plot '-' w yerrorbars title std1
^
line 1: Bad data on line 1 of file -
Usually, gnuplot isn't made for such data processing tasks. That's best done with an external script, which does the processing and writes to stdout, which can then be fed directly to gnuplot like
plot '< python myscript.py simulationR.txt'
In your example, you can only have fixed data after the plot '-' part, no variable substitution is done here.
However, gnuplot version 5 introduces a new inline data structure, to which you can write your computed values (set print $data).
Note that the following is a plain gnuplot script. If you want to wrap it in a bash script (which is not necessary, since you can pass variables to a gnuplot script via the command line), then you must escape the $ characters.
FILEIN="simulationR.txt"
system('rm -f plotTestR.png')
reset
set terminal pngcairo
set output 'plotTestR.png'
set ylabel 'reward'
set xlabel 'Nr of simulation'
set title 'Simulation duration'
set grid
set print $data
do for [i=0:2] {
stats FILEIN using 2:4 every :::i::i nooutput
print sprintf("%e %e %e", STATS_max_x, STATS_mean_y, STATS_stddev_y)
}
set autoscale xfix
set offsets 1,1,0,0
plot $data using 1:2:3 w yerrorbars
A further improvement could be to separate two blocks by two blank lines, in which case you can use
stats 'simulationR.txt' using 0 nooutput
to have the number of blocks in the variable STATS_blocks, and you can rewrite the loop as
do for [i=0:STATS_blocks-1] {
stats FILEIN using 2:4 index i nooutput
print sprintf("%e %e %e", STATS_max_x, STATS_mean_y, STATS_stddev_y)
}
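For the external-script route mentioned at the start of this answer, the processing could look roughly like the Python sketch below. It is not the author's myscript.py: the function name `block_stats` is invented, blocks are assumed to be separated by one or more blank lines, and the population standard deviation (`statistics.pstdev`) is an assumption that can be swapped for `statistics.stdev`.

```python
import statistics


def block_stats(lines):
    """Yield (x, mean, stddev) for each blank-line-separated block.

    x is the largest value in column 2; mean and stddev are taken over column 4.
    """
    xs, ys = [], []
    for line in list(lines) + [""]:   # the trailing "" flushes the last block
        fields = line.split()
        if fields:
            xs.append(float(fields[1]))
            ys.append(float(fields[3]))
        elif ys:
            yield max(xs), statistics.mean(ys), statistics.pstdev(ys)
            xs, ys = [], []


sample = """285 1 50 7.35092
265 1 50 7.35092

491 5 50 33.97
492 5 50 25.1649
""".splitlines()
for x, mean, std in block_stats(sample):
    print("%e %e %e" % (x, mean, std))
```

Each printed line has the same three columns (x, mean, standard deviation) that the gnuplot loop writes to $data, so it can be plotted with using 1:2:3 w yerrorbars.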

Plot cyclic sum of some row data

I have a data file that stores k values for each timestamp.
Ex:
# data.dat
# Example for k = 3
# Time ID value
1 0 1.555
1 1 1.76
1 2 12.56
2 0 1.75
2 1 2.04
2 2 13.04
3 0 2.01
3 1 0.52
3 2 12.99
# ...
I can print individually the data of each ID versus the time as follows:
set xrange [0:4]
set yrange[0:14]
set xtics 1
plot "data.dat" every 3 using 1:3 title "ID=0" with lp, \
"" every 3::1 using 1:3 title "ID=1" with lp, \
"" every 3::2 using 1:3 title "ID=2" with lp
However, I'm interested in plotting the average of the 3 values vs. time.
Of course, I could regenerate a new data file containing (with evaluated sum):
# avg_data.dat modified to
# Example for k = 3
# Time ID value
1 (1.555+1.76+12.56)/3
2 (1.75+2.04+13.04)/3
3 (2.01+0.52+12.99)/3
# ...
But of course, I'm seeking an automated way to express that in gnuplot using the data.dat file directly...
Drawing some inspiration from the running average demo on the gnuplot site:
k = 3
back1 = back2 = back3 = 0
shifter(x) = (back3 = back2, back2 = back1, back1 = x)
avger(x,y) = (shifter(x), y == k - 1 ? (back1 + back2 + back3)/3 : 1/0)
plot 'data.dat' u 1:(avger($3, $2)) with points pt 7
This works for me in gnuplot 4.6.1. If you want to have the points at each timestep connected in a line, it may be better to preprocess the data, since gnuplot in general won't connect points resulting from an expression evaluation (see discussion here and here, and in the gnuplot docs for set datafile missing).
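If preprocessing is the preferred route, the averaging itself is short. The following Python sketch is one possible way to do it, not something from the answer above; `average_per_time` is a hypothetical name, and it assumes the three-column layout of data.dat with `#` comment lines.

```python
def average_per_time(lines):
    """Return [(time, mean of column 3), ...], skipping comments and blank lines."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        time, _id, value = line.split()[:3]
        groups.setdefault(time, []).append(float(value))
    # dicts keep insertion order, so timestamps come out in file order
    return [(t, sum(v) / len(v)) for t, v in groups.items()]


sample = """# Time ID value
1 0 1.555
1 1 1.76
1 2 12.56
2 0 1.75
2 1 2.04
2 2 13.04
3 0 2.01
3 1 0.52
3 2 12.99
""".splitlines()
for t, avg in average_per_time(sample):
    print(t, avg)
```

The output has one row per timestamp, so gnuplot can connect the points with lines in the usual way. Unlike the shifter approach, this does not depend on k being fixed at 3.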
