Getting plot title and caption data from the data file - gnuplot

Consider the following file that I want to plot using gnuplot: Servos20211222_105253.csv
# Date/Time 2021/12/22, 10:52:53
# PonE=0,LsKp=200,LsKi=0,LsKd=250,HsKp=40,HsKi=0,HsKd=130,Sp=800,TDEC=1175137
#
# Rel. Time, currentPos, PosPID, currentSpeed, speedPID, Lag, ServoPos
0.00000,4693184,0,0,0,0,4693184
0.00000,4693184,2300,0,368,0,4693184
0.00391,4693185,2300,12,367,0,4693184
:
:
I would like to:
set the plot title to the date/time from the first comment record.
display the record that starts "# PonE" as a caption.
extract the value for TDEC and plot a horizontal line with the name "Target"
I have some influence over the format of the header records, so if (for example) it would be better that they were not comments but provided in some other way, then that can be done.

Getting text values from files using only gnuplot is a common problem. If you can use OS- and shell-dependent solutions, I'd suggest removing the comment markers from the file and trying something like
set title "`head -1 Servos20211222_105253.csv`"
You can place text anywhere using set label <"label text">, where the label text can be the 2nd line from the file.
You can plot a straight line using plot:
p sin(x), 0.5 title "TDEC"
Instead of 0.5, however, you need to get the value with a shell command again, e.g. the Unix cut command.
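A rough sketch of this shell-based approach, for Linux or macOS only. It assumes sed is available (my substitution for head and cut, so the "# " prefix can stay in the file) and that the header layout matches the file in the question. The variable names myTitle and myParams are made up for illustration.

```gnuplot
# Sketch: pull header values out of the CSV with shell tools (assumes a Unix shell with sed).
FILE = "Servos20211222_105253.csv"

# Line 1 holds the date/time, line 2 the parameter list; strip the leading "# ".
myTitle  = system(sprintf("sed -n '1s/^# *//p' '%s'", FILE))
myParams = system(sprintf("sed -n '2s/^# *//p' '%s'", FILE))

# TDEC is the last entry on line 2; real() converts the string to a number.
TDEC = real(system(sprintf("sed -n '2s/.*TDEC=//p' '%s'", FILE)))

set datafile separator comma
set title myTitle
set label 1 myParams at graph 0.02, graph 0.95
plot FILE using 1:2 with lines title "currentPos", \
     TDEC with lines title "Target"
```

Because the header lines still start with "#", gnuplot skips them as comments when plotting, so the data file does not need to be changed.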

There are ways with gnuplot only, although sometimes a bit cumbersome compared with using tools which you have available on Linux (or comparable tools which you need to install on Windows).
Update: shorter and "simplified" script
One possible gnuplot-only way:
set commentschar to nothing, i.e. ''
assign the columns to variables and/or arrays, e.g. myDate, myTime, P[1..9].
Merge P[1..8] into a multi-line string Params by "mis"-using sum (check help sum)
Convert P[9] into a floating point number TDEC for plotting
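The last two steps can be tried in isolation. This is a minimal sketch with hard-coded strings instead of file columns (the array contents and the variable s are placeholders); the array initializer needs gnuplot 5.2 or later.

```gnuplot
# Sketch: join array elements into a multi-line string by "mis"-using sum.
array P[3] = ["LsKp=200", "LsKi=0", "LsKd=250"]
Params = P[1]
# Each term appends one element to Params and contributes 0 to the sum;
# the outer (..., Params) then returns the accumulated string.
Params = (sum [_i=2:3] (Params = Params.sprintf("\n%s", P[_i]), 0), Params)
print Params

# Convert the text after "TDEC=" (characters 6 to end) into a number.
s = "TDEC=1175137"
TDEC = real(s[6:])
print TDEC
```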
Script: (modified the data a bit just for illustration)
### extract values from headers with gnuplot only
reset session
$Data <<EOD
# Date/Time 2021/12/22, 10:52:53
# PonE=0,LsKp=200,LsKi=0,LsKd=250,HsKp=40,HsKi=0,HsKd=130,Sp=800,TDEC=1175137
#
# Rel. Time, currentPos, PosPID, currentSpeed, speedPID, Lag, ServoPos
0.00000,1300000,0,0,0,0,4693184
0.00200,1200000,2300,0,368,0,4693184
0.00391,1100000,2300,12,367,0,4693184
EOD
set datafile separator comma commentschar ''
array P[9] # array to store parameters
stats $Data u ($0==0 ? (myDate=strcol(1)[3:], myTime=strcol(2)) : \
    sum [_i=1:9] (P[_i] = _i==1 ? strcol(_i)[3:] : strcol(_i) ,0 )) \
    every ::0::1 nooutput
set datafile commentschar # set back to default
Params = P[1]
Params = (sum [_i=2:8] (Params=Params.sprintf("\n%s",P[_i]),0),Params)
set title sprintf("%s %s", myDate, myTime)
TDEC = real(P[9][6:]) # convert to real number
set label 1 at graph 0.02, first TDEC P[9] offset 0,-0.7
set label 2 at graph 0.02, graph 0.85 Params
plot $Data u 1:2 w lp pt 7 title "Data", \
     TDEC w l lc "red" title "Target"
### end of script
Result:

gnuplot: simple beeswarm example

I have been struggling with a basic beeswarm plot from page 62 in this doc. I imagine they are skipping some details, and I'm not sure what actual data they used. I think in particular the problem is mapping a categorical/string variable to an X-axis value.
I used this data:
A 1
A 2
A 3
B 4
B 5
B 6
With this script:
set terminal png
set output "graph.png"
set jitter
plot "data.csv" using 1:2:1 with points lc variable
I get this error:
"graph_script" line 4: warning: Skipping data file with no valid points
plot "data.csv" using 1:2:1 with points lc variable
^
"graph_script" line 4: x range is invalid
In their demos gallery, I see something like set xtics ("A" -1, "B" 0) which could maybe help me to label already-numeric data better, but what if my data doesn't start off numeric to begin with?
Do I need something like (hash_string_to_large_int($1) % 2)? There must be an easier way!
As mentioned in the comments, you have to "convert" your keys into numbers in order to plot them.
You can do this by creating a list of your unique keywords and defining a function that returns the index of a given key.
The following example first creates some random test data.
The code after that knows nothing about the keywords, so it builds the list of unique keys from scratch from the random data.
There may be a simpler gnuplot-only solution that I am not aware of.
Code:
### bee-swarm plot with string keys
reset session
# create some random test data
myExts = '.py .sh .html'
set print $Data
do for [i=1:100] {
print sprintf("%s %d",word(myExts,int(rand(0)*3)+1),int(rand(0)*10+1)*5)
}
set print
# create a unique list of strings from a data stringcolumn
Uniques = ''
addToList(list,col) = list.( strstrt(list,'"'.strcol(col).'"') > 0 ? '' : ' "'.strcol(col).'"')
stats $Data u (Uniques = addToList(Uniques,1),0) nooutput
getIdx(key) = (_idx=NaN, sum [_i=1:words(Uniques)] (word(Uniques,_i) eq key ? _idx=_i : 0), _idx)
set offsets 0.5,0.5,1,1
set key noautotitle
set multiplot layout 1,2
set title "No jitter"
plot $Data u (idx=getIdx(strcol(1))):2:(idx):xtic(word(Uniques,idx)) w points pt 7 lc var
set title "With jitter"
set jitter
replot
unset multiplot
### end of code
Result:

gnuplot single plot in different colors

I have a single column of data (say 100 samples):
plot 'file' using 1 with lines
But this data is segmented: 10 points, then 10 more, etc... and I'd like each block of 10 to appear in a different color. I did filter them to 10 separate files and used
plot 'file.1' with lines, 'file.2' with lines...
But then the X axis goes 0..10 instead of 0..100 and all 10 graphs are stacked. Is there a simple way to do that without having to generate fake X data?
Depending on your exact data format, the following does what I think you are asking for.
Your "fake x data" is called pseudocolumn 0, check help pseudocolumns. You can change the color with lc var, check help linecolor variable.
Code:
### variable line color
reset session
# create some test data
set print $Data
do for [i=1:100] {
print sprintf("%g", rand(0)*i)
}
set print
plot $Data u 0:1:(int($0/10)) w lp pt 7 lc var notitle
### end of code
Result:

max value for same minute over multiple days from csv with unix timestamps

I have a CSV with a Unix timestamp column that was collected over multiple days, with a data row every 5 minutes (the output log of my photovoltaic rooftop power plant).
I'd like to create a 24-hour plot that shows the maximum value for each 5-minute slot over all days.
Can this be done with gnuplot's own capabilities, or do I have to do the processing outside gnuplot via scripts?
You don't show what your exact data structure looks like. - theozh
These files are rather large. I placed an example here:
http://www.filedropper.com/log-pv-20190607-20190811 (300kB)
I'm especially interested in columns 4 (DC1 P) and 9 (DC2 P).
Column 1 (Zeit) holds the unix timestamp.
The final goal is separate graphs (colors) for DC1 P and DC2 P, but that's a different question... ;o)
Update/Revision:
After revisiting this answer, I guess it is time for a clean up and simpler and extended solution. After some iterations and clarifications and after OP provided some data (although, the link is not valid anymore), I came up with some suggestions, which can be improved.
You can do all in gnuplot, no need for external tools!
The original request to plot the maximum values from several days is easy if you use the plotting style with boxes. But this is basically only a graphical solution. In that case it was apparently sufficient. However, if you are interested in the maximum values as numbers, it takes a little more effort.
gnuplot has the options smooth unique and smooth frequency (check help smooth). With these you can easily get the average and sum, respectively, but there is no smooth max or smooth min. As @meuh suggested, you can get the maximum or minimum with arrays, which are available since gnuplot 5.2.0.
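To see what the two smooth options do, here is a small stand-alone sketch with made-up numbers (the datablock names $Demo, $Avg and $Sum are placeholders). Writing to a datablock with set table needs gnuplot 5.2 or later.

```gnuplot
# Sketch: smooth unique averages, smooth frequency sums the y values sharing an x value.
set datafile separator whitespace
$Demo <<EOD
0    2
0    4
300  1
300  5
300  6
EOD

set table $Avg
plot $Demo using 1:2 smooth unique      # x=0 -> mean of 2,4;  x=300 -> mean of 1,5,6
unset table

set table $Sum
plot $Demo using 1:2 smooth frequency   # x=0 -> 2+4;  x=300 -> 1+5+6
unset table

print $Avg
print $Sum
```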
Script: (Requires gnuplot>=5.2.0)
### plot time data modulo 24h avg/sum/min/max
reset session
FILE = 'log-pv-20190607-20190811.csv'
set datafile separator comma
HeaderCount = 7
myTimeFmt = "%Y-%m-%d %H:%M:%S"
StartTime = ''
EndTime = ''
# if you don't define start/end time it will be taken automatically
# (always convert, so that start/end times given as strings become numbers as well)
stats FILE u 1 skip HeaderCount nooutput
StartTime = (StartTime eq '' ? STATS_min : strptime(myTimeFmt,StartTime))
EndTime = (EndTime eq '' ? STATS_max : strptime(myTimeFmt,EndTime))
Modulo24Hours(t) = (t>=StartTime && t<=EndTime) ? (int(t)%86400) : NaN
set key noautotitle
set multiplot layout 3,2
set title "All data" offset 0,-0.5
set format x "%d.%m." timedate
set grid x,y
set yrange [0:]
myHeight = 1./3*1.1
set size 1.0,myHeight
plot FILE u 1:4:(tm_mday($1)) skip HeaderCount w l lc var
set multiplot next
set title "Data per 24 hours"
set format x "%H:%M" timedate
set xtics 3600*6
set size 0.5,myHeight
plot FILE u (Modulo24Hours($1)):4:(tm_mday($1)) skip HeaderCount w l lc var
set title "Average"
set size 0.5,myHeight
plot FILE u (int(Modulo24Hours($1))):4 skip HeaderCount smooth unique w l lc "web-green"
set title "Sum"
set size 0.5,myHeight
plot FILE u (int(Modulo24Hours($1))):4 skip HeaderCount smooth freq w l
set title "Min/Max"
set size 0.5,myHeight
N = 24*60/5
SecPerDay = 3600*24
array Min[N]
array Max[N]
do for [i=1:N] { Min[i]=NaN; Max[i]=0 } # initialize arrays
stats FILE u (idx=(int($1)%SecPerDay)/300+1, $4>Max[idx] ? Max[idx]=$4:0, \
Min[idx]!=Min[idx] ? Min[idx]=$4 : $4<Min[idx] ? Min[idx]=$4:0 ) skip HeaderCount nooutput
# array index 1 corresponds to 00:00, hence ($1-1)*300 seconds
plot Min u (($1-1)*300):2 w l lc "web-blue", \
     Max u (($1-1)*300):2 w l lc "red"
unset multiplot
### end of script
Result:
From gnuplot 5.2 you could use the new array datatype to calculate a maximum value for each 5 minute slot. I am not a gnuplot expert, so the following example needs more work, but shows the potential.
Assume data is similar to these lines, where there is a date in the format
yyyy.mm.dd.HH:MM, a comma and a y value:
2018.02.03.18:23,4
2018.02.03.19:23,7
2018.02.04.18:23,8
2018.02.05.19:23,11
Instead of using gnuplot's built-in time parsing, since we want to ignore the date, we create a function fsecs to use substr(stringcolumn(...),12,16) to get just the hours and minutes from data column 1, and strptime("%H:%M",...) to convert this to seconds:
set datafile separator ","
fsecs(v) = strptime("%H:%M",substr(stringcolumn(v),12,16))
We create an array Max indexed by "5 minute slot", of which there are 24*60/5 per day. It is initialised to NaN, not-a-number.
Nitems = int(24*60/5)+1
array Max[Nitems]
do for [i=1:Nitems] {
Max[i] = NaN
}
We then "plot" the data file data.csv into a dummy table, rather than generating any output. As we go through the data we index Max by the data x value (column 1) converted to seconds by fsecs(1) and then to slot by findex(). This is Max[findex(fsecs(1))].
We call our function fmax() to return the new maximum to set in the array.
# gnuplot arrays are 1-based, so the first slot (00:00-00:04) must map to index 1
findex(x) = int(((x)/60)/5) + 1
fmax(a,b) = ((a>=b)?a:b)
set table $Dummy
plot 'data.csv' using \
(Max[findex(fsecs(1))] = fmax(Max[findex(fsecs(1))],$2)):2
unset table
Finally, we plot the array, which is the slot number against the value held in that slot number.
plot Max using 1:(Max[$1]) with points lw 2 title "max day"
This works for me on 5.2. You still need to label the x axes with HH:MM, and change the date parsing to fit your needs.
For time formatting, please see "Gnuplot date/time in x axis".
If you do not care about formatting the values as time, you may use the every keyword (see the gnuplot documentation), but that only selects rows; it does not compute a maximum.
For the maximum value over a given time interval I suggest an awk script, see e.g. https://unix.stackexchange.com/a/207287/297901

Incorrect position and size of percentages in Gnu Plot

I have developed a CGI in bash/HTML that allows me to generate a graph of my clusters.
Here is an example:
This is a graph that works well. The problem is that for some graphs, the percentages overlap or shift far too far from where it should be. Here is my GNUPLOT code:
f(w) = (strlen(w) > 10 ? word(w, 1) . "\n" . word(w, 2) : w)
set title "TITLE"
set terminal png truecolor size 960, 720 background rgb "#eff1f0"
set output "/var/www/html/CLUSTER_NAME.png"
set bmargin at screen 0.1
set key top center
set grid
set style data histograms
set style fill solid 1.00 border -1
set boxwidth 0.7 relative
set yrange [*:*]
set format y "%g%%"
set datafile separator ","
plot 'test1.txt' using 2:xtic(f(stringcolumn(1))) title " CPU consumption (%) ", \
'' using 3 title " RAM consumption (%)", \
'' using 0:($2+1):(sprintf(" %g%%",$2)) with labels notitle, \
'' using 0:($3+1):(sprintf(" %g%%",$3)) with labels notitle
Here is an example of a graph that does not work properly because the percentages are shifted too far:
I am able to change this by changing this line in my code:
'' using 0:($3+1):(sprintf(" %g%%",$3)) with labels notitle
To :
'' using 0:($3+1):(sprintf(" %g%%",$3)) with labels notitle
Adding spaces shifts the percentages:
But even if it works for this graph, it moves the percentages in the other graphs too:
I can't get "clean" graphics. Either the percentages overlap, or they go out of range because the values are too large, or they are completely shifted.
Another example:
Is there a way to make all this move by itself, automatically, according to the values and therefore the size of the bars etc?
You might try an alternative mechanism, using plot for [i=2:3] ... to loop over the two value columns. Instead of guessing the number of spaces to indent, you estimate the x position of each bar as column(0)+(i-2)*.25 (for i = 2, then 3), which I arrived at by trial and error.
For example, using a function mytitle to get the 2 titles (my gnuplot is too old for an array):
mytitle(x) = (x==2?"cpu":"ram")
plot for [i=2:3] 'data' using i:xtic(stringcolumn(1)) title mytitle(i), \
for [i=2:3] '' using (column(0)+(i-2)*.25):(column(i)+1):\
(sprintf("%g%%",column(i))) with labels notitle

gnuplot setting line titles by variables

I am trying to plot multiple data lines with their titles in the key based on the variable which I am using as the index:
plot for [i=0:10] 'filename' index i u 2:7 w lines lw 2 t ' = '/(0.5*i)
However, it cannot seem to do this for a fractional multiple of i. Is there a way around this other than to set the title for each line separately?
sprintf should provide all the functionality needed, e.g.,
plot for [i=0:10] .... t sprintf(" = %.1f", 0.5*i)
This formats the value of 0.5*i with one decimal digit.
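A self-contained sketch of the same idea, using shifted sine curves instead of a data file (the functions and the "phase" text are placeholders, not from the question):

```gnuplot
# Sketch: build each key entry from the loop index with sprintf.
set xrange [0:10]
plot for [i=0:4] sin(x + 0.5*i) with lines lw 2 title sprintf("phase = %.1f", 0.5*i)
```

The original attempt fails because / is division, not string concatenation. sprintf turns the number into text in the desired format; the string concatenation operator . would also work, but gives no control over the number of digits.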
