Plotting with GNU Plot, title based on column name - gnuplot

I have a file composed of 3 columns, whose format is: X Axis Value | Title ID | Y Axis Value
I don't know how to plot it using columns 1:3 for the data and column 2 for the title name. Here is an example:
X Axis Plot Y Axis
2000 plot1 1.2
2000 plot2 4.6
2000 plot3 5.7
3000 plot1 5.8
3000 plot2 7.5
3000 plot3 8.3
So here we will have 3 plots, named after the second-column values (plot1, plot2 and plot3); 2000 and 3000 are the X axis values and the third column holds the Y value.
So, the "plot2" graph would be: (2000, 4.6) and (3000, 7.5)

If you are a UNIX user or the "grep" command is callable from gnuplot, you may consider the following approach.
set xrange [1000:4000]
set yrange [0:10]
plot for [i=1:3] \
sprintf("< grep plot%i test.dat",i) using 1:3 with linespoints title sprintf("plot%i", i)
The above plot command is a bit more complicated because of the use of "for loop" and "sprintf", but is equivalent to the following.
plot "< grep plot1 test.dat" using 1:3 with linespoints title "plot1", \
"< grep plot2 test.dat" using 1:3 with linespoints title "plot2", \
"< grep plot3 test.dat" using 1:3 with linespoints title "plot3"
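One caveat with the plain pattern: grep plot1 also matches lines containing plot10, plot11 and so on. A minimal sketch of a stricter variant, assuming a grep that supports the -w (whole word) option, as GNU and BSD grep do, and the same test.dat file as above:

```gnuplot
# -w makes grep match "plot1" only as a whole word, so "plot10" is not picked up
plot for [i=1:3] \
    sprintf("< grep -w plot%d test.dat", i) using 1:3 with linespoints title sprintf("plot%d", i)
```

With only plot1 to plot3 in the file, both variants give the same result; the difference only shows once there are ten or more IDs.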

The adapted "gnuplot only" version for your case might look like this.
fcol is the filter column number, myKey your keyword, and dcol the data column number. It's not clear to me whether you want 3 lines in 1 graph, or 3 lines in 3 graphs on 1 canvas (multiplot, like the example below), or the graph in 1 file on disk or maybe distributed over 3 files on disk.
Code:
### split data by keyword for each plot
reset session
$Data <<EOD
X Axis Plot Y Axis
2000 plot1 1.2
2000 plot2 4.6
2000 plot3 5.7
3000 plot1 5.8
3000 plot2 7.5
3000 plot3 8.3
EOD
myFilter(fcol,myKey,dcol) = strcol(fcol) eq myKey ? column(dcol) : NaN
set datafile missing NaN
set key top left
set multiplot layout 3,1
do for [i=1:3] {
myKey = sprintf("plot%d",i)
set title myKey
plot $Data u 1:(myFilter(2,myKey,3)) w lp pt 7 lc i title myKey
}
unset multiplot
### end of code
Result:
Addition: (all lines in one plot)
If your rows follow a regular pattern, e.g. 1,2,3,1,2,3,1,2,3,... or 1,1,1,2,2,2,3,3,3,..., you could possibly also work with every; check help every.
However, this filtering with the ternary operator should work for all cases, including random 1,3,2,2,3,1,1,2,3,... sequences.
Code:
### split data by keyword
reset session
$Data <<EOD
X Axis Plot Y Axis
2000 plot1 1.2
2000 plot2 4.6
2000 plot3 5.7
3000 plot1 5.8
3000 plot2 7.5
3000 plot3 8.3
EOD
myFilter(fcol,myKey,dcol) = strcol(fcol) eq myKey ? column(dcol) : NaN
set datafile missing NaN
set key top left
myKey(i) = sprintf("plot%d",i)
set title "Plot 1,2,3"
plot for [i=1:3] $Data u 1:(myFilter(2,myKey(i),3)) w lp pt 7 lc i title myKey(i)
### end of code
Result:
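For the regular-pattern case mentioned above, a minimal sketch with every could look like this. It assumes the header line has been removed from the data and that the rows strictly cycle plot1, plot2, plot3; if either assumption fails, the ternary filter is the safer choice.

```gnuplot
### split data by row position (regular 1,2,3,1,2,3 pattern only)
reset session
# same values as in the question, header line removed (assumption)
$Data <<EOD
2000 plot1 1.2
2000 plot2 4.6
2000 plot3 5.7
3000 plot1 5.8
3000 plot2 7.5
3000 plot3 8.3
EOD
set key top left
# every 3::i-1 -> take every 3rd row, starting at row i-1 (rows are counted from 0)
plot for [i=1:3] $Data u 1:3 every 3::i-1 w lp pt 7 lc i title sprintf("plot%d", i)
### end of code
```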

Related

Marking zero values with a red line in gnuplot

I have a .csv data file containing values from 0 to 500, e.g.
25.2
2.82
2.05
2.13
2.42
2.17
2.00
0
0
0
0
3.33
3.41
3.26
3.30
0
27.8
and plot it using
plot 'out.csv' using 1 with lines
From that file I would like any 0 values to be marked on the graph with a single red line.
Depending on what exactly you are looking for, the following would be my suggestion from what I understood from your question.
Plot the data again with impulses and filtered with the ternary operator (check help ternary).
Script:
### plot with ternary operator
reset session
# create some random test data
set samples 300
set table $Data
plot '+' u (rand(0)<0.97 ? (rand(0)**20)*500+1 : 0) w table
unset table
set key out
plot $Data u 1 w l lc 1 ti "data", \
'' u ($1==0 ? 500 : NaN) w impulses lc "red" ti "0 value"
### end of script
Result:
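The same idea applied to fixed data rather than random test data, as a minimal sketch: the datablock is an abbreviated, illustrative subset of the values in the question, and the impulse height is taken from the data maximum via stats instead of the hard-coded 500.

```gnuplot
### mark zero values, impulse height taken from the data
reset session
# abbreviated subset of the values from the question (illustrative only)
$Data <<EOD
25.2
2.82
2.05
0
0
3.33
0
27.8
EOD
stats $Data u 1 nooutput
yMax = STATS_max            # largest value in column 1
set key out
plot $Data u 1 w l lc 1 ti "data", \
     ''    u ($1==0 ? yMax : NaN) w impulses lc rgb "red" ti "0 value"
### end of script
```

For a file on disk, $Data would be replaced by the file name, e.g. 'out.csv' from the question.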

gnuplot's 'steps' style does not accept variable color

I'm using Gnuplot Version 5.2 patchlevel 6 on Debian 10. The following program
$d << EOD
1 0.5 0.1
2 0.75 0.2
3 0.99 0.5
4 1.25 1.1
EOD
plot $d using 1:2:3 w lines lc palette z lw 2
produces an expected output:
But if I change the last line to
plot $d using 1:2:3 w steps lc palette z lw 2
I receive an error message:
line 7: Too many using specs for this style
According to part II, Plotting Styles, Steps, of the gnuplot User Manual:
The input column requirements are the same as for plot styles lines and points.
and the section Plotting Styles, Lines states that:
The basic form requires 1, 2, or 3 columns of input data. Additional input columns may be used to provide information such as variable line color
What am I doing wrong?
If you are drawing with steps, the question probably is: which color should the vertical lines have?
Quickly checking the documentation, I couldn't find a hint whether variable line color together with steps explicitly works or explicitly doesn't work.
In any case, you can work around it with the following code:
Code:
### plotting with steps and variable line color
reset session
$Data <<EOD
1 0.5 0.1
2 0.75 0.2
3 0.99 0.5
4 1.25 1.1
EOD
set xrange [0:5]
set yrange [0:1.5]
plot x1=y1=NaN $Data u (x0=x1,x1=$1,x0):(y0=y1,y1=$2,y0):(x1-x0):(0):3 w vectors lw 2 lc palette nohead notitle, \
x1=y1=NaN $Data u (x0=x1,x1=$1,x1):(y0=y1,y1=$2,y0):(0):(y1-y0):3 w vectors lw 2 lc palette nohead notitle
### end of code
Result:
Addition: (vertical lines with variable colors)
Maybe you noticed that with your 4 datapoints there are only 3 colors. This is obvious, because if you have 4 data points you will only have 3 connecting lines, hence 3 colors.
A variation would be the following:
Draw your 4 points with the color according to the value column 3 and the same color for the horizontal lines.
However, for the vertical lines you split the lines into as many levels as you want (here: myLevels = 20), using the color according to the palette.
Code:
### plotting with steps and variable line color (vertical lines with variable color)
reset session
$Data <<EOD
1 0.5 0.1
2 0.75 0.2
3 0.99 0.5
4 1.25 1.1
EOD
set xrange [0:5]
set yrange [0:1.5]
myLevels = 20
plot x1=y1=c1=NaN $Data u (x0=x1,x1=$1,x0):(y0=y1,y1=$2,y0):(x1-x0):(0):(c0=c1,c1=$3,c0) w vectors lw 2 lc palette nohead notitle, \
for [i=0:myLevels-1] x1=y1=NaN $Data u (x0=x1,x1=$1,x1):(y0=y1,y1=$2,y0+(y1-y0)*i/myLevels):(0):((y1-y0)/myLevels):(c0=c1,c1=$3,c0+(c1-c0)*i/myLevels) w vectors lw 2 lc palette nohead notitle, \
$Data u 1:2:3 w p pt 7 ps 2 lc palette notitle
### end of code
Result:

Gnuplot plot data with text column as X axis

I have a text file as follows:
ls 10
cd 5
cut 12
awk 7
...
I would like to plot it using gnuplot with the text column as the X axis:
plot 'data.txt' u 1:2
But I get this error:
warning: Skipping data file with no valid points
^
x range is invalid
I appreciate your help
$data << EOD
ls 10
cd 5
cut 12
awk 7
EOD
# this is all just plot layout stuff; customize to taste
unset border
set tics scale 0
set xzeroaxis
set title "x coord = line number"
# use line number for x coordinate, column 1 for tic label
plot $data using 0:2:xticlabel(1) with impulse
Check help xticlabels.
And try this:
reset session
$Data <<EOD
ls 10
cd 5
cut 12
awk 7
... 9
EOD
set boxwidth 0.7
set style fill solid 1.0
set yrange[0:]
plot $Data u 2:xtic(1) w boxes

How to add vertical lines with label using gnuplot?

I have this script to plot data from a CSV file using gnuplot. I want to add 3 vertical lines at different times on the plot to show where I changed the workload of my experiment. I tried to do it with vectors, but that messed up the data already plotted. I attached my chart and manually added a vertical blue line as an example of what I want.
#!/usr/bin/gnuplot
# set grid
set key under left maxrows 1
set style line 1 lc rgb '#E02F44' lt 1 lw 1 ps 0.5 pt 7 # input throughput
set style line 2 lc rgb '#FF780A' lt 1 lw 1 ps 0.5 pt 1 # output throughput
set style line 3 lc rgb '#56A64B' lt 1 lw 1 ps 0.5 pt 2 # average processing latency
set style line 4 lc rgb '#000000' lt 1 lw 1 ps 0.5 pt 3 # 99th percentile processing latency
set terminal pdf
set pointintervalbox 0
set datafile separator ','
set output "efficiency-throughput-networkbuffer-baseline-TaxiRideNYC-100Kpersec.pdf"
set title "Throughput vs. processing latency consuming 50K r/s from the New York City (TLC)"
set xlabel "time (minutes)"
set ylabel "Throughput (K rec/sec)"
set y2label "processing latency (seconds)"
set ytics nomirror
set y2tics 0, 1
set xdata time # tells gnuplot the x axis is time data
set timefmt "%Y-%m-%d %H:%M:%S" # specify our time string format
set format x "%M" # otherwise it will show only MM:SS
plot "throughput-latency-increasing.csv" using 1:(column(2)/1000) title "IN throughput" with linespoints ls 1 axis x1y1 \
, "throughput-latency-increasing.csv" using 1:(column(10)/1000) title "OUT throughput" with linespoints ls 2 axis x1y1 \
, "throughput-latency-increasing.csv" using 1:(column(18)/1000) title "avg. latency" with linespoints ls 3 axis x1y2 \
, "throughput-latency-increasing.csv" using 1:(column(26)/1000) title "99th perc. latency" with linespoints ls 4 axis x1y2 \
#, "" using 1:($1):(3):(0) notitle with vectors nohead
My data file is:
"Time","pre_aggregate[0]-IN","pre_aggregate[1]-IN","pre_aggregate[2]-IN","pre_aggregate[3]-IN","pre_aggregate[4]-IN","pre_aggregate[5]-IN","pre_aggregate[6]-IN","pre_aggregate[7]-IN","pre_aggregate[0]-OUT","pre_aggregate[1]-OUT","pre_aggregate[2]-OUT","pre_aggregate[3]-OUT","pre_aggregate[4]-OUT","pre_aggregate[5]-OUT","pre_aggregate[6]-OUT","pre_aggregate[7]-OUT","pre_aggregate[0]-50","pre_aggregate[1]-50","pre_aggregate[2]-50","pre_aggregate[3]-50","pre_aggregate[4]-50","pre_aggregate[5]-50","pre_aggregate[6]-50","pre_aggregate[7]-50","pre_aggregate[0]-99","pre_aggregate[1]-99","pre_aggregate[2]-99","pre_aggregate[3]-99","pre_aggregate[4]-99","pre_aggregate[5]-99","pre_aggregate[6]-99","pre_aggregate[7]-99"
"2020-04-27 10:31:00",1428.05,1274.4666666666667,1364.6166666666666,1384.4666666666667,1327.3,1376.5,1390.9166666666667,1418.35,1428.05,1274.4666666666667,1364.6333333333334,1384.4666666666667,1327.3,1376.5,1390.9166666666667,1418.35,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
"2020-04-27 10:31:15",1463.5833333333333,1452.3666666666666,1346.7333333333333,1380.3833333333334,1429.4833333333333,1431.6833333333334,1442.85,1425.15,1463.5833333333333,1452.3666666666666,1346.7333333333333,1380.3833333333334,1429.4833333333333,1431.6833333333334,1442.85,1425.15,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
"2020-04-27 10:31:30",1393.4666666666667,1396.65,1369.55,1381.3833333333334,1336.8,1434.5166666666667,1440.0833333333333,1399.2833333333333,1393.45,1396.65,1369.55,1381.3833333333334,1336.8,1434.5166666666667,1440.0833333333333,1399.2833333333333,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
"2020-04-27 10:31:45",1404.8833333333334,1448.5333333333333,1313.9,1308.1,1359.6333333333334,1329.5166666666667,1338.4166666666667,1481.5666666666666,1404.8833333333334,1448.5333333333333,1313.9,1308.1,1359.6333333333334,1329.5166666666667,1338.4166666666667,1481.5833333333333,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
Of course you can plot your lines and labels. In the example below I'm using the newer syntax instead of set xdata time; it requires timecolumn(1,myTimeFmt) and e.g. set format x "%M" time.
Your date is in double quotes, so you have to define the time format in single quotes, with the double quotes included.
Furthermore, you are using absolute times, so your lines ideally use the same format. You can put it into a datablock. I hope you can adapt the code to your needs.
Code:
### vertical lines with labels on time axis
reset session
$myLines <<EOD
"2020-04-27 10:34:00"
"2020-04-27 10:39:20"
"2020-04-27 10:43:50"
"2020-04-27 10:48:00"
EOD
myTimeFmt = '"%Y-%m-%d %H:%M:%S"'
StartDate = '"2020-04-27 10:30:00"'
EndDate = '"2020-04-27 10:52:00"'
set format x "%M" time
set xrange [strptime(myTimeFmt,StartDate):strptime(myTimeFmt,EndDate)]
yLow = 1.4
yHigh = 3.5
set tmargin screen 0.90
plot '+' u (strptime(myTimeFmt,StartDate)+$0*60):(rand(0)*3+0.5) w l lc rgb "red" notitle, \
$myLines u (timecolumn(1,myTimeFmt)):(yHigh):("Workload\nchanged") w labels right offset -0.5,1.5 not, \
$myLines u (timecolumn(1,myTimeFmt)):(yLow):(0):(yHigh-yLow) w vec lc rgb "blue" lw 2 nohead not
### end of code
Result:
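A different technique, not the one used in the code above, is to draw the markers as annotations with set arrow and set label instead of plotting them as data. The following is only a sketch: the change times, the yrange and the sine curve standing in for real data are all made up for illustration.

```gnuplot
### vertical marker lines as arrows (alternative sketch)
reset session
myTimeFmt = "%Y-%m-%d %H:%M:%S"
day       = "2020-04-27 "
changes   = "10:34:00 10:39:20 10:43:50"    # hypothetical workload-change times
set format x "%M" time
set xrange [strptime(myTimeFmt, day."10:30:00"):strptime(myTimeFmt, day."10:52:00")]
set yrange [0:4]
set tmargin 4                               # leave room for the labels above the graph
do for [i=1:words(changes)] {
    t = strptime(myTimeFmt, day.word(changes,i))
    # x in axis (time) coordinates, y in graph coordinates: bottom (0) to top (1)
    set arrow from t, graph 0 to t, graph 1 nohead lc rgb "blue" lw 2
    set label "Workload\nchanged" at t, graph 1 right offset -0.5,1.5
}
plot 2+sin(x/120.) w l lc rgb "red" notitle  # placeholder curve instead of real data
### end of code
```

Because the y positions are given in graph coordinates, the lines always span the full plot height regardless of the y range, which avoids the autoscaling side effects of plotting extra vectors.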

Gnuplot: plotting points with variable point types

I have x,y values for points in the first 2 columns and a number that indicates the point type (symbol) in the 3rd column, all in one data file. How do I plot data points with different symbols?
Unfortunately, there isn't a way (AFAIK) to automatically set the point of the plot from a column value using vanilla GNUPLOT.
However, there is a way to get around that by setting a linestyle for each data series, and then plotting the values based on that defined style:
set style line 1 lc rgb 'red' pt 7 #Circle
set style line 2 lc rgb 'blue' pt 5 #Square
Remember that the number after pt is the point-type.
Then, all you have to do is plot (assuming that the data in "data.txt" is ordered ColX ColY Col3):
plot "data.txt" using 1:2 title 'Y Axis' with points ls 1, \
"data.txt" using 1:3 title 'Y Axis' with points ls 2
Try it here using this data (in the section titled "Data"; also note that column 3, "Symbol", is not used, it's mainly there for illustrative purposes):
# This file is called force.dat
# Force-Deflection data for a beam and a bar
# Deflection Col-Force Symbol
0.000 0 5
0.001 104 5
0.002 202 7
0.003 298 7
And in the Plot Script Heading:
set key inside bottom right
set xlabel 'Deflection (m)'
set ylabel 'Force (kN)'
set title 'Some Data'
set style line 1 lc rgb 'red' pt 7
set style line 2 lc rgb 'blue' pt 5
plot "data.txt" using 1:2 title 'Col-Force' with points ls 1, \
"data.txt" using 1:3 title 'Beam-Force' with points ls 2
The one caveat is of course that you have to reconfigure your data input source.
REFERENCES:
http://www.gnuplotting.org/plotting-single-points/
http://www.gnuplotting.org/plotting-data/
Here is a possible solution (which is a simple extrapolation from gnuplot conditional plotting with if), that works as long as you don't have tens of different symbols to handle.
Suppose I want to plot 2D points in a coordinate system. I have only two symbols, which I arbitrarily represented with a 0 and a 1 in the last column of my data file:
0 -0.29450470209121704 1.2279523611068726 1
1 -0.4006965458393097 1.0025811195373535 0
2 -0.7109975814819336 0.9022682905197144 1
3 -0.8540692329406738 1.0190201997756958 1
4 -0.5559651851654053 0.7677079439163208 0
5 -1.1831613779067993 1.5692367553710938 0
6 -0.24254602193832397 0.8055955171585083 0
7 -0.3412654995918274 0.6301406025886536 0
8 -0.25005266070365906 0.7788659334182739 1
9 -0.16853423416614532 0.09659398347139359 1
10 0.169997438788414 0.3473801910877228 0
11 -0.5252010226249695 -0.1398928463459015 0
12 -0.17566296458244324 0.09505800902843475 1
To achieve what I want, I just plot my file using conditionals. Using an undefined value like 1/0 results in no plotting of the given point:
# Set styles
REG_PTS = 'pointtype 7 pointsize 1.5 linecolor rgb "purple"'
NET_PTS = 'pointtype 4 pointsize 1.5 linecolor rgb "blue"'
set grid
# Plot each category with its own style
# (@NAME expands the string variable as a macro; gnuplot 4.x needs "set macros" first)
plot "data_file" u 2:($4 == 0 ? $3 : 1/0) title "regular" @REG_PTS, \
"data_file" u 2:($4 == 1 ? $3 : 1/0) title "network" @NET_PTS
Here is the result :
Hope this helps
Variable pointtype (pt variable) was introduced (I guess) not until gnuplot 5.2.0 (Sept 2017) (check help points).
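With such a version, a minimal sketch would be the following, assuming x, y and the point type sit in columns 1, 2 and 3; the values in the datablock are only illustrative.

```gnuplot
### variable pointtype, gnuplot>=5.2
reset session
# x, y, pointtype (illustrative values)
$Data <<EOD
1  1.0   4
2  2.0   5
3  3.0   6
4  4.0   7
5  5.0   8
6  6.0   9
7  7.0  15
EOD
set key noautotitle
set offsets 1,1,1,1
# the 3rd using column supplies the point type for each point
plot $Data u 1:2:3 w p pt variable ps 4 lc rgb "red"
### end of code
```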
In retrospect, another (awkward) solution would be the following, for those who are still using earlier versions.
Data:
1 1.0 4 # empty square
2 2.0 5 # filled square
3 3.0 6 # empty circle
4 4.0 7 # filled circle
5 5.0 8 # empty triangle up
6 6.0 9 # filled triangle down
7 7.0 15 # filled pentagon (cross in gnuplot 4.6 to 5.0)
Script: (works from gnuplot>=4.6.0, March 2012; but not necessary since 5.2.0)
### variable pointtype for gnuplot>=4.6
reset
FILE = 'SO23707979.dat'
set key noautotitle
set offsets 1,1,1,1
set pointsize 4
stats FILE u 0 nooutput
N = STATS_records # get the number of rows
p0=x1=y1=NaN
plot for [n=0:N-1 ] FILE u (x0=x1, x1=$1, x0):(y0=y1, y1=$2, y0):(p0=$3) \
every ::n::n w p pt p0 lc rgb "red", \
FILE u 1:2 every ::N-1::N-1 w p pt p0 lc rgb "red"
### end of script
Result:
