gnuplot additional parameter to X axis

I wonder how I can attach an additional parameter to every X value, as in the picture, where every X value is shown together with an additional parameter.
I run gnuplot with the following command:
gnuplot -p -e "reset; set yrange [0:1]; set term png truecolor size 1024,1024; set grid ytics; set grid xtics; set key bottom right; set output 'Recall.png'; set key autotitle columnhead; plot for [i=2:3] 'Recall' using 1:i with linespoints linecolor i pt 0 ps 3"
Recall file has the following content
train approach1 approach2
2 0.6 0.07
7 0.64 0.076
9 0.65 0.078
I wonder if I can add additional parameter as follows
train approach1 approach2
2(10) 0.6 0.07
7(15) 0.64 0.076
9(20) 0.65 0.078
The actual plotting should follow the real X values (2, 7, 9); the additional parameter is only for visualization and should be printed together with X.

Many of gnuplot's terminals provide an enhanced option that mimics the functionality provided by the postscript terminal, as described here.
What you want can be done using an enhanced terminal in conjunction with the set xtics command (see help set xtics for the correct syntax):
gnuplot> set term qt enhanced
gnuplot> set xrange [2:10]
gnuplot> set xtics ('{/=8 3} {/=20 (a)}' 3, '6 (c)' 6)
gnuplot> plot sin(x)
Please refer to the link for a complete description of the available commands.
Update
To produce the x-axis labels automatically, one can use backtick substitution, either directly in a gnuplot command file or on the command line, as in the OP's approach.
The command line is longish...
gnuplot -p -e "reset; set yrange [0:1]; set term png truecolor size 1024,1024; set grid ; set key bottom right; set output 'Recall.png'; set key autotitle columnhead; `awk -f Recall.awk Recall` ; plot for [i=2:3] 'Recall' using 1:i with linespoints linecolor i pt 0 ps 3"
The key point is using an awk script that outputs the appropriate gnuplot command; here is the awk script:
% cat Recall.awk
BEGIN { printf "set xtics (" }
NR>1 {
    printf (NR==2?"":",")
    printf ("'{/=8 %d} {/=16 (%d)}' %d", $1, $4, $1)
}
END { print ")" }
Oooops!
I forgot to show the modified format of data file...
% cat Recall
train approach1 approach2
2 0.6 0.07 10
7 0.64 0.076 15
9 0.65 0.078 20
and here it is the product of the previous command line
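For readers who would rather not use awk, the same label generation can be sketched in Python. This is only an illustration of what Recall.awk prints, assuming the four-column Recall file shown above; the function name xtics_command is made up for this example.

```python
def xtics_command(lines):
    """Build a gnuplot 'set xtics' command from the four-column Recall rows."""
    labels = []
    for line in lines[1:]:            # skip the header row, like NR>1 in awk
        fields = line.split()
        if len(fields) < 4:
            continue                  # ignore blank or incomplete rows
        x, extra = fields[0], fields[3]
        # small font for the real x value, larger font for the extra parameter
        labels.append("'{/=8 %s} {/=16 (%s)}' %s" % (x, extra, x))
    return "set xtics (" + ",".join(labels) + ")"


rows = [
    "train approach1 approach2",
    "2 0.6 0.07 10",
    "7 0.64 0.076 15",
    "9 0.65 0.078 20",
]
print(xtics_command(rows))
```

The printed line can be pasted into a gnuplot script before the plot command, in the same place where the backtick substitution sits in the command line above.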

If you want to take an xtic label from your data file, you can use using ...:xtic(1) which would take the value of the first column as xtic label.
The disadvantage might be, that for every value in your data file you'll get an xtic, and no other ones. So, using the data file
train approach1 approach2
2(10) 0.6 0.07
7(15) 0.64 0.076
9(20) 0.65 0.078
you could plot with
reset
set term png truecolor size 1024,1024
set grid ytics
set grid xtics
set key bottom right
set output 'Recall.png'
set key autotitle columnhead
plot for [i=2:3] 'Recall' using 1:i:xtic(1) with linespoints linecolor i pt 7 ps 3
and get
Note that this still plots at the correct x-values, because gnuplot reads only the leading number of a field like 2(10) and drops the part in parentheses, which is not a valid number.
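To make the "leading number" behaviour concrete, here is a rough Python sketch. It is not gnuplot's actual parser, just an illustration of the idea under the assumption that only the numeric prefix of a field is used; leading_number is a hypothetical helper.

```python
import re

# optional sign, digits with optional fraction, optional exponent
_LEADING_NUMBER = re.compile(r"\s*[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?")


def leading_number(field):
    """Return the numeric prefix of a field as float, or None if there is none."""
    match = _LEADING_NUMBER.match(field)
    return float(match.group(0)) if match else None


for field in ["2(10)", "7(15)", "9(20)", "train"]:
    print(field, "->", leading_number(field))
```

The whole field, parentheses included, is still what xtic(1) prints as the label; only the position on the axis comes from the numeric prefix.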
If you want to use different font sizes for the label parts, you could add an additional column which contains the parameter.
Data file Recall2
train add approach1 approach2
2 (10) 0.6 0.07
7 (15) 0.64 0.076
9 (20) 0.65 0.078
Now, instead of using xtic(1), you can also construct the string to be used as xticlabel:
reset
set term pngcairo truecolor enhance size 1024,1024
set grid ytics
set grid xtics
set key bottom right
set output 'Recall2.png'
set key autotitle columnhead
myxtic(a, b) = sprintf("{%s}{/*1.5 %s}", a, b)
plot for [i=3:4] 'Recall2' using 1:i:xtic(myxtic(strcol(1), strcol(2))) with linespoints linecolor i pt 7 ps 3

Related

Plot CDF with GNUPLOT with data not being sorted

Hi, I am trying to write a gnuplot script that produces a CDF graph for data produced by another program.
The data looks like this:
col1 col2 col3 col4 col5
ABCD11 19.8 1.13 129 2
AABC32 14.3 2.32 109 2
AACd12 19.1 0.21 103 2
I want to plot the CDF for column 2. The point is that the data in col2 might not be sorted.
To run the script I use an online tool such as here
The script I tried is:
set output 'out.svg'
set terminal svg size 600,300 enhanced fname 'arial' fsize 10 mousing butt solid
set xlabel "X"
set ylabel "CDF"
set style line 2 lc rgb 'black' lt 1 lw 1
set xtics format "" nomirror rotate by -10 font ", 7"
set ytics nomirror
set grid ytics
set key box height .4 width -1 box right
set nokey
set title "CDF of X"
a=0
#gnuplot 4.4+ functions are now defined as:
#func(variable1,variable2...)=(statement1,statement2,...,return value)
cumulative_sum(x)=(a=a+x,a)
plot "data.txt" using 1:(cumulative_sum($2)) with linespoints lt -1

You can use the cumulative smoothing style to get a CDF from data, see help smooth cumulative:
plot "test.dat" u 2:(1) smooth cumulative w lp
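What a cumulative curve over unsorted data amounts to can be sketched in a few lines of Python: sort the values, then accumulate a weight per point. With a weight of 1 the curve counts points; the sketch below divides by the number of points so that the result ends at 1, which is my own normalisation choice and not something the command above does. The name empirical_cdf is made up.

```python
def empirical_cdf(values):
    """Return (value, cumulative fraction) pairs for unsorted input."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


col2 = [19.8, 14.3, 19.1]           # column 2 of the sample data, unsorted
for value, fraction in empirical_cdf(col2):
    print(value, fraction)
```

In gnuplot the same normalisation would need the number of points, for example from the stats command, used as the weight in place of (1).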

If you want to calculate the (running) cumulative sum of the values from second column using sorted values, then you could slightly extend your approach based on awk. To be more specific, the command would be
tail -n+2 'test.txt' | sort -k2,2n | awk '{s+=$2; print NR, s}'
Here, tail strips off the header (skips the first line), sort sorts numerically according to the second column, and finally awk calculates the cumulative sum as a function of the number of records/items.
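The same three steps (drop the header, sort by the second column, accumulate) look like this in Python. It is a sketch of the pipeline above for comparison, assuming whitespace-separated columns; sorted_running_sum is a hypothetical name.

```python
from itertools import accumulate


def sorted_running_sum(lines):
    """Mirror tail | sort | awk: (record number, running sum of sorted col2)."""
    values = sorted(
        float(line.split()[1])        # second column
        for line in lines[1:]         # skip the header line
        if line.strip()               # ignore blank lines
    )
    return list(enumerate(accumulate(values), start=1))


data = [
    "col1 col2 col3 col4 col5",
    "ABCD11 19.8 1.13 129 2",
    "AABC32 14.3 2.32 109 2",
    "AACd12 19.1 0.21 103 2",
]
for record, total in sorted_running_sum(data):
    print(record, total)
```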

gnuplot - intersection of two plots

I am using gnuplot to plot data from two separate csv files (found in this link: https://drive.google.com/open?id=0B2Iv8dfU4fTUZGV6X1Bvb3c4TWs) with a different number of rows which generates the following graph.
These data seem to have no common timestamp (the first column) in both csv files and yet gnuplot seems to fit the plotting as shown above.
Here is the gnuplot script that I use to generate my plot.
# ###### GNU Plot
set style data lines
set terminal postscript eps enhanced color "Times" 20
set output "output.eps"
set title "Actual vs. Estimated Comparison"
set style line 99 linetype 1 linecolor rgb "#999999" lw 2
#set border 1 back ls 11
set key right top
set key box linestyle 50
set key width -2
set xrange [0:10]
set key spacing 1.2
#set nokey
set grid xtics ytics mytics
#set size 2
#set size ratio 0.4
#show timestamp
set xlabel "Time [Seconds]"
set ylabel "Segments"
set style line 1 lc rgb "#ff0000" lt 1 pi 0 pt 4 lw 4 ps 0
plot "estimated.csv" using ($1):2 with lines title "Estimated", "actual.csv" using ($1):2 with lines title "Actual";
Is there any way to print out (write to a file) the values at the intersections of these plots, ignoring the peaks above the green plot? I have also tried an SQL join query, but it doesn't print anything, for the same reason I explained above.
PS: If the blue line doesn't touch the green line (i.e. if it is way below the green line), I want to take the values of the closest green line so that it will be a one-to-one correspondence (or very close) with the actual dataset.

Perhaps one could somehow force Gnuplot to reinterpolate both data sets on a fine grid, save this auxiliary data and then compare it row by row. However, I think that it's indeed much more practical to delegate this task to an external tool.
It's certainly not the most efficient way to do it, nevertheless a "lazy approach" could be to read the data points, interpret each dataset as a LineString (collection of line segments, essentially equivalent to assuming a linear interpolation between data points) and then calculate the intersection points. In Python, the script to do this might look like this:
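#!/usr/bin/env python
import sys
import numpy as np
from shapely.geometry import LineString
#-------------------------------------------------------------------------------
def load_data(fname):
    # each CSV row becomes one vertex of the LineString
    return LineString(np.genfromtxt(fname, delimiter = ','))
#-------------------------------------------------------------------------------
lines = list(map(load_data, sys.argv[1:]))
crossings = lines[0].intersection(lines[1])
# multi-part results expose their parts via .geoms (needed since Shapely 2.0);
# a single geometry is wrapped in a list so the loop handles both cases
for g in getattr(crossings, 'geoms', [crossings]):
    if g.geom_type != 'Point':
        continue
    print('%f,%f' % (g.x, g.y))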
Then in Gnuplot, one can invoke it directly:
set terminal pngcairo
set output 'fig.png'
set datafile separator comma
set yr [0:700]
set xr [0:10]
set xtics 0,2,10
set ytics 0,100,700
set grid
set xlabel "Time [seconds]"
set ylabel "Segments"
plot \
'estimated.csv' w l lc rgb 'dark-blue' t 'Estimated', \
'actual.csv' w l lc rgb 'green' t 'Actual', \
'<python filter.py estimated.csv actual.csv' w p lc rgb 'red' ps 0.5 pt 7 t ''
which gives:
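If installing Shapely is not an option, the "linear interpolation between data points" idea can also be sketched with plain Python: test every segment of one curve against every segment of the other. This is a brute-force illustration with made-up sample points, not a replacement for the script above; both function names are hypothetical, and overlapping collinear segments are simply skipped.

```python
def segment_intersection(p1, p2, q1, q2):
    """Return the crossing point of segments p1-p2 and q1-q2, or None."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    sx, sy = q2[0] - q1[0], q2[1] - q1[1]
    denom = rx * sy - ry * sx
    if denom == 0:
        return None                   # parallel or collinear segments
    dx, dy = q1[0] - p1[0], q1[1] - p1[1]
    t = (dx * sy - dy * sx) / denom   # position along p1-p2
    u = (dx * ry - dy * rx) / denom   # position along q1-q2
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (p1[0] + t * rx, p1[1] + t * ry)
    return None


def polyline_intersections(a, b):
    """Collect the crossing points of two piecewise-linear curves."""
    points = []
    for i in range(len(a) - 1):
        for j in range(len(b) - 1):
            hit = segment_intersection(a[i], a[i + 1], b[j], b[j + 1])
            if hit is not None and hit not in points:
                points.append(hit)
    return points


# made-up sample curves, not taken from the CSV files in the question
estimated = [(0, 0), (1, 2), (2, 0)]
actual = [(0, 1), (2, 1)]
for x, y in polyline_intersections(estimated, actual):
    print('%f,%f' % (x, y))
```

The pairwise loop costs time proportional to the product of the two segment counts, so for long time series a library such as Shapely is the more practical choice.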

gnuplot - autoscale y axis with filledcurves + xrange + xdata time

In gnuplot 5.0 patchlevel 1 on my old server I used:
set term pngcairo transparent truecolor size 190,40
set output "some.png"
unset bmargin
set bmargin 0
set lmargin 0
set rmargin 0
set tmargin 0
unset border
unset xtics
unset ytics
unset y2tics
unset key
unset title
unset colorbox
set timefmt '%Y-%m'
set xdata time
set style fill transparent solid 0.25 noborder
tt = "`date +%Y-%m-%d\ %H:%M`"
TIMEFMT = "%Y-%m-%d %H:%M"
now_secs = strptime(TIMEFMT,tt)
two_years_past = now_secs - 3600.0*24*365*2
eval(sprintf('set xrange ["%s":]',strftime(TIMEFMT,two_years_past)))
set autoscale yfix
plot "datafile" using 1:2 with filledcurves below x1 lw 1 lc rgb "#a7eeeeee" title ''
...it produced a graph with the y range correctly autoscaled.
But on my new server with gnuplot 5.0 patchlevel 3 installed it does not work anymore. It seems something was broken in the code: the yrange is computed from all x time data, not only over the selected xrange.
I have no idea how to correct the yrange in this case. It could be computed using the stats command, but "xdata time" must be switched off first, and in that case I do not know how to set the right xrange for the stats command.
Regards
Pavel
EDIT:
minimal datafile:
2014-01 2
2014-06 6
2015-01 4
2015-06 8
2016-01 6
2016-06 10

I can reproduce your y-range autoscale issue with gnuplot<=5.0.1 and all versions >5.0.1 to 5.4.0.
However, gnuplot is not scaling to the full y-data range as you assumed; rather, filledcurves x1 apparently always autoscales down to 0 unless there is y-data <0.
To me, this looks more like a bug than a feature. I don't see a reason why filledcurves should always autoscale to 0.
In contrast to this behaviour, the plotting style with boxes will still autoscale in y to the minimum, as you want.
So, as a workaround to keep your desired behaviour you need to add two lines:
stats $Data u 2 nooutput
set yrange[STATS_min:]
The x-range is already limited when executing the stats command, hence you will get the y-minimum in STATS_min which you can use to set the y-range.
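To show what the workaround computes, here is a small Python sketch using the minimal datafile from the question: keep only the points inside the two-year window and take the smallest y. The "now" date is a made-up value so that the example is reproducible, and y_min_in_window is a hypothetical helper.

```python
from datetime import datetime, timedelta

TWO_YEARS_IN_SEC = 3600 * 24 * 365 * 2    # 63072000 seconds


def y_min_in_window(rows, start, end, fmt="%Y-%m"):
    """Smallest y among rows whose timestamp lies in [start, end]."""
    ys = [y for stamp, y in rows
          if start <= datetime.strptime(stamp, fmt) <= end]
    return min(ys) if ys else None


rows = [("2014-01", 2), ("2014-06", 6), ("2015-01", 4),
        ("2015-06", 8), ("2016-01", 6), ("2016-06", 10)]

now = datetime(2016, 7, 1)                # assumed "current" time
start = now - timedelta(seconds=TWO_YEARS_IN_SEC)

print(y_min_in_window(rows, start, now))  # minimum inside the window
print(min(y for _, y in rows))            # minimum over the whole file
```

The two printed values differ for this data, which is why the lower end of the y range has to come from the points inside the x range only.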
By the way: I cleaned up your script a bit. Why make a platform-dependent system call to get the current time if you have the gnuplot function time()? Check help time.
Script: (works identical for all gnuplot versions >=5.0.0)
### adjust time range via current time with proper y autoscale
reset session
$Data <<EOD
2020-01 12
2020-06 6
2021-01 4
2021-06 8
2022-01 6
2022-06 10
EOD
t0 = time(0) # now, i.e. seconds from Jan, 1st 1970 00:00:00
TwoYearsInSec = 3600*24*365*2 # two years in seconds
myTimeFmt = "%Y-%m"
set format x "%Y\n%m" timedate
set style fill transparent solid 0.25 noborder
set xrange [t0-TwoYearsInSec:t0]
set multiplot layout 2,1
set title "undesired y-autoscaling to 0 with filledcurves"
plot $Data u (timecolumn(1,myTimeFmt)):2 w filledcurves above x1 lc rgb 0xff0000 not
set title "workaround to scale to y-minimum"
stats $Data u 2 nooutput
set yrange[STATS_min:]
replot
unset multiplot
### end of script
Result: (created with gnuplot 5.4.0)

Two data points on same x coordinate overlapping

I started keeping a record of days that I've gone running, and the distance. I like plotting this using boxes to get an overview of how active I have been lately.
I ran into a problem today when I added yesterday's data.
As you can see from 05/04/13 there are two runs, and the plot shows two boxes on the same day (far left box). I like this behavior. 06/26/13 I had two runs again but this time the plot was only showing one (far right box). After a little playing around I realized it's because on 05/04, the larger number (in column 2) comes first, so the smaller number gets plotted on top of it. The opposite is true for 06/26, and the result is only being able to see the larger number for that day.
Is there a way to fix this without altering my data file?
If it's possible to do in the plot script, I wouldn't have to watch how I enter data to my file.
Here is the data:
05/04/13 1.59
05/04/13 0.81
05/05/13 1.56
05/06/13 1.90
05/08/13 2.77
05/11/13 2.19
05/12/13 0.93
05/14/13 2.50
05/15/13 1.04
05/16/13 1.66
06/02/13 4.02
06/03/13 1.80
06/04/13 1.04
06/05/13 0.93
06/12/13 1.18
06/15/13 1.78
06/16/13 1.26
06/19/13 0.86
06/21/13 0.93
06/26/13 1.05
06/26/13 1.39
The script:
set terminal x11 nopersist size 1200,645
unset mouse
unset key
unset label
unset grid
set boxwidth 86400 absolute
set style fill solid 1.00 border lt -1
set bmargin at screen 0.08
set xdata time
set timefmt x "%m/%d/%y"
set format x "%b %d"
set xtics 86400 nomirror rotate by -90
set mxtics 0
set xrange [ "05/01/13" : "06/30/13" ] noreverse nowriteback
set ylabel "Distance"
set ylabel textcolor lt -1 rotate by -270
set yrange [ 0.00000 : 4.50000 ] noreverse nowriteback
plot "/Users/user/Dropbox/nvalt/walks.txt" using 1:2 with boxes lt rgb "#777777"
An image of the plot:

For this type of file, it doesn't really matter in what order the days appear, but as you mention, the order of the entries for the same day is important. I was able to obtain the required output by simply replacing
plot "/Users/user/Dropbox/nvalt/walks.txt" using 1:2 with boxes lt rgb "#777777"
with
plot "<sort -r /Users/user/Dropbox/nvalt/walks.txt" using 1:2 with boxes lt rgb "#777777"
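Because sort -r reverses the line order, the larger value for a given date comes first and the smaller box is then drawn on top of it. This should also work for more than two data points for the same date.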
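The ordering that makes the overlapping boxes visible can be spelled out in Python: for each date, the larger distance has to come first. The sketch below compares dates and distances as real dates and numbers rather than as text, which is my own variation (a plain text sort would put 9.80 before 10.50 when reversed); tallest_first is a made-up name.

```python
from datetime import datetime


def tallest_first(rows, fmt="%m/%d/%y"):
    """Order rows by date, and within one date by descending distance."""
    return sorted(rows, key=lambda r: (datetime.strptime(r[0], fmt), -r[1]))


runs = [
    ("06/21/13", 0.93),
    ("06/26/13", 1.05),
    ("06/26/13", 1.39),
]
for day, distance in tallest_first(runs):
    print(day, distance)
```

Written back to a temporary file or piped into gnuplot, rows in this order draw the larger box first and the smaller one on top.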

x range for non-numerical data in Gnuplot

When running the following script, I get an error message:
set terminal postscript enhanced color
set output '| ps2pdf - histogram_categorie.pdf'
set auto x
set key off
set yrange [0:20]
set style fill solid border -1
set boxwidth 5
unset border
unset ytic
set xtics nomirror
plot "categorie.dat" using 1:2 ti col with boxes
The error message that I get is
smeik:plots nvcleemp$ gnuplot categorie.gnuplot
plot "categorie.dat" using 1:2 ti col with boxes
^
"categorie.gnuplot", line 13: x range is invalid
The content of the file categorie.dat is
categorie aantal
poussin 13
pupil 9
miniem 15
cadet 15
junior 6
senior 5
veteraan 8
I understand that the problem is that I haven't defined an x range. How can I make it use the first column as values for the x range? Or do I need to take the row numbers as the x range and let it use the first column as labels? I'm using Gnuplot 4.4.
I'm ultimately trying to get a plot that looks the same as the plot I made before this one. That one worked fine, but had numerical data on the x axis.
set terminal postscript enhanced color
set output '| ps2pdf - histogram_geboorte.pdf'
set auto x
set key off
set yrange [0:40]
set xrange [1935:2005]
set style fill solid border -1
set boxwidth 5
unset border
unset ytic
set xtics nomirror
plot "geboorte.dat" using 1:2 ti col with boxes,\
"geboorte.dat" using 1:($2+2):2 with labels
and the content of the file geboorte.dat is
decennium aantal
1940 2
1950 1
1960 3
1970 2
1980 3
1990 29
2000 30

The boxes style expects the x-values to be numeric. That's an easy one: we can give it the pseudo-column 0, which is essentially the record (data line) number within the data set, counted from 0:
plot "categorie.dat" using (column(0)):2 ti col with boxes
Now you probably want the information in the first column on the plot somehow. I'll assume you want those strings to become the x-tics:
plot "categorie.dat" using (column(0)):2:xtic(1) ti col with boxes
*Careful here: this might not work with your current boxwidth settings. You might want to consider set boxwidth 1 or plot ... using (5*column(0)):2:xtic(1) ....
EDIT -- Taking your datafiles posted above, I've tested both of the above changes to get the boxwidth correct, and both seemed to work.
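As a rough picture of what the using specification hands to the boxes style, the following Python sketch pairs every data row with its record number and keeps the first column as the tic label. It assumes the header line is consumed as column titles and the first data row gets record number 0; boxes_with_labels and its scale argument (standing in for the factor 5) are hypothetical.

```python
def boxes_with_labels(lines, scale=1):
    """Return (x position, count, tic label) for each data row."""
    data_rows = [line for line in lines[1:] if line.strip()]   # skip header
    result = []
    for record, line in enumerate(data_rows):                  # like column(0)
        label, count = line.split()
        result.append((scale * record, int(count), label))
    return result


categorie = [
    "categorie aantal",
    "poussin 13",
    "pupil 9",
    "miniem 15",
]
print(boxes_with_labels(categorie))           # positions 0, 1, 2
print(boxes_with_labels(categorie, scale=5))  # positions 0, 5, 10
```

With scale=5 the positions are five units apart, so boxes of width 5 just touch, while with the default spacing of 1 a box width of 1 does the same.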
