Hi, I am trying to write a gnuplot script that produces a CDF graph for the data produced by another program.
The data looks like this:
col1 col2 col3 col4 col5
ABCD11 19.8 1.13 129 2
AABC32 14.3 2.32 109 2
AACd12 19.1 0.21 103 2
I want to plot the CDF of column 2. The catch is that the data in col2 might not be sorted.
To run the script, I use an online gnuplot tool.
The script I tried is:
set output 'out.svg'
set terminal svg size 600,300 enhanced fname 'arial' fsize 10 mousing butt solid
set xlabel "X"
set ylabel "CDF"
set style line 2 lc rgb 'black' lt 1 lw 1
set xtics format "" nomirror rotate by -10 font ", 7"
set ytics nomirror
set grid ytics
set key box height .4 width -1 box right
set nokey
set title "CDF of X"
a=0
#gnuplot 4.4+ functions are now defined as:
#func(variable1,variable2...)=(statement1,statement2,...,return value)
cumulative_sum(x)=(a=a+x,a)
plot "data.txt" using 1:(cumulative_sum($2)) with linespoints lt -1
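You can use the cumulative smoothing style to get a CDF from the data (see help smooth cumulative). The smoothing sorts the points by x first, so unsorted input is not a problem; with (1) as the y value, every point counts once, so the curve rises from 0 to the number of data points: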
plot "test.dat" u 2:(1) smooth cumulative w lp
If you want to calculate the running cumulative sum of the values in the second column, in sorted order, you could instead preprocess the file with sort and awk. To be more specific, the command would be
tail -n+2 'test.txt' | sort -k2,2n | awk '{s+=$2; print NR, s}'
Here, tail strips off the header (skips the first line), sort sorts numerically according to the second column, and finally awk calculates the cumulative sum as a function of the number of records/items.
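Note that the awk command above yields a running sum of the values against their rank, not a CDF in the usual sense. If what you want is the empirical CDF (the fraction of data points at or below each value), a minimal sketch follows. It assumes, as in the sample, exactly one header line and whitespace-separated columns; ecdf_col2 is only a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: empirical CDF of column 2. Assumes one header line and
# whitespace-separated columns, as in the sample data.
# The i-th smallest value is printed together with i/N, so the last line ends at 1.
ecdf_col2() {
    tail -n +2 "$1" |                  # skip the header line
        LC_ALL=C sort -k2,2n |         # sort numerically on column 2
        LC_ALL=C awk '{ v[NR] = $2 }   # remember values in sorted order
            END { for (i = 1; i <= NR; i++) printf "%s %.4f\n", v[i], i / NR }'
}
```

For example, ecdf_col2 data.txt > cdf.txt followed by plot "cdf.txt" using 1:2 with steps in gnuplot should draw the CDF as a step function that ends at 1. Tied values produce several lines with the same x, and the last of them carries the correct height for that x.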
So, I'm trying to plot two different box plots from two different files. Here is my code:
set boxwidth 0.5
set style fill solid 0.5
set xlabel ""
set ylabel "Boxplot Value"
set grid layerdefault
set xtics ("Data A" 1, "Data B" 2)
set xtics rotate by -50
plot "out4.txt" using (1):1 notitle with boxplot, "out20.txt" using (1):2 notitle with boxplot
And this error shows up: "boxplot.gnu", line 8: warning: Skipping data file with no valid points
My data is arranged like this:
2
3
4
5
6
7
6
23
423
42
342
34
234
It's just one column, and it's the same data in both files.
If your file "out20.txt" also consists of only one column, what should gnuplot plot if you write "out20.txt" using (1):2? There is no second column to plot. That's what gnuplot is telling you with "Skipping data file with no valid points".
If "out4.txt" corresponds to Data A at x=1 and "out20.txt" to Data B at x=2, then changing line 8 to the following should show your graph:
plot "out4.txt" using (1):1 notitle with boxplot, "out20.txt" using (2):1 notitle with boxplot
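A quick way to confirm this diagnosis before plotting is to list how many columns the data lines of each file actually have. The sketch below (column_counts is a hypothetical helper name) skips blank lines and lines starting with #; a file that reports only 1 cannot satisfy a using spec that refers to column 2.

```shell
#!/bin/sh
# Sketch: list the distinct numbers of whitespace-separated columns found on
# the data lines of a file (blank lines and lines starting with # are skipped).
column_counts() {
    awk 'NF > 0 && $1 !~ /^#/ { seen[NF] = 1 }
         END { for (n in seen) print n }' "$1" | sort -n
}
```

If column_counts out20.txt prints just 1, then using (1):2 has nothing to read from that file, while using (2):1 does.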
Let's say you have the following data file:
#id count min 1st quart median 3rd quart max sum std-dev name
1 172 0.00032 0.00033 0.00033 0.00033 0.00138 0.05811 0.00008 spec
2 172 0.00039 0.00040 0.00041 0.00042 0.00142 0.07236 0.00008 schema
3 172 0.00007 0.00008 0.00008 0.00009 0.00032 0.01539 0.00003 truss
And you want to draw three box plots with different colors depending on the name in column 10, and you'd rather not add an additional column with redundant information to your already wide table.
You currently produce the graph with the following script:
set terminal pdf enhanced size 8cm,8cm font "Verdana 10"
set output "charts/comparison-keyword-".ARG1.".pdf"
set boxwidth 0.2 absolute
set title "Validation comparison for key :".ARG1
set ylabel "milliseconds"
set xrange[0:4]
set yrange[0.00005:50]
set logscale y
set grid y
set tics scale 0
set xtics nomirror
set ytics nomirror
set border 2
set style fill solid 0.25 border -1
set style data boxplot
# Data columns: id count min 1st-quart median 3rd-quart max sum std-dev name
plot "data/comparison-keyword-".ARG1 using 1:4:3:7:6:(0.6):xticlabels(10) with candlesticks linecolor rgb 'orange' title 'Quartiles' whiskerbars, \
'' using 1:4:4:4:4:(0.6) with candlesticks lt -1 notitle
And you would like to change the line color through a dictionary lookup where:
spec => blue
schema => orange
truss => green
How would you go about it? Is it even possible to translate spec => blue in gnuplot?
Using sed, you can add an extra column with color values corresponding to the words in the last column. You have to plot it twice: the first time to set the labels on the x axis, and the second time to plot with colors.
# the last term draws the median (column 5) as a flat bar
plot "candle.dat" using 1:4:3:7:6:(0.6):xticlabels(10) with candlesticks notitle whiskerbars, \
"< sed 's/spec/spec 0x0000ff/;s/schema/schema 0xff9900/;s/truss/truss 0x00ff00/' candle.dat" using 1:4:3:7:6:(0.6):11 with candlesticks linecolor rgb variable title 'Quartiles' whiskerbars, \
"candle.dat" using 1:5:5:5:5:(0.6) with candlesticks lt -1 notitle
A late answer, but there is no need for sed and no need to modify the data by adding an extra column.
You can do it with gnuplot alone, which is also platform-independent.
It can be done with a string lookup, which is also used here.
For the colors, it is easier to provide them in the 0xrrggbb format instead of as color names; otherwise, see: gnuplot: apply colornames from datafile
Script:
### selecting colors by key from data column ("lookup table")
reset session
$Data <<EOD
#id count min 1st quart median 3rd quart max sum std-dev name
1 172 0.00009 0.00023 0.00033 0.00043 0.00138 0.05811 0.00008 spec
2 172 0.00011 0.00020 0.00037 0.00042 0.00142 0.07236 0.00008 schema
3 172 0.00002 0.00003 0.00008 0.00012 0.00032 0.01539 0.00003 truss
EOD
$Lookup <<EOD
spec 0x0000ff
schema 0xffa500
truss 0x00ff00
EOD
getIdx(s) = int(sum [_i=1:|$Lookup|] (word($Lookup[_i],1) eq s ? _i : 0))
myColor(col) = int(word($Lookup[getIdx(strcol(col))],2))
set title "Validation comparison for key :"
set xrange[0:4]
set xtics scale 0
set ylabel "milliseconds"
set ytics nomirror
set logscale y
set grid y
set border 2
set style fill solid 0.25 border -1
set style data boxplot
set key noautotitles
# Data columns: id count min 1st-quart median 3rd-quart max sum std-dev name
plot $Data u 1:4:3:7:6:(0.6):(myColor(10)):xtic(10) w candle lc rgb var whiskerbars, \
'' u 1:5:5:5:5:(0.6) w candle lc rgb "black" ti 'Quartiles' whiskerbars
### end of script
The above lookup table works only for gnuplot>=5.2.0 because it uses indexing of datablocks. The lookup version for earlier versions would look like this:
myNames = "spec schema truss"
myColors = "0x0000ff 0xffa500 0x00ff00"
getIdx(s) = int(sum [_i=1:words(myNames)] (word(myNames,_i) eq s ? _i : 0))
myColor(col) = int(word(myColors,getIdx(strcol(col))))
Result:
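If an external preprocessing step is acceptable after all, note that the sed substitutions in the earlier answer match a name anywhere on a line, so a name that happens to appear inside another word would also be rewritten. As an alternative sketch, awk can look up column 10 exactly and append the color as column 11. The helper name add_color_column and the black fallback (0x000000) for names missing from the table are assumptions, not part of either answer.

```shell
#!/bin/sh
# Sketch: append a 0xrrggbb color as an extra last column, chosen by an exact
# match on column 10 (the name). Comment lines are passed through unchanged.
# Names missing from the table fall back to black (an assumption).
add_color_column() {
    awk 'BEGIN {
             color["spec"]   = "0x0000ff"   # blue
             color["schema"] = "0xff9900"   # orange
             color["truss"]  = "0x00ff00"   # green
         }
         /^#/ { print; next }
         { print $0, (($10 in color) ? color[$10] : "0x000000") }' "$1"
}
```

Writing add_color_column candle.dat > candle_colored.dat and plotting candle_colored.dat with using 1:4:3:7:6:(0.6):11 ... linecolor rgb variable should then behave like the sed version.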
I wonder how I can attach an additional parameter to every x value, like in the picture, where every x tick label carries an additional parameter.
I run gnuplot with the following command:
gnuplot -p -e "reset; set yrange [0:1]; set term png truecolor size 1024,1024; set grid ytics; set grid xtics; set key bottom right; set output 'Recall.png'; set key autotitle columnhead; plot for [i=2:3] 'Recall' using 1:i with linespoints linecolor i pt 0 ps 3"
The Recall file has the following content:
train approach1 approach2
2 0.6 0.07
7 0.64 0.076
9 0.65 0.078
I wonder if I can add an additional parameter as follows:
train approach1 approach2
2(10) 0.6 0.07
7(15) 0.64 0.076
9(20) 0.65 0.078
The actual plotting should use the real x values (2, 7, 9); the additional parameter is only for display and should be printed together with the x value.
Many of gnuplot's terminals provide an enhanced option that mimics the functionality provided by the postscript terminal, which is described here.
What you want can be done using an enhanced terminal in conjunction with the set xtics command (see help set xtics for the correct syntax):
gnuplot> set term qt enhanced
gnuplot> set xrange [2:10]
gnuplot> set xtics ('{/=8 3} {/=20 (a)}' 3, '6 (c)' 6)
gnuplot> plot sin(x)
Please refer to the link for a complete description of the available commands.
Update
To produce the x axis labels automatically, one can use backtick substitution, either directly in a gnuplot command file or on the command line, as in the OP's approach.
The command line is longish...
gnuplot -p -e "reset; set yrange [0:1]; set term png truecolor size 1024,1024; set grid ; set key bottom right; set output 'Recall.png'; set key autotitle columnhead; `awk -f Recall.awk Recall` ; plot for [i=2:3] 'Recall' using 1:i with linespoints linecolor i pt 0 ps 3"
The key point is using an awk script that outputs the appropriate gnuplot command. Here is the awk script:
% cat Recall.awk
BEGIN { printf "set xtics (" }
NR>1 {
printf (NR==2?"":",")
printf ("'{/=8 %d} {/=16 (%d)}' %d", $1, $4, $1) }
END { print ")"}
Oops! I forgot to show the modified format of the data file, which carries the extra parameter in a fourth column:
% cat Recall
train approach1 approach2
2 0.6 0.07 10
7 0.64 0.076 15
9 0.65 0.078 20
and here is the output of the previous command line.
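To see exactly what the backtick substitution injects into the command line, the same logic can be run on its own. The sketch below reproduces Recall.awk inline (make_xtics is a hypothetical name; the single quote is passed in via -v q to avoid nested shell quoting), so it expects the four-column Recall file with one header line.

```shell
#!/bin/sh
# Sketch: same logic as Recall.awk. Skips the header line and emits one
# enhanced-text tic label per data line: the x value (size 8) plus the
# parameter from column 4 in parentheses (size 16), placed at x.
make_xtics() {
    awk -v q="'" '
        BEGIN  { printf "set xtics (" }
        NR > 1 { printf "%s%s{/=8 %d} {/=16 (%d)}%s %d", (NR == 2 ? "" : ","), q, $1, $4, q, $1 }
        END    { print ")" }' "$1"
}
```

For the Recall file shown, it prints set xtics ('{/=8 2} {/=16 (10)}' 2,'{/=8 7} {/=16 (15)}' 7,'{/=8 9} {/=16 (20)}' 9). Writing that line to a file (make_xtics Recall > xtics.gp) and running load 'xtics.gp' in a gnuplot script before the plot command is one way to avoid the long backtick command line.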
If you want to take an xtic label from your data file, you can use using ...:xtic(1), which takes the value of the first column as the xtic label.
The disadvantage is that you get an xtic for every value in your data file, and no others. So, using the data file
train approach1 approach2
2(10) 0.6 0.07
7(15) 0.64 0.076
9(20) 0.65 0.078
you could plot with
reset
set term png truecolor size 1024,1024
set grid ytics
set grid xtics
set key bottom right
set output 'Recall.png'
set key autotitle columnhead
plot for [i=2:3] 'Recall' using 1:i:xtic(1) with linespoints linecolor i pt 7 ps 3
and get
Note that this uses the correct x values only because gnuplot itself drops the part inside the parentheses, which is not a valid number.
If you want to use different font sizes for the label parts, you could add an additional column which contains the parameter.
Data file Recall2
train add approach1 approach2
2 (10) 0.6 0.07
7 (15) 0.64 0.076
9 (20) 0.65 0.078
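If the data currently uses the 2(10) form, it does not have to be edited by hand to get this layout. The awk sketch below (split_xparam is a hypothetical name) splits the first field at the opening parenthesis and adds an add column name to the header; lines without a parenthesis get - as a placeholder, which is an assumption of this sketch.

```shell
#!/bin/sh
# Sketch: turn "2(10) 0.6 0.07" into "2 (10) 0.6 0.07" and add an "add"
# column name to the header, giving the Recall2 layout.
split_xparam() {
    awk 'NR == 1 { $1 = $1 " add"; print; next }   # header line
         {
             p = index($1, "(")
             if (p > 0) { x = substr($1, 1, p - 1); extra = substr($1, p) }
             else       { x = $1; extra = "-" }   # placeholder, assumption
             $1 = x " " extra
             print
         }' "$1"
}
```

Running split_xparam Recall > Recall2 on the 2(10) version of the file should reproduce the Recall2 file above.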
Now, instead of using xtic(1), you can also construct the string to be used as xticlabel:
reset
set term pngcairo truecolor enhance size 1024,1024
set grid ytics
set grid xtics
set key bottom right
set output 'Recall2.png'
set key autotitle columnhead
myxtic(a, b) = sprintf("{%s}{/*1.5 %s}", a, b)
plot for [i=3:4] 'Recall2' using 1:i:xtic(myxtic(strcol(1), strcol(2))) with linespoints linecolor i pt 7 ps 3
When running the following script, I get an error message:
set terminal postscript enhanced color
set output '| ps2pdf - histogram_categorie.pdf'
set auto x
set key off
set yrange [0:20]
set style fill solid border -1
set boxwidth 5
unset border
unset ytic
set xtics nomirror
plot "categorie.dat" using 1:2 ti col with boxes
The error message that I get is
smeik:plots nvcleemp$ gnuplot categorie.gnuplot
plot "categorie.dat" using 1:2 ti col with boxes
^
"categorie.gnuplot", line 13: x range is invalid
The content of the file categorie.dat is
categorie aantal
poussin 13
pupil 9
miniem 15
cadet 15
junior 6
senior 5
veteraan 8
I understand that the problem is that I haven't defined an x range. How can I make gnuplot use the first column as values for the x range? Or do I need to take the row numbers as the x range and let it use the first column as labels? I'm using Gnuplot 4.4.
I'm ultimately trying to get a plot that looks the same as the plot I made before this one. That one worked fine, but had numerical data on the x axis.
set terminal postscript enhanced color
set output '| ps2pdf - histogram_geboorte.pdf'
set auto x
set key off
set yrange [0:40]
set xrange [1935:2005]
set style fill solid border -1
set boxwidth 5
unset border
unset ytic
set xtics nomirror
plot "geboorte.dat" using 1:2 ti col with boxes,\
"geboorte.dat" using 1:($2+2):2 with labels
and the content of the file geboorte.dat is
decennium aantal
1940 2
1950 1
1960 3
1970 2
1980 3
1990 29
2000 30
The boxes style expects numeric x values. That's an easy one: we can give it pseudo-column 0, which is essentially the line number within the data file (the index of each data point, counting from 0):
plot "categorie.dat" using (column(0)):2 ti col with boxes
Now you probably want the information in the first column on the plot somehow. I'll assume you want those strings to become the x-tics:
plot "categorie.dat" using (column(0)):2:xtic(1) ti col with boxes
*Careful here: this might not work with your current boxwidth setting, because set boxwidth 5 with boxes placed 1 unit apart makes them overlap. You might want to consider set boxwidth 1, or plot ... using (5*column(0)):2:xtic(1) ....
EDIT -- Taking your datafiles posted above, I've tested both of the above changes to get the boxwidth correct, and both seemed to work.
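To see which x position each category ends up at, the mapping done by column(0) can be mimicked with awk. This sketch (category_positions is a hypothetical name) assumes exactly one header line, consumed by ti col and not counted as a data point, and no blank or comment lines, which matches categorie.dat; the optional second argument is the multiplier used in the 5*column(0) variant.

```shell
#!/bin/sh
# Sketch: print the x position that "using (column(0)):2:xtic(1)" assigns to
# each category, times an optional multiplier (default 1). Assumes exactly one
# header line and no blank or comment lines.
category_positions() {
    awk -v s="${2:-1}" 'NR > 1 { print s * (NR - 2), $1 }' "$1"
}
```

For categorie.dat, it gives 0 poussin through 6 veteraan; with a multiplier of 5, the last category lands at 30, so a boxwidth of 5 leaves the boxes just touching.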
My data file looks like this:
A 20120301 4
A 20120302 3
B 20120301 5
B 20120302 6
C 20120303 5
except that there are many more keys than just A, B, C, and I want to create a stacked graph with gnuplot (similar to the "Stacked histograms" from the gnuplot demos):
20120301 = (A:4 + B:5)
20120302 = (A:3 + B:6)
20120303 = (C:5)
So far I could not convince plot to read the data in that format. Do I have to rearrange the data file for this? Or is there a way for gnuplot to read the data in that format?
I think I've managed to beat it into a form that will work (you'll need at least gnuplot 4.3):
set boxwidth 0.75 absolute
set style fill solid 1.00 border lt -1
set datafile missing '-'
set style histogram rowstacked
set style data histograms
set yrange [0:]
plot for [i=2:4] 'test.dat' u i,'' u (0.0):xtic(1) notitle
and here's the datafile test.dat
#date A B C
#missing data is marked by a minus sign
20120301 4 5 -
20120302 3 6 -
20120303 - - 5
Phew! I've never been much good with gnuplot when it comes to histograms. Hopefully this will work for you (sorry about the change to your data file).
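Rather than rewriting the data file by hand, the original key/date/value lines can be pivoted into this layout. The sketch below (pivot_for_rowstacked is a hypothetical name) assumes whitespace-separated lines with the key, date and value in columns 1 to 3, at most one value per key and date, and dates that sort correctly as plain strings (true for the YYYYMMDD form). Missing combinations are written as -, to match set datafile missing '-'.

```shell
#!/bin/sh
# Sketch: pivot "key date value" lines into one row per date and one column
# per key, writing '-' where a key has no value on that date.
pivot_for_rowstacked() {
    # Distinct keys (A, B, C, ...), sorted, as one space-separated string.
    keys=$(awk 'NF >= 3 { print $1 }' "$1" | LC_ALL=C sort -u | tr '\n' ' ')
    # Sort by date so that dates are recorded in ascending order.
    LC_ALL=C sort -k2,2 "$1" | awk -v keys="$keys" '
        NF >= 3 {
            val[$2, $1] = $3
            if (!($2 in seen)) { seen[$2] = 1; order[++nd] = $2 }
        }
        END {
            nk = split(keys, k, " ")
            line = "#date"
            for (j = 1; j <= nk; j++) line = line " " k[j]
            print line
            for (i = 1; i <= nd; i++) {
                line = order[i]
                for (j = 1; j <= nk; j++)
                    line = line " " (((order[i], k[j]) in val) ? val[order[i], k[j]] : "-")
                print line
            }
        }'
}
```

On the five sample lines from the question, this produces the test.dat table above (without the second comment line). With more keys, the upper bound of plot for [i=2:4] has to become the number of keys plus one.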