I have data that I would like to plot as a histogram with a cumulative curve on top, and I have run into the following problem.
My data consists of one column with the categories ("discharge") and one column with the number of values ("probability") that belong to each category. The last entry in the category column is ">100", which summarizes all power plants with a discharge larger than the last numeric value (100 m^3/s). I have not found a way to plot this last category and its values with the command plot 'datafile.dat' using 1:2 with boxes ... because (I assume) only numerical values are read for the x tic labels in this case, so the last category is missing. If I plot it with the command plot 'datafile.dat' using 2:xtics(1) with boxes ... the last category ">100" is plotted just fine.
BUT: with the latter command the x-axis labels appear in the normal font size, even though I have the line set format x '\footnotesize \%10.0f' in my code.
I have read that explicit labels in the plot command override a format that was set earlier (see "Changing ytic font size in gnuplot epslatex (multiplot)"), but I was not able to adapt this to my code.
Do you have an idea how to do this?
Excel screenshot to visualize what I want to achieve
'datafile.dat'
discharge probability cumulated
10 20 20%
20 10 10%
30 5 5%
40 6 6%
50 4 4%
60 12 12%
70 8 8%
80 15 15%
90 20 20%
100 6 6%
>100 4 4%
[terminal=epslatex,terminaloptions={size 15cm, 8cm font ",10"}]
set xrange [*:*]
set yrange [0:20]
set y2range [0:100]
set xlabel 'Discharge$' offset 0,-1
set ylabel 'No. of power plants' offset 10.5
set y2label 'Cumulated probability' offset -10
set format xy '$\%g$'
set format x '\footnotesize \%10.0f'
set format y '\footnotesize \%10.0f'
set format y2 '\footnotesize \%10.0f'
set xtics rotate by 45 center offset 0,-1
set style fill pattern border -1
set boxwidth 0.3 relative
set style line 1 lt 1 lc rgb 'black' lw 2 pt 6 ps 1 dt 2
plot 'datafile.dat' using 1:2 with boxes axes x1y1 fs pattern 6 lc black notitle, \
'datafile.dat' using 1:3 with linespoints axes x1y2 ls 1 notitle
I am confused by your datafile; the numbers in the third column do not seem to be cumulative, and do not add up to 100%. Here is a solution that uses only the first two columns of your file:
set term epslatex standalone header "\\usepackage[T1]{fontenc}"
set output 'test.tex'
stats "datafile.dat" using 2
total = STATS_sum
set xlabel "Discharge" offset 0, 1.5
set xtics rotate
set ylabel "No. of power plants"
set ytics nomirror
set yrange [0:*]
set y2label "Cumulative probability"
set y2tics
set y2range [0:]
set boxwidth 0.3 relative
set style line 1 lt 1 lc rgb 'black' lw 2 pt 6 ps 1 dt 2
plot \
'datafile.dat' using 2:xtic("\\footnotesize " . stringcolumn(1)) with boxes axes x1y1 fs pattern 6 lc black notitle, \
'datafile.dat' using ($2/total) smooth cumulative with linespoints axes x1y2 ls 1 notitle
set output
The trick is to prepend the LaTeX command \footnotesize to each label in the using specification. The script also first computes the total number of power plants with stats so that it can convert counts to probabilities, and it accumulates them with the smooth cumulative option.
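To make the cumulative step concrete, here is a minimal sketch that computes the same running fractions by hand and prints them instead of plotting. It assumes gnuplot 5.2 or newer; the datablock names $Data and $Cum and the variable running are made up for illustration, and the data are the first two columns of the question's file without the header line.

```gnuplot
# First two columns of the question's data, without the header line
$Data <<EOD
10 20
20 10
30 5
40 6
50 4
60 12
70 8
80 15
90 20
100 6
>100 4
EOD

stats $Data using 2 nooutput      # sets STATS_sum
total = STATS_sum                 # the counts add up to 110

# Running sum by hand; "smooth cumulative" should give the same sequence
running = 0
set table $Cum
plot $Data using 0:(running = running + $2, running/total) with table
unset table

print $Cum                        # row index and cumulative fraction
```

The last cumulative fraction is 110/110 = 1, which is why the y2 axis in the answer runs from 0 to 1 rather than 0 to 100. Multiplying by 100 inside the using expression would give percent instead.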
I have a dataset (show-errorbars.dat) containing:
Model# DE IE Error
Apple -4.6 -128.9538 4.0
Huawei -5.2 -176.6343 5.3
One-Pro -5.2 -118.1106 3.2
#!/usr/bin/gnuplot
#set terminal pdfcairo enhanced color font 'Helvetica,12' linewidth 0.8
set terminal png
set output 'BrandError.png'
set boxwidth 1.0 relative
set bmargin 5
set style fill solid border -1
set xtic rotate by -45 scale 0
#set auto x
set style line 81 lt 0 lc rgb "#808080" lw 0.5
set grid xtics
set grid ytics
set grid mxtics
set grid mytics
set grid back ls 81
set arrow from graph 0,first -4.6 to graph 1, first -4.6 nohead lw 2 lc rgb "#000000" front
set border 11
set border lw 2.0
set xtics font ",11"
set ytics font ",14"
set tics out
set ytics nomirror
set y2tics
set y2tics font ",14"
set mxtics 10
set mytics 2
set my2tics 2
set yrange [-10:0]
set y2range [-260:0]
set key left bottom
set y2label offset -2
set ylabel offset 2
set ylabel 'DE' tc rgb "red"
set y2label 'IE' tc rgb "green"
set style data histograms
set style histogram cluster gap 2
set linetype 2 lc rgb 'red'
set linetype 3 lc rgb 'yellow'
set linetype 4 lc rgb 'green'
plot 'show-errorbars.dat' using 2 ti 'DE' lc 2 axis x1y1, '' u 3:xticlabels(1) ti 'IE' lc 4 axis x1y2
set output
I would like to plot a histogram comparing DE vs IE and also show error bars (data in column 4) for the IE values.
Any help on how to go about this would be appreciated.
There is a variant histogram style for exactly that purpose:
set style histogram errorbars gap 2 {lw W}
Here is the help section from the docs:
The `errorbars` style is very similar to the `clustered` style, except that it
requires additional columns of input for each entry. The first column holds
the height (y value) of that box, exactly as for the `clustered` style.
 2 columns:  y yerr        bar extends from y-yerr to y+yerr
 3 columns:  y ymin ymax   bar extends from ymin to ymax
The appearance of the error bars is controlled by the current value of
`set errorbars` and by the optional <linewidth> specification.
Updated answer
Notes:
You can't mix axis choice within a single histogram. So I have removed the axes x1y1 and axes x1y2 from the plot command. Since you have explicitly given the range for both y1 and y2, the plot border and labels are not affected.
However since the green bars are now being plotted against y1, we have to scale them so that the y2 axis labels apply. So the column 3 and column 4 values will be divided by 26, which is (y2 range) / (y1 range)
In "histogram errorbars" mode each plot component looks for an extra column of data that gives the size of the error bar. Since your column 2 data has no corresponding column of errors, we substitute a constant not-a-number (no data) value: (NaN)
Your data contains a line of columnheaders, which could confuse the program if it thinks this is a line of data. There are a number of ways you can tell the program to skip this line; I have used set key autotitle columnhead for convenience and because it is supported by old versions of gnuplot. If you have a current version it would be better to use instead set datafile columnheaders.
I have kept all of your commands except that the plot command is replaced by the following 3 lines:
set style histogram errorbars gap 2 lw 1.5
set key autotitle columnhead
plot 'show-errorbars.dat' using 2:(NaN) ti 'DE' lc 2, '' u ($3/26.):($4/26.):xticlabels(1) ti 'IE' lc 4
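As a quick check of where the factor 26 comes from, the following sketch recomputes it from the two axis ranges. The variable names are made up for illustration; only the range limits come from the question's script.

```gnuplot
# Axis ranges copied from the question's script
y1min = -10.0;  y1max = 0.0
y2min = -260.0; y2max = 0.0

# Ratio of the two axis spans; floats avoid integer division
factor = (y2max - y2min) / (y1max - y1min)
print factor                      # 26.0

# An IE value of -130 (half-way down y2) lands at -5 (half-way down y1)
print -130.0 / factor             # -5.0
```

A plain division is enough here only because both ranges share the upper limit 0. For ranges that do not line up at zero, the values would need a shift as well, for example y1min + (v - y2min)/factor, while the error bar lengths would still just be divided by the factor.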
I designed a histogram in gnuplot; however, the y-scale needs to be log2 because of the huge difference in values. To improve the readability of the plot, I intend to display the actual value on top of each bar. The values represent bytes, so I would like these labels to be formatted to display kB, MB, ... in the same way as on the y-axis.
How can I achieve this?
These are the commands I'm currently using:
set terminal postscript eps enhanced dash color "" 13
reset
set datafile separator ","
set title "Bytes per Protocol"
set xlabel "Protocol"
set ylabel "Bytes" rotate by 90
set yrange [0:1342177280]
set logscale y 2
set format y '%.0s%cB'
set style data histogram
set boxwidth 0.5
set style fill solid
set xtics format ""
set grid ytics
set style data histogram
set style histogram clustered gap 2
set grid ytics
set tic scale 0
set size 1,0.9
set size ratio 0.5
set key autotitle columnhead
set output "ex_a_1_BIG.eps"
plot "ex_a_1_BIG.csv" using ($3):xtic(1) title "IN", \
'' using ($5):xtic(1) title "OUT", \
'' using 0:($3):($3) with labels center offset -2,1 notitle, \
'' using 0:($5):($5) with labels center offset 2,1 notitle
This is the content of the csv I want to plot (I only want the bytes in and out):
protocol,packets in,bytes in,packets out,bytes out
ICMP,1833,141562,979,60334
IGMP,0,0,283,14006
TCP,158214,129221151,130101,47734355
UDP,68476,9571677,72530,24310734
Check help format_specifiers and help gprintf, and see the example below.
Somewhat unfortunately, gnuplot apparently uses a single space rather than an empty string as the prefix for values from 1 to 999.
For example, the format '%.1s %cB' produces two spaces for 1-999 B and one space for the others, e.g. 1 kB. If you use '%.1s%cB' instead, you get one space for 1-999 B and no space for the others, e.g. 100kB. As far as I know, the correct form is one space between the number and the unit. I'm not sure whether there is an easy fix for this.
Code:
### prefixes
reset session
$Data <<EOD
1 1
2 12
3 123
4 1234
5 12345
6 123456
7 1234567
8 12345678
9 123456789
10 1234567890
11 12345678901
12 123456789012
13 1234567890123
EOD
set boxwidth 0.7
set style fill solid 1.0
set xtics 1
set yrange [0.5:8e13]
set multiplot layout 2,1
set logscale y # base of 10
set format y '%.0s %cB'
plot $Data u 1:2 w boxes lc rgb "green" notitle, \
'' u 1:2:(gprintf('%.1s %cB',$2)) w labels offset 0,1 not
set logscale y 2 # base of 2
set format y '%.0b %BB'
plot $Data u 1:2 w boxes lc rgb "red" notitle, \
'' u 1:2:(gprintf('%.1b %BB',$2)) w labels offset 0,1 not
unset multiplot
### end of code
Result:
Addition:
A workaround for the number/unit spacing issue, at least for the labels in the graph, would be:
myFmt(c) = column(c)>=1 && column(c)<1000 ? \
gprintf('%.1s%cB',column(c)) : gprintf('%.1s %cB',column(c))
and
plot $Data u 1:2 w boxes lc rgb "green" notitle, \
'' u 1:2:(myFmt(2)) w labels offset 0,1 not
But for the ytics labels I still don't have an idea.
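The spacing behaviour described in this answer can be inspected without plotting anything. The sketch below is only an illustration, assuming gnuplot 5.2 or newer for the array syntax; the array V and its values are arbitrary. It prints each formatted value inside square brackets so that doubled or missing spaces become visible.

```gnuplot
# Sample byte counts; the values are arbitrary
array V[4] = [1, 123, 1234, 1234567]

# Brackets make doubled or missing spaces visible.
# Left: format with a space before the prefix, right: without.
do for [i=1:|V|] {
    x = V[i] * 1.0
    print sprintf("[%s]   [%s]", gprintf('%.1s %cB', x), gprintf('%.1s%cB', x))
}
```

Comparing the two columns for values below and above 1000 shows which of the two formats a function like myFmt has to pick in each case.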
I would like to plot a bar chart or histogram like this in gnuplot.
I tried set style histogram rowstacked which is a start but it adds the columns on top of each other while I need them overlapped. Next is the issue of transparent color shading.
Thanks for your feedback.
UPDATE: user8153 asked for additional data.
The set style histogram clustered gap 0.0 setting draws the histogram bars in cluster mode. If you squint, it sort of shows what I want, except that I need the bars overlapped and shaded transparently.
The only other histogram modes given in the docs are rowstacked and columnstacked. I never got a plot out of columnstacked so I discarded it. Now rowstacked stacks the histogram bars.
The overlay appearance is there but it is wrong. I don't want the stacked appearance. The histograms have to overlay.
Code :
set boxwidth 1.0 absolute
set style fill solid 0.5 noborder
set style data histogram
set style histogram clustered gap 0.0
#set style histogram rowstacked gap 0.0
set xtics in rotate by 90 offset first +0.5,0 right
set yrange [0:8000]
set xrange [90:180]
plot 'dat1.raw' using 3 lc rgb 'orange', \
'dat2.raw' using 3 lc rgb 'blue', \
'dat3.raw' using 3 lc rgb 'magenta'
Thanks for your feedback.
Given a sample datafile test.dat
-10 4.5399929762484854e-05
-9 0.0003035391380788668
-8 0.001661557273173934
-7 0.007446583070924338
-6 0.02732372244729256
-5 0.0820849986238988
-4 0.20189651799465538
-3 0.4065696597405991
-2 0.6703200460356393
-1 0.9048374180359595
0 1.0
1 0.9048374180359595
2 0.6703200460356393
3 0.4065696597405991
4 0.20189651799465538
5 0.0820849986238988
6 0.02732372244729256
7 0.007446583070924338
8 0.001661557273173934
9 0.0003035391380788668
10 4.5399929762484854e-05
you can use the following commands
set style fill transparent solid 0.7
plot "test.dat" with boxes, \
"test.dat" u ($1+4):2 with boxes
to get the following result (using the pngcairo terminal):
Using transparency as in user8153's solution is certainly the easiest way to visualize an overlap of two histograms.
This works even if the two histograms do not have identical bins or x-data-ranges.
However, the color of the overlap is pretty much bound to the colors of the two histograms and the level of transparency. Furthermore, if you want to show the overlap in the key you have to do it "manually".
Here is a solution where you can choose an independent color for the overlap area.
The overlap is basically the minimum y-value from both histograms for each x-value.
For this you need to compare the y-values for each x-value. This can be done in gnuplot with a "trick": merging the two datasets line by line. This requires the data to be in a datablock (data in a file has to be loaded into a datablock first). Since the merging procedure uses indexing of datablock lines, it requires gnuplot>=5.2.0.
This assumes that you have the same x-range and bins for each histogram. If this is not the case, you have to implement some further steps.
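If the histograms live in files rather than datablocks, one common way to get them into datablocks is to replot them through set table. The sketch below is only an illustration under that assumption: the file names hist1.dat and hist2.dat and their contents are made up, and the files are written first so that the example runs on its own.

```gnuplot
# Write two tiny stand-in files so the sketch runs on its own
set print "hist1.dat"
print "1 0.2"
print "2 0.8"
print "3 0.5"
set print "hist2.dat"
print "1 0.6"
print "2 0.3"
print "3 0.5"
set print

# Copy each file into a datablock via "set table"
set table $Data1
plot "hist1.dat" u 1:2 w table
set table $Data2
plot "hist2.dat" u 1:2 w table
unset table

# Show what ended up in the datablocks
print $Data1
print $Data2
```

After this step $Data1 and $Data2 can be used in place of the generated test data in the script, provided both files have the same number of rows and the same bins.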
Script: (works with gnuplot>=5.2.0, Sept. 2017)
### plot overlap of two histograms
reset session
# create some random test data
set samples 21
f(x,a,b) = 1./(a*(x-b)**4+1)
set table $Data1
plot '+' u 1:(f(x,0.01,-2)) w table
set table $Data2
plot '+' u 1:(f(x,0.02,4)) w table
unset table
set boxwidth 1.0
set grid y
set ytics 0.2
set multiplot layout 2,1
set style fill transparent solid 0.3
plot $Data1 u 1:2 w boxes lc 1 ti "Data1", \
$Data2 u 1:2 w boxes lc 2 ti "Data2"
set print $Overlap
do for [i=1:|$Data1|] { print $Data1[i].$Data2[i] }
set print
set style fill solid 0.3
plot $Data1 u 1:2 w boxes lc 1 ti "Data1", \
$Data2 u 1:2 w boxes lc 2 ti "Data2", \
$Overlap u 1:($2>$4?$4:$2) w boxes lc "red" ti "Overlap"
unset multiplot
### end of script
Result:
I have files looking like this:
Number Data1 Data2
1 9.10 4.022
2 15.27 3.996
3 21.92 4.004
4 21.19 4.026
5 20.67 4.022
6 20.99 4.000
7 19.80 4.004
8 20.01 3.931
9 20.18 4.004
10 19.78 4.007
I want to plot Number on the x axis, Data1 on the left y axis and Data2 on the right y axis, but I cannot figure out how to do it.
Thanks
Just a brief annotated sample, using your data saved in a file so.dat:
# Set ticks for 2nd y axis
set y2tics
# We don't want to see the left ticks on the right axis
set ytics nomirror
# Set ranges so that the data points are not on the axis
set xrange [0:11]
set yrange [8:23]
# Data2 goes down to 3.931, so the lower y2 limit has to be below that
set y2range [3.90:4.05]
# use first line of the file for labels
set key autotitle columnhead
# display key in least busy area
set key bottom right
# Title and axis labels
set title "Nice Try"
set xlabel "Number"
set ylabel "Data1"
set y2label "Data2"
plot "so.dat" using 1:2 axes x1y1 with points pointsize 2,\
"" u 1:3 axes x1y2 w p ps 2 pointtype 6
One can do a lot more decoration etc. but I think this is the essence of what you want. The graph produced:
I'm taking my first look at gnuplot today. Starting from the histogram example in the tutorial, I wanted to build a small example of my own. I only changed the input numbers from around 50,000 to values in the hundreds, and now the plot is not drawn correctly.
Here's the dat file
Region Denmark Netherlands Norway Sweden
1891-1900 500 400 300 200
And this is the gnuplot script
set terminal pngcairo
set output 'histograms.2.png'
set boxwidth 0.9 absolute
set style fill solid 1.00 border lt -1
set key inside right top vertical Right noreverse noenhanced autotitles nobox
set style histogram clustered gap 5 title offset character 0, 0, 0
set datafile missing '-'
set style data histograms
set xtics border in scale 0,0 nomirror rotate by -45 offset character 0, 0, 0
set xtics norangelimit font ",8"
set xtics ()
set title "US immigration from Northern Europe\n(same plot with larger gap between clusters)"
set yrange [ 0.00000 : 3000. ] noreverse nowriteback
i = 22
plot 'immigration.dat' using 1:xtic(1) ti col, '' u 2 ti col, '' u 3 ti col, '' u 4 ti col
As seen in here:
The first column is visualized incorrectly. Any ideas?
I think you want:
plot 'immigration.dat' using 2:xtic(1) ti col, '' u 3 ti col, '' u 4 ti col, '' u 5 ti col
In your version, gnuplot is interpreting the data in the first column (1891-1900) as a number (1891). You can also see this by looking carefully at the key: the red bar corresponds to Region.