Gnuplot histogram with errorbars (High and Low) - gnuplot

I am trying to create a histogram (bar chart) with high and low error values using gnuplot. I found the thread "Gnuplot barchart histogram with errorbars", but unfortunately it only covers an X value and an X error (2 values). What I would like to achieve is an X value (average) with an error bar made up of High and Low values (3 values in total: avg, High and Low). How can I do this using gnuplot?
My script is identical to the one mentioned in the thread; I only changed some labels etc. (simple cosmetic changes). My example dataset structure is as follows:
WikiVote 10 12 7

If you have a very simple datafile:
#y ymin ymax
4 3 8
You can plot this datafile using:
set yrange [0:]
set style histogram errorbars gap 2 lw 1
plot 'datafile' u 1:2:3 w hist

I have modified the code provided by mgilson to plot multiple histograms for a single X value. If anybody needs it, here is the code.
plot 'stack_2.dat' u 2:3:4:xtic(1) w hist ti "Hadoop" linecolor rgb "#FF0000", '' u 5:6:7:xtic(1) w hist ti "Giraph" lt 1 lc rgb "#00FF00"
The data file follows this pattern:
#y_0 #min #max #y_1 #min #max
Dataset 4 3 8 6 5 9
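If you start from raw measurements rather than ready-made columns, the rows expected by the plot commands above can be prepared outside gnuplot. The following is only a sketch in Python: the helper name summary_row and the sample values are assumptions made for illustration, and the column order (value, min, max) follows the "#y ymin ymax" layout shown above.

```python
from statistics import mean


def summary_row(label, samples):
    """Return one 'label y ymin ymax' line for a list of raw samples."""
    # Column order matches the 3-column errorbar form: value, low, high.
    return "%s %g %g %g" % (label, mean(samples), min(samples), max(samples))


if __name__ == "__main__":
    # Hypothetical raw values: average 10, lowest 7, highest 12.
    print(summary_row("WikiVote", [7, 11, 12]))
```

Note that this writes the low value before the high value, whereas the example row in the question lists the high value first.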

Related

Is there any way to visualize the field on adaptive mesh with gnuplot?

I am a beginner in gnuplot. Recently I tried to visualize a pressure field on an adaptive mesh.
First, I obtained the coordinates of the nodes and of the center of each cell, together with the pressure value at the cell center.
Then I ran into a difficulty: the coordinates in the x and y directions are not regular, which makes it hard to prepare the source data in a suitable format. For a regular grid of equal rectangles, I can simply use an x-y-z format. But is there a working approach for an adaptive mesh?
I understand that you have some x,y,z data which is in no regular grid (well, your adaptive mesh).
I'm not fully sure whether this is what you are looking for, but
gnuplot can grid the data for you, i.e. inter-/extrapolating your data within a regular grid and then plot it.
Check help dgrid3d.
Code:
### grid data
reset session
# create some test data
set print $Data
do for [i=1:200] {
x = rand(0)*100-50
y = rand(0)*100-50
z = sin(x/15)*sin(y/15)
print sprintf("%g %g %g",x,y,z)
}
set print
set view equal xyz
set view map
set multiplot layout 1,2
set title "Original data with no regular grid"
unset dgrid3d
splot $Data u 1:2:3 w p pt 7 lc palette notitle
set title "Gridded data"
set dgrid3d 100,100 qnorm 2
splot $Data u 1:2:3 w pm3d
unset multiplot
### end of code
Result:
If you have the size of each cell, you can use the "boxxyerror" plotting style. Let xdelta and ydelta be half the size of a cell along the x-axis and y-axis.
Script:
$datablock <<EOD
# x y xdelta ydelta pressure
1 1 1 1 0
3 1 1 1 1
1 3 1 1 1
3 3 1 1 3
2 6 2 2 4
6 2 2 2 4
6 6 2 2 5
4 12 4 4 6
12 4 4 4 6
12 12 4 4 7
EOD
set xrange [-2:18]
set yrange [-2:18]
set palette maxcolors 14
set style fill solid 1 border lc black
plot $datablock using 1:2:3:4:5 with boxxyerror fc palette title "mesh", \
$datablock using 1:2 with points pt 7 lc rgb "gray30" title "point"
pause -1
In this script, 5-column data (x, y, xdelta, ydelta, pressure) is given for "boxxyerror" plot. To colorize the cells, the option "fc palette" is required.
Result:
I hope this figure is what you are looking for.
Thanks.
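For the boxxyerror approach, the 5-column rows can be derived from the cell corners if those are what your mesh code provides. This is a minimal sketch in Python, assuming each cell is an axis-aligned rectangle given as (xmin, ymin, xmax, ymax, pressure); the function name cell_to_row and the input layout are hypothetical.

```python
def cell_to_row(xmin, ymin, xmax, ymax, pressure):
    """Convert one rectangular cell to (x, y, xdelta, ydelta, pressure)."""
    x = (xmin + xmax) / 2.0        # cell center
    y = (ymin + ymax) / 2.0
    xdelta = (xmax - xmin) / 2.0   # half the cell size along x
    ydelta = (ymax - ymin) / 2.0   # half the cell size along y
    return (x, y, xdelta, ydelta, pressure)


if __name__ == "__main__":
    # Hypothetical cells given as (xmin, ymin, xmax, ymax, pressure).
    cells = [(0, 0, 2, 2, 0), (0, 4, 4, 8, 4), (8, 8, 16, 16, 7)]
    for cell in cells:
        print("%g %g %g %g %g" % cell_to_row(*cell))
```

The three sample cells correspond to the rows "1 1 1 1 0", "2 6 2 2 4" and "12 12 4 4 7" of the datablock in the script.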

gnuplot histogram chart with overlap

I would like to plot a bar chart or histogram like this in gnuplot.
I tried set style histogram rowstacked which is a start but it adds the columns on top of each other while I need them overlapped. Next is the issue of transparent color shading.
Thanks for your feedback.
UPDATE: user8153 asked for additional data.
The setting set style histogram clustered gap 0.0 selects the clustered mode for the histogram bars. If you squint, it roughly shows what I want, but without the overlap and the transparent shading.
The only other histogram modes given in the docs are rowstacked and columnstacked. I never got a plot out of columnstacked so I discarded it. Now rowstacked stacks the histogram bars.
The overlay appearance is there but it is wrong. I don't want the stacked appearance. The histograms have to overlay.
Code :
set boxwidth 1.0 absolute
set style fill solid 0.5 noborder
set style data histogram
set style histogram clustered gap 0.0
#set style histogram rowstacked gap 0.0
set xtics in rotate by 90 offset first +0.5,0 right
set yrange [0:8000]
set xrange [90:180]
plot 'dat1.raw' using 3 lc rgb 'orange', \
'dat2.raw' using 3 lc rgb 'blue', \
'dat3.raw' using 3 lc rgb 'magenta'
Thanks for your feedback.
Given a sample datafile test.dat
-10 4.5399929762484854e-05
-9 0.0003035391380788668
-8 0.001661557273173934
-7 0.007446583070924338
-6 0.02732372244729256
-5 0.0820849986238988
-4 0.20189651799465538
-3 0.4065696597405991
-2 0.6703200460356393
-1 0.9048374180359595
0 1.0
1 0.9048374180359595
2 0.6703200460356393
3 0.4065696597405991
4 0.20189651799465538
5 0.0820849986238988
6 0.02732372244729256
7 0.007446583070924338
8 0.001661557273173934
9 0.0003035391380788668
10 4.5399929762484854e-05
you can use the following commands
set style fill transparent solid 0.7
plot "test.dat" with boxes, \
"test.dat" u ($1+4):2 with boxes
to get the following result (using the pngcairo terminal):
Using transparency as in user8153's solution is certainly the easiest way to visualize an overlap of two histograms.
This works even if the two histograms do not have identical bins or x-data ranges.
However, the color of the overlap is largely determined by the colors of the two histograms and the level of transparency. Furthermore, if you want to show the overlap in the key, you have to do it "manually".
Here is a solution where you can choose an independent color for the overlap area.
The overlap is basically the minimum y-value from both histograms for each x-value.
For this you need to compare the y-values for each x-value. This can be done in gnuplot with a "trick": merging the two datasets line by line. This requires the data to be in a datablock (loading a file into a datablock is a separate step). Since the merging procedure indexes datablock lines, it requires gnuplot>=5.2.0.
This assumes that you have the same x-range and bins for each histogram. If this is not the case, you have to implement some further steps.
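The rule "overlap = minimum y-value of both histograms for each x-value" can be sketched outside gnuplot as follows. This is an illustrative Python snippet, not part of the gnuplot solution; the function name overlap and the sample values are assumptions, and it requires identical bins in both inputs, as stated above.

```python
def overlap(hist1, hist2):
    """Per-bin overlap of two histograms given as lists of (x, y) pairs."""
    result = []
    for (x1, y1), (x2, y2) in zip(hist1, hist2):
        if x1 != x2:
            raise ValueError("histograms must share the same bins")
        # The overlapping area of a bin is the smaller of the two heights.
        result.append((x1, min(y1, y2)))
    return result


if __name__ == "__main__":
    # Hypothetical sample histograms with identical x bins.
    data1 = [(0, 0.2), (1, 0.9), (2, 0.4)]
    data2 = [(0, 0.5), (1, 0.3), (2, 0.4)]
    for x, y in overlap(data1, data2):
        print(x, y)
```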
Script: (works with gnuplot>=5.2.0, Sept. 2017)
### plot overlap of two histograms
reset session
# create some random test data
set samples 21
f(x,a,b) = 1./(a*(x-b)**4+1)
set table $Data1
plot '+' u 1:(f(x,0.01,-2)) w table
set table $Data2
plot '+' u 1:(f(x,0.02,4)) w table
unset table
set boxwidth 1.0
set grid y
set ytics 0.2
set multiplot layout 2,1
set style fill transparent solid 0.3
plot $Data1 u 1:2 w boxes lc 1 ti "Data1", \
$Data2 u 1:2 w boxes lc 2 ti "Data2"
set print $Overlap
do for [i=1:|$Data1|] { print $Data1[i].$Data2[i] }
set print
set style fill solid 0.3
plot $Data1 u 1:2 w boxes lc 1 ti "Data1", \
$Data2 u 1:2 w boxes lc 2 ti "Data2", \
$Overlap u 1:($2>$4?$4:$2) w boxes lc "red" ti "Overlap"
unset multiplot
### end of script
Result:

Three 2D maps in a single 3D graph in Gnuplot

Is there any way to construct three 2D maps (three heat maps) in a single 3D graph in gnuplot? I have three datasets (in matrix form) to plot as 2D maps in a single 3D graph: the first in the XY plane, the second in XZ, and the last one in YZ.
Thus I tried the (naive) code:
set multiplot
splot 'data_1' matrix u 1:2:3 w image
splot 'data_2' matrix u 2:3:1 w image
splot 'data_3' matrix u 3:2:1 w image
unset multiplot
but except for the 'data_1' map, all the others are out of scale.
Is there any way to do this?
You have to give the splot command 4 pieces of information: the x, y, and the z coordinate, and the value for the color. For example, the script
set xyplane at -0.5
set xrange [-0.5:3.5]
set yrange [-0.5:3.5]
set zrange [-0.5:3.5]
set xtics 1
set ytics 1
set ztics 1
set view 55,110
unset key
splot "data.dat" matrix u 1:2:(-0.5):3 w image, \
"" matrix u 1:(-0.5):2:3 w image, \
"" matrix u (-0.5):1:2:3 w image
where data.dat is a data file in matrix format such as
1 2 3 2
4 5 6 5
7 8 9 8
4 5 6 5
gives the following output:

How to remove line between "jumping" values, in gnuplot?

I would like to draw a line through data that contain "jumping" values.
Here is an example: when the data hold sin(x) over several cycles and we plot them, an unrealistic line appears that runs back across the plot from right to left (as shown in the following figure).
One idea to avoid this might be to use with linespoints (link), but I want to draw it without revising the original data file.
Is there a simple and robust solution for this problem?
Assuming that you are plotting a function, that is, for each x value there exists one and only one corresponding y value, the easiest way to achieve what you want is to use the smooth unique option. This smoothing routine will make the data monotonic in x, then plot it. When several y values exist for the same x value, the average will be used.
Example:
Data file:
0.5 0.5
1.0 1.5
1.5 0.5
0.5 0.5
Plotting without smoothing:
set xrange [0:2]
set yrange [0:2]
plot "data" w l
With smoothing:
plot "data" smooth unique
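To make the effect of the smoothing step concrete, here is a small Python sketch of the behaviour described above (sort by x, average the y values that share the same x). It is an approximation for illustration only, not gnuplot's implementation, and the function name smooth_unique is chosen here for readability.

```python
from collections import defaultdict


def smooth_unique(points):
    """Sort points by x and replace duplicate x values by their mean y."""
    groups = defaultdict(list)
    for x, y in points:
        groups[x].append(y)
    return [(x, sum(ys) / len(ys)) for x, ys in sorted(groups.items())]


if __name__ == "__main__":
    # Same values as the data file in the example above.
    data = [(0.5, 0.5), (1.0, 1.5), (1.5, 0.5), (0.5, 0.5)]
    for x, y in smooth_unique(data):
        print(x, y)
```

For the example data, the repeated point at x = 0.5 collapses into a single point, so no line is drawn back to the start.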
Edit: points are lost if this solution is used, so here is an improvement to my answer.
"Conditional plotting" can be applied here. Suppose we have a file like this:
1 2
2 5
3 3
1 2
2 5
3 3
i.e. there is a backward line between the 3rd and 4th points.
plot "tmp.dat" u 1:2
Find minimum x value:
stats "tmp.dat" u 1:2
prev=STATS_min_x
Or find first x value:
prev=system("awk 'FNR == 1 {print $1}' tmp.dat")
Plot the line if current x value is greater than previous, or don't plot if it's less:
plot "tmp.dat" u ($0==0? prev:($1>prev? $1:1/0), prev=$1):2 w l
OK, it's not impossible, but the following is a ghastly hack. I really advise you add an empty line in your dataset at the breaks.
$dat << EOD
1 1
2 2
3 3
1 5
2 6
3 7
1 8
2 9
3 10
EOD
plot for [i=0:3] $dat us \
($0==0?j=0:j=j,llx=lx,lx=$1,llx>lx?j=j+1:j=j,i==j?$1:NaN):2 w lp notit
This plots your dataset three times (actually four; there is a small error in there, and I guess I would have to initialise all variables), counts how often the abscissa values "jump", and only plots data points if this counter j is equal to the plot counter i.
Check the help on the serial evaluation operator "a, b" and the ternary operator "a?b:c"
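If adding the empty lines by hand is not practical, they can be generated by a small filter that leaves the original file untouched and writes a modified copy. The following Python sketch is an assumption-laden illustration: the function name split_at_jumps is hypothetical, and a "jump" is taken to mean that the first column decreases from one data line to the next.

```python
def split_at_jumps(lines):
    """Insert an empty line wherever the x value (first column) decreases."""
    out = []
    prev_x = None
    for line in lines:
        fields = line.split()
        if not fields:
            # Existing blank line: keep it and start a new segment.
            out.append(line)
            prev_x = None
            continue
        if fields[0].startswith("#"):
            # Comment line: pass through unchanged.
            out.append(line)
            continue
        x = float(fields[0])
        if prev_x is not None and x < prev_x:
            out.append("")  # a single empty line breaks the plotted line
        out.append(line)
        prev_x = x
    return out


if __name__ == "__main__":
    # First six rows of the datablock above.
    sample = ["1 1", "2 2", "3 3", "1 5", "2 6", "3 7"]
    print("\n".join(split_at_jumps(sample)))
```

The output can be saved to a separate file and plotted with lines, so the original data file stays as it is.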
If you have data in a repetitive x-range where the corresponding y-values do not change, then @Miguel's smooth unique solution is certainly the easiest.
In a more general case, what if the x-range is repetitive but y-values are changing, e.g. like a noisy sin(x)?
Then compare two consecutive x-values x0 and x1, if x0>x1 then you have a "jump" and make the linecolor fully transparent, i.e. invisible, e.g. 0xff123456 (scheme 0xaarrggbb, check help colorspec). The same "trick" can be used when you want to interrupt a dataline which has a certain forward "jump" (see https://stackoverflow.com/a/72535613/7295599).
Minimal solution:
plot x1=NaN $Data u 1:2:(x0=x1,x1=$1,x0>x1?0xff123456:0x0000ff) w l lc rgb var
Script:
### plot "folded" data without connecting lines
reset session
# create some test data
set table $Data
plot [0:2*pi] for [i=1:4] '+' u 1:(sin(x)+rand(0)*0.5) w table
unset table
set xrange[0:2*pi]
set key noautotitle
set multiplot layout 1,2
plot $Data u 1:2 w l lc "red" ti "data as is"
plot x1=NaN $Data u 1:2:(x0=x1,x1=$1,x0>x1?0xff123456:0x0000ff) \
w l lc rgb var ti "\n\n\"Jumps\" removed\nwithout changing\ninput data"
unset multiplot
### end of script
Result:

Remove duplicated outliers in gnuplot boxplot [duplicate]

I have a large set of data points. I try to plot them with a boxplot, but some of the outliers are the exact same value and they are represented on a line beside each other. I found How to set the horizontal distance between outliers in gnuplot boxplot, but it doesn't help too much, as it is apparently not possible.
Is it possible to group the outliers together, print one point and then print a number in brackets beside it to indicate how many points there are? I think this would make it more readable in a graph.
For information, I have three boxplots for one x value and that times six in one graph. I am using gnuplot 5 and already played around with the pointsize, which doesn't reduce the distance anymore.
I hope you can help!
Edit:
set terminal pdf
set output 'dat.pdf'
file0 = 'dat1.dat'
file1 = 'dat2.dat'
file2 = 'dat3.dat'
set pointsize 0.2
set notitle
set xlabel 'X'
set ylabel 'Y'
header = system('head -1 '.file0);
N = words(header)
set xtics ('' 1)
set for [i=1:N] xtics add (word(header, i) i)
set style data boxplot
plot file0 using (1-0.25):1:(0.2) with boxplot lw 2 lc rgb '#8B0000' fs pattern 16 title 'A', \
file1 using (1):1:(0.2) with boxplot lw 2 lc rgb '#00008B' fs pattern 4 title 'B', \
file2 using (1+0.25):1:(0.2) with boxplot lw 2 lc rgb '#006400' fs pattern 5 title 'C', \
for [i=2:N] file0 using (i-0.25):i:(0.2) with boxplot lw 2 lc rgb '#8B0000' fs pattern 16 notitle, \
for [i=2:N] file1 using (i):i:(0.2) with boxplot lw 2 lc rgb '#00008B' fs pattern 4 notitle, \
for [i=2:N] file2 using (i+0.25):i:(0.2) with boxplot lw 2 lc rgb '#006400' fs pattern 5 notitle
What is the best way to implement it with this code already in place?
There is no option to have this done automatically. The steps required to do this manually in gnuplot are:
(In the following I assume, that the data file data.dat has only a single column.)
Analyze your data with stats to determine the boundaries for the outliers:
stats 'data.dat' using 1
range = 1.5 # (this is the default value of the `set style boxplot range` value)
lower_limit = STATS_lo_quartile - range*(STATS_up_quartile - STATS_lo_quartile)
upper_limit = STATS_up_quartile + range*(STATS_up_quartile - STATS_lo_quartile)
Count only the outliers and write them to a temporary file
set table 'tmp.dat'
plot 'data.dat' using 1:($1 > upper_limit || $1 < lower_limit ? 1 : 0) smooth frequency
unset table
Plot the boxplot without the outliers, and the outliers with the labels plotting style:
set style boxplot nooutliers
plot 'data.dat' using (1):1 with boxplot,\
'tmp.dat' using (1):($2 > 0 ? $1 : 1/0):(sprintf('(%d)', int($2))) with labels offset 1,0 left point pt 7
And this needs to be done for every single boxplot.
Disclaimer: This procedure should work basically, but having no example data I couldn't test it.
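The same grouping can be checked outside gnuplot. The sketch below, in Python, applies the limits from the first step and counts how often each outlier value occurs. It is only an illustration: the function name count_outliers is hypothetical, and the quartiles come from Python's statistics.quantiles, whose interpolation method may differ from the one gnuplot uses, so values close to the limits could be classified differently.

```python
from collections import Counter
from statistics import quantiles


def count_outliers(values, range_factor=1.5):
    """Return {value: count} for points outside the boxplot whisker limits."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - range_factor * iqr
    upper_limit = q3 + range_factor * iqr
    outliers = (v for v in values if v < lower_limit or v > upper_limit)
    return dict(Counter(outliers))


if __name__ == "__main__":
    # Hypothetical single-column data with one value repeated as an outlier.
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50, 50, 50]
    for value, count in count_outliers(data).items():
        print("%g (%d)" % (value, count))
```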

Resources