How to set the horizontal distance between outliers in gnuplot boxplot - gnuplot

If I plot some data in gnuplot as a boxplot (set style data boxplot) and several outliers have the same value, they are drawn as points side by side horizontally at that value.
How can I set that horizontal distance?
For example, I have the data file data.dat:
1
1
1
1
1
1
1
1
1
1
1
1
9
9
and plot it using
set style data boxplot
set yrange [0:10]
plot 'data.dat' using (1):1
How can I then set the distance between the two points at y=9?

No, you cannot set that distance directly. The position of duplicate outliers depends on the selected point size. There is, however, a difference in the point distance between the command
plot 'data.dat' using (1):1
and
plot 'data.dat' using (1):1 pointsize 1
But I suspect this shouldn't happen and might be categorized as a bug.
set style data boxplot
set yrange[0:10]
plot 'data.dat' using (1):1 title 'no explicit point size',\
'' using (2):1 pointsize 1 title 'point size 1'

Christoph, you were on the right track, just didn't go far enough. At least with Ver. 5.0, the following worked for me:
set style boxplot outliers pointtype 6
plot 'data.dat' using (1):2:(0):1 pointsize .1
I was plotting a few thousand points, and this resulted in a much more reasonable plot, given that there could be tens of duplicates.
Outliers that don't overlap

Related

Gnuplot XRD graph, connecting points

I have XRD data, and when I plot it I want to get this kind of graph. Excel has problems plotting such a large data set, so I want to plot it with gnuplot. Here is my code:
set title "GNUPLOT RESULT"
set xlabel "Wavelength 2Theta"
set ylabel "Intensity"
set xrange [20:90]
set key right center
set terminal pngcairo size 1600, 1000 enhanced font "Arial,16"
set output "Allt-XRD.png"
plot "AllW" using 1:2 w p pt 7 ps 2 lc rgb "orange" title "point", "AllW" using 1:2 smooth acspline lw 3 lc rgb 'blue' title 'spline'
However, the resulting curve does not connect all the points; it somehow seems to prefer some points over others when connecting them (is that a weighting?).
Question
How can I connect all the points with gnuplot, as in the Excel graph?
Thanks in advance
P.S.: I tried the various smooth options (acsplines, csplines, bezier, etc.); none of them worked.
Edit 1: Here is the line plot, for anyone wondering why I did not try it.
Edit 2: The answer from user8153 worked: the x values must be decimal numbers, not integers. With that fixed, both the spline and the points plot render the data correctly, as seen below.
This is what the XRD data looks like; the file is too long, so I pasted only the first few lines:
Wavelength = 1.54059 Å (Cu)
Angle Intensity
20.00243 1467
20.02869 1533
20.05495 1482
20.08121 1468
20.10747 1376
20.13374 1421
20.16000 1433
20.18626 1380
20.21252 1431
20.23878 1405
20.26504 1357
20.29130 1374
20.31756 1413
Your with points plot shows that your data contains only integer values of the wavelength, but each value has multiple intensities associated with it. Is that really what the data should look like, or was there some mistake that chopped off the values of the wavelengths after the decimal point? Maybe your data file uses a symbol for the decimal point that gnuplot doesn't recognize? If so, use set decimalsign so gnuplot realizes that you are feeding it floating point numbers.
As it is, gnuplot does precisely what you tell it to do: it plots all these points at the same x coordinate, and connects them with lines if you use with lines, which are then by construction vertical.
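If the decimal separator is indeed the cause, a minimal sketch of the fix could look like the following. It assumes that the file uses a comma as decimal sign and that a matching locale is installed; the locale name de_DE.UTF-8 is only an illustrative choice. Note that, according to the gnuplot documentation, only the locale form of the command changes how input data is parsed; set decimalsign ',' on its own affects output formatting only.

```gnuplot
# Assumption: "AllW" contains values such as 20,00243 (comma as decimal sign)
# and the locale de_DE.UTF-8 is available on this machine.
set decimalsign locale "de_DE.UTF-8"
plot "AllW" using 1:2 with lines lc rgb "orange" title "lines"
```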
You told it to plot "with points pointtype 7 pointsize 2" (shorthand "w p pt 7 ps 2"). So it did.
If you want it to plot with lines then say "with lines".
plot "AllW" using 1:2 with lines lc rgb "orange" title "lines"

horizontal offset for categorical plot

For the following categorical plot I would like to put some space left of the first category and right of the last:
#abc.dat
a 1
b 2
c 3
In gnuplot:
set yrange [0:4]
plot 'abc.dat' using 2:xticlabels(1) pointtype 7 pointsize 5
Result:
Desired (approximately):
How can this be done? I specifically want points (and not bars).
The command set offsets adds a space between data and axes:
set offsets graph 0.05, graph 0.05
The graph prefix means that the offset is given as a fraction of the graph size rather than in axis units.
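Put together with the data from the question, this could look like the sketch below. The datablock name $abc and the explicit zero offsets for top and bottom are additions for illustration; the offsets only take effect on autoscaled axes, which is the case for the x axis here.

```gnuplot
# Inline copy of abc.dat (datablocks need gnuplot 5.0 or later)
$abc <<EOD
a 1
b 2
c 3
EOD
set yrange [0:4]
# Order is left, right, top, bottom; x is autoscaled, so left/right apply
set offsets graph 0.05, graph 0.05, 0, 0
plot $abc using 2:xticlabels(1) with points pointtype 7 pointsize 5 notitle
```

Larger fractions such as graph 0.2 leave more room if the big point symbols still touch the border.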

Gnuplot: draw error bars of data points outside plotting range

If I set a specific yrange and plot in a pdf terminal with this plot command:
plot "data.dat" u 1:4:5:6 w yerrorbars pt 6 ps 0.5 t "R_t"
errorbars that belong to data points outside the yrange, but end inside the yrange are not shown.
How do I force gnuplot to draw those? I already tried "set clip one/two".
The only workaround I found is to plot the data 3 times, once for the central point and once for each side of the error bar.
Use "-" as symbol for the errorbars and use their own "errorbars" to draw a line to the central point.
You could use multiplot to achieve this.
Set your plot to have zero margins, so the axes are on the border of the canvas, and switch off all tics and borders for the first plot.
Switch on the axes, tics etc. again, and do an empty plot that you set at the correct position using set size and set origin. You'll have to do some math to calculate the exact position.
@MaVo159, you can reduce it to plotting only twice by using with yerrorbars and with vectors (check help vectors). You need to set the proper arrow style; check help arrowstyle.
However, this works only for gnuplot>=5.2.3. In earlier versions there seems to be a bug that draws the arrowhead at the wrong end for some of the vectors extending beyond the graph.
You nevertheless have to plot once with yerrorbars in order to get the proper legend.
Script: (works for gnuplot>=5.2.3, May 2018)
### plot errorbars from points outside the range
reset
$Data <<EOD
1 9 5.11 8.32
2 8 6.20 9.22
3 6 5.31 6.31
4 5 4.41 5.51
5 4 3.31 4.71
6 2.9 2.81 3.71
7 2 1.11 3.41
EOD
set yrange[3:7]
set offsets 1,1,0,0
set style arrow 1 heads size 0.05,90 lw 2 lc 1
set multiplot layout 2,1
plot $Data u 1:2:3:4 w yerrorbars pt 6 ps 2 lw 2
plot $Data u 1:2:3:4 w yerrorbars pt 6 ps 2 lw 2, \
'' u 1:3:(0):($4-$3) w vec as 1 notitle
unset multiplot
### end of script
Result:
You could modify your data file: if the central value of a data point lies outside the plot range, set it equal to the end point of the error bar that is still visible in your plot.
Example:
plot range: set yrange[-2:2]
data point: 1, -3, -4, -1 (x, y, ylow, yhigh)
set data point to: 1, -1, -4, -1
Attention: Since you have to edit your data file, you should:
Make a copy of the original data file.
Be very careful when editing the file.
Keep in mind that when you change the plot range such that the central value of the data point becomes visible, you have to use the original data point again. Otherwise you will see the correct error bar, but no central value will be plotted (this is equivalent to setting the point type to 0).
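The same substitution can also be done on the fly in the using expression, so the data file stays untouched. This is only a sketch: the helper names inside and center, the variables ymin and ymax, and the range [-2:2] are assumptions for illustration, and the column layout 1:4:5:6 is taken from the plot command in the question.

```gnuplot
ymin = -2; ymax = 2            # assumed plot range, as in the example above
set yrange [ymin:ymax]
# True if a value lies inside the visible y range
inside(v) = (v >= ymin && v <= ymax)
# Keep y when it is visible; otherwise fall back to a visible bar end.
# If neither end is visible, y is returned unchanged and the bar stays hidden.
center(y, lo, hi) = inside(y) ? y : (inside(lo) ? lo : (inside(hi) ? hi : y))
plot "data.dat" using 1:(center($4,$5,$6)):5:6 with yerrorbars pt 6 ps 0.5 title "R_t"
```

Because the original value is used whenever it is visible, changing the plot range later does not require any edits. For substituted points the point symbol sits on the bar end, so it should not be read as the central value.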

Remove duplicated outliers in gnuplot boxplot [duplicate]

I have a large set of data points. I try to plot them with a boxplot, but some of the outliers are the exact same value and they are represented on a line beside each other. I found How to set the horizontal distance between outliers in gnuplot boxplot, but it doesn't help too much, as it is apparently not possible.
Is it possible to group the outliers together, print one point and then print a number in brackets beside it to indicate how many points there are? I think this would make it more readable in a graph.
For information, I have three boxplots for one x value and that times six in one graph. I am using gnuplot 5 and already played around with the pointsize, which doesn't reduce the distance anymore.
I hope you can help!
Edit:
set terminal pdf
set output 'dat.pdf'
file0 = 'dat1.dat'
file1 = 'dat2.dat'
file2 = 'dat3.dat'
set pointsize 0.2
set notitle
set xlabel 'X'
set ylabel 'Y'
header = system('head -1 '.file0);
N = words(header)
set xtics ('' 1)
set for [i=1:N] xtics add (word(header, i) i)
set style data boxplot
plot file0 using (1-0.25):1:(0.2) with boxplot lw 2 lc rgb '#8B0000' fs pattern 16 title 'A', \
     file1 using (1):1:(0.2) with boxplot lw 2 lc rgb '#00008B' fs pattern 4 title 'B', \
     file2 using (1+0.25):1:(0.2) with boxplot lw 2 lc rgb '#006400' fs pattern 5 title 'C', \
     for [i=2:N] file0 using (i-0.25):i:(0.2) with boxplot lw 2 lc rgb '#8B0000' fs pattern 16 notitle, \
     for [i=2:N] file1 using (i):i:(0.2) with boxplot lw 2 lc rgb '#00008B' fs pattern 4 notitle, \
     for [i=2:N] file2 using (i+0.25):i:(0.2) with boxplot lw 2 lc rgb '#006400' fs pattern 5 notitle
What is the best way to implement it with this code already in place?
There is no option to have this done automatically. The steps required to do it manually in gnuplot are:
(In the following I assume, that the data file data.dat has only a single column.)
Analyze your data with stats to determine the boundaries for the outliers:
stats 'data.dat' using 1
range = 1.5 # (this is the default value of the `set style boxplot range` value)
lower_limit = STATS_lo_quartile - range*(STATS_up_quartile - STATS_lo_quartile)
upper_limit = STATS_up_quartile + range*(STATS_up_quartile - STATS_lo_quartile)
Count only the outliers and write them to a temporary file:
set table 'tmp.dat'
plot 'data.dat' using 1:($1 > upper_limit || $1 < lower_limit ? 1 : 0) smooth frequency
unset table
Plot the boxplot without the outliers, and the outliers with the labels plotting style:
set style boxplot nooutliers
plot 'data.dat' using (1):1 with boxplot,\
'tmp.dat' using (1):($2 > 0 ? $1 : 1/0):(sprintf('(%d)', int($2))) with labels offset 1,0 left point pt 7
And this needs to be done for every single boxplot.
Disclaimer: This procedure should work in principle, but without example data I couldn't test it.
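Since the steps have to be repeated for every boxplot, they can be wrapped in a loop. The following is an untested sketch under assumptions that are not part of the answer above: a hypothetical file multi.dat with N numeric columns, temporary files named tmp1.dat, tmp2.dat and so on, and the variable names whisker, iqr, lo and up.

```gnuplot
file = 'multi.dat'     # hypothetical file, one data set per column
N = 3                  # assumed number of columns
whisker = 1.5          # default of `set style boxplot range`
do for [i=1:N] {
    # Outlier limits for column i
    stats file using i nooutput
    iqr = STATS_up_quartile - STATS_lo_quartile
    lo = STATS_lo_quartile - whisker*iqr
    up = STATS_up_quartile + whisker*iqr
    # Count how often each outlier value occurs; non-outliers get count 0
    set table sprintf('tmp%d.dat', i)
    plot file using i:(column(i) > up || column(i) < lo ? 1 : 0) smooth frequency
    unset table
}
set style boxplot nooutliers
# One box per column, plus one labelled point per distinct outlier value
plot for [i=1:N] file using (i):i with boxplot notitle, \
     for [i=1:N] sprintf('tmp%d.dat', i) \
         using (i):($2 > 0 ? $1 : 1/0):(sprintf('(%d)', int($2))) \
         with labels offset 1,0 left point pt 7 notitle
```

The limits are recomputed inside the loop because stats overwrites the STATS_ variables on every call.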

Gnuplot histogram gap does nothing

I have a gnuplot script which plots a histogram. I used the following syntax:
set style data histogram
set style histogram cluster gap 2
set style fill solid
set logscale y
rgb(r,g,b) = int(r)*65536 + int(g)*256 + int(b)
plot 'histogram_data' using (column(0)):2:(0.5):(rgb($3,$4,$5)):xticlabels(1) w boxes notitle lc rgb variable
What the last line does is: using column 1 as x labels, column 2 as the height of the histogram bars, 0.5 as box width, and columns 3, 4 and 5 as the rgb values to colour the bars.
Now, the problem is that modifying the gap parameter in line 2 does not change in any way the spacing between bars, even though as far as I understand that is the correct way to adjust such spacing. I am using gnuplot 4.6 patchlevel 4.
I found a way to do this with boxes, though I do not consider it very clean:
plot 'histogram_data' u (column(0)*2+1):2 w boxes notitle lc rgb 'white',\
'histogram_data' u (column(0)*2):2:(rgb($3,$4,$5)):xticlabels(1) w boxes notitle lc rgb variable;
This command is plotting all the data of the main plot on even slots and a white box on odd slots. So the first line in the plot command is plotting the gaps between every box of the plot (the width of these gaps can be specified using the boxwidth property I think but I haven't tested this), while the second line is drawing the actual plot.
I could not find a way to do this with the histogram plotting style, keeping the variable colours specified in the data file.
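A likely explanation, offered here as an assumption rather than something stated in the thread: set style histogram ... gap only affects plots drawn with the histograms style, whereas the command in the question uses with boxes. For boxes, the spacing follows from the x positions and the box width given in the third using column. A minimal sketch that changes the spacing this way, where the variable name w is introduced only for illustration:

```gnuplot
set style fill solid
set logscale y
rgb(r,g,b) = int(r)*65536 + int(g)*256 + int(b)
# Boxes are centred on 0, 1, 2, ... so the gap between neighbours is 1 - w
w = 0.3
plot 'histogram_data' using (column(0)):2:(w):(rgb($3,$4,$5)):xticlabels(1) \
     with boxes notitle lc rgb variable
```

This keeps the per-row colours from the data file and avoids the extra white boxes.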
