Fitting a normalized histogram using gnuplot

I have a datafile containing N random numbers generated from a C-code. Now I want to normalize the histogram from this datafile and, then, fit it to a given distribution function. How can I do that?
This is my gnuplot code for the histogram plot:
width = 5000
hist(x,width)=width*floor(x/width)+width/2.0
set boxwidth width
set style fill solid 0.5
set xrange [0:500000]
set yrange [0:20]
plot "out.dat" u (hist($1,width)):(1.0) smooth freq w boxes lc rgb"green"

Since gnuplot version 5.2 there is a new smoothing type, smooth fnormal, which does exactly that: it sums up all values with the same x-value and normalizes the data so that the overall sum is 1.
A simple example:
set boxwidth 0.9
set style fill solid 0.5
set yrange [0:*]
$data <<EOD
1
1
2
2
2
3
3
EOD
set style data boxes
plot $data u 1:(1) smooth freq title 'smooth frequency',\
'' u 1:(1) smooth fnormal title 'smooth fnormal'
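To check the normalization by hand: the sample has 7 values (two 1s, three 2s, two 3s), so smooth freq gives heights 2, 3, 2 and smooth fnormal should give 2/7, 3/7, 2/7 (about 0.286, 0.429, 0.286). A minimal sketch for inspecting the numbers instead of the picture, assuming gnuplot 5.2 or newer; the datablock name $Norm is made up for this illustration:

```gnuplot
$data <<EOD
1
1
2
2
2
3
3
EOD
# redirect the smoothed points into a datablock instead of drawing them
# ($Norm is a hypothetical name chosen for this sketch)
set table $Norm
plot $data u 1:(1) smooth fnormal
unset table
# list the tabulated x values and normalized heights
print $Norm
```

The second column of the printed table should add up to 1.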
Applied to your example, you only need to change the actual plotting line. The normalization is done by fnormal itself, so the second column stays (1.0):
plot "out.dat" u (hist($1,width)):(1.0) smooth fnormal w boxes lc rgb "green"
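The question also asks about fitting, which the answer does not cover. One possible approach is to write the normalized bin heights to a datablock and run fit on that. The following is only a sketch: the question does not say which distribution is wanted, so the Gaussian model f(x), the datablock name $Hist, and the start values for mu and sigma are all assumptions that need to be replaced. Because fnormal makes the bin heights sum to 1, each height is roughly the density times the bin width, hence the factor width in the model.

```gnuplot
width = 5000
hist(x,width) = width*floor(x/width) + width/2.0

# tabulate the normalized histogram ($Hist is a name chosen for this sketch)
set table $Hist
plot "out.dat" u (hist($1,width)):(1.0) smooth fnormal
unset table

# ASSUMPTION: Gaussian density scaled by the bin width; swap in your own model
f(x) = width/(sigma*sqrt(2*pi)) * exp(-(x-mu)**2/(2*sigma**2))
# ASSUMPTION: placeholder start values guessed from the xrange in the question
mu = 250000.
sigma = 50000.
fit f(x) $Hist u 1:2 via mu, sigma

set boxwidth width
set style fill solid 0.5
set xrange [0:500000]
set yrange [0:*]    # the heights are now at most 1, so [0:20] no longer fits
plot $Hist u 1:2 w boxes lc rgb "green" title "normalized histogram", \
     f(x) lw 2 title "fit"
```

Whether the fit converges depends heavily on the start values, so it is worth plotting f(x) over the histogram once before fitting.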


Show error bars in a multiaxis plot in Gnuplot

I have a dataset (show-errorbars.dat) containing:
Model# DE IE Error
Apple -4.6 -128.9538 4.0
Huawei -5.2 -176.6343 5.3
One-Pro -5.2 -118.1106 3.2
#!/usr/bin/gnuplot
#set terminal pdfcairo enhanced color font 'Helvetica,12' linewidth 0.8
set terminal png
set output 'BrandError.png'
set boxwidth 1.0 relative
set bmargin 5
set style fill solid border -1
set xtic rotate by -45 scale 0
#set auto x
set style line 81 lt 0 lc rgb "#808080" lw 0.5
set grid xtics
set grid ytics
set grid mxtics
set grid mytics
set grid back ls 81
set arrow from graph 0,first -4.6 to graph 1, first -4.6 nohead lw 2 lc rgb "#000000" front
set border 11
set border lw 2.0
set xtics font ",11"
set ytics font ",14"
set tics out
set ytics nomirror
set y2tics
set y2tics font ",14"
set mxtics 10
set mytics 2
set my2tics 2
set yrange [-10:0]
set y2range [-260:0]
set key left bottom
set y2label offset -2
set ylabel offset 2
set ylabel 'DE' tc rgb "red"
set y2label 'IE' tc rgb "green"
set style data histograms
set style histogram cluster gap 2
set linetype 2 lc rgb 'red'
set linetype 3 lc rgb 'yellow'
set linetype 4 lc rgb 'green'
plot 'show-errorbars.dat' using 2 ti 'DE' lc 2 axis x1y1, '' u 3:xticlabels(1) ti 'IE' lc 4 axis x1y2
set output
I would like to plot a histogram comparing DE vs IE and also show error bars (data in column 4) for the IE values.
Please any help on how to go about it.
There is a variant histogram style for exactly that purpose
set style histogram errorbars gap 2 {lw W}.
Here is the help section from the docs:
The `errorbars` style is very similar to the `clustered` style, except that it
requires additional columns of input for each entry. The first column holds
the height (y value) of that box, exactly as for the `clustered` style.
2 columns: y yerr bar extends from y-yerr to y+err
3 columns: y ymin ymax bar extends from ymin to ymax
The appearance of the error bars is controlled by the current value of
`set errorbars` and by the optional <linewidth> specification.
Updated answer
Notes:
You can't mix axis choice within a single histogram. So I have removed the axes x1y1 and axes x1y2 from the plot command. Since you have explicitly given the range for both y1 and y2, the plot border and labels are not affected.
However since the green bars are now being plotted against y1, we have to scale them so that the y2 axis labels apply. So the column 3 and column 4 values will be divided by 26, which is (y2 range) / (y1 range)
In "histogram errorbars" mode each plot component looks for an extra column of data to determine the size of the errorbar. Since your column 2 data has no corresponding column of errors, we supply a dummy column that is a constant not-a-number (no data) value: (NaN)
Your data contains a line of columnheaders, which could confuse the program if it thinks this is a line of data. There are a number of ways you can tell the program to skip this line; I have used set key autotitle columnhead for convenience and because it is supported by old versions of gnuplot. If you have a current version it would be better to use instead set datafile columnheaders.
I have kept all of your commands except that the plot command is replaced by the following 3 lines:
set style histogram errorbars gap 2 lw 1.5
set key autotitle columnhead
plot 'show-errorbars.dat' using 2:(NaN) ti 'DE' lc 2, '' u ($3/26.):($4/26.):xticlabels(1) ti 'IE' lc 4
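The factor 26 does not have to be typed in by hand. Here is a small sketch that derives it from the two ranges; the variable names y1min, y1max, y2min, y2max and scale are made up for this illustration. It is meant to replace the three lines above and still relies on the other settings from the question (in particular set style data histograms). A plain division is enough only because both ranges end at 0; with unrelated ranges an offset would be needed as well.

```gnuplot
# hypothetical helper variables holding the two axis ranges from the question
y1min = -10.;  y1max = 0.
y2min = -260.; y2max = 0.
# (y2 range) / (y1 range) = 260 / 10, which evaluates to 26
scale = (y2max - y2min) / (y1max - y1min)

set yrange  [y1min:y1max]
set y2range [y2min:y2max]
set style histogram errorbars gap 2 lw 1.5
set key autotitle columnhead
# the IE values and their errors are drawn against y1, scaled so the y2 tics apply
plot 'show-errorbars.dat' using 2:(NaN) ti 'DE' lc 2, \
     '' u ($3/scale):($4/scale):xticlabels(1) ti 'IE' lc 4
```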

Plotting intersecting lines in GNUplot

I haven't been able to find any example of what I'm trying to do in GNUplot from raking docs and demos.
Essentially I want to plot the Blue, Green, and Red lines I manually drew on this output (for demonstration) at the 10/50/90% marks.
EDIT: For clarity, I'm looking to determine where the distribution lines hit the cumulative distribution at 0.1/0.5/0.9 to know which co-ordinates to draw the lines at. Thanks!
set terminal png size 1600,800 font "Consolas" 16
set output "test.png"
set title "PDF and CDF - 1000 Simulations"
set grid y2
set ylabel "Date Probability"
set y2range [0:1.00]
set y2tics 0.1
set y2label "Cumulative Distribution"
set xtics rotate by 90 offset 0,-5
set bmargin 6
plot "data.txt" using 1:3:xtic(2) notitle with boxes axes x1y1,'' using 1:4 notitle with linespoints axes x1y2
Depending on the number of points in your cumulative data curve you might need interpolation. The following example is chosen such that no original data point will be at your levels 10%, 50%, 90%. If your data is not steadily increasing, it will take the last value which matches your level(s).
The procedure is as follows:
plot your data to a dummy table.
check when Level is between two successive y-values (y0,y1).
remember the interpolated x-value in xp.
draw arrows from the borders of the graph to the point (xp,Level) (or instead use the partly outside rectangle "trick" from @Ethan).
Code:
### linear interpolation of data
reset session
set colorsequence classic
set key left
# create some dummy data
set sample 10
set table $Data
plot [-2:2] '+' u 1:(norm(x)) with table
unset table
Interpolate(yi) = x0 + (x1-x0)*(yi-y0)/(y1-y0)
Levels = "0.1 0.5 0.9"
do for [i=1:words(Levels)] {
Level = word(Levels,i)
x0 = x1 = y0 = y1 = NaN
set table $Dummy
plot $Data u (x0=x1,x1=$1,y0=y1,y1=$2, (y0<=Level && Level<=y1)? (xp=Interpolate(Level)):NaN ): (Level) w table
unset table
set arrow i*2 from xp, graph 0 to xp,Level nohead lc i
set arrow i*2+1 from xp,Level to graph 1,Level nohead lc i
}
plot $Data u 1:2 w lp pt 7 lc 0 t "Original data"
### end code
Result:
It is not clear if you are asking how to find the x-coordinates at which your cumulative distribution line hits 0.1, 0.5, 0.9 (hard to do so I will leave that for now) or asking how to draw the lines once you know those x values. The latter part is easy. Think of the lines you want to draw as the unclipped portion of a rectangle that extends off the plot to the lower right:
set object 1 rectangle from x1, 0.1 to graph 2, -2 fillstyle empty border lc "blue"
set object 2 rectangle from x2, 0.5 to graph 2, -2 fillstyle empty border lc "green"
set object 3 rectangle from x3, 0.9 to graph 2, -2 fillstyle empty border lc "red"
plot ...
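For the plot in the question the cumulative curve is column 4 of data.txt and is drawn against the y2 axis, so the levels have to be given in the "second" coordinate system. A hedged sketch that applies the interpolation idea to that file; it assumes column 1 is the numeric x value, that column 4 increases from 0 to 1, and gnuplot 5.2 or newer for "with table". If a level is never crossed, xp keeps its previous value (or is undefined), so this is not robust against arbitrary data.

```gnuplot
set y2range [0:1.00]
set y2tics 0.1

# linear interpolation between the two points that bracket the level
Interpolate(yi) = x0 + (x1-x0)*(yi-y0)/(y1-y0)
Levels = "0.1 0.5 0.9"
do for [i=1:words(Levels)] {
    Level = word(Levels,i) + 0      # adding 0 turns the string into a number
    x0 = x1 = y0 = y1 = NaN
    # dummy pass over the data, only used to find xp
    set table $Dummy
    plot "data.txt" u (x0=x1,x1=$1,y0=y1,y1=$4, (y0<=Level && Level<=y1) ? (xp=Interpolate(Level)) : NaN):(Level) w table
    unset table
    # vertical line from the bottom border up to the curve (y given on the y2 axis)
    set arrow i*2 from first xp, graph 0 to first xp, second Level nohead lc i
    # horizontal line from the curve to the right border
    set arrow i*2+1 from first xp, second Level to graph 1, second Level nohead lc i
}
plot "data.txt" using 1:3:xtic(2) notitle with boxes axes x1y1, \
     '' using 1:4 notitle with linespoints axes x1y2
```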

gnuplot curve from file and parametric sphere

I am trying to plot in a 3d space a curve coming from a file and a sphere made with parametric entries.
The idea is to plot the planet Earth and the orbit of a satellite.
The orbit is defined in a file x y z and gnuplot commands are simply
splot 'file.txt' u 1:2:3 title 'Orbit element 1' with lines
I found a script to plot the Earth
#color definitions
set border lw 1.5
set style line 1 lc rgb '#000000' lt 1 lw 2
set style line 2 lc rgb '#c0c0c0' lt 2 lw 1
unset key; unset border
set tics scale 0
set lmargin screen 0
set bmargin screen 0
set rmargin screen 1
set tmargin screen 1
set format ''
set mapping spherical
set angles degrees
set xyplane at -1
set view 56,81
set parametric
set isosamples 25
set urange[0:360]
set vrange[-90:90]
r = 0.99
splot r*cos(v)*cos(u),r*cos(v)*sin(u),r*sin(v) with lines linestyle 2,'world.dat' with lines linestyle 1
unset parametric
Unfortunately, I am not able to mix the splot with the data file and the splot with the parametric sphere.
Any suggestions more than welcome!
Thanks
In order to generate the plot below, I used the data linked in this blog post. Now, if we want to combine several data sources into one plot, we will need to convert one or the other into a common system of coordinates. If the satellite data is in Cartesian x,y,z coordinates, perhaps the easiest solution would be to convert the world map into Cartesian system as well.
This could be done as shown below. The parameter R denotes the radius of the sphere on the surface of which Gnuplot draws the world map. It should be slightly larger than r so that hidden3d works. The columns in the world_110m.txt file have the meaning of longitude (first column) and latitude (second column), therefore the conversion is given as (R*cos($1)*cos($2)):(R*sin($1)*cos($2)):(R*sin($2)). In the file input.pnts.dat, I just generated coordinates of points on an ellipse with a=1.6 and b=1.2 rotated around the x axis by 45 degrees (counterclockwise). For real satellite data, one would need to rescale the coordinates by dividing by the radius of Earth, i.e., use ($1/Re):($2/Re):($3/Re) instead of 1:2:3, where Re denotes the radius in whichever units your data is (probably meters, judging by the first plot in your question).
set terminal pngcairo
set output 'fig.png'
set xr [-2:2]
set yr [-2:2]
set zr [-2:2]
#color definitions
set border lw 1.5
set style line 1 lc rgb '#000000' lt 1 lw 2
set style line 2 lc rgb '#c0c0c0' lt 2 lw 1
unset key; unset border; set tics scale 0
set format ''
set angles degrees
set xyplane at -1
set view 56,81
set lmargin screen 0
set bmargin screen 0
set rmargin screen 1
set tmargin screen 1
set parametric
set isosamples 25
set urange[0:360]
set vrange[-90:90]
r = 0.99
R = 1.00
set hidden3d
#since we are using Cartesian coordinates, we don't want this
#set mapping spherical
splot \
r*cos(v)*cos(u),r*cos(v)*sin(u),r*sin(v) with lines linestyle 2, \
'world_110m.txt' u (R*cos($1)*cos($2)):(R*sin($1)*cos($2)):(R*sin($2)) w l lw 2 lc rgb 'black', \
'input.pnts.dat' u 1:2:3 w l lw 2 lc rgb 'red'
This then gives:
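For real satellite data, the rescaling mentioned above could look roughly like this. It is only a sketch: it assumes the coordinates in file.txt are in meters and uses the mean Earth radius of about 6371 km for Re. It reuses all settings from the script above and only changes the last splot entry.

```gnuplot
# ASSUMPTION: file.txt holds x y z in meters; Re is the mean Earth radius in meters
Re = 6371e3
splot \
    r*cos(v)*cos(u),r*cos(v)*sin(u),r*sin(v) with lines linestyle 2, \
    'world_110m.txt' u (R*cos($1)*cos($2)):(R*sin($1)*cos($2)):(R*sin($2)) w l lw 2 lc rgb 'black', \
    'file.txt' u ($1/Re):($2/Re):($3/Re) w l lw 2 lc rgb 'red' title 'Orbit element 1'
```

With the ranges fixed to [-2:2], anything farther than two Earth radii from the center is clipped, so the xr/yr/zr settings may need to be widened for higher orbits.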

gnuplot histogram chart with overlap

I would like to plot a bar chart or histogram like this in gnuplot.
I tried set style histogram rowstacked which is a start but it adds the columns on top of each other while I need them overlapped. Next is the issue of transparent color shading.
Thanks for your feedback.
UPDATE: user8153 asked for additional data.
The set style histogram clustered gap 0.0 is doing the cluster mode of the histogram bars. If you blur the eye it sort-of shows what I want but with overlap and transparent shading.
The only other histogram modes given in the docs are rowstacked and columnstacked. I never got a plot out of columnstacked so I discarded it. Now rowstacked stacks the histogram bars.
The overlay appearance is there but it is wrong. I don't want the stacked appearance. The histograms have to overlay.
Code :
set boxwidth 1.0 absolute
set style fill solid 0.5 noborder
set style data histogram
set style histogram clustered gap 0.0
#set style histogram rowstacked gap 0.0
set xtics in rotate by 90 offset first +0.5,0 right
set yrange [0:8000]
set xrange [90:180]
plot 'dat1.raw' using 3 lc rgb 'orange', \
'dat2.raw' using 3 lc rgb 'blue', \
'dat3.raw' using 3 lc rgb 'magenta'
Thanks for your feedback.
Given a sample datafile test.dat
-10 4.5399929762484854e-05
-9 0.0003035391380788668
-8 0.001661557273173934
-7 0.007446583070924338
-6 0.02732372244729256
-5 0.0820849986238988
-4 0.20189651799465538
-3 0.4065696597405991
-2 0.6703200460356393
-1 0.9048374180359595
0 1.0
1 0.9048374180359595
2 0.6703200460356393
3 0.4065696597405991
4 0.20189651799465538
5 0.0820849986238988
6 0.02732372244729256
7 0.007446583070924338
8 0.001661557273173934
9 0.0003035391380788668
10 4.5399929762484854e-05
you can use the following commands
set style fill transparent solid 0.7
plot "test.dat" with boxes, \
"test.dat" u ($1+4):2 with boxes
to get the following result (using the pngcairo terminal):
Using transparency as in user8153's solution is certainly the easiest way to visualize an overlap of two histograms.
This works even if the two histograms do not have identical bins or x-data-ranges.
However, the color of the overlap is pretty much bound to the colors of the two histograms and the level of transparency. Furthermore, if you want to show the overlap in the key you have to do it "manually".
Here is a solution where you can choose an independent color for the overlap area.
The overlap is basically the minimum y-value from both histograms for each x-value.
For this you need to compare the y-values for each x-value. This can be done in gnuplot with a "trick": merging the two datasets line by line. This requires the data to be in datablocks (data from a file has to be loaded into a datablock first). Since this merging procedure uses indexing of datablock lines, it requires gnuplot>=5.2.0.
This assumes that you have the same x-range and bins for each histogram. If this is not the case, you have to implement some further steps.
Script: (works with gnuplot>=5.2.0, Sept. 2017)
### plot overlap of two histograms
reset session
# create some random test data
set samples 21
f(x,a,b) = 1./(a*(x-b)**4+1)
set table $Data1
plot '+' u 1:(f(x,0.01,-2)) w table
set table $Data2
plot '+' u 1:(f(x,0.02,4)) w table
unset table
set boxwidth 1.0
set grid y
set ytics 0.2
set multiplot layout 2,1
set style fill transparent solid 0.3
plot $Data1 u 1:2 w boxes lc 1 ti "Data1", \
$Data2 u 1:2 w boxes lc 2 ti "Data2"
set print $Overlap
do for [i=1:|$Data1|] { print $Data1[i].$Data2[i] }
set print
set style fill solid 0.3
plot $Data1 u 1:2 w boxes lc 1 ti "Data1", \
$Data2 u 1:2 w boxes lc 2 ti "Data2", \
$Overlap u 1:($2>$4?$4:$2) w boxes lc "red" ti "Overlap"
unset multiplot
### end of script
Result:

Gnuplot change color of bars in histogram

is it possible to change the color of bars in a Gnuplot script dynamically?
I have the following script
reset
fontsize = 12
set term postscript enhanced eps fontsize
set output "bargraph_speedup.eps"
set style fill solid 1.00 border 0
set style histogram
set style data histogram
set xtics rotate by -45
set grid ytics linestyle 1
set xlabel "Benchmarks" font "bold"
set ylabel "Relative execution time vs. reference implementation" font "bold"
set datafile separator ","
plot 'bm_speedup.dat' using 2:xtic(1) ti "Speedup" linecolor rgb "#00FF00"
which generates this plot:
Is it possible to make the color of the bars which are below zero red?
Thanks,
Sven
You can mimic this behavior using the boxes style:
My test data:
zip 2
baz 2
bar -1
cat 4
foo -3
And then plotting with gnuplot:
# "lc variable" reads a linetype index, so the colors are set on linetypes
set linetype 1 lc rgb "green"
set linetype 2 lc rgb "red"
set style fill solid
plot 'test.dat' u (column(0)):2:(0.5):($2>0?1:2):xtic(1) w boxes lc variable
# using columns: xval:ydata:boxwidth:color_index:xtic_labels
You could split your data file into two parts, positive values and negative, and plot them separately:
plot 'bm_speedup_pos.dat' using 2:xtic(1) ti "Faster" linecolor rgb "#00FF00", \
'bm_speedup_neg.dat' using 2:xtic(1) ti "Slower" linecolor rgb "#FF0000"
Or, if you only need to generate a few graphs, a few times, a common technique is to generate the raw graph in gnuplot, then post-process it in an image editor to adjust the colors. If you go that route, I suggest having gnuplot generate the graph in SVG format, which will give you much better looking graphs than any of the bitmap formats.
It doesn't seem like the histogram style lets you do it. Maybe try it with boxes like this:
set boxwidth 0.3
f(v)=v<0?1:2
plot 'bm_speedup.dat' using 0:2:(f($2)):xticlabels(1) with boxes ti "Speedup" lc variable
Actually you can also use linecolor rgb variable and give the color like this:
plot 'bm_speedup.dat' using 2:xtic(1):($2 >= 0 ? 0x00FF00 : 0xFF0000) ti "Speedup" lc rgb variable
