Fit histogram in gnuplot - linux

I'm trying to fit data (a histogram) in gnuplot. I tried various functions, and judging by the histogram, I suppose the best fit is a lognormal or gamma distribution, but I am not able to do this fit in gnuplot (I'm a rather new gnuplot user).
Here is a picture of the histogram with a Gaussian fit:
Here is the gnuplot code:
reset
n=100 #number of intervals
max=15. #max value
min=0. #min value
width=(max-min)/n #interval width
#function used to map a value to the intervals
hist(x,width)=width*floor(x/width)
set term png #output terminal and file
set output "histogram.png"
set xrange [min:max]
set yrange [0:]
#to put an empty boundary around the
#data inside an autoscaled graph.
set offset graph 0.05,0.05,0.05,0.0
set xtics min,(max-min)/5,max
set boxwidth width*0.9
set style fill solid 0.5 #fillstyle
set tics out nomirror
set xlabel "Diameter"
set ylabel "Frequency"
#count and plot
#fac(x) = (int(x)==0) ? 1.0 : int(x) * fac(int(x)-1.0)
gauss(x)=a/(sqrt(2*pi)*sigma)*exp(-(x-mean)**2/(2*sigma**2))
fit gauss(x) 'hist.temp' u 1:2 via a, sigma, mean
plot 'data.list' u (hist($8, width)):(1.0) smooth freq w boxes lc rgb "green" notitle, \
gauss(x) w lines ls 2 lw 2
The file hist.temp contains the tabular output (see this link).
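A minimal sketch of how a lognormal fit could look, assuming the counts are available as two columns (bin position, count) like in hist.temp. The data here is synthetic, and the names $Raw, $Hist, bin(), lognorm(), a, m and s are made up for illustration. It uses bin centres instead of left edges and guards against log(0); fits like this are sensitive to the starting values, so those are set by hand.

```gnuplot
### sketch: fit a lognormal curve to histogram counts (synthetic stand-in data)
reset session

# stand-in for column 8 of data.list: 2000 lognormal samples (assumption)
set print $Raw
do for [i=1:2000] { print exp(1.0 + 0.4*invnorm(rand(0))) }
unset print

width = 0.15                               # bin width, (max-min)/n in the question
bin(x) = width*(floor(x/width) + 0.5)      # bin centre rather than left edge

# tabulate the counts (plays the role of hist.temp)
set table $Hist
plot $Raw u (bin($1)):(1.0) smooth freq
unset table

# lognormal model; the x > 0 guard avoids evaluating log(0)
lognorm(x) = x > 0 ? a/(x*s*sqrt(2*pi)) * exp(-(log(x)-m)**2/(2*s**2)) : 0.0
a = 2000*width; m = 1.0; s = 0.5           # rough starting values
fit lognorm(x) $Hist u 1:2 via a, m, s

set boxwidth 0.9*width
set style fill solid 0.5
plot $Hist u 1:2 w boxes lc rgb "green" notitle, \
     lognorm(x) w lines lw 2 title "lognormal fit"
### end of sketch
```

With real data, a is roughly (number of points) x (bin width), and m and s can be guessed from the mean and spread of log(x). A gamma distribution could be tried the same way by swapping the model function.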

Related

gnuplot: how to generate smooth density plots from a distribution?

I would like to make a density plot from a distribution like the second subfigure in the following:
Here is what I tried:
unset key
set yrange [0:]
set ytics 5
set print $data
do for [i = 1:100] { print rand(0)*10 }
unset print
binwidth = 1
set boxwidth 0.8*binwidth
# set fill style of bins
set style fill solid 0.5
# define macro for plotting the histogram
hist = 'u (binwidth*(floor(($1)/binwidth)+0.5)):(1.0) smooth freq w boxes'
density = 'u (binwidth*(floor(($1)/binwidth)+0.5)):(1.0) smooth freq with filledcurves y=0'
plot $data @density
It is mainly based on a histogram, with "with filledcurves" added, but a clear difference is that the resulting figure is not smooth at all.
So, how can I generate smooth density plots from a distribution? Is there any interpolation function that can be used in gnuplot?
I found that a kernel density estimate in gnuplot (smooth kdensity) can be helpful here.
plot $data u 1:(1/100.) s kdens bandwidth 1 with filledcurves y=0
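To see how the bandwidth affects the result, it may help to draw the histogram and the kernel density estimate on the same scale. The sketch below is only an illustration: N and the bandwidth of 0.7 are arbitrary choices, and both curves are normalised so that their area is 1.

```gnuplot
### sketch: normalised histogram with a kernel density estimate on top
reset session
N = 100
set print $data
do for [i=1:N] { print rand(0)*10 }
unset print

binwidth = 1.0
set boxwidth 0.8*binwidth
set style fill transparent solid 0.4
set yrange [0:]

# each point adds 1/(N*binwidth) to its bin, so the boxes integrate to 1;
# for kdensity the second column is the weight of each point
plot $data u (binwidth*(floor($1/binwidth)+0.5)):(1.0/(N*binwidth)) smooth freq w boxes title "histogram", \
     $data u 1:(1.0/N) smooth kdensity bandwidth 0.7 w lines lw 2 title "kdensity"
### end of sketch
```

A larger bandwidth gives a smoother but flatter curve; a smaller one follows the individual points more closely.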

Automatic offset in gnuplot

I am plotting data from a datafile. After a while along the x-axis, the y-values start to decrease monotonically and ultimately go to zero (with some very small fluctuations later on).
Hence, I want to offset the y-axis so that those fluctuations are clearly visible. For that I use something like set offsets 0,0,0,0.1. But I have written a bash script that generates the plot for me; I just need to provide the datafile name to it. So I don't want to go into the script for each plot and manually set the offset value based on the data.
I would like gnuplot to determine the offset automatically based on the bin size on the axis, e.g. an offset of 1*bin-size. So my command could look like:
set offsets 0,0,0,1*$bin_size
Is there any way to achieve this?
Edit:
This is the script I am using.
#!/bin/bash
#Requires that the script be in the same directory as the data files
#sed -n '3001,4000p' fish_data_re.dat > fish_data_re_3k_4k.dat : Can be used to extract data from specific range in data file
DATA_FILE_NAME="abc"
DATA_FILE_TYPE="dat"
#Code to generate normalised files
awk 'NR == FNR {if(max < $2) {max = $2}; next} {$2 = $2 / max; printf "%f\t%f\n", $1, $2}' "$DATA_FILE_NAME.$DATA_FILE_TYPE" "$DATA_FILE_NAME.$DATA_FILE_TYPE" > "${DATA_FILE_NAME}_normed.$DATA_FILE_TYPE"
DATA_FILE_NAME="${DATA_FILE_NAME}_normed"
DATA_FILE_TYPE="dat"
OUTPUT_FILE_TYPE="eps"
OUTPUT_FILE_NAME="${DATA_FILE_NAME}_plot.$OUTPUT_FILE_TYPE"
X_LABEL="Time"
Y_LABEL="Real Classical Fisher Information"
TITLE="Real Classical Fisher Information vs Time"
#Set font size for axis tics
X_TICS_SIZE="6"
Y_TICS_SIZE="6"
gnuplot <<- MULTI_LINE_CODE_TAG
set xlabel "$X_LABEL"
set ylabel "$Y_LABEL"
#Following command allows the printing of underscore from name of data file in plot
set key noenhanced
set title "$TITLE"
set xtics font ", $X_TICS_SIZE"
set ytics font ", $Y_TICS_SIZE"
set xtics nomirror
set ytics nomirror
#set ytics format "%.22g"
set ytics format "%0.s*10^{%L}"
#set xtics format "%t"
set multiplot
#------The big-plot------
set title "$TITLE"
set offsets 0,0,0,0.01
#Following plots only data from line 1 to line 100
#plot "<(sed -n '1,100p' $DATA_FILE_NAME.$DATA_FILE_TYPE)" u 1:2 notitle w l lc "red" lw 2
plot "$DATA_FILE_NAME.$DATA_FILE_TYPE" u 1:2 notitle w l lc "red" lw 2
#------The sub-plot------
unset title
unset offsets
set origin 0.25,0.3
set size 0.45,0.45
set xrange [30:60]
set yrange [-0.01:0.01]
unset xlabel
unset ylabel
#unset label
plot "$DATA_FILE_NAME.$DATA_FILE_TYPE" u 1:2 notitle w l lc "red" lw 2
unset multiplot
set term "$OUTPUT_FILE_TYPE"
set output "$OUTPUT_FILE_NAME"
replot
MULTI_LINE_CODE_TAG
exit
As you can see I need to provide the offset manually.
Here is the plot I am getting.
The y-axis here got offset by -0.2 (not -0.002, as I first wrote). I want to automate this and have gnuplot always use an offset equal to the size of one bin (which I define as the distance between successive tics).
(If this is a trivial question I apologise in advance, I am quite new to gnuplot.)
I guess I still don't understand your exact problem. By the way, your offset is -200e-3 = -0.2, not -0.002.
Is your data always between 0 and 1?
You could set the offsets depending on the graph (check help offsets)
set offsets 0,0,0, graph 0.2
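If the offset should be in axis units rather than a fraction of the graph, one option is to let stats measure the data first and derive the offset from the range. This is only a sketch: the factor 0.1 and the names $Data and pad are arbitrary, the test data stands in for the real file, and it uses the data range rather than the tic spacing.

```gnuplot
### sketch: derive the bottom offset from the data range
reset session

# stand-in data: a decaying curve (assumption, replace by the real file)
set xrange [0:60]
set samples 200
set table $Data
plot '+' u 1:(exp(-$1/5.0)) w table
unset table

# measure the y-range, then use 10% of it as the bottom offset
stats $Data u 1:2 nooutput
pad = 0.1*(STATS_max_y - STATS_min_y)
set offsets 0, 0, 0, pad
plot $Data u 1:2 w l lc "red" lw 2 notitle
### end of sketch
```

Note that offsets only take effect on autoscaled axes. Also, inside an unquoted bash here-document a name like $Data would be expanded by the shell, so it has to be written as \$Data there (or the data read from a file).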
In general, why not use a logarithmic scale? With this you will be able to see all the small features in your data.
Code:
### linear scale vs logarithmic scale
reset session
# Gauss curve by specifying amplitude A, position x0 and width via FWHM
GaussW(x,x0,A,FWHM) = A * exp(-(x-x0)**2/(2*(FWHM/(2*sqrt(2*log(2))))**2))
# create some test data
set xrange[0:100]
set samples 500
set table $Data
plot '+' u 1:(GaussW($1,5,1,2.5) + GaussW($1,40,7e-3,2) + GaussW($1,47,8e-4,5) + 2e-4) w table
unset table
set multiplot layout 1,2
set offset 0,0,0, graph 0.2
set yrange[-0.02:1]
plot $Data u 1:2 w l title "linear y-scale"
set logscale y
set yrange[1e-4:1]
plot $Data u 1:2 w l title "logarithmic y-scale"
unset multiplot
### end of code
Result:

Gnuplot: oscilloscope-like line style?

Is it possible in Gnuplot to emulate the drawing style of an analogue oscilloscope, meaning thinner and dimmer lines at larger amplitudes, like this?
The effect you see in the oscilloscope trace is not due to amplitude, it is due to the rate of change as the trace is drawn. If you know that rate of change and can feed it to gnuplot as a third column of values, then you could use it to modulate the line color as it is drawn:
plot 'data' using 1:2:3 with lines linecolor palette z
I don't know what color palette would work best for your purpose, but here is an approximation using a function with an obvious, known, derivative.
set palette gray
set samples 1000
plot '+' using ($1):(sin($1)):(abs(cos($1))) with lines linecolor palette
For thickness variations, you could shift the curve slightly up and down, and fill the area between them.
f(x) = sin(2*x) * sin(30*x)
dy = 0.02
plot '+' u 1:(f(x)+dy):(f(x)-dy) w filledcurves ls 1 notitle
This does not allow variable colour, but the visual effect is similar.
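The band could also be made thinner where the trace moves fast, which is closer to what a real oscilloscope shows. A rough sketch, assuming the derivative is known analytically; df(x), dy(x) and the constants 0.03 and 0.2 are arbitrary choices for illustration.

```gnuplot
### sketch: band thickness shrinks with the rate of change
reset session
set samples 2000
set xrange [0:pi]

f(x)  = sin(2*x) * sin(30*x)
# analytic derivative of f (product rule)
df(x) = 2*cos(2*x)*sin(30*x) + 30*sin(2*x)*cos(30*x)
# half-thickness: 0.03 where the trace is flat, smaller where it is steep
dy(x) = 0.03 / (1 + 0.2*abs(df(x)))

plot '+' u 1:(f($1)+dy($1)):(f($1)-dy($1)) w filledcurves lc rgb "green" notitle
### end of sketch
```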
Another approach:
As @Ethan already stated, the intensity is somehow proportional to the speed of movement, i.e. the derivative. If you have sin(x) as the waveform, the derivative is cos(x). But what if you only have data? Then you have to calculate the derivative numerically.
Furthermore, depending on the background the line should fade from white (minimal derivative) to fully transparent (maximum derivative), i.e. you should change the transparency with the derivative.
Code:
### oscilloscope "imitation"
reset session
set term wxt size 500,400 butt # option butt, otherwise you will get overlap points
set size ratio 4./5
set samples 1000
set xrange[-5:5]
# create some test data
f(x) = 1.5*sin(15*x)*(cos(1.4*x)+1.5)
set table $Data
plot '+' u 1:(f($1)) w table
unset table
set xtics axis 1 format ""
set mxtics 5
set grid xtics ls -1
set yrange[-4:4]
set ytics axis 1 format ""
set mytics 5
set grid ytics ls -1
ColorScreen = 0x28a7e0
set obj 1 rect from screen 0,0 to screen 1,1 behind
set obj 1 fill solid 1.0 fc rgb ColorScreen
x0=y0=NaN
Derivative(x,y) = (dx=x-x0,x0=x,x-dx/2,dy=y-y0,y0=y,dy/dx) # approx. derivative
# get min/max derivative
set table $Dummy
plot n=0 $Data u (d=abs(Derivative($1,$2)),n=n+1,n<=2? (dmin=dmax=d) : \
(dmin>d ? dmin=d:dmin), (dmax<d?dmax=d:dmax)) w table
unset table
myColor(x,y) = (int((abs(Derivative(column(x),column(y)))-dmin)/(dmax-dmin)*0xff)<<24) +0xffffff
plot $Data u 1:2:(myColor(1,2)) w l lw 1.5 lc rgb var not
### end of code
Result:

Heatmap of points in a volume

I have (x,y,z) points with coordinates like the following figure,
I would like to color the points based on their concentration.
The idea is to make a heatmap of points but in a 3D figure.
I would appreciate very much any help possible.
Regards.
Use data values in a 4th column to index a smooth color palette:
splot DATA using 1:2:3:4 with points lc palette
The gnuplot development version now supports calculation of a point density function that can in turn be used to color individual points. This depends on a new set of commands that operate on a 3D grid of voxels. Sample script and output:
set title "Gaussian 3D cloud of 3000 random samples\ncolored by local point density"
rlow = -4.0; rhigh = 4.0
set xrange [rlow:rhigh]; set yrange [rlow:rhigh]; set zrange [rlow:rhigh]
set xtics axis nomirror; set ytics axis nomirror; set ztics axis nomirror;
set xyplane at 0
set xzeroaxis lt -1; set yzeroaxis lt -1; set zzeroaxis lt -1;
set log cb; set cblabel "point density"
# define 100 x 100 x 100 voxel grid
set vgrid $vdensity size 100
vclear $vdensity
# datablock $random has previously been loaded with 3000 points
# in a spherical Gaussian distribution about the origin
# The vfill command adds 1 to each voxel in a spherical region with radius 0.33
# around each point in $random
vfill $random using 1:2:3:(0.33):(1.0)
# plot the same points colored by local point density
splot $random using 1:2:3:(voxel($1,$2,$3)) with points pt 7 ps 0.5 lc palette
Full demo here: voxel demo in gnuplot online collection
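The script above assumes that $random already exists. One way it could be filled, as a sketch only (the demo may generate its points differently): draw each coordinate independently from a standard normal distribution, which gives a spherical Gaussian cloud around the origin. Run this before the vfill command.

```gnuplot
### sketch: fill $random with 3000 points from a spherical Gaussian
set print $random
do for [i=1:3000] {
    # independent standard-normal x, y, z via the inverse normal CDF
    print sprintf("%g %g %g", invnorm(rand(0)), invnorm(rand(0)), invnorm(rand(0)))
}
unset print
### end of sketch
```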

Gnuplot: Scatter plot and density

I have x- and y-data points representing a star cluster. I want to visualize the density using Gnuplot and its scatter function with overlapping points.
I used the following commands:
set style fill transparent solid 0.04 noborder
set style circle radius 0.01
plot "data.dat" u 1:2 with circles lc rgb "red"
The result:
However, I want something like this:
Is that possible in Gnuplot? Any ideas?
(edit: revised and simplified)
Probably a much better way than my previous answer is the following:
For each data point, check how many other data points are within a radius R. You need to play with the value of R to get a reasonable graph.
Indexing the datalines requires gnuplot>=5.2.0 and the data in a datablock (without empty lines). You can either first plot your file into a datablock (check help table) or see here:
gnuplot: load datafile 1:1 into datablock
The time for creating this graph will increase with number of points O(N^2) because you have to check each point against all others. I'm not sure if there is a smarter and faster method. The example below with 1200 datapoints will take about 4 seconds on my laptop. You basically can apply the same principle for 3D.
Script: works with gnuplot>=5.2.0
### 2D density color plot
reset session
t1 = time(0.0)
# create some random test data
set table $Data
set samples 700
plot '+' u (invnorm(rand(0))):(invnorm(rand(0))) w table
set samples 500
plot '+' u (invnorm(rand(0))+2):(invnorm(rand(0))+2) w table
unset table
print sprintf("Time data creation: %.3f s",(t0=t1,t1=time(0.0),t1-t0))
# for each datapoint: how many other datapoints are within radius R
R = 0.5 # Radius to check
Dist(x0,y0,x1,y1) = sqrt((x1-x0)**2 + (y1-y0)**2)
set print $Density
do for [i=1:|$Data|] {
x0 = real(word($Data[i],1))
y0 = real(word($Data[i],2))
c = 0
stats $Data u (Dist(x0,y0,$1,$2)<=R ? c=c+1 : 0) nooutput
d = c / (pi * R**2) # density: points per unit area
print sprintf("%g %g %d", x0, y0, d)
}
set print
print sprintf("Time density check: %.3f sec",(t0=t1,t1=time(0.0),t1-t0))
set size ratio -1 # same screen units for x and y
set palette rgb 33,13,10
plot $Density u 1:2:3 w p pt 7 lc palette z notitle
### end of script
Result:
Would it be an option to postprocess the image with imagemagick?
# convert into a gray scale image
convert source.png -colorspace gray -sigmoidal-contrast 10,50% gray.png
# build the gradient, the heights have to sum up to 256
convert -size 10x1 gradient:white-white white.png
convert -size 10x85 gradient:red-yellow \
gradient:yellow-lightgreen \
gradient:lightgreen-blue \
-append gradient.png
convert gradient.png white.png -append full-gradient.png
# finally convert the picture
convert gray.png full-gradient.png -clut target.png
I have not tried but I am quite sure that gnuplot can plot the gray scale image directly.
Here is the (rotated) gradient image:
This is the result:
Although this question is rather "old" and the problem might have been solved differently...
It's probably more for curiosity and fun than for practical purposes.
The following code implements a coloring according to the density of points using gnuplot only. On my older computer it takes a few minutes to plot 1000 points. I would be interested if this code can be improved especially in terms of speed (without using external tools).
It's a pity that gnuplot does not offer basic functionality like sorting, look-up tables, merging, transposing or other basic functions (I know... it's gnuPLOT... and not an analysis tool).
The code:
### density color plot 2D
reset session
# create some dummy datablock with some distribution
N = 1000
set table $Data
set samples N
plot '+' u (invnorm(rand(0))):(invnorm(rand(0))) w table
unset table
# end creating dummy data
stats $Data u 1:2 nooutput
XMin = STATS_min_x
XMax = STATS_max_x
YMin = STATS_min_y
YMax = STATS_max_y
XRange = XMax-XMin
YRange = YMax-YMin
XBinCount = 20
YBinCount = 20
BinNo(x,y) = floor((y-YMin)/YRange*YBinCount)*XBinCount + floor((x-XMin)/XRange*XBinCount)
# do the binning
set table $Bins
plot $Data u (BinNo($1,$2)):(1) smooth freq # with table
unset table
# prepare final data: BinNo, Sum, XPos, YPos
set print $FinalData
do for [i=0:N-1] {
set table $Data3
plot $Data u (BinNumber = BinNo($1,$2),$1):(XPos = $1,$1):(YPos = $2,$2) every ::i::i with table
plot [BinNumber:BinNumber+0.1] $Bins u (BinNumber == $1 ? (PointsInBin = $2,$2) : NaN) with table
print sprintf("%g\t%g\t%g\t%g", XPos, YPos, BinNumber, PointsInBin)
unset table
}
set print
# plot data
set multiplot layout 2,1
set rmargin at screen 0.85
plot $Data u 1:2 w p pt 7 lc rgb "#BBFF0000" t "Data"
set xrange restore # use same xrange as previous plot
set yrange restore
set palette rgbformulae 33,13,10
set colorbox
# draw the bin borders
do for [i=0:XBinCount] {
XBinPos = i/real(XBinCount)*XRange+XMin
set arrow from XBinPos,YMin to XBinPos,YMax nohead lc rgb "grey" dt 1
}
do for [i=0:YBinCount] {
YBinPos = i/real(YBinCount)*YRange+YMin
set arrow from XMin,YBinPos to XMax,YBinPos nohead lc rgb "grey" dt 1
}
plot $FinalData u 1:2:4 w p pt 7 ps 0.5 lc palette z t "Density plot"
unset multiplot
### end of code
The result:
