gnuplot: how to generate smooth density plots from a distribution? - gnuplot

I would like to make a density plot from a distribution like the second subfigure in the following:
Here is what I tried:
unset key
set yrange [0:]
set ytics 5
set print $data
do for [i = 1:100] { print rand(0)*10 }
unset print
binwidth = 1
set boxwidth 0.8*binwidth
# set fill style of bins
set style fill solid 0.5
# define macro for plotting the histogram
hist = 'u (binwidth*(floor(($1)/binwidth)+0.5)):(1.0) smooth freq w boxes'
density = 'u (binwidth*(floor(($1)/binwidth)+0.5)):(1.0) smooth freq with filledcurves y=0'
plot $data #density
It is mainly based on a histogram by adding with filledcurves, but a clear difference is that the resulting figure is not smooth at all.
So, how can I generate smooth density plots from a distribution? Is there any interpolation function that can be used in gnuplot?

I found kernel density estimate in gnuplot can be helpful here.
plot $data u 1:(1/100.) s kdens bandwidth 1 with filledcurves y=0

Related

Gnuplot smoothing data in loglog plot

I would like to plot a smoothed curve based on a dataset which spans over 13 orders of magnitude [1E-9:1E4] in x and 4 orders of magnitude [1E-6:1e-2] in y.
MWE:
set log x
set log y
set xrange [1E-9:1E4]
set yrange [1E-6:1e-2]
set samples 1000
plot 'data.txt' u 1:3:(1) smooth csplines not
The smooth curve looks nice above x=10. Below, it is just a straight line down to the point at x=1e-9.
When increasing samples to 1e4, smoothing works well above x=1. For samples 1e5, smoothing works well above x=0.1 and so on.
Any idea on how to apply smoothing to lower data points without setting samples to 1e10 (which does not work anyway...)?
Thanks and best regards!
JP
To my understanding sampling in gnuplot is linear. I am not aware, but maybe there is a logarithmic sampling in gnuplot which I haven't found yet.
Here is a suggestion for a workaround which is not yet perfect but may act as a starting point.
The idea is to split your data for example into decades and to smooth them separately.
The drawback is that there might be some overlaps between the ranges. These you can minimize or hide somehow when you play with set samples and every ::n or maybe there is another way to eliminate the overlaps.
Code:
### smoothing over several orders of magnitude
reset session
# create some random test data
set print $Data
do for [p=-9:3] {
do for [m=1:9:3] {
print sprintf("%g %g", m*10**p, (1+rand(0))*10**(p/12.*3.-2))
}
}
set print
set logscale x
set logscale y
set format x "%g"
set format y "%g"
set samples 100
pMin = -9
pMax = 3
set table $Smoothed
myFilter(col,p) = (column(col)/10**p-1) < 10 ? column(col) : NaN
plot for [i=pMin:pMax] $Data u (myFilter(1,i)):2 smooth cspline
unset table
plot $Data u 1:2 w p pt 7 ti "Data", \
$Smoothed u 1:2 every ::3 w l ti "cspline"
### end of code
Result:
Addition:
Thanks to #maij who pointed out that it can be simplified by simply mapping the whole range into linear space. In contrast to #maij's solution I would let gnuplot handle the logarithmic axes and keep the actual plot command as simple as possible with the extra effort of some table plots.
Code:
### smoothing in loglog plot
reset session
# create some random test data
set print $Data
do for [p=-9:3] {
do for [m=1:9:3] {
print sprintf("%g %g", m*10**p, (1+rand(0))*10**(p/12.*3.-2))
}
}
set print
set samples 500
set table $SmoothedLog
plot $Data u (log10($1)):(log10($2)) smooth csplines
set table $Smoothed
plot $SmoothedLog u (10**$1):(10**$2) w table
unset table
set logscale x
set logscale y
set format x "%g"
set format y "%g"
set key top left
plot $Data u 1:2 w p pt 7 ti "Data", \
$Smoothed u 1:2 w l lc "red" ti "csplines"
### end of code
Result:
Using a logarithmic scale basically means to plot the logarithm of a value instead of the value itself. The set logscale command tells gnuplot to do this automatically:
read the data, still linear world, no logarithm yet
calculate the splines on an equidistant grid (smooth csplines), still linear world
calculate and plot the logarithms (set logscale)
The key point is the equidistant grid. Let's say one chooses set xrange [1E-9:10000] and set samples 101. In the linear world 1e-9 compared to 10000 is approximately 0, and the resulting grid will be 1E-9 ~ 0, 100, 200, 300, ..., 9800, 9900, 10000. The first grid point is at 0, the second one at 100, and gnuplot is going to draw a straight line between them. This does not change when afterwards logarithms of the numbers are plotted.
This is what you already have noted in your question: you need 10 times more points to get a smooth curve for smaller exponents.
As a solution, I would suggest to switch the calculation of the logarithms and the calculation of the splines.
# create some random test data, code "stolen" from #theozh (https://stackoverflow.com/a/66690491)
set print $Data
do for [p=-9:3] {
do for [m=1:9:3] {
print sprintf("%g %g", m*10**p, (1+rand(0))*10**(p/12.*3.-2))
}
}
set print
# this makes the splines smoother
set samples 1000
# manually account for the logarithms in the tic labels
set format x "10^{%.0f}" # for example this format
set format y "1e{%+03.0f}" # or this one
set xtics 2 # logarithmic world, tic distance in orders of magnitude
set ytics 1
# just "read logarithm of values" from file, before calculating splines
plot $Data u (log10($1)):(log10($2)) w p pt 7 ti "Data" ,\
$Data u (log10($1)):(log10($2)) ti "cspline" smooth cspline
This is the result:

Gnuplot: oscilloscope-like line style?

Is it possible in Gnuplot to emulate the drawing style of an analogue oscilloscope, meaning thinner+dimmisher lines on larger amplitudes, like this:?
The effect you see in the oscilloscope trace is not due to amplitude, it is due to the rate of change as the trace is drawn. If you know that rate of change and can feed it to gnuplot as a third column of values, then you could use it to modulate the line color as it is drawn:
plot 'data' using 1:2:3 with lines linecolor palette z
I don't know what color palette would work best for your purpose, but here is an approximation using a function with an obvious, known, derivative.
set palette gray
set samples 1000
plot '+' using ($1):(sin($1)):(abs(cos($1))) with lines linecolor palette
For thickness variations, you could shift the curve slightly up and down, and fill the area between them.
f(x) = sin(2*x) * sin(30*x)
dy = 0.02
plot '+' u 1:(f(x)+dy):(f(x)-dy) w filledcurves ls 1 notitle
This does not allow variable colour, but the visual effect is similar.
Another approach:
As #Ethan already stated, the intensity is somehow proportional to the speed of movement, i.e. the derivative. If you have sin(x) as waveform, the derivative is cos(x). But what if you have given data? Then you have to calculate the derivative numerically.
Furthermore, depending on the background the line should fade from white (minimal derivative) to fully transparent (maximum derivative), i.e. you should change the transparency with the derivative.
Code:
### oscilloscope "imitation"
reset session
set term wxt size 500,400 butt # option butt, otherwise you will get overlap points
set size ratio 4./5
set samples 1000
set xrange[-5:5]
# create some test data
f(x) = 1.5*sin(15*x)*(cos(1.4*x)+1.5)
set table $Data
plot '+' u 1:(f($1)) w table
unset table
set xtics axis 1 format ""
set mxtics 5
set grid xtics ls -1
set yrange[-4:4]
set ytics axis 1 format ""
set mytics 5
set grid ytics ls -1
ColorScreen = 0x28a7e0
set obj 1 rect from screen 0,0 to screen 1,1 behind
set obj 1 fill solid 1.0 fc rgb ColorScreen
x0=y0=NaN
Derivative(x,y) = (dx=x-x0,x0=x,x-dx/2,dy=y-y0,y0=y,dy/dx) # approx. derivative
# get min/max derivative
set table $Dummy
plot n=0 $Data u (d=abs(Derivative($1,$2)),n=n+1,n<=2? (dmin=dmax=d) : \
(dmin>d ? dmin=d:dmin), (dmax<d?dmax=d:dmax)) w table
unset table
myColor(x,y) = (int((abs(Derivative(column(x),column(y)))-dmin)/(dmax-dmin)*0xff)<<24) +0xffffff
plot $Data u 1:2:(myColor(1,2)) w l lw 1.5 lc rgb var not
### end of code
Result:

Heatmap of points in a volume

I have (x,y,z) points with coordinates like the following figure,
I would like to color the points based on their concentration.
The idea is to make a heatmap of points but in a 3D figure.
I would appreciate very much any help possible.
Regards.
Use data values in a 4th column to index a smooth color palette
splot DATA using 1:2:3:4 with points lc palette
The gnuplot development version now supports calculation of a point density function that can in turn be used to color individual points. This depends on a new set of commands that operate on a 3D grid of voxels. Sample script and output:
set title "Gaussian 3D cloud of 3000 random samples\ncolored by local point density"
rlow = -4.0; rhigh = 4.0
set xrange [rlow:rhigh]; set yrange [rlow:rhigh]; set zrange [rlow:rhigh]
set xtics axis nomirror; set ytics axis nomirror; set ztics axis nomirror;
set xyplane at 0
set xzeroaxis lt -1; set yzeroaxis lt -1; set zzeroaxis lt -1;
set log cb; set cblabel "point density"
# define 100 x 100 x 100 voxel grid
set vgrid $vdensity size 100
vclear $vdensity
# datablock $random has previously been loaded with 3000 points
# in a spherical Gaussian distribution about the origin
# The vfill command adds 1 to each voxel in a spherical region with radius 0.33
# around each point in $random
vfill $random using 1:2:3:(0.33):(1.0)
# plot the same points colored by local point density
splot $random using 1:2:3:(voxel($1,$2,$3)) with points pt 7 ps 0.5 lc palette
Full demo here: voxel demo in gnuplot online collection

Gnuplot: Scatter plot and density

I have x- and y-data points representing a star cluster. I want to visualize the density using Gnuplot and its scatter function with overlapping points.
I used the following commands:
set style fill transparent solid 0.04 noborder
set style circle radius 0.01
plot "data.dat" u 1:2 with circles lc rgb "red"
The result:
However I want something like that
Is that possible in Gnuplot? Any ideas?
(edit: revised and simplified)
Probably a much better way than my previous answer is the following:
For each data point check how many other data points are within a radius of R. You need to play with the value or R to get some reasonable graph.
Indexing the datalines requires gnuplot>=5.2.0 and the data in a datablock (without empty lines). You can either first plot your file into a datablock (check help table) or see here:
gnuplot: load datafile 1:1 into datablock
The time for creating this graph will increase with number of points O(N^2) because you have to check each point against all others. I'm not sure if there is a smarter and faster method. The example below with 1200 datapoints will take about 4 seconds on my laptop. You basically can apply the same principle for 3D.
Script: works with gnuplot>=5.2.0
### 2D density color plot
reset session
t1 = time(0.0)
# create some random rest data
set table $Data
set samples 700
plot '+' u (invnorm(rand(0))):(invnorm(rand(0))) w table
set samples 500
plot '+' u (invnorm(rand(0))+2):(invnorm(rand(0))+2) w table
unset table
print sprintf("Time data creation: %.3f s",(t0=t1,t1=time(0.0),t1-t0))
# for each datapoint: how many other datapoints are within radius R
R = 0.5 # Radius to check
Dist(x0,y0,x1,y1) = sqrt((x1-x0)**2 + (y1-y0)**2)
set print $Density
do for [i=1:|$Data|] {
x0 = real(word($Data[i],1))
y0 = real(word($Data[i],2))
c = 0
stats $Data u (Dist(x0,y0,$1,$2)<=R ? c=c+1 : 0) nooutput
d = c / (pi * R**2) # density: points per unit area
print sprintf("%g %g %d", x0, y0, d)
}
set print
print sprintf("Time density check: %.3f sec",(t0=t1,t1=time(0.0),t1-t0))
set size ratio -1 # same screen units for x and y
set palette rgb 33,13,10
plot $Density u 1:2:3 w p pt 7 lc palette z notitle
### end of script
Result:
Would it be an option to postprocess the image with imagemagick?
# convert into a gray scale image
convert source.png -colorspace gray -sigmoidal-contrast 10,50% gray.png
# build the gradient, the heights have to sum up to 256
convert -size 10x1 gradient:white-white white.png
convert -size 10x85 gradient:red-yellow \
gradient:yellow-lightgreen \
gradient:lightgreen-blue \
-append gradient.png
convert gradient.png white.png -append full-gradient.png
# finally convert the picture
convert gray.png full-gradient.png -clut target.png
I have not tried but I am quite sure that gnuplot can plot the gray scale image directly.
Here is the (rotated) gradient image:
This is the result:
Although this question is rather "old" and the problem might have been solved differently...
It's probably more for curiosity and fun than for practical purposes.
The following code implements a coloring according to the density of points using gnuplot only. On my older computer it takes a few minutes to plot 1000 points. I would be interested if this code can be improved especially in terms of speed (without using external tools).
It's a pity that gnuplot does not offer basic functionality like sorting, look-up tables, merging, transposing or other basic functions (I know... it's gnuPLOT... and not an analysis tool).
The code:
### density color plot 2D
reset session
# create some dummy datablock with some distribution
N = 1000
set table $Data
set samples N
plot '+' u (invnorm(rand(0))):(invnorm(rand(0))) w table
unset table
# end creating dummy data
stats $Data u 1:2 nooutput
XMin = STATS_min_x
XMax = STATS_max_x
YMin = STATS_min_y
YMax = STATS_max_y
XRange = XMax-XMin
YRange = YMax-YMin
XBinCount = 20
YBinCount = 20
BinNo(x,y) = floor((y-YMin)/YRange*YBinCount)*XBinCount + floor((x-XMin)/XRange*XBinCount)
# do the binning
set table $Bins
plot $Data u (BinNo($1,$2)):(1) smooth freq # with table
unset table
# prepare final data: BinNo, Sum, XPos, YPos
set print $FinalData
do for [i=0:N-1] {
set table $Data3
plot $Data u (BinNumber = BinNo($1,$2),$1):(XPos = $1,$1):(YPos = $2,$2) every ::i::i with table
plot [BinNumber:BinNumber+0.1] $Bins u (BinNumber == $1 ? (PointsInBin = $2,$2) : NaN) with table
print sprintf("%g\t%g\t%g\t%g", XPos, YPos, BinNumber, PointsInBin)
unset table
}
set print
# plot data
set multiplot layout 2,1
set rmargin at screen 0.85
plot $Data u 1:2 w p pt 7 lc rgb "#BBFF0000" t "Data"
set xrange restore # use same xrange as previous plot
set yrange restore
set palette rgbformulae 33,13,10
set colorbox
# draw the bin borders
do for [i=0:XBinCount] {
XBinPos = i/real(XBinCount)*XRange+XMin
set arrow from XBinPos,YMin to XBinPos,YMax nohead lc rgb "grey" dt 1
}
do for [i=0:YBinCount] {
YBinPos = i/real(YBinCount)*YRange+YMin
set arrow from XMin,YBinPos to XMax,YBinPos nohead lc rgb "grey" dt 1
}
plot $FinalData u 1:2:4 w p pt 7 ps 0.5 lc palette z t "Density plot"
unset multiplot
### end of code
The result:

Fit histogram in gnuplot

I'm trying to fit data (histogram) in gnuplot. I tried various functions, and by looking at my histogram, I suppose the best fit is lognormal or gamma distribution, but I am not able to do this fit in gnuplot (Im rather new user of gnuplot).
Here is picture of histogram with gaussian distribution:
Also here is code in gnuplot:
reset
n=100 #number of intervals
max=15. #max value
min=0. #min value
width=(max-min)/n #interval width
#function used to map a value to the intervals
hist(x,width)=width*floor(x/width)
set term png #output terminal and file
set output "histogram.png"
set xrange [min:max]
set yrange [0:]
#to put an empty boundary around the
#data inside an autoscaled graph.
set offset graph 0.05,0.05,0.05,0.0
set xtics min,(max-min)/5,max
set boxwidth width*0.9
set style fill solid 0.5 #fillstyle
set tics out nomirror
set xlabel "Diameter"
set ylabel "Frequency"
#count and plot
#fac(x) = (int(x)==0) ? 1.0 : int(x) * fac(int(x)-1.0)
gauss(x)=a/(sqrt(2*pi)*sigma)*exp(-(x-mean)**2/(2*sigma**2))
fit gauss(x) 'hist.temp' u 1:2 via a, sigma, mean
plot 'data.list' u (hist($8, width)):(1.0) smooth freq w boxes lc rgb "green" notitle, \
gauss(x) w lines ls 2 lw 2
In file hist.temp is tabular output ( see this link )

Resources