gnuplot histogram bins divided by volume - gnuplot

I am simulating points in a spherical volume of radius 1. I generated 1,000,000 Monte Carlo points in this volume. To make a gnuplot histogram, I calculated the length of each position vector (every length is between 0 and 1). With 100 bins, the histogram looks like this:
(image: gnuplot data histogram)
If you are wondering why no points beyond 0.91 are generated: I don't know either, but that is not the question here.
This is my gnuplot code:
n=100 #number of intervals
max=1.0 #max value
min=0.0 #min value
width=(max-min)/n #interval width
#function used to map a value to the intervals
hist(x,width)=width*floor(x/width)+width/2.0
#settings
set xlabel "Radius"
set ylabel "Primarys/Intervall"
set xrange [-0.1:1.1]
set yrange [0:32000]
set boxwidth width*0.8
set style fill solid 0.5 #fillstyle
set tics out nomirror
#plot
plot "primaryPosition(1).csv" u (hist($1,width)):(1.0) smooth freq w boxes lc rgb"green"
In general, the volume of a sphere grows with r^3.
In my histogram every spherical shell is one bin, and there are 100 bins. So, as the bin number increases, the volume of each spherical shell grows cubically (with r^3). From this point of view, the histogram looks good.
But what I want to do is plot the density of points per volume: points/shell volume.
This should be a linear distribution from the center of the sphere to its border.
How can I tell gnuplot to divide each bin by its corresponding volume, which depends on the outer and the inner radius of each spherical shell?
The formula is: (4/3)pi(R^3-r^3), where R is the outer and r the inner radius of a shell.

The following example creates some random test data: 20,000 uniformly distributed random points in the cube [-1,1]^3, of which only those inside the unit sphere are kept (about 52%, since the volume ratio is pi/6).
One possibility is to first create your histogram data by binning into a table and then divide each bin by the volume of its shell.
By the way, in gnuplot the volume of a spherical shell should be written as (4./3)*pi*(R**3-r**3): with integer operands, 4/3 is evaluated as integer division and yields 1. You may also want to fine-tune the binning to your exact needs.
Code:
### histogram normalized by sphere shell volume
reset session
set view equal xyz
# create some test data
set print $Data
do for [i=1:20000] {
x = rand(0)*2-1
y = rand(0)*2-1
z = rand(0)*2-1
r = sqrt(x**2 + y**2 + z**2)
if (r <= 1) { print sprintf("%g %g %g %g",x,y,z,r) }
}
set print
n = 100 # number of intervals
min = 0.0 # min value
max = 1.0 # max value
myWidth=(max-min)/n # interval width
bin(x)=myWidth*floor(x/myWidth)
ShellVolume(r) = (4./3)*pi*((r+myWidth)**3-r**3)
set boxwidth myWidth absolute
set table $Histo
plot $Data u (bin($4)):(1) smooth freq
unset table
set multiplot layout 2,1
plot $Histo u 1:2 w boxes ti "Occurrences"
plot $Histo u 1:($2/ShellVolume($1)) w boxes ti "Density"
unset multiplot
### end of code
Result:
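As a cross-check outside gnuplot, the same normalization can be sketched in Python (used here only for illustration; the names shell_volume, radial_density and sample_radii are hypothetical and not part of the answer above). The sketch draws test points by rejection sampling as in the gnuplot script, counts the radii per bin and divides each count by the volume of its shell. For uniformly distributed points the density should come out roughly constant, apart from statistical noise in the innermost bins.

```python
import math
import random

def shell_volume(r_inner, r_outer):
    """Volume of a spherical shell between r_inner and r_outer."""
    return 4.0 / 3.0 * math.pi * (r_outer**3 - r_inner**3)

def radial_density(radii, n_bins, r_max=1.0):
    """Return (inner_radius, count, count / shell_volume) for each bin."""
    width = r_max / n_bins
    counts = [0] * n_bins
    for r in radii:
        idx = min(int(r / width), n_bins - 1)  # r == r_max goes into the last bin
        counts[idx] += 1
    return [(i * width, c, c / shell_volume(i * width, (i + 1) * width))
            for i, c in enumerate(counts)]

def sample_radii(n_tries, seed=1):
    """Rejection sampling: draw from the cube [-1,1]^3, keep points with r <= 1."""
    rng = random.Random(seed)
    radii = []
    for _ in range(n_tries):
        x, y, z = (rng.uniform(-1, 1) for _ in range(3))
        r = math.sqrt(x * x + y * y + z * z)
        if r <= 1.0:
            radii.append(r)
    return radii

radii = sample_radii(20000)
rows = radial_density(radii, n_bins=10)
expected = len(radii) / shell_volume(0.0, 1.0)  # mean density over the whole sphere
for r_in, count, dens in rows:
    print(f"{r_in:.1f} {count:6d} {dens:10.1f}")
print(f"expected density: {expected:.1f}")
```

Only 10 bins are used here so that each shell holds enough points for a stable estimate; with 100 bins the innermost shells contain very few points and their density values fluctuate strongly.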

Related

Gnuplot smoothing data in loglog plot

I would like to plot a smoothed curve based on a dataset which spans over 13 orders of magnitude [1E-9:1E4] in x and 4 orders of magnitude [1E-6:1e-2] in y.
MWE:
set log x
set log y
set xrange [1E-9:1E4]
set yrange [1E-6:1e-2]
set samples 1000
plot 'data.txt' u 1:3:(1) smooth csplines not
The smooth curve looks nice above x=10. Below, it is just a straight line down to the point at x=1e-9.
When increasing samples to 1e4, smoothing works well above x=1. For samples 1e5, smoothing works well above x=0.1 and so on.
Any idea on how to apply smoothing to lower data points without setting samples to 1e10 (which does not work anyway...)?
Thanks and best regards!
JP
To my understanding, sampling in gnuplot is linear. There may be a logarithmic sampling option that I am not aware of.
Here is a suggestion for a workaround which is not yet perfect but may act as a starting point.
The idea is to split your data for example into decades and to smooth them separately.
The drawback is that there might be some overlaps between the ranges. You can minimize or hide these by playing with set samples and every ::n, or there may be another way to eliminate the overlaps.
Code:
### smoothing over several orders of magnitude
reset session
# create some random test data
set print $Data
do for [p=-9:3] {
do for [m=1:9:3] {
print sprintf("%g %g", m*10**p, (1+rand(0))*10**(p/12.*3.-2))
}
}
set print
set logscale x
set logscale y
set format x "%g"
set format y "%g"
set samples 100
pMin = -9
pMax = 3
set table $Smoothed
myFilter(col,p) = (column(col)/10**p-1) < 10 ? column(col) : NaN
plot for [i=pMin:pMax] $Data u (myFilter(1,i)):2 smooth cspline
unset table
plot $Data u 1:2 w p pt 7 ti "Data", \
$Smoothed u 1:2 every ::3 w l ti "cspline"
### end of code
Result:
Addition:
Thanks to @maij, who pointed out that it can be simplified by mapping the whole range into linear space. In contrast to @maij's solution, I would let gnuplot handle the logarithmic axes and keep the actual plot command as simple as possible, at the cost of some extra table plots.
Code:
### smoothing in loglog plot
reset session
# create some random test data
set print $Data
do for [p=-9:3] {
do for [m=1:9:3] {
print sprintf("%g %g", m*10**p, (1+rand(0))*10**(p/12.*3.-2))
}
}
set print
set samples 500
set table $SmoothedLog
plot $Data u (log10($1)):(log10($2)) smooth csplines
set table $Smoothed
plot $SmoothedLog u (10**$1):(10**$2) w table
unset table
set logscale x
set logscale y
set format x "%g"
set format y "%g"
set key top left
plot $Data u 1:2 w p pt 7 ti "Data", \
$Smoothed u 1:2 w l lc "red" ti "csplines"
### end of code
Result:
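The principle of interpolating in log space and mapping the result back with 10**x can also be sketched outside gnuplot. The following Python snippet is only an illustration under assumptions: the name interp_loglog is hypothetical, and plain piecewise-linear interpolation is swapped in for csplines to keep the sketch short. A power law is a straight line in log-log space, so this interpolation reproduces it even far below the largest x value.

```python
import math
from bisect import bisect_right

def interp_loglog(xs, ys, x):
    """Interpolate y(x) linearly in (log10 x, log10 y) space, then map back.

    xs must be sorted in ascending order; all values must be positive.
    """
    lx = [math.log10(v) for v in xs]
    ly = [math.log10(v) for v in ys]
    t = math.log10(x)
    # index of the segment containing t, clamped to the valid range
    i = min(max(bisect_right(lx, t) - 1, 0), len(lx) - 2)
    frac = (t - lx[i]) / (lx[i + 1] - lx[i])
    return 10.0 ** (ly[i] + frac * (ly[i + 1] - ly[i]))

# test data: one point per decade on a power law y = 1e-3 * x**0.25
xs = [10.0 ** p for p in range(-9, 5)]
ys = [1e-3 * x ** 0.25 for x in xs]

x_query = 3e-7
print(interp_loglog(xs, ys, x_query), 1e-3 * x_query ** 0.25)
```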
Using a logarithmic scale basically means plotting the logarithm of a value instead of the value itself. The set logscale command tells gnuplot to do this automatically:
read the data, still linear world, no logarithm yet
calculate the splines on an equidistant grid (smooth csplines), still linear world
calculate and plot the logarithms (set logscale)
The key point is the equidistant grid. Let's say one chooses set xrange [1E-9:10000] and set samples 101. In the linear world 1e-9 compared to 10000 is approximately 0, and the resulting grid will be 1E-9 ~ 0, 100, 200, 300, ..., 9800, 9900, 10000. The first grid point is at 0, the second one at 100, and gnuplot is going to draw a straight line between them. This does not change when afterwards logarithms of the numbers are plotted.
This is what you already noted in your question: each additional decade toward smaller x needs 10 times more samples to get a smooth curve there.
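The effect of the equidistant grid can be made concrete with a few lines of Python (an illustration only; linear_grid, log_grid and count_below are hypothetical helper names). With 101 samples on [1E-9:10000], the linear grid places a single point below x=50, while a grid that is equidistant in log10(x) spreads its points evenly over all decades.

```python
import math

def linear_grid(x_min, x_max, n):
    """n equidistant sample positions between x_min and x_max."""
    step = (x_max - x_min) / (n - 1)
    return [x_min + i * step for i in range(n)]

def log_grid(x_min, x_max, n):
    """n sample positions that are equidistant in log10(x)."""
    lo, hi = math.log10(x_min), math.log10(x_max)
    step = (hi - lo) / (n - 1)
    return [10.0 ** (lo + i * step) for i in range(n)]

def count_below(grid, limit):
    """Number of grid points smaller than limit."""
    return sum(1 for x in grid if x < limit)

lin = linear_grid(1e-9, 1e4, 101)
log = log_grid(1e-9, 1e4, 101)
print("linear grid, points below x=50:", count_below(lin, 50.0))
print("log grid,    points below x=50:", count_below(log, 50.0))
```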
As a solution, I would suggest swapping the order of the two calculations: take the logarithms first, then calculate the splines.
# create some random test data, code "stolen" from @theozh (https://stackoverflow.com/a/66690491)
set print $Data
do for [p=-9:3] {
do for [m=1:9:3] {
print sprintf("%g %g", m*10**p, (1+rand(0))*10**(p/12.*3.-2))
}
}
set print
# this makes the splines smoother
set samples 1000
# manually account for the logarithms in the tic labels
set format x "10^{%.0f}" # for example this format
set format y "1e{%+03.0f}" # or this one
set xtics 2 # logarithmic world, tic distance in orders of magnitude
set ytics 1
# just "read logarithm of values" from file, before calculating splines
plot $Data u (log10($1)):(log10($2)) w p pt 7 ti "Data" ,\
$Data u (log10($1)):(log10($2)) ti "cspline" smooth cspline
This is the result:

Gnuplot: oscilloscope-like line style?

Is it possible in gnuplot to emulate the drawing style of an analogue oscilloscope, meaning thinner and dimmer lines at larger amplitudes, like this?
The effect you see in the oscilloscope trace is not due to amplitude, it is due to the rate of change as the trace is drawn. If you know that rate of change and can feed it to gnuplot as a third column of values, then you could use it to modulate the line color as it is drawn:
plot 'data' using 1:2:3 with lines linecolor palette z
I don't know what color palette would work best for your purpose, but here is an approximation using a function with an obvious, known, derivative.
set palette gray
set samples 1000
plot '+' using ($1):(sin($1)):(abs(cos($1))) with lines linecolor palette
For thickness variations, you could shift the curve slightly up and down, and fill the area between them.
f(x) = sin(2*x) * sin(30*x)
dy = 0.02
plot '+' u 1:(f(x)+dy):(f(x)-dy) w filledcurves ls 1 notitle
This does not allow variable colour, but the visual effect is similar.
Another approach:
As @Ethan already stated, the intensity is related to the speed of movement, i.e. the derivative. If you have sin(x) as waveform, the derivative is cos(x). But what if you are given data instead of a function? Then you have to calculate the derivative numerically.
Furthermore, depending on the background the line should fade from white (minimal derivative) to fully transparent (maximum derivative), i.e. you should change the transparency with the derivative.
Code:
### oscilloscope "imitation"
reset session
set term wxt size 500,400 butt # option butt, otherwise you will get overlap points
set size ratio 4./5
set samples 1000
set xrange[-5:5]
# create some test data
f(x) = 1.5*sin(15*x)*(cos(1.4*x)+1.5)
set table $Data
plot '+' u 1:(f($1)) w table
unset table
set xtics axis 1 format ""
set mxtics 5
set grid xtics ls -1
set yrange[-4:4]
set ytics axis 1 format ""
set mytics 5
set grid ytics ls -1
ColorScreen = 0x28a7e0
set obj 1 rect from screen 0,0 to screen 1,1 behind
set obj 1 fill solid 1.0 fc rgb ColorScreen
x0=y0=NaN
Derivative(x,y) = (dx=x-x0,x0=x,x-dx/2,dy=y-y0,y0=y,dy/dx) # approx. derivative
# get min/max derivative
set table $Dummy
plot n=0 $Data u (d=abs(Derivative($1,$2)),n=n+1,n<=2? (dmin=dmax=d) : \
(dmin>d ? dmin=d:dmin), (dmax<d?dmax=d:dmax)) w table
unset table
myColor(x,y) = (int((abs(Derivative(column(x),column(y)))-dmin)/(dmax-dmin)*0xff)<<24) +0xffffff
plot $Data u 1:2:(myColor(1,2)) w l lw 1.5 lc rgb var not
### end of code
Result:
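The color mapping used in this approach can be illustrated in isolation. The Python sketch below (hypothetical names segment_slopes and fade_color, not part of the gnuplot script) computes the absolute slope of each segment and packs it into an 0xAARRGGBB value in the way lc rgb variable expects, where alpha 0x00 is opaque and 0xff is fully transparent.

```python
def segment_slopes(xs, ys):
    """Absolute slope |dy/dx| of each segment between consecutive points."""
    return [abs((y1 - y0) / (x1 - x0))
            for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])]

def fade_color(slope, s_min, s_max, rgb=0xffffff):
    """Pack 0xAARRGGBB: alpha 0x00 (opaque) at s_min, 0xff (transparent) at s_max."""
    alpha = int((slope - s_min) / (s_max - s_min) * 0xff)
    return (alpha << 24) + rgb

# tiny made-up example: flat, moderate and steep segment
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 0.0, 1.0, 3.0]
slopes = segment_slopes(xs, ys)
colors = [fade_color(s, min(slopes), max(slopes)) for s in slopes]
for s, c in zip(slopes, colors):
    print(f"slope {s:.1f} -> {c:#010x}")
```

The flattest segment stays opaque white and the steepest one becomes fully transparent, which mimics the dimmer trace where the beam moves fastest.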

Discrete heat map with GNUPLOT

I'm trying to make something as a heat map with GNUPLOT but I need that my palette takes discrete colors for defined values.
I mean, my data file has three columns, for example:
x y value
0.0 0.0 10
0.0 0.5 2
0.0 1.0 2
0.5 1.0 10
1.0 0.0 -1
1.0 1.0 -1
I need each point to have one color depending on its value. A traditional heat map blends the points into regions of continuous color, but I need it in a discrete form.
If your data forms a "matrix", i.e., there are M x-samples, N y-samples, and you have the data for all MxN points, then probably the easiest solution is to use
plot ... w rgbimage u 1:2:(r($3)):(g($3)):(b($3))
and supply the r,g,b values as three additional columns as shown above.
However, if your data is "sparse" (only some of the samples are available as shown in your question) and there are not many points, one might be tempted to generate the elementary squares forming the plot manually. To this end, one could proceed as:
set terminal png enhanced
set output 'plot.png'
#custom value -> color mapping
rgb(r, g, b) = 65536 * int(r) + 256 * int(g) + int(b)
fn(val) = rgb(100 + val*10, 0, 0)
#square size
delta = 0.5
set xr [-delta/2:1+delta/2]
set yr [-delta/2:1+delta/2]
set xtics 0,delta/2,1 out nomirror
set ytics 0,delta/2,1 out nomirror
set format x "%.2f"
set format y "%.2f"
set size ratio 1
unset key
fName="test.dat"
load sprintf("<gawk -v d=%f -f parse.awk %s", delta, fName)
plot fName u 1:2:3 w labels tc rgb 'white'
This script assumes the presence of auxiliary gawk script parse.awk in the same directory:
{
printf "set object rectangle from %f,%f to %f,%f fc rgb fn(%d) fs solid\n",
$1-d/2, $2-d/2, $1+d/2, $2+d/2, $3
}
This script accepts the required square size (-v d=%f in the invocation of gawk) and generates, for each point, a statement that creates the corresponding square. These statements are subsequently executed by the load command.
Mapping of the colors is done via the function fn defined in the main Gnuplot script. It takes the passed value and generates a rgb value which is then used with fc rgb in the rectangle specification.
Together, this then produces:
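If gawk is not available, the same commands could be generated with any other scripting language. The following Python sketch mirrors parse.awk and the two color functions from the gnuplot script. It is only an illustration: rectangle_command is a hypothetical name, and the PALETTE table with its three colors is an assumed example of a strictly discrete value-to-color mapping, not part of the answer above.

```python
def rgb(r, g, b):
    """Pack three 0..255 components into one 0xRRGGBB integer."""
    return 65536 * int(r) + 256 * int(g) + int(b)

def fn(val):
    """Same value -> color mapping as fn() in the gnuplot script."""
    return rgb(100 + val * 10, 0, 0)

# assumed alternative: one fixed color per allowed value
PALETTE = {-1: 0x0000ff, 2: 0x00ff00, 10: 0xff0000}

def rectangle_command(x, y, value, d):
    """Build the gnuplot statement that parse.awk prints for one data point."""
    return ("set object rectangle from %f,%f to %f,%f fc rgb fn(%d) fs solid"
            % (x - d / 2, y - d / 2, x + d / 2, y + d / 2, value))

data = [(0.0, 0.0, 10), (0.0, 0.5, 2), (0.0, 1.0, 2),
        (0.5, 1.0, 10), (1.0, 0.0, -1), (1.0, 1.0, -1)]
for x, y, value in data:
    print(rectangle_command(x, y, value, d=0.5))
```

The printed lines could be written to a file and executed with load, in the same way as the gawk output.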
This might do what you want, after some fiddling:
set view map
set style fill transparent solid noborder
splot 'data' u 1:2:3:(100+200*$3) pt 5 lc rgbcolor var ps 14
The pt 5 will plot a square (at least in the x11 term) at each point in the datafile, colored according to a transformation on the last column.

Gnuplot: Scatter plot and density

I have x- and y-data points representing a star cluster. I want to visualize the density using Gnuplot and its scatter function with overlapping points.
I used the following commands:
set style fill transparent solid 0.04 noborder
set style circle radius 0.01
plot "data.dat" u 1:2 with circles lc rgb "red"
The result:
However, I want something like this:
Is that possible in Gnuplot? Any ideas?
(edit: revised and simplified)
Probably a much better way than my previous answer is the following:
For each data point, check how many other data points are within a radius R. You need to play with the value of R to get a reasonable graph.
Indexing the datalines requires gnuplot>=5.2.0 and the data in a datablock (without empty lines). You can either first plot your file into a datablock (check help table) or see here:
gnuplot: load datafile 1:1 into datablock
The time for creating this graph will increase with number of points O(N^2) because you have to check each point against all others. I'm not sure if there is a smarter and faster method. The example below with 1200 datapoints will take about 4 seconds on my laptop. You basically can apply the same principle for 3D.
Script: works with gnuplot>=5.2.0
### 2D density color plot
reset session
t1 = time(0.0)
# create some random test data
set table $Data
set samples 700
plot '+' u (invnorm(rand(0))):(invnorm(rand(0))) w table
set samples 500
plot '+' u (invnorm(rand(0))+2):(invnorm(rand(0))+2) w table
unset table
print sprintf("Time data creation: %.3f s",(t0=t1,t1=time(0.0),t1-t0))
# for each datapoint: how many other datapoints are within radius R
R = 0.5 # Radius to check
Dist(x0,y0,x1,y1) = sqrt((x1-x0)**2 + (y1-y0)**2)
set print $Density
do for [i=1:|$Data|] {
x0 = real(word($Data[i],1))
y0 = real(word($Data[i],2))
c = 0
stats $Data u (Dist(x0,y0,$1,$2)<=R ? c=c+1 : 0) nooutput
d = c / (pi * R**2) # density: points per unit area
print sprintf("%g %g %g", x0, y0, d)
}
set print
print sprintf("Time density check: %.3f sec",(t0=t1,t1=time(0.0),t1-t0))
set size ratio -1 # same screen units for x and y
set palette rgb 33,13,10
plot $Density u 1:2:3 w p pt 7 lc palette z notitle
### end of script
Result:
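The counting step is easy to verify outside gnuplot. This Python sketch (local_density is a hypothetical name) uses the same brute-force O(N^2) approach: for every point it counts all points within radius R, including the point itself as the gnuplot script does, and divides by the area of the circle.

```python
import math

def local_density(points, radius):
    """Return (x, y, density) per point; density = neighbours within radius / circle area."""
    area = math.pi * radius ** 2
    result = []
    for x0, y0 in points:
        # brute force: compare against every point, the point itself included
        count = sum(1 for x1, y1 in points
                    if math.hypot(x1 - x0, y1 - y0) <= radius)
        result.append((x0, y0, count / area))
    return result

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
for x, y, dens in local_density(points, radius=0.5):
    print(f"{x:g} {y:g} {dens:.3f}")
```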
Would it be an option to postprocess the image with imagemagick?
# convert into a gray scale image
convert source.png -colorspace gray -sigmoidal-contrast 10,50% gray.png
# build the gradient, the heights have to sum up to 256
convert -size 10x1 gradient:white-white white.png
convert -size 10x85 gradient:red-yellow \
gradient:yellow-lightgreen \
gradient:lightgreen-blue \
-append gradient.png
convert gradient.png white.png -append full-gradient.png
# finally convert the picture
convert gray.png full-gradient.png -clut target.png
I have not tried but I am quite sure that gnuplot can plot the gray scale image directly.
Here is the (rotated) gradient image:
This is the result:
Although this question is rather "old" and the problem might have been solved differently...
It's probably more for curiosity and fun than for practical purposes.
The following code implements a coloring according to the density of points using gnuplot only. On my older computer it takes a few minutes to plot 1000 points. I would be interested if this code can be improved especially in terms of speed (without using external tools).
It's a pity that gnuplot does not offer basic functionality like sorting, look-up tables, merging, transposing or other basic functions (I know... it's gnuPLOT... and not an analysis tool).
The code:
### density color plot 2D
reset session
# create some dummy datablock with some distribution
N = 1000
set table $Data
set samples N
plot '+' u (invnorm(rand(0))):(invnorm(rand(0))) w table
unset table
# end creating dummy data
stats $Data u 1:2 nooutput
XMin = STATS_min_x
XMax = STATS_max_x
YMin = STATS_min_y
YMax = STATS_max_y
XRange = XMax-XMin
YRange = YMax-YMin
XBinCount = 20
YBinCount = 20
BinNo(x,y) = floor((y-YMin)/YRange*YBinCount)*XBinCount + floor((x-XMin)/XRange*XBinCount)
# do the binning
set table $Bins
plot $Data u (BinNo($1,$2)):(1) smooth freq # with table
unset table
# prepare final data: BinNo, Sum, XPos, YPos
set print $FinalData
do for [i=0:N-1] {
set table $Data3
plot $Data u (BinNumber = BinNo($1,$2),$1):(XPos = $1,$1):(YPos = $2,$2) every ::i::i with table
plot [BinNumber:BinNumber+0.1] $Bins u (BinNumber == $1 ? (PointsInBin = $2,$2) : NaN) with table
print sprintf("%g\t%g\t%g\t%g", XPos, YPos, BinNumber, PointsInBin)
unset table
}
set print
# plot data
set multiplot layout 2,1
set rmargin at screen 0.85
plot $Data u 1:2 w p pt 7 lc rgb "#BBFF0000" t "Data"
set xrange restore # use same xrange as previous plot
set yrange restore
set palette rgbformulae 33,13,10
set colorbox
# draw the bin borders
do for [i=0:XBinCount] {
XBinPos = i/real(XBinCount)*XRange+XMin
set arrow from XBinPos,YMin to XBinPos,YMax nohead lc rgb "grey" dt 1
}
do for [i=0:YBinCount] {
YBinPos = i/real(YBinCount)*YRange+YMin
set arrow from XMin,YBinPos to XMax,YBinPos nohead lc rgb "grey" dt 1
}
plot $FinalData u 1:2:4 w p pt 7 ps 0.5 lc palette z t "Density plot"
unset multiplot
### end of code
The result:
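For comparison, the binning idea is sketched here in Python, where a dictionary lookup replaces the per-point table plots. The names bin_no and points_with_bin_counts are hypothetical. One deliberate difference from the gnuplot BinNo function is an assumption of this sketch: indices are clamped, so that points lying exactly on the maximum x or y value fall into the last bin instead of spilling over into the next row.

```python
import math
from collections import Counter

def bin_no(x, y, x_min, x_max, y_min, y_max, nx=20, ny=20):
    """Row-major bin index; points on the upper edge are clamped into the last bin."""
    ix = min(math.floor((x - x_min) / (x_max - x_min) * nx), nx - 1)
    iy = min(math.floor((y - y_min) / (y_max - y_min) * ny), ny - 1)
    return iy * nx + ix

def points_with_bin_counts(points, nx=20, ny=20):
    """Return (x, y, bin_number, points_in_bin) for every point."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    limits = (min(xs), max(xs), min(ys), max(ys))
    bins = [bin_no(x, y, *limits, nx=nx, ny=ny) for x, y in points]
    counts = Counter(bins)  # number of points per bin
    return [(x, y, b, counts[b]) for (x, y), b in zip(points, bins)]

sample = [(0.0, 0.0), (0.01, 0.01), (1.0, 1.0)]
for row in points_with_bin_counts(sample):
    print(row)
```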

gnuplot: How to increase the width of my graph

I am using 'gnuplot' to plot a line graph with point:
set style data linespoints
set xlabel "number"
set ylabel "Dollars"
set yrange [0:250]
How can I increase the width of my graph, so that as I have more x values, the graph becomes more of a rectangle instead of a square?
And how can I change the tic interval of my y-axis? Right now it just draws a mark at every 50 on the y-axis.
It sounds like you want your output to dynamically adjust in size to the data being plotted. Here is a script that does that:
#!/usr/bin/env gnuplot
# don't make any output just yet
set terminal unknown
# plot the data file to get information on ranges
plot 'data.dat' title 'My Moneys'
# span of data in x and y
xspan = GPVAL_DATA_X_MAX - GPVAL_DATA_X_MIN
yspan = GPVAL_DATA_Y_MAX - GPVAL_DATA_Y_MIN
# define the values in x and y you want to be one 'equivalent:'
# that is, xequiv units in x and yequiv units in y will make a square plot
xequiv = 100
yequiv = 250
# aspect ratio of plot
ar = yspan/xspan * xequiv/yequiv
# dimension of plot in x and y (pixels)
# for constant height make ydim constant
ydim = 200
xdim = ydim/ar
# set the y tic interval
set ytics 100
# set the x and y ranges
set xrange [GPVAL_DATA_X_MIN:GPVAL_DATA_X_MAX]
set yrange [GPVAL_DATA_Y_MIN:GPVAL_DATA_Y_MAX]
# set the labels
set title 'Dollars in buckets'
set xlabel 'number'
set ylabel 'Dollars'
set terminal png size xdim,ydim
set output 'test.png'
set size ratio ar
set style data linespoints
replot
For these example data:
0 50
50 150
100 400
150 500
200 300
I get the following plot:
It is about square, as it should be (I defined 100 units in x to be equal to 250 units in y, and the data span the range [(0,200),(50,500)]). If I add another data point (400,300), the output file is wider, as expected:
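The numbers behind both plots can be checked by hand or with a short Python sketch (plot_size is a hypothetical helper that repeats the arithmetic of the script, with the same xequiv, yequiv and ydim values). For the five example points the aspect ratio comes out at 0.9, and adding the point (400,300) halves it, which doubles the width.

```python
def plot_size(points, xequiv=100.0, yequiv=250.0, ydim=200.0):
    """Return (aspect_ratio, xdim, ydim) as computed in the gnuplot script."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    xspan = max(xs) - min(xs)
    yspan = max(ys) - min(ys)
    ar = yspan / xspan * xequiv / yequiv
    return ar, ydim / ar, ydim

data = [(0, 50), (50, 150), (100, 400), (150, 500), (200, 300)]
print(plot_size(data))
print(plot_size(data + [(400, 300)]))
```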
To answer your other question, you can set the y tic increment thus:
set ytics <INCREMENT>
The script above gives an example.
To add to the discussion here, there's also set size ratio ... so that you can set the aspect ratio of your plot.
Here's an excerpt from help set size:
ratio causes gnuplot to try to create a graph with an aspect ratio of <r>
(the ratio of the y-axis length to the x-axis length) within the portion of
the plot specified by <xscale> and <yscale>.
The meaning of a negative value for <r> is different. If <r>=-1, gnuplot
tries to set the scales so that the unit has the same length on both the x
and y axes (suitable for geographical data, for instance). If <r>=-2, the
unit on y has twice the length of the unit on x, and so on.
For this to really work, you'll probably need to set the output driver to some reasonable size:
set term png size 800,400 #800 pixels by 400 pixels
or:
set term post size 8,4 #8 inches by 4 inches
This is all terminal-dependent, so it's worth looking up the terminal's help to see what units it uses, etc.
set xrange[:]
set yrange[:]
Use those 2 commands to define the 'size' of your graph ;)
