Gnuplot smooth curve through frequency points + filled area under curve - gnuplot

I am a gnuplot-newbie and am stuck with the following situation. Based on this I have a gnuplot script as follows:
clear
reset
set key off
set border 3
set style fill solid 1.0 noborder
bin_width = 0.01;
set boxwidth bin_width absolute
bin_number(x) = floor(x/bin_width)
rounded(x) = bin_width * ( bin_number(x) + 0.5 )
plot '1000randomValuesBetween0and1.dat' using (rounded($1)):(1) smooth frequency
That was a good first step, but I would like a smooth curve through the points generated by the frequency count. Plotting with filledcurves fell short in two ways. First, the curve is not smoothed (I would prefer something like bezier, which cannot be used after with). Second, the filling is done in a way I did not expect and that does not fit my needs. See this picture.
To give a little more context: I ultimately want to use this to generate violin plots with gnuplot without having to do the binning beforehand, so that I can just give my script a single-column data file and be ready to go.
EDIT: As another first step I tried adapting the "normal" density plot from this demo, but I failed. I read in the documentation that the bandwidth should be 1/#points, which would be 0.001 in my case, so I tried this:
set border 3 front lt black linewidth 1.000 dashtype solid
set style increment default
set style data filledcurves
set xtics border in scale 0,0 nomirror norotate autojustify
set xtics norangelimit 0.00000,0.5,1.0
set title "Same data - kernel density"
set title font ",15" norotate
plot 'random01.dat' using 1:(1) smooth kdensity bandwidth 0.001 with filledcurves above y lt 9
which results in this picture.
Omitting the bandwidth or using lower or higher values didn't solve the issue.
The plot specifies using 1:(1) because I have only a single column: according to the documentation the first value should be this column, and the second value specifies a weighting, which should be 1/#points.
EDIT2: Setting the bandwidth to the ideal value or not setting it at all always yields the same result; changing the weighting changes nothing except the scale of the y-axis.
My data are 1000 values in a range between 0 and 1 (created randomly for testing purposes).
Here is the new plot:
EDIT3: Zooming out may show another aspect of the problem: the plot seems to extend outside the interval of the given values (I checked the values and none are <0 or >1). Here's the graph:

The demo 'violinplot.dem', included with the gnuplot distribution package and also available online, shows how to do what you want using the combination "smooth kdensity" and "with filledcurves" applied to unbinned data.
Online version here: violin plot demo
Notes:
You misread the documentation: 1/N is not the recommended bandwidth; it is the normalized uniform weight. The plot you showed initially looks like the bandwidth was set far too low. What is the range of values in your data?
I suggest letting the program calculate the "ideal" bandwidth for you and then adjusting it afterwards if you think it is too large. The ideal value is stored in GPVAL_KDENSITY_BANDWIDTH. Increasing the bandwidth will make the envelope smoother; decreasing it will emphasize local spikes.
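To make the difference between weight and bandwidth concrete, here is a minimal Python sketch of a Gaussian kernel density estimate. It is not gnuplot's implementation: the function names (kdensity, area, spread), the generated data and the bandwidth values 0.05 and 0.001 are assumptions chosen for illustration. Each data point contributes one Gaussian of width bandwidth scaled by its weight, so a uniform weight of 1/N makes the total area about 1, while the bandwidth only controls how smooth the envelope is.

```python
import math
import random

def kdensity(data, weights, bandwidth, x):
    """Weighted sum of Gaussian kernels evaluated at x (illustrative sketch)."""
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    return sum(w * math.exp(-0.5 * ((x - d) / bandwidth) ** 2) / norm
               for d, w in zip(data, weights))

def area(data, weights, bandwidth, lo=-1.0, hi=2.0, steps=600):
    """Approximate the area under the density with the midpoint rule."""
    dx = (hi - lo) / steps
    return sum(kdensity(data, weights, bandwidth, lo + (i + 0.5) * dx) * dx
               for i in range(steps))

def spread(data, weights, bandwidth):
    """Difference between highest and lowest density on a grid inside [0.2, 0.8]."""
    grid = [0.2 + 0.01 * i for i in range(61)]
    values = [kdensity(data, weights, bandwidth, x) for x in grid]
    return max(values) - min(values)

random.seed(0)
data = [random.random() for _ in range(1000)]   # stand-in for the 1000 test values
weights = [1.0 / len(data)] * len(data)          # 1/N is the weight, not the bandwidth

total_area = area(data, weights, 0.05)               # close to 1 because weights sum to 1
tail_density = kdensity(data, weights, 0.05, -0.02)  # nonzero outside the data range
spiky = spread(data, weights, 0.001)                 # very small bandwidth: local spikes
smooth = spread(data, weights, 0.05)                 # larger bandwidth: smoother envelope
```

In this sketch the density is still positive slightly below 0 because every Gaussian has tails, which would be one plausible reason for a kernel density curve extending beyond the interval of the data.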

Related

Getting smooth curve with gnuplot

I'm not getting a smooth curve in gnuplot.
This is my code:
set style line 3 lc rgb '#09ad00' lt 1 lw 1.5 #green
set style line 1 lc rgb '#0060ad' lt 1 lw 2 #blue
set style line 2 lc rgb '#dd181f' lt 1 lw 2 #red
plot [-1:1] f1(x) with line ls 3,f2(x) with line ls 1,f1(x)+f2(x) with line ls 2
I'm getting this plot
while I'm expecting this type of curve
You haven't shown what your particular functions are, but this is almost certainly a sampling problem. Gnuplot doesn't really draw curves for functions - it actually computes the functions at multiple points and connects them with straight lines, similarly to what would happen if you were plotting a data file. The number of points that it computes is user settable.
Suppose that I do plot sin(x) and see this:
Here the sampling rate is set pretty low. We can look at the individual points in order to see what is going on.
To improve this, I need to increase the sampling rate with the set samples command. The default is 100 (in 5.0 patch level 6). Depending on how rapidly the function changes, higher values may be needed. I usually set it to around 1000 with set samples 1000. This changes the graph to
which produces a much nicer smooth curve. Again, this is just a bunch of points connected by straight lines, but when there are a lot of these, it looks like a smooth continuous curve.
We can look at the individual points again (using a sampling rate of 100 as 1000 is too many to clearly see the points)
We can also see here that there is not much difference between the graph with 1000 points and 100 points. In the case of a sine curve, 100 is enough to see a smooth graph, but with a faster changing curve, we may need more.
The set samples command takes (optionally) two values, but the second value is only used for 3d plots. You can find out more with the help samples command.
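The effect of the sample count can be illustrated outside gnuplot as well. The following Python sketch samples a function at evenly spaced points, joins the samples with straight segments, and measures how far the segments stray from the true function. The helper name max_segment_error, the interval [-10, 10] and the sample counts are assumptions made for this example, not anything gnuplot exposes.

```python
import math

def max_segment_error(f, x_min, x_max, samples, checks_per_segment=20):
    """Largest gap between f and the straight segments joining its samples."""
    xs = [x_min + (x_max - x_min) * i / (samples - 1) for i in range(samples)]
    ys = [f(x) for x in xs]
    points = list(zip(xs, ys))
    worst = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        for k in range(1, checks_per_segment):
            t = k / checks_per_segment
            x = x0 + t * (x1 - x0)
            line_y = y0 + t * (y1 - y0)   # height of the drawn segment at x
            worst = max(worst, abs(f(x) - line_y))
    return worst

coarse = max_segment_error(math.sin, -10.0, 10.0, 10)     # visibly jagged
default = max_segment_error(math.sin, -10.0, 10.0, 100)   # already small for sin
fine = max_segment_error(math.sin, -10.0, 10.0, 1000)     # smaller still
```

For a slowly varying function such as sin the error at 100 samples is already small, which matches the observation that 100 and 1000 samples look nearly identical; a function that changes faster would need more samples to reach the same error.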

Gnuplot's Graphic Jump

Is there a way to avoid drawing the near-vertical line at the asymptote of a function such as 1/(2-x) without using conditional plotting? The idea is to draw iterated functions based on this one, and since the asymptote changes, conditional plotting isn't a good solution.
You can plot with points at a very high sampling rate:
set yrange [-10:10]
set samples 100000
plot 1/(2-x) with points
If the singularity occurs at different values of x you can use conditional plotting on y:
f(x)=1/(2-x)
set samples 1000
plot (abs(f(x)) < 10 ? f(x) : 1/0) with lines
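The reason the condition on y still works when the asymptote moves can be sketched in Python. The test is applied to the function value rather than to x, so no knowledge of where the singularity sits is needed. The helper name clipped and the use of NaN as the "skip this point" marker (playing the role of 1/0 in the gnuplot expression) are assumptions of this sketch.

```python
import math

def f(x):
    return 1.0 / (2.0 - x)

def clipped(func, x, limit=10.0):
    """Return func(x), or NaN when the value is undefined or too large."""
    try:
        y = func(x)
    except ZeroDivisionError:
        return math.nan          # undefined point, like 1/0 in gnuplot
    return y if abs(y) < limit else math.nan

def iterated(x):
    """f applied twice; its singularities differ from those of f itself."""
    return f(f(x))

xs = [-10.0 + 20.0 * i / 999 for i in range(1000)]
ys = [clipped(f, x) for x in xs]            # gaps appear near x = 2
ys_iter = [clipped(iterated, x) for x in xs]  # gaps appear wherever f(f(x)) blows up
```

The same clipped wrapper is reused unchanged for the iterated function, which is the point of conditioning on y instead of x.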

Auto-scale setting the window too small GNUPLOT

I have this code:
set title "Ex1.txt"
set key title "Legenda"
set key inside right top vertical Right reverse enhanced autotitle box opaque
set key noinvert samplen 1 spacing 1 width 0 height 0
set style fill transparent solid 0.50 noborder
set parametric
set trange[0:]
set xrange[0:]
set yrange[0:]
set grid
set terminal pngcairo
set output 'ev.png'
plot [0:][0:] t, -2.0*t+(2000.0) with filledcurves x1, t, -0.6666666666666666*t+(1333.3333333333333) with filledcurves x1, t, -0.8333333333333334*t+(0.0),t, -0.8333333333333334*t+(1333.3333333333333),t, -0.8333333333333334*t+(1416.6666666666667),
unset output
unset terminal
unset parametric
exit
But when I run this script, the x range of the plot goes from 0 to 5. It really is supposed to start at zero, but if it only goes up to 5 I can barely distinguish the lines.
Here you can see the output of the code as it is:
If I change the x scale to [0:700] it goes like this:
As you can see, this is much better because all the lines can be distinguished. The problem is that I can't hard-code the maximum of the range, because the equations may differ: this script is the output of a Java program of mine, and while 700 works in this case, in another exercise the optimal value could be 300. Is there a way to make gnuplot determine the maximum of the x axis itself, instead of autoscaling as it does now (stopping at x=5)?
Thanks in advance
No, there is no way to do that. gnuplot uses a default fixed range for all functions (unless set to another value by a range command), and it has no idea what you might find interesting (do you want to see where the curves intersect, or do you want to see where they intersect the axes - gnuplot wouldn't know), so can't highlight any such features.
Additionally, although it does have some features to analyze curves and such, it is far from being a mathematical workspace, and would have no way to find such interesting points. gnuplot is designed to graph data, not manipulate it.
If you have to drive this from another program, you are going to have to have that program do the analysis and figure out what range to use. I have several Python programs that use gnuplot to graph data, but the Python code figures out what the ranges need to be and adds the command to the gnuplot call.
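As a rough illustration of letting the driving program pick the range, here is a Python sketch. It is only one possible heuristic, not something gnuplot provides: it finds where the lines cross, adds a margin, and emits a range command for the generated script. The function names, the (slope, intercept) representation, the 1.4 margin and the fallback of 10 are all assumptions. Because the script in the question is parametric with x = t, the sketch emits a trange command; limiting t is what limits x there.

```python
from itertools import combinations

def intersection_x(line_a, line_b):
    """x where two lines given as (slope, intercept) cross, or None if parallel."""
    (m1, b1), (m2, b2) = line_a, line_b
    if m1 == m2:
        return None
    return (b2 - b1) / (m1 - m2)

def trange_command(lines, margin=1.4, fallback=10.0):
    """Build a 'set trange' command that covers every positive crossing."""
    crossings = []
    for a, b in combinations(lines, 2):
        x = intersection_x(a, b)
        if x is not None and x > 0:
            crossings.append(x)
    t_max = margin * max(crossings) if crossings else fallback
    return "set trange [0:%g]" % t_max

# The two filled lines from the question, written as y = slope * x + intercept.
constraints = [(-2.0, 2000.0), (-0.6666666666666666, 1333.3333333333333)]
command = trange_command(constraints)
```

The resulting string would be written into the gnuplot script before the plot command. Whether crossings, axis intercepts or something else define the "interesting" region depends on the exercise, which is exactly the decision gnuplot cannot make on its own.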

gnuplot: filling the whole space when plotting sampled data

I have a problem with gnuplot. I've searched and I don't find the correct solution. I'm plotting some data arranged in three columns with the command splot, and the steps in x and y are different. The plot I get with:
set view map
splot 'data.dat' using 1:2:3 with points palette
is:
and I would like the white space to be filled, making each tile size adapt, avoiding interpolation.
Some ideas are given here: Reduce distance between points in splot.
I've tried http://gnuplot.sourceforge.net/demo/heatmaps.html too, but plotting with image doesn't seem to work :(
I should avoid pointsize as my grid changes from time to time.
You can try
set pm3d map interpolate 1,1 corners2color c1
splot 'data.dat' using 1:($2-5e-5):3
This uses no interpolation, and the color of each polygon depends on the value of corner 'c1'. You may need to test if this is the correct one, or if you need 'c2', 'c3', or 'c4'.
Another solution to my problem, better than this one for some terminals at least, is given in the answers to my other question about the appearance of maps in the pdfcairo terminal, where the solution is to use plot with image instead of this splot. I tried that before, as I mention here, but maybe it also needed this specific data format.

Histogram in logarithmic scale in gnuplot

I have to plot a histogram in logarithmic scale on both axes using gnuplot. I need the bins to be equally spaced in log10. Using a logarithmic scale on the y axis isn't a problem. The main problem is creating the bins on the x axis. For example, using 10 bins in log10, the first bins will be [1],[2],[3]....[10 - 19][20 - 29].....[100 190] and so on. I've searched on the net but I couldn't find any practical solution. If doing this in gnuplot is too complicated, could you suggest some other software or language to do it?
As someone asked I will explain more specifically what I need to do. I have a (huge) list like this:
1 14000000
2 7000000
3 6500000
.
.
.
.
6600 1
8900 1
15000 1
19000 1
It shows, for example, that 14 million IP addresses have sent 1 packet, 7 million have sent 2 packets.... 1 IP address has sent 6600 packets, ... , 1 IP address has sent 19000 packets. As you can see, the values on both axes are pretty high, so I cannot plot it without a logarithmic scale.
The first thing I tried, because I needed to do it fast, was plotting this list as it is with gnuplot, setting logscale on both axes and using boxes. The result is understandable but not very appropriate. In fact, the boxes become thinner and thinner going right on the x axis because, obviously, there are more points in 10-100 than in 1-10! So it becomes a real mess after the second decade.
I tried plotting a histogram with both axes logarithmically scaled and gnuplot threw the error
Log scale on X is incompatible with histogram plots.
So it appears that gnuplot does not support a log scale on the x axis with histograms.
Plotting in log-log scale in GnuPlot is perfectly doable contrary to the other post in this thread.
One can set the log-log scale in GnuPlot with the command set logscale.
The assumption is that we have a file with strictly positive values on both the x-axis and the y-axis. For example, the following file is a valid file:
1 0.5
2 0.2
3 0.15
4 0.05
After setting the log-log scale one can plot the file with the command:
plot "file.txt" w p
where of course file.txt is the name of the file. This command will generate the output with points.
Note also that plotting boxes is tricky and is probably not recommended. One first has to restrict the x-range with a command of the form set xrange [1:4] and only then plot with boxes. Otherwise, when the x-range is undefined an error is returned. I am assuming that in this case plot requires (for appropriate x-values) some boxes to have size log(0), which of course is undefined and hence the error is returned.
Hope it is clear and it will also help others.
Have you tried Matplotlib with Python? Matplotlib is a really nice plotting library and when used with Python's simple syntax, you can plot things quite easily:
import matplotlib.pyplot as plt

figure = plt.figure()
axis = figure.add_subplot(1, 1, 1)
axis.set_xscale('log')  # the question asks for a logarithmic x axis as well
axis.set_yscale('log')
# Rest of plotting code
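The binning step itself, which the question asks about, can be done before plotting. Below is a minimal Python sketch that sums (value, count) pairs into bins of equal width in log10. The function name log_bins, the bins_per_decade parameter and the output layout (left edge, right edge, total) are assumptions of this example, and the sample list contains only the rows visible in the question, not the elided ones.

```python
import math
from collections import defaultdict

def log_bins(pairs, bins_per_decade=10):
    """Sum counts into bins of equal width in log10(x); returns sorted rows."""
    totals = defaultdict(int)
    for x, count in pairs:
        # Bin index grows by one every 1/bins_per_decade of a decade.
        index = math.floor(math.log10(x) * bins_per_decade)
        totals[index] += count
    rows = []
    for index in sorted(totals):
        left = 10 ** (index / bins_per_decade)
        right = 10 ** ((index + 1) / bins_per_decade)
        rows.append((left, right, totals[index]))
    return rows

# Only the rows shown in the question: (packets sent, number of IP addresses).
sample = [(1, 14000000), (2, 7000000), (3, 6500000),
          (6600, 1), (8900, 1), (15000, 1), (19000, 1)]
rows = log_bins(sample, bins_per_decade=1)   # one bin per decade for readability
```

Each row could then be written to a file and drawn as a box spanning its left and right edge, in whichever plotting tool is used, with both axes on a logarithmic scale. Empty bins are simply absent from the output, which avoids taking the logarithm of zero on the y axis.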
