I have to plot a histogram with logarithmic scales on both axes using gnuplot. I need the bins to be equally spaced in log10. Using a logarithmic scale on the y axis isn't a problem; the main problem is creating the bins on the x axis. For example, using 10 bins in log10, the first bins would be [1], [2], [3], ..., [10 - 19], [20 - 29], ..., [100 - 199] and so on. I've searched the net but couldn't find any practical solution. If doing this in gnuplot is too complicated, could you suggest some other software or language for it?
As someone asked I will explain more specifically what I need to do. I have a (huge) list like this:
1 14000000
2 7000000
3 6500000
.
.
.
.
6600 1
8900 1
15000 1
19000 1
It shows, for example, that 14 million IP addresses have sent 1 packet, 7 million have sent 2 packets, ..., 1 IP address has sent 6600 packets, ..., and 1 IP address has sent 19000 packets. As you can see, the values on both axes are pretty high, so I cannot plot them without a logarithmic scale.
The first thing I tried, because I needed to do it fast, was plotting this list as it is in gnuplot with boxes, setting logscale on both axes. The result is understandable but not very appropriate: the boxes become thinner and thinner going right along the x axis because, obviously, there are more points in 10-100 than in 1-10. So it becomes a real mess after the second decade.
I tried plotting a histogram with both axes logarithmically scaled, and gnuplot threw the error
Log scale on X is incompatible with histogram plots.
So it appears that gnuplot does not support a log scale on the x axis with histograms.
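One possible workaround is to do the binning outside gnuplot and then plot the pre-binned table on log-log axes. The following is only a minimal sketch in Python, under some assumptions: the function name log_bin and the parameter bins_per_decade are made up for illustration, the sample list contains only the rows shown in the question (the elided rows are left out), and the bins are truly equal-width in log10 rather than the [10 - 19], [20 - 29] style ranges mentioned in the question.

```python
import math
from collections import defaultdict


def log_bin(pairs, bins_per_decade=10):
    """Sum counts into bins that are equally wide in log10(x).

    pairs: iterable of (x, count) with x > 0.
    Returns a list of (left_edge, right_edge, total_count), sorted by edge.
    """
    totals = defaultdict(int)
    for x, count in pairs:
        # Small epsilon guards against log10 landing just below a bin edge.
        index = math.floor(math.log10(x) * bins_per_decade + 1e-9)
        totals[index] += count
    rows = []
    for index in sorted(totals):
        left = 10 ** (index / bins_per_decade)
        right = 10 ** ((index + 1) / bins_per_decade)
        rows.append((left, right, totals[index]))
    return rows


# Only the rows visible in the question; the real list is much longer.
sample = [(1, 14000000), (2, 7000000), (3, 6500000),
          (6600, 1), (8900, 1), (15000, 1), (19000, 1)]

# One bin per decade keeps the toy output short.
binned = log_bin(sample, bins_per_decade=1)
for left, right, total in binned:
    print(f"{left:g} {right:g} {total}")
```

Each output row holds the two bin edges and the summed count. One option is to write these rows to a file and plot the count against the geometric mean of the edges with logscale set on both axes.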
Plotting in log-log scale in GnuPlot is perfectly doable contrary to the other post in this thread.
One can set the log-log scale in GnuPlot with the command set logscale.
Then, the assumption is that we have a file with positive (strictly non-zero) values both in the x-axis, as well as the y-axis. For example, the following file is a valid file:
1 0.5
2 0.2
3 0.15
4 0.05
After setting the log-log scale one can plot the file with the command:
plot "file.txt" w p
where of course file.txt is the name of the file. This command will generate the output with points.
Note also that plotting boxes is tricky and is probably not recommended. One first has to restrict the x-range with a command of the form set xrange [1:4] and only then plot with boxes. Otherwise, when the x-range is undefined an error is returned. I am assuming that in this case plot requires (for appropriate x-values) some boxes to have size log(0), which of course is undefined and hence the error is returned.
Hope it is clear and it will also help others.
Have you tried Matplotlib with Python? Matplotlib is a really nice plotting library and when used with Python's simple syntax, you can plot things quite easily:
import matplotlib.pyplot as plot
figure = plot.figure()
axis = figure.add_subplot(1, 1, 1)
# The question asks for logarithmic scales on both axes
axis.set_xscale('log')
axis.set_yscale('log')
# Rest of plotting code
I am a gnuplot-newbie and am stuck with the following situation. Based on this I have a gnuplot script as follows:
clear
reset
set key off
set border 3
set style fill solid 1.0 noborder
bin_width = 0.01;
set boxwidth bin_width absolute
bin_number(x) = floor(x/bin_width)
rounded(x) = bin_width * ( bin_number(x) + 0.5 )
plot '1000randomValuesBetween0and1.dat' using (rounded($1)):(1) smooth frequency
That was a good first step, but I would like to have a smooth curve through the points generated by counting the frequency. Plotting with filledcurves fell short in two ways. First, it is not smoothed (I would prefer something like bezier, which is not usable after with); second, the filling is done in a way that I did not expect and that doesn't fit my needs. See this picture.
To give a little more context: I ultimately want to use this to generate violin plots with gnuplot without having to do the binning beforehand, so that I can just give my script a single-column data file and am ready to go.
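For readers unfamiliar with the binning idiom in the script above, here is a rough Python sketch of what bin_number, rounded and smooth frequency compute together: every value is mapped to the centre of its bin, and the counts of values sharing a bin are summed. The helper name frequency and the sample values are made up for illustration; they are not the data from the question.

```python
import math
from collections import Counter

BIN_WIDTH = 0.01  # same width as in the gnuplot script


def bin_number(x, width=BIN_WIDTH):
    """Index of the bin that x falls into."""
    return math.floor(x / width)


def rounded(x, width=BIN_WIDTH):
    """Centre of the bin that x falls into."""
    return width * (bin_number(x, width) + 0.5)


def frequency(values, width=BIN_WIDTH):
    """Count values per bin; returns (bin_centre, count) sorted by centre."""
    counts = Counter(bin_number(v, width) for v in values)
    return [(width * (k + 0.5), counts[k]) for k in sorted(counts)]


# Hypothetical values between 0 and 1, chosen only to show the grouping.
values = [0.003, 0.007, 0.123, 0.128, 0.129, 0.505]
table = frequency(values)
for centre, count in table:
    print(f"{centre:.3f} {count}")
```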
EDIT: I tried adapting the "normal" density plot from this demo as another first step, but I failed. I read in the documentation that bandwidth should be 1/#points, so it should be 0.001 in my case, meaning I tried this:
set border 3 front lt black linewidth 1.000 dashtype solid
set style increment default
set style data filledcurves
set xtics border in scale 0,0 nomirror norotate autojustify
set xtics norangelimit 0.00000,0.5,1.0
set title "Same data - kernel density"
set title font ",15" norotate
plot 'random01.dat' using 1:(1) smooth kdensity bandwidth 0.001 with filledcurves above y lt 9
which results in this picture.
Setting no bandwidth or lower/higher values didn't solve the issue.
The plot specifies using 1:(1) because I have just a single column; according to the documentation, the first value is that column and the second value specifies a weighting, which should be 1/#points.
EDIT2: Setting bandwidth to the ideal value or not setting it at all always yields the same result; changing the weighting changes nothing except the scale of the y axis.
My data are 1000 values in a range between 0 and 1 (created randomly for testing purposes).
Here is the new plot
EDIT3: zooming out may show another aspect of the problem as the plot seems to extend outside the interval of the given values (I checked the values and there are no examples <0 or >1). Here's the graph:
The demo 'violinplot.dem' included with the gnuplot distribution package and also available online shows how to do what you want using the combination "smooth kdensity" and "with filledcurve" applied to unbinned data.
Online version here: violin plot demo
Notes:
You mis-read the documentation. 1/N is not the recommended bandwidth, it is the normalized uniform weight. The plot you showed initially looks like the bandwidth was set far too low. What is the range of values in your data?
I suggest letting the program calculate the "ideal" bandwidth for you and then adjusting it afterwards if you think it is too large. The ideal value is stored in GPVAL_KDENSITY_BANDWIDTH. Increasing the bandwidth will make the envelope smoother; decreasing it will emphasize local spikes.
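To make the distinction between weight and bandwidth concrete, here is a small Python sketch of a Gaussian kernel density estimate. It is an illustration under assumptions, not gnuplot's actual implementation: the function names are invented, the data are freshly generated uniform random numbers rather than the asker's file, and the bandwidth uses a common rule of thumb for normally distributed data, which may differ from the value gnuplot stores in GPVAL_KDENSITY_BANDWIDTH.

```python
import math
import random


def kdensity(data, x, bandwidth, weight):
    """Sum of Gaussian kernels centred on each data value, evaluated at x."""
    scale = weight / (bandwidth * math.sqrt(2.0 * math.pi))
    return scale * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in data)


def rule_of_thumb_bandwidth(data):
    """Rule-of-thumb bandwidth: sigma * (4 / (3 N)) ** (1/5)."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
    return sigma * (4.0 / (3.0 * n)) ** 0.2


random.seed(0)
data = [random.random() for _ in range(1000)]  # stand-in for the real file

h = rule_of_thumb_bandwidth(data)
weight = 1.0 / len(data)  # 1/N normalises the area, it is not the bandwidth

# Riemann sum of the density over [-1, 2]
step = 0.01
grid = [-1.0 + i * step for i in range(301)]
area = sum(kdensity(data, x, h, weight) for x in grid) * step

# The kernels have non-zero width, so the curve leaks outside [0, 1].
outside = kdensity(data, -0.05, h, weight)
print(h, area, outside)
```

With a weight of 1/N the area under the curve comes out close to 1, while the bandwidth only controls how wide each kernel is. The non-zero value at x = -0.05 also shows why the plotted curve extends beyond the range of the data.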
I'm not getting a smooth curve in gnuplot.
This is my code:
set style line 3 lc rgb '#09ad00' lt 1 lw 1.5 #green
set style line 1 lc rgb '#0060ad' lt 1 lw 2 #blue
set style line 2 lc rgb '#dd181f' lt 1 lw 2 #red
plot [-1:1] f1(x) with line ls 3,f2(x) with line ls 1,f1(x)+f2(x) with line ls 2
I'm getting this plot
while I'm expecting this type of curve
You haven't shown what your particular functions are, but this is almost certainly a sampling problem. Gnuplot doesn't really draw curves for functions - it actually computes the functions at multiple points and connects them with straight lines, similarly to what would happen if you were plotting a data file. The number of points that it computes is user settable.
Suppose that I do plot sin(x) and see this:
Here the sampling rate is set pretty low. We can look at the individual points in order to see what is going on.
In order to improve this, I need to increase the sampling rate by using the set samples command. The default is 100 (in 5.0 patch level 6). Depending on how rapidly the function changes, higher values may be needed. I usually set it to around 1000 with set samples 1000. This changes the graph to
which produces a much nicer smooth curve. Again, this is just a bunch of points connected by straight lines, but when there are a lot of these, it looks like a smooth continuous curve.
We can look at the individual points again (using a sampling rate of 100 as 1000 is too many to clearly see the points)
We can also see here that there is not much difference between the graph with 1000 points and 100 points. In the case of a sine curve, 100 is enough to see a smooth graph, but with a faster changing curve, we may need more.
The set samples command takes (optionally) two values, but the second value is only used for 3d plots. You can find out more with the help samples command.
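The effect of the sample count can be checked numerically. The Python sketch below mimics the idea described above (evaluate the function at evenly spaced points and join them with straight lines) and measures how far the straight segments stray from sin(x) at the segment midpoints. The function name max_midpoint_error and the range [-10, 10] are assumptions made for the illustration.

```python
import math


def max_midpoint_error(func, x_min, x_max, samples):
    """Largest gap between func and the connecting chords, at segment midpoints."""
    step = (x_max - x_min) / (samples - 1)
    worst = 0.0
    for i in range(samples - 1):
        a = x_min + i * step
        b = a + step
        chord_mid = 0.5 * (func(a) + func(b))  # value of the straight line
        true_mid = func(0.5 * (a + b))         # value of the function itself
        worst = max(worst, abs(true_mid - chord_mid))
    return worst


errors = {n: max_midpoint_error(math.sin, -10.0, 10.0, n) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(n, err)
```

The error shrinks quickly as the number of samples grows, which matches the observation that 100 samples already look smooth for a sine curve while faster-changing functions may need more.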
Is there a way to avoid drawing the near-vertical line at the asymptote of a function such as 1/(2-x) without using conditional plotting? The idea is to draw iterated functions based on this one and, since the asymptote changes, conditional plotting isn't a good solution.
You can plot with points at a very high sampling rate:
set yrange [-10:10]
set samples 100000
plot 1/(2-x) with points
If the singularity occurs at different values of x you can use conditional plotting on y:
f(x)=1/(2-x)
set samples 1000
plot (abs(f(x)) < 10 ? f(x) : 1/0) with lines
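The conditional expression above replaces every value whose magnitude is too large with an undefined value (1/0), which breaks the line at that point. A rough Python equivalent, using NaN as the undefined marker, could look like this. The helper name masked and the limit of 10 are illustrative assumptions; because the test is on y rather than x, it keeps working when the position of the singularity moves.

```python
import math


def f(x):
    return 1.0 / (2.0 - x)


def masked(func, x, limit=10.0):
    """Return func(x), or NaN where the value is undefined or too large."""
    try:
        y = func(x)
    except ZeroDivisionError:
        return math.nan  # exactly on the singularity
    return y if abs(y) < limit else math.nan


# Sample the masked function on [0, 4]; NaN entries mark where the line breaks.
xs = [i * 0.05 for i in range(81)]
ys = [masked(f, x) for x in xs]
gaps = sum(1 for y in ys if math.isnan(y))
print(gaps, "samples masked near the asymptote")
```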
I am trying to fit this plot. As you can see, the fit is not very good for the data.
My code is:
clear
reset
set terminal pngcairo size 1000,600 enhanced font 'Verdana,10'
set output 'LocalEnergyStepZoom.png'
set ylabel '{/Symbol D}H/H_0'
set xlabel 'n_{step}'
set format y '%.2e'
set xrange [*:*]
set yrange [1e-16:*]
f(x) = a*x**b
fit f(x) "revErrEnergyGfortCaotic.txt" via a,b
set logscale
plot 'revErrEnergyGfortCaotic.txt' w p,\
'revErrEnergyGfortRegular.txt' w p,\
f(x) w l lc rgb "black" lw 3
exit
So the question is: what mistake am I making here? I would expect that, in the log-log plane, a fit of the form I put in the code should represent the data very well.
Thanks a lot
I was finally able to solve the problem using the suggestion in the answer of Christop, modified just a bit.
I found the approximate slope of the function (something near -4). Keeping that parameter fixed, I fitted the curve with only a; having found a, I fixed it and varied only b. After that, using the output as the starting solution for the fit, I found the best fit.
You must find appropriate starting values to get a correct fit, because that kind of fitting doesn't have one global solution.
If you don't define a and b, both are set to 1 which might be too far away. Try using
a = 100
b = -3
for a better start. Maybe you need to tweak those values a bit more; I couldn't because I don't have the data file.
Also, you might want to restrict the region of the fitting to the part above 10:
fit [10:] f(x) "revErrEnergyGfortCaotic.txt" via a,b
Of course, only if that is appropriate.
This is a common issue in data analysis, and I'm not certain if there's a nice Gnuplot way to solve it.
The issue is that the penalty functions in standard fitting routines are typically the sum of squares of errors, and try as you might, if your data have a lot of dynamic range, the errors for the smallest y-values come out to essentially zero from the point of view of the algorithm.
I recently taught a course to students where they needed to fit such data. Lots of them beat their (matlab) fitting routines into submission by choosing very stringent convergence criteria, but even this did not help too much.
What you really need to do, if you want to fit this power-law tail well, is to convert the data into log-log form and run a linear regression on that log-log representation.
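The log-log regression can be sketched in a few lines of Python. This is a minimal illustration, not the asker's analysis: the function name fit_power_law is made up, and the data are synthetic points generated from y = 100 * x**-4 (values loosely inspired by the starting guesses and slope mentioned elsewhere in this thread), so the fit should simply recover those numbers.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares line through (log10 x, log10 y); returns (a, b) for y = a * x**b."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = 10 ** (mean_y - b * mean_x)     # intercept in log-log space = log10(a)
    return a, b


# Synthetic, noise-free data spanning many orders of magnitude in y.
xs = [10.0 * 2 ** k for k in range(12)]
ys = [100.0 * x ** -4 for x in xs]

a, b = fit_power_law(xs, ys)
print(a, b)
```

Because the regression works on logarithms, every decade of y carries comparable weight, which is exactly what the plain sum-of-squares fit on the raw values lacks.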
The main problem here is that the residual errors of the function values of the higher x are very small compared to the residuals at lower x values. After all, you almost span 20 orders of magnitude on the y axis.
Just weight the y values with 1/y**2, or even better: if you have the standard deviations of your data points weight the values with 1/std**2. Then the fit should converge much much better.
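In gnuplot, weighting is done through a third data column, which fit interprets as the standard deviation s of each point and weights with 1/s**2. To weight with 1/y**2, pass the y value itself as that column:
fit f(x) 'data' using 1:2:2 via ...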
Or you can use Raman Shah's advice and linearize the y axis and do a linear regression.
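A tiny numerical illustration of why the weighting matters, in Python with made-up numbers: suppose a model is 50% too high at every point of a data set spanning 16 orders of magnitude. Without weights, the squared residual of the smallest point is negligible; with 1/y**2 weights every point contributes equally.

```python
# Hypothetical y values spanning many orders of magnitude
ys = [1.0, 1e-8, 1e-16]
# A model that overshoots every point by 50 percent
model = [1.5 * y for y in ys]

# Plain squared residuals, as used by an unweighted fit
unweighted = [(m - y) ** 2 for m, y in zip(model, ys)]

# Squared residuals weighted by 1/y**2, i.e. squared relative errors
weighted = [((m - y) / y) ** 2 for m, y in zip(model, ys)]

for y, u, w in zip(ys, unweighted, weighted):
    print(f"y={y:g}  unweighted={u:g}  weighted={w:g}")
```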
You need to use weights for your fit (currently low values are not considered as important) and a better starting guess (via "pars_file.pars").
I have been wondering about this for a while, and it might already be implemented in gnuplot but I haven't been able to find info online.
When you have a data file, it is possible to exchange the axes and assign the "dummy variable", say x, (in gnuplot's help terminology) to the vertical axis:
plot "data" u 1:2 # x goes to horizontal axis, standard
plot "data" u 2:1 # x goes to vertical axis, exchanged axes
However, when you have a function, you need to resort to a parametric function to do this. Imagine you want to plot x = y² (as opposed to y = x²), then (as far as I know) you need to do:
set parametric
plot t**2,t
which works nicely in this case. I think however that a more flexible approach would be desirable, something like
plot x**2 axes y1x1 # this doesn't work!
Is something like the above implemented, or is there an easy way to use y as dummy variable without the need to set parametric?
So here is another ugly, but gnuplot-only variant: Use the special filename '+' to generate a dynamic data set for plotting:
plot '+' using ($1**2):1
The development version contains a new feature, which allows you to use dummy variables instead of column numbers for plotting with '+':
plot sample [y=-10:10] '+' using (y**2):(y)
I guess that's what comes closest to your request.
From what I have seen, parametric plots are pretty common in order to achieve your needs.
If you really hate parametric plots and are not afraid of a VERY ugly solution, I can give you my method...
My trick is to use a data file filled with a sequence of numbers. To fit your example, let's make a file sq with a sequence of reals from -10 to 10 :
seq -10 .5 10 > sq
And then you can do the magic you want using gnuplot :
plot 'sq' u ($1**2):($1)
And if you use Linux you can also put the command directly in the command line:
plot '< seq -10 .5 10' u ($1**2):($1)
I want to add that I'm not proud of this solution and I'd love the "axis y1x1" functionality too.
As far as I know there is no way to simply invert or exchange the axes in gnuplot when plotting a function.
The reason comes from the way functions are plotted in the normal plotting mode. There is a set of points at even intervals along the x axis which are sampled (frequency set by set samples) and the function value computed. This only allows for well-behaved functions; one y-value per x-value.
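The workarounds in this thread all amount to the same thing: sample the independent variable along the vertical axis, evaluate the function there, and emit the pairs with the columns swapped. A short Python sketch of that idea follows; the name sample_swapped is invented for illustration, and 41 samples on [-10, 10] reproduce the step of 0.5 used in the seq example.

```python
def sample_swapped(func, y_min, y_max, samples):
    """Evaluate x = func(y) at evenly spaced y and return (x, y) pairs."""
    step = (y_max - y_min) / (samples - 1)
    points = []
    for i in range(samples):
        y = y_min + i * step
        points.append((func(y), y))  # x comes from the function, y is sampled
    return points


# x = y**2, sampled along the vertical axis
points = sample_swapped(lambda y: y ** 2, -10.0, 10.0, 41)
for x, y in points[:3]:
    print(x, y)
```

The first and last pairs share the same x value with different y values, which is precisely the case that sampling along the x axis cannot represent.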