Better understanding histograms in Gnuplot

In gnuplot, you can create a histogram like
binwidth=#whatever#
set boxwidth binwidth
bin(x,width)=width*round(x/width)
plot "gaussian.data" u (bin($1,binwidth)):(1.0/10000) smooth freq w boxes
Here, I am interested in a probability histogram, hence the 1.0/10000.
I have spent a lot of time reading the gnuplot documentation on using. What I understand is that I am telling gnuplot to plot data from gaussian.data using certain values for x and y. In fact, when I open the data file associated with the plot command (obtained by writing a temporary file), I see that the y values are 1/10000, as expected. But then the x and y values change. It seems like there is something dynamic about it. I do not quite understand this behavior of using. Could anyone please guide me?

In case anyone else would like further explanation, see:
http://psy.swansea.ac.uk/staff/carter/gnuplot/gnuplot_frequency.htm
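
As a rough illustration of the "dynamic" behavior asked about above: the using expressions are evaluated once per input row, and smooth freq then replaces all points that share the same x value with a single point whose y is the sum of their y values. The sketch below writes the points gnuplot would draw to a text file so they can be inspected. The bin width and the output file name binned.dat are assumed values, and floor(x/width + 0.5) is used here as a rounding that works in any gnuplot version.

```gnuplot
# Sketch only: binwidth and "binned.dat" are assumed, not taken from the question.
binwidth = 0.25
bin(x,width) = width*floor(x/width + 0.5)   # nearest multiple of width

# Redirect the processed points to a text file instead of drawing them.
set table "binned.dat"
plot "gaussian.data" using (bin($1,binwidth)):(1.0/10000) smooth freq
unset table

# binned.dat now holds one row per occupied bin:
# the bin position and the summed y, i.e. (samples in that bin)/10000.
```

Each input row contributes 1/10000 at its bin position, so the summed y of a bin is the fraction of the 10000 samples that fell into it.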

Related

Fit log-log data with gnuplot

I am trying to fit the data in this plot. As you can see, the fit does not match the data well.
My code is:
clear
reset
set terminal pngcairo size 1000,600 enhanced font 'Verdana,10'
set output 'LocalEnergyStepZoom.png'
set ylabel '{/Symbol D}H/H_0'
set xlabel 'n_{step}'
set format y '%.2e'
set xrange [*:*]
set yrange [1e-16:*]
f(x) = a*x**b
fit f(x) "revErrEnergyGfortCaotic.txt" via a,b
set logscale
plot 'revErrEnergyGfortCaotic.txt' w p,\
'revErrEnergyGfortRegular.txt' w p,\
f(x) w l lc rgb "black" lw 3
exit
So the question is: what mistake am I making here? I would expect a fit of the form used in the code to represent the data very well in a log-log plane.
Thanks a lot
In the end I was able to solve the problem using the suggestion in the answer of Christop, modified slightly.
I found the approximate slope of the function (something near -4). Keeping this parameter fixed, I fitted the curve with only a free. I then fixed a at the value found and fitted only b. Finally, using that output as the starting point for a fit of both parameters, I found the best fit.

You must find appropriate starting values to get a correct fit, because this kind of nonlinear fitting does not have one global solution that is reached from every starting point.
If you don't define a and b, both are set to 1 which might be too far away. Try using
a = 100
b = -3
for a better start. Maybe you need to tweak those values a bit more; I couldn't because I don't have the data file.
Also, you might want to restrict the region of the fitting to the part above 10:
fit [10:] f(x) "revErrEnergyGfortCaotic.txt" via a,b
Of course, only do this if it is appropriate for your data.

This is a common issue in data analysis, and I'm not certain if there's a nice Gnuplot way to solve it.
The issue is that the penalty functions in standard fitting routines are typically the sum of squares of errors, and try as you might, if your data have a lot of dynamic range, the errors for the smallest y-values come out to essentially zero from the point of view of the algorithm.
I recently taught a course to students where they needed to fit such data. Lots of them beat their (matlab) fitting routines into submission by choosing very stringent convergence criteria, but even this did not help too much.
What you really need to do, if you want to fit this power-law tail well, is to convert the data into log-log form and run a linear regression on that log-log representation.
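
A minimal gnuplot sketch of that log-log regression, assuming columns 1 and 2 of the data file from the question are both strictly positive. The names g, m and c and the starting values are illustrative assumptions, not part of the original post.

```gnuplot
# Fit a straight line to log(y) versus log(x).
g(x) = m*x + c
m = -4        # assumed rough slope
c = 0         # assumed starting intercept
fit g(x) "revErrEnergyGfortCaotic.txt" using (log($1)):(log($2)) via m,c

# Convert the line back into the power law f(x) = a*x**b.
a = exp(c)
b = m
f(x) = a*x**b
```

Because log() is the natural logarithm in gnuplot, exp(c) recovers the prefactor a. Note that fitting the logarithms gives every decade of the data a comparable influence, which is a different error model from a direct least-squares fit of y.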

The main problem here is that the residual errors of the function values of the higher x are very small compared to the residuals at lower x values. After all, you almost span 20 orders of magnitude on the y axis.
Just weight the y values with 1/y**2 or, even better, if you have the standard deviations of your data points, weight the values with 1/std**2. The fit should then converge much better.
In gnuplot, weighting is done using a third data column. gnuplot reads that column as the standard deviation s of each y value and weights the point by 1/s**2, so a weight of 1/y**2 corresponds to passing the y column itself as s:
fit f(x) 'data' using 1:2:2 via ...
Or you can follow Raman Shah's advice: take the logarithm of both x and y and do a linear regression.

You need to use weights for your fit (currently the low values are not considered as important) and to provide a better starting guess (for example from a parameter file, via "pars_file.pars").

Exchanging the axes in gnuplot

I have been wondering about this for a while, and it might already be implemented in gnuplot but I haven't been able to find info online.
When you have a data file, it is possible to exchange the axes and assign the "dummy variable", say x, (in gnuplot's help terminology) to the vertical axis:
plot "data" u 1:2 # x goes to horizontal axis, standard
plot "data" u 2:1 # x goes to vertical axis, exchanged axes
However, when you have a function, you need to resort to a parametric function to do this. Imagine you want to plot x = y² (as opposed to y = x²); then (as far as I know) you need to do:
set parametric
plot t**2,t
which works nicely in this case. I think however that a more flexible approach would be desirable, something like
plot x**2 axes y1x1 # this doesn't work!
Is something like the above implemented, or is there an easy way to use y as dummy variable without the need to set parametric?

So here is another ugly, but gnuplot-only variant: Use the special filename '+' to generate a dynamic data set for plotting:
plot '+' using ($1**2):1
The development version contains a new feature, which allows you to use dummy variables instead of column numbers for plotting with '+':
plot sample [y=-10:10] '+' using (y**2):(y)
I guess that's what comes closest to your request.

From what I have seen, parametric plots are pretty common in order to achieve your needs.
If you really hate parametric plots and you are not afraid of a VERY ugly solution, I can give you my method...
My trick is to use a data file filled with a sequence of numbers. To fit your example, let's make a file sq with a sequence of reals from -10 to 10:
seq -10 .5 10 > sq
And then you can do the magic you want using gnuplot :
plot 'sq' u ($1**2):($1)
And if you use Linux, you can also put the shell command directly in the plot command:
plot '< seq -10 .5 10' u ($1**2):($1)
I want to add that I'm not proud of this solution and I'd love the "axis y1x1" functionality too.

As far as I know there is no way to simply invert or exchange the axes in gnuplot when plotting a function.
The reason comes from the way functions are plotted in the normal plotting mode. There is a set of points at even intervals along the x axis which are sampled (frequency set by set samples) and the function value computed. This only allows for well-behaved functions; one y-value per x-value.
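
The same sampling applies in parametric mode, except that the evenly spaced points are taken along the dummy variable t instead of x. A small sketch for x = y², with an assumed t range and sample count:

```gnuplot
# Sketch only: the t range and number of samples are assumed values.
set parametric
set trange [-10:10]     # range of the dummy variable t
set samples 200         # number of points sampled along t
plot t**2, t title "x = y**2"
unset parametric        # return to normal function plotting
```

Each sampled t yields one (x, y) pair, which is why a curve with two y values per x poses no problem here.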

gnuplot: filling the whole space when plotting sampled data

I have a problem with gnuplot. I've searched and I don't find the correct solution. I'm plotting some data arranged in three columns with the command splot, and the steps in x and y are different. The plot I get with:
set view map
splot 'data.dat' using 1:2:3 with points palette
leaves white space between the points (the image is not reproduced here). I would like that white space to be filled, making each tile size adapt, avoiding interpolation.
Some ideas are given in the question "Reduce distance between points in splot".
I've also tried http://gnuplot.sourceforge.net/demo/heatmaps.html, but with image doesn't seem to work :(
I should avoid pointsize as my grid changes from time to time.

You can try
set pm3d map interpolate 1,1 corners2color c1
splot 'data.dat' using 1:($2-5e-5):3
This uses no interpolation, and the color of each polygon depends on the value of corner 'c1'. You may need to test if this is the correct one, or if you need 'c2', 'c3', or 'c4'.

Another solution to my problem, better than this one for some terminals at least, is given in the answers to my other question about the appearance of maps in the pdfcairo terminal. There, the solution is to use plot with image instead of this splot. I tried that before, as I mention here, but maybe it also needed that specific data format.

Linecolor (not so) variable

I'm trying to present data in a boxplot with a few additions.
On top of the boxplot, I also want to print all the data points, since there aren't that many.
There will be many boxplots side by side, and the data points will correspond, so each data point in one plot will be represented in another boxplot, however their order can change. That's why I want to color the points.
I got this so far:
plot data using (1):($1) with boxplot,\
data using (1):($1) with points lc variable
[more plots...]
This needs an extra column in each data file that specifies the linecolor. That would work fine if I had such a column, or if I cared to add one.
Is there another way to iterate through the linestyles (or colors), so it plots the first point with style 1, the second with style 2 etc.?
It seems like a real easy problem, that's either solved by some command I can't seem to find, or maybe by taking the linestyles from a different file, which would be the same for all plots (if that works in gnuplot).
Furthermore, I'd like to know if the boxplot command has the additional feature of being able to plot the average as well (or do I absolutely need the stats command from gnuplot 4.6, or some kind of hack).
Sometimes it's just nice to be able to simply add the average in a boxplot.

Is there another way to iterate through the linestyles (or colors), so it plots the first point with style 1, the second with style 2 etc.?
Yes. Gnuplot provides a number of pseudo-columns. To get more information, see
help datafile using pseudocolumn
But the gist of it is that you can use column(0) for this. I believe that iteration starts at 0 though. Since there isn't a ls 0, you'll need to add 1.
plot data using (1):($1) with boxplot,\
data using (1):($1):(column(0)+1) with points lc variable
Furthermore, I'd like to know if the boxplot command has the additional feature of being able to plot the average as well (or do I absolutely need the stats command from gnuplot 4.6, or some kind of hack).
I believe that you need either gnuplot 4.6 or some kind of hack. One such hack (which will work using gnuplot 4.4, but not earlier) could be:
sum=0.0
npt=0
compute_sum_npt(x)=(npt=npt+1,sum=sum+x,NaN)
set term unknown
plot data u 1:(compute_sum_npt($1))
avg=sum/npt
set term ...
set output ...
plot data using (1):($1) with boxplot,\
data using (1):($1):(column(0)+1) with points lc variable,\
avg w lines ls -1
If your version of gnuplot is earlier than 4.4, you'll need to use a shell command to compute the average. Something like awk should suffice.
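
For gnuplot 4.6 or newer, a sketch of the stats route mentioned in the question could look like the following. It assumes, as in the plots above, that data is a string variable holding the file name and that the values are in column 1.

```gnuplot
# Sketch only: requires the stats command (gnuplot 4.6 or newer).
stats data using 1 nooutput      # fills STATS_mean, STATS_min, STATS_max, ...
avg = STATS_mean

plot data using (1):($1) with boxplot,\
     data using (1):($1):(column(0)+1) with points lc variable,\
     avg with lines ls -1
```

With a single column in using, stats stores the mean in STATS_mean, so no temporary plot to the unknown terminal is needed.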

Drawing a straight line averaging a curve

I would like to draw a straight line that represents the average of a curve. I am plotting my data like this:
plot 'dataset' u 2:4 w p smooth bezier
My data consists of multiple columns, and I get something like this (the image is not reproduced here):
Any ideas on how to do it? I guess it is more of an interpolation than an average. The ups and downs of the curve are not relevant, and it would be much better to have a straight line interpolating the curve...
A straight line should be more or less easy to obtain using fit. However, how could I fit a curve that does not look like a well-known curve? For example, how could I fit a smooth curve through the main group of points? Please note that there is some noise in the lower part of the graph that I would not like to represent.

If you want to do some basic statistics on your data, gnuplot has a builtin command stats which may do what you want. Gnuplot offers some internal variables after plotting that contain data about min, max, etc. To see what these are, type show variables all after plotting your data.
Otherwise if you want to fit your data to a line, gnuplot does that as well:
f(x) = a*x + b
fit f(x) 'data.dat' using 2:4 via a,b
plot 'data.dat' using 2:4, f(x)
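
As a hedged sketch of the stats route mentioned above (gnuplot 4.6 or newer): when stats is given two columns, it also reports a linear regression, so both a horizontal mean line and a straight regression line can be drawn without calling fit. The file name and columns follow the example above; the function name g and the titles are assumptions.

```gnuplot
# Sketch only: g and the plot titles are illustrative names.
stats 'data.dat' using 2:4 nooutput          # fills STATS_mean_y, STATS_slope, ...
g(x) = STATS_slope*x + STATS_intercept       # least-squares straight line

plot 'data.dat' using 2:4 with points title "data",\
     g(x) with lines title "linear regression",\
     STATS_mean_y with lines title "mean of column 4"
```

Neither line ignores the noisy points in the lower part of the graph; those would have to be filtered out in the using expression first.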
