Gnuplot: Use fit in log scale - graphics

I need to make a linear approximation. However it needs to be in a log scale.
Here is my gnuplot script:
f(x)= a*x+b
fit f(x) "d0.dat" via a,b
set logscale x
set logscale y
plot "d0.dat" with points lt rgb "#ff0000" title "Points", \
f(x) with lines lt rgb "#ff00ff" title "Approximation"
Clearly the approximation is wrong. Can anyone help me fix it? I didn't find anything on Google.

Gnuplot is correctly fitting your data to the function you provided--a straight line.
The problem is that using a log scale for the y axis does not scale the data--just how the data are plotted.
Try fitting it to a power law:
f(x)= a*x**b
fit f(x) "d0.dat" via a,b
set logscale x
set logscale y
plot "d0.dat" with points lt rgb "#ff0000" title "Points", \
f(x) with lines lt rgb "#ff00ff" title "Approximation"

I actually recommend a fit in logscale directly:
fl(x) = a+b*x
fit fl(x) 'data.dat' u (log($1)):(log($2)) via a,b
replot exp(fl(log(x))) t 'log approx'
The difference is noticeable when a few values at large x deviate from the trend. In a linear-space fit those points dominate the cost function, because x and y there are orders of magnitude larger than in the rest of the data.
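A minimal sketch of the log-space fit on synthetic data, so it can be run without a data file. The datablock name $Demo, the prefactor 3.0, the exponent 1.5 and the 5% ripple are all made up for illustration; only the fit line follows the pattern above.

```gnuplot
# Synthetic power-law data y = 3 * x**1.5 with a small deterministic ripple
# ($Demo and all constants here are illustrative assumptions)
set print $Demo
do for [i=1:20] {
    print sprintf("%g %g", i, 3.0 * i**1.5 * (1 + 0.05*sin(i)))
}
set print

set fit quiet
a = 1; b = 1                      # starting values for the fit
fl(x) = a + b*x                   # straight line in log-log space
fit fl(x) $Demo using (log($1)):(log($2)) via a, b

# exp(a) estimates the prefactor, b estimates the exponent
print sprintf("prefactor ~ %g, exponent ~ %g", exp(a), b)
```

The fitted line maps back to linear coordinates as exp(fl(log(x))), which is the expression used in the replot command above.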

Related

Gnuplot smoothing data in loglog plot

I would like to plot a smoothed curve based on a dataset which spans over 13 orders of magnitude [1E-9:1E4] in x and 4 orders of magnitude [1E-6:1e-2] in y.
MWE:
set log x
set log y
set xrange [1E-9:1E4]
set yrange [1E-6:1e-2]
set samples 1000
plot 'data.txt' u 1:3:(1) smooth csplines not
The smooth curve looks nice above x=10. Below, it is just a straight line down to the point at x=1e-9.
When increasing samples to 1e4, smoothing works well above x=1. For samples 1e5, smoothing works well above x=0.1 and so on.
Any idea on how to apply smoothing to lower data points without setting samples to 1e10 (which does not work anyway...)?
Thanks and best regards!
JP
To my understanding, sampling in gnuplot is linear. There may be a logarithmic sampling option, but I am not aware of one.
Here is a suggestion for a workaround which is not yet perfect but may act as a starting point.
The idea is to split your data for example into decades and to smooth them separately.
The drawback is that there might be some overlaps between the ranges. These you can minimize or hide somehow when you play with set samples and every ::n or maybe there is another way to eliminate the overlaps.
Code:
### smoothing over several orders of magnitude
reset session
# create some random test data
set print $Data
do for [p=-9:3] {
do for [m=1:9:3] {
print sprintf("%g %g", m*10**p, (1+rand(0))*10**(p/12.*3.-2))
}
}
set print
set logscale x
set logscale y
set format x "%g"
set format y "%g"
set samples 100
pMin = -9
pMax = 3
set table $Smoothed
myFilter(col,p) = (column(col)/10**p-1) < 10 ? column(col) : NaN
plot for [i=pMin:pMax] $Data u (myFilter(1,i)):2 smooth cspline
unset table
plot $Data u 1:2 w p pt 7 ti "Data", \
$Smoothed u 1:2 every ::3 w l ti "cspline"
### end of code
Addition:
Thanks to @maij, who pointed out that this can be simplified by mapping the whole range into linear space. In contrast to @maij's solution, I would let gnuplot handle the logarithmic axes and keep the actual plot command as simple as possible, at the cost of some extra table plots.
Code:
### smoothing in loglog plot
reset session
# create some random test data
set print $Data
do for [p=-9:3] {
do for [m=1:9:3] {
print sprintf("%g %g", m*10**p, (1+rand(0))*10**(p/12.*3.-2))
}
}
set print
set samples 500
set table $SmoothedLog
plot $Data u (log10($1)):(log10($2)) smooth csplines
set table $Smoothed
plot $SmoothedLog u (10**$1):(10**$2) w table
unset table
set logscale x
set logscale y
set format x "%g"
set format y "%g"
set key top left
plot $Data u 1:2 w p pt 7 ti "Data", \
$Smoothed u 1:2 w l lc "red" ti "csplines"
### end of code
Using a logarithmic scale basically means to plot the logarithm of a value instead of the value itself. The set logscale command tells gnuplot to do this automatically:
1. Read the data (still in the linear world, no logarithm yet).
2. Calculate the splines on an equidistant grid (smooth csplines), still in the linear world.
3. Calculate and plot the logarithms (set logscale).
The key point is the equidistant grid. Let's say one chooses set xrange [1E-9:10000] and set samples 101. In the linear world 1e-9 compared to 10000 is approximately 0, and the resulting grid will be 1E-9 ~ 0, 100, 200, 300, ..., 9800, 9900, 10000. The first grid point is at 0, the second one at 100, and gnuplot is going to draw a straight line between them. This does not change when afterwards logarithms of the numbers are plotted.
This is what you already have noted in your question: you need 10 times more points to get a smooth curve for smaller exponents.
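The effect of the equidistant grid can be checked numerically. The sketch below uses the range and sample count from the example above; the helper names lin_grid and log_grid are hypothetical and only illustrate the two spacings, they are not how gnuplot samples internally.

```gnuplot
# Range and sample count from the example above
xmin = 1e-9; xmax = 1e4; n = 101

# i-th point of an equidistant grid in linear space
lin_grid(i) = xmin + (xmax - xmin) * i / (n - 1.0)

# i-th point of an equidistant grid in log10 space, mapped back
log_grid(i) = 10**(log10(xmin) + (log10(xmax) - log10(xmin)) * i / (n - 1.0))

do for [i=0:3] {
    print sprintf("i=%d  linear=%g  log=%g", i, lin_grid(i), log_grid(i))
}
```

The second linear point already sits near 100, so eleven decades of data fall between the first two samples, while the log-spaced points advance by a constant factor.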
As a solution, I suggest swapping the order of the two calculations: take the logarithms first, then calculate the splines.
# create some random test data, code "stolen" from @theozh (https://stackoverflow.com/a/66690491)
set print $Data
do for [p=-9:3] {
do for [m=1:9:3] {
print sprintf("%g %g", m*10**p, (1+rand(0))*10**(p/12.*3.-2))
}
}
set print
# this makes the splines smoother
set samples 1000
# manually account for the logarithms in the tic labels
set format x "10^{%.0f}" # for example this format
set format y "1e{%+03.0f}" # or this one
set xtics 2 # logarithmic world, tic distance in orders of magnitude
set ytics 1
# just "read logarithm of values" from file, before calculating splines
plot $Data u (log10($1)):(log10($2)) w p pt 7 ti "Data" ,\
$Data u (log10($1)):(log10($2)) ti "cspline" smooth cspline

fitting sub-range on time data in gnuplot

Let me start by saying that I am working on:
$ gnuplot --version
gnuplot 5.2 patchlevel 2
I would like to plot and fit date/time data in gnuplot and have the fit only performed and subsequently displayed on a sub-range of the plot.
Example data that I played with can e.g. be found here.
EDIT: I realized that the data in the file don't match the timefmt signature, I added a /06 to each line so that the point would be drawn in the middle of the year which allowed to nicely plot it together with also monthly data from the same source.
I can get the desired result with the code below where I plot three functions, one over the full range of the plot and two others both of which only cover part of the date range.
set key left
set yrange[-0.75:1.0]
set xdata time
set timefmt '%Y/%m'
r=10e-10
e(x) = r*x+s
fit e(x) 'HadCRUT.4.6.0.0.annual_ns_avg_smooth.txt' using 1:2 via r,s
a=10e-10
f(x) = a * x + b
set xrange ["1970/06":"2018/06"]
fit f(x) 'HadCRUT.4.6.0.0.annual_ns_avg_smooth.txt' using 1:2 via a,b
g(x) = ( x > "1970/06" ) ? f(x) : 1/0
set xrange ["1850/06":"1970/06"]
c=9.24859e-11
h(x) = c * x + d
fit h(x) 'HadCRUT.4.6.0.0.annual_ns_avg_smooth.txt' using 1:2 via c,d
i(x) = ( x < "1970/06" ) ? h(x) : 1/0
set xrange ["1849/06":"2018/06"]
set term png size 1500,1000
set output 'annual_average_with_fit.png'
plot 'HadCRUT.4.6.0.0.annual_ns_avg_smooth.txt' using 1:2 with lp lw 2 t'annual avg (decadally smoothed)', e(x) t'full range fit' lw 2, i(x) t'1850-1970 fit' lw 2, g(x) t'1970-2018 fit' lw 2
which yields this plot
This is all good and well, but (and this is where the question comes in) in principle I should be able to achieve the same result also by other means.
First: I restrict the range of the file data to a certain range to fit it only on that range. In principle I should be able to do the same using this (type of) syntax:
fit ["1970/06":"2018/06"] f(x) 'HadCRUT.4.6.0.0.annual_ns_avg_smooth.txt' using 1:2 via a,b
which however gives
Read 168 points
Skipped 168 points outside range [x=1970:2018]
[...] No data to fit
which seems weird given that the set xrange clearly has the desired effect.
Secondly trying to restrict the plotting of the curve to the fit range with
plot 'HadCRUT.4.6.0.0.annual_ns_avg_smooth.txt' using 1:2 with lp lw 2 t'annual avg (decadally smoothed)', ["1970/06":"2018/06"] f(x) t''
does not plot the function at all.
I might be overlooking something very basic, but having tried various things I don't see what it is
The following (a bit cleaned up) code should do what you want (tested with gnuplot 5.2.5).
I guess the problem is that you tried to fit the range ["1970/06":"2018/06"], but your data only extends to 2017. It is better to leave the upper end open, e.g. ["1970/06":] or ["1970/06":*].
edit: added a limited range fit i(x)
reset session
set term png size 1500,1000
set output 'annual_average_with_fit.png'
set key left
set yrange[-0.75:1.0]
set xdata time
set timefmt '%Y/%m'
set format x '%Y'
FILE = 'HadCRUT.4.6.0.0.annual_ns_avg_smooth.txt'
r=10e-10
f(x) = r*x+s
fit [*:*] f(x) FILE using 1:2 via r,s
c=9.24859e-11
g(x) = c * x + d
fit [*:"1970/06"] g(x) FILE using 1:2 via c,d
a=10e-10
h(x) = a * x + b
fit ["1970/06":*] h(x) FILE using 1:2 via a,b
p=1e-9
i(x) = p * x + q
fit [strptime("%Y/%m", "1910/06"):strptime("%Y/%m", "1945/06")] i(x) FILE using 1:2 via p,q
set xrange [*:*]
plot FILE using 1:2 with lp lw 2 t'annual avg (decadally smoothed)', \
f(x) t 'full range fit' lw 2, \
[:"1970/06"] g(x) t '1850-1970 fit' lw 2, \
["1970/06":] h(x) t '1970-2018 fit' lw 2,\
[strptime("%Y/%m", "1910/06"):strptime("%Y/%m", "1945/06")] i(x) t '1910-1945 fit' lw 2
set output
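The explicit strptime() calls in the limited-range fit turn date strings into the numbers gnuplot uses on a time axis (seconds relative to 1970 in gnuplot 5). A small sketch for inspecting those values, assuming the same format string as above; the variable names t_start and t_end are illustrative.

```gnuplot
# Convert the range limits of the 1910-1945 fit into time-axis values
t_start = strptime("%Y/%m", "1910/06")
t_end   = strptime("%Y/%m", "1945/06")

# Raw values in seconds, and a round trip back to a date string
print sprintf("start = %.0f s, end = %.0f s", t_start, t_end)
print strftime("%Y/%m", t_start), " to ", strftime("%Y/%m", t_end)
```

Printing the limits this way is a quick check that a range covers the dates you intended before passing it to fit or plot.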

The function disappear close to zero

I have a problem with plotting a fitted function.
The part of the plotted function close to zero disappears and is connected to a hyperbola-like branch that should not be there at all. This happens only if I set the lower end of xrange to something smaller than 0. I have to do this because I have a lot of data points close to zero, so the plot would look very ugly if I did not change it.
I tried to use a conditional, x>0?f(x):1/0, but it does not help. The hyperbola disappears, but the function does not continue down as it should.
I use this code:
set terminal postscript eps size 3.5,2.62 enhanced color
set output "a.eps"
set xrange [-1:]
f(x)=a*b*x/(1+a*x)
fit f(x) "./a" via a, b
plot "./a" w p title "", f(x) w l title "Langmuir isotherm"
That is simply a matter of sampling. The default sampling rate is 100 (show samples), which isn't enough to show fast-varying functions. Increase the sampling rate with e.g.
set samples 1000
to have your function plotted correctly.
A second point is that a discontinuity is only drawn correctly if a sample falls exactly on it; otherwise gnuplot connects the two neighbouring samples across the gap. Consider the following plot to demonstrate this:
set xrange [-1:1]
set multiplot layout 2,1
set samples 100
plot 1/x
set samples 101
plot 1/x
unset multiplot
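Why 100 and 101 samples behave differently here can be seen from the sample positions themselves. This is a sketch under the assumption that samples are spread evenly over the xrange including both ends; the helper name sample(i, n) is hypothetical.

```gnuplot
# xrange from the demonstration above
xmin = -1.0; xmax = 1.0

# position of the i-th of n evenly spaced samples (i starts at 0)
sample(i, n) = xmin + (xmax - xmin) * i / (n - 1.0)

# 100 samples: the two points nearest the pole straddle x = 0
print sprintf("n=100: %g and %g", sample(49, 100), sample(50, 100))

# 101 samples: the middle point lands exactly on x = 0, where 1/x is undefined
print sprintf("n=101: %g", sample(50, 101))
```

With 100 samples the line is drawn straight from a large negative value to a large positive one; with 101 the undefined point at the pole breaks the line.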
So, if you want to plot the function correctly on both sides of the discontinuity, you must either define a small region around the discontinuity as undefined, or you plot the parts on the left and right separately:
set xrange [-1:]
f(x)=a*b*x/(1+a*x)
fit f(x) "./a" via a, b
left(x) = (x < -1/a ? f(x) : 1/0)
right(x) = (x > -1/a ? f(x) : 1/0)
plot "./a" w p title "", left(x) w l lt 2 title "Langmuir isotherm", right(x) w l lt 2 notitle

How can I fix zero to be at the same place when using separate y axes in gnuplot?

I have a data file, with column 1 as the independent variable and columns 2 and 3 as dependent variables. I want to plot variables 2 and 3 on different y axes using something like this:
plot "file.out" u 1:2 axes x1y1, "file.out" u 1:3 axes x1y2
When I do this, the "0" for both axes are offset from one another. How can I fix the zero of one y-axis to the zero of the other y-axis, without explicitly setting yrange to be symmetric for both quantities?
From version 5 on, it is possible to use set link. However, it does not fit the ratios automatically, so you are left with calculating them yourself:
stat "file.out" u 1:2
MAX1=abs(STATS_max_y)
MIN1=-abs(STATS_min_y)
stat "file.out" u 1:3
MAX2=abs(STATS_max_y)
MIN2=-abs(STATS_min_y)
min(a,b)=(a<b)?a:b
set link y2 via min(MAX1/MAX2,MIN1/MIN2)*y inverse y/min(MAX1/MAX2,MIN1/MIN2)
plot "file.out" u 1:2 axes x1y1, "file.out" u 1:3 axes x1y2
Here is a solution which works without linking axes, hence it also works even with gnuplot 4.4 (the version from 2010).
It doesn't need stats, but as a disadvantage it has to replot the data to get the proper scaling of the y2-axis.
Code:
### aligning zero on y1- and y2-axes
reset
set ytics nomirror
set y2tics nomirror
set xzeroaxis
set key top left
plot \
sin(x) axes x1y1 w l, \
cos(x)-0.5 axes x1y2 w l
R0 = -GPVAL_Y_MIN/(GPVAL_Y_MAX-GPVAL_Y_MIN)
y2_min_new = abs(GPVAL_Y2_MIN)>abs(GPVAL_Y2_MAX) ? GPVAL_Y2_MIN : R0*GPVAL_Y2_MAX/(R0-1)
y2_max_new = abs(GPVAL_Y2_MAX)>abs(GPVAL_Y2_MIN) ? GPVAL_Y2_MAX : (R0-1)*GPVAL_Y2_MIN/R0
set y2range[y2_min_new:y2_max_new]
replot
### end of code
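The two range formulas can be checked with plain numbers. In this sketch the y1 and y2 ranges are hypothetical stand-ins for the GPVAL_* values that gnuplot fills in after the first plot.

```gnuplot
# Hypothetical autoscaled ranges (stand-ins for GPVAL_Y_MIN etc.)
y_min  = -1.0; y_max  = 1.0
y2_min = -1.5; y2_max = 0.5

# Relative position of zero on the y1 axis (0 = bottom, 1 = top)
R0 = -y_min / (y_max - y_min)

# Keep the y2 limit with the larger magnitude, recompute the other one
y2_min_new = abs(y2_min) > abs(y2_max) ? y2_min : R0*y2_max/(R0 - 1)
y2_max_new = abs(y2_max) > abs(y2_min) ? y2_max : (R0 - 1)*y2_min/R0

# Relative position of zero on the adjusted y2 axis; should equal R0
zero2 = -y2_min_new / (y2_max_new - y2_min_new)
print sprintf("y2range [%g:%g], zero at %g (y1: %g)", y2_min_new, y2_max_new, zero2, R0)
```

Here the lower y2 limit is kept and the upper one is stretched, so zero sits at the same relative height on both axes.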
Unfortunately, you can't (at least not in general). If the yrange has the same percent above and below 0, it should probably work, e.g.:
set yrange [-5:10]
set y2range [-10:20]
But if you don't want to do that, then I don't know that there's a better solution...

Gnuplot: how to add y2 axis scale for different units

I'm plotting data from a file. The data points are in metric units. I want to show a second scale on the right (y2) that's in standard units.
The file represents rocket motor thrust over time. The data are in Newtons. I want to show newtons on the left (this happens by itself, naturally) and pounds force on the right. The conversion is a simple factor (multiply N by 0.2248 to obtain lbf).
I can set y2tics and if I set y2range manually, they appear on the right. What I don't know how to do is set y2range automatically to y1range * a factor.
My eventual solution is to plot twice, once in Newtons on y1 and once in pounds on y2, and make the y2 plot almost invisible:
plot '-' using 1:($2*0.2248) with dots axes x1y2 lc rgb 'white' notitle, \
'' using 1:2 with lines lc rgb '<color>' title '<title>'
The solution above often generates slightly different y scales: with autorange, gnuplot rounds up the range so the top tick on each axis is a round number, and of course the rounding is different for different units.
Ultimately I end up with Python code that finds the highest thrust value in each graph, then I explicitly set yrange to that number and y2range to that number * 0.2248:
f.write("set yrange [0:%s]; set y2range[0:%s]\n" % (peak_thrust, peak_thrust*NEWTON_LBF));
Here's the end result: http://www.lib.aero/hosted/motors/cesaroni_12-15-12.html (sample graph below)
It seems to me that the easiest way to do this is to simply scale the data:
set y2tics
plot sin(x) w lines, 5*sin(x) w lines axes x1y2
Of course, you're plotting data from a file, so it would look something more like:
set y2tics
FACTOR=0.2248 #conversion factor from newtons to lbf
plot 'datafile' u 1:2 w lines, '' u 1:(FACTOR*$2) w lines axes x1y2
If you're setting the yrange explicitly (which you may need to do):
set yrange [ymin:ymax]
set y2range [ymin*FACTOR:ymax*FACTOR]
Finally, if you really want to rely on autoscaling, you're going to need to do some "gymnastics".
First, set a dummy terminal so we can plot without making a plot:
set term unknown
plot 'datafile' u 1:2 #collect information on our data
Now that we've collected information on the data, we can set our real y2range
FACTOR=0.2248
set y2range [FACTOR*GPVAL_Y_MIN : FACTOR*GPVAL_Y_MAX]
set y2tics nomirror
set ytics nomirror
Now set the terminal and plot the data:
set term ...
set output ...
plot 'datafile' u 1:2 w lines
Version 5.0 added support for this kind of relations between the y and y2 (or also x and x2) axis:
set xrange[0:370]
set ytics nomirror
set y2tics
set link y2 via 0.2248*y inverse y/0.2248
plot x
I know it's an old question and the answer has already been accepted, but I think it's worth sharing my approach.
I simply use modified labels for the y2 axis. In your case, this would be
set y2tics ("10" 10/0.2248, "20" 20/0.2248 etc etc...
that can be looped this way
do for [i=0:1000:10] { set y2tics add (sprintf("%i",i) i/0.2248) }
where the for range should be adjusted according to your data (you could use stats and the variable GPVAL_DATA_Y_MAX for complete peace of mind).
Don't forget to
set ytics nomirror
This gives exactly what you are looking for, in (almost) a one-liner.
If you want to use a grid and have the converted values on the y2 axis, so that, for example, the label y2=11.2 lines up with y=50 N (which keeps things tidy with a grid), you can do
do for [i=0:1000:50] { set y2tics add (sprintf("%5.1f",i*0.2248) i) }
