How to fit 3D data within zrange in gnuplot

I know that, to fit 2D data using only the points whose value lies in [-1:4], the gnuplot commands are
f(x)=a*x+b
fit [][-1:4] f(x) "data"
But for 3D data, if I only want to fit the points where f(x) has a value in [-1:4], then
f(x)=a*x+b*y+c
fit [][-1:4] f(x) "data"
fit [][][-1:4] f(x) "data"
are both wrong. Why?

I am not sure whether the range behaviour you describe for the 2D fit is actually intended, because it does not work with the gnuplot development version. According to the documentation, the range specifications of the fit command apply only to the dummy variables (i.e. x and y). So your first fit command may work only because of a bug, which happens to be a feature for you.
To limit the z-range, you can set all values outside the desired range to 1/0, which results in an undefined data point which is then ignored:
f(x, y) = a*x + b*y + c
zmin = -1
zmax = 4
fit f(x, y) "data" using 1:2:($3 < zmin || $3 > zmax ? 1/0 : $3):(1) via a,b,c
Note that your function must be defined for two dummy variables, x and y, and you must include the via clause, which is missing from all of your examples.
To fit a function with two independent variables, z=f(x,y), the required
format is using with four items, x:y:z:s. The complete format must be
given---no default columns are assumed for a missing token. Weights for
each data point are evaluated from 's' as above. If error estimates are
not available, a constant value can be specified as a constant expression
(see plot datafile using), e.g., using 1:2:3:(1).
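To cross-check the gnuplot result outside gnuplot, the same idea (discard points whose z lies outside [zmin:zmax], then fit a plane by least squares) can be sketched in Python. This is only an illustration under stated assumptions: the helper names solve3 and fit_plane_in_zrange and the synthetic data are made up here, and the fit uses the closed-form normal equations rather than gnuplot's iterative algorithm.

```python
# Sketch: least-squares plane z = a*x + b*y + c, using only the points
# with zmin <= z <= zmax (the same filter as the 1/0 trick in gnuplot).
# Helper names and data are illustrative assumptions.

def solve3(m, v):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    aug = [row[:] + [rhs] for row, rhs in zip(m, v)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [p - factor * q for p, q in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_plane_in_zrange(points, zmin, zmax):
    """Return (a, b, c) fitted to the points whose z is inside [zmin, zmax]."""
    kept = [(x, y, z) for x, y, z in points if zmin <= z <= zmax]
    basis = [(x, y, 1.0) for x, y, _ in kept]
    # Normal equations: (B^T B) p = B^T z
    m = [[sum(bi[i] * bi[j] for bi in basis) for j in range(3)] for i in range(3)]
    v = [sum(bi[i] * z for bi, (_, _, z) in zip(basis, kept)) for i in range(3)]
    return solve3(m, v)

# Synthetic points on the plane z = 0.5*x - 0.25*y + 1, plus two outliers
# that fall outside the z range and must therefore be ignored.
points = [(float(x), float(y), 0.5 * x - 0.25 * y + 1.0)
          for x in range(4) for y in range(4)]
points += [(1.0, 1.0, 100.0), (2.0, 0.0, -50.0)]

a, b, c = fit_plane_in_zrange(points, -1.0, 4.0)
print(a, b, c)
```

Without the range filter the two outliers would pull the plane away from the in-range points, which is the effect the 1/0 expression in the using clause avoids.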

This fits a plane in 3D, not a line. I was confused until I zoomed out and realized this. Try the dataset of 4 points below, and use set autoscale to make sure you see the whole image. Alternatively, read the fit.log file: the large errors there indicate a poor fit.
377.4202 -345.5518 2.1142
377.4201 -345.5505 2.5078
377.4206 -345.556 2.8359
377.4288 -345.5555 3.2109

Related

Why my fit for a logarithm function looks so wrong

I'm plotting this dataset and making a logarithmic fit but, for some reason, the fit looks badly wrong. At one point I got a good enough fit, but when I replotted, the bad fit was back. The first data point was originally 0.0 0.0076, but I changed it to 0.001 0.0076 to avoid the asymptote.
This is what I'm using for the fit (not exactly what produced the image above, but I'm now testing with this and the bad fit appears as well):
f(x) = a*log(k*x + b)
fit f(x) 'R_B/R_B.txt' via a, k, b
And the output is this
Also, sometimes it reports 7 iterations, as in the screenshot above, and other times only 1. When it produced the "correct" fit, it took about 35 iterations and gave a = 32, if I remember correctly.
Edit: here is the good one again; the plot I got is this one. Once more, I replotted and got that weird fit. Curiously, if the 0.0 0.0076 point is present when the good fit is about to be shown, gnuplot says "Undefined value during function evaluation", but that message does not appear when I get the bad one.
Do you know why I keep getting this inconsistency? Thanks for your help.
As I already mentioned in the comments, fitting antiderivatives is much better than fitting derivatives, because numerically computed derivatives become strongly scattered even when the data are only slightly scattered.
The principle of fitting an integral equation (obtained from the original equation to be fitted) is explained in https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales . Its application to the case y = a*ln(c*x + b) is shown below.
Numerical calculus :
To get an even better result (according to some specified fitting criterion), one can use the above parameter values as initial values for an iterative nonlinear regression method implemented in some convenient software.
NOTE : The integral equation used in the present case is :
NOTE : On the above figure one can compare the result with the method of fitting an integral equation to the result with the method of fitting with derivatives.
Acknowledgements: Alex Sveshnikov did very good work in applying the method of regression with derivatives. This allows an interesting and enlightening comparison. If the goal is only to compute approximate parameter values to be used in nonlinear regression software, both methods are roughly equivalent. Nevertheless, the integral-equation method appears preferable for scattered data.
UPDATE (After Alex Sveshnikov updated his answer)
The figure below was drawn using Alex Sveshnikov's result followed by an iterative fitting method.
The two curves are almost indistinguishable. This shows that (in the present case) the method of fitting the integral equation is almost sufficient without further treatment.
Of course, the result is not always this satisfying; here it is due to the low scatter of the data.
In addition, an answer to a question raised in the comments by CosmeticMichu:
The problem here is that the fit algorithm starts with "wrong" approximations for the parameters a, k, and b, so during the minimization it finds a local minimum, not the global one. You can improve the result if you provide the algorithm with starting values that are close to the optimal ones. For example, let's start with the following parameters:
gnuplot> a=47.5087
gnuplot> k=0.226
gnuplot> b=1.0016
gnuplot> f(x)=a*log(k*x+b)
gnuplot> fit f(x) 'R_B.txt' via a,k,b
....
....
....
After 40 iterations the fit converged.
final sum of squares of residuals : 16.2185
rel. change during last iteration : -7.6943e-06
degrees of freedom (FIT_NDF) : 18
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf) : 0.949225
variance of residuals (reduced chisquare) = WSSR/ndf : 0.901027
Final set of parameters Asymptotic Standard Error
======================= ==========================
a = 35.0415 +/- 2.302 (6.57%)
k = 0.372381 +/- 0.0461 (12.38%)
b = 1.07012 +/- 0.02016 (1.884%)
correlation matrix of the fit parameters:
a k b
a 1.000
k -0.994 1.000
b 0.467 -0.531 1.000
The resulting plot is
Now the question is how you can find "good" initial approximations for your parameters? Well, you start with
If you differentiate this equation you get
or
The left-hand side of this equation is some constant 'C', so the expression in the right-hand side should be equal to this constant as well:
In other words, the reciprocal of the derivative of your data should be approximated by a linear function. So, from your data x[i], y[i] you can construct the reciprocal derivatives x[i], (x[i+1]-x[i])/(y[i+1]-y[i]) and the linear fit of these data:
The fit gives the following values:
C*k = 0.0236179
C*b = 0.106268
Now we need to find the values of a and C. Let's say that we want the resulting graph to pass close to the first and last points of our dataset. That means that we want
a*log(k*x1 + b) = y1
a*log(k*xn + b) = yn
Thus,
a*log((C*k*x1 + C*b)/C) = a*log(C*k*x1 + C*b) - a*log(C) = y1
a*log((C*k*xn + C*b)/C) = a*log(C*k*xn + C*b) - a*log(C) = yn
By subtracting the equations we get the value for a:
a = (yn-y1)/log((C*k*xn + C*b)/(C*k*x1 + C*b)) = 47.51
Then,
log(k*x1+b) = y1/a
k*x1+b = exp(y1/a)
C*k*x1+C*b = C*exp(y1/a)
From this we can calculate C:
C = (C*k*x1+C*b)/exp(y1/a)
and finally find the k and b:
k=0.226
b=1.0016
These are the values used above for finding the better fit.
UPDATE
You can automate the process described above with the following script:
# Name of the file with the data
data='R_B.txt'
# The coordinates of the last data point
xn=NaN
yn=NaN
# The temporary coordinates of a data point used to calculate a derivative
x0=NaN
y0=NaN
linearFit(x)=Ck*x+Cb
fit linearFit(x) data using (xn=$1,dx=$1-x0,x0=$1,$1):(yn=$2,dy=$2-y0,y0=$2,dx/dy) via Ck, Cb
# The coordinates of the first data point
x1=NaN
y1=NaN
plot data using (x1=$1):(y1=$2) every ::0::0
a=(yn-y1)/log((Ck*xn+Cb)/(Ck*x1+Cb))
C=(Ck*x1+Cb)/exp(y1/a)
k=Ck/C
b=Cb/C
f(x)=a*log(k*x+b)
fit f(x) data via a,k,b
plot data, f(x)
pause -1
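The same starting-value recipe can be cross-checked outside gnuplot. For y = a*log(k*x + b), the reciprocal derivative is dx/dy = (k*x + b)/(a*k), which is linear in x, so a straight-line fit of finite-difference reciprocal derivatives gives C*k and C*b. The Python sketch below follows the script's steps on synthetic, noise-free data; the function names and the data-generating values (35, 0.37, 1.07) are assumptions chosen for illustration, not the values from R_B.txt.

```python
import math

def linear_fit(xs, ys):
    """Closed-form least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def initial_guess(xs, ys):
    """Starting values (a, k, b) for f(x) = a*log(k*x + b)."""
    # Reciprocal finite-difference derivative dx/dy at each point after the
    # first, using the previous point as in the gnuplot script.
    rx = xs[1:]
    ry = [(xs[i] - xs[i - 1]) / (ys[i] - ys[i - 1]) for i in range(1, len(xs))]
    Ck, Cb = linear_fit(rx, ry)
    x1, y1, xn, yn = xs[0], ys[0], xs[-1], ys[-1]
    # Force the curve through the first and last data points.
    a = (yn - y1) / math.log((Ck * xn + Cb) / (Ck * x1 + Cb))
    C = (Ck * x1 + Cb) / math.exp(y1 / a)
    return a, Ck / C, Cb / C

# Synthetic data; the generating parameters are illustrative only.
xs = [float(i) for i in range(21)]
ys = [35.0 * math.log(0.37 * x + 1.07) for x in xs]

a0, k0, b0 = initial_guess(xs, ys)
print(a0, k0, b0)
```

By construction the guessed curve passes through the first and last data points. The values in between are only approximate, which is why they serve as starting values for the nonlinear fit and not as the final answer.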

Is there any error variable for gnuplot fit?

I'm writing a C++ program that prints commands for gnuplot, in order to plot different things faster. The program already plots the data as well as the fit, but now I'm adding some labels, and I want to print the fit equation, that is, something of this form:
f(x) = (a +/- Δa)*x + (b +/- Δb)
I have the following line for printing it
set label 1 at screen 0.22, screen 0.75 sprintf('f(x) = %3.4f*x + %3.4f', a, b)
But, as you can see, this shows only the values of a and b, without errors. I was thinking of passing some error-related variables (FIT_something) to sprintf, to get something like
set label 1 at screen 0.22, screen 0.75 sprintf('f(x) = (%3.4f +/- %3.4f)*x + (%3.4f +/- %3.4f)', a, deltaa, b, deltab)
But I can't find those variables. My questions are: do they exist? And if not, is there any way to print the parameter errors other than writing them explicitly on the line?
Thanks for your help
Please read the statistical overview section of the gnuplot documentation (help statistical_overview). Keeping in mind the caveats described there, see also the documentation for set fit errorvariables, which I extract below:
If the `errorvariables` option is turned on, the error of each fitted
parameter computed by `fit` will be copied to a user-defined variable
whose name is formed by appending "_err" to the name of the parameter
itself. This is useful mainly to put the parameter and its error onto
a plot of the data and the fitted function, for reference, as in:
set fit errorvariables
fit f(x) 'datafile' using 1:2 via a, b
print "error of a is:", a_err
set label 1 sprintf("a=%6.2f +/- %6.2f", a, a_err)
plot 'datafile' using 1:2, f(x)
If the `errorscaling` option is specified, which is the default, the
calculated parameter errors are scaled with the reduced chi square. This is
equivalent to providing data errors equal to the calculated standard
deviation of the fit (FIT_STDFIT) resulting in a reduced chi square of one.
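Since the question is about a program that prints gnuplot commands, here is a minimal sketch of such a generator that uses the a_err and b_err variables created by set fit errorvariables. It is written in Python instead of the asker's C++ purely for brevity; the function name fit_label_commands and its default label position are assumptions taken from the question, not part of gnuplot.

```python
def fit_label_commands(datafile, x=0.22, y=0.75):
    """Return gnuplot commands that fit a line and label it with its errors."""
    label_fmt = "f(x) = (%3.4f +/- %3.4f)*x + (%3.4f +/- %3.4f)"
    return [
        # Ask fit to create a_err and b_err alongside a and b.
        "set fit errorvariables",
        "f(x) = a*x + b",
        f"fit f(x) '{datafile}' using 1:2 via a, b",
        # The sprintf is evaluated by gnuplot, after the fit has run.
        f"set label 1 sprintf('{label_fmt}', a, a_err, b, b_err) "
        f"at screen {x}, screen {y}",
        f"plot '{datafile}' using 1:2, f(x)",
    ]

commands = fit_label_commands("datafile")
print("\n".join(commands))
```

The printed lines can be piped to gnuplot or written to a script file. The label must be set after the fit command, because a_err and b_err do not exist before it.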

Making eye-diagram with gnuplot

I would like to plot 1000+ curves and display their eye diagram with gnuplot.
Example of an eye diagram made with MATLAB: http://www.mathworks.fr/fr/help/comm/ref/commscope.eyediagram.html
I can already plot the curves using the script below:
gnuplot> plot for [col=1:1000] 'input_dataset1.txt' using 0:col with lines linecolor rgb("#0000ff")
Result: output_image.png
My problem is that when two lines intersect, the intersection has the same color as the lines. The eye diagram should display areas with lots of intersections in a different color.
I haven't found any example of such diagrams made with gnuplot.
Playing with line transparency didn't work: the intersection of two semi-transparent lines has the same color as the lines.
Any ideas?
Thanks,
I worked out a gnuplot-only way to do this, it involves a bit of work and you'll probably have to fine tune the details for your particular problem.
As an example I generated a data file containing values for the function exp(x) and its Taylor expansions from order zero (T^(0)[exp(x)] = 1) to order 3 (T^(3)[exp(x)] = 1 + x + x**2/2. + x**3/6.). This kind of data is suited to this problem because you will have a high data density around the origin, where all the approximations converge to the exact value, and lower data density away from it. It can be generated like this with gnuplot:
set xrange [0:1]
set table
set output "| grep -v '^$' > data"
plot exp(x), 1, 1+x, 1+x+x**2/2., 1+x+x**2/2.+x**3/6.
unset table ; unset output
Note I'm formatting the output so my data file has no blank lines, otherwise gnuplot treats fields separated by blank lines as different data blocks and this eventually messes up the histograms below. This data looks like this (plot "data"):
Now, I create a 2D histogram with this data. It would be extremely helpful if gnuplot offered this feature, but it doesn't, so the task gets a bit tricky. What I will do is create several 1D histograms. For more info on how to generate the latter, check this.
The first thing is to figure out the width along x and y for your bins, xwidth and ywidth, where the number of data points are counted, that is, we divide the data space into a grid where each element measures xwidth by ywidth and is assigned a number equal to the number of data points contained within. The smaller these elements the better resolution your graph will have, but also the more data points you'll need for it to look good. For my data above, this could be something like
xwidth = 0.02
ywidth = 0.05
Now we declare a function to define our 1D bins (details):
bin(x,width)=width*floor(x/width)+width/2.0
and define the number of bins along each direction. Because the xrange for my data is [0:1] and my yrange is [1:2.8], the number of bins would be 50 and 36, respectively. I could use Nx = xrange / xwidth but that would lead to a float Nx and I want an integer. To be safe I do:
Nx = 50
Ny = 36
It might make more sense to define these values the other way around: calculate xwidth as xrange / Nx, in which case you should not have problems with integer/float.
Now I generate the 1D histograms along y, looping over x values:
set output "| grep -v 'u\\|^$' | sed 's/#/\\n#/g' > data2"
set table
plot for [i=0:(Nx-1)] "./data" using \
(bin($2,ywidth)):( i*xwidth <= $1 && (i+1.)*xwidth > $1 ? 1.0 : 0.0) \
smooth freq
unset table ; unset output
Now data2 contains Nx blocks of data, each of them being a scan along y with Ny data points. The value of these data points is the number of data entries in the original data file. As it is, data2 contains 2D data (y, color), which I need to remap to 3D. The x value is given by the data block position, accessible with the every option in gnuplot. To plot this 3-dimensionally I do:
set output "| grep -v 'u\\|^$' | sed 's/#/\\n#/g' > data3"
set table
splot for [i=0:(Nx-1)] "./data2" every :::i::i using \
((i+0.5)*xwidth):1:2
unset table ; unset output
This data3 can now be plotted as a color map:
plot "./data3" with image
which looks like this:
Had I used higher quality data (i.e. with higher resolution) the graph would look nicer. With 2x resolution along each direction, the same looks like below:
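The counting step itself (a 2D histogram over an xwidth by ywidth grid) is easy to prototype outside gnuplot, for instance to check the bin counts before building the table pipeline. The Python sketch below reuses the bin() formula from above; the names bin_center and histogram2d and the sampling of the curves are assumptions made for illustration.

```python
import math
from collections import Counter

def bin_center(value, width):
    """Same formula as the gnuplot bin(x, width) function above."""
    return width * math.floor(value / width) + width / 2.0

def histogram2d(points, xwidth, ywidth):
    """Count points per grid cell, keyed by integer (ix, iy) cell indices."""
    counts = Counter()
    for x, y in points:
        counts[(math.floor(x / xwidth), math.floor(y / ywidth))] += 1
    return counts

# Sample exp(x) and its Taylor expansions up to order 3 on [0, 1].
curves = [
    math.exp,
    lambda x: 1.0,
    lambda x: 1.0 + x,
    lambda x: 1.0 + x + x**2 / 2.0,
    lambda x: 1.0 + x + x**2 / 2.0 + x**3 / 6.0,
]
samples = [(i / 100.0, f(i / 100.0)) for f in curves for i in range(101)]

counts = histogram2d(samples, xwidth=0.02, ywidth=0.05)

# Print "x_center y_center count" rows for the five densest cells.
for (ix, iy), n in counts.most_common(5):
    print((ix + 0.5) * 0.02, (iy + 0.5) * 0.05, n)
```

Keying the cells by integer indices avoids comparing floating-point bin centers for equality. The centers are only computed when the rows are written out.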

reduce datapoints when using logscale in gnuplot

I have a large set of data points from x = 1 to x = 10e13 (step size is fixed to about 3e8).
When I try to plot them using a logscale, I naturally get an incredibly high point density towards the end. This affects my output plots, since PostScript and SVG files (which hold each and every data point) become really big.
Is there a way to tell gnuplot to decrease the data density dynamically?
Sample data here. Shows a straight line using logarithmic x-axis.
Usually, for this kind of plot, one can use a filter function which selects the desired points and discards all others (by setting their value to 1/0).
Something like:
plot 'sample.dat' using (filter($1) ? $1 : 1/0):2
Now you must define an appropriate filter function to change the data density. Here is a proposal, with pseudo-data, although you might for sure find a better one, which doesn't show this typical logarithmic pattern:
set logscale x
reduce(x) = x/(10**(floor(log10(x))))
filterfunc(x) = abs(log10(sc)+(log10(x) - floor(log10(x))) - log10(floor(sc*reduce(x))))
filter(x) = filterfunc(x) < 1e-5 ? x : 1/0
set multiplot layout 1,2
sc = 1
plot 'sample.data' using (filter($1)):2 notitle
sc = 10
replot
The variable sc allows you to change the density. The result (with 4.6.5) is:
Inspired by Christoph's answer, I did some work and was able to get equal spacing on a log scale. I wrote a filter: if your x values form a regular sequence, you can use the greatest integer (floor) function and find the point nearest to each target on the log scale by comparing the fractional part. The precision is tuned by precision_parameter here.
precision_parameter=100
function(x)=(-floor(precision_parameter*log10(x))+(precision_parameter*log10(x)))
Now filter by using the filter function defined below
density_parameter = 3.5
filter(x)=(function(x) < 1/(log10(x))**density_parameter & function(x-1) > 1/(log10(x))**density_parameter ) ? x : 1/0
set datafile missing "NaN"
The last line helps when plotting with linespoints. I used x and x-1 assuming the x data form an arithmetic progression with a common difference of 1; change this to suit your data. Just replace x by filter(x) in the plot command.
plot 'sample_data.dat' u (filter($1)):2 w lp
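A different option, not taken from either answer above, is to thin the data file before gnuplot reads it: keep one point per fixed-width bucket in log10(x), which gives roughly equal spacing on a logarithmic axis. The Python sketch below assumes positive x values sorted in ascending order; thin_log and per_decade are hypothetical names.

```python
import math

def thin_log(xs, per_decade=10):
    """Keep the first x in each log10 bucket of width 1/per_decade.

    Assumes xs is sorted ascending and every x is > 0.
    """
    kept = []
    last_bucket = None
    for x in xs:
        bucket = math.floor(per_decade * math.log10(x))
        if bucket != last_bucket:
            kept.append(x)
            last_bucket = bucket
    return kept

# 100000 evenly spaced points shrink to at most 10 points per decade.
xs = range(1, 100001)
kept = thin_log(xs, per_decade=10)
print(len(kept), kept[:8])
```

In practice you would thin whole rows of the data file by their x column and write the kept rows to a new file for gnuplot to plot.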

Linear Fit does not adjust b independently from a

I'm using the following gnuplot script to plot a linear fit:
#!/usr/bin/gnuplot
set term cairolatex
set output "linear_fit.tex"
c = 299792458.
x(x) = c / x
y(x) = x
h(x) = a * x + b
fit h(x) "linear_fit.dat" u (x($1)):(y($2)) via a,b
plot "linear_fit.dat" u (x($1)):(y($2)) w points title "", \
(h(x)) with lines linecolor rgb "black" title "Linear Fit"
However, after the iterations converge, b is always 1.0: https://dpaste.de/ozReq/
How can I get gnuplot to adjust b as well as a?
Update: Repeating the fit command a few hundred times with alternating via a/via b does give pretty good results, but that just can't be how it's supposed to be done.
Update 2: Here's the data in linear_fit.dat:
# lambda, V
360e-9 1.119
360e-9 1.148
360e-9 1.145
400e-9 0.949
400e-9 0.993
400e-9 0.971
440e-9 0.883
440e-9 0.875
440e-9 0.863
490e-9 0.737
490e-9 0.728
490e-9 0.755
540e-9 0.575
540e-9 0.571
540e-9 0.592
590e-9 0.457
590e-9 0.455
590e-9 0.482
I think your troubles stem from the fact that your x-values are very large (on the order of 10e14).
If you do not provide gnuplot with an initial guess for a and b, it will assume a=1 and b=1 as starting points for the fit. However, this is a poor initial guess:
Please note the log scale on both the x- and y-axis.
From the gnuplot documentation:
fit may, and often will get "lost" if started far from a solution, where SSR is large and changing slowly as the parameters are varied, or it may reach a numerically unstable region (e.g., too large a number causing a floating point overflow) which results in an "undefined value" message or gnuplot halting.
To improve the chances of finding the global optimum, you should set the starting values at least roughly in the vicinity of the solution, e.g., within an order of magnitude, if possible. The closer your starting values are to the solution, the less chance of stopping at another minimum. One way to find starting values is to plot data and the fitting function on the same graph and change parameter values and replot until reasonable similarity is reached. The same plot is also useful to check whether the fit stopped at a minimum with a poor fit.
In your case, such starting values could be:
a = 1e-15
b = -0.5
I obtained these values by eye-balling your range of values.
With those starting values, the linear fit results in:
Final set of parameters Asymptotic Standard Error
======================= ==========================
a = 1.97355e-015 +/- 6.237e-017 (3.161%)
b = -0.5 +/- 0.04153 (8.306%)
Which looks like this:
You can play with the control setting of fit (such as setting FIT_LIMIT = 1.e-35) or the starting values to achieve a better fit than this.
EDIT
While I still have not been able to coax gnuplot into modifying both parameters a, b at the same time, I found an alternate approach using R. I am aware that there are many other (scripting) languages that can perform a linear fit and this question was about gnuplot. However, the required effort with R appeared to be minimal.
Here's an example, which, when saved as linear_fit.R and called with
R CMD BATCH linear_fit.R
will provide the two coefficients of the linear fit, that gnuplot failed to provide.
y <- c(1.119, 1.148, 1.145, 0.949, 0.993, 0.971, 0.883, 0.875, 0.863,
0.737, 0.728, 0.755, 0.575, 0.571, 0.592, 0.457, 0.455, 0.482)
x <- c(3.60E-007, 3.60E-007, 3.60E-007, 4.00E-007, 4.00E-007,
4.00E-007, 4.40E-007, 4.40E-007, 4.40E-007, 4.90E-007,
4.90E-007, 4.90E-007, 5.40E-007, 5.40E-007, 5.40E-007,
5.90E-007, 5.90E-007, 5.90E-007)
c = 299792458.
x <- c/x
lm.out <- lm(y ~ x)
svg("linear_fit.svg")
plot(x,y)
abline(lm.out,col="red")
summary(lm.out)
You will end up with an svg-file that contains the plot and a linear_fit.Rout text file. In there you'll find the following coefficients:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.429e-01 4.012e-02 -13.53 3.55e-10 ***
x 2.037e-15 6.026e-17 33.80 2.61e-16 ***
So, in the terminology of the original question, we obtain:
a = 2.037e-15
b = -5.429e-01
These values are very close to the values you quoted from alternating the fit.
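For a straight line, the least-squares solution has a closed form, so no starting values are needed at all. The Python sketch below applies it to the data from linear_fit.dat and divides the frequencies by 1e14 first, so that all numbers are of order unity. The SCALE constant and the variable names are assumptions made for illustration. Rescaling x in this way is also a common remedy when an iterative fit struggles with parameters of very different magnitudes.

```python
C_LIGHT = 299792458.0   # speed of light in m/s, as in the gnuplot script
SCALE = 1e14            # illustrative rescaling so that x is of order 1

wavelengths = ([360e-9] * 3 + [400e-9] * 3 + [440e-9] * 3 +
               [490e-9] * 3 + [540e-9] * 3 + [590e-9] * 3)
voltages = [1.119, 1.148, 1.145, 0.949, 0.993, 0.971,
            0.883, 0.875, 0.863, 0.737, 0.728, 0.755,
            0.575, 0.571, 0.592, 0.457, 0.455, 0.482]

# Frequency in units of 1e14 Hz
xs = [C_LIGHT / lam / SCALE for lam in wavelengths]
ys = voltages

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

slope_scaled = sxy / sxx          # slope per 1e14 Hz
b = mean_y - slope_scaled * mean_x
a = slope_scaled / SCALE          # slope per Hz, as in h(x) = a*x + b

print("a =", a)
print("b =", b)
```

The result should agree with the coefficients reported by R above.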
In case the comments get purged, these questions were identified as related:
What is gnuplot's internal representation of floating point numbers?
Gnuplot behaves oddly in polynomial fit. Why is that?
