Gnuplot - Multi-branch fit with errors - gnuplot

I want to do a multi-branch fit with gnuplot. My data set contains x and y values with errors on both, laid out as follows:
x y1 dx dy1
x y2 dx dy2
I was able to do the fit without x and y errors:
fit f(x,y) 'files.dat' using 1:-1:2 via a,b,c
How can I do a multi-branch fit that takes the x and y errors into account?
Thank you in advance

As far as I can tell your case is already mentioned in the gnuplot manual about fit:
As an example, if one has 2 independent variables, and errors for the first independent variable and the
dependent variable, one uses the errors x,z qualifier, and a using qualifier with 5 columns, which are
interpreted as x:y:z:sx:sz (where x and y are the independent variables, z the dependent variable, and sx and
sz the standard deviations of x and z).
In your case, x ("x" in your data file) and y (the line number, pseudo-column -1) are the two independent variables, and z ("y" in your data file) is the dependent variable. The fit command should therefore look like
fit f(x,y) 'files.dat' using 1:-1:2:3:4 errors x,z via a,b,c
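For completeness, here is a minimal sketch of how the pieces might fit together. The model f(x,y) (two straight lines sharing the intercept b), the starting values, and the blank line separating the two branches in files.dat are assumptions for illustration; the question does not show them.

```gnuplot
# Assumed layout of files.dat: columns x y dx dy, with a single blank line
# between the two branches. Pseudo-column -1 starts at 0 and is incremented
# by a single blank line, so it can select the branch.
# Hypothetical model: two straight lines that share the intercept b.
f(x,y) = (y == 0) ? a*x + b : c*x + b

# Starting values (assumed); pick something near the expected solution.
a = 1.0; b = 1.0; c = 1.0

# Five columns are read as x : y(branch) : z : sx : sz.
fit f(x,y) 'files.dat' using 1:-1:2:3:4 errors x,z via a,b,c
```

The ternary expression is only one way to switch between branches; any expression that depends on y and shares parameters across branches would serve the same purpose.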

Related

Is there any error variable for gnuplot fit?

I'm writing C++ code that prints commands for gnuplot, in order to plot different things faster. The code already plots the data and the fit, but now I'm adding some labels and I want to print the fit equation, i.e. something of this form
f(x) = (a +/- Δa)*x + (b +/- Δb)
I have the following line for printing it
set label 1 at screen 0.22, screen 0.75 sprintf('f(x) = %3.4f*x + %3.4f', a, b)
But, as you can see, it only shows the values of a and b, without their errors. I was thinking of passing some error-related variables (FIT_something) to sprintf, to get something like
set label 1 at screen 0.22, screen 0.75 sprintf('f(x) = (%3.4f +/- %3.4f)*x + (%3.4f +/- %3.4f)', a, deltaa, b, deltab)
But I can't find such variables. My questions are: do they exist? And if not, is there any way to print the parameter errors other than writing them explicitly on the line?
Thanks for your help
Please read the statistical overview section of the gnuplot documentation (help statistical_overview). Keeping in mind the caveats described there, see also the documentation for set fit errorvariables, which I extract below:
If the `errorvariables` option is turned on, the error of each fitted
parameter computed by `fit` will be copied to a user-defined variable
whose name is formed by appending "_err" to the name of the parameter
itself. This is useful mainly to put the parameter and its error onto
a plot of the data and the fitted function, for reference, as in:
set fit errorvariables
fit f(x) 'datafile' using 1:2 via a, b
print "error of a is:", a_err
set label 1 sprintf("a=%6.2f +/- %6.2f", a, a_err)
plot 'datafile' using 1:2, f(x)
If the `errorscaling` option is specified, which is the default, the
calculated parameter errors are scaled with the reduced chi square. This is
equivalent to providing data errors equal to the calculated standard
deviation of the fit (FIT_STDFIT) resulting in a reduced chi square of one.
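Applied to the label from the question, this could look roughly like the sketch below. The file name 'datafile' and the starting values are placeholders; a_err and b_err are the variables that `set fit errorvariables` creates for the parameters a and b.

```gnuplot
# Sketch only: 'datafile' and the starting values are placeholders.
set fit errorvariables            # after the fit, a_err and b_err exist
f(x) = a*x + b
a = 1.0; b = 1.0                  # starting values (assumed)
fit f(x) 'datafile' using 1:2 via a, b

# Same label position and number format as in the question,
# with the fitted errors in place of deltaa and deltab.
set label 1 at screen 0.22, screen 0.75 \
    sprintf('f(x) = (%3.4f +/- %3.4f)*x + (%3.4f +/- %3.4f)', a, a_err, b, b_err)
plot 'datafile' using 1:2, f(x)
```

Because the variable names are derived from the parameter names, a program that generates the gnuplot script only needs to append "_err" to each parameter it lists after `via`.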

fit function in gnuplot at x-log(y) scale

My data has two columns: date (in Month/Year format) and the corresponding value. I plotted this data on an x-log(y) scale using gnuplot, and it looks very close to a straight line. I would like to draw a straight line through it by curve fitting. I tried a few fit functions without success.
I tried the following fit functions:
f(x) = a * x + b (f(x) is not linear as scale is x-log(y))
f(x) = a*10**x + b (overflow error)
Any help in this regard would be appreciated.
The overflow error is probably caused by at least one large value of x. If you rescale the x data so that 10**x no longer overflows, the fit might work. As a test, try something like:
f(x) = a*10**x + b
# 'data.dat' is a placeholder file name; column 1 is divided by 1000.0 before fitting
fit f(x) 'data.dat' using ($1/1000.0):2 via a, b
Inspecting the maximum value of x will give you an idea of a suitable scaling value, shown as 1000.0 in this example.
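A different route from rescaling, offered only as a sketch: since the data look like a straight line on an x-log(y) plot, one could fit a straight line to log10(y) and plot 10**g(x) on the logarithmic axis. The file name 'data.dat', the names g, m and q, the starting values, and the assumption that column 1 already holds a moderate-sized number (for example months since the first sample) rather than a raw date are all hypothetical.

```gnuplot
# Assumptions: 'data.dat' has a numeric x in column 1 (e.g. months since
# the first sample) and a strictly positive value in column 2.
g(x) = m*x + q                    # straight line in (x, log10(y)) space
m = 0.01; q = 1.0                 # rough starting values (assumed)

# Fit the logarithm of the data, so the model stays linear in m and q.
fit g(x) 'data.dat' using 1:(log10($2)) via m, q

# Show data and fit on the same x-log(y) scale as in the question.
set logscale y
plot 'data.dat' using 1:2 with points title "data", \
     10**g(x) with lines title "fit"
```

Fitting in log space weights the points differently from fitting a*10**x directly, so the two approaches need not give identical parameters.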

Octave Multiple Colors in Single Plot

Consider X, Y and Z to be column vectors of length n, where Z only takes the values 1-6.
Then, I would like to plot
for i=1:n
if Z(i) == 1
plot(X(i), Y(i), #1)
hold on
elseif
plot(X(i), Y(i), #2)
...
What I would like to do is accomplish this in a single line, such as
plot(X, Y, 'color', Z).
Is there a way to do so?
(In short, can a setting such as the color be dictated by a third vector?)
Thanks in advance.
If I understood your question correctly, you want to plot each pair of coordinates x(i), y(i) using color z(i). Use the scatter() function:
scatter(x,y,[],z)
z can be either a vector or a matrix in which each row is an RGB color specification.

how to fit 3D data within zrange in gnuplot

I know that, for 2D data, restricting the fit to points whose value lies in [-1:4] is done in gnuplot with
f(x)=a*x+b
fit [][-1:4] f(x) "data"
But for 3D data, if I only want to fit the points whose value lies in [-1:4],
f(x)=a*x+b*y+c
fit [][-1:4] f(x) "data"
fit [][][-1:4] f(x) "data"
are both wrong. Why?
I am not sure whether the range behaviour you describe for the 2D fit is actually intended, because it does not work with the gnuplot development version. According to the documentation, the range specifications of the fit command apply only to the dummy variables (i.e. x and y). So your first fit command may work only because of a bug, which happens to be a feature for you.
To limit the z-range, you can set all values outside the desired range to 1/0, which results in an undefined data point which is then ignored:
f(x, y) = a*x + b*y + c
zmin = -1
zmax = 4
fit f(x, y) "data" using 1:2:($3 < zmin || $3 > zmax ? 1/0 : $3):(1) via a,b,c
Note that your function must be defined for two dummy variables x and y, and that you must include the via clause, which is missing in all of your examples. The documentation describes the required using format:
To fit a function with two independent variables, z=f(x,y), the required
format is using with four items, x:y:z:s. The complete format must be
given---no default columns are assumed for a missing token. Weights for
each data point are evaluated from 's' as above. If error estimates are
not available, a constant value can be specified as a constant expression
(see plot datafile using), e.g., using 1:2:3:(1).
Note that this fits a plane in 3D, not a line. I was confused until I zoomed out and realized it. Try the dataset of 4 points below, with 'set autoscale' to make sure you see the whole picture. Or just read the fit.log file: the large errors there indicate a poor fit.
377.4202 -345.5518 2.1142
377.4201 -345.5505 2.5078
377.4206 -345.556 2.8359
377.4288 -345.5555 3.2109

Linear Fit does not adjust b independently from a

I'm using the following gnuplot script to plot a linear fit:
#!/usr/bin/gnuplot
set term cairolatex
set output "linear_fit.tex"
c = 299792458.
x(x) = c / x
y(x) = x
h(x) = a * x + b
fit h(x) "linear_fit.dat" u (x($1)):(y($2)) via a,b
plot "linear_fit.dat" u (x($1)):(y($2)) w points title "", \
(h(x)) with lines linecolor rgb "black" title "Linear Fit"
However, after the iterations converge, b is always 1.0: https://dpaste.de/ozReq/
How can I get gnuplot to adjust b as well as a?
Update: Repeating the fit command a few hundred times with alternating via a/via b does give pretty good results, but that just can't be how it's supposed to be done.
Update 2: Here's the data in linear_fit.dat:
# lambda, V
360e-9 1.119
360e-9 1.148
360e-9 1.145
400e-9 0.949
400e-9 0.993
400e-9 0.971
440e-9 0.883
440e-9 0.875
440e-9 0.863
490e-9 0.737
490e-9 0.728
490e-9 0.755
540e-9 0.575
540e-9 0.571
540e-9 0.592
590e-9 0.457
590e-9 0.455
590e-9 0.482
I think your troubles stem from the fact that your x-values are very large (c/lambda is roughly 5e14 to 8e14 for your data).
If you do not provide gnuplot with an initial guess for a and b, it will assume a=1 and b=1 as starting points for the fit. However, this is a poor initial guess:
Please note the log scale on both the x- and y-axis.
From the gnuplot documentation:
fit may, and often will get "lost" if started far from a solution, where SSR is large and changing slowly as the parameters are varied, or it may reach a numerically unstable region (e.g., too large a number causing a floating point overflow) which results in an "undefined value" message or gnuplot halting.
To improve the chances of finding the global optimum, you should set the starting values at least roughly in the vicinity of the solution, e.g., within an order of magnitude, if possible. The closer your starting values are to the solution, the less chance of stopping at another minimum. One way to find starting values is to plot data and the fitting function on the same graph and change parameter values and replot until reasonable similarity is reached. The same plot is also useful to check whether the fit stopped at a minimum with a poor fit.
In your case, such starting values could be:
a = 1e-15
b = -0.5
I obtained these values by eye-balling your range of values.
With those starting values, the linear fit results in:
Final set of parameters Asymptotic Standard Error
======================= ==========================
a = 1.97355e-015 +/- 6.237e-017 (3.161%)
b = -0.5 +/- 0.04153 (8.306%)
Which looks like this:
You can play with the control setting of fit (such as setting FIT_LIMIT = 1.e-35) or the starting values to achieve a better fit than this.
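Another thing that may be worth trying, sketched here under assumptions, is to rescale x so that both parameters are of order one. The scale factor s, the helper nu() and the starting values are illustrative choices; the data are copied from linear_fit.dat in the question, and the inline datablock needs gnuplot 5 or later.

```gnuplot
# Data copied from linear_fit.dat in the question: lambda, V
$DATA << EOD
360e-9 1.119
360e-9 1.148
360e-9 1.145
400e-9 0.949
400e-9 0.993
400e-9 0.971
440e-9 0.883
440e-9 0.875
440e-9 0.863
490e-9 0.737
490e-9 0.728
490e-9 0.755
540e-9 0.575
540e-9 0.571
540e-9 0.592
590e-9 0.457
590e-9 0.455
590e-9 0.482
EOD

c = 299792458.
s = 1e-14                         # assumed scale: frequency in units of 1e14 Hz
nu(lambda) = s * c / lambda       # scaled x-values are roughly 5 to 8
h(x) = a*x + b
a = 0.1; b = -0.1                 # rough starting values (assumed)
fit h(x) $DATA using (nu($1)):2 via a, b

# Convert the slope back to the unscaled frequency axis.
print sprintf("slope per Hz = %.4g, intercept = %.4g", a*s, b)
```

With x and y both of order one, the derivatives of the residuals with respect to a and b have comparable magnitude, which is the situation the fitting algorithm handles best. The intercept is unaffected by the rescaling; only the slope has to be multiplied by s afterwards.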
EDIT
While I still have not been able to coax gnuplot into modifying both parameters a, b at the same time, I found an alternate approach using R. I am aware that there are many other (scripting) languages that can perform a linear fit and this question was about gnuplot. However, the required effort with R appeared to be minimal.
Here's an example, which, when saved as linear_fit.R and called with
R CMD BATCH linear_fit.R
will provide the two coefficients of the linear fit, that gnuplot failed to provide.
y <- c(1.119, 1.148, 1.145, 0.949, 0.993, 0.971, 0.883, 0.875, 0.863,
0.737, 0.728, 0.755, 0.575, 0.571, 0.592, 0.457, 0.455, 0.482)
x <- c(3.60E-007, 3.60E-007, 3.60E-007, 4.00E-007, 4.00E-007,
4.00E-007, 4.40E-007, 4.40E-007, 4.40E-007, 4.90E-007,
4.90E-007, 4.90E-007, 5.40E-007, 5.40E-007, 5.40E-007,
5.90E-007, 5.90E-007, 5.90E-007)
c = 299792458.
x <- c/x
lm.out <- lm(y ~ x)
svg("linear_fit.svg")
plot(x,y)
abline(lm.out,col="red")
summary(lm.out)
You will end up with an svg-file that contains the plot and a linear_fit.Rout text file. In there you'll find the following coefficients:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.429e-01 4.012e-02 -13.53 3.55e-10 ***
x 2.037e-15 6.026e-17 33.80 2.61e-16 ***
So, in the terminology of the original question, we obtain:
a = 2.037e-15
b = -5.429e-01
These values are very close to the values you quoted from alternating the fit.
In case the comments get purged, these questions were identified as related:
What is gnuplot's internal representation of floating point numbers?
Gnuplot behaves oddly in polynomial fit. Why is that?
