GNUPLOT - How to plot sum function that depends on limit? - gnuplot

I want to plot the following function in gnuplot:
f(x) = \sum_{i=1}^{x-1} 10^i
I tried using f(x)= (sum [i=1:x-1] 10 ** i), but it says "range specifiers of sum must have integer values".
Any idea how I can plot this sum function, where the independent variable appears in the limits? Thanks!

Convert your independent variable to an integer (see help int). This may need some changes depending on exactly what you want. Try:
f(x) = (sum [i=1:int(x-1)] 10**i)
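To see what the truncated limit does, here is a small Python sketch, used only for illustration; the names f and f_closed are hypothetical. Like gnuplot's int(), Python's int() truncates toward zero, so for any x < 2 the range is empty and f(x) is 0 in the sketch. The geometric-series closed form (10^(n+1) - 10)/9 gives the same values without a loop.

```python
def f(x):
    """Direct sum of 10**i for i = 1 .. int(x - 1), mirroring the gnuplot sum."""
    n = int(x - 1)  # truncates toward zero, like gnuplot's int()
    return sum(10**i for i in range(1, n + 1))  # empty range -> 0


def f_closed(x):
    """Geometric series: 10 + 100 + ... + 10**n = (10**(n+1) - 10) / 9."""
    n = max(int(x - 1), 0)  # an empty sum (n <= 0) is 0
    return (10 ** (n + 1) - 10) // 9


for x in (0.5, 1.0, 2.0, 3.7, 5.0):
    print(x, f(x), f_closed(x))
```

Because the result only changes at integer values of x, the plotted function is a staircase.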

Related

Why does my fit for a logarithmic function look so wrong?

I'm plotting this dataset and making a logarithmic fit, but for some reason the fit looks badly wrong. At one point I got a good enough fit, but when I re-plotted I got that bad fit again. At the very beginning of the data there was a point 0.0 0.0076, but I changed it to 0.001 0.0076 to avoid the asymptote.
For the fit I'm using the following (not exactly the one behind the image above, but I'm testing with this one now and it gives the same bad fit):
f(x) = a*log(k*x + b)
fit f(x) 'R_B/R_B.txt' via a, k, b
And the output is this
Also, sometimes it reports 7 iterations, as in the screenshot above, and other times only 1. When it produced the "correct" fit, it took something like 35 iterations and, if I remember correctly, gave a = 32.
Edit: here is the good one again; this is the plot I got. And again, when I re-plotted, I got that weird fit. Curiously, if the point 0.0 0.0076 is present when the good fit is about to be shown, gnuplot says "Undefined value during function evaluation", but that message does not appear when I get the bad one.
Do you know why I keep getting this inconsistency? Thanks for your help
As I already mentioned in the comments, fitting antiderivatives works much better than fitting derivatives, because numerically computed derivatives are strongly scattered even when the data are only slightly scattered.
The principle of the method of fitting an integral equation (obtained from the original equation to be fitted) is explained in https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales . The application to the case of y=a.ln(c.x+b) is shown below.
Numerical calculus :
To get an even better result (according to some specified fitting criterion), one can use the above parameter values as initial values for an iterative nonlinear regression method implemented in some convenient software.
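The integral equation itself is not reproduced in the text, so what follows is only one possible version, derived here by integration by parts and possibly different from the one in the linked paper. With S(x) the integral of y from x1 to x, the model y = a*ln(c*x + b) gives x*y - S(x) = A*y + B*x + D, where A = -b/c, B = a and D is a constant. That relation is linear in A, B and D, so ordinary least squares needs no starting values; ln(c) then follows from y = a*ln(c) + a*ln(x + b/c). The Python sketch below assumes sorted x values and uses the trapezoidal rule for S; the function names are hypothetical.

```python
import math


def solve3(m, v):
    """Solve the 3x3 system m*p = v by Gaussian elimination with partial pivoting."""
    aug = [list(row) + [vi] for row, vi in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, 4):
                aug[r][j] -= factor * aug[col][j]
    p = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        p[r] = (aug[r][3] - sum(aug[r][j] * p[j] for j in range(r + 1, 3))) / aug[r][r]
    return p


def integral_equation_fit(xs, ys):
    """Non-iterative estimates of a, c, b in y = a*ln(c*x + b)."""
    # s[i]: trapezoidal approximation of the integral of y from xs[0] to xs[i]
    s = [0.0]
    for i in range(1, len(xs)):
        s.append(s[-1] + 0.5 * (ys[i] + ys[i - 1]) * (xs[i] - xs[i - 1]))
    # Linear least squares for x*y - S = A*y + B*x + D (normal equations)
    rows = [(y, x, 1.0) for x, y in zip(xs, ys)]
    target = [x * y - si for x, y, si in zip(xs, ys, s)]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(3)]
    A, B, _ = solve3(ata, atb)
    a, u = B, -A  # a = B and u = b/c
    # y = a*ln(c) + a*ln(x + u), so ln(c) is the mean of (y - a*ln(x + u))/a
    ln_c = sum((y - a * math.log(x + u)) / a for x, y in zip(xs, ys)) / len(xs)
    c = math.exp(ln_c)
    return a, c, u * c


xs = [0.001 + 0.01 * i for i in range(2000)]
ys = [35.0 * math.log(0.37 * x + 1.07) for x in xs]  # synthetic, noise-free data
print(integral_equation_fit(xs, ys))
```

On real, scattered data the estimates would then serve as starting values for the iterative fit, as described above.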
NOTE : The integral equation used in the present case is :
NOTE: the figure above compares the result of the method of fitting an integral equation with the result of the method of fitting with derivatives.
Acknowledgements: Alex Sveshnikov did very good work in applying the method of regression with derivatives. This allows an interesting and enlightening comparison. If the goal is only to compute approximate parameter values to be used in nonlinear regression software, both methods are quite equivalent. Nevertheless, the integral-equation method appears preferable for scattered data.
UPDATE (After Alex Sveshnikov updated his answer)
The figure below was drawn using Alex Sveshnikov's result with a further iterative fitting method.
The two curves are almost indistinguishable. This shows that, in the present case, fitting the integral equation is almost sufficient without further treatment.
Of course, this is not always so satisfying; here it is due to the low scatter of the data.
In ADDITION, an answer to a question raised in the comments by CosmeticMichu:
The problem here is that the fit algorithm starts with "wrong" approximations for the parameters a, k, and b, so during the minimization it finds a local minimum, not the global one. You can improve the result if you provide the algorithm with starting values that are close to the optimal ones. For example, let's start with the following parameters:
gnuplot> a=47.5087
gnuplot> k=0.226
gnuplot> b=1.0016
gnuplot> f(x)=a*log(k*x+b)
gnuplot> fit f(x) 'R_B.txt' via a,k,b
....
....
....
After 40 iterations the fit converged.
final sum of squares of residuals : 16.2185
rel. change during last iteration : -7.6943e-06
degrees of freedom (FIT_NDF) : 18
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf) : 0.949225
variance of residuals (reduced chisquare) = WSSR/ndf : 0.901027
Final set of parameters Asymptotic Standard Error
======================= ==========================
a = 35.0415 +/- 2.302 (6.57%)
k = 0.372381 +/- 0.0461 (12.38%)
b = 1.07012 +/- 0.02016 (1.884%)
correlation matrix of the fit parameters:
a k b
a 1.000
k -0.994 1.000
b 0.467 -0.531 1.000
The resulting plot is
Now the question is how you can find "good" initial approximations for your parameters. Well, you start with
y = a*log(k*x + b)
If you differentiate this equation you get
dy/dx = a*k/(k*x + b)
or
1/(a*k) = (dx/dy)/(k*x + b)
The left-hand side of this equation is some constant 'C', so the expression on the right-hand side should be equal to this constant as well:
dx/dy = C*(k*x + b) = C*k*x + C*b
In other words, the reciprocal of the derivative of your data should be approximated by a linear function. So, from your data x[i], y[i] you can construct the reciprocal derivatives x[i], (x[i+1]-x[i])/(y[i+1]-y[i]) and the linear fit of these data:
The fit gives the following values:
C*k = 0.0236179
C*b = 0.106268
Now we need to find the values of a and C. Let's say that we want the resulting graph to pass close to the starting and ending points of our dataset. That means that we want
a*log(k*x1 + b) = y1
a*log(k*xn + b) = yn
Thus,
a*log((C*k*x1 + C*b)/C) = a*log(C*k*x1 + C*b) - a*log(C) = y1
a*log((C*k*xn + C*b)/C) = a*log(C*k*xn + C*b) - a*log(C) = yn
By subtracting the equations we get the value for a:
a = (yn-y1)/log((C*k*xn + C*b)/(C*k*x1 + C*b)) = 47.51
Then,
log(k*x1+b) = y1/a
k*x1+b = exp(y1/a)
C*k*x1+C*b = C*exp(y1/a)
From this we can calculate C:
C = (C*k*x1+C*b)/exp(y1/a)
and finally find the k and b:
k=0.226
b=1.0016
These are the values used above for finding the better fit.
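The same recipe can be sketched outside gnuplot. The Python version below follows the steps above (the function name initial_guess is hypothetical, and a plain least-squares line stands in for gnuplot's fit of the straight line): reciprocal derivatives from differences of neighbouring points, a straight-line fit giving C*k and C*b, then a and C from the first and last points. Like the script in the UPDATE below, it assigns each difference to the current point. It assumes sorted x values and that consecutive y values differ, so dy is never zero.

```python
import math


def initial_guess(xs, ys):
    """Starting values for y = a*log(k*x + b) from the reciprocal derivative."""
    # Reciprocal derivative dx/dy from backward differences, assigned to the
    # current x (the first point has no predecessor and is skipped)
    px, py = [], []
    for i in range(1, len(xs)):
        px.append(xs[i])
        py.append((xs[i] - xs[i - 1]) / (ys[i] - ys[i - 1]))
    # Straight-line least squares: dx/dy = Ck*x + Cb
    n = len(px)
    mx, my = sum(px) / n, sum(py) / n
    sxx = sum((x - mx) ** 2 for x in px)
    sxy = sum((x - mx) * (y - my) for x, y in zip(px, py))
    ck = sxy / sxx
    cb = my - ck * mx
    # Make the curve pass through the first and last data points
    x1, y1, xn, yn = xs[0], ys[0], xs[-1], ys[-1]
    a = (yn - y1) / math.log((ck * xn + cb) / (ck * x1 + cb))
    c = (ck * x1 + cb) / math.exp(y1 / a)
    return a, ck / c, cb / c
```

With noise-free synthetic data the result lands close to the true parameters but not exactly on them, because each difference is assigned to one end of its interval. That is acceptable for starting values, which the subsequent fit refines.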
UPDATE
You can automate the process described above with the following script:
# Name of the file with the data
data='R_B.txt'
# The coordinates of the last data point
xn=NaN
yn=NaN
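# Coordinates of the previous data point, used to compute the differences dx and dy
x0=NaN
y0=NaN
# Linear model for the reciprocal derivative: dx/dy = C*k*x + C*b
linearFit(x)=Ck*x+Cb
# Fit dx/dy against x; as a side effect, xn and yn end up holding the last data point
fit linearFit(x) data using (xn=$1,dx=$1-x0,x0=$1,$1):(yn=$2,dy=$2-y0,y0=$2,dx/dy) via Ck, Cb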
# The coordinates of the first data point
x1=NaN
y1=NaN
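# Plotting only the first row (every ::0::0) stores its coordinates in x1, y1
plot data using (x1=$1):(y1=$2) every ::0::0
# Starting values from the linear fit and the two end points
a=(yn-y1)/log((Ck*xn+Cb)/(Ck*x1+Cb))
C=(Ck*x1+Cb)/exp(y1/a)
k=Ck/C
b=Cb/C
# Final nonlinear fit, starting from these values
f(x)=a*log(k*x+b)
fit f(x) data via a,k,b
plot data, f(x)
pause -1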

Is there any error variable for gnuplot fit?

I'm writing C++ code that prints commands for gnuplot, in order to plot different things faster. The code already plots the data and the fit, but now I'm adding some labels, and I want to print the fit equation, i.e. something of this form
f(x) = (a +/- Δa)*x + (b +/- Δb)
I have the following line for printing it
set label 1 at screen 0.22, screen 0.75 sprintf('f(x) = %3.4f*x + %3.4f', a, b)
But, as you can see, there are only the a and b values, without errors. I was thinking of putting some error-related variables (FIT_something) into the sprintf call and then having something like
set label 1 at screen 0.22, screen 0.75 sprintf('f(x) = (%3.4f +/- %3.4f)*x + (%3.4f +/- %3.4f)', a, deltaa, b, deltab)
But I can't find them. My questions are: do they exist? And if not, is there any way to print the parameter errors other than writing them explicitly on the line?
Thanks for your help
Please read the statistical overview section of the gnuplot documentation (help statistical_overview). Keeping in mind the caveats described there, see also the documentation for set fit errorvariables, which I extract below:
If the `errorvariables` option is turned on, the error of each fitted
parameter computed by `fit` will be copied to a user-defined variable
whose name is formed by appending "_err" to the name of the parameter
itself. This is useful mainly to put the parameter and its error onto
a plot of the data and the fitted function, for reference, as in:
set fit errorvariables
fit f(x) 'datafile' using 1:2 via a, b
print "error of a is:", a_err
set label 1 sprintf("a=%6.2f +/- %6.2f", a, a_err)
plot 'datafile' using 1:2, f(x)
If the `errorscaling` option is specified, which is the default, the
calculated parameter errors are scaled with the reduced chi square. This is
equivalent to providing data errors equal to the calculated standard
deviation of the fit (FIT_STDFIT) resulting in a reduced chi square of one.
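For a straight-line model with unit weights, the values that set fit errorvariables stores in a_err and b_err should correspond to the usual least-squares standard errors scaled by the reduced chi square, as the errorscaling paragraph describes. The Python sketch below (hypothetical function name, illustration only) computes them by hand, which can serve as a cross-check against fit.log, and prints the label text in the form asked for in the question.

```python
import math


def linear_fit_with_errors(xs, ys):
    """Fit y = a*x + b with unit weights; return a, b, a_err, b_err.

    The errors are the standard least-squares errors scaled by the reduced
    chi square WSSR/ndf (the analogue of gnuplot's errorscaling).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    wssr = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ndf = n - 2                 # data points minus fitted parameters
    red_chisq = wssr / ndf      # "variance of residuals" in the fit output
    a_err = math.sqrt(red_chisq / sxx)
    b_err = math.sqrt(red_chisq * (1.0 / n + mx ** 2 / sxx))
    return a, b, a_err, b_err


a, b, a_err, b_err = linear_fit_with_errors([0, 1, 2, 3], [1, 3, 5, 8])
print("f(x) = (%.4f +/- %.4f)*x + (%.4f +/- %.4f)" % (a, a_err, b, b_err))
```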

fit function in gnuplot at x-log(y) scale

My data has two columns: a date (in Month/Year format) and the corresponding value. I plotted this data on an x-log(y) scale using gnuplot, and it looks very close to a straight line. I would like to draw a straight line through it using curve fitting, but the fit functions I tried did not work:
f(x) = a * x + b (the result is not a straight line, since the scale is x-log(y))
f(x) = a*10**x + b (overflow error)
Any help in this regard would be appreciated.
The overflow error is probably due to at least one large value of x. If you rescale the x data so that there is no overflow when calculating 10**x, the fit might work. As a test, try something like:
f(x) = a*10**(x/1000.0) + b
Inspecting the maximum value of x will give you an idea of the scaling value, shown as 1000.0 in my example.
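A different route, named plainly as an alternative to plain rescaling: a curve that looks straight on an x-log(y) plot has the form y = a*10**(c*t), so log10(y) is a straight line in t, and an ordinary linear fit of log10(y) gives a and c directly, or good starting values for a nonlinear fit. The Python sketch assumes all y values are positive and that t is x shifted and scaled; x0 and s are hypothetical choices (for example, if x holds time values in seconds, x0 could be the first date and s roughly one year in seconds).

```python
import math


def log_linear_fit(xs, ys, x0, s):
    """Fit y = a * 10**(c*t), with t = (x - x0)/s, by least squares on log10(y)."""
    ts = [(x - x0) / s for x in xs]   # shifted and scaled x keeps 10**(c*t) small
    ls = [math.log10(y) for y in ys]  # requires every y > 0
    n = len(ts)
    mt, ml = sum(ts) / n, sum(ls) / n
    c = (sum((t - mt) * (l - ml) for t, l in zip(ts, ls))
         / sum((t - mt) ** 2 for t in ts))
    a = 10 ** (ml - c * mt)           # intercept of the straight line in log10(y)
    return a, c


# Synthetic example: monthly points over two years, x in seconds
x0, s = 1.6e9, 3.15e7
xs = [x0 + s * m / 12 for m in range(24)]
ys = [3.0 * 10 ** (0.5 * (x - x0) / s) for x in xs]
print(log_linear_fit(xs, ys, x0, s))  # approximately (3.0, 0.5)
```

In gnuplot the corresponding model would be f(x) = a*10**(c*(x - x0)/s), and values obtained this way could be used as its starting values.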

Gnuplot fit of a nested function

What is the proper way in gnuplot to fit a function f(x) of the following form?
f(x) = A*exp(x - B*f(x))
I tried to fit it like any other function, using:
fit f(x) "data.txt" via A,B
and the output is just a message saying: "stack overflow"
I don't even know how to search for this topic, so any help would be much appreciated.
What is this kind of function called? Nested? Recursive? Implicit?
Thanks
This doesn't only fail for fitting, but also for plotting. You'll have to write down the explicit form of f(x); otherwise gnuplot will keep recursing until it reaches its recursion limit. One way to do it would be to use a different name:
f(x) = sin(x) # for example
g(x) = A*exp(x - B*f(x))
And now use g(x) to fit, rather than f(x). If you have never declared f(x), then gnuplot doesn't have an expression to work with. In any case, if you want to recursively define a function, you'll at least need to set a recursion limit. Maybe something like this:
f0(x) = x
f1(x) = A*exp(x - B*f0(x))
f2(x) = A*exp(x - B*f1(x))
f3(x) = A*exp(x - B*f2(x))
...
This can be automatically looped:
limit=10
f0(x) = x
do for [i=1:limit] {
j=i-1
eval "f".i."(x) = A*exp(x - B*f".j."(x))"
}
Using the expression above, you set the recursion limit with the limit variable. In any case, it must remain finite.
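The chain f0, f1, ..., f_limit is a fixed-point iteration. The Python sketch below (hypothetical names; A, B and x chosen arbitrarily) shows that, when it converges, deeper levels stop changing the value, which gives one way to pick limit. Convergence is an assumption: near the solution each level multiplies the error by roughly -B*f(x), so the chain only settles down when |B*f(x)| < 1.

```python
import math


def f_truncated(x, A, B, limit):
    """Evaluate f_limit(x), where f0(x) = x and f_i(x) = A*exp(x - B*f_{i-1}(x))."""
    value = x  # f0(x) = x, as in the gnuplot loop
    for _ in range(limit):
        value = A * math.exp(x - B * value)
    return value


A, B, x = 1.0, 0.5, 0.0
for limit in (1, 2, 5, 10, 20, 40):
    print(limit, f_truncated(x, A, B, limit))
```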
That is a recursive function. You need a condition for the recursion to stop, like a maximum number of iterations:
maxiter = 10
f(x, n) = (n > maxiter ? 0 : A*exp(x - B*f(x, n+1)))
fit f(x, 0) "data.txt" via A,B
Of course, you must check which value should be returned when the recursion stops (here I used 0).
Thanks for your replies.
While discussing this problem with a friend, I found a workaround.
First, this kind of function is called a "transcendental function": f(x) cannot be solved for explicitly, but x can be solved as a function of f(x), which gives
x = B*f(x) + log(f(x)/A)
Therefore it is possible to define a new function (that is not transcendental)
g(x) = B*x + log(x/A)
From here you can fit g(x) to the data with the axes swapped, i.e. x as a function of y. In gnuplot the fit can be done as
fit g(x) "data.txt" using ($2):($1) via A,B
Hope this will help someone else
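Since x = B*y + log(y/A) can be rewritten as x - log(y) = B*y - log(A), the inverted model is even linear in B and log(A). The Python sketch below (hypothetical helper names) uses that to estimate A and B with ordinary least squares, which could also provide starting values for the gnuplot fit. The synthetic data are generated by solving the implicit equation with bisection, assuming A > 0, B >= 0 and y > 0.

```python
import math


def solve_f(x, A, B):
    """Solve y = A*exp(x - B*y) for y > 0 by bisection on B*y + log(y/A) - x."""
    def F(y):
        return B * y + math.log(y / A) - x  # increasing in y when B >= 0

    lo, hi = 1.0, 1.0
    while F(lo) > 0:  # move lo down until the root is above it
        lo /= 2.0
    while F(hi) < 0:  # move hi up until the root is below it
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if F(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_inverse(xs, ys):
    """Least squares for x - log(y) = B*y + q, with q = -log(A)."""
    zs = [x - math.log(y) for x, y in zip(xs, ys)]
    n = len(ys)
    my, mz = sum(ys) / n, sum(zs) / n
    B = (sum((y - my) * (z - mz) for y, z in zip(ys, zs))
         / sum((y - my) ** 2 for y in ys))
    q = mz - B * my
    return math.exp(-q), B
```

Fitting the inverted model minimizes residuals in x rather than in y, so with noisy data the estimates can differ somewhat from those of a fit to the original implicit model.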

how to fit 3D data within zrange in gnuplot

I know that in gnuplot I can fit 2D data only where the value lies between [-1:4] with
f(x)=a*x+b
fit [][-1:4] f(x) "data"
but for 3D data, if I only want to fit the data where f(x) has a value between [-1:4],
f(x)=a*x+b*y+c
fit [][-1:4] f(x) "data"
fit [][][-1:4] f(x) "data"
are both wrong. Why?
I am not sure if the range behaviour you describe with the 2D fit is actually intended, because it does not work with the gnuplot development version. According to the documentation, the range specifications for the fit command apply only to the dummy variables (i.e. x and y). So it might be that your first fit command works only because of a bug, which is a feature for you.
To limit the z-range, you can set all values outside the desired range to 1/0, which results in an undefined data point which is then ignored:
f(x, y) = a*x + b*y + c
zmin = -1
zmax = 4
fit f(x, y) "data" using 1:2:($3 < zmin || $3 > zmax ? 1/0 : $3):(1) via a,b,c
Note that your function must be defined for two dummy variables x and y, and you must have the via statement, which is missing in all of your examples.
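The 1/0 trick simply drops the points whose z lies outside [zmin:zmax] before the least-squares fit. The same idea as a Python sketch (hypothetical function name; a direct solve of the 3x3 normal equations stands in for gnuplot's iterative fit, which for a plane should reach the same minimum):

```python
def fit_plane_in_zrange(points, zmin, zmax):
    """Fit z = a*x + b*y + c to the points whose z lies in [zmin, zmax]."""
    kept = [(x, y, z) for x, y, z in points if zmin <= z <= zmax]
    # Normal equations for the regressors (x, y, 1)
    rows = [(x, y, 1.0) for x, y, _ in kept]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    v = [sum(r[i] * z for r, (_, _, z) in zip(rows, kept)) for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for j in range(col, 3):
                m[r][j] -= factor * m[col][j]
            v[r] -= factor * v[col]
    p = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        p[r] = (v[r] - sum(m[r][j] * p[j] for j in range(r + 1, 3))) / m[r][r]
    return tuple(p), len(kept)
```

Returning the number of kept points makes it easy to check how many rows the range actually excluded.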
To fit a function with two independent variables, z=f(x,y), the required
format is using with four items, x:y:z:s. The complete format must be
given---no default columns are assumed for a missing token. Weights for
each data point are evaluated from 's' as above. If error estimates are
not available, a constant value can be specified as a constant expression
(see plot datafile using), e.g., using 1:2:3:(1).
This plots a plane in 3D, not a line. I was confused until I zoomed out and realized it. Try the dataset of 4 points below, and use 'set autoscale' to make sure you see the whole image. Or just read the fit.log file and notice that the errors are high, indicating a poor fit.
377.4202 -345.5518 2.1142
377.4201 -345.5505 2.5078
377.4206 -345.556 2.8359
377.4288 -345.5555 3.2109
