Fitting in gnuplot with three variables - gnuplot

I am trying to fit some data using gnuplot.
Here is the data (variables h, k, l and I):
#h k l I
2 1 1 7807
2 2 0 9664
3 2 1 6042
4 0 0 1394
3 3 2 1358
4 2 0 4896
### Function
I(h,k,l) = M * (F * ( (sin(A*pi*sqrt(h*h+k*k+l*l)*L))/(A*2*pi*sqrt(h*h+k*k+l*l)) ))^2
### Initial values
M=1
F=0.5
A=1
L=1
### Fitting
fit I(h,k,l) "cavendish.data" using 1:2:3 via M, F, A, L
I want to determine the constants M, F, A and L from this fit.
When I run this code I get the message "undefined variable: h".
How can I determine the constants? Thanks in advance.

Try using a recent version of gnuplot (>= 5.0), which supports fit commands with more than two variables (see release notes). Also note that the power operator in gnuplot is ** and not ^.
Your example has to be changed slightly to work:
### Function
I(h,k,l) = M * (F * ((sin(A*pi*sqrt(h*h+k*k+l*l)*L))/(A*2*pi*sqrt(h*h+k*k+l*l)) ))**2
### Initial values
M=1.0
F=0.5
A=1.0
L=1.0
### Fitting
set dummy h, k, l
fit I(h,k,l) "cavendish.data" using 1:2:3:4 via M, F, A, L
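As a sanity check on the starting values, the model can be evaluated outside gnuplot. The following is a minimal Python sketch, not part of the original answer; the function name intensity is made up for illustration. It also makes visible that M and F enter the formula only as the product M*F**2, so the fit may struggle to determine them separately.

```python
import math

def intensity(h, k, l, M, F, A, L):
    """Model from the question: M * (F * sin(A*pi*q*L) / (A*2*pi*q))**2."""
    q = math.sqrt(h * h + k * k + l * l)
    return M * (F * math.sin(A * math.pi * q * L) / (A * 2 * math.pi * q)) ** 2

# (h, k, l, measured I) rows copied from the question
data = [
    (2, 1, 1, 7807),
    (2, 2, 0, 9664),
    (3, 2, 1, 6042),
    (4, 0, 0, 1394),
    (3, 3, 2, 1358),
    (4, 2, 0, 4896),
]

# Compare the measurements with the model at the initial guesses
for h, k, l, observed in data:
    print(h, k, l, observed, intensity(h, k, l, M=1.0, F=0.5, A=1.0, L=1.0))
```

If the printed model values are orders of magnitude away from the measured ones, better starting values are probably worth the effort before running fit.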

Related

Why isn't GNUPlot drawing a trendline that's fitted to the points in my dataset?

I have the following GNUPlot sequence of commands:
$ cat bb.gnuplot
set datafile separator ","
set autoscale x
set autoscale y
set xdata time
set timefmt "%Y%m%d"
set format x "%Y%m%d"
set key left top
set grid
m=1
b=1
f(x) = m*x + b
fit f(x) "bb" using 1:2 via m,b
plot "bb" using 1:2 title "filebeat-6.5.1", f(x) title "fit"
Along with this sample data:
$ cat bb
20190416,0
20190417,0
20190418,0
20190419,0
20190420,0
20190423,0
20190424,0
20190425,0
20190426,0
20190509,0
20190510,72
20190511,62
20190512,63
20190513,108
20190514,78
20190515,66
20190516,59
20190517,86
20190518,57
20190519,57
20190520,62
20190521,78
20190522,95
20190523,104
20190524,22
20190525,128
20190526,96
20190527,125
20190528,129
20190529,152
20190530,160
20190531,148
20190601,136
20190602,178
20190603,198
20190604,148
20190605,140
20190606,142
20190607,171
20190608,205
20190609,174
20190610,198
20190611,208
20190612,205
20190613,13
I'm trying to get GNUPlot to draw a trend line in the same plot but the line I'm getting doesn't make sense to me in terms of where it's getting placed in my plot.
$ gnuplot < bb.gnuplot
iter chisq delta/lim lambda m b
0 1.0926745428e+20 0.00e+00 1.10e+09 1.000000e+00 1.000000e+00
1 1.3194958855e+16 -8.28e+08 1.10e+08 1.098907e-02 1.000000e+00
2 1.6307478323e+08 -8.09e+12 1.10e+07 1.279057e-06 1.000000e+00
3 2.1025098835e+05 -7.75e+07 1.10e+06 5.819285e-08 1.000000e+00
4 2.1025098815e+05 -9.56e-05 1.10e+05 5.819150e-08 1.000000e+00
iter chisq delta/lim lambda m b
After 4 iterations the fit converged.
final sum of squares of residuals : 210251
rel. change during last iteration : -9.56318e-10
degrees of freedom (FIT_NDF) : 43
rms of residuals (FIT_STDFIT) = sqrt(WSSR/ndf) : 69.9254
variance of residuals (reduced chisquare) = WSSR/ndf : 4889.56
Final set of parameters Asymptotic Standard Error
======================= ==========================
m = 5.81915e-08 +/- 7.064e-06 (1.214e+04%)
b = 1 +/- 1.101e+04 (1.101e+06%)
correlation matrix of the fit parameters:
m b
m 1.000
b -1.000 1.000
Resulting graph:
I'm expecting the line to cut through my points and show me the optimally fitted line among the data points that I've provided it.
What am I missing here?
I can't find the appropriate section in the manual and I can't explain it well, but try replacing your function with:
f(x) = m*(x-strptime("%Y%m%d","20190509")) + b
I guess it has something to do with offset/prescaling, because time/date data is handled internally as seconds elapsed since January 1st, 1970. So today, June 13th, 2019, is approx. 1'560'000'000 seconds, while your time span is only about 5'000'000 seconds (58 days). This makes it difficult to find proper parameters. If I find a better explanation, I will add it (or maybe somebody else can explain it better).
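To see the magnitudes involved, here is a small Python sketch (an illustration only, not what gnuplot does internally). It converts the dates to epoch seconds, subtracts the first date, and fits a line with the closed-form least-squares formulas rather than gnuplot's iterative fit. The helper names to_epoch and fit_line are made up for this example, and only the first four non-trivial rows of the data are used.

```python
import calendar
import time

def to_epoch(datestr):
    """Convert a YYYYMMDD string to seconds since 1970-01-01 (UTC)."""
    return calendar.timegm(time.strptime(datestr, "%Y%m%d"))

def fit_line(xs, ys):
    """Closed-form least-squares fit of y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

dates = ["20190509", "20190510", "20190511", "20190512"]
values = [0, 72, 62, 63]

t0 = to_epoch(dates[0])                       # about 1.56e9 seconds
shifted = [to_epoch(d) - t0 for d in dates]   # 0, 86400, 172800, ...

m, b = fit_line(shifted, values)
print(t0, shifted)
print(m * 86400, b)   # slope per day and intercept at the first date
```

With the offset removed, b is the value at the first date instead of the extrapolated value at the year 1970, which is a far easier target for an iterative fit starting from m=1, b=1.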
Result:

Plotting Average curve for points in gnuplot

[Current]
I am importing a text file in which the first column has the simulation time (0~150) and the second column has the delay (0.01~0.02).
1.000000 0.010007
1.000000 0.010010
2.000000 0.010013
2.000000 0.010016
.
.
.
149.000000 0.010045
149.000000 0.010048
150.000000 0.010052
150.000000 0.010055
which gives me the plot:
[Desired]
I need to plot an average line on it like shown in the following image with red line:
Here is a gnuplot only solution with sample data:
set table "test.data"
set samples 1000
plot rand(0)+sin(x)
unset table
You should check the gnuplot demo page for a running average. I'm going to generalize this demo in terms of dynamically building the functions. This makes it much easier to change the number of points included in the average.
This is the script:
# number of points in moving average
n = 50
# initialize the variables
do for [i=1:n] {
eval(sprintf("back%d=0", i))
}
# build shift function (back_n = back_n-1, ..., back1=x)
shift = "("
do for [i=n:2:-1] {
shift = sprintf("%sback%d = back%d, ", shift, i, i-1)
}
shift = shift."back1 = x)"
# uncomment the next line for a check
# print shift
# build sum function (back1 + ... + backn)
sum = "(back1"
do for [i=2:n] {
sum = sprintf("%s+back%d", sum, i)
}
sum = sum.")"
# uncomment the next line for a check
# print sum
# define the functions like in the gnuplot demo
# use macro expansion for turning the strings into real functions
samples(x) = $0 > (n-1) ? n : ($0+1)
avg_n(x) = (shift_n(x), @sum/samples($0))
shift_n(x) = @shift
# the final plot command looks quite simple
set terminal pngcairo
set output "moving_average.png"
plot "test.data" using 1:2 w l notitle, \
"test.data" using 1:(avg_n($2)) w l lc rgb "red" lw 3 title "avg\\_".n
This is the result:
The average lags quite a bit behind the datapoints as expected from the algorithm. Maybe 50 points are too many. Alternatively, one could think about implementing a centered moving average, but this is beyond the scope of this question.
And, I also think that you are more flexible with an external program :)
Here's some replacement code for the top answer that also works for 1000+ points and is much faster. It only works in gnuplot 5.2 and later, I guess, because it uses arrays.
# number of points in moving average
n = 5000
array A[n]
samples(x) = $0 > (n-1) ? n : int($0+1)
mod(x) = int(x) % n
avg_n(x) = (A[mod($0)+1]=x, (sum [i=1:samples($0)] A[i]) / samples($0))
Edit
The updated question is about a moving average.
You can do this in a limited way with gnuplot alone, according to this demo.
But in my opinion, it would be more flexible to pre-process your data using a programming language like python or ruby and add an extra column for whatever kind of moving average you require.
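A minimal Python sketch of such a pre-processing step, assuming the delays have already been read into a list. The function names trailing_average and centered_average are made up for this example, and the sample values are placeholders rather than the asker's data.

```python
def trailing_average(values, n):
    """Mean of the current value and up to n-1 previous ones."""
    out = []
    window_sum = 0.0
    for i, v in enumerate(values):
        window_sum += v
        if i >= n:
            window_sum -= values[i - n]   # drop the value that left the window
        out.append(window_sum / min(i + 1, n))
    return out

def centered_average(values, n):
    """Mean over a window of up to n points centered on each value (n odd)."""
    half = n // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

delays = [1.0, 2.0, 3.0, 4.0, 5.0]
trailing = trailing_average(delays, 3)
centered = centered_average(delays, 3)

# Write the original value plus the two averages as extra columns
for d, t, c in zip(delays, trailing, centered):
    print(d, t, c)
```

The trailing version lags behind the data in the same way as the gnuplot demo, while the centered version does not, at the cost of shorter windows at both ends.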
The original answer is preserved below:
You can use fit. It seems you want to fit to a constant function. Like this:
f(x) = c
fit f(x) 'S1_delay_120_LT100_LU15_MU5.txt' using 1:2 every 5 via c
Then you can plot them both.
plot 'S1_delay_120_LT100_LU15_MU5.txt' using 1:2 every 5, \
f(x) with lines
Note that this technique can be used with arbitrary functions, not just constant or linear functions.
I wanted to comment on Franky_GT, but somehow stackoverflow didn't let me.
However, Franky_GT, your answer works great!
A note for people plotting .xvg files (e.g. after doing analysis of MD simulations), if you don't add the following line:
set datafile commentschars "#@&"
Franky_GT's moving average code will result in this error:
unknown type in imag()
I hope this is of use to anyone.
For gnuplot >= 5.2, probably the most efficient solution is using an array, like @Franky_GT's solution.
However, it uses the pseudocolumn 0 (see help pseudocolumns). If you have empty lines in your data, $0 will be reset to 0, which eventually might mess up your average.
This solution uses an index t to count up the data lines and a second array X[] in case a centered moving average is desired. Data points don't have to be equidistant in x.
At the beginning there are not enough data points for a centered average of N points, so for the x-value it uses every second point and the others are NaN. That's why set datafile missing NaN is necessary to plot a connected line at the beginning.
Code:
### moving average over N points
reset session
# create some test data
set print $Data
y = 0
do for [i=1:5000] {
print sprintf("%g %g", i, y=y+rand(0)*2-1)
}
set print
# average over N values
N = 250
array Avg[N]
array X[N]
MovAvg(col) = (Avg[(t-1)%N+1]=column(col), n = t<N ? t : N, t=t+1, (sum [i=1:n] Avg[i])/n)
MovAvgCenterX(col) = (X[(t-1)%N+1]=column(col), n = t<N ? t%2 ? NaN : (t+1)/2 : ((t+1)-N/2)%N+1, n==n ? X[n] : NaN) # be aware: gnuplot does integer division here
set datafile missing NaN
plot $Data u 1:2 w l ti "Data", \
t=1 '' u 1:(MovAvg(2)) w l lc rgb "red" ti sprintf("Moving average over %d",N), \
t=1 '' u (MovAvgCenterX(1)):(MovAvg(2)) w l lw 2 lc rgb "green" ti sprintf("Moving average centered over %d",N)
### end of code
Result:

Fit more than one block of data from the same file

I have these two blocks of data in the same file. Each represents a set of measurements, and I want to fit them in a single script to compare them with each other. I know it would be easier to split them into two files and fit each one separately, but I'll have more than two blocks and that would be tedious. Does someone know how I should do it?
I tried to use:
f(x) = a*x^b
f1(x) = a1*x^b1
fit f(x) "temp.dat" i 0 u 1:2:4 via a,b, f1(x) "temp.dat" i 1 u 1:2:4 via a1,b1
p f(x), "temp.dat" i 0 u 1:2:4 w yer, f1(x), "temp.dat" i 1 u 1:2:4
Thks
100 2.13048e-09 0.2 2.4178e-11
140 1.51668e-09 0.2 1.69698e-11
180 1.18001e-09 0.2 1.35081e-11

100 1.41599e-09 0.3 1.62087e-11
140 1.02526e-09 0.3 1.16511e-11
180 8.1794e-10 0.3 9.50745e-12
Note that your data file blocks should be separated by two blank lines in order to use the index option. Otherwise, with only one blank line, you need to use every.
That said, what you want to achieve can be done with eval and a do for loop:
do for [i=0:1] {
eval sprintf("f%i(x) = a%i + b%i * x", i, i, i)
eval sprintf("fit f%i(x) 'temp.dat' i %i via a%i, b%i", i, i, i, i)
}
plot "temp.dat" i 0, f0(x), "temp.dat" i 1, f1(x)
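The blank-line rule mentioned at the start of this answer can be illustrated with a short Python sketch. It is a simplified imitation of how gnuplot's index option groups lines, under the assumption that only runs of two or more blank lines start a new data set; split_datasets is a made-up helper name.

```python
def split_datasets(text):
    """Group non-blank lines into data sets separated by >= 2 blank lines."""
    datasets = []
    current = []
    blanks = 0
    for line in text.splitlines():
        if line.strip() == "":
            blanks += 1
            continue
        if blanks >= 2 and current:
            datasets.append(current)   # two or more blank lines: new data set
            current = []
        blanks = 0
        current.append(line)
    if current:
        datasets.append(current)
    return datasets

one_blank = "100 1\n140 2\n\n100 3\n140 4"
two_blanks = "100 1\n140 2\n\n\n100 3\n140 4"

print(len(split_datasets(one_blank)))
print(len(split_datasets(two_blanks)))
```

With a single blank line everything ends up in one data set, which is why index 1 finds nothing in a file laid out that way.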

Gnuplot: Expression in input data

Is there a way to specify that input data is an expression that needs to be evaluated?
In my case the data is rational numbers encoded in the format "n/d". Is there a way to tell gnuplot to interpret "n/d" as "n divided by d"?
Example input data:
1/9 1
1/8 2
1/7 3
1/6 4
I tried plot "data" using ($1):2 but this truncates "n/d" to "n".
Update: After some digging in the manual, I found that in this case I can tell gnuplot to interpret "/" as a column separator and then divide the first number by the second as follows: plot "data" using ($1/$2):3 '%lf/%lf %lf'
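If pre-processing outside gnuplot is an option, the same conversion is short in Python. This is only a sketch using the standard fractions module; parse_rational is a made-up helper name, and the rows are the example data from above.

```python
from fractions import Fraction

def parse_rational(token):
    """Turn a string such as '1/8' into a float."""
    return float(Fraction(token))

rows = ["1/9 1", "1/8 2", "1/7 3", "1/6 4"]

points = []
for row in rows:
    x_text, y_text = row.split()
    points.append((parse_rational(x_text), float(y_text)))

# Print plain numeric columns that gnuplot can read directly
for x, y in points:
    print(x, y)
```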
I don't know a gnuplot-only answer, but you can use the system command to let another program do the work, for example the bc program on Linux. The following script works for me:
result(s) = system(sprintf('echo "%s" | bc -l ~/.bcrc', s)) + 0
set table "data.eval"
plot "data.dat" using 1:(result(strcol(2)))
unset table
This is the datafile:
1 1/2
2 1/2.0
3 4+4
4 4*5-1
5 4*(5-1)-(3-7)
6 sin(3.1415)
This is the output:
# Curve 0 of 1, 6 points
# Curve title: ""data.dat" using 1:(result(strcol(2)))"
# x y type
1 0.5 i
2 0.5 i
3 8 i
4 19 i
5 20 i
6 9.26536e-05 i
Notes:
The set table "data.eval" prints the values into a file, now it is easier to check the results.
strcol(2) reads the entries of the second column as a string. The expression must not contain white space.
The function result transfers the string to bc. The string itself must be quoted, else the shell would complain for example about brackets as in line 5 or 6 of the datafile.
The option -l on bc enables floating point evaluation of expressions like in the first line (1/2 = 0.5 instead of 1/2 = 0), and it defines functions like s(x) for sine and e(x) for exp(x).
~/.bcrc is a file with some function definitions, which bc reads before it evaluates the expression.
The system command returns a string. The string is promoted to a floating point number by adding 0.
My ~/.bcrc looks like this:
pi=4*a(1)
e=e(1)
define ln(x)
{return(l(x))}
define lg(x)
{return(l(x)/l(10))}
define exp(x)
{return(e(x))}
define sin(x)
{return(s(x))}
define fac(x)
{if (x<=1) return(1);
return(fac(x-1)*x)}
define ncr(n,r)
{return(fac(n)/(fac(r)*fac(n-r)))}
Tested with gnuplot 4.6 and bc 1.06.95 on Debian Jessie. On Windows you have the set command for integer calculations. It seems that Google knows some other commandline calculators.
It wouldn't be gnuplot if there wasn't a gnuplot-only solution.
Simply collect your expressions in a string by "mis"using stats, evaluate them via eval in a do for loop, write the results into a second string, and convert the values to numbers via real when plotting.
Check help stats, help do, help eval, help real and the example below. Most of the data is taken from @maij's answer. The script works for gnuplot >= 5.0 and, with some adaptations, probably with earlier versions.
Script: (works for gnuplot>=5.0, Jan 2015)
### evaluate expressions in input data
reset session
$Data <<EOD
1 1/2 # integer division
2 1/2.0 # float division
3 4+4
4 4*5-1
5 4*(5-1)-(3-7)
6 sin(3.1415/2)
7 2**3
8 sqrt(9)
EOD
myCol = 2
myExprs = ''
stats $Data u (myExprs=myExprs.sprintf(' "v=%s"',strcol(myCol))) nooutput
myValues = ''
do for [i=1:words(myExprs)] {
eval word(myExprs,i)
myValues = myValues.sprintf(" %g",v)
}
myValue(n) = real(word(myValues,int(column(n)+1)))
set offsets 0.5,0.5,2,0
plot $Data u 1:(myValue(0)) w lp pt 7 lc "red" ti "Expressions", \
'' u 1:(myValue(0)):2 w labels offset 0,1 notitle
### end of script
Result:

Can gnuplot compute and plot the delta between consecutive data points

For instance, given the following data file (x^2 for this example):
0
1
4
9
16
25
Can gnuplot plot the points along with the differences between the points as if it were:
0 0
1 1 # ( 1 - 0 = 1)
4 3 # ( 4 - 1 = 3)
9 5 # ( 9 - 4 = 5)
16 7 # (16 - 9 = 7)
25 9 # (25 -16 = 9)
The actual file has more than just the column I'm interested in and I would like to avoid pre-processing in order to add the deltas, if possible.
dtop's solution didn't work for me, but this works and is purely gnuplot (not calling awk):
delta_v(x) = ( vD = x - old_v, old_v = x, vD)
old_v = NaN
set title "Compute Deltas"
set style data lines
plot 'data.dat' using 0:($1), '' using 0:(delta_v($1)) title 'Delta'
Sample data file named 'data.dat':
0
1
4
9
16
25
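The logic of delta_v can be spelled out in Python for comparison. This is an illustrative sketch, not a translation gnuplot needs; deltas is a made-up function name. As in the gnuplot version, the previous value starts as NaN, so the first difference is NaN rather than the 0 shown in the question.

```python
import math

def deltas(values):
    """Difference between each value and its predecessor (first one is NaN)."""
    out = []
    previous = math.nan
    for v in values:
        out.append(v - previous)
        previous = v
    return out

data = [0, 1, 4, 9, 16, 25]
result = deltas(data)

for value, delta in zip(data, result):
    print(value, delta)
```

gnuplot skips points whose value is NaN, so the delta curve simply starts at the second data point.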
Here's how to do this without pre-processing:
Script for gnuplot:
# runtime_delta.dem script
# run with
# gnuplot> load 'runtime_delta.dem'
#
reset
delta_v(x) = ( vD = x - old_v, old_v = x, vD)
old_v = NaN
set title "Compute Deltas"
set style data lines
plot 'runtime_delta.dat' using 0:(column('Data')), '' using 0:(delta_v(column('Data'))) title 'Delta'
Sample data file 'runtime_delta.dat':
Data
0
1
4
9
16
25
How about using awk?
plot "< awk '{print $1,$1-prev; prev=$1}' <datafilename>"
Below is a version that uses arrays, available from Gnuplot 5.1. Using arrays allows multiple diffs to be calculated in a single Gnuplot instance.
array Z[128]
do for [i=1:128] { Z[i] = NaN }
diff(i, x) = (y = x - Z[i], Z[i] = x, y)
i is the instance index that needs to be incremented for each use. For example:
plot "file1.csv" using 1:(diff(1,$2)) with lines, \
"file2.csv" using 1:(diff(2,$2)) with lines

Resources