Can gnuplot compute and plot the delta between consecutive data points?

For instance, given the following data file (x^2 for this example):
0
1
4
9
16
25
Can gnuplot plot the points along with the differences between the points as if it were:
0 0
1 1 # ( 1 - 0 = 1)
4 3 # ( 4 - 1 = 3)
9 5 # ( 9 - 4 = 5)
16 7 # (16 - 9 = 7)
25 9 # (25 -16 = 9)
The actual file has more than just the column I'm interested in and I would like to avoid pre-processing in order to add the deltas, if possible.

dtop's solution didn't work for me, but this works and is purely gnuplot (not calling awk):
delta_v(x) = ( vD = x - old_v, old_v = x, vD)
old_v = NaN
set title "Compute Deltas"
set style data lines
plot 'data.dat' using 0:($1), '' using 0:(delta_v($1)) title 'Delta'
Sample data file named 'data.dat':
0
1
4
9
16
25
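For anyone who wants to check the numbers outside gnuplot, here is a minimal Python sketch of what the delta_v trick computes. The names deltas and sample are only illustrative and not part of the answer. As with old_v = NaN in the script, the first difference comes out as NaN because there is no previous point.

```python
import math


def deltas(values):
    """Return the difference between each value and its predecessor.

    Mirrors delta_v(x) = (vD = x - old_v, old_v = x, vD) with old_v = NaN:
    the first result is NaN because no previous value exists yet.
    """
    old_v = math.nan
    result = []
    for x in values:
        result.append(x - old_v)  # vD = x - old_v
        old_v = x                 # remember the current value for the next row
    return result


sample = [0, 1, 4, 9, 16, 25]
print(deltas(sample))
```

gnuplot simply skips the NaN point, which is why the 'Delta' curve starts at the second row.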

Here's how to do this without pre-processing:
Script for gnuplot:
# runtime_delta.dem script
# run with
# gnuplot> load 'runtime_delta.dem'
#
reset
delta_v(x) = ( vD = x - old_v, old_v = x, vD)
old_v = NaN
set title "Compute Deltas"
set style data lines
plot 'runtime_delta.dat' using 0:(column('Data')), '' using 0:(delta_v(column('Data'))) title 'Delta'
Sample data file 'runtime_delta.dat':
Data
0
1
4
9
16
25

How about using awk?
plot "< awk '{print $1,$1-prev; prev=$1}' <datafilename>"

Below is a version that uses arrays, available from gnuplot 5.1. Using arrays allows multiple diffs to be calculated in a single gnuplot instance.
array Z[128]
do for [i=1:128] { Z[i] = NaN }
diff(i, x) = (y = x - Z[i], Z[i] = x, y)
i is the instance index; each use of diff in a plot needs its own index. For example:
plot "file1.csv" using 1:(diff(1,$2)) with lines, \
"file2.csv" using 1:(diff(2,$2)) with lines
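To make the role of the instance index clearer, here is a rough Python sketch of the same idea: one "previous value" slot per index, so several data streams can be differenced without interfering with each other. The class name MultiDiff is made up for illustration.

```python
import math


class MultiDiff:
    """Keeps one 'previous value' per instance index, like array Z[] above."""

    def __init__(self, size):
        # Same initialisation as: do for [i=1:128] { Z[i] = NaN }
        self.prev = [math.nan] * size

    def diff(self, i, x):
        """Return x minus the previous x seen for instance i (1-based, as in gnuplot)."""
        y = x - self.prev[i - 1]
        self.prev[i - 1] = x
        return y


d = MultiDiff(128)
print(d.diff(1, 5.0), d.diff(2, 10.0))  # first call per instance yields NaN
print(d.diff(1, 7.0), d.diff(2, 4.0))   # later calls yield independent differences
```

Sharing one index between two plot clauses would mix the two streams, which is why each use needs its own index.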

Related

Plotting Average curve for points in gnuplot

[Current]
I am importing a text file in which the first column has simulation time (0~150) the second column has the delay (0.01~0.02).
1.000000 0.010007
1.000000 0.010010
2.000000 0.010013
2.000000 0.010016
.
.
.
149.000000 0.010045
149.000000 0.010048
150.000000 0.010052
150.000000 0.010055
which gives me the plot:
[Desired]
I need to plot an average line on it like shown in the following image with red line:
Here is a gnuplot only solution with sample data:
set table "test.data"
set samples 1000
plot rand(0)+sin(x)
unset table
You should check the gnuplot demo page for a running average. I'm going to generalize this demo by building the functions dynamically. This makes it much easier to change the number of points included in the average.
This is the script:
# number of points in moving average
n = 50
# initialize the variables
do for [i=1:n] {
eval(sprintf("back%d=0", i))
}
# build shift function (back_n = back_n-1, ..., back1=x)
shift = "("
do for [i=n:2:-1] {
shift = sprintf("%sback%d = back%d, ", shift, i, i-1)
}
shift = shift."back1 = x)"
# uncomment the next line for a check
# print shift
# build sum function (back1 + ... + backn)
sum = "(back1"
do for [i=2:n] {
sum = sprintf("%s+back%d", sum, i)
}
sum = sum.")"
# uncomment the next line for a check
# print sum
# define the functions like in the gnuplot demo
# use macro expansion (@name) for turning the strings into real functions
# (gnuplot versions before 5.0 need "set macros" first)
samples(x) = $0 > (n-1) ? n : ($0+1)
avg_n(x) = (shift_n(x), @sum/samples($0))
shift_n(x) = @shift
# the final plot command looks quite simple
set terminal pngcairo
set output "moving_average.png"
plot "test.data" using 1:2 w l notitle, \
"test.data" using 1:(avg_n($2)) w l lc rgb "red" lw 3 title "avg\\_".n
This is the result:
The average lags quite a bit behind the datapoints as expected from the algorithm. Maybe 50 points are too many. Alternatively, one could think about implementing a centered moving average, but this is beyond the scope of this question.
And, I also think that you are more flexible with an external program :)
Here's some replacement code for the top answer that also works for 1000+ points and is much, much faster. It only works in gnuplot 5.2 and later, I guess:
# number of points in moving average
n = 5000
array A[n]
samples(x) = $0 > (n-1) ? n : int($0+1)
mod(x) = int(x) % n
avg_n(x) = (A[mod($0)+1]=x, (sum [i=1:samples($0)] A[i]) / samples($0))
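In case the array trick is hard to follow, this is roughly what it does, written as a Python sketch. The function name moving_average is my own; buf plays the role of A[n] and row the role of $0. It is a trailing average, so it lags behind the data in the same way as the gnuplot version.

```python
def moving_average(values, n):
    """Trailing moving average over at most the last n values, using a ring buffer."""
    buf = [0.0] * n                    # plays the role of: array A[n]
    out = []
    for row, x in enumerate(values):   # row corresponds to gnuplot's $0
        buf[row % n] = x               # A[mod($0)+1] = x overwrites the oldest slot
        count = min(row + 1, n)        # samples($0): fewer points at the start
        out.append(sum(buf[:count]) / count)
    return out


print(moving_average([1, 2, 3, 4, 5], 2))
```

Only one slot is written per data row, which is why this scales to large n much better than shifting n variables on every row.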
Edit
The updated question is about a moving average.
You can do this in a limited way with gnuplot alone, according to this demo.
But in my opinion, it would be more flexible to pre-process your data using a programming language like python or ruby and add an extra column for whatever kind of moving average you require.
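As a sketch of that pre-processing route, assuming a two-column "time delay" file like the one in the question: the function below appends a centered moving average as a third column. The name add_centered_average and the half_width parameter are hypothetical, and the window is simply shortened at both ends of the data.

```python
def add_centered_average(rows, half_width):
    """rows: list of (x, y). Returns (x, y, avg), where avg is the mean of y
    over up to 2*half_width + 1 neighbouring rows centered on the current one."""
    ys = [y for _, y in rows]
    out = []
    for i, (x, y) in enumerate(rows):
        lo = max(0, i - half_width)            # clip the window at the start
        hi = min(len(ys), i + half_width + 1)  # and at the end
        window = ys[lo:hi]
        out.append((x, y, sum(window) / len(window)))
    return out


# First rows of the data shown in the question
text = """1.000000 0.010007
1.000000 0.010010
2.000000 0.010013
2.000000 0.010016"""
rows = [tuple(float(f) for f in line.split()) for line in text.splitlines()]
for x, y, avg in add_centered_average(rows, 1):
    print(f"{x:f} {y:f} {avg:f}")
```

The output can be written to a file and plotted with using 1:2 for the raw data and using 1:3 for the average.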
The original answer is preserved below:
You can use fit. It seems you want to fit to a constant function. Like this:
f(x) = c
fit f(x) 'S1_delay_120_LT100_LU15_MU5.txt' using 1:2 every 5 via c
Then you can plot them both.
plot 'S1_delay_120_LT100_LU15_MU5.txt' using 1:2 every 5, \
f(x) with lines
Note that this technique can be used with arbitrary functions, not just constant or linear functions.
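For the constant case it may help to see what the fit actually returns. Assuming an unweighted least-squares fit, the best constant c is just the arithmetic mean of the y values that were used. A small Python sketch with made-up helper names (fit_constant, sum_sq) to check this:

```python
def fit_constant(ys):
    """Least-squares fit of f(x) = c: the minimiser of sum((y - c)**2) is the mean."""
    return sum(ys) / len(ys)


def sum_sq(ys, c):
    """Sum of squared residuals for a given constant c."""
    return sum((y - c) ** 2 for y in ys)


ys = [1.0, 2.0, 3.0, 6.0]
c = fit_constant(ys)
# Moving c away from the mean in either direction increases the residual sum
print(c, sum_sq(ys, c), sum_sq(ys, c + 0.1), sum_sq(ys, c - 0.1))
```

So a fitted constant gives one horizontal line for the whole data set, not a curve that follows the data locally.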
I wanted to comment on Franky_GT, but somehow stackoverflow didn't let me.
However, Franky_GT, your answer works great!
A note for people plotting .xvg files (e.g. after doing analysis of MD simulations), if you don't add the following line:
set datafile commentschars "#@&"
Franky_GT's moving average code will result in this error:
unknown type in imag()
I hope this is of use to anyone.
For gnuplot >=5.2, probably the most efficient solution is using an array like @Franky_GT's solution.
However, it uses the pseudocolumn 0 (see help pseudocolumns). In case you have some empty lines in your data $0 will be reset to 0 which eventually might mess up your average.
This solution uses an index t to count up the datalines and a second array X[] in case a centered moving average is desired. Datapoints don't have to be equidistant in x.
At the beginning there will not be enough datapoints for a centered average of N points, so for the x-value it will use every second point and the others will be NaN. That's why set datafile missing NaN is necessary to plot a connected line at the beginning.
Code:
### moving average over N points
reset session
# create some test data
set print $Data
y = 0
do for [i=1:5000] {
print sprintf("%g %g", i, y=y+rand(0)*2-1)
}
set print
# average over N values
N = 250
array Avg[N]
array X[N]
MovAvg(col) = (Avg[(t-1)%N+1]=column(col), n = t<N ? t : N, t=t+1, (sum [i=1:n] Avg[i])/n)
MovAvgCenterX(col) = (X[(t-1)%N+1]=column(col), n = t<N ? t%2 ? NaN : (t+1)/2 : ((t+1)-N/2)%N+1, n==n ? X[n] : NaN) # be aware: gnuplot does integer division here
set datafile missing NaN
plot $Data u 1:2 w l ti "Data", \
t=1 '' u 1:(MovAvg(2)) w l lc rgb "red" ti sprintf("Moving average over %d",N), \
t=1 '' u (MovAvgCenterX(1)):(MovAvg(2)) w l lw 2 lc rgb "green" ti sprintf("Moving average centered over %d",N)
### end of code
Result:

How to plot data with multiple records in one row

I want to plot JCAMP-DX formatted spectrum.
It has multiple records for y axis in one row and specified increment for x axis.
Simple example: linear plot (1,1) to (12,12)
1 1 2 3
4 4 5 6
7 7 8 9
10 10 11 12
The first column represents the x axis and the second to fourth columns represent the y axis, with each subsequent y value belonging to x incremented by one. I can plot it with this command:
plot "test.gnuplot" using 1:2 linecolor "black" with dots, "test.gnuplot" using ($1+1):3 linecolor "black" with dots, "test.gnuplot" using ($1+2):4 linecolor "black" with dots
However, the spectrum is much more complicated and I would like to plot it with lines, which is not possible using the above-mentioned method (lines wouldn't connect and would create ugly intersections at nonlinear regions of the plot).
For now I plot just the second column (using 1:2), but that lowers the resolution.
I want to avoid using external filters (awk etc.) and editing the input file (vim etc.).
real data (skip first 35 lines -- data specification): http://webbook.nist.gov/cgi/cbook.cgi?JCAMP=C7664417&Index=1&Type=IR
You want to avoid external tools, but maybe creating a temporary file with gnuplot itself is acceptable?
I have taken the real data from webbook.nist.gov, and I have removed the comment lines and the last data line, which has fewer y values than the other lines.
This is my suggestion:
datafile = "7664-41-7-IR.jdx2"
dx = 0.935253
col_count=6
# Build a function that will create a new datafile by converting
# single lines of the form "x y1 y2 y3 ..." into multiple
# lines of the form "x y1", "x+dx y2", "x+2*dx y3", ...
#
# We will call this function later for each input line and append
# the new data values.
all_command = "all = sprintf(\"%s"
do for [i=2:col_count] {
all_command = all_command."%f %f\n"
}
all_command = all_command."\", all"
do for [t=2:col_count] {
all_command = all_command.", column(1)+dx*(".t."-2), column(".t.")"
}
all_command = all_command.")"
# Just to check:
print all_command
# Now we call the function for each input line. The variable "all" will contain
# the "expanded" data. Note, the "plot" command is a dummy plot.
all = ""
plot datafile using 1:( @all_command, 1)
# Generate the temporary data file
set print "temp_file.dat"
print all
plot datafile w p, "temp_file.dat" w l
This is a part of the output:
For counting the lines generically please check this question.
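The conversion that the generated all_command performs can be sketched in Python as follows. The function name expand_rows is hypothetical, and the example uses the simple test data from the question with an x increment of 1 instead of the real dx.

```python
def expand_rows(rows, dx):
    """Turn rows [x, y1, y2, ...] into points (x, y1), (x+dx, y2), (x+2*dx, y3), ..."""
    points = []
    for row in rows:
        x, ys = row[0], row[1:]
        for k, y in enumerate(ys):
            # Same as column(1)+dx*(t-2) for column t = k+2
            points.append((x + k * dx, y))
    return points


rows = [
    [1, 1, 2, 3],
    [4, 4, 5, 6],
    [7, 7, 8, 9],
    [10, 10, 11, 12],
]
for x, y in expand_rows(rows, 1):
    print(x, y)
```

Because the points come out in x order, they can be connected with lines without the intersections described in the question.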

Gnuplot: Expression in input data

Is there a way to specify that input data is an expression that needs to be evaluated?
In my case the data is rational numbers encoded in the format "n/d". Is there a way to tell gnuplot to interpret "n/d" as "n divided by d"?
Example input data:
1/9 1
1/8 2
1/7 3
1/6 4
I tried plot "data" using ($1):2 but this truncates "n/d" to "n".
Update: After some digging in the manual, I found that in this case I can tell gnuplot to interpret "/" as a column separator and then divide the first number by the second as follows: plot "data" using ($1/$2):3 '%lf/%lf %lf'
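If pre-processing is acceptable after all, the "n/d" format is easy to handle outside gnuplot. This is only a sketch in Python using the standard fractions module; parse_ratio is a made-up helper name and the data is the example input from above.

```python
from fractions import Fraction


def parse_ratio(token):
    """Convert a string such as '1/9' (or a plain integer like '3') to a float."""
    return float(Fraction(token))


data = "1/9 1\n1/8 2\n1/7 3\n1/6 4"
points = [
    (parse_ratio(a), float(b))
    for a, b in (line.split() for line in data.splitlines())
]
for x, y in points:
    print(x, y)
```

The printed two-column output can then be plotted with a plain using 1:2.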
I don't know a gnuplot only answer. But you can use the system command to let another program do the work. For example the bc program on linux. The following script works for me:
result(s) = system(sprintf('echo "%s" | bc -l ~/.bcrc', s)) + 0
set table "data.eval"
plot "data.dat" using 1:(result(strcol(2)))
unset table
This is the datafile:
1 1/2
2 1/2.0
3 4+4
4 4*5-1
5 4*(5-1)-(3-7)
6 sin(3.1415)
This is the output:
# Curve 0 of 1, 6 points
# Curve title: ""data.dat" using 1:(result(strcol(2)))"
# x y type
1 0.5 i
2 0.5 i
3 8 i
4 19 i
5 20 i
6 9.26536e-05 i
Notes:
The set table "data.eval" command prints the values into a file, which makes it easier to check the results.
strcol(2) reads the entries of the second column as a string. The expression must not contain white space.
The function result transfers the string to bc. The string itself must be quoted, else the shell would complain for example about brackets as in line 5 or 6 of the datafile.
The option -l on bc enables floating point evaluation of expressions like in the first line (1/2 = 0.5 instead of 1/2 = 0), and it defines functions like s(x) for sine and e(x) for exp(x).
Passing ~/.bcrc makes bc read some function definitions from that file.
The system command returns a string. The string is promoted to a floating point number by adding 0.
My ~/.bcrc looks like this:
pi=4*a(1)
e=e(1)
define ln(x)
{return(l(x))}
define lg(x)
{return(l(x)/l(10))}
define exp(x)
{return(e(x))}
define sin(x)
{return(s(x))}
define fac(x)
{if (x<=1) return(1);
return(fac(x-1)*x)}
define ncr(n,r)
{return(fac(n)/(fac(r)*fac(n-r)))}
Tested with gnuplot 4.6 and bc 1.06.95 on Debian Jessie. On Windows you have the set command for integer calculations. It seems that Google knows some other commandline calculators.
It wouldn't be gnuplot if there wasn't a gnuplot-only solution.
Simply collect your expressions in a string by "mis"using stats, evaluate them via eval in a do for loop, write the results into a string, and convert the values to numbers via real for plotting.
Check help stats, help do, help eval, help real and the example below. Most of the data is taken from @maij's answer. The script works for gnuplot>=5.0 and, with some adaptations, probably with earlier versions.
Script: (works for gnuplot>=5.0, Jan 2015)
### evaluate expressions in input data
reset session
$Data <<EOD
1 1/2 # integer division
2 1/2.0 # float division
3 4+4
4 4*5-1
5 4*(5-1)-(3-7)
6 sin(3.1415/2)
7 2**3
8 sqrt(9)
EOD
myCol = 2
myExprs = ''
stats $Data u (myExprs=myExprs.sprintf(' "v=%s"',strcol(myCol))) nooutput
myValues = ''
do for [i=1:words(myExprs)] {
eval word(myExprs,i)
myValues = myValues.sprintf(" %g",v)
}
myValue(n) = real(word(myValues,int(column(n)+1)))
set offsets 0.5,0.5,2,0
plot $Data u 1:(myValue(0)) w lp pt 7 lc "red" ti "Expressions", \
'' u 1:(myValue(0)):2 w labels offset 0,1 notitle
### end of script
Result:

Fitting in gnuplot with three variables

I am trying to fit some data using gnuplot.
Here is the data (variables h, k,l and I):
#h k l I
2 1 1 7807
2 2 0 9664
3 2 1 6042
4 0 0 1394
3 3 2 1358
4 2 0 4896
### Function
I(h,k,l) = M * (F * ( (sin(A*pi*sqrt(h*h+k*k+l*l)*L))/(A*2*pi*sqrt(h*h+k*k+l*l)) ))^2
### Initial values
M=1
F=0.5
A=1
L=1
### Fitting
fit I(h,k,l) "cavendish.data" using 1:2:3 via M, F, A, L
I want to determine the constants M,F,A and L from this fitting.
When I run this code I get message undefined variable: h
How can I determine these constants? Thanks in advance.
Try using a recent version of gnuplot (>= 5.0), which supports fit commands with more than two variables (see release notes). Also note that the power operator in gnuplot is ** and not ^.
Your example has to be changed slightly to work:
### Function
I(h,k,l) = M * (F * ((sin(A*pi*sqrt(h*h+k*k+l*l)*L))/(A*2*pi*sqrt(h*h+k*k+l*l)) ))**2
### Initial values
M=1.0
F=0.5
A=1.0
L=1.0
### Fitting
set dummy h, k, l
fit I(h,k,l) "cavendish.data" using 1:2:3:4 via M, F, A, L

Creating and modifying arbitrary-length arrays while plotting in gnuplot

I would like to count the number of occurrences of an event (for example, x data value equals some number) and store these occurrences in order, while plotting a file in gnuplot. Say I have the following file:
1
0
0
0
1
1
0
Now I want to count how many times I have a 1 and store that number in variable N. Then I want to know the positions where that happens and store that information in an array pos, all of this while plotting the file. The result, for the example above, should be:
print N
3
print pos
1 5 6
I know how to achieve the counting:
N = 0
plot "data" u ($0):($1 == 1 ? (N = N+1, $1) : $1)
print N
3
Then to achieve the position recording, it would be schematically something like this:
N = 0 ; pos = ""
plot "data" u ($0):($1 == 1 ? (N = N+1, pos = pos." ".$0, $1) : $1) # This doesn't work!
print N
3
print pos
1 5 6
How can this be done in gnuplot without resorting to external bash commands?
Well, as sometimes happens writing down the question triggers an idea for an answer. I'll leave it here in case somebody finds it useful:
N=0 ; pos=""
plot "data" u ($0):($1 == 1 ? (N = N+1, pos = sprintf("%s %g", pos, $0+1), $1) : $1)
print N
3
print pos
1 5 6
Note I had to use $0+1 because gnuplot's $0 is zero-based, so the first row has position 0.
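For comparison, the same bookkeeping written as a small Python sketch (positions_of is just an illustrative name). The i + 1 corresponds to the $0+1 in the gnuplot version, since enumerate also starts counting at zero.

```python
def positions_of(values, target=1):
    """Return the 1-based positions at which target occurs in values."""
    return [i + 1 for i, v in enumerate(values) if v == target]


data = [1, 0, 0, 0, 1, 1, 0]
pos = positions_of(data)
N = len(pos)  # number of occurrences
print(N)
print(" ".join(str(p) for p in pos))
```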
