I have three data files, each containing a matrix. I use stats to find the maximum value in the matrix of each file, and it is displayed correctly. I need to use those three maximum values as data points and plot them as (1.0, A_max), (2.0, B_max) and (3.0, C_max), where A_max is the maximum value calculated by stats from the first data file, B_max from the second and C_max from the third. Here is what my gp file looks like:
set terminal epslatex size 3.5,2.62 color colortext
set output 'data.tex'
set xlabel '$x$'
set ylabel '$y$'
stats 'dataA.txt' matrix name "A"
show variables A_
stats 'dataB.txt' matrix name "B"
show variables B_
stats 'dataC.txt' matrix name "C"
show variables C_
plot '-' w p, '-' w p, '-' w p
1.0 A_max
e
2.0 B_max
e
3.0 C_max
e
The plot I get looks like this:
Clearly, it takes x as 0 and plots the values I intended for the x-axis as y. I'm not sure what I am missing; probably how to read the stats variables. Any help will be appreciated.
Inline data, like you are using, is used as-is without any variable replacement.
Use set print $data to print data to the named data block $data:
set print $data
stats 'dataA.txt' matrix name "A"
print sprintf("%e A", A_max)
stats 'dataB.txt' matrix name "B"
print sprintf("%e B", B_max)
stats 'dataC.txt' matrix name "C"
print sprintf("%e C", C_max)
set print    # close the data block before plotting
plot $data using 0:1:xticlabel(2) w p notitle
or, with more automation:
set print $data
do for [f in "A B C"]{
stats 'data'.f.'.txt' matrix name f
print sprintf("%e %s", value(f.'_max), f)
}
set print    # close the data block before plotting
plot $data using 0:1:xticlabel(2) w p notitle
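Outside gnuplot, the same idea (take the maximum of every matrix file and pair it with x = 1, 2, 3) can be sketched in Python. This is a minimal sketch, assuming each file is a whitespace-separated numeric matrix; the helper names `matrix_max` and `max_points` and the sample matrices are hypothetical, and the file contents are passed in as strings so the example is self-contained.

```python
def matrix_max(text):
    """Return the largest number in a whitespace-separated matrix given as text."""
    return max(float(tok) for line in text.splitlines() for tok in line.split())


def max_points(matrices):
    """Pair the maximum of the k-th matrix with x = k (1-based), like (1.0, A_max)."""
    return [(float(k), matrix_max(text)) for k, text in enumerate(matrices, start=1)]


# Hypothetical stand-ins for the contents of dataA.txt, dataB.txt and dataC.txt
A = "1 2 3\n4 5 6\n"
B = "0.5 7.5\n2 1\n"
C = "-1 -2\n-3 -0.5\n"

points = max_points([A, B, C])
print(points)  # [(1.0, 6.0), (2.0, 7.5), (3.0, -0.5)]
```

In a real script, the strings would come from reading dataA.txt, dataB.txt and dataC.txt.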
Related
I am trying to use gnuplot to calculate the best-fit line for some time-series data. The data is already roughly linear, with a negative slope. The input data looks something like this:
1615840396,138849,510249
1615840406,139011,511152
1615840416,137580,510330
1615840426,137493,510501
1615840436,137261,510186
1615840447,137435,511026
1615840456,137054,510252
1615840466,136955,510174
1615840476,136922,510540
1615840486,136970,510999
The first column is a Unix timestamp. A graph of column 2 vs. time looks like this:
I'm trying to produce a best-fit line like this:
gnuplot> set xdata time
gnuplot> set timefmt "%s"
gnuplot> set datafile separator comma
gnuplot> f(x) = m*x + b
gnuplot> fit f(x) 'data.csv' using 1:2 via m,b
Which produces:
Final set of parameters Asymptotic Standard Error
======================= ==========================
m = 8.08062e-05 +/- 1.633 (2.021e+06%)
b = 1 +/- 2.639e+09 (2.639e+11%)
The resulting best-fit line has a positive slope and doesn't really fit the data at all:
What am I doing wrong?
This is a recurring question about fitting time data. There are probably similar questions here on SO, but I can't find them right now, and I'm not sure whether there is an example of fitting time data on the gnuplot homepage.
I guess the problem is the following: if you assume a linear function f(x) = a*x + b with time data, the origin will be at January 1st, 1970.
Typically, this is very far from your actual data, and furthermore, your data cover only a small range compared with the distance to the origin (about 1.6e9 seconds here). The intercept b is then an extrapolation over decades and is almost completely correlated with the slope, so the fit is badly conditioned and cannot deliver good values, as the huge standard errors in the question show.
You'd better fit a function that is shifted by your start date.
You either set this start date manually, or you spend a few lines of code to find it automatically.
Additionally, it will help if you give some starting values for the fitting parameters.
Here, it seems that a is found without giving a start value, whereas with b=1 the fit does not give a good result; b=10 seems to be fine as a starting value.
Code:
### fitting time data
reset session
# create some random test data
set print $Data
do for [i=1:100] {
print sprintf("%.0f,%g",time(0)+i*86400,i+rand(0)*10 )
}
set print
set datafile separator comma
# find out the StartDate
StartDate = 1615840396   # manually by setting a value, e.g. the first timestamp of the question's data
# or automatically by using stats
stats $Data u 1 index 0 every ::0:0:0:0 nooutput
StartDate = STATS_min
f(x) = a*(x-StartDate) + b
set fit brief nolog
b=10
fit f(x) $Data u 1:2 via a,b
set key top left
set format x "%b %d" timedate
plot $Data u 1:2 ti "Data", \
f(x) w l lc rgb "red" ti "Fit"
### end of code
Result:
Final set of parameters Asymptotic Standard Error
======================= ==========================
a = 1.16005e-05 +/- 1.163e-07 (1.003%)
b = 6.1323 +/- 0.5759 (9.39%)
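To see why the shift helps, here is a minimal Python sketch of an ordinary closed-form least-squares line fit with the origin moved to the first timestamp, i.e. the same model as f(x) = a*(x-StartDate) + b above. It is not gnuplot's iterative `fit`, just the textbook formula; the function name `fit_line_shifted` is hypothetical. Because b is now the value at the start of the data, both parameters stay in a sensible range.

```python
def fit_line_shifted(xs, ys):
    """Least-squares fit of y = a*(x - x0) + b with x0 = first x value.

    Shifting the origin to the first sample keeps the numbers small, so b is
    the fitted value at the start of the data rather than at 1970-01-01.
    """
    x0 = xs[0]
    dx = [x - x0 for x in xs]
    n = len(xs)
    mean_dx = sum(dx) / n
    mean_y = sum(ys) / n
    sxy = sum((d - mean_dx) * (y - mean_y) for d, y in zip(dx, ys))
    sxx = sum((d - mean_dx) ** 2 for d in dx)
    a = sxy / sxx              # slope per second
    b = mean_y - a * mean_dx   # fitted value at x = x0
    return x0, a, b


# Columns 1 and 2 of the data from the question
t = [1615840396, 1615840406, 1615840416, 1615840426, 1615840436,
     1615840447, 1615840456, 1615840466, 1615840476, 1615840486]
y = [138849, 139011, 137580, 137493, 137261,
     137435, 137054, 136955, 136922, 136970]

x0, a, b = fit_line_shifted(t, y)
print(f"a = {a:.3f} per second, b = {b:.1f} at t = {x0}")
```

With the question's data, this gives the expected negative slope, and b lies close to the first measured values.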
I calculated the eigenvalues of the Hamiltonian for the 1D-hydrogen atom in atomic units with the Fourier-Grid-Hamiltonian method in a nice little Fortran program.
All the eigenvalues found between -1 and 0 (the bound states) are saved into a file line by line like this:
-0.50016671392950229
-0.18026105614262633
-0.11485673263086937
-4.7309305955423042E-002
-4.7077108902158216E-002
Since the number of eigenvalues found depends on the step size my program uses, the number of entries in the file can vary (in theory, there are infinitely many).
I now want to plot each value from the file as a line parallel to the x-axis, offset by that value.
I also want to be able to plot the data only up to a certain line number, because the values get really close to each other the closer they are to zero and can no longer be distinguished by eye.
(Here, e.g., it would make sense to plot the first four entries; the fifth is already too close to the previous one.)
I know that one can plot lines parallel to the x-axis with the command plot *offset*, but I don't know how to tell gnuplot to use the data from the file. So far, I have had to enter the values manually.
As a second step, I would like to plot each line only within a certain x-range, more concretely between its points of intersection with the potential used for the numerical solution, V(x) = -1/(1+abs(x)).
The result should look like this:
[Sketch of the desired plot]
The closest I got was with
plot -1/(1+abs(x)),-0.5 title 'E0',-0.18 title 'E1', -0.11 title 'E2'
which got me the following result:
[My plot]
I hope you can help me, and I'm really curious whether gnuplot can actually do the second step I described!
As for the first part of your question, you can, for example, use the xerrorbars plotting style:
set terminal pngcairo
set output 'fig.png'
unset key
set xr [-1:1]
set yr [-1:0]
unset bars
plot '-' u (0):($1<-0.1?$1:1/0):(1) w xerrorbars pt 0 lc rgb 'red'
-0.50016671392950229
-0.18026105614262633
-0.11485673263086937
-4.7309305955423042E-002
-4.7077108902158216E-002
e
The idea here is to:
interpret the energies E as points with coordinates (0,E) and assign to each of them an x-errorbar of half-width 1 (via the third part of the specification (0):($1<-0.1?$1:1/0):(1))
"simulate" the horizontal lines with x-errorbars. To this end, unset bars and pt 0 ensure that Gnuplot displays just plain lines.
consider only energies E<-0.1: otherwise, the expression $1<-0.1?$1:1/0 evaluates to the undefined value 1/0, so nothing is plotted for such E.
plot '-' with explicit values can, of course, be replaced with, e.g., plot 'your_file.dat'
This produces:
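The geometry behind the errorbar trick can be sketched in Python. This minimal sketch (the function name `level_segments` and its parameters are hypothetical) turns each energy below the cutoff into a horizontal segment from x-1 to x+1 at height E, and drops the others, just as the undefined value 1/0 makes gnuplot skip them.

```python
def level_segments(energies, cutoff=-0.1, half_width=1.0, x=0.0):
    """Turn each energy E < cutoff into a horizontal segment at height E.

    A point (x, E) with an x-errorbar of half-width `half_width` is drawn as a
    line from x - half_width to x + half_width; energies at or above the
    cutoff are skipped, like the 1/0 branch of the gnuplot using-expression.
    """
    return [((x - half_width, e), (x + half_width, e)) for e in energies if e < cutoff]


# Eigenvalues from the question
energies = [-0.50016671392950229, -0.18026105614262633, -0.11485673263086937,
            -4.7309305955423042e-002, -4.7077108902158216e-002]
segments = level_segments(energies)
for (x_left, e), (x_right, _) in segments:
    print(f"E = {e:.6f}: from x = {x_left} to x = {x_right}")
```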
For the second part, it mostly depends on how complicated your function V(x) is. In the particular case of V(x)=-1/(1+|x|), one can directly see that it is symmetric around x=0 and calculate the turning points explicitly, e.g.,
set terminal pngcairo
set output 'fig.png'
fName = 'test.dat'
unset key
set xr [-10:10]
set yr [-1:0]
unset bars
f(x) = -1 / (1+abs(x))
g(y) = (-1/y - 1)
plot \
f(x) w l lc rgb 'black', \
fName u (0):($1<-0.1?$1:1/0):(g($1)) w xerrorbars pt 0 lc rgb 'red', \
fName u (0):($1<-0.1?$1:1/0):(sprintf("E%d", $0)) w labels offset 0, char 0.75
which yields
The idea is basically the same as before, just the width of the errorbar now depends on the y-coordinate (the energy). Also, the labels style is used in order to produce explicit labels.
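The turning-point function g(y) in the script above comes from solving V(x) = E for |x|, which is valid for bound-state energies -1 ≤ E < 0:

```latex
-\frac{1}{1+|x|} = E
\quad\Longrightarrow\quad 1+|x| = -\frac{1}{E}
\quad\Longrightarrow\quad |x| = -\frac{1}{E} - 1 = g(E)
```

Each level E is therefore drawn from x = -g(E) to x = +g(E), which is exactly an x-errorbar of half-width g($1). For example, E = -0.5 gives g = 1.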
Another approach may be to read the data from "energy.dat" (as given in the question) with the system command and cat (so assuming a Un*x-like system...) and, at each x, select the larger of V(x) and E via max:
set key bottom right
set yr [-1:0.2]
set samples 1000
Edat = system( "cat energy.dat" )
max(a,b) = ( a > b ) ? a : b
V(x) = -1/(1+abs(x))
plot for [ E in Edat ] \
max(V(x),real(E)) title sprintf("E = %8.6f", real(E)) lw 2, \
V(x) title "V(x) = -1/(1+|x|)" lc rgb "red" lw 2
If we change the potential to V(x) = -abs(cos(x)), the plot looks pretty funny (and the energy levels are of course not correct!)
More details about the script:
max is not a built-in function in Gnuplot but a user-defined function with two formal arguments. So, for example, we may also define it as
mymax( p, q ) = ( p > q ) ? p : q
or under any other name (and then use mymax in the plot command). Next, the ? symbol is the ternary operator, a shorthand notation for an if...else construct. In pseudocode, it works as
function max( a, b ) {
if ( a > b ) then
return a
else
return b
end
}
This way, max(V(x),real(E)) selects the greater value between V(x) and real(E) for any given x and E.
Next, Edat = system( "cat energy.dat" ) tells Gnuplot to run the shell command "cat energy.dat" and assign the output to a new variable Edat. In the above case, Edat becomes a string that contains a sequence of energy values read in from "energy.dat". You can check the contents of Edat by print( Edat ). For example, it may be something like
Edat = "-0.11 -0.22 ... -0.5002"
plot for [ E in Edat ] ... loops over the words contained in the string Edat. In the above case, E takes the strings "-0.11", "-0.22", ..., "-0.5002" one by one. real(E) converts this string to a floating-point value, so that E (a character string) can be passed to any mathematical function.
The basic idea is to draw a truncated potential above E, max(V(x),E), for each value of E. (You can check the shape of such potential by plot max(V(x),-0.5), for example). After plotting such curves, we redraw the potential V(x) to make it appear as a single potential curve with a different color.
set samples 1000 sets the number of sample points per function curve to 1000. The value is arbitrary, but it seems to be enough to make the figure fairly smooth.
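The "truncated potential" idea can also be sketched in Python. This is a minimal sketch, assuming the energies have already been read into one whitespace-separated string; `truncated` is a hypothetical name for what the gnuplot script writes as max(V(x),real(E)), and Python's built-in max plays the role of the user-defined one.

```python
def V(x):
    """Potential from the question: V(x) = -1/(1+|x|)."""
    return -1.0 / (1.0 + abs(x))


def truncated(x, E):
    """Same as max(V(x), E) in the gnuplot script: V(x), clipped from below at E."""
    return max(V(x), E)


# Energies as one whitespace-separated string, like Edat = system("cat energy.dat")
Edat = "-0.50016671392950229 -0.18026105614262633 -0.11485673263086937"
xs = [i / 10 for i in range(-100, 101)]      # x from -10 to 10 in steps of 0.1
curves = {}
for word in Edat.split():                    # like: plot for [ E in Edat ]
    E = float(word)                          # like: real(E)
    curves[E] = [truncated(x, E) for x in xs]
```

Inside the well (where V(x) < E) each curve is flat at E; outside it follows V(x), which is why redrawing V(x) on top makes the levels appear to end at the turning points.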
[Current]
I am importing a text file in which the first column has the simulation time (0 to 150) and the second column has the delay (0.01 to 0.02).
1.000000 0.010007
1.000000 0.010010
2.000000 0.010013
2.000000 0.010016
.
.
.
149.000000 0.010045
149.000000 0.010048
150.000000 0.010052
150.000000 0.010055
which gives me the plot:
[Desired]
I need to add an average line to it, like the red line in the following image:
Here is a gnuplot-only solution with sample data:
set table "test.data"
set samples 1000
plot rand(0)+sin(x)
unset table
You should check the gnuplot demo page for a running average. I'm going to generalize this demo by building the functions dynamically. This makes it much easier to change the number of points included in the average.
This is the script:
# number of points in moving average
n = 50
# initialize the variables
do for [i=1:n] {
eval(sprintf("back%d=0", i))
}
# build shift function (back_n = back_n-1, ..., back1=x)
shift = "("
do for [i=n:2:-1] {
shift = sprintf("%sback%d = back%d, ", shift, i, i-1)
}
shift = shift."back1 = x)"
# uncomment the next line for a check
# print shift
# build sum function (back1 + ... + backn)
sum = "(back1"
do for [i=2:n] {
sum = sprintf("%s+back%d", sum, i)
}
sum = sum.")"
# uncomment the next line for a check
# print sum
# define the functions like in the gnuplot demo
# use macro expansion (@) for turning the strings into real functions
# (gnuplot 4.x needs "set macros" first; in gnuplot 5 macros are always enabled)
samples(x) = $0 > (n-1) ? n : ($0+1)
avg_n(x) = (shift_n(x), @sum/samples($0))
shift_n(x) = @shift
# the final plot command looks quite simple
set terminal pngcairo
set output "moving_average.png"
plot "test.data" using 1:2 w l notitle, \
"test.data" using 1:(avg_n($2)) w l lc rgb "red" lw 3 title "avg\\_".n
This is the result:
The average lags quite a bit behind the datapoints as expected from the algorithm. Maybe 50 points are too many. Alternatively, one could think about implementing a centered moving average, but this is beyond the scope of this question.
And, I also think that you are more flexible with an external program :)
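As an example of doing it with an external program, the shift-register idea of the script can be sketched in Python. This is a minimal sketch (the function name `moving_average_shift` is hypothetical): each new value shifts the registers back1..backn, and the sum is divided by the number of values seen so far, capped at n, exactly like samples($0).

```python
def moving_average_shift(values, n):
    """Trailing moving average using shift registers back1..backn.

    Every new value shifts the registers (back_n = back_(n-1), ..., back1 = x),
    and the sum is divided by the number of values seen so far, capped at n.
    """
    back = [0.0] * n                  # back[0] is back1, back[n-1] is backn
    result = []
    for row, x in enumerate(values):  # row plays the role of $0
        back = [x] + back[:-1]        # shift: drop backn, insert the new back1
        samples = n if row > n - 1 else row + 1
        result.append(sum(back) / samples)
    return result


print(moving_average_shift([1, 2, 3, 4, 5], 2))  # [1.0, 1.5, 2.5, 3.5, 4.5]
```

The registers start at 0, so during the first n-1 rows the zeros do not contribute to the sum and dividing by the number of real samples gives the correct average.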
Here's some replacement code for the top answer, which also works for 1000+ points and is much, much faster. It only works in gnuplot 5.2 and later, since it uses arrays.
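# number of points in moving average
n = 5000
# ring buffer holding the last n values
array A[n]
# number of values averaged so far (at most n)
samples(x) = $0 > (n-1) ? n : int($0+1)
# map the row index onto a ring-buffer slot 0..n-1
mod(x) = int(x) % n
# store the current value, then average the filled slots
avg_n(x) = (A[mod($0)+1]=x, (sum [i=1:samples($0)] A[i]) / samples($0))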
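The ring-buffer idea behind this array version can be sketched in Python. This minimal sketch (the function name `moving_average_ring` is hypothetical) writes the value of row k into slot k mod n, overwriting the oldest entry, so nothing needs to be shifted.

```python
def moving_average_ring(values, n):
    """Trailing moving average with a ring buffer of size n.

    The value of row k is written to slot k % n, overwriting the oldest entry;
    the average is taken over the slots filled so far (at most n).
    """
    A = [0.0] * n
    result = []
    for row, x in enumerate(values):          # row plays the role of $0
        A[row % n] = x                        # like A[mod($0)+1] = x
        samples = n if row > n - 1 else row + 1
        result.append(sum(A[:samples]) / samples)
    return result


print(moving_average_ring([1, 2, 3, 4, 5], 2))  # [1.0, 1.5, 2.5, 3.5, 4.5]
```

Like the gnuplot version, this still sums up to n slots per row; keeping a running sum (add the new value, subtract the overwritten one) would make each step constant-time.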
Edit
The updated question is about a moving average.
You can do this in a limited way with gnuplot alone, according to this demo.
But in my opinion, it would be more flexible to pre-process your data using a programming language like python or ruby and add an extra column for whatever kind of moving average you require.
The original answer is preserved below:
You can use fit. It seems you want to fit to a constant function. Like this:
f(x) = c
fit f(x) 'S1_delay_120_LT100_LU15_MU5.txt' using 1:2 every 5 via c
Then you can plot them both.
plot 'S1_delay_120_LT100_LU15_MU5.txt' using 1:2 every 5, \
f(x) with lines
Note that this technique can be used with arbitrary functions, not just constant or linear functions.
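For a constant model, the least-squares result has a closed form: setting the derivative of the sum of squared residuals to zero gives the arithmetic mean, so `fit f(x) ... via c` should converge to the mean of the selected points. A minimal Python sketch, with hypothetical delay values and a hypothetical function name `fit_constant`:

```python
def fit_constant(ys):
    """Least-squares fit of f(x) = c.

    Setting d/dc sum((y - c)**2) = -2 * sum(y - c) to zero gives c = mean(y).
    """
    return sum(ys) / len(ys)


# Hypothetical delay values (column 2), one per row
delay = [0.010, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017, 0.018, 0.019, 0.020]
c = fit_constant(delay[::5])   # like "every 5": rows 0, 5, 10
print(c)
```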
I wanted to comment on Franky_GT's answer, but somehow Stack Overflow didn't let me.
However, Franky_GT, your answer works great!
A note for people plotting .xvg files (e.g., after analyzing MD simulations): if you don't add the following line
set datafile commentschars "#@&"
Franky_GT's moving average code will result in this error:
unknown type in imag()
I hope this is of use to anyone.
For gnuplot >=5.2, probably the most efficient solution is using an array, like Franky_GT's solution.
However, it uses the pseudocolumn 0 (see help pseudocolumns). If you have empty lines in your data, $0 will be reset to 0, which might eventually mess up your average.
This solution uses an index t to count up the datalines and a second array X[] in case a centered moving average is desired. Datapoints don't have to be equidistant in x.
At the beginning, there are not enough data points for a centered average over N points, so for the x-value every second point is used and the others are NaN. That's why set datafile missing NaN is necessary to plot a connected line at the beginning.
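The centering idea can be sketched in Python as well. This minimal sketch (the function name `centered_moving_average` is hypothetical) computes a trailing N-point average and attaches it to the x-value in the middle of its window; unlike the gnuplot code, it simply skips the incomplete windows at the start instead of producing NaN.

```python
def centered_moving_average(xs, ys, N):
    """Pair each N-point average with the x-value in the middle of its window.

    Returns (x_center, average) pairs for complete windows only. For even N,
    the earlier of the two middle points is used. Points need not be
    equidistant in x.
    """
    out = []
    for end in range(N - 1, len(ys)):
        window = ys[end - N + 1 : end + 1]
        center = end - N // 2              # middle index of the window
        out.append((xs[center], sum(window) / N))
    return out


print(centered_moving_average([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 3))
# [(1, 1.0), (2, 2.0), (3, 3.0)]
```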
Code:
### moving average over N points
reset session
# create some test data
set print $Data
y = 0
do for [i=1:5000] {
print sprintf("%g %g", i, y=y+rand(0)*2-1)
}
set print
# average over N values
N = 250
array Avg[N]
array X[N]
MovAvg(col) = (Avg[(t-1)%N+1]=column(col), n = t<N ? t : N, t=t+1, (sum [i=1:n] Avg[i])/n)
MovAvgCenterX(col) = (X[(t-1)%N+1]=column(col), n = t<N ? t%2 ? NaN : (t+1)/2 : ((t+1)-N/2)%N+1, n==n ? X[n] : NaN) # be aware: gnuplot does integer division here
set datafile missing NaN
plot $Data u 1:2 w l ti "Data", \
t=1 '' u 1:(MovAvg(2)) w l lc rgb "red" ti sprintf("Moving average over %d",N), \
t=1 '' u (MovAvgCenterX(1)):(MovAvg(2)) w l lw 2 lc rgb "green" ti sprintf("Moving average centered over %d",N)
### end of code
Result:
I have a data file with matrices split into different gnuplot indices. I want to make an animation of a density plot evolving with time (= index).
The problem is that I want to keep the maximum and minimum of the cbrange symmetric while allowing it to change with time.
In the code below, the first "stats" command simply gives me the number of blocks for the loop. The second "stats" command with prefix "B" should give me the max and min values for the matrix at each index, so I can set cbrange properly.
The first time the code enters the loop (i=1), it works and stats gives me the correct numbers. Starting with the second iteration (i=2), stats gives me wrong numbers...
I've tried to set cbrange and zrange to [*:*] before the stats command, but it doesn't help.
Here's the code:
set terminal gif animate delay 0.5
set output 'foobar.gif'
stats 'dat-rw2d.dat' nooutput
set pm3d map
set palette defined (-1 "blue", 0 "white", 1 "red")
print STATS_blocks
do for [i=1:int(STATS_blocks)] {
print i
stats "dat-rw2d.dat" index (i-1) matrix nooutput prefix "B"
max = (B_max > -B_min)?(B_max):(-B_min)
set cbrange [-max:max]
print B_max, B_min
splot 'dat-rw2d.dat' matrix index (i-1)
}
If I don't plot anything (code below), stats gives me the correct numbers. So it is actually the "splot" that is causing the problem. Is it fixing some scale and getting in the way of stats? I've tried to set cbrange [*:*] before the stats, but it doesn't solve the problem.
do for [i=1:int(STATS_blocks)] {
print i
stats "dat-rw2d.dat" index (i-1) matrix nooutput prefix "B"
max = (B_max > -B_min)?(B_max):(-B_min)
set cbrange [-max:max]
print B_max, B_min
}
If you don't specify any column to use for stats, gnuplot tries to guess a suitable default one. With the matrix option this seems to be a wrong one (probably x-value or y-value, or matrix size), which doesn't change from block to block.
You must tell gnuplot explicitly to use the third column for stats:
stats 'dat-rw2d.dat' nooutput
set pm3d map
set palette defined (-1 "blue", 0 "white", 1 "red")
print STATS_blocks
do for [i=1:int(STATS_blocks)] {
print i
stats "dat-rw2d.dat" using 3 index (i-1) matrix nooutput prefix "B"
max = (B_max > -B_min)?(B_max):(-B_min)
set cbrange [-max:max]
print B_max, B_min
splot 'dat-rw2d.dat' matrix index (i-1)
}
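The per-block computation of the symmetric color range can be sketched in Python. This minimal sketch assumes each block is available as a list of rows of z-values (the values that using 3 selects in matrix mode); the function name `symmetric_limit` and the sample blocks are hypothetical.

```python
def symmetric_limit(block):
    """Return max(|z|) over one matrix block, like max = (B_max > -B_min) ? B_max : -B_min."""
    z_max = max(max(row) for row in block)
    z_min = min(min(row) for row in block)
    return z_max if z_max > -z_min else -z_min


# Two hypothetical time steps (index 0 and index 1) of a density matrix
blocks = [
    [[0.0, 0.5], [-0.25, 0.1]],
    [[-2.0, 1.0], [0.3, 1.5]],
]
for i, block in enumerate(blocks):
    m = symmetric_limit(block)
    print(f"index {i}: cbrange [{-m}:{m}]")
```

The point is that the limit is recomputed from the z-values of each block separately, so it changes with time as intended.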
Since it looks like a bug, I can only propose a workaround (an awful one, IMHO), but that's what comes to mind:
Call gnuplot within a system command, save the variables to a file dummy.txt and load that file from your script.
stats 'test.txt' nooutput
do for [cntr=1:int(STATS_blocks)] {
# next line doesn't work
# stats 'test.txt' index (cntr-1) matrix prefix "B"
# next 3 lines do the hack
cmd=sprintf('gnuplot -e "stats \"test.txt\" index %d matrix nooutput prefix \"B\"; save var \"dummy.txt\""',cntr-1)
system(cmd)
load("dummy.txt")
print cntr, B_max, B_min
max = (B_max > -B_min)?(B_max):(-B_min)
set cbrange [-max:max]
splot 'test.txt' matrix index (cntr-1) w l
}
If someone is willing to reproduce the problem, here is my test.txt file:
0 0
0 1
1 1
1 2
2 2
3 3
I have some values from an experiment that I want to plot in the xyz coordinate system. The values are limited in number and thus have a beginning and an end. I want to plot them so that the last point is visible in a special way, e.g. as a unique point with a special color or a special marker. How can I create such a line where the end is visible as a dot or something similar?
If it is the last point, you can use every to select it. Unfortunately, there is no 'magic' value for the last point. You must count the number of entries and use that value:
stats 'file.dat' nooutput
last_index = int(STATS_records - 1)
splot 'file.dat' with lines, '' every ::last_index with points
Alternatively, you could use the using specifier to add a filter that is true only for the single point you want to select.
In the most general case, you would have
f(i, x, y, z) = ...
splot 'file.dat' with lines, '' using (f($0, $1, $2, $3) ? $1 : 1/0):2:3 with points
This skips all points, for which f returns 0. $0 is the shorthand for column(0), which gives the row index, $1 gives the numerical value of the first column and so on.
Now it is up to you to define an appropriate filtering function f.
If the end point is given e.g. by the point with the maximum x-value, you could use:
stats 'file.dat' using 1 nooutput
f(x) = (x == STATS_max ? 1 : 0)
splot 'file.dat' with lines, '' using (f($1) ? $1 : 1/0):2:3 with points
If you have another criterion, you must define your function f accordingly.
To add a label to this point, you can use the labels plotting style:
splot 'file.dat' with lines, \
'' using (f($1) ? $1 : 1/0):2:3:(sprintf('(%.1f,%.1f,%.1f)', $1, $2, $3)) \
with labels offset char 1,1 point notitle
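The filtering idea can be sketched in Python. This minimal sketch (the function name `select_points` and the sample path are hypothetical) mimics `using (f(...) ? $1 : 1/0):2:3`: rows for which the criterion is false get an undefined x-value, with NaN standing in for gnuplot's 1/0, and are dropped.

```python
import math


def select_points(rows, keep):
    """Mimic using (f(...) ? $1 : 1/0):2:3 in a plot command.

    Rows for which keep(i, x, y, z) is false get an undefined x-value (NaN here,
    standing in for gnuplot's 1/0) and are dropped, just as gnuplot skips them.
    """
    tagged = [(x if keep(i, x, y, z) else math.nan, y, z)
              for i, (x, y, z) in enumerate(rows)]   # i plays the role of $0
    return [row for row in tagged if not math.isnan(row[0])]


# Hypothetical path of (x, y, z) measurements
rows = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.2), (3.0, 1.0, 0.4), (2.0, 1.5, 0.6)]

last_index = len(rows) - 1                  # like STATS_records - 1
x_max = max(x for x, _, _ in rows)          # like STATS_max of column 1
print(select_points(rows, lambda i, x, y, z: i == last_index))  # [(2.0, 1.5, 0.6)]
print(select_points(rows, lambda i, x, y, z: x == x_max))       # [(3.0, 1.0, 0.4)]
```

Note that "last point" and "point with maximum x" are different criteria, as the two results show.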
Here is a solution without stats which works even with gnuplot 4.4.0 (I guess stats was only available in gnuplot 4.6.0).
use the pseudocolumn 0 (check help pseudocolumns): if $0==0 (i.e. the first row), assign the values to x0,y0,z0.
for every row, assign x1=$1, y1=$2, and z1=$3. So after the first plot command, the variables x1,y1,z1 hold the last point's coordinates.
use the special filename '+' (check help special-filenames) for plotting a single data point.
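The single-pass bookkeeping from the steps above can be sketched in Python. This minimal sketch (the function name `first_and_last` and the sample path are hypothetical) remembers row 0 as the start and overwrites the end on every row, so after the loop it holds the last point.

```python
def first_and_last(rows):
    """Return the first and last (x, y, z) row in a single pass, or (None, None) if empty."""
    start = end = None
    for i, row in enumerate(rows):   # i plays the role of $0
        if i == 0:
            start = row              # like ($0==0 ? x0=$1 : 0)
        end = row                    # like x1=$1: overwritten on every row
    return start, end


# Hypothetical random-walk path
path = [(0.0, 0.0, 0.0), (0.3, -0.1, 0.2), (0.1, 0.4, 0.5)]
start, end = first_and_last(path)
print("start:", start, "end:", end)
```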
Script: (works with gnuplot>=4.4.0, March 2010)
### mark startpoint and endpoint of a path
reset
FILE = "SO19426060.dat"
# create some random test data
a = rand(0)
b = rand(0)
c = rand(0)
set table FILE
splot '+' u (a=a+rand(0)-0.5):(b=b+rand(0)-0.5):(c=c+rand(0)-0.5)
unset table
set grid x
set grid y
set xyplane relative 0
splot FILE u ($0==0?x0=$1:0,x1=$1):($0==0?y0=$2:0,y1=$2):($0==0?z0=$3:0,z1=$3) \
w l lc rgb "black" notitle, \
'+' u (x0):(y0):(z0) every ::0::0 w p pt 7 lc rgb "blue" ti "start", \
'+' u (x1):(y1):(z1) every ::0::0 w p pt 7 lc rgb "red" ti "end"
### end of script
Result: