How to plot lines parallel to the x-axis with a certain offset given by data in an input file with gnuplot - gnuplot

I calculated the eigenvalues of the Hamiltonian for the 1D-hydrogen atom in atomic units with the Fourier-Grid-Hamiltonian method in a nice little Fortran program.
All the eigenvalues found between -1 and 0 (the bound states) are saved into a file line by line like this:
-0.50016671392950229
-0.18026105614262633
-0.11485673263086937
-4.7309305955423042E-002
-4.7077108902158216E-002
As the number of eigenvalues found depends on the step size my program uses, the number of entries in the file can vary (in theory, there are infinitely many).
I now want to plot the values from the file as a line parallel to the x-axis with the offset given by the values read from file.
I also want to be able to plot the data only up to a certain line number, as the values get really close to each other the closer they get to zero and they cannot be distinguished by eye anymore.
(Here, e.g., it would make sense to plot the first four entries; the fifth is already too close to the previous one.)
I know that one can plot lines parallel to the x axis with the command plot *offset* but I don't know how to tell gnuplot to use the data from the file. So far I had to manually plot the values.
As a second step I would like to plot the data only in a certain x range, more concretely between the points of intersection with the potential used for the numeric solution, V(x) = -1/(1+abs(x)).
The result should look like this:
scheme of the desired plot (lookalike)
The closest I got was with
plot -1/(1+abs(x)),-0.5 title 'E0',-0.18 title 'E1', -0.11 title 'E2'
which got me the following result:
my plot
Hope you guys can help me, and I'm really curious whether gnuplot actually can do the second step I described!

As for the first part of your question, you can for example use the xerrorbars plotting style as:
set terminal pngcairo
set output 'fig.png'
unset key
set xr [-1:1]
set yr [-1:0]
unset bars
plot '-' u (0):($1<-0.1?$1:1/0):(1) w xerrorbars pt 0 lc rgb 'red'
-0.50016671392950229
-0.18026105614262633
-0.11485673263086937
-4.7309305955423042E-002
-4.7077108902158216E-002
e
The idea here is to:
interpret the energies E as points with coordinates (0,E) and assign to each of them an x-errorbar of width 1 (via the third part of the specification (0):($1<-0.1?$1:1/0):(1))
"simulate" the horizontal lines with x-errorbars. To this end, unset bars and pt 0 ensure that Gnuplot displays just plain lines.
consider only energies E<-0.1: otherwise the expression $1<-0.1?$1:1/0 evaluates to the undefined value 1/0, so nothing is plotted for such E.
plot '-' with explicit values can be of course replaced with, e.g., plot 'your_file.dat'
This produces:
For the second part, it mostly depends on how complicated your function V(x) is. In the particular case of V(x)=-1/(1+|x|), one can see directly that it is symmetric around x=0 and calculate the turning points explicitly, e.g.,
set terminal pngcairo
set output 'fig.png'
fName = 'test.dat'
unset key
set xr [-10:10]
set yr [-1:0]
unset bars
f(x) = -1 / (1+abs(x))
g(y) = (-1/y - 1)
plot \
f(x) w l lc rgb 'black', \
fName u (0):($1<-0.1?$1:1/0):(g($1)) w xerrorbars pt 0 lc rgb 'red', \
fName u (0):($1<-0.1?$1:1/0):(sprintf("E%d", $0)) w labels offset 0, char 0.75
which yields
The idea is basically the same as before, just the width of the errorbar now depends on the y-coordinate (the energy). Also, the labels style is used in order to produce explicit labels.
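The half-width g(y) used above follows from solving V(x) = E for x >= 0: -1/(1+x) = E gives x = -1/E - 1. As a quick cross-check outside gnuplot, here is a minimal Python sketch; the function names are my own and not part of the answer, and it only covers this particular potential:

```python
def potential(x):
    """V(x) = -1/(1+|x|), the potential from the question."""
    return -1.0 / (1.0 + abs(x))


def turning_point(energy):
    """Return the x >= 0 at which V(x) equals `energy` (valid for -1 <= energy < 0)."""
    if not -1.0 <= energy < 0.0:
        raise ValueError("energy must lie in [-1, 0) for a bound state")
    return -1.0 / energy - 1.0


# Energies taken from the question's sample file
for level in (-0.50016671392950229, -0.18026105614262633, -0.11485673263086937):
    x_t = turning_point(level)
    # V(x_t) should reproduce the energy, so the line ends exactly on the curve
    print(f"E = {level:.6f}  turning points at x = +/-{x_t:.4f}  V(x_t) = {potential(x_t):.6f}")
```

For a potential that cannot be inverted by hand, the turning points would have to be found numerically instead.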

Another approach may be to get data from "energy.dat" (as given in the question) with system and cat commands (so assuming a Un*x-like system...) and select V(x) and E at each x via max:
set key bottom right
set yr [-1:0.2]
set samples 1000
Edat = system( "cat energy.dat" )
max(a,b) = ( a > b ) ? a : b
V(x) = -1/(1+abs(x))
plot for [ E in Edat ] \
max(V(x),real(E)) title sprintf("E = %8.6f", real(E)) lw 2, \
V(x) title "V(x) = -1/(1+|x|)" lc rgb "red" lw 2
If we change the potential to V(x) = -abs(cos(x)), the plot looks pretty funny (and the energy levels are of course not correct!)
More details about the script:
max is not a built-in function in Gnuplot, but a user-defined function having two formal arguments. So for example, we may define it as
mymax( p, q ) = ( p > q ) ? p : q
with any other names (and use mymax in the plot command). Next, the ? symbol is a ternary operator that gives a short-hand notation for an if...else construct. In a pseudo-code, it works as
function max( a, b ) {
if ( a > b ) then
return a
else
return b
end
}
This way, max(V(x),real(E)) selects the greater value between V(x) and real(E) for any given x and E.
Next, Edat = system( "cat energy.dat" ) tells Gnuplot to run the shell command "cat energy.dat" and assign the output to a new variable Edat. In the above case, Edat becomes a string that contains a sequence of energy values read in from "energy.dat". You can check the contents of Edat by print( Edat ). For example, it may be something like
Edat = "-0.11 -0.22 ... -0.5002"
plot for [ E in Edat ] ... loops over words contained in a string Edat. In the above case, E takes a string "-0.11", "-0.22", ..., "-0.5002" one-by-one. real(E) converts this string to a floating-point value. It is used to pass E (a character string) to any mathematical function.
The basic idea is to draw a truncated potential above E, max(V(x),E), for each value of E. (You can check the shape of such potential by plot max(V(x),-0.5), for example). After plotting such curves, we redraw the potential V(x) to make it appear as a single potential curve with a different color.
set samples 1000 increases the resolution of the plot with 1000 points per curve. 1000 is arbitrary, but this seems to be sufficient to make the figure pretty smooth.
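To see what the max trick computes, here is a small Python sketch of the same idea. It is only an illustration outside gnuplot: parse_levels and truncated are names I made up, and the string below stands in for the contents of "energy.dat".

```python
def V(x):
    """Potential from the question: V(x) = -1/(1+|x|)."""
    return -1.0 / (1.0 + abs(x))


def parse_levels(text):
    """Split whitespace-separated words and convert each to float (like real(E) in gnuplot)."""
    return [float(word) for word in text.split()]


def truncated(x, energy):
    """The curve that is actually plotted per level: max(V(x), E)."""
    return max(V(x), energy)


# Stand-in for system("cat energy.dat"); Fortran-style exponents parse fine
edat = "-0.50016671392950229\n-0.18026105614262633\n-4.7309305955423042E-002\n"
for energy in parse_levels(edat):
    # Inside the well the curve is flat at E, outside it follows V(x)
    samples = [round(truncated(x, energy), 4) for x in (-8.0, -2.0, 0.0, 2.0, 8.0)]
    print(energy, samples)
```

Inside the classically allowed region V(x) < E, so the maximum is the flat level E; outside it the curve coincides with V(x), which is why redrawing V(x) on top hides the outer parts.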

Related

Gnuplot fit function against time x-axis and real number y-axis give "singular matrix in invert_RtR" error? [duplicate]

I am trying to use GNUplot to calculate the best-fit line for some time-series data. The data is just about linear already with a negative slope. The input data looks something like:
1615840396,138849,510249
1615840406,139011,511152
1615840416,137580,510330
1615840426,137493,510501
1615840436,137261,510186
1615840447,137435,511026
1615840456,137054,510252
1615840466,136955,510174
1615840476,136922,510540
1615840486,136970,510999
The first column is a Unix timestamp. A graph of column 2 vs. time looks like this:
I'm trying to produce a best-fit line like this:
gnuplot> set xdata time
gnuplot> set timefmt "%s"
gnuplot> set datafile separator comma
gnuplot> f(x) = m*x + b
gnuplot> fit f(x) 'data.csv' using 1:2 via m,b
Which produces:
Final set of parameters Asymptotic Standard Error
======================= ==========================
m = 8.08062e-05 +/- 1.633 (2.021e+06%)
b = 1 +/- 2.639e+09 (2.639e+11%)
The resulting best-fit line has a positive slope and doesn't really fit the data at all:
What am I doing wrong?
This is a recurring question about fitting time data. I guess there should be similar questions here on SO, but I can't find them right now. I'm not sure if there is an example of fitting time data on the gnuplot homepage.
I guess the problem is the following: If you assume a linear function f(x) = a*x + b with time data, the origin will be at Jan, 1st 1970.
Typically, this will be pretty far from your actual data and furthermore, you only have a small range of data compared to the distance to your origin. So, I guess the fitting function cannot deliver really good values.
You better try to fit a function which is shifted by your start date.
You either set this start date manually, or you spend a few lines of code to find it automatically.
Additionally, it will help if you give some starting values for the fitting parameters.
Here, it seems that a is found without giving a starting value. Starting from b=1 does not give a good result, but b=10 seems to be fine as a starting value.
Code:
### fitting time data
reset session
# create some random test data
set print $Data
do for [i=1:100] {
print sprintf("%.0f,%g",time(0)+i*86400,i+rand(0)*10 )
}
set print
set datafile separator comma
# find out the StartDate
StartDate = 16158768671 # manually by setting a value
# or automatically by using stats
stats $Data u 1 index 0 every ::0:0:0:0 nooutput
StartDate = STATS_min
f(x) = a*(x-StartDate) + b
set fit brief nolog
b=10
fit f(x) $Data u 1:2 via a,b
set key top left
set format x "%b %d" timedate
plot $Data u 1:2 ti "Data", \
f(x) w l lc rgb "red" ti "Fit"
### end of code
Result:
Final set of parameters Asymptotic Standard Error
======================= ==========================
a = 1.16005e-05 +/- 1.163e-07 (1.003%)
b = 6.1323 +/- 0.5759 (9.39%)
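The effect of shifting the origin can be checked independently with the ten rows from the question. The following Python sketch uses the closed-form least-squares formulas rather than gnuplot's iterative fit, so it only illustrates the shift, not gnuplot's algorithm; linear_fit is a helper name I chose.

```python
def linear_fit(xs, ys):
    """Closed-form least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Columns 1 and 2 of the sample data in the question
timestamps = [1615840396, 1615840406, 1615840416, 1615840426, 1615840436,
              1615840447, 1615840456, 1615840466, 1615840476, 1615840486]
values = [138849, 139011, 137580, 137493, 137261,
          137435, 137054, 136955, 136922, 136970]

# Shift the time axis so the first sample sits at x = 0
start = min(timestamps)
shifted = [t - start for t in timestamps]

a, b = linear_fit(shifted, values)
print(f"slope a = {a:.3f} per second, value at start b = {b:.1f}")
```

With the shifted axis, b is simply the fitted value at the first timestamp, which also makes it easy to pick a sensible starting value for gnuplot's fit.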

Removing vertical lines due to sudden jumps in gnuplot

I am trying to plot a function that contains discontinuities in gnuplot. As a result, gnuplot automatically draws a vertical line connecting the jump discontinuities. I would like to remove this line. I have looked around and found two solutions, neither of which worked: one solution was to use smooth unique when plotting, and the other one was to define the function in a conditional form and remove the discontinuity manually. The first solution simply did not make any changes to the plot (at least visually). The second solution seemed to move the location of the jump discontinuity to the left or right, not get rid of the vertical line. Please note that I would like to plot with lines. I know with points works, but I do not wish to plot with points.
set sample 10000
N=50
l1(x)=2*cosh(1/x)
l2(x)=2*sinh(1/x)
Z(x)=l1(x)**N+l2(x)**N
e(x)=(-1/Z(x))*(l2(x)*l1(x)**(N-1)+l1(x)*l2(x)**(N-1))
plot e(x)
Produces:
If all you need to do is to remove the vertical line at the singularity you could use conditional plotting:
plot (x<0 ? 1/x : 1/0) w l ls 1, (x>0 ? 1/x : 1/0) w l ls 1
However, your function is more complicated: it cannot be numerically evaluated in a region around 0:
set grid
set xrange [-0.3:0.3]
plot e(x) with linespoints
If Mathematica is to be trusted, the function e(x) goes to 1 and -1 as x approaches 0 from the left and the right, respectively. However, you see in the picture above that gnuplot fails to properly evaluate the function already at x=0.1. print e(0.1) gives -0.0, and print e(0.05) already gives NaN. In this region the numerator and denominator of the function e(x) get too large to be handled with floating point numbers.
You can either exclude this region using conditional plotting,
plot (x<-0.15 ? e(x) : 1/0) w l ls 1, (x>0.15 ? e(x) : 1/0) w l ls 1
or you have to rewrite the function e(x) so you avoid extremely large values in its evaluation (if that is possible). Alternatively you can use a software package that can switch to higher precision, such as Mathematica.
You can redefine your function e(x) to avoid calculations of large exponentials like
e(x) = -(l2(x)/l1(x) + (l2(x)/l1(x))**(N-1))/(1 + (l2(x)/l1(x))**N)
Now you always calculate l2(x)/l1(x) before taking the power.
For your high sampling rate of 10000, this still gives some undefined points near the singularity, so that you have no connecting line. For lower sampling rates of e.g. 1000 you would also see a line crossing zero. To avoid that you can use an odd sampling rate:
set sample 1001
N=50
l1(x)=2*cosh(1/x)
l2(x)=2*sinh(1/x)
Z(x)=l1(x)**N+l2(x)**N
e(x) = -(l2(x)/l1(x) + (l2(x)/l1(x))**(N-1))/(1 + (l2(x)/l1(x))**N)
set autoscale yfix
set offsets 0,0,0.05,0.05
plot e(x) with lines
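The rewritten form works because l2(x)/l1(x) is just tanh(1/x), which stays between -1 and 1. A minimal Python sketch of the comparison, assuming N=50 as in the question (e_naive and e_stable are names I picked for illustration):

```python
import math

N = 50


def e_naive(x):
    """Original formula: large powers of cosh/sinh overflow for small |x|."""
    l1 = 2 * math.cosh(1 / x)
    l2 = 2 * math.sinh(1 / x)
    z = l1 ** N + l2 ** N
    return (-1 / z) * (l2 * l1 ** (N - 1) + l1 * l2 ** (N - 1))


def e_stable(x):
    """Same function after dividing numerator and denominator by l1**N."""
    t = math.tanh(1 / x)  # equals l2(x)/l1(x), always within [-1, 1]
    return -(t + t ** (N - 1)) / (1 + t ** N)


for x in (1.0, 0.5, 0.01, -0.01):
    try:
        naive = e_naive(x)
    except OverflowError:
        naive = "overflow"  # the intermediate powers no longer fit in a double
    print(x, naive, e_stable(x))
```

Neither version is defined at x = 0 itself, since 1/x is evaluated first.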
Late answer... but you can use the same principle as
here:
How to remove line between "jumping" values, in gnuplot?
or here:
Avoid connection of points when there is empty data
Just find the condition for where you want the line to be interrupted.
The condition in this case would be for example:
If two successive values y0 and y1 have different signs then make the line color fully transparent according to the color scheme 0xaarrggbb, e.g. 0xff123456, actually it doesn't matter what comes after 0xff, because 0xff means fully transparent.
Script:
### remove connected "jump" in curve
reset session
N=50
l1(x)=2*cosh(1/x)
l2(x)=2*sinh(1/x)
Z(x)=l1(x)**N+l2(x)**N
e(x)=(-1/Z(x))*(l2(x)*l1(x)**(N-1)+l1(x)*l2(x)**(N-1))
set key noautotitle
set grid x,y
plot y1=NaN '+' u 1:(y0=y1, y1=e(x)):(sgn(y0)!=sgn(y1)?0xff123456:0xff0000) w l lc rgb var
### end of code
Result: (identical independent of the number of samples)
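The sign-change condition can also be stated on its own, independent of gnuplot's colour trick. Here is a hedged Python sketch that breaks a sampled curve into separate segments wherever two successive y values differ in sign; split_at_sign_changes is a hypothetical helper, not something gnuplot provides, and a value of exactly zero counts as a change here.

```python
def sgn(v):
    """Return -1, 0 or 1, like gnuplot's sgn()."""
    return (v > 0) - (v < 0)


def split_at_sign_changes(points):
    """Split a list of (x, y) samples into runs with a constant sign of y."""
    segments = []
    current = []
    for x, y in points:
        # Start a new segment when the sign differs from the previous sample
        if current and sgn(y) != sgn(current[-1][1]):
            segments.append(current)
            current = []
        current.append((x, y))
    if current:
        segments.append(current)
    return segments


# Samples of 1/x on both sides of the singularity
samples = [(x, 1.0 / x) for x in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)]
for segment in split_at_sign_changes(samples):
    print(segment)
```

Drawing each segment separately has the same visual effect as making the connecting piece fully transparent.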

Plotting Average curve for points in gnuplot

[Current]
I am importing a text file in which the first column has simulation time (0~150) the second column has the delay (0.01~0.02).
1.000000 0.010007
1.000000 0.010010
2.000000 0.010013
2.000000 0.010016
.
.
.
149.000000 0.010045
149.000000 0.010048
150.000000 0.010052
150.000000 0.010055
which gives me the plot:
[Desired]
I need to plot an average line on it like shown in the following image with red line:
Here is a gnuplot only solution with sample data:
set table "test.data"
set samples 1000
plot rand(0)+sin(x)
unset table
You should check the gnuplot demo page for a running average. I'm going to generalize this demo in terms of dynamically building the functions. This makes it much easier to change the number of points included in the average.
This is the script:
# number of points in moving average
n = 50
# initialize the variables
do for [i=1:n] {
eval(sprintf("back%d=0", i))
}
# build shift function (back_n = back_n-1, ..., back1=x)
shift = "("
do for [i=n:2:-1] {
shift = sprintf("%sback%d = back%d, ", shift, i, i-1)
}
shift = shift."back1 = x)"
# uncomment the next line for a check
# print shift
# build sum function (back1 + ... + backn)
sum = "(back1"
do for [i=2:n] {
sum = sprintf("%s+back%d", sum, i)
}
sum = sum.")"
# uncomment the next line for a check
# print sum
# define the functions like in the gnuplot demo
# use macro expansion for turning the strings into real functions
samples(x) = $0 > (n-1) ? n : ($0+1)
avg_n(x) = (shift_n(x), @sum/samples($0))
shift_n(x) = @shift
# the final plot command looks quite simple
set terminal pngcairo
set output "moving_average.png"
plot "test.data" using 1:2 w l notitle, \
"test.data" using 1:(avg_n($2)) w l lc rgb "red" lw 3 title "avg\\_".n
This is the result:
The average lags quite a bit behind the datapoints as expected from the algorithm. Maybe 50 points are too many. Alternatively, one could think about implementing a centered moving average, but this is beyond the scope of this question.
And, I also think that you are more flexible with an external program :)
Here's some replacement code for the top answer, which makes this also work for 1000+ points and much much faster. Only works in gnuplot 5.2 and later I guess
# number of points in moving average
n = 5000
array A[n]
samples(x) = $0 > (n-1) ? n : int($0+1)
mod(x) = int(x) % n
avg_n(x) = (A[mod($0)+1]=x, (sum [i=1:samples($0)] A[i]) / samples($0))
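The array version is a ring buffer: each new value overwrites the oldest slot, and until the buffer is full only the slots filled so far are averaged. A minimal Python sketch of that logic, for illustration only (moving_average is my own name, and indices are 0-based here while gnuplot arrays are 1-based):

```python
def moving_average(values, n):
    """Trailing moving average over at most n points using a ring buffer."""
    buffer = [0.0] * n
    averages = []
    for k, value in enumerate(values):
        buffer[k % n] = value      # overwrite the oldest slot
        count = min(k + 1, n)      # number of valid slots so far
        averages.append(sum(buffer[:count]) / count)
    return averages


print(moving_average([1, 2, 3, 4, 5], 3))
```

As in the gnuplot version, the whole buffer is summed for every point, so the cost grows with n times the number of data points.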
Edit
The updated question is about a moving average.
You can do this in a limited way with gnuplot alone, according to this demo.
But in my opinion, it would be more flexible to pre-process your data using a programming language like python or ruby and add an extra column for whatever kind of moving average you require.
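As a rough idea of what such pre-processing could look like, here is a Python sketch that adds a centered moving average as a third column. The function names and the window size are assumptions for illustration; near the edges the window is simply truncated.

```python
def centered_average(values, n):
    """Average over a window of n points centered on each sample (truncated at the edges)."""
    half = n // 2
    result = []
    for i in range(len(values)):
        window = values[max(0, i - half):min(len(values), i + half + 1)]
        result.append(sum(window) / len(window))
    return result


def add_average_column(rows, n):
    """Turn (x, y) rows into (x, y, avg) rows."""
    ys = [y for _, y in rows]
    return [(x, y, avg) for (x, y), avg in zip(rows, centered_average(ys, n))]


# First rows of the question's data, used here as a small example
rows = [(1.0, 0.010007), (1.0, 0.010010), (2.0, 0.010013), (2.0, 0.010016)]
for x, y, avg in add_average_column(rows, 3):
    print(f"{x:f} {y:f} {avg:f}")
```

The output could then be plotted in gnuplot with using 1:2 for the data and using 1:3 for the average.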
The original answer is preserved below:
You can use fit. It seems you want to fit to a constant function. Like this:
f(x) = c
fit f(x) 'S1_delay_120_LT100_LU15_MU5.txt' using 1:2 every 5 via c
Then you can plot them both.
plot 'S1_delay_120_LT100_LU15_MU5.txt' using 1:2 every 5, \
f(x) with lines
Note that this technique can be used with arbitrary functions, not just constant or linear functions.
I wanted to comment on Franky_GT, but somehow stackoverflow didn't let me.
However, Franky_GT, your answer works great!
A note for people plotting .xvg files (e.g. after doing analysis of MD simulations), if you don't add the following line:
set datafile commentschars "#@&"
Franky_GT's moving average code will result in this error:
unknown type in imag()
I hope this is of use to anyone.
For gnuplot >=5.2, probably the most efficient solution is using an array like @Franky_GT's solution.
However, it uses the pseudocolumn 0 (see help pseudocolumns). In case you have some empty lines in your data $0 will be reset to 0 which eventually might mess up your average.
This solution uses an index t to count up the datalines and a second array X[] in case a centered moving average is desired. Datapoints don't have to be equidistant in x.
At the beginning there will not be enough datapoints for a centered average of N points so for the x-value it will use every second point and the other will be NaN, that's why set datafile missing NaN is necessary to plot a connected line at the beginning.
Code:
### moving average over N points
reset session
# create some test data
set print $Data
y = 0
do for [i=1:5000] {
print sprintf("%g %g", i, y=y+rand(0)*2-1)
}
set print
# average over N values
N = 250
array Avg[N]
array X[N]
MovAvg(col) = (Avg[(t-1)%N+1]=column(col), n = t<N ? t : N, t=t+1, (sum [i=1:n] Avg[i])/n)
MovAvgCenterX(col) = (X[(t-1)%N+1]=column(col), n = t<N ? t%2 ? NaN : (t+1)/2 : ((t+1)-N/2)%N+1, n==n ? X[n] : NaN) # be aware: gnuplot does integer division here
set datafile missing NaN
plot $Data u 1:2 w l ti "Data", \
t=1 '' u 1:(MovAvg(2)) w l lc rgb "red" ti sprintf("Moving average over %d",N), \
t=1 '' u (MovAvgCenterX(1)):(MovAvg(2)) w l lw 2 lc rgb "green" ti sprintf("Moving average centered over %d",N)
### end of code
Result:

Discrete heat map with GNUPLOT

I'm trying to make something like a heat map with GNUPLOT, but I need my palette to take discrete colors for defined values.
I mean, my data file has three columns, for example:
x y value
0.0 0.0 10
0.0 0.5 2
0.0 1.0 2
0.5 1.0 10
1.0 0.0 -1
1.0 1.0 -1
I need each point to have one color depending on its value. A traditional heat map blends points into regions of continuous colors, but I need it in a discrete form.
If your data forms a "matrix", i.e., there are M x-samples, N y-samples, and you have the data for all MxN points, then probably the easiest solution is to use
plot ... w rgbimage u 1:2:(r($3)):(g($3)):(b($3))
and supply the r,g,b values as three additional columns as shown above.
However, if your data is "sparse" (only some of the samples are available as shown in your question) and there are not many points, one might be tempted to generate the elementary squares forming the plot manually. To this end, one could proceed as:
set terminal png enhanced
set output 'plot.png'
#custom value -> color mapping
rgb(r, g, b) = 65536 * int(r) + 256 * int(g) + int(b)
fn(val) = rgb(100 + val*10, 0, 0)
#square size
delta = 0.5
set xr [-delta/2:1+delta/2]
set yr [-delta/2:1+delta/2]
set xtics 0,delta/2,1 out nomirror
set ytics 0,delta/2,1 out nomirror
set format x "%.2f"
set format y "%.2f"
set size ratio 1
unset key
fName="test.dat"
load sprintf("<gawk -v d=%f -f parse.awk %s", delta, fName)
plot fName u 1:2:3 w labels tc rgb 'white'
This script assumes the presence of auxiliary gawk script parse.awk in the same directory:
{
printf "set object rectangle from %f,%f to %f,%f fc rgb fn(%d) fs solid\n",
$1-d/2, $2-d/2, $1+d/2, $2+d/2, $3
}
This script accepts the required square size (-v d=%f in the invocation of gawk) and generates for each point a statement that creates the corresponding square. These statements are subsequently executed by the load command.
Mapping of the colors is done via the function fn defined in the main Gnuplot script. It takes the passed value and generates a rgb value which is then used with fc rgb in the rectangle specification.
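To see which colors the mapping above produces for the sample values, here is the same arithmetic as a small Python sketch (a direct transcription of rgb and fn, written only for checking the numbers):

```python
def rgb(r, g, b):
    """Pack three 0..255 channels into one integer, 0xRRGGBB."""
    return 65536 * int(r) + 256 * int(g) + int(b)


def fn(val):
    """Value -> color mapping from the script: red channel 100 + 10*val."""
    return rgb(100 + val * 10, 0, 0)


# Values that occur in the question's data file
for val in (10, 2, -1):
    print(val, f"0x{fn(val):06x}")
```

Note that this mapping only stays within a single channel while 100 + 10*val lies between 0 and 255; for other value ranges the formula has to be adapted.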
Together, this then produces:
This might do what you want, after some fiddling:
set view map
set style fill transparent solid noborder
splot 'data' u 1:2:3:(100+200*$3) pt 5 lc rgbcolor var ps 14
The pt 5 will plot a square (at least in the x11 term) at each point in the datafile, colored according to a transformation on the last column.

reduce datapoints when using logscale in gnuplot

I have a large set of data points from x = 1 to x = 10e13 (step size is fixed to about 3e8).
When I try to plot them using a logscale, I naturally get an incredibly high point density towards the end. Of course this affects my output plots, since postscript and svg files (holding each and every data point) are getting really big.
Is there a way to tell gnuplot to decrease the data density dynamically?
Sample data here. Shows a straight line using logarithmic x-axis.
Usually, for this kind of plot, one can use a filter function which selects the desired points and discards all others (sets their value to 1/0).
Something like:
plot 'sample.dat' using (filter($1) ? $1 : 1/0):2
Now you must define an appropriate filter function to change the data density. Here is a proposal, with pseudo-data, although you might for sure find a better one, which doesn't show this typical logarithmic pattern:
set logscale x
reduce(x) = x/(10**(floor(log10(x))))
filterfunc(x) = abs(log10(sc)+(log10(x) - floor(log10(x))) - log10(floor(sc*reduce(x))))
filter(x) = filterfunc(x) < 1e-5 ? x : 1/0
set multiplot layout 1,2
sc = 1
plot 'sample.data' using (filter($1)):2 notitle
sc = 10
replot
The variable sc allows you to change the density. The result (with 4.6.5) is:
I did some work inspired by Christoph's answer and was able to get equal spacing on the log scale. The idea of the filter: if your x values form a sequence, you can use the greatest integer function (floor) and then find the point nearest to it on the log scale by comparing the fractional part. The precision is tuned by precision_parameter here.
precision_parameter=100
function(x)=(-floor(precision_parameter*log10(x))+(precision_parameter*log10(x)))
Now filter by using the filter function defined below
density_parameter = 3.5
filter(x)=(function(x) < 1/(log10(x))**density_parameter & function(x-1) > 1/(log10(x))**density_parameter ) ? x : 1/0
set datafile missing "NaN"
The last line helps when plotting with linespoints. I used x and x-1 assuming the x data form an arithmetic progression with a common difference of 1; change this according to your data. Then just replace x by filter(x) in the plot command.
plot 'sample_data.dat' u (filter($1)):2 w lp
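If pre-processing the file is an option, the same thinning can be done before gnuplot ever sees the data. The sketch below is a different and simpler technique than the filters above: it keeps the first point that falls into each logarithmic bin. The name thin_log and the default of 10 bins per decade are assumptions for illustration.

```python
import math


def thin_log(xs, bins_per_decade=10):
    """Keep the first x of every logarithmic bin; xs must be sorted ascending."""
    kept = []
    last_bin = None
    for x in xs:
        if x <= 0:
            continue  # log10 is undefined here, so such points are skipped
        current_bin = math.floor(math.log10(x) * bins_per_decade)
        if current_bin != last_bin:
            kept.append(x)
            last_bin = current_bin
    return kept


# Equally spaced test data over five decades
xs = list(range(1, 100001))
kept = thin_log(xs)
print(len(xs), "points reduced to", len(kept))
```

Increasing bins_per_decade raises the point density evenly across the log axis.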
