I have N input files and I want to plot the data of these files together with their fit function into one single plot (i.e. one plot for all files, data and fit-function).
After a long time of fiddling I found a solution (see below), but I find it "cumbersome and ugly" and I'm wondering if there is a better, more elegant way of achieving the same thing.
I should say that I'm on gnuplot 5.0 under Windows. The test script below doesn't specify a terminal (I'm testing with windows and wxt), but the final script will use pngcairo terminal.
Things that I find sub-optimal about my solution:
I need two intermediary tables $data and $fit. My original attempt was to use a do for {} loop to read each file in turn, perform the fit and generate the plot, but that didn't work out.
Rather than using a fit function, I plot the fit curve (in this simple case a straight line) as data into a table. I experimented with creating on-the-fly user functions using eval but just couldn't quite figure it out (especially how to keep them in sync with the data).
I want the fit-equation to be displayed in the chart. I do this by setting labels, but it would be nicer if it would just be part of the key.
My test data:
data1.dat
100 0.15
200 0.29
300 0.46
400 0.58
data2.dat
100 0.12
200 0.22
300 0.35
400 0.48
data3.dat
100 0.1
200 0.22
300 0.29
400 0.40
My gnuplot script:
set key left
set xrange [0:*]
set yrange [0:0.5]
# user function for linear fit
lin(x) = slope * x + offset
max(a,b) = ((a>=b)? a : b)
file_list = "data1 data2 data3"
x_max = 0
# first write all data of interest into a (memory) table
set table $data
do for [name in file_list] {
filename = name . ".dat"
plot filename u 1:2
print ""
print ""
x_max = max(GPVAL_DATA_X_MAX, x_max)
}
unset table
x_max = max(GPVAL_DATA_X_MAX, x_max)
num_indices = words(file_list)
# now calculate a linear fit for each dataset
set sample 2
set table $fit
do for [i = 0:(num_indices-1)]{
fit lin(x) $data index i using 1:2 via slope, offset
plot [0:x_max][0:0.5] lin(x)
set label (i+1) sprintf("%s = %.3g*x + %.3g", word(file_list, i+1)."(x) ", slope, offset) at 200,(0.20 - 0.05*i)
}
unset table
set title "Data and Linear Fit"
set xlabel "x"
set ylabel "y"
#now we got both data and fit for all files, plot everything at once
plot for [i = 0:(num_indices-1)] $data index i title word(file_list,i+1) with points lc i+1, for [i = 0:(num_indices-1)] $fit index i with lines lc i+1 noti
There is always the stupid, brute-force way: create a new data file containing all the points you want to fit (e.g. using "cat data1.dat data2.dat data3.dat > newdata.dat" on a Linux system) and then fit newdata.dat.
I have more than 1000 files named "snap%d_beta800.dat", where %d is a number between 1 and 1000.
I want to plot each of these files as a separate surface plot (splot, using three columns) and save the result in PNG format under the same name as the original file, e.g. snap1.png.
I want to write a script that does this for all 1000 files in one go by loading a gpl file.
In addition, I want to create an animation from the 1000 files.
I would appreciate any help; please have a look at what I tried.
What I tried does not give me a separate plot for every file; it just accumulates the plots of all the files in a single plot:
set term png
splot [][][-3:3] for [i=1:1000] 'snap'.i.'_beta800.dat' us\
($1)-($4)/2:($2)-($5)/2:($3*0)-($6)/2:\
($4)*1:($5)*1:($6):($6) w vec head filled size screen 0.015,10,30 lw 2 lc pal z
set output "snap".i.".png"
replot
set term x11
As @GRSousaJr wrote, put it into a do for loop.
I'm wondering why you are writing your plot command like this:
... using ($1)-($4)/2:($2)-($5)/2:($3*0)-($6)/2:($4)*1:($5)*1:($6):($6) ...
I would simply write:
... using ($1-$4/2):($2-$5/2):(-$6/2):4:5:6:6 ...
Code:
### Batch create PNG files
set term pngcairo size 600,600
do for [i=1:1000] {
fname_in = sprintf("snap%d_beta800.dat",i)
fname_out = sprintf("snap%d_beta800.png",i)
set output fname_out
splot fname_in u ($1-$4/2):($2-$5/2):(-$6/2):4:5:6:6 \
w vec head filled size screen 0.015,10,30 lw 2 lc pal z
}
set output
### end of code
I assume you want to create your animation from these 1000 PNG files with some other software. Maybe you are aware that you can also create an animated GIF with gnuplot:
Code:
### Create animated file
set term gif size 600,600 animate delay 12 loop 0 optimize
set output "Animation.gif"
do for [i=1:1000] {
fname = sprintf("snap%d_beta800.dat",i)
splot fname u ($1-$4/2):($2-$5/2):(-$6/2):4:5:6:6 \
w vec head filled size screen 0.015,10,30 lw 2 lc pal z
}
set output
### end of code
The gnuplot stats command can be used to report stats for an input dataset. It creates a set of variables containing information about a specific column in the dataset. Here is an example of such use:
set print "StatDat.dat"
do for [i=2:9] { # Here you will use i for the column.
stats 'data.dat' u i nooutput ;
print i, STATS_median, STATS_mean , STATS_stddev # ...
}
set print
plot "StatDat.dat" us 1:2 # or whatever column you want...
It would be useful to include the reported column header, something like:
print STATS_columnheader, STATS_median, STATS_mean , STATS_stddev # ...
However, gnuplot does not provide the required STATS_columnheader variable.
Is there an alternative way to achieve this?
You can use an external tool such as awk to extract and return a column header. You can create a function like this:
columnheading(f,c) = system("awk '/^#/ {next}; {print $".c.";exit}' ".f)
which, given a file f and a column number c, will return the column header. You'd use it like this:
print columnheading('data.dat',i).' ', STATS_median, STATS_mean , STATS_stddev # ...
The awk expression skips all comment lines up to the first non-comment line, prints the word selected by the c parameter and exits. The printed word is returned by gnuplot's system function.
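If awk is not available (for example on a plain Windows installation), the same lookup can be done with any other external tool. The following is only a sketch of the logic in Python; the function name column_heading and the sample lines are made up for illustration and are not part of gnuplot or of the answer above.

```python
def column_heading(lines, col):
    """Return the col-th (1-based) whitespace-separated word of the first
    line that does not start with '#', or '' if there is no such word."""
    for line in lines:
        if line.startswith("#"):
            continue  # same role as awk's '/^#/ {next}'
        fields = line.split()
        # Stop at the first non-comment line, like awk's 'exit'.
        return fields[col - 1] if 0 < col <= len(fields) else ""
    return ""


# Hypothetical sample input: one comment line, one header line, one data line.
sample = [
    "# comment line\n",
    "PosX PosY Density\n",
    "1 1.1 2.1\n",
]
print(column_heading(sample, 2))
```

With a real file you would pass the open file object instead of the list, e.g. column_heading(open("data.dat"), 2), and call the script from gnuplot via system() in the same way as the awk one-liner.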
Workaround
A quick & dirty solution, already given as a comment to the answer here:
it is possible to store the header line once and for all in a variable and then use it whenever it is needed.
Under *nix you can use head, or a combination such as head -n 10 | tail -n 1 if the header is in the 10th line.
Here is the modified example:
datafile = 'data.dat'
firstrow = system('head -1 '.datafile) # called only once
set print "StatDat.dat"
do for [i=2:9] { # Here you will use i for the column.
stats datafile u i nooutput ;
print word(firstrow, i), " ", STATS_median, STATS_mean , STATS_stddev
# or whatever you want...
}
set print
plot "StatDat.dat" us 1:2 # or whatever column you want...
Note that the gnuplot function word returns the nth word in a string, so you may have problems if a column header consists of more than one word...
... problems that you can overcome with other tricks.
Only a path to a hack/trick
The following doesn't work because gnuplot starts to process the file in the plot command only after skipping the header and the comment lines...
Since a function can take the form f(x) = (statement1, statement2, statement3, return value), executing the statements and returning the value, you can imagine building a function that stores the first line field by field in an array (directly from gnuplot 5.1 on, via some other tricks before that), maybe hiding the plot with set terminal unknown.
array MyHeader[4]
f(x,y,z) = (x == 0 ? (MyHeader[y]=z, z ) : z)
set terminal unknown # set terminal dumb
set key autotitle columnhead
do for [i=2:4] { # Here you will use i for the column.
plot datafile using 1:(f($0,i,column(i)))
}
print MyHeader
Unfortunately the above script stores only the values of the first data row...
but for the moment I have run out of the time I can dedicate to this problem :-(
(maybe someone finds a useful hint in it, or can finish it).
What exactly do you want to do with the headers?
As you can see in the plot command, apparently gnuplot considers the first uncommented line as header line which is then for example used for the legend.
No need for external tools or scripts. With stats you can also easily extract any line, e.g. the second line via every ::1::1.
Since gnuplot 5.2.0 (Sep 2017) you can use arrays, which you can also plot, e.g. for tables within the plot.
Script:
### extract header lines via stats
reset session
$Data <<EOD
# comment line
PosX PosY Density # header line
x/m y/cm g/cm2 # (sub)header line
1 1.1 2.1
2 1.2 2.2
3 1.3 2.3
4 1.4 2.4
5 1.5 2.5
6 1.6 2.6
7 1.7 2.7
8 1.8 2.8
9 1.9 2.9
EOD
array Headers[3]
array Units[3]
do for [i=1:|Headers|] {
stats $Data u (Headers[i]=strcol(i)) every ::0::0 nooutput
stats $Data u (Units[i] =strcol(i)) every ::1::1 nooutput
}
print Headers
print Units
set key top left
plot $Data u 1:2 w lp pt 7 lc "red" ti columnheader, \
'' u 1:3 w lp pt 7 lc "blue" ti columnheader, \
Headers u ($1+3):(2.1):2 w labels notitle, \
Units u ($1+3):(1.9):2 w labels notitle
### end of code
Result:
["PosX","PosY","Density"]
["x/m","y/cm","g/cm2"]
I have a data file containing a Gaussian function, and another data file that contains one column with 3 rows. These three rows are all constants, namely
1: mean+variance
2: mean
3: mean-variance
from the gaussian in the first file.
I would like to plot all these as constant lines on the gaussian function. I've tried the "every" command, (plot "stat.dat" every ::0::0 w lines) which didn't work.
Thank you, any help is appreciated.
Do you mean something like this?
set terminal pngcairo
set output "gauss.png"
set samples 1000
x0 = -5
s2 = 1
set xrange [-10:10]
set yrange [0:0.5]
plot (1/sqrt(2*pi*s2))*exp(-(x-x0)**2/(2*s2)) title "Gaussian", \
"stat.dat" u 1:(5) every ::0::0 w impulse title "mean + variance", \
"stat.dat" u 1:(5) every ::1::1 w impulse title "mean", \
"stat.dat" u 1:(5) every ::2::2 w impulse title "mean - variance"
I have replaced your data file which contains the gaussian function by an analytical expression. The result looks as follows:
I realise this is perhaps trivial and if I had more time I'd probably easily deal with it myself, but I'm running out of time and I desperately need to get this animation working as soon as possible.
I have data file of the type
0 28.3976 25.1876 12.7771
0.03125 34.1689 21.457 9.70863
0.0625 35.7084 17.6016 5.03987
0.09375 34.3048 13.6718 1.45238
...
where the first column is meant to be treated as time (it is in fact a numerical solution to a certain ODE system). Now, what I need is an animation of a 3D plot of the last three columns, tracing a curve as it moves around with time. Is that doable? If so, how? I'm a complete gnuplot beginner and googling around did not help much. I would hugely appreciate any help. Cheers!
The following should show you an animated plot:
# define fixed axis-ranges
set xrange [-1:1]
set yrange [0:20]
set zrange [-1:1]
# filename and n=number of lines of your data
filedata = 'data.dat'
n = system(sprintf('cat %s | wc -l', filedata))
do for [j=1:n] {
set title 'time '.j
splot filedata u 2:3:4 every ::1::j w l lw 2, \
filedata u 2:3:4 every ::j::j w p pt 7 ps 2
}
The first line of the splot command plots the trajectory, and the second line plots the point at the current time.
If you want a gif of this output, simply add the following before the for-loop:
set term gif animate
set output 'output.gif'
This is an example output:
Related:
StackOverflow: Gif Animation in Gnuplot
gnuplot-surprising: creating gif animation
gnuplotting: Animation IV – trajectory
I want to fit my data with the function f(x) = a+b*x**2. After plotting I get this result:
correlation matrix of the fit parameters:
                m      n
m               1.000
n              -0.935  1.000
My question is: how can I find the correlation coefficient in gnuplot?
You can use the stats command in gnuplot, which has syntax similar to the plot command:
stats "file.dat" using 2:(f($2)) name "A"
The correlation coefficient will be stored in the A_correlation variable. (With no name specification, it would be STATS_correlation.) You can use it subsequently to plot your data or just print on the screen using the set label command:
set label 1 sprintf("r = %4.2f",A_correlation) at graph 0.1, graph 0.85
You can find more about the stats command in gnuplot documentation.
Although there is no direct solution to this problem, a workaround is possible. I'll illustrate it using python/numpy. First, the part of the gnuplot script that generates the fit and connects with a python script:
file = "my_data.tsv"
f(x)=a+b*(x)
fit f(x) file using 2:3 via a,b
r = system(sprintf("python correlation.py %s",file))
ti = sprintf("y = %.2f + %.2fx (r = %s)", a, b, r)
plot \
file using 2:3 notitle,\
f(x) title ti
This runs correlation.py to retrieve the correlation 'r' in string format. It uses 'r' to generate a title for the fit line. Then, correlation.py:
from numpy import genfromtxt
from numpy import corrcoef
import sys
data = genfromtxt(sys.argv[1], delimiter='\t')
r = corrcoef(data[1:,1],data[1:,2])[0,1]
print(("%.3f" % r).lstrip('0'))
Here, the first row is assumed to be a header row. Furthermore, the columns used for the correlation are hardcoded to the zero-based indices 1 and 2, i.e. the second and third columns, matching "using 2:3" in the gnuplot script. Of course, both settings can be changed and turned into arguments as well.
The resulting title of the fit line is (for a personal example):
y = 2.15 + 1.58x (r = .592)
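If numpy is not installed, the same quantity (the Pearson correlation coefficient, which is what corrcoef returns for two columns) can be computed with the standard library only. This is a minimal sketch; the function name pearson_r and the sample values are illustrative assumptions, not part of the script above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample values, not taken from my_data.tsv.
x = [100, 200, 300, 400]
y = [0.15, 0.29, 0.46, 0.58]
r = pearson_r(x, y)
# Same output format as correlation.py: three decimals, leading zero stripped.
print(("%.3f" % r).lstrip("0"))
```

The printed string can be picked up by gnuplot's system() call exactly as before.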
Since you are probably using the fit command, you can first refer to this link to arrive at R2 values.
The link uses certain existing variables like FIT_WSSR, FIT_NDF to calculate R2 value.
The code for R2 is stated as:
SST = FIT_WSSR/(FIT_NDF+1)
SSE=FIT_WSSR/(FIT_NDF)
SSR=SST-SSE
R2=SSR/SST
The next step would be to show the R^2 value on the graph, which can be achieved using the code:
set label 1 sprintf("R^2 = %f",R2) at graph 0.7, graph 0.7
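As a cross-check outside gnuplot, the textbook coefficient of determination for a straight-line fit is R^2 = 1 - SS_res/SS_tot, where SS_res is the sum of squared residuals and SS_tot the sum of squared deviations of y from its mean. The sketch below uses that standard definition, which is not the formula quoted from the link; the function names and sample values are assumptions made for illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + offset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    return slope, offset


def r_squared(xs, ys, slope, offset):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + offset)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical sample values.
x = [100, 200, 300, 400]
y = [0.15, 0.29, 0.46, 0.58]
slope, offset = linear_fit(x, y)
print("slope = %.5g, offset = %.3g, R^2 = %.4f"
      % (slope, offset, r_squared(x, y, slope, offset)))
```

For an unweighted fit, SS_res should correspond to what gnuplot reports as FIT_WSSR, so the two results can be compared directly.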
If you're looking for a way to calculate the correlation coefficient as defined on this page, you are out of luck using gnuplot as explained in this Google Groups thread.
There are lots of other tools for calculating correlation coefficients, e.g. numpy.