Correlation coefficient on gnuplot - gnuplot

I want to fit my data with the function f(x) = a+b*x**2 and plot the result. After fitting I get this output:
correlation matrix of the fit parameters:
m n
m 1.000
n -0.935 1.000
My question is: how can I find the correlation coefficient in gnuplot?

You can use the stats command in gnuplot, which has syntax similar to the plot command:
stats "file.dat" using 2:(f($2)) name "A"
The correlation coefficient will be stored in the A_correlation variable. (With no name specification, it would be STATS_correlation.) You can use it subsequently in your plot, or show it on the graph using the set label command:
set label 1 sprintf("r = %4.2f",A_correlation) at graph 0.1, graph 0.85
You can find more about the stats command in gnuplot documentation.
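To sanity-check the value that stats reports, the same quantity can be computed by hand. The following is a minimal Python sketch, assuming the reported value is the usual Pearson correlation coefficient; the function name pearson_r and the sample data are hypothetical and not taken from the question.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical sample data, only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
print("r = %.4f" % pearson_r(xs, ys))
```

Feeding it the same two columns that stats reads should give a value close to A_correlation.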

Although there is no direct solution to this problem, a workaround is possible. I'll illustrate it using python/numpy. First, the part of the gnuplot script that generates the fit and connects with a python script:
file = "my_data.tsv"
f(x)=a+b*(x)
fit f(x) file using 2:3 via a,b
r = system(sprintf("python correlation.py %s",file))
ti = sprintf("y = %.2f + %.2fx (r = %s)", a, b, r)
plot \
file using 2:3 notitle,\
f(x) title ti
This runs correlation.py to retrieve the correlation 'r' in string format. It uses 'r' to generate a title for the fit line. Then, correlation.py:
from numpy import genfromtxt
from numpy import corrcoef
import sys
data = genfromtxt(sys.argv[1], delimiter='\t')
r = corrcoef(data[1:,1],data[1:,2])[0,1]
print(("%.3f" % r).lstrip('0'))
Here, the first row is assumed to be a header row, so it is skipped with data[1:]. The columns used for the correlation are hardcoded as zero-based indices 1 and 2, i.e. the second and third columns, matching using 2:3 in the gnuplot script. Of course, both settings can be changed and turned into arguments as well.
The resulting title of the fit line is (for a personal example):
y = 2.15 + 1.58x (r = .592)

Since you are probably using the fit command, you can first refer to this link to arrive at an R2 value.
The link uses variables that fit sets, such as FIT_WSSR and FIT_NDF, to calculate R2.
The code for R2 is stated as:
SST = FIT_WSSR/(FIT_NDF+1)
SSE=FIT_WSSR/(FIT_NDF)
SSR=SST-SSE
R2=SSR/SST
The next step is to show the R^2 value on the graph, which can be done with:
set label 1 sprintf("R^2 = %f",R2) at graph 0.7, graph 0.7
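One caution: if I read the four lines for R2 correctly, they reduce algebraically to R2 = -1/FIT_NDF whatever the data are, so the result is worth checking against the conventional definition R^2 = 1 - SS_res/SS_tot. The following is a minimal Python sketch of that definition; the function name r_squared, the data and the parameters a and b are hypothetical. In an unweighted fit, SS_res should correspond to what gnuplot stores in FIT_WSSR.

```python
def r_squared(ys, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    # Residual sum of squares: data versus fitted values.
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    # Total sum of squares: data versus their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical data and fit parameters for f(x) = a + b*x**2.
a, b = 1.0, 0.5
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 1.4, 3.1, 5.4]
fitted = [a + b * x ** 2 for x in xs]
print("R^2 = %.4f" % r_squared(ys, fitted))
```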

If you're looking for a way to calculate the correlation coefficient as defined on this page, you are out of luck with gnuplot, as explained in this Google Groups thread.
There are lots of other tools for calculating correlation coefficients, e.g. numpy.

Related

Plot the distance between every two points in 2 D

I have a table with three columns: the first column is the name of each point, the second column is numerical data (the mean), and the last column is the second column plus a fixed number. The following is an example of what the data looks like:
I want to plot this table so that I get the following figure:
If it is possible, how can I plot it using Microsoft Excel, Python (Bokeh), or R?
Alright, I only know how to do it in ggplot2, so I will answer for R here.
These methods only work if the data frame is in the format you provided above.
I renamed your columns to Name.of.Method, Mean, Mean.2.2.
Preparation
Loading csv data into R
df <- read.csv('yourdata.csv', sep = ',')
Change the column names (do this if you don't want to change the code below; otherwise you will need to go through each parameter to match your column names).
names(df) <- c("Name.of.Method", "Mean", "Mean.2.2")
Method 1 - Using geom_segment()
library(ggplot2)
ggplot() +
  geom_segment(data = df, aes(x = Mean,
                              y = Name.of.Method,
                              xend = Mean.2.2,
                              yend = Name.of.Method))
As you can see, geom_segment() lets us specify the end position of each line (hence xend and yend).
However, the result does not look like the image you have above.
The lines there look like error bars, and ggplot2 provides an error bar function for that.
Method 2 - Using geom_errorbarh()
ggplot(df, aes(y = Name.of.Method, x = Mean)) +
geom_errorbarh(aes(xmin = Mean, xmax = Mean.2.2), linetype = 1, height = .2)
Usually we don't use this function just to draw a line. However, its functionality fits your requirement. You can see that we use xmin and xmax to specify the head and the tail of the line.
The height argument sets the height of the caps at both ends of the line.
I would use hbar for this:
from bokeh.io import show, output_file
from bokeh.plotting import figure
output_file("intervals.html")
names = ["SMB", "DB", "SB", "TB"]
p = figure(y_range=names, height=350)  # plot_height in Bokeh versions before 3.0
p.hbar(y=names, left=[4,3,2,1], right=[6.2, 5.2, 4.2, 3.2], height=0.3)
show(p)
However, Whisker would also be an option if you really want whiskers instead of interval bars.

how to customise plot title in spatstat

How can I change the plot titles and subtitles when using the plot command on point patterns on a linnet object? For example:
library(spatstat)
first = runiflpp(10, as.linnet(chicago), nsim = 2)
plot(first)
The code above generates two realisations of a point process, because we requested nsim=2, and plots them with the plot command. But it plots the two realisations with the panel titles 'simulation 1' and 'simulation 2'.
How can I change the subplot titles for example from simulation 1 to experiment 1?
thank you
The simplest way would be to change the names of the items in the list:
names(first) <- paste("experiment", 1:2)
Alternatively you can change the argument main.panel in plot.solist (see ?plot.solist for all the options):
plot(first, main.panel = paste("experiment", 1:2))

plotting a 3D+colour scatter with gnuplot (on torch7)

I'm working with torch7, and I created a PCA function, which gives me an Nx3 tensor which I wish to plot (3D scatter).
I stored it in a file (file.dat).
Now I want to plot it, so I wrote the following lines.
NOTE: those lines are in torch7 (Lua), but you don't really need to know the language, because the command gnuplot.raw("<command>") uses the regular gnuplot commands.
NOTE 2: I followed helpers on this forum to create this part, so I have probably already read a relevant thread you might want to link here. If you do, please explain the difference between the linked explanation and what I did.
gnuplot.raw("rgb(r,g,b) = 65536*r + 256*g + b")
gnuplot.raw("blue = rgb(0,0,200)")
gnuplot.raw("red = rgb(200,0,0)")
gnuplot.raw("layer = 1")
gnuplot.raw("splot './file.dat' using 1:2:3:(($4-layer)<0.1 ? red : blue) with points pt 7 linecolor rgb variable notitle")
Columns 1 through 3 in file.dat are the x, y, z coordinates; column 4 is either 1 or 2 (it determines the colour).
LAST NOTE: my script doesn't print an error of any kind; it just doesn't plot the desired 3D scatter.
Thanks ahead
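For reference, the rgb() helper in the script packs three 8-bit channels into a single integer of the form 0xRRGGBB, and the fourth entry of the using clause picks red or blue per point. The following is a minimal Python sketch of the same arithmetic, only to make the intended values explicit; the function name point_colour is hypothetical, and the sketch does not diagnose why nothing is plotted.

```python
def rgb(r, g, b):
    """Pack three 0-255 channels into one integer: 65536*r + 256*g + b."""
    return 65536 * r + 256 * g + b

blue = rgb(0, 0, 200)
red = rgb(200, 0, 0)
layer = 1

def point_colour(flag):
    """Mirror of the gnuplot expression ($4-layer)<0.1 ? red : blue."""
    return red if (flag - layer) < 0.1 else blue

# Column 4 is either 1 or 2 in the question's data file.
for flag in (1, 2):
    print("flag %d -> 0x%06X" % (flag, point_colour(flag)))
```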

Plot data and fit-functions of multiple files into one plot

I have N input files and I want to plot the data of these files together with their fit function into one single plot (i.e. one plot for all files, data and fit-function).
After a long time of fiddling I found a solution (see below), but I find it "cumbersome and ugly" and I'm wondering if there is a better, more elegant way of achieving the same thing.
I should say that I'm on gnuplot 5.0 under Windows. The test script below doesn't specify a terminal (I'm testing with windows and wxt), but the final script will use pngcairo terminal.
Things that I find sub-optimal about my solution:
I need two intermediary tables $data and $fit. My original attempt was to use a do for{} loop to read each file in turn perform the fit and generate the plot, but that didn't work out.
Rather than using a fit function, I plot the fit curve (in this simple case a straight line) as data into a table. I experimented with creating on-the-fly user functions using eval but just couldn't quite figure it out (especially how to keep them in sync with the data).
I want the fit-equation to be displayed in the chart. I do this by setting labels, but it would be nicer if it would just be part of the key.
My test data:
data1.dat
100 0.15
200 0.29
300 0.46
400 0.58
data2.dat
100 0.12
200 0.22
300 0.35
400 0.48
data3.dat
100 0.1
200 0.22
300 0.29
400 0.40
My gnuplot script:
set key left
set xrange [0:*]
set yrange [0:0.5]
# user function for linear fit
lin(x) = slope * x + offset
max(a,b) = ((a>=b)? a : b)
file_list = "data1 data2 data3"
x_max = 0
# first write all data of interest into a (memory) table
set table $data
do for [name in file_list] {
filename = name . ".dat"
plot filename u 1:2
print ""
print ""
x_max = max(GPVAL_DATA_X_MAX, x_max)
}
unset table
x_max = max(GPVAL_DATA_X_MAX, x_max)
num_indices = words(file_list)
# now calculate a linear fit for each dataset
set sample 2
set table $fit
do for [i = 0:(num_indices-1)]{
fit lin(x) $data index i using 1:2 via slope, offset
plot [0:x_max][0:0.5] lin(x)
set label (i+1) sprintf("%s = %.3g*x + %.3g", word(file_list, i+1)."(x) ", slope, offset) at 200,(0.20 - 0.05*i)
}
unset table
set title "Data and Linear Fit"
set xlabel "x"
set ylabel "y"
#now we got both data and fit for all files, plot everything at once
plot for [i = 0:(num_indices-1)] $data index i title word(file_list,i+1) with points lc i+1, for [i = 0:(num_indices-1)] $fit index i with lines lc i+1 noti
There is always the stupid, brute-force way. You can create a new datafile containing all the points you want to fit (e.g. using "cat data1.dat data2.dat data3.dat > newdata.dat" on a Linux system) and then fit newdata.dat.
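Independently of gnuplot, the per-file slopes and offsets can be cross-checked with a closed-form least-squares fit. The following is a minimal Python sketch using the three test datasets from the question; the function name linear_fit is hypothetical, and the results should agree closely with what fit reports for lin(x) = slope * x + offset on unweighted data.

```python
def linear_fit(points):
    """Closed-form least-squares fit of y = slope*x + offset."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    return slope, offset

# Test data copied from data1.dat, data2.dat and data3.dat in the question.
datasets = {
    "data1": [(100, 0.15), (200, 0.29), (300, 0.46), (400, 0.58)],
    "data2": [(100, 0.12), (200, 0.22), (300, 0.35), (400, 0.48)],
    "data3": [(100, 0.1), (200, 0.22), (300, 0.29), (400, 0.40)],
}

# Same label format as the sprintf in the gnuplot script.
for name, points in datasets.items():
    slope, offset = linear_fit(points)
    print("%s(x) = %.3g*x + %.3g" % (name, slope, offset))
```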

Gnuplot: Plotting trajectories of multiple objects in separate blocks

I compute the iterated positions of multiple particles, so that my output file looks like :
x1(t=0) y1(t=0)
x2(t=0) y2(t=0)
...
xn(t=0) yn(t=0)
x1(t=1) y1(t=1)
...
xn(t=1) yn(t=1)
(a lot of blocks)
x1(t=p) y1(t=p)
...
xn(t=p) yn(t=p)
For example, the particle 1 is on each first line of a block, etc.
I need to plot the trajectory of each particle in a single plot, with points linked with lines. The problem I stumble upon is to link properly the points corresponding to the correct particle. I found some advice recommending to reformat the data but I have no idea how to handle it. It might be also possible to plot directly the trajectories with a plot command but once again I am low on solutions.
You should be able to do it with a loop (in gnuplot >= 4.6) and the index option to the plot command:
p = (number of particles)
plot for [i=0:p] 'data.dat' index i with linespoints
The with linespoints option also sounds like what you want, which links the data points with lines.
Unfortunately, there is no way to do this with your current datafile setup. You can make a plot which doesn't connect the points using the every (e) keyword:
plot for [i=0:NPOINTS-1] 'test.dat' e ::i::i w p
But that's not very helpful if you want the datasets connected; for that you need to "invert" your data. I'd use Python because it's super easy:
# pythonscript.py
import sys  # allows us to get command-line arguments

# Store the data as one list per time step (blocks are separated by blank lines):
# [[x1(t=0) y1(t=0), x2(t=0) y2(t=0), x3(t=0) y3(t=0), ...],
#  [x1(t=1) y1(t=1), x2(t=1) y2(t=1), x3(t=1) y3(t=1), ...],
#  ...
#  [x1(t=N) y1(t=N), x2(t=N) y2(t=N), x3(t=N) y3(t=N), ...],
# ]
with open(sys.argv[1]) as fin:
    data = []
    current = []
    data.append(current)
    for line in fin:
        line = line.rstrip()
        if line:
            current.append(line)
        else:
            # A blank line starts the next time step.
            current = []
            data.append(current)

# Now transpose the data and write it out. `zip(*data)` will give you:
# [(x1(t=0) y1(t=0), x1(t=1) y1(t=1), x1(t=2) y1(t=2), ...),
#  (x2(t=0) y2(t=0), x2(t=1) y2(t=1), x2(t=2) y2(t=2), ...),
#  ...
#  (xN(t=0) yN(t=0), xN(t=1) yN(t=1), xN(t=2) yN(t=2), ...),
# ]
for lst in zip(*data):
    for dpoint in lst:
        print(dpoint)
    # Two blank lines end a data set, which is what gnuplot's index selects.
    print("\n")
For me, given the input file (test.dat), with a blank line between the blocks:
x1(t=0) y1(t=0)
x2(t=0) y2(t=0)
xn(t=0) yn(t=0)
x1(t=1) y1(t=1)
x2(t=1) y2(t=1)
xn(t=1) yn(t=1)
x1(t=p) y1(t=p)
x2(t=p) y2(t=p)
xn(t=p) yn(t=p)
running python pythonscript.py test.dat gives:
x1(t=0) y1(t=0)
x1(t=1) y1(t=1)
x1(t=p) y1(t=p)
x2(t=0) y2(t=0)
x2(t=1) y2(t=1)
x2(t=p) y2(t=p)
xn(t=0) yn(t=0)
xn(t=1) yn(t=1)
xn(t=p) yn(t=p)
Now you can plot that using the solution by andyras, where NP is the number of particles:
plot for [i=0:NP-1] '< python pythonscript.py data.dat' index i w lp

Resources