Performing calculations between multiple data files in gnuplot [duplicate]

I have 2 dat files:
a.dat
#Xs
100 25
200 56
300 75
400 67
b.dat
#Xs
100 65
200 89
300 102
400 167
I want to draw a graph in gnuplot where the y values are the ratios between the corresponding values of a.dat and b.dat, e.g. 25/65, 56/89, 75/102, and 67/167.
How do I do this? I only know how to make a plot like this, without the ratio:
plot "a.dat" using 1:2 with linespoints notitle, \
     "b.dat" using 1:2 with linespoints notitle

You cannot combine the data from two different files in a single using statement. You must combine the two files with an external tool.
The easiest way is to use paste:
plot '< paste a.dat b.dat' using 1:($2/$4) with linespoints
For a platform-independent solution you could use e.g. the following Python script, which in this case does the same:
"""paste.py: merge lines of two files."""
import sys

if len(sys.argv) < 3:
    raise RuntimeError('Need two files')

with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
    for left, right in zip(f1, f2):
        # Join corresponding lines; the newline of the second line is kept.
        print(left.strip() + ' ' + right, end='')
And then call
plot '< python paste.py a.dat b.dat' using 1:($2/$4) w lp
(see also Gnuplot: plotting the maximum of two files)

The short answer is that... you cannot. Gnuplot processes one file at a time.
The workaround is to use an external tool, e.g. the shell if you are on a Unix-like system, or gnuplot.
join file1 file2 > merged_file merges the two files quite easily if the first column is identical in both. Options let you join on other columns and handle data missing in either file.
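For systems without join, the same merge on a common first column can be sketched in Python. This is only an illustration under stated assumptions: the function names parse_rows and join_ratio are hypothetical, each row is assumed to hold an x value followed by a y value, and rows whose x appears in only one file are simply dropped.

```python
def parse_rows(lines):
    """Map the first column (x) to the second column (y).

    Blank lines and lines starting with '#' are skipped.
    """
    table = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith('#'):
            continue
        table[float(fields[0])] = float(fields[1])
    return table


def join_ratio(lines_a, lines_b):
    """Return (x, ya / yb) for every x present in both inputs, sorted by x."""
    a = parse_rows(lines_a)
    b = parse_rows(lines_b)
    common = sorted(set(a) & set(b))
    # Rows with a zero denominator are skipped (an assumption of this sketch).
    return [(x, a[x] / b[x]) for x in common if b[x] != 0]


if __name__ == '__main__':
    # The sample data from the question, given as lists of lines.
    a_dat = ['#Xs', '100 25', '200 56', '300 75', '400 67']
    b_dat = ['#Xs', '100 65', '200 89', '300 102', '400 167']
    for x, ratio in join_ratio(a_dat, b_dat):
        print(x, ratio)
```

The printed pairs could be written to a file and plotted with using 1:2.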
If there is no common column but the line number is what matters, paste will do.
If interpolation is required because the grids in the two files differ, you have to code it yourself. I distribute a command-line utility as free open-source software that can help: datamerge.
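As a rough idea of what "coding it yourself" might look like, here is a minimal Python sketch that linearly interpolates the second dataset onto the x grid of the first before taking the ratio. It is not how datamerge works; the names interpolate and ratio_on_grid are hypothetical, the second dataset is assumed to be non-empty and sorted by x, and points outside its range are dropped rather than extrapolated.

```python
from bisect import bisect_left


def interpolate(xs, ys, x):
    """Linearly interpolate y at x.

    xs must be sorted ascending. Returns None if x lies outside [xs[0], xs[-1]].
    """
    if x < xs[0] or x > xs[-1]:
        return None
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def ratio_on_grid(points_a, points_b):
    """Return (x, ya / yb) with yb interpolated from points_b at each x of points_a."""
    xs_b = [x for x, _ in points_b]
    ys_b = [y for _, y in points_b]
    out = []
    for x, ya in points_a:
        yb = interpolate(xs_b, ys_b, x)
        if yb is None or yb == 0:
            continue  # outside the range of b, or division by zero
        out.append((x, ya / yb))
    return out
```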

There is a trick if the two datasets don't fit (different sampling in x), but you have a good mathematical model for at least one of them:
fit f2(x) data2 us 1:2 via ...
set table $corr
plot data1 using 2:(f2($1))
unset table
plot $corr using 1:2
This is of course nonsense if both datasets have the same set of independent variables, because you can simply combine them (see other answers).
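The same idea can be sketched outside gnuplot. The Python below assumes, purely for illustration, that the model for the second dataset is a straight line y = m*x + c; the names fit_line and correlate are hypothetical. It fits the line by ordinary least squares and then pairs each y value of the first dataset with the model value at the same x, mirroring using 2:(f2($1)).

```python
def fit_line(points):
    """Least-squares fit of y = m*x + c to (x, y) points; returns (m, c)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    c = (sy - m * sx) / n
    return m, c


def correlate(data1, data2):
    """Pair each y of data1 with the fitted model of data2 evaluated at the same x."""
    m, c = fit_line(data2)
    return [(y1, m * x1 + c) for x1, y1 in data1]
```

A different model would need a different fitting routine; only the pairing step stays the same.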

Just for fun, there is a gnuplot-only and platform-independent solution.
Admittedly, the command plot '< paste a.dat b.dat' using 1:($2/$4) from Christoph's answer is unbeatably short (and certainly efficient) if you are working under Linux or have installed the CoreUtils from GnuWin under Windows.
The solution below takes a detour via two long strings. This should work fine unless gnuplot has a limit on string length. I tested it only up to 100,000 data lines, which takes quite a few minutes. The assumption is that the two files have the same number of lines with identical x-values. For gnuplot>=5.0 you could write into a datablock instead of a file on disk and do further optimizations.
Script: (works with gnuplot>=4.6.0, March 2012)
### get ratio of numbers from different files
reset
FILE1 = "SO20069641a.dat"
FILE2 = "SO20069641b.dat"
FILE = "SO20069641.dat"
# create some random test data
set samples 100
set table FILE1
plot '+' u (int($0+1)*100):(int(rand(0)*100)+1)
set table FILE2
plot '+' u (int($0+1)*100):(int(rand(0)*100)+1)
unset table
a = ''
b = ''
stats FILE1 u (a=a.' '.strcol(1).' '.strcol(2)) nooutput
stats FILE2 u (b=b.' '.strcol(1).' '.strcol(2)) nooutput
set table "SO20069641.dat"
set samples words(a)/2
splot '+' u (n=2*(int($0+1)),real(word(a,n-1))):(real(word(a,n))):(real(word(b,n)))
unset table
plot FILE u 1:($2/$3) w impulses
### end of script

Related

Automatic series in gnuplot?

I have input like:
year s1 s2 s3
2000 1 2 3
2001 2 4 6
2002 4 8 12
I don't know how many series. Today it's 3, tomorrow it may be 4.
I want to plot it in a multi-series chart. Something like this:
set key autotitle columnhead
plot 'data/chart-year-subreddit-count' using 1:2 with lines, \
'data/chart-year-subreddit-count' using 1:3 with lines, \
'data/chart-year-subreddit-count' using 1:4 with lines
Except since I don't know how many columns, I don't know what to put in my gnuplot script.
Do I have to write a script to write the file? Or can gnuplot work out how many series there are automatically?
Gnuplot itself cannot count the number of columns, but you can use e.g. wc and head to count the number of columns:
file = 'data/chart-year-subreddit-count'
cols = int(system('head -1 '.file.' | wc -w'))
plot for [i=2:cols] file using 1:i with lines
A bit late, but I disagree with Christoph: gnuplot 5.0.0 (Jan 2015) was already able to count columns via stats and the variable STATS_columns (check help stats).
stats FILE u 0 nooutput
plot for [col=2:STATS_columns] FILE u 1:col
Even for gnuplot 4.6.0 (Mar 2012) there is a gnuplot-only solution (and hence platform-independent).
Current versions:
At least for gnuplot>=5.0.4 (Jul 2016) you have the following option:
plot for [col=2:*] FILE u 1:col
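For gnuplot versions without these features, the "write a script to write the file" route from the question could look like the following Python sketch. The function name plot_command is hypothetical, and the header line is assumed to be whitespace-separated with the x values in the first column.

```python
def plot_command(path, header_line):
    """Build one gnuplot plot command with a 'using 1:i' clause per data column."""
    ncols = len(header_line.split())
    if ncols < 2:
        raise ValueError('need an x column and at least one data column')
    clauses = ["'%s' using 1:%d with lines" % (path, i)
               for i in range(2, ncols + 1)]
    # Join the clauses with a comma and a gnuplot line continuation.
    return 'plot ' + ', \\\n     '.join(clauses)


if __name__ == '__main__':
    print(plot_command('data/chart-year-subreddit-count', 'year s1 s2 s3'))
```

The printed command can be saved to a script file and loaded by gnuplot.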

Auto plotting all columns

I have a file with several columns of data (the number of columns N might be quite large). I want to plot all the columns as a function of the first one (that is, plot 'Data.txt' using 1:2, 'Data.txt' using 1:3, ..., 'Data.txt' using 1:N). The thing is, I want this command to work when I don't know the number of columns. Is that possible?
You can count the number of columns in your file using awk and then do a looped plot. There might be a function to get the number of columns in your data file already implemented in gnuplot but I do not know it. You can try this:
N=`awk 'NR==1 {print NF}' Data.txt`
plot for [i=2:N] "Data.txt" u 1:i
If your first row contains a comment (starting with #), change NR== to the appropriate value. If different rows have different numbers of columns, you might need a more elaborate awk command.
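One way to handle both cases (comment lines and rows of varying width) is sketched below in Python rather than awk. The function name count_columns is hypothetical, and taking the maximum width over all rows is an assumption about what "number of columns" should mean for ragged data.

```python
def count_columns(lines, comment='#'):
    """Return the largest number of whitespace-separated fields on any data line.

    Blank lines and lines starting with the comment character are ignored.
    Returns 0 if there are no data lines.
    """
    widths = [len(line.split()) for line in lines
              if line.strip() and not line.lstrip().startswith(comment)]
    return max(widths) if widths else 0


if __name__ == '__main__':
    with open('Data.txt') as f:
        print(count_columns(f))
```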
@Paul shows a correct answer, but an even simpler variant is possible. You can use an open-ended iteration that stops when it runs out of columns:
plot for [n=2:*] "data.dat" using 1:n title sprintf("Column %d",n)
Although this question is very old, I still think it is worth revisiting, as you now (version 5.2) have access to the number of columns in a file without relying on external tools.
DATA = 'path/to/datafile.txt'
stats DATA
will (among other stuff) store the number of columns in the variable STATS_columns, so now you can do something like:
N=STATS_columns
plot for [i=2:N] DATA using 1:i title DATA.' '.i with lines
which will plot all the columns (assuming the first column is used for the x-axis) with legend entries matching the filename plus the column number.
PS: Not sure when this feature was introduced, but it's there now. :)
You will need two script files:
==== main.plt ====
set <whatever>
N=1
load "loop.plt"
==== loop.plt ====
replot "data.dat" u 0:(column(N))
N=N+1
if(N<4) reread
The reread command causes gnuplot to continue reading from the first line of loop.plt. This plots the first three columns of data.dat. The replot command adds each plot to the current image.
See also: how to convert integer to string in gnuplot?

Plotting GNUPlot graph after computation

I have a data file that lists hits and misses for a certain cache system. Following is the data file format
time hits misses
1 12 2
2 34 8
3 67 13
...
To plot a 2D graph in GNUPlot for time vs hits, the command would be:
plot "data.dat" using 1:2 with lines
Now I want to plot time vs. hit ratio. Can I do some computation on the columns for this, like:
plot "data.dat" using 1:2/(2 + 3) with lines
Here 1, 2, 3 represent the column number.
Any reference for this kind of graph plotting would also be appreciated.
Thanks in advance.
What you have is almost correct. You need to use $ symbols to indicate the column in the calculation:
plot "data.dat" using 1:($2/($2 + $3))
Because $n refers to the value in column n, a bare n now refers to the number itself. For example,
plot "data.dat" using 1:(2 * $2)
will double the value in the second column.
In general, you can even plot C functions like log and cos of a given column. For example:
plot "data.dat" u 1:(exp($2))
Note the parens on the outside of the argument that uses the value of a particular column.
See here for more info.
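As a sanity check on the expression $2/($2 + $3), the same arithmetic can be reproduced in Python for the three sample rows from the question. The function name hit_ratios is hypothetical, and skipping rows with no hits and no misses is an assumption of this sketch.

```python
def hit_ratios(rows):
    """rows: iterable of (time, hits, misses).

    Returns (time, hits / (hits + misses)) for every row with at least one access.
    """
    return [(t, h / (h + m)) for t, h, m in rows if h + m > 0]


if __name__ == '__main__':
    sample = [(1, 12, 2), (2, 34, 8), (3, 67, 13)]
    for t, ratio in hit_ratios(sample):
        # e.g. for t = 3: 67 / (67 + 13) = 67 / 80
        print(t, ratio)
```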

How to treat the first line of the data file as column labels in gnuplot?

I have a table like this:
A B C D E F G H I
10 23998 16755 27656 17659 19708 20328 19377 18925
20 37298 33368 53936 41421 44548 40756 40985 37294
I use this command to plot
plot "C:/file.txt" using 1:2 with lines smooth bezier, "C:/file.txt" using 1:3 with lines smooth bezier, ...
However, all the labels come out as the file name. Is it possible for gnuplot to read the first row and label the lines accordingly?
set key autotitle columnhead
plot for [n=2:12] 'vv.csv' u 1:(column(n)) w lines title columnhead(n)
n starts at 2 because the first column holds the x values.
I checked the documentation and I don't see a way to do it automatically, but you can manually set a title with
plot "file.txt" using 1:2 title "A" with lines smooth bezier ...
I once wrote a script to plot FM radio station frequencies along an axis from 87 MHz to 108 MHz, using the name of each radio station as a vertical label. It is not a pure gnuplot solution (the input file is processed with perl and make), but I suggest you have a look at it and see whether you can use something similar.
You could also use a gnuplot toolkit such as this one for Python if you have a lot of data to plot and want to automate the extraction of the titles.
