gnuplot: Heatmap using character combinations

I am currently analysing two-character combinations in texts and want to visualize their frequencies in a heatmap using gnuplot. My input file has the following format (COUNT stands for the actual number of occurrences of the combination):
a a COUNT
a b COUNT
...
z y COUNT
z z COUNT
Now I'd like to create a heatmap (like the first one that is shown on this site). On the x axis as well as on the y axis I'd like to display the characters from A-Z, i.e.
a
b
...
z
a b ... z
I am pretty new to gnuplot, so I tried plot "input.dat" using 2:1:3 with images, which results in an error message "Can't plot with an empty x range". My naive approach to run set xrange['a':'z'] did not help much.
There are a bunch of related questions on SO, but they either deal with numeric x-values (e.g. Heatmap with Gnuplot on a non-uniform grid) or with different input data formats (e.g. gnuplot: label x and y-axis of matrix (heatmap) with row and column names).
So my question is: What is the easiest way to transform my input file into a nice gnuplot heatmap?

You need to convert the alphabet characters to integers. It might be possible to do this somehow in gnuplot, but it would probably be messy.
My solution would be to use a quick python script to convert the datafile (let's say it is called data.dat):
#!/usr/bin/env python3
# Convert letter pairs (a..z) into zero-based integer indices for gnuplot.
with open('data.dat', 'r') as i, open('data2.dat', 'w') as o:
    for line in i:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        x = ord(fields[0].lower()) - ord('a')
        y = ord(fields[1].lower()) - ord('a')
        o.write("%s %s %s\n" % (x, y, fields[2]))
This takes a file like this:
a a 1
a b 2
a c 3
b a 4
b b 5
b c 6
c a 7
c b 8
c c 9
and converts it to:
0 0 1
0 1 2
0 2 3
1 0 4
1 1 5
1 2 6
2 0 7
2 1 8
2 2 9
Then you can plot it in gnuplot:
#!/usr/bin/env gnuplot
set terminal pngcairo
set output 'test.png'
set xtics ("a" 0, "b" 1, "c" 2)
set ytics ("a" 0, "b" 1, "c" 2)
set xlabel 'First Character'
set ylabel 'Second Character'
set title 'Character Combination Counts'
plot 'data2.dat' with image
It's a little clunky to set the tics manually that way, but it works fine.
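The manual tic lists can also be generated rather than typed. A minimal Python sketch, assuming lowercase letters and the same zero-based indices the conversion script produces (the helper name tics_spec is made up for illustration):

```python
import string


def tics_spec(letters=string.ascii_lowercase):
    """Build the parenthesised list used by gnuplot's `set xtics` / `set ytics`."""
    pairs = ['"%s" %d' % (ch, idx) for idx, ch in enumerate(letters)]
    return "(" + ", ".join(pairs) + ")"


if __name__ == "__main__":
    spec = tics_spec()
    # Paste these two lines into the gnuplot script.
    print("set xtics " + spec)
    print("set ytics " + spec)
```

The two printed lines can replace the hand-written set xtics and set ytics lines in the gnuplot script.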

Edit: Revised code, better sticking to the original question.
Your question basically boils down to: is there an ord() function in gnuplot?
Answer: No, there is not, but you can build it yourself, without the need for calling external scripts. The "ASCII-Trick" is taken from here: how can I find out the ASCII code of a character in gnuplot
The following example works with gnuplot>=4.6.0 (version at the time of OP's question).
Code:
### plotting heatmap from "alphabetical data"
reset
# definition of chr() and ord()
chr(n) = sprintf('%c',n)
ASCII = ''; do for [i=1:255] {ASCII = ASCII.chr(i)}
ord(c) = strstrt(ASCII,c)
FILE = "SO20428010.dat"
# create some random test data
set print FILE
do for [i=1:26] for [j=1:26] {
    print sprintf("%s %s %d", chr(i+96), chr(j+96), int(rand(0)*101))
}
set print
set size square
set xrange[0:27]
set yrange[27:0] reverse
set key noautotitle
set palette rgb 33,13,10
ChrToInt(col) = ord(strcol(col))-96
plot FILE u (ChrToInt(1)):(ChrToInt(2)):3:xtic(1):ytic(2) w image
### end of code
Result:
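Why the lookup works: the ASCII string starts at character code 1, so the 1-based position that strstrt() reports is the character code itself. The same idea in Python, purely as an illustration (ord_lookup and chr_to_int are names invented here; Python already has a built-in ord()):

```python
# Mirror of the gnuplot lookup string: characters with codes 1..255, in order.
ASCII = "".join(chr(i) for i in range(1, 256))


def ord_lookup(c):
    """1-based position of c in ASCII (like gnuplot's strstrt); 0 if not found."""
    return ASCII.find(c) + 1


def chr_to_int(c):
    """Map 'a'..'z' to 1..26, as ChrToInt does in the gnuplot script."""
    return ord_lookup(c) - 96


if __name__ == "__main__":
    for ch in "amz":
        print(ch, ord_lookup(ch), chr_to_int(ch))
```

Subtracting 96 puts 'a' at 1 and 'z' at 26, which is why the gnuplot script uses the ranges [0:27] and [27:0].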

Related

Splitting the range of an axis, to reduce white space

I would like to remove specific xtic values (hours axis) in my graph that are not being used with other data. To be precise, I want to keep the following Xranges [0:5, 12:14], but not the xrange [6:11]. This is to help space out my data, since the unused space is currently smashing them together. I will attach a picture to visualize. Thank you for any help
I tried 'set xrange [0:5, 12:14]' but it did not work.
It is important to know the exact structure of the data because the plotting code needs to adapt to it.
For the following example the data structure is as follows:
7 blocks separated by double blank lines with x y z data.
Each block has a constant x value (from the values 0 1 2 3 5 12 14) whereas y and z vary.
If you have double blank lines you can address the blocks via index (check help index).
The x values are not equidistant, but you basically want to make them appear equidistant.
For this, you can use the pseudocolumn -2 (check help pseudocolumns) which contains the block number starting from 0.
The xtic label is used from column 1 (check help xticlabels).
Code:
### plot "non-equidistant" data equidistant
reset session
# create some test data
set print $Data
do for [x in "0 1 2 3 5 12 14"] {
    do for [y=-40:40] {
        print sprintf("%s %g %g", x, y, (16-x)*cos(0.05*y)**2)
    }
    print "\n\n"
}
set print
set xyplane at 0
set grid x,y
set view 60,140
set key at screen 0.3, screen 0.95 noautotitle
set xrange [-0.5:6.5]
splot for [i=0:6] $Data u -2:2:3:xtic(1) index i w l lc i
### end of code
Result:
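To see what pseudocolumn -2 contributes, here is a rough Python sketch of the same bookkeeping: every run of two or more blank lines starts a new block, and the block number (not the x value) becomes the plotting position. The function name with_block_index is invented for this illustration.

```python
def with_block_index(lines):
    """Return (block_index, x_label, y, z) tuples.

    A run of two or more blank lines starts a new block, which is how
    gnuplot separates data sets addressed via `index`.
    """
    rows, block, blanks, seen_data = [], 0, 0, False
    for line in lines:
        if not line.strip():
            blanks += 1
            continue
        if seen_data and blanks >= 2:
            block += 1
        blanks = 0
        seen_data = True
        x, y, z = line.split()
        rows.append((block, x, float(y), float(z)))
    return rows


if __name__ == "__main__":
    sample = ["0 -1 15.9", "0 0 16", "", "", "12 0 4", "", "", "14 0 2"]
    for row in with_block_index(sample):
        print(row)
```

The x values 0, 12 and 14 end up at positions 0, 1 and 2, while the original x value is kept as a string for the tic label.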

Gnuplot: how to prevent xticlabels from overriding "set xtics"

I have a data file with three columns (named "0.dat"; it has nearly 10000 rows of data, and the following is just an example):
i ii iii
1 a 1
2 b 2
3 c 6
4 d 8
5 e 10
6 f 12
7 g 14
8 h 16
9 i 18
10 j 20
The first and third columns are the x and y coordinates for the plot.
The second column (ii) is used for the xtic labels.
I want the xtics and their labels to appear on the x-axis at an interval of 3, that is, there should be an xtic mark and label only at positions 1, 4, 7 on the x-axis, labelled "a", "d", "g".
But the following script shows that each point in my data file creates an xtic, that is to say, "xticlabels" overrides "set xtics".
set term png size 800,600
set output "0.png"
set grid
set xrang [1:]
set xtics 3
plot "0.dat" using 1:3:xticlabels(2) axes x1y1 w l
set output
pause 0
How can I prevent xticlabels from overriding "set xtics"?
xticlabels always overwrite the auto-generated labels.
However you can include the original label as part of the xticlabel.
Here is one option that prints the column 1 content as a number and the column 2 content as a string:
1) Define the format for the label.
2) Skip the first line of the file, which does not contain data.
3) Use the label format for every 3rd line, with blank labels otherwise.
set bmargin 3 # leave room for 2 lines of x labels
label(i1,i2) = sprintf("%d\n%s",column(i1),stringcolumn(i2))
plot '0.dat' skip 1 using 1:3:xticlabel(int($0)%3==0 ? label(1,2) : "") with lines
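The selection logic in that plot command can be checked outside gnuplot. A small Python sketch, assuming the ten sample rows from the question with the header line already skipped (tick_labels is a made-up helper name): the zero-based row counter plays the role of $0.

```python
# Sample rows from the question: (column 1, column 2); header line skipped.
ROWS = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e"),
        (6, "f"), (7, "g"), (8, "h"), (9, "i"), (10, "j")]


def tick_labels(rows, step=3):
    """Two-line label for every `step`-th row (counting from 0), else ""."""
    return ["%d\n%s" % (x, name) if i % step == 0 else ""
            for i, (x, name) in enumerate(rows)]


if __name__ == "__main__":
    for (x, _), label in zip(ROWS, tick_labels(ROWS)):
        print(x, repr(label))
```

Rows 0, 3, 6 and 9 get a label, i.e. x = 1, 4, 7 and 10; every other row gets an empty string.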
Alternative approach
Use two plots, one for the actual data with no tic marks, one for only 1/3 of the data with tic marks and labels.
set bmargin 3 # leave room for 2 lines of x labels
label(i1,i2) = sprintf("%d\n%s",column(i1),stringcolumn(i2))
set yrange [0:*] # So that a line at y = -10 will not show
plot '0.dat' skip 1 using 1:3, \
'0.dat' skip 1 every 3 using 1:(-10):xticlabel(label(1,2)) with lines

Plotting horizontal lines in gnuplot on an existing graph, using the same coloured lines

The starting point is that I have a graph with 4 lines on it. They are the results of my simulation, plotted over an x-axis of iteration, at 4 different locations. I also have experimental values at each of those locations. I want to plot those 4 experimental values as horizontal lines on the same graph. I would also like the line colours of the simulation and experiment results at each location to be the same.
With @Tom's help, below, I have got the following script to do this:
unset bars
max = 1e6
set xrange[7000:24000]
set yrange[-0.5:1.5]
plot for [i=2:5] 'sim' using 1:(column(i)) ls i, \
for [i=1:4] 'expt' using (1):1:(max) every ::(i-1)::(i-1) with xerror ls i ps 0
The problem is that I want the values in xrange[x_min:x_max] and yrange[y_min:y_max] to be taken from sim and expt as follows:
x_min = min(sim[:1]) # where min(sim[:1]) means "min value in file 'sim' col 1"
x_max = max(sim[:1])
y_min = min(sim[:2],sim[:3],sim[:4],sim[:5],expt[:1])
y_max = max(sim[:2],sim[:3],sim[:4],sim[:5],expt[:1])
My OS is Scientific Linux: Release 6.3, Kernel Linux 2.6.32-358.2.1.el6.x86_64, GNOME 2.28.2
sim and expt are .txt files
A representative sample of sim is:
7520 0.282511 0.0756715 -0.222863 -0.0898819
7521 0.315944 0.201687 -0.321723 -0.106345
7522 0.230956 0.102217 -0.34196 -0.061009
7523 1.460043 -0.00118292 -0.045077 0.673926
A representative sample of expt is:
1.112
0.123
-0.45
0.862
Thank you for your help.
I think that this is a way to solve your problem:
unset bars
max = 1e6
set xrange[0:8]
plot for [i=1:4] 2*i+sin(x) ls i, \
for [i=1:4] 'expt' using (1):1:(max) every ::(i-1)::(i-1) with xerror ls i ps 0
Based on some information I found on Gnuplot tricks, I have (ab)used error bars to produce horizontal lines based on the points in this data file:
2
4
6
8
The (1):1:(max) specifies that a point should be plotted at the coordinate (1, y), where y is read from the data file. The max is the value of xdelta, which determines the size of the x error bar. This is one way of achieving a horizontal line in your plot, as a suitably large value of max will result in an error bar across the entire xrange of your plot.
Here's what the output looks like:
Suppose you have a data file with five columns: one with the x-values and four with y-values. You also have an additional file that the number path_to_expt comes from. To plot the columns together with a horizontal line at the y-value path_to_expt, start with the columns:
plot for [i=2:5] path_to_file using 1:(column(i))
This plots column 2 against column 1, 3 vs 1, 4 vs 1 and 5 vs 1. To get different styles, just use set linetype to redefine the automatically assigned line types:
set linetype 1 lc rgb 'orange'
# ... other lt definitions
plot for [i=2:5] path_to_file using 1:(column(i))
If you don't want to overwrite the existing linetypes 1..4, use e.g. 11..14:
set linetype 11 lc rgb 'orange'
# ...
plot for [i=2:5] path_to_file using 1:(column(i)) lt (9 + i)
Finally, in order to plot a horizontal line, using the same x-values as in the data file, use
mynumber = 27
plot path_to_file using 1:(mynumber)
If you don't put a number in parentheses, it is interpreted as a column number (like the 1 here), whereas inside parentheses it is treated as a constant value.
Another option would be to set arrows:
set arrow from graph 0, first mynumber to graph 1, first mynumber lt 1
plot for [i=2:5] path_to_file using 1:(column(i))
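For the ranges asked about in the question (x_min, x_max, y_min, y_max taken from sim and expt), one option is to compute them before calling gnuplot. A minimal Python sketch, assuming whitespace-separated numeric files laid out like the samples in the question; the function names are invented for illustration, and the sample text below is just the four rows quoted above.

```python
SIM = """7520 0.282511 0.0756715 -0.222863 -0.0898819
7521 0.315944 0.201687 -0.321723 -0.106345
7522 0.230956 0.102217 -0.34196 -0.061009
7523 1.460043 -0.00118292 -0.045077 0.673926"""
EXPT = "1.112\n0.123\n-0.45\n0.862"


def parse_rows(text):
    """Parse whitespace-separated numeric rows, skipping blank lines."""
    return [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]


def plot_ranges(sim_rows, expt_rows):
    """x range from sim column 1; y range from sim columns 2-5 plus expt."""
    xs = [row[0] for row in sim_rows]
    ys = [v for row in sim_rows for v in row[1:5]]
    ys += [row[0] for row in expt_rows]
    return (min(xs), max(xs)), (min(ys), max(ys))


def range_commands(sim_rows, expt_rows):
    """Format the ranges as gnuplot `set xrange` / `set yrange` lines."""
    (x_min, x_max), (y_min, y_max) = plot_ranges(sim_rows, expt_rows)
    return ["set xrange [%s:%s]" % (x_min, x_max),
            "set yrange [%s:%s]" % (y_min, y_max)]


if __name__ == "__main__":
    for command in range_commands(parse_rows(SIM), parse_rows(EXPT)):
        print(command)
```

The printed lines can be written to a small file and loaded from the gnuplot script before the plot command.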

Gnuplot setting x axis value as lines from text

The title might be a little bit confusing , but I can't think of anything better.
I have a file that contains values, for example:
1 2 15
1 2 15
1 2 15
...and so on, and so on N times, where N is the number of lines in the file.
The problem is that when the values across the file are all the same (nothing changes), as in the example above, I get a warning:
Warning: empty x range [0:0], adjusting to [-1:1]
and the plot consists of only dots in the middle of the picture. What I'd like to see in such a case is a series of lines, in this case on y = 1, 2, and 15.
So, how can I set gnuplot to use line num as x value?
The row number can be accessed as column 0:
set style data line
unset key
plot 'file.txt' using 0:1, '' using 0:2, '' using 0:3
This gives you three lines, at y=1, y=2 and y=15.
You can also iterate over the columns:
plot for [i=1:3] 'file.txt' using 0:i

How to label (x,y) data points in Gnuplot 4.2 with integer numbers

I have a text file with 2 columns of numbers corresponding to (x,y) coords.
4 1
4 5
1 1
1 5
2.5 3
How do I tell gnuplot to plot these points and label each point with its corresponding row #? (Please keep in mind I'm going to apply this to a much larger file with 100 points, so I'm looking for a way to do it automagically, rather than have to create a 3rd column of data corresponding to row numbers).
You can use the with labels style of the plot command. By default this places the label instead of the point at the position where the point would be. with labels accepts the offset option (and any option you can pass to set label), so you can place the label next to the point. Here is an example script:
#!/usr/bin/env gnuplot
reset
set terminal pngcairo
set output 'test.png'
set xr [0:5]
set yr [0:6]
plot 'data.dat' pt 7, \
'data.dat' using 1:2:($0+1) with labels offset 1 notitle
which produces this output:
