GNU set heatmap axis limits around a dynamically computed point - gnuplot

I'm plotting a heatmap in gnuplot from a text file that is in matrix format:
z11 z12 z13
z21 z22 z23
z31 z32 z33
and so forth, using the following command (not including axis labelling, etc, for brevity):
plot '~/some_text_file.txt' matrix notitle with image
The matrix is quite large, in excess of 50 000 elements in the majority of cases, and it's mostly due to the size of my y-dimension (#rows). I would like to know if there's a way to change the limits in the y-dimension for a set number of values around a maximum, while keeping the x and z dimensions the same. E.g. if a maximum in the matrix is at [4000, 33], I want my y range to be centred at 4000 +- let's say 20% of length of the y-dimension.
Thanks.

Edit:
The solution below is basically the correct idea; however, it works in my example but not in general because of a bug in how gnuplot uses the stats command with matrix files. See the comments after the answer for further info.
You can do this using stats to get the indices that correspond to the maximum value dynamically.
Consider the following file which I named data:
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 5 3 4
0 1 2 3 4
If I run stats I get:
gnuplot> stats "data" matrix
* FILE:
Records: 25
Out of range: 0
Invalid: 0
Blank: 0
Data Blocks: 1
* MATRIX: [5 X 5]
Mean: 2.1200
Std Dev: 1.5315
Sum: 53.0000
Sum Sq.: 171.0000
Minimum: 0.0000 [ 0 0 ]
Maximum: 5.0000 [ 3 2 ]
COG: 2.9434 2.0566
The maximum value is in position [ 3 2 ] meaning row 3+1 and column 2+1 (in gnuplot the first row/column would be number 0). After running stats some variables are created automatically (help stats for more info), with STATS_index_max_x and STATS_index_max_y among them, which store the position of the maximum:
gnuplot> print STATS_index_max_x
3.0
gnuplot> print STATS_index_max_y
2.0
You can use these to set the ranges automatically. Be careful, though: STATS_index_max_x actually gives you the y (row) position rather than the x position. The total number of rows, needed to compute the range, can be obtained with a system call (there might be a better built-in function that I do not know of):
gnuplot> range = system("awk 'END{print NR}' data")
gnuplot> print range
5
So basically you'll do:
stats "data" matrix
range = system("awk 'END{print NR}' data")
range_center = STATS_index_max_x
d = 0.2 * range
set yrange [range_center - d : range_center + d]
which will center the yrange at the position of your maximum value and will stretch it by +-20% of its total range.
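Since the edit to the question reports that stats is not reliable for matrix files in general, one possible workaround is to compute the same limits outside gnuplot. The following is only a minimal Python sketch under that assumption: the function name yrange_around_max and its fraction parameter are made up for illustration, and the input is assumed to be a whitespace-separated matrix like the data file above.

```python
def yrange_around_max(text, fraction=0.2):
    """Return (ymin, ymax) centred on the row that holds the matrix maximum."""
    rows = [[float(v) for v in line.split()]
            for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    # 0-based row index of the largest element (gnuplot also counts rows from 0).
    max_row = max(range(len(rows)), key=lambda r: max(rows[r]))
    # Half-width of the window: a fraction of the total number of rows.
    d = fraction * len(rows)
    return max_row - d, max_row + d


# Same 5 x 5 example matrix as in the answer above.
data = """\
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 5 3 4
0 1 2 3 4
"""
ymin, ymax = yrange_around_max(data)
# Emit a line that could be pasted into a gnuplot script.
print("set yrange [%g:%g]" % (ymin, ymax))
```

For the example matrix the maximum sits in row 3 and 20% of 5 rows is 1, so the window is the same one the gnuplot commands above produce.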
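The result of plot "data" matrix w image is now
[figure: heatmap with the y range restricted around the maximum]
instead of
[figure: heatmap with the default y range]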


does the ordering of the points in vtkCellArray imply adjacency?

I have a closed contour in the form of a polyline. I am accessing the points
through vtkPolyData.GetLines() and iterating through the cells in
vtkCellArray.
I want to calculate the angle bisector at each vertex of the line. Therefore
I need to know the coordinate of V_{i-1}, V_i and V_{i+1}.
In the vtkCellArray, [n0, p_1, p_2,... , p_n0, ... ] , if p_2 comes after
p_1 in the cell , does it mean that p_1 and p_2 are connected together?
Yes, it does. To test your case with vtkPolyLine, let's create a vtkPolyData with a single vtkPolyLine whose last point is the same as its first point. We will see that the resulting cell array keeps the same sequence (i.e. the last and first point ids are the same).
import vtk as v
# Four distinct points along the x axis
pts = v.vtkPoints()
pts.InsertNextPoint(0,0,0)
pts.InsertNextPoint(1,0,0)
pts.InsertNextPoint(2,0,0)
pts.InsertNextPoint(3,0,0)
# One polyline with five ids; the last id repeats the first to close it
polyLine = v.vtkPolyLine()
polyLine.GetPointIds().SetNumberOfIds(5)
polyLine.GetPointIds().SetId(0,0)
polyLine.GetPointIds().SetId(1,1)
polyLine.GetPointIds().SetId(2,2)
polyLine.GetPointIds().SetId(3,3)
polyLine.GetPointIds().SetId(4,0)
# Store the polyline as the only cell of the line cell array
lines = v.vtkCellArray()
lines.InsertNextCell(polyLine)
pd = v.vtkPolyData()
pd.SetPoints(pts)
pd.SetLines(lines)
# Write the poly data to a legacy .vtk file so the cell array can be inspected
wr = v.vtkPolyDataWriter()
wr.SetFileName('Lines.vtk')
wr.SetInputData(pd)
wr.Write()
The file Lines.vtk contains the following:
# vtk DataFile Version 4.2
vtk output
ASCII
DATASET POLYDATA
POINTS 4 float
0 0 0 1 0 0 2 0 0
3 0 0
LINES 1 6
5 0 1 2 3 0 # This line has 5 points and last and first point are the same (0)
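Since consecutive ids in the cell are connected, the neighbours V_{i-1} and V_{i+1} needed for the angle bisector can be read straight from the id sequence. Here is a minimal sketch in plain Python; it assumes the ids of one cell have already been copied into a list, and the helper name neighbour_triples is hypothetical, not part of VTK.

```python
def neighbour_triples(ids):
    """Yield (previous, current, next) point ids for each vertex of one polyline cell."""
    closed = len(ids) > 1 and ids[0] == ids[-1]
    if closed:
        ring = ids[:-1]  # drop the repeated closing id
        n = len(ring)
        for i in range(n):
            # ring[i - 1] wraps to the last vertex when i == 0
            yield ring[i - 1], ring[i], ring[(i + 1) % n]
    else:
        # Open polyline: the end points have only one neighbour, so skip them.
        for i in range(1, len(ids) - 1):
            yield ids[i - 1], ids[i], ids[i + 1]


cell_ids = [0, 1, 2, 3, 0]  # connectivity of the cell written to Lines.vtk above
triples = list(neighbour_triples(cell_ids))
print(triples)
```

Each triple can then be used to look up the three coordinates in the point array and compute the bisector at the middle vertex.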

Gnuplot with Linear Regression

I am trying to apply linear regression with (Fit(x)). Instead of having two columns in a data file, e.g. x and y values, this file has, for example, 5 columns. I want to take the average value of each column and feed it to the F(X) function.
Data:
A B C D E
2 2 5 10 20
4 5 6 11 1
6 8 7 12 4
8 9 12 13 8
10 11 10 14 17
Could I?
Thanks for help
Assume that your data is as specified and you wish to fit a function f(x)=m*x+b to the data where the 0-based column index (0,1,2,3 or 4) should be the x value and the column average should be the y value. We need to construct a new data file that contains the averages.
In gnuplot 5, we can use something called inline data. This is a special variable that behaves like a file. We will find the average of each column of the data and construct an inline data variable containing these. We do this by looping over the column indices and applying the stats function. The print command can be instructed to print to an inline data variable.
set print $l append
do for [i=1:5] {
stats datafile u i nooutput
print STATS_mean
}
set print # restores ordinary print behavior
With your data, we can see what is contained in $l by printing it with print $l:
6.0
7.0
8.0
12.0
10.0
We now can apply the fit command with this data
f(x) = m*x + b
fit f(x) $l u 0:1 via m,b
This will fit the data so that f(x) = average of column x (or as close to it as can be obtained with the fit).
In gnuplot 4.6, inline data is not available, but we can use a temporary file. Replacing all occurrences of $l with "tempfile" will work the same (except for the print $l command), but will add the data to a temporary file named tempfile.
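To sanity-check the numbers, the same computation can be sketched outside gnuplot. This is a rough Python sketch, not what gnuplot does internally: it uses the closed-form ordinary least-squares solution instead of gnuplot's iterative fit, and the helper names column_means and linear_fit are made up for illustration.

```python
def column_means(rows):
    """Average of each column of a list of equally long rows."""
    ncols = len(rows[0])
    return [sum(r[c] for r in rows) / len(rows) for c in range(ncols)]


def linear_fit(xs, ys):
    """Closed-form least-squares fit of y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x


# Data from the question, without the header line "A B C D E".
rows = [
    [2, 2, 5, 10, 20],
    [4, 5, 6, 11, 1],
    [6, 8, 7, 12, 4],
    [8, 9, 12, 13, 8],
    [10, 11, 10, 14, 17],
]
means = column_means(rows)
xs = list(range(len(means)))  # 0-based column index, like "u 0:1" above
m, b = linear_fit(xs, means)
print(means, m, b)
```

The column means match the values printed from $l above, and the fit parameters give a reference to compare against what gnuplot's fit reports.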

Gnuplot summing y values for same x values

I have a dataset which looks like this:
0 1 0.1
0 0 0.1
0 1 0.1
1 0 0.2
0 1 0.2
1 0 0.2
...
I now want to do the following operations on each different value in the third column of the table:
Example for 0.1:
First column values summed: 0+0+0=0
Second column values summed: 1+0+1=2
Now I want to subtract these two, 2-0=2, and in a last step divide the result by the number of occurrences:
2/3 =0.667
The same for 0.2 and my plot should then plot at x=0.1, y=0.667.
I hope the example makes my problem understandable.
You can use the smooth unique option to do exactly this: sum up all y-values belonging to the same x-value and then divide the result by the number of occurrences. For the second column, upon which the operation is performed, you use the difference between the second and first column:
plot 'file.txt' using 3:($2 - $1) smooth unique
However, it seems you'll run into a strange bug then: this only works correctly if you insert an empty or commented row at the beginning of your data file.
The result with the following file.txt
#
0 1 0.1
0 0 0.1
0 1 0.1
1 0 0.2
0 1 0.2
1 0 0.2
is
[figure: plot of the smooth unique result]
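To see which values the plot should show, the grouping and averaging described above can be reproduced by hand. This is only an illustrative Python sketch with a hypothetical helper name (mean_difference_by_key), not how gnuplot implements smooth unique.

```python
from collections import defaultdict


def mean_difference_by_key(rows):
    """Average (column 2 - column 1) over all rows sharing the same column 3 value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c1, c2, key in rows:
        sums[key] += c2 - c1
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


# The six data rows from file.txt above (the leading "#" line carries no data).
rows = [
    (0, 1, 0.1),
    (0, 0, 0.1),
    (0, 1, 0.1),
    (1, 0, 0.2),
    (0, 1, 0.2),
    (1, 0, 0.2),
]
result = mean_difference_by_key(rows)
for x, y in result.items():
    print(x, round(y, 3))
```

For x=0.1 this gives the 0.667 worked out in the question.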

How to get the value of a specific column in a specific line in any time of processing in gnuplot?

I got a data file in the format like this:
# begin
16 1
15 2
14 3
13 4
12 5
11 6
Now I want to use gnuplot to draw a line through the points:
(1, (16/16)) (2, (16/15)) (3, (16/14)) ... (6, (16/11))
As you see, the x axis is the range [1:6] and the y axis corresponds to the values obtained by dividing the number in the first column of the first line (i.e. 16 in this example) by the number in the first column of each line.
The problem is that I don't know how to get the value of the number at the first column in the first line (16), so that I could do something like
plot "datafile" using 2:(16/$1) with linespoints
I have done a lot of searching on how to achieve that, but with no luck. It seems that gnuplot doesn't provide flexible ways to allow arbitrary data selection. Any ideas how to do that? Or maybe I just got stuck on a not so common problem?
Thanks for your help in advance.
You can use the stats command to extract a single numerical value from your data file. The row is selected with the every option, the column with the using:
col = 1
row = 0
stats 'datafile' every ::row::row using col nooutput
value = STATS_min
plot "datafile" using 2:(value/$1) w lp
Note that column numbering starts at 1 and row numbering at 0 (comment lines are skipped and aren't counted).
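The same selection logic, written out in Python, may make the numbering convention easier to follow. It is just a sketch: the function name first_value is hypothetical, and the data is inlined as a string instead of being read from datafile.

```python
def first_value(text, row=0, col=1):
    """Return one number from the data; rows are 0-based, columns 1-based, comments skipped."""
    data = [line.split() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    return float(data[row][col - 1])


datafile = """\
# begin
16 1
15 2
14 3
13 4
12 5
11 6
"""
value = first_value(datafile, row=0, col=1)
# Same mapping as "using 2:(value/$1)": x from column 2, y = value / column 1.
points = [(float(b), value / float(a))
          for a, b in (line.split() for line in datafile.splitlines()
                       if line.strip() and not line.startswith("#"))]
print(value, points)
```

The comment line "# begin" is not counted, so row 0 is the line "16 1", matching the behaviour described for every.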

heatmap with category data

I'm trying to draw a heatmap via gnuplot. The problem is: how to accumulate data with gnuplot.
Starting with one dataset:
0 0 0
0 1 1
1 0 2
1 1 3
that can easily be plotted via
set view map
splot 'test.data' using 2:1:3 with image
The problem is: there is not only one dataset, but many. See this example data:
0 0 0
0 1 1
1 0 2
1 1 3
0 0 3
0 1 2
1 0 1
1 1 20
It has repeating x/y-values. Is it possible to use gnuplot to sum up the third column (the "data column"), as displayed here:
0 0 0     0 0 3     0 0 3
0 1 1     0 1 2     0 1 3
1 0 2  +  1 0 1  =  1 0 3
1 1 3     1 1 20    1 1 23
My first idea was to use every like in plot 'test.data' using 2:1:3 every 4 with image. But this doesn't work. Does anyone have an idea how to do this?
For those interested: I want to plot a heatmap of my fitbit data:
https://gist.github.com/senfi/c0d13a2c91fae13bc5f5
This file contains nine weeks of counted steps I made. The first column is the day of the week (Sunday to Saturday). The second column represents 5-minute steps through the day, starting at 0:00am. Plotting a single week looks nice, but plotting the sum/average of the last two years may look pretty awesome. Of course, I will post a picture if we figure out how to plot this. Feel free to use the steps data.
This looks like a job for awk to me. awk can be called from within gnuplot like this:
sp '<awk ''{a[$1,$2]+=$3}END{for(i in a){split(i,s,SUBSEP);print s[2],s[1],a[i]}}'' test.data' w image
The awk script accumulates the value of the third column into the array a. The key for each value is the string [$1 SUBSEP $2] (equivalent to [$1,$2]). $N is the value of column N. SUBSEP is a built in variable whose value we don't need to worry about, we just refer to it again later.
When the whole file has been read (the END block), split is used to recover the first two columns by breaking up the array keys. The two parts of the key are printed, followed by the accumulated value. I rearranged the column order in awk as well (print s[2],s[1],a[i]) so that back in gnuplot, using 2:1:3 is no longer needed.
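For readers less familiar with awk, the accumulation step can be sketched in Python as well. This is an illustrative equivalent under the assumption of three whitespace-separated columns; the helper name accumulate is made up, and a tuple key plays the role of awk's [$1 SUBSEP $2] string key.

```python
from collections import defaultdict


def accumulate(rows):
    """Sum the third column for every distinct (column 1, column 2) pair."""
    totals = defaultdict(float)
    for c1, c2, value in rows:
        totals[(c1, c2)] += value
    return dict(totals)


# The two stacked datasets from the question.
test_data = """\
0 0 0
0 1 1
1 0 2
1 1 3
0 0 3
0 1 2
1 0 1
1 1 20
"""
rows = [tuple(int(v) for v in line.split()) for line in test_data.splitlines()]
totals = accumulate(rows)
# Print with the first two columns swapped, like "print s[2],s[1],a[i]" in the awk script.
for (c1, c2), total in sorted(totals.items()):
    print(c2, c1, total)
```

Unlike the awk loop, whose iteration order over the array is unspecified, this sketch sorts the keys before printing.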
