heatmap with category data - gnuplot

I'm trying to draw a heatmap via gnuplot. The problem is: how to accumulate data with gnuplot.
Starting with one dataset:
0 0 0
0 1 1
1 0 2
1 1 3
which can easily be plotted via
set view map
splot 'test.data' using 2:1:3 with image
The problem is: there is not only one dataset, but many. See this example data:
0 0 0
0 1 1
1 0 2
1 1 3
0 0 3
0 1 2
1 0 1
1 1 20
It has repeating x/y values. Is it possible to use gnuplot to sum up the third column (the "data column"), as shown here?
0 0 0     0 0 3      0 0 3
0 1 1     0 1 2      0 1 3
1 0 2  +  1 0 1   =  1 0 3
1 1 3     1 1 20     1 1 23
My first idea was to use every like in plot 'test.data' using 2:1:3 every 4 with image. But this doesn't work. Does anyone have an idea how to do this?
For those interested: I want to plot a heatmap of my Fitbit data:
https://gist.github.com/senfi/c0d13a2c91fae13bc5f5
This file contains nine weeks of my counted steps. The first column is the day of the week (Sunday to Saturday). The second column represents 5-minute intervals through the day, starting at 0:00. Plotting a single week looks nice, but plotting the sum or average of the last two years may look pretty awesome. Of course, I will post a picture if we figure out how to plot this. Feel free to use the steps data.

This looks like a job for awk to me. awk can be called from within gnuplot like this:
sp '<awk ''{a[$1,$2]+=$3}END{for(i in a){split(i,s,SUBSEP);print s[2],s[1],a[i]}}'' test.data' w image
The awk script accumulates the value of the third column into the array a. The key for each value is the string [$1 SUBSEP $2] (equivalent to [$1,$2]). $N is the value of column N. SUBSEP is a built-in awk variable (the subscript separator) whose value we don't need to worry about; we just refer to it again later.
When the whole file has been read (the END block), split is used to recover the first two columns by breaking up the array keys. The two parts of the key are printed, followed by the accumulated value. I rearranged the column order in awk as well (print s[2],s[1],a[i]) so that back in gnuplot, using 2:1:3 is no longer needed.
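For readers who prefer to see the accumulation step outside awk, here is a minimal Python sketch of the same idea. The function name accumulate and the inline sample are illustrative assumptions, not part of the original answer, and unlike the awk loop (whose iteration order is unspecified) this version sorts the keys before printing.

```python
from collections import defaultdict


def accumulate(lines):
    """Sum the third column for every distinct (column 1, column 2) pair."""
    totals = defaultdict(float)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        totals[(fields[0], fields[1])] += float(fields[2])
    # Emit column 2 first, mirroring "print s[2],s[1],a[i]" in the awk script.
    return [(c2, c1, total) for (c1, c2), total in sorted(totals.items())]


# The two stacked datasets from the question.
sample = """0 0 0
0 1 1
1 0 2
1 1 3
0 0 3
0 1 2
1 0 1
1 1 20""".splitlines()

for c2, c1, total in accumulate(sample):
    print(c2, c1, total)
```

Sorting the keys is a design choice made here so that the output is deterministic; it is not something the awk one-liner does.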

Related

EXCEL Count number of weeks in month based on date

I am trying to look up a value in a matrix based on a given date. The matrix has the first day of the week along the vertical axis, and the first day of the month along the horizontal axis.
For a given day, e.g. 31/08/15 I would like to match the exact date to the vertical axis of the matrix (i.e. 31/08/15), and the month to the horizontal axis (1/08/15).
So in the example below, an input of 31/08/15 should provide an output of 3.
            01/06/2015  01/07/2015  01/08/2015  01/09/2015
03/08/2015  1           0           0           0
10/08/2015  0           2           0           0
17/08/2015  0           0           3           0
24/08/2015  0           0           0           4
31/08/2015  0           0           3           0
I am trying and failing with index and match formulae.
I have tried the following:
=index(area where to look, match(31/08/15,first column,0),match(and(month(31/08/15),year(31/08/15)),(and(month(first row),year(first row)),0)
Hope this is clear, thanks!
You can use an INDEX function with two MATCH functions to supply both the row and column.
The formula in D8 is,
=INDEX($B$2:$E$6,MATCH(C8,$A$2:$A$6,0),MATCH(DATE(YEAR(C8),MONTH(C8),1),$B$1:$E$1,0))
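I'm a little concerned about the dates matching exactly down column A, but a little maths with the WEEKDAY function takes care of that: subtracting WEEKDAY(date, 2) and adding 1 moves any date back to the Monday of its week, which is how the dates in column A are laid out.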
=INDEX($B$2:$E$6,MATCH(C9-WEEKDAY(C9, 2)+1,$A$2:$A$6,0),MATCH(DATE(YEAR(C9),MONTH(C9),1),$B$1:$E$1,0))
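The date arithmetic in the two formulas above can be checked outside Excel. The following Python sketch is only an illustration: the names week_start, month_start, lookup and the TABLE dictionary are assumptions introduced here, with the values copied from the matrix in the question.

```python
from datetime import date, timedelta


def week_start(d):
    """Monday of d's week; same result as d - WEEKDAY(d, 2) + 1 in Excel."""
    return d - timedelta(days=d.weekday())


def month_start(d):
    """First day of d's month; same result as DATE(YEAR(d), MONTH(d), 1)."""
    return d.replace(day=1)


# Matrix from the question: rows are week-start dates, columns are month-start dates.
COLUMNS = [date(2015, 6, 1), date(2015, 7, 1), date(2015, 8, 1), date(2015, 9, 1)]
TABLE = {
    date(2015, 8, 3): [1, 0, 0, 0],
    date(2015, 8, 10): [0, 2, 0, 0],
    date(2015, 8, 17): [0, 0, 3, 0],
    date(2015, 8, 24): [0, 0, 0, 4],
    date(2015, 8, 31): [0, 0, 3, 0],
}


def lookup(d):
    """Row chosen by the week start, column chosen by the month start."""
    row = TABLE[week_start(d)]
    return row[COLUMNS.index(month_start(d))]


print(lookup(date(2015, 8, 31)))
```

As in the second formula, a date that falls mid-week is first snapped to its Monday, so it still finds a row.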
Here you go:
=INDEX($B$2:$E$6,MATCH(DATE(2015,8,31),$A$2:$A$6,),MATCH(DATE(2015,8,1),$B$1:$E$1,))

add labels to non-zero elements in stacked column chart Excel 2013

I'm generating a stacked column chart out of a table like this:
Year 1 2 3
A 50 0 0
B 50 0 0
C 0 100 0
D 0 50 0
E 0 0 10
F 0 0 15
I want column stacks for every year, with series labels for every non-zero value. Currently if I try and add labels then I get 6 labels for every series (with the zero values clustered around the axis). How do I generate series labels for all non-zeroes? I am happy to rearrange my data to a different format to the table above. I am using Excel 2013.
You can replace the zeros in your data with an error value. That way they won't appear on your chart. You can use:
=NA()

Gnuplot summing y values for same x values

I have a dataset which looks like this:
0 1 0.1
0 0 0.1
0 1 0.1
1 0 0.2
0 1 0.2
1 0 0.2
...
I now want to do the following operations on each different value in the third column of the table:
Example for 0.1:
First column values summed: 0+0+0=0
Second column values summed: 1+0+1=2
Now I want to subtract the first sum from the second, 2-0=2, and in a last step divide the result by the number of occurrences.
2/3 =0.667
The same for 0.2 and my plot should then plot at x=0.1, y=0.667.
I hope the example makes my problem understandable.
You can use the smooth unique option to do exactly this: it sums up all y-values belonging to the same x-value and then divides the result by the number of occurrences. As the y-value, on which the operation is performed, you use the difference between the second and first columns:
plot 'file.txt' using 3:($2 - $1) smooth unique
However, it seems you'll run into a strange bug with this. It only works correctly if you insert an empty or commented row at the beginning of your data file:
The result with the following file.txt
#
0 1 0.1
0 0 0.1
0 1 0.1
1 0 0.2
0 1 0.2
1 0 0.2
is: [plot image]
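To make the averaging concrete, here is a small Python sketch of what the plot command above computes for the sample data. It is not gnuplot's implementation; the function name smooth_unique is borrowed from the option name purely for illustration.

```python
from collections import defaultdict


def smooth_unique(rows):
    """Average (column 2 - column 1) over all rows sharing the same column 3 value."""
    groups = defaultdict(list)
    for c1, c2, c3 in rows:
        groups[c3].append(c2 - c1)
    # One (x, mean y) point per distinct x, in increasing x order.
    return [(x, sum(ys) / len(ys)) for x, ys in sorted(groups.items())]


# Sample data from the question.
rows = [
    (0, 1, 0.1),
    (0, 0, 0.1),
    (0, 1, 0.1),
    (1, 0, 0.2),
    (0, 1, 0.2),
    (1, 0, 0.2),
]

for x, y in smooth_unique(rows):
    print(x, round(y, 3))
```

For x = 0.1 this reproduces the 2/3 worked out in the question.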

GNU set heatmap axis limits around a dynamically computed point

I'm plotting a heatmap in gnuplot from a text file that is in matrix format:
z11 z12 z13
z21 z22 z23
z31 z32 z33
and so forth, using the following command (not including axis labelling, etc, for brevity):
plot '~/some_text_file.txt' matrix notitle with image
The matrix is quite large, in excess of 50 000 elements in the majority of cases, and it's mostly due to the size of my y-dimension (#rows). I would like to know if there's a way to change the limits in the y-dimension for a set number of values around a maximum, while keeping the x and z dimensions the same. E.g. if a maximum in the matrix is at [4000, 33], I want my y range to be centred at 4000 +- let's say 20% of length of the y-dimension.
Thanks.
Edit:
The solution below is basically the correct idea. However, it works in my example but not in general, because of a bug in how gnuplot uses the stats command with matrix files. See the comments after the answer for further info.
You can do this using stats to get the indices that correspond to the maximum value dynamically.
Consider the following file which I named data:
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 5 3 4
0 1 2 3 4
If I run stats I get:
gnuplot> stats "data" matrix
* FILE:
Records: 25
Out of range: 0
Invalid: 0
Blank: 0
Data Blocks: 1
* MATRIX: [5 X 5]
Mean: 2.1200
Std Dev: 1.5315
Sum: 53.0000
Sum Sq.: 171.0000
Minimum: 0.0000 [ 0 0 ]
Maximum: 5.0000 [ 3 2 ]
COG: 2.9434 2.0566
The maximum value is in position [ 3 2 ] meaning row 3+1 and column 2+1 (in gnuplot the first row/column would be number 0). After running stats some variables are created automatically (help stats for more info), with STATS_index_max_x and STATS_index_max_y among them, which store the position of the maximum:
gnuplot> print STATS_index_max_x
3.0
gnuplot> print STATS_index_max_y
2.0
You can use these to set the ranges automatically. Be careful, though: STATS_index_max_x actually gives you the y (instead of x) position. The total number of rows, needed to compute the range, can be obtained with a system call (there might be a better built-in function that I do not know of):
gnuplot> range = system("awk 'END{print NR}' data")
gnuplot> print range
5
So basically you'll do:
stats "data" matrix
range = system("awk 'END{print NR}' data")
range_center = STATS_index_max_x
d = 0.2 * range
set yrange [range_center - d : range_center + d]
which will center the yrange at the position of your maximum value and will stretch it by +-20% of its total range.
The result of plot "data" matrix w image now shows only the y-range around the maximum instead of the full matrix.
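The arithmetic behind the yrange can be sketched in Python as follows. This is an illustration only, not a replacement for the gnuplot commands; the function name yrange_around_max and the fraction parameter are assumptions introduced here.

```python
def yrange_around_max(matrix, fraction=0.2):
    """Return (low, high) row limits centred on the row that holds the maximum."""
    n_rows = len(matrix)
    # Zero-based index of the row containing the largest element.
    max_row = max(range(n_rows), key=lambda r: max(matrix[r]))
    d = fraction * n_rows
    return max_row - d, max_row + d


# The 5 x 5 example file from the answer.
data = [
    [0, 1, 2, 3, 4],
    [0, 1, 2, 3, 4],
    [0, 1, 2, 3, 4],
    [0, 1, 5, 3, 4],
    [0, 1, 2, 3, 4],
]

low, high = yrange_around_max(data)
print(low, high)
```

Here the maximum sits in row 3 (counting from 0) and 20% of 5 rows is 1, so the limits come out as 2 and 4.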

How to compute the maximum series of a specific condition returning true

I have a slight issue counting the longest run of rows where the value in the third column is bigger than the value in the second. This is just a statistic with scores.
The issue is that I want to have it in one single formula without a macro.
B C
------
2 0
1 2
2 1
2 3
0 1
1 2
0 1
3 3
0 2
0 2
I have tried it with:
{=MAX(FREQUENCY(B3:B100;B3:B100>=C3:C100))} to get 1 for B
{=MAX(FREQUENCY(C3:C100;C3:C100>=B3:B100))} to get 7 for C
I expected it to deliver the longest series where the value in one column was bigger than in the other one, but I failed hard...
Try this version to get 7
=MAX(FREQUENCY(IF(C3:C100>=B3:B100,IF(B3:B100<>"",ROW(B3:B100))),IF(C3:C100<B3:B100,ROW(B3:B100))))
Confirm it with CTRL+SHIFT+ENTER, since it is an array formula.
Swap the B and C ranges to get your other result.
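The quantity the formula computes, the longest unbroken run of rows meeting a condition, can be cross-checked with a short Python sketch. The function name longest_run is an assumption for illustration; the two lists are the B and C columns from the question.

```python
def longest_run(flags):
    """Length of the longest consecutive run of True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0  # reset the streak on a False
        best = max(best, current)
    return best


# Columns B and C from the question.
b = [2, 1, 2, 2, 0, 1, 0, 3, 0, 0]
c = [0, 2, 1, 3, 1, 2, 1, 3, 2, 2]

print(longest_run(ci >= bi for bi, ci in zip(b, c)))
print(longest_run(bi >= ci for bi, ci in zip(b, c)))
```

The first call corresponds to the formula as written (C >= B), the second to the version with the ranges swapped.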
