Gnuplot summing y values for same x values - gnuplot

I have a dataset which looks like this:
0 1 0.1
0 0 0.1
0 1 0.1
1 0 0.2
0 1 0.2
1 0 0.2
...
For each distinct value in the third column of the table, I now want to do the following.
Example for 0.1:
First column values summed: 0+0+0=0
Second column values summed: 1+0+1=2
Now I want to subtract the first sum from the second, 2-0=2, and as a last step divide the result by the number of occurrences:
2/3=0.667
The same applies for 0.2, and the plot should then show a point at x=0.1, y=0.667.
I hope the example makes my problem understandable.

You can use the smooth unique option to do exactly this: it sums up all y-values belonging to the same x-value and then divides the result by the number of occurrences. For the y-value, upon which the operation is performed, use the difference between the second and first columns:
plot 'file.txt' using 3:($2 - $1) smooth unique
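To check the numbers outside gnuplot, the averaging described above can be sketched in Python. This only illustrates the arithmetic (group by the third column, then average the difference of the second and first columns); it is not how gnuplot implements smooth unique, and the name average_by_x is made up for this example.

```python
from collections import defaultdict


def average_by_x(rows):
    """Group rows (c1, c2, x) by x and average c2 - c1 within each group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c1, c2, x in rows:
        sums[x] += c2 - c1
        counts[x] += 1
    # Return the groups in ascending x order, like a plotted curve.
    return {x: sums[x] / counts[x] for x in sorted(sums)}


# Sample data from the question: first column, second column, x value.
rows = [
    (0, 1, 0.1), (0, 0, 0.1), (0, 1, 0.1),
    (1, 0, 0.2), (0, 1, 0.2), (1, 0, 0.2),
]

for x, y in average_by_x(rows).items():
    print(x, round(y, 3))
```

For the sample data this gives 0.667 at x=0.1 and -0.333 at x=0.2.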
However, it seems you'll run into a strange bug with this plot command: it only works correctly if you insert an empty or commented row at the beginning of your data file.
The result with the following file.txt
#
0 1 0.1
0 0 0.1
0 1 0.1
1 0 0.2
0 1 0.2
1 0 0.2
is (plot image not preserved)

Related

Excel: counting occurrences with a formula using LET, TEXTJOIN and SUBSTITUTE

data is in A2:K2:
=LET(txt, TEXTJOIN("", FALSE, 0, --A2:K2, 0), modTxt, SUBSTITUTE(txt, "0.5", 1), halfDaysAdjust, 0.5*(LEN(txt)-LEN(SUBSTITUTE(txt, "00.50", "0000"))), LEN(modTxt)-LEN(SUBSTITUTE(modTxt, "10", "0"))-halfDaysAdjust)
This formula gives me the first three rules. I need to add one more rule without breaking the others.
Examples of the rules:
1 0 1 0 1 = 3 occurrences
1 1 1 0 1 = 2 occurrences
.5 0 .5 .5 0 = 1.5 occurrences
1 .5 0 .5 .5 = 3 occurrences
Description:
1st rule: 1s that are not consecutive each count as a separate occurrence.
2nd rule: consecutive 1s count as one occurrence.
3rd rule: a .5 always counts as half an occurrence.
4th rule (the one I need to add): if a .5 is next to a 1, that .5 counts as one full occurrence.
=LET(txt,TEXTJOIN("",0,0,--A2:H2,0),
modtxt,SUBSTITUTE(
SUBSTITUTE(txt,"10.5","1010"),
"0.51","101"),
LEN(modtxt)-LEN(SUBSTITUTE(modtxt,"10","1"))
+((LEN(modtxt)-LEN(SUBSTITUTE(modtxt,"0.5","00")))/2))
I first replaced each 1 followed by 0.5 with 1010 (and each 0.5 followed by 1 with 101) so that both count as whole occurrences. Then I counted the occurrences of 10 in the modified text.
Finally, I counted the remaining occurrences of 0.5 in the modified text and divided that count by 2, so that each one adds half an occurrence.
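The string manipulation in this formula can be traced step by step in a short Python sketch. It is an illustrative port, not part of the Excel answer: it assumes every cell holds exactly 0, 0.5 or 1, and the names fmt and count_occurrences are invented here.

```python
def fmt(value):
    """Render a cell as the text the formula sees: "0", "1" or "0.5"."""
    return "0.5" if value == 0.5 else str(int(value))


def count_occurrences(cells):
    # TEXTJOIN("",0,0,--range,0): join the values, padded with a 0 on each side.
    txt = "0" + "".join(fmt(v) for v in cells) + "0"
    # A 1 next to a 0.5 (in either order) becomes two separate whole occurrences.
    modtxt = txt.replace("10.5", "1010").replace("0.51", "101")
    # Every run of consecutive 1s ends in exactly one "10".
    whole = modtxt.count("10")
    # Each remaining 0.5 contributes half an occurrence.
    halves = modtxt.count("0.5")
    return whole + halves / 2


for row in ([1, 0, 1, 0, 1], [1, 1, 1, 0, 1],
            [0.5, 0, 0.5, 0.5, 0], [1, 0.5, 0, 0.5, 0.5]):
    print(row, count_occurrences(row))
```

The four rows are the examples from the question and yield 3, 2, 1.5 and 3 respectively.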

Finding the first value greater than x in dynamic arrays for Excel

As shown in Table 1, I have a list of tenors, and Table 2 contains a list of cashflow times.
I would like to make the sheet fully dynamic and am using "#" referencing. For each cashflow time I want to find:
(1) the first tenor that is greater than the cashflow time (shown as result2)
(2) the last tenor that is smaller than the cashflow time (shown as result1).
Table 1
tenor
0
0.25
0.5
1
2
3
4
5
Table 2
cashflow time   result1   result2
-0.7392         n/a       0
0.1697          0         0.25
0.4216          0.25      0.5
0.6735          0.5       1
0.9253          0.5       1
1.1690          1         2
1.4209          1         2
For result1:
=XLOOKUP(C2:C8,A2:A9,A2:A9,,-1)
For result2:
=XLOOKUP(C2:C8,A2:A9,A2:A9,,1)
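where C2:C8 are the cashflow time values and A2:A9 are the tenor values. The last argument is the match mode: -1 returns an exact match or the next smaller item, and 1 returns an exact match or the next larger item.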
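The two lookups can be mirrored in Python as a sanity check on the expected results. This is a sketch under one assumption that XLOOKUP itself does not make: the tenor list must be sorted in ascending order, because the sketch uses binary search. The names floor_match and ceil_match are invented for the example.

```python
from bisect import bisect_left, bisect_right


def floor_match(sorted_values, x):
    """Largest value <= x, or None if there is none (like match mode -1)."""
    i = bisect_right(sorted_values, x)
    return sorted_values[i - 1] if i > 0 else None


def ceil_match(sorted_values, x):
    """Smallest value >= x, or None if there is none (like match mode 1)."""
    i = bisect_left(sorted_values, x)
    return sorted_values[i] if i < len(sorted_values) else None


tenors = [0, 0.25, 0.5, 1, 2, 3, 4, 5]
cashflow_times = [-0.7392, 0.1697, 0.4216, 0.6735, 0.9253, 1.1690, 1.4209]

for t in cashflow_times:
    print(t, floor_match(tenors, t), ceil_match(tenors, t))
```

A missing match is reported as None here, which corresponds to the n/a in the first row of Table 2.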

Gnuplot: set heatmap axis limits around a dynamically computed point

I'm plotting a heatmap in gnuplot from a text file that is in matrix format:
z11 z12 z13
z21 z22 z23
z31 z32 z33
and so forth, using the following command (not including axis labelling, etc, for brevity):
plot '~/some_text_file.txt' matrix notitle with image
The matrix is quite large, in excess of 50 000 elements in the majority of cases, mostly because of the size of my y-dimension (number of rows). I would like to know if there's a way to restrict the y-range to a set number of values around a maximum, while keeping the x and z dimensions the same. E.g. if a maximum in the matrix is at [4000, 33], I want my y-range to be centred at 4000 ± let's say 20% of the length of the y-dimension.
Thanks.
Edit:
The solution below is basically the correct idea. It works for my example but not in general, because of a bug in how gnuplot handles the stats command with matrix files. See the comments after the answer for further info.
You can do this using stats to get the indices that correspond to the maximum value dynamically.
Consider the following file which I named data:
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 5 3 4
0 1 2 3 4
If I run stats I get:
gnuplot> stats "data" matrix
* FILE:
Records: 25
Out of range: 0
Invalid: 0
Blank: 0
Data Blocks: 1
* MATRIX: [5 X 5]
Mean: 2.1200
Std Dev: 1.5315
Sum: 53.0000
Sum Sq.: 171.0000
Minimum: 0.0000 [ 0 0 ]
Maximum: 5.0000 [ 3 2 ]
COG: 2.9434 2.0566
The maximum value is at position [ 3 2 ], meaning row 3+1 and column 2+1 (gnuplot numbers rows and columns from 0). After running stats, some variables are created automatically (see help stats for more info). Among them are STATS_index_max_x and STATS_index_max_y, which store the position of the maximum:
gnuplot> print STATS_index_max_x
3.0
gnuplot> print STATS_index_max_y
2.0
You can use these to set the ranges automatically. Because STATS_index_max_x actually gives you the y (rather than x) position, you'll need to be careful. The total number of rows, needed to compute the range, can be obtained with a system call (there might be a better built-in function that I do not know of):
gnuplot> range = system("awk 'END{print NR}' data")
gnuplot> print range
5
So basically you'll do:
stats "data" matrix
range = system("awk 'END{print NR}' data")
range_center = STATS_index_max_x
d = 0.2 * range
set yrange [range_center - d : range_center + d]
which will center the yrange at the position of your maximum value and will stretch it by +-20% of its total range.
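The result of plot "data" matrix w image is now restricted to this y-range around the maximum, instead of showing the full matrix (plot images not preserved).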
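The range arithmetic can also be checked independently of gnuplot's stats variables. The Python sketch below finds the row that holds the maximum and derives the same limits. It assumes the matrix is already loaded as a list of rows, and the name yrange_around_max is invented for this example.

```python
def yrange_around_max(matrix, fraction=0.2):
    """Return (low, high) y limits centred on the row holding the maximum.

    The half-width is `fraction` times the number of rows, matching
    d = 0.2 * range in the gnuplot commands.
    """
    n_rows = len(matrix)
    # Zero-based index of the row that contains the largest element.
    max_row = max(range(n_rows), key=lambda r: max(matrix[r]))
    d = fraction * n_rows
    return max_row - d, max_row + d


# The 5 x 5 example file "data" from the answer.
data = [
    [0, 1, 2, 3, 4],
    [0, 1, 2, 3, 4],
    [0, 1, 2, 3, 4],
    [0, 1, 5, 3, 4],
    [0, 1, 2, 3, 4],
]

low, high = yrange_around_max(data)
print(f"set yrange [{low}:{high}]")
```

For the example file the maximum sits in row 3 of 5, so the limits are 2.0 and 4.0.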

heatmap with category data

I'm trying to draw a heatmap via gnuplot. The problem is: how to accumulate data with gnuplot.
Starting with one dataset:
0 0 0
0 1 1
1 0 2
1 1 3
that can easily be plotted via
set view map
splot 'test.data' using 2:1:3 with image
The problem is: there is not only one dataset, but many. See this example data:
0 0 0
0 1 1
1 0 2
1 1 3
0 0 3
0 1 2
1 0 1
1 1 20
It has repeating x/y values. Is it possible to use gnuplot to sum up the third column (the "data column") as shown here:
0 0 0     0 0 3     0 0 3
0 1 1     0 1 2     0 1 3
1 0 2  +  1 0 1  =  1 0 3
1 1 3     1 1 20    1 1 23
My first idea was to use every, as in plot 'test.data' using 2:1:3 every 4 with image, but this doesn't work. Does anyone have an idea how to do this?
For those interested: I want to plot a heatmap of my Fitbit data:
https://gist.github.com/senfi/c0d13a2c91fae13bc5f5
This file contains nine weeks of my counted steps. The first column is the day of the week (Sunday to Saturday). The second column represents 5-minute intervals through the day, starting at 0:00. Plotting a single week looks nice, but plotting the sum or average of the last two years may look pretty awesome. Of course, I will post a picture if we figure out how to plot this. Feel free to use the steps data.
This looks like a job for awk to me. awk can be called from within gnuplot like this:
sp '<awk ''{a[$1,$2]+=$3}END{for(i in a){split(i,s,SUBSEP);print s[2],s[1],a[i]}}'' test.data' w image
The awk script accumulates the value of the third column into the array a. The key for each value is the string [$1 SUBSEP $2] (equivalent to [$1,$2]). $N is the value of column N. SUBSEP is a built-in variable whose value we don't need to worry about; we just refer to it again later.
When the whole file has been read (the END block), split is used to recover the first two columns by breaking up the array keys. The two parts of the key are printed, followed by the accumulated value. I rearranged the column order in awk as well (print s[2],s[1],a[i]) so that back in gnuplot, using 2:1:3 is no longer needed.
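The accumulation step is not specific to awk. As an illustration only, here is the same idea in Python, using a dictionary keyed by the first two columns. The function name accumulate is invented, and unlike awk's for (i in a), the output is sorted so that the order is deterministic.

```python
from collections import defaultdict


def accumulate(rows):
    """Sum the third column over all rows sharing the same first two columns.

    Returns (col2, col1, total) tuples, i.e. with the first two columns
    swapped, like print s[2],s[1],a[i] in the awk script.
    """
    totals = defaultdict(float)
    for c1, c2, value in rows:
        totals[(c1, c2)] += value
    return [(c2, c1, totals[(c1, c2)]) for (c1, c2) in sorted(totals)]


# The two stacked datasets from the question.
rows = [
    (0, 0, 0), (0, 1, 1), (1, 0, 2), (1, 1, 3),
    (0, 0, 3), (0, 1, 2), (1, 0, 1), (1, 1, 20),
]

for x, y, total in accumulate(rows):
    print(x, y, total)
```

The totals are 3, 3, 3 and 23, matching the right-hand column of the example in the question.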

What is the last row of an origin for?

I have the rotation matrix
cos sin 0 0
-sin cos 0 0
0 0 0 0
0 0 0 1
If I were to change the last row to
1 1 1 1
will it rotate with (1,1,1) as the axis?
If not, what will it do, and what does the '1' at row 4, column 4 do?
If you change the last row, it will change the 4th coordinate, which in turn will change the x,y,z coordinates of your vector. The 1 at the bottom right corner simply preserves the 4th coordinate after matrix multiplication.
The 4th coordinate is a "scaling factor" (called the homogeneous coordinate). It is used for perspective projection. In short, (x,y,z,w) is converted to (x/w,y/w,z/w).
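A small numeric sketch may make the effect of the last row concrete. It is an illustration only: for simplicity it uses an identity matrix in the upper three rows rather than the rotation from the question, and the helper names mat_vec and to_cartesian are invented here.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]


def to_cartesian(v):
    """Convert homogeneous (x, y, z, w) to (x/w, y/w, z/w)."""
    x, y, z, w = v
    return (x / w, y / w, z / w)


identity = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
# Same upper three rows, but with the last row changed to 1 1 1 1.
modified = identity[:3] + [[1, 1, 1, 1]]

point = [1, 2, 3, 1]  # the point (1, 2, 3) with w = 1

print(to_cartesian(mat_vec(identity, point)))  # w stays 1
print(to_cartesian(mat_vec(modified, point)))  # w becomes 1+2+3+1 = 7
```

With the last row 0 0 0 1 the point is unchanged. With 1 1 1 1 the new w is x+y+z+w, so the point is scaled down to (1/7, 2/7, 3/7) instead of being rotated about (1,1,1).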
