How to build a 3D histogram in GNUPLOT - gnuplot

I have a data file (stat_data_raw.dat). I need to build such a diagram based on it.
Is it possible to do this in gnuplot? I have been racking my brain trying. The chart in the picture was built in Excel.
Data file: stat_data_raw.dat ( https://dropmefiles.com/oeO1L )
Build in Excel: pivot_and_chart.xlsx ( https://dropmefiles.com/xdoqy )
I've already tried a lot. I am using gnuplot v5.4 patchlevel 3.
I have already visited a bunch of pages on the Internet, including the official website and Stack Overflow. But I could not find a suitable script for me to adapt for myself. Even though I know some algorithmic programming languages, the syntax of gnuplot seems strange and confusing to me. It is a pity that there is no visual editor in which one could build and set up a graph, and then export it to the gnuplot format.
I also visited the page http://gnuplot.sourceforge.net/demo_5.4/boxes3d.html many times, but the example given there is too simple. The data in the candlesticks.dat file is very primitive.
I tried to build a graph based on the stat_data_matrix.dat file
(https://dropmefiles.com/omptl)
The data is in the form of a matrix. I can prepare the input data in any format (in the form of a matrix or, as in the first case, flat data). I don't know how best to work with gnuplot.
The maximum that I got on the matrix data:
set terminal qt size 1920, 1080
set encoding utf8
set datafile separator '\t'
set xyplane 0
set boxwidth .3
set boxdepth .3
set cbrange [0.5:15]
unset key; unset colorbox
set view 44, 200
splot for [col = 2:30] 'c:\LOAD\GNUPLOT\stat_data_matrix.dat' u ($0):(col):(column(col)):(col):xtic(1) with boxes lc pal title columnhead

The examples on the gnuplot homepage are a starting point, but sometimes it requires a lot more (commands, experience and understanding) to get the desired graph with all "tricks" and "treats".
I took your text data (not the matrix data).
Your datafile separator is actually TAB, although it might be easier to keep separator whitespace (and not switch to TAB only, i.e. datafile separator tab).
Hence, the first column is a time format, the second column is just "user", the 3rd column the user number, and the 4th column the z-value. I guess it is redundant to write 32 times "user" on the tic label, so I put it once into the y-label.
Some more comments:
time in gnuplot is handled as seconds from 01.01.1970 00:00:00, that's why the boxwidth is 24*3600 = 1 day, and tic spacing says set xtics 24*3600.
this type of graphical representation is not optimal since (depending on the viewing angle) you might hide some data, e.g. data from user 11 is partly hidden by data from user 12. So, you could also play with the viewing angle to improve or maybe sort the users to avoid this.
the 3D-bars are now centered on the grid lines. If you want to have the grid lines at the edges of the bars (like in your Excel image) you have to play some more "tricks".
Look at the example below, which should be a starting point for further tweaking.
Check help datafile separator, help timecolumn, help view; basically, you should find a help entry for every command.
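To illustrate why the box width and the tic spacing are 24*3600, here is a small Python sketch (Python is used only for illustration; in gnuplot the conversion is done by timecolumn()). It converts two consecutive dates from the data file into seconds since 1970-01-01 and checks that they are exactly one day apart. The helper name to_epoch_seconds is made up for this example, and UTC is assumed.

```python
from datetime import datetime, timezone

def to_epoch_seconds(date_string):
    """Parse a YYYY-MM-DD date as UTC midnight and return seconds since 1970-01-01."""
    parsed = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())

one_day = 24 * 3600  # the value used for boxwidth and xtics in the script
first = to_epoch_seconds("2022-07-29")
second = to_epoch_seconds("2022-07-30")
print(second - first == one_day)
```

So a box width of 24*3600 on a time axis makes each bar exactly one day wide.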
Update:
Now with added labels of the z-value. Check the plotting style with labels (check help labels).
grid lines are now at the border of the 3D-bars (earlier: centered), check help mxtics.
what I haven't found out yet is how to rotate the tic labels in a 3D plot. I asked a question about this and maybe there will be a better answer than mine which is placing and rotating the tic labels "manually".
As mentioned earlier, depending on the data, hiding of data by higher 3D-bars in front probably cannot be avoided completely. In order to minimize this the users are sorted by highest average. For this, you need to implement a few more lines:
you plot the data to tables (check help table) using the options smooth unique and smooth zsort (check help smooth).
then you are (mis-)using stats (check help stats) to put the order of users into the variable USERS.
during plotting you are using the ternary operator (check help ternary) to filter the data and plot it in the sequence given by USERS.
to get the ytic labels right, you have to place the labels via ytic(...), check help xticlabels.
Edit: if you set the tics via yticlabels(), you won't see the mytics anymore and the minor y-grid lines will not show up. Hence, you have to add the ytic labels manually (pretty annoying)!
As you can see, if you want to rearrange data a bit it can get pretty awkward in gnuplot, which is not easy to understand for a beginner. Well, gnuplot wants to be a plotting tool, not a data processing tool.
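If the sorting step feels too awkward inside gnuplot, it could also be done beforehand in another tool. The following is only a sketch in Python, not part of the gnuplot solution: it computes the average count per user and returns the user numbers with the highest average first, which is the same order the script stores in USERS. The function name users_by_average and the tie-breaking by user number are assumptions made for this example.

```python
from collections import defaultdict

def users_by_average(lines):
    """Return user numbers ordered by descending average count.

    Each line is expected to look like '2022-07-29 User 1 53'
    (date, the literal word 'User', user number, count).
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for line in lines:
        fields = line.split()
        # Skip the header line and anything that is not a 4-field data row
        if len(fields) != 4 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        user = int(fields[2])
        totals[user] += int(fields[3])
        counts[user] += 1
    # Highest average first; ties are broken by user number (an assumption)
    return sorted(totals, key=lambda u: (-totals[u] / counts[u], u))

sample = [
    "Date User Count",
    "2022-07-29 User 1 53",
    "2022-07-29 User 2 3",
    "2022-07-30 User 1 2",
    "2022-07-30 User 2 2",
    "2022-08-01 User 12 12",
]
order = users_by_average(sample)
print(" ".join(str(u) for u in order))
```

The printed string could then be used in place of the USERS variable, for example by writing it into the gnuplot script.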
Data: SO73521453.dat (TAB cannot be displayed here, but in the example below we are using whitespace anyway)
Date User Count
2022-07-29 User 1 53
2022-07-29 User 2 3
2022-07-29 User 3 1
2022-07-29 User 4 2
2022-07-29 User 5 1
2022-07-29 User 6 5
2022-07-29 User 7 1
2022-07-29 User 8 1
2022-07-30 User 1 2
2022-07-30 User 2 2
2022-07-30 User 6 1
2022-07-30 User 9 1
2022-07-31 User 1 1
2022-07-31 User 10 1
2022-08-01 User 1 37
2022-08-01 User 2 1
2022-08-01 User 11 1
2022-08-01 User 3 2
2022-08-01 User 4 4
2022-08-01 User 5 1
2022-08-01 User 6 7
2022-08-01 User 9 1
2022-08-01 User 8 1
2022-08-01 User 12 12
2022-08-02 User 1 40
2022-08-02 User 3 2
2022-08-02 User 4 13
2022-08-02 User 5 1
2022-08-02 User 6 6
2022-08-02 User 10 1
2022-08-02 User 12 11
2022-08-03 User 1 25
2022-08-03 User 2 5
2022-08-03 User 13 4
2022-08-03 User 3 4
2022-08-03 User 14 2
2022-08-03 User 4 10
2022-08-03 User 5 1
2022-08-03 User 6 5
2022-08-03 User 15 1
2022-08-03 User 12 2
2022-08-04 User 1 81
2022-08-04 User 3 1
2022-08-04 User 14 1
2022-08-04 User 4 2
2022-08-04 User 5 1
2022-08-04 User 6 1
2022-08-04 User 16 2
2022-08-04 User 17 2
2022-08-04 User 10 1
2022-08-04 User 18 1
2022-08-04 User 12 6
2022-08-05 User 1 40
2022-08-05 User 14 2
2022-08-05 User 4 3
2022-08-05 User 6 3
2022-08-05 User 9 3
2022-08-05 User 10 1
2022-08-05 User 19 1
2022-08-05 User 15 1
2022-08-05 User 18 4
2022-08-05 User 12 17
2022-08-06 User 1 1
2022-08-07 User 1 1
2022-08-07 User 12 2
2022-08-08 User 1 30
2022-08-08 User 13 8
2022-08-08 User 3 3
2022-08-08 User 4 12
2022-08-08 User 5 3
2022-08-08 User 6 3
2022-08-08 User 20 2
2022-08-08 User 12 19
2022-08-08 User 21 1
2022-08-09 User 1 51
2022-08-09 User 11 2
2022-08-09 User 13 6
2022-08-09 User 4 4
2022-08-09 User 6 5
2022-08-09 User 22 1
2022-08-09 User 12 12
2022-08-09 User 21 1
2022-08-09 User 23 1
2022-08-10 User 1 61
2022-08-10 User 2 2
2022-08-10 User 13 2
2022-08-10 User 4 2
2022-08-10 User 6 1
2022-08-10 User 24 1
2022-08-10 User 25 1
2022-08-10 User 15 1
2022-08-10 User 12 10
2022-08-10 User 21 1
2022-08-11 User 1 27
2022-08-11 User 2 4
2022-08-11 User 13 2
2022-08-11 User 14 2
2022-08-11 User 4 2
2022-08-11 User 5 1
2022-08-11 User 6 7
2022-08-11 User 26 1
2022-08-11 User 12 16
2022-08-12 User 1 23
2022-08-12 User 11 1
2022-08-12 User 13 7
2022-08-12 User 3 1
2022-08-12 User 4 1
2022-08-12 User 6 11
2022-08-12 User 20 1
2022-08-12 User 10 1
2022-08-12 User 12 4
2022-08-13 User 11 2
2022-08-14 User 1 2
2022-08-15 User 1 59
2022-08-15 User 2 3
2022-08-15 User 13 5
2022-08-15 User 3 2
2022-08-15 User 14 1
2022-08-15 User 4 3
2022-08-15 User 5 1
2022-08-15 User 6 9
2022-08-15 User 24 1
2022-08-15 User 26 1
2022-08-15 User 27 1
2022-08-15 User 28 1
2022-08-15 User 12 6
2022-08-15 User 23 2
2022-08-16 User 1 53
2022-08-16 User 11 1
2022-08-16 User 13 2
2022-08-16 User 3 1
2022-08-16 User 6 2
2022-08-16 User 24 1
2022-08-16 User 9 1
2022-08-16 User 12 7
2022-08-17 User 1 58
2022-08-17 User 11 2
2022-08-17 User 13 2
2022-08-17 User 3 2
2022-08-17 User 4 3
2022-08-17 User 5 1
2022-08-17 User 6 9
2022-08-17 User 10 1
2022-08-17 User 29 1
2022-08-17 User 12 23
2022-08-17 User 21 1
2022-08-18 User 1 54
2022-08-18 User 2 3
2022-08-18 User 11 1
2022-08-18 User 13 5
2022-08-18 User 3 1
2022-08-18 User 5 1
2022-08-18 User 6 2
2022-08-18 User 28 1
2022-08-18 User 8 1
2022-08-18 User 12 17
2022-08-19 User 1 64
2022-08-19 User 2 1
2022-08-19 User 13 2
2022-08-19 User 3 1
2022-08-19 User 6 5
2022-08-19 User 24 2
2022-08-19 User 9 1
2022-08-19 User 8 1
2022-08-19 User 12 2
2022-08-20 User 1 1
2022-08-20 User 11 2
2022-08-20 User 6 2
2022-08-21 User 1 2
2022-08-21 User 6 3
2022-08-22 User 1 60
2022-08-22 User 2 2
2022-08-22 User 11 1
2022-08-22 User 13 7
2022-08-22 User 3 1
2022-08-22 User 5 1
2022-08-22 User 6 10
2022-08-22 User 28 1
2022-08-22 User 8 1
2022-08-22 User 12 16
2022-08-22 User 23 1
2022-08-23 User 1 31
2022-08-23 User 2 1
2022-08-23 User 13 4
2022-08-23 User 14 1
2022-08-23 User 6 3
2022-08-23 User 16 1
2022-08-23 User 18 1
2022-08-23 User 12 15
2022-08-24 User 1 50
2022-08-24 User 13 2
2022-08-24 User 3 3
2022-08-24 User 14 1
2022-08-24 User 5 2
2022-08-24 User 6 5
2022-08-24 User 9 1
2022-08-24 User 28 1
2022-08-24 User 12 3
2022-08-25 User 1 32
2022-08-25 User 11 1
2022-08-25 User 13 4
2022-08-25 User 30 1
2022-08-25 User 5 2
2022-08-25 User 6 4
2022-08-25 User 16 1
2022-08-25 User 9 1
2022-08-25 User 15 1
2022-08-25 User 12 24
2022-08-26 User 1 11
2022-08-26 User 2 1
2022-08-26 User 13 8
2022-08-26 User 5 1
2022-08-26 User 6 7
2022-08-26 User 31 14
2022-08-26 User 28 1
2022-08-26 User 12 2
2022-08-27 User 2 2
2022-08-27 User 5 1
2022-08-27 User 31 2
2022-08-27 User 9 2
2022-08-27 User 28 1
2022-08-28 User 28 1
Script:
### 3D bars with labels and sorting
reset session
FILE = "SO73521453.dat"
# sort users by highest average
set table $Temp1
plot FILE u 3:4 smooth unique # get average for each user
set table $Temp2
plot $Temp1 u 1:2:(-$2) smooth zsort # sort by highest average
unset table
USERS = ''
stats $Temp2 u (USERS=sprintf("%s %d",USERS,$1)) nooutput # get user order in a string
set datafile separator whitespace
set format x "%b %d" timedate
myBoxSize = 1.0
set boxwidth 24*3600*myBoxSize
set boxdepth myBoxSize
set wall y0 fc "white"
set wall x1 fc "white"
set xyplane at 0
set xtics 24*3600 out scale 0,1 font ",7" offset -1,0.2
set mxtics 2
set ytics 1 out scale 0,1 font ",7" offset 0,0.2
set mytics 2
set ylabel "user" rotate parallel font ",14" offset 0,1
set grid mx,my,z lt 1 lc "grey"
set key noautotitle
set xrange [:] noextend reverse
set yrange [0.5:words(USERS)+0.5] noextend
set view 30,225
set style textbox opaque fc rgb 0x77ffffff
set format y ""
set for [i=1:words(USERS)] label i word(USERS,i) at graph 0, first i, first 0 \
offset 0,-0.5,0 rotate by 0 left font ",9"
splot for [i=1:words(USERS)] FILE u (timecolumn(1,"%Y-%m-%d")):(i):\
($3==word(USERS,i)?$4:NaN):(i) w boxes lc var, \
for [i=1:words(USERS)] '' u (timecolumn(1,"%Y-%m-%d")):(i):\
($3==word(USERS,i)?$4:NaN):4 w labels boxed font ",8"
### end of script
Result: (certainly still to be optimized)

Related

Create a frequency diagram using a dataframe in Pandas (Python3)

I currently have a list of the number of items and their frequency stored in a data frame called transactioncount_freq.
Item Frequency
0 1 3474
1 2 2964
2 3 1532
3 4 937
4 5 360
5 6 168
6 7 57
7 8 25
8 9 5
9 10 5
10 11 3
11 12 1
How would I make a bar chart using the item values as the x axis and the frequency values as the y axis using pandas and matplotlib.pyplot?
You can plot it easily like this
transactioncount_freq.plot(x='Item', y='Frequency', kind='bar')

Excel: HLOOKUP() where blank cells are skipped

I am trying to create an HLOOKUP()-style formula that finds a matching heading and returns the value in the current row; if that cell is blank, it should skip it and look at the next column with the same heading in the same row.
An example of the data table is as follows:
          Heading 1  Heading 2  Heading 1  Heading 4  Heading 5  Heading 1
Sample 1  1          7                     13         19
Sample 2             8                     14         20         2
Sample 3             9                     15         21         3
Sample 4  4          10                    16         22
Sample 5  5          11                    17         23
Sample 6             12         6          18         24
As you can see, the data under headings 2, 4 and 5 are all in single columns, but the heading 1 values are split between three columns.
I need the final data set to look like this:
Heading 1 Heading 2 Heading 4 Heading 5
Sample 1 1 7 13 19
Sample 2 2 8 14 20
Sample 3 3 9 15 21
Sample 4 4 10 16 22
Sample 5 5 11 17 23
Sample 6 6 12 18 24
I have done some research online and found a formula that I thought was meant to work like a VLOOKUP(). I can't quite work out what it's doing, and when I try it on a transposed version of my data set it doesn't quite do what I expect. I have been trying to get it to work and also to convert it to work in the opposite orientation. The formula is as follows:
{=INDEX($B$3:$G$8,SMALL(IF(INDEX($A$3:$G$8,,MATCH(B$11,$B$2:$G$2,0))<>"",IF($A$3:$A$8=$A12,ROW($A$3:$G$8)-ROW($A3)+$I12)),1),MATCH(B$11,$B$2:$G$2,0))}
This formula is from https://www.mrexcel.com/forum/excel-questions/689238-vlookup-match-but-ignore-blank-cells.html
Running the formula on a transposed version of my data set results in the following:
**Transposed data set**
           Sample 1  Sample 2  Sample 3  Sample 4  Sample 5  Sample 6
Heading 1  1                             4         5
Heading 2  7         8         9         10        11        12
Heading 1                                                    6
Heading 4  13        14        15        16        17        18
Heading 5  19        20        21        22        23        24
Heading 1            2         3
**Result**
Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Sample 6
Heading 1 1 0 3 0 5 0 1
Heading 2 7 8 9 10 11 12 2
Heading 4 13 14 15 16 17 18 3
Heading 5 19 20 21 22 23 24 4
**Expected result**
Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Sample 6
Heading 1 1 2 3 4 5 6
Heading 2 7 8 9 10 11 12
Heading 4 13 14 15 16 17 18
Heading 5 19 20 21 22 23 24
I think that I am probably over complicating this and that there must be a simpler solution to the problem. Any help that anyone can give me would be great. Let me
Thanks!
This is maybe far too simple, but why don't you simply add the values of the "Heading 1" columns? The empty values are treated as 0, and in the end you'll have the values you are looking for :-)
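The idea of adding up the columns that share a heading can be sketched as follows. This is a Python illustration of the logic only, not an Excel formula; in Excel itself a SUMIF over the header row would be one way to express it. The positions of the blank cells (None) are taken from the transposed data set in the question, and the function name merge_duplicate_headings is made up for this example. Note that the trick only works if at most one of the duplicate columns is filled per row and the values are numeric.

```python
# Columns in sheet order; "Heading 1" occurs three times.
headings = ["Heading 1", "Heading 2", "Heading 1", "Heading 4", "Heading 5", "Heading 1"]

# None stands for a blank cell (three of the six sample rows are shown).
rows = {
    "Sample 1": [1, 7, None, 13, 19, None],
    "Sample 2": [None, 8, None, 14, 20, 2],
    "Sample 6": [None, 12, 6, 18, 24, None],
}

def merge_duplicate_headings(headings, values):
    """Sum the cells that share a heading, counting blanks (None) as 0."""
    merged = {}
    for heading, value in zip(headings, values):
        merged[heading] = merged.get(heading, 0) + (value or 0)
    return merged

for sample, values in rows.items():
    print(sample, merge_duplicate_headings(headings, values))
```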

Combining and reading data from Excel (.xlsx) into Matlab

There are two parts of my query:
1) I have multiple .xlsx files stored in a folder, a total of 1 year's worth (~ 365 .xlsx files). They are named according to date: ' A_ddmmmyyyy.xlsx' (e.g. A_01Jan2016.xlsx). Each .xlsx has 5 columns of data: Date, Quantity, Latitude, Longitude, Measurement. The problem is, each .xlsx file consists about 400,000 rows of data and although I have scripts in Excel to merge them, the inherent row restriction in Excel prevents me from merging all the data together.
(i) Is there a way to read recursively the data from each .xlsx sheet into MATLAB, and specifying the variable name (i.e. Date, Quantity etc) for each column(variable) within MATLAB (there are no column headings in the .xlsx files)?
(ii) How can I merge the data for each column from each .xlsx together?
Thank you
Jefferson
Let's take this step by step.
First, I do not recommend joining the data from all your files into one column. There is no need to have all this information together; you can work with the files separately, using, for example, a datastore.
Working in MATLAB in my directory:
>> pwd
ans =
/home/anquegi/learn/matlab/stackoverflow
I have a folder containing a subfolder with two sample Excel files:
>> ls
20_hz.jpg big_data_store_analysis.m excel_files octave-workspace sample-file.log
40_hz.jpg chirp_signals.m NewCode.m sample.csv
>> ls excel_files/
A_01Jan2016.xlsx A_02Jan2016.xlsx
the content of each file is :
Date Quantity Latitude Longitude Measurement
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9
10 10 10 10 10
11 11 11 11 11
12 12 12 12 12
13 13 13 13 13
14 14 14 14 14
15 15 15 15 15
16 16 16 16 16
17 17 17 17 17
18 18 18 18 18
19 19 19 19 19
20 20 20 20 20
21 21 21 21 21
22 22 22 22 22
This is only to show how it will work.
Reading the data:
>> ssds = spreadsheetDatastore('./excel_files')
ssds =
SpreadsheetDatastore with properties:
Files: {
'/home/anquegi/learn/matlab/stackoverflow/excel_files/A_01Jan2016.xlsx';
'/home/anquegi/learn/matlab/stackoverflow/excel_files/A_02Jan2016.xlsx'
}
Sheets: ''
Range: ''
Sheet Format Properties:
NumHeaderLines: 0
ReadVariableNames: true
VariableNames: {'Date', 'Quantity', 'Latitude' ... and 2 more}
VariableTypes: {'double', 'double', 'double' ... and 2 more}
Properties that control the table returned by preview, read, readall:
SelectedVariableNames: {'Date', 'Quantity', 'Latitude' ... and 2 more}
SelectedVariableTypes: {'double', 'double', 'double' ... and 2 more}
ReadSize: 'file'
Now you have all your data in tables. Let's see a preview:
>> data = preview(ssds)
data =
Date Quantity Latitude Longitude Measurement
____ ________ ________ _________ ___________
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
The preview is a good way to get sample data to work with.
You do not need to merge; you can work through all the elements:
>> ssds.VariableNames
ans =
'Date' 'Quantity' 'Latitude' 'Longitude' 'Measurement'
>> ssds.VariableTypes
ans =
'double' 'double' 'double' 'double' 'double'
% let's get all the Latitude elements that have Date equal to 1; in this case the two files are the same, so we will get two elements with value 1
>> reset(ssds)
accum = [];
while hasdata(ssds)
T = read(ssds);
accum(end +1) = T(T.Date == 1,:).Latitude;
end
>> accum
accum =
1 1
So you need to work with datastores and tables. It is a bit tricky but very useful; you will probably also want to control ReadSize and other properties of the datastore object. This is a good way to work with large data files in MATLAB.
For older versions of MATLAB you can use a more traditional approach:
folder='./excel_files';
filetype='*.xlsx';
f=fullfile(folder,filetype);
d=dir(f);
for k=1:numel(d);
data{k}=xlsread(fullfile(folder,d(k).name));
end
Now you have the data stored in data
data
data =
[22x5 double] [22x5 double]
data{1}
ans =
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
9 9 9 9 9
10 10 10 10 10
11 11 11 11 11
12 12 12 12 12
13 13 13 13 13
14 14 14 14 14
15 15 15 15 15
16 16 16 16 16
17 17 17 17 17
18 18 18 18 18
19 19 19 19 19
20 20 20 20 20
21 21 21 21 21
22 22 22 22 22
But be careful when you have many large files.

gnuplot: fetching a variable value from different row/column for calculations

I want to get a specific value from another row & column to normalize my data. The tricky part is, that this value changes for every data point in my data set.
Here what my data set looks like:
64 22370 1 585 1 10
128 47547 1 4681 1 10
256 291761 1 37449 1 10
128 48446 1.019 4681 1 10
256 480937 1.648 37449 1 10
128 7765 0.163 777 0.166 10
256 7164 0.025 1393 0.037 10
128 37078 0.780 4681 1 10
256 334372 1.146 37449 1 10
128 45543 0.958 4681 1 10
128 5579 0.117 649 0.139 10
128 40121 0.844 4529 0.968 10
128 49494 1.041 4681 1 10
# --> here it starts to repeat
64 48788 1 585 1 20
128 110860 1 4681 1 20
256 717797 1 37449 1 20
128 101666 0.917 4681 1 20
......
......
This data file contains all points for 13 different sets in total, so I plot it with something like this:
plot\
'../logs.dat' every 13::1 u 6:2 title '' with lines lt 3 lc 'black' lw 1,\
'../logs.dat' every 13::3 u 6:2 title '' with lines lt 3 lc 'black' lw 1,\
Now I am trying to normalize my data. The reference value is the one in column 2 of row 1 (counting rows from 0), and the reference row advances by 13 for every subsequent data point.
For example: The first data set I want to plot would be
(10:47547/47547)
(20:110860/110860)
...
The second plot should be
(10:48446/47547)
(20:101666/110860)
...
And so on.
In pseudocode it would read something like
plot\
'../logs.dat' every 13::1 u 6:($2 / take i:$2 for i = i + 13 ) title '' with lines lt 3 lc 'black' lw 1,\
'../logs.dat' every 13::3 u 6:($2 / take i:$2 for i = i + 13 ) title '' with lines lt 3 lc 'black' lw 1,\
I hope I could make clear what I am trying to achieve.
Thank you for any help!
If the value you want to use for normalisation is the very first to be plotted, then something like this is possible:
plot y0=-1e10, "data" using 1:(y0 == -1e10 ? (y0 = $2, 1) : $2/y0)
The normalisation value y0 is initialised to -1e10 on every replot. Check the help for ternary operator and serial evaluation.
But really you'd better pre-process your data.
If I understood your question correctly you want to normalize some of your data in a special way.
For the first plot you want to start from the second line (row-index 1) and divide the value in the column by itself and continue for every 13th row.
So, this is dividing the values of the second column for the following row indices: 1/1, 14/14, 27/27, ..., (n*13+1)/(n*13+1). This is trivial because it will always be 1.
For the second plot you want to start with the value in column 2 from row index 3 and divide it by the value in column 2 of row index 1 and repeat this for every 13th row,
i.e. the row indices involved are: 3/1, 16/14, 29/27, ..., (n*13+3)/(n*13+1)
For the second case, a construct with every 13 will not work because you need every 13th value and every 13th shifted by 2 rows.
So, what you can do:
If you pass by row-index 1 (and every 13th row later), remember the value in column 2 and when you pass by row-index 3, divide this value by the remembered value and plot it, otherwise plot NaN. Repeat this for all rows cycled by 13. You can use the pseudocolumn 0 (check help pseudocolumns) and the modulo operator (check help operators binary).
If you want a continuous line with lines or linespoints you need to set datafile missing NaN because NaN values would interrupt the lines (check help missing). However, this works only for gnuplot>=5.0.6. For gnuplot 5.0.0 (version at OP's question) you have to use some workaround.
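The "remember and divide" logic described above can be sketched outside gnuplot as follows. This is only a Python illustration of the idea (the function name normalize_cycle and its parameters are made up for this example); the gnuplot script below does the same thing with the pseudocolumn 0 and the modulo operator.

```python
def normalize_cycle(values, cycle=13, reference_row=1, target_row=3):
    """Divide each target-row value by the reference-row value of the same cycle.

    values: the column-2 values in file order (row index 0, 1, 2, ...).
    Returns one ratio per cycle in which both rows are present.
    """
    ratios = []
    reference = None
    for index, value in enumerate(values):
        position = index % cycle
        if position == reference_row:
            reference = value  # remember it, like y0=$2 in gnuplot
        elif position == target_row and reference is not None:
            ratios.append(value / reference)
    return ratios

# Column 2 of the first 13-row block of the test data in the script below,
# followed by the first four rows of the second block.
column2 = [900, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550,
           1900, 2000, 2050, 2100]
print(normalize_cycle(column2))
```

For this input the two ratios are 1100/1000 and 2100/2000, i.e. the values at row indices 3 and 16 divided by those at row indices 1 and 14.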
Script:
### special normalization of data
reset session
$Data <<EOD
1 900 3 4 5 10
2 1000 3 4 5 10
3 1050 3 4 5 10
4 1100 3 4 5 10
5 1150 3 4 5 10
6 1200 3 4 5 10
7 1250 3 4 5 10
8 1300 3 4 5 10
9 1350 3 4 5 10
10 1400 3 4 5 10
11 1450 3 4 5 10
12 1500 3 4 5 10
13 1550 3 4 5 10
#
1 1900 3 4 5 20
2 2000 3 4 5 20
3 2050 3 4 5 20
4 2100 3 4 5 20
5 2150 3 4 5 20
6 2200 3 4 5 20
7 2250 3 4 5 20
8 2300 3 4 5 20
9 2350 3 4 5 20
10 2400 3 4 5 20
11 2450 3 4 5 20
12 2500 3 4 5 20
13 2550 3 4 5 20
#
1 2900 3 4 5 30
2 3000 3 4 5 30
3 3050 3 4 5 30
4 3100 3 4 5 30
5 3150 3 4 5 30
6 3200 3 4 5 30
7 3250 3 4 5 30
8 3300 3 4 5 30
9 3350 3 4 5 30
10 3400 3 4 5 30
11 3450 3 4 5 30
12 3500 3 4 5 30
13 3550 3 4 5 30
EOD
M = 13 # cycle of your data
set datafile missing NaN # only for gnuplot>=5.0.6
plot $Data u 6:(1) every M w lp pt 7 lc "red" ti "Normalized 1/1", \
'' u 6:(int($0)%M==1?y0=$2:0,int($0)%M==3?$2/y0:NaN) w lp pt 7 lc "blue" ti "Normalized 3/1"
### end of code
Result:

Horizontal Leader Board based on organisation

I am trying to work out the ranking of top 3 users at different organisations and have the data presented horizontally for each user so it can be inputted into our email system to personalise emails.
I am able to create a ranking vertically but I am not sure how to get the formula to rank based on organisation and return value across.
Here is what I need to have in the end:
Name Organisation Usage First Second Third
User 1 Organisation 1 8 User 3 User 5 User 2
User 2 Organisation 1 10 User 3 User 5 User 2
User 3 Organisation 1 222 User 3 User 5 User 2
User 4 Organisation 1 1 User 3 User 5 User 2
User 5 Organisation 1 14 User 3 User 5 User 2
User 1 Organisation 2 215 User 4 User 1 User 5
User 2 Organisation 2 18 User 4 User 1 User 5
User 3 Organisation 2 12 User 4 User 1 User 5
User 4 Organisation 2 310 User 4 User 1 User 5
User 5 Organisation 2 161 User 4 User 1 User 5
I can return a ranking vertically one organisation at a time using
=INDEX($A$2:$A$6,MATCH(1,INDEX(($C$2:$C$6=LARGE($C$2:$C$6,ROWS(H$1:H1)))*(COUNTIF(H$1:H1,$A$2:$A$6)=0),),0))
If someone could help me run this formula based on each organisation and horizontally that would be fantastic!
Thanks,
Sarah.
Non-Empty Usage Solution:
Assuming your data starts in A1 Like so:
A B C D E F
---------------------------------------------------------
1 | Name Organisation Usage First Second Third
2 | User 1 Organisation 1 8 User 3 User 5 User 2
3 | User 2 Organisation 1 10 User 3 User 5 User 2
4 | User 3 Organisation 1 222 User 3 User 5 User 2
5 | User 4 Organisation 1 1 User 3 User 5 User 2
6 | User 5 Organisation 1 14 User 3 User 5 User 2
7 | User 1 Organisation 2 215 User 4 User 1 User 5
8 | User 2 Organisation 2 18 User 4 User 1 User 5
9 | User 3 Organisation 2 12 User 4 User 1 User 5
10| User 4 Organisation 2 310 User 4 User 1 User 5
11| User 5 Organisation 2 161 User 4 User 1 User 5
You can change your formula starting in D2 to:
=INDEX($A$2:$A$11,MATCH(1,INDEX(($C$2:$C$11=LARGE(($B$2:$B$11=$B2)*$C$2:$C$11,COLUMNS($C2:C2)))*(COUNTIF($C2:C2,$A$2:$A$11)=0),),0))
What I changed:
Added ($B$2:$B$11=$B2) inside the LARGE, which multiplies all the Usage values of other organizations by 0, so that they won't be picked up by the LARGE function.
Changed ROWS(H$1:H1) to COLUMNS($C2:C2) so you can rank horizontally.
I also changed the cell references to cover the entire dataset (rows 2 to 11).
Solution with possible empty Usage:
If the Usage is empty for all users in the same organization and you want the First, Second, and Third columns to be blank in that case as well, like so:
A B C D E F
---------------------------------------------------------
1 | Name Organisation Usage First Second Third
2 | User 1 Organisation 1 8 User 3 User 5 User 2
3 | User 2 Organisation 1 10 User 3 User 5 User 2
4 | User 3 Organisation 1 222 User 3 User 5 User 2
5 | User 4 Organisation 1 1 User 3 User 5 User 2
6 | User 5 Organisation 1 14 User 3 User 5 User 2
7 | User 1 Organisation 2
8 | User 2 Organisation 2
9 | User 3 Organisation 2
10| User 4 Organisation 2
11| User 5 Organisation 2
We can accomplish this by checking if the entire Usage for the Organization is 0. Then we can blank out all the ranks for that Organization.
To check if the sum of the usages for the organization is 0 we can use SUMPRODUCT: So for cell D2 that would look like:
=SUMPRODUCT(($C$2:$C$11)*($B$2:$B$11=$B2))=0
Then we can just wrap an IF around everything and blank it if the above statement returns true. So our final formula would look like:
=IF(SUMPRODUCT(($C$2:$C$11)*($B$2:$B$11=$B2))=0,"",INDEX($A$2:$A$11,MATCH(1,INDEX(($C$2:$C$11=LARGE(($B$2:$B$11=$B2)*$C$2:$C$11,COLUMNS($C2:C2)))*(COUNTIF($C2:C2,$A$2:$A$11)=0),),0)))
^^ Throw whatever you want in there
Now if you want the text to say anything else, just put that text inside the quotes for the TRUE condition of the IF statement.
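To cross-check what the formulas should return, the ranking logic can be sketched in a few lines of Python. This is only an illustration of the logic, not a replacement for the Excel formula; the function name top_three_by_organisation is made up for this example, and ties are resolved by input order, which is an assumption.

```python
from collections import defaultdict

def top_three_by_organisation(records):
    """Map each organisation to its top three user names by usage.

    records: iterable of (name, organisation, usage) tuples; usage may be None.
    Organisations whose total usage is 0 get an empty list (the blank case).
    """
    by_org = defaultdict(list)
    for name, organisation, usage in records:
        by_org[organisation].append((usage or 0, name))
    result = {}
    for organisation, entries in by_org.items():
        if sum(usage for usage, _ in entries) == 0:
            result[organisation] = []  # corresponds to the "" branch of the IF
            continue
        # sorted() is stable, so users with equal usage keep their input order
        ranked = sorted(entries, key=lambda entry: -entry[0])
        result[organisation] = [name for _, name in ranked[:3]]
    return result

records = [
    ("User 1", "Organisation 1", 8),
    ("User 2", "Organisation 1", 10),
    ("User 3", "Organisation 1", 222),
    ("User 4", "Organisation 1", 1),
    ("User 5", "Organisation 1", 14),
    ("User 1", "Organisation 2", 215),
    ("User 2", "Organisation 2", 18),
    ("User 3", "Organisation 2", 12),
    ("User 4", "Organisation 2", 310),
    ("User 5", "Organisation 2", 161),
]
top = top_three_by_organisation(records)
for organisation, names in top.items():
    print(organisation, names)
```

With the sample data this gives User 3, User 5, User 2 for Organisation 1 and User 4, User 1, User 5 for Organisation 2, matching the First, Second, and Third columns in the table above.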

Resources