Excel - calculating the median without removing duplicates

I have a table that looks like this:
ID Total
3 3
3 3
3 3
4 11
4 11
4 11
4 11
4 11
4 11
6 9
6 9
7 13
7 13
7 13
7 13
7 13
7 13
7 13
7 13
7 13
7 13
7 13
7 13
7 13
I would like to calculate the median of column B (Total), excluding duplicate combinations of columns A and B. This could be achieved by constructing a table as below, and calculating the median from that table.
ID Total
3 3
4 11
6 9
7 13
Is there any way of obtaining the median without having to go through this process of manually deleting duplicates?

=MEDIAN(IF(FREQUENCY(MATCH(A2:A25&"|"&B2:B25,A2:A25&"|"&B2:B25,0),ROW(A2:A25)-MIN(ROW(A2:A25))+1),B2:B25))

There is also a way with two additional columns. The first column is a concatenation of ID and Total; the second counts the occurrences of each combination up to that row. Then you take the median over only those rows where the combination occurs for the first time.
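To cross-check the result outside Excel, here is a minimal Python sketch of the same idea (keep one copy of each ID/Total combination, then take the median of Total). The variable names are mine, and the rows are typed in from the table in the question.

```python
from statistics import median

# (ID, Total) rows copied from the question's table (24 rows in total)
rows = [(3, 3)] * 3 + [(4, 11)] * 6 + [(6, 9)] * 2 + [(7, 13)] * 13

# A set keeps exactly one copy of each (ID, Total) combination
unique_pairs = set(rows)

# Median of Total over the de-duplicated combinations: 3, 9, 11, 13
result = median(total for _id, total in unique_pairs)
print(result)  # 10.0
```

With four unique combinations the median is the mean of the two middle values, 9 and 11, which is the value the array formula above should also return.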

Related

How to generate the equal number of groups per week that have members that change the group every week

I have been trying to create an Excel sheet that I can use to assign members to a group each week. I need to make sure that each member is in a different group every week.
Below is my sheet, and here is the formula I use in B3:
=INDEX(UNIQUE(RANDARRAY(2, 10, 1, 11)), SEQUENCE(1), {1,2,3,4,5,6,7,8,9,10,11})
The issue I cannot fix is the fact that the groups differ in sizes. Any ideas, please?
Week 1 2
PPLs
1 7 8
2 6 11
3 1 3
4 2 7
5 9 7
6 2 10
7 4 8
8 4 5
9 10 8
10 8 9
11 3 6
12 8 6
13 7 9
14 7 8
15 5 8
16 5 8
17 8 8
18 8 10
19 4 9
20 3 2
21 10 9
22 2 10
23 10 6
24 9 3
25 4 9
26 7 6
27 10 7
28 7 7
29 10 5
30 2 5
31 6 6
32 8 8
33 4 4
34 9 10
35 5 9
36 9 7
37 5 7
38 10 9
39 2 10
40 6 5
41 9 2
Two thoughts: simple and (somewhat) repeatable, OR no Excel and random.
[ simple & (somewhat) repeatable ]
Two sets only.
Set 1: group 1 is {members 1-10}, group 2 is {members 11-20}, ..., group 10 is {members 91-100}.
Set 2: group 1 is {members 1,11,21,31,41,51,61,71,81,91}, group 2 is {members 2,12,22,32,42,52,62,72,82,92}, ..., group 10 is {members 10,20,30,40,50,60,70,80,90,100}.
P.S. Although member 1 is always in group 1, I'd consider it a valid group change, since ALL other members of group 1 have changed.
[ no Excel & random ]
Ever heard of 10x10 Sudoku? It is widely available online; I have no specific one in mind. Sudoku is what I meant by "no Excel".
example 10x10 sudoku solution :
how to use it :
Get 5 different (different is important) solved 10x10 Sudoku grids and stack them one on top of the other. We should get a table of 10 columns by 50 rows.
Sort the rows by the first column, so that column 1 reads (from the top): 1,1,1,1,1,2,2,2,2,2,3,3,3,3,3, .. 9,9,9,9,9,10,10,10,10,10
Since we have 50 members (10 groups, 5 members each), column 1 will be the assigned group for each of the 50 members (rows) in week 1, column 2 will be the assigned group for week 2, and so on.
If every week must be a different group, then by week 11 each member will come back to his/her original group. OR use another set of five 10x10 Sudoku grids, perhaps?
(If the idea is unclear, please ask.)
AFAIK, by the rules of Sudoku itself, each member (row) will have a different group each week (column), and each element (group number) WILL be repeated 5 times in each week (column). Thus, this solves the
the fact that the groups differ in sizes
part. (Please share if this doesn't work, though.)
Ref: I used https://sudokuspoiler.azurewebsites.net/Sudoku/Sudoku10 to solve the http://www.sudoku4me.com/sudoku%2010x10.php puzzle (this solver needs 0 to be replaced with 10). As long as it is a valid Sudoku solution, it should be fine. Some Sudoku puzzles use letters instead of numbers. Still, the idea applies.
P.S. I tried Excel's built-in random number generator to generate a ranked (sorted) list, but still ended up unable to get a consistent 5 members per group with a different group each week (same trouble as the OP). I have fond memories of 9x9 Sudoku; glad to know it came in handy for this solution.
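For anyone who would rather generate the table than copy Sudoku grids, below is a rough Python sketch of a simpler stand-in: a cyclic Latin square instead of a Sudoku. The function name `weekly_groups` and its parameters are hypothetical, and it assumes the number of members is an exact multiple of the number of groups. It gives the same two guarantees as the Sudoku idea (equal group sizes every week, no member repeating a group number within the first 10 weeks), but it is not random: members who start together stay together, and only their group number rotates.

```python
def weekly_groups(n_members, n_groups, n_weeks):
    """Return table[member][week] = group number (1-based).

    Hypothetical helper: assumes n_members is a multiple of n_groups.
    """
    if n_members % n_groups != 0:
        raise ValueError("n_members must be a multiple of n_groups")
    size = n_members // n_groups
    table = []
    for member in range(n_members):
        start = member // size  # 0-based group index in week 1
        # Shift by one group per week: one row of a cyclic Latin square
        table.append([(start + week) % n_groups + 1 for week in range(n_weeks)])
    return table


# 50 members, 10 groups of 5, 10 weeks (the sizes used in the answer above)
table = weekly_groups(50, 10, 10)
for member, row in enumerate(table[:6], start=1):
    print(member, row)
```

If mixing of teammates matters, the stacked Sudoku grids described above do that and this sketch does not; shuffling the member order once before calling the function only changes who starts together.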

How to select last 5 rows of each unique records in pandas

Using Python 3, I am trying to get, for each unique value in the column 'Name', the last 5 records from the column 'Number'. How exactly can this be done in Python?
My df looks like this:
Name Number
a 5
a 6
b 7
b 8
a 9
a 10
b 11
b 12
a 9
b 8
I saw some examples (like this one: Get sum of last 5 rows for each unique id) in SQL, but that is time-consuming, and I would like to learn how to do it in Python.
My expected output df would be like this:
Name 1 2 3 4 5
a 5 6 9 10 9
b 7 8 11 12 8
I think you need something like this:
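# Keep the last 5 rows of each Name group
df_out = df.groupby('Name').tail(5)
# Number the rows within each group (1, 2, ...) and spread those numbers into columns
df_out.set_index(['Name', df_out.groupby('Name').cumcount() + 1])['Number'].unstack()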
Output:
1 2 3 4 5
Name
a 5 6 9 10 9
b 7 8 11 12 8
Looks like you need a pivot after a groupby.cumcount():
df1 = df.groupby('Name').tail(5)
final = (df1.assign(k=df1.groupby('Name').cumcount() + 1)
            .pivot(index='Name', columns='k', values='Number')
            .reset_index().rename_axis(None, axis=1))
print(final)
Name 1 2 3 4 5
0 a 5 6 9 10 9
1 b 7 8 11 12 8
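If pandas is not available, the same "last 5 per name" idea can be sketched with the standard library alone. This is only an illustration of the technique, not a replacement for the answers above; the `records` list is typed in from the question and the variable names are mine.

```python
from collections import defaultdict, deque

# (Name, Number) rows in their original order, copied from the question
records = [("a", 5), ("a", 6), ("b", 7), ("b", 8), ("a", 9),
           ("a", 10), ("b", 11), ("b", 12), ("a", 9), ("b", 8)]

# One bounded queue per name; once 5 values are stored, the oldest falls off
last_five = defaultdict(lambda: deque(maxlen=5))
for name, number in records:
    last_five[name].append(number)

for name, numbers in sorted(last_five.items()):
    print(name, *numbers)
```

Because each name has exactly 5 rows in the sample data, nothing is dropped here; with more rows per name only the most recent 5 would remain.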

Excel: HLOOKUP() where blank cells are skipped

I am trying to create an HLOOKUP()-style formula that finds a matching heading and returns the value under it in the current row, except that if that value is blank it skips the column and looks at the next column with the same heading in the same row.
An example of the data table is as follows:
          Heading 1  Heading 2  Heading 1  Heading 4  Heading 5  Heading 1
Sample 1  1          7                     13         19
Sample 2             8                     14         20         2
Sample 3             9                     15         21         3
Sample 4  4          10                    16         22
Sample 5  5          11                    17         23
Sample 6             12         6          18         24
As you can see, the data under headings 2, 4 and 5 are all in single columns, but the heading 1 values are split between three columns.
I need the final data set to look like this:
Heading 1 Heading 2 Heading 4 Heading 5
Sample 1 1 7 13 19
Sample 2 2 8 14 20
Sample 3 3 9 15 21
Sample 4 4 10 16 22
Sample 5 5 11 17 23
Sample 6 6 12 18 24
I have done some research online and found a formula that I thought was meant to work like a VLOOKUP(). I can't quite work out what it's doing, and when I try it on a transposed version of my data set it doesn't quite do what I expect. I have been trying to get it to work and also to convert it to work in the opposite orientation. The formula is as follows:
{=INDEX($B$3:$G$8,SMALL(IF(INDEX($A$3:$G$8,,MATCH(B$11,$B$2:$G$2,0))<>"",IF($A$3:$A$8=$A12,ROW($A$3:$G$8)-ROW($A3)+$I12)),1),MATCH(B$11,$B$2:$G$2,0))}
This formula is from https://www.mrexcel.com/forum/excel-questions/689238-vlookup-match-but-ignore-blank-cells.html
Running the formula on a transposed version of my data set results in the following:
**Transposed data set**
           Sample 1  Sample 2  Sample 3  Sample 4  Sample 5  Sample 6
Heading 1  1                             4         5
Heading 2  7         8         9         10        11        12
Heading 1                                                    6
Heading 4  13        14        15        16        17        18
Heading 5  19        20        21        22        23        24
Heading 1            2         3
**Result**
Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Sample 6
Heading 1 1 0 3 0 5 0 1
Heading 2 7 8 9 10 11 12 2
Heading 4 13 14 15 16 17 18 3
Heading 5 19 20 21 22 23 24 4
**Expected result**
Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Sample 6
Heading 1 1 2 3 4 5 6
Heading 2 7 8 9 10 11 12
Heading 4 13 14 15 16 17 18
Heading 5 19 20 21 22 23 24
I think that I am probably overcomplicating this and that there must be a simpler solution to the problem. Any help that anyone can give me would be great. Let me
Thanks!
This is maybe far too simple, but why don't you simply add up the values of the 'Heading 1' columns? Empty cells are treated as 0, so in each row the sum is just the one non-blank value you are looking for :-)
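The summing idea can be checked with a small Python sketch. The helper name `collapse` is hypothetical, blanks are represented as `None`, and the positions of the blanks are my reading of the transposed table in the question. Like the suggestion itself, it relies on each row having at most one non-blank value per heading.

```python
headings = ["Heading 1", "Heading 2", "Heading 1",
            "Heading 4", "Heading 5", "Heading 1"]

# None marks a blank cell; blank positions inferred from the transposed table
rows = {
    "Sample 1": [1, 7, None, 13, 19, None],
    "Sample 2": [None, 8, None, 14, 20, 2],
    "Sample 3": [None, 9, None, 15, 21, 3],
    "Sample 4": [4, 10, None, 16, 22, None],
    "Sample 5": [5, 11, None, 17, 23, None],
    "Sample 6": [None, 12, 6, 18, 24, None],
}


def collapse(headings, values):
    """Sum the cells under each heading, counting blanks as 0."""
    out = {}
    for heading, value in zip(headings, values):
        out[heading] = out.get(heading, 0) + (value or 0)
    return out


for sample, values in rows.items():
    print(sample, collapse(headings, values))
```

If a row could ever hold two non-blank values under the same heading, the sum would no longer equal "the first non-blank value", and a lookup that stops at the first hit would be needed instead.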

How to filter with a lead lag window on a spark dataframe?

The filter function selects all the rows in a Spark dataframe which satisfy certain conditions. How would I do a window filter, where a set of rows above and below each row that satisfies the filter condition is also selected? For example, I have the following dataframe myDF:
A B
1 1
2 12
3 13
4 14
5 10
6 17
7 34
8 12
9 1
10 7
11 1
Now I want to write something like myDF.orderBy($"A").myWindowFilter($"B" === 12, 2), which will give me the following dataframe (2 is the lag/lead width):
A B
1 1
2 12
3 13
4 14
6 17
7 34
8 12
9 1
10 7
How do I implement such a function myWindowFilter in Scala/Spark?
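To pin down the intended semantics, here is a plain-Python sketch on a list of rows. It is not a Spark implementation, and the name `window_filter` is only a stand-in for the hypothetical myWindowFilter: a row is kept if it lies within `width` positions of any row that satisfies the condition.

```python
def window_filter(rows, predicate, width):
    """Keep every row within `width` positions of a row matching `predicate`."""
    hits = [i for i, row in enumerate(rows) if predicate(row)]
    keep = set()
    for i in hits:
        # Clamp the window to the bounds of the list
        keep.update(range(max(0, i - width), min(len(rows), i + width + 1)))
    return [rows[i] for i in sorted(keep)]


# (A, B) rows from the question
my_df = [(1, 1), (2, 12), (3, 13), (4, 14), (5, 10), (6, 17),
         (7, 34), (8, 12), (9, 1), (10, 7), (11, 1)]

# Order by A, then keep rows within 2 positions of any row where B == 12
result = window_filter(sorted(my_df), lambda row: row[1] == 12, 2)
for row in result:
    print(*row)
```

On the sample data this keeps the rows with A = 1 to 4 and A = 6 to 10, which matches the expected dataframe in the question.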

Search multiple strings by using reference template and produce a constant number for that string

In sheet 1 I have:
Sno Description
1 uproc_incident_X
2 sys_win_disque_e
3 sys_unx_disk
4 process_unx_event_wait
5 process_unx_Uproc
6 process_win_china
7 http_get_zom_facturation
8 http_get_zom_stars
9 services_win_TaskScheduler
10 check_sos_out
11 check_LOG
12 app_unx_check
13 app_unx_11000
14 app_win_mqmngr
15 app_lnx_log_syslog
16 app_ora_alertlog
17 ora_tbs_usage
sheet 2 contains:
Sno Description Time
1 uproc 20
2 sys_win 20
3 sys_unx 15
4 process 12
5 http_get 12
6 services 10
7 check 10
8 app_unx 15
9 app_win 15
10 app_lnx 10
11 app_ora 10
12 ora 10
I want a formula to put in sheet 1, next to each description, that matches the description against the entries in sheet 2 and returns the corresponding Time from sheet 2, so that the final result looks like this:
Sno Description Time
1 uproc_incident_X 20
2 sys_win_disque_e 20
3 sys_unx_disk 15
4 process_unx_event_wait 12
5 process_unx_Uproc 12
6 process_win_china 12
7 http_get_zom_facturation 12
8 http_get_zom_stars 12
9 services_win_TaskScheduler 10
10 check_sos_out 10
11 check_LOG 10
12 app_unx_check 15
13 app_unx_11000 15
14 app_win_mqmngr 15
15 app_lnx_log_syslog 10
16 app_ora_alertlog 10
17 ora_tbs_usage 10
Can anyone help me?
I suggest a mapping column, as shown in column C, and then this formula in D2, copied down:
=VLOOKUP(C2,'sheet 2'!B:C,2,0)
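The matching rule behind such a mapping column can be sketched in Python. This is one possible rule, assumed by me rather than stated in the answer: take the longest sheet 2 description that is a whole-word prefix (split on "_") of the sheet 1 description. The function name `lookup_time` is hypothetical.

```python
# Description -> Time, copied from sheet 2 in the question
times = {"uproc": 20, "sys_win": 20, "sys_unx": 15, "process": 12,
         "http_get": 12, "services": 10, "check": 10, "app_unx": 15,
         "app_win": 15, "app_lnx": 10, "app_ora": 10, "ora": 10}


def lookup_time(description, times):
    """Return the Time of the longest key that is a whole-word prefix, else None."""
    parts = description.split("_")
    # Try the longest prefix first: "app_unx_check", then "app_unx", then "app"
    for n in range(len(parts), 0, -1):
        key = "_".join(parts[:n])
        if key in times:
            return times[key]
    return None


for description in ["uproc_incident_X", "app_ora_alertlog", "ora_tbs_usage"]:
    print(description, lookup_time(description, times))
```

Matching on whole "_"-separated words keeps "ora_tbs_usage" on the "ora" entry while "app_ora_alertlog" resolves to "app_ora", as in the expected result.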
