Rank the top 5 entries in different criteria - excel

I have a table that I want to find the top X people in each of the different groups.
Unique Names Number Group
a 30 1
b 4 2
c 19 3
d 40 2
e 1 1
f 9 2
g 15 3
I've ranked the top 5 people by number by using =index($A$2:$A$8,match(large($B$2:$B$8,1),$B$2:$B$8,0)). The 1 in the LARGE function I linked to a ranked range so that when I dragged down it changed up the number.
What I would like to do next is rank the top x number of people in each group. So top 3 in group 1.
I tried =index($A$2:$A$8,match("1"&large($B$2:$B$8,1),$C$2:$C$8&$B$2:$B$8,0)) but it didn't seem to work.
Thanks
EDIT: After looking at the answers below I have realised why they are not working for me. My actual data that I want to use the formula with have multiple entries of numbers. I have adjusted the example data to show this. The problem I have is that if there are duplicate numbers then it returns both of the names even if one is not in the group.
Unique Names Number Group
a 30 1
b 30 2
c 19 3
d 40 2
e 1 1
f 30 2
g 15 3

Proof of Concept
Use the following formula in the example above in cell F2 and copy down and to the right as needed.
=IFERROR(INDEX($A$2:$A$8,MATCH(AGGREGATE(14,6,($C$2:$C$8=F$1)*($B$2:$B$8),ROW($A2)-1),$B$2:$B$8,0)),"")
In the header row provide the group numbers. or come up with a formula to augment and reset the group number as you copy down based on your X number in your question.
Explanation:
The AGGREGATE function unlike the large function is an array function without the need to use CSE. As such we can add criteria to what we want to use. In this case only 1 criteria was used and that was the group number. in the formula it was the following part:
($C$2:$C$8=F$1)
If there were multiple criteria we would use either an + operator as an OR or we would use an * operator as an AND.
The 6 option in the aggregate function allows us to ignore errors. This is useful when trying to get the small. It is also useful for dealing with other information that may cause errors that do not need to be worried about.
As this is technically an array operation avoid using full column/row references as they can bog down your system.
The basics of what the over all formula is doing is building a list that match the group number you are interested in. After filtering your numbers, it then determines which is the largest, second largest etc by what row you have copied down to. It then determine what row the nth largest number occurs in through the match function, and finally it returns to the corresponding name to that row with the index function.

Building on all the other great answers.
Because you have the possibilities of duplicate values in each group we need to do this with two formulas.
First we need to get the numbers in order. I used the Aggregate, but this could be done with the array LARGE(IF()) also:
=IFERROR(AGGREGATE(14,6,$B$2:$B$8/($C$2:$C$8=E$1),ROW(1:1)),"")
Then using that number and order we can reference, we can use a modified version of #ForwardEd's formula, using COUNTIF() to ensure we get the correct name in return.
=IFERROR(INDEX($A$2:$A$8,AGGREGATE(15,6,(ROW($B$2:$B$8)-ROW($B$2)+1)/(($C$2:$C$8=F$1)*($B$2:$B$8=E3)),COUNTIF(E$2:E2,E3)+1)),"")
This will count the number in the results returned and then bring in the correct name.

You could also solve this with array formulas - to filter a group whose name is stored in E1, your code
=INDEX($A$2:$A$8,MATCH(LARGE($B$2:$B$8,1),$B$2:$B$8,0))
would then be adapted to
=INDEX($A$2:$A$8,MATCH(LARGE(IF($C$2:$C$8<>E1,-1,$B$2:$B$8),1),$B$2:$B$8,0))
Note: After entering an array formula, you have press CTRL+SHIFT+ENTER.

Thank you to everyone who offered help but for some reason none of your methods worked for me, which I am sure was to do with the quality of my data. I used an alternate method in the end which is slightly convoluted but seemed to work.
=IF($C2="1",RANK($B2,$B$2:$B$8,1)+ROW()/10000,-1)
Essentially using the rank function and adding a fraction to separate out duplicate values.

Related

Get count only all parameters are within there LSL & USL

From below data table, I tried to get sample count which are each value within their specification limits. To get the sample count, all 3 values must fulfilled its specification requirement. I used SUMPRODUCT function for each and every column and checked its relevant specification limits using following formula.
=SUMPRODUCT((B3:B8>=B9)*(B3:B8<=B10),(C3:C8>=C9)*(C3:C8<=C10),(D3:D8>=D9)*(D3:D8<=D10))
But when I am dealing with more columns, this is getting more complex.
My question is, are there any other way to check all column at once? to reduce the formula complexity.
Note:- Highlighted with red color are out of specification limits. Only 2 & 6 rows are counted the returned result.
You can use SUMPRODUCT/MMULT:
=SUMPRODUCT(--(MMULT((B3:D8>=B9:D9)*(B3:D8<=B10:D10),ROW(A1:A3)^0)=3))
Just remember that the second parameter of MMULT must specify the number of rows corresponding to the number of columns of the first parameter, i.e. B3:D8 = 3 columns => A1:A3 = 3 rows. The comparison with 3 also changes accordingly.

How to find the index of remaining columns if the data is repetitive

I have a data entry like thisData entries
Now, i need to find the smallest 10 values and also get the corresponding person and area and date along with it.
I used SMALL functoin to find the least 10 values. Then I used the index and match functions for getting their corresponding row entries. The problem is since some data entries are being repetitive, these functions are giving the row of the first 2 for all the remaining 2s. How to solve this
In F2 use Rank like this, so you have unique numbers:
=RANK(C2,$C$2:$C$21,1)+ROW()/1000
in G2 use Small, to pull the smallest of the ranked numbers and copy down 10 rows.
=SMALL($F$2:$F$21,ROW(A1))
Now you can pull person, date, real hours and area with an index match in H2, copied across and down.
=INDEX(A$2:A$21,MATCH($G2,$F$2:$F$21,0))

Excel - find the biggest gap between numbers in rows

I have an excel file with >12500 rows in one column.
It contains such random strings with 20 digits:
2,3,4,6,7,8,12,13,14,24,30,42,45,46,48,50,56,58,**59**,61
1,2,6,8,11,12,13,16,17,21,24,27,28,33,34,42,44,48,58,61
3,7,10,13,14,15,18,21,23,24,25,29,30,34,37,48,51,56,57,60
8,11,13,16,17,19,21,27,29,35,36,39,42,44,46,50,53,54,57,60
2,4,7,9,21,26,28,30,32,34,35,37,38,39,43,44,50,60,61,62
10,13,15,18,21,22,23,24,25,26,40,42,48,49,51,52,56,**59**,61,62
1,2,4,7,14,15,18,20,24,29,30,32,35,41,42,50,52,55,58,62
1,4,8,9,10,12,17,24,25,33,37,41,43,44,46,49,52,**59**,61,62
1,2,4,6,9,12,15,17,21,24,30,31,32,36,41,44,47,48,51,58
2,7,10,12,15,16,20,24,25,27,30,33,39,44,45,52,54,55,58,60
5,7,10,11,20,22,24,31,32,33,36,38,39,41,43,47,50,52,56,58
3,6,8,9,14,15,19,21,25,28,34,37,39,45,47,54,55,56,57,**59**
1,2,3,4,5,8,14,15,18,20,23,31,33,37,42,45,46,51,52,55
I need to know whats the biggest gap between rows where a number hasn't repeated. For example - I search for any number (e.g 59) and I need to know what's the largest gap between two rows where number 59 hasn't repeated.
In this example it's 4 row gap between 59's.
Hope that I make myself clear.
Seems like a fun problem which admits a simple but not quite obvious answer. First -- make sure that the data is in 20 columns (use the text to columns feature under the data tab). Using your example, I came up with a spreadsheet that looks like:
V1 holds the target number. The formulas are in columns U.
In U1 I entered:
=IF(ISNA(MATCH($V$1,A1:T1,0)),1,0)
This formula uses MATCH to test if the value in V1 lies in the range to the left of it. If it doesn't the match function returns #N/A. The function ISNA checks for this error value. IF it is present, the overall formula returns 1 (since there are now 1 consecutive row without the target number) otherwise it returns 0.
The formula in U2 is similar with a little twist:
=IF(ISNA(MATCH($V$1,A2:T2,0)),1+U1,0)
The same basic logic -- but rather than returning 1 if the target number isn't present it adds 1 to the number above. The formula is then copied down the rest of the range. It has the effect of keeping a running total of consecutive rows without the target value. This running total is reset to 0 whenever a row with the target value is encountered.
The final ingredient requires no comment. In U14 I just have
=MAX(U1:U13)
which is the number you are looking for (assuming that the maximum number of consecutive rows without the target number is what you are looking for, even if this occurs either at the top or bottom of the data. If you want the largest gap that is literally between two rows where the number occurs, the logic would need to be made more complex).

Ranking with subsets

I'm trying to rank values and have managed to work out how to sort ties. My data looks at the total number of entries, ranks based on that and if there is a tie it looks to the next column of values to sort them out. However, I have two classes (East and West I've called them) of data within my dataset and want to rank them both separately (but stick to the rules above). So, if I had seven entries, 3 of them West and 4 of the East, I want West to have ranking 1,2,3 based on all the values that lie in that subset and East would have ranking 1,2,3,4. Can you explain what your formula is doing so I can understand how to apply your answer better in the future.
Effectively I'm asking what formula needs to go in achieve my result.
Cheers
Paul
There are a few related ways to do this, most involving SUMPRODUCT. If you don't like the solution below and would like to research other ways/explanations, try searching for "rankif".
The function looks up the Class and Value columns and, for every value in those columns, returns a TRUE or 1 if the current Class is a match AND if its Value is larger than the current Value, False or 0 if otherwise. The SUM adds up all these 1s, and the 1+ is for decoration. Remember to enter as an array formula using Ctrl+Shift+Enter before dragging down.
I used the array formula and SUM above to explain, but the following also works and might even be faster since it's not an array formula. It's the same idea, except we hijack SUMPRODUCT's ability to spit out a single value from an array.
=1+SUMPRODUCT(($A$2:$A$8=A2)*($B$2:$B$8>B2))
EDIT
To extend the rank-if, you could add more subsets to rank by multiplying more conditions:
You can also easily add tiebreakers by adding another SUMPRODUCT to treat the ties as an additional subset:
The first SUMPRODUCT is the 'base rank', while the second SUMPRODUCT is tiebreaker #1.

Excel: How to parse/cast text as a formula?

Is it possible to parse/cast text (like "=A1+A2") as a formula in MS Excel? I want to build a formula from pieces of text - some of which will only be typed in later by a user.
If the INDIRECT() function did not only work for referencing cells, then I could have typed this =INDIRECT("=A1+A2").
I know you can a work around this problem by simply adding a lot more hidden columns to do sub calculations. But for the sake scalability and efficiency, I would rather do something like the above.
I found a similar questions here and here, yet they don't solve my problem.
The Real-world problem:
Read on for a better understanding as to why you would want to do the above
Scenario
Each item in the list consists of a string, which contains anywhere from 1 to 5 account names each. Each account name is followed by an account number in brackets. The length of the number determines the type of account. Part of the account number is a date, of which the date format depends on the type of account. Further more, each account type may have more that 1 account-number length associated with it, although each number-length[*] is only associated with 1 account type.
Objectives
Extract account-names and their respective account-numbers and account-types from a list.
Make an assumption as to the account-type from the account-number
Validate this assumption by inspecting the build of the number and elements in the name
Check the validity of the account-numbers depending on their type.
The tricky part (this is where my problem lies)
The account-types and their respective account-number-lengths are not known before hand, and are typed into a table by the user of the sheet, specifying a type of account and the number-lengths associated with this account-type. The user should type this into a list - not go and tinker around with delicate formulas
Done so far
Column A: Contains the raw data (each cell has up to 5 names and numbers)
Columns B..F: Each column extracts 1 name, remains empty if all are already extracted
Columns G..K: Each column extracts 1 number corresponding to its name in columns B..F, remains empty if all are already extracted
Columns L..P: Each column calculates the length of the corresponding number in columns
G..K
Now the user would type the following details into a table which assigns certain number-lengths an account type:
TYPE2, BUSINESS, (OR(length=13,length=6))
where length will later be replaced with the cell address which contains the calculated account number-length.
What I want to do now
Columns Q..U:
Should all indicate the account-type of the corresponding account-number in columns G..K. The idea is to build a nested if-elseIf-elseIf formula using the criteria typed in by the user as specified above. Example of one of the elseIF statements:
SUBSTITUTE(CONCATENATE("=IF(",criteria,",",type,",",errCode)),"length","O10"))
All of these elseIf statements will then be concatenated together to form a master formula which will then need to be parsed/cast as a formula to calculate the account-type
This proposal uses only 5 columns (1 for each account-number, containing the master formula) and a table specifying account-types and criteria, also keeping the user away from formulas. Editing 1 line of code (the criteria) will update all formulas. Efficient & Scalable.
Since the user should never tinker around with the formulas under the hood, a simple 1 column if-elseIf-elseIf is out of the question. The alternative to the above would be to make a separate column to test for each account-type for each account-number. Separating/Abstracting out each test to its own column results in much better readability, easier editing & much less debugging - Unless you like multi-screen-wide-formulas. Example: 5 account-numbers * 10 possible account types = 50 extra columns.
Each edit to any criteria needs to copied to 4 other non-adjacent columns and drag-filled down 10,000 rows (columns can not be adjacent since it is effectively a 5x5 array of columns). Not Efficient nor scalable. Unless I'm missing some elegant way of updating non-adjacent formulas in a single click
The rest of the validations error indications are trivial.
Sample data
Tshepo Trust (6901/2005) Marlene Mead (8602250646085)
Great Force Inv 67 Pty Ltd (200602258007)
Jane (870811) Livingstone (6901/2005) Janette Appel (8503250647056) James (900111)
I know all this would probably be much easier to achieve with clever usage of VBA, eliminating all the need to simulate abstraction, encapsulation, multi-dimensional arrays and functional programming on a spreadsheet. But until I can program in VBA, worksheet formulas will be my refuge.
[*]: account number-length could also be described as the amount of digits in the number or as indicated by this formula: LEN(accNumber)
In VBA you have access to Cell.Formula.
I usually used Range to peek a cell by address.
I'm not sure if this would answer your question(it's a very detailed question!), but if your user was entering the account numbers in a table (I'm calling it 'RefTable') , that was:
Length of account number | business type
----------------------------------------
6 | Accountant
8 | Advisor
Then you could just use a vlookup on the length of the account number, given you've already separated them out.
=vlookup(len(accNumber), Reftable, 2, false)
Make sure that you either use a dynamic range name, or specify plenty of space below in RefTable, so that when your users add types, they don't get lost.
Also, if you have two different accounts with the same length, this could get you into trouble.

Resources