Ranking with subsets - excel

I'm trying to rank values and have managed to work out how to sort ties. My data looks at the total number of entries, ranks based on that and if there is a tie it looks to the next column of values to sort them out. However, I have two classes (East and West I've called them) of data within my dataset and want to rank them both separately (but stick to the rules above). So, if I had seven entries, 3 of them West and 4 of the East, I want West to have ranking 1,2,3 based on all the values that lie in that subset and East would have ranking 1,2,3,4. Can you explain what your formula is doing so I can understand how to apply your answer better in the future.
Effectively I'm asking what formula needs to go in achieve my result.
Cheers
Paul

There are a few related ways to do this, most involving SUMPRODUCT. If you don't like the solution below and would like to research other ways/explanations, try searching for "rankif".
The function looks up the Class and Value columns and, for every value in those columns, returns a TRUE or 1 if the current Class is a match AND if its Value is larger than the current Value, False or 0 if otherwise. The SUM adds up all these 1s, and the 1+ is for decoration. Remember to enter as an array formula using Ctrl+Shift+Enter before dragging down.
I used the array formula and SUM above to explain, but the following also works and might even be faster since it's not an array formula. It's the same idea, except we hijack SUMPRODUCT's ability to spit out a single value from an array.
=1+SUMPRODUCT(($A$2:$A$8=A2)*($B$2:$B$8>B2))
EDIT
To extend the rank-if, you could add more subsets to rank by multiplying more conditions:
You can also easily add tiebreakers by adding another SUMPRODUCT to treat the ties as an additional subset:
The first SUMPRODUCT is the 'base rank', while the second SUMPRODUCT is tiebreaker #1.

Related

Formula to return string value based on ranked categories

This question might have a very simple answer.
I have 3 values (i.e. Low, High and Extreme) that correspond with 5 categories (Low, Moderate, High, Very High and Extreme), ranked from low to high.
I want to find a formula that returns the highest-ranked string value (i.e. Extreme), without having to add extra columns for category or value scores.
Hope someone can help.
Try:
Formula in C2:
=#SORTBY(B2:B4,MATCH(B2:B4,A2:A6,0),-1)
Note that the order in 'B' does not matter as long the order in 'A' is correct.
Here’s one approach, which does require O365 because of the FILTER() function. This would be simpler if we knew for sure how many values were in your values column. Since I don’t know that, I ended up adding a little complexity with the FILTER() function. This prevents the MATCH() function from returning #N/As for blank values, which in turn would break the MAX() function:
=INDEX({"Low","Medium","High","Very High","Extreme"},MAX(MATCH(FILTER(A1:A5,A1:A5<>""), {"Low","Medium","High","Very High","Extreme"},0)))
This can be further reduced by the LET() function:
=LET(
categories, {"Low","Medium","High","Very High","Extreme"},
INDEX(categories,MAX(MATCH(FILTER(A1:A5,A1:A5<>""),categories,0)))
)
Also, if this is a sheet that will get a lot of re-use, you could make the categories a named value using the Excel Name Manager. If you did that the formula would simply be:
=INDEX(categories,MAX(MATCH(FILTER(A1:A5,A1:A5<>""),categories,0)))

Rank equally across multiple columns with different data types using a single formula. Is it possible?

See this data set below. It has 3 columns (PTQ, % Growth, $ Growth) that all need ranked individually then summed up and then ranked again for a total power rank of each region. Is there any way I can do this with a single formula? I do this a lot and it would be nice not to have to rank everything individually each time. Basically I need to end up at the "Power Rank" column.
To clarify, I do not want to rank first on one column then another, they all need to be ranked equally together.
Thought I'd add a solution for non-365 users:
Edit: slight amendment so that the returned ranking mimics that of the RANK function, with thanks to #Scott Craner
=MMULT(N(MMULT(CHOOSE({1,2,3},RANK(A2:A10,A2:A10),RANK(B2:B10,B2:B10),RANK(C2:C10,C2:C10)),{1;1;1})>TRANSPOSE(MMULT(CHOOSE({1,2,3},RANK(A2:A10,A2:A10),RANK(B2:B10,B2:B10),RANK(C2:C10,C2:C10)),{1;1;1}))),ROW(A2:A10)^0)+1
which may require committing with CTRL+SHIFT+ENTER, depending on the version of Excel being used.
The static
{1;1;1}
relates to the fact that 3 columns are being queried, and could of course be replaced with a dynamic alternative.
We can use Office 365 LET and SORT. RANK does not allow the use of arrays, so we need to improvise with MATCH and SORT.
=LET(
ptq,A2:A10,
grwt,B2:B10,
grwthd,C2:C10,
ptqrnk,RANK(ptq,ptq),
gwtprnk,RANK(grwt,grwt),
gwtdrnk,RANK(grwthd,grwthd),
sm,ptqrnk+gwtprnk+gwtdrnk,
MATCH(sm,SORT(sm),0))

Match 2 columns based on a % difference within the value

I am looking for a method to match two excel tables.
I basically have two Systems, where the values do not exactly match only some IDs. The values in system 2 are usually 10-20% different from system 1.
Here is how the sheet looks like:
I tried to use vlookup on the IDs and then going hand-by-hand through the values if they match, by using the filter with the ID. However, this takes extremely long and is very cumbersome.
Any recommendation how to match these two tables, much more easily?
I really appreciate your replies!
If you look at a formula for G3 you would be involving D3:E3 and A:B (where A10:B10 are the matching values).
When someone states that they are looking for a percentage, it is helpful to know "a percentage of what...?". You receive a different result if the calculation is ABS(12 - 15)/15 instead of ABS(12 - 15)/12. One may within tolerance and the other may not.
In any event, the formula for G3 would be something like,
=ABS(E3-VLOOKUP(D3,A:B, 2, FALSE))/E3
... or,
=ABS(E3-VLOOKUP(D3,A:B, 2, FALSE))/VLOOKUP(D3,A:B, 2, FALSE)
That produces a result of 0.25% or 0.20% depending on how you calculate the percentage. You could wrap that in an IF statement for a YES/NO text result or use a custom number format like [Color3][>0.2]\NO;;[Color10]\Y\E\S;# which will show a red NO for values greater than 20% and a green YES for values between 0 and 20%. Negative values do not have to be accounted for as the ABS removes them from consideration.
       
I've only reproduced a minimum of your sample data for demonstration purposes but perhaps you can get an idea on how to proceed from that.

SUMIFS with intermediate VLOOKUP in the criteria

I have 3 tables, 1 of which I want to fill in columns with data based on the other 2. Tables are roughly structured as follows:
Table 1 (Semi-Static Data)
SubGroup Group
----------- -----------
subgroup(1) group(a)
subgroup(2) group(b)
subgroup(3) group(b)
subgroup(4) group(c)
etc.
Table 2 (Variable Data)
SubGroup DataValue
----------- -----------
subgroup(1) datavalue(i)
subgroup(2) datavalue(ii)
subgroup(3) datavalue(iii)
subgroup(4) datavalue(iv)
etc.
Table 3 (Results)
Group TotalValue
----------- -----------
group(a) totalvalue(m)
group(b) totalvalue(n)
group(c) totalvalue(o)
etc.
Where the TotalValue is the sum of all DataValue's for all subgroups that belong to that particular Group.
e.g. for group(b) ---> totalvalue(n) = datavalue(ii) + datavalue(iii)
I am looking to achieve this calculation without adding any additional columns to the Data tables nor using VBA.
Basically I need to perform a COUNTIFS where there is an additional VLOOKUP matching the subgroup criteria range to the group it belongs to, and then only summing for datavalue's that match the group being evaluated. I have tried using array formulas but I'm having issues making it work. Any assistance would be very appreciated. Thank you,
EDIT: Wanted to add some details surrounding my question. First all Google searches did not provide a suitable answer. All the links had solutions to a slightly different problem were the VLOOKUP term is not dependent on the SUMIFS criteria but rather another single static variable. Stack Overflow offered similar solutions. Please let me know if anymore details are required to make my post suitable for this forum. Thank you again.
You can use the SUMPRODUCT function to do it all at once. The first reference $B$2:$B$5 is for the Group names, the second reference $E$2:$E$5 is for the datavalues. The G2 reference is for the group names in the third table, you can enter this formula for the first reference and then drag and fill for the rest.
=SUMPRODUCT($E$2:$E$5 * (G2 = $B$2:$B$5))
Some cell references, and sample data, would be helpful but something like this might be what you want:
=SUMIF(C:C,"="&INDEX(A:A,MATCH(E5,B:B,0)),D:D)
WADR & IMHO, this is simply bad worksheet design. For lack of a single cross-reference column in Table2, any solution would have to be a VBA User Defined Formula or an overly complicated array formula (the latter of which I am not even sure is possible). The data tables are not normalized database tables you can INNER JOIN or GROUP BY ... HAVING.
The formula you are trying to achieve is akin to,
=SUMPRODUCT(SUMIF(D:D, {"subgroup(2)","subgroup(3)"}, E:E))
That only works with hard-coded values as arrayed constants (e.g. {"subgroup(2)","subgroup(3)"}). I know of no way to spit a dynamic list back into the formula using additional native Excel functions but VBA offers some possibilities.
HOWEVER,
The simple addition of one more column to Table2 with a very basic VLOOKUP reduces all of your problems to a SUMIF.
     
The formula in the new column D, row 2 is,
=VLOOKUP(E2, A:B, 2, FALSE)
The formula in I2 is,
=SUMIF(D:D, H2,F:F )
Fill each down as necessary. Sorry if that is not what you wanted to hear.
Thank you everyone that responded and reviewed this post. I have managed to resolve this using an array formula and some matrix algebra. Please note that I am not using VLOOKUP (this operator cannot be performed on arrays) nor SUMIFS as my title states.
My final formula looks like this:
{=SUM(IF([Table2.xlsx]Sheet1!SubGroup=TRANSPOSE(IF([Table1.xlsx]Sheet1!Group=G2,[Table1.xlsx]Sheet1!SubGroup,"")),[Table2.xlsx]Sheet1!DataValue))}
Very simply, I create an array variable that compares the Group being evaluated (e.g. cell G2) with the Groups column for Table 1 and outputs the corresponding matching SubGroups. This results in an array with as many rows as Table 1 had (N) and 1 column: Nx1. I then transpose that array (1xN) and compare it to the SubGroups column (Mx1, M being the number of rows in Table 2) and output the DataValues column for the rows that have a corresponding SubGroup (MxN). Then I perform a sum of the whole array to return a single value.
Notice that as I didn't include a value_if_false output return on either IF operators, it will just populate with FALSE in the arrays were the conditions are not met. This does not matter though for the final result. In the first IF, FALSE will not match the SubGroups so will be ignored. For the second all values FALSE passed to SUM will be calculated as 0. The more complicated question is that it grows the amount of memory required to process as we are not filtering to just have the values we want.
For this application I decided against filtering the subarray as the trade-off in resource utilization was acceptable. If the data sets were any bigger though, I would definitely try doing it. Another concern was that I did not understand fully the filtering logic that I was using based on http://exceltactics.com/make-filtered-list-sub-arrays-excel-using-small/ so decided to simplify. Will revisit this concept latter as I think it will work. I might have completed this solution but was missing transposing the array to compare properly so abandoned this route.

nested excel functions with conditional logic

Just getting started in Excel and I was working with a database extract where I need to count values only if items in another column are unique.
So- below is my starting point:
=SUMPRODUCT(COUNTIF(C3:C94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"}))
what i'd like to figure out is the syntax to do something like this-
=sumproduct (Countif range1 criteria..., where range2 criteria="is unique value")
Am I getting this right? The syntax is a bit confusing, and I'm not sure I've chosen the right functions for the task.
I just had to solve this same problem a week ago.
This method works even when you can't always sort on the grouping column (J in your case). If you can keep the data sorted, #MikeD 's solution will scale better.
Firstly, do you know the FREQUENCY trick for counting unique numbers? FREQUENCY is designed to create histograms. It takes two arrays, 'data' and 'bins'. It sorts 'bins', then creates an output array that's one longer than 'bins'. Then it takes each value in 'data' and determines which bin it belongs in, incrementing the output array accordingly. It returns the array. Here's the important part: If a value appears in 'bins' more than once, any 'data' value meant for that bin goes in the first occurrence. The trick is to use the same array for both 'data' and 'bins'. Think it through, and you'll see that there's one non-zero value in the output for each unique number in the input. Note that it only counts numbers.
In short, I use this:
=SUM(SIGN(FREQUENCY(<array>,<array>)))
to count unique numeric values in <array>
From this, we just need to construct arrays containing numbers where appropriate and text elsewhere.
In the example below, I'm counting unique days when the color is red and the fruit is citrus:
This is my conditional array, returning 1 or true for the rows I'm interested in:
($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0))
Note that this requires ctrl-shift-enter to be used as an array formula.
Since the value I'm grouping by for uniqueness is text (as is yours), I need to convert it to numeric. I use:
MATCH($C$2:$C$10,$C$2:$C$10,0)
Note that this also requires ctrl-shift-enter
So, this is the array of numeric values within which I'm looking for uniqueness:
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),"")
Now I plug that into my uniqueness counter:
=SUM(SIGN(FREQUENCY(<array>,<array>)))
to get:
=SUM(SIGN(FREQUENCY(
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),""),
IF(($A$2:$A$10="red")*ISNUMBER(MATCH($B$2:$B$10,{"orange","grapefruit","lemon","lime"},0)),MATCH($C$2:$C$10,$C$2:$C$10,0),"")
)))
Again, this must be entered as an array formula using ctrl-shift-enter. Replacing SUM with SUMPRODUCT will not cut it.
In your example, you'd use something like:
=SUM(SIGN(FREQUENCY(
IF(ISNUMBER(MATCH($C$3:$C$94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"},0)),MATCH($J$3:$J$94735,$J$3:$J$94735,0),""),
IF(ISNUMBER(MATCH($C$3:$C$94735,{"Sharable Content Object Reference Model 1.2","Authored SCORM/AICC content","Authored External Web Content"},0)),MATCH($J$3:$J$94735,$J$3:$J$94735,0),"")
)))
I'll note, though, that scaling might be a problem on data sets as large as yours. I tested it on larger data sets, and it was fairly fast on the order of 10k rows, but really slow on the order of 100k rows, such as yours. The internal arrays are plenty fast, but the FREQUENCY function slows down. I'm not sure, but I'd guess it's between O(n log n) and O(n^2) depending on how the sort is implemented.
Maybe this doesn't matter - none of this is volatile, so it'll just need to calculate once upon refreshing the data. If the column data is changing, though, this could be painful.
Asuming the source data is sorted by the key value [A], start with determining the occurence of the key column
B2: =IF(A2=A1;B1+1;1)
Next determine a group sum
C2: =SUMIF($A$2:$A$9;A2;$B$2:$B$9)
A key is unique if its group sum is exactly 1
D2: =(C2=1)
To count records which match a certain criterium AND are unique, include column D in a =IF(AND(D2, [yourcondition];1;0) and sum this column
Another option is to asume a key unique within a sorted list if it is unequal to both its predecessor and successor, so you could find the unique records like
E2: =AND(A2<>A1;A2<>A3)
G2: =IF(AND(E2;F2="this");1;0)
E and G can of course be combined into one single formula (not sure though if that helps ...)
G2(2): =IF(AND(AND(A2<>A1;A2<>A3);F2="this");1;0)
resolving unnecessarily nested AND's:
G2(3): =IF(AND(A2<>A1;A2<>A3;F2="this");1;0)
all formulas in row 2 should be copied down to the end of the list

Resources