How to find the 3 highest values and respective category for a cell - excel

Here is an example of the data I'm trying to organize:
I'm looking for a way to automatically see the top 3 categories (column) for each Name# (row). The size of the category is determined by the number below the category.
Ideally, I'd also like to see a percentage breakdown (from the total) for each category. For example, in row "Name3" 2 categories make up a significantly larger portion of the total values. However, without this percentage breakdown, the 3 top values would seem to be comparable, when they are in fact, not.
Interested to see how this would all work with duplicate numbers, too.
I've tried Excel's rank function, but this doesn't tell me the categories that have the 3 largest sizes, just the 3 highest values.

With Office 365:
=FILTER(SORTBY($B$1:$H$1,B2:H2,-1),SORT(B2:H2,1,-1,TRUE)>=LARGE(B2:H2,3))
And copy down.
If there are ties, the result expands to include them: the formula finds the third-highest value and returns every category whose value is equal to or greater than it.
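The tie behaviour can be checked outside Excel with a minimal Python sketch of the same logic. The function name top_n_with_ties and the sample data are hypothetical, not part of the answer; unlike LARGE, the sketch simply returns everything when there are fewer than n values.

```python
def top_n_with_ties(headers, values, n=3):
    """Return (header, value) pairs whose value is >= the n-th largest value."""
    if len(headers) != len(values):
        raise ValueError("headers and values must have the same length")
    # Sort descending by value, like SORTBY(headers, values, -1).
    ranked = sorted(zip(headers, values), key=lambda hv: hv[1], reverse=True)
    if len(ranked) <= n:
        return ranked
    # The n-th largest value plays the role of LARGE(values, n).
    threshold = ranked[n - 1][1]
    # Keep everything at or above the threshold, so ties expand the result.
    return [hv for hv in ranked if hv[1] >= threshold]


# Hypothetical row: Red and Green tie for third place, so four items come back.
print(top_n_with_ties(["Red", "Blue", "Green", "Yellow"], [5, 9, 5, 7]))
```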

This approach spills all the results at once (array version). In cell J2, you can put the following formula:
=LET(D, A1:H5, A, TAKE(D,,1), DROP(REDUCE("", DROP(A,1), LAMBDA(ac,aa,
VSTACK(ac, TAKE(SORT(DROP(FILTER(D, (A=aa) + (A="")),,1),2,-1,1),1,3)))),1))
It assumes that cell A1 is empty, as in the input data (if it is not, the formula can be adjusted accordingly). Here is the output:
An alternative that doesn't rely on that assumption (although it is not a demanding one) is the following:
=LET(names, A2:A5, Data, B2:H5, colors, B1:H1, DROP(REDUCE("", names,
LAMBDA(ac,n, VSTACK(ac, TAKE(SORT(VSTACK(colors, INDEX(Data, XMATCH(n,names),0))
,2,-1,TRUE),1,3)))),1))
The non-array version can be derived from the previous approach; enter it in the first row and copy it down:
=TAKE(SORT(VSTACK($B$1:$H$1,INDEX($B$2:$H$5, XMATCH(A2,$A$2:$A$5),0)),2,-1,TRUE),1,3)
Explanation
To spill the entire solution, it uses the DROP/REDUCE/VSTACK pattern. Check my answer to the following question: how to transform a table in Excel from vertical to horizontal but with different length.
For the first formula, for each name (aa) in column A we use FILTER on the input data (D) to keep the rows where the name is empty (to retain the header row) OR (the + between the conditions) the name equals aa. DROP then removes the first column of the filtered result (the names column). Next, SORT orders the columns by the second row (the first row holds the colors) in descending order (-1); the last argument of SORT, TRUE or 1, makes it sort by column. Finally, TAKE keeps the first row and the first three columns.
For the second approach, we locate the row for a given name (names equals n) with XMATCH and use INDEX with column index 0 to return the entire row. VSTACK then puts the colors on top as the first row, and the same logic as in the previous approach sorts the columns and takes the first row and the first three columns (the colors).
Notes:
If the VSTACK function is not available, you can replace it as follows: CHOOSE({1;2}, arr1, arr2), substituting arr1 and arr2 with the corresponding arrays.
In the second formula, instead of INDEX/XMATCH you can use DROP(FILTER(Data, names=n),,1); it is a matter of personal preference.
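As a cross-check of the per-row logic, here is a minimal Python sketch that walks the table the way the REDUCE loop does. The name top3_per_row and the sample table are hypothetical. The percentage column is an addition prompted by the question (share of the row total); the formulas above return only the category names.

```python
def top3_per_row(table):
    """table[0] is the header row (first cell unused); other rows are name + values."""
    header, *rows = table
    categories = header[1:]
    result = []
    for name, *values in rows:
        total = sum(values)
        # Pair each category with its value and sort descending by value.
        ranked = sorted(zip(categories, values), key=lambda cv: cv[1], reverse=True)[:3]
        # Attach each category's share of the row total, in percent.
        with_share = [
            (cat, val, round(100 * val / total, 1) if total else 0.0)
            for cat, val in ranked
        ]
        result.append((name, with_share))
    return result


# Hypothetical data in the same layout as the question (names down, categories across).
sample = [
    ["", "A", "B", "C", "D"],
    ["Name1", 10, 40, 30, 20],
]
for name, top in top3_per_row(sample):
    print(name, top)
```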

Related

how to sum columns using column headers and bridge tables in excel

I have two sets of data in Excel: set 1 is the raw data, and set 2 is a bridge table. The desired output is also added. How should I set up a formula for this?
set 1:
set 2:
output expected:
Here is a solution that assumes a variable number of headers and no specific pattern in the column names. It also assumes no Excel version constraints, as per the tags listed in the question. In cell H1, put the following formula, which spills the entire result at once:
=LET(in, A1:F5, lk, A8:B12, header, DROP(TAKE(in,1),,1), A, TAKE(lk,,1),
B, DROP(lk,,1), data, DROP(in,1,1), REDUCE(TAKE(in,,1), UNIQUE(B),
LAMBDA(ac,bb, LET(f, FILTER(A, B=bb),values, CHOOSECOLS(data,XMATCH(f, header)),
sum, MMULT(values, SEQUENCE(ROWS(f),,1,0)), HSTACK(ac, VSTACK(bb, sum))))))
Here is the output:
We use the LET function with only two input ranges, in and lk; all the other names are derived from them. This makes the formula easy to maintain and to adapt to your real scenario.
Using DROP and TAKE we extract each portion of the input ranges: header, data, A, B (the last two are the columns of the second table). We use the REDUCE/HSTACK pattern to append one column to the result on each iteration. Check my answer to the question: how to transform a table in Excel from vertical to horizontal but with different length for more information.
We iterate over the unique values of B, and for each value (bb) we select the matching column A values (f). We use XMATCH to find the corresponding column indexes in header (which doesn't include the date column), and CHOOSECOLS to select those columns from data (values). Now we need to add up the selected columns for each row, and we use MMULT with a column of ones for that. The result is stored in the name sum. Finally, we use HSTACK to append the new column on each iteration, with the unique value from B as its header.
Note: Instead of the MMULT function, you can use the following array function; it is a matter of personal preference:
BYROW(values, LAMBDA(x, SUM(x)))
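The grouping step can be illustrated with a minimal Python sketch of the same idea. The function name sum_by_group and the sample data are hypothetical; the bridge table is assumed to be a list of (column name, group) pairs, as in set 2.

```python
def sum_by_group(header, data, bridge):
    """For each group in the bridge table, add up its columns row by row.

    header: column names of the data (without the first, date-like column)
    data:   rows of numbers aligned with header
    bridge: list of (column_name, group) pairs
    """
    # Unique groups in first-seen order, like UNIQUE(B).
    groups = []
    for _, group in bridge:
        if group not in groups:
            groups.append(group)

    result = {}
    for group in groups:
        # Positions of this group's columns in the header, like XMATCH(f, header).
        cols = [header.index(name) for name, grp in bridge if grp == group]
        # One total per data row, like MMULT(values, ones) or BYROW(values, SUM).
        result[group] = [sum(row[c] for c in cols) for row in data]
    return result


header = ["a1", "a2", "b1"]
data = [[1, 2, 3], [4, 5, 6]]
bridge = [("a1", "A"), ("b1", "B"), ("a2", "A")]
print(sum_by_group(header, data, bridge))
```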
You could try SUMIFS with the wild card character for each row. For example, for the first column, put the following formula and drag it down.
=SUMIFS($B2:$F2,$B$1:$F$1,"=A*")
Then do the same thing for the other columns, e.g. for column B:

Function to search for specific number and then to further search for the prefix

I have a huge amount of data to process in which 4 points with a related prefix needs to be subtracted from each other.
Data consists of ID and x value
Example
ID = 290.12, 290.03, 290.06, 290.09, 300.12, 300.03, 300.06, 300.09, 301.12, 301.03, 301.06, 301.09
(let's call prefix a "ring number" and suffix time on the clock)
X value = any numerical value for each ID assigned
What I'm hoping to do is to search for the first number before the dot i.e. 300 and then subtract the value of 300.06-300.12 in one cell and in another cell 300.03-300.09.
(The subtraction is just an example, how I need to manipulate with the numbers is slightly more complicated, but I got this one under control)
This is my actual Data and what I need to produce is to the right of the raw data. At the moment, I'm doing it manually for each set of "rings"
Anyone knows how to approach this? I'm thinking vlookup, but I'm not very proficient in excel.
New Excel
I tried vlookup, but I don't know how to construct the formula and I run out of ideas.
Edit:
I found out that REDUCE is not required in this case, so the formula can be shortened to:
=SQRT(SUM(((INDEX(B:D,XMATCH(I3+0.09,A:A),SEQUENCE(1,3))-INDEX(B:D,XMATCH(I3+0.03,A:A),SEQUENCE(1,3)))^2)))
You could change +0.09 and +0.03 to suit your needs, and define them with LET() for easier maintenance:
=LET(id,I3,
_id1,0.09,
_id2,0.03,
SQRT(SUM(((INDEX(B:D,XMATCH(id+_id1,A:A),SEQUENCE(1,3))-INDEX(B:D,XMATCH(id+_id2,A:A),SEQUENCE(1,3)))^2))))
Previous answer:
=LET(
id,I3,
_id1,0.09,
_id2,0.03,
SQRT(
REDUCE(0, SEQUENCE(1,3),
LAMBDA(x, y,
x+((INDEX(B:D,XMATCH(id+_id1,A:A),y)
-INDEX(B:D,XMATCH(id+_id2,A:A),y))
^2)))))
This formula looks up the row matching id (I3) + _id1 and the row matching id + _id2, subtracts them column by column for columns B to D, and adds up the squared differences. It then takes the square root of that sum.
You can change _id1 and _id2 to your needs.
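The calculation is the straight-line distance between two looked-up points. A minimal Python sketch, assuming three values per ID as in columns B:D; the name ring_distance, the dictionary layout and the sample values are hypothetical. The points are keyed by (ring, clock position) integer pairs rather than by decimals such as 300.09, a choice made here to avoid floating-point matching issues.

```python
import math


def ring_distance(points, ring, suffix_a=9, suffix_b=3):
    """Distance between two points of the same ring.

    points maps (ring, clock_position) to a tuple of three values,
    e.g. (300, 9) stands for ID 300.09.
    """
    a = points[(ring, suffix_a)]
    b = points[(ring, suffix_b)]
    # Sum of squared per-column differences, then the square root.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Made-up values for illustration only.
points = {
    (300, 9): (1.0, 2.0, 2.0),
    (300, 3): (0.0, 0.0, 0.0),
}
print(ring_distance(points, 300))
```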
To calculate the Delta (as shown) at once you could use:
=LET(id,I3,
_id1,0.09,
_id2,0.03,
_id3,0.12,
_id4,0.06,
x,SQRT(SUM((INDEX(B:D,XMATCH(id+_id1,A:A),SEQUENCE(1,3))-INDEX(B:D,XMATCH(id+_id2,A:A),SEQUENCE(1,3)))^2)),
y,SQRT(SUM((INDEX(B:D,XMATCH(id+_id3,A:A),SEQUENCE(1,3))-INDEX(B:D,XMATCH(id+_id4,A:A),SEQUENCE(1,3)))^2)),
(x-y)*1000)
You can keep a column with the unique integer prefixes and, in a new column, reference each of these values as id and drag the formula down to get a row-by-row result.
In another column you can refer to these two columns and sort by the second one using SORTBY().

Excel: Finding max value based on two criteria, if max value has two identical results

If we have a table of sports teams, that have all played against each other and two teams are on top equal with x points, the winner would be crowned with the highest average goal differential.
But how would you do that with a formula in Excel?
This is the formula I am using to find the team with the highest points total:
=INDEX($C$8:$C$11,MATCH(MAX($K$8:$K$11),$K$8:$K$11,0))
This formula would give me the result of Ecuador being highest (first result of max value).
But in reality Qatar should be on top based on same points total AND average goal difference being higher.
Any solutions?
I assume you want to sort in descending order first by column P, then by column GD. If that is not the case, you can adjust it accordingly. In cell A8 you can put the following formula:
=TAKE(SORT(A3:I6, {9,8}, -1),2,1)
Here is the output:
You can also select the columns of your interest first via CHOOSECOLS then SORT:
=TAKE(SORT(CHOOSECOLS(A3:I6,1,8,9), {3,2}, -1),2,1)
If you want to include the row title, then:
=HSTACK({"First Place";"Second Place"},TAKE(SORT(A3:I6, {9,8}, -1),2,1))
Note: MATCH/XMATCH is not appropriate in this context, because what you really need is to sort the result, not to find values that match certain conditions. You can do it, but you will end up implementing a sort manually, and Excel has a built-in function for that. The resulting formula will be unnecessarily verbose.
With the SORT function you can use more than one criterion by passing an array of the columns to use as first criterion, second criterion, and so on. For your example it would be {9,8}, meaning column 9 is the first sorting criterion and column 8 the second, so if two rows have the same value in column P, they are sorted by GD. The third input argument specifies ascending (1, the default) or descending (-1) order. If you want a different order for each column, use an array instead of -1, for example {-1,1}, which sorts column 9 in descending order and column 8 in ascending order. For your case you can also use {-1,-1}, but a plain -1 produces the same result.
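The two-key sort can be pictured with a minimal Python sketch. The name top_two and the numbers below are made up for illustration; each team is assumed to be a (name, goal difference, points) tuple.

```python
def top_two(teams):
    """Return the names of the best two teams.

    Teams are ordered by points first and goal difference second,
    both descending, mirroring SORT(range, {9,8}, -1).
    """
    ranked = sorted(teams, key=lambda t: (t[2], t[1]), reverse=True)
    return [name for name, _gd, _pts in ranked[:2]]


# Made-up table: Ecuador and Qatar tie on points, Qatar has the better goal difference.
teams = [
    ("Ecuador", 1, 6),
    ("Qatar", 3, 6),
    ("Senegal", 0, 3),
    ("Netherlands", -4, 1),
]
print(top_two(teams))
```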

Excel, convert data in one column to multiple columns

As seen in the picture I have 5 sets of 2's in one column.
I would like it so that each set is in its own column.
Is there a way to do that?
I tried text to columns, but it did not work.
General solution
Imagine I have a vertical array starting in cell B2, which I want to separate into N stacked columns. I will place these columns from cell E4, as the picture indicates.
The code which achieves what I want is:
=OFFSET($B$2,(ROW()-ROW($E$4))*N+(COLUMN()-COLUMN($E$4)),0)
Replace N with your desired number (and the origin and destination cell with your particular values, B2 and E4 in this example), and expand the formula vertically and horizontally to form your desired matrix of N columns. For the case of N=3, you get:
(PS: if your array is horizontal, use transpose to transform to vertical. You can then transpose the resulting matrix, to get the final result.)
Explanation
The logic is simple. The function OFFSET has three compulsory inputs. The first one is the first cell of the array you want to transform (in the example above, $B$2). The cell you select has an index of 0, the one below it an index of 1, and so on. So, what you want is to arrange these ordered indexes in matrix form, as shown below (for the case of N=3):
The rule for moving through these indexes is given by the second argument of the OFFSET function. It is a formula that produces the sequence 0, 1, 2, 3, ... from some fixed values (the row and column numbers of the first cell where you are putting the result, ROW($E$4) and COLUMN($E$4), which are equal to 4 and 5 respectively) and the variable values of the cell that holds the formula (ROW() and COLUMN()). The formula computes the difference between the current row and the reference row, scales it by N, and adds the difference between the current column and the reference column. This gives the desired series 0, 1, 2, 3, ... across the output matrix.
Finally, the last argument of OFFSET is zero, since we are reading from a single vertical column of data, so no horizontal offset is needed.
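The index arithmetic can be tried out with a minimal Python sketch. The name reshape_row_major is hypothetical, and None is used here as a stand-in for the blank cells that OFFSET would read past the end of the data.

```python
def reshape_row_major(values, n):
    """Lay a flat list out in n columns, filling each row before the next.

    The source index for output cell (r, c) is r * n + c, the same
    expression as (ROW()-ROW($E$4))*N + (COLUMN()-COLUMN($E$4)).
    """
    rows = -(-len(values) // n)  # ceiling division
    return [
        [values[r * n + c] if r * n + c < len(values) else None for c in range(n)]
        for r in range(rows)
    ]


print(reshape_row_major(list(range(7)), 3))
```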
You can do it with a formula, for example; enter this in C1 and fill down and right:
=OFFSET($A$1,ROW()-1+(COLUMN()-3)*6,0)
Take the total number of cells, divide it by 3, and cut and paste. I wasted 30 minutes trying all the solutions offered out there.
I gave up on them, and now my project is complete. It only took about 15 seconds.
To split one column into multiple columns in column-first order (that is, without transposing), we can modify the formula shown in https://www.extendoffice.com/documents/excel/3132-excel-convert-vector-to-matrix.html, which solves the row-first case (with transposing), by exchanging the roles of ROW() and COLUMN(). Example code:
=OFFSET($A$1:$A$10494,ROW()-ROW($B$1)+((COLUMN()-COLUMN($B$1))*(ROWS($A$1:$A$10494)/18)),0,1,1)
Here $A$1:$A$10494 is the source, $B$1 is the destination, and 18 is the number of columns to split into.
This can be used to get back the table structure of %debug print output in pdb, for example, which will split the output into narrow bands.
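For comparison, the column-first layout can be sketched in Python as well. The name reshape_column_major is hypothetical, and the sketch assumes the length divides evenly by the number of columns, just as the formula relies on ROWS(source)/18 being a whole number.

```python
def reshape_column_major(values, n_cols):
    """Lay a flat list out in n_cols columns, filling each column before the next.

    The source index for output cell (r, c) is r + c * rows_per_col,
    matching ROW()-ROW(dest) + (COLUMN()-COLUMN(dest)) * (ROWS(source)/n_cols).
    """
    if len(values) % n_cols != 0:
        raise ValueError("length must be a multiple of n_cols")
    rows_per_col = len(values) // n_cols
    return [
        [values[r + c * rows_per_col] for c in range(n_cols)]
        for r in range(rows_per_col)
    ]


print(reshape_column_major(list(range(6)), 3))
```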

Sort Order formula to alphabetise in Excel

I am currently drawing up a spreadsheet that will automatically remove duplicates and alphabetize a list:
I am using the COUNTIF() function in column G to create a sort order and then VLOOKUP() to find the sort in column J.
The problem I am having is that I can't seem to get my SortOrder column to function properly. At the moment it creates an index for two number 1's meaning the cell highlighted in yellow is missed out and the last entry in the sorted list is null:
If anyone can find and rectify this mistake for me I'll be very grateful as it has been driving me insane all day! Many thanks.
I'll provide my usual method for doing an automatic pulling-in of raw data into a sorted, duplicate-removed list:
Assume the raw data is in column A. In column B, use this formula to increase the counter each time the row shows a non-duplicate item in column A. Hard-code B2 to be "1", then enter this formula in B3 and drag it down.
=if(iserror(match(A3,$A$2:A2,0)),B2+1,B2)
This takes advantage of the fact that when we refer to this row counter in our revised list, we will use the match function, which only returns the first matching row. Then say you want your new list of data in column D (usually I do this for display purposes, so either 'group-out' [hide] the columns that hold the formulas, or do this on another tab). You can avoid this step, but if you are already using helper columns I usually do each step in a different column, which is easier to document. In column C, starting in C3 [C2 hardcoded to 1] and dragged down, just have a simple counter that stops at the end of your list:
=if(C2<max(B:B),C2+1," ")
Then in column D, starting at D2 and dragged down:
=iferror(index(A:A,match(C2,B:B,0)),"")
The index function is like half of the vlookup function - it pulls the result out of a given array, when you provide it with a row number. The match function is like the other half of the vlookup function - it provides you with the row number where an item appears in a given array.
Hope this helps you in the future as well.
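To see what the three helper columns produce, here is a minimal Python sketch that mirrors them step by step. The name distinct_first_seen is hypothetical. As written, the helper columns return the distinct items in order of first appearance, so the sketch does the same and leaves any sorting as a separate step.

```python
def distinct_first_seen(raw):
    """Mirror the helper columns: B is a running count of new items,
    C counts 1..max(B), and D looks up the first row where B equals C."""
    if not raw:
        return []
    counter = []  # column B
    for i, item in enumerate(raw):
        if i == 0:
            counter.append(1)                # B2 is hard-coded to 1
        elif item in raw[:i]:
            counter.append(counter[-1])      # duplicate: keep the count
        else:
            counter.append(counter[-1] + 1)  # new item: increase the count
    # Column D: index(A:A, match(k, B:B, 0)) for k = 1..max(B).
    return [raw[counter.index(k)] for k in range(1, max(counter) + 1)]


print(distinct_first_seen(["b", "a", "b", "c", "a"]))
```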
The actual reason that this is going wrong, as implied by Jeeped's comment, is that you can't meaningfully compare a string to a number without a conversion, because they are stored differently. So COUNTIF counts numbers and text separately.
20212 will give a count of 1 because it is the only (or lowest) number.
CS10Z002 will give a count of 1 because it is the first text string in alphabetical order.
Another approach is to add the count of numbers to the count if the current cell contains text:-
=COUNTIF(INDIRECT("$D$2:$D$"&$F$3),"<="&D2)+ISTEXT(D2)*COUNT(INDIRECT("$D$2:$D$"&$F$3))
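The idea behind this formula (rank within the same type, then push text after all numbers) can be sketched in Python. The name sort_order and the sample values are hypothetical, and lower-casing the strings is only an approximation of how Excel compares text.

```python
def sort_order(values):
    """Rank each value so that numbers come first and text follows.

    Numbers are ranked among numbers and text among text, as COUNTIF does;
    text then gets the count of numbers added, like ISTEXT(x)*COUNT(range).
    """
    numbers = [v for v in values if isinstance(v, (int, float))]
    texts = [v for v in values if isinstance(v, str)]

    def rank(v):
        if isinstance(v, str):
            same_type = sum(t.lower() <= v.lower() for t in texts)
            return same_type + len(numbers)
        return sum(n <= v for n in numbers)

    return [rank(v) for v in values]


print(sort_order([20212, "CS10Z002", "AB1", 5]))
```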
It's easier to show the result of three different conversions with some test data:-
(0) No conversion - just use COUNTIF
=COUNTIF(D$2:D$7,"<="&D2)
"999"<"abc"<"def", 999<1000
(1) Count everything as text
=SUMPRODUCT(--(D$2:D$7&""<=D2&""))
"1000"<"999"
(2) Count numbers before text
=COUNTIF(D$2:D$7,"<="&D2)+ISTEXT(D2)*COUNT(D$2:D$7)
999<1000<"999"
(3) Count everything as text but convert numbers with leading zeroes
=SUMPRODUCT(--(TEXT(D$2:D$7,"000000")<=TEXT(D2,"000000")))
"000999" = "000999", "000999"<"001000"

Resources