Excel find and count unique values in 2 dimensional array - excel-formula

I have arrays of excel data 3 cols x 300 rows - example
A
B
C
UNIT_TESTING
REMOTE_TEAM_AVATARS
SOCIAL_TIME
SOCIAL_TIME
ELIMINATE_LONG_LIVED_FEATURE_BRANCHES
There will be blanks in some rows.
My goal among the 900 individual cells, find the unique values.
Once I have the values displayed I want to count how many instances there of each unique value.
In the trivial case above the result would be:
A
B
SOCIAL_TIME
2
ELIMINATE_LONG_LIVED_FEATURE_BRANCHES
1
...
In an ideal world I want to avoid creating a mid calc column of 900 elements

With Office 365 we can use UNIQUE, FILTER and SEQUENCE to get the desired output:
=LET(
rng, A1:C2,
clm, COLUMNS(rng),
ct, ROWS(rng)*clm,
arr, INDEX(rng,INT(SEQUENCE(ct,,1,1/clm)),MOD(SEQUENCE(ct,,0),clm)+1),
flt, FILTER(arr,arr<>""),
unq, UNIQUE(flt),
SORT(CHOOSE({1,2},unq,COUNTIF(rng,unq)),{2,1},{-1,1}))

Related

Sum of the greatest value in one column, plus the sum of the other values in another column

Consider the following sheet/table:
A B
1 90 71
2 40 25
3 60 16
4 110 13
5 87 82
I want to have a general formula in cell C1 that sums the greatest value in column A (which is 110), plus the sum of the other values in column B (which are 71, 25, 16 and 82). I would appreciate if the formula wasn't an array formula (as in requiring Ctrl + Shift + Enter). I don’t have Office 365, I have Excel 2019.
My attempt
Getting the greatest value in column A is easy, we use MAX(A1:A5).
So the formula I want in cell C1 should be something like:
=MAX(A1:A5) + SUM(array_of_values_to_be_summed)
Obtaining the values of the other rows in column B (what I called array_of_values_to_be_summed in the previous formula) is the hard part. I've read about using INDEX, MATCH, their combination, and obtaining arrays by using parenthesis and equal signs, and I've tried that, without success so far.
For example, I noticed that NOT((A1:A5 = MAX(A1:A5))) yields an array/list containing ones (or TRUEs) for the relative position of the rows to be summed, and containing a zero (or FALSE) for the relative position of the row to be omitted. Maybe this is useful, I couldn't find how.
Any ideas? Thanks.
Edit 1 (solution)
I managed to obtain what I wanted. I simply multiplied the array obtained with the NOT formula, by the range B1:B5. The final formula is:
=MAX(A1:A5) + SUM(NOT((A1:A5 = MAX(A1:A5))) * B1:B5)
Edit 2 (duplicate values)
I forgot to explain what the formula should do if there are duplicates in column A. In that case, the first term of my final formula (the term that has the MAX function) would be the one whose corresponding value in column B is smallest, and the value in column B of the other duplicates would be used in the second term (the one containing the SUM function).
For example, consider the following sheet/table:
A B
1 90 71
2 110 25
3 60 16
4 110 13
5 110 82
Based on the above table, the formula should yield 110 + (71 + 25 + 16 + 82) = 304.
Just to give context, the reason I want such a formula is because I’m writing a spreadsheet that automatically calculates the electric current rating of the short-circuit protective device of the feeder of a group of electric motors in a house or building or mall, as required by the article 430.62(A) of the US National Electrical Code. Column A is the current rating of the short-circuit protective device of the branch-circuit of each motors, and column B is the full-load current of each motor.
You can use this formula
=MAX(A1:A5)
+SUM(B1:B5)
-AGGREGATE(15,6,(B1:B5)/(A1:A5=MAX(A1:A5)),1)
Based on #Anupam Chand's hint for max-value-duplicates there could also be min-value-duplicates in column B for corresponding max-value-duplicates in column A. :) This formula would account for that
=SUM(B1:B5)
+(MAX(A1:A5)-AGGREGATE(15,6,(B1:B5)/(A1:A5=MAX(A1:A5)),1))
*SUMPRODUCT((A1:A5=MAX(A1:A5))*(B1:B5=AGGREGATE(15,6,(B1:B5)/(A1:A5=MAX(A1:A5)),1)))
Or with #Anupam Chand's shorter and better readable and overall better style :)
=SUM(B1:B5)
+(MAX(A1:A5)-MINIFS(B1:B5,A1:A5,MAX(A1:A5)))
*COUNTIFS(A1:A5,MAX(A1:A5),B1:B5,MINIFS(B1:B5,A1:A5,MAX(A1:A5)))
The explanation works for bot solutions:
The SUM-part just sums the whole list.
The second line gets the max-value for column A and the corresponding min-value of column B for the max-values in column A and adds or subtracts it respectively.
The third line counts, how many times the corresponding min-value for the max-value occurs and multiplies it with the second line.
Can you try this ?
=MAX(A1:A5)+SUM(B1:B5)-MINIFS(B1:B5,A1:A5,MAX(A1:A5))
What we're doing is adding the max of A to all rows of B and then subtracting the min value of B where A is the max.
If you have Excel 365 you can use the following LET-Formula
=LET(A,A1:A5,
B,B1:B5,
MaxA,MAX(A),
MinBExclude, MINIFS(B,A,MaxA),
sumB1,SUMPRODUCT(B*(A=MaxA)*(B<>MinBExclude)),
sumB2,SUMPRODUCT(B*(A<>MaxA)),
MaxA +sumB1+sumB2
A and B are shortcuts for the two ranges
MaxA returns the max value for A (110)
MinBExclude filters the values of column B by the MaxA-value (25, 13, 82) and returns the min-value of the filtered result (13)
sumB1 returns the sum of the other MaxA values from column B (26 + 82)
sumB2 returns the sum of the values from B where value in A <> MaxA (71 + 60)
and finally the result is returned
If you don't have Excel 365 you can add helper columns for MaxA, MinBExclude, sumB1 and sumB2 and the final result

Count Unique Dates Associated with Location

I am trying to count the total of Unique Dates based on the location.
Context: I trying to create a formula for counting the number of unique dates based on location. My Spreadsheet looks like this
A B C
1 **Participant Location Date**
2 Participant-A High School X 11/7
3 Participant-B High School X 11/7
4 Participant-C High School X 11/8
5 Participant-E High School Y 11/7
6 Participant-F High School Z 11/7
7 Participant-G High School Z 11/8
So for example: high School X had 2 different dates. What would the formula be to count the unique dates based on the location?
This is also being completed on google sheets.
Thank you!
Another way (with no helper columns) would be to use query() and unique().
=query(unique(B:C), "Select Col1, count(Col2) where Col1 <>'' group by Col1 label count(Col2)'# of unique dates'", 1)
Illustration:
With a simple helper column :
=1/COUNTIFS($A$2:$A$7,A7,$B$2:$B$7,B7)
And to get your results :
=SUMIF($A$2:$A$7,E2,$C$2:$C$7)
This is not one-formula solution but I think it works. First, create a third column concatenating the columns that you want to compare. In this case, at cell D2 write:
=CONCATENATE(B2,C2)
This is for the first row of your example. Then, replicate that to the following rows.
Finally, create a formula that counts unique values:
=SUM(IF(FREQUENCY(IF(LEN(D2:D7)>0,MATCH(D2:D7,D2:D7,0),""), IF(LEN(D2:D7)>0,MATCH(D2:D7,D2:D7,0),""))>0,1))
Assuming your new column of concatenated values is at D2:D7.

How do you group data in columns?

I have numeric data under fifty samples that are mostly similar. I want to count identical columns and give statistics on the same. There are too many rows to select them (37,888). Data looks like:
Sample 1 Sample 2 Sample 3 ........ Sample 50
4 4 0
4 4 0
4 4 ...
0 0
0 0
0 0
0 0
... ...
upto thousands of rows for each sample.
There is a column for date/time as well, would be nice if I could include that in the grouping.
In this snippet, there are many rows. Sample 1 and 2 are identical hence should be grouped together. Sample three would form another group and so on.
While I'm not sure what "There are too many rows to select them" means in this context (there is no limit on the number of rows or items that can be selected and included in a formula), this looks like a job for array formulas.
If you want to determine (for instance) whether columns C and D are equal, from rows 1 through 37888, you can use this formula:
=AND(C1:C37888=D1:D37888)
To make Excel treat this as an array formula, you need to press CTRL-SHIFT-ENTER (Windows) or CMD-ENTER (Mac) after typing the formula. The "AND" function will return TRUE if and only if all corresponding entries are equal: C1=D1, C2=D2, C3=D3, ..., C37888=D37888. It returns FALSE if any corresponding entries disagree.
Exactly what you do next will depend on the nature of the statistics that you want to compute for each group, but this formula will at least help you figure out which columns belong in the same group together.

Find the top n values in a range while keeping the sum of values in another range under x value

I'd like to accomplish the following task. There are three columns of data. Column A represents price, where the sum needs to be kept under $100,000. Column B represents a value. Column C represents a name tied to columns A & B.
Out of >100 rows of data, I need to find the highest 8 values in column B while keeping the sum of the prices in column A under $100,000. And then return the 8 names from column C.
Can this be accomplished?
EDIT:
I attempted the Solver solution w/ no luck. 200 rows looks to be the max w/ Solver, and that is what I'm using now. Here are the steps I've taken:
Create a column called rank RANK(B2,$B$2:$B$200) (used column D -- what is the purpose of this?)
Create a column called flag just put in zeroes (used column E)
Create 3 total cells total_price (=SUM(A2:A200)), total_value (=SUM(B2:B200)) and total_flag (=(E2:E200))
Use solver to minimize total_value (shouldn't this be maximize??)
Add constraints -Total_price<=100000 -Total_flag=8 -Flag cells are binary
Using Simplex LP, it simply changes the flags for the first 8 values. However, the total price for the first 8 values is >$100,000 ($140k). I've tried changing some options in the Solver Parameters as well as using different solving methods to no avail. I'd like to post an image of the parameter settings, but don't have enough "reputation".
EDIT #2:
The first 5 rows looks like this, price goes down to ~$6k at the bottom of the table.
Price Value Name Rank Flag
$22,538 42.81905675 Blow, Joe 1 0
$22,427 37.36240932 Doe, Jane 2 0
$17,158 34.12127693 Hall, Cliff 3 0
$16,625 33.97654031 Povich, John 4 0
$15,631 33.58212402 Cow, Holy 5 0
I'll give you the solver solution as a starting point. It involves the creation of some extra columns and total cells. Note solver is limited in the amount of cells it can handle but will work with 100 anyway.
Create a column called rank RANK(B2,$B$2:$B$100)
Create a column called flag just put in zeroes
Create 3 total cells total_price, total_value and total_flag
Use solver to minimize total_value
Add constraints
-Total_price<=100000
-Total_flag=8
-Flag cells are binary
This will flag the rows you want and you can grab the names however you want.

How do I perform a monte carlo simulation in Open Office?

I am trying to generate some ranges for a problem I am working on. These rangers are going to be based on the sum of the ratio's of a bunch of numbers. So for example, the constant's are 5 6 and 7.
The ranges I get will be 5/x + 6/y + 7/z = S
I want x, y, and z to come out of a list of numbers I have - say .5, .6, .7, .8, .9, and 1
So If I run 100 iterations of this, I want the spreadsheet to randomly fill a value in X from that list of numbers, another random selection for y, and yet another for z.
And like I said, I want that sum, S, to be calculated 100 times in such a way that I will get a range of values for S.
I have been trying to figure out how to do this without the use of macros.
Here's one way to do it. Create a table of x, y, and z input values. Put a column to the left of the table with the number of each input value (1...N). Say that you have 10 potential input values for each. So your table is in A1:D10 with 1 through 10 in column A and the x values in B, y values in C, and z valued in D.
Then you can select a random value of the x values by writing =VLOOKUP(10*RAND()+1,$A$1:$D$10,2,TRUE). This randomly selects a number between 0 and 10 and looks up the x value matching the A column that matches the number, rounded down. E.g. the random number is 4.3 -- then it will select the 4th value. Replace the third parameter in the VLOOKUP column with 3 for y values and 4 for z values...
If you don't have any other data in columns A:D, you can generalize this with =VLOOKUP(count($A:$A)*RAND()+1,$A:$D,2,TRUE).

Resources