Count of values which appear more than once in a column - excel

In my Excel column I have values like these:
ID
a
a
a
b
c
c
d
e
I would like to return the count of IDs that occur twice or more. In this case the answer is 2 (a and c).
Constraints:
1. No helper columns, or one at most (there are a ton of other filters to be added to the COUNTIFS which are not relevant to the question; adding helpers would mean 12+ extra columns, one for each month).
2. No VBA (a UDF is OK).
3. The formula result must be in a single cell.
The current formula which I have tried:
=COUNTIFS(F13:F22,COUNTIF(F13:F22,">=2"))
gives me 0.
Thanks in advance.

Since the values are in no specific order, try:
=SUM(IF(COUNTIF(A2:A9,A2:A9)>1,1/COUNTIF(A2:A9,A2:A9),0))
Enter it as an array formula with Ctrl+Shift+Enter.
Another variant would be:
=SUMPRODUCT((COUNTIF(A2:A9,A2:A9)>1)/COUNTIF(A2:A9,A2:A9))
This has the advantage that you don't need to enter it as an array formula.
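To see why dividing by COUNTIF works, here is a minimal Python sketch that mirrors the SUMPRODUCT formula on the sample IDs. The function name count_repeated is hypothetical, and Fraction is used only to avoid floating-point rounding (Excel itself uses floating point). A value that occurs n > 1 times contributes 1/n on each of its n rows, so it adds exactly 1 to the total.

```python
from collections import Counter
from fractions import Fraction


def count_repeated(values):
    """Count distinct values that occur more than once.

    Mirrors =SUMPRODUCT((COUNTIF(r,r)>1)/COUNTIF(r,r)).
    """
    counts = Counter(values)  # like COUNTIF(r, r): occurrences of each cell's value
    total = Fraction(0)
    for v in values:
        n = counts[v]
        if n > 1:  # the (COUNTIF(r,r)>1) test
            total += Fraction(1, n)  # n cells x 1/n = 1 per repeated value
    return total


ids = ["a", "a", "a", "b", "c", "c", "d", "e"]
print(count_repeated(ids))  # 2
```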
If you want to add criteria, I believe the second formula is a bit more user-friendly to extend, like so (I edited your sample data a little to show this):
=SUMPRODUCT((B2:B9=1)*(C2:C9="x")*(COUNTIF(A2:A9,A2:A9)>1)/COUNTIF(A2:A9,A2:A9))
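A caveat worth checking before relying on the criteria version, shown as a hypothetical Python sketch with made-up rows (columns A:C modelled as (id, b, c) tuples): COUNTIF in the denominator still counts every occurrence of the ID in column A, regardless of the criteria in B and C. If only some rows of a repeated ID pass the criteria, that ID contributes a fraction rather than 1.

```python
from collections import Counter
from fractions import Fraction


def count_repeated_with_criteria(rows):
    """Mirror of =SUMPRODUCT((B=1)*(C="x")*(COUNTIF(A,A)>1)/COUNTIF(A,A)).

    rows: list of (id, b, c) tuples, a hypothetical layout of columns A:C.
    """
    counts = Counter(r[0] for r in rows)  # COUNTIF(A,A) ignores the B/C criteria
    total = Fraction(0)
    for id_, b, c in rows:
        n = counts[id_]
        if b == 1 and c == "x" and n > 1:
            total += Fraction(1, n)
    return total


rows = [("a", 1, "x"), ("a", 1, "x"), ("a", 2, "x"),  # only 2 of 3 "a" rows pass
        ("c", 1, "x"), ("c", 1, "x")]
print(count_repeated_with_criteria(rows))  # 5/3, not a whole number
```

So each ID contributes a whole 1 (or 0) only when either all of its rows or none of them meet the criteria.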

Related

Excel - How to count unique records with specified condition in a column?

I have a table like the following:
For each store, I want to count how many bills purchased a single item, how many purchased two items, and how many purchased the full set of items. The result should look like this:
In store 'm3', bill 300 bought item C twice, but it should only count once in the final result. This is where I struggle, since I tried
COUNTIFS(A:A, "=m1", B:B, "=A") (then adding the counts for B and C) to get the single-item bills for store m1, but I was unable to figure out how to distinguish the unique bill numbers.
Please feel free to ask if you need more clarification. I prefer Excel built-in functions rather than VBA.
Using Office 365 formulas, we start with the basic formula to find the singles:
=BYROW(E2:E4,
LAMBDA(z,SUM(--(
BYROW(--(COUNTIFS(A:A,z,C:C,UNIQUE($C$2:$C$8),B:B,{"A","B","C"})>0),
LAMBDA(a,SUM(a)))=1))))
This gives the single-item results:
The others are all variations of that formula:
A,B
=BYROW(E2:E4,
LAMBDA(z,SUM(--(
BYROW(--(COUNTIFS(A:A,z,C:C,UNIQUE($C$2:$C$8),B:B,{"A","B"})>0),
LAMBDA(a,SUM(a)))=2))))-J2#
B,C
=BYROW(E2:E4,
LAMBDA(z,SUM(--(
BYROW(--(COUNTIFS(A:A,z,C:C,UNIQUE($C$2:$C$8),B:B,{"C","B"})>0),
LAMBDA(a,SUM(a)))=2))))-J2#
A,C
=BYROW(E2:E4,
LAMBDA(z,SUM(--(
BYROW(--(COUNTIFS(A:A,z,C:C,UNIQUE($C$2:$C$8),B:B,{"C","A"})>0),
LAMBDA(a,SUM(a)))=2))))-J2#
All
=BYROW(E2:E4,
LAMBDA(z,SUM(--(
BYROW(--(COUNTIFS(A:A,z,C:C,UNIQUE($C$2:$C$8),B:B,{"A","B","C"})>0),
LAMBDA(a,SUM(a)))=3))))
We have to subtract J2# (the All results) at the end of each doubles formula to remove the bills that contain all three items; otherwise they would be counted in both places.
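The logic of these formulas can be sketched in Python: for each bill of a store, find which of the items A, B and C it contains (an item bought twice counts once, like the COUNTIFS(...)>0 test), then count the bills by how many of the wanted items they contain. The function name and the (store, item, bill) rows below are hypothetical, since the original table is not reproduced here, and subtracting the All count only works because A, B and C are the only items.

```python
from collections import defaultdict


def bill_counts(rows, store):
    """Classify one store's bills by which of the items A, B and C they contain."""
    items_per_bill = defaultdict(set)  # a set, so an item bought twice counts once
    for s, item, bill in rows:
        if s == store:
            items_per_bill[bill].add(item)

    def bills_with(wanted, cnt):
        # per bill: how many wanted items are present (COUNTIFS(...)>0, summed by BYROW)
        return sum(1 for items in items_per_bill.values()
                   if len(items & set(wanted)) == cnt)

    all_three = bills_with("ABC", 3)
    return {
        "single": bills_with("ABC", 1),
        "A,B": bills_with("AB", 2) - all_three,  # the -J2# correction
        "B,C": bills_with("BC", 2) - all_three,
        "A,C": bills_with("AC", 2) - all_three,
        "All": all_three,
    }


rows = [("m1", "A", 100), ("m1", "B", 100), ("m1", "A", 101),
        ("m3", "C", 300), ("m3", "C", 300),  # bill 300 bought C twice
        ("m3", "A", 400), ("m3", "B", 400), ("m3", "C", 400)]
print(bill_counts(rows, "m3"))
# {'single': 1, 'A,B': 0, 'B,C': 0, 'A,C': 0, 'All': 1}
```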
With older versions of Excel, we need to do some gymnastics with SUMPRODUCT, MMULT, INDEX, MODE.MULT, etc.:
=SUMPRODUCT(
--(MMULT(
--(COUNTIFS(A:A,E2,C:C,INDEX(C:C,N(IF({1},MODE.MULT(IF(MATCH($C$2:$C$8,C:C,0)=ROW($C$2:$C$8),ROW($C$2:$C$8)*{1,1}))))),B:B,{"A","B","C"})>0),
{1;1;1})=1))
=SUMPRODUCT(
--(MMULT(
--(COUNTIFS(A:A,E2,C:C,INDEX(C:C,N(IF({1},MODE.MULT(IF(MATCH($C$2:$C$8,C:C,0)=ROW($C$2:$C$8),ROW($C$2:$C$8)*{1,1}))))),B:B,{"A","B"})>0),
{1;1})=2))-J2
=SUMPRODUCT(
--(MMULT(
--(COUNTIFS(A:A,E2,C:C,INDEX(C:C,N(IF({1},MODE.MULT(IF(MATCH($C$2:$C$8,C:C,0)=ROW($C$2:$C$8),ROW($C$2:$C$8)*{1,1}))))),B:B,{"B","C"})>0),
{1;1})=2))-J2
=SUMPRODUCT(
--(MMULT(
--(COUNTIFS(A:A,E2,C:C,INDEX(C:C,N(IF({1},MODE.MULT(IF(MATCH($C$2:$C$8,C:C,0)=ROW($C$2:$C$8),ROW($C$2:$C$8)*{1,1}))))),B:B,{"A","C"})>0),
{1;1})=2))-J2
=SUMPRODUCT(
--(MMULT(
--(COUNTIFS(A:A,E2,C:C,INDEX(C:C,N(IF({1},MODE.MULT(IF(MATCH($C$2:$C$8,C:C,0)=ROW($C$2:$C$8),ROW($C$2:$C$8)*{1,1}))))),B:B,{"A","B","C"})>0),
{1;1;1})=3))
Each of these goes in the first row of its respective column and must be confirmed as an array formula by pressing Ctrl+Shift+Enter instead of Enter when exiting edit mode.
Then they are dragged/copied down the columns.
Here is another alternative that generates the entire output for all cases with one formula. Use the following formula in cell E2:
=LET(ms, UNIQUE(A2:A8), ux, UNIQUE(A2:C8), CALC, LAMBDA(arr,cnt, BYROW(ms,
LAMBDA(m, LET(subset, CHOOSECOLS(FILTER(ux, INDEX(ux,,1)=m),2,3),
C, INDEX(subset,,2), SUM(BYROW(MMULT(IF(TOROW(C)=UNIQUE(C),1,0),
TRANSPOSE(IF(TOROW(INDEX(subset,,1))=arr,1,0))),
LAMBDA(r, N(SUM(r)=cnt)))))))),
HSTACK(ms, CALC({"A";"B";"C"}, 1), CALC({"A";"B"},2), CALC({"B";"C"},2),
CALC({"C";"A"},2), CALC({"A";"B";"C"}, 3)))
Here is the output:
We create a user-defined LAMBDA function CALC with input arguments arr (the item values) and cnt (the count condition to check), so we can generate all possible outputs by changing the input parameters and stacking the results with the HSTACK function.
CALC uses the name ux, which represents the input with duplicate rows removed, such as the repeated combination {m3,C,300}. We cannot use RACON functions here (COUNTIFS, SUMIFS and similar functions whose range arguments must be actual ranges, not arrays), because we need to work with an array. Therefore, to count the Item and Bill column values, we use the MMULT function combined with IF, as follows:
MMULT(IF(TOROW(C)=UNIQUE(C),1,0),TRANSPOSE(IF(TOROW(INDEX(subset,,1))=arr,1,0)))
Each row of the MMULT output corresponds to a unique bill number and holds the occurrences of each arr value. Therefore, if we sum each row (the inner BYROW) and compare the result with the count we are looking for (cnt), we get the expected counts.
The rest is just invoking CALC for all cases and appending the results by column via HSTACK.
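Here is a rough Python sketch of what CALC computes for one store, using plain lists in place of Excel arrays. The names mmult and calc and the rows are hypothetical test data. It follows the same steps: remove duplicate rows, build the two 0/1 matrices and multiply them, then count the bills whose row sum equals cnt.

```python
def mmult(a, b):
    """Plain matrix product, like Excel's MMULT."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def calc(rows, store, arr, cnt):
    """Count one store's bills whose row sum of the MMULT output equals cnt."""
    ux = list(dict.fromkeys(rows))  # UNIQUE(A2:C8): drop duplicate rows, keep order
    subset = [(item, bill) for s, item, bill in ux if s == store]  # FILTER + CHOOSECOLS
    items = [item for item, _ in subset]       # INDEX(subset,,1)
    bills = [bill for _, bill in subset]       # C = INDEX(subset,,2)
    unique_bills = list(dict.fromkeys(bills))  # UNIQUE(C)
    # IF(TOROW(C)=UNIQUE(C),1,0): one row per unique bill, one column per subset row
    left = [[1 if b == ub else 0 for b in bills] for ub in unique_bills]
    # TRANSPOSE(IF(TOROW(INDEX(subset,,1))=arr,1,0)): one row per subset row, one column per arr value
    right = [[1 if it == a else 0 for a in arr] for it in items]
    per_bill = mmult(left, right)  # occurrences of each arr value in each bill
    return sum(1 for r in per_bill if sum(r) == cnt)  # SUM(BYROW(..., N(SUM(r)=cnt)))


rows = [("m1", "A", 100), ("m1", "B", 100), ("m1", "A", 101),
        ("m3", "C", 300), ("m3", "C", 300),
        ("m3", "A", 400), ("m3", "B", 400), ("m3", "C", 400)]
print(calc(rows, "m3", ["A", "B", "C"], 1))  # 1 (bill 300)
print(calc(rows, "m3", ["A", "B", "C"], 3))  # 1 (bill 400)
```

Without the duplicate-row removal, the repeated {m3,C,300} row would make bill 300 sum to 2, and it would drop out of the single-item count.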

Excel SUMPRODUCT and dynamic text conditions

I am trying to sum rows that meet certain dynamic conditions. I have rows like:
Column A can have only one value; column K can match any of several values (OR). In the end, column M is to be summed.
I have tried SUMPRODUCT(), which works for column A but not for K. What I am looking for is something like:
=SUMPRODUCT(--(!$A$2:$A$20000="AA")*--(!$K$2:$K$20000="AA" OR "BB")*$M$2:$M$20000)
I know I can test ="AA" and then ="BB", but I need "AA" and "BB" to be dynamic, based on other cells, and the number of values varies. I tried {"AA";"BB"}, but I know this will not work, because the match then needs to be in the same row.
Can this be achieved at all?
Thanks a lot!
=SUMPRODUCT(($A$2:$A$20000="AA")*(($K$2:$K$20000="AA")+($K$2:$K$20000="BB"))*$M$2:$M$20000)
Note that:
Since you are multiplying/adding arrays, there's no need to include the double unary operators (--).
I don't know why you have a ! in your example formula.
To get an OR of the conditions, we add the TRUE/FALSE arrays: the sum is nonzero wherever at least one condition is TRUE.
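The add-for-OR, multiply-for-AND idea can be illustrated with a small Python sketch of the SUMPRODUCT formula (the function name and the test columns are hypothetical). Each comparison yields 0 or 1, so adding the column K tests gives a nonzero value when any of them matches, and multiplying by the column A test keeps only rows where both parts hold.

```python
def sum_with_or(a_col, k_col, m_col, a_value, k_values):
    """Mirror of =SUMPRODUCT((A=a_value)*((K=k1)+(K=k2)+...)*M)."""
    total = 0
    for a, k, m in zip(a_col, k_col, m_col):
        or_part = sum(int(k == v) for v in k_values)  # (K="AA")+(K="BB"): adding = OR
        total += int(a == a_value) * or_part * m      # multiplying = AND
    return total


a_col = ["AA", "AA", "AA", "BB"]
k_col = ["AA", "BB", "CC", "AA"]
m_col = [1, 2, 4, 8]
print(sum_with_or(a_col, k_col, m_col, "AA", ["AA", "BB"]))  # 3
```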
Your comments still do not provide a clear explanation of what you are making dynamic.
But to create a dynamic OR for column K, including testing for column A and summing column M, you can do the following:
For column K, let us assume that your possible OR values are entered one per cell in the range F2:F10:
=SUMPRODUCT(MMULT(--($K$2:$K$20000=TRANSPOSE($F$2:$F$10)),--(ROW($F$2:$F$10)>0))*($A$2:$A$20000="AAA")*$M$2:$M$20000)
The matrix multiplication produces a single column of 19,999 entries: 1 where the value in column K matches any of the OR values and 0 where it does not (assuming the values in F2:F10 are distinct).
See "How to do a row-wise sum in an array formula in Excel?" for more on using the MMULT function this way.
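The MMULT step can be pictured with a tiny Python sketch, using a hypothetical mmult helper and made-up values for column K and the OR range: comparing each K value with every OR value gives a 0/1 matrix, and multiplying by a column of 1s sums each row.

```python
def mmult(a, b):
    """Plain matrix product, like Excel's MMULT."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


k_col = ["AA", "BB", "CC", "DD"]  # hypothetical column K
or_values = ["AA", "BB"]          # hypothetical OR values in F2:F3
# --($K=TRANSPOSE($F)): one row per K cell, one column per OR value
matches = [[int(k == f) for f in or_values] for k in k_col]
# --(ROW($F)>0): a column of 1s, one per OR value
ones = [[1] for _ in or_values]
print(mmult(matches, ones))  # [[1], [1], [0], [0]]
```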
In the above formula, blanks in the OR range (F2:F10) also match blank entries in column K. So if a row has a blank in column K, "AAA" in column A and a value in column M, and F2:F10 contains a blank cell, that row could be wrongly included in the sum.
To avoid that possibility, we can use a formula that dynamically sizes the range in column F where we enter our OR values:
=INDEX($F$2:$F$10,1):INDEX($F$2:$F$10,COUNTA($F$2:$F$10))
This returns only the non-blank values in column F (assuming there are no gaps within the list).
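A small Python sketch of the blank-matching pitfall and the trimmed range, using made-up columns A, K and M and an OR range with trailing blanks (all hypothetical). The inner sum is the MMULT row count, so a blank K cell is counted once per blank cell in the OR range.

```python
a_col = ["AAA", "AAA", "AAA"]  # hypothetical column A
k_col = ["AA", "", "CC"]       # hypothetical column K (second cell blank)
m_col = [2, 5, 7]              # hypothetical column M
f_range = ["AA", "", ""]       # hypothetical F2:F4, with trailing blanks


def total(or_values):
    # (A="AAA") * MMULT row count of K matches * M, summed like SUMPRODUCT
    return sum((a == "AAA") * sum(k == f for f in or_values) * m
               for a, k, m in zip(a_col, k_col, m_col))


print(total(f_range))  # 12: the blank K cell matches both blank F cells

# INDEX(F,1):INDEX(F,COUNTA(F)) keeps only the first COUNTA entries
trimmed = f_range[:sum(1 for f in f_range if f != "")]
print(total(trimmed))  # 2
```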
So:
=SUMPRODUCT(MMULT(--($K$2:$K$20000=TRANSPOSE(INDEX($F$2:$F$10,1):INDEX($F$2:$F$10,COUNTA($F$2:$F$10)))),--(ROW(INDEX($F$2:$F$10,1):INDEX($F$2:$F$10,COUNTA($F$2:$F$10)))>0))*($A$2:$A$20000="AAA")*$M$2:$M$20000)
Given this data:
the last formula returns 5 (the sum of M2, M3 and M7).
Use SUMIFS with a SUMPRODUCT wrapper:
=SUMPRODUCT(SUMIFS($M$2:$M$20000,$A$2:$A$20000,"AA",$K$2:$K$20000,{"AA","BB"}))
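What the SUMIFS wrapper does can be sketched in Python: SUMIFS returns one total per value in the criteria array, and SUMPRODUCT adds those totals. The simplified sumifs helper here is hypothetical and handles exact-match criteria only (no wildcards, comparison operators or case-insensitivity, unlike Excel's SUMIFS).

```python
def sumifs(sum_col, crit_col1, crit1, crit_col2, crit2):
    """Simplified two-criteria SUMIFS: exact, case-sensitive matches only."""
    return sum(m for m, c1, c2 in zip(sum_col, crit_col1, crit_col2)
               if c1 == crit1 and c2 == crit2)


def sumproduct_of_sumifs(m_col, a_col, a_value, k_col, k_values):
    # SUMIFS(...,{"AA","BB"}) yields one total per criteria value; SUMPRODUCT adds them
    return sum(sumifs(m_col, a_col, a_value, k_col, k) for k in k_values)


a_col = ["AA", "AA", "AA", "BB"]
k_col = ["AA", "BB", "CC", "AA"]
m_col = [1, 2, 4, 8]
print(sumproduct_of_sumifs(m_col, a_col, "AA", k_col, ["AA", "BB"]))  # 3
```

Because each criteria value is summed separately, a value listed twice in the criteria array would be counted twice.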

Condensing nested if-statements with multiple criteria

The blue columns are the given data and the red columns are what is being calculated. The table to the right is what I am referencing. F2 is calculated by the following steps:
1. Look at the Machinery column (D): if the cell contains LF, select column K; otherwise select column L.
2. Look at the Grade column (E): if the cell contains RG, select rows 4:8; otherwise select rows 9:12.
3. Look at the Species column (A): if the cell contains MS, select rows 5 and 10; otherwise...
4. Copy the cell in column K or L selected by the steps above into column F.
5. Multiply column F by column C.
I don't want to make another column for my final result; I did so in the picture only to show the two steps separately. Column F should hold the final answer (F2 = 107.33). The reference table can be formatted differently as well.
At first, I tried using nested IF statements, but realized that I would need 20+ IF statements for all the different outcomes. I think I would want to use the SEARCH function to find whether or not the cell contains a specific piece of information. Then I would probably use some combination of MATCH, IF, VLOOKUP, INDEX and SEARCH, but I am not sure how to condense these.
Any suggestions?
SUMPRODUCT is the function you need. I quickly created some test data along the lines of what you shared:
Then I entered the formula below in cell F2:
=SUMPRODUCT(($I$4:$I$9=E2)*($J$4:$J$9=LEFT(A2,FIND(" ",A2)-1))*IF(ISERROR(FIND("LF",D2,1)),$L$4:$L$9,$K$4:$K$9))
The formula may look a little scary, but it is actually simple, as each sub-formula checks one condition you want to evaluate. For example,
($I$4:$I$9=E2)
looks for rows in $I$4:$I$9 that match the Grade of the current row, and so on. The * multiplies the arrays these tests return, so only the value in the row where all conditions are TRUE remains.
Since some of your conditions require matching partial content, as in Species and Machinery, I have used the LEFT and FIND functions within SUMPRODUCT.
This formula returns the value from either column K or column L based on the matching conditions, and you can easily extend it with more conditions. To get the final result in F2 (step 5), multiply it by C2.
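A hypothetical Python sketch of the same idea, with a made-up reference table and inputs (the names are illustrative, and Python's string comparisons are case-sensitive where Excel's = is not): the 0/1 tests are multiplied, so only the table row that matches both Grade and Species contributes its K or L value.

```python
def lookup_rate(grade, species_text, machinery, table):
    """Mirror of the SUMPRODUCT formula in F2 for one row.

    grade ~ E2, species_text ~ A2, machinery ~ D2;
    table rows are (grade, species, k_value, l_value), a hypothetical stand-in for I4:L9.
    """
    # LEFT(A2, FIND(" ", A2) - 1): text before the first space (raises if none, as FIND errors)
    species = species_text[:species_text.index(" ")]
    # IF(ISERROR(FIND("LF", D2, 1)), L, K): FIND is case-sensitive, like Python's "in"
    use_k = "LF" in machinery
    total = 0.0
    for g, s, k, l in table:
        match = (g == grade) * (s == species)  # * acts as AND on the 0/1 tests
        total += match * (k if use_k else l)
    return total


table = [("RG", "MS", 10.0, 9.0), ("RG", "LP", 12.0, 11.0), ("NG", "MS", 8.0, 7.0)]
print(lookup_rate("RG", "MS 2x4", "LF-300", table))  # 10.0
print(lookup_rate("NG", "MS 2x4", "CTL", table))     # 7.0
```

Because SUMPRODUCT adds every matching row, this relies on each (Grade, Species) pair appearing only once in the reference table.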

Sort Order formula to alphabetise in Excel

I am currently drawing up a spreadsheet that will automatically remove duplicates and alphabetize a list:
I am using the COUNTIF() function in column G to create a sort order and then VLOOKUP() to find the sort in column J.
The problem I am having is that I can't seem to get my SortOrder column to work properly. At the moment it assigns the index 1 to two different entries, which means the cell highlighted in yellow is skipped and the last entry in the sorted list is empty:
If anyone can find and rectify this mistake for me I'll be very grateful as it has been driving me insane all day! Many thanks.
Here is my usual method for automatically pulling raw data into a duplicate-removed list (note that it keeps the order of first appearance; it does not sort):
Assume the raw data is in column A. In column B, use this formula to increase the counter each time a row shows a non-duplicate item in column A. Hard-code B2 to 1, then enter this formula in B3 and drag it down.
=if(iserror(match(A3,$A$2:A2,0)),B2+1,B2)
This takes advantage of the fact that when we refer to this row counter in our revised list, we use the MATCH function, which only finds the first matching number. Say you want your new list of data in column D (I usually do this for display purposes, so I either group out [hide] the columns that hold the formulas, or do this on another tab). You can skip the next step, but if you are already using helper columns, I usually do each step in a different column, which is easier to document. In column C, starting in C3 (with C2 hard-coded to 1) and dragged down, use a simple counter that stops at the end of your list:
=if(C2<max(B:B),C2+1," ")
Then in column D, starting at D2 and dragged down:
=iferror(index(A:A,match(C2,B:B,0)),"")
The INDEX function is like half of the VLOOKUP function: it pulls the result out of a given array when you provide it with a row number. The MATCH function is like the other half: it gives you the row number where an item appears in a given array.
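The three helper columns can be sketched in Python to show how they cooperate. The function is hypothetical and works on 0-based lists, with index 0 standing for row 2; its comparisons are case-sensitive, whereas Excel's MATCH is not.

```python
def dedupe_like_helper_columns(col_a):
    """Mimic helper columns B, C and D; index 0 stands for row 2."""
    if not col_a:
        return []
    # Column B: counter that grows on each first occurrence (B2 hard-coded to 1)
    b = [1]
    for i in range(1, len(col_a)):
        is_new = col_a[i] not in col_a[:i]  # ISERROR(MATCH(A3,$A$2:A2,0))
        b.append(b[-1] + 1 if is_new else b[-1])
    # Column C: 1, 2, ..., MAX(B)
    c = list(range(1, max(b) + 1))
    # Column D: INDEX(A:A, MATCH(C2, B:B, 0)); MATCH finds the first row with that counter
    return [col_a[b.index(n)] for n in c]


print(dedupe_like_helper_columns(["pear", "apple", "pear", "fig", "apple"]))
# ['pear', 'apple', 'fig']
```

The list comes out in order of first appearance; it is not alphabetized.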
Hope this helps you in the future as well.
The actual reason this goes wrong, as Jeeped's comment implies, is that you can't meaningfully compare a string with a number without a conversion, because they are stored differently. So COUNTIF counts numbers and text separately.
20212 will give a count of 1 because it is the only (or lowest) number.
CS10Z002 will give a count of 1 because it is the first text string in alphabetical order.
Another approach is to add the count of numbers to the count when the current cell contains text:
=COUNTIF(INDIRECT("$D$2:$D$"&$F$3),"<="&D2)+ISTEXT(D2)*COUNT(INDIRECT("$D$2:$D$"&$F$3))
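A Python approximation shows why plain COUNTIF produces duplicate sort orders and how counting numbers before text fixes it. The data is hypothetical, and lowercasing plus Python's code-point string comparison only approximates Excel's case-insensitive text ordering.

```python
def countif_rank(values, v, key, same_type_only=False):
    """How many values have key(x) <= key(v), like COUNTIF(range, "<="&v)."""
    pool = [x for x in values
            if not same_type_only or isinstance(x, str) == isinstance(v, str)]
    return sum(1 for x in pool if key(x) <= key(v))


def plain(x):
    # COUNTIF alone: text is compared only with text, numbers only with numbers
    return x.lower() if isinstance(x, str) else x


def numbers_first(x):
    # Adding COUNT(...) for text cells: every number ranks before every text value
    return (1, x.lower()) if isinstance(x, str) else (0, x)


data = [20212, "CS10Z002", 999, 1000, "abc"]
print([countif_rank(data, v, plain, same_type_only=True) for v in data])  # [3, 2, 1, 2, 1]
print([countif_rank(data, v, numbers_first) for v in data])               # [3, 5, 1, 2, 4]
```

The first list repeats 1 and 2 (numbers and text are ranked separately); the second gives each entry a distinct sort order.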
It's easier to show the results of three different conversions with some test data:
(0) No conversion: just use COUNTIF
=COUNTIF(D$2:D$7,"<="&D2)
"999"<"abc"<"def", 999<1000
(1) Count everything as text
=SUMPRODUCT(--(D$2:D$7&""<=D2&""))
"1000"<"999"
(2) Count numbers before text
=COUNTIF(D$2:D$7,"<="&D2)+ISTEXT(D2)*COUNT(D$2:D$7)
999<1000<"999"
(3) Count everything as text but convert numbers with leading zeroes
=SUMPRODUCT(--(TEXT(D$2:D$7,"000000")<=TEXT(D2,"000000")))
"000999" = "000999", "000999"<"001000"

Use RANK function for cell range based on criteria in separate cell range

I have a question regarding the RANK function in MS Excel 2010. I have a large worksheet whose rows I want to rank based on the values in a column. These values can be positive or negative. I found helpful advice here which explains how to rank the values in a column while excluding all values that equal zero from the ranking and the ranking count. They use the following formula:
=IF(O24<0, RANK(O24,$O$24:$O$29) - COUNTIF($O$24:$O$29,0), IF(O24=0, "", RANK(O24,$O$24:$O$29)))
This works great, but it would be even better if I could rank the values only if a corresponding value in the same row but a different column meets certain criteria.
Is something like this possible and how would I do it? How would I update the example formula above to make the change work? Thank you very much in advance for your help.
P.S.: I tried putting in a table but it didn't really work, sorry...
You can use the COUNTIFS function to rank based on a condition in another column, e.g. this formula in row 24, copied down (edited to include an extra IF):
=IF(O24=0,"",IF(N24="x",COUNTIFS(O$24:O$29,">"&O24,O$24:O$29,"<>0",N$24:N$29,"x")+1,""))
That will rank high to low where column N = "x", ignoring zero values.
See this example: columns N and O contain random values. Press F9 to generate new random values, and the formula results in column Q will change accordingly.
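The COUNTIFS rank can be mirrored in Python for one row at a time (the function name and test values below are hypothetical, not the linked example): count the qualifying values greater than the current one, then add 1, so ties share a rank as they do with RANK.

```python
def conditional_rank(o_col, n_col, i):
    """Rank row i high-to-low among rows where N is "x" and O is nonzero."""
    o, n = o_col[i], n_col[i]
    if o == 0 or n != "x":  # the two IF tests return "" for these rows
        return ""
    # COUNTIFS(O,">"&O24, O,"<>0", N,"x") + 1
    return 1 + sum(1 for ov, nv in zip(o_col, n_col) if ov > o and ov != 0 and nv == "x")


o_col = [5, -3, 0, 8, 2, -1]
n_col = ["x", "x", "x", "y", "x", "x"]
print([conditional_rank(o_col, n_col, i) for i in range(len(o_col))])
# [1, 4, '', '', 2, 3]
```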
It is certainly possible to keep creating more complex formulas whenever you add new criteria to rank on. However, by creating intermediary columns with single-step formulas, you'll make your spreadsheet easier to understand, and easier to extend with new criteria or to edit existing ones.
My suggestion is to create a column that excludes the zeros (let's assume this is column P): =IF(O24 = 0, "", O24)
Then in column R, to eliminate negative values (this step is unnecessary, but your original formula does something similar): =IF(P24 = "", "", P24 - MIN(0, MIN($O$24:$O$29)))
Now in column S, add your newest criterion: =IF(OR(R24="", [enter newest criteria here]), "", R24)
Finally, column T performs the ranking of only the selected rows: =IF(S24="", "", RANK(S24, S$24:S$29))
If exposing columns P, R and S is bothersome, you can always hide them.
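The helper-column approach can also be sketched in Python. The names are hypothetical, and the exclude function stands in for the "[enter newest criteria here]" placeholder, returning True for rows to drop. Shifting every value by the same amount doesn't change their order, so on the same made-up data the final descending rank matches the single-formula version.

```python
def rank_with_helpers(o_col, exclude):
    """Sketch of helper columns P, R, S and T for the values in column O."""
    p = ["" if o == 0 else o for o in o_col]             # P: drop zeros
    shift = min(0, min(o_col))                           # MIN(0, MIN($O$24:$O$29))
    r = ["" if v == "" else v - shift for v in p]        # R: make values non-negative
    s = ["" if (v == "" or exclude(i)) else v            # S: newest criterion
         for i, v in enumerate(r)]
    kept = [v for v in s if v != ""]
    # T: RANK(S24, S$24:S$29) in descending order; the "" cells are ignored
    return ["" if v == "" else 1 + sum(1 for w in kept if w > v) for v in s]


o_col = [5, -3, 0, 8, 2, -1]
n_col = ["x", "x", "x", "y", "x", "x"]
print(rank_with_helpers(o_col, lambda i: n_col[i] != "x"))
# [1, 4, '', '', 2, 3]
```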
A rewording of barry houdini's answer, using table format (structured references).
Value_Col is the column with the values to rank. Group_Column is the column with the group-by values, so that the ranking happens within each group.
=COUNTIFS([Value_Col], ">"&[@Value_Col], [Value_Col], "<>0", [Group_Column], [@Group_Column]) + 1
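The table version can be mirrored in Python for one row (hypothetical names and made-up data): count the nonzero values in the same group that are greater than the current value, then add 1.

```python
def group_rank(values, groups, i):
    """=COUNTIFS([Value_Col],">"&[@Value_Col],[Value_Col],"<>0",[Group_Column],[@Group_Column])+1"""
    v, g = values[i], groups[i]
    return 1 + sum(1 for x, h in zip(values, groups) if x > v and x != 0 and h == g)


values = [5, 3, 9, 4, 0]
groups = ["a", "a", "b", "b", "a"]
print([group_rank(values, groups, i) for i in range(len(values))])  # [1, 2, 1, 2, 3]
```

Unlike barry houdini's formula, this version has no IF(O24=0,"",...) wrapper, so a zero value still receives a rank (the last row in the example).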

Resources