Count occurrences of strings just once per row in Google Sheets - excel

I have strings of spreadsheet data that need counting by 'type' but not instance.
A B C D
1 Lin 1 2 1
2 Tom 1 4 2
3 Sue 3 1 4
The correct sum of students assigned to teacher 1 is 3, not 4. That teacher 1 meets Lin in lessons B and D is irrelevant to the count.
I borrowed a formula which works in Excel but not in Google Sheets where I and others need to keep and manipulate the data.
F5=SUMPRODUCT(SIGN(COUNTIF(OFFSET(B$2:D$2, ROW($2:$4)-1, 0), E5)))
A B C D E
2 Lin 1 2 1
3 Tom 1 4 2
4 Sue 3 1 4
5 1 [exact string being searched for, ie a teacher name]
I don't know what is not being understood by Google Sheets in that formula. Does anyone know the correct expression to use, or a more efficient way to get the accurate count I need, without duplicates within rows inflating the count?

So this is the mmult way, which works by finding the row totals of students assigned to teacher 1 etc., then seeing how many of the totals are greater than 0.
=ArrayFormula(sum(--(mmult(n(B2:D4=E5),transpose(column(B2:D4)))>0)))
or
=ArrayFormula(sum(sign(mmult(n(B2:D4=E5),transpose(column(B2:D4))))))
Also works in Excel if entered as an array formula without the ArrayFormula wrapper.
A specific Google Sheets one can be quite short
=ArrayFormula(COUNTUNIQUE((B2:D4=E5)*row(B2:D4)))-1
counting the unique rows containing a match.
Note - I am subtracting 1 in the last formula above because I am assuming there is at least one zero (non-match) which should be ignored. This would fail in the extreme case where all students in all classes are assigned to the same teacher so you have a matrix (e.g.) of all 1's. This would be more theoretically correct:
=ArrayFormula(COUNTUNIQUE(if(B2:D4=E5,row(B2:D4),"")))

Related

Retrieve column 4 from Column 2 and 3 which contains minimum and maximum conditions along with Column 1 which is a separate value?

Hello I have a table shown below where I have letters in column 1, and min and max ranges for column 2 and 3. I am trying to retrieve the final number in column 4.
I know I can use a VLOOKUP and set the range as TRUE to get the last column. However, how would I factor in multiple columns/criteria to find match the correct range with the correct letter.
For example, I can would like to get value 4 from the last column. I would have to match with "B" and it would be between 0 and $50,000.
A 0 $50,000 1
A $50,001 $100,000 2
A $100,001 $250,000 3
B 0 $50,000 4
B $50,001 $100,000 5
B $100,001 $250,000 6
C 0 $50,000 7
C $50,001 $100,000 8
C $100,001 $250,000 9
Thank you!
Two ways:
If the pattern is the same as to the breaks of the dollar amounts then use this:
=INDEX(D:D,MATCH(G1,A:A,0)+MATCH(H1,$B$1:$B$3)-1)
Where MATCH(G1,A:A,0) returns the first row where the ID is located and MATCH(H1,$B$1:$B$3) finds the relative location of the price in the first pattern. Change $B$1:$B$3 to encompass the whole pattern.
If the patterns are different then you can use this:
=SUMIFS(D:D,A:A,G1,B:B,"<=" & H1,C:C,">=" & H1)
One more for the future when Microsoft releases FILTER():
=FILTER(D:D,(A:A=G1)*(B:B<=H1)*(C:C>=H1))
This is entered normally and does not matter the pattern.

Dynamically sort list based off associated values with tie-breaker values

I'm trying to sort students based off frequency of participation. I have a table that is automatically generated totaling up how often a student has participated in the last few days.
I want it to do 2 things that I can't figure out.
I want it to ignore students that are at 0 removing them from the resulting rankings.
The first number is most important but I want it to reference the next value in the result of a tie.
Short example of table:
Andy - 1 1 2 3
Brad - 0 1 2 3
Cade - 1 2 3 4
Dane - 1 1 1 2
Desired result:
Cade - 1
Andy - 1
Dane - 1
The tie-breaker isn't that important and I figure I can have conditional formatting to remove children at 0, but I still can't seem to figure it out.
The closest formulas I have found in my searching are:
=INDEX($A$10:$A$9,MATCH(ROWS($C$1:C1),$C$1:$C$9,0))
This one doesn't work because it returns #N/A for pretty much all students who are tied.
=IFERROR(INDEX($C$1:$C$9,MATCH(SMALL(NOT($C$1:$C$9="")*IF(ISNUMBER($C$1:$C$9),COUNTIF($C$1:$C$9,"<="&$C$1:$C$9),COUNTIF($C$1:$C$9,"<="&$C$1:$C$9)+SUM(--ISNUMBER($C$1:$C$9))),ROWS($C$1:C1)+SUM(--ISBLANK($C$1:$C$9))),NOT($C$1:$C$9="")*IF(ISNUMBER($C$1:$C$9),COUNTIF($C$1:$C$9,"<="&$C$1:$C$9),COUNTIF($C$1:$C$9,"<="&$C$1:$C$9)+SUM(--ISNUMBER($C$1:$C$9))),0)),"")
I had this formula that can handle ties but it needs to be OFFSET but I don't know how since it is an array formula. Also, with both these formulas it reverses the ranks with the lowest values at the top. If anyone could assist me I would greatly appreciate it. I'm doing this so that I can give all students a chance to participate equally.
Use a helper column. In that column put the following formula:
=IF(B1=0,"n/a",SUMPRODUCT(B1:E1/10^(COLUMN(B1:E1)-MIN(COLUMN(B1:E1)))))
This will return a single number based on the rankings.
Then in your output column use:
=IFERROR(INDEX(A:A,MATCH(LARGE(F:F,ROW(1:1)),F:F,0)),"")
Then a simple VLOOKUP to return the first number:
=IF(I1<>"",VLOOKUP(I1,A:B,2,FALSE),"")

Compare multiple data from rows

I'm looking for a way to compare multiple rows with data to each other, trying to find the best possible match. Each number in every column must be an approximately match the other numbers in the same column.
Example:
Customer #1: 1 5 10 9 7 7 8 2 3
Customer #2: 10 5 9 3 5 7 4 3 2
Customer #3: 1 4 10 9 8 7 6 2 2
Customer #4: 9 5 6 7 2 1 10 5 6
In this example customer #1 and #3 is quite similar, and I need to find a way to highlight or sort the rows so I can easily find the best match.
I've tried using conditional formatting to highlight the numbers that are the similar, but that is quite confusing, because the amount of data is quite big.
Any ideas of how I could solve this?
Thanks!
The following formula entered in (say) L1 and pulled down gives the best match with the current row based on the sum of the absolute differences between corresponding cells:-
=MIN(IF(ROW($C$1:$K$4)<>ROW(),(MMULT(ABS($C1:$K1-$C$1:$K$4),TRANSPOSE(COLUMN($C$1:$K$4))^0))))
It is an array formula and must be entered with CtrlShiftEnter.
You can then sort on column L to bring the customers with lowest similarity scores to the top or use conditional formatting to highlight rows with a certain similarity value.
EDIT
If you wanted to penalise large differences in individual columns more heavily than small differences to try and avoid pairs of customers which are fairly similar except for having some columns very different, you could try something like the square of the differences:-
=MIN(IF(ROW($C$1:$K$4)<>ROW(),(MMULT(($C1:$K1-$C$1:$K$4)^2,TRANSPOSE(COLUMN($C$1:$K$4))^0))))
then the scores for your test data would come out as 7,127,7,127.
I'm assuming you want to compare customers 2-4 with customer 1 and that you are comparing only within each column. In this case, you could implement a 'scoring system' using multiple IFs. For example,:
A B C D E
1 Customer 1 1 1 2
2 Customer 2 1 2 2
3 Customer 3 0 1 0
you could use in E2
=if(B2=$B$1,1,0)+if(C2=$C$1,1,0)+if(D2=$D$1,1,0)
This will return a 'score' of 1 when you have a match and a 'score' of 0 when you don't. It then adds up the scores and your highest value will be your best match. Copying down would then give
A B C D E
1 Customer 1 1 1 2
2 Customer 2 1 2 2 2
3 Customer 3 0 1 0 1
so customer 2 is the best match.

Excel formula to apply penalty column to ranking

I have thought long and hard about this, but I can't find a solution to what I believe is quite a simple problem.
I have a table of results, where sometimes someone will be given a penalty of a varying amount. This is entered into the penalty column (Col C).
I need a formula which checks if there is an entry into the penalty column and applies it, not only to that row, but to the number of subsequent rows which are affected, depending on the severity of the penalty.
I have tried to see if this is possible by referencing the penalty against the 'ROW()' function but have not been able to achieve the desired effect.
Col D shows the desired output of the formula.
Col E is included for reference only, to show the desired effect on each row.
Col A Col B Col C Col D Col E
Pos Name Penalty New Pos Change
1 Jack 1 0
2 Matt 2 0
3 Daniel 2 5 +2
4 Gordon 3 -1
5 Phillip 4 -1
6 Günther 6 0
7 Johann 3 10 +3
8 Alain 7 -1
9 John 8 -1
10 Gianmaria 9 -1
The big issue is, if someone is handed a big penalty, for example '10' then it affects the following ten rows. I can't work out how to include this variable logic...
I would be interested to hear the approach of others...
You need to use the RANK() function:
Excel RANK Function Examples
In a new column, add the penalty value to the original position, plus a small coeffieient depending on the original position (0.01 per increment perhaps) to move the penalised player below the original person at that position, then in the next column you can RANK() the new column of values (F in my case).
New value is therefore =A2+(IF(C2>0,C2+(0.01*A2)))
Rank is then =RANK(F2,F2:F11,1)
You can combine all the functions into one, but it's clearer to do it in separate columns at first.

EXCEL - Comparing Two Columns - Removing Repeats

If two households share, they create a tie and this tie has a kinship rank that does not change, no matter how often two households share with each other.
KINSHIP RANK EXAMPLE
As you can see, it doesn't matter in which "direction" the tie happened whether it was household 5 who shared to household 3 or vice versa, the kinship rank is still 1
HH1 HH2 RANK
5 3 1
3 5 1
Therefore, I do not need every tie that occurs between two households, but only the first instance that a tie occurred between the two households.
So here is a sample list of many households who shared with each other, sometimes sharing resources with themselves, sharing only once, or sharing multiple times with the same household.
TWO HOUSEHOLD WITH REPEATED TIES
COL.A COL.B
ROW HH1 HH2
1 1 1
2 1 2
3 1 3
4 2 1
5 2 4
6 3 1
7 3 2
8 3 4
9 4 2
This is what I need it to look like:
TWO HOUSEHOLDS WITHOUT REPEATED TIES
COL.A COL.B
ROW HH1 HH2
1 1 1
2 1 2
3 1 3
4 2 4
5 3 2
6 3 4
What I have done
I wrote a simple command for placing the HH1 and HH2 information into the same cell:
=A1&"|"&B1
In the case of the second row, this looks like 1|2 inside cell C2
HH1 and HH2 are combined in column C so how will I be able to compare all of the households in column C to each other? Perhaps a highlighting rule if a repeat happens? Or in another column list if it is a delete or a keep?
Thank you for your assistance everyone.
I suggest a simple COUNTIFS to do the job like this:
=(COUNTIFS(A$1:A1,B2,B$1:B1,A2)+COUNTIFS(A$1:A1,A2,B$1:B1,B2))>0
starting in C2 and then copy down. It will show TRUE for each row which is within the range above it and false if not. Ich checks for both x/y and y/x (the order doesn't matter)
Now simply filter col C to only show rows with TRUE in it. Then simply select and delete it.
This also works with non numerical values like names.
If you still have any questions, just ask ;)
You also can wrap it up to get more informations like this:
=IF((COUNTIFS(A$1:A1,B2,B$1:B1,A2)+COUNTIFS(A$1:A1,A2,B$1:B1,B2)),"",COUNTIFS(A:A,B2,B:B,A2)+COUNTIFS(A:A,A2,B:B,B2))
For C2 and copy down. C1 gets:
=COUNTIFS(A:A,B2,B:B,A2)+COUNTIFS(A:A,A2,B:B,B2)
This will show you only at the first occurrence how many times it is within the whole range.
All done by phone, may contain errors
Use =((A1*B1)/(A1+B1))*((A1*B1)+(A1+B1)) to create unique identifiers. Then use Remove Duplicates in the Data Tools Pane of the Data Tab to remove all rows containing duplicates. Or, alternatively, use something like =IF(IFNA(MATCH(A2,A$1:A1,0),TRUE())=TRUE,"First Share","") dragged and dropped from row 2 to identify First Shares.

Resources