Operation over several columns - excel

I am wondering , if I can write a formula which would operate over several columns, e.g. I want to calculate the amount of males in the school and I have a table:
A B C
Class Sex Number
1 male 3
2 male 4
1 female 6
2 female 5
Right now I have to break the operations into parts:
=(B2="Male")*C2 - additional column and then
=SUMME(D2:D5)
I want to do it at once. It seems like a trivial functionality, but I can not figure it out, how I can do it in one formula.

Related

Excel --- Tried many time but fail

enter image description here
I want to sum all these data given in right side to my format. I tried many time and many formulas but some things left every time. Please help
You can achieve most of what you need with a mixture of formulae and a pivot table. But I couldn't quite get everything you wanted, as some of it doesn't make sense. For example, you have 2013 - 2012 - <5 as your first row header, but there are multiple years that give an age less than 5, e.g. 2014 - 2013 - <5 would also be valid. I would suggest hardcoding the years somehow, as they aren't entirely relevant anyway?
Breaking it down into separate problems.
How do I get the Roman number version of the Class?
You could have a massive IF statement, but I went for a VLOOKUP. You could have used the ROMAN built in function, but you can't because your numbering is "first", "second", etc. instead of "1", "2", etc.
So I actually went for both solutions, just to show how they both work. First I created a named table (list object) called "Named" that looks like this:
Class Class Number
First 1
Second 2
Third 3
Fourth 4
etc.
I then used this to VLOOKUP your class names and convert them into numbers. Next I used the ROMAN function to make this into roman numeral format. So I ended up with this (columns E, F and G in my worksheet):
Class Class Number Roman
First 1 I
First 1 I
First 1 I
First 1 I
First 1 I
Second 2 II
Third 3 III
Fourth 4 IV
With formulae:
Class Number - =VLOOKUP(E2,Named,2,FALSE);
Roman - =ROMAN(F2).
How do I get from "Male"/ "Female" to "B"/ "G"?
This was trivial, just =IF(A2="Male", "B", "G") where I have Gender in column A.
How do I get the "age text" so it handles "<5" and ">22"?
My age was in column D, so this was also trivial, =IF(D2<5,"<5", IF(D2>22, ">22", D2)).
Now I have enough data to pivot, so I selected my entire table, which was basically your sample data with calculated columns added to the right. Then I inserted a pivot table and dragged in row/ column headers to make it match your format. This doesn't give you the year columns, but it gives you everything else. For example, just using a few random rows of sample data (as I couldn't be bothered to type it all in):
I went from:
Gender Date of Birth Age Class
Male 06-Jan-14 4 First
Female 07-Sep-11 6 First
Male 01-Jan-12 6 First
Male 31-Dec-12 5 First
Female 01-Oct-11 6 First
Female 16-Nov-10 7 Second
Male 31-Oct-09 8 Third
Male 25-Oct-10 7 Fourth
To:
I II III IV
B G G B B
<5 1
5 1
6 1 2
7 1 1
8 1
Total 3 2 1 1 1
From here you could simply hardcode the columns A and B from your desired final format, as these years aren't actually based on the data?

Count occurrences of strings just once per row in Google Sheets

I have strings of spreadsheet data that need counting by 'type' but not instance.
A B C D
1 Lin 1 2 1
2 Tom 1 4 2
3 Sue 3 1 4
The correct sum of students assigned to teacher 1 is 3, not 4. That teacher 1 meets Lin in lessons B and D is irrelevant to the count.
I borrowed a formula which works in Excel but not in Google Sheets where I and others need to keep and manipulate the data.
F5=SUMPRODUCT(SIGN(COUNTIF(OFFSET(B$2:D$2, ROW($2:$4)-1, 0), E5)))
A B C D E
2 Lin 1 2 1
3 Tom 1 4 2
4 Sue 3 1 4
5 1 [exact string being searched for, ie a teacher name]
I don't know what is not being understood by Google Sheets in that formula. Does anyone know the correct expression to use, or a more efficient way to get the accurate count I need, without duplicates within rows inflating the count?
So this is the mmult way, which works by finding the row totals of students assigned to teacher 1 etc., then seeing how many of the totals are greater than 0.
=ArrayFormula(sum(--(mmult(n(B2:D4=E5),transpose(column(B2:D4)))>0)))
or
=ArrayFormula(sum(sign(mmult(n(B2:D4=E5),transpose(column(B2:D4))))))
Also works in Excel if entered as an array formula without the ArrayFormula wrapper.
A specific Google Sheets one can be quite short
=ArrayFormula(COUNTUNIQUE((B2:D4=E5)*row(B2:D4)))-1
counting the unique rows containing a match.
Note - I am subtracting 1 in the last formula above because I am assuming there is at least one zero (non-match) which should be ignored. This would fail in the extreme case where all students in all classes are assigned to the same teacher so you have a matrix (e.g.) of all 1's. This would be more theoretically correct:
=ArrayFormula(COUNTUNIQUE(if(B2:D4=E5,row(B2:D4),"")))

Compare multiple data from rows

I'm looking for a way to compare multiple rows with data to each other, trying to find the best possible match. Each number in every column must be an approximately match the other numbers in the same column.
Example:
Customer #1: 1 5 10 9 7 7 8 2 3
Customer #2: 10 5 9 3 5 7 4 3 2
Customer #3: 1 4 10 9 8 7 6 2 2
Customer #4: 9 5 6 7 2 1 10 5 6
In this example customer #1 and #3 is quite similar, and I need to find a way to highlight or sort the rows so I can easily find the best match.
I've tried using conditional formatting to highlight the numbers that are the similar, but that is quite confusing, because the amount of data is quite big.
Any ideas of how I could solve this?
Thanks!
The following formula entered in (say) L1 and pulled down gives the best match with the current row based on the sum of the absolute differences between corresponding cells:-
=MIN(IF(ROW($C$1:$K$4)<>ROW(),(MMULT(ABS($C1:$K1-$C$1:$K$4),TRANSPOSE(COLUMN($C$1:$K$4))^0))))
It is an array formula and must be entered with CtrlShiftEnter.
You can then sort on column L to bring the customers with lowest similarity scores to the top or use conditional formatting to highlight rows with a certain similarity value.
EDIT
If you wanted to penalise large differences in individual columns more heavily than small differences to try and avoid pairs of customers which are fairly similar except for having some columns very different, you could try something like the square of the differences:-
=MIN(IF(ROW($C$1:$K$4)<>ROW(),(MMULT(($C1:$K1-$C$1:$K$4)^2,TRANSPOSE(COLUMN($C$1:$K$4))^0))))
then the scores for your test data would come out as 7,127,7,127.
I'm assuming you want to compare customers 2-4 with customer 1 and that you are comparing only within each column. In this case, you could implement a 'scoring system' using multiple IFs. For example,:
A B C D E
1 Customer 1 1 1 2
2 Customer 2 1 2 2
3 Customer 3 0 1 0
you could use in E2
=if(B2=$B$1,1,0)+if(C2=$C$1,1,0)+if(D2=$D$1,1,0)
This will return a 'score' of 1 when you have a match and a 'score' of 0 when you don't. It then adds up the scores and your highest value will be your best match. Copying down would then give
A B C D E
1 Customer 1 1 1 2
2 Customer 2 1 2 2 2
3 Customer 3 0 1 0 1
so customer 2 is the best match.

EXCEL - Comparing Two Columns - Removing Repeats

If two households share, they create a tie and this tie has a kinship rank that does not change, no matter how often two households share with each other.
KINSHIP RANK EXAMPLE
As you can see, it doesn't matter in which "direction" the tie happened whether it was household 5 who shared to household 3 or vice versa, the kinship rank is still 1
HH1 HH2 RANK
5 3 1
3 5 1
Therefore, I do not need every tie that occurs between two households, but only the first instance that a tie occurred between the two households.
So here is a sample list of many households who shared with each other, sometimes sharing resources with themselves, sharing only once, or sharing multiple times with the same household.
TWO HOUSEHOLD WITH REPEATED TIES
COL.A COL.B
ROW HH1 HH2
1 1 1
2 1 2
3 1 3
4 2 1
5 2 4
6 3 1
7 3 2
8 3 4
9 4 2
This is what I need it to look like:
TWO HOUSEHOLDS WITHOUT REPEATED TIES
COL.A COL.B
ROW HH1 HH2
1 1 1
2 1 2
3 1 3
4 2 4
5 3 2
6 3 4
What I have done
I wrote a simple command for placing the HH1 and HH2 information into the same cell:
=A1&"|"&B1
In the case of the second row, this looks like 1|2 inside cell C2
HH1 and HH2 are combined in column C so how will I be able to compare all of the households in column C to each other? Perhaps a highlighting rule if a repeat happens? Or in another column list if it is a delete or a keep?
Thank you for your assistance everyone.
I suggest a simple COUNTIFS to do the job like this:
=(COUNTIFS(A$1:A1,B2,B$1:B1,A2)+COUNTIFS(A$1:A1,A2,B$1:B1,B2))>0
starting in C2 and then copy down. It will show TRUE for each row which is within the range above it and false if not. Ich checks for both x/y and y/x (the order doesn't matter)
Now simply filter col C to only show rows with TRUE in it. Then simply select and delete it.
This also works with non numerical values like names.
If you still have any questions, just ask ;)
You also can wrap it up to get more informations like this:
=IF((COUNTIFS(A$1:A1,B2,B$1:B1,A2)+COUNTIFS(A$1:A1,A2,B$1:B1,B2)),"",COUNTIFS(A:A,B2,B:B,A2)+COUNTIFS(A:A,A2,B:B,B2))
For C2 and copy down. C1 gets:
=COUNTIFS(A:A,B2,B:B,A2)+COUNTIFS(A:A,A2,B:B,B2)
This will show you only at the first occurrence how many times it is within the whole range.
All done by phone, may contain errors
Use =((A1*B1)/(A1+B1))*((A1*B1)+(A1+B1)) to create unique identifiers. Then use Remove Duplicates in the Data Tools Pane of the Data Tab to remove all rows containing duplicates. Or, alternatively, use something like =IF(IFNA(MATCH(A2,A$1:A1,0),TRUE())=TRUE,"First Share","") dragged and dropped from row 2 to identify First Shares.

Count and average data inside categories under two types of conditions

I'm working on Excel with a lot of data and I'm having difficulty with knowing how to sort through it to get some important numbers. I have minimal Excel experience.
Right now I'm struggling with knowing how to get the average in the difference between two columns. The trick is that I have to get the average in difference when column A is less that column B and then, the same when it's more. And all that within a category.
So for example let's say I have 3 categories: Football, Soccer, and Basketball (these are just made up ones).
So in column A, I have: Soccer, Football, Basketball. Then, in column B and C, I have the scores for John and Adam for the last 3 months, respectively. Lastly, in column D, I have the differences between their scores.
So, for example:
Category John Adam Differences
Soccer 5 3 2
Soccer 6 2 4
Soccer 3 5 2
Soccer 4 0 4
I want to create a table for within each category I have a table like below:
NÂș of cases Avg. Difference between John and Adam
When John's score is >
When John's score is <
When they are equal
Is there some type of formula where I can say something like this:
If the category is Soccer (the category being in column A), take the difference between John's score (column B) and Adam's score (column C) when John's score is larger than Adam's score, then calculate the average of those differences? Then, I would use the same formula but tweak it when John's score is smaller.
Additionally, would there be a formula where I can also, calculate within the category Soccer, how many times John's score is bigger than Adam's?
My data is much larger and I can't do this manually.
A B C D
1 Sport John Adam Differences
2 Soccer 5 3 2
3 Soccer 6 2 4
4 Soccer 3 5 -2
5 Soccer 4 0 4
6 Basketball 20 15 5
7 Basketball 7 13 -6
8 Basketball 26 10 16
9 Basketball 8 11 -3
Type in D1:
=B1-C1
Drag the formula in Column D to all rows which there are values in columns A, B and C.
Create the PivotTable.
Drag Sport to "Row Label" field. Drag Differences to "Row Label" field under Sport.
Drag Differences to "Values" field as: Count of Differences (same way the previous question)
Drag Differences to "Values" field (below Count of Differences), and set the mathematical operation as "Average" of Differences (left-mouse click Differences, choose "Values fields settings" and select "Average").
Give a right-click mouse in cell A5 (see picture bellow) and select "Group" option.
Set "Starting at" = 0; "Ending at" = 1000; "By" = 1000 (as in the picture below). Click ok.
You will have in each Sport, the count (frequency) and average Differences values for two groups:
When the Difference B1-C1 is negative; and
When the Difference B1-C1 is zero or positive.
The average of Differences when the score is equal will be always zero.

Resources