Count rows where two values appear together - excel

My data are in MS Excel:
     Col A  Col B  Col C  Col D
1    amy    john   bob    andy
2    andy   mel    amy    john
3    max    andy   jim    bob
4    will   steve  andy   amy
So, in this 4x4 table there are 9 distinct values.
I need to create a table showing how many times each PAIR of values occurs in the same ROW. Something like this:
       amy   andy  bob   jim   john  max   mel   steve will
amy    0
andy   3     0
bob    1     2     0
jim    0     1     1     0
john   2     2     1     0     0
max    0     1     1     1     0     0
mel    1     1     0     0     1     0     0
steve  1     1     0     0     0     0     0     0
will   1     1     0     0     0     0     0     1     0
And I have no clue how to do it...
To clarify: there are no duplicated values within a row (each row has unique values), and each value is in a separate cell. Within a column, however, values can repeat.
Any help will be much appreciated!

Assuming your data is in A5:D8, I proceeded like this:
Create a helper column with this formula (copied down):
=A5&"-"&B5&"-"&C5&"-"&D5
Name this helper column helper (a named range).
List the unique names across H4:P4 and down G5:G13.
Enter this formula in H5 and copy it down and across to fill the whole 9x9 matrix:
=IF($G5=H$4,0,COUNTIFS(helper,"*"&$G5&"*",helper,"*"&H$4&"*"))
Your matrix is ready.
A detailed blog post on this technique is available on the web.
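For comparison, the same lower-triangular count can be sketched in Python. This is a minimal illustration, not part of the Excel answer: it assumes "wil" and "will" are the same person, and the names pair_counts and lower_triangle are hypothetical. Because it compares whole values, it also sidesteps a pitfall of the "*name*" wildcards above, where a name contained in another name (say "max" and "maxine") would be over-counted.

```python
from collections import Counter
from itertools import combinations

# Rows from the question (assuming "wil" is the same name as "will").
rows = [
    ["amy", "john", "bob", "andy"],
    ["andy", "mel", "amy", "john"],
    ["max", "andy", "jim", "bob"],
    ["will", "steve", "andy", "amy"],
]


def pair_counts(rows):
    """Count how many rows contain each unordered pair of distinct values."""
    counts = Counter()
    for row in rows:
        # set() guards against a value appearing twice in one row;
        # sorting makes each pair key order-independent.
        for a, b in combinations(sorted(set(row)), 2):
            counts[(a, b)] += 1
    return counts


def lower_triangle(rows):
    """Build the question's lower-triangular matrix as tab-separated lines."""
    counts = pair_counts(rows)
    names = sorted({value for row in rows for value in row})
    lines = ["\t" + "\t".join(names)]
    for i, r in enumerate(names):
        # Diagonal is 0, as in the expected output; Counter returns 0 for absent pairs.
        cells = [0 if r == c else counts[tuple(sorted((r, c)))] for c in names[: i + 1]]
        lines.append(r + "\t" + "\t".join(str(x) for x in cells))
    return lines


for line in lower_triangle(rows):
    print(line)
```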

Related

Group by name and count unique values

I have an Excel file like this, where columns A and B are given. I want to add columns C and D, which represent days. D is easy, because it is always one day. C is tricky, because I want to count only "unique" days: a given name/branch combination counts as one day at most, whereas D counts all days.
     A      B       C       D
Row  Name   Branch  Unique  Overall
1    Jack   Health  1       1
2    Jack   Health  0       1
3    Jack   Food    1       1
4    Jolie  Tech    1       1
5    Jolie  Food    1       1
6    Jolie  Tech    0       1
7    Jolie  Health  1       1
I need column C and D for a pivot table like this:
Branch Unique Overall
Health 2 3
Food 2 2
Tech 1 2
I could also add names as a sub-level:
Branch Unique Overall
Health 2 3
-Jack 1 2
-Jolie 1 1
Food 2 2
-Jack 1 1
-Jolie 1 1
Tech 1 2
-Jolie 1 2
But that's something that can be done after preparing the data, and the program provides it anyway. So how can I design a formula that counts only unique branches for a data set of hundreds of rows?
Thank you!
In C2 put:
=--(COUNTIFS($A$2:A2,A2,$B$2:B2,B2)=1)
Then copy it down the column.
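The formula works because $A$2:A2 is a range that grows as it is copied down, so COUNTIFS counts how often the current name/branch pair has appeared so far, and -- turns the TRUE/FALSE result of "=1" into 1/0. A minimal Python sketch of the same first-occurrence logic, checked against the pivot-table totals from the question (unique_flags and branch_totals are hypothetical names):

```python
from collections import Counter

# (Name, Branch) pairs from the question, in row order.
rows = [
    ("Jack", "Health"), ("Jack", "Health"), ("Jack", "Food"),
    ("Jolie", "Tech"), ("Jolie", "Food"), ("Jolie", "Tech"), ("Jolie", "Health"),
]


def unique_flags(rows):
    """1 the first time a (name, branch) pair appears, 0 afterwards."""
    seen = Counter()
    flags = []
    for key in rows:
        seen[key] += 1  # running count up to the current row, like the expanding $A$2:A2
        flags.append(1 if seen[key] == 1 else 0)
    return flags


def branch_totals(rows, flags):
    """Per-branch (Unique, Overall) totals, as the pivot table would show them."""
    totals = {}
    for (_name, branch), flag in zip(rows, flags):
        unique, overall = totals.get(branch, (0, 0))
        totals[branch] = (unique + flag, overall + 1)  # Overall: every row is one day
    return totals


flags = unique_flags(rows)
print(flags)
print(branch_totals(rows, flags))
```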

Using INDEX MATCH with SUMIF

I need to combine SUMIF() with INDEX/MATCH (I'm guessing here), but I don't really know where to start.
Basically, I have a table with different classes of pets, their species and quantities. There are 3 stores. I need an output where I can get the quantity of each species from each store dynamically.
data table:
"A1"              Pet Stores
Species  Class    a  b  c
cat      Fluffy1  1  0  0
cat      Fluffy2  3  0  0
cat      Fluffy3  5  7  1
cat      Fluffy4  6  0  7
dog      Barky1   7  6  9
dog      Barky2   1  3  9
dog      Barky3   0  2  8
dog      Barky4   0  2  3
fish     Swimmy1  0  0  0
fish     Swimmy2  1  3  0
fish     Swimmy3  0  2  3
fish     Swimmy4  0  0  0
Output:
Pet Store a <--change this
cat 15 <--output
dog 8 <--output
fish 1 <--output
Right now my formula for "cat" is =SUMIF($A$3:$A$14,A17,$C$3:$C$14). However, it only looks down the one column I've set. How do I change it so that it looks up the "Pet Store" and returns the sum of the corresponding column?
How about this:
The formula in cell H3, copied down, is:
=SUMIF($A$2:$A$13,G3,INDEX($C$2:$E$13,,MATCH(H$2,$C$1:$E$1,0)))
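INDEX with an empty row argument returns a whole column of $C$2:$E$13, and MATCH chooses that column from the store name, so SUMIF always sums the selected store's column. A rough Python equivalent, assuming the stores are a, b and c in that order (sum_for is a hypothetical name):

```python
# Quantities per store (a, b, c) from the question's table.
STORES = ["a", "b", "c"]
TABLE = [
    ("cat", "Fluffy1", [1, 0, 0]),
    ("cat", "Fluffy2", [3, 0, 0]),
    ("cat", "Fluffy3", [5, 7, 1]),
    ("cat", "Fluffy4", [6, 0, 7]),
    ("dog", "Barky1", [7, 6, 9]),
    ("dog", "Barky2", [1, 3, 9]),
    ("dog", "Barky3", [0, 2, 8]),
    ("dog", "Barky4", [0, 2, 3]),
    ("fish", "Swimmy1", [0, 0, 0]),
    ("fish", "Swimmy2", [1, 3, 0]),
    ("fish", "Swimmy3", [0, 2, 3]),
    ("fish", "Swimmy4", [0, 0, 0]),
]


def sum_for(species, store):
    # Like MATCH(store, header, 0): the store's column position.
    # Raises ValueError for an unknown store, roughly where Excel returns #N/A.
    col = STORES.index(store)
    # Like SUMIF(species_column, species, INDEX(quantities, , col)).
    return sum(qty[col] for sp, _cls, qty in TABLE if sp == species)


for species in ("cat", "dog", "fish"):
    print(species, sum_for(species, "a"))
```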
Slightly shorter than @teylyn's version:
=SUMIF(A$2:A$13,A16,OFFSET(C$2:C$13,,CODE(B$15)-97))
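but it is less versatile, as it relies on the shop names being coded as single letters (CODE(B$15)-97 turns "a" into a column offset of 0, "b" into 1, and so on). That is how they appear in the example, and it makes sense for column labels.
However, my preference would be for a PivotTable.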

Count data using two columns as references

Is it possible to use COUNT or COUNTIF with a column as the data, a cell as the criteria (what to match), and a range of what to count?
Here is what I am looking at:
A1 B C D E F G H I J K L M N O
2 Running Data Total Count of Tardies (by category)
3 Date Employees Leader Start of Shift Break 1 Lunch Break 2 Employees Start of Shift Break 1 Lunch Break 2 Total
4 1-Jul Abe Sue 15 Abe 0
5 3-Jul Steve Bob 20 Anna 0
6 5-Jul Eve Andy 9 20 Eve 0
7 7-Jul Anna Andy 30 Helen 0
8 15-Jul Abe Sue 15 Mark 0
9 18-Jul Anna Andy 10 Steve 0
10 20-Jul Helen Sue 9 0
11 31-Jul Mark Bob 45 0
I am trying to count the data entered on the left (running data) in each category and show the counts per employee on the right (in the orange cells). So Abe should show 1 for Start of Shift, Eve should show 1 for Break 1 and Break 2, and Anna should show 2 for Start of Shift.
I have tried using:
=countif(C:C,$J4,D:D) to get the data from JUST Column D for Start of shift, but it gives an error saying that too many arguments have been entered for this function.
Help...
...and Thanks!
COUNTIF accepts only one range and one criterion, so it can only look at one column to decide what to count.
COUNTIFS accepts multiple range/criteria pairs, so it can look at several columns. Your formula would look something like this:
=COUNTIFS($C:$C,$J4,E:E,">0")
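COUNTIFS counts only the rows where every range/criteria pair matches: $C:$C,$J4 restricts it to the employee, and E:E,">0" to rows with an entry in that category. Because E:E is relative, copying the formula across shifts it to the next category column. A small Python sketch of that logic follows; the sample columns are hypothetical, since the question's layout does not show which category every value belongs to, and countifs/positive are illustrative names.

```python
def countifs(*pairs):
    """Count positions where every (range, predicate) pair holds, like COUNTIFS.

    All ranges must be the same length, as COUNTIFS also requires.
    """
    ranges = [rng for rng, _pred in pairs]
    n = len(ranges[0])
    if any(len(rng) != n for rng in ranges):
        raise ValueError("criteria ranges must be the same size")
    return sum(all(pred(rng[i]) for rng, pred in pairs) for i in range(n))


def positive(value):
    # Blank cells (None) do not satisfy ">0".
    return value is not None and value > 0


# Hypothetical sample columns, one entry per row of running data.
employees = ["Abe", "Eve", "Anna", "Anna"]
start_of_shift = [15, None, 30, 10]
break_1 = [None, 9, None, None]

print(countifs((employees, lambda v: v == "Anna"), (start_of_shift, positive)))  # 2
```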

Creating a Two-Mode Network

Using Python 3.2, I am trying to turn data from a CSV file into a two-mode network. For those who do not know what that means, the idea is simple:
This is a snippet of my dataset:
Project_ID Name_1 Name_2 Name_3 Name_4 ... Name_150
1 Jean Mike
2 Mike
3 Joe Sarah Mike Jean Nick
4 Sarah Mike
5 Sarah Jean Mike Joe
I want to create a CSV that puts the Project_IDs across the first row and each unique name down the first column (with cell A1 blank), with a 1 in cell (i, j) if person i worked on project j. NOTE: My data has full names (with middle initial), and no two people have the same name, so there will not be any duplicates.
The final data output would look like this:
1 2 3 4 5
Jean 1 0 1 0 1
Mike 1 1 1 1 1
Joe 0 0 1 0 1
Sarah 0 0 1 1 1
... ... ... ... ... ...
Nick 0 0 1 0 0
Start by using the csv module's reader (written for Python 3):
import csv

with open('some.csv', newline='') as f:  # Python 3: open in text mode with newline=''
    reader = csv.reader(f)
    for row in reader:
        print(row)
Note that each row is read as a list of strings, one list per line.
The output array should probably be created before you start. As shown in this question, here is how you could do that:
buckets = [[0 for col in range(5)] for row in range(10)]
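Putting those pieces together, a minimal Python 3 sketch that reads the project file and writes the two-mode matrix might look like this. It assumes the first row is the Project_ID/Name_n header, that blank cells come through as empty strings, and that names should be listed in order of first appearance; the function names and the output file name two_mode.csv are hypothetical. Instead of a fixed-size buckets array, it builds each matrix row from a set of members per project, because the number of unique names is not known up front.

```python
import csv


def two_mode_matrix(rows):
    """Return (project_ids, names, matrix) from rows of [Project_ID, name, name, ...].

    matrix[i][j] is 1 if names[i] worked on project_ids[j], else 0.
    Names are kept in order of first appearance; blank cells are skipped.
    """
    project_ids, members, names, seen = [], [], [], set()
    for row in rows:
        if not row:  # skip completely empty lines
            continue
        people = [p.strip() for p in row[1:] if p.strip()]
        project_ids.append(row[0])
        members.append(set(people))
        for person in people:
            if person not in seen:
                seen.add(person)
                names.append(person)
    matrix = [[1 if name in m else 0 for m in members] for name in names]
    return project_ids, names, matrix


def write_two_mode_csv(in_file, out_file):
    """Read the project CSV from in_file and write the name-by-project matrix to out_file."""
    reader = csv.reader(in_file)
    next(reader)  # skip the Project_ID, Name_1, ... header row
    project_ids, names, matrix = two_mode_matrix(reader)
    writer = csv.writer(out_file)
    writer.writerow([""] + project_ids)  # cell A1 left blank
    for name, cells in zip(names, matrix):
        writer.writerow([name] + cells)


# Usage (file names are placeholders):
# with open('some.csv', newline='') as fin, open('two_mode.csv', 'w', newline='') as fout:
#     write_two_mode_csv(fin, fout)
```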

Excel - lookup on one column, result from second column

The first three columns exist. I am trying to create a formula for the fourth (HH_ANALYSIS_FLAG).
ACCOUNT_NUMBER HOUSEHOLD_NUMBER ACCOUNT_ANALYSIS_FLAG HH_ANALYSIS_FLAG
1001 1 1 0
1002 2 0 0
1003 3 1 0
1004 3 0 0
1005 3 0 0
1006 2 0 0
1007 4 0 0
1008 1 1 0
I have 50,000 accounts. They are flagged as being under analysis with the ACCOUNT_ANALYSIS_FLAG column (0 or 1). All accounts belong to a household, and multiple accounts can belong to the same household. I need the HH_ANALYSIS_FLAG column to evaluate to true (1) if any account in the same household is under analysis, and false (0) otherwise. So with the above data and a working formula, my spreadsheet would look like this:
ACCOUNT_NUMBER HOUSEHOLD_NUMBER ACCOUNT_ANALYSIS_FLAG HH_ANALYSIS_FLAG
1001 1 1 1
1002 2 0 0
1003 3 1 1
1004 3 0 1
1005 3 0 1
1006 2 0 0
1007 4 0 0
1008 1 1 1
The following formula should do the trick. In fact, it will give you the total number of accounts being analysed per household.
A B C D
1 ACC_NUM HH_NUM ACC_ANALYSIS_FLAG HH_ANALYSIS_FLAG
2 1001 1 1 =SUMIF(B$2:B$50001, B2, C$2:C$50001)
3 1002 2 0 =SUMIF(B$2:B$50001, B3, C$2:C$50001)
4 1003 3 1 =SUMIF(B$2:B$50001, B4, C$2:C$50001)
For each row, this selects the set of rows that share that row's value in the HH_NUM column and sums the corresponding values in the ACC_ANALYSIS_FLAG column. This gives you the total number of accounts under analysis for the given household. Compare the result to 0 if you only need a boolean value.
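The same per-household sum can be sketched in Python. One likely reason the SUMIF version is slow on 50,000 rows is that every cell rescans the whole column, which is roughly n² comparisons overall; grouping once with a dictionary needs a single pass. This is a minimal sketch with the question's sample rows hard-coded, and hh_flags is a hypothetical name.

```python
from collections import defaultdict

# (ACCOUNT_NUMBER, HOUSEHOLD_NUMBER, ACCOUNT_ANALYSIS_FLAG) from the question.
ROWS = [
    (1001, 1, 1), (1002, 2, 0), (1003, 3, 1), (1004, 3, 0),
    (1005, 3, 0), (1006, 2, 0), (1007, 4, 0), (1008, 1, 1),
]


def hh_flags(rows):
    """HH_ANALYSIS_FLAG for each row: 1 if any account in its household is flagged."""
    totals = defaultdict(int)
    for _acct, hh, flag in rows:
        totals[hh] += flag  # one pass computes the SUMIF total for every household
    # Comparing each total to 0 turns "accounts under analysis" into a 0/1 flag.
    return [1 if totals[hh] > 0 else 0 for _acct, hh, _flag in rows]


print(hh_flags(ROWS))
```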
EDIT:
Apparently the performance of this isn't up to snuff. However, assuming that all accounts in a household are in contiguous rows (for example, the data is sorted by household number), it should be possible to speed things up significantly by changing to something like the following.
2 1001 1 1 =SUMIF(B2:B5, B2, C2:C5)
3 1002 2 0 =SUMIF(B2:B6, B3, C2:C6)
4 1003 2 0 =SUMIF(B2:B7, B4, C2:C7)
5 1004 2 0 =SUMIF(B2:B8, B5, C2:C8)
6 1005 2 0 =SUMIF(B3:B9, B6, C3:C9)
7 1006 2 0 =SUMIF(B4:B10, B7, C4:C10)
8 1007 2 0 =SUMIF(B5:B11, B8, C5:C11)
9 1008 2 0 =SUMIF(B6:B12, B9, C6:C12)
10 1009 2 0 =SUMIF(B7:B13, B10, C7:C13)
This assumes that there are at most 4 accounts per household, and thus limits the range of the SUMIF to the current cell +/- 3 rows.
To avoid referencing invalid cells, the first and last rows have to be treated as special cases. If you need a single formula for all of these cells, I think it should be possible to use OFFSET in combination with MAX, MIN and ROW to generate the appropriate ranges with a little arithmetic.
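The windowed idea, including the MAX/MIN clipping at the first and last rows, can be sketched as follows. Like the answer, it assumes that each household's rows are contiguous and that no household spans more than radius + 1 rows; windowed_hh_totals is a hypothetical name.

```python
def windowed_hh_totals(households, flags, radius=3):
    """Sum flags over rows within +/- radius that share the row's household.

    Only correct if each household's rows are contiguous and a household
    spans at most radius + 1 rows.
    """
    n = len(households)
    totals = []
    for i, hh in enumerate(households):
        lo = max(0, i - radius)      # clip at the first row, like MAX(...)
        hi = min(n - 1, i + radius)  # clip at the last row, like MIN(...)
        window = zip(households[lo:hi + 1], flags[lo:hi + 1])
        totals.append(sum(f for h, f in window if h == hh))
    return totals


# Question's sample sorted by household: 1001, 1008 | 1002, 1006 | 1003, 1004, 1005 | 1007
households = [1, 1, 2, 2, 3, 3, 3, 4]
flags = [1, 1, 0, 0, 1, 0, 0, 0]
print(windowed_hh_totals(households, flags))
```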
Insert another column D (you can hide it later) that equals the household number if the account is being analyzed, and zero if it is not. The formula for D2 can be =B2*C2. Fill column D with this formula.
Then, for your HH_ANALYSIS_FLAG column, count the number of values in column D that match the household number in column B. The formula would be something like =IF(COUNTIF(D:D,"="&B2)>0,1,0).
I'm not sure whether this approach is fast enough for the 50,000 accounts, though.
A B C D E
1 ACCOUNT_NUMBER HOUSEHOLD_NUMBER ACCOUNT_ANALYSIS_FLAG HH_UNDER_ANALYSIS HH_ANALYSIS_FLAG
2 1001 1 1 1 (=B2*C2) =IF(COUNTIF(D:D,"="&B2)>0,1,0)
3 1002 2 0 0 (=B3*C3) =IF(COUNTIF(D:D,"="&B3)>0,1,0)
4 1003 3 1 3 (=B4*C4) =IF(COUNTIF(D:D,"="&B4)>0,1,0)
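In Python terms, the helper column turns each analysed account into its household number, so the COUNTIF becomes a membership test. A minimal sketch with the hypothetical name hh_flags_via_helper; one caveat applies to the spreadsheet as well: non-analysed accounts produce 0 in column D, so a household numbered 0 would be flagged as soon as any account is not under analysis.

```python
def hh_flags_via_helper(households, flags):
    """HH_ANALYSIS_FLAG computed with the helper-column trick."""
    # Column D: =B2*C2, the household number if the account is analysed, else 0.
    helper = [hh * flag for hh, flag in zip(households, flags)]
    # =IF(COUNTIF(D:D,"="&B2)>0,1,0): is this household number present in column D?
    # A set makes each lookup constant-time instead of rescanning the column.
    present = set(helper)
    return [1 if hh in present else 0 for hh in households]


# HOUSEHOLD_NUMBER and ACCOUNT_ANALYSIS_FLAG from the question, in row order.
print(hh_flags_via_helper([1, 2, 3, 3, 3, 2, 4, 1], [1, 0, 1, 0, 0, 0, 0, 1]))
```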
Presuming your HOUSEHOLD_NUMBER column is column B and ACCOUNT_ANALYSIS_FLAG is column C, this formula in D2 (copied down):
=IF(SUMIF(B:B,B2,C:C)>0,1,0)
should do it.
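Kenneth, try this one:
=IF(VLOOKUP(B2,$B$2:$C$9,2,0)=1,1,0)
This assumes your table starts at A1 (so ACCOUNT_NUMBER is in cell A1) and your target column HH_ANALYSIS_FLAG is column D. Note that VLOOKUP returns the flag of the first account it finds for the household, so it reports 1 only when that first account is flagged; this happens to hold for the sample data.
Hope it helps.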
