Creating a Two-Mode Network - python-3.x

Using Python 3.2 I am trying to turn data from a CSV file into a two-mode network. For those who do not know what that means, the idea is simple:
This is a snippet of my dataset:
Project_ID Name_1 Name_2 Name_3 Name_4 ... Name_150
1 Jean Mike
2 Mike
3 Joe Sarah Mike Jean Nick
4 Sarah Mike
5 Sarah Jean Mike Joe
I want to create a CSV that puts the Project_IDs across the first row of the CSV and each unique name down the first column (with cell A1 blank) and then a 1 in the i,j cell if that person worked on a given project. NOTE: My data has full names (with middle initial), with no two people having the same name so there will not be any duplicates.
The final data output would look like this:
1 2 3 4 5
Jean 1 0 1 0 1
Mike 1 1 1 1 1
Joe 0 0 1 0 1
Sarah 0 0 1 1 1
... ... ... ... ... ...
Nick 0 0 1 0 0

Start by using the CSV reader:
import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
Note that each row is read as a list of strings, one list per line.
The output array should probably be created before you start. As shown in another question, here is how you could do that:
buckets = [[0 for col in range(5)] for row in range(10)]
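Putting the two pieces together, here is a minimal sketch of the whole conversion. It assumes the file is comma-delimited with a header row, and it uses an in-memory sample (built from the snippet in the question) instead of a real file; the function name two_mode_matrix is made up for illustration.

```python
import csv
import io

def two_mode_matrix(rows):
    """Build the name-by-project table from rows of [project_id, name, name, ...]."""
    projects = []     # project IDs in file order
    names = []        # unique names in first-seen order
    membership = {}   # name -> set of project IDs that person worked on
    for row in rows:
        if not row:
            continue
        project_id = row[0]
        projects.append(project_id)
        for name in row[1:]:
            name = name.strip()
            if not name:
                continue  # skip the empty Name_N cells
            if name not in membership:
                membership[name] = set()
                names.append(name)
            membership[name].add(project_id)
    table = [[""] + projects]  # cell A1 stays blank
    for name in names:
        table.append([name] + [1 if p in membership[name] else 0 for p in projects])
    return table

# Assumed sample data standing in for the real CSV file.
sample = io.StringIO(
    "Project_ID,Name_1,Name_2,Name_3,Name_4,Name_5\n"
    "1,Jean,Mike,,,\n"
    "2,Mike,,,,\n"
    "3,Joe,Sarah,Mike,Jean,Nick\n"
    "4,Sarah,Mike,,,\n"
    "5,Sarah,Jean,Mike,Joe,\n"
)
reader = csv.reader(sample)
next(reader)  # skip the header row
table = two_mode_matrix(reader)

# Write the result as CSV (to a string here; a real file would work the same way).
out = io.StringIO()
csv.writer(out).writerows(table)
print(out.getvalue())
```

For a real file, replace the StringIO objects with open('some.csv', newline='') for reading and open('out.csv', 'w', newline='') for writing. Collecting names in a dict of sets avoids having to know the matrix size in advance.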

Related

Count rows where two values appear together

My data are in MS Excel:
Col A Col B Col C Col D
1 amy john bob andy
2 andy mel amy john
3 max andy jim bob
4 wil steve andy amy
So, in 4x4 table there are 9 different values.
I need to create table to find how many times each PAIR is occurring in the same ROW. Something like this:
amy andy bob jim john max mel steve will
amy 0
andy 3 0
bob 1 2 0
jim 0 1 1 0
john 2 2 1 0 0
max 0 1 1 1 0 0
mel 1 1 0 0 1 0 0
steve 1 1 0 0 0 0 0 0
will 1 1 0 0 0 0 0 1 0
And I have no clue how to do it...
To reiterate: no value is duplicated within a row, and each value sits in its own cell. Values can repeat within a column.
Any help will be much appreciated!
Assuming your data is in A5:D8 I proceeded like this -
created a helper column with the formula (copied downwards)
=A5&"-"&B5&"-"&C5&"-"&D5
Named this helper column as helper (named range)
listed down and across the unique combinations of names in H4:P4 (across) and G5:G13 (down)
enter this formula in H5 and copy it both downwards and across to fill all 9x9 matrix
=IF($G5=H$4,0,COUNTIFS(helper,"*"&$G5&"*",helper,"*"&H$4&"*"))
Your desired matrix is ready
A detailed blog is available on web for this.
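For comparison, the same pair counts can be sketched outside Excel. This is not the answerer's method, just a minimal Python illustration using the four rows from the question; the helper name together is made up. It compares whole names rather than substrings, so it does not depend on wildcard matching.

```python
from collections import Counter
from itertools import combinations

# The four rows from the question, typed in as-is.
rows = [
    ["amy", "john", "bob", "andy"],
    ["andy", "mel", "amy", "john"],
    ["max", "andy", "jim", "bob"],
    ["wil", "steve", "andy", "amy"],
]

# Count every unordered pair of names that shares a row.
pair_counts = Counter()
for row in rows:
    for a, b in combinations(sorted(set(row)), 2):
        pair_counts[(a, b)] += 1

def together(a, b):
    """Number of rows in which both names appear (hypothetical helper)."""
    return pair_counts[tuple(sorted((a, b)))]

print(together("amy", "andy"))
print(together("andy", "bob"))
```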

Group by name and count unique values

I have an Excel file like this, where columns A and B are given. I want to add columns C and D, which represent days. D is easy because it is always one day. C is tricky: I want to count only "unique" days, so each branch counts as at most one day per name, whereas D counts all days.
A B C D
Row Name Branch Unique Overall
1 Jack Health 1 1
2 Jack Health 0 1
3 Jack Food 1 1
4 Jolie Tech 1 1
5 Jolie Food 1 1
6 Jolie Tech 0 1
7 Jolie Health 1 1
I need column C and D for a pivot table like this:
Branch Unique Overall
Health 2 3
Food 2 2
Tech 1 2
I also could add names as a sub position.
Branch Unique Overall
Health 2 3
-Jack 1 2
-Jolie 1 1
Food 2 2
-Jack 1 1
-Jolie 1 1
Tech 1 2
-Jolie 1 2
But that is something that can be done after preparing the data, and the program provides it anyway. So how can I design a formula that counts only unique branches for a data set of hundreds of rows?
Thank you!
In C2 put:
=--(COUNTIFS($A$2:A2,A2,$B$2:B2,B2)=1)
Then copy down
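The formula flags a row with 1 only the first time a given Name/Branch pair appears. A minimal Python sketch of that same logic, using the rows from the question (the variable names are made up for illustration):

```python
# (Name, Branch) pairs from the question, in row order.
rows = [
    ("Jack", "Health"),
    ("Jack", "Health"),
    ("Jack", "Food"),
    ("Jolie", "Tech"),
    ("Jolie", "Food"),
    ("Jolie", "Tech"),
    ("Jolie", "Health"),
]

seen = set()
unique_flags = []
for name, branch in rows:
    key = (name, branch)
    # 1 on the first occurrence of this pair, 0 on every repeat.
    unique_flags.append(0 if key in seen else 1)
    seen.add(key)

# Per-branch totals, like the pivot table in the question: [unique, overall].
totals = {}
for (name, branch), flag in zip(rows, unique_flags):
    unique, overall = totals.get(branch, (0, 0))
    totals[branch] = (unique + flag, overall + 1)

print(unique_flags)
print(totals)
```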

How can I split a single dataframe column, containing multiple values, into a transposed table of 'flags'

I'm trying to separate a single column in a dataframe, that contains multiple comma separated values, into a transposed table that contains columns which flag every potential comma separated value.
e.g.
name purchased total_value
-----------------------------------------------
John Eggs, bread, milk 100
Steve Milk, cheese, wine 140
Susan Beer, cheese, milk 120
Needs to become:
name total_value eggs bread milk cheese wine beer
-----------------------------------------------------------------
John 100 1 1 1 0 0 0
Steve 140 0 0 1 1 1 0
Susan 120 0 0 1 1 0 1
As a small added complication, the purchased column as spaces after the commas, and some of the values have capitals.
Can anyone help me get from A to B?
Thanks in advance
First convert all values to lowercase with str.lower, then call str.get_dummies, and finally join the result to the original DataFrame.
To remove the original purchased column you can use either pop or drop. The two variants below are alternatives, so run only one of them:
df['purchased'] = df['purchased'].str.lower()
df = df.join(df.pop('purchased').str.get_dummies(', '))
# alternative using drop:
df = df.drop('purchased', axis=1).join(df['purchased'].str.lower().str.get_dummies(', '))
print (df)
name total_value beer bread cheese eggs milk wine
0 John 100 0 1 0 1 1 0
1 Steve 140 0 0 1 0 1 1
2 Susan 120 1 0 1 0 1 0
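To make it clearer what str.get_dummies is doing here, this is a rough plain-Python sketch of the same split, lowercase, and flag steps without pandas. It is only an illustration on the three rows from the question; split_items is a made-up helper name.

```python
records = [
    {"name": "John", "purchased": "Eggs, bread, milk", "total_value": 100},
    {"name": "Steve", "purchased": "Milk, cheese, wine", "total_value": 140},
    {"name": "Susan", "purchased": "Beer, cheese, milk", "total_value": 120},
]

def split_items(text):
    """Split on commas, trim the spaces, and lowercase each item."""
    return [item.strip().lower() for item in text.split(",") if item.strip()]

# Every distinct item becomes one flag column (sorted alphabetically).
all_items = sorted({item for r in records for item in split_items(r["purchased"])})

flagged = []
for r in records:
    bought = set(split_items(r["purchased"]))
    row = {"name": r["name"], "total_value": r["total_value"]}
    for item in all_items:
        row[item] = 1 if item in bought else 0
    flagged.append(row)

for row in flagged:
    print(row)
```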

using index match with sum if

I need to combine a SUMIF() with an INDEX/MATCH (I'm guessing here) but don't really know where to start.
Basically I have a table with different classes of pets, their species, and quantity. There are 3 stores. I need an output where I can get the quantity of each species from each store dynamically.
data table:
"A1" Pet Stores
Species Class a b c
cat Fluffy1 1 0 0
cat Fluffy2 3 0 0
cat Fluffy3 5 7 1
cat Fluffy4 6 0 7
dog Barky1 7 6 9
dog Barky2 1 3 9
dog Barky3 0 2 8
dog Barky4 0 2 3
fish Swimmy1 0 0 0
fish Swimmy2 1 3 0
fish Swimmy3 0 2 3
fish Swimmy4 0 0 0
Output:
Pet Store a <--change this
cat 15 <--output
dog 8 <--output
fish 1 <--output
Right now my formula for "cat" is =SUMIF($A$3:$A$14,A17,$C$3:$C$14). However, it only looks down the one column that I've set. How do I change it so that it searches for the "Pet Store" and returns the sum of the respective column?
How about this:
Formula in cell H3 copied down is
=SUMIF($A$2:$A$13,G3,INDEX($C$2:$E$13,,MATCH(H$2,$C$1:$E$1,0)))
Slightly shorter than @teylyn's version:
=SUMIF(A$2:A$13,A16,OFFSET(C$2:C$13,,CODE(B$15)-97))
but less versatile, as it relies on the shop names being consecutive lowercase letters starting at a (which matches the example and makes sense for column labels).
However, my preference would be for a PivotTable.
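The idea behind both formulas is to pick the store column by its header and then sum it per species. A minimal Python sketch of that lookup on the table from the question (totals_for_store is a made-up name):

```python
header = ["Species", "Class", "a", "b", "c"]
data = [
    ["cat", "Fluffy1", 1, 0, 0],
    ["cat", "Fluffy2", 3, 0, 0],
    ["cat", "Fluffy3", 5, 7, 1],
    ["cat", "Fluffy4", 6, 0, 7],
    ["dog", "Barky1", 7, 6, 9],
    ["dog", "Barky2", 1, 3, 9],
    ["dog", "Barky3", 0, 2, 8],
    ["dog", "Barky4", 0, 2, 3],
    ["fish", "Swimmy1", 0, 0, 0],
    ["fish", "Swimmy2", 1, 3, 0],
    ["fish", "Swimmy3", 0, 2, 3],
    ["fish", "Swimmy4", 0, 0, 0],
]

def totals_for_store(store):
    """Sum the chosen store's column per species (hypothetical helper)."""
    col = header.index(store)  # plays the role of MATCH on the header row
    totals = {}
    for row in data:
        species = row[0]
        totals[species] = totals.get(species, 0) + row[col]
    return totals

print(totals_for_store("a"))
```

Changing the argument to "b" or "c" switches the column, the same way changing the "Pet Store" cell does in the worksheet.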

count data using two columns as references

Is it possible to count or countif by using a column as the data, a cell for the criteria (or what to match) and range of what to count?
Here is what I am looking at:
A1 B C D E F G H I J K L M N O
2 Running Data Total Count of Tardies (by category)
3 Date Employees Leader Start of Shift Break 1 Lunch Break 2 Employees Start of Shift Break 1 Lunch Break 2 Total
4 1-Jul Abe Sue 15 Abe 0
5 3-Jul Steve Bob 20 Anna 0
6 5-Jul Eve Andy 9 20 Eve 0
7 7-Jul Anna Andy 30 Helen 0
8 15-Jul Abe Sue 15 Mark 0
9 18-Jul Anna Andy 10 Steve 0
10 20-Jul Helen Sue 9 0
11 31-Jul Mark Bob 45 0
I am trying to count the data entered on the left (running data) in each category and having it show based on the Employees on the right (in the orange cells). So Abe should show 1 for Start of Shift, Eve should show 1 for Break 1 and Break 2, and Anna should show 2 for Start of Shift.
I have tried using:
=countif(C:C,$J4,D:D) to get the data from JUST column D for Start of Shift, but it gives an error saying too many arguments for the function have been entered.
Help...
...and Thanks!
COUNTIF looks at only one column to decide what to count.
COUNTIFS can look at multiple columns. Your formula would look something like this:
=COUNTIFS($C:$C,$J4,E:E,">0")
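The COUNTIFS call counts rows where the employee matches and the category cell holds a number greater than 0. Here is a rough Python sketch of that rule. The sample below contains only Anna's and Eve's rows, with the values placed in the columns the question's description implies; that placement is an assumption, since the original table layout did not survive, and the other rows are left out for the same reason.

```python
from collections import defaultdict

CATEGORIES = ["Start of Shift", "Break 1", "Lunch", "Break 2"]

# Assumed placement of the values: (employee, {category: minutes late}).
running_data = [
    ("Eve", {"Break 1": 9, "Break 2": 20}),
    ("Anna", {"Start of Shift": 30}),
    ("Anna", {"Start of Shift": 10}),
]

# counts[employee][category] = number of rows with a value > 0
counts = defaultdict(lambda: dict.fromkeys(CATEGORIES, 0))
for employee, minutes in running_data:
    for category in CATEGORIES:
        if minutes.get(category, 0) > 0:
            counts[employee][category] += 1

for employee in sorted(counts):
    total = sum(counts[employee].values())
    print(employee, counts[employee], "Total:", total)
```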
