comma separated values in columns as rows in pandas - python-3.x

I have a dataframe in pandas as mentioned below, where the comma-separated value in column info is the same for every row sharing an id:
id text info
1 great boy,police
1 excellent boy,police
2 nice girl,mother,teacher
2 good girl,mother,teacher
2 bad girl,mother,teacher
3 awesome grandmother
4 superb grandson
I want to spread those comma-separated elements across the rows of each id, like:
id text info
1 great boy
1 excellent police
2 nice girl
2 good mother
2 bad teacher
3 awesome grandmother
4 superb grandson

Let us try
df['new'] = df.loc[~df.id.duplicated(),'info'].str.split(',').explode().values
df
id text info new
0 1 great boy,police boy
1 1 excellent boy,police police
2 2 nice girl,mother,teacher girl
3 2 good girl,mother,teacher mother
4 2 bad girl,mother,teacher teacher
5 3 awesome grandmother grandmother
6 4 superb grandson grandson
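The steps above can be run end to end; a minimal self-contained sketch rebuilding the example frame, assigning back to info rather than a new column so the result matches the requested output:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 4],
    "text": ["great", "excellent", "nice", "good", "bad", "awesome", "superb"],
    "info": ["boy,police", "boy,police",
             "girl,mother,teacher", "girl,mother,teacher", "girl,mother,teacher",
             "grandmother", "grandson"],
})

# Split only the first row of each id, then explode; the exploded values
# line up positionally because each id contributes exactly as many pieces
# as it has rows.
df["info"] = df.loc[~df["id"].duplicated(), "info"].str.split(",").explode().values
print(df)
```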

Take advantage of the fact that 'info' is duplicated.
df['info'] = df['info'].drop_duplicates().str.split(',').explode().to_numpy()
Output:
id text info
0 1 great boy
1 1 excellent police
2 2 nice girl
3 2 good mother
4 2 bad teacher
5 3 awesome grandmother
6 4 superb grandson

One way using pandas.DataFrame.groupby.transform.
Note that this assumes:
after splitting on ',', each info value has exactly as many elements as there are rows for its id
info values are identical within the same id.
df["info"] = df.groupby("id")["info"].transform(lambda x: x.str.split(",").iloc[0])
print(df)
Output:
id text info
0 1 great boy
1 1 excellent police
2 2 nice girl
3 2 good mother
4 2 bad teacher
5 3 awesome grandmother
6 4 superb grandson

Create a temp variable holding each row's position within its info group:
temp = df.groupby('info').cumcount()
Use a list comprehension to index into each row's split info string at that position:
df['info'] = [ent.split(',')[pos] for ent, pos in zip(df['info'], temp)]
df
id text info
0 1 great boy
1 1 excellent police
2 2 nice girl
3 2 good mother
4 2 bad teacher
5 3 awesome grandmother
6 4 superb grandson
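Both steps above, runnable end to end on the example data:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 4],
    "text": ["great", "excellent", "nice", "good", "bad", "awesome", "superb"],
    "info": ["boy,police", "boy,police",
             "girl,mother,teacher", "girl,mother,teacher", "girl,mother,teacher",
             "grandmother", "grandson"],
})

# cumcount gives each row's position within its info group: 0, 1, 0, 1, 2, 0, 0
temp = df.groupby("info").cumcount()

# Pick the piece at that position out of each row's comma-separated string.
df["info"] = [ent.split(",")[pos] for ent, pos in zip(df["info"], temp)]
print(df)
```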

Or try apply:
df['info'] = pd.DataFrame({'info': df['info'].str.split(','), 'n': df.groupby('id').cumcount()}).apply(lambda x: x['info'][x['n']], axis=1)
Output:
>>> df
id text info
0 1 great boy
1 1 excellent police
2 2 nice girl
3 2 good mother
4 2 bad teacher
5 3 awesome grandmother
6 4 superb grandson

Related

SQL: one-to-many join where the many side is used as a filtering condition

Use sqlalchemy
Parent table
id name
1 sea bass
2 Tanaka
3 Mike
4 Louis
5 Jack
Child table
id user_id pname number
1 1 Apples 2
2 1 Banana 1
3 1 Grapes 3
4 2 Apples 2
5 2 Banana 2
6 2 Grapes 1
7 3 Strawberry 5
8 3 Banana 3
9 3 Grapes 1
I want to select, for each parent id that has Apples, the number of Bananas as well, sorted by that number; but when I filter for "parent id with Apples", the Banana rows disappear too. I have searched for a way to achieve this, but have not been able to find one.
Thank you in advance for your help.
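No answer is recorded for this related question; a common technique that fits it is conditional aggregation: require Apples in HAVING rather than WHERE, so the Banana rows survive the join. A sketch with sqlite3 standing in for the real database (table and column names taken from the question; the exact query is an assumption, not a posted answer):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE child (id INTEGER PRIMARY KEY, user_id INTEGER, pname TEXT, number INTEGER);
INSERT INTO parent VALUES (1,'sea bass'),(2,'Tanaka'),(3,'Mike'),(4,'Louis'),(5,'Jack');
INSERT INTO child VALUES
 (1,1,'Apples',2),(2,1,'Banana',1),(3,1,'Grapes',3),
 (4,2,'Apples',2),(5,2,'Banana',2),(6,2,'Grapes',1),
 (7,3,'Strawberry',5),(8,3,'Banana',3),(9,3,'Grapes',1);
""")

# Filtering in HAVING keeps every child row in the aggregation, so the
# Banana counts are still available for parents that also have Apples.
rows = con.execute("""
    SELECT p.id, p.name,
           SUM(CASE WHEN c.pname = 'Banana' THEN c.number ELSE 0 END) AS bananas
    FROM parent p
    JOIN child c ON c.user_id = p.id
    GROUP BY p.id, p.name
    HAVING SUM(CASE WHEN c.pname = 'Apples' THEN 1 ELSE 0 END) > 0
    ORDER BY bananas DESC
""").fetchall()
print(rows)
```

The same SQL string can be issued through sqlalchemy with `text()` if an ORM session is already in use.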

convert repetitive lists as rows in pandas column

I have a dataframe in pandas as mentioned below, where the list in column info is the same for every row sharing an id:
id text info
1 great ['son','daughter']
1 excellent ['son','daughter']
2 nice ['father','mother','brother']
2 good ['father','mother','brother']
2 bad ['father','mother','brother']
3 awesome nan
I want to spread the list elements across the rows of each id, like:
id text info
1 great son
1 excellent daughter
2 nice father
2 good mother
2 bad brother
3 awesome nan
Let us try explode after drop_duplicates
df['info'] = df['info'].drop_duplicates().explode().values
df
Out[298]:
id text info
0 1 great son
1 1 excellent daughter
2 2 nice father
3 2 good mother
4 2 bad brother
5 3 awesome NaN
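A self-contained sketch of the list variant, with one hedged caveat: in some pandas versions drop_duplicates on a column of lists raises TypeError (lists are unhashable), so this sketch deduplicates on id instead, which is equivalent for this data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3],
    "text": ["great", "excellent", "nice", "good", "bad", "awesome"],
    "info": [["son", "daughter"], ["son", "daughter"],
             ["father", "mother", "brother"], ["father", "mother", "brother"],
             ["father", "mother", "brother"], np.nan],
})

# Keep only the first row of each id, then explode the lists; a NaN cell
# explodes to a single NaN, so the lengths still line up with df.
first = df.loc[~df["id"].duplicated(), "info"]
df["info"] = first.explode().values
print(df)
```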

How to select and subset rows based on string in pandas dataframe?

My dataset looks like the following. I am trying to subset my pandas dataframe so that only the responses given by all 3 people get selected. For example, in the data frame below the responses answered by all 3 people were "I like to eat" and "You have nice day", so only those should be kept. I am not sure how to achieve this in a pandas dataframe.
Note: I am new to Python; please provide an explanation with your code.
DataFrame example
import pandas as pd
data = {'Person': ['1', '1', '1', '2', '2', '2', '2', '3', '3'],
        'Response': ['I like to eat', 'You have nice day', 'My name is',
                     'I like to eat', 'You have nice day', 'My name is',
                     'This is it', 'I like to eat', 'You have nice day']}
df = pd.DataFrame(data)
print (df)
Output:
Person Response
0 1 I like to eat
1 1 You have nice day
2 1 My name is
3 2 I like to eat
4 2 You have nice day
5 2 My name is
6 2 This is it
7 3 I like to eat
8 3 You have nice day
IIUC I am using transform with nunique
yourdf=df[df.groupby('Response').Person.transform('nunique')==df.Person.nunique()]
yourdf
Out[463]:
Person Response
0 1 I like to eat
1 1 You have nice day
3 2 I like to eat
4 2 You have nice day
7 3 I like to eat
8 3 You have nice day
Method 2
df.groupby('Response').filter(lambda x : pd.Series(df['Person'].unique()).isin(x['Person']).all())
Out[467]:
Person Response
0 1 I like to eat
1 1 You have nice day
3 2 I like to eat
4 2 You have nice day
7 3 I like to eat
8 3 You have nice day
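A runnable sketch of the transform('nunique') method on the example data (with the responses normalized so identical answers compare equal):

```python
import pandas as pd

df = pd.DataFrame({
    "Person": ["1", "1", "1", "2", "2", "2", "2", "3", "3"],
    "Response": ["I like to eat", "You have nice day", "My name is",
                 "I like to eat", "You have nice day", "My name is",
                 "This is it", "I like to eat", "You have nice day"],
})

# Keep a response only if the number of distinct people who gave it
# equals the total number of distinct people in the frame.
mask = df.groupby("Response")["Person"].transform("nunique") == df["Person"].nunique()
yourdf = df[mask]
print(yourdf)
```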

Add rows according to other rows

My DataFrame object is similar to this one:
Product StoreFrom StoreTo Date
1 out melon StoreQ StoreP 20170602
2 out cherry StoreW StoreO 20170614
3 out Apple StoreE StoreU 20170802
4 in Apple StoreE StoreU 20170812
I want to avoid duplications; the 3rd and 4th rows describe the same action. I am trying to reach
Product StoreFrom StoreTo Date Days
1 out melon StoreQ StoreP 20170602
2 out cherry StoreW StoreO 20170614
5 in Apple StoreE StoreU 20170812 10
and I have more than 10k entries. I could not find similar work to this. Any help will be very useful.
cols = ['Product', 'StoreFrom', 'StoreTo']  # assuming these identify one movement
d1 = df.assign(Date=pd.to_datetime(df.Date.astype(str)))
d2 = d1.assign(Days=d1.groupby(cols).Date.apply(lambda x: x - x.iloc[0]))
d2.drop_duplicates(cols, keep='last')
io Product StoreFrom StoreTo Date Days
1 out melon StoreQ StoreP 2017-06-02 0 days
2 out cherry StoreW StoreO 2017-06-14 0 days
4 in Apple StoreE StoreU 2017-08-12 10 days
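A self-contained sketch of the answer above; cols is assumed to be the columns that identify one movement, and transform stands in for apply to guarantee the result aligns with the original index:

```python
import pandas as pd

df = pd.DataFrame({
    "io": ["out", "out", "out", "in"],
    "Product": ["melon", "cherry", "Apple", "Apple"],
    "StoreFrom": ["StoreQ", "StoreW", "StoreE", "StoreE"],
    "StoreTo": ["StoreP", "StoreO", "StoreU", "StoreU"],
    "Date": [20170602, 20170614, 20170802, 20170812],
})
cols = ["Product", "StoreFrom", "StoreTo"]

# Parse the integer dates, then compute each row's offset from the first
# date in its movement group; keep only the last row of each group.
d1 = df.assign(Date=pd.to_datetime(df["Date"].astype(str)))
d2 = d1.assign(Days=d1.groupby(cols)["Date"].transform(lambda x: x - x.iloc[0]))
out = d2.drop_duplicates(cols, keep="last")
print(out)
```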

Creating a Two-Mode Network

Using Python 3.2, I am trying to turn data from a CSV file into a two-mode network. For those who do not know what that means: a two-mode (bipartite) network links two kinds of entities, here people and projects.
This is a snippet of my dataset:
Project_ID Name_1 Name_2 Name_3 Name_4 ... Name_150
1 Jean Mike
2 Mike
3 Joe Sarah Mike Jean Nick
4 Sarah Mike
5 Sarah Jean Mike Joe
I want to create a CSV that puts the Project_IDs across the first row of the CSV and each unique name down the first column (with cell A1 blank) and then a 1 in the i,j cell if that person worked on a given project. NOTE: My data has full names (with middle initial), with no two people having the same name so there will not be any duplicates.
The final data output would look like this:
1 2 3 4 5
Jean 1 0 1 0 1
Mike 1 1 1 1 1
Joe 0 0 1 0 1
Sarah 0 0 1 1 1
... ... ... ... ... ...
Nick 0 0 1 0 0
Start by using the csv reader:
import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
Note that each row is read as a list of strings, one per line.
The output array should probably be created before you start. Here is one way to pre-allocate it:
buckets = [[0 for col in range(5)] for row in range(10)]
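Building on the reader above, a sketch that carries the idea through to the incidence matrix with plain csv and a dict; an in-memory StringIO stands in for 'some.csv', and the exact header layout is an assumption:

```python
import csv
import io

# In-memory stand-in for the real file; blank cells mark absent names.
data = io.StringIO(
    "Project_ID,Name_1,Name_2,Name_3,Name_4,Name_5\n"
    "1,Jean,Mike,,,\n"
    "2,Mike,,,,\n"
    "3,Joe,Sarah,Mike,Jean,Nick\n"
    "4,Sarah,Mike,,,\n"
    "5,Sarah,Jean,Mike,Joe,\n"
)

reader = csv.reader(data)
next(reader)                    # skip the header row
projects, membership = [], {}   # membership: name -> set of project ids
for row in reader:
    pid, names = row[0], [n for n in row[1:] if n]
    projects.append(pid)
    for name in names:
        membership.setdefault(name, set()).add(pid)

# Emit the incidence matrix: projects across the top, names down the side.
header = [""] + projects
matrix = [[name] + [1 if p in pids else 0 for p in projects]
          for name, pids in membership.items()]
print(header)
for line in matrix:
    print(line)
```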
