I have a pandas dataframe like this:
df1:
id name gender
1 Alice Male
2 Jenny Female
3 Bob Male
And now I want to add a new column sport which will contain values in the form of list.Let's I want to add Football to the rows where gender is male So df1 will look like:
df1:
id name gender sport
1 Alice Male [Football]
2 Jenny Female NA
3 Bob Male [Football]
Now if I want to add Badminton to rows where gender is female and tennis to rows where gender is male so that final output is:
df1:
id name gender sport
1 Alice Male [Football,Tennis]
2 Jenny Female [Badminton]
3 Bob Male [Football,Tennis]
How to write a general function in python which will accomplish this task of appending new values into the column based upon some other column value?
The below should work for you. Initialize column with an empty list and proceed
df['sport'] = np.empty((len(df), 0)).tolist()
def append_sport(df, filter_df, sport):
df.loc[filter_df, 'sport'] = df.loc[filter_df, 'sport'].apply(lambda x: x.append(sport) or x)
return df
filter_df = (df.gender == 'Male')
df = append_sport(df, filter_df, 'Football')
df = append_sport(df, filter_df, 'Cricket')
Output
id name gender sport
0 1 Alice Male [Football, Cricket]
1 2 Jenny Female []
2 3 Bob Male [Football, Cricket]
Related
I have two dataframes with different information about a person, on the first dataframe, person's name may repeat in different rows. I want to add/update the first dataframe with data from the second dataframe where the two columns containing person's data matches on both. Here an example on what I need to accomplish:
df1:
name surname
0 john doe
1 mary doe
2 peter someone
3 mary doe
4 john another
5 paul another
df2:
name surname account_id
0 peter someone 100
1 john doe 200
2 mary doe 300
3 john another 400
I need to accomplish this:
df1:
name surname account_id
0 john doe 200
1 mary doe 300
2 peter someone 100
3 mary doe 300
4 john another 400
5 paul another <empty>
Thanks!
If i have a dataframe and i want to merge ID column based on the Name column without deleting any row.
How would i do this?
Ex-
Name
ID
John
ABC
John
XYZ
Lucy
MNO
I want to convert the above dataframe into the below one
Name
ID
John
ABC, XYZ
John
ABC, XYZ
Lucy
MNO
Use GroupBy.transform with join:
df['ID'] = df.groupby('Name')['ID'].transform(', '.join)
print (df)
Name ID
0 John ABC, XYZ
1 John ABC, XYZ
2 Lucy MNO
Background
import pandas as pd
Names = [list(['Jon', 'Smith', 'jon', 'John']),
list([]),
list(['Bob', 'bobby', 'Bobs'])]
df = pd.DataFrame({'Text' : ['Jon J Smith is Here and jon John from ',
'',
'I like Bob and bobby and also Bobs diner '],
'P_ID': [1,2,3],
'P_Name' : Names
})
#rearrange columns
df = df[['Text', 'P_ID', 'P_Name']]
df
Text P_ID P_Name
0 Jon J Smith is Here and jon John from 1 [Jon, Smith, jon, John]
1 2 []
2 I like Bob and bobby and also Bobs diner 3 [Bob, bobby, Bobs]
Goal
I would like to use the following function
df['new']=df.Text.replace(df.P_Name,'**BLOCK**',regex=True)
but skip row 2, since it has an empty list []
Tried
I have tried the following
try:
df['new']=df.Text.replace(df.P_Name,'**BLOCK**',regex=True)
except ValueError:
pass
But I get the following output
Text P_ID P_Name
0 Jon J Smith is Here and jon John from 1 [Jon, Smith, jon, John]
1 2 []
2 I like Bob and bobby and also Bobs diner 3 [Bob, bobby, Bobs]
Desired Output
Text P_ID P_Name new
0 `**BLOCK**` J `**BLOCK**` is Here and `**BLOCK**` `**BLOCK**` from
1 []
2 I like `**BLOCK**` and `**BLOCK**` and also `**BLOCK**` diner
Question
How do I get my desired output by skipping row 2 and continuing with my function?
Locate the rows which do not have an empty list and use your replace method only on those rows:
# Boolean indexing the rows which do not have an empty list
m = df['P_Name'].str.len().ne(0)
df.loc[m, 'New'] = df.loc[m, 'Text'].replace(df.loc[m].P_Name,'**BLOCK**',regex=True)
Output
Text P_ID P_Name New
0 Jon J Smith is Here and jon John from 1 [Jon, Smith, jon, John] **BLOCK** J **BLOCK** is Here and **BLOCK** **BLOCK** from
1 Test 2 [] NaN
2 I like Bob and bobby and also Bobs diner 3 [Bob, bobby, Bobs] I like **BLOCK** and **BLOCK** and also **BLOCK**s diner
I have a column name called Student name and each row has four or five student names -- like this John mills, Tim Harry, Alex win, Kate marry... I want to take the first two student names and store into a new column called Student 1 and Student 2. Names have been separated from comma.
I created a function and i can able to extract first student name . result storing into my dataframe called student_0
def find_student(df2):
for i in range(2):
df2[f"student name_{i}"] = [x.split(',')[i] for x in df2["student name"]]
return df2
new_df = find_student(df2)
df2 is my dataframe name
I AM NOT GETTING SECOND STUDENT NAME. PLEASE ADVISE
Use Series.str.split with select first 2 columns by positions by DataFrame.iloc if need name and surnames:
print (df2)
student name
0 John mills, Tim Harry, Alex win, Kate marry
1 Brando XI, James Caan, Richard S. Castellano
2 Heath Ledger, Aaron Eckhart, Michael Caine
N = 2
df3 = df2["student name"].str.split(', ', expand=True).iloc[:, :N]
#rename columns names
df3.columns = [f"student name_{i+1}" for i in range(len(df3.columns))]
print (df3)
student name_1 student name_2
0 John mills Tim Harry
1 Brando XI James Caan
2 Heath Ledger Aaron Eckhart
Or use list comprehension:
N = 2
L = [x.split(',')[:2] for x in df2["student name"]]
df3 = pd.DataFrame(L, columns=[f"student name_{i+1}" for i in range(N)])
print (df3)
student name_1 student name_2
0 John mills Tim Harry
1 Brando XI James Caan
2 Heath Ledger Aaron Eckhart
If need only names:
N = 2
L = [[y.split()[0] for y in x.split(',')[:2]] for x in df2["student name"]]
df3 = pd.DataFrame(L, columns=[f"student name_{i+1}" for i in range(N)])
print (df3)
student name_1 student name_2
0 John Tim
1 Brando James
2 Heath Aaron
#join to original if necessary
df2 = df2.join(df3)
try this
def find_student(df2):
for i in range(2):
df2[f"student name_{i}"] = pd.Series(map(lambda x: x.split(',')[i], df2["student name"]))
return df2
Use pandas functionality(str and split), you don't need to write a function.
df = [["John mills, Tim Harry, Alex win, Kate marry"],
["Brando XI, James Caan, Richard S. Castellano"],
["Heath Ledger,Aaron Eckhart, Michael Caine"]]
df2 = pd.DataFrame(df)
df2.columns = ['Student_Name']
df2['student name_1'] = df2.Student_Name.str.split(",").str[0]
df2['student name_2'] = df2.Student_Name.str.split(",").str[1]
I have a column for age and a column for gender. How can I get the number of people with ages between 0-18 and with a gender of "male"?
For example, with the following data, the formula should return "1":
Name
Age
Gender
Rick
70
male
Morty
14
male
Summer
17
female
Beth
34
female
Jerry
35
male
Assuming your age data is in column A:A and gender in column B:B:
=COUNTIFS(A:A,"<=18",B:B,"M")
(and I hope all age values are >0). Otherwise:
=COUNTIFS(A:A,"<=18",A:A,">=0",B:B,"M")