How to update a dataframe column from a second dataframe where the values in two specific columns (which can repeat in the first dataframe) match in both dataframes? - python-3.x

I have two dataframes with different information about a person. In the first dataframe, a person's name may repeat in different rows. I want to add/update the first dataframe with data from the second dataframe where the two columns containing the person's data match in both. Here is an example of what I need to accomplish:
df1:
name surname
0 john doe
1 mary doe
2 peter someone
3 mary doe
4 john another
5 paul another
df2:
name surname account_id
0 peter someone 100
1 john doe 200
2 mary doe 300
3 john another 400
I need to accomplish this:
df1:
name surname account_id
0 john doe 200
1 mary doe 300
2 peter someone 100
3 mary doe 300
4 john another 400
5 paul another <empty>
Thanks!
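One possible approach, sketched here with the sample data from the question, is a left merge on both key columns. The names lookup and result are illustrative only, and the sketch assumes that when a (name, surname) pair repeats in df2, the first occurrence should win:

```python
import pandas as pd

df1 = pd.DataFrame({
    "name": ["john", "mary", "peter", "mary", "john", "paul"],
    "surname": ["doe", "doe", "someone", "doe", "another", "another"],
})
df2 = pd.DataFrame({
    "name": ["peter", "john", "mary", "john"],
    "surname": ["someone", "doe", "doe", "another"],
    "account_id": [100, 200, 300, 400],
})

# Assumption: keep only the first row per (name, surname) in df2,
# so the merge cannot add extra rows to df1.
lookup = df2.drop_duplicates(subset=["name", "surname"])

# A left merge keeps every row of df1, in order; unmatched rows get a missing value.
result = df1.merge(lookup, on=["name", "surname"], how="left")

# Optional: use the nullable integer dtype so matched ids stay integers.
result["account_id"] = result["account_id"].astype("Int64")
print(result)
```

Rows of df1 with no match in df2 (paul another in this sample) end up with a missing account_id rather than being dropped.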

Related

Joining column of different rows in pandas

I have a dataframe and I want to merge the ID column based on the Name column without deleting any rows.
How would I do this?
Example:
Name ID
John ABC
John XYZ
Lucy MNO
I want to convert the above dataframe into the one below:
Name ID
John ABC, XYZ
John ABC, XYZ
Lucy MNO
Use GroupBy.transform with join:
df['ID'] = df.groupby('Name')['ID'].transform(', '.join)
print (df)
Name ID
0 John ABC, XYZ
1 John ABC, XYZ
2 Lucy MNO
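For completeness, a self-contained version of the same idea, built on the sample data from the question. The astype(str) cast is an added assumption to guard against non-string IDs, since str.join only accepts strings:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "John", "Lucy"],
    "ID": ["ABC", "XYZ", "MNO"],
})

# transform broadcasts the joined string back to every row of its group,
# so the number of rows is unchanged.
df["ID"] = df["ID"].astype(str).groupby(df["Name"]).transform(", ".join)
print(df)
```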

How do I create single cell arrays based on unique contiguous groups?

Column A identifies unique families using multiple other columns of data.
Column B is a list of individuals.
I would like Column C to contain cell arrays of these families (Shown Below).
For some reason, the MATCH formula in my attempted solution is returning the last occurrence of the match, so it does not work.
I have tried this formula (the output of this is shown in Column D in the picture):
{=OFFSET(INDEX(A:A, MATCH(A1,A:A)),0,1,COUNTIF(A:A,A1))}
A B C D
1 Tom 1 {Tom One, Sue One} Sue 1
1 Sue 1 {Tom One, Sue One} Sue 1
2 Bob 2 {Bob Two, Joan Two, John Two} John 2
2 Joan 2 {Bob Two, Joan Two, John Two} John 2
2 John 2 {Bob Two, Joan Two, John Two} John 2
3 Tom 3 {Tom Three} Tom 3
4 Joe 4 {Joe Four} Joe 4
You can use the following formula, provided the data is sorted by column A:
="{"&TEXTJOIN(",",TRUE,INDEX(B:B,MATCH(A1,A:A,0)):INDEX(B:B,MATCH(A1,A:A,1)))&"}"

Appending new elements to a column in pandas dataframe

I have a pandas dataframe like this:
df1:
id name gender
1 Alice Male
2 Jenny Female
3 Bob Male
And now I want to add a new column sport which will contain values in the form of a list. Let's say I want to add Football to the rows where gender is Male, so df1 will look like:
df1:
id name gender sport
1 Alice Male [Football]
2 Jenny Female NA
3 Bob Male [Football]
Now, if I want to add Badminton to rows where gender is Female and Tennis to rows where gender is Male, the final output is:
df1:
id name gender sport
1 Alice Male [Football,Tennis]
2 Jenny Female [Badminton]
3 Bob Male [Football,Tennis]
How can I write a general function in Python that appends new values to the column based on the value of another column?
The below should work for you. Initialize the column with an empty list in each row and proceed:
import numpy as np

# df is the question's df1; every row gets its own separate empty list.
df['sport'] = np.empty((len(df), 0)).tolist()

def append_sport(df, filter_df, sport):
    # Append `sport` to the list in each row selected by the boolean mask.
    df.loc[filter_df, 'sport'] = df.loc[filter_df, 'sport'].apply(lambda x: x.append(sport) or x)
    return df

filter_df = (df.gender == 'Male')
df = append_sport(df, filter_df, 'Football')
df = append_sport(df, filter_df, 'Cricket')
Output
id name gender sport
0 1 Alice Male [Football, Cricket]
1 2 Jenny Female []
2 3 Bob Male [Football, Cricket]
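As a variation, here is a minimal sketch that reproduces the exact output asked for in the question (Football and Tennis for Male, Badminton for Female). The helper name append_value and its parameters are hypothetical. Unlike the answer above, it builds new lists instead of mutating the existing ones, so the input dataframe is left unchanged:

```python
import pandas as pd

def append_value(df, column, mask, value):
    """Return a copy of df with `value` appended to the list in `column`
    for every row where `mask` is True. (Hypothetical helper.)"""
    out = df.copy()
    if column not in out.columns:
        # Give each row its own empty list.
        out[column] = pd.Series([[] for _ in range(len(out))],
                                index=out.index, dtype=object)
    # lst + [value] creates a new list, so no existing list is mutated.
    new_lists = [lst + [value] if flag else lst
                 for lst, flag in zip(out[column], mask)]
    out[column] = pd.Series(new_lists, index=out.index, dtype=object)
    return out

df1 = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Jenny", "Bob"],
    "gender": ["Male", "Female", "Male"],
})

result = append_value(df1, "sport", df1["gender"] == "Male", "Football")
result = append_value(result, "sport", result["gender"] == "Female", "Badminton")
result = append_value(result, "sport", result["gender"] == "Male", "Tennis")
print(result)
```

The mask can be any boolean Series aligned with the dataframe, so the same helper covers conditions on other columns as well.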

ms excel turn columns into rows

I am looking for a way to take an Excel spreadsheet and change the columns to rows. The sheet is currently designed like this:
Employee Name Expiration 1 Expiration 2 Expiration 3 Expiration 4
John Doe
Jane Doe
What I need to do is convert it to this:
Employee Name Expiration Date
John Doe 1 12-12-12
John Doe 2 12-12-12
John Doe 3 12-12-12
John Doe 4 12-12-12
Jane Doe 1 12-1-12
Jane Doe 2 12-1-12
Jane Doe 3 12-1-12
Jane Doe 4 12-1-12
I am not even sure if this is possible.
You can do this very simply with an INDEX(MATCH(),MATCH()) formula.
This screenshot should be enough to get you started:
Screenshot(1)
Let me know if you need any more info.

Conditionally assign unique values

I have data, for example:
John Doe MD 1
Ben Doe PA
Cal Doe MD 1
Drum Doe PA
Egg Doe NP
Fun Doe MD 1
So everywhere there's an MD, I have an IF condition assigning the number 1. But I want to isolate and pull all the names with a 1 consecutively. Example:
John Doe
Cal Doe
Fun Doe
I know I just need unique numeric values, such as: john doe-1, cal doe-2, fun doe-3.
I'm having a problem with the logic; if someone can help, I'd appreciate it.
If your data is in three columns (starting A2) then maybe:
=IF(B2="MD",MAX(C$1:C1)+1,"")
