add column from a dataframe to another dataframe with same rows - python-3.x

I have a dataframe (df) that contains 30 000 rows
id Name Age
1 Joey 22
2 Anna 34
3 Jon 33
4 Amy 30
5 Kay 22
And Another dataframe (df2) that contains same columns but with some Ids missing
id Name Age Sport
Jon 33 Tennis
5 Kay 22 Football
Joey 22 Basketball
4 Amy 30 Running
Anna 42 Dancing
I want the missing IDs to appear in df2 with the correspondant name
df2:
id Name Age Sport
3 Jon 33 Tennis
5 Kay 22 Football
1 Joey 22 Basketball
4 Amy 30 Running
2 Anna 42 Dancing
Can someone help ? I am new to pandas and dataframe

you can use .map with .fillna
df2['id'] = df2['id'].replace('',np.nan,regex=True)\
.fillna(df2['Name'].map(df1.set_index('Name')['id'])).astype(int)
print(df2)
id Name Age Sport
0 3 Jon 33 Tennis
1 5 Kay 22 Football
2 1 Joey 22 Basketball
3 4 Amy 30 Running
4 2 Anna 42 Dancing

First, join the two dataframes with pd.merge based on your keys. I suppose the keys are 'Name' and 'Age' in this case. Then replace the null id values in df2, using np.where and .isnull() to find the null values.
df3 = pd.merge(df2, df1, on=['name', 'age'], how='left')
df2['id'] = np.where(df3.id_x.isnull(), df3.id_y, df3.id_x).astype(int)
id name age sport
0 1 Joey 22 Tennis
1 2 Anna 34 Football
2 3 Jon 33 Basketball
3 4 Amy 30 Running
4 5 Kay 22 Dancing

Related

For every row with the same name, find the 5 lowest number Pandas Python

My data frame looks like this
Location week Number
Austria 1 154
Austria 2 140
Belgium 1 139
Bulgaria 2 110
Bulgaria 1 164
the solution should look like this
Location week Number
Austria 3 100
Austria 2 101
Austria 1 102
Bulgaria 2 100
Bulgaria 3 101
Bulgaria 1 102
this means that I need to display
Column 1 : I need to group the countries by name
Column 2 : Week (every country has 53 weeks assigned to them)
Column 3 : Show the numbers that occured in each of 53 weeks in an ascending order
I can not get my head around this
Sort the rows in the order your like (here by Location and Number) and take the first 5 rows per group with groupby+head:
df.sort_values(by=['Location', 'Number']).groupby('Location').head(5)
output:
Location week Number
0 Austria 3 100
1 Austria 2 101
2 Austria 1 102
3 Bulgaria 2 100
4 Bulgaria 3 101
5 Bulgaria 1 102
another way using .cumcount() and .loc
con = df.sort_values('Number',ascending=True).groupby('Location').cumcount()
df.loc[con.lt(5)]

Merge 2 pandas dataframes - python

I have 2 pandas dataframes:
data1=
sample ID
name
sex
0
a
male
1
b
male
2
c
male
3
d
male
4
e
male
data2=
samples
Diabetic
age
0
yes
43
1
yes
50
2
no
63
3
no
21
4
yes
44
I want to merge both data frames to end up with the following data frame
samples
Diabetic
age
name
sex
0
yes
43
a
male
1
yes
50
b
male
2
no
63
c
male
3
no
21
d
male
4
yes
44
e
male

How do I create a new column in pandas which is the sum of another column based on a condition?

I am trying to get the result column to be the sum of the value column for all rows in the data frame where the country is equal to the country in that row, and the date is on or before the date in that row.
Date Country ValueResult
01/01/2019 France 10 10
03/01/2019 England 9 9
03/01/2019 Germany 7 7
22/01/2019 Italy 2 2
07/02/2019 Germany 10 17
17/02/2019 England 6 15
25/02/2019 England 5 20
07/03/2019 France 3 13
17/03/2019 England 3 23
27/03/2019 Germany 3 20
15/04/2019 France 6 19
04/05/2019 England 3 26
07/05/2019 Germany 5 25
21/05/2019 Italy 5 7
05/06/2019 Germany 8 33
21/06/2019 England 3 29
24/06/2019 England 7 36
14/07/2019 France 1 20
16/07/2019 England 5 41
30/07/2019 Germany 6 39
18/08/2019 France 6 26
04/09/2019 England 3 44
08/09/2019 Germany 9 48
15/09/2019 Italy 7 14
05/10/2019 Germany 2 50
I have tried the below code but it sums up the entire column
df['result'] = df.loc[(df['Country'] == df['Country']) & (df['Date'] >= df['Date']), 'Value'].sum()
as your dates are ordered you could do:
df['Result'] = df.grouby('Coutry').Value.cumsum()

Compare two dataframes and remove rows from a df based on a matching column value

I have two pandas df which look like this:
df1:
pid Name score age
100 Ram 3 36
101 Tony 2 40
101 Jack 4 56
200 Jill 6 30
df2
pid Name score age
100 Ram 3 36
101 Tony 2 40
101 John 4 51
101 Jack 9 32
200 Jill 6 30
Both df's are indexed with 'pid'. I would like to compare df1 & df2 based on the column 'score'. i.e, I need to keep only those rows in df2 that are matching with df1 on index and value of score.
My expected result should be
new df2:
pid Name index age
100 Ram 3 36
101 Tony 2 40
101 John 4 51
200 Jill 6 30
Any help on this regard is highly appreciated.
Use merge by columns pid and score, but first create columns from index by reset_index, last create pid index again and for same columns of new DataFrame add reindex by df2.columns:
df = (pd.merge(df1.reset_index(),
df2.reset_index(), on=['score', 'pid'], how='left', suffixes=['_',''])
.set_index('pid')
.reindex(columns=df2.columns))
print (df)
Name score age
pid
100 Ram 3 36
101 Tony 2 40
101 John 4 51
200 Jill 6 30
Inputs:
print (df1)
Name score age
pid
100 Ram 3 36
101 Tony 2 40
101 Jack 4 56
200 Jill 6 30
print (df2)
Name score age
pid
100 Ram 3 36
101 Tony 2 40
101 John 4 51
101 Jack 9 32
200 Jill 6 30

Merge matching rows in excel & summarizing matching columns

Looking to merge some data and summarize the results. Bene poking around google but haven't found anything that will match up duplicates and summarize.
The left side of the table is what I'm starting with, I would like the output on the right side.
Street Name Widgets Sprockets Nuts Bolts Street Name Widgets Sprockets Nuts Bolts
123 Any street ACB Co 10 248 2 50 123 Any street ACB Co 10 846 10 78
123 Any street Bob's plumbing 25 22 2 7 123 Any street Bob's plumbing 25 22 2 7
456 Another st Bill's cars 55 5 456 456 Another st Bill's cars 62 878 13 55
123 Any street ACB Co 54 4 6 789 789 Ave Shelley and co 5 2 2 78
456 Another st Bill's cars 7 878 8 55 789 Ave Divers down 7 90 10 11
789 Ave Shelley and co 5 2 2 78 456 Another st ACB Co 6 50 5
123 Any street ACB Co 544 4 22
456 Another st ACB Co 6 50 5
789 Ave Divers down 6 90 9 4
789 Ave Divers down 1 1 7
Use Pivot Tables an set the layout to tabular.
Details can be found here: https://www.youtube.com/watch?v=LkFPBn7sgEc

Resources