How to merge specific columns from another DataFrame in Python pandas?

I have two DataFrames, df1 and df2. df1 has the columns 'id', 'name' and 'rol'; df2 has 'id', 'sal', 'add' and 'deg'.
I need to merge only the 'sal' and 'deg' columns from df2 into df1.
I have successfully merged all columns from df2 into df1,
but now I need to add just those two columns, matched on the common column 'id'.
I am using Python 3.7.
df_right = pd.merge(df1,df2,how='right',on='id')
How can I merge only these two columns ('sal' and 'deg') from df2 on the basis of 'id'?

Just slice df2 down to the columns you need before you merge, like so:
pd.merge(left=df1, right=df2[['id', 'sal', 'deg']], how='right', on='id')
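As a quick check of the slicing approach, here is a minimal sketch on made-up data. The column names come from the question; the row values are invented purely for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({"id": [1, 2, 3],
                    "name": ["a", "b", "c"],
                    "rol": ["x", "y", "z"]})
df2 = pd.DataFrame({"id": [2, 3, 4],
                    "sal": [10, 20, 30],
                    "add": ["p", "q", "r"],
                    "deg": ["BSc", "MSc", "PhD"]})

# Keep the key plus the two wanted columns, then merge as before.
merged = pd.merge(left=df1, right=df2[["id", "sal", "deg"]], how="right", on="id")
print(merged)
```

The key column 'id' has to stay in the slice, otherwise there is nothing to merge on. With how='right', ids that exist only in df2 are kept and their df1 columns are filled with NaN.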

Related

How do I select columns when merging DataFrames with reduce?

I have two DataFrames, df1 and df2:
dfs=[df1,df2]
df_final = reduce(lambda left,right: pd.merge(left,right,on='Serial_Nbr'), dfs)
I want to select only one column from df1, apart from the merge column Serial_Nbr, while doing the merge.
How do I do this?
Filter the columns of df1 before merging (add the one extra column you want to the list):
dfs=[df1[['Serial_Nbr']],df2]
Or, if there are only two DataFrames, drop reduce:
df_final = pd.merge(df1[['Serial_Nbr']], df2, on='Serial_Nbr')
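A minimal sketch of the reduce version with one extra column kept from df1. The columns 'Model', 'Colour' and 'Price' and all row values are hypothetical; only Serial_Nbr comes from the question. Note that reduce has to be imported from functools in Python 3.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample data; only the Serial_Nbr column name is from the question.
df1 = pd.DataFrame({"Serial_Nbr": [1, 2, 3],
                    "Model": ["A", "B", "C"],
                    "Colour": ["red", "blue", "green"]})
df2 = pd.DataFrame({"Serial_Nbr": [2, 3, 4],
                    "Price": [5.0, 6.0, 7.0]})

# Slice df1 to the merge key plus the single column to keep, then merge the list.
dfs = [df1[["Serial_Nbr", "Model"]], df2]
df_final = reduce(lambda left, right: pd.merge(left, right, on="Serial_Nbr"), dfs)
print(df_final)
```

Because the slice happens before the list is built, the same pattern works unchanged when more DataFrames are appended to dfs.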

How to compare multiple columns in two tables and find out the duplicates?

I have two DataFrames:
Dataframe 1
Dataframe 2
The ID column is not unique in the two tables. I want to compare all columns in both tables except the IDs and print the unique rows.
Expected output
I tried the isin function, but it is not working. Each DataFrame has 150000 rows, and I have already removed duplicates within each table. Please advise how to do this.
You can use pd.concat to combine the DataFrames (DataFrame.append did the same job but was removed in pandas 2.0), then use df.duplicated, which will flag the duplicates.
df3 = pd.concat([df1, df2], ignore_index=True)
df4 = df3.duplicated(subset=['Team', 'name', 'Country', 'Token'], keep=False)
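A small end-to-end sketch of this idea, assuming the goal is to keep the rows that appear in only one of the two tables. The column names in the subset are taken from the answer; the row values and the final filtering step are illustrative assumptions.

```python
import pandas as pd

# Columns to compare (everything except ID), as named in the answer.
cols = ["Team", "name", "Country", "Token"]

# Hypothetical sample data.
df1 = pd.DataFrame({"ID": [1, 2],
                    "Team": ["T1", "T2"],
                    "name": ["ann", "bob"],
                    "Country": ["US", "UK"],
                    "Token": [10, 20]})
df2 = pd.DataFrame({"ID": [7, 8],
                    "Team": ["T1", "T3"],
                    "name": ["ann", "cat"],
                    "Country": ["US", "FR"],
                    "Token": [10, 30]})

combined = pd.concat([df1, df2], ignore_index=True)

# keep=False marks every member of a duplicate group, not just the repeats.
is_dup = combined.duplicated(subset=cols, keep=False)

# Rows whose compared columns occur exactly once across both tables.
unique_rows = combined[~is_dup]
print(unique_rows)
```

This relies on each table already being free of internal duplicates, as the question states; otherwise a row repeated within a single table would also be flagged.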

Python: DataFrame Index shifting

I have several DataFrames that I have concatenated with pandas in this line:
xspc = pd.concat([df1,df2,df3], axis = 1, join_axes = [df3.index])
In df2 the index values read one day later than those of df1 and df3. For instance, when the most current date is 7/1/19, the index values for df1 and df3 read '7/1/19' while df2 reads '7/2/19'. I would like to concatenate the DataFrames so that each is joined on its most recent date: the values at df1 index '7/1/19' should be concatenated with the values at df2 index '7/2/19' and df3 index '7/1/19'. What methods can I use to shift the data around so I can join on these non-matching index values?
You can reset the index of each DataFrame and then concatenate them:
df1=df1.reset_index()
df2=df2.reset_index()
df3=df3.reset_index()
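df_final = pd.concat([df1,df2,df3],axis=1).reindex(df3.index)
This should work because you mentioned that the dates in df2 are always one day after those in df1 and df3, so once the indexes are reset the rows line up by position, provided the three DataFrames have the same number of rows in the same order. (The join_axes argument was removed in pandas 1.0; reindexing the result is its replacement.)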
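If the date index should be kept rather than discarded, a different technique from the answer above is to move df2's index back by one day before concatenating. The sketch below assumes the indexes are a DatetimeIndex and that the offset is always exactly one calendar day; the data and column names are made up.

```python
import pandas as pd

# Hypothetical sample data: df2's dates run one day ahead of df1 and df3.
dates = pd.to_datetime(["2019-06-28", "2019-07-01"])
df1 = pd.DataFrame({"a": [1, 2]}, index=dates)
df3 = pd.DataFrame({"c": [5, 6]}, index=dates)
df2 = pd.DataFrame({"b": [3, 4]}, index=dates + pd.Timedelta(days=1))

# Shift df2's labels back one day so they match df1 and df3.
df2_aligned = df2.copy()
df2_aligned.index = df2_aligned.index - pd.Timedelta(days=1)

# Align on the date labels and keep the rows present in df3.
xspc = pd.concat([df1, df2_aligned, df3], axis=1).reindex(df3.index)
print(xspc)
```

Unlike resetting the index, this keeps the dates as labels, so rows still align correctly if one DataFrame is missing a date.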

Difference Between two Data frames

Is there any way to subtract the values of two existing DataFrames with common headers in Java?
For example:
DF1
|H0|H1|H2|H3|
|00|01|02|03|
|04|05|06|07|
|08|09|10|11|
DF2
|H0|H1|H2|H3|H4|
|01|02|03|04|12|
|05|06|07|08|13|
|09|11|12|13|14|
Subtraction example:
DF2 - DF1
|H0|H1|H2|H3|H4|
|01|01|01|01|12|
|01|01|01|01|13|
|01|01|01|01|14|

Pandas data frame merge select columns

I want to get data from only df2 (all columns) by comparing the 'no' field in both df1 and df2.
My three-line code is below; with it I get all columns from both df1 and df2 and cannot drop the fields that come from df1. How can I achieve this?
I have two pandas DataFrames like below:
df1:
no,name,salary
1,abc,100
2,def,105
3,abc,110
4,def,115
5,abc,120
df2:
no,name,salary,dept,addr
1,abc,100,IT1,ADDR1
2,abc,101,IT2,ADDR2
3,abc,102,IT3,ADDR3
4,abc,103,IT4,ADDR4
5,abc,104,IT5,ADDR5
6,abc,105,IT6,ADDR6
7,abc,106,IT7,ADDR7
8,abc,107,IT8,ADDR8
df1 = pd.read_csv("D:\\data\\data1.csv")
df2 = pd.read_csv("D:\\data\\data2.csv")
resDF = pd.merge(df1, df2, on='no' , how='inner')
I think you need to keep only the no column of df1; the on and how parameters are then unnecessary, because merge defaults to an inner join on the common columns:
resDF = pd.merge(df1[['no']], df2)
Or use boolean indexing, filtering with isin:
resDF = df2[df2['no'].isin(df1['no'])]
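To compare the two approaches, here is a sketch that rebuilds the sample tables from the question in memory instead of reading the CSV files.

```python
import pandas as pd

# The sample data from the question, built inline rather than read from CSV.
df1 = pd.DataFrame({"no": [1, 2, 3, 4, 5],
                    "name": ["abc", "def", "abc", "def", "abc"],
                    "salary": [100, 105, 110, 115, 120]})
df2 = pd.DataFrame({"no": list(range(1, 9)),
                    "name": ["abc"] * 8,
                    "salary": list(range(100, 108)),
                    "dept": [f"IT{i}" for i in range(1, 9)],
                    "addr": [f"ADDR{i}" for i in range(1, 9)]})

# Option 1: inner merge on the only shared column, 'no'.
via_merge = pd.merge(df1[["no"]], df2)

# Option 2: boolean mask over df2; keeps df2's original index labels.
via_isin = df2[df2["no"].isin(df1["no"])]

print(via_merge)
print(via_isin)
```

Both return only df2's columns for the rows whose 'no' appears in df1. One difference to keep in mind: if df1 contained repeated 'no' values, the merge would repeat the matching df2 rows, whereas the isin filter returns each df2 row at most once.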
