I have a dataframe as shown below.
df:
id player country_code country
1 messi arg argentina
2 neymar bra brazil
3 tevez arg argentina
4 aguero arg argentina
5 rivaldo bra brazil
6 owen eng england
7 lampard eng england
8 gerrard eng england
9 ronaldo bra brazil
10 marria arg argentina
From the above df, I would like to extract a mapping dictionary that relates the country_code column to the country column.
Expected Output:
d = {'arg':'argentina', 'bra':'brazil', 'eng':'england'}
A dictionary has unique keys, so you can convert the Series with the duplicated index created from the country_code column:
d = df.set_index('country_code')['country'].to_dict()
If a country_code could possibly map to more than one country, the last value per code is used.
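To make the "last value wins" behaviour concrete, here is a minimal sketch on a contrived toy frame (where 'arg' deliberately maps to two different spellings); drop_duplicates keeps the first value per code instead:

```python
import pandas as pd

# toy frame where 'arg' maps to two different country spellings
df = pd.DataFrame({'country_code': ['arg', 'bra', 'arg'],
                   'country': ['argentina', 'brazil', 'argentine']})

# set_index + to_dict: the last row per duplicated key wins
d_last = df.set_index('country_code')['country'].to_dict()
# {'arg': 'argentine', 'bra': 'brazil'}

# drop_duplicates (keep='first' by default) makes the first row win
d_first = (df.drop_duplicates('country_code')
             .set_index('country_code')['country']
             .to_dict())
# {'arg': 'argentina', 'bra': 'brazil'}
```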
I have the two dataframes below and I'd like to merge them to get Id onto df1. However, with my merge I cannot get the Id when a name appears more than once. df2 has unique names; df1 and df2 differ in rows and columns. My code is below:
df1: Name Region
0 P Asia
1 Q Eur
2 R Africa
3 S NA
4 R Africa
5 R Africa
6 S NA
df2: Name Id
0 P 1234
1 Q 1244
2 R 1233
3 S 1111
code:
x = df1.assign(temp1=df1.groupby('Name').cumcount())
y = df2.assign(temp1=df2.groupby('Name').cumcount())
xy = x.merge(y, on=['Name', 'temp1'], how='left').drop(columns=['temp1'])
the output is:
df1: Name Region Id
0 P Asia 1234
1 Q Eur 1244
2 R Africa 1233
3 S NA 1111
4 R Africa NaN
5 R Africa NaN
6 S NA NaN
How do I find all the id for these duplicate names?
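The cumcount key is what breaks this: df2 has one row per Name, so its cumcount is always 0, while the repeated names in df1 get cumcount 1, 2, … that can never match. Since df2's names are unique, a plain left merge on Name alone already broadcasts each Id to every duplicate; a minimal sketch with the question's data:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['P', 'Q', 'R', 'S', 'R', 'R', 'S'],
                    'Region': ['Asia', 'Eur', 'Africa', 'NA',
                               'Africa', 'Africa', 'NA']})
df2 = pd.DataFrame({'Name': ['P', 'Q', 'R', 'S'],
                    'Id': [1234, 1244, 1233, 1111]})

# df2 has unique names, so merging on Name alone repeats each Id
# for every matching row of df1 -- no cumcount key is needed
xy = df1.merge(df2, on='Name', how='left')
print(xy)
```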
I am comparing two region dataframes against master_df and creating a new column based on a condition where one is available.
For example, I have master_df and two region dataframes, asia_df and europe_df. I want to check whether each company in master_df is available in any of the region dataframes, and create a new column region with the value Europe or Asia.
master_df
company product
ABC Apple
BCA Mango
DCA Apple
ERT Mango
NFT Oranges
europe_df
account sales
ABC 12
BCA 13
DCA 12
asia_df
account sales
DCA 15
ERT 34
My final output dataframe is expected to be
company product region
ABC Apple Europe
BCA Mango Europe
DCA Apple Europe
DCA Apple Asia
ERT Mango Asia
NFT Oranges Others
When I try to merge and compare, some rows are removed. I need help on how to fix this issue:
final_df = europe_df.merge(master_df, left_on='company', right_on='account', how='left').drop_duplicates()
final1_df = asia_df.merge(master_df, left_on='company', right_on='account', how='left').drop_duplicates()
final['region'] = np.where(final_df['account'] == final_df['company'] ,'Europe','Others')
final['region'] = np.where(final1_df['account'] == final1_df['company'] ,'Asia','Others')
First concatenate the dataframes asia_df and europe_df with pd.concat, then use DataFrame.merge to join the result with master_df, and finally use Series.fillna to fill the NaN values in Region with Others:
r = pd.concat([europe_df.assign(Region='Europe'), asia_df.assign(Region='Asia')])\
.rename(columns={'account': 'company'})[['company', 'Region']]
df = master_df.merge(r, on='company', how='left')
df['Region'] = df['Region'].fillna('Others')
Result:
print(df)
company product Region
0 ABC Apple Europe
1 BCA Mango Europe
2 DCA Apple Europe
3 DCA Apple Asia
4 ERT Mango Asia
5 NFT Oranges Others
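If you want to double-check which companies matched no region frame before filling with Others, merge's indicator flag can help; a sketch on the same frames as above:

```python
import pandas as pd

master_df = pd.DataFrame({'company': ['ABC', 'BCA', 'DCA', 'ERT', 'NFT'],
                          'product': ['Apple', 'Mango', 'Apple',
                                      'Mango', 'Oranges']})
europe_df = pd.DataFrame({'account': ['ABC', 'BCA', 'DCA'],
                          'sales': [12, 13, 12]})
asia_df = pd.DataFrame({'account': ['DCA', 'ERT'], 'sales': [15, 34]})

r = pd.concat([europe_df.assign(Region='Europe'),
               asia_df.assign(Region='Asia')]) \
      .rename(columns={'account': 'company'})[['company', 'Region']]

# indicator=True adds a _merge column: 'both' for matched rows,
# 'left_only' for companies found in no region frame
df = master_df.merge(r, on='company', how='left', indicator=True)
df['Region'] = df['Region'].fillna('Others')
print(df[df['_merge'] == 'left_only'])   # only NFT
```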
I have two dataframes as below. I want to rewrite the SQL data-selection query below, which contains a NOT EXISTS condition, in pandas.
SQL
Select ORDER_NUM, DRIVER FROM DF
WHERE
1=1
AND NOT EXISTS
(
SELECT 1 FROM
order_addition oa
WHERE
oa.Flag_Value = 'Y'
AND df.ORDER_NUM = oa.ORDER_NUM)
Sample data
order_addition.head(10)
ORDER_NUM Flag_Value
22574536 Y
32459745 Y
15642314 Y
12478965 N
25845673 N
36789156 N
df.head(10)
ORDER_NUM REGION DRIVER
22574536 WEST Ravi
32459745 WEST David
15642314 SOUTH Rahul
12478965 NORTH David
25845673 SOUTH Mani
36789156 SOUTH Tim
How can this be done easily in pandas?
IIUC, you can merge df2 with the rows of df1 where Flag_Value equals "Y", then keep the rows where the merge produced NaN:
result = df2.merge(df1[df1["Flag_Value"].eq("Y")],how="left",on="ORDER_NUM")
print (result[result["Flag_Value"].isnull()])
ORDER_NUM REGION DRIVER Flag_Value
3 12478965 NORTH David NaN
4 25845673 SOUTH Mani NaN
5 36789156 SOUTH Tim NaN
Or even simpler, if your ORDER_NUM values are unique:
print (df2.loc[~df2["ORDER_NUM"].isin(df1.loc[df1["Flag_Value"].eq("Y"),"ORDER_NUM"])])
ORDER_NUM REGION DRIVER
3 12478965 NORTH David
4 25845673 SOUTH Mani
5 36789156 SOUTH Tim
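Another option: merge's indicator argument is arguably the closest pandas analogue of SQL's NOT EXISTS, since rows marked 'left_only' are exactly the ones with no match on the right. A sketch on the sample data (same naming as above: order_addition holds the flags, df holds the orders):

```python
import pandas as pd

order_addition = pd.DataFrame({
    'ORDER_NUM': [22574536, 32459745, 15642314,
                  12478965, 25845673, 36789156],
    'Flag_Value': ['Y', 'Y', 'Y', 'N', 'N', 'N']})
df = pd.DataFrame({
    'ORDER_NUM': [22574536, 32459745, 15642314,
                  12478965, 25845673, 36789156],
    'REGION': ['WEST', 'WEST', 'SOUTH', 'NORTH', 'SOUTH', 'SOUTH'],
    'DRIVER': ['Ravi', 'David', 'Rahul', 'David', 'Mani', 'Tim']})

# keep only the flagged orders, then mark each row of df as
# 'both' (a flagged match exists) or 'left_only' (NOT EXISTS)
flagged = order_addition.loc[order_addition['Flag_Value'].eq('Y'),
                             ['ORDER_NUM']]
out = df.merge(flagged, on='ORDER_NUM', how='left', indicator=True)
result = out.loc[out['_merge'].eq('left_only'), ['ORDER_NUM', 'DRIVER']]
print(result)
```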
I have a column in my pandas DataFrame with country names. I want to apply different filters on the column using if-else conditions and add a new column to that DataFrame based on those conditions.
Current DataFrame:
Company Country
BV Denmark
BV Sweden
DC Norway
BV Germany
BV France
DC Croatia
BV Italy
DC Germany
BV Austria
BV Spain
I have tried this, but I have to list the countries again and again:
bookings_d2.loc[(bookings_d2.Country== 'Denmark') | (bookings_d2.Country== 'Norway'), 'Country'] = bookings_d2.Country
In R I currently use ifelse conditions like this; I want to implement the same thing in Python.
R Code Example 1 :
ifelse(bookings_d2$COUNTRY_NAME %in% c('Denmark','Germany','Norway','Sweden','France','Italy','Spain','Germany','Austria','Netherlands','Croatia','Belgium'),
as.character(bookings_d2$COUNTRY_NAME),'Others')
R Code Example 2 :
ifelse(bookings_d2$country %in% c('Germany'),
ifelse(bookings_d2$BOOKING_BRAND %in% c('BV'),'Germany_BV','Germany_DC'),bookings_d2$country)
Expected DataFrame:
Company Country
BV Denmark
BV Sweden
DC Norway
BV Germany_BV
BV France
DC Croatia
BV Italy
DC Germany_DC
BV Others
BV Others
Not sure exactly what you are trying to achieve, but I guess it is something along the lines of:
df=pd.DataFrame({'country':['Sweden','Spain','China','Japan'], 'continent':[None] * 4})
country continent
0 Sweden None
1 Spain None
2 China None
3 Japan None
df.loc[(df.country=='Sweden') | ( df.country=='Spain'), 'continent'] = "Europe"
df.loc[(df.country=='China') | ( df.country=='Japan'), 'continent'] = "Asia"
country continent
0 Sweden Europe
1 Spain Europe
2 China Asia
3 Japan Asia
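The same lookup can also be written with Series.map plus fillna, which scales better when the country-to-continent pairs are numerous; a minimal sketch on a similar toy frame (Brazil added to show the default):

```python
import pandas as pd

df = pd.DataFrame({'country': ['Sweden', 'Spain', 'China',
                               'Japan', 'Brazil']})
mapping = {'Sweden': 'Europe', 'Spain': 'Europe',
           'China': 'Asia', 'Japan': 'Asia'}

# map returns NaN for unmapped countries; fillna supplies the default
df['continent'] = df['country'].map(mapping).fillna('Other')
print(df)
```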
You can also use a Python list comprehension:
df.continent=["Europe" if (x=="Sweden" or x=="Denmark") else "Other" for x in df.country]
You can use:
For example 1: use Series.isin with numpy.where, or loc, but it is necessary to invert the mask with ~:
#removed Austria, Spain
L = ['Denmark','Germany','Norway','Sweden','France','Italy',
'Germany','Netherlands','Croatia','Belgium']
df['Country'] = np.where(df['Country'].isin(L), df['Country'], 'Others')
Alternative:
df.loc[~df['Country'].isin(L), 'Country'] ='Others'
For example 2: use numpy.select or nested np.where:
m1 = df['Country'] == 'Germany'
m2 = df['Company'] == 'BV'
df['Country'] = np.select([m1 & m2, m1 & ~m2],['Germany_BV','Germany_DC'], df['Country'])
Alternative:
df['Country'] = np.where(~m1, df['Country'],
np.where(m2, 'Germany_BV','Germany_DC'))
print (df)
Company Country
0 BV Denmark
1 BV Sweden
2 DC Norway
3 BV Germany_BV
4 BV France
5 DC Croatia
6 BV Italy
7 DC Germany_DC
8 BV Others
9 BV Others
You can do it like this:
country_others = ['Poland', 'Switzerland']
m = df['Country'] == 'Germany'
df.loc[m, 'Country'] = df.loc[m, 'Country'] + '_' + df.loc[m, 'Company']
df.loc[(df['Company'] == 'DC') & (df['Country'].isin(country_others)), 'Country'] = 'Others'