Groupby and print entire dataframe in Pandas - python-3.x

I have the dataset below. I want to count the number of fruits in each country and add that count as a column in the dataset.
I tried to use groupby:
df = df.groupby('Country')['Fruits'].count()
but I am not getting the expected result: the groupby outputs only the counts, not the entire dataframe/dataset.
It would be helpful if someone can suggest a better way to do this.
Dataset
Country Fruits Price Sold Weather
India Mango 200 Market sunny
India Apple 250 Shops sunny
India Banana 50 Market winter
India Grapes 150 Road sunny
Germany Apple 350 Supermarket Autumn
Germany Mango 500 Supermarket Rainy
Germany Kiwi 200 Online Spring
Japan Kaki 300 Online sunny
Japan melon 200 Supermarket sunny
Expected Output
Country Fruits Price Sold Weather Number
India Mango 200 Market sunny 4
India Apple 250 Shops sunny 4
India Banana 50 Market winter 4
India Grapes 150 Road sunny 4
Germany Apple 350 Supermarket Autumn 3
Germany Mango 500 Supermarket Rainy 3
Germany Kiwi 200 Online Spring 3
Japan Kaki 300 Online sunny 2
Japan melon 200 Supermarket sunny 2
Thank you:)

You are looking for transform:
df['count'] = df.groupby('Country')['Fruits'].transform('size')
Country Fruits Price Sold Weather count
0 India Mango 200 Market sunny 4
1 India Apple 250 Shops sunny 4
2 India Banana 50 Market winter 4
3 India Grapes 150 Road sunny 4
4 Germany Apple 350 Supermarket Autumn 3
5 Germany Mango 500 Supermarket Rainy 3
6 Germany Kiwi 200 Online Spring 3
7 Japan Kaki 300 Online sunny 2
8 Japan melon 200 Supermarket sunny 2
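For reference, a self-contained, runnable sketch of the transform approach on the question's data (only the two relevant columns are included; the result column is named Number to match the expected output):

```python
import pandas as pd

# Sample data from the question (Price/Sold/Weather omitted for brevity)
df = pd.DataFrame({
    'Country': ['India', 'India', 'India', 'India',
                'Germany', 'Germany', 'Germany', 'Japan', 'Japan'],
    'Fruits': ['Mango', 'Apple', 'Banana', 'Grapes',
               'Apple', 'Mango', 'Kiwi', 'Kaki', 'melon'],
})

# transform('size') broadcasts each group's row count back to every row
df['Number'] = df.groupby('Country')['Fruits'].transform('size')
print(df)
```

Unlike count(), transform returns a Series aligned with the original index, so it can be assigned straight back as a new column.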

Related

How can I use "groupby()" for gathering country names?

I have a pandas DataFrame with three columns: country name, year, and value. The years run from 1960 to 2020 for each country.
The data looks like this:
Country Name Year value
USA 1960 12
Italy 1960 8
Spain 1960 5
Italy 1961 35
USA 1961 50
I would like to gather the same country names together. How can I do it? I could not succeed using groupby(); groupby() always requires an aggregation function like sum(). Desired output:
Country Name Year value
USA 1960 12
USA 1961 50
Italy 1960 8
Italy 1961 35
Spain 1960 5
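No aggregation is needed here: sorting already gathers identical country names. A minimal sketch, assuming the desired order is first appearance rather than alphabetical (the helper `_rank` column is an illustration, not part of the data):

```python
import pandas as pd

df = pd.DataFrame({
    'Country Name': ['USA', 'Italy', 'Spain', 'Italy', 'USA'],
    'Year': [1960, 1960, 1960, 1961, 1961],
    'value': [12, 8, 5, 35, 50],
})

# Rank each country by first appearance, sort by that rank and Year,
# then drop the helper column
order = {c: i for i, c in enumerate(df['Country Name'].unique())}
out = (df.assign(_rank=df['Country Name'].map(order))
         .sort_values(['_rank', 'Year'])
         .drop(columns='_rank')
         .reset_index(drop=True))
print(out)
```

If alphabetical order is acceptable, a plain `df.sort_values(['Country Name', 'Year'])` suffices.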

Merge one data frame with another and calculate a groupby percentage based on a specific condition

I have two data frames as shown below
df1:
Sports Expected_%
Cricket 70
Football 20
Tennis 10
df2:
Region Sports Count Percentage
North Cricket 800 75
North Football 50 5
North Tennis 150 20
South Cricket 1300 65
South Football 550 27.5
South Tennis 150 7.5
Expected Output:
Region Sports Count Percentage Expected_% Expected_count
North Cricket 800 75 70 700
North Football 50 5 20 200
North Tennis 150 20 10 100
South Cricket 1300 65 70 1400
South Football 550 27.5 20 400
South Tennis 150 7.5 10 200
Explanation:
Expected_% for Cricket = 70
Total Count for North = 1000
Expected_Count for North = 1000*70/100 = 700
Use DataFrame.merge with a left join for the new column, then GroupBy.transform with sum for a new Series, multiply by the new column and divide by 100:
df = df2.merge(df1, on='Sports', how='left')
summed = df.groupby('Region')['Count'].transform('sum')
df['Expected_count'] = summed.mul(df['Expected_%']).div(100)
print (df)
Region Sports Count Percentage Expected_% Expected_count
0 North Cricket 800 75.0 70 700.0
1 North Football 50 5.0 20 200.0
2 North Tennis 150 20.0 10 100.0
3 South Cricket 1300 65.0 70 1400.0
4 South Football 550 27.5 20 400.0
5 South Tennis 150 7.5 10 200.0
Or use Series.map for new column:
df2['Expected_%']= df2['Sports'].map(df1.set_index('Sports')['Expected_%'])
summed = df2.groupby('Region')['Count'].transform('sum')
df2['Expected_count'] = summed.mul(df2['Expected_%']).div(100)
print (df2)
Region Sports Count Percentage Expected_% Expected_count
0 North Cricket 800 75.0 70 700.0
1 North Football 50 5.0 20 200.0
2 North Tennis 150 20.0 10 100.0
3 South Cricket 1300 65.0 70 1400.0
4 South Football 550 27.5 20 400.0
5 South Tennis 150 7.5 10 200.0
Another way, mapping the expected share and multiplying by each region's total:
map_dict = dict(df1.values)
df2['Expected_count'] = df2.groupby('Region').apply(lambda x: x['Count'].sum() * x['Sports'].map(map_dict)).div(100).values
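A self-contained sketch of the merge-and-transform approach on the sample frames:

```python
import pandas as pd

df1 = pd.DataFrame({'Sports': ['Cricket', 'Football', 'Tennis'],
                    'Expected_%': [70, 20, 10]})
df2 = pd.DataFrame({'Region': ['North'] * 3 + ['South'] * 3,
                    'Sports': ['Cricket', 'Football', 'Tennis'] * 2,
                    'Count': [800, 50, 150, 1300, 550, 150],
                    'Percentage': [75, 5, 20, 65, 27.5, 7.5]})

# Left-merge the expected share, then spread each region's total back per row
df = df2.merge(df1, on='Sports', how='left')
summed = df.groupby('Region')['Count'].transform('sum')
df['Expected_count'] = summed.mul(df['Expected_%']).div(100)
print(df)
```

North's total is 1000, so Cricket gets 1000 * 70 / 100 = 700; South's total is 2000, giving 1400.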

Function that returns one value from two columns

I have a DataFrame df:
Country Currency
1 China YEN
2 USA USD
3 Russia USD
4 Germany EUR
5 Nigeria NGN
6 Nigeria USD
7 China CNY
8 USA EUR
9 Nigeria EUR
10 Sweden SEK
I want to make a function that reads both of these columns, by column name, and returns a value that indicates if the currency is a local currency or not.
Result would look like this:
Country Currency LCY?
1 China YEN 0
2 USA USD 1
3 Russia USD 0
4 Germany EUR 1
5 Nigeria NGN 1
6 Nigeria USD 0
7 China CNY 1
8 USA EUR 0
9 Nigeria EUR 0
10 Sweden SEK 1
I tried this, but it didn't work:
LOCAL_CURRENCY = {'China':'CNY',
'USA':'USD',
'Russia':'RUB',
'Germany':'EUR',
'Nigeria':'NGN',
'Sweden':'SEK'}
def f(x,y):
if x in LOCAL_CURRENCY and y in LOCAL_CURRENCY:
return (1)
else:
return (0)
Any thoughts?
You can use map and compare:
df['LCY'] = df['Country'].map(LOCAL_CURRENCY).eq(df['Currency']).astype(int)
Output:
Country Currency LCY
1 China YEN 0
2 USA USD 1
3 Russia USD 0
4 Germany EUR 1
5 Nigeria NGN 1
6 Nigeria USD 0
7 China CNY 1
8 USA EUR 0
9 Nigeria EUR 0
10 Sweden SEK 1
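If you prefer to keep a two-argument function as in the question, a row-wise apply sketch also works (slower than map on large frames; `is_local` is an illustrative name):

```python
import pandas as pd

LOCAL_CURRENCY = {'China': 'CNY', 'USA': 'USD', 'Russia': 'RUB',
                  'Germany': 'EUR', 'Nigeria': 'NGN', 'Sweden': 'SEK'}

df = pd.DataFrame({'Country': ['China', 'USA', 'Nigeria'],
                   'Currency': ['YEN', 'USD', 'NGN']})

def is_local(country, currency):
    # Compare the row's currency against the country's local currency;
    # the original f() only tested dict membership, which never matched
    return int(LOCAL_CURRENCY.get(country) == currency)

df['LCY'] = df.apply(lambda r: is_local(r['Country'], r['Currency']), axis=1)
print(df)
```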

How to sort the Dataframe by comparing two columns

I just want to sort Team 1 and Team 2 by comparing the two columns.
Team 1 Team 2 Winner Ground
0 Australia England Australia Melbourne
1 England Australia England Manchester
2 Australia England Australia Lord
3 England Australia England Birmingham
4 New Zealand Australia Australia Dunedin
5 Australia New Zealand Australia Christchurch
6 India England England Leeds
7 England India England The Oval
After comparing and Sorting it Would be like:
Team 1 Team 2 Winner Ground
0 England Australia Australia Melbourne
1 England Australia England Manchester
2 England Australia Australia Lord
3 England Australia England Birmingham
4 New Zealand Australia Australia Dunedin
5 New Zealand Australia Australia Christchurch
6 India England England Leeds
7 India England England The Oval
If you need to sort the values per row in descending order, only for the team columns, use numpy.sort:
df[['Team 1','Team 2']] = np.sort(df[['Team 1','Team 2']], axis=1)[:, ::-1]
print (df)
Team 1 Team 2 Winner Ground
0 England Australia Australia Melbourne
1 England Australia England Manchester
2 England Australia Australia Lord
3 England Australia England Birmingham
4 New Zealand Australia Australia Dunedin
5 New Zealand Australia Australia Christchurch
6 India England England Leeds
7 India England England The Oval
Details:
First sorting in ascending order:
print (np.sort(df[['Team 1','Team 2']], axis=1))
[['Australia' 'England']
['Australia' 'England']
['Australia' 'England']
['Australia' 'England']
['Australia' 'New Zealand']
['Australia' 'New Zealand']
['England' 'India']
['England' 'India']]
And then swap 'columns' by indexing:
print (np.sort(df[['Team 1','Team 2']], axis=1)[:, ::-1])
[['England' 'Australia']
['England' 'Australia']
['England' 'Australia']
['England' 'Australia']
['New Zealand' 'Australia']
['New Zealand' 'Australia']
['India' 'England']
['India' 'England']]
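A self-contained, runnable sketch of the numpy.sort step on a subset of the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Team 1': ['Australia', 'England', 'New Zealand', 'India'],
    'Team 2': ['England', 'Australia', 'Australia', 'England'],
})

# Sort each row's two team names ascending, then reverse the columns
# so every matchup appears in descending (lexicographic) order
df[['Team 1', 'Team 2']] = np.sort(df[['Team 1', 'Team 2']], axis=1)[:, ::-1]
print(df)
```

Because the sort is purely lexicographic, the same matchup always ends up in the same column order, which is what makes the pairs comparable.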

Display minimum value excluding zero, along with the adjacent column value, for each year (Python 3, dataframe)

I have a dataframe with three columns: Year, Product, Price. I want to calculate the minimum Price, excluding zero, for each year, and also populate the adjacent value from the Product column next to that minimum.
Data:
Year Product Price
2000 Grapes 0
2000 Apple 220
2000 pear 185
2000 Watermelon 172
2001 Orange 0
2001 Muskmelon 90
2001 Pear 165
2001 Watermelon 99
Desired output in a new dataframe:
Year Minimum Price Product
2000 172 Watermelon
2001 90 Muskmelon
First filter out the 0 rows with boolean indexing:
df1 = df[df['Price'] != 0]
Then use DataFrameGroupBy.idxmin to get the index of the minimal Price per group, and select those rows with loc:
df2 = df1.loc[df1.groupby('Year')['Price'].idxmin()]
An alternative is sort_values with drop_duplicates:
df2 = df1.sort_values(['Year', 'Price']).drop_duplicates('Year')
print (df2)
Year Product Price
3 2000 Watermelon 172
5 2001 Muskmelon 90
If multiple minimal values are possible and you need all of them per group:
print (df)
Year Product Price
0 2000 Grapes 0
1 2000 Apple 220
2 2000 pear 172
3 2000 Watermelon 172
4 2001 Orange 0
5 2001 Muskmelon 90
6 2001 Pear 165
7 2001 Watermelon 99
df1 = df[df['Price'] != 0]
df = df1[df1['Price'].eq(df1.groupby('Year')['Price'].transform('min'))]
print (df)
Year Product Price
2 2000 pear 172
3 2000 Watermelon 172
5 2001 Muskmelon 90
EDIT:
print (df)
Year Product Price
0 2000 Grapes 0
1 2000 Apple 220
2 2000 pear 185
3 2000 Watermelon 172
4 2001 Orange 0
5 2001 Muskmelon 90
6 2002 Pear 0
7 2002 Watermelon 0
df['Price'] = df['Price'].replace(0, np.nan)
df2 = df.sort_values(['Year', 'Price']).drop_duplicates('Year')
df2['Product'] = df2['Product'].mask(df2['Price'].isnull(), 'No data')
print (df2)
Year Product Price
3 2000 Watermelon 172.0
5 2001 Muskmelon 90.0
6 2002 No data NaN
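A self-contained sketch of the filter-then-idxmin approach on the original sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001],
    'Product': ['Grapes', 'Apple', 'pear', 'Watermelon',
                'Orange', 'Muskmelon', 'Pear', 'Watermelon'],
    'Price': [0, 220, 185, 172, 0, 90, 165, 99],
})

# Drop zero prices, then pick the row holding each year's minimal Price
df1 = df[df['Price'] != 0]
df2 = df1.loc[df1.groupby('Year')['Price'].idxmin()]
print(df2)
```

Because loc selects whole rows, the Product value adjacent to each year's minimum comes along for free.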