Calculate Percentage using Pandas DataFrame - python-3.x

Of all the medals won by these 5 countries across all Olympics, what is the percentage of medals won by each of them?
I have combined all the Excel files into one using a pandas DataFrame, but I am now stuck on finding the percentage.
Country Gold Silver Bronze Total
0 USA 10 13 11 34
1 China 2 2 4 8
2 UK 1 0 1 2
3 Germany 12 16 8 36
4 Australia 2 0 0 2
0 USA 9 9 7 25
1 China 2 4 5 11
2 UK 0 1 0 1
3 Germany 11 12 6 29
4 Australia 1 0 1 2
0 USA 9 15 13 37
1 China 5 2 4 11
2 UK 1 0 0 1
3 Germany 10 13 7 30
4 Australia 2 1 0 3
Combined data sheet
The code I have tried so far:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame()
for f in ['E:\\olympics\\Olympics-2002.xlsx', 'E:\\olympics\\Olympics-2006.xlsx',
          'E:\\olympics\\Olympics-2010.xlsx', 'E:\\olympics\\Olympics-2014.xlsx',
          'E:\\olympics\\Olympics-2018.xlsx']:
    data = pd.read_excel(f, 'Sheet1')
    df = df.append(data)
df.to_excel("E:\\olympics\\combineddata.xlsx")
data = pd.read_excel("E:\\olympics\\combineddata.xlsx")
print(data)

final_Data = {}
for i in data['Country']:
    x = i
    t1 = (data[(data.Country == x)].Total).tolist()
    print("Name of Country=", i, int(sum(t1)))
    final_Data.update({i: int(sum(t1))})

t3 = data.groupby('Country').Total.sum()
t2 = df['Total'].sum()
t4 = t3 / t2 * 100
print(t3)
print(t2)
print(t4)
This is how I got the answer. Now I need to plot it; I want to show it as a pie chart.
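A side note on the loop above: `DataFrame.append` was deprecated and removed in pandas 2.0, so collecting the yearly frames in a list and calling `pd.concat` once is the usual replacement. A minimal sketch, with small in-memory frames standing in for the Excel files:

```python
import pandas as pd

# Stand-ins for the per-year Excel files
frames = [
    pd.DataFrame({'Country': ['USA', 'China'], 'Total': [34, 8]}),
    pd.DataFrame({'Country': ['USA', 'China'], 'Total': [25, 11]}),
]

# One concat call instead of repeated append in a loop
df = pd.concat(frames, ignore_index=True)
t3 = df.groupby('Country').Total.sum()
```

With real files you would build `frames` by calling `pd.read_excel(f, 'Sheet1')` for each path in the list, exactly as the loop does.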

Let's assume you have created the DataFrame as 'df'. Then you can do the following to first group by and then calculate percentages.
df = df.groupby('Country').sum()
df['Gold_percent'] = (df['Gold'] / df['Gold'].sum()) * 100
df['Silver_percent'] = (df['Silver'] / df['Silver'].sum()) * 100
df['Bronze_percent'] = (df['Bronze'] / df['Bronze'].sum()) * 100
df['Total_percent'] = (df['Total'] / df['Total'].sum()) * 100
df = df.round(2)
print (df)
The output will be as follows:
Gold Silver Bronze ... Silver_percent Bronze_percent Total_percent
Country ...
Australia 5 1 1 ... 1.14 1.49 3.02
China 9 8 13 ... 9.09 19.40 12.93
Germany 33 41 21 ... 46.59 31.34 40.95
UK 2 1 1 ... 1.14 1.49 1.72
USA 28 37 31 ... 42.05 46.27 41.38
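Since the question ends by asking for a pie chart, here is a minimal sketch building on the Total_percent column above. The country totals are simply the summed sample data, and matplotlib is assumed to be available for the plotting step:

```python
import pandas as pd

# Grouped totals from the sample data above
df = pd.DataFrame({'Country': ['Australia', 'China', 'Germany', 'UK', 'USA'],
                   'Total': [7, 30, 95, 4, 96]}).set_index('Country')
df['Total_percent'] = df['Total'] / df['Total'].sum() * 100

try:
    import matplotlib
    matplotlib.use('Agg')  # non-interactive backend for scripts
    import matplotlib.pyplot as plt
    # One slice per country, labelled with its percentage share
    df['Total_percent'].plot.pie(autopct='%.2f%%')
    plt.savefig('medal_share.png')
except ImportError:
    pass  # plotting step skipped when matplotlib is unavailable
```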

I don't have the exact dataset you have, so I'm explaining with a similar one. Try adding a column with the sum of medals across each row, then find the percentage by dividing each row by the sum of the entire column.
I am posting this as a model; check it:
import pandas as pd
cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
'ExshowroomPrice': [21000,26000,28000,34000],'RTOPrice': [2200,250,2700,3500]}
df = pd.DataFrame(cars, columns = ['Brand', 'ExshowroomPrice','RTOPrice'])
Brand ExshowroomPrice RTOPrice
0 Honda Civic 21000 2200
1 Toyota Corolla 26000 250
2 Ford Focus 28000 2700
3 Audi A4 34000 3500
df['percentage'] = (df.ExshowroomPrice + df.RTOPrice) * 100 / (df.ExshowroomPrice.sum() + df.RTOPrice.sum())
print(df)
Brand ExshowroomPrice RTOPrice percentage
0 Honda Civic 21000 2200 19.719507
1 Toyota Corolla 26000 250 22.311942
2 Ford Focus 28000 2700 26.094348
3 Audi A4 34000 3500 31.874203
Hope it's clear.
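An equivalent way to write the same percentage, summing the two price columns row-wise first and normalising by the grand total (a sketch on the same sample data):

```python
import pandas as pd

cars = {'Brand': ['Honda Civic', 'Toyota Corolla', 'Ford Focus', 'Audi A4'],
        'ExshowroomPrice': [21000, 26000, 28000, 34000],
        'RTOPrice': [2200, 250, 2700, 3500]}
df = pd.DataFrame(cars)

# Row-wise total, divided by the grand total of all rows
row_total = df[['ExshowroomPrice', 'RTOPrice']].sum(axis=1)
df['percentage'] = row_total / row_total.sum() * 100
```

This produces the same percentages as the explicit column-by-column version above, and the percentages sum to 100.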

Related

Inner merge in python with tables having duplicate values in key column

I am struggling to replicate a SAS inner merge in Python. The Python inner merge does not match the SAS inner merge when duplicate key values are present. Below is an example:
zw = pd.DataFrame({"ID":[1,0,0,1,0,0,1],
"Name":['Shivansh','Shivansh','Shivansh','Amar','Arpit','Ranjeet','Priyanka'],
"job_profile":['DataS','SWD','DataA','DataA','AndroidD','PythonD','fullstac'],
"salary":[22,15,10,9,16,18,22],
"city":['noida','bangalore','hyderabad','noida','pune','gurugram','bangalore'],
"ant":[10,15,15,10,16,17,18]})
zw1 = pd.DataFrame({"ID-":[1,0,0,1,0,0,1],
"Name":['Shivansh','Shivansh','Swati','Amar','Arpit','Ranjeet','Priyanka'],
"job_profile_":['DataS','SWD','DataA','DataA','AndroidD','PythonD','fullstac'],
"salary_":[2,15,10,9,16,18,22],
"city_":['noida','kochi','hyderabad','noida','pune','gurugram','bangalore'],
"ant_":[1,15,15,10,16,17,18]})
zw and zw1 are the input tables. Both tables need to be inner merged on the key column Name. The issue is that both tables have duplicate values in the Name column. Python generates all possible combinations of the duplicate rows.
Below is the expected output:
I tried a normal inner merge and tried dropping duplicate rows on the ID and Name columns, but I am still not getting the desired output:
df1=pd.merge(zw,zw1,on=['Name'],how='inner')
df1.drop_duplicates(['Name','ID'])
Use df.combine_first + df.sort_values combination:
df = zw.combine_first(zw1).sort_values('Name')
print(df)
ID ID- Name ant ant_ city city_ job_profile \
3 1 1 Amar 10 10 noida noida DataA
4 0 0 Arpit 16 16 pune pune AndroidD
6 1 1 Priyanka 18 18 bangalore bangalore fullstac
5 0 0 Ranjeet 17 17 gurugram gurugram PythonD
0 1 1 Shivansh 10 1 noida noida DataS
1 0 0 Shivansh 15 15 bangalore kochi SWD
2 0 0 Shivansh 15 15 hyderabad hyderabad DataA
job_profile_ salary salary_
3 DataA 9 9
4 AndroidD 16 16
6 fullstac 22 22
5 PythonD 18 18
0 DataS 22 2
1 SWD 15 15
2 DataA 10 10
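A different way to avoid the cartesian blow-up on duplicate keys (not from the answer above, just a common sketch): SAS pairs duplicate key rows by position within the group, which can be imitated by adding a within-group counter with `groupby(...).cumcount()` and merging on the key plus the counter. The frame and column names below follow the question, trimmed to two columns for brevity:

```python
import pandas as pd

zw = pd.DataFrame({"Name": ['Shivansh', 'Shivansh', 'Shivansh', 'Amar',
                            'Arpit', 'Ranjeet', 'Priyanka'],
                   "salary": [22, 15, 10, 9, 16, 18, 22]})
zw1 = pd.DataFrame({"Name": ['Shivansh', 'Shivansh', 'Swati', 'Amar',
                             'Arpit', 'Ranjeet', 'Priyanka'],
                    "salary_": [2, 15, 10, 9, 16, 18, 22]})

# Position of each row within its Name group, so duplicates pair up 1:1
zw['_seq'] = zw.groupby('Name').cumcount()
zw1['_seq'] = zw1.groupby('Name').cumcount()

merged = pd.merge(zw, zw1, on=['Name', '_seq'], how='inner').drop(columns='_seq')
# A plain inner merge on Name alone would produce 3 * 2 = 6 Shivansh rows;
# pairing by position keeps only the 2 that line up.
```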

how to categorize salary into high/med/low group in python?

I have an employee dataset with salary details. I would like to add a column showing each employee's salary group (high/med/low):
Data:
Empno Sal Deptno
1 800 20
2 1600 30
3 2975 20
4 1250 30
5 2850 30
6 2450 10
7 3000 20
Expected Output:
Empno Sal Deptno Sal_Group
1 800 20 low
2 1600 30 mid
3 2975 20 ...
4 1250 30 ...
5 2850 30 ...
6 2450 10 ...
7 3000 20 high
You can try this:
import pandas as pd
import numpy as np
df = pd.read_csv("file.csv")
bins = np.linspace(min(df['Sal']), max(df['Sal']),4)
groupNames = ["low", "med", "high"]
df['SalGroup'] = pd.cut(df['Sal'], bins, labels = groupNames, include_lowest = True)
print(df)
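The same approach is runnable without the CSV by building the frame inline (data copied from the question); pd.cut splits the salary range into three equal-width bins and labels them:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'Empno': [1, 2, 3, 4, 5, 6, 7],
                   'Sal': [800, 1600, 2975, 1250, 2850, 2450, 3000],
                   'Deptno': [20, 30, 20, 30, 30, 10, 20]})

# Three equal-width bins between the min and max salary:
# edges at 800, 1533.33, 2266.67, 3000
bins = np.linspace(df['Sal'].min(), df['Sal'].max(), 4)
df['Sal_Group'] = pd.cut(df['Sal'], bins, labels=['low', 'med', 'high'],
                         include_lowest=True)
```

Note the groups are equal-width salary ranges, not equal-sized groups of employees; `pd.qcut` would give the latter.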

Groupby and create a new column by randomly assign multiple strings into it in Pandas

Let's say I have student info with id, age and class as follows:
id age class
0 1 23 a
1 2 24 a
2 3 25 b
3 4 22 b
4 5 16 c
5 6 16 d
I want to group by class and create a new column named major by randomly assigning math, art, business, science to it, so that rows in the same class get the same major string.
We may need to use apply(lambda x: random.choice(...)) for this, but I don't know how. Thanks for your help.
Output expected:
id age major class
0 1 23 art a
1 2 24 art a
2 3 25 science b
3 4 22 science b
4 5 16 business c
5 6 16 math d
Use numpy.random.choice, generating as many values as the length of the DataFrame:
df['major'] = np.random.choice(['math', 'art', 'business', 'science'], size=len(df))
print (df)
id age major
0 1 23 business
1 2 24 art
2 3 25 science
3 4 22 math
4 5 16 science
5 6 16 business
EDIT: for same major values per groups use Series.map with dictionary:
c = df['class'].unique()
vals = np.random.choice(['math', 'art', 'business', 'science'], size=len(c))
df['major'] = df['class'].map(dict(zip(c, vals)))
print (df)
id age class major
0 1 23 a business
1 2 24 a business
2 3 25 b art
3 4 22 b art
4 5 16 c science
5 6 16 d math
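The per-group assignment can also be written with `groupby().transform`, which broadcasts one random pick per group back to every row of that group (a sketch, not from the answer above; data copied from the question):

```python
import random
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6],
                   'age': [23, 24, 25, 22, 16, 16],
                   'class': ['a', 'a', 'b', 'b', 'c', 'd']})
majors = ['math', 'art', 'business', 'science']

# transform calls the lambda once per class group; the scalar it returns
# is broadcast to every row of that group
df['major'] = df.groupby('class')['class'].transform(
    lambda g: random.choice(majors))
```

Unlike the `map`-over-unique-values version, this needs no intermediate dictionary, at the cost of one function call per group.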

How to count number of records in each group and add them to main dataset?

Given that I have a dataset as below:
import pandas as pd
import numpy as np
dt = {
"facility":["Ann Arbor","Ann Arbor","Detriot","Detriot","Detriot"],
"patient_ID":[4388,4388,9086,9086,9086],
"year":[2004,2007,2007,2008,2011],
"month":[8,9,9,6,2],
"Nr_Small":[0,0,5,12,10],
"Nr_Medium":[3,1,1,4,3],
"Nr_Large":[2,0,0,0,0]
}
dt = pd.DataFrame(dt)
dt.head()
I need to add a column that shows the number of records in each group of patients. Here is what I am doing:
dt["NumberOfVisits"] = dt.groupby(['patient_ID']).size()
I also tried another variant, but it adds a column of NaNs to my dataset. However, my desired output is as below:
Use transform here:
dt["NumberOfVisits"] = dt.groupby(['patient_ID'])['patient_ID'].transform('size')
print(dt)
facility patient_ID year month Nr_Small Nr_Medium Nr_Large \
0 Ann Arbor 4388 2004 8 0 3 2
1 Ann Arbor 4388 2007 9 0 1 0
2 Detriot 9086 2007 9 5 1 0
3 Detriot 9086 2008 6 12 4 0
4 Detriot 9086 2011 2 10 3 0
NumberOfVisits
0 2
1 2
2 3
3 3
4 3
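An equivalent spelling, shown here as a sketch on a trimmed copy of the question's data: count rows per patient with `value_counts`, then `map` the counts back onto each row.

```python
import pandas as pd

dt = pd.DataFrame({'facility': ['Ann Arbor', 'Ann Arbor', 'Detriot',
                                'Detriot', 'Detriot'],
                   'patient_ID': [4388, 4388, 9086, 9086, 9086]})

# value_counts gives one count per patient_ID;
# map looks up that count for every row
dt['NumberOfVisits'] = dt['patient_ID'].map(dt['patient_ID'].value_counts())
```

This is why the original attempt produced NaNs: `groupby(...).size()` is indexed by patient_ID, not by row position, so assigning it directly cannot align with the original index; `transform` or `map` does the alignment.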

grouping by weekly days pandas

I have a dataframe, df, containing:
Index Date & Time eventName eventCount
0 2017-08-09 ABC 24
1 2017-08-09 CDE 140
2 2017-08-10 CDE 150
3 2017-08-11 DEF 200
4 2017-08-11 ABC 20
5 2017-08-16 CDE 10
6 2017-08-16 ABC 15
7 2017-08-17 CDE 10
8 2017-08-17 DEF 50
9 2017-08-18 DEF 80
...
I want to sum the eventCount for each weekday's occurrences and plot the total events for each weekday (from Mon to Sun), i.e. for example, the summation of the eventCount values of:
2017-08-09 and 2017-08-16 (Wednesdays) = 189
2017-08-10 and 2017-08-17 (Thursdays) = 210
2017-08-11 and 2017-08-18 (Fridays) = 300
I have tried
dailyOccurenceSum=df['eventCount'].groupby(lambda x: x.weekday).sum()
and I get this error: AttributeError: 'int' object has no attribute 'weekday' (the lambda receives the integer index, not a date).
Starting with df -
df
Index Date & Time eventName eventCount
0 0 2017-08-09 ABC 24
1 1 2017-08-09 CDE 140
2 2 2017-08-10 CDE 150
3 3 2017-08-11 DEF 200
4 4 2017-08-11 ABC 20
5 5 2017-08-16 CDE 10
6 6 2017-08-16 ABC 15
7 7 2017-08-17 CDE 10
8 8 2017-08-17 DEF 50
9 9 2017-08-18 DEF 80
First, convert Date & Time to a datetime column -
df['Date & Time'] = pd.to_datetime(df['Date & Time'])
Next, call groupby + sum on the weekday name. (In modern pandas use dt.day_name(); the dt.weekday_name attribute used here was removed in pandas 1.0.)
df = df.groupby(df['Date & Time'].dt.day_name())['eventCount'].sum()
df
Date & Time
Friday 300
Thursday 210
Wednesday 189
Name: eventCount, dtype: int64
If you want to sort by weekday, convert the index to categorical and call sort_index -
cat = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday', 'Sunday']
df.index = pd.Categorical(df.index, categories=cat, ordered=True)
df = df.sort_index()
df
Wednesday 189
Thursday 210
Friday 300
Name: eventCount, dtype: int64
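Putting the steps together as one runnable sketch on the question's sample data, using `dt.day_name()` (the modern spelling of `weekday_name`):

```python
import pandas as pd

df = pd.DataFrame({
    'Date & Time': ['2017-08-09', '2017-08-09', '2017-08-10', '2017-08-11',
                    '2017-08-11', '2017-08-16', '2017-08-16', '2017-08-17',
                    '2017-08-17', '2017-08-18'],
    'eventCount': [24, 140, 150, 200, 20, 10, 15, 10, 50, 80]})

# Parse the strings into real datetimes
df['Date & Time'] = pd.to_datetime(df['Date & Time'])

# Sum events per weekday name, then order Monday..Sunday
out = df.groupby(df['Date & Time'].dt.day_name())['eventCount'].sum()
cat = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday',
       'Saturday', 'Sunday']
out.index = pd.Categorical(out.index, categories=cat, ordered=True)
out = out.sort_index()
```

`out.plot.bar()` (or `.plot.pie()`) would then plot the totals in weekday order.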
