Python: Plot a pie chart for every group in Pandas - python-3.x

Below is the data frame retrieved after grouping my data set:
df1 = data.groupby(['gender','Segment']).agg(Total_Claim = ('claim_amount', 'sum'))
df1['Total_Claim']=df1['Total_Claim'].astype(int)
df1
The output is:
                Total_Claim
gender Segment
Female Gold         2110094
       Platinum     2369761
       Silver       1897617
Male   Gold         2699208
       Platinum     2096489
       Silver       2347217
What would be the most efficient way to plot a pie chart of the aggregated claim amount by gender and segment?

Select the Total_Claim column as a Series and plot it with Series.plot.pie, using the autopct parameter to show percentages:
df1['Total_Claim'].plot.pie(autopct='%1.1f%%')
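A self-contained sketch of the same idea, rebuilding df1 from the totals shown above (the MultiIndex construction exists only so the example runs without the original data set). Since the title asks for a pie per group, the second half is an addition that is not part of the answer: it unstacks gender into columns and calls DataFrame.plot.pie with subplots=True, which draws one pie per column.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild the grouped totals from the question's output
index = pd.MultiIndex.from_product(
    [["Female", "Male"], ["Gold", "Platinum", "Silver"]], names=["gender", "Segment"]
)
df1 = pd.DataFrame(
    {"Total_Claim": [2110094, 2369761, 1897617, 2699208, 2096489, 2347217]}, index=index
)

# One pie over all six gender/segment combinations
ax = df1["Total_Claim"].plot.pie(autopct="%1.1f%%", figsize=(6, 6))
ax.set_ylabel("")  # hide the default 'Total_Claim' y label

# One pie per gender: rows become Segment, columns become gender
per_gender = df1["Total_Claim"].unstack("gender")
axes = per_gender.plot.pie(subplots=True, autopct="%1.1f%%", legend=False, figsize=(10, 5))
plt.tight_layout()
plt.show()
```

With subplots=True each column gets its own pie, so each gender's segments add up to 100% within that pie.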

Related

Excel chart Categories against variable quantity of Price Points

I have some data which I would like to plot, presumably in a scatter chart. The data is in the following format.
Category, ID
Cat1, 560000
Cat1, 560005
Cat1, 880011
Cat2, 580000
Cat2, 580001
Cat2, 580002
Cat2, 780052
Cat3, 600000
Cat3, 600010
Cat3, 600011
Cat3, 1003452
For non-developers, you could think of this as categories of tyres and their prices. I'd like to see that in the "Car" category we sell items at one set of price points, and in the "Bicycle" category we sell items at a different set of price points. The metaphor only breaks down in that no two products can have the same price.
For developers out there: these are in fact Long primary keys of a particular relational database table, and I'm trying to plot how much of the available ID range, and which parts of it, have already been used, split by category.
I have 13 categories, so these will have to go on the x axis, since Excel limits that to 255. The ID will therefore be my y axis.
By using a 2D line chart on just the IDs and squashing the chart, I was able to plot the overall usage of the ID range.
However, I can't get any chart type to split this by category; I presume XY Scatter or a heatmap can do this somehow.
Update: since posting, I have come across this "contiguity chart", which is roughly what I'm after, but in Excel if feasible:
https://qvdesign.wordpress.com/2012/03/29/qlikview-contiguity-chart-aka-that-chart-from-the-windows-disk-defragmenter/
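Excel aside, the layout being described (categories along x, IDs up y) can be prototyped in Python. The sketch below uses the sample rows from the question and maps each category to a numeric x position; presumably the same trick works in Excel as a helper column of category numbers feeding an XY Scatter chart, but that is an assumption, not a tested Excel recipe.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question (Category, ID)
df = pd.DataFrame({
    "Category": ["Cat1"] * 3 + ["Cat2"] * 4 + ["Cat3"] * 4,
    "ID": [560000, 560005, 880011, 580000, 580001, 580002, 780052,
           600000, 600010, 600011, 1003452],
})

# Map each category to x = 1, 2, 3, ... in order of first appearance
# (avoids "Cat10" sorting before "Cat2" once there are 13 categories)
categories = list(df["Category"].unique())
x_pos = {cat: i + 1 for i, cat in enumerate(categories)}
df["x"] = df["Category"].map(x_pos)

fig, ax = plt.subplots(figsize=(6, 4))
# A short horizontal tick for every used ID, stacked above its category
ax.scatter(df["x"], df["ID"], marker="_", s=400)
ax.set_xticks(list(x_pos.values()))
ax.set_xticklabels(categories)
ax.set_xlabel("Category")
ax.set_ylabel("ID")
plt.show()
```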

Plotting chart with varying Data series

I am looking to plot a car-variant price comparison chart in which the y axis is the price (currency amount).
The x axis would show the different car brands, such as Honda, Hyundai, Toyota, etc.
The plot would show the price of each variant, from low to high, as in this image.
Note:
The number of variants differs between brands, so the series in the plot will have different lengths.
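One way to sketch that chart in matplotlib, assuming the prices are held as a dict mapping each brand to a list of variant prices: sorting each list and plotting it at the brand's x position copes naturally with brands having different numbers of variants. The brand names come from the question; the prices are made-up placeholders.

```python
import matplotlib.pyplot as plt

# Hypothetical variant prices per brand (placeholder numbers, not real data);
# the lists deliberately have different lengths
prices = {
    "Honda": [21000, 18500, 25500],
    "Hyundai": [17000, 19500],
    "Toyota": [22000, 26000, 20000, 31000],
}

fig, ax = plt.subplots()
for x, (brand, values) in enumerate(prices.items()):
    ordered = sorted(values)  # low to high within each brand
    # One vertical series per brand, however many variants it has
    ax.plot([x] * len(ordered), ordered, marker="o", linestyle="-")
ax.set_xticks(range(len(prices)))
ax.set_xticklabels(list(prices.keys()))
ax.set_ylabel("Price")
plt.show()
```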

How do I transpose a DataFrame and scatter-plot the transposed df?

I have this dataframe with 20 countries and 20 years of data
Country 2000 2001 2002 ...
USA 1 2 3
CANADA 4 5 6
SWEDEN 7 8 9
...
and I want a new df from which to create a scatter plot with y = the value in each column (country) and x = year:
Country USA CANADA SWEDEN ...
2000 1 4 7
2001 2 5 8
2002 3 6 9
...
My code:
data = pd.read_csv("data.csv")
data.set_index("Country Name", inplace = True)
data_transposed = data.T
I'm struggling to create this kind of scatter plot.
Any ideas?
Thanks
A scatter plot takes only x and y, so you can't scatter the whole DataFrame directly. Here is a small workaround:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(data={"Country": ["USA", "Canada", "Brasil"], 2000: [1, 4, 7], 2001: [3, 7, 9], 2002: [2, 8, 5]})
# One scatter call per year column; matplotlib gives each call its own colour
for column in df.columns:
    if column != "Country":
        plt.scatter(x=df["Country"], y=df[column], label=column)
plt.legend(title="Year")
plt.show()
result:
It simply plots each column separately, which together gives you what you want.
As you can see, each year is represented by a different colour. You can do the opposite (plot the years and show the countries as different colours). A scatter plot is two-dimensional, but you have three variables: Country, Year and Value. You can show only two of them as axes (unless, for example, you use colour for the third).
To put the years on the x axis you need to transpose your DataFrame (as you suggested yourself), which you can do with df.transpose() or df.T; see the documentation.
Note that in my df the Country column is not the index. You can use set_index or reset_index to control this.
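Going the other way, which is what the question asked for (years on x, one colour per country), a sketch using the same toy data: set Country as the index, transpose with .T so the years become the index, then scatter each country column against that index.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same toy data as above, with Country moved into the index as in the question
df = pd.DataFrame({"Country": ["USA", "Canada", "Brasil"],
                   2000: [1, 4, 7], 2001: [3, 7, 9], 2002: [2, 8, 5]})
wide = df.set_index("Country")
by_year = wide.T  # rows: years, columns: countries

fig, ax = plt.subplots()
for country in by_year.columns:
    # x = year (the transposed index), y = that country's values
    ax.scatter(by_year.index, by_year[country], label=country)
ax.set_xlabel("Year")
ax.set_ylabel("Value")
ax.legend(title="Country")
plt.show()
```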

pandas check if two values are statistically different

I have a pandas DataFrame with some values for males and some for females. I would like to test whether the two genders' percentages are significantly different and give confidence intervals for these rates. Here is the sample code:
import pandas as pd
data={}
data['gender']=['male','female','female','male','female','female','male','female','male']
data['values']=[10,2,13,4,11,8,14,19,2]
df_new=pd.DataFrame(data) # make a simple data frame
df_new.head() # show the first five rows
gender values
0 male 10
1 female 2
2 female 13
3 male 4
4 female 11
df_male=df_new.loc[df_new['gender']=='male']
df_female=df_new.loc[df_new['gender']=='female'] # separate male and female
# calculate percentages
male_percentage=sum(df_male['values'].values)*100/sum(df_new['values'].values)
female_percentage=sum(df_female['values'].values)*100/sum(df_new['values'].values)
# want to tell whether both percentages are statistically different or not and what are their confidence interval rates
print(male_percentage)
print(female_percentage)
Any help will be much appreciated. Thanks!
Use a t-test. In this case, use a two-sample t-test, since you are comparing the means of two samples.
The alternative hypothesis is A != B.
I test it against the null hypothesis A = B by calculating a p-value. When p falls below a critical value called alpha, I reject the null hypothesis. The standard value for alpha is 0.05: a p-value below 0.05 means that, if the null hypothesis were true, a result at least as extreme as the observed one would occur less than 5% of the time.
Extract the samples, in this case as lists of values:
A = df_new[df_new['gender'] == 'male']['values'].values.tolist()
B = df_new[df_new['gender'] == 'female']['values'].values.tolist()
Then run the t-test with SciPy:
from scipy import stats
t_check = stats.ttest_ind(A, B)
print(t_check)
alpha = 0.05
if t_check.pvalue < alpha:
    print('A different from B')
else:
    print('No significant difference between A and B')
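The question also asks for confidence intervals, which the answer above does not cover. A sketch under two stated choices: it swaps in Welch's t-test (ttest_ind with equal_var=False, which does not assume equal variances in the two groups), and it reports t-based 95% confidence intervals for each group's mean value, not for the percentage shares. mean_ci is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
from scipy import stats

df_new = pd.DataFrame({
    "gender": ["male", "female", "female", "male", "female", "female", "male", "female", "male"],
    "values": [10, 2, 13, 4, 11, 8, 14, 19, 2],
})
male = df_new.loc[df_new["gender"] == "male", "values"].to_numpy()
female = df_new.loc[df_new["gender"] == "female", "values"].to_numpy()

# Welch's t-test: like ttest_ind, but without assuming equal variances
result = stats.ttest_ind(male, female, equal_var=False)


def mean_ci(x, confidence=0.95):
    """t-based confidence interval for the mean of one sample (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    return stats.t.interval(confidence, len(x) - 1, loc=x.mean(), scale=stats.sem(x))


print(f"p-value: {result.pvalue:.3f}")
print("male mean 95% CI:", mean_ci(male))
print("female mean 95% CI:", mean_ci(female))
```

With only four and five observations per group, these intervals will be wide, and the test has little power.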
Try this:
df_new.groupby('gender')['values'].sum()/df_new['values'].sum()*100
gender
female 63.855422
male 36.144578
Name: values, dtype: float64

Dynamic axis labels with Power Pivot

I am working with Excel and Power Pivot. My data model has two related tables: the main table holds my data plus an ID that refers to the second table, a reference table containing Description, short description, units, abbreviation, etc.
Simplified example:
Table 1:
Name         Year  Value  Indicator
Usain Bolt   2009  9.58   1
Mike Powell  1990  8.95   2
Table 2:
ID  Discipline   Short_Descr  Unit
1   100 metres   100m         seconds
2   Long Jump    L. j.        metres
I want to create a chart whose axis title and chart title are set dynamically from the values in the reference table. The dynamic title is quite easy to build: I create my pivot chart with Description as the report filter, and then link the chart title to this filter. But I would also like the vertical axis to show the units dynamically, so that when I change the filter, the units on the axis change too.
