How to add percentage label on top of bar chart from a data frame with different sum total data groups - python-3.x

I am new to coding in Python, and I am trying to make a bar chart with a percentage label on top of each bar. I have a sample data frame, Quiz2. The code I wrote only shows a single label, 1600%, on the first bar. Can anyone help me get it right?
#Approach 2
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
sns.set()
%matplotlib inline
Quiz2 = pd.DataFrame({'Kaha': ['16', '5'], 'Shiny': ['16', '10']})
data = Quiz2.rename(index={0: "Male", 1: "Female"})
data=data.astype(float)
Q1p = data[['Kaha','Shiny']].plot(kind='bar', figsize=(5, 5), legend=True, fontsize=12)
Q1p.set_xlabel("Gender", fontsize=12)
Q1p.set_ylabel("Number of people", fontsize=12)
#Q1p.set_xticklabels(x_labels)
for p in Q1p.patches:
    width = p.get_width()
    height = p.get_height()
    x, y = p.get_xy()
    Q1p.annotate(f'{height:.0%}', (x + width/2, y + height*1.02), ha='center')
    plt.show()
I want the percentages for Kaha (column total 21) to appear as 76.2% for Male and 23.8% for Female, and those for Shiny (column total 26) as 61.5% for Male and 38.5% for Female. Any help would be appreciated.

In approach 2, the reason only one value is displayed is that plt.show() sits inside the for loop; it should be outdented so that it runs after the loop has finished. You are getting a value of 1600% because the label is formatted from the bar height itself (16 formatted as a percentage is 1600%), in the line beginning with Q1p.annotate(f'{height:.0%}'. Instead of height, this should be height divided by the relevant total to give you the percentage.
Here is a solution, but not sure if I am computing the percentages correctly:
import pandas as pd
import matplotlib.pyplot as plt
Quiz2 = pd.DataFrame({'Kaha': ['16', '5'], 'Shiny': ['16', '10']})
data = Quiz2.rename(index={0: "Male", 1: "Female"})
data = data.astype(float)
total = len(data)*10
Q1p = data[['Kaha','Shiny']].plot(kind='bar', figsize=(5, 5), legend=True, fontsize=12)
Q1p.set_xlabel("Gender", fontsize=12)
Q1p.set_ylabel("Number of people", fontsize=12)
#Q1p.set_xticklabels(x_labels)
for p in Q1p.patches:
    width = p.get_width()
    height = p.get_height()
    x, y = p.get_xy()
    Q1p.annotate(f'{height/total:.0%}', (x + width/2, y + height*1.02), ha='center')
plt.show()
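A sketch of a per-column version, assuming the goal is the split the question asks for: each bar divided by its own column sum (21 for Kaha, 26 for Shiny). Dividing by len(data)*10, i.e. 20, gives 80% for the first bar rather than 76.2%. The sketch relies on pandas drawing the bars one column at a time, so the patches run Kaha-Male, Kaha-Female, Shiny-Male, Shiny-Female; the Agg backend line is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook or desktop session
import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as the question, converted from strings to numbers
data = pd.DataFrame({'Kaha': ['16', '5'], 'Shiny': ['16', '10']}).astype(float)
data = data.rename(index={0: "Male", 1: "Female"})

# Share of each gender within its own column (Kaha sums to 21, Shiny to 26)
pct = data / data.sum()

ax = data[['Kaha', 'Shiny']].plot(kind='bar', figsize=(5, 5), legend=True, fontsize=12)
ax.set_xlabel("Gender", fontsize=12)
ax.set_ylabel("Number of people", fontsize=12)

# Flatten column by column to match the order in which pandas drew the bars
shares = pct[['Kaha', 'Shiny']].to_numpy().flatten(order='F')
for p, share in zip(ax.patches, shares):
    x, y = p.get_xy()
    ax.annotate(f'{share:.1%}', (x + p.get_width() / 2, y + p.get_height() * 1.02), ha='center')

plt.show()
```

The `.1%` format keeps one decimal place, which is what the 76.2% / 23.8% / 61.5% / 38.5% labels in the question need.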

Related

Plot 350 users on bar chart using matplotlib

I'm trying to plot around 300 users and how many purchases each has made. My data is in a pandas DataFrame, where the column 'ID' identifies a user and 'Number' is the number of purchases.
So far I have tried the following code, which I found online, but I never manage to get all the IDs onto one plot.
This is the code:
import random
# Prepare Data
n = subs_['Number'].unique().__len__()+1
all_colors = list(plt.cm.colors.cnames.keys())
random.seed(100)
c = random.choices(all_colors, k=n)
# Plot Bars
plt.figure(figsize=(16,10), dpi= 60)
plt.bar(subs_['ID'], subs_['Number'], color=c, width=.5)
for i, val in enumerate(subs_['Number'].values):
    plt.text(i, val, float(val), horizontalalignment='center', verticalalignment='bottom', fontdict={'fontweight':500, 'size':10})
# Decoration
plt.gca().set_xticklabels(subs_['ID'], rotation=60, horizontalalignment= 'right')
plt.title("Number of purchases by user", fontsize=22)
plt.ylabel('# Purchases')
plt.ylim(0, 45)
plt.show()
(Screenshot: bar chart of user purchases.)
I think your problem is coming from your IDE. Here is the same code with generated sample data:
import random
import matplotlib.pyplot as plt
import pandas as pd
# Prepare Data
d = {'ID': range(1, 300), 'Number': range(1, 300)}
subs_ = pd.DataFrame(data=d)
n = subs_['Number'].unique().__len__()+1
all_colors = list(plt.cm.colors.cnames.keys())
random.seed(100)
c = random.choices(all_colors, k=n)
# Plot Bars
plt.figure(figsize=(16,10), dpi= 60)
plt.bar(subs_['ID'], subs_['Number'], color=c, width=.5)
for i, val in enumerate(subs_['Number'].values):
    plt.text(i, val, float(val), horizontalalignment='center', verticalalignment='bottom', fontdict={'fontweight':500, 'size':10})
# Decoration
plt.gca().set_xticklabels(subs_['ID'], rotation=60, horizontalalignment= 'right')
plt.title("Number of purchases by user", fontsize=22)
plt.ylabel('# Purchases')
plt.ylim(0, 45)
plt.show()
It works fine for me.
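If the IDs still do not all show up, one likely cause is that set_xticklabels only relabels the ticks matplotlib has already chosen, so most IDs never get a tick of their own (and plt.ylim(0, 45) clips any count above 45). A sketch, assuming hypothetical sample data: it places one tick per ID explicitly with set_xticks, uses a wide figure and small fonts so 299 labels can fit, and writes each value label at the bar's own ID rather than at its row index i.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data: 299 users with made-up purchase counts between 0 and 39
subs_ = pd.DataFrame({'ID': range(1, 300), 'Number': [i % 40 for i in range(1, 300)]})

fig, ax = plt.subplots(figsize=(40, 8))
ax.bar(subs_['ID'], subs_['Number'], width=0.5)

# One tick per ID, set explicitly, so every label sits under its own bar
ax.set_xticks(subs_['ID'])
ax.set_xticklabels(subs_['ID'].astype(str), rotation=90, fontsize=6)
ax.set_xlim(subs_['ID'].min() - 1, subs_['ID'].max() + 1)

# Value labels go at the bar's x position (its ID), not at the row index
for uid, val in zip(subs_['ID'], subs_['Number']):
    ax.text(uid, val, str(val), ha='center', va='bottom', fontsize=5)

ax.set_title("Number of purchases by user", fontsize=22)
ax.set_ylabel('# Purchases')
plt.show()
```

The figure width and font sizes are arbitrary choices; with roughly 300 labels, something in that range is needed for them not to overlap.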

Scatter Plot of Multiple Y Values for each X coloring each X value

I want to assign one color to each company name. I tried playing with the color parameter of scatter, but that gives different colors within a single company.
import matplotlib.pyplot as plt
import seaborn as sns
y = [[0.15,0.25,0.63],[0.69,0.24,0.85],[0.85,0.41,0.73]]
x = [1,2,3]
sns.set_style("dark")
plt.title("Company Records")
for xe, ye in zip(x, y):
    plt.scatter([xe] * len(ye), ye)
plt.xticks([1,2,3]);
plt.gca().set_xticklabels(['ACTP', 'ATC',"LKO"],rotation = 45);
Pass a list of custom colors through zip along with the data, so each company gets one color:
colors = ['red', 'magenta', 'pink']
for xe, ye, c in zip(x, y, colors):
    plt.scatter([xe] * len(ye), ye, c=c)
plt.xticks([1,2,3]);
plt.gca().set_xticklabels(['ACTP', 'ATC',"LKO"],rotation = 45);
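A variant of the same idea, in case hard-coding a color list is inconvenient: each company still gets its own scatter call, but the color is taken from the tab10 colormap by position and the company name is passed as label so a legend can be drawn. The tick labels are set on the Axes returned by plt.subplots rather than through plt.axes(), which in recent matplotlib versions adds a new Axes on top of the existing one instead of returning it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt

y = [[0.15, 0.25, 0.63], [0.69, 0.24, 0.85], [0.85, 0.41, 0.73]]
x = [1, 2, 3]
names = ['ACTP', 'ATC', 'LKO']

fig, ax = plt.subplots()
ax.set_title("Company Records")

# One scatter call per company: all points of a call share one color,
# picked from the tab10 colormap by position, and the label feeds the legend
for i, (xe, ye, name) in enumerate(zip(x, y, names)):
    ax.scatter([xe] * len(ye), ye, color=plt.cm.tab10(i), label=name)

ax.set_xticks(x)
ax.set_xticklabels(names, rotation=45)
ax.legend()
plt.show()
```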

Adding minor tick marks to a histogram

I am working through this:
https://medium.com/diogo-menezes-borges/introduction-to-statistics-for-data-science-6c246ed2468d
About 3/4 of the way through there is a histogram, but the author does not supply the code used to generate it.
So I decided to give it a go...
I have everything working, but I would like to add minor ticks to my plot.
X-axis only, spaced 200 units apart (matching the bin width used in my code).
In particular, I would like to add minor ticks in the style from the last example from here:
https://matplotlib.org/3.1.0/gallery/ticks_and_spines/major_minor_demo.html
I have tried several times but I just can't get that exact 'style' to work on my plot.
Here is my working code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
print('NumPy: {}'.format(np.__version__))
print('Pandas: {}'.format(pd.__version__))
print('\033[1;31m' + '--------------' + '\033[0m') # Bold red
display_settings = {
    'max_columns': 15,
    'max_colwidth': 60,
    'expand_frame_repr': False,  # Print wide frames on one line instead of wrapping
    'max_rows': 50,
    'precision': 6,
    'show_dimensions': False
}
# pd.options.display.float_format = '{:,.2f}'.format
for op, value in display_settings.items():
    pd.set_option("display.{}".format(op), value)
file = "e:\\python\\pandas\\medium\\sets.csv"
lego = pd.read_csv(file, encoding="utf-8")
print(lego.shape, '\n')
print(lego.info(), '\n')
print(lego.head(), '\n')
print(lego.isnull().sum(), '\n')
dfs = [lego]
names = ['lego']
def NaN_percent(_df, column_name):
    # empty_values = row_count - _df[column_name].count()
    empty_values = _df[column_name].isnull().sum()
    return (100.0 * empty_values)/row_count
c = 0
print('Columns with missing values expressed as a percentage.')
for df in dfs:
    print('\033[1;31m' + ' ' + names[c] + '\033[0m')
    row_count = df.shape[0]
    for i in list(df):
        x = NaN_percent(df, i)
        if x > 0:
            print(' ' + i + ': ' + str(x.round(4)) + '%')
    c += 1
    print()
# What is the average number of parts in the sets of legos?
print(lego['num_parts'].mean(), '\n')
# What is the median number of parts in the sets of legos?
print(lego['num_parts'].median(), '\n')
print(lego['num_parts'].max(), '\n')
# Create Bins for Data Ranges
bins = []
for i in range(lego['num_parts'].min(), 6000, 200):
    bins.append(i + 1)
# Use 'right' to determine which bin overlapping values fall into.
cuts = pd.cut(lego['num_parts'], bins=bins, right=False)
# Count values in each bin.
print(cuts.value_counts(), '\n')
plt.hist(lego['num_parts'], color='red', edgecolor='black', bins=bins)
plt.title('Histogram of Number of parts')
plt.xlabel('Bin')
plt.ylabel('Number of values per bin')
plt.axvline(x=162.2624, color='blue')
plt.axvline(x=45.0, color='green', linestyle='--')
# https://matplotlib.org/gallery/text_labels_and_annotations/custom_legends.html
legend_elements = [Line2D([0], [0], color='blue', linewidth=2, linestyle='-'),
Line2D([0], [1], color='green', linewidth=2, linestyle='--')
]
labels = ['mean: 162.2624', 'median: 45.0']
plt.legend(legend_elements, labels)
plt.show()
You can just add the following before plt.show():
from matplotlib.ticker import AutoMinorLocator
ax = plt.gca()
ax.xaxis.set_minor_locator(AutoMinorLocator())
ax.tick_params(which='minor', length=4, color='r')
See this post to get a better idea of the difference between plt, ax and fig. In broad terms, plt refers to matplotlib's pyplot module. fig is the figure, a single "plot" that can consist of one or more subplots. ax refers to one subplot together with its x- and y-axes, including the measuring units, tick marks, tick labels, etc. Many matplotlib functions are often called as plt.hist and the like, but in the underlying code they draw on the "current axes", which can be obtained with plt.gca() ("get current axes"). It is not always clear which functions can be called via plt. and which only exist via ax., and sometimes the two forms have slightly different names (for example, plt.xlim versus ax.set_xlim). You'll need to check the documentation or search Stack Overflow to see which form is needed in each specific case.
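AutoMinorLocator chooses the number of minor subdivisions itself, so the minor spacing follows the major ticks rather than being fixed at 200 units. If the minor ticks should match the 200-unit bins exactly, MultipleLocator(200) is one alternative. A sketch, with random integers standing in for lego['num_parts'] (the CSV is not reproduced here); it also shows that the axes plt.hist draws on are the ones plt.gca() returns.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MultipleLocator

# Stand-in for lego['num_parts']: random integers, NOT the real data set
rng = np.random.default_rng(0)
num_parts = rng.integers(0, 6000, size=1000)

bins = list(range(0, 6001, 200))  # 200-unit bins
plt.hist(num_parts, color='red', edgecolor='black', bins=bins)

# plt.hist drew on the "current axes"; fetch them to configure the x-axis
ax = plt.gca()
ax.xaxis.set_minor_locator(MultipleLocator(200))               # a minor tick every 200 units
ax.tick_params(axis='x', which='minor', length=4, color='r')   # x-axis only, short red ticks
plt.show()
```

Minor ticks that coincide with a major tick are not drawn a second time, so the visible minor marks skip the major positions.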

Matplotlib figure annotations outside of window

I am making a program that embeds a matplotlib pie/donut chart in a tkinter window to illustrate some data. I have added "annotations", or labels, coming out of each wedge of the pie chart. Because of this, the window that opens when I run the code fits the chart itself, but the labels are cut off at the edges of the window. Specifically, it looks like this (screenshot not included).
Note that the top two arrows don't actually have text attached to the corresponding labels, so the situation is actually worse than my screenshot depicts.
Even if I remove the tkinter GUI code and just generate a regular figure window, the labels are initially cut off. But if I use the built-in zoom-out functionality, I can zoom out to make the labels fit.
I have tried to adjust the figsize here...
fig, ax = plt.subplots(figsize=(6, 4), subplot_kw=dict(aspect="equal"))
yet it makes no difference. Hopefully there is a solution, thanks...
Here is my full code if anyone needs...
import numpy as np
import matplotlib.pyplot as plt
player1_cards = {'Mustard', 'Plum', 'Revolver', 'Rope', 'Ballroom', 'Library'}
player2_cards = {'Scarlet', 'White', 'Candlestick'}
player3_cards = {'Green', 'Library', 'Kitchen', 'Conservatory'}
middle_cards = {'Peacock'}
unknown_cards = {'Lead Pipe', 'Wrench', 'Knife', 'Hall', 'Lounge', 'Dining Room', 'Study'}
player1_string = ', '.join(player1_cards)
player1_string = player1_string.replace(', ', '\n')
player2_string = ', '.join(player2_cards)
player2_string = player2_string.replace(', ', '\n')
player3_string = ', '.join(player3_cards)
player3_string = player3_string.replace(', ', '\n')
fig, ax = plt.subplots(figsize=(6, 4), subplot_kw=dict(aspect="equal"))
recipe = [player1_string, player2_string, player3_string, '', '']
data = [len(player1_cards), len(player2_cards), len(player3_cards), 1, 7]
cols = ['#339E5A', '#26823E', '#0C5D2E', '#98D6AE', '#5EC488']
wedges, texts = ax.pie(data, wedgeprops=dict(width=0.5), startangle=90, colors = cols)
for w in wedges:
    w.set_linewidth(4)
    w.set_edgecolor('white')
bbox_props = dict(boxstyle="square,pad=0.3", fc="w", ec="white", lw=0.72)
kw = dict(xycoords='data', textcoords='data', arrowprops=dict(arrowstyle="-"), bbox=bbox_props, zorder=0, va="center")
for i, p in enumerate(wedges):
    ang = (p.theta2 - p.theta1)/2. + p.theta1
    y = np.sin(np.deg2rad(ang))
    x = np.cos(np.deg2rad(ang))
    horizontalalignment = {-1: "right", 1: "left"}[int(np.sign(x))]
    connectionstyle = "angle,angleA=0,angleB={}".format(ang)
    kw["arrowprops"].update({"connectionstyle": connectionstyle})
    ax.annotate(recipe[i], xy=(x, y), xytext=(x + np.sign(x)*.5, y*1.5),
                horizontalalignment=horizontalalignment, **kw, family="Quicksand")
ax.set_title("Matplotlib bakery: A donut")
plt.show()
You would want to play around with the subplot parameters to make space for the text outside the axes.
fig.subplots_adjust(bottom=..., top=..., left=..., right=...)
For example, in this case
fig.subplots_adjust(bottom=0.2, top=0.9)
seems to give a nice representation.
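To check a given set of subplots_adjust values without eyeballing the window, one option is to measure each annotation after a draw and compare its box with the figure's extent. A sketch; annotations_outside is a hypothetical helper, not a matplotlib function, and it only inspects ax.texts, which is where ax.annotate stores its annotations.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt

def annotations_outside(fig):
    """Hypothetical helper: return the text of every annotation that is
    drawn (even partly) outside the figure area."""
    fig.canvas.draw()                      # sizes are only known after a draw
    renderer = fig.canvas.get_renderer()
    fig_box = fig.bbox                     # figure extent in display pixels
    outside = []
    for ax in fig.axes:
        for t in ax.texts:
            box = t.get_window_extent(renderer)
            if (box.x0 < fig_box.x0 or box.y0 < fig_box.y0
                    or box.x1 > fig_box.x1 or box.y1 > fig_box.y1):
                outside.append(t.get_text())
    return outside

fig, ax = plt.subplots(figsize=(6, 4))
ax.annotate("inside", xy=(0.5, 0.5), xycoords='axes fraction')
ax.annotate("far", xy=(1.5, 0.5), xycoords='axes fraction')  # right of the axes
print(annotations_outside(fig))   # with default margins the "far" label overflows

fig.subplots_adjust(right=0.5)    # shrink the axes to make room on the right
print(annotations_outside(fig))
```

In the donut code, the same helper could be called after each subplots_adjust attempt until it returns an empty list.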

Why is the saved video from FuncAnimation a superposition of plots?

I would like to ask about matplotlib's FuncAnimation.
In the full code below, I am trying to animate bar plots (to illustrate an integral). The animation produced by
ani = FuncAnimation(fig, update, frames=Iter, init_func = init, blit=True);
plt.show()
looks fine.
But the video written by
ani.save("example_new.mp4", fps = 5)
differs slightly from the animation shown in Python: it is a "superposition" version of it. Unlike the animation, every frame of the video keeps showing the previous plots together with the current one.
Here is the full code :
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
fig, ax = plt.subplots()
Num = 20
p = plt.bar([0], [0], 1, color = 'b')
Iter = tuple(range(2, Num+1))
xx = list(np.linspace(0, 2, 200)); yy = list(map(lambda x : x**2,xx));
def init():
    ax.set_xlim(0, 2)
    ax.set_ylim(0, 4)
    return p
def update(frame):
    w = 2/frame
    X = list(np.linspace(0, 2-w, frame+1))
    Y = list(map(lambda x: x**2, X))
    X = list(map(lambda x: x + w/2, X))
    C = (0, 0, frame/Num)
    L = plt.plot(xx, yy, 'y', animated=True)[0]
    p = plt.bar(X, Y, w, color=C, animated=True)
    P = list(p[:]); P.append(L)
    return P
ani = FuncAnimation(fig, update, frames=Iter, init_func=init, interval=0.25, blit=True)
ani.save("examplenew.mp4", fps=5)
plt.show()
Any constructive input on this would be appreciated. Thanks, Arief.
When saving the animation, no blitting is used. You can turn off blitting, i.e. blit=False, and see the animation the same way as it is saved.
What is happening is that in each iteration a new plot is added without the last one being removed. You basically have two options:
1. Clear the axes in between with ax.clear() (then remember to set the axes limits again).
2. Update the data for the bars and the plot. Examples of how to do this:
   For plot: Matplotlib Live Update Graph
   For bar: Dynamically updating a bar plot in matplotlib
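A sketch of the first option applied to the question's code: update clears the axes and redraws the curve and the bars each frame, with blitting off, so the on-screen animation and the saved file should both show only the current frame. Two choices here are mine rather than the asker's: the sketch draws frame bars of width 2/frame with align='edge' so they tile [0, 2] exactly (the original placed frame+1 overlapping bars), and interval=250 is in milliseconds. Saving to MP4 also needs ffmpeg installed, so that line is left commented out.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

Num = 20
xx = np.linspace(0, 2, 200)
fig, ax = plt.subplots()

def update(frame):
    # Wipe everything drawn for the previous frame so nothing accumulates
    ax.clear()
    # clear() also resets the axis limits, so set them again
    ax.set_xlim(0, 2)
    ax.set_ylim(0, 4)
    w = 2 / frame
    left = np.linspace(0, 2 - w, frame)  # left edges of `frame` bars tiling [0, 2]
    ax.plot(xx, xx ** 2, 'y')
    ax.bar(left, left ** 2, w, align='edge', color=(0, 0, frame / Num))

ani = FuncAnimation(fig, update, frames=range(2, Num + 1), interval=250, blit=False)
# ani.save("examplenew.mp4", fps=5)  # requires ffmpeg
plt.show()
```

Clearing and redrawing is simpler than option 2 here because the number of bars changes every frame, so there is no fixed set of bars whose heights could just be updated.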
