pandas: draw plot using dict and labels on top of each bar - python-3.x

I am trying to plot a graph from a dict, which works fine but I also have a similar dict with values that I intend to write on top of each bar.
This works fine for plotting the graph:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams['axes.formatter.useoffset'] = False
df = pd.DataFrame([population_dct])
df.sum().sort_values(ascending=False).plot.bar(color='b')
plt.savefig("temp_fig.png")
Where the population_dct is:
{'pak': 210, 'afg': 182, 'ban': 94, 'ind': 32, 'aus': 14, 'usa': 345, 'nz': 571, 'col': 47, 'iran': 2}
Now I have another dict, called counter_dct:
{'pak': 1.12134, 'afg': 32.4522, 'ban': 3.44, 'ind': 1.123, 'aus': 4.22, 'usa': 9.44343, 'nz': 57.12121, 'col': 2.447, 'iran': 27.5}
I need the second dict items to be shown on top of each bar from the previous graph.
What I tried:
df = pd.DataFrame([population_dct])
df.sum().sort_values(ascending=False).plot.bar(color='g')
for i, v in enumerate(counter_dct.values()):
    plt.text(v, i, " " + str(v), color='blue', va='center', fontweight='bold')
This has two issues:
counter_dct.values() messes up the sequence of values, because its order does not match the sorted bars.
The values are shown at the bottom of each bar with poor alignment.
Perhaps there's a better way to achieve this?

Since you are drawing the bars in descending order, you first need to sort population_dct in descending order by value:
temp_dct = dict(sorted(population_dct.items(), key=lambda x: x[1], reverse=True))
Then iterate over temp_dct and look up each key's value in counter_dct:
counter = 0  # to start from the x-axis
for key, val in temp_dct.items():
    top_val = counter_dct[key]
    plt.text(x=counter, y=val + 2, s=f"{top_val}", fontdict=dict(fontsize=11))
    counter += 1
plt.xticks(rotation=45, ha='right')
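For reference, here is a minimal end-to-end sketch of the same idea, using the two dicts from the question. It sorts once with a pandas Series and reuses that ordering for both the bars and the labels. The two-decimal rounding, the centred alignment and the Agg backend are my own assumptions, not something the question asked for.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

population_dct = {'pak': 210, 'afg': 182, 'ban': 94, 'ind': 32, 'aus': 14,
                  'usa': 345, 'nz': 571, 'col': 47, 'iran': 2}
counter_dct = {'pak': 1.12134, 'afg': 32.4522, 'ban': 3.44, 'ind': 1.123,
               'aus': 4.22, 'usa': 9.44343, 'nz': 57.12121, 'col': 2.447,
               'iran': 27.5}

# Sort once; the same ordering drives both the bars and their labels.
ordered = pd.Series(population_dct).sort_values(ascending=False)

fig, ax = plt.subplots()
ordered.plot.bar(ax=ax, color='b')

# Look each label up by key, so the dict's own ordering never matters.
labels = [f"{counter_dct[key]:.2f}" for key in ordered.index]
for x_pos, (height, label) in enumerate(zip(ordered.to_numpy(), labels)):
    # ha='center' centres the text on the bar; va='bottom' sits it on top.
    ax.text(x_pos, height, label, ha='center', va='bottom')

ax.tick_params(axis='x', rotation=45)
fig.tight_layout()
```

Calling fig.savefig("temp_fig.png") afterwards would write the figure to disk as in the question.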

Related

Use for loop for multi row column plot

I am attempting to run a for loop in order to plot multiple scatter plots. With the code that I have, I only get one plot at the end. How do I generate the correct row x column grid of plots to save?
I have checked out some of the answers given here and here, but they do not work for me. Is there a better way to generate these plots?
Here is my code:
from sklearn.datasets import make_classification
import seaborn as sns
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
# Generate noisy Data
num_trainsamples = 500
num_testsamples = 50
X_train, y_train = make_classification(n_samples=num_trainsamples,
                                       n_features=240,
                                       n_informative=9,
                                       n_redundant=0,
                                       n_repeated=0,
                                       n_classes=10,
                                       n_clusters_per_class=1,
                                       class_sep=9,
                                       flip_y=0.2,
                                       #weights=[0.5,0.5],
                                       random_state=17)
n_components=2
n_neighbours=[1, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30]
local_connectivity=2
min_dist=0.15
target_names = ['t1', 't2', 't3', 't4', 't5', 't6', 't7', 't8', 't9', 't10']
plt.figure(figsize=(15,15))
for i in range(0, len(n_neighbours)):
    plt.subplot(3,5,i+1)
    plt.clf()
    plt.scatter(
        X_train[:, 0],
        X_train[:, 1],
        s = 20,
        c=y_train,
        cmap=plt.cm.nipy_spectral,
        edgecolor="k",
        linewidths=0.75,
        label=y_train,
        alpha=0.45,
    )
    plt.title(f'n_components = {n_components}, n_neighbors = {n_neighbours[i]}, local_conn = {local_connectivity}, min_dist = {min_dist}')
cbar = plt.colorbar(boundaries=np.arange(11)-0.5)
cbar.set_ticks(np.arange(10))
cbar.set_ticklabels(target_names)
You see just one plot because of the line plt.clf(). That command tells matplotlib to clear the current figure, so on each pass through the loop it wipes everything drawn so far and only the last subplot survives. Commenting out that line gives the figure below, which is what I think you are looking for:
PLOT
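As an alternative to the plt.subplot / plt.clf pattern, here is a minimal sketch that creates the whole 3 x 5 grid up front with plt.subplots and draws on each Axes explicitly. The random points and classes are stand-ins for X_train and y_train (an assumption, so that the sketch does not need scikit-learn), and the shortened title is also mine.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data (hypothetical): 500 random 2-D points in 10 classes.
rng = np.random.default_rng(17)
points = rng.normal(size=(500, 2))
classes = rng.integers(0, 10, size=500)

n_neighbours = [1, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30]

# One figure, 3 rows x 5 columns; no plt.clf() anywhere in the loop.
fig, axes = plt.subplots(3, 5, figsize=(15, 9), sharex=True, sharey=True)
for ax, n in zip(axes.flat, n_neighbours):
    ax.scatter(points[:, 0], points[:, 1], s=20, c=classes,
               cmap="nipy_spectral", edgecolor="k", linewidths=0.75,
               alpha=0.45)
    ax.set_title(f"n_neighbors = {n}")

fig.tight_layout()
```

Because every call goes through its own ax, nothing depends on which subplot is "current", and fig.savefig(...) at the end would save all 15 panels in one image.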

Bar plot with different minimal value for each bar

I'm trying to reproduce this type of graph:
Basically, the y-axis represents the start and end dates of a phenomenon for each year.
But here is what I get when I try to plot my data:
It seems that, no matter what, the bar for each year is plotted from the minimum value of the y-axis.
Here is the data I use
Here is my code :
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import gridspec
select=pd.read_excel("./writer.xlsx")
select=pd.DataFrame(select)
select["dte"]=pd.to_datetime(select.dte)
select["month_day"]=pd.DatetimeIndex(select.dte).strftime('%B %d')
select["month"]=pd.DatetimeIndex(select.dte).month
select["day"]=pd.DatetimeIndex(select.dte).day
gs=gridspec.GridSpec(2,2)
fig=plt.figure()
ax1=plt.subplot(gs[0,0])
ax2=plt.subplot(gs[0,1])
ax3=plt.subplot(gs[1,:])
###2 others graphs that works just fine
data=pd.DataFrame()
del select["res"],select["Seuil"],select["Seuil%"] #these don't matter for that graph
for year_ in list(set(select.dteYear)):
    temp=select.loc[select["dteYear"]==year_]
    temp2=temp.iloc[[0,-1]] #the beginning and ending of the phenomenon
    data=pd.concat([data,temp2]).reset_index(drop=True)
data=data.sort_values(["month","day"])
ax3.bar(data["dteYear"],data["month_day"],tick_label=data["dteYear"])
plt.show()
If you have any clue that could help me, I'd really appreciate it, because I haven't found any example of how to make this type of graph.
Thanks!
EDIT :
I tried something else :
height,bottom,x_position=[], [], []
for year_ in list(set(select.dteYear)):
    temp=select.loc[select["dteYear"]==year_]
    bottom.append(temp["month_day"].iloc[0])
    height.append(temp["month_day"].iloc[-1])
    x_position.append(year_)
    temp2=temp.iloc[[0,-1]]
    data=pd.concat([data,temp2]).reset_index(drop=True)
ax3.bar(x=x_position,height=height,bottom=bottom,tick_label=x_position)
I got this error :
Traceback (most recent call last):
  File "C:\Users\E31\Documents\cours\stage_dossier\projet_python\tool_etiage\test.py", line 103, in <module>
    ax3.bar(x=x_position,height=height,bottom=bottom,tick_label=x_position)
  File "C:\Users\E31\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\__init__.py", line 1352, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "C:\Users\E31\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\axes\_axes.py", line 2357, in bar
    r = mpatches.Rectangle(
  File "C:\Users\E31\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\patches.py", line 752, in __init__
    super().__init__(**kwargs)
  File "C:\Users\E31\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\patches.py", line 101, in __init__
    self.set_linewidth(linewidth)
  File "C:\Users\E31\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\patches.py", line 406, in set_linewidth
    self._linewidth = float(w)
TypeError: only size-1 arrays can be converted to Python scalars
To make a bar graph that shows the span between two dates, start by getting your data into a tidy dataframe in which the bottom and top values of the bar are easy to access for each year you are plotting. After that you can simply plot the bars and pass the 'bottom' parameter, with the height given as the difference between top and bottom. The hardest part in your case may be specifying the datetime differences correctly. I added an x tick locator and a y tick formatter for the datetimes.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.dates as mdates
# make function that returns a random datetime
# between a start and stop date
def random_date(start, stop):
    days = (stop - start).days
    rand = np.random.randint(days)
    return start + pd.Timedelta(rand, unit='days')
# simulate poster's data
T1 = pd.to_datetime('July 1 2021')
T2 = pd.to_datetime('August 1 2021')
T3 = pd.to_datetime('November 1 2021')
df = pd.DataFrame({
    'year' : np.random.choice(np.arange(1969, 2020), size=15, replace=False),
    'bottom' : [random_date(T1, T2) for x in range(15)],
    'top' : [random_date(T2, T3) for x in range(15)],
}).sort_values(by='year').set_index('year')
# define fig/ax and figsize
fig, ax = plt.subplots(figsize=(16,8))
# plot data
ax.bar(
    x = df.index,
    height = (df.top - df.bottom),
    bottom = df.bottom,
    color = '#9e7711'
)
# add x_locator (every 2 years), y tick datetime formatter, grid
# hide top/right spines, and rotate the x ticks for readability
x_locator = ax.xaxis.set_major_locator(mpl.ticker.MultipleLocator(2))
y_formatter = ax.yaxis.set_major_formatter(mdates.DateFormatter('%d %b'))
tick_params = ax.tick_params(axis='x', rotation=45)
grid = ax.grid(axis='y', dashes=(8,3), alpha=0.3, color='gray')
hide_spines = [ax.spines[s].set_visible(False) for s in ['top','right']]
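If passing datetimes and timedeltas straight to ax.bar gives trouble on your matplotlib version, a variant is to convert the dates to matplotlib's float date numbers first, so that height is an ordinary number of days. This is only a sketch under that assumption: the three start/end dates are made up, and all of them are mapped onto the same reference year (2021) so the bars share one y-axis.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: one start/end date per year, on a common reference year.
spans = pd.DataFrame({
    "year": [2001, 2002, 2003],
    "start": pd.to_datetime(["2021-07-05", "2021-07-20", "2021-07-12"]),
    "end": pd.to_datetime(["2021-09-01", "2021-10-15", "2021-08-25"]),
})

# date2num turns datetimes into floats counted in days.
start_num = mdates.date2num(spans["start"].to_numpy())
end_num = mdates.date2num(spans["end"].to_numpy())
heights = end_num - start_num  # bar height = duration in days

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.bar(spans["year"].to_numpy(), heights, bottom=start_num,
              color="#9e7711")

# Show the float y values as dates again.
ax.yaxis.set_major_locator(mdates.MonthLocator())
ax.yaxis.set_major_formatter(mdates.DateFormatter("%d %b"))
ax.set_xticks(spans["year"].to_numpy())
```

The key point is the same as in the answer above: height is the length of the bar (end minus start), not the end value itself, and bottom is where the bar begins.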

Unable to read data from kdeplot

I have a pandas dataframe with two columns, A and B, named df in the following bits of code.
And I try to plot a kde for each value of B like so:
import seaborn as sbn, numpy as np, pandas as pd, matplotlib.pyplot as plt
fig = plt.figure(figsize=(15, 7.5))
sbn.kdeplot(data=df, x="A", hue="B", fill=True)
fig.savefig("test.png")
I read the following suggestions, but only the ones where I compute the KDE from scratch using statsmodels or some other module got me anywhere:
Seaborn/Matplotlib: how to access line values in FacetGrid?
Get data points from Seaborn distplot
For curiosity's sake, I would like to know why I am unable to get something from the following code:
kde = sbn.kdeplot(data=df, x="A", hue="B", fill=True)
line = kde.lines[0]
x, y = line.get_data()
print(x, y)
The error I get is IndexError: list index out of range. kde.lines has a length of 0.
Accessing the lines through fig.axes[0].lines[0] also raises an IndexError.
All in all, I think I tried everything proposed in the previous threads. I also tried switching from kdeplot to displot (displot, not distplot, which is deprecated), but it is the same story, except that the axes have to be accessed differently. Every time I get to .get_lines(), ax.lines, etc., what is returned is an empty list, so I can't get any values out of it.
EDIT : Reproducible example
import pandas as pd, numpy as np, matplotlib.pyplot as plt, seaborn as sbn
# 1. Generate random data
# (DataFrame.append was removed in pandas 2.0, so collect the rows in a list first)
rows = []
for i in [1, 2, 3, 5, 7, 8, 10, 12, 15, 17, 20, 40, 50]:
    for _ in range(10):
        rows.append({"A": np.random.random() * i, "B": i})
df = pd.DataFrame(rows, columns=["A", "B"])
# 2. Plot data
fig = plt.figure(figsize=(15, 7.5))
sbn.kdeplot(data=df, x="A", hue="B", fill=True)
# 3. Read data (error)
ax = fig.axes[0]
x, y = ax.lines[0].get_data()
print(x, y)
This happens because using fill=True changes the object that matplotlib draws.
When no fill is used, lines are plotted:
fig = plt.figure(figsize=(15, 7.5))
ax = sbn.kdeplot(data=df, x="A", hue="B")
print(ax.lines)
# [<matplotlib.lines.Line2D object at 0x000001F365EF7848>, etc.]
When you use fill=True, PolyCollection objects are drawn instead, and they are stored in ax.collections rather than ax.lines:
fig = plt.figure(figsize=(15, 7.5))
ax = sbn.kdeplot(data=df, x="A", hue="B", fill=True)
print(ax.collections)
# [<matplotlib.collections.PolyCollection object at 0x0000016EE13F39C8>, etc.]
You could draw the kdeplot a second time, but with fill=False, so that you have access to the line objects.
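The difference can be reproduced with plain matplotlib, which may make it easier to see. This is a sketch, not seaborn's actual code: I am assuming that the filled density ends up being drawn with something equivalent to fill_between, and the Gaussian curve below is just a stand-in for a KDE.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for a KDE curve (hypothetical): a standard normal density.
grid = np.linspace(-4, 4, 200)
density = np.exp(-grid ** 2 / 2) / np.sqrt(2 * np.pi)

fig, (ax_line, ax_fill) = plt.subplots(1, 2)
ax_line.plot(grid, density)          # creates a Line2D in ax_line.lines
ax_fill.fill_between(grid, density)  # creates a PolyCollection in ax_fill.collections

# The unfilled curve can be read back directly.
x_line, y_line = ax_line.lines[0].get_data()

# The filled one has no lines; its outline lives in the collection's path.
vertices = ax_fill.collections[0].get_paths()[0].vertices
print(len(ax_fill.lines), len(ax_fill.collections), vertices.shape)
```

Note that the vertices trace the whole polygon (the curve plus the baseline back along y = 0), so reading the values from an unfilled line is usually the simpler route.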

Is it possible to customise the visible edges of a matplotlib table cell while also setting a background colour?

I would like to overlay a matplotlib table on a plot that has a grid, i.e. the table should hide anything behind it on the grid.
I would also like the border to look more like each row has the border rather than a border around each cell. However setting cell.visible_edges seems to break the fill.
An example to illustrate
import matplotlib.pyplot as plt
import numpy as np
# a plot with a grid
x = np.linspace(0, 10, 11)
y = x ** 2
plt.plot(x, y)
plt.grid()
# a table on the same axis with a visible background colour
col_labels = ['col1', 'col2', 'col3']
row_labels = ['row1', 'row2', 'row3']
table_vals = [[11, 12, 13], [21, 22, 23], [31, 32, 33, ]]
colours = [['c'] * 3] * 3
the_table = plt.table(cellText=table_vals,
                      colWidths=[0.1] * len(col_labels),
                      cellColours=colours,
                      rowLabels=row_labels,
                      bbox=(0.8, 0.8, 0.2, 0.2))
# bump the zorder up so it occludes the underlying chart
the_table.set_zorder(1000)
# change the visible_edges to produce a row border effect
for row_idx, row in enumerate(table_vals):
    for col_idx, col in enumerate(row):
        edges = 'B'
        if col_idx == 0:
            edges += 'L'
        elif col_idx == 2:
            edges += 'R'
        the_table[row_idx, col_idx].visible_edges = edges
plt.show()
The result is that the cells are filled inconsistently and the grid is still visible.
If I remove the change to visible_edges, then the cells are filled as expected.
Is there a way to combine these two effects?
(examples created using matplotlib 2.2.3)

Can't make dates appear on x-axis in pyplot

So I've been trying to plot some data. I fetch the data from a database and store it all correctly in the variable text_. This is the relevant snippet of the code:
import sqlite3
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from dateutil.parser import parse
fig, ax = plt.subplots()
# Twin the x-axis twice to make independent y-axes.
axes = [ax, ax.twinx(), ax.twinx()]
# Make some space on the right side for the extra y-axis.
fig.subplots_adjust(right=0.75)
# Move the last y-axis spine over to the right by 20% of the width of the axes
axes[-1].spines['right'].set_position(('axes', 1.2))
# To make the border of the right-most axis visible, we need to turn the frame on. This hides the other plots, however, so we need to turn its fill off.
axes[-1].set_frame_on(True)
axes[-1].patch.set_visible(False)
# And finally we get to plot things...
text_ = [('01/08/2017', 6.5, 143, 88, 60.2, 3), ('02/08/2017', 7.0, 146, 90, 60.2, 4),
         ('03/08/2017', 6.7, 142, 85, 60.2, 5), ('04/08/2017', 6.9, 144, 86, 60.1, 6),
         ('05/08/2017', 6.8, 144, 88, 60.2, 7), ('06/08/2017', 6.7, 147, 89, 60.2, 8)]
colors = ('Green', 'Red', 'Blue')
label = ('Blood Sugar Level (mmol/L)', 'Systolic Blood Pressure (mm Hg)', 'Diastolic Blood Pressure (mm Hg)')
y_axisG = [text_[0][1], text_[1][1], text_[2][1], text_[3][1], text_[4][1], text_[5][1]] #Glucose data
y_axisS = [text_[0][2], text_[1][2], text_[2][2], text_[3][2], text_[4][2], text_[5][2]] # Systolic Blood Pressure data
y_axisD = [text_[0][3], text_[1][3], text_[2][3], text_[3][3], text_[4][3], text_[5][3]] # Diastolic Blood Pressure data
AllyData = [y_axisG, y_axisS, y_axisD] #list of the lists of data
dates = [text_[0][0], text_[1][0], text_[2][0], text_[3][0], text_[4][0], text_[5][0]] # the dates as strings
x_axis = [(parse(x, dayfirst=True)) for x in dates] #converting the dates to datetime format for the graph
Blimits = [5.5, 130, 70] #lower limits of the axis
Tlimits = [8, 160, 100] #upper limits of the axis
for ax, color, label, AllyData, Blimits, Tlimits in zip(axes, colors, label, AllyData, Blimits, Tlimits):
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y')) # formats the date
    plt.gca().xaxis.set_major_locator(mdates.DayLocator())
    data = AllyData
    ax.plot(data, color=color) # plots the data on each y-axis
    ax.set_ylim([Blimits, Tlimits]) # limits
    ax.set_ylabel(label, color=color) # y-axis labels
    ax.tick_params(axis='y', colors=color)
axes[0].set_xlabel('Date', labelpad=20)
plt.gca().set_title("Last 6 Month's Readings",weight='bold',fontsize=15)
plt.show()
The code currently makes this graph:
Graph with no x-values
I understand the problem is probably in the ax.plot part, but I'm not sure what exactly. I tried writing that line as ax.plot(data, x_axis, color=color); however, this messed up the whole graph and the dates still didn't show up on the x-axis like I wanted them to.
Is there something I've missed?
If this has been answered elsewhere, please can you show me how to implement that into my code by editing my code?
Thanks a ton
Apparently x_axis is never actually used in the code. Instead of
ax.plot(data, color=color)
which plots the data against its indices, you would want to plot the data against the dates stored in x_axis.
ax.plot(x_axis, data, color=color)
Finally, adding plt.gcf().autofmt_xdate() just before plt.show() will rotate the dates nicely, so that they don't overlap.
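Put together for a single series, a minimal sketch could look like the following. It uses only the glucose values from text_, and I swapped dateutil's parse for datetime.strptime and chose a day-first tick format; both are assumptions on my part, not something the question requires.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

dates = ['01/08/2017', '02/08/2017', '03/08/2017',
         '04/08/2017', '05/08/2017', '06/08/2017']
glucose = [6.5, 7.0, 6.7, 6.9, 6.8, 6.7]

# Day-first strings -> datetime objects for the x-axis.
x_axis = [datetime.strptime(d, '%d/%m/%Y') for d in dates]

fig, ax = plt.subplots()
# x values first, y values second.
(line,) = ax.plot(x_axis, glucose, color='green')

ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d/%m/%Y'))
fig.autofmt_xdate()  # rotate the date labels so they don't overlap
ax.set_xlabel('Date')
```

The argument order matters: ax.plot(data, x_axis) would put the dates on the y-axis, which is why the earlier attempt in the question looked wrong.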
