Creating 3 pandas series as pie charts in one plot - python-3.x

I'm setting up a figure to display 3 pie charts. Data for the charts come from 3 separate pandas series. I suppose I could merge the series into a df and create subplots via that df but I doubt it's needed.
My current code generates 3 pie charts. But they all overlap. I'm confused about how to arrange them.
S19E_sj = (BDdf.loc[BDdf['GRPCODE'] == 'SJ3219'])['Result'].value_counts()
S19E_ge = (BDdf.loc[BDdf['GRPCODE'] == 'G1932'])['Result'].value_counts()
S19E_jl = (BDdf.loc[BDdf['GRPCODE'] == 'JLG1930'])['Result'].value_counts()
fig, ax = plt.subplots(figsize = (8,6))
S19E_sj.plot.pie()
S19E_ge.plot.pie()
S19E_jl.plot.pie()

Although you did not provide a Minimal, Complete, and Verifiable example, you can try something like this. Without an ax argument, each plot.pie() call draws on the current axes, which is why the pies overlap. Create a figure containing 3 subplots arranged in a row, then pass each subplot to the corresponding pie chart call through ax:
fig, axes = plt.subplots(1, 3, figsize = (8,6))
S19E_sj.plot.pie(ax=axes[0])
S19E_ge.plot.pie(ax=axes[1])
S19E_jl.plot.pie(ax=axes[2])
plt.tight_layout()
plt.show()
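For a version that runs on its own, here is a minimal sketch of the same idea. The BDdf frame below is a made-up stand-in (only the column names and group codes come from the question), and the loop, titles, and autopct format are illustrative choices rather than part of the original answer.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for BDdf: column names and group codes follow the
# question, the values are invented for illustration.
BDdf = pd.DataFrame({
    "GRPCODE": ["SJ3219"] * 4 + ["G1932"] * 3 + ["JLG1930"] * 5,
    "Result": ["pass", "pass", "fail", "pass",
               "fail", "pass", "fail",
               "pass", "pass", "pass", "fail", "fail"],
})

codes = ["SJ3219", "G1932", "JLG1930"]
fig, axes = plt.subplots(1, len(codes), figsize=(9, 3))

# One pie per group code, each drawn on its own subplot via ax=
for ax, code in zip(axes, codes):
    counts = BDdf.loc[BDdf["GRPCODE"] == code, "Result"].value_counts()
    counts.plot.pie(ax=ax, autopct="%1.0f%%")
    ax.set_title(code)
    ax.set_ylabel("")  # drop the automatic y label pandas adds

fig.tight_layout()
```

Call plt.show() afterwards to display the figure interactively.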

Related

mini scatter matrix using subplots as a loop

I have a dataset with 25 columns and wanted to examine scatter plots. I first looked at it with
Seaborn scatterplot() but this is too messy and there are too many charts to make sense of it all.
So instead I wanted to iterate a single column over all of the columns.
I created this simple loop:
for col in ds_num.columns:
    plt.figure()
    sns.scatterplot(x='initial_term',y=col,hue='logo_renewal',data=ds_num)
plt.show()
This worked, but it stacked the plots in a single column. I'd like several plots in each row, so I tried this instead:
for idx, col in enumerate(ds_num.columns):
    fig = plt.figure(figsize=(20,16))
    ax[idx+1] = fig.add_subplot(5,5,idx+1)
    sns.scatterplot(x='initial_term',y=col,hue='logo_renewal',data=ds_num,ax=ax[idx])
plt.show()
But now I got TypeError: 'AxesSubplot' object does not support item assignment
Any suggestions? Thanks
Found the answer with the help of subplots:
fig, axs = plt.subplots(5,5,figsize=(20,20))
cols = ds_num.columns
for ax, col in zip(axs.flatten(),cols):
    sns.scatterplot(x='initial_term',y=col,hue='logo_renewal',data=ds_num,ax=ax,legend=False)
plt.tight_layout()
Note that I removed the legend because it took up too much space; this is optional.
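When the number of columns does not fill the grid exactly, the leftover subplots stay empty. A minimal sketch of sizing the grid from the column count and hiding the unused cells follows. It uses plain matplotlib ax.scatter as a stand-in for sns.scatterplot, and the ds_num frame, its column names, and ncols are all hypothetical.

```python
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical numeric frame standing in for ds_num (8 columns of random data)
ds_num = pd.DataFrame(rng.normal(size=(50, 8)),
                      columns=[f"col_{i}" for i in range(8)])

x_col = "col_0"                                   # plays the role of 'initial_term'
y_cols = [c for c in ds_num.columns if c != x_col]

ncols = 3                                         # plots per row (illustrative choice)
nrows = math.ceil(len(y_cols) / ncols)
fig, axs = plt.subplots(nrows, ncols,
                        figsize=(4 * ncols, 3 * nrows), squeeze=False)
flat_axes = axs.flatten()

# One scatter plot per column, filling the grid row by row
for ax, col in zip(flat_axes, y_cols):
    ax.scatter(ds_num[x_col], ds_num[col], s=10)
    ax.set_xlabel(x_col)
    ax.set_ylabel(col)

# Hide the grid cells that have no column assigned
for ax in flat_axes[len(y_cols):]:
    ax.set_visible(False)

fig.tight_layout()
```

With seaborn, the ax.scatter line would be replaced by the sns.scatterplot(..., ax=ax) call from the answer above.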

How do you adjust the spacing on a barchart and its xtick labels?

I am trying to create a barchart (overlaid on a line graph with days as the x axis instead of quarters) where the labels are end-of-quarter days. That is all fine, and generates nicely, but I am trying to set the labels so that they are lined up with the right edge of the plot and the corresponding bar's right-side is aligned with the x-tick.
A reproducible example (with just the bar chart, not the line) is:
import matplotlib.pyplot as pyplot
import pandas
import random
random.seed(2020)
dates = pandas.date_range("2016-12-31", "2017-12-31")
bar = pandas.DataFrame([.02, .01, -0.01, .05], index = ["2017-03-31", "2017-06-30", "2017-09-30", "2017-12-31"], columns = ["test"])
line = pandas.DataFrame([random.random() for r in range(len(dates))], index = dates, columns = ["test"])
fig, ax = pyplot.subplots(1, 1, figsize = (7, 3))
ax2 = fig.add_subplot(111, frame_on = False)
bar.plot(kind = "bar", ax = ax, width = 1)
line.plot(kind = "line", ax = ax2)
ax2.set_xticks([])
ax.yaxis.tick_right()
ax.yaxis.set_label_position("right")
fig.tight_layout()
pyplot.show()
Which yields a plot as:
My goal is to have the right side of the 2017-12-31 column aligned with the right edge of the plot and the 2017-12-31 label at the right side as well. Further, the left side of the 2017-03-31 bar should touch the left side of the plot. For the remaining bars, I would like them evenly spaced with all labels aligned with the right side of each bar, and no space in between bars. Like this example below:
Frankly, I'm at a loss. I've tried adding ha="right" to no avail, and I've tried just shifting the graphs, but that leaves me with other problems and doesn't really address the issue. Even with the bars shifted, I'm still fairly constrained in moving the tick labels, and I haven't found anything online that remotely addresses the problem.
Would it be better to create the bar chart so that it has the same index as the line chart, then set the x tick labels to be the desired dates?
Does anyone have any guidance? I've spent too much time on this problem today and it's driving me nuts.
In order to plot the bar chart tightly, you can use the autoscale function as below.
To move the tick labels, you can modify the transformations to include some offset. Below I used 0.7 but you can select it based on other sizes used in your chart.
import matplotlib.pyplot as pyplot
import pandas
import matplotlib.transforms as tr
df = pandas.DataFrame([.02, .01, -0.01, .05], index = ["2017-03-31", "2017-06-30", "2017-09-30", "2017-12-31"], columns = ["test"])
fig, ax = pyplot.subplots(1, 1, figsize = (7, 3))
df.plot(kind = "bar", ax = ax, width = 1)
pyplot.autoscale(enable=True, axis='x', tight=True) # tight layout
# for each tick label, shift 0.7 to right
for tick in ax.get_xticklabels():
    tick.set_transform(tick.get_transform()+tr.ScaledTranslation(0.7, 0, fig.dpi_scale_trans))
pyplot.show()
The result looks like this.
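A different way to get the same alignment without a hand-tuned offset is to draw the bars with matplotlib directly. This is a sketch of an alternative technique, not the answer's method: passing a negative width together with align="edge" to ax.bar makes each bar extend to the left of its x position, so the tick sits on the bar's right edge. The integer positions 1 to 4 are an assumption made for illustration.

```python
import matplotlib.pyplot as plt

labels = ["2017-03-31", "2017-06-30", "2017-09-30", "2017-12-31"]
values = [0.02, 0.01, -0.01, 0.05]
positions = list(range(1, len(values) + 1))  # each tick marks a bar's right edge

fig, ax = plt.subplots(figsize=(7, 3))

# Negative width + align="edge": every bar spans [position - 1, position]
bars = ax.bar(positions, values, width=-1, align="edge")

ax.set_xlim(0, len(values))             # first bar touches the left edge, last the right
ax.set_xticks(positions)
ax.set_xticklabels(labels, ha="right")  # label text ends at the tick

fig.tight_layout()
```

Because the bars are exactly one unit wide and the x limits are set to the full span, there is no gap between bars or at either side of the plot.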

Using "hue" for a Seaborn visual: how to get legend in one graph?

I created a scatter plot in seaborn using seaborn.relplot, but am having trouble putting the legend all in one graph.
When I do this simple way, everything works fine:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
df2 = df[df.ln_amt_000s < 700]
sns.relplot(x='ln_amt_000s', y='hud_med_fm_inc', hue='outcome', size='outcome', legend='brief', ax=ax, data=df2)
The result is a scatter plot as desired, with the legend on the right hand side.
However, when I try to generate a matplotlib figure and axes objects ahead of time to specify the figure dimensions I run into problems:
a4_dims = (10, 10) # generating a matplotlib figure and axes objects ahead of time to specify figure dimensions
df2 = df[df.ln_amt_000s < 700]
fig, ax = plt.subplots(figsize = a4_dims)
sns.relplot(x='ln_amt_000s', y='hud_med_fm_inc', hue='outcome', size='outcome', legend='brief', ax=ax, data=df2)
The result is two graphs -- one that has the scatter plots as expected but missing the legend, and another one below it that is all blank except for the legend on the right hand side.
How do I fix this? My desired result is one graph where I can specify the figure dimensions and have the legend at the bottom in two rows, below the x-axis (if that is too difficult or not supported, the default legend position to the right of the same graph would work too). I know the problem lies with "ax=ax" and with the way I am specifying the dimensions through a matplotlib figure, but I'd like to know specifically why this causes a problem so I can learn from it.
Thank you for your time.
The issue is that sns.relplot is a "Figure-level interface for drawing relational plots onto a FacetGrid" (see the API page). With a simple sns.scatterplot (the default type of plot used by sns.relplot), your code works (changed to use reproducible data):
df = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv", index_col=0)
fig, ax = plt.subplots(figsize = (5,5))
sns.scatterplot(x = 'Sepal.Length', y = 'Sepal.Width',
                hue = 'Species', legend = 'brief',
                ax=ax, data = df)
plt.show()
Further edits to legend
Seaborn's legends are a bit finicky. Some tweaks you may want to employ:
Remove the default seaborn title, which is actually a legend entry, by getting and slicing the handles and labels
Set a new title that is actually a title
Move the location and make use of bbox_to_anchor to move outside the plot area (note that the bbox parameters need some tweaking depending on your plot size)
Specify the number of columns
fig, ax = plt.subplots(figsize = (5,5))
sns.scatterplot(x = 'Sepal.Length', y = 'Sepal.Width',
                hue = 'Species', legend = 'brief',
                ax=ax, data = df)
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:], loc=8,
          ncol=2, bbox_to_anchor=[0.5,-.3,0,0])
plt.show()
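The interplay of loc and bbox_to_anchor can be seen in isolation with plain matplotlib. The sketch below is an illustration under assumptions: the points are random, the group names are borrowed from the iris example, and the anchor and margin values are guesses that would need tuning for a real plot. Here loc names the part of the legend box that is pinned to the anchor point, which is given in axes coordinates.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
fig, ax = plt.subplots(figsize=(5, 5))

# Random points; only the group names echo the iris example above
for name in ["setosa", "versicolor", "virginica"]:
    ax.scatter(rng.normal(size=20), rng.normal(size=20), label=name)

# Pin the legend's top-centre to a point just below the axes.
# (0.5, -0.12) is in axes coordinates: horizontally centred, slightly below y=0.
legend = ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.12),
                   ncol=2, title="Species")

# Leave room under the axes so the legend is not clipped
fig.subplots_adjust(bottom=0.25)
```

With three entries and ncol=2, the legend wraps into two rows, which matches the layout asked for in the question.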

Plot several boxplots in one figure

I am using Python 3 and would like to plot several boxplots in one figure. All the data come from one NumPy array of shape (100, 301).
If I use the code below, it plots them all (301 boxplots in one figure, which is too many):
fig, ax = plt.subplots()
ax.boxplot(my_data)
plt.show()
I don't want to plot all the data. I just want to plot 10, 15 or 20 (a variable number) of the columns, using a for loop or whatever method works best.
For example, I want to plot a boxplot for every 50th column, which means I will have around 6 boxplots out of 301 in my figure. I tried to use a for loop but had no luck.
Any advice would be much appreciated.
You can just use indexing to plot every 50th column, with the step held in a variable. To keep the box plots separate and avoid overlapping, you can specify the position of each box plot using the positions parameter. my_data[:, ::step] gives you the desired data to plot. Below is an example using some random data.
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
my_data = np.random.randint(0, 20, (100, 301))
step = 50
posit = range(my_data[:, ::step].shape[1])
ax.boxplot(my_data[:, ::step], positions=posit)
plt.show()
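If the goal is a fixed number of boxes rather than a fixed step, the column indices can be computed instead. A minimal sketch, assuming evenly spaced columns are acceptable; n_boxes and the tick labelling are illustrative choices, and the data are random as in the example above.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
my_data = rng.integers(0, 20, size=(100, 301))  # random stand-in data

n_boxes = 10  # how many boxplots to draw (hypothetical value)

# Evenly spaced column indices from the first to the last column
cols = np.linspace(0, my_data.shape[1] - 1, n_boxes).astype(int)

fig, ax = plt.subplots()
result = ax.boxplot(my_data[:, cols])

# Label each box with the index of the column it came from
ax.set_xticks(range(1, n_boxes + 1))
ax.set_xticklabels([str(c) for c in cols])
ax.set_xlabel("column index")
```

Changing n_boxes to 15 or 20 redraws the figure with that many boxes without touching the rest of the code.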

Difference between Scatter object and figure.scatter method

I have the following question related to these two graphs:
Graph 1:
output_notebook()
scatter = Scatter(df_b, x='log_umsatz', y='log_fte', color='target', legend="top_right")
show(scatter)
Graph 2
output_notebook()
scatter = figure(plot_width=500, plot_height=500)
scatter.scatter(x=df_b['log_umsatz'], y=df_b['log_fte'], color=df['target'])
p.legend.location = "top_left"
p.legend.click_policy="hide"
show(scatter)
As you can see, I generated two scatter graphs using bokeh. In the second graph, I try to introduce some interactivity with p.legend.click_policy="hide". I have two issues: The interactivity doesn't work, and legend and color coding are lost in the second example. How come? I expected graph 1 and graph 2 to be identical.
Your main issue is that Graph 1 uses Scatter, which is a Bokeh Charts model. Bokeh Charts is a high-level library for plotting data that does a lot of data processing and chart formatting for you behind the scenes. In Graph 2, you are using a Bokeh glyph method to create your plot, so you need to be much more explicit about what you want it to do.
Fixing up your code I can produce the same graph as that original Scatter.
from bokeh.models import CategoricalColorMapper, ColumnDataSource
from bokeh.plotting import figure
cds = ColumnDataSource(df_b)
color_mapper = CategoricalColorMapper(
    palette=['red', 'green'], factors=[0, 1])
scatter = figure(plot_width=500, plot_height=500)
scatter.circle(x='log_umsatz', y='log_fte',
               color={'field': 'target', 'transform': color_mapper}, alpha=0.5,
               source=cds, legend='target')
scatter.legend.location = "top_right"
As you can see, we need to call in multiple other Bokeh objects. ColumnDataSource to store the pandas data and CategoricalColorMapper to map the colors to the factors.
Adding an interactive legend to the plot is a little more complicated. Right now, Bokeh's interactive legends work on a per-glyph basis. That is to say, each group must be plotted as a separate glyph to be interactive. You can read more about it here, and here's a quick demo to help you.
scatter = figure(plot_width=500, plot_height=500)
scatter.circle(x=[1, 2, 3], y=[1, 2, 3], color='red', legend='0', alpha=0.5)
scatter.circle(x=[4, 5], y=[4, 5], color='green', legend='1', alpha=0.5)
scatter.legend.location = "top_right"
scatter.legend.click_policy = "hide"
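Applied to a DataFrame, the per-glyph approach amounts to one glyph call per target value. The sketch below assumes a recent Bokeh release, where figure takes width/height and glyph methods take legend_label instead of the older plot_width/plot_height and legend arguments used above. The df_b values and the palette mapping are made up for illustration.

```python
import pandas as pd
from bokeh.plotting import figure

# Hypothetical stand-in for df_b; only the column names follow the question
df_b = pd.DataFrame({
    "log_umsatz": [1.0, 2.0, 3.0, 4.0, 5.0],
    "log_fte": [1.5, 2.5, 2.0, 4.5, 5.5],
    "target": [0, 1, 0, 1, 1],
})

palette = {0: "red", 1: "green"}  # one colour per target value (assumed mapping)

p = figure(width=500, height=500)

# One glyph per target value, so each legend entry can toggle its own points
for target, color in palette.items():
    group = df_b[df_b["target"] == target]
    p.scatter(x=group["log_umsatz"].tolist(), y=group["log_fte"].tolist(),
              color=color, alpha=0.5, legend_label=str(target))

p.legend.location = "top_right"
p.legend.click_policy = "hide"
```

Pass p to show() after output_notebook() or output_file() to render it; clicking a legend entry then hides the matching group.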
