Control marker properties in seaborn pairwise boxplot - python-3.x

I'm trying to draw boxplots for two different datasets on the same plot. The x axis shows the hours of the day, while the y axis goes from 0 to 1 (let's call it Efficiency). I would like to have different markers for the means of each dataset's boxes. I use 'meanprops' in seaborn, but that changes the marker style for both datasets at the same time. I've added 2000 lines of data in the Excel file that can be downloaded here. The values might not coincide with the ones in the picture but should be enough.
Basically I want the red squares to be blue on the orange boxplot, and red on the blue boxplot. Here is what I managed to do so far:
I tried changing the meanprops by using a dictionary with the labels as keys, but it seems to enter a loop (PyCharm says "Evaluating..."):
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
#make sure you have your path sorted out
group1 = pd.read_excel('group1.xls')
fig, ax = plt.subplots(figsize = (20,10))
#does not work
#ax = sns.boxplot(data=group1, x='hour', y='M1_eff', hue='labels',showfliers=False, showmeans=True,\
# meanprops={"marker":{'7':"s",'8':'s'},"markerfacecolor":{'7':"white",'8':'white'},
#"markeredgecolor":{'7':"blue",'8':'red'})
#works but produces similar markers
ax = sns.boxplot(data=group1, x='hour', y='M1_eff', hue='labels',showfliers=False, showmeans=True,\
meanprops={"marker":"s","markerfacecolor":"white", "markeredgecolor":"blue"})
plt.legend(title='Groups', loc=2, bbox_to_anchor=(1, 1),borderaxespad=0.5)
# Add transparency to colors
for patch in ax.artists:
    r, g, b, a = patch.get_facecolor()
    patch.set_facecolor((r, g, b, .4))
ax.set_xlabel("Hours",fontsize=14)
ax.set_ylabel("M1 Efficiency",fontsize=14)
ax.tick_params(labelsize=10)
plt.show()
I also tried the FacetGrid but to no avail (Stops at 'Evaluating...'):
g = sns.FacetGrid(group1, col="M1_eff", hue="labels",hue_kws=dict(marker=["^", "v"]))
g = (g.map(plt.boxplot, "hour", "M1_eff")
.add_legend())
g.show()
Any help is appreciated!

I don't think you can do this using sns.boxplot() directly. I think you'll have to draw the means "by hand":
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
N=100
df = pd.DataFrame({'hour':np.random.randint(0,3,size=(N,)),
                   'M1_eff': np.random.random(size=(N,)),
                   'labels':np.random.choice([7,8],size=(N,))})
x_col = 'hour'
y_col = 'M1_eff'
hue_col = 'labels'
width = 0.8
hue_order=[7,8]
marker_colors = ['red','blue']
# get the offsets used by boxplot when hue-nesting is used
# https://github.com/mwaskom/seaborn/blob/c73055b2a9d9830c6fbbace07127c370389d04dd/seaborn/categorical.py#L367
n_levels = len(hue_order)
each_width = width / n_levels
offsets = np.linspace(0, width - each_width, n_levels)
offsets -= offsets.mean()
fig, ax = plt.subplots()
ax = sns.boxplot(data=df, x=x_col, y=y_col, hue=hue_col, hue_order=hue_order, showfliers=False, showmeans=False)
means = df.groupby([hue_col,x_col])[y_col].mean()
for (gr,temp),o,c in zip(means.groupby(level=0),offsets,marker_colors):
    ax.plot(np.arange(temp.values.size)+o, temp.values, 's', c=c)
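The offset computation above can be pulled out into a small helper, which makes it easier to check where each hue level's box is centred. This is only a sketch: hue_offsets is a hypothetical name, and it assumes the dodge logic of the seaborn revision linked above with the default width of 0.8. Other seaborn versions may position the boxes differently.

```python
import numpy as np

def hue_offsets(n_levels, width=0.8):
    """Centre offsets of hue-nested boxes relative to each category tick."""
    each_width = width / n_levels
    offsets = np.linspace(0, width - each_width, n_levels)
    return offsets - offsets.mean()

# Two hue levels: one box centred left of the tick, one right of it.
offsets_2 = hue_offsets(2)
# Three hue levels: the middle box sits on the tick itself.
offsets_3 = hue_offsets(3)
print(offsets_2, offsets_3)
```

Zipping these offsets with one marker and one colour per hue level (as in the loop above) gives each dataset its own mean marker.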

Related

How to rotate the xticks in a Seaborn.objects V0.12x plot

By chance, is there a way to rotate the xticks in the graphic below (just to make it a bit more readable)? The usual sns.xticks() doesn't work in the new seaborn.objects interface (which is amazing!)
tcap.\
assign(date_time2 = tcap['date_time'].dt.date).\
groupby(['date_time2', 'person']).\
agg(counts = ('person', 'count')).\
reset_index().\
pipe(so.Plot, x = "date_time2", y = "counts", color = "person").\
add(so.Line(marker="o", edgecolor="w")).\
label(x = "Date", y = "# of messages",
color = str.capitalize,
title = "Plot 2: Volume of messages by person, by day").\
scale(color=so.Nominal(order=["lorne_a_20014", "kayla_princess94"])).\
show()
In addition, my x-axis is categorical and this warning appears:
Using categorical units to plot a list of strings that are all parsable as floats or dates. If these strings should be plotted as numbers, cast to the appropriate data type before plotting.
I tried using:
import warnings
warnings.filterwarnings("ignore",category=UserWarning)
This can be done by creating a matplotlib Axes object, setting the tick label rotation on it, and then using the so.Plot().on() method to draw the plot on those Axes. Note that this will not work if you also plan to add facets (I found your question while coming to ask how to combine this with facets).
import seaborn.objects as so
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'a':[1,2,3],
                   'b':[4,5,6]})
fig, ax = plt.subplots()
ax.xaxis.set_tick_params(rotation=90)
(so.Plot(df, x = 'a', y = 'b')
.add(so.Line())
.on(ax))
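To check that the rotation really is stored on the Axes before handing it to so.Plot(...).on(ax), you can inspect the tick labels directly. The sketch below uses plain matplotlib with made-up category labels; tick_params(axis="x", rotation=...) is an alternative spelling of the set_tick_params call above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to view interactively
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(["Mon", "Tue", "Wed"], [4, 5, 6], marker="o")

# Rotate the x tick labels on the Axes itself.
ax.tick_params(axis="x", rotation=45)

fig.canvas.draw()  # make sure the ticks and labels have been created
rotations = [label.get_rotation() for label in ax.get_xticklabels()]
print(rotations)
```

Every entry in rotations should equal the angle that was set, which confirms the setting lives on the Axes rather than on the plotting call.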

How to align heights and widths subplot axes with gridspec and matplotlib?

I am trying to use matplotlib with gridspec to create a subplot such that the axes are arranged to look similar to the figure below; the figure was taken from this unrelated question.
My attempt at recreating this axes arrangement is below. Specifically, my problem is that the axes are not properly aligned. For example, the axes for the blue histogram are taller than the axes for the image with various shades of green; the orange histogram seems to align properly in terms of width, but I attribute this to luck. How can I properly align these axes? Unlike the original figure, I would like to add extra empty space between the axes so that their borders do not intersect; the slice notation in the code below does this by adding a blank row/column. (In the interest of not making this post longer than it has to be, I did not make the figures "pretty" by playing with axis ticks and the like.)
Unlike the original picture, the axes are not perfectly aligned. Is there a way to do this without using constrained layout? By this, I mean some derivative of fig, ax = plt.subplots(constrained_layout=True)?
The MWE code to recreate my figure is below; note that there was no difference between ax.imshow(...) and ax.matshow(...).
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
## initialize figure and axes
fig = plt.figure()
gs = fig.add_gridspec(6, 6, hspace=0.2, wspace=0.2)
ax_bottom = fig.add_subplot(gs[4:, 2:])
ax_left = fig.add_subplot(gs[:4, :2])
ax_big = fig.add_subplot(gs[:4, 2:])
## generate data
x = np.random.normal(loc=50, scale=10, size=100)
y = np.random.normal(loc=500, scale=50, size=100)
## get singular histograms
x_counts, x_edges = np.histogram(x, bins=np.arange(0, 101, 5))
y_counts, y_edges = np.histogram(y, bins=np.arange(0, 1001, 25))
x_mids = (x_edges[1:] + x_edges[:-1]) / 2
y_mids = (y_edges[1:] + y_edges[:-1]) / 2
## get meshed histogram
sample = np.array([x, y]).T
xy_counts, xy_edges = np.histogramdd(sample, bins=(x_edges, y_edges))
## subplot histogram of x
ax_bottom.bar(x_mids, x_counts,
width=np.diff(x_edges),
color='darkorange')
ax_bottom.set_xlim([x_edges[0], x_edges[-1]])
ax_bottom.set_ylim([0, np.max(x_counts)])
## subplot histogram of y
ax_left.bar(y_mids, y_counts,
width=np.diff(y_edges),
color='steelblue')
ax_left.set_xlim([y_edges[0], y_edges[-1]])
ax_left.set_ylim([0, np.max(y_counts)])
## subplot histogram of xy-mesh
ax_big.imshow(xy_counts,
cmap='Greens',
norm=Normalize(vmin=np.min(xy_counts), vmax=np.max(xy_counts)),
interpolation='nearest',
origin='upper')
plt.show()
plt.close(fig)
EDIT:
One can initialize the axes by explicitly setting width_ratios and height_ratios per row/column; this is shown below. This doesn't affect the output, but maybe I'm using it incorrectly?
## initialize figure and axes
from matplotlib import gridspec
fig = plt.figure()
gs = gridspec.GridSpec(ncols=6, nrows=6, figure=fig, width_ratios=[1]*6, height_ratios=[1]*6)
ax_bottom = fig.add_subplot(gs[4:, 2:])
ax_left = fig.add_subplot(gs[:4, :2])
ax_big = fig.add_subplot(gs[:4, 2:])
The problem is with imshow, which resizes the axes automatically to maintain a square pixel aspect.
You can prevent this by calling:
ax_big.imshow(..., aspect='auto')
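The effect can be measured by comparing the drawn height of the image axes with that of its neighbour in the same grid rows. This is a minimal sketch with random data of the same shape as xy_counts in the question (20 x 40); panel_heights is a hypothetical helper name, and the bottom axes are left out for brevity.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to view interactively
import matplotlib.pyplot as plt
import numpy as np

def panel_heights(aspect):
    """Return drawn heights (figure fraction) of the image axes and its neighbour."""
    fig = plt.figure()
    gs = fig.add_gridspec(6, 6, hspace=0.2, wspace=0.2)
    ax_left = fig.add_subplot(gs[:4, :2])
    ax_big = fig.add_subplot(gs[:4, 2:])
    # 20 rows x 40 columns, like xy_counts in the question
    ax_big.imshow(np.random.rand(20, 40), aspect=aspect)
    fig.canvas.draw()  # the aspect adjustment is applied when drawing
    heights = (ax_big.get_position().height, ax_left.get_position().height)
    plt.close(fig)
    return heights

equal_big, equal_left = panel_heights("equal")  # what imshow does by default
auto_big, auto_left = panel_heights("auto")
print(equal_big, equal_left)
print(auto_big, auto_left)
```

With "equal" the image axes come out shorter than the neighbouring axes; with "auto" the two heights match, at the cost of non-square pixels.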

How do you adjust the spacing on a barchart and its xtick labels?

I am trying to create a barchart (overlaid on a line graph with days as the x axis instead of quarters) where the labels are end-of-quarter days. That is all fine, and generates nicely, but I am trying to set the labels so that they are lined up with the right edge of the plot and the corresponding bar's right-side is aligned with the x-tick.
A reproducible example (with just the bar chart, not the line) is:
import matplotlib.pyplot as pyplot
import pandas
import random
random.seed(2020)
dates = pandas.date_range("2016-12-31", "2017-12-31")
bar = pandas.DataFrame([.02, .01, -0.01, .05], index = ["2017-03-31", "2017-06-30", "2017-09-30", "2017-12-31"], columns = ["test"])
line = pandas.DataFrame([random.random() for r in range(len(dates))], index = dates, columns = ["test"])
fig, ax = pyplot.subplots(1, 1, figsize = (7, 3))
ax2 = fig.add_subplot(111, frame_on = False)
bar.plot(kind = "bar", ax = ax, width = 1)
line.plot(kind = "line", ax = ax2)
ax2.set_xticks([])
ax.yaxis.tick_right()
ax.yaxis.set_label_position("right")
fig.tight_layout()
pyplot.show()
Which yields a plot as:
My goal is to have the right side of the 2017-12-31 column aligned with the right edge of the plot and the 2017-12-31 label at the right side as well. Further, the left side of the 2017-03-31 bar should touch the left side of the plot. For the remaining bars, I would like them evenly spaced with all labels aligned with the right side of each bar, and no space in between bars. Like this example below:
Frankly, I'm at a loss. I've tried adding ha="right" to no avail, and I've tried just shifting the graphs, but that leaves me with other problems and doesn't really address the issue. Even with the bars shifted, I'm still fairly constrained in moving the tick labels and haven't found anything online that remotely addresses the problem.
Would it be better to create the bar chart so that it has the same index as the line chart, then set the x tick labels to be the desired dates?
Does anyone have any guidance? I've spent too much time on this problem today and it's driving me nuts.
To plot the bar chart without margins on the x axis, you can use the autoscale function as below.
To move the tick labels, you can add an offset to their transform. Below I used an offset of 0.7 inches (the translation is scaled by fig.dpi_scale_trans), but you can choose a value that suits the other sizes in your chart.
import matplotlib.pyplot as pyplot
import pandas
import matplotlib.transforms as tr
df = pandas.DataFrame([.02, .01, -0.01, .05], index = ["2017-03-31", "2017-06-30", "2017-09-30", "2017-12-31"], columns = ["test"])
fig, ax = pyplot.subplots(1, 1, figsize = (7, 3))
df.plot(kind = "bar", ax = ax, width = 1)
pyplot.autoscale(enable=True, axis='x', tight=True) # tight x limits, no margins
# shift each tick label 0.7 inches to the right
for tick in ax.get_xticklabels():
    tick.set_transform(tick.get_transform()+tr.ScaledTranslation(0.7, 0, fig.dpi_scale_trans))
pyplot.show()
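A different route, not the one used above, is to draw the bars with matplotlib directly and let each bar end at its tick: ax.bar with align="edge" and a negative width places the bar to the left of its x value. The sketch below assumes made-up integer positions 1 to 4 for the four quarter ends, so no offset has to be tuned by hand.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to view interactively
import matplotlib.pyplot as plt

labels = ["2017-03-31", "2017-06-30", "2017-09-30", "2017-12-31"]
values = [0.02, 0.01, -0.01, 0.05]
positions = [1, 2, 3, 4]  # hypothetical tick positions, one per quarter end

fig, ax = plt.subplots(figsize=(7, 3))
# align="edge" with a negative width puts each bar to the left of its x value,
# so the bar's right edge coincides with the tick.
bars = ax.bar(positions, values, width=-1, align="edge")

ax.set_xlim(0, 4)  # first bar touches the left edge, last bar the right edge
ax.set_xticks(positions)
ax.set_xticklabels(labels, ha="right")  # label text ends at the tick

right_edges = [bar.get_bbox().xmax for bar in bars]
print(right_edges)
```

The right edges come out at the tick positions themselves, and ha="right" makes each label end under the right side of its bar.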
The result looks like this.

Matplotlib bar plot with table formatting

I have added a table to the bottom of my plot, but there are a number of issues with it:
The right has too much padding.
The left has too little padding.
The bottom has no padding.
The cells are too small for the text within them.
The table is too close to the bottom of the plot.
The cells belonging to the row names are not colored to match those of the bars.
I'm going out of my mind fiddling with this. Can someone help me fix these issues?
Here is the code (Python 3):
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import matplotlib
# Set styles
plt.style.use(['seaborn-paper', 'seaborn-whitegrid'])
plt.style.use(['seaborn'])
sns.set(palette='colorblind')
matplotlib.rc("font", family="Times New Roman", size=12)
labels = ['n=1','n=2','n=3','n=4','n=5']
a = [98.8,98.8,98.8,98.8,98.8]
b = [98.6,97.8,97.0,96.2,95.4]
bar_width = 0.20
data = [a,b]
print(data)
colors = plt.cm.BuPu(np.linspace(0, 0.5, len(labels)))
columns = ('n=1', 'n=2', 'n=3', 'n=4', 'n=5')
index = np.arange(len(labels))
plt.bar(index, a, bar_width)
plt.bar(index+bar_width+.02, b, bar_width)
plt.table(cellText=data,
rowLabels=['a', 'b'],
rowColours=colors,
colLabels=columns,
loc='bottom')
plt.subplots_adjust(bottom=0.7)
plt.ylabel('Some y label which effect the bottom padding!')
plt.xticks([])
plt.title('Some title')
plt.show()
This is the output:
Update
This is working now, but in case someone else is having issues: Make sure you are not viewing your plots and the changes you make to them with IntelliJ SciView as it does not represent changes accurately and introduces some formatting issues!
I think you can fix the first problem by setting the bounding box when you make the table using bbox like this:
bbox=[0, 0.225, 1, 0.2]
where the parameters are [left, bottom, width, height].
For the second issue (the coloring): the color array you pass to the table does not correspond to the colors seaborn uses for the bars. You can query the seaborn color palette with
sns.color_palette(palette='colorblind')
this will give you a list of the colors seaborn is using.
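If you would rather not depend on which palette is active, another option is to read the colours back from the bars themselves. The sketch below uses plain matplotlib and the data from the question; the bbox values are illustrative only, and row_colours is a name made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to view interactively
import matplotlib.pyplot as plt
import numpy as np

a = [98.8, 98.8, 98.8, 98.8, 98.8]
b = [98.6, 97.8, 97.0, 96.2, 95.4]
columns = ["n=1", "n=2", "n=3", "n=4", "n=5"]
index = np.arange(len(columns))
bar_width = 0.2

fig, ax = plt.subplots()
bars_a = ax.bar(index, a, bar_width)
bars_b = ax.bar(index + bar_width + 0.02, b, bar_width)

# Read the face colour of the first bar in each series, so the table
# follows whatever palette or style is active.
row_colours = [bars_a[0].get_facecolor(), bars_b[0].get_facecolor()]

table = ax.table(cellText=[a, b],
                 rowLabels=["a", "b"],
                 rowColours=row_colours,
                 colLabels=columns,
                 loc="bottom",
                 bbox=[0, -0.3, 1, 0.2])  # [left, bottom, width, height] in axes coordinates
ax.set_xticks([])
fig.subplots_adjust(bottom=0.3)
```

The row label cells then always match the bars, even if the style or palette is changed later.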
Check the modifications below:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import matplotlib
# Set styles
plt.style.use(['seaborn-paper', 'seaborn-whitegrid'])
plt.style.use(['seaborn'])
sns.set(palette='colorblind')
matplotlib.rc("font", family="Times New Roman", size=12)
labels = ['n=1','n=2','n=3','n=4','n=5']
a = [98.8,98.8,98.8,98.8,98.8]
b = [98.6,97.8,97.0,96.2,95.4]
bar_width = 0.20
data = [a,b]
colors = sns.color_palette(palette='colorblind')
columns = ('n=1', 'n=2', 'n=3', 'n=4', 'n=5')
index = np.arange(len(labels))
fig = plt.figure(figsize=(12,9))
plt.bar(index, a, bar_width)
plt.bar(index+bar_width+.02, b, bar_width)
plt.table(cellText=data,
rowLabels=[' a ', ' b '],
rowColours=colors,
colLabels=columns,
loc='bottom',
bbox=[0, 0.225, 1, 0.2])
fig.subplots_adjust(bottom=0.1)
plt.ylabel('Some y label which effect the bottom padding!')
plt.xticks([])
plt.title('Some title')
plt.show()
I also changed the subplot adjustment to fig.subplots_adjust(bottom=0.1) because it wasn't coming out right otherwise. Here is the output:

Seaborn barplot with two y-axis

Considering the following pandas DataFrame:
  labels  values_a  values_b  values_x  values_y
0  date1         1         3       150       170
1  date2         2         6       200       180
It is easy to plot this with Seaborn (see example code below). However, due to the big difference between values_a/values_b and values_x/values_y, the bars for values_a and values_b are not easily visible (the dataset given above is just a sample; in my real dataset the difference is even bigger). Therefore, I would like to use two y-axes, i.e., one y-axis for values_a/values_b and one for values_x/values_y. I tried to use plt.twinx() to get a second axis, but unfortunately the plot shows only two bars, for values_x and values_y, even though there are now two y-axes with the right scaling. :) Do you have an idea how to fix that and get four bars for each label, where the values_a/values_b bars relate to the left y-axis and the values_x/values_y bars relate to the right y-axis?
Thanks in advance!
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
columns = ["labels", "values_a", "values_b", "values_x", "values_y"]
test_data = pd.DataFrame.from_records([("date1", 1, 3, 150, 170),\
("date2", 2, 6, 200, 180)],\
columns=columns)
# working example but with unreadable values_a and values_b
test_data_melted = pd.melt(test_data, id_vars=columns[0],\
var_name="source", value_name="value_numbers")
g = sns.barplot(x=columns[0], y="value_numbers", hue="source",\
data=test_data_melted)
plt.show()
# values_a and values_b are not displayed
values1_melted = pd.melt(test_data, id_vars=columns[0],\
value_vars=["values_a", "values_b"],\
var_name="source1", value_name="value_numbers1")
values2_melted = pd.melt(test_data, id_vars=columns[0],\
value_vars=["values_x", "values_y"],\
var_name="source2", value_name="value_numbers2")
g1 = sns.barplot(x=columns[0], y="value_numbers1", hue="source1",\
data=values1_melted)
ax2 = plt.twinx()
g2 = sns.barplot(x=columns[0], y="value_numbers2", hue="source2",\
data=values2_melted, ax=ax2)
plt.show()
This is probably best suited for multiple sub-plots, but if you are truly set on a single plot, you can scale the data before plotting, create another axis and then modify the tick values.
Sample Data
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
columns = ["labels", "values_a", "values_b", "values_x", "values_y"]
test_data = pd.DataFrame.from_records([("date1", 1, 3, 150, 170),\
("date2", 2, 6, 200, 180)],\
columns=columns)
test_data_melted = pd.melt(test_data, id_vars=columns[0],\
var_name="source", value_name="value_numbers")
Code:
# Scale the data, just a simple example of how you might determine the scaling
mask = test_data_melted.source.isin(['values_a', 'values_b'])
scale = int(test_data_melted[~mask].value_numbers.mean()
            /test_data_melted[mask].value_numbers.mean())
test_data_melted.loc[mask, 'value_numbers'] = test_data_melted.loc[mask, 'value_numbers']*scale
# Plot
fig, ax1 = plt.subplots()
g = sns.barplot(x=columns[0], y="value_numbers", hue="source",\
data=test_data_melted, ax=ax1)
# Create a second y-axis with the scaled ticks
ax1.set_ylabel('X and Y')
ax2 = ax1.twinx()
# Put ticks at the same positions and limits as ax1, then relabel them in unscaled units
ax2.set_yticks(ax1.get_yticks())
ax2.set_ylim(ax1.get_ylim())
ax2.set_yticklabels(np.round(ax1.get_yticks()/scale,1))
ax2.set_ylabel('A and B')
plt.show()
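As a variation on the manual relabelling above, matplotlib's Axes.secondary_yaxis can derive the right-hand axis from a pair of conversion functions, so the ticks stay consistent if the limits change. This is a sketch with plain matplotlib bars instead of sns.barplot, using the sample data and the same scaling heuristic; the dictionaries small and large are names invented for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove to view interactively
import matplotlib.pyplot as plt
import numpy as np

labels = ["date1", "date2"]
small = {"values_a": [1, 2], "values_b": [3, 6]}
large = {"values_x": [150, 200], "values_y": [170, 180]}

# Same heuristic as above: ratio of the two group means, truncated to an int.
scale = int(np.mean(list(large.values())) / np.mean(list(small.values())))

fig, ax = plt.subplots()
x = np.arange(len(labels))
bar_width = 0.2
# Small series are multiplied by `scale` so they share the left axis.
series = {**{k: np.array(v) * scale for k, v in small.items()},
          **{k: np.array(v) for k, v in large.items()}}
for i, (name, heights) in enumerate(series.items()):
    ax.bar(x + i * bar_width, heights, bar_width, label=name)

ax.set_xticks(x + 1.5 * bar_width)
ax.set_xticklabels(labels)
ax.set_ylabel("X and Y")
ax.legend()

# The right-hand axis shows the left-hand values divided by `scale`.
secax = ax.secondary_yaxis("right",
                           functions=(lambda v: v / scale, lambda v: v * scale))
secax.set_ylabel("A and B")
fig.canvas.draw()  # the secondary limits are derived from the parent at draw time
```

The bars for values_a and values_b are still drawn in scaled units; only the right-hand tick labels translate them back.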
