Add labels to each box in seaborn's factorplot boxplot - python-3.x

I know there are similar answers such as this one, but that one applies to seaborn's boxplot and it's not working for me with seaborn's factorplot. On a simple factorplot:
import seaborn as sns
tips = sns.load_dataset("tips")
means = tips.groupby(["sex","smoker","time"])["tip"].mean().values
means_labels = [str(int(s)) for s in means]
with sns.plotting_context("notebook", font_scale=2):
    g = sns.factorplot(x="sex", y="total_bill", hue="smoker",
                       col="time", data=tips, kind="box", size=6, aspect=.7)
How can one add an annotation (in the example above, the means_labels) below each box, like this:
As I said, I tried using the answer above to at least try to get the position of each box:
import matplotlib.pyplot as plt
ax = plt.gca()
pos = range(len(means))
for tick, label in zip(pos, ax.get_xticklabels()):
    ax.text(pos[tick], means[tick] + 0.5, means_labels[tick],
            horizontalalignment='center', color='r', weight='semibold')
But this produces:
I believe this is because I'm passing the whole plot's axes instead of the "factorplot" axes. But I couldn't find a way to do so (if instead of ax=plt.gca() I use, like in the example, ax=sns.factorplot(...), I get the error: AttributeError: module 'seaborn' has no attribute 'gca').
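One possible way to reach the individual facet axes is through the grid object that the plotting call returns, rather than through plt.gca(). The sketch below is only an illustration under several assumptions: it uses sns.catplot and height (seaborn 0.9 renamed factorplot to catplot and size to height), it replaces the tips dataset with a small synthetic DataFrame so that it runs offline, and it assumes the default box width of 0.8, which with two hue levels puts the boxes at the tick position minus and plus 0.2. The offsets dictionary and the choice to print the mean of total_bill are illustrative, not part of the original question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips data (hypothetical values, illustration only).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sex": np.tile(["Male", "Female"], 100),
    "smoker": np.tile(np.repeat(["Yes", "No"], 2), 50),
    "time": np.tile(np.repeat(["Lunch", "Dinner"], 4), 25),
    "total_bill": rng.uniform(5, 50, 200),
})

# Fix the category orders explicitly so box positions are predictable.
x_order = ["Male", "Female"]
hue_order = ["Yes", "No"]
col_order = ["Lunch", "Dinner"]

g = sns.catplot(data=df, x="sex", y="total_bill", hue="smoker", col="time",
                kind="box", order=x_order, hue_order=hue_order,
                col_order=col_order, height=6, aspect=0.7)

# One mean per (facet, x category, hue level).
means = df.groupby(["time", "sex", "smoker"])["total_bill"].mean()

# Assumption: two hue levels and the default width of 0.8 place the
# boxes at tick - 0.2 and tick + 0.2.
offsets = {"Yes": -0.2, "No": 0.2}

# g.axes is a 2D array of Axes, one per facet; annotate each in turn.
for ax, time in zip(g.axes.flat, col_order):
    y_text = ax.get_ylim()[0]  # bottom edge of this facet
    for i, sex in enumerate(x_order):
        for smoker in hue_order:
            label = f"{means.loc[(time, sex, smoker)]:.1f}"
            ax.text(i + offsets[smoker], y_text, label,
                    ha="center", va="bottom", color="r", weight="semibold")
```

The key point is the loop over g.axes.flat: each facet is an ordinary matplotlib Axes, so ax.text works on it exactly as it does for a plain boxplot.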

Related

How to increase the size of the figure by percentage but keep the original aspect ratio?

I have the following code to draw a figure
import pandas as pd
import urllib3
import seaborn as sns
decathlon = pd.read_csv("https://raw.githubusercontent.com/leanhdung1994/Deep-Learning/main/decathlon.txt", sep='\t')
fig = sns.scatterplot(data=decathlon,
                      x='100m', y='Long.jump',
                      hue='Points', palette='viridis')
sns.regplot(data=decathlon,
            x='100m', y='Long.jump',
            scatter=False)
I read answers to similar questions, and they use plt.figure(figsize=(20,10)). I would like to keep the original aspect ratio (the ratio of width to height) but increase the size of the figure by some percentage so that it looks better.
Could you please elaborate on how to do so?
I forgot to include the line %config InlineBackend.figure_format = 'svg' in the code above. When I add this line, the answer below unfortunately does not work.
First, the object returned by scatterplot() is an Axes, not a figure. scatterplot() draws on the current axes. If there are no current axes, matplotlib automatically creates them in the current figure. If there is no current figure, matplotlib automatically creates a new one.
The size of this figure is determined by the value in rcParams['figure.figsize']. Therefore, you should create a figure that has the same aspect ratio as defined in this variable before calling your plots.
For instance, the code below creates a figure that's 2x the size of the default figure.
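import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
tips = sns.load_dataset('tips')
# Scale the default figure size by 2 so the aspect ratio is unchanged
fig = plt.figure(figsize=2 * np.array(plt.rcParams['figure.figsize']))
ax = sns.scatterplot(data=tips, x="total_bill", y="tip", hue="day")
sns.regplot(data=tips, x="total_bill", y="tip", scatter=False, ax=ax)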
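If the figure already exists, a similar effect can probably be had by rescaling it in place. The sketch below is a minimal illustration using plain matplotlib; the scale value of 1.5 (50% larger) and the toy data are assumptions made for the example. Whether the inline SVG backend mentioned in the question honours the new size is not something this sketch verifies.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

scale = 1.5  # hypothetical factor: 1.5 means 50% larger in each direction

fig, ax = plt.subplots()          # created with rcParams['figure.figsize']
ax.scatter([1, 2, 3], [2, 1, 3])  # toy data, stands in for the seaborn plot

old_size = fig.get_size_inches().copy()   # (width, height) in inches
fig.set_size_inches(scale * old_size)     # same ratio, larger figure
new_size = fig.get_size_inches()
```

Because width and height are multiplied by the same factor, the ratio of width to height stays the same.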

Using "hue" for a Seaborn visual: how to get legend in one graph?

I created a scatter plot in seaborn using seaborn.relplot, but am having trouble putting the legend all in one graph.
When I do this simple way, everything works fine:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
df2 = df[df.ln_amt_000s < 700]
sns.relplot(x='ln_amt_000s', y='hud_med_fm_inc', hue='outcome', size='outcome', legend='brief', ax=ax, data=df2)
The result is a scatter plot as desired, with the legend on the right hand side.
However, when I try to generate a matplotlib figure and axes objects ahead of time to specify the figure dimensions I run into problems:
a4_dims = (10, 10) # generating a matplotlib figure and axes objects ahead of time to specify figure dimensions
df2 = df[df.ln_amt_000s < 700]
fig, ax = plt.subplots(figsize = a4_dims)
sns.relplot(x='ln_amt_000s', y='hud_med_fm_inc', hue='outcome', size='outcome', legend='brief', ax=ax, data=df2)
The result is two graphs -- one that has the scatter plots as expected but missing the legend, and another one below it that is all blank except for the legend on the right hand side.
How do I fix this? My desired result is one graph where I can specify the figure dimensions and have the legend at the bottom in two rows, below the x-axis (if that is too difficult or not supported, the default legend position to the right on the same graph would work too). I know the problem lies with "ax=ax" and with the way I am specifying the dimensions as a matplotlib figure, but I'd like to know specifically why this causes a problem so I can learn from it.
Thank you for your time.
The issue is that sns.relplot is a "Figure-level interface for drawing relational plots onto a FacetGrid" (see the API page). With a simple sns.scatterplot (the default type of plot used by sns.relplot), your code works (changed to use reproducible data):
df = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/datasets/iris.csv", index_col=0)
fig, ax = plt.subplots(figsize = (5,5))
sns.scatterplot(x='Sepal.Length', y='Sepal.Width',
                hue='Species', legend='brief',
                ax=ax, data=df)
plt.show()
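If you would rather stay with sns.relplot, the figure size can be controlled through its own height and aspect parameters instead of a pre-built Axes. The following is a rough sketch with synthetic data; the column names x, y and group are made up for the example. Note that seaborn may widen the figure slightly to make room for the legend it places outside the plot, so only the height is guaranteed to match the value passed in.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (hypothetical column names, illustration only).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=90),
    "y": rng.normal(size=90),
    "group": np.repeat(["a", "b", "c"], 30),
})

# relplot builds its own figure; height is in inches per facet and
# aspect is the width/height ratio of each facet.
g = sns.relplot(data=df, x="x", y="y", hue="group", height=5, aspect=1)

# The underlying matplotlib objects are still reachable from the grid.
ax = g.axes.flat[0]
fig = ax.figure
```

Since relplot returns a FacetGrid rather than an Axes, passing ax=ax to it has no way of redirecting the drawing, which is why a second figure appears in the question.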
Further edits to legend
Seaborn's legends are a bit finicky. Some tweaks you may want to employ:
Remove the default seaborn title, which is actually a legend entry, by getting and slicing the handles and labels
Set a new title that is actually a title
Move the location and make use of bbox_to_anchor to move outside the plot area (note that the bbox parameters need some tweaking depending on your plot size)
Specify the number of columns
fig, ax = plt.subplots(figsize = (5,5))
sns.scatterplot(x='Sepal.Length', y='Sepal.Width',
                hue='Species', legend='brief',
                ax=ax, data=df)
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:], loc=8,
          ncol=2, bbox_to_anchor=[0.5, -.3, 0, 0])
plt.show()

How to make a graph using matplotlib with equally spaced powers of 10? [duplicate]

I want to plot a graph with one logarithmic axis using matplotlib.
I've been reading the docs, but can't figure out the syntax. I know that it's probably something simple like 'scale=linear' in the plot arguments, but I can't seem to get it right
Sample program:
import pylab
import matplotlib.pyplot as plt
a = [pow(10, i) for i in range(10)]
fig = plt.figure()
ax = fig.add_subplot(2, 1, 1)
line, = ax.plot(a, color='blue', lw=2)
pylab.show()
You can use the Axes.set_yscale method. That allows you to change the scale after the Axes object is created. That would also allow you to build a control to let the user pick the scale if you needed to.
The relevant line to add is:
ax.set_yscale('log')
You can use 'linear' to switch back to a linear scale. Here's what your code would look like:
import pylab
import matplotlib.pyplot as plt
a = [pow(10, i) for i in range(10)]
fig = plt.figure()
ax = fig.add_subplot(2, 1, 1)
line, = ax.plot(a, color='blue', lw=2)
ax.set_yscale('log')
pylab.show()
First of all, it's not very tidy to mix pylab and pyplot code. What's more, pyplot style is preferred over using pylab.
Here is a slightly cleaned up code, using only pyplot functions:
from matplotlib import pyplot
a = [ pow(10,i) for i in range(10) ]
pyplot.subplot(2,1,1)
pyplot.plot(a, color='blue', lw=2)
pyplot.yscale('log')
pyplot.show()
The relevant function is pyplot.yscale(). If you use the object-oriented version, replace it by the method Axes.set_yscale(). Remember that you can also change the scale of X axis, using pyplot.xscale() (or Axes.set_xscale()).
Check my question What is the difference between ‘log’ and ‘symlog’? to see a few examples of the graph scales that matplotlib offers.
If you want to change the base of the logarithm, just add:
plt.yscale('log',base=2)
Before Matplotlib 3.3, you had to use basex/basey instead of base to set the base of the logarithm.
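For the object-oriented interface the equivalent call would presumably be Axes.set_yscale with the same keyword. A minimal sketch, assuming Matplotlib 3.3 or later (older releases would need basey=2 instead) and using powers of 2 as toy data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

a = [2 ** i for i in range(10)]  # toy data: powers of 2

fig, ax = plt.subplots()
ax.plot(a, color="blue", lw=2)
ax.set_yscale("log", base=2)  # assumes Matplotlib >= 3.3 (basey=2 before that)
```

With base 2, the major ticks fall on powers of 2, so this data appears as a straight line with evenly spaced ticks.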
You simply need to use semilogy instead of plot:
from pylab import *
import matplotlib.pyplot as pyplot
a = [ pow(10,i) for i in range(10) ]
fig = pyplot.figure()
ax = fig.add_subplot(2,1,1)
line, = ax.semilogy(a, color='blue', lw=2)
show()
I know this is slightly off-topic, but since some comments called ax.set_yscale('log') the "nicest" solution, I thought a rebuttal could be due. I would not recommend using ax.set_yscale('log') for histograms and bar plots. In my version (0.99.1.1) I ran into some rendering problems; I am not sure how general this issue is. However, both bar and hist have an optional argument to set the y-scale to log, which works fine.
references:
http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.bar
http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist
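The optional argument in question is log, which both bar and hist accept. A small sketch of how that might look; the random data and the bar heights are made up for the example, and nothing here reproduces the rendering problem reported for version 0.99.1.1:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=1000)  # toy data with a long tail

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

# log=True asks hist/bar to put the y-axis on a log scale themselves.
ax_hist.hist(data, bins=30, log=True)
ax_bar.bar([0, 1, 2, 3], [1, 10, 100, 1000], log=True)
```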
So if you are simply using the unsophisticated API, like I often am (I use it in ipython a lot), then this is simply
yscale('log')
plot(...)
Hope this helps someone looking for a simple answer! :).

Matplotlib - Axes collision warning when setting aspect ratio

I am using matplotlib to plot a hexbin. As a simple example-
import matplotlib.pyplot as plt
import numpy as np
x = np.random.rand(100)
y = np.random.rand(100)
plt.hexbin(x, y, gridsize = 15, cmap='inferno')
plt.gca().invert_yaxis() # To make top left corner as origin
plt.axes().set_aspect('equal', 'datalim')
plt.show()
I get the following warning-
"MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance."
I think it is due to the line-
plt.axes().set_aspect('equal', 'datalim')
How can I use different arguments in this case? The version of matplotlib is 2.1.1.
It doesn't seem like you want to create a new axes anyways. So don't use plt.axes() here. Instead get the current axes in the usual way (plt.gca()) and use any of its methods.
plt.gca().set_aspect('equal', 'datalim')
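Put together, the example from the question could look roughly like the sketch below. It keeps an explicit reference to the Axes instead of calling plt.gca() repeatedly, which is a stylistic choice rather than a requirement; the seeded random generator is only there to make the example repeatable.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100)
y = rng.random(100)

fig, ax = plt.subplots()                 # one figure, one Axes
ax.hexbin(x, y, gridsize=15, cmap="inferno")
ax.invert_yaxis()                        # make the top-left corner the origin
ax.set_aspect("equal", "datalim")        # reuse the same Axes, no plt.axes()
```

Since every call goes to the same Axes object, no second axes is requested and the warning has nothing to trigger it.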

Take control of Seaborn marginal histograms?

Question 1:
How do I remove excess space in the plot, when plotting marginals? Answered below in first post.
Question 2:
How do I get more fine-grained control over the marginal histogram plots, e.g. to plot a histogram and a KDE together and choose the KDE parameters for the marginals? Answered below in the second post, with JointGrid.
#!/usr/bin/env python3
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
sns.set_palette("viridis")
sns.set(style="white", color_codes=True)
x = np.random.normal(0, 1, 1000)
y = np.random.normal(5, 1, 1000)
df = pd.DataFrame({"x":x, "y":y})
g = sns.jointplot(df["x"],df["y"], bw=0.15, shade=True, xlim=(-3,3), ylim=(2,8),cmap="coolwarm", kind="kde", stat_func=None)
# plt.tight_layout() # This will override seaborn parameters. Remember to exclude.
plt.show()
jointplot has a space parameter that determines the space between the main plot and the marginal plots.
Running this code:
g = sns.jointplot(df["x"], df["y"], bw=0.15, shade=True, xlim=(-3,3),
                  ylim=(2,8), cmap="coolwarm", kind="kde",
                  stat_func=None, space=0)
plt.show()
results in this plot for me:
Please note that running with plt.tight_layout() will overrule the space argument for jointplot.
Edit:
To further specify the parameters of the marginal plot you can use marginal_kws. You must pass a dictionary that specifies the parameters of the kind of marginal plot you use.
In your example you use the kde plot as marginal plots. So I will continue to use that as an example:
Here I show how to change the kernel used to make the marginal plots.
g = sns.jointplot(df["x"], df["y"], bw=0.15, shade=True, xlim=(-3,3),
                  ylim=(2,8), cmap="coolwarm", kind="kde",
                  stat_func=None, space=0, marginal_kws={'kernel': 'epa'})
plt.show()
resulting in this graph:
You can pass any parameter accepted by the kde plot as a key in the dictionary, with the desired setting for that parameter as the corresponding value.
Okay, so I'm going to go ahead and post an extra answer myself. It's not entirely apparent to me which parameters the extra marginal_kws can control. Instead, it might be more intuitive to build the plot layer-by-layer (especially coming from ggplot) using JointGrid:
g = sns.JointGrid(x="x", y="y", data=df) # Initiate multi-plot
g.plot_joint(sns.kdeplot) # Plot the center x/y plot as sns.kdeplot
g.plot_marginals(sns.distplot, kde=True) # Plot the edges as sns.distplot (histogram), where kde can be set to True
