How to plot histogram subplots for each group

When I run the following code, I get 4 different histograms separated by groups. How can I achieve the same type of visualization with 4 different sns.distplot() also separated by their groups?
import pandas as pd
df = pd.DataFrame({
    "group": [1, 1, 2, 2, 3, 3, 4, 4],
    "similarity": [0.1, 0.2, 0.35, 0.6, 0.7, 0.25, 0.15, 0.55]
})
df['similarity'].hist(by=df['group'])

seaborn is a high-level API for matplotlib, and pandas uses matplotlib as its default plotting backend.
sns.distplot is deprecated as of seaborn v0.11.0 and, as per the warning in the documentation, using FacetGrid directly is not recommended.
sns.distplot is replaced by the axes-level function sns.histplot and the figure-level function sns.displot.
Also see "seaborn histplot and displot output doesn't match".
It is easy to produce a plot, but not necessarily the correct plot, unless you are aware of the different parameter defaults of each API.
Note the difference between common_bins=True and common_bins=False.
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2
common_bins=False
import seaborn as sns
# plot
g = sns.displot(data=df, x='similarity', col='group', col_wrap=2, common_bins=False, height=4)
common_bins=True (4)
sns.displot, and pandas.DataFrame.plot with kind='hist' and bins=4 produce the same plot.
g = sns.displot(data=df, x='similarity', col='group', col_wrap=2, common_bins=True, bins=4, height=4)
# reshape the dataframe to a wide format
dfp = df.pivot(columns='group', values='similarity')
axes = dfp.plot(kind='hist', subplots=True, layout=(2, 2), figsize=(9, 9), ec='k', bins=4, sharey=True)

You can use FacetGrid from seaborn:
import seaborn as sns
g = sns.FacetGrid(data=df, col='group', col_wrap=2)
g.map(sns.histplot, 'similarity')

Related

Set edgecolor on seaborn jointplot

I am able to set edgecolors for a seaborn histogram by passing in a hist_kws argument:
sns.distplot(ad_data["Age"], kde = False, bins = 35, hist_kws = {"ec":"black"})
However, I'm unable to similarly set edgecolors for the histograms in a seaborn jointplot. It doesn't accept a hist_kws argument or any other similar argument to set edgecolors. I'm unable to find anything in the document that addresses this. Any help would be appreciated.
For reference, I'm using seaborn 0.9 and matplotlib 3.1.
You need a 'hist_kws' inside the 'marginal_kws':
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
x = np.random.normal(np.repeat([2, 8, 7, 10], 1000), 1)
y = np.random.normal(np.repeat([7, 2, 9, 4], 1000), 1)
g = sns.jointplot(x=x, y=y, color='purple', alpha=0.1,
                  marginal_kws={'color': 'tomato', 'hist_kws': {'edgecolor': 'black'}})
plt.show()
In this case, jointplot sends the marginal_kws to distplot, which in turn sends the hist_kws to matplotlib's hist.
Similarly, you can also set the parameters of a kde for the distplot:
g = sns.jointplot(x=x, y=y, kind='hex', color='indigo',
                  marginal_kws={'color': 'purple', 'kde': True,
                                'kde_kws': {'color': 'crimson', 'lw': 1},
                                'hist_kws': {'ec': 'black', 'lw': 2}})

How to visualize a list of strings on a colorbar in matplotlib

I have a dataset like
x = 3,4,6,77,3
y = 8,5,2,5,5
labels = "null","exit","power","smile","null"
Then I use
from matplotlib import pyplot as plt
plt.scatter(x,y)
colorbar = plt.colorbar(labels)
plt.show()
to make a scatter plot, but I cannot get the colorbar to show the labels as its colors.
How can I do this?
I'm not sure it's a good idea to do this for scatter plots in general (the same label applies to several data points, so a legend may be a better fit), but a specific solution to what you have in mind might be the following:
from matplotlib import pyplot as plt
# Data
x = [3, 4, 6, 77, 3]
y = [8, 5, 2, 5, 5]
labels = ('null', 'exit', 'power', 'smile', 'null')
# Customize colormap and scatter plot
cm = plt.get_cmap('hsv')  # plt.cm.get_cmap is no longer available in recent matplotlib
sc = plt.scatter(x, y, c=range(5), cmap=cm)
cbar = plt.colorbar(sc, ticks=range(5))
cbar.ax.set_yticklabels(labels)
plt.show()
This produces a scatter plot whose colorbar ticks show the labels.
The code combines this Matplotlib demo and this SO answer.
Hope that helps!
EDIT: Incorporating the comments, I can only think of some kind of label color dictionary, generating a custom colormap from the colors, and before plotting explicitly grabbing the proper color indices from the labels.
Here's the updated code (I added some additional colors and data points to check scalability):
from matplotlib import pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import numpy as np
# Color information; create custom colormap
label_color_dict = {'null': '#FF0000',
                    'exit': '#00FF00',
                    'power': '#0000FF',
                    'smile': '#FF00FF',
                    'addon': '#AAAAAA',
                    'addon2': '#444444'}
all_labels = list(label_color_dict.keys())
all_colors = list(label_color_dict.values())
n_colors = len(all_colors)
cm = LinearSegmentedColormap.from_list('custom_colormap', all_colors, N=n_colors)
# Data
x = [3, 4, 6, 77, 3, 10, 40]
y = [8, 5, 2, 5, 5, 4, 7]
labels = ('null', 'exit', 'power', 'smile', 'null', 'addon', 'addon2')
# Get indices from color list for given labels
color_idx = [all_colors.index(label_color_dict[label]) for label in labels]
# Customize colorbar and plot
sc = plt.scatter(x, y, c=color_idx, cmap=cm)
c_ticks = np.arange(n_colors) * (n_colors / (n_colors + 1)) + (2 / n_colors)
cbar = plt.colorbar(sc, ticks=c_ticks)
cbar.ax.set_yticklabels(all_labels)
plt.show()
The tick positions above do not yet land exactly on the midpoint of each color segment, but I'll leave this optimization to you.
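One possible way to get the ticks onto the segment midpoints is to swap the approach slightly: use a ListedColormap together with a BoundaryNorm whose boundaries sit halfway between the integer indices, so that each integer is the center of its own color band. This is a sketch of that alternative, not the method used in the code above.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, BoundaryNorm

label_color_dict = {'null': '#FF0000', 'exit': '#00FF00', 'power': '#0000FF',
                    'smile': '#FF00FF', 'addon': '#AAAAAA', 'addon2': '#444444'}
all_labels = list(label_color_dict)
n_colors = len(all_labels)

cmap = ListedColormap(list(label_color_dict.values()))
# boundaries at -0.5, 0.5, ..., n_colors - 0.5: index i is the middle of band i
norm = BoundaryNorm(np.arange(-0.5, n_colors), ncolors=n_colors)

x = [3, 4, 6, 77, 3, 10, 40]
y = [8, 5, 2, 5, 5, 4, 7]
labels = ('null', 'exit', 'power', 'smile', 'null', 'addon', 'addon2')
color_idx = [all_labels.index(label) for label in labels]

fig, ax = plt.subplots()
sc = ax.scatter(x, y, c=color_idx, cmap=cmap, norm=norm)
cbar = fig.colorbar(sc, ax=ax, ticks=np.arange(n_colors))
cbar.ax.set_yticklabels(all_labels)
# plt.show()  # uncomment in an interactive session
```

Because the norm is fixed by the boundaries, the colorbar always spans all six labels, even if some of them do not occur in the data.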

create density plots of continuous field by categorical field

I have the code below which overlays a density curve on a histogram. It does this for the ‘Fresh’ field in my data, which is a continuous field. I would like to create similar plots filtering by the unique values in the ‘Channel’ field. For example in pandas to create histograms similar to what I'm trying to accomplish I would use:
data_df.hist(column='Fresh', by='Channel')
Can anyone suggest how to do something similar for the seaborn code below?
code:
import seaborn as sns
sns.distplot(data_df['Fresh'], hist=True, kde=True,
             bins=int(data_df.shape[0]/5), color='darkblue',
             hist_kws={'edgecolor': 'black'},
             kde_kws={'linewidth': 4})
data
   Channel  Fresh
0        2  12669
1        2   7057
2        2   6353
3        1  13265
4        2  22615
5        2   9413
6        2  12126
7        2   7579
8        1   5963
9        2   6006
I think the Seaborn way is to create a FacetGrid, and then to map an axis-level plotting function onto it. In your case:
g = sns.FacetGrid(data_df, col='Channel', margin_titles=True)
g.map(sns.distplot,
      'Fresh',
      bins=int(data_df.shape[0]/5),
      color='darkblue',
      hist_kws={'edgecolor': 'black'},
      kde_kws={'linewidth': 4});
Check out the docs for more: https://seaborn.pydata.org/tutorial/axis_grids.html
Alternatively, you can group your DataFrame by Channel and then plot each group in its own subplot:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data_df = pd.DataFrame({'Channel': [2, 2, 2, 1, 2, 2, 2, 2, 1, 2],
                        'Fresh': [12669, 7057, 6353, 13265, 22615,
                                  9413, 12126, 7579, 5963, 6006]})
df1 = data_df.groupby('Channel')
fig, axes = plt.subplots(nrows=1, ncols=len(df1), figsize=(10, 3))
for ax, df in zip(axes.flatten(), df1.groups):
    sns.distplot(df1.get_group(df)['Fresh'], hist=True, kde=True,
                 bins=int(data_df.shape[0]/5), color='darkblue',
                 hist_kws={'edgecolor': 'black'},
                 kde_kws={'linewidth': 4}, ax=ax)
plt.tight_layout()

Masking annotations in seaborn heatmap

I would like to make a heatmap that has annotations only in specific cells. I thought one way to do this would be to make a heatmap with annotations in all cells and then overlay another heatmap that has no annotations but is masked in the regions where I want the original annotations to remain visible:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
par_corr_p = np.array([[1, 2], [3, 4]])
masked_array = np.ma.array(par_corr_p, mask=par_corr_p<2)
fig, ax = plt.subplots()
sns.heatmap(par_corr_p, ax=ax, cmap ='RdBu_r', annot = par_corr_p, center=0, vmin=-5, vmax=5)
sns.heatmap(par_corr_p, mask = masked_array.mask, ax=ax, cmap ='RdBu_r', center=0, vmin=-5, vmax=5)
However, this is not working - the second heatmap is not covering up the first one:
Please advise
I tried a few things, including using numpy.nan or "" in the annot array. Unfortunately they don't work.
This is probably the easiest way. It involves grabbing the texts of the axes, which should only be the labels in annot which sns.heatmap puts there.
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
par_corr_p = np.array([[1, 2], [3, 4]])
data = par_corr_p
show_annot_array = data >= 2
fig, ax = plt.subplots()
sns.heatmap(
    ax=ax,
    data=data,
    annot=data,
    cmap='RdBu_r', center=0, vmin=-5, vmax=5
)
for text, show_annot in zip(ax.texts, (element for row in show_annot_array for element in row)):
    text.set_visible(show_annot)
plt.show()

Second y-axis and overlapping labeling?

I am using Python for a simple time-series analysis of calorie intake. I am plotting the time series and the rolling mean/std over time. It looks like this:
Here is how I do it:
## packages & libraries
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
from pandas import Series, DataFrame  # Panel was removed in pandas 1.0
## import data and set time series structure
data = pd.read_csv('time_series_calories.csv', parse_dates={'dates': ['year','month','day']}, index_col=0)
## check ts for stationarity
from statsmodels.tsa.stattools import adfuller
def test_stationarity(timeseries):
    # Determine rolling statistics
    # (pd.rolling_mean / pd.rolling_std in older pandas versions)
    rolmean = timeseries.rolling(window=14).mean()
    rolstd = timeseries.rolling(window=14).std()
    # Plot rolling statistics
    orig = plt.plot(timeseries, color='blue', label='Original')
    mean = plt.plot(rolmean, color='red', label='Rolling Mean')
    std = plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show()
The plot doesn't look good, since the rolling std distorts the scale of variation and the x-axis labelling is screwed up. I have two questions: (1) How can I plot the rolling std on a second y-axis? (2) How can I fix the overlapping x-axis labels?
EDIT
With your help I managed to get the following:
But how do I get the legend sorted out?
1) Making a second (twin) axis can be done with ax2 = ax1.twinx(), see here for an example. Is this what you needed?
2) I believe there are several old answers to this question, i.e. here, here and here. According to the links provided, the easiest way is probably to use either plt.xticks(rotation=70) or plt.setp( ax.xaxis.get_majorticklabels(), rotation=70 ) or fig.autofmt_xdate().
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
plt.xticks(rotation=70) # Either this
ax.set_xticks([1, 2, 3, 4, 5])
ax.set_xticklabels(['aaaaaaaaaaaaaaaa','bbbbbbbbbbbbbbbbbb','cccccccccccccccccc','ddddddddddddddddddd','eeeeeeeeeeeeeeeeee'])
# fig.autofmt_xdate() # or this
# plt.setp( ax.xaxis.get_majorticklabels(), rotation=70 ) # or this works
fig.tight_layout()
plt.show()
Answer to Edit
One way to combine lines from different axes into a single legend is to create some fake plots on the axis that should hold the legend:
ax1.plot(something, 'r--')  # one plot into ax1
ax2.plot(something_else, 'gx')  # another into ax2
# create two empty plots into ax1
ax1.plot([], [], 'r--', label='Line 1 from ax1')  # empty fake-plot with same lines/markers as first line you want to put in legend
ax1.plot([], [], 'gx', label='Line 2 from ax2')  # empty fake-plot as line 2
ax1.legend()
In my silly example it is probably better to label the original plot in ax1, but I hope you get the idea. The important thing is to create the "legend-plots" with the same line and marker settings as the original plots. Note that the fake-plots will not be plotted since there is no data to plot.
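Putting both parts together, here is a runnable sketch with the rolling std on a twin axis and a single legend. Instead of empty fake plots, this variant merges the handles and labels of both axes with get_legend_handles_labels(), which is a different technique with the same result. The series is synthetic (random values around 2000 on 120 daily dates), since the original CSV is not available; those numbers are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# synthetic stand-in for the calorie data (assumption, not the real file)
rng = np.random.default_rng(0)
dates = pd.date_range('2016-01-01', periods=120, freq='D')
ts = pd.Series(2000 + rng.normal(0, 150, len(dates)), index=dates)

rolmean = ts.rolling(window=14).mean()
rolstd = ts.rolling(window=14).std()

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis

ax1.plot(ts.index, ts.values, color='blue', label='Original')
ax1.plot(rolmean.index, rolmean.values, color='red', label='Rolling Mean')
ax2.plot(rolstd.index, rolstd.values, color='black', label='Rolling Std')

# collect the labelled lines of both axes and draw one legend on ax1
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
ax1.legend(h1 + h2, l1 + l2, loc='best')

ax1.set_title('Rolling Mean & Standard Deviation')
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
# plt.show()  # uncomment in an interactive session
```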
