Masking annotations in seaborn heatmap - python-3.x

I would like to make a heatmap that has annotations only in specific cells. I thought one way to do this would be to make a heatmap with annotations in all cells and then overlay another heatmap that has no annotations but is masked in the regions where I want the original annotations to be visible:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
par_corr_p = np.array([[1, 2], [3, 4]])
masked_array = np.ma.array(par_corr_p, mask=par_corr_p<2)
fig, ax = plt.subplots()
sns.heatmap(par_corr_p, ax=ax, cmap ='RdBu_r', annot = par_corr_p, center=0, vmin=-5, vmax=5)
sns.heatmap(par_corr_p, mask = masked_array.mask, ax=ax, cmap ='RdBu_r', center=0, vmin=-5, vmax=5)
However, this is not working - the second heatmap is not covering up the first one:
Please advise

I tried a few things, including using numpy.nan or "" in the annot array. Unfortunately they don't work.
This is probably the easiest way. It involves grabbing the texts of the axes, which should only be the labels in annot which sns.heatmap puts there.
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
par_corr_p = np.array([[1, 2], [3, 4]])
data = par_corr_p
show_annot_array = data >= 2
fig, ax = plt.subplots()
sns.heatmap(
    ax=ax,
    data=data,
    annot=data,
    cmap='RdBu_r', center=0, vmin=-5, vmax=5
)
# ax.texts holds one annotation per cell, in row-major order
for text, show_annot in zip(ax.texts, (element for row in show_annot_array for element in row)):
    text.set_visible(show_annot)
plt.show()
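An alternative that avoids touching ax.texts afterwards is to draw the heatmap without annotations and add text only to the cells that should be labelled. This is a minimal sketch, assuming heatmap cell (row, col) is centred at x = col + 0.5, y = row + 0.5 in data coordinates; show_annot_array is reused from the code above, and the loop is only an illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

data = np.array([[1, 2], [3, 4]])
show_annot_array = data >= 2  # True where a label should appear

fig, ax = plt.subplots()
# Draw the colours only; annot is left at its default (off)
sns.heatmap(data, ax=ax, cmap='RdBu_r', center=0, vmin=-5, vmax=5)

# Assumption: cell (row, col) is centred at x = col + 0.5, y = row + 0.5
for (row, col), value in np.ndenumerate(data):
    if show_annot_array[row, col]:
        ax.text(col + 0.5, row + 0.5, str(value), ha='center', va='center')
```

Call plt.show() afterwards to display the figure. Only the three cells with values of 2 or more get a label.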

Related

Color Matplotlib Histogram Subplots by a Categorical Variable

I am trying to create histogram subplots whose values I want to color by a second, categorical variable.
A small subset of the data is below
import pandas as pd
data = {'ift': [0.031967, 0.067416, 0.091275, 0.046852, 0.100406],
        'ine': [0.078384, 0.09554, 0.234695, 0.182821, 0.190237],
        'ift_out': [1, 1, 0, 1, 0],
        'ine_out': [1, 1, 0, 0, 1]}
xyz = pd.DataFrame(data)
xyz
My initial stab at it is also below. A bit stumped on the inclusion of the categorical columns as colors
fig, axs = plt.subplots(nrows=2, ncols=1, sharey=True, tight_layout=True)
axs[0].hist(xyz['ift']) # color = xyz['ift_out']
axs[1].hist(xyz['ine']) # color = xyz['ine_out']
plt.show()
Sample output is attached below
Following @JohanC's answer, I made some changes to my original code as shown below, and that worked the way I wanted:
import matplotlib.pyplot as plt
import seaborn as sns
sns.color_palette("tab10")
sns.set(style="darkgrid")
fig, axs = plt.subplots(nrows=1, ncols=2, tight_layout=True)
g = sns.histplot(data=xyz, x='ift',
                 hue='ift_out', palette=['skyblue','tomato'], multiple='stack', ax=axs[0])
g = sns.histplot(data=xyz, x='ine',
                 hue='ine_out', palette=['skyblue','tomato'], multiple='stack', ax=axs[1])
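For anyone who wants to stay with plain Matplotlib, as in the original attempt, roughly the same stacked result can be sketched by splitting each value column by its category column and passing the pieces to ax.hist with stacked=True. This is only a sketch: the labels and the choice to loop over column pairs are my own, not part of the answer above.

```python
import pandas as pd
import matplotlib.pyplot as plt

xyz = pd.DataFrame({
    'ift': [0.031967, 0.067416, 0.091275, 0.046852, 0.100406],
    'ine': [0.078384, 0.09554, 0.234695, 0.182821, 0.190237],
    'ift_out': [1, 1, 0, 1, 0],
    'ine_out': [1, 1, 0, 0, 1],
})

fig, axs = plt.subplots(nrows=1, ncols=2, tight_layout=True)
stacked_counts = []
for ax, (value_col, cat_col) in zip(axs, [('ift', 'ift_out'), ('ine', 'ine_out')]):
    # One array of values per category, in sorted category order
    categories = sorted(xyz[cat_col].unique())
    groups = [xyz.loc[xyz[cat_col] == cat, value_col].to_numpy() for cat in categories]
    counts, bins, patches = ax.hist(groups, stacked=True,
                                    color=['skyblue', 'tomato'],
                                    label=[f'{cat_col} = {cat}' for cat in categories])
    stacked_counts.append(counts)
    ax.set_xlabel(value_col)
    ax.legend()
```

The colour list has to contain one entry per category, so it would need extending if a column had more than two distinct values.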

How to plot the output of k-means clustering of word embedding using python?

I have used gensim's word embeddings to find a vector for each word. Then I used K-means to find clusters of words. There are close to 10,000 tokens/words and I want to plot them.
I want to plot the result in the following way:
Annotate points with name of words
Different color for clusters
Here is what I have done.
tsne = TSNE(perplexity=40, n_components=2, init='pca', n_iter=500)#, random_state=13)
def tsne_plot(data):
    "Creates a TSNE model and plots it"
    data=data.sample(n = 500).reset_index()
    word=data["word"]
    cluster=data["clusters"]
    data=data.drop(["clusters","word"],axis=1)
    X = tsne.fit_transform(data)
    plt.figure(figsize=(48, 48))
    for i in range(len(X)):
        plt.scatter(X[:,0][i],X[:,1][i],c=cluster[i])
        plt.annotate(word[i],
                     xy=(X[:,0][i],X[:,1][i]),
                     xytext=(3, 2),
                     textcoords='offset points',
                     ha='right',
                     va='bottom')
    plt.show()
tsne_plot(data)
Although it annotates the words, it fails to color the different groups/clusters.
Is there any other approach that annotates the points with the words and colors the clusters differently?
This is how it's typically done; with annotations and rainbow colors.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# %matplotlib inline
from sklearn.cluster import KMeans
import seaborn as sns
X = np.array([[5,3],
              [10,15],
              [15,12],
              [24,10],
              [30,45],
              [85,70],
              [71,80],
              [60,78],
              [55,52],
              [80,91],])
kmeans = KMeans(n_clusters=2)
kmeans.fit(X)
print(kmeans.cluster_centers_)
print(kmeans.labels_)
#plt.scatter(X[:,0],X[:,1], c=kmeans.labels_, cmap='rainbow')
data = X
labels = kmeans.labels_
#######################################################################
plt.subplots_adjust(bottom = 0.1)
plt.scatter(data[:, 0], data[:, 1], c=kmeans.labels_, cmap='rainbow')
for label, x, y in zip(labels, data[:, 0], data[:, 1]):
    plt.annotate(
        label,
        xy=(x, y), xytext=(-20, 20),
        textcoords='offset points', ha='right', va='bottom',
        bbox=dict(boxstyle='round,pad=0.5', fc='red', alpha=0.5),
        arrowprops=dict(arrowstyle = '->', connectionstyle='arc3,rad=0'))
plt.show()
#######################################################################
See the link below for all details.
https://stackabuse.com/k-means-clustering-with-scikit-learn/
See the link below for some samples of how to do annotations with characters, rather than numbers.
https://nikkimarinsek.com/blog/7-ways-to-label-a-cluster-plot-python
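To label points with words rather than cluster numbers, the same pattern can be sketched as follows. The coordinates, words, and cluster ids here are hypothetical stand-ins for the t-SNE output and K-means labels from the question. The likely reason the per-point loop in the question shows a single color is that each scatter call receives only one cluster id, so there is no range of values for the colormap to spread over; a single scatter call with all ids avoids that.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical 2-D coordinates (e.g. t-SNE output), words, and cluster ids
points = np.array([[5, 3], [10, 15], [15, 12], [85, 70], [71, 80], [60, 78]])
words = ['cat', 'dog', 'fish', 'car', 'bus', 'train']
clusters = np.array([0, 0, 0, 1, 1, 1])

fig, ax = plt.subplots()
# A single scatter call sees every cluster id at once, so the colormap
# can spread the ids over its full range
sc = ax.scatter(points[:, 0], points[:, 1], c=clusters, cmap='rainbow')

# One annotation per point, offset slightly from the marker
for word, (x, y) in zip(words, points):
    ax.annotate(word, xy=(x, y), xytext=(3, 2),
                textcoords='offset points', ha='right', va='bottom')
```

With around 10,000 words the labels will overlap heavily, so plotting a sample, as the question already does, is probably still necessary.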

How to show horizontal lines at tips of error bar plot using matplotlib?

I can generate an error-bar plot using the code below. The graph produced by the code shows vertical lines that represent the errors in y. I would like to have horizontal lines at the tips of these errors ("error bars") and am not sure how to do so.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(1, 10, 10, dtype=int)
y = 2**x
yerr = np.sqrt(y)*10
fig, ax = plt.subplots()
ax.errorbar(x, y, yerr, solid_capstyle='projecting')
ax.grid(alpha=0.5, linestyle=':')
plt.show()
plt.close(fig)
The code generates the figure below. I've played with the solid_capstyle kwarg. Is there a specific kwarg that does what I am trying to do?
And as an example of what I'd like, the figure below:
In case it's relevant, I am using matplotlib 2.2.2
The argument you are looking for is capsize= in ax.errorbar(). The default is None so the length of the cap will default to the value of matplotlib.rcParams["errorbar.capsize"]. The number you give will be the length of the cap in points:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(1, 10, 10, dtype=int)
y = 2**x
yerr = np.sqrt(y)*10
fig, ax = plt.subplots()
ax.errorbar(x, y, yerr, solid_capstyle='projecting', capsize=5)
ax.grid(alpha=0.5, linestyle=':')
plt.show()
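Both routes mentioned above, passing capsize directly and changing the rcParams default it falls back to, can be compared in a short sketch. The variable names and the two-panel layout are my own; ax.errorbar returns a container that unpacks into the data line, the cap lines, and the bar lines.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(1, 10, 10, dtype=int)
y = 2**x
yerr = np.sqrt(y) * 10

fig, (ax_arg, ax_rc) = plt.subplots(1, 2)

# Option 1: pass capsize explicitly (cap length in points)
data_line, caplines_arg, barlines = ax_arg.errorbar(x, y, yerr, capsize=5)

# Option 2: leave capsize unset and change the default it falls back to
with plt.rc_context({'errorbar.capsize': 5}):
    _, caplines_rc, _ = ax_rc.errorbar(x, y, yerr)
```

In both panels the container holds two cap lines, one for the lower tips and one for the upper tips.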

How can I add a normal distribution curve to multiple histograms?

With the following code I create four histograms:
import numpy as np
import pandas as pd
data = pd.DataFrame(np.random.normal((1, 2, 3 , 4), size=(100, 4)))
data.hist(bins=10)
I want the histograms to look like this:
I know how to make it one graph at a time, see here
But how can I do it for multiple histograms without specifying each single one? Ideally I could use 'pd.scatter_matrix'.
Plot each histogram separately and fit each one as in the example you linked, or take a look at the hist API example here. Essentially, what should be done is:
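import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import norm
# Sample data from the question; substitute your own columns
data = pd.DataFrame(np.random.normal((1, 2, 3, 4), size=(100, 4)))
fig = plt.figure()
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
ax3 = fig.add_subplot(223)
ax4 = fig.add_subplot(224)
for ax, col in zip([ax1, ax2, ax3, ax4], data.columns):
    values = data[col]
    mu, sigma = values.mean(), values.std()
    # density=True replaces the removed normed=1 argument
    n, bins, patches = ax.hist(values, 50, density=True, facecolor='green', alpha=0.75)
    bincenters = 0.5*(bins[1:]+bins[:-1])
    # scipy.stats.norm.pdf replaces mlab.normpdf, which was removed from Matplotlib
    y = norm.pdf(bincenters, mu, sigma)
    l = ax.plot(bincenters, y, 'r--', linewidth=1)
plt.show()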
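To avoid creating each subplot by hand, one option is to let DataFrame.hist build the grid and then draw a fitted curve on each returned Axes. This is a rough sketch, assuming the subplots are filled in column order; the normal density is written out with NumPy so that nothing beyond pandas and Matplotlib is needed.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame(np.random.normal((1, 2, 3, 4), size=(100, 4)))

# DataFrame.hist draws one subplot per column and returns the Axes array;
# density=True is forwarded to Matplotlib so the bars integrate to 1
axes = data.hist(bins=10, density=True)

# Assumption: subplots are filled in column order
for ax, col in zip(axes.ravel(), data.columns):
    mu, sigma = data[col].mean(), data[col].std()
    xs = np.linspace(*ax.get_xlim(), 200)
    # Normal probability density written out with NumPy
    pdf = np.exp(-0.5 * ((xs - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    ax.plot(xs, pdf, 'r--', linewidth=1)
```

Each subplot keeps the column name as its title, which makes it easy to check that a curve landed on the matching histogram.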

MatPlotLib + GeoPandas: Plot Multiple Layers, Control Figsize

Given the shape file available here: I can produce the basic map that I need, with county labels and even some points on the map (see below). The issue I'm having is that I cannot seem to control the size of the figure with figsize.
Here's what I have:
import geopandas as gpd
import matplotlib.pyplot as plt
%matplotlib inline
figsize=5,5
fig = plt.figure(figsize=(figsize),dpi=300)
shpfile=r'Y:\HQ\TH\Groups\NR\PSPD\Input\US_Counties\cb_2015_us_county_20m.shp'
c=gpd.read_file(shpfile)
c=c.loc[c['GEOID'].isin(['26161','26093','26049','26091','26075','26125','26163','26099','26115','26065'])]
c['coords'] = c['geometry'].apply(lambda x: x.representative_point().coords[:])
c['coords'] = [coords[0] for coords in c['coords']]
ax=c.plot()
#Control some attributes regarding the axis (for the plot above)
ax.spines['top'].set_visible(False);ax.spines['bottom'].set_visible(False);ax.spines['left'].set_visible(False);ax.spines['right'].set_visible(False)
ax.tick_params(axis='y',which='both',left=False,right=False,color='none',labelcolor='none')
ax.tick_params(axis='x',which='both',top=False,bottom=False,color='none',labelcolor='none')
for idx, row in c.iterrows():
    ax.annotate(row['NAME'], xy=row['coords'],
                horizontalalignment='center')
lat2=[42.5,42.3]
lon2=[-84,-83.5]
#Add another plot...
ax.plot(lon2,lat2,alpha=1,marker='o',linestyle='none',markeredgecolor='none',markersize=15,color='white')
plt.show()
As you can see, I opted to call the plots by the axis name because I need to control attributes of the axis, such as tick_params. I'm not sure if there is a better approach. This seems like a "no-brainer" but I can't seem to figure out why I can't control the figure size.
Thanks in advance!
I just had to do the following:
1. Use fig, ax = plt.subplots(1, 1, figsize = (figsize))
2. Use the ax=ax argument in c.plot()
import geopandas as gpd
import matplotlib.pyplot as plt
%matplotlib inline
figsize=5,5
#fig = plt.figure(figsize=(figsize),dpi=300)
#ax = fig.add_subplot(111)
fig, ax = plt.subplots(1, 1, figsize = (figsize))
shpfile=r'Y:\HQ\TH\Groups\NR\PSPD\Input\US_Counties\cb_2015_us_county_20m.shp'
c=gpd.read_file(shpfile)
c=c.loc[c['GEOID'].isin(['26161','26093','26049','26091','26075','26125','26163','26099','26115','26065'])]
c['coords'] = c['geometry'].apply(lambda x: x.representative_point().coords[:])
c['coords'] = [coords[0] for coords in c['coords']]
c.plot(ax=ax)
ax.spines['top'].set_visible(False);ax.spines['bottom'].set_visible(False);ax.spines['left'].set_visible(False);ax.spines['right'].set_visible(False)
ax.tick_params(axis='y',which='both',left=False,right=False,color='none',labelcolor='none')
ax.tick_params(axis='x',which='both',top=False,bottom=False,color='none',labelcolor='none')
for idx, row in c.iterrows():
    ax.annotate(row['NAME'], xy=row['coords'],
                horizontalalignment='center')
lat2=[42.5,42.3]
lon2=[-84,-83.5]
ax.plot(lon2,lat2,alpha=1,marker='o',linestyle='none',markeredgecolor='none',markersize=15,color='white')
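The reason this works can be checked without the shapefile. Here is a small sketch using two made-up squares in place of the counties (the geometries and names are hypothetical): the Axes returned by c.plot(ax=ax) is the one that belongs to the sized figure.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Hypothetical stand-in for the county shapefile: two adjacent squares
c = gpd.GeoDataFrame(
    {'NAME': ['West', 'East']},
    geometry=[box(-84.5, 42.0, -84.0, 42.5), box(-84.0, 42.0, -83.5, 42.5)],
)

figsize = (5, 5)
fig, ax = plt.subplots(1, 1, figsize=figsize)

# Passing ax=ax makes GeoPandas draw into the sized figure; without it,
# c.plot() opens a new figure at the default size
returned_ax = c.plot(ax=ax)

# Label each shape at a point guaranteed to lie inside it
for name, point in zip(c['NAME'], c.geometry.representative_point()):
    ax.annotate(name, xy=(point.x, point.y), horizontalalignment='center')
```

If no existing Axes is needed, c.plot also accepts a figsize argument directly.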
