Set edgecolor on seaborn jointplot - python-3.x

I am able to set edgecolors for a seaborn histogram by passing in a hist_kws argument:
sns.distplot(ad_data["Age"], kde = False, bins = 35, hist_kws = {"ec":"black"})
However, I'm unable to similarly set edgecolors for the histograms in a seaborn jointplot. It doesn't accept a hist_kws argument or any similar argument for setting edgecolors, and I can't find anything in the documentation that addresses this. Any help would be appreciated.
For reference, I'm using seaborn 0.9 and matplotlib 3.1.

You need a 'hist_kws' inside the 'marginal_kws':
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
x = np.random.normal(np.repeat([2, 8, 7, 10], 1000), 1)
y = np.random.normal(np.repeat([7, 2, 9, 4], 1000), 1)
g = sns.jointplot(x=x, y=y, color='purple', alpha=0.1,
                  marginal_kws={'color': 'tomato', 'hist_kws': {'edgecolor': 'black'}})
plt.show()
In seaborn 0.9, jointplot passes marginal_kws to distplot, which in turn passes hist_kws to matplotlib's hist.
Similarly, you can also set the parameters of a kde for the distplot:
g = sns.jointplot(x=x, y=y, kind='hex', color='indigo',
                  marginal_kws={'color': 'purple', 'kde': True,
                                'kde_kws': {'color': 'crimson', 'lw': 1},
                                'hist_kws': {'ec': 'black', 'lw': 2}})

Related

How to plot histogram subplots for each group

When I run the following code, I get 4 different histograms separated by groups. How can I achieve the same type of visualization with 4 different sns.distplot() also separated by their groups?
df = pd.DataFrame({
    "group": [1, 1, 2, 2, 3, 3, 4, 4],
    "similarity": [0.1, 0.2, 0.35, 0.6, 0.7, 0.25, 0.15, 0.55]
})
df['similarity'].hist(by=df['group'])
seaborn is a high-level api for matplotlib, and pandas uses matplotlib as the default plotting backend.
sns.distplot has been deprecated since seaborn v0.11, and, as per the warning in the documentation, using FacetGrid directly is not recommended.
sns.distplot is replaced by the axes-level function sns.histplot, and the figure-level function sns.displot.
Also see seaborn histplot and displot output doesn't match
It is easy to produce a plot, but not necessarily to produce the correct plot, unless you are aware of the different parameter defaults for each api.
Note the difference between common_bins=True and common_bins=False.
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2
common_bins=False
import seaborn as sns
# plot
g = sns.displot(data=df, x='similarity', col='group', col_wrap=2, common_bins=False, height=4)
common_bins=True (4)
sns.displot, and pandas.DataFrame.plot with kind='hist' and bins=4 produce the same plot.
g = sns.displot(data=df, x='similarity', col='group', col_wrap=2, common_bins=True, bins=4, height=4)
# reshape the dataframe to a wide format
dfp = df.pivot(columns='group', values='similarity')
axes = dfp.plot(kind='hist', subplots=True, layout=(2, 2), figsize=(9, 9), ec='k', bins=4, sharey=True)
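To see why the two common_bins settings give different pictures, it may help to look at the bin edges themselves. The sketch below uses plain NumPy on the question's data; it illustrates the idea of shared versus per-group bin edges and is not seaborn's internal code:

```python
import numpy as np

# Data from the question, keyed by group
groups = {
    1: [0.1, 0.2],
    2: [0.35, 0.6],
    3: [0.7, 0.25],
    4: [0.15, 0.55],
}

# Shared edges: computed once over all values (the idea behind common_bins=True)
all_values = np.concatenate(list(groups.values()))
shared_edges = np.histogram_bin_edges(all_values, bins=4)

# Per-group edges: computed from each group's own range (common_bins=False)
own_edges = {g: np.histogram_bin_edges(v, bins=4) for g, v in groups.items()}

print("shared:", shared_edges)
for g, edges in own_edges.items():
    print(f"group {g}:", edges)
```

With shared edges every facet spans 0.1 to 0.7, so bars are comparable across groups; with per-group edges each facet only spans that group's own minimum and maximum.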
You can use FacetGrid from seaborn:
import seaborn as sns
g = sns.FacetGrid(data=df, col='group', col_wrap=2)
g.map(sns.histplot, 'similarity')

Insert a png image in a matplotlib figure

I'm trying to insert a png image in matplotlib figure (ref)
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.figure import Figure
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
ax = plt.subplot(111)
ax.plot(
    [1, 2, 3], [1, 2, 3],
    'go-',
    label='line 1',
    linewidth=2
)
arr_img = plt.imread("stinkbug.png")
im = OffsetImage(arr_img)
ab = AnnotationBbox(im, (1, 0), xycoords='axes fraction')
ax.add_artist(ab)
plt.show()
I'd like to know how to resize the image that has to be inserted to avoid overlaps.
EDIT:
Saving the figure
ax.figure.savefig("output.svg", transparent=True, dpi=600, bbox_inches="tight")
You can zoom the image and set the box alignment to the lower right corner (1, 0), plus a little extra for the margins:
im = OffsetImage(arr_img, zoom=.45)
ab = AnnotationBbox(im, (1, 0), xycoords='axes fraction', box_alignment=(1.1,-0.1))
You may also want to use data coordinates, which is the default, and use the default box_alignment to the center, e.g. ab = AnnotationBbox(im, (2.6, 1.45)). See the xycoords parameter doc for more information about various coordinate options.
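Putting the zoom and box_alignment pieces together, here is a self-contained sketch. It assumes a synthetic grayscale array as a stand-in for stinkbug.png so that it runs without the image file; the array size and the gray colormap are illustrative choices only:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.offsetbox import OffsetImage, AnnotationBbox

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 2, 3], 'go-', label='line 1', linewidth=2)

# Stand-in for plt.imread("stinkbug.png"): a 100x150 grayscale gradient
arr_img = np.linspace(0, 1, 100 * 150).reshape(100, 150)

# zoom scales the image; values below 1 shrink it
im = OffsetImage(arr_img, zoom=0.45, cmap='gray')

# Anchor at the axes' lower right corner; box_alignment (1, 0) would put the
# box's own lower right corner there, and the extra 0.1 adds a margin
ab = AnnotationBbox(im, (1, 0), xycoords='axes fraction',
                    box_alignment=(1.1, -0.1))
ax.add_artist(ab)
```

Lowering zoom further shrinks the inset; moving box_alignment toward (0, 1) shifts the box down and to the right of the anchor point.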

I am trying to plot a scatter graph, but the result shows only x labels. Why is the y label not displayed?

In my code, if I inspect the pred and test objects, I get these results:
Pred = array([16.88414476, 33.73226078, 75.357018 , 26.79480124, 60.49103328])
test = array([20, 27, 69, 30, 62], dtype=int64)
I apply:
plt.scatter(pred,test)
How can I plot both the pred and test results on the graph?
Please help me find the desired output.
plt.scatter treats its two arguments as the x and y coordinates of each point, so plt.scatter(pred, test) plots test against pred rather than two separate series.
If you want to plot both of them as separate data, use plt.plot:
import matplotlib.pyplot as plt
import numpy as np
Pred = np.array([16.88414476, 33.73226078, 75.357018 , 26.79480124, 60.49103328])
test = np.array([20, 27, 69, 30, 62])
plt.plot(Pred)
plt.plot(test, linestyle='--')
You could also use the pandas plotting functionality:
import pandas as pd
pd.DataFrame({'pred': Pred, 'test': test}).plot()
I don't think a single scatter call can plot two arrays as separate series, but you can do it with plot:
from matplotlib import pyplot as plt
import numpy as np
Pred = np.array([16.88414476, 33.73226078, 75.357018 , 26.79480124, 60.49103328])
test = np.array([20, 27, 69, 30, 62])
plt.plot(Pred, label='Pred Label')
plt.plot(test, label='Test Label')
plt.legend()  # show the labels' names
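If markers are preferred over lines, one option is to call scatter once per array against a shared x-axis. This is a sketch that assumes the sample index is an acceptable x-axis, which the question does not state:

```python
import numpy as np
import matplotlib.pyplot as plt

Pred = np.array([16.88414476, 33.73226078, 75.357018, 26.79480124, 60.49103328])
test = np.array([20, 27, 69, 30, 62])

# Use the sample index as the shared x-axis for both series
idx = np.arange(len(Pred))

fig, ax = plt.subplots()
ax.scatter(idx, Pred, label='Pred', marker='o')
ax.scatter(idx, test, label='test', marker='x')
ax.set_xlabel('sample index')
ax.set_ylabel('value')
ax.legend()
```

Each scatter call adds its own series, so the legend shows one entry for the predictions and one for the test values.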

How to visualize a list of strings on a colorbar in matplotlib

I have a dataset like
x = 3,4,6,77,3
y = 8,5,2,5,5
labels = "null","exit","power","smile","null"
Then I use
from matplotlib import pyplot as plt
plt.scatter(x,y)
colorbar = plt.colorbar(labels)
plt.show()
to make a scatter plot, but cannot make colorbar showing labels as its colors.
How to get this?
I'm not sure it's a good idea to do that for scatter plots in general (you have the same description for different data points, so maybe just use a legend here?), but a specific solution to what you have in mind might be the following:
from matplotlib import pyplot as plt
# Data
x = [3, 4, 6, 77, 3]
y = [8, 5, 2, 5, 5]
labels = ('null', 'exit', 'power', 'smile', 'null')
# Customize colormap and scatter plot
cm = plt.get_cmap('hsv')  # plt.cm.get_cmap was removed in matplotlib 3.9
sc = plt.scatter(x, y, c=range(5), cmap=cm)
cbar = plt.colorbar(sc, ticks=range(5))
cbar.ax.set_yticklabels(labels)
plt.show()
The code combines this Matplotlib demo and this SO answer.
Hope that helps!
EDIT: Incorporating the comments, I can only think of some kind of label color dictionary, generating a custom colormap from the colors, and before plotting explicitly grabbing the proper color indices from the labels.
Here's the updated code (I added some additional colors and data points to check scalability):
from matplotlib import pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import numpy as np
# Color information; create custom colormap
label_color_dict = {'null': '#FF0000',
                    'exit': '#00FF00',
                    'power': '#0000FF',
                    'smile': '#FF00FF',
                    'addon': '#AAAAAA',
                    'addon2': '#444444'}
all_labels = list(label_color_dict.keys())
all_colors = list(label_color_dict.values())
n_colors = len(all_colors)
cm = LinearSegmentedColormap.from_list('custom_colormap', all_colors, N=n_colors)
# Data
x = [3, 4, 6, 77, 3, 10, 40]
y = [8, 5, 2, 5, 5, 4, 7]
labels = ('null', 'exit', 'power', 'smile', 'null', 'addon', 'addon2')
# Get indices from color list for given labels
color_idx = [all_colors.index(label_color_dict[label]) for label in labels]
# Customize colorbar and plot
sc = plt.scatter(x, y, c=color_idx, cmap=cm)
c_ticks = np.arange(n_colors) * (n_colors / (n_colors + 1)) + (2 / n_colors)
cbar = plt.colorbar(sc, ticks=c_ticks)
cbar.ax.set_yticklabels(all_labels)
plt.show()
Finding the correct middle point of each color segment is (still) not good, but I'll leave this optimization to you.
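One possible way to get the segment midpoints is sketched below. It assumes the default color normalization, where scatter scales c between its smallest and largest value, and that both index 0 and index n_colors - 1 occur in the data (true for the example above):

```python
import numpy as np

n_colors = 6  # number of entries in label_color_dict above

# With c ranging over [0, n_colors - 1], the colormap splits that range
# into n_colors equal segments
segment_width = (n_colors - 1) / n_colors

# Midpoint of segment i is (i + 0.5) * segment_width
c_ticks = (np.arange(n_colors) + 0.5) * segment_width
print(c_ticks)
```

These values can replace c_ticks in the plt.colorbar(sc, ticks=c_ticks) call. If some labels may be missing from the data, the assumption about the range no longer holds and the ticks would need to be computed from the actual minimum and maximum.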

Smooth curves in Python Plots [duplicate]

I've got the following simple script that plots a graph:
import matplotlib.pyplot as plt
import numpy as np
T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])
plt.plot(T,power)
plt.show()
As it is now, the line goes straight from point to point, which looks OK but could be better in my opinion. What I want is to smooth the line between the points. In Gnuplot I would have plotted with smooth csplines.
Is there an easy way to do this in PyPlot? I've found some tutorials, but they all seem rather complex.
You could use scipy.interpolate.spline to smooth out your data yourself:
from scipy.interpolate import spline
# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300)
power_smooth = spline(T, power, xnew)
plt.plot(xnew,power_smooth)
plt.show()
spline was deprecated in SciPy 0.19.0 and has since been removed; use the BSpline class instead.
Switching from spline to BSpline isn't a straightforward copy/paste and requires a little tweaking:
from scipy.interpolate import make_interp_spline, BSpline
# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300)
spl = make_interp_spline(T, power, k=3) # type: BSpline
power_smooth = spl(xnew)
plt.plot(xnew, power_smooth)
plt.show()
For this example spline works well, but if the function is not inherently smooth and you want a smoothed version, you can also try:
from scipy.ndimage import gaussian_filter1d  # scipy.ndimage.filters is a deprecated namespace
ysmoothed = gaussian_filter1d(y, sigma=2)
plt.plot(x, ysmoothed)
plt.show()
If you increase sigma, you get a smoother function.
Proceed with caution with this one. It modifies the original values and may not be what you want.
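Both points, the effect of sigma and the fact that the filter changes the original values, can be checked numerically. The sketch below uses synthetic noisy data; the roughness helper is a hypothetical measure introduced only for this illustration:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Noisy samples of a sine wave (synthetic data for illustration)
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

def roughness(values):
    """Standard deviation of point-to-point differences."""
    return np.std(np.diff(values))

for sigma in (1, 2, 4):
    ysmoothed = gaussian_filter1d(y, sigma=sigma)
    # Unlike interpolation, the smoothed curve no longer passes
    # through the original samples
    print(sigma, roughness(ysmoothed), np.max(np.abs(ysmoothed - y)))
```

The roughness drops as sigma grows, while the maximum deviation from the original samples stays well above zero, which is the caveat raised in the comment above.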
See the scipy.interpolate documentation for some examples.
The following example demonstrates its use, for linear and cubic spline interpolation:
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import interp1d
# Define x, y, and xnew to resample at.
x = np.linspace(0, 10, num=11, endpoint=True)
y = np.cos(-x**2/9.0)
xnew = np.linspace(0, 10, num=41, endpoint=True)
# Define interpolators.
f_linear = interp1d(x, y)
f_cubic = interp1d(x, y, kind='cubic')
# Plot.
plt.plot(x, y, 'o', label='data')
plt.plot(xnew, f_linear(xnew), '-', label='linear')
plt.plot(xnew, f_cubic(xnew), '--', label='cubic')
plt.legend(loc='best')
plt.show()
Slightly modified for increased readability.
One of the easiest implementations I found was to use the exponential moving average that TensorBoard uses:
from typing import List
def smooth(scalars: List[float], weight: float) -> List[float]:  # Weight between 0 and 1
    last = scalars[0]  # First value in the plot (first timestep)
    smoothed = list()
    for point in scalars:
        smoothed_val = last * weight + (1 - weight) * point  # Calculate smoothed value
        smoothed.append(smoothed_val)  # Save it
        last = smoothed_val  # Anchor the last smoothed value
    return smoothed
ax.plot(x_labels, smooth(train_data, .9), x_labels, train_data)
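To get a feel for what the weight does, here is a small worked example on a step input. The function is restated in condensed form so the snippet runs on its own; the step data is made up for illustration:

```python
from typing import List

def smooth(scalars: List[float], weight: float) -> List[float]:
    """Exponential moving average; a higher weight gives a smoother result."""
    last = scalars[0]
    smoothed = []
    for point in scalars:
        last = last * weight + (1 - weight) * point
        smoothed.append(last)
    return smoothed

# A step from 0 to 10: the smoothed series approaches the new level gradually
step = [0.0, 10.0, 10.0, 10.0]
print(smooth(step, 0.5))  # [0.0, 5.0, 7.5, 8.75]
print(smooth(step, 0.0))  # weight 0 returns the input unchanged
```

With weight 0.5 each output is the average of the previous output and the current point, so the curve lags behind sudden changes; a weight such as 0.9 lags even more.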
I presume you mean curve-fitting and not anti-aliasing from the context of your question. PyPlot doesn't have any built-in support for this, but you can easily implement some basic curve-fitting yourself, like the code seen here, or if you're using GuiQwt it has a curve fitting module. (You could probably also steal the code from SciPy to do this as well).
Here is a simple solution for dates:
from scipy.interpolate import make_interp_spline
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as dates
from datetime import datetime
data = {
    datetime(2016, 9, 26, 0, 0): 26060, datetime(2016, 9, 27, 0, 0): 23243,
    datetime(2016, 9, 28, 0, 0): 22534, datetime(2016, 9, 29, 0, 0): 22841,
    datetime(2016, 9, 30, 0, 0): 22441, datetime(2016, 10, 1, 0, 0): 23248
}
#create data
date_np = np.array(list(data.keys()))
value_np = np.array(list(data.values()))
date_num = dates.date2num(date_np)
# smooth
date_num_smooth = np.linspace(date_num.min(), date_num.max(), 100)
spl = make_interp_spline(date_num, value_np, k=3)
value_np_smooth = spl(date_num_smooth)
# print
plt.plot(date_np, value_np)
plt.plot(dates.num2date(date_num_smooth), value_np_smooth)
plt.show()
It's worth your time looking at seaborn for plotting smoothed lines.
The seaborn lmplot function will plot data and regression model fits.
The following illustrates both polynomial and lowess fits:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])
df = pd.DataFrame(data = {'T': T, 'power': power})
sns.lmplot(x='T', y='power', data=df, ci=None, order=4, truncate=False)
sns.lmplot(x='T', y='power', data=df, ci=None, lowess=True, truncate=False)
The order = 4 polynomial fit is overfitting this toy dataset. I don't show it here but order = 2 and order = 3 gave worse results.
The lowess = True fit is underfitting this tiny dataset but may give better results on larger datasets.
Check the seaborn regression tutorial for more examples.
Another way to go, which slightly modifies the function depending on the parameters you use:
from statsmodels.nonparametric.smoothers_lowess import lowess
def smoothing(x, y):
    lowess_frac = 0.15  # fraction of the data (0 to 1) used for each local estimate, i.e. the smoothing window
    lowess_it = 0  # number of residual-based reweighting iterations
    x_smooth = x
    y_smooth = lowess(y, x, is_sorted=False, frac=lowess_frac, it=lowess_it, return_sorted=False)
    return x_smooth, y_smooth
That was better suited than other answers for my specific application case.
