Why doesn't this plot get bigger as expected? - python-3.x

From a previous question, I learned that plt.figure(figsize = 2 * np.array(plt.rcParams['figure.figsize'])) doubles the plot size. With the code below, I want to plot 4 subplots in a 2x2 grid.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
%config InlineBackend.figure_format = 'svg' # Change the image format to svg for better quality
don = pd.read_csv('https://raw.githubusercontent.com/leanhdung1994/Deep-Learning/main/donclassif.txt.gz', sep=';')
fig, ax = plt.subplots(nrows = 2, ncols = 2)
plt.figure(figsize = 2 * np.array(plt.rcParams['figure.figsize'])) # This is to have bigger plot
for row in ax:
    for col in row:
        kmeans = KMeans(n_clusters = 4)
        kmeans.fit(don)
        y_kmeans = kmeans.predict(don)
        col.scatter(don['V1'], don['V2'], c = y_kmeans, cmap = 'viridis')
        centers = kmeans.cluster_centers_
        col.scatter(centers[:, 0], centers[:, 1], c = 'red', s = 200, alpha = 0.5);
plt.show()
Could you please explain why plt.figure(figsize = 2 * np.array(plt.rcParams['figure.figsize'])) does not work in this case?

I am posting @JohanC's comment as an answer to remove this question from the unanswered list.
It could be written as fig, axes = plt.subplots(nrows=2, ncols=2, figsize=2 * np.array(plt.rcParams['figure.figsize'])). Calling plt.figure without storing the result just creates a new, empty figure. It does not change fig, and because the axes are not created on that new figure, it does not have the desired effect.
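As a minimal sketch of that fix, the snippet below passes figsize directly to plt.subplots, so the enlarged figure is the one that owns the axes. The random points stand in for the donclassif data and the clustering step is left out; both are assumptions made only to keep the example self-contained.

```python
import numpy as np
import matplotlib.pyplot as plt

# Double the default figure size (the default is read from rcParams).
default_size = np.array(plt.rcParams['figure.figsize'])
big_size = 2 * default_size

# figsize goes to the call that creates the axes, so no second figure is made.
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=big_size)

# Placeholder data (assumption): random points instead of the CSV from the question.
rng = np.random.default_rng(0)
points = rng.normal(size=(200, 2))

# axes is a 2x2 array; .flat visits all four subplots in one loop.
for ax in axes.flat:
    ax.scatter(points[:, 0], points[:, 1], c=points[:, 0], cmap='viridis')
```

Calling plt.show() afterwards (or ending the notebook cell) should display a single figure at twice the default size with all four subplots on it.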

Related

Is there a library that will help me fit data easily? I found fitter and i will provide the code but it shows some errors

So, here is my code:
import pandas as pd
import scipy.stats as st
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator
from fitter import Fitter, get_common_distributions
df = pd.read_csv("project3.csv")
bins = [282.33, 594.33, 906.33, 1281.33, 15030.33, 1842.33, 2154.33, 2466.33, 2778.33, 3090.33, 3402.33]
#declaring
facecolor = '#EAEAEA'
color_bars = '#3475D0'
txt_color1 = '#252525'
txt_color2 = '#004C74'
fig, ax = plt.subplots(1, figsize=(16, 6), facecolor=facecolor)
ax.set_facecolor(facecolor)
n, bins, patches = plt.hist(df.City1, color=color_bars, bins=10)
#grid
minor_locator = AutoMinorLocator(2)
plt.gca().xaxis.set_minor_locator(minor_locator)
plt.grid(which='minor', color=facecolor, lw = 0.5)
xticks = [(bins[idx+1] + value)/2 for idx, value in enumerate(bins[:-1])]
xticks_labels = [ "{:.0f}-{:.0f}".format(value, bins[idx+1]) for idx, value in enumerate(bins[:-1])]
plt.xticks(xticks, labels=xticks_labels, c=txt_color1, fontsize=13)
#beautify
ax.tick_params(axis='x', which='both',length=0)
plt.yticks([])
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
for idx, value in enumerate(n):
    if value > 0:
        plt.text(xticks[idx], value+5, int(value), ha='center', fontsize=16, c=txt_color1)
plt.title('Histogram of rainfall in City1\n', loc = 'right', fontsize = 20, c=txt_color1)
plt.xlabel('\nCentimeters of rainfall', c=txt_color2, fontsize=14)
plt.ylabel('Frequency of occurrence', c=txt_color2, fontsize=14)
plt.tight_layout()
#plt.savefig('City1_Raw.png', facecolor=facecolor)
plt.show()
city1 = df['City1'].values
f = Fitter(city1, distributions=get_common_distributions())
f.fit()
fig = f.plot_pdf(names=None, Nbest=4, lw=1, method='sumsquare_error')
plt.show()
print(f.get_best(method = 'sumsquare_error'))
The issue is with the plots it shows. The first histogram it generates is
Next I get another graph with best fitted distributions which is
Then an output statement
{'chi2': {'df': 10.692966790090342, 'loc': 16.690849400411103, 'scale': 118.71595997157786}}
Process finished with exit code 0
I have a couple of questions. Why is chi2, the best-fitting distribution, not plotted on the graph?
How do I plot these distributions on top of the histogram rather than separately? The hist() function in the fitter library can do that, but it doesn't let me control the bins, so I end up with about 100 bins and flat-looking data.
How do I solve this issue? I need to plot the best-fit curve on a histogram that looks like image1. Can I use any other module/package to get the work done in a similar way? This uses a least-squares fit, but I am OK with maximum likelihood or log-likelihood too.
Simple way of plotting things on top of each other (using some properties of the Fitter class)
import scipy.stats as st
import matplotlib.pyplot as plt
from fitter import Fitter, get_common_distributions
from scipy import stats
numberofpoints=50000
df = stats.norm.rvs( loc=1090, scale=500, size=numberofpoints)
fig, ax = plt.subplots(1, figsize=(16, 6))
n, bins, patches = ax.hist( df, bins=30, density=True)
f = Fitter(df, distributions=get_common_distributions())
f.fit()
errorlist = sorted(
    [
        [f._fitted_errors[dist], dist]
        for dist in get_common_distributions()
    ]
)[:4]
for err, dist in errorlist:
    ax.plot( f.x, f.fitted_pdf[dist] )
plt.show()
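This works because the histogram is drawn with density=True, so it is normalized the same way as the fitted PDFs. To overlay the curves on a histogram of raw counts, one would need to rescale the PDFs.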
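The scaling in question can be sketched as follows, under a few assumptions: the bins have equal width, only a normal distribution is fitted, and scipy.stats.norm.fit is used in place of Fitter to keep the example self-contained. The names sample, bin_width and scaled_pdf are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Placeholder data (assumption): a normal sample like the one in the answer above.
rng = np.random.default_rng(0)
sample = rng.normal(loc=1090, scale=500, size=5000)

fig, ax = plt.subplots(1, figsize=(16, 6))
# No density=True here, so the bar heights are raw counts.
counts, bin_edges, patches = ax.hist(sample, bins=30)

# Fit one distribution by maximum likelihood (what scipy's .fit does by default).
loc, scale = stats.norm.fit(sample)

# A PDF integrates to 1, while a count histogram with equal-width bins
# has total area N * bin_width, so that product is the scaling factor.
bin_width = bin_edges[1] - bin_edges[0]
x = np.linspace(bin_edges[0], bin_edges[-1], 400)
scaled_pdf = stats.norm.pdf(x, loc=loc, scale=scale) * len(sample) * bin_width

ax.plot(x, scaled_pdf, 'r-')
```

With this factor the bins can be chosen freely in ax.hist, and the curve follows the bar heights instead of sitting near zero.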

Adjust hspace one-sided for matplotlib subplots

My question is based on this question:
Adjust hspace for some of the subplots
It adjusts the top plot of a number of subplots and increases its hspace. I want to increase the hspace between just two plots within the subplots (in my case, between plot 3 and plot 4 from the top).
Here is my example:
import numpy as np
import matplotlib.pyplot as plt
noise = np.random.rand(300)
gs_top = plt.GridSpec(9, 1, hspace=0.5)
gs_base = plt.GridSpec(9, 1, hspace=0)
fig = plt.figure()
fig.patch.set_facecolor('white')
ax0 = fig.add_subplot(gs_base[0,:])
ax1 = fig.add_subplot(gs_base[1,:])
ax2 = fig.add_subplot(gs_top[2,:])
ax3 = fig.add_subplot(gs_base[3,:])
ax4 = fig.add_subplot(gs_base[4,:])
ax5 = fig.add_subplot(gs_base[5,:])
ax0.plot(noise)
ax1.plot(noise)
ax2.plot(noise)
ax3.plot(noise)
ax4.plot(noise)
ax5.plot(noise)
The example increases the hspace between plots 3 and 4. However, I don't want to increase the space between plots 2 and 3.
How can I adjust the hspace variable only on one side?
I found the answer after trying various search terms on Google: see this Stack Overflow answer.
In short (dirty way):
Add a separate axis and make it invisible.
Example:
import numpy as np
import matplotlib.pyplot as plt
noise = np.random.rand(300)
gs_base = plt.GridSpec(7, 1, hspace=0, height_ratios=[1, 1, 1, 0.8, 1,1,1])
fig = plt.figure()
fig.patch.set_facecolor('white')
ax0 = fig.add_subplot(gs_base[0,:])
ax1 = fig.add_subplot(gs_base[1,:])
ax2 = fig.add_subplot(gs_base[2,:])
ax3 = fig.add_subplot(gs_base[3,:])
ax3.set_visible(False)
ax4 = fig.add_subplot(gs_base[4,:])
ax5 = fig.add_subplot(gs_base[5,:])
ax6 = fig.add_subplot(gs_base[6,:])
ax0.plot(noise)
ax1.plot(noise)
ax2.plot(noise)
ax4.plot(noise)
ax5.plot(noise)
ax6.plot(noise)
In long (correct way):
Couldn't figure it out for the moment.
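One possible cleaner route, offered as a sketch rather than a confirmed answer, is to nest grids: an outer grid with two rows carries the single gap, and each row holds an inner grid with hspace=0. This assumes a matplotlib version that provides Figure.add_gridspec and SubplotSpec.subgridspec; the 3 + 3 split and the value 0.3 are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

noise = np.random.rand(300)

fig = plt.figure()
fig.patch.set_facecolor('white')

# Outer grid: two stacked groups; its hspace is the only gap in the figure.
# hspace is measured as a fraction of one group's height.
outer = fig.add_gridspec(2, 1, hspace=0.3)

# Inner grids: three rows each, with no space between the rows.
top = outer[0].subgridspec(3, 1, hspace=0)
bottom = outer[1].subgridspec(3, 1, hspace=0)

axes = [fig.add_subplot(top[i, 0]) for i in range(3)]
axes += [fig.add_subplot(bottom[i, 0]) for i in range(3)]

for ax in axes:
    ax.plot(noise)
```

Compared with the invisible-axis trick, no dummy axes are needed, and the gap is controlled by one parameter.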

Matplotlib: how to get color bars that are one on top of each other as opposed to side by side?

I have the following code:
import matplotlib.pyplot as plt
import numpy as np
img1 = np.zeros([512,512])
img2 = np.zeros([512,512])
plt.figure(figsize=(10,10))
plt.imshow(img1, cmap='inferno')
plt.axis('off')
cba = plt.colorbar(shrink=0.25)
cba.ax.set_ylabel('Events / counts', fontsize=14)
cba.ax.tick_params(labelsize=12)
plt.imshow(img2, cmap='turbo', alpha=0.5)
plt.axis('off')
cba = plt.colorbar(shrink=0.25)
cba.ax.set_ylabel('Lifetime / ns', fontsize=14)
cba.ax.tick_params(labelsize=12)
plt.tight_layout()
plt.show()
which produces the following output:
My question is, how can I get color bars that are on top of one another as opposed to next to each other? Ideally, I would like to get something like this:
You can grab the position of the ax and use it to create new axes for the colorbars. Here is an example:
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage
data = ndimage.gaussian_filter(np.random.randn(512, 512), sigma=15, mode='nearest') * 20
fig, ax = plt.subplots()
im1 = ax.imshow(data, vmin=-1, vmax=0, cmap='viridis')
data[data < 0] = np.nan
im2 = ax.imshow(data, vmin=0.001, vmax=1, cmap='Reds_r')
ax.axis('off')
pos = ax.get_position()
bar_h = (pos.y1 - pos.y0) * 0.5 # 0.5 joins the two bars, e.g. 0.48 separates them a bit
ax_cbar1 = fig.add_axes([pos.x1 + 0.02, pos.y0, 0.03, bar_h])
cbar1 = fig.colorbar(im1, cax=ax_cbar1, orientation='vertical')
ax_cbar2 = fig.add_axes([pos.x1 + 0.02, pos.y1 - bar_h, 0.03, bar_h])
cbar2 = fig.colorbar(im2, cax=ax_cbar2, orientation='vertical')
plt.show()

How to plot heatmap for high-dimensional dataset?

I would greatly appreciate it if you could let me know how to plot a high-resolution heatmap for a large dataset with approximately 150 features.
My code is as follows:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
XX = pd.read_csv('Financial Distress.csv')
y = np.array(XX['Financial Distress'].values.tolist())
y = np.array([0 if i > -0.50 else 1 for i in y])
XX = XX.iloc[:, 3:87]
df=XX
df["target_var"]=y.tolist()
target_var=["target_var"]
fig, ax = plt.subplots(figsize=(8, 6))
correlation = df.select_dtypes(include=['float64', 'int64']).iloc[:, 1:].corr()
sns.heatmap(correlation, ax=ax, vmax=1, square=True)
plt.xticks(rotation=90)
plt.yticks(rotation=360)
plt.title('Correlation matrix')
plt.tight_layout()
plt.show()
k = df.shape[1] # number of variables for heatmap
fig, ax = plt.subplots(figsize=(9, 9))
corrmat = df.corr()
# Generate a mask for the upper triangle
mask = np.zeros_like(corrmat, dtype=bool)  # built-in bool; np.bool is unavailable in some NumPy versions
mask[np.triu_indices_from(mask)] = True
cols = corrmat.nlargest(k, target_var)[target_var].index
cm = np.corrcoef(df[cols].values.T)
sns.set(font_scale=1.0)
hm = sns.heatmap(cm, mask=mask, cbar=True, annot=True,
                 square=True, fmt='.2f', annot_kws={'size': 7},
                 yticklabels=cols.values,
                 xticklabels=cols.values)
plt.xticks(rotation=90)
plt.yticks(rotation=360)
plt.title('Annotated heatmap matrix')
plt.tight_layout()
plt.show()
It works fine but the plotted heatmap for a dataset with more than 40 features is too small.
Thanks in advance,
Adjusting the figsize and dpi worked for me.
I adapted your code and doubled the size of the heatmap to 165 x 165. The rendering takes a while, but the png looks fine. My backend is "module://ipykernel.pylab.backend_inline".
As noted in my original answer, I'm pretty sure you forgot to close the figure object before creating a new one. Try plt.close("all") before fig, ax = plt.subplots() if you get weird effects.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
print(plt.get_backend())
# close any existing plots
plt.close("all")
df = pd.read_csv("Financial Distress.csv")
# select out the desired columns
df = df.iloc[:, 3:].select_dtypes(include=['float64','int64'])
# copy columns to double size of dataframe
df2 = df.copy()
df2.columns = "c_" + df2.columns
df3 = pd.concat([df, df2], axis=1)
# get the correlation coefficient between the different columns
corr = df3.iloc[:, 1:].corr()
# as_matrix() was removed from pandas; to_numpy(copy=True) returns a writable array
arr_corr = corr.to_numpy(copy=True)
# mask out the top triangle
arr_corr[np.triu_indices_from(arr_corr)] = np.nan
fig, ax = plt.subplots(figsize=(24, 18))
hm = sns.heatmap(arr_corr, cbar=True, vmin=-0.5, vmax=0.5,
                 fmt='.2f', annot_kws={'size': 3}, annot=True,
                 square=True, cmap=plt.cm.Blues)
ticks = np.arange(corr.shape[0]) + 0.5
ax.set_xticks(ticks)
ax.set_xticklabels(corr.columns, rotation=90, fontsize=8)
ax.set_yticks(ticks)
ax.set_yticklabels(corr.index, rotation=360, fontsize=8)
ax.set_title('correlation matrix')
plt.tight_layout()
plt.savefig("corr_matrix_incl_anno_double.png", dpi=300)
full figure:
zoom of top left section:
If I understand your problem correctly, I think all you have to do is increase your figure size:
f, ax = plt.subplots(figsize=(20, 20))
instead of
f, ax = plt.subplots(figsize=(9, 9))
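Rather than hard-coding a size, the figure can be scaled with the number of features. The snippet below is a rough sketch of that idea: the random data, the inches_per_cell value and the use of plain imshow instead of sns.heatmap are all assumptions made to keep it self-contained.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data (assumption): 150 random features instead of the CSV.
rng = np.random.default_rng(0)
n_features = 150
corr = np.corrcoef(rng.normal(size=(500, n_features)), rowvar=False)

# Hypothetical sizing rule: a fixed number of inches per heatmap cell,
# never smaller than the 9x9 figure from the question.
inches_per_cell = 0.2
side = max(9, n_features * inches_per_cell)

fig, ax = plt.subplots(figsize=(side, side))
im = ax.imshow(corr, vmin=-1, vmax=1, cmap='Blues')
fig.colorbar(im, ax=ax)
ax.set_title('Correlation matrix')

# A higher dpi keeps small labels legible in the saved file, for example:
# fig.savefig('corr_matrix.png', dpi=300)
```

The same figsize can be passed to plt.subplots before calling sns.heatmap(..., ax=ax) as in the question.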

How can I add a normal distribution curve to multiple histograms?

With the following code I create four histograms:
import numpy as np
import pandas as pd
data = pd.DataFrame(np.random.normal((1, 2, 3 , 4), size=(100, 4)))
data.hist(bins=10)
I want the histograms to look like this:
I know how to do it for one graph at a time (see here).
But how can I do it for multiple histograms without specifying each single one? Ideally I could use 'pd.scatter_matrix'.
Plot each histogram separately and fit each one as in the example you linked, or take a look at the hist API example here. Essentially, what should be done is
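import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import norm
# sample data from the question stands in for your own data
data = pd.DataFrame(np.random.normal((1, 2, 3, 4), size=(100, 4)))
fig = plt.figure()
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
ax3 = fig.add_subplot(223)
ax4 = fig.add_subplot(224)
for ax, col in zip([ax1, ax2, ax3, ax4], data.columns):
    values = data[col]
    mu, sigma = values.mean(), values.std()
    # density=True replaces normed=1, which newer matplotlib versions no longer accept
    n, bins, patches = ax.hist(values, 50, density=True, facecolor='green', alpha=0.75)
    bincenters = 0.5*(bins[1:]+bins[:-1])
    # scipy.stats.norm.pdf replaces mlab.normpdf, which newer matplotlib versions no longer provide
    y = norm.pdf(bincenters, mu, sigma)
    l = ax.plot(bincenters, y, 'r--', linewidth=1)
plt.show()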
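To avoid creating each subplot by hand, one option is to let data.hist build the grid and then draw on the axes it returns. This is a sketch under some assumptions: the columns are given names (a to d, chosen here for illustration) so each subplot title can be mapped back to its column, and the normal curve uses the sample mean and standard deviation of that column.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import norm

# Sample data as in the question, with named columns (the names are an assumption).
data = pd.DataFrame(np.random.normal((1, 2, 3, 4), size=(100, 4)),
                    columns=['a', 'b', 'c', 'd'])

# DataFrame.hist returns an array of Axes; density=True is forwarded to matplotlib
# so the bars and the PDF share the same scale.
axes = data.hist(bins=10, density=True)

for ax in axes.flat:
    name = ax.get_title()          # pandas titles each subplot with its column name
    if name not in data.columns:   # skip unused grid cells, if any
        continue
    mu, sigma = data[name].mean(), data[name].std()
    x = np.linspace(*ax.get_xlim(), 200)
    ax.plot(x, norm.pdf(x, mu, sigma), 'r--', linewidth=1)
```

Calling plt.show() afterwards displays the four histograms, each with its own dashed normal curve.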
