I need to autoscale the y-axis of my bar graph in matplotlib so that the small differences between values are visible. It has to be autoscaled rather than given a fixed limit because the values change depending on what the user inputs. I've tried yscale log, but that doesn't work for negative values. I've tried symlog, but the graph stays the same. This is my current code:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = range(700, 710, 1)
fig, ax = plt.subplots()
ax.bar(x, y)
plt.show()
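Plots are automatically scaled to the full range of the data passed to the API. For a bar plot that range includes the base of each bar at 0, so a difference of a few units on top of a height of 700 is barely visible.
For a bar plot, the best option for displaying the differences between the bars is probably to set the ylim for vertical bars, or the xlim for horizontal bars.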
negative data
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = range(-700, -750, -5)
fig, ax = plt.subplots(figsize=(7, 5))
ax.bar(x, y)
# pad by one step (5) so the shortest bar (-700) is not clipped away entirely
plt.ylim(min(y) - 5, max(y) + 5)
plt.show()
positive data
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = range(700, 750, 5)
fig, ax = plt.subplots(figsize=(7, 5))
ax.bar(x, y)
# pad by one step (5) so the shortest bar (700) is not clipped away entirely
plt.ylim(min(y) - 5, max(y) + 5)
plt.show()
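Because the limits have to follow whatever the user enters, the min/max logic can be wrapped in a small function that is called after each new set of values is plotted. This is a sketch under assumptions: `set_bar_ylim` and its `pad_frac` parameter are hypothetical names, and padding by 10% of the spread is an arbitrary choice that keeps the shortest bar visible. It only makes sense when all values share the same sign.

```python
import matplotlib.pyplot as plt


def set_bar_ylim(ax, values, pad_frac=0.1):
    """Zoom the y-axis onto the bar tips (hypothetical helper).

    The limits are computed from the data, so they adapt to any user input,
    and are padded by pad_frac of the spread so the shortest bar stays visible.
    """
    lo, hi = min(values), max(values)
    spread = hi - lo
    # Fall back to a small fixed pad when all values are equal
    pad = spread * pad_frac if spread else max(abs(hi) * 0.01, 1.0)
    ax.set_ylim(lo - pad, hi + pad)
    return lo - pad, hi + pad


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = range(700, 710, 1)
fig, ax = plt.subplots()
ax.bar(x, y)
limits = set_bar_ylim(ax, y)
plt.show()
```

For the question's data this sets the y-limits to roughly (699.1, 709.9); for all-negative data such as `range(-700, -750, -5)` the same call zooms onto the downward bar tips.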
mixed data
If the data has a wide range of both positive and negative values, there's probably no good option; as you've noted, symlog doesn't help.
The best option may be to plot the positive and negative data separately.
Boolean masking doesn't work with a plain Python list, so convert the lists to NumPy arrays.
import numpy as np
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [700, -700, 710, -710, 720, -720, 730, -730, 740, -740]
x = np.array(x)
y = np.array(y)
mask = y >= 0  # positive mask
pos_y = y[mask]  # get the positive values
neg_y = y[~mask]  # get the negative values; ~ inverts the boolean mask
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(7, 5))
ax1.bar(x[mask], pos_y)  # also mask x to plot the bar at the correct x-tick
ax1.set_title('Positive Values')
ax1.set_ylim(min(pos_y) - 5, max(pos_y) + 5)  # small pad keeps the shortest bar visible
ax1.set_xticks(range(0, 12))  # buffer the number of x-ticks, so the x-ticks of the two plots align
ax2.bar(x[~mask], neg_y)
ax2.set_title('Negative Values')
ax2.set_ylim(min(neg_y) - 5, max(neg_y) + 5)
ax2.set_xticks(range(0, 12))
plt.tight_layout()  # better spacing between the two plots
plt.show()
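Instead of padding the x-ticks with range(0, 12), the two panels can be linked with sharex=True, which keeps their x-limits identical by construction. A minimal sketch reusing the data above; the pad of 5 is an arbitrary choice:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 11)
y = np.array([700, -700, 710, -710, 720, -720, 730, -730, 740, -740])
mask = y >= 0
pad = 5  # arbitrary margin so the shortest bar in each panel stays visible

# sharex=True links the x-axes, so both panels always show the same x-range
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(7, 5), sharex=True)
ax1.bar(x[mask], y[mask])
ax1.set_ylim(y[mask].min() - pad, y[mask].max() + pad)
ax1.set_title('Positive Values')
ax2.bar(x[~mask], y[~mask])
ax2.set_ylim(y[~mask].min() - pad, y[~mask].max() + pad)
ax2.set_title('Negative Values')
ax2.set_xticks(x)  # shared axes also share their tick locator
plt.tight_layout()
plt.show()
```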
This is what my legend shows now:
I want the legend labels to run from 0 to 7, but I don't want to add a for-loop to my code that corrects each label one by one. My code looks like this:
fig, ax = plt.subplots()
ax.set_title('Clusters by OPTICS in 2D space after PCA')
ax.set_xlabel('First Component')
ax.set_ylabel('Second Component')
points = ax.scatter(
pca_2_spec[:, 0],
pca_2_spec[:, 1],
s=7,
marker='o',
c=pred_pca_2_spec,
cmap='rainbow')
ax.legend(*points.legend_elements(), title='cluster')
plt.show()
Assuming pred_pca_2_spec is a NumPy array whose values are 0, 5, 10, 15, 20, 25, 30 and 35, you can map them onto the range 0-7 by simply dividing each element by 5.
Sample Data:
import numpy as np
from matplotlib import pyplot as plt
np.random.seed(54)
pca_2_spec = np.random.randint(-100, 300, (100, 2))
pred_pca_2_spec = np.random.choice([0, 5, 10, 15, 20, 25, 30, 35], 100)
Plotting Code:
fig, ax = plt.subplots()
ax.set_title('Clusters by OPTICS in 2D space after PCA')
ax.set_xlabel('First Component')
ax.set_ylabel('Second Component')
points = ax.scatter(
pca_2_spec[:, 0],
pca_2_spec[:, 1],
s=7,
marker='o',
c=pred_pca_2_spec / 5, # Divide By 5
cmap='rainbow')
ax.legend(*points.legend_elements(), title='cluster')
plt.show()
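Dividing by 5 only works because the cluster values are evenly spaced multiples of 5. If the labels are arbitrary, np.unique(..., return_inverse=True) renumbers them to 0..k-1 without a loop. A sketch in which the uneven label values are made up for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(54)
pca_2_spec = np.random.randint(-100, 300, (100, 2))
# Hypothetical labels that are NOT evenly spaced, so dividing by a constant fails
pred_pca_2_spec = np.random.choice([0, 3, 10, 11, 40, 41, 90, 200], 100)

# codes[i] is the position of pred_pca_2_spec[i] among the sorted unique values,
# so the k distinct labels become 0, 1, ..., k - 1
unique_vals, codes = np.unique(pred_pca_2_spec, return_inverse=True)

fig, ax = plt.subplots()
points = ax.scatter(pca_2_spec[:, 0], pca_2_spec[:, 1], s=7, marker='o',
                    c=codes, cmap='rainbow')
ax.legend(*points.legend_elements(), title='cluster')
plt.show()
```

unique_vals keeps the original label values, so unique_vals[codes] recovers pred_pca_2_spec if you still need them.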
I have three 1D arrays (A, B, C) of equal length/size. I plot a scatter plot of B vs. A where I color each scatter plot bullet by the corresponding value in the C array (see the code below).
# Imports
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
# Create the Arrays
A = 10 * np.random.random_sample((20, 20))
B = 10 * np.random.random_sample((20, 20))
C = 100 * np.random.random_sample((20, 20))
A = A.reshape(20*20)
B = B.reshape(20*20)
C = C.reshape(20*20)
# Create the Colormap and Define Boundaries
cmap_C = cm.jet
cmap_C.set_bad(color='white')
bounds_C = np.arange(0, 110, 10)
norm_C = mpl.colors.BoundaryNorm(bounds_C, cmap_C.N)
# Plot the Figure
plt.figure()
plt.scatter(A, B, c=C, marker='o', s=100, cmap=cmap_C, norm=norm_C)
plt.xlim([-1, 11])
plt.ylim([-1, 11])
plt.xticks(np.arange(0, 11, 1))
plt.yticks(np.arange(0, 11, 1))
plt.xlabel('A')
plt.ylabel('B')
plt.grid()
plt.colorbar(label='Value of C')
plt.show()
Some bullets overlap in the figure, so we cannot see them clearly. Therefore, I now want to compute and plot the mean C value of all scatter plot bullets within each 1 x 1 integer bin in the figure, so that each grid square is colored by a single color (the bins are the squares outlined by the figure's grid). How can I do this?
It's not totally clear what you are trying to do, but I think there is an analytic answer to your question before you work too hard. The expected mean value of the color (the C vector) is 50, because you have generated a sample uniformly distributed over [0, 100). The coordinates are also uniformly distributed, but that is irrelevant. Of course, there will be some variance in each of the grid squares.
If you need to go forward as an exercise, I'd construct a dictionary of coordinate:color mappings to help set up a screen...
color_map = {(x, y): color for x, y, color in zip(A,B,C)}
Then you could set up a second dictionary to gather the results for each grid square, taking the int() value of each coordinate to put the data into the correct field for that square.
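The dictionary approach described above might look like the following sketch. It assumes 400 uniformly distributed points like the question's (generated here with a seeded default_rng rather than random_sample) and that all coordinates lie in [0, 10), so int() maps each point to its 1 x 1 grid square:

```python
from collections import defaultdict

import numpy as np

rng = np.random.default_rng(0)
A = 10 * rng.random(400)
B = 10 * rng.random(400)
C = 100 * rng.random(400)

# Gather every C value into the grid square given by the integer part of (A, B)
bins = defaultdict(list)
for x, y, c in zip(A, B, C):
    bins[(int(x), int(y))].append(c)

# Mean C per occupied square; squares with no points stay NaN
bin_means = np.full((10, 10), np.nan)  # rows follow B (y), columns follow A (x)
for (ix, iy), values in bins.items():
    bin_means[iy, ix] = np.mean(values)
```

With rows following B, bin_means can be passed straight to plt.pcolor over a 0 to 10 grid, after masking the NaN squares.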
Below is a solution that works for my purposes.
# Imports
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
# Create the Arrays
xx = 5
yy = 5
A = 10 * np.random.random_sample((xx, yy))
B = 10 * np.random.random_sample((xx, yy))
C = 100 * np.random.random_sample((xx, yy))
A = A.reshape(xx*yy)
B = B.reshape(xx*yy)
C = C.reshape(xx*yy)
color_map = {(x, y): color for x, y, color in zip(A,B,C)}
xedges = np.arange(11)
yedges = np.arange(11)
H, xedges, yedges = np.histogram2d(A, B, bins=(xedges, yedges))
HT = H.T
ca = np.asarray(list(color_map))
print(ca)
cai = ca.astype(int)
print(cai)
# Extract all dictionary values (the C values) using a loop over keys()
res = []
for key in color_map.keys():
    res.append(color_map[key])
res = np.asarray(res)
resi = res  # keep the float C values; res.astype(int) would truncate them before averaging
print(resi)
BMC = np.zeros([10, 10])
for i in np.arange(len(resi)):
    BMC[cai[i, 1], cai[i, 0]] = BMC[cai[i, 1], cai[i, 0]] + resi[i]
    print(cai[i])
    print(resi[i])
    print(BMC[cai[i, 1], cai[i, 0]])
print(HT)
print(BMC)
BMC = BMC/HT
print(BMC)
# Create the Colormap and Define Boundaries
cmap_C = cm.jet
cmap_C.set_bad(color='white')
bounds_C = np.arange(-5, 115, 10)
norm_C = mpl.colors.BoundaryNorm(bounds_C, cmap_C.N)
cmap_hist2d = cm.CMRmap_r
cmap_hist2d.set_bad(color='white')
bounds_hist2d = np.arange(-0.5, 4.5, 1)
norm_hist2d = mpl.colors.BoundaryNorm(bounds_hist2d, cmap_hist2d.N)
BMC_plot = np.ma.array(BMC, mask=np.isnan(BMC))  # Mask NaN (empty bins)
plt.subplot(311)
plt.scatter(A, B, c=C, marker='o', s=100, cmap=cmap_C, norm=norm_C)
plt.xlim([-1, 11])
plt.ylim([-1, 11])
plt.xticks(np.arange(0, 11, 1))
plt.yticks(np.arange(0, 11, 1))
plt.ylabel('B')
plt.grid()
plt.colorbar(label='Value of C', ticks=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
plt.subplot(312)
x, y = np.meshgrid(xedges, yedges)
plt.pcolor(x, y, HT, cmap=cmap_hist2d, norm=norm_hist2d)
plt.xlim([-1, 11])
plt.ylim([-1, 11])
plt.xticks(np.arange(0, 11, 1))
plt.yticks(np.arange(0, 11, 1))
plt.ylabel('B')
plt.grid()
plt.colorbar(label='Number of Data in Bin', ticks=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
plt.subplot(313)
plt.pcolor(x, y, BMC_plot, cmap=cmap_C, norm=norm_C)
plt.xlim([-1, 11])
plt.ylim([-1, 11])
plt.xticks(np.arange(0, 11, 1))
plt.yticks(np.arange(0, 11, 1))
plt.xlabel('A')
plt.ylabel('B')
plt.grid()
plt.colorbar(label='Bin-Mean C Value', ticks=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
plt.show()
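The manual loop above can be replaced by two calls to np.histogram2d: one with weights=C, which sums the C values in each bin, and one without weights, which counts the points. Their ratio is the bin mean, and this route also avoids truncating C to integers. A sketch with made-up random data (the same 0 to 10 grid and 25 points as in the answer):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
A = 10 * rng.random(25)
B = 10 * rng.random(25)
C = 100 * rng.random(25)
edges = np.arange(11)  # 1 x 1 bins from 0 to 10

# Sum of C per bin, and number of points per bin
sum_C, _, _ = np.histogram2d(A, B, bins=(edges, edges), weights=C)
counts, _, _ = np.histogram2d(A, B, bins=(edges, edges))

# Bin mean; empty bins give 0/0 = NaN, so silence that warning
with np.errstate(invalid='ignore'):
    mean_C = sum_C / counts

# histogram2d indexes [A bin, B bin]; transpose so rows follow B (the y-axis)
mean_C_plot = np.ma.masked_invalid(mean_C.T)

fig, ax = plt.subplots()
mesh = ax.pcolormesh(edges, edges, mean_C_plot, cmap='jet', vmin=0, vmax=100)
fig.colorbar(mesh, ax=ax, label='Bin-Mean C Value')
plt.show()
```

Masked (empty) bins are left undrawn, which plays the same role as set_bad(color='white') in the answer.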
I generated this stacked bar plot, but I cannot display the value of each bar segment properly. Here is my bar plot:
This is my code for creating the above plot:
import matplotlib.pyplot as plt
import numpy as np
X_perc = [10, 7, 3, 5, 5]
cols = ['3.1-4.14', '4.14-5.18', '6.22-7.26', '7.26-8.3', '5.18-6.22']
data = np.array([[10, 7, 5, 5, 3],])
fig, ax = plt.subplots()
for i, (name, v) in enumerate(zip(cols, X_perc)):
    bottom = np.sum(data[:, 0:i], axis=1)
    ax.bar(1, data[:, i], bottom=bottom, label="{}".format(name))
    ax.text(0.7, (v*i)/v * i + v + i, str(v), fontweight='bold')
plt.legend(framealpha=1)
plt.axis([-10, 10, 0, 31])
plt.tick_params(
axis='x',
which='both',
bottom=False,
top=False,
labelbottom=False)
You can use the cumulative sum of the segment heights to annotate your stacked bars; np.cumsum(data) gives the top edge of each segment, which you can use as the y-position of its label:
positions = np.cumsum(data)
fig, ax = plt.subplots()
for i, (name, v) in enumerate(zip(cols, X_perc)):
    bottom = np.sum(data[:, 0:i], axis=1)
    ax.bar(1, data[:, i], bottom=bottom, label="{}".format(name))
    ax.text(0.7, positions[i], str(v), fontweight='bold')  # Use it here
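In Matplotlib 3.4 and later, Axes.bar_label can place the text for you, and label_type='center' centers it inside each segment rather than at its top edge. A sketch, assuming the segment heights are the values in data (X_perc lists them in a different order, so labelling segment i with X_perc[i] can put the wrong number on a segment):

```python
import matplotlib.pyplot as plt

cols = ['3.1-4.14', '4.14-5.18', '6.22-7.26', '7.26-8.3', '5.18-6.22']
heights = [10, 7, 5, 5, 3]  # the row of `data` from the question

fig, ax = plt.subplots()
bottom = 0
texts = []
for name, h in zip(cols, heights):
    bars = ax.bar(1, h, bottom=bottom, label=name)
    # Label the segment with its own height, centered inside it
    texts += ax.bar_label(bars, labels=[str(h)], label_type='center',
                          fontweight='bold')
    bottom += h  # the next segment starts where this one ends
ax.legend(framealpha=1)
plt.show()
```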
I am trying to create a series of graphs that share x and y labels. I can get the graphs to each have a label (explained well here!), but this is not what I am looking for.
I want one label that covers the y axis of both graphs, and same for the x axis.
I've been looking through the matplotlib and pandas documentation and was unable to find anything that addresses this issue when using the by argument.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 1, 2, 3, 4, 3, 4],
'B': [1, 7, 2, 4, 1, 4, 8, 3],
'C': [1, 4, 8, 3, 1, 7, 3, 4],
'D': [1, 2, 6, 5, 8, 3, 1, 7]},
index=[0, 1, 2, 3, 5, 6, 7, 8])
histo = df.hist(by=df['A'], sharey=True, sharex=True)
plt.ylabel('ylabel') # I assume the label is created on the 4th graph and then deleted?
plt.xlabel('xlabel') # Creates a label on the 4th graph.
plt.tight_layout()
plt.show()
The output looks like this:
Is there any way that I can create a Y Label that goes across the entire left side of the image (not each graph individually) and the same for the X Label.
As you can see, the x label only appears on the last graph created, and there is no y label.
Help?
This is one way to do it indirectly, using the x- and y-labels as figure texts. I am not aware of a direct way using plt.xlabel or plt.ylabel. When passing an array of axes to df.hist via ax, the sharex and sharey arguments have to be passed to plt.subplots() instead. With f.text you can manually control the position of the labels. For example, if you think the x-label is too close to the ticks, you can use f.text(0.5, -0.02, 'X-label', ...) to shift it slightly lower.
import matplotlib.pyplot as plt
import pandas as pd
f, ax = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
df = pd.DataFrame({'A': [1, 2, 1, 2, 3, 4, 3, 4],
'B': [1, 7, 2, 4, 1, 4, 8, 3],
'C': [1, 4, 8, 3, 1, 7, 3, 4],
'D': [1, 2, 6, 5, 8, 3, 1, 7]},
index=[0, 1, 2, 3, 5, 6, 7, 8])
histo = df.hist(by=df['A'], ax=ax)
f.text(0, 0.5, 'Y-label', ha='center', va='center', fontsize=20, rotation='vertical')
f.text(0.5, 0, 'X-label', ha='center', va='center', fontsize=20)
plt.tight_layout()
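Matplotlib 3.4 added Figure.supxlabel and Figure.supylabel, which place a single label centered along the bottom or left edge of the whole figure, so the hand-tuned f.text coordinates are no longer needed. A sketch with stand-in histograms (random integers instead of the DataFrame); with pandas you would call df.hist(by=df['A'], ax=axes) in place of the loop. layout='constrained' (Matplotlib 3.5+) is used here because it makes room for these figure-level labels:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True,
                         layout='constrained')
for ax in axes.flat:
    ax.hist(rng.integers(1, 9, size=20))  # stand-in data for each group

# One label for the whole figure along each edge
xlab = fig.supxlabel('X-label', fontsize=20)
ylab = fig.supylabel('Y-label', fontsize=20)
plt.show()
```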
I fixed the issue with the variable number of sub-plots using something like this:
import matplotlib.pyplot as plt
cols = 3
n = len(set(df['A']))
rows = int(n / cols) + (0 if n % cols == 0 else 1)
# squeeze=False keeps axes 2-D even when rows == 1, so axes[row][col] always works
fig, axes = plt.subplots(rows, cols, squeeze=False)
extra = rows * cols - n
if extra:
    newaxes = []
    count = 0
    for row in range(rows):
        for col in range(cols):
            if count < n:
                newaxes.append(axes[row][col])
            else:
                axes[row][col].axis('off')
            count += 1
else:
    newaxes = axes
hist = df.hist(by=df['A'], ax=newaxes)
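The row/column bookkeeping can be shortened by flattening the axes array: every axis beyond the first n is switched off, and the first n are handed to df.hist. A sketch in which n = 4 stands in for len(set(df['A'])):

```python
import math

import matplotlib.pyplot as plt

n = 4  # number of groups; len(set(df['A'])) in the question
cols = 3
rows = math.ceil(n / cols)  # same result as int(n / cols) + (0 or 1)

# squeeze=False always returns a 2-D array, even when rows == 1
fig, axes = plt.subplots(rows, cols, squeeze=False)
flat = axes.ravel()
for ax in flat[n:]:
    ax.axis('off')  # hide the unused trailing axes
used_axes = flat[:n]  # e.g. df.hist(by=df['A'], ax=used_axes)
plt.show()
```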
I want to have a scatter plot with ticks as marginals:
import matplotlib.pyplot as plt
x = [0, 1, 1.2, 1.3, 4, 5, 6, 7, 8.2, 9, 10]
y = [.2, .4, 2, 3, 4, 5, 5.1, 5.2, 4, 3, 8]
fig, ax1 = plt.subplots()
for spine in ax1.spines.values():
    spine.set_visible(False)
ax1.scatter(x, y)
ax1.set_xticks(x)
ax1.set_xticklabels([])
ax1.set_yticks(y)
ax1.set_yticklabels([])
And on top of that, I want to have ticklabels at other positions, not determined by the ticks:
xticklabels = [0, 5, 10]
yticklabels = xticklabels
How could I possibly achieve that?
Matplotlib axes have major and minor ticks. You may use the minor ticks to show the marginal locations of the points. You may turn the major ticks off but show the ticklabels for them.
To set ticks at certain positions you can use a FixedLocator. To change the appearance of the ticks or turn them off, the axes has a tick_params method.
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
x = [ 0, 1, 1.2, 1.3, 4, 5, 6, 7, 8.2, 9, 10]
y = [.2, .4, 2, 3, 4, 5, 5.1, 5.2, 4, 3, 8]
xticklabels = [0, 5, 10]
yticklabels = xticklabels
fig, ax = plt.subplots()
for spine in ax.spines.values():
    spine.set_visible(False)
ax.scatter(x, y)
ax.xaxis.set_major_locator(ticker.FixedLocator(xticklabels))
ax.yaxis.set_major_locator(ticker.FixedLocator(yticklabels))
ax.xaxis.set_minor_locator(ticker.FixedLocator(x))
ax.yaxis.set_minor_locator(ticker.FixedLocator(y))
ax.tick_params(axis="both", which="major", bottom=False, left=False)  # hide the major tick marks but keep their labels
ax.tick_params(axis="both", which="minor", length=4)
plt.show()
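One caveat, assuming a recent Matplotlib (the change came in the 3.1 series): minor ticks that coincide with a major tick are dropped by default, so with the code above the points at x = 0, 5, 10 and y = 5 get no marginal tick. Setting each axis's remove_overlapping_locs attribute to False keeps them. A sketch of the same plot with that change:

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

x = [0, 1, 1.2, 1.3, 4, 5, 6, 7, 8.2, 9, 10]
y = [.2, .4, 2, 3, 4, 5, 5.1, 5.2, 4, 3, 8]
labelled = [0, 5, 10]

fig, ax = plt.subplots()
for spine in ax.spines.values():
    spine.set_visible(False)
ax.scatter(x, y)
ax.xaxis.set_major_locator(ticker.FixedLocator(labelled))
ax.yaxis.set_major_locator(ticker.FixedLocator(labelled))
ax.xaxis.set_minor_locator(ticker.FixedLocator(x))
ax.yaxis.set_minor_locator(ticker.FixedLocator(y))
# Keep minor ticks that sit exactly on a major tick (x = 0, 5, 10 and y = 5 here)
ax.xaxis.remove_overlapping_locs = False
ax.yaxis.remove_overlapping_locs = False
ax.tick_params(axis="both", which="major", bottom=False, left=False)
ax.tick_params(axis="both", which="minor", length=4)
plt.show()
```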
Note that I personally find this plot rather difficult to grasp and if I may, I would propose something more like this:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
x = [ 0, 1, 1.2, 1.3, 4, 5, 6, 7, 8.2, 9, 10]
y = [.2, .4, 2, 3, 4, 5, 5.1, 5.2, 4, 3, 8]
xticklabels = [0, 5, 10]
yticklabels = xticklabels
fig, ax = plt.subplots()
ax.scatter(x, y)
ax.xaxis.set_minor_locator(ticker.FixedLocator(x))
ax.yaxis.set_minor_locator(ticker.FixedLocator(y))
c = "#aaaaaa"
ax.tick_params(axis="both", which="major", direction="out", color=c)
ax.tick_params(axis="both", which="minor", length=6, direction="in",
               color="C0", width=1.5)
plt.setp(ax.spines.values(), color=c)
plt.setp(ax.get_xticklabels(), color=c)
plt.setp(ax.get_yticklabels(), color=c)
plt.show()