How to center the grid of a plot on scatter points? - python-3.x

I have this scatter plot:
I'd like to move the grid in a way that each point (green square) would be surrounded by the grid's cells. For example:
The code to reproduce the plot:
import matplotlib.pyplot as plt
data = [24, 24, 24, 16, 16, 2, 2, 2]
x = list(range(0, len(data)))
y = list(range(0, 25))
plt.scatter(x, data, marker='s', c='g', s=100)
plt.yticks(y)
plt.xticks(x)
plt.grid(True)
plt.show()

Maybe something like the following meets the requirement. You can use the minor ticks for the grid and the major ticks for the labels.
import numpy as np
import matplotlib.pyplot as plt
data = [24, 24, 24, 16, 16, 2, 2, 2]
x = list(range(0, len(data)))
fig, ax = plt.subplots()
ax.scatter(x, data, marker='s', c='g', s=49)
ax.set_yticks(np.arange(25))
ax.set_yticks(np.arange(25+1)-0.5, minor=True)
ax.set_xticks(np.arange(len(data)))
ax.set_xticks(np.arange(len(data)+1)-0.5, minor=True)
ax.grid(True, which="minor")
ax.set_aspect("equal")
plt.show()
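The minor-tick idea above can be wrapped in a small helper. This is only a sketch: the function name `center_grid_on_integers` and its parameters are made up for illustration, and it assumes the points sit on integer coordinates starting at 0. It also hides the minor tick marks so that only the grid lines remain visible.

```python
import numpy as np
import matplotlib.pyplot as plt


def center_grid_on_integers(ax, n_x, n_y):
    """Hypothetical helper: label integer positions, draw grid lines halfway between them."""
    # Major ticks carry the labels and sit on the data points.
    ax.set_xticks(np.arange(n_x))
    ax.set_yticks(np.arange(n_y))
    # Minor ticks are shifted by 0.5, so each point ends up in the middle of a cell.
    ax.set_xticks(np.arange(n_x + 1) - 0.5, minor=True)
    ax.set_yticks(np.arange(n_y + 1) - 0.5, minor=True)
    ax.grid(True, which="minor")
    # Hide the small minor tick marks; only the grid lines are wanted.
    ax.tick_params(which="minor", length=0)


data = [24, 24, 24, 16, 16, 2, 2, 2]
fig, ax = plt.subplots()
ax.scatter(range(len(data)), data, marker="s", c="g", s=49)
center_grid_on_integers(ax, len(data), 25)
minor_x = ax.get_xticks(minor=True)
# Call plt.show() to display the figure interactively.
```

The grid lines then fall at -0.5, 0.5, 1.5, and so on, while the tick labels stay on the integers.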

Related

Is there a way to add labels to the legend of a plot in one step?

My legend now shows,
I want the labels in my legend to run from 0 to 7, but I don't want to add a for-loop to my code and correct each label one at a time. My code looks like this:
fig, ax = plt.subplots()
ax.set_title('Clusters by OPTICS in 2D space after PCA')
ax.set_xlabel('First Component')
ax.set_ylabel('Second Component')
points = ax.scatter(
    pca_2_spec[:,0],
    pca_2_spec[:,1],
    s = 7,
    marker='o',
    c = pred_pca_2_spec,
    cmap= 'rainbow')
ax.legend(*points.legend_elements(), title = 'cluster')
plt.show()
Assuming pred_pca_2_spec is a NumPy array with values taken from [0, 5, 10, 15, 20, 25, 30, 35], simply divide each element by 5 to map the values onto the range 0-7.
Sample Data:
import numpy as np
from matplotlib import pyplot as plt
np.random.seed(54)
pca_2_spec = np.random.randint(-100, 300, (100, 2))
pred_pca_2_spec = np.random.choice([0, 5, 10, 15, 20, 25, 30, 35], 100)
Plotting Code:
fig, ax = plt.subplots()
ax.set_title('Clusters by OPTICS in 2D space after PCA')
ax.set_xlabel('First Component')
ax.set_ylabel('Second Component')
points = ax.scatter(
    pca_2_spec[:, 0],
    pca_2_spec[:, 1],
    s=7,
    marker='o',
    c=pred_pca_2_spec / 5,  # Divide by 5
    cmap='rainbow')
ax.legend(*points.legend_elements(), title='cluster')
plt.show()
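Dividing by 5 works because the cluster values happen to be evenly spaced. If they are not, one loop-free alternative is to let `np.unique` renumber them. The sketch below assumes that approach; the deterministic sample labels and the name `cluster_ids` are illustrative and not part of the original question.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(54)
pca_2_spec = rng.integers(-100, 300, (96, 2))
# Illustrative labels: 12 points for each of the 8 original cluster values.
pred_pca_2_spec = np.repeat([0, 5, 10, 15, 20, 25, 30, 35], 12)

# return_inverse gives, for each point, the index of its value among the sorted
# unique values, so arbitrary labels are mapped onto 0, 1, 2, ...
_, cluster_ids = np.unique(pred_pca_2_spec, return_inverse=True)

fig, ax = plt.subplots()
points = ax.scatter(
    pca_2_spec[:, 0],
    pca_2_spec[:, 1],
    s=7,
    marker='o',
    c=cluster_ids,
    cmap='rainbow')
handles, labels = points.legend_elements()
ax.legend(handles, labels, title='cluster')
# Call plt.show() to display the figure interactively.
```

Because the remapping happens on the data passed to `c`, the colors and the legend entries stay consistent without editing any label by hand.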

How to autoscale y-axis for bargraph in matplotlib?

I need to autoscale the y-axis on my bar graph in matplotlib in order to display the small differences in values. It needs to be autoscaled rather than given a fixed limit because the values will change depending on what the user inputs. I've tried yscale log, but that doesn't work for negative values. I've tried symlog, but the graph stays the same. This is my current code:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = range(700, 710, 1)
fig, ax = plt.subplots()
ax.bar(x, y)
plt.show()
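Plots are automatically scaled to the full range of the data provided to the API. Because bars are drawn from a baseline of 0 by default, that range includes 0 for a bar plot, which hides small differences between large values.
For a bar plot, the best option to display the differences in the values of the bars is probably to set the ylim for vertical bars or xlim for horizontal bars.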
negative data
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = range(-700, -750, -5)
fig, ax = plt.subplots(figsize=(7, 5))
ax.bar(x, y)
plt.ylim(min(y), max(y))
positive data
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = range(700, 750, 5)
fig, ax = plt.subplots(figsize=(7, 5))
ax.bar(x, y)
plt.ylim(min(y), max(y))
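With the limits set exactly to min(y) and max(y), the bar closest to zero has no visible height. A possible refinement is to pad the limits by a fraction of the data range. The helper `padded_limits` and its `frac` parameter below are hypothetical names used only for this sketch.

```python
import matplotlib.pyplot as plt


def padded_limits(values, frac=0.1):
    """Hypothetical helper: data range widened by `frac` of its span on each side."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # Fall back to a fixed pad when all values are identical.
    pad = frac * span if span > 0 else 1.0
    return lo - pad, hi + pad


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = range(700, 710, 1)
fig, ax = plt.subplots()
ax.bar(x, y)
y_lo, y_hi = padded_limits(y)
ax.set_ylim(y_lo, y_hi)
# Call plt.show() to display the figure interactively.
```

Since the limits are computed from the data, they follow whatever values the user supplies, for negative data as well as positive data.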
mixed data
If the data has a wide range of positive and negative values, there's probably not a good option; as you've noted, symlog doesn't help.
The best option may be to plot the positive and negative data separately.
Creating a mask doesn't work with a list, so convert the lists to NumPy arrays.
import numpy as np
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [700, -700, 710, -710, 720, -720, 730, -730, 740, -740]
x = np.array(x)
y = np.array(y)
mask = y >= 0 # positive mask
pos_y = y[mask] # get the positive values
neg_y = y[~mask] # get the negative values; ~ is not
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(7, 5))
ax1.bar(x[mask], pos_y) # also mask x to plot the bar at the correct x-tick
ax1.set_title('Positive Values')
ax1.set_ylim(min(pos_y), max(pos_y))
ax1.set_xticks(range(0, 12)) # buffer the number of x-ticks, so the x-ticks of the two plots align.
ax2.bar(x[~mask], neg_y)
ax2.set_title('Negative Values')
ax2.set_ylim(min(neg_y), max(neg_y))
ax2.set_xticks(range(0, 12))
plt.tight_layout() # better spacing between the two plots

Python: Compute Bin-Mean Value of Scatter Plot Bullets

I have three 1D arrays (A, B, C) of equal length/size. I plot a scatter plot of B vs. A where I color each scatter plot bullet by the corresponding value in the C array (see the code below).
# Imports
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
# Create the Arrays
A = 10 * np.random.random_sample((20, 20))
B = 10 * np.random.random_sample((20, 20))
C = 100 * np.random.random_sample((20, 20))
A = A.reshape(20*20)
B = B.reshape(20*20)
C = C.reshape(20*20)
# Create the Colormap and Define Boundaries
cmap_C = cm.jet
cmap_C.set_bad(color='white')
bounds_C = np.arange(0, 110, 10)
norm_C = mpl.colors.BoundaryNorm(bounds_C, cmap_C.N)
# Plot the Figure
plt.figure()
plt.scatter(A, B, c=C, marker='o', s=100, cmap=cmap_C, norm=norm_C)
plt.xlim([-1, 11])
plt.ylim([-1, 11])
plt.xticks(np.arange(0, 11, 1))
plt.yticks(np.arange(0, 11, 1))
plt.xlabel('A')
plt.ylabel('B')
plt.grid()
plt.colorbar(label='Value of C')
plt.show()
Some bullets overlap in the figure, so we cannot see them clearly. Therefore, I now want to compute and plot the mean C value of all scatter plot bullets within each 1 x 1 integer bin in the figure, so that each square grid cell is colored by a single color (these bins are illustrated by the figure's grid lines). How can I do this?
It's not totally clear what you are trying to do, but I think there is an analytic result to your question before you work too hard. The expected mean value of color (C vector) is 50 because you have generated a uniformly distributed sample [0, 100]. The coordinates are also uniformly distributed, but that is irrelevant. Of course, there will be some variance in each of the grid squares.
If you need to go forward as an exercise, I'd construct a dictionary of coordinate:color mappings to help set up a screen...
color_map = {(x, y): color for x, y, color in zip(A,B,C)}
Then you could set up a second dictionary to gather results for each grid cell, probably by taking the int() value of the coordinates to put each data point into the correct cell.
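A minimal sketch of that dictionary idea, using three made-up points so the result is easy to check by hand. The names `cells` and `cell_means` are illustrative, and int() truncation is only a valid bin index here because the coordinates are non-negative.

```python
from collections import defaultdict

# Made-up sample data: two points fall in cell (0, 1), one in cell (3, 4).
A = [0.5, 0.7, 3.2]
B = [1.1, 1.9, 4.5]
C = [10.0, 30.0, 50.0]

# Gather the C values of every point under the integer cell that contains it.
cells = defaultdict(list)
for x, y, color in zip(A, B, C):
    cells[(int(x), int(y))].append(color)

# Average the collected values per cell; empty cells simply have no entry.
cell_means = {cell: sum(values) / len(values) for cell, values in cells.items()}
print(cell_means)
```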
Below is a solution that works for my purposes.
# Imports
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from zipfile import ZipFile
# Create the Arrays
xx = 5
yy = 5
A = 10 * np.random.random_sample((xx, yy))
B = 10 * np.random.random_sample((xx, yy))
C = 100 * np.random.random_sample((xx, yy))
A = A.reshape(xx*yy)
B = B.reshape(xx*yy)
C = C.reshape(xx*yy)
color_map = {(x, y): color for x, y, color in zip(A,B,C)}
xedges = np.arange(11)
yedges = np.arange(11)
H, xedges, yedges = np.histogram2d(A, B, bins=(xedges, yedges))
HT = H.T
ca = np.asarray(list(color_map))
print(ca)
cai = ca.astype(int)
print(cai)
# Extracting all dictionary values using loop + keys()
res = []
for key in color_map.keys():
    res.append(color_map[key])
res = np.asarray(res)
resi = res.astype(int)
print(resi)
BMC = np.zeros([10, 10])
for i in np.arange(len(resi)):
    # Accumulate the (integer-truncated) C value in the bin that holds point i
    BMC[cai[i,1],cai[i,0]] = BMC[cai[i,1],cai[i,0]] + resi[i]
    print(cai[i])
    print(resi[i])
    print(BMC[cai[i,1],cai[i,0]])
print(HT)
print(BMC)
BMC = BMC/HT
print(BMC)
# Create the Colormap and Define Boundaries
cmap_C = cm.jet
cmap_C.set_bad(color='white')
bounds_C = np.arange(-5, 115, 10)
norm_C = mpl.colors.BoundaryNorm(bounds_C, cmap_C.N)
cmap_hist2d = cm.CMRmap_r
cmap_hist2d.set_bad(color='white')
bounds_hist2d = np.arange(-0.5, 4.5, 1)
norm_hist2d = mpl.colors.BoundaryNorm(bounds_hist2d, cmap_hist2d.N)
cmap_C = cm.jet
cmap_C.set_bad(color='white')
BMC_plot = np.ma.array ( BMC, mask=np.isnan(BMC)) # Mask NaN
bounds_C = np.arange(-5, 115, 10)
norm_C = mpl.colors.BoundaryNorm(bounds_C, cmap_C.N)
plt.subplot(311)
plt.scatter(A, B, c=C, marker='o', s=100, cmap=cmap_C, norm=norm_C)
plt.xlim([-1, 11])
plt.ylim([-1, 11])
plt.xticks(np.arange(0, 11, 1))
plt.yticks(np.arange(0, 11, 1))
plt.ylabel('B')
plt.grid()
plt.colorbar(label='Value of C', ticks=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
plt.subplot(312)
x, y = np.meshgrid(xedges, yedges)
plt.pcolor(x, y, HT, cmap=cmap_hist2d, norm=norm_hist2d)
plt.xlim([-1, 11])
plt.ylim([-1, 11])
plt.xticks(np.arange(0, 11, 1))
plt.yticks(np.arange(0, 11, 1))
plt.ylabel('B')
plt.grid()
plt.colorbar(label='Number of Data in Bin', ticks=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
plt.subplot(313)
plt.pcolor(x, y, BMC_plot, cmap=cmap_C, norm=norm_C)
plt.xlim([-1, 11])
plt.ylim([-1, 11])
plt.xticks(np.arange(0, 11, 1))
plt.yticks(np.arange(0, 11, 1))
plt.xlabel('A')
plt.ylabel('B')
plt.grid()
plt.colorbar(label='Bin-Mean C Value', ticks=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
plt.show()
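For comparison, the bin means can probably also be computed without an explicit loop by calling np.histogram2d twice, once for the counts and once with the C values as weights. This is a sketch of that alternative rather than the solution above: it uses made-up sample points, and unlike the loop it does not truncate C to integers before averaging.

```python
import numpy as np

# Made-up sample data: two points fall in bin (0, 1), one in bin (3, 4).
A = np.array([0.5, 0.7, 3.2])
B = np.array([1.1, 1.9, 4.5])
C = np.array([10.0, 30.0, 50.0])

edges = np.arange(11)  # 1 x 1 bins covering 0..10 on both axes

# Number of points per bin, and sum of C per bin (via the weights argument).
counts, _, _ = np.histogram2d(A, B, bins=(edges, edges))
sums, _, _ = np.histogram2d(A, B, bins=(edges, edges), weights=C)

# Empty bins give 0/0; silence the warning and keep the resulting NaN.
with np.errstate(invalid="ignore", divide="ignore"):
    bin_mean = sums / counts

# Rows index A and columns index B, so transpose before passing to pcolor.
bin_mean_plot = np.ma.masked_invalid(bin_mean.T)
```

The masked array plays the same role as BMC_plot in the code above and could be passed to plt.pcolor in the same way.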

Python - matplotlib - setting margins

How do I set the margins of the output figure?
plt.rcParams["figure.figsize"] = [9, 4]
plt.savefig('figure.pdf')
I would like to have minimal white space in the top and bottom margins. In the script, I remove the axes; could that be the problem?
plt.xticks([])
plt.yticks([])
ax.xaxis.set_ticks_position('none')
ax.yaxis.set_ticks_position('none')
Check out tight_layout().
import matplotlib.pyplot as plt
# figure.figsize only affects figures created afterwards, so set it first
plt.rcParams["figure.figsize"] = [9, 4]
plt.plot([1, 5, 3])
ax = plt.gca()
plt.xticks([])
plt.yticks([])
ax.xaxis.set_ticks_position('none')
ax.yaxis.set_ticks_position('none')
plt.tight_layout(pad=0)
plt.savefig('figure.pdf')
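Another option that may help is to let savefig crop the output itself with bbox_inches="tight" and pad_inches=0. The sketch below writes to an in-memory buffer instead of 'figure.pdf' purely so that it has no side effects; that choice, and the variable names, are assumptions for illustration.

```python
import io

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(9, 4))
ax.plot([1, 5, 3])
ax.set_xticks([])
ax.set_yticks([])

# bbox_inches="tight" crops the saved page to the drawn content;
# pad_inches=0 removes the small margin that is otherwise kept around it.
buffer = io.BytesIO()
fig.savefig(buffer, format="pdf", bbox_inches="tight", pad_inches=0)
pdf_bytes = buffer.getvalue()
```

Passing a filename such as 'figure.pdf' in place of the buffer writes the cropped figure to disk.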

PyPlot Change Scatter Label When Points Overlap

I am graphing the predicted and actual results of an ML project using pyplot. I have a scatter plot of each dataset as a subplot, and the Y values are elements of [-1, 0, 1]. I would like to change the color of the points if both points have the same X and Y value, but I am not sure how to implement this. Here is my code so far:
import matplotlib.pyplot as plt
Y = [1, 0, -1, 0, 1]
Z = [1, 1, 1, 1, 1]
plt.subplots()
plt.title('Title')
plt.xlabel('Timestep')
plt.ylabel('Score')
plt.scatter(x = [i for i in range(len(Y))], y = Y, label = 'Actual')
plt.scatter(x = [i for i in range(len(Y))], y = Z, label = 'Predicted')
plt.legend()
I would simply make use of NumPy boolean indexing in this case. Specifically, first plot all the data points, then additionally highlight only those points that fulfill the conditions X==Y and X==Z.
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
Y = np.array([1, 0, -1, 0, 1])
Z = np.array([1, 1, 1, 1, 1])
X = np.arange(len(Y))
# Labels and titles here
plt.scatter(X, Y, label = 'Actual')
plt.scatter(X, Z, label = 'Predicted')
plt.scatter(X[X==Y], Y[X==Y], color='black', s=500)
plt.scatter(X[X==Z], Z[X==Z], color='red', s=500)
plt.xticks(X)
plt.legend()
plt.show()
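If the goal is instead to recolor the timesteps where the actual and predicted scores coincide, the same indexing technique can be applied with the mask Y == Z. This is a sketch under that reading of the question; the variable name `overlap` and the legend label are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

Y = np.array([1, 0, -1, 0, 1])   # actual
Z = np.array([1, 1, 1, 1, 1])    # predicted
X = np.arange(len(Y))

# True wherever both series have the same value at the same timestep.
overlap = Y == Z

fig, ax = plt.subplots()
# Draw the non-overlapping points in the default colors...
ax.scatter(X[~overlap], Y[~overlap], label='Actual')
ax.scatter(X[~overlap], Z[~overlap], label='Predicted')
# ...and the coinciding points once, in a separate color.
ax.scatter(X[overlap], Y[overlap], color='black', label='Actual = Predicted')
ax.set_xticks(X)
ax.legend()
# Call plt.show() to display the figure interactively.
```

With the sample data, the coinciding points are at timesteps 0 and 4.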
