Plotting convex hull for floating point tuples - python-3.x

I saw this post about how to plot a convex hull:
Calculating and displaying a ConvexHull
And tried to mimic it for my needs.
The problem is that it seems to work only for integer-typed tuples, not for floating point tuples.
I have this function, which gets 2D points (float tuples) and a plot, and I would like it to return the updated plot (with the convex hull drawn) together with the hull points:
def get_cp(cluster, plt):
    hull = ConvexHull(cluster)
    # plt.plot(cluster[:, 0], cluster[:, 1], '.', color='k')
    for simplex in hull.simplices:
        plt.plot(cluster[simplex, 0], cluster[simplex, 1], 'c')
    plt.plot(cluster[hull.vertices, 0], cluster[hull.vertices, 1], 'o', mec='r', color='none', lw=1, markersize=10)
    return hull.simplices, plt
Can anyone please assist?

I'm not sure if I understand your question completely. The following is the code from the answer you linked with only one modified line: instead of generating random integer points, I'm generating random float points: points = 10 * np.random.rand(15, 2). The result looks good.
from scipy.spatial import ConvexHull
import matplotlib.pyplot as plt
import numpy as np
points = 10 * np.random.rand(15, 2) # Random points in 2-D
# convert them to a list of tuples, to accommodate the user's request
points = [tuple(t) for t in points]
# Need to convert your list of tuples to a Numpy array.
# That's mandatory!!!
points = np.array(points)
hull = ConvexHull(points)
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(10, 3))
for ax in (ax1, ax2):
    ax.plot(points[:, 0], points[:, 1], '.', color='k')
    if ax == ax1:
        ax.set_title('Given points')
    else:
        ax.set_title('Convex hull')
        for simplex in hull.simplices:
            ax.plot(points[simplex, 0], points[simplex, 1], 'c')
        ax.plot(points[hull.vertices, 0], points[hull.vertices, 1], 'o', mec='r', color='none', lw=1, markersize=10)
    ax.set_xticks(range(10))
    ax.set_yticks(range(10))
plt.show()
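Applied to the function from the question, the same conversion might look like the sketch below. It assumes the points arrive as a list of (x, y) float tuples and that a matplotlib Axes is passed in rather than the plt module. The name get_hull and the ax parameter are illustrative and not from the original post.

```python
import numpy as np
from scipy.spatial import ConvexHull


def get_hull(cluster, ax=None):
    """Return the hull vertices of 2-D points given as float tuples.

    `get_hull` and `ax` are illustrative names, not part of the original post.
    """
    pts = np.asarray(cluster, dtype=float)  # list of tuples -> (n, 2) array
    hull = ConvexHull(pts)
    if ax is not None:
        # Each simplex is a pair of point indices forming one hull edge.
        for simplex in hull.simplices:
            ax.plot(pts[simplex, 0], pts[simplex, 1], 'c')
        ax.plot(pts[hull.vertices, 0], pts[hull.vertices, 1], 'o',
                mec='r', color='none', markersize=10)
    return pts[hull.vertices]


# A square with one interior point; only the four corners lie on the hull.
square = [(0.0, 0.0), (1.5, 0.0), (1.5, 1.5), (0.0, 1.5), (0.7, 0.8)]
corners = get_hull(square)
```

Returning the vertex coordinates rather than hull.simplices gives the caller the hull points directly. Passing ax=None skips the drawing step.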

Related

Plotting clustered np.array by using plt.scatter

I have a numpy vector of shape 17520 with only one column. I want to plot it using plt.scatter, and I don't know whether I should change the shape or not.
My np vector consists of three values, 0, 1 and 2, where each value represents the cluster number after hierarchical clustering.
After reading the Excel file and doing some pre-processing, here is my code:
plt.figure(figsize=(10, 7))
plt.title("Customer Dendograms")
dend = shc.dendrogram(shc.linkage(data1, method='ward'))
cluster = AgglomerativeClustering(n_clusters=3, affinity='euclidean', linkage='ward')
X=cluster.fit_predict(data1)
The variable X holds the np vector that I want to plot, and its shape is 17520.
Any help would be appreciated in plotting the data so that the 0 values are red, the ones are blue, and the twos are green.
You can try this (assuming that data1 has shape 17520 x 2 and X is a 1-D label array of length 17520, which is what fit_predict returns).
redClass = data1[X == 0]
blueClass = data1[X == 1]
greenClass = data1[X == 2]
plt.scatter(redClass[:, 0], redClass[:, 1], c='r')
plt.scatter(blueClass[:, 0], blueClass[:, 1], c='b')
plt.scatter(greenClass[:, 0], greenClass[:, 1], c='g')
plt.show()
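Boolean masks such as X == 0 select rows only when the label array is one-dimensional, so a column vector of shape (n, 1) can be flattened first. The sketch below uses small made-up arrays in place of the real data1 and X, and the names labels, palette and colors are illustrative.

```python
import numpy as np

# Stand-in data: 6 two-dimensional samples and their cluster labels.
# (Hypothetical values; the real data1 and X come from the question's pipeline.)
data1 = np.array([[0.1, 0.2], [0.3, 0.1], [5.0, 5.1],
                  [5.2, 4.9], [9.0, 0.5], [9.1, 0.4]])
X = np.array([[0], [0], [1], [1], [2], [2]])  # shape (6, 1), a column vector

labels = np.ravel(X)                     # shape (6,), usable as a row mask
palette = ['r', 'b', 'g']                # colour for cluster 0, 1, 2
colors = [palette[k] for k in labels]    # one colour per sample

red_class = data1[labels == 0]           # rows belonging to cluster 0
# With matplotlib available, one call could then draw every cluster:
# plt.scatter(data1[:, 0], data1[:, 1], c=colors)
```

Building a per-sample colour list avoids one scatter call per cluster. The three separate calls in the answer work just as well.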

Lines in 3d plot in python

I have the following script:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
nn = 400 # number of points along circle's perimeter
theta = np.linspace(0, 2*np.pi, nn)
rho = np.ones(nn)
# (x,y) represents points on circle's perimeter
x = np.ravel(rho*np.cos(theta))
y = np.ravel(rho*np.sin(theta))
fig, ax = plt.subplots()
plt.rcParams["figure.figsize"] = [6, 10]
ax = plt.axes(projection='3d') # set the axes for 3D plot
ax.azim = -90 # y rotation (default=270)
ax.elev = 21 # x rotation (default=0)
# low, high values of z for plotting 2 circles at different elev.
loz, hiz = -15, 15
# Plot two circles
ax.plot(x, y, hiz)
ax.plot(x, y, loz)
# set some indices to get proper (x,y) for line plotting
lo1,hi1 = 15, 15+nn//2
lo2,hi2 = lo1+nn//2-27, hi1-nn//2-27
# plot 3d lines using coordinates of selected points
ax.plot([x[lo1], x[hi1]], [y[lo1], y[hi1]], [loz, hiz])
ax.plot([x[lo2], x[hi2]], [y[lo2], y[hi2]], [loz, hiz])
ax.plot([0, 0, 0], [0, 0, 10])
ax.plot([0, 0, 0], [9, 0, 0])
ax.plot([0, 0, 0], [0, 8, 0])
plt.show()
At the end of the script, I would like to plot three lines in three directions. How can I do that? Why does this:
ax.plot([0, 0, 0], [0, 0, 10])
ax.plot([0, 0, 0], [9, 0, 0])
ax.plot([0, 0, 0], [0, 8, 0])
give lines in the same direction?
I also have a second question: how can I make the cone narrower (so that the base looks more like a circle)?
Output now: (image not shown)
ax.plot([0, 0, 0], [0, 0, 10]) gives plot the x and y coordinates of 3 points, but no coordinates in the z direction. Remember that the inputs to plot are x, y, z and not, as you seem to have assumed, (x0, y0, z0), (x1, y1, z1).
So this draws a path through three points: two of them sit at x=y=z=0, and the third extends to y=10. The other two ax.plot calls you have do similar things.
To draw three lines that start at the origin and each extend along one of the x, y, or z directions, you perhaps meant to use:
ax.plot([0, 0], [0, 0], [0, 10]) # extend in z direction
ax.plot([0, 0], [0, 8], [0, 0]) # extend in y direction
ax.plot([0, 9], [0, 0], [0, 0]) # extend in x direction
Note that this also makes your circles look more like circles.
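To make the argument order concrete, a small helper can turn two endpoints into the per-axis lists that plot expects. This is only a sketch, and the helper name segment_xyz is made up for illustration.

```python
def segment_xyz(p0, p1):
    """Turn two (x, y, z) endpoints into the xs, ys, zs lists that plot() expects."""
    xs, ys, zs = zip(p0, p1)  # regroup by axis instead of by point
    return list(xs), list(ys), list(zs)


origin = (0, 0, 0)
z_axis = segment_xyz(origin, (0, 0, 10))
y_axis = segment_xyz(origin, (0, 8, 0))
x_axis = segment_xyz(origin, (9, 0, 0))
# With a 3D axes `ax`, each line could then be drawn as ax.plot(*z_axis), etc.
```

Thinking in endpoints and converting at the last step avoids mixing up per-point and per-axis lists.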
After commenting out the last 3 lines of your code, this is the script I ran (output image not shown):
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
nn = 400 # number of points along circle's perimeter
theta = np.linspace(0, 2*np.pi, nn)
rho = np.ones(nn)
# (x,y) represents points on circle's perimeter
x = np.ravel(rho*np.cos(theta))
y = np.ravel(rho*np.sin(theta))
fig, ax = plt.subplots()
plt.rcParams["figure.figsize"] = [6, 10]
ax = plt.axes(projection='3d') # set the axes for 3D plot
ax.azim = -90 # y rotation (default=270)
ax.elev = 21 # x rotation (default=0)
# low, high values of z for plotting 2 circles at different elev.
loz, hiz = -15, 15
# Plot two circles
ax.plot(x, y, hiz)
ax.plot(x, y, loz)
# set some indices to get proper (x,y) for line plotting
lo1,hi1 = 15, 15+nn//2
lo2,hi2 = lo1+nn//2-27, hi1-nn//2-27
# plot 3d lines using coordinates of selected points
ax.plot([x[lo1], x[hi1]], [y[lo1], y[hi1]], [loz, hiz])
ax.plot([x[lo2], x[hi2]], [y[lo2], y[hi2]], [loz, hiz])
#ax.plot([0, 0, 0], [0, 0, 10])
#ax.plot([0, 0, 0], [9, 0, 0])
#ax.plot([0, 0, 0], [0, 8, 0])
plt.show()
You can see that the base is almost a perfect circle. Because you are also plotting lines in your figure, it gives the illusion that the base is not a circle.
Regarding the lines in 3 different directions: since this part of the code
ax.plot([0, 0, 0], [0, 0, 10])
ax.plot([0, 0, 0], [9, 0, 0])
ax.plot([0, 0, 0], [0, 8, 0])
has all zeroes in the first argument (the x coordinates), it is essentially plotting the lines along the Y-axis only.
When I give some values in the X-axis part, like this
ax.plot([1, 0, 0], [0, 0, 10])
ax.plot([0, 0, 5], [9, 0, 0])
ax.plot([0, 8, 0], [0, 8, 0])
the lines are no longer all on the Y-axis (output image not shown).
I hope this is what you were asking.

Adding one colorbar for hist2d subplots and make them adjacent

I am struggling with tweaking a plot I have been working on.
I am facing two problems:
1. The plots should be adjacent, with zero wspace and hspace. I set both values to zero, but there are still some spaces between the plots.
2. I would like to have one colorbar for all the subplots (they all have the same range). Right now, the code adds a colorbar to the last subplot, as I understand that it needs the image returned by hist2d (h[3]).
Here is my code so far:
def plot_panel(pannel_plot):
    fig, ax = plt.subplots(3, 2, figsize=(7, 7), gridspec_kw={'hspace': 0.0, 'wspace': 0.0}, sharex=True, sharey=True)
    fig.subplots_adjust(wspace=0.0)
    ax = ax.flatten()
    xmin = 0
    ymin = 0
    xmax = 0.19
    ymax = 0.19
    hist2_num = 0
    h = []
    for i, j in zip(pannel_plot['x'].values(), pannel_plot['y'].values()):
        h = ax[hist2_num].hist2d(i, j, bins=50, norm=LogNorm(vmin=1, vmax=5000), range=[[xmin, xmax], [ymin, ymax]])
        ax[hist2_num].set_aspect('equal', 'box')
        ax[hist2_num].tick_params(axis='both', top=False, bottom=True, left=True, right=False,
                                  labelsize=10, direction='in')
        ax[hist2_num].set_xticks(np.arange(xmin, xmax, 0.07))
        ax[hist2_num].set_yticks(np.arange(ymin, ymax, 0.07))
        hist2_num += 1
    fig.colorbar(h[3], orientation='vertical', fraction=.1)
    plt.show()
And the corresponding result: (image not shown)
I would be glad for any pointers on what I am missing!
You can use ImageGrid, which was designed to make this kind of thing easier:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid
data = np.vstack([
    np.random.multivariate_normal([10, 10], [[3, 2], [2, 3]], size=100000),
    np.random.multivariate_normal([30, 20], [[2, 3], [1, 3]], size=1000)
])
fig = plt.figure(figsize=(4, 6))
grid = ImageGrid(fig, 111,  # similar to subplot(111)
                 nrows_ncols=(3, 2),  # creates 3x2 grid of axes
                 axes_pad=0.1,  # pad between axes in inch.
                 cbar_mode="single",
                 cbar_location="right",
                 cbar_pad=0.1
                 )
for ax in grid:
    h = ax.hist2d(data[:, 0], data[:, 1], bins=100)
# h[3] is the image (mappable) from the last panel
fig.colorbar(h[3], cax=grid.cbar_axes[0], orientation='vertical')
or
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid
data = np.vstack([
    np.random.multivariate_normal([10, 10], [[3, 2], [2, 3]], size=100000),
    np.random.multivariate_normal([30, 20], [[2, 3], [1, 3]], size=1000)
])
fig = plt.figure(figsize=(4, 6))
grid = ImageGrid(fig, 111,  # similar to subplot(111)
                 nrows_ncols=(3, 2),  # creates 3x2 grid of axes
                 axes_pad=0.1,  # pad between axes in inch.
                 cbar_mode="single",
                 cbar_location="top",
                 cbar_pad=0.1
                 )
for ax in grid:
    h = ax.hist2d(data[:, 0], data[:, 1], bins=100)
# horizontal colorbar above the grid, with ticks on its top edge
fig.colorbar(h[3], cax=grid.cbar_axes[0], orientation='horizontal')
grid.cbar_axes[0].xaxis.set_ticks_position('top')
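If ImageGrid is not an option, a similar layout can be sketched with plain subplots by handing all axes to fig.colorbar. The example below is a minimal sketch on random stand-in data, not the asker's data. The Agg backend line is only there so that it runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

rng = np.random.default_rng(0)
norm = LogNorm(vmin=1, vmax=5000)  # one norm shared by every panel

fig, axes = plt.subplots(3, 2, figsize=(5, 7), sharex=True, sharey=True,
                         gridspec_kw={'hspace': 0.0, 'wspace': 0.0})
for ax in axes.flat:
    xy = rng.uniform(0, 0.19, size=(20000, 2))  # stand-in data
    h = ax.hist2d(xy[:, 0], xy[:, 1], bins=50, norm=norm,
                  range=[[0, 0.19], [0, 0.19]])

# Passing every axes via `ax=` makes room beside the whole grid,
# so a single colorbar serves all six panels.
cbar = fig.colorbar(h[3], ax=axes.ravel().tolist(), orientation='vertical')
```

Because all panels share the same norm, the mappable from any one of them describes the colour scale for the whole grid. In the question's code, set_aspect('equal', 'box') shrinks each axes box to keep it square, which is one likely source of the remaining gaps.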

How does the `cosine` metric work in sklearn's clustering algorithms?

I'm puzzled about how the cosine metric works in sklearn's clustering algorithms.
For example, DBSCAN has a parameter eps that specifies the maximum distance when clustering. However, a bigger cosine similarity means two vectors are closer, which is just the opposite of our usual notion of distance.
I found that there are cosine_similarity and cosine_distance (just 1-cos()) in pairwise_metric, and when we specify the metric as cosine we use cosine_similarity.
So, when clustering, how does DBSCAN compare the cosine_similarity with the eps parameter to decide whether two vectors have the same label?
An example
import numpy as np
from sklearn.cluster import DBSCAN
samples = [[1, 0], [0, 1], [1, 1], [2, 2]]
clf = DBSCAN(metric='cosine', eps=0.1)
result = clf.fit_predict(samples)
print(result)
It outputs [-1, -1, -1, -1], which I took to mean that these four points are in the same cluster.
However,
for points pair [1,1], [2, 2],
its cosine_similarity is 4/(4) = 1,
the cosine distance will be 1-1 = 0, so they are in the same cluster
for points pair[1,1], [1,0],
its cosine_similarity is 1/sqrt(2),
the cosine distance will be 1-1/sqrt(2) = 0.29289321881345254. This distance is bigger than our eps of 0.1, so why did DBSCAN put them into the same cluster?
Thanks to @Stanislas Morbieu's answer, I finally understand that the cosine metric means cosine_distance, which is 1-cosine.
The implementation of DBSCAN in scikit-learn relies on NearestNeighbors (see the implementation of DBSCAN).
Here is an example to see how it works with cosine metric:
import numpy as np
from sklearn.neighbors import NearestNeighbors
samples = [[1, 0], [0, 1], [1, 1], [2, 2]]
neigh = NearestNeighbors(radius=0.1, metric='cosine')
neigh.fit(samples)
rng = neigh.radius_neighbors([[1, 1]])
print([samples[i] for i in rng[1][0]])
It outputs [[1, 1], [2, 2]], i.e. the points which are closest to [1, 1] in a radius of 0.1.
So points which have a cosine distance smaller than eps in DBSCAN tend to be in the same cluster.
The parameter min_samples of DBSCAN plays an important role. Since it defaults to 5 and there are only four samples here, no point can be considered a core point.
Setting it to 1, the example code:
import numpy as np
from sklearn.cluster import DBSCAN
samples = [[1, 0], [0, 1], [1, 1], [2, 2]]
clf = DBSCAN(metric='cosine', eps=0.1, min_samples=1)
result = clf.fit_predict(samples)
print(result)
outputs [0 1 2 2] which means that [1, 1] and [2, 2] are in the same cluster (numbered 2).
By the way, the output [-1, -1, -1, -1] doesn't mean that points are in the same cluster, but that all points are in no cluster.
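The distances discussed in this thread can be checked by hand with a few lines of NumPy. This is a sketch of the 1 - cosine similarity definition, and the helper name cosine_distance is illustrative rather than the scikit-learn function.

```python
import numpy as np


def cosine_distance(u, v):
    """1 - cos(angle between u and v)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return 1.0 - u @ v / (np.linalg.norm(u) * np.linalg.norm(v))


eps = 0.1
d_same_direction = cosine_distance([1, 1], [2, 2])  # parallel vectors
d_45_degrees = cosine_distance([1, 1], [1, 0])      # 45 degrees apart

within_eps = d_same_direction <= eps   # [2, 2] is a neighbour of [1, 1]
outside_eps = d_45_degrees > eps       # [1, 0] is not
```

Only the magnitude-independent angle matters here, which is why [1, 1] and [2, 2] end up at distance zero.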

PyPlot Change Scatter Label When Points Overlap

I am graphing the predicted and actual results of an ML project using pyplot. I have a scatter plot of each dataset as a subplot and the Y values are elements of [-1, 0, 1]. I would like to change the color of the points if both points have the same X and Y value, but I am not sure how to implement this. Here is my code so far:
import matplotlib.pyplot as plt
Y = [1, 0, -1, 0, 1]
Z = [1, 1, 1, 1, 1]
plt.subplots()
plt.title('Title')
plt.xlabel('Timestep')
plt.ylabel('Score')
plt.scatter(x = [i for i in range(len(Y))], y = Y, label = 'Actual')
plt.scatter(x = [i for i in range(len(Y))], y = Z, label = 'Predicted')
plt.legend()
I would simply make use of NumPy indexing in this case. Specifically, first plot all the data points and then additionally highlight only those points which fulfill the condition X==Y and X==Z.
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
Y = np.array([1, 0, -1, 0, 1])
Z = np.array([1, 1, 1, 1, 1])
X = np.arange(len(Y))
# Labels and titles here
plt.scatter(X, Y, label = 'Actual')
plt.scatter(X, Z, label = 'Predicted')
plt.scatter(X[X==Y], Y[X==Y], color='black', s=500)
plt.scatter(X[X==Z], Z[X==Z], color='red', s=500)
plt.xticks(X)
plt.legend()
plt.show()
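If the goal is to mark the timesteps where the actual and predicted series coincide, a mask comparing Y with Z directly may be closer to what the question asks. The sketch below reuses the arrays from the answer, and the names overlap, overlap_x and overlap_y are illustrative.

```python
import numpy as np

Y = np.array([1, 0, -1, 0, 1])   # actual
Z = np.array([1, 1, 1, 1, 1])    # predicted
X = np.arange(len(Y))            # timesteps shared by both series

overlap = Y == Z                 # True where both series hit the same point
overlap_x, overlap_y = X[overlap], Y[overlap]
# With matplotlib available, the matches could be drawn on top, e.g.:
# plt.scatter(overlap_x, overlap_y, color='black', label='Match')
```

Both series share the same X values, so comparing Y and Z element by element is enough to find overlapping points.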
