Plotting clustered np.array by using plt.scatter - python-3.x

I have a NumPy vector of shape (17520,), i.e. a single column. I want to plot it using plt.scatter, and I don't know whether I need to change its shape.
My np vector consists of three values 0,1 and 2, where each value represents the cluster number after using hierarchical clustering.
After reading the Excel file and doing some pre-processing, here is my code:
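import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as shc
from sklearn.cluster import AgglomerativeClustering

plt.figure(figsize=(10, 7))
plt.title("Customer Dendrograms")
dend = shc.dendrogram(shc.linkage(data1, method='ward'))
# Note: recent scikit-learn releases name the affinity parameter "metric".
cluster = AgglomerativeClustering(n_clusters=3, affinity='euclidean', linkage='ward')
X = cluster.fit_predict(data1)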
The variable X holds the NumPy vector that I want to plot; its shape is (17520,).
How can I plot the data so that the points labelled 0 are red, those labelled 1 are blue, and those labelled 2 are green?

You can try this (assuming that data1 has shape (17520, 2) and X is the 1-D label vector of shape (17520,) returned by fit_predict, so that X == 0 can be used as a row mask).
redClass = data1[X == 0]
blueClass = data1[X == 1]
greenClass = data1[X == 2]
plt.scatter(redClass[:, 0], redClass[:, 1], c='r')
plt.scatter(blueClass[:, 0], blueClass[:, 1], c='b')
plt.scatter(greenClass[:, 0], greenClass[:, 1], c='g')
plt.show()
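
A minimal self-contained sketch of the same idea, assuming synthetic data in place of data1. The blob centres, the random seed, and the names colour_for_label and point_colours are illustrative and not from the question. Instead of three separate scatter calls, it maps each label to a colour with NumPy indexing and draws everything in one call.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit when working interactively
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for data1: three 2-D blobs of 50 points each.
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
data1 = np.vstack([c + rng.normal(size=(50, 2)) for c in centres])

# Stand-in for the output of fit_predict: a 1-D label vector of shape (150,).
X = np.repeat([0, 1, 2], 50)

# Label -> colour lookup: 0 is red, 1 is blue, 2 is green.
colour_for_label = np.array(["r", "b", "g"])
point_colours = colour_for_label[X]

fig, ax = plt.subplots(figsize=(10, 7))
ax.scatter(data1[:, 0], data1[:, 1], c=point_colours.tolist())
ax.set_title("Clusters coloured by label")
# Call plt.show() here when running interactively.
```

The lookup-array approach scales to more clusters by extending colour_for_label, whereas the per-class masks above are easier to give separate legend entries.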

Related

Plotting convex hull for floating point tuples

I saw this post about how to plot a convex hull:
Calculating and displaying a ConvexHull
And tried to mimic it for my needs.
The problem is that it seems that this only works for integer typed tuples, and not floating point tuples.
I have this function, which takes 2D points (float tuples) and a plot,
and I would like it to return the updated plot (with the convex hull drawn) together with the hull points:
def get_cp(cluster, plt):
    hull = ConvexHull(cluster)
    # plt.plot(cluster[:, 0], cluster[:, 1], '.', color='k')
    for simplex in hull.simplices:
        plt.plot(cluster[simplex, 0], cluster[simplex, 1], 'c')
    plt.plot(cluster[hull.vertices, 0], cluster[hull.vertices, 1], 'o', mec='r', color='none', lw=1, markersize=10)
    return hull.simplices, plt
Can anyone please assist?
I'm not sure if I understand your question completely. The following is the code from the answer you linked with only one modified line: instead of generating random integer points, I'm generating random float points: points = 10 * np.random.rand(15, 2). The result looks good.
from scipy.spatial import ConvexHull
import matplotlib.pyplot as plt
import numpy as np
points = 10 * np.random.rand(15, 2)  # Random points in 2-D
# convert them to a list of tuples, to accommodate user request
points = [tuple(t) for t in points]
# Need to convert your list of tuples to a Numpy array.
# That's mandatory!!!
points = np.array(points)
hull = ConvexHull(points)
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(10, 3))
for ax in (ax1, ax2):
    ax.plot(points[:, 0], points[:, 1], '.', color='k')
    if ax == ax1:
        ax.set_title('Given points')
    else:
        ax.set_title('Convex hull')
        for simplex in hull.simplices:
            ax.plot(points[simplex, 0], points[simplex, 1], 'c')
        ax.plot(points[hull.vertices, 0], points[hull.vertices, 1], 'o', mec='r', color='none', lw=1, markersize=10)
    ax.set_xticks(range(10))
    ax.set_yticks(range(10))
plt.show()
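
The conversion step can also be isolated in a small helper. The sketch below is an illustration only: the name hull_of_tuples and the test points are assumptions, and plotting is left out so that the focus stays on handling float tuples.

```python
import numpy as np
from scipy.spatial import ConvexHull


def hull_of_tuples(points):
    """Return (vertex indices, edge index pairs) of the 2-D convex hull."""
    # Indexing such as pts[:, 0] or pts[simplex, 0] needs a NumPy array;
    # a plain list of tuples does not support it, so convert first.
    pts = np.asarray(points, dtype=float)
    hull = ConvexHull(pts)
    return hull.vertices, hull.simplices


# Unit square with one interior point, given as float tuples.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
vertices, simplices = hull_of_tuples(square)
print(sorted(vertices.tolist()))
```

The interior point (0.5, 0.5) is not among the returned vertices, which is a quick way to check that float input is handled the same way as integer input.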

How to increase color resolution in python matplotlib 3D plots

(edited to make the code clearer) I am using Poly3DCollection to make a graph where several polygons in a 3D space have a colour that depends on a value contained in a separate array.
cmap = cm.plasma
quantity = [0.1, 0.11, 5, 10]
colours = cmap(quantity)
for i in range(K):
    x = [0, 1, 1, 0]
    y = [0, 0, 1, 1]
    z = [0, 1, 0, 1]
    verts = [list(zip(x, y, z))]
    ax.add_collection3d(Poly3DCollection(verts, color=colours[i]))
The problem I have is that the resulting image has a very limited colour resolution, and most of the polygons have the same colour.
I understood from this post that it may be caused by matplotlib automatically using only 7 different colour levels, but unfortunately the solution in the post only applies to 2D plots.
Any idea on how to extend that to 3D plots?
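
One likely cause, independent of 2D versus 3D, is that a matplotlib colormap called with floats expects values in [0, 1]. Anything above 1 is treated as out of range, so 5 and 10 receive the same colour. A minimal sketch of rescaling the values before the lookup follows. The choice of LogNorm is an assumption that suits values spanning two orders of magnitude, and the names norm and raw_colours are illustrative.

```python
import numpy as np
from matplotlib import cm
from matplotlib.colors import LogNorm

cmap = cm.plasma
quantity = np.array([0.1, 0.11, 5, 10])

# Raw lookup: 5 and 10 are both above 1, so they map to the same (top) colour.
raw_colours = cmap(quantity)

# Rescale to [0, 1] before the lookup. LogNorm is an assumption here;
# matplotlib.colors.Normalize is the linear alternative.
norm = LogNorm(vmin=quantity.min(), vmax=quantity.max())
colours = cmap(norm(quantity))

print(colours.shape)  # one RGBA row per value
```

Each row of colours can then be passed as color=colours[i] to Poly3DCollection exactly as in the loop above.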

PyPlot Change Scatter Label When Points Overlap

I am graphing the predicted and actual results of an ML project using pyplot. I have a scatter plot of each dataset as a subplot, and the Y values are elements of [-1, 0, 1]. I would like to change the color of the points if both points have the same X and Y value, but I am not sure how to implement this. Here is my code so far:
import matplotlib.pyplot as plt
Y = [1, 0, -1, 0, 1]
Z = [1, 1, 1, 1, 1]
plt.subplots()
plt.title('Title')
plt.xlabel('Timestep')
plt.ylabel('Score')
plt.scatter(x = [i for i in range(len(Y))], y = Y, label = 'Actual')
plt.scatter(x = [i for i in range(len(Y))], y = Z, label = 'Predicted')
plt.legend()
I would simply make use of NumPy indexing in this case. Specifically, first plot all the data points and then additionally highlight only those points that fulfill the condition X==Y and X==Z.
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
Y = np.array([1, 0, -1, 0, 1])
Z = np.array([1, 1, 1, 1, 1])
X = np.arange(len(Y))
# Labels and titles here
plt.scatter(X, Y, label = 'Actual')
plt.scatter(X, Z, label = 'Predicted')
plt.scatter(X[X==Y], Y[X==Y], color='black', s=500)
plt.scatter(X[X==Z], Z[X==Z], color='red', s=500)
plt.xticks(X)
plt.legend()
plt.show()
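
If the intended condition is instead that the actual and predicted values coincide at the same timestep, the mask would be Y == Z rather than a comparison against X. A minimal sketch under that reading of the question follows; the name match and the legend label are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit when working interactively
import matplotlib.pyplot as plt
import numpy as np

Y = np.array([1, 0, -1, 0, 1])  # actual
Z = np.array([1, 1, 1, 1, 1])   # predicted
X = np.arange(len(Y))           # shared timesteps

# True where both series have the same value at the same timestep.
match = Y == Z

fig, ax = plt.subplots()
ax.scatter(X[~match], Y[~match], label="Actual")
ax.scatter(X[~match], Z[~match], label="Predicted")
# Overlapping points are drawn once, in their own colour.
ax.scatter(X[match], Y[match], color="black", label="Actual = Predicted")
ax.set_xticks(X)
ax.legend()
# Call plt.show() here when running interactively.
```

With the sample data, the overlapping points are the ones at timesteps 0 and 4.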

How to generate multi class test dataset using numpy?

I want to generate a multi class test dataset using numpy only for a classification problem.
For example, X is a NumPy array of shape (m, n), y has shape (m, 1), and there are k classes. Please help me with the code.
[Here X represents the features and y represents the labels]
You can use np.random.randint like:
import numpy as np
m = 4
n = 4
k = 5
X = np.random.randint(0,2,(m,n))
X
array([[1, 1, 1, 1],
[1, 0, 0, 1],
[1, 1, 0, 0],
[1, 1, 1, 1]])
y = np.random.randint(0,k,m)
y
array([3, 3, 0, 4])
You can create a multi-class dataset using NumPy as follows:
import numpy as np

def generate_dataset(size, classes=2, noise=0.5):
    # Generate random datapoints
    labels = np.random.randint(0, classes, size)
    x = (np.random.rand(size) + labels) / classes
    y = x + np.random.rand(size) * noise
    # Reshape data in order to merge them
    x = x.reshape(size, 1)
    y = y.reshape(size, 1)
    labels = labels.reshape(size, 1)
    # Merge the data
    data = np.hstack((x, y, labels))
    return data
When visualised with matplotlib, the generated data will look like the following:
You can change the number of classes and the spread of the data using the classes and noise parameters. Here I have kept a linear relation between the x-axis and y-axis values, which can also be changed as required.
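
A short usage sketch for the function above, repeated here in compact form so that the block runs on its own. The fixed seed and the argument values (200 points, 3 classes, noise 0.2) are assumptions chosen for illustration. It splits the merged array back into a feature matrix and an integer label vector.

```python
import numpy as np


def generate_dataset(size, classes=2, noise=0.5):
    # Same logic as the answer above, written compactly.
    labels = np.random.randint(0, classes, size)
    x = (np.random.rand(size) + labels) / classes
    y = x + np.random.rand(size) * noise
    return np.hstack((x.reshape(size, 1), y.reshape(size, 1), labels.reshape(size, 1)))


np.random.seed(0)  # assumption: fixed seed so repeated runs give the same data
data = generate_dataset(200, classes=3, noise=0.2)

features = data[:, :2]            # feature matrix, shape (200, 2)
labels = data[:, 2].astype(int)   # hstack stored the labels as floats, so cast back

print(features.shape, labels.shape)
```

Because np.hstack promotes everything to float, the cast back to int is needed before the labels are used as class indices.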

Clip parts of a tensor

I have a theano tensor and I would like to clip its values, but each index to a different range.
For example, if I have a vector [a,b,c] , I want to clip a to [0,1] , clip b to [2,3] and c to [3,5].
How can I do that efficiently?
Thanks!
The theano.tensor.clip operation supports symbolic minimum and maximum values so you can pass three tensors, all of the same shape, and it will perform an element-wise clip of the first with respect to the second (minimum) and third (maximum).
This code shows two variations on this theme. v1 requires the minimum and maximum values to be passed as separate vectors while v2 allows the minimum and maximum values to be passed more like a list of pairs, represented as a two column matrix.
import theano
import theano.tensor as tt

def v1():
    # Minimum and maximum values passed as separate vectors
    x = tt.vector()
    min_x = tt.vector()
    max_x = tt.vector()
    y = tt.clip(x, min_x, max_x)
    f = theano.function([x, min_x, max_x], outputs=y)
    print(f([2, 1, 4], [0, 2, 3], [1, 3, 5]))

def v2():
    # Minimum and maximum values passed as a two-column matrix of pairs
    x = tt.vector()
    min_max = tt.matrix()
    y = tt.clip(x, min_max[:, 0], min_max[:, 1])
    f = theano.function([x, min_max], outputs=y)
    print(f([2, 1, 4], [[0, 1], [2, 3], [3, 5]]))

def main():
    v1()
    v2()

main()
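
For readers without Theano installed, the same element-wise semantics can be checked with NumPy. This is a NumPy analogue and not Theano code; the variable names lower, upper, and bounds are illustrative. np.clip also accepts array-valued bounds of the same shape as the input.

```python
import numpy as np

x = np.array([2.0, 1.0, 4.0])

# Variation 1: separate vectors of lower and upper bounds.
lower = np.array([0.0, 2.0, 3.0])
upper = np.array([1.0, 3.0, 5.0])
clipped = np.clip(x, lower, upper)
print(clipped)  # [1. 2. 4.]

# Variation 2: bounds given as (min, max) pairs in a two-column matrix.
bounds = np.array([[0.0, 1.0], [2.0, 3.0], [3.0, 5.0]])
clipped_pairs = np.clip(x, bounds[:, 0], bounds[:, 1])
```

Here 2 is clipped down to 1, 1 is clipped up to 2, and 4 already lies inside [3, 5], so it is unchanged.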
