Given a set of points in 3D, how would one go about finding the smallest set of triangles connecting those points, i.e. creating a set of triangles that each share at least two of their sides? (Like a square sheet with bumps in it, made of triangles.)
example points for a 5x100x500 area:
points = [
# Constant Points
(0 , 0 , 0 ),
(0 , 100, 0 ),
(5 , 100, 500),
(1 , 50 , 100),
(2 , 60 , 200),
(3 , 75 , 300),
(4 , 80 , 400),
(5 , 0 , 499),
]
Here's what the points plotted would look like with the axes normalized:
I think Delaunay Triangulation might give you what you require:
http://en.wikipedia.org/wiki/Delaunay_triangulation
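For example, a minimal sketch (my own, not part of the original answer) using scipy.spatial.Delaunay: since the points form a rough height field, one option is to triangulate their projection onto the y-z plane and lift the resulting triangles back to 3D.
import numpy as np
from scipy.spatial import Delaunay
points = np.array([
    (0, 0, 0),
    (0, 100, 0),
    (5, 100, 500),
    (1, 50, 100),
    (2, 60, 200),
    (3, 75, 300),
    (4, 80, 400),
    (5, 0, 499),
], dtype=float)
# Triangulate the (y, z) projection; each row of simplices is one triangle
# given as indices into the original point list.
tri = Delaunay(points[:, 1:])
print(tri.simplices)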
I was working on the code "Discrete distribution as horizontal bar chart", found here LINK, using Matplotlib 3.1.1
I've been circling around the question for a while, but I still can't figure it out: what's the meaning of the instruction: category_colors = plt.get_cmap('RdYlGn')(np.linspace(0.15, 0.85, data.shape[1])) ?
As np.linspace(0.15, 0.85, data.shape[1]) resolves to array([0.15, 0.325, 0.5, 0.675, 0.85]), I first thought that the program was using the colormap RdYlGn (supposed to go from color=0.0 to color=1.0) and was then taking the 5 specific colors located at points 0.15, 0.325, ..., 0.85.
But, printing category_colors resolves to a (5, 4) array:
array([[0.89888504, 0.30549789, 0.20676663, 1. ],
[0.99315648, 0.73233372, 0.42237601, 1. ],
[0.99707805, 0.9987697 , 0.74502115, 1. ],
[0.70196078, 0.87297193, 0.44867359, 1. ],
[0.24805844, 0.66720492, 0.3502499 , 1. ]])
I don't understand what these numbers refer to.
plt.get_cmap('RdYlGn') returns a function which maps a number between 0 and 1 to a corresponding color, where 0 gets mapped to red, 0.5 to yellow and 1 to green. Often, this function gets the name cmap = plt.get_cmap('RdYlGn'). Then cmap(0) (which is the same as plt.get_cmap('RdYlGn')(0)) would be the rgba value (0.6470588235294118, 0.0, 0.14901960784313725, 1.0) for (red, green, blue, alpha). In hexadecimal, this color would be #a50026.
By numpy's broadcasting magic, cmap(np.array([0.15 , 0.325, 0.5 , 0.675, 0.85 ])) gets the same result as np.array([cmap(0.15), cmap(0.325), ..., cmap(0.85)]). (In other words, many numpy functions applied to an array return an array of that function applied to the individual elements.)
So, the first row of category_colors = cmap(np.linspace(0.15, 0.85, 5)) will be the rgba values of the color corresponding to the value 0.15, i.e. (0.89888504, 0.30549789, 0.20676663, 1.0). This is a color with 90% red, 31% green and 21% blue (and alpha=1, fully opaque), so quite reddish. The next row contains the rgba values corresponding to 0.325, and so on.
Here is some code to illustrate the concepts:
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex # convert a color to hexadecimal format
from matplotlib.cm import ScalarMappable # needed to create a custom colorbar
import numpy as np
cmap = plt.get_cmap('RdYlGn')
color_values = np.linspace(0.15, 0.85, 5)
category_colors = cmap(color_values)
plt.barh(color_values, 1, height=0.15, color=category_colors)
plt.yticks(color_values)
plt.colorbar(ScalarMappable(cmap=cmap), ticks=color_values)
plt.ylim(0, 1)
plt.xlim(0, 1.1)
plt.xticks([])
for val, color in zip(color_values, category_colors):
    r, g, b, a = color
    plt.text(0.1, val, f'r:{r:0.2f} g:{g:0.2f} b:{b:0.2f} a:{a:0.1f}\nhex:{to_hex(color)}', va='center')
plt.show()
PS: You might also want to read about norms, which map an arbitrary range to the range [0, 1] to be used by colormaps.
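For instance, a minimal sketch (my own illustration, with an assumed data range of 10..50) of how a norm and a colormap work together:
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
cmap = plt.get_cmap('RdYlGn')
norm = Normalize(vmin=10, vmax=50)  # maps the data range 10..50 onto 0..1
print(norm(30))        # 0.5, the middle of the range
print(cmap(norm(30)))  # the rgba value at the yellow midpoint of RdYlGn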
I have the speed of feature points at every frame. There are 165 frames in a video, and every frame contains the speeds of its feature points. This is my data:
TrajDbscanData
array([[ 1. , 0.51935178],
[ 1. , 0.52063496],
[ 1. , 0.54598193],
...,
[165. , 0.47198981],
[165. , 2.2686042 ],
[165. , 0.79044946]])
where the first column is the frame number and the second is the speed of a feature point at that frame.
Here I want to do density-based clustering for different speed ranges. For this, I use the following code:
import numpy as np
import matplotlib.pyplot as plt
import sklearn.cluster as sklc
core_samples, labels_db = sklc.dbscan(
TrajDbscanData, # array has to be (n_samples, n_features)
eps=0.5,
min_samples=15,
metric='euclidean',
algorithm='auto'
)
core_samples_mask = np.zeros_like(labels_db, dtype=bool)
core_samples_mask[core_samples] = True
unique_labels = set(labels_db)
n_clusters_ = len(unique_labels) - (1 if -1 in labels_db else 0)
colors = plt.cm.Spectral(np.linspace(0, 1, len(unique_labels)))
figcount = 1
plt.figure(figcount)
figcount += 1
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = 'k'
    class_member_mask = (labels_db == k)
    xy = TrajDbscanData[class_member_mask & core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=col, markeredgecolor='k', markersize=6)
    xy = TrajDbscanData[class_member_mask & ~core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'x', markerfacecolor=col, markeredgecolor='k', markersize=4)
plt.rcParams["figure.figsize"] = (10,7)
plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.grid(True)
plt.show()
I got the following result.
The y axis is speed and the x axis is the frame number.
I want to do density-based clustering according to speed. For example, speed up to 1.0 in one cluster, speed from 1.0 to 1.5 as outliers, speed from 1.5 to 2.0 in another cluster, and speed above 2.0 in another cluster. This helps to identify common motion pattern types. How can I do this?
Don't use Euclidean distance.
Since your x and y axes have very different meanings, that is the wrong distance function to use.
Your plot is misleading, because the axes have different scales. If you scaled x and y the same way, you would see what has been happening: the y axis is effectively ignored, and the data gets sliced by your discrete integer time axis.
You may need to use Generalized DBSCAN and treat time and value separately!
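As one simple alternative (my own sketch, not the Generalized DBSCAN mentioned above, and with an assumed eps value): if the clusters should depend on speed only, you could cluster the speed column by itself instead of the (frame, speed) pairs.
import numpy as np
import sklearn.cluster as sklc
# Use only the speed column; reshape to the (n_samples, n_features)
# shape that dbscan expects.
speeds = TrajDbscanData[:, 1].reshape(-1, 1)
core_samples, labels_db = sklc.dbscan(speeds, eps=0.1, min_samples=15)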
I am working on a clustering problem. I have 3 cluster centers as shown below, and I want to calculate the Euclidean distance from these 3 cluster centers to another m*n matrix. It would be very helpful if anyone can guide me through this.
kmeans.cluster_centers_
Out[99]:
array([[-2.23020213, 0.35654288],
[ 7.69370352, 1.72991757],
[ 0.92519202, -0.29218753]])
matrix
Out[100]:
array([[ 0.11650485, 0.11650485, 0.11650485, 0.11650485, 0.11650485,
0.11650485],
[ 0.11650485, 0.18446602, 0.18446602, 0.2815534 , 0.37864078,
0.37864078],
[ 0.21359223, 0.21359223, 0.21359223, 0.21359223, 0.29708738,
0.35533981],
...,
[ 0.2640625 , 0.2734375 , 0.30546875, 0.31953125, 0.31953125,
0.31953125],
[ 1. , 1. , 1. , 1. , 1. ,
1. ],
[ 0.5 , 0.5 , 0.5 , 0.5 , 0.5 ,
0.5 ]])
I want to do it in Python. I have used sklearn for my clustering.
Euclidean distance is defined on vectors of a fixed length d.
I.e. it is a function R^d x R^d -> R.
So whatever you are trying to do, it is not the usual Euclidean distance. You seem to have k=3 cluster centers with d=2 coordinates, but your matrix has an incompatible shape that cannot be interpreted in an obvious way as d=2 vectors.
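To illustrate the shape requirement, a minimal sketch (my own, with made-up sample data): once every sample is a row of d=2 coordinates, the distances to the k=3 centers can be computed directly.
import numpy as np
from scipy.spatial.distance import cdist
centers = np.array([[-2.23020213,  0.35654288],
                    [ 7.69370352,  1.72991757],
                    [ 0.92519202, -0.29218753]])  # shape (3, 2)
samples = np.random.rand(10, 2)                   # hypothetical (n, 2) data
distances = cdist(samples, centers)               # shape (10, 3), Euclidean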
Check out this Python code:
import numpy as np
import cv2

degrees = 90
center = (24, 24)
img = np.ones((48,48,3)) * 255
mat = cv2.getRotationMatrix2D(center, degrees, 1.0)
img = cv2.warpAffine(img, mat, (48, 48))
My expectation is that a 3 channel, fully saturated, white square will be created and stored in img. After which, it'll be rotated by 90 degrees. Rotating a white square by 90 degrees should result in ... an indistinguishable white square. But when I:
plt.imshow(img)
plt.show()
I see an erroneous black border:
Is there any way to get warpAffine working as expected, i.e. rotate the image without an erroneous border? I've tried the following modifications to no avail:
center = (23, 23)
center = (24, 23)
center = (23, 24)
center = (25, 25)
center = (24, 25)
center = (25, 24)
You should be using the exact center of the image rather than the next closest thing. The rotation is slightly off center using (24,24).
Since getRotationMatrix2D accepts a Point2f, you should be passing the center as (23.5,23.5), as it is the midway point between 0 and 47.
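A minimal sketch of that fix (my own illustration of the answer above):
import numpy as np
import cv2
img = np.ones((48, 48, 3), dtype=np.float32) * 255
center = (23.5, 23.5)  # the true midpoint between pixel 0 and pixel 47
mat = cv2.getRotationMatrix2D(center, 90, 1.0)
rotated = cv2.warpAffine(img, mat, (48, 48))
print(rotated.min())   # should be 255.0, i.e. no black border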
I seem to be unable to find out how to vectorize this Python 3 loop:
import numpy as np
a = np.array([-72, -10, -70, 37, 68, 9, 1, -3, 2, 3, -6, -4, ], np.int16)
result = np.array([-72, -10, -111, -23, 1, -2, 1, -3, 1, 2, -5, -5, ], np.int16)
b = np.copy(a)
for i in range(2, len(b)):
    b[i] += int((b[i-1] + b[i-2]) / 2)
assert (b == result).all()
I tried playing with np.convolve and pandas.rolling_apply but couldn't get it working. Maybe this is the time to learn about C extensions?
It would be great to get the time for this down to something like 50..100ms for input arrays of ~500k elements.
@hpaulj asked in his answer for a closed expression of b[k] in terms of a[:k]. I didn't think it existed, but I worked on it a bit and indeed found that the closed form contains a bunch of Jacobsthal numbers, as @Divakar pointed out.
Here is one closed form:
Here J_n is the n-th Jacobsthal number. When expanding it like this:
J_n = (2^n - (-1)^n) / 3
one ends up with an expression for which I can imagine a vectorized implementation ...
Most numpy code operates on the whole array at once. OK, it iterates in C code, but it is buffered in such a way that it doesn't matter which element is used first.
Here changes to b[2] affect the value calculated for b[3] and on down the line.
add.at and other such ufunc methods do unbuffered calculations. This allows you to add some value repeatedly to one element. I played a bit with it in this case, but no luck so far.
cumsum and cumprod are also handy for problems where values depend on earlier ones.
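As a small illustration (my own, for the simplest dependent-value case): when the recurrence is just a running total, b[i] = b[i-1] + a[i], the whole loop collapses into a single cumsum call.
import numpy as np
a = np.array([-72, -10, -70, 37, 68, 9], np.int16)
b = np.cumsum(a)  # b[0] = a[0], b[i] = b[i-1] + a[i]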
Is it possible to generalize the calculation, so as to define b[i] in terms of all of a[:i]? We know b[2] as a function of a[:2], but what of b[3]?
Even if we got this working for floats, it might be off when doing integer divisions.
I think you already have the sane solution. Any other vectorization would rely on floating point calculations, and it would be really difficult to keep track of the error accumulation. For example, say you want to use a matrix-vector multiplication: for the first seven terms the matrix would look like
array([[ 1. , 0. , 0. , 0. , 0. , 0. , 0. ],
[ 0. , 1. , 0. , 0. , 0. , 0. , 0. ],
[ 0.5 , 0.5 , 1. , 0. , 0. , 0. , 0. ],
[ 0.25 , 0.75 , 0.5 , 1. , 0. , 0. , 0. ],
[ 0.375 , 0.625 , 0.75 , 0.5 , 1. , 0. , 0. ],
[ 0.3125 , 0.6875 , 0.625 , 0.75 , 0.5 , 1. , 0. ],
[ 0.34375, 0.65625, 0.6875 , 0.625 , 0.75 , 0.5 , 1. ]])
The relationship can be described as the iterative formula
b[i] = [0.5, 0.5, 1] · [a[i-2], a[i-1], a[i]]^T
That defines a series of elementary matrices of the form of an identity matrix with
[0 ... 0.5 0.5 1 0 ... 0]
on the i-th row, and successive multiplication gives the matrix above for the first seven terms. There is indeed a subdiagonal structure, but the terms are getting too small very quickly. As you have shown, 2 to the power 500k is not fun.
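A minimal sketch (my own illustration) of that construction: multiplying the elementary update matrices together reproduces the 7x7 matrix shown above.
import numpy as np
n = 7
M = np.eye(n)
for i in range(2, n):
    E = np.eye(n)          # identity with [0.5, 0.5, 1] placed on row i
    E[i, i - 2] = 0.5
    E[i, i - 1] = 0.5
    M = E @ M              # apply the update for b[i] after all earlier ones
print(M)                   # matches the lower-triangular matrix above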
In order to keep track of floating point noise, an iterative solution is required, which is what you have anyway.