How to calculate the distance in Python - python-3.x

Let's say I have two arrays of points, and I want to know the distance between each pair of points.
For example:
array_1 = [p1,p2,p3,p4]
array_2 = [p5,p6]
p1 to p6 are points, each something like [1,1,1] (3D).
The output I want is
output = [[distance of p1 to p5, distance of p2 to p5, ... distance of p4 to p5],
[distance of p1 to p6, distance of p2 to p6, ... distance of p4 to p6]]
What is the best approach if I want to use NumPy?

You can first give the first array the shape m×1×3 so that it broadcasts against the second array (treated as 1×n×3), and then subtract the coordinates:
delta = array_1[:,None] - array_2
Next we square the coordinate differences, sum them over the last axis, and take the square root:
distances = np.sqrt((delta*delta).sum(axis=2))
Now distances is an m×n matrix whose ij-th element is the distance between the i-th element of the first array and the j-th element of the second array.
For example if we have as data:
>>> array_1 = np.arange(12).reshape(-1,3)
>>> array_2 = 2*np.arange(6).reshape(-1,3)
We get as result:
>>> delta = array_1[:,None] - array_2
>>> distances = np.sqrt((delta*delta).sum(axis=2))
>>> distances
array([[ 2.23606798, 12.20655562],
[ 3.74165739, 7.07106781],
[ 8.77496439, 2.23606798],
[13.92838828, 3.74165739]])
The first element of array_1 has coordinates (0,1,2), and the second of array_2 has coordinates (6,8,10). Hence the distance is:
>>> np.sqrt(6*6 + 7*7 + 8*8)
12.206555615733702
This is what we see in the distances array for distances[0,1].
The above method computes the Euclidean distance for an arbitrary number of dimensions. As long as array_1 and array_2 hold points with the same number of dimensions (1D, 2D, 3D, etc.), it calculates the distances between the points.
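As a minimal sketch, the broadcasting steps above can be wrapped in a small function. The name pairwise_distances is hypothetical, and the final transpose is only there to match the layout asked for in the question (one row per point of array_2).

```python
import numpy as np

def pairwise_distances(a, b):
    """Return an (m, n) matrix of Euclidean distances between rows of a and rows of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # (m, 1, d) - (1, n, d) broadcasts to (m, n, d)
    delta = a[:, None, :] - b[None, :, :]
    return np.sqrt((delta * delta).sum(axis=-1))

array_1 = np.arange(12).reshape(-1, 3)
array_2 = 2 * np.arange(6).reshape(-1, 3)

# Transpose so that row k holds the distances from every point of array_1 to array_2[k]
output = pairwise_distances(array_1, array_2).T
print(output.shape)
```

Because the sum runs over the last axis, the same function works for points of any dimension, provided both inputs use the same one.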

This answer isn't specifically for numpy arrays, but could easily be extended to include them. The function itertools.product is your friend here.
# Fill this with your formula for distance
def calculate_distance(point_1, point_2):
    distance = ...
    return distance
# The itertools module helps here
import itertools
array_1, array_2 = [p1, p2, p3, p4], [p5, p6]
# Initialise list to store answers
distances = []
# Iterate over every combination and calculate distance
for i, j in itertools.product(array_1, array_2):
    distances.append(calculate_distance(i, j))
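For a concrete, runnable variant of the loop above, one option is to plug in math.dist (available from Python 3.8) as the distance formula. This is only one possible choice for the placeholder, and the sample points below are made up for illustration. Iterating over array_2 first yields the nested layout requested in the question.

```python
import itertools
import math

# Sample 3D points, chosen only for illustration
array_1 = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11)]
array_2 = [(0, 2, 4), (6, 8, 10)]

# product(array_2, array_1) varies the array_1 point fastest,
# so the flat list is already grouped by the array_2 point
flat = [math.dist(p, q) for q, p in itertools.product(array_2, array_1)]

# Split the flat list into one row per point of array_2
n = len(array_1)
output = [flat[k * n:(k + 1) * n] for k in range(len(array_2))]
print(output)
```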

Related

Finding a third vertex of N-dimension equilateral triangle

Given two vectors X and Y with 5 elements each, find a vector V that satisfies:
||X-V||=||Y-V||=||X-Y||
(X,Y,V) are the vertices of an equilateral triangle.
I have tried the following:
To get a vector V that is perpendicular to X and Y:
import numpy as np
# Example vectors
x = [ 0.93937874, 0.05568767, -2.05847484, -1.15965884, -0.34035054]
y = [-0.45921145, -0.55653187, 0.6027685, 0.13113272, -1.2176953 ]
# stack the two vectors into a 2×5 matrix to apply SVD
A = np.array([x, y]) # A is a NumPy array
u,s,vh=np.linalg.svd(A)
v=vh[-1] # last right-singular vector (note: vh[-1:1] would be an empty slice)
From here, what should I do, assuming that what I have done so far is correct?
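One possible way to continue from the SVD step, offered as a sketch rather than as an answer from the original thread: any unit vector u orthogonal to X - Y can serve as the height direction of the triangle, and a vector orthogonal to both X and Y is in particular orthogonal to X - Y. Moving from the midpoint of X and Y along u by sqrt(3)/2 times the side length gives a point at distance ||X - Y|| from both. The variable names below are chosen for illustration.

```python
import numpy as np

x = np.array([0.93937874, 0.05568767, -2.05847484, -1.15965884, -0.34035054])
y = np.array([-0.45921145, -0.55653187, 0.6027685, 0.13113272, -1.2176953])

# Rows of vh beyond the rank of the 2x5 matrix span its null space,
# so the last row is a unit vector orthogonal to both x and y
_, _, vh = np.linalg.svd(np.array([x, y]))
u = vh[-1]

side = np.linalg.norm(x - y)      # required side length
midpoint = (x + y) / 2
height = np.sqrt(3) / 2 * side    # height of an equilateral triangle

v = midpoint + height * u

# All three side lengths should agree
print(np.linalg.norm(x - v), np.linalg.norm(y - v), side)
```

In five dimensions the solution is not unique: any other unit vector orthogonal to x - y gives another valid vertex.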

QR method for eigenvectors Python

I am trying to find the eigenvectors of matrix A using QR method. I found the eigenvalues and eigenvector which corresponds to the largest eigenvalue. How do I find the rest of the eigenvectors without using numpy.linalg.eig?
import numpy as np
A = np.array([
    [1, 0.3],
    [0.45, 1.2]
])
def eig_evec_decomp(A, max_iter=100):
    A_k = A
    Q_k = np.eye(A.shape[1])
    for k in range(max_iter):
        Q, R = np.linalg.qr(A_k)
        Q_k = Q_k.dot(Q)
        A_k = R.dot(Q)
    eigenvalues = np.diag(A_k)
    eigenvectors = Q_k
    return eigenvalues, eigenvectors
evals, evecs = eig_evec_decomp(A)
print(evals)
# array([1.48078866, 0.71921134])
print(evecs)
# array([[ 0.52937334, -0.84838898],
# [ 0.84838898, 0.52937334]])
Next I check the condition:
Ax=wx
Where:
A - Original matrix;
x - eigenvector;
w - eigenvalue.
Check the conditions:
print(np.allclose(A.dot(evecs[:,0]), evals[0] * evecs[:,0]))
# True
print(np.allclose(A.dot(evecs[:,1]), evals[1] * evecs[:,1]))
# False
The algorithm does not promise that Q_k will have the eigenvectors as columns. An orthogonal eigenbasis is in fact rather rare: this case is so special that it has a name, the normal matrices, which are characterized by commuting with their transpose.
In general, the A_k you converge to will still be upper triangular with non-trivial content above the diagonal. Check by computing Q_k.T @ A @ Q_k. What is known from the structure is that the i-th eigenvector is a linear combination of the first i columns of Q_k. This could simplify solving the eigenvector equation somewhat. Alternatively, determine the eigenvectors of the converged A_k directly and transform them back with Q_k.
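The last suggestion, solving for the eigenvectors of the converged triangular A_k and mapping them back with Q_k, might look like the sketch below. The function names are hypothetical, and the code assumes distinct real eigenvalues and that the unshifted QR iteration has converged to an upper triangular matrix.

```python
import numpy as np

def qr_iteration(A, max_iter=200):
    """Unshifted QR iteration; returns (A_k, Q_k) with A_k = Q_k.T @ A @ Q_k."""
    A_k = np.array(A, dtype=float)
    Q_k = np.eye(A_k.shape[0])
    for _ in range(max_iter):
        Q, R = np.linalg.qr(A_k)
        Q_k = Q_k @ Q
        A_k = R @ Q
    return A_k, Q_k

def eigenvectors_from_triangular(T, Q):
    """Eigenvectors of Q @ T @ Q.T for upper triangular T with distinct diagonal."""
    n = T.shape[0]
    vecs = np.zeros((n, n))
    for i in range(n):
        # Solve (T - T[i, i] * I) y = 0 with y[i] = 1 and y[j] = 0 for j > i
        y = np.zeros(n)
        y[i] = 1.0
        for j in range(i - 1, -1, -1):
            y[j] = -(T[j, j + 1:i + 1] @ y[j + 1:i + 1]) / (T[j, j] - T[i, i])
        x = Q @ y                      # transform back to the original basis
        vecs[:, i] = x / np.linalg.norm(x)
    return vecs

A = np.array([[1, 0.3],
              [0.45, 1.2]])
A_k, Q_k = qr_iteration(A)
evals = np.diag(A_k)
evecs = eigenvectors_from_triangular(A_k, Q_k)

for i in range(len(evals)):
    print(np.allclose(A @ evecs[:, i], evals[i] * evecs[:, i]))
```

Since y has non-zero entries only in its first i+1 positions (0-based), each eigenvector is built from the leading columns of Q_k only, which is the structure described above.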

Distance between two 3D points in Python in an array

I'm trying to write Python code to calculate the distance between two 3D points. The points are listed as follows:
Timestamp, X, Y, Z, Distance
2613, 4.35715, 5.302030, -0.447308
2614, 7.88429, -8.401940, -0.484432
2615, 4.08796, 2.213850, -0.515359
2616, 4.35715, 5.302030, -0.447308
2617, 7.88429, -8.401940, -0.484432
I know the formula, but I'm not sure how to select the columns to run the formula for the 3D point distance!
This is essentially the same question as How can the Euclidean distance be calculated with NumPy?
You can use numpy.linalg.norm or scipy.linalg.norm.
E.g.
scipy.linalg.norm(p_2613 - p_2614) # p_2613, p_2614: placeholder names for the (X, Y, Z) arrays of rows 2613 and 2614
Can you try this code and see if it gives you some ideas to start:
# distance between 2 points in 3D
from math import pow, sqrt
def calculate_dist(point1, point2):
    x, y, z = point1
    a, b, c = point2
    distance = sqrt(pow(a - x, 2) +
                    pow(b - y, 2) +
                    pow(c - z, 2))
    return distance
point1 = (2, 3, 4) # tuple
point2 = (1, 5, 7)
print(calculate_dist(point1, point2))
# apply calculate_dist(point1, point2) to each pair of points in your data
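For the table in the question, a vectorized sketch with NumPy could look as follows. It assumes the Distance column is meant to hold the distance from each row to the previous one, which the question does not state explicitly, so the first row has no value.

```python
import numpy as np

# Timestamp, X, Y, Z rows copied from the question
data = np.array([
    [2613, 4.35715,  5.302030, -0.447308],
    [2614, 7.88429, -8.401940, -0.484432],
    [2615, 4.08796,  2.213850, -0.515359],
    [2616, 4.35715,  5.302030, -0.447308],
    [2617, 7.88429, -8.401940, -0.484432],
])

xyz = data[:, 1:4]                      # select the X, Y, Z columns

# Difference between consecutive rows, then the Euclidean norm of each difference
step_distances = np.linalg.norm(np.diff(xyz, axis=0), axis=1)

# Pad with NaN so the result lines up with the original rows
distance_column = np.concatenate(([np.nan], step_distances))
print(distance_column)
```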

Numpy finding the number of points within a specific distance in absolute value

I have a NumPy array. I want to find the number of points that lie within an epsilon distance of each point.
My current code is (for an n×2 array, but in general I expect the array to be n×m)
epsilon = np.array([0.5, 0.5])
np.array([ 1/float(np.sum(np.all(np.abs(X-x) <= epsilon, axis=1))) for x in X])
But this code might not be efficient for an array of, let us say, 1 million rows and 50 columns. Is there a better and more efficient method?
For example data
X = np.random.rand(10, 2)
You can solve this using broadcasting:
1 / np.sum(np.all(np.abs(X[:, None, ...] - X[None, ...]) <= epsilon, axis=-1), axis=-1)
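The broadcasting expression builds an intermediate array of shape n×n×m, which would not fit in memory for the 1 million rows mentioned in the question. A possible compromise, sketched below with a hypothetical function name and an arbitrary chunk size, is to broadcast one block of rows at a time. The total work is still quadratic in n; only the peak memory is bounded.

```python
import numpy as np

def inverse_neighbor_counts(X, epsilon, chunk_size=1024):
    """For each row, 1 / (number of rows within epsilon along every axis)."""
    X = np.asarray(X, dtype=float)
    counts = np.empty(len(X), dtype=np.int64)
    for start in range(0, len(X), chunk_size):
        block = X[start:start + chunk_size]
        # (chunk, 1, m) against (1, n, m): at most chunk_size * n * m values at once
        within = np.all(np.abs(block[:, None, :] - X[None, :, :]) <= epsilon, axis=-1)
        counts[start:start + chunk_size] = within.sum(axis=-1)
    return 1.0 / counts

rng = np.random.default_rng(0)
X = rng.random((10, 2))
epsilon = np.array([0.5, 0.5])
result = inverse_neighbor_counts(X, epsilon, chunk_size=4)
print(result)
```

Every point is within epsilon of itself, so the counts are at least 1 and the division is safe.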

How to find all distances between points in a matrix without duplicates?

I have a Nx3 matrix that contains the x,y,z coordinates of N points in 3D space. I'd like to find the absolute distances between all points without duplicates.
I tried using scipy.spatial.distance.cdist()
[see documentation here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html ]. However, the output matrix contains duplicates of distances. For example, the distance between the points P1 and P2 is calculated twice, as the distance from P1 to P2 and again as the distance from P2 to P1. See code output:
>>> from scipy.spatial import distance
>>> points = [[1, 2, 3],
... [4, 5, 6],
... [7, 8, 9]]
>>> distances = distance.cdist(points, points, 'euclidean')
>>> print(distances)
[[ 0. 5.19615242 10.39230485]
[ 5.19615242 0. 5.19615242]
[10.39230485 5.19615242 0. ]]
I'd like the output to be without duplicates. For example, find the distance between the first point and all other points, then the second point and the remaining points (excluding the first point), and so on. Ideally, this would be done in an efficient and scalable manner that preserves the order of the points. That is, once I find the distances, I'd like to query them, e.g. find distances within a certain range and be able to output the points that correspond to these distances.
It looks like in general you want a KDTree implementation, with query_pairs.
from scipy.spatial import KDTree
points_tree = KDTree(points)
points_in_radius = points_tree.query_pairs(radius) # radius: the maximum distance you choose
This will be much faster than actually computing all of the distances and applying a tolerance.
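If all unique distances are needed rather than only the pairs within a radius, one NumPy-only sketch is to index the upper triangle of the pair matrix, so each unordered pair appears once and its point indices stay available for queries. The range bounds low and high are made-up values for illustration. scipy.spatial.distance.pdist returns distances in the same condensed order.

```python
import numpy as np

points = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]], dtype=float)

# Index pairs (i, j) with i < j: (0, 1), (0, 2), (1, 2)
i, j = np.triu_indices(len(points), k=1)
distances = np.linalg.norm(points[i] - points[j], axis=1)

# Query: which pairs of points have a distance inside [low, high]?
low, high = 5.0, 6.0
mask = (distances >= low) & (distances <= high)
pairs_in_range = list(zip(i[mask].tolist(), j[mask].tolist()))
print(pairs_in_range)
```

This still computes all N(N-1)/2 distances, so for large N and a small radius the KDTree approach above is likely the better fit.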
