I'm trying to write Python code to calculate the distance between two 3D points. The points are listed as follows:
Timestamp, X, Y, Z, Distance
2613, 4.35715, 5.302030, -0.447308
2614, 7.88429, -8.401940, -0.484432
2615, 4.08796, 2.213850, -0.515359
2616, 4.35715, 5.302030, -0.447308
2617, 7.88429, -8.401940, -0.484432
I know the formula, but I'm not sure how to select the columns so I can apply the 3D distance formula to them!
This is essentially the same question as "How can the Euclidean distance be calculated with NumPy?"
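You can use numpy.linalg.norm (or scipy.linalg.norm) on the difference between the two coordinate vectors (not the timestamps).
E.g., for the points at timestamps 2613 and 2614:
np.linalg.norm(np.array([4.35715, 5.302030, -0.447308]) - np.array([7.88429, -8.401940, -0.484432]))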
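To fill the Distance column for the data in the question, the differences between consecutive rows can be taken in one step with np.diff and measured with np.linalg.norm along axis=1. A minimal sketch, assuming the Distance column should hold the distance from each point to the previous one (so the first row gets NaN); the array literal simply copies the rows above.

```python
import numpy as np

# Timestamp, X, Y, Z rows copied from the question
data = np.array([
    [2613, 4.35715, 5.302030, -0.447308],
    [2614, 7.88429, -8.401940, -0.484432],
    [2615, 4.08796, 2.213850, -0.515359],
    [2616, 4.35715, 5.302030, -0.447308],
    [2617, 7.88429, -8.401940, -0.484432],
])

xyz = data[:, 1:4]                    # select the X, Y, Z columns
steps = np.diff(xyz, axis=0)          # row i+1 minus row i, shape (4, 3)
step_lengths = np.linalg.norm(steps, axis=1)

# Distance to the previous point; the first row has no predecessor
distance = np.concatenate([[np.nan], step_lengths])
for t, d in zip(data[:, 0].astype(int), distance):
    print(t, d)
```

If the data is in a pandas DataFrame with those column names, df[["X", "Y", "Z"]].to_numpy() gives the same xyz array.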
You can try this code and see whether it gives you some ideas to start with:
# distance between 2 points in 3D
from math import pow, sqrt
def calculate_dist(point1, point2):
    x, y, z = point1
    a, b, c = point2
    distance = sqrt(pow(a - x, 2) +
                    pow(b - y, 2) +
                    pow(c - z, 2))
    return distance
point1 = (2, 3, 4)  # tuple
point2 = (1, 5, 7)
print(calculate_dist(point1, point2))
# apply to your data, e.g. to consecutive points:
# [calculate_dist(p, q) for p, q in zip(points, points[1:])]
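Since Python 3.8 the standard library already provides this formula as math.dist, which takes two sequences of coordinates of the same length. A short sketch, assuming the points are stored as a list of (X, Y, Z) tuples copied from the question, that applies it to each pair of consecutive points:

```python
import math

point1 = (2, 3, 4)
point2 = (1, 5, 7)
print(math.dist(point1, point2))  # sqrt(14), about 3.742

# (X, Y, Z) for timestamps 2613..2617, copied from the question
points = [
    (4.35715, 5.302030, -0.447308),
    (7.88429, -8.401940, -0.484432),
    (4.08796, 2.213850, -0.515359),
    (4.35715, 5.302030, -0.447308),
    (7.88429, -8.401940, -0.484432),
]
# distance from each point to the next one
steps = [math.dist(p, q) for p, q in zip(points, points[1:])]
print(steps)
```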
I have the coordinates of points along a 3D curve, through which I fit a spline like so:
from splipy import curve_factory
pts = [...] #3D coordinate points
curve = curve_factory.curve(pts)
I know that I can get a 3D point on the curve by evaluating it at a parameter value t:
point_on_curve = curve.evaluate(t)
print(point_on_curve) #outputs coordinates: (x y z)
Is it possible to do it the other way round? Is there a function or method that can tell me whether a given point lies on the curve, or almost on it? Something like:
curve.func(point) #output: True
or
curve.func(point) #output: distance to curve 0.0001 --> also part of curve
Thanks!
I've found this script by ventusff that performs an optimization to find the value of the parameter you call t (called u in the script) that gives the point on the spline closest to the external point.
Below is the code, with some changes to make it clearer for you. I've set the tolerance to 0.001.
Choosing the optimization solver and its parameter values requires a bit of study. I don't have time to do that now, but you can experiment with them.
Here SciPy is used for spline generation and evaluation, but you can easily replace it with splipy. The interesting part is the optimization, which is also performed with SciPy.
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import splprep, splev
from scipy.spatial.distance import euclidean
from scipy.optimize import fmin_bfgs
points_count = 40
phi = np.linspace(0, 2. * np.pi, points_count)
k = np.linspace(0, 2, points_count)
r = 0.5 + np.cos(phi)
x, y, z = r * np.cos(phi), r * np.sin(phi), k
tck, u = splprep([x, y, z], s=1)
points = splev(u, tck)
idx = np.random.randint(low=0, high=points_count)
noise = np.random.normal(scale=0.01)
external_point = np.array([points[0][idx], points[1][idx], points[2][idx]]) + noise
def distance_to_point(u_):
    s = splev(u_, tck)
    return euclidean(external_point, [s[0][0], s[1][0], s[2][0]])
closest_u = fmin_bfgs(distance_to_point, x0=np.array([0.0]), gtol=1e-8)
closest_point = splev(closest_u, tck)
tol = 1e-3
if euclidean(external_point, [closest_point[0][0], closest_point[1][0], closest_point[2][0]]) < tol:
    print("The point is very close to the spline.")
ax = plt.figure().add_subplot(projection='3d')
ax.plot(points[0], points[1], points[2], "r-", label="Spline")
ax.plot(external_point[0], external_point[1], external_point[2], "bo", label="External Point")
ax.plot(closest_point[0], closest_point[1], closest_point[2], "go", label="Closest Point")
plt.legend()
plt.show()
The script draws a 3D plot of the spline, the external point, and the closest point, and prints the following output:
Current function value: 0.000941
Iterations: 5
Function evaluations: 75
Gradient evaluations: 32
The point is very close to the spline.
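Starting fmin_bfgs at x0=0.0 can end in a local minimum far from the true closest point, and since the optimizer is unconstrained it can also step outside the spline's parameter range [0, 1]. A variant worth trying (my own swap, not part of the ventusff script) is a coarse search over densely sampled parameter values, followed by scipy.optimize.minimize_scalar with method="bounded" around the best sample. The helper name closest_point_on_spline and the sample count are assumptions; tck is expected to come from splprep with its default parameterization on [0, 1].

```python
import numpy as np
from scipy.interpolate import splprep, splev
from scipy.optimize import minimize_scalar

def closest_point_on_spline(tck, point, n_samples=200):
    """Return (u, closest point, distance) for a parametric spline from splprep."""
    point = np.asarray(point, dtype=float)
    # Coarse search: evaluate the spline on a dense grid of parameter values
    u_grid = np.linspace(0.0, 1.0, n_samples)
    samples = np.array(splev(u_grid, tck)).T  # shape (n_samples, 3)
    i = int(np.argmin(np.linalg.norm(samples - point, axis=1)))
    # Refine only between the neighbours of the best sample, staying inside [0, 1]
    lo = u_grid[max(i - 1, 0)]
    hi = u_grid[min(i + 1, n_samples - 1)]

    def dist(u):
        return np.linalg.norm(np.array(splev(u, tck)) - point)

    res = minimize_scalar(dist, bounds=(lo, hi), method="bounded")
    return res.x, np.array(splev(res.x, tck)), res.fun

# Example: a spline through points on the x axis from 0 to 1
xs = np.linspace(0.0, 1.0, 20)
tck, _ = splprep([xs, np.zeros_like(xs), np.zeros_like(xs)], s=0)
u, p, d = closest_point_on_spline(tck, [0.5, 0.3, 0.0])
print(u, p, d)  # d is about 0.3
tol = 1e-3
print("on the curve" if d < tol else "not on the curve")
```

Comparing the returned distance with a tolerance gives the True/False answer the question asks for; the same helper works with the tck from the script above.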
I have two sets of data points:
import random
import pandas as pd
A = pd.DataFrame({'x':[random.uniform(0, 1) for i in range(0,100)], 'y':[random.uniform(0, 1) for i in range(0,100)]})
B = pd.DataFrame({'x':[random.uniform(0, 1) for i in range(0,100)], 'y':[random.uniform(0, 1) for i in range(0,100)]})
For each of these datasets I can produce a joint plot like this:
import seaborn as sns
sns.jointplot(x=A["x"], y=A["y"], kind='kde')
sns.jointplot(x=B["x"], y=B["y"], kind='kde')
Is there a way to calculate the "common area" between these two joint plots?
By common area, I mean: if you put one joint plot "inside" the other, what is the total area of intersection? If you imagine the two joint plots as mountains and put one mountain inside the other, how much of one falls inside the other?
EDIT
To make my question clearer:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st
def plot_2d_kde(df):
    # Extract x and y
    x = df['x']
    y = df['y']
    # Define the borders
    deltaX = (max(x) - min(x))/10
    deltaY = (max(y) - min(y))/10
    xmin = min(x) - deltaX
    xmax = max(x) + deltaX
    ymin = min(y) - deltaY
    ymax = max(y) + deltaY
    # Create meshgrid
    xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    # We will fit a gaussian kernel using scipy's gaussian_kde method
    positions = np.vstack([xx.ravel(), yy.ravel()])
    values = np.vstack([x, y])
    kernel = st.gaussian_kde(values)
    f = np.reshape(kernel(positions).T, xx.shape)
    fig = plt.figure(figsize=(13, 7))
    ax = plt.axes(projection='3d')
    surf = ax.plot_surface(xx, yy, f, rstride=1, cstride=1, cmap='coolwarm', edgecolor='none')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_zlabel('PDF')
    ax.set_title('Surface plot of Gaussian 2D KDE')
    fig.colorbar(surf, shrink=0.5, aspect=5)  # add color bar indicating the PDF
    ax.view_init(60, 35)
I am interested in finding the intersection/common volume (just the number) of these two KDE plots:
plot_2d_kde(A)
plot_2d_kde(B)
Credits: The code for the kde plots is from here
I believe this is what you're looking for. I'm basically integrating over the intersection (overlap) of the two KDE distributions, i.e. over their pointwise minimum.
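In formula form (notation mine, not the answerer's), the quantity described above is the integral of the pointwise minimum of the two estimated densities \hat f_A and \hat f_B. Since each density integrates to 1, the result lies between 0 (no overlap) and 1 (identical densities). On a regular grid of N points (x_i, y_i) covering a rectangle R, it can be approximated by the mean of the minimum times the area |R|:

```latex
\operatorname{overlap}(A, B)
  = \iint \min\bigl(\hat f_A(x, y),\, \hat f_B(x, y)\bigr)\, dx\, dy
  \approx \frac{|R|}{N} \sum_{i=1}^{N} \min\bigl(\hat f_A(x_i, y_i),\, \hat f_B(x_i, y_i)\bigr)
```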
import itertools
import random
import numpy as np
import pandas as pd
import scipy.stats
A = pd.DataFrame({'x':[random.uniform(0, 1) for i in range(0,100)], 'y':[random.uniform(0, 1) for i in range(0,100)]})
B = pd.DataFrame({'x':[random.uniform(0, 1) for i in range(0,100)], 'y':[random.uniform(0, 1) for i in range(0,100)]})
# KDE for both A and B
kde_a = scipy.stats.gaussian_kde([A.x, A.y])
kde_b = scipy.stats.gaussian_kde([B.x, B.y])
min_x = min(A.x.min(), B.x.min())
min_y = min(A.y.min(), B.y.min())
max_x = max(A.x.max(), B.x.max())
max_y = max(A.y.max(), B.y.max())
print(f"x is from {min_x} to {max_x}")
print(f"y is from {min_y} to {max_y}")
x = [a[0] for a in itertools.product(np.arange(min_x, max_x, 0.01), np.arange(min_y, max_y, 0.01))]
y = [a[1] for a in itertools.product(np.arange(min_x, max_x, 0.01), np.arange(min_y, max_y, 0.01))]
# sample across roughly 100x100 points (step 0.01 over the data range)
a_dist = kde_a([x, y])
b_dist = kde_b([x, y])
# mean density times the area of the sampled rectangle approximates the integral
area = (max_x - min_x) * (max_y - min_y)
print(a_dist.sum() / len(x) * area)  # integral of A
print(b_dist.sum() / len(x) * area)  # integral of B
print(np.minimum(a_dist, b_dist).sum() / len(x) * area)  # integral of the intersection between A and B
The following code compares two ways of calculating the volume of the intersection: scipy's dblquad, and the average value over a grid.
Remarks:
For the 2D case (and with only 100 sample points), the deltas apparently need to be considerably larger than 10%; the code below uses 25%. With a delta of 10%, the calculated values for f1 and f2 are about 0.90, while in theory they should be 1.0. With a delta of 25%, these values are around 0.994.
To approximate the volume the simple way, the average needs to be multiplied by the area (here (xmax - xmin)*(ymax - ymin)). Also, the more grid points are considered, the better the approximation. The code below uses 1000x1000 grid points.
SciPy has dedicated functions for numerical integration, such as scipy.integrate.dblquad. This is much slower than the 'simple' method, but somewhat more precise. The default precision didn't work, so the code below relaxes the tolerances considerably (epsabs=1e-4, epsrel=1e-4). (dblquad returns two numbers: the approximate integral and an estimate of the absolute error. To get only the integral, the code uses dblquad()[0].)
The same approach can be used for more dimensions. For the 'simple' method, create a higher-dimensional grid (xx, yy, zz = np.mgrid[xmin:xmax:100j, ymin:ymax:100j, zmin:zmax:100j]). Note that 1000 subdivisions in each dimension would create a grid that's too large to work with (10^9 points in 3D).
With scipy.integrate, dblquad needs to be replaced by tplquad for 3 dimensions or nquad for N dimensions. This will probably be even slower, so the accuracy may need to be reduced further.
import numpy as np
import pandas as pd
import scipy.stats as st
from scipy.integrate import dblquad
df1 = pd.DataFrame({'x':np.random.uniform(0, 1, 100), 'y':np.random.uniform(0, 1, 100)})
df2 = pd.DataFrame({'x':np.random.uniform(0, 1, 100), 'y':np.random.uniform(0, 1, 100)})
# Extract x and y
x1 = df1['x']
y1 = df1['y']
x2 = df2['x']
y2 = df2['y']
# Define the borders
deltaX = (np.max([x1, x2]) - np.min([x1, x2])) / 4
deltaY = (np.max([y1, y2]) - np.min([y1, y2])) / 4
xmin = np.min([x1, x2]) - deltaX
xmax = np.max([x1, x2]) + deltaX
ymin = np.min([y1, y2]) - deltaY
ymax = np.max([y1, y2]) + deltaY
# fit a gaussian kernel using scipy’s gaussian_kde method
kernel1 = st.gaussian_kde(np.vstack([x1, y1]))
kernel2 = st.gaussian_kde(np.vstack([x2, y2]))
print('volumes via scipy`s dblquad (volume):')
# kernel((x, y)) returns a length-1 array; [0] turns it into the float dblquad expects
print(' volume_f1 =', dblquad(lambda y, x: kernel1((x, y))[0], xmin, xmax, ymin, ymax, epsabs=1e-4, epsrel=1e-4)[0])
print(' volume_f2 =', dblquad(lambda y, x: kernel2((x, y))[0], xmin, xmax, ymin, ymax, epsabs=1e-4, epsrel=1e-4)[0])
print(' volume_intersection =',
      dblquad(lambda y, x: min(kernel1((x, y))[0], kernel2((x, y))[0]), xmin, xmax, ymin, ymax, epsabs=1e-4, epsrel=1e-4)[0])
Alternatively, one can calculate the mean value over a grid of points, and multiply the result by the area of the grid. Note that np.mgrid is much faster than creating a list via itertools.
# Create meshgrid
xx, yy = np.mgrid[xmin:xmax:1000j, ymin:ymax:1000j]
positions = np.vstack([xx.ravel(), yy.ravel()])
f1 = np.reshape(kernel1(positions).T, xx.shape)
f2 = np.reshape(kernel2(positions).T, xx.shape)
intersection = np.minimum(f1, f2)
print('volumes via the mean value multiplied by the area:')
print(' volume_f1 =', np.sum(f1) / f1.size * ((xmax - xmin)*(ymax - ymin)))
print(' volume_f2 =', np.sum(f2) / f2.size * ((xmax - xmin)*(ymax - ymin)))
print(' volume_intersection =', np.sum(intersection) / intersection.size * ((xmax - xmin)*(ymax - ymin)))
Example output:
volumes via scipy`s dblquad (volume):
volume_f1 = 0.9946974276169385
volume_f2 = 0.9928998852123891
volume_intersection = 0.9046421634401607
volumes via the mean value multiplied by the area:
volume_f1 = 0.9927873844924111
volume_f2 = 0.9910132867915901
volume_intersection = 0.9028999384136771
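The remark about more dimensions can be made concrete for the grid method. Below is a sketch that works for any number of dimensions d, assuming each dataset is given as a (d, n_points) array, as gaussian_kde expects. The name kde_overlap_grid and the defaults n=40 points per axis and pad=0.25 (the 25% margin used above) are my own choices. It multiplies the sum by the volume of one grid cell rather than taking mean × box volume; the two differ by a factor n/(n-1) per axis, which is negligible for 1000 points per axis but not for 40.

```python
import numpy as np
import scipy.stats as st

def kde_overlap_grid(data1, data2, n=40, pad=0.25):
    """Approximate volumes under two KDEs and under their pointwise minimum.

    data1, data2: arrays of shape (d, n_points), as expected by gaussian_kde.
    """
    data1 = np.asarray(data1, dtype=float)
    data2 = np.asarray(data2, dtype=float)
    both = np.hstack([data1, data2])
    lo, hi = both.min(axis=1), both.max(axis=1)
    delta = (hi - lo) * pad          # widen the box, like deltaX/deltaY above
    lo, hi = lo - delta, hi + delta
    # np.mgrid with one complex-step slice per dimension builds an n x ... x n grid
    grid = np.mgrid[tuple(slice(a, b, n * 1j) for a, b in zip(lo, hi))]
    positions = grid.reshape(len(lo), -1)
    f1 = st.gaussian_kde(data1)(positions)
    f2 = st.gaussian_kde(data2)(positions)
    cell = np.prod((hi - lo) / (n - 1))  # volume of one grid cell
    return (f1.sum() * cell,
            f2.sum() * cell,
            np.minimum(f1, f2).sum() * cell)

rng = np.random.default_rng(0)
a = rng.uniform(0, 1, size=(3, 100))
b = rng.uniform(0, 1, size=(3, 100))
print(kde_overlap_grid(a, b))
```

With n points per axis the grid has n**d points, so n has to shrink quickly as d grows.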
I have a numpy array. For each point, I want to find the number of points that lie within an epsilon distance of it in every coordinate (the code below returns the reciprocal of that count).
My current code is as follows (for an n × 2 array, but in general I expect the array to be n × m):
import numpy as np
epsilon = np.array([0.5, 0.5])
# np.float was removed in NumPy 1.24; 1.0 / count already gives a float
np.array([1.0 / np.sum(np.all(np.abs(X - x) <= epsilon, axis=1)) for x in X])
But this code might not be efficient for an array of, say, 1 million rows and 50 columns. Is there a better, more efficient method?
Example data:
X = np.random.rand(10, 2)
You can solve this using broadcasting:
1 / np.sum(np.all(np.abs(X[:, None, ...] - X[None, ...]) <= epsilon, axis=-1), axis=-1)
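Note that the broadcast version builds an n × n × m intermediate array, which for 1 million rows and 50 columns means 5 × 10^13 booleans, far more memory than a typical machine has. A sketch of a different technique, a KD-tree: dividing each column by its epsilon turns the per-column test |X - x| <= epsilon into a max-norm (Chebyshev) ball of radius 1, which scipy.spatial.cKDTree.query_ball_point can count with p=np.inf and return_length=True. The helper name inverse_neighbor_counts is hypothetical, and KD-trees lose much of their advantage in high dimensions, so with 50 columns it is worth benchmarking against a chunked version of the broadcast approach.

```python
import numpy as np
from scipy.spatial import cKDTree

def inverse_neighbor_counts(X, epsilon):
    """1 / (number of rows within epsilon of each row in every coordinate), self included."""
    X = np.asarray(X, dtype=float)
    # Scale each column so that |X - x| <= epsilon becomes a max-norm ball of radius 1
    Xs = X / np.asarray(epsilon, dtype=float)
    tree = cKDTree(Xs)
    # p=np.inf selects the Chebyshev (max) norm; return_length gives counts, not index lists
    counts = tree.query_ball_point(Xs, r=1.0, p=np.inf, return_length=True)
    return 1.0 / counts

rng = np.random.default_rng(0)
X = rng.random((10, 2))
epsilon = np.array([0.5, 0.5])
print(inverse_neighbor_counts(X, epsilon))
```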
I need to implement a solver for linear programming problems. All of the constraints are of the <= form, such as
5x + 10y <= 10
There can be an arbitrary number of these constraints. Also, x >= 0 and y >= 0 implicitly.
I need to find the optimal solution (maximum) and show the feasible region in matplotlib. I've found the optimal solution by implementing the simplex method, but I can't figure out how to draw the graph.
Some approaches I've found:
This link finds the minimum of the y values of the constraint lines and uses plt.fill_between() to draw the region. But it doesn't work when I change the order of the equations: I'm not sure which y values to take the minimum of, so I can't use it for arbitrary constraints.
Finding the intersection of every pair of constraints and drawing a polygon. Not efficient.
An easier approach might be to evaluate the constraints on a grid and let matplotlib display the resulting feasible region on its own (you only provide the constraints), and then simply overlay the constraint lines on top.
import numpy as np
import matplotlib.pyplot as plt
# plot the feasible region
d = np.linspace(-2, 16, 300)
x, y = np.meshgrid(d, d)
plt.imshow(((y >= 2) & (2*y <= 25 - x) & (4*y >= 2*x - 8) & (y <= 2*x - 5)).astype(int),
           extent=(x.min(), x.max(), y.min(), y.max()), origin="lower", cmap="Greys", alpha=0.3)
# plot the lines defining the constraints
x = np.linspace(0, 16, 2000)
# y >= 2
y1 = (x*0) + 2
# 2y <= 25 - x
y2 = (25-x)/2.0
# 4y >= 2x - 8
y3 = (2*x-8)/4.0
# y <= 2x - 5
y4 = 2 * x -5
# Make plot
plt.plot(x, y1, label=r'$y\geq2$')
plt.plot(x, y2, label=r'$2y\leq25-x$')
plt.plot(x, y3, label=r'$4y\geq 2x - 8$')
plt.plot(x, y4, label=r'$y\leq 2x-5$')
plt.xlim(0,16)
plt.ylim(0,11)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.xlabel(r'$x$')
plt.ylabel(r'$y$')
plt.show()
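To apply the same idea to an arbitrary list of constraints written as in the question (a_i x + b_i y <= c_i, with x, y >= 0), the mask can be built from a coefficient matrix with broadcasting instead of writing each inequality by hand. A minimal sketch; the function names feasible_mask and plot_feasible_region, the plotting window, and the grid size are my own choices.

```python
import numpy as np
import matplotlib.pyplot as plt

def feasible_mask(A, b, xx, yy):
    """True where every constraint A[i, 0]*x + A[i, 1]*y <= b[i] holds and x, y >= 0."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    # Shape (n_constraints, ny, nx): left-hand side of each constraint at each grid point
    lhs = A[:, 0, None, None] * xx + A[:, 1, None, None] * yy
    return np.all(lhs <= b[:, None, None], axis=0) & (xx >= 0) & (yy >= 0)

def plot_feasible_region(A, b, xmax=5.0, ymax=5.0, n=400):
    xs = np.linspace(0, xmax, n)
    ys = np.linspace(0, ymax, n)
    xx, yy = np.meshgrid(xs, ys)
    mask = feasible_mask(A, b, xx, yy)
    fig, ax = plt.subplots()
    ax.imshow(mask.astype(int), extent=(0, xmax, 0, ymax), origin="lower",
              cmap="Greys", alpha=0.3)
    # Overlay each boundary line a*x + c*y = rhs
    for (a, c), rhs in zip(A, b):
        if c != 0:
            ax.plot(xs, (rhs - a * xs) / c, label=f"{a}x + {c}y <= {rhs}")
        elif a != 0:
            ax.axvline(rhs / a, label=f"{a}x <= {rhs}")
    ax.set_xlim(0, xmax)
    ax.set_ylim(0, ymax)
    ax.legend()
    return fig, ax
```

For the example in the question, plot_feasible_region([[5, 10]], [10], xmax=3, ymax=2) followed by plt.show() shades the triangle with corners (0, 0), (2, 0), and (0, 1). The order of the constraints does not matter, because the mask is the logical AND of all of them.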
This is a vertex enumeration problem. You can use the lineqs function from the intvalpy package, which visualizes a system of inequalities A x >= b for any number of lines. The function also displays the vertices of the plotted region.
Because lineqs works with A x >= b, the <= constraint is multiplied by -1; the last two rows of A mean that x, y >= 0.
from intvalpy import lineqs
import numpy as np
A = -np.array([[5, 10],
[-1, 0],
[0, -1]])
b = -np.array([10, 0, 0])
lineqs(A, b, title='Solution', color='gray', alpha=0.5, s=10, size=(15,15), save=False, show=True)
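If you would rather not depend on intvalpy, the vertex enumeration can be sketched with plain NumPy for the 2D case. This is the pairwise-intersection approach mentioned in the question; with m constraints there are only m(m-1)/2 pairs of 2 × 2 systems to solve, which is usually fast enough for modest numbers of constraints. The sketch uses the question's A x <= b convention rather than lineqs's A x >= b, assumes the feasible region is bounded, and the function name feasible_vertices is hypothetical.

```python
import itertools
import numpy as np

def feasible_vertices(A, b, tol=1e-9):
    """Corners of the 2D polygon {p : A @ p <= b}, sorted counter-clockwise."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    corners = []
    for i, j in itertools.combinations(range(len(b)), 2):
        M = A[[i, j]]
        if abs(np.linalg.det(M)) < tol:
            continue  # parallel boundary lines: no single intersection point
        p = np.linalg.solve(M, b[[i, j]])
        if np.all(A @ p <= b + tol):  # keep intersections that satisfy every constraint
            corners.append(p)
    if not corners:
        return np.empty((0, 2))
    # Merge duplicates (several lines through one corner); "+ 0.0" turns -0.0 into 0.0
    pts = np.unique(np.round(np.array(corners), 9) + 0.0, axis=0)
    center = pts.mean(axis=0)
    angles = np.arctan2(pts[:, 1] - center[1], pts[:, 0] - center[0])
    return pts[np.argsort(angles)]

# The question's example: 5x + 10y <= 10, plus x >= 0 and y >= 0 written as -x <= 0, -y <= 0
A = [[5, 10], [-1, 0], [0, -1]]
b = [10, 0, 0]
vertices = feasible_vertices(A, b)
print(vertices)
```

The sorted vertices can be passed straight to plt.fill(vertices[:, 0], vertices[:, 1], alpha=0.3) to shade the region.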
The data comes from a measurement. [Figure: plot of the measured data]
I tried using trapz twice, but I get an error: "ValueError: operands could not be broadcast together with shapes (1,255) (256,531)"
x has 256 points and y has 532 points, and Z is a 2D array of size 256 × 532. The code is below:
import numpy as np
img=np.loadtxt('focus_x.txt')
m=0
m=np.max(img)
Z=img/m
X=np.loadtxt("pixelx.txt",float)
Y=np.loadtxt("pixely.txt",float)
[X, Y] = np.meshgrid(X, Y)
volume=np.trapz(X,np.trapz(Y,Z))
The docs state that trapz should be used like this:
intermediate = np.trapz(Z, x)
result = np.trapz(intermediate, y)
trapz reduces the dimensionality of its operand (by default along the last axis), optionally using a 1D array of abscissae to determine the subintervals of integration; it does not use a mesh grid.
A complete example.
First we compute, using sympy, the integral of a simple bilinear function over a rectangular domain (0, 5) × (0, 7)
In [1]: import sympy as sp, numpy as np
In [2]: x, y = sp.symbols('x y')
In [3]: f = 1 + 2*x + y + x*y
In [4]: f.integrate((x, 0, 5)).integrate((y, 0, 7))
Out[4]: 2555/4
Now we compute the trapezoidal approximation to the integral (as it happens, the approximation is exact for a bilinear function). First we need coordinate arrays:
In [5]: x, y = np.linspace(0, 5, 11), np.linspace(0, 7, 22)
(note that the sampling differs between the two directions and also differs from the default spacing used by trapz); then we need a mesh grid on which to compute the integrand:
In [6]: X, Y = np.meshgrid(x, y)
In [7]: z = 1 + 2*X + Y + X*Y
and finally we compute the integral (multiplied by 4, so the result can be compared directly with 2555/4):
In [8]: 4*np.trapz(np.trapz(z, x), y)
Out[8]: 2555.0
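The same recipe can be wrapped in a small reusable function for data laid out with one row of Z per y value and one column per x value, which is what np.meshgrid(x, y) produces by default. This is a sketch: the name double_trapezoid is hypothetical, and it uses np.trapezoid when available because NumPy 2.0 renamed np.trapz to np.trapezoid (the old name is deprecated).

```python
import numpy as np

# NumPy 2.0 renamed np.trapz to np.trapezoid; fall back to the old name on older versions
trapezoid = getattr(np, "trapezoid", None) or np.trapz

def double_trapezoid(Z, x, y):
    """Integrate samples Z[j, i] = f(x[i], y[j]) over the rectangle spanned by x and y."""
    Z = np.asarray(Z, dtype=float)
    if Z.shape != (len(y), len(x)):
        raise ValueError(f"Z has shape {Z.shape}, expected (len(y), len(x)) = {(len(y), len(x))}")
    inner = trapezoid(Z, x, axis=1)  # integrate along x within each row: one value per y
    return trapezoid(inner, y)       # then integrate those values along y

x = np.linspace(0, 5, 11)
y = np.linspace(0, 7, 22)
X, Y = np.meshgrid(x, y)             # both of shape (len(y), len(x))
Z = 1 + 2*X + Y + X*Y
print(double_trapezoid(Z, x, y))     # approximately 638.75, i.e. 2555/4
```

The shape check makes a mismatch like the one in the question fail with a readable message. With the shapes described there (x with 256 points and Z of size 256 × 532), Z appears to have one row per x value, so it would need to be transposed first.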