Finding the centre of multiple lines using least squares approach in Python

I have a series of lines which roughly (but not exactly) intersect at some point.
I need to find the point which minimises the distance to each of the lines (in the least-squares sense). I have been trying to follow this methodology:
Nearest point to intersecting lines in 2D
When I implement this in a Python script, I get an incorrect answer.
Here is my code. Could anyone suggest what I am doing wrong, or an easier way of going about this? Each line is defined by two points, e.g. x1 and x2.
import numpy as np
def directionalv(x1,x2):
    point1=np.array(x1) #point1 and point2 define my line
    point2=np.array(x2)
    ortho= np.array([[0,-1],[1,0]]) #see wikipedia article
    subtract=point2-point1
    length=np.linalg.norm(subtract)
    fraction = np.divide(subtract,length)
    n1=ortho.dot(fraction)
    num1=n1.dot(n1.transpose())
    num = num1*(point1)
    denom=n1.dot(n1.transpose())
    return [num,denom]
n1l1=directionalv(x1,x2)
n1l2=directionalv(x3,x4)
n1l3=directionalv(x5,x6)
n1l4=directionalv(x7,x8)
n1l5=directionalv(x9,x10)
numerall=n1l1[0]+n1l2[0]+n1l3[0]+n1l4[0]+n1l5[0] #sum of (n.n^t)pi from wikipedia article
denomall=n1l1[1]+n1l2[1]+n1l3[1]+n1l4[1]+n1l5[1] #sum of n.n^t
point=(numerall/denomall)
My points are as follows:
Line 1: x1 = [615, 396], x2 = [616, 880]
Line 2: x3 = [799, 449], x4 = [449, 799]
Line 3: x5 = [396, 637], x6 = [880, 636]
Line 4: x7 = [618, 396], x8 = [618, 880]
Line 5: x9 = [483, 456], x10 = [777, 875]
Any help would be really appreciated!
Thank you for your time.

Could it simply be that the matrix should be defined in Python as two vectors, each understood as a column of the matrix rather than a row (see: How to define two-dimensional array in python)? In that case you should define the ortho matrix like this:
ortho= np.array([[0,1],[-1,0]])
Otherwise, what does the following mean?
numerall=n1l1[0]+n1l2[0]+n1l3[0]+n1l4[0]+n1l5[0] #sum of (n.n^t)pi from wikipedia article
denomall=n1l1[1]+n1l2[1]+n1l3[1]+n1l4[1]+n1l5[1] #sum of n.n^t
point=(numerall/denomall)
I do not understand your interpretation of the transpose of a matrix, and multiplying by the inverse of a matrix is not the same as a division.
Use NumPy's own matrix routines for the computation instead of implementing it yourself. See: https://docs.scipy.org/doc/numpy-1.10.4/reference/generated/numpy.matrix.html
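To make the point about the matrix inverse concrete, here is a minimal sketch of the least-squares formula, assuming each line is given by two points as in the question. The names nearest_point_to_lines and rot90 are my own, not from the question. One thing worth noting about the original code: n1 is a 1-D array there, so n1.dot(n1.transpose()) is a scalar (about 1 for a unit vector) rather than the 2x2 matrix n n^T, which would explain the wrong result.

```python
import numpy as np

def nearest_point_to_lines(lines):
    """Least-squares point closest to a set of 2D lines.

    `lines` is a sequence of (p1, p2) pairs, two points on each line.
    """
    rot90 = np.array([[0.0, -1.0], [1.0, 0.0]])  # rotates a vector by 90 degrees
    A = np.zeros((2, 2))  # running sum of n n^T
    b = np.zeros(2)       # running sum of (n n^T) p
    for p1, p2 in lines:
        p1 = np.asarray(p1, dtype=float)
        p2 = np.asarray(p2, dtype=float)
        d = p2 - p1
        n = rot90 @ (d / np.linalg.norm(d))  # unit normal of the line
        nnT = np.outer(n, n)                 # 2x2 matrix, not a scalar
        A += nnT
        b += nnT @ p1
    # Solve A x = b instead of dividing: A is a matrix.
    return np.linalg.solve(A, b)

# The five lines from the question.
lines = [
    ([615, 396], [616, 880]),
    ([799, 449], [449, 799]),
    ([396, 637], [880, 636]),
    ([618, 396], [618, 880]),
    ([483, 456], [777, 875]),
]
print(nearest_point_to_lines(lines))
```

np.linalg.solve is used rather than an explicit inverse; for a 2x2 system either would do, but solve avoids forming the inverse at all.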

Related

How to fix the issue of plotting a 2D sine wave in python

I want to generate a 2D travelling sine wave. To do this, I've set the parameters for the plane wave and generate the wave at any time instant as follows:
import numpy as np
import random
import matplotlib.pyplot as plt
f = 10 # frequency
fs = 100 # sample frequency
Ts = 1/fs # sample period
t = np.arange(0,0.5, Ts) # time index
c = 50 # speed of wave
w = 2*np.pi *f # angular frequency
k = w/c # wave number
resolution = 0.02
x = np.arange(-5, 5, resolution)
y = np.arange(-5, 5, resolution)
dx = np.array(x); M = len(dx)
dy = np.array(y); N = len(dy)
[xx, yy] = np.meshgrid(x, y);
theta = np.pi / 4 # direction of propagation
kx = k* np.cos(theta)
ky = k * np.sin(theta)
So, the plane wave would be
plane_wave = np.sin(kx * xx + ky * yy - w * t[1])
plt.figure();
plt.imshow(plane_wave,cmap='seismic',origin='lower', aspect='auto')
This gives a smooth plane wave (figure not shown). The variation of the sine wave with time, plotted with plt.figure(); plt.plot(plane_wave[2,:]), is given in a second figure (also not shown).
However, when I append plane waves at different time instants, discontinuities appear (figures 03 and 04), and I want to get rid of them.
I'm new to Python and any help will be highly appreciated. Thanks in advance.
arr = []
for count in range(len(t)):
    p = np.sin(kx * xx + ky * yy - w * t[count]); # plane wave
    arr.append(p)
arr = np.array(arr)
print(arr.shape)
pp,q,r = arr.shape
sig = np.reshape(arr, (-1, r))
print('The signal shape is :', sig.shape)
plt.figure(); plt.imshow(sig.transpose(),cmap='seismic',origin='lower', aspect='auto')
plt.xlabel('X'); plt.ylabel('Y')
plt.figure(); plt.plot(sig[2,:])

This is not so much a programming problem; it has more to do with the fact that you are using the physical quantities in a somewhat unusual way. Your plots are absolutely fine and correct.
What you seem to have misunderstood is that this is a 2D problem with a third dimension added for time. There is nothing wrong with that, but if you append snapshots of the 2D wave side by side, you are reusing the x spatial dimension to represent temporal variation, so that coordinate axis is used inconsistently. To make this more intuitive, consider two time instants separately. Does it not match your intuition that every point on the 2D plane has a different amplitude at the two instants (unless, of course, the time has advanced by a multiple of the wave period)? This is indeed the case, so when you append the two snapshots a discontinuity appears. To avoid it you would have to use either a time step equal to one period, which I believe is of no practical use, or a constant time step chosen so that the phase of the wave on the left border of the image at the current time equals the phase on the right border of the image at the previous time step. Even then the time step is fixed, and the phase at the edges of the image alternates between those two values.
The same applies to the 1D case because you use the two coordinate axes to represent the wave (x is the x spatial dimension and y is used to represent the amplitude). This is what can be seen in your last plot.
Now, what would be the solution you may ask. The solution is provided by simple inspection of the mathematical formula of the wave function. In 2D, it is a scalar function of three variables (that is, takes as input three values and outputs one) and so you need at least four dimensions to represent it. Alas, we can't perceive a fourth spatial dimension, but this is not a problem in your case as the output of the function is represented with colors. Then there are three dimensions that could be used to represent the temporal evolution of your function. All you have to do is to create a 3D array where the third dimension represents time and all 2D snapshots will be stored in the first two dimensions.
When it comes to visual representation of the results you could either use some kind of waterfall plots where the z-axis will represent time or utilize the fourth dimension we can perceive, time that is, to create an animation of the evolution of the wave.
I am not very familiar with Python, so I will only provide a generic, naive implementation. I am sure a lot of people here could simplify and/or optimise the following snippet. I assume that everything in your first two blocks of code is available, so changes are needed only in the last block you present:
arr = np.zeros(xx.shape + (len(t),)) # axes are (y, x, time); one 2D snapshot per time step
for i in range(len(t)):
    arr[:, :, i] = np.sin(kx * xx + ky * yy - w * t[i])
# Below you can plot the figures with any function you prefer or make an animation out of it
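The same 3D array can also be built without the loop by letting NumPy broadcast over a trailing time axis. This is only a sketch: plane_wave_stack is a name I made up, and the grid is coarser than in the question to keep the array small.

```python
import numpy as np

def plane_wave_stack(x, y, t, k, theta, w):
    """Return an array of shape (len(y), len(x), len(t)) of 2D snapshots."""
    xx, yy = np.meshgrid(x, y)
    kx, ky = k * np.cos(theta), k * np.sin(theta)
    # Add a trailing time axis so broadcasting fills all time steps at once.
    phase = (kx * xx + ky * yy)[:, :, np.newaxis] - w * t[np.newaxis, np.newaxis, :]
    return np.sin(phase)

f, fs, c = 10, 100, 50          # values taken from the question
w = 2 * np.pi * f
t = np.arange(0, 0.5, 1 / fs)
x = np.arange(-5, 5, 0.1)       # coarser than the question's 0.02 resolution
y = np.arange(-5, 5, 0.1)
stack = plane_wave_stack(x, y, t, w / c, np.pi / 4, w)
print(stack.shape)
```

Snapshot number i is then stack[:, :, i], which can be passed to plt.imshow as in the question, one frame at a time.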

Can this be solved with a line sweep algorithm in O(n)?

In this problem we are given n horizontal segments in the plane. The task is to find, in O(n) time, a line that intersects all the segments and has the largest possible slope, or to determine that no such line exists.
I thought about writing the conditions as inequalities, solving them to get all possible line equations, and then picking the one with the biggest slope. However, I can't see how the solution relates to anything we learned in computational geometry.
Can anyone give me a hint, or mention a related topic in computational geometry that could help?

No, this problem can't be solved by the line sweep algorithm in linear time (please see @YvesDaoust's comment). However, it can be solved in linear time by another method, described below.
Your intent to describe the intersections between the n segments and the stabbing line by inequalities was correct; you can go further and reduce your problem to a linear programming (LP) problem with two variables and 2*n constraints.
Let's denote the parameters of segment i by three numbers: xMin(i), xMax(i) and y(i). The stabbing line is described by the equation y = a*x + b. This line must intersect segment i, so for each segment we have two inequalities guaranteeing the intersection (written here for a non-negative slope a):
a*xMin(i) + b <= y(i)
a*xMax(i) + b >= y(i)
So you need to maximize the slope a subject to the 2*n constraints above, which is a well-known fixed-dimension LP problem in the two variables a and b. According to this paper by Nimrod Megiddo, LP problems with n constraints and a fixed number of variables (independent of n) can be solved in O(n) time.
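To see what the constraints look like in code, here is a small sketch that evaluates them for one fixed slope. It is not Megiddo's linear-time algorithm, only an O(n) feasibility check for a given a >= 0; the function name and the (x_min, x_max, y) tuple layout are my own assumptions.

```python
def intercept_range(a, segments):
    """Feasible interval of b for the line y = a*x + b, given a slope a >= 0.

    Each segment is (x_min, x_max, y). Returns (lo, hi); the slope is
    feasible exactly when lo <= hi.
    """
    # a*x_max + b >= y  is the same as  b >= y - a*x_max
    lo = max(y - a * x_max for x_min, x_max, y in segments)
    # a*x_min + b <= y  is the same as  b <= y - a*x_min
    hi = min(y - a * x_min for x_min, x_max, y in segments)
    return lo, hi

segments = [(0, 1, 0), (2, 3, 1)]   # made-up data
for a in (0.5, 2.0):
    lo, hi = intercept_range(a, segments)
    print(a, "feasible" if lo <= hi else "infeasible")
```

For a fixed slope the two families of inequalities reduce to a lower and an upper bound on b, which is why one pass over the segments is enough for this check.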

As the comments suggested, a line sweep algorithm is slower than O(n), so the answer is no. However, a simple O(n) approach is still possible, for example like this:
1. Find the BBOX of your intersection line
You are searching for the x0,y0,x1,y1 coordinates of the "inscribed" rectangle crossing the interior of all lines (the aqua rectangle in the original figure). This can be done in O(n).
So search all lines and get:
y0 = min(y)
y1 = max(y)
x0 = max(left_x)
x1 = min(right_x)
where left_x<right_x are the two x coordinates of each line.
2. Construct your line
In case x0>x1, the rectangle is empty and this construction gives no line (a sloped line through all segments may still exist, but this test will not find it).
For the smallest slope simply use one of the diagonals of the BBOX:
Line(x0,y0,x1,y1)
Line(x0,y1,x1,y0)
Which one depends on your coordinate system (the points might even be reversed)...
The biggest slope is a vertical line, so you can use any x inside the <x0,x1> interval, for example:
Line(x0,y0,x0,y1)
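The rectangle search above can be sketched in a few lines of Python. The function name and the (left_x, right_x, y) tuple layout are assumptions of mine; the logic is just the four min/max passes described in the answer.

```python
def common_rectangle(segments):
    """Return (x0, y0, x1, y1) as described above, or None if x0 > x1.

    Each segment is (left_x, right_x, y) with left_x < right_x.
    """
    x0 = max(left for left, right, y in segments)
    x1 = min(right for left, right, y in segments)
    y0 = min(y for left, right, y in segments)
    y1 = max(y for left, right, y in segments)
    if x0 > x1:
        return None  # no single x is shared by all segments
    return x0, y0, x1, y1

print(common_rectangle([(0, 4, 1), (1, 5, 2), (2, 6, 3)]))
```

When a rectangle is returned, a vertical line at any x between x0 and x1 crosses every segment.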

Creating a distance matrix from sets of xyz coordinates (python)

I have a list of xyz coordinates of different points from a PDB file assigned to variable x. Here is a snippet of what it looks like
[ 8.721 15.393 22.939]
[11.2 13.355 25.025]
[11.045 15.057 28.419]
[13.356 13.814 31.169]
[12.54 13.525 34.854]
[14.038 15.691 37.608]
[16.184 12.782 38.807]
[17.496 12.053 35.319]
[18.375 15.721 34.871]
[20.066 15.836 38.288]
[22.355 12.978 37.249]
[22.959 14.307 33.724]
[24.016 17.834 34.691]
[26.63 16.577 37.161]
[29.536 18.241 35.342]
[27.953 21.667 35.829]
I would like to use these points to compute a distance matrix. I have tried to use the SciPy distance_matrix function, however it does not appear to support xyz coordinates, only x and y coordinates. Is there a good way to compute this distance matrix manually?

If you can use Biopython's Bio.PDB to get these atoms, then you can get the distance between two atoms by simply subtracting them: distance = atom1 - atom2.
If you really want to compute the distance on your own, that is also simple, using the formula d = sqrt((x2 - x1)**2 + (y2 - y1)**2 + (z2 - z1)**2).
You just need to loop over the points to build a distance matrix using one of the distance methods above:
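n = len(array)
dist = [[0.0] * n for _ in range(n)] # n x n matrix; each row is a separate list
for i in range(n - 1):
    for j in range(i + 1, n):
        # distance() stands for either of the two methods described above
        dist[i][j] = dist[j][i] = distance(array[i], array[j])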
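If the coordinates are already in a NumPy array, the whole matrix can be computed with broadcasting instead of a double loop. This is a sketch with a function name of my own choosing; the three sample points are the first three rows from the question. (As far as I know, scipy.spatial.distance_matrix is documented as taking (M, K) arrays, so it should also accept 3D points, but I have not checked that against the asker's setup.)

```python
import numpy as np

def distance_matrix_xyz(points):
    """Pairwise Euclidean distances for an (n, 3) set of coordinates."""
    pts = np.asarray(points, dtype=float)                 # shape (n, 3)
    diff = pts[:, np.newaxis, :] - pts[np.newaxis, :, :]  # shape (n, n, 3)
    return np.sqrt((diff ** 2).sum(axis=-1))              # shape (n, n)

coords = [
    [8.721, 15.393, 22.939],
    [11.2, 13.355, 25.025],
    [11.045, 15.057, 28.419],
]
D = distance_matrix_xyz(coords)
print(D.round(3))
```

The intermediate array has n*n*3 entries, so for very large structures a chunked or library-based approach may be preferable.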

How to find variability of a set of Cartesian Points (xyz) or fitting/distance to 3D line and/or plane?

So I was looking at this question:
Matlab - Standard Deviation of Cartesian Points
Which basically answers my question, except the problem is I have xyz, not xy. So I don't think Ax=b would work in this case.
I have, say, 10 Cartesian points, and I want to be able to find the standard deviation of these points. Now, I don't want standard deviation of each X, Y and Z (as a result of 3 sets) but I just want to get one number.
This can be done using MATLAB or Excel.
To better understand what I'm doing, I have this desired point (1,2,3) and I recorded (1.1,2.1,2.9), (1.2,1.9,3.1) and so on. I wanted to be able to find the variability of all the recorded points.
I'm open for any other suggestions.

If you do the same thing as in the other answer you linked, it should work.
x_vals = xyz(:,1);
y_vals = xyz(:,2);
z_vals = xyz(:,3);
then make A with 3 columns,
A = [x_vals y_vals ones(size(x_vals))];
and
b = z_vals;
Then
sol=A\b;
m = sol(1);
n = sol(2);
c = sol(3);
and then
errs = (m*x_vals + n*y_vals + c) - z_vals;
After that you can use errs just as in the linked question.
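For anyone who wants to try the same plane fit outside MATLAB, here is a rough NumPy equivalent, with np.linalg.lstsq playing the role of A\b. The function name is hypothetical. Note that errs here, as in the MATLAB version, are residuals along z, not perpendicular distances to the plane.

```python
import numpy as np

def plane_fit_errors(xyz):
    """Fit z = m*x + n*y + c by least squares and return (coefficients, residuals)."""
    xyz = np.asarray(xyz, dtype=float)
    A = np.column_stack([xyz[:, 0], xyz[:, 1], np.ones(len(xyz))])
    b = xyz[:, 2]
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares solve, like A\b
    errs = A @ sol - b                            # residuals along z
    return sol, errs

# Made-up points lying exactly on z = 2*x - y + 3.
pts = [(0, 0, 3), (1, 0, 5), (0, 1, 2), (1, 1, 4), (2, 1, 6)]
sol, errs = plane_fit_errors(pts)
print(sol, errs)
```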

Randomly clustered data
If your data is not expected to be near a line or a plane, just compute the distance of each point to the centroid:
xyz_bar = mean(xyz);
M = bsxfun(@minus,xyz,xyz_bar);
d = sqrt(sum(M.^2,2)); % distances to centroid
Then you can compute variability any way you like. For example, standard deviation and RMS error:
std(d)
sqrt(mean(d.^2))
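A NumPy version of the centroid-distance idea, as a sketch (the function name is mine). MATLAB's std normalises by N-1 by default, so ddof=1 is passed to np.std to match.

```python
import numpy as np

def centroid_spread(xyz):
    """Distances to the centroid, plus their standard deviation and RMS."""
    xyz = np.asarray(xyz, dtype=float)
    d = np.linalg.norm(xyz - xyz.mean(axis=0), axis=1)  # distances to centroid
    return d, np.std(d, ddof=1), np.sqrt(np.mean(d ** 2))

# Recorded points around a target of (1, 2, 3), taken from the question.
d, sd, rms = centroid_spread([(1.1, 2.1, 2.9), (1.2, 1.9, 3.1)])
print(d, sd, rms)
```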
Data about a 3D line
If the data points are expected to be roughly along the path of a line, with some deviation from it, you might look at the distance to a best fit line. First, fit a 3D line to your points. One way is using the following parametric form of a 3D line:
x = a*t + x0
y = b*t + y0
z = c*t + z0
Generate some test data, with noise:
abc = [2 3 1]; xyz0 = [6 12 3];
t = 0:0.1:10;
xyz = bsxfun(@plus,bsxfun(@times,abc,t.'),xyz0) + 0.5*randn(numel(t),3)
plot3(xyz(:,1),xyz(:,2),xyz(:,3),'*') % to visualize
Estimate the 3D line parameters:
xyz_bar = mean(xyz) % centroid is on the line
M = bsxfun(@minus,xyz,xyz_bar); % remove mean
[~,S,V] = svd(M,0)
abc_est = V(:,1).'
abc/norm(abc) % compare actual slope coefficients
Distance from points to a 3D line:
pointCentroidSeg = bsxfun(@minus,xyz_bar,xyz);
pointCross = cross(pointCentroidSeg, repmat(abc_est,size(xyz,1),1));
errs = sqrt(sum(pointCross.^2,2))
Now you have the distance from each point to the fit line ("error" of each point). You can compute the mean, RMS, standard deviation, etc.:
>> std(errs)
ans =
0.3232
>> sqrt(mean(errs.^2))
ans =
0.7017
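The line fit and point-to-line distances translate fairly directly to NumPy. This is a sketch under the same assumptions as the MATLAB code (the centroid lies on the line, the direction is the first right singular vector); the function name and the random seed are my own choices.

```python
import numpy as np

def line_fit_distances(xyz):
    """Fit a 3D line through the centroid and return (centroid, direction, distances)."""
    xyz = np.asarray(xyz, dtype=float)
    centroid = xyz.mean(axis=0)
    M = xyz - centroid                                 # remove mean
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    direction = Vt[0]                                  # unit vector along the fitted line
    # |M x direction| is the perpendicular distance because direction has unit length.
    errs = np.linalg.norm(np.cross(M, direction), axis=1)
    return centroid, direction, errs

rng = np.random.default_rng(0)
t = np.arange(0, 10, 0.1)
xyz = np.outer(t, [2, 3, 1]) + [6, 12, 3] + 0.5 * rng.standard_normal((len(t), 3))
centroid, direction, errs = line_fit_distances(xyz)
print(direction, np.std(errs, ddof=1), np.sqrt(np.mean(errs ** 2)))
```

The sign of direction is arbitrary, so it may come out as the negative of abc/norm(abc).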
Data about a 3D plane
See David's answer.

Given a set of points, how do I approximate the major axis of its shape?

Given a "shape" drawn by the user, I would like to "normalize" it so they all have similar size and orientation. What we have is a set of points. I can approximate the size using bounding box or circle, but the orientation is a bit more tricky.
The right way to do it, I think, is to calculate the major axis of its bounding ellipse. To do that you need to calculate the eigenvectors of the covariance matrix. Doing so is likely to be too complicated for my needs, since I am only looking for a good-enough estimate. Picking the min, the max, and 20 random points could be a start. Is there an easy way to approximate this?
Edit:
I found the power method for iteratively approximating an eigenvector (Wikipedia article).
So far I am liking David's answer.

You'd be calculating the eigenvectors of a 2x2 matrix, which can be done with a few simple formulas, so it's not that complicated. In pseudocode:
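// sums are over all points, after subtracting the mean from x and y
b = (sum(x * x) - sum(y * y)) / (2 * sum(x * y))
evec1_x = b + sqrt(b ** 2 + 1)
evec1_y = 1
evec2_x = b - sqrt(b ** 2 + 1)
evec2_y = 1
// evec1 is the major axis when sum(x * y) > 0, otherwise evec2 is
// if sum(x * y) == 0 the axes are already the x and y axes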
You could even do this by summing over only some of the points to get an estimate, if you expect that your chosen subset of points would be representative of the full set.
Edit: I think x and y must be translated to zero-mean, i.e. subtract mean from all x, y first (eed3si9n).
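Here is the closed-form approach written out as a small Python sketch. The function name and return convention (major, minor) are my own. One caveat on the sign of b: solving the eigenvector equation for a vector of the form (v_x, 1) gives b = (Sxx - Syy) / (2 * Sxy), so that is what the sketch uses. It also subtracts the mean first and handles the Sxy = 0 case separately.

```python
import math

def axes_2x2(points):
    """Return (major, minor) axis directions of a 2D point set as (x, y) tuples."""
    n = len(points)
    mx = sum(x for x, y in points) / n
    my = sum(y for x, y in points) / n
    sxx = sum((x - mx) ** 2 for x, y in points)
    syy = sum((y - my) ** 2 for x, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    if sxy == 0:
        # No correlation: the axes are the coordinate axes themselves.
        return ((1.0, 0.0), (0.0, 1.0)) if sxx >= syy else ((0.0, 1.0), (1.0, 0.0))
    b = (sxx - syy) / (2 * sxy)
    r = math.sqrt(b * b + 1)
    v_plus, v_minus = (b + r, 1.0), (b - r, 1.0)
    # The larger eigenvalue belongs to v_plus when sxy > 0, to v_minus otherwise.
    return (v_plus, v_minus) if sxy > 0 else (v_minus, v_plus)

major, minor = axes_2x2([(0, 0), (2, 1), (4, 2)])
print(major, minor)
```

The returned vectors are not normalised; only their direction matters for orientation.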

Here's a thought... What if you performed a linear regression on the points and used the slope of the resulting line? If not all of the points, at least a sample of them.
The r^2 value would also give you information about the general shape. The closer to 0, the more circular/uniform the shape is (circle/square). The closer to 1, the more stretched out the shape is (oval/rectangle).

The ultimate solution to this problem is running PCA.
I wish I could find a nice little implementation for you to refer to...

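Here you go! (assuming x is an n x 2 array)
import numpy as np
def majAxis(x):
    e, v = np.linalg.eig(np.cov(x.T))  # eigenvalues/eigenvectors of the 2x2 covariance
    return v[:, np.argmax(e)]          # eigenvector with the largest eigenvalue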
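Since the original goal was to normalise orientation, here is a sketch of how the major axis could be used for that: rotate the centred points so the major axis lies along x. The function name is made up, and I use np.linalg.eigh (meant for symmetric matrices, eigenvalues in ascending order) instead of eig; that is a substitution on my part, not what the answer above uses.

```python
import numpy as np

def align_major_axis(points):
    """Centre an (n, 2) point set and rotate it so its major axis lies along x."""
    pts = np.asarray(points, dtype=float)
    centred = pts - pts.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(centred.T))  # ascending eigenvalues
    major = evecs[:, -1]                              # eigenvector of the largest one
    angle = np.arctan2(major[1], major[0])
    c, s = np.cos(-angle), np.sin(-angle)
    rot = np.array([[c, -s], [s, c]])                 # rotation by -angle
    return centred @ rot.T

out = align_major_axis([(0, 0), (1, 1), (2, 2), (3, 3)])
print(out.round(6))
```

The sign of an eigenvector is arbitrary, so two drawings of the same shape may still come out rotated by 180 degrees relative to each other; an extra rule (for example, based on where the first stroke point ends up) would be needed to fix that.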
