geometric margin derivation with unit vector? - svm

In the following graph I would like to obtain the point B:
According to the author this B is equal to:
The question that I have is how the author obtains that result. It says that A is the point x_(i), so I suppose that he projects A onto the unit vector w; and because B can be any point on the hyperplane he uses Euclidean distance to perform the calculation; something like:
B=A-dist(A,B)
B=x_(i)-Y_(i).w/||w||
Is that OK? I am not quite sure why he multiplies the value of Y_(i) by the unit vector. Why is that?
Thanks

This is because dist(A,B) is a scalar. What you need is a vectorial operation to infer the position of a point from another point in the space, so:
point(A) - Vector(W) = point(B)
Therefore, you need to convert the scalar dist(A,B) into a vector with magnitude |dist(A,B)| and the direction of W to be able to perform the operation above.
Therefore, Vector(W) above would become:
Vector(W) = dist(A,B) * unitVector(W)
Leading to the equation proposed by the author.
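As a rough numerical check of the idea above, here is a minimal Python sketch. It assumes the hyperplane is written as w·x + b = 0 and that dist(A,B) is the signed distance (w·A + b)/||w||; the function name and the sample numbers are made up for illustration and are not from the author's text.

```python
import numpy as np

def project_onto_hyperplane(a, w, b):
    """Return B = A - dist(A, B) * unitVector(W) for the hyperplane w.x + b = 0."""
    a = np.asarray(a, dtype=float)
    w = np.asarray(w, dtype=float)
    norm = np.linalg.norm(w)
    unit_w = w / norm                 # direction only, length 1
    dist = (w @ a + b) / norm         # scalar: signed distance from A to the hyperplane
    return a - dist * unit_w          # scalar * unit vector gives the displacement vector

# Hypothetical example values
w = np.array([1.0, 1.0])
b = -1.0
A = np.array([3.0, 4.0])
B = project_onto_hyperplane(A, w, b)
print(B, w @ B + b)
```

The second printed value should be (numerically) zero, which confirms that B lies on the hyperplane.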
Please let me know if this is clear enough.

Related

Ultimate struggle with a full 3d space controller

Sorry if this is a stupid question, but I'm deeply stuck working on "full 3D" space movement.
I'm trying to make a "space ship" KinematicBody controller which uses basis vectors as a rotation point and can strafe/move left, right, up, and down based on its facing direction.
The issue is that I want to use a Vector3 to store all input variables (the input strength in particular), but I can't find a convenient way to orient this vector or use its components to apply it to the velocity.
I have a sort of cheap solution that I don't like: applying a rotation to the input vector so that it "corresponds" to one of the basis vectors, but it starts to break at some angles.
Could somebody please suggest what I can change in my logic, or whether there is a way to use quaternion/matrix related methods/formulas?
I'm not sure I fully understand what you want to do, but I can give you something to work with.
I'll assume that you already have the input as a Vector3. If not, you want to see Input.get_action_strength, Input.get_axis and Input.get_vector.
I'm also assuming that the breaking situations you encountered are a case of gimbal lock. But since you are asking about applying velocity, not rotation, I'll not go into that topic.
Since you are using a KinematicBody, I suppose you would be using move_and_slide or a similar method, which works in global space. But you want the input to be based on the current orientation. Thus, you would consider the Vector3 which represents the input to be in local space. And the issue is how to go from that local space to the global space that move_and_slide et al. need.
Transform
You might be familiar with to_local and to_global, which interpret the Vector3 as a position:
var global_input_vector:Vector3 = to_global(input_vector)
And the opposite operation would be:
input_vector = to_local(global_input_vector)
The problem with these is that since these consider the Vector3 to be positions, they will translate the vector depending where the KinematicBody is. We can undo that translation:
var global_vec:Vector3 = to_global(local_vec) - global_transform.origin
And the opposite operation would be:
local_vec = to_local(global_vec + global_transform.origin)
By the way this is another way to write the same code:
var global_vec:Vector3 = (global_transform * local_vec) - global_transform.origin
And the opposite operation would be:
local_vec = global_transform.affine_inverse() * (global_vec + global_transform.origin)
Which I'm mentioning because I want you to see the similarity with the following approach.
Basis
I would rather not consider the Vector3 to be positions. Just free vectors. So, we would transform it with only the Basis, like this:
var global_vec:Vector3 = global_transform.basis * local_vec
And the opposite operation would be:
local_vec = global_transform.affine_inverse().basis * global_vec
This approach will not have the translation problem.
You can think of the Basis as a 3 by 3 matrix, and the Transform is that same matrix augmented with a translation vector (origin).
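To make the difference between transforming a position and transforming a free vector concrete, here is a small NumPy sketch (Python rather than GDScript, so it runs outside the engine). The 3x3 array stands in for the Basis and the extra vector for the origin; the helper names and the sample orientation are assumptions chosen for illustration.

```python
import numpy as np

def rotation_y(angle):
    """3x3 rotation about the Y axis, standing in for a Basis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

basis = rotation_y(np.pi / 2)          # the body is turned 90 degrees
origin = np.array([10.0, 0.0, 5.0])    # and sits away from the world origin

def to_global_position(local_point):
    # Full transform: rotate, then translate (what to_global does to a position)
    return basis @ local_point + origin

def to_global_direction(local_vec):
    # Basis only: rotate, no translation (what a velocity needs)
    return basis @ local_vec

def to_local_direction(global_vec):
    # For a pure rotation the inverse of the basis is its transpose
    return basis.T @ global_vec

forward_local = np.array([0.0, 0.0, -1.0])
as_position = to_global_position(forward_local)
as_direction = to_global_direction(forward_local)
round_trip = to_local_direction(as_direction)
print(as_position, as_direction, round_trip)
```

The position result is shifted by the origin, while the direction result keeps unit length, which is what you want when feeding a velocity to a function that works in global space.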
Quat
However, if you only want rotation, let us use quaternions instead:
var global_vec:Vector3 = global_transform.basis.get_rotation_quat() * local_vec
And the opposite operation would be:
local_vec = global_transform.affine_inverse().basis.get_rotation_quat() * global_vec
Well, actually, let us invert just the quaternion:
local_vec = global_transform.basis.get_rotation_quat().inverse() * global_vec
These will only rotate the vector (no scaling, or any other transformation, just rotation) according to the current orientation of the KinematicBody.
Rotating a Transform
If you are trying to rotate a Transform, either…
Do this (quaternion):
transform = Transform(transform.basis * Basis(quaternion), transform.origin)
Or this (quaternion):
transform = transform * Transform(Basis(quaternion), Vector3.ZERO)
Or this (axis-angle):
transform = Transform(transform.basis.rotated(axis, angle), transform.origin)
Or this (axis-angle):
transform = transform * Transform.IDENTITY.rotated(axis, angle)
Or this (Euler angles):
transform = Transform(transform.basis * Basis(Vector3(pitch, yaw, roll)), transform.origin)
Or this (Euler angles):
transform = transform * Transform(Basis(Vector3(pitch, yaw, roll)), Vector3.ZERO)
Avoid this:
transform = transform.rotated(axis, angle)
The reason is that this rotation is always before translation (i.e. this rotates around the global origin instead of the current position), and you will end up with an undesirable result.
And yes, you could use rotate_x, rotate_y and rotate_z, or set rotation of a Spatial. But sometimes you need to work with a Transform directly.
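The order of multiplication is what decides whether a rotation happens around the global origin or around the body's own position. A minimal NumPy sketch with 4x4 matrices illustrates that; it is not Godot code, and the helper names and values are assumptions for illustration only.

```python
import numpy as np

def translation(v):
    """4x4 matrix that only translates by v."""
    m = np.eye(4)
    m[:3, 3] = v
    return m

def rotation_y(angle):
    """4x4 matrix that only rotates about the Y axis."""
    c, s = np.cos(angle), np.sin(angle)
    m = np.eye(4)
    m[:3, :3] = [[c, 0.0, s],
                 [0.0, 1.0, 0.0],
                 [-s, 0.0, c]]
    return m

body = translation([5.0, 0.0, 0.0])   # body sits 5 units along X, no rotation yet
turn = rotation_y(np.pi / 2)

# Rotation applied on the left: the whole transform, position included,
# swings around the global origin.
rotated_globally = turn @ body
# Rotation applied on the right: the body turns in place.
rotated_locally = body @ turn

print(rotated_globally[:3, 3])  # position has moved
print(rotated_locally[:3, 3])   # position unchanged
```

Both results have the same orientation; only the position differs, which is why the "avoid this" variant gives surprising results for a body that is not at the origin.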
See also:
Godot/Gdscript rotate + translate from local to world space.
How to LERP between 2 angles going the longest route or path in Godot.

What's the difference between using modelViewmatrix directly and using normalMatrix instead? [duplicate]

I am working on some shaders, and I need to transform normals.
I read in a few tutorials that the way to transform normals is to multiply them by the transpose of the inverse of the modelview matrix. But I can't find an explanation of why that is so. What is the logic behind it?
It flows from the definition of a normal.
Suppose you have the normal, N, and a vector, V, a tangent vector at the same position on the object as the normal. Then by definition N·V = 0.
Tangent vectors run in the same direction as the surface of an object. So if your surface is planar then the tangent is the difference between two identifiable points on the object. So if V = Q - R where Q and R are points on the surface then if you transform the object by B:
V' = BQ - BR
= B(Q - R)
= BV
The same logic applies for non-planar surfaces by considering limits.
In this case suppose you intend to transform the model by the matrix B. So B will be applied to the geometry. Then to figure out what to do to the normals you need to solve for the matrix, A so that:
(AN)·(BV) = 0
Turning that into a row versus column thing to eliminate the explicit dot product:
[transpose(AN)](BV) = 0
Pull the transpose outside, eliminate the brackets:
transpose(N)*transpose(A)*B*V = 0
So that's "the transpose of the normal" [product with] "the transpose of the transformation we're solving for" [product with] "the known transformation matrix" [product with] "the vector on the surface of the model" = 0
But we started by stating that transpose(N)*V = 0, since that's the same as saying that N·V = 0. So to satisfy our constraints we need the middle part of the expression — transpose(A)*B — to go away.
Hence we can conclude that:
transpose(A)*B = identity
=> transpose(A) = identity*inverse(B)
=> transpose(A) = inverse(B)
=> A = transpose(inverse(B))
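A quick numerical check of this result, as a NumPy sketch. The matrix B (a non-uniform scale), the normal N, and the tangent V are arbitrary values picked for illustration.

```python
import numpy as np

# Hypothetical model transformation: stretch by 2 along X only
B = np.diag([2.0, 1.0, 1.0])

N = np.array([1.0, 1.0, 0.0])    # normal
V = np.array([1.0, -1.0, 0.0])   # tangent, perpendicular to N
assert N @ V == 0.0

A = np.linalg.inv(B).T           # A = transpose(inverse(B))

naive = (B @ N) @ (B @ V)        # transforming the normal with B itself
correct = (A @ N) @ (B @ V)      # transforming the normal with A

print(naive)    # non-zero: perpendicularity is lost
print(correct)  # zero: the transformed normal is still perpendicular
```

With a rotation or a uniform scale in place of B, the naive product would also stay at zero, which matches the remark that the original matrix is good enough in those cases.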
My favorite proof is below where N is the normal and V is a tangent vector. Since they are perpendicular their dot product is zero. M is any 3x3 invertible transformation (M^-1 * M = I). N' and V' are the vectors transformed by M.
To get some intuition, consider the shear transformation below.
Note that this does not apply to tangent vectors.
Take a look at this tutorial:
https://paroj.github.io/gltut/Illumination/Tut09%20Normal%20Transformation.html
You can imagine that when the surface of a sphere stretches (so the sphere is scaled along one axis or something similar) the normals of that surface will all 'bend' towards each other. It turns out you need to invert the scale applied to the normals to achieve this. This is the same as transforming with the Inverse Transpose Matrix. The link above shows how to derive the inverse transpose matrix from this.
Also note that when the scale is uniform, you can simply pass the original matrix as normal matrix. Imagine the same sphere being scaled uniformly along all axes, the surface will not stretch or bend, nor will the normals.
If the model matrix is made of translation, rotation and scale, you don't need to do the inverse transpose to calculate the normal matrix. Simply divide the normal by the squared scale and multiply by the model matrix and we are done. You can extend that to any matrix with perpendicular axes: just calculate the squared scale for each axis of the matrix you are using instead.
I wrote the details in my blog: https://lxjk.github.io/2017/10/01/Stop-Using-Normal-Matrix.html
I don't understand why you don't just zero out the 4th element of the direction vector before multiplying with the model matrix. No inverse or transpose needed. Think of the direction vector as the difference between two points. Move the two points with the rest of the model: they are still in the same relative position to the model. Take the difference between the two points to get the new direction, and the 4th element cancels out to zero. A lot cheaper.

(Incremental)PCA's Eigenvectors are not transposed but should be?

When we posted a homework assignment about PCA we told the course participants to pick any way of calculating the eigenvectors they found. They found multiple ways: eig, eigh (our favorite was svd). In a later task we told them to use the PCAs from scikit-learn - and were surprised that the results differed a lot more than we expected.
I toyed around a bit and we posted an explanation to the participants that either solution was correct and probably just suffered from numerical instabilities in the algorithms. However, recently I picked that file up again during a discussion with a co-worker and we quickly figured out that there's an interesting subtle change to make to get all results to be almost equivalent: Transpose the eigenvectors obtained from the SVD (and thus from the PCAs).
A bit of code to show this:
import numpy as np

def pca_eig(data):
    """Uses numpy.linalg.eig to calculate the PCA."""
    data = data.T @ data  # X^T X, a square matrix as required by eig
    val, vec = np.linalg.eig(data)
    return val, vec
versus
def pca_svd(data):
    """Uses numpy.linalg.svd to calculate the PCA."""
    u, s, v = np.linalg.svd(data)
    return s ** 2, v
These do not yield the same result. Changing the return of pca_svd to s ** 2, v.T, however, works! It makes perfect sense following the definition on Wikipedia: the SVD of X is X = UΣW^T, where
the right singular vectors W of X are equivalent to the eigenvectors of X^T X
So to get the eigenvectors we need to transpose the output v of np.linalg.svd(...).
Unless there is something else going on? Anyway, the PCA and IncrementalPCA both show wrong results (or eig is wrong? I mean, transposing that yields the same equality), and looking at the code for PCA reveals that they are doing it as I did it initially:
U, S, V = linalg.svd(X, full_matrices=False)
# flip eigenvectors' sign to enforce deterministic output
U, V = svd_flip(U, V)
components_ = V
I created a little gist demonstrating the differences (nbviewer), the first with PCA and IncPCA as they are (also no transposition of the SVD), the second with transposed eigenvectors:
Comparison without transposition of SVD/PCAs (normalized data)
Comparison with transposition of SVD/PCAs (normalized data)
As one can clearly see, in the upper image the results are not really great, while the lower image only differs in some signs, thus mirroring the results here and there.
Is this really wrong and a bug in scikit-learn? More likely I am using the math wrong – but what is right? Can you please help me?
If you look at the documentation, it's pretty clear from the shape that the eigenvectors are in the rows, not the columns.
The point of the sklearn PCA is that you can use the transform method to do the correct transformation.
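The row-versus-column convention can be checked with plain NumPy. This is only a sketch on random, made-up data: np.linalg.svd returns W^T as its third output, so the eigenvectors of X^T X sit in its rows, while np.linalg.eigh returns them in columns. Signs may differ, so the comparison uses absolute values.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random data with clearly different variance per axis, then centered
X = rng.normal(size=(50, 3)) * np.array([3.0, 2.0, 1.0])
X -= X.mean(axis=0)

# Eigen-decomposition of X^T X: eigenvectors are the COLUMNS of vecs
vals, vecs = np.linalg.eigh(X.T @ X)
order = np.argsort(vals)[::-1]          # sort by decreasing eigenvalue
vals, vecs = vals[order], vecs[:, order]

# SVD of X: the third output is W^T, so eigenvectors are its ROWS
_, s, vh = np.linalg.svd(X, full_matrices=False)

same_values = np.allclose(s ** 2, vals)
same_vectors = np.allclose(np.abs(vh.T), np.abs(vecs), atol=1e-6)
print(same_values, same_vectors)
```

So neither routine is wrong; they just lay the same vectors out differently, and a components_ array taken straight from the SVD holds one component per row.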

How can I detect and remove unneeded points in cubic bezier

Here is example image of what I want to do:
I want to calculate Path 1 from Path 2.
The screenshot was made in Inkscape, where I first created Path 1 and then added p3 to the original path. This didn't change the original path at all, because the new point is actually unneeded. So, how can I detect this point (p3) using the SVG path representation of Path 2, and calculate Path 1 from Path 2?
Basically, I'm searching for the math formulas which can help me convert (and also check whether the conversion is possible):
C 200,300 300,250 400,250 C 500,250 600,300 600,400
to
C 200,200 600,200 600,400
You're solving a constraint problem. Taking your first compound curve, and using four explicit coordinates for each subcurve, we have:
points1 = point[8];
points2 = point[4];
with the following correspondences:
points1[0] == points2[0];
points1[7] == points2[3];
direction(points1[0],points1[1]) == direction(points2[0], points2[1]);
direction(points1[6],points1[7]) == direction(points2[2], points2[3]);
we also have a constraint on the relative placement for points2[1] and points2[2] due to the tangent of the center point in your compound curve:
direction(points1[2],points1[4]) == direction(points2[1],points2[2]);
and lastly, we have a general constraint on where on- and off-curve points can be for cubic curves if we want the curve to pass through a point, which is described over at http://pomax.github.io/bezierinfo/#moulding
Taking the "abc" ratio from that section, we can check whether your compound curve parameters fit a cubic curve: if we construct a new cubic curve with points
A = points1[0];
B = points1[3];
C = points1[7];
with B at t=0.5 (in this case), then we can verify whether the resulting curve fits the constraints that must hold for this to be a legal simplification.
The main problem here is that we, in general, don't know whether the "in between start and end" point should fall on t=0.5, or whether it's a different t value. The easiest solution is to see how far that point is along the total curve (using arc length: distance = arclength(c1) / (arclength(c1) + arclength(c2)) will tell us) and use that as an initial guess for t, iterating outward on either side for a few values.
The second option is to solve a generic cubic equation for the tangent vector at your "in between" point. We form a cubic curve with points
points3 = [ points1[0], points1[1], points1[6], points1[7] ];
and then solve its derivative equations to find one or more t values that have the same tangent direction (but not magnitude!) as our in-between point. Once we have those (and we might have more than 2), we evaluate whether we can create a curve through our three points of interest with the middle point set to each of those found t values. Either one or zero of the found t values will yield a legal curve. If we have one: perfect, we found a simplification. If we find none, then the compound curve cannot be simplified into a single cubic curve.
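For the specific case in the question, where the extra point was inserted by an editor without changing the shape, there is also a more direct check than the search described above. The Python sketch below is a different technique (reversing de Casteljau subdivision), not the ABC-ratio approach: it guesses t from the lengths of the two handles around the shared point, rebuilds the candidate control points, and then verifies by splitting the candidate again. The function names are made up, and the start point (200,400) is an assumption, since the question's path data omits the initial M command.

```python
import math

def lerp(p, q, t):
    """Point at parameter t on the line from p to q (t may exceed 1)."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)

def split_cubic(curve, t):
    """de Casteljau subdivision of a cubic into two cubics at parameter t."""
    p0, p1, p2, p3 = curve
    a, b, d = lerp(p0, p1, t), lerp(p1, p2, t), lerp(p2, p3, t)
    e, f = lerp(a, b, t), lerp(b, d, t)
    g = lerp(e, f, t)
    return (p0, a, e, g), (g, f, d, p3)

def merge_cubics(left, right, tol=1e-6):
    """Return the single cubic that splits into left+right, or None."""
    in_len = math.dist(left[2], left[3])     # handle arriving at the shared point
    out_len = math.dist(right[0], right[1])  # handle leaving the shared point
    if in_len == 0 or out_len == 0:
        return None
    t = in_len / (in_len + out_len)          # a true split divides the handles t : (1 - t)
    c1 = lerp(left[0], left[1], 1 / t)             # undo the shortening of the first handle
    c2 = lerp(right[3], right[2], 1 / (1 - t))     # undo the shortening of the last handle
    candidate = (left[0], c1, c2, right[3])
    got_left, got_right = split_cubic(candidate, t)
    matches = all(math.dist(p, q) <= tol
                  for p, q in zip(got_left + got_right, left + right))
    return candidate if matches else None

# Path 2 from the question, assuming it starts at (200,400)
left = ((200, 400), (200, 300), (300, 250), (400, 250))
right = ((400, 250), (500, 250), (600, 300), (600, 400))
merged = merge_cubics(left, right)
print(merged)
```

If the function returns None, the two segments are not an exact subdivision of one cubic, and the removal would change the shape.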

Is it possible to do an algebraic curve fit with just a single pass of the sample data?

I would like to do an algebraic curve fit of 2D data points, but for various reasons - it isn't really possible to have much of the sample data in memory at once, and iterating through all of it is an expensive process.
(The reason for this is that actually I need to fit thousands of curves simultaneously based on gigabytes of data which I'm reading off disk, and which is therefore sloooooow).
Note that the number of polynomial coefficients will be limited (perhaps 5-10), so an exact fit will be extremely unlikely, but this is ok as I'm trying to find an underlying pattern in data with a lot of random noise.
I understand how one can use a genetic algorithm to fit a curve to a dataset, but this requires many passes through the sample data, and thus isn't practical for my application.
Is there a way to fit a curve with a single pass of the data, where the state that must be maintained from sample to sample is minimal?
I should add that the nature of the data is that the points may lie anywhere on the X axis between 0.0 and 1.0, but the Y values will always be either 1.0 or 0.0.
So, in Java, I'm looking for a class with the following interface:
public interface CurveFit {
    public void addData(double x, double y);
    public List<Double> getBestFit(); // Returns the polynomial coefficients
}
The class that implements this must not need to keep much data in its instance fields, no more than a kilobyte even for millions of data points. This means that you can't just store the data as you get it to do multiple passes through it later.
edit: Some have suggested that finding an optimal curve in a single pass may be impossible, however an optimal fit is not required, just as close as we can get it in a single pass.
The bare bones of an approach might be if we have a way to start with a curve, and then a way to modify it to get it slightly closer to new data points as they come in - effectively a form of gradient descent. It is hoped that with sufficient data (and the data will be plentiful), we get a pretty good curve. Perhaps this inspires someone to a solution.
Yes, it is a projection. For
y = X beta + error
where lowercased terms are vectors, and X is a matrix, you have the solution vector
\hat{beta} = inverse(X'X) X' y
as per the OLS page. You almost never want to compute this directly but rather use LR, QR or SVD decompositions. References are plentiful in the statistics literature.
If your problem has only one parameter (and x is hence a vector as well) then this reduces to just summation of cross-products between y and x.
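For a polynomial, X'X and X'y are just sums over the samples, so they can be accumulated one point at a time with constant memory. Here is a minimal Python sketch of that idea, shaped like the interface in the question; the class and method names are hypothetical. It solves the accumulated normal equations with a least-squares solver, and as noted above, forming X'X directly can become numerically unreliable as the number of coefficients grows.

```python
import numpy as np

class StreamingPolyFit:
    """Single-pass polynomial least squares via running sums of X'X and X'y."""

    def __init__(self, degree):
        n = degree + 1
        self.xtx = np.zeros((n, n))   # accumulates X'X
        self.xty = np.zeros(n)        # accumulates X'y

    def add_data(self, x, y):
        row = x ** np.arange(len(self.xty))   # one row of X: [1, x, x^2, ...]
        self.xtx += np.outer(row, row)
        self.xty += row * y

    def best_fit(self):
        # Solve (X'X) beta = X'y; lstsq tolerates a rank-deficient X'X
        coeffs, *_ = np.linalg.lstsq(self.xtx, self.xty, rcond=None)
        return coeffs                 # lowest power first

fit = StreamingPolyFit(degree=2)
for x in np.linspace(0.0, 1.0, 20):
    fit.add_data(x, 1.0 + 2.0 * x - 3.0 * x * x)   # made-up noiseless samples
print(fit.best_fit())
```

The state is (degree + 1)^2 + (degree + 1) doubles regardless of how many points are added, so thousands of such fits can be kept in memory at once.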
If you don't mind that you'll get a straight line "curve", then you only need six variables for any amount of data. Here's the source code that's going into my upcoming book; I'm sure that you can figure out how the DataPoint class works:
Interpolation.h:
#ifndef __INTERPOLATION_H
#define __INTERPOLATION_H
#include "DataPoint.h"
class Interpolation
{
private:
    int m_count;
    double m_sumX;
    double m_sumXX; /* sum of X*X */
    double m_sumXY; /* sum of X*Y */
    double m_sumY;
    double m_sumYY; /* sum of Y*Y */
public:
    Interpolation();
    void addData(const DataPoint& dp);
    double slope() const;
    double intercept() const;
    double interpolate(double x) const;
    double correlate() const;
};
#endif // __INTERPOLATION_H
Interpolation.cpp:
#include <cmath>
#include "Interpolation.h"
Interpolation::Interpolation()
{
    m_count = 0;
    m_sumX = 0.0;
    m_sumXX = 0.0;
    m_sumXY = 0.0;
    m_sumY = 0.0;
    m_sumYY = 0.0;
}
void Interpolation::addData(const DataPoint& dp)
{
    m_count++;
    m_sumX += dp.getX();
    m_sumXX += dp.getX() * dp.getX();
    m_sumXY += dp.getX() * dp.getY();
    m_sumY += dp.getY();
    m_sumYY += dp.getY() * dp.getY();
}
double Interpolation::slope() const
{
    return (m_sumXY - (m_sumX * m_sumY / m_count)) /
           (m_sumXX - (m_sumX * m_sumX / m_count));
}
double Interpolation::intercept() const
{
    return (m_sumY / m_count) - slope() * (m_sumX / m_count);
}
double Interpolation::interpolate(double X) const
{
    return intercept() + slope() * X;
}
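double Interpolation::correlate() const
{
    // Pearson correlation from the running sums, centered the same way as slope()
    double covXY = m_sumXY - (m_sumX * m_sumY / m_count);
    double varX = m_sumXX - (m_sumX * m_sumX / m_count);
    double varY = m_sumYY - (m_sumY * m_sumY / m_count);
    return covXY / std::sqrt(varX * varY);
}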
Why not use a ring buffer of some fixed size (say, the last 1000 points) and do a standard QR decomposition-based least squares fit to the buffered data? Once the buffer fills, each time you get a new point you replace the oldest and re-fit. That way you have a bounded working set that still has some data locality, without all the challenges of live stream (memoryless) processing.
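A minimal Python sketch of that ring-buffer idea, assuming NumPy is available; the class name, method names, and the window size are illustrative choices, not part of any existing library.

```python
from collections import deque

import numpy as np

class WindowedPolyFit:
    """Least-squares polynomial fit over the most recent `window` points."""

    def __init__(self, degree, window=1000):
        self.degree = degree
        self.points = deque(maxlen=window)   # oldest point drops out automatically

    def add_data(self, x, y):
        self.points.append((x, y))

    def best_fit(self):
        xs, ys = zip(*self.points)
        # Columns are 1, x, x^2, ... so coefficients come out lowest power first
        a = np.vander(xs, self.degree + 1, increasing=True)
        q, r = np.linalg.qr(a)               # reduced QR: a = q @ r
        return np.linalg.solve(r, q.T @ np.asarray(ys, dtype=float))

fit = WindowedPolyFit(degree=1, window=5)
for x in range(5):
    fit.add_data(float(x), -7.0)             # old points that will be pushed out
for x in range(5, 10):
    fit.add_data(float(x), 2.0 * x + 1.0)    # recent points on y = 1 + 2x
print(fit.best_fit())
```

Only the last five points influence the result here, so the fit tracks recent data at the cost of forgetting everything older than the window.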
Are you limiting the number of polynomial coefficients (i.e. fitting to a max power of x in your polynomial)?
If not, then you don't need a "best fit" algorithm - you can always fit N data points EXACTLY to a polynomial of N coefficients.
Just use matrices to solve N simultaneous equations for N unknowns (the N coefficients of the polynomial).
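As a small illustration of the exact-fit case, here is a Python sketch using NumPy. It assumes all x values are distinct (otherwise the system is singular), and the function name and sample points are made up.

```python
import numpy as np

def exact_polynomial(xs, ys):
    """Coefficients (lowest power first) of the polynomial through N points."""
    a = np.vander(xs, increasing=True)   # N x N matrix with rows [1, x, x^2, ...]
    return np.linalg.solve(a, np.asarray(ys, dtype=float))

# Three hypothetical points lying on y = 1 + x^2
coeffs = exact_polynomial([0.0, 1.0, 2.0], [1.0, 2.0, 5.0])
print(coeffs)
```

With noisy data this reproduces the noise exactly, which is why limiting the number of coefficients and doing a least-squares fit is usually what is wanted.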
If you are limiting to a max number of coefficients, what is your max?
Following your comments and edit:
What you want is a low-pass filter to filter out noise, not fit a polynomial to the noise.
Given the nature of your data:
the points may lie anywhere on the X axis between 0.0 and 1.0, but the Y values will always be either 1.0 or 0.0.
Then you don't need even a single pass, as these two lines will pass exactly through every point:
X = [0.0 ... 1.0], Y = 0.0
X = [0.0 ... 1.0], Y = 1.0
Two short line segments, unit length, and every point falls on one line or the other.
Admittedly, an algorithm to find a good curve fit for arbitrary points in a single pass is interesting, but (based on your question), that's not what you need.
Assuming that you don't know which point should belong to which curve, something like a Hough Transform might provide what you need.
The Hough Transform is a technique that allows you to identify structure within a data set. One use is for computer vision, where it allows easy identification of lines and borders within the field of sight.
Advantages for this situation:
Each point need be considered only once
You don't need to keep a data structure for each candidate line, just one (complex, multi-dimensional) structure
Processing of each line is simple
You can stop at any point and output a set of good matches
You never discard any data, so it's not reliant on any accidental locality of references
You can trade off between accuracy and memory requirements
Isn't limited to exact matches, but will highlight partial matches too.
An approach
To find cubic fits, you'd construct a 4-dimensional Hough space, into which you'd project each of your data-points. Hotspots within Hough space would give you the parameters for the cubic through those points.
You need the solution to an overdetermined linear system. The popular methods are Normal Equations (not usually recommended), QR factorization, and singular value decomposition (SVD). Wikipedia has decent explanations, Trefethen and Bau is very good. Your options:
1. Out-of-core implementation via the normal equations. This requires the product A'A where A has many more rows than columns (so the result is very small). The matrix A is completely defined by the sample locations so you don't have to store it, thus computing A'A is reasonably cheap (very cheap if you don't need to hit memory for the node locations). Once A'A is computed, you get the solution in one pass through your input data, but the method can be unstable.
2. Implement an out-of-core QR factorization. Classical Gram-Schmidt will be fastest, but you have to be careful about stability.
3. Do it in-core with distributed memory (if you have the hardware available). Libraries like PLAPACK and SCALAPACK can do this, and the performance should be much better than option 1. The parallel scalability is not fantastic, but will be fine if it's a problem size that you would even think about doing in serial.
4. Use iterative methods to compute an SVD. Depending on the spectral properties of your system (maybe after preconditioning) this could converge very fast and does not require storage for the matrix (which in your case has 5-10 columns, each of which is the size of your input data). A good library for this is SLEPc; you only have to provide the product of the Vandermonde matrix with a vector (so you only need to store the sample locations). This is very scalable in parallel.
I believe I found the answer to my own question based on a modified version of this code. For those interested, my Java code is here.

Resources