In VAE tutorial, kl-divergence of two Normal Distributions is defined by:
And in many code, such as here, hereand here, the code is implemented as:
KL_loss = -0.5 * torch.sum(1 + logv - mean.pow(2) - logv.exp())
or
def latent_loss(z_mean, z_stddev):
mean_sq = z_mean * z_mean
stddev_sq = z_stddev * z_stddev
return 0.5 * torch.mean(mean_sq + stddev_sq - torch.log(stddev_sq) - 1)
How are they related? why there is not any "tr" or ".transpose()" in code?
The expressions in the code you posted assume X is an uncorrelated multi-variate Gaussian random variable. This is apparent by the lack of cross terms in the determinant of the covariance matrix. Therefore the mean vector and covariance matrix take the forms
Using this we can quickly derive the following equivalent representations for the components of the original expression
Substituting these back into the original expression gives
I need a solution to project a 2d point onto a 2d line at certain Direction .Here's what i've got so far : This is how i do orthogonal projection :
CVector2d project(Line line , CVector2d point)
{
CVector2d A = line.end - line.start;
CVector2d B = point - line start;
float dot = A.dotProduct(B);
float mag = A.getMagnitude();
float md = dot/mag;
return CVector2d (line.start + A * md);
}
Result :
(Projecting P onto line and the result is Pr):
but i need to project the point onto the line at given DIRECTION which should return a result like this (project point P1 onto line at specific Direction calculate Pr) :
How should I take Direction vector into account to calculate Pr ?
I can come up with 2 methods out of my head.
Personally I would do this using affine transformations (but seems you don not have this concept as you are using vectors not points). The procedure with affine transformations is easy. Rotate the points to one of the cardinal axes read the coordinate of your point zero the other value and inverse transform back. The reason for this strategy is that nearly all transformation procedures reduce to very simple human understandable operations with the affine transformation scheme. So no real work to do once you have the tools and data structures at hand.
However since you didn't see this coming I assume you want to hear a vector operation instead (because you either prefer the matrix operation or run away when its suggested, tough its the same thing). So you have the following situation:
This expressed as a equation system looks like (its intentionally this way to show you that it is NOT code but math at this point):
line.start.x + x*(line.end.x - line.start.x)+ y*direction.x = point.x
line.start.y + x*(line.end.y - line.start.y)+ y*direction.y = point.y
now this can be solved for x (and y)
x = (direction.y * line.start.x - direction.x * line.start.y -
direction.y * point.x + direction.x * point.y) /
(direction.y * line.end.x - direction.x * line.end.y -
direction.y * line.start.x + direction.x * line.start.y);
// the solution for y can be omitted you dont need it
y = -(line.end.y * line.start.x - line.end.x * line.start.y -
line.end.y * point.x + line.start.y * point.x + line.end.x * point.y -
line.start.x point.y)/
(-direction.y * line.end.x + direction.x * line.end.y +
direction.y * line.start.x - direction.x * line.start.y)
Calculation done with mathematica if I didn't copy anything wrong it should work. But I would never use this solution because its not understandable (although it is high school grade math, or at least it is where I am). But use space transformation as described above.
I'm modding a game called Mount&Blade, currently trying to implement lightmapping through custom shaders.
As the in-game format doesn't allows more than one UV map per model and I need to carry the info of a second, non-overlapping parametrization somewhere, a field of four uints (RGBA, used for per-vertex coloring) is my only possibility.
At first thought about just using U,V=R,G but the precision isn't good enough.
Now I'm trying to encode them with the maximum precision available, using two fields (16bit) per coordinate. Snip of my Python exporter:
def decompose(a):
a=int(a*0xffff) #fill the entire range to get the maximum precision
aa =(a&0xff00)>>8 #decompose the first half and save it as an 8bit uint
ab =(a&0x00ff) #decompose the second half
return aa,ab
def compose(na,nb):
return (na<<8|nb)/0xffff
I'd like to know how to do the second part (composing, or unpacking it) in HLSL (DX9, shader model 2.0). Here's my try, close, but doesn't works:
//compose UV from n=(na<<8|nb)/0xffff
float2 thingie = float2(
float( ((In.Color.r*255.f)*256.f)+
(In.Color.g*255.f) )/65535.f,
float( ((In.Color.b*255.f)*256.f)+
(In.Color.w*255.f) )/65535.f
);
//sample the lightmap at that position
Output.RGBColor = tex2D(MeshTextureSamplerHQ, thingie);
Any suggestion or ingenious alternative is welcome.
Remember to normalize aa and ab after you decompose a.
Something like this:
(u1, u2) = decompose(u)
(v1, v2) = decompose(v)
color.r = float(u1) / 255.f
color.g = float(u2) / 255.f
color.b = float(v1) / 255.f
color.a = float(v2) / 255.f
The pixel shader:
float2 texc;
texc.x = (In.Color.r * 256.f + In.Color.g) / 257.f;
texc.y = (In.Color.b * 256.f + In.Color.a) / 257.f;
I have a single of x-y coordinate system
This diagram should represent what you've told me.
The key point, is to express [x2],[y2] in CS1. (I can't use latex here so let's assume that [A] means the vector A, |A| is the length of the vector A)
[v2] = v2x * [x2] + v2y * [y2]
Since we have well defined [v1] and [d2], we can calculate [x']
[x`] = [d2] - [v1]
From [x'] we can calculate x2
[x2] = (|x2|/|x'|)[x`] = (|x1|/|x'|)[x'] since |x1| = |x2|
From x2 we can calculate y2, although I don't remember how. It's a simple 90° rotation.
Should be this:
y2x = - x2y
y2y = x2x
Once we have expressed x2,y2 in CS1, we can compute v2
v2 = v2x * [x2] + v2y * [y2] = v2x * (x2x*[x1]+x2y*[y1]) + v2y * (y2x*[x1]+y2y*[y1])
= (v2xx2x + v2yy2x)[x1] + (v2xx2y + v2yy2y) [y1] // Hope I didn't make any mistake here :)
And finally
[X] = [v1] + [v2]
I think the best option is to create a vector class and do all the math using vector algebra. You just need to define 3 operation: Addition, ScalarMultiplication, 90Rotation.
Given two image buffers (assume it's an array of ints of size width * height, with each element a color value), how can I map an area defined by a quadrilateral from one image buffer into the other (always square) image buffer? I'm led to understand this is called "projective transformation".
I'm also looking for a general (not language- or library-specific) way of doing this, such that it could be reasonably applied in any language without relying on "magic function X that does all the work for me".
An example: I've written a short program in Java using the Processing library (processing.org) that captures video from a camera. During an initial "calibrating" step, the captured video is output directly into a window. The user then clicks on four points to define an area of the video that will be transformed, then mapped into the square window during subsequent operation of the program. If the user were to click on the four points defining the corners of a door visible at an angle in the camera's output, then this transformation would cause the subsequent video to map the transformed image of the door to the entire area of the window, albeit somewhat distorted.
Using linear algebra is much easier than all that geometry! Plus you won't need to use sine, cosine, etc, so you can store each number as a rational fraction and get the exact numerical result if you need it.
What you want is a mapping from your old (x,y) co-ordinates to your new (x',y') co-ordinates. You can do it with matrices. You need to find the 2-by-4 projection matrix P such that P times the old coordinates equals the new co-ordinates. We'll assume that you're mapping lines to lines (not, for instance, straight lines to parabolas). Because you have a projection (parallel lines don't stay parallel) and translation (sliding), you need a factor of (xy) and (1), too. Drawn as matrices:
[x ]
[a b c d]*[y ] = [x']
[e f g h] [x*y] [y']
[1 ]
You need to know a through h so solve these equations:
a*x_0 + b*y_0 + c*x_0*y_0 + d = i_0
a*x_1 + b*y_1 + c*x_1*y_1 + d = i_1
a*x_2 + b*y_2 + c*x_2*y_2 + d = i_2
a*x_3 + b*y_3 + c*x_3*y_3 + d = i_3
e*x_0 + f*y_0 + g*x_0*y_0 + h = j_0
e*x_1 + f*y_1 + g*x_1*y_1 + h = j_1
e*x_2 + f*y_2 + g*x_2*y_2 + h = j_2
e*x_3 + f*y_3 + g*x_3*y_3 + h = j_3
Again, you can use linear algebra:
[x_0 y_0 x_0*y_0 1] [a e] [i_0 j_0]
[x_1 y_1 x_1*y_1 1] * [b f] = [i_1 j_1]
[x_2 y_2 x_2*y_2 1] [c g] [i_2 j_2]
[x_3 y_3 x_3*y_3 1] [d h] [i_3 j_3]
Plug in your corners for x_n,y_n,i_n,j_n. (Corners work best because they are far apart to decrease the error if you're picking the points from, say, user-clicks.) Take the inverse of the 4x4 matrix and multiply it by the right side of the equation. The transpose of that matrix is P. You should be able to find functions to compute a matrix inverse and multiply online.
Where you'll probably have bugs:
When computing, remember to check for division by zero. That's a sign that your matrix is not invertible. That might happen if you try to map one (x,y) co-ordinate to two different points.
If you write your own matrix math, remember that matrices are usually specified row,column (vertical,horizontal) and screen graphics are x,y (horizontal,vertical). You're bound to get something wrong the first time.
EDIT
The assumption below of the invariance of angle ratios is incorrect. Projective transformations instead preserve cross-ratios and incidence. A solution then is:
Find the point C' at the intersection of the lines defined by the segments AD and CP.
Find the point B' at the intersection of the lines defined by the segments AD and BP.
Determine the cross-ratio of B'DAC', i.e. r = (BA' * DC') / (DA * B'C').
Construct the projected line F'HEG'. The cross-ratio of these points is equal to r, i.e. r = (F'E * HG') / (HE * F'G').
F'F and G'G will intersect at the projected point Q so equating the cross-ratios and knowing the length of the side of the square you can determine the position of Q with some arithmetic gymnastics.
Hmmmm....I'll take a stab at this one. This solution relies on the assumption that ratios of angles are preserved in the transformation. See the image for guidance (sorry for the poor image quality...it's REALLY late). The algorithm only provides the mapping of a point in the quadrilateral to a point in the square. You would still need to implement dealing with multiple quad points being mapped to the same square point.
Let ABCD be a quadrilateral where A is the top-left vertex, B is the top-right vertex, C is the bottom-right vertex and D is the bottom-left vertex. The pair (xA, yA) represent the x and y coordinates of the vertex A. We are mapping points in this quadrilateral to the square EFGH whose side has length equal to m.
Compute the lengths AD, CD, AC, BD and BC:
AD = sqrt((xA-xD)^2 + (yA-yD)^2)
CD = sqrt((xC-xD)^2 + (yC-yD)^2)
AC = sqrt((xA-xC)^2 + (yA-yC)^2)
BD = sqrt((xB-xD)^2 + (yB-yD)^2)
BC = sqrt((xB-xC)^2 + (yB-yC)^2)
Let thetaD be the angle at the vertex D and thetaC be the angle at the vertex C. Compute these angles using the cosine law:
thetaD = arccos((AD^2 + CD^2 - AC^2) / (2*AD*CD))
thetaC = arccos((BC^2 + CD^2 - BD^2) / (2*BC*CD))
We map each point P in the quadrilateral to a point Q in the square. For each point P in the quadrilateral, do the following:
Find the distance DP:
DP = sqrt((xP-xD)^2 + (yP-yD)^2)
Find the distance CP:
CP = sqrt((xP-xC)^2 + (yP-yC)^2)
Find the angle thetaP1 between CD and DP:
thetaP1 = arccos((DP^2 + CD^2 - CP^2) / (2*DP*CD))
Find the angle thetaP2 between CD and CP:
thetaP2 = arccos((CP^2 + CD^2 - DP^2) / (2*CP*CD))
The ratio of thetaP1 to thetaD should be the ratio of thetaQ1 to 90. Therefore, calculate thetaQ1:
thetaQ1 = thetaP1 * 90 / thetaD
Similarly, calculate thetaQ2:
thetaQ2 = thetaP2 * 90 / thetaC
Find the distance HQ:
HQ = m * sin(thetaQ2) / sin(180-thetaQ1-thetaQ2)
Finally, the x and y position of Q relative to the bottom-left corner of EFGH is:
x = HQ * cos(thetaQ1)
y = HQ * sin(thetaQ1)
You would have to keep track of how many colour values get mapped to each point in the square so that you can calculate an average colour for each of those points.
I think what you're after is a planar homography, have a look at these lecture notes:
http://www.cs.utoronto.ca/~strider/vis-notes/tutHomography04.pdf
If you scroll down to the end you'll see an example of just what you're describing. I expect there's a function in the Intel OpenCV library which will do just this.
There is a C++ project on CodeProject that includes source for projective transformations of bitmaps. The maths are on Wikipedia here. Note that so far as i know, a projective transformation will not map any arbitrary quadrilateral onto another, but will do so for triangles, you may also want to look up skewing transforms.
If this transformation has to look good (as opposed to the way a bitmap looks if you resize it in Paint), you can't just create a formula that maps destination pixels to source pixels. Values in the destination buffer have to be based on a complex averaging of nearby source pixels or else the results will be highly pixelated.
So unless you want to get into some complex coding, use someone else's magic function, as smacl and Ian have suggested.
Here's how would do it in principle:
map the origin of A to the origin of B via a traslation vector t.
take unit vectors of A (1,0) and (0,1) and calculate how they would be mapped onto the unit vectors of B.
this gives you a transformation matrix M so that every vector a in A maps to M a + t
invert the matrix and negate the traslation vector so for every vector b in B you have the inverse mapping b -> M-1 (b - t)
once you have this transformation, for each point in the target area in B, find the corresponding in A and copy.
The advantage of this mapping is that you only calculate the points you need, i.e. you loop on the target points, not the source points. It was a widely used technique in the "demo coding" scene a few years back.