Microsoft.DirectX.Vector3.Normalize() inconsistency - graphics

Two ways to normalize a Vector3 object; by calling Vector3.Normalize() and the other by normalizing from scratch:
class Tester {
static Vector3 NormalizeVector(Vector3 v)
{
float l = v.Length();
return new Vector3(v.X / l, v.Y / l, v.Z / l);
}
public static void Main(string[] args)
{
Vector3 v = new Vector3(0.0f, 0.0f, 7.0f);
Vector3 v2 = NormalizeVector(v);
Debug.WriteLine(v2.ToString());
v.Normalize();
Debug.WriteLine(v.ToString());
}
}
The code above produces this:
X: 0
Y: 0
Z: 1
X: 0
Y: 0
Z: 0.9999999
Why?
(Bonus points: Why Me?)

Look how they implemented it (e.g. in asm).
Maybe they wanted to be faster and produced something like:
l = 1 / v.length();
return new Vector3(v.X * l, v.Y * l, v.Z * l);
to trade 2 divisions against 3 multiplications (because they thought mults were faster than divs (which is for modern fpus most often not valid)). This introduced one level more of operation, so the less precision.
This would be the often cited "premature optimization".

Don't care about this. There's always some error involved when using floats. If you're curious, try changing to double and see if this still happens.

You should expect this when using floats, the basic reason being that the computer processes in binary and this doesn't map exactly to decimal.
For an intuitive example of issues between different bases consider the fraction 1/3. It cannot be represented exactly in Decimal (it's 0.333333.....) but can be in Terniary (as 0.1).
Generally these issues are a lot less obvious with doubles, at the expense of computing costs (double the number of bits to manipulate). However in view of the fact that a float level of precision was enough to get man to the moon then you really shouldn't obsess :-)
These issues are sort of computer theory 101 (as opposed to programming 101 - which you're obviously well beyond), and if your heading towards Direct X code where similar things can come up regularly I'd suggest it might be a good idea to pick up a basic computer theory book and read it quickly.

You have here an interesting discussion about String formatting of floats.
Just for reference:
Your number requires 24 bits to be represented, which means that you are using up the whole mantissa of a float (23bits + 1 implied bit).
Single.ToString () is ultimately implemented by a native function, so I cannot tell for sure what is going on, but my guess is that it uses the last digit to round the whole mantissa.
The reason behind this could be that you often get numbers that cannot be represented exactly in binary, so you would get a long mantissa; for instance, 0.01 is represented internally as 0.00999... as you can see by writing:
float f = 0.01f;
Console.WriteLine ("{0:G}", f);
Console.WriteLine ("{0:G}", (double) f);
by rounding at the seventh digit, you will get back "0.01", which is what you would have expected.
For what seen above, numbers with only 7 digits will not show this problem, as you already saw.
Just to be clear: the rounding is taking place only when you convert your number to a string: your calculations, if any, will use all the available bits.
Floats have a precision of 7 digits externally (9 internally), so if you go above that then rounding (with potential quirks) is automatic.
If you drop the float down to 7 digits (for instance, 1 to the left, 6 to the right) then it will work out and the string conversion will as well.
As for the bonus points:
Why you ? Because this code was 'eager to blow on you'.
(Vulcan... blow... ok.
Lamest.
Punt.
Ever)

If your code is broken by minute floating point rounding errors, then I'm afraid you need to fix it, as they're just a fact of life.

Related

Solving math with integers larger than any available integer data type

In some programming competitions where the numbers are larger than any available integer data type, we often use strings instead.
Question 1:
Given these large numbers, how to calculate e and f in the below expression?
(a/b) + (c/d) = e/f
note: GCD(e,f) = 1, i.e. they must be in minimised form. For example {e,f} = {1,2} rather than {2,4}.
Also, all a,b,c,d are large numbers known to us.
Question 2:
Can someone also suggest a way to find GCD of two big numbers (bigger than any available integer type)?
I would suggest using full bytes or words rather than strings.
It is relatively easy to think in base 256 instead of base 10 and a lot more efficient for the processor to not do multiplication and division by 10 all the time. Ideally, choose a word size that is half the processor's natural word size, as that makes carry easy to implement. Of course thinking in base 64K or 4G is slightly more complex, but even better than base 256.
The only downside is generating the initial big numbers from the ascii input, which you get for free in base 10. Using a larger word size you can make this more efficient by processing a number of digits initially into a single word (eg 9 digits at a time into 4G), then performing a long multiply of that single word into the correct offset in your large integer format.
A compromise might be to run your engine in base 1 billion: This will still be 9 or 81 times more efficient than using base 10!
The simplest way to solve this equation is to multiply a/b * d/d and c/d * b/b so they both have the common denominator b*d.
I think you will then need to prime factorise your big numbers e and f to find any common factors. Remember to search again for the same factor squared.
Of course, that means you have to write a prime generating sieve. You only need to generate factors up to the square root, or half the digits of the min value of e and f.
You could prime factorise b and d to get a lower initial denominator, but you will need to do it again anyway after the addition.
I think that the way to solve this is to separate the problem:
Process the input numbers as an array of characters (ie. std::string)
Make a class where each object can store an std::list (or similar) that represents one of the large numbers, and can do the needed arithmetic with your data
You can then solve your problems normally, without having to worry about your large inputs causing overflow.
Here's a webpage that explains how you can have such an arithmetic class (with sample code in C++ showing addition).
Once you have such an arithmetic class, you no longer need to worry about how to store the data or any overflow.
I get the impression that you already know how to find the GCD when you don't have overflow issues, but just in case, here's an explanation of finding the GCD (with C++ sample code).
As for the specific math problem:
// given formula: a/b + c/d = e/f
// = ( ( a*d + b*c ) / ( b*d ) )
// Define some variables here to save on copying
// (I assume that your class that holds the
// large numbers is called "ARITHMETIC")
ARITHMETIC numerator = a*d + b*c;
ARITHMETIC denominator = b*d;
ARITHMETIC gcd = GCD( numerator , denominator );
// because we know that GCD(e,f) is 1, this implies:
ARITHMETIC e = numerator / gcd;
ARITHMETIC f = denominator / gcd;

Numerical differentiation using Cauchy (CIF)

I am trying to create a module with a mathematical class for Taylor series, to have it easily accessible for other projects. Hence I wish to optimize it as far as I can.
For those who are not too familiar with Taylor series, it will be a necessity to be able to differentiate a function in a point many times. Given that the normal definition of the mathematical derivative of a function will require immense precision for higher order derivatives, I've decided to use Cauchy's integral formula instead. With a little bit of work, I've managed to rearrange the formula a little bit, as you can see on this picture: Rearranged formula. This provided me with much more accurate results on higher order derivatives than the traditional definition of the derivative. Here is the function i am currently using to differentiate a function in a point:
def myDerivative(f, x, dTheta, degree):
riemannSum = 0
theta = 0
while theta < 2*np.pi:
functionArgument = np.complex128(x + np.exp(1j*theta))
secondFactor = np.complex128(np.exp(-1j * degree * theta))
riemannSum += f(functionArgument) * secondFactor * dTheta
theta += dTheta
return factorial(degree)/(2*np.pi) * riemannSum.real
I've tested this function in my main function with a carefully thought out mathematical function which I know the derivatives of, namely f(x) = sin(x).
def main():
print(myDerivative(f, 0, 2*np.pi/(4*4096), 16))
pass
These derivatives seems to freak out at around the derivative of degree 16. I've also tried to play around with dTheta, but with no luck. I would like to have higher orders as well, but I fear I've run into some kind of machine precission.
My question is in it's simplest form: What can I do to improve this function in order to get higher order of my derivatives?
I seem to have come up with a solution to the problem. I did this by rearranging Cauchy's integral formula in a different way, by exploiting that the initial contour integral can be an arbitrarily large circle around the point of differentiation. Be aware that it is very important that the function is analytic in the complex plane for this to be valid.
New formula
Also this gives a new function for differentiation:
def myDerivative(f, x, dTheta, degree, contourRadius):
riemannSum = 0
theta = 0
while theta < 2*np.pi:
functionArgument = np.complex128(x + contourRadius*np.exp(1j*theta))
secondFactor = (1/contourRadius)**degree*np.complex128(np.exp(-1j * degree * theta))
riemannSum += f(functionArgument) * secondFactor * dTheta
theta += dTheta
return factorial(degree) * riemannSum.real / (2*np.pi)
This gives me a very accurate differentiation of high orders. For instance I am able to differentiate f(x)=e^x 50 times without a problem.
Well, since you are working with a discrete approximation of the derivative (via dTheta), sooner or later you must run into trouble. I'm surprised you were able to get at least 15 accurate derivatives -- good work! But to get derivatives of all orders, either you have to put a limit on what you're willing to accept and say it's good enough, or else compute the derivatives symbolically. Take a look at Sympy for that. Sympy probably has some functions for computing Taylor series too.

Comparison of two floats in Rust to arbitrary level of precision

How can I do a comparison at an arbitrary level of precision such that I can see that two numbers are the same? In Python, I would use a function like round(), so I am looking for something equivalent in Rust.
For example I have:
let x = 1.45555454;
let y = 1.45556766;
In my case, they are similar up to 2 decimal places. So x and y would become 1.46 for the purposes of comparison. I could format these, but that surely is slow, what is the best Rust method to check equivalence, so:
if x == y { // called when we match to 2 decimal places}
To further elucidate the problem and give some context. This is really for dollars and cents accuracy. So normally in python would use the round() function with all its problems. Yes I am aware of the limitations of floating point representations. There are two functions that compute amounts, I compute in dollars and need to handle the cents part to the nearest penny.
The reason to ask the community is that I suspect that if I roll my own, it could hit performance and it's this aspect - which is I why I'm employing Rust, so here I am. Plus I saw something called round() in the Rust documentation, but it seems to take zero parameters unlike pythons version.
From the Python documentation:
Note The behavior of round() for floats can be surprising: for example, round(2.675, 2) gives 2.67 instead of the expected 2.68. This is not a bug: it’s a result of the fact that most decimal fractions can’t be represented exactly as a float.
For more information, check out What Every Programmer Should Know About Floating-Point Arithmetic.
If you don't understand how computers treat floating points, don't use this code. If you know what trouble you are getting yourself into:
fn approx_equal(a: f64, b: f64, decimal_places: u8) -> bool {
let factor = 10.0f64.powi(decimal_places as i32);
let a = (a * factor).trunc();
let b = (b * factor).trunc();
a == b
}
fn main() {
assert!( approx_equal(1.234, 1.235, 1));
assert!( approx_equal(1.234, 1.235, 2));
assert!(!approx_equal(1.234, 1.235, 3));
}
A non-exhaustive list of things that are known (or likely) to be broken with this code:
Sufficiently large floating point numbers and/or number of decimal points
Denormalized numbers
NaN
Infinities
Values near zero (approx_equal(0.09, -0.09, 1))
A potential alternative is to use either a fixed-point or arbitrary-precision type, either of which are going to be slower but more logically consistent to the majority of humans.
This one seems to work pretty well for me.
fn approx_equal (a: f64, b: f64, dp: u8) -> bool {
let p = 10f64.powi(-(dp as i32));
(a-b).abs() < p
}

why does _mm_mulhrs_epi16() always do biased rounding to positive infinity?

Does anyone know why the pmulhrsw instruction or
_mm_mulhrs_epi16(x) := RoundDown((x * y + 16384) / 32768)
always rounds towards positive infinity? To me, this is terribly biased for negative numbers, because then a sequence like -0.6, 0.6, -0.6, 0.6, ... won't add up to 0 on average.
Is this behavior intentional or unintentional? If it's intentional, what could be the use? Is there an easy way to make it less biased?
Lucky for me, I can just change the order of my operations to get a less biased result (my function is a signed geometric mean):
__m128i ChooseSign(x, sign)
{
return _mm_sign_epi16(x, sign)
}
signsDifferent = _mm_srai_epi16(_mm_xor_si128(a, b), 15) // (a ^ b) >> 15
sign = _mm_andnot_si128(signsDifferent, a) // !signsDifferent & a
//result = ChooseSign(sqrt(a * b), sign) * fraction // biased
result = ChooseSign(sqrt(a * b) * fraction, sign)
A most serious mistake. I asked the same question on the Intel developer forums and andysem corrected me, pointing out the behavior is to round to the nearest integer.
I was mistaken into thinking it was biased because the formula from MSDN, https://msdn.microsoft.com/en-us/library/bb513995.aspx
was (x * y + 16384) >> 15. This looked very similar to the int(x + 0.5) method for rounding, which I know is biased for negative #s and cringe at. But I didn't realize right shift for negative numbers isn't the same as dividing and casting to an int.
Plus, it wasn't matching my non-SIMD reference implementation, which turns out to be biased since I was calculating int(sum / 9.0f), which rounds towards zero.
I should've had more doubt before questioning the behavior of something implemented in hardware, which would've been rigorously vetted, since int(x + 0.5) would be a very expensive mistake.
_mm_mulhrs_epi16() still has some bias of always rounding x.5 towards + infinity. But that's not a big deal for my application.

Is it possible to do an algebraic curve fit with just a single pass of the sample data?

I would like to do an algebraic curve fit of 2D data points, but for various reasons - it isn't really possible to have much of the sample data in memory at once, and iterating through all of it is an expensive process.
(The reason for this is that actually I need to fit thousands of curves simultaneously based on gigabytes of data which I'm reading off disk, and which is therefore sloooooow).
Note that the number of polynomial coefficients will be limited (perhaps 5-10), so an exact fit will be extremely unlikely, but this is ok as I'm trying to find an underlying pattern in data with a lot of random noise.
I understand how one can use a genetic algorithm to fit a curve to a dataset, but this requires many passes through the sample data, and thus isn't practical for my application.
Is there a way to fit a curve with a single pass of the data, where the state that must be maintained from sample to sample is minimal?
I should add that the nature of the data is that the points may lie anywhere on the X axis between 0.0 and 1.0, but the Y values will always be either 1.0 or 0.0.
So, in Java, I'm looking for a class with the following interface:
public interface CurveFit {
public void addData(double x, double y);
public List<Double> getBestFit(); // Returns the polynomial coefficients
}
The class that implements this must not need to keep much data in its instance fields, no more than a kilobyte even for millions of data points. This means that you can't just store the data as you get it to do multiple passes through it later.
edit: Some have suggested that finding an optimal curve in a single pass may be impossible, however an optimal fit is not required, just as close as we can get it in a single pass.
The bare bones of an approach might be if we have a way to start with a curve, and then a way to modify it to get it slightly closer to new data points as they come in - effectively a form of gradient descent. It is hoped that with sufficient data (and the data will be plentiful), we get a pretty good curve. Perhaps this inspires someone to a solution.
Yes, it is a projection. For
y = X beta + error
where lowercased terms are vectors, and X is a matrix, you have the solution vector
\hat{beta} = inverse(X'X) X' y
as per the OLS page. You almost never want to compute this directly but rather use LR, QR or SVD decompositions. References are plentiful in the statistics literature.
If your problem has only one parameter (and x is hence a vector as well) then this reduces to just summation of cross-products between y and x.
If you don't mind that you'll get a straight line "curve", then you only need six variables for any amount of data. Here's the source code that's going into my upcoming book; I'm sure that you can figure out how the DataPoint class works:
Interpolation.h:
#ifndef __INTERPOLATION_H
#define __INTERPOLATION_H
#include "DataPoint.h"
class Interpolation
{
private:
int m_count;
double m_sumX;
double m_sumXX; /* sum of X*X */
double m_sumXY; /* sum of X*Y */
double m_sumY;
double m_sumYY; /* sum of Y*Y */
public:
Interpolation();
void addData(const DataPoint& dp);
double slope() const;
double intercept() const;
double interpolate(double x) const;
double correlate() const;
};
#endif // __INTERPOLATION_H
Interpolation.cpp:
#include <cmath>
#include "Interpolation.h"
Interpolation::Interpolation()
{
m_count = 0;
m_sumX = 0.0;
m_sumXX = 0.0;
m_sumXY = 0.0;
m_sumY = 0.0;
m_sumYY = 0.0;
}
void Interpolation::addData(const DataPoint& dp)
{
m_count++;
m_sumX += dp.getX();
m_sumXX += dp.getX() * dp.getX();
m_sumXY += dp.getX() * dp.getY();
m_sumY += dp.getY();
m_sumYY += dp.getY() * dp.getY();
}
double Interpolation::slope() const
{
return (m_sumXY - (m_sumX * m_sumY / m_count)) /
(m_sumXX - (m_sumX * m_sumX / m_count));
}
double Interpolation::intercept() const
{
return (m_sumY / m_count) - slope() * (m_sumX / m_count);
}
double Interpolation::interpolate(double X) const
{
return intercept() + slope() * X;
}
double Interpolation::correlate() const
{
return m_sumXY / sqrt(m_sumXX * m_sumYY);
}
Why not use a ring buffer of some fixed size (say, the last 1000 points) and do a standard QR decomposition-based least squares fit to the buffered data? Once the buffer fills, each time you get a new point you replace the oldest and re-fit. That way you have a bounded working set that still has some data locality, without all the challenges of live stream (memoryless) processing.
Are you limiting the number of polynomial coefficients (i.e. fitting to a max power of x in your polynomial)?
If not, then you don't need a "best fit" algorithm - you can always fit N data points EXACTLY to a polynomial of N coefficients.
Just use matrices to solve N simultaneous equations for N unknowns (the N coefficients of the polynomial).
If you are limiting to a max number of coefficients, what is your max?
Following your comments and edit:
What you want is a low-pass filter to filter out noise, not fit a polynomial to the noise.
Given the nature of your data:
the points may lie anywhere on the X axis between 0.0 and 1.0, but the Y values will always be either 1.0 or 0.0.
Then you don't need even a single pass, as these two lines will pass exactly through every point:
X = [0.0 ... 1.0], Y = 0.0
X = [0.0 ... 1.0], Y = 1.0
Two short line segments, unit length, and every point falls on one line or the other.
Admittedly, an algorithm to find a good curve fit for arbitrary points in a single pass is interesting, but (based on your question), that's not what you need.
Assuming that you don't know which point should belong to which curve, something like a Hough Transform might provide what you need.
The Hough Transform is a technique that allows you to identify structure within a data set. One use is for computer vision, where it allows easy identification of lines and borders within the field of sight.
Advantages for this situation:
Each point need be considered only once
You don't need to keep a data structure for each candidate line, just one (complex, multi-dimensional) structure
Processing of each line is simple
You can stop at any point and output a set of good matches
You never discard any data, so it's not reliant on any accidental locality of references
You can trade off between accuracy and memory requirements
Isn't limited to exact matches, but will highlight partial matches too.
An approach
To find cubic fits, you'd construct a 4-dimensional Hough space, into which you'd project each of your data-points. Hotspots within Hough space would give you the parameters for the cubic through those points.
You need the solution to an overdetermined linear system. The popular methods are Normal Equations (not usually recommended), QR factorization, and singular value decomposition (SVD). Wikipedia has decent explanations, Trefethen and Bau is very good. Your options:
Out-of-core implementation via the normal equations. This requires the product A'A where A has many more rows than columns (so the result is very small). The matrix A is completely defined by the sample locations so you don't have to store it, thus computing A'A is reasonably cheap (very cheap if you don't need to hit memory for the node locations). Once A'A is computed, you get the solution in one pass through your input data, but the method can be unstable.
Implement an out-of-core QR factorization. Classical Gram-Schmidt will be fastest, but you have to be careful about stability.
Do it in-core with distributed memory (if you have the hardware available). Libraries like PLAPACK and SCALAPACK can do this, the performance should be much better than 1. The parallel scalability is not fantastic, but will be fine if it's a problem size that you would even think about doing in serial.
Use iterative methods to compute an SVD. Depending on the spectral properties of your system (maybe after preconditioning) this could converge very fast and does not require storage for the matrix (which in your case has 5-10 columns each of which are the size of your input data. A good library for this is SLEPc, you only have to find a the product of the Vandermonde matrix with a vector (so you only need to store the sample locations). This is very scalable in parallel.
I believe I found the answer to my own question based on a modified version of this code. For those interested, my Java code is here.

Resources