Equation to convert IEEE 754 to decimal (normalised form)

I have been working with IEEE 754 floating-point numbers (32-bit single precision), and I am trying to convert them to decimal normalised format. I have found two equations for this, but I'm not sure which one is correct.
Any help will be greatly appreciated!
N = (-1)^s * 1.F * 2^(e-127)
OR
N = (-1)^s * 1+F * 2^(e-127)
where
s = sign bit
F = fraction/mantissa
e = exponent (biased)
Which of the above equations is correct? In other words, is it 1*F or 1+F?

It's been a while since I've done floating-point conversion, but that is a plus, not a multiply (so 1+F, NOT 1*F). The "1.F" in the first form is the binary significand "one point F", i.e. the implicit leading 1 followed by the fraction bits, which is the same thing as 1+F.
The Wikipedia article on the subject is pretty good and also shows the plus: https://en.wikipedia.org/wiki/Single-precision_floating-point_format
From Iowa State, a more concise description: http://class.ece.iastate.edu/arun/CprE281_F05/ieee754/ie5.html
Another interesting way to solve it: How to convert an IEEE 754 single-precision binary floating-point to decimal?
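To make the 1+F reading concrete, here is a small Python sketch (the function name decode_single is just illustrative) that decodes a normalised single-precision bit pattern with N = (-1)^s * (1+F) * 2^(e-127) and cross-checks it against struct's interpretation of the same bits:
import struct

def decode_single(bits):
    # Decode a normalised 32-bit IEEE 754 pattern given as an integer.
    s = (bits >> 31) & 0x1       # sign bit
    e = (bits >> 23) & 0xFF      # 8-bit biased exponent (assumed 1..254, i.e. normalised)
    f = bits & 0x7FFFFF          # 23-bit fraction field
    F = f / 2**23                # fraction as a value in [0, 1)
    return (-1)**s * (1 + F) * 2**(e - 127)

bits = 0x40490FDB                # the single-precision encoding of pi
print(decode_single(bits))                              # 3.1415927410125732
print(struct.unpack('>f', bits.to_bytes(4, 'big'))[0])  # same value
Subnormals (e = 0) and infinities/NaNs (e = 255) follow different rules and are not covered by this formula.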

Related

Limiting floats to a varying number (decided by the end-user) of decimal points in Python

So, I've learned quite a few ways to control precision when I'm dealing with floats.
Here is an example of three different techniques:
from decimal import Decimal
somefloat = 0.0123456789
print("{0:.10f}".format(somefloat))
print("%.5f" % somefloat)
print(Decimal(somefloat).quantize(Decimal(".01")))
This will print:
0.0123456789
0.01235
0.01
In all of the above examples, the precision itself is a fixed value, but how could I make the precision a variable that can be entered by the end user?
I mean, the fixed precision values are currently inside the quotation marks, and I can't seem to find a way to put a variable there. Is there a way?
I'm on Python 3.
Using format:
somefloat=0.0123456789
precision = 5
print("{0:.{1}f}".format(somefloat, precision))
# 0.01235
Using old-style string interpolation:
print("%.*f" % (precision, somefloat))
# 0.01235
Using decimal:
import decimal
D = decimal.Decimal
q = D(10) ** -precision
print(D(somefloat).quantize(q))
# 0.01235

Python 3.4: limiting floats to two decimal points

I am using Python 3.4 and I want to limit a float number to two decimal places:
round(1.2377, 2)
format(1.2377, '.2f')
These both give me 1.24, but I don't want 1.24, I need 1.23. How do I do it?
You can convert to a string, slice off the last characters, and convert back to float (note this relies on str(num) showing exactly four digits after the point):
>>> num=1.2377
>>> float(str(num)[:-2])
1.23
Read more in Floating Point Arithmetic: Issues and Limitations.
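If the string slicing feels too fragile (it depends on how many digits str happens to print), here is a sketch of two more general ways to truncate rather than round, using only the standard library; these are alternatives, not the original answer's method:
import math
from decimal import Decimal, ROUND_DOWN

num = 1.2377

# Scale, floor, rescale. Note that the scaling itself is a float multiplication,
# so for some inputs the result can land one ulp away from what you expect.
print(math.floor(num * 100) / 100)                                        # 1.23

# Truncate via Decimal, quantising the value as printed by str().
print(Decimal(str(num)).quantize(Decimal('0.01'), rounding=ROUND_DOWN))   # 1.23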

Loss of precision 'sqrt' Haskell

In the GHCi terminal, I was computing some expressions with Haskell's sqrt function.
I noticed that I would sometimes lose precision in the sqrt result, even when it should simplify exactly.
For example,
sqrt 4 * sqrt 4 = 4 -- This works well!
sqrt 2 * sqrt 2 = 2.0000000000000004 -- Not the exact result.
Normally, I would expect a result of 2.
Is there a way to get the right simplification result?
How does that work in Haskell?
There are usable precise number libraries in Haskell. Two that come to mind are cyclotomic and the CReal module in the numbers package. (Cyclotomic numbers don't support all the operations on complex numbers that you might like, but square roots of integers and rationals are in the domain.)
>>> import Data.Complex.Cyclotomic
>>> sqrtInteger 2
e(8) - e(8)^3
>>> toReal $ sqrtInteger 2
Just 1.414213562373095 -- Maybe Double
>>> sqrtInteger 2 * sqrtInteger 2
2
>>> toReal $ sqrtInteger 2 * sqrtInteger 2
Just 2.0
>>> rootsQuadEq 3 2 1
Just (-1/3 + 1/3*e(8) + 1/3*e(8)^3,-1/3 - 1/3*e(8) - 1/3*e(8)^3)
>>> let eq x = 3*x*x + 2*x + 1
>>> eq (-1/3 + 1/3*e(8) + 1/3*e(8)^3)
0
>>> import Data.Number.CReal
>>> sqrt 2 :: CReal
1.4142135623730950488016887242096980785697 -- Show instance cuts off at 40th place
>>> sqrt 2 * sqrt 2 :: CReal
2.0
>>> sin 3 :: CReal
0.1411200080598672221007448028081102798469
>>> sin 3*sin 3 + cos 3*cos 3 :: CReal
1.0
You do not lose precision. You have limited precision.
The square root of 2 is a real number but not a rational number, so its value cannot be represented exactly by any computer (except by representing it symbolically, of course).
Even if you define a very large precision type, it will not be able to represent the square root of 2 exactly. You may get more precision, but never enough to represent that value exactly (unless you have a computer with infinite memory, in which case please hire me).
The explanation for these results lies in the type of the values returned by the sqrt function:
> :t sqrt
sqrt :: Floating a => a -> a
The Floating a means that the value returned belongs to the Floating type class.
The standard instances of this class (Float, Double) are stored as binary floating-point numbers, which sacrifice precision for the sake of covering a larger range of numbers.
Double precision floating point numbers can cover very large ranges but they have limited precision and cannot encode all possible numbers. The square root of 2 (√2) is one such number:
> sqrt 2
1.4142135623730951
> sqrt 2 + 0.000000000000000001
1.4142135623730951
As you can see above, double precision floating point numbers are not precise enough to represent √2 + 0.000000000000000001; it is simply rounded to the closest value that the floating point encoding can express.
As mentioned by another poster, √2 is an irrational number, which means it would take infinitely many digits to write out exactly. It therefore cannot be represented faithfully using floating point numbers, which leads to errors such as the one you noticed when multiplying it by itself.
You can learn about floating points on their wikipedia page: http://en.wikipedia.org/wiki/Floating_point.
I especially recommend that you read the answer to this other Stack Overflow question: Floating Point Limitations and follow the mentioned link, it will help you understand what's going on under the hood.
Note that this is a problem in every language, not just Haskell. One way to get rid of it entirely is to use symbolic computation libraries but they are much slower than the floating point numbers offered by CPUs. For many computations the loss of precision due to floating points is not a problem.
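To underline the "every language" point: any environment built on IEEE 754 doubles reproduces exactly the same values. For instance, in Python:
import math

print(math.sqrt(2))        # 1.4142135623730951  (the nearest double to the square root of 2)
print(math.sqrt(2) ** 2)   # 2.0000000000000004  (squaring introduces one more rounding step)
print(math.sqrt(4) ** 2)   # 4.0                 (sqrt 4 is exact, so no error accumulates)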

While computing proper fraction in Haskell

I want to code a function makeFraction :: Float -> Float -> (Int, Int) which returns (x,y) whenever I say makeFraction a b, such that x/y is a proper fraction equivalent to a / b. For example, makeFraction 17.69 5.51 should return (61,19).
I have a subroutine to calculate the gcd of two numbers, but my first task is to convert a and b to Int, e.g. 17.69 and 5.51 should be converted into 1769 and 551.
Now I want to do it for numbers with arbitrary decimal places. The Prelude functions do not help me much. For instance, when I say toFraction(0.2), it returns 3602879701896397 % 18014398509481984, which would severely strain the correctness of my later computations.
Later I tried getting the fractional part with another library function, properFraction(17.69), which is supposed to give me only 0.69, but it produces 0.69000...013, which is not something I would accept in a proper state of mind.
It does look like a problem arising from floating point arithmetic. So far I am not doing any data manipulation, only asking for part of the stored bits, which I should be able to fetch from processor registers/memory. Is there a special library in Haskell for such tasks?
PS: It seems some useful tips are in How to parse a decimal fraction into Rational in Haskell?, but since I have typed so much, I would like to post this anyway. At least the context is different here.
Yes, it is the limited precision of floating-point arithmetic you're encountering. The floating-point format cannot represent 0.2 exactly, so toFraction is actually giving you the exact rational value of the Float number you get when you ask for 0.2.
Similarly, 17.69 cannot be represented exactly, and because the point floats, its best representation has a larger absolute error than the error in the representation of 0.69. Thus, when you take away the integer part, the resulting bits are not the same as if you had asked to represent 0.69 as good as possible from the beginning, and this difference can be seen when the implementation prints out the result in decimal form.
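This behaviour is not specific to Haskell; any language that exposes the stored value reports the same exact rational for a double-precision 0.2. A quick Python illustration (purely to show the phenomenon, not part of the Haskell solution):
from fractions import Fraction

# The exact rational value of the double nearest to 0.2 -- the same
# 3602879701896397 % 18014398509481984 reported in the question.
print(Fraction(0.2))   # 3602879701896397/18014398509481984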
It seems to me that instead of using a floating-point type like Float or Double, you should do all your computations using a type that can represent those numbers exactly, like Rational. For example,
(17.69 :: Rational) / (5.51 :: Rational)
evaluates to 61 % 19
As mentioned in the other answers, a Float cannot necessarily represent a given decimal number exactly. In particular, a Float is stored internally using the form a/(2^m). As a result, real numbers like 3/10 can only ever be approximated by floating point numbers.
But if a decent approximation is all you need, this might help:
import Data.Ratio
convertFloat :: Float -> Rational
convertFloat f =
    let denom = 10 ^ 6
        num   = fromInteger denom * f
    in  round num % denom
For example:
> convertFloat 17.69
1769 % 100
> convertFloat 17.69 / convertFloat 5.51
61 % 19
Check out base's Numeric module, especially the floatToDigits function.
> floatToDigits 10 17.69
([1,7,6,9],2)

Microsoft.DirectX.Vector3.Normalize() inconsistency

There are two ways to normalize a Vector3 object: by calling Vector3.Normalize(), or by normalizing from scratch:
class Tester {
    static Vector3 NormalizeVector(Vector3 v)
    {
        float l = v.Length();
        return new Vector3(v.X / l, v.Y / l, v.Z / l);
    }

    public static void Main(string[] args)
    {
        Vector3 v = new Vector3(0.0f, 0.0f, 7.0f);
        Vector3 v2 = NormalizeVector(v);
        Debug.WriteLine(v2.ToString());
        v.Normalize();
        Debug.WriteLine(v.ToString());
    }
}
The code above produces this:
X: 0
Y: 0
Z: 1
X: 0
Y: 0
Z: 0.9999999
Why?
(Bonus points: Why Me?)
Look at how they implemented it (e.g. in asm).
Maybe they wanted to be faster and produced something like:
l = 1 / v.Length();
return new Vector3(v.X * l, v.Y * l, v.Z * l);
to trade two divisions for three multiplications (because they assumed multiplications were faster than divisions, which on modern FPUs is often not the case). This introduces one extra rounding step, hence the slight loss of precision.
This would be the often-cited "premature optimization".
Don't worry about this. There's always some error involved when using floats. If you're curious, try changing to double and see if this still happens.
You should expect this when using floats; the basic reason is that the computer works in binary, and binary doesn't map exactly to decimal.
For an intuitive example of issues between different bases, consider the fraction 1/3. It cannot be represented exactly in decimal (it's 0.333333...) but it can be in ternary (as 0.1).
Generally these issues are a lot less obvious with doubles, at the expense of computing cost (twice the number of bits to manipulate). However, given that float-level precision was enough to get man to the moon, you really shouldn't obsess :-)
These issues are sort of computer theory 101 (as opposed to programming 101, which you're obviously well beyond), and if you're heading towards DirectX code, where similar things come up regularly, it might be a good idea to pick up a basic computer theory book and read it quickly.
Here you have an interesting discussion about string formatting of floats.
Just for reference:
Your number requires 24 bits to be represented, which means that you are using up the whole mantissa of a float (23 bits + 1 implied bit).
Single.ToString() is ultimately implemented by a native function, so I cannot tell for sure what is going on, but my guess is that it uses the last digit to round the whole mantissa.
The reason behind this could be that you often get numbers that cannot be represented exactly in binary, so you would get a long mantissa; for instance, 0.01 is represented internally as 0.00999..., as you can see by writing:
float f = 0.01f;
Console.WriteLine ("{0:G}", f);
Console.WriteLine ("{0:G}", (double) f);
By rounding at the seventh digit, you get back "0.01", which is what you would have expected.
Because of this, numbers with only 7 digits will not show the problem, as you already saw.
Just to be clear: the rounding is taking place only when you convert your number to a string: your calculations, if any, will use all the available bits.
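The same widening trick is easy to reproduce outside C#; for example, a short Python sketch (shown only as an illustration) that round-trips 0.01 through a single-precision encoding and prints the value actually stored:
import struct

# Pack 0.01 into 4 bytes of IEEE 754 single precision, then widen it back to a double.
stored = struct.unpack('<f', struct.pack('<f', 0.01))[0]
print(stored)   # 0.009999999776482582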
Floats have a precision of 7 digits externally (9 internally), so if you go above that then rounding (with potential quirks) is automatic.
If you drop the float down to 7 digits (for instance, 1 to the left, 6 to the right) then it will work out and the string conversion will as well.
As for the bonus points:
Why you? Because this code was "eager to blow on you".
(Vulcan... blow... OK. Lamest. Pun. Ever.)
If your code is broken by minute floating point rounding errors, then I'm afraid you need to fix it, as they're just a fact of life.

Resources