To my great surprise, I found that rounding a NaN value in Haskell returns a gigantic negative number:
round (0/0)
-269653970229347386159395778618353710042696546841345985910145121736599013708251444699062715983611304031680170819807090036488184653221624933739271145959211186566651840137298227914453329401869141179179624428127508653257226023513694322210869665811240855745025766026879447359920868907719574457253034494436336205824
The same thing happens with floor and ceiling.
What is happening here? Is this behavior intended? Of course, I understand that anyone who doesn't want this behavior can always write another function that checks isNaN - but are there existing alternative standard library functions that handle NaN more sanely (for some definition of "more sanely")?
TL;DR: a NaN, read as if it were a number, has an arbitrary value whose magnitude lies strictly between 2 ^ 1024 and 2 ^ 1025, and -1.5 * 2 ^ 1024 (which is one possible NaN) happens to be the one you hit.
Why any reasoning is off
What is happening here?
You're entering the region of undefined behaviour. Or at least that is what you would call it in some other languages. The report defines round as follows:
6.4.6 Coercions and Component Extraction
The ceiling, floor, truncate, and round functions each take a real fractional argument and return an integral result. … round x returns the nearest integer to x, the even integer if x is equidistant between two integers.
In our case x does not represent a number to begin with. According to 6.4.6, y = round x should fulfil that any other z from round's codomain has an equal or greater distance:
y = round x ⇒ ∀z : dist(z,x) >= dist(y,x)
However, the distance (aka the subtraction) of numbers is defined only for, well, numbers. If we used
dist n d = fromIntegral n - d
we get into trouble quickly: any operation that involves NaN returns NaN again, and comparisons on NaN always come back False, so the property above cannot hold for any z if x was NaN to begin with. If we instead check for NaN and return a constant distance, the property holds trivially for every pair:
dist n d = if isNaN d then constant else fromIntegral n - d
So we're completely arbitrary in what round x shall return if x was not a number.
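You can see both effects directly in GHCi; every ordering comparison involving NaN comes back False, and only isNaN detects it:

ghci> let nan = 0/0 :: Double
ghci> nan == nan
False
ghci> nan < 1
False
ghci> nan > 1
False
ghci> isNaN nan
True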
Why do we get that large number regardless?
"OK", I hear you say, "that's all fine and dandy, but why do I get that number?" That's a good question.
Is this behavior intended?
Somewhat. It isn't really intended, but to be expected. First of all, we have to know how Double works.
IEEE 754 double precision floating point numbers
A Double in Haskell is usually an IEEE 754 compliant double precision floating point number, that is, a number that has 64 bits and is represented as
x = s * m * (b ^ e)
where s is the sign (a single bit), m is the mantissa (52 bits) and e is the exponent (11 bits, floatRange). b is the base, and it's usually 2 (you can check with floatRadix). Since the value of m is normalized, every well-formed Double has a unique representation.
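You can query these parameters from GHCi; a standard IEEE 754 Double reports base 2, 53 significant bits (52 stored plus the implicit leading 1), and the exponent range below, and decodeFloat shows the mantissa/exponent pair directly, e.g. 1.5 = 6755399441055744 * 2 ^ (-52):

ghci> floatRadix (0 :: Double)
2
ghci> floatDigits (0 :: Double)
53
ghci> floatRange (0 :: Double)
(-1021,1024)
ghci> decodeFloat (1.5 :: Double)
(6755399441055744,-52)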
IEEE 754 NaN
Except NaN. NaN is represented by the exponent emax+1 together with a non-zero mantissa. So if the bitfield
SEEEEEEEEEEEMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
represents a Double, what's a valid way to represent NaN?
?111111111111000000000000000000000000000000000000000000000000000
            ^
That is, a single mantissa bit is set to 1; the others don't need to be set for this notion. The sign is arbitrary. Why only a single bit? Because it's sufficient.
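If you want to see the actual bits, GHC.Float (base >= 4.10) provides castDoubleToWord64; the exact pattern depends on how your platform generates its default NaN, but on x86-64 0/0 typically yields the pattern described above, with the sign bit set, all eleven exponent bits set, and only the top mantissa bit set:

ghci> import GHC.Float (castDoubleToWord64)
ghci> import Numeric (showHex)
ghci> showHex (castDoubleToWord64 (0/0 :: Double)) ""
"fff8000000000000"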
Interpret NaN as Double
Now, if we ignore the fact that this is a malformed Double (a NaN) and really, really, really want to interpret it as a number, what number would we get?
m = 1.5   (the implicit leading 1 plus the single set mantissa bit, which is worth 0.5)
e = 1024  (the all-ones exponent field, 2047, minus the bias of 1023)
x = 1.5 * 2 ^ 1024
= 3 * 2 ^ 1024 / 2
= 3 * 2 ^ 1023
And lo and behold, that's exactly the number you get for round (0/0):
ghci> round $ 0 / 0
-269653970229347386159395778618353710042696546841345985910145121736599013708251444699062715983611304031680170819807090036488184653221624933739271145959211186566651840137298227914453329401869141179179624428127508653257226023513694322210869665811240855745025766026879447359920868907719574457253034494436336205824
ghci> negate $ 3 * 2 ^ 1023
-269653970229347386159395778618353710042696546841345985910145121736599013708251444699062715983611304031680170819807090036488184653221624933739271145959211186566651840137298227914453329401869141179179624428127508653257226023513694322210869665811240855745025766026879447359920868907719574457253034494436336205824
Which brings our small adventure to a halt. We have a NaN: its exponent field yields a factor of 2 ^ 1024, and its non-zero mantissa puts the absolute value somewhere strictly between 2 ^ 1024 and 2 ^ 1025.
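You can confirm this with decodeFloat, which is essentially what round uses under the hood; the sign depends on which NaN your platform produced (here the x86-64 default, matching the negative result above):

ghci> decodeFloat (0/0 :: Double)
(-6755399441055744,972)
ghci> (-6755399441055744) * 2 ^ 972 == negate (3 * 2 ^ 1023)
True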
Note that this isn't the only way NaN can get represented:
In IEEE 754, NaNs are often represented as floating-point numbers with the exponent emax + 1 and nonzero significands. Implementations are free to put system-dependent information into the significand. Thus there is not a unique NaN, but rather a whole family of NaNs.
For more information, see the classic paper on floating point numbers by Goldberg.
This has long been observed as a problem. Here're a few tickets filed against GHC on this very topic:
https://ghc.haskell.org/trac/ghc/ticket/3070
https://ghc.haskell.org/trac/ghc/ticket/11553
https://ghc.haskell.org/trac/ghc/ticket/3676
Unfortunately, this is a thorny issue with lots of ramifications. My personal belief is that this is a genuine bug and it should be fixed properly by throwing an error. But you can read the comments on these tickets to get an understanding of the tricky issues preventing GHC from implementing a proper solution. Essentially, it comes down to speed vs. correctness, and this is one point where (i) the Haskell report is woefully underspecified, and (ii) GHC compromises the latter for the former.
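Until that happens, a user-level guard in the spirit of the isNaN check mentioned in the question is easy to write. This is only a sketch, not a standard library function:

-- Reject NaN and infinity instead of producing an arbitrary integer.
safeRound :: (RealFloat a, Integral b) => a -> Maybe b
safeRound x
  | isNaN x || isInfinite x = Nothing
  | otherwise               = Just (round x)

ghci> safeRound (0/0 :: Double) :: Maybe Integer
Nothing
ghci> safeRound (2.5 :: Double) :: Maybe Integer
Just 2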
Related
I don't understand how integer_decode in num_traits works. For instance: we have
use num_traits::Float;
let num = 2.0f32;
// (8388608, -22, 1)
let (mantissa, exponent, sign) = Float::integer_decode(num);
But how do we get those integers?
The binary representation of 2.0f32 has a 0 sign bit, a 1 as the leading bit of the exponent, and a mantissa consisting of zeros. How do we get the integer decode, and why do we choose this particular decomposition and not 8388608*2 as the mantissa and -23 as the exponent?
I didn't write the function, so take this answer with a grain of salt, as it's more of a gut feeling than knowledge. The rationale behind it is not explained in the comments of the function implementation, so unless the author of the code speaks up, we can't deliver more than educated guesses.
f32 is based on IEEE-754, which specifies that a 2.0 shall be represented as the following three parts:
the sign bit 0
indicates that 2.0 is positive
the exponent 128
it's one byte that encodes the exponent with a bias of 127, so 127 represents 0 and 128 means 1
the mantissa 0
the mantissa consists of 23 bits and has an implicit 1. in front of it. So 0 means 1.0.
To get the actual number, you compute (-1)^0 * 2^(128 - 127) * 1.0, which is 1 * 2^1 * 1.0 = 2.
Now this is not the only way to compute that. You could also do:
map the sign bit to 1 and -1
instead of prefixing the mantissa with 1., add a 1 in front of it, making it an integer. (this avoids having to use a float to decode a float, which is nonsense for obvious reasons)
subtract 127 from the exponent, making it signed. Then, subtract another 23 from it to compensate for the mantissa now being shifted by 23 bits (because the mantissa is 23 bits long and we moved the radix point all the way to the right to make it an integer).
This would, for 2.0 give us:
sign 1
mantissa 0b100000000000000000000000 = 8388608
exponent 128 - 127 - 23 = -22
Now we can do sign * mantissa * 2 ^ exponent, as specified in the documentation to get our value back.
Note how fast calculating those integers was: a binary decision for the sign, a bitwise OR for the mantissa, and a single small integer subtraction for the exponent (a single one because you can combine - 127 - 23 into - 150 beforehand).
why we choose this particular decomposition and not 8388608*2 as mantissa and -23 as exponent
The short version is that this guarantees that all possible mantissas can be treated the same way. It's 23 bits long and a 1 with the entire mantissa attached to it is always a valid integer. In the case of 0 this is a 1 with 23 0s, 0b100000000000000000000000, which is 8388608.
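For what it's worth, Haskell's standard decodeFloat makes the same choice: a full-width integral significand together with a correspondingly shifted exponent (with the sign folded into the significand rather than returned separately), so this decomposition is not an oddity of num_traits:

ghci> decodeFloat (2.0 :: Float)
(8388608,-22)
ghci> decodeFloat (2.0 :: Double)
(4503599627370496,-51)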
integer_decode() documentation is quite clear:
Returns the mantissa, base 2 exponent, and sign as integers, respectively. The original number can be recovered by sign * mantissa * 2 ^ exponent.
1 * 8388608 * 2^-22 == 2
Mostly, IEEE 754 uses this description of finite floating-point numbers: a floating-point number has the form (−1)^s × b^e × m, where:
s is 0 or 1.
e is any integer emin ≤ e ≤ emax.
m is a number represented by a digit string of the form d0.d1d2…dp−1 where di is an integer digit 0 ≤ di < b.
This form is described in IEEE-754 2019 clause 3.3. b is the base, p is the precision (number of digits in base b), and emin and emax are bounds on the exponent. This form is useful for certain things, such as describing the normalized form as starting with a leading “1.” or “0.” when the base is two. However, the standard also says, in the same clause:
It is also convenient for some purposes to view the significand as an integer, in which case the finite floating-point numbers are described thus:
Signed zero and non-zero floating-point numbers of the form (−1)^s × b^q × c, where
s is 0 or 1.
q is any integer emin ≤ q+p−1 ≤ emax.
c is a number represented by a digit string of the form d0d1d2…dp−1 where di is an integer digit 0 ≤ di < b (c is therefore an integer with 0 ≤ c < b^p).
(My reproduction of the IEEE-754 text changes some of the typography slightly.) Note two things. First, this matches the results Float::integer_decode gives you. In the Float format, p is 24, so the significand should have 24 bits, not 25, so it can be 8,388,608 (2^23) and cannot be 16,777,216 (2^24).
Second, what makes this form useful is that the significand is always an integer and can be any integer in that range: its low digit is immediately to the left of the radix point, so the consecutive values representable in this range of the floating-point format are consecutive integers, and we can analyze them and write proofs using number theory.
You could use alternate forms that are mathematically equivalent but let the significand be 2^24, but then the low digit in such a form would have to be zero (because there is no bit in the format to represent it having any other value), so it is not particularly useful.
I am working in Haskell and have had an error where, for really large floats z, sin(z) returns some value outside of the range [-1, 1].
I'm learning Haskell for the first time, so I have had very little luck debugging, and the program just crashes when sin(z) returns a value outside of the above range, as sin(z) is an input into another function that only accepts values inside the range [-1, 1].
Additionally, I don't have access to the other function, I only can send in a value, but it keeps crashing when sin(z) returns a number either greater than 1 or less than -1.
Is there any way to figure out why sin(z) is doing this?
The sin :: Double -> Double function returns a number strictly between -1 and 1 for all finite inputs, no matter how large. In particular, for the largest representable finite positive double, it returns a value that's roughly 0.005:
> sin (1.7976931348623157E+308 :: Double)
4.961954789184062e-3
and for the largest representable finite negative double, it returns a value that's the negative of that:
> sin (-1.7976931348623157E+308 :: Double)
-4.961954789184062e-3
What's undoubtedly happening is that your input to sin has exceeded the finite range of Double. For example, the following literal isn't actually representable as a finite double and is "rounded" to infinity:
> 1.7976931348623159E+308 :: Double
Infinity
If you feed such an infinite value to sin, you get NaN:
> sin (1.7976931348623159E+308 :: Double)
NaN
which will undoubtedly cause problems when fed to a function expecting finite numbers between -1 and 1. This can be "fixed" with min:
> min (sin (1.7976931348623159E+308 :: Double)) 1
1.0
but this fix is largely useless because you have a much bigger problem going on.
For numbers this large, the precision of a Double is on the order of plus or minus 1e292. That is, two "adjacent" representable finite doubles of this size are about 1e292 apart and the sin of two such numbers might as well be random numbers between -1 and 1, completely unrelated to any calculation you're trying to complete. Whatever you're trying to do with these numbers can't possibly be working as you intend.
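You can read that spacing straight off decodeFloat (assuming the usual 53-bit IEEE Double): the exponent it returns for the largest finite Double is the base-2 scale of one unit in the last place:

ghci> snd (decodeFloat (1.7976931348623157e308 :: Double))
971

and 2 ^ 971 is roughly 2.0e292, the gap to the next smaller representable double, which is why sin of numbers at this magnitude is effectively noise.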
It seems like this is a floating point error; see this similar post. So for very large values, the sin function is returning a value slightly above 1, due to rounding errors.
To solve your problem, I would cap the return value at 1. Specifically, return min 1 (sin z) instead of just sin z directly.
Edit: replaced max with min.
I'm trying to display a float but rounding it, so that
f = 5.545
displays as: 5.55, while
f = 5.544
displays as: 5.54
I've seen method to display only the two first decimals, but I want to have it rounded.
Thank you !
This happens because of the way floating-point numbers are represented in the computer: they're actually in base 2, rather than base 10 (a bit of an oversimplification, but good enough). As a consequence, when you type in 5.545, the computer actually records it as 5.5449999999999999... - very close to 5.545, but slightly smaller. And since it's smaller than 5.545, it's no wonder it gets rounded down to 5.54.
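You can check what is actually stored from GHCi: toRational shows the exact fraction a Double holds, and comparing it with the exact value 5.545 (as a Rational) confirms it is slightly below. The fraction shown is what the usual IEEE 754 Double gives:

ghci> toRational (5.545 :: Double)
3121557491721175 % 562949953421312
ghci> toRational (5.545 :: Double) < (5.545 :: Rational)
True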
If you really have to have exact base-10 numbers, you should use Decimal instead of Float or Double. That package specifically takes care to represent floating-point numbers in base-10 without loss.
x :: Decimal
x = 0.545
show x
> "0.545"
The caveat is that printf does not support Decimal, so you'd have to display it by rounding via roundTo and converting to a string via show. Another caveat is that roundTo does "banker's rounding" - if the last digit is five, it rounds to the nearest even digit - so we'd need to counteract that as a special case (I couldn't find a ready-to-use function that rounds by arithmetic rules):
import Data.Decimal (Decimal, roundTo)  -- from the Decimal package

displayDecimal :: Decimal -> String
displayDecimal x = show (rounded + compensate)
  where rounded    = roundTo 2 x
        compensate = if (x - rounded) == 0.005 then 0.01 else 0
displayDecimal 0.545
> "0.55"
displayDecimal 0.5450000000001
> "0.55"
displayDecimal 0.544
> "0.54"
displayDecimal 0.5449999999999
> "0.54"
However, if you just want this to work for numbers with three decimal places, you can get away with just adding a very small value before rounding, like 0.00001. This value is small enough that it won't mess up your actual numbers, but large enough to compensate for the base-2 vs. base-10 discrepancy:
import Text.Printf (printf)

displayRounded :: Double -> String
displayRounded x = printf "%.2f" (x + 0.00001)
displayRounded 0.544
> "0.54"
displayRounded 0.545
> "0.55"
So I didn't quite find a real solution to this, but I realised something:
printf already does the job, but not exactly how I wanted.
Say I want to round 1.445: it will display 1.44. But if the number was 1.446, then it would have displayed 1.45.
Not exactly what I wanted, but close enough.
Does anyone have an explanation for this strange rounding in Haskell (GHCi, version 7.2.1)? Everything seems fine unless I multiply by 100.
*Main> 1.1
1.1
*Main> 1.1 *10
11.0
*Main> 1.1 *100
110.00000000000001
*Main> 1.1 *1000
1100.0
*Main> 1.1 *10000
11000.0
Edit: what is puzzling me is that the rounding error only shows up when multiplying by 100.
Edit (2): The comments I received made me realize that this is totally unrelated to Haskell, but a general issue with floating-point numbers. Numerous questions have already been asked (and answered) about floating-point oddities, where the underlying issue typically was confusing floats with real numbers.
Perl, Python, JavaScript, and C all report 1.1 * 100.0 = 110.00000000000001. Here is what C does:
double 10.0 * 1.1 = 11.000000000000000000000000
double 100.0 * 1.1 = 110.000000000000014210854715
double 110.0 = 110.000000000000000000000000
double 1000.0 * 1.1 = 1100.000000000000000000000000
The question "why does this happen only when multiplying with 100" (even though there is a precise representation for 110.0) is still unanswered, but I suppose there is no simple answer, other than fully stepping through a floating-point multiplication (Thanks to Dax Fohl for stressing that 10 is nothing special in binary)
The number 1.1 cannot be represented in finite form in binary. It looks like 1.00011001100110011...
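You can see those binary digits from GHCi with Numeric.floatToDigits, which returns the digit string of a Double in any base (here base 2); the repeating 0011 pattern shows up immediately:

ghci> import Numeric (floatToDigits)
ghci> take 12 (fst (floatToDigits 2 (1.1 :: Double)))
[1,0,0,0,1,1,0,0,1,1,0,0]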
"Rounding errors" are just mathematically inevitable with simple floating-point arithmetic. If you want accuracy, use a Decimal number type.
http://support.microsoft.com/kb/42980
The question "why does this happen only when multiplying with 100" (even though there is a precise representation for 110.0) is still unanswered, but I suppose there is no simple answer, other than fully stepping through a floating-point multiplication
Well, I think there may be things one can say without going to the length of writing the binary multiplication, assuming IEEE 754 arithmetic and the (default) round-to-nearest rounding mode.
The double 1.1d is within half a ULP of the real number 1.1. When you multiply it by 10, 100, 1000, and a few more powers of ten, you multiply by a number N that is exactly representable as a double, with the additional property that the result of the real multiplication 1.1 * N is exactly representable as a double, too. That makes 1.1 * N a good candidate for the result of the floating-point multiplication, which we'll write RN(N * 1.1d). But still the multiplication is not automatically rounded to 1.1 * N:
RN(N * 1.1d) = N * 1.1d + E1            with |E1| <= 0.5 * ULP(N * 1.1d)
             = N * (1.1 + E2) + E1      with |E2| <= 0.5 * ULP(1.1)
             = N * 1.1 + (N * E2 + E1)
And the question now is how |N * E2 + E1| compares to ULP(N*1.1d), because since we have assumed N * 1.1 is exactly a floating-point number, if the result of the multiplication (which is also a floating-point number) is within 1 ULP of N * 1.1, it has to be N * 1.1.
In short, it is not so much what's special about 100… It is what's special about the real 1.1d * 100, which 1) is close to a power of two while being below it and 2) has an error of the same sign as the error when converting the real 1.1 to double.
Every time the real N * 1.1d is relatively closer to the nearest inferior power of two than 1.1 is to 1, the result of the floating-point multiplication of 1.1d by N has to be exactly N * 1.1 (I think). An example of this case is N=1000, N*1.1d ~ 1100, just above 1024.
When the real N * 1.1d is relatively closer to the immediately superior power of two than 1.1 is to 2, there may be a floating-point number that represents N * 1.1d better than N * 1.1 does. But if the errors E1 and E2 compensate each other (i.e. have opposite signs), this should not happen.
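A quick GHCi check agrees with this analysis: the product lands exactly on the decimal value for N = 10, 1000 and 10000, but not for N = 100 (assuming the usual IEEE Double):

ghci> (1.1 * 10 :: Double) == 11
True
ghci> (1.1 * 100 :: Double) == 110
False
ghci> (1.1 * 1000 :: Double) == 1100
True
ghci> (1.1 * 10000 :: Double) == 11000
True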
I've written a small function in C which does almost the same work as the standard function `fcvt'. As you may know, this function takes a float/double and makes a string representing this number in ANSI characters. Everything works ;-)
For example, for the number 1.33334, my function gives me the string "133334" and sets a special integer variable `decimal_part'; in this example it will be 1, which means only 1 digit belongs to the integer part and everything else is the fraction.
Now I'm curious about what the standard C function `printf' does. It can take %a or %e in its format string. Let me quote the documentation for %e (link omitted):
"double" argument is output in scientific notation
[-]m.nnnnnne+xx
... The exponent always contains two digits.
It said: "The exponent always contains two digits". But what is an Exponent? This is the main question. And also, how to get this 'exponent' from my function above or from `fcvt'.
The notation might be better explained if we expand the e:
[-]m.nnnnnn * (10^xx)
So you have one digit of m (from 0 to 9, but it will only ever be 0 if the entire value is 0), and several digits of n. I guess it might be best to show with examples:
1 = 1.0000 * 10^0 = 1e0
10 = 1.0000 * 10^1 = 1e1
10000 = 1.0000 * 10^4 = 1e4
0.1 = 1.0000 * 10^-1 = 1e-1
1,419 = 1.419 * 10^3 = 1.419e3
0.00000123 = 1.23 * 10^-5 = 1.23e-5
You can look up scientific notation on Google, but it is useful for expressing very large or small numbers; for example, 1232100000000000000 would be 1.2321e18.
In C, I think you can actually extract the exponent from the top 12 bits (the first being the sign bit, which you will have to ignore), though note that this gives the raw binary exponent, not the decimal exponent that %e prints. See: IEEE 754-1985 floating point.
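If you want to compute the decimal exponent yourself, the idea is simply floor(log10 |x|). Since most of the code on this page is Haskell, here is a small sketch in Haskell rather than C (decimalExponent and normalize are made-up helper names for illustration; the same idea ports to C with log10 and floor, and you have to treat 0 and exact powers of ten carefully):

import Numeric (showEFloat)

-- Decimal exponent xx such that x = m.nnnnnn * 10^xx (for x /= 0).
decimalExponent :: Double -> Int
decimalExponent x = floor (logBase 10 (abs x))

-- Scale x into the range [1, 10) to obtain the m.nnnnnn part.
normalize :: Double -> Double
normalize x = x / (10 ^^ decimalExponent x)

main :: IO ()
main = do
  print (decimalExponent 1234.5678)            -- 3
  print (normalize 1234.5678)                  -- roughly 1.2345678
  putStrLn (showEFloat (Just 6) 1234.5678 "")  -- "1.234568e3" (C's %e would print 1.234568e+03)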
The exponent is the power that 10 is raised to; the significand is then multiplied by that power of ten.
Scientific notation is explained on Wikipedia: http://en.wikipedia.org/wiki/Scientific_notation
m.nnnnnne+xx is logically equal to m.nnnnnn * 10 ^ +xx
In scientific notation, the exponent is the power of ten, so 1234.5678 can be represented as 1.2345678E03, where the normalized form is multiplied by 10^3 to get the "real" answer.
400 = 4 * 10 ^ 2
2 is the exponent.
If you write a number in scientific notation then the exponent is part of that notation.
You can see a full description here http://en.wikipedia.org/wiki/Scientific_notation, but basically its just another way to write a number, typically used for very large or very small numbers.
Say you have the number 300, that is equal to 3 * 100, or 3 * 10^2 in scientific notation.
If you use %e it will be printed as 3.0e+02