Float to Int type conversion in Python for large integers/numbers - python-3.x

Need some help on the piece of code below that I am working on. Why is the original number in "a" different from "c" after it goes through a type conversion? Is there any way to make "a" and "c" the same across a float -> int type conversion?
a = '46700000000987654321'
b = float(a) => 4.670000000098765e+19
c = int(b) => 46700000000987652096
a == c => False

Please read this document about Floating Point Arithmetic: Issues and Limitations:
https://docs.python.org/3/tutorial/floatingpoint.html
For your example:
from decimal import Decimal
a = '46700000000987654321'
b = Decimal(a)
print(b)  # 46700000000987654321
c = int(b)
print(c)  # 46700000000987654321

Modified version of my answer to another question (reasonably) duped to this one:
This happens because 46700000000987654321 is greater than the integer representational limits of a C double (which is what a Python float is implemented in terms of).
Typically, C doubles are IEEE 754 64-bit binary floating point values, which means they have 53 bits of integer precision (the last consecutive integer values a float can represent are 2 ** 53 - 1 followed by 2 ** 53; it can't represent 2 ** 53 + 1). The problem is, 46700000000987654321 requires 66 bits of integer precision to store ((46700000000987654321).bit_length() will provide this information).
When a value is too large for the significand alone, the exponent component of the floating point value is used to scale a smaller integer value by powers of 2 to be roughly in the ballpark of the original value. This means that the representable integers start to skip: first by 2 (as you require >53 bits), then by 4 (>54 bits), then 8 (>55 bits), then 16 (>56 bits), and so on, skipping twice as far between representable values for each additional bit of magnitude you have beyond 53 bits.
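To see the skipping directly, here is a minimal Python sketch (the values are the ones discussed above):
print(float(2**53) == float(2**53 + 1))    # True: 2**53 + 1 rounds back to 2**53
print(float(2**53 + 2))                    # 9007199254740994.0: gaps of 2 begin here
print((46700000000987654321).bit_length()) # 66: bits needed, well beyond 53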
In your case, 46700000000987654321, converted to float, has an integer value of 46700000000987652096 (as you noted), having lost precision in the low digits.
If you need arbitrarily precise base-10 floating point math, replace your use of float with decimal.Decimal. Conveniently, your initial value is already a string, so you don't risk loss of precision between the float you type and the value actually stored. The default precision will handle these values, and you can increase it if you need larger ones. If you do that (and convert a to an int for the comparison, since a str is never equal to any numeric type), you get the behavior you expected:
from decimal import Decimal as Dec, getcontext
a = "46700000000987654321"
b = Dec(a); print(b) # => 46700000000987654321
c = int(b); print(c) # => 46700000000987654321
print(int(a) == c) # => True
If you echo the Decimal values in an interactive interpreter instead of using print, you'd see Decimal('46700000000987654321') instead; that's the repr form of a Decimal, but it's numerically 46700000000987654321, and when converted to int, or stringified via any method that doesn't use the repr (e.g. print), it displays as just 46700000000987654321.
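For instance, echoing vs. printing in an interactive session:
>>> from decimal import Decimal
>>> Decimal('46700000000987654321')
Decimal('46700000000987654321')
>>> print(Decimal('46700000000987654321'))
46700000000987654321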

Related

Strange result from Summation of numbers in Excel and Matlab [duplicate]

I am writing a program where I need to delete duplicate points stored in a matrix. The problem is that when it comes to checking whether those points are in the matrix, MATLAB can't recognize them even though they exist.
In the following code, intersections function gets the intersection points:
[points(:,1), points(:,2)] = intersections(...
obj.modifiedVGVertices(1,:), obj.modifiedVGVertices(2,:), ...
[vertex1(1) vertex2(1)], [vertex1(2) vertex2(2)]);
The result:
>> points
points =
12.0000 15.0000
33.0000 24.0000
33.0000 24.0000
>> vertex1
vertex1 =
12
15
>> vertex2
vertex2 =
33
24
Two points (vertex1 and vertex2) should be eliminated from the result, using the commands below:
points = points((points(:,1) ~= vertex1(1)) | (points(:,2) ~= vertex1(2)), :);
points = points((points(:,1) ~= vertex2(1)) | (points(:,2) ~= vertex2(2)), :);
After doing that, we have this unexpected outcome:
>> points
points =
33.0000 24.0000
The outcome should be an empty matrix. As you can see, only one of the [33.0000 24.0000] pairs has been eliminated; the other remains.
Then I checked these two expressions:
>> points(1) ~= vertex2(1)
ans =
0
>> points(2) ~= vertex2(2)
ans =
1 % <-- It means 24.0000 is not equal to 24.0000?
What is the problem?
More surprisingly, I made a new script that has only these commands:
points = [12.0000 15.0000
33.0000 24.0000
33.0000 24.0000];
vertex1 = [12 ; 15];
vertex2 = [33 ; 24];
points = points((points(:,1) ~= vertex1(1)) | (points(:,2) ~= vertex1(2)), :);
points = points((points(:,1) ~= vertex2(1)) | (points(:,2) ~= vertex2(2)), :);
The result as expected:
>> points
points =
Empty matrix: 0-by-2
The problem you're having relates to how floating-point numbers are represented on a computer. A more detailed discussion of floating-point representations appears towards the end of my answer (The "Floating-point representation" section). The TL;DR version: because computers have finite amounts of memory, numbers can only be represented with finite precision. Thus, the accuracy of floating-point numbers is limited to a certain number of decimal places (about 16 significant digits for double-precision values, the default used in MATLAB).
Actual vs. displayed precision
Now to address the specific example in the question... while 24.0000 and 24.0000 are displayed in the same manner, it turns out that they actually differ by very small decimal amounts in this case. You don't see it because MATLAB only displays 4 significant digits by default, keeping the overall display neat and tidy. If you want to see the full precision, you should either issue the format long command or view a hexadecimal representation of the number:
>> pi
ans =
3.1416
>> format long
>> pi
ans =
3.141592653589793
>> num2hex(pi)
ans =
400921fb54442d18
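As a cross-check outside MATLAB, the same bit pattern can be read in Python via the standard struct module (a minimal sketch):
import math, struct
# Pack pi as a big-endian IEEE 754 double and show its raw bytes in hex
print(struct.pack('>d', math.pi).hex())  # 400921fb54442d18, matching num2hex(pi)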
Initialized values vs. computed values
Since there are only a finite number of values that can be represented for a floating-point number, it's possible for a computation to result in a value that falls between two of these representations. In such a case, the result has to be rounded off to one of them. This introduces a small machine-precision error. This also means that initializing a value directly or by some computation can give slightly different results. For example, the value 0.1 doesn't have an exact floating-point representation (i.e. it gets slightly rounded off), and so you end up with counter-intuitive results like this due to the way round-off errors accumulate:
>> a=sum([0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1]); % Sum 10 0.1s
>> b=1; % Initialize to 1
>> a == b
ans =
logical
0 % They are unequal!
>> num2hex(a) % Let's check their hex representation to confirm
ans =
3fefffffffffffff
>> num2hex(b)
ans =
3ff0000000000000
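The same experiment is easy to reproduce in Python, where float.hex plays the role of num2hex (a minimal sketch):
a = sum([0.1] * 10)   # sum ten 0.1s
b = 1.0
print(a == b)         # False: they are unequal
print(a.hex())        # 0x1.fffffffffffffp-1
print(b.hex())        # 0x1.0000000000000p+0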
How to correctly handle floating-point comparisons
Since floating-point values can differ by very small amounts, any comparisons should be done by checking that the values are within some range (i.e. tolerance) of one another, as opposed to exactly equal to each other. For example:
a = 24;
b = 24.000001;
tolerance = 0.001;
if abs(a-b) < tolerance, disp('Equal!'); end
will display "Equal!".
You could then change your code to something like:
points = points((abs(points(:,1)-vertex1(1)) > tolerance) | ...
(abs(points(:,2)-vertex1(2)) > tolerance),:)
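Python builds this idea into math.isclose; a minimal sketch (the tolerance here is illustrative, pick one suited to your data):
import math

a = 24.0
b = 24.000001
print(a == b)                             # False: exact comparison fails
print(math.isclose(a, b, abs_tol=0.001))  # True: within tolerance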
Floating-point representation
A good overview of floating-point numbers (and specifically the IEEE 754 standard for floating-point arithmetic) is What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg.
A binary floating-point number is actually represented by three integers: a sign bit s, a significand (or coefficient/fraction) b, and an exponent e. For the double-precision floating-point format, each number is represented by 64 bits laid out in memory as follows: 1 sign bit, 11 exponent bits, and 52 significand bits.
For normalized values, the real value can then be found with the formula (-1)^s * (1 + b/2^52) * 2^(e - 1023).
This format allows for representing numbers roughly in the range 10^-308 to 10^308. For MATLAB you can get these limits from realmin and realmax:
>> realmin
ans =
2.225073858507201e-308
>> realmax
ans =
1.797693134862316e+308
Since there are a finite number of bits used to represent a floating-point number, there are only so many finite numbers that can be represented within the above given range. Computations will often result in a value that doesn't exactly match one of these finite representations, so the values must be rounded off. These machine-precision errors make themselves evident in different ways, as discussed in the above examples.
In order to better understand these round-off errors it's useful to look at the relative floating-point accuracy provided by the function eps, which quantifies the distance from a given number to the next largest floating-point representation:
>> eps(1)
ans =
2.220446049250313e-16
>> eps(1000)
ans =
1.136868377216160e-13
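Outside MATLAB, Python 3.9+ exposes the same quantity as math.ulp; a quick sketch for comparison:
import math

print(math.ulp(1.0))     # 2.220446049250313e-16, same as eps(1)
print(math.ulp(1000.0))  # 1.1368683772161603e-13, same as eps(1000)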
Notice that the precision is relative to the size of a given number being represented; larger numbers will have larger distances between floating-point representations, and will thus have fewer digits of precision following the decimal point. This can be an important consideration with some calculations. Consider the following example:
>> format long % Display full precision
>> x = rand(1, 10); % Get 10 random values between 0 and 1
>> a = mean(x) % Take the mean
a =
0.587307428244141
>> b = mean(x+10000)-10000 % Take the mean at a different scale, then shift back
b =
0.587307428244458
Note that when we shift the values of x from the range [0 1] to the range [10000 10001], compute a mean, then subtract the mean offset for comparison, we get a value that differs for the last 3 significant digits. This illustrates how an offset or scaling of data can change the accuracy of calculations performed on it, which is something that has to be accounted for with certain problems.
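The same effect can be reproduced in Python (a minimal sketch; the seed is arbitrary and the exact digits will vary):
import random

random.seed(1)                                    # arbitrary seed
x = [random.random() for _ in range(10)]          # 10 values in [0, 1)
a = sum(x) / len(x)                               # mean
b = sum(v + 10000 for v in x) / len(x) - 10000    # mean at a different scale
print(a == b)   # almost always False
print(a, b)     # typically differ in the last few digits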
Look at this article: The Perils of Floating Point. Though its examples are in FORTRAN, it applies to virtually any modern programming language, including MATLAB. Your problem (and its solution) is described in the "Safe Comparisons" section.
Type
format long g
This command will show the FULL value of the number. It's likely to be something like 24.00000021321 != 24.00000123124
Try writing
0.1 + 0.1 + 0.1 == 0.3.
Warning: You might be surprised by the result!
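The same holds in Python, for instance:
print(0.1 + 0.1 + 0.1 == 0.3)   # False
print(0.1 + 0.1 + 0.1)          # 0.30000000000000004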
Maybe the two numbers are really 24.0 and 24.000000001 but you're not seeing all the decimal places.
Check out the Matlab EPS function.
Matlab uses floating point math up to 16 digits of precision (only 5 significant digits are displayed by default).

math.sqrt function python gives same result for two different values [duplicate]

Why does the math module return the wrong result?
First test
from math import sqrt

A = 12345678917
print('A =', A)
B = sqrt(A**2)
print('B =', int(B))
Result
A = 12345678917
B = 12345678917
Here, the result is correct.
Second test
A = 123456758365483459347856
print('A =', A)
B = sqrt(A**2)
print('B =', int(B))
Result
A = 123456758365483459347856
B = 123456758365483467538432
Here the result is incorrect.
Why is that the case?
Because math.sqrt(..) first converts the number to a floating-point value, and floating-point values have a limited significand: they can only represent the leading part of a large number correctly. So float(A**2) is not equal to A**2. math.sqrt then computes the square root of that approximation, which is itself only approximately correct.
Most functions working with floating points will never agree exactly with their integer counterparts. Floating point calculations are almost inherently approximative.
If one calculates A**2 one gets:
>>> 12345678917**2
152415787921658292889
Now if one converts it to a float(..), one gets:
>>> float(12345678917**2)
1.5241578792165828e+20
But if you now ask whether the two are equal:
>>> float(12345678917**2) == 12345678917**2
False
So information has been lost while converting it to a float.
You can read more about how floats work and why these are approximative in the Wikipedia article about IEEE-754, the formal definition on how floating points work.
The documentation for the math module states "It provides access to the mathematical functions defined by the C standard." It also states "Except when explicitly noted otherwise, all return values are floats."
Those together mean that the parameter to the square root function is a float value. In most systems that means a floating point value that fits into 8 bytes, which is called "double" in the C language. Your code converts your integer value into such a value before calculating the square root, then returns such a value.
However, the 8-byte floating point value can store at most 15 to 17 significant decimal digits. That is what you are getting in your results.
If you want better precision in your square roots, use a function that is guaranteed to give full precision for an integer argument. Just do a web search and you will find several. These usually use a variation of the Newton-Raphson method, iterating until they land on the correct answer. Be aware that this is significantly slower than the math module's sqrt function.
Here is a routine that I modified from the internet. I can't cite the source right now. This version also works for non-integer arguments but just returns the integer part of the square root.
def isqrt(x):
    """Return the integer part of the square root of x, even for very
    large values."""
    if x < 0:
        raise ValueError('square root not defined for negative numbers')
    n = int(x)
    if n == 0:
        return 0
    a, b = divmod(n.bit_length(), 2)
    x = (1 << (a + b)) - 1
    while True:
        y = (x + n // x) // 2
        if y >= x:
            return x
        x = y
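As a side note, since Python 3.8 the standard library provides math.isqrt, which computes the exact integer square root without ever touching floats; for example:
import math

A = 123456758365483459347856
print(math.isqrt(A**2) == A)   # True: computed exactly in integer arithmetic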
If you want to calculate sqrt of really large numbers and you need exact results, you can use sympy:
import sympy
num = sympy.Integer(123456758365483459347856)
print(int(num) == int(sympy.sqrt(num**2)))
The way floating-point numbers are stored in memory makes calculations with them prone to slight errors that can nevertheless be significant when exact results are needed. As mentioned in one of the comments, the decimal library can help you here:
>>> from decimal import Decimal
>>> A = Decimal(123456758365483459347856)
>>> A
Decimal('123456758365483459347856')
>>> B = A.sqrt()**2
>>> B
Decimal('123456758365483459347856.0000')
>>> A == B
True
>>> int(B)
123456758365483459347856
I use version 3.6, which has no hardcoded limit on the size of integers. (In 2.7, casting B to an int would simply produce a long rather than overflow.) Either way, decimal is incredibly useful.

Add floats yields Double

This groovy:
float a = 1;
float b = 2;
def r = a + b;
Produces this Java code when the .class file is decompiled with IntelliJ:
float a = (float)1;
float b = (float)2;
Object r = null;
double var7 = (double)a + (double)b;
r = Double.valueOf(var7);
So r contains a Double.
If I do this:
float a = 1;
float b = 2;
float r = a + b;
It generates code that performs the addition with doubles and converts back to float:
float a = (float)1;
float b = (float)2;
float r = 0.0F;
double var7 = (double)a + (double)b;
r = (float)var7;
So should one abandon floats with Groovy, as it seems not to want to use them anyway?
Groovy decided to fall back to five standard result types for numeric operations: int, long, BigInteger, double and BigDecimal. Thus adding or multiplying two floats returns a double. Division and power are special cases.
From http://www.groovy-lang.org/syntax.html
Division and power binary operations aside,
binary operations between byte, char, short and int result in int
binary operations involving long with byte, char, short and int result in long
binary operations involving BigInteger and any other integral type result in BigInteger
binary operations between float, double and BigDecimal result in double
binary operations between two BigDecimal result in BigDecimal
As for whether you should abandon float: normally it is good enough to convert the double back to float, especially since Groovy is doing that automatically for you.
.NET (C#) does something similar with 16-bit integers: addition of Bytes or Int16s yields an Int32, possibly to prevent overflows.
Operations with "smaller" data types may result in the "bigger" data types. And with bigger, I mean more bits.
As illustrated in this example (more digits also means more bits)
15 (2 digits) x 15 (2 digits) = 225 (3 digits)
1.5 (2 digits) x 1.5 (2 digits) = 2.25 (3 digits)
However, adding two 32-bit integers returns just a 32-bit integer, and adding two doubles just returns a double. This is because the (virtual) machine is optimized for working with these sizes, which is because physical processors used to be optimized for working with these sizes. Some of them still are: 32-bit operations are often still faster than 64-bit operations, even on 64-bit processors. However, 16-bit operations are not, or barely.
Your compiler attempts to protect you against overflows, and allows you to check for them explicitly. So unless you have a good reason not to, I'd default to using these types, and optionally truncate to a more compact type when storing the data.
Good reasons not to do so include scenarios where you process large amounts (1000s) of numbers, e.g. for graphics processing.

While computing proper fraction in Haskell

I want to code a function makeFraction :: Float -> Float -> (Int, Int) which returns (x,y) whenever I say makeFraction a b such that x/y is a proper fraction equivalent to a / b. For eg, makeFraction 17.69 5.51 should return (61,19).
I have a subroutine to calculate gcd of two numbers but my first task is to convert a and b to Int e.g. 17.69 and 5.51 should be converted into 1769 and 551.
Now I want to do it for numbers with an arbitrary number of decimal places. The Prelude functions do not help me much. For instance, when I say toFraction(0.2), it returns 3602879701896397 % 18014398509481984, which would severely strain the correctness of my later computations.
Later I tried getting the fractional part using another library function, properFraction(17.69), which is supposed to give me just 0.69 but produces 0.69000...013, which is not something I would accept in a proper state of mind.
This does look like a problem arising from floating point arithmetic. So far I am not doing any data manipulation, only asking for the stored bits, which I should be able to fetch from processor registers/memory. Is there any special function library in Haskell for such tasks?
PS: Seems like some useful tips are here How to parse a decimal fraction into Rational in Haskell? . But since I have typed so much, I would like to post it. At least the context is different here.
Yes, it is the limited precision of floating-point arithmetic you're encountering. The floating-point format cannot represent 0.2 exactly, so toFraction is actually giving you the exact rational value of the Float number you get when you ask for 0.2.
Similarly, 17.69 cannot be represented exactly, and because the point floats, its best representation has a larger absolute error than the error in the representation of 0.69. Thus, when you take away the integer part, the resulting bits are not the same as if you had asked to represent 0.69 as good as possible from the beginning, and this difference can be seen when the implementation prints out the result in decimal form.
It seems to me that instead of using a floating-point type like Float or Double, you should do all your computations using a type that can represent those numbers exactly, like Rational. For example,
(17.69 :: Rational) / (5.51 :: Rational)
evaluates to 61 % 19
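For comparison, Python's fractions.Fraction plays the same role as Rational here, and shows the same float-vs-string distinction (a small sketch):
from fractions import Fraction

print(Fraction(0.2))     # 3602879701896397/18014398509481984, the stored value
print(Fraction('0.2'))   # 1/5, parsed exactly from the decimal string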
As mentioned in the other answers, a Float cannot necessarily represent a given decimal number exactly. In particular, a Float is stored internally using the form a/(2^m). As a result, real numbers like 3/10 can only ever be approximated by floating point numbers.
But if a decent approximation is all you need, this might help:
import Data.Ratio

convertFloat :: Float -> Rational
convertFloat f =
    let denom = 10 ^ 6
        num   = fromInteger denom * f
    in  round num % denom
For example:
> convertFloat 17.69
1769 % 100
> convertFloat 17.69 / convertFloat 5.51
61 % 19
Check out base's Numeric module, especially the floatToDigits function.
> floatToDigits 10 17.69
([1,7,6,9],2)

problem with Double and Rational Number

I am writing a function in which I need to read a string containing a floating-point number and turn it into a Rational. But when I do toRational (read input :: Double), it does not turn e.g. 0.9 into 9 % 10 as expected, but instead into 81..... % 9007...
Thx
This is correct behavior. The number 0.9 is not representable as a Double, not in Haskell, C, or Java. This is because Double and Float use base 2: they can only represent a certain subset of the dyadic fractions exactly.
To get the behavior you want, import the Numeric module and use the readFloat function. The interface is fairly wonky (it uses the ReadS type), so you'll have to wrap it a little. Here's how you can use it:
import Numeric
myReadFloat :: String -> Rational  -- type signature is necessary here
myReadFloat str =
    case readFloat str of
        ((n, []):_) -> n
        _           -> error "Invalid number"
And, the result:
> myReadFloat "0.9"
9 % 10
Binary floating point numbers cannot precisely represent all the numbers that base-10 can. The number you see as 0.9 is not precisely 0.9 but something very close to it. Never use floating-point types where decimal precision is needed — they just can't do it.
