I have a scenario where I'm trying to execute a complex application on AIX and Linux.
During the execution the code makes use of the intrinsic function sqrt() for computation, but the results obtained differ between the two machines.
Does anyone know the reason for this behavior? Is there any way to overcome this?
P.S.
Some values are equal on both machines, but the majority of them are different.
Processors that follow the IEEE 754 specification must return the exact result for square root, or the correctly rounded result when the exact value cannot be represented. For the same input values, floating point format, and rounding mode, different IEEE 754 compliant processors must return an identical result. No variation is allowed. Possible reasons for seeing different results:
One of the processors does not follow the IEEE 754 floating point specification.
The values are really the same, but a print related bug or difference makes them appear different.
The rounding mode or precision control is not set the same on both systems.
One system attempts to follow the IEEE 754 specification but has an imperfection in its square root function.
Did you compare binary output to eliminate the possibility of a print formatting bug or difference?
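For example, one way to do that comparison (a quick Python sketch, assuming you can get the raw values out of the application; the two sample values here are placeholders) is to dump the exact bit pattern of each result and diff those instead of the printed text:
import struct

def bits_of(x):
    # Pack the double into its 8 raw bytes and show them next to Python's
    # exact hexadecimal float notation.
    return f"{x!r}  hex={x.hex()}  bytes={struct.pack('>d', x).hex()}"

# Placeholder values standing in for the results read from each machine.
value_aix = 1.4142135623730951
value_linux = 1.4142135623730951
print(bits_of(value_aix))
print(bits_of(value_linux))
print("bit-identical:", struct.pack('>d', value_aix) == struct.pack('>d', value_linux))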
Most processors today support IEEE 754 floating point. An example where IEEE 754 accuracy is not guaranteed is with the OpenCL native_sqrt function. OpenCL defines native_sqrt (in addition to IEEE 754 compliant sqrt) so that speed can be traded for accuracy if desired.
Bugs in IEEE 754 sqrt implementations are not too common today. A difficult case for an IEEE 754 sqrt function is when the rounding mode is set to nearest and the actual result is very near the midway point between two floating point representations. A method for generating these difficult square root arguments can be found in a paper by William Kahan, How to Test Whether SQRT is Rounded Correctly.
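As a rough (and far from exhaustive) cross-check, you can compare a library's sqrt against a reference computed at much higher precision; the Python sketch below uses the decimal module at 60 digits, which is comfortably more than is needed to round the result correctly back to double precision:
import math
from decimal import Decimal, getcontext

getcontext().prec = 60   # far more precision than a 53-bit double needs

def sqrt_is_correctly_rounded(x):
    # Compute sqrt at 60 decimal digits, round once to double precision,
    # and compare with the result of the library/hardware sqrt.
    reference = float(Decimal(x).sqrt())
    return math.sqrt(x) == reference

print(all(sqrt_is_correctly_rounded(n / 7) for n in range(1, 10000)))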
There may be slight differences in the numeric representation of the hardware on the two computers, or in the algorithm used for the sqrt function by the two compilers. Finite-precision arithmetic is not the same as the arithmetic of real numbers, and slight differences in calculations should be expected. To judge whether the differences are unusual, you should state the numeric type that you are using (as asked by ChuckCottrill) and give examples. What is the relative difference? For values of order unity, 1E-7 is an expected difference for single-precision floating point.
Check the floating-point formats available on each CPU. Are you using single-precision or double-precision floating point? You need to use a floating-point format with similar precision on both machines if you want comparable/similar answers.
Floating point is an approximation. An IEEE 754 single-precision float uses 1 sign bit, 8 exponent bits, and 23 stored significand bits (24 effective bits including the implicit leading 1). This allows for about 7 decimal digits of precision. Double precision uses a 53-bit significand, allowing roughly 15-16 decimal digits.
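For instance (a small Python sketch; numpy's float32 is used here as a stand-in for single precision, which is an assumption about what your application uses):
import numpy as np

x = 2.0
print(np.sqrt(np.float32(x)))   # single precision: about 7 significant digits, e.g. 1.4142135
print(np.sqrt(np.float64(x)))   # double precision: about 15-16 digits, e.g. 1.4142135623730951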
Lacking detail about the binary values of the differing floating-point numbers on the two systems, and about the printed representations of those values, the most likely explanation is a rounding or representation difference.
Related
I created a simple function to approximate e to the power pi, explained here:
import math

def e_to_power_pi(number):
    return (1 + (1 / number)) ** (number * math.pi)
From the look of it, it is clearly a simple piece of code. But look at the output difference for these two values:
Example one:
e_to_power_pi(1000000000000000)
output:
32.71613881872869
Example Two:
e_to_power_pi(10000000000000000)
output:
1.0
Upon tearing down the code, I learnt that the 1.0 is coming from this portion:
1 + (1/number) of the code above.
When I tore it down further, I learnt that 1/10000000000000000 outputs the correct answer, as it should: 1e-16, i.e. 0.0000000000000001.
But when I add 1 to that result it returns 1.0 instead of 1.0000000000000001.
I presumed that it must be some default round-off in Python that is changing the value.
I decided to use round(<float>, 64), where <float> is any computation taking place in the code above, to try to get a result with 64 digits after the decimal point. But I still got stuck with the same result, 1.0, when the addition was performed.
Can someone guide me or point me to the direction where I can learn or further read about it?
You are using double-precision binary floating-point format, with 53-bit significand precision, which is not quite enough to represent your fraction:
10000000000000001/10000000000000000 = 1.0000000000000001
See IEEE 754 double-precision binary floating-point format: binary64
Mathematica can operate in precisions higher than the architecturally imposed machine precision.
See Wolfram Language: MachinePrecision
The Mathematica screenshot below shows you would need a significand precision higher than 53-bit to obtain a result other than 1.
N numericises the fractional result to the requested precision. Machine precision is the default; higher precision calculations are done in software.
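If you do not have Mathematica at hand, Python's decimal module makes the same point; the sketch below (with an assumed working precision of 40 significant digits and a hard-coded value of pi) redoes the calculation in software-based decimal arithmetic:
from decimal import Decimal, getcontext

getcontext().prec = 40   # well beyond the ~16 digits a 53-bit double can hold

n = Decimal(10) ** 16
base = 1 + 1 / n
print(base)                # 1.0000000000000001 -- the fraction survives at this precision
pi = Decimal("3.141592653589793238462643383279502884197")
print(base ** (n * pi))    # roughly 23.1407, i.e. close to e**pi, instead of 1.0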
Having a file test2.py with the following contents:
print(2.0000000000000003)
print(2.0000000000000002)
I get this output:
$ python3 test2.py
2.0000000000000004
2.0
I thought a lack of memory allocated for the float might be causing this, but 2.0000000000000003 and 2.0000000000000002 need the same amount of memory.
IEEE 754 64-bit binary floating point always uses 64 bits to store a number. It can exactly represent a finite subset of the binary fractions. Looking only at the normal numbers: if N is a power of two in its range, it can represent every number of the form (in binary) 1.s × N, where s is a string of 52 zeros and ones.
All the 32 bit binary integers, including 2, are exactly representable.
The smallest exactly representable number greater than 2 is 2.000000000000000444089209850062616169452667236328125. It is twice the binary fraction 1.0000000000000000000000000000000000000000000000000001.
2.0000000000000003 is closer to 2.000000000000000444089209850062616169452667236328125 than to 2, so it rounds up and prints as 2.0000000000000004.
2.0000000000000002 is closer to 2.0, so it rounds down to 2.0.
To store numbers between 2.0 and 2.000000000000000444089209850062616169452667236328125 would require a different floating point format likely to take more than 64 bits for each number.
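You can see those neighbours directly in Python (math.nextafter needs Python 3.9 or later):
import math

two = 2.0
next_up = math.nextafter(two, math.inf)   # smallest double greater than 2.0
print(next_up)                            # 2.0000000000000004
print(f"{next_up:.51f}")                  # 2.000000000000000444089209850062616169452667236328125
print(2.0000000000000003 == next_up)      # True: the literal rounds up to the neighbour
print(2.0000000000000002 == two)          # True: the literal rounds down to 2.0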
Floats are not stored as integers are, with each bit signalling a yes/no term of value 1, 2, 4, 8, 16, 32, ... that you add up to get the complete number. They are stored as sign + mantissa + exponent in base 2. Several bit combinations have special meanings (NaN, +/-inf, -0, ...). Positive and negative numbers are identical in mantissa and exponent; only the sign differs.
At any given time they are stored in a fixed bit length that they are "put into". They cannot overflow that storage.
They do, however, have limited accuracy: if you try to fit numbers into them that would need more accuracy than is available, you get rounding errors - that's what you see in your example.
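A small Python sketch of that layout, slicing a 64-bit double into its sign, exponent, and mantissa fields:
import struct

def fields(x):
    # Reinterpret the 8 bytes of the double as a 64-bit integer and split it
    # into the 1-bit sign, 11-bit exponent, and 52-bit mantissa.
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

print(fields(2.0))                   # (0, 1024, 0)
print(fields(-2.0))                  # (1, 1024, 0) -- only the sign differs
print(fields(2.0000000000000002))    # mantissa 0: the literal rounded to exactly 2.0
print(fields(2.0000000000000003))    # mantissa 1: the next representable double above 2.0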
More on floats and storage (with example):
http://stupidpythonideas.blogspot.de/2015/01/ieee-floats-and-python.html
(which links to a more technical https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html)
More on accuracy of floats:
- Floating Point Arithmetic: Issues and Limitations
I'm using Python 3's decimal module. Is the underlying arithmetic done using the processor's floating-point types, or does it use integers? The notion that the results are 'exact' and of arbitrary precision suggests to me that integer maths is used below the surface.
Indeed it is integer math, not float math. Roughly speaking, every decimal number is stored in two parts: the digits themselves as an integer, plus an exponent saying where the decimal point goes. Thanks to that, the calculations are done using integer arithmetic and hence are not silently rounded, so they stay precise even if you sum a very large value with a very small fraction.
This comes at a price - the number of operations is significantly larger, and it is not always necessary to be so precise at all times. That is why most calculations are done using float arithmetic, which may cause a loss of precision when there are many arithmetic operations on floats or there are significant differences between the values (e.g. a ratio of 10^10 or more). There is a separate field of computer science, numerical analysis (numerical methods), that studies clever ways to get the most out of the speed of float calculations while maintaining the highest possible precision.
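A quick illustration of the difference (the precision of 40 digits below is arbitrary):
from decimal import Decimal, getcontext

getcontext().prec = 40

big, small = 10**20, 1
print(float(big) + float(small) == float(big))   # True: the 1 is lost in binary floating point
print(Decimal(big) + Decimal(small))             # 100000000000000000001: nothing is lost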
I'm a bit confused about the local space coordinate system. Suppose I have a complex object in local space. I know that when I want to put it in world space I have to multiply it by Scale, Rotate, and Translate matrices. But the problem is that the local coordinates only range from -1.0f to 1.0f, so when I want to have a vertex like (1/500, 1/100, 1/100) things will not work; everything will become 0 due to the float accuracy problem.
The only solution I see now is to separate the object into lots of local space systems and project-view each individually before putting them together. That does not seem like the correct way of solving the problem. I've checked lots of books but none of them mention this issue. I really want to know how to solve it.
when I want to have vertex like (1/500,1/100,1/100) things will not work
What makes you think that? The float accuracy problem does not mean something will coerce to 0 if it can't be accurately represented. It just means it will be coerced to the floating point number closest to the intended figure.
It's the very same as writing down, e.g., 3/9 with at most 6 significant decimal digits: 0.333333 - it didn't coerce to 0. And the very same goes for floating point.
Now you may be familiar with scientific notation: x·10^y - this is essentially decimal floating point, a mantissa x and an exponent y which specifies the order of magnitude. In binary floating point it becomes x·2^y. In either case the significant digits are in the mantissa. Your typical floating point number (in OpenGL) has a mantissa of 23 stored bits, which together with the implicit leading bit gives 24 significant binary digits (about 7 decimal digits).
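For instance (a Python sketch, using numpy's float32 as a stand-in for a typical GPU float, which is an assumption about your vertex format):
import numpy as np

v = np.float32(1.0) / np.float32(500.0)
print(v)                    # 0.002 -- not 0, just the nearest representable float32
print(f"{float(v):.12f}")   # 0.002000000095: the error shows up around the 8th significant digit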
I really want to know how to solve it.
The real trouble with floating point numbers arises when you have to mix and merge numbers across a large range of orders of magnitude. As long as the numbers are of a similar order of magnitude, everything happens within the mantissa. And that one last change of order of magnitude into the [-1, 1] range will not hurt you; it could even be done by "normalizing" the floating point value and then simply dropping the exponent.
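A tiny sketch of that last point (the 512 here is an arbitrary power-of-two scale factor):
import numpy as np

x = np.float32(123.456)
scale = np.float32(512.0)           # a power of two: rescaling only changes the exponent
print(x / scale * scale == x)       # True: no precision was lost going to the smaller range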
Recommended read: http://floating-point-gui.de/
Update
One further thing: If you're writing 1/500 in a language like C, then you're performing an integer division and that will of course round down to 0. If you want this to be a floating point operation you either have to write floating point literals or cast to float, i.e.
1./500.
or
(float)1/(float)500
Note that casting one of the operands to float suffices to make this a floating point division.
How can I implement code in Verilog that evaluates an exponential equation whose numbers must be represented as fixed point?
For example I have this equation on C++ and wish to convert to Verilog or VHDL:
double y = 0.1 + 0.75 * (1.0 / (1.0 + exp((x[i] + 40.5) / 6.0)));
where 'y' and 'x' must be fixed-point numbers, and 'x' is also a vector.
I looked for modules and libraries that have fixed-point support, but none of them have exponentials.
Verilog has a real data type that provides simulation-time support for floating-point numbers. It also has an exponentiation operator, e.g., a ** b computes a to the power of b.
However, code written using the real datatype is generally not synthesizable. Instead, in real hardware designs, support for fixed and floating point numbers is generally achieved by implementing arithmetic logic units that implement, e.g., the IEEE floating point standard.
Most of the time, such a design will require at least a couple of cycles even for basic operations like addition and multiplication. More complex operations like division, sine, cosine, etc. are generally implemented with iterative algorithms or approximating polynomials.
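As a rough illustration of that route (a Python model of the integer arithmetic you would then code in RTL, with an assumed Q16.16 fixed-point format, a lookup table for the exponential part, and an assumed input range for x):
import math

FRAC_BITS = 16                 # assumed Q16.16 fixed-point format
ONE = 1 << FRAC_BITS

def to_fixed(v):
    return int(round(v * ONE))

def from_fixed(f):
    return f / ONE

# Precomputed table of sig(x) = 1 / (1 + exp((x + 40.5) / 6.0)) at integer x,
# over an assumed input range, stored as Q16.16 integers (in hardware: a ROM).
SIG_TABLE = {x: to_fixed(1.0 / (1.0 + math.exp((x + 40.5) / 6.0)))
             for x in range(-100, 21)}

def y_fixed(x_fixed):
    # y = 0.1 + 0.75 * sig(x), evaluated entirely in Q16.16 integer arithmetic.
    x_index = x_fixed >> FRAC_BITS                  # floor to the nearest table entry
    sig = SIG_TABLE[x_index]
    return to_fixed(0.1) + ((to_fixed(0.75) * sig) >> FRAC_BITS)

print(from_fixed(y_fixed(to_fixed(-40.0))))                          # ~0.459 in fixed point
print(0.1 + 0.75 * (1.0 / (1.0 + math.exp((-40.0 + 40.5) / 6.0))))   # ~0.459 float reference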
If you really want to understand how to represent and manipulate fixed point and floating point numbers, you should probably get a textbook for a mathematics course such as Numerical Methods, or an EE course on Computer Arithmetic.