The code below results in moneyDouble = 365.24567874299998, but I need it to be exactly 365.245678743.
I wouldn't mind having to set a precision and getting some extra zeros to the right.
This number is used to calculate money transactions, so it needs to be exact.
#include <string> // for std::string and std::stod

std::string money ("365.245678743");
std::string::size_type sz; // alias of size_t
double moneyDouble = std::stod (money, &sz);
Floating-point numbers and exact precision don't mix, period [link]. For this reason, monetary calculations should never be done in floating-point [link]. Use a fixed-point arithmetic library, or just use integer arithmetic and interpret it as whatever fractional part you need. Since your precision requirements seem to be very high (lots of decimals), a big number library might be necessary.
While library recommendations are off-topic on Stack Overflow, this old question seems to offer a good set of links to libraries you might find useful.
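To illustrate the integer-arithmetic suggestion, here is a minimal C++ sketch; the parseMoney helper and the fixed nine-decimal-digit scale are assumptions made for this example, not something from the question:

#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>

// Hypothetical fixed-point sketch: amounts are stored as integer
// "nano-units" (10^-9), so "365.245678743" is represented exactly as
// the 64-bit integer 365245678743. Negative amounts and overflow
// checking are omitted for brevity.
constexpr int64_t SCALE = 1000000000; // 10^9 = nine fractional digits

int64_t parseMoney(const std::string& s) {
    std::string::size_type dot = s.find('.');
    int64_t whole = std::stoll(s.substr(0, dot));
    std::string frac = (dot == std::string::npos) ? "" : s.substr(dot + 1);
    frac.resize(9, '0'); // pad (or truncate) to exactly nine digits
    return whole * SCALE + std::stoll(frac);
}

int main() {
    int64_t money = parseMoney("365.245678743");
    std::cout << money / SCALE << '.'
              << std::setw(9) << std::setfill('0') << money % SCALE
              << '\n'; // prints 365.245678743, exactly
}

Because every amount is an ordinary 64-bit integer, addition and subtraction are exact; only division and rounding policy need careful thought.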
Your erroneous output for moneyDouble occurs because moneyDouble is a floating-point number, and floating-point numbers cannot express tenths, hundredths, thousandths, etc. exactly.
Furthermore, floating-point numbers are expressed in binary form, meaning that only values that are exact binary fractions can be represented exactly in floating point. Not to mention that they have finite precision, so they can store only a limited number of digits (including those after the decimal point).
Your best bet is to use fixed-point arithmetic, integer arithmetic, or a rational-number class, and you might need a big-number library, since you may have to deal with very large numbers at very high precision.
See Is floating point math broken? for more information about the unexpected results of floating-point accuracy.
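You can see this directly by printing the stored value with more digits than the literal has; the exact output below assumes a typical IEEE 754 binary64 double:

#include <iomanip>
#include <iostream>

int main() {
    // The nearest binary64 value to 365.245678743 is not the decimal
    // value itself; extra digits make the representation error visible.
    double moneyDouble = 365.245678743;
    std::cout << std::setprecision(17) << moneyDouble << '\n';
    // prints 365.24567874299998 (the value from the question)
}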
I am using a GT 740M (CC 3.5) and I have an RGB-to-Lab conversion kernel. When compiled for compute capability 1.0 - 1.2, the whole kernel executes in 924 microseconds; however, when compiled for compute capability 1.3 or higher (up to 3.5), the kernel takes around 3 ms. From the table on Wikipedia, http://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications, I figured that this could be caused by double-precision floating-point operations, so I used the -use_fast_math flag, but it did not help.
What could be the reason for the performance hit?
The whole source code can be seen at http://pastebin.com/JjhH101y
cc 1.0 - 1.2 devices do not support double-precision floating-point operations. Those operations will be "demoted" to single-precision floating-point operations on those devices.
At first glance, all of your variables are float, not double, but your constants are all double-precision constants.
Therefore arithmetic like this:
a=(x-y)*500.0;
will involve a double-precision floating-point multiply on compile targets that support it: (x - y) is computed in float, promoted to double, multiplied by the double constant 500.0, and the result is subsequently narrowed back to a float. On compile targets that don't support double precision, the above operation will be handled entirely via single-precision math.
The --use_fast_math option does not affect the conversions between double and float discussed above.
I would suggest that you start by decorating all your constants as single-precision floating-point constants:
a=(x-y)*500.0f;
You might also want to carefully review the CUDA math API to be sure that you get what you want from operations like this:
exp(log(x)/3.0)
in terms of single or double-precision arithmetic.
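As a hedged illustration (written as ordinary host-side C++ here, though the same function names exist as CUDA device functions), the all-float spelling keeps the whole expression in single precision, and cbrtf is an alternative assuming a cube root is what the expression is meant to compute:

#include <cmath>
#include <iostream>

int main() {
    float x = 27.0f;

    // The double constant 3.0 makes the division, and therefore exp,
    // run in double precision; the result is then narrowed back to float.
    float mixed = exp(log(x) / 3.0);

    // All-float spelling: stays in single precision throughout.
    float single = expf(logf(x) / 3.0f);

    // If the intent is a cube root, the dedicated single-precision
    // function is clearer and typically more accurate.
    float direct = cbrtf(x);

    std::cout << mixed << ' ' << single << ' ' << direct << '\n'; // ~3 3 3
}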
I've been working with System.Numerics.Complex recently, and I've started to notice the typical floating-point "drift", where the stored value comes out a tenth of a millionth off or something like that; this is well known and common with the float type and even the double type. I looked into the Complex struct, and sure enough, it uses double fields. Why does it use double values to store its data and not decimal values, which are designed to prevent this? How do I work around this?
To answer your question:
doubles are several orders of magnitude faster, as operations are done at the hardware level
base-2 floats can actually be more accurate for large computations, as there is less "wobble" when shifting exponents up and down: one bit of precision is less than one decimal digit. Moreover, base 2 can use an implicit leading bit, which means it can represent more numbers at a given width than other bases.
complex numbers are typically used for scientific/engineering applications, where small relative errors of approximately 10^-16 are outweighed by other sources of error (e.g. due to measurement or the model).
decimals on the other hand are typically used for "accounting" type operations, where round-off error is typically negligible (i.e. addition of small numbers, multiplication by integers, etc.)
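To make the "drift" from the question concrete, here is a tiny sketch (in C++ rather than C#, but binary64 arithmetic behaves the same on both platforms):

#include <iomanip>
#include <iostream>

int main() {
    // Neither 0.1 nor 0.2 is representable exactly in base-2 binary64,
    // so the sum picks up a relative error of about 1e-16.
    double sum = 0.1 + 0.2;
    std::cout << std::setprecision(17) << sum << '\n'; // 0.30000000000000004
    std::cout << std::boolalpha << (sum == 0.3) << '\n'; // false
}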
I would like to know whether I can assume that the same operations on the same 64-bit floating-point numbers give exactly the same results on any modern PC and in most common programming languages (C++, Java, C#, etc.). We can assume that we are operating on numbers and that the result is also a number (no NaNs, INFs, and so on).
I know there are two very similar standards of computation using floating-point numbers (IEEE 854-1987 and IEEE 754-2008). However, I don't know how it is in practice.
Modern processors that implement 64-bit floating-point typically implement something that is close to the IEEE 754-1985 standard, recently superseded by the 754-2008 standard.
The 754 standard specifies what result you should get from certain basic operations, notably addition, subtraction, multiplication, division, square root, and negation. In most cases, the numeric result is specified precisely: The result must be the representable number that is closest to the exact mathematical result in the direction specified by the rounding mode (to nearest, toward positive infinity, toward zero, or toward negative infinity). In "to nearest" mode, the standard also specifies how ties are broken.
Because of this, operations that do not involve exception conditions such as overflow will get the same results on different processors that conform to the standard.
However, there are several issues that interfere with getting identical results on different processors. One of them is that the compiler is often free to implement sequences of floating-point operations in a variety of ways. For example, if you write "a = b*c + d" in C, where all variables are declared double, the compiler is free to compute "b*c" in either double-precision arithmetic or something with more range or precision. If, for example, the processor has registers capable of holding extended-precision floating-point numbers, and doing arithmetic with extended precision does not take any more CPU time than doing arithmetic with double precision, a compiler is likely to generate code using extended precision. On such a processor, you might not get the same results as you would on another processor. Even if the compiler does this regularly, it might not in some circumstances, because the registers are full during a complicated sequence, so it stores the intermediate results in memory temporarily. When it does that, it might write just the 64-bit double rather than the extended-precision number. So a routine containing floating-point arithmetic might give different results just because it was compiled with different code, perhaps inlined in one place, and the compiler needed registers for something else.
Some processors have an instruction to compute a multiply and an add in one step (a fused multiply-add, or FMA), so "b*c + d" might be computed with no intermediate rounding and give a more accurate result than on a processor that first computes b*c and then adds d.
Your compiler might have switches to control behavior like this.
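Here is a small C++ sketch of the fused multiply-add effect described above; whether the compiler contracts x * x + d into a single FMA depends on the target and on switches such as GCC/Clang's -ffp-contract, so the first result may come out either way:

#include <cmath>
#include <iomanip>
#include <iostream>

int main() {
    // x*x is exactly 1 + 2^-26 + 2^-54, which does not fit in binary64:
    // the separately rounded product drops the 2^-54 term.
    double x = 1.0 + std::ldexp(1.0, -27);
    double d = -(1.0 + std::ldexp(1.0, -26));

    double separate = x * x + d;      // 0 if the product is rounded first
    double fused = std::fma(x, x, d); // 2^-54: one rounding at the end

    std::cout << std::setprecision(17) << separate << ' ' << fused << '\n';
}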
There are some places where the 754-1985 standard does not require a unique result. For example, when determining whether underflow has occurred (a result is too small to be represented accurately), the standard allows an implementation to make the determination either before or after it rounds the significand (the fraction bits) to the target precision. So some implementations will tell you underflow has occurred when other implementations will not.
A common feature in processors is to have an "almost IEEE 754" mode that eliminates the difficulty of dealing with underflow by substituting zero (often called flush-to-zero) instead of returning the very small number that the standard requires. Naturally, you will get different numbers when executing in such a mode than when executing in the more compliant mode. The non-compliant mode may be the default set by your compiler and/or operating system, for reasons of performance.
Note that an IEEE 754 implementation is typically not provided just by hardware but by a combination of hardware and software. The processor may do the bulk of the work but rely on the software to handle certain exceptions, set certain modes, and so on.
When you move beyond the basic arithmetic operations to things like sine and cosine, you are very dependent on the library you use. Transcendental functions are generally calculated with carefully engineered approximations. The implementations are developed independently by various engineers and get different results from each other. On one system, the sin function may give results accurate within an ULP (unit of least precision) for small arguments (less than pi or so) but larger errors for large arguments. On another system, the sin function might give results accurate within several ULP for all arguments. No current math library is known to produce correctly rounded results for all inputs. There is a project, crlibm (Correctly Rounded Libm), that has done some good work toward this goal, and they have developed implementations for significant parts of the math library that are correctly rounded and have good performance, but not all of the math library yet.
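A quick way to observe this library dependence yourself (the exact trailing digits below are deliberately not guaranteed; that is the point):

#include <cmath>
#include <iomanip>
#include <iostream>

int main() {
    // sin of a huge argument stresses the library's argument reduction;
    // the trailing digits can differ between libm implementations
    // (glibc, MSVC, musl, ...), and old x87 hardware sin is far off.
    std::cout << std::setprecision(17) << std::sin(1.0e22) << '\n';
}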
In summary, if you have a manageable set of calculations, understand your compiler implementation, and are very careful, you can rely on identical results on different processors. Otherwise, getting completely identical results is not something you can rely on.
If you mean getting exactly the same result, then the answer is no.
You might even get different results for debug (non-optimized) builds vs. release (optimized) builds on the same machine in some cases, so don't even assume that the results will always be identical on different machines.
(This can happen, e.g., on a computer with an Intel processor, if the optimizer keeps an intermediate result in a register that would be stored to memory in the unoptimized build. Since Intel FPU registers are 80 bits and double variables are 64 bits, the intermediate result is held at greater precision in the optimized build, causing different values in later results.)
In practice, however, you may often get the same results, but you shouldn't rely on it.
Modern FPUs all implement IEEE 754 floats in single and double formats, and some in extended format. A certain set of operations is supported (pretty much anything in math.h), plus a few special instructions on some architectures.
Assuming you are talking about applying multiple operations, I do not think you will get bit-exact numbers. CPU architecture, compiler choice, and optimization settings can all change the results of your computations.
Even if you mean the exact same order of operations (at the assembly level), I think you will still get variations. For example, Intel chips use extended precision (80 bits) internally, which may not be the case for other CPUs. (I do not think extended precision is mandated.)
The same C# program can produce different numerical results on the same PC when compiled once in debug mode without optimization and a second time in release mode with optimization enabled. That's my personal experience. We did not take this into account when we first set up an automatic regression test suite for one of our programs, and we were completely surprised that a lot of our tests failed without any apparent reason.
For C# on x86, 80-bit FP registers are used.
The C# standard says that the processor must operate at the same precision as, or greater than, the type itself (i.e. 64-bit in the case of a 'double'). Promotions are allowed, except for storage. That means that locals and parameters could be at greater than 64-bit precision.
In other words, assigning a member variable to a local variable could (and in fact will under certain circumstances) be enough to give an inequality.
See also: Float/double precision in debug/release modes
For the 64-bit data type, I only know of "double precision" / "binary64" from the IEEE 754 (1985 and 2008 don't differ much here for common cases) being used.
Note: The radix types defined in IEEE 854-1987 are superseded by IEEE 754-2008 anyways.
In a previous question, someone indicated that Excel uses a 64-bit (8-byte) double-precision floating-point format.
Is that correct? Is there any material on this at all?
I am trying to tie off numbers and this is killing me!
According to this article, yes, Excel uses 64-bit double-precision floating-point numbers. The article also describes the rounding errors, etc., associated with this format.
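If you are trying to tie numbers off, it helps to compare what binary64 stores with the 15 significant digits Excel displays; a small C++ sketch, since a C++ double is the same binary64 format the article describes:

#include <iomanip>
#include <iostream>

int main() {
    double third = 1.0 / 3.0; // stored as the nearest binary64 value
    std::cout << std::setprecision(15) << third << '\n'; // 0.333333333333333
    std::cout << std::setprecision(17) << third << '\n'; // 0.33333333333333331
}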
Have a look at Floating-point arithmetic, under the section Precision:
A floating-point number is stored in binary in three parts within a 65-bit range: the sign, the exponent, and the mantissa.
Here is another article: Understanding Floating Point Precision, aka "Why does Excel Give Me Seemingly Wrong Answers?". Have a look at the section Structure of a Floating Point Number.