What is the default scale of BigDecimal in groovy? And Rounding?
So when trying to do calculations:
def x = 10.0/30.0 //0.3333333333
def y = 20.0/30.0 //0.6666666667
Based on this, I can assume that it uses a scale of 10 and half-up rounding.
I'm having trouble finding official documentation that says so, though.
You can find it in the official documentation: The case of the division operator
5.5.1. The case of the division operator
The division operators / (and /= for division and assignment) produce
a double result if either operand is a float or double, and a
BigDecimal result otherwise (when both operands are any combination of
an integral type short, char, byte, int, long, BigInteger or
BigDecimal).
BigDecimal division is performed with the divide() method if the
division is exact (i.e. yielding a result that can be represented
within the bounds of the same precision and scale), or using a
MathContext with a precision of the maximum of the two operands'
precision plus an extra precision of 10, and a scale of the maximum of
10 and the maximum of the operands' scale.
And check it in BigDecimalMath.java:
public Number divideImpl(Number left, Number right) {
    BigDecimal bigLeft = toBigDecimal(left);
    BigDecimal bigRight = toBigDecimal(right);
    try {
        return bigLeft.divide(bigRight);
    } catch (ArithmeticException e) {
        // set a DEFAULT precision if otherwise non-terminating
        int precision = Math.max(bigLeft.precision(), bigRight.precision()) + DIVISION_EXTRA_PRECISION;
        BigDecimal result = bigLeft.divide(bigRight, new MathContext(precision));
        int scale = Math.max(Math.max(bigLeft.scale(), bigRight.scale()), DIVISION_MIN_SCALE);
        if (result.scale() > scale) result = result.setScale(scale, BigDecimal.ROUND_HALF_UP);
        return result;
    }
}
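To see how this rule produces the two values from the question, here is a minimal standalone Java sketch of the same logic. The class name GroovyDivisionSketch is hypothetical, and both constants are assumed to be 10, which is what the documentation text quoted above implies; the sketch does not read them from the Groovy sources.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

final class GroovyDivisionSketch {
    // Assumed values, inferred from the documentation text quoted above.
    static final int DIVISION_EXTRA_PRECISION = 10;
    static final int DIVISION_MIN_SCALE = 10;

    static BigDecimal divide(BigDecimal left, BigDecimal right) {
        try {
            // Succeeds only when the quotient has a terminating decimal expansion.
            return left.divide(right);
        } catch (ArithmeticException e) {
            // Non-terminating quotient: round to a bounded precision first.
            int precision = Math.max(left.precision(), right.precision()) + DIVISION_EXTRA_PRECISION;
            BigDecimal result = left.divide(right, new MathContext(precision));
            // Then cap the scale at max(10, scale of either operand), rounding half up.
            int scale = Math.max(Math.max(left.scale(), right.scale()), DIVISION_MIN_SCALE);
            if (result.scale() > scale) {
                result = result.setScale(scale, RoundingMode.HALF_UP);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        System.out.println(divide(new BigDecimal("10.0"), new BigDecimal("30.0"))); // 0.3333333333
        System.out.println(divide(new BigDecimal("20.0"), new BigDecimal("30.0"))); // 0.6666666667
        System.out.println(divide(new BigDecimal("1"), new BigDecimal("4")));       // 0.25 (exact path)
    }
}
```

With operands of precision 3 and scale 1, the intermediate quotient is rounded to 13 significant digits and then cut back to scale 10, which matches the results observed in the question.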
Given two segment endpoints A and B (in two dimensions), I would like to perform linear interpolation based on a value t, i.e.:
C = A + t(B-A)
In an ideal world, A, B and C would be collinear. However, we are operating with limited floating-point precision here, so there will be small deviations. To work around numerical issues with other operations I am using robust adaptive routines originally created by Jonathan Shewchuk. In particular, Shewchuk implements an orientation function orient2d that uses adaptive precision to exactly test the orientation of three points.
Here is my question: is there a known procedure for computing the interpolation in floating-point math so that the result lies exactly on the line between A and B? I care less about the accuracy of the interpolation itself and more about the resulting collinearity. In other words, it's OK if C is shifted around a bit as long as collinearity is satisfied.
The bad news
The request can't be satisfied. There are values of A and B for which there is NO value of t other than 0 and 1 for which lerp(A, B, t) is a float.
A trivial example in single precision is x1 = 12345678.f and x2 = 12345679.f. Regardless of the values of y1 and y2, the required result must have an x component between 12345678.f and 12345679.f, and there's no single-precision float between these two.
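The claim that no float lies between these two values can be checked directly. Below is a small Java sketch (Java's float is the same IEEE-754 single-precision format); the class and method names are hypothetical, chosen only for this illustration.

```java
final class FloatGapSketch {
    /** True when hi is the very next float above lo, i.e. nothing lies strictly between them. */
    static boolean adjacent(float lo, float hi) {
        return Math.nextUp(lo) == hi;
    }

    public static void main(String[] args) {
        float x1 = 12345678f;
        float x2 = 12345679f;
        System.out.println(adjacent(x1, x2)); // true
        System.out.println(Math.ulp(x1));     // 1.0, the spacing of floats at this magnitude
    }
}
```

Both values lie between 2^23 and 2^24, where consecutive single-precision floats are exactly 1 apart.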
The (sorta) good news
The exact interpolated value, however, can be represented as the sum of 5 floating-point values (vectors in the case of 2D): one for the formula's result, one for the error in each operation [1] and one for multiplying the error by t. I'm not sure if that will be useful to you. Here's a 1D C version of the algorithm in single precision that uses fused multiply-add to calculate the product error, for simplicity:
#include <math.h>
float exact_sum(float a, float b, float *err)
{
    float sum = a + b;
    float z = sum - a;
    *err = a - (sum - z) + (b - z);
    return sum;
}
float exact_mul(float a, float b, float *err)
{
    float prod = a * b;
    *err = fmaf(a, b, -prod);
    return prod;
}
float exact_lerp(float A, float B, float t,
                 float *err1, float *err2, float *err3, float *err4)
{
    float diff = exact_sum(B, -A, err1);
    float prod = exact_mul(diff, t, err2);
    *err1 = exact_mul(*err1, t, err4);
    return exact_sum(A, prod, err3);
}
In order for this algorithm to work, operations need to conform to IEEE-754 semantics in round-to-nearest mode. That's not guaranteed by the C standard, but GCC can be instructed to do so, at least on processors supporting SSE2 [2][3].
It is guaranteed that the arithmetic addition of (result + err1 + err2 + err3 + err4) will be equal to the desired result; however, there is no guarantee that the floating-point addition of these quantities will be exact.
To use the above example, exact_lerp(12345678.f, 12345679.f, 0.300000011920928955078125f, &err1, &err2, &err3, &err4) returns a result of 12345678.f and err1, err2, err3 and err4 are 0.0f, 0.0f, 0.300000011920928955078125f and 0.0f respectively. Indeed, the correct result is 12345678.300000011920928955078125 which can't be represented as a single-precision float.
A more convoluted example: exact_lerp(0.23456789553165435791015625f, 7.345678806304931640625f, 0.300000011920928955078125f, &err1, &err2, &err3, &err4) returns 2.3679010868072509765625f and the errors are 6.7055225372314453125e-08f, 8.4771045294473879039287567138671875e-08f, 1.490116119384765625e-08f and 2.66453525910037569701671600341796875e-15f. These numbers add up to the exact result, which is 2.36790125353468550173374751466326415538787841796875 and can't be exactly stored in a single-precision float.
All numbers in the examples above are written using their exact values, rather than a number that approximates to them. For example, 0.3 can't be represented exactly as a single-precision float; the closest one has an exact value of 0.300000011920928955078125 which is the one I've used.
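One way to convince yourself that the five parts really add up to the exact interpolated value is to redo the sum in arbitrary-precision arithmetic. The following is a rough Java port of the C routine above, offered as a sketch rather than a reference implementation: the names ExactLerpCheck, exactLerp, sumExactly and reference are made up for this illustration, and it assumes Java 9 or later for Math.fma.

```java
import java.math.BigDecimal;

final class ExactLerpCheck {
    /** Returns {result, err1, err2, err3, err4}, mirroring exact_lerp above. */
    static float[] exactLerp(float a, float b, float t) {
        float diff = b - a;                              // exact_sum(B, -A, &err1)
        float z1 = diff - b;
        float e1 = (b - (diff - z1)) + (-a - z1);
        float prod = diff * t;                           // exact_mul(diff, t, &err2)
        float err2 = Math.fma(diff, t, -prod);
        float err1 = e1 * t;                             // exact_mul(err1, t, &err4)
        float err4 = Math.fma(e1, t, -err1);
        float result = a + prod;                         // exact_sum(A, prod, &err3)
        float z3 = result - a;
        float err3 = (a - (result - z3)) + (prod - z3);
        return new float[] {result, err1, err2, err3, err4};
    }

    /** Adds the parts without any rounding. */
    static BigDecimal sumExactly(float[] parts) {
        BigDecimal sum = BigDecimal.ZERO;
        for (float p : parts) {
            sum = sum.add(new BigDecimal(p)); // the float-to-BigDecimal conversion is exact
        }
        return sum;
    }

    /** A + t(B - A), evaluated exactly. */
    static BigDecimal reference(float a, float b, float t) {
        BigDecimal bigA = new BigDecimal(a);
        return bigA.add(new BigDecimal(t).multiply(new BigDecimal(b).subtract(bigA)));
    }

    public static void main(String[] args) {
        float[] r = exactLerp(12345678f, 12345679f, 0.3f);
        System.out.println(r[0]);                                  // 1.2345678E7
        System.out.println(sumExactly(r));                         // 12345678.300000011920928955078125
        System.out.println(sumExactly(r).compareTo(reference(12345678f, 12345679f, 0.3f)) == 0); // true
    }
}
```

The BigDecimal comparison is only a verification aid; the interpolation itself still runs entirely in single precision.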
It might be possible that if you calculate err1 + err2 + err3 + err4 + result (in that order), you get an approximation that is considered collinear in your use case. Perhaps worth a try.
References
[1] Graillat, Stef (2007). Accurate Floating Point Product and Exponentiation.
[2] Enabling strict floating point mode in GCC
[3] Semantics of Floating Point Math in GCC
I'm using the two snippets below:
double foo = 20.00
float bar = 20.00
println foo == bar
And
double foo = 20.01
float bar = 20.01
println foo == bar
They give the following output:
true
false
Does anyone know what makes the difference between these two snippets?
double and float values don't have an exact internal representation for every decimal value. With two decimal places, the only fractional parts that an IEEE-754 binary floating-point number can represent exactly are .00, .25, .50 and .75. Every other value is stored slightly off, and because doubles and floats are rounded to different precisions, the two stored values differ, which produces this inequality.
This is not just valid for Groovy, but for Java as well.
For example:
double foo = 20.25
float bar = 20.25
println foo == bar
Output:
true
The 0.01 part of 20.01 is infinitely repeating in binary; 20.01 =
10100.00000010100011110101110000101000111101011100001010001111010111...
floats are rounded (to nearest) to 24 significant bits; doubles are rounded to 53. That makes the float
10100.0000001010001111011
and the double
10100.000000101000111101011100001010001111010111000011
In decimal, those are
20.0100002288818359375 and
20.010000000000001563194018672220408916473388671875, respectively.
(You could see this directly using my decimal to floating-point converter.)
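The same values can be inspected without an external tool. The Java sketch below (the question is about Groovy, but the float and double types are the same) compares the three pairs and prints the exact stored values via BigDecimal. The class and method names are hypothetical.

```java
import java.math.BigDecimal;

final class FloatVsDouble {
    /** The float is widened to double exactly before the comparison. */
    static boolean sameValue(float f, double d) {
        return f == d;
    }

    public static void main(String[] args) {
        System.out.println(sameValue(20.00f, 20.00)); // true
        System.out.println(sameValue(20.01f, 20.01)); // false
        System.out.println(sameValue(20.25f, 20.25)); // true

        // new BigDecimal(...) shows the exact binary value, with no rounding for display.
        System.out.println(new BigDecimal(20.01f)); // 20.0100002288818359375
        System.out.println(new BigDecimal(20.01));  // 20.010000000000001563194018672220408916473388671875
    }
}
```

20.00 and 20.25 need only a few significant bits, so both types store them exactly and the comparison holds.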
Groovy Float values are not stored exactly in memory. That is the main cause of the differences you see.
In Groovy, you can limit the number of digits kept after the decimal point with the following method:
public float trunc(int precision)
precision - the number of decimal places to keep.
For more details, see the Float class documentation.
In Groovy it is preferable to use the BigDecimal class for decimal numbers.
Converting from Number to String is much easier, and you can define the number of decimal places in the constructor.
BigDecimal(BigInteger unscaledVal, int scale)
Translates a BigInteger unscaled value and an int scale into a BigDecimal.
For more details, see the Java BigDecimal documentation, since Groovy is based on Java. Moreover, a BigDecimal represents the exact value of the number.
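As a small illustration of that constructor, the Java sketch below builds 20.01 from an unscaled value of 2001 and a scale of 2. The class and helper names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class BigDecimalScaleSketch {
    /** Builds unscaled * 10^(-scale), e.g. (2001, 2) gives 20.01. */
    static BigDecimal fromUnscaled(long unscaled, int scale) {
        return new BigDecimal(BigInteger.valueOf(unscaled), scale);
    }

    public static void main(String[] args) {
        BigDecimal value = fromUnscaled(2001, 2);
        System.out.println(value);                                  // 20.01
        System.out.println(value.equals(new BigDecimal("20.01")));  // true
        // Constructing from a double literal instead carries over the binary rounding error.
        System.out.println(value.compareTo(new BigDecimal(20.01)) == 0); // false
    }
}
```

The last line is why the String or BigInteger constructors are usually the safer choice when the exact decimal value matters.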
I have a function of type Int -> Int -> Int -> Int. When I use div a b as the value of a variable in the function, it seems that the value gets rounded down to 0 if the result of div a b would be 1/2 or some other fractional value.
Is this correct? Does Haskell cut off values like in Java when a double is forced into an integer?
div 1 2 doesn't return 0.5, which is then converted to the integer 0. It returns 0 in the first place. div performs integer division and as such always returns an integer (or other Integral type depending on which type you used it with). There's no doubles involved.
When you do convert a double to an integer, the method of rounding depends on which method you used. For example floor would round the number down whereas round would round to the nearest integer. There are no implicit conversions in Haskell, so any conversion will happen through a function.
Does Haskell cut off values like in java
No, it does not.
When doing integer division, Java rounds towards zero, whereas Haskell rounds downwards; so in Haskell
> (-9) `div` 10
-1
whereas in Java -9 / 10 is zero:
public class IntDiv {
    public static void main(String[] args) {
        double a = (-9) / 10;
        System.out.printf("%.2f\n", a); // would print 0.00
    }
}
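If you want Haskell's div behaviour on the Java side, Math.floorDiv rounds towards negative infinity. The sketch below contrasts the two; the class and method names are hypothetical.

```java
final class IntDivSketch {
    /** Java's / operator on ints: rounds towards zero. */
    static int truncatingDiv(int a, int b) {
        return a / b;
    }

    /** Rounds towards negative infinity, like Haskell's div. */
    static int flooringDiv(int a, int b) {
        return Math.floorDiv(a, b);
    }

    public static void main(String[] args) {
        System.out.println(truncatingDiv(-9, 10)); // 0
        System.out.println(flooringDiv(-9, 10));   // -1
        System.out.println(flooringDiv(9, 10));    // 0, same as / for non-negative operands
    }
}
```

In the other direction, Haskell's quot rounds towards zero, which corresponds to Java's / operator.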
This groovy:
float a = 1;
float b = 2;
def r = a + b;
Creates this Java code when reversed from .class with IntelliJ:
float a = (float)1;
float b = (float)2;
Object r = null;
double var7 = (double)a + (double)b;
r = Double.valueOf(var7);
So r contains a Double.
If I do this:
float a = 1;
float b = 2;
float r = a + b;
It generates code that performs the addition with doubles and converts back to float:
float a = (float)1;
float b = (float)2;
float r = 0.0F;
double var7 = (double)a + (double)b;
r = (float)var7;
So should one abandon floats with groovy as it seems to not want to use them anyway?
Groovy falls back to five standard result types for numeric operations: int, long, BigInteger, double and BigDecimal. Thus adding or multiplying two floats returns a double. Division and pow are special.
From http://www.groovy-lang.org/syntax.html
Division and power binary operations aside,
binary operations between byte, char, short and int result in int
binary operations involving long with byte, char, short and int result in long
binary operations involving BigInteger and any other integral type result in BigInteger
binary operations between float, double and BigDecimal result in double
binary operations between two BigDecimal result in BigDecimal
As for if you should abandon float... normally it is good enough to convert the double to float, especially since groovy is doing that automatically for you.
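For a single addition, going through double and narrowing back does not change the answer compared with adding in float directly, because double carries more than twice the significand bits of float. The Java sketch below spot-checks this on pseudo-random inputs. It is an illustration only, not a proof; the class and method names, the seed and the value ranges are arbitrary choices made for this example.

```java
import java.util.Random;

final class FloatViaDoubleSketch {
    /** Mimics the decompiled code above: widen, add as double, narrow back to float. */
    static float addViaDouble(float a, float b) {
        return (float) ((double) a + (double) b);
    }

    /** Counts inputs where the double detour differs from a plain float addition. */
    static int countMismatches(int trials, long seed) {
        Random rnd = new Random(seed);
        int mismatches = 0;
        for (int i = 0; i < trials; i++) {
            float a = rnd.nextFloat() * 1000f;
            float b = rnd.nextFloat();
            if (Float.compare(a + b, addViaDouble(a, b)) != 0) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println(addViaDouble(1f, 2f));          // 3.0
        System.out.println(countMismatches(100_000, 42L)); // 0
    }
}
```

The difference only shows up if you keep the intermediate double around, as in def r = a + b, where r ends up holding a Double.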
.NET (C#) does something similar with 8-bit and 16-bit integers: adding two Bytes or Int16s yields an Int32, possibly to prevent overflows.
Operations with "smaller" data types may result in the "bigger" data types. And with bigger, I mean more bits.
As illustrated in this example (more digits also means more bits)
15 (2 digits) x 15 (2 digits) = 225 (3 digits)
1.5 (2 digits) x 1.5 (2 digits) = 2.25 (3 digits)
However, adding two 32-bit integers returns just a 32-bit integer, and adding two doubles just returns a double. This is because the (virtual) machine is optimized for working with these sizes, which in turn is because physical processors used to be optimized for them. Some still are: 32-bit operations are often faster than 64-bit operations, even on 64-bit processors. 16-bit operations, however, are barely faster, if at all.
Your compiler attempts to protect you against overflows, and allows you to check for them explicitly. So unless you have a good reason not to, I'd default to using these types, and optionally truncate to a more compact type when storing the data.
Good reasons not to would include scenarios where you process large amounts (1000s) of numbers, e.g. for graphics processing.
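Java applies the same kind of widening to its small integer types, which makes the overflow argument easy to see. This is a Java sketch rather than C#, and the class and method names are hypothetical.

```java
final class WideningSketch {
    /** byte + byte is computed as int in Java, so 100 + 100 does not overflow. */
    static int addBytes(byte a, byte b) {
        return a + b;
    }

    /** Narrowing back to byte is explicit and wraps around on overflow. */
    static byte addBytesNarrowed(byte a, byte b) {
        return (byte) (a + b);
    }

    public static void main(String[] args) {
        byte a = 100;
        byte b = 100;
        System.out.println(addBytes(a, b));         // 200
        System.out.println(addBytesNarrowed(a, b)); // -56, since 200 does not fit in a signed byte
    }
}
```

Storing the narrowed value is the "truncate to a more compact type" step mentioned above, and it is where the overflow has to be considered.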
How would you implement a ROUND function:
ROUND(value, number of digits)
pi=3.14159265358979323
so, for example, ROUND(pi, 3) = 3.142
if you had these functions at your disposal:
AINT - truncates a value to a whole number
ANINT - calculates the nearest whole number
NINT - returns the nearest integer to the argument
Or, setting those functions aside, how is rounding a floating-point number to a given number of digits done at all?
If you don't need to worry about overflow, here's how:
ROUND(value, nod) = NINT(value * POWER(10, nod)) / POWER(10, nod)
Otherwise you need to take care of the integer part and the float part separately.
I would assume, excuse my pseudo-code
function Round(value, num){
    numsToSave = POWER(10, num);
    value *= numsToSave; //Get the numbers we don't want rounded on the left side of the floating point
    value = AINT( ANINT(value) );
    value /= numsToSave;
    return value;
}
or
function Round(value, num){
    numsToSave = POWER(10, num);
    value *= numsToSave; //Get the numbers we don't want rounded on the left side of the floating point
    value = NINT(value);
    value /= numsToSave;
    return value;
}
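For comparison, here is the same scale, round, unscale idea as a runnable Java sketch. Math.round stands in for NINT here, which is an assumption about how NINT treats ties, and like the formula above it ignores overflow when value * 10^digits exceeds the range of a long. The class name is hypothetical.

```java
final class RoundSketch {
    /** Rounds value to the given number of decimal digits; no overflow handling. */
    static double round(double value, int digits) {
        double factor = Math.pow(10, digits);       // e.g. 1000.0 for 3 digits
        return Math.round(value * factor) / factor; // nearest whole number, then scale back
    }

    public static void main(String[] args) {
        System.out.println(round(Math.PI, 3));   // 3.142
        System.out.println(round(1234.5678, 2)); // 1234.57
    }
}
```

The result is still a binary double, so it is the closest representable value to 3.142 rather than exactly 3.142.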