I have three numbers with precise representation using (32-bit) floats:
x = 16277216, y = 16077216, z = -261692320000000
I expect a fused multiply-add x*y + z to return the mathematically correct value, rounded if necessary.
The correct mathematical value is -2489344, which needs no rounding, so that should be the output of the fused multiply-add.
But when I perform fma(x,y,z) the result is -6280192 instead.
Why?
I'm using rust.
Note z is the rounded result of -x*y.
let x: f32 = 16277216.0;
let y: f32 = 16077216.0;
let z = - x * y;
assert_eq!(z, -261692320000000.0 as f32); // pass
let result = x.mul_add(y, z);
assert_eq!(result, -2489344.0 as f32); // fail
println!("x: {:>32b}, {}", x.to_bits(), x);
println!("y: {:>32b}, {}", y.to_bits(), y);
println!("z: {:>32b}, {}", z.to_bits(), z);
println!("result: {:>32b}, {}", result.to_bits(), result);
The output is
x: 1001011011110000101111011100000, 16277216
y: 1001011011101010101000110100000, 16077216
z: 11010111011011100000000111111110, -261692320000000
result: 11001010101111111010100000000000, -6280192
I have three numbers with precise representation using (32-bit) floats:
x = 16277216, y = 16077216, z = -261692320000000
This premise is false. -261,692,320,000,000 cannot be represented exactly in any 32-bit floating-point format because its significand requires 37 bits to represent.
The IEEE-754 binary32 format commonly used for float has 24-bit significands. Scaling the significand of −261,692,320,000,000 to be under 2^24 in magnitude yields −261,692,320,000,000 = −15,598,077.7740478515625•2^24. As we can see, the significand is not an integer at this scale, so it cannot be represented exactly, and I would not call it precise either. The closest representable value is −15,598,078•2^24 = -261,692,323,790,848.
println!("z: {:>32b}, {}", z.to_bits(), z);
…
z: 11010111011011100000000111111110, -261692320000000
Rust is lying; the value of z is not -261692320000000. By default, Rust prints the shortest decimal string that rounds back to the same f32, not the exact value. The actual value of z is −261,692,323,790,848.
The value of 16,277,216•16,077,216 − 261,692,323,790,848 using ordinary real-number arithmetic is −6,280,192, so that result for the FMA is correct.
The rounding error occurred in let z = - x * y;, where multiplying 16,277,216 and 16,077,216 rounded the real-number-arithmetic result of 261,692,317,510,656 to the nearest value representable in binary32, 261,692,323,790,848.
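To double-check this (a small verification sketch of my own, not part of the original answer), you can redo the arithmetic in f64, which holds both the exact product and the rounded z without further error:
fn main() {
    let x: f32 = 16277216.0;
    let y: f32 = 16077216.0;
    let z: f32 = -x * y;
    // The exact product fits in f64 (two 24-bit significands need at most 48 bits).
    let exact = x as f64 * y as f64;
    println!("exact x*y      = {}", exact);           // 261692317510656
    println!("rounded to f32 = {}", (x * y) as f64);  // 261692323790848
    println!("z              = {}", z as f64);        // -261692323790848
    println!("fma            = {}", x.mul_add(y, z)); // -6280192 = exact + z
}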
In (stable) Rust, is there a relatively straightforward method of implementing the following function?
fn mod_euclid(val: i128, modulo: u128) -> u128;
Note the types! That is, 'standard' euclidean modulus (result is always in the range of [0, mod)), avoiding spurious overflow/underflow in the intermediate calculation. Some test cases:
// don't-care, just no panic or UB.
// Mild preference for treating this as though it was mod=1<<128 instead of 0.
assert_dc!(mod_euclid(i128::MAX, 0));
assert_dc!(mod_euclid( 0, 0));
assert_dc!(mod_euclid(i128::MIN, 0));
assert_eq!(mod_euclid( 1, 10), 1);
assert_eq!(mod_euclid( -1, 10), 9);
assert_eq!(mod_euclid( 11, 10), 1);
assert_eq!(mod_euclid( -11, 10), 9);
assert_eq!(mod_euclid(i128::MAX, 1), 0);
assert_eq!(mod_euclid( 0, 1), 0);
assert_eq!(mod_euclid(i128::MIN, 1), 0);
assert_eq!(mod_euclid(i128::MAX, u128::MAX), i128::MAX as u128);
assert_eq!(mod_euclid( 0, u128::MAX), 0);
assert_eq!(mod_euclid(i128::MIN, u128::MAX), i128::MAX as u128);
For signed%signed->signed, or unsigned%unsigned->unsigned, this is relatively straightforward. However, I can't find a good way of calculating signed % unsigned -> unsigned without converting one of the arguments - and as the last example illustrates, this may overflow or underflow no matter which direction you choose.
As far as I can tell, there is no such function in the standard library, but it's not very difficult to write one yourself:
fn mod_euclid(a: i128, b: u128) -> u128 {
    if a >= 0 {
        (a as u128) % b
    } else {
        let r = (!a as u128) % b;
        b - r - 1
    }
}
Playground link
How it works:
If a is non-negative then it's straightforward - just use the unsigned remainder operator.
Otherwise, the bitwise complement !a is non-negative (because the sign bit is flipped), and numerically equal to -a - 1. This means r is equivalent to b - a - 1 modulo b, and hence b - r - 1 is equivalent to a modulo b. Conveniently, b - r - 1 is in the expected range 0..b.
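As a quick sanity check (a driver of my own, reusing cases from the question), this passes:
fn main() {
    assert_eq!(mod_euclid(1, 10), 1);
    assert_eq!(mod_euclid(-1, 10), 9);
    assert_eq!(mod_euclid(-11, 10), 9);
    assert_eq!(mod_euclid(i128::MAX, u128::MAX), i128::MAX as u128);
    assert_eq!(mod_euclid(i128::MIN, u128::MAX), i128::MAX as u128);
}
(The b == 0 cases are left out, since the % operator panics on division by zero.)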
Maybe a little more straightforward: use rem_euclid where possible, and otherwise return the positive value equivalent to a:
pub fn mod_euclid(a: i128, b: u128) -> u128 {
    const UPPER: u128 = i128::MAX as u128;
    match b {
        1..=UPPER => a.rem_euclid(b as i128) as u128,
        _ if a >= 0 => a as u128,
        // here b is either 0 or greater than i128::MAX, so |a| <= b
        // (treating b == 0 as a modulus of 1 << 128); wrap the negative a
        // into range by adding b once
        _ => b.wrapping_add_signed(a),
    }
}
(The parser didn't like casting directly in the match pattern, hence the UPPER constant.)
Playground
Results in a little fewer instructions & jumps on x86_64 as well.
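A quick spot-check of the large-modulus branch (test values are mine, not from the question):
fn main() {
    // b greater than i128::MAX with a negative a: the result is a + b
    assert_eq!(mod_euclid(-1, 1u128 << 127), (1u128 << 127) - 1);
    // the question's extreme case
    assert_eq!(mod_euclid(i128::MIN, u128::MAX), i128::MAX as u128);
    // an ordinary case handled by rem_euclid
    assert_eq!(mod_euclid(-11, 10), 9);
}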
Any calculation x / y (where both operands are 32-bit floats) yields 0 in Rust when I compile for the AVR ISA. Specifically, I use avr_hal to interact with the Arduino. I read unsigned integer values, convert them to floating-point numbers, divide to get a relative quantity, multiply by another number (such that f*(x/y) > 1), and cast back to an unsigned integer:
let mut read_value: u16 = 0;
loop {
    read_value = pot.analog_read(&mut adc); // between 0 and 1023, potentiometer input
    let relative: f32 = (read_value as f32) / 1023.0; // between 0.0 and 1.0
    let v: f32 = relative * 1800.0; // > 1.0 in most cases
    ufmt::uwriteln!("{}", v as u16); // always 0
}
This problem doesn't seem to occur with seemingly equivalent C++ code:
float read_value = 0;

void loop() {
    read_value = analogRead(A0);
    float relative = read_value / 1023.0f;
    float v = relative * 1000.0;
    Serial.println(static_cast<int>(v));
}
What exactly is this Rust quirk?
I have a number which I want to print with a fixed precision, rounded up. I know I can use {:.3} to round it to three decimal places.
assert_eq!("0.012", format!("{:.3}", 0.0123456))
Is there a simple way to "ceil" it instead?
assert_eq!("0.0124", format!("{:magic}", 0.012301))
assert_eq!("0.0124", format!("{:magic}", 0.012399))
assert_eq!("0.0124", format!("{:magic}", 0.0124))
I can do something like
let x = format!("{:.3}", (((y * 1000.0).ceil() + 0.5) as i64) as f64 / 1000.0)
which is pretty unreadable. It also would give me 3 digits after the decimal point, not three digits of precision, so I would need to figure out the scale of the number, probably with something like -log10(y) as i64.
In case it's not clear, I want a string to show the user, not an f64.
More examples
assert_eq!("1.24e-42", format!("{:magic}", 1.234e-42))
assert_eq!("1240", format!("{:magic}", 1234.5)) // "1240." also works
If the f64 representing 0.123 is slightly larger than the real number 0.123, displaying "0.124" is acceptable.
The two requirements are:
The string, when converted back to an f64, is greater than or equal to the original f64 (so 0.123 -> "0.124" is acceptable)
The string has 3 significant digits (although dropping trailing zeros is acceptable, so 0.5 -> "0.5" and 0.5 -> "0.500" both work)
In case it comes up, the input number will always be positive.
This is harder than it seems because there is no way to tell the formatting machinery to change the rounding strategy. Also, format precision works on the number of digits after the decimal point, not on the number of significant digits. (AFAIK there is no equivalent to the printf("%.3g", n), and even if there were, it wouldn't round up.)
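For illustration (these two lines are mine, not from the question), the default formatting both rounds to nearest and counts digits after the point rather than significant digits:
assert_eq!("0.013", format!("{:.3}", 0.0129));       // rounded to nearest, not up
assert_eq!("1234.568", format!("{:.3}", 1234.5678)); // 3 decimals, but 7 significant digits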
You can use a decimal arithmetic crate such as rust_decimal to do the heavy-lifting - something like:
use rust_decimal::prelude::*;
pub fn fmtup(n: f64, ndigits: u32) -> String {
    let d = Decimal::from_f64_retain(n).unwrap();
    d.round_sf_with_strategy(ndigits, RoundingStrategy::AwayFromZero)
        .unwrap()
        .normalize()
        .to_string()
}
EDIT: The answer originally included a manual implementation of the rounding due to issues in rust_decimal which have since been fixed. As of Oct 24 2021 the above snippet using rust_decimal is the recommended solution. The only exception is if you need to handle numbers that are very large or very close to zero (such as 1.234e-42 or 1.234e42), which are approximated to zero or rejected by rust_decimal.
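A rough usage sketch against a few of the question's examples (assuming rust_decimal as a dependency and the fmtup above):
fn main() {
    assert_eq!("0.0124", fmtup(0.012301, 3));
    assert_eq!("0.0124", fmtup(0.012399, 3));
    assert_eq!("1240", fmtup(1234.5, 3));
}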
To manually round to significant digits, one can scale the number until it has the desired number of digits before the decimal point, and then round it up. In case of 3 digits, scaling would multiply or divide it by 10 until it falls between 100 and 1000. After rounding the number, format the resulting whole number as string, and insert the . at the position determined by the amount of scaling done in the first step.
To avoid inexactness of floating-point division by ten, the number can be first converted to a fraction, and then all operations can proceed on the fraction. Here is an implementation that uses the ubiquitous num crate to provide fractions:
use num::{rational::BigRational, FromPrimitive};
/// Format `n` to `ndigits` significant digits, rounding away from zero.
pub fn fmtup(n: f64, ndigits: i32) -> String {
    // Pass 0 (which we can't scale), infinities and NaN to f64::to_string()
    if n == 0.0 || !n.is_finite() {
        return n.to_string();
    }
    // Handle negative numbers the easy way.
    if n < 0.0 {
        return format!("-{}", fmtup(-n, ndigits));
    }
    // Convert the input to a fraction. From this point onward, we are only doing exact
    // arithmetic.
    let mut n = BigRational::from_float(n).unwrap();
    // Scale N so its whole part is ndigits long, meaning truncating it will result in an
    // integer ndigits long. If ndigits is 3, we'd want N to be in the [100, 1000) range, so
    // that e.g. 0.012345 would be scaled to 123.45, and then rounded up to 124.
    let mut scale = 0i16;
    let ten = BigRational::from_u8(10).unwrap();
    let lower_bound = ten.pow(ndigits - 1);
    if n < lower_bound {
        while n < lower_bound {
            n *= &ten;
            scale -= 1;
        }
    } else {
        let upper_bound = lower_bound * &ten;
        while n >= upper_bound {
            n /= &ten;
            scale += 1;
        }
    }
    // Round N up
    n = n.ceil();
    // Format the number as an integer and place the decimal point at the right position.
    let mut s = n.to_string();
    // Multiply N by 10**scale, i.e. append zeros if SCALE is positive, otherwise
    // insert the point inside or before the number.
    if scale > 0 {
        s.extend(std::iter::repeat('0').take(scale as _));
    } else if scale < 0 {
        // Find where to place the decimal point in the string.
        let point_pos = s.len() as i16 + scale;
        if point_pos <= 0 {
            // Zero or negative position means at or before the beginning of the string,
            // so we have to pad with zeros. E.g. s == "123" and point_pos == -2 means we
            // want "0.00123", and with point_pos == 0 we'd want "0.123".
            let mut pad = "0.".to_string();
            pad.extend(std::iter::repeat('0').take(-point_pos as _));
            pad.push_str(&s);
            s = pad;
            // Trim trailing zeros after the decimal point. E.g. 0.25 gets scaled to
            // 250 and then ends up "0.250".
            s.truncate(s.trim_end_matches('0').len());
        } else {
            // Insert the decimal point in the middle of the string. E.g. s == "123"
            // and point_pos == 1 would result in "1.23".
            let point_pos = point_pos as usize;
            if s.as_bytes()[point_pos..].iter().all(|&digit| digit == b'0') {
                // If only zeros are after the decimal point, e.g. "10.000", omit those
                // digits instead of placing the decimal point.
                s.truncate(point_pos);
            } else {
                s.insert(point_pos, '.');
            }
        }
    }
    s
}
Playground
Here are some test cases:
fn main() {
    let fmt3up = |n| fmtup(n, 3);
    assert_eq!("12400", fmt3up(12301.));
    assert_eq!("1240", fmt3up(1234.5));
    assert_eq!("124", fmt3up(123.01));
    assert_eq!("1000", fmt3up(1000.));
    assert_eq!("999", fmt3up(999.));
    assert_eq!("1010", fmt3up(1001.));
    assert_eq!("100", fmt3up(100.));
    assert_eq!("10", fmt3up(10.));
    assert_eq!("99", fmt3up(99.));
    assert_eq!("101", fmt3up(101.));
    assert_eq!("0.25", fmt3up(0.25));
    assert_eq!("12400", fmt3up(12301.0));
    assert_eq!("0.0124", fmt3up(0.0123)); // because 0.0123 is slightly above 123/10_000
    assert_eq!("0.0124", fmt3up(0.012301));
    assert_eq!("0.00124", fmt3up(0.0012301));
    assert_eq!("0.0124", fmt3up(0.012399));
    assert_eq!("0.0124", fmt3up(0.0124));
    assert_eq!("0.124", fmt3up(0.12301));
    assert_eq!("1.24", fmt3up(1.2301));
    assert_eq!("1.24", fmt3up(1.234));
}
Note that this will display 1.234e-42 as 0.00000000000000000000000000000000000000000124, but extending it to switch to exponential notation should be fairly straightforward.
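Sketching that improvement (a hypothetical helper of my own, not part of the answer above): right after n.to_string(), the digit string s and the scale variable computed in fmtup give the exponent directly, since the rounded value equals s * 10^scale:
// e.g. s == "124" and scale == -44 (the 1.234e-42 case) yields "1.24e-42"
fn to_exponential(s: &str, scale: i16) -> String {
    let exp = s.len() as i16 - 1 + scale;
    if s.len() == 1 {
        format!("{s}e{exp}")
    } else {
        format!("{}.{}e{}", &s[..1], &s[1..], exp)
    }
}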
I'm new to Rust (and coming from a Javascript background), so I decided to build a number formatting application to learn more about it.
One function in the application can accept a tiny float (e.g. 0.000435) and a precision value (e.g. 2), and should return the float formatted starting at the first significant decimal digit, with the specified precision applied (e.g. 0.00044).
For example, the function should accept and return the following:
fn meh(float: f64, precision: usize) -> String {
// ... magic happens ... format!(...
}
let float = 0.000456;
let precision = 2;
let result_a = meh(float, precision);
// result_a => "0.00046"
let float = 0.043256;
let precision = 3;
let result_b = meh(float, precision);
// result_b => "0.0433"
I know that format! helps with precision. But I can't find a decent way to find the first significant decimal without doing something funny like "convert the float to a String and iterate until a non-zero value is found...."
I hope that makes sense, any help would be most appreciated.
As mentioned in the comments, Rust's interpretation of "precision" is "number of digits after the decimal point". However, if we instead want it to mean "number of significant digits", we can write the meh function to take this into account:
fn meh(float: f64, precision: usize) -> String {
    // compute absolute value
    let a = float.abs();
    // if abs value is at least 1, then precision becomes less than "standard"
    let precision = if a >= 1. {
        // reduce by number of digits, minimum 0
        let n = (1. + a.log10().floor()) as usize;
        if n <= precision {
            precision - n
        } else {
            0
        }
    // if abs value is less than 1 (but non-zero), then precision becomes greater than "standard"
    } else if a > 0. {
        // increase number of digits
        let n = -(1. + a.log10().floor()) as usize;
        precision + n
    // special case for 0
    } else {
        0
    };
    // format with the given computed precision
    format!("{0:.1$}", float, precision)
}
Playground example with test cases
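A quick check against the question's examples (my own driver, assuming the meh function above):
fn main() {
    assert_eq!("0.00046", meh(0.000456, 2));
    assert_eq!("0.0433", meh(0.043256, 3));
}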
I wanted to check whether an integer was a power of 2. My standard approach would have been to see if log₂(x) was an integer value; however, I found no elegant way to do this. My approaches were the following:
let x = 65;
let y = (x as f64).log(2.0);

// Compute the difference between y and the result of
// truncating y by casting to int and back
let difference = y - (y as i64 as f64);

// This looks nice, but matching on float values will be phased out
match difference {
    0.0 => println!("{} is a power of 2", x),
    _ => println!("{} is NO power of 2", x),
}

// This seems kind of clunky
if difference == 0.0 {
    println!("{} is a power of 2", x);
} else {
    println!("{} is NO power of 2", x);
}
Is there a builtin option in Rust to check if a float can be converted to an integer without truncation?
Something that behaves like:
42.0f64.is_int() // True/ Ok()
42.23f64.is_int() // False/ Err()
In other words, a method/ macro/ etc. that allows me to check if I will lose information (decimals) by casting to int.
I already found that checking whether an integer is a power of 2 can be done efficiently with x.count_ones() == 1.
You can use fract to check if there is a non-zero fractional part:
42.0f64.fract() == 0.0;
42.23f64.fract() != 0.0;
Note that this only works if you already know that the number is in range. If you need an extra check to test that the floating-point number is between 0 and u32::MAX (or between i32::MIN and i32::MAX), then you might as well do the conversion and check that it didn't lose precision:
x == (x as u32) as f64
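Putting the pieces together, a minimal sketch (the is_int helper is my own name, not from the answer):
fn is_int(x: f64) -> bool {
    x.fract() == 0.0
}

fn main() {
    assert!(is_int(42.0));
    assert!(!is_int(42.23));
    // round-trip check when the value is known to fit in u32
    let y = 64.0_f64;
    assert!(y == (y as u32) as f64);
    // the integer-only power-of-two test mentioned in the question
    assert!(64_u32.count_ones() == 1);
    assert!(65_u32.count_ones() != 1);
}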