Floating point addition algorithm - rust

I'm attempting to write an algorithm in Rust that adds two floating point numbers. It nearly works, but in a few cases the final mantissa is off by one from what it should be. (I'm not yet dealing with subnormal numbers.) My algorithm is:
fn add_f32(a: f32, b: f32) -> f32 {
let a_bits = a.to_bits();
let b_bits = b.to_bits();
let a_exp = (a_bits << 1) >> (23 + 1);
let b_exp = (b_bits << 1) >> (23 + 1);
let mut a_mant = a_bits & 0x007fffff;
let mut b_mant = b_bits & 0x007fffff;
let mut a_exp = (a_exp as i32).wrapping_sub(127);
let mut b_exp = (b_exp as i32).wrapping_sub(127);
if b_exp > a_exp {
// If b has a larger exponent than a, swap a and b so that a has the larger exponent
core::mem::swap(&mut a_mant, &mut b_mant);
core::mem::swap(&mut a_exp, &mut b_exp);
}
let exp_diff = (a_exp - b_exp) as u32;
// Add the implicit leading 1 bit to the mantissas
a_mant |= 1 << 23;
b_mant |= 1 << 23;
// Append an extra bit to the mantissas to ensure correct rounding
a_mant <<= 1;
b_mant <<= 1;
// If the shift causes an overflow, the b_mant is too small so is set to 0
b_mant = b_mant.checked_shr(exp_diff).unwrap_or(0);
let mut mant = a_mant + b_mant;
let overflow = (mant >> 25) != 0;
if !overflow {
// Check to see if we round up
if mant & 1 == 1 {
mant += 1;
}
}
// check for overflow caused by rounding up
let overflow = overflow || (mant >> 25) != 0;
mant >>= 1;
if overflow {
if mant & 1 == 1 {
mant += 1;
}
// Check to see if we round up
mant >>= 1;
a_exp += 1;
}
// Remove implicit leading one
mant <<= 9;
mant >>= 9;
f32::from_bits(mant | ((a_exp.wrapping_add(127) as u32) << 23))
}
For example, the test
#[test]
fn test_add_small() {
let a = f32::MIN_POSITIVE;
let b = f32::from_bits(f32::MIN_POSITIVE.to_bits() + 1);
let c = add_f32(a, b);
let d = a + b;
assert_eq!(c, d);
}
fails, with the actual answer being 00000001000000000000000000000000 (binary representation) and my answer being 00000001000000000000000000000001.
Is anyone able to help me with what is wrong with my code?
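One likely cause, judging from the failing test: the exact sum there lies exactly halfway between two representable values. The built-in `+` uses the IEEE 754 default of rounding to nearest with ties going to the even mantissa, while the code above rounds every trailing 1 bit upward. The sketch below shows a tie-aware right shift; the helper name `shr_round_ties_even` is hypothetical, and it assumes all discarded bits are still present in `value` (bits already lost when `b_mant` was shifted right would need a separate sticky bit).

```rust
/// Shift `value` right by `shift` bits (1..=31), rounding to nearest
/// and breaking exact ties toward the even result.
fn shr_round_ties_even(value: u32, shift: u32) -> u32 {
    assert!(shift >= 1 && shift < 32);
    let quotient = value >> shift;
    // Bits that the shift discards.
    let remainder = value & ((1u32 << shift) - 1);
    // Exactly half of one unit in the last kept place.
    let half = 1u32 << (shift - 1);
    if remainder > half || (remainder == half && quotient & 1 == 1) {
        quotient + 1
    } else {
        quotient
    }
}
```

In the failing test, the sum including the extra bit is `(1 << 25) | 2` and two bits have to be dropped, so the remainder is exactly half and the even quotient `1 << 23` is kept.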

Related

Changing the variables in Rust [duplicate]

In Rust, when changing the value of a mutable variable, what is the difference between let x = 12 and x = 12 in the following sample code?
fn main() {
let mut x: i32 = 8;
{
println!("{}", x);
let x = 12; // what if change to x = 12
println!("{}", x);
}
println!("{}", x);
let x = 42;
println!("{}", x);
}
The output is 8, 12, 8, 42. If I change let x = 12 to x = 12 ...
fn main() {
let mut x: i32 = 8;
{
println!("{}", x);
x = 12;
println!("{}", x);
}
println!("{}", x);
let x = 42;
println!("{}", x);
}
The output is 8, 12, 12, 42.
I understand that Rust uses let to do variable binding, so the let x = 12 is a variable rebinding and the binding is only valid inside a scope. But how to explain the functionality of x = 12 and the corresponding scope? Is that a type of variable binding?
The second let x introduces a second binding that shadows the first one for the rest of the block. That is, there are two variables named x, but you can only access the second one within the block statement after the let x = 12; statement. These two variables don't need to have the same type!
Then, after the block statement, the second x is out of scope, so you access the first x again.
However, if you write x = 12; instead, that's an assignment expression: the value in x is overwritten. This doesn't introduce a new variable, so the type of the value being assigned must be compatible with the variable's type.
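As a small illustration of the two cases described above, here is a sketch with two hypothetical functions: one shadows `x` with a value of a different type inside a block, the other assigns to the existing `x`.

```rust
/// Shadowing: the inner `x` is a separate variable and may have another type.
fn shadow_in_block() -> (usize, i32) {
    let x: i32 = 8;
    let inner_len = {
        // A new binding named `x`; the outer `x` is untouched.
        let x = "twelve";
        x.len()
    };
    // The inner binding is out of scope here, so this is the original `x`.
    (inner_len, x)
}

/// Assignment: the existing `x` is overwritten, so the value must be an i32.
fn assign_in_block() -> (i32, i32) {
    let mut x: i32 = 8;
    let before = x;
    {
        x = 12;
    }
    (before, x)
}
```

Calling `shadow_in_block()` gives back the outer value 8 unchanged, whereas `assign_in_block()` reports that `x` is 12 after the block.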
This difference is important if you write a loop. For example, consider this function:
fn fibonacci(mut n: u32) -> u64 {
if n == 0 {
return 1;
}
let mut a = 1;
let mut b = 1;
loop {
if n == 1 {
return b;
}
let next = a + b;
a = b;
b = next;
n -= 1;
}
}
This function reassigns variables, so that each iteration of the loop can operate on the values assigned on the preceding iteration.
However, you might be tempted to write the loop like this:
loop {
if n == 1 {
return b;
}
let (a, b) = (b, a + b);
n -= 1;
}
This doesn't work, because the let statement introduces new variables, and these variables will go out of scope before the next iteration begins. On the next iteration, (b, a + b) will still use the original values.
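If the tuple form is what you are after, recent Rust versions accept a destructuring assignment without `let`, which updates the existing variables instead of declaring new ones. A minimal sketch, keeping the same base cases as the function above:

```rust
fn fibonacci(mut n: u32) -> u64 {
    if n == 0 {
        return 1;
    }
    let mut a: u64 = 1;
    let mut b: u64 = 1;
    loop {
        if n == 1 {
            return b;
        }
        // No `let` here: this assigns to the existing `a` and `b`,
        // so the next iteration sees the updated values.
        (a, b) = (b, a + b);
        n -= 1;
    }
}
```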

How to compare i32 with usize

I'm writing a simple insertion sort. Here is the relevant code.
fn main() {
let mut sort_vec = vec![5,2,4,6,1,3];
for j in 1..sort_vec.len() {
let key = sort_vec[j];
let mut i = j - 1;
while i > 0 && sort_vec[i] > key {
sort_vec[i+1] = sort_vec[i];
i = i - 1;
}
sort_vec[i+1] = key;
}
println!("{:?}",sort_vec);
}
Its output is [5, 1, 2, 3, 4, 6].
The problem is that when I change while i > 0 to while i >= 0 or while i > -1, it doesn't work.
So is there a problem comparing i32 with usize? I tried a few approaches without success. How should I handle this? I would be deeply grateful for any help!
If you change while i > 0 to while i >= 0 the compiler gives you a warning:
warning: comparison is useless due to type limits
--> src\main.rs:9:15
|
9 | while i >= 0 && sort_vec[i] > key {
| ^^^^^^
|
= note: `#[warn(unused_comparisons)]` on by default
and the code panics at runtime:
thread 'main' panicked at 'attempt to subtract with overflow', src\main.rs:11:17
The problem is that if i goes down to 0 and you try to subtract 1, the integer i overflows because its type is usize which has to be non-negative.
Because usize can't be negative your comparison i >= 0 is always true (that's the compiler warning).
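To make the "cannot go below zero" point concrete, here is a small sketch using `usize::checked_sub`, which returns `None` instead of overflowing. The helper names are made up for illustration.

```rust
/// Returns the index before `i`, or `None` when `i` is already 0.
fn previous_index(i: usize) -> Option<usize> {
    i.checked_sub(1)
}

/// Counts how many indices a downward walk from `start` visits
/// before it runs out of valid indices.
fn steps_down_from(start: usize) -> usize {
    let mut steps = 0;
    let mut current = Some(start);
    while let Some(i) = current {
        steps += 1;
        // Becomes `None` after index 0, which ends the loop
        // without any overflow.
        current = previous_index(i);
    }
    steps
}
```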
I would recommend changing the logic slightly: instead of comparing the element at i with the key and writing to i + 1, compare the element at i - 1 and write to i.
That means starting at j rather than j - 1, and shifting every index in the following lines down by one:
sort_vec[i] -> sort_vec[i - 1]
sort_vec[i + 1] -> sort_vec[i]
working code:
fn main() {
let mut sort_vec = vec![5, 2, 4, 6, 1, 3];
for j in 1..sort_vec.len() {
let key = sort_vec[j];
let mut i = j;
while i > 0 && sort_vec[i - 1] > key {
sort_vec[i] = sort_vec[i - 1];
i = i - 1;
}
sort_vec[i] = key;
}
println!("{:?}", sort_vec);
}
Now you can see the unnecessary assignment let mut i = j. You could change the head of the for-loop to for mut j in ... to remove that and replace all i with j.
fn main() {
let mut sort_vec = vec![5, 2, 4, 6, 1, 3];
for mut j in 1..sort_vec.len() {
let key = sort_vec[j];
while j > 0 && sort_vec[j - 1] > key {
sort_vec[j] = sort_vec[j - 1];
j = j - 1;
}
sort_vec[j] = key;
}
println!("{:?}", sort_vec);
}
Here is a nice trick (note that in Rust wrapping arithmetic has to be explicit, otherwise the overflow panics in debug mode):
while i < j && sort_vec[i] > key {
sort_vec[i + 1] = sort_vec[i];
i = i.wrapping_sub(1);
}
sort_vec[i.wrapping_add(1)] = key;
The idea is to let i wrap around below zero: once it wraps, it becomes usize::MAX, which is no longer less than j, so the loop stops. The final wrapping_add(1) then wraps it back to 0.
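For reference, a complete version of the wrapping approach might look like the sketch below. The function name and the slice-based signature are assumptions made for the example; the loop body follows the snippet above.

```rust
fn insertion_sort_wrapping(v: &mut [i32]) {
    for j in 1..v.len() {
        let key = v[j];
        let mut i = j - 1;
        // After `i` wraps below 0 it is usize::MAX, which is not < j,
        // so the condition fails before `v[i]` is ever indexed.
        while i < j && v[i] > key {
            v[i + 1] = v[i];
            i = i.wrapping_sub(1);
        }
        // Wraps usize::MAX back to 0 when the key belongs at the front.
        v[i.wrapping_add(1)] = key;
    }
}
```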
i has type usize, so it can never be less than 0. Therefore i >= 0 is always true, and i > -1 does not compile at all because -1 is not a valid usize. One way to fix your issue is to use a for loop with a reversed range:
fn main() {
let mut sort_vec = vec![5, 2, 4, 6, 1, 3];
for j in 1..sort_vec.len() {
for i in (0..j).rev() {
if sort_vec[i+1] < sort_vec[i] {
sort_vec.swap (i, i+1);
} else {
break;
}
}
}
println!("{:?}", sort_vec);
}

Converting large number stored in array of u32 to bytes and back

I'm doing some computational mathematics in Rust, and I have some large numbers which I store in an array of 24 values. I have functions that convert them to bytes and back, but they don't work correctly for u32 values, whereas they work fine for u64. The code sample can be found below:
fn main() {
let mut bytes = [0u8; 96]; // since u32 is 4 bytes in my system, 4*24 = 96
let mut j;
let mut k: u32;
let mut num: [u32; 24] = [1335565270, 4203813549, 2020505583, 2839365494, 2315860270, 442833049, 1854500981, 2254414916, 4192631541, 2072826612, 1479410393, 718887683, 1421359821, 733943433, 4073545728, 4141847560, 1761299410, 3068851576, 1582484065, 1882676300, 1565750229, 4185060747, 1883946895, 4146];
println!("original_num: {:?}", num);
for i in 0..96 {
j = i / 4;
k = (i % 4) as u32;
bytes[i as usize] = (num[j as usize] >> (4 * k)) as u8;
}
println!("num_to_bytes: {:?}", &bytes[..]);
num = [0u32; 24];
for i in 0..96 {
j = i / 4;
k = (i % 4) as u32;
num[j as usize] |= (bytes[i as usize] as u32) << (4 * k);
}
println!("recovered_num: {:?}", num);
}
The above code does not retrieve the correct number from the byte array. But if I change all u32 to u64, all 4s to 8s, and reduce the size of num from 24 values to 12, it all works fine. I assume I have some logical problem in the u32 version. The correctly working u64 version can be found in this Rust playground.
Learning how to create a MCVE is a crucial skill when programming. For example, why do you have an array at all? Why do you reuse variables?
Your original first number is 0x4F9B1BD6, the output first number is 0x000B1BD6.
Comparing the intermediate bytes shows that you have garbage:
const BYTES_PER_U32: usize = 4;
let num = 0x4F9B1BD6_u32;
println!("{:08X}", num);
let mut bytes = [0u8; BYTES_PER_U32];
for i in 0..bytes.len() {
let k = (i % BYTES_PER_U32) as u32;
bytes[i] = (num >> (4 * k)) as u8;
}
for b in &bytes {
print!("{:X}", b);
}
println!();
4F9B1BD6
D6BD1BB1
Printing out the values of k:
for i in 0..bytes.len() {
let k = (i % BYTES_PER_U32) as u32;
println!("{} / {}", k, 4 * k);
bytes[i] = (num >> (4 * k)) as u8;
}
Shows that you are trying to shift by multiples of 4 bits:
0 / 0
1 / 4
2 / 8
3 / 12
I'm pretty sure that every common platform today uses 8 bits for a byte, not 4.
This is why magic numbers are bad. If you had used constants for the values, you would have noticed the problem much sooner.
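As a sketch of what the hand-written loops could look like with named constants and a shift of 8 bits per byte: the function names and the `Vec`-based signatures below are assumptions for the example, not part of the original code.

```rust
const BITS_PER_BYTE: u32 = 8;
const BYTES_PER_U32: usize = 4;

/// Splits each u32 into 4 bytes, least significant byte first.
fn to_bytes(nums: &[u32]) -> Vec<u8> {
    let mut bytes = vec![0u8; nums.len() * BYTES_PER_U32];
    for (i, byte) in bytes.iter_mut().enumerate() {
        let word = nums[i / BYTES_PER_U32];
        let k = (i % BYTES_PER_U32) as u32;
        // Shift by whole bytes (8 bits), not by 4 bits.
        *byte = (word >> (BITS_PER_BYTE * k)) as u8;
    }
    bytes
}

/// Reassembles u32 values from bytes written by `to_bytes`.
fn from_bytes(bytes: &[u8]) -> Vec<u32> {
    let mut nums = vec![0u32; bytes.len() / BYTES_PER_U32];
    for (i, &byte) in bytes.iter().enumerate() {
        let k = (i % BYTES_PER_U32) as u32;
        nums[i / BYTES_PER_U32] |= (byte as u32) << (BITS_PER_BYTE * k);
    }
    nums
}
```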
You wrote: "since u32 is 4 bytes in my system"
A u32 better be 4 bytes on every system — that's why it's a u32.
Overall, don't reinvent the wheel. Use the byteorder crate or equivalent:
extern crate byteorder;
use byteorder::{BigEndian, ReadBytesExt, WriteBytesExt};
const LENGTH: usize = 24;
const BYTES_PER_U32: usize = 4;
fn main() {
let num: [u32; LENGTH] = [
1335565270, 4203813549, 2020505583, 2839365494, 2315860270, 442833049, 1854500981,
2254414916, 4192631541, 2072826612, 1479410393, 718887683, 1421359821, 733943433,
4073545728, 4141847560, 1761299410, 3068851576, 1582484065, 1882676300, 1565750229,
4185060747, 1883946895, 4146,
];
println!("original_num: {:?}", num);
let mut bytes = [0u8; LENGTH * BYTES_PER_U32];
{
let mut bytes = &mut bytes[..];
for &n in &num {
bytes.write_u32::<BigEndian>(n).unwrap();
}
}
let mut num = [0u32; LENGTH];
{
let mut bytes = &bytes[..];
for n in &mut num {
*n = bytes.read_u32::<BigEndian>().unwrap();
}
}
println!("recovered_num: {:?}", num);
}
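If you would rather avoid an external crate, the standard library's `u32::to_be_bytes` and `u32::from_be_bytes` can do the same conversion. A minimal sketch, with hypothetical helper names, assuming the byte slice length is a multiple of 4:

```rust
/// Writes each u32 as 4 big-endian bytes.
fn words_to_bytes(words: &[u32]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(words.len() * 4);
    for w in words {
        bytes.extend_from_slice(&w.to_be_bytes());
    }
    bytes
}

/// Reads big-endian u32 values back; trailing bytes that do not
/// fill a whole u32 are ignored by `chunks_exact`.
fn bytes_to_words(bytes: &[u8]) -> Vec<u32> {
    bytes
        .chunks_exact(4)
        .map(|chunk| u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
        .collect()
}
```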

How to use multiple variables in Rust's for loop?

In the C family of languages, I can do this on one line:
for (int i = lo, j = mid + 1; i <= mid && j <= hi; i++, j++) {
...
}
But in Rust... I can only write it like this:
for i in lo..mid+1 {
let mut j = mid+1;
if j <= hi {
break;
}
...
j += 1;
}
Is there a more efficient way to implement this?
Using an iterator works for the case above, but it makes some situations awkward, such as adjusting the index arithmetically inside the loop:
for (int i = 0; i < n; i ++) {
if (a[i] == ...) {
i += 5;
}
}
In Rust, this does not work. The variable i will not be incremented by 5, but by 1 instead:
for i in 0..n {
if a[i] == ... {
i += 5;
}
}
You can create two parallel range iterators, zip them, then iterate through the combination:
fn main() {
let values = [10, 20, 30, 40, 50, 60, 70, 80, 90];
let lo = 2;
let mid = 5;
let hi = 7;
let early_indexes = lo..(mid + 1);
let late_indexes = (mid + 1)..(hi + 1);
for (i, j) in early_indexes.zip(late_indexes) {
println!("{}, {}", i, j);
println!("{} - {}", values[i], values[j]);
}
}
Inclusive ranges have since been stabilized with the ..= syntax, so you can write something like this:
let early_indexes = lo..=mid;
let late_indexes = (mid + 1)..=hi;
for (i, j) in early_indexes.zip(late_indexes) {
println!("{}, {}", i, j);
println!("{} - {}", values[i], values[j]);
}
If you are actually iterating through a slice as I've shown in my example, you can also just combine the two iterators directly and ignore the index:
let early_values = values[lo..(mid + 1)].iter();
let late_values = values[(mid + 1)..(hi + 1)].iter();
for (i, j) in early_values.zip(late_values) {
println!("{}, {}", i, j);
}
Regarding "The variable i will not be incremented by 5, but by 1 instead": yes, incrementing by a step is annoying. Iterator::step_by has since been stabilized for this; see also:
What is a stable way to iterate on a range with custom step?
How do I iterate over a range with a custom step?
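For a fixed stride, `Iterator::step_by` is available on stable Rust. A short sketch with a hypothetical helper that collects the visited indices (note that `step_by` panics if the step is 0):

```rust
/// Returns the indices 0, step, 2 * step, ... that are below `n`.
fn stepped_indices(n: usize, step: usize) -> Vec<usize> {
    (0..n).step_by(step).collect()
}
```

This only covers a constant step; skipping ahead conditionally still needs a manual loop.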
If you need full control, you can always use while or loop:
let mut i = 0;
while i < n {
if a[i] == ... {
i += 5;
}
i += 1;
}
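Here is a runnable variant of the while loop, as a sketch only. The condition from the question is elided, so the comparison against a `marker` value and the function name are stand-ins invented for the example.

```rust
/// Records which indices the loop visits when a match at index `i`
/// makes it skip five extra positions.
fn visited_indices(a: &[i32], marker: i32) -> Vec<usize> {
    let mut visited = Vec::new();
    let mut i = 0;
    while i < a.len() {
        visited.push(i);
        // Hypothetical condition standing in for the elided one.
        if a[i] == marker {
            i += 5;
        }
        i += 1;
    }
    visited
}
```

With the marker at index 1 of a ten-element slice, the loop visits indices 0 and 1 and then resumes at 7.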
