Format number by rounding up - rust

I have a number which I want to print with a fixed precision, rounded up. I know I can use {:.4} to round it.
assert_eq!("0.0123", format!("{:.4}", 0.0123456))
Is there a simple way to "ceil" it instead?
assert_eq!("0.0124", format!("{:magic}", 0.012301))
assert_eq!("0.0124", format!("{:magic}", 0.012399))
assert_eq!("0.0124", format!("{:magic}", 0.0124))
I can do something like
let x = format!("{:.3}", (((y * 1000.0).ceil() + 0.5) as i64) as f64 / 1000.0)
which is pretty unreadable. It also gives me 3 digits after the decimal point, not three digits of precision, so I'd need to figure out the scale of the number first, probably with something like -log10(y) as i64.
In case it's not clear, I want a string to show the user, not an f64.
More examples
assert_eq!("1.24e-42", format!("{:magic}", 1.234e-42))
assert_eq!("1240", format!("{:magic}", 1234.5)) // "1240." also works
If the f64 representing 0.123 is slightly larger than the real number 0.123, displaying "0.124" is acceptable.
The two requirements are:
The string, when converted back to an f64, is greater than or equal to the original f64 (so 0.123 -> "0.124" is acceptable)
The string has 3 significant digits (although dropping trailing zeros is acceptable, so 0.5 -> "0.5" and 0.5 -> "0.500" both work)
In case it comes up, the input number will always be positive.

This is harder than it seems because there is no way to tell the formatting machinery to change the rounding strategy. Also, format precision works on the number of digits after the decimal point, not on the number of significant digits. (AFAIK there is no equivalent of printf("%.3g", n), and even if there were, it wouldn't round up.)
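For illustration, the built-in precision specifier counts digits after the decimal point and rounds to nearest, so it can round down:

fn main() {
    assert_eq!(format!("{:.3}", 0.0126), "0.013"); // rounds to nearest...
    assert_eq!(format!("{:.3}", 0.0124), "0.012"); // ...so it can round down
}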
You can use a decimal arithmetic crate such as rust_decimal to do the heavy lifting - something like:
use rust_decimal::prelude::*;

pub fn fmtup(n: f64, ndigits: u32) -> String {
    let d = Decimal::from_f64_retain(n).unwrap();
    d.round_sf_with_strategy(ndigits, RoundingStrategy::AwayFromZero)
        .unwrap()
        .normalize()
        .to_string()
}
EDIT: The answer originally included a manual implementation of the rounding, due to issues in rust_decimal that have since been fixed. As of Oct 24 2021 the above snippet using rust_decimal is the recommended solution. The only exception is if you need to handle numbers that are very large or very close to zero (such as 1.234e-42 or 1.234e42), which rust_decimal approximates to zero or rejects, respectively.
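For example, a quick check against a few of the question's examples (a sketch, not an exhaustive test suite):

fn main() {
    assert_eq!(fmtup(0.012301, 3), "0.0124");
    assert_eq!(fmtup(0.012399, 3), "0.0124");
    assert_eq!(fmtup(1234.5, 3), "1240");
}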
To manually round to significant digits, one can scale the number until it has the desired number of digits before the decimal point, and then round it up. In the case of 3 digits, scaling would multiply or divide the number by 10 until it falls into the [100, 1000) range. After rounding the number up, format the resulting whole number as a string, and insert the . at the position determined by the amount of scaling done in the first step.
To avoid the inexactness of floating-point division by ten, the number can first be converted to a fraction, after which all operations proceed on the fraction. Here is an implementation that uses the ubiquitous num crate to provide fractions:
use num::{rational::BigRational, FromPrimitive};

/// Format `n` to `ndigits` significant digits, rounding away from zero.
pub fn fmtup(n: f64, ndigits: i32) -> String {
    // Pass 0 (which we can't scale), infinities and NaN to f64::to_string()
    if n == 0.0 || !n.is_finite() {
        return n.to_string();
    }
    // Handle negative numbers the easy way.
    if n < 0.0 {
        return format!("-{}", fmtup(-n, ndigits));
    }
    // Convert the input to a fraction. From this point onward, we are only doing
    // exact arithmetic.
    let mut n = BigRational::from_float(n).unwrap();
    // Scale N so its whole part is ndigits long, meaning truncating it will result
    // in an integer ndigits long. If ndigits is 3, we'd want N to be in the
    // [100, 1000) range, so that e.g. 0.012345 would be scaled to 123.45, and then
    // rounded up to 124.
    let mut scale = 0i16;
    let ten = BigRational::from_u8(10).unwrap();
    let lower_bound = ten.pow(ndigits - 1);
    if n < lower_bound {
        while n < lower_bound {
            n *= &ten;
            scale -= 1;
        }
    } else {
        let upper_bound = lower_bound * &ten;
        while n >= upper_bound {
            n /= &ten;
            scale += 1;
        }
    }
    // Round N up.
    n = n.ceil();
    // Format the number as an integer and place the decimal point at the right
    // position.
    let mut s = n.to_string();
    // Multiply N by 10**scale, i.e. append zeros if SCALE is positive, otherwise
    // insert the point inside or before the number.
    if scale > 0 {
        s.extend(std::iter::repeat('0').take(scale as _));
    } else if scale < 0 {
        // Find where to place the decimal point in the string.
        let point_pos = s.len() as i16 + scale;
        if point_pos <= 0 {
            // Negative position means before the beginning of the string, so we
            // have to pad with zeros. E.g. s == "123" and point_pos == -2 means
            // we want "0.00123", and with point_pos == 0 we'd want "0.123".
            let mut pad = "0.".to_string();
            pad.extend(std::iter::repeat('0').take(-point_pos as _));
            pad.push_str(&s);
            s = pad;
            // Trim trailing zeros after the decimal point. E.g. 0.25 gets scaled
            // to 250 and then ends up "0.250".
            s.truncate(s.trim_end_matches('0').len());
        } else {
            // Insert the decimal point in the middle of the string. E.g. s == "123"
            // and point_pos == 1 would result in "1.23".
            let point_pos = point_pos as usize;
            if s.as_bytes()[point_pos..].iter().all(|&digit| digit == b'0') {
                // If only zeros follow the decimal point, e.g. "10.000", omit
                // those digits instead of placing the decimal point.
                s.truncate(point_pos);
            } else {
                s.insert(point_pos, '.');
            }
        }
    }
    s
}
Playground
Here are some test cases:
fn main() {
    let fmt3up = |n| fmtup(n, 3);
    assert_eq!("12400", fmt3up(12301.));
    assert_eq!("1240", fmt3up(1234.5));
    assert_eq!("124", fmt3up(123.01));
    assert_eq!("1000", fmt3up(1000.));
    assert_eq!("999", fmt3up(999.));
    assert_eq!("1010", fmt3up(1001.));
    assert_eq!("100", fmt3up(100.));
    assert_eq!("10", fmt3up(10.));
    assert_eq!("99", fmt3up(99.));
    assert_eq!("101", fmt3up(101.));
    assert_eq!("0.25", fmt3up(0.25));
    assert_eq!("12400", fmt3up(12301.0));
    assert_eq!("0.0124", fmt3up(0.0123)); // because 0.0123 is slightly above 123/10_000
    assert_eq!("0.0124", fmt3up(0.012301));
    assert_eq!("0.00124", fmt3up(0.0012301));
    assert_eq!("0.0124", fmt3up(0.012399));
    assert_eq!("0.0124", fmt3up(0.0124));
    assert_eq!("0.124", fmt3up(0.12301));
    assert_eq!("1.24", fmt3up(1.2301));
    assert_eq!("1.24", fmt3up(1.234));
}
Note that this will display 1.234e-42 as 0.00000000000000000000000000000000000000000124, but an improvement to switch to exponential notation for such numbers should be fairly straightforward.

Related

Is this expected behavior for float fused-multiply-add?

I have three numbers with precise representation using (32-bit) floats:
x = 16277216, y = 16077216, z = -261692320000000
I expect performing a fused multiply-add x*y+z to return the mathematically correct value, rounded.
The correct mathematical value is -2489344, which need not be rounded, and therefore this should be the output of a fused-multiply-add.
But when I perform fma(x,y,z) the result is -6280192 instead.
Why?
I'm using rust.
Note z is the rounded result of -x*y.
let x: f32 = 16277216.0;
let y: f32 = 16077216.0;
let z = - x * y;
assert_eq!(z, -261692320000000.0 as f32); // pass
let result = x.mul_add(y, z);
assert_eq!(result, -2489344.0 as f32); // fail
println!("x: {:>32b}, {}", x.to_bits(), x);
println!("y: {:>32b}, {}", y.to_bits(), y);
println!("z: {:>32b}, {}", z.to_bits(), z);
println!("result: {:>32b}, {}", result.to_bits(), result);
The output is
x: 1001011011110000101111011100000, 16277216
y: 1001011011101010101000110100000, 16077216
z: 11010111011011100000000111111110, -261692320000000
result: 11001010101111111010100000000000, -6280192
I have three numbers with precise representation using (32-bit) floats:
x = 16277216, y = 16077216, z = -261692320000000
This premise is false. -261,692,320,000,000 cannot be represented exactly in any 32-bit floating-point format because its significand requires 37 bits to represent.
The IEEE-754 binary32 format commonly used for float has 24-bit significands. Scaling the significand of −261,692,320,000,000 to be under 2^24 in magnitude yields −261,692,320,000,000 = −15,598,077.7740478515625 • 2^24. As we can see, the significand is not an integer at this scale, so it cannot be represented exactly, and I would not call it precise either. The closest representable value is −15,598,078 • 2^24 = −261,692,323,790,848.
println!("z: {:>32b}, {}", z.to_bits(), z);
…
z: 11010111011011100000000111111110, -261692320000000
Rust is lying; the value of z is not -261692320000000. Rust prints the shortest decimal numeral that converts back to the same binary32 value, which is where the round-looking trailing zeros come from. The actual value of z is −261,692,323,790,848.
The value of 16,277,216•16,077,216 − 261,692,323,790,848 using ordinary real-number arithmetic is −6,280,192, so that result for the FMA is correct.
The rounding error occurred in let z = - x * y;, where multiplying 16,277,216 and 16,077,216 rounded the real-number-arithmetic result of 261,692,317,510,656 to the nearest value representable in binary32, 261,692,323,790,848.
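Since all of these values are exactly representable in 64-bit arithmetic, the analysis is easy to double-check; a small sketch:

fn main() {
    let x: f32 = 16277216.0;
    let y: f32 = 16077216.0;
    let z: f32 = -x * y;
    // The exact real-number product, computed in 64-bit integers.
    let exact = 16277216_i64 * 16077216_i64;
    assert_eq!(exact, 261_692_317_510_656);
    // The value actually stored in z (an f32 converts to f64 exactly).
    assert_eq!(z as f64, -261_692_323_790_848.0);
    // FMA computes x*y + z exactly and rounds only once:
    assert_eq!(exact as f64 + z as f64, -6_280_192.0);
    assert_eq!(x.mul_add(y, z), -6_280_192.0f32);
}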

Multiply numbers from two iterators in order and without duplicates

I have this code and I want every combination to be multiplied:
fn main() {
    let min = 1;
    let max = 9;
    for i in (min..=max).rev() {
        for j in (min..=max).rev() {
            println!("{}", i * j);
        }
    }
}
Result is something like:
81
72
[...]
9
72
65
[...]
8
6
4
2
9
8
7
6
5
4
3
2
1
Is there a clever way to produce the results in descending order (without collecting and sorting) and without duplicates?
Note that this answer provides a solution for this specific problem (multiplication table) but the title asks a more general question (any two iterators).
The naive solution of storing all elements in a vector and then sorting it uses O(n^2 log n) time and O(n^2) space (where n is the side length of the multiplication table).
You can use a priority queue to reduce the memory to O(n):
use std::collections::BinaryHeap;

fn main() {
    let n = 9;
    let mut heap = BinaryHeap::new();
    // Seed the heap with the head of each sequence: n*1, n*2, ..., n*n.
    for j in 1..=n {
        heap.push((n * j, j));
    }
    let mut last = n * n + 1;
    while let Some((val, j)) = heap.pop() {
        // Skip duplicates of the previously printed value.
        if val < last {
            println!("{val}");
            last = val;
        }
        // Advance sequence j to its next (smaller) element.
        if val > j {
            heap.push((val - j, j));
        }
    }
}
Playground.
The conceptual idea behind the algorithm is to consider 9 separate sequences
9*9, 9*8, 9*7, .., 9*1
8*9, 8*8, 8*7, .., 8*1
...
1*9, 1*8, 1*7, .., 1*1
Since they are all decreasing, at a given moment, we only need to consider one element of each sequence (the largest one we haven't reached yet).
These are inserted into the priority queue which allows us to efficiently find the maximum one.
Once we have printed a given element we move onto the next one in the sequence and insert that into the priority queue.
By keeping track of the last element printed we can avoid duplicates.
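As an illustration, for n = 9 the first values printed are:
81
72
64
63
56
54
49
48
45
42
with each distinct product appearing exactly once, in descending order.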

How do you format a float to the first significant decimal and with specified precision

I'm new to Rust (and coming from a Javascript background), so I decided to build a number formatting application to learn more about it.
One function in the application can accept a tiny float (e.g. 0.000435), and a precision value (e.g. 2), and should return the float formatted to the first significant decimal and with the specified precision applied (e.g. 0.00044)
For example, the function should accept and return the following:
fn meh(float: f64, precision: usize) -> String {
    // ... magic happens ... format!(...
}

let float = 0.000456;
let precision = 2;
let result_a = meh(float, precision);
// result_a => "0.00046"

let float = 0.043256;
let precision = 3;
let result_b = meh(float, precision);
// result_b => "0.0433"
I know that format! helps with precision. But I can't find a decent way to find the first significant decimal without doing something funny like "convert the float to a String and iterate until a non-zero value is found...."
I hope that makes sense, any help would be most appreciated.
As mentioned in the comments, Rust's interpretation of "precision" is "number of digits after the decimal point". However, if we instead want it to mean "number of significant digits", we can write the meh function to take this into account:
fn meh(float: f64, precision: usize) -> String {
    // Compute the absolute value.
    let a = float.abs();
    // If the absolute value is 1 or greater, the precision becomes less than "standard".
    let precision = if a >= 1. {
        // Reduce it by the number of integer digits, with a minimum of 0.
        let n = (1. + a.log10().floor()) as usize;
        if n <= precision {
            precision - n
        } else {
            0
        }
    // If the value is less than 1 (but non-zero), the precision becomes greater
    // than "standard".
    } else if a > 0. {
        // Increase it by the number of leading zeros after the decimal point.
        let n = -(1. + a.log10().floor()) as usize;
        precision + n
    // Special case for 0.
    } else {
        0
    };
    // Format with the computed precision.
    format!("{0:.1$}", float, precision)
}
Playground example with test cases
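As a quick check, the two examples from the question (a sketch, not a full test suite):

fn main() {
    assert_eq!(meh(0.000456, 2), "0.00046");
    assert_eq!(meh(0.043256, 3), "0.0433");
}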

Check if a float can be converted to integer without loss

I wanted to check whether an integer was a power of 2. My standard approach would have been to see if log₂(x) was an integer value, however I found no elegant way to do this. My approaches were the following:
let x = 65;
let y = (x as f64).log(2.0);

// Compute the difference between y and the result
// of truncating y by casting to int and back.
let difference = y - (y as i64 as f64);

// This looks nice, but matches on float values will be phased out.
match difference {
    0.0 => println!("{} is a power of 2", x),
    _ => println!("{} is NO power of 2", x),
}

// This seems kind of clunky.
if difference == 0.0 {
    println!("{} is a power of 2", x);
} else {
    println!("{} is NO power of 2", x);
}
Is there a builtin option in Rust to check if a float can be converted to an integer without truncation?
Something that behaves like:
42.0f64.is_int() // True/ Ok()
42.23f64.is_int() // False/ Err()
In other words, a method/ macro/ etc. that allows me to check if I will lose information (decimals) by casting to int.
I already found that checking whether an integer is a power of 2 can be done efficiently with x.count_ones() == 1.
You can use fract to check if there is a non-zero fractional part:
42.0f64.fract() == 0.0;
42.23f64.fract() != 0.0;
Note that this only works if you already know that the number is in range. If you need an extra check to test that the floating-point number is between 0 and u32::MAX (or between i32::MIN and i32::MAX), then you might as well do the conversion and check that it didn't lose precision:
x == (x as u32) as f64
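Putting both checks together, a minimal sketch:

fn main() {
    assert_eq!(42.0f64.fract(), 0.0);
    assert_ne!(42.23f64.fract(), 0.0);

    // The round-trip check; `as` saturates on overflow, so this also
    // rejects values outside u32's range.
    let x = 42.0f64;
    assert!(x == (x as u32) as f64);
}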

Convert binary ( integer and fraction) from VHDL to decimal, negative value in C code

I have 14-bit data that is fed from an FPGA written in VHDL. The Nios II processor reads the 14-bit data from the FPGA and does some processing tasks; the Nios II system is programmed in C code.
The 14-bit data can be positive, zero, or negative. In the Altera compiler, I can only define the data to be 8, 16, or 32 bits wide, so I define it as 16-bit data.
First, I need to check whether the data is negative; if it is, I need to pad the two MSBs with '1' so the system treats it as a negative value instead of a positive one.
Second, I need to convert this binary representation into a decimal value with BOTH an integer and a fractional part.
I learned from this link (Correct algorithm to convert binary floating point "1101.11" into decimal (13.75)?) that I can convert a binary number (consisting of both integer and fractional parts) to a decimal value. Specifically, I am able to use the code quoted from that link, reproduced below:
#include <stdio.h>
#include <math.h>

double convert(const char binary[]) {
    int bi, i;
    int len = 0;
    int dot = -1;
    double result = 0;
    for (bi = 0; binary[bi] != '\0'; bi++) {
        if (binary[bi] == '.') {
            dot = bi;
        }
        len++;
    }
    if (dot == -1)
        dot = len;
    for (i = dot; i >= 0; i--) {
        if (binary[i] == '1') {
            result += (double) pow(2, (dot - i - 1));
        }
    }
    for (i = dot; binary[i] != '\0'; i++) {
        if (binary[i] == '1') {
            result += 1.0 / (double) pow(2.0, (double)(i - dot));
        }
    }
    return result;
}

int main()
{
    char bin[] = "1101.11";
    char bin1[] = "1101";
    char bin2[] = "1101.";
    char bin3[] = ".11";
    printf("%s -> %f\n", bin, convert(bin));
    printf("%s -> %f\n", bin1, convert(bin1));
    printf("%s -> %f\n", bin2, convert(bin2));
    printf("%s -> %f\n", bin3, convert(bin3));
    return 0;
}
I am wondering whether this code can be used to handle negative values. I tried a binary string of 11111101.11 and it gives an output of 253.75...
I have two questions:
What modifications do I need to make in order to read a negative value?
I know that I can test the MSB (as below); if it is 1, the value is negative...
if (14bit_data & 0x2000) // if true, it is a negative value
The issue is that, since the number has a fractional part (not just an integer part), I am not sure whether the method still works...
If the binary number is not originally in string format, is there a way to convert it to a string? The binary number is fed from an FPGA block written in VHDL: 14 bits, with the MSB as the sign bit, the following 7 bits the integer magnitude, and the last 6 bits the fractional magnitude. I need the decimal value in C code for the Altera Nios II processor.
OK, so I'm focusing on the fact that you want to reuse the algorithm you mention at the beginning of your question, and I assume the binary representation of your signed number is two's complement. But judging from your comments, I'm not really sure the input you have is the same as the one used by that algorithm.
First, pad the 2 MSBs to get a 16-bit representation:
data16 = (data14 & 0x2000) ? (data14 | 0xC000) : data14;
If the value is positive it remains unchanged; if it is negative, this yields the correct two's complement representation on 16 bits.
For the fractional part, everything is the same as in the algorithm you mentioned in your question.
For the integer part, everything is the same except for the treatment of the MSB.
For an unsigned number, the MSB (i.e. bit[15]) represents pow(2, 15-6) (6 being the width of the fractional part), whereas for a signed number in two's complement representation it represents -pow(2, 15-6), meaning the algorithm becomes
/* integer part operation */
while (p >= 1) {
    rem = (int)fmod(p, 10);
    p = (int)(p / 10);
    dec = dec + rem * pow(2, t) * (9 != t ? 1 : -1);
    ++t;
}
or, said differently, if you don't want the * operator:
/* integer part operation */
while (p >= 1) {
    rem = (int)fmod(p, 10);
    p = (int)(p / 10);
    if (9 != t) {
        dec = dec + rem * pow(2, t);
    } else {
        dec = dec - rem * pow(2, t);
    }
    ++t;
}
For the second algorithm that you mention, considering your format, if dot == 11 and i == 0 we are at the MSB (10 integer bits followed by the dot), so the code becomes
for (i = dot - 1; i >= 0; i--) {
    if (binary[i] == '1') {
        if (11 != dot || i) {
            result += (double) pow(2, (dot - i - 1));
        } else {
            // result -= (double) pow(2, (dot - i - 1));
            // Due to your number format, i == 0 and dot == 11, so:
            result -= 512;
        }
    }
}
WARNING: in Brice's algorithm the input is a character string like "11011.101", whereas according to your description you have an integer input, so I'm not sure that algorithm is suited to your case.
I think this should work:
float convert14BitsToFloat(int16_t in)
{
    /* Sign-extend in, since it is 14 bits */
    if (in & 0x2000) in |= 0xC000;
    /* Convert to float, with 6 fractional bits (64 = 2^6) */
    return (float)in / 64.0f;
}
To convert any number to a string, I would use sprintf. Be aware it may significantly increase the size of your application. If you don't need the float and want to keep the application small, you should write your own conversion function.
