Rust: Using structs that contain an f32 field in a hashmap [duplicate]

I want to use a HashMap<f64, f64> for saving the distance from a point with known x and key y to another point. The f64 value type doesn't matter here; the focus is on the key.
let mut map = HashMap::<f64, f64>::new();
map.insert(0.4, f64::hypot(4.2, 50.0));
map.insert(1.8, f64::hypot(2.6, 50.0));
...
let a = map.get(&0.4).unwrap();
As f64 is neither Eq nor Hash, but only PartialEq, f64 is not sufficient as a key. I need to save the distances first, but also access the distances later by y. The type of y needs floating-point precision, but if it doesn't work with f64, I'll use an i64 with a known exponent.
I tried a hack: wrapping the float in my own struct DimensionKey(f64) and implementing Hash by formatting the float as a String and hashing that.
use std::hash::{Hash, Hasher};

#[derive(PartialEq)]
struct DimensionKey(f64);

// Eq has to be implemented manually; deriving it fails because f64 is not Eq.
impl Eq for DimensionKey {}

impl Hash for DimensionKey {
    fn hash<H: Hasher>(&self, state: &mut H) {
        format!("{}", self.0).hash(state);
    }
}
It feels very hacky, and both options (my own struct, or storing the float as an integer with a known exponent) seem overly complicated for just a key.
Update:
I can guarantee that my keys will never be NaN or infinite. Also, I won't compute my keys; I only iterate over them and use them as-is, so the well-known 0.1 + 0.2 ≠ 0.3 problem shouldn't come up.
How to do a binary search on a Vec of floats? and this question both boil down to implementing total ordering and equality for a floating-point number; they differ only in whether it is used for hashing or for searching.

Presented with no comment beyond this: read all the other comments and answers to understand why you probably don't want to do this.
use std::{collections::HashMap, hash};

#[derive(Debug, Copy, Clone)]
struct DontUseThisUnlessYouUnderstandTheDangers(f64);

impl DontUseThisUnlessYouUnderstandTheDangers {
    fn key(&self) -> u64 {
        self.0.to_bits()
    }
}

impl hash::Hash for DontUseThisUnlessYouUnderstandTheDangers {
    fn hash<H>(&self, state: &mut H)
    where
        H: hash::Hasher,
    {
        self.key().hash(state)
    }
}

impl PartialEq for DontUseThisUnlessYouUnderstandTheDangers {
    fn eq(&self, other: &DontUseThisUnlessYouUnderstandTheDangers) -> bool {
        self.key() == other.key()
    }
}

impl Eq for DontUseThisUnlessYouUnderstandTheDangers {}

fn main() {
    let a = DontUseThisUnlessYouUnderstandTheDangers(0.1);
    let b = DontUseThisUnlessYouUnderstandTheDangers(0.2);
    let c = DontUseThisUnlessYouUnderstandTheDangers(0.3);

    let mut map = HashMap::new();
    map.insert(a, 1);
    map.insert(b, 2);

    println!("{:?}", map.get(&a));
    println!("{:?}", map.get(&b));
    println!("{:?}", map.get(&c));
}
Basically, if you want to treat an f64 as a bag of bits with no numeric meaning, we can treat it as an equivalently sized bag of bits that knows how to be hashed and bitwise-compared.
Don't be surprised when one of the many distinct NaN bit patterns doesn't equal another one.
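For concreteness, here is a small separate sketch that reuses the wrapper above and shows those corner cases: 0.0 and -0.0 are equal as floats but have different bit patterns, so they become two distinct keys, while a NaN key does equal itself bitwise, yet any NaN with a different bit pattern is a different key.
fn main() {
    let pos_zero = DontUseThisUnlessYouUnderstandTheDangers(0.0);
    let neg_zero = DontUseThisUnlessYouUnderstandTheDangers(-0.0);
    // 0.0 == -0.0 as floats, but their bit patterns differ,
    // so under this scheme they are two different map keys.
    assert!(pos_zero != neg_zero);

    let nan = DontUseThisUnlessYouUnderstandTheDangers(f64::NAN);
    let other_nan = DontUseThisUnlessYouUnderstandTheDangers(-f64::NAN);
    // Bitwise, a NaN equals itself (unlike normal float semantics)...
    assert!(nan == nan);
    // ...but a NaN with a different bit pattern is a different key.
    assert!(nan != other_nan);
}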

You could split the f64 into the integral and fractional part and store them in a struct in the following manner:
#[derive(Hash, Eq, PartialEq)]
struct Distance {
    integral: u64,
    fractional: u64,
}
The rest is straightforward:
use std::collections::HashMap;

#[derive(Hash, Eq, PartialEq)]
struct Distance {
    integral: u64,
    fractional: u64,
}

impl Distance {
    fn new(i: u64, f: u64) -> Distance {
        Distance {
            integral: i,
            fractional: f,
        }
    }
}

fn main() {
    let mut map: HashMap<Distance, f64> = HashMap::new();
    map.insert(Distance::new(0, 4), f64::hypot(4.2, 50.0));
    map.insert(Distance::new(1, 8), f64::hypot(2.6, 50.0));
    assert_eq!(map.get(&Distance::new(0, 4)), Some(&f64::hypot(4.2, 50.0)));
}
Edit: As Veedrac said, a more general and efficient option is to deconstruct the f64 into a mantissa-exponent-sign triplet. The function that does this, integer_decode(), is deprecated in std, but its source is easy to find in the Rust repository on GitHub. It can be defined as follows:
use std::mem;

fn integer_decode(val: f64) -> (u64, i16, i8) {
    // (val.to_bits() is the safe equivalent of this transmute)
    let bits: u64 = unsafe { mem::transmute(val) };
    let sign: i8 = if bits >> 63 == 0 { 1 } else { -1 };
    let mut exponent: i16 = ((bits >> 52) & 0x7ff) as i16;
    let mantissa = if exponent == 0 {
        (bits & 0xfffffffffffff) << 1
    } else {
        (bits & 0xfffffffffffff) | 0x10000000000000
    };
    exponent -= 1023 + 52;
    (mantissa, exponent, sign)
}
The definition of Distance could then be:
#[derive(Hash, Eq, PartialEq)]
struct Distance((u64, i16, i8));

impl Distance {
    fn new(val: f64) -> Distance {
        Distance(integer_decode(val))
    }
}
This variant is also easier to use:
fn main() {
    let mut map: HashMap<Distance, f64> = HashMap::new();
    map.insert(Distance::new(0.4), f64::hypot(4.2, 50.0));
    map.insert(Distance::new(1.8), f64::hypot(2.6, 50.0));
    assert_eq!(map.get(&Distance::new(0.4)), Some(&f64::hypot(4.2, 50.0)));
}
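Note that this key is still exact: two values that differ only by floating-point rounding decode to different triplets and therefore to different keys. For example, adding this assertion to the main above would pass (a small sketch using the Distance type above):
    // Still exact: 0.1 + 0.2 and 0.3 decode to different triplets,
    // so they are different keys.
    assert!(Distance::new(0.1 + 0.2) != Distance::new(0.3));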

Unfortunately, floating-point equality is hard and counter-intuitive:
fn main() {
    println!("{} {} {}", 0.1 + 0.2, 0.3, 0.1 + 0.2 == 0.3);
}
// Prints: 0.30000000000000004 0.3 false
And therefore hashing is hard too, since hashes of equal values should be equal.
If, in your case, the range of your numbers is small enough to fit in an i64 and you can accept the loss of precision, then a simple solution is to canonicalize first and then define equality and hashing in terms of the canonical value:
use std::cmp::Eq;

#[derive(Debug)]
struct Distance(f64);

impl Distance {
    fn canonicalize(&self) -> i64 {
        (self.0 * 1024.0 * 1024.0).round() as i64
    }
}

impl PartialEq for Distance {
    fn eq(&self, other: &Distance) -> bool {
        self.canonicalize() == other.canonicalize()
    }
}

impl Eq for Distance {}

fn main() {
    let d = Distance(0.1 + 0.2);
    let e = Distance(0.3);
    println!("{:?} {:?} {:?}", d, e, d == e);
}
// Prints: Distance(0.30000000000000004) Distance(0.3) true
Hash just follows, and from then on you can use Distance as a key in the hash map:
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

impl Hash for Distance {
    fn hash<H>(&self, state: &mut H)
    where
        H: Hasher,
    {
        self.canonicalize().hash(state);
    }
}

fn main() {
    let d = Distance(0.1 + 0.2);
    let e = Distance(0.3);

    let mut m = HashMap::new();
    m.insert(d, "Hello");
    println!("{:?}", m.get(&e));
}
// Prints: Some("Hello")
Warning: to reiterate, this strategy only works if (a) the dynamic range of the values is small enough to be captured in an i64 (about 19 digits) and (b) the dynamic range is known in advance, since the scaling factor is static. Fortunately, this holds for many common problems, but it is something to document and test...
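As a small illustration of that trade-off, using the Distance type above (a sketch): two values that round to the same multiple of 2^-20 become the same key, while values further apart stay distinct.
fn main() {
    // 1e-7 is below the resolution of the fixed 2^20 factor,
    // so both values canonicalize to the same i64.
    assert!(Distance(1.0) == Distance(1.0 + 1e-7));
    // 1e-5 is large enough to land on a different multiple of 2^-20.
    assert!(Distance(1.0) != Distance(1.0 + 1e-5));
}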

You can use the ordered_float crate, which does this for you.
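For illustration, a minimal sketch of the original example with ordered_float (assuming the crate is added as a dependency; OrderedFloat wraps the float and supplies the Eq, Ord and Hash implementations):
use ordered_float::OrderedFloat;
use std::collections::HashMap;

fn main() {
    let mut map: HashMap<OrderedFloat<f64>, f64> = HashMap::new();
    map.insert(OrderedFloat(0.4), f64::hypot(4.2, 50.0));
    map.insert(OrderedFloat(1.8), f64::hypot(2.6, 50.0));

    // Look the distance up again by wrapping the same key value.
    assert_eq!(map.get(&OrderedFloat(0.4)), Some(&f64::hypot(4.2, 50.0)));
}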

Related

Design patterns without the box

Rust beginner here. I have a number of algorithms that are almost identical, but at the final step they each aggregate the results in slightly different ways. Let's say the Algorithm does the following:
pub struct Algorithm<T> {
    result_aggregator: Box<dyn ResultAggregator<T>>,
}

impl<T> Algorithm<T> {
    pub fn calculate(&self, num1: i32, num2: i32) -> T {
        let temp = num1 + num2;
        self.result_aggregator.create(temp)
    }
}
With this, I can create a few different result aggregator classes to take my temp result and transform it into my final result:
pub trait ResultAggregator<T> {
    fn create(&self, num: i32) -> T;
}

pub struct FloatAggregator;
pub struct StringAggregator;

impl ResultAggregator<f32> for FloatAggregator {
    fn create(&self, num: i32) -> f32 {
        num as f32 * 3.14159
    }
}

impl ResultAggregator<String> for StringAggregator {
    fn create(&self, num: i32) -> String {
        format!("~~{num}~~")
    }
}
...and call it like so:
fn main() {
    // Here's a float example
    let aggregator = FloatAggregator;
    let algorithm = Algorithm {
        result_aggregator: Box::new(aggregator),
    };
    let result = algorithm.calculate(4, 5);
    println!("The result has value {result}");

    // Here's a string example
    let aggregator = StringAggregator;
    let algorithm = Algorithm {
        result_aggregator: Box::new(aggregator),
    };
    let result = algorithm.calculate(4, 5);
    println!("The result has value {result}");
}
This is what I've come up with.
Question: Is it possible to do this without the dynamic box? It's performance-critical, and I understand that generics are usually a good solution, but I've had no luck figuring out how to get this working without dynamic dispatch.
So what's the Rusty solution to this problem? I feel like I'm approaching it with my C# hat on which is probably not the way to go.
You can use an associated type instead of a generic parameter:
pub trait ResultAggregator {
    type Output;

    fn create(&self, num: i32) -> Self::Output;
}

pub struct FloatAggregator;
pub struct StringAggregator;

impl ResultAggregator for FloatAggregator {
    type Output = f32;

    fn create(&self, num: i32) -> f32 {
        num as f32 * 3.14159
    }
}

impl ResultAggregator for StringAggregator {
    type Output = String;

    fn create(&self, num: i32) -> String {
        format!("~~{num}~~")
    }
}

pub struct Algorithm<Aggregator> {
    result_aggregator: Aggregator,
}

impl<Aggregator: ResultAggregator> Algorithm<Aggregator> {
    pub fn calculate(&self, num1: i32, num2: i32) -> Aggregator::Output {
        let temp = num1 + num2;
        self.result_aggregator.create(temp)
    }
}
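For completeness, a usage sketch mirroring the original main: the aggregator is now a plain field, each Algorithm<...> is a distinct concrete type, and dispatch is static.
fn main() {
    // Float example: Algorithm<FloatAggregator>, no Box, no dynamic dispatch.
    let algorithm = Algorithm { result_aggregator: FloatAggregator };
    let result = algorithm.calculate(4, 5);
    println!("The result has value {result}");

    // String example: Algorithm<StringAggregator>.
    let algorithm = Algorithm { result_aggregator: StringAggregator };
    let result = algorithm.calculate(4, 5);
    println!("The result has value {result}");
}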

Determine fn output type from input type

I would like a function that returns f32 if the input is f32; for all other numeric inputs it should return f64.
A simplified example of the structure is this:
use num::{Num, NumCast, ToPrimitive, traits::Float};

fn example<N: Num + ToPrimitive, T: Float>(input: N) -> T {
    let output = input + N::one();
    NumCast::from(output).unwrap()
}

fn main() {
    println!("{}", example::<f32, f32>(1f32));
    println!("{}", example::<u32, f64>(1u32));
}
Is there a way to control the dispatch so that I can drop the turbofish and it will automatically map f32 -> f32 and anything else -> f64?
You cannot have a function that returns either f32 or f64 by making the decision at runtime, but you can implement this behavior using traits.
You can have two traits, ToF32 and ToF64, and implement the first one for f32 only and ToF64 for all other numeric types except f32. You will need a fair amount of boilerplate code to implement ToF64, though; it can be compacted using macros (see the sketch after the example).
use num::{NumCast, ToPrimitive, one, Integer};

trait ToF32 {
    fn example(self) -> f32;
}

impl ToF32 for f32 {
    fn example(self) -> f32 {
        let output = self + one::<Self>();
        NumCast::from(output).unwrap()
    }
}

trait ToF64 {
    fn example(self) -> f64;
}

impl<T: Integer + ToPrimitive> ToF64 for T {
    fn example(self) -> f64 {
        let output = self + one::<Self>();
        NumCast::from(output).unwrap()
    }
}

fn main() {
    println!("{}", 1f32.example());
    println!("{}", 1u32.example());
}
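And a sketch of the macro idea mentioned above: instead of the blanket impl over Integer, ToF64 can be implemented for an explicit list of integer types (assuming the trait and imports from the example above; the two approaches would conflict if combined, so this replaces the blanket impl):
macro_rules! impl_to_f64 {
    ($($t:ty),*) => {
        $(
            impl ToF64 for $t {
                fn example(self) -> f64 {
                    let output = self + one::<Self>();
                    NumCast::from(output).unwrap()
                }
            }
        )*
    };
}

impl_to_f64!(u8, u16, u32, u64, i8, i16, i32, i64);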

Error while calculating Getting average of chunks [duplicate]

This code works:
fn main() {
    let a: i32 = (1i32..10).sum();
    let b = a.pow(2);
}
If I remove the i32 type from a, then I get this error:
rustc 1.13.0 (2c6933acc 2016-11-07)
error: the type of this value must be known in this context
--> <anon>:3:13
|
5 | let b = a.pow(2);
| ^^^^^^^^
I would have expected that Rust turns (1i32..10) into an i32 iterator and then sum() knows to return an i32. What am I missing?
The way sum is defined, the return value is open-ended; more than one type can implement the trait Sum<i32>. Here's an example where different types for a are used, both of which compile:
#[derive(Clone, Copy)]
struct Summer {
    s: isize,
}

impl Summer {
    fn pow(&self, p: isize) {
        println!("pow({})", p);
    }
}

impl std::iter::Sum<i32> for Summer {
    fn sum<I>(iter: I) -> Self
    where
        I: Iterator<Item = i32>,
    {
        let mut result = 0isize;
        for v in iter {
            result += v as isize;
        }
        Summer { s: result }
    }
}

fn main() {
    let a1: i32 = (1i32..10).sum();
    let a2: Summer = (1i32..10).sum();
    let b1 = a1.pow(2);
    let b2 = a2.pow(2);
}
Since both result types are possible, the type cannot be inferred and must be explicitly specified, either by a turbofish (sum::<X>()) or as the result of the expression (let x: X = ...sum();).
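For reference, the two spellings side by side (a trivial sketch):
fn main() {
    // Both forms tell the compiler which Sum implementation to use.
    let a = (1i32..10).sum::<i32>(); // turbofish on the call
    let b: i32 = (1i32..10).sum();   // annotation on the binding
    assert_eq!(a, b);
}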
"and then sum() knows to return an i32"
This is the key missing point. While the "input" type is already known (it has to be something that implements Iterator in order for sum to even be available), the "output" type is very flexible.
Check out Iterator::sum:
fn sum<S>(self) -> S
where
    S: Sum<Self::Item>,
It returns a generic type S which has to implement Sum. S does not have to match Self::Item. Therefore, the compiler requires you to specify what type to sum into.
Why is this useful? Check out these two sample implementations from the standard library:
impl Sum<u8> for u8
impl<'a> Sum<&'a u8> for u8
That's right! You can sum up an iterator of u8 or an iterator of &u8! If we didn't have this, then this code wouldn't work:
fn main() {
    let a: i32 = (0..5).sum();
    let b: i32 = [0, 1, 2, 3, 4].iter().sum();
    assert_eq!(a, b);
}
As bluss points out, we could accomplish this by having an associated type which would tie u8 -> u8 and &'a u8 -> u8.
If we only had an associated type though, then the target sum type would always be fixed, and we'd lose flexibility. See When is it appropriate to use an associated type versus a generic type? for more details.
As an example, we can also implement Sum<u8> for our own types. Here, we sum up u8s, but increase the size of the type we are summing, as it's likely the sum would exceed a u8. This implementation is in addition to the existing implementations from the standard library:
#[derive(Debug, Copy, Clone)]
struct Points(i32);

impl std::iter::Sum<u8> for Points {
    fn sum<I>(iter: I) -> Points
    where
        I: Iterator<Item = u8>,
    {
        let mut pts = Points(0);
        for v in iter {
            pts.0 += v as i32;
        }
        pts
    }
}

fn main() {
    let total: Points = (0u8..42u8).sum();
    println!("{:?}", total);
}

How to satisfy the Iterator trait bound in order to use Rayon here?

I'm attempting to parallelise the Ramer–Douglas–Peucker line simplification algorithm by using Rayon's par_iter instead of iter:
extern crate num_traits;
use num_traits::{Float, ToPrimitive};
extern crate rayon;
use self::rayon::prelude::*;

#[derive(PartialEq, Clone, Copy, Debug)]
pub struct Coordinate<T>
    where T: Float
{
    pub x: T,
    pub y: T,
}

#[derive(PartialEq, Clone, Copy, Debug)]
pub struct Point<T>(pub Coordinate<T>) where T: Float;

impl<T> Point<T>
    where T: Float + ToPrimitive
{
    pub fn new(x: T, y: T) -> Point<T> {
        Point(Coordinate { x: x, y: y })
    }
    pub fn x(&self) -> T {
        self.0.x
    }
    pub fn y(&self) -> T {
        self.0.y
    }
}

unsafe impl<T> Send for Point<T> where T: Float {}
unsafe impl<T> Sync for Point<T> where T: Float {}

fn distance<T>(a: &Point<T>, p: &Point<T>) -> T
    where T: Float
{
    let (dx, dy) = (a.x() - p.x(), a.y() - p.y());
    dx.hypot(dy)
}

// perpendicular distance from a point to a line
fn point_line_distance<T>(point: &Point<T>, start: &Point<T>, end: &Point<T>) -> T
    where T: Float
{
    if start == end {
        distance(point, start)
    } else {
        let numerator = ((end.x() - start.x()) * (start.y() - point.y()) -
                         (start.x() - point.x()) * (end.y() - start.y()))
            .abs();
        let denominator = distance(start, end);
        numerator / denominator
    }
}

// Ramer–Douglas–Peucker line simplification algorithm
fn rdp<T>(points: &[Point<T>], epsilon: &T) -> Vec<Point<T>>
    where T: Float + Send + Sync
{
    if points.is_empty() {
        return points.to_vec();
    }
    let mut dmax = T::zero();
    let mut index: usize = 0;
    let mut distance: T;

    for (i, _) in points.par_iter().enumerate().take(points.len() - 1).skip(1) {
        distance = point_line_distance(&points[i], &points[0], &*points.last().unwrap());
        if distance > dmax {
            index = i;
            dmax = distance;
        }
    }
    if dmax > *epsilon {
        let mut intermediate = rdp(&points[..index + 1], &*epsilon);
        intermediate.pop();
        intermediate.extend_from_slice(&rdp(&points[index..], &*epsilon));
        intermediate
    } else {
        vec![*points.first().unwrap(), *points.last().unwrap()]
    }
}

#[cfg(test)]
mod test {
    use super::Point;
    use super::rdp;

    #[test]
    fn rdp_test() {
        let mut vec = Vec::new();
        vec.push(Point::new(0.0, 0.0));
        vec.push(Point::new(5.0, 4.0));
        vec.push(Point::new(11.0, 5.5));
        vec.push(Point::new(17.3, 3.2));
        vec.push(Point::new(27.8, 0.1));
        let mut compare = Vec::new();
        compare.push(Point::new(0.0, 0.0));
        compare.push(Point::new(5.0, 4.0));
        compare.push(Point::new(11.0, 5.5));
        compare.push(Point::new(27.8, 0.1));
        let simplified = rdp(&vec, &1.0);
        assert_eq!(simplified, compare);
    }
}
I've implemented Send and Sync for Point<T>, but when I switch to par_iter, I get the following error:
error[E0277]: the trait bound rayon::par_iter::skip::Skip<rayon::par_iter::take::Take<rayon::par_iter::enumerate::Enumerate<rayon::par_iter::slice::SliceIter<'_, Point<T>>>>>: std::iter::Iterator is not satisfied
--> lib.rs:107:5
= note: rayon::par_iter::skip::Skip<rayon::par_iter::take::Take<rayon::par_iter::enumerate::Enumerate<rayon::par_iter::slice::SliceIter<'_, Point<T>>>>> is not an iterator; maybe try calling .iter() or a similar method
= note: required by std::iter::IntoIterator::into_iter
I don't understand what it's asking for. Is the problem that I'm operating on a tuple?
Rayon's parallel iterators implement ParallelIterator, not Iterator. In particular, this means you cannot just put a par_iter() in a for-loop header and expect it to suddenly be parallel. for is sequential.
Since your original code isn't written in terms of iterator functions, but rather as for loops, you can't parallelize it simply with the switch to par_iter(), but have to actually redesign the code.
In particular, the failing part of the code is essentially a hand-rolled version of the max_by_key function; see the sketch below.
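For instance, here is a sketch of how that loop could be expressed so that par_iter() applies. It assumes a current Rayon, the Point<T> and point_line_distance definitions from the question, and at least three input points so the range is non-empty; max_by with partial_cmp stands in for max_by_key because floats are only PartialOrd.
use rayon::prelude::*;
use num_traits::Float;

fn max_distance_index<T>(points: &[Point<T>]) -> (usize, T)
    where T: Float + Send + Sync
{
    let first = points[0];
    let last = *points.last().unwrap();
    points
        .par_iter()
        .enumerate()
        .take(points.len() - 1)
        .skip(1)
        .map(|(i, p)| (i, point_line_distance(p, &first, &last)))
        // max_by instead of max_by_key: T is only PartialOrd.
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .unwrap()
}
The pair it returns would take the place of index and dmax inside rdp.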

How to call `min` on a generic type that could be either an integer or float?

What do I do when I want to call min on integers and floats? For example consider this:
fn foo<T>(v1: T, v2: T)
    where ???
{
    ....
    let new_min = min(v1, v2);
    ....
}
The problem is that std::cmp::min doesn't work for f32, because f32 is not Ord. There is a separate min for floats (f32::min).
How would I solve this problem?
Create your own trait that defines the behavior of the various types:
trait Min {
    fn min(self, other: Self) -> Self;
}

impl Min for u8 {
    fn min(self, other: u8) -> u8 {
        ::std::cmp::min(self, other)
    }
}

impl Min for f32 {
    fn min(self, other: f32) -> f32 {
        f32::min(self, other)
    }
}

fn foo<T>(v1: T, v2: T)
    where T: Min
{
    let new_min = Min::min(v1, v2);
}
As mentioned in other places, floating-point comparisons are hard.
There's no one answer to what the result of min(NaN, 0.0) should be, so it's up to you to decide. If you decide that NaN is less than or greater than all other numbers, great! Maybe it's equal to zero! Maybe you should assert that there will never be a NaN...
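For what it's worth, the standard library's float min has already made one of those choices: when exactly one argument is NaN, the other argument is returned. A small illustration:
fn main() {
    // f32::min ignores a single NaN argument...
    assert_eq!(f32::min(f32::NAN, 0.0), 0.0);
    assert_eq!(f32::min(0.0, f32::NAN), 0.0);
    // ...and only returns NaN when both arguments are NaN.
    assert!(f32::min(f32::NAN, f32::NAN).is_nan());
}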
