I'm writing an application in Rust that will use vector arithmetic intensively, and I stumbled upon the problem of designing operator overloads for a struct type.
So I have a vector struct like this:
struct Vector3d {
    pub x: f64,
    pub y: f64,
    pub z: f64,
}
and I want to be able to write something like this:
let x = Vector3d {x: 1.0, y: 0.0, z: 0.0};
let y = Vector3d {x: -1.0, y: 0.0, z: 0.0};
let u = x + y;
As far as I can see, there are three different ways to do it:
1. Implement the std::ops::Add trait for Vector3d directly. That works, but this trait's method signature is
fn add(self, other: Vector3d)
so it will invalidate its arguments after use (because it moves them), which is undesirable in my case since many vectors will be used in multiple expressions.
2. Implement the Add trait for Vector3d and also implement the Copy trait. This works, but I feel iffy about it since Vector3d isn't exactly a lightweight thing (at least 24 bytes) that can be copied quickly, especially when there are many calls to arithmetic functions.
3. Implement Add for references to Vector3d, as suggested here. This works, but in order to apply the operator, I will have to write
let u = &x + &y;
I don't like this notation because it doesn't look like its mathematical equivalent, which is just u = x + y.
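For reference, option 3 looks roughly like this (a sketch of the reference-based impl, not code taken from the linked suggestion):
use std::ops::Add;

// A sketch of option 3: implement Add on references so that neither operand is moved.
impl<'a, 'b> Add<&'b Vector3d> for &'a Vector3d {
    type Output = Vector3d;

    fn add(self, other: &'b Vector3d) -> Vector3d {
        Vector3d {
            x: self.x + other.x,
            y: self.y + other.y,
            z: self.z + other.z,
        }
    }
}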
I'm not sure which variant is optimal. So, the question is: is there a way to overload the '+' operator in such a way that
it accepts its arguments as references instead of copying or moving them;
it lets me write just u = x + y instead of u = &x + &y?
Is there a way to overload the '+' operator in such a way that
it accepts its arguments as references instead of copying or moving them;
it lets me write just u = x + y instead of u = &x + &y?
No, there is no way to do that. Rust greatly values explicitness and rarely converts between types automatically.
However, the solution to your problem is simple: just #[derive(Copy)]. I can assure you that 24 bytes is not a lot. Computers these days love to crunch a lot of data at once instead of working on little chunks of data.
Apart from that, Copy is not really about the performance overhead of copying/cloning:
Types that can be copied by simply copying bits (i.e. memcpy).
And later in the documentation:
Generally speaking, if your type can implement Copy, it should.
Your type Vector3d can be copied by just copying bits, so it should implement Copy (by just #[derive()]ing it).
The performance overhead is a different question. If you have a type that can (and thus does) implement Copy, but you still think the type is too big (again: 24 bytes aren't!), you should design all your methods so that they accept references (it's not that easy; please read Matthieu's comment). This also includes the Add impl. And if something is to be passed to a function by reference, the programmer has to write that explicitly. That's what Rust's philosophy would say, anyway.
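For concreteness, here is a minimal sketch of the suggested approach: derive Copy (which also requires Clone) and implement Add by value.
use std::ops::Add;

#[derive(Clone, Copy)]
struct Vector3d {
    pub x: f64,
    pub y: f64,
    pub z: f64,
}

impl Add for Vector3d {
    type Output = Vector3d;

    fn add(self, other: Vector3d) -> Vector3d {
        Vector3d {
            x: self.x + other.x,
            y: self.y + other.y,
            z: self.z + other.z,
        }
    }
}

fn main() {
    let x = Vector3d { x: 1.0, y: 0.0, z: 0.0 };
    let y = Vector3d { x: -1.0, y: 0.0, z: 0.0 };
    let u = x + y; // x and y are copied, not moved...
    let v = x + u; // ...so they remain usable in later expressions
    println!("({}, {}, {})", v.x, v.y, v.z);
}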
Related
I'd like to understand, what is the correct way to write function signatures that do not need moved values, and are happy to accept a borrowed value. Consider:
fn calculate(x: &f32) -> bool {
    if *x > 5.0 {
        return true;
    }
    false
}
Given that I don't consume the value, the borrowed value is good enough for me in this function.
Now when I call this function, the situation varies: either this is the last place the value x is used:
let x = 4.0;
calculate(&x);
Or I need to use x again afterwards.
let x = 4.0;
calculate(&x);
let y = x + 5.0;
While the above approach works, it's a bit fiddly for the user of the calculate function to have to keep putting ampersands in, especially in cases like this where the number is hard-coded:
calculate(&-2.5);
This just looks absurd: I'm telling the computer to create a value, borrow it, use it, then dispose of it.
I'd expect that for sized types like floats there'd be a way to simplify this. I do understand that for dynamically sized values there might be further complications.
Ideally I'd like to be able to call the function and pass it either a borrowed value or an owned value.
What is the right way of putting this function signature together? Is there a way to accept both f32 and &f32?
Edit:
Additionally, the other situation I have is this:
let x_borrowed = &4.0;
calculate(x_borrowed);
As in, I only have the borrowed value.
From the comments made so far, it seems like the calculate function should instead accept an f32. And when calling it, I should give it either f32 or dereference an &f32. Unless there's an automatic dereferencing feature that someone can point me to?
For types that implement the Copy trait, there is no need to pass a reference: the function will not take ownership of the original value; instead, a copy is passed into the function.
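For the questioner's f32 case, that means the signature can simply take the value. A minimal sketch (keeping the same calculate logic as above):
fn calculate(x: f32) -> bool {
    x > 5.0
}

fn main() {
    let owned = 4.0_f32;
    let borrowed = &4.0_f32;
    calculate(owned);     // owned is Copy, so it is still usable afterwards
    calculate(*borrowed); // dereference an &f32 to pass a copy of the value
    calculate(-2.5);      // literals work directly, no ampersand needed
    let y = owned + 5.0;  // owned was copied into calculate, not moved
    println!("{}", y);
}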
However, with types that do not implement Copy (such as String), you can use the AsRef trait for things like this.
fn is_length_of_4(s: impl AsRef<str>) -> bool {
    s.as_ref().len() == 4
}
// These will all work
assert!(is_length_of_4("Test"));
assert!(is_length_of_4(String::from("Test")));
assert!(is_length_of_4(&String::from("Test")));
In Rust, I have a function that generates a vector of Strings, and I'd like to return this vector along with a reference to one of the strings. Obviously, I would need to appropriately specify the lifetime of the reference, since it is valid only while the vector is in scope. However, I can't get this to work.
Here is a minimal example of a failed attempt:
fn foo<'a>() -> (Vec<String>, &'a String) {
    let x = vec!["some", "data", "in", "the", "vector"]
        .iter()
        .map(|s| s.to_string())
        .collect::<Vec<String>>();
    (x, &x[1])
}
(for this example, I know I could return the index to the vector, but my general problem is more complex. Also, I'd like to understand how to achieve this)
Rust doesn't allow you to do that without unsafe code. Probably your best option is to return the vector with the index of the element in question.
This is conceptually very similar to trying to create a self-referential struct. See this for more on why this is challenging.
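A minimal sketch of the index-based workaround, using the question's example data (the main function is only for illustration):
fn foo() -> (Vec<String>, usize) {
    let x = vec!["some", "data", "in", "the", "vector"]
        .iter()
        .map(|s| s.to_string())
        .collect::<Vec<String>>();
    (x, 1) // return an index instead of a reference into the vector
}

fn main() {
    let (v, i) = foo();
    println!("{}", v[i]); // look the element up while the vector is in scope
}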
Is there a good methodology for minimizing the amount of boilerplate when using the num::Float trait and interacting with primitive types in Rust? As an example, consider a poorly written quadratic equation solver
// External libraries
use num::Float;
// Poorly written quadratic formula solver for a*x^2 + b*x + c
fn myquad<Real>(a: Real, b: Real, c: Real) -> Option<(Real, Real)>
where
    Real: Float,
{
    let mysqrt = Real::sqrt(b.powi(2) - Real::from(4.0)? * a * c);
    let r1 = (-b + mysqrt) / (Real::from(2.0)? * a);
    let r2 = (-b - mysqrt) / (Real::from(2.0)? * a);
    Some((r1, r2))
}
// Write a couple of tests
fn main() {
    let r1 = myquad::<f32>(1.0, 1.0, -6.0).unwrap();
    println!("Roots of (x-2) (x+3): ({},{})", r1.0, r1.1);
    let r2 = myquad::<f64>(6.0, 5.0, -4.0).unwrap();
    println!("Roots of (2x-1) (3x+4): ({},{})", r2.0, r2.1);
}
I would like the myquad routine to work for a variety of floating-point types beyond f32 and f64, while still working for those two as well. That said, there's a repeated set of wrappers of the form Real::from(x)?, where x is a primitive floating-point value. While I understand the need for type consistency, this is somewhat verbose, and I have concerns about the manageability of these wrappers for more complicated routines with a lot of primitives. Is there a better way to handle these conversions, or to have them work implicitly? To be sure, the answer may be no, but I'd like to understand this cost before working on more complicated routines.
The reason you are hitting this roadblock is that you are expecting num::Float to be an implemented trait. It isn't. Its purpose is to be an extension trait.
It is implemented for both f32 and f64 and allows you to use all the methods it implements on those types without implementing them in the type itself.
That, however, doesn't mean that you can magically add a T: Float bound and be out of the woods, as your operations require multiplication and subtraction. As such, your constants (as you found out by yourself) need to implement Sub<X> and Mul<X>, where X is the type you have chosen for your constants.
There is, however, a trick. If you know the type of your constants... you can require From<X> (where X is the type of your constants). This means you can, at the cost of requiring a lower bound on the size of the floats, easily fix this mess.
This lower bound requirement isn't a problem in your case as you are dependent on the powi method declared on num::Float, and this trait is only implemented for two primitive types: f32 and f64. If you ever wanted to use, say, half::f16, you'd need to get rid of the call to powi. As such, requiring f32 as a lower bound is perfectly acceptable.
fn myquad<T: Float + From<f32>>(a: T, b: T, c: T) -> Option<(T, T)> {
    let mysqrt = (b.powi(2) - a * c * (4.0.into())).sqrt();
    let r1 = (-b + mysqrt) / (a * 2.0.into());
    let r2 = (-b - mysqrt) / (a * 2.0.into());
    Some((r1, r2))
}
I think that's about as far down as you can go in terms of boilerplate.
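As a quick check (a sketch, assuming the revised myquad above is in scope), both instantiations from the question still work: f32 satisfies From<f32> via the reflexive impl, and f64 implements From<f32>.
fn main() {
    // f32 and f64 both satisfy the Float + From<f32> bound.
    let r32 = myquad::<f32>(1.0, 1.0, -6.0).unwrap();
    let r64 = myquad::<f64>(1.0, 1.0, -6.0).unwrap();
    println!("f32 roots: ({},{})", r32.0, r32.1);
    println!("f64 roots: ({},{})", r64.0, r64.1);
}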
I want to pass Iterators to a function, which then computes some value from these iterators.
I am not sure what a robust signature for such a function would look like.
Let's say I want to iterate over f64 values.
You can find the code in the playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c614429c541f337adb102c14518cf39e
My first attempt was
fn dot(a: impl std::iter::Iterator<Item = f64>, b: impl std::iter::Iterator<Item = f64>) -> f64 {
    a.zip(b).map(|(x, y)| x * y).sum()
}
This fails to compile if we try to iterate over slices, since iterating over a slice yields references (&f64) rather than owned f64 values.
So you can do
fn dot<'a>(a: impl std::iter::Iterator<Item = &'a f64>, b: impl std::iter::Iterator<Item = &'a f64>) -> f64 {
    a.zip(b).map(|(x, y)| x * y).sum()
}
This fails to compile if I try to iterate over mapped Ranges.
(Why does the compiler require the lifetime parameters here?)
So I tried to accept references and not references generically:
pub fn dot<T: Borrow<f64>, U: Borrow<f64>>(a: impl std::iter::Iterator<Item = T>, b: impl std::iter::Iterator<Item = U>) -> f64 {
    a.zip(b).map(|(x, y)| x.borrow() * y.borrow()).sum()
}
This works with all combinations I tried, but it is quite verbose and I don't really understand every aspect of it.
Are there more cases?
What would be the best practice of solving this problem?
There is no right way to write a function that can accept Iterators, but there are some general principles that we can apply to make your function general and easy to use.
Write functions that accept impl IntoIterator<...>. Because all Iterators implement IntoIterator, this is strictly more general than a function that accepts only impl Iterator<...>.
Borrow<T> is the right way to abstract over T and &T.
When trait bounds get verbose, it's often easier to read if you write them in where clauses instead of in-line.
With those in mind, here's how I would probably write dot:
use std::borrow::Borrow;

fn dot<I, J>(a: I, b: J) -> f64
where
    I: IntoIterator,
    J: IntoIterator,
    I::Item: Borrow<f64>,
    J::Item: Borrow<f64>,
{
    a.into_iter()
        .zip(b)
        .map(|(x, y)| x.borrow() * y.borrow())
        .sum()
}
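For illustration, here are a few call shapes this signature accepts (a sketch, not part of the original answer):
fn main() {
    let v = vec![1.0, 2.0, 3.0];
    let s: &[f64] = &[4.0, 5.0, 6.0];

    // Owned f64 items from a mapped range, &f64 items from a slice iterator:
    let a = dot((0..3).map(|i| i as f64), s.iter());
    // &Vec<f64> and &[f64] also work directly, because their IntoIterator
    // impls yield &f64, which implements Borrow<f64>:
    let b = dot(&v, s);
    println!("{} {}", a, b);
}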
However, I also agree with TobiP64's answer in that this level of generality may not be necessary in every case. This dot is nice because it can accept a wide range of arguments, so you can call dot(&some_vec, some_iterator) and it just works. It's optimized for readability at the call site. On the other hand, if you find the Borrow trait complicates the definition too much, there's nothing wrong with optimizing for readability at the definition, and forcing the caller to add a .iter().copied() sometimes. The only thing I would definitely change about the first dot function is to replace Iterator with IntoIterator.
You can iterate over slices with the first dot implementation like this:
dot([0.0, 1.0, 2.0].iter().cloned(), [0.0, 1.0, 2.0].iter().cloned());
(https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.cloned)
or
dot([0.0, 1.0, 2.0].iter().copied(), [0.0, 1.0, 2.0].iter().copied());
(https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.copied)
Why does the compiler require the lifetime parameters here?
As far as I know, every reference in Rust has a lifetime, but the compiler can infer it in simple cases. In this case, however, the compiler is not yet smart enough, so you need to tell it how long the references yielded by the iterator live.
Are there more cases?
You can always use iterator methods, like the solution above, to get an iterator over f64, so you don't have to deal with lifetimes or generics.
What would be the best practice of solving this problem?
I would recommend the first version (and thus leaving it to the caller to transform the iterator into an Iterator<Item = f64>), simply because it's the most readable.
I like using partial application, because it permits (among other things) splitting up a complicated function call, which is more readable.
An example of partial application:
fn add(x: i32, y: i32) -> i32 {
    x + y
}

fn main() {
    let add7 = |x| add(7, x);
    println!("{}", add7(35));
}
Is there overhead to this practice?
Here is the kind of thing I like to do (from a real code):
fn foo(n: u32, mut things: Vec<Things>) {
    let create_new_multiplier = |thing| ThingMultiplier::new(thing, n); // ThingMultiplier is an Iterator
    let new_things = things.clone().into_iter().flat_map(create_new_multiplier);
    things.extend(new_things);
}
This is purely stylistic. I do not like to nest things too deeply.
There should not be a performance difference between defining the closure before it's used versus defining and using it directly. There is a type-system difference: the compiler doesn't fully know how to infer types in a closure that isn't immediately called.
In code:
let create_new_multiplier = |thing| ThingMultiplier::new(thing, n);
things.clone().into_iter().flat_map(create_new_multiplier)
will be the exact same as
things.clone().into_iter().flat_map(|thing| {
    ThingMultiplier::new(thing, n)
})
In general, there should not be a performance cost for using closures. This is what Rust means by "zero cost abstraction": the programmer could not have written it better themselves.
The compiler converts a closure into implementations of the Fn* traits on an anonymous struct. At that point, all the normal compiler optimizations kick in. Because of techniques like monomorphization, it may even be faster. This does mean that you need to do normal profiling to see if they are a bottleneck.
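To make the "anonymous struct" idea concrete, here is a rough hand-written model of the add7 closure from the question. This is only an illustration of the concept, not literal compiler output: the real generated type implements the Fn* traits, which cannot be hand-implemented on stable Rust, so an ordinary method stands in for the call operator.
fn add(x: i32, y: i32) -> i32 {
    x + y
}

// Conceptual stand-in for the compiler-generated closure type of |x| add(7, x).
// Nothing is captured from the environment here, so the struct is zero-sized;
// a closure capturing `n` would instead hold an `n` field (or a reference to it).
struct Add7;

impl Add7 {
    fn call(&self, x: i32) -> i32 {
        add(7, x)
    }
}

fn main() {
    let add7 = Add7;
    println!("{}", add7.call(35)); // prints 42, same as the closure version
}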
In your particular example, yes, extend can get inlined as a loop, containing another loop for the flat_map which in turn just puts ThingMultiplier instances into the same stack slots holding n and thing.
But you're barking up the wrong efficiency tree here. Instead of wondering whether an allocation of a small struct holding two fields gets optimized away you should rather wonder how efficient that clone is, especially for large inputs.