How to properly pass Iterators to a function in Rust

I want to pass Iterators to a function, which then computes some value from these iterators.
I am not sure what a robust signature for such a function would look like.
Let's say I want to iterate over f64 values.
You can find the code in the playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c614429c541f337adb102c14518cf39e
My first attempt was
fn dot(a : impl std::iter::Iterator<Item = f64>,b : impl std::iter::Iterator<Item = f64>) -> f64 {
a.zip(b).map(|(x,y)| x*y).sum()
}
This fails to compile if we try to iterate over slices, because a slice iterator yields &f64 rather than f64.
So you can do
fn dot<'a>(a : impl std::iter::Iterator<Item = &'a f64>,b : impl std::iter::Iterator<Item = &'a f64>) -> f64 {
a.zip(b).map(|(x,y)| x*y).sum()
}
This fails to compile if I try to iterate over mapped Ranges.
(Why does the compiler require the lifetime parameters here?)
So I tried to accept references and not references generically:
pub fn dot<T : Borrow<f64>, U : Borrow<f64>>(a : impl std::iter::Iterator::<Item = T>, b: impl std::iter::Iterator::<Item = U>) -> f64 {
a.zip(b).map(|(x,y)| x.borrow()*y.borrow()).sum()
}
This works with all combinations I tried, but it is quite verbose and I don't really understand every aspect of it.
Are there more cases?
What would be the best practice of solving this problem?

There is no single right way to write a function that can accept Iterators, but there are some general principles that we can apply to make your function general and easy to use.
Write functions that accept impl IntoIterator<...>. Because all Iterators implement IntoIterator, this is strictly more general than a function that accepts only impl Iterator<...>.
Borrow<T> is the right way to abstract over T and &T.
When trait bounds get verbose, it's often easier to read if you write them in where clauses instead of in-line.
With those in mind, here's how I would probably write dot:
use std::borrow::Borrow;
fn dot<I, J>(a: I, b: J) -> f64
where
    I: IntoIterator,
    J: IntoIterator,
    I::Item: Borrow<f64>,
    J::Item: Borrow<f64>,
{
    a.into_iter()
        .zip(b)
        .map(|(x, y)| x.borrow() * y.borrow())
        .sum()
}
However, I also agree with TobiP64's answer in that this level of generality may not be necessary in every case. This dot is nice because it can accept a wide range of arguments, so you can call dot(&some_vec, some_iterator) and it just works. It's optimized for readability at the call site. On the other hand, if you find the Borrow trait complicates the definition too much, there's nothing wrong with optimizing for readability at the definition, and forcing the caller to add a .iter().copied() sometimes. The only thing I would definitely change about the first dot function is to replace Iterator with IntoIterator.
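To make the "it just works" claim concrete, here is a minimal sketch that repeats the Borrow-based dot and calls it with a few argument combinations. The helper dot_examples and its sample values are made up for illustration and are not part of the original answer.

```rust
use std::borrow::Borrow;

// Same shape as the `dot` above: each argument is anything iterable whose
// items can be borrowed as f64, so both f64 and &f64 items are accepted.
fn dot<I, J>(a: I, b: J) -> f64
where
    I: IntoIterator,
    J: IntoIterator,
    I::Item: Borrow<f64>,
    J::Item: Borrow<f64>,
{
    a.into_iter()
        .zip(b)
        .map(|(x, y)| x.borrow() * y.borrow())
        .sum()
}

// Hypothetical helper that exercises three argument combinations.
fn dot_examples() -> (f64, f64, f64) {
    let v: Vec<f64> = vec![1.0, 2.0, 3.0];
    let s: [f64; 3] = [4.0, 5.0, 6.0];
    // Both sides yield &f64 (a borrowed Vec and a borrowed array).
    let by_ref = dot(&v, &s);
    // Left side yields &f64, right side yields owned f64 from a mapped range.
    let mixed = dot(&v, (1..=3).map(|i| i as f64));
    // Both sides yield owned f64; v and s are consumed here.
    let by_value = dot(v, s);
    (by_ref, mixed, by_value)
}
```

The caller never has to write .iter().copied(); the cost is the extra Borrow bounds in the signature.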

You can iterate over slices with the first dot implementation like this:
dot([0.0, 1.0, 2.0].iter().cloned(), [0.0, 1.0, 2.0].iter().cloned());
(https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.cloned)
or
dot([0.0, 1.0, 2.0].iter().copied(), [0.0, 1.0, 2.0].iter().copied());
(https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.copied)
Why does the compiler require the lifetime parameters here?
As far as I know, every reference in Rust has a lifetime, but the compiler can infer it in simple cases. In this case, however, the compiler is not yet smart enough, so you need to tell it how long the references yielded by the iterator live.
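As a small sketch of what the named lifetime does, here is the reference-taking signature from the question under the made-up name dot_refs. As far as I know, stable Rust currently rejects an unnamed &f64 in this position, which is why 'a is spelled out.

```rust
// 'a names "however long the yielded references are valid". It is written
// explicitly because (to my knowledge) lifetime elision is not applied to a
// reference that appears inside an `impl Trait` bound like this one.
fn dot_refs<'a>(
    a: impl Iterator<Item = &'a f64>,
    b: impl Iterator<Item = &'a f64>,
) -> f64 {
    // &f64 * &f64 is implemented in std and produces an f64.
    a.zip(b).map(|(x, y)| x * y).sum()
}
```

A call such as dot_refs(xs.iter(), ys.iter()) on two slices then compiles, while an iterator of owned f64 values is still rejected.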
Are there more cases?
You can always use iterator methods, like the solution above, to get an iterator over f64, so you don't have to deal with lifetimes or generics.
What would be the best practice of solving this problem?
I would recommend the first version (and thus leaving it to the caller to transform the iterator into an Iterator<Item = f64>), simply because it's the most readable.
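A minimal sketch of that recommendation, combined with the IntoIterator suggestion from the other answer. The caller dot_slice_with_range is a hypothetical example of where the .iter().copied() adaptation ends up.

```rust
// First version from the question, with Iterator relaxed to IntoIterator.
fn dot(a: impl IntoIterator<Item = f64>, b: impl IntoIterator<Item = f64>) -> f64 {
    a.into_iter().zip(b).map(|(x, y)| x * y).sum()
}

// Hypothetical caller: the slice side is adapted with .iter().copied(),
// the mapped range already yields owned f64 values.
fn dot_slice_with_range(xs: &[f64]) -> f64 {
    dot(xs.iter().copied(), (0..xs.len()).map(|i| i as f64))
}
```

The definition stays free of lifetimes and extra trait bounds; the conversion is visible at each call site instead.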

Related

Why does impl RangeBounds<T> for Range<&T> require T to be Sized?

https://doc.rust-lang.org/src/core/ops/range.rs.html#979-986
impl<T> RangeBounds<T> for Range<&T> {
fn start_bound(&self) -> Bound<&T> {
Included(self.start)
}
fn end_bound(&self) -> Bound<&T> {
Excluded(self.end)
}
}
As you can see, T is not marked with ?Sized, which prevents me from passing a Range<&[u8]> to an argument that requires impl RangeBounds<[u8]>.
Are there some design considerations behind it? If this is intended, which is the proper way to pass a range of [u8]?
It's rather unfortunate, but adding the T: ?Sized bound breaks type inference in code like btreemap.range("from".."to") because the compiler can't choose between T = str and T = &str, both of which satisfy the bounds on range.
Some discussion of this has taken place on PR #64327. As far as I know there are no plans to relax the bounds on these impls in the future.
Instead of a Range<T>, you can use a tuple of Bound<T>s; for example, instead of takes_range("from".."to"), you can write the following:
use std::ops::Bound;
takes_range((Bound::Included("from"), Bound::Excluded("to")));
This will work because (Bound<&T>, Bound<&T>) does implement RangeBounds<T> even when T is !Sized.
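For the [u8] case from the question, a sketch might look like the following. The function values_in, the helper demo, and the map contents are hypothetical; only BTreeMap::range and the tuple-of-Bound impl come from the standard library.

```rust
use std::collections::BTreeMap;
use std::ops::{Bound, RangeBounds};

// Hypothetical function with the bound from the question: any range over [u8].
fn values_in<R>(map: &BTreeMap<Vec<u8>, u32>, range: R) -> Vec<u32>
where
    R: RangeBounds<[u8]>,
{
    // Vec<u8>: Borrow<[u8]>, so the map can be queried with unsized [u8] keys.
    map.range::<[u8], R>(range).map(|(_, v)| *v).collect()
}

// Hypothetical caller passing a (Bound<&[u8]>, Bound<&[u8]>) tuple,
// which stands in for the half-open range "a".."c".
fn demo() -> Vec<u32> {
    let mut map = BTreeMap::new();
    map.insert(b"a".to_vec(), 1);
    map.insert(b"b".to_vec(), 2);
    map.insert(b"c".to_vec(), 3);
    let from: &[u8] = b"a";
    let to: &[u8] = b"c";
    values_in(&map, (Bound::Included(from), Bound::Excluded(to)))
}
```

The tuple is more verbose than from..to, but it is accepted where a Range of &[u8] is not.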

Why does Vec<T> expect &T as the argument to binary_search?

This question is predicated on the assumption that a friendly/ergonomic API in Rust should prefer a reference to a type Q where T: Borrow<Q> instead of expecting &T directly. In my experience working with the API of other collection types like HashMap, this definitely seems to be the case. That said...
How come the binary_search method on Vec<T> is not defined that way? Currently, on stable, the binary_search implementation is as follows:
pub fn binary_search(&self, x: &T) -> Result<usize, usize>
where
T: Ord,
{
self.binary_search_by(|p| p.cmp(x))
}
It seems like the following would be a better implementation:
pub fn binary_search_modified<Q>(&self, x: &Q) -> Result<usize, usize>
where
T: Borrow<Q>,
Q: Ord + ?Sized,
{
self.binary_search_by(|p| p.borrow().cmp(x))
}
A comparison of the two APIs above:
let mut v: Vec<String> = Vec::new();
v.push("A".into());
v.push("B".into());
v.push("D".into());
let _ = v.binary_search("C"); // Compilation error!
let _ = v.binary_search(&String::from("C")); // Fine: allocate and convert it to the exact type, I guess
let _ = v.binary_search_modified("C"); // Far nicer API, does the same thing
let _ = v.binary_search_modified(&String::from("C")); // Backwards compatible
As a more general question, what considerations go into deciding if a method should accept &T or &Q ... where T: Borrow<Q>?
You are right that binary_search() and a few other methods like contains() could be generalized to accept a reference to any type Q that T can be borrowed as, but unfortunately Rust 1.0 was released with the less general signature. And while using Borrow looks strictly more general, attempts to make that change broke type inference in too many cases.
There are countless GitHub issues, PRs and forum discussions on this topic. If you want to follow the history of the attempts to fix this, I suggest starting at the PR that finally reverted the attempts to make binary_search() more general and working your way backwards.
Regarding your more general question, my advice would be the same as for any API design question: think about the use cases. Using an additional type parameter makes the code somewhat more complex, and the documentation and compiler errors become less obvious. For methods on a trait, the type parameter will render the trait unusable for trait objects. So if you can think of convincing use cases for the more general version using Borrow, go for it, but in the absence of a convincing use case it's better to avoid it.
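If you want the generalized behaviour in your own code today, one option is a free function. The sketch below uses the made-up name binary_search_borrowed and is not a standard library API; it simply wraps binary_search_by.

```rust
use std::borrow::Borrow;

// Free-function sketch of the generalized search proposed in the question.
// `slice` must already be sorted consistently with the ordering of Q.
fn binary_search_borrowed<T, Q>(slice: &[T], x: &Q) -> Result<usize, usize>
where
    T: Borrow<Q>,
    Q: Ord + ?Sized,
{
    // Borrow each element as &Q, spelled out fully so inference has nothing
    // to guess, then compare it with the probe.
    slice.binary_search_by(|p| <T as Borrow<Q>>::borrow(p).cmp(x))
}
```

With a Vec<String> this accepts both "C" and &String::from("C"). For a one-off lookup, v.binary_search_by(|p| p.as_str().cmp("C")) does the same without a helper.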

Chain Vector and IntoIterator element in Rust

I want to write a function in Rust that returns a vector composed of the start integer, then all intermediate integers, and then the end integer. The assertion that should hold is this:
assert_eq!(intervals(0, 4, 1..4), vec![0, 1, 2, 3, 4]);
The hint is to use the chain method for iterators. The function declaration is predefined. I implemented it in one way, which is the following code:
pub fn intervals<I>(start: u32, end: u32, intermediate: I) -> Vec<u32>
where
    I: IntoIterator<Item = u32>,
{
    let mut a1 = vec![];
    a1.push(start);
    let inter: Vec<u32> = intermediate.into_iter().collect();
    let mut iter: Vec<u32> = a1.iter().chain(inter.iter()).map(|x| *x).collect();
    iter.push(end);
    return iter;
}
But I am quite convinced this is not really the optimal way to do this. I am sure I am doing lots of unnecessary things in the middle two lines. I tried to use intermediate directly like this:
let mut iter: Vec<u32> = a1.iter().chain(intermediate).map(|x| *x).collect();
But I am getting this error for chain method and I don't know how to solve it:
type mismatch resolving <I as std::iter::IntoIterator>::Item==&u32,
expected u32, found &u32
I am super new to Rust, so any advice would be helpful to understand the right way to use the intermediate parameter here.
Here are a few hints:
You have created three separate vectors (one explicitly, two using collect) when in fact you only need one.
You can use the std::iter::once iterator to produce iterators for the start and end integers
No need to collect the intermediate range. The intermediate argument implements IntoIterator, so you can feed it directly to chain. So, you can chain together the start, intermediate and end.
No need to use the 'return' keyword at the end of a function - the result of a function is the value of the last expression in it (as long as there is no semicolon on the end).
Applying those tips your function would look like this:
use std::iter::once;
pub fn intervals<I>(start: u32, end: u32, intermediate: I) -> Vec<u32>
where
    I: IntoIterator<Item = u32>,
{
    once(start).chain(intermediate).chain(once(end)).collect()
}
One additional thing to note, to answer your question from the comments:
why trying this: a1.iter().chain(intermediate) gives an error with chain method
Calling Vec::iter() returns an iterator that returns references to the values in the vector. This makes sense: calling iter() does not consume the vector, and its contents remain intact: you could iterate over it multiple times if you wanted.
On the other hand, invoking into_iter() from the IntoIterator trait returns an iterator that returns the values. This also makes sense: into_iter() does consume the object you are calling it on, so the iterator then takes ownership of the items that were previously owned by the object.
Trying to chain together two such iterators does not work because they are each iterating different types. One resolution would be to consume a1 as well, like this:
let mut iter : Vec<u32> = a1.into_iter().chain(intermediate).collect();
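Another possible resolution, sketched here as an assumption rather than as part of the original answer, is to keep a1.iter() and copy its items so both sides of the chain yield u32:

```rust
// Variant that does not consume a1: iter() yields &u32, so the items are
// copied to u32 before chaining with an iterator that yields u32.
pub fn intervals<I>(start: u32, end: u32, intermediate: I) -> Vec<u32>
where
    I: IntoIterator<Item = u32>,
{
    let a1 = vec![start];
    let mut result: Vec<u32> = a1.iter().copied().chain(intermediate).collect();
    result.push(end);
    result
}
```

This still builds a throwaway vector for the start value, so the once-based version above remains the tidier solution.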

Unzip iterator of references to tuples into two collections of references

I have an Iterator<Item = &(T, U)> over a slice &[(T, U)]. I'd like to unzip this iterator into its components (i.e. obtain (Vec<&T>, Vec<&U>)).
Rust provides unzip functionality through the .unzip() method on Iterator:
points.iter().unzip()
Unfortunately, this doesn't work as-is because .unzip() expects the type of the iterator's item to be a tuple; mine is a reference to a tuple.
To fix this, I tried to write a function which converts between a reference to a tuple and a tuple of references:
fn distribute_ref<'a, T, U>(x: &'a (T, U)) -> (&'a T, &'a U) {
(&x.0, &x.1)
}
I can then map over the resulting iterator to get something .unzip() can handle:
points.iter().map(distribute_ref).unzip()
This works now, but this feels a bit strange. In particular, distribute_ref seems like a fairly simple operation that would be provided by the Rust standard library. I'm guessing it either is and I can't find it, or I'm not approaching this the right way.
Is there a better way to do this?
Is there a better way to do this?
"Better" is a bit subjective. You can make your function shorter:
fn distribute_ref<T, U>(x: &(T, U)) -> (&T, &U) {
(&x.0, &x.1)
}
Lifetime elision allows you to omit the lifetime annotations in this case.
You can use a closure to do the same thing:
points.iter().map(|&(ref a, ref b)| (a, b)).unzip()
Depending on the task it can be sufficient to clone the data. Especially in this case, as reference to u8 takes 4 or 8 times more space than u8 itself.
points.iter().cloned().unzip()
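Put together as complete functions, the two approaches might look like this. The names unzip_refs and unzip_copied are made up, and the closure relies on default binding modes, so |(a, b)| on a &(T, U) item binds a and b as references.

```rust
// Split a slice of pairs into two vectors of references.
fn unzip_refs<T, U>(points: &[(T, U)]) -> (Vec<&T>, Vec<&U>) {
    // Each item is &(T, U); the pattern (a, b) binds a: &T and b: &U.
    points.iter().map(|(a, b)| (a, b)).unzip()
}

// Copying variant for cheap Copy types such as u8.
fn unzip_copied(points: &[(u8, u8)]) -> (Vec<u8>, Vec<u8>) {
    points.iter().copied().unzip()
}
```

The return type annotation is what tells unzip which collections to build, so no turbofish is needed at the call.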

How do I implement the Add trait for a reference to a struct?

I made a two element Vector struct and I want to overload the + operator.
I made all my functions and methods take references, rather than values, and I want the + operator to work the same way.
impl Add for Vector {
fn add(&self, other: &Vector) -> Vector {
Vector {
x: self.x + other.x,
y: self.y + other.y,
}
}
}
Depending on which variation I try, I either get lifetime problems or type mismatches. Specifically, the &self argument seems to not get treated as the right type.
I have seen examples with template arguments on impl as well as Add, but they just result in different errors.
I found How can an operator be overloaded for different RHS types and return values? but the code in the answer doesn't work even if I put a use std::ops::Mul; at the top.
I am using rustc 1.0.0-nightly (ed530d7a3 2015-01-16 22:41:16 +0000)
I won't accept "you only have two fields, why use a reference" as an answer; what if I wanted a 100 element struct? I will accept an answer that demonstrates that even with a large struct I should be passing by value, if that is the case (I don't think it is, though.) I am interested in knowing a good rule of thumb for struct size and passing by value vs reference, but that is not the current question.
You need to implement Add on &Vector rather than on Vector.
impl<'a, 'b> Add<&'b Vector> for &'a Vector {
type Output = Vector;
fn add(self, other: &'b Vector) -> Vector {
Vector {
x: self.x + other.x,
y: self.y + other.y,
}
}
}
In its definition, Add::add always takes self by value. But references are types like any other [1], so they can implement traits too. When a trait is implemented on a reference type, the type of self is a reference; the reference is passed by value. Normally, passing by value in Rust implies transferring ownership, but when references are passed by value, they're simply copied (or reborrowed/moved if it's a mutable reference), and that doesn't transfer ownership of the referent (because a reference doesn't own its referent in the first place). Considering all this, it makes sense for Add::add (and many other operators) to take self by value: if you need to take ownership of the operands, you can implement Add on structs/enums directly, and if you don't, you can implement Add on references.
Here, self is of type &'a Vector, because that's the type we're implementing Add on.
Note that I also specified the RHS type parameter with a different lifetime to emphasize the fact that the lifetimes of the two input parameters are unrelated.
[1] Actually, reference types are special in that you can implement traits for references to types defined in your crate (i.e. if you're allowed to implement a trait for T, then you're also allowed to implement it for &T). &mut T and Box<T> have the same behavior, but that's not true in general for U<T> where U is not defined in the same crate.
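Here is the impl as a complete sketch. The struct definition is an assumption (the question does not show it, so the f64 fields are a guess), and sum_twice is a hypothetical helper showing that the operands stay usable after the addition.

```rust
use std::ops::Add;

// Hypothetical two-field struct standing in for the question's Vector;
// the f64 field type is an assumption.
#[derive(Debug, PartialEq)]
struct Vector {
    x: f64,
    y: f64,
}

impl<'a, 'b> Add<&'b Vector> for &'a Vector {
    type Output = Vector;

    // self is a &'a Vector here: the reference is what gets passed by value.
    fn add(self, other: &'b Vector) -> Vector {
        Vector {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

// Hypothetical helper: b is used twice because adding only borrows it.
fn sum_twice(a: &Vector, b: &Vector) -> Vector {
    let once = a + b;
    &once + b
}
```

Neither a nor b is moved by the + operator, which is the behaviour the question asked for.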
If you want to support all scenarios, you must support all the combinations:
&T op U
T op &U
&T op &U
T op U
In the Rust standard library itself, this is done through an internal macro.
Luckily, there is a Rust crate, impl_ops, that also offers a macro to write that boilerplate for us: the crate offers the impl_op_ex! macro, which generates all the combinations.
Here is their sample:
#[macro_use] extern crate impl_ops;
use std::ops;
impl_op_ex!(+ |a: &DonkeyKong, b: &DonkeyKong| -> i32 { a.bananas + b.bananas });
fn main() {
let total_bananas = &DonkeyKong::new(2) + &DonkeyKong::new(4);
assert_eq!(6, total_bananas);
let total_bananas = &DonkeyKong::new(2) + DonkeyKong::new(4);
assert_eq!(6, total_bananas);
let total_bananas = DonkeyKong::new(2) + &DonkeyKong::new(4);
assert_eq!(6, total_bananas);
let total_bananas = DonkeyKong::new(2) + DonkeyKong::new(4);
assert_eq!(6, total_bananas);
}
Even better, they have an impl_op_ex_commutative! macro that will also generate the operators with the parameters reversed if your operator happens to be commutative.
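For comparison, this is roughly the boilerplate such a macro saves: a hand-written sketch of the four combinations for a hypothetical Vector struct (the fields and derives are assumptions). Only the reference-reference impl does real work; the other three forward to it.

```rust
use std::ops::Add;

// Hypothetical struct; Clone is derived only so examples can reuse values.
#[derive(Debug, Clone, PartialEq)]
struct Vector {
    x: f64,
    y: f64,
}

// &T + &T: the one impl that contains the actual arithmetic.
impl<'a, 'b> Add<&'b Vector> for &'a Vector {
    type Output = Vector;
    fn add(self, rhs: &'b Vector) -> Vector {
        Vector { x: self.x + rhs.x, y: self.y + rhs.y }
    }
}

// T + T: borrow both operands and forward.
impl Add<Vector> for Vector {
    type Output = Vector;
    fn add(self, rhs: Vector) -> Vector {
        &self + &rhs
    }
}

// T + &T: borrow the left operand and forward.
impl<'b> Add<&'b Vector> for Vector {
    type Output = Vector;
    fn add(self, rhs: &'b Vector) -> Vector {
        &self + rhs
    }
}

// &T + T: borrow the right operand and forward.
impl<'a> Add<Vector> for &'a Vector {
    type Output = Vector;
    fn add(self, rhs: Vector) -> Vector {
        self + &rhs
    }
}
```

Every operator you want to support needs the same four impls, which is why a macro pays off quickly.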
