Unzip iterator of references to tuples into two collections of references - reference

I have an Iterator<Item = &(T, U)> over a slice &[(T, U)]. I'd like to unzip this iterator into its components (i.e. obtain (Vec<&T>, Vec<&U>)).
Rust provides unzip functionality through the .unzip() method on Interator:
points.iter().unzip()
Unfortunately, this doesn't work as-is because .unzip() expects the type of the iterator's item to be a tuple; mine is a reference to a tuple.
To fix this, I tried to write a function which converts between a reference to a tuple and a tuple of references:
fn distribute_ref<'a, T, U>(x: &'a (T, U)) -> (&'a T, &'a U) {
(&x.0, &x.1)
}
I can then map over the resulting iterator to get something .unzip() can handle:
points.iter().map(distribute_ref).unzip()
This works now, but I this feels a bit strange. In particular, distribute_ref seems like a fairly simple operation that would be provided by the Rust standard library. I'm guessing it either is and I can't find it, or I'm not approaching this the right way.
Is there a better way to do this?

Is there a better way to do this?
"Better" is a bit subjective. You can make your function shorter:
fn distribute_ref<T, U>(x: &(T, U)) -> (&T, &U) {
(&x.0, &x.1)
}
Lifetime elision allows to omit lifetime annotations in this case.
You can use a closure to do the same thing:
points.iter().map(|&(ref a, ref b)| (a, b)).unzip()
Depending on the task it can be sufficient to clone the data. Especially in this case, as reference to u8 takes 4 or 8 times more space than u8 itself.
points().iter().cloned().unzip()

Related

Calculate object A, then return object B that references A in Rust

In my code I often want to calculate a new value A, and then return some view of that value B, because B is a type that's more convenient to work with. The simplest case is where A is a vector and B is a slice that I would like to return. Let's say I want to write a function that returns a set of indices. Ideally this would return a slice directly because then I can use it immediately to index a string.
If I return a vector instead of a slice, I have to use to_slice:
fn all_except(except: usize, max:usize) -> Vec<usize> {
(0..except).chain((except + 1)..max).collect()
}
"abcdefg"[all_except(1, 7)]
string indices are ranges of `usize`
the type `str` cannot be indexed by `Vec<usize>`
help: the trait `SliceIndex<str>` is not implemented for `Vec<usize>`
I can't return a slice directly:
fn all_except(except: usize, max:usize) -> &[usize] {
(0..except).chain((except + 1)..max).collect()
}
"abcdefg"[all_except(1, 7)]
^ expected named lifetime parameter
missing lifetime specifier
help: this function's return type contains a borrowed value with an elided lifetime, but the lifetime cannot be derived from the arguments
help: consider using the `'static` lifetime
I can't even return the underlying vector and a slice of it, for the same reason
pub fn except(index: usize, max: usize) -> (&[usize], Vec<usize>) {
let v = (0..index).chain((index + 1)..max).collect();
(v, v.as_slice)
}
"abcdefg"[all_except(1, 7)[1]
Now it may be possible to hack this particular example using deref coercion (I'm not sure), but I have encountered this problem with more complex types. For example, I have a function that loads an ndarray::Array2<T> from CSV file, then want to split it into two parts using array.split_at(), but this returns two ArrayView2<T> which reference the original Array2<T>, so I encounter the same issue. In general I'm wondering if there's a solution to this problem in general. Can I somehow tell the compiler to move A into the parent frame's scope, or let me return a tuple of (A, B), where it realises that the slice is still valid because A is still alive?
Your code doesn't seem to make any sense, you can't index a string using a slice. If you could the first snippet would have worked with just an as_slice in the caller or something, vecs trivially coerce to slices. That's exactly what the compiler error is telling you: the compiler is looking for a SliceIndex and a Vec (or slice) is definitely not that.
That aside,
Can I somehow tell the compiler to move A into the parent frame's scope, or let me return a tuple of (A, B), where it realises that the slice is still valid because A is still alive?
There are packages like owning_ref which can bundle owner and reference to avoid extra allocations. It tends to be somewhat fiddly.
I don't think there's any other general solution, because Rust reasons at the function level, the type checker has no notion of "tell the compiler to move A into the parent scope". So you need a construct which works around borrow checker.

Why does Vec<T> expect &T as the argument to binary_search?

This question is predicated on the assumption that a friendly/ergonomic API in rust should prefer a reference to a type Q where T: Borrow<Q> instead of expecting &T directly. In my experience working with the API of other collection types like HashMap, this definitely seems to be the case. That said...
How come the binary_search method on Vec<T> is not defined that way? Currently, on stable, the binary_search implementation is as follows:
pub fn binary_search(&self, x: &T) -> Result<usize, usize>
where
T: Ord,
{
self.binary_search_by(|p| p.cmp(x))
}
It seems like the following would be a better implementation:
pub fn binary_search_modified<Q>(&self, x: &Q) -> Result<usize, usize>
where
T: Borrow<Q>,
Q: Ord + ?Sized,
{
self.binary_search_by(|p| p.borrow().cmp(x))
}
A comparison of the two APIs above:
let mut v: Vec<String> = Vec::new();
v.push("A".into());
v.push("B".into());
v.push("D".into());
let _ = v.binary_search("C"); // Compilation error!
let _ = v.binary_search(&String::from("C")); // Fine allocate and convert it to the exact type, I guess
let _ = v.binary_search_modified("C"); // Far nicer API, does the same thing
let _ = v.binary_search_modified(&String::from("C")); // Backwards compatible
As a more general question, what considerations go into deciding if a method should accept &T or &Q ... where T: Borrow<Q>?
You are right that binary_search() and a few other methods like contains() could be generalized to accept any type that can be borrowed as T, but unfortunately Rust 1.0 was released with the less general signature. And while it looks like using Borrow is strictly more general, attempts to implement that change broke type inference in too many cases.
There are countless Github issues, PRs and forum discussion on this topic. If you want to follow the history of the attempts to fix this, I suggest starting at the PR finally reverting the attempts to make binary_search() more general and working your way backwards.
Regarding your more general question, my advice would be the same as for any API design question: think about the use cases. Using an additional type parameter makes the code somewhat more complex, and the documentation and compiler errors become less obvious. For methods on a trait, the type parameter will render the trait unusuable for trait objects. So if you can think of convincing use cases for the more general version using Borrow, go for it, but in the absence of a convincing use case it's better to avoid it.

How to properly pass Iterators to a function in Rust

I want to pass Iterators to a function, which then computes some value from these iterators.
I am not sure how a robust signature to such a function would look like.
Lets say I want to iterate f64.
You can find the code in the playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c614429c541f337adb102c14518cf39e
My first attempt was
fn dot(a : impl std::iter::Iterator<Item = f64>,b : impl std::iter::Iterator<Item = f64>) -> f64 {
a.zip(b).map(|(x,y)| x*y).sum()
}
This fails to compile if we try to iterate over slices
So you can do
fn dot<'a>(a : impl std::iter::Iterator<Item = &'a f64>,b : impl std::iter::Iterator<Item = &'a f64>) -> f64 {
a.zip(b).map(|(x,y)| x*y).sum()
}
This fails to compile if I try to iterate over mapped Ranges.
(Why does the compiler requires the livetime parameters here?)
So I tried to accept references and not references generically:
pub fn dot<T : Borrow<f64>, U : Borrow<f64>>(a : impl std::iter::Iterator::<Item = T>, b: impl std::iter::Iterator::<Item = U>) -> f64 {
a.zip(b).map(|(x,y)| x.borrow()*y.borrow()).sum()
}
This works with all combinations I tried, but it is quite verbose and I don't really understand every aspect of it.
Are there more cases?
What would be the best practice of solving this problem?
There is no right way to write a function that can accept Iterators, but there are some general principles that we can apply to make your function general and easy to use.
Write functions that accept impl IntoIterator<...>. Because all Iterators implement IntoIterator, this is strictly more general than a function that accepts only impl Iterator<...>.
Borrow<T> is the right way to abstract over T and &T.
When trait bounds get verbose, it's often easier to read if you write them in where clauses instead of in-line.
With those in mind, here's how I would probably write dot:
fn dot<I, J>(a: I, b: J) -> f64
where
I: IntoIterator,
J: IntoIterator,
I::Item: Borrow<f64>,
J::Item: Borrow<f64>,
{
a.into_iter()
.zip(b)
.map(|(x, y)| x.borrow() * y.borrow())
.sum()
}
However, I also agree with TobiP64's answer in that this level of generality may not be necessary in every case. This dot is nice because it can accept a wide range of arguments, so you can call dot(&some_vec, some_iterator) and it just works. It's optimized for readability at the call site. On the other hand, if you find the Borrow trait complicates the definition too much, there's nothing wrong with optimizing for readability at the definition, and forcing the caller to add a .iter().copied() sometimes. The only thing I would definitely change about the first dot function is to replace Iterator with IntoIterator.
You can iterate over slices with the first dot implementation like that:
dot([0, 1, 2].iter().cloned(), [0, 1, 2].iter().cloned());
(https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.cloned)
or
dot([0, 1, 2].iter().copied(), [0, 1, 2].iter().copied());
(https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.copied)
Why does the compiler requires the livetime parameters here?
As far as I know every reference in rust has a lifetime, but the compiler can infer simple it in cases. In this case, however the compiler is not yet smart enough, so you need to tell it how long the references yielded by the iterator lives.
Are there more cases?
You can always use iterator methods, like the solution above, to get an iterator over f64, so you don't have to deal with lifetimes or generics.
What would be the best practice of solving this problem?
I would recommend the first version (and thus leaving it to the caller to transform the iterator to Iterator<f64>), simply because it's the most readable.

Why does the argument for the find closure need two ampersands?

I have been playing with Rust by porting my Score4 AI engine to it - basing the work on my functional-style implementation in OCaml. I specifically wanted to see how Rust fares with functional-style code.
The end result: It works, and it's very fast - much faster than OCaml. It almost touches the speed of imperative-style C/C++ - which is really cool.
There's a thing that troubles me, though — why do I need two ampersands in the last line of this code?
let moves_and_scores: Vec<_> = moves_and_boards
.iter()
.map(|&(column,board)| (column, score_board(&board)))
.collect();
let target_score = if maximize_or_minimize {
ORANGE_WINS
} else {
YELLOW_WINS
};
if let Some(killer_move) = moves_and_scores.iter()
.find(|& &(_,score)| score==target_score) {
...
I added them is because the compiler errors "guided" me to it; but I am trying to understand why... I used the trick mentioned elsewhere in Stack Overflow to "ask" the compiler to tell me what type something is:
let moves_and_scores: Vec<_> = moves_and_boards
.iter()
.map(|&(column,board)| (column, score_board(&board)))
.collect();
let () = moves_and_scores;
...which caused this error:
src/main.rs:108:9: 108:11 error: mismatched types:
expected `collections::vec::Vec<(u32, i32)>`,
found `()`
(expected struct `collections::vec::Vec`,
found ()) [E0308]
src/main.rs:108 let () = moves_and_scores;
...as I expected, moves_and_scores is a vector of tuples: Vec<(u32, i32)>. But then, in the immediate next line, iter() and find() force me to use the hideous double ampersands in the closure parameter:
if let Some(killer_move) = moves_and_scores.iter()
.find(|& &(_,score)| score==target_score) {
Why does the find closure need two ampersands? I could see why it may need one (pass the tuple by reference to save time/space) but why two? Is it because of the iter? That is, is the iter creating references, and then find expects a reference on each input, so a reference on a reference?
If this is so, isn't this, arguably, a rather ugly design flaw in Rust?
In fact, I would expect find and map and all the rest of the functional primitives to be parts of the collections themselves. Forcing me to iter() to do any kind of functional-style work seems burdensome, and even more so if it forces this kind of "double ampersands" in every possible functional chain.
I am hoping I am missing something obvious - any help/clarification most welcome.
This here
moves_and_scores.iter()
gives you an iterator over borrowed vector elements. If you follow the API doc what type this is, you'll notice that it's just the iterator for a borrowed slice and this implements Iterator with Item=&T where T is (u32, i32) in your case.
Then, you use find which takes a predicate which takes a &Item as parameter. Sice Item already is a reference in your case, the predicate has to take a &&(u32, i32).
pub trait Iterator {
...
fn find<P>(&mut self, predicate: P) -> Option<Self::Item>
where P: FnMut(&Self::Item) -> bool {...}
... ^
It was probably defined like this because it's only supposed to inspect the item and return a bool. This does not require the item being passed by value.
If you want an iterator over (u32, i32) you could write
moves_and_scores.iter().cloned()
cloned() converts the iterator from one with an Item type &T to one with an Item type T if T is Clone. Another way to do it would be to use into_iter() instead of iter().
moves_and_scores.into_iter()
The difference between the two is that the first option clones the borrowed elements while the 2nd one consumes the vector and moves the elements out of it.
By writing the lambda like this
|&&(_, score)| score == target_score
you destructure the "double reference" and create a local copy of the i32. This is allowed since i32 is a simple type that is Copy.
Instead of destructuring the parameter of your predicate you could also write
|move_and_score| move_and_score.1 == target_score
because the dot operator automatically dereferences as many times as needed.

How to create a non consuming iterator from a Vector

Situation:
I have a situation where I would like to call some method defined on the Iterator trait on a function parameter. The function that I would like to call it is taking a parameter of a type which is a trait called VecLike. The function is called get_all_matching_rules.
get_all_matching_rules can receive either a Vec or another similar home made type which also implements Iterator. Of course both of these implement VecLike. I was thinking of adding a function on VecLike to have it return an Iterator so that I could use it in get_all_matching_rules.
If my parameter is named: matching_rules I could then do matching_rules.iter().filter(.
Question:
How do I return a non consuming iterator from a Vec?
I'd like to be able to return a non consuming iterator on a Vec<T> of type Iterator<T>. I am not looking to iterate the items by calling .iter().
If I have (where self is a Vec):
fn iter<'a>(&'a self) -> Iterator<T> {
self.iter()
}
I get the following error:
error: mismatched types: expected `core::iter::Iterator<T>+'a`, found `core::slice::Items<'_,T>` (expected trait core::iter::Iterator, found struct core::slice::Items)
I would like to return the Iterator<t>. If there is a better way to go at this rather than returning an Iterator, I'm all ears.
.iter() on [T], which Vec<T> automatically dereferences to, takes self by reference and produces a type implementing Iterator<&T>. Note that the return type is not Iterator<&T>; Iterator is a trait which is implemented by concrete types, and the concrete type Items<T> is the return type in that case, not Iterator<&T>. There is not currently any syntax for specifying a return type merely as a trait that is implemented by it, though the syntax impl Iterator<&T> has been suggested.
Now you wish something implementing Iterator<T> rather than Iterator<&T>. Under Rust’s memory model where each object is owned by exactly one thing, this is not possible with the same objects; there must be some constraint to allow you to get a new T from the &T. There are two readily provided solutions for this:
The Copy trait, for types that can just be copied bitwise. Given a variable of a type implementing Iterator<&T> where T is Copy, this can be written .map(|&x| x) or .map(|x| *x) (the two are equivalent).
The Clone trait, for any types where the operation can be caused to make sense, regardless of Copy bounds. Given a variable of a type implementing Iterator<&T> where T is Clone, this can be written .map(|x| x.clone()).
Thus, given a vector v, v.iter().map(|x| x.clone()). Generically, something like this:
fn iter<T: Clone>(slice: &[T]) -> Map<&T, T, Items<T>> {
slice.iter().map(|x| x.clone())
}
I'm not sure what you're asking here.
.iter() creates an iterator (Items) which does not move the Vec (You'll get an Iterator over &T).
Filter (and most other iterator adapters) are lazy. Perhaps you should chain() the two iterators before filtering them?
Otherwise, if you don't want the Filter to be consumed, clone it.

Resources