How to create a non consuming iterator from a Vector - rust

Situation:
I have a situation where I would like to call some method defined on the Iterator trait on a function parameter. The function that I would like to call it is taking a parameter of a type which is a trait called VecLike. The function is called get_all_matching_rules.
get_all_matching_rules can receive either a Vec or another similar home made type which also implements Iterator. Of course both of these implement VecLike. I was thinking of adding a function on VecLike to have it return an Iterator so that I could use it in get_all_matching_rules.
If my parameter is named: matching_rules I could then do matching_rules.iter().filter(.
Question:
How do I return a non consuming iterator from a Vec?
I'd like to be able to return a non consuming iterator on a Vec<T> of type Iterator<T>. I am not looking to iterate the items by calling .iter().
If I have (where self is a Vec):
fn iter<'a>(&'a self) -> Iterator<T> {
self.iter()
}
I get the following error:
error: mismatched types: expected `core::iter::Iterator<T>+'a`, found `core::slice::Items<'_,T>` (expected trait core::iter::Iterator, found struct core::slice::Items)
I would like to return the Iterator<t>. If there is a better way to go at this rather than returning an Iterator, I'm all ears.

.iter() on [T], which Vec<T> automatically dereferences to, takes self by reference and produces a type implementing Iterator<&T>. Note that the return type is not Iterator<&T>; Iterator is a trait which is implemented by concrete types, and the concrete type Items<T> is the return type in that case, not Iterator<&T>. There is not currently any syntax for specifying a return type merely as a trait that is implemented by it, though the syntax impl Iterator<&T> has been suggested.
Now you wish something implementing Iterator<T> rather than Iterator<&T>. Under Rust’s memory model where each object is owned by exactly one thing, this is not possible with the same objects; there must be some constraint to allow you to get a new T from the &T. There are two readily provided solutions for this:
The Copy trait, for types that can just be copied bitwise. Given a variable of a type implementing Iterator<&T> where T is Copy, this can be written .map(|&x| x) or .map(|x| *x) (the two are equivalent).
The Clone trait, for any types where the operation can be caused to make sense, regardless of Copy bounds. Given a variable of a type implementing Iterator<&T> where T is Clone, this can be written .map(|x| x.clone()).
Thus, given a vector v, v.iter().map(|x| x.clone()). Generically, something like this:
fn iter<T: Clone>(slice: &[T]) -> Map<&T, T, Items<T>> {
slice.iter().map(|x| x.clone())
}

I'm not sure what you're asking here.
.iter() creates an iterator (Items) which does not move the Vec (You'll get an Iterator over &T).
Filter (and most other iterator adapters) are lazy. Perhaps you should chain() the two iterators before filtering them?
Otherwise, if you don't want the Filter to be consumed, clone it.

Related

How do I return different iterators from a function? [duplicate]

This question already has answers here:
What is the correct way to return an Iterator (or any other trait)?
(2 answers)
Closed 5 months ago.
I'm trying to change the minigrep application that I've implemented in the rust book to also take stdin input like the real grep does.
I've created a small helper function that takes the configuration and decides (currently according to an environment variable) whether to return the buffered reader iterator or the stdin iterator:
fn stdinOrFile(cfg: &Cfg) -> impl Iterator<Item = String> + '_ {
if cfg.stdin {
return io::stdin().lines();
}
let file = File::open(cfg.path.clone()).unwrap();
let reader = BufReader::new(file);
return reader.lines();
}
Realizing that I'm poking with a stick in the dark, it appears that the syntax of returning a trait object is legal, it's a dead-end for now thought. The compiler appears to still try to infer the concrete type of the returned value and complains that the other type is not of the same type, however to the best of my knowledge both implement the iterator trait.
Some ideas I have to get around this:
Box the value before returning it
Create a generic wrapper that would wrap both underlying types, then, since the minigrep matcher only uses the filter functionality of the iterators, I could have separate filter implementations on the wrapper according to which underlying type it holds, which would then call the relevant method on the underlying type.
Any ideas?
Why is the trait object syntax allowed in the return type if a concrete implementation is inferred?
You're right to think you should box it. impl Iterator is not a trait object. It can mean a couple of different things, but in return position it's a special impl return syntax. Effectively, the function still returns a concrete, ordinary iterator of known type, it's just that the compiler infers that type for you. It's most commonly used with iterators, since writing impl Iterator<Item=u32> is far nicer than the actual return type which can be some shenanigans like Map<Vec::Iter<String>, fn(String) -> i32> (and can even be non-denotable in cases where the function argument is a closure).
However, with impl return, there must still exist a concrete type. You can't return two different types in two different branches. So in this case, you do need a trait object. That's dyn Iterator, but dyn Iterator is not sized, so you need to wrap it in a Box to deal with that.
Box<dyn Iterator<Item = String> + '_>
and, of course, you'll need to wrap your return values in Box::new, since Rust won't do that for you.

Calculate object A, then return object B that references A in Rust

In my code I often want to calculate a new value A, and then return some view of that value B, because B is a type that's more convenient to work with. The simplest case is where A is a vector and B is a slice that I would like to return. Let's say I want to write a function that returns a set of indices. Ideally this would return a slice directly because then I can use it immediately to index a string.
If I return a vector instead of a slice, I have to use to_slice:
fn all_except(except: usize, max:usize) -> Vec<usize> {
(0..except).chain((except + 1)..max).collect()
}
"abcdefg"[all_except(1, 7)]
string indices are ranges of `usize`
the type `str` cannot be indexed by `Vec<usize>`
help: the trait `SliceIndex<str>` is not implemented for `Vec<usize>`
I can't return a slice directly:
fn all_except(except: usize, max:usize) -> &[usize] {
(0..except).chain((except + 1)..max).collect()
}
"abcdefg"[all_except(1, 7)]
^ expected named lifetime parameter
missing lifetime specifier
help: this function's return type contains a borrowed value with an elided lifetime, but the lifetime cannot be derived from the arguments
help: consider using the `'static` lifetime
I can't even return the underlying vector and a slice of it, for the same reason
pub fn except(index: usize, max: usize) -> (&[usize], Vec<usize>) {
let v = (0..index).chain((index + 1)..max).collect();
(v, v.as_slice)
}
"abcdefg"[all_except(1, 7)[1]
Now it may be possible to hack this particular example using deref coercion (I'm not sure), but I have encountered this problem with more complex types. For example, I have a function that loads an ndarray::Array2<T> from CSV file, then want to split it into two parts using array.split_at(), but this returns two ArrayView2<T> which reference the original Array2<T>, so I encounter the same issue. In general I'm wondering if there's a solution to this problem in general. Can I somehow tell the compiler to move A into the parent frame's scope, or let me return a tuple of (A, B), where it realises that the slice is still valid because A is still alive?
Your code doesn't seem to make any sense, you can't index a string using a slice. If you could the first snippet would have worked with just an as_slice in the caller or something, vecs trivially coerce to slices. That's exactly what the compiler error is telling you: the compiler is looking for a SliceIndex and a Vec (or slice) is definitely not that.
That aside,
Can I somehow tell the compiler to move A into the parent frame's scope, or let me return a tuple of (A, B), where it realises that the slice is still valid because A is still alive?
There are packages like owning_ref which can bundle owner and reference to avoid extra allocations. It tends to be somewhat fiddly.
I don't think there's any other general solution, because Rust reasons at the function level, the type checker has no notion of "tell the compiler to move A into the parent scope". So you need a construct which works around borrow checker.

Why does the usage of by_ref().take() differ between the Iterator and Read traits?

Here are two functions:
fn foo<I>(iter: &mut I)
where
I: std::iter::Iterator<Item = u8>,
{
let x = iter.by_ref();
let y = x.take(2);
}
fn bar<I>(iter: &mut I)
where
I: std::io::Read,
{
let x = iter.by_ref();
let y = x.take(2);
}
While the first compiles fine, the second gives the compilation error:
error[E0507]: cannot move out of borrowed content
--> src/lib.rs:14:13
|
14 | let y = x.take(2);
| ^ cannot move out of borrowed content
The signatures of by_ref and take are almost identical in std::iter::Iterator and std::io::Read traits, so I supposed that if the first one compiles, the second will compile too. Where am I mistaken?
This is indeed a confusing error message, and the reason you get it is rather subtle. The answer by ozkriff correctly explains that this is because the Read trait is not in scope. I'd like to add a bit more context, and an explanation why you are getting the specific error you see, rather than an error that the method wasn't found.
The take() method on Read and Iterator takes self by value, or in other words, it consumes its receiver. This means you can only call it if you have ownership of the receiver. The functions in your question accept iter by mutable reference, so they don't own the underlying I object, so you can't call <Iterator>::take() or <Read>::take() for the underlying object.
However, as pointed out by ozkriff, the standard library provides "forwarding" implementations of Iterator and Read for mutable references to types that implement the respective traits. When you call iter.take(2) in your first function, you actually end up calling <&mut Iterator<Item = T>>::take(iter, 2), which only consumes your mutable reference to the iterator, not the iterator itself. This is perfectly valid; while the function can't consume the iterator itself since it does not own it, the function does own the reference. In the second function, however, you end up calling <Read>::take(*iter, 2), which tries to consume the underlying reader. Since you don't own that reader, you get an error message explaining that you can't move it out of the borrowed context.
So why does the second method call resolve to a different method? The answer by ozkriff already explains that this happens because the Iterator trait is in the standard prelude, while the Read trait isn't in scope by default. Let's look at the method lookup in more detail. It is documented in the section "Method call expressions" of the Rust language reference:
The first step is to build a list of candidate receiver types. Obtain these by repeatedly dereferencing the receiver expression's type, adding each type encountered to the list, then finally attempting an unsized coercion at the end, and adding the result type if that is successful. Then, for each candidate T, add &T and &mut T to the list immediately after T.
According to this rule, our list of candidate types is
&mut I, &&mut I, &mut &mut I, I, &I, &mut I
Then, for each candidate type T, search for a visible method with a receiver of that type in the following places:
T's inherent methods (methods implemented directly on T).
Any of the methods provided by a visible trait implemented by T. If T is a type parameter, methods provided by trait bounds on T are looked up first. Then all remaining methods in scope are looked up.
For the case I: Iterator, this process starts with looking up a take() method on &mut I. There are no inherent methods on &mut I, since I is a generic type, so we can skip step 1. In step 2, we first look up methods on trait bounds for &mut I, but there are only trait bounds for I, so we move on to looking up take() on all remaining methods in scope. Since Iterator is in scope, we indeed find the forwarding implementation from the standard library, and can stop processing our list of candidate types.
For the second case, I: Read, we also start with &mut I, but since Read is not in scope, we won't see the forwarding implementation. Once we get to I in our list of candidate types, though, the clause on methods provided by trait bounds kicks in: they are looked up first, regardless of whether the trait is in scope. I has a trait bound of Read, so <Read>::take() is found. As we have seen above, calling this method causes the error message.
In summary, traits must be in scope to use their methods, but methods on trait bounds can be used even if the trait isn't in scope.
impl<'a, I: Iterator + ?Sized> Iterator for &'a mut I is the reason why the first function compiles. It implements Iterator for all mutable references to iterators.
The Read trait has the equivalent, but, unlike Iterator, the Read trait isn't in the prelude, so you'll need to use std::io::Read to use this impl:
use std::io::Read; // remove this to get "cannot move out of borrowed content" err
fn foo<I, T>(iter: &mut I)
where
I: std::iter::Iterator<Item = T>,
{
let _y = iter.take(2);
}
fn bar<I>(iter: &mut I)
where
I: std::io::Read,
{
let _y = iter.take(2);
}
Playground

How do I create an iterator that invalidates the previous result each time `next` is called? [duplicate]

This would make it possible to safely iterate over the same element twice, or to hold some state for the global thing being iterated over in the item type.
Something like:
trait IterShort<Iter>
where Self: Borrow<Iter>,
{
type Item;
fn next(self) -> Option<Self::Item>;
}
then an implementation could look like:
impl<'a, MyIter> IterShort<MyIter> for &'a mut MyIter {
type Item = &'a mut MyItem;
fn next(self) -> Option<Self::Item> {
// ...
}
}
I realize I could write my own (I just did), but I'd like one that works with the for-loop notation. Is that possible?
The std::iter::Iterator trait can not do this, but you can write a different trait:
trait StreamingIterator {
type Item;
fn next<'a>(&'a mut self) -> Option<&'a mut Self::Item>;
}
Note that the return value of next borrows the iterator itself, whereas in Vec::iter for example it only borrows the vector.
The downside is that &mut is hard-coded. Making it generic would require higher-kinded types (so that StreamingIterator::Item could itself be generic over a lifetime parameter).
Alexis Beingessner gave a talk about this and more titled Who Owns This Stream of Data? at RustCamp.
As to for loops, they’re really tied to std::iter::IntoIterator which is tied to std::iter::Iterator. You’d just have to implement both.
The standard iterators can't do this as far as I can see. The very definition of an iterator is that the outside has control over the elements while the inside has control over what produces the elements.
From what I understand of what you are trying to do, I'd flip the concept around and instead of returning elements from an iterator to a surrounding environment, pass the environment to the iterator. That is, you create a struct with a constructor function that accepts a closure and implements the iterator trait. On each call to next, the passed-in closure is called with the next element and the return value of that closure or modifications thereof are returned as the current element. That way, next can handle the lifetime of whatever would otherwise be returned to the surrounding environment.

Why does the argument for the find closure need two ampersands?

I have been playing with Rust by porting my Score4 AI engine to it - basing the work on my functional-style implementation in OCaml. I specifically wanted to see how Rust fares with functional-style code.
The end result: It works, and it's very fast - much faster than OCaml. It almost touches the speed of imperative-style C/C++ - which is really cool.
There's a thing that troubles me, though — why do I need two ampersands in the last line of this code?
let moves_and_scores: Vec<_> = moves_and_boards
.iter()
.map(|&(column,board)| (column, score_board(&board)))
.collect();
let target_score = if maximize_or_minimize {
ORANGE_WINS
} else {
YELLOW_WINS
};
if let Some(killer_move) = moves_and_scores.iter()
.find(|& &(_,score)| score==target_score) {
...
I added them is because the compiler errors "guided" me to it; but I am trying to understand why... I used the trick mentioned elsewhere in Stack Overflow to "ask" the compiler to tell me what type something is:
let moves_and_scores: Vec<_> = moves_and_boards
.iter()
.map(|&(column,board)| (column, score_board(&board)))
.collect();
let () = moves_and_scores;
...which caused this error:
src/main.rs:108:9: 108:11 error: mismatched types:
expected `collections::vec::Vec<(u32, i32)>`,
found `()`
(expected struct `collections::vec::Vec`,
found ()) [E0308]
src/main.rs:108 let () = moves_and_scores;
...as I expected, moves_and_scores is a vector of tuples: Vec<(u32, i32)>. But then, in the immediate next line, iter() and find() force me to use the hideous double ampersands in the closure parameter:
if let Some(killer_move) = moves_and_scores.iter()
.find(|& &(_,score)| score==target_score) {
Why does the find closure need two ampersands? I could see why it may need one (pass the tuple by reference to save time/space) but why two? Is it because of the iter? That is, is the iter creating references, and then find expects a reference on each input, so a reference on a reference?
If this is so, isn't this, arguably, a rather ugly design flaw in Rust?
In fact, I would expect find and map and all the rest of the functional primitives to be parts of the collections themselves. Forcing me to iter() to do any kind of functional-style work seems burdensome, and even more so if it forces this kind of "double ampersands" in every possible functional chain.
I am hoping I am missing something obvious - any help/clarification most welcome.
This here
moves_and_scores.iter()
gives you an iterator over borrowed vector elements. If you follow the API doc what type this is, you'll notice that it's just the iterator for a borrowed slice and this implements Iterator with Item=&T where T is (u32, i32) in your case.
Then, you use find which takes a predicate which takes a &Item as parameter. Sice Item already is a reference in your case, the predicate has to take a &&(u32, i32).
pub trait Iterator {
...
fn find<P>(&mut self, predicate: P) -> Option<Self::Item>
where P: FnMut(&Self::Item) -> bool {...}
... ^
It was probably defined like this because it's only supposed to inspect the item and return a bool. This does not require the item being passed by value.
If you want an iterator over (u32, i32) you could write
moves_and_scores.iter().cloned()
cloned() converts the iterator from one with an Item type &T to one with an Item type T if T is Clone. Another way to do it would be to use into_iter() instead of iter().
moves_and_scores.into_iter()
The difference between the two is that the first option clones the borrowed elements while the 2nd one consumes the vector and moves the elements out of it.
By writing the lambda like this
|&&(_, score)| score == target_score
you destructure the "double reference" and create a local copy of the i32. This is allowed since i32 is a simple type that is Copy.
Instead of destructuring the parameter of your predicate you could also write
|move_and_score| move_and_score.1 == target_score
because the dot operator automatically dereferences as many times as needed.

Resources