Implementing a custom Iterator Trait - rust

Source code
pub struct Iterating_ex {
start: u32,
end: u32,
}
impl Iterator for Iterating_ex {
type Item = u32;
fn next(&mut self) -> Option<u32> {
if self.start >= self.end {
None
} else {
let result = Some(self.start);
self.start += 1;
result
}
}
}
fn main() {
let example = Iterating_ex {
start: 0,
end: 5,
};
for i in example {
println!("{i}");
}
}
Output
0
1
2
3
4
Individually I understand what each piece of code is trying to do. However, I am having trouble understanding the following, possibly because I don't fully understand the generic Iterator trait:
1. Does implementing the Iterator trait for a struct automatically make it an iterable data type? In this case, I don't know why a for loop can be used on example.
2. It seems like the next method is called in a loop until None is returned. How does the code know to do so?

Ad 1. To be "iterable" in Rust means to implement the Iterator trait. Some things, however, can be turned into an iterator, and that is described by another trait, IntoIterator. The standard library provides a blanket implementation:
impl<I: Iterator> IntoIterator for I { /* ... */}
This means that any type that implements Iterator also implements IntoIterator, and the conversion is a no-op that returns the iterator itself. for loops are designed to work with types that implement IntoIterator. That's why you can write, for example:
let mut v = vec![1, 2, 3, 4, 5];
for _ in &v {}
for _ in &mut v {}
for _ in v {}
This works because the types &Vec<T>, &mut Vec<T> and Vec<T> all implement the IntoIterator trait. Each is turned into a different iterator type, yielding &T, &mut T and T respectively.
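To make the three item types concrete, here is a minimal sketch. The helper names (sum_by_ref, scale_in_place, into_strings) are made up for this illustration and are not part of the standard library:

```rust
/// Iterates through `&Vec<T>`, whose iterator yields `&T`.
fn sum_by_ref(v: &Vec<i32>) -> i32 {
    let mut total = 0;
    for x in v {
        // x: &i32, the vector is only read
        total += *x;
    }
    total
}

/// Iterates through `&mut Vec<T>`, whose iterator yields `&mut T`.
fn scale_in_place(v: &mut Vec<i32>) {
    for x in v {
        // x: &mut i32, elements can be changed in place
        *x *= 10;
    }
}

/// Iterates through `Vec<T>`, whose iterator yields `T` and consumes the vector.
fn into_strings(v: Vec<i32>) -> Vec<String> {
    let mut out = Vec::new();
    for x in v {
        // x: i32, owned
        out.push(x.to_string());
    }
    out
}

fn main() {
    let mut v = vec![1, 2, 3];
    println!("{}", sum_by_ref(&v));
    scale_in_place(&mut v);
    println!("{v:?}");
    // `v` is moved here and cannot be used afterwards.
    println!("{:?}", into_strings(v));
}
```

The loop syntax is identical in all three functions; only the type of the expression after `in` decides which IntoIterator implementation is picked.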
Ad 2. As stated before, for loops can be used on types that implement IntoIterator. The documentation explains in detail how this works, but in a nutshell a for loop is just syntactic sugar that turns this code:
for x in xs {
todo!()
}
Into something like this:
let mut xs_iter = xs.into_iter();
while let Some(x) = xs_iter.next() {
todo!()
}
while loops are also syntactic sugar and are desugared into a loop with a break statement, but that's not relevant here.
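The equivalence can be checked with a small sketch. Countdown is a hypothetical iterator invented for this illustration, and the desugaring shown is the approximate form given above, not the compiler's exact expansion:

```rust
/// Hypothetical iterator used only for this illustration: counts down to 1.
struct Countdown(u32);

impl Iterator for Countdown {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            let current = self.0;
            self.0 -= 1;
            Some(current)
        }
    }
}

/// Collects the items with an ordinary `for` loop.
fn with_for(iter: Countdown) -> Vec<u32> {
    let mut seen = Vec::new();
    for x in iter {
        seen.push(x);
    }
    seen
}

/// Collects the items the way the loop is (roughly) desugared.
fn with_while_let(iter: Countdown) -> Vec<u32> {
    let mut seen = Vec::new();
    // The blanket `IntoIterator` impl makes this call return `iter` itself.
    let mut it = iter.into_iter();
    while let Some(x) = it.next() {
        seen.push(x);
    }
    seen
}

fn main() {
    // Both forms call `next()` until it returns `None`.
    assert_eq!(with_for(Countdown(3)), with_while_let(Countdown(3)));
    println!("{:?}", with_for(Countdown(3)));
}
```

So nothing "knows" to call next in a loop: the for syntax itself expands to code that keeps calling it until None comes back.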
Side note: I guess that this is just a learning example, and that's great, but essentially the same iterator already exists in the standard library as std::ops::Range (written start..end), so use that in your actual code if you need it.

Related

How to accept str.chars() or str.bytes() in a function and iterate twice?

Is there any way to pass somestring.chars() or somestring.bytes() to a function and allow that function to reconstruct the iterator?
An example is below. The goal is for the function to be able to iterate through collection multiple times, reconstructing the iterator as needed using into_iter(). It works correctly for vectors and arrays, but I have not been able to get it working for the string iterator methods.
use std::fmt::Display;
// Lifetime needed to indicate the iterator objects
// don't disappear
fn test_single<'a, I, T>(collection: &'a I)
where
&'a I: IntoIterator<Item = T>,
T: Display,
{
let count = collection.into_iter().count();
println!("Len: {}", count);
for x in collection.into_iter() {
println!("Item: {}", x);
}
}
fn main() {
// Works
test_single(&[1, 2, 3, 4]);
test_single(&vec!['a', 'b', 'c', 'd']);
let s = "abcd";
// Desired usage; does not work
// test_single(&s.chars());
// test_single(&s.bytes());
}
The general error is that Iterator is not implemented for &Chars<'_>. This doesn't make sense to me, because Chars definitely does implement both IntoIterator and Iterator.
Is there a solution that allows for the desired usage of test_single(&s.chars())?
Link to the playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=230ee86cd109384a1c62c362aed9d47f
(IntoIterator is preferred over Iterator for my application, since I also need to specify that IntoIterator::IntoIter is a DoubleEndedIterator.)
This can work but not the way you have it written.
You can't iterate a shared reference because Iterator::next() takes &mut self. IntoIterator::into_iter() could be made to work with e.g. &Chars, but that's not necessary because Chars and Bytes both implement Clone, which creates a copy of the iterator (but doesn't duplicate the underlying data).
So you just need to adjust your bounds and accept the iterator by value, cloning it whenever you will need another iterator later:
fn test_single<I, T>(collection: I)
where
I: Clone + IntoIterator<Item = T>,
T: Display,
{
let count = collection.clone().into_iter().count();
println!("Len: {}", count);
for x in collection.into_iter() {
println!("Item: {}", x);
}
}
Now you can call test_single(s.chars()), for example.
Side note: You can express the type I purely with impl, which might be more readable:
fn test_single(collection: impl Clone + IntoIterator<Item=impl Display>) {
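For completeness, here is a self-contained sketch of the clone-based approach that also adds the DoubleEndedIterator requirement mentioned in the question. The function name summarize and its return value are made up for this illustration. It returns what test_single would print so the result is easy to inspect:

```rust
use std::fmt::Display;

/// Hypothetical variant of `test_single`: returns the item count and the
/// items formatted back to front instead of printing them.
fn summarize<I>(collection: I) -> (usize, Vec<String>)
where
    I: Clone + IntoIterator,
    I::Item: Display,
    I::IntoIter: DoubleEndedIterator,
{
    // First pass: iterate over a clone, so `collection` stays available.
    let count = collection.clone().into_iter().count();
    // Second pass: consume the original, walking it in reverse.
    let reversed = collection
        .into_iter()
        .rev()
        .map(|x| x.to_string())
        .collect();
    (count, reversed)
}

fn main() {
    let s = "abcd";
    println!("{:?}", summarize(s.chars()));
    println!("{:?}", summarize(s.bytes()));
    // Shared references are `Copy` (hence `Clone`), so borrowed
    // collections still work with the same bounds.
    println!("{:?}", summarize(&[1, 2, 3, 4]));
    println!("{:?}", summarize(&vec!['a', 'b']));
}
```

Cloning Chars or Bytes only copies the iterator's position in the string, so the second pass does not duplicate the string data.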

Returning iterator from weak references for mapping and modifying values

I'm trying to do something fairly complex in Rust that needs the following properties, and I am fighting the compiler:
1. An object that lives from start to finish of the application, but whose internal maps/vectors can be modified during the application's lifetime
2. Multiple references to the object that can read its internal maps/vectors
3. Everything is single-threaded
4. Multiple nested iterators that are mapped/modified lazily to perform fast and complex calculations (see example below)
A small example, which already causes problems:
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::Weak;
pub struct Holder {
array_ref: Weak<RefCell<Vec<isize>>>,
}
impl Holder {
pub fn new(array_ref: Weak<RefCell<Vec<isize>>>) -> Self {
Self { array_ref }
}
fn get_iterator(&self) -> impl Iterator<Item = f64> + '_ {
self.array_ref
.upgrade()
.unwrap()
.borrow()
.iter()
.map(|value| *value as f64 * 2.0)
}
}
get_iterator is just one implementation of a trait method, but even this example already does not work.
The reason for Weak/Rc is to make sure that multiple places can point to the object (from point (1)) while another place can modify its internals (Vec<isize>).
What is the best way to approach this situation, given that the end goal is performance-critical?
EDIT:
Someone suggested using https://doc.rust-lang.org/std/cell/struct.Ref.html#method.map
Unfortunately, I still can't figure out whether I should also change the return type, or whether the closure is wrong here:
fn get_iterator(&self) -> impl Iterator<Item=f64> + '_ {
let x = self.array_ref.upgrade().unwrap().borrow();
let map1 = Ref::map(x, |x| &x.iter());
let map2 = Ref::map(map1, |iter| &iter.map(|y| *y as f64 * 2.0));
map2
}
IDEA says it has the wrong return type:
the trait `Iterator` is not implemented for `Ref<'_, Map<std::slice::Iter<'_, isize>, [closure#src/bin/main.rs:30:46: 30:65]>>`
This won't work because self.array_ref.upgrade() creates a local temporary Arc value, but the Ref only borrows from it. Obviously, you can't return a value that borrows from a local.
To make this work you need a second structure to own the Arc, which can implement Iterator in this case since the produced items aren't references:
pub struct HolderIterator(Arc<RefCell<Vec<isize>>>, usize);
impl Iterator for HolderIterator {
type Item = f64;
fn next(&mut self) -> Option<f64> {
let r = self.0.borrow().get(self.1)
.map(|&y| y as f64 * 2.0);
if r.is_some() {
self.1 += 1;
}
r
}
}
// ...
impl Holder {
// ...
fn get_iterator<'a>(&'a self) -> Option<impl Iterator<Item=f64>> {
self.array_ref.upgrade().map(|rc| HolderIterator(rc, 0))
}
}
Alternatively, if you want the iterator to also weakly reference the value, you can have it hold a Weak instead and upgrade on each next() call. This has performance implications, but it makes it easy for get_iterator() to return an iterator directly instead of an Option, with the iterator written so that a failed upgrade means the sequence has ended:
pub struct HolderIterator(Weak<RefCell<Vec<isize>>>, usize);
impl Iterator for HolderIterator {
type Item = f64;
fn next(&mut self) -> Option<f64> {
let r = self.0.upgrade()?
.borrow()
.get(self.1)
.map(|&y| y as f64 * 2.0);
if r.is_some() {
self.1 += 1;
}
r
}
}
// ...
impl Holder {
// ...
fn get_iterator<'a>(&'a self) -> impl Iterator<Item=f64> {
HolderIterator(Weak::clone(&self.array_ref), 0)
}
}
This will make it so that you always get an iterator, but it's empty if the Weak is dead. The Weak can also die during iteration, at which point the sequence will abruptly end.
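The behaviour described above can be observed with a small self-contained sketch. It assumes the single-threaded std::rc::Rc and std::rc::Weak (matching the "all single threaded" requirement) rather than the Arc used in the snippets above. Apart from that swap, the iterator is the same:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Same shape as the Weak-holding iterator, but built on `std::rc`
/// (an assumption made for this sketch).
pub struct HolderIterator(Weak<RefCell<Vec<isize>>>, usize);

impl Iterator for HolderIterator {
    type Item = f64;

    fn next(&mut self) -> Option<f64> {
        // A failed upgrade (the owner was dropped) ends the sequence.
        let rc = self.0.upgrade()?;
        // The `Ref` guard is a temporary, released at the end of this statement.
        let r = rc.borrow().get(self.1).map(|&y| y as f64 * 2.0);
        if r.is_some() {
            self.1 += 1;
        }
        r
    }
}

fn main() {
    let data = Rc::new(RefCell::new(vec![1, 2, 3]));
    let mut iter = HolderIterator(Rc::downgrade(&data), 0);

    assert_eq!(iter.next(), Some(2.0));
    // No borrow is held between calls, so the owner may mutate the vector.
    data.borrow_mut().push(4);
    assert_eq!(iter.next(), Some(4.0));
    // Dropping the last strong reference ends the iteration early.
    drop(data);
    assert_eq!(iter.next(), None);
}
```

Because each next() call borrows the RefCell only briefly, the vector can be modified between calls without a BorrowMutError. The trade-off is an upgrade and a borrow check per item.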

Extend Existing Data Structure Via an External Library

How does one take an existing data structure (vec, hashmap, set) and extend the methods on it via an external library?
fn main() {
let vec = vec![1, 2, 3];
vec.my_new_method(...)
}
You can take advantage of the fact that a crate defining a trait can implement that trait on whatever type it wants. Here's a simple example of a combined "shift and then push" function. First it will shift the first element out of the vector (if there is one), then it will push the argument on to the end of the vector. If there was a shifted element, it is returned. (This is a bit of a silly operation, but works to demonstrate this technique.)
First we declare the trait, with the signature of the method(s) we want to add:
trait VecExt<T> {
fn shift_and_push(&mut self, v: T) -> Option<T>;
}
Now we can implement the trait for Vec<T>:
impl<T> VecExt<T> for Vec<T> {
fn shift_and_push(&mut self, v: T) -> Option<T> {
let r = if !self.is_empty() { Some(self.remove(0)) } else { None };
self.push(v);
r
}
}
Now, anywhere that VecExt is brought into scope with use (or by being in the same source file as its declaration) this extension method can be used on any vector.
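Putting the pieces together, a complete program might look like the sketch below. Here the trait lives in the same file; if it lived in a library crate, the caller would need a use line for it (the crate path would depend on your own project layout):

```rust
// Extension trait: declares the method we want to add to `Vec<T>`.
trait VecExt<T> {
    fn shift_and_push(&mut self, v: T) -> Option<T>;
}

impl<T> VecExt<T> for Vec<T> {
    fn shift_and_push(&mut self, v: T) -> Option<T> {
        // Remove the first element if there is one, then append `v`.
        let r = if !self.is_empty() { Some(self.remove(0)) } else { None };
        self.push(v);
        r
    }
}

fn main() {
    let mut v = vec![1, 2, 3];
    // The method is called like any inherent `Vec` method.
    let shifted = v.shift_and_push(4);
    println!("{shifted:?} {v:?}");

    // On an empty vector nothing is shifted out.
    let mut empty: Vec<i32> = Vec::new();
    let shifted = empty.shift_and_push(7);
    println!("{shifted:?} {empty:?}");
}
```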

Rust: polymorphic calls for structs in a vector

I'm a complete newbie in Rust and I'm trying to get some understanding of the basics of the language.
Consider the following trait
trait Function {
fn value(&self, arg: &[f64]) -> f64;
}
and two structs implementing it:
struct Add {}
struct Multiply {}
impl Function for Add {
fn value(&self, arg: &[f64]) -> f64 {
arg[0] + arg[1]
}
}
impl Function for Multiply {
fn value(&self, arg: &[f64]) -> f64 {
arg[0] * arg[1]
}
}
In my main() function I want to group two instances of Add and Multiply in a vector, and then call the value method. The following works:
fn main() {
let x = vec![1.0, 2.0];
let funcs: Vec<&dyn Function> = vec![&Add {}, &Multiply {}];
for f in funcs {
println!("{}", f.value(&x));
}
}
And so does:
fn main() {
let x = vec![1.0, 2.0];
let funcs: Vec<Box<dyn Function>> = vec![Box::new(Add {}), Box::new(Multiply {})];
for f in funcs {
println!("{}", f.value(&x));
}
}
Is there any better / less verbose way? Can I work around wrapping the instances in a Box? What is the takeaway with trait objects in this case?
Is there any better / less verbose way?
There isn't really a way to make this less verbose. Since you are using trait objects, you need to tell the compiler that the vector's items are dyn Function and not the concrete type. The compiler can't just infer that you meant dyn Function trait objects because there could have been other traits that Add and Multiply both implement.
You can't abstract out the calls to Box::new either. For that to work, you would have to somehow map over a heterogeneous collection, which isn't possible in Rust. However, if you are writing this a lot, you might consider adding helper constructor functions for each concrete impl:
impl Add {
fn new() -> Add {
Add {}
}
fn new_boxed() -> Box<Add> {
Box::new(Add::new())
}
}
It's idiomatic to include a new constructor wherever possible, but it's also common to include alternative convenience constructors.
This makes the construction of the vector a bit less noisy:
let funcs: Vec<Box<dyn Function>> = vec![Add::new_boxed(), Multiply::new_boxed()];
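As a complete program, the helper-constructor approach might look like this sketch. The Multiply constructors and the evaluate_all function are additions made for this illustration, mirroring the ones shown for Add:

```rust
trait Function {
    fn value(&self, arg: &[f64]) -> f64;
}

struct Add {}
struct Multiply {}

impl Function for Add {
    fn value(&self, arg: &[f64]) -> f64 {
        arg[0] + arg[1]
    }
}

impl Function for Multiply {
    fn value(&self, arg: &[f64]) -> f64 {
        arg[0] * arg[1]
    }
}

impl Add {
    fn new() -> Add {
        Add {}
    }
    fn new_boxed() -> Box<Add> {
        Box::new(Add::new())
    }
}

impl Multiply {
    fn new() -> Multiply {
        Multiply {}
    }
    fn new_boxed() -> Box<Multiply> {
        Box::new(Multiply::new())
    }
}

/// Hypothetical helper: calls every function through dynamic dispatch.
fn evaluate_all(funcs: &[Box<dyn Function>], arg: &[f64]) -> Vec<f64> {
    funcs.iter().map(|f| f.value(arg)).collect()
}

fn main() {
    // The type annotation makes each `Box<Add>` / `Box<Multiply>` coerce
    // to `Box<dyn Function>`.
    let funcs: Vec<Box<dyn Function>> = vec![Add::new_boxed(), Multiply::new_boxed()];
    println!("{:?}", evaluate_all(&funcs, &[1.0, 2.0]));
}
```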
What is the takeaway with trait objects in this case?
There is always a small performance hit with using dynamic dispatch. If all of your objects are the same type, they can be densely packed in memory, which can be much faster for iteration. In general, I wouldn't worry too much about this unless you are creating a library crate, or if you really want to squeeze out the last nanosecond of performance.
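If the set of operations is fixed and known up front, one alternative that avoids both Box and dynamic dispatch is an enum. This is a different technique from the trait-object approach in the question, sketched here only to show the trade-off. The Op type is made up for this illustration:

```rust
/// Hypothetical closed set of operations, stored inline in the vector.
#[derive(Clone, Copy, Debug)]
enum Op {
    Add,
    Multiply,
}

impl Op {
    /// Dispatches with a `match` instead of a vtable lookup.
    fn value(self, arg: &[f64]) -> f64 {
        match self {
            Op::Add => arg[0] + arg[1],
            Op::Multiply => arg[0] * arg[1],
        }
    }
}

fn main() {
    let x = [1.0, 2.0];
    // No `Box` and no `dyn`: every element has the same type, `Op`.
    let ops = vec![Op::Add, Op::Multiply];
    for op in &ops {
        println!("{:?} -> {}", op, op.value(&x));
    }
}
```

The cost is that adding a new operation means editing the enum and its match, whereas with trait objects a new type only needs its own impl.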

How to take ownership of Any::downcast_ref from trait object?

I've met a conflict with Rust's ownership rules and a trait object downcast. This is a sample:
use std::any::Any;
trait Node{
fn gen(&self) -> Box<Node>;
}
struct TextNode;
impl Node for TextNode{
fn gen(&self) -> Box<Node>{
Box::new(TextNode)
}
}
fn main(){
let mut v: Vec<TextNode> = Vec::new();
let node = TextNode.gen();
let foo = &node as &Any;
match foo.downcast_ref::<TextNode>(){
Some(n) => {
v.push(*n);
},
None => ()
};
}
The TextNode::gen method has to return Box<Node> instead of Box<TextNode>, so I have to downcast it to Box<TextNode>.
Any::downcast_ref's return value is Option<&T>, so I can't take ownership of the downcast result and push it to v.
====edit=====
As I am not good at English, my question is vague.
I am implementing (copying may be more precise) the template parser in Go standard library.
What I really need is a vector, Vec<Box<Node>> or Vec<Box<Any>>, which can contain TextNode, NumberNode, ActionNode and so on: any type of node that implements the trait Node can be pushed into it.
Every node type needs to implement the copy method, return Box<Any>, and then downcasting to the concrete type is OK. But to copy Vec<Box<Any>>, as you don't know the concrete type of every element, you have to check one by one, that is really inefficient.
If the copy method returns Box<Node>, then copying Vec<Box<Node>> is simple. But it seems that there is no way to get the concrete type from trait object.
If you control the Node trait, you can have it return a Box<dyn Any> (written Box<Any> before the dyn keyword was introduced) and use the Box::downcast method.
It would look like this:
use std::any::Any;
trait Node {
fn gen(&self) -> Box<dyn Any>; // downcast works on Box<dyn Any>
}
struct TextNode;
impl Node for TextNode {
fn gen(&self) -> Box<dyn Any> {
Box::new(TextNode)
}
}
fn main() {
let mut v: Vec<TextNode> = Vec::new();
let node = TextNode.gen();
if let Ok(n) = node.downcast::<TextNode>() {
v.push(*n);
}
}
Generally speaking, you should not jump to using Any. I know it looks familiar when you come from a language with subtype polymorphism and want to recreate a hierarchy of types with some root type (as in this case: you're trying to recreate the "TextNode is a Node" relationship and create a Vec of Nodes). I did it too, and so did many others: I bet the number of SO questions about Any exceeds the number of times Any is actually used on crates.io.
While Any does have its uses, in Rust it has alternatives.
In case you have not looked at them, I wanted to make sure you considered doing this with:
enums
Given different Node types you can express the "a Node is any of these types" relationship with an enum:
struct TextNode;
struct XmlNode;
struct HtmlNode;
enum Node {
Text(TextNode),
Xml(XmlNode),
Html(HtmlNode),
}
With that you can put them all in one Vec and do different things depending on the variant, without downcasting:
let v: Vec<Node> = vec![
Node::Text(TextNode),
Node::Xml(XmlNode),
Node::Html(HtmlNode)];
for n in &v {
match n {
&Node::Text(_) => println!("TextNode"),
&Node::Xml(_) => println!("XmlNode"),
&Node::Html(_) => println!("HtmlNode"),
}
}
Adding a variant means potentially changing your code in many places: the enum itself and every function that does something with the enum (to add the logic for the new variant). But then again, with Any it's mostly the same: all those functions might need to add a downcast for the new type.
Trait objects (not Any)
You can try putting the actions you'd want to perform on the various types of nodes in the trait, so you don't need to downcast, but just call methods on the trait object.
This is essentially what you were doing, except putting the method on the Node trait instead of downcasting.
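A minimal sketch of that idea follows. The method names render and box_clone and the NumberNode fields are invented for this illustration. The point is that copying a Vec of boxed nodes, which the question asks about, needs no downcast when the copy operation is part of the trait:

```rust
/// Hypothetical `Node` trait: the operations live on the trait itself.
trait Node {
    /// Something the caller wants from a node without knowing its type.
    fn render(&self) -> String;
    /// Copies the node behind the trait object.
    fn box_clone(&self) -> Box<dyn Node>;
}

#[derive(Clone)]
struct TextNode(String);

#[derive(Clone)]
struct NumberNode(f64);

impl Node for TextNode {
    fn render(&self) -> String {
        self.0.clone()
    }
    fn box_clone(&self) -> Box<dyn Node> {
        Box::new(self.clone())
    }
}

impl Node for NumberNode {
    fn render(&self) -> String {
        self.0.to_string()
    }
    fn box_clone(&self) -> Box<dyn Node> {
        Box::new(self.clone())
    }
}

/// Copies a whole list without inspecting any concrete type.
fn copy_nodes(nodes: &[Box<dyn Node>]) -> Vec<Box<dyn Node>> {
    nodes.iter().map(|n| n.box_clone()).collect()
}

fn main() {
    let nodes: Vec<Box<dyn Node>> = vec![
        Box::new(TextNode("hello".to_string())),
        Box::new(NumberNode(1.5)),
    ];
    let copy = copy_nodes(&nodes);
    for n in &copy {
        println!("{}", n.render());
    }
}
```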
The (more) idiomatic way to solve the problem:
use std::any::Any;
pub trait Nodeable {
fn as_any(&self) -> &dyn Any;
}
#[derive(Clone, Debug)]
struct TextNode {}
impl Nodeable for TextNode {
fn as_any(&self) -> &dyn Any {
self
}
}
fn main() {
let mut v: Vec<Box<dyn Nodeable>> = Vec::new();
let node = TextNode {}; // or impl TextNode::new
v.push(Box::new(node));
// the downcast back to TextNode could be solved like this:
if let Some(b) = v.pop() { // only if we have a node…
let n = (*b).as_any().downcast_ref::<TextNode>().unwrap(); // this is safe here *)
println!("{:?}", n);
};
}
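*) This is safe here: the only value pushed into v was a TextNode, so the downcast cannot fail. With several Nodeable types in the vector, downcast_ref returns None for a non-matching type, so handle that case instead of calling unwrap().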

Resources