How to implement Iterator yielding mutable references [duplicate] - rust

This question already has an answer here:
How can I create my own data structure with an iterator that returns mutable references?
I am trying to implement a simple lookup iterator:
pub struct LookupIterMut<'a, D> {
data : &'a mut [D],
indices : &'a [usize],
i: usize
}
impl<'a, D> Iterator for LookupIterMut<'a, D> {
type Item = &'a mut D;
fn next(&mut self) -> Option<Self::Item> {
if self.i >= self.indices.len() {
None
} else {
let index = self.indices[self.i] as usize;
self.i += 1;
Some(&mut self.data[index]) // error here
}
}
}
The idea was to allow a caller consecutive mutable access to an internal storage. However, I am getting the error "cannot infer an appropriate lifetime for lifetime parameter in function call due to conflicting requirements".
As far as I understand I would have to change the function signature to next(&'a mut self) -> .. but this would not be an Iterator anymore.
I also discovered that I could simply use raw pointers, though I am not sure if this is appropriate here:
// ...
type Item = *mut D;
// ...
Thanks for your help

Your code is invalid because you try to return multiple mutable references to the same slice with the same lifetime 'a.
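For such a thing to work, each returned Item would need its own lifetime, tied to the individual call to next, so that you could never hold 2 mutable references into the same slice at once. The Iterator trait cannot express that, because Item would have to be a generic associated type. GATs were unstable when this was written and were stabilized in Rust 1.65, but Iterator::Item is still not one:
type Item<'item> = &'item mut D; // Not accepted by the Iterator trait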
One solution is to check that the indices are unique and to rebind the lifetime of the referenced item to 'a in an unsafe block. This is safe because all the indices are unique, so the user cannot hold 2 mutable references to the same item.
Don't forget to encapsulate the whole code inside a module, so that the struct cannot be built without the check in new:
mod my_mod {
pub struct LookupIterMut<'a, D> {
data: &'a mut [D],
indices: &'a [usize],
i: usize,
}
impl<'a, D> LookupIterMut<'a, D> {
pub fn new(data: &'a mut [D], indices: &'a [usize]) -> Result<Self, ()> {
let mut uniq = std::collections::HashSet::new();
let all_distinct = indices.iter().all(move |&x| uniq.insert(x));
if all_distinct {
Ok(LookupIterMut {
data,
indices,
i: 0,
})
} else {
Err(())
}
}
}
impl<'a, D> Iterator for LookupIterMut<'a, D> {
type Item = &'a mut D;
fn next(&mut self) -> Option<Self::Item> {
self.indices.get(self.i).map(|&index| {
self.i += 1;
unsafe { std::mem::transmute(&mut self.data[index]) }
})
}
}
}
Note that your code will panic if one index is out of bounds.
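If you would rather reject out-of-bounds indices up front than panic during iteration, the uniqueness check can be extended to cover bounds as well. The following is only a sketch: IndexError and validate_indices are hypothetical names that do not appear in the code above, and a constructor like new would call the function before building the iterator.

```rust
use std::collections::HashSet;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum IndexError {
    OutOfBounds(usize),
    Duplicate(usize),
}

/// Checks that every index is below `len` and that no index occurs twice.
pub fn validate_indices(len: usize, indices: &[usize]) -> Result<(), IndexError> {
    let mut seen = HashSet::with_capacity(indices.len());
    for &i in indices {
        if i >= len {
            return Err(IndexError::OutOfBounds(i));
        }
        // `insert` returns false when the value was already present.
        if !seen.insert(i) {
            return Err(IndexError::Duplicate(i));
        }
    }
    Ok(())
}
```

With both properties established once in the constructor, next no longer needs to worry about either of them.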

Using unsafe
Reminder: it is unsound to have, at any time, two accessible mutable references to the same underlying value.
The crux of the problem is that the language cannot guarantee that the code abides by the above rule. Should indices contain any duplicate, the iterator as implemented would allow obtaining two simultaneous mutable references to the same item in the slice, which is unsound.
When the language cannot make the guarantee on its own, then you either need to find an alternative approach or you need to do your due diligence and then use unsafe.
In this case:
use std::collections::HashSet;
impl<'a, D> LookupIterMut<'a, D> {
pub fn new(data: &'a mut [D], indices: &'a [usize]) -> Self {
let set: HashSet<usize> = indices.iter().copied().collect();
assert!(indices.len() == set.len(), "Duplicate indices!");
Self { data, indices, i: 0 }
}
}
impl<'a, D> Iterator for LookupIterMut<'a, D> {
type Item = &'a mut D;
fn next(&mut self) -> Option<Self::Item> {
if self.i >= self.indices.len() {
None
} else {
let index = self.indices[self.i];
assert!(index < self.data.len());
self.i += 1;
// Safety:
// - index is guaranteed to be within bounds.
// - indices is guaranteed not to contain duplicates.
Some(unsafe { &mut *self.data.as_mut_ptr().add(index) })
}
}
}
Performance-wise, the construction of a HashSet in the constructor is rather unsatisfying but cannot really be avoided in general. If indices were guaranteed to be sorted, for example, then the check could be performed without allocation.
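For the sorted case, here is a sketch of what that could look like. It assumes the caller can supply strictly increasing indices, and SortedLookupIterMut is a hypothetical name. Under that assumption the iterator needs neither an allocation nor unsafe: each call splits the remaining slice with split_at_mut, so the borrow checker itself can see that the returned references are disjoint.

```rust
/// Hypothetical variant that requires strictly increasing indices.
pub struct SortedLookupIterMut<'a, D> {
    rest: &'a mut [D],                    // part of the slice not yet handed out
    consumed: usize,                      // number of elements already split off
    indices: std::slice::Iter<'a, usize>, // remaining indices
}

impl<'a, D> SortedLookupIterMut<'a, D> {
    pub fn new(data: &'a mut [D], indices: &'a [usize]) -> Option<Self> {
        // Strictly increasing implies no duplicates; checked without allocating.
        let increasing = indices.windows(2).all(|w| w[0] < w[1]);
        // With increasing indices, only the last one can be out of bounds.
        let in_bounds = indices.last().map_or(true, |&i| i < data.len());
        if increasing && in_bounds {
            Some(Self { rest: data, consumed: 0, indices: indices.iter() })
        } else {
            None
        }
    }
}

impl<'a, D> Iterator for SortedLookupIterMut<'a, D> {
    type Item = &'a mut D;

    fn next(&mut self) -> Option<Self::Item> {
        let &index = self.indices.next()?;
        // Move the slice out of `self` so that both halves keep lifetime 'a.
        let rest = std::mem::take(&mut self.rest);
        let (head, tail) = rest.split_at_mut(index - self.consumed + 1);
        self.consumed = index + 1;
        self.rest = tail;
        // The requested element is the last one of the split-off head.
        head.last_mut()
    }
}
```

The trade-off is the stricter precondition: unsorted but unique indices are rejected here, whereas the HashSet version accepts them.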

Related

How to return a reference when implementing an iterator?

I would like to return a reference to an owned object that is in a collection (viz., a Vec), but I cannot seem to get the lifetimes correct. Here is what I first tried:
struct StringHolder {
strings: Vec<String>,
i: usize,
}
impl Iterator for StringHolder {
type Item<'a> = &'a String;
fn next(&mut self) -> Option<Self::Item> {
if self.i >= self.strings.len() {
None
} else {
self.i += 1;
Some(&self.strings[self.i])
}
}
}
fn main() {
let sh = StringHolder { strings: vec![], i: 0 };
for string in sh {
println!("{}", string);
}
}
I get an error that generic associated types are unstable and lifetimes do not match type in trait. I tried a few other iterations, but nothing seemed to work.
I gather that this may not be possible based on some things I've read, but then I can't seem to figure out how Vec does it itself. For example, I can use the following to simply iterate over the underlying Vec and return a reference on each iteration:
struct StringHolder {
strings: Vec<String>,
}
impl<'a> IntoIterator for &'a StringHolder {
type Item = &'a String;
type IntoIter = ::std::slice::Iter<'a, String>;
fn into_iter(self) -> Self::IntoIter {
(&self.strings).into_iter()
}
}
fn main() {
let sh = StringHolder { strings: vec!["A".to_owned(), "B".to_owned()] };
for string in &sh {
println!("{}", string);
}
}
So that makes me think it is possible, I just haven't figured out lifetimes yet. Thanks for your help.
The Iterator trait doesn't include a lifetime parameter for Item, which is one of the errors you are seeing. The other alludes to GATs (generic associated types), which were an unstable Rust feature when this was written (they were stabilized in Rust 1.65). GATs applied to this example would let you bound the lifetime of an item for each individual call to next() instead of all items having the same lifetime. Having said that, the Iterator trait is unlikely to change, so this more flexible behaviour would have to be a new trait.
Given the design of the Iterator trait, you can't have an iterator own its data and have its Item be a reference to it. There just isn't a way to express the lifetime.
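To make the "new trait" idea concrete, here is a sketch of such a trait. LendingIterator is a hypothetical trait defined in the snippet itself, not something provided by std, and it needs a toolchain with stable GATs (Rust 1.65 or later). Because it is not Iterator, for loops and the iterator adapters do not work with it.

```rust
/// Hypothetical trait: each item borrows from the iterator itself.
trait LendingIterator {
    type Item<'a>
    where
        Self: 'a;

    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>>;
}

struct StringHolder {
    strings: Vec<String>,
    i: usize,
}

impl LendingIterator for StringHolder {
    type Item<'a> = &'a String where Self: 'a;

    fn next<'a>(&'a mut self) -> Option<Self::Item<'a>> {
        // `get` returns None once `i` runs past the end.
        let item = self.strings.get(self.i)?;
        self.i += 1;
        Some(item)
    }
}
```

It would be driven with a loop such as `while let Some(s) = holder.next() { ... }`, where each item has to be released before the next call.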
The way iterators are usually written, in order to have the items be references, is to make them hold a reference to the underlying data. This provides a named lifetime for the data, which can be used on the associated Item. Vec sort of does this, but it's a bit different because Vec actually gets its iteration from slice.
Your complete example:
struct StringHolder {
strings: Vec<String>,
}
struct StringHolderIter<'a> {
string_holder: &'a StringHolder,
i: usize,
}
impl<'a> Iterator for StringHolderIter<'a> {
type Item = &'a str;
fn next(&mut self) -> Option<Self::Item> {
if self.i >= self.string_holder.strings.len() {
None
} else {
self.i += 1;
Some(&self.string_holder.strings[self.i - 1])
}
}
}
impl<'a> IntoIterator for &'a StringHolder {
type Item = &'a str;
type IntoIter = StringHolderIter<'a>;
fn into_iter(self) -> Self::IntoIter {
StringHolderIter {
string_holder: self,
i: 0,
}
}
}

How do I create mutable iterator over struct fields

So I am working on a little NES emulator using Rust and I am trying to be fancy with my status register. The register is a struct that holds some fields (flags), each containing a bool; the register itself is part of a CPU struct. Now, I want to loop through these fields and set the bool values based on some instruction I execute. However, I am not able to implement a mutable iterator. I've implemented an into_iter() function and am able to iterate through the fields to get/print a bool value, but how do I mutate these values within the struct itself? Is this even possible?
pub struct StatusRegister {
CarryFlag: bool,
ZeroFlag: bool,
OverflowFlag: bool,
}
impl StatusRegister {
fn new() -> Self {
StatusRegister {
CarryFlag: true,
ZeroFlag: false,
OverflowFlag: true,
}
}
}
impl<'a> IntoIterator for &'a StatusRegister {
type Item = bool;
type IntoIter = StatusRegisterIterator<'a>;
fn into_iter(self) -> Self::IntoIter {
StatusRegisterIterator {
status: self,
index: 0,
}
}
}
pub struct StatusRegisterIterator<'a> {
status: &'a StatusRegister,
index: usize,
}
impl<'a> Iterator for StatusRegisterIterator<'a> {
type Item = bool;
fn next(&mut self) -> Option<bool> {
let result = match self.index {
0 => self.status.CarryFlag,
1 => self.status.ZeroFlag,
2 => self.status.OverflowFlag,
_ => return None,
};
self.index += 1;
Some(result)
}
}
pub struct CPU {
pub memory: [u8; 0xffff],
pub status: StatusRegister,
}
impl CPU {
pub fn new() -> CPU {
let memory = [0; 0xFFFF];
CPU {
memory,
status: StatusRegister::new(),
}
}
fn execute(&mut self) {
let mut shifter = 0b1000_0000;
for status in self.status.into_iter() {
// mutate status here!
println!("{}", status);
shifter <<= 1;
}
}
}
fn main() {
let mut cpu = CPU::new();
cpu.execute();
}
Implementing an iterator over mutable references is hard in general. It becomes unsound if the iterator ever returns references to the same element twice. That means that if you want to write one in purely safe code, you have to somehow convince the compiler that each element is only visited once. That rules out simply using an index: you could always forget to increment the index or set it somewhere and the compiler wouldn't be able to reason about it.
One possible way around is chaining together several std::iter::onces (one for each reference you want to iterate over).
For example,
impl StatusRegister {
fn iter_mut(&mut self) -> impl Iterator<Item = &mut bool> {
use std::iter::once;
once(&mut self.CarryFlag)
.chain(once(&mut self.ZeroFlag))
.chain(once(&mut self.OverflowFlag))
}
}
Upsides:
Fairly simple to implement.
No allocations.
No external dependencies.
Downsides:
The iterator has a very complicated type: std::iter::Chain<std::iter::Chain<std::iter::Once<&mut bool>, std::iter::Once<&mut bool>>, std::iter::Once<&mut bool>>.
So if you don't want to use impl Iterator<Item = &mut bool>, you'll have to spell that type out in your code. That includes implementing IntoIterator for &mut StatusRegister, since you'd have to explicitly indicate what the IntoIter type is.
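To show what spelling out that type looks like, here is a sketch of IntoIterator for &mut StatusRegister built on the same once/chain approach. The type alias FlagsIterMut is a hypothetical name introduced only to keep the signature readable.

```rust
use std::iter::{once, Chain, Once};

#[allow(non_snake_case)]
pub struct StatusRegister {
    CarryFlag: bool,
    ZeroFlag: bool,
    OverflowFlag: bool,
}

// Hypothetical alias for the nested Chain type.
type FlagsIterMut<'a> =
    Chain<Chain<Once<&'a mut bool>, Once<&'a mut bool>>, Once<&'a mut bool>>;

impl<'a> IntoIterator for &'a mut StatusRegister {
    type Item = &'a mut bool;
    type IntoIter = FlagsIterMut<'a>;

    fn into_iter(self) -> Self::IntoIter {
        // Each field is borrowed exactly once, so the borrows are disjoint.
        once(&mut self.CarryFlag)
            .chain(once(&mut self.ZeroFlag))
            .chain(once(&mut self.OverflowFlag))
    }
}
```

With this in place, `for flag in &mut register { ... }` works, at the cost of exposing the Chain type in the public interface.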
Another approach is using an array or Vec to hold all the mutable references (with the correct lifetime) and then delegate to its iterator implementation to get the values. For example,
impl StatusRegister {
fn iter_mut(&mut self) -> std::vec::IntoIter<&mut bool> {
vec![
&mut self.CarryFlag,
&mut self.ZeroFlag,
&mut self.OverflowFlag,
]
.into_iter()
}
}
Upsides:
The type is the much more manageable std::vec::IntoIter<&mut bool>.
Still fairly simple to implement.
No external dependencies.
Downsides:
Requires an allocation every time iter_mut is called.
I also mentioned using an array. That would avoid the allocation, but when this was written arrays did not yet implement a by-value iterator, so the above code with a [&mut bool; 3] instead of a Vec<&mut bool> didn't work. (Since Rust 1.53, arrays implement IntoIterator by value, so this restriction no longer applies.) At the time, the workaround was a crate that implements this functionality for fixed-length arrays with limited size, e.g. arrayvec (or array_vec).
Upsides:
No allocation.
Simple iterator type.
Simple to implement.
Downsides:
External dependency.
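On toolchains from Rust 1.53 onward, arrays implement IntoIterator by value, so the array variant can be sketched without any crate. The snippet calls IntoIterator::into_iter as a plain function, because in editions before 2021 the method-call form array.into_iter() still resolves to the by-reference slice iterator.

```rust
#[allow(non_snake_case)]
pub struct StatusRegister {
    CarryFlag: bool,
    ZeroFlag: bool,
    OverflowFlag: bool,
}

impl StatusRegister {
    // No allocation: the three references live in a stack array.
    fn iter_mut(&mut self) -> std::array::IntoIter<&mut bool, 3> {
        IntoIterator::into_iter([
            &mut self.CarryFlag,
            &mut self.ZeroFlag,
            &mut self.OverflowFlag,
        ])
    }
}
```

This keeps the simple iterator type of the Vec version while dropping both the allocation and the dependency.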
The last approach I'll talk about is using unsafe. Since this doesn't have many good upsides over the other approaches, I wouldn't recommend it in general. This is mainly to show you how you could implement this.
Like your original code, we'll implement Iterator on our own struct.
impl<'a> IntoIterator for &'a mut StatusRegister {
type IntoIter = StatusRegisterIterMut<'a>;
type Item = &'a mut bool;
fn into_iter(self) -> Self::IntoIter {
StatusRegisterIterMut {
status: self,
index: 0,
}
}
}
pub struct StatusRegisterIterMut<'a> {
status: &'a mut StatusRegister,
index: usize,
}
The unsafety comes from the next method, where we'll have to (essentially) convert something of type &mut &mut T to &mut T, which is generally unsafe. However, as long as we ensure that next never hands out aliasing mutable references, we should be fine. There may be some other subtle issues, so I won't guarantee that this is sound. For what it's worth, Miri doesn't find any problems with this.
impl<'a> Iterator for StatusRegisterIterMut<'a> {
type Item = &'a mut bool;
// Invariant to keep: index is 0, 1, 2 or 3
// Every call, this increments by one, capped at 3
// index should never be 0 on two different calls
// and similarly for 1 and 2.
fn next(&mut self) -> Option<Self::Item> {
let result = unsafe {
match self.index {
// Safety: Since each of these three branches is
// executed at most once, we hand out no more than one mutable reference
// to each part of self.status
// Since self.status is valid for 'a
// Each partial borrow is also valid for 'a
0 => &mut *(&mut self.status.CarryFlag as *mut _),
1 => &mut *(&mut self.status.ZeroFlag as *mut _),
2 => &mut *(&mut self.status.OverflowFlag as *mut _),
_ => return None
}
};
// If self.index isn't 0, 1 or 2, we'll have already returned
// So this bumps us up to 1, 2 or 3.
self.index += 1;
Some(result)
}
}
Upsides:
No allocations.
Simple iterator type name.
No external dependencies.
Downsides:
Complicated to implement. To successfully use unsafe, you need to be very familiar with what is and isn't allowed. This part of the answer took me the longest by far to make sure I wasn't doing something wrong.
Unsafety infects the module. Within the module defining this iterator, I could "safely" cause unsoundness by messing with the status or index fields of StatusRegisterIterMut. The only thing allowing encapsulation is that outside of this module, those fields aren't visible.

How to implement an iterator over chunks of an array in a struct?

I want to implement an iterator for the struct with an array as one of its fields. The iterator should return a slice of that array, but this requires a lifetime parameter. Where should that parameter go?
The Rust version is 1.37.0
struct A {
a: [u8; 100],
num: usize,
}
impl Iterator for A {
type Item = &[u8]; // this requires a lifetime parameter, but there is none declared
fn next(&mut self) -> Option<Self::Item> {
if self.num >= 10 {
return None;
}
let res = &self.a[10*self.num..10*(self.num+1)];
self.num += 1;
Some(res)
}
}
I wouldn't implement my own. Instead, I'd reuse the existing chunks iterator and implement IntoIterator for a reference to the type:
struct A {
a: [u8; 100],
num: usize,
}
impl<'a> IntoIterator for &'a A {
type Item = &'a [u8];
type IntoIter = std::slice::Chunks<'a, u8>;
fn into_iter(self) -> Self::IntoIter {
self.a.chunks(self.num)
}
}
fn example(a: A) {
for chunk in &a {
println!("{}", chunk.iter().sum::<u8>())
}
}
When you return a reference from a function, its lifetime needs to be tied to something else. Otherwise, the compiler wouldn't know how long the reference is valid (the exception to this is a 'static lifetime, which lasts for the duration of the whole program).
So we need an existing reference to the slices. One standard way to do this is to tie the reference to the iterator itself. For example,
struct Iter<'a> {
slice: &'a [u8; 100],
num: usize,
}
Then what you have works almost verbatim. (I've changed the names of the types and fields to be a little more informative).
impl<'a> Iterator for Iter<'a> {
type Item = &'a [u8];
fn next(&mut self) -> Option<Self::Item> {
if self.num >= 100 {
return None;
}
let res = &self.slice[10 * self.num..10 * (self.num + 1)];
self.num += 1;
Some(res)
}
}
Now, you probably still have an actual [u8; 100] somewhere, not just a reference. If you still want to work with that, what you'll want is a separate struct that owns the array and has a method that creates an Iter borrowing from it. For example:
struct Data {
array: [u8; 100],
}
impl Data {
fn iter<'a>(&'a self) -> Iter<'a> {
Iter {
slice: &self.array,
num: 0,
}
}
}
Thanks to lifetime elision, the lifetimes on iter can be left out:
impl Data {
fn iter(&self) -> Iter {
Iter {
slice: &self.array,
num: 0,
}
}
}
Just a few notes. There was one compiler error with [0u8; 100]. This may have been a typo for [u8; 100], but just in case, here's why we can't do that. In the fields for a struct definition, only the types are specified. There aren't default values for the fields or anything like that. If you're trying to have a default for the struct, consider using the Default trait.
Second, you're probably aware of this, but there's already an implementation of a chunk iterator for slices. If slice is a slice (or can be deref coerced into a slice - vectors and arrays are prime examples), then slice.chunks(n) is an iterator over chunks of that slice with length n. I gave an example of this in the code linked above. Interestingly, that implementation uses a very similar idea: slice.chunks(n) returns a new struct with a lifetime parameter and implements Iterator. This is almost exactly the same as our Data::iter.
Finally, your implementation of next has a bug in it that causes an out-of-bounds panic when run. See if you can spot it!
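For comparison, here is a sketch of the built-in chunks approach from the second note, applied to the Data struct above. The method name chunks_of_ten is made up for this example. Delegating to slice::chunks also sidesteps the manual index arithmetic entirely.

```rust
struct Data {
    array: [u8; 100],
}

impl Data {
    // Returns the standard library's chunk iterator, borrowing from `self`.
    fn chunks_of_ten(&self) -> std::slice::Chunks<'_, u8> {
        self.array.chunks(10)
    }
}

// Small usage helper: the length of every chunk, in order.
fn chunk_lengths(data: &Data) -> Vec<usize> {
    data.chunks_of_ten().map(|chunk| chunk.len()).collect()
}
```

If the slice length is not a multiple of the chunk size, chunks yields a shorter final chunk; chunks_exact is the variant that skips it.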

Cannot infer an appropriate lifetime when returning a slice from an iterator

I have a Vec<Point> with a simple struct Point {x: f32, y: f32, z: f32}. My vector represents hundreds of thousands of lines in 3D (it could be a Vec<Vec<Point>> in fact), so I keep track of the start/end of all lines.
pub struct Streamlines {
lengths: Vec<usize>,
offsets: Vec<usize>, // cumulative sum of lengths
data: Vec<Point>,
}
I want to create a non-consuming iterator for it, usable like:
for streamline in &streamlines {
for point in &streamline {
println!("{} {} {}", point.x, point.y, point.z);
}
println!("")
}
I found How to implement Iterator and IntoIterator for a simple struct? and started copyi-err, adapting :)
impl IntoIterator for Streamlines {
type Item = &[Point];
type IntoIter = StreamlinesIterator;
fn into_iter(self) -> Self::IntoIter {
StreamlinesIterator {
streamlines: self,
it_idx: 0
}
}
}
struct StreamlinesIterator {
streamlines: &Streamlines,
it_idx: usize
}
impl Iterator for StreamlinesIterator {
type Item = &[Point];
fn next(&mut self) -> Option<&[Point]> {
if self.it_idx < self.streamlines.lengths.len() {
let start = self.streamlines.offsets[self.it_idx];
self.it_idx += 1;
let end = self.streamlines.offsets[self.it_idx];
Some(self.streamlines.data[start..end])
}
else {
None
}
}
}
I used slices because I only want to return parts of the vector, then I added lifetimes because it's required, but now I have this error cannot infer an appropriate lifetime for lifetime parameter in generic type due to conflicting requirements
In fact, I don't actually know what I'm doing with the damn <'a>.
You get "cannot infer an appropriate lifetime for lifetime parameter in generic type due to conflicting requirements" because you aren't correctly implementing Iterator and have something like this:
impl<'a> Iterator for StreamlinesIterator<'a> {
type Item = &'a [Point];
fn next(&mut self) -> Option<&[Point]> { /* ... */ }
// ...
}
Due to lifetime elision, this is equivalent to:
impl<'a> Iterator for StreamlinesIterator<'a> {
type Item = &'a [Point];
fn next<'b>(&'b mut self) -> Option<&'b [Point]> { /* ... */ }
// ...
}
This is attempting to return a reference that lives as long as the iterator, which you cannot do.
If you correctly implement Iterator, it works:
impl<'a> Iterator for StreamlinesIterator<'a> {
type Item = &'a [Point];
fn next(&mut self) -> Option<&'a [Point]> { /* ... */ }
// Even better:
fn next(&mut self) -> Option<Self::Item> { /* ... */ }
// ...
}
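Putting the pieces together, a complete version might look like the sketch below. It assumes that offsets starts with 0 and has one more entry than there are lines, so that line i spans offsets[i]..offsets[i + 1]. That layout is a guess based on how the question indexes offsets, and the lengths field is left out for brevity.

```rust
pub struct Point {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

pub struct Streamlines {
    // Assumption: line i occupies data[offsets[i]..offsets[i + 1]].
    offsets: Vec<usize>,
    data: Vec<Point>,
}

pub struct StreamlinesIterator<'a> {
    streamlines: &'a Streamlines,
    it_idx: usize,
}

impl<'a> Iterator for StreamlinesIterator<'a> {
    type Item = &'a [Point];

    fn next(&mut self) -> Option<Self::Item> {
        // Copy the &'a reference out of `self` so the returned slice is
        // tied to 'a rather than to this short borrow of the iterator.
        let streamlines: &'a Streamlines = self.streamlines;
        let start = *streamlines.offsets.get(self.it_idx)?;
        let end = *streamlines.offsets.get(self.it_idx + 1)?;
        self.it_idx += 1;
        Some(&streamlines.data[start..end])
    }
}

impl<'a> IntoIterator for &'a Streamlines {
    type Item = &'a [Point];
    type IntoIter = StreamlinesIterator<'a>;

    fn into_iter(self) -> Self::IntoIter {
        StreamlinesIterator { streamlines: self, it_idx: 0 }
    }
}
```

Note that IntoIterator is implemented for &'a Streamlines, not for Streamlines, which is what gives the iterator a lifetime to name and makes `for streamline in &streamlines` non-consuming.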
As for "I don't actually know what I'm doing with the damn <'a>": you should go back and re-read The Rust Programming Language, second edition. When you have specific questions, Stack Overflow, IRC, and the User's Forum will all be waiting.

How do I specify the lifetime for the associated type of an iterator that refers to itself but does not mutate itself?

I have this struct:
struct RepIter<T> {
item: T
}
I want to implement Iterator for it so that it returns a reference to its item every time:
impl<T> Iterator for RepIter<T> {
type Item = &T;
fn next(&mut self) -> Option<Self::Item> {
return Some(&self.item);
}
}
This doesn't compile since a lifetime must be specified for type Item = &T;. Searching for a way to do this I found this question. The first solution doesn't seem applicable since I'm implementing a preexisting trait. Trying to copy the second solution directly I get something like this:
impl<'a, T> Iterator for &'a RepIter<T> {
type Item = &'a T;
fn next(self) -> Option<&'a T> {
return Some(&self.item);
}
}
This doesn't work either since I need a mutable self as argument to next. The only way I was able to get it to compile was to write it like this:
impl<'a, T> Iterator for &'a RepIter<T> {
type Item = &'a T;
fn next(&mut self) -> Option<&'a T> {
return Some(&self.item);
}
}
But now self is a reference to a reference, right? I don't know how to call next on an instance of RepIter. For example, this doesn't work:
fn main() {
let mut iter: RepIter<u64> = RepIter { item: 5 };
let res = iter.next();
}
This makes me think my implementation of the trait could be written in a better way.
As discussed in the question that Shepmaster linked to, this is a bit tricky because you really want to change the type of next(), but you can't because it's part of the trait. There are a couple of approaches to solve this though.
Making minimal changes to your code, you can just use the Iterator implementation on the &'a RepIter<T>:
pub fn main() {
let mut iter = RepIter { item: 5 };
let res = (&iter).next();
}
It's a bit unpleasant though.
Another way of looking at this is to change the ownership of your item. If it was already borrowed, then you can make all the types match up nicely:
struct RepIter<'a, T: 'a> {
item: &'a T,
}
impl<'a, T> Iterator for RepIter<'a, T> {
type Item = &'a T;
fn next(&mut self) -> Option<&'a T> {
Some(&self.item)
}
}
pub fn main() {
let val: u64 = 5;
let mut iter = RepIter { item: &val };
let res = iter.next();
}
When designing an iterator, it's often useful to have distinct types for the collection and for the iterator over that collection. Usually, the collection will own the data, and the iterator will borrow from the collection. Collection types typically implement IntoIterator and don't implement Iterator. This means that creating an iterator happens in two steps: we need to create the collection first, then create the iterator from the collection.
Here's a solution that turns your RepIter type into a collection. I'll use Shepmaster's suggestion to use iter::repeat to produce the iterator.
use std::iter::{self, Repeat};
struct RepIter<T> {
item: T,
}
impl<T> RepIter<T> {
// When IntoIterator is implemented on `&Self`,
// then by convention, an inherent iter() method is provided as well.
fn iter(&self) -> Repeat<&T> {
iter::repeat(&self.item)
}
}
impl<'a, T> IntoIterator for &'a RepIter<T> {
type Item = &'a T;
type IntoIter = Repeat<&'a T>;
fn into_iter(self) -> Self::IntoIter {
self.iter()
}
}
fn main() {
let iter: RepIter<u64> = RepIter { item: 5 };
let res = iter.iter().next();
println!("{:?}", res);
let res = iter.iter().fuse().next();
println!("{:?}", res);
let res = iter.iter().by_ref().next();
println!("{:?}", res);
}
I would recommend writing your code as:
use std::iter;
fn main() {
let val = 5u64;
let mut iter = iter::repeat(&val);
let res = iter.next();
}
One thing that I don't quite understand yet is that your existing code almost works, but only for certain Iterator methods; those that take self by value instead of reference!
struct RepIter<T> {
item: T,
}
impl<'a, T> Iterator for &'a RepIter<T> {
type Item = &'a T;
fn next(&mut self) -> Option<&'a T> {
return Some(&self.item);
}
}
fn main() {
let iter: RepIter<u64> = RepIter { item: 5 };
// Works
let res = iter.fuse().next();
println!("{:?}", res);
// Doesn't work
let res = iter.by_ref().next();
println!("{:?}", res);
}
There's probably some interesting interaction happening.
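My best reading of that interaction is that it comes from method-call auto-referencing. The type that implements Iterator here is &RepIter<T>, not RepIter<T>. A method taking self by value, like fuse, needs a receiver of type &RepIter<T>, which the compiler obtains from a single automatic & on iter. A method taking &mut self, like by_ref or next, needs &mut &RepIter<T>, and method lookup does not insert two levels of reference. The sketch below holds the shared reference in its own mutable binding, so that both kinds of call resolve; demo is a hypothetical helper.

```rust
struct RepIter<T> {
    item: T,
}

impl<'a, T> Iterator for &'a RepIter<T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        // `self` is `&mut &'a RepIter<T>`; copy the inner shared reference out.
        let this: &'a RepIter<T> = *self;
        Some(&this.item)
    }
}

fn demo() -> (Option<u64>, Option<u64>) {
    let rep = RepIter { item: 5u64 };
    // The binding itself is the iterator, so it must be `mut`.
    let mut iter: &RepIter<u64> = &rep;
    // `by_ref` takes `&mut self`: auto-ref now yields `&mut &RepIter<u64>`.
    let via_by_ref = iter.by_ref().next().copied();
    // `fuse` takes `self` by value: the shared reference is simply copied.
    let via_fuse = iter.fuse().next().copied();
    (via_by_ref, via_fuse)
}
```

This is another argument for the two-type design above: implementing Iterator directly on a shared reference works, but it leads to call sites that are easy to get wrong.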
