I am trying to implement a system that would use borrow checking/lifetimes in order to provide safe custom indices on a collection. Consider the following code:
struct Graph(i32);
struct Edge<'a>(&'a Graph, i32);
impl Graph {
pub fn get_edge(&self) -> Edge {
Edge(&self, 0)
}
pub fn split(&mut self, Edge(_, edge_id): Edge) {
self.0 = self.0 + edge_id;
}
pub fn join(&mut self, Edge(_, edge0_id): Edge, Edge(_, edge1_id): Edge) {
self.0 = self.0 + edge0_id + edge1_id;
}
}
fn main() {
let mut graph = Graph(0);
let edge = graph.get_edge();
graph.split(edge)
}
References to the graph borrowed by the Edge struct should be dropped when methods such as split or join are called. This would fulfill the API invariant that all edge indices must be destroyed when the graph is mutated. However, the compiler doesn't get it. It fails with messages like
error[E0502]: cannot borrow `graph` as mutable because it is also borrowed as immutable
--> src/main.rs:23:5
|
22 | let edge = graph.get_edge();
| ----- immutable borrow occurs here
23 | graph.split(edge)
| ^^^^^ mutable borrow occurs here
24 | }
| - immutable borrow ends here
If I understand this correctly, the compiler fails to realise that the borrowing of the graph that happened in the edge struct is actually being released when the function is called. Is there a way to teach the compiler what I am trying to do here?
Bonus question: is there a way to do exactly the same but without actually borrowing the graph in the Edge struct? The edge struct is only used as a temporary for the purpose of traversal and will never be part of an external object state (I have 'weak' versions of the edge for that).
Addendum: After some digging around, it seems to be really far from trivial. First of all, Edge(_, edge_id) does not actually destructure the Edge, because _ does not get bound at all (yes, i32 is Copy which makes things even more complicated, but this is easily remedied by wrapping it into a non-Copy struct). Second, even if I completely destructure Edge (i.e. by doing it in a separate scope), the reference to the graph is still there, even though it should have been moved (this must be a bug). It only works if I perform the destructuring in a separate function. Now, I have an idea how to circumvent it (by having a separate object that describes a state change and destructures the indices as they are supplied), but this becomes very awkward very quickly.
You have a second problem that you didn’t mention: how does split know that the user didn’t pass an Edge from a different Graph? Fortunately, it’s possible to solve both problems with higher-rank trait bounds!
First, let’s have Edge carry a PhantomData marker instead of a real reference to the graph:
pub struct Edge<'a>(PhantomData<&'a mut &'a ()>, i32);
Second, let’s move all the Graph operations into a new GraphView object that gets consumed by operations that should invalidate the identifiers:
pub struct GraphView<'a> {
graph: &'a mut Graph,
marker: PhantomData<&'a mut &'a ()>,
}
impl<'a> GraphView<'a> {
pub fn get_edge(&self) -> Edge<'a> {
Edge(PhantomData, 0)
}
pub fn split(self, Edge(_, edge_id): Edge) {
self.graph.0 = self.graph.0 + edge_id;
}
pub fn join(self, Edge(_, edge0_id): Edge, Edge(_, edge1_id): Edge) {
self.graph.0 = self.graph.0 + edge0_id + edge1_id;
}
}
Now all we have to do is guard the construction of GraphView objects such that there’s never more than one with a given lifetime parameter 'a.
We can do this by (1) forcing GraphView<'a> to be invariant over 'a with a PhantomData member as above, and (2) only ever providing a constructed GraphView to a closure with a higher-rank trait bound that creates a fresh lifetime each time:
impl Graph {
pub fn with_view<Ret>(&mut self, f: impl for<'a> FnOnce(GraphView<'a>) -> Ret) -> Ret {
f(GraphView {
graph: self,
marker: PhantomData,
})
}
}
fn main() {
let mut graph = Graph(0);
graph.with_view(|view| {
let edge = view.get_edge();
view.split(edge);
});
}
Full demo on Rust Playground.
This isn’t totally ideal, since the caller may have to go through contortions to put all its operations inside the closure. But I think it’s the best we can do in the current Rust language, and it does allow us to enforce a huge class of compile-time guarantees that almost no other language can express at all. I’d love to see more ergonomic support for this pattern added to the language somehow—perhaps a way to create a fresh lifetime via a return value rather than a closure parameter (pub fn view(&mut self) -> exists<'a> GraphView<'a>)?
Related
I am trying to write a simple ECS:
struct Ecs {
component_sets: HashMap<TypeId, Box<dyn Any>>,
}
impl Ecs {
pub fn read_all<Component>(&self) -> &SparseSet<Component> {
self.component_sets
.get(&TypeId::of::<Component>())
.unwrap()
.downcast_ref::<SparseSet<Component>>()
.unwrap()
}
pub fn write_all<Component>(&mut self) -> &mut SparseSet<Component> {
self.component_sets
.get_mut(&TypeId::of::<Component>())
.unwrap()
.downcast_mut::<SparseSet<Component>>()
.unwrap()
}
}
I am trying to get mutable access to a certain component while another is immutable. This testing code triggers the error:
fn testing() {
let all_pos = { ecs.write_all::<Pos>() };
let all_vel = { ecs.read_all::<Vel>() };
for (p, v) in all_pos.iter_mut().zip(all_vel.iter()) {
p.x += v.x;
p.y += v.y;
}
}
And the error
error[E0502]: cannot borrow `ecs` as immutable because it is also borrowed as mutable
--> src\ecs.rs:191:25
|
190 | let all_pos = { ecs.write_all::<Pos>() };
| --- mutable borrow occurs here
191 | let all_vel = { ecs.read_all::<Vel>() };
| ^^^ immutable borrow occurs here
My understanding of the borrow checker rules tells me that it's totally fine to get references to different component sets mutably or immutably (that is, &mut SparseSet<Pos> and &SparseSet<Vel>) since they are two different types. In order to get these references though, I need to go through the main ECS struct which owns the sets, which is where the compiler complains (i.e. first I use &mut Ecs when I call ecs.write_all and then &Ecs on ecs.read_all).
My first instinct was to enclose the statements in a scope, thinking it could just drop the &mut Ecs after I get the reference to the inner component set so as not to have both mutable and immutable Ecs references alive at the same time. This is probably very stupid, yet I don't fully understand how, so I wouldn't mind some more explaining there.
I suspect one additional level of indirection is needed (similar to RefCell's borrow and borrow_mut) but I am not sure what exactly I should wrap and how I should go about it.
Update
Solution 1: make the method signature of write_all take a &self despite returning a RefMut<'_, SparseSet<Component>> by wrapping the SparseSet in a RefCell (as illustrated in the answer below by Kevin Reid).
Solution 2: similar as above (method signature takes &self) but uses this piece of unsafe code:
fn write_all<Component>(&self) -> &mut SparseSet<Component> {
let set = self.component_sets
.get(&TypeId::of::<Component>())
.unwrap()
.downcast_ref::<SparseSet<Component>>()
.unwrap();
unsafe {
let set_ptr = set as *const SparseSet<Component>;
let set_ptr = set_ptr as *mut SparseSet<Component>;
&mut *set_ptr
}
}
What are benefits of using solution 1, is the implied runtime borrow-checking provided by RefCell an hindrance in this case or would it actually prove useful?
Would the use of unsafe be tolerable in this case? Are there benefits? (e.g. performance)
it's totally fine to get references to different component sets mutably or immutably
This is true: we can safely have multiple mutable, or mutable and immutable references, as long as no mutable reference points to the same data as any other reference.
However, not every means of obtaining those references will be accepted by the compiler's borrow checker. This doesn't mean they're unsound; just that we haven't convinced the compiler that they're safe. In particular, the only way the compiler understands to have simultaneous references is a struct's fields, because the compiler can know those are disjoint using a purely local analysis (looking only at the code of a single function):
struct Ecs {
pub pos: SparseSet<Pos>,
pub vel: SparseSet<Vel>,
}
for (p, v) in ecs.pos.iter_mut().zip(ecs.vel.iter()) {
p.x += v.x;
p.y += v.y;
}
This would compile, because the compiler can see that the references refer to different subsets of memory. It will not compile if you replace ecs.pos with a method ecs.pos() — let alone a HashMap. As soon as you get a function involved, information about field borrowing is hidden. Your function
pub fn write_all<Component>(&mut self) -> &mut SparseSet<Component>
has the elided lifetimes (lifetimes the compiler picks for you because every & must have a lifetime)
pub fn write_all<'a, Component>(&'a mut self) -> &'a mut SparseSet<Component>
which are the only information the compiler will use about what is borrowed. Hence, the 'a mutable reference to the SparseSet is borrowing all of the Ecs (as &'a mut self) and you can't have any other access to it.
The ways to arrange to be able to have multiple mutable references in a mostly-statically-checked way are discussed in the documentation page on Borrow Splitting. However, all of those are based on having some statically known property, which you don't. There's no way to express “this is okay as long as the Component type is not equal to another call's”. Therefore, to do this you do need RefCell, our general-purpose helper for runtime borrow checking.
Given what you've got already, the simplest thing to do is to replace SparseSet<Component> with RefCell<SparseSet<Component>>:
// no mut; changed return type
pub fn write_all<Component>(&self) -> RefMut<'_, SparseSet<Component>> {
self.component_sets
.get(&TypeId::of::<Component>())
.unwrap()
.downcast::<RefCell<SparseSet<Component>>>() // changed type
.unwrap()
.borrow_mut() // added this line
}
Note the changed return type, because borrowing a RefCell must return an explicit handle in order to track the duration of the borrow. However, a Ref or RefMut acts mostly like an & or &mut thanks to deref coercion. (Your code that inserts items in the map, which you didn't show in the question, will also need a RefCell::new.)
Another option is to put the interior mutability — likely via RefCell, but not necessarily — inside the SparseSet type, or create a wrapper type that does that. This might or might not help the code be cleaner.
I have a struct and I want to call one of the struct's methods every time a mutable borrow to it has ended. To do so, I would need to know when the mutable borrow to it has been dropped. How can this be done?
Disclaimer: The answer that follows describes a possible solution, but it's not a very good one, as described by this comment from Sebastien Redl:
[T]his is a bad way of trying to maintain invariants. Mostly because dropping the reference can be suppressed with mem::forget. This is fine for RefCell, where if you don't drop the ref, you will simply eventually panic because you didn't release the dynamic borrow, but it is bad if violating the "fraction is in shortest form" invariant leads to weird results or subtle performance issues down the line, and it is catastrophic if you need to maintain the "thread doesn't outlive variables in the current scope" invariant.
Nevertheless, it's possible to use a temporary struct as a "staging area" that updates the referent when it's dropped, and thus maintain the invariant correctly; however, that version basically amounts to making a proper wrapper type and a kind of weird way to use it. The best way to solve this problem is through an opaque wrapper struct that doesn't expose its internals except through methods that definitely maintain the invariant.
Without further ado, the original answer:
Not exactly... but pretty close. We can use RefCell<T> as a model for how this can be done. It's a bit of an abstract question, but I'll use a concrete example to demonstrate. (This won't be a complete example, but something to show the general principles.)
Let's say you want to make a Fraction struct that is always in simplest form (fully reduced, e.g. 3/5 instead of 6/10). You write a struct RawFraction that will contain the bare data. RawFraction instances are not always in simplest form, but they have a method fn reduce(&mut self) that reduces them.
Now you need a smart pointer type that you will always use to mutate the RawFraction, which calls .reduce() on the pointed-to struct when it's dropped. Let's call it RefMut, because that's the naming scheme RefCell uses. You implement Deref<Target = RawFraction>, DerefMut, and Drop on it, something like this:
pub struct RefMut<'a>(&'a mut RawFraction);
impl<'a> Deref for RefMut<'a> {
type Target = RawFraction;
fn deref(&self) -> &RawFraction {
self.0
}
}
impl<'a> DerefMut for RefMut<'a> {
fn deref_mut(&mut self) -> &mut RawFraction {
self.0
}
}
impl<'a> Drop for RefMut<'a> {
fn drop(&mut self) {
self.0.reduce();
}
}
Now, whenever you have a RefMut to a RawFraction and drop it, you know the RawFraction will be in simplest form afterwards. All you need to do at this point is ensure that RefMut is the only way to get &mut access to the RawFraction part of a Fraction.
pub struct Fraction(RawFraction);
impl Fraction {
pub fn new(numerator: i32, denominator: i32) -> Self {
// create a RawFraction, reduce it and wrap it up
}
pub fn borrow_mut(&mut self) -> RefMut {
RefMut(&mut self.0)
}
}
Pay attention to the pub markings (and lack thereof): I'm using those to ensure the soundness of the exposed interface. All three types should be placed in a module by themselves. It would be incorrect to mark the RawFraction field pub inside Fraction, since then it would be possible (for code outside the module) to create an unreduced Fraction without using new or get a &mut RawFraction without going through RefMut.
Supposing all this code is placed in a module named frac, you can use it something like this (assuming Fraction implements Display):
let f = frac::Fraction::new(3, 10);
println!("{}", f); // prints 3/10
f.borrow_mut().numerator += 3;
println!("{}", f); // prints 3/5
The types encode the invariant: Wherever you have Fraction, you can know that it's fully reduced. When you have a RawFraction, &RawFraction, etc., you can't be sure. If you want, you may also make RawFraction's fields non-pub, so that you can't get an unreduced fraction at all except by calling borrow_mut on a Fraction.
Basically the same thing is done in RefCell. There you want to reduce the runtime borrow-count when a borrow ends. Here you want to perform an arbitrary action.
So let's re-use the concept of writing a function that returns a wrapped reference:
struct Data {
content: i32,
}
impl Data {
fn borrow_mut(&mut self) -> DataRef {
println!("borrowing");
DataRef { data: self }
}
fn check_after_borrow(&self) {
if self.content > 50 {
println!("Hey, content should be <= {:?}!", 50);
}
}
}
struct DataRef<'a> {
data: &'a mut Data
}
impl<'a> Drop for DataRef<'a> {
fn drop(&mut self) {
println!("borrow ends");
self.data.check_after_borrow()
}
}
fn main() {
let mut d = Data { content: 42 };
println!("content is {}", d.content);
{
let b = d.borrow_mut();
//let c = &d; // Compiler won't let you have another borrow at the same time
b.data.content = 123;
println!("content set to {}", b.data.content);
} // borrow ends here
println!("content is now {}", d.content);
}
This results in the following output:
content is 42
borrowing
content set to 123
borrow ends
Hey, content should be <= 50!
content is now 123
Be aware that you can still obtain an unchecked mutable borrow with e.g. let c = &mut d;. This will be silently dropped without calling check_after_borrow.
I ran into a problem that simplifies into the following:
struct MyIter {
vec: Vec<i8>,
}
fn fill_with_useful_data(v: &mut Vec<i8>) {
/* ... */
}
impl<'a> Iterator for MyIter {
type Item = &'a [i8];
fn next(&mut self) -> Option<&'a [i8]> {
fill_with_useful_data(&mut self.vec);
Some(&self.vec)
}
}
fn main() {
for slice in (MyIter { vec: Vec::new() }) {
println!("{}", slice);
}
}
This generates the error:
error[E0207]: the lifetime parameter `'a` is not constrained by the impl trait, self type, or predicates
--> src/main.rs:9:6
|
9 | impl<'a> Iterator for MyIter {
| ^^ unconstrained lifetime parameter
The idea is that the iterator does a bunch of work that reflects in its fields and at each step, it yields a reference into itself to the calling code. In this case I could model it as yielding a copy of the state instead of the reference, but let's pretend that's not possible or just inconveniently expensive.
Intuitively this shouldn't be a problem because the borrow checker can ensure that .next() isn't called again while the yielded reference can still be used to inspect the iterator's state, but the Iterator trait doesn't seem to provide for that sort of thing directly. Even with some permutations like only holding on to a reference to the vector in the iterator itself or making the iterator a reference or something to get the lifetimes baked into the type earlier on, I can't get anything past the borrow checker.
I read the "Iterators yielding mutable references" blogpost but I'm not sure if/how it applies to my problem that doesn't involve mutable references.
This is not possible. If it were allowed one could call next again and thus modify data that is also visible via & or even invalidate the reference entirely. This is because there is no connection between the self object itself and the returned reference: there is no explicit lifetime linking them.
For the compiler to reason about this and allow returning a reference into self next needs a signature like
fn next(&'a mut self) -> Option<&'a [i8]>
However, this differs from the signature of the trait which is not allowed as generic code that just takes an T: Iterator<...> cannot tell that there are different requirements on the use of the return value for some T; all have to be handled identically.
The Iterator trait is designed for return values that are independent of the iterator object, which is necessary for iterator adaptors like .collect to be correct and safe. This is more restrictive than necessary for many uses (e.g. a transient use inside a for loop) but it is just how it is at the moment. I don't think we have the tools for generalising this trait/the for loop properly now (specifically, I think we need associated types with higher rank lifetimes), but maybe in the future.
So I ran into this code snippet showing how to create "unmoveable" types in Rust - moves are prevented because the compiler treats the object as borrowed for its whole lifetime.
use std::cell::Cell;
use std::marker;
struct Unmovable<'a> {
lock: Cell<marker::ContravariantLifetime<'a>>,
marker: marker::NoCopy
}
impl<'a> Unmovable<'a> {
fn new() -> Unmovable<'a> {
Unmovable {
lock: Cell::new(marker::ContravariantLifetime),
marker: marker::NoCopy
}
}
fn lock(&'a self) {
self.lock.set(marker::ContravariantLifetime);
}
fn new_in(self_: &'a mut Option<Unmovable<'a>>) {
*self_ = Some(Unmovable::new());
self_.as_ref().unwrap().lock();
}
}
fn main(){
let x = Unmovable::new();
x.lock();
// error: cannot move out of `x` because it is borrowed
// let z = x;
let mut y = None;
Unmovable::new_in(&mut y);
// error: cannot move out of `y` because it is borrowed
// let z = y;
assert_eq!(std::mem::size_of::<Unmovable>(), 0)
}
I don't yet understand how this works. My guess is that the lifetime of the borrow-pointer argument is forced to match the lifetime of the lock field. The weird thing is, this code continues working in the same way if:
I change ContravariantLifetime<'a> to CovariantLifetime<'a>, or to InvariantLifetime<'a>.
I remove the body of the lock method.
But, if I remove the Cell, and just use lock: marker::ContravariantLifetime<'a> directly, as so:
use std::marker;
struct Unmovable<'a> {
lock: marker::ContravariantLifetime<'a>,
marker: marker::NoCopy
}
impl<'a> Unmovable<'a> {
fn new() -> Unmovable<'a> {
Unmovable {
lock: marker::ContravariantLifetime,
marker: marker::NoCopy
}
}
fn lock(&'a self) {
}
fn new_in(self_: &'a mut Option<Unmovable<'a>>) {
*self_ = Some(Unmovable::new());
self_.as_ref().unwrap().lock();
}
}
fn main(){
let x = Unmovable::new();
x.lock();
// does not error?
let z = x;
let mut y = None;
Unmovable::new_in(&mut y);
// does not error?
let z = y;
assert_eq!(std::mem::size_of::<Unmovable>(), 0)
}
Then the "Unmoveable" object is allowed to move. Why would that be?
The true answer is comprised of a moderately complex consideration of lifetime variancy, with a couple of misleading aspects of the code that need to be sorted out.
For the code below, 'a is an arbitrary lifetime, 'small is an arbitrary lifetime that is smaller than 'a (this can be expressed by the constraint 'a: 'small), and 'static is used as the most common example of a lifetime that is larger than 'a.
Here are the facts and steps to follow in the consideration:
Normally, lifetimes are contravariant; &'a T is contravariant with regards to 'a (as is T<'a> in the absence of any variancy markers), meaning that if you have a &'a T, it’s OK to substitute a longer lifetime than 'a, e.g. you can store in such a place a &'static T and treat it as though it were a &'a T (you’re allowed to shorten the lifetime).
In a few places, lifetimes can be invariant; the most common example is &'a mut T which is invariant with regards to 'a, meaning that if you have a &'a mut T, you cannot store a &'small mut T in it (the borrow doesn’t live long enough), but you also cannot store a &'static mut T in it, because that would cause trouble for the reference being stored as it would be forgotten that it actually lived for longer, and so you could end up with multiple simultaneous mutable references being created.
A Cell contains an UnsafeCell; what isn’t so obvious is that UnsafeCell is magic, being wired to the compiler for special treatment as the language item named “unsafe”. Importantly, UnsafeCell<T> is invariant with regards to T, for similar sorts of reasons to the invariance of &'a mut T with regards to 'a.
Thus, Cell<any lifetime variancy marker> will actually behave the same as Cell<InvariantLifetime<'a>>.
Furthermore, you don’t actually need to use Cell any more; you can just use InvariantLifetime<'a>.
Returning to the example with the Cell wrapping removed and a ContravariantLifetime (actually equivalent to just defining struct Unmovable<'a>;, for contravariance is the default as is no Copy implementation): why does it allow moving the value? … I must confess, I don’t grok this particular case yet and would appreciate some help myself in understanding why it’s allowed. It seems back to front, that covariance would allow the lock to be shortlived but that contravariance and invariance wouldn’t, but in practice it seems that only invariance is performing the desired function.
Anyway, here’s the final result. Cell<ContravariantLifetime<'a>> is changed to InvariantLifetime<'a> and that’s the only functional change, making the lock method function as desired, taking a borrow with an invariant lifetime. (Another solution would be to have lock take &'a mut self, for a mutable reference is, as already discussed, invariant; this is inferior, however, as it requires needless mutability.)
One other thing that needs mentioning: the contents of the lock and new_in methods are completely superfluous. The body of a function will never change the static behaviour of the compiler; only the signature matters. The fact that the lifetime parameter 'a is marked invariant is the key point. So the whole “construct an Unmovable object and call lock on it” part of new_in is completely superfluous. Similarly setting the contents of the cell in lock was a waste of time. (Note that it is again the invariance of 'a in Unmovable<'a> that makes new_in work, not the fact that it is a mutable reference.)
use std::marker;
struct Unmovable<'a> {
lock: marker::InvariantLifetime<'a>,
}
impl<'a> Unmovable<'a> {
fn new() -> Unmovable<'a> {
Unmovable {
lock: marker::InvariantLifetime,
}
}
fn lock(&'a self) { }
fn new_in(_: &'a mut Option<Unmovable<'a>>) { }
}
fn main() {
let x = Unmovable::new();
x.lock();
// This is an error, as desired:
let z = x;
let mut y = None;
Unmovable::new_in(&mut y);
// Yay, this is an error too!
let z = y;
}
An interesting problem! Here's my understanding of it...
Here's another example that doesn't use Cell:
#![feature(core)]
use std::marker::InvariantLifetime;
struct Unmovable<'a> { //'
lock: Option<InvariantLifetime<'a>>, //'
}
impl<'a> Unmovable<'a> {
fn lock_it(&'a mut self) { //'
self.lock = Some(InvariantLifetime)
}
}
fn main() {
let mut u = Unmovable { lock: None };
u.lock_it();
let v = u;
}
(Playpen)
The important trick here is that the structure needs to borrow itself. Once we have done that, it can no longer be moved because any move would invalidate the borrow. This isn't conceptually different from any other kind of borrow:
struct A(u32);
fn main() {
let a = A(42);
let b = &a;
let c = a;
}
The only thing is that you need some way of letting the struct contain its own reference, which isn't possible to do at construction time. My example uses Option, which requires &mut self and the linked example uses Cell, which allows for interior mutability and just &self.
Both examples use a lifetime marker because it allows the typesystem to track the lifetime without needing to worry about a particular instance.
Let's look at your constructor:
fn new() -> Unmovable<'a> { //'
Unmovable {
lock: marker::ContravariantLifetime,
marker: marker::NoCopy
}
}
Here, the lifetime put into lock is chosen by the caller, and it ends up being the normal lifetime of the Unmovable struct. There's no borrow of self.
Let's next look at your lock method:
fn lock(&'a self) {
}
Here, the compiler knows that the lifetime won't change. However, if we make it mutable:
fn lock(&'a mut self) {
}
Bam! It's locked again. This is because the compiler knows that the internal fields could change. We can actually apply this to our Option variant and remove the body of lock_it!
I ran into a problem that simplifies into the following:
struct MyIter {
vec: Vec<i8>,
}
fn fill_with_useful_data(v: &mut Vec<i8>) {
/* ... */
}
impl<'a> Iterator for MyIter {
type Item = &'a [i8];
fn next(&mut self) -> Option<&'a [i8]> {
fill_with_useful_data(&mut self.vec);
Some(&self.vec)
}
}
fn main() {
for slice in (MyIter { vec: Vec::new() }) {
println!("{}", slice);
}
}
This generates the error:
error[E0207]: the lifetime parameter `'a` is not constrained by the impl trait, self type, or predicates
--> src/main.rs:9:6
|
9 | impl<'a> Iterator for MyIter {
| ^^ unconstrained lifetime parameter
The idea is that the iterator does a bunch of work that reflects in its fields and at each step, it yields a reference into itself to the calling code. In this case I could model it as yielding a copy of the state instead of the reference, but let's pretend that's not possible or just inconveniently expensive.
Intuitively this shouldn't be a problem because the borrow checker can ensure that .next() isn't called again while the yielded reference can still be used to inspect the iterator's state, but the Iterator trait doesn't seem to provide for that sort of thing directly. Even with some permutations like only holding on to a reference to the vector in the iterator itself or making the iterator a reference or something to get the lifetimes baked into the type earlier on, I can't get anything past the borrow checker.
I read the "Iterators yielding mutable references" blogpost but I'm not sure if/how it applies to my problem that doesn't involve mutable references.
This is not possible. If it were allowed one could call next again and thus modify data that is also visible via & or even invalidate the reference entirely. This is because there is no connection between the self object itself and the returned reference: there is no explicit lifetime linking them.
For the compiler to reason about this and allow returning a reference into self next needs a signature like
fn next(&'a mut self) -> Option<&'a [i8]>
However, this differs from the signature of the trait which is not allowed as generic code that just takes an T: Iterator<...> cannot tell that there are different requirements on the use of the return value for some T; all have to be handled identically.
The Iterator trait is designed for return values that are independent of the iterator object, which is necessary for iterator adaptors like .collect to be correct and safe. This is more restrictive than necessary for many uses (e.g. a transient use inside a for loop) but it is just how it is at the moment. I don't think we have the tools for generalising this trait/the for loop properly now (specifically, I think we need associated types with higher rank lifetimes), but maybe in the future.