Can you control borrowing a struct vs borrowing a field? - rust

I'm working on a program involving a struct along these lines:
struct App {
data: Vec<u8>,
overlay: Vec<(usize, Vec<u8>)>,
sink: Sink,
}
In brief the data field holds some bytes and overlay is a series of byte sequences to be inserted at specific indices. The Sink type is unimportant except that it has a function like:
impl Sink {
fn process<'a>(&mut self, input: Vec<&'a [u8]>) {
// ...
}
}
I've implemented an iterator to merge the information from data and overlay for consumption by Sink.
struct MergeIter<'a, 'b> {
data: &'a Vec<u8>,
overlay: &'b Vec<(usize, Vec<u8>)>,
// iterator state etc.
}
impl<'a, 'b> Iterator for MergeIter<'a, 'b> {
type Item = &'a [u8];
// ...
}
This is I think a slight lie, because the lifetime of each &[u8] returned by the iterator isn't always that of the original data. The data inserted from overlay has a different lifetime, but I don't see how I can annotate this more accurately. Anyway, the borrow checker doesn't seem to mind - the following approach works:
fn merge<'a, 'b>(data: &'a Vec<u8>, overlay: &'b Vec<(usize, Vec<u8>)>, start: usize) -> Vec<&'a [u8]> {
MergeIter::new(data, overlay, start).collect()
}
impl App {
fn process(&mut self) {
let merged = merge(&self.data, &self.overlay, 0);
// inspect contents of 'merged'
self.sink.process(merged);
}
}
I end up using this merge function all over the place, but always against the same data/overlay. So I figure I'll add an App::merge function for convenience, and here's where the problem begins:
impl App {
fn merge<'a>(&'a self, start: usize) -> Vec<&'a [u8]> {
MergeIter::new(&self.data, &self.overlay, start).collect()
}
fn process(&mut self) {
let merged = self.merge(0);
// inspect contents of 'merged'
self.sink.process(merged);
}
}
App::process now fails to pass the borrow checker - it refuses to allow the mutable borrow of self.sink while self is borrowed.
I've wrestled with this for some time, and if I've understood correctly the problem isn't with process but with this signature:
fn merge<'a>(&'a self, start: usize) -> Vec<&'a [u8]> {
Here I've essentially told the borrow checker that the references returned in the vector are equivalent to the self borrow.
Even though I feel like I've now understood the problem, I still feel like my hands are tied. Leaving the lifetime annotations out doesn't help (because the compiler does the equivalent?), and with only the two references involved there's no way I can see to tell rust that the output reference has a lifetime bound to something else.
I also tried this:
fn merge<'a, 'b>(&'b self, start: usize) -> Vec<&'a [u8]> {
let data: &'a Vec<u8> = &self.data;
MergeIter::new(&self.data, &self.overlay, start).collect()
}
but the compiler complains about the let statement ("unable to infer appropriate lifetime due to conflicting requirements" -- I also find it infuriating that the compiler doesn't explain said requirements).
Is it possible to achieve this? The Rust Reference is kind of light on lifetime annotations and associated syntax.
rustc 1.0.0-nightly (706be5ba1 2015-02-05 23:14:28 +0000)

As long as the method merge takes &self, you cannot accomplish what you desire: a method borrows each of its arguments in full, and this cannot be altered.
The solution is to change it so that it doesn’t take self, but instead takes the individual fields you wish to be borrowed:
impl App {
...
fn merge(data: &Vec<u8>, overlay: &Vec<(usize, Vec<u8>)>, start: usize) -> Vec<&[u8]> {
MergeIter::new(data, overlay, start).collect()
}
fn process(&mut self) {
let merged = Self::merge(&self.data, &self.overlay, 0);
... // inspect contents of 'merged'
self.sink.process(merged);
}
}
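For reference, here is a minimal self-contained sketch of that pattern. The body of Sink::process and the loop inside merge are placeholders I made up so the example compiles (the real code would use MergeIter); the loop also assumes overlay is sorted by index. Only the shape of the signatures matters.

```rust
struct Sink {
    // Hypothetical state: total number of bytes seen so far.
    total: usize,
}

impl Sink {
    fn process(&mut self, input: Vec<&[u8]>) {
        self.total += input.iter().map(|s| s.len()).sum::<usize>();
    }
}

struct App {
    data: Vec<u8>,
    overlay: Vec<(usize, Vec<u8>)>,
    sink: Sink,
}

impl App {
    // Associated function (no self): it borrows only the two fields passed in.
    // Placeholder for MergeIter; assumes overlay is sorted by index.
    fn merge<'a>(
        data: &'a [u8],
        overlay: &'a [(usize, Vec<u8>)],
        start: usize,
    ) -> Vec<&'a [u8]> {
        let mut out = Vec::new();
        let mut pos = start;
        for (idx, bytes) in overlay {
            if *idx < start || *idx > data.len() {
                continue;
            }
            if *idx > pos {
                out.push(&data[pos..*idx]);
            }
            out.push(bytes.as_slice());
            pos = *idx;
        }
        if pos < data.len() {
            out.push(&data[pos..]);
        }
        out
    }

    fn process(&mut self) {
        // Shared borrows of self.data and self.overlay only...
        let merged = Self::merge(&self.data, &self.overlay, 0);
        // ...so a mutable borrow of self.sink is still allowed.
        self.sink.process(merged);
    }
}
```

Because process names the fields directly, the borrow checker can see that the shared borrows and the mutable borrow touch disjoint parts of self.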

Yes, you've guessed correctly. The error happens because when merge accepts &self, the compiler can't know at the call site that it uses only some of the fields. The signature of merge only says that the returned data is somehow derived from self, not how, so the compiler assumes the "worst" case and prevents you from accessing the other fields of self.
I'm afraid there is no way to fix this at the moment, and I'm not sure there ever will be one. However, you can use a macro to shorten the merge invocations:
macro_rules! merge {
($this:ident, $start:expr) => {
MergeIter::new(&$this.data, &$this.overlay, $start).collect()
}
}
fn process(&mut self) {
let merged = merge!(self, 0);
// inspect contents of 'merged'
self.sink.process(merged);
}
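A compilable sketch of the macro approach, under some assumptions: merge_parts is a made-up stand-in for MergeIter::new(..).collect(), and Sink just counts slices. The point is that the macro expands in place, so the borrow checker sees borrows of self.data and self.overlay rather than of self as a whole.

```rust
struct Sink {
    // Hypothetical state: number of slices received.
    seen: usize,
}

impl Sink {
    fn process(&mut self, input: Vec<&[u8]>) {
        self.seen += input.len();
    }
}

struct App {
    data: Vec<u8>,
    overlay: Vec<(usize, Vec<u8>)>,
    sink: Sink,
}

// Made-up stand-in for MergeIter::new(data, overlay, start).collect():
// returns the tail of data followed by every overlay chunk.
fn merge_parts<'a>(
    data: &'a [u8],
    overlay: &'a [(usize, Vec<u8>)],
    start: usize,
) -> Vec<&'a [u8]> {
    let start = start.min(data.len());
    let mut out = vec![&data[start..]];
    out.extend(overlay.iter().map(|(_, bytes)| bytes.as_slice()));
    out
}

// Expands at the call site, so only the named fields are borrowed.
macro_rules! merge {
    ($this:ident, $start:expr) => {
        merge_parts(&$this.data, &$this.overlay, $start)
    };
}

impl App {
    fn process(&mut self) {
        let merged = merge!(self, 0);
        // self.sink is not covered by the borrows above.
        self.sink.process(merged);
    }
}
```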

Related

Lifetime parameters in a trait

I'm having difficulties understanding lifetime parameters in the following code snippet.
struct C {
data: Vec<u32>,
cols: usize
}
trait M<'s> {
fn get(&'s self, r: usize, c: usize) -> u32;
fn get_mut(&'s mut self, r: usize, c: usize) -> &'s mut u32;
}
impl<'s> M<'s> for C {
fn get(&'s self, r: usize, c: usize) -> u32 {
return self.data[self.cols*r+c];
}
fn get_mut(&'s mut self, r: usize, c: usize) -> &'s mut u32 {
return &mut self.data[self.cols*r+c];
}
}
#[cfg(test)]
mod tests {
use super::*;
fn create() -> C {
let data = vec![0u32,1u32,2u32,3u32,4u32,5u32];
return C{data, cols: 3};
}
fn select<'s, 'r: 's>(data: &'r mut dyn M<'s>) {
let mut _val: u32 = 0;
for r in 0..2 {
for c in 0..3 {
_val += *data.get_mut(r,c);
}
}
}
#[test]
fn test_select() {
let mut data = create();
select(&mut data);
}
}
The code snippet does not compile: the compiler complains that *data is mutably borrowed more than once in fn select<'s, 'r: 's>(data: &'r mut dyn M<'s>) when get_mut is called (once in every loop iteration). Even wrapping the offending line in curly braces (and thus creating a new scope) does not help. My expectation (in both cases) would be that the mutable borrow of data ends right after that line executes.
On the other hand, when I remove all lifetime parameters, everything works as expected.
Can anyone explain what's the difference between the two versions (with and without explicit lifetimes)?
I've also tried to find information about additional lifetime parameters for traits, in particular specifying their meaning, but I have found none. So I assume, that they are just a declaration of the used labels inside the trait. But if that is so, then I would assume that leaving out the lifetime parameters completely and applying the eliding rules would lead to the same result.
There are two things to consider. The first is that when you use a generic lifetime for a function, that lifetime must outlive the function call, simply by construction. The second is that since the lifetime of self is tied to the lifetime parameter of the trait, when you call .get_mut(), data is borrowed for the whole of 's. Combining those two principles, data is borrowed for longer than the function call, so you can't call it again (it's already mutably borrowed).
On the other hand, when I remove all lifetime parameters, everything works as expected. Can anyone explain what's the difference between the two versions (with and without explicit lifetimes)?
Without a generic lifetime on M, the methods will behave as if defined as so:
impl M for C {
fn get<'a>(&'a self, r: usize, c: usize) -> u32 {
return self.data[self.cols * r + c];
}
fn get_mut<'a>(&'a mut self, r: usize, c: usize) -> &'a mut u32 {
return &mut self.data[self.cols * r + c];
}
}
Thus there is no lifetime associated with the trait; the lifetimes taken and returned by each method are generic only to that method call. Since the compiler can choose a new lifetime 'a for each call, and it will always pick the shortest lifetime that satisfies its usage, you can call data.get_mut() multiple times without worry. And I'll be honest, having the lifetime on the trait didn't make much sense with the original code; as mentioned, the code works with all lifetime annotations removed: playground.
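To make that concrete, here is a small sketch of the version without any lifetime on the trait. The select function below is my own variation on the one in the question (it increments each cell and returns a sum so the effect is observable); it calls get_mut in a loop without complaint because each call gets its own short borrow.

```rust
struct C {
    data: Vec<u32>,
    cols: usize,
}

// No lifetime parameter on the trait: each method call picks its own lifetime.
trait M {
    fn get(&self, r: usize, c: usize) -> u32;
    fn get_mut(&mut self, r: usize, c: usize) -> &mut u32;
}

impl M for C {
    fn get(&self, r: usize, c: usize) -> u32 {
        self.data[self.cols * r + c]
    }

    fn get_mut(&mut self, r: usize, c: usize) -> &mut u32 {
        &mut self.data[self.cols * r + c]
    }
}

// Increments every cell of a 2x3 grid and returns the sum of the new values.
fn select(data: &mut dyn M) -> u32 {
    let mut val = 0;
    for r in 0..2 {
        for c in 0..3 {
            // The mutable borrow ends at the end of this statement.
            *data.get_mut(r, c) += 1;
            val += data.get(r, c);
        }
    }
    val
}
```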

Copying local variable into vector in Rust

I'm new to Rust, and I'm trying to copy a local variable into a vector. Here's my attempt:
#[derive(Copy, Clone)]
struct DFAItem<'a> {
reading: usize,
production: &'a grammar::CFGProduction<'a>,
next_terminal: i32,
}
fn add_nonterminal<'a>(cfg: &'a grammar::CFG, nonterminal: usize, itemset: &'a mut Vec<DFAItem>) {
let productions = &cfg.productions[nonterminal];
for prod in productions {
let item = DFAItem {
reading: 0,
production: prod,
next_terminal: 0,
};
itemset.push(item); //here, I get a lifetime error (lifetime 'a required).
match prod.rhs[0] {
grammar::Symbol::Nonterminal(x) if x != nonterminal => add_nonterminal(cfg, x, itemset),
_ => (),
}
}
}
I understand that I can't modify the lifetime of item to make it match itemset, so what I'm trying to do is copy item into the vector, so that would have the vector's lifetime. Any help/tips would be appreciated.
Also, anybody know the syntax so that I could change cfg to have at least as long of a lifetime as itemset instead of the same? Would I just declare a second lifetime or is there a better way to do it?
EDIT: here are the definitions of CFG and CFGProduction:
pub enum Symbol {
Terminal(i32),
Nonterminal(usize),
}
pub struct CFGProduction<'a> {
pub nonterminal: usize,
pub rhs: &'a Vec<Symbol>,
}
pub struct CFG<'a> {
pub terminals: Vec<i32>,
pub productions: Vec<Vec<CFGProduction<'a>>>,
}
First, the lifetime of the itemset vec is not relevant and doesn't need to be constrained to anything. Second, CFG and DFAItem have generic lifetime parameters, so they should be indicated as such when using them in function arguments.
Here's my take: there are two big lifetimes involved here:
'a: the lifetime needed by the CFGProductions
'b: the lifetime of cfg and its subsequent references stored in DFAItems
Therefore, DFAItem should have two lifetimes:
struct DFAItem<'a, 'b> {
// ...
production: &'b grammar::CFGProduction<'a>,
// ...
}
and add_nonterminal()'s signature would look like so:
fn add_nonterminal<'a, 'b>(cfg: &'b grammar::CFG<'a>, nonterminal: usize, itemset: &mut Vec<DFAItem<'a, 'b>>) {
// ...
}
With these lifetime changes, the function body compiles as is. See it on the playground.
You can choose not to do that and just use 'a for everything:
struct DFAItem<'a> {
// ...
production: &'a grammar::CFGProduction<'a>,
// ...
}
fn add_nonterminal<'a>(cfg: &'a grammar::CFG<'a>, nonterminal: usize, itemset: &mut Vec<DFAItem<'a>>) {
// ...
}
but I'd advise against it. Types with the pattern &'a Type<'a>, where the generic lifetime is linked with itself, can cause problems down the line, especially with mutability.
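Putting the two-lifetime version together in one place, as a sketch: the grammar types are inlined here rather than living in a grammar module, and the function body is the one from the question with the loop variable borrowed directly.

```rust
pub enum Symbol {
    Terminal(i32),
    Nonterminal(usize),
}

pub struct CFGProduction<'a> {
    pub nonterminal: usize,
    pub rhs: &'a Vec<Symbol>,
}

pub struct CFG<'a> {
    pub terminals: Vec<i32>,
    pub productions: Vec<Vec<CFGProduction<'a>>>,
}

#[derive(Copy, Clone)]
struct DFAItem<'a, 'b> {
    reading: usize,
    // 'b: how long the CFG itself is borrowed.
    // 'a: the lifetime carried by the productions inside it.
    production: &'b CFGProduction<'a>,
    next_terminal: i32,
}

// The borrow of itemset gets its own anonymous lifetime; it is not tied to cfg.
fn add_nonterminal<'a, 'b>(
    cfg: &'b CFG<'a>,
    nonterminal: usize,
    itemset: &mut Vec<DFAItem<'a, 'b>>,
) {
    for prod in &cfg.productions[nonterminal] {
        itemset.push(DFAItem {
            reading: 0,
            production: prod,
            next_terminal: 0,
        });
        match prod.rhs[0] {
            Symbol::Nonterminal(x) if x != nonterminal => add_nonterminal(cfg, x, itemset),
            _ => (),
        }
    }
}
```

Note that the recursion only guards against a production referring to its own nonterminal, exactly as in the question; mutually recursive grammars would need a visited set.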

How do I create mutable iterator over struct fields

So I am working on a little NES emulator using Rust and I am trying to be fancy with my status register. The register is a struct that holds some fields (flags), each containing a bool, and the register itself is part of a CPU struct. Now, I want to loop through these fields and set the bool values based on some instruction I execute. However, I am not able to implement a mutable iterator. I've implemented an into_iter() function and am able to iterate through the fields to get/print a bool value, but how do I mutate these values within the struct itself? Is this even possible?
pub struct StatusRegister {
CarryFlag: bool,
ZeroFlag: bool,
OverflowFlag: bool,
}
impl StatusRegister {
fn new() -> Self {
StatusRegister {
CarryFlag: true,
ZeroFlag: false,
OverflowFlag: true,
}
}
}
impl<'a> IntoIterator for &'a StatusRegister {
type Item = bool;
type IntoIter = StatusRegisterIterator<'a>;
fn into_iter(self) -> Self::IntoIter {
StatusRegisterIterator {
status: self,
index: 0,
}
}
}
pub struct StatusRegisterIterator<'a> {
status: &'a StatusRegister,
index: usize,
}
impl<'a> Iterator for StatusRegisterIterator<'a> {
type Item = bool;
fn next(&mut self) -> Option<bool> {
let result = match self.index {
0 => self.status.CarryFlag,
1 => self.status.ZeroFlag,
2 => self.status.OverflowFlag,
_ => return None,
};
self.index += 1;
Some(result)
}
}
pub struct CPU {
pub memory: [u8; 0xffff],
pub status: StatusRegister,
}
impl CPU {
pub fn new() -> CPU {
let memory = [0; 0xFFFF];
CPU {
memory,
status: StatusRegister::new(),
}
}
fn execute(&mut self) {
let mut shifter = 0b1000_0000;
for status in self.status.into_iter() {
// mutate status here!
println!("{}", status);
shifter <<= 1;
}
}
}
fn main() {
let mut cpu = CPU::new();
cpu.execute();
}
Implementing an iterator over mutable references is hard in general. It becomes unsound if the iterator ever returns references to the same element twice. That means that if you want to write one in purely safe code, you have to somehow convince the compiler that each element is only visited once. That rules out simply using an index: you could always forget to increment the index or set it somewhere and the compiler wouldn't be able to reason about it.
One possible way around is chaining together several std::iter::onces (one for each reference you want to iterate over).
For example,
impl StatusRegister {
fn iter_mut(&mut self) -> impl Iterator<Item = &mut bool> {
use std::iter::once;
once(&mut self.CarryFlag)
.chain(once(&mut self.ZeroFlag))
.chain(once(&mut self.OverflowFlag))
}
}
(playground)
Upsides:
Fairly simple to implement.
No allocations.
No external dependencies.
Downsides:
The iterator has a very complicated type: std::iter::Chain<std::iter::Chain<std::iter::Once<&mut bool>, std::iter::Once<&mut bool>>, std::iter::Once<&mut bool>>.
So if you don't want to use impl Iterator<Item = &mut bool>, you'll have to have that in your code. That includes implementing IntoIterator for &mut StatusRegister, since you'd have to explicitly indicate what the IntoIter type is.
Another approach is using an array or Vec to hold all the mutable references (with the correct lifetime) and then delegate to its iterator implementation to get the values. For example,
impl StatusRegister {
fn iter_mut(&mut self) -> std::vec::IntoIter<&mut bool> {
vec![
&mut self.CarryFlag,
&mut self.ZeroFlag,
&mut self.OverflowFlag,
]
.into_iter()
}
}
(playground)
Upsides:
The type is the much more manageable std::vec::IntoIter<&mut bool>.
Still fairly simple to implement.
No external dependencies.
Downsides:
Requires an allocation every time iter_mut is called.
I also mentioned using an array. That would avoid the allocation, but when this was written arrays did not implement a by-value iterator, so the above code with a [&mut bool; 3] instead of a Vec<&mut bool> didn't work (by-value IntoIterator for arrays was stabilized later, in Rust 1.53). There are also crates that implement this functionality for fixed-length arrays of limited size, e.g. arrayvec (or array_vec).
Upsides:
No allocation.
Simple iterator type.
Simple to implement.
Downsides:
External dependency.
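On newer compilers the array route works without a crate. A minimal sketch, assuming Rust 1.53 or later; I've renamed the fields to snake_case here, which is my choice and not the question's naming. Calling IntoIterator::into_iter explicitly, rather than using method syntax, gives the by-value array iterator regardless of edition.

```rust
pub struct StatusRegister {
    carry_flag: bool,
    zero_flag: bool,
    overflow_flag: bool,
}

impl StatusRegister {
    // std::array::IntoIter yields the array's elements by value,
    // which here means it yields each &mut bool exactly once.
    fn iter_mut(&mut self) -> std::array::IntoIter<&mut bool, 3> {
        IntoIterator::into_iter([
            &mut self.carry_flag,
            &mut self.zero_flag,
            &mut self.overflow_flag,
        ])
    }
}
```

This keeps the no-allocation upside and a reasonably short iterator type, with no external dependency.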
The last approach I'll talk about is using unsafe. Since this doesn't have many good upsides over the other approaches, I wouldn't recommend it in general. This is mainly to show you how you could implement this.
Like your original code, we'll implement Iterator on our own struct.
impl<'a> IntoIterator for &'a mut StatusRegister {
type IntoIter = StatusRegisterIterMut<'a>;
type Item = &'a mut bool;
fn into_iter(self) -> Self::IntoIter {
StatusRegisterIterMut {
status: self,
index: 0,
}
}
}
pub struct StatusRegisterIterMut<'a> {
status: &'a mut StatusRegister,
index: usize,
}
The unsafety comes from the next method, where we'll have to (essentially) convert something of type &mut &mut T to &mut T, which is generally unsafe. However, as long as we ensure that next isn't allowed to alias these mutable references, we should be fine. There may be some other subtle issues, so I won't guarantee that this is sound. For what it's worth, MIRI doesn't find any problems with this.
impl<'a> Iterator for StatusRegisterIterMut<'a> {
type Item = &'a mut bool;
// Invariant to keep: index is 0, 1, 2 or 3
// Every call, this increments by one, capped at 3
// index should never be 0 on two different calls
// and similarly for 1 and 2.
fn next(&mut self) -> Option<Self::Item> {
let result = unsafe {
match self.index {
// Safety: Since each of these three branches are
// executed exactly once, we hand out no more than one mutable reference
// to each part of self.status
// Since self.status is valid for 'a
// Each partial borrow is also valid for 'a
0 => &mut *(&mut self.status.CarryFlag as *mut _),
1 => &mut *(&mut self.status.ZeroFlag as *mut _),
2 => &mut *(&mut self.status.OverflowFlag as *mut _),
_ => return None
}
};
// If self.index isn't 0, 1 or 2, we'll have already returned
// So this bumps us up to 1, 2 or 3.
self.index += 1;
Some(result)
}
}
(playground)
Upsides:
No allocations.
Simple iterator type name.
No external dependencies.
Downsides:
Complicated to implement. To successfully use unsafe, you need to be very familiar with what is and isn't allowed. This part of the answer took me the longest by far to make sure I wasn't doing something wrong.
Unsafety infects the module. Within the module defining this iterator, I could "safely" cause unsoundness by messing with the status or index fields of StatusRegisterIterMut. The only thing allowing encapsulation is that outside of this module, those fields aren't visible.

call callback with reference to field

Consider such code:
trait OnUpdate {
fn on_update(&mut self, x: &i32);
}
struct Foo {
field: i32,
cbs: Vec<Box<OnUpdate>>,
}
impl Foo {
fn subscribe(&mut self, cb: Box<OnUpdate>) {
self.cbs.push(cb);
}
fn set_x(&mut self, v: i32) {
self.field = v;
//variant 1
//self.call_callbacks(|v| v.on_update(&self.field));
//variant 2
let f_ref = &self.field;
for item in &mut self.cbs {
item.on_update(f_ref);
}
}
fn call_callbacks<CB: FnMut(&mut Box<OnUpdate>)>(&mut self, mut cb: CB) {
for item in &mut self.cbs {
cb(item);
}
}
}
If I comment out variant 2 and uncomment variant 1,
it doesn't compile, because I need &Foo and &mut Foo at the same time.
But I really need a function here, because I need the same
code to call callbacks in several places.
So do I need a macro here to call the callbacks, or is there another solution?
Side notes: in the real code I use a big structure instead of i32,
so I cannot copy it. Also, I have several methods in OnUpdate,
so I need FnMut in call_callbacks.
An important rule of Rust's borrow checker is, mutable access is exclusive access.
In variant 2, this rule is upheld because the shared reference to self.field and the mutable reference to self.cbs never overlap: they point at different fields. The for loop implicitly invokes into_iter on &mut Vec, which returns a std::slice::IterMut object that references the vector, but not the rest of Foo. In other words, the for loop does not really contain a mutable borrow of self.
In variant 1, there is a call_callbacks which does retain a mutable borrow of self, which means it cannot receive (directly or indirectly) another borrow of self. In other words, at the same time:
It accepts a mutable reference to self, which allows it to modify all its fields, including self.field.
It accepts a closure that also refers to self, because it uses the expression self.field.
Letting this compile would allow call_callbacks to mutate self.field without the closure being aware of it. In case of an integer it might not sound like a big deal, but for other data this would lead to bugs that Rust's borrow checker is explicitly designed to prevent. For example, Rust relies on these properties to prevent unsafe iteration over mutating containers or data races in multi-threaded programs.
In your case it is straightforward to avoid the above situation. set_x is in control both of the contents of the closure and of the mutation to self.field. It could be restated to pass a temporary variable to the closure, and then update self.field, like this:
impl Foo {
fn subscribe(&mut self, cb: Box<OnUpdate>) {
self.cbs.push(cb);
}
fn set_x(&mut self, v: i32) {
self.call_callbacks(|cb| cb.on_update(&v));
self.field = v;
}
fn call_callbacks<OP>(&mut self, mut operation: OP)
where OP: FnMut(&mut OnUpdate)
{
for cb in self.cbs.iter_mut() {
operation(&mut **cb);
}
}
}
Rust has no problem with this code, and the effect is the same.
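For anyone trying this on a current compiler, here is the same idea as a self-contained sketch with the dyn keyword, which newer editions require for trait objects. The Recorder type is hypothetical; it exists only so that the callbacks have an observable effect.

```rust
use std::cell::RefCell;
use std::rc::Rc;

trait OnUpdate {
    fn on_update(&mut self, x: &i32);
}

struct Foo {
    field: i32,
    cbs: Vec<Box<dyn OnUpdate>>,
}

impl Foo {
    fn subscribe(&mut self, cb: Box<dyn OnUpdate>) {
        self.cbs.push(cb);
    }

    fn set_x(&mut self, v: i32) {
        // The closure borrows the local v, not self, so it does not
        // conflict with the &mut self taken by call_callbacks.
        self.call_callbacks(|cb| cb.on_update(&v));
        self.field = v;
    }

    fn call_callbacks<OP>(&mut self, mut operation: OP)
    where
        OP: FnMut(&mut dyn OnUpdate),
    {
        for cb in self.cbs.iter_mut() {
            operation(&mut **cb);
        }
    }
}

// Hypothetical callback: appends every value it sees to a shared log.
struct Recorder(Rc<RefCell<Vec<i32>>>);

impl OnUpdate for Recorder {
    fn on_update(&mut self, x: &i32) {
        self.0.borrow_mut().push(*x);
    }
}
```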
As an exercise, it is possible to write a version of call_callbacks that works like variant 2. In that case, it needs to accept an iterator into the cbs Vec, much like the for loop does, and it must not accept &self at all:
fn set_x(&mut self, v: i32) {
self.field = v;
let fref = &self.field;
Foo::call_callbacks(&mut self.cbs.iter_mut(),
|cb| cb.on_update(fref));
}
fn call_callbacks<OP>(it: &mut Iterator<Item=&mut Box<OnUpdate>>,
mut operation: OP)
where OP: FnMut(&mut OnUpdate)
{
for cb in it {
operation(&mut **cb);
}
}
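A variation on that exercise, sketched for a current compiler: instead of an iterator trait object, the helper below takes the callbacks as a mutable slice, which is my simplification and not the signature above. The Last type is hypothetical and only there to observe the call. The key property is the same: the helper never receives self, so the reference to self.field stays valid.

```rust
use std::cell::Cell;
use std::rc::Rc;

trait OnUpdate {
    fn on_update(&mut self, x: &i32);
}

struct Foo {
    field: i32,
    cbs: Vec<Box<dyn OnUpdate>>,
}

impl Foo {
    fn set_x(&mut self, v: i32) {
        self.field = v;
        // Shared borrow of one field...
        let fref = &self.field;
        // ...and a mutable borrow of a different field.
        Foo::call_callbacks(&mut self.cbs, |cb| cb.on_update(fref));
    }

    // Associated function: sees only the callback list, never self.
    fn call_callbacks<OP>(cbs: &mut [Box<dyn OnUpdate>], mut operation: OP)
    where
        OP: FnMut(&mut dyn OnUpdate),
    {
        for cb in cbs.iter_mut() {
            operation(&mut **cb);
        }
    }
}

// Hypothetical callback: remembers the last value it was given.
struct Last(Rc<Cell<i32>>);

impl OnUpdate for Last {
    fn on_update(&mut self, x: &i32) {
        self.0.set(*x);
    }
}
```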

Borrow vs mutable borrow strange failure in lifetimes

While trying to implement an iterator which yields mutable refs to elements of a linked list, I stumbled upon a strange issue.
This works fine:
impl<'a, T> Iterator<&'a T> for LinkedListIterator<'a, T>{
fn next(&mut self) -> Option<&'a T> {
match self.current {
&Cell(ref x, ref xs) => {self.current = &**xs; Some(x)},
&End => None
}
}
}
But this doesn't work; the compiler says lifetime of self is too short to guarantee its contents can be safely reborrowed:
impl<'a, T> Iterator<&'a mut T> for LinkedListMutIterator<'a, T>{
fn next(&mut self) -> Option<&'a mut T> {
match self.current {
&Cell(ref mut x, ref mut xs) => {self.current = &mut **xs; Some(x)},
&End => None
}
}
}
I would expect that either both examples work or both do not, but I can't understand how borrowing something as mutable vs. not mutable would impact the way the compiler checks lifetimes. Surely if something lives long enough to be safely borrowed, it lives long enough to be safely mutably borrowed?
EDIT: Here is the definition of both Iterators:
pub struct LinkedListIterator<'a, T> {
current: &'a LinkedList<T>
}
pub struct LinkedListMutIterator<'a, T> {
current: &'a mut LinkedList<T>
}
LinkedList:
#[deriving(Eq, Clone)]
pub enum LinkedList<T> {
Cell(T, ~LinkedList<T>),
End
}
For a complete view of the file, please see https://github.com/TisButMe/rust-algo/blob/mut_iter/LinkedList/linked_list.rs
Note that you've left out the definition(s) of LinkedListMutIterator for the two variant bits of code, which might be relevant to any real attempt to reproduce and dissect your problem.
So, I'll try to guess at what's going on.
The compiler error message here might be misleading you; there are other factors beyond the lifetime of self that may be relevant here.
In particular I suspect the borrow-checker is complaining because it is trying to ensure that you are not creating multiple mutable-borrows that alias the same state.
It is sound to have multiple immutable-borrows to the same piece of state...
... but you cannot have multiple mutable-borrows to the same piece of state (because we want to ensure that if you have a &mut reference to some state, then that reference is the only way to mutate the state).
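As an illustration of that uniqueness requirement, here is how a mutable iterator over such a list is commonly written in current Rust. This is a sketch in modern syntax (Box instead of ~, an associated Item type), not a fix for the pre-1.0 code above, and the Option wrapper and the name IterMut are my own choices. The iterator moves its &mut reference out of itself with take() before handing out a piece of it, so it never holds a second mutable path to an element it has already returned.

```rust
pub enum LinkedList<T> {
    Cell(T, Box<LinkedList<T>>),
    End,
}

pub struct IterMut<'a, T> {
    // None once the iterator is exhausted.
    current: Option<&'a mut LinkedList<T>>,
}

impl<T> LinkedList<T> {
    pub fn iter_mut(&mut self) -> IterMut<'_, T> {
        IterMut { current: Some(self) }
    }
}

impl<'a, T> Iterator for IterMut<'a, T> {
    type Item = &'a mut T;

    fn next(&mut self) -> Option<&'a mut T> {
        // take() moves the &'a mut reference out of self, leaving None behind.
        // Because we own that reference, splitting it keeps the full lifetime 'a.
        match self.current.take()? {
            LinkedList::Cell(x, xs) => {
                self.current = Some(&mut **xs);
                Some(x)
            }
            LinkedList::End => None,
        }
    }
}
```

With a shared reference none of this is needed, because &T is Copy and can simply be duplicated; &mut T cannot, which is why the two versions in the question behave differently.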
