What are the Rust borrowing rules regarding mutable internal references? - rust

It's not intuitive to me why a program like
#[derive(Debug)]
struct Test {
buf: [u8; 16],
}
impl Test {
fn new() -> Test {
Test {
buf: [0u8; 16],
}
}
fn hi(&mut self) {
self.buf[0] = 'H' as u8;
self.buf[1] = 'i' as u8;
self.buf[2] = '!' as u8;
self.print();
}
fn print(&self) {
println!("{:?}", self);
}
}
fn main() {
Test::new().hi();
}
compiles and runs without any problem, but a program like
#[derive(Debug)]
enum State {
Testing([u8; 16]),
}
#[derive(Debug)]
struct Test {
state: State,
}
impl Test {
fn new() -> Test {
Test {
state: State::Testing([0u8; 16]),
}
}
fn hi(&mut self) {
match self.state {
State::Testing(ref mut buf) => {
buf[0] = 'H' as u8;
buf[1] = 'i' as u8;
buf[2] = '!' as u8;
self.print();
},
}
}
fn print(&self) {
println!("{:?}", self);
}
}
fn main() {
Test::new().hi();
}
errors during compilation with an error of
error[E0502]: cannot borrow *self as immutable because
self.state.0 is also borrowed as mutable
Since both programs do essentially the same thing, the second doesn't seem like it would be somehow more unsafe from a memory perspective. I know there must be something about the borrowing and scoping rules that I must be missing, but have no idea what.

In order to make your hi function work you just need to move print out of the scope of the mutable borrow introduced in its match expression:
fn hi(&mut self) {
match self.state {
State::Testing(ref mut buf) => {
buf[0] = 'H' as u8;
buf[1] = 'i' as u8;
buf[2] = '!' as u8;
},
}
self.print();
}
Your two variants are not equivalent due to the presence of the match block in the second case. I don't know how to directly access the tuple struct in the enum without pattern matching (or if this is even possible right now), but if it was the case, then there would in fact be not much difference and both versions would work.

In the match statement, you borrow self.state. Borrow scopes are lexical, so it is borrowed in the entire match block. When you call self.print(), you need to borrow self. But that is not possible, because part of self is already borrowed. If you move self.print() after the match statement, it will work.
Regarding the lexical borrow scope, you can read more in the second part of Two bugs in the borrow checker every Rust developer should know about. Related issues: #6393, #811.

Related

Struct with field holding a mutable reference to another instance of this same struct

I have a struct which has a field (named outer) that I would like to hold a mutable reference to another instance of this same struct. I want to specify to the compiler that this outer field will always live for at least as long as the current instance.
A simplified version of the issue (that does not compile yet):
use std::collections::HashMap;
#[derive(Debug)]
struct Env<'a> {
symbols: HashMap<String, i64>,
outer: Option<&'a mut Env<'a>>
}
impl <'a> Env<'a> {
fn new() -> Self {
Env {
symbols: HashMap::new(),
outer: None,
}
}
fn new_enclosed(outer: &'a mut Env<'a>) -> Self {
Env {
symbols: HashMap::new(),
outer: Some(outer),
}
}
fn resolve(&self, name: &str) -> i64 {
if self.symbols.contains_key(name) {
return *self.symbols.get(name).unwrap();
}
if let Some(outer) = &self.outer {
return outer.resolve(name);
}
return 0;
}
fn insert(&mut self, name: &str, value: i64) {
self.symbols.insert(name.to_owned(), value);
}
fn update(&mut self, name: &str, value: i64) {
if self.symbols.contains_key(name) {
self.symbols.insert(name.to_owned(), value);
} else {
if let Some(outer) = &mut self.outer {
return outer.update(name, value);
}
}
}
}
fn main() {
// outer env
let mut env = Env::new();
env.insert("foo", 0);
env.insert("bar", 0);
one(&mut env);
assert_eq!(env.resolve("foo"), 0); // <-- Cannot borrow as mutable more than once.
assert_eq!(env.resolve("bar"), 2); // <-- Cannot borrow as mutable more than once.
}
fn one<'a>(outer: &'a mut Env<'a>) {
let mut env = Env::new_enclosed(outer);
env.insert("foo", 1);
env.update("bar", 1);
for i in 1..3 {
two(&mut env); // <-- Cannot borrow as mutable more than once.
}
}
fn two<'b>(outer: &'b mut Env<'b>) {
let mut env = Env::new_enclosed(outer);
env.insert("foo", 2);
env.update("bar", 2);
}
Playground link: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=0836f29242e6683a15b57d829657306b
In theory, this sort of structure should be possible without wrapping Env in <Rc<RefCell<...>>, right? So I'm guessing I am messing up the lifetime parameters for the functions one and two in this example, but can't seem to figure it out.
Any pointers (pun intended) would be greatly appreciated.
EDIT 1:
So now I don't think this sort of setup is possible without RefCell because I am passing the mutable reference inside a loop, which is never possible (right?). So there probably isn't an easy answer to this besides a completely different structure altogether, which I would still be very interested to hear other people's thoughts on. But also maybe I should not be afraid to use RefCell at all?
For example, are there any runtime costs associated with RefCell when simply using it to borrow a non-mutable reference?

Is there a way to tell rustc when is is being too conservative about certain borrows?

I'm playing around with building a very simple stack based evaluator in Rust and I've come up against an odd situation where I think the borrow checker is being too conservative:
use std::collections::HashMap;
pub type Value = i32;
pub type Result = std::result::Result<(), Error>;
type Op = Box<dyn Fn(&mut Evaluator) -> Result>;
type OpTable = HashMap<String, Op>;
pub struct Evaluator {
stack: Vec<Value>,
ops: OpTable,
}
#[derive(Debug, PartialEq)]
pub enum Error {
DivisionByZero,
StackUnderflow,
UnknownWord,
InvalidWord,
}
impl Evaluator {
fn add(&mut self) -> Result {
if let (Some(x), Some(y)) = (self.stack.pop(), self.stack.pop()) {
self.stack.push(y + x);
Ok(())
} else {
Err(Error::StackUnderflow)
}
}
fn sub(&mut self) -> Result {
if let (Some(x), Some(y)) = (self.stack.pop(), self.stack.pop()) {
self.stack.push(y - x);
Ok(())
} else {
Err(Error::StackUnderflow)
}
}
pub fn new() -> Evaluator {
let stack: Vec<Value> = vec![];
let mut ops: OpTable = HashMap::new();
ops.insert("+".to_string(), Box::new(Evaluator::add));
ops.insert("-".to_string(), Box::new(Evaluator::sub));
Evaluator { stack, ops }
}
pub fn eval(&mut self, input: &str) -> Result {
let symbols = input.split_ascii_whitespace().collect::<Vec<_>>();
// user definition
if let (Some(&":"), Some(&";")) = (symbols.first(), symbols.last()) {
if symbols.len() > 3 {
let statement = symbols[2..symbols.len() - 1].join(" ");
self.ops.insert(
symbols[1].to_string().to_ascii_lowercase(),
Box::new(move |caller: &mut Evaluator| caller.exec(&statement)),
);
return Ok(());
} else {
return Err(Error::InvalidWord);
}
}
self.exec(input)
}
fn exec(&mut self, input: &str) -> Result {
let symbols = input.split_ascii_whitespace().collect::<Vec<_>>();
for sym in symbols {
if let Ok(n) = sym.parse::<i32>() {
self.stack.push(n);
} else {
let s = sym.to_ascii_lowercase();
if let Some(f) = self.ops.get(&s) { // <--------------errors here
f(self)?; // <----------------------------|
} else {
return Err(Error::InvalidWord);
}
}
}
Ok(())
}
}
fn main() {
let mut e = Evaluator::new();
e.eval("1 2 +");
println!("{:?}", e.stack);
e.eval(": plus-1 1 + ;");
e.eval("4 plus-1");
println!("{:?}", e.stack);
}
I'm getting:
error[E0502]: cannot borrow `*self` as mutable because it is also borrowed as immutable
--> src/main.rs:77:21
|
76 | if let Some(f) = self.ops.get(&s) {
| -------- immutable borrow occurs here
77 | f(self)?;
| -^^^^^^
| |
| mutable borrow occurs here
| immutable borrow later used by call
For more information about this error, try `rustc --explain E0502`.
error: could not compile `evaluator` due to previous error
I believe this is because taking part of the hashmap (f) borrows all of self immutably, then I'm passing self mutably to f(). However, there is no real conflict here (I think).
I'm able to get around this by actually removing and reinserting the value:
fn exec(&mut self, input: &str) -> Result {
let symbols = input.split_ascii_whitespace().collect::<Vec<_>>();
for sym in symbols {
if let Ok(n) = sym.parse::<i32>() {
self.stack.push(n);
} else {
let s = sym.to_ascii_lowercase();
if self.ops.contains_key(&s) {
let f = self.ops.remove(&s).unwrap();
if let Err(e) = f(self) {
self.ops.insert(s, f);
return Err(e);
}
self.ops.insert(s, f);
} else {
return Err(Error::InvalidWord);
}
}
}
Ok(())
}
But this feels hacky and is a lot more verbose and inefficient. Am I missing something? Is there a way to tell the compiler the first version is ok?
The compiler is entirely correct, and so is your explanation: The call to get() needs a borrow on self.ops to return an &Op of the same lifetime. You then try to call that FnMut with a mutable borrow of self; this mutable borrow of self aliases with the immutable borrow on self.ops, and in theory it would be possible for an implementation of this FnMut to modify that borrowed Op through self, which is not allowed. The compiler prevented a situation where mutation occurs through an aliased pointer.
This situation often occurs when passing &mut self around, as immutable borrows on members of self which result in more borrows (&self.ops.get() has the same lifetime as &self) "lock" all of self.
While your second example is cumbersome, it is at least correct as proven by the compiler: By removing Op from the hashtable, the FnMut can't reach itself through self anymore, and mutation while aliasing is prevented.
A better approach is usually to avoid taking &mut self as an argument (&mut self as in &mut Executor).

Why does a call to `fn pop(&mut self) -> Result<T, &str>` continue to borrow my data structure?

I am developing some basic data structures to learn the syntax and Rust in general. Here is what I came up with for a stack:
#[allow(dead_code)]
mod stack {
pub struct Stack<T> {
data: Vec<T>,
}
impl<T> Stack<T> {
pub fn new() -> Stack<T> {
return Stack { data: Vec::new() };
}
pub fn pop(&mut self) -> Result<T, &str> {
let len: usize = self.data.len();
if len > 0 {
let idx_to_rmv: usize = len - 1;
let last: T = self.data.remove(idx_to_rmv);
return Result::Ok(last);
} else {
return Result::Err("Empty stack");
}
}
pub fn push(&mut self, elem: T) {
self.data.push(elem);
}
pub fn is_empty(&self) -> bool {
return self.data.len() == 0;
}
}
}
mod stack_tests {
use super::stack::Stack;
#[test]
fn basics() {
let mut s: Stack<i16> = Stack::new();
s.push(16);
s.push(27);
let pop_result = s.pop().expect("");
assert_eq!(s.pop().expect("Empty stack"), 27);
assert_eq!(s.pop().expect("Empty stack"), 16);
let pop_empty_result = s.pop();
match pop_empty_result {
Ok(_) => panic!("Should have had no result"),
Err(_) => {
println!("Empty stack");
}
}
if s.is_empty() {
println!("O");
}
}
}
I get this interesting error:
error[E0502]: cannot borrow `s` as immutable because it is also borrowed as mutable
--> src/main.rs:58:12
|
49 | let pop_empty_result = s.pop();
| - mutable borrow occurs here
...
58 | if s.is_empty() {
| ^ immutable borrow occurs here
...
61 | }
| - mutable borrow ends here
Why can't I just call pop on my mutable struct?
Why does pop borrow the value? If I add a .expect() after it, it is ok, it doesn't trigger that error. I know that is_empty takes an immutable reference, if I switch it to mutable I just get a second mutable borrow.
Your pop function is declared as:
pub fn pop(&mut self) -> Result<T, &str>
Due to lifetime elision, this expands to
pub fn pop<'a>(&'a mut self) -> Result<T, &'a str>
This says that the Result::Err variant is a string that lives as long as the stack you are calling it on. Since the input and output lifetimes are the same, the returned value might be pointing somewhere into the Stack data structure so the returned value must continue to hold the borrow.
If I add a .expect() after it, it is ok, it doesn't trigger that error.
That's because expect consumes the Result, discarding the Err variant without ever putting it into a variable binding. Since that's never stored, the borrow cannot be saved anywhere and it is released.
To solve the problem, you need to have distinct lifetimes between the input reference and output reference. Since you are using a string literal, the easiest solution is to denote that using the 'static lifetime:
pub fn pop(&mut self) -> Result<T, &'static str>
Extra notes:
Don't call return explicitly at the end of the block / method: return Result::Ok(last) => Result::Ok(last).
Result, Result::Ok, and Result::Err are all imported via the prelude, so you don't need to qualify them: Result::Ok(last) => Ok(last).
There's no need to specify types in many cases let len: usize = self.data.len() => let len = self.data.len().
This happens because of lifetimes. When you construct a method which takes a reference the compiler detects that and if no lifetimes are specified it "generates" them:
pub fn pop<'a>(&'a mut self) -> Result<T, &'a str> {
let len: usize = self.data.len();
if len > 0 {
let idx_to_rmv: usize = len - 1;
let last: T = self.data.remove(idx_to_rmv);
return Result::Ok(last);
} else {
return Result::Err("Empty stack");
}
}
This is what compiler sees actually. So, you want to return a static string, then you have to specify the lifetime for a &str explicitly and let the lifetime for the reference to mut self be inferred automatically:
pub fn pop(&mut self) -> Result<T, &'static str> {

How do I efficiently build a vector and an index of that vector while processing a data stream?

I have a struct Foo:
struct Foo {
v: String,
// Other data not important for the question
}
I want to handle a data stream and save the result into Vec<Foo> and also create an index for this Vec<Foo> on the field Foo::v.
I want to use a HashMap<&str, usize> for the index, where the keys will be &Foo::v and the value is the position in the Vec<Foo>, but I'm open to other suggestions.
I want to do the data stream handling as fast as possible, which requires not doing obvious things twice.
For example, I want to:
allocate a String only once per one data stream reading
not search the index twice, once to check that the key does not exist, once for inserting new key.
not increase the run time by using Rc or RefCell.
The borrow checker does not allow this code:
let mut l = Vec::<Foo>::new();
{
let mut hash = HashMap::<&str, usize>::new();
//here is loop in real code, like:
//let mut s: String;
//while get_s(&mut s) {
let s = "aaa".to_string();
let idx: usize = match hash.entry(&s) { //a
Occupied(ent) => {
*ent.get()
}
Vacant(ent) => {
l.push(Foo { v: s }); //b
ent.insert(l.len() - 1);
l.len() - 1
}
};
// do something with idx
}
There are multiple problems:
hash.entry borrows the key so s must have a "bigger" lifetime than hash
I want to move s at line (b), while I have a read-only reference at line (a)
So how should I implement this simple algorithm without an extra call to String::clone or calling HashMap::get after calling HashMap::insert?
In general, what you are trying to accomplish is unsafe and Rust is correctly preventing you from doing something you shouldn't. For a simple example why, consider a Vec<u8>. If the vector has one item and a capacity of one, adding another value to the vector will cause a re-allocation and copying of all the values in the vector, invalidating any references into the vector. This would cause all of your keys in your index to point to arbitrary memory addresses, thus leading to unsafe behavior. The compiler prevents that.
In this case, there's two extra pieces of information that the compiler is unaware of but the programmer isn't:
There's an extra indirection — String is heap-allocated, so moving the pointer to that heap allocation isn't really a problem.
The String will never be changed. If it were, then it might reallocate, invalidating the referred-to address. Using a Box<[str]> instead of a String would be a way to enforce this via the type system.
In cases like this, it is OK to use unsafe code, so long as you properly document why it's not unsafe.
use std::collections::HashMap;
#[derive(Debug)]
struct Player {
name: String,
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let mut players = Vec::new();
let mut index = HashMap::new();
for &name in &names {
let player = Player { name: name.into() };
let idx = players.len();
// I copied this code from Stack Overflow without reading the prose
// that describes why this unsafe block is actually safe
let stable_name: &str = unsafe { &*(player.name.as_str() as *const str) };
players.push(player);
index.insert(idx, stable_name);
}
for (k, v) in &index {
println!("{:?} -> {:?}", k, v);
}
for v in &players {
println!("{:?}", v);
}
}
However, my guess is that you don't want this code in your main method but want to return it from some function. That will be a problem, as you will quickly run into Why can't I store a value and a reference to that value in the same struct?.
Honestly, there's styles of code that don't fit well within Rust's limitations. If you run into these, you could:
decide that Rust isn't a good fit for you or your problem.
use unsafe code, preferably thoroughly tested and only exposing a safe API.
investigate alternate representations.
For example, I'd probably rewrite the code to have the index be the primary owner of the key:
use std::collections::BTreeMap;
#[derive(Debug)]
struct Player<'a> {
name: &'a str,
data: &'a PlayerData,
}
#[derive(Debug)]
struct PlayerData {
hit_points: u8,
}
#[derive(Debug)]
struct Players(BTreeMap<String, PlayerData>);
impl Players {
fn new<I>(iter: I) -> Self
where
I: IntoIterator,
I::Item: Into<String>,
{
let players = iter
.into_iter()
.map(|name| (name.into(), PlayerData { hit_points: 100 }))
.collect();
Players(players)
}
fn get<'a>(&'a self, name: &'a str) -> Option<Player<'a>> {
self.0.get(name).map(|data| Player { name, data })
}
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let players = Players::new(names.iter().copied());
for (k, v) in &players.0 {
println!("{:?} -> {:?}", k, v);
}
println!("{:?}", players.get("eustice"));
}
Alternatively, as shown in What's the idiomatic way to make a lookup table which uses field of the item as the key?, you could wrap your type and store it in a set container instead:
use std::collections::BTreeSet;
#[derive(Debug, PartialEq, Eq)]
struct Player {
name: String,
hit_points: u8,
}
#[derive(Debug, Eq)]
struct PlayerByName(Player);
impl PlayerByName {
fn key(&self) -> &str {
&self.0.name
}
}
impl PartialOrd for PlayerByName {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
Some(self.cmp(other))
}
}
impl Ord for PlayerByName {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
self.key().cmp(&other.key())
}
}
impl PartialEq for PlayerByName {
fn eq(&self, other: &Self) -> bool {
self.key() == other.key()
}
}
impl std::borrow::Borrow<str> for PlayerByName {
fn borrow(&self) -> &str {
self.key()
}
}
#[derive(Debug)]
struct Players(BTreeSet<PlayerByName>);
impl Players {
fn new<I>(iter: I) -> Self
where
I: IntoIterator,
I::Item: Into<String>,
{
let players = iter
.into_iter()
.map(|name| {
PlayerByName(Player {
name: name.into(),
hit_points: 100,
})
})
.collect();
Players(players)
}
fn get(&self, name: &str) -> Option<&Player> {
self.0.get(name).map(|pbn| &pbn.0)
}
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let players = Players::new(names.iter().copied());
for player in &players.0 {
println!("{:?}", player.0);
}
println!("{:?}", players.get("eustice"));
}
not increase the run time by using Rc or RefCell
Guessing about performance characteristics without performing profiling is never a good idea. I honestly don't believe that there'd be a noticeable performance loss from incrementing an integer when a value is cloned or dropped. If the problem required both an index and a vector, then I would reach for some kind of shared ownership.
not increase the run time by using Rc or RefCell.
#Shepmaster already demonstrated accomplishing this using unsafe, once you have I would encourage you to check how much Rc actually would cost you. Here is a full version with Rc:
use std::{
collections::{hash_map::Entry, HashMap},
rc::Rc,
};
#[derive(Debug)]
struct Foo {
v: Rc<str>,
}
#[derive(Debug)]
struct Collection {
vec: Vec<Foo>,
index: HashMap<Rc<str>, usize>,
}
impl Foo {
fn new(s: &str) -> Foo {
Foo {
v: s.into(),
}
}
}
impl Collection {
fn new() -> Collection {
Collection {
vec: Vec::new(),
index: HashMap::new(),
}
}
fn insert(&mut self, foo: Foo) {
match self.index.entry(foo.v.clone()) {
Entry::Occupied(o) => panic!(
"Duplicate entry for: {}, {:?} inserted before {:?}",
foo.v,
o.get(),
foo
),
Entry::Vacant(v) => v.insert(self.vec.len()),
};
self.vec.push(foo)
}
}
fn main() {
let mut collection = Collection::new();
for foo in vec![Foo::new("Hello"), Foo::new("World"), Foo::new("Go!")] {
collection.insert(foo)
}
println!("{:?}", collection);
}
The error is:
error: `s` does not live long enough
--> <anon>:27:5
|
16 | let idx: usize = match hash.entry(&s) { //a
| - borrow occurs here
...
27 | }
| ^ `s` dropped here while still borrowed
|
= note: values in a scope are dropped in the opposite order they are created
The note: at the end is where the answer is.
s must outlive hash because you are using &s as a key in the HashMap. This reference will become invalid when s is dropped. But, as the note says, hash will be dropped after s. A quick fix is to swap the order of their declarations:
let s = "aaa".to_string();
let mut hash = HashMap::<&str, usize>::new();
But now you have another problem:
error[E0505]: cannot move out of `s` because it is borrowed
--> <anon>:22:33
|
17 | let idx: usize = match hash.entry(&s) { //a
| - borrow of `s` occurs here
...
22 | l.push(Foo { v: s }); //b
| ^ move out of `s` occurs here
This one is more obvious. s is borrowed by the Entry, which will live to the end of the block. Cloning s will fix that:
l.push(Foo { v: s.clone() }); //b
I only want to allocate s only once, not cloning it
But the type of Foo.v is String, so it will own its own copy of the str anyway. Just that type means you have to copy the s.
You can replace it with a &str instead which will allow it to stay as a reference into s:
struct Foo<'a> {
v: &'a str,
}
pub fn main() {
// s now lives longer than l
let s = "aaa".to_string();
let mut l = Vec::<Foo>::new();
{
let mut hash = HashMap::<&str, usize>::new();
let idx: usize = match hash.entry(&s) {
Occupied(ent) => {
*ent.get()
}
Vacant(ent) => {
l.push(Foo { v: &s });
ent.insert(l.len() - 1);
l.len() - 1
}
};
}
}
Note that, previously I had to move the declaration of s to before hash, so that it would outlive it. But now, l holds a reference to s, so it has to be declared even earlier, so that it outlives l.

Why does Rust borrow checker reject this code?

I'm getting a Rust compile error from the borrow checker, and I don't understand why. There's probably something about lifetimes I don't fully understand.
I've boiled it down to a short code sample. In main, I want to do this:
fn main() {
let codeToScan = "40 + 2";
let mut scanner = Scanner::new(codeToScan);
let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
println!("first token is: {}", first_token);
// scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
}
Trying to call scanner.consume_till a second time gives me this error:
example.rs:64:5: 64:12 error: cannot borrow `scanner` as mutable more than once at a time
example.rs:64 scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
^~~~~~~
example.rs:62:23: 62:30 note: previous borrow of `scanner` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `scanner` until the borrow ends
example.rs:62 let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
^~~~~~~
example.rs:65:2: 65:2 note: previous borrow ends here
example.rs:59 fn main() {
...
example.rs:65 }
Basically, I've made something like my own iterator, and the equivalent to the "next" method takes &mut self. Because of that, I can't use the method more than once in the same scope.
However, the Rust std library has an iterator which can be used more than once in the same scope, and it also takes a &mut self parameter.
let test = "this is a string";
let mut iterator = test.chars();
iterator.next();
iterator.next(); // This is PERFECTLY LEGAL
So why does the Rust std library code compile, but mine doesn't? (I'm sure the lifetime annotations are at the root of it, but my understanding of lifetimes doesn't lead to me expecting a problem).
Here's my full code (only 60 lines, shortened for this question):
use std::str::{Chars};
use std::iter::{Enumerate};
#[deriving(Show)]
struct ConsumeResult<'lt> {
value: &'lt str,
startIndex: uint,
endIndex: uint,
}
struct Scanner<'lt> {
code: &'lt str,
char_iterator: Enumerate<Chars<'lt>>,
isEof: bool,
}
impl<'lt> Scanner<'lt> {
fn new<'lt>(code: &'lt str) -> Scanner<'lt> {
Scanner{code: code, char_iterator: code.chars().enumerate(), isEof: false}
}
fn assert_not_eof<'lt>(&'lt self) {
if self.isEof {fail!("Scanner is at EOF."); }
}
fn next(&mut self) -> Option<(uint, char)> {
self.assert_not_eof();
let result = self.char_iterator.next();
if result == None { self.isEof = true; }
return result;
}
fn consume_till<'lt>(&'lt mut self, quit: |char| -> bool) -> ConsumeResult<'lt> {
self.assert_not_eof();
let mut startIndex: Option<uint> = None;
let mut endIndex: Option<uint> = None;
loop {
let should_quit = match self.next() {
None => {
endIndex = Some(endIndex.unwrap() + 1);
true
},
Some((i, ch)) => {
if startIndex == None { startIndex = Some(i);}
endIndex = Some(i);
quit (ch)
}
};
if should_quit {
return ConsumeResult{ value: self.code.slice(startIndex.unwrap(), endIndex.unwrap()),
startIndex:startIndex.unwrap(), endIndex: endIndex.unwrap() };
}
}
}
}
fn main() {
let codeToScan = "40 + 2";
let mut scanner = Scanner::new(codeToScan);
let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
println!("first token is: {}", first_token);
// scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
}
Here's a simpler example of the same thing:
struct Scanner<'a> {
s: &'a str
}
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'a>(&'a mut self) -> &'a str {
let return_value = self.s.slice_to(3);
self.s = self.s.slice_from(3);
return_value
}
}
fn main() {
let mut scan = Scanner { s: "123456" };
let a = scan.step_by_3_bytes();
println!("{}", a);
let b = scan.step_by_3_bytes();
println!("{}", b);
}
If you compile that, you get errors like the code in the question:
<anon>:19:13: 19:17 error: cannot borrow `scan` as mutable more than once at a time
<anon>:19 let b = scan.step_by_3_bytes();
^~~~
<anon>:16:13: 16:17 note: previous borrow of `scan` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `scan` until the borrow ends
<anon>:16 let a = scan.step_by_3_bytes();
^~~~
<anon>:21:2: 21:2 note: previous borrow ends here
<anon>:13 fn main() {
...
<anon>:21 }
^
Now, the first thing to do is to avoid shadowing lifetimes: that is, this code has two lifetimes called 'a and all the 'as in step_by_3_bytes refer to the 'a declare there, none of them actually refer to the 'a in Scanner<'a>. I'll rename the inner one to make it crystal clear what is going on
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'b mut self) -> &'b str {
The problem here is the 'b is connecting the self object with the str return value. The compiler has to assume that calling step_by_3_bytes can make arbitrary modifications, including invalidating previous return values, when looking at the definition of step_by_3_bytes from the outside (which is how the compiler works, type checking is purely based on type signatures of things that are called, no introspect). That is, it could be defined like
struct Scanner<'a> {
s: &'a str,
other: String,
count: uint
}
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'b mut self) -> &'b str {
self.other.push_str(self.s);
// return a reference into data we own
self.other.as_slice()
}
}
Now, each call to step_by_3_bytes starts modifying the object that previous return values came from. E.g. it could cause the String to reallocate and thus move in memory, leaving any other &str return values as dangling pointers. Rust protects against this by tracking these references and disallowing mutation if it could cause such catastrophic events. Going back to our actual code: the compiler is type checking main just by looking at the type signature of step_by_3_bytes/consume_till and so it can only assume the worst case scenario (i.e. the example I just gave).
How do we solve this?
Let's take a step back: as if we're just starting out and don't know which lifetimes we want for the return values, so we'll just leave them anonymous (not actually valid Rust):
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'_ mut self) -> &'_ str {
Now, we get to ask the fun question: which lifetimes do we want where?
It's almost always best to annotate the longest valid lifetimes, and we know our return value lives for 'a (since it comes straight of the s field, and that &str is valid for 'a). That is,
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'_ mut self) -> &'a str {
For the other '_, we don't actually care: as API designers, we don't have any particular desire or need to connect the self borrow with any other references (unlike the return value, where we wanted/needed to express which memory it came from). So, we might as well leave it off
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&mut self) -> &'a str {
The 'b is unused, so it can be killed, leaving us with
impl<'a> Scanner<'a> {
fn step_by_3_bytes(&mut self) -> &'a str {
This expresses that Scanner is referring to some memory that is valid for at least 'a, and then returning references into just that memory. The self object is essentially just a proxy for manipulating those views: once you have the reference it returns, you can discard the Scanner (or call more methods).
In summary, the full, working code is
struct Scanner<'a> {
s: &'a str
}
impl<'a> Scanner<'a> {
fn step_by_3_bytes(&mut self) -> &'a str {
let return_value = self.s.slice_to(3);
self.s = self.s.slice_from(3);
return_value
}
}
fn main() {
let mut scan = Scanner { s: "123456" };
let a = scan.step_by_3_bytes();
println!("{}", a);
let b = scan.step_by_3_bytes();
println!("{}", b);
}
Applying this change to your code is simply adjusting the definition of consume_till.
fn consume_till(&mut self, quit: |char| -> bool) -> ConsumeResult<'lt> {
So why does the Rust std library code compile, but mine doesn't? (I'm sure the lifetime annotations are at the root of it, but my understanding of lifetimes doesn't lead to me expecting a problem).
There's a slight (but not huge) difference here: Chars is just returning a char, i.e. no lifetimes in the return value. The next method (essentially) has signature:
impl<'a> Chars<'a> {
fn next(&mut self) -> Option<char> {
(It's actually in an Iterator trait impl, but that's not important.)
The situation you have here is similar to writing
impl<'a> Chars<'a> {
fn next(&'a mut self) -> Option<char> {
(Similar in terms of "incorrect linking of lifetimes", the details differ.)
Let’s look at consume_till.
It takes &'lt mut self and returns ConsumeResult<'lt>. This means that the lifetime 'lt, the duration of the borrow of the input parameter self, will be that of the output parameter, the return value.
Expressed another way, after calling consume_till, you cannot use self again until its result is out of scope.
That result is placed into first_token, and first_token is still in scope in your last line.
In order to get around this, you must cause first_token to pass out of scope; the insertion of a new block around it will do this:
fn main() {
let code_to_scan = "40 + 2";
let mut scanner = Scanner::new(code_to_scan);
{
let first_token = scanner.consume_till(|c| !c.is_digit());
println!("first token is: {}", first_token);
}
scanner.consume_till(|c| c.is_whitespace());
}
All this does stand to reason: while you have a reference to something inside the Scanner, it is not safe to let you modify it, lest that reference be invalidated. This is the memory safety that Rust provides.

Resources