Why does the Rust borrow checker reject this code?

I'm getting a Rust compile error from the borrow checker, and I don't understand why. There's probably something about lifetimes I don't fully understand.
I've boiled it down to a short code sample. In main, I want to do this:
fn main() {
let codeToScan = "40 + 2";
let mut scanner = Scanner::new(codeToScan);
let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
println!("first token is: {}", first_token);
// scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
}
Trying to call scanner.consume_till a second time gives me this error:
example.rs:64:5: 64:12 error: cannot borrow `scanner` as mutable more than once at a time
example.rs:64 scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
^~~~~~~
example.rs:62:23: 62:30 note: previous borrow of `scanner` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `scanner` until the borrow ends
example.rs:62 let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
^~~~~~~
example.rs:65:2: 65:2 note: previous borrow ends here
example.rs:59 fn main() {
...
example.rs:65 }
Basically, I've made something like my own iterator, and the equivalent to the "next" method takes &mut self. Because of that, I can't use the method more than once in the same scope.
However, the Rust std library has an iterator which can be used more than once in the same scope, and it also takes a &mut self parameter.
let test = "this is a string";
let mut iterator = test.chars();
iterator.next();
iterator.next(); // This is PERFECTLY LEGAL
So why does the Rust std library code compile, but mine doesn't? (I'm sure the lifetime annotations are at the root of it, but my understanding of lifetimes doesn't lead to me expecting a problem).
Here's my full code (only 60 lines, shortened for this question):
use std::str::{Chars};
use std::iter::{Enumerate};
#[deriving(Show)]
struct ConsumeResult<'lt> {
value: &'lt str,
startIndex: uint,
endIndex: uint,
}
struct Scanner<'lt> {
code: &'lt str,
char_iterator: Enumerate<Chars<'lt>>,
isEof: bool,
}
impl<'lt> Scanner<'lt> {
fn new<'lt>(code: &'lt str) -> Scanner<'lt> {
Scanner{code: code, char_iterator: code.chars().enumerate(), isEof: false}
}
fn assert_not_eof<'lt>(&'lt self) {
if self.isEof {fail!("Scanner is at EOF."); }
}
fn next(&mut self) -> Option<(uint, char)> {
self.assert_not_eof();
let result = self.char_iterator.next();
if result == None { self.isEof = true; }
return result;
}
fn consume_till<'lt>(&'lt mut self, quit: |char| -> bool) -> ConsumeResult<'lt> {
self.assert_not_eof();
let mut startIndex: Option<uint> = None;
let mut endIndex: Option<uint> = None;
loop {
let should_quit = match self.next() {
None => {
endIndex = Some(endIndex.unwrap() + 1);
true
},
Some((i, ch)) => {
if startIndex == None { startIndex = Some(i);}
endIndex = Some(i);
quit (ch)
}
};
if should_quit {
return ConsumeResult{ value: self.code.slice(startIndex.unwrap(), endIndex.unwrap()),
startIndex:startIndex.unwrap(), endIndex: endIndex.unwrap() };
}
}
}
}
fn main() {
let codeToScan = "40 + 2";
let mut scanner = Scanner::new(codeToScan);
let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
println!("first token is: {}", first_token);
// scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
}

Here's a simpler example of the same thing:
struct Scanner<'a> {
s: &'a str
}
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'a>(&'a mut self) -> &'a str {
let return_value = self.s.slice_to(3);
self.s = self.s.slice_from(3);
return_value
}
}
fn main() {
let mut scan = Scanner { s: "123456" };
let a = scan.step_by_3_bytes();
println!("{}", a);
let b = scan.step_by_3_bytes();
println!("{}", b);
}
If you compile that, you get errors like the code in the question:
<anon>:19:13: 19:17 error: cannot borrow `scan` as mutable more than once at a time
<anon>:19 let b = scan.step_by_3_bytes();
^~~~
<anon>:16:13: 16:17 note: previous borrow of `scan` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `scan` until the borrow ends
<anon>:16 let a = scan.step_by_3_bytes();
^~~~
<anon>:21:2: 21:2 note: previous borrow ends here
<anon>:13 fn main() {
...
<anon>:21 }
^
Now, the first thing to do is to avoid shadowing lifetimes: this code has two lifetimes called 'a, and all the 'as in step_by_3_bytes refer to the 'a declared there; none of them actually refer to the 'a in Scanner<'a>. I'll rename the inner one to make it crystal clear what is going on:
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'b mut self) -> &'b str {
The problem here is that 'b connects the self object with the str return value. The compiler type checks purely on the type signatures of the things being called, with no introspection of their bodies. So when it looks at step_by_3_bytes from the outside, it has to assume that a call can make arbitrary modifications, including invalidating previous return values. That is, it could be defined like
struct Scanner<'a> {
    s: &'a str,
    other: String,
    count: uint
}
impl<'a> Scanner<'a> {
    fn step_by_3_bytes<'b>(&'b mut self) -> &'b str {
        self.other.push_str(self.s);
        // return a reference into data we own
        self.other.as_slice()
    }
}
Now, each call to step_by_3_bytes starts modifying the object that previous return values came from. E.g. it could cause the String to reallocate and thus move in memory, leaving any other &str return values as dangling pointers. Rust protects against this by tracking these references and disallowing mutation if it could cause such catastrophic events. Going back to our actual code: the compiler is type checking main just by looking at the type signature of step_by_3_bytes/consume_till and so it can only assume the worst case scenario (i.e. the example I just gave).
How do we solve this?
Let's take a step back: as if we're just starting out and don't know which lifetimes we want for the return values, so we'll just leave them anonymous (not actually valid Rust):
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'_ mut self) -> &'_ str {
Now, we get to ask the fun question: which lifetimes do we want where?
It's almost always best to annotate the longest valid lifetimes, and we know our return value lives for 'a (since it comes straight out of the s field, and that &str is valid for 'a). That is,
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'_ mut self) -> &'a str {
For the other '_, we don't actually care: as API designers, we don't have any particular desire or need to connect the self borrow with any other references (unlike the return value, where we wanted/needed to express which memory it came from). So, we might as well leave it off
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&mut self) -> &'a str {
The 'b is unused, so it can be killed, leaving us with
impl<'a> Scanner<'a> {
fn step_by_3_bytes(&mut self) -> &'a str {
This expresses that Scanner is referring to some memory that is valid for at least 'a, and then returning references into just that memory. The self object is essentially just a proxy for manipulating those views: once you have the reference it returns, you can discard the Scanner (or call more methods).
In summary, the full, working code is
struct Scanner<'a> {
    s: &'a str
}
impl<'a> Scanner<'a> {
    fn step_by_3_bytes(&mut self) -> &'a str {
        let return_value = self.s.slice_to(3);
        self.s = self.s.slice_from(3);
        return_value
    }
}
fn main() {
    let mut scan = Scanner { s: "123456" };
    let a = scan.step_by_3_bytes();
    println!("{}", a);
    let b = scan.step_by_3_bytes();
    println!("{}", b);
}
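The code above uses pre-1.0 methods (slice_to, slice_from) that no longer exist on current stable Rust. Here is a minimal sketch of the same fix for a modern compiler, assuming str::split_at as a stand-in for the two slicing calls; first_two_steps is a helper name made up for illustration, not part of the original code.

```rust
struct Scanner<'a> {
    s: &'a str,
}

impl<'a> Scanner<'a> {
    // The borrow of `self` gets its own short, elided lifetime;
    // only the returned slice is tied to 'a.
    fn step_by_3_bytes(&mut self) -> &'a str {
        // Assumption: split_at replaces slice_to/slice_from.
        // It panics if fewer than 3 bytes remain or if byte 3
        // is not on a character boundary.
        let (head, tail) = self.s.split_at(3);
        self.s = tail;
        head
    }
}

// Hypothetical helper: both results are alive at the same time,
// which the original `&'a mut self` signature would not allow.
fn first_two_steps(input: &str) -> (&str, &str) {
    let mut scan = Scanner { s: input };
    let a = scan.step_by_3_bytes();
    let b = scan.step_by_3_bytes();
    (a, b)
}
```

Note that first_two_steps can even return the slices after scan itself has been dropped, because they borrow from the input string and not from the Scanner.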
Applying this change to your code is simply adjusting the definition of consume_till.
fn consume_till(&mut self, quit: |char| -> bool) -> ConsumeResult<'lt> {
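For readers on current Rust, here is a rough sketch of what that signature could look like today. It is not a port of the question's code: it assumes a simplified scanner built on char_indices().peekable(), a generic closure parameter in place of the old |char| -> bool syntax, and snake_case field names. Unlike the original, it stops before the character that satisfies quit and does not consume it.

```rust
use std::iter::Peekable;
use std::str::CharIndices;

#[derive(Debug, PartialEq)]
struct ConsumeResult<'lt> {
    value: &'lt str,
    start_index: usize,
    end_index: usize,
}

struct Scanner<'lt> {
    code: &'lt str,
    chars: Peekable<CharIndices<'lt>>,
}

impl<'lt> Scanner<'lt> {
    fn new(code: &'lt str) -> Scanner<'lt> {
        Scanner { code, chars: code.char_indices().peekable() }
    }

    // `&mut self` is NOT annotated with 'lt, so the borrow of the
    // scanner ends when the call returns; the result only borrows
    // from the underlying source text.
    fn consume_till(&mut self, quit: impl Fn(char) -> bool) -> ConsumeResult<'lt> {
        // Byte offset of the next unread character (or end of input).
        let start_index = self.chars.peek().map_or(self.code.len(), |&(i, _)| i);
        let mut end_index = self.code.len();
        while let Some(&(i, ch)) = self.chars.peek() {
            if quit(ch) {
                end_index = i;
                break;
            }
            self.chars.next();
        }
        ConsumeResult {
            value: &self.code[start_index..end_index],
            start_index,
            end_index,
        }
    }
}
```

With this signature, a second call to consume_till is accepted while the first ConsumeResult is still in use.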
So why does the Rust std library code compile, but mine doesn't? (I'm sure the lifetime annotations are at the root of it, but my understanding of lifetimes doesn't lead to me expecting a problem).
There's a slight (but not huge) difference here: Chars is just returning a char, i.e. no lifetimes in the return value. The next method (essentially) has signature:
impl<'a> Chars<'a> {
fn next(&mut self) -> Option<char> {
(It's actually in an Iterator trait impl, but that's not important.)
The situation you have here is similar to writing
impl<'a> Chars<'a> {
fn next(&'a mut self) -> Option<char> {
(Similar in terms of "incorrect linking of lifetimes", the details differ.)

Let’s look at consume_till.
It takes &'lt mut self and returns ConsumeResult<'lt>. This means the borrow of self must last for the same lifetime 'lt as the return value.
Expressed another way, after calling consume_till, you cannot use self again until its result is out of scope.
That result is placed into first_token, and first_token is still in scope in your last line.
In order to get around this, you must cause first_token to pass out of scope; the insertion of a new block around it will do this:
fn main() {
    let code_to_scan = "40 + 2";
    let mut scanner = Scanner::new(code_to_scan);
    {
        let first_token = scanner.consume_till(|c| !c.is_digit());
        println!("first token is: {}", first_token);
    }
    scanner.consume_till(|c| c.is_whitespace());
}
All this does stand to reason: while you have a reference to something inside the Scanner, it is not safe to let you modify it, lest that reference be invalidated. This is the memory safety that Rust provides.

Related

Assembling a string and returning it with lifetime parameters for a l-system

I'm trying to implement an L-system struct and am struggling with it. I have already tried different approaches, but my main struggle comes from the lifetimes of references. What I'm trying to achieve is passing the value of the applied axioms back to my system variable, which I passed with the necessary lifetime in apply_axioms_once.
use std::collections::HashMap;
struct LSytem<'a> {
axioms: HashMap<&'a char, &'a str>,
}
impl<'a> LSytem<'a> {
fn apply_axioms_once(&mut self, system: &'a mut str) -> &'a str {
let mut applied: String = String::new();
for c in system.chars() {
let axiom = self.axioms.get(&c).unwrap();
for s in axiom.chars() {
applied.push(s);
}
}
system = applied.as_str();
system
}
fn apply_axioms(&mut self, system: &'a str, iterations: u8) -> &'a str {
let mut applied: &str = system;
// check for 0?
for _ in 0..iterations {
applied = self.apply_axioms_once(applied);
}
&applied
}
}
I already read a couple of similar questions, but still can't quite wrap my head around it. What seems to be the most on point answer is https://stackoverflow.com/a/42506211/18422275, but I'm still puzzled about how to apply this to my issue.
I am still a beginner in Rust, and greener than I thought.
This can't work because you return a reference to data created inside the function. That data only lives until the end of the function scope, so the returned reference would point to nothing.
You should return a String from your functions instead, so the returned data is owned by the caller.
I made this example to try out:
use std::collections::HashMap;
struct LSytem<'a> {
    axioms: HashMap<&'a char, &'a str>,
}
impl<'a> LSytem<'a> {
    fn apply_axioms_once(&mut self, system: &String) -> String {
        let mut applied: String = String::new();
        for c in system.chars() {
            let axiom = self.axioms.get(&c).unwrap();
            for s in axiom.chars() {
                applied.push(s);
            }
        }
        applied
    }
    fn apply_axioms(&mut self, system: &String, iterations: u8) -> String {
        let mut applied = String::from(system);
        // check for 0?
        for _ in 0..iterations {
            // feed the previous result back in, not the original input
            applied = self.apply_axioms_once(&applied);
        }
        applied
    }
}
fn main() {
    let mut ls = LSytem { axioms: HashMap::new() };
    ls.axioms.insert(&'a', "abc");
    let s = String::from("a");
    ls.apply_axioms(&s, 1);
}
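If you don't need the borrowed keys and values at all, the lifetime parameter can go away entirely. The following is only a sketch under a few assumptions that are not in the question: the map owns its data (HashMap<char, String>), characters without a rule are copied through unchanged instead of panicking on unwrap, and the struct is spelled LSystem.

```rust
use std::collections::HashMap;

struct LSystem {
    // Assumption: owned keys and values, so no lifetime parameter is needed.
    axioms: HashMap<char, String>,
}

impl LSystem {
    // Returns a freshly allocated String, so nothing borrowed escapes.
    fn apply_axioms_once(&self, system: &str) -> String {
        let mut applied = String::new();
        for c in system.chars() {
            match self.axioms.get(&c) {
                Some(replacement) => applied.push_str(replacement),
                // Assumption: symbols without a rule stay as they are.
                None => applied.push(c),
            }
        }
        applied
    }

    fn apply_axioms(&self, system: &str, iterations: u8) -> String {
        let mut applied = system.to_owned();
        for _ in 0..iterations {
            // Each round rewrites the output of the previous round.
            applied = self.apply_axioms_once(&applied);
        }
        applied
    }
}
```

Taking &str instead of &String lets callers pass string literals as well as owned strings.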

What are the differences when getting an immutable reference from a mutable reference with self-linked lifetimes?

struct Foo01<'a> {
val: u32,
str: &'a String,
}
fn mutate_and_share_01<'a>(foo: &'a mut Foo01<'a>) -> &'a Foo01<'a> {
foo
}
fn mutate_and_share_02<'a>(foo: &'a mut Foo01<'a>) -> &'a Foo01 {
foo
}
fn mutate_and_share_03<'a>(foo: &'a mut Foo01) -> &'a Foo01<'a> {
foo
}
fn main() {
let mut foo = Foo01 { val: 16, str: &String::from("Hello ") };
let foo_mut = &mut foo;
//let loan = mutate_and_share_01(foo_mut);
//let loan2 = mutate_and_share_01(foo_mut); //error
//let loan = mutate_and_share_02(foo_mut);
//let loan2 = mutate_and_share_02(foo_mut); //error
let loan = mutate_and_share_03(foo_mut);
let loan2 = mutate_and_share_03(foo_mut); //good
}
What are the differences between these mutate_and_share versions?
In cases 1 and 2, you're saying that the function borrows the structure for as long as the structure borrows its parameter:
foo: &'a mut Foo01<'a>
this says "foo is borrowed for 'a" (&'a mut) and "foo borrows its parameter for 'a" (Foo01<'a>).
Meaning as far as rustc is concerned a call to this function will necessarily borrow the input forever, as the structure necessarily borrows its input for the entirety of its own lifetime, and thus you get locked out: you can't "unborrow" the input by dropping it so the second call can't work ever.
In case 3 you're relating the parameter of the output to the internal borrow which isn't really true but works well enough at least for this case. The reality is that the two lifetimes are unrelated:
fn mutate_and_share<'a, 'b>(foo: &'a mut Foo01<'b>) -> &'a Foo01<'b> {
foo
}
Also do note that your third case only works because you're never using loan, so it's immediately dropped before the second line executes. If you do this:
let loan = mutate_and_share_03(foo_mut);
let loan2 = mutate_and_share_03(foo_mut); // error once loan is used below
println!("{}", loan.val);
then it's not going to compile because the mutable borrows are overlapping.
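To see the two-lifetime signature at work, here is a small sketch. It assumes a slightly adapted Foo01 (a &str field named text instead of &String) and a made-up helper call_twice; each returned loan is used up before the next call, so the borrows never overlap.

```rust
struct Foo01<'b> {
    val: u32,
    text: &'b str,
}

// 'a is the duration of this particular borrow of the struct,
// 'b is how long the struct's inner reference is valid.
// The two are independent.
fn mutate_and_share<'a, 'b>(foo: &'a mut Foo01<'b>) -> &'a Foo01<'b> {
    foo.val += 1;
    foo
}

// Hypothetical helper for illustration only.
fn call_twice() -> (u32, u32) {
    let text = String::from("Hello ");
    let mut foo = Foo01 { val: 16, text: &text };
    // The first loan ends as soon as `.val` has been copied out...
    let first = mutate_and_share(&mut foo).val;
    // ...so a second mutable borrow is accepted here.
    let second = mutate_and_share(&mut foo).val;
    (first, second)
}
```

Holding on to the first returned reference across the second call would still be rejected, for the reason given above.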
Oh, and &String is generally useless. There are use cases for &mut String, but any time you see an immutable reference to a String you'd be better off with an &str. Same with &Vec<T>, not useful, should be &[T].

Why Rust can't coerce mutable reference to immutable reference in a type constructor?

It is possible to coerce &mut T into &T but it doesn't work if the type mismatch happens within a type constructor.
use ndarray::*; // 0.13.0
fn print(a: &ArrayView1<i32>) {
println!("{:?}", a);
}
pub fn test() {
let mut x = array![1i32, 2, 3];
print(&x.view_mut());
}
For the above code I get following error:
|
9 | print(&x.view_mut());
| ^^^^^^^^^^^^^ types differ in mutability
|
= note: expected reference `&ndarray::ArrayBase<ndarray::ViewRepr<&i32>, ndarray::dimension::dim::Dim<[usize; 1]>>`
found reference `&ndarray::ArrayBase<ndarray::ViewRepr<&mut i32>, ndarray::dimension::dim::Dim<[usize; 1]>>`
It is safe to coerce &mut i32 to &i32, so why is that not applied in this situation? Could you provide some examples of how it could backfire?
In general, it's not safe to coerce Type<&mut T> into Type<&T>.
For example, consider this wrapper type, which is implemented without any unsafe code and is therefore sound:
use std::ops::{Deref, DerefMut};
#[derive(Copy, Clone)]
struct Wrapper<T>(T);
impl<T: Deref> Deref for Wrapper<T> {
    type Target = T::Target;
    fn deref(&self) -> &T::Target { &self.0 }
}
impl<T: DerefMut> DerefMut for Wrapper<T> {
    fn deref_mut(&mut self) -> &mut T::Target { &mut self.0 }
}
This type has the property that &Wrapper<&T> automatically dereferences to &T, and &mut Wrapper<&mut T> automatically dereferences to &mut T. In addition, Wrapper<T> is copyable if T is.
Assume that there exists a function that can take a &Wrapper<&mut T> and coerce it into a &Wrapper<&T>:
fn downgrade_wrapper_ref<'a, 'b, T: ?Sized>(w: &'a Wrapper<&'b mut T>) -> &'a Wrapper<&'b T> {
    unsafe {
        // the internals of this function are not important
    }
}
By using this function, it is possible to get a mutable and immutable reference to the same value at the same time:
fn main() {
let mut value: i32 = 0;
let mut x: Wrapper<&mut i32> = Wrapper(&mut value);
let x_ref: &Wrapper<&mut i32> = &x;
let y_ref: &Wrapper<&i32> = downgrade_wrapper_ref(x_ref);
let y: Wrapper<&i32> = *y_ref;
let a: &mut i32 = &mut *x;
let b: &i32 = &*y;
// these two lines will print the same addresses
// meaning the references point to the same value!
println!("a = {:p}", a as &mut i32); // "a = 0x7ffe56ca6ba4"
println!("b = {:p}", b as &i32); // "b = 0x7ffe56ca6ba4"
}
This is not allowed in Rust: it leads to undefined behavior, which means that the function downgrade_wrapper_ref is unsound in this case. There may be other specific cases where you, as the programmer, can guarantee that this won't happen, but you still have to implement the conversion specifically for those cases, using unsafe code, so that you take responsibility for upholding those guarantees.
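By contrast, building a new read-only wrapper from a reborrow needs no unsafe code at all. This is a minimal sketch, assuming a simplified Wrapper without the Deref impls; reborrow_shared and read_then_write are names made up for illustration. The key difference from downgrade_wrapper_ref is that the shared reference only lives as long as the borrow of the original wrapper ('a), not as long as the inner &mut.

```rust
struct Wrapper<T>(T);

// Creates a *new* wrapper around a shared reborrow. While the result
// is in use, the original wrapper is borrowed and cannot hand out
// mutable access.
fn reborrow_shared<'a, T: ?Sized>(w: &'a Wrapper<&mut T>) -> Wrapper<&'a T> {
    Wrapper(&*w.0)
}

// Hypothetical helper: read through the shared view, then write
// through the original wrapper once the view is no longer used.
fn read_then_write() -> (i32, i32) {
    let mut value = 41;
    let x = Wrapper(&mut value);
    let y = reborrow_shared(&x);
    let seen = *y.0;
    // `y` is not used past this point, so mutation is allowed again.
    *x.0 += 1;
    (seen, value)
}
```

Trying to use y after the write through x is rejected by the borrow checker, which is exactly the protection the unsound function bypasses.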
Consider this check for an empty string that relies on content staying unchanged for the runtime of the is_empty function (for illustration purposes only, don't use this in production code):
struct Container<T> {
content: T
}
impl<T> Container<T> {
fn new(content: T) -> Self
{
Self { content }
}
}
impl<'a> Container<&'a String> {
fn is_empty(&self, s: &str) -> bool
{
let str = format!("{}{}", self.content, s);
&str == s
}
}
fn main() {
let mut foo : String = "foo".to_owned();
let container : Container<&mut String> = Container::new(&mut foo);
std::thread::spawn(|| {
container.content.replace_range(1..2, "");
});
println!("an empty str is actually empty: {}", container.is_empty(""))
}
This code does not compile since &mut String does not coerce into &String. If it did, however, it would be possible that the newly created thread changed the content after the format! call but before the equal comparison in the is_empty function, thereby invalidating the assumption that the container's content was immutable, which is required for the empty check.
It seems type coercions don't apply to array elements when array is the function parameter type.

Why does a call to `fn pop(&mut self) -> Result<T, &str>` continue to borrow my data structure?

I am developing some basic data structures to learn the syntax and Rust in general. Here is what I came up with for a stack:
#[allow(dead_code)]
mod stack {
pub struct Stack<T> {
data: Vec<T>,
}
impl<T> Stack<T> {
pub fn new() -> Stack<T> {
return Stack { data: Vec::new() };
}
pub fn pop(&mut self) -> Result<T, &str> {
let len: usize = self.data.len();
if len > 0 {
let idx_to_rmv: usize = len - 1;
let last: T = self.data.remove(idx_to_rmv);
return Result::Ok(last);
} else {
return Result::Err("Empty stack");
}
}
pub fn push(&mut self, elem: T) {
self.data.push(elem);
}
pub fn is_empty(&self) -> bool {
return self.data.len() == 0;
}
}
}
mod stack_tests {
use super::stack::Stack;
#[test]
fn basics() {
let mut s: Stack<i16> = Stack::new();
s.push(16);
s.push(27);
let pop_result = s.pop().expect("");
assert_eq!(s.pop().expect("Empty stack"), 27);
assert_eq!(s.pop().expect("Empty stack"), 16);
let pop_empty_result = s.pop();
match pop_empty_result {
Ok(_) => panic!("Should have had no result"),
Err(_) => {
println!("Empty stack");
}
}
if s.is_empty() {
println!("O");
}
}
}
I get this interesting error:
error[E0502]: cannot borrow `s` as immutable because it is also borrowed as mutable
--> src/main.rs:58:12
|
49 | let pop_empty_result = s.pop();
| - mutable borrow occurs here
...
58 | if s.is_empty() {
| ^ immutable borrow occurs here
...
61 | }
| - mutable borrow ends here
Why can't I just call pop on my mutable struct?
Why does pop borrow the value? If I add a .expect() after it, it is ok, it doesn't trigger that error. I know that is_empty takes an immutable reference, if I switch it to mutable I just get a second mutable borrow.
Your pop function is declared as:
pub fn pop(&mut self) -> Result<T, &str>
Due to lifetime elision, this expands to
pub fn pop<'a>(&'a mut self) -> Result<T, &'a str>
This says that the Result::Err variant is a string that lives as long as the stack you are calling it on. Since the input and output lifetimes are the same, the returned value might be pointing somewhere into the Stack data structure so the returned value must continue to hold the borrow.
If I add a .expect() after it, it is ok, it doesn't trigger that error.
That's because expect consumes the Result, discarding the Err variant without ever putting it into a variable binding. Since that's never stored, the borrow cannot be saved anywhere and it is released.
To solve the problem, you need to have distinct lifetimes between the input reference and output reference. Since you are using a string literal, the easiest solution is to denote that using the 'static lifetime:
pub fn pop(&mut self) -> Result<T, &'static str>
Extra notes:
Don't call return explicitly at the end of the block / method: return Result::Ok(last) => Result::Ok(last).
Result, Result::Ok, and Result::Err are all imported via the prelude, so you don't need to qualify them: Result::Ok(last) => Ok(last).
There's no need to specify types in many cases: let len: usize = self.data.len() => let len = self.data.len().
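Putting the 'static fix and the notes above together, a trimmed-down version of the stack might look like the sketch below. It assumes Vec::pop plus Option::ok_or in place of the manual len/remove logic, which is my substitution and not what the question used.

```rust
pub struct Stack<T> {
    data: Vec<T>,
}

impl<T> Stack<T> {
    pub fn new() -> Stack<T> {
        Stack { data: Vec::new() }
    }

    // The error is a string literal, so it is 'static and the
    // returned Result does not keep `self` borrowed.
    pub fn pop(&mut self) -> Result<T, &'static str> {
        self.data.pop().ok_or("Empty stack")
    }

    pub fn push(&mut self, elem: T) {
        self.data.push(elem);
    }

    pub fn is_empty(&self) -> bool {
        self.data.is_empty()
    }
}
```

With this signature the result of pop can be stored in a variable and inspected after later calls on the stack.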
This happens because of lifetimes. When you construct a method which takes a reference the compiler detects that and if no lifetimes are specified it "generates" them:
pub fn pop<'a>(&'a mut self) -> Result<T, &'a str> {
let len: usize = self.data.len();
if len > 0 {
let idx_to_rmv: usize = len - 1;
let last: T = self.data.remove(idx_to_rmv);
return Result::Ok(last);
} else {
return Result::Err("Empty stack");
}
}
This is what the compiler actually sees. Since you want to return a static string, you have to specify the lifetime of the &str explicitly and let the lifetime of the &mut self reference be inferred automatically:
pub fn pop(&mut self) -> Result<T, &'static str> {

What are the Rust borrowing rules regarding mutable internal references?

It's not intuitive to me why a program like
#[derive(Debug)]
struct Test {
buf: [u8; 16],
}
impl Test {
fn new() -> Test {
Test {
buf: [0u8; 16],
}
}
fn hi(&mut self) {
self.buf[0] = 'H' as u8;
self.buf[1] = 'i' as u8;
self.buf[2] = '!' as u8;
self.print();
}
fn print(&self) {
println!("{:?}", self);
}
}
fn main() {
Test::new().hi();
}
compiles and runs without any problem, but a program like
#[derive(Debug)]
enum State {
Testing([u8; 16]),
}
#[derive(Debug)]
struct Test {
state: State,
}
impl Test {
fn new() -> Test {
Test {
state: State::Testing([0u8; 16]),
}
}
fn hi(&mut self) {
match self.state {
State::Testing(ref mut buf) => {
buf[0] = 'H' as u8;
buf[1] = 'i' as u8;
buf[2] = '!' as u8;
self.print();
},
}
}
fn print(&self) {
println!("{:?}", self);
}
}
fn main() {
Test::new().hi();
}
errors during compilation with an error of
error[E0502]: cannot borrow `*self` as immutable because `self.state.0` is also borrowed as mutable
Since both programs do essentially the same thing, the second doesn't seem like it would be somehow more unsafe from a memory perspective. I know there must be something about the borrowing and scoping rules that I must be missing, but have no idea what.
In order to make your hi function work you just need to move print out of the scope of the mutable borrow introduced in its match expression:
fn hi(&mut self) {
match self.state {
State::Testing(ref mut buf) => {
buf[0] = 'H' as u8;
buf[1] = 'i' as u8;
buf[2] = '!' as u8;
},
}
self.print();
}
Your two variants are not equivalent due to the presence of the match block in the second case. I don't know how to directly access the tuple struct in the enum without pattern matching (or if this is even possible right now), but if it was the case, then there would in fact be not much difference and both versions would work.
In the match statement, you borrow self.state. In the compiler this question was asked about, borrow scopes are lexical, so self.state stays borrowed for the entire match block. When you call self.print(), you need to borrow self. But that is not possible, because part of self is already borrowed. If you move self.print() after the match statement, it will work. (Compilers with non-lexical lifetimes end the borrow at the last use of buf instead.)
Regarding the lexical borrow scope, you can read more in the second part of Two bugs in the borrow checker every Rust developer should know about. Related issues: #6393, #811.
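As a compact illustration of the suggested fix, here is a sketch in which the shared borrow of self happens only after the match has ended. To keep it easy to check, it assumes a describe method that returns the Debug string instead of printing it; that method and the String return type of hi are my additions, not part of the question.

```rust
#[derive(Debug)]
enum State {
    Testing([u8; 16]),
}

#[derive(Debug)]
struct Test {
    state: State,
}

impl Test {
    fn new() -> Test {
        Test { state: State::Testing([0u8; 16]) }
    }

    fn hi(&mut self) -> String {
        match self.state {
            // Mutable borrow of the buffer inside self.state.
            State::Testing(ref mut buf) => {
                buf[0] = b'H';
                buf[1] = b'i';
                buf[2] = b'!';
            }
        }
        // The mutable borrow is over here, so borrowing all of
        // `self` immutably is fine under either borrow checker.
        self.describe()
    }

    // Hypothetical stand-in for print(): returns the Debug output.
    fn describe(&self) -> String {
        format!("{:?}", self)
    }
}
```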

Resources