I am reading the official Rust Book and looking at listing 4-8 in Section 4.3.
The code looks like this:
fn first_word(s: &String) -> usize {
let bytes = s.as_bytes();
for (i, &item) in bytes.iter().enumerate() {
if item == b' ' {
return i;
}
}
s.len()
}
fn main() {
let mut s = String::from("hello world");
let word = first_word(&s);
s.clear();
}
This line:
let word = first_word(&s);
seems to borrow an immutable reference to s. (This is where I guess I'm wrong; I just don't know why.)
In the next line, we mutate s by calling the clear() method.
I was expecting the compiler to throw:
cannot borrow `s` as mutable because it is also borrowed as immutable
Why does this compile?
The string s is immutably borrowed for the duration of the call to first_word. As soon as the control is returned to main after first_word, the string is not considered borrowed anymore and can be mutated as you have observed.
If first_word were to return a &String which you extended the lifetime of by assigning it to a variable, then you would see the error you expected. E.g.
fn first_word(s: &String) -> &String {
&s
}
fn main() {
let mut s = String::from("hello world");
let word = first_word(&s);
s.clear();
}
cannot borrow s as mutable because it is also borrowed as immutable
https://rust.godbolt.org/z/cMVdVf
In that case, adding an extra scope would fix that:
fn main() {
let mut s = String::from("hello world");
{
let word = first_word(&s);
}
s.clear();
}
Related
I have a code looks like this:
use std::collections::HashMap;
fn main() {
let x = get_hash_map();
println!("{:?}", x);
}
fn get_hash_map() -> Option<&'static Vec<i32>> {
let mut hm = HashMap::new();
let mut vec = Vec::new();
vec.push(1);
hm.insert("1".to_string(), vec);
return hm.get("1");
}
But I got this error:
error[E0515]: cannot return value referencing local variable `hm`
--> src/main.rs:13:12
|
13 | return hm.get("1");
| --^^^^^^^^^
| |
| returns a value referencing data owned by the current function
| `hm` is borrowed here
Here is the rust playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=7605176aee2dd3ff77a0cfd04a89db55
Can anyone suggest the alternatives to fix this problem minimally? Thanks!
fn get_hash_map() -> Option<&'static Vec<i32>> {
let mut hm = HashMap::new();
let mut vec = Vec::new();
vec.push(1);
hm.insert("1".to_string(), vec);
return hm.get("1");
}
This is invalid, because you have declared you are going to return an Option<&'static Vec<i32>>, but you are returning an Option<&'a Vec<i32>> where 'a is the current function lifetime. The HashMap will stop existing as soon as the function returns, freeing the vector, and the reference will then become dangling. This is the exact kind of situation the borrow checker is designed to avoid.
Just return the vector by value:
fn get_hash_map() -> Option<Vec<i32>> {
let mut hm = HashMap::new();
let mut vec = Vec::new();
vec.push(1);
hm.insert("1".to_string(), vec);
return hm.remove("1");
}
remove moves the value out of the map, returning it as an Option<V>.
If you then need an Option<&Vec<i32>>, you can just use as_ref() on your Option<Vec<i32>> to get one. Always remember it will become invalid as soon as its value goes out of scope.
The HashMap hm is local to the scope of the get_hash_map() function and is dropped as soon as get_hash_map() returns. The value returned by hm.get("1") contains a reference to this HashMap, thus its lifetime is also tied to the scope of get_hash_map() which unfortunately is shorter than the ascribed 'static lifetime.
If you remove the 'static lifetime and replaced it by some 'a annotation on the function, you would get a similar error, since again there is no (sound) way to return borrowed data from a function that creates the owner of that data.
You can however create the map in the surrounding scope and pass it via a mutable reference to get_hash_map
use std::collections::HashMap;
fn main() {
let mut hm = HashMap::new();
let x = get_hash_map(&mut hm);
println!("{:?}", x);
}
// note that both `hm` and the reference in the return type have the same 'a lifetime.
fn get_hash_map<'a>(hm: &'a mut HashMap<String, Vec<i32>>) -> Option<&'a Vec<i32>> {
let mut vec = Vec::new();
vec.push(1);
hm.insert("1".to_string(), vec);
hm.get("1");
}
fn say_hello(s: &str) {
println!("Hello {}", s);
}
Why does this work
fn main() {
let mut name = String::from("Charlie");
let x = &mut name;
say_hello(x);
name.push_str(" Brown");
}
but this doesn't?
fn main() {
let mut name = String::from("Charlie");
let x = &mut name;
name.push_str(" Brown");
say_hello(x);
}
all I did was switch the order of the two functions but it seems like x has mutably borrowed name and push_str has also mutably borrowed name in both situations, so why does the first example compile?
If I take out the call to say_hello() why does the order of the two not matter even though there are still two mutable borrows?
Edit:
Is this similar?
fn change_string(s: &mut String) { // s is mutably borrowed but isn't used yet
println!("{}", s.is_empty()); // so the scopes don't overlap even though is_empty is making an immutable borrow?
s.push_str(" Brown");
}
One of Rust's borrowing rules is that mutable references are exclusive. Meaning, while x is alive, name cannot be used.
So, why does the first example compile even if x is still in scope? Because Rust also has non-lexical lifetimes meaning x stops "living" after its last use.
fn main() {
let mut name = String::from("Charlie");
let x = &mut name;
say_hello(x); // "x"s lifetime ends here, releasing the exclusive borrow
name.push_str(" Brown"); // "name" can be used again
}
Because in your first case, the two mutable borrow scopes don't overlap and you have only one borrow at any given point of time; but in your second case, they overlap, which means you have multiple mutable borrows at the certain point of time, which is not allowed:
First case:
fn main() {
let mut name = String::from("Charlie");
let x = &mut name;
say_hello(x); // the mutable borrow ends here
name.push_str(" Brown"); // a new mutable borrow
}
Second case:
fn main() {
let mut name = String::from("Charlie");
let x = &mut name;
name.push_str(" Brown"); // the first mutable borrow is still
// alive, you have two mutable borrows here
say_hello(x); // the first mutable borrow ends here
}
I have a function f that accepts two references, one mut and one not mut. I have values for f inside a HashMap:
use std::collections::HashMap;
fn f(a: &i32, b: &mut i32) {}
fn main() {
let mut map = HashMap::new();
map.insert("1", 1);
map.insert("2", 2);
{
let a: &i32 = map.get("1").unwrap();
println!("a: {}", a);
let b: &mut i32 = map.get_mut("2").unwrap();
println!("b: {}", b);
*b = 5;
}
println!("Results: {:?}", map)
}
This doesn't work because HashMap::get and HashMap::get_mut attempt to mutably borrow and immutably borrow at the same time:
error[E0502]: cannot borrow `map` as mutable because it is also borrowed as immutable
--> src/main.rs:15:27
|
12 | let a: &i32 = map.get("1").unwrap();
| --- immutable borrow occurs here
...
15 | let b: &mut i32 = map.get_mut("2").unwrap();
| ^^^ mutable borrow occurs here
...
18 | }
| - immutable borrow ends here
In my real code I'm using a large, complex structure instead of a i32 so it is not a good idea to clone it.
In fact, I'm borrowing two different things mutably/immutably, like:
struct HashMap {
a: i32,
b: i32,
}
let mut map = HashMap { a: 1, b: 2 };
let a = &map.a;
let b = &mut map.b;
Is there any way to explain to the compiler that this is actually safe code?
I see how it possible to solve in the concrete case with iter_mut:
{
let mut a: &i32 = unsafe { mem::uninitialized() };
let mut b: &mut i32 = unsafe { mem::uninitialized() };
for (k, mut v) in &mut map {
match *k {
"1" => {
a = v;
}
"2" => {
b = v;
}
_ => {}
}
}
f(a, b);
}
But this is slow in comparison with HashMap::get/get_mut
TL;DR: You will need to change the type of HashMap
When using a method, the compiler does not inspect the interior of a method, or perform any runtime simulation: it only bases its ownership/borrow-checking analysis on the signature of the method.
In your case, this means that:
using get will borrow the entire HashMap for as long as the reference lives,
using get_mut will mutably borrow the entire HashMap for as long as the reference lives.
And therefore, it is not possible with a HashMap<K, V> to obtain both a &V and &mut V at the same time.
The work-around, therefore, is to avoid the need for a &mut V entirely.
This can be accomplished by using Cell or RefCell:
Turn your HashMap into HashMap<K, RefCell<V>>,
Use get in both cases,
Use borrow() to get a reference and borrow_mut() to get a mutable reference.
use std::{cell::RefCell, collections::HashMap};
fn main() {
let mut map = HashMap::new();
map.insert("1", RefCell::new(1));
map.insert("2", RefCell::new(2));
{
let a = map.get("1").unwrap();
println!("a: {}", a.borrow());
let b = map.get("2").unwrap();
println!("b: {}", b.borrow());
*b.borrow_mut() = 5;
}
println!("Results: {:?}", map);
}
This will add a runtime check each time you call borrow() or borrow_mut(), and will panic if you ever attempt to use them incorrectly (if the two keys are equal, unlike your expectations).
As for using fields: this works because the compiler can reason about borrowing status on a per-field basis.
Something appears to have changed since the question was asked. In Rust 1.38.0 (possibly earlier), the following compiles and works:
use std::collections::HashMap;
fn f(a: &i32, b: &mut i32) {}
fn main() {
let mut map = HashMap::new();
map.insert("1", 1);
map.insert("2", 2);
let a: &i32 = map.get("1").unwrap();
println!("a: {}", a);
let b: &mut i32 = map.get_mut("2").unwrap();
println!("b: {}", b);
*b = 5;
println!("Results: {:?}", map)
}
playground
There is no need for RefCell, nor is there even a need for the inner scope.
Vec<T> has two methods:
fn push(&mut self, value: T)
fn split_at_mut(&mut self, mid: usize) -> (&mut [T], &mut [T])
They both take a mutable reference to the vector. But the scope of the borrow seems to be different, e.g:
fn works() {
let mut nums: Vec<i64> = vec![1,2,3,4];
nums.push(5);
println!("{}", nums.len());
}
fn doesnt_work() {
let mut nums: Vec<i64> = vec![1,2,3,4];
let (l,r) = nums.split_at_mut(2);
println!("{}", nums.len());
}
fn also_works() {
let mut nums: Vec<i64> = vec![1,2,3,4];
let _ = nums.split_at_mut(2);
println!("{}", nums.len());
}
The doesnt_work function doesn't compile, saying there is already a mutable borrow on nums and that it ends and the end of the function. The problem goes away if I ignore the values returned from split_at_mut.
The borrowing of nums in doesnt_work will last as long as the variables l and r exist because the values in the vector (and the vector itself) have literally been borrowed and are now accessible only through l and r.
You can see this effect by putting the let for l and r in a scope which ends so the borrow also ends. For example this code works fine but if you try to move the println! inside the scope (inside the curly brackets) then it will fail:
fn works() {
let mut nums = vec![1,2,3,4];
{
let (l, r) = nums.split_at_mut(2);
//println!("{}", nums.len()); //println! will fail here
}
println!("{}", nums.len());
}
In your also_works example you don't do anything with the result so the borrow is lost immediately. Basically the compiler can see that there is no way you can access the vector through the result of the method so you're free to access them through the original vector.
Let me answer my own question, since what I was really missing were lifetimes. This code compiles:
fn maybe_use<'a, 'b>(v1: &'a mut Vec<i64>, v2: &'b mut Vec<i64>) -> &'a mut Vec<i64> {
v1
}
fn main() {
let mut nums1: Vec<i64> = vec![1,2,3,4];
let mut nums2: Vec<i64> = vec![1,2,3,4];
let ret = maybe_use(&mut nums1, &mut nums2);
println!("{}", nums2.len());
}
Because the return type of maybe_use makes it clear the reference comes from the first argument. If we change v2 to use 'a lifetime, main stops compiling because both vectors passed to maybe_use are considered borrowed. If we omit the lifetime altogether, compiler emits this error:
this function's return type contains a borrowed value, but the
signature does not say whether it is borrowed from v1 or v2
So what surprised me initially (how does the compiler know split_at_mut returns pointers to the vector?) boils down to the references having the same lifetime.
I'm getting a Rust compile error from the borrow checker, and I don't understand why. There's probably something about lifetimes I don't fully understand.
I've boiled it down to a short code sample. In main, I want to do this:
fn main() {
let codeToScan = "40 + 2";
let mut scanner = Scanner::new(codeToScan);
let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
println!("first token is: {}", first_token);
// scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
}
Trying to call scanner.consume_till a second time gives me this error:
example.rs:64:5: 64:12 error: cannot borrow `scanner` as mutable more than once at a time
example.rs:64 scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
^~~~~~~
example.rs:62:23: 62:30 note: previous borrow of `scanner` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `scanner` until the borrow ends
example.rs:62 let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
^~~~~~~
example.rs:65:2: 65:2 note: previous borrow ends here
example.rs:59 fn main() {
...
example.rs:65 }
Basically, I've made something like my own iterator, and the equivalent to the "next" method takes &mut self. Because of that, I can't use the method more than once in the same scope.
However, the Rust std library has an iterator which can be used more than once in the same scope, and it also takes a &mut self parameter.
let test = "this is a string";
let mut iterator = test.chars();
iterator.next();
iterator.next(); // This is PERFECTLY LEGAL
So why does the Rust std library code compile, but mine doesn't? (I'm sure the lifetime annotations are at the root of it, but my understanding of lifetimes doesn't lead to me expecting a problem).
Here's my full code (only 60 lines, shortened for this question):
use std::str::{Chars};
use std::iter::{Enumerate};
#[deriving(Show)]
struct ConsumeResult<'lt> {
value: &'lt str,
startIndex: uint,
endIndex: uint,
}
struct Scanner<'lt> {
code: &'lt str,
char_iterator: Enumerate<Chars<'lt>>,
isEof: bool,
}
impl<'lt> Scanner<'lt> {
fn new<'lt>(code: &'lt str) -> Scanner<'lt> {
Scanner{code: code, char_iterator: code.chars().enumerate(), isEof: false}
}
fn assert_not_eof<'lt>(&'lt self) {
if self.isEof {fail!("Scanner is at EOF."); }
}
fn next(&mut self) -> Option<(uint, char)> {
self.assert_not_eof();
let result = self.char_iterator.next();
if result == None { self.isEof = true; }
return result;
}
fn consume_till<'lt>(&'lt mut self, quit: |char| -> bool) -> ConsumeResult<'lt> {
self.assert_not_eof();
let mut startIndex: Option<uint> = None;
let mut endIndex: Option<uint> = None;
loop {
let should_quit = match self.next() {
None => {
endIndex = Some(endIndex.unwrap() + 1);
true
},
Some((i, ch)) => {
if startIndex == None { startIndex = Some(i);}
endIndex = Some(i);
quit (ch)
}
};
if should_quit {
return ConsumeResult{ value: self.code.slice(startIndex.unwrap(), endIndex.unwrap()),
startIndex:startIndex.unwrap(), endIndex: endIndex.unwrap() };
}
}
}
}
fn main() {
let codeToScan = "40 + 2";
let mut scanner = Scanner::new(codeToScan);
let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
println!("first token is: {}", first_token);
// scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
}
Here's a simpler example of the same thing:
struct Scanner<'a> {
s: &'a str
}
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'a>(&'a mut self) -> &'a str {
let return_value = self.s.slice_to(3);
self.s = self.s.slice_from(3);
return_value
}
}
fn main() {
let mut scan = Scanner { s: "123456" };
let a = scan.step_by_3_bytes();
println!("{}", a);
let b = scan.step_by_3_bytes();
println!("{}", b);
}
If you compile that, you get errors like the code in the question:
<anon>:19:13: 19:17 error: cannot borrow `scan` as mutable more than once at a time
<anon>:19 let b = scan.step_by_3_bytes();
^~~~
<anon>:16:13: 16:17 note: previous borrow of `scan` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `scan` until the borrow ends
<anon>:16 let a = scan.step_by_3_bytes();
^~~~
<anon>:21:2: 21:2 note: previous borrow ends here
<anon>:13 fn main() {
...
<anon>:21 }
^
Now, the first thing to do is to avoid shadowing lifetimes: that is, this code has two lifetimes called 'a and all the 'as in step_by_3_bytes refer to the 'a declare there, none of them actually refer to the 'a in Scanner<'a>. I'll rename the inner one to make it crystal clear what is going on
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'b mut self) -> &'b str {
The problem here is the 'b is connecting the self object with the str return value. The compiler has to assume that calling step_by_3_bytes can make arbitrary modifications, including invalidating previous return values, when looking at the definition of step_by_3_bytes from the outside (which is how the compiler works, type checking is purely based on type signatures of things that are called, no introspect). That is, it could be defined like
struct Scanner<'a> {
s: &'a str,
other: String,
count: uint
}
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'b mut self) -> &'b str {
self.other.push_str(self.s);
// return a reference into data we own
self.other.as_slice()
}
}
Now, each call to step_by_3_bytes starts modifying the object that previous return values came from. E.g. it could cause the String to reallocate and thus move in memory, leaving any other &str return values as dangling pointers. Rust protects against this by tracking these references and disallowing mutation if it could cause such catastrophic events. Going back to our actual code: the compiler is type checking main just by looking at the type signature of step_by_3_bytes/consume_till and so it can only assume the worst case scenario (i.e. the example I just gave).
How do we solve this?
Let's take a step back: as if we're just starting out and don't know which lifetimes we want for the return values, so we'll just leave them anonymous (not actually valid Rust):
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'_ mut self) -> &'_ str {
Now, we get to ask the fun question: which lifetimes do we want where?
It's almost always best to annotate the longest valid lifetimes, and we know our return value lives for 'a (since it comes straight of the s field, and that &str is valid for 'a). That is,
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'_ mut self) -> &'a str {
For the other '_, we don't actually care: as API designers, we don't have any particular desire or need to connect the self borrow with any other references (unlike the return value, where we wanted/needed to express which memory it came from). So, we might as well leave it off
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&mut self) -> &'a str {
The 'b is unused, so it can be killed, leaving us with
impl<'a> Scanner<'a> {
fn step_by_3_bytes(&mut self) -> &'a str {
This expresses that Scanner is referring to some memory that is valid for at least 'a, and then returning references into just that memory. The self object is essentially just a proxy for manipulating those views: once you have the reference it returns, you can discard the Scanner (or call more methods).
In summary, the full, working code is
struct Scanner<'a> {
s: &'a str
}
impl<'a> Scanner<'a> {
fn step_by_3_bytes(&mut self) -> &'a str {
let return_value = self.s.slice_to(3);
self.s = self.s.slice_from(3);
return_value
}
}
fn main() {
let mut scan = Scanner { s: "123456" };
let a = scan.step_by_3_bytes();
println!("{}", a);
let b = scan.step_by_3_bytes();
println!("{}", b);
}
Applying this change to your code is simply adjusting the definition of consume_till.
fn consume_till(&mut self, quit: |char| -> bool) -> ConsumeResult<'lt> {
So why does the Rust std library code compile, but mine doesn't? (I'm sure the lifetime annotations are at the root of it, but my understanding of lifetimes doesn't lead to me expecting a problem).
There's a slight (but not huge) difference here: Chars is just returning a char, i.e. no lifetimes in the return value. The next method (essentially) has signature:
impl<'a> Chars<'a> {
fn next(&mut self) -> Option<char> {
(It's actually in an Iterator trait impl, but that's not important.)
The situation you have here is similar to writing
impl<'a> Chars<'a> {
fn next(&'a mut self) -> Option<char> {
(Similar in terms of "incorrect linking of lifetimes", the details differ.)
Let’s look at consume_till.
It takes &'lt mut self and returns ConsumeResult<'lt>. This means that the lifetime 'lt, the duration of the borrow of the input parameter self, will be that of the output parameter, the return value.
Expressed another way, after calling consume_till, you cannot use self again until its result is out of scope.
That result is placed into first_token, and first_token is still in scope in your last line.
In order to get around this, you must cause first_token to pass out of scope; the insertion of a new block around it will do this:
fn main() {
let code_to_scan = "40 + 2";
let mut scanner = Scanner::new(code_to_scan);
{
let first_token = scanner.consume_till(|c| !c.is_digit());
println!("first token is: {}", first_token);
}
scanner.consume_till(|c| c.is_whitespace());
}
All this does stand to reason: while you have a reference to something inside the Scanner, it is not safe to let you modify it, lest that reference be invalidated. This is the memory safety that Rust provides.