I am working may way through the Rust book, and it has this code snippet:
fn first_word(s: &String) -> usize {
let bytes = s.as_bytes();
for (i, &item) in bytes.iter().enumerate() {
if item == b' ' {
return i;
}
}
s.len()
}
In the previous chapter, it was explained that
the rust compiler prevents you from retaining references into objects after they go out of scope (a dangling reference), and
variables go out of scope at the last moment they are mentioned (the example given of this showed the creation of both an immutable and mutable reference to the same object in the same block by ensuring that the immutable reference was not mentioned after the creation of the mutable reference).
To me, it looks like bytes is not referenced after the for line header (presumably the code associated with bytes.iter().enumerate() is executed just once before the loop starts, not on every loop iteration), so &item shouldn't be allowed to be a reference into any part of bytes. But I don't see any other object (is "object" the right rust terminology?) that it could be a reference into.
It's true that s is still in scope, but, well... does the compiler even remember the connection between bytes and s by the time the for loop rolls around? Indeed, even if I change the function to accept bytes directly, the compiler thinks things are hunky-dory:
fn first_word(bytes: &[u8]) -> usize {
for (i, &item) in bytes.iter().enumerate() {
if item == b' ' {
return i+1;
}
}
0
}
Here no variables other than i or item is mentioned after the for loop header, so it really seems like &item can't be a reference into anything!
I looked at this very similarly-titled question. The comments and answers there suggest that there might be a lifetime argument which is keeping bytes alive, and proposed a way to ask the compiler what it thinks the type/lifetime is. I haven't learned about lifetimes yet, so I'm fumbling about in the dark a little, but I tried:
fn first_word<'x_lifetime>(s: &String) -> usize {
let bytes = s.as_bytes();
let x_variable: &'x_lifetime () = bytes;
for (i, &item) in bytes.iter().enumerate() {
if item == b' ' {
return i+1;
}
}
0
}
However, the error, while indeed indicating that bytes is a &[u8], which I understand, also seems to imply there's no extra lifetime information associated with bytes. I say this because the error includes lifetime information in the "expected" part but not in the "found" part. Here it is:
error[E0308]: mismatched types
--> test.rs:3:39
|
3 | let x_variable: &'x_lifetime () = bytes;
| --------------- ^^^^^ expected `()`, found slice `[u8]`
| |
| expected due to this
|
= note: expected reference `&'x_lifetime ()`
found reference `&[u8]`
So what is going on here? Obviously some part of my reasoning is off, but what? Why isn't item a dangling reference?
Here is a version of the first function with all the lifetimes and types included, and the for loop replaced with an equivalent while let loop.
fn first_word<'a>(s: &'a String) -> usize {
let bytes: &'a [u8] = s.as_bytes();
let mut iter: std::iter::Enumerate<std::slice::Iter<'a, u8>>
= bytes.iter().enumerate();
while let Some(iter_item) = iter.next() {
let (i, &item): (usize, &'a u8) = iter_item;
if item == b' ' {
return i;
}
}
s.len()
}
Things to notice here:
The type of the iterator produced by bytes.iter().enumerate() has a lifetime parameter which ensures the iterator does not outlive the [u8] it iterates over. (Note that the &[u8] can go away — it isn't needed. What matters is that its referent, the bytes inside the String, stays alive. So, we only really need to think about one lifetime 'a, not separate lifetimes for s and bytes, because there's only one byte-slice inside the String that we're referring to in different ways.
iter — an explicit variable corresponding to the implicit action of for — is used in every iteration of the loop.
I see another misunderstanding, not exactly about the lifetimes and scope:
there might be a lifetime argument which is keeping bytes alive,
Lifetimes never keep something alive. Lifetimes never affect the execution of the program; they never affect when something is dropped or deallocated. A lifetime in some type such as &'a u8 is a compile-time claim that values of that type will be valid for that lifetime. Changing the lifetimes in a program changes only what is to be proven (checked) by the compiler, not what is true about the program.
Related
In attempting to refactor a Rust application that was working fine, I tried to separate out the contents of a loop into a new function. However, in this newly refactored out function, I needed to pass an argument that had to be mutable, and passed by reference. Suddenly the code that absolutely worked inline, broke just because of the mutable reference passing.
My question is: can someone please explain why this does not work with such a "simple" change? (i.e. refactoring out a new function of otherwise unchanged code)
I have a minimal demo of the problem, along with a couple of working comparisons below. Here is the error from that code:
error[E0499]: cannot borrow `str_to_int` as mutable more than once at a time
--> src/main.rs:30:22
|
30 | get_key(key, &mut str_to_int);
| ^^^^^^^^^^^^^^^ `str_to_int` was mutably borrowed here in the previous iteration of the loop
The sample code:
use std::collections::BTreeMap;
fn get_int (
key: u32,
values: &mut BTreeMap<u32, u32>,
) -> &u32 {
values.entry(key).or_insert_with(|| { 1 })
}
fn get_key<'a> (
key: &'a str,
values: &'a mut BTreeMap<&'a str, u32>,
) -> &'a u32 {
values.entry(key).or_insert_with(|| { 1 })
}
fn main() {
let mut int_to_int = BTreeMap::new();
for key in vec![1,2] {
get_int(key, &mut int_to_int);
}
let mut str_to_int_inline = BTreeMap::new();
for key in vec!["a","b"] {
str_to_int_inline.entry(key).or_insert_with(|| { 1 });
}
let mut str_to_int = BTreeMap::new();
for key in vec!["a","b"] {
get_key(key, &mut str_to_int);
}
}
Note that the first loop (int_to_int) is identical to the third loop (str_to_int) except for the data type of the key -- in that the key was not a reference, so no lifetime was required to be specified. And the second loop (str_to_int_inline) is identical to the third loop (str_to_int) except the behavior is inline instead of in a separate function.
There are many related questions and blogs on this topic, but they all seem more specifically focused on particular versions of this question, and I want to know the more generic explanation (to my current understanding). If the answer is already just to understand one of these links better, I will gladly mark this question as a duplicate.
Related questions:
How to fix ".. was mutably borrowed here in the previous iteration of the loop" in Rust?
https://users.rust-lang.org/t/mutable-borrow-starts-here-in-previous-iteration-of-loop/26145
https://github.com/rust-lang/rust/issues/47680#issuecomment-363131420
Why does linking lifetimes matter only with mutable references?
Something that I read also led me to https://github.com/rust-lang/polonius which also seemed like maybe it could make this work, in the future -- any thoughts?
Your problem is very simple: you are specifying the lifetimes for get_key() incorrectly.
The correct (and working) version is:
fn get_key<'a, 'b>(key: &'a str, values: &'b mut BTreeMap<&'a str, u32>) -> &'b u32 {
values.entry(key).or_insert_with(|| 1)
}
Maybe you can already guess what's going on.
Since you used 'a for both the HashMap itself and its keys, it means you were required to borrow the HashMap for as long as the keys' lifetime - 'static. This means two things:
You need a &'static mut HashMap, which you don't have.
You're borrowing the HashMap for 'static in the first iteration of the loop, then borrow it again in the next, while it is still borrowed (because it is borrowed for 'static). This error hides the first error, somewhat erroneously, and is the only error the compiler emits.
In general, using a lifetime twice with a mutable reference is almost always wrong (shared references are more tolerant, as they're covariant over their type).
The Rust Programming Language says:
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
if x.len() > y.len() {
x
} else {
y
}
}
The function signature now tells Rust that for some lifetime 'a, the
function takes two parameters, both of which are string slices that
live at least as long as lifetime 'a. The function signature also
tells Rust that the string slice returned from the function will live
at least as long as lifetime 'a. In practice, it means that the
lifetime of the reference returned by the longest function is the
same as the smaller of the lifetimes of the references passed in.
These relationships are what we want Rust to use when analyzing this
code.
I don't get why it says:
In practice, it means that the lifetime of the reference returned by
the longest function is the same as the smaller of the lifetimes of
the references passed in.
Note the word "smaller". For both parameters and returned values, we specified 'a which is the same. Why does the book say "smaller"? If that was the case, we would have different specifiers(a', b').
It's important to note that in the example, whatever is passed as x and y do not have to have identical lifetimes.
Let's rework the example to use &i32 which makes for easier demonstrations:
fn biggest<'a>(x: &'a i32, y: &'a i32) -> &'a i32 {
if x > y {
x
} else {
y
}
}
Now, given the following example:
fn main() {
let bigger;
let a = 64;
{
let b = 32;
bigger = biggest(&a, &b);
}
dbg!(bigger);
}
We have 3 different lifetimes:
bigger which lives until the end of main
a which lives until the end of main
b which lives until the end of its block
Now lets take the description apart
the lifetime of the reference returned by the [...] function
In our case this would be bigger, which lives until the end of main
smaller of the lifetimes of the references passed in
We passed in a, which gets dropped at the end of main, and b, which gets dropped at the end of its block. Since b gets dropped first, it is the "smaller" lifetime.
And sure enough, the compiler yells at us:
error[E0597]: `b` does not live long enough
--> src/main.rs:7:30
|
7 | bigger = biggest(&a, &b);
| ^^ borrowed value does not live long enough
8 | }
| - `b` dropped here while still borrowed
9 |
10 | dbg!(bigger);
| ------ borrow later used here
But if we move bigger inside the block, and hence reduce its lifetime, the code compiles:
fn main() {
let a = 64;
{
let b = 32;
let bigger = biggest(&a, &b);
dbg!(bigger);
}
println!("This works!");
}
Now a and b still have different lifetimes but the code compiles since the the smaller of both lifetimes (b's) is the same as bigger's lifetime.
Did this help illustrate what "smaller" lifetime means?
I'm trying to modify the borrow of a mutable value, here is a minimal example:
fn main() {
let mut w: [char; 5] = ['h', 'e', 'l', 'l', 'o'];
let mut wslice: &mut [char] = &mut w;
advance_slice(&mut wslice);
advance_slice(&mut wslice);
}
fn advance_slice(s: &mut &mut [char]) {
let new: &mut [char] = &mut s[1..];
*s = new;
}
the compiler gives me this error:
error[E0623]: lifetime mismatch
--> src/main.rs:10:10
|
8 | fn advance_slice(s: &mut &mut [char]) {
| ----------------
| |
| these two types are declared with different lifetimes...
9 | let new: &mut [char] = &mut s[1..];
10 | *s = new;
| ^^^ ...but data from `s` flows into `s` here
I've tried to give the same lifetime to both borrow, without success.
Also this works if I remove the mutability of w.
This error message is indeed unfortunate, and I don't think it provides a good explanation of what's going on here. The issue is a bit involved.
The error is local to the advance_slice() function, so that's all we need to look at. The type of the slice items is irrelevant, so let's take this function definition:
fn advance_slice_mut<T>(s: &mut &mut [T]) {
let new = &mut s[1..];
*s = new;
}
The first line creates a new slice object starting after the first item of the original slice.
Why is this even allowed? Don't we have two mutable references to the same data now? The original slice *s includes the new slice new, and both allow modifying data. The reason this is legal is that *s is implicitly reborrowed when creating the subslice, and *s cannot be used again for the lifetime of that borrow, so we still have only one active reference to the data in the subslice. The reborrow is scoped to the function advance_slice_mut(), and thus has a shorter lifetime than the original slice, which is the root cause of the error you get – you are effectively trying to assign a slice which only lives until the end of the function to a memory location that lives longer than the function call.
This kind of implicit reborrow happens everytime you call a function that takes an argument by mutable reference, including in the implicit call to index_mut() in &mut s[1..]. Mutable references can't be copied, since this would create two mutable references to the same memory, and the Rust language designers decided that an implicit reborrow is the more ergonomic solution than moving mutable references by default. The reborrow does not happen for shared references, though, since they can be freely copied. This means that &s[1..] will have the same lifetime as the original scope, since it is perfectly fine for two overlapping immutable slices to coexist. This explains why your function definition works fine for immutable slices.
So how do we fix this problem? I believe that what you intend to do is perfectly safe – after reassigning the new slice to the old one, the old one is gone, so we don't have two concurrent mutable references to the same memory. To create a slice with the same lifetime as the original slice in safe code, we need to move the original slice out of the reference we got. We can do this by replacing it with an empty slice:
pub fn advance_slice_mut<T>(s: &mut &mut [T]) {
let slice = std::mem::replace(s, &mut []);
*s = &mut slice[1..];
}
Alternatively, we can also resort to unsafe code:
use std::slice::from_raw_parts_mut;
pub fn advance_slice_mut<T>(s: &mut &mut [T]) {
unsafe {
assert!(!s.is_empty());
*s = from_raw_parts_mut(s.as_mut_ptr().add(1), s.len() - 1);
}
}
I've just written a small Rust program which calculates Fibonacci numbers and memoizes the calculation. It works, but I'm a little confused about why, especially the recursive call. (It also probably isn't idiomatic.)
Here's the program:
use std::collections::HashMap;
fn main() {
let n = 42; // hardcoded for simplicity
let mut cache = HashMap::new();
let answer = fib(n, &mut cache);
println!("fib of {} is {}", n, answer);
}
fn fib(n: i32, cache: &mut HashMap<i32,i32>) -> i32 {
if cache.contains_key(&n) {
return cache[&n];
} else {
if n < 1 { panic!("must be >= 1") }
let answer = if n == 1 {
0
} else if n == 2 {
1
} else {
fib(n - 1, cache) + fib(n - 2, cache)
};
cache.insert(n, answer);
answer
}
}
Here's how I understand what's going on:
In main, let mut cache means "I want to be able to mutate this hashmap (or re-assign the variable)".
When main calls fib, it passes &mut cache to say "I'm lending you this, and you're allowed to mutate it."
In the signature of fib, cache: &mut Hashmap means "I expect to be lent a mutable HashMap - to borrow it with permission to mutate"
(Please correct me if I'm wrong.)
But when fib recurses, calling fib(n -1, cache), I do not need to use fib(n -1, &mut cache), and I get an error if I do: "cannot borrow immutable local variable cache as mutable". Huh? It's not an immutable local variable, it's a mutable borrow - right?
If I try fib(n - 1, &cache), I get a slightly different error:
error: mismatched types:
expected `&mut std::collections::hash::map::HashMap<i32, i32>`,
found `&&mut std::collections::hash::map::HashMap<i32, i32>`
Which looks like it's saying "I expected a mutable reference and got a reference to a mutable reference".
I know that fib is lending in the recursive call because if it gave up ownership, it couldn't call cache.insert afterwards. And I know that this isn't a special case for recursion, because if I define fib2 to be nearly identical to fib, I can have them recurse via each other and it works fine.
Why do I not need to explicitly lend a borrowed, mutable variable?
Your three points are pretty much spot-on. When the compiler won't allow you to pass &mut cache, it is because the value is actually already borrowed. The type of cache is &mut HashMap<i32, i32>, so passing &mut cache results in a value of type &mut &mut HashMap<i32, i32>. Just passing cache results in the expected type.
The specific error message cannot borrow immutable local variable cache as mutable is triggered because the variable cache isn't itself mutable, even though the memory it points to (the HashMap) is. This is because the argument declaration cache: &mut HashMap<i32, i32> doesn't declare a mut variable. This is similar to how a let differs in mutability from a let mut. Rust does support mutable arguments, which in this case would look like mut cache: &mut HashMap<i32, i32>.
In your function, cache is a reference of the outside HashMap. And if you still write & mut cache, you are trying to mutate a reference, not the original HashMap.
What is the difference between passing a value to a function by reference and passing it "by Box":
fn main() {
let mut stack_a = 3;
let mut heap_a = Box::new(3);
foo(&mut stack_a);
println!("{}", stack_a);
let r = foo2(&mut stack_a);
// compile error if the next line is uncommented
// println!("{}", stack_a);
bar(heap_a);
// compile error if the next line is uncommented
// println!("{}", heap_a);
}
fn foo(x: &mut i32) {
*x = 5;
}
fn foo2(x: &mut i32) -> &mut i32 {
*x = 5;
x
}
fn bar(mut x: Box<i32>) {
*x = 5;
}
Why is heap_a moved into the function, but stack_a is not (stack_a is still available in the println! statement after the foo() call)?
The error when uncommenting println!("{}", stack_a);:
error[E0502]: cannot borrow `stack_a` as immutable because it is also borrowed as mutable
--> src/main.rs:10:20
|
8 | let r = foo2(&mut stack_a);
| ------- mutable borrow occurs here
9 | // compile error if the next line is uncommented
10 | println!("{}", stack_a);
| ^^^^^^^ immutable borrow occurs here
...
15 | }
| - mutable borrow ends here
I think this error can be explained by referring to lifetimes. In the case of foo, stack_a (in the main function) is moved to function foo, but the compiler finds that the lifetime of the argument of the function foo, x: &mut i32, ends at end of foo. Hence, it lets us use the variable stack_a in the main function after foo returns. In the case of foo2, stack_a is also moved to the function, but we also return it.
Why doesn't the lifetime of heap_a end at end of bar?
Pass-by-value is always either a copy (if the type involved is “trivial”) or a move (if not). Box<i32> is not copyable because it (or at least one of its data members) implements Drop. This is typically done for some kind of “clean up” code. A Box<i32> is an “owning pointer”. It is the sole owner of what it points to and that's why it “feels responsible” to free the i32's memory in its drop function. Imagine what would happen if you copied a Box<i32>: Now, you would have two Box<i32> instances pointing to the same memory location. This would be bad because this would lead to a double-free error. That's why bar(heap_a) moves the Box<i32> instance into bar(). This way, there is always no more than a single owner of the heap-allocated i32. And this makes managing the memory pretty simple: Whoever owns it, frees it eventually.
The difference to foo(&mut stack_a) is that you don't pass stack_a by value. You just “lend” foo() stack_a in a way that foo() is able to mutate it. What foo() gets is a borrowed pointer. When execution comes back from foo(), stack_a is still there (and possibly modified via foo()). You can think of it as stack_a returned to its owning stack frame because foo() just borrowed it only for a while.
The part that appears to confuse you is that by uncommenting the last line of
let r = foo2(&mut stack_a);
// compile error if uncomment next line
// println!("{}", stack_a);
you don't actually test whether stack_a as been moved. stack_a is still there. The compiler simply does not allow you to access it via its name because you still have a mutably borrowed reference to it: r. This is one of the rules we need for memory safety: There can only be one way of accessing a memory location if we're also allowed to alter it. In this example r is a mutably borrowed reference to stack_a. So, stack_a is still considered mutably borrowed. The only way of accessing it is via the borrowed reference r.
With some additional curly braces we can limit the lifetime of that borrowed reference r:
let mut stack_a = 3;
{
let r = foo2(&mut stack_a);
// println!("{}", stack_a); WOULD BE AN ERROR
println!("{}", *r); // Fine!
} // <-- borrowing ends here, r ceases to exist
// No aliasing anymore => we're allowed to use the name stack_a again
println!("{}", stack_a);
After the closing brace there is again only one way of accessing the memory location: the name stack_a. That's why the compiler lets us use it in println!.
Now you may wonder, how does the compiler know that r actually refers to stack_a? Does it analyze the implementation of foo2 for that? No. There is no need. The function signature of foo2 is sufficient in reaching this conclusion. It's
fn foo2(x: &mut i32) -> &mut i32
which is actually short for
fn foo2<'a>(x: &'a mut i32) -> &'a mut i32
according to the so-called “lifetime elision rules”. The meaning of this signature is: foo2() is a function that takes a borrowed pointer to some i32 and returns a borrowed pointer to an i32 which is the same i32 (or at least a “part” of the original i32) because the the same lifetime parameter is used for the return type. As long as you hold on to that return value (r) the compiler considers stack_a mutably borrowed.
If you're interested in why we need to disallow aliasing and (potential) mutation happening at the same time w.r.t. some memory location, check out Niko's great talk.
When you pass a boxed value, you are moving the value completely. You no longer own it, the thing you passed it to does. It is so for any type that is not Copy (plain old data that can just be memcpy’d, which a heap allocation certainly can’t be). This is how Rust’s ownership model works: each object is owned in exactly one place.
If you wish to mutate the contents of the box, you should pass in a &mut i32 rather than the whole Box<i32>.
Really, Box<T> is only useful for recursive data structures (so that they can be represented rather than being of infinite size) and for the very occasional performance optimisation on large types (which you shouldn’t try doing without measurements).
To get &mut i32 out of a Box<i32>, take a mutable reference to the dereferenced box, i.e. &mut *heap_a.
The difference between passing by reference and "by box" is that, in the reference case ("lend"), the caller is responsible for deallocating the object, but in the box case ("move"), the callee is responsible for deallocating the object.
Therefore, Box<T> is useful for passing objects with responsibility for deallocating, while the reference is useful for passing objects without responsibility for deallocating.
A simple example which demonstrates these ideas:
fn main() {
let mut heap_a = Box::new(3);
foo(&mut *heap_a);
println!("{}", heap_a);
let heap_b = Box::new(3);
bar(heap_b);
// can't use `heap_b`. `heap_b` has been deallocated at the end of `bar`
// println!("{}", heap_b);
} // `heap_a` is destroyed here
fn foo(x: &mut i32) {
*x = 5;
}
fn bar(mut x: Box<i32>) {
*x = 5;
} // heap_b (now `x`) is deallocated here