Do lifetime annotations in Rust change the lifetime of the variables?

The Rust book's chapter on lifetimes states that the annotations don't tamper with the lifetime of a variable, but how true is that? According to the book, the function longest takes two string references and returns the longer one. But here, in the error case,
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
if x.len() > y.len() {
x
} else {
y
}
}
fn main() {
let string1 = String::from("long string is long");
let result;
{
let string2 = String::from("xyz");
result = longest(string1.as_str(), string2.as_str());
}
println!("The longest string is {}", result);
}
it does actually change the lifetime of the result variable, doesn't it?
We’ve told Rust that the lifetime of the reference returned by the longest function is the same as the smaller of the lifetimes of the references passed in.

What the book is merely suggesting is that a lifetime parameter of a function cannot interfere with the affected value's lifetime. They cannot make a value live longer (or the opposite) than what is already prescribed by the program.
However, different function signatures can decide the lifetime of those references. Since references are covariant with respect to their lifetimes, you can turn a reference of a "wider" lifetime into a smaller one within that lifetime.
For example, given the definition
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str
, the lifetimes of the two input references must match. However, we can write this:
let local = "I am local string.".to_string();
longest(&local, "I am &'static str!");
The string literal, which has a 'static lifetime, is compatible with the lifetime 'a, in this case mainly constrained by the string local.
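As a minimal sketch of that narrowing (the shorten helper and the string contents are illustrative assumptions, not taken from the book), a &'static str can be passed wherever a shorter-lived &'a str is expected:

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

// Hypothetical helper: makes the lifetime shortening explicit.
// `'long: 'short` reads "'long outlives 'short".
fn shorten<'short, 'long: 'short>(r: &'long str) -> &'short str {
    r
}

fn main() {
    let literal: &'static str = "static";
    let local = String::from("short-lived local");

    // Both arguments are unified under one lifetime 'a that fits within
    // the scope of `local`; the 'static reference is simply narrowed.
    let result = longest(&local, literal);
    assert_eq!(result, "short-lived local");

    // The same narrowing, spelled out through the helper.
    let narrowed: &str = shorten(literal);
    assert_eq!(narrowed, "static");
}
```

Nothing about the literal itself changes here; only the type the compiler assigns to this particular use of the reference is narrowed.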
Likewise, in the example above, the lifetime 'a has to be constrained to the nested string string2, otherwise it could not be passed by reference to the function. This also means that the output reference is restrained by this lifetime, which is why the code fails to compile when attempting to use the output of longest outside the scope of string2:
error[E0597]: `string2` does not live long enough
--> src/main.rs:14:44
|
14 | result = longest(string1.as_str(), string2.as_str());
| ^^^^^^^ borrowed value does not live long enough
15 | }
| - `string2` dropped here while still borrowed
16 | println!("The longest string is {}", result);
| ------ borrow later used here
See also this question for an extended explanation of lifetimes and their covariance/contravariance characteristics:
How can this instance seemingly outlive its own parameter lifetime?

First, it's important to understand the difference between a lifetime and a scope. References have lifetimes, which are dependent on the scopes of the variables they refer to.
A variable scope is lexical:
fn main() {
let string1 = String::from("long string is long"); // <-- scope of string1 begins here
let result;
{
let string2 = String::from("xyz"); // <-- scope of string2 begins here
result = longest(string1.as_str(), string2.as_str());
// <-- scope of string2 ends here
}
println!("The longest string is {}", result);
// <-- scope of string1 ends here
}
When you create a new reference to a variable, the lifetime of the reference is tied solely to the scope of the variable. Other references have different lifetime information attached to them, depending on where the reference came from and what information is known in that context. When you put named lifetime annotations on a type, the type-checker simply ensures that the lifetime information attached to any references is compatible with the annotations.
fn main() {
let string1 = String::from("long string is long");
let result;
{
let string2 = String::from("xyz");
// The lifetime of result cannot be longer than `'a`
result = longest(string1.as_str(), string2.as_str());
// But a reference to string2 also has lifetime `'a`, which means that
// the lifetime `'a` is only valid for the scope of string2
// <-- i.e. to here
}
// But then we try to use it here — oops!
println!("The longest string is {}", result);
}
We’ve told Rust that the lifetime of the reference returned by the longest function is the same as the smaller of the lifetimes of the references passed in.
Sort of. We did tell this information to Rust; however, the borrow checker will still check whether it is true! If it isn't already true, then we will get an error. We can't change the truthfulness of that information; we can only tell Rust the constraints we want, and it will tell us if we are right.
In your example, you could make the main function valid by changing the lifetime annotations on longest:
fn longest<'a, 'b>(x: &'a str, y: &'b str) -> &'a str {
if x.len() > y.len() {
x
} else {
y // oops!
}
}
But now you get an error inside longest because it no longer meets the requirements: it is now never valid to return y because its lifetime could be shorter than 'a. In fact, the only ways to implement this function correctly are:
Return x
Return a slice of x
Return a &'static str — since 'static outlives all other lifetimes
Use unsafe code
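The safe options in that list could look roughly like this. The three function names are made up for illustration, and each keeps the two-lifetime signature from above:

```rust
// Option 1: return x itself.
fn return_x<'a, 'b>(x: &'a str, _y: &'b str) -> &'a str {
    x
}

// Option 2: return a slice of x (`trim` yields a sub-slice of its input).
fn return_slice_of_x<'a, 'b>(x: &'a str, _y: &'b str) -> &'a str {
    x.trim()
}

// Option 3: return a &'static str, which is valid for any 'a.
fn return_static<'a, 'b>(_x: &'a str, _y: &'b str) -> &'a str {
    "fallback"
}

fn main() {
    let string1 = String::from("  long string is long  ");
    let result;
    {
        let string2 = String::from("xyz");
        // The borrow of string2 uses 'b, which no longer constrains the result.
        result = return_slice_of_x(string1.as_str(), string2.as_str());
        assert_eq!(return_x("x", string2.as_str()), "x");
        assert_eq!(return_static("x", string2.as_str()), "fallback");
    }
    // Using `result` after string2 is gone is accepted now.
    assert_eq!(result, "long string is long");
}
```

None of these can ever hand back y, which is exactly what the signature promises to callers.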

Related

Rust complains about freed mutable borrow but not immutable borrow

The following two Rust functions are identical other than the fact that one returns an immutable reference and the other returns a mutable one. Since neither involves borrowing something multiple times, I don't see why the two should work any differently. However, the one with the mutable reference results in a compile error, while the one with the immutable reference does not:
// This compiles with no problems
fn foo<'a>() {
let _: &'a () = &();
}
// This does not compile (see error below)
fn foo_mut<'a>() {
let _: &'a mut () = &mut ();
}
error[E0716]: temporary value dropped while borrowed
--> src/main.rs:14:30
|
13 | fn foo_mut<'a>() {
| -- lifetime `'a` defined here
14 | let _: &'a mut () = &mut ();
| ---------- ^^ creates a temporary which is freed while still in use
| |
| type annotation requires that borrow lasts for `'a`
15 | }
| - temporary value is freed at the end of this statement
For more information about this error, try `rustc --explain E0716`.
error: could not compile `playground` due to previous error
It is also possibly relevant that when there are no explicit lifetimes, the code also has no problem compiling:
// This also compiles with no problem
fn foo_mut_without_lifetime() {
let _: &mut () = &mut ();
}
It seems the only thing that is causing a problem is trying to store a mutable reference with a lifetime, and that immutable references and references without explicit lifetimes have no issue. Why is this happening, and how can I get around it?
Note that there's nothing special about () or generic lifetimes here. This compiles fine:
fn allowed() -> &'static i32 {
let x = &3;
let y: &'static i32 = x;
y
}
And this does not:
fn not_allowed() -> &'static mut i32 {
let x = &mut 3;
let y: &'static mut i32 = x;
y
}
So why is the immutable reference allowed?
When you take a reference of a value, Rust infers the lifetime based on where the value's going to die. Here's an example:
let y;
{
let x = 3;
y = &x;
println!("{y}"); // works fine, `y` is still alive
} // `x` will get dropped at the end of this block
println!("{y}"); // fails to compile, the lifetime of `y` has expired (since `x` has died)
Since x dies at the end of the block, Rust knows that the lifetime of the y reference should only extend until the end of the block as well. Hence, it stops you from using it after x is dead.
This seems pretty obvious. But take a moment to think. In the following code:
let x;
{ // block A
x = &3;
}
What is the inferred lifetime of x? You may be tempted to say "the same as block A". But this would in fact be incorrect. Why? Because Rust is a bit smarter than that. It knows that 3 is a constant, and therefore Rust can fit 3 into the constant table of the final executable. And since the constant table will last as long as the lifetime of the final program, Rust can infer that the expression &3 has a 'static lifetime. Then everything works out fine, since &'static can be cast to any other lifetime as required!
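A small sketch of this behaviour, assuming nothing beyond the snippet above (the function name promoted is invented for illustration):

```rust
fn promoted() -> &'static i32 {
    // `3` is a constant expression, so `&3` can be given the 'static lifetime.
    &3
}

fn main() {
    let x: &'static i32;
    {
        // block A
        x = &3;
    }
    // `x` is still usable after block A has ended.
    assert_eq!(*x, 3);
    assert_eq!(*promoted(), 3);
}
```

If the inferred lifetime were tied to block A, the explicit &'static i32 annotation on x would be rejected.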
Rust draws an explicit line between constants and temporaries, and one of the benefits of having a constant expression is that taking an immutable reference of any constant will always yield a 'static lifetime. This is not true of temporaries. The following code will not compile:
fn f() -> &'static String {
let x = &String::new();
let y: &'static String = x;
y
}
This is because for temporaries, Rust can't put them in the constant table of the executable, since they have to be computed on-demand, and therefore share the same lifetime as the scope they're in.
Okay, this is great, but why isn't the mutable reference of a constant allowed to be 'static?
There are two problems with allowing this:
On some architectures, constant tables can't be modified. This is true of WASM and some embedded architectures, as well as all Harvard-architecture machines. Providing a &mut reference would just be complete nonsense, since they're not mutable. And such fundamental borrow checker rules should really not differ between platforms.
A &'static mut reference is dangerous, since it is quite literally a global variable.

Why does Rust translate same lifetime specifiers as the smaller one

In Rust docs, we see this example:
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
if x.len() > y.len() {
x
} else {
y
}
}
And explanation looks like this:
The function signature now tells Rust that for some lifetime 'a, the
function takes two parameters, both of which are string slices that
live at least as long as lifetime 'a. The function signature also
tells Rust that the string slice returned from the function will live
at least as long as lifetime 'a. In practice, it means that the
lifetime of the reference returned by the longest function is the same
as the smaller of the lifetimes of the references passed in
Note the words after "in practice". It mentions that:
In practice, it means that the
lifetime of the reference returned by the longest function is the same
as the smaller of the lifetimes of the references passed in
I don't understand why, in practice, it means that the lifetime of the returned reference is the same as the smaller of those two parameters' lifetimes. Is this something I need to memorize, or what? We can clearly see that the parameters and the returned value all have the same specifier 'a. Why does Rust think that this means the returned value should have the smaller lifetime of those two passed in?
Why does Rust think that this means the returned value should have the SMALLER lifetime of those two passed in?
Because that's the only thing that makes sense. Imagine this situation:
let a = "foo"; // &'static str
let s = "bar".to_string(); // String
let b = s.as_str(); // &str (non-static, borrows from s)
let longest = longest(a, b);
The lifetime of a is 'static, i.e. a lasts as long as the program. The lifetime of b is shorter than that, as it's tied to the lifetime of the variable s. But longest only accepts one lifetime!
What Rust does is compute a lifetime that is an intersection of the 'static lifetime of a and the tied-to-s lifetime of b, and uses that as the lifetime of (this invocation of) longest(). If such a lifetime cannot be found, you get a borrow checking error. If it can be found, it's no longer than the shortest source lifetime.
In the above case, the intersection of 'static and the lifetime tied to s is the lifetime tied to s, so that's what's used for the lifetime 'a in longest().
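Here is a runnable sketch of that situation. The string contents and the variable names picked and len are chosen for illustration:

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let a: &'static str = "foo";
    let len;
    {
        let s = "barbaz".to_string();
        let b = s.as_str();
        // 'a for this call is the region where both `a` and `b` are valid,
        // which is bounded by the scope of `s`.
        let picked = longest(a, b);
        assert_eq!(picked, "barbaz");
        // Copying a plain value out is fine; the reference itself must not escape.
        len = picked.len();
    }
    assert_eq!(len, 6);
}
```

Even if a happened to be the longer string at run time, picked would still be typed with the shorter lifetime, so it could not be used after the inner block.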

What does "smaller" mean for multiple references that share a lifetime specifier?

The Rust Programming Language says:
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
if x.len() > y.len() {
x
} else {
y
}
}
The function signature now tells Rust that for some lifetime 'a, the
function takes two parameters, both of which are string slices that
live at least as long as lifetime 'a. The function signature also
tells Rust that the string slice returned from the function will live
at least as long as lifetime 'a. In practice, it means that the
lifetime of the reference returned by the longest function is the
same as the smaller of the lifetimes of the references passed in.
These relationships are what we want Rust to use when analyzing this
code.
I don't get why it says:
In practice, it means that the lifetime of the reference returned by
the longest function is the same as the smaller of the lifetimes of
the references passed in.
Note the word "smaller". For both the parameters and the returned value, we specified 'a, which is the same. Why does the book say "smaller"? If that were the case, we would have different specifiers ('a, 'b).
It's important to note that in the example, whatever is passed as x and y do not have to have identical lifetimes.
Let's rework the example to use &i32 which makes for easier demonstrations:
fn biggest<'a>(x: &'a i32, y: &'a i32) -> &'a i32 {
if x > y {
x
} else {
y
}
}
Now, given the following example:
fn main() {
let bigger;
let a = 64;
{
let b = 32;
bigger = biggest(&a, &b);
}
dbg!(bigger);
}
We have 3 different lifetimes:
bigger which lives until the end of main
a which lives until the end of main
b which lives until the end of its block
Now let's take the description apart:
the lifetime of the reference returned by the [...] function
In our case this would be bigger, which lives until the end of main
smaller of the lifetimes of the references passed in
We passed in a, which gets dropped at the end of main, and b, which gets dropped at the end of its block. Since b gets dropped first, it is the "smaller" lifetime.
And sure enough, the compiler yells at us:
error[E0597]: `b` does not live long enough
--> src/main.rs:7:30
|
7 | bigger = biggest(&a, &b);
| ^^ borrowed value does not live long enough
8 | }
| - `b` dropped here while still borrowed
9 |
10 | dbg!(bigger);
| ------ borrow later used here
But if we move bigger inside the block, and hence reduce its lifetime, the code compiles:
fn main() {
let a = 64;
{
let b = 32;
let bigger = biggest(&a, &b);
dbg!(bigger);
}
println!("This works!");
}
Now a and b still have different lifetimes, but the code compiles since the smaller of both lifetimes (b's) is the same as bigger's lifetime.
Did this help illustrate what "smaller" lifetime means?

Why isn't this a dangling pointer?

I am working my way through the Rust book, and it has this code snippet:
fn first_word(s: &String) -> usize {
let bytes = s.as_bytes();
for (i, &item) in bytes.iter().enumerate() {
if item == b' ' {
return i;
}
}
s.len()
}
In the previous chapter, it was explained that
the Rust compiler prevents you from retaining references into objects after they go out of scope (a dangling reference), and
variables go out of scope at the last moment they are mentioned (the example given of this showed the creation of both an immutable and mutable reference to the same object in the same block by ensuring that the immutable reference was not mentioned after the creation of the mutable reference).
To me, it looks like bytes is not referenced after the for line header (presumably the code associated with bytes.iter().enumerate() is executed just once before the loop starts, not on every loop iteration), so &item shouldn't be allowed to be a reference into any part of bytes. But I don't see any other object (is "object" the right Rust terminology?) that it could be a reference into.
It's true that s is still in scope, but, well... does the compiler even remember the connection between bytes and s by the time the for loop rolls around? Indeed, even if I change the function to accept bytes directly, the compiler thinks things are hunky-dory:
fn first_word(bytes: &[u8]) -> usize {
for (i, &item) in bytes.iter().enumerate() {
if item == b' ' {
return i+1;
}
}
0
}
Here no variables other than i or item are mentioned after the for loop header, so it really seems like &item can't be a reference into anything!
I looked at this very similarly-titled question. The comments and answers there suggest that there might be a lifetime argument which is keeping bytes alive, and proposed a way to ask the compiler what it thinks the type/lifetime is. I haven't learned about lifetimes yet, so I'm fumbling about in the dark a little, but I tried:
fn first_word<'x_lifetime>(s: &String) -> usize {
let bytes = s.as_bytes();
let x_variable: &'x_lifetime () = bytes;
for (i, &item) in bytes.iter().enumerate() {
if item == b' ' {
return i+1;
}
}
0
}
However, the error, while indeed indicating that bytes is a &[u8], which I understand, also seems to imply there's no extra lifetime information associated with bytes. I say this because the error includes lifetime information in the "expected" part but not in the "found" part. Here it is:
error[E0308]: mismatched types
--> test.rs:3:39
|
3 | let x_variable: &'x_lifetime () = bytes;
| --------------- ^^^^^ expected `()`, found slice `[u8]`
| |
| expected due to this
|
= note: expected reference `&'x_lifetime ()`
found reference `&[u8]`
So what is going on here? Obviously some part of my reasoning is off, but what? Why isn't item a dangling reference?
Here is a version of the first function with all the lifetimes and types included, and the for loop replaced with an equivalent while let loop.
fn first_word<'a>(s: &'a String) -> usize {
let bytes: &'a [u8] = s.as_bytes();
let mut iter: std::iter::Enumerate<std::slice::Iter<'a, u8>>
= bytes.iter().enumerate();
while let Some(iter_item) = iter.next() {
let (i, &item): (usize, &'a u8) = iter_item;
if item == b' ' {
return i;
}
}
s.len()
}
Things to notice here:
The type of the iterator produced by bytes.iter().enumerate() has a lifetime parameter which ensures the iterator does not outlive the [u8] it iterates over. (Note that the &[u8] itself can go away; it isn't needed. What matters is that its referent, the bytes inside the String, stays alive.) So, we only really need to think about one lifetime 'a, not separate lifetimes for s and bytes, because there's only one byte-slice inside the String that we're referring to in different ways.
iter — an explicit variable corresponding to the implicit action of for — is used in every iteration of the loop.
I see another misunderstanding, not exactly about the lifetimes and scope:
there might be a lifetime argument which is keeping bytes alive,
Lifetimes never keep something alive. Lifetimes never affect the execution of the program; they never affect when something is dropped or deallocated. A lifetime in some type such as &'a u8 is a compile-time claim that values of that type will be valid for that lifetime. Changing the lifetimes in a program changes only what is to be proven (checked) by the compiler, not what is true about the program.
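One way to observe this is to record when values are actually dropped. The sketch below is only an illustration; Noisy, first, and drop_order are names invented here, not part of the original code:

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

// Hypothetical type that records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Both inputs share 'a, yet that has no influence on when they are dropped.
fn first<'a>(x: &'a Noisy, _y: &'a Noisy) -> &'a Noisy {
    x
}

fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let outer = Noisy { name: "outer", log: Rc::clone(&log) };
    {
        let inner = Noisy { name: "inner", log: Rc::clone(&log) };
        let chosen = first(&outer, &inner);
        assert_eq!(chosen.name, "outer");
    } // `inner` is dropped here because its scope ends, annotations or not
    log.borrow_mut().push("after block");
    drop(outer);
    let result: Vec<&'static str> = log.borrow().clone();
    result
}

fn main() {
    assert_eq!(drop_order(), vec!["inner", "after block", "outer"]);
}
```

The drop order follows the scopes and the explicit drop call; the shared 'a in first only limits where the returned reference may be used.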

Why can't I call a method with a temporary value?

I can't call Foo::new(words).split_first() in the following code
fn main() {
let words = "Sometimes think, the greatest sorrow than older";
/*
let foo = Foo::new(words);
let first = foo.split_first();
*/
let first = Foo::new(words).split_first();
println!("{}", first);
}
struct Foo<'a> {
part: &'a str,
}
impl<'a> Foo<'a> {
fn split_first(&'a self) -> &'a str {
self.part.split(',').next().expect("Could not find a ','")
}
fn new(s: &'a str) -> Self {
Foo { part: s }
}
}
the compiler will give me an error message
error[E0716]: temporary value dropped while borrowed
--> src/main.rs:8:17
|
8 | let first = Foo::new(words).split_first();
| ^^^^^^^^^^^^^^^ - temporary value is freed at the end of this statement
| |
| creates a temporary which is freed while still in use
9 |
10 | println!("{}", first);
| ----- borrow later used here
|
= note: consider using a `let` binding to create a longer lived value
If I bind the value of Foo::new(words) first, then call the split_first method there is no problem.
These two methods of calling should intuitively be the same but are somehow different.
Short answer: remove the 'a lifetime for the self parameter of split_first: fn split_first(&self) -> &'a str (playground).
Long answer:
When you write this code:
struct Foo<'a> {
part: &'a str,
}
impl<'a> Foo<'a> {
fn new(s: &'a str) -> Self {
Foo { part: s }
}
}
You are telling the compiler that all Foo instances are related to some lifetime 'a that must be equal to or shorter than the lifetime of the string passed as parameter to Foo::new. That lifetime 'a may be different from the lifetime of each Foo instance. When you then write:
let words = "Sometimes think, the greatest sorrow than older";
Foo::new(words)
The compiler infers that the lifetime 'a must be equal to or shorter than the lifetime of words. Barring any other constraints the compiler will use the lifetime of words, which is 'static so it is valid for the full life of the program.
When you add your definition of split_first:
fn split_first(&'a self) -> &'a str
You are adding an extra constraint: you are saying that 'a must also be equal to or shorter than the lifetime of self. The compiler will therefore take the shorter of the lifetime of words and the lifetime of the temporary Foo instance, which is the lifetime of the temporary. @AndersKaseorg's answer explains why that doesn't work.
By removing the 'a lifetime on the self parameter, I am decorrelating 'a from the lifetime of the temporary, so the compiler can again infer that 'a is the lifetime of words, which is long enough for the program to work.
Foo::new(words).split_first() would be interpreted roughly as
let tmp = Foo::new(words);
let ret = tmp.split_first();
drop(tmp);
ret
If Rust allowed you to do this, the references in ret would point [edit: would be allowed by the type of split_first to point*] into the now dropped value of tmp. So it’s a good thing that Rust disallows this. If you wrote the equivalent one-liner in C++, you’d silently get undefined behavior.
By writing the let binding yourself, you delay the drop until the end of the scope, thus extending the region where it’s safe to have these references.
For more details, see temporary lifetimes in the Rust Reference.
* Edit: As pointed out by Jmb, the real problem in this particular example is that the type
fn split_first(&'a self) -> &'a str
isn’t specific enough, and a better solution is to refine the type to:
fn split_first<'b>(&'b self) -> &'a str
which can be abbreviated:
fn split_first(&self) -> &'a str
This conveys the intended guarantee that the returned references do not point into the Foo<'a> (only into the string itself).
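Putting the refined signature back into the code from the question gives a version along these lines. The first_part helper is added here only to make the one-liner easy to exercise:

```rust
struct Foo<'a> {
    part: &'a str,
}

impl<'a> Foo<'a> {
    fn new(s: &'a str) -> Self {
        Foo { part: s }
    }

    // The borrow of `self` gets its own anonymous lifetime; the result is
    // tied only to 'a, the lifetime of the underlying string.
    fn split_first(&self) -> &'a str {
        self.part.split(',').next().expect("Could not find a ','")
    }
}

// Hypothetical helper: the temporary Foo is dropped at the end of the
// expression, but the returned slice borrows from `words`, not from Foo.
fn first_part(words: &str) -> &str {
    Foo::new(words).split_first()
}

fn main() {
    let words = "Sometimes think, the greatest sorrow than older";
    let first = Foo::new(words).split_first();
    assert_eq!(first, "Sometimes think");
    assert_eq!(first_part(words), "Sometimes think");
    println!("{}", first);
}
```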

Resources