How is output lifetime of a function calculated? - rust

In the Rust book (https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html), this code is used as an example (paraphrased):
fn main() {
let string1 = String::from("long string is long");
{
let string2 = String::from("xyz");
let result = longest(string1.as_str(), string2.as_str()); // line 5
println!("The longest string is {}", result); // line 6
}
}
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
if x.len() > y.len() { x } else { y }
}
I am confused why this code compiles at all.
Regarding the longest function, the book says, "the generic lifetime 'a will get the concrete lifetime that is equal to the smaller of the lifetimes of x and y".
The book then talks as if string1.as_str() and string2.as_str() live as long as string1 and string2 respectively. But why would they? These two references were not used after line 5, and by line 6, they should have been dead. Why wasn't there an error at line 6 for using result when it is no longer live?
One could say that presence of result somehow extends the input lifetimes, but wouldn't that contradict the notion that "output lifetime is the intersection of input lifetimes"?
Where do I get it wrong?

But why would they? These two references were not used after line 5, and by line 6, they should have been dead.
But they're not dead. In fact, one of them is definitely in result and is getting used on line 6. A reference can last, at minimum, until the end of the current expression (generally, but not always, until a semicolon), and at maximum as long as the thing it points to continues existing. The lifetime parameter on the output of longest requires that the reference stay valid for as long as result is in use. Notably, the scope of result is no larger than the scope of either string1 or string2, so there's no issue. If we tried to assign the result of longest to a variable that outlives string2, then we'd have a problem. For instance, this won't compile:
fn main() {
let string1 = String::from("long string is long");
let mut result = "";
{
let string2 = String::from("xyz");
result = longest(string1.as_str(), string2.as_str());
}
println!("The longest string is {}", result);
}
That's because it would require result to outlive string2, and result may point into string2's data.

The confusion seems to me to originate in the type &'a str. The 'a is the lifetime of the data to which the reference refers, i.e. the region of validity of that data. It is not the region for which the reference variable itself is valid.
So...
string1.as_str() returns a &'s1 str, where 's1 is the lifetime of string1.
string2.as_str() returns a &'s2 str, where 's2 is the lifetime of string2.
longest must infer a single generic lifetime 'a from two parameter lifetimes 's1 and 's2. It does this by choosing the common overlap, the shorter lifetime 's2. This is a form of subtyping: references valid for longer lifetimes can be used transparently as references valid for shorter lifetimes.
So result is a &'s2 str.
Line 6 references result of type &'s2 str where 's2 is the lifetime of string2. The reference name itself is available, and the referent data is valid for the lifetime of string2, 's2, which we are still within.
string1.as_str() and string2.as_str() live as long as string1 and string2 respectively
That's loosely worded. They are references, and the reference values themselves do not last past line 5. But they are references with a lifetime equal to the lifetime of string1 and string2 respectively. This means the data they point to is guaranteed to live at least that long.
Why wasn't there an error at line 6 for using result when it is no longer live?
result has type &'s2 str, so it is valid. It is a copy of one of the two (temporary) references which are inputs to longest in line 5. The name result is valid in line 6, and it points to data which is guaranteed to live as long as 's2 which is still valid. So, there is no error.
One could say that presence of result somehow extends the input lifetimes
There is no such extension. Those temporaries do not exist past line 5. But one of them is copied into result, and the lifetime is not how long the temporary lives (how long the input reference &'a str lives), but how long the referent data lives.
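The difference between "how long the reference variable exists" and "how long the referent data is valid" can be made visible with a small sketch. The names r1 and r2 are not in the original example; they are introduced here only to give the two temporary references an explicit, short scope.

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() { x } else { y }
}

fn main() {
    let string1 = String::from("long string is long");
    let string2 = String::from("xyz");

    let result;
    {
        // r1 and r2 (hypothetical names) are reference variables that
        // exist only inside this block.
        let r1: &str = string1.as_str();
        let r2: &str = string2.as_str();
        result = longest(r1, r2);
    } // r1 and r2 are gone here, but the data they pointed at is not.

    // `result` is a copy of one of those references. It stays usable because
    // string1 and string2, the data it may point into, are both still alive.
    println!("The longest string is {}", result);
}
```

This compiles because the lifetime in &'a str describes the borrow of string1 and string2, not the scope of r1 and r2.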

Related

Why does Rust translate same lifetime specifiers as the smaller one

In Rust docs, we see this example:
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
if x.len() > y.len() {
x
} else {
y
}
}
And explanation looks like this:
The function signature now tells Rust that for some lifetime 'a, the
function takes two parameters, both of which are string slices that
live at least as long as lifetime 'a. The function signature also
tells Rust that the string slice returned from the function will live
at least as long as lifetime 'a. In practice, it means that the
lifetime of the reference returned by the longest function is the same
as the smaller of the lifetimes of the references passed in
Note the words after "in practice". It mentions that:
In practice, it means that the
lifetime of the reference returned by the longest function is the same
as the smaller of the lifetimes of the references passed in
I don't understand why, in practice, it means that the lifetime of the returned reference is the same as the smaller of the two parameters' lifetimes. Is this something I need to memorize? The parameters and the return value clearly all have the same specifier 'a. Why does Rust think this means the returned value should have the smaller lifetime of the two passed in?
Why does Rust think that this means the returned value should have the SMALLER lifetime of the two passed in?
Because that's the only thing that makes sense. Imagine this situation:
let a = "foo"; // &'static str
let s = "bar".to_string(); // String
let b = s.as_str(); // &str (non-static, borrows from s)
let longest = longest(a, b);
The lifetime of a is 'static, i.e. a lasts as long as the program. The lifetime of b is shorter than that, as it's tied to the lifetime of the variable s. But longest only accepts one lifetime!
What Rust does is compute a lifetime that is an intersection of the 'static lifetime of a and the tied-to-s lifetime of b, and uses that as the lifetime of (this invocation of) longest(). If such a lifetime cannot be found, you get a borrow checking error. If it can be found, it's no longer than the shortest source lifetime.
In the above case, the intersection of 'static and the lifetime tied to s is the lifetime tied to s, so that's what's used for the lifetime 'a in longest().
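As a rough illustration of that intersection, the sketch below wraps the call in a function so the result lifetime is written out explicitly. The wrapper longest_or_foo is a hypothetical name, not something from the question.

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() { x } else { y }
}

// Hypothetical wrapper: "foo" is a &'static str, `b` is only valid for 'a.
// The call type-checks because the 'static reference can be used as a
// shorter-lived &'a str, so the result is only promised to be valid for 'a.
fn longest_or_foo<'a>(b: &'a str) -> &'a str {
    longest("foo", b)
}

fn main() {
    let s = "quux".to_string();
    let b = s.as_str(); // borrows from s, so it is not 'static
    println!("{}", longest_or_foo(b));
}
```

Changing the return type of longest_or_foo to &'static str would be rejected, since the returned reference might be b.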

Understanding lifetimes: borrowed value does not live enough

fn main() {
let long;
let str1="12345678".to_string();
{
let str2 = "123".to_string();
long = longest(&str1, &str2);
}
println!("the longest string is: {}", long);
}
fn longest<'a>(x:&'a str, y:&'a str) -> &'a str{
if x.len() > y.len() {
x
} else {
y
}
}
gives
error[E0597]: `str2` does not live long enough
--> src/main.rs:6:31
|
6 | long = longest(&str1, &str2);
| ^^^^^ borrowed value does not live long enough
7 | }
| - `str2` dropped here while still borrowed
8 | println!("the longest string is: {}", long);
| ---- borrow later used here
My theory is that, since the function longest has only one lifetime parameter, the compiler is making both x and y have the lifetime of str1. So Rust is protecting me from calling longest and possibly receiving back str2, which has a lifetime shorter than that of str1, the chosen lifetime for 'a.
Is my theory right?
Let's take a closer look at the signature for longest:
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str
What this means is that for a given lifetime 'a, both arguments need to last at least for the length of that lifetime (or longer, which doesn't really matter since you can safely shorten any lifetime in this case without any particular difference) and the return value also lives as long as that lifetime, because the return value comes from one of the arguments and therefore "inherits" the lifetime.
The sole reason for that is that at compile time, you can't really be sure whether x or y will be returned when compiling the function, so the compiler has to assume that either can be returned. Since you've bound both of them with the same lifetime (x, y and the return value have to live at least for the duration of 'a), the resulting lifetime of 'a is the smallest one. Now let's examine the usage of the function:
let long;
let str1 = "12345678".to_string();
{
let str2 = "123".to_string();
long = longest(&str1, &str2);
}
You have two lifetimes here, the one outside the braces (the main() body lifetime) and the lifetime inside the braces (since everything between the braces is destroyed after the closing brace). Because you're storing the strings as String by using .to_string() (owned strings) rather than &'static str (borrowed string literals stored in the program executable file), the string data gets destroyed as soon as it leaves the scope, which, in the case of str2, is the brace scope. The lifetime of str2 ends before the lifetime of str1, therefore, the lifetime of the return value comes from str2 rather than str1.
You then try to store the return value into long — a variable outside the inner brace scope, i.e. into a variable with a lifetime of the main() body rather than the scope. But since the lifetime of str2 restricts the lifetime of the return value for longest in this situation, the return value of longest doesn't live after the braced scope — the owned string you used to store str2 is dropped at the end of the braced scope, releasing resources required to store it, i.e. from a memory safety standpoint it no longer exists.
If you try this, however, everything works fine:
let long;
let str1 = "12345678";
{
let str2 = "123";
long = longest(str1, str2);
}
println!("the longest string is: {}", long);
But why? Remember what I said about how you stored the strings, more specifically, what I said about borrowed string literals which are stored in the executable file. These have a 'static lifetime, which means the entire duration of the program's runtime existence. This means that &'static to anything (not just str) always lives long enough, since now you're referring to memory space inside the executable file (allocated at compile time) rather than a resource on the heap managed by String and dropped when the braced scope ends. You're no longer dealing with a managed resource, you're dealing with a resource managed at compile time, and that pleases the borrow checker by eliminating possible issues with its duration of life, since it's always 'static.
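If the strings have to stay owned String values, one alternative (a sketch, not something proposed in the question) is to return an owned String instead of a reference. The function name longest_owned is hypothetical. The result then borrows from nothing, at the cost of one copy.

```rust
// Hypothetical variant of `longest`: returns a new String, so the result
// does not borrow from either argument and no lifetime parameter is needed.
fn longest_owned(x: &str, y: &str) -> String {
    if x.len() > y.len() { x.to_owned() } else { y.to_owned() }
}

fn main() {
    let long;
    let str1 = "12345678".to_string();
    {
        let str2 = "123".to_string();
        long = longest_owned(&str1, &str2);
    } // str2 is dropped here, which is fine: `long` owns its own data.
    println!("the longest string is: {}", long);
}
```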
From the developer's perspective this code looks fine: long is declared in the outer scope and printed there, so it seems there should be no problem at all.
But Rust's compiler strictly checks borrowed values, and it needs to be sure that every value lives long enough for the variables that depend on it.
The compiler sees that long depends on a value whose lifetime is shorter than its own, i.e. str2, so it gives an error. Behind the scenes this is done by the borrow checker.
You can see more details about borrow checker here

Do lifetime annotations in Rust change the lifetime of the variables?

The Rust book chapter states that the annotations don't change the lifetime of a variable, but how true is that? According to the book, the function longest takes two string references and returns the longer one. But here, in the error case,
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
if x.len() > y.len() {
x
} else {
y
}
}
fn main() {
let string1 = String::from("long string is long");
let result;
{
let string2 = String::from("xyz");
result = longest(string1.as_str(), string2.as_str());
}
println!("The longest string is {}", result);
}
it does actually change the lifetime of the result variable, doesn't it?
We’ve told Rust that the lifetime of the reference returned by the longest function is the same as the smaller of the lifetimes of the references passed in.
What the book is merely suggesting is that a lifetime parameter of a function cannot interfere with the affected value's lifetime. They cannot make a value live longer (or the opposite) than what is already prescribed by the program.
However, different function signatures can decide the lifetime of those references. Since references are covariant with respect to their lifetimes, you can turn a reference of a "wider" lifetime into a smaller one within that lifetime.
For example, given the definition
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str
, the lifetimes of the two input references must match. However, we can write this:
let local = "I am local string.".to_string();
longest(&local, "I am &'static str!");
The string literal, which has a 'static lifetime, is compatible with the lifetime 'a, in this case mainly constrained by the string local.
Likewise, in the example above, the lifetime 'a has to be constrained to the nested string string2, otherwise it could not be passed by reference to the function. This also means that the output reference is restrained by this lifetime, which is why the code fails to compile when attempting to use the output of longest outside the scope of string2:
error[E0597]: `string2` does not live long enough
--> src/main.rs:14:44
|
14 | result = longest(string1.as_str(), string2.as_str());
| ^^^^^^^ borrowed value does not live long enough
15 | }
| - `string2` dropped here while still borrowed
16 | println!("The longest string is {}", result);
| ------ borrow later used here
See also this question for an extended explanation of lifetimes and their covariance/contravariance characteristics:
How can this instance seemingly outlive its own parameter lifetime?
First, it's important to understand the difference between a lifetime and a scope. References have lifetimes, which are dependent on the scopes of the variables they refer to.
A variable scope is lexical:
fn main() {
let string1 = String::from("long string is long"); // <-- scope of string1 begins here
let result;
{
let string2 = String::from("xyz"); // <-- scope of string2 begins here
result = longest(string1.as_str(), string2.as_str());
// <-- scope of string2 ends here
}
println!("The longest string is {}", result);
// <-- scope of string1 ends here
}
When you create a new reference to a variable, the lifetime of the reference is tied solely to the scope of the variable. Other references have different lifetime information attached to them, depending on where the reference came from and what information is known in that context. When you put named lifetime annotations on a type, the type-checker simply ensures that the lifetime information attached to any references is compatible with the annotations.
fn main() {
let string1 = String::from("long string is long");
let result;
{
let string2 = String::from("xyz");
// The lifetime of result cannot be longer than `'a`
result = longest(string1.as_str(), string2.as_str());
// But a reference to string2 also has lifetime `'a`, which means that
// the lifetime `'a` is only valid for the scope of string2
// <-- i.e. to here
}
// But then we try to use it here — oops!
println!("The longest string is {}", result);
}
We’ve told Rust that the lifetime of the reference returned by the longest function is the same as the smaller of the lifetimes of the references passed in.
Sort of. We did tell this information to Rust; however, the borrow checker will still check whether it is true! If it isn't already true, we will get an error. We can't change the truthfulness of that information. We can only tell Rust the constraints we want, and it will tell us if we are right.
In your example, you could make the main function valid by changing the lifetime annotations on longest:
fn longest<'a, 'b>(x: &'a str, y: &'b str) -> &'a str {
if x.len() > y.len() {
x
} else {
y // oops!
}
}
But now you get an error inside longest because it no longer meets the requirements: it is now never valid to return y because its lifetime could be shorter than 'a. In fact, the only ways to implement this function correctly are:
Return x
Return a slice of x
Return a &'static str — since 'static outlives all other lifetimes
Use unsafe code
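A minimal sketch of one of those valid implementations, combining "return x" with "return a &'static str". The name longest_or_default and the fallback text are made up for illustration.

```rust
// Hypothetical function: the result is tied only to `x` ('a), never to `y`.
fn longest_or_default<'a, 'b>(x: &'a str, y: &'b str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        // A string literal is &'static str, which is valid for any 'a.
        "y was at least as long"
    }
}

fn main() {
    let string1 = String::from("long string is long");
    let result;
    {
        let string2 = String::from("xyz");
        result = longest_or_default(string1.as_str(), string2.as_str());
    } // string2 is dropped here; `result` never borrowed from it.
    println!("The longest string is {}", result);
}
```

With this signature the main function compiles, because the returned reference can only come from string1 or from static data.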

How do I create an iterator of lines from a file that have been split into pieces?

I have a file that I need to read line-by-line and break into two sentences separated by a "=". I am trying to use iterators, but I can't find how to use it properly within split. The documentation says that std::str::Split implements the trait, but I'm still clueless how to use it.
use std::{
fs::File,
io::{prelude::*, BufReader},
};
fn example(path: &str) {
for line in BufReader::new(File::open(path).expect("Failed at opening file.")).lines() {
let words = line.unwrap().split("="); //need to make this an iterable
}
}
How can I use a trait I know is already implemented into something like split?
As @Mateen commented, split already returns an iterable. To fix the lifetime problems, save the value returned by unwrap() into a variable before calling split.
I'll try to explain the lifetime issue here.
First it really helps to look at the function signatures.
pub fn unwrap(self) -> T
pub fn split<'a, P: Pattern<'a>>(&'a self, pat: P) -> Split<'a, P>
unwrap is pretty simple, it takes ownership of itself and returns the inner value.
split looks scary, but it's not too difficult, 'a is just a name for the lifetime, and it just states how long the return value can be used for. In this case it means that both the input arguments must live at least as long as the return value.
// `&'a self` takes the string by reference, so there is no ownership change.
// `<'a, ...>` just declares a name for the lifetime.
// `Pattern<'a>` and `&'a self` must both last at least as long as the
// returned `Split<'a, P>`.
pub fn split<'a, P: Pattern<'a>>(&'a self, pat: P) -> Split<'a, P>
This is because split doesn't copy any of the string; it just points to the positions in the original string where the splits take place. If the original string were dropped while the Split was still in use, the Split would point to invalid data, and the lifetime 'a is what lets the compiler rule that out.
A value's lifetime (unless its ownership is passed to something else) lasts until it goes out of scope. For a named variable (e.g. one bound with let) that is the closing } of its block; for an unnamed temporary it is the end of the statement, i.e. the ;.
That's why there is a lifetime problem in your code:
for line in std::io::BufReader::new(std::fs::File::open(path).expect("Failed at opening file.")).lines() {
let words = line
.unwrap() // <--- Unwrap consumes `line`, `line` can not be used after calling unwrap(),
.split("=") // Passed unwrap()'s output to split as a reference
; //<-- end of line, unwrap()'s output is dropped due to it not being saved to a variable, the result of split now points to nothing, so the compiler complains.
}
Solutions
Saving the return value of unwrap()
for line in std::io::BufReader::new(std::fs::File::open("abc").expect("Failed at opening file.")).lines() {
let words = line.unwrap();
let words_split = words.split("=");
} // <--- `words`'s lifetime ends here, but there are no lifetime issues since `words_split` also ends here.
If you want to avoid cluttering variable names, you can rename words_split to words and shadow the original variable. This doesn't cause an issue either, since a shadowed variable is not dropped immediately but at the end of its original scope.
Or
Rather than having an iterator of &str items, all of which are just fancy pointers into the original string, you can copy each slice out into its own String, removing the reliance on keeping the original string in scope.
There is almost certainly no reason to do this in your case, since copying each slice takes more processing power and more memory, but Rust gives you this control.
let words = line
.unwrap()
.split("=")
.map(|piece|
piece.to_owned() // <--- This copies all the characters in the str into its own String.
).collect::<Vec<String>>()
; // <--- unwrap()'s output is dropped here, but it doesn't matter since the pieces no longer point to the original line string.
let words_iterator = words.iter();
Without a target type, collect gives you the error cannot infer type, because you didn't state what you wanted to collect into. Either use the turbofish syntax above, or state the type on words, i.e. let words: Vec<String> = ...
You have to call collect because map is lazy: it doesn't do anything until the iterator is consumed. The details are out of the scope of this answer.
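Putting the pieces together, here is a hedged sketch of reading lines and breaking each one into two parts around the first "=". It uses str::split_once rather than split, which is a substitution on my part, and the helper name split_lines is hypothetical. It takes any BufRead so it can be tried without a file.

```rust
use std::io::{BufRead, Cursor};

// Hypothetical helper: returns (left, right) pairs as owned Strings so the
// result does not borrow from the per-line buffer.
// Assumption: lines that fail to read or contain no '=' are silently skipped.
fn split_lines<R: BufRead>(reader: R) -> Vec<(String, String)> {
    reader
        .lines()
        .filter_map(|line| {
            // Bind the String to a variable so it outlives the borrowed slices.
            let line = line.ok()?;
            let (left, right) = line.split_once('=')?;
            Some((left.to_owned(), right.to_owned()))
        })
        .collect()
}

fn main() {
    // An in-memory reader stands in for a file here.
    let input = Cursor::new("name=Ferris\nno separator here\nkind=crab=ish\n");
    for (left, right) in split_lines(input) {
        println!("{} -> {}", left, right);
    }
}
```

For a real file, the same function could be called with BufReader::new(File::open(path)...) as in the question. Skipping bad lines is a design choice of this sketch; returning a Result may be preferable when errors matter.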

Why can I use the same lifetime label for variables that have different lifetimes?

Why does this code compile?
#[derive(Debug)]
pub struct Foo<'a> {
pub x: &'a [&'a str],
}
fn main() {
let s = "s".to_string();
let t: &str = s.as_ref();
{
let v = vec![t];
let foo = Foo { x: &v[..] };
println!("{:?}", &foo);
}
}
In my Foo struct, my x field contains a slice of &strs. My understanding of these lifetime labels is that the slice and the individual &strs have the same lifetime.
However, in my example the &str t (in the outer block) does not have the same lifetime as the container slice (in the inner block).
My understanding of these lifetime labels is that the slice and the individual &strs have the same lifetime.
This is a common misconception. It really means that there has to be a lifetime that applies to both. When instantiated, the Foo struct's 'a corresponds to the lines of the inner block after let v = vec![t]; as that is a lifetime that both variables share.
If this flexibility didn't exist, lifetimes would be super painful to use. Variables defined on two lines have different actual lifetimes (the one defined first outlives the one defined second). If the lifetimes had to actually match, we'd always have to define all the variables on the same line!
Some more detailed information is available in RFC #738 — Variance.
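The implicit "shrinking" that makes the example compile can be written out by hand. In the sketch below, shorten is a hypothetical helper that does explicitly what the compiler otherwise does silently: it turns a reference valid for a longer lifetime into one valid for a shorter lifetime.

```rust
#[derive(Debug)]
pub struct Foo<'a> {
    pub x: &'a [&'a str],
}

// Hypothetical helper: `'long: 'short` reads "'long outlives 'short".
// Returning `r` unchanged is allowed because a longer-lived shared reference
// can always be used where a shorter-lived one is expected.
fn shorten<'short, 'long: 'short>(r: &'long str) -> &'short str {
    r
}

fn main() {
    let s = "s".to_string();
    let t: &str = s.as_ref(); // valid for as long as `s` lives
    {
        // `t` is shortened to a lifetime that the slice of `v` also satisfies.
        let v = vec![shorten(t)];
        let foo = Foo { x: &v[..] };
        println!("{:?}", &foo);
    }
}
```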
The syntax 'a is actually used in two different situations:
labeling a loop
indicating a bound
The first situation, labeling a loop:
fn main() {
'a: loop {
println!("{}", 3);
break 'a;
}
}
Here, 'a clearly delineates the lifetime of the loop body, and allows breaking from multiple layers of loops in one fell swoop.
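For the multi-layer case, a small sketch of breaking out of nested loops with a label. The function name and its parameters are made up for illustration.

```rust
// Hypothetical example: find the first pair (i, j) below `limit` whose sum
// equals `target`, leaving both loops at once through the 'outer label.
fn first_pair_with_sum(limit: u32, target: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'outer: for i in 0..limit {
        for j in 0..limit {
            if i + j == target {
                found = Some((i, j));
                break 'outer; // exits the inner and the outer loop
            }
        }
    }
    found
}

fn main() {
    println!("{:?}", first_pair_with_sum(5, 6));
}
```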
The second, and more similar situation, is using 'a to represent a bound:
fn substr<'a>(haystack: &'a str, offset: usize) -> &'a str;
In this case, the lifetime 'a does not represent the actual lifetime of the variable, it represents a bound on the lifetime of the referenced variable and allows tying together the bounds of various variables.
Note that the caller and callee interpret the bound differently:
from the perspective of the caller, 'a is an upper-bound, a promise that the return value will live at least as long as the parameter (maybe longer, no guarantee)
from the perspective of the callee (ie, substr), 'a is a lower-bound, a check that any returned value must live at least as long as the parameter (maybe longer, not necessary)
We can have variance because the bound does not represent the actual lifetime. When a single bound is used for multiple lifetimes, the compiler will simply deduce the lowest/highest bound that makes sense for the situation:
the caller gets the lowest upper-bound feasible (ie, the least guarantee)
the callee gets the highest lower-bound feasible (ie, the least constraint)
For example:
fn either<'b>(one: &'b str, two: &'b str, flag: bool) -> &'b str {
if flag { one } else { two }
}
can be called with:
fn call<'a>(o: &'a str, flag: bool) -> &'a str {
either(o, "Hello, World", flag)
}
Here, the lifetime of o is unknown (some 'a) whilst the lifetime of "Hello, World" is known ('static). 'static is by definition the greatest of the lifetimes (it lasts for the whole program).
the caller of call only knows that the return value lives at least as long as o
call must guarantee this, it supplies o and "Hello, World" to either where 'b is deduced to be the lowest bound between 'a and 'static (thus 'a)
either simply must return something that lives as long as either one of its arguments; it's unaware that their lifetime may differ, and cares not
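A complete program built around these two functions, as a sketch. The main function and the variable owned are additions for illustration.

```rust
fn either<'b>(one: &'b str, two: &'b str, flag: bool) -> &'b str {
    if flag { one } else { two }
}

fn call<'a>(o: &'a str, flag: bool) -> &'a str {
    // "Hello, World" is &'static str; for this call 'b is deduced as 'a,
    // the shorter of the two, so the result is only promised to live for 'a.
    either(o, "Hello, World", flag)
}

fn main() {
    let owned = String::from("local"); // not 'static: dropped at end of main
    println!("{}", call(&owned, true));
    println!("{}", call(&owned, false));
}
```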

Resources