Related
I'm trying to have a MyType that supports a From<&[u8]> trait, but I'm running into "lifetime problems":
Here's a minimally viable example:
struct MyType {
i: i32
}
impl MyType {
fn from_bytes(_buf: &[u8]) -> MyType {
// for example...
MyType { i: 3 }
}
}
impl From<&[u8]> for MyType {
fn from(bytes: &[u8]) -> Self {
MyType::from_bytes(bytes)
}
}
fn do_smth<T>() -> T where T: From<&[u8]>
{
// for example...
let buf : Vec<u8> = vec![1u8, 2u8];
T::from(buf.as_slice())
}
(...and here's a Rust playground link)
For reasons I cannot understand, the Rust compiler is telling me:
error[E0637]: `&` without an explicit lifetime name cannot be used here
--> src/lib.rs:17:36
|
17 | fn do_smth<T>() -> T where T: From<&[u8]>
| ^ explicit lifetime name needed here
I'm not an expert on lifetimes and I don't understand why this piece of code needs one. What would be the best way to fix this?
Might Rust be thinking that the type T could be a &[u8] itself? But, in that case, the lifetime should be inferred to be the same as the input to From::<&[u8]>::from(), no?
One fix I was given was to do:
fn do_smth<T>() -> T where for<'a> T: From<&'a [u8]>
{
// for example...
let buf : Vec<u8> = vec![1u8, 2u8];
T::from(buf.as_slice())
}
...but I do not understand this fix, nor do I understand why lifetimes are needed in the first place.
Rust first wants you to write:
fn do_smth<'a, T>() -> T
where
T: From<&'a [u8]>,
{
// for example...
let buf: Vec<u8> = vec![1u8, 2u8];
T::from(&buf)
}
where you make explicit that this function can be called for any lifetime 'a and any type T such that T implements From<&'a [u8]>.
But Rust then complains:
error[E0597]: `buf` does not live long enough
--> src/lib.rs:24:13
|
18 | fn do_smth<'a, T>() -> T
| -- lifetime `'a` defined here
...
24 | T::from(&buf)
| --------^^^^-
| | |
| | borrowed value does not live long enough
| argument requires that `buf` is borrowed for `'a`
25 | }
| - `buf` dropped here while still borrowed
You promised that this function could work with any lifetime, but this turns out to not be true, because in the body of the function you create a fresh reference to the local Vec which has a different lifetime, say 'local. Your function only works when 'a equals 'local, but you promise that it also works for all other lifetimes. What you need is a way to express that these lifetimes are the same, and the only way I think that is possible is by changing the local reference to an argument:
fn do_smth<'a, T>(buf: &'a [u8]) -> T
where
T: From<&'a [u8]>,
{
T::from(buf)
}
And then it compiles.
If instead of the function promising it can work with any lifetime, you want to make the caller promise it can work with any lifetime, you can instead use HRTBs to make the caller promise it.
fn do_smth<T>() -> T
where
for<'a> T: From<&'a [u8]>,
{
// for example...
let buf: Vec<u8> = vec![1u8, 2u8];
T::from(&buf)
}
Now, since you can use any lifetime, a local one also works and the code compiles.
Lifetimes represent a "duration" (metaphorically), or, more pragmatically, a scope, in which a variable is valid. Outside of one's lifetime, the variable should be considered as having been freed from memory, even though you haven't done that explicitly, because that's how Rust manages memory.
It becomes a bit more complex when Rust tries to ensure that, once a variable is done for, no other parts of the code that could have had access to that variable still have access. These shared accesses are called borrows, and that's why borrows have lifetimes too. The main condition Rust enforces on them is that a borrow's lifetime is always shorter (or within, depending on how you see it) than its original variable, ie. you can't share something for more time than you actually own it.
Rust therefore enforces all borrows (as well as all variables, really) to have an established lifetime at compile-time. To lighten things, Rust has default rules about what a lifetime should be if it was not explicitly defined by the user, that is, when you talk about a type that involves a lifetime, Rust let's you not write that lifetime explicitly under certain conditions. However, this is not a "lifetime inference", in the sense of inferring types: Rust will not try to make sense out of explicit lifetimes, it's a lot less smart about it. In particular, this lifetime explicitation can fail, in the sense that Rust will not be able to figure out the right lifetime it has to assign even though it was possible to find out that worked.
Back to business: your first error simply stems from the fact that Rust has no rule to make a lifetime if it wasn't provided in the position pointed out by the error. As I said, Rust won't try to infer what the right lifetime would be, it simply checks whether not explicitly putting a lifetime there implicitly means something or not. So, you simply need to put one.
Your first reflex might be to make your function generic over the missing lifetime, which is often the right thing to do (and even the only possible action), that is, do something like that:
fn do_smth<'a, T>() -> T
where
T: From<&'a [u8]>
{
// for example...
let buf : Vec<u8> = vec![1, 2];
T::from(buf.as_slice())
}
What this means is that do_smth is generic over the lifetime 'a, just like it is generic over the type T. This has two consequences:
Rust will proceed to a monomorphisation of your function for each call, meaning it will actually provide a concrete implementation of your function for each type T and each lifetime 'a that is required. In particular, it will automatically find out what is the right lifetime. This might seem contradictory with what I said earlier, about Rust not inferring lifetimes. The difference is that type inference and monomorphisation, although similar, are not the same step, and so the compiler does not work lifetimes in the same way. Don't worry about this until you have understood the rest.
The second consequence, which is a bit disastrous, is that your function exposes the following contract: for any type T, and any lifetime 'a, such that T: From<&'a [u8]>, do_smth can produce a type T. If you think about it, it means that even if T only implements From<&'a [u8]> for a lifetime 'a that is already finished (or, if you see lifetimes as scopes, for a lifetime 'a that is disjoint from do_smth's scope), you can produce an element of type T. This is not what you actually meant: you don't want the caller to give you an arbitrary lifetime. Instead, you know that the lifetime of the borrow of the slice is the one you chose it to be, within your function (because you own the underlying vector), and you want that the type T to be buildable from that slice. That is, you want T: From<&'a [u8]> for a 'a that you have chosen, not one provided by the caller.
This last point should make you understand why the previous snippet of code is unsound, and won't compile. Your function should not take a lifetime as argument, just a type T with certain constraints. But then, how do you encode the said conditions? That's where for<'a> comes into play. If you have a type T such that T: for<'a> From<&'a [u8]>, it means that for all 'a, T: From<&'a [u8]>. In particular, it is true for the lifetime of your slice. This is why the following works
fn do_smth<T>() -> T
where
T: for<'a> From<&'a [u8]>
{
// for example...
let buf: Vec<u8> = vec![1, 2];
T::from(buf.as_slice())
}
Note that, as planned, this version of do_smth is not generic over a lifetime, that is, the caller does not provide a lifetime to the function.
I can't call Foo::new(words).split_first() in the following code
fn main() {
let words = "Sometimes think, the greatest sorrow than older";
/*
let foo = Foo::new(words);
let first = foo.split_first();
*/
let first = Foo::new(words).split_first();
println!("{}", first);
}
struct Foo<'a> {
part: &'a str,
}
impl<'a> Foo<'a> {
fn split_first(&'a self) -> &'a str {
self.part.split(',').next().expect("Could not find a ','")
}
fn new(s: &'a str) -> Self {
Foo { part: s }
}
}
the compiler will give me an error message
error[E0716]: temporary value dropped while borrowed
--> src/main.rs:8:17
|
8 | let first = Foo::new(words).split_first();
| ^^^^^^^^^^^^^^^ - temporary value is freed at the end of this statement
| |
| creates a temporary which is freed while still in use
9 |
10 | println!("{}", first);
| ----- borrow later used here
|
= note: consider using a `let` binding to create a longer lived value
If I bind the value of Foo::new(words) first, then call the split_first method there is no problem.
These two methods of calling should intuitively be the same but are somehow different.
Short answer: remove the 'a lifetime for the self parameter of split_first: fn split_first(&self) -> &'a str (playground).
Long answer:
When you write this code:
struct Foo<'a> {
part: &'a str,
}
impl<'a> Foo<'a> {
fn new(s: &'a str) -> Self {
Foo { part: s }
}
}
You are telling the compiler that all Foo instances are related to some lifetime 'a that must be equal to or shorter than the lifetime of the string passed as parameter to Foo::new. That lifetime 'a may be different from the lifetime of each Foo instance. When you then write:
let words = "Sometimes think, the greatest sorrow than older";
Foo::new(words)
The compiler infers that the lifetime 'a must be equal to or shorter than the lifetime of words. Barring any other constraints the compiler will use the lifetime of words, which is 'static so it is valid for the full life of the program.
When you add your definition of split_first:
fn split_first(&'a self) -> &'a str
You are adding an extra constraint: you are saying that 'a must also be equal to or shorter than the lifetime of self. The compiler will therefore take the shorter of the lifetime of words and the lifetime of the temporary Foo instance, which is the lifetime of the temporary. #AndersKaseorg's answer explains why that doesn't work.
By removing the 'a lifetime on the self parameter, I am decorrelating 'a from the lifetime of the temporary, so the compiler can again infer that 'a is the lifetime of words, which is long enough for the program to work.
Foo::new(words).split_first() would be interpreted roughly as
let tmp = Foo::new(words);
let ret = tmp.split_first();
drop(tmp);
ret
If Rust allowed you to do this, the references in ret would point [edit: would be allowed by the type of split_first to point*] into the now dropped value of tmp. So it’s a good thing that Rust disallows this. If you wrote the equivalent one-liner in C++, you’d silently get undefined behavior.
By writing the let binding yourself, you delay the drop until the end of the scope, thus extending the region where it’s safe to have these references.
For more details, see temporary lifetimes in the Rust Reference.
* Edit: As pointed out by Jmb, the real problem in this particular example is that the type
fn split_first(&'a self) -> &'a str
isn’t specific enough, and a better solution is to refine the type to:
fn split_first<'b>(&'b self) -> &'a str
which can be abbreviated:
fn split_first(&self) -> &'a str
This conveys the intended guarantee that the returned references do not point into the Foo<'a> (only into the string itself).
Considering the following code:
fn foo<'a, T: 'a>(t: T) -> Box<Fn() -> &'a T + 'a> {
Box::new(move || &t)
}
What I expect:
The type T has lifetime 'a.
The value t live as long as T.
t moves to the closure, so the closure live as long as t
The closure returns a reference to t which was moved to the closure. So the reference is valid as long as the closure exists.
There is no lifetime problem, the code compiles.
What actually happens:
The code does not compile:
error[E0495]: cannot infer an appropriate lifetime for borrow expression due to conflicting requirements
--> src/lib.rs:2:22
|
2 | Box::new(move || &t)
| ^^
|
note: first, the lifetime cannot outlive the lifetime as defined on the body at 2:14...
--> src/lib.rs:2:14
|
2 | Box::new(move || &t)
| ^^^^^^^^^^
note: ...so that closure can access `t`
--> src/lib.rs:2:22
|
2 | Box::new(move || &t)
| ^^
note: but, the lifetime must be valid for the lifetime 'a as defined on the function body at 1:8...
--> src/lib.rs:1:8
|
1 | fn foo<'a, T: 'a>(t: T) -> Box<Fn() -> &'a T + 'a> {
| ^^
= note: ...so that the expression is assignable:
expected std::boxed::Box<(dyn std::ops::Fn() -> &'a T + 'a)>
found std::boxed::Box<dyn std::ops::Fn() -> &T>
I do not understand the conflict. How can I fix it?
Very interesting question! I think I understood the problem(s) at play here. Let me try to explain.
tl;dr: closures cannot return references to values captured by moving, because that would be a reference to self. Such a reference cannot be returned because the Fn* traits don't allow us to express that. This is basically the same as the streaming iterator problem and could be fixed via GATs (generic associated types).
Implementing it manually
As you probably know, when you write a closure, the compiler will generate a struct and impl blocks for the appropriate Fn traits, so closures are basically syntax sugar. Let's try to avoid all that sugar and build your type manually.
What you want is a type which owns another type and can return references to that owned type. And you want to have a function which returns a boxed instance of said type.
struct Baz<T>(T);
impl<T> Baz<T> {
fn call(&self) -> &T {
&self.0
}
}
fn make_baz<T>(t: T) -> Box<Baz<T>> {
Box::new(Baz(t))
}
This is pretty equivalent to your boxed closure. Let's try to use it:
let outside = {
let s = "hi".to_string();
let baz = make_baz(s);
println!("{}", baz.call()); // works
baz
};
println!("{}", outside.call()); // works too
This works just fine. The string s is moved into the Baz type and that Baz instance is moved into the Box. s is now owned by baz and then by outside.
It gets more interesting when we add a single character:
let outside = {
let s = "hi".to_string();
let baz = make_baz(&s); // <-- NOW BORROWED!
println!("{}", baz.call()); // works
baz
};
println!("{}", outside.call()); // doesn't work!
Now we cannot make the lifetime of baz bigger than the lifetime of s, since baz contains a reference to s which would be an dangling reference of s would go out of scope earlier than baz.
The point I wanted to make with this snippet: we didn't need to annotate any lifetimes on the type Baz to make this safe; Rust figured it out on its own and enforces that baz lives no longer than s. This will be important below.
Writing a trait for it
So far we only covered the basics. Let's try to write a trait like Fn to get closer to your original problem:
trait MyFn {
type Output;
fn call(&self) -> Self::Output;
}
In our trait, there are no function parameters, but otherwise it's fairly identical to the real Fn trait.
Let's implement it!
impl<T> MyFn for Baz<T> {
type Output = ???;
fn call(&self) -> Self::Output {
&self.0
}
}
Now we have a problem: what do we write instead of ???? Naively one would write &T... but we need a lifetime parameter for that reference. Where do we get one? What lifetime does the return value even have?
Let's check the function we implemented before:
impl<T> Baz<T> {
fn call(&self) -> &T {
&self.0
}
}
So here we use &T without lifetime parameter too. But this only works because of lifetime elision. Basically, the compiler fills in the blanks so that fn call(&self) -> &T is equivalent to:
fn call<'s>(&'s self) -> &'s T
Aha, so the lifetime of the returned reference is bound to the self lifetime! (more experienced Rust users might already have a feeling where this is going...).
(As a side note: why is the returned reference not dependent on the lifetime of T itself? If T references something non-'static then this has to be accounted for, right? Yes, but it is already accounted for! Remember that no instance of Baz<T> can ever live longer than the thing T might reference. So the self lifetime is already shorter than whatever lifetime T might have. Thus we only need to concentrate on the self lifetime)
But how do we express that in the trait impl? Turns out: we can't (yet). This problem is regularly mentioned in the context of streaming iterators -- that is, iterators that return an item with a lifetime bound to the self lifetime. In today's Rust, it is sadly impossible to implement this; the type system is not strong enough.
What about the future?
Luckily, there is an RFC "Generic Associated Types" which was merged some time ago. This RFC extends the Rust type system to allow associated types of traits to be generic (over other types and lifetimes).
Let's see how we can make your example (kinda) work with GATs (according to the RFC; this stuff doesn't work yet ☹). First we have to change the trait definition:
trait MyFn {
type Output<'a>; // <-- we added <'a> to make it generic
fn call(&self) -> Self::Output;
}
The function signature hasn't changed in the code, but notice that lifetime elision kicks in! The above fn call(&self) -> Self::Output is equivalent to:
fn call<'s>(&'s self) -> Self::Output<'s>
So the lifetime of the associated type is bound to the self lifetime. Just as we wanted! The impl looks like this:
impl<T> MyFn for Baz<T> {
type Output<'a> = &'a T;
fn call(&self) -> Self::Output {
&self.0
}
}
To return a boxed MyFn we would need to write this (according to this section of the RFC:
fn make_baz<T>(t: T) -> Box<for<'a> MyFn<Output<'a> = &'a T>> {
Box::new(Baz(t))
}
And what if we want to use the real Fn trait? As far as I understand, we can't, even with GATs. I think it's impossible to change the existing Fn trait to use GATs in a backwards compatible manner. So it's likely that the standard library will keep the less powerful trait as is. (side note: how to evolve the standard library in backwards incompatible ways to use new language features is something I wondered about a few times already; so far I haven't heard of any real plan in this regards; I hope the Rust team comes up with something...)
Summary
What you want is not technically impossible or unsafe (we implemented it as a simple struct and it works). However, unfortunately it is impossible to express what you want in the form of closures/Fn traits in Rust's type system right now. This is the same problem streaming iterators are dealing with.
With the planned GAT feature, it is possible to express all of this in the type system. However, the standard library would need to catch up somehow to make your exact code possible.
What I expect:
The type T has lifetime 'a.
The value t live as long as T.
This makes no sense. A value cannot "live as long" as a type, because a type doesn't live. "T has lifetime 'a" is a very imprecise statement, easy to misunderstand. What T: 'a really means is "instances of T must stay valid at least as long as lifetime 'a. For example, T must not be a reference with a lifetime shorter than 'a, or a struct containing such a reference. Note that this has nothing to do with forming references to T, i.e. &T.
The value t, then, lives as long as its lexical scope (it's a function parameter) says it does, which has nothing to do with 'a at all.
t moves to the closure, so the closure live as long as t
This is also incorrect. The closure lives as long as the closure does lexically. It is a temporary in the result expression, and therefore lives until the end of the result expression. t's lifetime concerns the closure not at all, since it has its own T variable inside, the capture of t. Since the capture is a copy/move of t, it is not in any way affected by t's lifetime.
The temporary closure is then moved into the box's storage, but that's a new object with its own lifetime. The lifetime of that closure is bound to the lifetime of the box, i.e. it is the return value of the function, and later (if you store the box outside the function) the lifetime of whatever variable you store the box in.
All of that means that a closure that returns a reference to its own capture state must bind the lifetime of that reference to its own reference. Unfortunately, this is not possible.
Here's why:
The Fn trait implies the FnMut trait, which in turn implies the FnOnce trait. That is, every function object in Rust can be called with a by-value self argument. This means that every function object must be still valid being called with a by-value self argument and returning the same thing as always.
In other words, trying to write a closure that returns a reference to its own captures expands to roughly this code:
struct Closure<T> {
captured: T,
}
impl<T> FnOnce<()> for Closure<T> {
type Output = &'??? T; // what do I put as lifetime here?
fn call_once(self, _: ()) -> Self::Output {
&self.captured // returning reference to local variable
// no matter what, the reference would be invalid once we return
}
}
And this is why what you're trying to do is fundamentally impossible. Take a step back, think of what you're actually trying to accomplish with this closure, and find some other way to accomplish it.
You expect the type T to have lifetime 'a, but t is not a reference to a value of type T. The function takes ownership of the variable t by argument passing:
// t is moved here, t lifetime is the scope of the function
fn foo<'a, T: 'a>(t: T)
You should do:
fn foo<'a, T: 'a>(t: &'a T) -> Box<Fn() -> &'a T + 'a> {
Box::new(move || t)
}
The other answers are top-notch, but I wanted to chime in with another reason your original code couldn't work. A big problem lies in the signature:
fn foo<'a, T: 'a>(t: T) -> Box<Fn() -> &'a T + 'a>
This says that the caller may specify any lifetime when calling foo and the code will be valid and memory-safe. That cannot possibly be true for this code. It wouldn't make sense to call this with 'a set to 'static, but nothing about this signature would prevent that.
I was reading the lifetimes chapter of the Rust book, and I came across this example for a named/explicit lifetime:
struct Foo<'a> {
x: &'a i32,
}
fn main() {
let x; // -+ x goes into scope
// |
{ // |
let y = &5; // ---+ y goes into scope
let f = Foo { x: y }; // ---+ f goes into scope
x = &f.x; // | | error here
} // ---+ f and y go out of scope
// |
println!("{}", x); // |
} // -+ x goes out of scope
It's quite clear to me that the error being prevented by the compiler is the use-after-free of the reference assigned to x: after the inner scope is done, f and therefore &f.x become invalid, and should not have been assigned to x.
My issue is that the problem could have easily been analyzed away without using the explicit 'a lifetime, for instance by inferring an illegal assignment of a reference to a wider scope (x = &f.x;).
In which cases are explicit lifetimes actually needed to prevent use-after-free (or some other class?) errors?
The other answers all have salient points (fjh's concrete example where an explicit lifetime is needed), but are missing one key thing: why are explicit lifetimes needed when the compiler will tell you you've got them wrong?
This is actually the same question as "why are explicit types needed when the compiler can infer them". A hypothetical example:
fn foo() -> _ {
""
}
Of course, the compiler can see that I'm returning a &'static str, so why does the programmer have to type it?
The main reason is that while the compiler can see what your code does, it doesn't know what your intent was.
Functions are a natural boundary to firewall the effects of changing code. If we were to allow lifetimes to be completely inspected from the code, then an innocent-looking change might affect the lifetimes, which could then cause errors in a function far away. This isn't a hypothetical example. As I understand it, Haskell has this problem when you rely on type inference for top-level functions. Rust nipped that particular problem in the bud.
There is also an efficiency benefit to the compiler — only function signatures need to be parsed in order to verify types and lifetimes. More importantly, it has an efficiency benefit for the programmer. If we didn't have explicit lifetimes, what does this function do:
fn foo(a: &u8, b: &u8) -> &u8
It's impossible to tell without inspecting the source, which would go against a huge number of coding best practices.
by inferring an illegal assignment of a reference to a wider scope
Scopes are lifetimes, essentially. A bit more clearly, a lifetime 'a is a generic lifetime parameter that can be specialized with a specific scope at compile time, based on the call site.
are explicit lifetimes actually needed to prevent [...] errors?
Not at all. Lifetimes are needed to prevent errors, but explicit lifetimes are needed to protect what little sanity programmers have.
Let's have a look at the following example.
fn foo<'a, 'b>(x: &'a u32, y: &'b u32) -> &'a u32 {
x
}
fn main() {
let x = 12;
let z: &u32 = {
let y = 42;
foo(&x, &y)
};
}
Here, the explicit lifetimes are important. This compiles because the result of foo has the same lifetime as its first argument ('a), so it may outlive its second argument. This is expressed by the lifetime names in the signature of foo. If you switched the arguments in the call to foo the compiler would complain that y does not live long enough:
error[E0597]: `y` does not live long enough
--> src/main.rs:10:5
|
9 | foo(&y, &x)
| - borrow occurs here
10 | };
| ^ `y` dropped here while still borrowed
11 | }
| - borrowed value needs to live until here
The lifetime annotation in the following structure:
struct Foo<'a> {
x: &'a i32,
}
specifies that a Foo instance shouldn't outlive the reference it contains (x field).
The example you came across in the Rust book doesn't illustrate this because f and y variables go out of scope at the same time.
A better example would be this:
fn main() {
let f : Foo;
{
let n = 5; // variable that is invalid outside this block
let y = &n;
f = Foo { x: y };
};
println!("{}", f.x);
}
Now, f really outlives the variable pointed to by f.x.
Note that there are no explicit lifetimes in that piece of code, except the structure definition. The compiler is perfectly able to infer lifetimes in main().
In type definitions, however, explicit lifetimes are unavoidable. For example, there is an ambiguity here:
struct RefPair(&u32, &u32);
Should these be different lifetimes or should they be the same? It does matter from the usage perspective, struct RefPair<'a, 'b>(&'a u32, &'b u32) is very different from struct RefPair<'a>(&'a u32, &'a u32).
Now, for simple cases, like the one you provided, the compiler could theoretically elide lifetimes like it does in other places, but such cases are very limited and do not worth extra complexity in the compiler, and this gain in clarity would be at the very least questionable.
If a function receives two references as arguments and returns a reference, then the implementation of the function might sometimes return the first reference and sometimes the second one. It is impossible to predict which reference will be returned for a given call. In this case, it is impossible to infer a lifetime for the returned reference, since each argument reference may refer to a different variable binding with a different lifetime. Explicit lifetimes help to avoid or clarify such a situation.
Likewise, if a structure holds two references (as two member fields) then a member function of the structure may sometimes return the first reference and sometimes the second one. Again explicit lifetimes prevent such ambiguities.
In a few simple situations, there is lifetime elision where the compiler can infer lifetimes.
I've found another great explanation here: http://doc.rust-lang.org/0.12.0/guide-lifetimes.html#returning-references.
In general, it is only possible to return references if they are
derived from a parameter to the procedure. In that case, the pointer
result will always have the same lifetime as one of the parameters;
named lifetimes indicate which parameter that is.
The case from the book is very simple by design. The topic of lifetimes is deemed complex.
The compiler cannot easily infer the lifetime in a function with multiple arguments.
Also, my own optional crate has an OptionBool type with an as_slice method whose signature actually is:
fn as_slice(&self) -> &'static [bool] { ... }
There is absolutely no way the compiler could have figured that one out.
As a newcomer to Rust, my understanding is that explicit lifetimes serve two purposes.
Putting an explicit lifetime annotation on a function restricts the type of code that may appear inside that function. Explicit lifetimes allow the compiler to ensure that your program is doing what you intended.
If you (the compiler) want(s) to check if a piece of code is valid, you (the compiler) will not have to iteratively look inside every function called. It suffices to have a look at the annotations of functions that are directly called by that piece of code. This makes your program much easier to reason about for you (the compiler), and makes compile times managable.
On point 1., Consider the following program written in Python:
import pandas as pd
import numpy as np
def second_row(ar):
return ar[0]
def work(second):
df = pd.DataFrame(data=second)
df.loc[0, 0] = 1
def main():
# .. load data ..
ar = np.array([[0, 0], [0, 0]])
# .. do some work on second row ..
second = second_row(ar)
work(second)
# .. much later ..
print(repr(ar))
if __name__=="__main__":
main()
which will print
array([[1, 0],
[0, 0]])
This type of behaviour always surprises me. What is happening is that df is sharing memory with ar, so when some of the content of df changes in work, that change infects ar as well. However, in some cases this may be exactly what you want, for memory efficiency reasons (no copy). The real problem in this code is that the function second_row is returning the first row instead of the second; good luck debugging that.
Consider instead a similar program written in Rust:
#[derive(Debug)]
struct Array<'a, 'b>(&'a mut [i32], &'b mut [i32]);
impl<'a, 'b> Array<'a, 'b> {
fn second_row(&mut self) -> &mut &'b mut [i32] {
&mut self.0
}
}
fn work(second: &mut [i32]) {
second[0] = 1;
}
fn main() {
// .. load data ..
let ar1 = &mut [0, 0][..];
let ar2 = &mut [0, 0][..];
let mut ar = Array(ar1, ar2);
// .. do some work on second row ..
{
let second = ar.second_row();
work(second);
}
// .. much later ..
println!("{:?}", ar);
}
Compiling this, you get
error[E0308]: mismatched types
--> src/main.rs:6:13
|
6 | &mut self.0
| ^^^^^^^^^^^ lifetime mismatch
|
= note: expected type `&mut &'b mut [i32]`
found type `&mut &'a mut [i32]`
note: the lifetime 'b as defined on the impl at 4:5...
--> src/main.rs:4:5
|
4 | impl<'a, 'b> Array<'a, 'b> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
note: ...does not necessarily outlive the lifetime 'a as defined on the impl at 4:5
--> src/main.rs:4:5
|
4 | impl<'a, 'b> Array<'a, 'b> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
In fact you get two errors, there is also one with the roles of 'a and 'b interchanged. Looking at the annotation of second_row, we find that the output should be &mut &'b mut [i32], i.e., the output is supposed to be a reference to a reference with lifetime 'b (the lifetime of the second row of Array). However, because we are returning the first row (which has lifetime 'a), the compiler complains about lifetime mismatch. At the right place. At the right time. Debugging is a breeze.
The reason why your example does not work is simply because Rust only has local lifetime and type inference. What you are suggesting demands global inference. Whenever you have a reference whose lifetime cannot be elided, it must be annotated.
I think of a lifetime annotation as a contract about a given ref been valid in the receiving scope only while it remains valid in the source scope. Declaring more references in the same lifetime kind of merges the scopes, meaning that all the source refs have to satisfy this contract.
Such annotation allow the compiler to check for the fulfillment of the contract.
I believe that this function declaration tells Rust that the lifetime of the function's output is the same as the lifetime of it's s parameter:
fn substr<'a>(s: &'a str, until: u32) -> &'a str;
^^^^
It seems to me that the compiler only needs to know this(1):
fn substr(s: &'a str, until: u32) -> &'a str;
What does the annotation <'a> after the function name mean? Why does the compiler need it, and what does it do with it?
(1): I know it needs to know even less, due to lifetime elision. But this question is about specifying lifetime explicitly.
Let me expand on the previous answers…
What does the annotation <'a> after the function name mean?
I wouldn't use the word "annotation" for that. Much like <T> introduces a generic type parameter, <'a> introduces a generic lifetime parameter. You can't use any generic parameters without introducing them first and for generic functions this introduction happens right after their name. You can think of a generic function as a family of functions. So, essentially, you get one function for every combination of generic parameters. substr::<'x> would be a specific member of that function family for some lifetime 'x.
If you're unclear on when and why we have to be explicit about lifetimes, read on…
A lifetime parameter is always associated with all reference types. When you write
fn main() {
let x = 28374;
let r = &x;
}
the compiler knows that x lives in the main function's scope enclosed with curly braces. Internally, it identifies this scope with some lifetime parameter. For us, it is unnamed. When you take the address of x, you'll get a value of a specific reference type. A reference type is kind of a member of a two dimensional family of reference types. One axis is the type of what the reference points to and the other axis is a lifetime that is used for two constraints:
The lifetime parameter of a reference type represents an upper bound for how long you can hold on to that reference
The lifetime parameter of a reference type represents a lower bound for the lifetime of the things you can make the reference point to.
Together, these constraints play a vital role in Rust's memory safety story. The goal here is to avoid dangling references. We would like to rule out references that point to some memory region we are not allowed to use anymore because that thing it used to point to does not exist anymore.
One potential source of confusion is probably the fact that lifetime parameters are invisible most of the time. But that does not mean they are not there. References always have a lifetime parameter in their type. But such a lifetime parameter does not have to have a name and most of the time we don't need to mention it anyways because the compiler can assign names for lifetime parameters automatically. This is called "lifetime elision". For example, in the following case, you don't see any lifetime parameters being mentioned:
fn substr(s: &str, until: u32) -> &str {…}
But it's okay to write it like this. It's actually a short-cut syntax for the more explicit
fn substr<'a>(s: &'a str, until: u32) -> &'a str {…}
Here, the compiler automatically assigns the same name to the "input lifetime" and the "output lifetime" because it's a very common pattern and most likely exactly what you want. Because this pattern is so common, the compiler lets us get away without saying anything about lifetimes. It assumes that this more explicit form is what we meant based on a couple of "lifetime elision" rules (which are at least documented here)
There are situations in which explicit lifetime parameters are not optional. For example, if you write
fn min<T: Ord>(x: &T, y: &T) -> &T {
if x <= y {
x
} else {
y
}
}
the compiler will complain because it will interpret the above declaration as
fn min<'a, 'b, 'c, T: Ord>(x: &'a T, y: &'b T) -> &'c T { … }
So, for each reference a separate lifetime parameter is introduced. But no information on how the lifetime parameters relate to each other is available in this signature. The user of this generic function could use any lifetimes. And that's a problem inside its body. We're trying to return either x or y. But the type of x is &'a T. That's not compatible with the return type &'c T. The same is true for y. Since the compiler knows nothing about how these lifetimes relate to each other, it's not safe to return these references as a reference of type &'c T.
Can it ever be safe to go from a value of type &'a T to &'c T? Yes. It's safe if the lifetime 'a is equal or greater than the lifetime 'c. Or in other words 'a: 'c. So, we could write this
fn min<'a, 'b, 'c, T: Ord>(x: &'a T, y: &'b T) -> &'c T
where 'a: 'c, 'b: 'c
{ … }
and get away with it without the compiler complaining about the function's body. But it's actually unnecessarily complex. We can also simply write
fn min<'a, T: Ord>(x: &'a T, y: &'a T) -> &'a T { … }
and use a single lifetime parameter for everything. The compiler is able to deduce 'a as the minimum lifetime of the argument references at the call site just because we used the same lifetime name for both parameters. And this lifetime is precisely what we need for the return type.
I hope this answers your question. :)
Cheers!
What does the annotation <'a> after the function name mean?
fn substr<'a>(s: &'a str, until: u32) -> &'a str;
// ^^^^
This is declaring a generic lifetime parameter. It's similar to a generic type parameter (often seen as <T>), in that the caller of the function gets to decide what the lifetime is. Like you said, the lifetime of the result will be the same as the lifetime of the first argument.
All lifetime names are equivalent, except for one: 'static. This lifetime is pre-set to mean "guaranteed to live for the entire life of the program".
The most common lifetime parameter name is probably 'a, but you can use any letter or string. Single letters are most common, but any snake_case identifier is acceptable.
Why does the compiler need it, and what does it do with it?
Rust generally favors things to be explicit, unless there's a very good ergonomic benefit. For lifetimes, lifetime elision takes care of something like 85+% of cases, which seemed like a clear win.
Type parameters live in the same namespace as other types — is T a generic type or did someone name a struct that? Thus type parameters need to have an explicit annotation that shows that T is a parameter and not a real type. However, lifetime parameters don't have this same problem, so that's not the reason.
Instead, the main benefit of explicitly listing type parameters is because you can control how multiple parameters interact. A nonsense example:
fn better_str<'a, 'b, 'c>(a: &'a str, b: &'b str) -> &'c str
where
'a: 'c,
'b: 'c,
{
if a.len() < b.len() {
a
} else {
b
}
}
We have two strings and say that the input strings may have different lifetimes, but must both outlive the lifetime of the result value.
Another example, as pointed out by DK, is that structs can have their own lifetimes. I made this example also a bit of nonsense, but it hopefully conveys the point:
struct Player<'a> {
name: &'a str,
}
fn name<'p, 'n>(player: &'p Player<'n>) -> &'n str {
player.name
}
Lifetimes can be one of the more mind-bending parts of Rust, but they are pretty great when you start to grasp them.
The <'a> annotation just declares the lifetimes used in the function, exactly like generic parameters <T>.
fn subslice<'a, T>(s: &'a [T], until: u32) -> &'a [T] { \\'
&s[..until as usize]
}
Note that in your example, all lifetimes can be inferred.
fn subslice<T>(s: &[T], until: u32) -> &[T] {
&s[..until as usize]
}
fn substr(s: &str, until: u32) -> &str {
&s[..until as usize]
}
playpen example