Do rust lifetimes only refer to references? - rust

I'm trying to wrap my head around Rust lifetimes (as the official guides don't really explain them that well).
Do rust lifetimes only refer to references, or can they refer to base/primitive values as well?

Lifetimes are the link between values and references to said values.
In order to understand this link, I will use a broken parallel: houses and addresses.
A house is a physical entity. It is built on a piece of land at some time, will live for a few dozen or hundred years, may be renovated multiple times during this time, and will most likely be destroyed at some point.
An address is a logical entity, it may point to a house, or to other physical entities (a field, a school, a train station, a company's HQ, ...).
The lifetime of a house is relatively clear: it represents the duration during which a house is usable, from the moment it is built to the moment it is destroyed. The house may undergo several renovations during this time, and what used to be a simple cabana may end up being a full-fledged manor, but that is of no concern to us; for our purpose the house is living throughout those transformations. Only its creation and ultimate destruction matter... even though it might be better if no one happen to be in the bedroom when we tear the roof down.
Now, imagine that you are a real estate agent. You do not keep the houses you sell in your office, it's impractical; you do, however, keep their addresses!
Without the notion of lifetime, from time to time your customers will complain because the address you sent them to... was the address of a garbage dump, and not at all that lovely two-story house you had the photography of. You might also get a couple of inquiries from the police station asking why people holding onto a booklet from your office were found in a just destroyed house, the ensuing lawsuit might shut down your business.
This is obviously a risk to your business, and therefore you should seek a better solution. What if each address could be tagged with the lifetime of the house it refers to, so that you know not to send people to their death (or disappointment) ?
You may have recognized the C manual memory management strategy in that garbage dump; in C it's up to you, the real estate agent developer, to make sure that your addresses (pointers/references) always refer to living houses.
In Rust, however, the references are tagged with a special marker: 'enough; it represents the a lower-bound on the lifetime of the value referred.
When the compiler checks whether your usage of the reference is safe or not, it asks the question:
Is the value still alive ?
It does not matter whether the value will be there for a 100 years afterward, as long as it lives long 'enough for the use you have of it.

No, they refer to values as well. If it is not clear from the context how long they will live, they have to be annotated as well. It is then called a lifetime bound.
In the following example it is necessary to specify that the value, the reference is referring to, lives at least as long as the reference itself:
use std::num::Primitive;
struct Foo<'a, T: Primitive + 'a> {
a: &'a T
}
Try deleting the + 'a and the compiler will complain. This is required since T could be anything implementing Primitive.

Yes, they only refer to references, however those references can refer to primitive types. Rust is not like Java (and similar languages) that make a distinction between primitive types, which are passed by value, and more complex types (Objects in Java) that are passed by reference. Complex types can be allocated on the stack and passed by value, and references can be taken to primitive types.
For example, here is a function that takes two references to i32's, and returns a reference to the larger one:
fn bigger<'a>(a: &'a i32, b: &'a i32) -> &'a i32 {
if a > b { a } else { b }
}
It uses the lifetime 'a to communicate that the lifetime of the returned reference is the same as that of the references passed in.

When you see a lifetime annotation (e.g. 'a) in the code, there's almost always a reference, or borrowed pointer, involved.
The full syntax for borrowed pointers is &'a T. 'a is the lifetime of the referent. T is the type of the referent.
Structs and enums can have lifetime parameters. This is usually a consequence of the struct or enum containing a borrowed pointer. When you store a borrowed pointer in a struct or enum, you must explicitly state the referent's lifetime. For example, the Cow enum in the standard library contains a borrowed pointer in one of its variants. Therefore, it has a lifetime parameter that is used in the borrowed pointer's type to define the referent's lifetime.
Traits can have type bounds and also a lifetime bound. The lifetime bound indicates the largest region in which all the borrowed pointers in a concrete implementation of that trait are valid (i.e. their referents are alive). If the implementation contains no borrowed
pointers, then the lifetime is inferred as 'static. Lifetime bounds can appear in type parameter definitions, in where clauses and on trait objects.
Sometimes, you might want to define a struct or enum with a lifetime parameter, but without a corresponding value to borrow from. You can use a marker type, such as ContravariantLifetime<'a>, to ensure the lifetime parameter has the proper variance (ContravariantLifetime corresponds to the variance of borrowed pointers; without a marker, the lifetime would be bivariant, which means the lifetime could be substituted with any other lifetime... not very useful!). See an example of this use case here.

Related

What is the significance of the name `'a` in all of Rust's reference lifetime annotation examples?

All of Rust's documentation and third-party/blog examples (at least the top several results in Google) use <'a> to demonstrate lifetime annotations.
What is the significance of the name choice 'a?
Is this a new de-facto convention like the old i in for(i=0; ...)?
Do you use 'a in all the lifetime annotations in your production code?
What are some real-world examples if lifetime scope names you have used in your own code?
There is no special significance to 'a. It's just the first letter of the Latin alphabet and, if you had more than one lifetime, you might name them 'a, 'b, 'c etc.
One reason that descriptive names are not commonly used for lifetime parameters is similar to why we often use single letters for type parameters. These parameters represent all possible types with the actual type left up to the caller. In generic contexts, the actual argument could be anything and often it's completely unconstrained, so naming it might imply that the usage is narrower than it actually is. For example the T in Option<T> means the type and it wouldn't make sense to be more specific than that.
That said, lifetimes are often connected to another type and it's not uncommon to name lifetimes to be a bit more obvious where they come from. For example, you could use 't to name the lifetime of a T parameter:
fn foo<'t, T: 't>(arg: T> {}
serde is an example of a popular library that does this. When a value is borrowed from data that is being deserialized, this is usually named 'de:
pub trait Deserialize<'de>: Sized {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: Deserializer<'de>;
}
An example from my own work where I've named them more carefully is in a query language that searches over in-memory data structures. I used 'd to refer to values borrowed from the data being queried and 'q for values borrowed from the query itself.
'a is just the first letter of the alphabet, and easy to type. Yes, it's a stand-in similar to i when iterating of a range of numbers or x in almost any context. It's most similar to using T for a generic parameter.
It has become a de-factor convention to use 'a, 'b, etc as lifetime names, because it reduces the amount of visual space they take up. In cases where clarity is important, you'll see longer names, often matching the name of the field or generic parameter:
struct SomeStruct<'name, 'comment> {
name: &'name str,
comment: &'comment str,
}
Is this a new de-facto convention like the old i in for(i=0; ...)?
Yes, pretty much. The names you pick for lifetime parameters aren't important (except for 'static, that's a special case). They're like variable names.
See the Rust Reference's definition of lifetime labels - the apostrophe is followed by an IDENTIFIER_OR_KEYWORD token.
Personally, I usually use 'a, 'b, etc. in my code, since the lifetimes aren't usually very tricky or important - I'm just trying to get the borrow checker happy.
But sometimes it's important to actually understand what's happening with the lifetimes involved, and thinking through it is nontrivial. In those cases I might use more informative lifetime parameter names.

When is a static lifetime not appropriate?

I have found a lot of information across the web about rust lifetimes, including information about static lifetimes. It makes sense to me that, in certain situations, you must guarantee that a reference will outlive everything.
For instance, I have a reference that I’m passing to a thread, and the compiler is requesting that the reference been marked as static. In this scenario, that seems to make sense because the compiler can’t know how long the thread will live and thus needs to ensure the passed reference outlives the thread. (I think that’s correct?)
I don’t know where this comes from, but I am always concerned that marking something with a static lifetime is something to be skeptical of, and avoided when possible.
So I wonder if that’s correct. Should I be critical of marking things with a static lifetime? Are there situations when the compiler will want to require one, but an alternate strategy might actually be more optimal?
What are some concrete ways that I can reason about the application of a static lifetime, and possibly determine when it might not be appropriate?
As you might have already guessed, there is no definitive, technical answer to this.
As a newcomer to Rust, 'static references seem to defeat the entire purpose of the borrowing system and there is a notion to avoid them. Once you get more experienced, this notion will go away.
First of all, 'static is not bad as it seems, since all things that have no other lifetimes associated with them are 'static, e.g. String::new(). Notice that 'static does not mean that the value in question does truly live forever. It just means that the value can be made to live forever. In your threading-examples, the thread can't make any promises about its own lifetime, so it needs to be able to make all things passed to it live forever. Any owned value which does not include lifetimes shorter than 'static (like vec![1,2,3]) can be made to live forever (simply by not destroying them) and are therefor 'static.
Second, &'static - the static reference - does not come up often anyway. If it does, you'll usually be aware of why. You won't see a lot of fn foo(bar: &'static Bar) because there simply aren't that many use-cases for it, not because it is actively avoided.
There are situations where 'static does come up in surprising ways. Out of my head:
A Box<dyn Trait> is implicitly a Box<dyn Trait + 'static>. This is because when the type of the value inside the Box gets erased, it might have had lifetimes associated with it; and all (different) types must be valid for as long as the Box lives. Therefore all types need to share a common denominator wrt their lifetimes and Rust is defined to choose 'static. This choice is usually ok, but can lead to surprising "requires 'static" errors. You can generalize this explicitly to Box<dyn Trait + 'a>
If you have a custom impl Drop on your type, the Drop-checker may not be able to prove that the destructor is unable to observe values that have already been dropped. To prevent the Drop impl from accessing references to values that have already been dropped, the compiler requires the entire type to only have 'static references inside of it. This can be overcome by an unsafe impl, which lifts the 'static-requirement.
Instead of &'static T, pass Arc<T> to the thread. This has only a tiny cost and ensures lifetimes will not be longer than necessary.

Using same reference for multiple method parameters

I'll preface by saying I'm very new to Rust, and I'm still wrapping my head around the semantics of the borrow-checker. I have some understanding of why it doesn't like my code, but I'm not sure how to resolve it in an idiomatic way.
I have a method in Rust which accepts 3 parameters with a signature that looks something like this:
fn do_something(&mut self, mem: &mut impl TraitA, bus: &mut impl TraitB, int_lines: &impl TraitC) -> ()
I also have a struct which implements all three of these traits; however, the borrow-checker is complaining when I attempt to use the same reference for multiple parameters:
cannot borrow `*self` as mutable more than once at a time
And also:
cannot borrow `*self` as immutable because it is also borrowed as mutable
My first question is whether this is a shortcoming of the borrow-checker (being unable to recognize that the same reference is being passed), or by design (I suspect this is the case, since from the perspective of the called method each reference is distinct and thus the ownership of each can be regarded separately).
My second question is what the idiomatic approach would be. The two solutions I see are:
a) Combining all three traits into one. While this is technically trivial given my library's design, it would make the code decidedly less clean since the three traits are used to interface with unrelated parts of the struct's state. Furthermore, since this is a library (the do_something method is part of a test), it hinders the possibility of separating the state out into separate structs.
b) Moving each respective part of the struct's state into separate structs, which are then owned by the main struct. This seems like the better option to me, especially since it does not require any changes to the library code itself.
Please let me know if I'm missing another solution, or if there's a way to convince the borrow-checker to accept my original design.
The borrow checker is operating as designed. It only knows you are passing three different mutable references into the same function: it does not know what the function will do with these, even if they do happen to be references to the same struct. Within the function they are three different mutable references to the same struct.
If the three different traits represent three different functional aspects, then your best approach might be to split the struct into different structs, each implementing one of the traits, as you have proposed.
If you would prefer to keep a single struct, and if the function will always be called with a single struct, then you can just pass it in once like this:
fn do_something(&mut self, proc: &mut (impl TraitA + TraitB + TraitC)) -> () { ... }

Do I have to create distinct structs for both owned (easy-to-use) and borrowed (more efficient) data structures?

I have a Message<'a> which has &'a str references on a mostly short-lived buffer.
Those references mandate a specific program flow as they are guaranteed to never outlive the lifetime 'a of the buffer.
Now I also want to have an owned version of Message, such that it can be moved around, sent via threads, etc.
Is there an idiomatic way to achieve this? I thought that Cow<'a, str> might help, but unfortunately, Cow does not magically allocate in case &'a str would outlive the buffer's lifetime.
AFAIK, Cow is not special in the sense that no matter if Cow holds an Owned variant, it must still pass the borrow checker on 'a.
Definition of std::borrow::Cow.
pub enum Cow<'a, B> {
Borrowed(&'a B),
Owned(<B as ToOwned>::Owned),
}
Is there an idiomatic way to have an owned variant of Message? For some reason we have &str and String, &[u8] and Vec<u8>, ... does that mean people generally would go for &msg and Message?
I suppose I still have to think about if an owned variant is really, really needed, but my experience shows that having an escape hatch for owned variants generally improves prototyping speed.
Yes, you need to have multiple types, one representing the owned concept and one representing the borrowed concept.
You'll see the same technique throughout the standard library and third-party crates.
See also:
How to abstract over a reference to a value or a value itself?
How to avoid writing duplicate accessor functions for mutable and immutable references in Rust?

What are the identifiers denoted with a single apostrophe (')?

I've encountered a number of types in Rust denoted with a single apostrophe:
'static
'r
'a
What is the significance of that apostrophe (')? Maybe it's a modifier of references (&)? Generic typing specific to references? I've no idea where the documentation for this is hiding.
These are Rust's named lifetimes.
Quoting from The Rust Programming Language:
Every reference in Rust has a lifetime, which is the scope for which that reference is valid. Most of the time lifetimes are implicit and inferred, just like most of the time types are inferred. Similarly to when we have to annotate types because multiple types are possible, there are cases where the lifetimes of references could be related in a few different ways, so Rust needs us to annotate the relationships using generic lifetime parameters so that it can make sure the actual references used at runtime will definitely be valid.
Lifetime annotations don’t change how long any of the references
involved live. In the same way that functions can accept any type when
the signature specifies a generic type parameter, functions can accept
references with any lifetime when the signature specifies a generic
lifetime parameter. What lifetime annotations do is relate the
lifetimes of multiple references to each other.
Lifetime annotations have a slightly unusual syntax: the names of
lifetime parameters must start with an apostrophe '. The names of
lifetime parameters are usually all lowercase, and like generic types,
their names are usually very short. 'a is the name most people use as
a default. Lifetime parameter annotations go after the & of a
reference, and a space separates the lifetime annotation from the
reference’s type.
Said another way, a lifetime approximates the span of execution during which the data a reference points to is valid. The Rust compiler will conservatively infer the shortest lifetime possible to be safe. If you want to tell the compiler that a reference lives longer than the shortest estimate, you can name it, saying that the output reference, for example, has the same lifetime as a given input reference.
The 'static lifetime is a special lifetime, the longest lived of all lifetimes - for the duration of the program. A typical example are string "literals" that will always be available during the lifetime of the program/module.
You can get more information from this slide deck, starting around slide 29.
Lifetimes in Rust also discusses lifetimes in some depth.
To add to quux00's excellent answer, named lifetimes are also used to indicate the origin of a returned borrowed variable to the rust compiler.
This function
pub fn f(a: &str, b: &str) -> &str {
b
}
won't compile because it returns a borrowed value but does not specify whether it borrowed it from a or b.
To fix that, you'd declare a named lifetime and use the same lifetime for b and the return type:
pub fn f<'r>(a: &str, b: &'r str) -> &'r str {
// ---- --- ---
b
}
and use it as expected
f("a", "b")

Resources