What are the examples of unsafely specified lifetimes? [duplicate] - rust

I have been learning about lifetimes for the last three days, and they are starting to make sense to me. However, despite a lot of experimenting, I have not managed to specify lifetimes in a way that leads to unsafe runtime behavior, because the compiler seems smart enough to reject such cases at compile time.
Hence I have the chain of questions below:
Is it true that the Rust compiler will catch every unsafe use of lifetime specifiers?
If yes, then why does Rust require specifying lifetimes manually, when the compiler could do it on its own by detecting the unsafe scenarios? Or is it just a relic that will go away once the compiler becomes powerful enough to apply lifetime elision everywhere?
If no, what are some examples of unsafe usage of lifetime specifiers? They would clearly prove the necessity of specifying lifetimes manually.

It is not possible (barring any compiler bugs) to induce undefined behavior with lifetime specifiers unless you use unsafe code (either in the function or elsewhere). However, lifetime specifiers are still necessary because sometimes there is ambiguity in what the proper lifetime should be. For example:
fn foo(bar: &i32, baz: &i32) -> &i32 {
// ...
}
What should the lifetime of the return type be? The compiler cannot infer this because it could be tied to either bar or baz, and each case would affect how long the return value lasts and therefore how the function can be used. The body of the function cannot be used to infer the lifetime because type and lifetime checks must be possible to complete using only the signature of the function. The only way to remove this ambiguity is to explicitly state what lifetime the return value should have:
fn foo<'a>(bar: &i32, baz: &'a i32) -> &'a i32 {
// ...
}
You can read more about the lifetime elision rules here.
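As a rough illustration of how the annotation changes what callers may do, here is a minimal sketch that fills in the elided body with a hypothetical implementation (returning baz is an assumption, since the original body is left out). Because the return value is tied only to baz, the reference passed as bar can go out of scope while the result is still in use:

```rust
// The return value borrows only from `baz`; `bar` carries its own,
// unrelated (elided) lifetime. The body is a hypothetical fill-in.
fn foo<'a>(bar: &i32, baz: &'a i32) -> &'a i32 {
    println!("bar = {}", bar);
    baz
}

fn main() {
    let baz = 2;
    let result;
    {
        let bar = 1;
        // `bar` only needs to live for the duration of the call.
        result = foo(&bar, &baz);
    } // `bar` is dropped here, but `result` stays valid because it borrows `baz`.
    println!("result = {}", result);
}
```

Had the signature tied the return value to bar instead, the same main would be rejected, which is why the choice has to be visible in the signature.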

Related

Are Fn() + 'static and FnMut() + 'static equivalent?

I have what I think to be a simple question, but not much luck finding an answer for it.
Background
I understand the difference between Fn and FnMut in Rust, but I see quite often the need to accept closures that require a 'static lifetime bound.
The Question
Is an Fn() + 'static equivalent to FnMut() + 'static?
My Opinion
I believe they are. An Fn can capture immutable references to its environment and an FnMut can capture mutable ones, but with the 'static lifetime bound the only things a closure can hold are owned values (or the special &'static references), so the closure will almost always have move semantics. It therefore seems pointless to want or need an FnMut() in this case, since there is no mutable reference one could get to the closure's environment.
Am I wrong with this conclusion? My guess is yes, otherwise there would probably be a Clippy lint for this.
Any closure type can capture any kind of data. The difference is how the closure can access the captured data while it is executed. An Fn closure receives a shared reference to its captured data. An FnMut closure receives a mutable reference to its captured data, so it can mutate it. And finally, an FnOnce closure receives ownership of the captured data, which is why you can call it only once.
The 'static trait bound means that the captured data has static lifetime. This is completely orthogonal to the question what a closure can do with its captured data while it is called.
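To make the distinction concrete, here is a small sketch (the helper names call_twice and call_twice_mut are made up for illustration). A move closure that owns a counter satisfies 'static, yet it still needs FnMut because each call mutates the owned state:

```rust
// Hypothetical helper: accepts closures that only read their captured data.
fn call_twice<F: Fn() -> u32 + 'static>(f: F) -> u32 {
    f();
    f()
}

// Hypothetical helper: accepts closures that may mutate their captured data.
fn call_twice_mut<F: FnMut() -> u32 + 'static>(mut f: F) -> u32 {
    f();
    f()
}

fn main() {
    // The closure owns `counter` (so it is 'static) and mutates it on every call.
    let mut counter = 0u32;
    let second = call_twice_mut(move || {
        counter += 1;
        counter
    });
    println!("second call returned {}", second);

    // A closure that only reads its owned data satisfies the Fn bound.
    let base = 10u32;
    println!("read-only closure returned {}", call_twice(move || base + 1));

    // Passing the counting closure to `call_twice` would be rejected,
    // because mutating captured state requires FnMut.
}
```

So the two bounds are not equivalent: 'static restricts what is captured, while Fn versus FnMut restricts what the body may do with it.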

Opposite of Borrow trait for Copy types?

I've seen the Borrow trait used to define functions that accept both an owned type or a reference, e.g. T or &T. The borrow() method is then called in the function to obtain &T.
Is there some trait that allows the opposite (i.e. a function that accepts T or &T and obtains T) for Copy types?
E.g. for this example:
use std::borrow::Borrow;
fn foo<T: Borrow<u32>>(value: T) -> u32 {
*value.borrow()
}
fn main() {
println!("{}", foo(&5));
println!("{}", foo(5));
}
This calls borrow() to obtain a reference, which is then immediately dereferenced.
Is there another implementation that just copies the value if T was passed in, and dereferences if &T was given? Or is the above the idiomatic way of writing this sort of thing?
There is not really an inverse trait for Borrow, because it's not really useful as a bound on functions the same way Borrow is. The reason has to do with ownership.
Why is "inverse Borrow" less useful than Borrow?
Functions that need references
Consider a function that only needs to reference its argument:
fn puts(arg: &str) {
println!("{}", arg);
}
Accepting String would be silly here, because puts doesn't need to take ownership of the data, but accepting &str means we might sometimes force the caller to keep the data around longer than necessary:
{
let mut output = create_some_string();
output.push_str(some_other_string);
puts(&output);
// do some other stuff but never use `output` again
} // `output` isn't dropped until here
The problem being that output isn't needed after it's passed to puts, and the caller knows this, but puts requires a reference, so output has to stay alive until the end of the block. Obviously you can always fix this in the caller by adding more blocks and sometimes a let, but puts can also be made generic to let the caller delegate the responsibility of cleaning up output:
fn puts<T: Borrow<str>>(arg: T) {
println!("{}", arg.borrow());
}
Accepting T: Borrow for puts gives the caller the flexibility to decide whether to keep the argument around or to move it into the function.¹
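A minimal sketch of how such a generic function can be called with either an owned or a borrowed string. The function name bracketed and its return value are illustrative choices, made so the result can be inspected instead of printed:

```rust
use std::borrow::Borrow;

// Accepts anything that can be borrowed as `str`: a `&str`, a `String`, ...
fn bracketed<T: Borrow<str>>(arg: T) -> String {
    let s: &str = arg.borrow();
    format!("[{}]", s)
}

fn main() {
    let kept = String::from("kept");
    // Pass a reference: the caller keeps `kept` alive and can reuse it.
    println!("{}", bracketed(kept.as_str()));
    println!("still usable: {}", kept);

    // Pass by value: the String is moved in and dropped when the function returns.
    let handed_over = String::from("handed over");
    println!("{}", bracketed(handed_over));
}
```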
Functions that need owned values
Now consider the case of a function that actually needs to take ownership:
struct Wrapper(String);
fn wrap(arg: String) -> Wrapper {
Wrapper(arg)
}
In this case accepting &str would be silly, because wrap would have to call to_owned() on it. If the caller has a String that it's no longer using, that would needlessly copy the data that could have just been moved into the function. In this case, accepting String is the more flexible option, because it allows the caller to decide whether to make a clone or pass an existing String. Having an "inverse Borrow" trait would not add any flexibility that arg: String does not already provide.
But String isn't always the most ergonomic argument, because there are several different kinds of string: &str, Cow<str>, Box<str>... We can make wrap a little more ergonomic by saying it accepts anything that can be converted into a String.
fn wrap<T: Into<String>>(arg: T) -> Wrapper {
Wrapper(arg.into())
}
This means you can call it like wrap("hello, world") without having to call .to_owned() on the literal. Which is not really a flexibility win -- the caller can always call .into() instead without loss of generality -- but it is an ergonomic win.
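For instance, a short sketch of the Into<String> version being called with a few of the string kinds mentioned above (the main function is only illustrative):

```rust
use std::borrow::Cow;

struct Wrapper(String);

// Accepts anything convertible into a String.
fn wrap<T: Into<String>>(arg: T) -> Wrapper {
    Wrapper(arg.into())
}

fn main() {
    // A literal is copied into a new String by the conversion.
    let from_literal = wrap("hello, world");
    // An existing String is simply moved, with no copy.
    let from_owned = wrap(String::from("already owned"));
    // A Cow<str> is converted, allocating only if it was borrowed.
    let from_cow = wrap(Cow::Borrowed("maybe borrowed"));

    println!("{} / {} / {}", from_literal.0, from_owned.0, from_cow.0);
}
```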
What about Copy types?
Now, you asked about Copy types. For the most part the arguments above still apply. If you're writing a function that, like puts, only needs a &A, using T: Borrow<A> might be more flexible for the caller; for a function like wrap that needs the whole A, it's more flexible to just accept A. But for Copy types the ergonomic advantage of accepting T: Into<A> is much less clear-cut.
For integer types, because generics mess with type inference, using them usually makes it less ergonomic to use literals; you may end up having to explicitly annotate the types.
Since &u32 doesn't implement Into<u32>, that particular trick wouldn't work here anyway.
Since Copy types are readily available as owned values, it's less common to use them by reference in the first place.
Finally, turning a &A into an A when A: Copy is as simple as just adding *; being able to skip that step is probably not a compelling enough win to counterbalance the added complexity of using generics in most cases.
In conclusion, foo should almost certainly just accept value: u32 and let the caller decide how to get that value.
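A sketch of what that looks like at the call site, assuming a trivial body for foo that just returns its argument:

```rust
// Non-generic version: takes the value itself.
fn foo(value: u32) -> u32 {
    value
}

fn main() {
    let n: u32 = 5;
    let r: &u32 = &n;

    // Passing an owned value copies it, because u32 is Copy.
    println!("{}", foo(n));
    // Starting from a reference, the caller dereferences with `*`.
    println!("{}", foo(*r));
    // Integer literals work without any type annotations.
    println!("{}", foo(5));
    // `n` is still usable afterwards because it was copied, not moved.
    println!("{}", n);
}
```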
See also
Is it more conventional to pass-by-value or pass-by-reference when the method needs ownership of the value?
¹ For this particular function you'd probably want AsRef<str>, because you're not relying on the extra guarantees of Borrow, and the fact that all T implements Borrow<T> isn't usually relevant for unsized types such as str. But that is beside the point.
With the function you have you can only use a u32 or a type that can be borrowed as u32.
You can make your function more generic by adding a second generic type parameter.
fn foo<T: Copy, N: Borrow<T>>(value: N) -> T {
*value.borrow()
}
This is however only a partial solution as it will require type annotations in some cases to work correctly.
For example, it works out of the box with usize:
let v = 0usize;
println!("{}", foo(v));
There is no problem here for the compiler to guess that foo(v) is a usize.
However, if you try foo(&v), the compiler will complain that it cannot find the right output type T because &T could implement several Borrow traits for different types. You need to explicitly specify which one you want to use as output.
let output: usize = foo(&v);
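Put together as a complete program, the approach might look like the sketch below. Both calls are annotated here to stay on the safe side, and the turbofish form is shown as an alternative way to name the output type:

```rust
use std::borrow::Borrow;

// `N` is the argument type; `T` is the Copy type it can be borrowed as.
fn foo<T: Copy, N: Borrow<T>>(value: N) -> T {
    *value.borrow()
}

fn main() {
    let v = 0usize;

    // Passing by value, with the output type annotated.
    let by_value: usize = foo(v);
    // Passing a reference: the annotation tells the compiler which `T` is meant.
    let by_ref: usize = foo(&v);
    // The same thing spelled with a turbofish instead of a let annotation.
    let by_turbofish = foo::<usize, _>(&v);

    println!("{} {} {}", by_value, by_ref, by_turbofish);
}
```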

Why is the "move" keyword necessary when it comes to threads; why would I ever not want that behavior?

For example (taken from the Rust docs):
let v = vec![1, 2, 3];
let handle = thread::spawn(move || {
println!("Here's a vector: {:?}", v);
});
This is not a question about what move does, but about why it is necessary to specify.
In cases where you want the closure to take ownership of an outside value, would there ever be a reason not to use the move keyword? If move is always required in these cases, is there any reason why the presence of move couldn't just be implied/omitted? For example:
let v = vec![1, 2, 3];
let handle = thread::spawn(/* move is implied here */ || {
// Compiler recognizes that `v` exists outside of this closure's
// scope and does black magic to make sure the closure takes
// ownership of `v`.
println!("Here's a vector: {:?}", v);
});
The above example gives the following compile error:
closure may outlive the current function, but it borrows `v`, which is owned by the current function
When the error magically goes away simply by adding move, I can't help but wonder to myself: why would I ever not want that behavior?
I'm not suggesting anything is wrong with the required syntax. I'm just trying to gain a deeper understanding of move from people who understand Rust better than I do. :)
It's all about lifetime annotations, and a design decision Rust made long ago.
See, the reason why your thread::spawn example fails to compile is because it expects a 'static closure. Since the new thread can run longer than the code that spawned it, we have to make sure that any captured data stays alive after the caller returns. The solution, as you pointed out, is to pass ownership of the data with move.
But the 'static constraint is a lifetime annotation, and a fundamental principle of Rust is that lifetime annotations never affect run-time behavior. In other words, lifetime annotations are only there to convince the compiler that the code is correct; they can't change what the code does.
If Rust inferred the move keyword based on whether the callee expects 'static, then changing the lifetimes in thread::spawn may change when the captured data is dropped. This means that a lifetime annotation is affecting runtime behavior, which is against this fundamental principle. We can't break this rule, so the move keyword stays.
Addendum: Why are lifetime annotations erased?
To give us the freedom to change how lifetime inference works, which allows for improvements like non-lexical lifetimes (NLL).
So that alternative Rust implementations like mrustc can save effort by ignoring lifetimes.
Much of the compiler assumes that lifetimes work this way, so to make it otherwise would take a huge effort with dubious gain. (See this article by Aaron Turon; it's about specialization, not closures, but its points apply just as well.)
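As one illustration of a case where you would not want move, the sketch below uses std::thread::scope (available since Rust 1.63), whose spawn does not require a 'static closure. The function name sum_in_scoped_thread is made up for this example. The closure borrows v, and the spawning code keeps ownership:

```rust
use std::thread;

// Sums a vector on another thread while the caller keeps ownership of it.
fn sum_in_scoped_thread() -> (i32, usize) {
    let v = vec![1, 2, 3];

    // Scoped threads are joined before `scope` returns, so the closure
    // may borrow `v` and no `move` is needed.
    let total = thread::scope(|s| {
        s.spawn(|| v.iter().sum::<i32>()).join().unwrap()
    });

    // `v` was only borrowed, so it is still available here.
    (total, v.len())
}

fn main() {
    let (total, len) = sum_in_scoped_thread();
    println!("sum = {}, len = {}", total, len);
}
```

If move were inferred from the callee's bounds, the same closure would capture v differently depending on which spawn function it was handed to.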
There are actually a few things in play here. To help answer your question, we must first understand why move exists.
Rust has 3 types of closures:
FnOnce, a closure that consumes its captured variables (and hence can only be called once),
FnMut, a closure that mutably borrows its captured variables, and
Fn, a closure that immutably borrows its captured variables.
When you create a closure, Rust infers which trait to use based on how the closure uses the values from the environment. The manner in which a closure captures its environment depends on its type. An FnOnce captures by value (which may be a move, or a copy if the type is Copy), an FnMut mutably borrows, and an Fn immutably borrows. However, if you use the move keyword when declaring a closure, it will always "capture by value", or take ownership of the environment before capturing it. Thus, the move keyword is irrelevant for FnOnces, but it changes how Fns and FnMuts capture data.
Coming to your example, Rust infers the type of the closure to be an Fn, because println! only requires a reference to the value(s) it is printing (the Rust book page you linked talks about this when explaining the error without move). The closure thus attempts to borrow v, and the standard lifetime rules apply. Since thread::spawn requires that the closure passed to it be 'static, everything the closure captures must also be 'static. A borrow of the local variable v is not, which causes the error. You must thus explicitly specify that you want the closure to take ownership of v.
This can be further exemplified by changing the closure to something that the compiler would infer to be a FnOnce -- || v, as a simple example. Since the compiler infers that the closure is a FnOnce, it captures v by value by default, and the line let handle = thread::spawn(|| v); compiles without requiring the move.
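A runnable version of that last case, wrapped in a hypothetical helper function so the result can be checked:

```rust
use std::thread;

// The closure body uses `v` by value, so `v` is captured by value
// even though `move` is not written.
fn spawn_without_move() -> Vec<i32> {
    let v = vec![1, 2, 3];
    let handle = thread::spawn(|| v);
    // The thread hands the vector back as its return value.
    handle.join().unwrap()
}

fn main() {
    println!("Here's a vector: {:?}", spawn_without_move());
}
```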
The existing answers have great information, which led me to an understanding that is easier for me to think about, and hopefully easier for other Rust newcomers to get.
Consider this simple Rust program:
fn print_vec (v: &Vec<u32>) {
println!("Here's a vector: {:?}", v);
}
fn main() {
let mut v: Vec<u32> = vec![1, 2, 3];
print_vec(&v); // `print_vec()` borrows `v`
v.push(4);
}
Now, asking why the move keyword can't be implied is like asking why the "&" in print_vec(&v) can't also be implied.
Rust’s central feature is ownership. You can't just tell the compiler, "Hey, here's a bunch of code I wrote, now please discern perfectly everywhere I intend to reference, borrow, copy, move, etc. Kthnxsbye!" Symbols and keywords like & and move are a necessary and integral part of the language.
In hindsight, this seems really obvious, and makes my question seem a little silly!

Does the third rule of lifetime elision capture all cases for struct implementations?

The third rule of lifetime elision says
If there are multiple input lifetime parameters, but one of them is &self or &mut self because this is a method, then the lifetime of self is assigned to all output lifetime parameters. This makes writing methods much nicer.
Here is the tutorial describing what happened for this function
fn announce_and_return_part(&self, announcement: &str) -> &str
There are two input lifetimes, so Rust applies the first lifetime elision rule and gives both &self and announcement their own lifetimes. Then, because one of the parameters is &self, the return type gets the lifetime of &self, and all lifetimes have been accounted for.
We can show that all the lifetimes are not accounted for since it is possible that announcement will have a different lifetime than &self:
struct ImportantExcerpt<'a> {
part: &'a str,
}
impl<'a> ImportantExcerpt<'a> {
fn announce_and_return_part(&self, announcement: &str) -> &str {
println!("Attention please: {}", announcement);
announcement
}
}
fn main() {
let i = ImportantExcerpt { part: "IAOJSDI" };
let test_string_lifetime;
{
let a = String::from("xyz");
test_string_lifetime = i.announce_and_return_part(a.as_str());
}
println!("{:?}", test_string_lifetime);
}
The lifetime of announcement is not as long as &self, so it is not correct to associate the output lifetime to &self, shouldn't the output lifetime be associated to the longer of the input?
Why is the third rule of lifetime elision a valid way to assign output lifetime?
No, the elision rules do not capture every possible case for lifetimes. If they did, there wouldn't be any elision rules at all: they would be the only rules, and we wouldn't need any syntax to specify explicit lifetimes.
Quoting from the documentation you linked to, emphasis mine:
The patterns programmed into Rust's analysis of references are called
the lifetime elision rules. These aren't rules for programmers to
follow; the rules are a set of particular cases that the compiler will
consider, and if your code fits these cases, you don't need to write
the lifetimes explicitly.
The elision rules don't provide full inference: if Rust
deterministically applies the rules but there's still ambiguity as to
what lifetimes the references have, it won't guess what the lifetime
of the remaining references should be. In this case, the compiler will
give you an error that can be resolved by adding the lifetime
annotations that correspond to your intentions for how the references
relate to each other.
The lifetime of announcement is not as long as &self, so it is not correct to associate the output lifetime to &self
Why is the third rule of lifetime elision a valid way to assign output lifetime?
"Correct" is probably not the right word to use here. What the elision rules have done is valid; it just doesn't happen to be what you wanted.
shouldn't the output lifetime be associated to the longer of the input?
Yes, that would be acceptable for this example. It's just not the most common case, so it's not what the elision rules were designed for.
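When the elided signature is not what you intend, you can state the relationship explicitly. The sketch below is one possible way to do that: the method name announce_and_return_announcement and the lifetime name 'b are made up for illustration. It ties the return value to announcement, while a second method keeps the elided form and returns self.part:

```rust
struct ImportantExcerpt<'a> {
    part: &'a str,
}

impl<'a> ImportantExcerpt<'a> {
    // Elided form: the output lifetime is that of `&self`, so returning
    // something borrowed from `self` is accepted.
    fn announce_and_return_part(&self, announcement: &str) -> &str {
        println!("Attention please: {}", announcement);
        self.part
    }

    // Explicit form: the output is tied to `announcement`, not to `&self`.
    fn announce_and_return_announcement<'b>(&self, announcement: &'b str) -> &'b str {
        println!("Attention please: {}", announcement);
        announcement
    }
}

fn main() {
    let a = String::from("xyz");
    let echoed;
    {
        let i = ImportantExcerpt { part: "IAOJSDI" };
        println!("{}", i.announce_and_return_part("first"));
        echoed = i.announce_and_return_announcement(a.as_str());
    } // `i` is gone, but `echoed` borrows from `a`, which is still alive.
    println!("{:?}", echoed);
}
```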
See also:
Why are explicit lifetimes needed in Rust?
When do I need to specify explicit lifetimes in Rust?
When is it useful to define multiple lifetimes in a struct?
Why would you ever use the same lifetimes for references in a struct?

What are the identifiers denoted with a single apostrophe (')?

I've encountered a number of types in Rust denoted with a single apostrophe:
'static
'r
'a
What is the significance of that apostrophe (')? Maybe it's a modifier of references (&)? Generic typing specific to references? I've no idea where the documentation for this is hiding.
These are Rust's named lifetimes.
Quoting from The Rust Programming Language:
Every reference in Rust has a lifetime, which is the scope for which that reference is valid. Most of the time lifetimes are implicit and inferred, just like most of the time types are inferred. Similarly to when we have to annotate types because multiple types are possible, there are cases where the lifetimes of references could be related in a few different ways, so Rust needs us to annotate the relationships using generic lifetime parameters so that it can make sure the actual references used at runtime will definitely be valid.
Lifetime annotations don’t change how long any of the references
involved live. In the same way that functions can accept any type when
the signature specifies a generic type parameter, functions can accept
references with any lifetime when the signature specifies a generic
lifetime parameter. What lifetime annotations do is relate the
lifetimes of multiple references to each other.
Lifetime annotations have a slightly unusual syntax: the names of
lifetime parameters must start with an apostrophe '. The names of
lifetime parameters are usually all lowercase, and like generic types,
their names are usually very short. 'a is the name most people use as
a default. Lifetime parameter annotations go after the & of a
reference, and a space separates the lifetime annotation from the
reference’s type.
Said another way, a lifetime approximates the span of execution during which the data a reference points to is valid. The Rust compiler will conservatively infer the shortest lifetime possible to be safe. If you want to tell the compiler that a reference lives longer than the shortest estimate, you can name it, saying that the output reference, for example, has the same lifetime as a given input reference.
The 'static lifetime is special: it is the longest of all lifetimes and lasts for the duration of the program. Typical examples are string literals, which are available for the entire lifetime of the program.
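A small sketch of 'static in practice (the names GREETING and pick are hypothetical). Both a string literal and a static item can be returned as &'static str, and the result may outlive any local scope:

```rust
// A static item lives for the whole program.
static GREETING: &str = "hello";

// Both branches produce data that is valid for the entire program run.
fn pick(use_greeting: bool) -> &'static str {
    if use_greeting {
        GREETING
    } else {
        "goodbye" // string literals are `&'static str`
    }
}

fn main() {
    let kept: &'static str;
    {
        // Nothing created in this block is borrowed by the result.
        kept = pick(false);
    }
    println!("{} / {}", pick(true), kept);
}
```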
You can get more information from this slide deck, starting around slide 29.
Lifetimes in Rust also discusses lifetimes in some depth.
To add to quux00's excellent answer, named lifetimes are also used to tell the Rust compiler where a returned borrowed value comes from.
This function
pub fn f(a: &str, b: &str) -> &str {
b
}
won't compile because it returns a borrowed value but does not specify whether it borrowed it from a or b.
To fix that, you'd declare a named lifetime and use the same lifetime for b and the return type:
pub fn f<'r>(a: &str, b: &'r str) -> &'r str {
// ---- --- ---
b
}
and use it as expected
f("a", "b")
