When is a static lifetime not appropriate? - rust

I have found a lot of information across the web about Rust lifetimes, including information about static lifetimes. It makes sense to me that, in certain situations, you must guarantee that a reference will outlive everything.
For instance, I have a reference that I’m passing to a thread, and the compiler is requesting that the reference be marked as static. In this scenario, that seems to make sense because the compiler can’t know how long the thread will live and thus needs to ensure the passed reference outlives the thread. (I think that’s correct?)
I don’t know where this comes from, but I am always concerned that marking something with a static lifetime is something to be skeptical of, and avoided when possible.
So I wonder if that’s correct. Should I be critical of marking things with a static lifetime? Are there situations when the compiler will want to require one, but an alternate strategy might actually be more optimal?
What are some concrete ways that I can reason about the application of a static lifetime, and possibly determine when it might not be appropriate?

As you might have already guessed, there is no definitive, technical answer to this.
To a newcomer to Rust, 'static references seem to defeat the entire purpose of the borrowing system, so there is an instinct to avoid them. With more experience, that instinct goes away.
First of all, 'static is not as bad as it seems, since every type that has no other lifetimes associated with it is 'static, e.g. the String returned by String::new(). Notice that 'static does not mean that the value in question truly lives forever. It just means that the value can be made to live forever. In your threading example, the thread can't make any promises about its own lifetime, so it needs to be able to make everything passed to it live forever. Any owned value which does not contain lifetimes shorter than 'static (like vec![1,2,3]) can be made to live forever (simply by not destroying it) and is therefore 'static.
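As a minimal sketch of the point above (the helper names assert_static and sum_in_thread are made up for illustration), an owned Vec<i32> satisfies a 'static bound and can be moved into a thread without any &'static reference being involved:

```rust
use std::thread;

// Hypothetical helper: accepts a reference to any value whose type
// contains no borrows shorter than 'static.
fn assert_static<T: 'static>(_value: &T) {}

// Moves an owned Vec into a thread. thread::spawn requires its closure to
// be 'static, which a closure owning a Vec<i32> is.
fn sum_in_thread(numbers: Vec<i32>) -> i32 {
    assert_static(&numbers);
    let handle = thread::spawn(move || numbers.iter().sum::<i32>());
    handle.join().expect("worker thread panicked")
}
```

The Vec is destroyed as soon as the thread finishes with it, which illustrates that satisfying 'static says nothing about how long the value actually lives.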
Second, &'static - the static reference - does not come up often anyway. If it does, you'll usually be aware of why. You won't see a lot of fn foo(bar: &'static Bar) because there simply aren't that many use-cases for it, not because it is actively avoided.
There are situations where 'static does come up in surprising ways. Off the top of my head:
A Box<dyn Trait> is implicitly a Box<dyn Trait + 'static>. This is because when the type of the value inside the Box gets erased, it might have had lifetimes associated with it, and all (different) types must be valid for as long as the Box lives. Therefore all types need to share a common denominator with respect to their lifetimes, and Rust is defined to choose 'static. This choice is usually OK, but can lead to surprising "requires 'static" errors. You can generalize this explicitly to Box<dyn Trait + 'a>.
If you have a custom impl Drop on your type, the drop checker may not be able to prove that the destructor cannot observe values that have already been dropped. To prevent the Drop impl from accessing references to such values, the compiler requires everything the type borrows to strictly outlive the value itself, which 'static data trivially does; the resulting errors can look like an unexpected lifetime requirement. This can be relaxed with an unsafe impl that promises the destructor does not access the borrowed data.
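The Box<dyn Trait + 'a> generalization mentioned above might look like the following sketch. The trait Greet, the struct Borrowed and both functions are hypothetical names chosen for the example:

```rust
trait Greet {
    fn greet(&self) -> String;
}

// A type that borrows its data, so it is not 'static in general.
struct Borrowed<'a> {
    name: &'a str,
}

impl<'a> Greet for Borrowed<'a> {
    fn greet(&self) -> String {
        format!("hello, {}", self.name)
    }
}

// Writing the return type as plain Box<dyn Greet> would mean
// Box<dyn Greet + 'static> and be rejected here, because Borrowed<'a>
// may hold a reference that is shorter than 'static.
fn boxed<'a>(name: &'a str) -> Box<dyn Greet + 'a> {
    Box::new(Borrowed { name })
}

fn greet_local() -> String {
    let owned = String::from("world");
    let greeter = boxed(&owned);
    let message = greeter.greet();
    message
}
```

The explicit + 'a ties the trait object to the borrow it contains, so the box can hold a short-lived value as long as the box itself does not outlive that borrow.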

Instead of &'static T, pass Arc<T> to the thread. This has only a tiny cost and ensures lifetimes will not be longer than necessary.
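A small sketch of the Arc<T> approach, assuming a made-up Config type and function name: each thread gets its own cloned handle, and the data is freed once the last handle is dropped.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical shared data; any Send + Sync type would do.
struct Config {
    name: String,
}

fn name_len_in_thread(config: Arc<Config>) -> usize {
    // Cloning the Arc only bumps a reference count; the Config is shared.
    let worker_config = Arc::clone(&config);
    let handle = thread::spawn(move || worker_config.name.len());
    let len = handle.join().expect("worker thread panicked");
    // The caller's handle is still usable after the thread has finished.
    debug_assert_eq!(len, config.name.len());
    len
}
```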

Related

Using same reference for multiple method parameters

I'll preface by saying I'm very new to Rust, and I'm still wrapping my head around the semantics of the borrow-checker. I have some understanding of why it doesn't like my code, but I'm not sure how to resolve it in an idiomatic way.
I have a method in Rust which accepts 3 parameters with a signature that looks something like this:
fn do_something(&mut self, mem: &mut impl TraitA, bus: &mut impl TraitB, int_lines: &impl TraitC) -> ()
I also have a struct which implements all three of these traits; however, the borrow-checker is complaining when I attempt to use the same reference for multiple parameters:
cannot borrow `*self` as mutable more than once at a time
And also:
cannot borrow `*self` as immutable because it is also borrowed as mutable
My first question is whether this is a shortcoming of the borrow-checker (being unable to recognize that the same reference is being passed), or by design (I suspect this is the case, since from the perspective of the called method each reference is distinct and thus the ownership of each can be regarded separately).
My second question is what the idiomatic approach would be. The two solutions I see are:
a) Combining all three traits into one. While this is technically trivial given my library's design, it would make the code decidedly less clean since the three traits are used to interface with unrelated parts of the struct's state. Furthermore, since this is a library (the do_something method is part of a test), it hinders the possibility of separating the state out into separate structs.
b) Moving each respective part of the struct's state into separate structs, which are then owned by the main struct. This seems like the better option to me, especially since it does not require any changes to the library code itself.
Please let me know if I'm missing another solution, or if there's a way to convince the borrow-checker to accept my original design.
The borrow checker is operating as designed. It only knows you are passing several references, two of them mutable, into the same function: it cannot see what the function will do with them, even if they all happen to point at the same struct. Within the function they are distinct references to the same struct, and a mutable reference is not allowed to alias any other reference.
If the three different traits represent three different functional aspects, then your best approach might be to split the struct into different structs, each implementing one of the traits, as you have proposed.
If you would prefer to keep a single struct, and if the function will always be called with a single struct, then you can just pass it in once like this:
fn do_something(&mut self, proc: &mut (impl TraitA + TraitB + TraitC)) -> () { ... }
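To make the single-parameter version concrete, here is a sketch under invented assumptions: the trait methods, the Machine struct and the Tester type are all hypothetical stand-ins for the asker's library types.

```rust
trait TraitA {
    fn write_mem(&mut self, value: u8);
}
trait TraitB {
    fn write_bus(&mut self, value: u8);
}
trait TraitC {
    fn irq_pending(&self) -> bool;
}

// One struct implementing all three traits.
#[derive(Default)]
struct Machine {
    mem: Vec<u8>,
    bus: Vec<u8>,
    irq: bool,
}

impl TraitA for Machine {
    fn write_mem(&mut self, value: u8) {
        self.mem.push(value);
    }
}
impl TraitB for Machine {
    fn write_bus(&mut self, value: u8) {
        self.bus.push(value);
    }
}
impl TraitC for Machine {
    fn irq_pending(&self) -> bool {
        self.irq
    }
}

struct Tester {
    steps: usize,
}

impl Tester {
    // A single mutable borrow gives access to all three interfaces.
    fn do_something(&mut self, machine: &mut (impl TraitA + TraitB + TraitC)) {
        self.steps += 1;
        machine.write_mem(1);
        machine.write_bus(2);
        if machine.irq_pending() {
            self.steps += 1;
        }
    }
}

fn run_once() -> (usize, usize, usize) {
    let mut machine = Machine::default();
    let mut tester = Tester { steps: 0 };
    tester.do_something(&mut machine);
    (tester.steps, machine.mem.len(), machine.bus.len())
}
```

Because only one &mut borrow of the struct exists at the call site, the aliasing errors from the question do not arise.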

Why is transmuting &T to &mut T Undefined Behaviour?

I want to reinterpret an immutable reference to a mutable reference (in an unsafe block) and be responsible for the safety checks on my own, yet it appears I cannot use mem::transmute() to do so.
let map_of_vecs: HashMap<usize, Vec<_>> = ...;
let vec = &map_of_vecs[&2]; // indexing a HashMap takes the key by reference
// obtain a mutable reference to vec here
I do not want to wrap the Vecs into Cells because that would affect all other areas of code that use map_of_vecs and I only need mutability in one line.
I do not have mutable access to map_of_vecs
The Rust optimiser makes the assumption that &mut T references are unique. For example, it might deduce that a particular piece of memory can be reused because a mutable reference to that memory exists but is never accessed again.
However, if you transmute a &T to a &mut T then you are able to create multiple mutable references to the same data. If the compiler makes this assumption, you could end up dereferencing a value that has been overwritten with something else.
This is just one example of how the compiler might make use of the assumption that mutable references are unique. In fact, the compiler is free to use this information in any way it sees fit — which could (and likely will) change from version to version.
Even if you think you have guaranteed that the reference isn't aliased, you can't always guarantee that users of your code won't create more references. Even if you think you can be sure of that, the existence of references is extremely subtle and it's very easy to miss one. For example when you call a method that takes &self, that's a reference.
The Rust compiler annotates &T function parameters with the LLVM noalias and readonly attributes (provided that T does not contain any UnsafeCell parts). The noalias attribute tells LLVM that the memory behind this pointer may only be written to through this pointer (and not through any other pointers), and the readonly attribute tells LLVM that it can't be written to through this pointer (but possibly other pointers). In combination, the two attributes allow the LLVM optimiser to assume the memory is not changed at all during the execution of this function, and the code can be optimised based on this assumption. The optimiser may reorder instructions or remove code in a way that is only safe to do if you actually stick to this contract.
Another way the conversion can lead to undefined behaviour is for statics: immutable statics without UnsafeCells will be placed into read-only memory, so if you actually write to them, your code will segfault.
For parameters with UnsafeCells the compiler does not emit the readonly attribute, and statics containing an UnsafeCell are placed into writable memory.
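For contrast, the sound route the answers allude to is interior mutability, which is built on UnsafeCell. The sketch below uses RefCell and assumes, unlike the asker's stated constraint, that changing the map's value type is acceptable; the function names are hypothetical.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Mutates a Vec through a shared reference to the map. RefCell checks at
// run time that no other borrow of the same Vec is active.
fn push_through_shared(
    map: &HashMap<usize, RefCell<Vec<i32>>>,
    key: usize,
    value: i32,
) -> bool {
    match map.get(&key) {
        Some(cell) => {
            cell.borrow_mut().push(value);
            true
        }
        None => false,
    }
}

fn push_demo() -> Vec<i32> {
    let mut map = HashMap::new();
    map.insert(2, RefCell::new(vec![1]));
    push_through_shared(&map, 2, 7);
    let snapshot = map[&2].borrow().clone();
    snapshot
}
```

Since the compiler sees the UnsafeCell inside RefCell, it does not assume the memory behind the shared reference is immutable, which is exactly the assumption a transmute would violate.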

Is there a shared pointer with a single strong owner and multiple weak references?

I am looking for a smart pointer similar to Arc/Rc except that it does not allow shared ownership.
I want to have as many rc::Weak references as I need, but I only want one strong reference, a.k.a owner. And I want to enforce that with the type system.
Arc/Rc can be cloned, and they can be owned at several places.
Rolling up my own smart pointer would be an option, but I believe such data structure should already exist, even if outside the standard library.
I am looking for a data structure providing this kind of interface:
impl<T> MySmartPointer<T> {
    fn new(object: T) -> Self;
    fn weak_ref(&self) -> WeakRef<T>;
    fn get_mut(&mut self) -> &mut T;
}
impl<T> WeakRef<T> {
    /// If the strong pointer `MySmartPointer` has been dropped,
    /// return `None`. Else return Some(&T);
    fn get(&self) -> Option<&T>;
}
Let's assume it exists with types Strong<T> and Weak<T>. How do you use Weak<T>? You need some kind of fallible "upgrade" step, so what does Weak<T> upgrade to? It can't be to a plain reference (as you've stated), because Strong<T> needs to know whether or not any "upgraded" Weak<T>s exist. If it didn't, it could deallocate its storage whilst the value is still being accessed.
So Weak<T> must upgrade to some kind of SemiWeak<T> which keeps the underlying allocation alive... which is exactly what shared ownership is.
What if you somehow guaranteed that Strong<T> couldn't be deallocated before all Weak<T>s go away? Congratulations, you've just re-invented T and &T: you could literally just use those instead.
Alright, so what if you made it so that Weak<T> upgrades into a SemiWeak<'a, T> that is tied to the lifetime of the Weak<T> so that it can't outlive it, and can only be a temporary? All you're really doing in that case is hiding the fact that you've got shared ownership. Under the hood, SemiWeak would still need to guarantee the underlying Strong can't go away. You could trivially build such a type from Rc<T> in perhaps ten minutes. This would effectively give you a type that is exactly like Rc<T>, with the same performance and memory cost, but less useful.
In addition, that get_mut method can't exist. There's no way to prevent SemiWeak<T>s from existing. Unless you use borrowing but, again, that's just using T and &T.
So, no, I don't think this exists, nor do I believe it can in the form you've described.
As a final aside, just having Weak<T> at all is a form of shared ownership, because those Weak<T>s need to point to something. In the case of Rc<T>, the weak counter is stored right alongside the strong counter, so whilst the value can be destroyed, the allocation itself sticks around. You could split the two, but now you're paying for two allocations and double indirection (probably leading to more cache misses).
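The upgrade step discussed above can be observed with the standard Rc and Weak types. This is only an illustrative sketch (the function name is made up); it shows that a successful upgrade is itself a second strong owner for as long as it exists:

```rust
use std::rc::{Rc, Weak};

fn weak_lifecycle() -> (Option<i32>, usize, Option<i32>) {
    let strong: Rc<i32> = Rc::new(42);
    let weak: Weak<i32> = Rc::downgrade(&strong);

    // upgrade() yields an Option<Rc<i32>>: a temporary strong owner.
    let upgraded = weak.upgrade();
    let strong_while_upgraded = Rc::strong_count(&strong);
    let seen = upgraded.as_deref().copied();
    drop(upgraded);

    // Once the last strong owner is gone, upgrading fails.
    drop(strong);
    let after_drop = weak.upgrade().map(|rc| *rc);

    (seen, strong_while_upgraded, after_drop)
}
```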

Do Rust lifetimes influence the semantics of the compiled program?

I'm trying to grok lifetimes in Rust and asked myself whether they are "just" a safety measure (and a way to communicate how safety is ensured, or not, in the case of errors) or if there are cases where different choices of lifetimes actually change how the program runs, i.e. whether lifetimes make a semantic difference to the compiled program.
And with "lifetimes" I refer to all the pesky little 'a, 'b, 'static markers we include to make the borrow checker happy. Of course, writing
{
let foo = File::open("foo.txt")?;
}
foo.write_all(b"bar");
instead of
let foo = File::open("foo.txt")?;
foo.write_all(b"bar");
will close the file descriptor before the write occurs, even if we could access foo afterwards, but that kind of scoping and destructor-calling also happens in C++.
No, lifetimes do not affect the generated machine code in any way. At the end of the day, it's all "just pointers" to the compiled code.
Because we are humans speaking a human language, we tend to lump two different but related concepts together: concrete lifetimes and generic lifetime parameters.
All programming languages have concrete lifetimes. That just corresponds to when a resource will be released. That's what your example shows and indeed, C++ works the same as Rust does there. This is often known as Resource Acquisition Is Initialization (RAII). Garbage-collected languages have lifetimes too, but they can be harder to nail down exactly when they end.
What makes Rust neat in this area are the generic lifetime parameters, the things we know as 'a or 'static. These allow the compiler to track the underlying pointers so that the programmer doesn't need to worry if the pointer will remain valid long enough. This works for storing references in structs and passing them to and from functions.
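As a short sketch of a generic lifetime parameter on a struct (Excerpt and both functions are hypothetical names), the annotation lets the compiler check that the stored reference and anything derived from it stay within the borrow of the original string:

```rust
// A struct that stores a reference must name the lifetime of that reference.
struct Excerpt<'a> {
    text: &'a str,
}

impl<'a> Excerpt<'a> {
    // The result borrows from the original string, not from the Excerpt,
    // so it may outlive the Excerpt value itself.
    fn first_word(&self) -> &'a str {
        self.text.split_whitespace().next().unwrap_or("")
    }
}

fn first_word_of(source: &str) -> &str {
    let excerpt = Excerpt { text: source };
    excerpt.first_word()
}
```

None of these annotations change what the compiled code does; they only describe relationships the compiler verifies.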

Do Rust lifetimes only refer to references?

I'm trying to wrap my head around Rust lifetimes (as the official guides don't really explain them that well).
Do Rust lifetimes only refer to references, or can they refer to base/primitive values as well?
Lifetimes are the link between values and references to said values.
In order to understand this link, I will use a broken parallel: houses and addresses.
A house is a physical entity. It is built on a piece of land at some time, will live for a few dozen or hundred years, may be renovated multiple times during this time, and will most likely be destroyed at some point.
An address is a logical entity, it may point to a house, or to other physical entities (a field, a school, a train station, a company's HQ, ...).
The lifetime of a house is relatively clear: it represents the duration during which a house is usable, from the moment it is built to the moment it is destroyed. The house may undergo several renovations during this time, and what used to be a simple cabana may end up being a full-fledged manor, but that is of no concern to us; for our purpose the house is living throughout those transformations. Only its creation and ultimate destruction matter... even though it might be better if no one happens to be in the bedroom when we tear the roof down.
Now, imagine that you are a real estate agent. You do not keep the houses you sell in your office, it's impractical; you do, however, keep their addresses!
Without the notion of lifetime, from time to time your customers will complain because the address you sent them to... was the address of a garbage dump, and not at all that lovely two-story house you had the photograph of. You might also get a couple of inquiries from the police station asking why people holding a booklet from your office were found in a just-destroyed house; the ensuing lawsuit might shut down your business.
This is obviously a risk to your business, and therefore you should seek a better solution. What if each address could be tagged with the lifetime of the house it refers to, so that you know not to send people to their death (or disappointment)?
You may have recognized the C manual memory management strategy in that garbage dump; in C it's up to you, the real estate agent developer, to make sure that your addresses (pointers/references) always refer to living houses.
In Rust, however, the references are tagged with a special marker: 'enough; it represents a lower bound on the lifetime of the value referred to.
When the compiler checks whether your usage of the reference is safe or not, it asks the question:
Is the value still alive?
It does not matter whether the value will be there for 100 years afterward, as long as it lives long 'enough for the use you have of it.
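In code, "long 'enough" might be sketched like this, with house as the value and address as the reference (both names, and the function name, are hypothetical):

```rust
fn visit_then_demolish() -> usize {
    let house = String::from("manor");
    let address: &String = &house;

    // Last use of `address`: the house only has to be alive up to here.
    let rooms = address.len();

    // Allowed, because `address` is never used after this point.
    drop(house);

    // Uncommenting the next line is rejected by the compiler, since it
    // would use `address` after `house` has been moved into drop().
    // let again = address.len();

    rooms
}
```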
No, they refer to values as well. If it is not clear from the context how long they will live, they have to be annotated as well. It is then called a lifetime bound.
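In the following example, the bound states that the value the reference points to lives at least as long as the reference itself:
// std::num::Primitive was removed before Rust 1.0; a local marker trait stands in for it here.
trait Primitive {}
struct Foo<'a, T: Primitive + 'a> {
    a: &'a T,
}
When this was written, deleting the + 'a made the compiler complain, since T could be anything implementing Primitive, including a type that itself holds shorter-lived references. Current compilers infer this bound from the &'a T field, so the explicit annotation is now optional.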
Yes, they only refer to references; however, those references can refer to primitive types. Rust is not like Java (and similar languages), which make a distinction between primitive types, which are passed by value, and more complex types (Objects in Java) that are passed by reference. Complex types can be allocated on the stack and passed by value, and references can be taken to primitive types.
For example, here is a function that takes two references to i32's, and returns a reference to the larger one:
fn bigger<'a>(a: &'a i32, b: &'a i32) -> &'a i32 {
    if a > b { a } else { b }
}
It uses the lifetime 'a to communicate that the lifetime of the returned reference is the same as that of the references passed in.
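A possible call site for bigger, repeated here so the sketch is self-contained (bigger_of is a hypothetical wrapper): both arguments are references to plain i32 values on the stack.

```rust
fn bigger<'a>(a: &'a i32, b: &'a i32) -> &'a i32 {
    if a > b { a } else { b }
}

fn bigger_of(x: i32, y: i32) -> i32 {
    // 'a is inferred as a region in which both `x` and `y` are alive.
    let result: &i32 = bigger(&x, &y);
    *result
}
```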
When you see a lifetime annotation (e.g. 'a) in the code, there's almost always a reference, or borrowed pointer, involved.
The full syntax for borrowed pointers is &'a T. 'a is the lifetime of the referent. T is the type of the referent.
Structs and enums can have lifetime parameters. This is usually a consequence of the struct or enum containing a borrowed pointer. When you store a borrowed pointer in a struct or enum, you must explicitly state the referent's lifetime. For example, the Cow enum in the standard library contains a borrowed pointer in one of its variants. Therefore, it has a lifetime parameter that is used in the borrowed pointer's type to define the referent's lifetime.
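The Cow case can be sketched as follows; strip_spaces is a made-up function, while Cow::Borrowed and Cow::Owned are the standard library's variants. The lifetime parameter on Cow<'a, str> is the lifetime of the borrowed variant's referent:

```rust
use std::borrow::Cow;

// Returns the input unchanged (borrowed) when there is nothing to strip,
// and allocates a new String (owned) only when spaces must be removed.
fn strip_spaces<'a>(input: &'a str) -> Cow<'a, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', ""))
    } else {
        Cow::Borrowed(input)
    }
}
```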
Traits can have type bounds and also a lifetime bound. The lifetime bound indicates the largest region in which all the borrowed pointers in a concrete implementation of that trait are valid (i.e. their referents are alive). If the implementation contains no borrowed pointers, then the lifetime is inferred as 'static. Lifetime bounds can appear in type parameter definitions, in where clauses and on trait objects.
Sometimes, you might want to define a struct or enum with a lifetime parameter, but without a corresponding value to borrow from. You can use a marker type, such as ContravariantLifetime<'a>, to ensure the lifetime parameter has the proper variance (ContravariantLifetime corresponds to the variance of borrowed pointers; without a marker, the lifetime would be bivariant, which means the lifetime could be substituted with any other lifetime... not very useful!). In current Rust these per-variance marker types have been replaced by std::marker::PhantomData (for example PhantomData<&'a T>), and an unused lifetime parameter is a compile error. See an example of this use case here.
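With today's standard library, a lifetime parameter that has no real reference behind it is usually expressed with std::marker::PhantomData. The sketch below assumes a hypothetical Handle type that stores only an index but should not outlive the buffer it was created for:

```rust
use std::marker::PhantomData;

// Stores no reference, yet is tied to the lifetime 'a of some buffer.
// PhantomData<&'a [u8]> makes Handle behave, for lifetime purposes,
// as if it held a &'a [u8].
struct Handle<'a> {
    index: usize,
    _owner: PhantomData<&'a [u8]>,
}

fn handle_for<'a>(buffer: &'a [u8], index: usize) -> Option<Handle<'a>> {
    if index < buffer.len() {
        Some(Handle { index, _owner: PhantomData })
    } else {
        None
    }
}

fn read<'a>(buffer: &'a [u8], handle: &Handle<'a>) -> u8 {
    buffer[handle.index]
}
```

PhantomData is zero-sized, so the marker costs nothing at run time; it exists only so the compiler can apply the usual borrow rules to Handle.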

Resources