Why is an explicit lifetime needed in this Rust function output?

I have a Config struct to store the global config. The config needs to be passed to different structs (abstract layers). For config consistency and to save memory, the structs store a reference to the global config instead of a copy.
Then I have an implementation like the following (Playground):
/// Global Config
#[derive(Debug)]
struct Config {
    pub version: String,
}

/// Layer A
#[derive(Debug)]
struct A<'cfg> {
    id: u32,
    config: &'cfg Config,
}

/// Layer B
#[derive(Debug)]
struct B<'cfg> {
    a_id: u32,
    b_id: u32,
    config: &'cfg Config,
}

impl<'cfg> A<'cfg> {
    pub fn new(id: u32, config: &Config) -> A {
        A { id, config }
    }

    #[allow(unused)]
    pub fn version(&self) {
        println!("{}_a{}", self.config.version, self.id);
    }

    pub fn create_many_b(&self) -> Vec<B<'cfg>> { // <-- Removing the explicit `'cfg` lifetime for B here makes the compiler complain
        let cfg = self.config;
        let mut res = Vec::new();
        for id in 0..10u32 {
            res.push(B { a_id: self.id, b_id: id, config: cfg });
        }
        res
    }
}

impl<'cfg> B<'cfg> {
    pub fn version(&self) {
        println!("{}_a{}+b{}", self.config.version, self.a_id, self.b_id);
    }
}

fn create_many_a(config: &Config) -> Vec<A> {
    let mut res = Vec::new();
    for id in 0..5u32 {
        res.push(A::new(id, config));
    }
    res
}

fn main() {
    // Create a global config
    let cfg = Config { version: "1.0".to_owned() };
    println!("cfg pointer: {:p}", &cfg);

    // Use the global config to create many `A`s
    let a_vec = create_many_a(&cfg);
    let mut b_vec = Vec::new();

    // then create many `B`s
    for a in a_vec.into_iter() {
        b_vec.extend(a.create_many_b())
    }

    for b in b_vec.iter() {
        b.version();
    }
}
As the code comment says, the compiler rejects the build when the lifetime annotation is removed (playground). My question is: why is an explicit lifetime annotation required for B in the function create_many_b? How does lifetime elision work in this example? Shouldn't the Bs created by an A via create_many_b have the same lifetime as the A, which already has the 'cfg lifetime from the Config?

With these formulations
pub fn create_many_b(&self) -> Vec<B>
fn create_many_a(config: &Config) -> Vec<A>
the compiler deduces that the missing lifetime information for the result is identical to the lifetime of the reference given as an argument.
In the case of create_many_a() this is correct, because the missing lifetime is exactly the lifetime of config (which is what we want).
On the other hand, in the case of create_many_b() we don't want the Bs to contain a reference whose lifetime is that of self (an A); we want the lifetime used by the config member inside that A.
Thus, writing
impl<'cfg> A<'cfg> {
...
pub fn create_many_b(&self) -> Vec<B<'cfg>>
clearly states that the lifetime information used for B is the same as the one used for A.
The A referenced by self can disappear and the reference inside the B is still valid, because it does not depend on the A itself but on the Config the A depended on.
In the main() function
for a in a_vec.into_iter() {
introduces an A that disappears at the end of each iteration (the As are consumed from a_vec), so the created Bs cannot outlive that iteration if they depend on a, but they can if they depend on cfg, which still exists after the loop.
If you had written
for a in a_vec.iter() {
(still without the explicit <'cfg> annotation) it would have been accepted by the compiler, but it would have been misleading, since the Bs inside b_vec would then depend on the As inside a_vec rather than directly on cfg, which is what was intended in the definition of struct B<'cfg> { ... }.
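Spelled out, the two candidate signatures differ only in which lifetime the returned Bs are tied to. A side-by-side sketch (the method names are invented for illustration; both bodies compile against the structs from the question):
impl<'cfg> A<'cfg> {
    // What elision infers without the annotation: the Bs are tied to this
    // particular borrow of `self`, i.e. to the A itself.
    fn create_many_b_elided<'s>(&'s self) -> Vec<B<'s>> {
        (0..10u32).map(|id| B { a_id: self.id, b_id: id, config: self.config }).collect()
    }

    // The explicit version: the Bs are tied to the Config that the A borrows,
    // so they may outlive the A.
    fn create_many_b_explicit(&self) -> Vec<B<'cfg>> {
        (0..10u32).map(|id| B { a_id: self.id, b_id: id, config: self.config }).collect()
    }
}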

With elided lifetimes, the compiler will infer the lifetimes to be:
pub fn create_many_b<'a>(&'a self) -> Vec<B<'a>>;
This means that the Bs in the vector may not outlive the caller's reference to self.
If you were borrowing from data that was owned by self, this would be the correct lifetime: the data would be invalid after self was dropped. But the data is borrowed from config which is not owned by A, so you can make it less restrictive by expressing in the lifetimes where the data actually comes from:
pub fn create_many_b(&self) -> Vec<B<'cfg>>;
Often you wouldn't notice the difference in these two signatures, but your usage of these types actually requires the more permissive lifetimes. In particular, a_vec.into_iter() consumes a. This conflicts with the inferred lifetime because you then use the Bs in the vector after a is consumed.
Using the less restrictive lifetime allows you to use the Bs as long as config has not been dropped, regardless of what happened to the A.
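As a concrete illustration (the helper names make_bs and demo are invented; the Config, A, and B definitions are reused from the question), the following compiles with the Vec<B<'cfg>> signature but not with the elided one, because the Bs are used after their A is gone:
fn make_bs(config: &Config) -> Vec<B<'_>> {
    let a = A::new(0, config);
    let bs = a.create_many_b(); // with `Vec<B<'cfg>>`, these borrow only from `config`
    drop(a);                    // the A can go away...
    bs                          // ...and the Bs stay usable for as long as `config` lives
}

fn demo() {
    let cfg = Config { version: "1.0".to_owned() };
    for b in make_bs(&cfg) {
        b.version();
    }
}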

When you remove the lifetime annotation, the compiler infers it by filling in the gaps, roughly like this:
pub fn create_many_b<'a>(&'a self) -> Vec<B<'a>>
That is, the compiler assumes that each B borrows from self, so the Bs cannot outlive the reference to self.
When you call this function:
for a in a_vec.into_iter() {
    b_vec.extend(a.create_many_b())
} // <- `a` is dropped here
you would end up with a vector of Bs, each of which holds a borrow tied to a, which has already been dropped at this point.
So, by removing the lifetime annotation you are telling the compiler to infer it, and it fills in the lifetime from the reference to self taken by the method, rather than from the 'cfg parameter of A.
I hope this helps you understand the error message that you are receiving.

Why can I have multiple mutable references to a static type in the same scope?
My code:
static mut CURSOR: Option<B> = None;

struct B {
    pub field: u16,
}

impl B {
    pub fn new(value: u16) -> B {
        B { field: value }
    }
}

struct A;

impl A {
    pub fn get_b(&mut self) -> &'static mut B {
        unsafe {
            match CURSOR {
                Some(ref mut cursor) => cursor,
                None => {
                    CURSOR = Some(B::new(10));
                    self.get_b()
                }
            }
        }
    }
}

fn main() {
    // first creation of A, get a mutable reference to b and change its field.
    let mut a = A {};
    let mut b = a.get_b();
    b.field = 15;
    println!("{}", b.field);

    // second creation of A, get a mutable reference to b and change its field.
    let mut a_1 = A {};
    let mut b_1 = a_1.get_b();
    b_1.field = 16;
    println!("{}", b_1.field);

    // Third creation of A, get a mutable reference to b and change its field.
    let mut a_2 = A {};
    let b_2 = a_2.get_b();
    b_2.field = 17;
    println!("{}", b_1.field);

    // now I can change them all
    b.field = 1;
    b_1.field = 2;
    b_2.field = 3;
}
I am aware of the borrowing rules: at any given time a resource may have either
one or more immutable references (&T), or
exactly one mutable reference (&mut T).
In the above code, I have a struct A with the get_b() method for returning a mutable reference to B. With this reference, I can mutate the fields of struct B.
The strange thing is that more than one mutable reference can be created in the same scope (b, b_1, b_2) and I can use all of them to modify B.
Why can I have multiple mutable references with the 'static lifetime shown in main()?
My attempt at explaining this behavior is that because I am returning a mutable reference with a 'static lifetime, every call to get_b() returns the same mutable reference, so in the end there is really just one reference. Is this thought right? Why am I able to use all of the mutable references obtained from get_b() individually?
There is only one reason for this: you have lied to the compiler. You are misusing unsafe code and have violated Rust's core tenet about mutable aliasing. You state that you are aware of the borrowing rules, but then you go out of your way to break them!
unsafe code gives you a small set of extra abilities, but in exchange you are now responsible for avoiding every possible kind of undefined behavior. Multiple mutable aliases are undefined behavior.
The fact that there's a static involved is completely orthogonal to the problem. You can create multiple mutable references to anything (or nothing) with whatever lifetime you care about:
fn foo() -> (&'static mut i32, &'static mut i32, &'static mut i32) {
    let somewhere = 0x42 as *mut i32;
    unsafe { (&mut *somewhere, &mut *somewhere, &mut *somewhere) }
}
In your original code, you state that calling get_b is safe for anyone to do any number of times. This is not true. The entire function should be marked unsafe, along with copious documentation about what is and is not allowed, in order to avoid triggering unsafety. Any unsafe block should then have corresponding comments explaining why that specific usage doesn't break the required rules. All of this makes creating and using unsafe code more tedious than safe code, but compared to C, where every line of code is conceptually unsafe, it's still a lot better.
You should only use unsafe code when you know better than the compiler. For most people in most cases, there is very little reason to create unsafe code.
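As a rough sketch of that advice, reusing the static mut CURSOR and B from the question (the name get_b_unchecked and the wording of the safety contract are mine, not part of the original answer):
impl A {
    /// # Safety
    ///
    /// The caller must guarantee that no other reference obtained from this
    /// function (or from `CURSOR` directly) is still alive; otherwise two
    /// `&mut B` to the same data would exist, which is undefined behavior.
    pub unsafe fn get_b_unchecked(&mut self) -> &'static mut B {
        if CURSOR.is_none() {
            CURSOR = Some(B::new(10));
        }
        CURSOR.as_mut().unwrap()
    }
}
Note that this only documents the contract; it does not make the pattern sound, and newer compilers additionally warn about creating references to a static mut at all.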

How to assign an impl trait in a struct?

Consider a struct (HiddenInaccessibleStruct) that is not accessible but implements an API trait. The only way to obtain an object of this hidden type is by calling a function that returns an opaque implementation of the trait. Another struct owns a field of this type and makes use of the API trait. Right now, it seems impossible to assign this field in fn new(). The code below can also be found in the Rust playground.
// -- public api
trait Bound {
    fn call(&self) -> Self;
}

// this is not visible
#[derive(Default)]
struct HiddenInaccessibleStruct;

impl Bound for HiddenInaccessibleStruct {
    fn call(&self) -> Self {
        HiddenInaccessibleStruct
    }
}

// -- public api
pub fn load() -> impl Bound {
    HiddenInaccessibleStruct::default()
}

struct Abc<T> where T: Bound {
    field: T,
}

impl<T> Abc<T> where T: Bound {
    pub fn new() -> Self {
        let field = load();
        Abc {
            field // this won't work, since `field` has an opaque type.
        }
    }
}
Update
The API trait Bound declares a function that returns Self, hence it is not object safe (it cannot be used behind dyn).
There are two concepts in mid-air collision here: universal types and existential types. An Abc<T> is a universal type and we, including Abc, can refer to whatever T actually is as T (simple as that). impl Trait types are Rust's closest approach to existential types, where we only promise that such a type exists, but we can't refer to it (there is no T which holds the solution). This also means your constructor can't actually create an Abc<T>, because it can't decide what T is. Also see this article.
One solution is to kick the problem upstairs: Change the constructor to take a T from the outside, and pass the value into it:
impl<T> Abc<T>
where
    T: Bound,
{
    pub fn new(field: T) -> Self {
        Abc { field }
    }
}

fn main() {
    let field = load();
    let abc = Abc::new(field);
}
See this playground.
This works, but it only shifts the problem: the type of abc in main() is Abc<impl Bound>, which is (currently) impossible to write down. If you change the line to let abc: () = ..., the compiler will complain that you are trying to assign Abc<impl Bound> to (). If you try to comply with the advice and change the line to let abc: Abc<impl Bound> = ..., the compiler will complain that this type is invalid. So you have to leave the type of abc implied. This brings some usability issues with Abc<impl Bound>, because you can't easily put values of that type into other structs, etc.; basically, the existential type "infects" the outer type containing it.
impl Trait-types are mostly useful for immediate consumption, e.g. impl Iterator<Item=...>. In your case, with the aim apparently being to hide the type, you may get away with sealing Bound. In a more general case, it may be better to use dynamic dispatch (Box<dyn Bound>).
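A minimal sketch of the dynamic-dispatch route. Note that the Bound from the question is not object safe because call returns Self, so this sketch assumes a variant of the trait (with an invented describe method) that can be used as dyn Bound:
trait Bound {
    fn describe(&self) -> String;
}

#[derive(Default)]
struct HiddenInaccessibleStruct;

impl Bound for HiddenInaccessibleStruct {
    fn describe(&self) -> String {
        "hidden".to_owned()
    }
}

fn load() -> Box<dyn Bound> {
    Box::new(HiddenInaccessibleStruct)
}

// `Abc` no longer needs a type parameter: Box<dyn Bound> is a concrete,
// nameable type, so the constructor can call `load()` itself.
struct Abc {
    field: Box<dyn Bound>,
}

impl Abc {
    fn new() -> Self {
        Abc { field: load() }
    }
}

fn main() {
    let abc = Abc::new();
    println!("{}", abc.field.describe());
}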

Why are borrows of struct members allowed in &mut self, but not of self to immutable methods?

If I have a struct that encapsulates two members, and updates one based on the other, that's fine as long as I do it this way:
struct A {
    value: i64,
}

impl A {
    pub fn new() -> Self {
        A { value: 0 }
    }

    pub fn do_something(&mut self, other: &B) {
        self.value += other.value;
    }

    pub fn value(&self) -> i64 {
        self.value
    }
}

struct B {
    pub value: i64,
}

struct State {
    a: A,
    b: B,
}

impl State {
    pub fn new() -> Self {
        State {
            a: A::new(),
            b: B { value: 1 },
        }
    }

    pub fn do_stuff(&mut self) -> i64 {
        self.a.do_something(&self.b);
        self.a.value()
    }

    pub fn get_b(&self) -> &B {
        &self.b
    }
}

fn main() {
    let mut state = State::new();
    println!("{}", state.do_stuff());
}
That is, when I directly refer to self.b. But when I change do_stuff() to this:
pub fn do_stuff(&mut self) -> i64 {
    self.a.do_something(self.get_b());
    self.a.value()
}
The compiler complains: cannot borrow `*self` as immutable because `self.a` is also borrowed as mutable.
What if I need to do something more complex than just returning a member in order to get the argument for a.do_something()? Must I make a function that returns b by value and store it in a binding, then pass that binding to do_something()? What if b is complex?
More importantly to my understanding, what kind of memory-unsafety is the compiler saving me from here?
A key aspect of mutable references is that they are guaranteed to be the only way to access a particular value while they exist (unless they're reborrowed, which "disables" them temporarily).
When you write
self.a.do_something(&self.b);
the compiler is able to see that the borrow on self.a (which is taken implicitly to perform the method call) is distinct from the borrow on self.b, because it can reason about direct field accesses.
However, when you write
self.a.do_something(self.get_b());
then the compiler doesn't see a borrow on self.b, but rather a borrow on self. That's because lifetime parameters on method signatures cannot propagate such detailed information about borrows. Therefore, the compiler cannot guarantee that the value returned by self.get_b() doesn't give you access to self.a, which would create two references that can access self.a, one of them being mutable, which is illegal.
The reason field borrows don't propagate across functions is to simplify type checking and borrow checking (for machines and for humans). The principle is that the signature should be sufficient for performing those tasks: changing the implementation of a function should not cause errors in its callers.
What if I need to do something more complex than just returning a member in order to get the argument for a.do_something()?
I would move get_b from State to B and call get_b on self.b. This way, the compiler can see the distinct borrows on self.a and self.b and will accept the code.
self.a.do_something(self.b.get_b());
Yes, the compiler isolates functions for the purposes of the safety checks it makes. If it didn't, then every function would essentially have to be inlined everywhere. No one would appreciate this for at least two reasons:
Compile times would go through the roof, and many opportunities for parallelization would have to be discarded.
Changes to a function N calls away could affect the current function. See also Why are explicit lifetimes needed in Rust? which touches on the same concept.
what kind of memory-unsafety is the compiler saving me from here
None, really. In fact, it could be argued that it's creating false positives, as your example shows.
It's really more of a benefit for preserving programmer sanity.
The general advice that I give and follow when I encounter this problem is that the compiler is guiding you to discovering a new type in your existing code.
Your particular example is a bit too simplified for this to make sense, but if you had struct Foo(A, B, C) and found that a method on Foo needed A and B, that's often a good sign that there's a hidden type composed of A and B: struct Foo(Bar, C); struct Bar(A, B).
This isn't a silver bullet as you can end up with methods that need each pair of data, but in my experience it works the majority of the time.
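A minimal sketch of that refactoring (the types and method bodies here are invented for illustration):
struct A;
struct B;
struct C;

// Before: struct Foo(A, B, C), with a method that needs `&mut A` and `&B` at once.
// After: the pieces that are used together get their own type.
struct Bar(A, B);
struct Foo(Bar, C);

impl Bar {
    // Inside one method, the borrow checker can see that `self.0` and `self.1`
    // are distinct fields, so the simultaneous borrows are accepted.
    fn do_something(&mut self) {
        let _a: &mut A = &mut self.0;
        let _b: &B = &self.1;
    }
}

impl Foo {
    fn do_stuff(&mut self) {
        self.0.do_something(); // only the `Bar` part of `Foo` is borrowed
    }
}

fn main() {
    let mut foo = Foo(Bar(A, B), C);
    foo.do_stuff();
}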

Why is it useful to use PhantomData to inform the compiler that a struct owns a generic if I already implement Drop?

In the Rustonomicon's guide to PhantomData, there is a part about what happens if a Vec-like struct has a *const T field but no PhantomData<T>:
The drop checker will generously determine that Vec<T> does not own any values of type T. This will in turn make it conclude that it doesn't need to worry about Vec dropping any T's in its destructor for determining drop check soundness. This will in turn allow people to create unsoundness using Vec's destructor.
What does it mean? If I implement Drop for a struct and manually destroy all Ts in it, why should I care whether the compiler knows that my struct owns some Ts?
The PhantomData<T> within Vec<T> (held indirectly via a Unique<T> within RawVec<T>) communicates to the compiler that the vector may own instances of T, and therefore the vector may run destructors for T when the vector is dropped.
Deep dive: We have a combination of factors here:
We have a Vec<T> which has an impl Drop (i.e. a destructor implementation).
Under the rules of RFC 1238, this would usually imply a relationship between instances of Vec<T> and any lifetimes that occur within T, by requiring that all lifetimes within T strictly outlive the vector.
However, the destructor for Vec<T> specifically opts out of these semantics for just that destructor (of Vec<T> itself) via the use of special unstable attributes (see RFC 1238 and RFC 1327). This allows a vector to hold references that have the same lifetime as the vector itself. This is considered sound; after all, the vector itself will not dereference the data pointed to by such references (all it's doing is dropping values and deallocating the backing array), as long as an important caveat holds.
The important caveat: While the vector itself will not dereference pointers within its contained values while destructing itself, it will drop the values held by the vector. If those values of type T themselves have destructors, those destructors for T get run. And if those destructors access the data held within their references, then we would have a problem if we allowed dangling pointers within those references.
So, diving in even more deeply: to confirm dropck validity for a given structure S, we first check whether S itself has an impl Drop for S (and if so, we enforce rules on S with respect to its type parameters). But even after that step, we then recursively descend into the structure of S itself and check for each of its fields that everything is kosher according to dropck. (Note that we do this even if a type parameter of S is tagged with #[may_dangle].)
In this specific case, we have a Vec<T> which (indirectly via RawVec<T>/Unique<T>) owns a collection of values of type T, represented in a raw pointer *const T. However, the compiler attaches no ownership semantics to *const T; that field alone in a structure S implies no relationship between S and T, and thus enforces no constraint in terms of the relationship of lifetimes within the types S and T (at least from the viewpoint of dropck).
Therefore, if the Vec<T> had solely a *const T, the recursive descent into the structure of the vector would fail to capture the ownership relation between the vector and the instances of T contained within the vector. That, combined with the #[may_dangle] attribute on T, would cause the compiler to accept unsound code (namely cases where destructors for T end up trying to access data that has already been deallocated).
BUT: Vec<T> does not solely contain a *const T. There is also a PhantomData<T>, and that conveys to the compiler "hey, even though you can assume (due to the #[may_dangle] T) that the destructor for Vec won't access data of T when the vector is dropped, it is still possible that some destructor of T itself will access data of T as the vector is dropped."
The end effect: Given Vec<T>, if T doesn't have a destructor, then the compiler provides you with more flexibility (namely, it allows a vector to hold data with references to data that lives for the same amount of time as the vector itself, even though such data may be torn down before the vector is). But if T does have a destructor (and that destructor is not otherwise communicating to the compiler that it won't access any referenced data), then the compiler is more strict, requiring any referenced data to strictly outlive the vector (thus ensuring that when the destructor for T runs, all the referenced data will still be valid).
If one wants to try to understand this via concrete exploration, you can try comparing how the compiler differs in its treatment of little container types that vary in their use of #[may_dangle] and PhantomData.
Here is some sample code I have whipped up to illustrate this:
// Illustration of a case where PhantomData is providing necessary ownership
// info to rustc.
//
// MyBox2<T> uses just a `*const T` to hold the `T` it owns.
// MyBox3<T> has both a `*const T` AND a PhantomData<T>; the latter communicates
// its ownership relationship with `T`.
//
// Skim down to `fn f2()` to see the relevant case,
// and compare it to `fn f3()`. When you run the program,
// the output will include:
//
// drop PrintOnDrop(mb2b, PrintOnDrop("v2b", 13, INVALID), Valid)
//
// (However, in the absence of #[may_dangle], the compiler will constrain
// things in a manner that may indeed imply that PhantomData is unnecessary;
// pnkfelix is not 100% sure of this claim yet, though.)
#![feature(alloc, dropck_eyepatch, generic_param_attrs, heap_api)]
extern crate alloc;
use alloc::heap;
use std::fmt;
use std::marker::PhantomData;
use std::mem;
use std::ptr;
#[derive(Copy, Clone, Debug)]
enum State { INVALID, Valid }
#[derive(Debug)]
struct PrintOnDrop<T: fmt::Debug>(&'static str, T, State);
impl<T: fmt::Debug> PrintOnDrop<T> {
fn new(name: &'static str, t: T) -> Self {
PrintOnDrop(name, t, State::Valid)
}
}
impl<T: fmt::Debug> Drop for PrintOnDrop<T> {
fn drop(&mut self) {
println!("drop PrintOnDrop({}, {:?}, {:?})",
self.0,
self.1,
self.2);
self.2 = State::INVALID;
}
}
struct MyBox1<T> {
v: Box<T>,
}
impl<T> MyBox1<T> {
fn new(t: T) -> Self {
MyBox1 { v: Box::new(t) }
}
}
struct MyBox2<T> {
v: *const T,
}
impl<T> MyBox2<T> {
fn new(t: T) -> Self {
unsafe {
let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
let p = p as *mut T;
ptr::write(p, t);
MyBox2 { v: p }
}
}
}
unsafe impl<#[may_dangle] T> Drop for MyBox2<T> {
fn drop(&mut self) {
unsafe {
// We want this to be *legal*. This destructor is not
// allowed to call methods on `T` (since it may be in
// an invalid state), but it should be allowed to drop
// instances of `T` as it deconstructs itself.
//
// (Note however that the compiler has no knowledge
// that `MyBox2<T>` owns an instance of `T`.)
ptr::read(self.v);
heap::deallocate(self.v as *mut u8,
mem::size_of::<T>(),
mem::align_of::<T>());
}
}
}
struct MyBox3<T> {
v: *const T,
_pd: PhantomData<T>,
}
impl<T> MyBox3<T> {
fn new(t: T) -> Self {
unsafe {
let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
let p = p as *mut T;
ptr::write(p, t);
MyBox3 { v: p, _pd: Default::default() }
}
}
}
unsafe impl<#[may_dangle] T> Drop for MyBox3<T> {
fn drop(&mut self) {
unsafe {
ptr::read(self.v);
heap::deallocate(self.v as *mut u8,
mem::size_of::<T>(),
mem::align_of::<T>());
}
}
}
fn f1() {
// `let (v, _mb1);` and `let (_mb1, v)` won't compile due to dropck
let v1; let _mb1;
v1 = PrintOnDrop::new("v1", 13);
_mb1 = MyBox1::new(PrintOnDrop::new("mb1", &v1));
}
fn f2() {
{
let (v2a, _mb2a); // Sound, but not distinguished from below by rustc!
v2a = PrintOnDrop::new("v2a", 13);
_mb2a = MyBox2::new(PrintOnDrop::new("mb2a", &v2a));
}
{
let (_mb2b, v2b); // Unsound!
v2b = PrintOnDrop::new("v2b", 13);
_mb2b = MyBox2::new(PrintOnDrop::new("mb2b", &v2b));
// namely, v2b dropped before _mb2b, but latter contains
// value that attempts to access v2b when being dropped.
}
}
fn f3() {
let v3; let _mb3; // `let (v, mb3);` won't compile due to dropck
v3 = PrintOnDrop::new("v3", 13);
_mb3 = MyBox3::new(PrintOnDrop::new("mb3", &v3));
}
fn main() {
f1(); f2(); f3();
}
Caveat emptor — I'm not that strong in the extremely deep theory that truly answers your question. I'm just a layperson who has used Rust a bit and has read the related RFCs. Always refer back to those original sources for a less-diluted version of the truth.
RFC 769 introduced the actual Drop-Check Rule:
Let v be some value (either temporary or named) and 'a be some lifetime (scope); if the type of v owns data of type D, where
(1.) D has a lifetime- or type-parametric Drop implementation, and
(2.) the structure of D can reach a reference of type &'a _, and
(3.) either:
    (A.) the Drop impl for D instantiates D at 'a directly, i.e. D<'a>, or,
    (B.) the Drop impl for D has some type parameter with a trait bound T where T is a trait that has at least one method,
then 'a must strictly outlive the scope of v.
It then goes on to define some of those terms, including what it means for one type to own another, and mentions PhantomData specifically:
Therefore, as an additional special case to the criteria above for when the type E owns data of type D, we include:
If E is PhantomData<T>, then recurse on T.
A key problem occurs when two variables are defined at the same time:
struct Noisy<'a>(&'a str);

impl<'a> Drop for Noisy<'a> {
    fn drop(&mut self) { println!("Dropping {}", self.0) }
}

fn main() -> () {
    let (mut v, s) = (Vec::new(), "hi".to_string());
    let noisy = Noisy(&s);
    v.push(noisy);
}
As I understand it, without the Drop-Check Rule and the indication that Vec owns Noisy, code like this might compile. When the Vec is dropped, the drop implementation could access an invalid reference, introducing unsafety.
Returning to your points:
If I implement Drop for a struct and manually destroy all Ts in it, why should I care if compiler knows that my struct owns some Ts?
The compiler must know that you own the value because you can/will call drop. Since the implementation of drop is arbitrary, if you are going to call it, the compiler must forbid you from accepting values that would cause unsafe behavior during drop.
Always remember that any arbitrary T can be a value, a reference, a value containing a reference, etc. When trying to puzzle out these types of things, it's important to try to use the most complicated variant for any thought experiments.
All of that should provide enough pieces to connect-the-dots; for full understanding, reading the RFC a few times is probably better than relying on my flawed interpretation.
Then it gets more complicated. RFC 1238 further modifies The Drop-Check Rule, removing this specific reasoning. It does say:
parametricity is a necessary but not sufficient condition to justify the inferences that dropck makes
Continuing to use PhantomData seems the safest thing to do, but it may not be required. An anonymous Twitter benefactor pointed out this code:
use std::marker::PhantomData;

#[derive(Debug)] struct MyGeneric<T> { x: Option<T> }
#[derive(Debug)] struct MyDropper<T> { x: Option<T> }
#[derive(Debug)] struct MyHiddenDropper<T> { x: *const T }
#[derive(Debug)] struct MyHonestHiddenDropper<T> { x: *const T, boo: PhantomData<T> }

impl<T> Drop for MyDropper<T> { fn drop(&mut self) { } }
impl<T> Drop for MyHiddenDropper<T> { fn drop(&mut self) { } }
impl<T> Drop for MyHonestHiddenDropper<T> { fn drop(&mut self) { } }

fn main() {
    // Does Compile! (magic annotation on destructor)
    {
        let (a, mut b) = (0, vec![]);
        b.push(&a);
    }

    // Does Compile! (no destructor)
    {
        let (a, mut b) = (0, MyGeneric { x: None });
        b.x = Some(&a);
    }

    // Doesn't Compile! (has destructor, no attribute)
    {
        let (a, mut b) = (0, MyDropper { x: None });
        b.x = Some(&a);
    }

    {
        let (a, mut b) = (0, MyHiddenDropper { x: 0 as *const _ });
        b.x = &&a;
    }

    {
        let (a, mut b) = (0, MyHonestHiddenDropper { x: 0 as *const _, boo: PhantomData });
        b.x = &&a;
    }
}
This suggests that the changes in RFC 1238 made the compiler more conservative, such that simply having a lifetime or type parameter is enough to prevent it from compiling.
You can also note that Vec doesn't have this problem because it uses the unsafe_destructor_blind_to_params attribute described in the RFC.
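As a small illustration of that advice (RawHolder is an invented name, loosely mirroring what Vec<T> does internally via Unique<T>), the PhantomData<T> field records the ownership that the raw pointer alone does not:
use std::marker::PhantomData;

// A Vec-like building block: the `*const T` says nothing about ownership,
// while the `PhantomData<T>` tells the drop checker that dropping this
// struct may also drop values of type T.
struct RawHolder<T> {
    ptr: *const T,
    len: usize,
    _owns_t: PhantomData<T>,
}

fn main() {
    // Type-level illustration only; no allocation or dropping happens here.
    let holder: RawHolder<String> = RawHolder {
        ptr: std::ptr::null(),
        len: 0,
        _owns_t: PhantomData,
    };
    let _ = (holder.ptr, holder.len);
}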

Why is a Cell used to create unmovable objects?

So I ran into this code snippet showing how to create "unmoveable" types in Rust - moves are prevented because the compiler treats the object as borrowed for its whole lifetime.
use std::cell::Cell;
use std::marker;

struct Unmovable<'a> {
    lock: Cell<marker::ContravariantLifetime<'a>>,
    marker: marker::NoCopy
}

impl<'a> Unmovable<'a> {
    fn new() -> Unmovable<'a> {
        Unmovable {
            lock: Cell::new(marker::ContravariantLifetime),
            marker: marker::NoCopy
        }
    }

    fn lock(&'a self) {
        self.lock.set(marker::ContravariantLifetime);
    }

    fn new_in(self_: &'a mut Option<Unmovable<'a>>) {
        *self_ = Some(Unmovable::new());
        self_.as_ref().unwrap().lock();
    }
}

fn main() {
    let x = Unmovable::new();
    x.lock();
    // error: cannot move out of `x` because it is borrowed
    // let z = x;

    let mut y = None;
    Unmovable::new_in(&mut y);
    // error: cannot move out of `y` because it is borrowed
    // let z = y;

    assert_eq!(std::mem::size_of::<Unmovable>(), 0)
}
I don't yet understand how this works. My guess is that the lifetime of the borrow-pointer argument is forced to match the lifetime of the lock field. The weird thing is, this code continues working in the same way if:
I change ContravariantLifetime<'a> to CovariantLifetime<'a>, or to InvariantLifetime<'a>.
I remove the body of the lock method.
But if I remove the Cell and just use lock: marker::ContravariantLifetime<'a> directly, like so:
use std::marker;

struct Unmovable<'a> {
    lock: marker::ContravariantLifetime<'a>,
    marker: marker::NoCopy
}

impl<'a> Unmovable<'a> {
    fn new() -> Unmovable<'a> {
        Unmovable {
            lock: marker::ContravariantLifetime,
            marker: marker::NoCopy
        }
    }

    fn lock(&'a self) {
    }

    fn new_in(self_: &'a mut Option<Unmovable<'a>>) {
        *self_ = Some(Unmovable::new());
        self_.as_ref().unwrap().lock();
    }
}

fn main() {
    let x = Unmovable::new();
    x.lock();
    // does not error?
    let z = x;

    let mut y = None;
    Unmovable::new_in(&mut y);
    // does not error?
    let z = y;

    assert_eq!(std::mem::size_of::<Unmovable>(), 0)
}
Then the "Unmoveable" object is allowed to move. Why would that be?
The true answer involves a moderately complex consideration of lifetime variance, with a couple of misleading aspects of the code that need to be sorted out.
For the code below, 'a is an arbitrary lifetime, 'small is an arbitrary lifetime that is smaller than 'a (this can be expressed by the constraint 'a: 'small), and 'static is used as the most common example of a lifetime that is larger than 'a.
Here are the facts and steps to follow in the consideration:
Normally, lifetimes are contravariant; &'a T is contravariant with regards to 'a (as is T<'a> in the absence of any variance markers), meaning that if you have a &'a T, it's OK to substitute a longer lifetime than 'a, e.g. you can store in such a place a &'static T and treat it as though it were a &'a T (you're allowed to shorten the lifetime; in today's terminology this behaviour of &'a T is called covariance in 'a).
In a few places, types are invariant; the most common example is &'a mut T, which is invariant with regards to T. Concretely, a &'a mut &'b U can be treated neither as a &'a mut &'small U (you could then store a reference through it that doesn't live long enough) nor as a &'a mut &'static U (you could then read a reference back out and wrongly treat it as living longer than 'b); neither direction of substitution is allowed.
A Cell contains an UnsafeCell; what isn't so obvious is that UnsafeCell is magic, being wired to the compiler for special treatment as the language item named "unsafe". Importantly, UnsafeCell<T> is invariant with regards to T, for similar sorts of reasons to the invariance of &'a mut T with regards to T.
Thus, Cell<any lifetime variance marker> will actually behave the same as Cell<InvariantLifetime<'a>>.
Furthermore, you don’t actually need to use Cell any more; you can just use InvariantLifetime<'a>.
Returning to the example with the Cell wrapping removed and a ContravariantLifetime (actually equivalent to just defining struct Unmovable<'a>;, for contravariance is the default as is no Copy implementation): why does it allow moving the value? … I must confess, I don’t grok this particular case yet and would appreciate some help myself in understanding why it’s allowed. It seems back to front, that covariance would allow the lock to be shortlived but that contravariance and invariance wouldn’t, but in practice it seems that only invariance is performing the desired function.
Anyway, here's the final result. Cell<ContravariantLifetime<'a>> is changed to InvariantLifetime<'a> and that's the only functional change, making the lock method function as desired, taking a borrow with an invariant lifetime. (Another solution would be to have lock take &'a mut self, since a &'a mut Unmovable<'a> is invariant in its referent and hence in 'a; this is inferior, however, as it requires needless mutability.)
One other thing that needs mentioning: the contents of the lock and new_in methods are completely superfluous. The body of a function will never change the static behaviour of the compiler; only the signature matters. The fact that the lifetime parameter 'a is marked invariant is the key point. So the whole “construct an Unmovable object and call lock on it” part of new_in is completely superfluous. Similarly setting the contents of the cell in lock was a waste of time. (Note that it is again the invariance of 'a in Unmovable<'a> that makes new_in work, not the fact that it is a mutable reference.)
use std::marker;

struct Unmovable<'a> {
    lock: marker::InvariantLifetime<'a>,
}

impl<'a> Unmovable<'a> {
    fn new() -> Unmovable<'a> {
        Unmovable {
            lock: marker::InvariantLifetime,
        }
    }

    fn lock(&'a self) { }

    fn new_in(_: &'a mut Option<Unmovable<'a>>) { }
}

fn main() {
    let x = Unmovable::new();
    x.lock();
    // This is an error, as desired:
    let z = x;

    let mut y = None;
    Unmovable::new_in(&mut y);
    // Yay, this is an error too!
    let z = y;
}
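The ContravariantLifetime/InvariantLifetime markers used in these snippets were removed before Rust 1.0; the same invariance is expressed today with PhantomData. A rough modern adaptation of this final version (my sketch, not part of the original answer):
use std::cell::Cell;
use std::marker::PhantomData;

struct Unmovable<'a> {
    // Cell<&'a ()> is invariant in 'a, and PhantomData has the variance of its
    // parameter, so this field makes Unmovable<'a> invariant in 'a.
    _lock: PhantomData<Cell<&'a ()>>,
}

impl<'a> Unmovable<'a> {
    fn new() -> Unmovable<'a> {
        Unmovable { _lock: PhantomData }
    }

    // Borrowing `self` for exactly the struct's own lifetime parameter pins it:
    // the borrow cannot be shortened, so it must last as long as the value is
    // in use, and the value can no longer be moved afterwards.
    fn lock(&'a self) {}
}

fn main() {
    let x = Unmovable::new();
    x.lock();
    // let z = x; // error[E0505]: cannot move out of `x` because it is borrowed
}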
An interesting problem! Here's my understanding of it...
Here's another example that doesn't use Cell:
#![feature(core)]
use std::marker::InvariantLifetime;

struct Unmovable<'a> {
    lock: Option<InvariantLifetime<'a>>,
}

impl<'a> Unmovable<'a> {
    fn lock_it(&'a mut self) {
        self.lock = Some(InvariantLifetime)
    }
}

fn main() {
    let mut u = Unmovable { lock: None };
    u.lock_it();
    let v = u;
}
(Playpen)
The important trick here is that the structure needs to borrow itself. Once we have done that, it can no longer be moved because any move would invalidate the borrow. This isn't conceptually different from any other kind of borrow:
struct A(u32);

fn main() {
    let a = A(42);
    let b = &a;
    let c = a;
}
The only thing is that you need some way of letting the struct contain its own reference, which isn't possible to do at construction time. My example uses Option, which requires &mut self and the linked example uses Cell, which allows for interior mutability and just &self.
Both examples use a lifetime marker because it allows the typesystem to track the lifetime without needing to worry about a particular instance.
Let's look at your constructor:
fn new() -> Unmovable<'a> {
    Unmovable {
        lock: marker::ContravariantLifetime,
        marker: marker::NoCopy
    }
}
Here, the lifetime put into lock is chosen by the caller, and it ends up being the normal lifetime of the Unmovable struct. There's no borrow of self.
Let's next look at your lock method:
fn lock(&'a self) {
}
Here, the compiler knows that the lifetime won't change. However, if we make it mutable:
fn lock(&'a mut self) {
}
Bam! It's locked again. This is because the compiler knows that the internal fields could change. We can actually apply this to our Option variant and remove the body of lock_it!
