How does Rust infer lifetime while raw pointers are involved? - rust

struct MyCell<T> {
value: T
}
impl<T> MyCell<T> {
fn new(value: T) -> Self {
MyCell { value }
}
fn get(&self) -> &T {
&self.value
}
fn set(&self, new_value: T) {
unsafe {
*(&self.value as *const T as *mut T) = new_value;
}
}
}
fn set_to_local(cell: &MyCell<&i32>) {
let local = 100;
cell.set(&local);
}
fn main() {
let cell = MyCell::new(&10);
set_to_local(&cell);
}
When calling cell.set(&local), suppose cell is 'x and '&local is 'y, I am told that the covariance rule will change the type of cell from &MyCell<'x, &i32> to &MyCell<'y, &i32>.
How does the assignment inside the unsafe block affect the lifetime inference for the parameters of set()? Raw pointers does not have lifetime then how does the compiler know it should make cell and new_value have the same lifetime using covariance?

How does the assignment inside the unsafe block affect the lifetime inference for the parameters of set()?
It doesn't — and this is not even specific to raw pointers or unsafe. Function bodies never affect any aspect of the function signature (except for async fns which are irrelevant here).
Raw pointers does not have lifetime then how does the compiler know it should make cell and new_value have the same lifetime using covariance?
It sounds like you misread some advice. In the code you have, Cell<T> is invariant in T, but for an interior-mutable type to be sound, it must be invariant (not covariant) in the type parameter. In the code you have, the compiler infers covariance for T because MyCell contains a field that is simply of the type T. Covariance is the “typical” case for most generic types.
Therefore, the code you have is unsound because MyCell is covariant over T but must instead be invariant over T.
Your code is also unsound because in the implementation of set(), you're creating an immutable reference to a T, &self.value, and then writing to its referent. This is “undefined behavior” no matter how you do it, because creating &self.value asserts to the compiler/optimizer that the pointed-to memory won't be modified until the reference is dropped.
If you want to reimplement the standard library's Cell, you must do it like the standard library does, with the UnsafeCell primitive:
pub struct Cell<T: ?Sized> {
value: UnsafeCell<T>,
}
UnsafeCell is how you opt out of &'s immutability guarantees: creating an &UnsafeCell<T> doesn't assert that the T won't be mutated. It is also invariant in T, which automatically makes the containing type invariant in T. Both of these are necessary here.

Related

Is there a way to pass a reference to a generic function and return an impl Trait that isn't related to the argument's lifetime?

I've worked down a real-life example in a web app, which I've solved using unnecessary heap allocation, to the following example:
// Try replacing with (_: &String)
fn make_debug<T>(_: T) -> impl std::fmt::Debug {
42u8
}
fn test() -> impl std::fmt::Debug {
let value = "value".to_string();
// try removing the ampersand to get this to compile
make_debug(&value)
}
pub fn main() {
println!("{:?}", test());
}
As is, compiling this code gives me:
error[E0597]: `value` does not live long enough
--> src/main.rs:9:16
|
5 | fn test() -> impl std::fmt::Debug {
| -------------------- opaque type requires that `value` is borrowed for `'static`
...
9 | make_debug(&value)
| ^^^^^^ borrowed value does not live long enough
10 | }
| - `value` dropped here while still borrowed
I can fix this error in at least two ways:
Instead of passing in a reference to value in test(), pass in value itself
Instead of the parameter T, explicitly state the type of the argument for make_debug as &String or &str
My understanding of what's happening is that, when there is a parameter, the borrow checker is assuming that any lifetime on that parameter affects the output impl Debug value.
Is there a way to keep the code parameterized, continue passing in a reference, and get the borrow checker to accept it?
I think this is due to the rules around how impl trait opaque types capture lifetimes.
If there are lifetimes inside an argument T, then an impl trait has to incorporate them. Additional lifetimes in the type signature follow the normal rules.
For more information please see:
https://github.com/rust-lang/rust/issues/43396#issuecomment-349716967
https://github.com/rust-lang/rfcs/blob/master/text/1951-expand-impl-trait.md#lifetime-parameters
https://github.com/rust-lang/rfcs/blob/master/text/1951-expand-impl-trait.md#assumption-3-there-should-be-an-explicit-marker-when-a-lifetime-could-be-embedded-in-a-return-type
https://github.com/rust-lang/rfcs/blob/master/text/1951-expand-impl-trait.md#scoping-for-type-and-lifetime-parameters
A more complete answer
Original goal: the send_form function takes an input parameter of type &T which is rendered to a binary representation. That binary representation is owned by the resulting impl Future, and no remnant of the original &T remains. Therefore, the lifetime of &T need not outlive the impl Trait. All good.
The problem arises when T itself, additionally, contains references with lifetimes. If we were not using impl Trait, our signature would look something like this:
fn send_form<T>(self, data: &T) -> SendFormFuture;
And by looking at SendFormFuture, we can readily observe that there is no remnant of T in there at all. Therefore, even if T has lifetimes of its own to deal with, we know that all references are used within the body of send_form, and never used again afterward by SendFormFuture.
However, with impl Future as the output, we get no such guarantees. There's no way to know if the concrete implementation of Future in fact holds onto the T.
In the case where T has no references, this still isn't a problem. Either the impl Future references the T, and fully takes ownership of it, or it doesn't reference it, and no lifetime issues arise.
However, if T does have references, you could end up in a situation where the concrete impl Future is holding onto a reference stored in the T. Even though the impl Future has ownership of the T itself, it doesn't have ownership of the values referenced by the T.
This is why the borrow check must be conservative, and insist that any references inside T must have a 'static lifetime.
The only workaround I can see is to bypass impl Future and be explicit in the return type. Then, you can demonstrate to the borrow checker quite easily that the output type does not reference the input T type at all, and any references in it are irrelevant.
The original code in the actix web client for send_form looks like:
https://docs.rs/awc/0.2.1/src/awc/request.rs.html#503-522
pub fn send_form<T: Serialize>(
self,
value: &T,
) -> impl Future<
Item = ClientResponse<impl Stream<Item = Bytes, Error = PayloadError>>,
Error = SendRequestError,
> {
let body = match serde_urlencoded::to_string(value) {
Ok(body) => body,
Err(e) => return Either::A(err(Error::from(e).into())),
};
// set content-type
let slf = self.set_header_if_none(
header::CONTENT_TYPE,
"application/x-www-form-urlencoded",
);
Either::B(slf.send_body(Body::Bytes(Bytes::from(body))))
}
You may need to patch the library or write your own function that does the same thing but with a concrete type. If anyone else knows how to deal with this apparent limitation of impl trait I'd love to hear it.
Here's how far I've gotten on a rewrite of send_form in awc (the actix-web client library):
pub fn send_form_alt<T: Serialize>(
self,
value: &T,
// ) -> impl Future<
// Item = ClientResponse<impl Stream<Item = Bytes, Error = PayloadError>>,
// Error = SendRequestError,
) -> Either<
FutureResult<String, actix_http::error::Error>,
impl Future<
Item = crate::response::ClientResponse<impl futures::stream::Stream>,
Error = SendRequestError,
>,
> {
Some caveats so far:
Either::B is necessarily an opaque impl trait of Future.
The first param of FutureResult might actually be Void or whatever the Void equivalent in Rust is called.

Why can I return a reference to an owned value of a function?

In chapter 19.2 of The Rust Programming Language, the following example compiles without any error. I found out from issue #1834 that there is a new lifetime elision rule that implicitly makes 's longer than 'c.
Although I couldn't find a detailed explanation of this new elision rule, I guess that it is not more than just an implicit version of the longer, more explicit constraint: <'c, 's: 'c>. I think however my confusion is probably not about this new elision rule but of course I could be wrong about this.
My understanding is, that parse_context takes ownership of context as it has not been borrowed but actually moved to the function. That alone implies to me that the lifetime of context should match the lifetime of the function it is owned by regardless of the lifetimes and constraint we defined in Context, and Parser.
Based on those definitions, the part where context outlives the temporary Parser makes perfect sense to me (after all, we defined a longer lifetime), but the part where the &str reference is not dropped when context goes out of scope at the end of parse_context and I can still safely return it -- makes me puzzled.
What have I missed? How can the compiler reason about the lifetime of the returned &str?
UPDATED EXAMPLE
struct Context<'s>(&'s str);
struct Parser<'c, 's>
{
context: &'c Context<'s>,
}
impl<'c, 's> Parser<'c, 's>
{
fn parse(&self) -> Result<(), &'s str>
{
Err(self.context.0)
}
}
fn parse_context(context: Context) -> Result<(), &str>
{
Parser { context: &context }.parse()
}
fn main()
{
let mut s = String::new();
s += "Avail";
s += "able?";
if let Err(text) = parse_context(Context(&s))
{
println!("{}", text);
}
}
You are not returning a reference to the owned value of the function. You are returning a copy of the reference passed in.
Your function is an elaborate version of the identity function:
fn parse_context(s: &str) -> &str {
s
}
In your real code, you take a reference to a struct containing a string slice, then another reference to the string slice, but all of those references are thrown away.
For example, there's an unneeded reference in parse:
fn parse(&self) -> Result<(), &'s str> {
Err( self.context.0)
// ^ no & needed
}
Additionally, if you enable some more lints, you'll be forced to add more lifetimes to your function signature, which might make things more clear:
#![deny(rust_2018_idioms)]
fn parse_context(context: Context<'_>) -> Result<(), &'_ str> {
Parser { context: &context }.parse()
}
See also:
What are Rust's exact auto-dereferencing rules?
Although I couldn't find a detailed explanation of this new elision rule,
T: 'a inference in structs in the edition guide.
struct Context<'s>(&'s str);
→ Values of type Context hold a string with some lifetime 's. This lifetime is implicitly at least as long as the lifetime of the context, but it may be longer.
struct Parser<'c, 's>
{
context: &'c Context<'s>,
}
→ Values of type Parser hold a a reference to context with some lifetime 'c. This context holds a string with some other lifetime 's.
impl<'c, 's> Parser<'c, 's>
{
fn parse(&self) -> Result<(), &'s str>
{
Err(self.context.0)
}
}
→ Function parse returns a value with lifetime 's, ie. with the same lifetime as the string that is stored inside the context, which is not the same as the lifetime of the context itself.
fn parse_context(context: Context) -> Result<(), &str>
{
Parser { context: &context }.parse()
}
→ I'm not sure exactly where this is specified, but obviously the compiler infers that the lifetime for the returned string is the same as the 's parameter used for the context. Note that even though the context itself is moved into parse_context, this only affects the context itself, not the string that it contains.
fn main()
{
let mut s = String::new();
→ Create a new string valid until the end of main
s += "Avail";
s += "able?";
if let Err(text) = parse_context(Context(&s))
→ Create a new context and move it into parse_context. It will automatically be dropped at the end of parse_context. It holds a reference to string s which is valid until the end of main and parse_context returns a string with the same lifetime as s ⇒ text is valid until the end of main.
{
println!("{}", text);
}
}
→ No problem: text is valid until the end of main.
Thank to the comments of Jmb and some bits of Shepmaster's answer it is indeed clear to me now that my confusion was about the RAII rules and the ownership.
That is, the string was created in the main scope, the anonymous Context instance is not taking ownership of the string it is only borrowing a reference, therefore even when the instance is dropped at the end of parse_context along with the borrowed reference, a reference copied to the Err object still exists and points to the existing string -- hence we are using the constrained lifetime variables and the compiler is able to reason about the lifetime of the internal string reference.

What does `impl` mean when used as the argument type or return type of a function?

I read this code:
pub fn up_to(limit: u64) -> impl Generator<Yield = u64, Return = u64> {
move || {
for x in 0..limit {
yield x;
}
return limit;
}
}
What does impl mean? How might this be implemented in plain Rust or C++?
This is the new impl Trait syntax which allows the programmer to avoid naming generic types. The feature is available as of Rust 1.26.
Here, it is used in return position to say "the type returned will implement this trait, and that's all I'm telling you". In this case, note that all return paths of the function must return the exact same concrete type.
The syntax can also be used in argument position, in which case the caller decides what concrete type to pass.
See also:
Using impl Trait in Trait definition
What are the differences between an impl trait argument and generic function parameter?
What makes `impl Trait` as an argument "universal" and as a return value "existential"?
What is the correct way to return an Iterator (or any other trait)?
What does impl mean?
As Matthieu explained, impl X means "return a concrete implementation of trait X". Without this you have the choice of returning a concrete type that implements the trait, e.g. an UpToImpl, or erasing the type by returning a Box<Generator>. The former requires exposing the type, while the latter carries the run-time cost of dynamic allocation and virtual dispatch. More importantly, and this is the clincher for the generator case, the former approach precludes returning a closure, because closures return values of anonymous types.
How might this be implemented in plain Rust or C++?
If you mean how to implement up_to, and you want to do it without incurring allocation overhead, you have to give up using a closure and manually rewrite the generator into a state machine that implements the Generator trait:
#![feature(generator_trait)]
use std::{
ops::{Generator, GeneratorState},
pin::Pin,
};
pub struct UpToImpl {
limit: u64,
x: u64,
}
impl Generator for UpToImpl {
type Yield = u64;
type Return = u64;
fn resume(mut self: Pin<&mut Self>) -> GeneratorState<u64, u64> {
let x = self.x;
if x < self.limit {
self.x += 1;
GeneratorState::Yielded(x)
} else {
GeneratorState::Complete(self.limit)
}
}
}
pub fn up_to(limit: u64) -> UpToImpl {
UpToImpl { x: 0, limit }
}
fn main() {
let mut v = Box::pin(up_to(3));
println!("{:?}", v.as_mut().resume());
println!("{:?}", v.as_mut().resume());
println!("{:?}", v.as_mut().resume());
println!("{:?}", v.as_mut().resume());
}
This kind of transformation is essentially what the Rust compiler does behind the scenes when given a closure that contains yield, except that the generated type equivalent to UpToImpl is anonymous. (A similar but much simpler transformation is used to convert ordinary closures to values of anonymous types that implement one of the Fn traits.)
There is another difference between returning impl Generator and a concrete type. When returning UpToImpl, that type has to be public, and thus becomes part of the function signature. For example, a caller is allowed to do this:
let x: UpToImpl = up_to(10);
That code will break if UpToImpl is ever renamed, or if you decide to switch to using a generator closure.
The up_to in this answer compiles even when changed to return impl Generator, so once impl trait is made stable, that will be a better option for its return type. In that case, the caller cannot rely or refer to the exact return type, and it can be switched between any type that implements the trait, including the anonymous closure, without loss of source-level backward compatibility.
See also:
Lazy sequence generation in Rust

Why are borrows of struct members allowed in &mut self, but not of self to immutable methods?

If I have a struct that encapsulates two members, and updates one based on the other, that's fine as long as I do it this way:
struct A {
value: i64
}
impl A {
pub fn new() -> Self {
A { value: 0 }
}
pub fn do_something(&mut self, other: &B) {
self.value += other.value;
}
pub fn value(&self) -> i64 {
self.value
}
}
struct B {
pub value: i64
}
struct State {
a: A,
b: B
}
impl State {
pub fn new() -> Self {
State {
a: A::new(),
b: B { value: 1 }
}
}
pub fn do_stuff(&mut self) -> i64 {
self.a.do_something(&self.b);
self.a.value()
}
pub fn get_b(&self) -> &B {
&self.b
}
}
fn main() {
let mut state = State::new();
println!("{}", state.do_stuff());
}
That is, when I directly refer to self.b. But when I change do_stuff() to this:
pub fn do_stuff(&mut self) -> i64 {
self.a.do_something(self.get_b());
self.a.value()
}
The compiler complains: cannot borrow `*self` as immutable because `self.a` is also borrowed as mutable.
What if I need to do something more complex than just returning a member in order to get the argument for a.do_something()? Must I make a function that returns b by value and store it in a binding, then pass that binding to do_something()? What if b is complex?
More importantly to my understanding, what kind of memory-unsafety is the compiler saving me from here?
A key aspect of mutable references is that they are guaranteed to be the only way to access a particular value while they exist (unless they're reborrowed, which "disables" them temporarily).
When you write
self.a.do_something(&self.b);
the compiler is able to see that the borrow on self.a (which is taken implicitly to perform the method call) is distinct from the borrow on self.b, because it can reason about direct field accesses.
However, when you write
self.a.do_something(self.get_b());
then the compiler doesn't see a borrow on self.b, but rather a borrow on self. That's because lifetime parameters on method signatures cannot propagate such detailed information about borrows. Therefore, the compiler cannot guarantee that the value returned by self.get_b() doesn't give you access to self.a, which would create two references that can access self.a, one of them being mutable, which is illegal.
The reason field borrows don't propagate across functions is to simplify type checking and borrow checking (for machines and for humans). The principle is that the signature should be sufficient for performing those tasks: changing the implementation of a function should not cause errors in its callers.
What if I need to do something more complex than just returning a member in order to get the argument for a.do_something()?
I would move get_b from State to B and call get_b on self.b. This way, the compiler can see the distinct borrows on self.a and self.b and will accept the code.
self.a.do_something(self.b.get_b());
Yes, the compiler isolates functions for the purposes of the safety checks it makes. If it didn't, then every function would essentially have to be inlined everywhere. No one would appreciate this for at least two reasons:
Compile times would go through the roof, and many opportunities for parallelization would have to be discarded.
Changes to a function N calls away could affect the current function. See also Why are explicit lifetimes needed in Rust? which touches on the same concept.
what kind of memory-unsafety is the compiler saving me from here
None, really. In fact, it could be argued that it's creating false positives, as your example shows.
It's really more of a benefit for preserving programmer sanity.
The general advice that I give and follow when I encounter this problem is that the compiler is guiding you to discovering a new type in your existing code.
Your particular example is a bit too simplified for this to make sense, but if you had struct Foo(A, B, C) and found that a method on Foo needed A and B, that's often a good sign that there's a hidden type composed of A and B: struct Foo(Bar, C); struct Bar(A, B).
This isn't a silver bullet as you can end up with methods that need each pair of data, but in my experience it works the majority of the time.

Why is it useful to use PhantomData to inform the compiler that a struct owns a generic if I already implement Drop?

In the Rustonomicon's guide to PhantomData, there is a part about what happens if a Vec-like struct has *const T field, but no PhantomData<T>:
The drop checker will generously determine that Vec<T> does not own any values of type T. This will in turn make it conclude that it doesn't need to worry about Vec dropping any T's in its destructor for determining drop check soundness. This will in turn allow people to create unsoundness using Vec's destructor.
What does it mean? If I implement Drop for a struct and manually destroy all Ts in it, why should I care if compiler knows that my struct owns some Ts?
The PhantomData<T> within Vec<T> (held indirectly via a Unique<T> within RawVec<T>) communicates to the compiler that the vector may own instances of T, and therefore the vector may run destructors for T when the vector is dropped.
Deep dive: We have a combination of factors here:
We have a Vec<T> which has an impl Drop (i.e. a destructor implementation).
Under the rules of RFC 1238, this would usually imply a relationship between instances of Vec<T> and any lifetimes that occur within T, by requiring that all lifetimes within T strictly outlive the vector.
However, the destructor for Vec<T> specifically opts out of this semantics for just that destructor (of Vec<T> itself) via the use of special unstable attributes (see RFC 1238 and RFC 1327). This allows for a vector to hold references that have the same lifetime of the vector itself. This is considered sound; after all, the vector itself will not dereference data pointed to by such references (all its doing is dropping values and deallocating the backing array), as long as an important caveat holds.
The important caveat: While the vector itself will not dereference pointers within its contained values while destructing itself, it will drop the values held by the vector. If those values of type T themselves have destructors, those destructors for T get run. And if those destructors access the data held within their references, then we would have a problem if we allowed dangling pointers within those references.
So, diving in even more deeply: the way that we confirm dropck validity for a given structure S, we first double check if S itself has an impl Drop for S (and if so, we enforce rules on S with respect to its type parameters). But even after that step, we then recursively descend into the structure of S itself, and double check for each of its fields that everything is kosher according to dropck. (Note that we do this even if a type parameter of S is tagged with #[may_dangle].)
In this specific case, we have a Vec<T> which (indirectly via RawVec<T>/Unique<T>) owns a collection of values of type T, represented in a raw pointer *const T. However, the compiler attaches no ownership semantics to *const T; that field alone in a structure S implies no relationship between S and T, and thus enforces no constraint in terms of the relationship of lifetimes within the types S and T (at least from the viewpoint of dropck).
Therefore, if the Vec<T> had solely a *const T, the recursive descent into the structure of the vector would fail to capture the ownership relation between the vector and the instances of T contained within the vector. That, combined with the #[may_dangle] attribute on T, would cause the compiler to accept unsound code (namely cases where destructors for T end up trying to access data that has already been deallocated).
BUT: Vec<T> does not solely contain a *const T. There is also a PhantomData<T>, and that conveys to the compiler "hey, even though you can assume (due to the #[may_dangle] T) that the destructor for Vec won't access data of T when the vector is dropped, it is still possible that some destructor of T itself will access data of T as the vector is dropped."
The end effect: Given Vec<T>, if T doesn't have a destructor, then the compiler provides you with more flexibility (namely, it allows a vector to hold data with references to data that lives for the same amount of time as the vector itself, even though such data may be torn down before the vector is). But if T does have a destructor (and that destructor is not otherwise communicating to the compiler that it won't access any referenced data), then the compiler is more strict, requiring any referenced data to strictly outlive the vector (thus ensuring that when the destructor for T runs, all the referenced data will still be valid).
If one wants to try to understand this via concrete exploration, you can try comparing how the compiler differs in its treatment of little container types that vary in their use of #[may_dangle] and PhantomData.
Here is some sample code I have whipped up to illustrate this:
// Illustration of a case where PhantomData is providing necessary ownership
// info to rustc.
//
// MyBox2<T> uses just a `*const T` to hold the `T` it owns.
// MyBox3<T> has both a `*const T` AND a PhantomData<T>; the latter communicates
// its ownership relationship with `T`.
//
// Skim down to `fn f2()` to see the relevant case,
// and compare it to `fn f3()`. When you run the program,
// the output will include:
//
// drop PrintOnDrop(mb2b, PrintOnDrop("v2b", 13, INVALID), Valid)
//
// (However, in the absence of #[may_dangle], the compiler will constrain
// things in a manner that may indeed imply that PhantomData is unnecessary;
// pnkfelix is not 100% sure of this claim yet, though.)
#![feature(alloc, dropck_eyepatch, generic_param_attrs, heap_api)]
extern crate alloc;
use alloc::heap;
use std::fmt;
use std::marker::PhantomData;
use std::mem;
use std::ptr;
#[derive(Copy, Clone, Debug)]
enum State { INVALID, Valid }
#[derive(Debug)]
struct PrintOnDrop<T: fmt::Debug>(&'static str, T, State);
impl<T: fmt::Debug> PrintOnDrop<T> {
fn new(name: &'static str, t: T) -> Self {
PrintOnDrop(name, t, State::Valid)
}
}
impl<T: fmt::Debug> Drop for PrintOnDrop<T> {
fn drop(&mut self) {
println!("drop PrintOnDrop({}, {:?}, {:?})",
self.0,
self.1,
self.2);
self.2 = State::INVALID;
}
}
struct MyBox1<T> {
v: Box<T>,
}
impl<T> MyBox1<T> {
fn new(t: T) -> Self {
MyBox1 { v: Box::new(t) }
}
}
struct MyBox2<T> {
v: *const T,
}
impl<T> MyBox2<T> {
fn new(t: T) -> Self {
unsafe {
let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
let p = p as *mut T;
ptr::write(p, t);
MyBox2 { v: p }
}
}
}
unsafe impl<#[may_dangle] T> Drop for MyBox2<T> {
fn drop(&mut self) {
unsafe {
// We want this to be *legal*. This destructor is not
// allowed to call methods on `T` (since it may be in
// an invalid state), but it should be allowed to drop
// instances of `T` as it deconstructs itself.
//
// (Note however that the compiler has no knowledge
// that `MyBox2<T>` owns an instance of `T`.)
ptr::read(self.v);
heap::deallocate(self.v as *mut u8,
mem::size_of::<T>(),
mem::align_of::<T>());
}
}
}
struct MyBox3<T> {
v: *const T,
_pd: PhantomData<T>,
}
impl<T> MyBox3<T> {
fn new(t: T) -> Self {
unsafe {
let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
let p = p as *mut T;
ptr::write(p, t);
MyBox3 { v: p, _pd: Default::default() }
}
}
}
unsafe impl<#[may_dangle] T> Drop for MyBox3<T> {
fn drop(&mut self) {
unsafe {
ptr::read(self.v);
heap::deallocate(self.v as *mut u8,
mem::size_of::<T>(),
mem::align_of::<T>());
}
}
}
fn f1() {
// `let (v, _mb1);` and `let (_mb1, v)` won't compile due to dropck
let v1; let _mb1;
v1 = PrintOnDrop::new("v1", 13);
_mb1 = MyBox1::new(PrintOnDrop::new("mb1", &v1));
}
fn f2() {
{
let (v2a, _mb2a); // Sound, but not distinguished from below by rustc!
v2a = PrintOnDrop::new("v2a", 13);
_mb2a = MyBox2::new(PrintOnDrop::new("mb2a", &v2a));
}
{
let (_mb2b, v2b); // Unsound!
v2b = PrintOnDrop::new("v2b", 13);
_mb2b = MyBox2::new(PrintOnDrop::new("mb2b", &v2b));
// namely, v2b dropped before _mb2b, but latter contains
// value that attempts to access v2b when being dropped.
}
}
fn f3() {
let v3; let _mb3; // `let (v, mb3);` won't compile due to dropck
v3 = PrintOnDrop::new("v3", 13);
_mb3 = MyBox3::new(PrintOnDrop::new("mb3", &v3));
}
fn main() {
f1(); f2(); f3();
}
Caveat emptor — I'm not that strong in the extremely deep theory that truly answers your question. I'm just a layperson who has used Rust a bit and has read the related RFCs. Always refer back to those original sources for a less-diluted version of the truth.
RFC 769 introduced the actual The Drop-Check Rule:
Let v be some value (either temporary or named) and 'a be some
lifetime (scope); if the type of v owns data of type D, where (1.)
D has a lifetime- or type-parametric Drop implementation, and (2.)
the structure of D can reach a reference of type &'a _, and (3.)
either:
(A.) the Drop impl for D instantiates D at 'a
directly, i.e. D<'a>, or,
(B.) the Drop impl for D has some type parameter with a
trait bound T where T is a trait that has at least
one method,
then 'a must strictly outlive the scope of v.
It then goes further to define some of those terms, including what it means for one type to own another. This goes further to mention PhantomData specifically:
Therefore, as an additional special case to the criteria above for when the type E owns data of type D, we include:
If E is PhantomData<T>, then recurse on T.
A key problem occurs when two variables are defined at the same time:
struct Noisy<'a>(&'a str);
impl<'a> Drop for Noisy<'a> {
fn drop(&mut self) { println!("Dropping {}", self.0 )}
}
fn main() -> () {
let (mut v, s) = (Vec::new(), "hi".to_string());
let noisy = Noisy(&s);
v.push(noisy);
}
As I understand it, without The Drop-Check Rule and indicating that Vec owns Noisy, code like this might compile. When the Vec is dropped, the drop implementation could access an invalid reference; introducing unsafety.
Returning to your points:
If I implement Drop for a struct and manually destroy all Ts in it, why should I care if compiler knows that my struct owns some Ts?
The compiler must know that you own the value because you can/will call drop. Since the implementation of drop is arbitrary, if you are going to call it, the compiler must forbid you from accepting values that would cause unsafe behavior during drop.
Always remember that any arbitrary T can be a value, a reference, a value containing a reference, etc. When trying to puzzle out these types of things, it's important to try to use the most complicated variant for any thought experiments.
All of that should provide enough pieces to connect-the-dots; for full understanding, reading the RFC a few times is probably better than relying on my flawed interpretation.
Then it gets more complicated. RFC 1238 further modifies The Drop-Check Rule, removing this specific reasoning. It does say:
parametricity is a necessary but not sufficient condition to justify the inferences that dropck makes
Continuing to use PhantomData seems the safest thing to do, but it may not be required. An anonymous Twitter benefactor pointed out this code:
use std::marker::PhantomData;
#[derive(Debug)] struct MyGeneric<T> { x: Option<T> }
#[derive(Debug)] struct MyDropper<T> { x: Option<T> }
#[derive(Debug)] struct MyHiddenDropper<T> { x: *const T }
#[derive(Debug)] struct MyHonestHiddenDropper<T> { x: *const T, boo: PhantomData<T> }
impl<T> Drop for MyDropper<T> { fn drop(&mut self) { } }
impl<T> Drop for MyHiddenDropper<T> { fn drop(&mut self) { } }
impl<T> Drop for MyHonestHiddenDropper<T> { fn drop(&mut self) { } }
fn main() {
// Does Compile! (magic annotation on destructor)
{
let (a, mut b) = (0, vec![]);
b.push(&a);
}
// Does Compile! (no destructor)
{
let (a, mut b) = (0, MyGeneric { x: None });
b.x = Some(&a);
}
// Doesn't Compile! (has destructor, no attribute)
{
let (a, mut b) = (0, MyDropper { x: None });
b.x = Some(&a);
}
{
let (a, mut b) = (0, MyHiddenDropper { x: 0 as *const _ });
b.x = &&a;
}
{
let (a, mut b) = (0, MyHonestHiddenDropper { x: 0 as *const _, boo: PhantomData });
b.x = &&a;
}
}
This suggests that the changes in RFC 1238 made the compiler more conservative, such that simply having a lifetime or type parameter is enough to prevent it from compiling.
You can also note that Vec doesn't have this problem because it uses the unsafe_destructor_blind_to_params attribute described in the the RFC.

Resources