I can't really understand the dereference here. The type of foo is TattleTell<&str>. The method len() is from foo.value, i.e. foo.value.len(). So why deref of TattleTell is invoked?
use std::ops::Deref;
struct TattleTell<T> {
value: T,
}
impl<T> Deref for TattleTell<T> {
type Target = T;
fn deref(&self) -> &T {
println!("{} was used!", std::any::type_name::<T>());
&self.value
}
}
fn main() {
let foo = TattleTell {
value: "secret message",
};
// dereference occurs here immediately
// after foo is auto-referenced for the
// function `len`
println!("{}", foo.len());
}
I won't fully describe Rust's auto-dereferencing rules because they are covered in other answers. For example, here.
In your case, you are trying to call a method, len, on a TattleTell<&'static str>. This type doesn't directly have that method so, using the rules in that other answer, Rust goes looking for it, using the following steps:
Check if the method exists on &TattleTell<&'static str>. It doesn't.
Check if the method exists on *&TattleTell<&'static str>. Due to your Deref implementation, this is a &'static str, so the method exists.
I want to create some references to a str with Rc, without cloning str:
fn main() {
let s = Rc::<str>::from("foo");
let t = Rc::clone(&s); // Creating a new pointer to the same address is easy
let u = Rc::clone(&s[1..2]); // But how can I create a new pointer to a part of `s`?
let w = Rc::<str>::from(&s[0..2]); // This seems to clone str
assert_ne!(&w as *const _, &s as *const _);
}
playground
How can I do this?
While it's possible in principle, the standard library's Rc does not support the case you're trying to create: a counted reference to a part of reference-counted memory.
However, we can get the effect for strings using a fairly straightforward wrapper around Rc which remembers the substring range:
use std::ops::{Deref, Range};
use std::rc::Rc;
#[derive(Clone, Debug, Eq, Hash, PartialEq)]
pub struct RcSubstr {
string: Rc<str>,
span: Range<usize>,
}
impl RcSubstr {
fn new(string: Rc<str>) -> Self {
let span = 0..string.len();
Self { string, span }
}
fn substr(&self, span: Range<usize>) -> Self {
// A full implementation would also have bounds checks to ensure
// the requested range is not larger than the current substring
Self {
string: Rc::clone(&self.string),
span: (self.span.start + span.start)..(self.span.start + span.end)
}
}
}
impl Deref for RcSubstr {
type Target = str;
fn deref(&self) -> &str {
&self.string[self.span.clone()]
}
}
fn main() {
let s = RcSubstr::new(Rc::<str>::from("foo"));
let u = s.substr(1..2);
// We need to deref to print the string rather than the wrapper struct.
// A full implementation would `impl Debug` and `impl Display` to produce
// the expected substring.
println!("{}", &*u);
}
There are a lot of conveniences missing here, such as suitable implementations of Display, Debug, AsRef, Borrow, From, and Into — I've provided only enough code to illustrate how it can work. Once supplemented with the appropriate trait implementations, this should be just as usable as Rc<str> (with the one edge case that it can't be passed to a library type that wants to store Rc<str> in particular).
The crate arcstr claims to offer a finished version of this basic idea, but I haven't used or studied it and so can't guarantee its quality.
The crate owning_ref provides a way to hold references to parts of an Rc or other smart pointer, but there are concerns about its soundness and I don't fully understand which circumstances that applies to (issue search which currently has 3 open issues).
I am trying to simplify my closures, but I had a problem converting my closure to a reference to an associated function when the parameter is owned by the closure but the inner function call only expects a reference.
#![deny(clippy::pedantic)]
fn main() {
let borrowed_structs = vec![BorrowedStruct, BorrowedStruct];
//Selected into_iter specifically to reproduce the minimal scenario that closure gets value instead of reference
borrowed_structs
.into_iter()
.for_each(|consumed_struct: BorrowedStruct| MyStruct::my_method(&consumed_struct));
// I want to write it with static method reference like following line:
// for_each(MyStruct::my_method);
}
struct MyStruct;
struct BorrowedStruct;
impl MyStruct {
fn my_method(prm: &BorrowedStruct) {
prm.say_hello();
}
}
impl BorrowedStruct {
fn say_hello(&self) {
println!("hello");
}
}
Playground
Is it possible to simplify this code:
into_iter().for_each(|consumed_struct: BorrowedStruct| MyStruct::my_method(&consumed_struct));
To the following:
into_iter().for_each(MyStruct::my_method)
Note that into_iter here is only to reproduce to scenario that I own the value in my closure. I know that iter can be used in such scenario but it is not the real scenario that I am working on.
The answer to your general question is no. Types must match exactly when passing a function as a closure argument.
There are one-off workarounds, as shown in rodrigo's answer, but the general solution is to simply take the reference yourself, as you've done:
something_taking_a_closure(|owned_value| some_function_or_method(&owned_value))
I actually advocated for this case about two years ago as part of ergonomics revamp, but no one else seemed interested.
In your specific case, you can remove the type from the closure argument to make it more succinct:
.for_each(|consumed_struct| MyStruct::my_method(&consumed_struct))
I don't think there is a for_each_ref in trait Iterator yet. But you can write your own quite easily (playground):
trait MyIterator {
fn for_each_ref<F>(self, mut f: F)
where
Self: Iterator + Sized,
F: FnMut(&Self::Item),
{
self.for_each(|x| f(&x));
}
}
impl<I: Iterator> MyIterator for I {}
borrowed_structs
.into_iter()
.for_each_ref(MyStruct::my_method);
Another option, if you are able to change the prototype of the my_method function you can make it accept the value either by value or by reference with borrow:
impl MyStruct {
fn my_method(prm: impl Borrow<BorrowedStruct>) {
let prm = prm.borrow();
prm.say_hello();
}
}
And then your original code with .for_each(MyStruct::my_method) just works.
A third option is to use a generic wrapper function (playground):
fn bind_by_ref<T>(mut f: impl FnMut(&T)) -> impl FnMut(T) {
move |x| f(&x)
}
And then call the wrapped function with .for_each(bind_by_ref(MyStruct::my_method));.
In the Rustonomicon's guide to PhantomData, there is a part about what happens if a Vec-like struct has *const T field, but no PhantomData<T>:
The drop checker will generously determine that Vec<T> does not own any values of type T. This will in turn make it conclude that it doesn't need to worry about Vec dropping any T's in its destructor for determining drop check soundness. This will in turn allow people to create unsoundness using Vec's destructor.
What does it mean? If I implement Drop for a struct and manually destroy all Ts in it, why should I care if compiler knows that my struct owns some Ts?
The PhantomData<T> within Vec<T> (held indirectly via a Unique<T> within RawVec<T>) communicates to the compiler that the vector may own instances of T, and therefore the vector may run destructors for T when the vector is dropped.
Deep dive: We have a combination of factors here:
We have a Vec<T> which has an impl Drop (i.e. a destructor implementation).
Under the rules of RFC 1238, this would usually imply a relationship between instances of Vec<T> and any lifetimes that occur within T, by requiring that all lifetimes within T strictly outlive the vector.
However, the destructor for Vec<T> specifically opts out of this semantics for just that destructor (of Vec<T> itself) via the use of special unstable attributes (see RFC 1238 and RFC 1327). This allows for a vector to hold references that have the same lifetime of the vector itself. This is considered sound; after all, the vector itself will not dereference data pointed to by such references (all its doing is dropping values and deallocating the backing array), as long as an important caveat holds.
The important caveat: While the vector itself will not dereference pointers within its contained values while destructing itself, it will drop the values held by the vector. If those values of type T themselves have destructors, those destructors for T get run. And if those destructors access the data held within their references, then we would have a problem if we allowed dangling pointers within those references.
So, diving in even more deeply: the way that we confirm dropck validity for a given structure S, we first double check if S itself has an impl Drop for S (and if so, we enforce rules on S with respect to its type parameters). But even after that step, we then recursively descend into the structure of S itself, and double check for each of its fields that everything is kosher according to dropck. (Note that we do this even if a type parameter of S is tagged with #[may_dangle].)
In this specific case, we have a Vec<T> which (indirectly via RawVec<T>/Unique<T>) owns a collection of values of type T, represented in a raw pointer *const T. However, the compiler attaches no ownership semantics to *const T; that field alone in a structure S implies no relationship between S and T, and thus enforces no constraint in terms of the relationship of lifetimes within the types S and T (at least from the viewpoint of dropck).
Therefore, if the Vec<T> had solely a *const T, the recursive descent into the structure of the vector would fail to capture the ownership relation between the vector and the instances of T contained within the vector. That, combined with the #[may_dangle] attribute on T, would cause the compiler to accept unsound code (namely cases where destructors for T end up trying to access data that has already been deallocated).
BUT: Vec<T> does not solely contain a *const T. There is also a PhantomData<T>, and that conveys to the compiler "hey, even though you can assume (due to the #[may_dangle] T) that the destructor for Vec won't access data of T when the vector is dropped, it is still possible that some destructor of T itself will access data of T as the vector is dropped."
The end effect: Given Vec<T>, if T doesn't have a destructor, then the compiler provides you with more flexibility (namely, it allows a vector to hold data with references to data that lives for the same amount of time as the vector itself, even though such data may be torn down before the vector is). But if T does have a destructor (and that destructor is not otherwise communicating to the compiler that it won't access any referenced data), then the compiler is more strict, requiring any referenced data to strictly outlive the vector (thus ensuring that when the destructor for T runs, all the referenced data will still be valid).
If one wants to try to understand this via concrete exploration, you can try comparing how the compiler differs in its treatment of little container types that vary in their use of #[may_dangle] and PhantomData.
Here is some sample code I have whipped up to illustrate this:
// Illustration of a case where PhantomData is providing necessary ownership
// info to rustc.
//
// MyBox2<T> uses just a `*const T` to hold the `T` it owns.
// MyBox3<T> has both a `*const T` AND a PhantomData<T>; the latter communicates
// its ownership relationship with `T`.
//
// Skim down to `fn f2()` to see the relevant case,
// and compare it to `fn f3()`. When you run the program,
// the output will include:
//
// drop PrintOnDrop(mb2b, PrintOnDrop("v2b", 13, INVALID), Valid)
//
// (However, in the absence of #[may_dangle], the compiler will constrain
// things in a manner that may indeed imply that PhantomData is unnecessary;
// pnkfelix is not 100% sure of this claim yet, though.)
#![feature(alloc, dropck_eyepatch, generic_param_attrs, heap_api)]
extern crate alloc;
use alloc::heap;
use std::fmt;
use std::marker::PhantomData;
use std::mem;
use std::ptr;
#[derive(Copy, Clone, Debug)]
enum State { INVALID, Valid }
#[derive(Debug)]
struct PrintOnDrop<T: fmt::Debug>(&'static str, T, State);
impl<T: fmt::Debug> PrintOnDrop<T> {
fn new(name: &'static str, t: T) -> Self {
PrintOnDrop(name, t, State::Valid)
}
}
impl<T: fmt::Debug> Drop for PrintOnDrop<T> {
fn drop(&mut self) {
println!("drop PrintOnDrop({}, {:?}, {:?})",
self.0,
self.1,
self.2);
self.2 = State::INVALID;
}
}
struct MyBox1<T> {
v: Box<T>,
}
impl<T> MyBox1<T> {
fn new(t: T) -> Self {
MyBox1 { v: Box::new(t) }
}
}
struct MyBox2<T> {
v: *const T,
}
impl<T> MyBox2<T> {
fn new(t: T) -> Self {
unsafe {
let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
let p = p as *mut T;
ptr::write(p, t);
MyBox2 { v: p }
}
}
}
unsafe impl<#[may_dangle] T> Drop for MyBox2<T> {
fn drop(&mut self) {
unsafe {
// We want this to be *legal*. This destructor is not
// allowed to call methods on `T` (since it may be in
// an invalid state), but it should be allowed to drop
// instances of `T` as it deconstructs itself.
//
// (Note however that the compiler has no knowledge
// that `MyBox2<T>` owns an instance of `T`.)
ptr::read(self.v);
heap::deallocate(self.v as *mut u8,
mem::size_of::<T>(),
mem::align_of::<T>());
}
}
}
struct MyBox3<T> {
v: *const T,
_pd: PhantomData<T>,
}
impl<T> MyBox3<T> {
fn new(t: T) -> Self {
unsafe {
let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
let p = p as *mut T;
ptr::write(p, t);
MyBox3 { v: p, _pd: Default::default() }
}
}
}
unsafe impl<#[may_dangle] T> Drop for MyBox3<T> {
fn drop(&mut self) {
unsafe {
ptr::read(self.v);
heap::deallocate(self.v as *mut u8,
mem::size_of::<T>(),
mem::align_of::<T>());
}
}
}
fn f1() {
// `let (v, _mb1);` and `let (_mb1, v)` won't compile due to dropck
let v1; let _mb1;
v1 = PrintOnDrop::new("v1", 13);
_mb1 = MyBox1::new(PrintOnDrop::new("mb1", &v1));
}
fn f2() {
{
let (v2a, _mb2a); // Sound, but not distinguished from below by rustc!
v2a = PrintOnDrop::new("v2a", 13);
_mb2a = MyBox2::new(PrintOnDrop::new("mb2a", &v2a));
}
{
let (_mb2b, v2b); // Unsound!
v2b = PrintOnDrop::new("v2b", 13);
_mb2b = MyBox2::new(PrintOnDrop::new("mb2b", &v2b));
// namely, v2b dropped before _mb2b, but latter contains
// value that attempts to access v2b when being dropped.
}
}
fn f3() {
let v3; let _mb3; // `let (v, mb3);` won't compile due to dropck
v3 = PrintOnDrop::new("v3", 13);
_mb3 = MyBox3::new(PrintOnDrop::new("mb3", &v3));
}
fn main() {
f1(); f2(); f3();
}
Caveat emptor — I'm not that strong in the extremely deep theory that truly answers your question. I'm just a layperson who has used Rust a bit and has read the related RFCs. Always refer back to those original sources for a less-diluted version of the truth.
RFC 769 introduced the actual The Drop-Check Rule:
Let v be some value (either temporary or named) and 'a be some
lifetime (scope); if the type of v owns data of type D, where (1.)
D has a lifetime- or type-parametric Drop implementation, and (2.)
the structure of D can reach a reference of type &'a _, and (3.)
either:
(A.) the Drop impl for D instantiates D at 'a
directly, i.e. D<'a>, or,
(B.) the Drop impl for D has some type parameter with a
trait bound T where T is a trait that has at least
one method,
then 'a must strictly outlive the scope of v.
It then goes further to define some of those terms, including what it means for one type to own another. This goes further to mention PhantomData specifically:
Therefore, as an additional special case to the criteria above for when the type E owns data of type D, we include:
If E is PhantomData<T>, then recurse on T.
A key problem occurs when two variables are defined at the same time:
struct Noisy<'a>(&'a str);
impl<'a> Drop for Noisy<'a> {
fn drop(&mut self) { println!("Dropping {}", self.0 )}
}
fn main() -> () {
let (mut v, s) = (Vec::new(), "hi".to_string());
let noisy = Noisy(&s);
v.push(noisy);
}
As I understand it, without The Drop-Check Rule and indicating that Vec owns Noisy, code like this might compile. When the Vec is dropped, the drop implementation could access an invalid reference; introducing unsafety.
Returning to your points:
If I implement Drop for a struct and manually destroy all Ts in it, why should I care if compiler knows that my struct owns some Ts?
The compiler must know that you own the value because you can/will call drop. Since the implementation of drop is arbitrary, if you are going to call it, the compiler must forbid you from accepting values that would cause unsafe behavior during drop.
Always remember that any arbitrary T can be a value, a reference, a value containing a reference, etc. When trying to puzzle out these types of things, it's important to try to use the most complicated variant for any thought experiments.
All of that should provide enough pieces to connect-the-dots; for full understanding, reading the RFC a few times is probably better than relying on my flawed interpretation.
Then it gets more complicated. RFC 1238 further modifies The Drop-Check Rule, removing this specific reasoning. It does say:
parametricity is a necessary but not sufficient condition to justify the inferences that dropck makes
Continuing to use PhantomData seems the safest thing to do, but it may not be required. An anonymous Twitter benefactor pointed out this code:
use std::marker::PhantomData;
#[derive(Debug)] struct MyGeneric<T> { x: Option<T> }
#[derive(Debug)] struct MyDropper<T> { x: Option<T> }
#[derive(Debug)] struct MyHiddenDropper<T> { x: *const T }
#[derive(Debug)] struct MyHonestHiddenDropper<T> { x: *const T, boo: PhantomData<T> }
impl<T> Drop for MyDropper<T> { fn drop(&mut self) { } }
impl<T> Drop for MyHiddenDropper<T> { fn drop(&mut self) { } }
impl<T> Drop for MyHonestHiddenDropper<T> { fn drop(&mut self) { } }
fn main() {
// Does Compile! (magic annotation on destructor)
{
let (a, mut b) = (0, vec![]);
b.push(&a);
}
// Does Compile! (no destructor)
{
let (a, mut b) = (0, MyGeneric { x: None });
b.x = Some(&a);
}
// Doesn't Compile! (has destructor, no attribute)
{
let (a, mut b) = (0, MyDropper { x: None });
b.x = Some(&a);
}
{
let (a, mut b) = (0, MyHiddenDropper { x: 0 as *const _ });
b.x = &&a;
}
{
let (a, mut b) = (0, MyHonestHiddenDropper { x: 0 as *const _, boo: PhantomData });
b.x = &&a;
}
}
This suggests that the changes in RFC 1238 made the compiler more conservative, such that simply having a lifetime or type parameter is enough to prevent it from compiling.
You can also note that Vec doesn't have this problem because it uses the unsafe_destructor_blind_to_params attribute described in the the RFC.
I'm having trouble understanding the rules about traits in algebraic data types.
Here's a simplified example:
use std::rc::Rc;
use std::cell::RefCell;
trait Quack {
fn quack(&self);
}
struct Duck;
impl Quack for Duck {
fn quack(&self) { println!("Quack!"); }
}
fn main() {
let mut pond: Vec<Box<Quack>> = Vec::new();
let duck: Box<Duck> = Box::new(Duck);
pond.push(duck); // This is valid.
let mut lake: Vec<Rc<RefCell<Box<Quack>>>> = Vec::new();
let mallard: Rc<RefCell<Box<Duck>>> = Rc::new(RefCell::new(Box::new(Duck)));
lake.push(mallard); // This is a type mismatch.
}
The above fails to compile, yielding the following error message:
expected `alloc::rc::Rc<core::cell::RefCell<Box<Quack>>>`,
found `alloc::rc::Rc<core::cell::RefCell<Box<Duck>>>`
(expected trait Quack,
found struct `Duck`) [E0308]
src/main.rs:19 lake.push(mallard);
Why is it that pond.push(duck) is valid, yet lake.push(mallard) isn't? In both cases, a Duck has been supplied where a Quack was expected. In the former, the compiler is happy, but in the latter, it's not.
Is the reason for this difference related to CoerceUnsized?
This is a correct behavior, even if it is somewhat unfortunate.
In the first case we have this:
let mut pond: Vec<Box<Quack>> = Vec::new();
let duck: Box<Duck> = Box::new(Duck);
pond.push(duck);
Note that push(), when called on Vec<Box<Quack>>, accepts Box<Quack>, and you're passing Box<Duck>. This is OK - rustc is able to understand that you want to convert a boxed value to a trait object, like here:
let duck: Box<Duck> = Box::new(Duck);
let quack: Box<Quack> = duck; // automatic coercion to a trait object
In the second case we have this:
let mut lake: Vec<Rc<RefCell<Box<Quack>>>> = Vec::new();
let mallard: Rc<RefCell<Box<Duck>>> = Rc::new(RefCell::new(Box::new(Duck)));
lake.push(mallard);
Here push() accepts Rc<RefCell<Box<Quack>>> while you provide Rc<RefCell<Box<Duck>>>:
let mallard: Rc<RefCell<Box<Duck>>> = Rc::new(RefCell::new(Box::new(Duck)));
let quack: Rc<RefCell<Box<Quack>>> = mallard;
And now there is a trouble. Box<T> is a DST-compatible type, so it can be used as a container for a trait object. The same thing will soon be true for Rc and other smart pointers when this RFC is implemented. However, in this case there is no coercion from a concrete type to a trait object because Box<Duck> is inside of additional layers of types (Rc<RefCell<..>>).
Remember, trait object is a fat pointer, so Box<Duck> is different from Box<Quack> in size. Consequently, in principle, they are not directly compatible: you can't just take bytes of Box<Duck> and write them to where Box<Quack> is expected. Rust performs a special conversion, that is, it obtains a pointer to the virtual table for Duck, constructs a fat pointer and writes it to Box<Quack>-typed variable.
When you have Rc<RefCell<Box<Duck>>>, however, rustc would need to know how to construct and destructure both RefCell and Rc in order to apply the same fat pointer conversion to its internals. Naturally, because these are library types, it can't know how to do it. This is also true for any other wrapper type, e.g. Arc or Mutex or even Vec. You don't expect that it would be possible to use Vec<Box<Duck>> as Vec<Box<Quack>>, right?
Also there is a fact that in the example with Rc the Rcs created out of Box<Duck> and Box<Quack> wouldn't have been connected - they would have had different reference counters.
That is, a conversion from a concrete type to a trait object can only happen if you have direct access to a smart pointer which supports DST, not when it is hidden inside some other structure.
That said, I see how it may be possible to allow this for a few select types. For example, we could introduce some kind of Construct/Unwrap traits which are known to the compiler and which it could use to "reach" inside of a stack of wrappers and perform trait object conversion inside them. However, no one designed this thing and provided an RFC about it yet - probably because it is not a widely needed feature.
Vladimir's answer explained what the
compiler is doing. Based on that information, I developed a solution: Creating a wrapper
struct around Box<Quack>.
The wrapper is called QuackWrap. It has a fixed size, and it can be used just like any
other struct (I think). The Box inside QuackWrap allows me to build a QuackWrap
around any trait that implements Quack. Thus, I can have a Vec<Rc<RefCell<QuackWrap>>>
where the inner values are a mixture of Ducks, Gooses, etc.
use std::rc::Rc;
use std::cell::RefCell;
trait Quack {
fn quack(&self);
}
struct Duck;
impl Quack for Duck {
fn quack(&self) { println!("Quack!"); }
}
struct QuackWrap(Box<Quack>);
impl QuackWrap {
pub fn new<T: Quack + 'static>(value: T) -> QuackWrap {
QuackWrap(Box::new(value))
}
}
fn main() {
let mut pond: Vec<Box<Quack>> = Vec::new();
let duck: Box<Duck> = Box::new(Duck);
pond.push(duck); // This is valid.
// This would be a type error:
//let mut lake: Vec<Rc<RefCell<Box<Quack>>>> = Vec::new();
//let mallard: Rc<RefCell<Box<Duck>>> = Rc::new(RefCell::new(Box::new(Duck)));
//lake.push(mallard); // This is a type mismatch.
// Instead, we can do this:
let mut lake: Vec<Rc<RefCell<QuackWrap>>> = Vec::new();
let mallard: Rc<RefCell<QuackWrap>> = Rc::new(RefCell::new(QuackWrap::new(Duck)));
lake.push(mallard); // This is valid.
}
As an added convenience, I'll probably want to implement Deref and DefrefMut on
QuackWrap. But that's not necessary for the above example.