structs with boxed vs. unboxed closures

structs with boxed vs. unboxed closures - rust

I'm still internalizing closures in Rust and how to best work with them, so this question might be somewhat vague, and there will perhaps be silly sub-questions. I'm basically looking for proper idioms and maybe even transforming the way I think about how to do some stuff in Rust.
Storing unboxed closure
The Rust book has an example simple Cacher in it's chapter on closures:
struct Cacher<T>
where
T: Fn(u32) -> u32,
{
calculation: T,
value: Option<u32>,
}
impl<T> Cacher<T>
where
T: Fn(u32) -> u32,
{
fn new(calculation: T) -> Cacher<T> {
Cacher {
calculation,
value: None,
}
}
fn value(&mut self, arg: u32) -> u32 {
match self.value {
Some(v) => v,
None => {
let v = (self.calculation)(arg);
self.value = Some(v);
v
}
}
}
}
It is supposed to be used like this:
let mut c = Cacher::new(|a| a);
let v1 = c.value(1);
That is perfectly fine and useful, but what if I need to have this Cacher be a member of another struct, say (in the spirit of the Rust book chapter), a WorkoutFactory? Since Cacher is parameterized by the closure's type, I am forced to parameterize WorkoutFactory with the same closure type.
Is my understanding correct? I guess so, the Cacher struct structure depends on the type T of the calculation, so the struct WorkoutFactory structure must depend on the type of Cacher. On one hand this feels like a natural, unavoidable and perfectly justified consequence of how closures work in Rust, on the other hand it means that
WorkoutFactory can be contained in another struct that is also forced to be parameterized by T, which can be contained in another struct, ... - the closure type spreads like plague. With perhaps other T's coming from deep down the member hierarchy, the signature of the top-level struct can become monstrous.
the fact that there is some caching involved in WorkoutFactory should be just an implementation detail, perhaps the caching was even added at version 2.0, but the type parameter is visible in the public interface of WorkoutFactory and needs to be accounted for. The seemingly implementation detail is now part of the interface, no good :(
Is there some way to work around these problems without changing the signature of Cacher? How do others cope with this?
Storing boxed closure
If I want to get rid of the type parameters, I can Box the closure. I've come up with the following code:
struct BCacher {
calculation: Box<Fn(u32) -> u32>,
value: Option<u32>,
}
impl BCacher {
fn new<T: Fn(u32) -> u32 + 'static>(calculation: T) -> BCacher {
BCacher {
calculation: Box::new(calculation),
value: None,
}
}
fn value(&mut self, arg: u32) -> u32 {
match self.value {
Some(v) => v,
None => {
let v = (self.calculation)(arg);
self.value = Some(v);
v
}
}
}
}
I can use it exactly like Cacher:
let mut c = BCacher::new(|a| a);
let v1 = c.value(1);
...almost :( The 'static' annotation means I can't do this:
let x = 1;
let mut c = BCacher::new(|a| a + x);
because the closure may outlive x. That is unfortunate, something possible with the non-boxed version is no longer possible with the boxed version.
Furthermore, this version is less efficient, it is necessary to dereference the Box (is that correct?), and RAM access is slow. The difference will most probably be negligible in most cases, but still..
I could address the first issue with lifetime annotation:
struct BLCacher<'a> {
calculation: Box<Fn(u32) -> u32 + 'a>,
value: Option<u32>,
}
but now I'm back to Cacher with type parameters and all the unpleasant consequences of that.
The right to choose
This seems like an unfortunate situation. I have two approaches to storing closure in a struct, and each has it's own set of problems. Let's say I'm willing to live with that, and as the author of the awesome fictional Cacher crate, I want to present the users with both implementations of Cacher, the unboxed Cacher and the boxed BCacher. But I don't want to write the implementation twice. What would be the best way - if there's any at all - to use an existing Cacher implementation to implement BCacher?
On a related note (maybe it is even the same question), let's assume I have a
struct WorkoutFactory<T>
where
T: Fn(u32) -> u32,
{
cacher: Cacher<T>,
}
Is there a way to implement GymFactory without type parameters that would contain - for private purposes - WorkoutFactory with type parameters, probably stored in a Box?
Summary
A long question, sorry for that. Coming from Scala, working with closures is a LOT less straightforward in Rust. I hope I've explained the struggles I've not yet found satisfactory answers to.

Not a full answer sorry, but in this bit of code here:
let x = 1;
let mut c = BCacher::new(|a| a + x);
Why not change it to:
let x = 1;
let mut c = BCacher::new(move |a| a + x);
That way the functor will absorb and own 'x', so there won't be any reference to it, which should hopefully clear up all the other issues.

Related

What is a canonical approach for having lite concrete-specific behaviour in Rust? [duplicate]

How do I get Box<B> or &B or &Box<B> from the a variable in this code:
trait A {}
struct B;
impl A for B {}
fn main() {
let mut a: Box<dyn A> = Box::new(B);
let b = a as Box<B>;
}
This code returns an error:
error[E0605]: non-primitive cast: `std::boxed::Box<dyn A>` as `std::boxed::Box<B>`
--> src/main.rs:8:13
|
8 | let b = a as Box<B>;
| ^^^^^^^^^^^
|
= note: an `as` expression can only be used to convert between primitive types. Consider using the `From` trait

There are two ways to do downcasting in Rust. The first is to use Any. Note that this only allows you to downcast to the exact, original concrete type. Like so:
use std::any::Any;
trait A {
fn as_any(&self) -> &dyn Any;
}
struct B;
impl A for B {
fn as_any(&self) -> &dyn Any {
self
}
}
fn main() {
let a: Box<dyn A> = Box::new(B);
// The indirection through `as_any` is because using `downcast_ref`
// on `Box<A>` *directly* only lets us downcast back to `&A` again.
// The method ensures we get an `Any` vtable that lets us downcast
// back to the original, concrete type.
let b: &B = match a.as_any().downcast_ref::<B>() {
Some(b) => b,
None => panic!("&a isn't a B!"),
};
}
The other way is to implement a method for each "target" on the base trait (in this case, A), and implement the casts for each desired target type.
Wait, why do we need as_any?
Even if you add Any as a requirement for A, it's still not going to work correctly. The first problem is that the A in Box<dyn A> will also implement Any... meaning that when you call downcast_ref, you'll actually be calling it on the object type A. Any can only downcast to the type it was invoked on, which in this case is A, so you'll only be able to cast back down to &dyn A which you already had.
But there's an implementation of Any for the underlying type in there somewhere, right? Well, yes, but you can't get at it. Rust doesn't allow you to "cross cast" from &dyn A to &dyn Any.
That is what as_any is for; because it's something only implemented on our "concrete" types, the compiler doesn't get confused as to which one it's supposed to invoke. Calling it on an &dyn A causes it to dynamically dispatch to the concrete implementation (again, in this case, B::as_any), which returns an &dyn Any using the implementation of Any for B, which is what we want.
Note that you can side-step this whole problem by just not using A at all. Specifically, the following will also work:
fn main() {
let a: Box<dyn Any> = Box::new(B);
let _: &B = match a.downcast_ref::<B>() {
Some(b) => b,
None => panic!("&a isn't a B!")
};
}
However, this precludes you from having any other methods; all you can do here is downcast to a concrete type.
As a final note of potential interest, the mopa crate allows you to combine the functionality of Any with a trait of your own.

It should be clear that the cast can fail if there is another type C implementing A and you try to cast Box<C> into a Box<B>. I don't know your situation, but to me it looks a lot like you are bringing techniques from other languages, like Java, into Rust. I've never encountered this kind of Problem in Rust -- maybe your code design could be improved to avoid this kind of cast.
If you want, you can "cast" pretty much anything with mem::transmute. Sadly, we will have a problem if we just want to cast Box<A> to Box<B> or &A to &B because a pointer to a trait is a fat-pointer that actually consists of two pointers: One to the actual object, one to the vptr. If we're casting it to a struct type, we can just ignore the vptr. Please remember that this solution is highly unsafe and pretty hacky -- I wouldn't use it in "real" code.
let (b, vptr): (Box<B>, *const ()) = unsafe { std::mem::transmute(a) };
EDIT: Screw that, it's even more unsafe than I thought. If you want to do it correctly this way you'd have to use std::raw::TraitObject. This is still unstable though. I don't think that this is of any use to OP; don't use it!
There are better alternatives in this very similar question: How to match trait implementors

Rust: polymorphic calls for structs in a vector

I'm a complete newbie in Rust and I'm trying to get some understanding of the basics of the language.
Consider the following trait
trait Function {
fn value(&self, arg: &[f64]) -> f64;
}
and two structs implementing it:
struct Add {}
struct Multiply {}
impl Function for Add {
fn value(&self, arg: &[f64]) -> f64 {
arg[0] + arg[1]
}
}
impl Function for Multiply {
fn value(&self, arg: &[f64]) -> f64 {
arg[0] * arg[1]
}
}
In my main() function I want to group two instances of Add and Multiply in a vector, and then call the value method. The following works:
fn main() {
let x = vec![1.0, 2.0];
let funcs: Vec<&dyn Function> = vec![&Add {}, &Multiply {}];
for f in funcs {
println!("{}", f.value(&x));
}
}
And so does:
fn main() {
let x = vec![1.0, 2.0];
let funcs: Vec<Box<dyn Function>> = vec![Box::new(Add {}), Box::new(Multiply {})];
for f in funcs {
println!("{}", f.value(&x));
}
}
Is there any better / less verbose way? Can I work around wrapping the instances in a Box? What is the takeaway with trait objects in this case?

Is there any better / less verbose way?
There isn't really a way to make this less verbose. Since you are using trait objects, you need to tell the compiler that the vectors's items are dyn Function and not the concrete type. The compiler can't just infer that you meant dyn Function trait objects because there could have been other traits that Add and Multiply both implement.
You can't abstract out the calls to Box::new either. For that to work, you would have to somehow map over a heterogeneous collection, which isn't possible in Rust. However, if you are writing this a lot, you might consider adding helper constructor functions for each concrete impl:
impl Add {
fn new() -> Add {
Add {}
}
fn new_boxed() -> Box<Add> {
Box::new(Add::new())
}
}
It's idiomatic to include a new constructor wherever possible, but it's also common to include alternative convenience constructors.
This makes the construction of the vector a bit less noisy:
let funcs: Vec<Box<dyn Function>> = vec!(Add::new_boxed(), Multiply::new_boxed()));
What is the takeaway with trait objects in this case?
There is always a small performance hit with using dynamic dispatch. If all of your objects are the same type, they can be densely packed in memory, which can be much faster for iteration. In general, I wouldn't worry too much about this unless you are creating a library crate, or if you really want to squeeze out the last nanosecond of performance.

Why are borrows of struct members allowed in &mut self, but not of self to immutable methods?

If I have a struct that encapsulates two members, and updates one based on the other, that's fine as long as I do it this way:
struct A {
value: i64
}
impl A {
pub fn new() -> Self {
A { value: 0 }
}
pub fn do_something(&mut self, other: &B) {
self.value += other.value;
}
pub fn value(&self) -> i64 {
self.value
}
}
struct B {
pub value: i64
}
struct State {
a: A,
b: B
}
impl State {
pub fn new() -> Self {
State {
a: A::new(),
b: B { value: 1 }
}
}
pub fn do_stuff(&mut self) -> i64 {
self.a.do_something(&self.b);
self.a.value()
}
pub fn get_b(&self) -> &B {
&self.b
}
}
fn main() {
let mut state = State::new();
println!("{}", state.do_stuff());
}
That is, when I directly refer to self.b. But when I change do_stuff() to this:
pub fn do_stuff(&mut self) -> i64 {
self.a.do_something(self.get_b());
self.a.value()
}
The compiler complains: cannot borrow `*self` as immutable because `self.a` is also borrowed as mutable.
What if I need to do something more complex than just returning a member in order to get the argument for a.do_something()? Must I make a function that returns b by value and store it in a binding, then pass that binding to do_something()? What if b is complex?
More importantly to my understanding, what kind of memory-unsafety is the compiler saving me from here?

A key aspect of mutable references is that they are guaranteed to be the only way to access a particular value while they exist (unless they're reborrowed, which "disables" them temporarily).
When you write
self.a.do_something(&self.b);
the compiler is able to see that the borrow on self.a (which is taken implicitly to perform the method call) is distinct from the borrow on self.b, because it can reason about direct field accesses.
However, when you write
self.a.do_something(self.get_b());
then the compiler doesn't see a borrow on self.b, but rather a borrow on self. That's because lifetime parameters on method signatures cannot propagate such detailed information about borrows. Therefore, the compiler cannot guarantee that the value returned by self.get_b() doesn't give you access to self.a, which would create two references that can access self.a, one of them being mutable, which is illegal.
The reason field borrows don't propagate across functions is to simplify type checking and borrow checking (for machines and for humans). The principle is that the signature should be sufficient for performing those tasks: changing the implementation of a function should not cause errors in its callers.
What if I need to do something more complex than just returning a member in order to get the argument for a.do_something()?
I would move get_b from State to B and call get_b on self.b. This way, the compiler can see the distinct borrows on self.a and self.b and will accept the code.
self.a.do_something(self.b.get_b());

Yes, the compiler isolates functions for the purposes of the safety checks it makes. If it didn't, then every function would essentially have to be inlined everywhere. No one would appreciate this for at least two reasons:
Compile times would go through the roof, and many opportunities for parallelization would have to be discarded.
Changes to a function N calls away could affect the current function. See also Why are explicit lifetimes needed in Rust? which touches on the same concept.
what kind of memory-unsafety is the compiler saving me from here
None, really. In fact, it could be argued that it's creating false positives, as your example shows.
It's really more of a benefit for preserving programmer sanity.
The general advice that I give and follow when I encounter this problem is that the compiler is guiding you to discovering a new type in your existing code.
Your particular example is a bit too simplified for this to make sense, but if you had struct Foo(A, B, C) and found that a method on Foo needed A and B, that's often a good sign that there's a hidden type composed of A and B: struct Foo(Bar, C); struct Bar(A, B).
This isn't a silver bullet as you can end up with methods that need each pair of data, but in my experience it works the majority of the time.

Why is it useful to use PhantomData to inform the compiler that a struct owns a generic if I already implement Drop?

In the Rustonomicon's guide to PhantomData, there is a part about what happens if a Vec-like struct has *const T field, but no PhantomData<T>:
The drop checker will generously determine that Vec<T> does not own any values of type T. This will in turn make it conclude that it doesn't need to worry about Vec dropping any T's in its destructor for determining drop check soundness. This will in turn allow people to create unsoundness using Vec's destructor.
What does it mean? If I implement Drop for a struct and manually destroy all Ts in it, why should I care if compiler knows that my struct owns some Ts?

The PhantomData<T> within Vec<T> (held indirectly via a Unique<T> within RawVec<T>) communicates to the compiler that the vector may own instances of T, and therefore the vector may run destructors for T when the vector is dropped.
Deep dive: We have a combination of factors here:
We have a Vec<T> which has an impl Drop (i.e. a destructor implementation).
Under the rules of RFC 1238, this would usually imply a relationship between instances of Vec<T> and any lifetimes that occur within T, by requiring that all lifetimes within T strictly outlive the vector.
However, the destructor for Vec<T> specifically opts out of this semantics for just that destructor (of Vec<T> itself) via the use of special unstable attributes (see RFC 1238 and RFC 1327). This allows for a vector to hold references that have the same lifetime of the vector itself. This is considered sound; after all, the vector itself will not dereference data pointed to by such references (all its doing is dropping values and deallocating the backing array), as long as an important caveat holds.
The important caveat: While the vector itself will not dereference pointers within its contained values while destructing itself, it will drop the values held by the vector. If those values of type T themselves have destructors, those destructors for T get run. And if those destructors access the data held within their references, then we would have a problem if we allowed dangling pointers within those references.
So, diving in even more deeply: the way that we confirm dropck validity for a given structure S, we first double check if S itself has an impl Drop for S (and if so, we enforce rules on S with respect to its type parameters). But even after that step, we then recursively descend into the structure of S itself, and double check for each of its fields that everything is kosher according to dropck. (Note that we do this even if a type parameter of S is tagged with #[may_dangle].)
In this specific case, we have a Vec<T> which (indirectly via RawVec<T>/Unique<T>) owns a collection of values of type T, represented in a raw pointer *const T. However, the compiler attaches no ownership semantics to *const T; that field alone in a structure S implies no relationship between S and T, and thus enforces no constraint in terms of the relationship of lifetimes within the types S and T (at least from the viewpoint of dropck).
Therefore, if the Vec<T> had solely a *const T, the recursive descent into the structure of the vector would fail to capture the ownership relation between the vector and the instances of T contained within the vector. That, combined with the #[may_dangle] attribute on T, would cause the compiler to accept unsound code (namely cases where destructors for T end up trying to access data that has already been deallocated).
BUT: Vec<T> does not solely contain a *const T. There is also a PhantomData<T>, and that conveys to the compiler "hey, even though you can assume (due to the #[may_dangle] T) that the destructor for Vec won't access data of T when the vector is dropped, it is still possible that some destructor of T itself will access data of T as the vector is dropped."
The end effect: Given Vec<T>, if T doesn't have a destructor, then the compiler provides you with more flexibility (namely, it allows a vector to hold data with references to data that lives for the same amount of time as the vector itself, even though such data may be torn down before the vector is). But if T does have a destructor (and that destructor is not otherwise communicating to the compiler that it won't access any referenced data), then the compiler is more strict, requiring any referenced data to strictly outlive the vector (thus ensuring that when the destructor for T runs, all the referenced data will still be valid).
If one wants to try to understand this via concrete exploration, you can try comparing how the compiler differs in its treatment of little container types that vary in their use of #[may_dangle] and PhantomData.
Here is some sample code I have whipped up to illustrate this:
// Illustration of a case where PhantomData is providing necessary ownership
// info to rustc.
//
// MyBox2<T> uses just a `*const T` to hold the `T` it owns.
// MyBox3<T> has both a `*const T` AND a PhantomData<T>; the latter communicates
// its ownership relationship with `T`.
//
// Skim down to `fn f2()` to see the relevant case,
// and compare it to `fn f3()`. When you run the program,
// the output will include:
//
// drop PrintOnDrop(mb2b, PrintOnDrop("v2b", 13, INVALID), Valid)
//
// (However, in the absence of #[may_dangle], the compiler will constrain
// things in a manner that may indeed imply that PhantomData is unnecessary;
// pnkfelix is not 100% sure of this claim yet, though.)
#![feature(alloc, dropck_eyepatch, generic_param_attrs, heap_api)]
extern crate alloc;
use alloc::heap;
use std::fmt;
use std::marker::PhantomData;
use std::mem;
use std::ptr;
#[derive(Copy, Clone, Debug)]
enum State { INVALID, Valid }
#[derive(Debug)]
struct PrintOnDrop<T: fmt::Debug>(&'static str, T, State);
impl<T: fmt::Debug> PrintOnDrop<T> {
fn new(name: &'static str, t: T) -> Self {
PrintOnDrop(name, t, State::Valid)
}
}
impl<T: fmt::Debug> Drop for PrintOnDrop<T> {
fn drop(&mut self) {
println!("drop PrintOnDrop({}, {:?}, {:?})",
self.0,
self.1,
self.2);
self.2 = State::INVALID;
}
}
struct MyBox1<T> {
v: Box<T>,
}
impl<T> MyBox1<T> {
fn new(t: T) -> Self {
MyBox1 { v: Box::new(t) }
}
}
struct MyBox2<T> {
v: *const T,
}
impl<T> MyBox2<T> {
fn new(t: T) -> Self {
unsafe {
let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
let p = p as *mut T;
ptr::write(p, t);
MyBox2 { v: p }
}
}
}
unsafe impl<#[may_dangle] T> Drop for MyBox2<T> {
fn drop(&mut self) {
unsafe {
// We want this to be *legal*. This destructor is not
// allowed to call methods on `T` (since it may be in
// an invalid state), but it should be allowed to drop
// instances of `T` as it deconstructs itself.
//
// (Note however that the compiler has no knowledge
// that `MyBox2<T>` owns an instance of `T`.)
ptr::read(self.v);
heap::deallocate(self.v as *mut u8,
mem::size_of::<T>(),
mem::align_of::<T>());
}
}
}
struct MyBox3<T> {
v: *const T,
_pd: PhantomData<T>,
}
impl<T> MyBox3<T> {
fn new(t: T) -> Self {
unsafe {
let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
let p = p as *mut T;
ptr::write(p, t);
MyBox3 { v: p, _pd: Default::default() }
}
}
}
unsafe impl<#[may_dangle] T> Drop for MyBox3<T> {
fn drop(&mut self) {
unsafe {
ptr::read(self.v);
heap::deallocate(self.v as *mut u8,
mem::size_of::<T>(),
mem::align_of::<T>());
}
}
}
fn f1() {
// `let (v, _mb1);` and `let (_mb1, v)` won't compile due to dropck
let v1; let _mb1;
v1 = PrintOnDrop::new("v1", 13);
_mb1 = MyBox1::new(PrintOnDrop::new("mb1", &v1));
}
fn f2() {
{
let (v2a, _mb2a); // Sound, but not distinguished from below by rustc!
v2a = PrintOnDrop::new("v2a", 13);
_mb2a = MyBox2::new(PrintOnDrop::new("mb2a", &v2a));
}
{
let (_mb2b, v2b); // Unsound!
v2b = PrintOnDrop::new("v2b", 13);
_mb2b = MyBox2::new(PrintOnDrop::new("mb2b", &v2b));
// namely, v2b dropped before _mb2b, but latter contains
// value that attempts to access v2b when being dropped.
}
}
fn f3() {
let v3; let _mb3; // `let (v, mb3);` won't compile due to dropck
v3 = PrintOnDrop::new("v3", 13);
_mb3 = MyBox3::new(PrintOnDrop::new("mb3", &v3));
}
fn main() {
f1(); f2(); f3();
}

Caveat emptor — I'm not that strong in the extremely deep theory that truly answers your question. I'm just a layperson who has used Rust a bit and has read the related RFCs. Always refer back to those original sources for a less-diluted version of the truth.
RFC 769 introduced the actual The Drop-Check Rule:
Let v be some value (either temporary or named) and 'a be some
lifetime (scope); if the type of v owns data of type D, where (1.)
D has a lifetime- or type-parametric Drop implementation, and (2.)
the structure of D can reach a reference of type &'a _, and (3.)
either:
(A.) the Drop impl for D instantiates D at 'a
directly, i.e. D<'a>, or,
(B.) the Drop impl for D has some type parameter with a
trait bound T where T is a trait that has at least
one method,
then 'a must strictly outlive the scope of v.
It then goes further to define some of those terms, including what it means for one type to own another. This goes further to mention PhantomData specifically:
Therefore, as an additional special case to the criteria above for when the type E owns data of type D, we include:
If E is PhantomData<T>, then recurse on T.
A key problem occurs when two variables are defined at the same time:
struct Noisy<'a>(&'a str);
impl<'a> Drop for Noisy<'a> {
fn drop(&mut self) { println!("Dropping {}", self.0 )}
}
fn main() -> () {
let (mut v, s) = (Vec::new(), "hi".to_string());
let noisy = Noisy(&s);
v.push(noisy);
}
As I understand it, without The Drop-Check Rule and indicating that Vec owns Noisy, code like this might compile. When the Vec is dropped, the drop implementation could access an invalid reference; introducing unsafety.
Returning to your points:
If I implement Drop for a struct and manually destroy all Ts in it, why should I care if compiler knows that my struct owns some Ts?
The compiler must know that you own the value because you can/will call drop. Since the implementation of drop is arbitrary, if you are going to call it, the compiler must forbid you from accepting values that would cause unsafe behavior during drop.
Always remember that any arbitrary T can be a value, a reference, a value containing a reference, etc. When trying to puzzle out these types of things, it's important to try to use the most complicated variant for any thought experiments.
All of that should provide enough pieces to connect-the-dots; for full understanding, reading the RFC a few times is probably better than relying on my flawed interpretation.
Then it gets more complicated. RFC 1238 further modifies The Drop-Check Rule, removing this specific reasoning. It does say:
parametricity is a necessary but not sufficient condition to justify the inferences that dropck makes
Continuing to use PhantomData seems the safest thing to do, but it may not be required. An anonymous Twitter benefactor pointed out this code:
use std::marker::PhantomData;
#[derive(Debug)] struct MyGeneric<T> { x: Option<T> }
#[derive(Debug)] struct MyDropper<T> { x: Option<T> }
#[derive(Debug)] struct MyHiddenDropper<T> { x: *const T }
#[derive(Debug)] struct MyHonestHiddenDropper<T> { x: *const T, boo: PhantomData<T> }
impl<T> Drop for MyDropper<T> { fn drop(&mut self) { } }
impl<T> Drop for MyHiddenDropper<T> { fn drop(&mut self) { } }
impl<T> Drop for MyHonestHiddenDropper<T> { fn drop(&mut self) { } }
fn main() {
// Does Compile! (magic annotation on destructor)
{
let (a, mut b) = (0, vec![]);
b.push(&a);
}
// Does Compile! (no destructor)
{
let (a, mut b) = (0, MyGeneric { x: None });
b.x = Some(&a);
}
// Doesn't Compile! (has destructor, no attribute)
{
let (a, mut b) = (0, MyDropper { x: None });
b.x = Some(&a);
}
{
let (a, mut b) = (0, MyHiddenDropper { x: 0 as *const _ });
b.x = &&a;
}
{
let (a, mut b) = (0, MyHonestHiddenDropper { x: 0 as *const _, boo: PhantomData });
b.x = &&a;
}
}
This suggests that the changes in RFC 1238 made the compiler more conservative, such that simply having a lifetime or type parameter is enough to prevent it from compiling.
You can also note that Vec doesn't have this problem because it uses the unsafe_destructor_blind_to_params attribute described in the the RFC.

Returning a closure from a function

Note: This question was asked before Rust's first stable release. There have been lots of changes since and the syntax used in the function is not even valid anymore. Still, Shepmaster's answer is excellent and makes this question worth keeping.
Finally unboxed closures have landed, so I am experimenting with them to see what you can do.
I have this simple function:
fn make_adder(a: int, b: int) -> || -> int {
|| a + b
}
However, I get a missing lifetime specifier [E0106] error. I have tried to fix this by changing the return type to ||: 'static -> int, but then I get another error cannot infer an appropriate lifetime due to conflicting requirements.
If I understand correctly, the closure is unboxed so it owns a and b. It seems very strange to me that it needs a lifetime. How can I fix this?

As of Rust 1.26, you can use impl trait:
fn make_adder(a: i32) -> impl Fn(i32) -> i32 {
move |b| a + b
}
fn main() {
println!("{}", make_adder(1)(2));
}
This allows returning an unboxed closure even though it is impossible to specify the exact type of the closure.
This will not help you if any of these are true:
You are targeting Rust before this version
You have any kind of conditional in your function:
fn make_adder(a: i32) -> impl Fn(i32) -> i32 {
if a > 0 {
move |b| a + b
} else {
move |b| a - b
}
}
Here, there isn't a single return type; each closure has a unique, un-namable type.
You need to be able to name the returned type for any reason:
struct Example<F>(F);
fn make_it() -> Example<impl Fn()> {
Example(|| println!("Hello"))
}
fn main() {
let unnamed_type_ok = make_it();
let named_type_bad: /* No valid type here */ = make_it();
}
You cannot (yet) use impl SomeTrait as a variable type.
In these cases, you need to use indirection. The common solution is a trait object, as described in the other answer.

It is possible to return closures inside Boxes, that is, as trait objects implementing certain trait:
fn make_adder(a: i32) -> Box<dyn Fn(i32) -> i32> {
Box::new(move |b| a + b)
}
fn main() {
println!("{}", make_adder(1)(2));
}
(try it here)
There is also an RFC (its tracking issue) on adding unboxed abstract return types which would allow returning closures by value, without boxes, but this RFC was postponed. According to discussion in that RFC, it seems that some work is done on it recently, so it is possible that unboxed abstract return types will be available relatively soon.

The || syntax is still the old boxed closures, so this doesn't work for the same reason it didn't previously.
And, it won't work even using the correct boxed closure syntax |&:| -> int, since it is literally is just sugar for certain traits. At the moment, the sugar syntax is |X: args...| -> ret, where the X can be &, &mut or nothing, corresponding to the Fn, FnMut, FnOnce traits, you can also write Fn<(args...), ret> etc. for the non-sugared form. The sugar is likely to be changing (possibly something like Fn(args...) -> ret).
Each unboxed closure has a unique, unnameable type generated internally by the compiler: the only way to talk about unboxed closures is via generics and trait bounds. In particular, writing
fn make_adder(a: int, b: int) -> |&:| -> int {
|&:| a + b
}
is like writing
fn make_adder(a: int, b: int) -> Fn<(), int> {
|&:| a + b
}
i.e. saying that make_adder returns an unboxed trait value; which doesn't make much sense at the moment. The first thing to try would be
fn make_adder<F: Fn<(), int>>(a: int, b: int) -> F {
|&:| a + b
}
but this is saying that make_adder is returning any F that the caller chooses, while we want to say it returns some fixed (but "hidden") type. This required abstract return types, which says, basically, "the return value implements this trait" while still being unboxed and statically resolved. In the language of that (temporarily closed) RFC,
fn make_adder(a: int, b: int) -> impl Fn<(), int> {
|&:| a + b
}
Or with the closure sugar.
(Another minor point: I'm not 100% sure about unboxed closures, but the old closures certainly still capture things by-reference which is another thing that sinks the code as proposed in the issue. This is being rectified in #16610.)

Here's how to implement a closure based counter:
fn counter() -> impl FnMut() -> i32 {
let mut value = 0;
move || -> i32 {
value += 1;
return value;
}
}
fn main() {
let mut incre = counter();
println!("Count 1: {}", incre());
println!("Count 2: {}", incre());
}

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

structs with boxed vs. unboxed closures - rust

Related

What is a canonical approach for having lite concrete-specific behaviour in Rust? [duplicate]

Rust: polymorphic calls for structs in a vector

Why are borrows of struct members allowed in &mut self, but not of self to immutable methods?

Why is it useful to use PhantomData to inform the compiler that a struct owns a generic if I already implement Drop?

Returning a closure from a function

Categories

Resources