Creating a vector of mutable Trait objects (sometimes shared) in Rust

Creating a vector of mutable Trait objects (sometimes shared) in Rust - rust

I need to create a stack of pointers with the following constraints:
The pointers need to point to the same Trait object (so Box seems like a fit)
Those Trait objects may need to be modified (RefCell may need to be used?)
Two pointers in the stack may need to point to the same object (Rc seems like a fit)
Right now, the only way I've found to accommodate this is to use a Vec<Rc<RefCell<Box<dyn MyTrait>>>>. Is that the best solution though? It looks like a lot of pointer dereferences needed to access the objects.

I'm not quite sure what you exactly mean with:
The pointers need to point to the same Trait object (so Box seems like a fit)
But if you are interested in storing objects of actually different types, then you need trait-objects and those need to be behind some sort of pointer such as a Box. And a Box is generally a good default (but there are alternatives).
Those Trait objects may need to be modified (RefCell may need to be used?)
Well, actually, that could still be done with a Box.
Two pointers in the stack may need to point to the same object (Rc seems like a fit)
Here, it gets difficult because in Rust sharable and mutable are kind of exclude each other. To be sharable, we need an Rc, which you can think of as a shared box. Then to make it mutable anyway, we can use interior mutability by using a RefCell. So, essentially a Rc<RefCell<_>>, which you can think of as a sharable & mutable Box.
Finally, if you put it all together into a Vec you get: Vec<Rc<RefCell<dyn MyTrait>>> (no Box).
This allows you to have different types in the Vec, having some instances even multiple times in it, and still allowing mutable access to each of them.

Related

When should I use Pin<Arc<T>> in Rust?

I'm pretty new to Rust and have a couple different implementations of a method that includes a closure referencing self. To use the reference in the closure effectively, I've been using Arc<Self> (I am multithreading) and Pin<Arc<Self>>.
I would like to make this method as generally memory efficient as possible. I assume pinning the Arc in memory would help with this. However, (a) I've read that Arcs are pinned and (b) it seems like Pin<Arc<T>> may require additional allocations.
What is Pin<Arc<T>> good for?

Adding Pin around some pointer type does not change the behavior of the program. It only adds a restriction on what further code you can write (and even that, only if the T in Pin<Arc<T>> is not Unpin, which most types are).
Therefore, there is no "memory efficiency" to be gained by adding Pin.
The only use of Pin is to allow working with types that require they be pinned to use them, such as Futures.

How to have a vector of trait objects while maintaining references to said objects

I have a Vec<Box> and id like to store all my widgets in here for rendering but id also like to keep references to certain widgets. When i push the widgets into my vector they are moved and the borrow checker complains when i try to reference them. How can i get around this?

The issue you're seeing here is that you cannot mutate the vector while alos having references into it. This is a good guard to have as it can lead to dangling references.
There are a few ways to fix this, the first being reference counting with either Rc or Arc. This mutates your vector to be Vec<Rc/Arc<WidgetType>> or Vec<Rc/Arc<&dyn WidgetTrait>> if you're doing virtual calls. Then instead of referencing those widgets elsewhere you clone the reference counter. This can be done relativly easily but has the downsides of there being a pointer indirection added and the items can still exist even if they're removed from your render queue.
The next option is to store indexes/keys into the render list. This can be done with usize on a Vec as #trentcl said in the comments or an arbitray key with HashMap. This has the downsides of index movement in the case of a vector or a lot of hashing with a hashmap.
The best option in my opinion requires a library but covers all the other downsides. The library is slotmap. This creates a map with a vector backing that returns keys that you can replace your references with. There are a few options within it as to how you want the data to be stored and which opertions you want to be optimized, but that can all be seen in the documentaion. The main issue with this is that you have to add a dependancy which may not be preferable depending on your specific case.

Data to be determined later: interior mutability or separate HashMap?

I have a struct, call it Book, which let's say stores data on a book sold by a bookstore. It needs to be referenced at many places in some data structure (e.g. with Rc) and so cannot be borrowed mutably in the normal way. However, it has some attribute, say its price, that needs to be filled in at some time later than initialization, after the object already has outstanding references.
So far I can think of two ways to do this, but they both have disadvantages:
Interior mutability: give Book a field such as price: RefCell<Option<i32>> which is initialized to RefCell::new(Option::None) when Book is initialized. Later on, when we determine the price of the book, we can use borrow_mut to set price to Some(10) instead, and from then on we can borrow it to retrieve its value.
My sense is that in general, one wants to avoid interior mutability unless necessary, and it doesn't seem here like it ought to be all that necessary. This technique is also a little awkward because of the Option, which we need because the price won't have a value until later (and setting it to 0 or -1 in the meantime seems un-Rustlike), but which requires lots of matches or unwraps in places where we may be logically certain that the price will have already been filled in.
Separate table: don't store the price inside Book at all, but make a separate data structure to store it, e.g. price_table: HashMap<Rc<Book>, i32>. Have a function which creates and populates this table when prices are determined, and then pass it around by reference (mutably or not) to every function that needs to know or change the prices of books.
Coming from a C background as I do, the HashMap feels like unnecessary overhead both in speed and memory, for data that already has a natural place to live (inside Book) and "should" be accessible via a simple pointer chase. This solution also means I have to clutter up lots of functions with an additional argument that's a reference to price_table.
Is one of these two methods generally more idiomatic in Rust, or are there other approaches that avoid the dilemma? I did see Once, but I don't think it's what I want, because I'd still have to know at initialization time how to fill in price, and I don't know that.
Of course, in other applications, we may need some other type than i32 to represent our desired attribute, so I'd like to be able to handle the general case.

I think that your first approach is optimal for this situation. Since you have outstanding references to some data that you want to write to, you have to check the borrowing rules at runtime, so RefCell is the way to go.
Inside the RefCell, prefer an Option or a custom enum with variants like Price::NotSet and Price::Set(i32). If you are really sure, that all prices are initialized at some point, you could write a method price() that calls unwrap for you or does an assertion with better debug output in the case your RefCell contains a None.
I guess that the HashMap approach would be fine for this case, but if you wanted to have something that is not Copy as your value in there, you could run into the same problem, since there might be outstanding references into the map somewhere.
I agree that the HashMap would not be the idiomatic way to go here and still choose your first approach, even with i32 as the value type.
Edit:
As pointed out in the comments (thanks you!), there are two performance considerations for this situation. Firstly, if you really know, that the contained price is never zero, you can use std::num::NonZeroU16 and get the Option variant None for free (see documentation).
If you are dealing with a type that is Copy (e.g. i32), you should consider using Cell instead of RefCell, because it is lighter. For a more detailed comparison, see https://stackoverflow.com/a/30276150/13679671

Here are two more approaches.
Use Rc<RefCell<<Book>> everywhere, with price: Option<i32>> in the struct.
Declare a strict BookId(usize) and make a library: HashMap<BookId, Book>. Make all your references BookId and thus indirectly reference books through them everywhere you need to do so.

Understand smart pointers in Rust

I am a newbie to Rust and writing to understand the "Smart pointers" in Rust. I have basic understanding of how smart pointers works in C++ and has been using it for memory management since a few years ago. But to my very much surprise, Rust also provides such utility explicitly.
Because from a tutorial here (https://pcwalton.github.io/2013/03/18/an-overview-of-memory-management-in-rust.html), it seems that every raw pointers have been automatically wrapped with a smart pointer, which seems very reasonable. Then why do we still need such Box<T>, Rc<T>, and Ref<T> stuff? According to this specification: https://doc.rust-lang.org/book/ch15-00-smart-pointers.html
Any comments will be apprecicated a lot. Thanks.

You can think about the difference between a T and a Box<T> as the difference between a statically allocated object and a dynamically allocated object (the latter being created via a new expression in C++ terms).
In Rust, both T and Box<T> represent a variable that has ownership over the referent object (i.e. when the variable goes out of scope, the object will be destroyed, whether it was stored by value or by reference). On the contrary, &T and &mut T represent borrowing of the object (i.e. these variables are not responsible for destroying the object, and they cannot outlive the owner of the object).
By default, you'd probably want to use T, but sometimes you might want (or have) to use Box<T>. For example, you would use a Box<T> if you want to own a T that's too large to be allocated in place. You would also use it when the object doesn't have a known size at all, which means that your only choice to store it or pass it around is through the "pointer" (the Box<T>).
In Rust, an object is generally either mutable or aliased, but not both. If you have given out immutable references to an object, you normally need to wait until those references are over before you can mutate that object again.
Additionally, Rust's immutability is transitive. If you receive an object immutably, it means that you have access to its contents (and the contents of those contents, and so on) also immutably.
Normally, all of these things are enforced at compile time. This means that you catch errors faster, but you are limited to being able to express only what the compiler can prove statically.
Like T and Box<T>, you may sometimes use RefCell<T>, which is another ownership type. But unlike T and Box<T>, the RefCell<T> enforces the borrow checking rules at runtime instead of compile time, meaning that sometimes you can do things with it that are safe but wouldn't pass the compiler's static borrow checker. The main example for this is getting a mutable reference to the interior of an object that was received immutably (which, under the statically enforced rules of Rust, would make the entire interior immutable).
The types Ref<T> and RefMut<T> are the runtime-checked equivalents of &T and &mut T respectively.
(EDIT: This whole thing is somewhat of a lie. &mut really means "unique borrow" and & means "non-unique borrow". Certain types, like mutexes, can be non-uniquely but still mutably borrowed, because otherwise they would be useless.)
Rust's ownership model tries to push you to write programs in which objects' lifetimes are known at compile time. This works well in certain scenarios, but makes other scenarios difficult or impossible to express.
Rc<T> and its atomic sibling Arc<T> are reference-counting wrappers of T. They offer you an alternative to the ownership model.
They are useful when you want to use and properly dispose an object, but it is not easy (or possible) to determine, at the moment you're writing the code, which specific variable should be the owner of that object (and therefore should take care of disposing it). Much like in C++, this means that there is no single owner of the object and that the object will be disposed by the last reference-counting wrapper that points to it.

The article you linked uses outdated syntax. Certain smart pointers used to have special names and associated syntax that has been removed since some time before Rust 1.0:
Box<T> replaced ~T ("owned pointers")
Rc<T> replaced #T ("managed pointers")
Because the Internet never forgets, you can still find pre-1.0 documentation and articles (such as the one you linked) that use the old syntax. Check the date of the article: if it's before May 2015, you're dealing with an early, unstable Rust.

When should I use a reference instead of transferring ownership?

From the Rust book's chapter on ownership, non-copyable values can be passed to functions by either transferring ownership or by using a mutable or immutable reference. When you transfer ownership of a value, it can't be used in the original function anymore: you must return it back if you want to. When you pass a reference, you borrow the value and can still use it.
I come from languages where values are immutable by default (Haskell, Idris and the like). As such, I'd probably never think about using references at all. Having the same value in two places looks dangerous (or, at least, awkward) to me. Since references are a feature, there must be a reason to use them.
Are there situations I should force myself to use references? What are those situations and why are they beneficial? Or are they just for convenience and defaulting to passing ownership is fine?

Mutable references in particular look very dangerous.
They are not dangerous, because the Rust compiler will not let you do anything dangerous. If you have a &mut reference to a value then you cannot simultaneously have any other references to it.
In general you should pass references around. This saves copying memory and should be the default thing you do, unless you have a good reason to do otherwise.
Some good reasons to transfer ownership instead:
When the value's type is small in size, such as bool, u32, etc. It's often better performance to move/copy these values to avoid a level of indirection. Usually these values implement Copy, and actually the compiler may make this optimisation for you automatically. Something it's free to do because of a strong type system and immutability by default!
When the value's current owner is going to go out of scope, you may want to move the value somewhere else to keep it alive.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string