Data to be determined later: interior mutability or separate HashMap?


I have a struct, call it Book, which let's say stores data on a book sold by a bookstore. It needs to be referenced at many places in some data structure (e.g. with Rc) and so cannot be borrowed mutably in the normal way. However, it has some attribute, say its price, that needs to be filled in at some time later than initialization, after the object already has outstanding references.
So far I can think of two ways to do this, but they both have disadvantages:
Interior mutability: give Book a field such as price: RefCell<Option<i32>> which is initialized to RefCell::new(Option::None) when Book is initialized. Later on, when we determine the price of the book, we can use borrow_mut to set price to Some(10) instead, and from then on we can borrow it to retrieve its value.
My sense is that in general, one wants to avoid interior mutability unless necessary, and it doesn't seem here like it ought to be all that necessary. This technique is also a little awkward because of the Option, which we need because the price won't have a value until later (and setting it to 0 or -1 in the meantime seems un-Rustlike), but which requires lots of matches or unwraps in places where we may be logically certain that the price will have already been filled in.
Separate table: don't store the price inside Book at all, but make a separate data structure to store it, e.g. price_table: HashMap<Rc<Book>, i32>. Have a function which creates and populates this table when prices are determined, and then pass it around by reference (mutably or not) to every function that needs to know or change the prices of books.
Coming from a C background as I do, the HashMap feels like unnecessary overhead both in speed and memory, for data that already has a natural place to live (inside Book) and "should" be accessible via a simple pointer chase. This solution also means I have to clutter up lots of functions with an additional argument that's a reference to price_table.
Is one of these two methods generally more idiomatic in Rust, or are there other approaches that avoid the dilemma? I did see Once, but I don't think it's what I want, because I'd still have to know at initialization time how to fill in price, and I don't know that.
Of course, in other applications, we may need some other type than i32 to represent our desired attribute, so I'd like to be able to handle the general case.

I think that your first approach is optimal for this situation. Since you have outstanding references to some data that you want to write to, you have to check the borrowing rules at runtime, so RefCell is the way to go.
Inside the RefCell, prefer an Option or a custom enum with variants like Price::NotSet and Price::Set(i32). If you are really sure that all prices are initialized at some point, you could write a method price() that calls unwrap for you, or that asserts with better debug output in case your RefCell contains a None.
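A minimal sketch of that first approach, under some assumptions: the title field, the set_price and price method names, and the shared_price_demo helper are all made up for illustration and are not part of the question. It shows the price being set through one Rc handle and read through another, with the unwrap kept in one place.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical Book type; the field and method names are illustrative only.
struct Book {
    title: String,
    // Unknown at construction time, filled in later through a shared reference.
    price: RefCell<Option<i32>>,
}

impl Book {
    fn new(title: &str) -> Self {
        Book {
            title: title.to_string(),
            price: RefCell::new(None),
        }
    }

    // Takes &self, so it can be called through any Rc<Book>.
    fn set_price(&self, price: i32) {
        *self.price.borrow_mut() = Some(price);
    }

    // Keeps the unwrap in one place and panics with a descriptive message
    // if the price is read before it was set.
    fn price(&self) -> i32 {
        let current = *self.price.borrow();
        current.unwrap_or_else(|| panic!("price of {:?} read before it was set", self.title))
    }
}

// Shares one book between two handles, sets the price through one of them
// and reads it back through the other.
fn shared_price_demo() -> i32 {
    let book = Rc::new(Book::new("Dune"));
    let shelf = Rc::clone(&book);
    let catalog = Rc::clone(&book);
    shelf.set_price(10);
    catalog.price()
}
```

Callers that are logically certain the price exists call price() directly; callers that are not can still inspect the Option through the field.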
I guess that the HashMap approach would be fine for this case, but if you wanted to have something that is not Copy as your value in there, you could run into the same problem, since there might be outstanding references into the map somewhere.
I agree that the HashMap would not be the idiomatic way to go here and still choose your first approach, even with i32 as the value type.
Edit:
As pointed out in the comments (thank you!), there are two performance considerations for this situation. Firstly, if you really know that the contained price is never zero, you can use std::num::NonZeroU16 and get the Option variant None for free (see documentation).
If you are dealing with a type that is Copy (e.g. i32), you should consider using Cell instead of RefCell, because it is lighter. For a more detailed comparison, see https://stackoverflow.com/a/30276150/13679671
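Both points can be combined in a short sketch. It assumes the price fits in a u16 and is never zero, which is not something the question states; the method names and the option_is_free helper are hypothetical.

```rust
use std::cell::Cell;
use std::mem::size_of;
use std::num::NonZeroU16;

// Hypothetical Book whose price is assumed to be a non-zero u16.
struct Book {
    // Cell is enough here because Option<NonZeroU16> is Copy.
    price: Cell<Option<NonZeroU16>>,
}

impl Book {
    fn new() -> Self {
        Book { price: Cell::new(None) }
    }

    // Stores the price and returns true; returns false and stores nothing
    // if `price` is zero.
    fn set_price(&self, price: u16) -> bool {
        match NonZeroU16::new(price) {
            Some(p) => {
                self.price.set(Some(p));
                true
            }
            None => false,
        }
    }

    // Copies the value out; there is no runtime borrow tracking with Cell.
    fn price(&self) -> Option<u16> {
        self.price.get().map(NonZeroU16::get)
    }
}

// The None variant reuses the forbidden zero bit pattern, so the Option
// takes no extra space compared to a plain u16.
fn option_is_free() -> bool {
    size_of::<Option<NonZeroU16>>() == size_of::<u16>()
}
```

Unlike RefCell, Cell never hands out references to its contents, so there is nothing that could panic at runtime when setting the price.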

Here are two more approaches.
Use Rc<RefCell<Book>> everywhere, with price: Option<i32> in the struct.
Declare a struct BookId(usize) and make a library: HashMap<BookId, Book>. Make all your references BookId and thus indirectly reference books through them everywhere you need to do so.
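The second of these approaches might look roughly like the sketch below. The Library wrapper, its next_id counter, and the method names are assumptions added for illustration; the answer only names BookId and the HashMap.

```rust
use std::collections::HashMap;

// A cheap, copyable handle that stands in for a reference to a Book.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct BookId(usize);

struct Book {
    title: String,
    price: Option<i32>,
}

// Hypothetical owner of all books; everything else stores BookId values.
struct Library {
    books: HashMap<BookId, Book>,
    next_id: usize,
}

impl Library {
    fn new() -> Self {
        Library { books: HashMap::new(), next_id: 0 }
    }

    // Inserts a book without a price and hands back its id.
    fn add(&mut self, title: &str) -> BookId {
        let id = BookId(self.next_id);
        self.next_id += 1;
        self.books.insert(id, Book { title: title.to_string(), price: None });
        id
    }

    // Ordinary &mut access, no interior mutability; returns false for an unknown id.
    fn set_price(&mut self, id: BookId, price: i32) -> bool {
        match self.books.get_mut(&id) {
            Some(book) => {
                book.price = Some(price);
                true
            }
            None => false,
        }
    }

    fn price(&self, id: BookId) -> Option<i32> {
        self.books.get(&id).and_then(|book| book.price)
    }

    fn title(&self, id: BookId) -> Option<&str> {
        self.books.get(&id).map(|book| book.title.as_str())
    }
}
```

The trade-off is the one raised in the question: every function that needs a price has to be given access to the Library, but in exchange the borrow checker verifies all mutation at compile time.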

Related

Rust Box vs non-box

Given a rust object, is it possible to wrap it so that multiple references and a mutable reference are allowed but do not cause problems?
For example, a Vec that has multiple references and a single mutable reference.

Yes, but...
The type you're looking for is RefCell, but read on before jumping the gun!
Rust is a single-ownership language. It always will be. It's exactly that feature that makes Rust as thread-safe and memory-safe as it is. You cannot fully circumvent this, short of wrapping your entire program in unsafe and using raw pointers exclusively, and if you're going to do that, just write C since you're no longer getting any benefits out of using Rust.
So, at any given moment in your program, there must either be one thing writing to this memory or several things reading. That's the fundamental law of single-ownership. Keep that in mind; you cannot get around that. What I'm about to say still follows that rule.
Usually, we enforce this with our type signatures. If I take a &T, then I'm just an alias and won't write to it. If I take a &mut T, then nobody else can see what I'm doing till I forfeit that reference. That's usually good enough, and if we can, we want to do it that way, since we get guarantees at compile-time.
But it doesn't always work that way. Sometimes we can't prove that what we're doing is okay. Sometimes I've got two functions holding an, ostensibly, mutable reference, but I know, due to some other guarantees Rust doesn't know about, that only one will be writing to it at a time. Enter RefCell. RefCell<T> contains a single T and pretends to be immutable but lets you borrow the thing inside either mutably or immutably with try_borrow_mut and try_borrow. When we call one of these functions, we get a reference-like value that can read (and write, in the mutable case) to the original data, even though we started with a &RefCell<T> that doesn't look mutable.
But the fundamental law still holds. Note that those try_* functions return a Result, i.e. they might fail. If two functions simultaneously try to get try_borrow_mut references, the second one will fail, and it's your job to deal with that eventuality (even if "deal with that" means panic! in your particular use case). All we've done is move the single-ownership rules from compile-time to runtime. We haven't gotten rid of them; we've just changed who's responsible for enforcing them.
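To make the runtime check concrete, here is a small sketch (the function name overlapping_borrows is made up) in which a second try_borrow_mut is attempted while the first guard is still alive.

```rust
use std::cell::RefCell;

// Returns (first borrow succeeded, second borrow succeeded, final contents).
fn overlapping_borrows() -> (bool, bool, Vec<i32>) {
    let cell = RefCell::new(vec![1, 2, 3]);

    // The first mutable borrow succeeds and its guard stays alive.
    let first = cell.try_borrow_mut();
    // The second attempt overlaps with the first, so it returns an Err.
    let second = cell.try_borrow_mut();

    let first_ok = first.is_ok();
    let second_ok = second.is_ok();
    drop(second);

    // Write through the one valid guard; it is released at the end of the block.
    if let Ok(mut guard) = first {
        guard.push(4);
    }

    // No guards are outstanding any more, so a shared borrow is fine.
    let snapshot = cell.borrow().to_vec();
    (first_ok, second_ok, snapshot)
}
```

The plain borrow_mut and borrow methods perform the same check but panic instead of returning an Err.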

Creating a vector of mutable Trait objects (sometimes shared) in Rust

I need to create a stack of pointers with the following constraints:
The pointers need to point to the same Trait object (so Box seems like a fit)
Those Trait objects may need to be modified (RefCell may need to be used?)
Two pointers in the stack may need to point to the same object (Rc seems like a fit)
Right now, the only way I've found to accommodate this is to use a Vec<Rc<RefCell<Box<dyn MyTrait>>>>. Is that the best solution though? It looks like a lot of pointer dereferences needed to access the objects.

I'm not quite sure what exactly you mean by:
The pointers need to point to the same Trait object (so Box seems like a fit)
But if you are interested in storing objects of actually different types, then you need trait-objects and those need to be behind some sort of pointer such as a Box. And a Box is generally a good default (but there are alternatives).
Those Trait objects may need to be modified (RefCell may need to be used?)
Well, actually, that could still be done with a Box.
Two pointers in the stack may need to point to the same object (Rc seems like a fit)
Here, it gets difficult because in Rust sharable and mutable more or less exclude each other. To be sharable, we need an Rc, which you can think of as a shared box. Then to make it mutable anyway, we can use interior mutability by using a RefCell. So, essentially an Rc<RefCell<_>>, which you can think of as a sharable & mutable Box.
Finally, if you put it all together into a Vec you get: Vec<Rc<RefCell<dyn MyTrait>>> (no Box).
This allows you to have different types in the Vec, having some instances even multiple times in it, and still allowing mutable access to each of them.
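As a rough sketch of how that type is used: the trait methods and the Counter and Doubler types below are invented for the example, since the question does not say what MyTrait contains.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical trait; the question does not specify its methods.
trait MyTrait {
    fn bump(&mut self);
    fn value(&self) -> i32;
}

struct Counter {
    count: i32,
}

struct Doubler {
    current: i32,
}

impl MyTrait for Counter {
    fn bump(&mut self) {
        self.count += 1;
    }
    fn value(&self) -> i32 {
        self.count
    }
}

impl MyTrait for Doubler {
    fn bump(&mut self) {
        self.current *= 2;
    }
    fn value(&self) -> i32 {
        self.current
    }
}

// Builds a stack in which the same Counter appears twice, mutates every
// entry once, and returns the values seen through each entry.
fn build_and_bump() -> Vec<i32> {
    // The type annotation triggers the coercion from the concrete type to dyn MyTrait.
    let counter: Rc<RefCell<dyn MyTrait>> = Rc::new(RefCell::new(Counter { count: 0 }));
    let doubler: Rc<RefCell<dyn MyTrait>> = Rc::new(RefCell::new(Doubler { current: 3 }));

    let stack: Vec<Rc<RefCell<dyn MyTrait>>> =
        vec![Rc::clone(&counter), doubler, Rc::clone(&counter)];

    for item in &stack {
        item.borrow_mut().bump();
    }

    stack.iter().map(|item| item.borrow().value()).collect()
}
```

Because the counter sits in the stack twice, it is bumped twice, and both of its entries observe the same updated value.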

How to convert Haxe Array/Vector to another type

Let's say I've got an array or vector of some parent type. To pass it to a function, I need it to be some child type (which I know beforehand that all elements are guaranteed to be all that child type). Is there a convenient way to do that? Right now I can only think to make a whole new array.
Also, it looks like it won't let me do it the other way around: it won't accept an array of child type in the place of the parent type. Is there a good way to solve this situation as well?
It looks like cast v works, but is this the preferred way?

To pass it to a function, I need it to be some child type (which I know beforehand that all elements are guaranteed to be all that child type).
If you really are confident that that's the case, it is safe to use a cast. I don't think there's any prettier way of doing this, nor should there be, as it inherently isn't pretty. Having to do this often indicates a design flaw in your code or the API that is being used.
For the reverse case, it's helpful to understand why it's not safe. The reason is not necessarily as intuitive because of this thought process:
I can assign Child to Base, so why can't I assign Array<Child> to Array<Base>?
This exact example is used to explain Variance in the Haxe Manual. You should definitely read it in full, but I'll give a quick summary here:
var children = [new Child()];
var bases:Array<Base> = cast children;
bases.push(new OtherChild());
children[1].childMethod(); // runtime crash
If you could assign the Array<Child> to an Array<Base>, you could then push() types that are incompatible with Child into it. But again, as you mentioned, you can just cast it to silence the compiler as in the code snippet above.
However, this is not always safe - there might still be code holding a reference to that original Array<Child>, which now suddenly contains things that it doesn't expect! This means we could do something like calling childMethod() on an object that doesn't have that method, and cause a runtime crash.
The opposite is also true: if there's no code holding onto such a reference (or if the references are read-only, for instance via haxe.ds.ReadOnlyArray), it is safe to use a cast.
At the end of the day it's a trade-off between the performance cost of making a copy (which might be negligible depending on the size) and how confident you are that you're smarter than the compiler / know about all references that exist.

When should I use a reference instead of transferring ownership?

From the Rust book's chapter on ownership, non-copyable values can be passed to functions by either transferring ownership or by using a mutable or immutable reference. When you transfer ownership of a value, it can't be used in the original function anymore: you must return it back if you want to. When you pass a reference, you borrow the value and can still use it.
I come from languages where values are immutable by default (Haskell, Idris and the like). As such, I'd probably never think about using references at all. Having the same value in two places looks dangerous (or, at least, awkward) to me. Since references are a feature, there must be a reason to use them.
Are there situations I should force myself to use references? What are those situations and why are they beneficial? Or are they just for convenience and defaulting to passing ownership is fine?

Mutable references in particular look very dangerous.
They are not dangerous, because the Rust compiler will not let you do anything dangerous. If you have a &mut reference to a value then you cannot simultaneously have any other references to it.
In general you should pass references around. This saves copying memory and should be the default thing you do, unless you have a good reason to do otherwise.
Some good reasons to transfer ownership instead:
When the value's type is small, such as bool or u32. Moving or copying such values often performs better because it avoids a level of indirection. These types usually implement Copy, and the compiler may make this optimisation for you automatically, which it is free to do because of the strong type system and immutability by default.
When the value's current owner is going to go out of scope, you may want to move the value somewhere else to keep it alive.
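The different ways of passing a value can be put side by side in a short sketch; all function names here are made up for the example.

```rust
// Shared borrow: only reads, and the caller keeps ownership.
fn total_len(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

// Mutable borrow: edits in place, and the caller still keeps ownership.
fn shout(words: &mut Vec<String>) {
    for w in words.iter_mut() {
        w.make_ascii_uppercase();
    }
}

// Takes ownership: the vector is consumed and cannot be used by the caller afterwards.
fn join_all(words: Vec<String>) -> String {
    words.join(" ")
}

// Small Copy type: passing by value is simplest, and the caller's copy stays usable.
fn is_even(n: u32) -> bool {
    n % 2 == 0
}

fn demo() -> (usize, String, bool) {
    let mut words = vec!["hello".to_string(), "world".to_string()];

    let len = total_len(&words); // borrowed, `words` is still usable
    shout(&mut words); // exclusive borrow, released when the call returns

    let n = 4u32;
    let even = is_even(n); // `n` is copied, not moved

    let joined = join_all(words); // `words` is moved and unusable from here on
    (len, joined, even && n == 4)
}
```

Trying to use words after the call to join_all would be rejected at compile time, which is what makes moving a value safe.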

What's the right way to have a thread-safe lazy-initialized possibly mutable value in Rust?

I have a struct that contains a field that is rather expensive to initialize, so I want to be able to do so lazily. However, this may be necessary in a method that takes &self. The field also needs to be able to be modified once it is initialized, but this will only occur in methods that take &mut self.
What is the correct (as in idiomatic, as well as in thread-safe) way to do this in Rust? It seems to me that it would be trivial with either of the two constraints:
If it only needed to be lazily initialized, and not mutated, I could simply use lazy-init's Lazy<T> type.
If it only needed to be mutable and not lazy, then I could just use a normal field (obviously).
However, I'm not quite sure what to do with both in place. RwLock seems relevant, but it appears that there is considerable trickiness to thread-safe lazy initialization given what I've seen of lazy-init's source, so I am hesitant to roll my own solution based on it.

The simplest solution is RwLock<Option<T>>.
However, I'm not quite sure what to do with both in place. RwLock seems relevant, but it appears that there is considerable trickiness to thread-safe lazy initialization given what I've seen of lazy-init's source, so I am hesitant to roll my own solution based on it.
lazy-init uses tricky code because it guarantees lock-free access after creation. Lock-free is always a bit trickier.
Note that in Rust it's easy to tell whether something is tricky or not: tricky means using an unsafe block. Since you can use RwLock<Option<T>> without any unsafe block there is nothing for you to worry about.
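A minimal sketch of the RwLock<Option<T>> idea, with T fixed to String. The Report type, its compute method, and the init_calls counter are assumptions made up for the example; the counter is only there to show how often the expensive step runs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

// Hypothetical struct with one expensive, lazily computed field.
struct Report {
    summary: RwLock<Option<String>>,
    // Counts how often the expensive initialization actually ran.
    init_calls: AtomicUsize,
}

impl Report {
    fn new() -> Self {
        Report {
            summary: RwLock::new(None),
            init_calls: AtomicUsize::new(0),
        }
    }

    // Stand-in for the expensive computation.
    fn compute(&self) -> String {
        self.init_calls.fetch_add(1, Ordering::SeqCst);
        "expensive result".to_string()
    }

    // Lazy initialization through &self.
    fn summary(&self) -> String {
        {
            // Fast path: a shared read lock is enough once the value exists.
            let guard = self.summary.read().unwrap();
            if let Some(s) = guard.as_ref() {
                return s.clone();
            }
        } // the read lock is released here, before asking for the write lock

        let mut guard = self.summary.write().unwrap();
        // Another thread may have initialized the value in the meantime;
        // get_or_insert_with only computes if it is still None.
        guard.get_or_insert_with(|| self.compute()).clone()
    }

    // Mutation through &mut self needs no locking at all.
    fn set_summary(&mut self, value: String) {
        *self.summary.get_mut().unwrap() = Some(value);
    }
}
```

This version takes a lock on every read, which is the price for avoiding the lock-free machinery that lazy-init uses.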
A variant to RwLock<Option<T>> may be necessary if you want to capture a closure for initialization once, rather than have to pass it at each potential initialization call-site.
In this case, you'll need something like RwLock<SimpleLazy<T>> where:
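enum SimpleLazy<T> {
    Initialized(T),
    // `dyn` is required for trait objects in current Rust; the Send + Sync
    // bounds allow RwLock<SimpleLazy<T>> to be shared across threads.
    Uninitialized(Box<dyn FnOnce() -> T + Send + Sync>),
}
You don't have to write an unsafe Sync impl for SimpleLazy<T> yourself: RwLock<SimpleLazy<T>> is Sync automatically as long as its contents are Send + Sync, which is why the boxed closure carries those bounds.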
