Rust Box vs non-box

Rust Box vs non-box - rust

Given a rust object, is it possible to wrap it so that multiple references and a mutable reference are allowed but do not cause problems?
For example, a Vec that has multiple references and a single mutable reference.

Yes, but...
The type you're looking for is RefCell, but read on before jumping the gun!
Rust is a single-ownership language. It always will be. It's exactly that feature that makes Rust as thread-safe and memory-safe as it is. You cannot fully circumvent this, short of wrapping your entire program in unsafe and using raw pointers exclusively, and if you're going to do that, just write C since you're no longer getting any benefits out of using Rust.
So, at any given moment in your program, there must either be one thing writing to this memory or several things reading. That's the fundamental law of single-ownership. Keep that in mind; you cannot get around that. What I'm about to say still follows that rule.
Usually, we enforce this with our type signatures. If I take a &T, then I'm just an alias and won't write to it. If I take a &mut T, then nobody else can see what I'm doing till I forfeit that reference. That's usually good enough, and if we can, we want to do it that way, since we get guarantees at compile-time.
But it doesn't always work that way. Sometimes we can't prove that what we're doing is okay. Sometimes I've got two functions holding an, ostensibly, mutable reference, but I know, due to some other guarantees Rust doesn't know about, that only one will be writing to it at a time. Enter RefCell. RefCell<T> contains a single T and pretends to be immutable but lets you borrow the thing inside either mutably or immutably with try_borrow_mut and try_borrow. When we call one of these functions, we get a reference-like value that can read (and write, in the mutable case) to the original data, even though we started with a &RefCell<T> that doesn't look mutable.
But the fundamental law still holds. Note that those try_* functions return a Result, i.e. they might fail. If two functions simultaneously try to get try_borrow_mut references, the second one will fail, and it's your job to deal with that eventuality (even if "deal with that" means panic! in your particular use case). All we've done is move the single-ownership rules from compile-time to runtime. We haven't gotten rid of them; we've just changed who's responsible for enforcing them.

Related

What are the differences between Cell, RefCell, and UnsafeCell? [duplicate]

This question already has answers here:
Is this error due to the compiler's special knowledge about RefCell?
(1 answer)
How does the Rust compiler know `Cell` has internal mutability?
(3 answers)
When I can use either Cell or RefCell, which should I choose?
(3 answers)
Situations where Cell or RefCell is the best choice
(3 answers)
Closed 9 months ago.
The community reviewed whether to reopen this question 9 months ago and left it closed:
Original close reason(s) were not resolved
What are the exact differences between Cell, RefCell, and UnsafeCell?

RefCell
Let's start with RefCell, which in my personal experience is the most commonly used of the three.
Normally, in Rust, borrowing rules are checked at compile-time. You have to prove to the compiler (effectively, to a type-checker-like system called the borrow checker) that everything you're doing is fair game: that every object is either mutably borrowed once or immutably borrowed several times, and that those two threads don't cross. RefCell moves this check to runtime. You can convert a RefCell<T> to a &T with borrow and you can convert a RefCell<T> to &mut T with borrow_mut[1].
When you call these functions, at runtime, Rust checks that the usual borrow rules apply. You don't have to prove to the type checker that it works, but if it turns out that you're wrong, your program will panic[2]. We haven't changed the rules; you've just shoved them from compile time to runtime. This can be useful, for instance, if you're writing a recursive algorithm that manipulates data in some complicated way, and you've personally checked that the borrow rules are followed but you can't prove it to Rust.
Cell
Then there's Cell. A Cell is similar but it's not reference-based. Instead, the fundamental operation of a Cell is replace. replace takes a new value for the cell (by value) and, as the name implies, replaces the contents of the cell. No mutability checks need to be done in this case, since this operation happens in one fell swoop. You call the function, the function does its magic, and it returns. It's not giving you a reference back. There are other functions on Cell, like update in Nightly Rust, but they're all implemented in terms of replace.
Note that none of the types we're talking about can be shared across threads. So, in the case of Cell, there's no concern that one thread is trying to replace the value while another reads it. And in the case of RefCell, there's no need for a complicated locking mechanism to coordinate the runtime checks.
UnsafeCell
Enter UnsafeCell. UnsafeCell is RefCell with all of the safety checks removed. We haven't pushed safety off to the runtime; we've taken it in the backyard and shot it. The fundamental operation of UnsafeCell is get, which is like borrow_mut for RefCell except that it always returns a mutable pointer. It won't fail, it won't complain about race conditions or shared data, it will just give you a pointer. Note that *mut T is a raw pointer type, and the only way to modify a pointer of that type is with Unsafe Rust.
UnsafeCell should not be used directly. It's the compiler primitive used to implement the other two (safe) cell types. If you do decide that you need UnsafeCell for some reason, you need to be very careful to preserve compiler assumptions about data access, because it's very easy to make things go very wrong when you start dipping into Unsafe Rust. Trust me, I speak from experience on this. Last time I unsafe'd some of my code, it started causing enough problems that I eventually rewrote the whole thing using (safe) primitives and never actually figured out what was going wrong. It gets messy fast.
[1] You're actually converting a RefCell<T> to specialized types called Ref<'_, T> and RefMut<'_, T>, respectively, but those types are designed to act like ordinary references, so I omit that detail for brevity.
[2] There are variants of these functions called try_borrow and try_borrow_mut which return a Result rather than panicking on failure.

Rust Ownership Smart Pointers

I've recently started learning Rust and just learned about the Smart Pointers (Box, Rc and RefCell).
In the guide they talked about Rc implementing "shared ownership". But if I understood it correctly, the whole point of the ownership system is that there can only be one owner.
And to me (still a Rust newbie) it seems as if Rc and RefCell take ownership of they value they contain and just "expose" different types of references to the contained value?
Am I wrong and if yes: why is Rust allowed to "cheat" the ownership system like that and would I be theoretically able to implement my own "cheating" types?

if I understood it correctly, the whole point of the ownership system is that there can only be one owner.
No. Rust guarantees that there can be no more than a single mutable borrow and there cannot be mutable and non-mutable borrows at the same time. It doesn't say anything about owners.
why is Rust allowed to "cheat" the ownership system
It doesn't.
would I be theoretically able to implement my own "cheating" types
Yes. Those types are all implemented in Rust¹. Those types are battle-tested and perfectly safe under Rust's safety rules, but they require the use of unsafe at a lower level.
Note that unsafe doesn't permit going around the rule that you can have one mutable borrow XOR any number of non-mutable borrows, but using unsafe, you could do it anyway. This, of course, would actually be unsafe (and trigger undefined behavior).
1: Although some of those types are implemented using features that are still private to the compiler so you wouldn't be able to do everything as efficiently as the standard library, and Box and UnsafeCell are special to the language and cannot be reproduced by a normal library. There are for example many crates providing Rc or Arc alternatives which are better that the standard ones in some cases.

Understand smart pointers in Rust

I am a newbie to Rust and writing to understand the "Smart pointers" in Rust. I have basic understanding of how smart pointers works in C++ and has been using it for memory management since a few years ago. But to my very much surprise, Rust also provides such utility explicitly.
Because from a tutorial here (https://pcwalton.github.io/2013/03/18/an-overview-of-memory-management-in-rust.html), it seems that every raw pointers have been automatically wrapped with a smart pointer, which seems very reasonable. Then why do we still need such Box<T>, Rc<T>, and Ref<T> stuff? According to this specification: https://doc.rust-lang.org/book/ch15-00-smart-pointers.html
Any comments will be apprecicated a lot. Thanks.

You can think about the difference between a T and a Box<T> as the difference between a statically allocated object and a dynamically allocated object (the latter being created via a new expression in C++ terms).
In Rust, both T and Box<T> represent a variable that has ownership over the referent object (i.e. when the variable goes out of scope, the object will be destroyed, whether it was stored by value or by reference). On the contrary, &T and &mut T represent borrowing of the object (i.e. these variables are not responsible for destroying the object, and they cannot outlive the owner of the object).
By default, you'd probably want to use T, but sometimes you might want (or have) to use Box<T>. For example, you would use a Box<T> if you want to own a T that's too large to be allocated in place. You would also use it when the object doesn't have a known size at all, which means that your only choice to store it or pass it around is through the "pointer" (the Box<T>).
In Rust, an object is generally either mutable or aliased, but not both. If you have given out immutable references to an object, you normally need to wait until those references are over before you can mutate that object again.
Additionally, Rust's immutability is transitive. If you receive an object immutably, it means that you have access to its contents (and the contents of those contents, and so on) also immutably.
Normally, all of these things are enforced at compile time. This means that you catch errors faster, but you are limited to being able to express only what the compiler can prove statically.
Like T and Box<T>, you may sometimes use RefCell<T>, which is another ownership type. But unlike T and Box<T>, the RefCell<T> enforces the borrow checking rules at runtime instead of compile time, meaning that sometimes you can do things with it that are safe but wouldn't pass the compiler's static borrow checker. The main example for this is getting a mutable reference to the interior of an object that was received immutably (which, under the statically enforced rules of Rust, would make the entire interior immutable).
The types Ref<T> and RefMut<T> are the runtime-checked equivalents of &T and &mut T respectively.
(EDIT: This whole thing is somewhat of a lie. &mut really means "unique borrow" and & means "non-unique borrow". Certain types, like mutexes, can be non-uniquely but still mutably borrowed, because otherwise they would be useless.)
Rust's ownership model tries to push you to write programs in which objects' lifetimes are known at compile time. This works well in certain scenarios, but makes other scenarios difficult or impossible to express.
Rc<T> and its atomic sibling Arc<T> are reference-counting wrappers of T. They offer you an alternative to the ownership model.
They are useful when you want to use and properly dispose an object, but it is not easy (or possible) to determine, at the moment you're writing the code, which specific variable should be the owner of that object (and therefore should take care of disposing it). Much like in C++, this means that there is no single owner of the object and that the object will be disposed by the last reference-counting wrapper that points to it.

The article you linked uses outdated syntax. Certain smart pointers used to have special names and associated syntax that has been removed since some time before Rust 1.0:
Box<T> replaced ~T ("owned pointers")
Rc<T> replaced #T ("managed pointers")
Because the Internet never forgets, you can still find pre-1.0 documentation and articles (such as the one you linked) that use the old syntax. Check the date of the article: if it's before May 2015, you're dealing with an early, unstable Rust.

When should I use a reference instead of transferring ownership?

From the Rust book's chapter on ownership, non-copyable values can be passed to functions by either transferring ownership or by using a mutable or immutable reference. When you transfer ownership of a value, it can't be used in the original function anymore: you must return it back if you want to. When you pass a reference, you borrow the value and can still use it.
I come from languages where values are immutable by default (Haskell, Idris and the like). As such, I'd probably never think about using references at all. Having the same value in two places looks dangerous (or, at least, awkward) to me. Since references are a feature, there must be a reason to use them.
Are there situations I should force myself to use references? What are those situations and why are they beneficial? Or are they just for convenience and defaulting to passing ownership is fine?

Mutable references in particular look very dangerous.
They are not dangerous, because the Rust compiler will not let you do anything dangerous. If you have a &mut reference to a value then you cannot simultaneously have any other references to it.
In general you should pass references around. This saves copying memory and should be the default thing you do, unless you have a good reason to do otherwise.
Some good reasons to transfer ownership instead:
When the value's type is small in size, such as bool, u32, etc. It's often better performance to move/copy these values to avoid a level of indirection. Usually these values implement Copy, and actually the compiler may make this optimisation for you automatically. Something it's free to do because of a strong type system and immutability by default!
When the value's current owner is going to go out of scope, you may want to move the value somewhere else to keep it alive.

What's the right way to have a thread-safe lazy-initialized possibly mutable value in Rust?

I have a struct that contains a field that is rather expensive to initialize, so I want to be able to do so lazily. However, this may be necessary in a method that takes &self. The field also needs to be able to modified once it is initialized, but this will only occur in methods that take &mut self.
What is the correct (as in idiomatic, as well as in thread-safe) way to do this in Rust? It seems to me that it would be trivial with either of the two constraints:
If it only needed to be lazily initialized, and not mutated, I could simply use lazy-init's Lazy<T> type.
If it only needed to be mutable and not lazy, then I could just use a normal field (obviously).
However, I'm not quite sure what to do with both in place. RwLock seems relevant, but it appears that there is considerable trickiness to thread-safe lazy initialization given what I've seen of lazy-init's source, so I am hesitant to roll my own solution based on it.

The simplest solution is RwLock<Option<T>>.
However, I'm not quite sure what to do with both in place. RwLock seems relevant, but it appears that there is considerable trickiness to thread-safe lazy initialization given what I've seen of lazy-init's source, so I am hesitant to roll my own solution based on it.
lazy-init uses tricky code because it guarantees lock-free access after creation. Lock-free is always a bit trickier.
Note that in Rust it's easy to tell whether something is tricky or not: tricky means using an unsafe block. Since you can use RwLock<Option<T>> without any unsafe block there is nothing for you to worry about.
A variant to RwLock<Option<T>> may be necessary if you want to capture a closure for initialization once, rather than have to pass it at each potential initialization call-site.
In this case, you'll need something like RwLock<SimpleLazy<T>> where:
enum SimpleLazy<T> {
Initialized(T),
Uninitialized(Box<FnOnce() -> T>),
}
You don't have to worry about making SimpleLazy<T> Sync as RwLock will take care of that for you.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string