Situations where Cell or RefCell is the best choice - rust

When would you be required to use Cell or RefCell? It seems like there are many other type choices that would be suitable in place of these, and the documentation warns that using RefCell is a bit of a "last resort".
Is using these types a "code smell"? Can anyone show an example where using these types makes more sense than using another type, such as Rc or even Box?

It is not entirely correct to ask when Cell or RefCell should be used over Box and Rc because these types solve different problems. Indeed, more often than not RefCell is used together with Rc in order to provide mutability with shared ownership. So yes, use cases for Cell and RefCell are entirely dependent on the mutability requirements in your code.
Interior and exterior mutability are very nicely explained in the official Rust book, in the designated chapter on mutability. External mutability is very closely tied to the ownership model, and when we say that something is mutable or immutable we usually mean exactly this external mutability. Another name for external mutability is inherited mutability, which probably explains the concept more clearly: this kind of mutability is defined by the owner of the data and inherited by everything you can reach from the owner. For example, if your variable of a structural type is mutable, so are all fields of the structure in the variable:
struct Point { x: u32, y: u32 }
// the variable is mutable...
let mut p = Point { x: 10, y: 20 };
// ...and so are fields reachable through this variable
p.x = 11;
p.y = 22;
let q = Point { x: 10, y: 20 };
q.x = 33; // compilation error
Inherited mutability also defines which kinds of references you can get out of the value:
{
let px: &u32 = &p.x; // okay
}
{
let py: &mut u32 = &mut p.x; // okay, because p is mut
}
{
let qx: &u32 = &q.x; // okay
}
{
let qy: &mut u32 = &mut q.y; // compilation error since q is not mut
}
Sometimes, however, inherited mutability is not enough. The canonical example is the reference-counted pointer, called Rc in Rust. The following code is entirely valid:
{
let x1: Rc<u32> = Rc::new(1);
let x2: Rc<u32> = x1.clone(); // create another reference to the same data
let x3: Rc<u32> = x2.clone(); // even another
} // here all references are destroyed and the memory they were pointing at is deallocated
At first glance it is not clear how mutability is related to this, but recall that reference-counted pointers are so called because they contain an internal reference counter, which is modified when a reference is duplicated (clone() in Rust) and when one is destroyed (goes out of scope in Rust). Hence Rc has to modify itself even though it is stored inside a non-mut variable.
This is achieved via internal mutability. There are special types in the standard library, the most basic of them being UnsafeCell, which allow one to work around the rules of external mutability and mutate something even if it is stored (transitively) in a non-mut variable.
Another way to say that something has internal mutability is that this something can be modified through a &-reference - that is, if you have a value of type &T and you can modify the state of T which it points at, then T has internal mutability.
For example, Cell can contain Copy data and it can be mutated even if it is stored in non-mut location:
let c: Cell<u32> = Cell::new(1);
c.set(2);
assert_eq!(c.get(), 2);
RefCell can contain non-Copy data and it can give you &mut pointers to its contained value, and absence of aliasing is checked at runtime. This is all explained in detail on their documentation pages.
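As a minimal sketch of the RefCell behaviour described above, the following stores a non-Copy Vec<String> in a RefCell and appends to it through a plain & reference. The function names record and build_log are made up for this illustration; only RefCell::new, borrow, borrow_mut and into_inner come from the standard library.

```rust
use std::cell::RefCell;

/// Appends to the vector through a shared (`&`) reference to the cell.
/// (`record` is a hypothetical helper, not a library function.)
fn record(log: &RefCell<Vec<String>>, entry: &str) {
    // `borrow_mut` returns a guard that dereferences to `&mut Vec<String>`.
    // The runtime borrow flag is released when the guard is dropped,
    // which here happens at the end of this statement.
    log.borrow_mut().push(entry.to_string());
}

/// Builds a log without ever declaring the binding `mut`.
fn build_log() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    record(&log, "first");
    record(&log, "second");

    // Any number of shared borrows may coexist.
    let a = log.borrow();
    let b = log.borrow();
    assert_eq!(a.len(), b.len());
    drop((a, b)); // release both guards before consuming the cell

    log.into_inner()
}
```

Calling build_log() yields a vector containing "first" and "second", even though log is never declared mut.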
As it turns out, in the overwhelming majority of situations you can easily get by with external mutability only. Most existing high-level Rust code is written that way. Sometimes, however, internal mutability is unavoidable or makes the code much clearer. One example, the implementation of Rc, is already described above. Another is when you need shared mutable ownership (that is, you need to access and modify the same value from different parts of your code); this is usually achieved via Rc<RefCell<T>>, because it can't be done with references alone. Yet another example is Arc<Mutex<T>>, Mutex being another type with internal mutability that is also safe to use across threads.
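To make the Arc<Mutex<T>> case concrete, here is a small sketch in which several threads increment one shared counter. The function name parallel_count and its parameters are assumptions chosen for the example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `workers` threads that each add 1 to a shared counter
/// `per_worker` times, then returns the final value.
/// (`parallel_count` is a hypothetical name for this sketch.)
fn parallel_count(workers: usize, per_worker: u32) -> u32 {
    let counter = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each thread gets its own handle to the same Mutex.
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    // `lock` only needs `&self`, yet the guard it returns
                    // gives mutable access to the protected value.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

With four workers doing 1000 increments each, parallel_count(4, 1000) returns 4000, because the Mutex serializes every update.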
So, as you can see, Cell and RefCell are not replacements for Rc or Box; they solve the task of providing you mutability somewhere where it is not allowed by default. You can write your code without using them at all; and if you get into a situation when you would need them, you will know it.
Cells and RefCells are not a code smell. The only reason they are described as a "last resort" is that they move the task of checking mutability and aliasing rules from the compiler to runtime code. Take RefCell: normally you can't have two &muts pointing to the same data at the same time, and the compiler enforces this statically. With a RefCell you can ask the same RefCell for as many &muts as you like, except that if you ask for a second one while the first is still alive it will panic, enforcing the aliasing rules at runtime. Panics are arguably worse than compilation errors because the errors causing them only show up at runtime rather than at compile time. Sometimes, however, the static analysis in the compiler is too restrictive, and you really do need to "work around" it.
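The runtime check can be observed without triggering a panic by using try_borrow_mut, which reports the conflict as an Err. This is only a sketch; the function name double_borrow_outcomes is invented for the illustration.

```rust
use std::cell::RefCell;

/// Returns two flags:
/// - whether a second mutable borrow succeeded while the first was alive,
/// - whether a mutable borrow succeeded after the first was dropped.
fn double_borrow_outcomes() -> (bool, bool) {
    let cell = RefCell::new(5i32);

    let first = cell.borrow_mut();
    // Calling `borrow_mut()` again here would panic. `try_borrow_mut`
    // performs the same check but returns `Err` instead of panicking.
    let while_held = cell.try_borrow_mut().is_ok();

    drop(first); // the first guard is gone, so the cell is free again
    let after_release = cell.try_borrow_mut().is_ok();

    (while_held, after_release)
}
```

The first flag is false and the second is true: the conflict exists only while the earlier guard is still alive.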

No, Cell and RefCell aren't "code smells". Normally, mutability is inherited; that is, you can mutate a field or a part of a data structure if and only if you have exclusive access to the whole data structure, and hence you can opt into mutability at that level with mut (i.e., foo.x inherits its mutability or lack thereof from foo). This is a very powerful pattern and should be used whenever it works well (which is surprisingly often). But it's not expressive enough for all code everywhere.
Box and Rc have nothing to do with this. Like almost all other types, they respect inherited mutability: you can mutate the contents of a Box if you have exclusive, mutable access to the Box (because that means you have exclusive access to the contents, too). Conversely, you generally cannot get a &mut to the contents of an Rc because by its nature Rc is shared (i.e. there can be multiple Rcs referring to the same data).
One common case of Cell or RefCell is that you need to share mutable data between several places. Having two &mut references to the same data is normally not allowed (and for good reason!). However, sometimes you need it, and the cell types enable doing it safely.
This could be done via the common combination of Rc<RefCell<T>>, which allows the data to stick around for as long as anyone uses it and allows everyone (but only one at a time!) to mutate it. Or it could be as simple as &Cell<i32> (even if the cell is wrapped in a more meaningful type). The latter is also commonly used for internal, private, mutable state like reference counts.
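The "internal, private, mutable state" use of Cell might look like the sketch below: a call counter updated inside a method that only receives &self. The Greeter trait and the CountingGreeter type are hypothetical, invented purely for this example.

```rust
use std::cell::Cell;

/// Hypothetical trait whose method only receives `&self`.
trait Greeter {
    fn greet(&self) -> String;
}

struct CountingGreeter {
    name: String,
    // Private bookkeeping that is updated through `&self`.
    calls: Cell<u32>,
}

impl Greeter for CountingGreeter {
    fn greet(&self) -> String {
        // `Cell::set` and `Cell::get` both take `&self`, so no `&mut self`
        // is needed to update the counter.
        self.calls.set(self.calls.get() + 1);
        format!("hello, {} (call #{})", self.name, self.calls.get())
    }
}

/// Greets twice through a non-`mut` binding and reports the second
/// greeting together with the recorded call count.
fn greet_twice() -> (String, u32) {
    let g = CountingGreeter {
        name: "world".to_string(),
        calls: Cell::new(0),
    };
    g.greet();
    let second = g.greet();
    (second, g.calls.get())
}
```

Because u32 is Copy, a Cell is enough here and no runtime borrow tracking is involved.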
The documentation actually has several examples of where you'd use Cell or RefCell. A good example is Rc itself. When cloning an Rc, the reference count must be increased, but the reference count is shared between all Rcs, so, by inherited mutability, this couldn't possibly work. Rc practically has to use a Cell.
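The hidden counter can be watched from the outside with Rc::strong_count. The sketch below only observes the count; it does not show how Rc is implemented internally, and the function name observe_counts is an assumption.

```rust
use std::rc::Rc;

/// Returns the strong count seen before cloning, while a clone exists,
/// and after that clone has been dropped.
fn observe_counts() -> (usize, usize, usize) {
    let a = Rc::new(1u32); // note: not declared `mut`
    let before = Rc::strong_count(&a);

    // Cloning only needs `&a`, yet the shared counter changes.
    let b = Rc::clone(&a);
    let with_clone = Rc::strong_count(&a);

    drop(b); // dropping a handle decrements the counter again
    let after_drop = Rc::strong_count(&a);

    (before, with_clone, after_drop)
}
```

The counts go from 1 to 2 and back to 1, all through a binding that was never mut.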
A good guideline is to try writing as much code as possible without cell types, but using them when it hurts too much without them. In some cases, there is a good solution without cells, and, with experience, you'll be able to find those when you previously missed them, but there will always be things that just aren't possible without them.

Suppose you want or need to create some object of the type of your choice and dump it into an Rc.
let x = Rc::new(5i32);
Now, you can easily create another Rc that points to the exact same object and therefore memory location:
let y = x.clone();
let yval: i32 = *y;
Since in Rust you may never have a mutable reference to a memory location to which any other reference exists, these Rc containers can never be modified again.
So, what if you wanted to be able to modify those objects and have multiple Rc pointing to one and the same object?
This is the issue that Cell and RefCell solve. The solution is called "interior mutability", and it means that Rust's aliasing rules are enforced at runtime instead of compile-time.
Back to our original example:
let x = Rc::new(RefCell::new(5i32));
let y = x.clone();
To get a mutable reference to your type, you use borrow_mut on the RefCell.
let mut yval = x.borrow_mut(); // the guard must be bound as mut to write through it
*yval = 45;
In case you already borrowed the value your Rcs point to either mutably or non-mutably, the borrow_mut function will panic, and therefore enforce Rust's aliasing rules.
Rc<RefCell<T>> is just one use of RefCell; there are many other legitimate ones. But the documentation is right: if there is another way, use it, because the compiler cannot help you reason about RefCells.
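Put together as one compilable unit, the Rc<RefCell<i32>> example might look like this sketch. The wrapper function write_then_read is a made-up name; it writes through one handle and reads through the other.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Writes through `x` and reads the result back through `y`.
fn write_then_read() -> i32 {
    let x = Rc::new(RefCell::new(5i32));
    let y = Rc::clone(&x); // second handle to the same RefCell

    {
        let mut xval = x.borrow_mut();
        *xval = 45;
    } // the mutable guard is dropped here, ending the borrow

    // Reading through the other handle sees the new value.
    let seen = *y.borrow();
    seen
}
```

The inner block matters: if the mutable guard were still alive when y.borrow() ran, the call would panic.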

Related

How does `write` mode on a `RwLock` work if it isn't mutable?

So, I was writing some code and apparently R.A. didn't warn me about some erroneous stuff I had written in regards to how ownership works with lambdas.
So, a friend helped me rewrite some of my code, and this is just a play example, but their new code boils down to this:
let vec = Rc::new(RwLock::new( Vec::new() ));
let vec_rc = vec.clone();
let my_lambda = || -> () {
vec_rc.write().unwrap().push(/* ... */);
};
But what I don't understand is how this works if vec_rc isn't mut.
From my prior knowledge, mutable in Rust cascades; in other words, if the "master-containing" object is immutable the rest will have to be too.
Could I please get some clarity as to what goes on under the hood?
Or is their code erroneous too?
From my prior knowledge, mutable in Rust cascades; in other words, if the "master-containing" object is immutable the rest will have to be too.
This is almost always true... Until we consider interior mutability.
Interior mutability is exactly about that: changing a value through a shared reference. Moreover, while there are other shared references to it. The basic interior mutability primitive is UnsafeCell, but there are multiple abstractions built on top of it - one of them is RwLock (you can see it secretly contains an UnsafeCell).
As a side note, Rc<RwLock<T>> is almost always wrong: Rc is not thread-safe, which defeats the whole purpose of RwLock. If you just need shared mutable state on a single thread, use Rc<RefCell<T>>. It is much more performant and can't block (so no deadlock debugging, just a simple panic if something goes wrong).
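A single-threaded variant of the closure from the question, using Rc<RefCell<Vec<_>>> as suggested, could be sketched like this. The element type i32 and the function name push_from_closure are assumptions, since the question elides what is pushed.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Pushes values from inside a closure that captures a non-`mut` handle.
fn push_from_closure() -> Vec<i32> {
    let vec = Rc::new(RefCell::new(Vec::new()));
    let vec_rc = Rc::clone(&vec);

    // `borrow_mut` takes `&self`, so the closure only needs shared access
    // to `vec_rc` and neither the handle nor the closure has to be `mut`.
    let my_lambda = |value: i32| vec_rc.borrow_mut().push(value);
    my_lambda(1);
    my_lambda(2);

    // The original handle observes everything the closure pushed.
    let result = vec.borrow().clone();
    result
}
```

Each call to my_lambda takes and releases the mutable borrow within a single statement, so the calls never conflict with each other.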

Understanding usage of Rc<RefCell<SomeStruct>> in Rust

I'm looking at some code that uses
Rc<RefCell<SomeStruct>>
So I went out to read about the differences between Rc and RefCell:
Here is a recap of the reasons to choose Box, Rc, or RefCell:
Rc enables multiple owners of the same data; Box and RefCell have single owners.
Box allows immutable or mutable borrows checked at compile time; Rc allows only immutable borrows checked at compile time; RefCell allows immutable or mutable borrows checked at runtime.
Because RefCell allows mutable borrows checked at runtime, you can mutate the value inside the RefCell even when the RefCell is immutable.
So, Rc makes sure that SomeStruct is accessible by many people at the same time. But how do I access? I only see the get_mut method, which returns a mutable reference. But the text explained that "Rc allows only immutable borrows".
If it's possible to access Rc's object in mut and not mut way, why a RefCell is needed?
So, Rc makes sure that SomeStruct is accessible by many people at the same time. But how do I access?
By dereferencing. If you have a variable x of type Rc<...>, you can access the inner value using *x. In many cases this happens implicitly; for example you can call methods on x simply with x.method(...).
I only see the get_mut method, which returns a mutable reference. But the text explained that "Rc allows only immutable borrows".
The get_mut() method is probably more recent than the explanation stating that Rc only allows immutable borrows. Moreover, it only returns a mutable borrow if there currently is only a single owner of the inner value, i.e. if you currently wouldn't need Rc in the first place. As soon as there are multiple owners, get_mut() will return None.
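The behaviour of Rc::get_mut with one versus several owners can be checked with a short sketch like the following; the function name get_mut_outcomes is invented for the example.

```rust
use std::rc::Rc;

/// Returns whether `get_mut` succeeded with two owners, whether it
/// succeeded with one owner, and the final inner value.
fn get_mut_outcomes() -> (bool, bool, u32) {
    let mut a = Rc::new(1u32);
    let b = Rc::clone(&a);

    // Two owners exist, so `get_mut` refuses and returns `None`.
    let while_shared = Rc::get_mut(&mut a).is_some();

    drop(b); // back to a single owner

    // With a unique owner, `get_mut` hands out `&mut u32`.
    let while_unique = match Rc::get_mut(&mut a) {
        Some(value) => {
            *value = 7;
            true
        }
        None => false,
    };

    (while_shared, while_unique, *a)
}
```

Note that get_mut also requires a &mut to the Rc itself, so it stays entirely within the ordinary compile-time borrowing rules.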
If it's possible to access Rc's object in mut and not mut way, why a RefCell is needed?
RefCell will allow you to get mutable access even when multiple owners exist, and even if you only hold a shared reference to the RefCell. It will dynamically check at runtime that only a single mutable reference exists at any given time, and it will panic if you request a second, concurrent one (or return an error if you use the try_borrow methods instead). This functionality is not offered by Rc.
So in summary, Rc gives you shared ownership. The inner value has multiple owners, and reference counting makes sure the data stays alive as long as at least one owner still holds onto it. This is useful if your data doesn't have a clear single owner. RefCell gives you interior mutability, i.e. you can borrow the inner value dynamically at runtime, and modify it even with a shared reference. The combination Rc<RefCell<...>> gives you both: a value with multiple owners that can be borrowed mutably by any one of the owners.
For further details, you can read the relevant chapters of the Rust book:
Rc<T>, the Reference Counted Smart Pointer
RefCell<T> and the Interior Mutability Pattern
If it's possible to access Rc's object in mut and not mut way, why a
RefCell is needed?
An Rc pointer allows you to have shared ownership. Since ownership is shared, the value owned by an Rc pointer is immutable.
The RefCell smart pointer represents single ownership over the data it holds, much like the Box smart pointer. The difference is that Box enforces the borrowing rules at compile time, whereas RefCell enforces them at run time.
If you combine them, you get a smart pointer that can have multiple owners, any of which can modify the value (one at a time). A typical use case is a doubly linked list in Rust.
use std::cell::RefCell;
use std::rc::Rc;
struct LinkedList<T> {
    head: Pointer<T>,
    tail: Pointer<T>,
}
struct Node<T> {
    element: T,
    next: Pointer<T>,
    prev: Pointer<T>,
}
// We need multiple owners who can mutate the data.
// It is an Option because "end.next" would be None.
type Pointer<T> = Option<Rc<RefCell<Node<T>>>>;
For example, in a three-node list the "front" and "end" nodes both point to the "middle" node, and both can mutate it. If you need to insert a new node after "front", you have to mutate "front.next". So a doubly linked list needs multiple ownership and mutability at the same time.
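A rough sketch of how such a list could be filled and traversed follows. It deviates from the definition above in one respect, which is an assumption of this sketch rather than part of the original answer: the prev link is a Weak pointer, so that neighbouring nodes do not keep each other alive in a reference cycle. The methods push_back, to_vec and to_vec_rev are likewise invented for the illustration.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

type Pointer<T> = Option<Rc<RefCell<Node<T>>>>;

struct Node<T> {
    element: T,
    next: Pointer<T>,
    // Assumption of this sketch: a weak back-pointer avoids an Rc cycle.
    prev: Option<Weak<RefCell<Node<T>>>>,
}

struct LinkedList<T> {
    head: Pointer<T>,
    tail: Pointer<T>,
}

impl<T: Clone> LinkedList<T> {
    fn new() -> Self {
        LinkedList { head: None, tail: None }
    }

    fn push_back(&mut self, element: T) {
        let node = Rc::new(RefCell::new(Node { element, next: None, prev: None }));
        match self.tail.take() {
            Some(old_tail) => {
                node.borrow_mut().prev = Some(Rc::downgrade(&old_tail));
                // The old tail is mutated through a shared handle.
                old_tail.borrow_mut().next = Some(Rc::clone(&node));
            }
            None => self.head = Some(Rc::clone(&node)),
        }
        self.tail = Some(node);
    }

    /// Collects the elements front to back by following `next`.
    fn to_vec(&self) -> Vec<T> {
        let mut out = Vec::new();
        let mut cursor = self.head.clone();
        while let Some(node) = cursor {
            out.push(node.borrow().element.clone());
            cursor = node.borrow().next.clone();
        }
        out
    }

    /// Collects the elements back to front by following `prev`.
    fn to_vec_rev(&self) -> Vec<T> {
        let mut out = Vec::new();
        let mut cursor = self.tail.clone();
        while let Some(node) = cursor {
            out.push(node.borrow().element.clone());
            cursor = node.borrow().prev.as_ref().and_then(|weak| weak.upgrade());
        }
        out
    }
}
```

After pushing 1, 2 and 3, to_vec() walks the next links and to_vec_rev() walks the prev links, visiting the same nodes in opposite order.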

Why do immutable references to copy types in rust exist?

So I just started learning Rust (first few chapters of "the book") and am obviously quite a noob. I finished the ownership-basics chapter (4) and wrote some test programs to make sure I understood everything. I seem to have the basics down but I asked myself why immutable references to copy-types are even possible. I will try to explain my thoughts with examples.
I thought that you maybe want to store a reference to a copy-type so you can check its value later instead of having a copy of the old value, but this can't be it since the underlying value can't be changed as long as it is borrowed.
The most basic example of this would be this code:
let mut x = 10; // push i32
let x_ref = &x; // push immutable reference to x
// x = 100; change x which is disallowed since it's borrowed currently
println!("{}", x_ref); // do something with the reference since you want the current value of x
The only reason for this I can currently think of (with my current knowledge) is that they just exist so you can call generic methods which require references (like cmp) with them.
This code demonstrates this:
let x = 10; // push i32
// let ordering = 10.cmp(x); try to compare it but you can't since cmp wants a reference
let ordering = 10.cmp(&x); // this works since it's now a reference
So, is that the only reason you can create immutable references to copy-types?
Disclaimer:
I don't see Just continue reading the book as a valid answer. However I fully understand if you say something like Yes you need those for this and this use-case (optional example), it will be covered in chapter X. I hope you understand what I mean :)
EDIT:
Maybe worth mentioning, I'm a C# programmer and not new to programming itself.
EDIT 2:
I don't know if this is technically a duplicate of this question but I do not fully understand the question and the answer so I hope for a more simple answer understandable by a real noob.
An immutable reference to a Copy-type is still "an immutable reference". The code that gets passed the reference can't change the original value. It can make a (hopefully) trivial copy of that value, but it can still only ever change that copy after doing so.
That is, the original owner of the value is ensured that - while receivers of the reference may decide to make a copy and change that - the state of whatever is referenced can't ever change. If the receiver wants to change the value, it can feel free; nobody else is going to see it, though.
Immutable references to primitives are no different. Although primitives are Copy everywhere, you probably already have an intuition for what "an immutable reference" means semantically for primitive types. For instance:
fn print_the_age(age: &i32) { ... }
That function could make a copy via *age and change it. But the caller will not see that change and it does not make much sense to do so in the first place.
Update due to comment: There is no advantage per se, at least as far as primitives are concerned (larger types may be costly to copy). It does boil down to the semantic relationship between the owner of the i32 and the receiver: "Here is a reference, it is guaranteed to not change while you have that reference, I, the owner, can't change, move or deallocate it, and no other thread, including my own, could possibly do that".
Consider where the reference is coming from: if you receive an &i32, whatever it is coming from can't change and can't be deallocated. The i32 may be part of a larger type, which, because it handed out a reference, can't be moved, changed or deallocated; the receiver is guaranteed that. It's hard to say there is an advantage per se here; it might be advantageous to communicate more detailed type (and lifetime!) relationships this way.
They're very useful, because they can be passed to generic functions that expect a reference:
fn map_vec<T, U>(v: &Vec<T>, f: impl Fn(&T) -> U) -> Vec<U> {...}
If immutable references to Copy types were forbidden, we would need two versions (shown here as pseudocode, since a negative bound such as !Copy is not valid Rust):
fn map_vec_own<T: !Copy, U>(v: &Vec<T>, f: impl Fn(&T) -> U) -> Vec<U> {...}
fn map_vec_copy<T: Copy, U>(v: &Vec<T>, f: impl Fn( T) -> U) -> Vec<U> {...}
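A possible body for map_vec, together with calls on both a non-Copy and a Copy element type, is sketched below. Two things are assumptions of this sketch: the body itself (the answer elides it) and the choice to accept a slice &[T] rather than &Vec<T>, which lets a &Vec<T> be passed unchanged.

```rust
/// Applies `f` to a reference to every element and collects the results.
/// One definition serves Copy and non-Copy element types alike.
fn map_vec<T, U>(v: &[T], f: impl Fn(&T) -> U) -> Vec<U> {
    v.iter().map(f).collect()
}

/// Uses the same generic function with `String` (not Copy) and `i32` (Copy).
fn lengths_and_doubles() -> (Vec<usize>, Vec<i32>) {
    let words = vec![String::from("cell"), String::from("refcell")];
    let numbers = vec![1, 2, 3];

    // `w` is a `&String`: the strings are neither moved nor cloned.
    let lengths = map_vec(&words, |w| w.len());
    // `n` is a `&i32`: a reference to a Copy type works just the same.
    let doubled = map_vec(&numbers, |n| n * 2);

    (lengths, doubled)
}
```

Both vectors remain usable after the calls, since map_vec only ever borrowed their elements.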
Immutable references are, naturally, used to provide access to the referenced data. For instance, you could have loaded a dictionary and have multiple threads reading from it at the same time, each using their own immutable reference. Because the references are immutable those threads will not corrupt that common data.
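The dictionary scenario can be sketched with scoped threads, each of which reads the same map through a shared reference. The dictionary contents and the function name parallel_lookups are made up for the example; std::thread::scope requires Rust 1.63 or later.

```rust
use std::collections::HashMap;
use std::thread;

/// Looks up several words concurrently in one shared, read-only map.
fn parallel_lookups() -> Vec<Option<u32>> {
    let mut dictionary = HashMap::new();
    dictionary.insert("cell", 1u32);
    dictionary.insert("refcell", 2u32);
    let dictionary = dictionary; // rebind as immutable: read-only from here on

    let words = ["cell", "refcell", "mutex"];

    thread::scope(|scope| {
        let mut handles = Vec::new();
        for word in words {
            // Every thread receives its own `&HashMap` to the same data.
            let dict = &dictionary;
            handles.push(scope.spawn(move || dict.get(word).copied()));
        }
        // Join in spawn order so the results line up with `words`.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

No locking and no copying of the map is needed, because none of the threads can modify it through a shared reference.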
Using only mutable references, you can't be sure of that so you need to make full copies. Copying data takes time and space, which are always limited. The primary question for performance tends to be if your data fits in CPU cache.
I'm guessing you were thinking of "copy" types as ones that fit in the same space as the reference itself, i.e. sizeof(type) <= sizeof(type*). Rust's Copy trait indicates data that can be safely copied, no matter the size. These are orthogonal concepts; for instance, a pointer might not be safely copyable without adjusting a reference count, or an array might be copyable but take gigabytes of memory. This is why Rc<T> has the Clone trait, not Copy.

What is the difference between Rc<RefCell<T>> and RefCell<Rc<T>>?

The Rust documentation covers Rc<RefCell<T>> pretty extensively but doesn't go into RefCell<Rc<T>>, which I am now encountering.
Do these effectively give the same result? Is there an important difference between them?
Do these effectively give the same result?
They are very different.
Rc is a pointer with shared ownership while RefCell provides interior mutability. The order in which they are composed makes a big difference to how they can be used.
Usually, you compose them as Rc<RefCell<T>>; the whole thing is shared and each shared owner gets to mutate the contents. The effect of mutating the contents will be seen by all of the shared owners of the outer Rc because the inner data is shared.
You can't share a RefCell<Rc<T>> except by reference, so this configuration is more limited in how it can be used. In order to mutate the inner data, you would need to mutably borrow from the outer RefCell, but then you'd have access to an immutable Rc. The only way to mutate it would be to replace it with a completely different Rc. For example:
let a = Rc::new(1);
let b = Rc::new(2);
let c = RefCell::new(Rc::clone(&a));
let d = RefCell::new(Rc::clone(&a));
*d.borrow_mut() = Rc::clone(&b); // this doesn't affect c
There is no way to mutate the values in a and b. This seems far less useful than Rc<RefCell<T>>.
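The contrast between the two nestings can be checked side by side with a sketch like this one; the function names shared_cell and swapped_pointer are invented for the illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Rc<RefCell<T>>: a write through one owner is visible to the other.
fn shared_cell() -> i32 {
    let a = Rc::new(RefCell::new(1));
    let b = Rc::clone(&a);
    *a.borrow_mut() = 10;
    let seen = *b.borrow();
    seen
}

/// RefCell<Rc<T>>: only the pointer stored in the cell can be replaced;
/// the values behind `a` and `b` themselves never change.
fn swapped_pointer() -> (i32, i32, i32) {
    let a = Rc::new(1);
    let b = Rc::new(2);
    let c = RefCell::new(Rc::clone(&a));
    let d = RefCell::new(Rc::clone(&a));

    *d.borrow_mut() = Rc::clone(&b); // `d` now points at `b`; `c` is untouched

    let c_val = **c.borrow();
    let d_val = **d.borrow();
    (*a, c_val, d_val)
}
```

In the first function the shared value itself changes; in the second, a still holds 1, c still points at it, and only d was redirected to the value 2.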
