How does refcell interact with the borrow checker - rust

I am trying to wrap my head around the RefCell type, not as a utility, but in terms of it's implementation.
I understand the general idea, a struct that has mutable data inside of it that keeps track of how many active references to that data exist.
but how does it interact with the borrow checker to allow you to mutate the data multiple times?

but how does it interact with the borrow checker to allow you to mutate the data multiple times?
Internally, it doesn't really - it uses unsafe code.
RefCell is a checked wrapper around UnsafeCell, which provides the inner mutability. Its get method returns a *mut pointer to the wrapped type, which can be cast to a &mut reference with an arbitrary lifetime. This, of course, is very unsafe, so it's up to the user to ensure there's no violations of the borrow checker rules with UnsafeCell.
Cell does this by not allowing references to the inner object. RefCell does this via essentially a single-threaded lock, keeping track of whether a reference exists and denying access if so. Mutex and RwLock are similar, but use thread-safe locks.
(As an aside, UnsafeCell IS compiler magic, but for a different reason - it's needed to tell the compiler that an object may change behind an immutable reference. But that doesn't have to do with the borrow checks so much as optimizations.)

Related

Rust Ownership Smart Pointers

I've recently started learning Rust and just learned about the Smart Pointers (Box, Rc and RefCell).
In the guide they talked about Rc implementing "shared ownership". But if I understood it correctly, the whole point of the ownership system is that there can only be one owner.
And to me (still a Rust newbie) it seems as if Rc and RefCell take ownership of they value they contain and just "expose" different types of references to the contained value?
Am I wrong and if yes: why is Rust allowed to "cheat" the ownership system like that and would I be theoretically able to implement my own "cheating" types?
if I understood it correctly, the whole point of the ownership system is that there can only be one owner.
No. Rust guarantees that there can be no more than a single mutable borrow and there cannot be mutable and non-mutable borrows at the same time. It doesn't say anything about owners.
why is Rust allowed to "cheat" the ownership system
It doesn't.
would I be theoretically able to implement my own "cheating" types
Yes. Those types are all implemented in Rust¹. Those types are battle-tested and perfectly safe under Rust's safety rules, but they require the use of unsafe at a lower level.
Note that unsafe doesn't permit going around the rule that you can have one mutable borrow XOR any number of non-mutable borrows, but using unsafe, you could do it anyway. This, of course, would actually be unsafe (and trigger undefined behavior).
1: Although some of those types are implemented using features that are still private to the compiler so you wouldn't be able to do everything as efficiently as the standard library, and Box and UnsafeCell are special to the language and cannot be reproduced by a normal library. There are for example many crates providing Rc or Arc alternatives which are better that the standard ones in some cases.

Understanding usage of Rc<RefCell<SomeStruct>> in Rust

I'm looking at some code that uses
Rc<RefCell<SomeStruct>>
So I went out to read about the differences between Rc and RefCell:
Here is a recap of the reasons to choose Box, Rc, or RefCell:
Rc enables multiple owners of the same data; Box and RefCell
have single owners.
Box allows immutable or mutable borrows checked
at compile time; Rc allows only immutable borrows checked at
compile time;
RefCell allows immutable or mutable borrows checked
at runtime. Because RefCell allows mutable borrows checked at
runtime, you can mutate the value inside the RefCell even when the
RefCell is immutable.
So, Rc makes sure that SomeStruct is accessible by many people at the same time. But how do I access? I only see the get_mut method, which returns a mutable reference. But the text explained that "Rc allows only immutable borrows".
If it's possible to access Rc's object in mut and not mut way, why a RefCell is needed?
So, Rc makes sure that SomeStruct is accessible by many people at the same time. But how do I access?
By dereferencing. If you have a variable x of type Rc<...>, you can access the inner value using *x. In many cases this happens implicitly; for example you can call methods on x simply with x.method(...).
I only see the get_mut method, which returns a mutable reference. But the text explained that "Rc allows only immutable borrows".
The get_mut() method is probably more recent than the explanation stating that Rc only allows immutable borrows. Moreover, it only returns a mutable borrow if there currently is only a single owner of the inner value, i.e. if you currently wouldn't need Rc in the first place. As soon as there are multiple owners, get_mut() will return None.
If it's possible to access Rc's object in mut and not mut way, why a RefCell is needed?
RefCell will allow you to get mutable access even when multiple owners exist, and even if you only hold a shared reference to the RefCell. It will dynamically check at runtime that only a single mutable reference exists at any given time, and it will panic if you request a second, concurrent one (or return and error for the try_borrow methods, respecitvely). This functionality is not offered by Rc.
So in summary, Rc gives you shared ownership. The innervalue has multiple owners, and reference counting makes sure the data stays alive as long as at least one owner still holds onto it. This is useful if your data doesn't have a clear single owner. RefCell gives you interior mutability, i.e. you can borrow the inner value dynamically at runtime, and modify it even with a shared reference. The combination Rc<RefCell<...>> gives you the combination of both – a value with multiple owners that can be borrowed mutably by any one of the owners.
For further details, you can read the relevant chapters of the Rust book:
Rc<T>, the Reference Counted Smart Pointer
RefCell<T> and the Interior Mutability Pattern
If it's possible to access Rc's object in mut and not mut way, why a
RefCell is needed?
Rc pointer allows you to have shared ownership. since ownership is shared, the value owned by Rc pointer is immutable
Refcell smart pointer represents single ownership over the data it holds, much like Box smart pointer. the difference is that box smart pointer enforces the borrowing rules at compile time, whereas refcell enforces the borrowing rules at run time.
If you combine them together, you can create a smart pointer which can have multiple owners, and some of the owners would be able to modify the value some cannot. A perfect use case is to create a doubly linked list in rust.
struct LinkedList<T>{
head:Pointer<T>,
tail:Pointer<T>
}
struct Node<T>{
element:T,
next:Pointer<T>,
prev:Pointer<T>,
}
// we need multiple owners who can mutate the data
// it is Option because "end.next" would be None
type Pointer<T>=Option<Rc<RefCell<Node<T>>>>;
In the image "front" and "end" nodes will both point to the "middle" node and they can both mutate it. Imagine you need to insert a new node after "front", you will need to mutate "front.next". So in doubly linked you need multiple ownership and mutability power at the same time.

How to modify private mutable state when the trait dictates a non-mutable self reference? [duplicate]

When would you be required to use Cell or RefCell? It seems like there are many other type choices that would be suitable in place of these, and the documentation warns that using RefCell is a bit of a "last resort".
Is using these types a "code smell"? Can anyone show an example where using these types makes more sense than using another type, such as Rc or even Box?
It is not entirely correct to ask when Cell or RefCell should be used over Box and Rc because these types solve different problems. Indeed, more often than not RefCell is used together with Rc in order to provide mutability with shared ownership. So yes, use cases for Cell and RefCell are entirely dependent on the mutability requirements in your code.
Interior and exterior mutability are very nicely explained in the official Rust book, in the designated chapter on mutability. External mutability is very closely tied to the ownership model, and mostly when we say that something is mutable or immutable we mean exactly the external mutability. Another name for external mutability is inherited mutability, which probably explains the concept more clearly: this kind of mutability is defined by the owner of the data and inherited to everything you can reach from the owner. For example, if your variable of a structural type is mutable, so are all fields of the structure in the variable:
struct Point { x: u32, y: u32 }
// the variable is mutable...
let mut p = Point { x: 10, y: 20 };
// ...and so are fields reachable through this variable
p.x = 11;
p.y = 22;
let q = Point { x: 10, y: 20 };
q.x = 33; // compilation error
Inherited mutability also defines which kinds of references you can get out of the value:
{
let px: &u32 = &p.x; // okay
}
{
let py: &mut u32 = &mut p.x; // okay, because p is mut
}
{
let qx: &u32 = &q.x; // okay
}
{
let qy: &mut u32 = &mut q.y; // compilation error since q is not mut
}
Sometimes, however, inherited mutability is not enough. The canonical example is reference-counted pointer, called Rc in Rust. The following code is entirely valid:
{
let x1: Rc<u32> = Rc::new(1);
let x2: Rc<u32> = x1.clone(); // create another reference to the same data
let x3: Rc<u32> = x2.clone(); // even another
} // here all references are destroyed and the memory they were pointing at is deallocated
At the first glance it is not clear how mutability is related to this, but recall that reference-counted pointers are called so because they contain an internal reference counter which is modified when a reference is duplicated (clone() in Rust) and destroyed (goes out of scope in Rust). Hence Rc has to modify itself even though it is stored inside a non-mut variable.
This is achieved via internal mutability. There are special types in the standard library, the most basic of them being UnsafeCell, which allow one to work around the rules of external mutability and mutate something even if it is stored (transitively) in a non-mut variable.
Another way to say that something has internal mutability is that this something can be modified through a &-reference - that is, if you have a value of type &T and you can modify the state of T which it points at, then T has internal mutability.
For example, Cell can contain Copy data and it can be mutated even if it is stored in non-mut location:
let c: Cell<u32> = Cell::new(1);
c.set(2);
assert_eq!(c.get(), 2);
RefCell can contain non-Copy data and it can give you &mut pointers to its contained value, and absence of aliasing is checked at runtime. This is all explained in detail on their documentation pages.
As it turned out, in overwhelming number of situations you can easily go with external mutability only. Most of existing high-level code in Rust is written that way. Sometimes, however, internal mutability is unavoidable or makes the code much clearer. One example, Rc implementation, is already described above. Another one is when you need shared mutable ownership (that is, you need to access and modify the same value from different parts of your code) - this is usually achieved via Rc<RefCell<T>>, because it can't be done with references alone. Even another example is Arc<Mutex<T>>, Mutex being another type for internal mutability which is also safe to use across threads.
So, as you can see, Cell and RefCell are not replacements for Rc or Box; they solve the task of providing you mutability somewhere where it is not allowed by default. You can write your code without using them at all; and if you get into a situation when you would need them, you will know it.
Cells and RefCells are not code smell; the only reason whey they are described as "last resort" is that they move the task of checking mutability and aliasing rules from the compiler to the runtime code, as in case with RefCell: you can't have two &muts pointing to the same data at the same time, this is statically enforced by the compiler, but with RefCells you can ask the same RefCell to give you as much &muts as you like - except that if you do it more than once it will panic at you, enforcing aliasing rules at runtime. Panics are arguably worse than compilation errors because you can only find errors causing them at runtime rather than at compilation time. Sometimes, however, the static analyzer in the compiler is too restrictive, and you indeed do need to "work around" it.
No, Cell and RefCell aren't "code smells". Normally, mutability is inherited, that is you can mutate a field or a part of a data structure if and only if you have exclusive access to of the whole data structure, and hence you can opt into mutability at that level with mut (i.e., foo.x inherits its mutability or lack thereof from foo). This is a very powerful pattern and should be used whenever it works well (which is surprisingly often). But it's not expressive enough for all code everywhere.
Box and Rc have nothing to do with this. Like almost all other types, they respect inherited mutability: you can mutate the contents of a Box if you have exclusive, mutable access to the Box (because that means you have exclusive access to the contents, too). Conversely, you can never get a &mut to the contents of an Rc because by its nature Rc is shared (i.e. there can be multiple Rcs referring to the same data).
One common case of Cell or RefCell is that you need to share mutable data between several places. Having two &mut references to the same data is normally not allowed (and for good reason!). However, sometimes you need it, and the cell types enable doing it safely.
This could be done via the common combination of Rc<RefCell<T>>, which allows the data to stick around for as long as anyone uses it and allows everyone (but only one at a time!) to mutate it. Or it could be as simple as &Cell<i32> (even if the cell is wrapped in a more meaningful type). The latter is also commonly used for internal, private, mutable state like reference counts.
The documentation actually has several examples of where you'd use Cell or RefCell. A good example is actually Rc itself. When creating a new Rc, the reference count must be increased, but the reference count is shared between all Rcs, so, by inherited mutability, this couldn't possibly work. Rc practically has to use a Cell.
A good guideline is to try writing as much code as possible without cell types, but using them when it hurts too much without them. In some cases, there is a good solution without cells, and, with experience, you'll be able to find those when you previously missed them, but there will always be things that just aren't possible without them.
Suppose you want or need to create some object of the type of your choice and dump it into an Rc.
let x = Rc::new(5i32);
Now, you can easily create another Rc that points to the exact same object and therefore memory location:
let y = x.clone();
let yval: i32 = *y;
Since in Rust you may never have a mutable reference to a memory location to which any other reference exists, these Rc containers can never be modified again.
So, what if you wanted to be able to modify those objects and have multiple Rc pointing to one and the same object?
This is the issue that Cell and RefCell solve. The solution is called "interior mutability", and it means that Rust's aliasing rules are enforced at runtime instead of compile-time.
Back to our original example:
let x = Rc::new(RefCell::new(5i32));
let y = x.clone();
To get a mutable reference to your type, you use borrow_mut on the RefCell.
let yval = x.borrow_mut();
*yval = 45;
In case you already borrowed the value your Rcs point to either mutably or non-mutably, the borrow_mut function will panic, and therefore enforce Rust's aliasing rules.
Rc<RefCell<T>> is just one example for RefCell, there are many other legitimate uses. But the documentation is right. If there is another way, use it, because the compiler cannot help you reason about RefCells.

Why is transmuting &T to &mut T Undefined Behaviour?

I want to reinterpret an immutable reference to a mutable reference (in an unsafe block) and be responsible for the safety checks on my own, yet it appears I cannot use mem::transmute() to do so.
let map_of_vecs: HashMap<usize, Vec<_>> = ...;
let vec = map_of_vecs[2];
/// obtain a mutable reference to vec here
I do not want to wrap the Vecs into Cells because that would affect all other areas of code that use map_of_vecs and I only need mutability in one line.
I do not have mutable access to map_of_vecs
The Rust optimiser makes the assumption that &mut T references are unique. For example, it might deduce that a particular piece of memory can be reused because a mutable reference to that memory exists but is never accessed again.
However, if you transmute a &T to a &mut T then you are able to create multiple mutable references to the same data. If the compiler makes this assumption, you could end up dereferencing a value that has been overwritten with something else.
This is just one example of how the compiler might make use of the assumption that mutable references are unique. In fact, the compiler is free to use this information in any way it sees fit — which could (and likely will) change from version to version.
Even if you think you have guaranteed that the reference isn't aliased, you can't always guarantee that users of your code won't create more references. Even if you think you can be sure of that, the existence of references is extremely subtle and it's very easy to miss one. For example when you call a method that takes &self, that's a reference.
The Rust compiler annotates &T function parameters with the LLVM noalias and readonly attributes (provided that T does not contain any UnsafeCell parts). The noalias attribute tells LLVM that the memory behind this pointer may only be written to through this pointer (and not through any other pointers), and the readonly attribute tells LLVM that it can't be written to through this pointer (but possibly other pointers). In combination, the two attributes allow the LLVM optimiser to assume the memory is not changed at all during the execution of this function, and the code can be optimised based on this assumption. The optimiser may reorder instructions or remove code in a way that is only safe to do if you actually stick to this contract.
Another way the conversion can lead to undefined behaviour is for statics: immutable statics without UnsafeCells will be placed into read-only memory, so if you actually write to them, your code will segfault.
For parameters with UnsafeCells the compiler does not emit the readonly attribute, and statics containing an UnsafeCell are placed into writable memory.

Understand smart pointers in Rust

I am a newbie to Rust and writing to understand the "Smart pointers" in Rust. I have basic understanding of how smart pointers works in C++ and has been using it for memory management since a few years ago. But to my very much surprise, Rust also provides such utility explicitly.
Because from a tutorial here (https://pcwalton.github.io/2013/03/18/an-overview-of-memory-management-in-rust.html), it seems that every raw pointers have been automatically wrapped with a smart pointer, which seems very reasonable. Then why do we still need such Box<T>, Rc<T>, and Ref<T> stuff? According to this specification: https://doc.rust-lang.org/book/ch15-00-smart-pointers.html
Any comments will be apprecicated a lot. Thanks.
You can think about the difference between a T and a Box<T> as the difference between a statically allocated object and a dynamically allocated object (the latter being created via a new expression in C++ terms).
In Rust, both T and Box<T> represent a variable that has ownership over the referent object (i.e. when the variable goes out of scope, the object will be destroyed, whether it was stored by value or by reference). On the contrary, &T and &mut T represent borrowing of the object (i.e. these variables are not responsible for destroying the object, and they cannot outlive the owner of the object).
By default, you'd probably want to use T, but sometimes you might want (or have) to use Box<T>. For example, you would use a Box<T> if you want to own a T that's too large to be allocated in place. You would also use it when the object doesn't have a known size at all, which means that your only choice to store it or pass it around is through the "pointer" (the Box<T>).
In Rust, an object is generally either mutable or aliased, but not both. If you have given out immutable references to an object, you normally need to wait until those references are over before you can mutate that object again.
Additionally, Rust's immutability is transitive. If you receive an object immutably, it means that you have access to its contents (and the contents of those contents, and so on) also immutably.
Normally, all of these things are enforced at compile time. This means that you catch errors faster, but you are limited to being able to express only what the compiler can prove statically.
Like T and Box<T>, you may sometimes use RefCell<T>, which is another ownership type. But unlike T and Box<T>, the RefCell<T> enforces the borrow checking rules at runtime instead of compile time, meaning that sometimes you can do things with it that are safe but wouldn't pass the compiler's static borrow checker. The main example for this is getting a mutable reference to the interior of an object that was received immutably (which, under the statically enforced rules of Rust, would make the entire interior immutable).
The types Ref<T> and RefMut<T> are the runtime-checked equivalents of &T and &mut T respectively.
(EDIT: This whole thing is somewhat of a lie. &mut really means "unique borrow" and & means "non-unique borrow". Certain types, like mutexes, can be non-uniquely but still mutably borrowed, because otherwise they would be useless.)
Rust's ownership model tries to push you to write programs in which objects' lifetimes are known at compile time. This works well in certain scenarios, but makes other scenarios difficult or impossible to express.
Rc<T> and its atomic sibling Arc<T> are reference-counting wrappers of T. They offer you an alternative to the ownership model.
They are useful when you want to use and properly dispose an object, but it is not easy (or possible) to determine, at the moment you're writing the code, which specific variable should be the owner of that object (and therefore should take care of disposing it). Much like in C++, this means that there is no single owner of the object and that the object will be disposed by the last reference-counting wrapper that points to it.
The article you linked uses outdated syntax. Certain smart pointers used to have special names and associated syntax that has been removed since some time before Rust 1.0:
Box<T> replaced ~T ("owned pointers")
Rc<T> replaced #T ("managed pointers")
Because the Internet never forgets, you can still find pre-1.0 documentation and articles (such as the one you linked) that use the old syntax. Check the date of the article: if it's before May 2015, you're dealing with an early, unstable Rust.

Resources