Trouble with bidirectional references using Option<Rc<RefCell<T>>>

I am building a NES emulator to learn Rust. I have difficulty organizing components.
My emulator uses a Bus structure to communicate with the CPU and PPU. The CPU and PPU also need to communicate with the bus.
I figured that it is a good idea to create a shared pointer for Bus so that PPU and CPU have the same pointer reference to a bus. This is what I tried: playground. Unfortunately it didn't work.
error[E0308]: mismatched types
  --> src/main.rs:33:9
   |
32 |     fn bus(&mut self) -> &mut Bus {
   |                          -------- expected `&mut Bus` because of return type
33 |         self.bus_helper().borrow_mut()
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |         |
   |         expected `&mut Bus`, found struct `RefMut`
   |         help: consider mutably borrowing here: `&mut self.bus_helper().borrow_mut()`
   |
   = note: expected mutable reference `&mut Bus`
                         found struct `RefMut<'_, Bus>`
How to get my code to compile? Also, I wonder whether it is the best way to create bidirectional references? Are there any alternatives?

borrow_mut() returns a RefMut, which is a wrapper that maintains the borrow count of the cell so that the cell can panic if Rust's aliasing rules are violated at runtime (e.g. if a mutable reference and any other reference try to exist at the same time).
You can fix this by changing the return type to RefMut<'_, Bus>.
Alternatively, to make the RefCell implementation detail hidden from the caller, you can instead return impl DerefMut<Target=Bus> + '_ which says "this method returns something that can deref as a mutable Bus and which captures the lifetime of self."
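A minimal sketch of both return types. The playground code is not reproduced here, so Bus, Cpu and their fields below are hypothetical stand-ins rather than the asker's actual definitions:

```rust
use std::cell::{RefCell, RefMut};
use std::ops::DerefMut;
use std::rc::Rc;

// Hypothetical stand-ins for the types in the question.
struct Bus {
    ram: [u8; 4],
}

struct Cpu {
    bus: Rc<RefCell<Bus>>,
}

impl Cpu {
    // Option 1: name the guard type explicitly.
    fn bus(&self) -> RefMut<'_, Bus> {
        self.bus.borrow_mut()
    }

    // Option 2: hide the RefCell behind "something that derefs to a mutable Bus".
    fn bus_opaque(&self) -> impl DerefMut<Target = Bus> + '_ {
        self.bus.borrow_mut()
    }
}

// Writes through one guard, then reads the value back through the other.
fn write_then_read() -> u8 {
    let cpu = Cpu { bus: Rc::new(RefCell::new(Bus { ram: [0; 4] })) };
    cpu.bus().ram[0] = 42; // the guard is dropped at the end of this statement
    let value = cpu.bus_opaque().ram[0];
    value
}
```

Either way the caller holds a guard, not a plain reference, so a second call to bus() while a guard is still alive would panic at runtime.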
Also, I wonder whether it is the best way to create bidirectional references?
Generally for bidirectional references, you want all of the references away from the "primary value" (e.g. the root of a tree) to be Rc and references in the other direction to be Weak. This allows you to drop the primary value and have the whole network of structures be automatically destroyed. If you have two values with Rcs to each other then you have a reference cycle that must be explicitly broken to avoid a memory leak when you release your "top-level" reference to the network of structures.
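As a rough illustration of the Rc-down, Weak-up arrangement, here is a sketch with hypothetical Parent and Child types (the names are not from the question):

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// The "primary value": it owns its children through strong references.
struct Parent {
    children: RefCell<Vec<Rc<Child>>>,
}

// The back-reference is weak, so it does not keep the parent alive.
struct Child {
    parent: Weak<Parent>,
}

fn build() -> Rc<Parent> {
    let parent = Rc::new(Parent { children: RefCell::new(Vec::new()) });
    let child = Rc::new(Child { parent: Rc::downgrade(&parent) });
    parent.children.borrow_mut().push(child);
    parent
}

// A child can reach its parent only while the parent is still alive.
fn parent_alive(child: &Child) -> bool {
    child.parent.upgrade().is_some()
}
```

Dropping the last Rc to the parent frees it even though a child still points back at it; after that, upgrade() on the child's Weak returns None.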
However, if you find yourself wanting bidirectional references in this kind of case where you have a fixed set of values that want to refer to each other, you likely want a different pattern altogether. Based on your playground link, the CPU and PPU have strong shared ownership of the bus, and the bus has weak ownership back to the CPU and PPU. (In your model, you should express that with Weak and not raw pointers.)
What this hints at is that a single entity should own all of the components, and that entity should be responsible for their communication. There are a few ways you could model this.
A System value owns all of the components, and this value would provide a communication mechanism. This would replace Bus. The circular reference can be avoided by requiring a reference to System be given to the component values any time they need to do anything.
Channels could be used for communication between components, though this makes synchronous messaging between components trickier.
If the performance implications of channels are prohibitive and the desired API can't work by passing references to System around with every call, you could have references from the components back to System if you pin System so that it can't move, and therefore the references can't be invalidated. This is a bit of an advanced topic; you may want to read the relevant documentation.
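A sketch of the first option, under some assumptions: all types and fields here are hypothetical and heavily simplified, and instead of handing each component a reference to the whole System, the owner passes only the shared part (a Bus field) into each call. That variation avoids borrowing System as a whole while one of its fields is mutably borrowed.

```rust
// Hypothetical, heavily simplified components; none of these definitions
// come from the original playground.
struct Bus {
    ram: [u8; 8],
}

struct Cpu {
    acc: u8,
}

struct Ppu {
    last_seen: u8,
}

impl Cpu {
    // The CPU borrows the bus only for the duration of the call.
    fn tick(&mut self, bus: &mut Bus) {
        self.acc = self.acc.wrapping_add(1);
        bus.ram[0] = self.acc;
    }
}

impl Ppu {
    fn tick(&mut self, bus: &Bus) {
        self.last_seen = bus.ram[0];
    }
}

// The single owner of every component.
struct System {
    bus: Bus,
    cpu: Cpu,
    ppu: Ppu,
}

impl System {
    fn new() -> Self {
        System {
            bus: Bus { ram: [0; 8] },
            cpu: Cpu { acc: 0 },
            ppu: Ppu { last_seen: 0 },
        }
    }

    fn step(&mut self) {
        // Borrowing two different fields at the same time is allowed.
        self.cpu.tick(&mut self.bus);
        self.ppu.tick(&self.bus);
    }
}
```

No Rc, Weak or RefCell is needed in this layout, because no component stores a reference to another one.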

Related

Rust Ownership Smart Pointers

I've recently started learning Rust and just learned about the Smart Pointers (Box, Rc and RefCell).
In the guide they talked about Rc implementing "shared ownership". But if I understood it correctly, the whole point of the ownership system is that there can only be one owner.
And to me (still a Rust newbie) it seems as if Rc and RefCell take ownership of the value they contain and just "expose" different types of references to the contained value?
Am I wrong and if yes: why is Rust allowed to "cheat" the ownership system like that and would I be theoretically able to implement my own "cheating" types?

if I understood it correctly, the whole point of the ownership system is that there can only be one owner.
No. Rust guarantees that there can be no more than a single mutable borrow and there cannot be mutable and non-mutable borrows at the same time. It doesn't say anything about owners.
why is Rust allowed to "cheat" the ownership system
It doesn't.
would I be theoretically able to implement my own "cheating" types
Yes. Those types are all implemented in Rust¹. Those types are battle-tested and perfectly safe under Rust's safety rules, but they require the use of unsafe at a lower level.
Note that unsafe doesn't permit going around the rule that you can have one mutable borrow XOR any number of non-mutable borrows, but using unsafe, you could do it anyway. This, of course, would actually be unsafe (and trigger undefined behavior).
1: Although some of those types are implemented using features that are still private to the compiler so you wouldn't be able to do everything as efficiently as the standard library, and Box and UnsafeCell are special to the language and cannot be reproduced by a normal library. There are for example many crates providing Rc or Arc alternatives which are better than the standard ones in some cases.
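To make the "implement your own" point concrete, here is a minimal sketch of a Cell-like type built on UnsafeCell. MyCell is a hypothetical name, and the real std::cell::Cell offers more than this:

```rust
use std::cell::UnsafeCell;

// A hypothetical, minimal re-implementation of the idea behind std::cell::Cell.
struct MyCell<T: Copy> {
    value: UnsafeCell<T>,
}

impl<T: Copy> MyCell<T> {
    fn new(value: T) -> Self {
        MyCell { value: UnsafeCell::new(value) }
    }

    fn get(&self) -> T {
        // SAFETY: MyCell is not Sync (UnsafeCell is not), so only one thread can
        // touch it, and no reference to the inner value is ever handed out.
        unsafe { *self.value.get() }
    }

    fn set(&self, value: T) {
        // SAFETY: same argument as in `get`; the write cannot overlap a borrow.
        unsafe { *self.value.get() = value; }
    }
}

// Mutates through a shared reference, using only the safe API.
fn bump(counter: &MyCell<u32>) {
    counter.set(counter.get() + 1);
}
```

The unsafe blocks are confined to the implementation; callers such as bump only ever see a safe interface.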

Understanding usage of Rc<RefCell<SomeStruct>> in Rust

I'm looking at some code that uses
Rc<RefCell<SomeStruct>>
So I went out to read about the differences between Rc and RefCell:
Here is a recap of the reasons to choose Box, Rc, or RefCell:
Rc enables multiple owners of the same data; Box and RefCell have single owners.
Box allows immutable or mutable borrows checked at compile time; Rc allows only immutable borrows checked at compile time; RefCell allows immutable or mutable borrows checked at runtime.
Because RefCell allows mutable borrows checked at runtime, you can mutate the value inside the RefCell even when the RefCell is immutable.
So, Rc makes sure that SomeStruct is accessible by many people at the same time. But how do I access? I only see the get_mut method, which returns a mutable reference. But the text explained that "Rc allows only immutable borrows".
If it's possible to access Rc's object in mut and not mut way, why a RefCell is needed?

So, Rc makes sure that SomeStruct is accessible by many people at the same time. But how do I access?
By dereferencing. If you have a variable x of type Rc<...>, you can access the inner value using *x. In many cases this happens implicitly; for example you can call methods on x simply with x.method(...).
I only see the get_mut method, which returns a mutable reference. But the text explained that "Rc allows only immutable borrows".
The get_mut() method is probably more recent than the explanation stating that Rc only allows immutable borrows. Moreover, it only returns a mutable borrow if there currently is only a single owner of the inner value, i.e. if you currently wouldn't need Rc in the first place. As soon as there are multiple owners, get_mut() will return None.
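A small sketch of that behaviour; try_increment is a hypothetical helper written for this illustration:

```rust
use std::rc::Rc;

// Increments the inner value only if `value` is the sole owner.
fn try_increment(value: &mut Rc<i32>) -> bool {
    match Rc::get_mut(value) {
        Some(inner) => {
            *inner += 1;
            true
        }
        // Another Rc (or Weak) points at the same allocation.
        None => false,
    }
}

// Reading never needs get_mut: dereferencing is enough.
fn read(value: &Rc<i32>) -> i32 {
    **value
}
```

Calling try_increment on a freshly created Rc succeeds; after Rc::clone it returns false until the clone is dropped again.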
If it's possible to access Rc's object in mut and not mut way, why a RefCell is needed?
RefCell will allow you to get mutable access even when multiple owners exist, and even if you only hold a shared reference to the RefCell. It will dynamically check at runtime that only a single mutable reference exists at any given time, and it will panic if you request a second, concurrent one (or return an error if you use the try_borrow methods instead). This functionality is not offered by Rc.
So in summary, Rc gives you shared ownership. The inner value has multiple owners, and reference counting makes sure the data stays alive as long as at least one owner still holds onto it. This is useful if your data doesn't have a clear single owner. RefCell gives you interior mutability, i.e. you can borrow the inner value dynamically at runtime, and modify it even with a shared reference. The combination Rc<RefCell<...>> gives you the combination of both – a value with multiple owners that can be borrowed mutably by any one of the owners.
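A short sketch of the combination, using a plain integer as the shared value (the function name is made up for this example):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Returns the final value and whether a second, overlapping mutable borrow was refused.
fn shared_counter() -> (i32, bool) {
    let a = Rc::new(RefCell::new(0));
    let b = Rc::clone(&a); // second owner of the same RefCell

    *a.borrow_mut() += 1; // each guard is dropped at the end of its statement
    *b.borrow_mut() += 10;

    // While one mutable borrow is alive, a second one is rejected at runtime.
    let guard = a.borrow_mut();
    let second_borrow_failed = b.try_borrow_mut().is_err();
    drop(guard);

    let total = *a.borrow();
    (total, second_borrow_failed)
}
```

Using borrow_mut() instead of try_borrow_mut() in the overlapping case would panic rather than return an error.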
For further details, you can read the relevant chapters of the Rust book:
Rc<T>, the Reference Counted Smart Pointer
RefCell<T> and the Interior Mutability Pattern

If it's possible to access Rc's object in mut and not mut way, why a RefCell is needed?
An Rc pointer gives you shared ownership. Since ownership is shared, the value owned by the Rc pointer is immutable.
The RefCell smart pointer represents single ownership of the data it holds, much like the Box smart pointer. The difference is that Box enforces the borrowing rules at compile time, whereas RefCell enforces them at run time.
If you combine them, you get a smart pointer that can have multiple owners, each of which can modify the value. A typical use case is a doubly linked list in Rust.
use std::cell::RefCell;
use std::rc::Rc;
struct LinkedList<T> {
    head: Pointer<T>,
    tail: Pointer<T>,
}
struct Node<T> {
    element: T,
    next: Pointer<T>,
    prev: Pointer<T>,
}
// We need multiple owners who can mutate the data.
// It is an Option because "end.next" would be None.
type Pointer<T> = Option<Rc<RefCell<Node<T>>>>;
In a list of three nodes, the "front" and "end" nodes both point to the "middle" node, and both can mutate it. Suppose you need to insert a new node after "front": you have to mutate "front.next". So in a doubly linked list you need multiple ownership and mutability at the same time.

Why is transmuting &T to &mut T Undefined Behaviour?

I want to reinterpret an immutable reference to a mutable reference (in an unsafe block) and be responsible for the safety checks on my own, yet it appears I cannot use mem::transmute() to do so.
let map_of_vecs: HashMap<usize, Vec<_>> = ...;
let vec = &map_of_vecs[&2];
// obtain a mutable reference to vec here
I do not want to wrap the Vecs into Cells because that would affect all other areas of code that use map_of_vecs and I only need mutability in one line.
I do not have mutable access to map_of_vecs

The Rust optimiser makes the assumption that &mut T references are unique. For example, it might deduce that a particular piece of memory can be reused because a mutable reference to that memory exists but is never accessed again.
However, if you transmute a &T to a &mut T then you are able to create multiple mutable references to the same data. If the compiler makes this assumption, you could end up dereferencing a value that has been overwritten with something else.
This is just one example of how the compiler might make use of the assumption that mutable references are unique. In fact, the compiler is free to use this information in any way it sees fit — which could (and likely will) change from version to version.
Even if you think you have guaranteed that the reference isn't aliased, you can't always guarantee that users of your code won't create more references. Even if you think you can be sure of that, the existence of references is extremely subtle and it's very easy to miss one. For example when you call a method that takes &self, that's a reference.

The Rust compiler annotates &T function parameters with the LLVM noalias and readonly attributes (provided that T does not contain any UnsafeCell parts). The noalias attribute tells LLVM that the memory behind this pointer may only be written to through this pointer (and not through any other pointers), and the readonly attribute tells LLVM that it can't be written to through this pointer (but possibly other pointers). In combination, the two attributes allow the LLVM optimiser to assume the memory is not changed at all during the execution of this function, and the code can be optimised based on this assumption. The optimiser may reorder instructions or remove code in a way that is only safe to do if you actually stick to this contract.
Another way the conversion can lead to undefined behaviour is for statics: immutable statics without UnsafeCells will be placed into read-only memory, so if you actually write to them, your code will segfault.
For parameters with UnsafeCells the compiler does not emit the readonly attribute, and statics containing an UnsafeCell are placed into writable memory.
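For contrast, here is a sketch of the sanctioned route: mutation through a shared reference goes through an UnsafeCell-based type such as RefCell. This is the wrapping the asker wanted to avoid, so it is shown only to illustrate what the compiler is told in that case; push_through_shared is a hypothetical helper:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Pushes into one of the vectors while holding only a shared reference to the map.
fn push_through_shared(
    map: &HashMap<usize, RefCell<Vec<i32>>>,
    key: usize,
    value: i32,
) -> bool {
    match map.get(&key) {
        Some(cell) => {
            // RefCell contains an UnsafeCell, so the compiler does not assume
            // the Vec is unchanged behind the shared reference.
            cell.borrow_mut().push(value);
            true
        }
        None => false,
    }
}
```

Overlapping borrows are still rejected, only at runtime: borrow_mut() panics if the same Vec is already borrowed.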

Understand smart pointers in Rust

I am a newbie to Rust and am writing to understand the "smart pointers" in Rust. I have a basic understanding of how smart pointers work in C++ and have been using them for memory management for a few years. But to my great surprise, Rust also provides such utilities explicitly.
From a tutorial here (https://pcwalton.github.io/2013/03/18/an-overview-of-memory-management-in-rust.html), it seems that every raw pointer is automatically wrapped in a smart pointer, which seems very reasonable. Then why do we still need Box<T>, Rc<T>, and Ref<T>, as described in this specification: https://doc.rust-lang.org/book/ch15-00-smart-pointers.html
Any comments will be appreciated a lot. Thanks.

You can think about the difference between a T and a Box<T> as the difference between a statically allocated object and a dynamically allocated object (the latter being created via a new expression in C++ terms).
In Rust, both T and Box<T> represent a variable that has ownership over the referent object (i.e. when the variable goes out of scope, the object will be destroyed, whether it was stored by value or by reference). On the contrary, &T and &mut T represent borrowing of the object (i.e. these variables are not responsible for destroying the object, and they cannot outlive the owner of the object).
By default, you'd probably want to use T, but sometimes you might want (or have) to use Box<T>. For example, you would use a Box<T> if you want to own a T that's too large to be allocated in place. You would also use it when the object doesn't have a known size at all, which means that your only choice to store it or pass it around is through the "pointer" (the Box<T>).
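A sketch of the "no known size" case, assuming a made-up Shape trait: a trait object has no fixed size, so it can only be owned behind a pointer such as Box.

```rust
// Hypothetical trait and types, used only for illustration.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// `dyn Shape` is unsized, so each element is owned through a Box.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|shape| shape.area()).sum()
}
```

A Vec<Box<dyn Shape>> can hold a Square and a Rect side by side, which a Vec of plain values could not.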
In Rust, an object is generally either mutable or aliased, but not both. If you have given out immutable references to an object, you normally need to wait until those references are over before you can mutate that object again.
Additionally, Rust's immutability is transitive. If you receive an object immutably, it means that you have access to its contents (and the contents of those contents, and so on) also immutably.
Normally, all of these things are enforced at compile time. This means that you catch errors faster, but you are limited to being able to express only what the compiler can prove statically.
Like T and Box<T>, you may sometimes use RefCell<T>, which is another ownership type. But unlike T and Box<T>, the RefCell<T> enforces the borrow checking rules at runtime instead of compile time, meaning that sometimes you can do things with it that are safe but wouldn't pass the compiler's static borrow checker. The main example for this is getting a mutable reference to the interior of an object that was received immutably (which, under the statically enforced rules of Rust, would make the entire interior immutable).
The types Ref<T> and RefMut<T> are the runtime-checked equivalents of &T and &mut T respectively.
(EDIT: This whole thing is somewhat of a lie. &mut really means "unique borrow" and & means "non-unique borrow". Certain types, like mutexes, can be non-uniquely but still mutably borrowed, because otherwise they would be useless.)
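A small sketch of the RefCell case described above: an object received through a shared reference whose interior can still be mutated. Logger is a hypothetical type for this example.

```rust
use std::cell::RefCell;

// Hypothetical type: the Vec is mutated even though callers only hold &Logger.
struct Logger {
    lines: RefCell<Vec<String>>,
}

impl Logger {
    // Takes &self, yet appends: borrow_mut() yields a RefMut, the
    // runtime-checked counterpart of &mut.
    fn log(&self, msg: &str) {
        self.lines.borrow_mut().push(msg.to_string());
    }

    // borrow() yields a Ref, the runtime-checked counterpart of &.
    fn count(&self) -> usize {
        self.lines.borrow().len()
    }
}

fn record_twice(logger: &Logger) -> usize {
    logger.log("first");
    logger.log("second");
    logger.count()
}
```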
Rust's ownership model tries to push you to write programs in which objects' lifetimes are known at compile time. This works well in certain scenarios, but makes other scenarios difficult or impossible to express.
Rc<T> and its atomic sibling Arc<T> are reference-counting wrappers of T. They offer you an alternative to the ownership model.
They are useful when you want to use and properly dispose an object, but it is not easy (or possible) to determine, at the moment you're writing the code, which specific variable should be the owner of that object (and therefore should take care of disposing it). Much like in C++, this means that there is no single owner of the object and that the object will be disposed by the last reference-counting wrapper that points to it.
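A brief sketch of "disposed by the last wrapper", using Rc::strong_count and a Weak observer to watch the count change (the function name is made up):

```rust
use std::rc::Rc;

// Returns the strong count with two owners, with one owner,
// and whether the value was freed once the last owner went away.
fn counts() -> (usize, usize, bool) {
    let first = Rc::new(String::from("shared"));
    let second = Rc::clone(&first);
    let with_two = Rc::strong_count(&first);

    // A Weak does not keep the value alive; it only lets us observe it.
    let watcher = Rc::downgrade(&first);

    drop(first);
    let with_one = Rc::strong_count(&second);

    drop(second); // last owner gone: the String is destroyed here
    let freed = watcher.upgrade().is_none();

    (with_two, with_one, freed)
}
```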

The article you linked uses outdated syntax. Certain smart pointers used to have special names and associated syntax that were removed some time before Rust 1.0:
Box<T> replaced ~T ("owned pointers")
Rc<T> replaced @T ("managed pointers")
Because the Internet never forgets, you can still find pre-1.0 documentation and articles (such as the one you linked) that use the old syntax. Check the date of the article: if it's before May 2015, you're dealing with an early, unstable Rust.

Do rust lifetimes only refer to references?

I'm trying to wrap my head around Rust lifetimes (as the official guides don't really explain them that well).
Do rust lifetimes only refer to references, or can they refer to base/primitive values as well?

Lifetimes are the link between values and references to said values.
In order to understand this link, I will use a broken parallel: houses and addresses.
A house is a physical entity. It is built on a piece of land at some time, will live for a few dozen or hundred years, may be renovated multiple times during this time, and will most likely be destroyed at some point.
An address is a logical entity, it may point to a house, or to other physical entities (a field, a school, a train station, a company's HQ, ...).
The lifetime of a house is relatively clear: it represents the duration during which a house is usable, from the moment it is built to the moment it is destroyed. The house may undergo several renovations during this time, and what used to be a simple cabana may end up being a full-fledged manor, but that is of no concern to us; for our purpose the house is living throughout those transformations. Only its creation and ultimate destruction matter... even though it might be better if no one happens to be in the bedroom when we tear the roof down.
Now, imagine that you are a real estate agent. You do not keep the houses you sell in your office, it's impractical; you do, however, keep their addresses!
Without the notion of lifetime, from time to time your customers will complain because the address you sent them to... was the address of a garbage dump, and not at all that lovely two-story house you had a photograph of. You might also get a couple of inquiries from the police station asking why people holding a booklet from your office were found in a just-destroyed house; the ensuing lawsuit might shut down your business.
This is obviously a risk to your business, and therefore you should seek a better solution. What if each address could be tagged with the lifetime of the house it refers to, so that you know not to send people to their death (or disappointment)?
You may have recognized the C manual memory management strategy in that garbage dump; in C it's up to you, the real estate agent developer, to make sure that your addresses (pointers/references) always refer to living houses.
In Rust, however, the references are tagged with a special marker: 'enough; it represents a lower bound on the lifetime of the value referred to.
When the compiler checks whether your usage of the reference is safe or not, it asks the question:
Is the value still alive ?
It does not matter whether the value will be there for a 100 years afterward, as long as it lives long 'enough for the use you have of it.

No, they refer to values as well. If it is not clear from the context how long they will live, they have to be annotated as well. It is then called a lifetime bound.
In the following example it is necessary to specify that the value the reference refers to lives at least as long as the reference itself:
use std::num::Primitive;
struct Foo<'a, T: Primitive + 'a> {
a: &'a T
}
Try deleting the + 'a and the compiler will complain. This is required since T could be anything implementing Primitive.

Yes, they only refer to references, however those references can refer to primitive types. Rust is not like Java (and similar languages) that make a distinction between primitive types, which are passed by value, and more complex types (Objects in Java) that are passed by reference. Complex types can be allocated on the stack and passed by value, and references can be taken to primitive types.
For example, here is a function that takes two references to i32's, and returns a reference to the larger one:
fn bigger<'a>(a: &'a i32, b: &'a i32) -> &'a i32 {
    if a > b { a } else { b }
}
It uses the lifetime 'a to communicate that the lifetime of the returned reference is the same as that of the references passed in.
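A usage sketch for the function above, with two primitives that live in different scopes. The demo function is added here for illustration; bigger is repeated so the snippet stands alone.

```rust
fn bigger<'a>(a: &'a i32, b: &'a i32) -> &'a i32 {
    if a > b { a } else { b }
}

fn demo() -> i32 {
    let x = 3;
    let result;
    {
        let y = 7;
        // 'a is limited to the inner scope because `y` lives only that long,
        // so the value is copied out before `y` goes away.
        result = *bigger(&x, &y);
    }
    result
}
```

Storing the returned reference itself in result, rather than the copied value, would be rejected by the compiler because y does not live long enough.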

When you see a lifetime annotation (e.g. 'a) in the code, there's almost always a reference, or borrowed pointer, involved.
The full syntax for borrowed pointers is &'a T. 'a is the lifetime of the referent. T is the type of the referent.
Structs and enums can have lifetime parameters. This is usually a consequence of the struct or enum containing a borrowed pointer. When you store a borrowed pointer in a struct or enum, you must explicitly state the referent's lifetime. For example, the Cow enum in the standard library contains a borrowed pointer in one of its variants. Therefore, it has a lifetime parameter that is used in the borrowed pointer's type to define the referent's lifetime.
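A sketch of both cases in current Rust syntax: a struct that stores a reference and therefore needs a lifetime parameter, and Cow from the standard library. Excerpt, first_word and normalize are hypothetical names chosen for this example.

```rust
use std::borrow::Cow;

// The struct stores a reference, so the referent's lifetime must be stated.
struct Excerpt<'a> {
    text: &'a str,
}

fn first_word<'a>(s: &'a str) -> Excerpt<'a> {
    Excerpt { text: s.split_whitespace().next().unwrap_or("") }
}

// Cow borrows when no change is needed and allocates only when it is.
fn normalize(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', "_"))
    } else {
        Cow::Borrowed(input)
    }
}
```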
Traits can have type bounds and also a lifetime bound. The lifetime bound indicates the largest region in which all the borrowed pointers in a concrete implementation of that trait are valid (i.e. their referents are alive). If the implementation contains no borrowed pointers, then the lifetime is inferred as 'static. Lifetime bounds can appear in type parameter definitions, in where clauses and on trait objects.
Sometimes, you might want to define a struct or enum with a lifetime parameter, but without a corresponding value to borrow from. You can use a marker type, such as ContravariantLifetime<'a>, to ensure the lifetime parameter has the proper variance (ContravariantLifetime corresponds to the variance of borrowed pointers; without a marker, the lifetime would be bivariant, which means the lifetime could be substituted with any other lifetime... not very useful!). See an example of this use case here.
