I'm trying to implement a simple tree structure with Rc pointers:
use std::rc::Rc;
fn main() {
println!("Hello, world!");
}
enum Expr {
B(i128),
A(Rc<Expr>, Rc<Expr>),
O(Rc<Expr>, Rc<Expr>),
}
struct Node {
data: Rc<Expr>,
parent: Option<Rc<Node>>,
children: Vec<Rc<Node>>,
}
impl Node {
fn add_to_children(mut self, node: &Rc<Node>) {
self.children.push(Rc::clone(node))
}
fn set_as_parent(mut self, node: &Rc<Node>) {
self.parent = Some(Rc::clone(node))
}
fn link_parent_child(parent: &mut Rc<Node>, child: &mut Rc<Node>) {
println!("eheeh");
parent.add_to_children(&child);
child.set_as_parent(&parent);
}
}
This won't compile however:
error[E0507]: cannot move out of an `Rc`
--> src/main.rs:32:9
|
32 | parent.add_to_children(&child);
| ^^^^^^^-----------------------
| | |
| | value moved due to this method call
| move occurs because value has type `Node`, which does not implement the `Copy` trait
|
note: this function takes ownership of the receiver `self`, which moves value
--> src/main.rs:21:28
|
21 | fn add_to_children(mut self, node: &Rc<Node>) {
What's a better way of implementing this type of tree? Is it my signature that's wrong?
Your add_to_children and set_as_parent methods take mut self, which means they consume self and try to move out of the Rc. That's not allowed, as there may be other references to the object.
The methods should take &mut self... but you'll run into another issue: Rc only exposes an immutable reference. Because, again, there may be multiple references.
The way to solve that issue is interior mutability. In your case, RefCell is the easiest - it's essentially a single-threaded lock, allowing only one place mutable access at a time. It is not a pointer and does not allocate on the heap by itself - it simply wraps the underlying value.
There's also another issue: Because both your parent and children refer to each other via Rc, you end up with a circular reference, meaning your nodes won't free memory when you drop them. Using std::rc::Weak for the parent references will fix that.
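Putting those pieces together, a minimal sketch might look like the following. It assumes the payload is simplified to a plain i128 instead of Rc<Expr>, and it puts the RefCell around the individual fields rather than around the whole node; the constructor `new` is an illustrative addition, not something from the question.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    // Simplified payload (assumption): the question stores an Rc<Expr> here.
    data: i128,
    // Weak back-pointer: it does not keep the parent alive, so no reference cycle.
    parent: RefCell<Weak<Node>>,
    // Strong pointers: a parent owns its children.
    children: RefCell<Vec<Rc<Node>>>,
}

impl Node {
    // Hypothetical constructor, added only to keep the example short.
    fn new(data: i128) -> Rc<Node> {
        Rc::new(Node {
            data,
            parent: RefCell::new(Weak::new()),
            children: RefCell::new(Vec::new()),
        })
    }

    // Both arguments are shared references; the mutation happens inside the RefCells.
    fn link_parent_child(parent: &Rc<Node>, child: &Rc<Node>) {
        parent.children.borrow_mut().push(Rc::clone(child));
        *child.parent.borrow_mut() = Rc::downgrade(parent);
    }
}
```

To walk upwards, call `child.parent.borrow().upgrade()`, which yields `Some(Rc<Node>)` as long as the parent is still alive and `None` once it has been dropped.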
As Chaymin Friedman points out, Rust's rules about mutability can make implementing a tree structure somewhat difficult, especially when it contains parent references. There are many crates out on crates.io that have implemented such tree structures, using a variety of techniques.
In my application (a compiler), I'd like to create cyclic data structures of various kinds throughout my program's execution that all have the same lifetime (in my case, lasting until the end of compilation). In addition,
I don't need to worry about multi-threading
I only need to append information - no need to delete or garbage collect
I only need immutable references to my data
This seemed like a good use case for an Arena, but I saw that this would require passing the arena around to every function in my program, which seemed like a large overhead.
So instead I found a macro called thread_local! that I can use to define global data. Using this, I thought I might be able to define a custom type that wraps an index into the array, and implement Deref on that type:
use std::cell::RefCell;
enum Floop {
CaseA,
CaseB,
CaseC(FloopRef),
CaseD(FloopRef),
CaseE(Vec<FloopRef>),
}
thread_local! {
static FLOOP_ARRAY: RefCell<Vec<Box<Floop>>> = RefCell::new(Vec::new());
}
pub struct FloopRef(usize);
impl std::ops::Deref for FloopRef {
type Target = Floop;
fn deref(&self) -> &Self::Target {
return FLOOP_ARRAY.with(|floops| &floops.borrow()[self.0]);
}
}
pub fn main() {
// initialize some data
FLOOP_ARRAY.with(|floops| {
floops.borrow_mut().push(Box::new(Floop::CaseA));
let idx = floops.borrow_mut().len();
floops.borrow_mut().push(Box::new(Floop::CaseC(FloopRef(idx))));
});
}
Unfortunately I run into lifetime errors:
error: lifetime may not live long enough
--> src/main.rs:20:36
|
20 | return FLOOP_ARRAY.with(|floops| &floops.borrow()[self.0]);
| ------- ^^^^^^^^^^^^^^^^^^^^^^^^ returning this value requires that `'1` must outlive `'2`
| | |
| | return type of closure is &'2 Box<Floop>
| has type `&'1 RefCell<Vec<Box<Floop>>>`
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:20:36
|
20 | return FLOOP_ARRAY.with(|floops| &floops.borrow()[self.0]);
| ^---------------^^^^^^^^
| ||
| |temporary value created here
| returns a value referencing data owned by the current function
What I'd like to tell the compiler is that I promise I'm never going to remove entries from the Array and that I'm not going to share values across threads and that the array will last until the end of the program so that I can in essence just return a &'static reference to a Floop object. But Rust doesn't seem to be convinced this is safe.
Is there any kind of Rust helper library that would let me do something like this? Or are there safety holes even when I guarantee I only append / only use data with a single thread?
If you had such a reference, you could send it to another thread and then read it after the data had been dropped because the creating thread had finished.
Even if you solved this problem, it would still require unsafe code, as the compiler can't be convinced that growing the Vec won't invalidate existing references. That holds in this case since you're using Box, but the compiler cannot know that.
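To illustrate the point about Box, here is a small sketch that uses only safe code and compares integer addresses (the function name is made up for this example). It checks that the heap allocation behind the first Box stays where it is while the Vec's own buffer is free to reallocate:

```rust
/// Pushes `extra` more boxes after the first one and reports whether the
/// first box's heap allocation is still at the same address afterwards.
fn first_box_address_is_stable(extra: usize) -> bool {
    let mut items: Vec<Box<u64>> = Vec::new();
    items.push(Box::new(0));

    // Keep the address as a plain integer so that no reference outlives the pushes.
    let before = &*items[0] as *const u64 as usize;

    for i in 0..extra {
        // Growing the Vec may move the Box pointers stored in its buffer,
        // but not the heap allocations those pointers refer to.
        items.push(Box::new(i as u64 + 1));
    }

    let after = &*items[0] as *const u64 as usize;
    before == after
}
```

This only shows that the addresses are stable. It does not make holding a reference across the push acceptable to the borrow checker, which is why unsafe is still needed.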
If you pinky promise to never touch the data after the creating thread has finished, you can use the following code. Note that this code is technically UB: when the Vec grows, all the Boxes are moved, and, at least currently, moving a Box invalidates all references derived from it:
use std::cell::RefCell;
enum Floop {
CaseA,
CaseB,
CaseC(&'static Floop),
CaseD(&'static Floop),
CaseE(Vec<&'static Floop>),
}
thread_local! {
static FLOOP_ARRAY: RefCell<Vec<Box<Floop>>> = RefCell::new(Vec::new());
}
fn alloc_floop(floop: Floop) -> &'static mut Floop {
FLOOP_ARRAY.with(|floops| {
let mut floops = floops.borrow_mut();
floops.push(Box::new(floop));
let floop = &mut **floops.last_mut().unwrap() as *mut Floop;
// SAFETY: We never access the data after it has been dropped, and we are
// the only ones who access this `Box`, as we access a `Box` only immediately
// after pushing it.
unsafe { &mut *floop }
})
}
fn main() {
let floop_a = alloc_floop(Floop::CaseA);
let floop_b = alloc_floop(Floop::CaseC(floop_a));
}
A better solution would be something like a thread-safe arena that you can use in a static, but sadly, I found no crate that implements that.
I'm trying to learn a bit of Rust through a toy application, which involves a tree data structure that is filled dynamically by querying an external source. In the beginning, only the root node is present. The tree structure provides a method get_children(id) that returns a [u32] of the IDs of all the node's children — either this data is already known, or the external source is queried and all the nodes are inserted into the tree.
I'm running into the following problem with the borrow checker that I can't seem to figure out:
struct Node {
id: u32,
value: u64, // in my use case, this type is much larger and should not be copied
children: Option<Vec<u32>>,
}
struct Tree {
nodes: std::collections::HashMap<u32, Node>,
}
impl Tree {
fn get_children(&mut self, id: u32) -> Option<&[u32]> {
// This will perform external queries and add new nodes to the tree
None
}
fn first_even_child(&mut self, id: u32) -> Option<u32> {
let children = self.get_children(id)?;
let result = children.iter().find(|&id| self.nodes.get(id).unwrap().value % 2 == 0)?;
Some(*result)
}
}
Which results in:
error[E0502]: cannot borrow `self.nodes` as immutable because it is also borrowed as mutable
--> src/lib.rs:19:43
|
18 | let children = self.get_children(id)?;
| ---- mutable borrow occurs here
19 | let result = children.iter().find(|&id| self.nodes.get(id).unwrap().value % 2 == 0)?;
| ---- ^^^^^ ---------- second borrow occurs due to use of `self.nodes` in closure
| | |
| | immutable borrow occurs here
| mutable borrow later used by call
Since get_children might insert nodes into the tree, we need a &mut self reference. However, the way I see it, after the value of children is known, self no longer needs to be borrowed mutably. Why does this not work, and how would I fix it?
EDIT -- my workaround
After Chayim Friedman's answer, I decided against returning Self. I mostly ran into the above problem when first calling get_children to get a list of IDs and then using nodes.get() to obtain the corresponding Node. Instead, I refactored to provide the following functions:
impl Tree {
fn load_children(&mut self, id: u32) {
// If not present yet, perform queries to add children to the tree
}
fn iter_children(&self, id: u32) -> Option<IterChildren> {
// Provides an iterator over the children of node `id`
}
}
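For completeness, here is one way the two functions could be filled in. This is only a sketch: the external query is replaced by a made-up rule (node `id` gets the children `2*id + 1` and `2*id + 2`, each with a value equal to its id), and instead of a named IterChildren type the iterator is returned as `impl Iterator`.

```rust
use std::collections::HashMap;

struct Node {
    id: u32,
    value: u64,
    children: Option<Vec<u32>>,
}

struct Tree {
    nodes: HashMap<u32, Node>,
}

impl Tree {
    /// Mutating step. The "external query" here is a hypothetical stand-in.
    fn load_children(&mut self, id: u32) {
        // Unknown nodes and nodes with known children need no work.
        let already_loaded = self.nodes.get(&id).map_or(true, |n| n.children.is_some());
        if already_loaded {
            return;
        }
        let ids = vec![2 * id + 1, 2 * id + 2];
        for &child in &ids {
            self.nodes
                .entry(child)
                .or_insert(Node { id: child, value: u64::from(child), children: None });
        }
        if let Some(node) = self.nodes.get_mut(&id) {
            node.children = Some(ids);
        }
    }

    /// Read-only step: needs only `&self`, so it can be combined with other shared borrows.
    fn iter_children(&self, id: u32) -> Option<impl Iterator<Item = &Node> + '_> {
        let ids = self.nodes.get(&id)?.children.as_ref()?;
        Some(ids.iter().filter_map(move |child| self.nodes.get(child)))
    }

    fn first_even_child(&mut self, id: u32) -> Option<u32> {
        self.load_children(id); // the mutable borrow ends with this statement
        let found = self.iter_children(id)?.find(|n| n.value % 2 == 0)?.id;
        Some(found)
    }
}
```

Because the mutable borrow is confined to the load_children call, the later lookups in first_even_child only ever hold shared borrows of the tree.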
Downgrading a mutable reference into a shared reference produces a reference that should be kept unique. This is necessary for e.g. Cell::from_mut(), which has the following signature:
pub fn from_mut(t: &mut T) -> &Cell<T>
This method relies on the uniqueness guarantee of &mut T to ensure no references to T are kept directly, only via Cell. If downgrading the reference meant the uniqueness could be violated, this method would be unsound, because the value inside the Cell could be changed through other shared references (via interior mutability).
For more about this see Common Rust Lifetime Misconceptions: downgrading mut refs to shared refs is safe.
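As a small illustration of Cell::from_mut() (the function name bump_twice is made up for this example), the shared reference it returns can be copied freely, and every copy can write through it, precisely because the original &mut stays borrowed for as long as any of them is in use:

```rust
use std::cell::Cell;

/// Increments the value twice through two aliasing shared references.
fn bump_twice(x: &mut i32) -> i32 {
    // Downgrade: `x` remains mutably borrowed for as long as `cell` is used.
    let cell: &Cell<i32> = Cell::from_mut(x);

    // Shared references are `Copy`; both aliases point at the same Cell.
    let a = cell;
    let b = cell;
    a.set(a.get() + 1);
    b.set(b.get() + 1);

    // Reading or writing `*x` directly at this point would be rejected by the
    // borrow checker, because `cell` is still used on the next line.
    cell.get()
}
```

Calling it with a variable holding 40 returns 42 and leaves 42 in the variable.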
To solve this you need to get both shared references from the same shared reference that was created from the mutable reference. You can, for example, also return &Self from get_children():
fn get_children(&mut self, id: u32) -> Option<(&Self, &[u32])> {
// This will perform external queries and add new nodes to the tree
Some((self, &[]))
}
fn first_even_child(&mut self, id: u32) -> Option<u32> {
let (this, children) = self.get_children(id)?;
let result = children.iter().find(|&id| this.nodes.get(id).unwrap().value % 2 == 0)?;
Some(*result)
}
I've encountered a confusing error about the use of a mutable and immutable borrow at the same time, after I expect the mutable borrow to end. I've done a lot of research on similar questions (1, 2, 3, 4, 5), which has led me to believe my problem has something to do with lexical lifetimes (though turning on the NLL feature and compiling on nightly doesn't change the result). I just have no idea what; my situation doesn't seem to fit into any of the scenarios of the other questions.
pub enum Chain<'a> {
Root {
value: String,
},
Child {
parent: &'a mut Chain<'a>,
},
}
impl Chain<'_> {
pub fn get(&self) -> &String {
match self {
Chain::Root { ref value } => value,
Chain::Child { ref parent } => parent.get(),
}
}
pub fn get_mut(&mut self) -> &mut String {
match self {
Chain::Root { ref mut value } => value,
Chain::Child { ref mut parent } => parent.get_mut(),
}
}
}
#[test]
fn test() {
let mut root = Chain::Root { value: "foo".to_string() };
{
let mut child = Chain::Child { parent: &mut root };
*child.get_mut() = "bar".to_string();
} // I expect child's borrow to go out of scope here
assert_eq!("bar".to_string(), *root.get());
}
playground
The error is:
error[E0502]: cannot borrow `root` as immutable because it is also borrowed as mutable
--> example.rs:36:36
|
31 | let mut child = Chain::Child { parent: &mut root };
| --------- mutable borrow occurs here
...
36 | assert_eq!("bar".to_string(), *root.get());
| ^^^^
| |
| immutable borrow occurs here
| mutable borrow later used here
I understand why an immutable borrow happens there, but I do not understand how a mutable borrow is used there. How can both be used at the same place? I'm hoping someone can explain what is happening and how I can avoid it.
In short, &'a mut Chain<'a> is extremely limiting and pervasive.
For an immutable reference &T<'a>, the compiler is allowed to shorten the lifetime of 'a when necessary to match other lifetimes or as part of NLL (this is not always the case, it depends on what T is). However, it cannot do so for mutable references &mut T<'a>, otherwise you could assign it a value with a shorter lifetime.
So when the compiler tries to reconcile the lifetimes when the reference and the parameter are linked &'a mut T<'a>, the lifetime of the reference is conceptually expanded to match the lifetime of the parameter. Which essentially means you've created a mutable borrow that will never be released.
Applying that knowledge to your question: creating a reference-based hierarchy is really only possible if the nested values are covariant over their lifetimes. Which excludes:
mutable references
trait objects
structs with interior mutability
Refer to these variations on the playground to see how these don't quite work as expected.
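For contrast, here is a sketch of the covariant case that does work: the same enum with a shared reference for parent. The function name is made up, and the mutation is moved to after the child has gone away, since a shared parent reference cannot provide get_mut.

```rust
enum Chain<'a> {
    Root { value: String },
    // Shared reference: Chain<'a> stays covariant in 'a, so the borrow can shrink.
    Child { parent: &'a Chain<'a> },
}

impl Chain<'_> {
    fn get(&self) -> &String {
        match self {
            Chain::Root { value } => value,
            Chain::Child { parent } => parent.get(),
        }
    }
}

fn shared_chain_demo() -> String {
    let mut root = Chain::Root { value: "foo".to_string() };
    {
        let child = Chain::Child { parent: &root };
        assert_eq!(child.get().as_str(), "foo");
    } // the shared borrow of `root` really does end here

    // `root` is usable again, even mutably.
    if let Chain::Root { value } = &mut root {
        value.push_str("bar");
    }
    root.get().clone()
}
```

Changing parent back to &'a mut Chain<'a> makes the borrow in the inner block last as long as root itself, and the later uses of root are rejected.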
See also:
Why does linking lifetimes matter only with mutable references?
How do I implement the Chain of Responsibility pattern using a chain of trait objects?
What are the differences when getting an immutable reference from a mutable reference with self-linked lifetimes?
For fun, I'll include a case where the Rust standard library does this sort of thing on purpose. The signature of std::thread::scope looks like:
pub fn scope<'env, F, T>(f: F) -> T
where
F: for<'scope> FnOnce(&'scope Scope<'scope, 'env>) -> T
The Scope that is provided to the user-defined function intentionally has its lifetimes tied in a knot to ensure it is only used in intended ways. This is not always the case since structs may be covariant or contravariant over their generic types, but Scope is defined to be invariant. Also, the only function that can be called on it is .spawn() which intentionally takes &'scope self as the self-parameter as well, ensuring that the reference does not have a shorter lifetime than what is given by scope.
Internally, the standard library contains this documentation (source):
Invariance over 'scope, to make sure 'scope cannot shrink, which is necessary for soundness.
Without invariance, this would compile fine but be unsound:
std::thread::scope(|s| {
s.spawn(|| {
let a = String::from("abcd");
s.spawn(|| println!("{a:?}")); // might run after `a` is dropped
});
});
Even if the lifetime of the reference is invariant with respect to itself, this still avoids many problems above because it uses an immutable reference and interior-mutability. If the parameter to .spawn() required &'scope mut self, then this would not work and run into the same problems above when trying to spawn more than one thread.
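A short usage sketch of that last point (sum_halves is a made-up function for illustration): because .spawn() takes a shared reference to the Scope, it can be called more than once inside the same scope, and both threads may borrow data from the enclosing function.

```rust
use std::thread;

/// Sums a slice by handing each half to its own scoped thread.
fn sum_halves(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        // `s` is a `&'scope Scope<'scope, '_>`; `spawn` takes `&'scope self`,
        // so calling it repeatedly is fine.
        let a = s.spawn(|| left.iter().sum::<u64>());
        let b = s.spawn(|| right.iter().sum::<u64>());
        a.join().unwrap() + b.join().unwrap()
    })
}
```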
The issue isn't lexical lifetimes, and adding an explicit drop won't change the error. The issue is with the &'a mut Chain<'a>: it forces root to be borrowed for its entire life, rendering it useless after the borrow is dropped. As per the comment below, doing this with lifetimes is basically impossible. I would suggest using a box instead, changing the struct to
pub enum Chain{
Root {
value: String,
},
Child {
parent: Box<Chain>,
},
}
and adjusting the other methods as necessary. Alternatively, use an Rc<RefCell<Chain>> if you want the original to remain usable without consuming self.
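A sketch of what the adjusted Box version could look like. The into_parent helper and the demo function are illustrative additions, not part of the original code; they show one way to get the root back once the child is no longer needed.

```rust
pub enum Chain {
    Root { value: String },
    Child { parent: Box<Chain> },
}

impl Chain {
    pub fn get(&self) -> &String {
        match self {
            Chain::Root { value } => value,
            Chain::Child { parent } => parent.get(),
        }
    }

    pub fn get_mut(&mut self) -> &mut String {
        match self {
            Chain::Root { value } => value,
            Chain::Child { parent } => parent.get_mut(),
        }
    }

    /// Hypothetical helper: consumes a child and hands its parent back.
    pub fn into_parent(self) -> Option<Chain> {
        match self {
            Chain::Root { .. } => None,
            Chain::Child { parent } => Some(*parent),
        }
    }
}

fn boxed_chain_demo() -> String {
    let root = Chain::Root { value: "foo".to_string() };
    // The child takes ownership of the root instead of borrowing it.
    let mut child = Chain::Child { parent: Box::new(root) };
    *child.get_mut() = "bar".to_string();

    let root = child.into_parent().expect("a child always has a parent");
    root.get().clone()
}
```

The trade-off is that root is moved into the child, so it is unavailable under its old name until it is handed back.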
I have a tree-like structure like the following:
use std::{cell::RefCell, collections::HashMap, rc::Rc};
struct Node<T> {
vals: HashMap<String, T>,
parent: Option<Rc<RefCell<Node<T>>>>,
}
This is a chained hash map: each node contains a hash map and an (optional, as the root of the tree has no parent) shared pointer to its parent. Multiple children can share the same parent.
If I want to get a clone of a value out of this chained hash map, I use recursion to walk up the tree, like so:
impl<T: Clone> Node<T> {
pub fn get(&self, name: &str) -> Option<T> {
self.vals
.get(name)
.cloned()
.or_else(|| self.parent.as_ref().and_then(|p| p.borrow().get(name)))
}
}
However, I need a mutable reference to an element contained in this tree. Since I cannot return a 'standard' mutable reference to an element, due to it being contained in a RefCell, I thought about using RefMut and the RefMut::map function to obtain one, like so:
use std::cell::RefMut;
impl<T> Node<T> {
pub fn get_mut<'a>(node: RefMut<'a, Node<T>>, name: &str) -> Option<RefMut<'a, T>> {
if node.vals.contains_key(name) {
Some(RefMut::map(node, |n| n.vals.get_mut(name).unwrap()))
} else {
node.parent.and_then(|p| Node::get_mut(p.borrow_mut(), name))
}
}
}
This does not compile: the return value references its child node (due to it being also dependent on its child's borrow), and the RefMut pointing to the child goes out of scope at function exit:
error[E0515]: cannot return value referencing function parameter `p`
--> src/lib.rs:16:31
|
16 | .and_then(|p| Node::get_mut(p.borrow_mut(), name))
| ^^^^^^^^^^^^^^-^^^^^^^^^^^^^^^^^^^^
| | |
| | `p` is borrowed here
| returns a value referencing data owned by the current function
error[E0507]: cannot move out of borrowed content
--> src/lib.rs:15:13
|
15 | node.parent
| ^^^^^^^^^^^ cannot move out of borrowed content
I do not understand how I could go about getting something dereferenceable out of this tree. I assume I might need a sort of "RefMut chain" in order to extend the lifetime of the child node RefMut, but wouldn't that also create multiple mutable references to (components of) the same Node?
Alternatively, is there a way to get some sort of Rc<RefCell> pointing to one of the values in a node, as to avoid this sort of dependency chain? I really am stumped as to what to do.
Please do not suggest passing a function to apply to the value with the given name rather than returning a reference, as that does not apply to my use case: I really do need just a mutable reference (or something allowing me to obtain one.)
I do not believe that this is a duplicate of How do I return a reference to something inside a RefCell without breaking encapsulation?, as that answer only deals with returning a reference to a component of a value contained in a single RefCell (which I already do using RefMut::map). My problem involves a chain of Rc<RefCell>s, which that question does not address.