I am trying to implement a scenegraph-like data structure in Rust. I would like an equivalent to this C++ code expressed in safe Rust:
struct Node
{
Node* parent; // should be mutable, and nullable (no parent)
std::vector<Node*> child;
virtual ~Node()
{
for(auto it = child.begin(); it != child.end(); ++it)
{
delete *it;
}
}
void addNode(Node* newNode)
{
if (newNode->parent)
{
newNode->parent.child.erase(newNode->parent.child.find(newNode));
}
newNode->parent = this;
child.push_back(newNode);
}
}
Properties I want:
the parent takes ownership of its children
the nodes are accessible from the outside via a reference of some kind
events that touch one Node can potentially mutate the whole tree
Rust tries to ensure memory safety by forbidding you from doing things that might potentially be unsafe. Since this analysis is performed at compile-time, the compiler can only reason about a subset of manipulations that are known to be safe.
In Rust, you could easily store either a reference to the parent (by borrowing the parent, thus preventing mutation) or the list of child nodes (by owning them, which gives you more freedom), but not both (without using unsafe). This is especially problematic for your implementation of addNode, which requires mutable access to the given node's parent. However, if you store the parent pointer as a mutable reference, then, since only a single mutable reference to a particular object may be usable at a time, the only way to access the parent would be through a child node, and you'd only be able to have a single child node, otherwise you'd have two mutable references to the same parent node.
If you want to avoid unsafe code, there are many alternatives, but they'll all require some sacrifices.
The easiest solution is to simply remove the parent field. We can define a separate data structure to remember the parent of a node while we traverse a tree, rather than storing it in the node itself.
First, let's define Node:
#[derive(Debug)]
struct Node<T> {
data: T,
children: Vec<Node<T>>,
}
impl<T> Node<T> {
fn new(data: T) -> Node<T> {
Node { data: data, children: vec![] }
}
fn add_child(&mut self, child: Node<T>) {
self.children.push(child);
}
}
(I added a data field because a tree isn't super useful without data at the nodes!)
Let's now define another struct to track the parent as we navigate:
#[derive(Debug)]
struct NavigableNode<'a, T: 'a> {
node: &'a Node<T>,
parent: Option<&'a NavigableNode<'a, T>>,
}
impl<'a, T> NavigableNode<'a, T> {
fn child(&self, index: usize) -> NavigableNode<T> {
NavigableNode {
node: &self.node.children[index],
parent: Some(self)
}
}
}
impl<T> Node<T> {
fn navigate<'a>(&'a self) -> NavigableNode<T> {
NavigableNode { node: self, parent: None }
}
}
This solution works fine if you don't need to mutate the tree as you navigate it and you can keep the parent NavigableNode objects around (which works fine for a recursive algorithm, but doesn't work too well if you want to store a NavigableNode in some other data structure and keep it around). The second restriction can be alleviated by using something other than a borrowed pointer to store the parent; if you want maximum genericity, you can use the Borrow trait to allow direct values, borrowed pointers, Boxes, Rc's, etc.
Now, let's talk about zippers. In functional programming, zippers are used to "focus" on a particular element of a data structure (list, tree, map, etc.) so that accessing that element takes constant time, while still retaining all the data of that data structure. If you need to navigate your tree and mutate it during the navigation, while retaining the ability to navigate up the tree, then you could turn a tree into a zipper and perform the modifications through the zipper.
Here's how we could implement a zipper for the Node defined above:
#[derive(Debug)]
struct NodeZipper<T> {
node: Node<T>,
parent: Option<Box<NodeZipper<T>>>,
index_in_parent: usize,
}
impl<T> NodeZipper<T> {
fn child(mut self, index: usize) -> NodeZipper<T> {
// Remove the specified child from the node's children.
// A NodeZipper shouldn't let its users inspect its parent,
// since we mutate the parents
// to move the focused nodes out of their list of children.
// We use swap_remove() for efficiency.
let child = self.node.children.swap_remove(index);
// Return a new NodeZipper focused on the specified child.
NodeZipper {
node: child,
parent: Some(Box::new(self)),
index_in_parent: index,
}
}
fn parent(self) -> NodeZipper<T> {
// Destructure this NodeZipper
let NodeZipper { node, parent, index_in_parent } = self;
// Destructure the parent NodeZipper
let NodeZipper {
node: mut parent_node,
parent: parent_parent,
index_in_parent: parent_index_in_parent,
} = *parent.unwrap();
// Insert the node of this NodeZipper back in its parent.
// Since we used swap_remove() to remove the child,
// we need to do the opposite of that.
parent_node.children.push(node);
let len = parent_node.children.len();
parent_node.children.swap(index_in_parent, len - 1);
// Return a new NodeZipper focused on the parent.
NodeZipper {
node: parent_node,
parent: parent_parent,
index_in_parent: parent_index_in_parent,
}
}
fn finish(mut self) -> Node<T> {
while let Some(_) = self.parent {
self = self.parent();
}
self.node
}
}
impl<T> Node<T> {
fn zipper(self) -> NodeZipper<T> {
NodeZipper { node: self, parent: None, index_in_parent: 0 }
}
}
To use this zipper, you need to have ownership of the root node of the tree. By taking ownership of the nodes, the zipper can move things around in order to avoid copying or cloning nodes. When we move a zipper, we actually drop the old zipper and create a new one (though we could also do it by mutating self, but I thought it was clearer that way, plus it lets you chain method calls).
If the above options are not satisfactory, and you must absolutely store the parent of a node in a node, then the next best option is to use Rc<RefCell<Node<T>>> to refer to the parent and Weak<RefCell<Node<T>>> to the children. Rc enables shared ownership, but adds overhead to perform reference counting at runtime. RefCell enables interior mutability, but adds overhead to keep track of the active borrows at runtime. Weak is like Rc, but it doesn't increment the reference count; this is used to break reference cycles, which would prevent the reference count from dropping to zero, causing a memory leak. See DK.'s answer for an implementation using Rc, Weak and RefCell.
The problem is that this data structure is inherently unsafe; it doesn't have a direct equivalent in Rust that doesn't use unsafe. This is by design.
If you want to translate this into safe Rust code, you need to be more specific about what, exactly, you want from it. I know you listed some properties above, but often people coming to Rust will say "I want everything I have in this C/C++ code", to which the direct answer is "well, you can't."
You're also, unavoidably, going to have to change how you approach this. The example you've given has pointers without any ownership semantics, mutable aliasing, and cycles; all of which Rust will not allow you to simply ignore like C++ does.
The simplest solution is to just get rid of the parent pointer, and maintain that externally (like a filesystem path). This also plays nicely with borrowing because there are no cycles anywhere:
pub struct Node1 {
children: Vec<Node1>,
}
If you need parent pointers, you could go half-way and use Ids instead:
use std::collections::BTreeMap;
type Id = usize;
pub struct Tree {
descendants: BTreeMap<Id, Node2>,
root: Option<Id>,
}
pub struct Node2 {
parent: Id,
children: Vec<Id>,
}
The BTreeMap is effectively your "address space", bypassing borrowing and aliasing issues by not directly using memory addresses.
Of course, this introduces the problem of a given Id not being tied to the particular tree, meaning that the node it belongs to could be destroyed, and now you have what is effectively a dangling pointer. But, that's the price you pay for having aliasing and mutation. It's also less direct.
Or, you could go whole-hog and use reference-counting and dynamic borrow checking:
use std::cell::RefCell;
use std::rc::{Rc, Weak};
// Note: do not derive Clone to make this move-only.
pub struct Node3(Rc<RefCell<Node3_>>);
pub type WeakNode3 = Weak<RefCell<Node3>>;
pub struct Node3_ {
parent: Option<WeakNode3>,
children: Vec<Node3>,
}
impl Node3 {
pub fn add(&self, node: Node3) {
// No need to remove from old parent; move semantics mean that must have
// already been done.
(node.0).borrow_mut().parent = Some(Rc::downgrade(&self.0));
self.children.push(node);
}
}
Here, you'd use Node3 to transfer ownership of a node between parts of the tree, and WeakNode3 for external references. Or, you could make Node3 cloneable and add back the logic in add to make sure a given node doesn't accidentally stay a child of the wrong parent.
This is not strictly better than the second option, because this design absolutely cannot benefit from static borrow-checking. The second one can at least prevent you from mutating the graph from two places at once at compile time; here, if that happens, you'll just crash.
The point is: you can't just have everything. You have to decide which operations you actually need to support. At that point, it's usually just a case of picking the types that give you the necessary properties.
In certain cases, you can also use an arena. An arena guarantees that values stored in it will have the same lifetime as the arena itself. This means that adding more values will not invalidate any existing lifetimes, but moving the arena will. Thus, such a solution is not viable if you need to return the tree.
This solves the problem by removing the ownership from the nodes themselves.
Here's an example that also uses interior mutability to allow a node to be mutated after it is created. In other cases, you can remove this mutability if the tree is constructed once and then simply navigated.
use std::{
cell::{Cell, RefCell},
fmt,
};
use typed_arena::Arena; // 1.6.1
struct Tree<'a, T: 'a> {
nodes: Arena<Node<'a, T>>,
}
impl<'a, T> Tree<'a, T> {
fn new() -> Tree<'a, T> {
Self {
nodes: Arena::new(),
}
}
fn new_node(&'a self, data: T) -> &'a mut Node<'a, T> {
self.nodes.alloc(Node {
data,
tree: self,
parent: Cell::new(None),
children: RefCell::new(Vec::new()),
})
}
}
struct Node<'a, T: 'a> {
data: T,
tree: &'a Tree<'a, T>,
parent: Cell<Option<&'a Node<'a, T>>>,
children: RefCell<Vec<&'a Node<'a, T>>>,
}
impl<'a, T> Node<'a, T> {
fn add_node(&'a self, data: T) -> &'a Node<'a, T> {
let child = self.tree.new_node(data);
child.parent.set(Some(self));
self.children.borrow_mut().push(child);
child
}
}
impl<'a, T> fmt::Debug for Node<'a, T>
where
T: fmt::Debug,
{
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{:?}", self.data)?;
write!(f, " (")?;
for c in self.children.borrow().iter() {
write!(f, "{:?}, ", c)?;
}
write!(f, ")")
}
}
fn main() {
let tree = Tree::new();
let head = tree.new_node(1);
let _left = head.add_node(2);
let _right = head.add_node(3);
println!("{:?}", head); // 1 (2 (), 3 (), )
}
TL;DR: DK.'s second version doesn't compile because parent has another type than self.0, fix it by converting it to a WeakNode. Also, in the line directly below, "self" doesn't have a "children" attribute but self.0 has.
I corrected the version of DK. so it compiles and works. Here is my Code:
dk_tree.rs
use std::cell::RefCell;
use std::rc::{Rc, Weak};
// Note: do not derive Clone to make this move-only.
pub struct Node(Rc<RefCell<Node_>>);
pub struct WeakNode(Weak<RefCell<Node_>>);
struct Node_ {
parent: Option<WeakNode>,
children: Vec<Node>,
}
impl Node {
pub fn new() -> Self {
Node(Rc::new(RefCell::new(Node_ {
parent: None,
children: Vec::new(),
})))
}
pub fn add(&self, node: Node) {
// No need to remove from old parent; move semantics mean that must have
// already been done.
node.0.borrow_mut().parent = Some(WeakNode(Rc::downgrade(&self.0)));
self.0.borrow_mut().children.push(node);
}
// just to have something visually printed
pub fn to_str(&self) -> String {
let mut result_string = "[".to_string();
for child in self.0.borrow().children.iter() {
result_string.push_str(&format!("{},", child.to_str()));
}
result_string += "]";
result_string
}
}
and then the main function in main.rs:
mod dk_tree;
use crate::dk_tree::{Node};
fn main() {
let root = Node::new();
root.add(Node::new());
root.add(Node::new());
let inner_child = Node::new();
inner_child.add(Node::new());
inner_child.add(Node::new());
root.add(inner_child);
let result = root.to_str();
println!("{result:?}");
}
The reason I made the WeakNode be more like the Node is to have an easier conversion between the both
I'm trying Advent of Code in Rust this year, as a way of learning the language. I've parsed the input (from day 7) into the following structure:
struct Process {
name: String,
weight: u32,
children: Vec<String>,
parent: Option<String>
}
These are stored in a HashMap<String, Process>. Now I want to iterate over the values in the map and update the parent values, based on what I find in the parent's "children" vector.
What doesn't work is
for p in self.processes.values() {
for child_name in p.children {
let mut child = self.processes.get_mut(child_name).expect("Child not found.");
child.parent = p.name;
}
}
I can't have both a mutable reference to the HashMap (self.processes) and a non-mutable reference, or two mutable references.
So, what is the most idiomatic way to accomplish this in Rust? The two options I can see are:
Copy the parent/child relationships into a new temporary data structure in one pass, and then update the Process structs in a second pass, after the immutable reference is out of scope.
Change my data structure to put "parent" in its own HashMap.
Is there a third option?
Yes, you can grant internal mutability to the HashMap's values using RefCell:
struct ProcessTree {
processes: HashMap<String, RefCell<Process>>, // change #1
}
impl ProcessTree {
fn update_parents(&self) {
for p in self.processes.values() {
let p = p.borrow(); // change #2
for child_name in &p.children {
let mut child = self.processes
.get(child_name) // change #3
.expect("Child not found.")
.borrow_mut(); // change #4
child.parent = Some(p.name.clone());
}
}
}
}
borrow_mut will panic at runtime if the child is already borrowed with borrow. This happens if a process is its own parent (which should presumably never happen, but in a more robust program you'd want to give a meaningful error message instead of just panicking).
I invented some names and made a few small changes (besides the ones specifically indicated) to make this code compile. Notably, p.name.clone() makes a full copy of p.name. This is necessary because both name and parent are owned Strings.
I'm attempting to build and traverse a DAG. There seems to be two feasible approaches: use Rc<RefCell<Node>> for edges, or utilize an arena allocator and some unsafe code. (See details here.)
I'm opting for the former, but having difficulty traversing the graph to its edges, as any borrow of a child node relies on borrows to its parents:
use std::cell::RefCell;
use std::rc::Rc;
// See: https://aminb.gitbooks.io/rust-for-c/content/graphs/index.html,
// https://github.com/nrc/r4cppp/blob/master/graphs/src/ref_graph.rs
pub type Link<T> = Rc<RefCell<T>>;
pub struct DagNode {
/// Each node can have several edges flowing *into* it, i.e. several owners,
/// hence the use of Rc. RefCell is used so we can have mutability
/// while building the graph.
pub edge: Option<Link<DagNode>>,
// Other data here
}
// Attempt to walk down the DAG until we reach a leaf.
fn walk_to_end(node: &Link<DagNode>) -> &Link<DagNode> {
let nb = node.borrow();
match nb.edge {
Some(ref prev) => walk_to_end(prev),
// Here be dragons: the borrow relies on all previous borrows,
// so this fails to compile.
None => node
}
}
I could modify the reference counts, i.e.
fn walk_to_end(node: Link<HistoryNode>) -> Link<HistoryNode> {
let nb = node.borrow();
match nb.previous {
Some(ref prev) => walk_to_end(prev.clone()),
None => node.clone()
}
}
but bumping the reference counts every time you traverse a node seems like quite the hack. What is the idiomatic approach here?
Rc isn't really a problem here: if you get rid of the RefCells, everything just compiles. Actually, in some situations, this might be a solution: if you need to mutate the contents of the node, but not the edges, you can just change your data structure so the edges aren't inside a RefCell.
The argument also isn't really the problem; this compiles:
fn walk_to_end(node: &Link<DagNode>) -> Link<DagNode> {
let nb = node.borrow();
match nb.edge {
Some(ref prev) => walk_to_end(prev),
None => node.clone()
}
}
The problem here is really returning the result. Basically, there isn't any way to write the return value you want. I mean, you could theoretically make your method return a wrapper around a Vec<Ref<T>>, but that's a lot more expensive than just bumping the reference count on the result.
More generally, Rc<RefCell<T>> is difficult to work with because it's a complicated data structure: you can safely mutate multiple nodes at the same time, and it keeps track of exactly how many edges reference each node.
Note that you don't have to dip into unsafe code to use an arena. https://crates.io/crates/typed-arena provides a safe API for arenas. I'm not sure why the example you linked to uses UnsafeCell; it certainly isn't necessary.
I want to be able to store a struct called Child inside a Parent, where the Child contains a reference back to the parent.
It works if I have the Child structs directly inside the parent like this:
struct Parent<'s> {
cache: RefCell<Vec<Child<'s>>>
}
But if I move the Vec into a separate struct, then it will fail to compile with lifetime errors.
struct Parent<'s> {
cache: RefCell<Cache<'s>>
}
struct Cache<'s> {
children: Vec<Child<'s>>
}
It is possible to make this example work with the separate structs?
Here's the full working code, which compiles fine. When move the children into the separate struct then it fails.
My analysis of the problem:
When Parent contains children directly, 's is the same lifetime as the scope of the Parent struct itself, thus I can call methods that take &'s self on Parent.
When Parent contains Cache which contains children, 's is the same lifetime as the scope of the Cache struct, which is created before Parent, thus it is impossible to call methods on Parent that take &'s self. Attempting to do so gives the error
<anon>:33:15: 33:16 error: `p` does not live long enough
<anon>:33 let obj = p.create_object();
^
<anon>:30:48: 38:2 note: reference must be valid for the block suffix following statement 0 at 30:47...
<anon>:30 let cache = Cache { children: Vec::new() }; // the lifetime `'s` is essentially from this line to the end of the program
<anon>:31 let mut p = Parent { cache: RefCell::new(cache) }; // although the Parent instance was created here, 's still refers to the lifetime before it
<anon>:32 // this fails because p doesn't live long enough
<anon>:33 let obj = p.create_object();
I need a way of shortening 's to the scope of Parent, not the scope of the Cache.
Disclaimer:
This question is very similar to one I asked earlier (https://stackoverflow.com/questions/32579518/rust-lifetime-error-with-self-referencing-struct?noredirect=1#comment53014063_32579518) that was marked as duplicate. I've read through the answer and I believe I'm beyond that as I can get the lifetimes of references right (as shown in my first example). I'm asking this (now slightly different) question again because I now have a concrete example that works, and one that doesn't work. I'm sure that what can be done with one struct can be done with two, right?
You can make it compile by forcing the Cache and the Parent to have the same lifetime by defining them in the same let binding.
fn main() {
let (cache, mut p);
cache = Cache { children: Vec::new() };
p = Parent { cache: RefCell::new(cache) };
let obj = p.create_object();
let c1 = Child { parent: &p, data: 1 };
p.cache.borrow_mut().children.push(c1);
}
Here, we're essentially declaring a destructured tuple and then initializing it. We cannot initialize the tuple directly on the let binding:
let (cache, mut p) = (Cache { children: Vec::new() }, Parent { cache: RefCell::new(cache) });
because the initializer for p references cache, but that name is not defined until the end of the let statement. The separate initialization works because the compiler tracks which variables are initialized; if you swap the order of the assignments, you'll get a compiler error:
<anon>:31:38: 31:43 error: use of possibly uninitialized variable: `cache` [E0381]
<anon>:31 p = Parent { cache: RefCell::new(cache) };
In my code I have a mutually recursive tree structure which looks something like the following:
enum Child<'r> {
A(&'r Node<'r>),
B,
C
}
struct Node<'r> {
children : [&'r Child<'r>,..25]
}
impl <'r>Node<'r> {
fn new() -> Node {
Node {
children : [&B,..25]
}
}
}
but it doesn't compile as-is. What is the best way to modify it to make it do so?
Here is a version where the nodes can be modified from outside the tree, which I presume is what was asked for.
use std::rc::Rc;
use std::cell::RefCell;
struct Node {
a : Option<Rc<RefCell<Node>>>,
b : Option<Rc<RefCell<Node>>>,
value: int
}
impl Node {
fn new(value: int) -> Rc<RefCell<Node>> {
let node = Node {
a: None,
b: None,
value: value
};
Rc::new(RefCell::new(node))
}
}
fn main() {
let first = Node::new(0);
let second = Node::new(0);
let third = Node::new(0);
first.borrow_mut().a = Some(second.clone());
second.borrow_mut().a = Some(third.clone());
second.borrow_mut().value = 1;
third.borrow_mut().value = 2;
println!("Value of second: {}", first.borrow().a.get_ref().borrow().value);
println!("Value of third: {}", first.borrow().a.get_ref().borrow().a.get_ref().borrow().value);
}
Rc is a reference counted pointer and allows a single object to have multiple owners. It doesn't allow mutation however, so a RefCell is required which allows runtime checked mutable borrowing. That's why the code uses Rc<RefCell<Node>>. The Option type is used to represent potential children with Option<Rc<RefCell<Node>>>.
Since, the Rc type auto dereferences, it's possible to directly call RefCell methods on it. These are borrow() and borrow_mut() which return a reference and mutable reference to the underlying Node. There also exist try_borrow() and try_borrow_mut() variants which cannot fail.
get_ref() is a method of the Option type which returns a reference to the underlying Rc<RefCell<Node>>. In a real peogram we would probably want to check whether the Option contains anything beforehand.
Why does the original code not work? References &T imply non-ownership, so something else would have to own the Nodes. While it would be possible to build a tree of &Node types, it wouldn't be possible to modify the Nodes outside of the tree because once borrowed, an object cannot be modified by anything else than the borrowing object.