Trying to split out self-referential data into a seperate struct - struct

I want to be able to store a struct called Child inside a Parent, where the Child contains a reference back to the parent.
It works if I have the Child structs directly inside the parent like this:
struct Parent<'s> {
cache: RefCell<Vec<Child<'s>>>
}
But if I move the Vec into a separate struct, then it will fail to compile with lifetime errors.
struct Parent<'s> {
cache: RefCell<Cache<'s>>
}
struct Cache<'s> {
children: Vec<Child<'s>>
}
It is possible to make this example work with the separate structs?
Here's the full working code, which compiles fine. When move the children into the separate struct then it fails.
My analysis of the problem:
When Parent contains children directly, 's is the same lifetime as the scope of the Parent struct itself, thus I can call methods that take &'s self on Parent.
When Parent contains Cache which contains children, 's is the same lifetime as the scope of the Cache struct, which is created before Parent, thus it is impossible to call methods on Parent that take &'s self. Attempting to do so gives the error
<anon>:33:15: 33:16 error: `p` does not live long enough
<anon>:33 let obj = p.create_object();
^
<anon>:30:48: 38:2 note: reference must be valid for the block suffix following statement 0 at 30:47...
<anon>:30 let cache = Cache { children: Vec::new() }; // the lifetime `'s` is essentially from this line to the end of the program
<anon>:31 let mut p = Parent { cache: RefCell::new(cache) }; // although the Parent instance was created here, 's still refers to the lifetime before it
<anon>:32 // this fails because p doesn't live long enough
<anon>:33 let obj = p.create_object();
I need a way of shortening 's to the scope of Parent, not the scope of the Cache.
Disclaimer:
This question is very similar to one I asked earlier (https://stackoverflow.com/questions/32579518/rust-lifetime-error-with-self-referencing-struct?noredirect=1#comment53014063_32579518) that was marked as duplicate. I've read through the answer and I believe I'm beyond that as I can get the lifetimes of references right (as shown in my first example). I'm asking this (now slightly different) question again because I now have a concrete example that works, and one that doesn't work. I'm sure that what can be done with one struct can be done with two, right?

You can make it compile by forcing the Cache and the Parent to have the same lifetime by defining them in the same let binding.
fn main() {
let (cache, mut p);
cache = Cache { children: Vec::new() };
p = Parent { cache: RefCell::new(cache) };
let obj = p.create_object();
let c1 = Child { parent: &p, data: 1 };
p.cache.borrow_mut().children.push(c1);
}
Here, we're essentially declaring a destructured tuple and then initializing it. We cannot initialize the tuple directly on the let binding:
let (cache, mut p) = (Cache { children: Vec::new() }, Parent { cache: RefCell::new(cache) });
because the initializer for p references cache, but that name is not defined until the end of the let statement. The separate initialization works because the compiler tracks which variables are initialized; if you swap the order of the assignments, you'll get a compiler error:
<anon>:31:38: 31:43 error: use of possibly uninitialized variable: `cache` [E0381]
<anon>:31 p = Parent { cache: RefCell::new(cache) };

Related

Moving something that is in scope of a reference when those references will go out of scope immediately after the move

I am learning Rust coming from a C++ background. I am writing a parser that reads valid formulas such as 12 x (y - 4) and puts them in an abstract syntax tree (AST). I use an enum to store the possible nodes in the AST as you can see in the (heavily simplified) code. I ran into a problem. I want to simplify the expression -(-(12)) for instance by moving the 12, not copying. In general the 12 may be replaced by a deep AST. Currently I identify the situation in the function simplify( .. ) but it wont compile.
I do know why it doesn't compile: The node I'm trying to move (i.e. the 12 in my example) is still in scope as a reference from in the match clause so this is exactly rust preventing me from a possible problem with references. But in my case I know this is what I want, and moreover I exit the function right after I do the moving in the line *node = **child;, so the earlier references will go out of scope right there.
Is there an idiomatic Rust-esque type of way to solve this problem? I would really rather not copy the double--negated subtree.
#[derive(Debug)]
enum Node {
Num(i32),
UnaryMinus(Box<Node>),
}
use Node::*;
fn simplify(node: &mut Node) {
match &node {
Num(_) => (),
UnaryMinus(inner) => match &**inner {
Num(_) => (),
UnaryMinus(child) => {
// cannot move out of `**child` which is behind a shared reference
*node = **child;
return;
}
},
}
}
fn main() {
let double_minus = UnaryMinus(Box::new(UnaryMinus(Box::new(Num(12)))));
}
There are two problems here: The error you're getting concretely is because you're matching against &**inner instead of &mut **inner. It's never okay to move out of a shared reference, but there are circumstances (which we'll see) when it's okay to move out of a mutable one.
With that fixed, you will get the same error just about mutable references. That's because you can't move out of a mutable reference just because you know you're the only one holding it. A move leaves its source uninitialized, and a reference, mutable or shared, to uninitialized memory is never okay. You'll have to leave something in that memory that you're moving out of. You can do that with std::mem::swap. It takes two mutable references and swaps out the contents.
Now, one could obviously try to call std::mem::swap(node, &mut child) but this won't work simply because node is already mutably borrowed in the match expression and you can't mutably borrow something twice.
Moreover, this would leak memory as you now have a reference cycle where node -> inner -> node. This, although perfectly valid to do in Rust, usually isn't what you want.
Instead you'll need some sort of dummy that you can put in child's place. Some simple variant of your enum that can be safely dropped by inner once it gets dropped. In this example that could be Node::Num(0):
#[derive(Debug)]
enum Node {
Num(i32),
UnaryMinus(Box<Node>),
}
// This is just to verify that everything gets dropped properly
// and we don't leak any memory
impl Drop for Node {
fn drop(&mut self) {
println!("dropping {:?}", self);
}
}
use Node::*;
fn simplify(node: &mut Node) {
// `placeholder` will be our dummy
let (mut can_simplify, mut placeholder) = (false, Num(0));
match node {
Num(_) => (),
// we'll need to borrow `child` mutably later, so we have to
// match on `&mut **inner` not `&**inner`
UnaryMinus(inner) => match &mut **inner {
Num(_) => (),
UnaryMinus(ref mut child) => {
// move the contents of `child` into `placeholder` and vice versa
std::mem::swap(&mut placeholder, child);
can_simplify = true;
}
},
}
if can_simplify {
// now we can safely move into `node`
*node = placeholder;
// you could skip the conditional if all other, non-simplifying
// branches return before this statement
}
}
fn main() {
let mut double_minus = UnaryMinus(Box::new(UnaryMinus(Box::new(Num(12)))));
simplify(&mut double_minus);
println!("{:?}", double_minus);
}

How do I store a struct in two places?

I'm doing Advent Of Code Day 7 in Rust. I have to parse a tree out of order like so:
a(10)
c(5) -> a, b
b(20)
That says c is the root with a and b as its children.
I handle this by parsing each line, making an object, and storing it in a hash by name. If it shows up later as a child, like a, I can use that hash to lookup the object and apply it as a child. If it shows up as a child before being defined, like b, I can create a partial version and update it via the hash. The above would be something like:
let mut np = NodeParser{
map: HashMap::new(),
root: None,
};
{
// This would be the result of parsing "a(10)".
{
let a = Node{
name: "a".to_string(),
weight: Some(10),
children: None
};
np.map.insert( a.name.clone(), a );
}
// This is the result of parsing "c(5) -> a, b".
// Note that it creates 'b' with incomplete data.
{
let b = Node{
name: "b".to_string(),
weight: None,
children: None
};
np.map.insert("b".to_string(), b);
let c = Node{
name: "c".to_string(),
weight: Some(5),
children: Some(vec![
*np.map.get("a").unwrap(),
// ^^^^^^^^^^^^^^^^^^^^^^^^^ cannot move out of borrowed content
*np.map.get("b").unwrap()
// ^^^^^^^^^^^^^^^^^^^^^^^^^ cannot move out of borrowed content
])
};
np.map.insert( c.name.clone(), c );
}
// Parsing "b(20)", it's already seen b, so it updates it.
// This also updates the entry in c.children. It avoids
// having to search all nodes for any with b as a child.
{
let mut b = np.map.get_mut( "b" ).unwrap();
b.weight = Some(20);
}
}
I might want to look up a node and look at its children.
// And if I wanted to look at the children of c...
let node = np.map.get("c").unwrap();
for child in node.children.unwrap() {
// ^^^^ cannot move out of borrowed content
println!("{:?}", child);
}
Rust does not like this. It doesn't like that both NodeParser.map and Node.children own a node.
error[E0507]: cannot move out of borrowed content
--> /Users/schwern/tmp/test.rs:46:21
|
46 | *np.map.get("a").unwrap(),
| ^^^^^^^^^^^^^^^^^^^^^^^^^ cannot move out of borrowed content
error[E0507]: cannot move out of borrowed content
--> /Users/schwern/tmp/test.rs:49:21
|
49 | *np.map.get("b").unwrap()
| ^^^^^^^^^^^^^^^^^^^^^^^^^ cannot move out of borrowed content
It doesn't like that the for loop is trying to borrow the node to iterate because I've already borrowed the node from the NodeParser that owns it.
error[E0507]: cannot move out of borrowed content
--> /Users/schwern/tmp/test.rs:68:18
|
68 | for child in node.children.unwrap() {
| ^^^^ cannot move out of borrowed content
I think I understand what I'm doing wrong, but I'm not sure how to make it right.
How should I construct this to make the borrower happy? Because of the way NodeParser.map and Node.children must be linked, copying is not an option.
Here is the code to test with. In the real code both Node and NodeParser have implementations and methods.
One option is unsafe code ... but I would suggest avoiding that if you're using the Advent of Code to learn idiomatic Rust and not just drop all the safety its trying to give you.
Another option is to reference count the Node instances so that the borrow checker is happy and the compiler knows how to clean things up. The std::rc::Rc type does this for you ... and essentially every call to clone() just increments a reference count and returns a new Rc instance. Then every time an object is dropped, the Drop implementation just decrements the reference count.
As for the iteration .. for x in y is syntactic sugar for for x in y.into_iter(). This is attempting to move the contents of children out of node (notice in the IntoIterator trait, into_iter(self) takes ownership of self). To rectify this, you can ask for a reference instead when iterating, using for x in &y. This essentially becomes for x in y.iter(), which does not move the contents.
Here are these suggestions in action.
use std::collections::HashMap;
use std::rc::Rc;
struct NodeParser {
map: HashMap<String, Rc<Node>>,
root: Option<Node>,
}
#[derive(Debug)]
struct Node {
name: String,
children: Option<Vec<Rc<Node>>>,
}
fn main() {
let mut np = NodeParser{
map: HashMap::new(),
root: None,
};
let a = Rc::new(Node{ name: "a".to_string(), children: None });
np.map.insert( a.name.clone(), a.clone() );
let b = Rc::new(Node{ name: "b".to_string(), children: None });
np.map.insert( b.name.clone(), b.clone() );
let c = Rc::new(Node{
name: "c".to_string(),
children: Some(vec![a, b])
});
np.map.insert( c.name.clone(), c.clone() );
let node = np.map.get("c").unwrap();
for child in &node.children {
println!("{:?}", child);
}
}
EDIT: I will expand on my comment here. You can use lifetimes here too if you want, but I'm concerned that the lifetime solution will work against the MCVE and won't work once applied to the actual problem the OP (not just of this question... others as well) actually has. Lifetimes are tricky in Rust and small things like re-ordering the instantiation of variables to allow the lifetime solution can throw people off. My concern being they will run into lifetime issues and therefore the answers won't be appropriate to their actual situation even if it works for the MCVE. Maybe I overthink that though..

How can I simultaneously iterate over a Rust HashMap and modify some of its values?

I'm trying Advent of Code in Rust this year, as a way of learning the language. I've parsed the input (from day 7) into the following structure:
struct Process {
name: String,
weight: u32,
children: Vec<String>,
parent: Option<String>
}
These are stored in a HashMap<String, Process>. Now I want to iterate over the values in the map and update the parent values, based on what I find in the parent's "children" vector.
What doesn't work is
for p in self.processes.values() {
for child_name in p.children {
let mut child = self.processes.get_mut(child_name).expect("Child not found.");
child.parent = p.name;
}
}
I can't have both a mutable reference to the HashMap (self.processes) and a non-mutable reference, or two mutable references.
So, what is the most idiomatic way to accomplish this in Rust? The two options I can see are:
Copy the parent/child relationships into a new temporary data structure in one pass, and then update the Process structs in a second pass, after the immutable reference is out of scope.
Change my data structure to put "parent" in its own HashMap.
Is there a third option?
Yes, you can grant internal mutability to the HashMap's values using RefCell:
struct ProcessTree {
processes: HashMap<String, RefCell<Process>>, // change #1
}
impl ProcessTree {
fn update_parents(&self) {
for p in self.processes.values() {
let p = p.borrow(); // change #2
for child_name in &p.children {
let mut child = self.processes
.get(child_name) // change #3
.expect("Child not found.")
.borrow_mut(); // change #4
child.parent = Some(p.name.clone());
}
}
}
}
borrow_mut will panic at runtime if the child is already borrowed with borrow. This happens if a process is its own parent (which should presumably never happen, but in a more robust program you'd want to give a meaningful error message instead of just panicking).
I invented some names and made a few small changes (besides the ones specifically indicated) to make this code compile. Notably, p.name.clone() makes a full copy of p.name. This is necessary because both name and parent are owned Strings.

Variable in loop does not live long enough

I've been playing with Rust for the past few days,
and I'm still struggling with the memory management (figures).
I wrote a simple project that creates a hierarchy of structs ("keywords") from lexing/parsing a text file.
A keyword is defined like this:
pub struct Keyword<'a> {
name: String,
code: u32,
parent: Option<&'a Keyword<'a>>,
}
which is my equivalent for this C struct:
typedef struct Keyword Keyword;
struct Keyword {
char* name;
unsigned int code;
Keyword* parent;
};
A hierarchy is just a container for keywords, defined like this:
pub struct KeywordHierarchy<'a> {
keywords: Vec<Box<Keyword<'a>>>,
}
impl<'a> KeywordHierarchy<'a> {
fn add_keyword(&mut self, keyword: Box<Keyword<'a>>) {
self.keywords.push(keyword);
}
}
In the parse function (which is a stub of the complete parser), I recreated the condition that spawns the lifetime error in my code.
fn parse<'a>() -> Result<KeywordHierarchy<'a>, String> {
let mut hierarchy = KeywordHierarchy { keywords: Vec::new() };
let mut first_if = true;
let mut second_if = false;
while true {
if first_if {
// All good here.
let root_keyword = Keyword {
name: String::from("ROOT"),
code: 0,
parent: None,
};
hierarchy.add_keyword(Box::new(root_keyword));
first_if = false;
second_if = true;
}
if second_if {
// Hierarchy might have expired here?
// Find parent
match hierarchy.keywords.iter().find(|p| p.code == 0) {
None => return Err(String::from("No such parent")),
Some(parent) => {
// If parent is found, create a child
let child = Keyword {
name: String::from("CHILD"),
code: 1,
parent: Some(&parent),
};
hierarchy.add_keyword(Box::new(child));
}
}
second_if = false;
}
if !first_if && !second_if {
break;
}
}
Ok(hierarchy)
}
There's a while loop that goes through the lexer tokens.
In the first if, I add the ROOT keyword to the hierarchy, which is the only one that doesn't have a parent, and everything goes smoothly as expected.
In the second if, I parse the children keywords and I get a lifetime error when invoking KeywordHierarchy.add_keyword().
hierarchy.keywords` does not live long enough
Could you guys recommend an idiomatic way to fix this?
Cheers.
P.S. Click here for the playground
The immediate problem I can see with your design is that your loop is going to modify the hierarchy.keywords vector (in the first_if block), but it also borrows elements from the hierarchy.keywords vector (in the second_if block).
This is problematic, because modifying a vector may cause its backing buffer to be reallocated, which, if it were allowed, would invalidate all existing borrows to the vector. (And thus it is not allowed.)
Have you considered using an arena to hold your keywords instead of a Vec? Arenas are designed so that you can allocate new things within them while still having outstanding borrows to elements previously allocated within the arena.
Update: Here is a revised version of your code that illustrates using an arena (in this case a rustc_arena::TypedArena, but that's just so I can get this running on the play.rust-lang.org service) alongside a Vec<&Keyword> to handle the lookups.
https://play.rust-lang.org/?gist=fc6d81cb7efa7e4f32c481ab6482e587&version=nightly&backtrace=0
The crucial bits of code are this:
First: the KeywordHierarchy now holds a arena alongside a vec:
pub struct KeywordHierarchy<'a> {
keywords: Vec<&'a Keyword<'a>>,
kw_arena: &'a TypedArena<Keyword<'a>>,
}
Second: Adding a keyword now allocates the spot in the arena, and stashes a reference to that spot in the vec:
fn add_keyword(&mut self, keyword: Keyword<'a>) {
let kw = self.kw_arena.alloc(keyword);
self.keywords.push(kw);
}
Third: the fn parse function now takes an arena (reference) as input, because we need the arena to outlive the stack frame of fn parse:
fn parse<'a>(arena: &'a TypedArena<Keyword<'a>>) -> Result<KeywordHierarchy<'a>, String> {
...
Fourth: To avoid borrow-checker issues with borrowing hierarchy as mutable while also iterating over it, I moved the hierarchy modification outside of your Find parent match:
// Find parent
let parent = match hierarchy.keywords.iter().find(|p| p.code == 0) {
None => return Err(String::from("No such parent")),
Some(parent) => *parent, // (this deref is important)
};
// If parent is found, create a child
let child = Keyword {
name: String::from("CHILD"),
code: 1,
parent: Some(parent),
};
hierarchy.add_keyword(child);
second_if = false;

Rust borrowed pointers and lifetimes

In my code I have a mutually recursive tree structure which looks something like the following:
enum Child<'r> {
A(&'r Node<'r>),
B,
C
}
struct Node<'r> {
children : [&'r Child<'r>,..25]
}
impl <'r>Node<'r> {
fn new() -> Node {
Node {
children : [&B,..25]
}
}
}
but it doesn't compile as-is. What is the best way to modify it to make it do so?
Here is a version where the nodes can be modified from outside the tree, which I presume is what was asked for.
use std::rc::Rc;
use std::cell::RefCell;
struct Node {
a : Option<Rc<RefCell<Node>>>,
b : Option<Rc<RefCell<Node>>>,
value: int
}
impl Node {
fn new(value: int) -> Rc<RefCell<Node>> {
let node = Node {
a: None,
b: None,
value: value
};
Rc::new(RefCell::new(node))
}
}
fn main() {
let first = Node::new(0);
let second = Node::new(0);
let third = Node::new(0);
first.borrow_mut().a = Some(second.clone());
second.borrow_mut().a = Some(third.clone());
second.borrow_mut().value = 1;
third.borrow_mut().value = 2;
println!("Value of second: {}", first.borrow().a.get_ref().borrow().value);
println!("Value of third: {}", first.borrow().a.get_ref().borrow().a.get_ref().borrow().value);
}
Rc is a reference counted pointer and allows a single object to have multiple owners. It doesn't allow mutation however, so a RefCell is required which allows runtime checked mutable borrowing. That's why the code uses Rc<RefCell<Node>>. The Option type is used to represent potential children with Option<Rc<RefCell<Node>>>.
Since, the Rc type auto dereferences, it's possible to directly call RefCell methods on it. These are borrow() and borrow_mut() which return a reference and mutable reference to the underlying Node. There also exist try_borrow() and try_borrow_mut() variants which cannot fail.
get_ref() is a method of the Option type which returns a reference to the underlying Rc<RefCell<Node>>. In a real peogram we would probably want to check whether the Option contains anything beforehand.
Why does the original code not work? References &T imply non-ownership, so something else would have to own the Nodes. While it would be possible to build a tree of &Node types, it wouldn't be possible to modify the Nodes outside of the tree because once borrowed, an object cannot be modified by anything else than the borrowing object.

Resources