Can I move from &mut self to &self within the same function? - rust

I'm trying to learn a bit of Rust through a toy application, which involves a tree data structure that is filled dynamically by querying an external source. In the beginning, only the root node is present. The tree structure provides a method get_children(id) that returns a &[u32] of the IDs of all the node's children: either this data is already known, or the external source is queried and all the nodes are inserted into the tree.
I'm running into the following problem with the borrow checker that I can't seem to figure out:
struct Node {
id: u32,
value: u64, // in my use case, this type is much larger and should not be copied
children: Option<Vec<u32>>,
}
struct Tree {
nodes: std::collections::HashMap<u32, Node>,
}
impl Tree {
fn get_children(&mut self, id: u32) -> Option<&[u32]> {
// This will perform external queries and add new nodes to the tree
None
}
fn first_even_child(&mut self, id: u32) -> Option<u32> {
let children = self.get_children(id)?;
let result = children.iter().find(|&id| self.nodes.get(id).unwrap().value % 2 == 0)?;
Some(*result)
}
}
Which results in:
error[E0502]: cannot borrow `self.nodes` as immutable because it is also borrowed as mutable
--> src/lib.rs:19:43
|
18 | let children = self.get_children(id)?;
| ---- mutable borrow occurs here
19 | let result = children.iter().find(|&id| self.nodes.get(id).unwrap().value % 2 == 0)?;
| ---- ^^^^^ ---------- second borrow occurs due to use of `self.nodes` in closure
| | |
| | immutable borrow occurs here
| mutable borrow later used by call
Since get_children might insert nodes into the tree, we need a &mut self reference. However, the way I see it, after the value of children is known, self no longer needs to be borrowed mutably. Why does this not work, and how would I fix it?
EDIT -- my workaround
After Chayim Friedman's answer, I decided against returning &Self. I mostly ran into the above problem when first calling get_children to get a list of IDs and then using nodes.get() to obtain the corresponding Node. Instead, I refactored to provide the following functions:
impl Tree {
fn load_children(&mut self, id: u32) {
// If not present yet, perform queries to add children to the tree
}
fn iter_children(&self, id: u32) -> Option<IterChildren> {
// Provides an iterator over the children of node `id`
}
}
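A minimal runnable sketch of this two-phase workaround is shown below. It is only an illustration: the external query is replaced by a hypothetical stand-in that records an unknown node as having no children, the IterChildren type is replaced by an impl Iterator return type, and sample_tree is a helper invented for the example.

```rust
use std::collections::HashMap;

struct Node {
    id: u32,
    value: u64,
    children: Option<Vec<u32>>,
}

struct Tree {
    nodes: HashMap<u32, Node>,
}

impl Tree {
    // Phase 1 (&mut self): hypothetical stand-in for the external query.
    // A node whose children are unknown is recorded as having none.
    fn load_children(&mut self, id: u32) {
        if let Some(node) = self.nodes.get_mut(&id) {
            if node.children.is_none() {
                node.children = Some(Vec::new());
            }
        }
    }

    // Phase 2 (&self): iterate over the already loaded children of `id`.
    fn iter_children(&self, id: u32) -> Option<impl Iterator<Item = &Node> + '_> {
        let ids = self.nodes.get(&id)?.children.as_ref()?;
        Some(ids.iter().filter_map(move |child| self.nodes.get(child)))
    }

    fn first_even_child(&mut self, id: u32) -> Option<u32> {
        // The mutable borrow ends when load_children returns.
        self.load_children(id);
        self.iter_children(id)?
            .find(|node| node.value % 2 == 0)
            .map(|node| node.id)
    }
}

// Hypothetical sample data: node 0 has children 1 (odd value) and 2 (even value).
fn sample_tree() -> Tree {
    let mut nodes = HashMap::new();
    nodes.insert(0, Node { id: 0, value: 1, children: Some(vec![1, 2]) });
    nodes.insert(1, Node { id: 1, value: 3, children: None });
    nodes.insert(2, Node { id: 2, value: 4, children: None });
    Tree { nodes }
}

fn main() {
    let mut tree = sample_tree();
    assert_eq!(tree.first_even_child(0), Some(2));
    assert_eq!(tree.first_even_child(2), None);
}
```

Because load_children returns nothing, no borrow of the tree survives the call, so the shared borrow taken by iter_children does not conflict with it.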

Downgrading a mutable reference into a shared reference produces a reference that should be kept unique. This is necessary for e.g. Cell::from_mut(), which has the following signature:
pub fn from_mut(t: &mut T) -> &Cell<T>
This method relies on the uniqueness guarantee of &mut T to ensure no references to T are kept directly, only via Cell. If downgrading the reference meant that uniqueness could be violated, this method would be unsound, because the value inside the Cell could be changed through another shared reference (via interior mutability).
For more about this see Common Rust Lifetime Misconceptions: downgrading mut refs to shared refs is safe.
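As a small illustration of why the downgraded reference has to stay exclusive, the sketch below uses Cell::from_mut to mutate a value through two shared handles. The function name bump_twice is made up for the example.

```rust
use std::cell::Cell;

// Turns an exclusive borrow into a shared `&Cell<u32>` and mutates through it.
// While `cell` is alive, the original `&mut u32` cannot be used, which is what
// makes mutation through the shared handles sound.
fn bump_twice(value: &mut u32) -> u32 {
    let cell: &Cell<u32> = Cell::from_mut(value);
    let a = cell; // shared references are Copy, so both handles alias the value
    let b = cell;
    a.set(a.get() + 1);
    b.set(b.get() + 1);
    cell.get()
}

fn main() {
    let mut x = 1;
    assert_eq!(bump_twice(&mut x), 3);
    assert_eq!(x, 3);
}
```

If a plain &u32 to the same value could coexist with the Cell, a reader holding it would see the value change underneath a shared reference, which is exactly what the borrow checker rules out.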
To solve this you need to get both shared references from the same shared reference that was created from the mutable reference. You can, for example, also return &Self from get_children():
fn get_children(&mut self, id: u32) -> Option<(&Self, &[u32])> {
// This will perform external queries and add new nodes to the tree
Some((self, &[]))
}
fn first_even_child(&mut self, id: u32) -> Option<u32> {
let (this, children) = self.get_children(id)?;
let result = children.iter().find(|&id| this.nodes.get(id).unwrap().value % 2 == 0)?;
Some(*result)
}
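For completeness, here is a self-contained sketch of the same idea with a body that actually mutates the tree before downgrading. The external query is again a hypothetical stand-in (an unknown node is recorded as a leaf), and sample_tree is a helper invented for the example.

```rust
use std::collections::HashMap;

struct Node {
    value: u64,
    children: Option<Vec<u32>>,
}

struct Tree {
    nodes: HashMap<u32, Node>,
}

impl Tree {
    fn get_children(&mut self, id: u32) -> Option<(&Self, &[u32])> {
        // Mutation phase: hypothetical stand-in for the external query.
        let node = self.nodes.get_mut(&id)?;
        if node.children.is_none() {
            node.children = Some(Vec::new());
        }
        // Downgrade once, then derive every returned reference from `this`.
        let this: &Self = self;
        let children = this.nodes.get(&id)?.children.as_deref()?;
        Some((this, children))
    }

    fn first_even_child(&mut self, id: u32) -> Option<u32> {
        let (this, children) = self.get_children(id)?;
        children
            .iter()
            .copied()
            .find(|child| this.nodes.get(child).map_or(false, |n| n.value % 2 == 0))
    }
}

// Hypothetical sample data: node 0 has children 1 (odd value) and 2 (even value).
fn sample_tree() -> Tree {
    let mut nodes = HashMap::new();
    nodes.insert(0, Node { value: 1, children: Some(vec![1, 2]) });
    nodes.insert(1, Node { value: 3, children: None });
    nodes.insert(2, Node { value: 4, children: None });
    Tree { nodes }
}

fn main() {
    let mut tree = sample_tree();
    assert_eq!(tree.first_even_child(0), Some(2));
    assert_eq!(tree.first_even_child(1), None);
}
```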

Related

For loop - Struct with lifetime 'a cannot borrow as mutable because it is also borrowed as immutable

I have a struct which maps ids to indices and vice versa.
struct IdMapping<'a> {
external_2_internal: HashMap<&'a str, usize>,
internal_2_external: HashMap<usize, String>,
}
impl<'a> IdMapping<'a> {
fn new() -> IdMapping<'a> {
IdMapping {
external_2_internal: HashMap::new(),
internal_2_external: HashMap::new(),
}
}
fn insert(&'a mut self, internal: usize, external: String) {
self.internal_2_external.insert(internal, external);
let mapped_external = self.internal_2_external.get(&internal).unwrap();
self.external_2_internal.insert(mapped_external, internal);
}
}
If I am using this structure the following way
fn map_ids<'a>(ids: Vec<String>) -> IdMapping<'a> {
let mut mapping = IdMapping::new();
for (i, id) in ids.iter().enumerate() {
mapping.insert(i, id.clone());
}
mapping
}
I receive the following compiler error:
error[E0499]: cannot borrow `mapping` as mutable more than once at a time
--> src/lib.rs:28:9
|
24 | fn map_ids<'a>(ids: Vec<String>) -> IdMapping<'a> {
| -- lifetime `'a` defined here
...
28 | mapping.insert(i, id.clone());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `mapping` was mutably borrowed here in the previous iteration of the loop
...
31 | mapping
| ------- returning this value requires that `mapping` is borrowed for `'a`
Why can't I mutably borrow the mapping for its insert method each iteration of the loop? How should I implement this use case?
The problem is due to the self reference in your struct.
Let's first look at whether this is theoretically sound (assuming we're writing unsafe code):
HashMap uses a flat array (quadratic probing), so objects in the HashMap aren't address stable under insertion of a new element. This means that an insertion into internal_2_external may move the existing Strings around in memory.
String stores its content in a separate heap allocation, and that allocation stays at the same location when the String value itself is moved. So even if the String is moved around, a &str referencing its content still points to valid memory.
So this would actually work if implemented in unsafe code.
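The address-stability claim can be checked with a small experiment. This is only a sketch: the function name heap_buffer_is_stable and the number of insertions are arbitrary choices, and the raw pointer is compared, never dereferenced.

```rust
use std::collections::HashMap;

// Records where the first String's heap buffer lives, forces the map to grow
// (which relocates the String values inside the table), and checks that the
// heap buffer itself did not move.
fn heap_buffer_is_stable() -> bool {
    let mut map: HashMap<usize, String> = HashMap::new();
    map.insert(0, String::from("first"));
    let before = map[&0].as_ptr();

    for i in 1..1000 {
        map.insert(i, i.to_string());
    }

    map[&0].as_ptr() == before
}

fn main() {
    assert!(heap_buffer_is_stable());
}
```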
While it's logically sound to do those operations, the type system is unable to recognise that you can move a String while keeping borrowed ranges of it valid. This means that if you borrow a &str from a String, the type system will prevent you from doing any mutation operation on your String, including moving it. As such, you can't do any mutation operation on the hash map either, giving rise to your error.
I can see two safe ways to work around this:
Make external_2_internal keep its own copy of the strings, i.e. use HashMap<String, usize>.
Keep the strings in a separate Vec:
struct IdMapping {
strings: Vec<String>,
external_2_internal: HashMap<usize /* index */, usize>,
internal_2_external: HashMap<usize, usize /* index */>,
}
Alternatively you could work with unsafe code, if you don't want to change the layout of your struct.
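A minimal sketch of the first workaround (each map owns its own copy of the string) could look like the following. It drops the lifetime parameter entirely, at the cost of one extra clone per inserted ID.

```rust
use std::collections::HashMap;

struct IdMapping {
    external_2_internal: HashMap<String, usize>,
    internal_2_external: HashMap<usize, String>,
}

impl IdMapping {
    fn new() -> IdMapping {
        IdMapping {
            external_2_internal: HashMap::new(),
            internal_2_external: HashMap::new(),
        }
    }

    // Takes a plain `&mut self`: nothing stored in the struct borrows from it.
    fn insert(&mut self, internal: usize, external: String) {
        self.external_2_internal.insert(external.clone(), internal);
        self.internal_2_external.insert(internal, external);
    }
}

fn map_ids(ids: Vec<String>) -> IdMapping {
    let mut mapping = IdMapping::new();
    for (i, id) in ids.into_iter().enumerate() {
        mapping.insert(i, id);
    }
    mapping
}

fn main() {
    let mapping = map_ids(vec!["a".to_string(), "b".to_string()]);
    assert_eq!(mapping.external_2_internal.get("b"), Some(&1));
    assert_eq!(mapping.internal_2_external.get(&0).map(String::as_str), Some("a"));
}
```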

Use regular reference instead of `Box` in recursive data structures

I am new to Rust. When I read chapter 15 of The Rust Programming Language, I could not understand why one should use Boxes in recursive data structures instead of regular references. Section 15.1 of the book explains that indirection is required to avoid infinite-sized structures, but it does not explain why Box in particular should be used.
#[derive(Debug)]
enum FunctionalList<'a> {
Cons(u32, &'a FunctionalList<'a>),
Nil,
}
use FunctionalList::{Cons, Nil};
fn main() {
let list = Cons(1, &Cons(2, &Cons(3, &Nil)));
println!("{:?}", list);
}
The code above compiles and produces the desired output. It seems that using FunctionalList to store a small amount of data on stack works perfectly well. Does this code cause troubles?
It is true that the FunctionalList works in this simple case. However, we will run into some difficulties if we try to use this structure in other ways. For instance, suppose we tried to construct a FunctionalList and then return it from a function:
#[derive(Debug)]
enum FunctionalList<'a> {
Cons(u32, &'a FunctionalList<'a>),
Nil,
}
use FunctionalList::{Cons, Nil};
fn make_list(x: u32) -> FunctionalList {
return Cons(x, &Cons(x + 1, &Cons(x + 2, &Nil)));
}
fn main() {
let list = make_list(1);
println!("{:?}", list);
}
This results in the following compile error:
error[E0106]: missing lifetime specifier
--> src/main.rs:9:25
|
9 | fn make_list(x: u32) -> FunctionalList {
| ^^^^^^^^^^^^^^ help: consider giving it an explicit bounded or 'static lifetime: `FunctionalList + 'static`
If we follow the hint and add a 'static lifetime, then we instead get this error:
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:10:12
|
10 | return Cons(x, &Cons(x + 1, &Cons(x + 2, &Nil)));
| ^^^^^^^^^^^^^^^^^^^^^^-----------------^^
| | |
| | temporary value created here
| returns a value referencing data owned by the current function
The issue is that the inner FunctionalList values here are owned by implicit temporary variables whose scope ends at the end of the make_list function. These values would thus be dropped at the end of the function, leaving dangling references to them, which Rust disallows, hence the borrow checker rejects this code.
In contrast, if FunctionalList had been defined to Box its FunctionalList component, then ownership would have been moved from the temporary value into the containing FunctionalList, and we would have been able to return it without any problem.
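A sketch of that Box-based variant, for comparison:

```rust
#[derive(Debug)]
enum FunctionalList {
    // Each Cons cell owns the rest of the list through the Box.
    Cons(u32, Box<FunctionalList>),
    Nil,
}

use FunctionalList::{Cons, Nil};

fn make_list(x: u32) -> FunctionalList {
    // Ownership of every inner list moves into the enclosing Cons,
    // so nothing refers to a temporary when the function returns.
    Cons(x, Box::new(Cons(x + 1, Box::new(Cons(x + 2, Box::new(Nil))))))
}

fn main() {
    let list = make_list(1);
    println!("{:?}", list); // prints: Cons(1, Cons(2, Cons(3, Nil)))
}
```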
With your original FunctionalList, the thing we have to think about is that every value in Rust has to have an owner somewhere; and so if, as in this case, the FunctionalList is not the owner of its inner FunctionalLists, then that ownership has to reside somewhere else. In your example, that owner was an implicit temporary variable, but in more complex situations we could use a different kind of external owner. Here's an example of using an Arena (from the typed-arena crate) to own the data, so that we can still implement a variation of the make_list function:
use typed_arena::Arena;
#[derive(Debug)]
enum FunctionalList<'a> {
Cons(u32, &'a FunctionalList<'a>),
Nil,
}
use FunctionalList::{Cons, Nil};
fn make_list<'a>(x: u32, arena: &'a Arena<FunctionalList<'a>>) -> &mut FunctionalList<'a> {
let l0 = arena.alloc(Nil);
let l1 = arena.alloc(Cons(x + 2, l0));
let l2 = arena.alloc(Cons(x + 1, l1));
let l3 = arena.alloc(Cons(x, l2));
return l3;
}
fn main() {
let arena = Arena::new();
let list = make_list(1, &arena);
println!("{:?}", list);
}
In this case, we adapted the return type of make_list to return only a mutable reference to a FunctionalList, instead of returning an owned FunctionalList, since now the ownership resides in the arena.

Rust, how to return reference to something in a struct that lasts as long as the struct?

I am porting a compiler I wrote to Rust. In it, I have an enum Entity which represents things like functions and variables:
pub enum Entity<'a> {
Variable(VariableEntity),
Function(FunctionEntity<'a>)
// Room for more later.
}
I then have a struct Scope which is responsible for holding on to these entities in a hash map, where the key is the name given by the programmer to the entity. (For example, declaring a function named sin would put an Entity into the hash map at the key sin.)
pub struct Scope<'a> {
symbols: HashMap<String, Entity<'a>>,
parent: Option<&'a Scope<'a>>
}
I would like to be able to get read-only references to the objects in the HashMap so that I can refer to it from other data structures. For example, when I parse a function call, I want to be able to store a reference to the function that is being called instead of just storing the name of the function and having to look up the reference every time I need the actual Entity object corresponding to the name. To do so, I have made this method:
impl<'a> Scope<'a> {
pub fn lookup(&self, symbol: &str) -> Option<&'a Entity<'a>> {
let result = self.symbols.get(symbol);
match result {
Option::None => match self.parent {
Option::None => Option::None,
Option::Some(parent) => parent.lookup(symbol),
},
Option::Some(_value) => result
}
}
}
However, this results in a compilation error:
error[E0495]: cannot infer an appropriate lifetime for autoref due to conflicting requirements
--> src/vague/scope.rs:29:31
|
29 | let result = self.symbols.get(symbol);
| ^^^
|
note: first, the lifetime cannot outlive the anonymous lifetime #1 defined on the method body at 28:3...
--> src/vague/scope.rs:28:3
|
28 | / pub fn lookup(&self, symbol: &str) -> Option<&'a Entity<'a>> {
29 | | let result = self.symbols.get(symbol);
30 | | match result {
31 | | Option::None => match self.parent {
... |
36 | | }
37 | | }
| |___^
note: ...so that reference does not outlive borrowed content
--> src/vague/scope.rs:29:18
|
29 | let result = self.symbols.get(symbol);
| ^^^^^^^^^^^^
note: but, the lifetime must be valid for the lifetime 'a as defined on the impl at 9:6...
--> src/vague/scope.rs:9:6
|
9 | impl<'a> Scope<'a> {
| ^^
= note: ...so that the expression is assignable:
expected std::option::Option<&'a vague::entity::Entity<'a>>
found std::option::Option<&vague::entity::Entity<'_>>
Things I Tried
There are several ways to make the compilation error go away, but none of them give the behavior I want. First, I can do this:
pub fn lookup(&self, symbol: &str) -> Option<&Entity<'a>> {
But this means the reference will not live long enough, so I can't put it into a struct or any other kind of storage that will outlive the scope that lookup is called from. Another solution was this:
pub fn lookup(&self, symbol: &str) -> Option<&'a Entity> {
I do not understand why this compiles. As part of the struct definition, things inside Entity objects in the hash map must live at least as long as the scope, so how can the compiler allow the return type to be missing that? Additionally, why would the addition of <'a> result in the previous compiler error, since the only place the function is getting Entitys from is the hash map, which is defined as having a value type of Entity<'a>? Another bad fix I found was:
pub fn lookup(&'a self, symbol: &str) -> Option<&'a Entity<'a>> {
I initially thought this would mean that lookup can only be called once. That understanding was incorrect, but the problem still remains that requiring the reference to self to have the same lifetime as the whole object severely restricts the code: I can't call this method from a reference with any shorter lifetime, e.g. one passed in as a function argument or one created in a loop.
How can I go about fixing this? Is there some way I can fix the function as I have it now, or do I need to implement the behavior I'm looking for in an entirely different way?
Here's the signature you want:
pub fn lookup(&self, symbol: &str) -> Option<&'a Entity<'a>>
Here's why it can't work: it returns a reference that borrows an Entity for longer than lookup initially borrowed the Scope. This isn't illegal, but it means that the reference lookup returns can't be derived from the self reference. Why? Because given the above signature, this is valid code:
let sc = Scope { ... };
let foo = sc.lookup("foo");
drop(sc);
do_something_with(foo);
This code compiles because it has to: there is no lifetime constraint that the compiler could use to prove it wrong, because the lifetime of foo isn't coupled to the borrow of sc. But clearly, if lookup were implemented the way you first tried, foo would contain a dangling pointer after drop(sc), which is why the compiler rejected it.
You must redesign your data structures to make the given signature for lookup work. It's not clear how best to do this given the code in the question, but here are some ideas:
Decouple the lifetimes in Scope so that the parent is borrowed for a different lifetime than the symbols. Then have lookup take &'parent self. This probably will not work by itself, depending on what you need to do with the Entitys, but you may need to do it anyway if you need to distinguish between the lifetimes of different data.
pub struct Scope<'parent, 'sym> {
symbols: HashMap<String, Entity<'sym>>,
parent: Option<&'parent Scope<'parent, 'sym>>,
}
impl<'parent, 'sym> Scope<'parent, 'sym> {
pub fn lookup(&'parent self, symbol: &str) -> Option<&'parent Entity<'sym>> {
/* ... */
}
}
Store your Scopes and/or your Entitys in an arena. An arena can give out references that outlive the self-borrow, as long as they don't outlive the arena data structure itself. The tradeoff is that nothing in the arena will be deallocated until the whole arena is destroyed. It's not a substitute for garbage collection.
Use Rc or Arc to store your Scopes and/or your Entitys and/or whatever data Entity stores that contains references. This is one way to get rid of the lifetime parameter completely, but it comes with a small runtime cost.
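As a rough sketch of the Rc idea from the list above, the version below stores entities behind Rc so that lookup can hand out handles that are not tied to any borrow of the Scope. The Entity payloads and the new and define helpers are simplified placeholders invented for the example, not the types from the question.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Simplified placeholder for the question's Entity type.
#[derive(Debug, PartialEq)]
enum Entity {
    Variable(String),
    Function(String),
}

struct Scope {
    symbols: HashMap<String, Rc<Entity>>,
    parent: Option<Rc<Scope>>,
}

impl Scope {
    fn new(parent: Option<Rc<Scope>>) -> Scope {
        Scope { symbols: HashMap::new(), parent }
    }

    fn define(&mut self, name: &str, entity: Entity) {
        self.symbols.insert(name.to_string(), Rc::new(entity));
    }

    // Returns a shared-ownership handle, so the result can outlive `&self`.
    fn lookup(&self, symbol: &str) -> Option<Rc<Entity>> {
        match self.symbols.get(symbol) {
            Some(entity) => Some(Rc::clone(entity)),
            None => self.parent.as_ref()?.lookup(symbol),
        }
    }
}

fn main() {
    let mut global = Scope::new(None);
    global.define("sin", Entity::Function("sin".to_string()));
    let global = Rc::new(global);

    let mut local = Scope::new(Some(Rc::clone(&global)));
    local.define("x", Entity::Variable("x".to_string()));

    let callee = local.lookup("sin").expect("sin is defined in the parent scope");
    drop(local); // the handle stays valid after the scope it came from is gone
    println!("{:?}", callee);
}
```

The cost is a reference-count update on every lookup and the loss of compile-time lifetime tracking for the entities.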

How to get a mutable reference to an arbitrary element in a Rc<RefCell<T>> tree?

I have a tree-like structure like the following:
use std::{cell::RefCell, collections::HashMap, rc::Rc};
struct Node<T> {
vals: HashMap<String, T>,
parent: Option<Rc<RefCell<Node<T>>>>,
}
This is a chained hash map: each node contains a hash map and an (optional, as the root of the tree has no parent) shared pointer to its parent. Multiple children can share the same parent.
If I want to get a clone of a value out of this chained hash map, I use recursion to walk up the tree, like so:
impl<T> Node<T> {
pub fn get(&self, name: &str) -> Option<T> {
self.vals
.get(name)
.cloned()
.or_else(|| self.parent.as_ref().and_then(|p| p.borrow().get(name)))
}
}
However, I need a mutable reference to an element contained in this tree. Since I cannot return a 'standard' mutable reference to an element, due to it being contained in a RefCell, I thought about using RefMut and the RefMut::map function to obtain one, like so:
use std::cell::RefMut;
impl<T> Node<T> {
pub fn get_mut<'a>(node: RefMut<'a, Node<T>>, name: &str) -> Option<RefMut<'a, T>> {
if node.vals.contains_key(name) {
Some(RefMut::map(node, |n| n.vals.get_mut(name).unwrap()))
} else {
node.parent.and_then(|p| Node::get_mut(p.borrow_mut(), name))
}
}
}
This does not compile: the return value references its child node (due to it being also dependent on its child's borrow), and the RefMut pointing to the child goes out of scope at function exit:
error[E0515]: cannot return value referencing function parameter `p`
--> src/lib.rs:16:31
|
16 | .and_then(|p| Node::get_mut(p.borrow_mut(), name))
| ^^^^^^^^^^^^^^-^^^^^^^^^^^^^^^^^^^^
| | |
| | `p` is borrowed here
| returns a value referencing data owned by the current function
error[E0507]: cannot move out of borrowed content
--> src/lib.rs:15:13
|
15 | node.parent
| ^^^^^^^^^^^ cannot move out of borrowed content
I do not understand how I could go about getting something dereferenceable out of this tree. I assume I might need a sort of "RefMut chain" in order to extend the lifetime of the child node RefMut, but wouldn't that also create multiple mutable references to (components of) the same Node?
Alternatively, is there a way to get some sort of Rc<RefCell> pointing to one of the values in a node, as to avoid this sort of dependency chain? I really am stumped as to what to do.
Please do not suggest passing a function to apply to the value with the given name rather than returning a reference, as that does not apply to my use case: I really do need just a mutable reference (or something allowing me to obtain one.)
I do not believe that this is a duplicate of How do I return a reference to something inside a RefCell without breaking encapsulation?, as that answer only deals with returning a reference to a component of a value contained in a single RefCell (which I already do using RefMut::map). My problem involves a chain of Rc<RefCell>s, which that question does not address.

Destructuring a struct containing a borrow in a function argument

I am trying to implement a system that would use borrow checking/lifetimes in order to provide safe custom indices on a collection. Consider the following code:
struct Graph(i32);
struct Edge<'a>(&'a Graph, i32);
impl Graph {
pub fn get_edge(&self) -> Edge {
Edge(&self, 0)
}
pub fn split(&mut self, Edge(_, edge_id): Edge) {
self.0 = self.0 + edge_id;
}
pub fn join(&mut self, Edge(_, edge0_id): Edge, Edge(_, edge1_id): Edge) {
self.0 = self.0 + edge0_id + edge1_id;
}
}
fn main() {
let mut graph = Graph(0);
let edge = graph.get_edge();
graph.split(edge)
}
References to the graph borrowed by the Edge struct should be dropped when methods such as split or join are called. This would fulfill the API invariant that all edge indices must be destroyed when the graph is mutated. However, the compiler doesn't get it. It fails with messages like
error[E0502]: cannot borrow `graph` as mutable because it is also borrowed as immutable
--> src/main.rs:23:5
|
22 | let edge = graph.get_edge();
| ----- immutable borrow occurs here
23 | graph.split(edge)
| ^^^^^ mutable borrow occurs here
24 | }
| - immutable borrow ends here
If I understand this correctly, the compiler fails to realise that the borrowing of the graph that happened in the edge struct is actually being released when the function is called. Is there a way to teach the compiler what I am trying to do here?
Bonus question: is there a way to do exactly the same but without actually borrowing the graph in the Edge struct? The edge struct is only used as a temporary for the purpose of traversal and will never be part of an external object state (I have 'weak' versions of the edge for that).
Addendum: After some digging around, it seems to be really far from trivial. First of all, Edge(_, edge_id) does not actually destructure the Edge, because _ does not get bound at all (yes, i32 is Copy which makes things even more complicated, but this is easily remedied by wrapping it into a non-Copy struct). Second, even if I completely destructure Edge (i.e. by doing it in a separate scope), the reference to the graph is still there, even though it should have been moved (this must be a bug). It only works if I perform the destructuring in a separate function. Now, I have an idea how to circumvent it (by having a separate object that describes a state change and destructures the indices as they are supplied), but this becomes very awkward very quickly.
You have a second problem that you didn’t mention: how does split know that the user didn’t pass an Edge from a different Graph? Fortunately, it’s possible to solve both problems with higher-rank trait bounds!
First, let’s have Edge carry a PhantomData marker instead of a real reference to the graph:
use std::marker::PhantomData;
pub struct Edge<'a>(PhantomData<&'a mut &'a ()>, i32);
Second, let’s move all the Graph operations into a new GraphView object that gets consumed by operations that should invalidate the identifiers:
pub struct GraphView<'a> {
graph: &'a mut Graph,
marker: PhantomData<&'a mut &'a ()>,
}
impl<'a> GraphView<'a> {
pub fn get_edge(&self) -> Edge<'a> {
Edge(PhantomData, 0)
}
pub fn split(self, Edge(_, edge_id): Edge<'a>) {
self.graph.0 = self.graph.0 + edge_id;
}
pub fn join(self, Edge(_, edge0_id): Edge<'a>, Edge(_, edge1_id): Edge<'a>) {
self.graph.0 = self.graph.0 + edge0_id + edge1_id;
}
}
Now all we have to do is guard the construction of GraphView objects such that there’s never more than one with a given lifetime parameter 'a.
We can do this by (1) forcing GraphView<'a> to be invariant over 'a with a PhantomData member as above, and (2) only ever providing a constructed GraphView to a closure with a higher-rank trait bound that creates a fresh lifetime each time:
impl Graph {
pub fn with_view<Ret>(&mut self, f: impl for<'a> FnOnce(GraphView<'a>) -> Ret) -> Ret {
f(GraphView {
graph: self,
marker: PhantomData,
})
}
}
fn main() {
let mut graph = Graph(0);
graph.with_view(|view| {
let edge = view.get_edge();
view.split(edge);
});
}
Full demo on Rust Playground.
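For reference, the pieces above can be assembled into one self-contained program along these lines. This is a sketch rather than the linked demo: join is omitted for brevity, get_edge returns a made-up edge id of 1 so that the effect of split is observable, and the Edge parameter is written as Edge<'a> so that it is tied to the view's lifetime.

```rust
use std::marker::PhantomData;

pub struct Graph(i32);

// `&'a mut &'a ()` makes the type invariant in 'a, so the lifetime acts as a brand.
pub struct Edge<'a>(PhantomData<&'a mut &'a ()>, i32);

pub struct GraphView<'a> {
    graph: &'a mut Graph,
    marker: PhantomData<&'a mut &'a ()>,
}

impl<'a> GraphView<'a> {
    pub fn get_edge(&self) -> Edge<'a> {
        Edge(PhantomData, 1) // hypothetical edge id, chosen for the example
    }

    // Consumes the view, so no edge obtained from it can be used afterwards.
    pub fn split(self, Edge(_, edge_id): Edge<'a>) {
        self.graph.0 = self.graph.0 + edge_id;
    }
}

impl Graph {
    // The higher-rank bound gives every call a fresh lifetime 'a that the
    // closure cannot name, so views and edges cannot leak out of it.
    pub fn with_view<Ret>(&mut self, f: impl for<'a> FnOnce(GraphView<'a>) -> Ret) -> Ret {
        f(GraphView {
            graph: self,
            marker: PhantomData,
        })
    }
}

fn main() {
    let mut graph = Graph(0);
    graph.with_view(|view| {
        let edge = view.get_edge();
        view.split(edge);
    });
    assert_eq!(graph.0, 1);
}
```

With this shape, passing an edge obtained inside one with_view closure to a view from another with_view call should be rejected at compile time, because the two lifetimes cannot be unified.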
This isn’t totally ideal, since the caller may have to go through contortions to put all its operations inside the closure. But I think it’s the best we can do in the current Rust language, and it does allow us to enforce a huge class of compile-time guarantees that almost no other language can express at all. I’d love to see more ergonomic support for this pattern added to the language somehow—perhaps a way to create a fresh lifetime via a return value rather than a closure parameter (pub fn view(&mut self) -> exists<'a> GraphView<'a>)?
