This question already has answers here:
How can I swap in a new value for a field in a mutable reference to a structure?
(2 answers)
Closed 5 years ago.
Sometimes I run into a problem where, due to implementation details that should be invisible to the user, I need to "destroy" a &mut and replace it in-memory. This typically ends up happening in recursive methods or IntoIterator implementations on recursive structures. It typically follows the form of:
fn create_something(self);
pub fn do_something(&mut self) {
// What you want to do
*self = self.create_something();
}
One example that I happened to have in my current project is in a KD Tree I've written, when I "remove" a node, instead of doing logic to rearrange the children, I just destructure the node I need to remove and rebuild it from the values in its subtrees:
// Some recursive checks to identify is this is our node above this
if let Node{point, left, right} = mem::replace(self, Sentinel) {
let points = left.into_iter().chain(right.into_iter()).collect();
(*self) = KDNode::new(points);
Some(point)
} else {
None
}
Another more in-depth example is the IntoIterator for this KDTree, which has to move a curr value out of the iterator, test it, and then replace it:
// temporarily swap self.curr with a dummy value so we can
// move out of it
let tmp = mem::replace(&mut self.curr, (Sentinel,Left));
match tmp {
// If the next node is a Sentinel, that means the
// "real" next node was either the parent, or we're done
(Sentinel,_) => {
if self.stack.is_empty() {
None
} else {
self.curr = self.stack.pop().expect("Could not pop iterator parent stack");
self.next()
}
}
// If the next node is to yield the current node,
// then the next node is it's right child's leftmost
// descendent. We only "load" the right child, and lazily
// evaluate to its left child next iteration.
(Node{box right,point,..},Me) => {
self.curr = (right,Left);
Some(point)
},
// Left is an instruction to lazily find this node's left-most
// non-sentinel child, so we recurse down, pushing the parents on the
// stack as we go, and then say that our next node is our right child.
// If this child doesn't exist, then it will be taken care of by the Sentinel
// case next call.
(curr # Node{..},Left) => {
let mut curr = curr;
let mut left = get_left(&mut curr);
while !left.is_sentinel() {
self.stack.push((curr,Me));
curr = left;
left = get_left(&mut curr);
}
let (right,point) = get_right_point(curr);
self.curr = (right, Left);
Some(point)
}
As you can see, my current method is to just use mem::replace with a dummy value, and then just overwrite the dummy value later. However, I don't like this for several reasons:
In some cases, there's no suitable dummy value. This is especially true if there's no public/easy way to construct a "zero value" for one or more of your struct members (e.g. what if the struct held a MutexGuard?). If the member you need to dummy-replace is in another module (or crate), you may be bound by difficult constraints of its construction that are undesireable when trying to build a dummy type.
The struct may be rather large, in which case doing more moves than is necessary may be undesirable (in practice, this is unlikely to be a big problem, admittedly).
It just "feels" unclean, since the "move" is technically more of an "update". In fact, the simplest example might be something like *self = self.next.do_something() which will still have problems.
In some cases, such as that first remove snippet I showed, you could perhaps more cleanly represent it as a fn do_something(self) -> Self, but in other cases such as the IntoIterator example this can't be done because you're constrained by the trait definition.
Is there any better, cleaner way to do this sort of in-place update?
In any case we'll need assignment, mem::replace, mem::swap, or something like that. Because given a &mut reference to an object there is no way to move this object (or any of it's fields) out without replacing it's memory area with something valid, as long as Rust forbids references to uninitialized memory.
As for dummy values for replacement, you can always make them yourself for any type by using some wrapper type. For example, I often use Option for this purpose, where Some(T) is the value of type T, and None acts as dummy. This is what I mean:
struct Tree<T>(Option<Node<T>>);
enum Node<T> {
Leaf(T),
Children(Vec<Tree<T>>),
}
impl<T> Tree<T> where T: PartialEq {
fn remove(&mut self, value: &T) {
match self.0.take() {
Some(Node::Leaf(ref leaf_value)) if leaf_value == value =>
(),
node # Some(Node::Leaf(..)) =>
*self = Tree(node),
Some(Node::Children(node_children)) => {
let children: Vec<_> =
node_children
.into_iter()
.filter_map(|mut tree| { tree.remove(value); tree.0 })
.map(|node| Tree(Some(node)))
.collect();
if !children.is_empty() {
*self = Tree(Some(Node::Children(children)));
}
},
None =>
panic!("something went wrong"),
}
}
}
playground link
Related
This question already has an answer here:
Returning a reference from a HashMap or Vec causes a borrow to last beyond the scope it's in?
(1 answer)
Closed 3 months ago.
I am implementing a cache which tries a lookup in a table, if that fails it tries a simple method to get the value, and if that fails, it goes and computes multiple new entries in the cache. The Entry system seems specifically designed for the first half of this, but I cannot get the borrow checker to allow me to complete the last half.
use std::collections::HashMap;
fn main() {
let mut cache = Cache { cache: HashMap::new() };
println!("{}", cache.get_from_cache(10));
}
struct Cache {
cache: HashMap<u32, String>
}
impl Cache {
fn get_from_cache<'a>(&'a mut self, i: u32) -> &'a String {
match self.cache.entry(i) {
std::collections::hash_map::Entry::Occupied(entry) => return entry.into_mut(),
std::collections::hash_map::Entry::Vacant(entry) => {
// Some values have an easy way to be computed...
if i == 5 {
return entry.insert("my string".to_string())
}
}
}
// Neither look-up method succeeded, so we 'compute' values one-by-one
for j in 1..=i {
self.cache.insert(j, "another string".to_string()); // Borrow checker fails here
}
self.cache.get(&i).unwrap()
}
}
The problem is that the Entry from self.cache.entry(i) borrows self.cache for the entire lifetime 'a even though I no longer need it at the point that I attempt to do self.cache.insert.
One solution to this (I think) would be if there were a way to turn my reference to the Entry into a reference to its HashMap and then insert through that reference. However, I see no way to do this with the entry interface. Is there some way to achieve this? Or to otherwise satisfy the borrow checker?
This is easily fixed by separating inserting values from returning the final result. You can first make sure the value is in the cache or insert it with some strategy if not, and then return the new value (now guaranteed to be in the HashMap):
fn get_from_cache<'a>(&'a mut self, i: u32) -> &'a String {
// handle inserting the value if necessary:
match self.cache.entry(i) {
std::collections::hash_map::Entry::Occupied(entry) => (),
// Some values have an easy way to be computed...
std::collections::hash_map::Entry::Vacant(entry) if i == 5 => {
entry.insert("my string".to_string());
}
// ... others aren't
std::collections::hash_map::Entry::Vacant(entry) => {
for j in 1..=i {
self.cache.insert(j, "another string".to_string());
}
}
}
// The value is now definitely in `self.cache` so we can return a reference to it
&self.cache[&i]
}
I am learning Rust coming from a C++ background. I am writing a parser that reads valid formulas such as 12 x (y - 4) and puts them in an abstract syntax tree (AST). I use an enum to store the possible nodes in the AST as you can see in the (heavily simplified) code. I ran into a problem. I want to simplify the expression -(-(12)) for instance by moving the 12, not copying. In general the 12 may be replaced by a deep AST. Currently I identify the situation in the function simplify( .. ) but it wont compile.
I do know why it doesn't compile: The node I'm trying to move (i.e. the 12 in my example) is still in scope as a reference from in the match clause so this is exactly rust preventing me from a possible problem with references. But in my case I know this is what I want, and moreover I exit the function right after I do the moving in the line *node = **child;, so the earlier references will go out of scope right there.
Is there an idiomatic Rust-esque type of way to solve this problem? I would really rather not copy the double--negated subtree.
#[derive(Debug)]
enum Node {
Num(i32),
UnaryMinus(Box<Node>),
}
use Node::*;
fn simplify(node: &mut Node) {
match &node {
Num(_) => (),
UnaryMinus(inner) => match &**inner {
Num(_) => (),
UnaryMinus(child) => {
// cannot move out of `**child` which is behind a shared reference
*node = **child;
return;
}
},
}
}
fn main() {
let double_minus = UnaryMinus(Box::new(UnaryMinus(Box::new(Num(12)))));
}
There are two problems here: The error you're getting concretely is because you're matching against &**inner instead of &mut **inner. It's never okay to move out of a shared reference, but there are circumstances (which we'll see) when it's okay to move out of a mutable one.
With that fixed, you will get the same error just about mutable references. That's because you can't move out of a mutable reference just because you know you're the only one holding it. A move leaves its source uninitialized, and a reference, mutable or shared, to uninitialized memory is never okay. You'll have to leave something in that memory that you're moving out of. You can do that with std::mem::swap. It takes two mutable references and swaps out the contents.
Now, one could obviously try to call std::mem::swap(node, &mut child) but this won't work simply because node is already mutably borrowed in the match expression and you can't mutably borrow something twice.
Moreover, this would leak memory as you now have a reference cycle where node -> inner -> node. This, although perfectly valid to do in Rust, usually isn't what you want.
Instead you'll need some sort of dummy that you can put in child's place. Some simple variant of your enum that can be safely dropped by inner once it gets dropped. In this example that could be Node::Num(0):
#[derive(Debug)]
enum Node {
Num(i32),
UnaryMinus(Box<Node>),
}
// This is just to verify that everything gets dropped properly
// and we don't leak any memory
impl Drop for Node {
fn drop(&mut self) {
println!("dropping {:?}", self);
}
}
use Node::*;
fn simplify(node: &mut Node) {
// `placeholder` will be our dummy
let (mut can_simplify, mut placeholder) = (false, Num(0));
match node {
Num(_) => (),
// we'll need to borrow `child` mutably later, so we have to
// match on `&mut **inner` not `&**inner`
UnaryMinus(inner) => match &mut **inner {
Num(_) => (),
UnaryMinus(ref mut child) => {
// move the contents of `child` into `placeholder` and vice versa
std::mem::swap(&mut placeholder, child);
can_simplify = true;
}
},
}
if can_simplify {
// now we can safely move into `node`
*node = placeholder;
// you could skip the conditional if all other, non-simplifying
// branches return before this statement
}
}
fn main() {
let mut double_minus = UnaryMinus(Box::new(UnaryMinus(Box::new(Num(12)))));
simplify(&mut double_minus);
println!("{:?}", double_minus);
}
struct A {...whatever...};
const MY_CONST_USIZE:usize = 127;
// somewhere in function
// vec1_of_A:Vec<A> vec2_of_A_refs:Vec<&A> have values from different data sources and have different inside_item types
let my_iterator;
if my_rand_condition() { // my_rand_condition is random and compiles for sake of simplicity
my_iterator = vec1_of_A.iter().map(|x| (MY_CONST_USIZE, &x)); // Map<Iter<Vec<A>>>
} else {
my_iterator = vec2_of_A_refs.iter().enumerate(); // Enumerate<Iter<Vec<&A>>>
}
how to make this code compile?
at the end (based on condition) I would like to have iterator able build from both inputs and I don't know how to integrate these Map and Enumerate types into single variable without calling collect() to materialize iterator as Vec
reading material will be welcomed
In the vec_of_A case, first you need to replace &x with x in your map function. The code you have will never compile because the mapping closure tries to return a reference to one of its parameters, which is never allowed in Rust. To make the types match up, you need to dereference the &&A in the vec2_of_A_refs case to &A instead of trying to add a reference to the other.
Also, -127 is an invalid value for usize, so you need to pick a valid value, or use a different type than usize.
Having fixed those, now you need some type of dynamic dispatch. The simplest approach would be boxing into a Box<dyn Iterator>.
Here is a complete example:
#![allow(unused)]
#![allow(non_snake_case)]
struct A;
// Fixed to be a valid usize.
const MY_CONST_USIZE: usize = usize::MAX;
fn my_rand_condition() -> bool { todo!(); }
fn example() {
let vec1_of_A: Vec<A> = vec![];
let vec2_of_A_refs: Vec<&A> = vec![];
let my_iterator: Box<dyn Iterator<Item=(usize, &A)>>;
if my_rand_condition() {
// Fixed to return x instead of &x
my_iterator = Box::new(vec1_of_A.iter().map(|x| (MY_CONST_USIZE, x)));
} else {
// Added map to deref &&A to &A to make the types match
my_iterator = Box::new(vec2_of_A_refs.iter().map(|x| *x).enumerate());
}
for item in my_iterator {
// ...
}
}
(Playground)
Instead of a boxed trait object, you could also use the Either type from the either crate. This is an enum with Left and Right variants, but the Either type itself implements Iterator if both the left and right types also do, with the same type for the Item associated type. For example:
#![allow(unused)]
#![allow(non_snake_case)]
use either::Either;
struct A;
const MY_CONST_USIZE: usize = usize::MAX;
fn my_rand_condition() -> bool { todo!(); }
fn example() {
let vec1_of_A: Vec<A> = vec![];
let vec2_of_A_refs: Vec<&A> = vec![];
let my_iterator;
if my_rand_condition() {
my_iterator = Either::Left(vec1_of_A.iter().map(|x| (MY_CONST_USIZE, x)));
} else {
my_iterator = Either::Right(vec2_of_A_refs.iter().map(|x| *x).enumerate());
}
for item in my_iterator {
// ...
}
}
(Playground)
Why would you choose one approach over the other?
Pros of the Either approach:
It does not require a heap allocation to store the iterator.
It implements dynamic dispatch via match which is likely (but not guaranteed) to be faster than dynamic dispatch via vtable lookup.
Pros of the boxed trait object approach:
It does not depend on any external crates.
It scales easily to many different types of iterators; the Either approach quickly becomes unwieldy with more than two types.
You can do this using a Boxed trait object like so:
let my_iterator: Box<dyn Iterator<Item = _>> = if my_rand_condition() {
Box::new(vec1_of_A.iter().map(|x| (MY_CONST_USIZE, x)))
} else {
Box::new(vec2_of_A_refs.iter().enumerate().map(|(i, x)| (i, *x)))
};
I don't think this is a good idea generally though. A few things to note:
The use of trait objects means the types here must be resolved dynamically. This adds a lot of overhead.
The closure in vec1's iterator's map method cannot reference its arguments. Instead the second map must be added to vec2s iterator. The effect of this is that all the items are being copied regardless. If you are doing this, why not collect()? The overhead for creating the Vec or whatever you choose should be less than that of the dynamic resolution.
Bit pedantic, but remember if statements are expressions in Rust, and so the assignment can be expressed a little more cleanly as I have done above.
I'm developing a basic implementation of a binary search tree in Rust. I was creating a method for counting leaves, but ran into some very strange looking code to get it to work. I wanted to clarify if the way I did it is:
Considered appropriate by Rust standards/convention
Efficient
I'm using an enum that differentiates between a node or nothing being present:
pub enum BST<T: Ord> {
Node {
value: T, // template with type T
left: Box<BST<T>>,
right: Box<BST<T>>,
},
Empty,
}
Now, count_leaves(&self) is first checking if the provided type is either a Node or Empty. If it's Empty, I can just return 0, but if it's a valid Node then I need to check if the left and right children are Empty. If so, then I can return a 1 because I'm at a leaf.
pub fn count_leaves(&self) -> u32 {
match self {
BST::Node {
value: _,
ref left,
ref right,
} => {
match (&**left, &**right) {
(BST::Empty, BST::Empty) => 1,
_ => {
left.count_leaves() + right.count_leaves()
}
}
},
BST::Empty => 0
}
}
So, to check if both left and right are BST::Empty, I wanted to use a tuple! But in doing so, Rust tries to move both left and right into the tuple. Since my type BST<T> does not implement the Copy trait, this is not possible. Also, since left and right are both boxes and borrowed, something simply like this is not possible:
match (left, right) {
BST::Empty => {},
_ => {}
}
In order to use this tuple, it looks like I need to first dereference the borrowed box using *, then dereference that box again into its type using a second *, and then finally borrow using & to avoid a move. This gives the weird looking (&**left, &**right).
From my testing this works, but I thought it looked really strange. Should I rewrite this in a more readable way (if there is one)?
I've considered using Option<> instead of the enum with the Node and Empty, but I wasn't sure if that would lead to anything more readable or more efficient.
Thanks!
EDIT:
Just wanted to clarify that when I say leaves I mean a node in the tree with no children, not a non-empty node.
You're just overthinking it. You already have a base case for when a node is empty so you don't need both matches. When possible you want to ignore the boxes in favor of implicitly using Deref to perform operations on them.
pub fn count_leaves(&self) -> u32 {
match self {
BST::Node { left, right, .. } => 1 + left.count_leaves() + right.count_leaves(),
BST::Empty => 0,
}
}
By manually checking if both sides are empty before calling count_leaves on both sides, you might actually be decreasing performance. A recursive function call (or any function call really) can be very cheap since your code is already at the processor. However, it takes (a very tiny) time for a processor to read a value from a pointer so ideally you only needs to do it once per value. However the compiler is made of eldritch sorcery so it will probably figure out the best way to optimize your code either way. Another option which may help is to add an #[inline] hint to the function to ask the compiler to unroll the recursive call one or more times if it thinks it would be helpful for performance.
You may find it helpful to change the structure of your BST. By making your tree an enum, then it needs to be matched every time you perform any operation on it.
pub struct BST<T> {
left: Option<Box<BST<T>>>,
right: Option<Box<BST<T>>>,
data: T,
}
impl<T> BST<T> {
pub fn new_root(data: T) -> Self {
BST {
left: None,
right: None,
data,
}
}
pub fn count_leaves(&self) -> u64 {
let left_leaves = self.left.as_ref().map_or(0, |x| x.count_leaves());
let right_leaves = self.right.as_ref().map_or(0, |x| x.count_leaves());
left_leaves + right_leaves + 1
}
}
impl<T: Ord> BST<T> {
pub fn insert(&mut self, data: T) {
let side = match self.data.cmp(&data) {
Ordering::Less | Ordering::Equal => &mut self.left,
Ordering::Greater => &mut self.right,
};
if let Some(node) = side {
node.insert(data);
} else {
*side = Some(Box::new(Self::new_root(data)));
}
}
}
Now this works well, but it also introduces a new problem that I'm guessing you were attempting to avoid with your solution. You can't create an empty BST<T>. This may make initializing your program difficult. We can fix this by using a small wrapper struct (Ex: pub struct BinarySearchTree<T>(Option<BST<T>>)). This is also what std::collections::LinkedList does. You may also be surprised to learn that this cuts our memory footprint in half compared to the original post. This is caused by Empty requiring just as much space as Node. So this means we need to allocate the entire next layer of the tree even though we don't use it.
I am trying to implement Red-Black Tree in Rust. After 2 days of battling with the compiler, I am ready to give up and am here asking for help.
This question helped me quite a bit: How do I handle/circumvent "Cannot assign to ... which is behind a & reference" in Rust?
I looked at existing sample code for RB-Trees in Rust, but all of the ones I saw use some form of unsafe operations or null, which we are not supposed to use here.
I have the following code:
#[derive(Debug, Clone, PartialEq)]
pub enum Colour {
Red,
Black,
}
type T_Node<T> = Option<Box<Node<T>>>;
#[derive(Debug, Clone, PartialEq)]
pub struct Node<T: Copy + Clone + Ord> {
value: T,
colour: Colour,
parent: T_Node<T>,
left: T_Node<T>,
right: T_Node<T>,
}
impl<T: Copy + Clone + Ord> Node<T>
{
pub fn new(value: T) -> Node<T>
{
Node {
value: value,
colour: Colour::Red, // add a new node as red, then fix violations
parent: None,
left: None,
right: None,
// height: 1,
}
}
pub fn insert(&mut self, value: T)
{
if self.value == value
{
return;
}
let mut leaf = if value < self.value { &mut self.left } else { &mut self.right };
match leaf
{
None =>
{
let mut new_node = Node::new(value);
new_node.parent = Some(Box::new(self));
new_node.colour = Colour::Red;
(*leaf) = Some(Box::new(new_node));
},
Some(ref mut leaf) =>
{
leaf.insert(value);
}
};
}
}
The line new_node.parent = Some(Box::new(self)); gives me the error.
I understand understand why the error happens (self is declared as a mutable reference) and I have no idea how to fix this, but I need self to be a mutable reference so I can modify my tree (unless you can suggest something better).
I tried to declare the T_Node to have a mutable reference instead of just Node, but that just created more problems.
I am also open to suggestions for a better choice of variable types and what not.
Any help is appreciated.
There are some faults in the design which makes it impossible to go any further without making some changes.
First, Box doesn't support shared ownership but you require that because the same node is referenced by parent (rbtree.right/rbtree.left) and child (rbtree.parent). For that you need Rc.
So instead of Box, you will need to switch to Rc:
type T_Node<T> = Option<Rc<Node<T>>>;
But this doesn't solve the problem. Now your node is inside Rc and Rc doesn't allow mutation to it's contents (you can mutate by get_mut but that requires it to be unique which is not a constant in your case). You won't be able to do much with your tree unless you can mutate a node.
So you need to use interior mutability pattern. For that we will add an additional layer of RefCell.
type T_Node<T> = Option<Rc<RefCell<Node<T>>>>;
Now, this will allow us to mutate the contents inside.
But this doesn't solve it. Because you need to hold a reference from the child to the parent as well, you will end up creating a reference cycle.
Luckily, the rust book explains how to fix reference cycle for the exact same scenario:
To make the child node aware of its parent, we need to add a parent field to our Node struct definition. The trouble is in deciding what the type of parent should be. We know it can’t contain an Rc, because that would create a reference cycle with leaf.parent pointing to branch and branch.children pointing to leaf, which would cause their strong_count values to never be 0. Thinking about the relationships another way, a parent node should own its children: if a parent node is dropped, its child nodes should be dropped as well. However, a child should not own its parent: if we drop a child node, the parent should still exist. This is a case for weak references!
So we need child to hold a weak reference to parent. This can be done as:
type Child<T> = Option<Rc<RefCell<Node<T>>>>;
type Parent<T> = Option<Weak<RefCell<Node<T>>>>;
Now we have fixed majority of the design.
One more thing that we should do is, instead of exposing Node directly, we will encapsulate it in a struct RBTree which will hold the root of the tree and operations like insert, search, delete, etc. can be called on RBtree. This will make things simple and implementation will become more logical.
pub struct RBTree<T: Ord> {
root: Child<T>,
}
Now, let's write an insert implementation similar to yours:
impl<T: Ord> RBTree<T> {
pub fn insert(&mut self, value: T) {
fn insert<T: Ord>(child: &mut Child<T>, mut new_node: Node<T>) {
let child = child.as_ref().unwrap();
let mut child_mut_borrow = child.borrow_mut();
if child_mut_borrow.value == new_node.value {
return;
}
let leaf = if child_mut_borrow.value > new_node.value {
&mut child_mut_borrow.left
} else {
&mut child_mut_borrow.right
};
match leaf {
Some(_) => {
insert(leaf, new_node);
}
None => {
new_node.parent = Some(Rc::downgrade(&child));
*leaf = Some(Rc::new(RefCell::new(new_node)));
}
};
}
let mut new_node = Node::new(value);
if self.root.is_none() {
new_node.parent = None;
self.root = Some(Rc::new(RefCell::new(new_node)));
} else {
// We ensure that a `None` is never sent to insert()
insert(&mut self.root, new_node);
}
}
}
I defined an insert function inside the RBTree::insert just for the sake of simplicity of recursive calls. The outer functions tests for root and further insertions are carried out inside nested insert functions.
Basically, we start with:
let mut new_node = Node::new(value);
This creates a new node.
Then,
if self.root.is_none() {
new_node.parent = None;
self.root = Some(Rc::new(RefCell::new(new_node)));
} else {
// We ensure that a `None` is never sent to insert()
insert(&mut self.root, new_node);
}
If root is None, insert at root, otherwise call insert with root itself. So the nested insert function basically receives the parent in which left and right child are checked and the insertion is made.
Then, the control moves to the nested insert function.
We define the following two lines for making it convenient to access inner data:
let child = child.as_ref().unwrap();
let mut child_mut_borrow = child.borrow_mut();
Just like in your implementation, we return if value is already there:
if child_mut_borrow.value == new_node.value {
return;
}
Now we store a mutable reference to either left or right child:
let leaf = if child_mut_borrow.value > new_node.value {
&mut child_mut_borrow.left
} else {
&mut child_mut_borrow.right
};
Now, a check is made on the child if it is None or Some. In case of None, we make the insertion. Otherwise, we call insert recursively:
match leaf {
Some(_) => {
insert(leaf, new_node);
}
None => {
new_node.parent = Some(Rc::downgrade(&child));
*leaf = Some(Rc::new(RefCell::new(new_node)));
}
};
Rc::downgrade(&child) is for generating a weak reference.
Here is a working sample: Playground