Iterative binary search tree using enum and match - rust

I am a beginner, so bear with me.
Given the following enum definition for a BST:
enum Tree {
Node(i64, Box<Tree>, Box<Tree>),
Nil,
}
I want to implement Tree::insert. Here is what I have so far. (Get ready for spaghetti).
impl Tree {
fn new() -> Tree {
Nil
}
fn insert(self: &mut Tree, new_val: i64) -> Tree {
let mut temp = self;
loop {
match *temp {
Node(val, ref mut left, ref mut right) => /* go left is smaller, else go right*/,
Nil => { break; },
}
}
/* create new node and have the left or right pointer of the leaf point to it */
return *self;
}
I understand how I would implement this with functional style, where I construct a new tree each call with the left and right subtree being the result of the recursive call to insert on the appropriate subtree. I also think I could implement this more easily with a struct. But I want to use this mix of styles.
Onto what is confusing me...
Conceptually, I want to have a reference to the root, and then keep reassigning this reference to the left or right subtree, as I would if I wrote this in C++. But because of borrowing semantics, I can't seem to navigate to the left and right subtree by making a copy of a reference. Additionally, Rust doesn't seem to give me access to pointers to the same degree that C/C++ do, and I am generally confused as heck.
Any tips regarding what I need to do to make this piece of code work without outright giving up and moving to functional style or making a struct? Thanks in advance.

Maybe something along the lines of the following?
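fn insert(&mut self, new_val: i64) {
    // Cursor that starts at the root and walks down the tree.
    let mut temp: &mut Tree = self;
    loop {
        // Move the cursor into the left or right subtree.
        temp = match temp {
            Tree::Node(val, left, right) => {
                if new_val < *val {
                    left.as_mut()
                } else {
                    right.as_mut()
                }
            }
            // Found an empty slot: stop here.
            Tree::Nil => break,
        };
    }
    // Replace the Nil the cursor points at with a new leaf node.
    *temp = Tree::Node(new_val, Box::new(Tree::Nil), Box::new(Tree::Nil));
}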
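To try the cursor approach end to end, here is a small self-contained program. It uses the same idea written with `while let` (step into a child until the cursor sits on a `Nil`, then overwrite that `Nil` in place). The `in_order` helper and the `main` driver are additions for checking the result, not part of the original answer.

```rust
enum Tree {
    Node(i64, Box<Tree>, Box<Tree>),
    Nil,
}

impl Tree {
    fn new() -> Tree {
        Tree::Nil
    }

    // Iterative insert: walk a `&mut Tree` cursor down to a Nil slot.
    fn insert(&mut self, new_val: i64) {
        let mut temp: &mut Tree = self;
        while let Tree::Node(val, left, right) = temp {
            // Smaller values go left, equal or larger values go right.
            temp = if new_val < *val { left.as_mut() } else { right.as_mut() };
        }
        // The cursor now points at a Nil; overwrite it with a new leaf.
        *temp = Tree::Node(new_val, Box::new(Tree::Nil), Box::new(Tree::Nil));
    }

    // In-order traversal, only used to check the result.
    fn in_order(&self, out: &mut Vec<i64>) {
        if let Tree::Node(val, left, right) = self {
            left.in_order(out);
            out.push(*val);
            right.in_order(out);
        }
    }
}

fn main() {
    let mut t = Tree::new();
    for v in vec![5, 3, 8, 1, 4, 9] {
        t.insert(v);
    }
    let mut out = Vec::new();
    t.in_order(&mut out);
    // An in-order walk of a BST yields the values sorted.
    assert_eq!(out, vec![1, 3, 4, 5, 8, 9]);
}
```

The borrow checker accepts this because the only borrows created inside the loop come from the `Node` bindings, and those are immediately moved into `temp`; the `Nil` case binds nothing, so `*temp` is free to be written after the loop.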

Related

Binary tree node with pointer to siblings in Rust

I am trying to figure out the equivalent of the typical setSibling C code exercise:
// Assume the tree is fully balanced, i.e. the lowest level is fully populated.
struct Node {
    struct Node *left;
    struct Node *right;
    struct Node *sibling;
};
void setSibling(struct Node *root) {
    if (!root) return;
    if (root->left) {
        root->left->sibling = root->right;
        if (root->sibling) root->right->sibling = root->sibling->left;
        setSibling(root->left);
        setSibling(root->right);
    }
}
Of course Rust is a different world, so I am forced to think about ownership now. My lousy attempt.
struct TreeNode<'a> {
left: Option<&'a TreeNode<'a>>,
right: Option<&'a TreeNode<'a>>,
sibling: Option<&'a TreeNode<'a>>,
value: String
}
fn BuildTreeNode<'a>(aLeft: Option<&'a TreeNode<'a>>, aRight: Option<&'a TreeNode<'a>>, aValue: String) -> TreeNode<'a> {
TreeNode {
left: aLeft,
right: aRight,
value: aValue,
sibling: None
}
}
fn SetSibling(node: &mut Option<&TreeNode>) {
match node {
Some(mut n) => {
match n.left {
Some(mut c) => {
//c*.sibling = n.right;
match n.sibling {
Some(s) => { n.right.unwrap().sibling = s.left },
None => {}
}
},
None => {}
}
},
None => return
}
}
What's the canonical way to represent graph nodes like these?
Question: what's the canonical way to represent graph nodes like these?
It seems like a typical case of "confused ownership" once you introduce the sibling link: with a strict tree you can have each parent own its children, but the sibling link means this is a graph, and a given node has multiple owners.
AFAIK there are two main ways to resolve this, at least in safe Rust:
Reference counting and inner mutability: if each node is reference-counted, the sibling link can be an Rc or a Weak with little trouble. The main drawbacks are that this requires inner mutability and that navigation is gnarly, though a few utility methods can help.
"Unfold" the graph into an array and use indices for your indirection through the tree. The main drawback is that this requires either threading the array through every call or keeping a backreference to it (with inner mutability), or alternatively doing everything iteratively.
Both basically work around the ownership constraint, one by muddying the ownership itself, and the other by moving ownership to a higher power (the array).
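As an illustration of the second option, here is a minimal sketch of `setSibling` over an index-based arena. The `Tree`/`Node` names, the `add` helper, and the `u8` payload are hypothetical choices for this sketch; the `Vec` owns every node, and links are plain `usize` indices, so there are no lifetimes or `RefCell`s involved.

```rust
// Hypothetical arena: one Vec owns all nodes; links are indices into it.
struct Node {
    value: u8,
    left: Option<usize>,
    right: Option<usize>,
    sibling: Option<usize>,
}

struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    // Appends a node and returns its index.
    fn add(&mut self, value: u8, left: Option<usize>, right: Option<usize>) -> usize {
        self.nodes.push(Node { value, left, right, sibling: None });
        self.nodes.len() - 1
    }

    // Port of the C setSibling. It only recurses when both children exist,
    // which is always the case for internal nodes of a perfect tree.
    fn set_siblings(&mut self, root: usize) {
        // Copy the indices out first; they are plain usizes, so no borrow lingers.
        let (left, right, sibling) = {
            let n = &self.nodes[root];
            (n.left, n.right, n.sibling)
        };
        if let (Some(l), Some(r)) = (left, right) {
            self.nodes[l].sibling = Some(r);
            if let Some(s) = sibling {
                self.nodes[r].sibling = self.nodes[s].left;
            }
            self.set_siblings(l);
            self.set_siblings(r);
        }
    }
}

fn main() {
    // Perfect tree built bottom-up:  7 -> (3 -> 1, 2), (6 -> 4, 5)
    let mut t = Tree { nodes: Vec::new() };
    let a = t.add(1, None, None);
    let b = t.add(2, None, None);
    let l = t.add(3, Some(a), Some(b));
    let c = t.add(4, None, None);
    let d = t.add(5, None, None);
    let r = t.add(6, Some(c), Some(d));
    let root = t.add(7, Some(l), Some(r));
    t.set_siblings(root);

    assert_eq!(t.nodes[l].sibling, Some(r));
    // The link from 2 crosses into the other subtree, to 4.
    let s = t.nodes[b].sibling.unwrap();
    assert_eq!(t.nodes[s].value, 4);
    // The rightmost node on each level has no sibling.
    assert_eq!(t.nodes[d].sibling, None);
}
```

Here the "higher power" is `Tree`, which is why `set_siblings` takes `&mut self` on the whole arena rather than on a single node.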
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=68d092d0d86dc32fe07902c832262ef4 seems to be more or less what you're looking for, using Rc and inner mutability:
use std::cell::RefCell;
use std::rc::{Rc, Weak};
#[derive(Default)]
pub struct TreeNode {
left: Option<Rc<TreeNode>>,
right: Option<Rc<TreeNode>>,
sibling: RefCell<Option<Weak<TreeNode>>>,
v: u8,
}
impl TreeNode {
pub fn new(v: u8) -> Rc<Self> {
Rc::new(TreeNode {
v,
..TreeNode::default()
})
}
pub fn new_with(left: Option<Rc<TreeNode>>, right: Option<Rc<TreeNode>>, v: u8) -> Rc<Self> {
Rc::new(TreeNode {
left,
right,
v,
sibling: RefCell::new(None),
})
}
pub fn set_siblings(self: &Rc<Self>) {
let Some(left) = self.left() else { return };
let right = self.right();
left.sibling.replace(right.map(Rc::downgrade));
if let Some(sibling) = self.sibling() {
// not sure this is correct, depending on construction, with
// 3 5
// \ \
// 2 4
// \/
// 1
// (2) has a sibling (4) but doesn't have a right node, so
// unconditionally setting right.sibling doesn't seem correct
right
.unwrap()
.sibling
.replace(sibling.left().map(Rc::downgrade));
}
left.set_siblings();
right.map(|r| r.set_siblings());
}
pub fn left(&self) -> Option<&Rc<Self>> {
self.left.as_ref()
}
pub fn right(&self) -> Option<&Rc<Self>> {
self.right.as_ref()
}
pub fn sibling(&self) -> Option<Rc<Self>> {
self.sibling.borrow().as_ref()?.upgrade()
}
}
fn main() {
let t = TreeNode::new_with(
TreeNode::new_with(TreeNode::new(1).into(), TreeNode::new(2).into(), 3).into(),
TreeNode::new(4).into(),
5,
);
t.set_siblings();
assert_eq!(t.left().and_then(|l| l.sibling()).unwrap().v, 4);
let ll = t.left().and_then(|l| l.left());
assert_eq!(ll.map(|ll| ll.v), Some(1));
ll.unwrap().sibling().unwrap();
assert_eq!(
t.left()
.and_then(|l| l.left())
.and_then(|ll| ll.sibling())
.unwrap()
.v,
2
);
}
Note that I assumed the tree is immutable once created and that only the sibling links have to be generated post facto, so I only added inner mutability for those. I also used weak pointers, which probably aren't necessary: if the tree is put in an inconsistent state, they won't save anything. All they require is a few upgrade() and downgrade() calls instead of clone(), though, so it's not a huge imposition.
That aside, there are lots of issues with your attempt:
Having the same lifetime for your reference and your content (&'a TreeNode<'a>) is usually an error. The compiler will trust what you tell it, and that can have rather odd effects (e.g. telling the compiler that something gets borrowed forever).
SetSibling (incorrect naming convention, btw: Rust functions are snake_case, e.g. set_sibling) taking an Option is... unnecessary. The function clearly expects to set a sibling, so just pass it a node and remove the unnecessary outer layer of tests.
match is nice when you need it. Here you probably don't; if let would do the trick fine, especially since there is no else branch.
Rust generally uses methods, and the method that creates an instance (if one is needed) is idiomatically called new (if there's only one).

Implementing depth-first search using a stack

I've come across a problem that requires doing DFS on a tree defined like such:
pub struct TreeNode {
pub val: i32,
pub left: Option<Rc<RefCell<TreeNode>>>,
pub right: Option<Rc<RefCell<TreeNode>>>,
}
I want to use the non-recursive version of the algorithm with an explicit stack. The tree is read-only and the values in the tree are not guaranteed to be unique (they can't be used to identify a node).
The problem is that the iterative version requires a "visited" data structure. Normally, in C++, I'd just use a std::set of node pointers to implement it. How would I do the same (or something analogous) in Rust? There doesn't seem to be an easy way to get a pointer to an object that I can use in a set.
First off, we don't need to keep track of visited nodes if we know there are no cycles. Normally binary trees don't have cycles, so we may be able to assume it simply is not an issue. In this case, we can use a VecDeque as our stack.
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

type TreeNodeRef = Rc<RefCell<TreeNode>>;

pub struct TreeNode {
    pub val: i32,
    pub left: Option<TreeNodeRef>,
    pub right: Option<TreeNodeRef>,
}

pub fn dfs(root: TreeNodeRef, target: i32) -> Option<TreeNodeRef> {
    let mut queue = VecDeque::new();
    queue.push_back(root);
    while let Some(node) = queue.pop_front() {
        // Check if this is the node we are looking for
        if node.borrow().val == target {
            return Some(node);
        }
        // Push left and right onto the front of the deque, so it is used as a
        // stack (LIFO) and the search goes depth-first
        let items = node.borrow();
        if let Some(left) = &items.left {
            queue.push_front(left.clone());
        }
        if let Some(right) = &items.right {
            queue.push_front(right.clone());
        }
    }
    // Search completed and node was not found
    None
}
However, if we need to keep a list of visited nodes, we can cheat a little. An Rc<T> is just a boxed value with a reference count, so we can extract a pointer from it with Rc::as_ptr. Even though we cannot compare TreeNodes, we can store where they are kept in memory. When we do that, the solution looks like this:
use std::collections::HashSet;

pub fn dfs(root: TreeNodeRef, target: i32) -> Option<TreeNodeRef> {
    let mut visited = HashSet::new();
    let mut queue = VecDeque::new();
    queue.push_back(root);
    while let Some(node) = queue.pop_front() {
        // Record the node's address; `insert` returns false if it was
        // already in the set, i.e. this node has been visited before
        if !visited.insert(Rc::as_ptr(&node)) {
            continue;
        }
        if node.borrow().val == target {
            return Some(node);
        }
        let items = node.borrow();
        if let Some(left) = &items.left {
            queue.push_front(left.clone());
        }
        if let Some(right) = &items.right {
            queue.push_front(right.clone());
        }
    }
    None
}
Rust Playground
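To see why the address works as node identity even when values repeat, here is a small standalone check. Two `Rc`s holding equal values are different allocations, while a clone of an `Rc` points at the same allocation. One caveat: an address is only unique while the allocation is alive, which holds during the search because the tree keeps every node alive.

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::rc::Rc;

fn main() {
    // Two separate allocations that hold equal values.
    let a = Rc::new(RefCell::new(7));
    let b = Rc::new(RefCell::new(7));
    // A clone shares a's allocation, so it has the same address.
    let a2 = Rc::clone(&a);

    let mut visited: HashSet<*const RefCell<i32>> = HashSet::new();
    assert!(visited.insert(Rc::as_ptr(&a))); // new node
    assert!(!visited.insert(Rc::as_ptr(&a2))); // same node seen again
    assert!(visited.insert(Rc::as_ptr(&b))); // equal value, different node
    assert_eq!(visited.len(), 2);

    // Rc::ptr_eq performs the same identity comparison directly.
    assert!(Rc::ptr_eq(&a, &a2) && !Rc::ptr_eq(&a, &b));
}
```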
You may also find it interesting to look at the bottom 2 code examples in this answer to see how a generic search method could be made.
Edit: Alternatively, here is a Rust Playground of how this could be done with a regular Vec and Rc::clone(x), as recommended by @isaactfa.
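The linked playground isn't reproduced here, so the following is only a sketch of what a `Vec`-based version could look like, under the same no-cycles assumption as the first version. `Vec::push`/`Vec::pop` give LIFO order directly, and `Rc::clone(x)` makes it explicit that only the reference count is bumped. The `node` helper and `main` are hypothetical additions for testing.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type TreeNodeRef = Rc<RefCell<TreeNode>>;

pub struct TreeNode {
    pub val: i32,
    pub left: Option<TreeNodeRef>,
    pub right: Option<TreeNodeRef>,
}

// Pre-order DFS with a plain Vec as the stack; no visited set, assuming the
// structure really is a tree (no node reachable twice).
pub fn dfs(root: TreeNodeRef, target: i32) -> Option<TreeNodeRef> {
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        if node.borrow().val == target {
            return Some(node);
        }
        let n = node.borrow();
        // Push right before left so the left subtree is popped (explored) first.
        if let Some(right) = &n.right {
            stack.push(Rc::clone(right));
        }
        if let Some(left) = &n.left {
            stack.push(Rc::clone(left));
        }
    }
    None
}

// Hypothetical helper for building trees by hand.
fn node(val: i32, left: Option<TreeNodeRef>, right: Option<TreeNodeRef>) -> TreeNodeRef {
    Rc::new(RefCell::new(TreeNode { val, left, right }))
}

fn main() {
    // Value 3 appears twice; pre-order reaches the one under 2 first.
    let deep = node(3, None, None);
    let root = node(1, Some(node(2, Some(Rc::clone(&deep)), None)), Some(node(3, None, None)));
    let found = dfs(root, 3).expect("3 is in the tree");
    assert!(Rc::ptr_eq(&found, &deep));
}
```

Because values aren't unique, the check in `main` compares the returned node by identity with `Rc::ptr_eq` rather than by value.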

Best way in Rust to count leaves in a binary search tree?

I'm developing a basic implementation of a binary search tree in Rust. I was creating a method for counting leaves, but ran into some very strange-looking code to get it to work. I wanted to check whether the way I did it is:
Considered appropriate by Rust standards/convention
Efficient
I'm using an enum that differentiates between a node or nothing being present:
pub enum BST<T: Ord> {
Node {
value: T, // template with type T
left: Box<BST<T>>,
right: Box<BST<T>>,
},
Empty,
}
Now, count_leaves(&self) is first checking if the provided type is either a Node or Empty. If it's Empty, I can just return 0, but if it's a valid Node then I need to check if the left and right children are Empty. If so, then I can return a 1 because I'm at a leaf.
pub fn count_leaves(&self) -> u32 {
match self {
BST::Node {
value: _,
ref left,
ref right,
} => {
match (&**left, &**right) {
(BST::Empty, BST::Empty) => 1,
_ => {
left.count_leaves() + right.count_leaves()
}
}
},
BST::Empty => 0
}
}
So, to check if both left and right are BST::Empty, I wanted to use a tuple! But in doing so, Rust tries to move both left and right into the tuple. Since my type BST<T> does not implement the Copy trait, this is not possible. Also, since left and right are both boxes and borrowed, something simply like this is not possible:
match (left, right) {
BST::Empty => {},
_ => {}
}
In order to use this tuple, it looks like I need to first dereference the borrowed box using *, then dereference that box again into its type using a second *, and then finally borrow using & to avoid a move. This gives the weird looking (&**left, &**right).
From my testing this works, but I thought it looked really strange. Should I rewrite this in a more readable way (if there is one)?
I've considered using Option<> instead of the enum with the Node and Empty, but I wasn't sure if that would lead to anything more readable or more efficient.
Thanks!
EDIT:
Just wanted to clarify that when I say leaves I mean a node in the tree with no children, not a non-empty node.
You're just overthinking it. You already have a base case for when a node is empty so you don't need both matches. When possible you want to ignore the boxes in favor of implicitly using Deref to perform operations on them.
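pub fn count_leaves(&self) -> u32 {
    match self {
        BST::Node { left, right, .. } => {
            // Every non-empty subtree contains at least one leaf,
            // so `below == 0` exactly when both children are Empty.
            let below = left.count_leaves() + right.count_leaves();
            if below == 0 { 1 } else { below }
        }
        BST::Empty => 0,
    }
}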
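A quick way to check a single-match leaf count is to run it on a small hand-built tree. The sketch below uses the leaf definition from the question's EDIT (a node counts only when both children are Empty); the `leaf` and `node` builder functions are hypothetical helpers, not part of the question or answer.

```rust
pub enum BST<T: Ord> {
    Node {
        value: T,
        left: Box<BST<T>>,
        right: Box<BST<T>>,
    },
    Empty,
}

impl<T: Ord> BST<T> {
    pub fn count_leaves(&self) -> u32 {
        match self {
            BST::Node { left, right, .. } => {
                let below = left.count_leaves() + right.count_leaves();
                // Nothing below means both children are Empty: this node is a leaf.
                if below == 0 { 1 } else { below }
            }
            BST::Empty => 0,
        }
    }
}

// Hypothetical helpers for building test trees by hand.
fn leaf<T: Ord>(value: T) -> BST<T> {
    node(value, BST::Empty, BST::Empty)
}

fn node<T: Ord>(value: T, left: BST<T>, right: BST<T>) -> BST<T> {
    BST::Node { value, left: Box::new(left), right: Box::new(right) }
}

fn main() {
    // 5 has children 3 and 8; 3 has only a left child, 1.
    let t = node(5, node(3, leaf(1), BST::Empty), leaf(8));
    // Node 3 has a child, so only 1 and 8 are leaves.
    assert_eq!(t.count_leaves(), 2);
}
```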
By manually checking if both sides are empty before calling count_leaves on both sides, you might actually be decreasing performance. A recursive function call (or any function call, really) can be very cheap, since it is just a jump to code the processor is already running. However, it takes (a very tiny amount of) time for the processor to read a value through a pointer, so ideally you only need to do it once per value. That said, the compiler is made of eldritch sorcery, so it will probably figure out the best way to optimize your code either way. You could also add an #[inline] hint to the function, but it is only a hint: the compiler decides for itself whether inlining a level of the recursive call would actually help performance.
You may find it helpful to change the structure of your BST. If your tree is an enum, it needs to be matched every time you perform any operation on it.
use std::cmp::Ordering;

pub struct BST<T> {
    left: Option<Box<BST<T>>>,
    right: Option<Box<BST<T>>>,
    data: T,
}
impl<T> BST<T> {
    pub fn new_root(data: T) -> Self {
        BST {
            left: None,
            right: None,
            data,
        }
    }
    pub fn count_leaves(&self) -> u64 {
        let left_leaves = self.left.as_ref().map_or(0, |x| x.count_leaves());
        let right_leaves = self.right.as_ref().map_or(0, |x| x.count_leaves());
        // A node with no children is a leaf; otherwise count the leaves below it.
        match left_leaves + right_leaves {
            0 => 1,
            below => below,
        }
    }
}
impl<T: Ord> BST<T> {
    pub fn insert(&mut self, data: T) {
        // Smaller values go left; equal or larger values go right.
        let side = match data.cmp(&self.data) {
            Ordering::Less => &mut self.left,
            Ordering::Equal | Ordering::Greater => &mut self.right,
        };
        if let Some(node) = side {
            node.insert(data);
        } else {
            *side = Some(Box::new(Self::new_root(data)));
        }
    }
}
Now this works well, but it also introduces a new problem that I'm guessing you were attempting to avoid with your solution: you can't create an empty BST<T>. This may make initializing your program difficult. We can fix this by using a small wrapper struct (e.g. pub struct BinarySearchTree<T>(Option<BST<T>>)). This is also what std::collections::LinkedList does. You may also be surprised to learn that this cuts our memory footprint roughly in half compared to the original post. In the original enum, every Empty child is its own heap allocation that takes just as much space as a Node, so we end up allocating the entire next layer of the tree even though we don't use it. An Option<Box<BST<T>>> that is None, by contrast, is just a null pointer and allocates nothing.
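Here is a minimal sketch of the wrapper idea. The `BinarySearchTree` name comes from the paragraph above; its `new`, `insert`, and `count_leaves` methods are hypothetical. `None` stands for the empty tree, so `BST` itself never needs an "empty" variant. The last assertion checks the size claim: `Option<Box<T>>` is guaranteed to be the same size as `Box<T>`, because `None` uses the null pointer.

```rust
use std::cmp::Ordering;
use std::mem::size_of;

pub struct BST<T> {
    left: Option<Box<BST<T>>>,
    right: Option<Box<BST<T>>>,
    data: T,
}

impl<T: Ord> BST<T> {
    fn new_root(data: T) -> Self {
        BST { left: None, right: None, data }
    }

    fn insert(&mut self, data: T) {
        // Smaller values go left; equal or larger values go right.
        let side = match data.cmp(&self.data) {
            Ordering::Less => &mut self.left,
            Ordering::Equal | Ordering::Greater => &mut self.right,
        };
        match side {
            Some(node) => node.insert(data),
            None => *side = Some(Box::new(Self::new_root(data))),
        }
    }

    fn count_leaves(&self) -> u64 {
        let below = self.left.as_ref().map_or(0, |n| n.count_leaves())
            + self.right.as_ref().map_or(0, |n| n.count_leaves());
        if below == 0 { 1 } else { below }
    }
}

// Wrapper whose `None` represents the empty tree (method names are hypothetical).
pub struct BinarySearchTree<T>(Option<BST<T>>);

impl<T: Ord> BinarySearchTree<T> {
    pub fn new() -> Self {
        BinarySearchTree(None)
    }

    pub fn insert(&mut self, data: T) {
        if let Some(root) = &mut self.0 {
            root.insert(data);
        } else {
            self.0 = Some(BST::new_root(data));
        }
    }

    pub fn count_leaves(&self) -> u64 {
        self.0.as_ref().map_or(0, |root| root.count_leaves())
    }
}

fn main() {
    let mut tree = BinarySearchTree::new();
    assert_eq!(tree.count_leaves(), 0); // the empty tree is representable
    for v in vec![5, 3, 8, 1, 4] {
        tree.insert(v);
    }
    // Leaves are 1, 4 and 8.
    assert_eq!(tree.count_leaves(), 3);
    // An absent child costs no more than the pointer itself.
    assert_eq!(size_of::<Option<Box<BST<i32>>>>(), size_of::<Box<BST<i32>>>());
}
```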

Red-Black Tree in Rust, getting 'expected struct Node, found mutable reference'

I am trying to implement Red-Black Tree in Rust. After 2 days of battling with the compiler, I am ready to give up and am here asking for help.
This question helped me quite a bit: How do I handle/circumvent "Cannot assign to ... which is behind a & reference" in Rust?
I looked at existing sample code for RB-Trees in Rust, but all of the ones I saw use some form of unsafe operations or null, which we are not supposed to use here.
I have the following code:
#[derive(Debug, Clone, PartialEq)]
pub enum Colour {
Red,
Black,
}
type T_Node<T> = Option<Box<Node<T>>>;
#[derive(Debug, Clone, PartialEq)]
pub struct Node<T: Copy + Clone + Ord> {
value: T,
colour: Colour,
parent: T_Node<T>,
left: T_Node<T>,
right: T_Node<T>,
}
impl<T: Copy + Clone + Ord> Node<T>
{
pub fn new(value: T) -> Node<T>
{
Node {
value: value,
colour: Colour::Red, // add a new node as red, then fix violations
parent: None,
left: None,
right: None,
// height: 1,
}
}
pub fn insert(&mut self, value: T)
{
if self.value == value
{
return;
}
let mut leaf = if value < self.value { &mut self.left } else { &mut self.right };
match leaf
{
None =>
{
let mut new_node = Node::new(value);
new_node.parent = Some(Box::new(self));
new_node.colour = Colour::Red;
(*leaf) = Some(Box::new(new_node));
},
Some(ref mut leaf) =>
{
leaf.insert(value);
}
};
}
}
The line new_node.parent = Some(Box::new(self)); gives me the error.
I understand why the error happens (self is declared as a mutable reference), but I have no idea how to fix this, and I need self to be a mutable reference so I can modify my tree (unless you can suggest something better).
I tried to declare the T_Node to have a mutable reference instead of just Node, but that just created more problems.
I am also open to suggestions for a better choice of variable types and what not.
Any help is appreciated.
There are some faults in the design which make it impossible to go any further without making some changes.
First, Box doesn't support shared ownership but you require that because the same node is referenced by parent (rbtree.right/rbtree.left) and child (rbtree.parent). For that you need Rc.
So instead of Box, you will need to switch to Rc:
type T_Node<T> = Option<Rc<Node<T>>>;
But this doesn't solve the problem. Now your node is inside an Rc, and Rc doesn't allow mutating its contents (you can mutate through Rc::get_mut, but that requires the Rc to be unique, which is not always the case here). You won't be able to do much with your tree unless you can mutate a node.
So you need to use interior mutability pattern. For that we will add an additional layer of RefCell.
type T_Node<T> = Option<Rc<RefCell<Node<T>>>>;
Now, this will allow us to mutate the contents inside.
But this doesn't solve it. Because you need to hold a reference from the child to the parent as well, you will end up creating a reference cycle.
Luckily, the rust book explains how to fix reference cycle for the exact same scenario:
To make the child node aware of its parent, we need to add a parent field to our Node struct definition. The trouble is in deciding what the type of parent should be. We know it can’t contain an Rc, because that would create a reference cycle with leaf.parent pointing to branch and branch.children pointing to leaf, which would cause their strong_count values to never be 0. Thinking about the relationships another way, a parent node should own its children: if a parent node is dropped, its child nodes should be dropped as well. However, a child should not own its parent: if we drop a child node, the parent should still exist. This is a case for weak references!
So we need child to hold a weak reference to parent. This can be done as:
type Child<T> = Option<Rc<RefCell<Node<T>>>>;
type Parent<T> = Option<Weak<RefCell<Node<T>>>>;
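A minimal demonstration of that ownership direction, with a hypothetical stripped-down `Node` (a `Vec` of children instead of left/right, to keep it short): the parent owns its child through an `Rc`, the child points back through a `Weak`, and the back-link only touches the weak count, so there is no cycle keeping the parent alive.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical minimal node: parents own children (Rc), children point back
// to their parent with a Weak, mirroring the Child/Parent aliases above.
struct Node {
    value: i32,
    parent: Option<Weak<RefCell<Node>>>,
    children: Vec<Rc<RefCell<Node>>>,
}

fn main() {
    let parent = Rc::new(RefCell::new(Node { value: 1, parent: None, children: vec![] }));
    let child = Rc::new(RefCell::new(Node {
        value: 2,
        parent: Some(Rc::downgrade(&parent)),
        children: vec![],
    }));
    parent.borrow_mut().children.push(Rc::clone(&child));

    // The child's back-link adds to the weak count only.
    assert_eq!(Rc::strong_count(&parent), 1);
    assert_eq!(Rc::weak_count(&parent), 1);
    assert_eq!(Rc::strong_count(&child), 2);

    // While the parent is alive, the Weak can be upgraded to reach it.
    let up = child.borrow().parent.as_ref().unwrap().upgrade().unwrap();
    assert_eq!(up.borrow().value, 1);
    drop(up);

    // Dropping the last strong reference frees the parent (and its Vec of
    // children), so the child's back-link no longer upgrades.
    drop(parent);
    assert!(child.borrow().parent.as_ref().unwrap().upgrade().is_none());
    assert_eq!(Rc::strong_count(&child), 1);
}
```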
Now we have fixed the majority of the design.
One more thing we should do: instead of exposing Node directly, we will encapsulate it in a struct RBTree, which holds the root of the tree and on which operations like insert, search, delete, etc. can be called. This keeps things simple and makes the implementation more logical.
pub struct RBTree<T: Ord> {
root: Child<T>,
}
Now, let's write an insert implementation similar to yours:
impl<T: Ord> RBTree<T> {
pub fn insert(&mut self, value: T) {
fn insert<T: Ord>(child: &mut Child<T>, mut new_node: Node<T>) {
let child = child.as_ref().unwrap();
let mut child_mut_borrow = child.borrow_mut();
if child_mut_borrow.value == new_node.value {
return;
}
let leaf = if child_mut_borrow.value > new_node.value {
&mut child_mut_borrow.left
} else {
&mut child_mut_borrow.right
};
match leaf {
Some(_) => {
insert(leaf, new_node);
}
None => {
new_node.parent = Some(Rc::downgrade(&child));
*leaf = Some(Rc::new(RefCell::new(new_node)));
}
};
}
let mut new_node = Node::new(value);
if self.root.is_none() {
new_node.parent = None;
self.root = Some(Rc::new(RefCell::new(new_node)));
} else {
// We ensure that a `None` is never sent to insert()
insert(&mut self.root, new_node);
}
}
}
I defined an insert function inside RBTree::insert just to keep the recursive calls simple. The outer function tests for the root, and further insertions are carried out inside the nested insert function.
Basically, we start with:
let mut new_node = Node::new(value);
This creates a new node.
Then,
if self.root.is_none() {
new_node.parent = None;
self.root = Some(Rc::new(RefCell::new(new_node)));
} else {
// We ensure that a `None` is never sent to insert()
insert(&mut self.root, new_node);
}
If root is None, insert at the root; otherwise call insert with the root itself. So the nested insert function receives the node whose left and right children are checked and where the insertion is made.
Then, the control moves to the nested insert function.
We define the following two lines for making it convenient to access inner data:
let child = child.as_ref().unwrap();
let mut child_mut_borrow = child.borrow_mut();
Just like in your implementation, we return if value is already there:
if child_mut_borrow.value == new_node.value {
return;
}
Now we store a mutable reference to either left or right child:
let leaf = if child_mut_borrow.value > new_node.value {
&mut child_mut_borrow.left
} else {
&mut child_mut_borrow.right
};
Now we check whether the child is None or Some. In case of None, we make the insertion. Otherwise, we call insert recursively:
match leaf {
Some(_) => {
insert(leaf, new_node);
}
None => {
new_node.parent = Some(Rc::downgrade(&child));
*leaf = Some(Rc::new(RefCell::new(new_node)));
}
};
Rc::downgrade(&child) is for generating a weak reference.
Here is a working sample: Playground
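For reference, here is one way to assemble the pieces above into a single self-contained program. This is not the linked playground: the `colour` field is left out, there is no red-black rebalancing, `Node::new` and `RBTree::new` are assumed constructors, and `main` is a hypothetical check that the weak parent links point where they should.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

type Child<T> = Option<Rc<RefCell<Node<T>>>>;
type Parent<T> = Option<Weak<RefCell<Node<T>>>>;

// Node with the answer's link types; colour is omitted in this sketch.
pub struct Node<T> {
    value: T,
    parent: Parent<T>,
    left: Child<T>,
    right: Child<T>,
}

impl<T> Node<T> {
    fn new(value: T) -> Self {
        Node { value, parent: None, left: None, right: None }
    }
}

pub struct RBTree<T: Ord> {
    root: Child<T>,
}

impl<T: Ord> RBTree<T> {
    pub fn new() -> Self {
        RBTree { root: None }
    }

    // The answer's insert, reformatted.
    pub fn insert(&mut self, value: T) {
        fn insert<T: Ord>(child: &mut Child<T>, mut new_node: Node<T>) {
            let child = child.as_ref().unwrap();
            let mut child_mut_borrow = child.borrow_mut();
            if child_mut_borrow.value == new_node.value {
                return;
            }
            let leaf = if child_mut_borrow.value > new_node.value {
                &mut child_mut_borrow.left
            } else {
                &mut child_mut_borrow.right
            };
            match leaf {
                Some(_) => insert(leaf, new_node),
                None => {
                    // Weak back-pointer: the child does not keep its parent alive.
                    new_node.parent = Some(Rc::downgrade(child));
                    *leaf = Some(Rc::new(RefCell::new(new_node)));
                }
            }
        }
        let mut new_node = Node::new(value);
        if self.root.is_none() {
            new_node.parent = None;
            self.root = Some(Rc::new(RefCell::new(new_node)));
        } else {
            // We ensure that a `None` is never sent to insert()
            insert(&mut self.root, new_node);
        }
    }
}

fn main() {
    let mut tree = RBTree::new();
    for v in vec![5, 3, 8, 4] {
        tree.insert(v);
    }
    let root = tree.root.as_ref().unwrap();
    let three = root.borrow().left.clone().unwrap();
    let four = three.borrow().right.clone().unwrap();
    // Follow the weak parent link back up from 4.
    let parent = four.borrow().parent.as_ref().unwrap().upgrade().unwrap();
    assert!(Rc::ptr_eq(&parent, &three));
    // Parent links are weak, so the root is still owned only by the tree.
    assert_eq!(Rc::strong_count(root), 1);
}
```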

How to consume and replace a value in an &mut ref [duplicate]

This question already has answers here:
How can I swap in a new value for a field in a mutable reference to a structure?
(2 answers)
Closed 5 years ago.
Sometimes I run into a problem where, due to implementation details that should be invisible to the user, I need to "destroy" a &mut and replace it in-memory. This typically ends up happening in recursive methods or IntoIterator implementations on recursive structures. It typically follows the form of:
fn create_something(self) -> Self;
pub fn do_something(&mut self) {
// What you want to do
*self = self.create_something();
}
One example that I happened to have in my current project is in a KD Tree I've written, when I "remove" a node, instead of doing logic to rearrange the children, I just destructure the node I need to remove and rebuild it from the values in its subtrees:
// Some recursive checks to identify if this is our node above this
if let Node{point, left, right} = mem::replace(self, Sentinel) {
let points = left.into_iter().chain(right.into_iter()).collect();
(*self) = KDNode::new(points);
Some(point)
} else {
None
}
Another more in-depth example is the IntoIterator for this KDTree, which has to move a curr value out of the iterator, test it, and then replace it:
// temporarily swap self.curr with a dummy value so we can
// move out of it
let tmp = mem::replace(&mut self.curr, (Sentinel,Left));
match tmp {
// If the next node is a Sentinel, that means the
// "real" next node was either the parent, or we're done
(Sentinel,_) => {
if self.stack.is_empty() {
None
} else {
self.curr = self.stack.pop().expect("Could not pop iterator parent stack");
self.next()
}
}
// If the next node is to yield the current node,
// then the next node is its right child's leftmost
// descendant. We only "load" the right child, and lazily
// evaluate to its left child next iteration.
(Node{box right,point,..},Me) => {
self.curr = (right,Left);
Some(point)
},
// Left is an instruction to lazily find this node's left-most
// non-sentinel child, so we recurse down, pushing the parents on the
// stack as we go, and then say that our next node is our right child.
// If this child doesn't exist, then it will be taken care of by the Sentinel
// case next call.
(curr @ Node{..},Left) => {
let mut curr = curr;
let mut left = get_left(&mut curr);
while !left.is_sentinel() {
self.stack.push((curr,Me));
curr = left;
left = get_left(&mut curr);
}
let (right,point) = get_right_point(curr);
self.curr = (right, Left);
Some(point)
}
As you can see, my current method is to just use mem::replace with a dummy value, and then just overwrite the dummy value later. However, I don't like this for several reasons:
In some cases, there's no suitable dummy value. This is especially true if there's no public/easy way to construct a "zero value" for one or more of your struct members (e.g. what if the struct held a MutexGuard?). If the member you need to dummy-replace is in another module (or crate), you may be bound by difficult constraints of its construction that are undesirable when trying to build a dummy type.
The struct may be rather large, in which case doing more moves than is necessary may be undesirable (in practice, this is unlikely to be a big problem, admittedly).
It just "feels" unclean, since the "move" is technically more of an "update". In fact, the simplest example might be something like *self = self.next.do_something() which will still have problems.
In some cases, such as that first remove snippet I showed, you could perhaps more cleanly represent it as a fn do_something(self) -> Self, but in other cases such as the IntoIterator example this can't be done because you're constrained by the trait definition.
Is there any better, cleaner way to do this sort of in-place update?
In any case we'll need assignment, mem::replace, mem::swap, or something like that, because given a &mut reference to an object there is no way to move this object (or any of its fields) out without replacing its memory with something valid, as long as Rust forbids references to uninitialized memory.
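A minimal illustration of the mem::replace route for the question's `*self = self.create_something()` shape. `List` and `prepend` are hypothetical stand-ins for a recursive structure with a by-value method; the cheap `Nil` variant serves as the placeholder.

```rust
use std::mem;

// Hypothetical recursive type standing in for the question's structures.
#[derive(Debug, PartialEq)]
enum List {
    Cons(i32, Box<List>),
    Nil,
}

impl List {
    // By-value transformation: consumes the old list and builds a new one.
    fn prepend(self, v: i32) -> List {
        List::Cons(v, Box::new(self))
    }

    // `*self = self.prepend(v)` is rejected because it moves out of `*self`
    // through a `&mut`. Swapping in the cheap `Nil` placeholder first makes it legal.
    fn push_front(&mut self, v: i32) {
        let old = mem::replace(self, List::Nil);
        *self = old.prepend(v);
    }
}

fn main() {
    let mut list = List::Nil;
    list.push_front(2);
    list.push_front(1);
    assert_eq!(list, List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil)))));
}
```

If the type implements `Default`, `mem::take(self)` does the same swap with `Default::default()` as the placeholder.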
As for dummy values for replacement, you can always make them yourself for any type by using some wrapper type. For example, I often use Option for this purpose, where Some(T) is the value of type T, and None acts as dummy. This is what I mean:
struct Tree<T>(Option<Node<T>>);
enum Node<T> {
Leaf(T),
Children(Vec<Tree<T>>),
}
impl<T> Tree<T> where T: PartialEq {
fn remove(&mut self, value: &T) {
match self.0.take() {
Some(Node::Leaf(ref leaf_value)) if leaf_value == value =>
(),
node @ Some(Node::Leaf(..)) =>
*self = Tree(node),
Some(Node::Children(node_children)) => {
let children: Vec<_> =
node_children
.into_iter()
.filter_map(|mut tree| { tree.remove(value); tree.0 })
.map(|node| Tree(Some(node)))
.collect();
if !children.is_empty() {
*self = Tree(Some(Node::Children(children)));
}
},
None =>
panic!("something went wrong"),
}
}
}
playground link
