I'm new to Rust but coming from C++ I find the gymnastics of the type system a bit ... troubling. I use problems from LeetCode to teach myself Rust. Take the following definition of a binary tree:
pub struct TreeNode {
pub val: i32,
pub left: Option<Rc<RefCell<TreeNode>>>,
pub right: Option<Rc<RefCell<TreeNode>>>,
}
Okay, this is a bit ugly already, but I sort of understand the reasons (although it's unclear to me why we can't have a single type that combines the functionality of Option, Rc, and RefCell when these seem to occur together quite often).
Anyhow, here's the BFS algorithm I implemented (it doesn't do anything interesting, the point is the general layout):
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;
fn bfs(root: Option<Rc<RefCell<TreeNode>>>) {
let mut queue = VecDeque::new();
queue.push_back((root, 0));
while queue.len() > 0 {
// 'pos' measures how far out to the left or right the given node is.
let (node, pos) = queue.pop_front().unwrap();
if node.is_none() { continue; }
let subtree = node.unwrap();
queue.push_back((subtree.borrow().left.clone(), pos - 1));
queue.push_back((subtree.borrow().right.clone(), pos + 1));
}
}
My question is: is this really how these things are done in Rust? Isn't there a more idiomatic (more concise) way of doing this? I mean, the code uses one each of unwrap(), borrow(), and clone() to get at the left and right tree pointers. This feels a bit cumbersome, to say the least. It might be how things are done in Rust in general, but I'm curious whether it's the norm or the exception.
The unwrap()s here are indeed non-idiomatic and can be replaced. The first unwrap() can be replaced with while let, and the second with if let:
while let Some((node, pos)) = queue.pop_front() {
// 'pos' measures how far out to the left or right the given node is.
if let Some(subtree) = node {
queue.push_back((subtree.borrow().left.clone(), pos - 1));
queue.push_back((subtree.borrow().right.clone(), pos + 1));
}
}
Having to wrap the whole loop body in if let is unfortunate. let-else solves that; it was an experimental feature (let_else) when this was first written and has been stable since Rust 1.65.0, so no feature attribute is needed:
while let Some((node, pos)) = queue.pop_front() {
// 'pos' measures how far out to the left or right the given node is.
let Some(subtree) = node else {
continue;
};
queue.push_back((subtree.borrow().left.clone(), pos - 1));
queue.push_back((subtree.borrow().right.clone(), pos + 1));
}
But the borrow() and clone() are there because you are trying to bypass the ownership rules with Rc<RefCell>. This is generally considered an antipattern: if you think you need it, think again. There may be a much more Rusty way of doing things. Sometimes it is simple, and sometimes it requires a redesign of your data structure (for example, graphs are commonly represented with indices in Rust).
A simpler tree will also have a simpler search function:
pub struct TreeNode {
pub val: i32,
pub left: Option<Box<TreeNode>>,
pub right: Option<Box<TreeNode>>,
}
fn bfs(root: Option<&TreeNode>) {
let mut queue = VecDeque::new();
queue.push_back((root, 0));
while let Some((node, pos)) = queue.pop_front() {
// 'pos' measures how far out to the left or right the given node is.
let Some(subtree) = node else {
continue;
};
queue.push_back((subtree.left.as_deref(), pos - 1));
queue.push_back((subtree.right.as_deref(), pos + 1));
}
}
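To see the Box-based version do something observable, here is a minimal sketch that records each node's value and pos instead of discarding them. The names bfs_positions, leaf and sample_tree are made up for this illustration and are not part of the original code.

```rust
use std::collections::VecDeque;

pub struct TreeNode {
    pub val: i32,
    pub left: Option<Box<TreeNode>>,
    pub right: Option<Box<TreeNode>>,
}

/// Hypothetical variant of `bfs` that returns `(val, pos)` pairs in visiting order.
fn bfs_positions(root: Option<&TreeNode>) -> Vec<(i32, i32)> {
    let mut out = Vec::new();
    let mut queue = VecDeque::new();
    queue.push_back((root, 0));
    while let Some((node, pos)) = queue.pop_front() {
        // 'pos' measures how far out to the left or right the given node is.
        let Some(subtree) = node else {
            continue;
        };
        out.push((subtree.val, pos));
        // as_deref turns Option<Box<TreeNode>> into Option<&TreeNode>: no clone, no borrow().
        queue.push_back((subtree.left.as_deref(), pos - 1));
        queue.push_back((subtree.right.as_deref(), pos + 1));
    }
    out
}

/// Hypothetical helper: a boxed node without children.
fn leaf(val: i32) -> Option<Box<TreeNode>> {
    Some(Box::new(TreeNode { val, left: None, right: None }))
}

/// Hypothetical sample: 1 has children 2 and 3, and 2 has a left child 4.
fn sample_tree() -> TreeNode {
    TreeNode {
        val: 1,
        left: Some(Box::new(TreeNode { val: 2, left: leaf(4), right: None })),
        right: leaf(3),
    }
}
```

Calling bfs_positions(Some(&sample_tree())) visits the nodes level by level; the tree is only borrowed, so the caller keeps ownership.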
I am trying to figure out the equivalent of the typical setSibling C code exercise:
// Assume the tree is fully balanced, i.e. the lowest level is fully populated.
struct Node {
Node * left;
Node * right;
Node * sibling;
};
void setSibling(Node * root) {
if (!root) return;
if (root->left) {
root->left->sibling = root->right;
if (root->sibling) root->right->sibling = root->sibling->left;
setSibling(root->left);
setSibling(root->right);
}
}
Of course Rust is a different world, so I am forced to think about ownership now. My lousy attempt.
struct TreeNode<'a> {
left: Option<&'a TreeNode<'a>>,
right: Option<&'a TreeNode<'a>>,
sibling: Option<&'a TreeNode<'a>>,
value: String
}
fn BuildTreeNode<'a>(aLeft: Option<&'a TreeNode<'a>>, aRight: Option<&'a TreeNode<'a>>, aValue: String) -> TreeNode<'a> {
TreeNode {
left: aLeft,
right: aRight,
value: aValue,
sibling: None
}
}
fn SetSibling(node: &mut Option<&TreeNode>) {
match node {
Some(mut n) => {
match n.left {
Some(mut c) => {
//c*.sibling = n.right;
match n.sibling {
Some(s) => { n.right.unwrap().sibling = s.left },
None => {}
}
},
None => {}
}
},
None => return
}
}
What's the canonical way to represent graph nodes like these?
It seems like a typical case of "confused ownership" once you introduce the sibling link: with a strict tree you can have each parent own its children, but the sibling link means this is a graph, and a given node has multiple owners.
AFAIK there are two main ways to resolve this, at least in safe Rust:
Reference counting and inner mutability. If each node is reference-counted, the sibling link can be a reference or weak reference with little trouble. The main drawbacks are that this requires inner mutability and that the navigation is gnarly, though a few utility methods can help.
"Unfold" the graph into an array, and use indices for your indirection through the tree. The main drawback is that this requires either threading the array through every call or keeping a backreference (with inner mutability) to the array, or alternatively doing everything iteratively.
Both basically work around the ownership constraint, one by muddying the ownership itself, and the other by moving ownership to a higher power (the array).
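For the array-and-indices option, a rough sketch could look like the following. It takes the "doing everything iteratively" route with an explicit stack. The names Tree, Node, add and the field layout are assumptions made up for this illustration; only the sibling rule itself comes from the C code in the question.

```rust
/// A node stored in the arena; all links are indices into `Tree::nodes`.
struct Node {
    left: Option<usize>,
    right: Option<usize>,
    sibling: Option<usize>,
    value: u8,
}

/// The arena owns every node, so the links carry no ownership at all.
struct Tree {
    nodes: Vec<Node>,
    root: Option<usize>,
}

impl Tree {
    fn new() -> Self {
        Tree { nodes: Vec::new(), root: None }
    }

    /// Appends a node and returns its index (children must be added first).
    fn add(&mut self, value: u8, left: Option<usize>, right: Option<usize>) -> usize {
        self.nodes.push(Node { left, right, sibling: None, value });
        self.nodes.len() - 1
    }

    /// Iterative version of setSibling: a parent is always processed before
    /// its children, so a node's own sibling is known when it is popped.
    fn set_siblings(&mut self) {
        let mut stack: Vec<usize> = self.root.into_iter().collect();
        while let Some(i) = stack.pop() {
            // Copy the indices out so no borrow of the arena is held while mutating it.
            let (left, right, sibling) = {
                let n = &self.nodes[i];
                (n.left, n.right, n.sibling)
            };
            let Some(l) = left else { continue };
            self.nodes[l].sibling = right;
            if let Some(r) = right {
                // The right child's sibling is the left child of the parent's sibling, if any.
                let cousin = sibling.and_then(|s| self.nodes[s].left);
                self.nodes[r].sibling = cousin;
                stack.push(r);
            }
            stack.push(l);
        }
    }
}
```

Unlike the C version, this sketch does not assume a right child exists whenever a left one does; a missing right child simply leaves the corresponding sibling as None.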
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=68d092d0d86dc32fe07902c832262ef4 seems to be more or less what you're looking for, using Rc & inner mutability:
use std::cell::RefCell;
use std::rc::{Rc, Weak};
#[derive(Default)]
pub struct TreeNode {
left: Option<Rc<TreeNode>>,
right: Option<Rc<TreeNode>>,
sibling: RefCell<Option<Weak<TreeNode>>>,
v: u8,
}
impl TreeNode {
pub fn new(v: u8) -> Rc<Self> {
Rc::new(TreeNode {
v,
..TreeNode::default()
})
}
pub fn new_with(left: Option<Rc<TreeNode>>, right: Option<Rc<TreeNode>>, v: u8) -> Rc<Self> {
Rc::new(TreeNode {
left,
right,
v,
sibling: RefCell::new(None),
})
}
pub fn set_siblings(self: &Rc<Self>) {
let Some(left) = self.left() else { return };
let right = self.right();
left.sibling.replace(right.map(Rc::downgrade));
if let Some(sibling) = self.sibling() {
// not sure this is correct, depending on construction, with
// 3 5
// \ \
// 2 4
// \/
// 1
// (2) has a sibling (4) but doesn't have a right node, so
// unconditionally setting right.sibling doesn't seem correct
right
.unwrap()
.sibling
.replace(sibling.left().map(Rc::downgrade));
}
left.set_siblings();
if let Some(r) = right {
r.set_siblings();
}
}
pub fn left(&self) -> Option<&Rc<Self>> {
self.left.as_ref()
}
pub fn right(&self) -> Option<&Rc<Self>> {
self.right.as_ref()
}
pub fn sibling(&self) -> Option<Rc<Self>> {
self.sibling.borrow().as_ref()?.upgrade()
}
}
fn main() {
let t = TreeNode::new_with(
TreeNode::new_with(TreeNode::new(1).into(), TreeNode::new(2).into(), 3).into(),
TreeNode::new(4).into(),
5,
);
t.set_siblings();
assert_eq!(t.left().and_then(|l| l.sibling()).unwrap().v, 4);
let ll = t.left().and_then(|l| l.left());
assert_eq!(ll.map(|ll| ll.v), Some(1));
ll.unwrap().sibling().unwrap();
assert_eq!(
t.left()
.and_then(|l| l.left())
.and_then(|ll| ll.sibling())
.unwrap()
.v,
2
);
}
Note that I assumed the tree is immutable once created and only the sibling links have to be generated after the fact, so I only added inner mutability for those. I also used weak pointers, which probably isn't necessary: if the tree is put in an inconsistent state it's not like that'll save anything. All it requires is a few upgrade() and downgrade() calls instead of clone() though, so it's not a huge imposition.
That aside, there are lots of issues with your attempt:
having the same lifetime for your reference and your content is usually an error: the compiler will trust what you tell it, and that can have rather odd effects (e.g. telling the compiler that something gets borrowed forever)
SetSibling (incorrect naming convention btw) taking an Option is... unnecessary: the function clearly expects a node to set siblings on, so just give it a node and remove the unnecessary outer layer of tests
match is nice when you need it. Here you probably don't: if let would do the trick fine, especially since there is no else branch
Rust generally uses methods, and the method to create an instance (if one is needed) is idiomatically called new (if there's only one)
I've come across a problem that requires doing DFS on a tree defined like such:
pub struct TreeNode {
pub val: i32,
pub left: Option<Rc<RefCell<TreeNode>>>,
pub right: Option<Rc<RefCell<TreeNode>>>,
}
I want to use the non-recursive version of the algorithm with an explicit stack. The tree is read-only and the values in the tree are not guaranteed to be unique (they can't be used to identify a node).
The problem is that the iterative version requires a visited data structure. Normally, in C++, I'd just use a std::set of node pointers to implement visited. How would I do the same (or something analogous) in Rust? There doesn't seem to be an easy way to get a pointer to an object that I can use in a set.
First off, we don't need to keep track of visited nodes if we know there are no cycles. Binary trees normally don't contain cycles, so we may be able to assume it simply is not an issue. In this case, we can use a VecDeque as our stack by pushing and popping at the front.
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;
type TreeNodeRef = Rc<RefCell<TreeNode>>;
pub struct TreeNode {
pub val: i32,
pub left: Option<TreeNodeRef>,
pub right: Option<TreeNodeRef>,
}
pub fn dfs(root: TreeNodeRef, target: i32) -> Option<TreeNodeRef> {
let mut queue = VecDeque::new();
queue.push_back(root);
while let Some(node) = queue.pop_front() {
// Check if this is the node we are looking for
if node.borrow().val == target {
return Some(node)
}
// Add left and right to the front of the queue for DFS
let items = node.borrow();
if let Some(left) = &items.left {
queue.push_front(left.clone());
}
if let Some(right) = &items.right {
queue.push_front(right.clone());
}
}
// Search completed and node was not found
None
}
However, if we need to keep a list of visited nodes, we can cheat a little. An Rc<T> is just a boxed value with a reference count so we can extract a pointer from it. Even though we can not compare TreeNodes, we can store where they are kept in memory. When we do that, the solution looks like this:
use std::collections::HashSet;
pub fn dfs(root: TreeNodeRef, target: i32) -> Option<TreeNodeRef> {
let mut visited = HashSet::new();
let mut queue = VecDeque::new();
queue.push_back(root);
while let Some(node) = queue.pop_front() {
// Skip nodes that were already visited; insert() returns false
// when the pointer was already in the set
if !visited.insert(Rc::as_ptr(&node)) {
continue
}
if node.borrow().val == target {
return Some(node)
}
let items = node.borrow();
if let Some(left) = &items.left {
queue.push_front(left.clone());
}
if let Some(right) = &items.right {
queue.push_front(right.clone());
}
}
None
}
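To make the pointer trick a bit more tangible, here is a small sketch showing that Rc::as_ptr identifies the allocation rather than the value. The function name count_distinct and the use of RefCell<i32> as a stand-in for a tree node are assumptions for this illustration.

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::rc::Rc;

/// Counts how many distinct allocations the given handles point to.
/// Raw pointers implement Hash and Eq, so they can be stored in a HashSet.
fn count_distinct(nodes: &[Rc<RefCell<i32>>]) -> usize {
    let mut seen: HashSet<*const RefCell<i32>> = HashSet::new();
    for n in nodes {
        // Clones of the same Rc share one allocation and therefore one pointer.
        seen.insert(Rc::as_ptr(n));
    }
    seen.len()
}
```

Two separately allocated nodes holding the same value count as two entries, while a clone of an existing Rc does not add a new one. The pointers are only meaningful as identities while the corresponding Rc values are still alive.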
You may also find it interesting to look at the bottom 2 code examples in this answer to see how a generic search method could be made.
Edit: Alternatively, here is a Rust Playground of how this could be done with a regular Vec and Rc::clone(x), as recommended by @isaactfa.
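In case the playground is unavailable, a Vec-based variant might look roughly like this. It is a sketch under the same no-cycles assumption, not the linked code itself, and the node helper is a made-up convenience for building test trees.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type TreeNodeRef = Rc<RefCell<TreeNode>>;

pub struct TreeNode {
    pub val: i32,
    pub left: Option<TreeNodeRef>,
    pub right: Option<TreeNodeRef>,
}

pub fn dfs(root: TreeNodeRef, target: i32) -> Option<TreeNodeRef> {
    // A plain Vec works as a stack: push and pop both act on the end.
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        if node.borrow().val == target {
            return Some(node);
        }
        let items = node.borrow();
        // Push right first so that the left subtree is explored first.
        if let Some(right) = &items.right {
            // Rc::clone makes it explicit that only the handle is cloned, not the node.
            stack.push(Rc::clone(right));
        }
        if let Some(left) = &items.left {
            stack.push(Rc::clone(left));
        }
    }
    None
}

/// Hypothetical helper for building nodes.
fn node(val: i32, left: Option<TreeNodeRef>, right: Option<TreeNodeRef>) -> TreeNodeRef {
    Rc::new(RefCell::new(TreeNode { val, left, right }))
}
```

The returned handle points at the very node stored in the tree, which can be checked with Rc::ptr_eq.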
I'm developing a basic implementation of a binary search tree in Rust. I was creating a method for counting leaves, but ran into some very strange looking code to get it to work. I wanted to clarify if the way I did it is:
Considered appropriate by Rust standards/convention
Efficient
I'm using an enum that differentiates between a node or nothing being present:
pub enum BST<T: Ord> {
Node {
value: T, // template with type T
left: Box<BST<T>>,
right: Box<BST<T>>,
},
Empty,
}
Now, count_leaves(&self) is first checking if the provided type is either a Node or Empty. If it's Empty, I can just return 0, but if it's a valid Node then I need to check if the left and right children are Empty. If so, then I can return a 1 because I'm at a leaf.
pub fn count_leaves(&self) -> u32 {
match self {
BST::Node {
value: _,
ref left,
ref right,
} => {
match (&**left, &**right) {
(BST::Empty, BST::Empty) => 1,
_ => {
left.count_leaves() + right.count_leaves()
}
}
},
BST::Empty => 0
}
}
So, to check if both left and right are BST::Empty, I wanted to use a tuple! But in doing so, Rust tries to move both left and right into the tuple. Since my type BST<T> does not implement the Copy trait, this is not possible. Also, since left and right are both boxes and borrowed, something as simple as this is not possible:
match (left, right) {
BST::Empty => {},
_ => {}
}
In order to use this tuple, it looks like I need to first dereference the borrowed box using *, then dereference that box again into its type using a second *, and then finally borrow using & to avoid a move. This gives the weird looking (&**left, &**right).
From my testing this works, but I thought it looked really strange. Should I rewrite this in a more readable way (if there is one)?
I've considered using Option<> instead of the enum with the Node and Empty, but I wasn't sure if that would lead to anything more readable or more efficient.
Thanks!
EDIT:
Just wanted to clarify that when I say leaves I mean a node in the tree with no children, not a non-empty node.
You're just overthinking it. You already have a base case for when a node is empty so you don't need both matches. When possible you want to ignore the boxes in favor of implicitly using Deref to perform operations on them.
pub fn count_leaves(&self) -> u32 {
match self {
BST::Node { left, right, .. } => 1 + left.count_leaves() + right.count_leaves(),
BST::Empty => 0,
}
}
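One caveat: as written, the function above adds 1 for every non-empty node, so it returns the number of nodes. If a leaf means a node with no children, as the question's edit says, the children still have to be inspected somewhere. A sketch that does this without the &** dance, relying on auto-deref through the Box, could look like this; the helper is_empty is a name made up for this illustration.

```rust
pub enum BST<T: Ord> {
    Node {
        value: T,
        left: Box<BST<T>>,
        right: Box<BST<T>>,
    },
    Empty,
}

impl<T: Ord> BST<T> {
    /// Hypothetical helper: true for the Empty variant.
    pub fn is_empty(&self) -> bool {
        matches!(self, BST::Empty)
    }

    /// Counts nodes whose children are both Empty.
    pub fn count_leaves(&self) -> u32 {
        match self {
            BST::Empty => 0,
            // Method calls auto-deref through the Box, so no explicit `**` is needed.
            BST::Node { left, right, .. } if left.is_empty() && right.is_empty() => 1,
            BST::Node { left, right, .. } => left.count_leaves() + right.count_leaves(),
        }
    }
}

/// Hypothetical helper for building a childless node.
fn leaf(value: i32) -> BST<i32> {
    BST::Node {
        value,
        left: Box::new(BST::Empty),
        right: Box::new(BST::Empty),
    }
}
```

A root with two childless children then reports 2, and a single childless node reports 1.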
By manually checking whether both sides are empty before calling count_leaves on them, you might actually be decreasing performance. A recursive function call (or any function call, really) can be very cheap since your code is already at the processor. However, it takes a (very tiny) amount of time for a processor to read a value through a pointer, so ideally you only need to do it once per value. That said, the compiler is made of eldritch sorcery, so it will probably figure out the best way to optimize your code either way. Another option that may help is to add an #[inline] hint to the function to ask the compiler to unroll the recursive call one or more times if it thinks that would help performance.
You may find it helpful to change the structure of your BST. If you make your tree an enum, it needs to be matched every time you perform any operation on it.
use std::cmp::Ordering;
pub struct BST<T> {
left: Option<Box<BST<T>>>,
right: Option<Box<BST<T>>>,
data: T,
}
impl<T> BST<T> {
pub fn new_root(data: T) -> Self {
BST {
left: None,
right: None,
data,
}
}
pub fn count_leaves(&self) -> u64 {
let left_leaves = self.left.as_ref().map_or(0, |x| x.count_leaves());
let right_leaves = self.right.as_ref().map_or(0, |x| x.count_leaves());
left_leaves + right_leaves + 1
}
}
impl<T: Ord> BST<T> {
pub fn insert(&mut self, data: T) {
// Compare the new value against this node: smaller or equal goes left, greater goes right
let side = match data.cmp(&self.data) {
Ordering::Less | Ordering::Equal => &mut self.left,
Ordering::Greater => &mut self.right,
};
if let Some(node) = side {
node.insert(data);
} else {
*side = Some(Box::new(Self::new_root(data)));
}
}
}
Now this works well, but it also introduces a new problem that I'm guessing you were attempting to avoid with your solution: you can't create an empty BST<T>. This may make initializing your program difficult. We can fix this by using a small wrapper struct (e.g. pub struct BinarySearchTree<T>(Option<BST<T>>)). This is also what std::collections::LinkedList does. You may also be surprised to learn that this cuts our memory footprint in half compared to the original post. The reason is that Empty requires just as much space as Node, and every childless node still has to allocate two boxed Empty values, so the entire next layer of the tree gets allocated even though we don't use it.
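A minimal sketch of that wrapper, assuming the struct-based BST<T> from above: the methods new, insert, len and is_empty on BinarySearchTree, and the count_nodes helper, are illustrative choices of mine rather than a fixed API.

```rust
use std::cmp::Ordering;

pub struct BST<T> {
    left: Option<Box<BST<T>>>,
    right: Option<Box<BST<T>>>,
    data: T,
}

impl<T: Ord> BST<T> {
    fn new_root(data: T) -> Self {
        BST { left: None, right: None, data }
    }

    fn insert(&mut self, data: T) {
        // Smaller or equal values go left, greater values go right.
        let side = match data.cmp(&self.data) {
            Ordering::Less | Ordering::Equal => &mut self.left,
            Ordering::Greater => &mut self.right,
        };
        if let Some(node) = side {
            node.insert(data);
        } else {
            *side = Some(Box::new(Self::new_root(data)));
        }
    }

    /// Hypothetical helper: number of nodes in this subtree.
    fn count_nodes(&self) -> usize {
        1 + self.left.as_ref().map_or(0, |n| n.count_nodes())
            + self.right.as_ref().map_or(0, |n| n.count_nodes())
    }
}

/// The wrapper: `None` represents the empty tree.
pub struct BinarySearchTree<T>(Option<BST<T>>);

impl<T: Ord> BinarySearchTree<T> {
    pub fn new() -> Self {
        BinarySearchTree(None)
    }

    pub fn insert(&mut self, data: T) {
        if let Some(root) = &mut self.0 {
            root.insert(data);
        } else {
            // First insertion creates the root.
            self.0 = Some(BST::new_root(data));
        }
    }

    pub fn len(&self) -> usize {
        self.0.as_ref().map_or(0, |root| root.count_nodes())
    }

    pub fn is_empty(&self) -> bool {
        self.0.is_none()
    }
}
```

Callers then start from BinarySearchTree::new() and never have to special-case the first insertion themselves.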
I tried looking for how Rust implements count_ones(). I'm curious because it seems to vastly outperform my own naive implementation (no kidding), and I would really like to see why it is so performant. My guess is that Rust is using some asm to do the work. For completeness, here was my attempt:
/*
* my attempt to implement count_ones for i32 types
* but this is much slower than the default
* implementation.
*/
fn count_ones(num: i32) -> u32 {
let mut ans: u32 = 0;
let mut _num = num;
while _num > 0 {
if _num & 0x1 == 0x1 {
ans += 1;
}
_num >>= 1;
}
ans
}
I found this on the rust repo, but I can't make sense of it (still new to Rust!) (reproduced below).
#[inline]
fn count_ones(self) -> u32 {
unsafe { $ctpop(self as $ActualT) as u32 }
}
Let's follow the code step-by-step.
First, looking at the snippet you've posted: it contains several macro variables (identifiers with a dollar sign prepended), so we can assume this code is, in fact, part of a macro definition. Scrolling up, we get the following:
macro_rules! uint_impl {
($T:ty = $ActualT:ty, $BITS:expr,
$ctpop:path,
$ctlz:path,
$cttz:path,
$bswap:path,
$add_with_overflow:path,
$sub_with_overflow:path,
$mul_with_overflow:path) => {
#[stable(feature = "rust1", since = "1.0.0")]
#[allow(deprecated)]
impl Int for $T {
// skipped
}
}
}
Now, to see what the variable values are here, we should find where this macro is invoked. In general, this might be hard due to the macro scoping rules, but here we'll just search the same file, and here it is:
uint_impl! { u8 = u8, 8,
intrinsics::ctpop8,
intrinsics::ctlz8,
intrinsics::cttz8,
bswap8,
intrinsics::u8_add_with_overflow,
intrinsics::u8_sub_with_overflow,
intrinsics::u8_mul_with_overflow }
(and several other invocations). Comparing this with the macro definition, we see that the function we're looking for will be expanded to the following:
#[inline]
fn count_ones(self) -> u32 {
unsafe { intrinsics::ctpop8(self as u8) as u32 }
}
And, finally, intrinsics::ctpop8 is, as Stargateur mentioned in a comment, an LLVM intrinsic, i.e. this call is directly converted into an LLVM instruction.
However, there's a slightly better way to find out what is what.
Let's now look for the function we're interested in in the std documentation. Searching for count_ones brings up a bunch of functions, one for each primitive number type; we'll take a look at the implementation for u8. Clicking the src link on the function brings us to the code:
doc_comment! {
concat!("Returns the number of ones in the binary representation of `self`.
# Examples
Basic usage:
```
", $Feature, "let n = 0b01001100", stringify!($SelfT), ";
assert_eq!(n.count_ones(), 3);", $EndFeature, "
```"),
#[stable(feature = "rust1", since = "1.0.0")]
#[rustc_const_stable(feature = "const_math", since = "1.32.0")]
#[inline]
pub const fn count_ones(self) -> u32 {
intrinsics::ctpop(self as $ActualT) as u32
}
}
...which just directly calls the intrinsics::ctpop function we've found before.
Now you might wonder why these two searches yielded different pieces of code. The reason is simple: the commit you're referring to is from a fairly old version of rustc (pre-1.0, if I understand correctly); at that time, numerical operations were implemented as part of the Num trait, not directly on primitive types. If you check out the implementation for version 1.44.1, which is the current one at the time of writing, you'll see the same code I've quoted above from the docs.
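If you want to sanity-check a hand-written loop against the built-in, a sketch like the following can help. It is not how the standard library does it (that is the intrinsic shown above); it is just a portable fallback using the classic clear-the-lowest-set-bit trick, and the name count_ones_loop is made up. Casting to u32 first is my own adjustment so that negative inputs are handled too, which a `while num > 0` loop would skip.

```rust
/// Hand-written popcount for i32, for comparison with the built-in `count_ones`.
fn count_ones_loop(num: i32) -> u32 {
    // Reinterpret the bits as unsigned so the loop also terminates
    // (and counts the sign bit) for negative inputs.
    let mut bits = num as u32;
    let mut ans = 0;
    while bits != 0 {
        // Clears the lowest set bit, so the loop runs once per 1-bit.
        bits &= bits - 1;
        ans += 1;
    }
    ans
}
```

Comparing count_ones_loop(n) with n.count_ones() over a handful of values, including negative ones and i32::MIN, is a quick way to confirm that both agree.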
I am a beginner, so bear with me.
Given the following enum definition for a BST.
enum Tree {
Node(i64, Box<Tree>, Box<Tree>),
Nil,
}
I want to implement Tree::insert. Here is what I have so far. (Get ready for spaghetti).
impl Tree {
fn new() -> Tree {
Nil
}
fn insert(self: &mut Tree, new_val: i64) -> Tree {
let mut temp = self;
loop {
match *temp {
Node(val, ref mut left, ref mut right) => /* go left if smaller, else go right */,
Nil => { break; },
}
}
/* create new node and have the left or right pointer of the leaf point to it */
return *self;
}
I understand how I would implement this with functional style, where I construct a new tree each call with the left and right subtree being the result of the recursive call to insert on the appropriate subtree. I also think I could implement this more easily with a struct. But I want to use this mix of styles.
Onto what is confusing me...
Conceptually, I want to have a reference to the root, and then keep re-pointing this reference at the left or right subtree, as I would if I wrote this in C++. But because of borrowing semantics, I can't seem to navigate to the left and right subtree by making a copy of a reference. Additionally, Rust doesn't seem to give me access to pointers to the same degree that C/C++ do, and I am generally confused as heck.
Any tips regarding what I need to do to make this piece of code work without outright giving up and moving to functional style or making a struct? Thanks in advance.
Maybe something along the lines of the following?
fn insert(self: &mut Tree, new_val: i64) {
let mut temp: &mut Tree = self;
loop {
temp = match temp {
Tree::Node(val, ref mut left, ref mut right) => {
if new_val < *val {
left.as_mut()
} else {
right.as_mut()
}
},
Tree::Nil => { break; },
}
}
*temp = Tree::Node(new_val, Box::new(Tree::Nil), Box::new(Tree::Nil))
}
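To check that this cursor-style loop really builds a search tree, here is a self-contained sketch with an in-order traversal added. The in_order method is my own addition for illustration, and the patterns are written without ref mut, relying on default binding modes instead.

```rust
enum Tree {
    Node(i64, Box<Tree>, Box<Tree>),
    Nil,
}

impl Tree {
    fn new() -> Tree {
        Tree::Nil
    }

    fn insert(&mut self, new_val: i64) {
        // `temp` is a cursor that walks down until it points at a Nil slot.
        let mut temp: &mut Tree = self;
        loop {
            temp = match temp {
                Tree::Node(val, left, right) => {
                    if new_val < *val {
                        left.as_mut()
                    } else {
                        right.as_mut()
                    }
                }
                Tree::Nil => break,
            };
        }
        // Overwrite the Nil slot in place with the new node.
        *temp = Tree::Node(new_val, Box::new(Tree::Nil), Box::new(Tree::Nil));
    }

    /// Hypothetical helper: appends the values in ascending order to `out`.
    fn in_order(&self, out: &mut Vec<i64>) {
        if let Tree::Node(val, left, right) = self {
            left.in_order(out);
            out.push(*val);
            right.in_order(out);
        }
    }
}
```

Inserting a few values in arbitrary order and then calling in_order should yield them sorted, which is a cheap way to confirm that every insert landed in the right slot.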