I'm developing a basic implementation of a binary search tree in Rust. While writing a method for counting leaves, I ended up with some very strange-looking code to get it to work. I wanted to clarify whether the way I did it is:
Considered appropriate by Rust standards/convention
Efficient
I'm using an enum that differentiates between a node and nothing being present:
pub enum BST<T: Ord> {
Node {
value: T, // template with type T
left: Box<BST<T>>,
right: Box<BST<T>>,
},
Empty,
}
Now, count_leaves(&self) first checks whether self is a Node or Empty. If it's Empty, I can just return 0, but if it's a valid Node then I need to check whether the left and right children are Empty. If so, I can return 1 because I'm at a leaf.
pub fn count_leaves(&self) -> u32 {
match self {
BST::Node {
value: _,
ref left,
ref right,
} => {
match (&**left, &**right) {
(BST::Empty, BST::Empty) => 1,
_ => {
left.count_leaves() + right.count_leaves()
}
}
},
BST::Empty => 0
}
}
So, to check if both left and right are BST::Empty, I wanted to use a tuple! But in doing so, Rust tries to move both left and right into the tuple. Since my type BST<T> does not implement the Copy trait, this is not possible. Also, since left and right are both borrowed boxes, something as simple as this is not possible:
match (left, right) {
BST::Empty => {},
_ => {}
}
In order to use this tuple, it looks like I need to first dereference the borrow to get at the Box using *, then dereference the Box again into its contents using a second *, and then finally borrow using & to avoid a move. This gives the weird looking (&**left, &**right).
From my testing this works, but I thought it looked really strange. Should I rewrite this in a more readable way (if there is one)?
I've considered using Option<> instead of the enum with the Node and Empty, but I wasn't sure if that would lead to anything more readable or more efficient.
Thanks!
EDIT:
Just wanted to clarify that when I say leaves I mean a node in the tree with no children, not a non-empty node.
You're overthinking the dereferencing. When possible you want to ignore the boxes in favor of implicitly using Deref to perform operations on them; as_ref() gives you the same &BST<T> as &** without the noise:
pub fn count_leaves(&self) -> u32 {
    match self {
        BST::Node { left, right, .. } => match (left.as_ref(), right.as_ref()) {
            (BST::Empty, BST::Empty) => 1,
            _ => left.count_leaves() + right.count_leaves(),
        },
        BST::Empty => 0,
    }
}
Note that by manually checking whether both sides are empty before calling count_leaves on them, you might be giving up a tiny bit of performance. A recursive function call (or any function call, really) can be very cheap since the code is already at the processor, whereas it takes (a very tiny amount of) time for the processor to read a value through a pointer, so ideally you only do that once per value. However, the compiler is made of eldritch sorcery, so it will probably figure out the best way to optimize your code either way. Another option which may help is to add an #[inline] hint to the function to ask the compiler to inline, and thereby effectively unroll, the recursive call one or more times if it thinks that would help performance.
You may find it helpful to change the structure of your BST. Because your tree is an enum, it needs to be matched every time you perform any operation on it.
pub struct BST<T> {
left: Option<Box<BST<T>>>,
right: Option<Box<BST<T>>>,
data: T,
}
impl<T> BST<T> {
pub fn new_root(data: T) -> Self {
BST {
left: None,
right: None,
data,
}
}
pub fn count_leaves(&self) -> u64 {
    if self.left.is_none() && self.right.is_none() {
        return 1;
    }
    let left_leaves = self.left.as_ref().map_or(0, |x| x.count_leaves());
    let right_leaves = self.right.as_ref().map_or(0, |x| x.count_leaves());
    left_leaves + right_leaves
}
}
use std::cmp::Ordering;

impl<T: Ord> BST<T> {
    pub fn insert(&mut self, data: T) {
        let side = match data.cmp(&self.data) {
            Ordering::Less | Ordering::Equal => &mut self.left,
            Ordering::Greater => &mut self.right,
        };
        if let Some(node) = side {
            node.insert(data);
        } else {
            *side = Some(Box::new(Self::new_root(data)));
        }
    }
}
Now this works well, but it also introduces a new problem that I'm guessing you were attempting to avoid with your solution: you can't create an empty BST<T>. This may make initializing your program difficult. We can fix this with a small wrapper struct (e.g. pub struct BinarySearchTree<T>(Option<BST<T>>)), which is also what std::collections::LinkedList does. You may also be surprised to learn that this roughly cuts the memory footprint in half compared to the original post: there, Empty requires just as much space as Node, so every leaf allocates an entire extra layer of empty nodes that it never uses.
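A minimal sketch of that wrapper might look like this (the BinarySearchTree name and its methods are just illustrative, reusing the BST from above):

pub struct BinarySearchTree<T>(Option<BST<T>>);

impl<T: Ord> BinarySearchTree<T> {
    pub fn new() -> Self {
        // An empty tree is now representable without an Empty variant.
        BinarySearchTree(None)
    }

    pub fn insert(&mut self, data: T) {
        match &mut self.0 {
            Some(root) => root.insert(data),
            None => self.0 = Some(BST::new_root(data)),
        }
    }

    pub fn count_leaves(&self) -> u64 {
        self.0.as_ref().map_or(0, |root| root.count_leaves())
    }
}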
I am trying to figure out the equivalent of the typical setSibling C code exercise:
// Assume the tree is fully balanced, i.e. the lowest level is fully populated.
struct Node {
    Node * left;
    Node * right;
    Node * sibling;
};

void setSibling(Node * root) {
    if (!root) return;
    if (root->left) {
        root->left->sibling = root->right;
        if (root->sibling) root->right->sibling = root->sibling->left;
        setSibling(root->left);
        setSibling(root->right);
    }
}
Of course Rust is a different world, so I am forced to think about ownership now. My lousy attempt:
struct TreeNode<'a> {
left: Option<&'a TreeNode<'a>>,
right: Option<&'a TreeNode<'a>>,
sibling: Option<&'a TreeNode<'a>>,
value: String
}
fn BuildTreeNode<'a>(aLeft: Option<&'a TreeNode<'a>>, aRight: Option<&'a TreeNode<'a>>, aValue: String) -> TreeNode<'a> {
TreeNode {
left: aLeft,
right: aRight,
value: aValue,
sibling: None
}
}
fn SetSibling(node: &mut Option<&TreeNode>) {
match node {
Some(mut n) => {
match n.left {
Some(mut c) => {
//c*.sibling = n.right;
match n.sibling {
Some(s) => { n.right.unwrap().sibling = s.left },
None => {}
}
},
None => {}
}
},
None => return
}
}
What's the canonical way to represent graph nodes like these?
Question: what's the canonical way to represent graph nodes like these?
It seems like a typical case of "confused ownership" once you introduce the sibling link: with a strict tree you can have each parent own its children, but the sibling link means this is a graph, and a given node has multiple owners.
AFAIK there are two main ways to resolve this, at least in safe Rust:
reference counting and inner mutability: if each node is reference-counted, the sibling link can be a reference or weak reference with little trouble. The main drawbacks are that this requires inner mutability and that the navigation is gnarly, though a few utility methods can help.
"unfold" the graph into an array and use indices for your indirection through the tree (see the sketch below). The main drawback is that this requires either threading the array through, keeping a backreference to it (with inner mutability), or doing everything iteratively.
Both basically work around the ownership constraint, one by muddying the ownership itself, and the other by moving ownership to a higher power (the array).
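For completeness, the second option might look roughly like this (a rough sketch of my own; the layout and names are purely illustrative):

// All links are indices into one Vec that owns every node.
struct ArenaNode {
    left: Option<usize>,
    right: Option<usize>,
    sibling: Option<usize>,
    value: String,
}

struct Arena {
    nodes: Vec<ArenaNode>,
}

impl Arena {
    fn set_siblings(&mut self, idx: usize) {
        // Copy the indices out first so no borrow of self.nodes is held across the updates.
        let (left, right, sibling) = {
            let n = &self.nodes[idx];
            (n.left, n.right, n.sibling)
        };
        let Some(l) = left else { return };
        self.nodes[l].sibling = right;
        if let (Some(r), Some(s)) = (right, sibling) {
            let nephew = self.nodes[s].left;
            self.nodes[r].sibling = nephew;
        }
        self.set_siblings(l);
        if let Some(r) = right {
            self.set_siblings(r);
        }
    }
}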
This playground (https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=68d092d0d86dc32fe07902c832262ef4) seems to be more or less what you're looking for, using Rc and inner mutability:
use std::cell::RefCell;
use std::rc::{Rc, Weak};
#[derive(Default)]
pub struct TreeNode {
left: Option<Rc<TreeNode>>,
right: Option<Rc<TreeNode>>,
sibling: RefCell<Option<Weak<TreeNode>>>,
v: u8,
}
impl TreeNode {
pub fn new(v: u8) -> Rc<Self> {
Rc::new(TreeNode {
v,
..TreeNode::default()
})
}
pub fn new_with(left: Option<Rc<TreeNode>>, right: Option<Rc<TreeNode>>, v: u8) -> Rc<Self> {
Rc::new(TreeNode {
left,
right,
v,
sibling: RefCell::new(None),
})
}
pub fn set_siblings(self: &Rc<Self>) {
let Some(left) = self.left() else { return };
let right = self.right();
left.sibling.replace(right.map(Rc::downgrade));
if let Some(sibling) = self.sibling() {
// not sure this is correct, depending on construction, with
// 3 5
// \ \
// 2 4
// \/
// 1
// (2) has a sibling (4) but doesn't have a right node, so
// unconditionally setting right.sibling doesn't seem correct
right
.unwrap()
.sibling
.replace(sibling.left().map(Rc::downgrade));
}
left.set_siblings();
right.map(|r| r.set_siblings());
}
pub fn left(&self) -> Option<&Rc<Self>> {
self.left.as_ref()
}
pub fn right(&self) -> Option<&Rc<Self>> {
self.right.as_ref()
}
pub fn sibling(&self) -> Option<Rc<Self>> {
self.sibling.borrow().as_ref()?.upgrade()
}
}
fn main() {
let t = TreeNode::new_with(
TreeNode::new_with(TreeNode::new(1).into(), TreeNode::new(2).into(), 3).into(),
TreeNode::new(4).into(),
5,
);
t.set_siblings();
assert_eq!(t.left().and_then(|l| l.sibling()).unwrap().v, 4);
let ll = t.left().and_then(|l| l.left());
assert_eq!(ll.map(|ll| ll.v), Some(1));
ll.unwrap().sibling().unwrap();
assert_eq!(
t.left()
.and_then(|l| l.left())
.and_then(|ll| ll.sibling())
.unwrap()
.v,
2
);
}
Note that I assumed the tree is immutable once created; only the sibling links have to be generated post-facto, so I only added inner mutability for those. I also used weak pointers, which probably isn't necessary: if the tree is put in an inconsistent state it's not like that'll save anything. All it requires is a few upgrade() and downgrade() calls instead of clone(), though, so it's not a huge imposition.
That aside, there are lots of issues with your attempt:
having the same lifetime for your reference and your content is usually an error: the compiler will trust what you tell it, and that can have rather odd effects (e.g. telling the compiler that something gets borrowed forever)
SetSibling (incorrect naming convention, btw) taking an Option is... unnecessary: the function clearly expects to set a sibling, so just give it a node and remove the unnecessary outer layer of tests
match is nice when you need it; here you probably don't, if let would do the trick fine, especially since there is no else branch
Rust generally uses methods, and the method to create an instance (if one is needed) is idiomatically called new (at least when there's only one)
In the Rust project I am working on, I would like to keep the code as clean as I can, but I am having issues with side effects: more specifically, how to communicate whether they have been successful or not. Assume we have this enum and struct (the struct may contain more members not relevant to my question):
enum Value{
Mutable(i32),
Constant(i32),
}
struct SomeStruct {
// --snip--
value: Value
}
I would like to have a function to change value:
impl SomeStruct {
fn change_value(&mut self, new_value: i32) {
match self.value {
Value::Mutable(_) => self.value = Value::Mutable(new_value),
Value::Constant(_) => (), /* meeeep! do not do this :( */
}
}
}
I am now unsure how to cleanly handle the case where value is Value::Constant.
From what I've learned in C or C++ you would just return a bool and return true when value was successfully changed or false when it wasn't. This does not feel satisfying as the function signature alone would not make it clear what the bool was for. Also, it would make it optional for the caller of change_value to just ignore the return value and not handle the case where the side effect did not actually happen. I could, of course, adjust the function name to something like try_changing_value but that feels more like a band-aid than actually fixing the issue.
The only alternative I know would be to approach it more "functionally":
fn change_value(some_struct: SomeStruct, new_value: i32) -> Option<SomeStruct> {
match some_struct.value {
Value::Mutable(_) => {
let mut new_struct = /* copy other members */;
new_struct.value = Value::Mutable(new_value);
Some(new_struct)
},
Value::Constant(_) => None,
}
}
However, I imagine if SomeStruct is expensive to copy this is not very efficient. Is there another way to do this?
Finally, to give some more context as to how I got here: my actual code is about solving a sudoku, and I modelled a sudoku cell as either having a given (Value::Constant) value or a guessed (Value::Mutable) value. The "given" values are the ones you get at the start of the puzzle and the "guessed" values are the ones you fill in yourself. This means changing "given" values should not be allowed. If modelling it this way is the actual issue, I would love to hear your suggestions on how to do it differently!
The general pattern to indicate whether something was successful or not is to use Result:
struct CannotChangeValue;
impl SomeStruct {
fn change_value(&mut self, new_value: i32) -> Result<(), CannotChangeValue> {
match self.value {
Value::Mutable(_) => {
self.value = Value::Mutable(new_value);
Ok(())
}
Value::Constant(_) => Err(CannotChangeValue),
}
}
}
That way the caller can use the existing methods, syntax, and other patterns to decide how to deal with it: ignore it, log it, propagate it, do something else, etc. And the compiler will warn if the caller doesn't do something with the result (even if that something is to explicitly ignore it).
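A hypothetical call site, just to show the ergonomics (the update function and the values are made up):

fn update(s: &mut SomeStruct) {
    // Handle the failure inline...
    if s.change_value(42).is_err() {
        println!("value is constant, leaving it alone");
    }
    // ...or explicitly ignore it; the let _ = silences the unused-Result warning.
    let _ = s.change_value(7);
}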
If the API is designed to let callers determine exactly how to mutate the value, then you may want to return Option<&mut i32> instead to indicate: "I may or may not have a value that you can mutate, here it is." This also has a wealth of methods and tools available to handle it.
I think that Result fits your use-case better, but it just depends on the flexibility and level of abstraction that you're after.
To complement kmdreko's answer, this is the way you would implement a mut-ref-getter, which IMO is the simpler and more flexible approach:
enum Value {
Mutable(i32),
Constant(i32),
}
impl Value {
pub fn get_mut(&mut self) -> Option<&mut i32> {
match self {
Value::Mutable(v) => Some(v),
Value::Constant(_) => None,
}
}
}
Unlike the Result approach, this forces the caller to consider that setting the value may not be possible. (Granted, Result is a must_use type, so they'd get a warning if discarding it.)
You can write a proxy method on SomeStruct that forwards the invocation to this method:
impl SomeStruct {
pub fn get_mut_value(&mut self) -> Option<&mut i32> {
self.value.get_mut()
}
}
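Hypothetical usage (the update function is made up):

fn update(s: &mut SomeStruct) {
    if let Some(v) = s.get_mut_value() {
        *v = 42; // only reachable when the value is Value::Mutable
    }
}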
I am working on a system which produces and consumes large numbers of "events", they are a name with some small payload of data, and an attached function which is used as a kind of fold-left over the data, something like a reducer.
I receive from the upstream something like {t: 'fieldUpdated', p: {new: 'new field value'}}, and in my program I must associate the fieldUpdated "callback" function with the incoming event and apply it. There is a confirmation command I must echo back (which follows a programmatic naming convention), and each type is custom.
I tried using simple macros to do codegen for the structs and callbacks, and with the paste::paste! macro crate and the stringify! macro I made quite good progress.
Regrettably, however, I did not find a good way to metaprogram these into a list or map using macros. Extending an enum through macros doesn't seem to be possible, and solutions such as the use of ctors seem extremely hacky.
My ideal case is something like this:
type evPayload = {
new: String
}
let evHandler = fn(payload: evPayload) -> Result<(), Error> { Ok(()) }
// ...
let data = r#"{"t": "fieldUpdated", "p": {"new": "new field value"}}"#;
let v: Value = serde_json::from_str(data)?;
Given only knowledge of data, how can I use macros (the boilerplate is actually 2-3 types, 3 functions, and some factory and helper functions) in a way that lets me do a name-to-function lookup?
It seems like Serde's adjacently or internally tagged enum representations would get me there, if I could modify an enum in a macro: https://serde.rs/enum-representations.html#internally-tagged
It almost feels like I need a macro which can either maintain an enum, or I can "cheat" and use module scoped ctors to do a quasi-static initialization of the names and types into a map.
My program would have on the order of 40-100 of these, with anything from 3-10 in a module. I don't think ctors are necessarily a problem here, but the fact that they're a little grey area handshake, and that ctors might preclude one day being able to cross-compile to wasm put me off a little.
I actually had need of something similar today; the enum macro part specifically. But beware of my method: here be dragons!
Someone more experienced than me — and less mad — should probably vet this. Please do not assume my SAFETY comments to be correct.
Also, if you don't have variants that collide with Rust keywords, you might want to tear out the '_' prefix hack entirely. I used a static mut byte array for that purpose, as manipulating strings was an order of magnitude slower, but that was benchmarked in a simplified function. There are likely better ways of doing this.
Finally, I am using it where failing to parse must cause a panic, so error handling is somewhat limited.
With that being said, here's my current solution:
use std::fmt::Display;
use std::str::{from_utf8_unchecked, FromStr};

/// NOTE: It is **imperative** that this array is longer than the longest variant name + 1
static mut CHECK_BUFF: [u8; 32] = [b'_'; 32];
macro_rules! str_enums {
($enum:ident: $($variant:ident),* $(,)?) => {
#[allow(non_camel_case_types)]
#[derive(Debug, Default, Hash, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum $enum {
#[default]
UNINIT,
$($variant),*,
UNKNOWN
}
impl FromStr for $enum {
type Err = String;
fn from_str(s: &str) -> Result<Self, Self::Err> {
unsafe {
// SAFETY: Currently only single threaded
let len = s.len() + 1;
assert!(CHECK_BUFF.len() >= len);
CHECK_BUFF[1..len].copy_from_slice(s.as_bytes());
// SAFETY: Safe as long as CHECK_BUFF.len() >= s.len() + 1
match from_utf8_unchecked(&CHECK_BUFF[..len]) {
$(stringify!($variant) => Ok(Self::$variant),)*
_ => Err(format!(
"{} variant not accounted for: {s} ({},)",
stringify!($enum),
from_utf8_unchecked(&CHECK_BUFF[..len])
))
}
}
}
}
impl From<&$enum> for &'static str {
fn from(variant: &$enum) -> Self {
unsafe {
match variant {
// SAFETY: The first byte is always '_', and stripping it of should be safe.
$($enum::$variant => from_utf8_unchecked(&stringify!($variant).as_bytes()[1..]),)*
$enum::UNINIT => {
eprintln!("uninitialized {}!", stringify!($enum));
""
}
$enum::UNKNOWN => {
eprintln!("unknown {}!", stringify!($enum));
""
}
}
}
}
}
impl Display for $enum {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{}", Into::<&str>::into(self))
}
}
};
}
And then I call it like so:
str_enums!(
AttributeKind:
_alias,
_allowduplicate,
_altlen,
_api,
...
_enum,
_type,
_struct,
);
str_enums!(
MarkupKind:
_alias,
_apientry,
_command,
_commands,
...
);
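Hypothetical usage, assuming the AttributeKind invocation above (the macro adds the '_' prefix when parsing and strips it again when printing):

fn main() {
    let kind: AttributeKind = "alias".parse().unwrap(); // matches the _alias variant
    assert_eq!(kind.to_string(), "alias"); // Display strips the '_' prefix
}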
I am trying to convert Rc<Vec<F>> into Rc<Vec<T>>, where T and F are numeric types like u8, f32, f64, etc. As the vectors may be quite large, I would like to avoid copying them if F and T are the same type, but I have not managed to find out how to do that. Something like this -- it does not compile as the type comparison T == F is invalid:
fn convert_vec<F: num::NumCast + Copy, T: num::NumCast + Copy>(data: &[F], undef: T) -> Vec<T> {
data.iter()
.map(|v| match T::from(*v) {
Some(x) => x,
None => undef,
})
.collect()
}
fn convert_rc_vec<F: num::NumCast + Copy, T: num::NumCast + Copy>(
data: &Rc<Vec<F>>,
undef: T,
) -> anyhow::Result<Rc<Vec<T>>> {
if (T == F) { // invalid
Ok(data.clone()) // invalid
} else {
Ok(Rc::new(convert_vec(data, undef)))
}
}
The vector that I need to convert from is the response from a server which first sends the data type (something like "u8", "f32", "f64", ...) and then the actual data. At present, I store the vector with these data in an enum like
pub enum Values {
UInt8(Rc<Vec<u8>>),
Float32(Rc<Vec<f32>>),
Float64(Rc<Vec<f64>>),
// ...
}
At compile time, I do not know in which format the server will send the data, i.e. I do not know F in advance. I do know T in every case I use it, but T might be a different type depending on the use case.
Using specialized functions like convert_rc_vec_to_f32 it is easy to handle the case where clone() is best. But that requires a separate function for each T with almost identical text. I am trying to find a more elegant solution than writing a macro or more or less repeating the code 9 times.
You should not try to prevent your function from being monomorphized with T and F being the same type, or even change its behavior in that case. Instead, you should simply not call it in that case. This is possible because, if T and F were the same type, you would know it at compile time, so you could remove the conversion call altogether.
It seems that you are actually storing all these vectors in an enum, which means you only know the actual type at run-time. But this doesn't mean my suggestion doesn't apply: typically, if you wanted to get a Vec of f32, you could do something like
match data {
Float32(v) => v,
Float64(v) => convert_rc_vec(v),
UInt8(v) => convert_rc_vec(v),
...
}
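Concretely, that dispatch could look something like this (a sketch that reuses convert_vec and the Values enum from the question, assumes those three variants are the only ones, and picks f32::NAN as an arbitrary undef):

fn as_f32(values: &Values) -> Rc<Vec<f32>> {
    match values {
        // Same element type: just share the existing allocation.
        Values::Float32(v) => Rc::clone(v),
        // Different element type: convert into a new vector.
        Values::Float64(v) => Rc::new(convert_vec(v.as_slice(), f32::NAN)),
        Values::UInt8(v) => Rc::new(convert_vec(v.as_slice(), f32::NAN)),
    }
}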
If T and F both have a 'static lifetime, then you can use TypeId to compare the two types "at runtime":
if TypeId::of::<T>() == TypeId::of::<F>() {
Ok(data.clone()) // invalid
} else {
/* ... */
}
However, since this comparison happens "at runtime", the type system still doesn't know that T == F inside of this branch. You can use unsafe code to force this "conversion":
if TypeId::of::<T>() == TypeId::of::<F>() {
Ok(unsafe {
// SAFETY: this is sound because `T == F`, so we're
// just helping the compiler along here, with no actual
// type conversions
Rc::<Vec<T>>::from_raw(
Rc::<Vec<F>>::into_raw(data.clone()) as *const _
)
})
} else {
/* ... */
}
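Putting it together, the whole function might look roughly like this (a sketch: it keeps the question's num::NumCast bounds, adds the 'static bounds TypeId needs, and drops the anyhow::Result since nothing can fail here):

use std::any::TypeId;
use std::rc::Rc;

fn convert_rc_vec<F, T>(data: &Rc<Vec<F>>, undef: T) -> Rc<Vec<T>>
where
    F: num::NumCast + Copy + 'static,
    T: num::NumCast + Copy + 'static,
{
    if TypeId::of::<T>() == TypeId::of::<F>() {
        // SAFETY: T and F are the same type, so the pointer cast only renames
        // the type; no bytes are reinterpreted.
        unsafe { Rc::from_raw(Rc::into_raw(Rc::clone(data)) as *const Vec<T>) }
    } else {
        Rc::new(data.iter().map(|v| T::from(*v).unwrap_or(undef)).collect())
    }
}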
Sometimes I run into a problem where, due to implementation details that should be invisible to the user, I need to "destroy" a &mut and replace it in-memory. This typically ends up happening in recursive methods or IntoIterator implementations on recursive structures. It typically follows the form of:
fn create_something(self);
pub fn do_something(&mut self) {
// What you want to do
*self = self.create_something();
}
One example that I happened to have in my current project is in a KD Tree I've written, when I "remove" a node, instead of doing logic to rearrange the children, I just destructure the node I need to remove and rebuild it from the values in its subtrees:
// Some recursive checks above this to identify if this is our node
if let Node{point, left, right} = mem::replace(self, Sentinel) {
let points = left.into_iter().chain(right.into_iter()).collect();
(*self) = KDNode::new(points);
Some(point)
} else {
None
}
Another more in-depth example is the IntoIterator for this KDTree, which has to move a curr value out of the iterator, test it, and then replace it:
// temporarily swap self.curr with a dummy value so we can
// move out of it
let tmp = mem::replace(&mut self.curr, (Sentinel,Left));
match tmp {
// If the next node is a Sentinel, that means the
// "real" next node was either the parent, or we're done
(Sentinel,_) => {
if self.stack.is_empty() {
None
} else {
self.curr = self.stack.pop().expect("Could not pop iterator parent stack");
self.next()
}
}
// If the next node is to yield the current node,
// then the next node is its right child's leftmost
// descendant. We only "load" the right child, and lazily
// evaluate to its left child next iteration.
(Node{box right,point,..},Me) => {
self.curr = (right,Left);
Some(point)
},
// Left is an instruction to lazily find this node's left-most
// non-sentinel child, so we recurse down, pushing the parents on the
// stack as we go, and then say that our next node is our right child.
// If this child doesn't exist, then it will be taken care of by the Sentinel
// case next call.
(curr @ Node{..},Left) => {
let mut curr = curr;
let mut left = get_left(&mut curr);
while !left.is_sentinel() {
self.stack.push((curr,Me));
curr = left;
left = get_left(&mut curr);
}
let (right,point) = get_right_point(curr);
self.curr = (right, Left);
Some(point)
}
As you can see, my current method is to just use mem::replace with a dummy value, and then just overwrite the dummy value later. However, I don't like this for several reasons:
In some cases, there's no suitable dummy value. This is especially true if there's no public/easy way to construct a "zero value" for one or more of your struct members (e.g. what if the struct held a MutexGuard?). If the member you need to dummy-replace is in another module (or crate), you may be bound by difficult constraints on its construction that are undesirable when trying to build a dummy value.
The struct may be rather large, in which case doing more moves than is necessary may be undesirable (in practice, this is unlikely to be a big problem, admittedly).
It just "feels" unclean, since the "move" is technically more of an "update". In fact, the simplest example might be something like *self = self.next.do_something() which will still have problems.
In some cases, such as that first remove snippet I showed, you could perhaps more cleanly represent it as a fn do_something(self) -> Self, but in other cases such as the IntoIterator example this can't be done because you're constrained by the trait definition.
Is there any better, cleaner way to do this sort of in-place update?
In any case we'll need assignment, mem::replace, mem::swap, or something like that, because given a &mut reference to an object there is no way to move the object (or any of its fields) out without replacing its memory area with something valid, as long as Rust forbids references to uninitialized memory.
As for dummy values for replacement, you can always make them yourself for any type by using some wrapper type. For example, I often use Option for this purpose, where Some(T) is the value of type T, and None acts as dummy. This is what I mean:
struct Tree<T>(Option<Node<T>>);
enum Node<T> {
Leaf(T),
Children(Vec<Tree<T>>),
}
impl<T> Tree<T> where T: PartialEq {
fn remove(&mut self, value: &T) {
match self.0.take() {
Some(Node::Leaf(ref leaf_value)) if leaf_value == value =>
(),
node @ Some(Node::Leaf(..)) =>
*self = Tree(node),
Some(Node::Children(node_children)) => {
let children: Vec<_> =
node_children
.into_iter()
.filter_map(|mut tree| { tree.remove(value); tree.0 })
.map(|node| Tree(Some(node)))
.collect();
if !children.is_empty() {
*self = Tree(Some(Node::Children(children)));
}
},
None =>
panic!("something went wrong"),
}
}
}
playground link
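Hypothetical usage of that Option-as-dummy tree, just to illustrate the take-and-restore dance (the values are made up):

fn main() {
    let mut tree = Tree(Some(Node::Children(vec![
        Tree(Some(Node::Leaf(1))),
        Tree(Some(Node::Leaf(2))),
    ])));
    tree.remove(&1);
    // Leaf 1 is gone; the tree was rebuilt around the remaining leaf 2.
    assert!(matches!(&tree.0, Some(Node::Children(c)) if c.len() == 1));
}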