I would like to create a struct using the builder pattern which must be validated before construction, and I would like to minimize the construction overhead.
I've come up with a nice way to do that using std::mem::transmute, but I'm far from confident that this approach is really safe, or that it's the best approach.
Here's my code: (Rust Playground)
#[derive(Debug)]
pub struct ValidStruct {
items: Vec<ValidStruct>
}
#[derive(Debug)]
pub struct Builder {
pub items: Vec<Builder>
}
#[derive(Debug)]
pub struct InvalidStructError {}
impl Builder {
pub fn new() -> Self {
Self { items: vec![] }
}
pub fn is_valid(&self) -> bool {
self.items.len() % 2 == 1
}
pub fn build(self) -> Result<ValidStruct, InvalidStructError> {
if !self.is_valid() {
return Err(InvalidStructError {});
}
unsafe {
Ok(std::mem::transmute::<Builder, ValidStruct>(self))
}
}
}
fn main() {
let mut builder = Builder::new();
builder.items.push(Builder::new());
let my_struct = builder.build().unwrap();
println!("{:?}", my_struct)
}
So, this seems to work. I think it should be safe because I know the two structs are identical. Am I missing anything? Could this actually cause problems somehow, or is there a cleaner/better approach available?
You can't normally transmute between different structures just because they seem to have the same fields in the same order, because the compiler might change that. You can avoid the risk by forcing the memory layout but you're then fighting the compiler and preventing optimizations. This approach isn't usually recommended and is, in my opinion, not needed here.
What you want is to have
a recursive data structure with public fields so that you can easily build it
an identical structure, built from the first one but with no public access and only built after validation of the first one
And you want to avoid useless copies for performance reasons.
What I suggest is to have a wrapper class. This makes sense because wrapping a struct in another one is totally costless in Rust.
You could thus have
/// This is the "Builder" struct
pub struct Data {
pub items: Vec<Data>,
}
pub struct ValidStruct {
data: Data, // no public access here
}
impl Data {
pub fn build(self) -> Result<ValidStruct, InvalidStructError> {
if !self.is_valid() {
return Err(InvalidStructError {});
}
Ok(Self{ data })
}
}
(alternatively, you could declare a struct Builder as a wrapper of Data too but with a public access to its field)
the following example illustrates what i am trying to do:
use rayon::prelude::*;
struct Parent {
children: Vec<Child>,
}
struct Child {
value: f64,
index: usize,
//will keep the distances to other children in the children vactor of parent
distances: Vec<f64>,
}
impl Parent {
fn calculate_distances(&mut self) {
self.children
.par_iter_mut()
.for_each(|x| x.calculate_distances(&self.children));
}
}
impl Child {
fn calculate_distances(&mut self, children: &[Child]) {
children
.iter()
.enumerate()
.for_each(|(i, x)| self.distances[i] = (self.value - x.value).abs());
}
}
The above won't compile. The problem is, that i can't access &self.children in the closure of the first for_each. I do understand, why the borrow checker does not allow that, so my question is, if there is a way to make it work with little changes. The solutions i found so far are not really satisfying. One solution would be to just clone children at the beginning of Parent::calculate distances and use that inside the closure(which adds an unnecessary clone). The other solution would be to extract the value field of Child like that:
use rayon::prelude::*;
struct Parent {
children: Vec<Child>,
values: Vec<f64>
}
struct Child {
index: usize,
//will keep the distances to other children in the children vactor of parent
distances: Vec<f64>,
}
impl Parent {
fn calculate_distances(&mut self) {
let values = &self.values;
self.children
.par_iter_mut()
.for_each(|x| x.calculate_distances(values));
}
}
impl Child {
fn calculate_distances(&mut self, values: &[f64]) {
for i in 0..values.len(){
self.distances[i]= (values[self.index]-values[i]).abs();
}
}
}
while this would be efficient it totally messes up my real code, and value conceptually just really belongs to Child. I am relatively new to rust, and just asked myself if there is any nice way of doing this. As far as i understand there would need to be a way to tell the compiler, that i only change the distances field in the parallel iterator, while the value field stays constant. Maybe this is a place to use unsafe? Anyways, i would really appreciate if you could hint me in the right direction, or at least confirm that my code really has to become so much messier to make it work:)
Rust tries hard to prevent you from doing precisely what you want to do: retain access to the whole collection while modifying it. If you're unwilling to adjust the layout of your data to accommodate the borrow checker, you can use interior mutability to make Child::calculate_distances take &self rather than &mut self. Then your problem goes away because it's perfectly fine to hand out multiple shared references to self.children.
Ideally you'd use a RefCell because you don't access the same Child from multiple threads. But Rust doesn't allow that because, based on the signatures of the functions involved, you could do so, which would be a data race. Declaring distances: RefCell<Vec<f64>> makes Child no longer Sync, removing access to Vec<Child>::par_iter().
What you can do is use a Mutex. Although it feels initially wasteful, have in mind that each call to Child::calculate_distances() receives a different Child, so the mutex will always be uncontended and therefore cheap to lock (not involve a system call). And you'd lock it only once per Child::calculate_distances(), not on every access to the array. The code would look like this (playground):
use rayon::prelude::*;
use std::sync::Mutex;
struct Parent {
children: Vec<Child>,
}
struct Child {
value: f64,
index: usize,
//will keep the distances to other children in the children vactor of parent
distances: Mutex<Vec<f64>>,
}
impl Parent {
fn calculate_distances(&mut self) {
self.children
.par_iter()
.for_each(|x| x.calculate_distances(&self.children));
}
}
impl Child {
fn calculate_distances(&self, children: &[Child]) {
let mut distances = self.distances.lock().unwrap();
children
.iter()
.enumerate()
.for_each(|(i, x)| distances[i] = (self.value - x.value).abs());
}
}
You can also try replacing std::sync::Mutex with parking_lot::Mutex which is smaller (only one byte of overhead, no allocation), faster, and doesn't require the unwrap() because it doesn't do lock poisoning.
I'm relatively new to Rust and I'm exercising with macros.
The goal is to archive some sort if React-ish syntax within Rust lang.
I’m aware of the drastic DSL approach like jsx!(<div />), which is a good solution but of another topic. I’m seeking something in the middle, more like Flutter or SwiftUI syntax, which leverages the native language feature as much as possible, but still presents a visually hierarchical code structure.
Here I'm exploring the idea of constructing a View struct using a children! macro.
My main intension here is to enable View.children field to hold an arbitrary arity tuple. Since variadic generic tuple is not available in Rust yet, macro seems to be the only option.
See the code and comment:
struct View<T> {
// children field holds arbitrary nullable data
// but what i really want is Variadic Tuple, sadly not available in rust
children: Option<T>,
}
// `children!` macro is a shorthand for Some(VariadicTuple)
// so to avoid the ugly `Some((...))` double parens syntax.
macro_rules! children {
($($x:expr),+ $(,)?) => (
Some(($($x),*))
);
}
fn main() {
let _view = View {
children: children!(
42,
"Im A String",
View {
children: Some(("foo", "bar")),
},
),
};
}
So far so good. Now there's this final part I really want to further optimize:
View { children: children!(...) }
Is it possible to call a macro right inside the struct body, (instead of at the value position after a field: syntax) so to eliminate the need to write children: field redundantly?
Yes and no. You can not create something exactly as you describe but you can do something very very similar.
You can implement a new function for the View struct like so:
struct View<T> {
// children field holds arbitrary Option<data>.
children: Option<T>,
}
impl<T> View<T> {
pub fn new() -> Self {
Self {
children: children!(
42,
"Im A String",
View {
children: Some(("foo", "bar")),
},
)
}
}
}
// `children!` macro is a shorthand for Some(VariadicTuple)
// so to avoid the ugly `Some((...))` double parens syntax.
macro_rules! children {
($($x:expr),+ $(,)?) => (
Some(($($x),*))
);
}
And then to create a view object you would do this:
fn main() {
let _view1 = View::new();
let _view2 = View::new();
let _view3 = View::new();
// _view1, _view2, and _view3 are *exactly* the same
}
Or if you want to add arguments, something like this with some modifications for your use case:
struct View<T> {
// children field holds arbitrary Option<data>.
children: Option<T>,
}
impl<T> View<T> {
pub fn new(age: i32, string: &str) -> Self {
Self {
children: children!(
age,
string,
View {
children: Some(("foo", "bar")),
},
)
}
}
}
// `children!` macro is a shorthand for Some(VariadicTuple)
// so to avoid the ugly `Some((...))` double parens syntax.
macro_rules! children {
($($x:expr),+ $(,)?) => (
Some(($($x),*))
);
}
And then implementing it:
fn main() {
let _view1 = View::new(42, "Im A String");
let _view2 = View::new(35, "Im another string");
let _view3 = View::new(12, "This is yet another string");
// _view1, _view2, and _view3 are *not* the same because they have differing "age" and "string" attributes.
}
I am here to help if this is not what you are looking for or if you want more help.
In Chapter 4 of "Programming Rust" by Jim Blandy & Jason Orendorff it says,
It follows that the owners and their owned values form trees: your owner is your parent, and the values you own are your children. And at the ultimate root of each tree is a variable; when that variable goes out of scope, the entire tree goes with it. We can see such an ownership tree in the diagram for composers: it’s not a “tree” in the sense of a search tree data structure, or an HTML document made from DOM elements. Rather, we have a tree built from a mixture of types, with Rust’s single-owner rule forbidding any rejoining of structure that could make the arrangement more complex than a tree. Every value in a Rust program is a member of some tree, rooted in some variable.
An example is provided,
This is simplified and pretty, but is there any mechanism to generate an "ownership tree" visualization with Rust or the Rust tooling? Can I dump an ownership tree when debugging?
There isn't really a specific tool for it, but you can get pretty close by deriving the Debug trait. When you derive the Debug trait for a struct, it will give you a recursive representation of all owned data, terminating at primitive types such as str, u32 etc, or when it encounters a custom Debug implementation.
For example, this program here:
use rand;
#[derive(Debug)]
enum State {
Good,
Bad,
Ugly(&'static str),
}
#[derive(Debug)]
struct ExampleStruct {
x_factor: Option<f32>,
children: Vec<ExampleStruct>,
state: State,
}
impl ExampleStruct {
fn random(max_depth: usize) -> Self {
use rand::Rng;
let mut rng = rand::thread_rng();
let child_count = match max_depth {
0 => 0,
_ => rng.gen::<usize>() % max_depth,
};
let mut children = Vec::with_capacity(child_count);
for _ in 0..child_count {
children.push(ExampleStruct::random(max_depth - 1));
}
let state = if rng.gen() {
State::Good
} else if rng.gen() {
State::Bad
} else {
State::Ugly("really ugly")
};
Self {
x_factor: Some(rng.gen()),
children,
state,
}
}
}
fn main() {
let foo = ExampleStruct::random(3);
dbg!(foo);
}
prints something like this:
[src/main.rs:51] foo = ExampleStruct {
x_factor: Some(
0.27388978,
),
children: [
ExampleStruct {
x_factor: Some(
0.5051847,
),
children: [
ExampleStruct {
x_factor: Some(
0.9675246,
),
children: [],
state: Ugly(
"really ugly",
),
},
],
state: Bad,
},
ExampleStruct {
x_factor: Some(
0.70672345,
),
children: [],
state: Ugly(
"really ugly",
),
},
],
state: Bad,
}
Note that not all the data is in line: The children live somewhere else on the heap. They are not stored inside the ExampleStruct, they are simply owned by it.
This could get confusing if you store references to things, because Debug may start to traverse these references. It does not matter to Debug that they aren't owned. In fact, this is the case with the &'static str inside State::Ugly. The actual bytes that make up the string are not owned by any variable, they are hard coded and live inside the program itself. They will exist for as long as the program is running.
I am trying to implement a scenegraph-like data structure in Rust. I would like an equivalent to this C++ code expressed in safe Rust:
struct Node
{
Node* parent; // should be mutable, and nullable (no parent)
std::vector<Node*> child;
virtual ~Node()
{
for(auto it = child.begin(); it != child.end(); ++it)
{
delete *it;
}
}
void addNode(Node* newNode)
{
if (newNode->parent)
{
newNode->parent.child.erase(newNode->parent.child.find(newNode));
}
newNode->parent = this;
child.push_back(newNode);
}
}
Properties I want:
the parent takes ownership of its children
the nodes are accessible from the outside via a reference of some kind
events that touch one Node can potentially mutate the whole tree
Rust tries to ensure memory safety by forbidding you from doing things that might potentially be unsafe. Since this analysis is performed at compile-time, the compiler can only reason about a subset of manipulations that are known to be safe.
In Rust, you could easily store either a reference to the parent (by borrowing the parent, thus preventing mutation) or the list of child nodes (by owning them, which gives you more freedom), but not both (without using unsafe). This is especially problematic for your implementation of addNode, which requires mutable access to the given node's parent. However, if you store the parent pointer as a mutable reference, then, since only a single mutable reference to a particular object may be usable at a time, the only way to access the parent would be through a child node, and you'd only be able to have a single child node, otherwise you'd have two mutable references to the same parent node.
If you want to avoid unsafe code, there are many alternatives, but they'll all require some sacrifices.
The easiest solution is to simply remove the parent field. We can define a separate data structure to remember the parent of a node while we traverse a tree, rather than storing it in the node itself.
First, let's define Node:
#[derive(Debug)]
struct Node<T> {
data: T,
children: Vec<Node<T>>,
}
impl<T> Node<T> {
fn new(data: T) -> Node<T> {
Node { data: data, children: vec![] }
}
fn add_child(&mut self, child: Node<T>) {
self.children.push(child);
}
}
(I added a data field because a tree isn't super useful without data at the nodes!)
Let's now define another struct to track the parent as we navigate:
#[derive(Debug)]
struct NavigableNode<'a, T: 'a> {
node: &'a Node<T>,
parent: Option<&'a NavigableNode<'a, T>>,
}
impl<'a, T> NavigableNode<'a, T> {
fn child(&self, index: usize) -> NavigableNode<T> {
NavigableNode {
node: &self.node.children[index],
parent: Some(self)
}
}
}
impl<T> Node<T> {
fn navigate<'a>(&'a self) -> NavigableNode<T> {
NavigableNode { node: self, parent: None }
}
}
This solution works fine if you don't need to mutate the tree as you navigate it and you can keep the parent NavigableNode objects around (which works fine for a recursive algorithm, but doesn't work too well if you want to store a NavigableNode in some other data structure and keep it around). The second restriction can be alleviated by using something other than a borrowed pointer to store the parent; if you want maximum genericity, you can use the Borrow trait to allow direct values, borrowed pointers, Boxes, Rc's, etc.
Now, let's talk about zippers. In functional programming, zippers are used to "focus" on a particular element of a data structure (list, tree, map, etc.) so that accessing that element takes constant time, while still retaining all the data of that data structure. If you need to navigate your tree and mutate it during the navigation, while retaining the ability to navigate up the tree, then you could turn a tree into a zipper and perform the modifications through the zipper.
Here's how we could implement a zipper for the Node defined above:
#[derive(Debug)]
struct NodeZipper<T> {
node: Node<T>,
parent: Option<Box<NodeZipper<T>>>,
index_in_parent: usize,
}
impl<T> NodeZipper<T> {
fn child(mut self, index: usize) -> NodeZipper<T> {
// Remove the specified child from the node's children.
// A NodeZipper shouldn't let its users inspect its parent,
// since we mutate the parents
// to move the focused nodes out of their list of children.
// We use swap_remove() for efficiency.
let child = self.node.children.swap_remove(index);
// Return a new NodeZipper focused on the specified child.
NodeZipper {
node: child,
parent: Some(Box::new(self)),
index_in_parent: index,
}
}
fn parent(self) -> NodeZipper<T> {
// Destructure this NodeZipper
let NodeZipper { node, parent, index_in_parent } = self;
// Destructure the parent NodeZipper
let NodeZipper {
node: mut parent_node,
parent: parent_parent,
index_in_parent: parent_index_in_parent,
} = *parent.unwrap();
// Insert the node of this NodeZipper back in its parent.
// Since we used swap_remove() to remove the child,
// we need to do the opposite of that.
parent_node.children.push(node);
let len = parent_node.children.len();
parent_node.children.swap(index_in_parent, len - 1);
// Return a new NodeZipper focused on the parent.
NodeZipper {
node: parent_node,
parent: parent_parent,
index_in_parent: parent_index_in_parent,
}
}
fn finish(mut self) -> Node<T> {
while let Some(_) = self.parent {
self = self.parent();
}
self.node
}
}
impl<T> Node<T> {
fn zipper(self) -> NodeZipper<T> {
NodeZipper { node: self, parent: None, index_in_parent: 0 }
}
}
To use this zipper, you need to have ownership of the root node of the tree. By taking ownership of the nodes, the zipper can move things around in order to avoid copying or cloning nodes. When we move a zipper, we actually drop the old zipper and create a new one (though we could also do it by mutating self, but I thought it was clearer that way, plus it lets you chain method calls).
If the above options are not satisfactory, and you must absolutely store the parent of a node in a node, then the next best option is to use Rc<RefCell<Node<T>>> to refer to the parent and Weak<RefCell<Node<T>>> to the children. Rc enables shared ownership, but adds overhead to perform reference counting at runtime. RefCell enables interior mutability, but adds overhead to keep track of the active borrows at runtime. Weak is like Rc, but it doesn't increment the reference count; this is used to break reference cycles, which would prevent the reference count from dropping to zero, causing a memory leak. See DK.'s answer for an implementation using Rc, Weak and RefCell.
The problem is that this data structure is inherently unsafe; it doesn't have a direct equivalent in Rust that doesn't use unsafe. This is by design.
If you want to translate this into safe Rust code, you need to be more specific about what, exactly, you want from it. I know you listed some properties above, but often people coming to Rust will say "I want everything I have in this C/C++ code", to which the direct answer is "well, you can't."
You're also, unavoidably, going to have to change how you approach this. The example you've given has pointers without any ownership semantics, mutable aliasing, and cycles; all of which Rust will not allow you to simply ignore like C++ does.
The simplest solution is to just get rid of the parent pointer, and maintain that externally (like a filesystem path). This also plays nicely with borrowing because there are no cycles anywhere:
pub struct Node1 {
children: Vec<Node1>,
}
If you need parent pointers, you could go half-way and use Ids instead:
use std::collections::BTreeMap;
type Id = usize;
pub struct Tree {
descendants: BTreeMap<Id, Node2>,
root: Option<Id>,
}
pub struct Node2 {
parent: Id,
children: Vec<Id>,
}
The BTreeMap is effectively your "address space", bypassing borrowing and aliasing issues by not directly using memory addresses.
Of course, this introduces the problem of a given Id not being tied to the particular tree, meaning that the node it belongs to could be destroyed, and now you have what is effectively a dangling pointer. But, that's the price you pay for having aliasing and mutation. It's also less direct.
Or, you could go whole-hog and use reference-counting and dynamic borrow checking:
use std::cell::RefCell;
use std::rc::{Rc, Weak};
// Note: do not derive Clone to make this move-only.
pub struct Node3(Rc<RefCell<Node3_>>);
pub type WeakNode3 = Weak<RefCell<Node3>>;
pub struct Node3_ {
parent: Option<WeakNode3>,
children: Vec<Node3>,
}
impl Node3 {
pub fn add(&self, node: Node3) {
// No need to remove from old parent; move semantics mean that must have
// already been done.
(node.0).borrow_mut().parent = Some(Rc::downgrade(&self.0));
self.children.push(node);
}
}
Here, you'd use Node3 to transfer ownership of a node between parts of the tree, and WeakNode3 for external references. Or, you could make Node3 cloneable and add back the logic in add to make sure a given node doesn't accidentally stay a child of the wrong parent.
This is not strictly better than the second option, because this design absolutely cannot benefit from static borrow-checking. The second one can at least prevent you from mutating the graph from two places at once at compile time; here, if that happens, you'll just crash.
The point is: you can't just have everything. You have to decide which operations you actually need to support. At that point, it's usually just a case of picking the types that give you the necessary properties.
In certain cases, you can also use an arena. An arena guarantees that values stored in it will have the same lifetime as the arena itself. This means that adding more values will not invalidate any existing lifetimes, but moving the arena will. Thus, such a solution is not viable if you need to return the tree.
This solves the problem by removing the ownership from the nodes themselves.
Here's an example that also uses interior mutability to allow a node to be mutated after it is created. In other cases, you can remove this mutability if the tree is constructed once and then simply navigated.
use std::{
cell::{Cell, RefCell},
fmt,
};
use typed_arena::Arena; // 1.6.1
struct Tree<'a, T: 'a> {
nodes: Arena<Node<'a, T>>,
}
impl<'a, T> Tree<'a, T> {
fn new() -> Tree<'a, T> {
Self {
nodes: Arena::new(),
}
}
fn new_node(&'a self, data: T) -> &'a mut Node<'a, T> {
self.nodes.alloc(Node {
data,
tree: self,
parent: Cell::new(None),
children: RefCell::new(Vec::new()),
})
}
}
struct Node<'a, T: 'a> {
data: T,
tree: &'a Tree<'a, T>,
parent: Cell<Option<&'a Node<'a, T>>>,
children: RefCell<Vec<&'a Node<'a, T>>>,
}
impl<'a, T> Node<'a, T> {
fn add_node(&'a self, data: T) -> &'a Node<'a, T> {
let child = self.tree.new_node(data);
child.parent.set(Some(self));
self.children.borrow_mut().push(child);
child
}
}
impl<'a, T> fmt::Debug for Node<'a, T>
where
T: fmt::Debug,
{
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{:?}", self.data)?;
write!(f, " (")?;
for c in self.children.borrow().iter() {
write!(f, "{:?}, ", c)?;
}
write!(f, ")")
}
}
fn main() {
let tree = Tree::new();
let head = tree.new_node(1);
let _left = head.add_node(2);
let _right = head.add_node(3);
println!("{:?}", head); // 1 (2 (), 3 (), )
}
TL;DR: DK.'s second version doesn't compile because parent has another type than self.0, fix it by converting it to a WeakNode. Also, in the line directly below, "self" doesn't have a "children" attribute but self.0 has.
I corrected the version of DK. so it compiles and works. Here is my Code:
dk_tree.rs
use std::cell::RefCell;
use std::rc::{Rc, Weak};
// Note: do not derive Clone to make this move-only.
pub struct Node(Rc<RefCell<Node_>>);
pub struct WeakNode(Weak<RefCell<Node_>>);
struct Node_ {
parent: Option<WeakNode>,
children: Vec<Node>,
}
impl Node {
pub fn new() -> Self {
Node(Rc::new(RefCell::new(Node_ {
parent: None,
children: Vec::new(),
})))
}
pub fn add(&self, node: Node) {
// No need to remove from old parent; move semantics mean that must have
// already been done.
node.0.borrow_mut().parent = Some(WeakNode(Rc::downgrade(&self.0)));
self.0.borrow_mut().children.push(node);
}
// just to have something visually printed
pub fn to_str(&self) -> String {
let mut result_string = "[".to_string();
for child in self.0.borrow().children.iter() {
result_string.push_str(&format!("{},", child.to_str()));
}
result_string += "]";
result_string
}
}
and then the main function in main.rs:
mod dk_tree;
use crate::dk_tree::{Node};
fn main() {
let root = Node::new();
root.add(Node::new());
root.add(Node::new());
let inner_child = Node::new();
inner_child.add(Node::new());
inner_child.add(Node::new());
root.add(inner_child);
let result = root.to_str();
println!("{result:?}");
}
The reason I made the WeakNode be more like the Node is to have an easier conversion between the both