Share a reference variable in two collections - rust

I am trying to share a reference in two collections: a map and a list. I want to push to both of the collections and remove from the back of the list and remove from the map too. This code is just a sample to present what I want to do, it doesn't even compile!
What is the right paradigm to implement this? What kind of guarantees are the best practice?
use std::collections::{HashMap, LinkedList};
struct Data {
url: String,
access: u64,
}
struct Holder {
list: LinkedList<Box<Data>>,
map: HashMap<String, Box<Data>>,
}
impl Holder {
fn push(&mut self, d: Data) {
let boxed = Box::new(d);
self.list.push_back(boxed);
self.map.insert(d.url.to_owned(), boxed);
}
fn remove_last(&mut self) {
if let Some(v) = self.list.back() {
self.map.remove(&v.url);
}
self.list.pop_back();
}
}

The idea of storing a single item in multiple collections simultaneously is not new... and is not simple.
The common idea to sharing elements is either doing it unsafely (knowing how many collections there are) or doing it with a shared pointer.
There is a some inefficiency in just blindly using regular collections with a shared pointer:
Node based collections mean you have a double indirection
Switching from one view to the other require finding the element all over again
In Boost.MultiIndex (C++), the paradigm used is to create an intrusive node to wrap the value, and then link this node in the various views. The trick (and difficulty) is to craft the intrusive node in a way that allows getting to the surrounding elements to be able to "unlink" it in O(1) or O(log N).
It would be quite unsafe to do so, and I don't recommend attempting it unless you are ready to spend significant time on it.
Thus, for a quick solution:
type Node = Rc<RefCell<Data>>;
struct Holder {
list: LinkedList<Node>,
map: HashMap<String, Node>,
}
As long as you do not need to remove by url, this is efficient enough.
Complete example:
use std::collections::{HashMap, LinkedList};
use std::cell::RefCell;
use std::rc::Rc;
struct Data {
url: String,
access: u64,
}
type Node = Rc<RefCell<Data>>;
struct Holder {
list: LinkedList<Node>,
map: HashMap<String, Node>,
}
impl Holder {
fn push(&mut self, d: Data) {
let url = d.url.to_owned();
let node = Rc::new(RefCell::new(d));
self.list.push_back(Rc::clone(&node));
self.map.insert(url, node);
}
fn remove_last(&mut self) {
if let Some(node) = self.list.back() {
self.map.remove(&node.borrow().url);
}
self.list.pop_back();
}
}
fn main() {}

Personally, I would use Rc instead of Box.
Alternatively, you could store indexes into your list as the value type of your hash map (i.e. use HashMap<String, usize> instead of HashMap<String, Box<Data>>.
Depending on what you are trying to do, it might be a better idea to only have either a list or a map, and not both.

Related

Zero cost builder pattern for recursive data structure using transmute. Is this safe? Is there a better approach?

I would like to create a struct using the builder pattern which must be validated before construction, and I would like to minimize the construction overhead.
I've come up with a nice way to do that using std::mem::transmute, but I'm far from confident that this approach is really safe, or that it's the best approach.
Here's my code: (Rust Playground)
#[derive(Debug)]
pub struct ValidStruct {
items: Vec<ValidStruct>
}
#[derive(Debug)]
pub struct Builder {
pub items: Vec<Builder>
}
#[derive(Debug)]
pub struct InvalidStructError {}
impl Builder {
pub fn new() -> Self {
Self { items: vec![] }
}
pub fn is_valid(&self) -> bool {
self.items.len() % 2 == 1
}
pub fn build(self) -> Result<ValidStruct, InvalidStructError> {
if !self.is_valid() {
return Err(InvalidStructError {});
}
unsafe {
Ok(std::mem::transmute::<Builder, ValidStruct>(self))
}
}
}
fn main() {
let mut builder = Builder::new();
builder.items.push(Builder::new());
let my_struct = builder.build().unwrap();
println!("{:?}", my_struct)
}
So, this seems to work. I think it should be safe because I know the two structs are identical. Am I missing anything? Could this actually cause problems somehow, or is there a cleaner/better approach available?
You can't normally transmute between different structures just because they seem to have the same fields in the same order, because the compiler might change that. You can avoid the risk by forcing the memory layout but you're then fighting the compiler and preventing optimizations. This approach isn't usually recommended and is, in my opinion, not needed here.
What you want is to have
a recursive data structure with public fields so that you can easily build it
an identical structure, built from the first one but with no public access and only built after validation of the first one
And you want to avoid useless copies for performance reasons.
What I suggest is to have a wrapper class. This makes sense because wrapping a struct in another one is totally costless in Rust.
You could thus have
/// This is the "Builder" struct
pub struct Data {
pub items: Vec<Data>,
}
pub struct ValidStruct {
data: Data, // no public access here
}
impl Data {
pub fn build(self) -> Result<ValidStruct, InvalidStructError> {
if !self.is_valid() {
return Err(InvalidStructError {});
}
Ok(Self{ data })
}
}
(alternatively, you could declare a struct Builder as a wrapper of Data too but with a public access to its field)

Ownership when each item in a vector needs all the other items in said vector

I'm very new to Rust and I'm having trouble figuring out how to handle ownership (and lifetime?) in this case. I'm not new to programming and I'm actually re-implementing something that already works in JS (https://sheraff.github.io/boids/).
In this context, Universe is basically a "simulation" environment containing many Entity. On each universe.tick(), each Entity needs to know which other Entity it can "see". For that, entity.update() takes a list of Entity as an argument.
As a result, I have this piece of script that I can't seem to find a good ownership structure for:
for mut entity in self.entities {
entity.update(&self.entities);
}
How would you go about it?
I could tell you what I have tried but it feels like I have tried everything I could think of, including reading the Rust book, and I've had all the possible error messages... Sorry.
PS: of course, in time, it's not the entire list of entities that would be passed in entity.update but a filtered list.
Below is a general simplification of the code structure:
pub struct Entity {
pub point: Point,
vision: Cone,
angle: Angle
}
impl Entity {
pub fn update(&mut self, entities: &Vec<Entity>) {
// ...
}
}
pub struct Universe {
pub entities: Vec<Entity>
}
impl Universe {
pub fn new() -> Universe {
let mut entities: Vec<Entity> = vec![];
for _ in 1..200 {
let mut entity = Entity::new();
entities.push(entity);
}
Universe {
entities
}
}
pub fn tick(&self) {
for mut entity in self.entities {
entity.update(&self.entities);
}
}
}
If each entity needs to know about each entity, then it seems like it's not the entity that's responsible for the update. You should probably refactor from a method on the entity to a function that operates on all entities.
Iterating over the entities in a nested loop is a bit difficult. One solution would be to iterate over one copy of the array and allow the other loop to take a mutable reference.
pub fn update_entities(entities: &mut Vec<Entity>) {
for a in entities.clone() {
for mut entity in entities.iter_mut() {
if entity.id == a.id { // If you can compare them, otherwise use indexed for loop and compare indices.
continue;
}
// Do stuff.
}
}
}
The reason why you need to copy is because of the golden rule of Rust: you can have either one mutable reference or any number of immutable references. Nested for loops break this rule because you need one mutable reference and one immutable.

What's the best way to have a struct that contains a reference to another struct of the same type? [duplicate]

I am trying to implement a scenegraph-like data structure in Rust. I would like an equivalent to this C++ code expressed in safe Rust:
struct Node
{
Node* parent; // should be mutable, and nullable (no parent)
std::vector<Node*> child;
virtual ~Node()
{
for(auto it = child.begin(); it != child.end(); ++it)
{
delete *it;
}
}
void addNode(Node* newNode)
{
if (newNode->parent)
{
newNode->parent.child.erase(newNode->parent.child.find(newNode));
}
newNode->parent = this;
child.push_back(newNode);
}
}
Properties I want:
the parent takes ownership of its children
the nodes are accessible from the outside via a reference of some kind
events that touch one Node can potentially mutate the whole tree
Rust tries to ensure memory safety by forbidding you from doing things that might potentially be unsafe. Since this analysis is performed at compile-time, the compiler can only reason about a subset of manipulations that are known to be safe.
In Rust, you could easily store either a reference to the parent (by borrowing the parent, thus preventing mutation) or the list of child nodes (by owning them, which gives you more freedom), but not both (without using unsafe). This is especially problematic for your implementation of addNode, which requires mutable access to the given node's parent. However, if you store the parent pointer as a mutable reference, then, since only a single mutable reference to a particular object may be usable at a time, the only way to access the parent would be through a child node, and you'd only be able to have a single child node, otherwise you'd have two mutable references to the same parent node.
If you want to avoid unsafe code, there are many alternatives, but they'll all require some sacrifices.
The easiest solution is to simply remove the parent field. We can define a separate data structure to remember the parent of a node while we traverse a tree, rather than storing it in the node itself.
First, let's define Node:
#[derive(Debug)]
struct Node<T> {
data: T,
children: Vec<Node<T>>,
}
impl<T> Node<T> {
fn new(data: T) -> Node<T> {
Node { data: data, children: vec![] }
}
fn add_child(&mut self, child: Node<T>) {
self.children.push(child);
}
}
(I added a data field because a tree isn't super useful without data at the nodes!)
Let's now define another struct to track the parent as we navigate:
#[derive(Debug)]
struct NavigableNode<'a, T: 'a> {
node: &'a Node<T>,
parent: Option<&'a NavigableNode<'a, T>>,
}
impl<'a, T> NavigableNode<'a, T> {
fn child(&self, index: usize) -> NavigableNode<T> {
NavigableNode {
node: &self.node.children[index],
parent: Some(self)
}
}
}
impl<T> Node<T> {
fn navigate<'a>(&'a self) -> NavigableNode<T> {
NavigableNode { node: self, parent: None }
}
}
This solution works fine if you don't need to mutate the tree as you navigate it and you can keep the parent NavigableNode objects around (which works fine for a recursive algorithm, but doesn't work too well if you want to store a NavigableNode in some other data structure and keep it around). The second restriction can be alleviated by using something other than a borrowed pointer to store the parent; if you want maximum genericity, you can use the Borrow trait to allow direct values, borrowed pointers, Boxes, Rc's, etc.
Now, let's talk about zippers. In functional programming, zippers are used to "focus" on a particular element of a data structure (list, tree, map, etc.) so that accessing that element takes constant time, while still retaining all the data of that data structure. If you need to navigate your tree and mutate it during the navigation, while retaining the ability to navigate up the tree, then you could turn a tree into a zipper and perform the modifications through the zipper.
Here's how we could implement a zipper for the Node defined above:
#[derive(Debug)]
struct NodeZipper<T> {
node: Node<T>,
parent: Option<Box<NodeZipper<T>>>,
index_in_parent: usize,
}
impl<T> NodeZipper<T> {
fn child(mut self, index: usize) -> NodeZipper<T> {
// Remove the specified child from the node's children.
// A NodeZipper shouldn't let its users inspect its parent,
// since we mutate the parents
// to move the focused nodes out of their list of children.
// We use swap_remove() for efficiency.
let child = self.node.children.swap_remove(index);
// Return a new NodeZipper focused on the specified child.
NodeZipper {
node: child,
parent: Some(Box::new(self)),
index_in_parent: index,
}
}
fn parent(self) -> NodeZipper<T> {
// Destructure this NodeZipper
let NodeZipper { node, parent, index_in_parent } = self;
// Destructure the parent NodeZipper
let NodeZipper {
node: mut parent_node,
parent: parent_parent,
index_in_parent: parent_index_in_parent,
} = *parent.unwrap();
// Insert the node of this NodeZipper back in its parent.
// Since we used swap_remove() to remove the child,
// we need to do the opposite of that.
parent_node.children.push(node);
let len = parent_node.children.len();
parent_node.children.swap(index_in_parent, len - 1);
// Return a new NodeZipper focused on the parent.
NodeZipper {
node: parent_node,
parent: parent_parent,
index_in_parent: parent_index_in_parent,
}
}
fn finish(mut self) -> Node<T> {
while let Some(_) = self.parent {
self = self.parent();
}
self.node
}
}
impl<T> Node<T> {
fn zipper(self) -> NodeZipper<T> {
NodeZipper { node: self, parent: None, index_in_parent: 0 }
}
}
To use this zipper, you need to have ownership of the root node of the tree. By taking ownership of the nodes, the zipper can move things around in order to avoid copying or cloning nodes. When we move a zipper, we actually drop the old zipper and create a new one (though we could also do it by mutating self, but I thought it was clearer that way, plus it lets you chain method calls).
If the above options are not satisfactory, and you must absolutely store the parent of a node in a node, then the next best option is to use Rc<RefCell<Node<T>>> to refer to the parent and Weak<RefCell<Node<T>>> to the children. Rc enables shared ownership, but adds overhead to perform reference counting at runtime. RefCell enables interior mutability, but adds overhead to keep track of the active borrows at runtime. Weak is like Rc, but it doesn't increment the reference count; this is used to break reference cycles, which would prevent the reference count from dropping to zero, causing a memory leak. See DK.'s answer for an implementation using Rc, Weak and RefCell.
The problem is that this data structure is inherently unsafe; it doesn't have a direct equivalent in Rust that doesn't use unsafe. This is by design.
If you want to translate this into safe Rust code, you need to be more specific about what, exactly, you want from it. I know you listed some properties above, but often people coming to Rust will say "I want everything I have in this C/C++ code", to which the direct answer is "well, you can't."
You're also, unavoidably, going to have to change how you approach this. The example you've given has pointers without any ownership semantics, mutable aliasing, and cycles; all of which Rust will not allow you to simply ignore like C++ does.
The simplest solution is to just get rid of the parent pointer, and maintain that externally (like a filesystem path). This also plays nicely with borrowing because there are no cycles anywhere:
pub struct Node1 {
children: Vec<Node1>,
}
If you need parent pointers, you could go half-way and use Ids instead:
use std::collections::BTreeMap;
type Id = usize;
pub struct Tree {
descendants: BTreeMap<Id, Node2>,
root: Option<Id>,
}
pub struct Node2 {
parent: Id,
children: Vec<Id>,
}
The BTreeMap is effectively your "address space", bypassing borrowing and aliasing issues by not directly using memory addresses.
Of course, this introduces the problem of a given Id not being tied to the particular tree, meaning that the node it belongs to could be destroyed, and now you have what is effectively a dangling pointer. But, that's the price you pay for having aliasing and mutation. It's also less direct.
Or, you could go whole-hog and use reference-counting and dynamic borrow checking:
use std::cell::RefCell;
use std::rc::{Rc, Weak};
// Note: do not derive Clone to make this move-only.
pub struct Node3(Rc<RefCell<Node3_>>);
pub type WeakNode3 = Weak<RefCell<Node3>>;
pub struct Node3_ {
parent: Option<WeakNode3>,
children: Vec<Node3>,
}
impl Node3 {
pub fn add(&self, node: Node3) {
// No need to remove from old parent; move semantics mean that must have
// already been done.
(node.0).borrow_mut().parent = Some(Rc::downgrade(&self.0));
self.children.push(node);
}
}
Here, you'd use Node3 to transfer ownership of a node between parts of the tree, and WeakNode3 for external references. Or, you could make Node3 cloneable and add back the logic in add to make sure a given node doesn't accidentally stay a child of the wrong parent.
This is not strictly better than the second option, because this design absolutely cannot benefit from static borrow-checking. The second one can at least prevent you from mutating the graph from two places at once at compile time; here, if that happens, you'll just crash.
The point is: you can't just have everything. You have to decide which operations you actually need to support. At that point, it's usually just a case of picking the types that give you the necessary properties.
In certain cases, you can also use an arena. An arena guarantees that values stored in it will have the same lifetime as the arena itself. This means that adding more values will not invalidate any existing lifetimes, but moving the arena will. Thus, such a solution is not viable if you need to return the tree.
This solves the problem by removing the ownership from the nodes themselves.
Here's an example that also uses interior mutability to allow a node to be mutated after it is created. In other cases, you can remove this mutability if the tree is constructed once and then simply navigated.
use std::{
cell::{Cell, RefCell},
fmt,
};
use typed_arena::Arena; // 1.6.1
struct Tree<'a, T: 'a> {
nodes: Arena<Node<'a, T>>,
}
impl<'a, T> Tree<'a, T> {
fn new() -> Tree<'a, T> {
Self {
nodes: Arena::new(),
}
}
fn new_node(&'a self, data: T) -> &'a mut Node<'a, T> {
self.nodes.alloc(Node {
data,
tree: self,
parent: Cell::new(None),
children: RefCell::new(Vec::new()),
})
}
}
struct Node<'a, T: 'a> {
data: T,
tree: &'a Tree<'a, T>,
parent: Cell<Option<&'a Node<'a, T>>>,
children: RefCell<Vec<&'a Node<'a, T>>>,
}
impl<'a, T> Node<'a, T> {
fn add_node(&'a self, data: T) -> &'a Node<'a, T> {
let child = self.tree.new_node(data);
child.parent.set(Some(self));
self.children.borrow_mut().push(child);
child
}
}
impl<'a, T> fmt::Debug for Node<'a, T>
where
T: fmt::Debug,
{
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{:?}", self.data)?;
write!(f, " (")?;
for c in self.children.borrow().iter() {
write!(f, "{:?}, ", c)?;
}
write!(f, ")")
}
}
fn main() {
let tree = Tree::new();
let head = tree.new_node(1);
let _left = head.add_node(2);
let _right = head.add_node(3);
println!("{:?}", head); // 1 (2 (), 3 (), )
}
TL;DR: DK.'s second version doesn't compile because parent has another type than self.0, fix it by converting it to a WeakNode. Also, in the line directly below, "self" doesn't have a "children" attribute but self.0 has.
I corrected the version of DK. so it compiles and works. Here is my Code:
dk_tree.rs
use std::cell::RefCell;
use std::rc::{Rc, Weak};
// Note: do not derive Clone to make this move-only.
pub struct Node(Rc<RefCell<Node_>>);
pub struct WeakNode(Weak<RefCell<Node_>>);
struct Node_ {
parent: Option<WeakNode>,
children: Vec<Node>,
}
impl Node {
pub fn new() -> Self {
Node(Rc::new(RefCell::new(Node_ {
parent: None,
children: Vec::new(),
})))
}
pub fn add(&self, node: Node) {
// No need to remove from old parent; move semantics mean that must have
// already been done.
node.0.borrow_mut().parent = Some(WeakNode(Rc::downgrade(&self.0)));
self.0.borrow_mut().children.push(node);
}
// just to have something visually printed
pub fn to_str(&self) -> String {
let mut result_string = "[".to_string();
for child in self.0.borrow().children.iter() {
result_string.push_str(&format!("{},", child.to_str()));
}
result_string += "]";
result_string
}
}
and then the main function in main.rs:
mod dk_tree;
use crate::dk_tree::{Node};
fn main() {
let root = Node::new();
root.add(Node::new());
root.add(Node::new());
let inner_child = Node::new();
inner_child.add(Node::new());
inner_child.add(Node::new());
root.add(inner_child);
let result = root.to_str();
println!("{result:?}");
}
The reason I made the WeakNode be more like the Node is to have an easier conversion between the both

How to convert &Vector<Mutex> to Vector<Mutex>

I'm working my way trough the Rust examples. There is this piece of code:
fn new(name: &str, left: usize, right: usize) -> Philosopher {
Philosopher {
name: name.to_string(),
left: left,
right: right,
}
}
what is the best way to adapt this to a vector ? This works:
fn new(v: Vec<Mutex<()>>) -> Table {
Table {
forks: v
}
}
Than I tried the following:
fn new(v: &Vec<Mutex<()>>) -> Table {
Table {
forks: v.to_vec()
}
}
But that gives me:
the trait `core::clone::Clone` is not implemented
for the type `std::sync::mutex::Mutex<()>`
Which make sense. But what must I do If I want to pass a reference to Table and do not want to store a reference inside the Table struct ?
The error message actually explains a lot here. When you call to_vec on a &Vec<_>, you have to make a clone of the entire vector. That's because Vec owns the data, while a reference does not. In order to clone a vector, you also have to clone all of the contents. This is because the vector owns all the items inside of it.
However, your vector contains a Mutex, which is not able to be cloned. A mutex represents unique access to some data, so having two separate mutexes to the same data would be pointless.
Instead, you probably want to share references to the mutex, not clone it completely. Chances are, you want an Arc:
use std::sync::{Arc, Mutex};
fn main() {
let things = vec![Arc::new(Mutex::new(()))];
things.to_vec();
}

Can't figure out a function to return a reference to a given type stored in RefCell<Box<Any>>

Most of this is boilerplate, provided as a compilable example. Scroll down.
use std::rc::{Rc, Weak};
use std::cell::RefCell;
use std::any::{Any, AnyRefExt};
struct Shared {
example: int,
}
struct Widget {
parent: Option<Weak<Box<Widget>>>,
specific: RefCell<Box<Any>>,
shared: RefCell<Shared>,
}
impl Widget {
fn new(specific: Box<Any>,
parent: Option<Rc<Box<Widget>>>) -> Rc<Box<Widget>> {
let parent_option = match parent {
Some(parent) => Some(parent.downgrade()),
None => None,
};
let shared = Shared{pos: 10};
Rc::new(box Widget{
parent: parent_option,
specific: RefCell::new(specific),
shared: RefCell::new(shared)})
}
}
struct Woo {
foo: int,
}
impl Woo {
fn new() -> Box<Any> {
box Woo{foo: 10} as Box<Any>
}
}
fn main() {
let widget = Widget::new(Woo::new(), None);
{
// This is a lot of work...
let cell_borrow = widget.specific.borrow();
let woo = cell_borrow.downcast_ref::<Woo>().unwrap();
println!("{}", woo.foo);
}
// Can the above be made into a function?
// let woo = widget.get_specific::<Woo>();
}
I'm learning Rust and trying to figure out some workable way of doing a widget hierarchy. The above basically does what I need, but it is a bit cumbersome. Especially vexing is the fact that I have to use two statements to convert the inner widget (specific member of Widget). I tried several ways of writing a function that does it all, but the amount of reference and lifetime wizardry is just beyond me.
Can it be done? Can the commented out method at the bottom of my example code be made into reality?
Comments regarding better ways of doing this entire thing are appreciated, but put it in the comments section (or create a new question and link it)
I'll just present a working simplified and more idiomatic version of your code and then explain all the changed I made there:
use std::rc::{Rc, Weak};
use std::any::{Any, AnyRefExt};
struct Shared {
example: int,
}
struct Widget {
parent: Option<Weak<Widget>>,
specific: Box<Any>,
shared: Shared,
}
impl Widget {
fn new(specific: Box<Any>, parent: Option<Rc<Widget>>) -> Widget {
let parent_option = match parent {
Some(parent) => Some(parent.downgrade()),
None => None,
};
let shared = Shared { example: 10 };
Widget{
parent: parent_option,
specific: specific,
shared: shared
}
}
fn get_specific<T: 'static>(&self) -> Option<&T> {
self.specific.downcast_ref::<T>()
}
}
struct Woo {
foo: int,
}
impl Woo {
fn new() -> Woo {
Woo { foo: 10 }
}
}
fn main() {
let widget = Widget::new(box Woo::new() as Box<Any>, None);
let specific = widget.get_specific::<Woo>().unwrap();
println!("{}", specific.foo);
}
First of all, there are needless RefCells inside your structure. RefCells are needed very rarely - only when you need to mutate internal state of an object using only & reference (instead of &mut). This is useful tool for implementing abstractions, but it is almost never needed in application code. Because it is not clear from your code that you really need it, I assume that it was used mistakingly and can be removed.
Next, Rc<Box<Something>> when Something is a struct (like in your case where Something = Widget) is redundant. Putting an owned box into a reference-counted box is just an unnecessary double indirection and allocation. Plain Rc<Widget> is the correct way to express this thing. When dynamically sized types land, it will be also true for trait objects.
Finally, you should try to always return unboxed values. Returning Rc<Box<Widget>> (or even Rc<Widget>) is unnecessary limiting for the callers of your code. You can go from Widget to Rc<Widget> easily, but not the other way around. Rust optimizes by-value returns automatically; if your callers need Rc<Widget> they can box the return value themselves:
let rc_w = box(RC) Widget::new(...);
Same thing is also true for Box<Any> returned by Woo::new().
You can see that in the absence of RefCells your get_specific() method can be implemented very easily. However, you really can't do it with RefCell because RefCell uses RAII for dynamic borrow checks, so you can't return a reference to its internals from a method. You'll have to return core::cell::Refs, and your callers will need to downcast_ref() themselves. This is just another reason to use RefCells sparingly.

Resources