Get real value or None from a HashMap - rust

I have a struct that I would like to fill based on a potentially incomplete hash map:
#[derive(Debug)]
struct Sue {
children: Option<u8>,
cats: Option<u8>,
samoyeds: Option<u8>,
pomeranians: Option<u8>,
akitas: Option<u8>,
vizslas: Option<u8>,
goldfish: Option<u8>,
trees: Option<u8>,
cars: Option<u8>,
perfumes: Option<u8>,
}
impl Sue {
fn from_line(line: &str) -> Sue {
let data: Vec<&str> = line.split(" ").collect();
let items: HashMap<String, u8> = data[2..data.len()]
.chunks(2)
.map(|item| {
(
String::from(item[0].trim_end_matches(":")),
String::from(item[1].trim_end_matches(",")).parse().unwrap(),
)
})
.collect();
println!("{:?}\n", items);
Sue {
// children: match items.get("children"){
// Some(val) => Some(*val),
// None => None,
// },
children: *items.get("children"),
// cats: Some(0),
// samoyeds: Some(0),
// pomeranians: Some(0),
// akitas: Some(0),
// vizslas: Some(0),
// goldfish: Some(0),
// trees: Some(0),
// cars: Some(0),
// perfumes: Some(0),
}
}
}
In the snippet above, the items hash map may or may not contain a key for each field of the Sue struct. My idea was to build a hash map with the input, and then for each field of the struct, try to access the corresponding key of the hashmap. If the key is in the hashmap, return the dereferenced value. If the key isn't in the hashmap, return None.
I know I can do that with a match (a couple of commented-out lines show how I would do this), but I was wondering if there is a more efficient way to do this (it's a lot of boilerplate; one match for each field).
If I run the code above, I'll get:
error[E0614]: type `Option<&u8>` cannot be dereferenced
Any idea? I do want to keep None if the key isn't found. Also for reference, this snippet is for advent of code 2015 day 16.

Use copied or cloned to convert an Option<&T> to Option<T>. The first is available if T: Copy, the second if T: Clone.
u8 implements both so you can use either method.
children: items.get("children").copied(),
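To show the pattern on more than one field, here is a minimal sketch. The `Pet` struct and `pet_from_map` function are made-up names for illustration, standing in for the `Sue` struct from the question:

```rust
use std::collections::HashMap;

// Hypothetical two-field stand-in for the Sue struct.
#[derive(Debug, PartialEq)]
struct Pet {
    cats: Option<u8>,
    dogs: Option<u8>,
}

// Builds a Pet from a possibly incomplete map; missing keys become None.
fn pet_from_map(items: &HashMap<String, u8>) -> Pet {
    Pet {
        // get() yields Option<&u8>; copied() turns it into Option<u8>.
        cats: items.get("cats").copied(),
        dogs: items.get("dogs").copied(),
    }
}

fn main() {
    let mut items = HashMap::new();
    items.insert(String::from("cats"), 3u8);
    println!("{:?}", pet_from_map(&items));
}
```

Each field is a single expression, so no per-field match is needed.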

For u8, either of the methods John Kugelman mentioned works great. However, there are some types that are neither Copy nor cheaply cloneable (maybe they don't implement Clone at all, or they do but it is costly).
A third option is to remove the item from the map:
use std::time::Duration;
struct ExpensiveClone(u8);
impl Clone for ExpensiveClone {
fn clone(&self) -> Self {
std::thread::sleep(Duration::from_secs(1));
Self(self.0)
}
}
In this case, cloning would likely be prohibitively expensive. Instead, if you have owned or mutable access to the map, you can remove it, transferring ownership of the value from the map to your function, without needing to clone the data:
use std::collections::HashMap;
let mut map = HashMap::new();
map.insert(1, ExpensiveClone(1));
let expensive_clone = map.remove(&1).unwrap();
assert_eq!(expensive_clone.0, 1);
It goes without saying that this only works if you can modify the map.
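Tying this back to the original question: remove already returns an Option of the owned value, so a missing key turns into None with no extra work. A rough sketch, where `Report`, `Summary` and `summary_from_map` are hypothetical names and `Report` deliberately does not implement Clone:

```rust
use std::collections::HashMap;

// Hypothetical value type that is intentionally not Clone.
#[derive(Debug, PartialEq)]
struct Report(String);

#[derive(Debug)]
struct Summary {
    first: Option<Report>,
    second: Option<Report>,
}

// Moves values out of the map: remove() returns Option<V>,
// so absent keys simply become None.
fn summary_from_map(items: &mut HashMap<String, Report>) -> Summary {
    Summary {
        first: items.remove("first"),
        second: items.remove("second"),
    }
}

fn main() {
    let mut items = HashMap::new();
    items.insert(String::from("first"), Report(String::from("ok")));
    println!("{:?}", summary_from_map(&mut items));
}
```

The map loses the entries it handed over, which is the trade-off compared to copied or cloned.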

Related

Rust: Modify value in HashMap while immutably borrowing the whole HashMap

I'm trying to learn Rust by using it in a project of mine.
However, I've been struggling with the borrow checker quite a bit in some code which has a very similar form to the following:
use std::collections::HashMap;
use std::pin::Pin;
use std::vec::Vec;
struct MyStruct<'a> {
value: i32,
substructs: Option<Vec<Pin<&'a MyStruct<'a>>>>,
}
struct Toplevel<'a> {
my_structs: HashMap<String, Pin<Box<MyStruct<'a>>>>,
}
fn main() {
let mut toplevel = Toplevel {
my_structs: HashMap::new(),
};
// First pass: add the elements to the HashMap
toplevel.my_structs.insert(
"abc".into(),
Pin::new(Box::new(MyStruct {
value: 0,
substructs: None,
})),
);
toplevel.my_structs.insert(
"def".into(),
Pin::new(Box::new(MyStruct {
value: 5,
substructs: None,
})),
);
toplevel.my_structs.insert(
"ghi".into(),
Pin::new(Box::new(MyStruct {
value: -7,
substructs: None,
})),
);
// Second pass: for each MyStruct, add substructs
let subs = vec![
toplevel.my_structs.get("abc").unwrap().as_ref(),
toplevel.my_structs.get("def").unwrap().as_ref(),
toplevel.my_structs.get("ghi").unwrap().as_ref(),
];
toplevel.my_structs.get_mut("abc").unwrap().substructs = Some(subs);
}
When compiling, I get the following message:
error[E0502]: cannot borrow `toplevel.my_structs` as mutable because it is also borrowed as immutable
--> src/main.rs:48:5
|
44 | toplevel.my_structs.get("abc").unwrap().as_ref(),
| ------------------- immutable borrow occurs here
...
48 | toplevel.my_structs.get_mut("abc").unwrap().substructs = Some(subs);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^--------------------
| |
| mutable borrow occurs here
| immutable borrow later used here
I think I understand why this happens: toplevel.my_structs.get_mut(...) borrows toplevel.my_structs as mutable. However, in the same block, toplevel.my_structs.get(...) also borrows toplevel.my_structs (though this time as immutable).
I also see how this would indeed be a problem if the function which borrows &mut toplevel.my_structs, say, added a new key.
However, all that is done here in the &mut toplevel.my_structs borrow is modify the value corresponding to a specific key, which shouldn't change memory layout (and that's guaranteed, thanks to Pin). Right?
Is there a way to communicate this to the compiler, so that I can compile this code? This appears to be somewhat similar to what motivates the hashmap::Entry API, but I need to be able to access other keys as well, not only the one I want to modify.
Your current problem is about conflicting mutable and immutable borrows, but there's a deeper problem here. This data structure cannot work for what you're trying to do:
struct MyStruct<'a> {
value: i32,
substructs: Option<Vec<Pin<&'a MyStruct<'a>>>>,
}
struct Toplevel<'a> {
my_structs: HashMap<String, Pin<Box<MyStruct<'a>>>>,
}
Any time a type has a lifetime parameter, that lifetime necessarily outlives (or lives exactly as long as) the values of that type. A container Toplevel<'a> which contains references &'a MyStruct must refer to MyStructs which were created before the Toplevel — unless you're using special tools like an arena allocator.
(It's possible to straightforwardly build a tree of references, but they must be constructed leaves first and not using a recursive algorithm; this is usually impractical for dynamic input data.)
In general, references are not really suitable for creating data structures; rather they're for temporarily “borrowing” parts of data structures.
In your case, if you want to have a collection of all the MyStructs and also be able to add connections between them after they are created, you need both shared ownership and interior mutability:
use std::collections::HashMap;
use std::cell::RefCell;
use std::rc::Rc;
struct MyStruct {
value: i32,
substructs: Option<Vec<Rc<RefCell<MyStruct>>>>,
}
struct Toplevel {
my_structs: HashMap<String, Rc<RefCell<MyStruct>>>,
}
The shared ownership via Rc allows both Toplevel and any number of MyStructs to refer to other MyStructs. The interior mutability via RefCell allows the MyStruct's substructs field to be modified even while it's being referred to from other elements of the overall data structure.
Given these definitions, you can write the code that you wanted:
fn main() {
let mut toplevel = Toplevel {
my_structs: HashMap::new(),
};
// First pass: add the elements to the HashMap
toplevel.my_structs.insert(
"abc".into(),
Rc::new(RefCell::new(MyStruct {
value: 0,
substructs: None,
})),
);
toplevel.my_structs.insert(
"def".into(),
Rc::new(RefCell::new(MyStruct {
value: 5,
substructs: None,
})),
);
toplevel.my_structs.insert(
"ghi".into(),
Rc::new(RefCell::new(MyStruct {
value: -7,
substructs: None,
})),
);
// Second pass: for each MyStruct, add substructs
let subs = vec![
toplevel.my_structs["abc"].clone(),
toplevel.my_structs["def"].clone(),
toplevel.my_structs["ghi"].clone(),
];
toplevel.my_structs["abc"].borrow_mut().substructs = Some(subs);
}
Note that because you're having "abc" refer to itself, this creates a reference cycle, which will not be freed when the Toplevel is dropped. To fix this, you can impl Drop for Toplevel and explicitly remove all the substructs references.
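One way the Drop approach could look is sketched below. The `self_cycle_is_freed` helper is only there for demonstration (it is not part of the original code); it uses a Weak handle to observe whether the self-referencing node is actually deallocated once the Toplevel goes away.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::{Rc, Weak};

#[allow(dead_code)]
struct MyStruct {
    value: i32,
    substructs: Option<Vec<Rc<RefCell<MyStruct>>>>,
}

struct Toplevel {
    my_structs: HashMap<String, Rc<RefCell<MyStruct>>>,
}

impl Drop for Toplevel {
    fn drop(&mut self) {
        // Clear every cross-reference so no Rc keeps another node
        // (or itself) alive after the map is gone.
        for node in self.my_structs.values() {
            node.borrow_mut().substructs = None;
        }
    }
}

// Demonstration helper: builds a node that refers to itself, drops the
// Toplevel, and reports whether the node was freed.
fn self_cycle_is_freed() -> bool {
    let node = Rc::new(RefCell::new(MyStruct { value: 0, substructs: None }));
    let probe: Weak<RefCell<MyStruct>> = Rc::downgrade(&node);
    node.borrow_mut().substructs = Some(vec![node.clone()]);

    let mut toplevel = Toplevel { my_structs: HashMap::new() };
    toplevel.my_structs.insert("abc".into(), node);
    drop(toplevel);

    // upgrade() returns None once the last strong reference is gone.
    probe.upgrade().is_none()
}

fn main() {
    println!("freed: {}", self_cycle_is_freed());
}
```

This only cleans up nodes that are reachable from the map; anything held elsewhere by a strong Rc would need its own handling.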
Another option, arguably more 'Rusty', is to just use keys (or indices) for cross-references instead of references. This has several pros and cons:
Adds the cost of additional hash lookups.
Removes the cost of reference counting and interior mutability.
Can have “dangling references”: a key could be removed from the map, invalidating the references to it.
use std::collections::HashMap;
struct MyStruct {
value: i32,
substructs: Option<Vec<String>>,
}
struct Toplevel {
my_structs: HashMap<String, MyStruct>,
}
fn main() {
let mut toplevel = Toplevel {
my_structs: HashMap::new(),
};
// First pass: add the elements to the HashMap
toplevel.my_structs.insert(
"abc".into(),
MyStruct {
value: 0,
substructs: None,
},
);
toplevel.my_structs.insert(
"def".into(),
MyStruct {
value: 5,
substructs: None,
},
);
toplevel.my_structs.insert(
"ghi".into(),
MyStruct {
value: -7,
substructs: None,
},
);
// Second pass: for each MyStruct, add substructs
toplevel.my_structs.get_mut("abc").unwrap().substructs =
Some(vec!["abc".into(), "def".into(), "ghi".into()]);
}
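Following a key-based cross-reference then means one extra lookup per key, and a key that has been removed has to be handled somehow. A small sketch, assuming the same MyStruct and Toplevel as above; `sum_of_substructs` and `sample` are made-up helpers, and skipping dangling keys is just one possible policy:

```rust
use std::collections::HashMap;

struct MyStruct {
    value: i32,
    substructs: Option<Vec<String>>,
}

struct Toplevel {
    my_structs: HashMap<String, MyStruct>,
}

impl Toplevel {
    // Sums the values of the substructs of `key`. Keys that are no longer
    // in the map ("dangling references") are silently skipped.
    fn sum_of_substructs(&self, key: &str) -> i32 {
        let names = self
            .my_structs
            .get(key)
            .and_then(|parent| parent.substructs.as_ref());
        names
            .into_iter()
            .flatten()
            .filter_map(|name| self.my_structs.get(name))
            .map(|s| s.value)
            .sum()
    }
}

// Builds the same three entries as the example above.
fn sample() -> Toplevel {
    let mut my_structs = HashMap::new();
    for (name, value) in [("abc", 0), ("def", 5), ("ghi", -7)] {
        my_structs.insert(name.to_string(), MyStruct { value, substructs: None });
    }
    my_structs.get_mut("abc").unwrap().substructs =
        Some(vec!["abc".into(), "def".into(), "ghi".into()]);
    Toplevel { my_structs }
}

fn main() {
    let toplevel = sample();
    println!("{}", toplevel.sum_of_substructs("abc"));
}
```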
In your code, you are attempting to modify a value referenced in the vector as immutable, which is not allowed. You could store mutable references in the vector instead, and mutate them directly, like this:
let subs = vec![
toplevel.my_structs.get_mut("abc").unwrap(),
toplevel.my_structs.get_mut("def").unwrap(),
toplevel.my_structs.get_mut("ghi").unwrap(),
];
(*subs[0]).substructs = Some(subs.clone());
However, it's easier (although more expensive) to store clones of the structs instead of references:
let subs = vec![
toplevel.my_structs.get("abc").unwrap().clone(),
toplevel.my_structs.get("def").unwrap().clone(),
toplevel.my_structs.get("ghi").unwrap().clone(),
];
(*toplevel.my_structs.get_mut("abc").unwrap()).substructs = Some(subs);

Can I create a custom iterator that iterates over one sequence then another (chain doesn't work)

I have a struct Folder. I have a method called contents. I want that method to return an object that supports IntoIterator so that the caller can just go
for x in folder.contents(){
...
}
The Item type is (since this is what the hashmap iterator returns - see a little lower)
(&OsString, &FileOrFolder)
where FileOrFolder is an enum
enum FileOrFolder{
File(File),
Folder(Folder)
}
The iterator itself needs to first enumerate a HashMap<OsString, FileOrFolder> owned by the folder and then enumerate a Vec<File>. The Vec of files is created on the fly by the contents fn or by the IntoIterator call, whichever works. I tried simply using chain but quickly realized that wasn't going to work. So my rough sketch of what I am trying to do is this:
// the iterator
pub struct FFIter {
files: Vec<FileOrFolder>,
files_iter:Box<dyn Iterator<Item=FileOrFolder>>,
dirs: Box<dyn Iterator<Item = (&OsString, &FileOrFolder)>>,
dirs_done:bool
}
// the thing returned by the contents fn
struct FolderContents{
folder:&Folder
}
// make it iterable
impl IntoIterator for FolderContents {
type Item =(&OsString, &FileOrFolder);
type IntoIter = FFIter;
fn into_iter(self) -> Self::IntoIter {
let files = self.folder.make_the_files()
FFIter {
files: files, // to keep files 'alive'
files_iter: files.iter(),
dirs: Box::new(self.hashmap.iter()),
dirs_done:false
}
}
}
impl Iterator for FFIter {
type Item = (&OsString, &FileOrFolder);
fn next(&mut self) -> Option<(&OsString, &FileOrFolder)> {
None // return empty, lets just get the skeleton built
}
}
impl Folder{
pub fn contents(&self) -> FolderContents{
FolderContents{folder:&self}
}
}
I know this is full of errors, but I need to know if this is doable at all. As you can see I am not even trying to write the code that returns anything. I am just trying to get the basic outline to compile.
I started arm wrestling with the lifetime system and got to the point where I had this
error[E0658]: generic associated types are unstable
--> src\state\files\file_or_folder.rs:46:5
|
46 | type Item<'a> =(&'a OsString, &'a FileOrFolder);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: see issue #44265 <https://github.com/rust-lang/rust/issues/44265> for more information
Which kinda sucked as that is what the compiler said I should do.
I am happy to keep ploughing away at this following the suggestions from the compiler / reading / ... But in the past I have posted a question along these lines and been told - 'of course it can't be done'. So should I be able to make this work?
The Folder type is not Copy and expensive to clone. The File type is simple (string and i64), Copy and Clone
I know I could simply make the caller call two different iterations and merge them, but I am trying to write a transparent replacement module to drop into a large existing codebase.
If somebody says that chain() should work that's great, I will have another go at that.
EDIT: Jmp said chain should work; here's what I tried:
pub fn contents(&self) -> Box<dyn Iterator<Item = (&OsString, &FileOrFolder)> + '_> {
let mut files = vec![];
if self.load_done {
for entry in WalkDir::new(&self.full_path)
.max_depth(1)
.skip_hidden(false)
.follow_links(false)
.into_iter()
{
let ent = entry.unwrap();
if ent.file_type().is_file() {
if let Some(name) = ent.path().file_name() {
files.push((
name.to_os_string(),
FileOrFolder::File(File {
name: name.to_os_string(),
size: ent.metadata().unwrap().len() as u128,
}),
));
}
}
}
};
Box::new(
self.contents
.iter()
.map(|(k, v)| (k, v))
.chain(files.iter().map(|x| (&x.0, &x.1))),
)
}
but the compiler complains, correctly, that 'files' gets destroyed at the end of the call. What I need is for the vec to be held by the iterator and then dropped at the end of the iteration. Folder itself cannot hold the files; the whole point here is to populate files on the fly, since it's too expensive, memory-wise, to hold them.
You claim that files is populated on the fly, but that's precisely what your code is not doing: your code precomputes files before attempting to return it. The solution is to really compute files on the fly, something like this:
pub fn contents(&self) -> Box<dyn Iterator<Item = (&OsString, &FileOrFolder)> + '_> {
    // Lazy: nothing is walked until the returned iterator is consumed.
    let files = WalkDir::new(&self.full_path)
        .max_depth(1)
        .skip_hidden(false)
        .follow_links(false)
        .into_iter()
        .filter_map(|entry| {
            let ent = entry.unwrap();
            if ent.file_type().is_file() {
                if let Some(name) = ent.path().file_name() {
                    Some((
                        name.to_os_string(),
                        FileOrFolder::File(File {
                            name: name.to_os_string(),
                            size: ent.metadata().unwrap().len() as u128,
                        }),
                    ))
                } else {
                    None
                }
            } else {
                None
            }
        });
    Box::new(self.contents.iter().chain(files))
}
Since you haven't given us an MRE, I haven't tested the above, but I think it will fail because self.contents.iter() returns references, whereas files returns owned values. Fixing this requires changing the prototype of the function to return some form of owned values since files cannot be made to return references. I see two ways to do this:
Easiest is to make FileOrFolder clonable and get rid of the references in the prototype:
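pub fn contents(&self) -> Box<dyn Iterator<Item = (OsString, FileOrFolder)> + '_> {
    let files = ...;
    Box::new(
        self.contents
            .iter()
            // cloned() does not apply to a tuple of references, so clone each part
            .map(|(k, v)| (k.clone(), v.clone()))
            .chain(files),
    )
}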
Or you can make a wrapper type similar to Cow that can hold either a reference or an owned value:
enum OwnedOrRef<'a, T> {
    Owned(T),
    Ref(&'a T),
}
pub fn contents(&self) -> Box<dyn Iterator<Item = (OwnedOrRef<'_, OsString>, OwnedOrRef<'_, FileOrFolder>)> + '_> {
    let files = ...;
    Box::new(
        self.contents
            .iter()
            .map(|(k, v)| (OwnedOrRef::Ref(k), OwnedOrRef::Ref(v)))
            .chain(files.map(|(k, v)| (OwnedOrRef::Owned(k), OwnedOrRef::Owned(v)))),
    )
}
You can even use Cow if FileOrFolder can implement ToOwned.
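To make the Cow variant concrete without the WalkDir dependency, here is a simplified sketch. The `Folder` below is a stand-in I made up: it maps names to sizes (String to u64) rather than OsString to FileOrFolder, and the "on the fly" part is faked with a counter, so treat it as an illustration of the shape only.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

// Simplified, hypothetical stand-in for the real Folder type.
struct Folder {
    // Entries the folder keeps in memory (name -> size).
    stored: HashMap<String, u64>,
    // How many entries to generate lazily.
    generated: u64,
}

impl Folder {
    // Yields borrowed entries from the map first, then owned entries
    // produced during iteration, behind one Cow-based item type.
    fn contents(&self) -> Box<dyn Iterator<Item = (Cow<'_, str>, Cow<'_, u64>)> + '_> {
        let stored = self
            .stored
            .iter()
            .map(|(k, v)| (Cow::Borrowed(k.as_str()), Cow::Borrowed(v)));
        let generated = (0..self.generated)
            .map(|i| (Cow::<str>::Owned(format!("file{}", i)), Cow::<u64>::Owned(i)));
        Box::new(stored.chain(generated))
    }
}

fn main() {
    let mut stored = HashMap::new();
    stored.insert(String::from("a.txt"), 10u64);
    let folder = Folder { stored, generated: 2 };
    for (name, size) in folder.contents() {
        println!("{} {}", name, size);
    }
}
```

The caller can read through the Cow via Deref and only pays for a clone when it calls into_owned on a borrowed entry.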

What's the best way to have a struct that contains a reference to another struct of the same type? [duplicate]

I am trying to implement a scenegraph-like data structure in Rust. I would like an equivalent to this C++ code expressed in safe Rust:
struct Node
{
Node* parent; // should be mutable, and nullable (no parent)
std::vector<Node*> child;
virtual ~Node()
{
for(auto it = child.begin(); it != child.end(); ++it)
{
delete *it;
}
}
void addNode(Node* newNode)
{
if (newNode->parent)
{
newNode->parent.child.erase(newNode->parent.child.find(newNode));
}
newNode->parent = this;
child.push_back(newNode);
}
}
Properties I want:
the parent takes ownership of its children
the nodes are accessible from the outside via a reference of some kind
events that touch one Node can potentially mutate the whole tree
Rust tries to ensure memory safety by forbidding you from doing things that might potentially be unsafe. Since this analysis is performed at compile-time, the compiler can only reason about a subset of manipulations that are known to be safe.
In Rust, you could easily store either a reference to the parent (by borrowing the parent, thus preventing mutation) or the list of child nodes (by owning them, which gives you more freedom), but not both (without using unsafe). This is especially problematic for your implementation of addNode, which requires mutable access to the given node's parent. However, if you store the parent pointer as a mutable reference, then, since only a single mutable reference to a particular object may be usable at a time, the only way to access the parent would be through a child node, and you'd only be able to have a single child node, otherwise you'd have two mutable references to the same parent node.
If you want to avoid unsafe code, there are many alternatives, but they'll all require some sacrifices.
The easiest solution is to simply remove the parent field. We can define a separate data structure to remember the parent of a node while we traverse a tree, rather than storing it in the node itself.
First, let's define Node:
#[derive(Debug)]
struct Node<T> {
data: T,
children: Vec<Node<T>>,
}
impl<T> Node<T> {
fn new(data: T) -> Node<T> {
Node { data: data, children: vec![] }
}
fn add_child(&mut self, child: Node<T>) {
self.children.push(child);
}
}
(I added a data field because a tree isn't super useful without data at the nodes!)
Let's now define another struct to track the parent as we navigate:
#[derive(Debug)]
struct NavigableNode<'a, T: 'a> {
node: &'a Node<T>,
parent: Option<&'a NavigableNode<'a, T>>,
}
impl<'a, T> NavigableNode<'a, T> {
fn child(&self, index: usize) -> NavigableNode<T> {
NavigableNode {
node: &self.node.children[index],
parent: Some(self)
}
}
}
impl<T> Node<T> {
fn navigate<'a>(&'a self) -> NavigableNode<T> {
NavigableNode { node: self, parent: None }
}
}
This solution works fine if you don't need to mutate the tree as you navigate it and you can keep the parent NavigableNode objects around (which works fine for a recursive algorithm, but doesn't work too well if you want to store a NavigableNode in some other data structure and keep it around). The second restriction can be alleviated by using something other than a borrowed pointer to store the parent; if you want maximum genericity, you can use the Borrow trait to allow direct values, borrowed pointers, Boxes, Rc's, etc.
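As a usage sketch for NavigableNode, the parent links can be walked to recover information the plain Node doesn't store. The `depth` and `path` methods below are my own additions for illustration, not part of the definitions above:

```rust
struct Node<T> {
    data: T,
    children: Vec<Node<T>>,
}

struct NavigableNode<'a, T> {
    node: &'a Node<T>,
    parent: Option<&'a NavigableNode<'a, T>>,
}

impl<T> Node<T> {
    fn navigate(&self) -> NavigableNode<'_, T> {
        NavigableNode { node: self, parent: None }
    }
}

impl<'a, T> NavigableNode<'a, T> {
    fn child(&self, index: usize) -> NavigableNode<'_, T> {
        NavigableNode { node: &self.node.children[index], parent: Some(self) }
    }

    // Counts the ancestors by following the parent links (illustrative helper).
    fn depth(&self) -> usize {
        let mut depth = 0;
        let mut current = self.parent;
        while let Some(p) = current {
            depth += 1;
            current = p.parent;
        }
        depth
    }

    // Collects the data from the root down to this node (illustrative helper).
    fn path(&self) -> Vec<&'a T> {
        let mut items = vec![&self.node.data];
        let mut current = self.parent;
        while let Some(p) = current {
            items.push(&p.node.data);
            current = p.parent;
        }
        items.reverse();
        items
    }
}

fn main() {
    let tree = Node {
        data: "root",
        children: vec![Node { data: "leaf", children: vec![] }],
    };
    let root = tree.navigate();
    let leaf = root.child(0);
    println!("depth = {}, path = {:?}", leaf.depth(), leaf.path());
}
```

Note that `root` has to stay alive while `leaf` is in use, which is the "keep the parent NavigableNode objects around" restriction mentioned above.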
Now, let's talk about zippers. In functional programming, zippers are used to "focus" on a particular element of a data structure (list, tree, map, etc.) so that accessing that element takes constant time, while still retaining all the data of that data structure. If you need to navigate your tree and mutate it during the navigation, while retaining the ability to navigate up the tree, then you could turn a tree into a zipper and perform the modifications through the zipper.
Here's how we could implement a zipper for the Node defined above:
#[derive(Debug)]
struct NodeZipper<T> {
node: Node<T>,
parent: Option<Box<NodeZipper<T>>>,
index_in_parent: usize,
}
impl<T> NodeZipper<T> {
fn child(mut self, index: usize) -> NodeZipper<T> {
// Remove the specified child from the node's children.
// A NodeZipper shouldn't let its users inspect its parent,
// since we mutate the parents
// to move the focused nodes out of their list of children.
// We use swap_remove() for efficiency.
let child = self.node.children.swap_remove(index);
// Return a new NodeZipper focused on the specified child.
NodeZipper {
node: child,
parent: Some(Box::new(self)),
index_in_parent: index,
}
}
fn parent(self) -> NodeZipper<T> {
// Destructure this NodeZipper
let NodeZipper { node, parent, index_in_parent } = self;
// Destructure the parent NodeZipper
let NodeZipper {
node: mut parent_node,
parent: parent_parent,
index_in_parent: parent_index_in_parent,
} = *parent.unwrap();
// Insert the node of this NodeZipper back in its parent.
// Since we used swap_remove() to remove the child,
// we need to do the opposite of that.
parent_node.children.push(node);
let len = parent_node.children.len();
parent_node.children.swap(index_in_parent, len - 1);
// Return a new NodeZipper focused on the parent.
NodeZipper {
node: parent_node,
parent: parent_parent,
index_in_parent: parent_index_in_parent,
}
}
fn finish(mut self) -> Node<T> {
while let Some(_) = self.parent {
self = self.parent();
}
self.node
}
}
impl<T> Node<T> {
fn zipper(self) -> NodeZipper<T> {
NodeZipper { node: self, parent: None, index_in_parent: 0 }
}
}
To use this zipper, you need to have ownership of the root node of the tree. By taking ownership of the nodes, the zipper can move things around in order to avoid copying or cloning nodes. When we move a zipper, we actually drop the old zipper and create a new one (though we could also do it by mutating self, but I thought it was clearer that way, plus it lets you chain method calls).
If the above options are not satisfactory, and you must absolutely store the parent of a node in a node, then the next best option is to use Rc<RefCell<Node<T>>> to refer to the children and Weak<RefCell<Node<T>>> to refer to the parent, so that the parent keeps ownership of its children. Rc enables shared ownership, but adds overhead to perform reference counting at runtime. RefCell enables interior mutability, but adds overhead to keep track of the active borrows at runtime. Weak is like Rc, but it doesn't increment the strong reference count; this is used to break reference cycles, which would otherwise prevent the reference count from dropping to zero, causing a memory leak. See DK.'s answer for an implementation using Rc, Weak and RefCell.
The problem is that this data structure is inherently unsafe; it doesn't have a direct equivalent in Rust that doesn't use unsafe. This is by design.
If you want to translate this into safe Rust code, you need to be more specific about what, exactly, you want from it. I know you listed some properties above, but often people coming to Rust will say "I want everything I have in this C/C++ code", to which the direct answer is "well, you can't."
You're also, unavoidably, going to have to change how you approach this. The example you've given has pointers without any ownership semantics, mutable aliasing, and cycles; all of which Rust will not allow you to simply ignore like C++ does.
The simplest solution is to just get rid of the parent pointer, and maintain that externally (like a filesystem path). This also plays nicely with borrowing because there are no cycles anywhere:
pub struct Node1 {
children: Vec<Node1>,
}
If you need parent pointers, you could go half-way and use Ids instead:
use std::collections::BTreeMap;
type Id = usize;
pub struct Tree {
descendants: BTreeMap<Id, Node2>,
root: Option<Id>,
}
pub struct Node2 {
parent: Id,
children: Vec<Id>,
}
The BTreeMap is effectively your "address space", bypassing borrowing and aliasing issues by not directly using memory addresses.
Of course, this introduces the problem of a given Id not being tied to the particular tree, meaning that the node it belongs to could be destroyed, and now you have what is effectively a dangling pointer. But, that's the price you pay for having aliasing and mutation. It's also less direct.
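A rough sketch of what the equivalent of addNode might look like with Ids. The field names (`nodes`, `next_id`) and the methods are assumptions of mine rather than part of the types above, the parent is an Option so the root can have none, and the sketch does not guard against making a node a child of its own descendant:

```rust
use std::collections::BTreeMap;

type Id = usize;

#[derive(Default)]
struct Tree {
    nodes: BTreeMap<Id, Node>,
    next_id: Id,
}

struct Node {
    parent: Option<Id>,
    children: Vec<Id>,
}

impl Tree {
    // Creates a detached node and returns its Id.
    fn new_node(&mut self) -> Id {
        let id = self.next_id;
        self.next_id += 1;
        self.nodes.insert(id, Node { parent: None, children: Vec::new() });
        id
    }

    // Moves `child` under `parent`, detaching it from its old parent first.
    // Returns false if either Id is unknown (the "dangling Id" case).
    fn add_child(&mut self, parent: Id, child: Id) -> bool {
        if !self.nodes.contains_key(&parent) || !self.nodes.contains_key(&child) {
            return false;
        }
        let old_parent = self.nodes[&child].parent;
        if let Some(old) = old_parent {
            if let Some(old_node) = self.nodes.get_mut(&old) {
                old_node.children.retain(|&c| c != child);
            }
        }
        self.nodes.get_mut(&child).unwrap().parent = Some(parent);
        self.nodes.get_mut(&parent).unwrap().children.push(child);
        true
    }
}

fn main() {
    let mut tree = Tree::default();
    let root = tree.new_node();
    let child = tree.new_node();
    tree.add_child(root, child);
    println!("children of root: {:?}", tree.nodes[&root].children);
}
```

All mutation goes through `&mut Tree`, so the static borrow checker still prevents two places from modifying the graph at once.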
Or, you could go whole-hog and use reference-counting and dynamic borrow checking:
use std::cell::RefCell;
use std::rc::{Rc, Weak};
// Note: do not derive Clone to make this move-only.
pub struct Node3(Rc<RefCell<Node3_>>);
pub type WeakNode3 = Weak<RefCell<Node3>>;
pub struct Node3_ {
parent: Option<WeakNode3>,
children: Vec<Node3>,
}
impl Node3 {
pub fn add(&self, node: Node3) {
// No need to remove from old parent; move semantics mean that must have
// already been done.
(node.0).borrow_mut().parent = Some(Rc::downgrade(&self.0));
self.children.push(node);
}
}
Here, you'd use Node3 to transfer ownership of a node between parts of the tree, and WeakNode3 for external references. Or, you could make Node3 cloneable and add back the logic in add to make sure a given node doesn't accidentally stay a child of the wrong parent.
This is not strictly better than the second option, because this design absolutely cannot benefit from static borrow-checking. The second one can at least prevent you from mutating the graph from two places at once at compile time; here, if that happens, you'll just crash.
The point is: you can't just have everything. You have to decide which operations you actually need to support. At that point, it's usually just a case of picking the types that give you the necessary properties.
In certain cases, you can also use an arena. An arena guarantees that values stored in it will have the same lifetime as the arena itself. This means that adding more values will not invalidate any existing lifetimes, but moving the arena will. Thus, such a solution is not viable if you need to return the tree.
This solves the problem by removing the ownership from the nodes themselves.
Here's an example that also uses interior mutability to allow a node to be mutated after it is created. In other cases, you can remove this mutability if the tree is constructed once and then simply navigated.
use std::{
cell::{Cell, RefCell},
fmt,
};
use typed_arena::Arena; // 1.6.1
struct Tree<'a, T: 'a> {
nodes: Arena<Node<'a, T>>,
}
impl<'a, T> Tree<'a, T> {
fn new() -> Tree<'a, T> {
Self {
nodes: Arena::new(),
}
}
fn new_node(&'a self, data: T) -> &'a mut Node<'a, T> {
self.nodes.alloc(Node {
data,
tree: self,
parent: Cell::new(None),
children: RefCell::new(Vec::new()),
})
}
}
struct Node<'a, T: 'a> {
data: T,
tree: &'a Tree<'a, T>,
parent: Cell<Option<&'a Node<'a, T>>>,
children: RefCell<Vec<&'a Node<'a, T>>>,
}
impl<'a, T> Node<'a, T> {
fn add_node(&'a self, data: T) -> &'a Node<'a, T> {
let child = self.tree.new_node(data);
child.parent.set(Some(self));
self.children.borrow_mut().push(child);
child
}
}
impl<'a, T> fmt::Debug for Node<'a, T>
where
T: fmt::Debug,
{
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{:?}", self.data)?;
write!(f, " (")?;
for c in self.children.borrow().iter() {
write!(f, "{:?}, ", c)?;
}
write!(f, ")")
}
}
fn main() {
let tree = Tree::new();
let head = tree.new_node(1);
let _left = head.add_node(2);
let _right = head.add_node(3);
println!("{:?}", head); // 1 (2 (), 3 (), )
}
TL;DR: DK.'s second version doesn't compile because parent has another type than self.0, fix it by converting it to a WeakNode. Also, in the line directly below, "self" doesn't have a "children" attribute but self.0 has.
I corrected DK.'s version so that it compiles and works. Here is my code:
dk_tree.rs
use std::cell::RefCell;
use std::rc::{Rc, Weak};
// Note: do not derive Clone to make this move-only.
pub struct Node(Rc<RefCell<Node_>>);
pub struct WeakNode(Weak<RefCell<Node_>>);
struct Node_ {
parent: Option<WeakNode>,
children: Vec<Node>,
}
impl Node {
pub fn new() -> Self {
Node(Rc::new(RefCell::new(Node_ {
parent: None,
children: Vec::new(),
})))
}
pub fn add(&self, node: Node) {
// No need to remove from old parent; move semantics mean that must have
// already been done.
node.0.borrow_mut().parent = Some(WeakNode(Rc::downgrade(&self.0)));
self.0.borrow_mut().children.push(node);
}
// just to have something visually printed
pub fn to_str(&self) -> String {
let mut result_string = "[".to_string();
for child in self.0.borrow().children.iter() {
result_string.push_str(&format!("{},", child.to_str()));
}
result_string += "]";
result_string
}
}
and then the main function in main.rs:
mod dk_tree;
use crate::dk_tree::{Node};
fn main() {
let root = Node::new();
root.add(Node::new());
root.add(Node::new());
let inner_child = Node::new();
inner_child.add(Node::new());
inner_child.add(Node::new());
root.add(inner_child);
let result = root.to_str();
println!("{result:?}");
}
The reason I made the WeakNode more like the Node is to make conversion between the two easier.

Idiomatic append operation

I'm writing a function that will transfer the contents from one Vec to another.
I managed to write two different versions of the same code. One is cleaner, but is potentially slower.
Version 1:
fn move_values<T>(buffer: &mut Vec<T>, recipient: &mut Vec<T>) {
loop {
let value = buffer.pop();
if value.is_none() {
return;
}
recipient.push(value.unwrap());
}
}
Version 2:
fn move_values<T>(buffer: &mut Vec<T>, recipient: &mut Vec<T>) {
for value in buffer.iter() {
recipient.push(value.clone());
}
buffer.clear();
}
My initial gut feeling is that Version 1 is faster because it only requires a single run through the buffer; while Version 2 is more "Rusty" because it involves iterating over a collection rather than using loop.
Which of these is more idiomatic or "better practice" in general?
Note, I'm aware of append, I'm trying to do this by hand for educational purposes.
Neither. There's a built-in operation for this, Vec::append:
Moves all the elements of other into Self, leaving other empty.
fn move_values<T>(buffer: &mut Vec<T>, recipient: &mut Vec<T>) {
recipient.append(buffer);
}
Neither of your functions even compile:
fn move_values_1<T>(buffer: &mut Vec<T>, recipient: &mut Vec<T>) {
loop {
let value = buffer.pop();
if value.is_none() {
return;
}
recipient.push_front(card.unwrap());
}
}
error[E0425]: unresolved name `card`
--> src/main.rs:7:30
|
7 | recipient.push_front(card.unwrap());
| ^^^^ unresolved name
fn move_values_2<T>(buffer: &mut Vec<T>, recipient: &mut Vec<T>) {
for value in buffer.iter() {
recipient.push_front(value.clone());
}
buffer.clear();
}
error: no method named `push_front` found for type `&mut std::vec::Vec<T>` in the current scope
--> src/main.rs:7:19
|
7 | recipient.push_front(card.unwrap());
| ^^^^^^^^^^
if I were to implement it myself
Well, there's a reason that it's implemented for you, but sure... let's dig in.
Checking if something is_some or is_none can usually be avoided by pattern matching. For example:
fn move_values_1<T>(buffer: &mut Vec<T>, recipient: &mut Vec<T>) {
while let Some(v) = buffer.pop() {
recipient.push(v);
}
}
Of course, this moves everything in reverse order because pushing and popping to a Vec both occur at the end.
Calling clone doesn't do what you want unless your trait bounds say that T implements Clone. Otherwise, you are just cloning the reference itself.
You can avoid the need for cloning if you drain the values from one collection and insert them into the other:
for value in buffer.drain(..) {
recipient.push(value);
}
But that for loop is silly, just extend the collection using the iterator:
recipient.extend(buffer.drain(..));
I'd still use the built in append method to do this when transferring between collections of the same type, as it is probably optimized for the precise data layout, and potentially specialized for certain types of data.
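For a side-by-side comparison, the three approaches can be put next to each other. The function names here are mine, chosen only to tell the variants apart:

```rust
// Pop/push transfer: elements arrive in reverse order.
fn move_by_pop<T>(buffer: &mut Vec<T>, recipient: &mut Vec<T>) {
    while let Some(v) = buffer.pop() {
        recipient.push(v);
    }
}

// Drain/extend transfer: order is preserved and no Clone bound is needed.
fn move_by_drain<T>(buffer: &mut Vec<T>, recipient: &mut Vec<T>) {
    recipient.extend(buffer.drain(..));
}

// Built-in transfer: order is preserved and the source is left empty.
fn move_by_append<T>(buffer: &mut Vec<T>, recipient: &mut Vec<T>) {
    recipient.append(buffer);
}

fn main() {
    let mut a = vec![1, 2, 3];
    let mut b = vec![1, 2, 3];
    let mut c = vec![1, 2, 3];
    let (mut x, mut y, mut z) = (vec![0], vec![0], vec![0]);
    move_by_pop(&mut a, &mut x);
    move_by_drain(&mut b, &mut y);
    move_by_append(&mut c, &mut z);
    println!("{:?} {:?} {:?}", x, y, z);
}
```

All three leave the source vector empty; only the pop-based version changes the element order.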

Removing items from a BTreeMap or BTreeSet found through iteration

I would like to remove items from a BTreeMap which have been found through iteration.
As it is not possible to remove items while iterating, I put the items to delete into a vector. The main issue is that it is not possible to use a vector of references, but only a vector of values. All the keys for which the entry has to be removed must then be cloned (assuming the key implements the Clone trait).
For instance, this short sample does not compile:
use std::collections::BTreeMap;
pub fn clean() {
let mut map = BTreeMap::<String, i32>::new();
let mut to_delete = Vec::new();
{
for (k, v) in map.iter() {
if *v > 10 {
to_delete.push(k);
}
}
}
for k in to_delete.drain(..) {
map.remove(k);
}
}
fn main() {}
It generates the following errors when it is compiled:
error[E0502]: cannot borrow `map` as mutable because it is also borrowed as immutable
--> src/main.rs:17:9
|
9 | for (k, v) in map.iter() {
| --- immutable borrow occurs here
...
17 | map.remove(k);
| ^^^ mutable borrow occurs here
18 | }
19 | }
| - immutable borrow ends here
Changing to_delete.push(k) with to_delete.push(k.clone()) makes this snippet compile correctly but it is quite costly if each key to delete must be cloned.
Is there a better solution?
TL;DR: you cannot.
As far as the compiler is concerned, the implementation of BTreeMap::remove might do this:
pub fn remove<Q>(&mut self, key: &Q) -> Option<V>
where
K: Borrow<Q>,
Q: Ord + ?Sized,
{
// actual deleting code, which destroys the entry (key and value) in the map
// now what `key` pointed to is gone and `key` points to invalid memory
// And now we access that memory, causing undefined behavior
key.borrow();
}
The compiler thus has to prevent using a reference into the collection while the collection is being mutated.
To do this, you'd need something like the hypothetical "cursor" API for collections. This would allow you to iterate over the collection, returning a special type that hold the mutable innards of the collection. This type could give you a reference to check against and then allow you to remove the item.
I'd probably look at the problem from a bit different direction. Instead of trying to keep the map, I'd just create a brand new one:
use std::collections::BTreeMap;
pub fn main() {
let mut map = BTreeMap::new();
map.insert("thief", 5);
map.insert("troll", 52);
map.insert("gnome", 7);
let map: BTreeMap<_, _> =
map.into_iter()
.filter(|&(_, v)| v <= 10)
.collect();
println!("{:?}", map); // troll is gone
}
If your condition is equality on the field that makes the struct unique (a.k.a is the only field used in PartialEq and Hash), you can implement Borrow for your type and directly grab it / delete it:
use std::collections::BTreeMap;
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Monster(String);
use std::borrow::Borrow;
impl Borrow<str> for Monster {
fn borrow(&self) -> &str { &self.0 }
}
pub fn main() {
let mut map = BTreeMap::new();
map.insert(Monster("thief".into()), 5);
map.insert(Monster("troll".into()), 52);
map.insert(Monster("gnome".into()), 7);
map.remove("troll");
println!("{:?}", map); // troll is gone
}
See also:
How to implement HashMap with two keys?
How to avoid temporary allocations when using a complex key for a HashMap?
As of Rust 1.53.0, there is a BTreeMap::retain method, which removes or keeps entries based on the return value of a closure while iterating:
map.retain(|_, v| *v<=10);
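Applied to a map with String keys like the one in the question, a complete version might look like this (the `remove_above` wrapper is just a name I picked for the example):

```rust
use std::collections::BTreeMap;

// Removes, in place, every entry whose value exceeds `limit`.
// No keys are cloned and no temporary Vec is needed.
fn remove_above(map: &mut BTreeMap<String, i32>, limit: i32) {
    map.retain(|_, v| *v <= limit);
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert(String::from("thief"), 5);
    map.insert(String::from("troll"), 52);
    map.insert(String::from("gnome"), 7);
    remove_above(&mut map, 10);
    println!("{:?}", map); // troll is gone
}
```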
