Insert in arbitrary nested HashMap - rust

I want a data structure that allows arbitrarily nested HashMaps. For that I've constructed the following struct:
struct Database {
children: HashMap<String, Database>,
data: String,
}
To insert into this structure, I get a list of keys and a value to insert. So, for example, for the input
let subkeys = vec!["key1", "key1.1", "key1.1.3"];
let value = "myvalue";
I want the database to have this (pseudo) structure:
{
"data" : "",
"children": {
"key1": {
"data" : "",
"children": {
"key1.1": {
"data" : "",
"children" : {
"key1.1.3": {
"data": "myvalue",
"children" : {}
}
}
}
}
}
}
}
and then for example for a second insert request
let subkeys = vec!["key1", "key1.1", "key1.1.2"];
let value = "myvalue2";
the structure should look (pseudo) like this:
{
"data" : "",
"children": {
"key1": {
"data" : "",
"children": {
"key1.1": {
"data" : "",
"children" : {
"key1.1.3": {
"data": "myvalue",
"children" : {}
},
"key1.1.2": {
"data": "myvalue2",
"children" : {}
}
}
}
}
}
}
}
Here is a minimal reproducible example of what I've tried (not working):
use std::collections::HashMap;
struct Database {
children: HashMap<String, Database>,
data: String,
}
fn main()
{
// make a database object
let mut db = Database {
children: HashMap::new(),
data: "root".to_string(),
};
// some example subkeys
let subkeys = vec!["key1", "key1.1", "key1.1.3"];
// and the value I want to insert
let value = "myvalue";
// a reference to the current HashMap
// initialize with the root
let mut root = &db.children;
// iterate through the subkeys
for subkey in subkeys.iter() {
// match the result of a get request to the hashmap
match root.get::<String>(&subkey.to_string()) {
Some(child) => {
// if the key already exists set the root to the children of the child
root = &child.children;
}
None => {
// if the key doesn't exist, add it with a new empty hashmap
let d = Database{children: HashMap::new(), data: "".to_string()};
// set root to this new database object
root = &d.children;
root.insert(subkey.to_string(), d);
}
}
}
}
As I understand it, there are two problems with this code:
&d.children gets dropped after the match, so root "kind of" has no value
also, the root.insert seems to be a problem because root is a & reference, so the data it refers to cannot be borrowed as mutable
What do I need to do to make my code work and produce results like those shown above? Do I maybe need to change something in my struct Database?

First, some comments on what you have so far and why it doesn't work. root needs to be a mutable reference. Note the distinction between a mutable variable (let mut root = &db.children;) and a mutable reference (let root = &mut db.children;). The former allows the variable itself to be changed. The latter allows the data behind the reference to be changed. In this instance, we need both (let mut root = &mut db.children) because we not only change root as we iterate through the nodes, but we also modify the data behind the reference whenever we need to insert a new node.
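The distinction between a mutable binding and a mutable reference can be seen in isolation. The following is a minimal sketch, not taken from the question; the function name `demo` and the vectors `a` and `b` are made up for illustration.

```rust
// Exercises the three combinations and returns the final vectors.
fn demo() -> (Vec<i32>, Vec<i32>) {
    let mut a = vec![1];
    let mut b = vec![10];

    // Mutable reference, immutable binding: the data behind `r` can
    // change, but `r` itself can never be pointed at anything else.
    let r = &mut a;
    r.push(2);

    // Mutable binding, shared reference: `s` can be re-pointed, but the
    // vector it refers to cannot be modified through it.
    let mut s = &a;
    assert_eq!(s.len(), 2);
    s = &b;
    assert_eq!(s.len(), 1);

    // Both at once: `cursor` can be re-pointed *and* used to modify its
    // target. This is the shape `root` needs in the loop.
    let mut cursor = &mut a;
    cursor.push(3);
    cursor = &mut b;
    cursor.push(20);

    (a, b)
}

fn main() {
    let (a, b) = demo();
    println!("{:?} {:?}", a, b);
}
```

The last case is the one that matters here: the loop both re-points the reference and inserts through it.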
The same thing applies to d in the inner loop (it needs to be a mutable variable), though as we'll see, mutating d isn't really what we want.
// if the key doesn't exist, add it with a new empty hashmap
let d = Database{children: HashMap::new(), data: "".to_string()};
// set root to this new database object
root = &mut d.children;
root.insert(subkey.to_string(), d);
Ignoring the errors for a moment, what should this code do? d is a new Database with no real data in it. Then, we set root to be the (empty) set of children of this new Database. Finally, we insert the new Database into root. But since we changed root in the second step, it's no longer the parent: we're inserting d as a child of itself!
We instead want to switch the order of the second two steps. But if we simply switch those two lines, we get the error
error[E0382]: borrow of moved value: `d`
--> src/main.rs:41:24
|
36 | let mut d = Database{children: HashMap::new(), data: "".to_string()};
| ----- move occurs because `d` has type `Database`, which does not implement the `Copy` trait
37 |
38 | root.insert(subkey.to_string(), d);
| - value moved here
...
41 | root = &mut d.children;
| ^^^^^^^^^^^^^^^ value borrowed here after move
So the problem is that d is no longer a local variable when we try to set root to its children. We need root to be the children of the just-inserted value. The usual idiom for this kind of thing is the entry API. It allows us to attempt to get a value from a HashMap and if it's not found, insert something. Most relevantly, this insertion returns a mutable reference to whatever value now resides at that key.
Now that section looks like
// if key doesnt exist add it with a new empty hashmap
let d = Database{children: HashMap::new(), data: "".to_string()};
// insert the new database object and
// set root to the hashmap of children
root = &mut root.entry(subkey.to_string()).or_insert(d).children;
At this point, we have an apparently working program. By adding a #[derive(Debug)] to Database, we can see what the database looks like with println!("{:#?}", db);. However, if we try to add in the second value, everything blows up. Rather than placing the two values side-by-side, they end up in completely separate branches of the database. This traces back to the commented-out lines in the Some(child) branch of the match statement.
We'd like to set root to a mutable reference to child.children, but even just uncommenting that line without any changes causes the error that root is mutably borrowed while borrowed elsewhere. The problem is that we're using the borrow in root.get(&subkey.to_string()) now. Before, since we ignored child and the other branch didn't use any data from that borrow, the borrow could end right away. Now it has to last for the whole duration of the match. This prevents us from borrowing mutably even in the None case.
Fortunately, since we're using the entry API, we don't need this match statement at all! The whole thing can just be replaced with
let d = Database {
children: HashMap::new(),
data: "".to_string(),
};
// insert the new database object and
// set root to the hashmap of children
root = &mut root.entry(subkey.to_string()).or_insert(d).children;
If the subkey already exists in the set of children, root.entry(...).or_insert(...) will point to that already existing child.
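To see this behaviour on its own, here is a small sketch using a flat map rather than the nested Database. The function name `demo` and the key `"a"` are made up for illustration.

```rust
use std::collections::HashMap;

// Inserts through the entry API twice with the same key.
fn demo() -> HashMap<String, Vec<u32>> {
    let mut map: HashMap<String, Vec<u32>> = HashMap::new();

    // Key is absent: the supplied value is inserted, and we get a
    // mutable reference to it.
    map.entry("a".to_string()).or_insert(Vec::new()).push(1);

    // Key is present: the supplied value is discarded, and we get a
    // mutable reference to the value that was already there.
    map.entry("a".to_string()).or_insert(vec![99]).push(2);

    map
}

fn main() {
    let map = demo();
    // The existing vector was extended rather than replaced.
    assert_eq!(map["a"], vec![1, 2]);
    println!("{:?}", map);
}
```

Because the second call does not overwrite the existing entry, following the same path twice lands on the same node.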
Now we just need to clean up the code. Since you're using it more than once, I'd recommend factoring the act of inserting a path of keys into a function. Rather than following the HashMap<String, Database> through the path, I'd recommend following the Database itself, since that will allow you to modify its data field at the end. To that end, I'd suggest a function with this signature:
impl Database {
fn insert_path(&mut self, path: &[&str]) -> &mut Database {
todo!()
}
}
Next, since we only need to create a new Database (d) when one doesn't already exist, we can use Entry's or_insert_with method to create the new database only when necessary. This is easiest when there's a function to create the new database, so let's add #[derive(Default)] to the list of derives on Database. That makes our function
impl Database {
fn insert_path(&mut self, path: &[&str]) -> &mut Self {
let mut root = self;
// iterate through the path
for subkey in path.iter() {
// insert the new database object if necessary and
// set root to (a mutable reference to) the child node
root = root
.children
.entry(subkey.to_string())
// insert (if necessary) using the Database::default method
.or_insert_with(Database::default);
}
root
}
}
At this point we should run cargo clippy to see if there are any suggestions. There's one about using to_string on &&str. To fix that, you have two choices. One, use one of the other methods for converting &strs to Strings instead of to_string. Two, dereference the &&str before using to_string. This second option is simpler. Since we're iterating over &[&str] (Vec<&str>::iter in your original), the items in the iteration are &&str. The idiomatic way to strip off the extra layer of references is to use a pattern to destructure the items.
for &subkey in path {
^^^ this is new
... // subkey has type &str instead of &&str here
}
My last piece of advice would be to change the name of root to something more generic, like node. It's only the root right at the start, so the name is misleading after that. Here's the final code together with your tests (playground):
use std::collections::HashMap;
#[derive(Default, Debug)]
struct Database {
children: HashMap<String, Database>,
data: String,
}
impl Database {
fn insert_path(&mut self, path: &[&str]) -> &mut Self {
// node is a mutable reference to the current database
let mut node = self;
// iterate through the path
for &subkey in path.iter() {
// insert the new database object if necessary and
// set node to (a mutable reference to) the child node
node = node
.children
.entry(subkey.to_string())
.or_insert_with(Database::default);
}
node
}
}
fn main() {
// make a database object
let mut db = Database {
children: HashMap::new(),
data: "root".to_string(),
};
// some example subkeys
let subkeys = vec!["key1", "key1.1", "key1.1.3"];
// and the value I want to insert
let value = "myvalue";
let node = db.insert_path(&subkeys);
node.data = value.to_string();
println!("{:#?}", db);
let subkeys = vec!["key1", "key1.1", "key1.1.2"];
let value = "myvalue2";
let node = db.insert_path(&subkeys);
node.data = value.to_string();
println!("{:#?}", db);
}
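If you also need to read values back, the same traversal works with shared references. The sketch below adds a `get_path` method, which is a hypothetical companion that is not part of the code above; it returns None as soon as a key along the path is missing.

```rust
use std::collections::HashMap;

#[derive(Default, Debug)]
struct Database {
    children: HashMap<String, Database>,
    data: String,
}

impl Database {
    // Same as the insert_path shown above.
    fn insert_path(&mut self, path: &[&str]) -> &mut Self {
        let mut node = self;
        for &subkey in path {
            node = node
                .children
                .entry(subkey.to_string())
                .or_insert_with(Database::default);
        }
        node
    }

    // Hypothetical read-only counterpart: walks the path without
    // creating anything, stopping at the first missing key.
    fn get_path(&self, path: &[&str]) -> Option<&Self> {
        let mut node = self;
        for &subkey in path {
            node = node.children.get(subkey)?;
        }
        Some(node)
    }
}

fn main() {
    let mut db = Database::default();
    db.insert_path(&["key1", "key1.1", "key1.1.3"]).data = "myvalue".to_string();
    db.insert_path(&["key1", "key1.1", "key1.1.2"]).data = "myvalue2".to_string();

    let found = db.get_path(&["key1", "key1.1", "key1.1.2"]);
    println!("{:?}", found.map(|node| node.data.as_str()));
    println!("{}", db.get_path(&["missing"]).is_none());
}
```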

Related

Moving a field of an object in vector index

My question aims at understanding a deeper variant of the "cannot move out of index" error, specifically when I want to move a field of a struct that resides in a vector. Below is a classical example of a vector of objects and an attempt to move a field out of it.
struct Foo {
str_val: String,
int_val: i32,
}
fn main() {
let mut foos = Vec::new();
foos.push(Foo {
str_val: "ten".to_string(),
int_val: 10,
});
foos.push(Foo {
str_val: "twenty".to_string(),
int_val: 20,
});
// Here understandable error.
let moved = foos[0];
// Why also an error? "cannot move out of index of `Vec<Foo>`"
let moved_field = foos[0].str_val;
}
My question:
I understand why one cannot move the whole object out of the indexed element (the vector's contiguous storage is its owner), but
I don't understand why one cannot move a field of such an object.
The logical counter-reasoning I have is that it is allowed to move a field out of a standalone object, and rust appears advanced enough to track the ownership of fields separately:
let mut standalone = Foo {
str_val: "thirty".to_string(),
int_val: 30,
};
let moved_thirty = standalone.str_val;
// Rust seems to be completely ok with tracking fields separately.
let int_moved = standalone.int_val;
// "use of moved value: `standalone.str_val`"
let error = standalone.str_val;
When you make a partial move out of a variable, the parent variable can no longer be used as a whole. For an object stored in a vector this is forbidden, because the vector must always own complete, valid elements: for instance, it may need to move them when it reallocates to grow.
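In practice there are a few ways to get at the field anyway. The sketch below shows three common ones using only the standard library: cloning the field, swapping it out with std::mem::take (which leaves an empty String behind), and removing the whole element so that it becomes a standalone value. The function name `demo` is made up for illustration.

```rust
struct Foo {
    str_val: String,
    int_val: i32,
}

// Returns (cloned, taken, moved, int_val of the removed element, remaining length).
fn demo() -> (String, String, String, i32, usize) {
    let mut foos = vec![
        Foo { str_val: "ten".to_string(), int_val: 10 },
        Foo { str_val: "twenty".to_string(), int_val: 20 },
    ];

    // 1. Clone the field; the element in the vector is untouched.
    let cloned = foos[0].str_val.clone();

    // 2. Replace the field with its default (an empty String) and take
    //    the old value. The element stays complete and valid.
    let taken = std::mem::take(&mut foos[0].str_val);

    // 3. Remove the whole element from the vector. It is now a
    //    standalone value, so a partial move out of it is allowed.
    let removed = foos.remove(1);
    let moved = removed.str_val;
    let int_val = removed.int_val; // other fields are still usable

    (cloned, taken, moved, int_val, foos.len())
}

fn main() {
    let (cloned, taken, moved, int_val, remaining) = demo();
    println!("{} {} {} {} {}", cloned, taken, moved, int_val, remaining);
}
```

Which one fits depends on whether the element should stay in the vector and whether an empty placeholder in the field is acceptable.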

How do I insert into or update an RBTree from rbtree-rs?

I'm having trouble understanding how to idiomatically find and append to or create a new vector if the value is part of a data structure, in this case a Red Black tree.
I'm using this Red Black Tree implementation and the plan is to grab a mutable reference to the value which I will append to if it exists (not None) and create a new vector and move it to the RBTree if there is no value for the key. My code looks like this, slightly altered for brevity so excuse any careless errors:
struct Obj {
tree: RBTree<i32, Vec<String>>,
}
let mut obj = Obj {
tree: RBTree::new(),
};
let k = 5;
let v = "whatever";
match obj.tree.get_mut(k) {
None => {
let mut vec: Vec<String> = Vec::new();
vec.push(v.to_string());
obj.tree.insert(k, vec);
}
Some(vec) => vec.push(v.to_string()),
}
The problem is that I'm getting a mutable reference to the tree when I check for existence because if it does exist, I want to mutate it by appending to the vector. However, if it does not exist, I want to insert a new node, which tries to do a second mutable borrow, so I get a "second mutable borrow occurs here" on this line: obj.tree.insert(k, vec);.
I would love some insight into how to perform this find or create so that it compiles and is thread safe. I guess it's also possible the library I'm using has problems. Not yet qualified to comment on that.
In such cases the workaround is to move the mutating code outside of get_mut()'s scope.
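// Decide inside the match, but only record the decision
let needs_insert = match obj.tree.get_mut(k) {
None => true,
Some(vec) => {
vec.push(v.to_string());
false
}
};
// The mutable borrow from get_mut() has ended here
if needs_insert {
obj.tree.insert(k, vec![v.to_string()]);
}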
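For comparison, the standard library's ordered map has an entry API that does the find-or-create in one step. The sketch below uses std::collections::BTreeMap as a stand-in for the RBTree; whether rbtree-rs offers an equivalent is not something this example assumes. The helper name `append` is made up.

```rust
use std::collections::BTreeMap;

// Appends `v` to the vector stored under `k`, creating the vector first
// if the key is not present yet.
fn append(tree: &mut BTreeMap<i32, Vec<String>>, k: i32, v: &str) {
    tree.entry(k).or_default().push(v.to_string());
}

fn main() {
    let mut tree = BTreeMap::new();
    append(&mut tree, 5, "whatever");
    append(&mut tree, 5, "again");
    append(&mut tree, 7, "other");
    println!("{:?}", tree);
}
```

Only one mutable borrow of the map is involved, so the conflict described in the question does not arise.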

Variable in loop does not live long enough

I've been playing with Rust for the past few days,
and I'm still struggling with the memory management (figures).
I wrote a simple project that creates a hierarchy of structs ("keywords") from lexing/parsing a text file.
A keyword is defined like this:
pub struct Keyword<'a> {
name: String,
code: u32,
parent: Option<&'a Keyword<'a>>,
}
which is my equivalent for this C struct:
typedef struct Keyword Keyword;
struct Keyword {
char* name;
unsigned int code;
Keyword* parent;
};
A hierarchy is just a container for keywords, defined like this:
pub struct KeywordHierarchy<'a> {
keywords: Vec<Box<Keyword<'a>>>,
}
impl<'a> KeywordHierarchy<'a> {
fn add_keyword(&mut self, keyword: Box<Keyword<'a>>) {
self.keywords.push(keyword);
}
}
In the parse function (which is a stub of the complete parser), I recreated the condition that spawns the lifetime error in my code.
fn parse<'a>() -> Result<KeywordHierarchy<'a>, String> {
let mut hierarchy = KeywordHierarchy { keywords: Vec::new() };
let mut first_if = true;
let mut second_if = false;
while true {
if first_if {
// All good here.
let root_keyword = Keyword {
name: String::from("ROOT"),
code: 0,
parent: None,
};
hierarchy.add_keyword(Box::new(root_keyword));
first_if = false;
second_if = true;
}
if second_if {
// Hierarchy might have expired here?
// Find parent
match hierarchy.keywords.iter().find(|p| p.code == 0) {
None => return Err(String::from("No such parent")),
Some(parent) => {
// If parent is found, create a child
let child = Keyword {
name: String::from("CHILD"),
code: 1,
parent: Some(&parent),
};
hierarchy.add_keyword(Box::new(child));
}
}
second_if = false;
}
if !first_if && !second_if {
break;
}
}
Ok(hierarchy)
}
There's a while loop that goes through the lexer tokens.
In the first if, I add the ROOT keyword to the hierarchy, which is the only one that doesn't have a parent, and everything goes smoothly as expected.
In the second if, I parse the children keywords and I get a lifetime error when invoking KeywordHierarchy.add_keyword().
`hierarchy.keywords` does not live long enough
Could you guys recommend an idiomatic way to fix this?
Cheers.
The immediate problem I can see with your design is that your loop is going to modify the hierarchy.keywords vector (in the first_if block), but it also borrows elements from the hierarchy.keywords vector (in the second_if block).
This is problematic, because modifying a vector may cause its backing buffer to be reallocated, which, if it were allowed, would invalidate all existing borrows to the vector. (And thus it is not allowed.)
Have you considered using an arena to hold your keywords instead of a Vec? Arenas are designed so that you can allocate new things within them while still having outstanding borrows to elements previously allocated within the arena.
Update: Here is a revised version of your code that illustrates using an arena (in this case a rustc_arena::TypedArena, but that's just so I can get this running on the play.rust-lang.org service) alongside a Vec<&Keyword> to handle the lookups.
https://play.rust-lang.org/?gist=fc6d81cb7efa7e4f32c481ab6482e587&version=nightly&backtrace=0
The crucial bits of code are this:
First: the KeywordHierarchy now holds an arena alongside a vec:
pub struct KeywordHierarchy<'a> {
keywords: Vec<&'a Keyword<'a>>,
kw_arena: &'a TypedArena<Keyword<'a>>,
}
Second: Adding a keyword now allocates the spot in the arena, and stashes a reference to that spot in the vec:
fn add_keyword(&mut self, keyword: Keyword<'a>) {
let kw = self.kw_arena.alloc(keyword);
self.keywords.push(kw);
}
Third: the fn parse function now takes an arena (reference) as input, because we need the arena to outlive the stack frame of fn parse:
fn parse<'a>(arena: &'a TypedArena<Keyword<'a>>) -> Result<KeywordHierarchy<'a>, String> {
...
Fourth: To avoid borrow-checker issues with borrowing hierarchy as mutable while also iterating over it, I moved the hierarchy modification outside of your Find parent match:
// Find parent
let parent = match hierarchy.keywords.iter().find(|p| p.code == 0) {
None => return Err(String::from("No such parent")),
Some(parent) => *parent, // (this deref is important)
};
// If parent is found, create a child
let child = Keyword {
name: String::from("CHILD"),
code: 1,
parent: Some(parent),
};
hierarchy.add_keyword(child);
second_if = false;
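If pulling in an arena is not an option, a different technique (not the arena approach above) is to store the parent as an index into the vector instead of a reference. Indices stay valid when the vector reallocates, so no lifetimes are needed. This is only a sketch under that assumption; the `parent_of` helper and the simplified `parse` are made up for illustration.

```rust
struct Keyword {
    name: String,
    code: u32,
    // Index of the parent inside KeywordHierarchy::keywords, if any.
    parent: Option<usize>,
}

struct KeywordHierarchy {
    keywords: Vec<Keyword>,
}

impl KeywordHierarchy {
    // Stores the keyword and returns the index it was stored at.
    fn add_keyword(&mut self, keyword: Keyword) -> usize {
        self.keywords.push(keyword);
        self.keywords.len() - 1
    }

    // Hypothetical helper: resolves the parent index back to a reference.
    fn parent_of(&self, index: usize) -> Option<&Keyword> {
        self.keywords[index].parent.map(|p| &self.keywords[p])
    }
}

fn parse() -> Result<KeywordHierarchy, String> {
    let mut hierarchy = KeywordHierarchy { keywords: Vec::new() };
    hierarchy.add_keyword(Keyword { name: String::from("ROOT"), code: 0, parent: None });

    // Find the parent's index; the borrow ends before the push below.
    let parent = hierarchy
        .keywords
        .iter()
        .position(|k| k.code == 0)
        .ok_or_else(|| String::from("No such parent"))?;

    hierarchy.add_keyword(Keyword { name: String::from("CHILD"), code: 1, parent: Some(parent) });
    Ok(hierarchy)
}

fn main() {
    let hierarchy = parse().unwrap();
    let parent = hierarchy.parent_of(1).map(|k| k.name.as_str());
    println!("{:?}", parent);
}
```

The trade-off is that an index carries no compile-time guarantee of pointing at a live element, so removing keywords from the vector would need extra care.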

Return exact value in Rust HashMap

I can't find a suitable way to return the exact value for a key in a HashMap in Rust. All the existing get methods return it in a different form rather than as the exact value.
You probably want the HashMap::remove method - it deletes the key from the map and returns the original value rather than a reference:
use std::collections::HashMap;
struct Thing {
content: String,
}
fn main() {
let mut hm: HashMap<u32, Thing> = HashMap::new();
hm.insert(
123,
Thing {
content: "abc".into(),
},
);
hm.insert(
432,
Thing {
content: "def".into(),
},
);
// Remove object from map, and take ownership of it
let value = hm.remove(&432);
if let Some(v) = value {
println!("Took ownership of Thing with content {:?}", v.content);
};
}
The get methods must return a reference to the object because the original object can only exist in one place (it is owned by the HashMap). The remove method can return the original object (i.e "take ownership") only because it removes it from its original owner.
Another solution, depending on the specific situation, may be to take the reference and call .clone() on it to make a new copy of the object (in this case it wouldn't work because Clone isn't implemented for our Thing example object, but it would work if the value was, say, a String).
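As a sketch of the cloning route, assuming we are free to add #[derive(Clone)] to Thing (the original example does not have it), Option::cloned turns the Option<&Thing> from get into an owned Option<Thing>. The helper name `owned_copy` is made up for illustration.

```rust
use std::collections::HashMap;

// Clone is derived here so that the value can be copied out of the map.
#[derive(Clone, Debug, PartialEq)]
struct Thing {
    content: String,
}

// Returns an owned copy of the value; the map keeps its original.
fn owned_copy(map: &HashMap<u32, Thing>, key: u32) -> Option<Thing> {
    map.get(&key).cloned()
}

fn main() {
    let mut hm = HashMap::new();
    hm.insert(432, Thing { content: "def".into() });

    let copy = owned_copy(&hm, 432);
    println!("{:?}", copy);
    // The entry is still in the map.
    println!("{}", hm.contains_key(&432));
}
```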
Finally it may be worth noting you can still use the reference to the object in many circumstances - e.g the previous example could be done by getting a reference:
use std::collections::HashMap;
struct Thing {
content: String,
}
fn main() {
let mut hm: HashMap<u32, Thing> = HashMap::new();
hm.insert(
123,
Thing {
content: "abc".into(),
},
);
hm.insert(
432,
Thing {
content: "def".into(),
},
);
// Get a reference to the Thing containing "def" instead of removing it
// from the map and taking ownership
let value = hm.get(&432);
// Print the `content` as in previous example.
if let Some(v) = value {
println!("Showing content of referenced Thing: {:?}", v.content);
}
}
There are two basic methods of obtaining the value for the given key: get() and get_mut(). Use the first one if you just want to read the value, and the second one if you need to modify the value:
fn get(&self, k: &Q) -> Option<&V>
fn get_mut(&mut self, k: &Q) -> Option<&mut V>
As you can see from their signatures, both of these methods return Option rather than a direct value. The reason is that there may be no value associated to the given key:
use std::collections::HashMap;
let mut map = HashMap::new();
map.insert(1, "a");
assert_eq!(map.get(&1), Some(&"a")); // key exists
assert_eq!(map.get(&2), None); // key does not exist
If you are sure that the map contains the given key, you can use unwrap() to get the value out of the option:
assert_eq!(map.get(&1).unwrap(), &"a");
However, in general, it is better (and safer) to consider also the case when the key might not exist. For example, you may use pattern matching:
if let Some(value) = map.get(&1) {
assert_eq!(value, &"a");
} else {
// There is no value associated to the given key.
}
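The examples above only read from the map. Here is a short sketch of the other half: modifying a value in place through get_mut, and falling back to a default when a key is missing. The function name `demo` and the placeholder string `"<none>"` are made up for illustration.

```rust
use std::collections::HashMap;

// Returns the modified map and the result of looking up a missing key.
fn demo() -> (HashMap<u32, String>, String) {
    let mut map = HashMap::new();
    map.insert(1, "a".to_string());

    // get_mut gives Option<&mut String>, so the value can be changed
    // in place without removing it from the map.
    if let Some(value) = map.get_mut(&1) {
        value.push('!');
    }

    // For a missing key, supply a fallback instead of unwrapping.
    let missing = map
        .get(&2)
        .map(String::as_str)
        .unwrap_or("<none>")
        .to_string();

    (map, missing)
}

fn main() {
    let (map, missing) = demo();
    println!("{:?} {}", map.get(&1), missing);
}
```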

Replacing a borrowed Arc<RwLock>

I have some code that stores an object. I have a function that does the legwork of storing that object and returning an Arc of it.
struct Something {
// ...
}
// create a something, returning a locked Arc of it.
fn make_something(&mut self) -> Arc<RwLock<Something>>
{
let x = Something{};
let stored = Arc::new(RwLock::new(x));
// stored is cloned and put into a container in self
stored.clone()
}
Elsewhere, I have code that sometimes needs to get a new make_something, letting the old Something get stored elsewhere in make_something's Self. However, it gives me scoping problems:
fn elsewhere() {
let mut something_arc = obj.make_something();
let mut something_locked = something_arc.write().unwrap();
loop {
// something_locked is mutated as a Something
// create a new make something, and start a "transaction"
something_arc = obj.make_something();
something_locked = something_arc.write().unwrap();
}
}
The borrow checker is telling me that I can't replace something_arc because it's being borrowed by something_locked.
How do I replace something_arc and something_locked with a new Arc and associated write lock?
