Using Box<> with HashMap<> in Rust - rust

I need to create a large HashMap in Rust, which is why I thought of using the Box to use the heap memory.
My question is about what is the best way to keep this data, certainly I only thought of two possible ways (anticipating that I am not so experienced with Rust).
fn main() {
let hashmap = Box<HashMap<u64, DataStruct>>;
...
}
OR
fn main() {
let hashmap = HashMap<u64, Box<DataStruct>>;
...
}
What is the best way to handle such a thing?
Thanks a lot.

HashMap already stores its data on the heap, you don't need to box your values.
Just like vectors, hash maps store their data on the heap. This
HashMap has keys of type String and values of type i32. Like vectors,
hash maps are homogeneous: all of the keys must have the same type,
and all of the values must have the same type.
Source

Related

Storing mutable references owned by someone else

I want to store structs in HashMap but also reference to same structs inside another struct vector field.
Let's say I have a tree build of two types of structs Item and Relation.
I am storing all the relations in HashMap by their id,
but I also want to fill each item.out_relations vector with mutable references to same Relation structs which are owned by HashMap.
Here is my Item:
pub struct Item<'a> {
pub id: oid::ObjectId,
pub out_relations: Vec<&'a mut Relation>, // <-- want to fill this
}
And when I am iterating over my relations got from DB
I am trying to do smth like this:
let from_id = relation.from_id; //to find related item
item_map // my items loaded here already
.entry(from_id)
.and_modify(|item| {
(*item)
.out_relations.push(
relation_map.try_insert(relation.id, relation).unwrap() // mut ref to relation expected here
)
}
);
For now compiler warns try_insert is unstable feature and points me to this bug
But let's imagine I have this mutable ref to relation owned already by HashMap- is it going to work?
Or this will be again some ref lives not long enough error? What are my options then? Or I better will store just relation id in item out_relations vector rather then refs? And when needed I will take my relation from the hashmap?
This is called shared mutability, and it is forbidden by the borrow checker.
Fortunately Rust offers safe tools to achieve this.
In your case you need to use Rc<RefCell>, so your code would be:
pub struct Item {
pub id: oid::ObjectId,
pub out_relations: Vec<Rc<RefCell<Relation>>>,
}
And when I am iterating over my relations got from DB I am trying to do smth like this:
let relation = Rc::new(RefCell::new(relation));
let from_id = relation.borrow().from_id; // assuming id is a Copy type
item_map // my items loaded here already
.entry(from_id)
.and_modify(|item| {
(*item)
.out_relations.push(
relation_map.try_insert(relation.id, relation.clone()).unwrap() // mut ref to relation expected here
)
}
);
If you want to mutate relation later, you can use .borrow_mut()
relation.borrow_mut().from_id = new_id;
I agree with Oussama Gammoudi's diagnosis, but I'd like to offer alternative solutions.
The issue with reference counting is that it kills performance.
In your vector, can you store the key to the hash map instead? Then, when you need to get the value from the vector, get the key from the vector and fetch the value from the hash map?
Another strategy is to keep a vector of the values, and then store indices into the vector in your hash map and other vector. This Rust Keynote speaker describes the strategy well: https://youtu.be/aKLntZcp27M?t=1787

Rust - create hashmap which uses part of the data it's storing from an iterator

I'm new to rust and am trying to figure out how to create a HashMap of borrowed values from a Vec of data but when I try to do it I get a Vec into a HashMap the ownership model fights me. I don't know how to accomplish this, maybe I'm just trying something that is against the Rust mentality.
For Example:
struct Data{
id: String,
other_value: String,
}
//inside a method somewhere
let data_array = load_data(); // returns a Vec<Data>
let mut hash = HashMap::new(); // HashMap<&String, &Data>
for item in data_array {
hash.insert(&item.id, &item);
}
As far As I know there should be a way to populate this data in this way as the HashMap would be storing references to the original data. Or maybe I've just flat out misunderstood the docs... ¯_(ツ)_/¯
There key issue here is that you are consuming the Vec. for loops in Rust work over things that implement IntoIter. IntoIter moves the Vec into an iterator - the Vec itself no longer exists once this is done.
Therefore the items that you are looping though disappear at the end of each iteration., so those references end up referencing nonexistent data (dangling references). If you tried to using them, Bad Things Would Happen. Rust prevents you shooting yourself in the foot like that, so you get an error telling you that the reference does not live long enough. The solution to make your code compile is very easy. Just add .iter() to the end of the loop, which will iterate through references rather than consume the Vec.
for item in data_array.iter() {
hash.insert(&item.id, item); //Note we don't need an `&` in front of item
}
I'm still relatively new to Rust, so this might not be right, but I think it does what you want, but once and for all -- i.e. a function that makes it easy to turn a collection into a map using a closure to generate the keys:
fn map_by<I,K,V>(iterable: I, f: impl Fn(&V) -> K) -> HashMap<K,V>
where I: IntoIterator<Item = V>,
K: Eq + Hash
{
iterable.into_iter().map(|v| (f(&v), v)).collect()
}
Allowing you to say
map_by(data_array.iter(), |item| &item.id)
Here it is in the playground:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=87c0e4d1e68ccb6dd3f2c43ac9f318c7
Please nudge me in the right direction if I have this wrong.
Is there a function like this lying around in std?
So turns out you can borrow the iterator value by borrowing the collection (Vec). So the example above turns into:
for item in &data_array {
hash.insert(&item.id, item);
}
Notice the &data_array which turns item from Data type to &Data and allows you to use the borrowed value.

What is the best way to dereference values within chains of iterators?

When I'm using iterators, I often find myself needing to explicitly dereference values. The following code finds the sum of all pairs of elements in a vector:
extern crate itertools;
use crate::itertools::Itertools;
fn main() {
let x: Vec<i32> = (1..4).collect();
x.iter()
.combinations(2)
.map(|xi| xi.iter().map(|bar| **bar)
.sum::<i32>())
.for_each(|bar| println!("{:?}", bar));
}
Is there a better way of performing the dereferencing than using a map?
Even better would be a way of performing these types of operations without explicitly dereferencing at all.
Using xi.iter() means that you are explicitly asking for an iterator of references to the values within xi. Since you don't want references in this case and instead want the actual values, you'd want to use xi.into_iter() to get an iterator for the values.
So you can change
.map(|xi| xi.iter().map(|bar| **bar).sum::<i32>())
to
.map(|xi| xi.into_iter().sum::<i32>())
Playground Link

How does one create a HashMap with a default value in Rust?

Being fairly new to Rust, I was wondering on how to create a HashMap with a default value for a key? For example, having a default value 0 for any key inserted in the HashMap.
In Rust, I know this creates an empty HashMap:
let mut mymap: HashMap<char, usize> = HashMap::new();
I am looking to maintain a counter for a set of keys, for which one way to go about it seems to be:
for ch in "AABCCDDD".chars() {
mymap.insert(ch, 0)
}
Is there a way to do it in a much better way in Rust, maybe something equivalent to what Ruby provides:
mymap = Hash.new(0)
mymap["b"] = 1
mymap["a"] # 0
Answering the problem you have...
I am looking to maintain a counter for a set of keys.
Then you want to look at How to lookup from and insert into a HashMap efficiently?. Hint: *map.entry(key).or_insert(0) += 1
Answering the question you asked...
How does one create a HashMap with a default value in Rust?
No, HashMaps do not have a place to store a default. Doing so would cause every user of that data structure to allocate space to store it, which would be a waste. You'd also have to handle the case where there is no appropriate default, or when a default cannot be easily created.
Instead, you can look up a value using HashMap::get and provide a default if it's missing using Option::unwrap_or:
use std::collections::HashMap;
fn main() {
let mut map: HashMap<char, usize> = HashMap::new();
map.insert('a', 42);
let a = map.get(&'a').cloned().unwrap_or(0);
let b = map.get(&'b').cloned().unwrap_or(0);
println!("{}, {}", a, b); // 42, 0
}
If unwrap_or doesn't work for your case, there are several similar functions that might:
Option::unwrap_or_else
Option::map_or
Option::map_or_else
Of course, you are welcome to wrap this in a function or a data structure to provide a nicer API.
ArtemGr brings up an interesting point:
in C++ there's a notion of a map inserting a default value when a key is accessed. That always seemed a bit leaky though: what if the type doesn't have a default? Rust is less demanding on the mapped types and more explicit about the presence (or absence) of a key.
Rust adds an additional wrinkle to this. Actually inserting a value would require that simply getting a value can also change the HashMap. This would invalidate any existing references to values in the HashMap, as a reallocation might be required. Thus you'd no longer be able to get references to two values at the same time! That would be very restrictive.
What about using entry to get an element from the HashMap, and then modify it.
From the docs:
fn entry(&mut self, key: K) -> Entry<K, V>
Gets the given key's corresponding entry in the map for in-place
manipulation.
example
use std::collections::HashMap;
let mut letters = HashMap::new();
for ch in "a short treatise on fungi".chars() {
let counter = letters.entry(ch).or_insert(0);
*counter += 1;
}
assert_eq!(letters[&'s'], 2);
assert_eq!(letters[&'t'], 3);
assert_eq!(letters[&'u'], 1);
assert_eq!(letters.get(&'y'), None);
.or_insert() and .or_insert_with()
Adding to the existing example for .entry().or_insert(), I wanted to mention that if the default value passed to .or_insert() is dynamically generated, it's better to use .or_insert_with().
Using .or_insert_with() as below, the default value is not generated if the key already exists. It only gets created when necessary.
for v in 0..s.len() {
components.entry(unions.get_root(v))
.or_insert_with(|| vec![]) // vec only created if needed.
.push(v);
}
In the snipped below, the default vector passed to .or_insert() is generated on every call. If the key exists, a vector is being created and then disposed of, which can be wasteful.
components.entry(unions.get_root(v))
.or_insert(vec![]) // vec always created.
.push(v);
So for fixed values that don't have much creation overhead, use .or_insert(), and for values that have appreciable creation overhead, use .or_insert_with().
A way to start a map with initial values is to construct the map from a vector of tuples. For instance, considering, the code below:
let map = vec![("field1".to_string(), value1), ("field2".to_string(), value2)].into_iter().collect::<HashMap<_, _>>();

Rust rustc::middle::graph::Graph with string node indices

I'm new to rust (using 0.10) and exploring its use by implementing something like the rustc::middle::graph::Graph struct, but using strings as node indices and storing nodes in a HashMap.
Assuming non-static keys, what's a reasonable and efficient policy for ownership of the strings? Does the HashMap need to own its keys? Does each NodeIndex need to own its str? Is it possible for the node to own the string that defines its index and have everything else borrow that string? More generally, how should one share an immutable (but non-static) string amongst several data structures? If the answer is "it depends", what are the relevant issues?
If it is possible to have ownership of the string in one place and borrow it elsewhere, how is that accomplished? For example, if the Node struct were modified to store the node index as a string, how would the HashMap and NodeIndex use a borrowed version of it?
Is it possible for the node to own the string that defines its index and have everything else borrow that string?
[...]
If it is possible to have ownership of the string in one place and borrow it elsewhere, how is that accomplished? For example, if the Node struct were modified to store the node index as a string, how would the HashMap and NodeIndex use a borrowed version of it?
Not really: it's impossible (for the compiler) to verify that self-references don't get invalidated, i.e. many borrowing-internally situations (including this one specifically) end up allowing code similar to
struct Foo<'a> {
things: Vec<~str>,
borrows: Vec<&'a str>
}
let mut foo = Foo { ... };
foo.things.push(~"x");
foo.things.push(~"y");
// foo.things is [~"x", ~"y"]
// try to borrow the ~"y" to put a "y" into borrows
foo.borrows.push(foo.things.get(1).as_slice());
// ... time/SLoC passes ...
// we make the borrowed slice point to freed memory
// by popping/deallocating the ~"y"
foo.things.pop();
The compiler has a very hard time tell that an arbitrary modification won't be like the .pop call and invalidate the internal pointers. Basically, putting a self-reference to data that an object owns into the object itself would have to freeze that object, so no further modifications could occur (including moving out of the struct, e.g. to return it).
Tl;dr: you can't have one part of the Graph storing ~strs and another part storing &str slices into those ~strs.
That said, if you were to use some unsafe code, and only allow the Graph to be expanded, i.e. never removing nodes (i.e. never letting a ~str be deallocated until the whole graph is being destroyed), then some version of this could actually work.
More generally, how should one share an immutable (but non-static) string amongst several data structures?
You can use a Rc<~str>, with each data structure storing its own pointer. Something like
struct Graph<T> {
nodes: HashMap<Rc<~str>, Node<T>>
}
struct Node<T> {
name: Rc<~str>,
data: T,
outgoing_edges: Vec<Rc<~str>>
}
Another approach would be using a bidirectional map connecting strings with an "interned" index of a simple type, something like
struct Graph<T> {
index: BiMap<~str, Index>,
nodes: HashMap<Index, Node<T>>
}
#[deriving(Hash, Eq, TotalEq)]
struct Index { x: uint }
struct Node<T> {
name: Index,
data: T,
outgoing_edges: Vec<Index>
}
(Unfortunately this is hypothetical: Rust's stdlib doesn't have a BiMap type like this (yet...).)

Resources