I want to store structs in HashMap but also reference to same structs inside another struct vector field.
Let's say I have a tree build of two types of structs Item and Relation.
I am storing all the relations in HashMap by their id,
but I also want to fill each item.out_relations vector with mutable references to same Relation structs which are owned by HashMap.
Here is my Item:
pub struct Item<'a> {
pub id: oid::ObjectId,
pub out_relations: Vec<&'a mut Relation>, // <-- want to fill this
}
And when I am iterating over my relations got from DB
I am trying to do smth like this:
let from_id = relation.from_id; //to find related item
item_map // my items loaded here already
.entry(from_id)
.and_modify(|item| {
(*item)
.out_relations.push(
relation_map.try_insert(relation.id, relation).unwrap() // mut ref to relation expected here
)
}
);
For now compiler warns try_insert is unstable feature and points me to this bug
But let's imagine I have this mutable ref to relation owned already by HashMap- is it going to work?
Or this will be again some ref lives not long enough error? What are my options then? Or I better will store just relation id in item out_relations vector rather then refs? And when needed I will take my relation from the hashmap?
This is called shared mutability, and it is forbidden by the borrow checker.
Fortunately Rust offers safe tools to achieve this.
In your case you need to use Rc<RefCell>, so your code would be:
pub struct Item {
pub id: oid::ObjectId,
pub out_relations: Vec<Rc<RefCell<Relation>>>,
}
And when I am iterating over my relations got from DB I am trying to do smth like this:
let relation = Rc::new(RefCell::new(relation));
let from_id = relation.borrow().from_id; // assuming id is a Copy type
item_map // my items loaded here already
.entry(from_id)
.and_modify(|item| {
(*item)
.out_relations.push(
relation_map.try_insert(relation.id, relation.clone()).unwrap() // mut ref to relation expected here
)
}
);
If you want to mutate relation later, you can use .borrow_mut()
relation.borrow_mut().from_id = new_id;
I agree with Oussama Gammoudi's diagnosis, but I'd like to offer alternative solutions.
The issue with reference counting is that it kills performance.
In your vector, can you store the key to the hash map instead? Then, when you need to get the value from the vector, get the key from the vector and fetch the value from the hash map?
Another strategy is to keep a vector of the values, and then store indices into the vector in your hash map and other vector. This Rust Keynote speaker describes the strategy well: https://youtu.be/aKLntZcp27M?t=1787
Related
I want to create a function that provides a two step write and commit, like so:
// Omitting locking for brevity
struct States {
commited_state: u64,
// By reference is just a placeholder - I don't know how to do this
pending_states: HashSet<i64>
}
impl States {
fn read_dirty(&self) -> {
// Sum committed state and all non committed states
self.commited_state +
pending_states.into_iter().fold(sum_all_values).unwrap_or(0)
}
fn read_committed(&self) {
self.commited_state
}
}
let state_container = States::default();
async fn update_state(state_container: States, new_state: i64) -> Future {
// This is just pseudo code missing locking and such
// I'd like to add a reference to new_state
state_container.pending_states.insert(
new_state
)
async move {
// I would like to defer the commit
// I add the state to the commited state
state_container.commited_state =+ new_state;
// Then remove it *by reference* from the pending states
state_container.remove(new_state)
}
}
I'd like to be in a situation where I can call it like so
let commit_handler = update_state(state_container, 3).await;
// Do some external transactional stuff
third_party_transactional_service(...)?
// Commit if the above line does not error
commit_handler.await;
The problem I have is that HashMaps and HashSets, hash values based of their value and not their actual reference - so I can't remove them by reference.
I appreciate this a bit of a long question, but I'm just trying to give a bit more context as to what I'm trying to do. I know that in a typical database you'd generally have an atomic counter to generate the transaction ID, but that feels a bit overkill when the pointer reference would be enough.
However, I don't want to get the pointer value using unsafe, because it just seems a bit off to do something relatively simple.
Values in rust don't have an identity like they do in other languages. You need to ascribe them an identity somehow. You've hit on two ways to do this in your question: an ID contained within the value, or the address of the value as a pointer.
Option 1: An ID contained in the value
It's trivial to have a usize ID with a static AtomicUsize (atomics have interior mutability).
use std::sync::atomic::{AtomicUsize, Ordering};
// No impl of clone/copy as we want these IDs to be unique.
#[derive(Debug, Hash, PartialEq, Eq)]
#[repr(transparent)]
pub struct OpaqueIdentifier(usize);
impl OpaqueIdentifier {
pub fn new() -> Self {
static COUNTER: AtomicUsize = AtomicUsize::new(0);
Self(COUNTER.fetch_add(1, Ordering::Relaxed))
}
pub fn id(&self) -> usize {
self.0
}
}
Now your map key becomes usize, and you're done.
Having this be a separate type that doesn't implement Copy or Clone allows you to have a concept of an "owned unique ID" and then every type with one of these IDs is forced not to be Copy, and a Clone impl would require obtaining a new ID.
(You can use a different integer type than usize. I chose it semi-arbitrarily.)
Option 2: A pointer to the value
This is more challenging in Rust since values in Rust are movable by default. In order for this approach to be viable, you have to remove this capability by pinning.
To make this work, both of the following must be true:
You pin the value you're using to provide identity, and
The pinned value is !Unpin (otherwise pinning still allows moves!), which can be forced by adding a PhantomPinned member to the value's type.
Note that the pin contract is only upheld if the object remains pinned for its entire lifetime. To enforce this, your factory for such objects should only dispense pinned boxes.
This could complicate your API as you cannot obtain a mutable reference to a pinned value without unsafe. The pin documentation has examples of how to do this properly.
Assuming that you have done all of this, you can then use *const T as the key in your map (where T is the pinned type). Note that conversion to a pointer is safe -- it's conversion back to a reference that isn't. So you can just use some_pin_box.get_ref() as *const _ to obtain the pointer you'll use for lookup.
The pinned box approach comes with pretty significant drawbacks:
All values being used to provide identity have to be allocated on the heap (unless using local pinning, which is unlikely to be ergonomic -- the pin! macro making this simpler is experimental).
The implementation of the type providing identity has to accept self as a &Pin or &mut Pin, requiring unsafe code to mutate the contents.
In my opinion, it's not even a good semantic fit for the problem. "Location in memory" and "identity" are different things, and it's only kind of by accident that the former can sometimes be used to implement the latter. It's a bit silly that moving a value in memory would change its identity, no?
I'd just go with adding an ID to the value. This is a substantially more obvious pattern, and it has no serious drawbacks.
I need to create a large HashMap in Rust, which is why I thought of using the Box to use the heap memory.
My question is about what is the best way to keep this data, certainly I only thought of two possible ways (anticipating that I am not so experienced with Rust).
fn main() {
let hashmap = Box<HashMap<u64, DataStruct>>;
...
}
OR
fn main() {
let hashmap = HashMap<u64, Box<DataStruct>>;
...
}
What is the best way to handle such a thing?
Thanks a lot.
HashMap already stores its data on the heap, you don't need to box your values.
Just like vectors, hash maps store their data on the heap. This
HashMap has keys of type String and values of type i32. Like vectors,
hash maps are homogeneous: all of the keys must have the same type,
and all of the values must have the same type.
Source
I'm new to rust and am trying to figure out how to create a HashMap of borrowed values from a Vec of data but when I try to do it I get a Vec into a HashMap the ownership model fights me. I don't know how to accomplish this, maybe I'm just trying something that is against the Rust mentality.
For Example:
struct Data{
id: String,
other_value: String,
}
//inside a method somewhere
let data_array = load_data(); // returns a Vec<Data>
let mut hash = HashMap::new(); // HashMap<&String, &Data>
for item in data_array {
hash.insert(&item.id, &item);
}
As far As I know there should be a way to populate this data in this way as the HashMap would be storing references to the original data. Or maybe I've just flat out misunderstood the docs... ¯_(ツ)_/¯
There key issue here is that you are consuming the Vec. for loops in Rust work over things that implement IntoIter. IntoIter moves the Vec into an iterator - the Vec itself no longer exists once this is done.
Therefore the items that you are looping though disappear at the end of each iteration., so those references end up referencing nonexistent data (dangling references). If you tried to using them, Bad Things Would Happen. Rust prevents you shooting yourself in the foot like that, so you get an error telling you that the reference does not live long enough. The solution to make your code compile is very easy. Just add .iter() to the end of the loop, which will iterate through references rather than consume the Vec.
for item in data_array.iter() {
hash.insert(&item.id, item); //Note we don't need an `&` in front of item
}
I'm still relatively new to Rust, so this might not be right, but I think it does what you want, but once and for all -- i.e. a function that makes it easy to turn a collection into a map using a closure to generate the keys:
fn map_by<I,K,V>(iterable: I, f: impl Fn(&V) -> K) -> HashMap<K,V>
where I: IntoIterator<Item = V>,
K: Eq + Hash
{
iterable.into_iter().map(|v| (f(&v), v)).collect()
}
Allowing you to say
map_by(data_array.iter(), |item| &item.id)
Here it is in the playground:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=87c0e4d1e68ccb6dd3f2c43ac9f318c7
Please nudge me in the right direction if I have this wrong.
Is there a function like this lying around in std?
So turns out you can borrow the iterator value by borrowing the collection (Vec). So the example above turns into:
for item in &data_array {
hash.insert(&item.id, item);
}
Notice the &data_array which turns item from Data type to &Data and allows you to use the borrowed value.
Being fairly new to Rust, I was wondering on how to create a HashMap with a default value for a key? For example, having a default value 0 for any key inserted in the HashMap.
In Rust, I know this creates an empty HashMap:
let mut mymap: HashMap<char, usize> = HashMap::new();
I am looking to maintain a counter for a set of keys, for which one way to go about it seems to be:
for ch in "AABCCDDD".chars() {
mymap.insert(ch, 0)
}
Is there a way to do it in a much better way in Rust, maybe something equivalent to what Ruby provides:
mymap = Hash.new(0)
mymap["b"] = 1
mymap["a"] # 0
Answering the problem you have...
I am looking to maintain a counter for a set of keys.
Then you want to look at How to lookup from and insert into a HashMap efficiently?. Hint: *map.entry(key).or_insert(0) += 1
Answering the question you asked...
How does one create a HashMap with a default value in Rust?
No, HashMaps do not have a place to store a default. Doing so would cause every user of that data structure to allocate space to store it, which would be a waste. You'd also have to handle the case where there is no appropriate default, or when a default cannot be easily created.
Instead, you can look up a value using HashMap::get and provide a default if it's missing using Option::unwrap_or:
use std::collections::HashMap;
fn main() {
let mut map: HashMap<char, usize> = HashMap::new();
map.insert('a', 42);
let a = map.get(&'a').cloned().unwrap_or(0);
let b = map.get(&'b').cloned().unwrap_or(0);
println!("{}, {}", a, b); // 42, 0
}
If unwrap_or doesn't work for your case, there are several similar functions that might:
Option::unwrap_or_else
Option::map_or
Option::map_or_else
Of course, you are welcome to wrap this in a function or a data structure to provide a nicer API.
ArtemGr brings up an interesting point:
in C++ there's a notion of a map inserting a default value when a key is accessed. That always seemed a bit leaky though: what if the type doesn't have a default? Rust is less demanding on the mapped types and more explicit about the presence (or absence) of a key.
Rust adds an additional wrinkle to this. Actually inserting a value would require that simply getting a value can also change the HashMap. This would invalidate any existing references to values in the HashMap, as a reallocation might be required. Thus you'd no longer be able to get references to two values at the same time! That would be very restrictive.
What about using entry to get an element from the HashMap, and then modify it.
From the docs:
fn entry(&mut self, key: K) -> Entry<K, V>
Gets the given key's corresponding entry in the map for in-place
manipulation.
example
use std::collections::HashMap;
let mut letters = HashMap::new();
for ch in "a short treatise on fungi".chars() {
let counter = letters.entry(ch).or_insert(0);
*counter += 1;
}
assert_eq!(letters[&'s'], 2);
assert_eq!(letters[&'t'], 3);
assert_eq!(letters[&'u'], 1);
assert_eq!(letters.get(&'y'), None);
.or_insert() and .or_insert_with()
Adding to the existing example for .entry().or_insert(), I wanted to mention that if the default value passed to .or_insert() is dynamically generated, it's better to use .or_insert_with().
Using .or_insert_with() as below, the default value is not generated if the key already exists. It only gets created when necessary.
for v in 0..s.len() {
components.entry(unions.get_root(v))
.or_insert_with(|| vec![]) // vec only created if needed.
.push(v);
}
In the snipped below, the default vector passed to .or_insert() is generated on every call. If the key exists, a vector is being created and then disposed of, which can be wasteful.
components.entry(unions.get_root(v))
.or_insert(vec![]) // vec always created.
.push(v);
So for fixed values that don't have much creation overhead, use .or_insert(), and for values that have appreciable creation overhead, use .or_insert_with().
A way to start a map with initial values is to construct the map from a vector of tuples. For instance, considering, the code below:
let map = vec![("field1".to_string(), value1), ("field2".to_string(), value2)].into_iter().collect::<HashMap<_, _>>();
I'm new to rust (using 0.10) and exploring its use by implementing something like the rustc::middle::graph::Graph struct, but using strings as node indices and storing nodes in a HashMap.
Assuming non-static keys, what's a reasonable and efficient policy for ownership of the strings? Does the HashMap need to own its keys? Does each NodeIndex need to own its str? Is it possible for the node to own the string that defines its index and have everything else borrow that string? More generally, how should one share an immutable (but non-static) string amongst several data structures? If the answer is "it depends", what are the relevant issues?
If it is possible to have ownership of the string in one place and borrow it elsewhere, how is that accomplished? For example, if the Node struct were modified to store the node index as a string, how would the HashMap and NodeIndex use a borrowed version of it?
Is it possible for the node to own the string that defines its index and have everything else borrow that string?
[...]
If it is possible to have ownership of the string in one place and borrow it elsewhere, how is that accomplished? For example, if the Node struct were modified to store the node index as a string, how would the HashMap and NodeIndex use a borrowed version of it?
Not really: it's impossible (for the compiler) to verify that self-references don't get invalidated, i.e. many borrowing-internally situations (including this one specifically) end up allowing code similar to
struct Foo<'a> {
things: Vec<~str>,
borrows: Vec<&'a str>
}
let mut foo = Foo { ... };
foo.things.push(~"x");
foo.things.push(~"y");
// foo.things is [~"x", ~"y"]
// try to borrow the ~"y" to put a "y" into borrows
foo.borrows.push(foo.things.get(1).as_slice());
// ... time/SLoC passes ...
// we make the borrowed slice point to freed memory
// by popping/deallocating the ~"y"
foo.things.pop();
The compiler has a very hard time tell that an arbitrary modification won't be like the .pop call and invalidate the internal pointers. Basically, putting a self-reference to data that an object owns into the object itself would have to freeze that object, so no further modifications could occur (including moving out of the struct, e.g. to return it).
Tl;dr: you can't have one part of the Graph storing ~strs and another part storing &str slices into those ~strs.
That said, if you were to use some unsafe code, and only allow the Graph to be expanded, i.e. never removing nodes (i.e. never letting a ~str be deallocated until the whole graph is being destroyed), then some version of this could actually work.
More generally, how should one share an immutable (but non-static) string amongst several data structures?
You can use a Rc<~str>, with each data structure storing its own pointer. Something like
struct Graph<T> {
nodes: HashMap<Rc<~str>, Node<T>>
}
struct Node<T> {
name: Rc<~str>,
data: T,
outgoing_edges: Vec<Rc<~str>>
}
Another approach would be using a bidirectional map connecting strings with an "interned" index of a simple type, something like
struct Graph<T> {
index: BiMap<~str, Index>,
nodes: HashMap<Index, Node<T>>
}
#[deriving(Hash, Eq, TotalEq)]
struct Index { x: uint }
struct Node<T> {
name: Index,
data: T,
outgoing_edges: Vec<Index>
}
(Unfortunately this is hypothetical: Rust's stdlib doesn't have a BiMap type like this (yet...).)