How to update a key in a HashMap? - rust

I can't find any method which allows me to update the key of a HashMap. There is get_key_value, but it returns an immutable reference and not a mutable reference.

You generally cannot [1]. From the HashMap documentation:
It is a logic error for a key to be modified in such a way that the key's hash, as determined by the Hash trait, or its equality, as determined by the Eq trait, changes while it is in the map. This is normally only possible through Cell, RefCell, global state, I/O, or unsafe code.
Instead, remove the value and re-insert it:
use std::{collections::HashMap, hash::Hash};

fn rename_key<K, V>(h: &mut HashMap<K, V>, old_key: &K, new_key: K)
where
    K: Eq + Hash,
{
    if let Some(v) = h.remove(old_key) {
        h.insert(new_key, v);
    }
}
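As a rough usage sketch of the remove-and-re-insert approach (the map contents and names such as `ages` are made up for illustration), the following shows the two edge cases worth knowing about: a missing old key leaves the map untouched, and an already-present new key has its value replaced.

```rust
use std::{collections::HashMap, hash::Hash};

fn rename_key<K, V>(h: &mut HashMap<K, V>, old_key: &K, new_key: K)
where
    K: Eq + Hash,
{
    // `remove` hands back ownership of the value, so no clone is needed.
    if let Some(v) = h.remove(old_key) {
        h.insert(new_key, v);
    }
}

fn main() {
    // Illustrative data only.
    let mut ages = HashMap::new();
    ages.insert("alice".to_string(), 30);
    ages.insert("carol".to_string(), 41);

    rename_key(&mut ages, &"alice".to_string(), "alicia".to_string());
    assert_eq!(ages.get("alicia"), Some(&30));
    assert!(!ages.contains_key("alice"));

    // Renaming a key that is not present does nothing.
    rename_key(&mut ages, &"bob".to_string(), "robert".to_string());
    assert_eq!(ages.len(), 2);

    // Renaming onto an existing key replaces that key's value.
    rename_key(&mut ages, &"alicia".to_string(), "carol".to_string());
    assert_eq!(ages.get("carol"), Some(&30));
    assert_eq!(ages.len(), 1);
}
```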
See also:
Changing key of HashMap from child method
[1]: As the documentation states, it's possible to modify the key, so long as you don't change how it is hashed or compared. Changing either will leave the HashMap in an invalid state.
An example of doing it correctly (although it's dubious why you would want to):
use std::{
    cell::RefCell,
    collections::HashMap,
    hash::{Hash, Hasher},
    sync::Arc,
};

#[derive(Debug)]
struct Example<A, B>(A, B);

impl<A, B> PartialEq for Example<A, B>
where
    A: PartialEq,
{
    fn eq(&self, other: &Self) -> bool {
        self.0 == other.0
    }
}

impl<A, B> Eq for Example<A, B> where A: Eq {}

impl<A, B> Hash for Example<A, B>
where
    A: Hash,
{
    fn hash<H>(&self, h: &mut H)
    where
        H: Hasher,
    {
        self.0.hash(h)
    }
}

fn main() {
    let mut h = HashMap::new();
    let key = Arc::new(Example(0, RefCell::new(false)));
    h.insert(key.clone(), "alpha");
    dbg!(&h);
    *key.1.borrow_mut() = true;
    dbg!(&h);
}
There are other techniques you can use to get a mutable reference to the key, such as the unstable raw_entry_mut (as mentioned by Sven Marnach).

Related

Create a Set of Sets

How does one create a set of sets in Rust? Is it necessary to write an impl block for every concrete type satisfying HashSet<HashSet<_>>?
Minimal failing example:
use std::collections::HashSet;

fn main() {
    let a: HashSet<u32> = HashSet::new();
    let mut c: HashSet<HashSet<u32>> = HashSet::new();
    c.insert(a); // error: `HashSet<u32>` does not implement `Hash`
}
Error:
"insert" method cannot be called on `std::collections::HashSet<std::collections::HashSet<u32>>` due to unsatisfied trait bounds
HashSet doesn't satisfy `std::collections::HashSet<u32>: Hash`
Is it possible to override the fact that HashSet is unhashable? I'd like to use a HashSet and need my contents to be unique by actual (memory) equality; I don't need them to be unique by contents.
I'd like to have a set of sets and want them to be unique by "actual" (memory) equality, not by contents.
To do so you first need to box the hashset so that it has a stable memory address. For example:
use std::{collections::HashSet, hash::{Hash, Hasher}, rc::Rc};
struct Set<T>(Box<HashSet<T>>);
To make your Set hashable, you'll need to implement Hash and Eq:
impl<T> Set<T> {
    fn as_addr(&self) -> usize {
        // as_ref() gives the reference to the heap-allocated contents
        // inside the Box, which is stable; convert that reference to a
        // pointer and then to usize, and use it for hashing and equality.
        self.0.as_ref() as *const _ as usize
    }
}

impl<T> Hash for Set<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.as_addr().hash(state);
    }
}

impl<T> Eq for Set<T> {}

impl<T> PartialEq for Set<T> {
    fn eq(&self, other: &Self) -> bool {
        self.as_addr() == other.as_addr()
    }
}
Finally, you'll need to add some set-like methods and a constructor to make it usable:
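impl<T: Hash + Eq> Set<T> {
    pub fn new() -> Self {
        Set(Box::new(HashSet::new()))
    }

    pub fn insert(&mut self, value: T) {
        self.0.insert(value);
    }

    // A lookup does not modify the set, so `&self` is enough; this also
    // lets `contains` be called through an `Rc<Set<T>>`.
    pub fn contains(&self, value: &T) -> bool {
        self.0.contains(value)
    }
}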
Now your code will work, with the additional use of Rc so that you have the original Set available for lookup after you insert it:
fn main() {
    let mut a: Set<u32> = Set::new();
    a.insert(1);
    let a = Rc::new(a);
    let mut c: HashSet<_> = HashSet::new();
    c.insert(Rc::clone(&a));
    assert!(c.contains(&a));
}
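Since the example already wraps the set in an Rc, a related sketch is to take the identity from the Rc allocation itself. This swaps in a different technique from the Box-based Set above: the wrapper name `ById` is hypothetical, and it relies on `Rc::as_ptr` and `Rc::ptr_eq` from the standard library.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

// Hypothetical wrapper: hashes and compares by the address of the Rc allocation.
struct ById<T>(Rc<T>);

impl<T> Hash for ById<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Raw pointers hash by address, not by pointee contents.
        Rc::as_ptr(&self.0).hash(state);
    }
}

impl<T> PartialEq for ById<T> {
    fn eq(&self, other: &Self) -> bool {
        Rc::ptr_eq(&self.0, &other.0)
    }
}

impl<T> Eq for ById<T> {}

fn main() {
    let a: Rc<HashSet<u32>> = Rc::new(HashSet::from([1]));
    // Same contents as `a`, but a separate allocation.
    let b: Rc<HashSet<u32>> = Rc::new(HashSet::from([1]));

    let mut c = HashSet::new();
    c.insert(ById(Rc::clone(&a)));
    assert!(c.contains(&ById(Rc::clone(&a))));
    assert!(!c.contains(&ById(Rc::clone(&b))));

    // Both sets fit in the outer set because identity, not content, is compared.
    c.insert(ById(b));
    assert_eq!(c.len(), 2);
}
```

Because the inner sets sit behind Rc, they cannot be mutated after insertion without interior mutability, which may or may not suit the use case.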
As pointed out helpfully in the comments, it's not possible to hash sets by address because they have no fixed address. An effective, if inelegant, solution is to wrap them in a specialized struct:
struct HashableHashSet<T> {
    hash: ...
    hashset: HashSet<T>
}
And then hash the struct by memory equality.

How do I remove excessive `clone` calls from a struct that caches arbitrary results?

I am reading the section on closures in the second edition of the Rust book. At the end of this section, there is an exercise to extend the Cacher implementation given before. I gave it a try:
use std::clone::Clone;
use std::cmp::Eq;
use std::collections::HashMap;
use std::hash::Hash;

struct Cacher<T, K, V>
where
    T: Fn(K) -> V,
    K: Eq + Hash + Clone,
    V: Clone,
{
    calculation: T,
    values: HashMap<K, V>,
}

impl<T, K, V> Cacher<T, K, V>
where
    T: Fn(K) -> V,
    K: Eq + Hash + Clone,
    V: Clone,
{
    fn new(calculation: T) -> Cacher<T, K, V> {
        Cacher {
            calculation,
            values: HashMap::new(),
        }
    }

    fn value(&mut self, arg: K) -> V {
        match self.values.clone().get(&arg) {
            Some(v) => v.clone(),
            None => {
                self.values
                    .insert(arg.clone(), (self.calculation)(arg.clone()));
                self.values.get(&arg).unwrap().clone()
            }
        }
    }
}
After creating a version that finally works, I am really unhappy with it. What really bugs me is that cacher.value(...) has 5(!) calls to clone() in it. Is there a way to avoid this?
Your suspicion is correct: the code contains too many calls to clone(), defeating the very optimizations Cacher is designed to achieve.
Cloning the entire cache
The one to start with is the call to self.values.clone() - it creates a copy of the entire cache on every single access.
After non-lexical lifetimes
Remove this clone.
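A minimal sketch of what that might look like on a compiler with non-lexical lifetimes (the 2018 edition and later). It assumes the question's original `Fn(K) -> V` bounds, and it also computes the value only once on a miss, which goes slightly beyond just deleting the one call.

```rust
use std::collections::HashMap;
use std::hash::Hash;

struct Cacher<T, K, V>
where
    T: Fn(K) -> V,
    K: Eq + Hash + Clone,
    V: Clone,
{
    calculation: T,
    values: HashMap<K, V>,
}

impl<T, K, V> Cacher<T, K, V>
where
    T: Fn(K) -> V,
    K: Eq + Hash + Clone,
    V: Clone,
{
    fn new(calculation: T) -> Self {
        Cacher {
            calculation,
            values: HashMap::new(),
        }
    }

    fn value(&mut self, arg: K) -> V {
        // No `self.values.clone()`: the shared borrow produced by `get`
        // is not live in the `None` arm, so `insert` is accepted there.
        match self.values.get(&arg) {
            Some(v) => v.clone(),
            None => {
                let v = (self.calculation)(arg.clone());
                self.values.insert(arg, v.clone());
                v
            }
        }
    }
}

fn main() {
    let mut c = Cacher::new(|x: u32| x * 2);
    assert_eq!(c.value(4), 8);
    assert_eq!(c.value(4), 8); // served from the map
    assert_eq!(c.values.len(), 1);
}
```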
Before non-lexical lifetimes
As you likely discovered yourself, simply removing .clone() doesn't compile. This is because the borrow checker considers the map referenced for the entire duration of the match. The shared reference returned by HashMap::get points to the item inside the map, which means that while it exists, it is forbidden to create another mutable reference to the same map, which is required by HashMap::insert. For the code to compile, you need to split up the match in order to force the shared reference to go out of scope before insert is invoked:
// avoids unnecessary clone of the whole map
fn value(&mut self, arg: K) -> V {
    if let Some(v) = self.values.get(&arg).map(V::clone) {
        return v;
    } else {
        let v = (self.calculation)(arg.clone());
        self.values.insert(arg, v.clone());
        v
    }
}
This is much better and probably "good enough" for most practical purposes. The hot path, where the value is already cached, now consists of only a single clone, and that one is actually necessary because the original value must remain in the hash map. (Also, note that cloning doesn't need to be expensive or imply deep copying - the stored value can be an Rc<RealValue>, which buys object sharing for free. In that case, clone() will simply increment the reference count on the object.)
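To illustrate the Rc point, here is a small sketch; the helper `cached` and the `Vec<u8>` payload are stand-ins chosen for the example, not part of the Cacher code.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical helper: hand out a shared handle to a cached value.
fn cached(cache: &HashMap<u32, Rc<Vec<u8>>>, key: u32) -> Option<Rc<Vec<u8>>> {
    // Rc::clone copies a pointer and increments the reference count;
    // the Vec behind it is not copied.
    cache.get(&key).map(Rc::clone)
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert(1, Rc::new(vec![0u8; 1024]));

    let hit = cached(&cache, 1).unwrap();
    // Two owners now: the map and `hit`, both pointing at the same allocation.
    assert_eq!(Rc::strong_count(&hit), 2);
    assert!(Rc::ptr_eq(&hit, &cache[&1]));
}
```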
Clone on cache miss
In case of cache miss, the key must be cloned, because calculation is declared to consume it. A single cloning will be sufficient, though, so we can pass the original arg to insert without cloning it again. The key clone still feels unnecessary, though - a calculation function shouldn't require ownership of the key it is transforming. Removing this clone boils down to modifying the signature of the calculation function to take the key by reference. Changing the trait bounds of T to T: Fn(&K) -> V allows the following formulation of value():
// avoids unnecessary clone of the key
fn value(&mut self, arg: K) -> V {
    if let Some(v) = self.values.get(&arg).map(V::clone) {
        return v;
    } else {
        let v = (self.calculation)(&arg);
        self.values.insert(arg, v.clone());
        v
    }
}
Avoiding double lookups
Now we are left with exactly two calls to clone(), one in each code path. This is optimal, as far as value cloning is concerned, but the careful reader will still be nagged by one detail: in case of cache miss, the hash table lookup will effectively happen twice for the same key: once in the call to HashMap::get, and then once more in HashMap::insert. It would be nice if we could instead reuse the work done the first time and perform only one hash map lookup. This can be achieved by replacing get() and insert() with entry():
// avoids the second lookup on cache miss
// (requires `use std::collections::hash_map::Entry;`)
fn value(&mut self, arg: K) -> V {
    match self.values.entry(arg) {
        Entry::Occupied(entry) => entry.into_mut(),
        Entry::Vacant(entry) => {
            let v = (self.calculation)(entry.key());
            entry.insert(v)
        }
    }.clone()
}
We've also taken the opportunity to move the .clone() call after the match.
Runnable example in the playground.
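For reference, a self-contained sketch that assembles the pieces above into one program. It assumes the `T: Fn(&K) -> V` bound discussed earlier; the `Cell` counter is only there to show that the calculation runs once per key.

```rust
use std::cell::Cell;
use std::collections::{hash_map::Entry, HashMap};
use std::hash::Hash;

struct Cacher<T, K, V>
where
    T: Fn(&K) -> V,
    K: Eq + Hash,
    V: Clone,
{
    calculation: T,
    values: HashMap<K, V>,
}

impl<T, K, V> Cacher<T, K, V>
where
    T: Fn(&K) -> V,
    K: Eq + Hash,
    V: Clone,
{
    fn new(calculation: T) -> Self {
        Cacher {
            calculation,
            values: HashMap::new(),
        }
    }

    fn value(&mut self, arg: K) -> V {
        // A single lookup: `entry` finds the slot once for both paths.
        match self.values.entry(arg) {
            Entry::Occupied(entry) => entry.into_mut(),
            Entry::Vacant(entry) => {
                let v = (self.calculation)(entry.key());
                entry.insert(v)
            }
        }
        .clone()
    }
}

fn main() {
    let calls = Cell::new(0);
    let mut c = Cacher::new(|s: &String| {
        calls.set(calls.get() + 1);
        s.len()
    });
    assert_eq!(c.value("abc".to_string()), 3);
    assert_eq!(c.value("abc".to_string()), 3);
    // The second call was a cache hit, so the closure ran only once.
    assert_eq!(calls.get(), 1);
}
```

Note that the key no longer needs to be Clone in this formulation, since it is moved into the map and only borrowed by the calculation.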
I was solving the same exercise and ended up with the following code:
use std::thread;
use std::time::Duration;
use std::collections::HashMap;
use std::hash::Hash;
use std::fmt::Display;

struct Cacher<P, R, T>
where
    T: Fn(&P) -> R,
    P: Eq + Hash + Clone,
{
    calculation: T,
    values: HashMap<P, R>,
}

impl<P, R, T> Cacher<P, R, T>
where
    T: Fn(&P) -> R,
    P: Eq + Hash + Clone,
{
    fn new(calculation: T) -> Cacher<P, R, T> {
        Cacher {
            calculation,
            values: HashMap::new(),
        }
    }

    fn value<'a>(&'a mut self, key: P) -> &'a R {
        let calculation = &self.calculation;
        let key_copy = key.clone();
        self.values
            .entry(key_copy)
            .or_insert_with(|| (calculation)(&key))
    }
}
It only makes a single copy of the key in the value() method. It does not copy the resulting value, but instead returns a reference with a lifetime specifier, which is equal to the lifetime of the enclosing Cacher instance (which is logical, I think, because values in the map will continue to exist until the Cacher itself is dropped).
Here's a test program:
fn main() {
    let mut cacher1 = Cacher::new(|num: &u32| -> u32 {
        println!("calculating slowly...");
        thread::sleep(Duration::from_secs(2));
        *num
    });
    calculate_and_print(10, &mut cacher1);
    calculate_and_print(20, &mut cacher1);
    calculate_and_print(10, &mut cacher1);

    let mut cacher2 = Cacher::new(|str: &&str| -> usize {
        println!("calculating slowly...");
        thread::sleep(Duration::from_secs(2));
        str.len()
    });
    calculate_and_print("abc", &mut cacher2);
    calculate_and_print("defghi", &mut cacher2);
    calculate_and_print("abc", &mut cacher2);
}

fn calculate_and_print<P, R, T>(intensity: P, cacher: &mut Cacher<P, R, T>)
where
    T: Fn(&P) -> R,
    P: Eq + Hash + Clone,
    R: Display,
{
    println!("{}", cacher.value(intensity));
}
And its output:
calculating slowly...
10
calculating slowly...
20
10
calculating slowly...
3
calculating slowly...
6
3
If you drop the requirement of returning owned values, you can avoid all clones by making use of the Entry:
use std::{
    collections::{hash_map::Entry, HashMap},
    fmt::Display,
    hash::Hash,
    thread,
    time::Duration,
};

struct Cacher<P, R, T>
where
    T: Fn(&P) -> R,
    P: Eq + Hash,
{
    calculation: T,
    values: HashMap<P, R>,
}

impl<P, R, T> Cacher<P, R, T>
where
    T: Fn(&P) -> R,
    P: Eq + Hash,
{
    fn new(calculation: T) -> Cacher<P, R, T> {
        Cacher {
            calculation,
            values: HashMap::new(),
        }
    }

    fn value<'a>(&'a mut self, key: P) -> &'a R {
        let calculation = &self.calculation;
        match self.values.entry(key) {
            Entry::Occupied(e) => e.into_mut(),
            Entry::Vacant(e) => {
                let result = (calculation)(e.key());
                e.insert(result)
            }
        }
    }
}

fn main() {
    let mut cacher1 = Cacher::new(|num: &u32| -> u32 {
        println!("calculating slowly...");
        thread::sleep(Duration::from_secs(1));
        *num
    });
    calculate_and_print(10, &mut cacher1);
    calculate_and_print(20, &mut cacher1);
    calculate_and_print(10, &mut cacher1);

    let mut cacher2 = Cacher::new(|str: &&str| -> usize {
        println!("calculating slowly...");
        thread::sleep(Duration::from_secs(2));
        str.len()
    });
    calculate_and_print("abc", &mut cacher2);
    calculate_and_print("defghi", &mut cacher2);
    calculate_and_print("abc", &mut cacher2);
}

fn calculate_and_print<P, R, T>(intensity: P, cacher: &mut Cacher<P, R, T>)
where
    T: Fn(&P) -> R,
    P: Eq + Hash,
    R: Display,
{
    println!("{}", cacher.value(intensity));
}
You could then choose to wrap this in another struct that performed a clone:
struct ValueCacher<P, R, T>
where
    T: Fn(&P) -> R,
    P: Eq + Hash,
    R: Clone,
{
    cacher: Cacher<P, R, T>,
}

impl<P, R, T> ValueCacher<P, R, T>
where
    T: Fn(&P) -> R,
    P: Eq + Hash,
    R: Clone,
{
    fn new(calculation: T) -> Self {
        Self {
            cacher: Cacher::new(calculation),
        }
    }

    fn value(&mut self, key: P) -> R {
        self.cacher.value(key).clone()
    }
}

Implementing only IndexMut without implementing Index

I'm trying to create a DefaultHashMap struct which is basically a wrapper around HashMap, with the difference that when getting a key that is not in the map the default value is put in that key and is returned.
I made a get and a get_mut method and this works fine. Now I'm trying to implement Index and IndexMut as wrappers around those methods. Here I'm running into two problems.
The first problem is caused by the fact that get has to mutate the struct when the key is not present, so it requires a mutable reference. However, the signature for the index method of Index has &self instead of &mut self, so I cannot implement it.
This causes a second problem: IndexMut requires an Index implementation. So even though IndexMut itself would be no problem to implement, I cannot do so because Index cannot be implemented.
The first problem is annoying but understandable. For the second one, I don't get why the requirement is even there. I would like to have a way to work around it. Right now I'm doing the following, but I hope someone has a better solution:
impl<K: Eq + Hash, V: Clone> Index<K> for DefaultHashMap<K, V> {
    type Output = V;

    fn index(&self, _: K) -> &V {
        panic!("DefaultHashMap doesn't implement indexing without mutating")
    }
}

impl<K: Eq + Hash, V: Clone> IndexMut<K> for DefaultHashMap<K, V> {
    #[inline]
    fn index_mut(&mut self, index: K) -> &mut V {
        self.get_mut(index)
    }
}
First, I suspect that your requirement "when getting a key that is not in the map the default value is put in that key" is not exactly required!
Consider an immutable access let foo = default_hash_map[bar] + 123;. Unless you're going to use values with interior mutability with the map it might be inconsequential whether default_hash_map[bar] is actually creating a key or just returns a reference to a single default value.
Now, if you really need to create new entries during access then there is a way to do this. The borrow checker restriction that only allows you to add new entries with a mutable access is here to stop you from creating the dangling pointers that would occur whenever you modify the map while holding the references in there. But if you were using a structure with stable references, where stable means that the references are not invalidated when you enter new entries into the structure, then the problem the borrow checker is trying to prevent will go away.
In C++ I would've considered using a deque which is guaranteed by the standard not to invalidate its references when you add new entries to it. Unfortunately, Rust deques are different (though you can probably find arena allocator crates with properties similar to the C++ deque) and so for this example I'm using Box. The boxed values reside separately on the heap and aren't moved when you add new entries into HashMap.
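The stability claim can be checked with a small sketch. The helper `value_addr` is hypothetical and only compares raw addresses, it never dereferences them; the loop size is an arbitrary number picked to force the table to grow several times.

```rust
use std::collections::HashMap;

// Hypothetical helper: address of the heap-allocated String behind the Box.
// The pointer is only compared, never dereferenced.
fn value_addr(map: &HashMap<u32, Box<String>>, key: u32) -> *const String {
    &**map.get(&key).unwrap() as *const String
}

fn main() {
    let mut map: HashMap<u32, Box<String>> = HashMap::new();
    map.insert(0, Box::new("first".to_string()));
    let before = value_addr(&map, 0);

    // Insert enough entries to make the table reallocate and rehash.
    for i in 1..10_000 {
        map.insert(i, Box::new(i.to_string()));
    }

    // The Box (a pointer) was moved around inside the table,
    // but the String it points to stayed where it was.
    let after = value_addr(&map, 0);
    assert_eq!(before, after);
}
```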
Now, your normal access pattern is probably going to be modifying the new entries and then accessing the existing entries of the map. Thus making new entries in Index::index is an exception and shouldn't slow down the rest of the map. It might make sense therefore to pay the boxing price only for the Index::index access. To do that we might use a second structure, that keeps just the boxed Index::index values.
Knowing that HashMap<K, Box<V>> can be inserted into without invalidating the existing V references allows us to use it as a temporary buffer, holding the Index::index-created values until we get a chance to synchronize them with the primary HashMap.
use std::borrow::Borrow;
use std::cell::UnsafeCell;
use std::collections::HashMap;
use std::hash::Hash;
use std::ops::Index;
use std::ops::IndexMut;

struct DefaultHashMap<K, V>(HashMap<K, V>, UnsafeCell<HashMap<K, Box<V>>>, V);

impl<K, V> DefaultHashMap<K, V>
    where K: Eq + Hash
{
    fn sync(&mut self) {
        let buf_map = unsafe { &mut *self.1.get() };
        for (k, v) in buf_map.drain() {
            self.0.insert(k, *v);
        }
    }
}

impl<'a, K, V, Q: ?Sized> Index<&'a Q> for DefaultHashMap<K, V>
    where K: Eq + Hash + Clone,
          K: Borrow<Q>,
          K: From<&'a Q>,
          Q: Eq + Hash,
          V: Clone
{
    type Output = V;

    fn index(&self, key: &'a Q) -> &V {
        if let Some(v) = self.0.get(key) {
            v
        } else {
            let buf_map: &mut HashMap<K, Box<V>> = unsafe { &mut *self.1.get() };
            if !buf_map.contains_key(key) {
                buf_map.insert(K::from(key), Box::new(self.2.clone()));
            }
            &*buf_map.get(key).unwrap()
        }
    }
}

impl<'a, K, V, Q: ?Sized> IndexMut<&'a Q> for DefaultHashMap<K, V>
    where K: Eq + Hash + Clone,
          K: Borrow<Q>,
          K: From<&'a Q>,
          Q: Eq + Hash,
          V: Clone
{
    fn index_mut(&mut self, key: &'a Q) -> &mut V {
        self.sync();
        if self.0.contains_key(key) {
            self.0.get_mut(key).unwrap()
        } else {
            self.0.insert(K::from(key), self.2.clone());
            self.0.get_mut(key).unwrap()
        }
    }
}

fn main() {
    {
        let mut dhm = DefaultHashMap::<String, String>(HashMap::new(),
                                                       UnsafeCell::new(HashMap::new()),
                                                       "bar".into());
        for i in 0..10000 {
            dhm[&format!("{}", i % 1000)[..]].push('x')
        }
        println!("{:?}", dhm.0);
    }

    {
        let mut dhm = DefaultHashMap::<String, String>(HashMap::new(),
                                                       UnsafeCell::new(HashMap::new()),
                                                       "bar".into());
        for i in 0..10000 {
            let key = format!("{}", i % 1000);
            assert!(dhm[&key].len() >= 3);
            dhm[&key[..]].push('x');
        }
        println!("{:?}", dhm.0);
    }

    {
        #[derive(Eq, PartialEq, Clone, Copy, Hash, Debug)]
        struct K(u32);

        impl<'a> From<&'a u32> for K {
            fn from(v: &u32) -> K {
                K(*v)
            }
        }

        impl<'a> Borrow<u32> for K {
            fn borrow(&self) -> &u32 {
                &self.0
            }
        }

        let mut dhm = DefaultHashMap::<K, K>(HashMap::new(),
                                             UnsafeCell::new(HashMap::new()),
                                             K::from(&123));
        for i in 0..10000 {
            let key = i % 1000;
            assert!(dhm[&key].0 >= 123);
            dhm[&key].0 += 1;
        }
        println!("{:?}", dhm.0);
    }
}
Note that boxing only stabilizes the insertion of new entries. To remove the boxed entries you still need the mutable (&mut self) access to DefaultHashMap.

How to write a safe wrap for HashMap with default value

I implemented a wrapper for HashMap with default values and I would like to know if it's safe.
When get is called, the internal map may be resized, and previous references to values (obtained with get) would then point to an invalid address. I tried to solve this problem using the idea that "all problems in computer science can be solved by another level of indirection" (Butler Lampson). I would like to know if this trick makes this code safe.
use std::cell::UnsafeCell;
use std::collections::HashMap;
use std::hash::Hash;

pub struct DefaultHashMap<I: Hash + Eq, T: Clone> {
    default: T,
    map: UnsafeCell<HashMap<I, Box<T>>>,
}

impl<I: Hash + Eq, T: Clone> DefaultHashMap<I, T> {
    pub fn new(default: T) -> Self {
        DefaultHashMap {
            default: default,
            map: UnsafeCell::new(HashMap::new()),
        }
    }

    pub fn get_mut(&mut self, v: I) -> &mut T {
        let m = unsafe { &mut *self.map.get() };
        m.entry(v).or_insert_with(|| Box::new(self.default.clone()))
    }

    pub fn get(&self, v: I) -> &T {
        let m = unsafe { &mut *self.map.get() };
        m.entry(v).or_insert_with(|| Box::new(self.default.clone()))
    }
}

#[test]
fn test() {
    let mut m = DefaultHashMap::new(10usize);
    *m.get_mut(4) = 40;
    let a = m.get(4);
    for i in 1..1024 {
        m.get(i);
    }
    assert_eq!(a, m.get(4));
    assert_eq!(40, *m.get(4));
}
Since you cannot [1] mutate the value returned from get, I'd just return a reference to the default value when the value is missing. When you call get_mut however, you can then add the value to the map and return the reference to the newly-added value.
This has the nice benefit of not needing any unsafe code.
use std::{borrow::Borrow, collections::HashMap, hash::Hash};

pub struct DefaultHashMap<K, V> {
    default: V,
    map: HashMap<K, V>,
}

impl<K, V> DefaultHashMap<K, V>
where
    K: Hash + Eq,
    V: Clone,
{
    pub fn new(default: V) -> Self {
        DefaultHashMap {
            default,
            map: HashMap::new(),
        }
    }

    pub fn get_mut(&mut self, v: K) -> &mut V {
        let def = &self.default;
        self.map.entry(v).or_insert_with(|| def.clone())
    }

    pub fn get<B>(&self, v: B) -> &V
    where
        B: Borrow<K>,
    {
        self.map.get(v.borrow()).unwrap_or(&self.default)
    }
}

#[test]
fn test() {
    let mut m = DefaultHashMap::new(10usize);
    *m.get_mut(4) = 40;
    let a = m.get(4);
    for i in 1..1024 {
        m.get(i);
    }
    assert_eq!(a, m.get(4));
    assert_eq!(40, *m.get(4));
}
[1]: Technically this will have different behavior if your default value contains internal mutability. In that case, modifications to the default value would apply across the collection. If that's a concern, you'd need to use a solution closer to your original.
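A small sketch of that caveat, using a pared-down version of the struct (only `new` and a by-reference `get`, both simplified for the example) and `Cell<i32>` as an illustrative value type with interior mutability:

```rust
use std::{cell::Cell, collections::HashMap, hash::Hash};

// Pared-down version of the safe DefaultHashMap, for illustration only.
struct DefaultHashMap<K, V> {
    default: V,
    map: HashMap<K, V>,
}

impl<K: Hash + Eq, V> DefaultHashMap<K, V> {
    fn new(default: V) -> Self {
        DefaultHashMap {
            default,
            map: HashMap::new(),
        }
    }

    fn get(&self, k: &K) -> &V {
        self.map.get(k).unwrap_or(&self.default)
    }
}

fn main() {
    let m: DefaultHashMap<u32, Cell<i32>> = DefaultHashMap::new(Cell::new(0));

    // Key 1 is missing, so this writes through to the shared default...
    m.get(&1).set(5);

    // ...and every other missing key now observes the change.
    assert_eq!(m.get(&2).get(), 5);
    assert!(m.map.is_empty());
}
```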
I think that you are covered by the borrowing rules here.
Applying the Mutability XOR Aliasing principle here, unsafety would crop up if you could maintain multiple paths to the same value and mutate something at the same time.
In your case, however:
while the internal HashMap can be mutated even through an aliasable reference to DefaultHashMap, nobody has a reference into the HashMap itself
while there are references into the Box, there is no possibility here to erase a Box, so no dangling pointer from here
since you take care to preserve the borrowing relationship (i.e., &mut T is only obtained through a &mut DefaultHashMap), it is not possible to have a &mut T and an alias into it
So, your short example looks safe. However, be especially careful not to accidentally introduce a method on &DefaultHashMap that would allow modifying an existing value, as this would be a short road to dangling pointers.
Personally, I would execute all tests with an Option<String>.

Is it possible to use a HashSet as the key to a HashMap?

I would like to use a HashSet as the key to a HashMap. Is this possible?
use std::collections::{HashMap, HashSet};

fn main() {
    let hmap: HashMap<HashSet<usize>, String> = HashMap::new();
}
gives the following error:
error[E0277]: the trait bound `std::collections::HashSet<usize>: std::hash::Hash` is not satisfied
 --> src/main.rs:4:49
  |
4 |     let hmap: HashMap<HashSet<usize>, String> = HashMap::new();
  |                                                 ^^^^^^^^^^^^ the trait `std::hash::Hash` is not implemented for `std::collections::HashSet<usize>`
  |
  = note: required by `<std::collections::HashMap<K, V>>::new`
To make something the key of a HashMap, you need to satisfy 3 traits:
Hash — How do you calculate a hash value for the type?
PartialEq — How do you decide if two instances of a type are the same?
Eq — Can you guarantee that the equality is reflexive, symmetric, and transitive? This requires PartialEq.
This is based on the definition of HashMap:
impl<K: Hash + Eq, V> HashMap<K, V, RandomState> {
    pub fn new() -> HashMap<K, V, RandomState> { /* ... */ }
}
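For types you define yourself, the three traits can usually be derived. A minimal sketch, with a made-up `Point` type standing in for whatever the key is:

```rust
use std::collections::HashMap;

// Deriving all three traits makes the type usable as a HashMap key.
// `Point` is a made-up type for illustration.
#[derive(Debug, PartialEq, Eq, Hash)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let mut names: HashMap<Point, &str> = HashMap::new();
    names.insert(Point { x: 0, y: 0 }, "origin");

    // Lookup succeeds for a key that is equal and hashes identically.
    assert_eq!(names.get(&Point { x: 0, y: 0 }), Some(&"origin"));
    assert_eq!(names.get(&Point { x: 1, y: 0 }), None);
}
```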
Checking out the docs for HashSet, you can see what traits it implements (listed at the bottom of the page).
There isn't an implementation of Hash for HashSet, so it cannot be used as a key in a HashMap. That being said, if you have a rational way of computing the hash of a HashSet, then you could create a "newtype" around the HashSet and implement these three traits on it.
Here's an example for the "newtype":
use std::{
    collections::{HashMap, HashSet},
    hash::{Hash, Hasher},
};

struct Wrapper<T>(HashSet<T>);

impl<T> PartialEq for Wrapper<T>
where
    T: Eq + Hash,
{
    fn eq(&self, other: &Wrapper<T>) -> bool {
        self.0 == other.0
    }
}

impl<T> Eq for Wrapper<T> where T: Eq + Hash {}

impl<T> Hash for Wrapper<T> {
    fn hash<H>(&self, _state: &mut H)
    where
        H: Hasher,
    {
        // do something smart here!!!
    }
}

fn main() {
    let hmap: HashMap<Wrapper<u32>, String> = HashMap::new();
}
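One possible way to fill in the "do something smart" part, offered as a sketch rather than the answer's own method: hash every element separately and combine the results with a commutative operation, so that the set's arbitrary iteration order cannot affect the outcome. The choice of `DefaultHasher` and of wrapping addition is an assumption made for this example.

```rust
use std::{
    collections::{hash_map::DefaultHasher, HashMap, HashSet},
    hash::{Hash, Hasher},
};

struct Wrapper<T>(HashSet<T>);

impl<T: Eq + Hash> PartialEq for Wrapper<T> {
    fn eq(&self, other: &Self) -> bool {
        self.0 == other.0
    }
}

impl<T: Eq + Hash> Eq for Wrapper<T> {}

impl<T: Hash> Hash for Wrapper<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash each element on its own, then add the results together.
        // Addition is commutative, so iteration order does not matter.
        let mut combined: u64 = 0;
        for item in &self.0 {
            let mut h = DefaultHasher::new();
            item.hash(&mut h);
            combined = combined.wrapping_add(h.finish());
        }
        state.write_usize(self.0.len());
        state.write_u64(combined);
    }
}

fn main() {
    let mut hmap: HashMap<Wrapper<u32>, String> = HashMap::new();
    hmap.insert(Wrapper(HashSet::from([1, 2, 3])), "abc".to_string());

    // The same elements, built in a different order, form an equal key.
    let probe = Wrapper(HashSet::from([3, 1, 2]));
    assert_eq!(hmap.get(&probe).map(String::as_str), Some("abc"));

    let other = Wrapper(HashSet::from([1, 2]));
    assert!(hmap.get(&other).is_none());
}
```

If the element type is Ord, a BTreeSet may be simpler, since it already implements Hash and can be used as a key directly.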
