How to write a safe wrap for HashMap with default value - rust

I implemented a wrap for HashMap with default values and I would like to know if it's safe.
When get is called, the internal map may be resized and previous references to values (obtained with get) would be pointing to invalid address. I tried to solve this problem using the idea that "all problems in computer science can be solved by another level of indirection" (Butler Lampson). I would like to know if this trick makes this code safe.
use std::cell::UnsafeCell;
use std::collections::HashMap;
use std::hash::Hash;
pub struct DefaultHashMap<I: Hash + Eq, T: Clone> {
default: T,
map: UnsafeCell<HashMap<I, Box<T>>>,
}
impl<I: Hash + Eq, T: Clone> DefaultHashMap<I, T> {
pub fn new(default: T) -> Self {
DefaultHashMap {
default: default,
map: UnsafeCell::new(HashMap::new()),
}
}
pub fn get_mut(&mut self, v: I) -> &mut T {
let m = unsafe { &mut *self.map.get() };
m.entry(v).or_insert_with(|| Box::new(self.default.clone()))
}
pub fn get(&self, v: I) -> &T {
let m = unsafe { &mut *self.map.get() };
m.entry(v).or_insert_with(|| Box::new(self.default.clone()))
}
}
#[test]
fn test() {
let mut m = DefaultHashMap::new(10usize);
*m.get_mut(4) = 40;
let a = m.get(4);
for i in 1..1024 {
m.get(i);
}
assert_eq!(a, m.get(4));
assert_eq!(40, *m.get(4));
}
(Playground)

Since you cannot1 mutate the value returned from get, I'd just return a reference to the default value when the value is missing. When you call get_mut however, you can then add the value to the map and return the reference to the newly-added value.
This has the nice benefit of not needing any unsafe code.
use std::{borrow::Borrow, collections::HashMap, hash::Hash};
pub struct DefaultHashMap<K, V> {
default: V,
map: HashMap<K, V>,
}
impl<K, V> DefaultHashMap<K, V>
where
K: Hash + Eq,
V: Clone,
{
pub fn new(default: V) -> Self {
DefaultHashMap {
default,
map: HashMap::new(),
}
}
pub fn get_mut(&mut self, v: K) -> &mut V {
let def = &self.default;
self.map.entry(v).or_insert_with(|| def.clone())
}
pub fn get<B>(&self, v: B) -> &V
where
B: Borrow<K>,
{
self.map.get(v.borrow()).unwrap_or(&self.default)
}
}
#[test]
fn test() {
let mut m = DefaultHashMap::new(10usize);
*m.get_mut(4) = 40;
let a = m.get(4);
for i in 1..1024 {
m.get(i);
}
assert_eq!(a, m.get(4));
assert_eq!(40, *m.get(4));
}
[1]: Technically this will have different behavior if your default value contains internal mutability. In that case, modifications to the default value would apply across the collection. If that's a concern, you'd need to use a solution closer to your original.

I think that you are covered by the borrowing rules here.
Applying the Mutability XOR Aliasing principle here, unsafety would crop up if you could maintain multiple paths to the same value and mutate something at the same time.
In your case, however:
while the internal HashMap can be mutated even through an aliasable reference to DefaultHashMap, nobody has a reference into the HashMap itself
while there are references into the Box, there is no possibility here to erase a Box, so no dangling pointer from here
since you take care to preserve the borrowing relationship (ie, &mut T is only obtained through a &mut DefaultHashMap), it is not possible to have a &mut T and an alias into it
So, your short example looks safe, however be especially wary of not accidentally introducing a method on &DefaultHashMap which would allow to modify an existing value as this would be a short road to dangling pointers.
Personally, I would execute all tests with an Option<String>.

Related

Create a Set of Sets

How does one create a set of sets in Rust? Is it necessary to write an impl block for every concrete type satisfying HashSet<HashSet<_>>?
Minimal failing example:
fn main () {
let a: HashSet<u32> = HashSet::new();
let c: HashSet<HashSet<u32>> = HashSet::new();
c.insert(a);
}
Error:
"insert" method cannot be called on `std::collections::HashSet<std::collections::HashSet<u32>>` due to unsatisfied trait bounds
HashSet doesn't satisfy `std::collections::HashSet<u32>: Hash
Is it possible to override the fact that HashSet is unhashable? I'd like to use a HashSet and need my contents to be unique by actual (memory) equality; I don't need to unique by contents.
I'd like to have a set of sets and want them to be unique by "actual" (memory) equality, not by contents.
To do so you first need to box the hashset so that it has a stable memory address. For example:
struct Set<T>(Box<HashSet<T>>);
To make your Set hashable, you'll need to implement Hash and Eq:
impl<T> Set<T> {
fn as_addr(&self) -> usize {
// as_ref() gives the reference to the heap-allocated contents
// inside the Box, which is stable; convert that reference to a
// pointer and then to usize, and use it for hashing and equality.
self.0.as_ref() as *const _ as usize
}
}
impl<T> Hash for Set<T> {
fn hash<H: Hasher>(&self, state: &mut H) {
self.as_addr().hash(state);
}
}
impl<T> Eq for Set<T> {}
impl<T> PartialEq for Set<T> {
fn eq(&self, other: &Self) -> bool {
self.as_addr() == other.as_addr()
}
}
Finally, you'll need to add some set-like methods and a constructor to make it usable:
impl<T: Hash + Eq> Set<T> {
pub fn new() -> Self {
Set(Box::new(HashSet::new()))
}
pub fn insert(&mut self, value: T) {
self.0.insert(value);
}
pub fn contains(&mut self, value: &T) -> bool {
self.0.contains(value)
}
}
Now your code will work, with the additional use of Rc so that you have the original Set available for lookup after you insert it:
fn main() {
let mut a: Set<u32> = Set::new();
a.insert(1);
let a = Rc::new(a);
let mut c: HashSet<_> = HashSet::new();
c.insert(Rc::clone(&a));
assert!(c.contains(&a));
}
Playground
As pointed out helpfully in the comments, it's not possible to hash sets because they have no fixed address. An effective, if inelegant, solution, is to wrap them in a specialized struct:
struct HashableHashSet<T> {
hash: ...
hashset: HashSet<T>
}
And then hash the struct by memory equality.

How to update a key in a HashMap?

I can't find any method which allows me to update the key of a HashMap. There is get_key_value, but it returns an immutable reference and not a mutable reference.
You generally1 cannot. From the HashMap documentation:
It is a logic error for a key to be modified in such a way that the key's hash, as determined by the Hash trait, or its equality, as determined by the Eq trait, changes while it is in the map. This is normally only possible through Cell, RefCell, global state, I/O, or unsafe code.
Instead, remove the value and re-insert it:
use std::{collections::HashMap, hash::Hash};
fn rename_key<K, V>(h: &mut HashMap<K, V>, old_key: &K, new_key: K)
where
K: Eq + Hash,
{
if let Some(v) = h.remove(old_key) {
h.insert(new_key, v);
}
}
See also:
Changing key of HashMap from child method
1 — As the documentation states, it's possible to modify it, so long as you don't change how the key is hashed or compared. Doing so will cause the HashMap to be in an invalid state.
An example of doing it correctly (but it's still dubious why):
use std::{
cell::RefCell,
collections::HashMap,
hash::{Hash, Hasher},
sync::Arc,
};
#[derive(Debug)]
struct Example<A, B>(A, B);
impl<A, B> PartialEq for Example<A, B>
where
A: PartialEq,
{
fn eq(&self, other: &Self) -> bool {
self.0 == other.0
}
}
impl<A, B> Eq for Example<A, B> where A: Eq {}
impl<A, B> Hash for Example<A, B>
where
A: Hash,
{
fn hash<H>(&self, h: &mut H)
where
H: Hasher,
{
self.0.hash(h)
}
}
fn main() {
let mut h = HashMap::new();
let key = Arc::new(Example(0, RefCell::new(false)));
h.insert(key.clone(), "alpha");
dbg!(&h);
*key.1.borrow_mut() = true;
dbg!(&h);
}
There are other techniques you can use to get a mutable reference to the key, such as the unstable raw_entry_mut (as mentioned by Sven Marnach).

Caching/memoization vs object lifetime

My program is structured as a series of function calls building up the resulting value - each function returns (moves) the returned value to it's caller. This is a simplified version:
struct Value {}
struct ValueBuilder {}
impl ValueBuilder {
pub fn do_things_with_value(&mut self, v : &Value) {
// expensive computations
}
pub fn make_value(&self) -> Value {
Value {}
}
pub fn f(&mut self) -> Value {
let v = self.make_value();
self.do_things_with_value(&v);
v
}
pub fn g(&mut self) -> Value {
let v = self.f();
self.do_things_with_value(&v);
v
}
}
play.rust-lang version
Imagine that there are many more functions similar to f and g, both between them and above. You can see that do_things_with_value is called twice with the same value. I would like to cache/memoize this call so that in the example below "expensive computations" are performed only once. This is my (obviously incorrect) attempt:
#[derive(PartialEq)]
struct Value {}
struct ValueBuilder<'a> {
seen_values: Vec<&'a Value>,
}
impl<'a> ValueBuilder<'a> {
pub fn do_things_with_value(&mut self, v: &'a Value) {
if self.seen_values.iter().any(|x| **x == *v) {
return;
}
self.seen_values.push(v)
// expensive computations
}
pub fn make_value(&self) -> Value {
Value {}
}
pub fn f(&mut self) -> Value {
let v = self.make_value();
self.do_things_with_value(&v); // error: `v` does not live long enough
v
}
pub fn g(&mut self) -> Value {
let v = self.f();
self.do_things_with_value(&v);
v
}
}
play.rust-lang version
I understand why the compiler is doing it - while in this case it happens that v is not dropped between two calls to do_things_with_value, there is no guarantee that it will not be dropped, and dereferencing it would crash the program.
What is a better way to structure this program? Let's assume that:
cloning and storing Values is expensive, and we can't afford seen_values keeping a copy of everything we've ever seen
we also can't refactor the code / Value object to carry additional data (i.e. a bool indicating whether we did expensive computations with this value). It needs to rely on comparing the values using PartialEq
If you need to keep the same value at different points in the program it's easiest to copy or clone it.
However, if cloning is not an option because it is too expensive wrap the values in an Rc. That is a reference counted smart pointer which allows shared ownership of its content. It is relatively cheap to clone without duplicating the contained value.
Note that simply storing Rc<Value> in seen_values will keep all values alive at least as long as the value builder lives. You can avoid that by storing Weak references.
use std::rc::{Rc, Weak};
#[derive(PartialEq)]
struct Value {}
struct ValueBuilder {
seen_values: Vec<Weak<Value>>,
}
impl ValueBuilder {
pub fn do_things_with_value(&mut self, v: &Rc<Value>) {
if self
.seen_values
.iter()
.any(|x| x.upgrade().as_ref() == Some(v))
{
return;
}
self.seen_values.push(Rc::downgrade(v))
// expensive computations
}
pub fn make_value(&self) -> Rc<Value> {
Rc::new(Value {})
}
pub fn f(&mut self) -> Rc<Value> {
let v = self.make_value();
self.do_things_with_value(&v);
v
}
pub fn g(&mut self) -> Rc<Value> {
let v = self.f();
self.do_things_with_value(&v);
v
}
}
While a Rc<Value> is in use by the chain of functions do_things() will remember the value and skip computations. If a value becomes unused (all references dropped) and is later created again, do_things() will repeat the computations.

Implementing only IndexMut without implementing Index

I'm trying to create a DefaultHashMap struct which is basically a wrapper around HashMap, with the difference that when getting a key that is not in the map the default value is put in that key and is returned.
I made a get and a get_mut method and this works fine. Now I'm trying to implement Index and IndexMut as wrappers around those methods. Here I'm running into two problems.
The first problem is caused by the fact that get has to mutate the struct when the key is not present it requires a mutable reference. However, the signature for the index method of Index has &self instead of &mut self, so I cannot implement it.
This causes a second problem, IndexMut requires an Index implementation. So even though IndexMut would have no problems being implemented I cannot do this because Index cannot be implemented.
The first problem is annoying but understandable. For the second one, I don't get why the requirement is even there. I would like to have a way to work around it. Right now I'm doing the following, but I hope someone has a better solution:
impl<K: Eq + Hash, V: Clone> Index<K> for DefaultHashMap<K, V> {
type Output = V;
fn index(&self, _: K) -> &V {
panic!("DefautHashMap doesn't implement indexing without mutating")
}
}
impl<K: Eq + Hash, V: Clone> IndexMut<K> for DefaultHashMap<K, V> {
#[inline]
fn index_mut(&mut self, index: K) -> &mut V {
self.get_mut(index)
}
}
First, I suspect that your requirement "when getting a key that is not in the map the default value is put in that key" is not exactly required!
Consider an immutable access let foo = default_hash_map[bar] + 123;. Unless you're going to use values with interior mutability with the map it might be inconsequential whether default_hash_map[bar] is actually creating a key or just returns a reference to a single default value.
Now, if you really need to create new entries during access then there is a way to do this. The borrow checker restriction that only allows you to add new entries with a mutable access is here to stop you from creating the dangling pointers that would occur whenever you modify the map while holding the references in there. But if you were using a structure with stable references, where stable means that the references are not invalidated when you enter new entries into the structure, then the problem the borrow checker is trying to prevent will go away.
In C++ I would've considered using a deque which is guaranteed by the standard not to invalidate its references when you add new entries to it. Unfortunately, Rust deques are different (though you can probably find arena allocator crates with properties similar to the C++ deque) and so for this example I'm using Box. The boxed values reside separately on the heap and aren't moved when you add new entries into HashMap.
Now, your normal access pattern is probably going to be modifying the new entries and then accessing the existing entries of the map. Thus making new entries in Index::index is an exception and shouldn't slow down the rest of the map. It might make sense therefore to pay the boxing price only for the Index::index access. To do that we might use a second structure, that keeps just the boxed Index::index values.
Knowing that HashMap<K, Box<V>> can be inserted into without invalidating the existing V refereces allows us to use it as a temporary buffer, holding the Index::index-created values until we get a chance to synchronize them with the primary HashMap.
use std::borrow::Borrow;
use std::cell::UnsafeCell;
use std::collections::HashMap;
use std::hash::Hash;
use std::ops::Index;
use std::ops::IndexMut;
struct DefaultHashMap<K, V>(HashMap<K, V>, UnsafeCell<HashMap<K, Box<V>>>, V);
impl<K, V> DefaultHashMap<K, V>
where K: Eq + Hash
{
fn sync(&mut self) {
let buf_map = unsafe { &mut *self.1.get() };
for (k, v) in buf_map.drain() {
self.0.insert(k, *v);
}
}
}
impl<'a, K, V, Q: ?Sized> Index<&'a Q> for DefaultHashMap<K, V>
where K: Eq + Hash + Clone,
K: Borrow<Q>,
K: From<&'a Q>,
Q: Eq + Hash,
V: Clone
{
type Output = V;
fn index(&self, key: &'a Q) -> &V {
if let Some(v) = self.0.get(key) {
v
} else {
let buf_map: &mut HashMap<K, Box<V>> = unsafe { &mut *self.1.get() };
if !buf_map.contains_key(key) {
buf_map.insert(K::from(key), Box::new(self.2.clone()));
}
&*buf_map.get(key).unwrap()
}
}
}
impl<'a, K, V, Q: ?Sized> IndexMut<&'a Q> for DefaultHashMap<K, V>
where K: Eq + Hash + Clone,
K: Borrow<Q>,
K: From<&'a Q>,
Q: Eq + Hash,
V: Clone
{
fn index_mut(&mut self, key: &'a Q) -> &mut V {
self.sync();
if self.0.contains_key(key) {
self.0.get_mut(key).unwrap()
} else {
self.0.insert(K::from(key), self.2.clone());
self.0.get_mut(key).unwrap()
}
}
}
fn main() {
{
let mut dhm = DefaultHashMap::<String, String>(HashMap::new(),
UnsafeCell::new(HashMap::new()),
"bar".into());
for i in 0..10000 {
dhm[&format!("{}", i % 1000)[..]].push('x')
}
println!("{:?}", dhm.0);
}
{
let mut dhm = DefaultHashMap::<String, String>(HashMap::new(),
UnsafeCell::new(HashMap::new()),
"bar".into());
for i in 0..10000 {
let key = format!("{}", i % 1000);
assert!(dhm[&key].len() >= 3);
dhm[&key[..]].push('x');
}
println!("{:?}", dhm.0);
}
{
#[derive(Eq, PartialEq, Clone, Copy, Hash, Debug)]
struct K(u32);
impl<'a> From<&'a u32> for K {
fn from(v: &u32) -> K {
K(*v)
}
}
impl<'a> Borrow<u32> for K {
fn borrow(&self) -> &u32 {
&self.0
}
}
let mut dhm = DefaultHashMap::<K, K>(HashMap::new(),
UnsafeCell::new(HashMap::new()),
K::from(&123));
for i in 0..10000 {
let key = i % 1000;
assert!(dhm[&key].0 >= 123);
dhm[&key].0 += 1;
}
println!("{:?}", dhm.0);
}
}
(playground)
Note that boxing only stabilizes the insertion of new entries. To remove the boxed entries you still need the mutable (&mut self) access to DefaultHashMap.

How can I implement Ord when the comparison depends on data not part of the compared items?

I have a small struct containing only an i32:
struct MyStruct {
value: i32,
}
I want to implement Ord in order to store MyStruct in a BTreeMap or any other data structure that requires you to have Ord on its elements.
In my case, comparing two instances of MyStruct does not depend on the values in them, but asking another data structure (a dictionary), and that data structure is unique for each instance of the BTreeMap I will create. So ideally it would look like this:
impl Ord for MyStruct {
fn cmp(&self, other: &Self, dict: &Dictionary) -> Ordering {
dict.lookup(self.value).cmp(dict.lookup(other.value))
}
}
However this won't be possible, since an Ord implementation only can access two instances of MyStruct, nothing more.
One solution would be storing a pointer to the dictionary in MyStruct but that's overkill. MyStruct is supposed to be a simple wrapper and the pointer would double its size. Another solution is to use a static global, but that's not a good solution either.
In C++ the solution would be easy: Most STL algorithms/data structures let you pass a comparator, where it can be a function object with some state. So I believe Rust would have an idiom to match this somehow, is there any way to accomplish this?
Rust (more specifically Rust's libcollections) currently has no comparator-like construct, so using a mutable static is probably your best bet. This is also used within rustc, e.g. the string interner is static. With that said, the use case isn't exactly uncommon, so maybe if we petition for it, Rust will get external comparators one day.
I remember the debate over whether allowing a custom comparator was worth it or not, and it was decided that this complicated the API a lot when most of the times one could achieve the same effect by using a new (wrapping) type and redefine PartialOrd for it.
It was, ultimately, a trade-off: weighing API simplicity versus unusual needs (which are probably summed up as access to external resources).
In your specific case, there are two solutions:
use the API the way it was intended: create a wrapper structure containing both an instance of MyStruct and a reference to the dictionary, then define Ord on that wrapper and use this as key in the BTreeMap
circumvent the API... somehow
I would personally advise starting with using the API as intended, and measure, before going down the road of trying to circumvent it.
#ker was kind enough to provide the following illustration of achieving wrapping in comments (playground version):
#[derive(Eq, PartialEq, Debug)]
struct MyStruct {
value: i32,
}
#[derive(Debug)]
struct MyStructAsKey<'a> {
inner: MyStruct,
dict: &'a Dictionary,
}
impl<'a> Eq for MyStructAsKey<'a> {}
impl<'a> PartialEq for MyStructAsKey<'a> {
fn eq(&self, other: &Self) -> bool {
self.inner == other.inner && self.dict as *const _ as usize == other.dict as *const _ as usize
}
}
impl<'a> Ord for MyStructAsKey<'a> {
fn cmp(&self, other: &Self) -> ::std::cmp::Ordering {
self.dict.lookup(&self.inner).cmp(&other.dict.lookup(&other.inner))
}
}
impl<'a> PartialOrd for MyStructAsKey<'a> {
fn partial_cmp(&self, other: &Self) -> Option<::std::cmp::Ordering> {
Some(self.dict.lookup(&self.inner).cmp(&other.dict.lookup(&other.inner)))
}
}
#[derive(Default, Debug)]
struct Dictionary(::std::cell::RefCell<::std::collections::HashMap<i32, u64>>);
impl Dictionary {
fn ord_key<'a>(&'a self, ms: MyStruct) -> MyStructAsKey<'a> {
MyStructAsKey {
inner: ms,
dict: self,
}
}
fn lookup(&self, key: &MyStruct) -> u64 {
self.0.borrow()[&key.value]
}
fn create(&self, value: u64) -> MyStruct {
let mut map = self.0.borrow_mut();
let n = map.len();
assert!(n as i32 as usize == n);
let n = n as i32;
map.insert(n, value);
MyStruct {
value: n,
}
}
}
fn main() {
let dict = Dictionary::default();
let a = dict.create(99);
let b = dict.create(42);
let mut set = ::std::collections::BTreeSet::new();
set.insert(dict.ord_key(a));
set.insert(dict.ord_key(b));
println!("{:#?}", set);
let c = dict.create(1000);
let d = dict.create(0);
set.insert(dict.ord_key(c));
set.insert(dict.ord_key(d));
println!("{:#?}", set);
}

Resources