Create a Set of Sets - rust

How does one create a set of sets in Rust? Is it necessary to write an impl block for every concrete type satisfying HashSet<HashSet<_>>?
Minimal failing example:
use std::collections::HashSet;
fn main() {
let a: HashSet<u32> = HashSet::new();
let mut c: HashSet<HashSet<u32>> = HashSet::new();
c.insert(a); // fails: HashSet<u32> does not implement Hash
}
Error:
"insert" method cannot be called on `std::collections::HashSet<std::collections::HashSet<u32>>` due to unsatisfied trait bounds
HashSet doesn't satisfy `std::collections::HashSet<u32>: Hash`
Is it possible to override the fact that HashSet is unhashable? I'd like to use a HashSet and need its elements to be unique by actual (memory) equality; I don't need them to be unique by contents.

I'd like to have a set of sets and want them to be unique by "actual" (memory) equality, not by contents.
To do so you first need to box the hashset so that it has a stable memory address. For example:
struct Set<T>(Box<HashSet<T>>);
To make your Set hashable, you'll need to implement Hash and Eq:
impl<T> Set<T> {
fn as_addr(&self) -> usize {
// as_ref() gives the reference to the heap-allocated contents
// inside the Box, which is stable; convert that reference to a
// pointer and then to usize, and use it for hashing and equality.
self.0.as_ref() as *const _ as usize
}
}
impl<T> Hash for Set<T> {
fn hash<H: Hasher>(&self, state: &mut H) {
self.as_addr().hash(state);
}
}
impl<T> Eq for Set<T> {}
impl<T> PartialEq for Set<T> {
fn eq(&self, other: &Self) -> bool {
self.as_addr() == other.as_addr()
}
}
Finally, you'll need to add some set-like methods and a constructor to make it usable:
impl<T: Hash + Eq> Set<T> {
pub fn new() -> Self {
Set(Box::new(HashSet::new()))
}
pub fn insert(&mut self, value: T) {
self.0.insert(value);
}
pub fn contains(&self, value: &T) -> bool {
self.0.contains(value)
}
}
Now your code will work, with the additional use of Rc so that you have the original Set available for lookup after you insert it:
use std::rc::Rc;
fn main() {
let mut a: Set<u32> = Set::new();
a.insert(1);
let a = Rc::new(a);
let mut c: HashSet<_> = HashSet::new();
c.insert(Rc::clone(&a));
assert!(c.contains(&a));
}

As helpfully pointed out in the comments, it's not possible to hash sets by address directly, because a HashSet value has no fixed address (it can be moved). An effective, if inelegant, solution is to wrap them in a specialized struct:
struct HashableHashSet<T> {
hash: ...
hashset: HashSet<T>
}
And then hash the struct by memory equality.
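One way to sketch "hash by memory equality" without computing an address by hand is to keep each set behind an Rc and compare the Rc allocations. This swaps in a different technique from the Box-based answer above: the wrapper name `ByAddress` and the helper `demo` are hypothetical, and the example only assumes the standard `Rc::ptr_eq` and `Rc::as_ptr` functions.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

// Hypothetical wrapper: two values are equal only if they share one Rc allocation.
struct ByAddress<T>(Rc<HashSet<T>>);

impl<T> Clone for ByAddress<T> {
    fn clone(&self) -> Self {
        // Cloning copies the handle, not the set, so identity is preserved.
        ByAddress(Rc::clone(&self.0))
    }
}

impl<T> PartialEq for ByAddress<T> {
    fn eq(&self, other: &Self) -> bool {
        Rc::ptr_eq(&self.0, &other.0)
    }
}
impl<T> Eq for ByAddress<T> {}

impl<T> Hash for ByAddress<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the address of the shared allocation, consistent with `eq`.
        (Rc::as_ptr(&self.0) as usize).hash(state);
    }
}

// Returns (contains a, contains b, number of distinct sets).
fn demo() -> (bool, bool, usize) {
    let a = ByAddress(Rc::new(HashSet::from([1u32, 2])));
    // Same contents as `a`, but a separate allocation.
    let b = ByAddress(Rc::new(HashSet::from([1u32, 2])));
    let mut sets = HashSet::new();
    sets.insert(a.clone());
    sets.insert(b.clone());
    sets.insert(a.clone()); // second handle to `a`, not a new element
    (sets.contains(&a), sets.contains(&b), sets.len())
}

fn main() {
    let (has_a, has_b, len) = demo();
    assert!(has_a && has_b);
    assert_eq!(len, 2);
}
```

Because the inner set sits behind an Rc, it cannot be mutated while other handles exist unless it is additionally wrapped in something like RefCell; that is safe here because the hash depends only on the address, not on the contents.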

Related

How to avoid moving the key with HashMap entry/or_insert idiom if the entry already exists? [duplicate]

Is it possible to use the Entry API to get a value by a AsRef<str>, but inserting it with Into<String>?
This is the working example:
use std::collections::hash_map::{Entry, HashMap};
struct Foo;
#[derive(Default)]
struct Map {
map: HashMap<String, Foo>,
}
impl Map {
fn get(&self, key: impl AsRef<str>) -> &Foo {
self.map.get(key.as_ref()).unwrap()
}
fn create(&mut self, key: impl Into<String>) -> &mut Foo {
match self.map.entry(key.into()) {
Entry::Vacant(entry) => entry.insert(Foo {}),
_ => panic!(),
}
}
fn get_or_create(&mut self, key: impl Into<String>) -> &mut Foo {
match self.map.entry(key.into()) {
Entry::Vacant(entry) => entry.insert(Foo {}),
Entry::Occupied(entry) => entry.into_mut(),
}
}
}
fn main() {
let mut map = Map::default();
map.get_or_create("bar");
map.get_or_create("bar");
assert_eq!(map.map.len(), 1);
}
My problem is that in get_or_create a String will always be created, incurring unneeded memory allocation, even if it's not needed for an occupied entry. Is it possible to fix this in any way? Maybe in a neat way with Cow?
You cannot, safely. This is a limitation of the current entry API, and there's no great solution. The anticipated solution is the "raw" entry API. See Stargateur's answer for an example of using it.
The only stable solution using the Entry API is to always clone the key:
map.entry(key.clone()).or_insert(some_value);
Outside of the Entry API, you can check if the map contains a value and insert it if not:
if !map.contains_key(&key) {
map.insert(key.clone(), some_value);
}
map.get(&key).expect("This is impossible as we just inserted a value");
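As a rough sketch of the non-Entry approach applied to a String-keyed map, the lookup can be done with a borrowed &str and the owned String built only when the key is missing. The `Counter` type and its u32 values are hypothetical stand-ins for the question's Map and Foo; the cost is a second hash lookup on every call.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the question's `Map`, with u32 values instead of `Foo`.
#[derive(Default)]
struct Counter {
    map: HashMap<String, u32>,
}

impl Counter {
    fn get_or_create(&mut self, key: &str) -> &mut u32 {
        // Look up by &str: no String is allocated when the key already exists.
        if !self.map.contains_key(key) {
            // Only the vacant case pays for the allocation.
            self.map.insert(key.to_owned(), 0);
        }
        self.map
            .get_mut(key)
            .expect("key is present: it either existed or was just inserted")
    }
}

fn main() {
    let mut counter = Counter::default();
    *counter.get_or_create("bar") += 1;
    *counter.get_or_create("bar") += 1;
    assert_eq!(counter.map.len(), 1);
    assert_eq!(counter.map["bar"], 2);
}
```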
See also:
[Pre-RFC] Abandonning Morals In The Name Of Performance: The Raw Entry API
WIP: add raw_entry API to HashMap (50821)
Extend entry API to work on borrowed keys. (1769)
Add HashMap.entry_or_clone() method (1203)
For non-entry based solutions, see:
How to avoid temporary allocations when using a complex key for a HashMap?
How to implement HashMap with two keys?
In nightly Rust, you can use the unstable raw_entry_mut() feature that allows this:
Creates a raw entry builder for the HashMap.
[...]
Raw entries are useful for such exotic situations as:
Deferring the creation of an owned key until it is known to be required
In stable Rust, you can add the hashbrown crate, which provides the same API on stable. The hashbrown crate is in fact the underlying implementation of the standard library's HashMap.
Example:
#![feature(hash_raw_entry)]
use std::collections::HashMap;
#[derive(Hash, PartialEq, Eq, Debug)]
struct NoCopy {
foo: i32,
}
impl Clone for NoCopy {
fn clone(&self) -> Self {
println!("Clone of: {:?}", self);
Self { foo: self.foo }
}
}
fn main() {
let no_copy = NoCopy { foo: 21 };
let mut map = HashMap::new();
map.raw_entry_mut()
.from_key(&no_copy)
.or_insert_with(|| (no_copy.clone(), 42));
map.raw_entry_mut()
.from_key(&no_copy)
.or_insert_with(|| (no_copy.clone(), 84));
println!("{:#?}", map);
}
Applied to your original example:
fn get_or_create<K>(&mut self, key: K) -> &mut Foo
where
K: AsRef<str> + Into<String>,
{
self.map
.raw_entry_mut()
.from_key(key.as_ref())
.or_insert_with(|| (key.into(), Foo))
.1
}

How to write a safe wrap for HashMap with default value

I implemented a wrapper for HashMap with default values and I would like to know if it's safe.
When get is called, the internal map may be resized, and previous references to values (obtained with get) would then point to invalid addresses. I tried to solve this problem using the idea that "all problems in computer science can be solved by another level of indirection" (Butler Lampson). I would like to know if this trick makes this code safe.
use std::cell::UnsafeCell;
use std::collections::HashMap;
use std::hash::Hash;
pub struct DefaultHashMap<I: Hash + Eq, T: Clone> {
default: T,
map: UnsafeCell<HashMap<I, Box<T>>>,
}
impl<I: Hash + Eq, T: Clone> DefaultHashMap<I, T> {
pub fn new(default: T) -> Self {
DefaultHashMap {
default: default,
map: UnsafeCell::new(HashMap::new()),
}
}
pub fn get_mut(&mut self, v: I) -> &mut T {
let m = unsafe { &mut *self.map.get() };
m.entry(v).or_insert_with(|| Box::new(self.default.clone()))
}
pub fn get(&self, v: I) -> &T {
let m = unsafe { &mut *self.map.get() };
m.entry(v).or_insert_with(|| Box::new(self.default.clone()))
}
}
#[test]
fn test() {
let mut m = DefaultHashMap::new(10usize);
*m.get_mut(4) = 40;
let a = m.get(4);
for i in 1..1024 {
m.get(i);
}
assert_eq!(a, m.get(4));
assert_eq!(40, *m.get(4));
}
Since you cannot[1] mutate the value returned from get, I'd just return a reference to the default value when the value is missing. When you call get_mut, however, you can then add the value to the map and return the reference to the newly added value.
This has the nice benefit of not needing any unsafe code.
use std::{borrow::Borrow, collections::HashMap, hash::Hash};
pub struct DefaultHashMap<K, V> {
default: V,
map: HashMap<K, V>,
}
impl<K, V> DefaultHashMap<K, V>
where
K: Hash + Eq,
V: Clone,
{
pub fn new(default: V) -> Self {
DefaultHashMap {
default,
map: HashMap::new(),
}
}
pub fn get_mut(&mut self, v: K) -> &mut V {
let def = &self.default;
self.map.entry(v).or_insert_with(|| def.clone())
}
pub fn get<B>(&self, v: B) -> &V
where
B: Borrow<K>,
{
self.map.get(v.borrow()).unwrap_or(&self.default)
}
}
#[test]
fn test() {
let mut m = DefaultHashMap::new(10usize);
*m.get_mut(4) = 40;
let a = m.get(4);
for i in 1..1024 {
m.get(i);
}
assert_eq!(a, m.get(4));
assert_eq!(40, *m.get(4));
}
[1]: Technically this will have different behavior if your default value contains internal mutability. In that case, modifications to the default value would apply across the collection. If that's a concern, you'd need to use a solution closer to your original.
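To make the footnote concrete, here is a small sketch of the interior-mutability caveat using Cell as the value type. The map below is a simplified copy of the safe version above (get takes `&K` for brevity, which is an assumption of this sketch rather than the answer's signature).

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::hash::Hash;

// Simplified copy of the safe wrapper, for illustration only.
struct DefaultHashMap<K, V> {
    default: V,
    map: HashMap<K, V>,
}

impl<K: Hash + Eq, V: Clone> DefaultHashMap<K, V> {
    fn new(default: V) -> Self {
        DefaultHashMap { default, map: HashMap::new() }
    }

    fn get_mut(&mut self, k: K) -> &mut V {
        let def = &self.default;
        self.map.entry(k).or_insert_with(|| def.clone())
    }

    fn get(&self, k: &K) -> &V {
        // Missing keys all share a reference to the single default value.
        self.map.get(k).unwrap_or(&self.default)
    }
}

fn main() {
    let mut m = DefaultHashMap::new(Cell::new(10));
    m.get_mut(1).set(40); // key 1 gets its own entry, cloned from the default
    m.get(&2).set(99); // key 2 is missing, so this writes through to the shared default
    assert_eq!(m.get(&3).get(), 99); // every other missing key now observes 99
    assert_eq!(m.get(&1).get(), 40); // stored entries are unaffected
    assert_eq!(m.map.len(), 1); // `get` never inserted anything
}
```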
I think that you are covered by the borrowing rules here.
Applying the Mutability XOR Aliasing principle here, unsafety would crop up if you could maintain multiple paths to the same value and mutate something at the same time.
In your case, however:
while the internal HashMap can be mutated even through an aliasable reference to DefaultHashMap, nobody has a reference into the HashMap itself
while there are references into the Box, there is no possibility here to erase a Box, so no dangling pointer from here
since you take care to preserve the borrowing relationship (ie, &mut T is only obtained through a &mut DefaultHashMap), it is not possible to have a &mut T and an alias into it
So, your short example looks safe. However, be especially careful not to accidentally introduce a method on &DefaultHashMap that allows modifying an existing value, as this would be a short road to dangling pointers.
Personally, I would execute all tests with an Option<String>.

How can I implement Ord when the comparison depends on data not part of the compared items?

I have a small struct containing only an i32:
struct MyStruct {
value: i32,
}
I want to implement Ord in order to store MyStruct in a BTreeMap or any other data structure that requires you to have Ord on its elements.
In my case, comparing two instances of MyStruct does not depend on the values in them, but on asking another data structure (a dictionary), and that data structure is unique for each instance of the BTreeMap I will create. So ideally it would look like this:
impl Ord for MyStruct {
fn cmp(&self, other: &Self, dict: &Dictionary) -> Ordering {
dict.lookup(self.value).cmp(dict.lookup(other.value))
}
}
However this won't be possible, since an Ord implementation only can access two instances of MyStruct, nothing more.
One solution would be storing a pointer to the dictionary in MyStruct but that's overkill. MyStruct is supposed to be a simple wrapper and the pointer would double its size. Another solution is to use a static global, but that's not a good solution either.
In C++ the solution would be easy: most STL algorithms and data structures let you pass a comparator, which can be a function object with some state. I would expect Rust to have an idiom to match this. Is there any way to accomplish this?
Rust (more specifically Rust's libcollections) currently has no comparator-like construct, so using a mutable static is probably your best bet. This is also used within rustc, e.g. the string interner is static. With that said, the use case isn't exactly uncommon, so maybe if we petition for it, Rust will get external comparators one day.
I remember the debate over whether allowing a custom comparator was worth it or not, and it was decided that this complicated the API a lot when most of the time one could achieve the same effect by using a new (wrapping) type and redefining PartialOrd for it.
It was, ultimately, a trade-off: weighing API simplicity versus unusual needs (which are probably summed up as access to external resources).
In your specific case, there are two solutions:
use the API the way it was intended: create a wrapper structure containing both an instance of MyStruct and a reference to the dictionary, then define Ord on that wrapper and use this as key in the BTreeMap
circumvent the API... somehow
I would personally advise starting with using the API as intended, and measure, before going down the road of trying to circumvent it.
@ker was kind enough to provide the following illustration of achieving wrapping in the comments:
#[derive(Eq, PartialEq, Debug)]
struct MyStruct {
value: i32,
}
#[derive(Debug)]
struct MyStructAsKey<'a> {
inner: MyStruct,
dict: &'a Dictionary,
}
impl<'a> Eq for MyStructAsKey<'a> {}
impl<'a> PartialEq for MyStructAsKey<'a> {
fn eq(&self, other: &Self) -> bool {
self.inner == other.inner && self.dict as *const _ as usize == other.dict as *const _ as usize
}
}
impl<'a> Ord for MyStructAsKey<'a> {
fn cmp(&self, other: &Self) -> ::std::cmp::Ordering {
self.dict.lookup(&self.inner).cmp(&other.dict.lookup(&other.inner))
}
}
impl<'a> PartialOrd for MyStructAsKey<'a> {
fn partial_cmp(&self, other: &Self) -> Option<::std::cmp::Ordering> {
Some(self.dict.lookup(&self.inner).cmp(&other.dict.lookup(&other.inner)))
}
}
#[derive(Default, Debug)]
struct Dictionary(::std::cell::RefCell<::std::collections::HashMap<i32, u64>>);
impl Dictionary {
fn ord_key<'a>(&'a self, ms: MyStruct) -> MyStructAsKey<'a> {
MyStructAsKey {
inner: ms,
dict: self,
}
}
fn lookup(&self, key: &MyStruct) -> u64 {
self.0.borrow()[&key.value]
}
fn create(&self, value: u64) -> MyStruct {
let mut map = self.0.borrow_mut();
let n = map.len();
assert!(n as i32 as usize == n);
let n = n as i32;
map.insert(n, value);
MyStruct {
value: n,
}
}
}
fn main() {
let dict = Dictionary::default();
let a = dict.create(99);
let b = dict.create(42);
let mut set = ::std::collections::BTreeSet::new();
set.insert(dict.ord_key(a));
set.insert(dict.ord_key(b));
println!("{:#?}", set);
let c = dict.create(1000);
let d = dict.create(0);
set.insert(dict.ord_key(c));
set.insert(dict.ord_key(d));
println!("{:#?}", set);
}
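For comparison with the C++ comparator idiom mentioned in the question: when a sorted sequence is enough and an ordered map is not required, slice sorting already accepts a closure that can capture external state. This is a different technique from the wrapper above and does not help with BTreeMap keys; the helper `sort_by_dict` and the plain HashMap used as the dictionary are assumptions of this sketch.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct MyStruct {
    value: i32,
}

// Hypothetical helper: orders items by the rank the dictionary assigns to them.
// Panics if an item's value is missing from `dict`.
fn sort_by_dict(items: &mut [MyStruct], dict: &HashMap<i32, u64>) {
    // The closure borrows `dict`, playing the role of a stateful comparator.
    items.sort_by_key(|item| dict[&item.value]);
}

fn main() {
    let dict = HashMap::from([(0, 99u64), (1, 42), (2, 1000)]);
    let mut items = vec![
        MyStruct { value: 0 },
        MyStruct { value: 1 },
        MyStruct { value: 2 },
    ];
    sort_by_dict(&mut items, &dict);
    let order: Vec<i32> = items.iter().map(|m| m.value).collect();
    assert_eq!(order, vec![1, 0, 2]); // ranks 42 < 99 < 1000
}
```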
