Why is BTreeMap hashable, and not HashMap?

Coming from Python here.
I'm wondering why a BTreeMap is hashable. I'm not surprised a HashMap isn't, but I don't understand why the BTreeMap is.
For example, I can do that:
let mut seen_comb: HashSet<BTreeMap<u8, u8>> = HashSet::new();
seen_comb.insert(BTreeMap::new());
But I can't do that:
let mut seen: HashSet<HashMap<u8, u8>> = HashSet::new();
seen.insert(HashMap::new());
Because I'm getting:
error[E0599]: the method `insert` exists for struct `HashSet<HashMap<u8, u8>>`, but its trait bounds were not satisfied
--> src/main.rs:14:10
|
14 | seen.insert(HashMap::new());
| ^^^^^^ method cannot be called on `HashSet<HashMap<u8, u8>>` due to unsatisfied trait bounds
|
::: /home/djipey/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/collections/hash/map.rs:209:1
|
209 | pub struct HashMap<K, V, S = RandomState> {
| ----------------------------------------- doesn't satisfy `HashMap<u8, u8>: Hash`
|
= note: the following trait bounds were not satisfied:
`HashMap<u8, u8>: Hash`
In Python, I can't put a dict inside a set, so the BTreeMap behaviour is surprising to me.
Could someone provide an explanation here?

The reason is that BTreeMap has a deterministic iteration order and HashMap does not. To quote the docs from the Hash trait,
When implementing both Hash and Eq, it is important that the following property holds:
k1 == k2 -> hash(k1) == hash(k2)
In other words, if two keys are equal, their hashes must also be equal. HashMap and HashSet both rely on this behavior.
A straightforward implementation that feeds the entries to the Hasher in iteration order cannot guarantee this, because the iteration order of HashMap is non-deterministic: two maps that contain the same elements may yield them in different orders and so produce different hashes even though they compare equal. That breaks the contract of Hash and causes bad things to happen when such a map is used in a HashSet or HashMap.
However, the entire point of BTreeMap is that it is an ordered map: iteration over it occurs in sorted key order, which is fully deterministic, so the contract of Hash can be satisfied and an implementation is provided.
Note that both of these behaviors are different from Python's dict, which iterates in order of insertion.
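As a quick check of the ordering argument above, here is a minimal sketch. The helper name hash_of is made up for this illustration; it feeds a value to std's DefaultHasher, and all hashers created through DefaultHasher::new() behave the same, so the two results are comparable.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

// Hypothetical helper: hash any `Hash` value with a freshly created DefaultHasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // Same entries, inserted in opposite orders.
    let mut a = BTreeMap::new();
    a.insert(1u8, 10u8);
    a.insert(2, 20);

    let mut b = BTreeMap::new();
    b.insert(2u8, 20u8);
    b.insert(1, 10);

    // Equal maps iterate in the same (sorted) order, so they hash identically.
    assert_eq!(a, b);
    assert_eq!(hash_of(&a), hash_of(&b));
}
```

The insertion order does not matter here because BTreeMap always visits keys in sorted order when hashing.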

Actually, the answer for implementing Hash for HashMap is there:
how-to-implement-a-hash-function-for-a-hashset-hashmap
Here is an example of code (playground):
use std::collections::{ HashSet, HashMap, hash_map::DefaultHasher, };
use core::hash::{ Hasher, Hash, };
#[derive(Debug, PartialEq, Eq)]
struct MyMap<K, V>(HashMap<K, V>) where K: Eq + Hash;
impl<K, V> Hash for MyMap<K, V> where K: Eq + Hash, V: Hash, {
    fn hash<H>(&self, h: &mut H) where H: Hasher {
        let hasher = DefaultHasher::new();
        let commut_mix = self.0.iter().map(|(k, v)| {
            let mut in_h = hasher.clone();
            k.hash(&mut in_h);
            v.hash(&mut in_h);
            in_h.finish() as u128
        }).sum();
        h.write_u128(commut_mix);
    }
}
fn main() {
    let map1 = MyMap([(0, "zero"), (2, "deux")].into_iter().collect());
    let map2 = MyMap([(1, "un"), (3, "trois")].into_iter().collect());
    let map2bis = MyMap([(1, "un"), (3, "trois")].into_iter().collect());
    let map3 = MyMap([(0, "zero"), (1, "un"), (2, "deux")].into_iter().collect());
    let map4 = MyMap([(0, "zero"), (2, "deux"), (3, "trois")].into_iter().collect());
    let set: HashSet<_> = [map1, map2, map3, map4, map2bis].into_iter().collect();
    println!("set -> {:?}", set);
}
with result:
set -> {MyMap({1: "un", 0: "zero", 2: "deux"}), MyMap({0: "zero", 2: "deux"}), MyMap({3: "trois", 0: "zero", 2: "deux"}), MyMap({1: "un", 3: "trois"})}
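The key property of the answer's approach is that per-entry hashes are combined with a commutative operation, so the visiting order cannot influence the result. A minimal sketch of just that property follows; unordered_hash is a hypothetical free function written for this illustration (not part of std), and a plain wrapping sum is a simple combiner rather than a carefully designed one.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical helper: hash each (key, value) pair on its own, then combine the
// per-entry hashes with wrapping addition, which does not depend on order.
fn unordered_hash<K: Hash, V: Hash>(map: &HashMap<K, V>) -> u64 {
    map.iter()
        .map(|(k, v)| {
            let mut h = DefaultHasher::new();
            k.hash(&mut h);
            v.hash(&mut h);
            h.finish()
        })
        .fold(0u64, u64::wrapping_add)
}

fn main() {
    // Same entries listed in different orders.
    let a: HashMap<u8, &str> = HashMap::from([(0, "zero"), (2, "deux")]);
    let b: HashMap<u8, &str> = HashMap::from([(2, "deux"), (0, "zero")]);

    assert_eq!(a, b);
    assert_eq!(unordered_hash(&a), unordered_hash(&b));
}
```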

Related

How do I cast the elements referred by a slice without heap allocation?

Let's suppose there's an array of parameters that need to be used in an SQL query. Each parameter must be a &dyn ToSql, which is already implemented for &str.
The need arises to use the object both as &dyn ToSql and as &str, like in the example down below, where it needs to implement Display in order to be printed out.
let params = ["a", "b"];
// this works but allocates
// let tx_params = &params
// .iter()
// .map(|p| p as &(dyn ToSql + Sync))
// .collect::<Vec<_>>();
// this is ideal, doesn't allocate on the heap, but doesn't work
params.map(|p| p as &(dyn ToSql + Sync));
// this has to compile, so can't just create `params` as [&(dyn ToSql + Sync)] initially
println!("Could not insert {}", params);
Error:
Compiling playground v0.0.1 (/playground)
error[E0277]: the trait bound `str: ToSql` is not satisfied
--> src/main.rs:14:20
|
14 | params.map(|p| p as &(dyn ToSql + Sync));
| ^ the trait `ToSql` is not implemented for `str`
|
= help: the following implementations were found:
<&'a str as ToSql>
= note: required for the cast to the object type `dyn ToSql + Sync`
error[E0277]: the size for values of type `str` cannot be known at compilation time
--> src/main.rs:14:20
|
14 | params.map(|p| p as &(dyn ToSql + Sync));
| ^ doesn't have a size known at compile-time
|
= help: the trait `Sized` is not implemented for `str`
= note: required for the cast to the object type `dyn ToSql + Sync`
For more information about this error, try `rustc --explain E0277`.
error: could not compile `playground` due to 2 previous errors
The trait ToSql isn't implemented for str, but it is for &str; however, the borrow checker won't let us borrow p here, even though we're not doing anything with the data except casting it to a new type.
Playground

I agree with @Caesar's take on this, however you actually can do that without heap allocations.
You can use <[T; N]>::each_ref() for that (this method converts &[T; N] to [&T; N]):
params.each_ref().map(|p| p as &(dyn ToSql + Sync));
Playground.
Unfortunately, each_ref() was unstable at the time of writing (it has since been stabilized, in Rust 1.77), but you can write it in stable Rust with unsafe code:
use std::iter;
use std::mem::{self, MaybeUninit};
fn array_each_ref<T, const N: usize>(arr: &[T; N]) -> [&T; N] {
    let mut result = [MaybeUninit::uninit(); N];
    for (result_item, arr_item) in iter::zip(&mut result, arr) {
        *result_item = MaybeUninit::new(arr_item);
    }
    // SAFETY: `MaybeUninit<T>` is guaranteed to have the same layout as `T`; we
    // initialized all items above (can be replaced with `MaybeUninit::array_assume_init()`
    // once stabilized).
    unsafe { mem::transmute_copy(&result) }
}
Playground.
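To try the each_ref() approach without pulling in a database crate, here is a minimal sketch that uses std::fmt::Display as a stand-in for ToSql (that substitution, and the helper join_all, are assumptions made for this illustration). It also assumes a toolchain where each_ref() is available on stable (Rust 1.77 or later).

```rust
use std::fmt::Display;

// Hypothetical consumer that only sees trait objects, like a query function would.
fn join_all(items: &[&dyn Display]) -> String {
    items
        .iter()
        .map(|item| item.to_string())
        .collect::<Vec<_>>()
        .join(", ")
}

fn main() {
    let params = ["a", "b"];

    // &[&str; 2] -> [&&str; 2] -> [&dyn Display; 2]; the array itself lives on the stack.
    let dyn_params: [&dyn Display; 2] = params.each_ref().map(|p| p as &dyn Display);
    assert_eq!(join_all(&dyn_params), "a, b");

    // `params` was only borrowed, so it is still usable as plain strings.
    assert_eq!(params.join("+"), "a+b");
}
```

The cast works because each element of the intermediate array is a &&str, and &str (unlike str) is a sized type that implements the trait.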

I fought this a month ago and my general recommendation is: Don't bother. The actual query is so much heavier than an allocation.
The situation is a bit confusing, because you need an &ToSql, but ToSql is implemented for &str, so you need two arrays: one [&str], and one [&ToSql] whose elements reference the &strs, so the contents of [&ToSql] are double references. I don't see an easy way of achieving that without allocating. (let params: [&&str; 2] = params.iter().collect::<Vec<_>>().try_into().unwrap(); works and the allocation will likely be optimized out. Nightly or unsafe ways exist, see @ChayimFriedman's answer.)
In this case, you can work around it either by initially declaring:
let params = [&"a", &"b"];
or by using an iterator, not an array:
let iter = params.iter().map(|p| p as &(dyn ToSql + Sync));
client.query_raw("select * from foo where id in ?", iter);
In my case, I wasn't able to do anything like this because I was using execute, not query, and execute_raw exists only on tokio-postgres, but not on postgres. So beware of these kinds of pitfalls.

Emulate BTreeMap::pop_last in stable Rust 1.65 or older

Editor's note: as of Rust 1.66.0, BTreeMap::pop_last has been stabilized.
In stable Rust 1.65 or older, is there a way to write a function equivalent to BTreeMap::pop_last?
The best I could come up with is:
fn map_pop_last<K, V>(m: &mut BTreeMap<K, V>) -> Option<(K, V)>
where
    K: Ord + Clone,
{
    let last = m.iter().next_back();
    if let Some((k, _)) = last {
        let k_copy = k.clone();
        return m.remove_entry(&k_copy);
    }
    None
}
It works, but it requires that the key is cloneable. BTreeMap::pop_last from Rust nightly imposes no such constraint.
If I remove the cloning like this
fn map_pop_last<K, V>(m: &mut BTreeMap<K, V>) -> Option<(K, V)>
where
    K: Ord,
{
    let last = m.iter().next_back();
    if let Some((k, _)) = last {
        return m.remove_entry(k);
    }
    None
}
it leads to
error[E0502]: cannot borrow `*m` as mutable because it is also borrowed as immutable
--> ...
|
.. | let last = m.iter().next_back();
| -------- immutable borrow occurs here
.. | if let Some((k, _)) = last {
.. | return m.remove_entry(k);
| ^^------------^^^
| | |
| | immutable borrow later used by call
| mutable borrow occurs here
Is there a way to work around this issue without imposing additional constraints on map key and value types?

Is there a way to work around this issue without imposing additional constraints on map key and value types?
It doesn't appear doable in safe Rust, at least not with reasonable algorithmic complexity. (See Aiden4's answer for a solution that does it by re-building the whole map.)
But if you're allowed to use unsafe, and if you're determined enough that you want to delve into it, this code could do it:
use std::collections::BTreeMap;
use std::mem::ManuallyDrop;
// Safety: if key uses shared interior mutability, the comparison function
// must not use it. (E.g. it is not allowed to call borrow_mut() on a
// Rc<RefCell<X>> inside the key). It is extremely unlikely that such a
// key exists, but it's possible to write it, so this must be marked unsafe.
unsafe fn map_pop_last<K, V>(m: &mut BTreeMap<K, V>) -> Option<(K, V)>
where
    K: Ord,
{
    // We make a shallow copy of the key in the map, and pass a
    // reference to the copy to BTreeMap::remove_entry(). Since
    // remove_entry() is not dropping the key/value pair (it's returning
    // it), our shallow copy will remain throughout the lifetime of
    // remove_entry(), even if the key contains references.
    let (last_key_ref, _) = m.iter().next_back()?;
    let last_key_copy = ManuallyDrop::new(std::ptr::read(last_key_ref));
    m.remove_entry(&last_key_copy)
}
Playground

Is there a way to work around this issue without imposing additional constraints on map key and value types?
You can't do it efficiently in safe Rust, but it is possible:
fn map_pop_last<K, V>(m: &mut BTreeMap<K, V>) -> Option<(K, V)>
where
    K: Ord,
{
    // Move the whole map out, leaving an empty one behind.
    let mut temp = BTreeMap::new();
    std::mem::swap(m, &mut temp);
    // Take the last entry, then put every remaining entry back.
    let mut iter = temp.into_iter();
    let ret = iter.next_back();
    m.extend(iter);
    ret
}
playground
This will do a full traversal of the map but is safe.

We can even be sure that it is not possible to mutate the map in place with the interfaces currently offered by safe Rust (version 1.59, at the time of writing):
To extract the key, we need to look at the keys of the map. "Look at" implies borrowing here. This gives us a &K reference that also borrows the entire map.
Now, a problem arises: to call any of the remove* methods, we need to mutably borrow the map again, which we cannot do since we still hold the reference to the key that we are about to extract. These lifetimes overlap, so it won't compile.
Another approach could be to try to get hold of the Entry, but to get one through the .entry method, we need to have an owned K too.
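For readers on Rust 1.66 or newer, where the editor's note says BTreeMap::pop_last is stable, none of the workarounds above are needed. A minimal sketch, using a hypothetical Key type that is deliberately not Clone to show that no extra bound is required:

```rust
use std::collections::BTreeMap;

// Hypothetical key type for this illustration: ordered, but not Clone.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Key(u32);

fn main() {
    let mut m = BTreeMap::new();
    m.insert(Key(1), "one");
    m.insert(Key(3), "three");
    m.insert(Key(2), "two");

    // pop_last removes and returns the entry with the greatest key.
    assert_eq!(m.pop_last(), Some((Key(3), "three")));
    assert_eq!(m.len(), 2);
}
```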

Unable to use concat on vec<u8> in my function

I have a program where I need to append two Vec<u8> before they are serialized.
Just to be sure how to do it, I made this example program:
let a: Vec<u8> = vec![1, 2, 3, 4, 5, 6];
let b: Vec<u8> = vec![7, 8, 9];
let c = [a, b].concat();
println!("{:?}", c);
Which works perfectly. The issue is now when I have to implement it in my own project.
Here I need to write a function, the function takes a struct as input that looks like this:
pub struct Message2 {
    pub ephemeral_key_r: Vec<u8>,
    pub c_r: Vec<u8>,
    pub ciphertext2: Vec<u8>,
}
and the serialization function looks like this:
pub fn serialize_message_2(msg: &Message2) -> Result<Vec<u8>> {
    let c_r_and_ciphertext = [msg.c_r, msg.ciphertext2].concat();
    let encoded = (
        Bytes::new(&msg.ephemeral_key_r),
        Bytes::new(&c_r_and_ciphertext),
    );
    Ok(cbor::encode_sequence(encoded)?)
}
The first issue that arises here is that it complains that msg.ciphertext2 and msg.c_r are moved values. This makes sense, so I add an & in front of both of them.
However, when I do this, the call to concat() fails, with this type error:
util.rs(77, 59): method cannot be called on `[&std::vec::Vec<u8>; 2]` due to unsatisfied trait bounds
So, when I borrow the values, the expression [&msg.c_r, &msg.ciphertext2] becomes an array of two references to Vecs, for which no concat() is defined.
I also tried calling clone on both vectors:
let c_r_and_ciphertext = [msg.c_r.clone(), msg.ciphertext2.clone()].concat();
and this actually works out!
But now I'm just wondering, why does borrowing the values change the types?
And is there anything to think about when slapping clone on values that are moved and that I cannot borrow for some reason?

The reasons on why .concat() behaves as it does are a bit awkward.
To be able to call .concat(), the Concat trait must be implemented. It is implemented on slices of strings, and on slices of V, where V can be Borrowed as a slice of cloneable T.
First, you're calling concat on an array, not a slice. However, auto-borrowing and unsize coercion are applied when calling a method with the dot syntax. This turns the [V; 2] into a &[V] (where V = Vec<u8> in the working case and V = &Vec<u8> in the non-working case). Try calling Concat::concat([a, b]) and you'll notice the difference.
So now is the question whether V can be borrowed as/into some &[T] (where T = u8 in your case). Two possibilities exist:
There is an impl<T> Borrow<[T]> for Vec<T>, so Vec<u8> can be turned into &[u8].
There is an impl<'_, T> Borrow<T> for &'_ T, so if you already have a &[u8], that can be used.
However, there is no impl<T> Borrow<[T]> for &'_ Vec<T>, so concatting [&Vec<_>] won't work.
So much for the theory, on the practical side: You can avoid the clones by using [&msg.c_r[..], &msg.ciphertext2[..]].concat(), because you'll be calling concat on &[&[u8]]. The &x[..] is a neat trick to turn the Vecs into slices (by slicing it, without slicing anything off…). You can also do that with .borrow(), but that's a bit more awkward, since you may need an extra type specification: [msg.c_r.borrow(), msg.ciphertext2.borrow()].concat::<u8>()
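Applied to the borrowed struct from the question, the slicing trick might look like the sketch below. Message2 is trimmed to the two relevant fields, and the function names concat_fields and concat_fields_as_slice are made up for this illustration.

```rust
// Trimmed-down version of the struct from the question.
struct Message2 {
    c_r: Vec<u8>,
    ciphertext2: Vec<u8>,
}

// Concatenate through a shared reference: no clone of the individual Vecs,
// only the one allocation for the combined result.
fn concat_fields(msg: &Message2) -> Vec<u8> {
    [&msg.c_r[..], &msg.ciphertext2[..]].concat()
}

// Same thing, spelled with `as_slice()` instead of `&x[..]`.
fn concat_fields_as_slice(msg: &Message2) -> Vec<u8> {
    [msg.c_r.as_slice(), msg.ciphertext2.as_slice()].concat()
}

fn main() {
    let msg = Message2 {
        c_r: vec![1, 2, 3],
        ciphertext2: vec![4, 5],
    };
    assert_eq!(concat_fields(&msg), vec![1, 2, 3, 4, 5]);
    assert_eq!(concat_fields_as_slice(&msg), vec![1, 2, 3, 4, 5]);
    // The original fields are untouched.
    assert_eq!(msg.c_r, vec![1, 2, 3]);
}
```

Both spellings produce an array of &[u8], which is the case covered by the Borrow impl for references described above.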

I tried to reproduce your error message, which this code does:
fn main() {
    let a = vec![1, 2];
    let b = vec![3, 4];
    println!("{:?}", [&a, &b].concat())
}
gives:
error[E0599]: the method `concat` exists for array `[&Vec<{integer}>; 2]`, but its trait bounds were not satisfied
--> src/main.rs:4:31
|
4 | println!("{:?}", [&a, &b].concat())
| ^^^^^^ method cannot be called on `[&Vec<{integer}>; 2]` due to unsatisfied trait bounds
|
= note: the following trait bounds were not satisfied:
`[&Vec<{integer}>]: Concat<_>`
It is a simple matter of helping the compiler to see that &a works perfectly fine as a slice, by calling it &a[..]:
fn main() {
    let a = vec![1, 2];
    let b = vec![3, 4];
    println!("{:?}", [&a[..], &b[..]].concat())
}
why does borrowing the values change the types?
Borrowing changes a type into a reference to that same type, so T to &T. These types are related, but are not the same.
is there any things to think about when slapping on clone to values that are moved, and where I cannot borrow for some reason?
Cloning is a good way to sacrifice performance to make the borrow checker happy. It (usually) involves copying the entire memory that is cloned, but if your code is not performance critical (which most code is not), then it may still be a good trade-off...

How to sort a Vec<(Cow<str>, Cow<str>)> by_key without cloning?

I have a vector of tuples containing key and value pairs and I'd like to sort them by the key. I would like to avoid calling .to_string() on the Cows. I can't seem to find a way to do this without cloning.
use std::borrow::Cow;
fn main() {
    let mut v: Vec<(Cow<str>, Cow<str>)> = vec![("a".into(), "xd".into()), ("0".into(), "xy".into())];
    v.sort_by_key(|(k,_v)| k);
    dbg!(v);
}
Compiling playground v0.0.1 (/playground)
error: lifetime may not live long enough
--> src/main.rs:4:28
|
4 | v.sort_by_key(|(k,_v)| k);
| ------- ^ returning this value requires that `'1` must outlive `'2`
| | |
| | return type of closure is &'2 Cow<'_, str>
| has type `&'1 (Cow<'_, str>, Cow<'_, str>)`
error: aborting due to previous error
error: could not compile `playground`
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=72a529fa5b0d39997d5e3738db9c291a
What I tried
I've tried also creating a function instead of a closure so I could assign the same lifetime to the input arguments and the output (See playground), but then I get an error about an invalid signature.
A compile-able solution (See playground) is to clone the Cow which is fine if the Cow is Borrowed, but if it's not Borrowed then why do I need to clone the underlying String? Can't I just call Deref the String into &str?
Also tried to match explicitly the different Cow variants, but the error is very similar to the first one (See playground).
Error message
Most of all I don't understand the error message: "returning this value requires that '1 must outlive '2". Ok I accept that that is required, but why is this an error?

First of all, I'll simplify your code a bit for two reasons:
to make it more idiomatic
and to remove unnecessary code
fn main() {
    let mut vector: Vec<(String, String)> = vec![
        (String::from("a"), String::from("xd")),
        (String::from("0"), String::from("xy")),
    ];
    dbg!(vector);
}
So far, so good.
To sort the vector avoiding the method call .to_string(), we can do it with the function sort_by code (See the playground):
vector.sort_by(|(k1, _), (k2, _)| k1.cmp(k2));
Note that the function cmp does not return a copy of the key but instead the function cmp returns an Ordering:
pub fn cmp(&self, other: &Self) -> Ordering
The Ordering indicates that a compared value X is [less, equal, greater] than another Y ( X.cmp(Y) ).
Another option is to use the function partial_cmp:
vector.sort_by(|(k1, _), (k2, _)| k1.partial_cmp(k2).unwrap());
The function partial_cmp returns an Option<Ordering>, which is why we call the unwrap method on its result.
Another option (which does not solve the problem as you want) is using the function sort_by_key:
vector.sort_by_key(|(k1, _)| String::from(k1));
But since this function returns the key, it is a requirement to create a new one to avoid the problem of the lifetime.

Just use sort_by instead of sort_by_key:
v.sort_by(|(k1, _), (k2, _)| k1.cmp(k2));
Most of all I don't understand the error message
The problem is sort_by_key's function declaration:
pub fn sort_by_key<K, F>(&mut self, f: F)
where
    F: FnMut(&T) -> K,
    K: Ord
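This shows that sort_by_key accepts a closure which returns a type K, but nothing ties K to the lifetime of the &T argument: the closure must work for any borrow of the element, however short, so K cannot contain a reference into that element. If it were instead defined as
pub fn sort_by_key<'a, K, F>(&mut self, f: F)
where
    F: FnMut(&'a T) -> K,
    K: Ord + 'a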
Then it would work in this case. But it isn't, so we have to live with it :/
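To confirm that sort_by handles the original Vec<(Cow<str>, Cow<str>)> without cloning either variant, here is a minimal sketch. The function name sort_pairs and the sample data are made up for this illustration.

```rust
use std::borrow::Cow;

// Sort by the first tuple element. The comparator only borrows both keys for the
// duration of one comparison, so nothing is cloned, whether a key is Borrowed or Owned.
fn sort_pairs(v: &mut [(Cow<str>, Cow<str>)]) {
    v.sort_by(|(k1, _), (k2, _)| k1.cmp(k2));
}

fn main() {
    let mut v: Vec<(Cow<str>, Cow<str>)> = vec![
        (Cow::Borrowed("b"), Cow::Borrowed("xd")),
        (Cow::Owned(String::from("a")), Cow::Borrowed("xy")),
        (Cow::Borrowed("0"), Cow::Owned(String::from("zz"))),
    ];
    sort_pairs(&mut v);

    // Deref each Cow to &str just for the check.
    let keys: Vec<&str> = v.iter().map(|(k, _)| &**k).collect();
    assert_eq!(keys, ["0", "a", "b"]);
}
```

The comparator never has to return a key, which is why the lifetime problem with sort_by_key does not arise.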

Finding most frequently occurring string in a structure in Rust?

I'm looking for the string which occurs most frequently in the second part of the tuple of Vec<(String, Vec<String>)>:
use itertools::Itertools; // 0.8.0
fn main() {
    let edges: Vec<(String, Vec<String>)> = vec![];
    let x = edges
        .iter()
        .flat_map(|x| &x.1)
        .map(|x| &x[..])
        .sorted()
        .group_by(|x| x)
        .max_by_key(|x| x.len());
}
Playground
This:
takes the iterator
flat-maps to the second part of the tuple
turns elements into a &str
sorts it (via itertools)
groups it by string (via itertools)
finds the group with the highest count
This supposedly gives me the group with the most frequently occurring string, except it doesn't compile:
error[E0599]: no method named `max_by_key` found for type `itertools::groupbylazy::GroupBy<&&str, std::vec::IntoIter<&str>, [closure#src/lib.rs:9:19: 9:24]>` in the current scope
--> src/lib.rs:10:10
|
10 | .max_by_key(|x| x.len());
| ^^^^^^^^^^
|
= note: the method `max_by_key` exists but the following trait bounds were not satisfied:
`&mut itertools::groupbylazy::GroupBy<&&str, std::vec::IntoIter<&str>, [closure#src/lib.rs:9:19: 9:24]> : std::iter::Iterator`
I'm totally lost in these types.

You didn't read the documentation for a function you are using. This is not a good idea. The itertools documentation for group_by says:
This type implements IntoIterator (it is not an iterator itself),
because the group iterators need to borrow from this value. It should
be stored in a local variable or temporary and iterated.
Personally, I'd just use a BTreeMap or HashMap:
let mut counts = BTreeMap::new();
for word in edges.iter().flat_map(|x| &x.1) {
    *counts.entry(word).or_insert(0) += 1;
}
let max = counts.into_iter().max_by_key(|&(_, count)| count);
println!("{:?}", max);
If you really wanted to use the iterators, it could look something like this:
let groups = edges
    .iter()
    .flat_map(|x| &x.1)
    .sorted()
    .group_by(|&x| x);
let max = groups
    .into_iter()
    .map(|(key, group)| (key, group.count()))
    .max_by_key(|&(_, count)| count);
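For completeness, the map-based counting approach can be wrapped into a self-contained function with some sample data. The name most_frequent and the test data are made up for this illustration; the sample avoids ties, since which of several equally frequent strings is returned depends on how max_by_key breaks ties.

```rust
use std::collections::BTreeMap;

// Count every string in the second tuple element and return the most frequent
// one together with its count, or None if there are no strings at all.
fn most_frequent(edges: &[(String, Vec<String>)]) -> Option<(&str, usize)> {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for word in edges.iter().flat_map(|edge| &edge.1) {
        *counts.entry(word.as_str()).or_insert(0) += 1;
    }
    counts.into_iter().max_by_key(|&(_, count)| count)
}

fn main() {
    let edges = vec![
        (String::from("a"), vec![String::from("x"), String::from("y")]),
        (String::from("b"), vec![String::from("y"), String::from("z")]),
        (String::from("c"), vec![String::from("y")]),
    ];
    assert_eq!(most_frequent(&edges), Some(("y", 3)));
}
```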
