How to check whether two HashMaps are identical in Rust?

I have two HashMaps (playground):
let mut m1: HashMap<u8, usize, _> = HashMap::new();
m1.insert(1, 100);
m1.insert(2, 200);
let mut m2: HashMap<u8, usize, _> = HashMap::new();
m2.insert(2, 200);
m2.insert(1, 100);
How can I check if the two maps m1 and m2 are identical?
By "identical", I mean all of the following conditions are satisfied.
The key types are the same.
The value types are the same.
Two maps have exactly the same key set. Insertion order shall not matter.
Two maps have exactly the same value for every key (i.e. m1.get(k) == m2.get(k) for every existing key k).
As far as I tested, just m1 == m2 works. However, is this behavior guaranteed? I want some sort of guarantee (thus I added the #language-lawyer tag).
I've already read the official documentation of HashMap.
Also, what about HashSet and Vec? (I've also read their documentation.)

Looking through the source of the standard library, you can find the PartialEq implementations for these collections:
HashMap first compares the lengths, then iterates over all key/value pairs of one map and checks that the other map has an entry for each key with an equal value: source.
HashSet first compares the lengths, then iterates over the keys of one set and checks that the other set contains each key: source.
Vec calls eq on the underlying slices, which either iterates over the elements and compares them pairwise (source) or does a bitwise comparison by calling memcmp if the type allows it (source).
I don't know whether there is any guarantee that this behavior will never change, but since these are stable, widely used APIs, I don't expect them to change.
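To make the described HashMap comparison concrete, here is a minimal hand-written sketch of the same logic. The function name same_entries is hypothetical, and this is an approximation of the behavior described above, not the actual std source:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical helper mirroring the comparison described above:
/// equal lengths, and every entry of `a` has an equal value in `b`.
fn same_entries<K: Eq + Hash, V: PartialEq>(a: &HashMap<K, V>, b: &HashMap<K, V>) -> bool {
    a.len() == b.len()
        && a.iter()
            .all(|(k, v)| b.get(k).map_or(false, |other| v == other))
}
```

Under that logic insertion order cannot matter, because each key is looked up in the other map rather than compared positionally. For a Vec, by contrast, the comparison is positional, so vec![1, 2] and vec![2, 1] are not equal.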

Related

How to iterate over HashMap starting from given key?

Given a HashMap of n elements, how does one start iteration from the (n-x)th element?
The order of elements does not matter, the only problem I need to solve is to start iteration from given key.
Example:
let mut map: HashMap<&str, i32> = HashMap::new();
map.insert("one", 1);
map.insert("two", 2);
map.insert("three", 3);
map.insert("four", 4);
[...]
for (k, v) in map {
    // how to start iteration from the third item and not the first one
}
Tried to google it but no examples found so far.
That's because, as Chayim Friedman notes, it doesn't really make sense: a hashmap has an essentially random internal order, which means it has an arbitrary iteration order. Iterating from or between keys (or entries) thus doesn't make much sense.
So it sounds a lot like an XY problem: what is the reason why you're trying to iterate "starting from a given key"?
Though if you really want that, you can just use the skip_while adapter, and skip while you have not found the key you're looking for.
Alternatively, since your post is ambiguous (you talk about both key and position) you can use the skip adapter to skip over a fixed number of items.
Technically neither will start iterating from that entry; they'll both start iterating from 0 but will only yield items following the specified break point. The standard library's hashmap has no support for range iteration (because that doesn't really make any sense on a hashmap), and its iterators are not random access either (for similar reasons).
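A minimal sketch of both adapters follows. The function names entries_from_key and entries_after are hypothetical, and the key and value types are taken from the question's example. Which entries come "after" the break point depends on the map's arbitrary iteration order:

```rust
use std::collections::HashMap;

/// Entries from `start` onward, in whatever order this map happens to iterate.
/// Everything before `start` is still visited; it is just not yielded.
fn entries_from_key<'a>(map: &HashMap<&'a str, i32>, start: &str) -> Vec<(&'a str, i32)> {
    map.iter()
        .skip_while(|(k, _)| **k != start)
        .map(|(k, v)| (*k, *v))
        .collect()
}

/// Entries after the first `n`, again in the map's arbitrary iteration order.
fn entries_after<'a>(map: &HashMap<&'a str, i32>, n: usize) -> Vec<(&'a str, i32)> {
    map.iter().skip(n).map(|(k, v)| (*k, *v)).collect()
}
```

The only thing that can be relied on is that the first item from entries_from_key is the requested key (if present) and that entries_after yields len minus n items; the identity of the remaining items can differ between runs.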
You may want to use a BTreeMap, which has sorted keys and a range function which iterates over a range of keys.
use std::collections::BTreeMap;
fn main() {
    let mut map = BTreeMap::new();
    map.insert(1, "one");
    map.insert(2, "two");
    map.insert(3, "three");
    for (&key, &value) in map.range(2..) {
        println!("{key}: {value}");
    }
}
// 2: two
// 3: three

How to count HashMap values using a predicate in Rust?

I'm trying this, but it doesn't work:
let map = HashMap::new();
map.insert(1, "aaa");
map.insert(2, "bbb");
let a = map.counts_by(|k, v| v.starts_with("a"));
What is the right way?
Anything that iterates over collections in Rust is going to factor through the Iterator API, and unlike in Java where iterators are often implicitly used, it's very common in Rust to explicitly ask for an iterator (with .iter()) and do some work directly on it in a functional style. In your case, there are three things we need to do here.
Get the values of the HashMap. This can be done with the values method, which returns an iterator.
Keep only the ones satisfying a particular predicate. This is a filter operation and will produce another iterator. Note that this does not yet iterate over the hash map; it merely produces another iterator capable of doing so later.
Count the matches, using count.
Putting it all together, we have
map.values().filter(|v| v.starts_with("a")).count()
You should filter an iterator of the HashMap, then count the elements of the iterator:
use std::collections::HashMap;
fn main() {
    let mut map = HashMap::new();
    map.insert(1, "aaa");
    map.insert(2, "bbb");
    assert_eq!(
        map.iter().filter(|(_k, v)| v.starts_with("a")).count(),
        1
    );
}
Notice that the map also has to be marked as mut in order to insert new elements, and the filter closure destructures into a tuple containing the key and the value, rather than accepting two separate parameters.
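If a helper in the spirit of the question's counts_by call is wanted, the filter-then-count chain can be wrapped in a small function. This is only a sketch; count_by is a hypothetical name and not part of the HashMap API:

```rust
use std::collections::HashMap;

/// Hypothetical helper: counts the entries for which `pred(key, value)` is true.
fn count_by<K, V, F>(map: &HashMap<K, V>, mut pred: F) -> usize
where
    F: FnMut(&K, &V) -> bool,
{
    // `&(k, v)` unpacks the reference to the `(key, value)` tuple that `filter` passes in.
    map.iter().filter(|&(k, v)| pred(k, v)).count()
}
```

With this in place, a call such as count_by(&map, |_k, v| v.starts_with("a")) takes the key and the value as two separate closure parameters, which is the shape the question was reaching for.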

How to define an ordered Map/Set with a runtime-defined comparator?

This is similar to How do I use a custom comparator function with BTreeSet? however in my case I won't know the sorting criteria until runtime. The possible criteria are extensive and can't be hard-coded (think something like sort by distance to target or sort by specific bytes in a payload or combination thereof). The sorting criteria won't change after the map/set is created.
The only alternatives I see are:
use a Vec, but log(n) inserts and deletes are crucial
wrap each of the elements with the sorting criteria (directly or indirectly), but that seems wasteful
This is possible with standard C++ containers std::map/std::set but doesn't seem possible with Rust's BTreeMap/BTreeSet. Is there an alternative in the standard library or in another crate that can do this? Or will I have to implement this myself?
My use-case is a database-like system where elements in the set are defined by a schema, like:
Element {
FIELD x: f32
FIELD y: f32
FIELD z: i64
ORDERBY z
}
But since the schema is user-defined at runtime, the elements are stored in a set of bytes (BTreeSet<Vec<u8>>). Likewise the order of the elements is user-defined. So the comparator I would give to BTreeSet would look like |a, b| schema.cmp(a, b). Hard-coded, the above example may look something like:
fn cmp(&self, a: &Vec<u8>, b: &Vec<u8>) -> Ordering {
    let a_field = self.get_field(a, 2).as_i64();
    let b_field = self.get_field(b, 2).as_i64();
    a_field.cmp(&b_field)
}
Would it be possible to pass the comparator closure as an argument to each node operation that needs it? It would be owned by the tree wrapper instead of cloned in every node.
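For reference, the "wrap each of the elements with the sorting criteria" alternative listed above could look roughly like the sketch below. The names Comparator, Keyed, and sorted_with are hypothetical, and the comparator is assumed to define a consistent total order that never changes after creation. Each element carries one reference-counted pointer to the shared closure, which is the overhead the question calls wasteful:

```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;
use std::rc::Rc;

// Hypothetical comparator type: chosen at runtime and shared by every element.
type Comparator = Rc<dyn Fn(&[u8], &[u8]) -> Ordering>;

// Hypothetical wrapper pairing the raw bytes with the shared comparator.
struct Keyed {
    bytes: Vec<u8>,
    order: Comparator,
}

impl Ord for Keyed {
    fn cmp(&self, other: &Self) -> Ordering {
        // Assumes both elements were created with the same comparator.
        (self.order)(&self.bytes[..], &other.bytes[..])
    }
}
impl PartialOrd for Keyed {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(Ord::cmp(self, other))
    }
}
impl PartialEq for Keyed {
    fn eq(&self, other: &Self) -> bool {
        Ord::cmp(self, other) == Ordering::Equal
    }
}
impl Eq for Keyed {}

/// Inserts every item into a BTreeSet ordered by `cmp` and returns them in that order.
fn sorted_with(cmp: Comparator, items: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    let set: BTreeSet<Keyed> = items
        .into_iter()
        .map(|bytes| Keyed { bytes, order: Rc::clone(&cmp) })
        .collect();
    set.into_iter().map(|k| k.bytes).collect()
}
```

Note that two elements the comparator considers equal count as the same set member, so only one of them is kept.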

How does one create a HashMap with a default value in Rust?

Being fairly new to Rust, I was wondering how to create a HashMap with a default value for a key. For example, having a default value of 0 for any key inserted in the HashMap.
In Rust, I know this creates an empty HashMap:
let mut mymap: HashMap<char, usize> = HashMap::new();
I am looking to maintain a counter for a set of keys, for which one way to go about it seems to be:
for ch in "AABCCDDD".chars() {
    mymap.insert(ch, 0);
}
Is there a way to do it in a much better way in Rust, maybe something equivalent to what Ruby provides:
mymap = Hash.new(0)
mymap["b"] = 1
mymap["a"] # 0
Answering the problem you have...
I am looking to maintain a counter for a set of keys.
Then you want to look at How to lookup from and insert into a HashMap efficiently?. Hint: *map.entry(key).or_insert(0) += 1
Answering the question you asked...
How does one create a HashMap with a default value in Rust?
No, HashMaps do not have a place to store a default. Doing so would cause every user of that data structure to allocate space to store it, which would be a waste. You'd also have to handle the case where there is no appropriate default, or when a default cannot be easily created.
Instead, you can look up a value using HashMap::get and provide a default if it's missing using Option::unwrap_or:
use std::collections::HashMap;
fn main() {
    let mut map: HashMap<char, usize> = HashMap::new();
    map.insert('a', 42);
    let a = map.get(&'a').cloned().unwrap_or(0);
    let b = map.get(&'b').cloned().unwrap_or(0);
    println!("{}, {}", a, b); // 42, 0
}
If unwrap_or doesn't work for your case, there are several similar functions that might:
Option::unwrap_or_else
Option::map_or
Option::map_or_else
Of course, you are welcome to wrap this in a function or a data structure to provide a nicer API.
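As a rough sketch of such a wrapper, the code below stores the fallback value next to the map. DefaultMap and its methods are hypothetical names, not a standard library type:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical wrapper: a HashMap plus a fallback value returned for missing keys.
struct DefaultMap<K, V> {
    inner: HashMap<K, V>,
    default: V,
}

impl<K: Eq + Hash, V> DefaultMap<K, V> {
    fn new(default: V) -> Self {
        DefaultMap { inner: HashMap::new(), default }
    }

    /// Read-only lookup: never inserts, so it only needs `&self`.
    fn get(&self, key: &K) -> &V {
        self.inner.get(key).unwrap_or(&self.default)
    }

    /// Mutable access: inserts a clone of the default the first time a key is used.
    fn get_mut(&mut self, key: K) -> &mut V
    where
        V: Clone,
    {
        let default = &self.default;
        self.inner.entry(key).or_insert_with(|| default.clone())
    }
}
```

The split between get and get_mut is a design choice: only the mutable path inserts, so plain reads do not change the map.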
ArtemGr brings up an interesting point:
in C++ there's a notion of a map inserting a default value when a key is accessed. That always seemed a bit leaky though: what if the type doesn't have a default? Rust is less demanding on the mapped types and more explicit about the presence (or absence) of a key.
Rust adds an additional wrinkle to this. Actually inserting a value would require that simply getting a value can also change the HashMap. This would invalidate any existing references to values in the HashMap, as a reallocation might be required. Thus you'd no longer be able to get references to two values at the same time! That would be very restrictive.
What about using entry to get an element from the HashMap and then modifying it?
From the docs:
fn entry(&mut self, key: K) -> Entry<K, V>
Gets the given key's corresponding entry in the map for in-place manipulation.
Example:
use std::collections::HashMap;
let mut letters = HashMap::new();
for ch in "a short treatise on fungi".chars() {
    let counter = letters.entry(ch).or_insert(0);
    *counter += 1;
}
assert_eq!(letters[&'s'], 2);
assert_eq!(letters[&'t'], 3);
assert_eq!(letters[&'u'], 1);
assert_eq!(letters.get(&'y'), None);
.or_insert() and .or_insert_with()
Adding to the existing example for .entry().or_insert(), I wanted to mention that if the default value passed to .or_insert() is dynamically generated, it's better to use .or_insert_with().
Using .or_insert_with() as below, the default value is not generated if the key already exists. It only gets created when necessary.
for v in 0..s.len() {
    components.entry(unions.get_root(v))
        .or_insert_with(|| vec![]) // vec only created if needed.
        .push(v);
}
In the snippet below, the default vector passed to .or_insert() is generated on every call. If the key exists, a vector is created and then discarded, which can be wasteful.
components.entry(unions.get_root(v))
    .or_insert(vec![]) // vec always created.
    .push(v);
So for fixed values that don't have much creation overhead, use .or_insert(), and for values that have appreciable creation overhead, use .or_insert_with().
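The snippets above depend on components and unions, which are not shown. A self-contained sketch of the same idea follows; group_by_len is a hypothetical function that groups words by length and counts how often the default-producing closure actually runs:

```rust
use std::collections::HashMap;

/// Hypothetical example: groups words by length and reports how many
/// times the closure passed to `or_insert_with` was actually invoked.
fn group_by_len(words: &[&str]) -> (HashMap<usize, Vec<String>>, usize) {
    let mut groups: HashMap<usize, Vec<String>> = HashMap::new();
    let mut allocations = 0;
    for &w in words {
        groups
            .entry(w.len())
            .or_insert_with(|| {
                allocations += 1; // runs only when the key is missing
                Vec::new()
            })
            .push(w.to_string());
    }
    (groups, allocations)
}
```

For the input ["a", "b", "cc"] the closure runs twice, once per distinct length, even though three words are inserted.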
One way to start a map with initial values is to construct it from a vector of tuples, as in the code below:
let map = vec![("field1".to_string(), value1), ("field2".to_string(), value2)].into_iter().collect::<HashMap<_, _>>();
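A runnable variant of that line is sketched below, with the placeholder values replaced by the hypothetical integers 1 and 2. It also shows HashMap::from, which accepts an array of tuples directly:

```rust
use std::collections::HashMap;

/// Builds the map by collecting a vector of (key, value) tuples.
fn from_vec_of_tuples() -> HashMap<String, i32> {
    vec![("field1".to_string(), 1), ("field2".to_string(), 2)]
        .into_iter()
        .collect()
}

/// Builds the same map from an array of tuples via `HashMap::from`.
fn from_array_of_tuples() -> HashMap<String, i32> {
    HashMap::from([("field1".to_string(), 1), ("field2".to_string(), 2)])
}
```

Both functions produce equal maps. Neither gives the map a default for keys that are absent; they only pre-populate it.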

Sort HashMap data by value

I want to sort HashMap data by value in Rust (e.g., when counting character frequency in a string).
The Python equivalent of what I’m trying to do is:
count = {}
for c in text:
    count[c] = count.get(c, 0) + 1
sorted_data = sorted(count.items(), key=lambda item: -item[1])
print('Most frequent character in text:', sorted_data[0][0])
My corresponding Rust code looks like this:
// Count the frequency of each letter
let mut count: HashMap<char, u32> = HashMap::new();
for c in text.to_lowercase().chars() {
    *count.entry(c).or_insert(0) += 1;
}
// Get a list of the most frequently used characters, sorted by
// field 1 (the count) in descending order:
let mut count_vec: Vec<(&char, &u32)> = count.iter().collect();
count_vec.sort_by(|a, b| b.1.cmp(a.1));
println!("Most frequent character in text: {}", count_vec[0].0);
Is this idiomatic Rust? Can I construct count_vec in a way that consumes the HashMap's data and owns it (e.g., using map())? Would this be more idiomatic?
Is this idiomatic Rust?
There's nothing particularly unidiomatic, except possibly for the unnecessary full type constraint on count_vec; you could just use
let mut count_vec: Vec<_> = count.iter().collect();
It's not difficult from context to work out what the full type of count_vec is. You could also omit the type constraint for count entirely, but then you'd have to play shenanigans with your integer literals to have the correct value type inferred. That is to say, an explicit annotation is eminently reasonable in this case.
The other borderline change you could make if you feel like it would be to use |a, b| a.1.cmp(b.1).reverse() for the sort closure. The Ordering::reverse method just reverses the result so that less-than becomes greater-than, and vice versa. This makes it slightly more obvious that you meant what you wrote, as opposed to accidentally transposing two letters.
Can I construct count_vec in a way that consumes the HashMap's data and owns it?
Not in any meaningful way. Just because HashMap is using memory doesn't mean that memory is in any way compatible with Vec. You could use count.into_iter() to consume the HashMap and move the elements out (as opposed to iterating over pointers), but since both char and u32 are trivially copyable, this doesn't really gain you anything.
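For completeness, a sketch of the consuming variant is shown below. The function name chars_by_frequency is hypothetical, and the tie-break on the character is an addition made here so that the output order is deterministic:

```rust
use std::collections::HashMap;

/// Counts characters (case-insensitively) and returns owned pairs, most frequent first.
fn chars_by_frequency(text: &str) -> Vec<(char, u32)> {
    let mut count: HashMap<char, u32> = HashMap::new();
    for c in text.to_lowercase().chars() {
        *count.entry(c).or_insert(0) += 1;
    }
    // `into_iter` consumes the map and yields owned `(char, u32)` pairs.
    let mut count_vec: Vec<(char, u32)> = count.into_iter().collect();
    // Highest count first; ties are broken by character.
    count_vec.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    count_vec
}
```

The result owns its data, so it can outlive the map, but as noted above this makes little practical difference for small Copy types like char and u32.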
This could be another way to address the matter without the need of an intermediary vector.
// Count the frequency of each letter
let mut count: HashMap<char, u32> = HashMap::new();
for c in text.to_lowercase().chars() {
    *count.entry(c).or_insert(0) += 1;
}
let top_char = count.iter().max_by(|a, b| a.1.cmp(&b.1)).unwrap();
println!("Most frequent character in text: {}", top_char.0);
Use BTreeMap for sorted data
BTreeMap keeps its elements sorted by key, so swapping each key with its value and collecting the pairs into a BTreeMap
let count_b: BTreeMap<&u32,&char> = count.iter().map(|(k,v)| (v,k)).collect();
should give you a sorted map according to character frequency.
Characters that share a frequency will be lost, though, because each key can appear only once. But if you only want the most frequent character, that does not matter.
You can get the result using
println!("Most frequent character in text: {}", count_b.last_key_value().unwrap().1);
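If the characters that share a frequency should be kept, one option is to map each frequency to a set of characters instead of a single character. This is a sketch under that assumption; group_by_frequency is a hypothetical name:

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Maps each frequency to every character that occurs that many times.
fn group_by_frequency(text: &str) -> BTreeMap<u32, BTreeSet<char>> {
    let mut count: HashMap<char, u32> = HashMap::new();
    for c in text.to_lowercase().chars() {
        *count.entry(c).or_insert(0) += 1;
    }
    let mut by_freq: BTreeMap<u32, BTreeSet<char>> = BTreeMap::new();
    for (c, n) in count {
        // Characters with the same count end up in the same set instead of overwriting each other.
        by_freq.entry(n).or_default().insert(c);
    }
    by_freq
}
```

Calling last_key_value() on the result then gives the highest frequency together with all characters that reach it.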
