How does one create a HashMap with a default value in Rust? - rust

Being fairly new to Rust, I was wondering how to create a HashMap with a default value for a key. For example, having a default value of 0 for any key inserted into the HashMap.
In Rust, I know this creates an empty HashMap:
let mut mymap: HashMap<char, usize> = HashMap::new();
I am looking to maintain a counter for a set of keys, for which one way to go about it seems to be:
for ch in "AABCCDDD".chars() {
    mymap.insert(ch, 0);
}
Is there a way to do it in a much better way in Rust, maybe something equivalent to what Ruby provides:
mymap = Hash.new(0)
mymap["b"] = 1
mymap["a"] # 0

Answering the problem you have...
I am looking to maintain a counter for a set of keys.
Then you want to look at How to lookup from and insert into a HashMap efficiently?. Hint: *map.entry(key).or_insert(0) += 1
Answering the question you asked...
How does one create a HashMap with a default value in Rust?
No, HashMaps do not have a place to store a default. Doing so would cause every user of that data structure to allocate space to store it, which would be a waste. You'd also have to handle the case where there is no appropriate default, or when a default cannot be easily created.
Instead, you can look up a value using HashMap::get and provide a default if it's missing using Option::unwrap_or:
use std::collections::HashMap;
fn main() {
let mut map: HashMap<char, usize> = HashMap::new();
map.insert('a', 42);
let a = map.get(&'a').cloned().unwrap_or(0);
let b = map.get(&'b').cloned().unwrap_or(0);
println!("{}, {}", a, b); // 42, 0
}
If unwrap_or doesn't work for your case, there are several similar functions that might:
Option::unwrap_or_else
Option::map_or
Option::map_or_else
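As a rough sketch of how these three alternatives differ, the example below looks up the same key in each style. The helper expensive_default and the function lookups are made-up names for illustration only; none of the lookups modifies the map.

```rust
use std::collections::HashMap;

// Hypothetical helper: stands in for a default that is costly to build.
fn expensive_default() -> usize {
    1_000
}

// Looks up `key` in three different ways, none of which modifies the map.
fn lookups(map: &HashMap<char, usize>, key: char) -> (usize, String, String) {
    // unwrap_or_else: the function runs only when the key is missing.
    let lazy = map.get(&key).copied().unwrap_or_else(expensive_default);

    // map_or: transform the value if present, otherwise use a fixed default.
    let label = map.get(&key).map_or(String::from("none"), |v| v.to_string());

    // map_or_else: like map_or, but the default is computed lazily too.
    let lazy_label = map
        .get(&key)
        .map_or_else(|| format!("no entry for {key}"), |v| format!("{key} = {v}"));

    (lazy, label, lazy_label)
}

fn main() {
    let mut map = HashMap::new();
    map.insert('a', 42);
    println!("{:?}", lookups(&map, 'a'));
    println!("{:?}", lookups(&map, 'b'));
}
```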
Of course, you are welcome to wrap this in a function or a data structure to provide a nicer API.
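One possible shape for such a wrapper is sketched below. DefaultMap is a hypothetical type, not something the standard library provides, and this version deliberately never inserts the default on lookup, so reads only need a shared reference.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Hypothetical wrapper (not a standard library type): a map plus a fallback value.
struct DefaultMap<K, V> {
    map: HashMap<K, V>,
    default: V,
}

impl<K: Eq + Hash, V> DefaultMap<K, V> {
    fn new(default: V) -> Self {
        DefaultMap { map: HashMap::new(), default }
    }

    fn insert(&mut self, key: K, value: V) -> Option<V> {
        self.map.insert(key, value)
    }

    // Read-only lookup: falls back to the stored default without inserting it.
    fn get(&self, key: &K) -> &V {
        self.map.get(key).unwrap_or(&self.default)
    }
}

fn main() {
    let mut counts = DefaultMap::new(0);
    counts.insert('b', 1);
    println!("{}, {}", counts.get(&'a'), counts.get(&'b')); // prints "0, 1"
}
```

The trade-off is the one described above: every DefaultMap pays for storing one extra value, which is why this is left to the caller rather than built into HashMap.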
ArtemGr brings up an interesting point:
in C++ there's a notion of a map inserting a default value when a key is accessed. That always seemed a bit leaky though: what if the type doesn't have a default? Rust is less demanding on the mapped types and more explicit about the presence (or absence) of a key.
Rust adds an additional wrinkle to this. Actually inserting a value would require that simply getting a value can also change the HashMap. This would invalidate any existing references to values in the HashMap, as a reallocation might be required. Thus you'd no longer be able to get references to two values at the same time! That would be very restrictive.

What about using entry to get an element from the HashMap and then modify it?
From the docs:
fn entry(&mut self, key: K) -> Entry<K, V>
Gets the given key's corresponding entry in the map for in-place
manipulation.
Example:
use std::collections::HashMap;
let mut letters = HashMap::new();
for ch in "a short treatise on fungi".chars() {
let counter = letters.entry(ch).or_insert(0);
*counter += 1;
}
assert_eq!(letters[&'s'], 2);
assert_eq!(letters[&'t'], 3);
assert_eq!(letters[&'u'], 1);
assert_eq!(letters.get(&'y'), None);

.or_insert() and .or_insert_with()
Adding to the existing example for .entry().or_insert(), I wanted to mention that if the default value passed to .or_insert() is dynamically generated, it's better to use .or_insert_with().
Using .or_insert_with() as below, the default value is not generated if the key already exists. It only gets created when necessary.
for v in 0..s.len() {
components.entry(unions.get_root(v))
.or_insert_with(|| vec![]) // vec only created if needed.
.push(v);
}
In the snippet below, the default vector passed to .or_insert() is constructed on every call. If the key already exists, that vector is created and immediately dropped, which can be wasteful.
components.entry(unions.get_root(v))
.or_insert(vec![]) // vec always created.
.push(v);
So for fixed values that don't have much creation overhead, use .or_insert(), and for values that have appreciable creation overhead, use .or_insert_with().
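The snippets above depend on names (s, components, unions) that are not shown, so here is a self-contained sketch of the same idea. The function group_by_first_letter and its counter are invented for illustration; the counter only exists to make visible how often the closure passed to .or_insert_with() actually runs.

```rust
use std::collections::HashMap;

// Groups words by first letter and reports how often the default closure ran.
fn group_by_first_letter<'a>(words: &[&'a str]) -> (HashMap<char, Vec<&'a str>>, usize) {
    let mut groups: HashMap<char, Vec<&'a str>> = HashMap::new();
    let mut defaults_built = 0;

    for &word in words {
        // Empty strings have no first letter and are skipped.
        if let Some(first) = word.chars().next() {
            groups
                .entry(first)
                .or_insert_with(|| {
                    defaults_built += 1; // runs only for keys not yet in the map
                    Vec::new()
                })
                .push(word);
        }
    }
    (groups, defaults_built)
}

fn main() {
    let words = ["apple", "avocado", "banana", "blueberry", "cherry"];
    let (groups, defaults_built) = group_by_first_letter(&words);
    // Five words, but only three distinct first letters.
    println!("{} groups, {} defaults built", groups.len(), defaults_built);
}
```

When the value type implements Default, .entry(key).or_default() is a shorter way to get the same lazy behavior.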

One way to start a map with initial values is to construct it from a vector of tuples, as in the code below (value1 and value2 are placeholders):
let map = vec![("field1".to_string(), value1), ("field2".to_string(), value2)].into_iter().collect::<HashMap<_, _>>();
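A runnable version of that idea might look like the sketch below, with made-up integer values standing in for value1 and value2. It also shows HashMap::from with an array, which should work on Rust 1.56 and later. Note that this only pre-populates the map; it does not give missing keys a default.

```rust
use std::collections::HashMap;

// Collects a vector of (key, value) tuples into a map.
fn from_vec() -> HashMap<String, i32> {
    vec![("field1".to_string(), 1), ("field2".to_string(), 2)]
        .into_iter()
        .collect()
}

// Builds the same map from an array of tuples, without the intermediate Vec.
fn from_array() -> HashMap<String, i32> {
    HashMap::from([("field1".to_string(), 1), ("field2".to_string(), 2)])
}

fn main() {
    println!("{}", from_vec() == from_array()); // prints "true"
}
```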

Related

How to iterate over HashMap starting from given key?

Given a HashMap of n elements how does one start iteration from n-x element.
The order of elements does not matter, the only problem I need to solve is to start iteration from given key.
Example:
let mut map: HashMap<&str, i32> = HashMap::new();
map.insert("one", 1);
map.insert("two", 2);
map.insert("three", 3);
map.insert("four", 4);
[...]
for (k, v) in map {
//how to start iteration from third item and not the first one
}
Tried to google it but no examples found so far.
Tried to google it but no examples found so far.
That's because, as Chayim Friedman notes, it doesn't really make sense: a hash map has an essentially random internal order, which means it has an arbitrary iteration order. Iterating from or between keys (or entries) thus doesn't make much sense.
So it sounds a lot like an XY problem, what is the reason why you're trying to iterate "starting from a given key"?
Though if you really want that, you can just use the skip_while adapter, and skip while you have not found the key you're looking for.
Alternatively, since your post is ambiguous (you talk about both key and position) you can use the skip adapter to skip over a fixed number of items.
Technically neither will start iterating from that entry; both start iterating from the first entry but only yield the items following the specified break point. The standard library's HashMap has no support for range iteration (because that doesn't really make sense on a hash map), and its iterators are not random access either (for similar reasons).
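A small sketch of both adapters follows. The function names from_key and after_first are invented here, and because HashMap iteration order is arbitrary, which entries come out after the starting point can differ from one run to the next.

```rust
use std::collections::HashMap;

// Yields the entry for `start` and everything the iterator visits after it.
// Which entries those are depends on the map's arbitrary iteration order.
fn from_key<'a>(map: &HashMap<&'a str, i32>, start: &str) -> Vec<(&'a str, i32)> {
    map.iter()
        .skip_while(|&(k, _)| *k != start)
        .map(|(k, v)| (*k, *v))
        .collect()
}

// Skips a fixed number of entries, whatever they happen to be.
fn after_first<'a>(map: &HashMap<&'a str, i32>, n: usize) -> Vec<(&'a str, i32)> {
    map.iter().skip(n).map(|(k, v)| (*k, *v)).collect()
}

fn main() {
    let map = HashMap::from([("one", 1), ("two", 2), ("three", 3), ("four", 4)]);
    println!("{:?}", from_key(&map, "three"));
    println!("{:?}", after_first(&map, 2));
}
```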
You may want to use a BTreeMap, which has sorted keys and a range function which iterates over a range of keys.
use std::collections::BTreeMap;
fn main() {
let mut map = BTreeMap::new();
map.insert(1, "one");
map.insert(2, "two");
map.insert(3, "three");
for (&key, &value) in map.range(2..) {
println!("{key}: {value}");
}
}
// 2: two
// 3: three

How to check two HashMap are identical in Rust?

I have two HashMaps (playground):
let mut m1: HashMap<u8, usize, _> = HashMap::new();
m1.insert(1, 100);
m1.insert(2, 200);
let mut m2: HashMap<u8, usize, _> = HashMap::new();
m2.insert(2, 200);
m2.insert(1, 100);
How can I check if the two maps m1 and m2 are identical?
By "identical", I mean all of the following conditions are satisfied.
Type of keys is same.
Type of values is same.
Two maps have exactly the same key set. Insertion order shall not matter.
Two maps have exactly the same value for every key (i.e. m1.get(k) == m2.get(k) for every existing key k).
As far as I tested, just m1 == m2 works. However, is this behavior guaranteed? I want some sort of guarantee (thus I added the #language-lawyer tag).
I've already read the official documentation of HashMap.
Also, what about HashSet and Vec? (I've also read their documentation.)
Looking through the source of the std libraries, you can find the implementation of PartialEq for those different collections:
HashMap first checks that both maps have the same length, then iterates over every key/value pair and checks that the other map has an entry for that key with an equal value: source.
HashSet likewise checks the lengths, then iterates over its elements and checks that the other set contains each one: source.
Vec calls eq on the underlying slice, which either iterates over the elements and compares them one by one: source, or does a bitwise comparison by calling memcmp if the type allows it: source.
I don't know of any formal guarantee that this behavior will never change, but since these are stable, widely used APIs, I don't expect them to change.
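To see that behavior in practice, here is a short sketch. The helper same_maps is a made-up name, and the assertions reflect how == behaves for these collections in current Rust rather than any written guarantee.

```rust
use std::collections::{HashMap, HashSet};

// Two maps with the same entries, inserted in a different order.
fn same_maps() -> (HashMap<u8, usize>, HashMap<u8, usize>) {
    let m1 = HashMap::from([(1, 100), (2, 200)]);
    let m2 = HashMap::from([(2, 200), (1, 100)]);
    (m1, m2)
}

fn main() {
    let (m1, mut m2) = same_maps();
    assert!(m1 == m2); // insertion order is irrelevant

    m2.insert(2, 999);
    assert!(m1 != m2); // same keys, but one value differs

    let s1: HashSet<u8> = HashSet::from([1, 2, 3]);
    let s2: HashSet<u8> = HashSet::from([3, 2, 1]);
    assert!(s1 == s2); // sets compare by membership only

    // Vec is a sequence, so element order is part of equality.
    assert!(vec![1, 2] != vec![2, 1]);
}
```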

What is the proper way of modifying a value of an entry in a HashMap?

I am a beginner in Rust, I haven't finished the "Book" yet, but one thing made me ask this question.
Considering this code:
use std::collections::HashMap;

fn main() {
    let mut entries = HashMap::new();
    entries.insert("First".to_string(), 10);
    entries.entry("Second".to_string()).or_insert(20);
    assert_eq!(10, *entries.get("First").unwrap());
    entries.entry(String::from("First")).and_modify(|value| { *value = 20});
    assert_eq!(20, *entries.get("First").unwrap());
    entries.insert("First".to_string(), 30);
    assert_eq!(30, *entries.get("First").unwrap());
}
I have used two ways of modifying an entry:
entries.entry(String::from("First")).and_modify(|value| { *value = 20});
entries.insert("First".to_string(), 30);
The insert way looks clunky, and I wouldn't personally use it to modify a value in an entry, but... it works. Nevertheless, is there a reason not to use it other than semantics? As I said, I'd rather use the entry construct than brute-force an update using insert with an existing key. Is there something a newbie Rustacean like me could not possibly know?
insert() is a bit more idiomatic when you are replacing an entire value, particularly when you don't know (or care) if the value was present to begin with.
get_mut() is more idiomatic when you want to do something to a value that requires mutability, such as replacing only one field of a struct or invoking a method that requires a mutable reference. If you know the key is present you can use .unwrap(), otherwise you can use one of the other Option utilities or match.
entry(...).and_modify(...) by itself is rarely idiomatic; it's more useful when chaining other methods of Entry together, such as where you want to modify a value if it exists, otherwise add a different value. You might see this pattern when working with maps where the values are totals:
entries.entry(key)
.and_modify(|v| *v += 1)
.or_insert(1);
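The get_mut() and entry() cases described above can be sketched as follows. The Player struct and both function names are hypothetical, chosen only to give get_mut() a value with more than one field to work on.

```rust
use std::collections::HashMap;

// Hypothetical value type with more than one field.
#[derive(Debug)]
struct Player {
    name: String,
    score: u32,
}

// get_mut: change one field of an existing value; returns false if the key is absent.
fn add_points(players: &mut HashMap<u32, Player>, id: u32, points: u32) -> bool {
    match players.get_mut(&id) {
        Some(player) => {
            player.score += points; // the name field is left untouched
            true
        }
        None => false,
    }
}

// entry + and_modify + or_insert: increment if present, otherwise start at 1.
fn record_visit(visits: &mut HashMap<String, u32>, page: &str) {
    visits
        .entry(page.to_string())
        .and_modify(|n| *n += 1)
        .or_insert(1);
}

fn main() {
    let mut players = HashMap::new();
    // insert: replaces the whole value, whether or not the key existed.
    players.insert(1, Player { name: "Ann".to_string(), score: 10 });
    add_points(&mut players, 1, 5);
    println!("{} has {} points", players[&1].name, players[&1].score);

    let mut visits = HashMap::new();
    record_visit(&mut visits, "home");
    record_visit(&mut visits, "home");
    println!("{:?}", visits);
}
```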

Rust: How to combine the Entry API with owned data?

I have a HashMap and would like to update a value if it exists and otherwise add a default one. Normally I would do it like this:
some_map.entry(some_key)
.and_modify(|e| modify(e))
.or_insert(default)
However, my modify now has type fn(T) -> T, and the borrow checker obviously won’t allow me to write:
some_map.entry(some_key)
.and_modify(|e| *e = modify(*e))
.or_insert(default)
What is the preferred way of doing this in Rust? Should I just use remove and insert?
Assuming you can create an empty version of your T for cheap, you could use mem::replace:
some_map.entry(some_key)
    .and_modify(|e| {
        // swaps the empty value in and returns the value which was there
        let v = mem::replace(e, T::empty());
        // writes the modified value back, dropping the empty placeholder
        *e = modify(v);
    })
    .or_insert(default)
This assumes modify does not panic, though; otherwise you'll find yourself with the "empty" value staying in your map. But you'd have a similar issue with remove / insert.
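For a concrete, runnable variant, the sketch below assumes T is String, whose Default value (an empty string) is cheap to create, and uses mem::take, which is shorthand for mem::replace with the default value. The functions shout and update are hypothetical stand-ins for modify and the surrounding code.

```rust
use std::collections::HashMap;
use std::mem;

// Hypothetical by-value transformation, standing in for `modify: fn(T) -> T`.
fn shout(s: String) -> String {
    s.to_uppercase() + "!"
}

fn update(map: &mut HashMap<u32, String>, key: u32, default: String) {
    map.entry(key)
        // mem::take leaves an empty String behind and hands the old value
        // to `shout`; the result is then written back into the slot.
        .and_modify(|e| *e = shout(mem::take(e)))
        .or_insert(default);
}

fn main() {
    let mut map = HashMap::from([(1, "hi".to_string())]);
    update(&mut map, 1, "new".to_string()); // key present: value is transformed
    update(&mut map, 2, "new".to_string()); // key absent: default is inserted
    println!("{:?}", map);
}
```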

Sort HashMap data by value

I want to sort HashMap data by value in Rust (e.g., when counting character frequency in a string).
The Python equivalent of what I’m trying to do is:
count = {}
for c in text:
    count[c] = count.get(c, 0) + 1
sorted_data = sorted(count.items(), key=lambda item: -item[1])
print('Most frequent character in text:', sorted_data[0][0])
My corresponding Rust code looks like this:
// Count the frequency of each letter
let mut count: HashMap<char, u32> = HashMap::new();
for c in text.to_lowercase().chars() {
*count.entry(c).or_insert(0) += 1;
}
// Get a sorted (by field 0 ("count") in reversed order) list of the
// most frequently used characters:
let mut count_vec: Vec<(&char, &u32)> = count.iter().collect();
count_vec.sort_by(|a, b| b.1.cmp(a.1));
println!("Most frequent character in text: {}", count_vec[0].0);
Is this idiomatic Rust? Can I construct the count_vec in a way so that it would consume the HashMap's data and own it (e.g., using map())? Would this be more idiomatic?
Is this idiomatic Rust?
There's nothing particularly unidiomatic, except possibly for the unnecessary full type constraint on count_vec; you could just use
let mut count_vec: Vec<_> = count.iter().collect();
It's not difficult from context to work out what the full type of count_vec is. You could also omit the type constraint for count entirely, but then you'd have to play shenanigans with your integer literals to have the correct value type inferred. That is to say, an explicit annotation is eminently reasonable in this case.
The other borderline change you could make if you feel like it would be to use |a, b| a.1.cmp(b.1).reverse() for the sort closure. The Ordering::reverse method just reverses the result so that less-than becomes greater-than, and vice versa. This makes it slightly more obvious that you meant what you wrote, as opposed to accidentally transposing two letters.
Can I construct the count_vec in a way so that it would consume the HashMap's data and own it?
Not in any meaningful way. Just because HashMap is using memory doesn't mean that memory is in any way compatible with Vec. You could use count.into_iter() to consume the HashMap and move the elements out (as opposed to iterating over pointers), but since both char and u32 are trivially copyable, this doesn't really gain you anything.
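For completeness, a version that does consume the map with into_iter() could look like the sketch below. The function name frequencies is made up, and the tie-break on the character is an addition (not in the original code) so that the output order does not depend on the map's iteration order.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

// Returns (char, count) pairs, most frequent first; the Vec owns its data.
fn frequencies(text: &str) -> Vec<(char, u32)> {
    let mut count: HashMap<char, u32> = HashMap::new();
    for c in text.to_lowercase().chars() {
        *count.entry(c).or_insert(0) += 1;
    }
    // into_iter() consumes the map and moves the pairs into the Vec.
    let mut pairs: Vec<(char, u32)> = count.into_iter().collect();
    // Descending count first, then ascending character to break ties.
    pairs.sort_by_key(|&(c, n)| (Reverse(n), c));
    pairs
}

fn main() {
    let pairs = frequencies("Hello");
    println!("Most frequent character in text: {}", pairs[0].0); // prints "l"
}
```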
This could be another way to address the matter without the need of an intermediary vector.
// Count the frequency of each letter
let mut count: HashMap<char, u32> = HashMap::new();
for c in text.to_lowercase().chars() {
*count.entry(c).or_insert(0) += 1;
}
let top_char = count.iter().max_by(|a, b| a.1.cmp(&b.1)).unwrap();
println!("Most frequent character in text: {}", top_char.0);
Use BTreeMap for sorted data
BTreeMap keeps its elements sorted by key, so swapping each key with its value and collecting the pairs into a BTreeMap
let count_b: BTreeMap<&u32,&char> = count.iter().map(|(k,v)| (v,k)).collect();
should give you a map sorted by character frequency.
Characters that share a frequency overwrite one another, though, so only one character per frequency survives. If you only want the most frequent character, that does not matter.
You can get the result using
println!("Most frequent character in text: {}", count_b.last_key_value().unwrap().1);
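If losing tied characters is a problem, one possible variation (an assumption on my part, not part of the answer above) is to map each frequency to a Vec of characters instead of a single character. The function name by_frequency is hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

// Groups characters by frequency so that ties are kept rather than overwritten.
fn by_frequency(text: &str) -> BTreeMap<u32, Vec<char>> {
    let mut count: HashMap<char, u32> = HashMap::new();
    for c in text.chars() {
        *count.entry(c).or_insert(0) += 1;
    }

    let mut grouped: BTreeMap<u32, Vec<char>> = BTreeMap::new();
    for (c, n) in count {
        grouped.entry(n).or_default().push(c);
    }
    // Sort each group so the result does not depend on HashMap iteration order.
    for chars in grouped.values_mut() {
        chars.sort_unstable();
    }
    grouped
}

fn main() {
    let grouped = by_frequency("aabbc");
    // The last entry of a BTreeMap holds the largest key, i.e. the highest frequency.
    if let Some((n, chars)) = grouped.iter().next_back() {
        println!("Most frequent ({n} times): {chars:?}");
    }
}
```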

Resources