Rust Collect Hashmap from Iterator of Pairs - rust

We have a HashMap, over which we iterate and map to replace the values, but we are running into an issue collecting the result back into a new HashMap with a different value type.
value of type `std::collections::HashMap<std::string::String, std::string::String>`
cannot be built from `std::iter::Iterator<Item=(&std::string::String, std::string::String)>`
What we are doing essentially boils down to this:
let old: HashMap<String, Value> = some_origin();
let new: HashMap<String, String> = old.iter().map(|(key, value)| {
return (key, some_conversion(value));
}).collect();
The same iterator type is also returned (and cannot be collected) if one zips two iterators, e.g. in this case zipping the keys with a map that only returns the converted values:
new = old.keys().into_iter().zip(old.iter().map(|(key, value)| some_conversion(value))).collect();

The issue is that iter() returns a non-consuming iterator which hands out references to the underlying keys and values [1]. The new HashMap cannot be constructed from references (&String); it needs owned values (String).
In your example, some_conversion seems to return a new String for the value part, so applying .clone() to the key would do the trick:
let old: HashMap<String, Value> = some_origin();
let new: HashMap<String, String> = old.iter().map(|(key, value)| {
return (key.clone(), some_conversion(value));
// ^---- .clone() call inserted
}).collect();
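As a self-contained version of the fix, here is a minimal sketch. The `Value` struct and the body of `some_conversion` are hypothetical stand-ins for the question's types; only the `key.clone()` part is the actual fix.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the question's `Value` type.
#[derive(Debug)]
struct Value(u32);

// Hypothetical stand-in for `some_conversion`: builds a fresh, owned String
// from a borrowed Value.
fn some_conversion(value: &Value) -> String {
    format!("value-{}", value.0)
}

// Builds a new map while leaving `old` untouched. iter() yields
// (&String, &Value), so the key must be cloned to get an owned String.
fn convert(old: &HashMap<String, Value>) -> HashMap<String, String> {
    old.iter()
        .map(|(key, value)| (key.clone(), some_conversion(value)))
        .collect()
}
```

Because `convert` only borrows `old`, the original map is still available after the call.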
Looking at the error message from the compiler [2], this is indeed quite hard to figure out. I think what most helps with this is to build up an intuition around references and ownership in Rust to understand when references are OK and when an owned value is needed.
While I'd recommend reading the sections on references and ownership in the Rust Book and, even more, Programming Rust, the gist is as follows:
1. Usually, values in Rust have exactly one owner (exceptions are explicit shared-ownership pointers, such as Rc).
2. When a value is passed 'by value', it is moved to the new location. This invalidates the original owner of the value.
3. There can be multiple shared references to a value, but while any shared reference exists, the value is immutable (no mutable references can exist or be created, so it cannot be modified or moved).
4. We cannot move a value out of a shared reference (this would invalidate the original owner, which is immutable while a shared reference exists).
5. Usually, Rust doesn't automatically copy (in Rust parlance, "clone") values, even if it could. Instead, it takes ownership of values. (The exceptions are Copy types, which are cheap to copy, such as i32.)
6. (Not relevant here) There can also be a single mutable reference to a value. While this mutable reference exists, no shared references can be created.
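The rules can be seen side by side in a small sketch. The function name `ownership_demo` is hypothetical; the lines that would break a rule are only described in comments, since they would not compile.

```rust
// Walks through the ownership rules on a String (not Copy) and an i32 (Copy).
fn ownership_demo() -> (usize, usize, i32) {
    let a = String::from("key"); // `a` is the sole owner of the String
    let b = a;                   // moved: using `a` after this line would not compile
    let r = &b;                  // shared reference: `b` cannot be moved or mutated while `r` is in use
    let c: String = r.clone();   // an owned String from a &String requires an explicit clone
    let x = 5;
    let y = x;                   // i32 is Copy: `x` is copied, so both stay usable
    (b.len(), c.len(), x + y)
}
```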
How does this help?
Who owns the keys in the hash map? The hash map does (rule 1)!
But how do we get a new key-value pair into the hash map? The values are moved into the hash map (rule 2).
But we can't move out of a shared reference ... (rule 3 + rule 4)
And Rust doesn't want to clone the value unless we tell it to do so (rule 5)
... so we have to clone it ourselves.
I hope this gives some intuition (again I would really recommend Programming Rust on this). In general, if you do something with a value, you either take ownership over it, or you get a reference. If you take ownership, the original variable that had ownership can no longer be used. If you get a reference, you can't hand ownership to somebody else (without cloning). And Rust doesn't clone for you.
[1]: The docs call this "An iterator visiting all key-value pairs in arbitrary order. The iterator element type is (&'a K, &'a V)." Ignoring the 'a lifetime parameters, you can see that the element type is (&K, &V).
[2]:
13 | .collect();
| ^^^^^^^ value of type `std::collections::HashMap<std::string::String, std::string::String>` cannot be built from `std::iter::Iterator<Item=(&std::string::String, std::string::String)>`
|
= help: the trait `std::iter::FromIterator<(&std::string::String, std::string::String)>` is not implemented for `std::collections::HashMap<std::string::String, std::string::String>`

If you don't need the old map any more you can just use into_iter.
let new: HashMap<String, String> = old.into_iter().map(|(key, value)| {
return (key, some_conversion(value));
}).collect();
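A minimal sketch of the consuming variant, assuming (for illustration only) that the old values are `u32` and the conversion is `to_string()`:

```rust
use std::collections::HashMap;

// Consumes `old`: into_iter() yields owned (String, u32) pairs, so each key is
// moved into the new map rather than cloned. The u32 value type is an assumption.
fn convert_owned(old: HashMap<String, u32>) -> HashMap<String, String> {
    old.into_iter()
        .map(|(key, value)| (key, value.to_string()))
        .collect()
}
```

After the call, `old` can no longer be used by the caller, which is exactly why no clone is needed.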

Related

Use of moved value

I have a simple function in Rust that iterates through numbers and adds them to a vector if they fulfill a condition. This condition is a function that uses a previously defined variable, prime_factors.
The is_multiperfect function only needs to look things up in the prime_factors variable.
fn get_all_mpn_below(integer: usize) -> Vec<usize> {
let prime_factors = get_prime_factors_below(integer);
let mut mpn = vec![1];
for n in (2..integer).step_by(2) {
if is_multiperfect(n, prime_factors) {
mpn.push(n);
}
}
return mpn;
}
However, this yields the following error:
use of moved value: `prime_factors`
let prime_factors = get_prime_factors_below(integer);
------------- move occurs because `prime_factors` has type `HashMap<usize, Vec<usize>>`, which does not implement the `Copy` trait
if is_multiperfect(n, prime_factors) {
^^^^^^^^^^^^^ value moved here, in previous iteration of loop
I've looked up the error and found it was about ownership, however I fail to understand how ownership applies here.
How can I fix this error?
"...as I don't declare another variable."
Why would you think that's relevant?
Moving is simply Rust's default behaviour when transferring values (whether assigning them, passing them to functions, or returning them from functions). This happens for every type that is not Copy.
How can I fix this error?
It's hard to say, since the problem lies in is_multiperfect and you don't provide that code, so the reader, not being psychic, has no way to know what is_multiperfect wants out of prime_factors.
Possible solutions are:
clone() the map: this creates a complete copy which the callee can use however it wants while leaving the original available; it gives the callee complete freedom but incurs a large cost for the caller
pass the map as an &mut (unique / mutable reference), if the callee needs to update it
pass the map as an & (shared reference), if the callee just needs to look things up in the map
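Since the question says is_multiperfect only looks things up, the shared-reference option fits. Here is a sketch of that fix; the bodies of get_prime_factors_below and is_multiperfect are hypothetical (trial division, and "sigma(n) is a multiple of n"), since the question doesn't show them. The only change to the question's function is passing &prime_factors.

```rust
use std::collections::HashMap;

// Hypothetical: maps every n in 2..limit to its prime factors (with
// multiplicity, in ascending order) using trial division.
fn get_prime_factors_below(limit: usize) -> HashMap<usize, Vec<usize>> {
    let mut map = HashMap::new();
    for n in 2..limit {
        let mut factors = Vec::new();
        let mut m = n;
        let mut p = 2;
        while p * p <= m {
            while m % p == 0 {
                factors.push(p);
                m /= p;
            }
            p += 1;
        }
        if m > 1 {
            factors.push(m);
        }
        map.insert(n, factors);
    }
    map
}

// Hypothetical: n is multiperfect if the sum of its divisors is a multiple of n.
// It only reads the map, so it takes a shared reference.
fn is_multiperfect(n: usize, prime_factors: &HashMap<usize, Vec<usize>>) -> bool {
    let factors = &prime_factors[&n];
    // The divisor sum is multiplicative: multiply 1 + p + ... + p^k per prime power.
    let mut sigma = 1;
    let mut i = 0;
    while i < factors.len() {
        let p = factors[i];
        let (mut term, mut power) = (1, 1);
        while i < factors.len() && factors[i] == p {
            power *= p;
            term += power;
            i += 1;
        }
        sigma *= term;
    }
    sigma % n == 0
}

fn get_all_mpn_below(integer: usize) -> Vec<usize> {
    let prime_factors = get_prime_factors_below(integer);
    let mut mpn = vec![1];
    for n in (2..integer).step_by(2) {
        // Borrow instead of moving, so the map is still usable in the next iteration.
        if is_multiperfect(n, &prime_factors) {
            mpn.push(n);
        }
    }
    mpn
}
```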

HashMap.remove() causing "cannot borrow as mutable more than once" error [duplicate]

This should be a trivial task in any language. This isn't working in Rust.
use std::collections::HashMap;
fn do_it(map: &mut HashMap<String, String>) {
for (key, value) in map {
println!("{} / {}", key, value);
map.remove(key);
}
}
fn main() {}
Here's the compiler error:
error[E0382]: use of moved value: `*map`
--> src/main.rs:6:9
|
4 | for (key, value) in map {
| --- value moved here
5 | println!("{} / {}", key, value);
6 | map.remove(key);
| ^^^ value used here after move
|
= note: move occurs because `map` has type `&mut std::collections::HashMap<std::string::String, std::string::String>`, which does not implement the `Copy` trait
Why is it trying to move a reference? From the documentation, I didn't think moving/borrowing applied to references.
There are at least two reasons why this is disallowed:
You would need to have two concurrent mutable references to map — one held by the iterator used in the for loop and one in the variable map to call map.remove.
You have references to the key and the value within the map when trying to mutate the map. If you were allowed to modify the map in any way, these references could be invalidated, opening the door for memory unsafety.
A core Rust principle is Aliasing XOR Mutability. You can have multiple immutable references to a value or you can have a single mutable reference to it.
I didn't think moving/borrowing applied to references.
Every type is subject to Rust's rules of moving as well as mutable aliasing. Please let us know what part of the documentation says it isn't so we can address that.
Why is it trying to move a reference?
This is a combination of two facts:
You can only have a single mutable reference, so mutable references don't implement the Copy trait
for loops take the value to iterate over by value
When you call for (k, v) in map {}, the ownership of map is transferred to the for loop and is now gone.
I'd perform an immutable borrow of the map (&*map) and iterate over that. At the end, I'd clear the whole thing:
fn do_it(map: &mut HashMap<String, String>) {
for (key, value) in &*map {
println!("{} / {}", key, value);
}
map.clear();
}
remove every value with a key that starts with the letter "A"
I'd use HashMap::retain:
fn do_it(map: &mut HashMap<String, String>) {
map.retain(|key, value| {
println!("{} / {}", key, value);
!key.starts_with("A")
})
}
This guarantees that key and value no longer exist when the map is actually modified, thus any borrow that they would have had is now gone.
This should be a trivial task in any language.
Rust is preventing you from mutating the map while you are iterating over it. In most languages this is allowed, but often the behaviour is not well-defined, and removal of the item can interfere with the iteration, compromising its correctness.
Why is it trying to move a reference?
HashMap implements IntoIterator, so your loop is equivalent to:
for (key, value) in map.into_iter() {
println!("{} / {}", key, value);
map.remove(key);
}
If you look at the definition of into_iter, you'll see that it takes self, not &self or &mut self. Your variable map is a mutable reference, and IntoIterator is implemented for &mut HashMap - the self in into_iter is the &mut HashMap, not the HashMap. Mutable references cannot be copied (since only one mutable reference to any data can exist at one time) so this mutable reference is moved.
The API is intentionally built that way so that you can't do anything dangerous while looping over a structure. Once the loop is complete, the ownership of the structure is relinquished and you can use it again.
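To keep using the reference after the loop, one option is to hand the loop an explicit reborrow instead of the reference itself. This is a sketch; the function name and the uppercasing are illustrative assumptions, not part of the question.

```rust
use std::collections::HashMap;

// Looping over `map` directly would move the `&mut` reference into the loop.
// Writing `&mut *map` reborrows it for the loop only, so `map` is usable afterwards.
fn uppercase_values(map: &mut HashMap<String, String>) -> usize {
    for (_key, value) in &mut *map {
        // `value` is a `&mut String` obtained through the reborrow
        *value = value.to_uppercase();
    }
    map.len() // allowed: the reborrow ended with the loop
}
```

Removing entries inside such a loop is still rejected, because the loop's borrow of the map is active at that point; the reborrow only helps after the loop.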
One solution is to keep track of the items you intend to remove in a Vec and then remove them afterwards:
fn do_it(map: &mut HashMap<String, String>) {
let mut to_remove = Vec::new();
for (key, value) in &*map {
if key.starts_with("A") {
to_remove.push(key.to_owned());
}
}
for key in to_remove.iter() {
map.remove(key);
}
}
You may also use an iterator to filter the map into a new one. Perhaps something like this:
fn do_it(map: &mut HashMap<String, String>) {
*map = map.into_iter().filter_map(|(key, value)| {
if key.starts_with("A") {
None
} else {
Some((key.to_owned(), value.to_owned()))
}
}).collect();
}
But I just saw Shepmaster's edit - I had forgotten about retain, which is better. It's more concise and doesn't do unnecessary copying as I have done.
Rust actually supports a wide number of potential solutions to this problem, though I myself also found the situation to be a bit confusing at first, and again each time I need a more complicated treatment of my hashmaps.
To iterate through items while removing them, use .drain(). .drain() has the advantage of taking/owning rather than borrowing the values.
If you want to remove only some of them, conditionally, use .drain_filter() (a nightly-only API; it has since been renamed extract_if).
If you need to mutate every item but only want to remove some of them, you may mutate them in .drain_filter()'s closure argument, but this will check for removal after the changes.
If you need to check for removal before the changes, use a variable to store the result of the check, then return that variable at the end. A very slightly slower but perhaps clearer alternative is to mutate them in one for loop, then .drain_filter() them in another for loop or a map.
You may also simply let the hashmap be dropped at the end of the function by taking it by value instead of borrowing it in the function argument, and initialize a new one if you need it. This removes the hashmap completely, of course, so you might prefer to keep it around rather than reinitializing it over and over.
You may also call .clear() to remove all elements, after you're done iterating through them to print them.
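A minimal sketch of the .drain() option applied to the question's print-and-remove loop. Collecting the printed lines into a Vec is an assumption added so the result can be inspected.

```rust
use std::collections::HashMap;

// drain() removes every entry and yields owned (String, String) pairs, so the
// loop body holds no borrow into the map. Returning the lines is an assumption
// made so the output can be checked.
fn print_and_empty(map: &mut HashMap<String, String>) -> Vec<String> {
    let mut lines = Vec::new();
    for (key, value) in map.drain() {
        let line = format!("{} / {}", key, value);
        println!("{}", line);
        lines.push(line);
    }
    lines
}
```

The map is left empty but keeps its allocated capacity, which can matter if it is refilled afterwards.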

Calculate object A, then return object B that references A in Rust

In my code I often want to calculate a new value A, and then return some view of that value B, because B is a type that's more convenient to work with. The simplest case is where A is a vector and B is a slice that I would like to return. Let's say I want to write a function that returns a set of indices. Ideally this would return a slice directly because then I can use it immediately to index a string.
If I return a vector instead of a slice, I have to use to_slice:
fn all_except(except: usize, max:usize) -> Vec<usize> {
(0..except).chain((except + 1)..max).collect()
}
"abcdefg"[all_except(1, 7)]
string indices are ranges of `usize`
the type `str` cannot be indexed by `Vec<usize>`
help: the trait `SliceIndex<str>` is not implemented for `Vec<usize>`
I can't return a slice directly:
fn all_except(except: usize, max:usize) -> &[usize] {
(0..except).chain((except + 1)..max).collect()
}
"abcdefg"[all_except(1, 7)]
^ expected named lifetime parameter
missing lifetime specifier
help: this function's return type contains a borrowed value with an elided lifetime, but the lifetime cannot be derived from the arguments
help: consider using the `'static` lifetime
I can't even return the underlying vector and a slice of it, for the same reason
pub fn all_except(index: usize, max: usize) -> (Vec<usize>, &[usize]) {
let v: Vec<usize> = (0..index).chain((index + 1)..max).collect();
(v, v.as_slice())
}
"abcdefg"[all_except(1, 7).1]
Now it may be possible to hack this particular example using deref coercion (I'm not sure), but I have encountered this problem with more complex types. For example, I have a function that loads an ndarray::Array2<T> from a CSV file and then wants to split it into two parts using array.split_at(), but this returns two ArrayView2<T> which reference the original Array2<T>, so I run into the same issue. I'm wondering whether there's a general solution to this problem. Can I somehow tell the compiler to move A into the parent frame's scope, or let me return a tuple of (A, B), where it realises that the slice is still valid because A is still alive?
Your code doesn't seem to make sense: you can't index a string using a slice. If you could, the first snippet would have worked with just an as_slice() in the caller or something similar, since vecs trivially coerce to slices. That's exactly what the compiler error is telling you: the compiler is looking for a SliceIndex, and a Vec (or slice) is definitely not that.
That aside,
Can I somehow tell the compiler to move A into the parent frame's scope, or let me return a tuple of (A, B), where it realises that the slice is still valid because A is still alive?
There are packages like owning_ref which can bundle owner and reference to avoid extra allocations. It tends to be somewhat fiddly.
I don't think there's any other general solution, because Rust reasons at the function level: the type checker has no notion of "move A into the parent scope". So you need a construct which works around the borrow checker.
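Without an extra crate, a common workaround is to return only the owner (A) and let the caller borrow the view (B) from a local binding. This sketch reuses the question's all_except; chars_except is a hypothetical helper, and it assumes ASCII input so that byte indices and character positions coincide.

```rust
// From the question: returns the owner (A) by value.
fn all_except(except: usize, max: usize) -> Vec<usize> {
    (0..except).chain((except + 1)..max).collect()
}

// Hypothetical helper: the caller keeps A in a local binding, so a view (B)
// borrowed from it stays valid for as long as that binding lives.
// Assumes `s` is ASCII.
fn chars_except(s: &str, except: usize) -> String {
    let indices: Vec<usize> = all_except(except, s.len()); // A lives in this frame
    let view: &[usize] = &indices;                          // B borrows A
    view.iter().map(|&i| s.as_bytes()[i] as char).collect()
}
```

The same shape applies to the ndarray case: return the Array2<T> and call split_at on it in the caller, where the array outlives both views.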

How to return a reference to a value from Hashmap wrappered in Arc and Mutex in Rust?

I'm having trouble returning a reference to a value in a HashMap<String, String> which is wrapped in Arc and Mutex for sharing between threads. The code is like this:
use std::sync::{Arc,Mutex};
use std::collections::HashMap;
struct Hey{
a:Arc<Mutex<HashMap<String, String>>>
}
impl Hey {
fn get(&self,key:&String)->&String{
self.a.lock().unwrap().get(key).unwrap()
}
}
As shown above, the code fails to compile with "returns a value referencing data owned by the current function". I know that lock() returns a MutexGuard, which is a local variable. But how can I get a reference to the value in the HashMap? If I can't, what is Rust's motivation for forbidding this?
Let me explain why rustc thinks that your code is wrong.
You can interact with a value protected by a Mutex only while you hold its lock.
The lock is handled by an RAII guard.
So, let me desugar your code:
fn get(&self,key:&String)->&String{
let lock = self.a.lock().unwrap();
let reference = lock.get(key).unwrap();
drop(lock); // release your lock
// We return a reference to data which is no longer protected by the Mutex!
// Someone could delete the item from the hashmap and you would read freed data
// Use-after-free is UB, so rustc forbids that
return reference;
}
Probably you need to use Arcs as values:
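use std::collections::HashMap;
use std::sync::{Arc, RwLock};
#[derive(Default)]
struct Hey{
a:Arc<RwLock<HashMap<String, Arc<String>>>>
}
impl Hey {
// An RwLock is locked with read()/write(); cloning the Arc only bumps a refcount
fn get(&self,key:&String)->Arc<String>{
self.a.read().unwrap().get(key).unwrap().clone()
}
}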
P.S.
Also, you can use Arc<str> (and I would recommend that), which would save you from extra pointer indirection. It can be built from String: let arc: Arc<str> = my_string.into(); or Arc::from(my_string)
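A minimal sketch of the Arc<str> variant. The insert helper is hypothetical, and returning Option instead of unwrapping is a deviation from the question, chosen so a missing key doesn't panic.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Variant of `Hey` storing Arc<str> values.
#[derive(Default)]
struct Hey {
    a: Arc<RwLock<HashMap<String, Arc<str>>>>,
}

impl Hey {
    // Hypothetical helper: String -> Arc<str> via From/Into.
    fn insert(&self, key: &str, value: String) {
        let value: Arc<str> = value.into();
        self.a.write().unwrap().insert(key.to_string(), value);
    }

    // Clones the Arc (a reference-count bump), not the string data; the read
    // guard is dropped before the function returns.
    fn get(&self, key: &str) -> Option<Arc<str>> {
        self.a.read().unwrap().get(key).cloned()
    }
}
```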
TL;DR
Since you made the design decision of wrapping your data, i.e. the HashMap<String, String>, in Arc<Mutex<..>>, I assume you need to share this data across threads/tasks in a thread-safe manner. That is the primary use case for this design choice.
So, my suggestion for anyone reading this today isn't a direct answer to this question (returning a reference), but rather to change the design so that you return owned data, using something like the .to_owned() method on the result of the get call.
fn get(&self, key: &String) -> String {
let lock = self.a.lock().unwrap(); // #1 Returns MutexGuard
let val = lock.get(key).unwrap();
val.to_owned()
}
Long Form
In the original code snippet, there are actually 2 problems at hand, though only 1 is mentioned in the question.
cannot return value referencing temporary value
returns a value referencing data owned by the current function
Let's try to dig deeper into each of these one by one.
Problem 1
cannot return value referencing temporary value
The temporary value here refers to the MutexGuard. The lock method doesn't give us a reference to the HashMap but rather a MutexGuard wrapped around the HashMap. The reason .get() works on a MutexGuard is that MutexGuard implements the Deref trait. Essentially, this means that the MutexGuard can deref into the value it wraps when needed. This deref happens when we call the .get() method.
We can understand the temporary nature of the MutexGuard by looking at the signature of its deref method.
fn deref<'a>(&'a self) -> &'a T
This means that the MutexGuard can only hand out a reference to the HashMap for as long as the guard itself is alive; notice the lifetime 'a (usually elided), which ties the returned reference to the guard. But since we don't store the MutexGuard in a local variable and instead dereference the temporary directly, the compiler treats it as a temporary that is dropped once the expression has been evaluated.
The borrow of the HashMap therefore cannot outlive the MutexGuard, and any reference returned by .get() shares that borrow. Hence, the reference returned by .get() would outlive the guard it depends on.
Solution 1: Store the MutexGuard locally
If we store the MutexGuard in a local variable using a let binding, then the borrow of the HashMap lasts for the rest of the function scope, and so can the reference returned by get.
let lock = self.a.lock().unwrap(); // storing MutexGuard in local binding
let val = lock.get(key).unwrap(); // val can live as long as lock is alive which is function's lifetime
With this issue fixed, there is just one problem left.
Problem 2
returns a value referencing data owned by the current function
Since we take the lock in the current function scope, the reference returned from the get call will only be alive for as long as the lock is alive. When we try to return that reference from the function, the compiler rejects it with the error above: the data it borrows is only valid within the current function scope. This also makes sense, since we only (indirectly) asked for the lock to be held within this function scope. It is semantically wrong to expect the reference to be valid outside the scope of this function.
Solution 2: Change in approach
The whole idea of using Arc and Mutex is to add the capability to update the data between multiple threads safely. This thread safety is provided by the Mutex which enables locking mechanism on the wrapped data, in your case HashMap.
As pointed out by @Abhijit-K, it's not a good design to take a reference to any value outside the scope of the lock.
As explained very nicely in the post by @Angelico, the lock is dropped within the scope of the function.
Case 1: modifying wrapped data
Only fetch the wrapped value where you have to make changes to the data: take the lock where you want to change the data, and make the change in the same scope.
To start with, you pass clones of the Arc around between functions. That is the power of Arc: it can give you many cloned references pointing to the same data on the heap.
Case 2: reading wrapped data
Take a cloned value instead of a reference, i.e. change the return type from &String to String.
You need to clone the Arc and move the clone to another thread/task. From the clone you can lock and access the data. I suggest using RwLock instead of Mutex if there are more reads than writes.
When you clone an Arc you are not cloning the underlying object, just the Arc. Also, in your case you need to wrap the struct in an Arc or change the design, as it is the Arc that should be cloned and moved.
I believe the object should be shared via guards. With an RwLock, multiple readers can access the map through read guards:
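A minimal sketch of cloning the Arc into another thread, using std::thread rather than an async runtime; the function name and the key/value strings are illustrative assumptions.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Clones the Arc (not the map) into a worker thread, which writes under a
// write lock; the original thread then reads an owned copy of the value.
fn share_across_thread() -> Option<String> {
    let map: Arc<RwLock<HashMap<String, String>>> = Arc::new(RwLock::new(HashMap::new()));

    let for_worker = Arc::clone(&map); // bumps the reference count only
    let handle = thread::spawn(move || {
        // The write lock is taken and released within this statement.
        for_worker.write().unwrap().insert("k1".to_string(), "v1".to_string());
    });
    handle.join().unwrap();

    // Clone the String out so no guard or reference escapes the function.
    // Binding it first ends the read guard before `map` is dropped.
    let value = map.read().unwrap().get("k1").cloned();
    value
}
```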
use async_std::task;
use std::sync::{Arc,RwLock, RwLockReadGuard, RwLockWriteGuard};
use std::collections::HashMap;
#[derive(Default)]
struct Hey{
a:Arc<RwLock<HashMap<String, String>>>
}
impl Hey {
pub fn read(&self) -> RwLockReadGuard<'_, HashMap<String, String>> {
self.a.read().unwrap()
}
pub fn write(&self) -> RwLockWriteGuard<'_, HashMap<String, String>> {
self.a.write().unwrap()
}
}
fn main() {
let h = Hey{..Default::default()};
h.write().insert("k1".to_string(), "v1".to_string());
println!("{:?}", h.read().get("k1"));
task::block_on(async move {
println!("{:?}", h.read().get("k1"));
});
}

Why does cloned() allow this function to compile

I'm starting to learn Rust and I tried to implement a function to reverse a vector of strings. I found a solution but I don't understand why it works.
This works:
fn reverse_strings(strings:Vec<&str>) -> Vec<&str> {
let actual: Vec<_> = strings.iter().cloned().rev().collect();
return actual;
}
But this doesn't.
fn reverse_strings(strings:Vec<&str>) -> Vec<&str> {
let actual: Vec<_> = strings.iter().rev().collect(); // without clone
return actual;
}
Error message
src/main.rs:28:10: 28:16 error: mismatched types:
expected `collections::vec::Vec<&str>`,
found `collections::vec::Vec<&&str>`
(expected str,
found &-ptr) [E0308]
Can someone explain to me why? What happens in the second function? Thanks!
So the call to .cloned() is essentially like doing .map(|i| i.clone()) in the same position (i.e. you can replace the former with the latter).
The thing is that when you call iter(), you're iterating/operating on references to the items being iterated. Notice that the vector already consists of 'references', specifically string slices.
So to zoom in a bit, let's replace cloned() with the equivalent map() that I mentioned above, for pedagogical purposes, since they are equivalent. This is what it actually looks like:
.map(|i: & &str| i.clone())
So notice that that's a reference to a reference (slice), because like I said, iter() operates on references to the items, not the items themselves. So since a single element in the vector being iterated is of type &str, then we're actually getting a reference to that, i.e. & &str. By calling clone() on each of these items, we go from a & &str to a &str, just like calling .clone() on a &i64 would result in an i64.
So to bring everything together, iter() iterates over references to the elements. So if you create a new vector from the collected items yielded by the iterator you construct (which you constructed by calling iter()) you would get a vector of references to references, that is:
let actual: Vec<& &str> = strings.iter().rev().collect();
So first of all realize that this is not the same as the type you're saying the function returns, Vec<&str>. More fundamentally, however, the lifetimes of these references would be local to the function, so even if you changed the return type to Vec<& &str> you would get a lifetime error.
Something else you could do, however, is use the into_iter() method. This method yields each element by value, not a reference to it. However, this means that the elements are moved out of the original container. This is only possible in your situation because you're passing the vector by value, so you're allowed to move elements out of it.
fn reverse_strings(strings:Vec<&str>) -> Vec<&str> {
let actual: Vec<_> = strings.into_iter().rev().collect();
return actual;
}
This probably makes a bit more sense than cloning: since we are passed the vector by value, we're allowed to do anything with the elements, including moving them to a different location (in this case, the new, reversed vector). And even if we don't, the vector will be dropped at the end of the function anyway, so we might as well. Cloning would be more appropriate if we're not allowed to do that (e.g. if we were passed the vector by reference or, more likely, a slice).
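For the slice case, here is a sketch showing that .cloned() and the explicit .map(|i| i.clone()) are interchangeable, plus .copied(), which also works because &str is Copy. Taking &[&'a str] instead of Vec<&str> is an assumption chosen to match the borrowed case.

```rust
// `iter()` over a &[&str] yields &&str; each variant turns that into &str.
fn reverse_cloned<'a>(strings: &[&'a str]) -> Vec<&'a str> {
    strings.iter().cloned().rev().collect()
}

fn reverse_mapped<'a>(strings: &[&'a str]) -> Vec<&'a str> {
    // `i` is a &&str; clone() copies out the inner &str
    strings.iter().map(|i: &&'a str| i.clone()).rev().collect()
}

fn reverse_copied<'a>(strings: &[&'a str]) -> Vec<&'a str> {
    // &str is Copy, so copied() does the same thing
    strings.iter().copied().rev().collect()
}
```

Note the explicit lifetime 'a: the returned &str values borrow the original string data, not the slice, so they can outlive the call.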

Resources