What is the design rationale for supplying an iter_mut function for HashMap but not HashSet in Rust?
Would it be a faux pas to roll one's own (assuming that can even be done)?
Having one could alleviate situations that give rise to
previous borrow of X occurs here; the immutable borrow prevents
subsequent moves or mutable borrows of X until the borrow ends
Example
An extremely convoluted example (Gist) that does not showcase why the parameters are passed the way they are. It has a short comment explaining the pain point:
use std::collections::HashSet;
fn derp(v: i32, unprocessed: &mut HashSet<i32>) {
if unprocessed.contains(&v) {
// Pretend that v has been processed
unprocessed.remove(&v);
}
}
fn herp(v: i32) {
let mut unprocessed: HashSet<i32> = HashSet::new();
unprocessed.insert(v);
// I need to iterate over the unprocessed values
while let Some(u) = unprocessed.iter().next() {
// And then pass them mutably to another function
// as I will process the values inside derp and
// remove them from the set.
//
// This is an extremely convoluted example but
// I need for derp to be a separate function
// as I will employ recursion there, as it is
// much more succinct than an iterative version.
derp(*u, &mut unprocessed);
}
}
fn main() {
println!("Hello, world!");
herp(10);
}
The statement
while let Some(u) = unprocessed.iter().next() {
is an immutable borrow, hence
derp(*u, &mut unprocessed);
is impossible as unprocessed cannot be borrowed mutably. The immutable borrow does not end until the end of the while-loop.
I have tried to use this as a reference and essentially ended up trying to fool the borrow checker through various permutations of assignments and enclosing braces, but because the expressions involved are coupled, the problem remains.
You have to think about what a HashSet actually is. The IterMut you get from HashMap::iter_mut() is only mutable on the value part: it yields (&key, &mut val) pairs, i.e. items of type (&'a K, &'a mut V).
HashSet is essentially a HashMap<T, ()>, so its elements are the keys. If you could modify the keys, their hashes would have to be recomputed and the entries relocated, or you would end up with an invalid HashMap.
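Because an element's hash decides where it is stored, the usual way to "modify" the elements of a HashSet is to take them out and insert the new values, so each one is rehashed. A minimal sketch, assuming the goal is to transform every element; the helper name `scale_all` is hypothetical:

```rust
use std::collections::HashSet;

/// Hypothetical helper: "mutates" every element by draining the set and
/// collecting the transformed values into a fresh set, so each new value
/// is hashed and placed correctly.
fn scale_all(set: &mut HashSet<i32>, factor: i32) {
    *set = set.drain().map(|x| x * factor).collect();
}

fn main() {
    let mut set = HashSet::from([1, 2, 3]);
    scale_all(&mut set, 10);
    // `retain` removes elements in place; kept elements are not rehashed.
    set.retain(|&x| x > 10);
    println!("{:?}", set);
}
```

Note that distinct inputs can map to the same output (for example with a factor of 0), in which case the rebuilt set is smaller than the original.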
If your HashSet contains a Copy type, such as i32, you can work on a copy of the value to release the borrow on the HashSet early. To do this, you need to eliminate all borrows from the bindings in the while let expression. In your original code, u is of type &i32, and it keeps borrowing from unprocessed until the end of the loop. If we change the pattern to Some(&u), then u is of type i32, which doesn't borrow from anything, so we're free to use unprocessed as we like.
fn herp(v: i32) {
let mut unprocessed: HashSet<i32> = HashSet::new();
unprocessed.insert(v);
while let Some(&u) = unprocessed.iter().next() {
derp(u, &mut unprocessed);
}
}
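For a self-contained version of the same idea, here is a sketch with the question's `derp` and a variant of `herp` that takes several starting values and returns how many it processed (that signature is an illustrative assumption). `.copied()` is an equivalent way to write the `Some(&u)` pattern:

```rust
use std::collections::HashSet;

// Same as the question's `derp`: "process" v by removing it from the set.
fn derp(v: i32, unprocessed: &mut HashSet<i32>) {
    if unprocessed.contains(&v) {
        unprocessed.remove(&v);
    }
}

// Variant of `herp`; the slice argument and the returned count are
// assumptions added for illustration.
fn herp(values: &[i32]) -> usize {
    let mut unprocessed: HashSet<i32> = values.iter().copied().collect();
    let mut processed = 0;
    // `.copied()` turns Option<&i32> into Option<i32>, so nothing borrowed
    // from `unprocessed` survives into the loop body.
    while let Some(u) = unprocessed.iter().next().copied() {
        derp(u, &mut unprocessed);
        processed += 1;
    }
    processed
}

fn main() {
    println!("processed {}", herp(&[10, 20, 30]));
}
```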
If the type is not Copy or is too expensive to copy/clone, you can wrap them in Rc or Arc, and clone them as you iterate on them using cloned() (cloning an Rc or Arc doesn't clone the underlying value, it just clones the Rc pointer and increments the reference counter).
use std::collections::HashSet;
use std::rc::Rc;
fn derp(v: &i32, unprocessed: &mut HashSet<Rc<i32>>) {
if unprocessed.contains(v) {
unprocessed.remove(v);
}
}
fn herp(v: Rc<i32>) {
let mut unprocessed: HashSet<Rc<i32>> = HashSet::new();
unprocessed.insert(v);
while let Some(u) = unprocessed.iter().cloned().next() {
// If you don't use u afterwards,
// you could also pass it by value to derp.
derp(&u, &mut unprocessed);
}
}
fn main() {
println!("Hello, world!");
herp(Rc::new(10));
}
Related
I'm running into an ownership issue when trying to pass references to multiple values from a HashMap (stored in a struct) as parameters to a single function call. Here is a proof of concept of the issue.
use std::collections::HashMap;
struct Resource {
map: HashMap<String, String>,
}
impl Resource {
pub fn new() -> Self {
Resource {
map: HashMap::new(),
}
}
pub fn load(&mut self, key: String) -> &mut String {
self.map.get_mut(&key).unwrap()
}
}
fn main() {
// Initialize struct containing a HashMap.
let mut res = Resource {
map: HashMap::new(),
};
res.map.insert("Item1".to_string(), "Value1".to_string());
res.map.insert("Item2".to_string(), "Value2".to_string());
// This compiles and runs.
let mut value1 = res.load("Item1".to_string());
single_parameter(value1);
let mut value2 = res.load("Item2".to_string());
single_parameter(value2);
// This has ownership issues.
// multi_parameter(value1, value2);
}
fn single_parameter(value: &String) {
println!("{}", *value);
}
fn multi_parameter(value1: &mut String, value2: &mut String) {
println!("{}", *value1);
println!("{}", *value2);
}
Uncommenting multi_parameter results in the following error:
28 | let mut value1 = res.load("Item1".to_string());
| --- first mutable borrow occurs here
29 | single_parameter(value1);
30 | let mut value2 = res.load("Item2".to_string());
| ^^^ second mutable borrow occurs here
...
34 | multi_parameter(value1, value2);
| ------ first borrow later used here
It would technically be possible for me to break up the function calls (using the single_parameter function approach), but it would be more convenient to pass the variables to a single function call.
For additional context, the actual program where I'm encountering this issue is an SDL2 game where I'm attempting to pass multiple textures into a single function call to be drawn, where the texture data may be modified within the function.
This is currently not possible, at least not without resorting to unsafe code or interior mutability. The compiler has no way of knowing whether two calls to load will yield mutable references to different data, because it cannot in general know the value of the key at compile time. In theory, mutably borrowing both res.map["Item1"] and res.map["Item2"] would be fine, since they refer to different values in the map, but the compiler cannot verify this.
The easiest way to do this, as already mentioned, is to use a type that provides interior mutability, like RefCell, which enforces the borrowing rules at run time before handing out a borrow of the wrapped value. You can also work around the borrow checker in this case by dealing with raw pointers in unsafe code:
// Goes inside `impl Resource`. `load` takes a String, hence `to_string()`.
pub fn load_many<'a, const N: usize>(&'a mut self, keys: [&str; N]) -> [&'a mut String; N] {
// TODO: Assert that keys are distinct, so that we don't return
// multiple references to the same value
keys.map(|key| self.load(key.to_string()) as *mut String)
.map(|ptr| unsafe { &mut *ptr })
}
The TODO is important, as this assertion is the only way to ensure that the safety invariant of only having one mutable reference to any value at any time is upheld.
It is, however, almost always better (and easier) to use a known safe interior mutation abstraction like RefCell rather than writing your own unsafe code.
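A minimal sketch of the RefCell route, assuming the map's values can be changed to `RefCell<String>`. The `load` below takes `&self` and returns a `RefMut`, which is an adaptation of the question's API rather than the original. Each `borrow_mut` is checked at run time, so loading the same key twice while the first guard is alive panics instead of aliasing:

```rust
use std::cell::{RefCell, RefMut};
use std::collections::HashMap;

struct Resource {
    // Each value has its own RefCell, so different keys can be borrowed
    // mutably at the same time through a shared `&self`.
    map: HashMap<String, RefCell<String>>,
}

impl Resource {
    // Panics if the key is missing or the value is already borrowed.
    fn load(&self, key: &str) -> RefMut<'_, String> {
        self.map[key].borrow_mut()
    }
}

fn multi_parameter(value1: &mut String, value2: &mut String) {
    value1.push('!');
    value2.push('?');
}

fn main() {
    let mut res = Resource { map: HashMap::new() };
    res.map.insert("Item1".to_string(), RefCell::new("Value1".to_string()));
    res.map.insert("Item2".to_string(), RefCell::new("Value2".to_string()));

    let mut value1 = res.load("Item1");
    let mut value2 = res.load("Item2");
    // Deref coercion turns `&mut RefMut<String>` into `&mut String`.
    multi_parameter(&mut value1, &mut value2);
    println!("{} {}", value1, value2); // prints "Value1! Value2?"
}
```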
struct Test {
a: i32,
b: i32,
}
fn other(x: &mut i32, _refs: &Vec<&i32>) {
*x += 1;
}
fn main() {
let mut xes: Vec<Test> = vec![Test { a: 3, b: 5 }];
let mut refs: Vec<&i32> = Vec::new();
for y in &xes {
refs.push(&y.a);
}
xes.iter_mut().for_each(|val| other(&mut val.b, &refs));
}
Although refs only holds references to the a field of the elements in xes, and the function other only uses the b field, Rust produces the following error:
error[E0502]: cannot borrow `xes` as mutable because it is also borrowed as immutable
--> /src/main.rs:16:5
|
13 | for y in &xes {
| ---- immutable borrow occurs here
...
16 | xes.iter_mut().for_each(|val| other(&mut val.b, &refs));
| ^^^ mutable borrow occurs here ---- immutable borrow later captured here by closure
Is there something wrong with the closure? Usually splitting borrows should allow this. What am I missing?
Splitting borrows only works from within one function. Here, though, you're borrowing field a in main and field b in the closure (which, apart from being able to consume and borrow variables from the outer scope, is a distinct function).
As of Rust 1.43.1, function signatures cannot express fine-grained borrows; when a reference is passed (directly or indirectly) to a function, it gets access to all of it. Borrow checking across functions is based on function signatures; this is in part for performance (inference across functions is more costly), in part for ensuring compatibility as a function evolves (especially in a library): what constitutes a valid argument to the function shouldn't depend on the function's implementation.
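To illustrate with the question's `Test` struct: borrows of `t.a` and `t.b` are split when both appear in the same function, but not when one of them is hidden behind a method that takes `&mut self`. A minimal sketch; the `b_mut` accessor and the `bump` function are hypothetical:

```rust
struct Test {
    a: i32,
    b: i32,
}

impl Test {
    // Hypothetical accessor: its signature borrows *all* of `self` mutably.
    #[allow(dead_code)]
    fn b_mut(&mut self) -> &mut i32 {
        &mut self.b
    }
}

// Hypothetical helper that adds `a` to `b`.
fn bump(b: &mut i32, a: &i32) {
    *b += *a;
}

fn main() {
    let mut t = Test { a: 3, b: 5 };

    // OK: both field borrows are visible in this function, so they are split.
    bump(&mut t.b, &t.a);

    // Rejected (E0502): `b_mut` borrows the whole of `t` mutably, and the
    // caller only sees its signature, not which field it actually touches.
    // bump(t.b_mut(), &t.a);

    println!("{}", t.b);
}
```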
As I understand it, your requirement is that you need to be able to update field b of your objects based on the value of field a of the whole set of objects.
I see two ways to fix this. First, we can capture all mutable references to b at the same time as we capture the shared references to a. This is a proper example of splitting borrows. A downside of this approach is that we need to allocate two Vecs just to perform the operation.
fn main() {
let mut xes: Vec<Test> = vec![Test { a: 3, b: 5 }];
let mut x_as: Vec<&i32> = Vec::new();
let mut x_bs: Vec<&mut i32> = Vec::new();
for x in &mut xes {
x_as.push(&x.a);
x_bs.push(&mut x.b);
}
x_bs.iter_mut().for_each(|b| other(b, &x_as));
}
Here's an equivalent way of building the two Vecs using iterators:
fn main() {
let mut xes: Vec<Test> = vec![Test { a: 3, b: 5 }];
let (x_as, mut x_bs): (Vec<_>, Vec<_>) =
xes.iter_mut().map(|x| (&x.a, &mut x.b)).unzip();
x_bs.iter_mut().for_each(|b| other(b, &x_as));
}
Another way is to avoid mutable references completely and use interior mutability instead. The standard library has Cell, which works well for Copy types such as i32; RefCell, which works for all types but checks borrows at run time, adding some slight overhead; and Mutex and RwLock, which can be shared across threads and synchronize access at run time (a Mutex gives one thread access at a time, while an RwLock allows either many readers or a single writer).
Here's an example with Cell. We can eliminate the two temporary Vecs with this approach, and we can pass the whole collection of objects to the other function instead of just references to the a field.
use std::cell::Cell;
struct Test {
a: i32,
b: Cell<i32>,
}
fn other(x: &Cell<i32>, refs: &[Test]) {
x.set(x.get() + 1);
}
fn main() {
let xes: Vec<Test> = vec![Test { a: 3, b: Cell::new(5) }];
xes.iter().for_each(|x| other(&x.b, &xes));
}
I have the following code:
use std::collections::{HashMap, HashSet};
fn populate_connections(
start: i32,
num: i32,
conns: &mut HashMap<i32, HashSet<i32>>,
ancs: &mut HashSet<i32>,
) {
let mut orig_conns = conns.get_mut(&start).unwrap();
let pipes = conns.get(&num).unwrap();
for pipe in pipes.iter() {
if !ancs.contains(pipe) && !orig_conns.contains(pipe) {
ancs.insert(*pipe);
orig_conns.insert(*pipe);
populate_connections(start, num, conns, ancs);
}
}
}
fn main() {}
The logic is not very important; I'm trying to create a function that will call itself recursively and walk over pipes.
My issue is that this doesn't compile:
error[E0502]: cannot borrow `*conns` as immutable because it is also borrowed as mutable
--> src/main.rs:10:17
|
9 | let mut orig_conns = conns.get_mut(&start).unwrap();
| ----- mutable borrow occurs here
10 | let pipes = conns.get(&num).unwrap();
| ^^^^^ immutable borrow occurs here
...
19 | }
| - mutable borrow ends here
error[E0499]: cannot borrow `*conns` as mutable more than once at a time
--> src/main.rs:16:46
|
9 | let mut orig_conns = conns.get_mut(&start).unwrap();
| ----- first mutable borrow occurs here
...
16 | populate_connections(start, num, conns, ancs);
| ^^^^^ second mutable borrow occurs here
...
19 | }
| - first borrow ends here
I don't know how to make it work. At the beginning, I'm trying to get two HashSets stored in a HashMap (orig_conns and pipes).
Rust won't let me have both mutable and immutable borrows at the same time. I'm a bit confused, because these will be completely different objects, but I guess that if &start == &num, I would have two different references to the same object (one mutable, one immutable).
That's OK, but then how can I achieve this? I want to iterate over one HashSet and read and modify the other one. Let's assume they won't be the same HashSet.
If you can change your datatypes and your function signature, you can use a RefCell to create interior mutability:
use std::cell::RefCell;
use std::collections::{HashMap, HashSet};
fn populate_connections(
start: i32,
num: i32,
conns: &HashMap<i32, RefCell<HashSet<i32>>>,
ancs: &mut HashSet<i32>,
) {
let mut orig_conns = conns.get(&start).unwrap().borrow_mut();
let pipes = conns.get(&num).unwrap().borrow();
for pipe in pipes.iter() {
if !ancs.contains(pipe) && !orig_conns.contains(pipe) {
ancs.insert(*pipe);
orig_conns.insert(*pipe);
populate_connections(start, num, conns, ancs);
}
}
}
fn main() {}
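Note that if start == num, the thread will panic, because this is an attempt to have both mutable and immutable access to the same HashSet. As written, the code will also panic whenever the recursive call is reached, because the outer call still holds the borrow_mut on conns[&start] when the inner call tries to borrow it mutably again.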
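One way to avoid both panics, sketched under the assumption that the recursion only needs to see the updated sets: copy the pipes out first and confine each `borrow_mut` to a small block, so no `RefCell` borrow is held across the recursive call. This restructuring is a suggestion, not part of the original answer:

```rust
use std::cell::RefCell;
use std::collections::{HashMap, HashSet};

fn populate_connections(
    start: i32,
    num: i32,
    conns: &HashMap<i32, RefCell<HashSet<i32>>>,
    ancs: &mut HashSet<i32>,
) {
    // Snapshot the pipes; the shared `borrow()` ends with this statement.
    let pipes: Vec<i32> = conns[&num].borrow().iter().copied().collect();
    for pipe in pipes {
        // The mutable borrow of `conns[&start]` lives only inside this block.
        // `insert` returns true only if `pipe` was not already present.
        let newly_added = {
            let mut orig_conns = conns[&start].borrow_mut();
            !ancs.contains(&pipe) && orig_conns.insert(pipe)
        };
        if newly_added {
            ancs.insert(pipe);
            // No RefCell borrow is active here, so recursing cannot panic.
            populate_connections(start, num, conns, ancs);
        }
    }
}

fn main() {
    let mut conns = HashMap::new();
    conns.insert(0, RefCell::new(HashSet::new()));
    conns.insert(1, RefCell::new(HashSet::from([2, 3])));
    let mut ancs = HashSet::new();
    populate_connections(0, 1, &conns, &mut ancs);
    println!("{:?}", conns[&0].borrow());
}
```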
Safe alternatives to RefCell
Depending on your exact data and code needs, you can also use types like Cell or one of the atomics. These have lower memory overhead than a RefCell and only a small effect on codegen.
In multithreaded cases, you may wish to use a Mutex or RwLock.
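Another safe, std-only option for two disjoint values is to temporarily remove one entry from the map, borrow the other, and then put the first one back. This is a sketch; the helper name `with_two_mut` is hypothetical, and the round trip costs a couple of extra hash lookups:

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical helper: calls `f` with mutable access to the values for
/// `a` and `b`. Returns `None` (leaving the map unchanged) if either key is
/// missing or the two keys are equal.
fn with_two_mut<K, V, R>(
    map: &mut HashMap<K, V>,
    a: &K,
    b: &K,
    f: impl FnOnce(&mut V, &mut V) -> R,
) -> Option<R>
where
    K: Eq + Hash,
{
    if a == b {
        return None;
    }
    // Take `a`'s entry out; the map no longer owns it, so borrowing `b` is fine.
    let (key_a, mut val_a) = map.remove_entry(a)?;
    let result = match map.get_mut(b) {
        Some(val_b) => Some(f(&mut val_a, val_b)),
        None => None,
    };
    // Put `a` back whether or not `b` was found.
    map.insert(key_a, val_a);
    result
}

fn main() {
    let mut map = HashMap::from([(1, true), (2, false)]);
    with_two_mut(&mut map, &1, &2, |x, y| std::mem::swap(x, y));
    println!("{:?}", map);
}
```

One caveat of this design: if `f` panics, the entry for `a` is not restored.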
Use hashbrown::HashMap
If you can switch to using hashbrown, you may be able to use a method like get_many_mut:
use hashbrown::HashMap; // 0.12.1
fn main() {
let mut map = HashMap::new();
map.insert(1, true);
map.insert(2, false);
dbg!(&map);
if let Some([a, b]) = map.get_many_mut([&1, &2]) {
std::mem::swap(a, b);
}
dbg!(&map);
}
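As hashbrown is what powers the standard library HashMap, this was also available in nightly Rust as HashMap::get_many_mut. It has since been stabilized in the standard library (Rust 1.86) as HashMap::get_disjoint_mut, which returns [Option<&mut V>; N] and panics if any two keys are equal.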
Unsafe code
If you can guarantee that your two indices are different, you can use unsafe code and avoid interior mutability:
use std::collections::HashMap;
fn get_mut_pair<'a, K, V>(conns: &'a mut HashMap<K, V>, a: &K, b: &K) -> (&'a mut V, &'a mut V)
where
K: Eq + std::hash::Hash,
{
unsafe {
let a = conns.get_mut(a).unwrap() as *mut _;
let b = conns.get_mut(b).unwrap() as *mut _;
assert_ne!(a, b, "The two keys must not resolve to the same value");
(&mut *a, &mut *b)
}
}
fn main() {
let mut map = HashMap::new();
map.insert(1, true);
map.insert(2, false);
dbg!(&map);
let (a, b) = get_mut_pair(&mut map, &1, &2);
std::mem::swap(a, b);
dbg!(&map);
}
Similar code can be found in libraries like multi_mut.
This code tries to have an abundance of caution. An assertion enforces that the two values are distinct pointers before converting them back into mutable references and we explicitly add lifetimes to the returned variables.
You should understand the nuances of unsafe code before blindly using this solution. Notably, previous versions of this answer were incorrect. Thanks to @oberien for finding the unsoundness in the original implementation of this and proposing a fix. This playground demonstrates how purely safe Rust code could cause the old code to result in memory unsafety.
An enhanced version of this solution could accept an array of keys and return an array of values:
fn get_mut_pair<'a, K, V, const N: usize>(conns: &'a mut HashMap<K, V>, mut ks: [&K; N]) -> [&'a mut V; N]
It becomes more difficult to ensure that all the incoming keys are unique, however.
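A sketch of that N-key version, under the same assumptions as get_mut_pair: it uses an O(N²) pairwise pointer check instead of sorting the keys (so the keys need not be mutable), and the name `get_mut_many` is hypothetical:

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Hypothetical N-key generalization of `get_mut_pair`. Panics if a key is
// missing or if two keys resolve to the same value.
fn get_mut_many<'a, K, V, const N: usize>(
    conns: &'a mut HashMap<K, V>,
    ks: [&K; N],
) -> [&'a mut V; N]
where
    K: Eq + Hash,
{
    // Each `get_mut` borrow ends as soon as it is cast to a raw pointer.
    let ptrs: [*mut V; N] = ks.map(|k| conns.get_mut(k).expect("key not found") as *mut V);

    // O(N^2) pairwise check: every pointer must be distinct.
    for i in 0..N {
        for j in (i + 1)..N {
            assert_ne!(ptrs[i], ptrs[j], "two keys resolve to the same value");
        }
    }

    // SAFETY: all pointers come from `get_mut` on the same map, which stays
    // exclusively borrowed for 'a and is not modified in between, and the
    // check above guarantees that no two of them alias.
    ptrs.map(|p| unsafe { &mut *p })
}

fn main() {
    let mut map = HashMap::from([(1, 10), (2, 20), (3, 30)]);
    let [a, c] = get_mut_many(&mut map, [&1, &3]);
    std::mem::swap(a, c);
    println!("{:?}", map);
}
```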
Note that this function doesn't attempt to solve the original problem, which is vastly more complex than verifying that two indices are disjoint. The original problem requires:
tracking three disjoint borrows, two of which are mutable and one of which is immutable;
tracking the recursive call;
not modifying the HashMap in any way that would cause it to resize, which would invalidate existing references from a previous level;
not aliasing any of the references from a previous level.
Using something like RefCell is a much simpler way to ensure you do not trigger memory unsafety.
Consider:
fn main() {
let mut words: Vec<String> = Vec::new();
words.push(String::from("Example1"));
do_something(&mut words);
for word in words.iter() {
println!("{}", word);
}
}
fn do_something(words: &mut Vec<String>) {
//modify vector, maybe push something:
words.push(String::from("Example2"));
}
vs.
fn main() {
let mut words: Vec<String> = Vec::new();
words.push(String::from("Example1"));
words = do_something(words);
for word in words.iter() {
println!("{}", word);
}
}
fn do_something(mut words: Vec<String>) -> Vec<String> {
//modify vector, maybe push something:
words.push(String::from("Example2"));
return words;
}
Both solutions will print:
Example1
Example2
Is there any difference? What should we use?
No, there's really not much difference in the capability of code using one or the other.
Most of the benefits of one vs the other lie outside of pure capability:
Taking a reference is often more ergonomic to the users of your code: they don't have to continue to remember to assign the return value of each function call.
Taking a value vs. a reference is also often a better signal to your user about the intended usage of the code.
There's a hierarchy of what types are interoperable. If you have ownership of a value, you can call a function that takes ownership, a mutable reference, or an immutable reference. If you have a mutable reference, you can call a function that takes a mutable reference or an immutable reference. If you have an immutable reference, you can only call a function that takes an immutable reference. Thus it's common to accept the most permissive type you can.
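A small sketch of that hierarchy, using hypothetical functions that receive the vector in each of the three ways (a slice parameter, `&[String]`, is the usual choice for read-only access):

```rust
// Hypothetical functions, one per way of receiving the vector.
fn takes_ownership(words: Vec<String>) -> usize {
    words.len()
}

fn takes_mut_ref(words: &mut Vec<String>) {
    words.push(String::from("Example2"));
}

fn takes_shared_ref(words: &[String]) -> usize {
    words.len()
}

fn main() {
    let mut words = vec![String::from("Example1")];

    // Owner: may call all three (the by-value call moves `words`, so it comes last).
    takes_mut_ref(&mut words);
    println!("{}", takes_shared_ref(&words));

    // Through `&mut`: mutable and shared calls work; `takes_ownership(*r)`
    // would not compile, since you cannot move out of a reference.
    let r = &mut words;
    takes_mut_ref(r);
    println!("{}", takes_shared_ref(r));

    // Through `&`: only the shared call works; `takes_mut_ref(s)` would not compile.
    let s = &words;
    println!("{}", takes_shared_ref(s));

    println!("{}", takes_ownership(words));
}
```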