Removing items from a BTreeMap or BTreeSet found through iteration

Removing items from a BTreeMap or BTreeSet found through iteration - rust

I would like to remove items from a BTreeMap which have been found through iteration.
As it is not possible to remove items while iterating, I put the items to delete into a vector. The main issue is that it is not possible to use a vector of references, but only a vector of values. All the keys for which the entry has to be removed must then be cloned (assuming the key implements the Clone trait).
For instance, this short sample does not compile:
use std::collections::BTreeMap;
pub fn clean() {
let mut map = BTreeMap::<String, i32>::new();
let mut to_delete = Vec::new();
{
for (k, v) in map.iter() {
if *v > 10 {
to_delete.push(k);
}
}
}
for k in to_delete.drain(..) {
map.remove(k);
}
}
fn main() {}
It generates the following errors when it is compiled:
error[E0502]: cannot borrow `map` as mutable because it is also borrowed as immutable
--> src/main.rs:17:9
|
9 | for (k, v) in map.iter() {
| --- immutable borrow occurs here
...
17 | map.remove(k);
| ^^^ mutable borrow occurs here
18 | }
19 | }
| - immutable borrow ends here
Changing to_delete.push(k) with to_delete.push(k.clone()) makes this snippet compile correctly but it is quite costly if each key to delete must be cloned.
Is there a better solution?

TL;DR: you cannot.
As far as the compiler is concerned, the implementation of BTreeMap::remove might do this:
pub fn remove<Q>(&mut self, key: &Q) -> Option<V>
where
K: Borrow<Q>,
Q: Ord + ?Sized,
{
// actual deleting code, which destroys the value in the set
// now what `value` pointed to is gone and `value` points to invalid memory
// And now we access that memory, causing undefined behavior
key.borrow();
}
The compiler thus has to prevent using the reference to the value when the collection will be mutated.
To do this, you'd need something like the hypothetical "cursor" API for collections. This would allow you to iterate over the collection, returning a special type that hold the mutable innards of the collection. This type could give you a reference to check against and then allow you to remove the item.
I'd probably look at the problem from a bit different direction. Instead of trying to keep the map, I'd just create a brand new one:
use std::collections::BTreeMap;
pub fn main() {
let mut map = BTreeMap::new();
map.insert("thief", 5);
map.insert("troll", 52);
map.insert("gnome", 7);
let map: BTreeMap<_, _> =
map.into_iter()
.filter(|&(_, v)| v <= 10)
.collect();
println!("{:?}", map); // troll is gone
}
If your condition is equality on the field that makes the struct unique (a.k.a is the only field used in PartialEq and Hash), you can implement Borrow for your type and directly grab it / delete it:
use std::collections::BTreeMap;
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Monster(String);
use std::borrow::Borrow;
impl Borrow<str> for Monster {
fn borrow(&self) -> &str { &self.0 }
}
pub fn main() {
let mut map = BTreeMap::new();
map.insert(Monster("thief".into()), 5);
map.insert(Monster("troll".into()), 52);
map.insert(Monster("gnome".into()), 7);
map.remove("troll");
println!("{:?}", map); // troll is gone
}
See also:
How to implement HashMap with two keys?
How to avoid temporary allocations when using a complex key for a HashMap?

As of 1.53.0, there is a BTreeMap.retain method which allows to remove/retain items based on return value of a closure, while iterating:
map.retain(|_, v| *v<=10);

Related

Multiple Mutable Borrows from Struct Hashmap

Running into an ownership issue when attempting to reference multiple values from a HashMap in a struct as parameters in a function call. Here is a PoC of the issue.
use std::collections::HashMap;
struct Resource {
map: HashMap<String, String>,
}
impl Resource {
pub fn new() -> Self {
Resource {
map: HashMap::new(),
}
}
pub fn load(&mut self, key: String) -> &mut String {
self.map.get_mut(&key).unwrap()
}
}
fn main() {
// Initialize struct containing a HashMap.
let mut res = Resource {
map: HashMap::new(),
};
res.map.insert("Item1".to_string(), "Value1".to_string());
res.map.insert("Item2".to_string(), "Value2".to_string());
// This compiles and runs.
let mut value1 = res.load("Item1".to_string());
single_parameter(value1);
let mut value2 = res.load("Item2".to_string());
single_parameter(value2);
// This has ownership issues.
// multi_parameter(value1, value2);
}
fn single_parameter(value: &String) {
println!("{}", *value);
}
fn multi_parameter(value1: &mut String, value2: &mut String) {
println!("{}", *value1);
println!("{}", *value2);
}
Uncommenting multi_parameter results in the following error:
28 | let mut value1 = res.load("Item1".to_string());
| --- first mutable borrow occurs here
29 | single_parameter(value1);
30 | let mut value2 = res.load("Item2".to_string());
| ^^^ second mutable borrow occurs here
...
34 | multi_parameter(value1, value2);
| ------ first borrow later used here
It would technically be possible for me to break up the function calls (using the single_parameter function approach), but it would be more convenient to pass the
variables to a single function call.
For additional context, the actual program where I'm encountering this issue is an SDL2 game where I'm attempting to pass multiple textures into a single function call to be drawn, where the texture data may be modified within the function.

This is currently not possible, without resorting to unsafe code or interior mutability at least. There is no way for the compiler to know if two calls to load will yield mutable references to different data as it cannot always infer the value of the key. In theory, mutably borrowing both res.map["Item1"] and res.map["Item2"] would be fine as they would refer to different values in the map, but there is no way for the compiler to know this at compile time.
The easiest way to do this, as already mentioned, is to use a structure that allows interior mutability, like RefCell, which typically enforces the memory safety rules at run-time before returning a borrow of the wrapped value. You can also work around the borrow checker in this case by dealing with mut pointers in unsafe code:
pub fn load_many<'a, const N: usize>(&'a mut self, keys: [&str; N]) -> [&'a mut String; N] {
// TODO: Assert that keys are distinct, so that we don't return
// multiple references to the same value
keys.map(|key| self.load(key) as *mut _)
.map(|ptr| unsafe { &mut *ptr })
}
Rust Playground
The TODO is important, as this assertion is the only way to ensure that the safety invariant of only having one mutable reference to any value at any time is upheld.
It is, however, almost always better (and easier) to use a known safe interior mutation abstraction like RefCell rather than writing your own unsafe code.

Is there a way to use the value of an element from a map to update another element without interior mutability? [duplicate]

I have the following code:
use std::collections::{HashMap, HashSet};
fn populate_connections(
start: i32,
num: i32,
conns: &mut HashMap<i32, HashSet<i32>>,
ancs: &mut HashSet<i32>,
) {
let mut orig_conns = conns.get_mut(&start).unwrap();
let pipes = conns.get(&num).unwrap();
for pipe in pipes.iter() {
if !ancs.contains(pipe) && !orig_conns.contains(pipe) {
ancs.insert(*pipe);
orig_conns.insert(*pipe);
populate_connections(start, num, conns, ancs);
}
}
}
fn main() {}
The logic is not very important, I'm trying to create a function which will itself and walk over pipes.
My issue is that this doesn't compile:
error[E0502]: cannot borrow `*conns` as immutable because it is also borrowed as mutable
--> src/main.rs:10:17
|
9 | let mut orig_conns = conns.get_mut(&start).unwrap();
| ----- mutable borrow occurs here
10 | let pipes = conns.get(&num).unwrap();
| ^^^^^ immutable borrow occurs here
...
19 | }
| - mutable borrow ends here
error[E0499]: cannot borrow `*conns` as mutable more than once at a time
--> src/main.rs:16:46
|
9 | let mut orig_conns = conns.get_mut(&start).unwrap();
| ----- first mutable borrow occurs here
...
16 | populate_connections(start, num, conns, ancs);
| ^^^^^ second mutable borrow occurs here
...
19 | }
| - first borrow ends here
I don't know how to make it work. At the beginning, I'm trying to get two HashSets stored in a HashMap (orig_conns and pipes).
Rust won't let me have both mutable and immutable variables at the same time. I'm confused a bit because this will be completely different objects but I guess if &start == &num, then I would have two different references to the same object (one mutable, one immutable).
Thats ok, but then how can I achieve this? I want to iterate over one HashSet and read and modify other one. Let's assume that they won't be the same HashSet.

If you can change your datatypes and your function signature, you can use a RefCell to create interior mutability:
use std::cell::RefCell;
use std::collections::{HashMap, HashSet};
fn populate_connections(
start: i32,
num: i32,
conns: &HashMap<i32, RefCell<HashSet<i32>>>,
ancs: &mut HashSet<i32>,
) {
let mut orig_conns = conns.get(&start).unwrap().borrow_mut();
let pipes = conns.get(&num).unwrap().borrow();
for pipe in pipes.iter() {
if !ancs.contains(pipe) && !orig_conns.contains(pipe) {
ancs.insert(*pipe);
orig_conns.insert(*pipe);
populate_connections(start, num, conns, ancs);
}
}
}
fn main() {}
Note that if start == num, the thread will panic because this is an attempt to have both mutable and immutable access to the same HashSet.
Safe alternatives to RefCell
Depending on your exact data and code needs, you can also use types like Cell or one of the atomics. These have lower memory overhead than a RefCell and only a small effect on codegen.
In multithreaded cases, you may wish to use a Mutex or RwLock.

Use hashbrown::HashMap
If you can switch to using hashbrown, you may be able to use a method like get_many_mut:
use hashbrown::HashMap; // 0.12.1
fn main() {
let mut map = HashMap::new();
map.insert(1, true);
map.insert(2, false);
dbg!(&map);
if let Some([a, b]) = map.get_many_mut([&1, &2]) {
std::mem::swap(a, b);
}
dbg!(&map);
}
As hashbrown is what powers the standard library hashmap, this is also available in nightly Rust as HashMap::get_many_mut.
Unsafe code
If you can guarantee that your two indices are different, you can use unsafe code and avoid interior mutability:
use std::collections::HashMap;
fn get_mut_pair<'a, K, V>(conns: &'a mut HashMap<K, V>, a: &K, b: &K) -> (&'a mut V, &'a mut V)
where
K: Eq + std::hash::Hash,
{
unsafe {
let a = conns.get_mut(a).unwrap() as *mut _;
let b = conns.get_mut(b).unwrap() as *mut _;
assert_ne!(a, b, "The two keys must not resolve to the same value");
(&mut *a, &mut *b)
}
}
fn main() {
let mut map = HashMap::new();
map.insert(1, true);
map.insert(2, false);
dbg!(&map);
let (a, b) = get_mut_pair(&mut map, &1, &2);
std::mem::swap(a, b);
dbg!(&map);
}
Similar code can be found in libraries like multi_mut.
This code tries to have an abundance of caution. An assertion enforces that the two values are distinct pointers before converting them back into mutable references and we explicitly add lifetimes to the returned variables.
You should understand the nuances of unsafe code before blindly using this solution. Notably, previous versions of this answer were incorrect. Thanks to #oberien for finding the unsoundness in the original implementation of this and proposing a fix. This playground demonstrates how purely safe Rust code could cause the old code to result in memory unsafety.
An enhanced version of this solution could accept an array of keys and return an array of values:
fn get_mut_pair<'a, K, V, const N: usize>(conns: &'a mut HashMap<K, V>, mut ks: [&K; N]) -> [&'a mut V; N]
It becomes more difficult to ensure that all the incoming keys are unique, however.
Note that this function doesn't attempt to solve the original problem, which is vastly more complex than verifying that two indices are disjoint. The original problem requires:
tracking three disjoint borrows, two of which are mutable and one that is immutable.
tracking the recursive call
must not modify the HashMap in any way which would cause resizing, which would invalidate any of the existing references from a previous level.
must not alias any of the references from a previous level.
Using something like RefCell is a much simpler way to ensure you do not trigger memory unsafety.

How do I remove a value from a BTreeSet by using a reference to the value to remove? [duplicate]

I would like to remove items from a BTreeMap which have been found through iteration.
As it is not possible to remove items while iterating, I put the items to delete into a vector. The main issue is that it is not possible to use a vector of references, but only a vector of values. All the keys for which the entry has to be removed must then be cloned (assuming the key implements the Clone trait).
For instance, this short sample does not compile:
use std::collections::BTreeMap;
pub fn clean() {
let mut map = BTreeMap::<String, i32>::new();
let mut to_delete = Vec::new();
{
for (k, v) in map.iter() {
if *v > 10 {
to_delete.push(k);
}
}
}
for k in to_delete.drain(..) {
map.remove(k);
}
}
fn main() {}
It generates the following errors when it is compiled:
error[E0502]: cannot borrow `map` as mutable because it is also borrowed as immutable
--> src/main.rs:17:9
|
9 | for (k, v) in map.iter() {
| --- immutable borrow occurs here
...
17 | map.remove(k);
| ^^^ mutable borrow occurs here
18 | }
19 | }
| - immutable borrow ends here
Changing to_delete.push(k) with to_delete.push(k.clone()) makes this snippet compile correctly but it is quite costly if each key to delete must be cloned.
Is there a better solution?

TL;DR: you cannot.
As far as the compiler is concerned, the implementation of BTreeMap::remove might do this:
pub fn remove<Q>(&mut self, key: &Q) -> Option<V>
where
K: Borrow<Q>,
Q: Ord + ?Sized,
{
// actual deleting code, which destroys the value in the set
// now what `value` pointed to is gone and `value` points to invalid memory
// And now we access that memory, causing undefined behavior
key.borrow();
}
The compiler thus has to prevent using the reference to the value when the collection will be mutated.
To do this, you'd need something like the hypothetical "cursor" API for collections. This would allow you to iterate over the collection, returning a special type that hold the mutable innards of the collection. This type could give you a reference to check against and then allow you to remove the item.
I'd probably look at the problem from a bit different direction. Instead of trying to keep the map, I'd just create a brand new one:
use std::collections::BTreeMap;
pub fn main() {
let mut map = BTreeMap::new();
map.insert("thief", 5);
map.insert("troll", 52);
map.insert("gnome", 7);
let map: BTreeMap<_, _> =
map.into_iter()
.filter(|&(_, v)| v <= 10)
.collect();
println!("{:?}", map); // troll is gone
}
If your condition is equality on the field that makes the struct unique (a.k.a is the only field used in PartialEq and Hash), you can implement Borrow for your type and directly grab it / delete it:
use std::collections::BTreeMap;
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Monster(String);
use std::borrow::Borrow;
impl Borrow<str> for Monster {
fn borrow(&self) -> &str { &self.0 }
}
pub fn main() {
let mut map = BTreeMap::new();
map.insert(Monster("thief".into()), 5);
map.insert(Monster("troll".into()), 52);
map.insert(Monster("gnome".into()), 7);
map.remove("troll");
println!("{:?}", map); // troll is gone
}
See also:
How to implement HashMap with two keys?
How to avoid temporary allocations when using a complex key for a HashMap?

As of 1.53.0, there is a BTreeMap.retain method which allows to remove/retain items based on return value of a closure, while iterating:
map.retain(|_, v| *v<=10);

Borrow two mutable values from the same HashMap

I have the following code:
use std::collections::{HashMap, HashSet};
fn populate_connections(
start: i32,
num: i32,
conns: &mut HashMap<i32, HashSet<i32>>,
ancs: &mut HashSet<i32>,
) {
let mut orig_conns = conns.get_mut(&start).unwrap();
let pipes = conns.get(&num).unwrap();
for pipe in pipes.iter() {
if !ancs.contains(pipe) && !orig_conns.contains(pipe) {
ancs.insert(*pipe);
orig_conns.insert(*pipe);
populate_connections(start, num, conns, ancs);
}
}
}
fn main() {}
The logic is not very important, I'm trying to create a function which will itself and walk over pipes.
My issue is that this doesn't compile:
error[E0502]: cannot borrow `*conns` as immutable because it is also borrowed as mutable
--> src/main.rs:10:17
|
9 | let mut orig_conns = conns.get_mut(&start).unwrap();
| ----- mutable borrow occurs here
10 | let pipes = conns.get(&num).unwrap();
| ^^^^^ immutable borrow occurs here
...
19 | }
| - mutable borrow ends here
error[E0499]: cannot borrow `*conns` as mutable more than once at a time
--> src/main.rs:16:46
|
9 | let mut orig_conns = conns.get_mut(&start).unwrap();
| ----- first mutable borrow occurs here
...
16 | populate_connections(start, num, conns, ancs);
| ^^^^^ second mutable borrow occurs here
...
19 | }
| - first borrow ends here
I don't know how to make it work. At the beginning, I'm trying to get two HashSets stored in a HashMap (orig_conns and pipes).
Rust won't let me have both mutable and immutable variables at the same time. I'm confused a bit because this will be completely different objects but I guess if &start == &num, then I would have two different references to the same object (one mutable, one immutable).
Thats ok, but then how can I achieve this? I want to iterate over one HashSet and read and modify other one. Let's assume that they won't be the same HashSet.

If you can change your datatypes and your function signature, you can use a RefCell to create interior mutability:
use std::cell::RefCell;
use std::collections::{HashMap, HashSet};
fn populate_connections(
start: i32,
num: i32,
conns: &HashMap<i32, RefCell<HashSet<i32>>>,
ancs: &mut HashSet<i32>,
) {
let mut orig_conns = conns.get(&start).unwrap().borrow_mut();
let pipes = conns.get(&num).unwrap().borrow();
for pipe in pipes.iter() {
if !ancs.contains(pipe) && !orig_conns.contains(pipe) {
ancs.insert(*pipe);
orig_conns.insert(*pipe);
populate_connections(start, num, conns, ancs);
}
}
}
fn main() {}
Note that if start == num, the thread will panic because this is an attempt to have both mutable and immutable access to the same HashSet.
Safe alternatives to RefCell
Depending on your exact data and code needs, you can also use types like Cell or one of the atomics. These have lower memory overhead than a RefCell and only a small effect on codegen.
In multithreaded cases, you may wish to use a Mutex or RwLock.

Use hashbrown::HashMap
If you can switch to using hashbrown, you may be able to use a method like get_many_mut:
use hashbrown::HashMap; // 0.12.1
fn main() {
let mut map = HashMap::new();
map.insert(1, true);
map.insert(2, false);
dbg!(&map);
if let Some([a, b]) = map.get_many_mut([&1, &2]) {
std::mem::swap(a, b);
}
dbg!(&map);
}
As hashbrown is what powers the standard library hashmap, this is also available in nightly Rust as HashMap::get_many_mut.
Unsafe code
If you can guarantee that your two indices are different, you can use unsafe code and avoid interior mutability:
use std::collections::HashMap;
fn get_mut_pair<'a, K, V>(conns: &'a mut HashMap<K, V>, a: &K, b: &K) -> (&'a mut V, &'a mut V)
where
K: Eq + std::hash::Hash,
{
unsafe {
let a = conns.get_mut(a).unwrap() as *mut _;
let b = conns.get_mut(b).unwrap() as *mut _;
assert_ne!(a, b, "The two keys must not resolve to the same value");
(&mut *a, &mut *b)
}
}
fn main() {
let mut map = HashMap::new();
map.insert(1, true);
map.insert(2, false);
dbg!(&map);
let (a, b) = get_mut_pair(&mut map, &1, &2);
std::mem::swap(a, b);
dbg!(&map);
}
Similar code can be found in libraries like multi_mut.
This code tries to have an abundance of caution. An assertion enforces that the two values are distinct pointers before converting them back into mutable references and we explicitly add lifetimes to the returned variables.
You should understand the nuances of unsafe code before blindly using this solution. Notably, previous versions of this answer were incorrect. Thanks to #oberien for finding the unsoundness in the original implementation of this and proposing a fix. This playground demonstrates how purely safe Rust code could cause the old code to result in memory unsafety.
An enhanced version of this solution could accept an array of keys and return an array of values:
fn get_mut_pair<'a, K, V, const N: usize>(conns: &'a mut HashMap<K, V>, mut ks: [&K; N]) -> [&'a mut V; N]
It becomes more difficult to ensure that all the incoming keys are unique, however.
Note that this function doesn't attempt to solve the original problem, which is vastly more complex than verifying that two indices are disjoint. The original problem requires:
tracking three disjoint borrows, two of which are mutable and one that is immutable.
tracking the recursive call
must not modify the HashMap in any way which would cause resizing, which would invalidate any of the existing references from a previous level.
must not alias any of the references from a previous level.
Using something like RefCell is a much simpler way to ensure you do not trigger memory unsafety.

Why does calling a method on a mutable reference involve "borrowing"?

I'm learning Rust and I'm trying to cargo-cult this code into compiling:
use std::vec::Vec;
use std::collections::BTreeMap;
struct Occ {
docnum: u64,
weight: f32,
}
struct PostWriter<'a> {
bytes: Vec<u8>,
occurrences: BTreeMap<&'a [u8], Vec<Occ>>,
}
impl<'a> PostWriter<'a> {
fn new() -> PostWriter<'a> {
PostWriter {
bytes: Vec::new(),
occurrences: BTreeMap::new(),
}
}
fn add_occurrence(&'a mut self, term: &[u8], occ: Occ) {
let occurrences = &mut self.occurrences;
match occurrences.get_mut(term) {
Some(x) => x.push(occ),
None => {
// Add the term bytes to the big vector of all terms
let termstart = self.bytes.len();
self.bytes.extend(term);
// Create a new occurrences vector
let occs = vec![occ];
// Take the appended term as a slice to use as a key
// ERROR: cannot borrow `*occurrences` as mutable more than once at a time
occurrences.insert(&self.bytes[termstart..], occs);
}
}
}
}
fn main() {}
I get an error:
error[E0499]: cannot borrow `*occurrences` as mutable more than once at a time
--> src/main.rs:34:17
|
24 | match occurrences.get_mut(term) {
| ----------- first mutable borrow occurs here
...
34 | occurrences.insert(&self.bytes[termstart..], occs);
| ^^^^^^^^^^^ second mutable borrow occurs here
35 | }
36 | }
| - first borrow ends here
I don't understand... I'm just calling a method on a mutable reference, why would that line involve borrowing?

I'm just calling a method on a mutable reference, why would that line involve borrowing?
When you call a method on an object that's going to mutate the object, you can't have any other references to that object outstanding. If you did, your mutation could invalidate those references and leave your program in an inconsistent state. For example, say that you had gotten a value out of your hashmap and then added a new value. Adding the new value hits a magic limit and forces memory to be reallocated, your value now points off to nowhere! When you use that value... bang goes the program!
In this case, it looks like you want to do the relatively common "append or insert if missing" operation. You will want to use entry for that:
use std::collections::BTreeMap;
fn main() {
let mut map = BTreeMap::new();
{
let nicknames = map.entry("joe").or_insert(Vec::new());
nicknames.push("shmoe");
// Using scoping to indicate that we are done with borrowing `nicknames`
// If we didn't, then we couldn't borrow map as
// immutable because we could still change it via `nicknames`
}
println!("{:?}", map)
}

Because you're calling a method that borrows as mutable
I had a similar question yesterday about Hash, until I noticed something in the docs. The docs for BTreeMap show a method signature for insert starting with fn insert(&mut self..
So when you call .insert, you're implicitly asking that function to borrow the BTreeMap as mutable.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Removing items from a BTreeMap or BTreeSet found through iteration - rust

As of 1.53.0, there is a BTreeMap.retain method which allows to remove/retain items based on return value of a closure, while iterating: map.retain(|_, v| *v<=10);

Related

Multiple Mutable Borrows from Struct Hashmap

Is there a way to use the value of an element from a map to update another element without interior mutability? [duplicate]

How do I remove a value from a BTreeSet by using a reference to the value to remove? [duplicate]

Borrow two mutable values from the same HashMap

Why does calling a method on a mutable reference involve "borrowing"?

Categories

Resources