Getting first member of a BTreeSet - rust

In Rust, I have a BTreeSet that I'm using to keep my values in order. I have a loop that should retrieve and remove the first (lowest) member of the set. I'm using a cloned iterator to retrieve the first member. Here's the code:
use std::collections::BTreeSet;
fn main() {
let mut start_nodes = BTreeSet::new();
// add items to the set
while !start_nodes.is_empty() {
let mut start_iter = start_nodes.iter();
let mut start_iter_cloned = start_iter.cloned();
let n = start_iter_cloned.next().unwrap();
start_nodes.remove(&n);
}
}
This, however, gives me the following compile error:
error[E0502]: cannot borrow `start_nodes` as mutable because it is also borrowed as immutable
--> prog.rs:60:6
|
56 | let mut start_iter = start_nodes.iter();
| ----------- immutable borrow occurs here
...
60 | start_nodes.remove(&n);
| ^^^^^^^^^^^ mutable borrow occurs here
...
77 | }
| - immutable borrow ends here
Why is start_nodes.iter() considered an immutable borrow? What approach should I take instead to get the first member?
I'm using version 1.14.0 (not by choice).

Why is start_nodes.iter() considered an immutable borrow?
Whenever you ask a question like this one, you need to look at the prototype of the function, in this case the prototype of BTreeSet::iter():
fn iter(&self) -> Iter<T>
If we look up the Iter type that is returned, we find that it's defined as
pub struct Iter<'a, T> where T: 'a { /* fields omitted */ }
The lifetime 'a is not explicitly mentioned in the definition of iter(); however, the lifetime elision rules make the function definition equivalent to
fn iter<'a>(&'a self) -> Iter<'a, T>
From this expanded version, you can see that the return value has a lifetime that is bound to the lifetime of the reference to self that you pass in, which is just another way of stating that the function call creates a shared borrow that lives as long as the return value. If you store the return value in a variable, the borrow lives at least as long as the variable.
What approach should I take instead to get the first member?
As noted in the comments, your code works on recent versions of Rust due to non-lexical lifetimes – the compiler figures out by itself that start_iter and start_iter_cloned don't need to live longer than the call to next(). In older versions of Rust, you can artificially limit the lifetime by introducing a new scope:
while !start_nodes.is_empty() {
let n = {
let mut start_iter = start_nodes.iter();
let mut start_iter_cloned = start_iter.cloned();
start_iter_cloned.next().unwrap()
};
start_nodes.remove(&n);
}
However, note that this code is needlessly long-winded. The new iterator you create and its cloning version only live inside the new scope, and they aren't really used for any other purpose, so you could just as well write
while !start_nodes.is_empty() {
let n = start_nodes.iter().next().unwrap().clone();
start_nodes.remove(&n);
}
which does exactly the same, and avoids the issues with long-living borrows by avoiding to store the intermediate values in variables, to ensure their lifetime ends immediately after the expression.
Finally, while you don't give full details of your use case, I strongly suspect that you would be better off with a BinaryHeap instead of a BTreeSet:
use std::collections::BinaryHeap;
fn main() {
let mut start_nodes = BinaryHeap::new();
start_nodes.push(42);
while let Some(n) = start_nodes.pop() {
// Do something with `n`
}
}
This code is shorter, simpler, completely sidesteps the issue with the borrow checker, and will also be more efficient.

Not sure this is the best approach, but I fixed it by introducing a new scope to ensure that the immutable borrow ends before the mutable borrow occurs:
use std::collections::BTreeSet;
fn main() {
let mut start_nodes = BTreeSet::new();
// add items to the set
while !start_nodes.is_empty() {
let mut n = 0;
{
let mut start_iter = start_nodes.iter();
let mut start_iter_cloned = start_iter.cloned();
let x = &mut n;
*x = start_iter_cloned.next().unwrap();
}
start_nodes.remove(&n);
}
}

Related

Life-time error in Parallel segmented-prime-sieve in Rust

I am trying to make parallel a prime sieve in Rust but Rust compiler not leave of give me a lifemetime error with the parameter true_block.
And data races are irrelevant because how primes are defined.
The error is:
error[E0621]: explicit lifetime required in the type of `true_block`
--> src/sieve.rs:65:22
|
50 | true_block: &mut Vec<bool>,
| -------------- help: add explicit lifetime `'static` to the type of `true_block`: `&'static mut Vec<bool>`
...
65 | handles.push(thread::spawn(move || {
| ^^^^^^^^^^^^^ lifetime `'static` required
The code is:
fn extend_helper(
primes: &Vec<usize>,
true_block: &mut Vec<bool>,
segment_min: usize,
segment_len: usize,
) {
let mut handles: Vec<thread::JoinHandle<()>> = vec![];
let arc_primes = Arc::new(true_block);
let segment_min = Arc::new(segment_min);
let segment_len = Arc::new(segment_len);
for prime in primes {
let prime = Arc::new(prime.clone());
let segment_min = Arc::clone(&segment_min);
let segment_len = Arc::clone(&segment_len);
let shared = Arc::clone(&arc_primes);
handles.push(thread::spawn(move || {
let tmp = smallest_multiple_of_n_geq_m(*prime, *segment_min) - *segment_min;
for j in (tmp..*segment_len).step_by(*prime) {
shared[j] = false;
}
}));
}
for handle in handles {
handle.join().unwrap();
}
}
The lifetime issue is with true_block which is a mutable reference passed into the function. Its lifetime is only as long as the call to the function, but because you're spawning a thread, the thread could live past the end of the function (the compiler can't be sure). Since it's not known how long the thread will live for, the lifetime of data passed to it must be 'static, meaning it will live until the end of the program.
To fix this, you can use .to_owned() to clone the array stored in the Arc, which will get around the lifetime issue. You would then need to copy the array date out of the Arc after joining the threads and write it back to the true_block reference. But it's also not possible to mutate an Arc, which only holds an immutable reference to the value. You need to use an Arc<Mutex<_>> for arc_primes and then you can mutate shared with shared.lock().unwrap()[j] = false; or something safer

Is there a way to use the value of an element from a map to update another element without interior mutability? [duplicate]

I have the following code:
use std::collections::{HashMap, HashSet};
fn populate_connections(
start: i32,
num: i32,
conns: &mut HashMap<i32, HashSet<i32>>,
ancs: &mut HashSet<i32>,
) {
let mut orig_conns = conns.get_mut(&start).unwrap();
let pipes = conns.get(&num).unwrap();
for pipe in pipes.iter() {
if !ancs.contains(pipe) && !orig_conns.contains(pipe) {
ancs.insert(*pipe);
orig_conns.insert(*pipe);
populate_connections(start, num, conns, ancs);
}
}
}
fn main() {}
The logic is not very important, I'm trying to create a function which will itself and walk over pipes.
My issue is that this doesn't compile:
error[E0502]: cannot borrow `*conns` as immutable because it is also borrowed as mutable
--> src/main.rs:10:17
|
9 | let mut orig_conns = conns.get_mut(&start).unwrap();
| ----- mutable borrow occurs here
10 | let pipes = conns.get(&num).unwrap();
| ^^^^^ immutable borrow occurs here
...
19 | }
| - mutable borrow ends here
error[E0499]: cannot borrow `*conns` as mutable more than once at a time
--> src/main.rs:16:46
|
9 | let mut orig_conns = conns.get_mut(&start).unwrap();
| ----- first mutable borrow occurs here
...
16 | populate_connections(start, num, conns, ancs);
| ^^^^^ second mutable borrow occurs here
...
19 | }
| - first borrow ends here
I don't know how to make it work. At the beginning, I'm trying to get two HashSets stored in a HashMap (orig_conns and pipes).
Rust won't let me have both mutable and immutable variables at the same time. I'm confused a bit because this will be completely different objects but I guess if &start == &num, then I would have two different references to the same object (one mutable, one immutable).
Thats ok, but then how can I achieve this? I want to iterate over one HashSet and read and modify other one. Let's assume that they won't be the same HashSet.
If you can change your datatypes and your function signature, you can use a RefCell to create interior mutability:
use std::cell::RefCell;
use std::collections::{HashMap, HashSet};
fn populate_connections(
start: i32,
num: i32,
conns: &HashMap<i32, RefCell<HashSet<i32>>>,
ancs: &mut HashSet<i32>,
) {
let mut orig_conns = conns.get(&start).unwrap().borrow_mut();
let pipes = conns.get(&num).unwrap().borrow();
for pipe in pipes.iter() {
if !ancs.contains(pipe) && !orig_conns.contains(pipe) {
ancs.insert(*pipe);
orig_conns.insert(*pipe);
populate_connections(start, num, conns, ancs);
}
}
}
fn main() {}
Note that if start == num, the thread will panic because this is an attempt to have both mutable and immutable access to the same HashSet.
Safe alternatives to RefCell
Depending on your exact data and code needs, you can also use types like Cell or one of the atomics. These have lower memory overhead than a RefCell and only a small effect on codegen.
In multithreaded cases, you may wish to use a Mutex or RwLock.
Use hashbrown::HashMap
If you can switch to using hashbrown, you may be able to use a method like get_many_mut:
use hashbrown::HashMap; // 0.12.1
fn main() {
let mut map = HashMap::new();
map.insert(1, true);
map.insert(2, false);
dbg!(&map);
if let Some([a, b]) = map.get_many_mut([&1, &2]) {
std::mem::swap(a, b);
}
dbg!(&map);
}
As hashbrown is what powers the standard library hashmap, this is also available in nightly Rust as HashMap::get_many_mut.
Unsafe code
If you can guarantee that your two indices are different, you can use unsafe code and avoid interior mutability:
use std::collections::HashMap;
fn get_mut_pair<'a, K, V>(conns: &'a mut HashMap<K, V>, a: &K, b: &K) -> (&'a mut V, &'a mut V)
where
K: Eq + std::hash::Hash,
{
unsafe {
let a = conns.get_mut(a).unwrap() as *mut _;
let b = conns.get_mut(b).unwrap() as *mut _;
assert_ne!(a, b, "The two keys must not resolve to the same value");
(&mut *a, &mut *b)
}
}
fn main() {
let mut map = HashMap::new();
map.insert(1, true);
map.insert(2, false);
dbg!(&map);
let (a, b) = get_mut_pair(&mut map, &1, &2);
std::mem::swap(a, b);
dbg!(&map);
}
Similar code can be found in libraries like multi_mut.
This code tries to have an abundance of caution. An assertion enforces that the two values are distinct pointers before converting them back into mutable references and we explicitly add lifetimes to the returned variables.
You should understand the nuances of unsafe code before blindly using this solution. Notably, previous versions of this answer were incorrect. Thanks to #oberien for finding the unsoundness in the original implementation of this and proposing a fix. This playground demonstrates how purely safe Rust code could cause the old code to result in memory unsafety.
An enhanced version of this solution could accept an array of keys and return an array of values:
fn get_mut_pair<'a, K, V, const N: usize>(conns: &'a mut HashMap<K, V>, mut ks: [&K; N]) -> [&'a mut V; N]
It becomes more difficult to ensure that all the incoming keys are unique, however.
Note that this function doesn't attempt to solve the original problem, which is vastly more complex than verifying that two indices are disjoint. The original problem requires:
tracking three disjoint borrows, two of which are mutable and one that is immutable.
tracking the recursive call
must not modify the HashMap in any way which would cause resizing, which would invalidate any of the existing references from a previous level.
must not alias any of the references from a previous level.
Using something like RefCell is a much simpler way to ensure you do not trigger memory unsafety.

Why can't I reuse a &mut reference after passing it to a function that accepts a generic type?

Why doesn't this code compile:
fn use_cursor(cursor: &mut io::Cursor<&mut Vec<u8>>) {
// do some work
}
fn take_reference(data: &mut Vec<u8>) {
{
let mut buf = io::Cursor::new(data);
use_cursor(&mut buf);
}
data.len();
}
fn produce_data() {
let mut data = Vec::new();
take_reference(&mut data);
data.len();
}
The error in this case is:
error[E0382]: use of moved value: `*data`
--> src/main.rs:14:5
|
9 | let mut buf = io::Cursor::new(data);
| ---- value moved here
...
14 | data.len();
| ^^^^ value used here after move
|
= note: move occurs because `data` has type `&mut std::vec::Vec<u8>`, which does not implement the `Copy` trait
The signature of io::Cursor::new is such that it takes ownership of its argument. In this case, the argument is a mutable reference to a Vec.
pub fn new(inner: T) -> Cursor<T>
It sort of makes sense to me; because Cursor::new takes ownership of its argument (and not a reference) we can't use that value later on. At the same time it doesn't make sense: we essentially only pass a mutable reference and the cursor goes out of scope afterwards anyway.
In the produce_data function we also pass a mutable reference to take_reference, and it doesn't produce a error when trying to use data again, unlike inside take_reference.
I found it possible to 'reclaim' the reference by using Cursor.into_inner(), but it feels a bit weird to do it manually, since in normal use-cases the borrow-checker is perfectly capable of doing it itself.
Is there a nicer solution to this problem than using .into_inner()? Maybe there's something else I don't understand about the borrow-checker?
Normally, when you pass a mutable reference to a function, the compiler implicitly performs a reborrow. This produces a new borrow with a shorter lifetime.
When the parameter is generic (and is not of the form &mut T), the compiler doesn't do this reborrowing automatically1. However, you can do it manually by dereferencing your existing mutable reference and then referencing it again:
fn take_reference(data: &mut Vec<u8>) {
{
let mut buf = io::Cursor::new(&mut *data);
use_cursor(&mut buf);
}
data.len();
}
1 — This is because the current compiler architecture only allows a chance to do a coercion if both the source and target types are known at the coercion site.

Extending borrowed lifetime for String slice

I have a function that reads in a file, and for each line adds it to a HashSet of type &str, but I can't work out how to tell the borrow checker to increase the lifetime.
Here's my function so far:
fn build_collection_set(reader: &mut BufReader<File>) -> HashSet<&str> {
let mut collection_set: HashSet<&str> = HashSet::new();
for line in reader.lines() {
let line = line.unwrap();
if line.len() > 0 {
collection_set.insert(&*line);
}
}
return collection_set;
}
How do I let Rust know I want to keep it around longer?
but I can't work out how to tell the borrow checker to increase the lifetime.
It's impossible.
The lifetime of a value, in C, C++ or Rust, is defined either:
by its lexical scope, if it is bound to an automatic variable
by its dynamic scope, if it is allocated on the heap
You can create variables which reference this value, and if your reference lives longer than the value, then you have a dangling reference:
in C and C++, you better do nothing with it
in Rust, the compiler will refuse to compile your code
In order to validate your program, the Rust compiler will require that you annotate the lifetime of your references; you will use lifetime annotations such as 'a in &'a T which allow naming a lifetime in order to document the relationship between the lifetime of multiple values.
The operative word is document here: a lifetime is intangible and cannot be influenced, the lifetime annotation 'a is just a name to allow referring to it.
So?
Whenever you find yourself wanting to extend the lifetime of a reference, what you should be looking at instead is extending the lifetime of the referred... or simply not use a reference but a value instead.
In this case, a simple solution is to return String instead of &str:
fn build_collection_set(reader: &mut BufReader<File>) -> HashSet<String> {
let mut collection_set = HashSet::new();
for line in reader.lines() {
let line = line.unwrap();
if line.len() > 0 {
collection_set.insert(line);
}
}
collection_set
}
reader.lines() returns an iterator over owned Strings. But then in your for loop you cast these to borrowed references to &str. So when the iterator goes out of scope all your borrowed references become invalid. Consider using a HashSet<String> instead, which also is zero cost, because the Strings get moved into the HashSet and therefore aren't copied.
Working example
fn build_collection_set(reader: &mut BufReader<File>) -> HashSet<String> {
let mut collection_set: HashSet<String> = HashSet::new();
for line in reader.lines() {
let line = line.unwrap();
if line.len() > 0 {
collection_set.insert(line);
}
}
collection_set
}

Why can't I call a mutable method (and store the result) twice in a row?

In the following example:
struct SimpleMemoryBank {
vec: Vec<Box<i32>>,
}
impl SimpleMemoryBank {
fn new() -> SimpleMemoryBank {
SimpleMemoryBank{ vec: Vec::new() }
}
fn add(&mut self, value: i32) -> &mut i32 {
self.vec.push(Box::new(value));
let last = self.vec.len() - 1;
&mut *self.vec[last]
}
}
fn main() {
let mut foo = SimpleMemoryBank::new();
// Works okay
foo.add(1);
foo.add(2);
// Doesn't work: "cannot borrow `foo` as mutable more than once at a time"
let one = foo.add(1);
let two = foo.add(2);
}
add() can be called multiple times in a row, as long as I don't store the result of the function call. But if I store the result of the function (let one = ...), I then get the error:
problem.rs:26:15: 26:18 error: cannot borrow `foo` as mutable more than once at a time
problem.rs:26 let two = foo.add(2);
^~~
problem.rs:25:15: 25:18 note: previous borrow of `foo` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `foo` until the borrow ends
problem.rs:25 let one = foo.add(1);
^~~
problem.rs:27:2: 27:2 note: previous borrow ends here
problem.rs:17 fn main() {
...
problem.rs:27 }
^
error: aborting due to previous error
Is this a manifestation of issue #6393: borrow scopes should not always be lexical?
How can I work around this? Essentially, I want to add a new Box to the vector, and then return a reference to it (so the caller can use it).
This is exactly the problem that Rust is designed to prevent you from causing. What would happen if you did:
let one = foo.add(1);
foo.vec.clear();
println!("{}", one);
Or what if foo.add worked by pushing the new value at the beginning of the vector? Bad things would happen! The main thing is that while you have a borrow out on a variable, you cannot mutate the variable any more. If you were able to mutate it, then you could potentially invalidate the memory underlying the borrow and then your program could do a number of things, the best case would be that it crashes.
Is this a manifestation of issue #6393: borrow scopes should not always be lexical?
Kind of, but not really. In this example, you never use one or two, so theoretically a non-lexical scope would allow it to compile. However, you then state
I want to add a new Box to the vector, and then return a reference to it (so the caller can use it)
Which means your real code wants to be
let one = foo.add(1);
let two = foo.add(2);
do_something(one);
do_something(two);
So the lifetimes of the variables would overlap.
In this case, if you just want a place to store variables that can't be deallocated individually, don't overlap with each other, and cannot be moved, try using a TypedArena:
extern crate arena;
use arena::TypedArena;
fn main() {
let arena = TypedArena::new();
let one = arena.alloc(1);
let two = arena.alloc(2);
*one = 3;
println!("{}, {}", one, two);
}

Resources