Extending borrowed lifetime for String slice - rust

I have a function that reads a file and inserts each non-empty line into a HashSet<&str>, but I can't work out how to tell the borrow checker to increase the lifetime.
Here's my function so far:
fn build_collection_set(reader: &mut BufReader<File>) -> HashSet<&str> {
    let mut collection_set: HashSet<&str> = HashSet::new();
    for line in reader.lines() {
        let line = line.unwrap();
        if line.len() > 0 {
            collection_set.insert(&*line);
        }
    }
    return collection_set;
}
How do I let Rust know I want to keep it around longer?

but I can't work out how to tell the borrow checker to increase the lifetime.
It's impossible.
The lifetime of a value, in C, C++ or Rust, is defined either:
by its lexical scope, if it is bound to an automatic variable
by its dynamic scope, if it is allocated on the heap
You can create variables which reference this value, and if your reference lives longer than the value, then you have a dangling reference:
in C and C++, you had better not do anything with it
in Rust, the compiler will refuse to compile your code
In order to validate your program, the Rust compiler will require that you annotate the lifetime of your references; you will use lifetime annotations such as 'a in &'a T which allow naming a lifetime in order to document the relationship between the lifetime of multiple values.
The operative word here is document: a lifetime is intangible and cannot be influenced. The lifetime annotation 'a is just a name that lets you refer to it.
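A minimal sketch of what "naming, not changing" means in practice. The function longest is a made-up illustration and not part of the question's code; the annotation only states how the lifetime of the result relates to the lifetimes of the inputs.

```rust
// Hypothetical helper, used only for illustration.
// 'a names the region for which both inputs are valid; the signature
// promises that the returned reference is valid for that same region.
// The annotation does not make either input live any longer.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() {
        a
    } else {
        b
    }
}
```

Calling longest(&first, &second) with two Strings yields a reference that the compiler will only let you use while both Strings are still alive.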
So?
Whenever you find yourself wanting to extend the lifetime of a reference, what you should be looking at instead is extending the lifetime of the referent, or simply using a value instead of a reference.
In this case, a simple solution is to return String instead of &str:
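use std::collections::HashSet;
use std::fs::File;
use std::io::{BufRead, BufReader};
fn build_collection_set(reader: &mut BufReader<File>) -> HashSet<String> {
    let mut collection_set = HashSet::new();
    for line in reader.lines() {
        // Each item is an owned String wrapped in an io::Result.
        let line = line.unwrap();
        if line.len() > 0 {
            // Moving the String into the set keeps it alive as long as the set.
            collection_set.insert(line);
        }
    }
    collection_set
}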
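The other option mentioned above, extending the lifetime of the referent, could look roughly like this. It is a sketch that assumes the caller is willing to read the whole file into one String first (for example with std::fs::read_to_string) and keep that String alive for as long as the set is used; the &str signature is my own choice, not the asker's.

```rust
use std::collections::HashSet;

// Assumption: `contents` holds the full text of the file and is owned by
// the caller. The returned slices borrow from it, so the set cannot
// outlive `contents`.
fn build_collection_set(contents: &str) -> HashSet<&str> {
    contents
        .lines()
        .filter(|line| !line.is_empty())
        .collect()
}
```

Here no line is copied at all, at the price of holding the entire file in memory.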

reader.lines() returns an iterator over owned Strings (each wrapped in an io::Result). In your for loop you then borrow each String as a &str, but the String is dropped at the end of its loop iteration, so the reference cannot outlive it. Consider using a HashSet<String> instead. This costs nothing extra: each String is moved into the HashSet, so its contents are not copied.
Working example
fn build_collection_set(reader: &mut BufReader<File>) -> HashSet<String> {
    let mut collection_set: HashSet<String> = HashSet::new();
    for line in reader.lines() {
        let line = line.unwrap();
        if line.len() > 0 {
            collection_set.insert(line);
        }
    }
    collection_set
}
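A possible variation on the same idea, sketched under two assumptions of mine that are not in the question: the function accepts any BufRead rather than only BufReader<File>, and I/O errors are returned to the caller instead of being unwrapped.

```rust
use std::collections::HashSet;
use std::io::{self, BufRead};

// Generic over the reader so it can be exercised with in-memory data.
fn build_collection_set<R: BufRead>(reader: R) -> io::Result<HashSet<String>> {
    let mut collection_set = HashSet::new();
    for line in reader.lines() {
        // `?` forwards a read error instead of panicking.
        let line = line?;
        if !line.is_empty() {
            // The owned String is moved into the set, not copied.
            collection_set.insert(line);
        }
    }
    Ok(collection_set)
}
```

A byte slice such as &b"x\n\ny\n"[..] implements BufRead, which makes the function easy to try without touching the file system.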

Related

Rust mutable String declaration from String reference argument

I have
fn main() {
    let x = String::from("12");
    fun1(&x);
}
fn fun1(in_fun: &String) {
    let mut y = _______;
    y.push_str("z");
    println!("in fun {}", y);
}
where _____ is the code for declaring y based on the argument in_fun.
At first I tried let mut y = *in_fun;, which fails with "move occurs because '*in_fun' has type 'String', which does not implement the 'Copy' trait", and also let mut y = String::from(*in_fun);, which gives the same error.
The thing that worked was let mut y = String::from(format!("{}", *in_fun));.
Is this the right way to declare a mutable String from &String?
Also, I still don't understand why dereferencing a &String with * is an error. I understood that dereferencing just returns the value behind the reference.
First of all, the working code:
fn fun1(in_fun: &String) {
    let mut y = in_fun.clone();
    y.push_str("z");
    println!("in fun {}", y);
}
Or, your instincts tell you you have to dereference, so (*in_fun).clone() works just the same, but is a bit redundant. *in_fun.clone() does NOT work because it's equivalent to *(in_fun.clone()) (dereferencing the clone), which isn't what you want. The reason you don't need to dereference the reference before calling clone is because Rust's method resolution allows you to call methods of a type or access properties of a type using a reference to the type, and .clone has an &self receiver.
let mut y = *in_fun doesn't work because it attempts to move the String out from behind the reference, which a borrow does not allow.
&String is an immutable (shared) reference. Rust is strict about this, which prevents many common mistakes. Moving the String out through a &String is not possible because it would break Rust's safety guarantees: the caller still owns the value and you only have read access. See the ownership explanation.
The function should either accept a mutable reference &mut String (then the string can be modified in place) or it needs to .clone() the string from the immutable reference.
Taking a mutable reference is more efficient than cloning, but it restricts the caller from sharing it immutably in parallel.
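The two options can be put side by side as follows. The function names are made up for this sketch; only the parameter types correspond to the alternatives described above.

```rust
// Option 1: the caller hands out exclusive access and sees the change.
fn append_in_place(s: &mut String) {
    s.push('z');
}

// Option 2: the caller's String stays untouched; we pay for one clone
// and return the modified copy.
fn append_to_clone(s: &String) -> String {
    let mut y = s.clone();
    y.push('z');
    y
}
```

With the first form the caller must declare its String as mut and cannot hold other references to it during the call; with the second form any shared reference will do.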
If the only thing you want to achieve is to print out some additional information, the best way I know of is:
fn fun1<S: std::fmt::Display>(in_fun: S) {
    println!("in fun {}z", in_fun);
}
fn main() {
    let mut x = String::from("12");
    fun1(&x);
    fun1(&mut x);
    fun1(x);
    fun1("12");
}
I use the Display trait as the bound, so anything that implements it will do. See the playground.
On the other hand, if you really need an owned string, then ask for it :)
fn fun1<S: Into<String>>(in_fun: S) {
    let mut y = in_fun.into();
    y.push('z');
    println!("in fun {}", y);
}
fn main() {
    let x = String::from("12");
    fun1(&x);
    fun1(x);
    fun1("12");
}
This way you can accept both &str and String and stay efficient, avoiding a clone where possible.

Getting first member of a BTreeSet

In Rust, I have a BTreeSet that I'm using to keep my values in order. I have a loop that should retrieve and remove the first (lowest) member of the set. I'm using a cloned iterator to retrieve the first member. Here's the code:
use std::collections::BTreeSet;
fn main() {
    let mut start_nodes = BTreeSet::new();
    // add items to the set
    while !start_nodes.is_empty() {
        let mut start_iter = start_nodes.iter();
        let mut start_iter_cloned = start_iter.cloned();
        let n = start_iter_cloned.next().unwrap();
        start_nodes.remove(&n);
    }
}
This, however, gives me the following compile error:
error[E0502]: cannot borrow `start_nodes` as mutable because it is also borrowed as immutable
--> prog.rs:60:6
|
56 | let mut start_iter = start_nodes.iter();
| ----------- immutable borrow occurs here
...
60 | start_nodes.remove(&n);
| ^^^^^^^^^^^ mutable borrow occurs here
...
77 | }
| - immutable borrow ends here
Why is start_nodes.iter() considered an immutable borrow? What approach should I take instead to get the first member?
I'm using version 1.14.0 (not by choice).
Why is start_nodes.iter() considered an immutable borrow?
Whenever you ask a question like this one, you need to look at the prototype of the function, in this case the prototype of BTreeSet::iter():
fn iter(&self) -> Iter<T>
If we look up the Iter type that is returned, we find that it's defined as
pub struct Iter<'a, T> where T: 'a { /* fields omitted */ }
The lifetime 'a is not explicitly mentioned in the definition of iter(); however, the lifetime elision rules make the function definition equivalent to
fn iter<'a>(&'a self) -> Iter<'a, T>
From this expanded version, you can see that the return value has a lifetime that is bound to the lifetime of the reference to self that you pass in, which is just another way of stating that the function call creates a shared borrow that lives as long as the return value. If you store the return value in a variable, the borrow lives at least as long as the variable.
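The same elision rule can be observed on a small type of your own. The struct Nodes and its methods below are invented for illustration and have nothing to do with the internals of BTreeSet; the two methods have identical signatures as far as the compiler is concerned.

```rust
// Hypothetical container, used only to show the elision rule.
struct Nodes {
    items: Vec<i32>,
}

impl Nodes {
    // Elided form: the returned reference is tied to the borrow of `self`.
    fn first(&self) -> Option<&i32> {
        self.items.first()
    }

    // The same signature with the lifetime written out.
    fn first_explicit<'a>(&'a self) -> Option<&'a i32> {
        self.items.first()
    }
}
```

As long as the Option<&i32> returned by either method is in use, nodes counts as borrowed and cannot be mutated; copying the i32 out (for example with .copied()) ends the borrow.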
What approach should I take instead to get the first member?
As noted in the comments, your code works on recent versions of Rust due to non-lexical lifetimes – the compiler figures out by itself that start_iter and start_iter_cloned don't need to live longer than the call to next(). In older versions of Rust, you can artificially limit the lifetime by introducing a new scope:
while !start_nodes.is_empty() {
    let n = {
        let mut start_iter = start_nodes.iter();
        let mut start_iter_cloned = start_iter.cloned();
        start_iter_cloned.next().unwrap()
    };
    start_nodes.remove(&n);
}
However, note that this code is needlessly long-winded. The new iterator you create and its cloning version only live inside the new scope, and they aren't really used for any other purpose, so you could just as well write
while !start_nodes.is_empty() {
    let n = start_nodes.iter().next().unwrap().clone();
    start_nodes.remove(&n);
}
which does exactly the same thing and avoids the issue with long-lived borrows: the intermediate iterators are never stored in variables, so they are temporaries that are dropped at the end of the statement.
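If you are not tied to an old toolchain, the whole loop can be reduced further. The sketch below relies on BTreeSet::pop_first, which was stabilized in Rust 1.66 and is therefore not an option on 1.14; the function name drain_in_order is made up for the example.

```rust
use std::collections::BTreeSet;

// Removes the elements one by one, smallest first, and records the order.
// Requires Rust 1.66 or newer because of `pop_first`.
fn drain_in_order(mut start_nodes: BTreeSet<i32>) -> Vec<i32> {
    let mut visited = Vec::new();
    while let Some(n) = start_nodes.pop_first() {
        // `n` is owned, so `start_nodes` is free to be mutated here.
        visited.push(n);
    }
    visited
}
```

Because pop_first returns the element by value, no borrow of the set survives into the loop body.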
Finally, while you don't give full details of your use case, I strongly suspect that you would be better off with a BinaryHeap instead of a BTreeSet:
use std::collections::BinaryHeap;
fn main() {
    let mut start_nodes = BinaryHeap::new();
    start_nodes.push(42);
    while let Some(n) = start_nodes.pop() {
        // Do something with `n`
    }
}
This code is shorter, simpler, completely sidesteps the issue with the borrow checker, and will also be more efficient.
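One caveat worth keeping in mind: BinaryHeap is a max-heap, so pop() returns the greatest element, whereas the question asks for the lowest. It also keeps duplicates, unlike a set. A sketch of smallest-first popping using std::cmp::Reverse follows; the function name pop_ascending is an assumption of this example.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Wrapping each value in `Reverse` inverts the ordering, which turns the
// max-heap into a min-heap.
fn pop_ascending(values: &[i32]) -> Vec<i32> {
    let mut heap: BinaryHeap<Reverse<i32>> =
        values.iter().copied().map(Reverse).collect();
    let mut out = Vec::with_capacity(heap.len());
    while let Some(Reverse(n)) = heap.pop() {
        out.push(n);
    }
    out
}
```

If duplicates must be rejected, the BTreeSet approach remains the better fit.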
Not sure this is the best approach, but I fixed it by introducing a new scope to ensure that the immutable borrow ends before the mutable borrow occurs:
use std::collections::BTreeSet;
fn main() {
    let mut start_nodes = BTreeSet::new();
    // add items to the set
    while !start_nodes.is_empty() {
        let mut n = 0;
        {
            let mut start_iter = start_nodes.iter();
            let mut start_iter_cloned = start_iter.cloned();
            let x = &mut n;
            *x = start_iter_cloned.next().unwrap();
        }
        start_nodes.remove(&n);
    }
}

Why can't I reuse a &mut reference after passing it to a function that accepts a generic type?

Why doesn't this code compile:
use std::io;
fn use_cursor(cursor: &mut io::Cursor<&mut Vec<u8>>) {
    // do some work
}
fn take_reference(data: &mut Vec<u8>) {
    {
        let mut buf = io::Cursor::new(data);
        use_cursor(&mut buf);
    }
    data.len();
}
fn produce_data() {
    let mut data = Vec::new();
    take_reference(&mut data);
    data.len();
}
The error in this case is:
error[E0382]: use of moved value: `*data`
--> src/main.rs:14:5
|
9 | let mut buf = io::Cursor::new(data);
| ---- value moved here
...
14 | data.len();
| ^^^^ value used here after move
|
= note: move occurs because `data` has type `&mut std::vec::Vec<u8>`, which does not implement the `Copy` trait
The signature of io::Cursor::new is such that it takes ownership of its argument. In this case, the argument is a mutable reference to a Vec.
pub fn new(inner: T) -> Cursor<T>
It sort of makes sense to me; because Cursor::new takes ownership of its argument (and not a reference) we can't use that value later on. At the same time it doesn't make sense: we essentially only pass a mutable reference and the cursor goes out of scope afterwards anyway.
In the produce_data function we also pass a mutable reference to take_reference, and it doesn't produce an error when trying to use data again, unlike inside take_reference.
I found it possible to 'reclaim' the reference by using Cursor.into_inner(), but it feels a bit weird to do it manually, since in normal use-cases the borrow-checker is perfectly capable of doing it itself.
Is there a nicer solution to this problem than using .into_inner()? Maybe there's something else I don't understand about the borrow-checker?
Normally, when you pass a mutable reference to a function, the compiler implicitly performs a reborrow. This produces a new borrow with a shorter lifetime.
When the parameter is generic (and is not of the form &mut T), the compiler doesn't do this reborrowing automatically [1]. However, you can do it manually by dereferencing your existing mutable reference and then referencing it again:
fn take_reference(data: &mut Vec<u8>) {
    {
        let mut buf = io::Cursor::new(&mut *data);
        use_cursor(&mut buf);
    }
    data.len();
}
[1] This is because the current compiler architecture only allows a chance to do a coercion if both the source and target types are known at the coercion site.
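For comparison, here is a sketch that puts the explicit reborrow next to the into_inner() approach mentioned in the question. The function names and the bytes written are invented for the example; Cursor::new, Cursor::into_inner and Write::write_all are the standard library calls.

```rust
use std::io::{Cursor, Write};

// Option 1: explicit reborrow. `&mut *data` creates a new, shorter
// borrow, so `data` itself is not moved into the Cursor.
fn write_with_reborrow(data: &mut Vec<u8>) -> usize {
    {
        let mut cursor = Cursor::new(&mut *data);
        cursor.write_all(b"abc").expect("writing to a Vec should not fail");
    }
    data.len()
}

// Option 2: let the Cursor take the reference, then ask for it back.
fn write_with_into_inner(data: &mut Vec<u8>) -> usize {
    let mut cursor = Cursor::new(data);
    cursor.write_all(b"abc").expect("writing to a Vec should not fail");
    let data = cursor.into_inner();
    data.len()
}
```

Both compile; the reborrow is usually the lighter choice because the original binding stays usable without any extra bookkeeping.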

Why does HashMap have iter_mut() but HashSet doesn't?

What is the design rationale for supplying an iter_mut function for HashMap but not HashSet in Rust?
Would it be a faux pas to roll one's own (assuming that can even be done)?
Having one could alleviate situations that give rise to
previous borrow of X occurs here; the immutable borrow prevents
subsequent moves or mutable borrows of X until the borrow ends
Example
An extremely convoluted example (Gist) that does not show-case why the parameter passing is the way that it is. Has a short comment explaining the pain-point:
use std::collections::HashSet;
fn derp(v: i32, unprocessed: &mut HashSet<i32>) {
    if unprocessed.contains(&v) {
        // Pretend that v has been processed
        unprocessed.remove(&v);
    }
}
fn herp(v: i32) {
    let mut unprocessed: HashSet<i32> = HashSet::new();
    unprocessed.insert(v);
    // I need to iterate over the unprocessed values
    while let Some(u) = unprocessed.iter().next() {
        // And then pass them mutably to another function
        // as I will process the values inside derp and
        // remove them from the set.
        //
        // This is an extremely convoluted example but
        // I need for derp to be a separate function
        // as I will employ recursion there, as it is
        // much more succinct than an iterative version.
        derp(*u, &mut unprocessed);
    }
}
fn main() {
    println!("Hello, world!");
    herp(10);
}
The statement
while let Some(u) = unprocessed.iter().next() {
is an immutable borrow, hence
derp(*u, &mut unprocessed);
is impossible as unprocessed cannot be borrowed mutably. The immutable borrow does not end until the end of the while-loop.
I have tried to use this as reference and essentially ended up with trying to fool the borrow checker through various permutations of assignments, enclosing braces, but due to the coupling of the intended expressions the problem remains.
You have to think about what a HashSet actually is. The IterMut that you get from HashMap::iter_mut() is only mutable in the value part: it yields (&key, &mut val) pairs, i.e. items of type (&'a K, &'a mut V).
HashSet is basically a HashMap<T, ()>, so the actual values are the keys. If you could modify the keys in place, their hashes would have to be recomputed, or you would end up with an invalid HashMap.
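In practice, "mutating" an element of a HashSet therefore means taking it out, changing it, and inserting it again so that it is hashed afresh. A minimal sketch, assuming a set of Strings; the helper replace_element is hypothetical, while HashSet::take and HashSet::insert are the standard methods.

```rust
use std::collections::HashSet;

// Hypothetical helper: swaps `old` for `new` by removing and re-inserting,
// so the set only ever hashes the value it actually stores.
fn replace_element(set: &mut HashSet<String>, old: &str, new: &str) -> bool {
    match set.take(old) {
        Some(mut value) => {
            // The String is owned here, so changing it is harmless.
            value.clear();
            value.push_str(new);
            set.insert(value);
            true
        }
        None => false,
    }
}
```

The return value reports whether old was present; nothing is inserted when it was not.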
If your HashSet contains a Copy type, such as i32, you can work on a copy of the value to release the borrow on the HashSet early. To do this, you need to eliminate all borrows from the bindings in the while let expression. In your original code, u is of type &i32, and it keeps borrowing from unprocessed until the end of the loop. If we change the pattern to Some(&u), then u is of type i32, which doesn't borrow from anything, so we're free to use unprocessed as we like.
fn herp(v: i32) {
    let mut unprocessed: HashSet<i32> = HashSet::new();
    unprocessed.insert(v);
    while let Some(&u) = unprocessed.iter().next() {
        derp(u, &mut unprocessed);
    }
}
If the type is not Copy or is too expensive to copy/clone, you can wrap them in Rc or Arc, and clone them as you iterate on them using cloned() (cloning an Rc or Arc doesn't clone the underlying value, it just clones the Rc pointer and increments the reference counter).
use std::collections::HashSet;
use std::rc::Rc;
fn derp(v: &i32, unprocessed: &mut HashSet<Rc<i32>>) {
    if unprocessed.contains(v) {
        unprocessed.remove(v);
    }
}
fn herp(v: Rc<i32>) {
    let mut unprocessed: HashSet<Rc<i32>> = HashSet::new();
    unprocessed.insert(v);
    while let Some(u) = unprocessed.iter().cloned().next() {
        // If you don't use u afterwards,
        // you could also pass it by value to derp.
        derp(&u, &mut unprocessed);
    }
}
fn main() {
    println!("Hello, world!");
    herp(Rc::new(10));
}

Why is the mutable reference not moved here?

I was under the impression that mutable references (i.e. &mut T) are always moved. That makes perfect sense, since they allow exclusive mutable access.
In the following piece of code I assign a mutable reference to another mutable reference and the original is moved. As a result I cannot use the original any more:
let mut value = 900;
let r_original = &mut value;
let r_new = r_original;
*r_original; // error: use of moved value *r_original
If I have a function like this:
fn make_move(_: &mut i32) {
}
and modify my original example to look like this:
let mut value = 900;
let r_original = &mut value;
make_move(r_original);
*r_original; // no complaint
I would expect that the mutable reference r_original is moved when I call the function make_move with it. However that does not happen. I am still able to use the reference after the call.
If I use a generic function make_move_gen:
fn make_move_gen<T>(_: T) {
}
and call it like this:
let mut value = 900;
let r_original = &mut value;
make_move_gen(r_original);
*r_original; // error: use of moved value *r_original
The reference is moved again and therefore the program behaves as I would expect.
Why is the reference not moved when calling the function make_move?
Code example
There might actually be a good reason for this.
&mut T isn't actually a type: all borrows are parametrized by some (potentially inexpressible) lifetime.
When one writes
fn move_try(val: &mut ()) {
    { let new = val; }
    *val
}
fn main() {
    move_try(&mut ());
}
the type inference engine infers typeof new == typeof val, so they share the original lifetime. This means the borrow from new does not end until the borrow from val does.
This means it's equivalent to
fn move_try<'a>(val: &'a mut ()) {
    { let new: &'a mut _ = val; }
    *val
}
fn main() {
    move_try(&mut ());
}
However, when you write
fn move_try(val: &mut ()) {
    { let new: &mut _ = val; }
    *val
}
fn main() {
    move_try(&mut ());
}
a cast happens - the same kind of thing that lets you cast away pointer mutability. This means that the lifetime is some (seemingly unspecifiable) 'b < 'a. This involves a cast, and thus a reborrow, and so the reborrow is able to fall out of scope.
An always-reborrow rule would probably be nicer, but explicit declaration isn't too problematic.
I asked something along those lines here.
It seems that in some (many?) cases, instead of a move, a re-borrow takes place. Memory safety is not violated, only the "moved" value is still around. I could not find any docs on that behavior either.
@Levans opened a GitHub issue here, although I'm not entirely convinced this is just a documentation issue: dependably moving out of a &mut reference seems central to Rust's approach to ownership.
It's implicit reborrow. It's a topic not well documented.
This question has already been answered pretty well:
how implicit reborrow works
how reborrow works along with borrow split
If I tweak the generic one a bit, it would not complain either
fn make_move_gen<T>(_: &mut T) {
}
or
let _ = *r_original;
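Putting the pieces together, the generic call from the question can also be kept as it is if the caller reborrows explicitly. This is a sketch built from the question's make_move and make_move_gen; the wrapper function demo and the increment are additions for the example.

```rust
fn make_move(_: &mut i32) {}

fn make_move_gen<T>(_: T) {}

fn demo() -> i32 {
    let mut value = 900;
    let r_original = &mut value;
    // Parameter type is written as `&mut i32`: the compiler reborrows
    // implicitly, so `r_original` is not moved.
    make_move(r_original);
    // Parameter type is a bare `T`: reborrow by hand to avoid the move.
    make_move_gen(&mut *r_original);
    // Still usable after both calls.
    *r_original += 1;
    value
}
```

Passing r_original directly to make_move_gen instead of &mut *r_original brings back the "use of moved value" error on the following line.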
