allocating data structures while making the borrow checker happy (second try)

allocating data structures while making the borrow checker happy (second try) - rust

As a follow up to my question from yesterday (allocating data structures while making the borrow checker happy),
here is a simplified case. I would like to create a data structure and return both it and a reference to it. The following code does not compile:
fn create() -> (String, &str) {
let s = String::new();
let r = &s;
return (s,r);
}
fn f() {
let (s,r) = create();
do_something(r);
}
If I restructure my code as follows, everything works fine:
fn create<'a>(s : &'a String) -> &'a str {
let r = &s;
return r;
}
fn f() {
let s = String::new();
let r = create(&s);
do_something(r);
}
Unfortunately, in some cases structuring the code as in the first example is much preferable. Is there some way to make rust accept the first version?
Since both versions of the code do exactly the same thing, there shouldn't be any problems with safety. The question is just whether rust's type system is powerful enough to express it.

The Rust rules on references are that an object can either have
any amount of immutable references,
or exactly one mutable reference.
In other words, you cannot have an immutable reference and a mutable reference to the same object at the same time. Since the object counts as a mutable reference to itself, you cannot modify the object while an immutable reference to the object is present.
This is why your first piece of code does not compile. When you return (String, &str), String contains a mutable reference to a byte array, and &str contains an immutable reference to the same byte array. It violates the borrow checker rules.
However, you may ask, how does the second piece of code work? It works because you never attempt to modify s. If you modify s after r is created, then it will fail to compile:
fn f() {
let mut s = String::new();
let r = &s;
s += "modified"; // ERROR: cannot borrow `s` as mutable because it is also borrowed as immutable
do_something(r);
}
You shouldn't need to return a data structure and a reference to itself. Simply invoke the reference when it is necessary, rather than storing it in a variable. This will make sure that the borrow checker doesn't complain about references colliding.
fn f() {
let s = String::new();
// If a function takes &str, passing &String will be implicitly converted to &str
do_something(&s);
s += "modified"; // Perfectly fine, will still compile
do_something_else(&s);
}
If you want to learn more about why the &String => &str conversion works, see the std::convert::AsRef trait.

I found a solution: use a raw pointer instead of a reference. The following code works:
fn create() -> (String, *const u8) {
let s = String::new();
let r = s.as_ptr();
return (s,r);
}
fn f() {
let (s,r) = create();
do_something(r);
}

Related

Lifetimes in lambda-based iterators

My questions seems to be closely related to Rust error "cannot infer an appropriate lifetime for borrow expression" when attempting to mutate state inside a closure returning an Iterator, but I think it's not the same. So, this
use std::iter;
fn example(text: String) -> impl Iterator<Item = Option<String>> {
let mut i = 0;
let mut chunk = None;
iter::from_fn(move || {
if i <= text.len() {
let p_chunk = chunk;
chunk = Some(&text[..i]);
i += 1;
Some(p_chunk.map(|s| String::from(s)))
} else {
None
}
})
}
fn main() {}
does not compile. The compiler says it cannot determine the appropriate lifetime for &text[..i]. This is the smallest example I could come up with. The idea being, there is an internal state, which is a slice of text, and the iterator returns new Strings allocated from that internal state. I'm new to Rust, so maybe it's all obvious, but how would I annotate lifetimes so that this compiles?
Note that this example is different from the linked example, because there point was passed as a reference, while here text is moved. Also, the answer there is one and half years old by now, so maybe there is an easier way.
EDIT: Added p_chunk to emphasize that chunk needs to be persistent across calls to next and so cannot be local to the closure but should be captured by it.

Your code is an example of attempting to create a self-referential struct, where the struct is implicitly created by the closure. Since both text and chunk are moved into the closure, you can think of both as members of a struct. As chunk refers to the contents in text, the result is a self-referential struct, which is not supported by the current borrow checker.
While self-referential structs are unsafe in general due to moves, in this case it would be safe because text is heap-allocated and is not subsequently mutated, nor does it escape the closure. Therefore it is impossible for the contents of text to move, and a sufficiently smart borrow checker could prove that what you're trying to do is safe and allow the closure to compile.
The answer to the [linked question] says that referencing through an Option is possible but the structure cannot be moved afterwards. In my case, the self-reference is created after text and chunk were moved in place, and they are never moved again, so in principle it should work.
Agreed - it should work in principle, but it is well known that the current borrow checker doesn't support it. The support would require multiple new features: the borrow checker should special-case heap-allocated types like Box or String whose moves don't affect references into their content, and in this case also prove that you don't resize or mem::replace() the closed-over String.
In this case the best workaround is the "obvious" one: instead of persisting the chunk slice, persist a pair of usize indices (or a Range) and create the slice when you need it.

If you move the chunk Option into the closure, your code compiles. I can't quite answer why declaring chunk outside the closure results in a lifetime error for the borrow of text inside the closure, but the chunk Option looks superfluous anyways and the following code should be equivalent:
fn example(text: String) -> impl Iterator<Item = Option<String>> {
let mut i = 0;
iter::from_fn(move || {
if i <= text.len() {
let chunk = text[..i].to_string();
i += 1;
Some(Some(chunk))
} else {
None
}
})
}
Additionally, it seems unlikely that you really want an Iterator<Item = Option<String>> here instead of an Iterator<Item<String>>, since the iterator never yields Some(None) anyways.
fn example(text: String) -> impl Iterator<Item = String> {
let mut i = 0;
iter::from_fn(move || {
if i <= text.len() {
let chunk = text[..i].to_string();
i += 1;
Some(chunk)
} else {
None
}
})
}
Note, you can also go about this iterator without allocating a String for each chunk, if you take a &str as an argument and tie the lifetime of the output to the input argument:
fn example<'a>(text: &'a str) -> impl Iterator<Item = &'a str> + 'a {
let mut i = 0;
iter::from_fn(move || {
if i <= text.len() {
let chunk = &text[..i];
i += 1;
Some(chunk)
} else {
None
}
})
}

Rust mutable String declaration from String reference argument

I have
fn main() {
let x = String::from("12");
fun1(&x);
}
fn fun1(in_fun: &String) {
let mut y = _______;
y.push_str("z");
println!("in fun {}", y);
}
where _____ is the code for declaring y based on the argument in_fun.
At first I tried let mut y = *in_fun; which errors move occurs because '*in_fun' has type 'String', which does not implement the 'Copy' trait and also let mut y = String::from(*in_fun); which gives same error.
The thing that worked was let mut y = String::from(format!("{}", *in_fun));.
Is this the right way to declare a mutable String from &String?
Also I still don't understand why dereferencing &String with * errors? I understood *& dereferencing to returns just the value of the reference.

First of all, the working code:
fn fun1(in_fun: &String) {
let mut y = in_fun.clone();
y.push_str("z");
println!("in fun {}", y);
}
Or, your instincts tell you you have to dereference, so (*in_fun).clone() works just the same, but is a bit redundant. *in_fun.clone() does NOT work because it's equivalent to *(in_fun.clone()) (dereferencing the clone), which isn't what you want. The reason you don't need to dereference the reference before calling clone is because Rust's method resolution allows you to call methods of a type or access properties of a type using a reference to the type, and .clone has an &self receiver.
The reason that let mut y = *in_fun doesn't work is because this attempts to move the string out from underneath the reference, which doesn't work.

&String is an immutable reference. Rust is strict about this and prevents many common mishaps we people tend to run into. Dereferencing &String is not possible as it would break the guarantees of safety in rust, allowing you to modify where you only have read access. See the ownership explanation.
The function should either accept a mutable reference &mut String (then the string can be modified in place) or it needs to .clone() the string from the immutable reference.
Taking a mutable reference is more efficient than cloning, but it restricts the caller from sharing it immutably in parallel.
If the only thing you want to achieve is to print out some additional information, the best way I know of is:
fn fun1<S: std::fmt::Display>(in_fun: S) {
println!("in fun {}z", in_fun);
}
fn main() {
let mut x = String::from("12");
fun1(&x);
fun1(&mut x);
fun1(x);
fun1("12");
}
I use a Display trait so anything that implements will do. See the playground.
On the other hand, if you really need an owned string, then ask for it :)
fn fun1<S: Into<String>>(in_fun: S) {
let mut y = in_fun.into();
y.push('z');
println!("in fun {}", y);
}
fn main() {
let x = String::from("12");
fun1(&x);
fun1(x);
fun1("12");
}
This way you can accept both &str and String and keep efficient, avoiding cloning if possible.

Getting first member of a BTreeSet

In Rust, I have a BTreeSet that I'm using to keep my values in order. I have a loop that should retrieve and remove the first (lowest) member of the set. I'm using a cloned iterator to retrieve the first member. Here's the code:
use std::collections::BTreeSet;
fn main() {
let mut start_nodes = BTreeSet::new();
// add items to the set
while !start_nodes.is_empty() {
let mut start_iter = start_nodes.iter();
let mut start_iter_cloned = start_iter.cloned();
let n = start_iter_cloned.next().unwrap();
start_nodes.remove(&n);
}
}
This, however, gives me the following compile error:
error[E0502]: cannot borrow `start_nodes` as mutable because it is also borrowed as immutable
--> prog.rs:60:6
|
56 | let mut start_iter = start_nodes.iter();
| ----------- immutable borrow occurs here
...
60 | start_nodes.remove(&n);
| ^^^^^^^^^^^ mutable borrow occurs here
...
77 | }
| - immutable borrow ends here
Why is start_nodes.iter() considered an immutable borrow? What approach should I take instead to get the first member?
I'm using version 1.14.0 (not by choice).

Why is start_nodes.iter() considered an immutable borrow?
Whenever you ask a question like this one, you need to look at the prototype of the function, in this case the prototype of BTreeSet::iter():
fn iter(&self) -> Iter<T>
If we look up the Iter type that is returned, we find that it's defined as
pub struct Iter<'a, T> where T: 'a { /* fields omitted */ }
The lifetime 'a is not explicitly mentioned in the definition of iter(); however, the lifetime elision rules make the function definition equivalent to
fn iter<'a>(&'a self) -> Iter<'a, T>
From this expanded version, you can see that the return value has a lifetime that is bound to the lifetime of the reference to self that you pass in, which is just another way of stating that the function call creates a shared borrow that lives as long as the return value. If you store the return value in a variable, the borrow lives at least as long as the variable.
What approach should I take instead to get the first member?
As noted in the comments, your code works on recent versions of Rust due to non-lexical lifetimes – the compiler figures out by itself that start_iter and start_iter_cloned don't need to live longer than the call to next(). In older versions of Rust, you can artificially limit the lifetime by introducing a new scope:
while !start_nodes.is_empty() {
let n = {
let mut start_iter = start_nodes.iter();
let mut start_iter_cloned = start_iter.cloned();
start_iter_cloned.next().unwrap()
};
start_nodes.remove(&n);
}
However, note that this code is needlessly long-winded. The new iterator you create and its cloning version only live inside the new scope, and they aren't really used for any other purpose, so you could just as well write
while !start_nodes.is_empty() {
let n = start_nodes.iter().next().unwrap().clone();
start_nodes.remove(&n);
}
which does exactly the same, and avoids the issues with long-living borrows by avoiding to store the intermediate values in variables, to ensure their lifetime ends immediately after the expression.
Finally, while you don't give full details of your use case, I strongly suspect that you would be better off with a BinaryHeap instead of a BTreeSet:
use std::collections::BinaryHeap;
fn main() {
let mut start_nodes = BinaryHeap::new();
start_nodes.push(42);
while let Some(n) = start_nodes.pop() {
// Do something with `n`
}
}
This code is shorter, simpler, completely sidesteps the issue with the borrow checker, and will also be more efficient.

Not sure this is the best approach, but I fixed it by introducing a new scope to ensure that the immutable borrow ends before the mutable borrow occurs:
use std::collections::BTreeSet;
fn main() {
let mut start_nodes = BTreeSet::new();
// add items to the set
while !start_nodes.is_empty() {
let mut n = 0;
{
let mut start_iter = start_nodes.iter();
let mut start_iter_cloned = start_iter.cloned();
let x = &mut n;
*x = start_iter_cloned.next().unwrap();
}
start_nodes.remove(&n);
}
}

Is there a difference between borrowing an argument as mutable and taking ownership of it and returning it?

Consider:
fn main() {
let mut words: Vec<String> = Vec::new();
words.push(String::from("Example1"));
do_something(&mut words);
for word in words.iter() {
println!("{}", word);
}
}
fn do_something(words: &mut Vec<String>) {
//modify vector, maybe push something:
words.push(String::from("Example2"));
}
vs.
fn main() {
let mut words: Vec<String> = Vec::new();
words.push(String::from("Example1"));
words = do_something(words);
for word in words.iter() {
println!("{}", word);
}
}
fn do_something(mut words: Vec<String>) -> Vec<String> {
//modify vector, maybe push something:
words.push(String::from("Example2"));
return words;
}
Both solutions will print:
Example1
Example2
Is there any difference? What should we use?

No, there's really not much difference in the capability of code using one or the other.
Most of the benefits of one vs the other lie outside of pure capability:
Taking a reference is often more ergonomic to the users of your code: they don't have to continue to remember to assign the return value of each function call.
Taking a value vs. a reference is also often a better signal to your user about the intended usage of the code.
There's a hierarchy of what types are interoperable. If you have ownership of a value, you can call a function that takes ownership, a mutable reference, or an immutable reference. If you have a mutable reference, you can call a function that takes a mutable reference or an immutable reference. If you have an immutable reference, you can only call a function that takes an immutable reference. Thus it's common to accept the most permissive type you can.

Why is the mutable reference not moved here?

I was under the impression that mutable references (i.e. &mut T) are always moved. That makes perfect sense, since they allow exclusive mutable access.
In the following piece of code I assign a mutable reference to another mutable reference and the original is moved. As a result I cannot use the original any more:
let mut value = 900;
let r_original = &mut value;
let r_new = r_original;
*r_original; // error: use of moved value *r_original
If I have a function like this:
fn make_move(_: &mut i32) {
}
and modify my original example to look like this:
let mut value = 900;
let r_original = &mut value;
make_move(r_original);
*r_original; // no complain
I would expect that the mutable reference r_original is moved when I call the function make_move with it. However that does not happen. I am still able to use the reference after the call.
If I use a generic function make_move_gen:
fn make_move_gen<T>(_: T) {
}
and call it like this:
let mut value = 900;
let r_original = &mut value;
make_move_gen(r_original);
*r_original; // error: use of moved value *r_original
The reference is moved again and therefore the program behaves as I would expect.
Why is the reference not moved when calling the function make_move?
Code example

There might actually be a good reason for this.
&mut T isn't actually a type: all borrows are parametrized by some (potentially inexpressible) lifetime.
When one writes
fn move_try(val: &mut ()) {
{ let new = val; }
*val
}
fn main() {
move_try(&mut ());
}
the type inference engine infers typeof new == typeof val, so they share the original lifetime. This means the borrow from new does not end until the borrow from val does.
This means it's equivalent to
fn move_try<'a>(val: &'a mut ()) {
{ let new: &'a mut _ = val; }
*val
}
fn main() {
move_try(&mut ());
}
However, when you write
fn move_try(val: &mut ()) {
{ let new: &mut _ = val; }
*val
}
fn main() {
move_try(&mut ());
}
a cast happens - the same kind of thing that lets you cast away pointer mutability. This means that the lifetime is some (seemingly unspecifiable) 'b < 'a. This involves a cast, and thus a reborrow, and so the reborrow is able to fall out of scope.
An always-reborrow rule would probably be nicer, but explicit declaration isn't too problematic.

I asked something along those lines here.
It seems that in some (many?) cases, instead of a move, a re-borrow takes place. Memory safety is not violated, only the "moved" value is still around. I could not find any docs on that behavior either.
#Levans opened a github issue here, although I'm not entirely convinced this is just a doc issue: dependably moving out of a &mut reference seems central to Rust's approach of ownership.

It's implicit reborrow. It's a topic not well documented.
This question has already been answered pretty well:
how implicit reborrow works
how reborrow works along with borrow split

If I tweak the generic one a bit, it would not complain either
fn make_move_gen<T>(_: &mut T) {
}
or
let _ = *r_original;

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string