My question seems to be closely related to Rust error "cannot infer an appropriate lifetime for borrow expression" when attempting to mutate state inside a closure returning an Iterator, but I think it's not the same. So, this
use std::iter;

fn example(text: String) -> impl Iterator<Item = Option<String>> {
    let mut i = 0;
    let mut chunk = None;
    iter::from_fn(move || {
        if i <= text.len() {
            let p_chunk = chunk;
            chunk = Some(&text[..i]);
            i += 1;
            Some(p_chunk.map(|s| String::from(s)))
        } else {
            None
        }
    })
}

fn main() {}
does not compile. The compiler says it cannot determine the appropriate lifetime for &text[..i]. This is the smallest example I could come up with. The idea is that there is internal state, a slice of text, and the iterator returns new Strings allocated from that internal state. I'm new to Rust, so maybe it's all obvious, but how would I annotate lifetimes so that this compiles?
Note that this example is different from the linked example, because there, point was passed as a reference, while here text is moved. Also, the answer there is one and a half years old by now, so maybe there is an easier way.
EDIT: Added p_chunk to emphasize that chunk needs to be persistent across calls to next and so cannot be local to the closure but should be captured by it.
Your code is an example of attempting to create a self-referential struct, where the struct is implicitly created by the closure. Since both text and chunk are moved into the closure, you can think of both as members of a struct. As chunk refers to the contents in text, the result is a self-referential struct, which is not supported by the current borrow checker.
While self-referential structs are unsafe in general due to moves, in this case it would be safe because text is heap-allocated and is not subsequently mutated, nor does it escape the closure. Therefore it is impossible for the contents of text to move, and a sufficiently smart borrow checker could prove that what you're trying to do is safe and allow the closure to compile.
The answer to the [linked question] says that referencing through an Option is possible but the structure cannot be moved afterwards. In my case, the self-reference is created after text and chunk were moved in place, and they are never moved again, so in principle it should work.
Agreed - it should work in principle, but it is well known that the current borrow checker doesn't support it. The support would require multiple new features: the borrow checker should special-case heap-allocated types like Box or String whose moves don't affect references into their content, and in this case also prove that you don't resize or mem::replace() the closed-over String.
In this case the best workaround is the "obvious" one: instead of persisting the chunk slice, persist a pair of usize indices (or a Range) and create the slice when you need it.
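For illustration, a minimal sketch of that index-based workaround, keeping the shape of the original example (using a Range<usize> for the persistent state is just one option):

    use std::iter;
    use std::ops::Range;

    fn example(text: String) -> impl Iterator<Item = Option<String>> {
        let mut i = 0;
        // Persist indices into `text` instead of a borrowed slice.
        let mut chunk: Option<Range<usize>> = None;
        iter::from_fn(move || {
            if i <= text.len() {
                let p_chunk = chunk.clone();
                chunk = Some(0..i);
                i += 1;
                // Re-create the slice (and allocate the String) only when yielding.
                Some(p_chunk.map(|r| text[r].to_string()))
            } else {
                None
            }
        })
    }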
If you declare the chunk Option inside the closure, your code compiles. I can't quite answer why declaring chunk outside the closure results in a lifetime error for the borrow of text inside the closure, but the chunk Option looks superfluous anyway, and the following code should be equivalent:
fn example(text: String) -> impl Iterator<Item = Option<String>> {
    let mut i = 0;
    iter::from_fn(move || {
        if i <= text.len() {
            let chunk = text[..i].to_string();
            i += 1;
            Some(Some(chunk))
        } else {
            None
        }
    })
}
Additionally, it seems unlikely that you really want an Iterator<Item = Option<String>> here instead of an Iterator<Item = String>, since the iterator never yields Some(None) anyway:
fn example(text: String) -> impl Iterator<Item = String> {
    let mut i = 0;
    iter::from_fn(move || {
        if i <= text.len() {
            let chunk = text[..i].to_string();
            i += 1;
            Some(chunk)
        } else {
            None
        }
    })
}
Note that you can also write this iterator without allocating a String for each chunk, if you take a &str as an argument and tie the lifetime of the output to the input argument:
fn example<'a>(text: &'a str) -> impl Iterator<Item = &'a str> + 'a {
    let mut i = 0;
    iter::from_fn(move || {
        if i <= text.len() {
            let chunk = &text[..i];
            i += 1;
            Some(chunk)
        } else {
            None
        }
    })
}
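For illustration, a small usage sketch of this borrowing version (the input "abc" is just an assumed example):

    fn main() {
        // Yields every prefix of the input, from the empty slice up to the whole string.
        let chunks: Vec<&str> = example("abc").collect();
        assert_eq!(chunks, ["", "a", "ab", "abc"]);
    }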
There have been a fair number of questions around this, and the solution is mostly "use Entry".
However, this is an issue because HashMap::entry() requires an owned key, which means possibly expensive copies / allocations even when the key is already present and we just want to update the value in place; hence the use of get_mut. But using get_mut with a reference to a local leads rustc to assume that said reference gets stored into the hashmap, and thus that returning the hashmap is an error:
use std::borrow::Cow;
use std::collections::HashMap;

fn get_string() -> String { String::from("xxxxxxx") }

fn foo() -> HashMap<Cow<'static, str>, usize> {
    let mut v = HashMap::new();
    // stand-in for "get a string slice as key",
    // real case is getting a String from an
    // mpsc and the key being a segment of that string
    let s = get_string();
    // stand-in for a structure which contains an `Option<Cow>`
    let k = Cow::from(&s[2..3]);
    // because of get_mut, `&s` is apparently considered to be stored in `v`?
    if let Some(e) = v.get_mut(&k) {
        *e += 1;
    } else {
        v.insert(Cow::from(k.into_owned()), 0);
    }
    v
}
Note that the manipulations of s and k are only there to clarify the point of the pattern; get_mut alone is sufficient to trigger the issue.
Is there a way around this without the efficiency hit, or is an eager allocation the only way? (Note: because this is a static issue, runtime guards like contains_key or get obviously don't help.)
According to the docs, HashMap::get_mut() requires a value of type &Q such that the key type of the map implements Borrow<Q>.
The key of your map is Cow<'static, str>, which implements Borrow<str>. This means that you can use either a &Cow<'static, str> or a &str. But you are passing a &Cow<'local, str> for some 'local lifetime. The compiler tries to match that 'local with 'static and issues a somewhat confusing error message about lifetimes.
The solution is actually easy, because you can get a &str from the Cow by either calling k.as_ref() or doing &*k, and the lifetime of that &str is unrestricted: (playground)
let k = Cow::from(&s[2..3]);
if let Some(e) = v.get_mut(k.as_ref()) { /* ...*/ }
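Put together, a minimal sketch of the full function with that one change (reusing the stand-in get_string from the question), which should compile:

    use std::borrow::Cow;
    use std::collections::HashMap;

    fn get_string() -> String { String::from("xxxxxxx") }

    fn foo() -> HashMap<Cow<'static, str>, usize> {
        let mut v = HashMap::new();
        let s = get_string();
        let k = Cow::from(&s[2..3]);
        // Pass a plain &str; Cow<'static, str> implements Borrow<str>,
        // so no 'static lifetime is forced onto the borrow of `s`.
        if let Some(e) = v.get_mut(k.as_ref()) {
            *e += 1;
        } else {
            v.insert(Cow::from(k.into_owned()), 0);
        }
        v
    }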
I am working on some software where I am managing a buffer of floats in a Vec<T> where T is either an f32 or f64. I sometimes need to interpret this buffer, or sections of it, as a mathematical vector. To this end, I am taking advantage of MatrixSlice and friends in nalgebra.
I can create a DVectorSliceMut, for example, in the following way:
fn as_vector<'a>(slice: &'a mut [f64]) -> DVectorSliceMut<'a, f64> {
    DVectorSliceMut::from(slice)
}
However, sometimes I need to later extract the original slice from the DVectorSliceMut with the original lifetime 'a. Is there a way to do this?
The StorageMut trait has an as_mut_slice member function, but the lifetime of the returned slice is the lifetime of the reference to the Storage implementor, not the original slice. I am okay with a solution which consumes the DVectorSliceMut if necessary.
Update: Methods into_slice and into_slice_mut have been respectively added to the SliceStorage and SliceStorageMut traits as of nalgebra v0.28.0.
Given the current API of nalgebra (v0.27.1) there isn't much that you can do, except:
live with the shorter lifetime of StorageMut::as_mut_slice
make a feature request for such a function at nalgebra (which it seems you already did)
employ your own unsafe code to turn StorageMut::ptr_mut into a &'a mut
You could go with the third option until nalgebra gets updated, and implement something like this in your own code:
use nalgebra::base::dimension::Dim;
use nalgebra::base::storage::Storage;
use nalgebra::base::storage::StorageMut;
use nalgebra::DVectorSliceMut;

fn into_slice<'a>(vec: DVectorSliceMut<'a, f64>) -> &'a mut [f64] {
    let mut inner = vec.data;
    // from nalgebra
    // https://docs.rs/nalgebra/0.27.1/src/nalgebra/base/matrix_slice.rs.html#190
    let (nrows, ncols) = inner.shape();
    if nrows.value() != 0 && ncols.value() != 0 {
        let sz = inner.linear_index(nrows.value() - 1, ncols.value() - 1);
        unsafe { core::slice::from_raw_parts_mut(inner.ptr_mut(), sz + 1) }
    } else {
        unsafe { core::slice::from_raw_parts_mut(inner.ptr_mut(), 0) }
    }
}
Methods into_slice and into_slice_mut which return the original slice have been respectively added to the SliceStorage and SliceStorageMut traits as of nalgebra v0.28.0.
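For illustration, a rough sketch of what that could look like, assuming the view's data field exposes into_slice_mut directly as in the v0.27 snippet above (the exact import paths and receiver may differ between versions):

    use nalgebra::DVectorSliceMut;

    fn into_slice<'a>(vec: DVectorSliceMut<'a, f64>) -> &'a mut [f64] {
        // `vec.data` is the slice storage backing the view; into_slice_mut
        // hands back the original slice with its original lifetime.
        vec.data.into_slice_mut()
    }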
This is an example taken from the Mutex documentation:
use std::sync::{Arc, Mutex};
use std::sync::mpsc::channel;
use std::thread;

const N: usize = 10;

fn main() {
    let data = Arc::new(Mutex::new(0));
    let (tx, rx) = channel();
    for _ in 0..N {
        let (data, tx) = (data.clone(), tx.clone());
        thread::spawn(move || {
            // snippet
        });
    }
    rx.recv().unwrap();
}
My question is about the code where the snippet comment is. It is given as
let mut data = data.lock().unwrap();
*data += 1;
if *data == N {
    tx.send(()).unwrap();
}
The type of data is Arc<Mutex<usize>>, so when calling data.lock(), I assumed that the Arc is being automatically dereferenced and a usize is assigned to data. Why do we need a * in front of data again to dereference it?
The following code, which first dereferences the Arc and then proceeds with just a usize, also works in place of the snippet.
let mut data = *data.lock().unwrap();
data += 1;
if data == N {
    tx.send(()).unwrap();
}
Follow the docs. Starting with Arc<T>:
Does Arc::lock exist? No. Check Deref.
Deref::Target is T. Check Mutex<T>.
Does Mutex::lock exist? Yes. It returns LockResult<MutexGuard<T>>.
Where does unwrap come from? LockResult<T> is a synonym for Result<T, PoisonError<T>>. So it's Result::unwrap, which results in a MutexGuard<T>.
Therefore, data is of type MutexGuard<usize>.
So this is wrong:
so when calling data.lock(), I assumed that the Arc is being automatically dereferenced and an usize is assigned to data.
Thus the question is not why you can't assign directly, but how you're able to assign a usize value at all. Again, follow the docs:
data is a MutexGuard<usize>, so check MutexGuard<T>.
*data is a pointer dereference in a context that requires mutation. Look for an implementation of DerefMut.
It says that for MutexGuard<T>, it implements DerefMut::deref_mut(&mut self) -> &mut T.
Thus, the result of *data is &mut usize.
Then we have your modified example. At this point, it should be clear that this is not at all doing the same thing: it's mutating a local variable that happens to contain the same value as the mutex. But because it's a local variable, changing it has absolutely no bearing on the contents of the mutex.
Thus, the short version is: the result of locking a mutex is a "smart pointer" wrapping the actual value, not the value itself. Thus you have to dereference it to access the value.
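To make the difference concrete, here is a minimal single-threaded sketch (not taken from the original example):

    use std::sync::Mutex;

    fn main() {
        let m = Mutex::new(0usize);

        {
            // `guard` is a MutexGuard<usize>: a smart pointer to the value in the mutex.
            let mut guard = m.lock().unwrap();
            *guard += 1; // mutates the value owned by the mutex
        } // guard is dropped here, unlocking the mutex

        {
            // Dereferencing right away copies the usize out; `local` is a plain usize.
            let mut local = *m.lock().unwrap();
            local += 1;
            assert_eq!(local, 2); // the local copy changed ...
        }

        assert_eq!(*m.lock().unwrap(), 1); // ... but the mutex contents did not
    }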
I'm writing a bunch of assertions that all involve popping a value from a list.
Coming from a Scala background, I naturally did this:
let mut list = List::new();
let assert_pop = |expected| assert_eq!(list.pop(), expected);
So that I could just write assert_pop(None) or assert_pop(Some(3)) instead of having to write assert_eq!(list.pop(), None) or assert_eq!(list.pop(), Some(3)) every time.
Of course the borrow checker doesn't like this one bit because the closure essentially needs to borrow the value for an undisclosed amount of time, while the rest of my code goes around mutating, thus violating the rule of "no aliasing if you're mutating".
The question is: is there a way to get around this? Do I have to write a macro, or is there a funky memory-safe way I can get around this?
Note that I know that I can just define the closure like this:
let assert_pop = |lst: &mut List, expected| assert_eq!(lst.pop(), expected);
But that would be less DRY, as I'd have to pass in a &mut list as the first argument at every call.
The key is to define the closure as mut, since it needs a mutable reference.
This works:
let mut v = vec![1, 2];
let mut assert_pop = |expected| assert_eq!(v.pop(), expected);
assert_pop(Some(2));
assert_pop(Some(1));
assert_pop(None);
Note that the pop closure borrows mutably, so if you want to use the list afterwards, you have to scope it:
let mut v = vec![1, 2];
{
    let mut assert_pop = |expected| assert_eq!(v.pop(), expected);
    assert_pop(Some(2));
    v.push(33); // ERROR: v is borrowed mutably...
}
v.push(33); // Works now, since assert_pop is out of scope.
Instead of directly answering your question (which is well-enough answered already), I'll instead address your other points:
just write assert_pop(None) or assert_pop(Some(3))
a memory-safe solution
don't pass in a &mut list
To solve all that, don't use a closure, just make a new type:
type List<T> = Vec<T>;

struct Thing<T>(List<T>);

impl<T> Thing<T> {
    fn assert_pop(&mut self, expected: Option<T>)
    where
        T: PartialEq + std::fmt::Debug,
    {
        assert_eq!(self.0.pop(), expected);
    }
}
fn main() {
    let list = List::new();
    let mut list = Thing(list);
    list.0.push(1);

    list.assert_pop(Some(1));
    list.assert_pop(None);

    // Take it back if we need to
    let _list = list.0;
}
I was under the impression that mutable references (i.e. &mut T) are always moved. That makes perfect sense, since they allow exclusive mutable access.
In the following piece of code I assign a mutable reference to another mutable reference and the original is moved. As a result I cannot use the original any more:
let mut value = 900;
let r_original = &mut value;
let r_new = r_original;
*r_original; // error: use of moved value *r_original
If I have a function like this:
fn make_move(_: &mut i32) {
}
and modify my original example to look like this:
let mut value = 900;
let r_original = &mut value;
make_move(r_original);
*r_original; // no complain
I would expect that the mutable reference r_original is moved when I call the function make_move with it. However that does not happen. I am still able to use the reference after the call.
If I use a generic function make_move_gen:
fn make_move_gen<T>(_: T) {
}
and call it like this:
let mut value = 900;
let r_original = &mut value;
make_move_gen(r_original);
*r_original; // error: use of moved value *r_original
The reference is moved again and therefore the program behaves as I would expect.
Why is the reference not moved when calling the function make_move?
Code example
There might actually be a good reason for this.
&mut T isn't actually a type: all borrows are parametrized by some (potentially inexpressible) lifetime.
When one writes
fn move_try(val: &mut ()) {
    { let new = val; }
    *val
}

fn main() {
    move_try(&mut ());
}
the type inference engine infers typeof new == typeof val, so they share the original lifetime. This means the borrow from new does not end until the borrow from val does.
This means it's equivalent to
fn move_try<'a>(val: &'a mut ()) {
    { let new: &'a mut _ = val; }
    *val
}

fn main() {
    move_try(&mut ());
}
However, when you write
fn move_try(val: &mut ()) {
    { let new: &mut _ = val; }
    *val
}

fn main() {
    move_try(&mut ());
}
a cast happens - the same kind of thing that lets you cast away pointer mutability. This means that the lifetime is some (seemingly unspecifiable) 'b < 'a. This involves a cast, and thus a reborrow, and so the reborrow is able to fall out of scope.
An always-reborrow rule would probably be nicer, but explicit declaration isn't too problematic.
I asked something along those lines here.
It seems that in some (many?) cases, instead of a move, a re-borrow takes place. Memory safety is not violated, only the "moved" value is still around. I could not find any docs on that behavior either.
@Levans opened a github issue here, although I'm not entirely convinced this is just a doc issue: dependably moving out of a &mut reference seems central to Rust's approach to ownership.
It's an implicit reborrow, a topic that is not well documented.
This question has already been answered pretty well:
how implicit reborrow works
how reborrow works along with borrow split
If I tweak the generic one a bit, it would not complain either
fn make_move_gen<T>(_: &mut T) {
}
or
let _ = *r_original;
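For completeness, a minimal sketch (not from the original question) showing how an explicit reborrow with &mut * keeps the original reference usable even with the generic function:

    fn make_move_gen<T>(_: T) {}

    fn main() {
        let mut value = 900;
        let r_original = &mut value;
        make_move_gen(&mut *r_original); // explicit reborrow instead of a move
        *r_original += 1;                // r_original is still usable here
        assert_eq!(value, 901);
    }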