Extend lifetime of variable - rust

I'm trying to return a slice from a vector which is built inside my function. Obviously this doesn't work because v's lifetime expires too soon. I'm wondering if there's a way to extend v's lifetime. I want to return a plain slice, not a vector.
pub fn find<'a>(&'a self, name: &str) -> &'a[&'a Element] {
let v: Vec<&'a Element> = self.iter_elements().filter(|&elem| elem.name.borrow().local_name == name).collect();
v.as_slice()
}

You can't forcibly extend a value's lifetime; you just have to return the full Vec. If I may ask, why do you want to return the slice itself? It is almost always unnecessary, since a Vec can be cheaply (both in the sense of easy syntax and low-overhead at runtime) coerced to a slice.
Alternatively, you could return the iterator:
use std::iter;
pub fn find<'a>(&'a self, name: &str) -> Box<Iterator<Item = &'a Element> + 'a> {
Box::new(self.iter_elements()
.filter(move |&elem| elem.name.borrow().local_name == name))
}
For now, you will have to use an iterator trait object, since closure have types that are unnameable.
NB. I had to change the filter closure to capture-by-move (the move keyword) to ensure that it can be returned, or else the name variable would just passed into the closure pointer into find's stack frame, and hence would be restricted from leaving find.

Related

Proper signature for a function accepting an iterator of strings

I'm confused about the proper type to use for an iterator yielding string slices.
fn print_strings<'a>(seq: impl IntoIterator<Item = &'a str>) {
for s in seq {
println!("- {}", s);
}
}
fn main() {
let arr: [&str; 3] = ["a", "b", "c"];
let vec: Vec<&str> = vec!["a", "b", "c"];
let it: std::str::Split<'_, char> = "a b c".split(' ');
print_strings(&arr);
print_strings(&vec);
print_strings(it);
}
Using <Item = &'a str>, the arr and vec calls don't compile. If, instead, I use <Item = &'a'a str>, they work, but the it call doesn't compile.
Of course, I can make the Item type generic too, and do
fn print_strings<'a, I: std::fmt::Display>(seq: impl IntoIterator<Item = I>)
but it's getting silly. Surely there must be a single canonical "iterator of string values" type?
The error you are seeing is expected because seq is &Vec<&str> and &Vec<T> implements IntoIterator with Item=&T, so with your code, you end up with Item=&&str where you are expecting it to be Item=&str in all cases.
The correct way to do this is to expand Item type so that is can handle both &str and &&str. You can do this by using more generics, e.g.
fn print_strings(seq: impl IntoIterator<Item = impl AsRef<str>>) {
for s in seq {
let s = s.as_ref();
println!("- {}", s);
}
}
This requires the Item to be something that you can retrieve a &str from, and then in your loop .as_ref() will return the &str you are looking for.
This also has the added bonus that your code will also work with Vec<String> and any other type that implements AsRef<str>.
TL;DR The signature you use is fine, it's the callers that are providing iterators with wrong Item - but can be easily fixed.
As explained in the other answer, print_string() doesn't accept &arr and &vec because IntoIterator for &[T; n] and &Vec<T> yield references to T. This is because &Vec, itself a reference, is not allowed to consume the Vec in order to move T values out of it. What it can do is hand out references to T items sitting inside the Vec, i.e. items of type &T. In the case of your callers that don't compile, the containers contain &str, so their iterators hand out &&str.
Other than making print_string() more generic, another way to fix the issue is to call it correctly to begin with. For example, these all compile:
print_strings(arr.iter().map(|sref| *sref));
print_strings(vec.iter().copied());
print_strings(it);
Playground
iter() is the method provided by slices (and therefore available on arrays and Vec) that iterates over references to elements, just like IntoIterator of &Vec. We call it explicitly to be able to call map() to convert &&str to &str the obvious way - by using the * operator to dereference the &&str. The copied() iterator adapter is another way of expressing the same, possibly a bit less cryptic than map(|x| *x). (There is also cloned(), equivalent to map(|x| x.clone()).)
It's also possible to call print_strings() if you have a container with String values:
let v = vec!["foo".to_owned(), "bar".to_owned()];
print_strings(v.iter().map(|s| s.as_str()));

How can I create an Iter over a Vec contained in a RefCell? [duplicate]

Given the following struct and impl:
use std::slice::Iter;
use std::cell::RefCell;
struct Foo {
bar: RefCell<Vec<u32>>,
}
impl Foo {
pub fn iter(&self) -> Iter<u32> {
self.bar.borrow().iter()
}
}
fn main() {}
I get an error message about a lifetime issue:
error: borrowed value does not live long enough
--> src/main.rs:9:9
|
9 | self.bar.borrow().iter()
| ^^^^^^^^^^^^^^^^^ does not live long enough
10 | }
| - temporary value only lives until here
|
note: borrowed value must be valid for the anonymous lifetime #1 defined on the body at 8:36...
--> src/main.rs:8:37
|
8 | pub fn iter(&self) -> Iter<u32> {
| _____________________________________^ starting here...
9 | | self.bar.borrow().iter()
10 | | }
| |_____^ ...ending here
How am I able to return and use bars iterator?
You cannot do this because it would allow you to circumvent runtime checks for uniqueness violations.
RefCell provides you a way to "defer" mutability exclusiveness checks to runtime, in exchange allowing mutation of the data it holds inside through shared references. This is done using RAII guards: you can obtain a guard object using a shared reference to RefCell, and then access the data inside RefCell using this guard object:
&'a RefCell<T> -> Ref<'a, T> (with borrow) or RefMut<'a, T> (with borrow_mut)
&'b Ref<'a, T> -> &'b T
&'b mut RefMut<'a, T> -> &'b mut T
The key point here is that 'b is different from 'a, which allows one to obtain &mut T references without having a &mut reference to the RefCell. However, these references will be linked to the guard instead and can't live longer than the guard. This is done intentionally: Ref and RefMut destructors toggle various flags inside their RefCell to force mutability checks and to force borrow() and borrow_mut() panic if these checks fail.
The simplest thing you can do is to return a wrapper around Ref, a reference to which would implement IntoIterator:
use std::cell::Ref;
struct VecRefWrapper<'a, T: 'a> {
r: Ref<'a, Vec<T>>
}
impl<'a, 'b: 'a, T: 'a> IntoIterator for &'b VecRefWrapper<'a, T> {
type IntoIter = Iter<'a, T>;
type Item = &'a T;
fn into_iter(self) -> Iter<'a, T> {
self.r.iter()
}
}
(try it on playground)
You can't implement IntoIterator for VecRefWrapper directly because then the internal Ref will be consumed by into_iter(), giving you essentially the same situation you're in now.
Alternate Solution
Here is an alternate solution that uses interior mutability as it was intended. Instead of creating an iterator for &T values, we should create an iterator for Ref<T> values, which deference automatically.
struct Iter<'a, T> {
inner: Option<Ref<'a, [T]>>,
}
impl<'a, T> Iterator for Iter<'a, T> {
type Item = Ref<'a, T>;
fn next(&mut self) -> Option<Self::Item> {
match self.inner.take() {
Some(borrow) => match *borrow {
[] => None,
[_, ..] => {
let (head, tail) = Ref::map_split(borrow, |slice| {
(&slice[0], &slice[1..])
});
self.inner.replace(tail);
Some(head)
}
},
None => None,
}
}
}
Playground
Explanation
The accepted answer has a few significant drawbacks that may confuse those new to Rust. I will explain how, in my personal experience, the accepted answer might actually be harmful to a beginner, and why I believe this alternative uses interior mutability and iterators as they were intended.
As the previous answer importantly highlights, using RefCell creates a divergent type hierarchy that isolates mutable and immutable access to a shared value, but you do not have to worry about lifetimes to solve the iteration problem:
RefCell<T> .borrow() -> Ref<T> .deref() -> &T
RefCell<T> .borrow_mut() -> RefMut<T> .deref_mut() -> &mut T
The key to solving this without lifetimes is the Ref::map method, which is critically missed in the book. Ref::map "makes a new reference to a component of the borrowed data", or in other words converts a Ref<T> of the outer type to a Ref<U> of some inner value:
Ref::map(Ref<T>, ...) -> Ref<U>
Ref::map and its counterpart RefMut::map are the real stars of the interior mutability pattern, not borrow() and borrow_mut().
Why? Because unlike borrow() and borrow_mut(), Ref::mut and RefMut::map, allow you to create references to interior values that can be "returned".
Consider adding a first() method to the Foo struct described in the question:
fn first(&self) -> &u32 {
&self.bar.borrow()[0]
}
Nope, .borrow() makes a temporary Ref that only lives until the method returns:
error[E0515]: cannot return value referencing temporary value
--> src/main.rs:9:11
|
9 | &self.bar.borrow()[0]
| ^-----------------^^^
| ||
| |temporary value created here
| returns a value referencing data owned by the current function
error: aborting due to previous error; 1 warning emitted
We can make it more obvious what is happening if we break it up and make the implicit deference explicit:
fn first(&self) -> &u32 {
let borrow: Ref<_> = self.bar.borrow();
let bar: &Vec<u32> = borrow.deref();
&bar[0]
}
Now we can see that .borrow() creates a Ref<T> that is owned by the method's scope, and isn't returned and therefore dropped even before the reference it provided can be used. So, what we really need is to return an owned type instead of a reference. We want to return a Ref<T>, as it implements Deref for us!
Ref::map will help us do just that for component (internal) values:
fn first(&self) -> Ref<u32> {
Ref::map(self.bar.borrow(), |bar| &bar[0])
}
Of course, the .deref() will still happen automatically, and Ref<u32> will be mostly be referentially transparent as &u32.
Gotcha. One easy mistake to make when using Ref::map is to try to create an owned value in the closure, which is not possible as when we tried to use borrow(). Consider the type signature of the second parameter, the function: FnOnce(&T) -> &U,. It returns a reference, not an owned type!
This is why we use a slice in the answer &v[..] instead of trying to use the vector's .iter() method, which returns an owned std::slice::Iter<'a, T>. Slices are a reference type.
Additional Thoughts
Alright, so now I will attempt to justify why this solution is better than the accepted answer.
First, the use of IntoIterator is inconsistent with the Rust standard library, and arguably the purpose and intent of the trait. The trait method consumes self: fn into_iter(self) -> ....
let v = vec![1,2,3,4];
let i = v.into_iter();
// v is no longer valid, it was moved into the iterator
Using IntoIterator indirectly for a wrapper is inconsistent as you consume the wrapper and not the collection. In my experience, beginners will benefit from sticking with the conventions. We should use a regular Iterator.
Next, the IntoIterator trait is implemented for the reference &VecRefWrapper and not the owned type VecRefWrapper.
Suppose you are implementing a library. The consumers of your API will have to seemingly arbitrarily decorate owned values with reference operators, as is demonstrated in the example on the playground:
for &i in &foo.iter() {
println!("{}", i);
}
This is a subtle and confusing distinction if you are new to Rust. Why do we have to take a reference to the value when it is anonymously owned by - and should only exist for - the scope of the loop?
Finally, the solution above shows how it is possible to drill all they way into your data with interior mutability, and makes the path forward for implementing a mutable iterator clear as well. Use RefMut.
From my research there is currently no solution to this problem. The biggest problem here is self-referentiality and the fact that rust cannot prove your code to be safe. Or at least not in the generic fashion.
I think it's safe to assume that crates like ouroboros, self-cell and owning_ref are solution if you know that your struct (T in Ref<T>) does not contain any smart pointers nor anything which could invalidate any pointers you might obtain in your "dependent" struct.
Note that self-cell does this safely with extra heap allocation which might be ok in some cases.
There was also RFC for adding map_value to Ref<T> but as you can see, there is always some way to invalidate pointers in general (which does not mean your specific case is wrong it's just that it probably will never be added to the core library/language because it cannot be guaranteed for any T)
Yeah, so no answer, sorry. impl IntoIterator for &T works but I think it's rather hack and it forces you to write for x in &iter instead of for x in iter

How do I create an iterator that invalidates the previous result each time `next` is called? [duplicate]

This would make it possible to safely iterate over the same element twice, or to hold some state for the global thing being iterated over in the item type.
Something like:
trait IterShort<Iter>
where Self: Borrow<Iter>,
{
type Item;
fn next(self) -> Option<Self::Item>;
}
then an implementation could look like:
impl<'a, MyIter> IterShort<MyIter> for &'a mut MyIter {
type Item = &'a mut MyItem;
fn next(self) -> Option<Self::Item> {
// ...
}
}
I realize I could write my own (I just did), but I'd like one that works with the for-loop notation. Is that possible?
The std::iter::Iterator trait can not do this, but you can write a different trait:
trait StreamingIterator {
type Item;
fn next<'a>(&'a mut self) -> Option<&'a mut Self::Item>;
}
Note that the return value of next borrows the iterator itself, whereas in Vec::iter for example it only borrows the vector.
The downside is that &mut is hard-coded. Making it generic would require higher-kinded types (so that StreamingIterator::Item could itself be generic over a lifetime parameter).
Alexis Beingessner gave a talk about this and more titled Who Owns This Stream of Data? at RustCamp.
As to for loops, they’re really tied to std::iter::IntoIterator which is tied to std::iter::Iterator. You’d just have to implement both.
The standard iterators can't do this as far as I can see. The very definition of an iterator is that the outside has control over the elements while the inside has control over what produces the elements.
From what I understand of what you are trying to do, I'd flip the concept around and instead of returning elements from an iterator to a surrounding environment, pass the environment to the iterator. That is, you create a struct with a constructor function that accepts a closure and implements the iterator trait. On each call to next, the passed-in closure is called with the next element and the return value of that closure or modifications thereof are returned as the current element. That way, next can handle the lifetime of whatever would otherwise be returned to the surrounding environment.

Returning a borrow from an owned resource

I am trying to write a function that maps an Arc<[T]> into an Iterable, for use with flat_map (that is, I want to call i.flat_map(my_iter) for some other i: Iterator<Item=Arc<[T]>>).
fn my_iter<'a, T>(n: Arc<[T]>) -> slice::Iter<'a, T> {
let t: &'a [T] = &*n.clone();
t.into_iter()
}
The function above does not work because n.clone() produces an owned value of type Arc<[T]>, which I can dereference to [T] and then borrow to get &[T], but the lifetime of the borrow only lasts until the end of the function, while the 'a lifetime lasts until the client drops the returned iterator.
How do I clone the Arc in such a way that the client takes ownership of the clone, so that the value is only dropped after the client is done with the iterator (assuming no one else is using the Arc)?
Here's some sample code for the source iterator:
struct BaseIter<T>(Arc<[T]>);
impl<T> Iterator for BaseIter<T> {
type Item = Arc<[T]>;
fn next(&mut self) -> Option<Self::Item> {
Some(self.0.clone())
}
}
How do I implement the result of BaseIter(data).flat_map(my_iter) (which is of type Iterator<&T>) given that BaseIter is producing data, not just borrowing it? (The real thing is more complicated than this, it's not always the same result, but the ownership semantics are the same.)
You cannot do this. Remember, lifetimes in Rust are purely compile-time entities and are only used to validate that your code doesn't accidentally access dropped data. For example:
fn my_iter<'a, T>(n: Arc<[T]>) -> slice::Iter<'a, T>
Here 'a does not "last until the client drops the returned iterator"; this reasoning is incorrect. From the point of view of slice::Iter its lifetime parameter means the lifetime of the slice it is pointing at; from the point of view of my_iter 'a is just a lifetime parameter which can be chosen arbitrarily by the caller. In other words, slice::Iter is always tied to some slice with some concrete lifetime, but the signature of my_iter states that it is able to return arbitrary lifetime. Do you see the contradiction?
As a side note, due to covariance of lifetimes you can return a slice of a static slice from such a function:
static DATA: &'static [u8] = &[1, 2, 3];
fn get_data<'a>() -> &'a [u8] {
DATA
}
The above definition compiles, but it only works because DATA is stored in static memory of your program and is always valid when your program is running; this is not so with Arc<[T]>.
Arc<[T]> implies shared ownership, that is, the data inside Arc<[T]> is jointly owned by all clones of the original Arc<[T]> value. Therefore, when the last clone of an Arc goes out of scope, the value it contains is dropped, and the respective memory is freed. Now, consider what would happen if my_iter() was allowed to compile:
let iter = {
let data: Arc<[i32]> = get_arc_slice();
my_iter(data.clone())
};
iter.map(|x| x+1).collect::<Vec<_>>();
Because in my_iter() 'a can be arbitrary and is not linked in any way to Arc<[T]> (and can not be, actually), nothing prevents this code from compilation - the user might as well choose 'static lifetime. However, here all clones of data will be dropped inside the block, and the array it contains inside will be freed. Using iter after the block is unsafe because it now provides access to the freed memory.
How do I clone the Arc in such a way that the client takes ownership of the clone, so that the value is only dropped after the client is done with the iterator (assuming no one else is using the Arc)?
So, as follows from the above, this is impossible. Only the owner of the data determines when this data should be destroyed, and borrowed references (whose existence is always implied by lifetime parameters) may only borrow the data for the time when it exists, but borrows cannot affect when and how the data is destroyed. In order for borrowed references to compile, they need to always borrow only the data which is valid through the whole time these references are active.
What you can do is to rethink your architecture. It is hard to say what exactly can be done without looking at the full code, but in the case of this particular example you can, for example, first collect the iterator into a vector and then iterate over the vector:
let items: Vec<_> = your_iter.collect();
items.iter().flat_map(my_iter)
Note that now my_iter() should indeed accept &Arc<[T]>, just as Francis Gagné has suggested; this way, the lifetimes of the output iterator will be tied to the lifetime of the input reference, and everything should work fine, because now it is guaranteed that Arcs are stored stably in the vector for their later perusal during the iteration.
There's no way to make this work by passing an Arc<[T]> by value. You need to start from a reference to an Arc<[T]> in order to construct a valid slice::Iter.
fn my_iter<'a, T>(n: &'a Arc<[T]>) -> slice::Iter<'a, T> {
n.into_iter()
}
Or, if we elide the lifetimes:
fn my_iter<T>(n: &Arc<[T]>) -> slice::Iter<T> {
n.into_iter()
}
You need to use another iterator as return type of the function my_iter. slice::Iter<'a, T> has an associated type Item = &'a T. You need an iterator with associated type Item = T. Something like vec::IntoIter<T>. You can implement such an iterator yourself:
use std::sync::Arc;
struct BaseIter<T>(Arc<[T]>);
impl<T> Iterator for BaseIter<T> {
type Item = Arc<[T]>;
fn next(&mut self) -> Option<Self::Item> {
Some(self.0.clone())
}
}
struct ArcIntoIter<T>(usize, Arc<[T]>);
impl<T:Clone> Iterator for ArcIntoIter<T> {
type Item = T;
fn next(&mut self) -> Option<Self::Item> {
if self.0 < self.1.len(){
let i = self.0;
self.0+=1;
Some(self.1[i].clone())
}else{
None
}
}
}
fn my_iter<T>(n: Arc<[T]>) -> ArcIntoIter<T> {
ArcIntoIter(0, n)
}
fn main() {
let data = Arc::new(["A","B","C"]);
println!("{:?}", BaseIter(data).take(3).flat_map(my_iter).collect::<String>());
//output:"ABCABCABC"
}

How does Vec<T> implement iter()?

I am looking at the code of Vec<T> to see how it implements iter() as I want to implement iterators for my struct:
pub struct Column<T> {
name: String,
vec: Vec<T>,
...
}
My goal is not to expose the fields and provide iterators to do looping, max, min, sum, avg, etc for a column.
fn test() {
let col: Column<f32> = ...;
let max = col.iter().max();
}
I thought I would see how Vec<T> does iteration. I can see iter() is defined in SliceExt but it's implemented for [T] and not Vec<T> so I am stumped how you can call iter() from Vec<T>?
Indeed, as fjh said, this happens due to how dereference operator functions in Rust and how methods are resolved.
Rust has special Deref trait which allows values of the types implementing it to be "dereferenced" to obtain another type, usually one which is naturally connected to the source type. For example, an implementation like this one:
impl<T> Deref for Vec<T> {
type Target = [T];
fn deref<'a>(&'a self) -> &'a [T] { self.as_slice() }
}
means that applying * unary operator to a Vec<T> would yield [T] which you would need to borrow again:
let v: Vec<u32> = vec![0; 10];
let s: &[u32] = &*v;
(note that even though deref() returns a reference, the dereference operator * returns Target, not &Target - the compiler inserts automatic dereference if you do not borrow the dereferenced value immediately).
This is the first piece of puzzle. The second one is how methods are resolved. Basically, when you write something like
v.iter()
the compiler first tries to find iter() defined on the type of v (in this case Vec<u32>). If no such method can be found, the compiler tries to insert an appropriate number of *s and &s so the method invocation becomes valid. In this case it find that the following is indeed a valid invocation:
(&*v).iter()
Remember, Deref on Vec<T> returns &[T], and slices do have iter() method defined on them. This is also how you can invoke e.g. a method taking &self on a regular value - the compiler automatically inserts a reference operation for you.

Resources