When is it safe to extend the lifetime of references into Arenas? - rust

I have a struct which uses Arena:
struct Foo<'f> {
a: Arena<u64>, // from the typed-arena crate
v: Vec<&'f u64>,
}
Is it safe to extend the lifetime of a reference into the arena so long as it is bound by the lifetime of the main struct?
impl<'f> Foo<'f> {
pub fn bar(&mut self, n: u64) -> Option<&'f u64> {
if n == 0 {
None
} else {
let n_ref = unsafe { std::mem::transmute(self.a.alloc(n)) };
Some(n_ref)
}
}
}
For more context, see this Reddit comment.

Is it safe to extend the lifetime of a reference into the arena so long as it is bound by the lifetime of the main struct?
The Arena will be dropped along with Foo so, in principle, this would be safe, but it would also be unnecessary because the Arena already lives long enough.
However, this is not what your code is actually doing! The lifetime 'f could be longer that the lifetime of the struct — it can be as long as the shortest-lived reference inside v. For example:
fn main() {
let n = 1u64;
let v = vec![&n];
let bar;
{
let mut foo = Foo { a: Arena::new(), v };
bar = foo.bar(2);
// foo is dropped here, along with the Arena
}
// bar is still useable here because 'f is the full scope of `n`
println!("bar = {:?}", bar); // Some(8021790808186446178) - oops!
}
Trying to pretend that the lifetime is longer than it really is has created an opportunity for Undefined Behaviour in safe code.
A possible fix is to own the Arena outside of the struct and rely on the borrow checker to make sure that it is not dropped while it is still in use:
struct Foo<'f> {
a: &'f Arena<u64>,
v: Vec<&'f u64>,
}
impl<'f> Foo<'f> {
pub bar(&mut self, n: u64) -> Option<&'f u64> {
if n == 0 {
None
} else {
Some(self.a.alloc(n))
}
}
}
fn main() {
let arena = Arena::new();
let n = 1u64;
let v = vec![&n];
let bar;
{
let mut foo = Foo { a: &arena, v };
bar = foo.bar(2);
}
println!("bar = {:?}", bar); // Some(2)
}
Just like your unsafe version, the lifetimes express that the reference to the Arena must be valid for at least as long as the items in the Vec. However, this also enforces that this fact is true! Since there is no unsafe code, you can trust the borrow-checker that this will not trigger UB.

Related

mutation reference for Box closure, but not live long enough

I'm rather new to rust, I have some problems,I don't know how to describe my problem. I think talk is cheap and I want to show my code, but I must add more detail for post.
// programA
struct ClosureTest {
x: u8,
}
type Executor = Box<dyn FnMut() -> ()>;
impl ClosureTest {
fn new(x: u8) -> Self {
ClosureTest { x }
}
}
struct Ftest {}
impl Ftest {
fn exec(&self, mut executor: Executor) {
executor();
}
}
fn receive_test(arg: &mut ClosureTest) {
arg.x = 6;
println!("{}", arg.x);
}
fn main() {
let mut closure_test = ClosureTest::new(5);
let f_test = Ftest {};
let executor = Box::new(|| {
receive_test(&mut closure_test);
});
f_test.exec(executor);
}
I got these errors:
error[E0597]: `closure_test` does not live long enough
--> src/main.rs:30:23
|
29 | let executor = Box::new(|| {
| -- value captured here
30 | receive_test(&mut closure_test);
| ^^^^^^^^^^^^ borrowed value does not live long enough
...
33 | f_test.exec(executor);
| -------- cast requires that `closure_test` is borrowed for `'static`
34 | }
| - `closure_test` dropped here while still borrowed
But if I use mut reference like follows, it works fine.
// programB
struct ClosureTest {
x: u8,
}
type Executor<'a> = &'a mut dyn FnMut() -> ();
impl ClosureTest {
fn new(x: u8) -> Self {
ClosureTest { x }
}
}
struct Ftest {}
impl Ftest {
fn exec(&self, executor: Executor) {
executor();
}
}
fn receive_test(arg: &mut ClosureTest) {
arg.x = 6;
println!("{}", arg.x);
}
fn main() {
let mut closure_test = ClosureTest::new(5);
let f_test = Ftest {};
let mut executor = || {
receive_test(&mut closure_test);
};
f_test.exec(&mut executor);
}
So
What's the reason cause this difference.
Why in sync program "programA" will cause lifetime error?
closure_test is drop before f_test.exec(executor)? or after?
To understand what is happening here you'll need to make a little dive into the Rust Book and learn about lifetimes.
I'll simplify for sake of explanation and comment this:
//type Executor = Box< dyn FnMut() -> ()>;
And change the signature of exec function to this:
fn exec(&self, mut executor: Box<dyn FnMut() -> () >) {
executor();
}
First of all let's notice that if we will comment very last line:
//f_test.exec(executor);
the error is gone. The reason for this is that reference to the closure you created - &Box<dyn FnMut() -> ()> has shorter lifetime than other part of main (it's scoped). It means that when you are creating this closure it exists, but just after the reference to this is dropped. That's why it doesn't complain if you not use it, because it's all okay.
Lifetimes are not variables, they are claims, kinda of similar to generics.
If you would change right now the signature of function exec function to this:
impl Ftest {
fn exec<'a>(&self, mut executor: Box<dyn FnMut() -> () + 'a>) {
executor();
}
}
I would say this:
let's say that lifetime of my function is a,
it would takes some parameters and return Box of function which lifetime would be also a, so it would live as long as I.
By doing it we hinted compiler that the lifetime of closure you'll create should live "in same lifetime" as exec() is executed. That's why even if we bring back the last line of main it would work :D.
There is also an option of adding move keyword:
fn main() {
let mut closure_test = ClosureTest::new(5);
let f_test = Ftest {};
let executor = Box::new(move || {
receive_test(&mut closure_test);
});
//println!("{}", closure_test.x);
f_test.exec(executor);
}
This will work, because by adding move keyword we actually tell that we would rather move closure_test object to our scope, not just a reference, therefore it's not dropped. However you wouldn't be able to use it later as shown if you'll uncomment the line with println!().
I'll be happy if anybody would provide some corrections and more details. This concept in Rust is unique and takes time to fully understand so it would be helpful.

Using * to dereference custom type implemented Deref trait raise error [duplicate]

Code:
use std::collections::HashSet;
use std::{mem, ptr, fmt};
use std::ops::Deref;
enum Unsafety {
Normal
}
enum ImplPolarity { Positive }
struct TraitRef;
struct Ty;
struct ImplItem;
enum ItemKind {
Impl(Unsafety,
ImplPolarity,
Option<TraitRef>, // (optional) trait this impl implements
Box<Ty>, // self
),
}
struct Item {
node: ItemKind,
}
pub struct P<T: ?Sized> {
ptr: Box<T>
}
impl<T: 'static> P<T> {
pub fn unwrap(self) -> T {
*self.ptr
}
}
impl<T: ?Sized> Deref for P<T> {
type Target = T;
fn deref(&self) -> &T {
&self.ptr
}
}
fn main() {
let mut items = Vec::<P<Item>>::new();
let mut item2: Item;
for item in items.drain(..) {
if let ItemKind::Impl(Unsafety::Normal,
ImplPolarity::Positive,
Some(ref trait_type),
ref for_type) = item.node {
} else {
// item2 = *item; // AAA
item2 = item.unwrap(); // BBB
}
}
}
Produce the compile-time error:
error[E0505]: cannot move out of `item` because it is borrowed
--> /home/xxx/.emacs.d/rust-playground/at-2017-07-29-204629/snippet.rs:64:21
|
61 | ref for_type) = item.node {
| ---- borrow of `item` occurs here
...
64 | item2 = item.unwrap();
I do not understand two things:
Why does it complain about the borrow in the if branch while we in else branch? They are supposed to be mutually exclusive, and a borrow in one should not influence another.
If I replace Vec in let mut items = Vec::<P<Item>>::new(); with Vec<Box<Item>> and uncomment line AAA and comment line BBB, then it compiles. Both Box and P implement Deref, so the item.node expression should be the same.
Here's a much clearer example:
struct Item;
struct P<T> {
ptr: Box<T>,
}
impl<T> P<T> {
fn new(v: T) -> Self {
P { ptr: Box::new(v) }
}
}
impl<T> std::ops::Deref for P<T> {
type Target = T;
fn deref(&self) -> &T {
&self.ptr
}
}
fn main() {
let mut item = P::new(Item);
// let mut item = Box::new(Item);
*item;
}
Both Box and P implement Deref, so the item.node expression should be the same.
Hard truth time: moving out of Box is special-cased in the compiler. It does not use Deref. Moving out of a Box deallocates the memory and gives you ownership. It is not possible to implement this special ability ourselves.
Maybe at some point in the future a hypothetical trait like DerefMove will be added. This trait is hard to get right. There have been a number of attempts for an RFC for it, but none are currently open.
See also:
Dereferencing Box<T> gives back value instead of reference
How can I reuse a box that I have moved the value out of?

Why Rust can't coerce mutable reference to immutable reference in a type constructor?

It is possible to coerce &mut T into &T but it doesn't work if the type mismatch happens within a type constructor.
playground
use ndarray::*; // 0.13.0
fn print(a: &ArrayView1<i32>) {
println!("{:?}", a);
}
pub fn test() {
let mut x = array![1i32, 2, 3];
print(&x.view_mut());
}
For the above code I get following error:
|
9 | print(&x.view_mut());
| ^^^^^^^^^^^^^ types differ in mutability
|
= note: expected reference `&ndarray::ArrayBase<ndarray::ViewRepr<&i32>, ndarray::dimension::dim::Dim<[usize; 1]>>`
found reference `&ndarray::ArrayBase<ndarray::ViewRepr<&mut i32>, ndarray::dimension::dim::Dim<[usize; 1]>>`
It is safe to coerce &mut i32 to &i32 so why it is not applied in this situation? Could you provide some examples on how could it possibly backfire?
In general, it's not safe to coerce Type<&mut T> into Type<&T>.
For example, consider this wrapper type, which is implemented without any unsafe code and is therefore sound:
#[derive(Copy, Clone)]
struct Wrapper<T>(T);
impl<T: Deref> Deref for Wrapper<T> {
type Target = T::Target;
fn deref(&self) -> &T::Target { &self.0 }
}
impl<T: DerefMut> DerefMut for Wrapper<T> {
fn deref_mut(&mut self) -> &mut T::Target { &mut self.0 }
}
This type has the property that &Wrapper<&T> automatically dereferences to &T, and &mut Wrapper<&mut T> automatically dereferences to &mut T. In addition, Wrapper<T> is copyable if T is.
Assume that there exists a function that can take a &Wrapper<&mut T> and coerce it into a &Wrapper<&T>:
fn downgrade_wrapper_ref<'a, 'b, T: ?Sized>(w: &'a Wrapper<&'b mut T>) -> &'a Wrapper<&'b T> {
unsafe {
// the internals of this function is not important
}
}
By using this function, it is possible to get a mutable and immutable reference to the same value at the same time:
fn main() {
let mut value: i32 = 0;
let mut x: Wrapper<&mut i32> = Wrapper(&mut value);
let x_ref: &Wrapper<&mut i32> = &x;
let y_ref: &Wrapper<&i32> = downgrade_wrapper_ref(x_ref);
let y: Wrapper<&i32> = *y_ref;
let a: &mut i32 = &mut *x;
let b: &i32 = &*y;
// these two lines will print the same addresses
// meaning the references point to the same value!
println!("a = {:p}", a as &mut i32); // "a = 0x7ffe56ca6ba4"
println!("b = {:p}", b as &i32); // "b = 0x7ffe56ca6ba4"
}
Full playground example
This is not allowed in Rust, leads to undefined behavior and means that the function downgrade_wrapper_ref is unsound in this case. There may be other specific cases where you, as the programmer, can guarantee that this won't happen, but it still requires you to implement it specifically for those case, using unsafe code, to ensure that you take the responsibility of making those guarantees.
Consider this check for an empty string that relies on content staying unchanged for the runtime of the is_empty function (for illustration purposes only, don't use this in production code):
struct Container<T> {
content: T
}
impl<T> Container<T> {
fn new(content: T) -> Self
{
Self { content }
}
}
impl<'a> Container<&'a String> {
fn is_empty(&self, s: &str) -> bool
{
let str = format!("{}{}", self.content, s);
&str == s
}
}
fn main() {
let mut foo : String = "foo".to_owned();
let container : Container<&mut String> = Container::new(&mut foo);
std::thread::spawn(|| {
container.content.replace_range(1..2, "");
});
println!("an empty str is actually empty: {}", container.is_empty(""))
}
(Playground)
This code does not compile since &mut String does not coerce into &String. If it did, however, it would be possible that the newly created thread changed the content after the format! call but before the equal comparison in the is_empty function, thereby invalidating the assumption that the container's content was immutable, which is required for the empty check.
It seems type coercions don't apply to array elements when array is the function parameter type.
playground

How do I efficiently build a vector and an index of that vector while processing a data stream?

I have a struct Foo:
struct Foo {
v: String,
// Other data not important for the question
}
I want to handle a data stream and save the result into Vec<Foo> and also create an index for this Vec<Foo> on the field Foo::v.
I want to use a HashMap<&str, usize> for the index, where the keys will be &Foo::v and the value is the position in the Vec<Foo>, but I'm open to other suggestions.
I want to do the data stream handling as fast as possible, which requires not doing obvious things twice.
For example, I want to:
allocate a String only once per one data stream reading
not search the index twice, once to check that the key does not exist, once for inserting new key.
not increase the run time by using Rc or RefCell.
The borrow checker does not allow this code:
let mut l = Vec::<Foo>::new();
{
let mut hash = HashMap::<&str, usize>::new();
//here is loop in real code, like:
//let mut s: String;
//while get_s(&mut s) {
let s = "aaa".to_string();
let idx: usize = match hash.entry(&s) { //a
Occupied(ent) => {
*ent.get()
}
Vacant(ent) => {
l.push(Foo { v: s }); //b
ent.insert(l.len() - 1);
l.len() - 1
}
};
// do something with idx
}
There are multiple problems:
hash.entry borrows the key so s must have a "bigger" lifetime than hash
I want to move s at line (b), while I have a read-only reference at line (a)
So how should I implement this simple algorithm without an extra call to String::clone or calling HashMap::get after calling HashMap::insert?
In general, what you are trying to accomplish is unsafe and Rust is correctly preventing you from doing something you shouldn't. For a simple example why, consider a Vec<u8>. If the vector has one item and a capacity of one, adding another value to the vector will cause a re-allocation and copying of all the values in the vector, invalidating any references into the vector. This would cause all of your keys in your index to point to arbitrary memory addresses, thus leading to unsafe behavior. The compiler prevents that.
In this case, there's two extra pieces of information that the compiler is unaware of but the programmer isn't:
There's an extra indirection — String is heap-allocated, so moving the pointer to that heap allocation isn't really a problem.
The String will never be changed. If it were, then it might reallocate, invalidating the referred-to address. Using a Box<[str]> instead of a String would be a way to enforce this via the type system.
In cases like this, it is OK to use unsafe code, so long as you properly document why it's not unsafe.
use std::collections::HashMap;
#[derive(Debug)]
struct Player {
name: String,
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let mut players = Vec::new();
let mut index = HashMap::new();
for &name in &names {
let player = Player { name: name.into() };
let idx = players.len();
// I copied this code from Stack Overflow without reading the prose
// that describes why this unsafe block is actually safe
let stable_name: &str = unsafe { &*(player.name.as_str() as *const str) };
players.push(player);
index.insert(idx, stable_name);
}
for (k, v) in &index {
println!("{:?} -> {:?}", k, v);
}
for v in &players {
println!("{:?}", v);
}
}
However, my guess is that you don't want this code in your main method but want to return it from some function. That will be a problem, as you will quickly run into Why can't I store a value and a reference to that value in the same struct?.
Honestly, there's styles of code that don't fit well within Rust's limitations. If you run into these, you could:
decide that Rust isn't a good fit for you or your problem.
use unsafe code, preferably thoroughly tested and only exposing a safe API.
investigate alternate representations.
For example, I'd probably rewrite the code to have the index be the primary owner of the key:
use std::collections::BTreeMap;
#[derive(Debug)]
struct Player<'a> {
name: &'a str,
data: &'a PlayerData,
}
#[derive(Debug)]
struct PlayerData {
hit_points: u8,
}
#[derive(Debug)]
struct Players(BTreeMap<String, PlayerData>);
impl Players {
fn new<I>(iter: I) -> Self
where
I: IntoIterator,
I::Item: Into<String>,
{
let players = iter
.into_iter()
.map(|name| (name.into(), PlayerData { hit_points: 100 }))
.collect();
Players(players)
}
fn get<'a>(&'a self, name: &'a str) -> Option<Player<'a>> {
self.0.get(name).map(|data| Player { name, data })
}
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let players = Players::new(names.iter().copied());
for (k, v) in &players.0 {
println!("{:?} -> {:?}", k, v);
}
println!("{:?}", players.get("eustice"));
}
Alternatively, as shown in What's the idiomatic way to make a lookup table which uses field of the item as the key?, you could wrap your type and store it in a set container instead:
use std::collections::BTreeSet;
#[derive(Debug, PartialEq, Eq)]
struct Player {
name: String,
hit_points: u8,
}
#[derive(Debug, Eq)]
struct PlayerByName(Player);
impl PlayerByName {
fn key(&self) -> &str {
&self.0.name
}
}
impl PartialOrd for PlayerByName {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
Some(self.cmp(other))
}
}
impl Ord for PlayerByName {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
self.key().cmp(&other.key())
}
}
impl PartialEq for PlayerByName {
fn eq(&self, other: &Self) -> bool {
self.key() == other.key()
}
}
impl std::borrow::Borrow<str> for PlayerByName {
fn borrow(&self) -> &str {
self.key()
}
}
#[derive(Debug)]
struct Players(BTreeSet<PlayerByName>);
impl Players {
fn new<I>(iter: I) -> Self
where
I: IntoIterator,
I::Item: Into<String>,
{
let players = iter
.into_iter()
.map(|name| {
PlayerByName(Player {
name: name.into(),
hit_points: 100,
})
})
.collect();
Players(players)
}
fn get(&self, name: &str) -> Option<&Player> {
self.0.get(name).map(|pbn| &pbn.0)
}
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let players = Players::new(names.iter().copied());
for player in &players.0 {
println!("{:?}", player.0);
}
println!("{:?}", players.get("eustice"));
}
not increase the run time by using Rc or RefCell
Guessing about performance characteristics without performing profiling is never a good idea. I honestly don't believe that there'd be a noticeable performance loss from incrementing an integer when a value is cloned or dropped. If the problem required both an index and a vector, then I would reach for some kind of shared ownership.
not increase the run time by using Rc or RefCell.
#Shepmaster already demonstrated accomplishing this using unsafe, once you have I would encourage you to check how much Rc actually would cost you. Here is a full version with Rc:
use std::{
collections::{hash_map::Entry, HashMap},
rc::Rc,
};
#[derive(Debug)]
struct Foo {
v: Rc<str>,
}
#[derive(Debug)]
struct Collection {
vec: Vec<Foo>,
index: HashMap<Rc<str>, usize>,
}
impl Foo {
fn new(s: &str) -> Foo {
Foo {
v: s.into(),
}
}
}
impl Collection {
fn new() -> Collection {
Collection {
vec: Vec::new(),
index: HashMap::new(),
}
}
fn insert(&mut self, foo: Foo) {
match self.index.entry(foo.v.clone()) {
Entry::Occupied(o) => panic!(
"Duplicate entry for: {}, {:?} inserted before {:?}",
foo.v,
o.get(),
foo
),
Entry::Vacant(v) => v.insert(self.vec.len()),
};
self.vec.push(foo)
}
}
fn main() {
let mut collection = Collection::new();
for foo in vec![Foo::new("Hello"), Foo::new("World"), Foo::new("Go!")] {
collection.insert(foo)
}
println!("{:?}", collection);
}
The error is:
error: `s` does not live long enough
--> <anon>:27:5
|
16 | let idx: usize = match hash.entry(&s) { //a
| - borrow occurs here
...
27 | }
| ^ `s` dropped here while still borrowed
|
= note: values in a scope are dropped in the opposite order they are created
The note: at the end is where the answer is.
s must outlive hash because you are using &s as a key in the HashMap. This reference will become invalid when s is dropped. But, as the note says, hash will be dropped after s. A quick fix is to swap the order of their declarations:
let s = "aaa".to_string();
let mut hash = HashMap::<&str, usize>::new();
But now you have another problem:
error[E0505]: cannot move out of `s` because it is borrowed
--> <anon>:22:33
|
17 | let idx: usize = match hash.entry(&s) { //a
| - borrow of `s` occurs here
...
22 | l.push(Foo { v: s }); //b
| ^ move out of `s` occurs here
This one is more obvious. s is borrowed by the Entry, which will live to the end of the block. Cloning s will fix that:
l.push(Foo { v: s.clone() }); //b
I only want to allocate s only once, not cloning it
But the type of Foo.v is String, so it will own its own copy of the str anyway. Just that type means you have to copy the s.
You can replace it with a &str instead which will allow it to stay as a reference into s:
struct Foo<'a> {
v: &'a str,
}
pub fn main() {
// s now lives longer than l
let s = "aaa".to_string();
let mut l = Vec::<Foo>::new();
{
let mut hash = HashMap::<&str, usize>::new();
let idx: usize = match hash.entry(&s) {
Occupied(ent) => {
*ent.get()
}
Vacant(ent) => {
l.push(Foo { v: &s });
ent.insert(l.len() - 1);
l.len() - 1
}
};
}
}
Note that, previously I had to move the declaration of s to before hash, so that it would outlive it. But now, l holds a reference to s, so it has to be declared even earlier, so that it outlives l.

Why does Rust borrow checker reject this code?

I'm getting a Rust compile error from the borrow checker, and I don't understand why. There's probably something about lifetimes I don't fully understand.
I've boiled it down to a short code sample. In main, I want to do this:
fn main() {
let codeToScan = "40 + 2";
let mut scanner = Scanner::new(codeToScan);
let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
println!("first token is: {}", first_token);
// scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
}
Trying to call scanner.consume_till a second time gives me this error:
example.rs:64:5: 64:12 error: cannot borrow `scanner` as mutable more than once at a time
example.rs:64 scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
^~~~~~~
example.rs:62:23: 62:30 note: previous borrow of `scanner` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `scanner` until the borrow ends
example.rs:62 let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
^~~~~~~
example.rs:65:2: 65:2 note: previous borrow ends here
example.rs:59 fn main() {
...
example.rs:65 }
Basically, I've made something like my own iterator, and the equivalent to the "next" method takes &mut self. Because of that, I can't use the method more than once in the same scope.
However, the Rust std library has an iterator which can be used more than once in the same scope, and it also takes a &mut self parameter.
let test = "this is a string";
let mut iterator = test.chars();
iterator.next();
iterator.next(); // This is PERFECTLY LEGAL
So why does the Rust std library code compile, but mine doesn't? (I'm sure the lifetime annotations are at the root of it, but my understanding of lifetimes doesn't lead to me expecting a problem).
Here's my full code (only 60 lines, shortened for this question):
use std::str::{Chars};
use std::iter::{Enumerate};
#[deriving(Show)]
struct ConsumeResult<'lt> {
value: &'lt str,
startIndex: uint,
endIndex: uint,
}
struct Scanner<'lt> {
code: &'lt str,
char_iterator: Enumerate<Chars<'lt>>,
isEof: bool,
}
impl<'lt> Scanner<'lt> {
fn new<'lt>(code: &'lt str) -> Scanner<'lt> {
Scanner{code: code, char_iterator: code.chars().enumerate(), isEof: false}
}
fn assert_not_eof<'lt>(&'lt self) {
if self.isEof {fail!("Scanner is at EOF."); }
}
fn next(&mut self) -> Option<(uint, char)> {
self.assert_not_eof();
let result = self.char_iterator.next();
if result == None { self.isEof = true; }
return result;
}
fn consume_till<'lt>(&'lt mut self, quit: |char| -> bool) -> ConsumeResult<'lt> {
self.assert_not_eof();
let mut startIndex: Option<uint> = None;
let mut endIndex: Option<uint> = None;
loop {
let should_quit = match self.next() {
None => {
endIndex = Some(endIndex.unwrap() + 1);
true
},
Some((i, ch)) => {
if startIndex == None { startIndex = Some(i);}
endIndex = Some(i);
quit (ch)
}
};
if should_quit {
return ConsumeResult{ value: self.code.slice(startIndex.unwrap(), endIndex.unwrap()),
startIndex:startIndex.unwrap(), endIndex: endIndex.unwrap() };
}
}
}
}
fn main() {
let codeToScan = "40 + 2";
let mut scanner = Scanner::new(codeToScan);
let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
println!("first token is: {}", first_token);
// scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
}
Here's a simpler example of the same thing:
struct Scanner<'a> {
s: &'a str
}
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'a>(&'a mut self) -> &'a str {
let return_value = self.s.slice_to(3);
self.s = self.s.slice_from(3);
return_value
}
}
fn main() {
let mut scan = Scanner { s: "123456" };
let a = scan.step_by_3_bytes();
println!("{}", a);
let b = scan.step_by_3_bytes();
println!("{}", b);
}
If you compile that, you get errors like the code in the question:
<anon>:19:13: 19:17 error: cannot borrow `scan` as mutable more than once at a time
<anon>:19 let b = scan.step_by_3_bytes();
^~~~
<anon>:16:13: 16:17 note: previous borrow of `scan` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `scan` until the borrow ends
<anon>:16 let a = scan.step_by_3_bytes();
^~~~
<anon>:21:2: 21:2 note: previous borrow ends here
<anon>:13 fn main() {
...
<anon>:21 }
^
Now, the first thing to do is to avoid shadowing lifetimes: that is, this code has two lifetimes called 'a and all the 'as in step_by_3_bytes refer to the 'a declare there, none of them actually refer to the 'a in Scanner<'a>. I'll rename the inner one to make it crystal clear what is going on
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'b mut self) -> &'b str {
The problem here is the 'b is connecting the self object with the str return value. The compiler has to assume that calling step_by_3_bytes can make arbitrary modifications, including invalidating previous return values, when looking at the definition of step_by_3_bytes from the outside (which is how the compiler works, type checking is purely based on type signatures of things that are called, no introspect). That is, it could be defined like
struct Scanner<'a> {
s: &'a str,
other: String,
count: uint
}
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'b mut self) -> &'b str {
self.other.push_str(self.s);
// return a reference into data we own
self.other.as_slice()
}
}
Now, each call to step_by_3_bytes starts modifying the object that previous return values came from. E.g. it could cause the String to reallocate and thus move in memory, leaving any other &str return values as dangling pointers. Rust protects against this by tracking these references and disallowing mutation if it could cause such catastrophic events. Going back to our actual code: the compiler is type checking main just by looking at the type signature of step_by_3_bytes/consume_till and so it can only assume the worst case scenario (i.e. the example I just gave).
How do we solve this?
Let's take a step back: as if we're just starting out and don't know which lifetimes we want for the return values, so we'll just leave them anonymous (not actually valid Rust):
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'_ mut self) -> &'_ str {
Now, we get to ask the fun question: which lifetimes do we want where?
It's almost always best to annotate the longest valid lifetimes, and we know our return value lives for 'a (since it comes straight of the s field, and that &str is valid for 'a). That is,
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'_ mut self) -> &'a str {
For the other '_, we don't actually care: as API designers, we don't have any particular desire or need to connect the self borrow with any other references (unlike the return value, where we wanted/needed to express which memory it came from). So, we might as well leave it off
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&mut self) -> &'a str {
The 'b is unused, so it can be killed, leaving us with
impl<'a> Scanner<'a> {
fn step_by_3_bytes(&mut self) -> &'a str {
This expresses that Scanner is referring to some memory that is valid for at least 'a, and then returning references into just that memory. The self object is essentially just a proxy for manipulating those views: once you have the reference it returns, you can discard the Scanner (or call more methods).
In summary, the full, working code is
struct Scanner<'a> {
s: &'a str
}
impl<'a> Scanner<'a> {
fn step_by_3_bytes(&mut self) -> &'a str {
let return_value = self.s.slice_to(3);
self.s = self.s.slice_from(3);
return_value
}
}
fn main() {
let mut scan = Scanner { s: "123456" };
let a = scan.step_by_3_bytes();
println!("{}", a);
let b = scan.step_by_3_bytes();
println!("{}", b);
}
Applying this change to your code is simply adjusting the definition of consume_till.
fn consume_till(&mut self, quit: |char| -> bool) -> ConsumeResult<'lt> {
So why does the Rust std library code compile, but mine doesn't? (I'm sure the lifetime annotations are at the root of it, but my understanding of lifetimes doesn't lead to me expecting a problem).
There's a slight (but not huge) difference here: Chars is just returning a char, i.e. no lifetimes in the return value. The next method (essentially) has signature:
impl<'a> Chars<'a> {
fn next(&mut self) -> Option<char> {
(It's actually in an Iterator trait impl, but that's not important.)
The situation you have here is similar to writing
impl<'a> Chars<'a> {
fn next(&'a mut self) -> Option<char> {
(Similar in terms of "incorrect linking of lifetimes", the details differ.)
Let’s look at consume_till.
It takes &'lt mut self and returns ConsumeResult<'lt>. This means that the lifetime 'lt, the duration of the borrow of the input parameter self, will be that of the output parameter, the return value.
Expressed another way, after calling consume_till, you cannot use self again until its result is out of scope.
That result is placed into first_token, and first_token is still in scope in your last line.
In order to get around this, you must cause first_token to pass out of scope; the insertion of a new block around it will do this:
fn main() {
let code_to_scan = "40 + 2";
let mut scanner = Scanner::new(code_to_scan);
{
let first_token = scanner.consume_till(|c| !c.is_digit());
println!("first token is: {}", first_token);
}
scanner.consume_till(|c| c.is_whitespace());
}
All this does stand to reason: while you have a reference to something inside the Scanner, it is not safe to let you modify it, lest that reference be invalidated. This is the memory safety that Rust provides.

Resources