How do I pass modified string parameters? - string

I'm on chapter 12 of The Rust Programming Language, where a case insensitive line search is implemented. It doesn't make sense to me to implement the same logic twice, so I figured if I just called the case sensitive search function with the parameters converted to lower case, that might work. It did not.
This is my non working code:
fn main() {
let a = search("Waldo", "where in\nthe world\nis Waldo?");
let b = search("waldo", "where in\nthe world\nis Waldo?");
let c = search_case_insensitive("waldo", "where in\nthe world\nis Waldo?");
println!("{:?}", a);
println!("{:?}", b);
println!("{:?}", c);
}
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
let mut results = Vec::new();
for line in contents.lines() {
if line.contains(query) {
results.push(line);
}
}
results
}
pub fn search_case_insensitive<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
let query = query.to_lowercase();
let contents2: &str = &contents.to_lowercase();
search(&query, contents2)
}
The error in most versions I've come up with is inevitably something very much like:
error[E0597]: borrowed value does not live long enough
--> src/main.rs:25:28
|
25 | let contents2: &str = &contents.to_lowercase();
| ^^^^^^^^^^^^^^^^^^^^^^^ temporary value does not live long enough
...
28 | }
| - temporary value only lives until here
|
note: borrowed value must be valid for the lifetime 'a as defined on the function body at 23:1...
--> src/main.rs:23:1
|
23 | pub fn search_case_insensitive<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

EDIT 2:
Since you've updated the question with an MCVE and you've stated you don't care about straying away from the book examples... here is another version relying on extra allocations via the use of String:
fn main() {
let a = search("Waldo", "where in\nthe world\nis Waldo?");
let b = search("waldo", "where in\nthe world\nis Waldo?");
let c = search_case_insensitive("waldo", "where in\nthe world\nis Waldo?");
println!("{:?}", a);
println!("{:?}", b);
println!("{:?}", c);
}
pub fn search<S>(query: S, contents: S) -> Vec<String> where S: Into<String> {
let query = query.into();
let mut results = Vec::new();
for line in contents.into().lines() {
if line.contains(&query) {
results.push(line.into());
}
}
results
}
pub fn search_case_insensitive<S>(query: S, contents: S) -> Vec<String> where S: Into<String> {
let query = query.into().to_lowercase();
let contents = contents.into().to_lowercase();
search(query, contents)
}
Here it is running in the Playground
EDIT:
I realised I never really gave you an alternative. Here's what I would probably do:
pub enum SearchOptions {
CaseSensitive,
CaseInsensitive
}
pub fn search<'a>(query: &str, contents: &'a str, options: SearchOptions) -> Vec<&'a str> {
let mut results = Vec::new();
for line in contents.lines() {
let check = match options {
SearchOptions::CaseSensitive => line.contains(query),
SearchOptions::CaseInsensitive => line.to_lowercase().contains(&query.to_lowercase()),
};
if check {
results.push(line);
}
}
results
}
This is about as far as you could get "de-dupe"'ing it.
Original answer:
The actual problem is that you're trying to pass the contents around when its bound to the lifetime 'a ... but what you really want to be "case insensitive" is the query.
This isn't bound to the lifetime 'a in quite the same way and as such ... works:
pub fn search_case_insensitive<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
let query = query.to_lowercase();
search(&query, contents)
}
Here it is on the playground
You'll need to still duplicate the logic though ... because you need to match the lowercase query with the lowercase line ... which is demonstrated in the examples in the book:
if line.to_lowercase().contains(&query) {
// ^^^^^^^^^^^^^^ each LINE is converted to lowercase here in the insensitive search
results.push(line);
}
"How do I stop duplicating the logic?" - well they're not quite the same in the first place. I think your attempt wasn't quite what you were after in the first place (happy to be corrected though).

You have introduced an impossible constraint on the lifetime of variable contents2; by writing &'a you are attempting to assign to it the same lifetime as to the contents argument, but it is created and destroyed within the scope of search_case_insensitive and thus is outlived by contents.
In order for contents2 to outlive the body of search_case_insensitive you would need to either return it as a String and assign to some variable outside of it or pass it to search_case_insensitive by reference as long as it already exists as a String elsewhere.
Citing The Book:
It's important to understand that lifetime annotations are
descriptive, not prescriptive. This means that how long a reference is
valid is determined by the code, not by the annotations.

Related

Assembling a string and returning it with lifetime parameters for a l-system

I'm trying to implement a L-System struct and am struggling with it. I already tried different approaches but my main struggle comes from lifetime of references. What I'm trying to achieve is passing the value of the applied axioms back to my system variable, which i passed with the necessary lifetime in apply_axioms_once.
use std::collections::HashMap;
struct LSytem<'a> {
axioms: HashMap<&'a char, &'a str>,
}
impl<'a> LSytem<'a> {
fn apply_axioms_once(&mut self, system: &'a mut str) -> &'a str {
let mut applied: String = String::new();
for c in system.chars() {
let axiom = self.axioms.get(&c).unwrap();
for s in axiom.chars() {
applied.push(s);
}
}
system = applied.as_str();
system
}
fn apply_axioms(&mut self, system: &'a str, iterations: u8) -> &'a str {
let mut applied: &str = system;
// check for 0?
for _ in 0..iterations {
applied = self.apply_axioms_once(applied);
}
&applied
}
}
I already read a couple of similar questions, but still can't quite wrap my head around it. What seems to be the most on point answer is https://stackoverflow.com/a/42506211/18422275, but I'm still puzzled about how to apply this to my issue.
I am still a beginner in rust, and way more bloody than i thought.
This can't work because you return a reference of a data created inside the function (so the given data has a lifetime until the end of the function scope, the returned reference would point to nothing).
You shoud try to return String from your functions instead, so the returned data can be owned.
I made this example to try out:
use std::collections::HashMap;
struct LSytem<'a> {
axioms: HashMap<&'a char, &'a str>,
}
impl<'a> LSytem<'a> {
fn apply_axioms_once(&mut self, system: &String) -> String {
let mut applied: String = String::new();
for c in system.chars() {
let axiom = self.axioms.get(&c).unwrap();
for s in axiom.chars() {
applied.push(s);
}
}
applied
}
fn apply_axioms(&mut self, system: &String, iterations: u8) ->String{
let mut applied = String::from(system);
// check for 0?
for _ in 0..iterations {
applied = self.apply_axioms_once(system);
}
applied
}
}
fn main() {
let mut ls = LSytem {axioms: HashMap::new()};
ls.axioms.insert(&'a', "abc");
let s = String::from("a");
ls.apply_axioms(&s,1);
}

What are the differences when getting an immutable reference from a mutable reference with self-linked lifetimes?

struct Foo01<'a> {
val: u32,
str: &'a String,
}
fn mutate_and_share_01<'a>(foo: &'a mut Foo01<'a>) -> &'a Foo01<'a> {
foo
}
fn mutate_and_share_02<'a>(foo: &'a mut Foo01<'a>) -> &'a Foo01 {
foo
}
fn mutate_and_share_03<'a>(foo: &'a mut Foo01) -> &'a Foo01<'a> {
foo
}
fn main() {
let mut foo = Foo01 { val: 16, str: &String::from("Hello ") };
let foo_mut = &mut foo;
//let loan = mutate_and_share_01(foo_mut);
//let loan2 = mutate_and_share_01(foo_mut); //error
//let loan = mutate_and_share_02(foo_mut);
//let loan2 = mutate_and_share_02(foo_mut); //error
let loan = mutate_and_share_03(foo_mut);
let loan2 = mutate_and_share_03(foo_mut); //good
}
What are differences between these mutate_and_share versions?
In cases 1 and 2, you're saying that the function borrows the structure for as long as the structure borrows its parameter:
foo: &'a mut Foo01<'a>
this says "foo is borrowed from 'a" (&'a mut) and "foo borrows its parameter for 'a" (Foo01<'a>).
Meaning as far as rustc is concerned a call to this function will necessarily borrow the input forever, as the structure necessarily borrows its input for the entirety of its own lifetime, and thus you get locked out: you can't "unborrow" the input by dropping it so the second call can't work ever.
In case 3 you're relating the parameter of the output to the internal borrow which isn't really true but works well enough at least for this case. The reality is that the two lifetimes are unrelated:
fn mutate_and_share<'a, 'b>(foo: &'a mut Foo01<'b>) -> &'a Foo01<'b> {
foo
}
Also do note that your third case only works because you're never using loan, so it's immediately dropped before the second line executes. If you do this:
let loan = mutate_and_share_03(foo_mut);
let loan2 = mutate_and_share_03(foo_mut); //good
print("{}", loan.val)
then it's not going to compile because the mutable borrows are overlapping.
Oh, and &String is generally useless. There are use cases for &mut String, but any time you see an immutable reference to a String you'd be better off with an &str. Same with &Vec<T>, not useful, should be &[T].

Why do I get a "Borrowed value does not live long enough" error for a value I don't need outside of a function?

Why does this happen in this certain scenario? I'm new to Rust, read the 2nd edition book, but.. well, yeah, here I am. :)
fn main() {
Xyz::new(&"whatever=123");
}
pub struct Xyz<'a> {
x: &'a str
}
impl<'a> Xyz<'a> {
pub fn new(query: &str) -> Result<Xyz<'a>, &'a str> {
let mut qkey: String = String::new();
let mut qval: String = String::new();
let mut is_key = true;
for (i, c) in query.chars().enumerate().skip(1) {
if c == '=' {
is_key = false;
} else if c == '&' {
is_key = true;
} else if is_key {
qkey.push(c);
} else {
qval.push(c);
}
if c == '&' || i == query.len() - 1 {
match qkey.as_ref() {
"whatever" => {
let _whatever = Xyz::some_func(&mut qval)?;
}
_ => (),
}
qkey.clear();
qval.clear();
}
}
Ok(Xyz {
x: &""
})
}
fn some_func(_val: &mut String) -> Result<bool, &str> {
Ok(true)
}
}
playground
Error:
error[E0597]: `qval` does not live long enough
--> src/main.rs:29:61
|
29 | let _whatever = Xyz::some_func(&mut qval)?;
| ^^^^ borrowed value does not live long enough
...
41 | }
| - borrowed value only lives until here
|
note: borrowed value must be valid for the lifetime 'a as defined on the impl at 9:1...
--> src/main.rs:9:1
|
9 | impl<'a> Xyz<'a> {
| ^^^^^^^^^^^^^^^^
I don't understand why the actual error happens. I know the note should help me understand the problem, but it doesn't.
I don't need qval outside of this function, so why do I have to still keep it? Is there a conceptual error I made?
A trimmed down reproduction of your issue is
pub struct Xyz<'a> {
x: &'a str
}
impl<'a> Xyz<'a> {
pub fn new(_query: &str) -> Result<Xyz<'a>, &'a str> {
let qval: String = String::new();
Err(&qval)
}
}
with the error
error[E0597]: `qval` does not live long enough
--> src/main.rs:13:18
|
13 | Err(&qval)
| ^^^^ borrowed value does not live long enough
14 | }
| - borrowed value only lives until here
To understand this, I'd recommend taking a step back and thinking about what exactly your function is doing. You create a String, and then take a reference to it, and then return that reference.
This is would be a classic case of use-after-free. From the standpoint of the language, the string no longer exists, so returning a reference to it does not make sense.
Returning &str could make sense, but only if you can guarantee that the &str will only reference data that is still in scope, even after the function has returned. For instance, it would be valid to do Err(&query) (or any subsection of query) if you adjust the function to be fn new(query: &'a str) ->, since then it is it is known that the return value lives as long as the input string.
It is hard to tell from your example if that would be acceptable for usecase.
The simplest would indeed be to return a String (or Result<Xyz<'a>, String>> in your case, possibly with a String in the field for Xyz too). You could also consider returning something like a Cow which is sometimes a subsection of query or a static string, and sometimes a String.
As a side-note, I'll also add that Xyz::new(&"whatever=123"); does not need the & since it is already a reference. You can do
Xyz::new("whatever=123");
just fine. Same for &"".
My confusion was that - in the not stripped down version - both function (new and some_func returned a global &'static str as Err, so I thought the compiler would know that the string would always exist.
This may be the core of the misunderstanding. When you declare
fn some_func(_val: &mut String) -> Result<bool, &str> {
the &str is not 'static. Rust's lifetime elision logic means that this declaration will assume that the reference in the function argument and the reference in the return value have the same lifetime, meaning that what you've declared is
fn some_func<'b>(_val: &'b mut String) -> Result<bool, &'b str> {
which triggers the error you are seeing. If you want it to be static, you'll need to say so, as
fn some_func(_val: &mut String) -> Result<bool, &'static str> {
which would avoid the error, and require that some_func always returns a 'static string.

How do I efficiently build a vector and an index of that vector while processing a data stream?

I have a struct Foo:
struct Foo {
v: String,
// Other data not important for the question
}
I want to handle a data stream and save the result into Vec<Foo> and also create an index for this Vec<Foo> on the field Foo::v.
I want to use a HashMap<&str, usize> for the index, where the keys will be &Foo::v and the value is the position in the Vec<Foo>, but I'm open to other suggestions.
I want to do the data stream handling as fast as possible, which requires not doing obvious things twice.
For example, I want to:
allocate a String only once per one data stream reading
not search the index twice, once to check that the key does not exist, once for inserting new key.
not increase the run time by using Rc or RefCell.
The borrow checker does not allow this code:
let mut l = Vec::<Foo>::new();
{
let mut hash = HashMap::<&str, usize>::new();
//here is loop in real code, like:
//let mut s: String;
//while get_s(&mut s) {
let s = "aaa".to_string();
let idx: usize = match hash.entry(&s) { //a
Occupied(ent) => {
*ent.get()
}
Vacant(ent) => {
l.push(Foo { v: s }); //b
ent.insert(l.len() - 1);
l.len() - 1
}
};
// do something with idx
}
There are multiple problems:
hash.entry borrows the key so s must have a "bigger" lifetime than hash
I want to move s at line (b), while I have a read-only reference at line (a)
So how should I implement this simple algorithm without an extra call to String::clone or calling HashMap::get after calling HashMap::insert?
In general, what you are trying to accomplish is unsafe and Rust is correctly preventing you from doing something you shouldn't. For a simple example why, consider a Vec<u8>. If the vector has one item and a capacity of one, adding another value to the vector will cause a re-allocation and copying of all the values in the vector, invalidating any references into the vector. This would cause all of your keys in your index to point to arbitrary memory addresses, thus leading to unsafe behavior. The compiler prevents that.
In this case, there's two extra pieces of information that the compiler is unaware of but the programmer isn't:
There's an extra indirection — String is heap-allocated, so moving the pointer to that heap allocation isn't really a problem.
The String will never be changed. If it were, then it might reallocate, invalidating the referred-to address. Using a Box<[str]> instead of a String would be a way to enforce this via the type system.
In cases like this, it is OK to use unsafe code, so long as you properly document why it's not unsafe.
use std::collections::HashMap;
#[derive(Debug)]
struct Player {
name: String,
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let mut players = Vec::new();
let mut index = HashMap::new();
for &name in &names {
let player = Player { name: name.into() };
let idx = players.len();
// I copied this code from Stack Overflow without reading the prose
// that describes why this unsafe block is actually safe
let stable_name: &str = unsafe { &*(player.name.as_str() as *const str) };
players.push(player);
index.insert(idx, stable_name);
}
for (k, v) in &index {
println!("{:?} -> {:?}", k, v);
}
for v in &players {
println!("{:?}", v);
}
}
However, my guess is that you don't want this code in your main method but want to return it from some function. That will be a problem, as you will quickly run into Why can't I store a value and a reference to that value in the same struct?.
Honestly, there's styles of code that don't fit well within Rust's limitations. If you run into these, you could:
decide that Rust isn't a good fit for you or your problem.
use unsafe code, preferably thoroughly tested and only exposing a safe API.
investigate alternate representations.
For example, I'd probably rewrite the code to have the index be the primary owner of the key:
use std::collections::BTreeMap;
#[derive(Debug)]
struct Player<'a> {
name: &'a str,
data: &'a PlayerData,
}
#[derive(Debug)]
struct PlayerData {
hit_points: u8,
}
#[derive(Debug)]
struct Players(BTreeMap<String, PlayerData>);
impl Players {
fn new<I>(iter: I) -> Self
where
I: IntoIterator,
I::Item: Into<String>,
{
let players = iter
.into_iter()
.map(|name| (name.into(), PlayerData { hit_points: 100 }))
.collect();
Players(players)
}
fn get<'a>(&'a self, name: &'a str) -> Option<Player<'a>> {
self.0.get(name).map(|data| Player { name, data })
}
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let players = Players::new(names.iter().copied());
for (k, v) in &players.0 {
println!("{:?} -> {:?}", k, v);
}
println!("{:?}", players.get("eustice"));
}
Alternatively, as shown in What's the idiomatic way to make a lookup table which uses field of the item as the key?, you could wrap your type and store it in a set container instead:
use std::collections::BTreeSet;
#[derive(Debug, PartialEq, Eq)]
struct Player {
name: String,
hit_points: u8,
}
#[derive(Debug, Eq)]
struct PlayerByName(Player);
impl PlayerByName {
fn key(&self) -> &str {
&self.0.name
}
}
impl PartialOrd for PlayerByName {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
Some(self.cmp(other))
}
}
impl Ord for PlayerByName {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
self.key().cmp(&other.key())
}
}
impl PartialEq for PlayerByName {
fn eq(&self, other: &Self) -> bool {
self.key() == other.key()
}
}
impl std::borrow::Borrow<str> for PlayerByName {
fn borrow(&self) -> &str {
self.key()
}
}
#[derive(Debug)]
struct Players(BTreeSet<PlayerByName>);
impl Players {
fn new<I>(iter: I) -> Self
where
I: IntoIterator,
I::Item: Into<String>,
{
let players = iter
.into_iter()
.map(|name| {
PlayerByName(Player {
name: name.into(),
hit_points: 100,
})
})
.collect();
Players(players)
}
fn get(&self, name: &str) -> Option<&Player> {
self.0.get(name).map(|pbn| &pbn.0)
}
}
fn main() {
let names = ["alice", "bob", "clarice", "danny", "eustice", "frank"];
let players = Players::new(names.iter().copied());
for player in &players.0 {
println!("{:?}", player.0);
}
println!("{:?}", players.get("eustice"));
}
not increase the run time by using Rc or RefCell
Guessing about performance characteristics without performing profiling is never a good idea. I honestly don't believe that there'd be a noticeable performance loss from incrementing an integer when a value is cloned or dropped. If the problem required both an index and a vector, then I would reach for some kind of shared ownership.
not increase the run time by using Rc or RefCell.
#Shepmaster already demonstrated accomplishing this using unsafe, once you have I would encourage you to check how much Rc actually would cost you. Here is a full version with Rc:
use std::{
collections::{hash_map::Entry, HashMap},
rc::Rc,
};
#[derive(Debug)]
struct Foo {
v: Rc<str>,
}
#[derive(Debug)]
struct Collection {
vec: Vec<Foo>,
index: HashMap<Rc<str>, usize>,
}
impl Foo {
fn new(s: &str) -> Foo {
Foo {
v: s.into(),
}
}
}
impl Collection {
fn new() -> Collection {
Collection {
vec: Vec::new(),
index: HashMap::new(),
}
}
fn insert(&mut self, foo: Foo) {
match self.index.entry(foo.v.clone()) {
Entry::Occupied(o) => panic!(
"Duplicate entry for: {}, {:?} inserted before {:?}",
foo.v,
o.get(),
foo
),
Entry::Vacant(v) => v.insert(self.vec.len()),
};
self.vec.push(foo)
}
}
fn main() {
let mut collection = Collection::new();
for foo in vec![Foo::new("Hello"), Foo::new("World"), Foo::new("Go!")] {
collection.insert(foo)
}
println!("{:?}", collection);
}
The error is:
error: `s` does not live long enough
--> <anon>:27:5
|
16 | let idx: usize = match hash.entry(&s) { //a
| - borrow occurs here
...
27 | }
| ^ `s` dropped here while still borrowed
|
= note: values in a scope are dropped in the opposite order they are created
The note: at the end is where the answer is.
s must outlive hash because you are using &s as a key in the HashMap. This reference will become invalid when s is dropped. But, as the note says, hash will be dropped after s. A quick fix is to swap the order of their declarations:
let s = "aaa".to_string();
let mut hash = HashMap::<&str, usize>::new();
But now you have another problem:
error[E0505]: cannot move out of `s` because it is borrowed
--> <anon>:22:33
|
17 | let idx: usize = match hash.entry(&s) { //a
| - borrow of `s` occurs here
...
22 | l.push(Foo { v: s }); //b
| ^ move out of `s` occurs here
This one is more obvious. s is borrowed by the Entry, which will live to the end of the block. Cloning s will fix that:
l.push(Foo { v: s.clone() }); //b
I only want to allocate s only once, not cloning it
But the type of Foo.v is String, so it will own its own copy of the str anyway. Just that type means you have to copy the s.
You can replace it with a &str instead which will allow it to stay as a reference into s:
struct Foo<'a> {
v: &'a str,
}
pub fn main() {
// s now lives longer than l
let s = "aaa".to_string();
let mut l = Vec::<Foo>::new();
{
let mut hash = HashMap::<&str, usize>::new();
let idx: usize = match hash.entry(&s) {
Occupied(ent) => {
*ent.get()
}
Vacant(ent) => {
l.push(Foo { v: &s });
ent.insert(l.len() - 1);
l.len() - 1
}
};
}
}
Note that, previously I had to move the declaration of s to before hash, so that it would outlive it. But now, l holds a reference to s, so it has to be declared even earlier, so that it outlives l.

Why does Rust borrow checker reject this code?

I'm getting a Rust compile error from the borrow checker, and I don't understand why. There's probably something about lifetimes I don't fully understand.
I've boiled it down to a short code sample. In main, I want to do this:
fn main() {
let codeToScan = "40 + 2";
let mut scanner = Scanner::new(codeToScan);
let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
println!("first token is: {}", first_token);
// scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
}
Trying to call scanner.consume_till a second time gives me this error:
example.rs:64:5: 64:12 error: cannot borrow `scanner` as mutable more than once at a time
example.rs:64 scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
^~~~~~~
example.rs:62:23: 62:30 note: previous borrow of `scanner` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `scanner` until the borrow ends
example.rs:62 let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
^~~~~~~
example.rs:65:2: 65:2 note: previous borrow ends here
example.rs:59 fn main() {
...
example.rs:65 }
Basically, I've made something like my own iterator, and the equivalent to the "next" method takes &mut self. Because of that, I can't use the method more than once in the same scope.
However, the Rust std library has an iterator which can be used more than once in the same scope, and it also takes a &mut self parameter.
let test = "this is a string";
let mut iterator = test.chars();
iterator.next();
iterator.next(); // This is PERFECTLY LEGAL
So why does the Rust std library code compile, but mine doesn't? (I'm sure the lifetime annotations are at the root of it, but my understanding of lifetimes doesn't lead to me expecting a problem).
Here's my full code (only 60 lines, shortened for this question):
use std::str::{Chars};
use std::iter::{Enumerate};
#[deriving(Show)]
struct ConsumeResult<'lt> {
value: &'lt str,
startIndex: uint,
endIndex: uint,
}
struct Scanner<'lt> {
code: &'lt str,
char_iterator: Enumerate<Chars<'lt>>,
isEof: bool,
}
impl<'lt> Scanner<'lt> {
fn new<'lt>(code: &'lt str) -> Scanner<'lt> {
Scanner{code: code, char_iterator: code.chars().enumerate(), isEof: false}
}
fn assert_not_eof<'lt>(&'lt self) {
if self.isEof {fail!("Scanner is at EOF."); }
}
fn next(&mut self) -> Option<(uint, char)> {
self.assert_not_eof();
let result = self.char_iterator.next();
if result == None { self.isEof = true; }
return result;
}
fn consume_till<'lt>(&'lt mut self, quit: |char| -> bool) -> ConsumeResult<'lt> {
self.assert_not_eof();
let mut startIndex: Option<uint> = None;
let mut endIndex: Option<uint> = None;
loop {
let should_quit = match self.next() {
None => {
endIndex = Some(endIndex.unwrap() + 1);
true
},
Some((i, ch)) => {
if startIndex == None { startIndex = Some(i);}
endIndex = Some(i);
quit (ch)
}
};
if should_quit {
return ConsumeResult{ value: self.code.slice(startIndex.unwrap(), endIndex.unwrap()),
startIndex:startIndex.unwrap(), endIndex: endIndex.unwrap() };
}
}
}
}
fn main() {
let codeToScan = "40 + 2";
let mut scanner = Scanner::new(codeToScan);
let first_token = scanner.consume_till(|c| { ! c.is_digit ()});
println!("first token is: {}", first_token);
// scanner.consume_till(|c| { c.is_whitespace ()}); // WHY DOES THIS LINE FAIL?
}
Here's a simpler example of the same thing:
struct Scanner<'a> {
s: &'a str
}
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'a>(&'a mut self) -> &'a str {
let return_value = self.s.slice_to(3);
self.s = self.s.slice_from(3);
return_value
}
}
fn main() {
let mut scan = Scanner { s: "123456" };
let a = scan.step_by_3_bytes();
println!("{}", a);
let b = scan.step_by_3_bytes();
println!("{}", b);
}
If you compile that, you get errors like the code in the question:
<anon>:19:13: 19:17 error: cannot borrow `scan` as mutable more than once at a time
<anon>:19 let b = scan.step_by_3_bytes();
^~~~
<anon>:16:13: 16:17 note: previous borrow of `scan` occurs here; the mutable borrow prevents subsequent moves, borrows, or modification of `scan` until the borrow ends
<anon>:16 let a = scan.step_by_3_bytes();
^~~~
<anon>:21:2: 21:2 note: previous borrow ends here
<anon>:13 fn main() {
...
<anon>:21 }
^
Now, the first thing to do is to avoid shadowing lifetimes: that is, this code has two lifetimes called 'a and all the 'as in step_by_3_bytes refer to the 'a declare there, none of them actually refer to the 'a in Scanner<'a>. I'll rename the inner one to make it crystal clear what is going on
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'b mut self) -> &'b str {
The problem here is the 'b is connecting the self object with the str return value. The compiler has to assume that calling step_by_3_bytes can make arbitrary modifications, including invalidating previous return values, when looking at the definition of step_by_3_bytes from the outside (which is how the compiler works, type checking is purely based on type signatures of things that are called, no introspect). That is, it could be defined like
struct Scanner<'a> {
s: &'a str,
other: String,
count: uint
}
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'b mut self) -> &'b str {
self.other.push_str(self.s);
// return a reference into data we own
self.other.as_slice()
}
}
Now, each call to step_by_3_bytes starts modifying the object that previous return values came from. E.g. it could cause the String to reallocate and thus move in memory, leaving any other &str return values as dangling pointers. Rust protects against this by tracking these references and disallowing mutation if it could cause such catastrophic events. Going back to our actual code: the compiler is type checking main just by looking at the type signature of step_by_3_bytes/consume_till and so it can only assume the worst case scenario (i.e. the example I just gave).
How do we solve this?
Let's take a step back: as if we're just starting out and don't know which lifetimes we want for the return values, so we'll just leave them anonymous (not actually valid Rust):
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'_ mut self) -> &'_ str {
Now, we get to ask the fun question: which lifetimes do we want where?
It's almost always best to annotate the longest valid lifetimes, and we know our return value lives for 'a (since it comes straight of the s field, and that &str is valid for 'a). That is,
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&'_ mut self) -> &'a str {
For the other '_, we don't actually care: as API designers, we don't have any particular desire or need to connect the self borrow with any other references (unlike the return value, where we wanted/needed to express which memory it came from). So, we might as well leave it off
impl<'a> Scanner<'a> {
fn step_by_3_bytes<'b>(&mut self) -> &'a str {
The 'b is unused, so it can be killed, leaving us with
impl<'a> Scanner<'a> {
fn step_by_3_bytes(&mut self) -> &'a str {
This expresses that Scanner is referring to some memory that is valid for at least 'a, and then returning references into just that memory. The self object is essentially just a proxy for manipulating those views: once you have the reference it returns, you can discard the Scanner (or call more methods).
In summary, the full, working code is
struct Scanner<'a> {
s: &'a str
}
impl<'a> Scanner<'a> {
fn step_by_3_bytes(&mut self) -> &'a str {
let return_value = self.s.slice_to(3);
self.s = self.s.slice_from(3);
return_value
}
}
fn main() {
let mut scan = Scanner { s: "123456" };
let a = scan.step_by_3_bytes();
println!("{}", a);
let b = scan.step_by_3_bytes();
println!("{}", b);
}
Applying this change to your code is simply adjusting the definition of consume_till.
fn consume_till(&mut self, quit: |char| -> bool) -> ConsumeResult<'lt> {
So why does the Rust std library code compile, but mine doesn't? (I'm sure the lifetime annotations are at the root of it, but my understanding of lifetimes doesn't lead to me expecting a problem).
There's a slight (but not huge) difference here: Chars is just returning a char, i.e. no lifetimes in the return value. The next method (essentially) has signature:
impl<'a> Chars<'a> {
fn next(&mut self) -> Option<char> {
(It's actually in an Iterator trait impl, but that's not important.)
The situation you have here is similar to writing
impl<'a> Chars<'a> {
fn next(&'a mut self) -> Option<char> {
(Similar in terms of "incorrect linking of lifetimes", the details differ.)
Let’s look at consume_till.
It takes &'lt mut self and returns ConsumeResult<'lt>. This means that the lifetime 'lt, the duration of the borrow of the input parameter self, will be that of the output parameter, the return value.
Expressed another way, after calling consume_till, you cannot use self again until its result is out of scope.
That result is placed into first_token, and first_token is still in scope in your last line.
In order to get around this, you must cause first_token to pass out of scope; the insertion of a new block around it will do this:
fn main() {
let code_to_scan = "40 + 2";
let mut scanner = Scanner::new(code_to_scan);
{
let first_token = scanner.consume_till(|c| !c.is_digit());
println!("first token is: {}", first_token);
}
scanner.consume_till(|c| c.is_whitespace());
}
All this does stand to reason: while you have a reference to something inside the Scanner, it is not safe to let you modify it, lest that reference be invalidated. This is the memory safety that Rust provides.

Resources