Why is variable scope dependent on the definition order? - rust

I have a basic (and probably stupid) ownership question. I am trying to create a vector of &str from String values wrapped inside Some(String). I am using an intermediate variable to store the extracted/unwrapped String and it seems I need to define this intermediary variable before the vector in order to satisfy the borrow checker:
Working code:
fn main() {
    let a = Some("a".to_string());
    let mut val = String::new();
    let mut v = Vec::<&str>::new();
    if a.is_some() {
        val = a.unwrap();
        v.push(&val[..]);
    }
    println!("{:?}", val);
}
Non working code:
fn main() {
    let a = Some("a".to_string());
    let mut v = Vec::<&str>::new();
    let mut val = String::new();
    if a.is_some() {
        val = a.unwrap();
        v.push(&val[..]);
    }
    println!("{:?}", val);
}
And the compiler errors:
<anon>:9:17: 9:20 error: `val` does not live long enough
<anon>:9 v.push(&val[..]);
^~~
<anon>:4:35: 12:2 note: reference must be valid for the block suffix following statement 1 at 4:34...
<anon>:4 let mut v = Vec::<&str>::new();
<anon>:5 let mut val = String::new();
<anon>:6
<anon>:7 if a.is_some() {
<anon>:8 val = a.unwrap();
<anon>:9 v.push(&val[..]);
...
<anon>:5:32: 12:2 note: ...but borrowed value is only valid for the block suffix following statement 2 at 5:31
<anon>:5 let mut val = String::new();
<anon>:6
<anon>:7 if a.is_some() {
<anon>:8 val = a.unwrap();
<anon>:9 v.push(&val[..]);
<anon>:10 }
...
error: aborting due to previous error
playpen: application terminated with error code 101
The question is: why do I have to define the val variable before the vector v? As I see it, val scope is the same as v scope, or am I missing something?

Bindings are dropped in reverse order of declaration, i.e. the most recently declared thing is destroyed first. Specifically, in the code that doesn't work, the destructor of val runs before the destructor of v. Without careful consideration of what Vec<&str>::drop() does, this is not safe: It could for example try to look at the contents of the string slices it contains, despite the fact that the String from which they derive is already destroyed.
Vec doesn't actually do that, but other legitimate types do something along those lines. Previously it was impossible to safely implement Drop for types that contain lifetimes/borrowed pointers. A relatively recent change makes it safe by introducing these additional restrictions.
Note that if you declare let v, val; or let val, v; and later assign, the two bindings do have the same lifetime, so it's not impossible to have two variables of the same lifetime.
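The reverse drop order described above can be observed directly. The following is a minimal sketch, not part of the original answer: the `Noisy` type and the `drop_order` function are hypothetical names introduced only to record the order in which destructors run.

```rust
use std::cell::RefCell;

// Hypothetical helper type: records its name in a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl<'a> Drop for Noisy<'a> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Returns the names in the order their destructors ran.
fn drop_order() -> Vec<&'static str> {
    // The log is declared first, so it outlives both bindings below.
    let log = RefCell::new(Vec::new());
    {
        let _first = Noisy { name: "first", log: &log };
        let _second = Noisy { name: "second", log: &log };
        // End of block: `_second` is dropped, then `_first`.
    }
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order());
}
```

Because `Noisy` implements Drop and holds a reference, the log has to be declared before the two bindings, which is the same constraint the question ran into.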

Related

Mutate vector within filter

So, I have the following code successfully performing filter in vector:
let mut v1 : Vec<i32> = vec!(1,2,3);
let v2 : Vec<&mut i32> = v1.iter_mut().filter(|x| {**x == 2}).collect();
println!("{:?}", v2);
Since the type signature of the predicate in the filter function is
FnMut(&Self::Item) -> bool, I was assuming that that mutation inside
the closure will work:
let mut v1 : Vec<i32> = vec!(1,2,3);
let v2 : Vec<&mut i32> = v1.iter_mut().filter(|x| {**x = 3; **x == 2}).collect();
println!("{:?}", v2);
But the above code results in a compile error. How can I fix that? Note
that I'm playing with Rust to get a better understanding, so the above
example doesn't make much sense (usually, nobody will try to mutate
things inside filter).
You are confusing two concepts: FnMut means that a function can change its captured variables, like:
fn main() {
    let v1 = vec![1, 2, 3];
    let mut i = 0usize;
    let v2: Vec<_> = v1
        .into_iter()
        .filter(|x| {
            i = i + 1;
            *x == 2
        })
        .collect();
    println!("We iterate {} times and produce {:?}", i, v2);
}
This doesn't mean that every parameter of a function will be mutable.
In your code, filter() passes its closure a &Self::Item, which is very different from map(), whose closure takes Self::Item by value. With iter_mut(), Self::Item is &mut i32, so the closure given to map() receives a &mut i32 while the closure given to filter() receives a &&mut i32. Rust forbids you from mutating through a mutable reference when it sits behind a shared (non-mutable) reference:
fn test(a: &&mut i32) {
**a = 5;
}
error[E0594]: cannot assign to `**a` which is behind a `&` reference
This is because Rust follows the the-rules-of-references:
At any given time, you can have either one mutable reference or any number of immutable references.
References must always be valid.
This means you can have more than one &&mut but only one &mut &mut. If Rust didn't stop you, you could mutate a &&mut and that would poison any other &&mut.
Unfortunately the full error description of E0594 is still not available, see #61137.
Note: avoid side effects when you use the iterator API. I think it's OK to mutate the state captured by your FnMut closure, but not the items themselves. To mutate the items, use a for loop, like:
fn main() {
    let mut v1 = vec![1, 2, 3];
    for x in v1.iter_mut().filter(|x| **x == 2) {
        *x = 1;
    }
    println!("{:?}", v1);
}
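To make the type difference between map() and filter() concrete, here is a small sketch. The function name `bump_then_keep_even` is made up for illustration, and the example exists only to show which closure is allowed to write through the item. The for loop above remains the clearer way to mutate elements.

```rust
// Hypothetical helper: increments every element, then keeps references
// to the elements that are even after the increment.
fn bump_then_keep_even(values: &mut [i32]) -> Vec<&mut i32> {
    values
        .iter_mut()
        // `map` receives each `&mut i32` by value, so writing through it is allowed.
        .map(|x| {
            *x += 1;
            x
        })
        // `filter` only sees `&&mut i32`: reading is fine, assigning would be E0594.
        .filter(|x| **x % 2 == 0)
        .collect()
}

fn main() {
    let mut v1 = vec![1, 2, 3];
    let evens = bump_then_keep_even(&mut v1);
    println!("{:?}", evens);
}
```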

Problems with Tuple's lifetime in rust.

I'm trying to implement a simple parser for a byte stream.
I'm having troubles when I want to reuse a variable I declared previously,
fn read_data(asn_data: &mut Cursor<&[u8]>) -> Result<(u8, u8, Vec<u8>), Err> {
let total_len = asn_data.get_ref().len();
if total_len < 2 {
return Err(1);
}
let d_type = asn_data.read_u8().unwrap();
let d_len = asn_data.read_u8().unwrap();
if (asn_data.position() + d_len as u64) > total_len as u64 {
return Err(2);
}
let mut buf = vec![0; d_len as usize];
match asn_data.read_exact(&mut buf) {
Err(e) => Err(e),
Ok(()) => Ok((d_type, d_len, buf)),
}
}
fn parse_request(request: &[u8]) -> Option<u8> {
if request.len() == 0 {
return None;
}
let mut rdr = Cursor::new(request);
let data_tuple = read_data(&mut rdr).unwrap();
println!("{:02?}", data_tuple.2);
rdr = Cursor::new(data_tuple.2.as_slice());
let data_tuple = read_data(&mut rdr).unwrap();
println!("{:02x?}", data_tuple.2);
Some(1)
}
In the parse_request function I want to reuse the rdr variable, but with the code shown above I get the following error when compiling:
error[E0597]: `data_tuple.2` does not live long enough
   --> src/main.rs:80:23
    |
80  |     rdr = Cursor::new(data_tuple.2.as_slice());
    |                       ^^^^^^^^^^^^ borrowed value does not live long enough
...
104 | }
    | - `data_tuple.2` dropped here while still borrowed
    |
    = note: values in a scope are dropped in the opposite order they are created
error: aborting due to previous error
However if I write "let mut" when I use the 2nd time rdr variable, the code compiles and works fine...
let mut rdr = Cursor::new(data_tuple.2.as_slice());
I don't understand why. What I want is to reuse the variable instead of declaring it again.
I tried some examples and issues related to variable lifetimes, but I didn't find a solution for my case, and I don't fully understand the solution I did find.
This is not connected with tuple lifetimes, this is just the drop order.
When the variables are defined in separate let statements in the same scope (that is, in the same block), they will be dropped in reverse order. Looking at your code, we can see:
let mut rdr = Cursor::new(request);
let data_tuple = read_data(&mut rdr).unwrap();
So, data_tuple will be dropped first, while rdr is still alive. This is bad, because rdr must reference the tuple. The easiest fix will be to swap their definitions:
let data_tuple: (u8, u8, Vec<u8>);
let mut rdr = Cursor::new(request);
data_tuple = read_data(&mut rdr).unwrap();
This way, rdr will be dropped first, releasing the reference to data_tuple and letting the tuple be dropped itself.
The "fix" you mentioned works because every let statement defines a new variable, even if the same name is already in use, and the existing variable can no longer be referred to by that name (it is not dropped, only hidden). So, when you write:
let mut rdr = Cursor::new(request);
let data_tuple = read_data(&mut rdr).unwrap();
let mut rdr = Cursor::new(data_tuple.2.as_slice());
the second rdr is in no way connected with the first. Essentially, it's almost the same as declaring two different variables, say, rdr and rdr2, and using rdr2 from this place until the end of function.
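A short sketch of that point, using a hypothetical function `shadow_demo` rather than the parser from the question: the shadowed binding keeps existing after the second let, which is why a reference taken earlier is still valid, and why the new binding may even have a different type.

```rust
// Hypothetical illustration of shadowing: the second `text` is a new variable.
fn shadow_demo() -> (String, usize) {
    let text = String::from("hello");
    // Borrow the first `text` before it is shadowed.
    let first = &text;
    // A brand-new variable with a different type; the String above still exists.
    let text = text.len();
    // `first` still points at the original String.
    (first.clone(), text)
}

fn main() {
    println!("{:?}", shadow_demo());
}
```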

mismatches between str and string

I have this tiny program, but I can't make it run. I get type mismatches between &str and String or similar errors.
So this is the program
use std::fs::File;
use std::io;
use std::io::prelude::*;
use std::io::BufReader;
use std::collections::HashMap;
fn main() {
let mut f = File::open("/home/asti/class.csv").expect("Couldn't open file");
let mut s = String::new();
let reader = BufReader::new(f);
let lines: Result<Vec<_>,_> = reader.lines().collect();
let mut class_students: HashMap<String, Vec<String>> = HashMap::new();
for l in lines.unwrap() {
let mut str_vec: Vec<&str> = l.split(";").collect();
println!("{}", str_vec[2]);
let e = class_students.entry(str_vec[2]).or_insert(vec![]);
e.push(str_vec[2]);
}
println!("{}", class_students);
}
I constantly get this error:
hello_world.rs:20:38: 20:48 error: mismatched types:
expected `collections::string::String`,
found `&str`
(expected struct `collections::string::String`,
found &-ptr) [E0308]
hello_world.rs:20 let e = class_students.entry(str_vec[2]).or_insert(vec![]);
^~~~~~~~~~
I tried changing the line
let mut str_vec: Vec<&str> = l.split(";").collect();
to
let mut str_vec: Vec<String> = l.split(";").collect();
But I got this error:
hello_world.rs:16:53: 16:60 error: the trait `core::iter::FromIterator<&str>` is not implemented for the type `collections::vec::Vec<collections::string::String>` [E0277]
hello_world.rs:16 let mut str_vec: Vec<String> = l.split(";").collect();
So how do I extract a String from l instead of a &str? Also, if there's a better solution, please let me know, as my inexperience with this technology is probably apparent to all.
A more detailed answer than a comment:
The reason your example fails to compile initially is because you are trying to insert a slice into a vector of Strings. Because the primitive type str implements the ToString trait, you can call the to_string() method to convert it to a String giving your vector the correct type.
Another option would be to_owned() as illustrated in this thread.
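As a rough sketch of both conversions in context, the function below groups lines by class. The name `group_by_class` and the field layout (student name at index 0, class at index 2) are assumptions made for illustration, since the question does not show the contents of the CSV file.

```rust
use std::collections::HashMap;

// Hypothetical helper. Assumed line layout: name;<something>;class
fn group_by_class(lines: &[&str]) -> HashMap<String, Vec<String>> {
    let mut class_students: HashMap<String, Vec<String>> = HashMap::new();
    for line in lines {
        // Converting each &str piece with String::from yields a Vec<String>.
        let fields: Vec<String> = line.split(';').map(String::from).collect();
        if fields.len() < 3 {
            continue; // skip malformed lines instead of panicking
        }
        // The map owns its keys and values, so pass owned Strings.
        class_students
            .entry(fields[2].to_string())
            .or_insert_with(Vec::new)
            .push(fields[0].to_owned());
    }
    class_students
}

fn main() {
    let map = group_by_class(&["ann;f;1a", "bob;m;1a", "cy;m;2b"]);
    println!("{:?}", map);
}
```

Note that HashMap does not implement Display, so the result is printed with {:?} rather than {}.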

How to use a String instead of &str in iterator, if/else block

Would someone be kind enough to explain to me why using a String in this script does not work but &str does? Additionally, how can I modify it so that it will work with Strings? [version 1.2]
use std::collections::{HashMap};
fn main() {
let mut hash = HashMap::<&str, &str>::new();
hash.insert("this", "value");
let l: &str = "this is a borrowed string reference";
// If the above line was defined as:
//let l: String = "this is a string".to_string();
let mut all = l.split(" ");
let name: &str = all.next().unwrap();
if hash.contains_key(name) == true {
hash.remove(name);
} else {
hash.insert(name, "stuff");
}
}
Ok, let's boil this down to just the necessities:
use std::collections::HashMap;
fn main() {
    let mut hash = HashMap::<&str, &str>::new();
    hash.insert("this", "value");
    let l: String = "this is a borrowed string reference".to_string();
    hash.insert(&l, "stuff");
}
Compiling this gives us:
<anon>:7:18: 7:19 error: `l` does not live long enough
<anon>:7 hash.insert(&l, "stuff");
^
<anon>:4:49: 8:2 note: reference must be valid for the block suffix following statement 0 at 4:48...
<anon>:4 let mut hash = HashMap::<&str, &str>::new();
<anon>:5 hash.insert("this", "value");
<anon>:6 let l: String = "this is a borrowed string reference".to_string();
<anon>:7 hash.insert(&l, "stuff");
<anon>:8 }
<anon>:6:71: 8:2 note: ...but borrowed value is only valid for the block suffix following statement 2 at 6:70
<anon>:6 let l: String = "this is a borrowed string reference".to_string();
<anon>:7 hash.insert(&l, "stuff");
<anon>:8 }
Which tells you more or less exactly why it won't work. You try to insert a borrowed pointer to the String l into the hashmap. However, the String simply doesn't live long enough.
Specifically, values are destroyed by Rust in reverse lexical order. So, when execution gets to the end of the function, Rust will deallocate l first, followed by hash. This is a problem: it means that there is a window during which hash contains pointers to destroyed data, which Rust absolutely will not allow.
The reason this works with a &str is that a string literal "like this" is not merely &str; it's actually &'static str. This means that a string literal "lives" for the entire duration of the program: it never gets destroyed, thus it's safe for the hashmap to hold a pointer to it.
The solution is to ensure the String will outlive the HashMap:
use std::collections::HashMap;
fn main() {
    // Declare `l` here ...
    let l;
    let mut hash = HashMap::<&str, &str>::new();
    hash.insert("this", "value");
    // ... but initialise it *here*.
    l = "this is a borrowed string reference".to_string();
    hash.insert(&l, "stuff");
}
Now, hash gets destroyed first, followed by l. It's fine to leave a variable uninitialised, so long as you do initialise it prior to reading or using it.
If you change l to String you get:
error: l does not live long enough
Which is true:
name is a reference to one of the substrings of l. You're possibly inserting the name reference into hash, but the lifetime of hash is longer than the lifetime of l. Therefore, when the lifetime of l ends, hash would contain invalid references. This is not allowed. For example, if you remove the insert line, Rust is happy.
There are multiple ways to fix it, depending on your needs. One of them would be to make the lifetime of hash shorter than the lifetime of l, by instantiating hash after l:
use std::collections::{HashMap};
fn main() {
// let l: &str = "this is a borrowed string reference";
// If the above line was defined as:
let l: String = "this is a string".to_string();
let mut hash = HashMap::<&str, &str>::new();
hash.insert("this", "value");
let mut all = l.split(" ");
let name: &str = all.next().unwrap();
if hash.contains_key(name) == true {
hash.remove(name);
} else {
hash.insert(name, "stuff");
}
}
Or, you could store copies of the strings in the map.
You have a lifetime issue.
Any object inserted into hash must only reference things that outlive hash (or not reference anything at all).
However, here, you define l after hash and therefore l has a shorter lifetime. In turn this means that name which references l's buffer has a shorter lifetime than hash and thus name is not suitable to be inserted into hash (though it can be used to search/remove).
Switching the order in which l and hash are defined makes it work:
use std::collections::{HashMap};
fn main() {
let l: String = "this is a string".to_string();
let mut hash = HashMap::<&str, &str>::new();
hash.insert("this", "value");
let mut all = l.split(" ");
let name: &str = all.next().unwrap();
if hash.contains_key(name) == true {
hash.remove(name);
} else {
hash.insert(name, "stuff");
}
}
If this is not possible, then use a HashMap<String, String> to avoid lifetime issues.
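A minimal sketch of the owned-map variant follows. The helper name `toggle_first_word` is hypothetical. Because the map owns its keys and values, it no longer matters whether the source string is declared before or after the map.

```rust
use std::collections::HashMap;

// Hypothetical helper: removes the first word of `line` from the map if it is
// present, otherwise inserts it with the value "stuff".
fn toggle_first_word(hash: &mut HashMap<String, String>, line: &str) {
    if let Some(name) = line.split(' ').next() {
        if hash.contains_key(name) {
            hash.remove(name);
        } else {
            // Copy the borrowed pieces so the map does not borrow from `line`.
            hash.insert(name.to_string(), "stuff".to_string());
        }
    }
}

fn main() {
    let mut hash = HashMap::new();
    hash.insert("this".to_string(), "value".to_string());
    // `l` is declared after `hash`, which is fine for an owning map.
    let l = String::from("this is a string");
    toggle_first_word(&mut hash, &l);
    println!("{:?}", hash);
}
```

The cost is one allocation per inserted key and value, which is the trade-off against the borrowed HashMap<&str, &str>.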

Using str and String interchangeably

Suppose I'm trying to do a fancy zero-copy parser in Rust using &str, but sometimes I need to modify the text (e.g. to implement variable substitution). I really want to do something like this:
fn main() {
let mut v: Vec<&str> = "Hello there $world!".split_whitespace().collect();
for t in v.iter_mut() {
if (t.contains("$world")) {
*t = &t.replace("$world", "Earth");
}
}
println!("{:?}", &v);
}
But of course the String returned by t.replace() doesn't live long enough. Is there a nice way around this? Perhaps there is a type which means "ideally a &str but if necessary a String"? Or maybe there is a way to use lifetime annotations to tell the compiler that the returned String should be kept alive until the end of main() (or have the same lifetime as v)?
Rust has exactly what you want in form of a Cow (Clone On Write) type.
use std::borrow::Cow;
fn main() {
    let mut v: Vec<_> = "Hello there $world!".split_whitespace()
        .map(|s| Cow::Borrowed(s))
        .collect();
    for t in v.iter_mut() {
        if t.contains("$world") {
            *t.to_mut() = t.replace("$world", "Earth");
        }
    }
    println!("{:?}", &v);
}
As @sellibitze correctly notes, to_mut() creates a new String, which causes a heap allocation to store the previously borrowed value. If you are sure you only have borrowed strings, then you can use
*t = Cow::Owned(t.replace("$world", "Earth"));
In case the Vec contains Cow::Owned elements, this would still throw away the allocation. You can prevent that by using the following very fragile and unsafe code inside your for loop. It does direct byte-based manipulation of UTF-8 strings and relies on the fact that, once the $ sign is removed, the replacement happens to be exactly the same number of bytes.
let mut last_pos = 0; // so we don't start at the beginning every time
while let Some(pos) = t[last_pos..].find("$world") {
    let p = pos + last_pos; // `find` reports offsets relative to last_pos
    last_pos = p + 5; // resume just after the 5 bytes of "Earth"
    unsafe {
        let s = t.to_mut().as_mut_vec(); // operating on Vec is easier
        s.remove(p); // remove $ sign
        for (c, sc) in "Earth".bytes().zip(&mut s[p..]) {
            *sc = c;
        }
    }
}
Note that this is tailored exactly to the "$world" -> "Earth" mapping. Any other mappings require careful consideration inside the unsafe code.
Use std::borrow::Cow, specifically as Cow<'a, str>, where 'a is the lifetime of the string being parsed.
use std::borrow::Cow;
fn main() {
    let mut v: Vec<Cow<'static, str>> = vec![];
    v.push("oh hai".into());
    v.push(format!("there, {}.", "Mark").into());
    println!("{:?}", v);
}
Produces:
["oh hai", "there, Mark."]
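Applied to the tokenizer from the question, the Cow<'a, str> idea might look like the sketch below. The function `substitute` and its parameters are hypothetical. Tokens that need no substitution stay borrowed from the input, and only the modified ones allocate.

```rust
use std::borrow::Cow;

// Hypothetical helper: splits `input` on whitespace and replaces `var` with
// `value`, borrowing from `input` wherever no replacement is needed.
fn substitute<'a>(input: &'a str, var: &str, value: &str) -> Vec<Cow<'a, str>> {
    input
        .split_whitespace()
        .map(|token| {
            if token.contains(var) {
                // Only this branch allocates a new String.
                Cow::Owned(token.replace(var, value))
            } else {
                Cow::Borrowed(token)
            }
        })
        .collect()
}

fn main() {
    let tokens = substitute("Hello there $world!", "$world", "Earth");
    println!("{:?}", tokens);
}
```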
