How do I tackle lifetimes in Rust?

I am having issues with the concept of lifetimes in Rust. I am trying to use the crate bgpkit_parser to read in a bz2 file via a URL and then create a radix trie.
One field extracted from the file is the AS path, which I have named path in my code within the build_routetable function. I don't understand why Rust rejects let origin = clean_path.last(), which takes the last element of the vector.
fn as_parser(element: &BgpElem) -> Vec<u32> {
let x = &element.as_path.as_ref().unwrap().segments[0];
let mut as_vec = &Vec::new();
let mut as_path: Vec<u32> = Vec::new();
if let AsPathSegment::AsSequence(value) = x {
as_vec = value;
}
for i in as_vec {
as_path.push(i.asn);
}
return as_path;
}
fn prefix_parser(element: &BgpElem) -> String {
let subnet_id = element.prefix.prefix.ip().to_string().to_owned();
let prefix_id = element.prefix.prefix.prefix().to_string().to_owned();
let prefix = format!("{}/{}", subnet_id, prefix_id);//.as_str();
return prefix;
}
fn get_aspath(raw_aspath: Vec<u32>) -> Vec<u32> {
let mut as_path = Vec::new();
for i in raw_aspath {
if i < 64511 {
if as_path.contains(&i) {
continue;
}
else {
as_path.push(i);
}
}
else if 65535 < i && i < 4000000000 {
if as_path.contains(&i) {
continue;
}
else {
as_path.push(i);
}
}
}
return as_path;
}
fn build_routetable(mut trie4: Trie<String, Option<&u32>>, mut trie6: Trie<String, Option<&u32>>) {
let url: &str = "http://archive.routeviews.org/route-views.chile/\
bgpdata/2022.06/RIBS/rib.20220601.0000.bz2";
let parser = BgpkitParser::new(url).unwrap();
let mut count = 0;
for elem in parser {
if elem.elem_type == bgpkit_parser::ElemType::ANNOUNCE {
let record_timestamp = &elem.timestamp;
let record_type = "A";
let peer = &elem.peer_ip;
let prefix = prefix_parser(&elem);
let path = as_parser(&elem);
let clean_path = get_aspath(path);
// Issue is on the below line
// `clean_path` does not live long enough
// borrowed value does not live long
// enough rustc E0597
// main.rs(103, 9): `clean_path` dropped
// here while still borrowed
// main.rs(77, 91): let's call the
// lifetime of this reference `'1`
// main.rs(92, 17): argument requires
// that `clean_path` is borrowed for `'1`
let origin = clean_path.last(); //issue line
if prefix.contains(":") {
trie6.insert(prefix, origin);
}
else {
trie4.insert(prefix, origin);
}
count+=1;
if count >= 10000 {
println!("{:?} | {:?} | {:?} | {:?} | {:?}",
record_type, record_timestamp, peer, prefix, path);
count=0
}
};
}
println!("Trie4 size: {:?} prefixes", trie4.len());
println!("Trie6 size: {:?} prefixes", trie6.len());
}

Short answer: you're "inserting" a reference. But what's being referenced doesn't outlive what it's being inserted into.
Longer: The hint is your trie4 argument, the signature of which is this:
mut trie4: Trie<String, Option<&u32>>
So the trie lives beyond the loop body in which the values are declared. This is all in the loop:
let origin = clean_path.last(); //issue line
if prefix.contains(":") {
trie6.insert(prefix, origin);
}
While clean_path is a Vec<u32> and that's fine, origin is not: clean_path.last() returns an Option<&u32>, a reference to the last element inside that vector, and that is what insert stores as the value. Here's your problem: the value has to live as long as the collection, but your value borrows from clean_path, which goes away at the end of each loop iteration! So you can't put something into the trie that will not live as long as the trie itself. Rust has just saved you from dangling references (just like it's supposed to).
Basically, your containers should be Trie<String, Option<u32>>, without the reference, with origin copied out of the vector (for example clean_path.last().copied()), and then this'll all just work fine. Your problem is that the elements are references rather than plain owned values. Given what you're storing, a u32 is also no larger than a reference: it is 4 bytes against a pointer-sized reference (though once wrapped in Option, the two will likely end up the same size because of alignment).
Also of note: trie4 and trie6 will both be gone at the end of this function call, because they were moved into this function (not references or mutable references). I hope that's what you want.
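To make the "store the value, not the reference" advice concrete, here is a minimal sketch. It uses a std HashMap as a stand-in for the trie (an assumption, since the trie crate's API is not shown in the question), and origin_of and insert_route are hypothetical helper names. It also takes the table by &mut, so the caller still owns it after the call.

```rust
use std::collections::HashMap;

/// Returns the origin AS (the last hop) as an owned value, not a reference.
fn origin_of(path: &[u32]) -> Option<u32> {
    // `last()` yields Option<&u32>; `copied()` turns it into Option<u32>.
    path.last().copied()
}

/// Inserts one route. `HashMap` is only a stand-in for the trie type (assumption).
fn insert_route(table: &mut HashMap<String, Option<u32>>, prefix: &str, path: Vec<u32>) {
    let origin = origin_of(&path);
    table.insert(prefix.to_string(), origin);
    // `path` is dropped here, but `table` holds its own copy of the origin.
}
```

Because the map now owns plain u32 values, nothing in it borrows from the per-iteration vector, and the E0597 error has no reference left to complain about.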

Related

How to fix use of moved value in Rust?

I am trying to convert a yaml file to xml using Rust and I am not able to figure out how to fix this error regarding the use of moved value. I think I understand why this error is coming, but haven't got a clue about what to do next.
Here's the code:
struct Element {
element_name: String,
indentation_count: i16,
}
struct Attribute<'a> {
attribute_name: &'a str,
attribute_value: &'a str,
}
fn convert_yaml_to_xml(content: String, indentation_count: i16) -> String {
let mut xml_elements: Vec<Element> = vec![];
let mut attributes: Vec<Attribute> = vec![];
xml_elements.push(Element {element_name: "xmlRoot".to_string(), indentation_count: -1});
let mut target: Vec<u8> = Vec::new();
let mut xml_data_writer = EmitterConfig::new().perform_indent(true).create_writer(&mut target);
let mut attribute_written_flag = false;
let mut xml_event;
xml_event = XmlEvent::start_element("xmlRoot");
for line in content.lines() {
let current_line = line.trim();
let caps = indentation_count_regex.captures(current_line).unwrap();
let current_indentation_count = caps.get(1).unwrap().as_str().to_string().len() as i16;
if ELEMENT_REGEX.is_match(current_line) {
loop {
let current_attribute_option = attributes.pop();
match current_attribute_option {
Some(current_attribute_option) => {
xml_event.attr(current_attribute_option.attribute_name, current_attribute_option.attribute_value)
},
None => {
break;
},
};
}
xml_data_writer.write(xml_event);
// Checking if the line is an element
let caps = ELEMENT_REGEX.captures(current_line).unwrap();
let element_name = caps.get(2);
let xml_element_struct = Element {
indentation_count: current_indentation_count,
element_name: element_name.unwrap().as_str().to_string(),
};
xml_elements.push(xml_element_struct);
xml_event = XmlEvent::start_element(element_name.unwrap().as_str());
attribute_written_flag = false;
} else if ATTR_REGEX.is_match(current_line) {
// Checking if the line is an attribute
let caps = ATTR_REGEX.captures(current_line).unwrap();
let attr_name = caps.get(2);
let attr_value = caps.get(3);
// Saving attributes to a stack
attributes.push(Attribute{ attribute_name: attr_name.unwrap().as_str(), attribute_value: attr_value.unwrap().as_str() });
// xml_event.attr(attr_name.unwrap().as_str(), attr_value.unwrap().as_str());
}/* else if NEW_ATTR_SET_REGEX.is_match(current_line) {
let caps = NEW_ATTR_SET_REGEX.captures(current_line).unwrap();
let new_attr_set_name = caps.get(2);
let new_attr_set_value = caps.get(3);
current_xml_hash.insert("name".to_string(), new_attr_set_name.unwrap().as_str().to_string());
current_xml_hash.insert("value".to_string(), new_attr_set_value.unwrap().as_str().to_string());
} */
}
if attribute_written_flag {
xml_data_writer.write(xml_event);
}
for item in xml_elements.iter() {
let event = XmlEvent::end_element();
let event_name = item.element_name.to_string();
xml_data_writer.write(event.name(event_name.as_str()));
}
println!("OUTPUT");
println!("{:?}", target);
return "".to_string();
}
And here's the error:
error[E0382]: use of moved value: `xml_event`
--> src/main.rs:77:25
|
65 | let mut xml_event;
| ------------- move occurs because `xml_event` has type `StartElementBuilder<'_>`, which does not implement the `Copy` trait
...
77 | xml_event.attr(current_attribute_option.attribute_name, current_attribute_option.attribute_value)
| ^^^^^^^^^ --------------------------------------------------------------------------------------- `xml_event` moved due to this method call, in previous iteration of loop
|
note: this function takes ownership of the receiver `self`, which moves `xml_event`
--> /Users/defiant/.cargo/registry/src/github.com-1ecc6299db9ec823/xml-rs-0.8.4/src/writer/events.rs:193:24
|
193 | pub fn attr<N>(mut self, name: N, value: &'a str) -> StartElementBuilder<'a>
| ^^^^
From XmlEvent::start_element() documentation we see that it produces a StartElementBuilder<'a>.
From StartElementBuilder<'a>::attr() documentation we see that it consumes the StartElementBuilder<'a> (the first parameter is self, not &mut self) and produces a new StartElementBuilder<'a> (which is probably similar to self but considers the expected effect of .attr()).
This approach is known as the consuming builder pattern, which is used in Rust (for example std::thread::Builder).
The typical usage of such an approach consists of chaining the function calls: something.a().b().c().d(), so that something is consumed by a(), its result is consumed by b(), the same goes for c(), and finally d() does something useful with the last result.
The alternative would be to use mutable borrows in order to modify something in place, but dealing with mutable borrows is known to be difficult in some situations.
In your case, you can just reassign the result of .attr() to xml_event. Otherwise the .attr() call has no effect (its result is discarded) and xml_event becomes unusable because it has been consumed; reassigning it makes it usable again afterwards (at least I guess so, I didn't try).
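The reassignment idea can be sketched without xml-rs. TagBuilder below is a hypothetical toy type, not the real StartElementBuilder; it only mimics the shape of a consuming attr(self, ...) method so the loop pattern is visible.

```rust
/// Hypothetical builder that mimics a consuming `attr(self, ...) -> Self` API.
#[derive(Debug)]
struct TagBuilder {
    name: String,
    attrs: Vec<(String, String)>,
}

impl TagBuilder {
    fn new(name: &str) -> Self {
        TagBuilder { name: name.to_string(), attrs: Vec::new() }
    }

    /// Consumes the builder and returns an updated one.
    fn attr(mut self, key: &str, value: &str) -> Self {
        self.attrs.push((key.to_string(), value.to_string()));
        self
    }

    /// Consumes the builder and renders an opening tag.
    fn build(self) -> String {
        let mut out = format!("<{}", self.name);
        for (k, v) in &self.attrs {
            out.push_str(&format!(" {}=\"{}\"", k, v));
        }
        out.push('>');
        out
    }
}

/// Adds attributes in a loop by reassigning the builder on every iteration.
fn build_tag(name: &str, attrs: &[(&str, &str)]) -> String {
    let mut builder = TagBuilder::new(name);
    for (k, v) in attrs {
        // `attr` moved the old builder and handed back a new one, so store it again.
        builder = builder.attr(k, v);
    }
    builder.build()
}
```

Calling build_tag("a", &[("href", "x")]) goes through the loop once; without the reassignment, the second iteration of a longer list would hit the same "use of moved value" error as in the question.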

How to pass &mut str and change the original mut str without a return?

I'm learning Rust from the Book and I was tackling the exercises at the end of chapter 8, but I'm hitting a wall with the one about converting words into Pig Latin. I wanted to see specifically if I could pass a &mut String to a function that takes a &mut str (to also accept slices) and modify the referenced string inside it so the changes are reflected back outside without the need of a return, like in C with a char **.
I'm not quite sure if I'm just messing up the syntax or if it's more complicated than it sounds due to Rust's strict rules, which I have yet to fully grasp. For the lifetime errors inside to_pig_latin() I remember reading something that explained how to properly handle the situation but right now I can't find it, so if you could also point it out for me it would be very appreciated.
Also what do you think of the way I handled the chars and indexing inside strings?
use std::io::{self, Write};
fn main() {
let v = vec![
String::from("kaka"),
String::from("Apple"),
String::from("everett"),
String::from("Robin"),
];
for s in &v {
// cannot borrow `s` as mutable, as it is not declared as mutable
// cannot borrow data in a `&` reference as mutable
to_pig_latin(&mut s);
}
for (i, s) in v.iter().enumerate() {
print!("{}", s);
if i < v.len() - 1 {
print!(", ");
}
}
io::stdout().flush().unwrap();
}
fn to_pig_latin(mut s: &mut str) {
let first = s.chars().nth(0).unwrap();
let mut pig;
if "aeiouAEIOU".contains(first) {
pig = format!("{}-{}", s, "hay");
s = &mut pig[..]; // `pig` does not live long enough
} else {
let mut word = String::new();
for (i, c) in s.char_indices() {
if i != 0 {
word.push(c);
}
}
pig = format!("{}-{}{}", word, first.to_lowercase(), "ay");
s = &mut pig[..]; // `pig` does not live long enough
}
}
Edit: here's the fixed code with the suggestions from below.
fn main() {
    // added mut
    let mut v = vec![
        String::from("kaka"),
        String::from("Apple"),
        String::from("everett"),
        String::from("Robin"),
    ];
    // iterate by mutable reference; each `s` is already a `&mut String`
    for s in &mut v {
        to_pig_latin(s);
    }
    for (i, s) in v.iter().enumerate() {
        print!("{}", s);
        if i < v.len() - 1 {
            print!(", ");
        }
    }
    println!();
}
// converted into &mut String
fn to_pig_latin(s: &mut String) {
    let first = s.chars().nth(0).unwrap();
    if "aeiouAEIOU".contains(first) {
        s.push_str("-hay");
    } else {
        // added code to make the new first letter uppercase
        let second = s.chars().nth(1).unwrap();
        *s = format!(
            "{}{}-{}ay",
            second.to_uppercase(),
            // the slice starts at the third char of the string, as if &s[2..] for ASCII;
            // summing both byte lengths keeps it correct for multi-byte chars
            &s[first.len_utf8() + second.len_utf8()..],
            first.to_lowercase()
        );
    }
}
What you're trying to do can't work: with a mutable reference you can update the referent in place, but this is extremely limited here:
a &mut str can't change length or anything of that nature
a &mut str is still just a reference; the memory has to live somewhere. Here you're creating new Strings inside your function and then trying to use them as the new backing buffers for the reference, which, as the compiler tells you, doesn't work: the String will be deallocated at the end of the function
What you could do is take a &mut String, which lets you modify the owned string itself in place and is much more flexible. In fact, it corresponds exactly to your request: a &mut str corresponds to a char*; it's a pointer to a place in memory.
A String is also a pointer, so a &mut String is a double pointer to a zone in memory.
So something like this:
fn to_pig_latin(s: &mut String) {
    let first = s.chars().nth(0).unwrap();
    if "aeiouAEIOU".contains(first) {
        *s = format!("{}-{}", s, "hay");
    } else {
        let mut word = String::new();
        for (i, c) in s.char_indices() {
            if i != 0 {
                word.push(c);
            }
        }
        *s = format!("{}-{}{}", word, first.to_lowercase(), "ay");
    }
}
You can also likely avoid some of the complete string allocations by using somewhat finer methods e.g.
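use std::fmt::Write; // needed so that write! can append to a String
fn to_pig_latin(s: &mut String) {
    let first = s.chars().nth(0).unwrap();
    if "aeiouAEIOU".contains(first) {
        s.push_str("-hay")
    } else {
        // remove the first character in place, then append the suffix
        s.replace_range(..first.len_utf8(), "");
        write!(s, "-{}ay", first.to_lowercase()).unwrap();
    }
}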
although the replace_range + write! is not very readable and not super likely to be much of a gain, so that might as well be a format!, something along the lines of:
fn to_pig_latin(s: &mut String) {
    let first = s.chars().nth(0).unwrap();
    if "aeiouAEIOU".contains(first) {
        s.push_str("-hay")
    } else {
        *s = format!("{}-{}ay", &s[first.len_utf8()..], first.to_lowercase());
    }
}
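To illustrate the difference between the two reference types discussed in this answer, here is a small sketch. shout and add_suffix are hypothetical names: the first edits through &mut str, which works only because the byte length stays the same, while the second needs &mut String because it grows the buffer.

```rust
/// In-place edit through `&mut str`: allowed because the byte length is unchanged.
fn shout(s: &mut str) {
    s.make_ascii_uppercase();
}

/// Growing the text needs `&mut String`, which owns a resizable buffer.
fn add_suffix(s: &mut String) {
    s.push_str("-hay");
}
```

A &mut String can be passed to shout directly (it derefs to &mut str), and so can a sub-slice such as &mut word[..2], but add_suffix cannot accept a slice, since a slice has no way to reallocate.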

Why does the Rust compiler complain that I use a moved value when I've replaced it with a new value?

I am working on two singly linked lists, named longer and shorter. The length of the longer one is guaranteed to be no less than the shorter one.
I pair the lists element-wise and do something to each pair. If the longer list has more unpaired elements, process the rest of them:
struct List {
next: Option<Box<List>>,
}
fn drain_lists(mut shorter: Option<Box<List>>, mut longer: Option<Box<List>>) {
// Pair the elements in the two lists.
while let (Some(node1), Some(node2)) = (shorter, longer) {
// Actual work elided.
shorter = node1.next;
longer = node2.next;
}
// Process the rest in the longer list.
while let Some(node) = longer {
// Actual work elided.
longer = node.next;
}
}
However, the compiler complains on the second while loop that
error[E0382]: use of moved value
--> src/lib.rs:13:20
|
5 | fn drain_lists(mut shorter: Option<Box<List>>, mut longer: Option<Box<List>>) {
| ---------- move occurs because `longer` has type `std::option::Option<std::boxed::Box<List>>`, which does not implement the `Copy` trait
6 | // Pair the elements in the two lists.
7 | while let (Some(node1), Some(node2)) = (shorter, longer) {
| ------ value moved here
...
13 | while let Some(node) = longer {
| ^^^^ value used here after move
However, I do set a new value for shorter and longer at the end of the loop, so that I will never use a moved value of them.
How should I cater to the compiler?
I think that the problem is caused by the tuple temporary in the first loop. Creating a tuple moves its components into the new tuple, and that happens even when the subsequent pattern matching fails.
First, let me write a simpler version of your code. This compiles fine:
struct Foo(i32);
fn main() {
let mut longer = Foo(0);
while let Foo(x) = longer {
longer = Foo(x + 1);
}
println!("{:?}", longer.0);
}
But if I add a temporary to the while let then I'll trigger a compiler error similar to yours:
fn fwd<T>(t: T) -> T { t }
struct Foo(i32);
fn main() {
let mut longer = Foo(0);
while let Foo(x) = fwd(longer) {
longer = Foo(x + 1);
}
println!("{:?}", longer.0);
// Error: ^ borrow of moved value: `longer`
}
The solution is to add a local variable with the value to be destructured, instead of relying on a temporary. In your code:
struct List {
next: Option<Box<List>>
}
fn drain_lists(shorter: Option<Box<List>>,
longer: Option<Box<List>>) {
// Pair the elements in the two lists.
let mut twolists = (shorter, longer);
while let (Some(node1), Some(node2)) = twolists {
// Actual work elided.
twolists = (node1.next, node2.next);
}
// Process the rest in the longer list.
let (_, mut longer) = twolists;
while let Some(node) = longer {
// Actual work elided.
longer = node.next;
}
}
Other than getting rid of the tuple (shown by others), you can capture a mutable reference to the nodes:
while let (&mut Some(ref mut node1), &mut Some(ref mut node2)) = (&mut shorter, &mut longer) {
    shorter = node1.next.take();
    longer = node2.next.take();
}
The use of take() enables this to work: shorter = node1.next would complain of moving a field out of a reference, which is not allowed (it would leave the node in an undefined state). But taking it is OK because it leaves None in the next field.
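The same borrow-and-take idea can be sketched with concrete values so that it can be run and checked. The value field, the from_slice helper, and the "sum of products plus leftover tail" computation are illustrative assumptions standing in for the elided work; as_mut() is used here in place of the &mut Some(ref mut ...) patterns.

```rust
struct List {
    value: i32,
    next: Option<Box<List>>,
}

/// Builds a list from a slice (helper for the example only).
fn from_slice(values: &[i32]) -> Option<Box<List>> {
    let mut head = None;
    for &v in values.iter().rev() {
        head = Some(Box::new(List { value: v, next: head }));
    }
    head
}

/// Sums pairwise products, then adds the unpaired tail of `longer`.
fn drain_lists(mut shorter: Option<Box<List>>, mut longer: Option<Box<List>>) -> i32 {
    let mut total = 0;
    // Borrow both heads mutably instead of moving them into the tuple.
    while let (Some(node1), Some(node2)) = (shorter.as_mut(), longer.as_mut()) {
        total += node1.value * node2.value;
        // `take` moves each tail out and leaves `None` behind in the node.
        let next1 = node1.next.take();
        let next2 = node2.next.take();
        shorter = next1;
        longer = next2;
    }
    // `longer` was only borrowed above, so it is still usable here.
    while let Some(node) = longer {
        total += node.value;
        longer = node.next;
    }
    total
}
```

For example, drain_lists(from_slice(&[1, 2]), from_slice(&[3, 4, 5])) pairs 1 with 3 and 2 with 4, then processes the leftover 5.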
Looks like the destructuring on line 7 moves the value even when the block afterwards is not evaluated. (Edit: as @Sven Marnach pointed out in the comments, a temporary tuple gets created here, which causes the move.)
I've uglified your code to prove that point :)
struct List {
next: Option<Box<List>>
}
fn drain_lists(mut shorter: Option<Box<List>>,
mut longer: Option<Box<List>>) {
// Pair the elements in the two lists.
match(shorter, longer) {
(Some(node1), Some(node2)) => {
shorter = node1.next;
longer = node2.next;
},
(_, _) => return // without this you get the error
}
// Process the rest in the longer list.
while let Some(node) = longer {
// Actual work elided.
longer = node.next;
}
}
When I added the return for the default case, the code compiled.
One solution is to avoid the tuple and consequently the move of longer into the tuple.
fn actual_work(node1: &Box<List>, node2: &Box<List>) {
    // Actual work elided
}
fn drain_lists(mut shorter: Option<Box<List>>, mut longer: Option<Box<List>>) {
    while let Some(node1) = shorter {
        if let Some(node2) = longer.as_ref() {
            actual_work(&node1, node2);
        }
        shorter = node1.next;
        // advance `longer` to its tail (same effect as map_or(None, ...))
        longer = longer.and_then(|l| l.next);
    }
    // Process the rest in the longer list.
    while let Some(node) = longer {
        // Actual work elided.
        longer = node.next;
    }
}

Issues with Rust timelines and ownerships

I am trying to create a hashmap by reading a file. Below is the code that I have written. The twist is that I need to persist subset_description until the next iteration so that I can store it in the hashmap and then finally return the hashmap.
fn myfunction(filename: &Path) -> io::Result<HashMap<&str, &str>> {
let mut SIF = HashMap::new();
let file = File::open(filename).unwrap();
let mut subset_description = "";
for line in BufReader::new(file).lines() {
let thisline = line?;
let line_split: Vec<&str> = thisline.split("=").collect();
subset_description = if thisline.starts_with("a") {
let subset_description = line_split[1].trim();
subset_description
} else {
""
};
let subset_ids = if thisline.starts_with("b") {
let subset_ids = line_split[1].split(",");
let subset_ids = subset_ids.map(|s| s.trim());
subset_ids.collect()
} else {
Vec::new()
};
for k in subset_ids {
SIF.insert(k, subset_description);
println!("");
}
if thisline.starts_with("!dataset_table_begin") {
break;
}
}
Ok(SIF)
}
I am getting the below error and not able to resolve this
error[E0515]: cannot return value referencing local variable `thisline`
--> src/main.rs:73:5
|
51 | let line_split: Vec<&str> = thisline.split("=").collect();
| -------- `thisline` is borrowed here
...
73 | Ok(SIF)
| ^^^^^^^ returns a value referencing data owned by the current function
The problem lies in the guarantees that Rust makes on your behalf. The root of it is as follows: you read a file, build a HashMap from its contents, and try to return references to the data you read. Those references point into thisline, a String that is owned by the function and dropped at the end of each loop iteration, so Rust cannot guarantee that the data is still there when the caller uses the map.
In Rust terms, you are trying to return references to local variables, which get dropped before the function returns; that would effectively leave you with dangling pointers. Here are the changes I made. They may not be the most efficient, but they do compile.
fn myfunction(filename: &Path) -> io::Result<HashMap<String, String>> {
    let mut SIF = HashMap::new();
    // propagate the error instead of panicking, since we already return io::Result
    let file = File::open(filename)?;
    let mut subset_description = "";
    for line in BufReader::new(file).lines() {
        let thisline = line?;
        let line_split: Vec<String> = thisline.split("=").map(|s| s.to_string()).collect();
        subset_description = if thisline.starts_with("a") {
            let subset_description = line_split[1].trim();
            subset_description
        } else {
            ""
        };
        let subset_ids = if thisline.starts_with("b") {
            let subset_ids = line_split[1].split(",");
            let subset_ids = subset_ids.map(|s| s.trim());
            subset_ids.map(|s| s.to_string()).collect()
        } else {
            Vec::new()
        };
        for k in subset_ids {
            // store owned Strings so the map no longer borrows from this iteration
            SIF.insert(k, subset_description.to_string());
            println!("");
        }
        if thisline.starts_with("!dataset_table_begin") {
            break;
        }
    }
    Ok(SIF)
}
As you can see, the return value now owns its strings. This is achieved by changing the return type and using the to_string() function to hand ownership of the strings over to the HashMap.
There is an argument that to_string() is slow, so you can explore the use of into() or to_owned(), but as I am not proficient with those constructs I cannot assist you with that optimization.
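The question also asks for the description to persist until the next iteration, which the version above does not do, because it resets subset_description on every line that does not start with "a". A minimal sketch of one way to keep it follows. It assumes the line format implied by the question ("a...=description" lines followed by "b...=id, id" lines), reads from any BufRead source instead of a file path, and uses the hypothetical name parse_subsets.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

/// Maps each id on a `b...=` line to the description from the most recent `a...=` line.
fn parse_subsets<R: BufRead>(reader: R) -> io::Result<HashMap<String, String>> {
    let mut map = HashMap::new();
    // Owned, so it can outlive the line it was read from.
    let mut description = String::new();
    for line in reader.lines() {
        let line = line?;
        if line.starts_with("!dataset_table_begin") {
            break;
        }
        // Lines without a `key=value` shape are skipped.
        if let Some((_, value)) = line.split_once('=') {
            if line.starts_with('a') {
                description = value.trim().to_owned();
            } else if line.starts_with('b') {
                for id in value.split(',') {
                    map.insert(id.trim().to_owned(), description.clone());
                }
            }
        }
    }
    Ok(map)
}
```

To read from a file as in the question, the reader argument could be BufReader::new(File::open(filename)?). Keeping the description as an owned String is what lets it survive from one line to the next.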

Dealing with a mutable counter variable in closure

I have written a program to visit directories which is based
on the example on this page.
When I compile it, the compiler displays the following 'note':
previous borrow of file_counter occurs here due to use in closure;
How can I display file_counter's value?
Is there a better (i.e., more functional) way to count displayed files in this program,
perhaps with a non-mutable variable and/or recursion?
Many thanks.
fn main() {
let mut file_counter = 0i;
let display_path_closure = |path: &Path| {
file_counter += 1;
println!("{}) path = {}", file_counter, path.display());
};
let path = Path::new("z:/abc");
let _ = match visit_dirs(&path, display_path_closure) {
Err(e) => println!("error: {}", e),
Ok(_) => println!("Counter: {}", file_counter)
};
}
fn visit_dirs(dir: &Path, cb: |&Path|) -> io::IoResult<()> {
if dir.is_dir() {
let contents = try!(fs::readdir(dir));
for entry in contents.iter() {
if entry.is_dir() {
try!(visit_dirs(entry, |p| cb(p)));
} else {
cb(entry);
}
}
Ok(())
} else {
Err(io::standard_error(io::InvalidInput))
}
}
You can get around this by slight restructuring, putting the closure into an inner block:
fn main() {
let path = Path::new("z:/abc");
let mut file_counter = 0i;
let result = {
let display_path_closure = |path: &Path| {
file_counter += 1;
println!("{}) path = {}", file_counter, path.display());
};
visit_dirs(&path, display_path_closure)
};
let _ = match result {
Err(e) => println!("error: {}", e),
Ok(_) => println!("Counter: {}", file_counter)
};
}
As for why it happens: the closure captures its whole environment by unique reference (mutable in your case), as if it were declared like this (pretending for a moment that closures capture their environment by value; in fact that's how unboxed closures work):
let mut file_counter = 0i;
let file_counter_ref = &mut file_counter;
// file_counter_ref is a plain pointer so it is copied into the closure,
// not taken by reference itself
let display_path_closure = |path: &Path| {
*file_counter_ref += 1;
println!("{}) path = {}", *file_counter_ref, path.display());
};
So the file_counter_ref reference lasts until the end of the block in which the closure is defined. In your case that is the whole main function, starting from the closure declaration. I agree this may be surprising, and I would certainly also expect closure environment borrows to die with the closure (e.g. when the closure is moved into the function and this function returns), but that's how things are now.
The situation with closures in Rust is currently unstable: unboxed closures have just been added to the language, so old boxed closures (like the one in this example) will soon go away; moreover, the borrow checker is also being improved. These features may interact in complex ways, so maybe your original example will become possible soon :) (just speculation, of course)
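For readers on current Rust: the closure syntax in this question predates Rust 1.0. A minimal sketch of the same pattern in today's syntax follows. visit_all and count_visited are hypothetical names, and a slice of strings stands in for the directory walk. Here the counter can be read after the call, because the closure that borrowed it has been moved into the callee and is gone by then.

```rust
/// Calls `cb` once per item; `FnMut` lets the callback mutate captured state.
fn visit_all(items: &[&str], mut cb: impl FnMut(&str)) {
    for &item in items {
        cb(item);
    }
}

/// Counts the visited items through a closure that mutates a captured counter.
fn count_visited(items: &[&str]) -> usize {
    let mut counter = 0;
    let count_closure = |_item: &str| {
        counter += 1;
    };
    visit_all(items, count_closure);
    // The closure was moved into `visit_all`, so its mutable borrow has ended.
    counter
}
```

For a more functional style, the counter can be avoided altogether, for instance by having the visiting function return an iterator and calling count() on it.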
