Issues with Rust lifetimes and ownership

I am trying to create a HashMap by reading a file. Below is the code that I have written. The twist is that I need to persist subset_description until the next iteration, so that I can store it in the HashMap and finally return the HashMap.
fn myfunction(filename: &Path) -> io::Result<HashMap<&str, &str>> {
    let mut SIF = HashMap::new();
    let file = File::open(filename).unwrap();
    let mut subset_description = "";
    for line in BufReader::new(file).lines() {
        let thisline = line?;
        let line_split: Vec<&str> = thisline.split("=").collect();
        subset_description = if thisline.starts_with("a") {
            let subset_description = line_split[1].trim();
            subset_description
        } else {
            ""
        };
        let subset_ids = if thisline.starts_with("b") {
            let subset_ids = line_split[1].split(",");
            let subset_ids = subset_ids.map(|s| s.trim());
            subset_ids.collect()
        } else {
            Vec::new()
        };
        for k in subset_ids {
            SIF.insert(k, subset_description);
            println!("");
        }
        if thisline.starts_with("!dataset_table_begin") {
            break;
        }
    }
    Ok(SIF)
}
I am getting the error below and am not able to resolve it:
error[E0515]: cannot return value referencing local variable `thisline`
--> src/main.rs:73:5
|
51 | let line_split: Vec<&str> = thisline.split("=").collect();
| -------- `thisline` is borrowed here
...
73 | Ok(SIF)
| ^^^^^^^ returns a value referencing data owned by the current function

The problem lies in the guarantees that Rust makes on your behalf. You are reading a file, turning its content into a HashMap, and trying to return references to the data you read. Those &str values point into thisline, a String that is dropped at the end of each loop iteration, so by returning them you would need to guarantee that this data outlives the function, which you naturally cannot do.
In Rust terms, you keep trying to return references to local variables, which are dropped before the function returns and would effectively leave you with dangling pointers. Here are the changes I made; even though they may not be the most efficient, they do compile.
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

fn myfunction(filename: &Path) -> io::Result<HashMap<String, String>> {
    let mut sif = HashMap::new();
    // Propagate the open error instead of panicking, since we return io::Result
    let file = File::open(filename)?;
    // Owned, declared outside the loop and only overwritten on "a" lines,
    // so the description is still available when a later "b" line is read
    let mut subset_description = String::new();
    for line in BufReader::new(file).lines() {
        let thisline = line?;
        let line_split: Vec<String> = thisline.split("=").map(|s| s.to_string()).collect();
        if thisline.starts_with("a") {
            subset_description = line_split[1].trim().to_string();
        }
        let subset_ids: Vec<String> = if thisline.starts_with("b") {
            line_split[1].split(",").map(|s| s.trim().to_string()).collect()
        } else {
            Vec::new()
        };
        for k in subset_ids {
            // The map owns both key and value, so nothing in it borrows from thisline
            sif.insert(k, subset_description.clone());
        }
        if thisline.starts_with("!dataset_table_begin") {
            break;
        }
    }
    Ok(sif)
}
As you can see, you now give away ownership of the strings in the return value. This is achieved by changing the return type to HashMap<String, String> and using the to_string() method, which creates owned copies of the borrowed slices so that the HashMap owns them.
There is an argument that to_string() is slow, so you could explore into() or to_owned() instead. For a &str they all produce the same owned String, and the standard library specializes to_string() for str, so in practice the choice is mostly a matter of style.
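A small sketch of the conversions mentioned above; owned_copies is a hypothetical helper written only for illustration. It shows that each call yields an independent String, which stays valid after the String it was sliced from has been dropped:

```rust
// Hypothetical helper: four common ways to turn a borrowed &str
// into an owned String. All four yield equal, independent Strings.
fn owned_copies(s: &str) -> [String; 4] {
    [
        s.to_string(),   // ToString
        s.to_owned(),    // ToOwned
        String::from(s), // From<&str>
        s.into(),        // Into<String>, provided via From<&str>
    ]
}

fn main() {
    let line = String::from("a = subset 1 ");
    // This slice borrows from `line`...
    let description: &str = line.split('=').nth(1).unwrap().trim();
    let copies = owned_copies(description);
    // ...but the copies do not, so they remain usable after `line` is dropped
    drop(line);
    println!("{:?}", copies);
}
```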

Related

How do I tackle lifetimes in Rust?

I am having issues with the concept of lifetimes in Rust. I am trying to use the bgpkit_parser crate to read a bz2 file from a URL and then build a radix trie.
One field extracted from the file is the AS path, which I have named path within the build_routetable function. I am having trouble understanding why Rust does not accept let origin = clean_path.last(), which takes the last element of the vector.
fn as_parser(element: &BgpElem) -> Vec<u32> {
    let x = &element.as_path.as_ref().unwrap().segments[0];
    let mut as_vec = &Vec::new();
    let mut as_path: Vec<u32> = Vec::new();
    if let AsPathSegment::AsSequence(value) = x {
        as_vec = value;
    }
    for i in as_vec {
        as_path.push(i.asn);
    }
    return as_path;
}
fn prefix_parser(element: &BgpElem) -> String {
    let subnet_id = element.prefix.prefix.ip().to_string().to_owned();
    let prefix_id = element.prefix.prefix.prefix().to_string().to_owned();
    let prefix = format!("{}/{}", subnet_id, prefix_id);//.as_str();
    return prefix;
}
fn get_aspath(raw_aspath: Vec<u32>) -> Vec<u32> {
    let mut as_path = Vec::new();
    for i in raw_aspath {
        if i < 64511 {
            if as_path.contains(&i) {
                continue;
            }
            else {
                as_path.push(i);
            }
        }
        else if 65535 < i && i < 4000000000 {
            if as_path.contains(&i) {
                continue;
            }
            else {
                as_path.push(i);
            }
        }
    }
    return as_path;
}
fn build_routetable(mut trie4: Trie<String, Option<&u32>>, mut trie6: Trie<String, Option<&u32>>) {
    let url: &str = "http://archive.routeviews.org/route-views.chile/\
        bgpdata/2022.06/RIBS/rib.20220601.0000.bz2";
    let parser = BgpkitParser::new(url).unwrap();
    let mut count = 0;
    for elem in parser {
        if elem.elem_type == bgpkit_parser::ElemType::ANNOUNCE {
            let record_timestamp = &elem.timestamp;
            let record_type = "A";
            let peer = &elem.peer_ip;
            let prefix = prefix_parser(&elem);
            let path = as_parser(&elem);
            let clean_path = get_aspath(path);
            // Issue is on the below line
            // `clean_path` does not live long enough
            // borrowed value does not live long
            // enough rustc E0597
            // main.rs(103, 9): `clean_path` dropped
            // here while still borrowed
            // main.rs(77, 91): let's call the
            // lifetime of this reference `'1`
            // main.rs(92, 17): argument requires
            // that `clean_path` is borrowed for `'1`
            let origin = clean_path.last(); //issue line
            if prefix.contains(":") {
                trie6.insert(prefix, origin);
            }
            else {
                trie4.insert(prefix, origin);
            }
            count+=1;
            if count >= 10000 {
                println!("{:?} | {:?} | {:?} | {:?} | {:?}",
                    record_type, record_timestamp, peer, prefix, path);
                count=0
            }
        };
    }
    println!("Trie4 size: {:?} prefixes", trie4.len());
    println!("Trie6 size: {:?} prefixes", trie6.len());
}
Short answer: you're "inserting" a reference. But what's being referenced doesn't outlive what it's being inserted into.
Longer: The hint is your trie4 argument, the signature of which is this:
mut trie4: Trie<String, Option<&u32>>
So the trie lives longer than the loop in which the values are declared. This is all in the loop:
let origin = clean_path.last(); //issue line
if prefix.contains(":") {
    trie6.insert(prefix, origin);
}
clean_path is a Vec<u32>, and that's fine, but origin is an Option<&u32> that borrows from it, and insert takes a String key and an Option<&u32> value. Here's your problem: a stored reference has to live as long as the collection, but origin points at the last element of clean_path, which is dropped at the end of each loop iteration. You can't put something into a container that will not live as long as the container itself! Rust has just saved you from dangling references (just like it's supposed to).
Basically, your containers should be Trie<String, Option<u32>> without the reference, and you should insert clean_path.last().copied(), which copies the u32 out of the vector; then this will all work fine. Storing the value costs nothing extra either: on a 64-bit target, Option<u32> and Option<&u32> are both 8 bytes.
Also of note: trie4 and trie6 will both be gone at the end of this function call, because they were moved into this function (not references or mutable references). I hope that's what you want.
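A minimal sketch of that fix, assuming a std HashMap as a stand-in for the question's Trie (its crate isn't shown), with RouteTable and insert_route as hypothetical names. The tables are passed as &mut so the caller keeps them, and the prefix is taken by reference and copied with to_string(). That matters because the question's loop also uses prefix and path in its println! after they have been moved into insert and get_aspath, which the compiler rejects as use of a moved value (E0382); passing &prefix here, and path.clone() to get_aspath, avoids that:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for Trie<String, Option<u32>>: the table stores
// the u32 itself rather than a reference into a short-lived Vec.
type RouteTable = HashMap<String, Option<u32>>;

fn insert_route(trie4: &mut RouteTable, trie6: &mut RouteTable, prefix: &str, clean_path: Vec<u32>) {
    // last() gives Option<&u32>; copied() turns it into an owned Option<u32>
    let origin = clean_path.last().copied();
    let table = if prefix.contains(':') { trie6 } else { trie4 };
    table.insert(prefix.to_string(), origin);
    // clean_path is dropped here, which is fine: nothing borrows from it
}

fn main() {
    let mut trie4 = RouteTable::new();
    let mut trie6 = RouteTable::new();
    insert_route(&mut trie4, &mut trie6, "192.0.2.0/24", vec![3356, 64496]);
    insert_route(&mut trie4, &mut trie6, "2001:db8::/32", vec![]);
    println!("Trie4 size: {:?} prefixes", trie4.len());
    println!("Trie6 size: {:?} prefixes", trie6.len());
}
```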

How to fix use of moved value in Rust?

I am trying to convert a YAML file to XML using Rust, and I am not able to figure out how to fix this error about the use of a moved value. I think I understand why the error occurs, but I haven't got a clue what to do next.
Here's the code:
struct Element {
    element_name: String,
    indentation_count: i16,
}
struct Attribute<'a> {
    attribute_name: &'a str,
    attribute_value: &'a str,
}
fn convert_yaml_to_xml(content: String, indentation_count: i16) -> String {
    let mut xml_elements: Vec<Element> = vec![];
    let mut attributes: Vec<Attribute> = vec![];
    xml_elements.push(Element {element_name: "xmlRoot".to_string(), indentation_count: -1});
    let mut target: Vec<u8> = Vec::new();
    let mut xml_data_writer = EmitterConfig::new().perform_indent(true).create_writer(&mut target);
    let mut attribute_written_flag = false;
    let mut xml_event;
    xml_event = XmlEvent::start_element("xmlRoot");
    for line in content.lines() {
        let current_line = line.trim();
        let caps = indentation_count_regex.captures(current_line).unwrap();
        let current_indentation_count = caps.get(1).unwrap().as_str().to_string().len() as i16;
        if ELEMENT_REGEX.is_match(current_line) {
            loop {
                let current_attribute_option = attributes.pop();
                match current_attribute_option {
                    Some(current_attribute_option) => {
                        xml_event.attr(current_attribute_option.attribute_name, current_attribute_option.attribute_value)
                    },
                    None => {
                        break;
                    },
                };
            }
            xml_data_writer.write(xml_event);
            // Checking if the line is an element
            let caps = ELEMENT_REGEX.captures(current_line).unwrap();
            let element_name = caps.get(2);
            let xml_element_struct = Element {
                indentation_count: current_indentation_count,
                element_name: element_name.unwrap().as_str().to_string(),
            };
            xml_elements.push(xml_element_struct);
            xml_event = XmlEvent::start_element(element_name.unwrap().as_str());
            attribute_written_flag = false;
        } else if ATTR_REGEX.is_match(current_line) {
            // Checking if the line is an attribute
            let caps = ATTR_REGEX.captures(current_line).unwrap();
            let attr_name = caps.get(2);
            let attr_value = caps.get(3);
            // Saving attributes to a stack
            attributes.push(Attribute{ attribute_name: attr_name.unwrap().as_str(), attribute_value: attr_value.unwrap().as_str() });
            // xml_event.attr(attr_name.unwrap().as_str(), attr_value.unwrap().as_str());
        }/* else if NEW_ATTR_SET_REGEX.is_match(current_line) {
            let caps = NEW_ATTR_SET_REGEX.captures(current_line).unwrap();
            let new_attr_set_name = caps.get(2);
            let new_attr_set_value = caps.get(3);
            current_xml_hash.insert("name".to_string(), new_attr_set_name.unwrap().as_str().to_string());
            current_xml_hash.insert("value".to_string(), new_attr_set_value.unwrap().as_str().to_string());
        } */
    }
    if attribute_written_flag {
        xml_data_writer.write(xml_event);
    }
    for item in xml_elements.iter() {
        let event = XmlEvent::end_element();
        let event_name = item.element_name.to_string();
        xml_data_writer.write(event.name(event_name.as_str()));
    }
    println!("OUTPUT");
    println!("{:?}", target);
    return "".to_string();
}
And here's the error:
error[E0382]: use of moved value: `xml_event`
--> src/main.rs:77:25
|
65 | let mut xml_event;
| ------------- move occurs because `xml_event` has type `StartElementBuilder<'_>`, which does not implement the `Copy` trait
...
77 | xml_event.attr(current_attribute_option.attribute_name, current_attribute_option.attribute_value)
| ^^^^^^^^^ --------------------------------------------------------------------------------------- `xml_event` moved due to this method call, in previous iteration of loop
|
note: this function takes ownership of the receiver `self`, which moves `xml_event`
--> /Users/defiant/.cargo/registry/src/github.com-1ecc6299db9ec823/xml-rs-0.8.4/src/writer/events.rs:193:24
|
193 | pub fn attr<N>(mut self, name: N, value: &'a str) -> StartElementBuilder<'a>
| ^^^^
From the XmlEvent::start_element() documentation, we see that it produces a StartElementBuilder<'a>.
From the StartElementBuilder<'a>::attr() documentation, we see that it consumes the StartElementBuilder<'a> (the first parameter is self, not &mut self) and produces a new StartElementBuilder<'a> (which is probably similar to self but includes the effect of .attr()).
This approach is known as the consuming builder pattern, and it is used in the standard library (for example, std::thread::Builder).
The typical usage consists of chaining the calls: something.a().b().c().d(), such that something is consumed by a(), its result is consumed by b(), likewise for c(), and finally d() does something useful with the last result.
The alternative would be to use mutable borrows to modify something in place, but dealing with mutable borrows is known to be difficult in some situations.
In your case, you can just reassign the result of .attr() to xml_event, as in xml_event = xml_event.attr(current_attribute_option.attribute_name, current_attribute_option.attribute_value). Otherwise the .attr() call has no effect (its result is discarded) and xml_event becomes unusable because it has been consumed; reassigning it makes it usable again afterwards (at least I guess so; I didn't try).
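To illustrate the reassignment without depending on xml-rs, here is a sketch with a hypothetical ElementBuilder whose attr method has the same shape as the one quoted in the error (it takes self by value and returns the builder). It is not the real StartElementBuilder; it only demonstrates the pattern inside a loop that pops attributes off a stack, as the question does:

```rust
// Hypothetical consuming builder, modelled on the shape of
// StartElementBuilder::attr(self, ...) -> Self (not the real xml-rs type).
#[derive(Debug)]
struct ElementBuilder {
    name: String,
    attrs: Vec<(String, String)>,
}

impl ElementBuilder {
    fn new(name: &str) -> Self {
        ElementBuilder { name: name.to_string(), attrs: Vec::new() }
    }

    // Takes `self` by value and hands back the updated builder
    fn attr(mut self, name: &str, value: &str) -> Self {
        self.attrs.push((name.to_string(), value.to_string()));
        self
    }
}

fn build(name: &str, mut pending: Vec<(&str, &str)>) -> ElementBuilder {
    let mut event = ElementBuilder::new(name);
    // Pop from the stack, like the question's loop does
    while let Some((n, v)) = pending.pop() {
        // `event.attr(n, v);` alone would move `event` and discard the result;
        // reassigning gives `event` a valid value for the next iteration
        event = event.attr(n, v);
    }
    event
}

fn main() {
    let event = build("server", vec![("host", "localhost"), ("port", "8080")]);
    println!("{:?}", event);
}
```

Outside a loop the same builder is usually chained instead, e.g. ElementBuilder::new("server").attr("host", "localhost").attr("port", "8080").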

re.captures error: borrowed value does not live long enough

Trying to complete the "Hash Maps" chapter of the Rust book at https://doc.rust-lang.org/book/2018-edition/ch08-03-hash-maps.html , with this code:
extern crate regex;
use std::collections::HashMap;
use std::io;
use regex::Regex;
fn get_command() -> String {
    let mut input_cmd = String::new();
    io::stdin().read_line(&mut input_cmd)
        .expect("Failed to read command");
    let input_cmd = input_cmd.trim();
    input_cmd.to_string()
}
fn main() {
    println!("Add someone by typing e.g. \"Add Sally to Engineering\", list everyone in a department by typing e.g. \"List everyone in Sales\", or list everyone by typing \"List everyone\". To quit, type \"Quit\".");
    let mut employees_by_dept: HashMap<&str, Vec<&str>> = HashMap::new();
    let add_to_dept_re = Regex::new("^Add ([A-Za-z]+) to ([A-Za-z]+)$").unwrap();
    let list_in_dept_re = Regex::new("^List everyone in ([A-Za-z]+)$").unwrap();
    let list_all_re = Regex::new("^List everyone$").unwrap();
    loop {
        let input_cmd = get_command();
        let caps = add_to_dept_re.captures(&input_cmd).unwrap();
        if add_to_dept_re.is_match(&input_cmd) {
            let dept_name = caps.get(2).unwrap().as_str();
            let employee_name = caps.get(1).unwrap().as_str();
            println!("Adding person");
            employees_by_dept.entry(&dept_name)
                .or_insert_with(Vec::new)
                .push(employee_name);
        } else if list_in_dept_re.is_match(&input_cmd) {
            println!("Listing people");
        } else if list_all_re.is_match(&input_cmd) {
            println!("Listing everyone");
        } else if input_cmd == "Quit" {
            break;
        } else {
            println!("Invalid command");
            break;
        }
    }
    println!("Bye!");
}
But I get this:
error[E0597]: `input_cmd` does not live long enough
--> src/main.rs:28:45
|
28 | let caps = add_to_dept_re.captures(&input_cmd).unwrap();
| ^^^^^^^^^ borrowed value does not live long enough
...
48 | }
| - `input_cmd` dropped here while still borrowed
...
51 | }
| - borrowed value needs to live until here
Have tried .captures(&input_cmd.clone()) and various other things, but doesn't help. Any ideas?
Rust's memory safety rules prevent this approach: your HashMap outlives the strings that its keys and values borrow from.
See the embedded comments below, but especially the Ownership chapter of the book.
fn main() {
    let mut employees_by_dept: HashMap<&str, Vec<&str>> = HashMap::new();
    let add_to_dept_re = Regex::new("^Add ([A-Za-z]+) to ([A-Za-z]+)$").unwrap();
    let list_in_dept_re = Regex::new("^List everyone in ([A-Za-z]+)$").unwrap();
    let list_all_re = Regex::new("^List everyone$").unwrap();
    loop {
        let input_cmd = get_command();
        let caps = add_to_dept_re.captures(&input_cmd).unwrap(); // <--- input_cmd is borrowed here
        // ... code for getting the dept_name and employee_name references
        // and inserting them into the HashMap omitted
    } // <----- The String input_cmd is dropped here (its memory is freed);
      // this implies that the dept_name and employee_name references
      // point to deallocated memory
    // ... At this point you would have a live employees_by_dept HashMap
    // that contains references to deallocated memory
    println!("Bye!");
}
Instead, make the HashMap take ownership of its keys and values:
fn main() {
    println!("Add someone by typing e.g. \"Add Sally to Engineering\", list everyone in a department by typing e.g. \"List everyone in Sales\", or list everyone by typing \"List everyone\". To quit, type \"Quit\".");
    let mut employees_by_dept: HashMap<String, Vec<String>> = HashMap::new();
    let add_to_dept_re = Regex::new("^Add ([A-Za-z]+) to ([A-Za-z]+)$").unwrap();
    let list_in_dept_re = Regex::new("^List everyone in ([A-Za-z]+)$").unwrap();
    let list_all_re = Regex::new("^List everyone$").unwrap();
    loop {
        let input_cmd = get_command();
        // captures() returns None when the line doesn't match, so test it
        // instead of calling unwrap() (which would panic on e.g. "Quit")
        if let Some(caps) = add_to_dept_re.captures(&input_cmd) {
            let dept_name = caps.get(2).unwrap().as_str();
            let employee_name = caps.get(1).unwrap().as_str();
            println!("Adding person");
            // to_string() makes owned copies, so the map no longer
            // borrows from input_cmd, which is dropped every iteration
            employees_by_dept
                .entry(dept_name.to_string())
                .or_insert_with(Vec::new)
                .push(employee_name.to_string());
        } else if list_in_dept_re.is_match(&input_cmd) {
            println!("Listing people");
        } else if list_all_re.is_match(&input_cmd) {
            println!("Listing everyone");
        } else if input_cmd == "Quit" {
            break;
        } else {
            println!("Invalid command");
            break;
        }
    }
    println!("Bye!");
}
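For illustration, here is a std-only sketch of the same idea. It swaps the regex for str::strip_prefix and str::split_once, purely to keep the example dependency-free, and add_employee is a hypothetical helper. Each input String is dropped at the end of its loop iteration, just like get_command()'s result, yet the map stays valid because it owns copies:

```rust
use std::collections::HashMap;

// Hypothetical helper: parses "Add <name> to <dept>" with plain string
// methods instead of the regex crate, and stores owned copies.
fn add_employee(map: &mut HashMap<String, Vec<String>>, input: &str) -> bool {
    let rest = match input.strip_prefix("Add ") {
        Some(rest) => rest,
        None => return false,
    };
    match rest.split_once(" to ") {
        Some((name, dept)) => {
            // to_string() copies the slices out of `input`, so the map
            // does not borrow from a String that is about to be dropped
            map.entry(dept.to_string())
                .or_insert_with(Vec::new)
                .push(name.to_string());
            true
        }
        None => false,
    }
}

fn main() {
    let mut employees_by_dept: HashMap<String, Vec<String>> = HashMap::new();
    for line in ["Add Sally to Engineering", "Add Amir to Sales", "Quit"] {
        // A fresh String per iteration, dropped at the end of the loop body
        let input_cmd = line.to_string();
        add_employee(&mut employees_by_dept, &input_cmd);
    }
    println!("{:?}", employees_by_dept);
}
```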

How should I read the contents of a file respecting endianess?

I can see that in Rust I can read a file to a byte array with:
File::open(&Path::new("fid")).read_to_end();
I can also read just one u32 in either big endian or little endian format with:
File::open(&Path::new("fid")).read_be_u32();
File::open(&Path::new("fid")).read_le_u32();
but as far as I can see, I'm going to have to do something like this (simplified):
let path = Path::new("fid");
let mut file = File::open(&path);
let mut v = vec![];
for n in range(1u64, path.stat().unwrap().size/4u64){
    v.push(if big {
        file.read_be_u32()
    } else {
        file.read_le_u32()
    });
}
But that's ugly as hell and I'm just wondering if there's a nicer way to do this.
OK, so the if in the loop was a big part of what was ugly, so I hoisted it out as suggested. The new version is as follows:
let path = Path::new("fid");
let mut file = File::open(&path);
let mut v = vec![];
let fun = if big {
    ||->IoResult<u32>{file.read_be_u32()}
} else {
    ||->IoResult<u32>{file.read_le_u32()}
};
for n in range(1u64, path.stat().unwrap().size/4u64){
    v.push(fun());
}
Learned about range_step and using _ as an index, so now I'm left with:
let path = Path::new("fid");
let mut file = File::open(&path);
let mut v = vec![];
let fun = if big {
    ||->IoResult<u32>{file.read_be_u32()}
} else {
    ||->IoResult<u32>{file.read_le_u32()}
};
for _ in range_step(0u64, path.stat().unwrap().size,4u64){
    v.push(fun().unwrap());
}
Any more advice? This is already looking much better.
This solution reads the whole file into a buffer, then creates a view of the buffer as words, then maps those words into a vector, converting endianness. collect() avoids all the reallocations of growing a mutable vector. You could also mmap the file rather than reading it into a buffer.
use std::io::File;
use std::num::{Int, Num};
fn from_bytes<'a, T: Num>(buf: &'a [u8]) -> &'a [T] {
    unsafe {
        std::mem::transmute(std::raw::Slice {
            data: buf.as_ptr(),
            len: buf.len() / std::mem::size_of::<T>()
        })
    }
}
fn main() {
    let buf = File::open(&Path::new("fid")).read_to_end().unwrap();
    let words: &[u32] = from_bytes(buf.as_slice());
    let big = true;
    let v: Vec<u32> = words.iter().map(if big {
        |&n| { Int::from_be(n) }
    } else {
        |&n| { Int::from_le(n) }
    }).collect();
    println!("{}", v);
}
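The code above targets pre-1.0 Rust: std::raw::Slice, std::num::Int and std::io::File have since been removed, and reinterpreting a &[u8] buffer as &[u32] also assumes a 4-byte alignment that a byte buffer does not guarantee. A sketch of the same read-everything-then-convert idea with today's standard library, swapping the unsafe view for chunks_exact plus u32::from_be_bytes / u32::from_le_bytes (words_from_bytes and read_words are hypothetical names):

```rust
use std::fs;
use std::io;
use std::path::Path;

// Decodes a byte buffer as consecutive u32 words in the given byte order.
// Trailing bytes that don't fill a whole word are ignored.
fn words_from_bytes(buf: &[u8], big: bool) -> Vec<u32> {
    buf.chunks_exact(4)
        .map(|c| {
            // chunks_exact(4) guarantees exactly 4 bytes per chunk
            let bytes = [c[0], c[1], c[2], c[3]];
            if big {
                u32::from_be_bytes(bytes)
            } else {
                u32::from_le_bytes(bytes)
            }
        })
        .collect()
}

fn read_words(path: &Path, big: bool) -> io::Result<Vec<u32>> {
    // Read the whole file at once, then decode; no unsafe reinterpretation
    let buf = fs::read(path)?;
    Ok(words_from_bytes(&buf, big))
}

fn main() -> io::Result<()> {
    // Write a small sample file so the example is self-contained
    let path = std::env::temp_dir().join("fid_example.bin");
    fs::write(&path, [0x00u8, 0x00, 0x01, 0x02, 0xAA])?;
    let v = read_words(&path, true)?;
    println!("{:?}", v); // prints [258]; the trailing 0xAA byte is ignored
    Ok(())
}
```

Like the answer, this hoists the endianness choice out of the file reading; collect() can size the Vec from the iterator's length hint.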

Finding a way to solve "...does not live long enough"

I'm building a multiplex in Rust. It's one of my first applications and a great learning experience!
However, I'm facing a problem and cannot figure out how to solve it in Rust:
Whenever a new channel is added to the multiplex, I have to listen for data on this channel.
The new channel is allocated on the stack when it is requested by the open() function.
However, this channel must not be allocated on the stack but on the heap somehow, because it should stay alive and should not be freed in the next iteration of my receiving loop.
Right now my code looks like this (v0.10-pre):
extern crate collections;
extern crate sync;
use std::comm::{Chan, Port, Select};
use std::mem::size_of_val;
use std::io::{ChanWriter, PortReader};
use collections::hashmap::HashMap;
use sync::{rendezvous, SyncPort, SyncChan};
use std::task::try;
use std::rc::Rc;
struct MultiplexStream {
    internal_port: Port<(u32, Option<(Port<~[u8]>, Chan<~[u8]>)>)>,
    internal_chan: Chan<u32>
}
impl MultiplexStream {
    fn new(downstream: (Port<~[u8]>, Chan<~[u8]>)) -> ~MultiplexStream {
        let (downstream_port, downstream_chan) = downstream;
        let (p1, c1): (Port<u32>, Chan<u32>) = Chan::new();
        let (p2, c2):
            (Port<(u32, Option<(Port<~[u8]>, Chan<~[u8]>)>)>,
             Chan<(u32, Option<(Port<~[u8]>, Chan<~[u8]>)>)>) = Chan::new();
        let mux = ~MultiplexStream {
            internal_port: p2,
            internal_chan: c1
        };
        spawn(proc() {
            let mut pool = Select::new();
            let mut by_port_num = HashMap::new();
            let mut by_handle_id = HashMap::new();
            let mut handle_id2port_num = HashMap::new();
            let mut internal_handle = pool.handle(&p1);
            let mut downstream_handle = pool.handle(&downstream_port);
            unsafe {
                internal_handle.add();
                downstream_handle.add();
            }
            loop {
                let handle_id = pool.wait();
                if handle_id == internal_handle.id() {
                    // setup new port
                    let port_num: u32 = p1.recv();
                    if by_port_num.contains_key(&port_num) {
                        c2.send((port_num, None))
                    }
                    else {
                        let (p1_,c1_): (Port<~[u8]>, Chan<~[u8]>) = Chan::new();
                        let (p2_,c2_): (Port<~[u8]>, Chan<~[u8]>) = Chan::new();
                        /********************************/
                        let mut h = pool.handle(&p1_); // <--
                        /********************************/
                        /* the error is HERE ^^^ */
                        /********************************/
                        unsafe { h.add() };
                        by_port_num.insert(port_num, c2_);
                        handle_id2port_num.insert(h.id(), port_num);
                        by_handle_id.insert(h.id(), h);
                        c2.send((port_num, Some((p2_,c1_))));
                    }
                }
                else if handle_id == downstream_handle.id() {
                    // demultiplex
                    let res = try(proc() {
                        let mut reader = PortReader::new(downstream_port);
                        let port_num = reader.read_le_u32().unwrap();
                        let data = reader.read_to_end().unwrap();
                        return (port_num, data);
                    });
                    if res.is_ok() {
                        let (port_num, data) = res.unwrap();
                        by_port_num.get(&port_num).send(data);
                    }
                    else {
                        // TODO: handle error
                    }
                }
                else {
                    // multiplex
                    let h = by_handle_id.get_mut(&handle_id);
                    let port_num = handle_id2port_num.get(&handle_id);
                    let port_num = *port_num;
                    let data = h.recv();
                    try(proc() {
                        let mut writer = ChanWriter::new(downstream_chan);
                        writer.write_le_u32(port_num);
                        writer.write(data);
                        writer.flush();
                    });
                    // todo check if chan was closed
                }
            }
        });
        return mux;
    }
    fn open(self, port_num: u32) -> Result<(Port<~[u8]>, Chan<~[u8]>), ()> {
        let res = try(proc() {
            self.internal_chan.send(port_num);
            let (n, res) = self.internal_port.recv();
            assert!(n == port_num);
            return res;
        });
        if res.is_err() {
            return Err(());
        }
        let res = res.unwrap();
        if res.is_none() {
            return Err(());
        }
        let (p,c) = res.unwrap();
        return Ok((p,c));
    }
}
And the compiler raises this error:
multiplex_stream.rs:81:31: 81:35 error: `p1_` does not live long enough
multiplex_stream.rs:81 let mut h = pool.handle(&p1_);
^~~~
multiplex_stream.rs:48:16: 122:4 note: reference must be valid for the block at 48:15...
multiplex_stream.rs:48 spawn(proc() {
multiplex_stream.rs:49 let mut pool = Select::new();
multiplex_stream.rs:50 let mut by_port_num = HashMap::new();
multiplex_stream.rs:51 let mut by_handle_id = HashMap::new();
multiplex_stream.rs:52 let mut handle_id2port_num = HashMap::new();
multiplex_stream.rs:53
...
multiplex_stream.rs:77:11: 87:7 note: ...but borrowed value is only valid for the block at 77:10
multiplex_stream.rs:77 else {
multiplex_stream.rs:78 let (p1_,c1_): (Port<~[u8]>, Chan<~[u8]>) = Chan::new();
multiplex_stream.rs:79 let (p2_,c2_): (Port<~[u8]>, Chan<~[u8]>) = Chan::new();
multiplex_stream.rs:80
multiplex_stream.rs:81 let mut h = pool.handle(&p1_);
multiplex_stream.rs:82 unsafe { h.add() };
Does anyone have an idea how to solve this issue?
The problem is that the new channel you create does not live long enough: its scope is only that of the else block. You need to ensure that it lives longer; its scope must be at least that of pool.
I haven't made the effort to understand precisely what your code is doing, but what I would expect to be the simplest way to ensure the ports live long enough is to place them into a vector at the same scope as pool, e.g. let ports = ~[];, inserting each one with ports.push(p1_); and then taking the reference as &ports[ports.len() - 1]. Sorry, that won't cut it: you can't add new items to a vector while references to its elements are active. You'll need to restructure things somewhat if you want that approach to work.
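The 0.10-era Select/Port API in the question no longer exists, so this cannot be a drop-in fix. As a hedged sketch of the kind of restructuring meant above, here is a modern-std version using std::sync::mpsc: the receivers are owned by a Vec that lives as long as the rest of the multiplexer state, and lookups go through an index stored in a HashMap, so no long-lived reference into the Vec is held while new receivers are pushed. Mux, open and try_recv are hypothetical names, not part of any library:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

// The Vec owns every receiver (the role p1_ plays in the question) at the
// same scope as the rest of the state; other code refers to them by index.
struct Mux {
    ports: Vec<Receiver<Vec<u8>>>,
    by_port_num: HashMap<u32, usize>, // port number -> index into `ports`
}

impl Mux {
    fn new() -> Self {
        Mux { ports: Vec::new(), by_port_num: HashMap::new() }
    }

    // Creates a channel for `port_num` and returns its sending half.
    // Pushing is fine because no reference into `ports` outlives a call.
    fn open(&mut self, port_num: u32) -> Option<Sender<Vec<u8>>> {
        if self.by_port_num.contains_key(&port_num) {
            return None;
        }
        let (tx, rx) = channel();
        self.ports.push(rx);
        self.by_port_num.insert(port_num, self.ports.len() - 1);
        Some(tx)
    }

    // Borrows a receiver only for the duration of this call
    fn try_recv(&self, port_num: u32) -> Option<Vec<u8>> {
        let idx = *self.by_port_num.get(&port_num)?;
        self.ports[idx].try_recv().ok()
    }
}

fn main() {
    let mut mux = Mux::new();
    let tx = mux.open(7).expect("port 7 is new");
    assert!(mux.open(7).is_none());
    tx.send(b"hello".to_vec()).unwrap();
    println!("{:?}", mux.try_recv(7));
}
```

Storing indices (or keys) instead of references is a common way around "cannot push while borrowed": the Vec can grow freely because each borrow of an element lasts only as long as a single method call.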
