Access value after it has been borrowed - rust

I have the following function. It is given a file. It should return a random line from the file as a string.
fn get_word(word_list: File) -> String {
    let reader = BufReader::new(word_list);
    let lines = reader.lines();
    let word_count = lines.count();
    let y: usize = thread_rng().gen_range(0, word_count - 1);
    let element = lines.nth(y);
    match element {
        Some(x) => println!("Result: {}", x.unwrap()),
        None => println!("Error with nth"),
    }
    let word = String::new(""); // Once the error is gone. I would create the string.
    return word;
}
But I keep getting this error:
93 | let lines = reader.lines();
| ----- move occurs because `lines` has type `std::io::Lines<BufReader<File>>`, which does not implement the `Copy` trait
94 | let word_count = lines.count();
| ------- `lines` moved due to this method call
...
99 | let element = lines.nth(y);
| ^^^^^^^^^^^^ value borrowed here after move
|
I am new to Rust and have been learning by trial and error. I don't know how to access the data after I have called the count function. If there is another way to accomplish what I want, I would gladly welcome it.

The .count() method consumes the iterator. From the documentation:
Consumes the iterator, counting the number of iterations and returning it.
This method will call next repeatedly until None is encountered, returning the number of times it saw Some. Note that next has to be called at least once even if the iterator does not have any elements.
In other words, it reads the file content and discards it. If you want to get the Nth line, then you have to re-read the file using another iterator instance.
If your file is small, you can save the read lines in a vector:
// lines() yields io::Result<String>, so collect into a Result and unwrap it
let lines = reader.lines().collect::<Result<Vec<String>, _>>().unwrap();
Then the length of the vector is the number of lines and you can avoid re-reading the file, but if it's a large file you may end up crashing with an "out of memory" error. In that case you should re-read the file content, or use a better strategy such as indexing where the line breaks are, so you can jump straight to the line you want without having to re-read a lot of data.
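To illustrate the "re-read the file" option, here is a minimal sketch that counts the lines, rewinds, and reads again with a fresh iterator. The function name line_by_rereading and the pick closure (a stand-in for a random number generator) are assumptions made for this example, and Cursor replaces a real File so that it runs without a file on disk.

```rust
use std::io::{self, BufRead, BufReader, Cursor, Read, Seek, SeekFrom};

/// Two-pass selection: count the lines, rewind, then read up to the chosen one.
/// `pick` is a hypothetical stand-in for a random number generator: given the
/// line count, it must return an index below that count.
fn line_by_rereading<F: Read + Seek>(
    mut file: F,
    pick: impl FnOnce(usize) -> usize,
) -> io::Result<Option<String>> {
    // First pass: count the lines (this iterator is consumed here).
    let mut count = 0;
    for line in BufReader::new(&mut file).lines() {
        line?;
        count += 1;
    }
    if count == 0 {
        return Ok(None);
    }
    let index = pick(count);

    // Rewind and create a fresh iterator for the second pass.
    file.seek(SeekFrom::Start(0))?;
    BufReader::new(&mut file).lines().nth(index).transpose()
}

fn main() {
    // Cursor stands in for a File so the example needs no file on disk.
    let data = Cursor::new("alpha\nbeta\ngamma\n");
    let line = line_by_rereading(data, |count| count - 1).unwrap();
    println!("{:?}", line);
}
```

This keeps memory use constant at the cost of reading the file twice.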

The value returned by lines is an iterator, which reads the file sequentially. To count the number of lines, the iterator is consumed: self is taken by value; ownership is transferred into the count() function. So you can't rewind and then request the nth line.
The easiest solution is to read all the lines into a vector:
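// lines() yields io::Result<String>, so collect into a Result and unwrap it
let lines = reader.lines().collect::<Result<Vec<String>, _>>().unwrap();
let word_count = lines.len();
// The upper bound of gen_range is exclusive, so pass word_count, not word_count - 1
let y: usize = thread_rng().gen_range(0, word_count);
let word = lines[y].clone();
return word;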
Notice the clone call: you can't simply write return lines[y]; because that would try to move the string out of the vector, which indexing does not allow. Returning a reference to it is not an option either, because the vector is destroyed as soon as the function returns. Returning a clone of the string avoids both problems.
(to_owned or even to_string would also work. You can also avoid a copy by using swap_remove; I'm not sure there is a more elegant way to move one element from a vector and discard the rest.)
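A minimal sketch of the swap_remove variant mentioned above. The helper name take_line is made up for this example; it takes the vector by value, moves one element out, and lets the rest drop.

```rust
/// Moves the element at `index` out of the vector without cloning it.
/// The rest of the vector is dropped when the function returns.
fn take_line(mut lines: Vec<String>, index: usize) -> Option<String> {
    if index < lines.len() {
        // swap_remove swaps the element with the last one and pops it:
        // constant time, and the String itself is moved, not copied.
        Some(lines.swap_remove(index))
    } else {
        None
    }
}

fn main() {
    let lines = vec!["one".to_string(), "two".to_string(), "three".to_string()];
    println!("{:?}", take_line(lines, 1));
}
```

swap_remove does not preserve the order of the remaining elements, which is irrelevant here because they are discarded anyway.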

Note that counting the lines and then selecting one of them requires you to either rewind the iterator and go through it twice (once to count and once to select), or to store everything in memory first (e.g. with .collect::<Vec<_>>). Selecting a random line from the list can however be done in a single pass by randomly choosing, on each line, whether to keep the currently selected line or replace it with the line just read:
fn get_word(word_list: File) -> String {
    let reader = BufReader::new(word_list);
    let mut lines = reader.lines();
    // Start with the first line selected (panics if the file is empty).
    let mut selected = lines.next().unwrap();
    // Number of lines seen so far, including the first one.
    let mut count = 1;
    for l in lines {
        count += 1;
        // Replace the selection with probability 1/count.
        if thread_rng().gen_range(0, count) == 0 {
            selected = l;
        }
    }
    match selected {
        Ok(x) => return x,
        Err(_) => {
            print!("Error get_word");
            return String::new();
        }
    }
}
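Why a single pass is enough: a short derivation, assuming the k-th line read replaces the current selection with probability 1/k (so the first line is always selected to begin with) and n is the total number of lines. Line k ends up as the result if it is selected when read and never replaced afterwards:

```latex
P(\text{line } k \text{ is the final selection})
  = \frac{1}{k} \prod_{j=k+1}^{n} \left(1 - \frac{1}{j}\right)
  = \frac{1}{k} \prod_{j=k+1}^{n} \frac{j-1}{j}
  = \frac{1}{k} \cdot \frac{k}{n}
  = \frac{1}{n}
```

The product telescopes, so every line has the same probability 1/n regardless of its position.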
Or of course the simplest way is to just use choose:
fn get_word(word_list: File) -> String {
    use rand::seq::IteratorRandom;
    let reader = BufReader::new(word_list);
    // lines is a method, and choose takes the RNG by mutable reference
    match reader.lines().choose(&mut thread_rng()) {
        Some(Ok(x)) => return x,
        _ => {
            print!("Error get_word");
            return String::new();
        }
    }
}

To solve this problem I used the suggested approach of collecting the lines into a vector, but it needed a little more work, at least in my case.
First: .lines() returns an iterator whose items are of type Result<std::string::String, std::io::Error>, so the vector holds Results rather than plain strings.
Second: to read a value from this vector without moving it out, I have to borrow it with &.
Here the working function:
fn get_word(word_list: File) -> String {
    let reader = BufReader::new(word_list);
    let lines = reader.lines().collect::<Vec<_>>();
    let word_count = lines.len();
    // The upper bound of gen_range is exclusive: word_count - 1 would never
    // pick the last line. This still panics if the file is empty.
    let y: usize = thread_rng().gen_range(0, word_count);
    match &lines[y] {
        Ok(x) => return x.to_string(),
        Err(_) => {
            print!("Error get_word");
            return String::new();
        }
    }
}
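As an alternative to keeping a vector of Results, the errors can be handled once while collecting. A minimal sketch, assuming a hypothetical helper read_all_lines and using Cursor in place of a File:

```rust
use std::io::{self, BufRead, Cursor};

/// Collects every line, stopping at the first I/O error.
/// Result implements FromIterator, so an iterator of io::Result<String>
/// can be collected straight into io::Result<Vec<String>>.
fn read_all_lines<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    reader.lines().collect()
}

fn main() {
    let lines = read_all_lines(Cursor::new("red\ngreen\nblue\n")).unwrap();
    println!("{} lines, first = {}", lines.len(), lines[0]);
}
```

After this, lines[y] is a plain String and no per-element match on Ok/Err is needed.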

Related

Creating struct with values from function parameter Vec<String> and returning Vec<struct> to caller

The purpose of my program is to read questions/answers from a file (line by line), and create several structs from it, put into a Vec for further processing.
I have a rather long piece of code, which I tried to separate into several functions (full version on Playground; hopefully the link is valid).
I suppose there is a lot I don't understand about borrowing, lifetimes and other things. Apart from that, I haven't been able to adapt the examples I've seen elsewhere to my problem.
Trying to change my struct fields from &str to String didn't change anything. Neither did creating the Vec<Question> within get_question_list.
Function of concern is as follows:
fn get_question_list<'a>(mut questions: Vec<Question<'a>>, lines: Vec<String>) -> Vec<Question<'a>> {
    let count = lines.len();
    for i in (0..count).step_by(2) {
        let q: &str = lines.get(i).unwrap();
        let a: &str = lines.get(i + 1).unwrap();
        questions.push(Question::new(q, a));
    }
    questions
}
This code fails to compile with the following error (excerpt):
error[E0597]: `lines` does not live long enough
--> src/main.rs:126:23
|
119 | fn get_question_list<'a>(mut questions: Vec<Question<'a>>, lines: Vec<String>) -> Vec<Question<'a>> {
| -- lifetime `'a` defined here
...
126 | let a: &str = lines.get(i + 1).unwrap();
| ^^^^^ borrowed value does not live long enough
127 |
128 | questions.push(Question::new(q, a));
| ----------------------------------- argument requires that `lines` is borrowed for `'a`
...
163 | }
| - `lines` dropped here while still borrowed
The call to get_question_list looks roughly like this:
let lines: Vec<String> = content.split("\n").map(|s| s.to_string()).collect();
let counter = lines.len();
if counter % 2 != 0 {
return Err("Found lines in quiz file are not even (one question or answer is missing.).");
}
questions = get_question_list(questions, lines);
Ok(questions)
The issue is that your Questions are supposed to borrow something (hence the lifetime annotation), but lines gets moved into the function, so when you create a new question from a line, it's borrowing function-local data, which is going to be destroyed at the end of the function. As a consequence, the questions you're creating can't escape the function creating them.
Now what you could do is not move the lines into the function: lines: &[String] would have the lines be owned by the caller, which would "fix" get_question_list.
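A minimal sketch of that borrowed variant. The Question struct shown here is an assumption about the original (two &str fields); only the signature change to lines: &[String] comes from the text above.

```rust
// Assumed shape of the original struct: it borrows both strings.
#[derive(Debug, PartialEq)]
struct Question<'a> {
    question: &'a str,
    answer: &'a str,
}

/// The lines stay owned by the caller, so the returned questions may
/// borrow from them for as long as the caller keeps `lines` alive.
fn get_question_list<'a>(lines: &'a [String]) -> Vec<Question<'a>> {
    lines
        .chunks_exact(2)
        .map(|pair| Question {
            question: pair[0].as_str(),
            answer: pair[1].as_str(),
        })
        .collect()
}

fn main() {
    let lines = vec!["2 + 2?".to_string(), "4".to_string()];
    let questions = get_question_list(&lines);
    println!("{:?}", questions);
}
```

The questions cannot outlive lines, which is exactly the restriction that becomes a problem one level up.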
However the exact same problem exists in read_questions_from_file, and there it can not be resolved: the lines are read from a file, and thus are necessarily local to the function (unless you move the lines-reading to main and read_questions_from_file only borrows them as well).
Therefore the simplest proper fix is to change Question to own its data:
struct Question {
question: String,
answer: String
}
This way the question itself keeps its data alive, and the issue goes away.
We can improve things further though, I think:
First, we can strip out the entire mess around newlines by using str::lines (available on String through deref): it handles cross-platform line breaks and strips them.
It also seems rather odd that get_question_list takes a vector by value only to append to it and immediately return it. A more intuitive interface would be to either:
take the "output vector" by &mut so the caller can pre-size or reuse it across multiple loads, which doesn't really seem useful in this case
or create the output vector internally, which seems like the most sensible case here
Here is what I would consider a more pleasing version: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c0d440d67654b92c75d136eba2bba0c1
fn read_questions_from_file(filename: &str) -> Result<Vec<Question>, Box<dyn Error>> {
    let file_content = read_file(filename)?;
    let lines: Vec<_> = file_content.lines().collect();
    if lines.len() % 2 != 0 {
        return Err(Box::new(OddLines));
    }
    let mut questions = Vec::with_capacity(lines.len() / 2);
    for chunk in lines.chunks(2) {
        if let [q, a] = chunk {
            questions.push(Question::new(q.to_string(), a.to_string()))
        } else {
            unreachable!("Odd lines should already have been checked");
        }
    }
    Ok(questions)
}
Note that I inlined / removed get_question_list as I don't think it pulls its weight at this point, and it's both trivial and very specific.
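The snippets in this answer rely on a few items that live in the linked playground and are not reproduced here. The following is only a guess at what they might look like, so that the snippets can be read on their own: the field names, the error message, and the body of read_file are all assumptions.

```rust
use std::error::Error;
use std::fmt;
use std::fs;

// Owned version of the struct, as suggested above.
#[derive(Debug)]
struct Question {
    question: String,
    answer: String,
}

impl Question {
    fn new(question: String, answer: String) -> Self {
        Question { question, answer }
    }
}

/// Hypothetical error for a file with an odd number of lines.
#[derive(Debug)]
struct OddLines;

impl fmt::Display for OddLines {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "odd number of lines: a question or an answer is missing")
    }
}

// Implementing Error lets `?` convert OddLines into Box<dyn Error>.
impl Error for OddLines {}

/// Assumed helper: reads the whole file into a String.
fn read_file(filename: &str) -> Result<String, std::io::Error> {
    fs::read_to_string(filename)
}

fn main() {
    let err: Box<dyn Error> = Box::new(OddLines);
    println!("{}", err);
    let q = Question::new("2 + 2?".to_string(), "4".to_string());
    println!("{:?}", q);
    println!("{:?}", read_file("no-such-file.txt").is_err());
}
```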
Here is a variant which works similarly but with different tradeoffs: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3b8f95aef5bcae904545617749086dbc
fn read_questions_from_file(filename: &str) -> Result<Vec<Question>, Box<dyn Error>> {
    let file_content = read_file(filename)?;
    let mut lines = file_content.lines();
    let mut questions = Vec::new();
    while let Some(q) = lines.next() {
        let a = lines.next().ok_or(OddLines)?;
        questions.push(Question::new(q.to_string(), a.to_string()));
    }
    Ok(questions)
}
It avoids collecting the lines to a Vec, but as a result has to process the file to the end before it knows that said file is suitable, and it can't preallocate Questions.
At this point, because we do not care for lines being a Vec anymore, we could operate on a BufRead and strip out read_file as well:
fn read_questions_from_file(filename: &str) -> Result<Vec<Question>, Box<dyn Error>> {
    let file_content = BufReader::new(File::open(filename)?);
    let mut lines = file_content.lines();
    let mut questions = Vec::new();
    while let Some(q) = lines.next() {
        let a = lines.next().ok_or(OddLines)?;
        questions.push(Question::new(q?, a?));
    }
    Ok(questions)
}
The extra ? are because while str::Lines yields &str, io::Lines yields Result<String, io::Error>: IO errors are reported lazily when a read is attempted, meaning every line-read could report a failure if read_to_string would have failed.
OTOH since io::Lines returns a Result<String, ...> we can use q and a directly without needing to convert them to String.

Iterating over lines in a file and looking for substring from a vec! in rust

I'm writing a project in which a struct System can be constructed from a data file.
In the data file, some lines contain keywords that indicate values to be read either inside the line or in the N lines that follow (separated from the keyword line by a blank line).
I would like to have a vec! containing the keywords (statically known at compile time), check if the line returned by the iterator contains the keyword and do the appropriate operations.
Now my code looks like this:
impl System {
fn read_data<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>> where P: AsRef<Path> {
let file = File::open(filename)?;
let f = BufReader::new(file);
Ok(f.lines())
}
...
pub fn new_from_data<P>(dataname: P) -> System where P: AsRef<Path> {
let keywd = vec!["atoms", "atom types".into(),
"Atoms".into()];
let mut sys = System::new();
if let Ok(mut lines) = System::read_data(dataname) {
while let Some(line) = lines.next() {
for k in keywd {
let split: Vec<&str> = line.unwrap().split(" ").collect();
if split.contains(k) {
match k {
"atoms" => sys.natoms = split[0].parse().unwrap(),
"atom types" => sys.ntypes = split[0].parse().unwrap(),
"Atoms" => {
lines.next();
// assumes fields are: atom-ID molecule-ID atom-type q x y z
for _ in 1..=sys.natoms {
let atline = lines.next().unwrap().unwrap();
let data: Vec<&str> = atline.split(" ").collect();
let atid: i32 = data[0].parse().unwrap();
let molid: i32 = data[1].parse().unwrap();
let atype: i32 = data[2].parse().unwrap();
let charge: f32 = data[3].parse().unwrap();
let x: f32 = data[4].parse().unwrap();
let y: f32 = data[5].parse().unwrap();
let z: f32 = data[6].parse().unwrap();
let at = Atom::new(atid, molid, atype, charge, x, y, z);
sys.atoms.push(at);
};
},
_ => (),
}
}
}
}
}
sys
}
}
I'm very unsure on two points:
I don't know if I treated the line-by-line reading of the file in an idiomatic way, as I adapted some examples from the book and Rust by Example. Returning an iterator makes me wonder when and how to unwrap the results. For example, when calling the iterator inside the while loop, do I have to unwrap twice, as in let atline = lines.next().unwrap().unwrap();? I think the compiler does not complain about this yet only because it stops at the first error it encounters, described next.
I cannot wrap my head around the type to give to the value k, as I get a typical:
error[E0308]: mismatched types
--> src/system/system.rs:65:39
|
65 | if split.contains(k) {
| ^ expected `&str`, found `str`
|
= note: expected reference `&&str`
found reference `&str`
error: aborting due to previous error
How are we supposed to declare the substring and compare it to the strings I put in keywd? I tried to dereference k in contains, to tell it to look at &keywd, etc., but I just feel I'm wasting my time by not properly addressing the problem. Thanks in advance, any help is indeed appreciated.
Let's go through the issues one by one, in the order they appear in the code.
First, you need to borrow keywd in the for loop, i.e. &keywd. Otherwise keywd is moved into the for loop during the first iteration of the while loop and is no longer available for the next one, which the compiler rejects.
for k in &keywd {
let split: Vec<&str> = line.unwrap().split(" ").collect();
Next, when you call .unwrap() on line, that's the same problem. That causes the inner Ok value to get moved out of the Result. Instead you can do line.as_ref().unwrap() as then you get a reference to the inner Ok value and aren't consuming the line Result.
Alternatively, you can .filter_map(Result::ok) on your lines to avoid (.as_ref()).unwrap() altogether. Note that this silently skips any line that failed to read.
You can add that directly to read_data and even simplify the return type using impl Iterator.
fn read_data<P>(filename: P) -> io::Result<impl Iterator<Item = String>>
where
    P: AsRef<Path>,
{
    let file = File::open(filename)?;
    let f = BufReader::new(file);
    Ok(f.lines().filter_map(Result::ok))
}
Note that you're splitting line for every keywd, which is needless. So you can move that outside of your for loop as well.
All in all, it ends up looking like this:
if let Ok(mut lines) = read_data("test.txt") {
while let Some(line) = lines.next() {
let split: Vec<&str> = line.split(" ").collect();
for k in &keywd {
if split.contains(k) {
...
Since we borrowed &keywd, we don't need to change k to &k: k is now already a &&str, which is what contains expects.
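A self-contained sketch of the type juggling described above, stripped of the file handling. The function name matching_keyword and the sample lines are made up for the example.

```rust
/// Returns the first keyword that appears as a whole space-separated
/// token in `line`, if any.
fn matching_keyword<'k>(line: &str, keywords: &[&'k str]) -> Option<&'k str> {
    // Split once per line, not once per keyword.
    let split: Vec<&str> = line.split(' ').collect();
    for k in keywords {
        // Iterating over a borrowed slice yields k: &&str, which is
        // exactly the &T that Vec<&str>::contains expects.
        if split.contains(k) {
            return Some(*k);
        }
    }
    None
}

fn main() {
    let keywd = vec!["atoms", "Atoms"];
    println!("{:?}", matching_keyword("128 atoms", &keywd));
    println!("{:?}", matching_keyword("Atoms", &keywd));
}
```

One thing this makes visible: a multi-word keyword such as "atom types" can never equal a single element of a line split on spaces, so it would need a different check (for example line.contains("atom types")).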

What corner case am I missing in my Rust emulation of C++'s `std::cin >>`?

My plan is to write a simple method which does exactly what std::cin >> from the C++ standard library does:
use std::io::BufRead;
pub fn input<T: std::str::FromStr>(handle: &std::io::Stdin) -> Result<T, T::Err> {
let mut x = String::new();
let mut guard = handle.lock();
loop {
let mut trimmed = false;
let available = guard.fill_buf().unwrap();
let l = match available.iter().position(|&b| !(b as char).is_whitespace()) {
Some(i) => {
trimmed = true;
i
}
None => available.len(),
};
guard.consume(l);
if trimmed {
break;
}
}
let available = guard.fill_buf().unwrap();
let l = match available.iter().position(|&b| (b as char).is_whitespace()) {
Some(i) => i,
None => available.len(),
};
x.push_str(std::str::from_utf8(&available[..l]).unwrap());
guard.consume(l);
T::from_str(&x)
}
The loop is meant to trim away all the whitespace before valid input begins. The match block outside the loop is where the length of the valid input (that is, before trailing whitespaces begin or EOF is reached) is calculated.
Here is an example using the above method.
let handle = std::io::stdin();
let x: i32 = input(&handle).unwrap();
println!("x: {}", x);
let y: String = input(&handle).unwrap();
println!("y: {}", y);
When I tried a few simple tests, the method worked as intended. However, when I use it on online programming judges such as Codeforces, I get complaints that the program sometimes stays idle or that the wrong input was read, among other issues, which leads me to suspect that I missed a corner case. This usually happens when the input is a few hundred lines long.
What input is going to break the method? What is the correction?
After a lot of experimentation, I noticed a lag when reading each input, which added up as the number of inputs increased. The function doesn't make use of a buffer. It tries to access the stream every time it needs to fill a variable, which is slow and hence the lag.
Lesson learnt: Always use a buffer with a good capacity.
However, the idleness issue still persisted until I replaced the fill_buf/consume pairs with something like read_line or read_to_string.
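Reading the question's code, two things look like plausible culprits (this is an interpretation, not something the answer above confirms): at end of input fill_buf returns an empty slice, so the trimming loop consumes nothing and never exits; and the token is taken from a single fill_buf call, so a token that straddles two buffer refills is cut short. A minimal sketch of a token reader that handles both cases, assuming ASCII whitespace as the separator; read_token is a name invented for this example.

```rust
use std::io::{self, BufRead, BufReader};

/// Reads one whitespace-delimited token. Returns Ok(None) at end of input.
fn read_token<R: BufRead>(reader: &mut R) -> io::Result<Option<String>> {
    let mut token: Vec<u8> = Vec::new();
    loop {
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            // End of input: stop instead of spinning forever.
            break;
        }
        let mut used = 0;
        let mut done = false;
        for &b in buf {
            if b.is_ascii_whitespace() {
                if token.is_empty() {
                    used += 1; // still skipping leading whitespace
                } else {
                    done = true; // whitespace after the token ends it
                    break;
                }
            } else {
                token.push(b);
                used += 1;
            }
        }
        reader.consume(used);
        if done {
            break;
        }
        // Otherwise the buffer ran out mid-token (or mid-whitespace): refill.
    }
    if token.is_empty() {
        return Ok(None);
    }
    String::from_utf8(token)
        .map(Some)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

fn main() {
    // A tiny buffer forces tokens to straddle buffer refills.
    let mut reader = BufReader::with_capacity(2, "  12345 \n hello".as_bytes());
    let x: i32 = read_token(&mut reader).unwrap().unwrap().parse().unwrap();
    let y = read_token(&mut reader).unwrap().unwrap();
    println!("x: {}, y: {}", x, y);
    println!("at end: {}", read_token(&mut reader).unwrap().is_none());
}
```

For standard input, the same function could be called with a locked handle, since the lock returned by std::io::stdin().lock() implements BufRead.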

How to allocate a string before you know how big it needs to be

I'm sure this is a beginners mistake. My code is:
...
let mut latest_date : Option<Date<Local>> = None;
let mut latest_datetime : Option<DateTime<Local>> = None;
let mut latest_activity : Option<&str> = None;
for wrapped_line in reader.lines() {
let line = wrapped_line.unwrap();
println!("line: {}", line);
if date_re.is_match(&line) {
let captures = date_re.captures(&line).unwrap();
let year = captures.at(1).unwrap().parse::<i32>().unwrap();
let month = captures.at(2).unwrap().parse::<u32>().unwrap();
let day = captures.at(3).unwrap().parse::<u32>().unwrap();
latest_date = Some(Local.ymd(year, month, day));
println!("date: {}", latest_date.unwrap());
}
if time_activity_re.is_match(&line) && latest_date != None {
let captures = time_activity_re.captures(&line).unwrap();
let hour = captures.at(1).unwrap().parse::<u32>().unwrap();
let minute = captures.at(2).unwrap().parse::<u32>().unwrap();
let activity = captures.at(3).unwrap();
latest_datetime = Some(latest_date.unwrap().and_hms(hour, minute, 0));
latest_activity = if activity.len() > 0 {
Some(activity)
} else {
None
};
println!("time activity: {} |{}|", latest_datetime.unwrap(), activity);
}
}
...
My error is:
Compiling tt v0.1.0 (file:///home/chris/cloud/tt)
src/main.rs:69:55: 69:59 error: `line` does not live long enough
src/main.rs:69 let captures = time_activity_re.captures(&line).unwrap();
^~~~
src/main.rs:55:5: 84:6 note: in this expansion of for loop expansion
src/main.rs:53:51: 86:2 note: reference must be valid for the block suffix following statement 7 at 53:50...
src/main.rs:53 let mut latest_activity : Option<&str> = None;
src/main.rs:54
src/main.rs:55 for wrapped_line in reader.lines() {
src/main.rs:56 let line = wrapped_line.unwrap();
src/main.rs:57 println!("line: {}", line);
src/main.rs:58
...
src/main.rs:56:42: 84:6 note: ...but borrowed value is only valid for the block suffix following statement 0 at 56:41
src/main.rs:56 let line = wrapped_line.unwrap();
src/main.rs:57 println!("line: {}", line);
src/main.rs:58
src/main.rs:59 if date_re.is_match(&line) {
src/main.rs:60 let captures = date_re.captures(&line).unwrap();
src/main.rs:61 let year = captures.at(1).unwrap().parse::<i32>().unwrap();
...
error: aborting due to previous error
Could not compile `tt`.
I think the problem is that the latest_activity : Option<&str> lives longer than line inside the loop iteration where latest_activity is reassigned.
Is that correct?
If so, what's the best way of fixing it? The cost of allocating a new string does not bother me, though I would prefer not to do that for each iteration.
I feel I may need a reference-counted box to put the activity in - is this the right approach?
I could allocate a String outside of the loop - but how can I do so before I know how big it will need to be?
The problem is that you are already allocating a new string for every iteration (there's nowhere for the Lines iterator to store a buffer, so it has to allocate a fresh String for each line), but you're trying to store a slice into it outside the loop.
You also can't really know how big an externally allocated String would need to be in this case... so typically you wouldn't worry about it and just resize as necessary.
The simplest way is probably to make latest_activity an Option<String>. When you want to change it, you can use .clear() followed by .push_str(s) (see the String documentation). This should re-use the existing allocation if it's large enough, resizing if it isn't. It might require some re-allocating, but nothing major (provided you don't, for example, try to store increasingly longer and longer strings).
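A minimal sketch of the Option<String> approach with clear and push_str. The helper name set_latest and the sample activities are assumptions for the example; the regex and date handling from the question are left out.

```rust
/// Stores `activity` in `slot`, reusing the existing allocation when there is one.
fn set_latest(slot: &mut Option<String>, activity: &str) {
    match slot {
        Some(existing) => {
            // clear keeps the capacity; push_str only reallocates if the
            // new text does not fit.
            existing.clear();
            existing.push_str(activity);
        }
        None => *slot = Some(activity.to_owned()),
    }
}

fn main() {
    let mut latest_activity: Option<String> = None;
    for activity in ["reading", "tea", "writing"] {
        set_latest(&mut latest_activity, activity);
    }
    println!("{:?}", latest_activity);
}
```

Because the stored value is an owned String, it no longer borrows from the per-iteration line and can safely outlive the loop body.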
Another possibility would be to just store wrapped_line itself, moving it out of the loop. You could store that alongside the slice indices, and then do the actual slicing outside the loop (no, you can't just store the String and the &str slice separately or together with just standard library types).
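A sketch of the "keep the line plus slice indices" idea. The Latest struct and the space-based parsing are hypothetical; in the question's code the byte offsets would come from the regex match instead.

```rust
use std::ops::Range;

/// Owns the whole line and remembers where the interesting part is.
struct Latest {
    line: String,
    span: Range<usize>,
}

impl Latest {
    /// The slicing happens on demand, borrowing from the owned line.
    fn activity(&self) -> &str {
        &self.line[self.span.clone()]
    }
}

fn main() {
    let mut latest: Option<Latest> = None;
    for line in vec!["09:00 breakfast".to_string(), "10:30 writing".to_string()] {
        // Hypothetical parsing: everything after the first space is the activity.
        if let Some(pos) = line.find(' ') {
            let span = pos + 1..line.len();
            // The String is moved into the struct; no new allocation is made.
            latest = Some(Latest { line, span });
        }
    }
    if let Some(l) = &latest {
        println!("latest activity: {}", l.activity());
    }
}
```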

Collect items from an iterator at a specific index

I was wondering if it is possible to use .collect() on an iterator to grab items at a specific index. For example if I start with a string, I would normally do:
let line = "Some line of text for example";
let l = line.split(" ");
let lvec: Vec<&str> = l.collect();
let text = &lvec[3];
But what would be nice is something like:
let text: &str = l.collect(index=(3));
No, it's not; however you can easily filter before you collect, which in practice achieves the same effect.
If you wish to filter by index, you need to add the index in and then strip it afterwards:
enumerate (to add the index to the element)
filter based on this index
map to strip the index from the element
Or in code:
fn main() {
    let line = "Some line of text for example";
    let l = line.split(" ")
        .enumerate()
        .filter(|&(i, _)| i == 3)
        .map(|(_, e)| e);
    let lvec: Vec<&str> = l.collect();
    let text = &lvec[0];
    println!("{}", text);
}
If you only wish to get a single index (and thus element), then using nth is much easier. It returns an Option<&str> here, which you need to take care of:
fn main() {
let line = "Some line of text for example";
let text = line.split(" ").nth(3).unwrap();
println!("{}", text);
}
If you have an arbitrary predicate but want only the first element that matches, then collecting into a Vec is inefficient: it will consume the whole iterator (no laziness) and potentially allocate a lot of memory that is not needed at all.
You are thus better off simply asking for the first element using the next method of the iterator, which returns an Option<&str> here:
fn main() {
    let line = "Some line of text for example";
    let text = line.split(" ")
        .enumerate()
        .filter(|&(i, _)| i % 7 == 3)
        .map(|(_, e)| e)
        .next()
        .unwrap();
    println!("{}", text);
}
If you want to select part of the result, by index, you may also use skip and take before collecting, but I guess you have enough alternatives presented here already.
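For completeness, a small sketch of the skip and take combination. The helper name words_in_range is made up for this example.

```rust
/// Collects `len` words starting at word index `start`.
fn words_in_range(line: &str, start: usize, len: usize) -> Vec<&str> {
    line.split(' ').skip(start).take(len).collect()
}

fn main() {
    let line = "Some line of text for example";
    // Words at indices 2, 3 and 4.
    println!("{:?}", words_in_range(line, 2, 3));
}
```

If the line has fewer words than requested, the result is simply shorter; nothing panics.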
There is an nth function on Iterator that does this:
let text = line.split(" ").nth(3).unwrap();
No; you can use skip and next, though:
let line = "Some line of text for example";
let l = line.split(" ");
let text = l.skip(3).next();
Note that this results in text being an Option<&str>, as there's no guarantee that the sequence actually has at least four elements.
Addendum: using nth is definitely shorter, though I prefer to be explicit about the fact that accessing the nth element of an iterator necessarily consumes all the elements before it.
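A short sketch showing that consumption in action: after nth(n), the iterator carries on from element n + 1. The function name nth_then_next is made up for the example.

```rust
/// Returns the n-th word and whatever the iterator yields right after it.
fn nth_then_next(line: &str, n: usize) -> (Option<&str>, Option<&str>) {
    let mut words = line.split(' ');
    // nth(n) advances past elements 0..=n ...
    let picked = words.nth(n);
    // ... so the next call continues at element n + 1.
    let following = words.next();
    (picked, following)
}

fn main() {
    let line = "Some line of text for example";
    println!("{:?}", nth_then_next(line, 3));
}
```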
For anyone who may be interested, you can do loads of cool things with iterators (thanks Matthieu M). For example, to get multiple 'words' from a string according to their index, you can use filter along with the logical or operator || to test for multiple indexes:
let line = "FCC2CCMACXX:4:1105:10758:14389# 81 chrM 1 32 10S90M = 16151 16062";
let words: Vec<&str> = line.split(" ")
    .enumerate()
    .filter(|&(i, _)| i == 1 || i == 3 || i == 6)
    .map(|(_, e)| e)
    .collect();

Resources