How to allocate a string before you know how big it needs to be

How to allocate a string before you know how big it needs to be - string

I'm sure this is a beginners mistake. My code is:
...
let mut latest_date : Option<Date<Local>> = None;
let mut latest_datetime : Option<DateTime<Local>> = None;
let mut latest_activity : Option<&str> = None;
for wrapped_line in reader.lines() {
let line = wrapped_line.unwrap();
println!("line: {}", line);
if date_re.is_match(&line) {
let captures = date_re.captures(&line).unwrap();
let year = captures.at(1).unwrap().parse::<i32>().unwrap();
let month = captures.at(2).unwrap().parse::<u32>().unwrap();
let day = captures.at(3).unwrap().parse::<u32>().unwrap();
latest_date = Some(Local.ymd(year, month, day));
println!("date: {}", latest_date.unwrap());
}
if time_activity_re.is_match(&line) && latest_date != None {
let captures = time_activity_re.captures(&line).unwrap();
let hour = captures.at(1).unwrap().parse::<u32>().unwrap();
let minute = captures.at(2).unwrap().parse::<u32>().unwrap();
let activity = captures.at(3).unwrap();
latest_datetime = Some(latest_date.unwrap().and_hms(hour, minute, 0));
latest_activity = if activity.len() > 0 {
Some(activity)
} else {
None
};
println!("time activity: {} |{}|", latest_datetime.unwrap(), activity);
}
}
...
My error is:
Compiling tt v0.1.0 (file:///home/chris/cloud/tt)
src/main.rs:69:55: 69:59 error: `line` does not live long enough
src/main.rs:69 let captures = time_activity_re.captures(&line).unwrap();
^~~~
src/main.rs:55:5: 84:6 note: in this expansion of for loop expansion
src/main.rs:53:51: 86:2 note: reference must be valid for the block suffix following statement 7 at 53:50...
src/main.rs:53 let mut latest_activity : Option<&str> = None;
src/main.rs:54
src/main.rs:55 for wrapped_line in reader.lines() {
src/main.rs:56 let line = wrapped_line.unwrap();
src/main.rs:57 println!("line: {}", line);
src/main.rs:58
...
src/main.rs:56:42: 84:6 note: ...but borrowed value is only valid for the block suffix following statement 0 at 56:41
src/main.rs:56 let line = wrapped_line.unwrap();
src/main.rs:57 println!("line: {}", line);
src/main.rs:58
src/main.rs:59 if date_re.is_match(&line) {
src/main.rs:60 let captures = date_re.captures(&line).unwrap();
src/main.rs:61 let year = captures.at(1).unwrap().parse::<i32>().unwrap();
...
error: aborting due to previous error
Could not compile `tt`.
I think the problem is that the latest_activity : Option<&str> lives longer than line inside the loop iteration where latest_activity is reassigned.
Is the correct?
If so, what's the best way of fixing it. The cost of allocating a new string does not bother me, though I would prefer not to do that for each iteration.
I feel I may need a reference-counted box to put the activity in - is this the right approach?
I could allocate a String outside of the loop - but how can I do so before I know how big it will need to be?

The problem is that you are already allocating a new string for every iteration (there's nowhere for the Lines iterator to store a buffer, so it has to allocate a fresh String for each line), but you're trying to store a slice into it outside the loop.
You also can't really know how big an externally allocated String would need to be in this case... so typically you wouldn't worry about it and just resize as necessary.
The simplest way is probably to make latest_activity an Option<String>. When you want to change it, you can use .clear() followed by .push_str(s) (see the String documentation). This should re-use the existing allocation if it's large enough, resizing if it isn't. It might require some re-allocating, but nothing major (provided you don't, for example, try to store increasingly longer and longer strings).
Another possibility would be to just store wrapped_line itself, moving it out of the loop. You could store that alongside the slice indices, and then do the actual slicing outside the loop (no, you can't just store the String and the &str slice separately or together with just standard library types).

Related

Access value after it has been borrowed

I have the following function. It is given a file. It should return a random line from the file as a string.
fn get_word(word_list: File) -> String {
let reader = BufReader::new(word_list);
let lines = reader.lines();
let word_count = lines.count();
let y: usize = thread_rng().gen_range(0, word_count - 1);
let element = lines.nth(y);
match element {
Some(x) => println!("Result: {}", x.unwrap()),
None => println!("Error with nth"),
}
let word = String::new(""); // Once the error is gone. I would create the string.
return word;
}
But I keep getting this error:
93 | let lines = reader.lines();
| ----- move occurs because `lines` has type `std::io::Lines<BufReader<File>>`, which does not implement the `Copy` trait
94 | let word_count = lines.count();
| ------- `lines` moved due to this method call
...
99 | let element = lines.nth(y);
| ^^^^^^^^^^^^ value borrowed here after move
|
I am new to Rust and have been learning by try and error. I don't know how to access the data after I have called the count function. If there is another method to accomplish what I want, I would gladly welcome it.

The .count() method consumes the iterator. From the documentation
Consumes the iterator, counting the number of iterations and returning it.
This method will call next repeatedly until None is encountered, returning the number of times it saw Some. Note that next has to be called at least once even if the iterator does not have any elements.
In other words, it reads the file content and discards it. If you want to get the Nth line, then you have to re-read the file using another iterator instance.
If your file is small, you can save the read lines in a vector:
let lines = reader.lines().collect::<Vec<String>>();
Then the length of the vector is the number of lines and you can avoid re-reading the file, but if it's a large file you may end-up crashing with "out of memory" error. In that case you should re-read the file content, or use a better strategy such as indexing where the new lines are, so you can jump straight to the new line, without having to re-read a lot of data.

The value returned by lines is an iterator, which reads the file sequentially. To count the number of lines, the iterator is consumed: self is taken by value; ownership is transferred into the count() function. So you can't rewind and then request the nth line.
The easiest solution is to read all the lines into a vector:
let lines = reader.lines().collect::<Vec<String>>();
let word_count = lines.len();
let y: usize = thread_rng().gen_range(0, word_count - 1);
let word = lines[y].clone();
return word;
Notice the clone call: you can't simply write return lines[y]; because you'd be borrowing the string from the vector, but the vector is destroyed as soon as the function returns. By returning a clone of the string, this is avoided.
(to_owned or even to_string would also work. You can also avoid a copy by using swap_remove; I'm not sure there is a more elegant way to move one element from a vector and discard the rest.)

Note that counting the lines and then selecting one of them requires you to either rewind the iterator and go through it twice (once to count and once to select), or to store everything in memory first (e.g. with .collect::<Vec<_>>). Selecting a random line from the list can however be done in a single pass by randomly choosing on each line whether to keep the currently selected line or replacing it with the latest read line:
fn get_word(word_list: File) -> String {
let reader = BufReader::new(word_list);
let lines = reader.lines();
let mut selected = lines.next().unwrap();
let mut count = 0;
for l in lines {
count += 1;
if thread_rng().gen_range (0, count) == 0 {
selected = l;
}
}
match selected {
Ok(x) => return x,
Err(_) => {
print!("Error get_word");
return String::new();
}
}
}
Or of course the simplest way is to just use choose:
fn get_word(word_list: File) -> String {
use rand::seq::IteratorRandom;
let reader = BufReader::new(word_list);
match reader.lines.choose (thread_rng()) {
Some (Ok (x)) => return x,
_ => {
print!("Error get_word");
return String::new();
}
}
}

In order to solve this problem I used the solution given of using .collect::<Vec<String>> but the whole solution needs a little more work. At least in my case.
First: .lines returns a Iterator of type Result<std::string::String, std::io::Error>.
Second: To access the value of this vector I have to borrow it with &.
Here the working function:
fn get_word(word_list: File) -> String {
let reader = BufReader::new(word_list);
let lines = reader.lines().collect::<Vec<_>>();
let word_count = lines.len();
let y: usize = thread_rng().gen_range(0, word_count - 1);
match &lines[y] {
Ok(x) => return x.to_string(),
Err(_) => {
print!("Error get_word");
return String::new();
}
}
}

The size for values of type `[char]` cannot be known at compilation time

I have the following Rust code ...
const BINARY_SIZE: usize = 5;
let mut all_bits: Vec<[char; BINARY_SIZE]> = Vec::new();
let mut one_bits: [char; BINARY_SIZE] = ['0'; BINARY_SIZE];
all_bits.push(one_bits);
for i in [0..BINARY_SIZE] {
let one = all_bits[0];
let first_ok = one[0]; // This works, first_ok is '0'
let first_fail = one[i]; // This works not
}
How can I get from the variable 'one' the i'th character from the array?
The compiler gives me for let first_fail = one[i]; the error message ..
error[E0277]: the size for values of type [char] cannot be known at compilation time

Your problem is that you're using the Range syntax incorrectly. By wrapping 0..BINARY_SIZE in brackets, you're iterating over the elements in a slice of Ranges, rather than iterating over the values within the range you specified.
This means that i is of type Range rather than type usize. You can prove this by adding let i: usize = i; at the top of the loop. And indexing with a range returns a slice, rather than an element of your array.
Try removing the brackets like so:
const BINARY_SIZE: usize = 5;
let mut all_bits: Vec<[char; BINARY_SIZE]> = Vec::new();
let mut one_bits: [char; BINARY_SIZE] = ['0'; BINARY_SIZE];
all_bits.push(one_bits);
for i in 0..BINARY_SIZE {
let one = all_bits[0];
let first_ok = one[0]; // This works, first_ok is '0'
let first_fail = one[i]; // This works now
}
The error here really doesn't help much. But if you were using a helpful editor integration like rust-analyzer, you would see an inlay type hint showing i: Range.
Perhaps the rust compiler error message here can be improved to trace back through the index type.

Creating struct with values from function parameter Vec<String>and returning Vec<struct> to caller

The purpose of my program is to read questions/answers from a file (line by line), and create several structs from it, put into a Vec for further processing.
I have a rather long piece of code, which I tried to separate into several functions (full version on Playground; hopefully is valid link).
I suppose I'm not understanding a lot about borrowing, lifetimes and other things. Apart from that, the given examples from all around I've seen, I'm not able to adapt to my given problems.
Tryigin to remodel my struct fields from &str to String didn't change anything. As it was with creating Vec<Question> within get_question_list.
Function of concern is as follows:
fn get_question_list<'a>(mut questions: Vec<Question<'a>>, lines: Vec<String>) -> Vec<Question<'a>> {
let count = lines.len();
for i in (0..count).step_by(2) {
let q: &str = lines.get(i).unwrap();
let a: &str = lines.get(i + 1).unwrap();
questions.push(Question::new(q, a));
}
questions
}
This code fails with the compiler as following (excerpt):
error[E0597]: `lines` does not live long enough
--> src/main.rs:126:23
|
119 | fn get_question_list<'a>(mut questions: Vec<Question<'a>>, lines: Vec<String>) -> Vec<Question<'a>> {
| -- lifetime `'a` defined here
...
126 | let a: &str = lines.get(i + 1).unwrap();
| ^^^^^ borrowed value does not live long enough
127 |
128 | questions.push(Question::new(q, a));
| ----------------------------------- argument requires that `lines` is borrowed for `'a`
...
163 | }
| - `lines` dropped here while still borrowed
Call to get_question_list is around:
let lines: Vec<String> = content.split("\n").map(|s| s.to_string()).collect();
let counter = lines.len();
if counter % 2 != 0 {
return Err("Found lines in quiz file are not even (one question or answer is missing.).");
}
questions = get_question_list(questions, lines);
Ok(questions)

The issue is that your Questions are supposed to borrow something (hence the lifetime annotation), but lines gets moved into the function, so when you create a new question from a line, it's borrowing function-local data, which is going to be destroyed at the end of the function. As a consequence, the questions you're creating can't escape the function creating them.
Now what you could do is not move the lines into the function: lines: &[String] would have the lines be owned by the caller, which would "fix" get_question_list.
However the exact same problem exists in read_questions_from_file, and there it can not be resolved: the lines are read from a file, and thus are necessarily local to the function (unless you move the lines-reading to main and read_questions_from_file only borrows them as well).
Therefore the simplest proper fix is to change Question to own its data:
struct Question {
question: String,
answer: String
}
This way the question itself keeps its data alive, and the issue goes away.
We can improve things further though, I think:
First, we can strip out the entire mess around newlines by using String::lines, it will handle cross-platform linebreaks, and will strip them.
It also seems rather odd that get_question_list takes a vector by value only to append to it and immediately return it. A more intuitive interface would be to either:
take the "output vector" by &mut so the caller can pre-size or reuse it across multiple loads, which doesn't really seem useful in this case
or create the output vector internally, which seems like the most sensible case here
Here is what I would consider a more pleasing version: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c0d440d67654b92c75d136eba2bba0c1
fn read_questions_from_file(filename: &str) -> Result<Vec<Question>, Box<dyn Error>> {
let file_content = read_file(filename)?;
let lines: Vec<_> = file_content.lines().collect();
if lines.len() % 2 != 0 {
return Err(Box::new(OddLines));
}
let mut questions = Vec::with_capacity(lines.len() / 2);
for chunk in lines.chunks(2) {
if let [q, a] = chunk {
questions.push(Question::new(q.to_string(), a.to_string()))
} else {
unreachable!("Odd lines should already have been checked");
}
}
Ok(questions)
}
Note that I inlined / removed get_question_list as I don't think it pulls its weight at this point, and it's both trivial and very specific.
Here is a variant which works similarly but with different tradeoffs: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3b8f95aef5bcae904545617749086dbc
fn read_questions_from_file(filename: &str) -> Result<Vec<Question>, Box<dyn Error>> {
let file_content = read_file(filename)?;
let mut lines = file_content.lines();
let mut questions = Vec::new();
while let Some(q) = lines.next() {
let a = lines.next().ok_or(OddLines)?;
questions.push(Question::new(q.to_string(), a.to_string()));
}
Ok(questions)
}
it avoids collecting the lines to a Vec, but as a result has to process the file to the end before it knows that said file is suitable, and it can't preallocate Questions.
At this point, because we do not care for lines being a Vec anymore, we could operate on a BufRead and strip out read_file as well:
fn read_questions_from_file(filename: &str) -> Result<Vec<Question>, Box<dyn Error>> {
let file_content = BufReader::new(File::open(filename)?);
let mut lines = file_content.lines();
let mut questions = Vec::new();
while let Some(q) = lines.next() {
let a = lines.next().ok_or(OddLines)?;
questions.push(Question::new(q?, a?));
}
Ok(questions)
}
The extra ? are because while str::Lines yields &str, io::Lines yields Result<String, io::Error>: IO errors are reported lazily when a read is attempted, meaning every line-read could report a failure if read_to_string would have failed.
OTOH since io::Lines returns a Result<String, ...> we can use q and a directly without needing to convert them to String.

What corner case am I missing in my Rust emulation of C++'s `std::cin >>`?

My plan is to write a simple method which does exactly what std::cin >> from the C++ standard library does:
use std::io::BufRead;
pub fn input<T: std::str::FromStr>(handle: &std::io::Stdin) -> Result<T, T::Err> {
let mut x = String::new();
let mut guard = handle.lock();
loop {
let mut trimmed = false;
let available = guard.fill_buf().unwrap();
let l = match available.iter().position(|&b| !(b as char).is_whitespace()) {
Some(i) => {
trimmed = true;
i
}
None => available.len(),
};
guard.consume(l);
if trimmed {
break;
}
}
let available = guard.fill_buf().unwrap();
let l = match available.iter().position(|&b| (b as char).is_whitespace()) {
Some(i) => i,
None => available.len(),
};
x.push_str(std::str::from_utf8(&available[..l]).unwrap());
guard.consume(l);
T::from_str(&x)
}
The loop is meant to trim away all the whitespace before valid input begins. The match block outside the loop is where the length of the valid input (that is, before trailing whitespaces begin or EOF is reached) is calculated.
Here is an example using the above method.
let handle = std::io::stdin();
let x: i32 = input(&handle).unwrap();
println!("x: {}", x);
let y: String = input(&handle).unwrap();
println!("y: {}", y);
When I tried a few simple tests, the method works as intended. However, when I use this in online programming judges like the one in codeforces, I get a complaint telling that the program sometimes stays idle or that the wrong input has been taken, among other issues, which leads to suspecting that I missed a corner case or something like that. This usually happens when the input is a few hundreds of lines long.
What input is going to break the method? What is the correction?

After a lot of experimentation, I noticed a lag when reading each input, which added up as the number of inputs were increased. The function doesn't make use of a buffer. It tries to access the stream every time it needs to fill a variable, which is slow and hence the lag.
Lesson learnt: Always use a buffer with a good capacity.
However, the idleness issue still persisted, until I replaced the fill_buf, consume pairs with something like read_line or read_string.

Why doesn't this compile - use of undeclared type name `thread::scoped`

I'm trying to get my head around Rust. I've got an alpha version of 1.
Here's the problem I'm trying to program: I have a vector of floats. I want to set up some threads asynchronously. Each thread should wait for the number of seconds specified by each element of the vector, and return the value of the element, plus 10. The results need to be in input order.
It's an artificial example, to be sure, but I wanted to see if I could implement something simple before moving onto more complex code. Here is my code so far:
use std::thread;
use std::old_io::timer;
use std::time::duration::Duration;
fn main() {
let mut vin = vec![1.4f64, 1.2f64, 1.5f64];
let mut guards: Vec<thread::scoped> = Vec::with_capacity(3);
let mut answers: Vec<f64> = Vec::with_capacity(3);
for i in 0..3 {
guards[i] = thread::scoped( move || {
let ms = (1000.0f64 * vin[i]) as i64;
let d = Duration::milliseconds(ms);
timer::sleep(d);
println!("Waited {}", vin[i]);
answers[i] = 10.0f64 + (vin[i] as f64);
})};
for i in 0..3 {guards[i].join(); };
for i in 0..3 {println!("{}", vin[i]); }
}
So the input vector is [1.4, 1.2, 1.5], and I'm expecting the output vector to be [11.4, 11.2, 11.5].
There appear to be a number of problems with my code, but the first one is that I get a compilation error:
threads.rs:7:25: 7:39 error: use of undeclared type name `thread::scoped`
threads.rs:7 let mut guards: Vec<thread::scoped> = Vec::with_capacity(3);
^~~~~~~~~~~~~~
error: aborting due to previous error
There also seem to be a number of other problems, including using vin within a closure. Also, I have no idea what move does, other than the fact that every example I've seen seems to use it.

Your error is due to the fact that thread::scoped is a function, not a type. What you want is a Vec<T> where T is the result type of the function. Rust has a neat feature that helps you here: It automatically detects the correct type of your variables in many situations.
If you use
let mut guards = Vec::with_capacity(3);
the type of guards will be chosen when you use .push() the first time.
There also seem to be a number of other problems.
you are accessing guards[i] in the first for loop, but the length of the guards vector is 0. Its capacity is 3, which means that you won't have any unnecessary allocations as long as the vector never contains more than 3 elements. use guards.push(x) instead of guards[i] = x.
thread::scoped expects a Fn() -> T, so your closure can return an object. You get that object when you call .join(), so you don't need an answer-vector.
vin is moved to the closure. Therefore in the second iteration of the loop that creates your guards, vin isn't available anymore to be moved to the "second" closure. Every loop iteration creates a new closure.
i is moved to the closure. I have no idea what's going on there. But the solution is to let inval = vin[i]; outside the closure, and then use inval inside the closure. This also solves Point 3.
vin is mutable. Yet you never mutate it. Don't bind variables mutably if you don't need to.
vin is an array of f64. Therefore (vin[i] as f64) does nothing. Therefore you can simply use vin[i] directly.
join moves out of the guard. Since you cannot move out of an array, your cannot index into an array of guards and join the element at the specified index. What you can do is loop over the elements of the array and join each guard.
Basically this means: don't iterate over indices (for i in 1..3), but iterate over elements (for element in vector) whenever possible.
All of the above implemented:
use std::thread;
use std::old_io::timer;
use std::time::duration::Duration;
fn main() {
let vin = vec![1.4f64, 1.2f64, 1.5f64];
let mut guards = Vec::with_capacity(3);
for inval in vin {
guards.push(thread::scoped( move || {
let ms = (1000.0f64 * inval) as i64;
let d = Duration::milliseconds(ms);
timer::sleep(d);
println!("Waited {}", inval);
10.0f64 + inval
}));
}
for guard in guards {
let answer = guard.join();
println!("{}", answer);
};
}

In supplement of Ker's answer: if you really need to mutate arrays within a thread, I suppose the most closest valid solution for your task will be something like this:
use std::thread::spawn;
use std::old_io::timer;
use std::sync::{Arc, Mutex};
use std::time::duration::Duration;
fn main() {
let vin = Arc::new(vec![1.4f64, 1.2f64, 1.5f64]);
let answers = Arc::new(Mutex::new(vec![0f64, 0f64, 0f64]));
let mut workers = Vec::new();
for i in 0..3 {
let worker_vin = vin.clone();
let worker_answers = answers.clone();
let worker = spawn( move || {
let ms = (1000.0f64 * worker_vin[i]) as i64;
let d = Duration::milliseconds(ms);
timer::sleep(d);
println!("Waited {}", worker_vin[i]);
let mut answers = worker_answers.lock().unwrap();
answers[i] = 10.0f64 + (worker_vin[i] as f64);
});
workers.push(worker);
}
for worker in workers { worker.join().unwrap(); }
for answer in answers.lock().unwrap().iter() {
println!("{}", answer);
}
}
In order to share vectors between several threads, I have to prove, that these vectors outlive all of my threads. I cannot use just Vec, because it will be destroyed at the end of main block, and another thread could live longer, possibly accessing freed memory. So I took Arc reference counter, which guarantees, that my vectors will be destroyed only when the counter downs to zero.
Arc allows me to share read-only data. In order to mutate answers array, I should use some synchronize tools, like Mutex. That is how Rust prevents me to make data races.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How to allocate a string before you know how big it needs to be - string

Related

Access value after it has been borrowed

The size for values of type `[char]` cannot be known at compilation time

Creating struct with values from function parameter Vec<String>and returning Vec<struct> to caller

What corner case am I missing in my Rust emulation of C++'s `std::cin >>`?

Why doesn't this compile - use of undeclared type name `thread::scoped`

Categories

Resources