In Neovim, I usually do a "change inside" action - key: ci + <char> - to change the content inside some brackets or quotations. But it does not work for the vertical line in a rust closure.
// ---------------- v cursor here ---------
let spawn_worker = |nonce: u32| {
let tx = nonce_tx.clone();
thread::spawn(move || {
// Do the CPU work
let _ = make_block(nonce).hash();
// Send the result
let _ = tx.send(());
})
};
For example, I put the cursor here and press ci + | but it does not work.
Does anyone know the most intuitive way to do it without selecting the content for closures in Rust?
Related
I have the following function. It is given a file. It should return a random line from the file as a string.
fn get_word(word_list: File) -> String {
let reader = BufReader::new(word_list);
let lines = reader.lines();
let word_count = lines.count();
let y: usize = thread_rng().gen_range(0, word_count - 1);
let element = lines.nth(y);
match element {
Some(x) => println!("Result: {}", x.unwrap()),
None => println!("Error with nth"),
}
let word = String::new(""); // Once the error is gone. I would create the string.
return word;
}
But I keep getting this error:
93 | let lines = reader.lines();
| ----- move occurs because `lines` has type `std::io::Lines<BufReader<File>>`, which does not implement the `Copy` trait
94 | let word_count = lines.count();
| ------- `lines` moved due to this method call
...
99 | let element = lines.nth(y);
| ^^^^^^^^^^^^ value borrowed here after move
|
I am new to Rust and have been learning by try and error. I don't know how to access the data after I have called the count function. If there is another method to accomplish what I want, I would gladly welcome it.
The .count() method consumes the iterator. From the documentation
Consumes the iterator, counting the number of iterations and returning it.
This method will call next repeatedly until None is encountered, returning the number of times it saw Some. Note that next has to be called at least once even if the iterator does not have any elements.
In other words, it reads the file content and discards it. If you want to get the Nth line, then you have to re-read the file using another iterator instance.
If your file is small, you can save the read lines in a vector:
let lines = reader.lines().collect::<Vec<String>>();
Then the length of the vector is the number of lines and you can avoid re-reading the file, but if it's a large file you may end-up crashing with "out of memory" error. In that case you should re-read the file content, or use a better strategy such as indexing where the new lines are, so you can jump straight to the new line, without having to re-read a lot of data.
The value returned by lines is an iterator, which reads the file sequentially. To count the number of lines, the iterator is consumed: self is taken by value; ownership is transferred into the count() function. So you can't rewind and then request the nth line.
The easiest solution is to read all the lines into a vector:
let lines = reader.lines().collect::<Vec<String>>();
let word_count = lines.len();
let y: usize = thread_rng().gen_range(0, word_count - 1);
let word = lines[y].clone();
return word;
Notice the clone call: you can't simply write return lines[y]; because you'd be borrowing the string from the vector, but the vector is destroyed as soon as the function returns. By returning a clone of the string, this is avoided.
(to_owned or even to_string would also work. You can also avoid a copy by using swap_remove; I'm not sure there is a more elegant way to move one element from a vector and discard the rest.)
Note that counting the lines and then selecting one of them requires you to either rewind the iterator and go through it twice (once to count and once to select), or to store everything in memory first (e.g. with .collect::<Vec<_>>). Selecting a random line from the list can however be done in a single pass by randomly choosing on each line whether to keep the currently selected line or replacing it with the latest read line:
fn get_word(word_list: File) -> String {
let reader = BufReader::new(word_list);
let lines = reader.lines();
let mut selected = lines.next().unwrap();
let mut count = 0;
for l in lines {
count += 1;
if thread_rng().gen_range (0, count) == 0 {
selected = l;
}
}
match selected {
Ok(x) => return x,
Err(_) => {
print!("Error get_word");
return String::new();
}
}
}
Or of course the simplest way is to just use choose:
fn get_word(word_list: File) -> String {
use rand::seq::IteratorRandom;
let reader = BufReader::new(word_list);
match reader.lines.choose (thread_rng()) {
Some (Ok (x)) => return x,
_ => {
print!("Error get_word");
return String::new();
}
}
}
In order to solve this problem I used the solution given of using .collect::<Vec<String>> but the whole solution needs a little more work. At least in my case.
First: .lines returns a Iterator of type Result<std::string::String, std::io::Error>.
Second: To access the value of this vector I have to borrow it with &.
Here the working function:
fn get_word(word_list: File) -> String {
let reader = BufReader::new(word_list);
let lines = reader.lines().collect::<Vec<_>>();
let word_count = lines.len();
let y: usize = thread_rng().gen_range(0, word_count - 1);
match &lines[y] {
Ok(x) => return x.to_string(),
Err(_) => {
print!("Error get_word");
return String::new();
}
}
}
I am writing some code to call an N number of regexes over contents and if possible, I'd like to avoid cloning the strings all the time as not every regex would actually be a match. Is that even possible?
My code where I tried to do is this:
use std::borrow::Cow;
use regex::Regex;
fn main() {
let test = "abcde";
let regexes = vec![
(Regex::new("a").unwrap(), "b"),
(Regex::new("b").unwrap(), "c"),
(Regex::new("z").unwrap(), "-"),
];
let mut contents = Cow::Borrowed(test);
for (regex, new_value) in regexes {
contents = regex.replace_all(&contents, new_value);
}
println!("{}", contents);
}
The expected result there would be cccde (if it worked) and two clones. But to make it work, I have to keep cloning on every iteration:
fn main() {
let test = "abcde";
let regexes = vec![
(Regex::new("a").unwrap(), "b"),
(Regex::new("b").unwrap(), "c"),
(Regex::new("z").unwrap(), "-"),
];
let mut contents = test.to_string();
for (regex, new_value) in regexes {
contents = regex.replace_all(&contents, new_value).to_string();
}
println!("{}", contents);
}
Which then outputs cccde but with 3 clones.
Is it possible to avoid it somehow? I know I could call every regex and rebind the return but I do not have control over the amount of regex that could come.
Thanks in advance!
EDIT 1
For those who want to see the real code:
It is doing O(n^2) regexes operations.
It starts here https://github.com/jaysonsantos/there-i-fixed-it/blob/ad214a27606bc595d80bb7c5968d4f80ac032e65/src/plan/executor.rs#L185-L192 and calls this https://github.com/jaysonsantos/there-i-fixed-it/blob/main/src/plan/mod.rs#L107-L115
EDIT 2
Here is the new code with the accepted answer https://github.com/jaysonsantos/there-i-fixed-it/commit/a4f5916b3e80749de269efa219b0689cb08551f2
You can do it by using a string as the persistent owner of the string as it is being replaced, and on each iteration, checking if the returned Cow is owned. If it is owned, you know the replacement was successful, so you assign the string that is owned by the Cow into the loop variable.
let mut contents = test.to_owned();
for (regex, new_value) in regexes {
let new_contents = regex.replace_all(&contents, new_value);
if let Cow::Owned(new_string) = new_contents {
contents = new_string;
}
}
Note that assignment in Rust is by default a 'move' - this means that the value of new_string is moved rather than copied into contents.
Playground
I'm writing a project in which a struct System can be constructed from a data file.
In the data file, some lines contain keywords that indicates values to be read either inside the line or in the subsequent N following lines (separated with a blank line from the line).
I would like to have a vec! containing the keywords (statically known at compile time), check if the line returned by the iterator contains the keyword and do the appropriate operations.
Now my code looks like this:
impl System {
fn read_data<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>> where P: AsRef<Path> {
let file = File::open(filename)?;
let f = BufReader::new(file);
Ok(f.lines())
}
...
pub fn new_from_data<P>(dataname: P) -> System where P: AsRef<Path> {
let keywd = vec!["atoms", "atom types".into(),
"Atoms".into()];
let mut sys = System::new();
if let Ok(mut lines) = System::read_data(dataname) {
while let Some(line) = lines.next() {
for k in keywd {
let split: Vec<&str> = line.unwrap().split(" ").collect();
if split.contains(k) {
match k {
"atoms" => sys.natoms = split[0].parse().unwrap(),
"atom types" => sys.ntypes = split[0].parse().unwrap(),
"Atoms" => {
lines.next();
// assumes fields are: atom-ID molecule-ID atom-type q x y z
for _ in 1..=sys.natoms {
let atline = lines.next().unwrap().unwrap();
let data: Vec<&str> = atline.split(" ").collect();
let atid: i32 = data[0].parse().unwrap();
let molid: i32 = data[1].parse().unwrap();
let atype: i32 = data[2].parse().unwrap();
let charge: f32 = data[3].parse().unwrap();
let x: f32 = data[4].parse().unwrap();
let y: f32 = data[5].parse().unwrap();
let z: f32 = data[6].parse().unwrap();
let at = Atom::new(atid, molid, atype, charge, x, y, z);
sys.atoms.push(at);
};
},
_ => (),
}
}
}
}
}
sys
}
}
I'm very unsure on two points:
I don't know if I treated the line by line reading of the file in an idiomatic way as I tinkered some examples from the book and Rust by example. But returning an iterator makes me wonder when and how unwrap the results. For example, when calling the iterator inside the while loop do I have to unwrap twice like in let atline = lines.next().unwrap().unwrap();? I think that the compiler does not complain yet because of the 1st error it encounters which is
I cannot wrap my head around the type the give to the value k as I get a typical:
error[E0308]: mismatched types
--> src/system/system.rs:65:39
|
65 | if split.contains(k) {
| ^ expected `&str`, found `str`
|
= note: expected reference `&&str`
found reference `&str`
error: aborting due to previous error
How are we supposed to declare the substring and compare it to the strings I put in keywd? I tried to deference k in contains, tell it to look at &keywd etc but I just feel I'm wasting my time for not properly adressing the problem. Thanks in advance, any help is indeed appreciated.
Let's go through the issues one by one. I'll go through the as they appear in the code.
First you need to borrow keywd in the for loop, i.e. &keywd. Because otherwise keywd gets moved after the first iteration of the while loop, and thus why the compiler complains about that.
for k in &keywd {
let split: Vec<&str> = line.unwrap().split(" ").collect();
Next, when you call .unwrap() on line, that's the same problem. That causes the inner Ok value to get moved out of the Result. Instead you can do line.as_ref().unwrap() as then you get a reference to the inner Ok value and aren't consuming the line Result.
Alternatively, you can .filter_map(Result::ok) on your lines, to avoid (.as_ref()).unwrap() altogether.
You can add that directly to read_data and even simply the return type using impl ....
fn read_data<P>(filename: P) -> io::Result<impl Iterator<Item = String>>
where
P: AsRef<Path>,
{
let file = File::open(filename)?;
let f = BufReader::new(file);
Ok(f.lines().filter_map(Result::ok))
}
Note that you're splitting line for every keywd, which is needless. So you can move that outside of your for loop as well.
All in all, it ends up looking like this:
if let Ok(mut lines) = read_data("test.txt") {
while let Some(line) = lines.next() {
let split: Vec<&str> = line.split(" ").collect();
for k in &keywd {
if split.contains(k) {
...
Given that we borrowed &keywd, then we don't need to change k to &k, as now k is already &&str.
My plan is to write a simple method which does exactly what std::cin >> from the C++ standard library does:
use std::io::BufRead;
pub fn input<T: std::str::FromStr>(handle: &std::io::Stdin) -> Result<T, T::Err> {
let mut x = String::new();
let mut guard = handle.lock();
loop {
let mut trimmed = false;
let available = guard.fill_buf().unwrap();
let l = match available.iter().position(|&b| !(b as char).is_whitespace()) {
Some(i) => {
trimmed = true;
i
}
None => available.len(),
};
guard.consume(l);
if trimmed {
break;
}
}
let available = guard.fill_buf().unwrap();
let l = match available.iter().position(|&b| (b as char).is_whitespace()) {
Some(i) => i,
None => available.len(),
};
x.push_str(std::str::from_utf8(&available[..l]).unwrap());
guard.consume(l);
T::from_str(&x)
}
The loop is meant to trim away all the whitespace before valid input begins. The match block outside the loop is where the length of the valid input (that is, before trailing whitespaces begin or EOF is reached) is calculated.
Here is an example using the above method.
let handle = std::io::stdin();
let x: i32 = input(&handle).unwrap();
println!("x: {}", x);
let y: String = input(&handle).unwrap();
println!("y: {}", y);
When I tried a few simple tests, the method works as intended. However, when I use this in online programming judges like the one in codeforces, I get a complaint telling that the program sometimes stays idle or that the wrong input has been taken, among other issues, which leads to suspecting that I missed a corner case or something like that. This usually happens when the input is a few hundreds of lines long.
What input is going to break the method? What is the correction?
After a lot of experimentation, I noticed a lag when reading each input, which added up as the number of inputs were increased. The function doesn't make use of a buffer. It tries to access the stream every time it needs to fill a variable, which is slow and hence the lag.
Lesson learnt: Always use a buffer with a good capacity.
However, the idleness issue still persisted, until I replaced the fill_buf, consume pairs with something like read_line or read_string.
I'm sure this is a beginners mistake. My code is:
...
let mut latest_date : Option<Date<Local>> = None;
let mut latest_datetime : Option<DateTime<Local>> = None;
let mut latest_activity : Option<&str> = None;
for wrapped_line in reader.lines() {
let line = wrapped_line.unwrap();
println!("line: {}", line);
if date_re.is_match(&line) {
let captures = date_re.captures(&line).unwrap();
let year = captures.at(1).unwrap().parse::<i32>().unwrap();
let month = captures.at(2).unwrap().parse::<u32>().unwrap();
let day = captures.at(3).unwrap().parse::<u32>().unwrap();
latest_date = Some(Local.ymd(year, month, day));
println!("date: {}", latest_date.unwrap());
}
if time_activity_re.is_match(&line) && latest_date != None {
let captures = time_activity_re.captures(&line).unwrap();
let hour = captures.at(1).unwrap().parse::<u32>().unwrap();
let minute = captures.at(2).unwrap().parse::<u32>().unwrap();
let activity = captures.at(3).unwrap();
latest_datetime = Some(latest_date.unwrap().and_hms(hour, minute, 0));
latest_activity = if activity.len() > 0 {
Some(activity)
} else {
None
};
println!("time activity: {} |{}|", latest_datetime.unwrap(), activity);
}
}
...
My error is:
Compiling tt v0.1.0 (file:///home/chris/cloud/tt)
src/main.rs:69:55: 69:59 error: `line` does not live long enough
src/main.rs:69 let captures = time_activity_re.captures(&line).unwrap();
^~~~
src/main.rs:55:5: 84:6 note: in this expansion of for loop expansion
src/main.rs:53:51: 86:2 note: reference must be valid for the block suffix following statement 7 at 53:50...
src/main.rs:53 let mut latest_activity : Option<&str> = None;
src/main.rs:54
src/main.rs:55 for wrapped_line in reader.lines() {
src/main.rs:56 let line = wrapped_line.unwrap();
src/main.rs:57 println!("line: {}", line);
src/main.rs:58
...
src/main.rs:56:42: 84:6 note: ...but borrowed value is only valid for the block suffix following statement 0 at 56:41
src/main.rs:56 let line = wrapped_line.unwrap();
src/main.rs:57 println!("line: {}", line);
src/main.rs:58
src/main.rs:59 if date_re.is_match(&line) {
src/main.rs:60 let captures = date_re.captures(&line).unwrap();
src/main.rs:61 let year = captures.at(1).unwrap().parse::<i32>().unwrap();
...
error: aborting due to previous error
Could not compile `tt`.
I think the problem is that the latest_activity : Option<&str> lives longer than line inside the loop iteration where latest_activity is reassigned.
Is the correct?
If so, what's the best way of fixing it. The cost of allocating a new string does not bother me, though I would prefer not to do that for each iteration.
I feel I may need a reference-counted box to put the activity in - is this the right approach?
I could allocate a String outside of the loop - but how can I do so before I know how big it will need to be?
The problem is that you are already allocating a new string for every iteration (there's nowhere for the Lines iterator to store a buffer, so it has to allocate a fresh String for each line), but you're trying to store a slice into it outside the loop.
You also can't really know how big an externally allocated String would need to be in this case... so typically you wouldn't worry about it and just resize as necessary.
The simplest way is probably to make latest_activity an Option<String>. When you want to change it, you can use .clear() followed by .push_str(s) (see the String documentation). This should re-use the existing allocation if it's large enough, resizing if it isn't. It might require some re-allocating, but nothing major (provided you don't, for example, try to store increasingly longer and longer strings).
Another possibility would be to just store wrapped_line itself, moving it out of the loop. You could store that alongside the slice indices, and then do the actual slicing outside the loop (no, you can't just store the String and the &str slice separately or together with just standard library types).