How do I reuse the SplitWhitespace iterator?


I've got a piece of code which is supposed to check if two sentences are "too similar", as defined by a heuristic made clearest by the code.
fn too_similar(thing1: &String, thing2: &String) -> bool {
    let split1 = thing1.split_whitespace();
    let split2 = thing2.split_whitespace();
    let mut matches = 0;
    for s1 in split1 {
        for s2 in split2 {
            if s1.eq(s2) {
                matches = matches + 1;
                break;
            }
        }
    }
    let longer_length =
        if thing1.len() > thing2.len() {
            thing1.len()
        } else {
            thing2.len()
        };
    matches > longer_length / 2
}
However, I'm getting the following compilation error:
error[E0382]: use of moved value: `split2`
--> src/main.rs:7:19
|
7 | for s2 in split2 {
| ^^^^^^ value moved here in previous iteration of loop
|
= note: move occurs because `split2` has type `std::str::SplitWhitespace<'_>`, which does not implement the `Copy` trait
I'm not sure why split2 is getting moved in the first place, but what's the Rust way of writing this function?

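split2 is getting moved because a for loop calls into_iter() on whatever it iterates over, which takes the iterator by value and consumes it. Since SplitWhitespace does not implement Copy, Rust can't implicitly copy it, so nothing is left for the second pass of the outer loop.
You can fix this by creating a new iterator inside the outer for loop: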
let split1 = thing1.split_whitespace();
let mut matches = 0;
for s1 in split1 {
    for s2 in thing2.split_whitespace() {
        if s1.eq(s2) {
            matches = matches + 1;
            break;
        }
    }
}
...
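You can also rewrite the matches counting loop using some higher-order functions available in the Iterator trait:
let matches = thing1.split_whitespace()
    .flat_map(|c1| thing2.split_whitespace().filter(move |&c2| c1 == c2))
    .count();
Note that this counts every pair of equal words, so unlike the loop with break, a word that appears several times in thing2 is counted more than once. To match the loop exactly, use .filter(|w1| thing2.split_whitespace().any(|w2| w2 == *w1)) instead of the flat_map.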
longer_length can also be written as:
let longer_length = std::cmp::max(thing1.len(), thing2.len());
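For reference, here is a minimal self-contained sketch of the whole function with the inner iterator recreated on each pass and std::cmp::max for the length. Two choices here are mine, not the original's: the parameters are &str (callers holding a &String still work through deref coercion), and, as an assumption, longer_length counts words rather than bytes, since the original compares a word count (matches) against half of a byte length (len()). Each word of thing1 is counted at most once, like the loop with break.

```rust
/// Sketch of `too_similar`. ASSUMPTION: lengths are measured in words, not
/// bytes, so `matches` and `longer_length` use the same unit.
fn too_similar(thing1: &str, thing2: &str) -> bool {
    // Count words of `thing1` that occur at least once in `thing2`.
    // `thing2.split_whitespace()` is created anew for every `w1`,
    // so no consumed iterator is ever reused.
    let matches = thing1
        .split_whitespace()
        .filter(|w1| thing2.split_whitespace().any(|w2| w2 == *w1))
        .count();

    // Word counts of both sentences (the original used byte lengths).
    let longer_length = std::cmp::max(
        thing1.split_whitespace().count(),
        thing2.split_whitespace().count(),
    );

    matches > longer_length / 2
}

fn main() {
    println!("{}", too_similar("the quick brown fox", "the quick brown dog"));
    println!("{}", too_similar("hello world", "goodbye moon"));
}
```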

There are possibly some better ways to do the word comparison.
If the phrases are long, then iterating over thing2's words for every word in thing1 is not very efficient. If you don't have to worry about words which appear more than once, then HashSet may help, and boils the iteration down to something like:
use std::collections::HashSet;
let words1: HashSet<&str> = thing1.split_whitespace().collect();
let words2: HashSet<&str> = thing2.split_whitespace().collect();
let matches = words1.intersection(&words2).count();
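A small runnable sketch of the set-based approach, wrapped in a hypothetical helper shared_distinct_words. Because both sides are sets, a word repeated in either sentence counts only once.

```rust
use std::collections::HashSet;

/// Counts the distinct words shared by both sentences.
fn shared_distinct_words(thing1: &str, thing2: &str) -> usize {
    // Collecting into sets removes duplicate words on each side.
    let words1: HashSet<&str> = thing1.split_whitespace().collect();
    let words2: HashSet<&str> = thing2.split_whitespace().collect();
    words1.intersection(&words2).count()
}

fn main() {
    // "the" appears twice in each sentence but is counted once.
    println!("{}", shared_distinct_words("the cat and the hat", "the hat on the mat"));
}
```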
If you do care about repeated words you probably need a HashMap, and something like:
use std::collections::HashMap;
let mut words_hash1: HashMap<&str, usize> = HashMap::new();
for word in thing1.split_whitespace() {
    *words_hash1.entry(word).or_insert(0) += 1;
}
let matches2: usize = thing2.split_whitespace()
    .map(|s| words_hash1.get(s).copied().unwrap_or(0))
    .sum();
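The counting-map variant can be sketched the same way; matching_word_pairs is a hypothetical name. Each word of thing2 contributes the number of times it occurs in thing1, so the result equals the number of equal word pairs, the same quantity the flat_map version counts, but with one hash lookup per word instead of a nested scan.

```rust
use std::collections::HashMap;

/// Counts equal word pairs between the two sentences, repeats included.
fn matching_word_pairs(thing1: &str, thing2: &str) -> usize {
    // Build a word -> occurrence-count table for the first sentence.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for word in thing1.split_whitespace() {
        *counts.entry(word).or_insert(0) += 1;
    }
    // Each word of the second sentence adds how often it occurs in the first.
    thing2
        .split_whitespace()
        .map(|w| counts.get(w).copied().unwrap_or(0))
        .sum()
}

fn main() {
    println!("{}", matching_word_pairs("the cat and the hat", "the hat on the mat"));
}
```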

Related

Parallelizing nested loops in rust with rayon

I am trying to parallelize a simple nested for loop in Rust with rayon, but I'm unable to:
fn repulsion_force(object: &mut Vec<Node>) {
    let v0 = 500.0;
    let dx = 0.1;
    for i in 0..object.len() {
        for j in i + 1..object.len() {
            let dir = object[j].position - object[i].position;
            let l = dir.length();
            let mi = object[i].mass;
            let mj = object[j].mass;
            let c = (dx / l).powi(13);
            let v = dir.normalize() * 3.0 * (v0 / dx) * c;
            object[i].current_acceleration -= v / mi;
            object[j].current_acceleration += v / mj;
        }
    }
}
I tried to follow this post and came up with this:
use rayon::prelude::*;
object.par_iter_mut()
    .enumerate()
    .zip(object.par_iter_mut().enumerate())
    .for_each(|((i, a), (j, b))| {
        if j > i {
            // code here
        }
    });
cannot borrow *object as mutable more than once at a time
second mutable borrow occurs here
But it didn't work. My problem is a bit different from the one in the post, because I modify two elements in one iteration and am trying to borrow them both as mutable, which Rust does not like, while I don't like the idea of doing double the calculations when it's not necessary.
Another attempt was to iterate over a Range:
use rayon::prelude::*;
let length = object.len();
(0..length).par_bridge().for_each(|i| {
    (i+1..length).for_each(|j| {
        let dir = object[j].position - object[i].position;
        let l = dir.length();
        let mi = object[i].mass;
        let mj = object[j].mass;
        let c = (dx / l).powi(13);
        let v = dir.normalize() * 3.0 * (v0 / dx) * c;
        object[i].current_acceleration -= v / mi;
        object[j].current_acceleration += v / mj;
    });
});
cannot borrow object as mutable, as it is a captured variable in a Fn closure
This one I honestly don't understand at all, and E0596 isn't much help - my object is a &mut. New to Rust and would appreciate any help!

What you're trying to do is not as trivial as you might imagine :D
But let's give it a shot!
First, let's make a minimal reproducible example; this is the common way to ask questions on Stack Overflow. As you can imagine, we don't know what your code should do, nor do we have the time to try and figure it out.
We would like to get a simple code piece, which fully describes the problem, copy-paste it, run it and derive a solution.
So here's my minimal example:
#[derive(Debug)]
pub struct Node {
    value: i32,
    other_value: i32,
}
fn repulsion_force(object: &mut [Node]) {
    for i in 0..object.len() {
        for j in i + 1..object.len() {
            let mi = 2 * object[i].value;
            let mj = mi + object[j].value;
            object[i].other_value -= mi;
            object[j].other_value += mj;
        }
    }
}
First, I've created a simple node type. Second, I've simplified the operations.
Note that instead of passing a vector, I'm passing a mutable slice. This form retains more flexibility, in case I might need to pass a slice from an array, for example. Since you're not using push(), there's no need to take a reference to a vector.
So next, let's reformulate the problem for parallel computation.
First, consider the structure of your loops and the access pattern.
You're iterating over all the elements in the slice, but for each i iteration, you're only modifying the objects at [i] and [j > i].
So let's split the slice according to that pattern:
fn repulsion_force(object: &mut [Node]) {
    for i in 0..object.len() {
        // left = object[..=i], right = object[i + 1..]
        let (left, right) = object.split_at_mut(i + 1);
        let node_i = &mut left[i];
        right.iter_mut().for_each(|node_j| {
            let mi = 2 * node_i.value;
            let mj = mi + node_j.value;
            node_i.other_value -= mi;
            node_j.other_value += mj;
        });
    }
}
By splitting the slice we get two slices. The left slice holds indices 0..=i, so node i is its last element; the right slice holds every [j > i].
Next, we rely on an iterator instead of indices for the inner iteration.
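To convince yourself that the split_at_mut rewrite does not change the result, a quick std-only check can run both versions of the simplified loop on the same input and compare. The helper names repulsion_force_indexed, repulsion_force_split and make_nodes are hypothetical, and Node derives Clone and PartialEq here only so the results can be compared.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    value: i32,
    other_value: i32,
}

// Original index-based version of the simplified loop.
fn repulsion_force_indexed(object: &mut [Node]) {
    for i in 0..object.len() {
        for j in i + 1..object.len() {
            let mi = 2 * object[i].value;
            let mj = mi + object[j].value;
            object[i].other_value -= mi;
            object[j].other_value += mj;
        }
    }
}

// split_at_mut version: `left[i]` and the elements of `right` are disjoint,
// so holding `&mut` to both at the same time passes the borrow checker.
fn repulsion_force_split(object: &mut [Node]) {
    for i in 0..object.len() {
        let (left, right) = object.split_at_mut(i + 1);
        let node_i = &mut left[i];
        for node_j in right.iter_mut() {
            let mi = 2 * node_i.value;
            let mj = mi + node_j.value;
            node_i.other_value -= mi;
            node_j.other_value += mj;
        }
    }
}

fn make_nodes(n: i32) -> Vec<Node> {
    (0..n).map(|k| Node { value: k, other_value: k }).collect()
}

fn main() {
    let mut a = make_nodes(10);
    let mut b = make_nodes(10);
    repulsion_force_indexed(&mut a);
    repulsion_force_split(&mut b);
    println!("same result: {}", a == b);
}
```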
The next step would be to make the inner loop parallel. However, the inner loop modifies node_i at each iteration. That means more than one thread might try to write to node_i at the same time, causing a data race, so the compiler won't allow it. The solution is to add a synchronization mechanism.
For a general type, that might be a mutex. But since the simplified example only adds and subtracts integers, I've opted for an atomic, as these are usually faster. (The standard library only provides atomic integer, bool and pointer types, so vector-valued accelerations like the ones in your original code would need a Mutex or a different strategy.)
So we modify the Node type and the inner loop to:
use rayon::prelude::*;
use std::sync::atomic::{AtomicI32, Ordering::Relaxed};
#[derive(Debug)]
pub struct Node {
    value: i32,
    other_value: AtomicI32,
}
fn repulsion_force(object: &mut [Node]) {
    for i in 0..object.len() {
        let (left, right) = object.split_at_mut(i + 1);
        // A shared reference is enough: atomics are modified through &self.
        let node_i = &left[i];
        right.par_iter_mut().for_each(|node_j| {
            let mi = 2 * node_i.value;
            let mj = mi + node_j.value;
            node_i.other_value.fetch_sub(mi, Relaxed);
            node_j.other_value.fetch_add(mj, Relaxed);
        });
    }
}
You can test the code with this snippet:
fn main() {
    // some arbitrary object vector
    let mut object: Vec<Node> = (0..100)
        .map(|k| Node { value: k, other_value: AtomicI32::new(k) })
        .collect();
    repulsion_force(&mut object);
    println!("{:?}", object);
}
Hope this helps! ;)
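If you want to try the atomic approach without pulling in rayon, here is a rough std-only sketch that swaps rayon for std::thread::scope (stable since Rust 1.63). It keeps the split_at_mut structure: node i is shared through a plain & reference and updated only via its AtomicI32, while the right half is cut into disjoint &mut chunks, one per worker. WORKERS is a hypothetical tuning constant. This spawns fresh threads on every outer iteration, which is far more expensive than rayon's thread pool, so treat it as an illustration of the borrowing pattern rather than a performance recommendation.

```rust
use std::sync::atomic::{AtomicI32, Ordering::Relaxed};
use std::thread;

#[derive(Debug)]
pub struct Node {
    value: i32,
    other_value: AtomicI32,
}

// Hypothetical tuning constant: number of worker threads per inner loop.
const WORKERS: usize = 4;

fn repulsion_force_threads(object: &mut [Node]) {
    for i in 0..object.len() {
        let (left, right) = object.split_at_mut(i + 1);
        // A shared reference is enough: workers update node i only via atomics.
        let node_i = &left[i];
        if right.is_empty() {
            continue;
        }
        // Round up so that at most WORKERS chunks are created.
        let chunk_len = (right.len() + WORKERS - 1) / WORKERS;
        thread::scope(|s| {
            // Each worker gets its own disjoint `&mut` chunk of the right half.
            for chunk in right.chunks_mut(chunk_len) {
                s.spawn(move || {
                    for node_j in chunk {
                        let mi = 2 * node_i.value;
                        let mj = mi + node_j.value;
                        node_i.other_value.fetch_sub(mi, Relaxed);
                        node_j.other_value.fetch_add(mj, Relaxed);
                    }
                });
            }
        });
    }
}

fn main() {
    let mut object: Vec<Node> = (0..100)
        .map(|k| Node { value: k, other_value: AtomicI32::new(k) })
        .collect();
    repulsion_force_threads(&mut object);
    println!("{:?}", &object[..3]);
}
```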

nested for loop through two vectors

I want something like this in Rust but I don't understand the compile errors:
fn main() {
    let apple: String = String::from("apple");
    let accle: String = String::from("accle");
    let apple_vec: Vec<char> = apple.chars().collect();
    let accle_vec: Vec<char> = accle.chars().collect();
    let counter = 0;
    for j in accle_vec {
        for i in apple_vec {
            // if i == j{
            counter++;
            // }
        }
    }
    println!("counter is {}", counter);
}
I want to compare the characters of two arrays, one by one, and count every time there is a mismatch.

There are several things going on here, so let's break this down.
First error we hit is:
error: expected expression, found `+`
--> src/main.rs:12:21
|
12 | counter++;
| ^ expected expression
error: could not compile `playground` due to previous error
This means the syntax is invalid: Rust has no ++ operator. Instead we can use counter += 1 or counter = counter + 1. I'll use the first.
Making this change, we get a few errors, but concentrating on the counter, we see:
error[E0384]: cannot assign twice to immutable variable `counter`
--> src/main.rs:12:13
|
8 | let counter = 0;
| -------
| |
| first assignment to `counter`
| help: consider making this binding mutable: `mut counter`
...
12 | counter += 1;
| ^^^^^^^^^^^^ cannot assign twice to immutable variable
Some errors have detailed explanations: E0382, E0384.
For more information about an error, try `rustc --explain E0382`.
The advice we get is sound: counter is declared as immutable and we are trying to mutate it, so we should declare it as mutable with let mut counter = 0.
Lastly, we get the following error:
error[E0382]: use of moved value: `apple_vec`
--> src/main.rs:10:18
|
5 | let apple_vec: Vec<char> = apple.chars().collect();
| --------- move occurs because `apple_vec` has type `Vec<char>`, which does not implement the `Copy` trait
...
10 | for i in apple_vec {
| ^^^^^^^^^
| |
| `apple_vec` moved due to this implicit call to `.into_iter()`, in previous iteration of loop
| help: consider borrowing to avoid moving into the for loop: `&apple_vec`
|
note: this function takes ownership of the receiver `self`, which moves `apple_vec`
This is because for i in apple_vec implicitly calls into_iter(), which takes ownership of the vector and consumes it on the first pass of the outer loop, so there would be nothing left to iterate over on the second pass. To prevent this, borrow the vector for the iteration instead, as in for i in &apple_vec. The loop variables are then &char references, which can still be compared with ==.
Putting this all together would yield the following code:
fn main() {
    let apple: String = String::from("apple");
    let accle: String = String::from("accle");
    let apple_vec: Vec<char> = apple.chars().collect();
    let accle_vec: Vec<char> = accle.chars().collect();
    let mut counter = 0;
    for j in &accle_vec {
        for i in &apple_vec {
            if i == j {
                counter += 1;
            }
        }
    }
    println!("counter is {}", counter);
}

This is how I would write the code, in a more functional manner:
use std::collections::HashSet;
fn main() {
    let apple: String = String::from("appleb");
    let accle: String = String::from("acclea");
    // this goes through both strings at the same time
    // e.g. a==a p!=c p!=c
    // I think maybe this was your goal
    // It'll find matches where the same character is in the same position in
    // both strings
    // Iterate through the characters of the first string
    let count1 = apple.chars()
        // zip in the char from the second string
        .zip(accle.chars())
        // Now you have the char from the first and the second strings
        // .. and you're still iterating.
        // Filter so that you only keep entries where both chars are the same
        .filter(|(a, b)| a == b)
        // Count the output
        .count();
    // This is like having nested loops, where it'll compare
    // the first 'a' to every letter in the second word
    // e.g. 'a' == 'a', 'a' != 'c'
    // Using the above values, it returns one extra because there are two
    // matches for 'a' in "acclea"
    // This is what your original code was doing I think
    // iterate through the chars of the first string
    let count2 = apple.chars()
        // For every char from the first string, iterate through
        // *all* the chars of the second string
        .flat_map(|a| accle.chars()
            // Instead of just returning the char from the second string
            // Return a tuple containing the char from the first string and
            // the second
            .map(move |b| (a, b)))
        // Only accept instances where both chars are the same
        .filter(|(a, b)| a == b)
        // Count them
        .count();
    // To just see if a char from "apple" is in "accle" (anywhere)
    // I would just write
    let count3 = apple.chars()
        // Only accept a char from "apple" if it can also be found in
        // "accle"
        .filter(|a| accle.chars().any(|b| *a == b))
        .count();
    // If speed was important and words were long, you could get all the chars
    // from "accle" and put them in a HashSet
    let set: HashSet<char> = accle.chars().collect();
    let count4 = apple.chars()
        .filter(|a| set.contains(a))
        .count();
    println!("Count1 is {}", count1);
    println!("Count2 is {}", count2);
    println!("Count3 is {}", count3);
    println!("Count4 is {}", count4);
}
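The question also mentions counting mismatches rather than matches. A minimal sketch of the position-by-position interpretation, using zip with != instead of ==; positional_mismatches is a hypothetical name, and counting every character past the end of the shorter string as a mismatch is an assumption.

```rust
/// Counts positions where the two strings differ.
/// ASSUMPTION: characters past the end of the shorter string also count
/// as mismatches.
fn positional_mismatches(a: &str, b: &str) -> usize {
    // Compare the overlapping part character by character.
    let differing = a.chars().zip(b.chars()).filter(|(x, y)| x != y).count();
    // Add the length difference for the unpaired tail.
    let len_a = a.chars().count();
    let len_b = b.chars().count();
    differing + len_a.max(len_b) - len_a.min(len_b)
}

fn main() {
    // a==a, p!=c, p!=c, l==l, e==e
    println!("{}", positional_mismatches("apple", "accle"));
}
```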

Iterating over lines in a file and looking for substring from a vec! in rust

I'm writing a project in which a struct System can be constructed from a data file.
In the data file, some lines contain keywords that indicate values to be read either inside the same line or in the N following lines (separated from the keyword line by a blank line).
I would like to have a vec! containing the keywords (statically known at compile time), check if the line returned by the iterator contains the keyword and do the appropriate operations.
Now my code looks like this:
impl System {
    fn read_data<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>> where P: AsRef<Path> {
        let file = File::open(filename)?;
        let f = BufReader::new(file);
        Ok(f.lines())
    }
    ...
    pub fn new_from_data<P>(dataname: P) -> System where P: AsRef<Path> {
        let keywd = vec!["atoms", "atom types".into(),
                         "Atoms".into()];
        let mut sys = System::new();
        if let Ok(mut lines) = System::read_data(dataname) {
            while let Some(line) = lines.next() {
                for k in keywd {
                    let split: Vec<&str> = line.unwrap().split(" ").collect();
                    if split.contains(k) {
                        match k {
                            "atoms" => sys.natoms = split[0].parse().unwrap(),
                            "atom types" => sys.ntypes = split[0].parse().unwrap(),
                            "Atoms" => {
                                lines.next();
                                // assumes fields are: atom-ID molecule-ID atom-type q x y z
                                for _ in 1..=sys.natoms {
                                    let atline = lines.next().unwrap().unwrap();
                                    let data: Vec<&str> = atline.split(" ").collect();
                                    let atid: i32 = data[0].parse().unwrap();
                                    let molid: i32 = data[1].parse().unwrap();
                                    let atype: i32 = data[2].parse().unwrap();
                                    let charge: f32 = data[3].parse().unwrap();
                                    let x: f32 = data[4].parse().unwrap();
                                    let y: f32 = data[5].parse().unwrap();
                                    let z: f32 = data[6].parse().unwrap();
                                    let at = Atom::new(atid, molid, atype, charge, x, y, z);
                                    sys.atoms.push(at);
                                };
                            },
                            _ => (),
                        }
                    }
                }
            }
        }
        sys
    }
}
I'm very unsure about two points:
First, I don't know if I handled the line-by-line reading of the file in an idiomatic way, as I tinkered with some examples from the book and Rust by Example. Returning an iterator makes me wonder when and how to unwrap the results. For example, when calling the iterator inside the while loop, do I have to unwrap twice, as in let atline = lines.next().unwrap().unwrap();? I think the compiler does not complain about this yet because of the first error it encounters, which is my second point.
Second, I cannot wrap my head around the type to give to the value k, as I get the typical:
error[E0308]: mismatched types
--> src/system/system.rs:65:39
|
65 | if split.contains(k) {
| ^ expected `&str`, found `str`
|
= note: expected reference `&&str`
found reference `&str`
error: aborting due to previous error
How are we supposed to declare the substring and compare it to the strings I put in keywd? I tried to dereference k in contains, to make it look at &keywd, etc., but I feel I'm wasting my time by not properly addressing the problem. Thanks in advance, any help is appreciated.

Let's go through the issues one by one, in the order they appear in the code.
First, you need to borrow keywd in the for loop, i.e. &keywd. Otherwise keywd is moved into the for loop during the first iteration of the while loop, and the compiler complains because the next iteration tries to use it again.
for k in &keywd {
    let split: Vec<&str> = line.unwrap().split(" ").collect();
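Next, calling .unwrap() on line causes the same problem: it moves the inner Ok value out of the Result, so line can't be used again on the next pass of the for loop. Instead you can write line.as_ref().unwrap(), which gives you a reference to the inner Ok value without consuming the line Result. This also avoids a second error, since split would otherwise borrow from a temporary String that is dropped at the end of the statement.
Alternatively, you can call .filter_map(Result::ok) on your lines to avoid .as_ref().unwrap() altogether. Be aware that this silently skips any line that fails to read.
You can add that directly to read_data and even simplify the return type using impl Iterator: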
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;
fn read_data<P>(filename: P) -> io::Result<impl Iterator<Item = String>>
where
    P: AsRef<Path>,
{
    let file = File::open(filename)?;
    let f = BufReader::new(file);
    Ok(f.lines().filter_map(Result::ok))
}
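Here is a small self-contained illustration of both options. To avoid needing a file, a hypothetical Vec<io::Result<String>> stands in for the lines iterator, and matching_keywords and ok_lines are hypothetical helpers. The second call in main also shows a limitation of splitting on spaces: a multi-word keyword such as "atom types" can never equal a single token.

```rust
use std::io;

/// Returns the keywords that appear as space-separated words in `line`.
/// `line.as_ref()` turns `&io::Result<String>` into `Result<&String, &io::Error>`,
/// so `unwrap()` borrows the String instead of moving it out of `line`
/// (it still panics if `line` is an `Err`).
fn matching_keywords(line: &io::Result<String>, keywd: &[&str]) -> Vec<String> {
    let mut found = Vec::new();
    for k in keywd {
        let split: Vec<&str> = line.as_ref().unwrap().split(" ").collect();
        if split.contains(k) {
            found.push(k.to_string());
        }
    }
    found
}

/// Keeps only the lines that were read successfully; errors are silently dropped.
fn ok_lines(lines: Vec<io::Result<String>>) -> Vec<String> {
    lines.into_iter().filter_map(Result::ok).collect()
}

fn main() {
    // Hypothetical stand-in for the iterator returned by `lines()`.
    let lines: Vec<io::Result<String>> = vec![
        Ok("100 atoms".to_string()),
        Err(io::Error::new(io::ErrorKind::InvalidData, "unreadable line")),
        Ok("2 atom types".to_string()),
    ];
    println!("{:?}", matching_keywords(&lines[0], &["atoms", "Atoms"]));
    println!("{:?}", matching_keywords(&lines[2], &["atom types"]));
    println!("{:?}", ok_lines(lines));
}
```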
Note that you're splitting line once for every keyword, which is unnecessary, so you can move that outside of your for loop as well.
All in all, it ends up looking like this:
if let Ok(mut lines) = read_data("test.txt") {
    while let Some(line) = lines.next() {
        let split: Vec<&str> = line.split(" ").collect();
        for k in &keywd {
            if split.contains(k) {
                ...
Because we iterate over &keywd, k is already a &&str, which is exactly what split.contains() expects, so there's no need to change k to &k.
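One thing the fixes above don't address: split(" ") produces tokens without spaces, so split.contains can never match the "atom types" keyword. A minimal sketch of one way around this, assuming header lines have the form <count> <keyword> (e.g. 100 atoms or 2 atom types): compare against the end of the trimmed line with str::strip_suffix and parse what remains. Header and parse_header are hypothetical, and the input is an in-memory std::io::Cursor instead of a file so the example is self-contained; since parse_header accepts any BufRead, a BufReader<File> works too.

```rust
use std::io::{BufRead, Cursor};

/// Hypothetical header counts. ASSUMPTION: header lines look like
/// "<count> <keyword>", e.g. "100 atoms" or "2 atom types".
#[derive(Debug, Default, PartialEq)]
struct Header {
    natoms: usize,
    ntypes: usize,
}

fn parse_header<R: BufRead>(reader: R) -> Header {
    let keywd = ["atom types", "atoms"];
    let mut header = Header::default();
    for line in reader.lines().filter_map(Result::ok) {
        let line = line.trim();
        for k in &keywd {
            // Match the keyword against the end of the whole line, so a
            // keyword containing a space ("atom types") can still match.
            if let Some(count) = line.strip_suffix(*k) {
                if let Ok(n) = count.trim().parse::<usize>() {
                    match *k {
                        "atoms" => header.natoms = n,
                        "atom types" => header.ntypes = n,
                        _ => {}
                    }
                    break;
                }
            }
        }
    }
    header
}

fn main() {
    let data = "Example data file\n\n  100 atoms\n  2 atom types\n\nAtoms\n";
    println!("{:?}", parse_header(Cursor::new(data)));
}
```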

What's the semantic of assignment in Rust?

How can I know the type of a binding if I rely on type inference when creating it? What if the expression on the right side is a borrow (like let x = &5;): will it be a value or a borrow? What will happen if I re-assign a borrow or a value?
Just to check: I can re-assign a borrow if I use let mut x: &mut T = &mut T{}; or let mut x: &T = &T{};, right?

I sense some confusion between binding and assigning:
Binding introduces a new variable, and associates it to a value,
Assigning overwrites a value with another.
This can be illustrated in two simple lines:
let mut x = 5; // Binding
x = 10; // Assigning
A binding may appear in multiple places in Rust:
let statements,
if let/while let conditions,
cases in a match expression,
and even in a for expression, on the left side of in.
Whenever there is a binding, Rust's grammar also allows pattern matching:
in the case of let statements and for expressions, the patterns must be irrefutable,
in the case of if let, while let and match cases, the patterns may fail to match.
Pattern matching means that the type of the variable introduced by the binding differs based on how the binding is made:
let x = &5; // x: &i32
let &y = &5; // y: i32
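The same pattern rules apply in every binding position. A small sketch (sum_numbers is a hypothetical helper) showing how a & in the pattern strips a reference in let, for, and if let:

```rust
/// Sums the numbers in `pairs`. Iterating a slice yields `&(i32, char)` items,
/// and the `&(n, _)` pattern strips that reference, so `n` binds as a plain `i32`.
fn sum_numbers(pairs: &[(i32, char)]) -> i32 {
    let mut total = 0;
    for &(n, _) in pairs {
        total += n;
    }
    total
}

fn main() {
    let x = &5; // the pattern `x` binds the whole reference: x: &i32
    let &y = &5; // the pattern `&y` strips the reference: y: i32
    println!("{} {}", *x, y);

    let pairs = [(1, 'a'), (2, 'b')];
    // Refutable pattern in `if let`: `Some(&(n, c))` copies the fields out.
    if let Some(&(n, c)) = pairs.first() {
        println!("first: {} {}", n, c);
    }
    println!("sum: {}", sum_numbers(&pairs));
}
```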
Assigning always requires using =, the assignment operator.
When assigning, the former value is overwritten, and it is dropped if its type needs dropping: it implements Drop itself or owns something that does, as a String does with its heap buffer.
let mut x = 5;
x = 6;
// Now x == 6, drop was not called because it's an i32.
let mut s = String::from("Hello, World!");
s = String::from("Hello, 神秘德里克!");
// Now s == "Hello, 神秘德里克!", drop was called because it's a String.
The value that is overwritten may be as simple as an integer or float, a more involved struct or enum, or a reference.
let mut r = &5;
r = &6;
// Now r points to 6, drop was not called as it's a reference.
Overwriting a reference does not overwrite the value pointed to by the reference, but the reference itself. The original value still lives on, and will be dropped when it's ready.
To overwrite the pointed-to value, one needs to use *, the dereference operator:
let mut x = 5;
let r = &mut x;
*r = 6;
// r still points to x, and now x = 6.
If the type of the dereferenced value requires it, drop will be called:
let mut s = String::from("Hello, World!");
let r = &mut s;
*r = String::from("Hello, 神秘德里克!");
// r still points to s, and now s = "Hello, 神秘德里克!".
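To watch drop happen, here is a minimal sketch with a hypothetical Noisy type that records its name in a shared log when dropped. It shows that plain assignment and assignment through *r drop the old value, while reassigning a reference drops nothing.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical type that records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Returns the names dropped by the assignments below, in order.
fn demo() -> Vec<&'static str> {
    let log: Rc<RefCell<Vec<&'static str>>> = Rc::new(RefCell::new(Vec::new()));
    let make = |name: &'static str| Noisy { name, log: Rc::clone(&log) };

    // Plain assignment: the old value "first" is dropped.
    let mut a = make("first");
    assert_eq!(a.name, "first");
    a = make("second");

    // Assignment through a mutable reference: "second" is dropped.
    {
        let r = &mut a;
        *r = make("third");
    }

    // Reassigning a reference: nothing is dropped, only the reference changes.
    let b = make("other");
    let mut p = &a;
    assert_eq!(p.name, "third");
    p = &b;
    assert_eq!(p.name, "other");

    // Snapshot the log before `a` and `b` go out of scope.
    let snapshot = log.borrow().clone();
    snapshot
}

fn main() {
    println!("{:?}", demo());
}
```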
I invite you to use the playground and toy around; you can start from here:
fn main() {
    let mut s = String::from("Hello, World!");
    {
        let r = &mut s;
        *r = String::from("Hello, 神秘德里克!");
    }
    println!("{}", s);
}
Hopefully, things should be a little clearer now, so let's check your samples.
let x = &5;
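x is a reference to i32 (&i32). What happens is that the compiler introduces a temporary in which 5 is stored, and then borrows this temporary. (Because 5 is a constant expression, the compiler actually promotes it to a static, so x can even be used as a &'static i32.)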
let mut x: &mut T = T{};
This is impossible: the type of T{} is T, not &mut T, so this fails to compile. You could change it to let mut x: &mut T = &mut T{};.
And your last example is similar.

How solve "cannot index a value of type `usize`" error?

I am trying to measure the speed of Vec's [] indexing vs. .get(index) using the following code:
extern crate time;
fn main() {
    let v = vec![1; 1_000_000];
    let before_rec1 = time::precise_time_ns();
    for (i, v) in (0..v.len()).enumerate() {
        v[i]
    }
    let after_rec1 = time::precise_time_ns();
    println!("Total time: {}", after_rec1 - before_rec1);
    let before_rec2 = time::precise_time_ns();
    for (i, v) in (0..v.len()).enumerate() {
        v.get(i)
    }
    let after_rec2 = time::precise_time_ns();
    println!("Total time: {}", after_rec2 - before_rec2);
}
but this returns the following errors:
error: cannot index a value of type `usize`
--> src/main.rs:8:9
|
8 | v[i]
| ^^^^
error: no method named `get` found for type `usize` in the current scope
--> src/main.rs:17:11
|
17 | v.get(i)
| ^^^
I'm confused why this doesn't work, since enumerate should give me an index which, by its very name, I should be able to use to index the vector.
Why is this error being thrown?
I know I can/should use iteration rather than C-style way of indexing, but for learning's sake what do I use to iterate over the index values like I'm trying to do here?

You, pal, are mightily confused here.
fn main() {
let v = vec![1; 1_000_000];
This v has type Vec<i32>.
for (i, v) in (0..v.len()).enumerate() {
v[i]
}
You are iterating over a range of indexes, from 0 to v.len(), and using enumerate to generate indices as you go:
This v has type usize, and it shadows the outer vector v inside the loop
In the loop, v == i, always
So... indeed, the compiler is correct, you cannot use [] on usize.
The program "fixed":
extern crate time;
fn main() {
    let v = vec![1; 1_000_000];
    let before_rec1 = time::precise_time_ns();
    for i in 0..v.len() {
        // The trailing `;` discards the value: a for loop body must evaluate to ().
        v[i];
    }
    let after_rec1 = time::precise_time_ns();
    println!("Total time: {}", after_rec1 - before_rec1);
    let before_rec2 = time::precise_time_ns();
    for i in 0..v.len() {
        v.get(i);
    }
    let after_rec2 = time::precise_time_ns();
    println!("Total time: {}", after_rec2 - before_rec2);
}
I would add a disclaimer, though: if I were a compiler, I would optimize this useless loop into a no-op. If, after compiling with --release, your program reports 0, this is what happened.
Rust has built-in benchmarking support (#[bench], which currently requires a nightly compiler); I advise that you use it rather than going the naive way. And... you will also need to inspect the emitted assembly, which is the only way to make sure that you are measuring what you think you are (optimizing compilers are tricky like that).
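As a rough std-only alternative to the time crate, here is a sketch using std::time::Instant and std::hint::black_box (stable since Rust 1.66). sum_indexed and sum_get are hypothetical helpers; summing the elements gives each loop a result that is actually used, and black_box asks the optimizer not to see through the input and output. black_box is documented as best-effort only, so this is still not a rigorous benchmark, and the advice above about checking the assembly stands.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Sums `v` by indexing with `[]` (bounds-checked, panics if out of range).
fn sum_indexed(v: &[i32]) -> i64 {
    let mut total = 0i64;
    for i in 0..v.len() {
        total += v[i] as i64;
    }
    total
}

/// Sums `v` with `.get(i)`, which returns `Option<&i32>` instead of panicking.
fn sum_get(v: &[i32]) -> i64 {
    let mut total = 0i64;
    for i in 0..v.len() {
        if let Some(x) = v.get(i) {
            total += *x as i64;
        }
    }
    total
}

fn main() {
    let v = vec![1; 1_000_000];

    let start = Instant::now();
    // black_box hides the input and result from the optimizer, making it
    // less likely that the loop is removed as dead code.
    let a = black_box(sum_indexed(black_box(&v)));
    println!("[] total: {} in {:?}", a, start.elapsed());

    let start = Instant::now();
    let b = black_box(sum_get(black_box(&v)));
    println!("get total: {} in {:?}", b, start.elapsed());
}
```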
