Parallelizing nested loops in rust with rayon - multithreading

I am trying to parallelize a simple nested for loop in Rust with rayon but am unable to:
fn repulsion_force(object: &mut Vec<Node>) {
let v0 = 500.0;
let dx = 0.1;
for i in 0..object.len() {
for j in i + 1..object.len() {
let dir = object[j].position - object[i].position;
let l = dir.length();
let mi = object[i].mass;
let mj = object[j].mass;
let c = (dx / l).powi(13);
let v = dir.normalize() * 3.0 * (v0 / dx) * c;
object[i].current_acceleration -= v / mi;
object[j].current_acceleration += v / mj;
}
}
}
Tried to follow this post and created this:
use rayon::prelude::*;
object.par_iter_mut()
.enumerate()
.zip(object.par_iter_mut().enumerate())
.for_each(|((i, a), (j, b))| {
if j > i {
// code here
}
});
cannot borrow *object as mutable more than once at a time
second mutable borrow occurs here
But it didn't work. My problem is a bit different from the one in the post because I modify two elements in one iteration and try to borrow them both as mutable, which Rust does not like. I also don't like the idea of doing double the calculations when it's not necessary.
Another try was to iterate through Range:
use rayon::prelude::*;
let length = object.len();
(0..length).par_bridge().for_each(|i| {
    (i + 1..length).for_each(|j| {
        let dir = object[j].position - object[i].position;
        let l = dir.length();
        let mi = object[i].mass;
        let mj = object[j].mass;
        let c = (dx / l).powi(13);
        let v = dir.normalize() * 3.0 * (v0 / dx) * c;
        object[i].current_acceleration -= v / mi;
        object[j].current_acceleration += v / mj;
    });
});
cannot borrow object as mutable, as it is a captured variable in a Fn closure
This one I honestly don't understand at all, and E0596 isn't much help: my object is a &mut. I'm new to Rust and would appreciate any help!

What you're trying to do is not as trivial as you might imagine :D
But let's give it a shot!
First, let's make a minimal reproducible example, which is the usual way to ask questions on Stack Overflow. As you can imagine, we don't know what your code should do, nor do we have the time to try and figure it out.
We would like a simple piece of code that fully describes the problem, so that we can copy-paste it, run it and derive a solution.
So here's my minimal example:
#[derive(Debug)]
pub struct Node {
value: i32,
other_value: i32,
}
fn repulsion_force(object: &mut [Node]) {
for i in 0..object.len() {
for j in i + 1..object.len() {
let mi = 2 * object[i].value;
let mj = mi + object[j].value;
object[i].other_value -= mi;
object[j].other_value += mj;
}
}
}
First, I've created a simple node type. Second, I've simplified the operations.
Note that instead of passing a vector, I'm passing a mutable slice. This form retains more flexibility, in case I might need to pass a slice from an array, for example. Since you're not using push(), there's no need to reference a vector.
Next, let's reformulate the problem for parallel computation.
First, consider the structure of your loops and the access pattern.
You're iterating over all the elements in the slice, but for each i iteration you're only modifying the object at [i] and the objects at [j > i].
So let's split the slice according to that pattern:
fn repulsion_force(object: &mut [Node]) {
    for i in 0..object.len() {
        // `left` is object[..=i], `right` is object[i + 1..]; they never overlap.
        let (left, right) = object.split_at_mut(i + 1);
        // The binding itself is never reassigned, so it does not need `mut`.
        let node_i = &mut left[i];
        right.iter_mut().for_each(|node_j| {
            let mi = 2 * node_i.value;
            let mj = mi + node_j.value;
            node_i.other_value -= mi;
            node_j.other_value += mj;
        });
    }
}
By splitting the slice we get two slices: the left slice contains [i], and the right slice contains every [j > i]. Next, we rely on an iterator instead of indices for the inner iteration.
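As a small standalone illustration of why the split helps, here is a sketch that uses only the standard library. The function name `spread` and the integer data are made up for this example. `split_at_mut` returns two non-overlapping mutable slices, so the borrow checker accepts a mutable reference into each of them at the same time.

```rust
/// Hypothetical helper, only meant to show the borrowing pattern:
/// for every element after `i`, subtract 1 from `values[i]` and add the
/// updated `values[i]` to that later element.
fn spread(values: &mut [i32], i: usize) {
    // `left` covers ..=i and `right` covers i+1.., so they never overlap.
    let (left, right) = values.split_at_mut(i + 1);
    let a = &mut left[i];
    for b in right.iter_mut() {
        *a -= 1;
        *b += *a;
    }
}

fn main() {
    let mut values = [10, 20, 30, 40];
    spread(&mut values, 1);
    println!("{:?}", values); // [10, 18, 49, 58]
}
```

Indexing `values[i]` and `values[j]` mutably at once would be rejected, because the compiler cannot prove from the indices alone that the two elements are different.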
The next step would be to make the internal loop parallel. However, the internal loop modifies node_i at each iteration. That means more than one thread might try to write to node_i at the same time, causing a data race. As such the compiler won't allow it. The solution is to include a synchronization mechanism.
For a general type, that might be a mutex. But since you're using standard mathematical operations, I've opted for an atomic, as these are usually faster.
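To see the atomic idea in isolation, here is a minimal sketch that uses only the standard library (scoped threads instead of rayon). The function name `accumulate` is an assumption made for this example. Several threads add to one shared counter through a shared reference, which is allowed because the atomic type provides the synchronization.

```rust
use std::sync::atomic::{AtomicI32, Ordering::Relaxed};
use std::thread;

/// Adds every value in `deltas` to one shared counter, one thread per value.
/// (Illustrative only; spawning a thread per item is not efficient.)
fn accumulate(deltas: &[i32]) -> i32 {
    let total = AtomicI32::new(0);
    thread::scope(|s| {
        for &d in deltas {
            let total = &total; // a shared reference is enough for an atomic
            s.spawn(move || {
                total.fetch_add(d, Relaxed);
            });
        }
    }); // all spawned threads are joined when the scope ends
    total.into_inner()
}

fn main() {
    println!("{}", accumulate(&[1, 2, 3, 4])); // 10
}
```

Relaxed ordering is sufficient here because only the final sum matters, not the order in which the additions happen.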
So we modify the Node type and the internal loop to
use rayon::prelude::*;
use std::sync::atomic::{AtomicI32, Ordering::Relaxed};
#[derive(Debug)]
pub struct Node {
    value: i32,
    other_value: AtomicI32,
}
fn repulsion_force(object: &mut [Node]) {
    for i in 0..object.len() {
        let (left, right) = object.split_at_mut(i + 1);
        // A shared reference is enough: the atomic is updated through `&self`.
        let node_i = &left[i];
        right.iter_mut().par_bridge().for_each(|node_j| {
            let mi = 2 * node_i.value;
            let mj = mi + node_j.value;
            node_i.other_value.fetch_sub(mi, Relaxed);
            node_j.other_value.fetch_add(mj, Relaxed);
        });
    }
}
You can test the code with this snippet:
fn main() {
    // some arbitrary object vector
    let mut object: Vec<Node> = (0..100)
        .map(|k| Node { value: k, other_value: AtomicI32::new(k) })
        .collect();
    repulsion_force(&mut object);
    println!("{:?}", object);
}
Hope this helps! ;)

Related

Alternative to swapping vector elements in rust

I'm experimenting with Rust by porting some C++ code. I write a lot of code that uses vectors as object pools by moving elements to the back in various ways and then resizing. Here's a ported function:
use rand::{thread_rng, Rng};
fn main() {
for n in 1..11 {
let mut a: Vec<u8> = (1..11).collect();
keep_n_rand(&mut a, n);
println!("{}: {:?}", n, a);
}
}
fn keep_n_rand<T>(x: &mut Vec<T>, n: usize) {
let mut rng = thread_rng();
for i in n..x.len() {
let j = rng.gen_range(0..i);
if j < n {
x.swap(i, j);
}
}
x.truncate(n);
}
It keeps n elements chosen at random. It is done this way because it does not reduce the capacity of the vector so that more objects can be added later without allocating (on average). This might be iterated millions of times.
In C++, I would use x[j] = std::move(x[i]); because I am about to truncate the vector. While it has no impact in this example, if the swap were expensive, it would make sense to move. Is that possible and desirable in Rust? I can live with a swap. I'm just curious.
Correct me if I'm wrong: you're looking for a way to retain n random elements in a Vec and discard the rest. In that case, the easiest way would be to use partial_shuffle(), a rand function implemented for slices.
Shuffle a slice in place, but exit early.
Returns two mutable slices from the source slice. The first contains amount elements randomly permuted. The second has the remaining elements that are not fully shuffled.
use rand::{thread_rng, seq::SliceRandom};
fn main() {
let mut rng = thread_rng();
// Use the `RangeInclusive` (`..=`) syntax at times like this.
for n in 1..=10 {
let mut elements: Vec<u8> = (1..=10).collect();
let (elements, _rest) = elements.as_mut_slice().partial_shuffle(&mut rng, n);
println!("{n}: {elements:?}");
}
}
Run this snippet on Rust Playground.
elements is shadowed, going from a Vec to a &mut [T]. If you're only going to use it inside the function, that's probably all you'll need. However, since it's a reference, you can't return it; the data it's pointing to is owned by the original vector, which will be dropped when it goes out of scope. If that's what you need, you'll have to turn the slice into a Vec.
While you can simply construct a new one from it using Vec::from, I suspect (but haven't tested) that it's more efficient to use Vec::split_off.
Splits the collection into two at the given index.
Returns a newly allocated vector containing the elements in the range [at, len). After the call, the original vector will be left containing the elements [0, at) with its previous capacity unchanged.
use rand::{thread_rng, seq::SliceRandom};
fn main() {
let mut rng = thread_rng();
for n in 1..=10 {
let mut elements: Vec<u8> = (1..=10).collect();
elements.as_mut_slice().partial_shuffle(&mut rng, n);
let elements = elements.split_off(elements.len() - n);
// `elements` is still a `Vec`; this time, containing only
// the shuffled elements. You can use it as the return value.
println!("{n}: {elements:?}");
}
}
Run this snippet on Rust Playground.
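Since split_off returns a newly allocated vector, here is a hedged sketch of the two options side by side, using only the standard library. The function names are made up for this example, and it assumes (as the split_off call above implies) that the elements to keep sit at the end of the vector. Draining the front keeps the original buffer, which may matter if avoiding allocation is the goal.

```rust
/// Keeps only the last `n` elements of `items`, reusing the same buffer.
fn keep_last_in_place<T>(items: &mut Vec<T>, n: usize) {
    let cut = items.len().saturating_sub(n);
    // Dropping the `Drain` removes the front elements and shifts the tail down.
    items.drain(..cut);
}

/// Moves the last `n` elements into a newly allocated vector.
fn take_last<T>(items: &mut Vec<T>, n: usize) -> Vec<T> {
    let cut = items.len().saturating_sub(n);
    items.split_off(cut)
}

fn main() {
    let mut a: Vec<u8> = (1..=5).collect();
    let capacity_before = a.capacity();
    keep_last_in_place(&mut a, 2);
    println!("{:?} {}", a, a.capacity() == capacity_before);

    let mut b: Vec<u8> = (1..=5).collect();
    let tail = take_last(&mut b, 2);
    println!("{:?} {:?}", b, tail);
}
```

Which of the two is faster for a given element type and size is something only a benchmark can tell.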
Since this function lives on a performance-critical path, I'd recommend benchmarking it against your current implementation. At the time of writing, criterion is the most popular way to do that. That said, rand is an established library, so I imagine it will perform as well as or better than a manual implementation.
Sample Benchmark
I don't know what kind of numbers you're working with, but here's a sample benchmark with for n in 1..=100 and (1..=100).collect() (i.e. 100 instead of 10 in both places) without the print statements:
manual time: [73.683 µs 73.749 µs 73.821 µs]
rand with slice time: [68.074 µs 68.147 µs 68.226 µs]
rand with vec time: [54.147 µs 54.213 µs 54.288 µs]
Bizarrely, splitting off a Vec performed vastly better than not. Unless I made an error in my benchmarks, the compiler is probably doing something under the hood that you'll need a more experienced Rustacean than me to explain.
Benchmark Implementation
Cargo.toml
[dependencies]
rand = "0.8.5"
[dev-dependencies]
criterion = "0.4.0"
[[bench]]
name = "rand_benchmark"
harness = false
[[bench]]
name = "rand_vec_benchmark"
harness = false
[[bench]]
name = "manual_benchmark"
harness = false
benches/manual_benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion};
fn manual_solution() {
for n in 1..=100 {
let mut elements: Vec<u8> = (1..=100).collect();
keep_n_rand(&mut elements, n);
}
}
fn keep_n_rand<T>(elements: &mut Vec<T>, n: usize) {
use rand::{thread_rng, Rng};
let mut rng = thread_rng();
for i in n..elements.len() {
let j = rng.gen_range(0..i);
if j < n {
elements.swap(i, j);
}
}
elements.truncate(n);
}
fn benchmark(c: &mut Criterion) {
c.bench_function("manual", |b| b.iter(manual_solution));
}
criterion_group!(benches, benchmark);
criterion_main!(benches);
benches/rand_benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion};
fn rand_solution() {
use rand::{seq::SliceRandom, thread_rng};
let mut rng = thread_rng();
for n in 1..=100 {
let mut elements: Vec<u8> = (1..=100).collect();
let (_elements, _) = elements.as_mut_slice().partial_shuffle(&mut rng, n);
}
}
fn benchmark(c: &mut Criterion) {
c.bench_function("rand with slice", |b| b.iter(rand_solution));
}
criterion_group!(benches, benchmark);
criterion_main!(benches);
benches/rand_vec_benchmark.rs
use criterion::{criterion_group, criterion_main, Criterion};
fn rand_solution() {
use rand::{seq::SliceRandom, thread_rng};
let mut rng = thread_rng();
for n in 1..=100 {
let mut elements: Vec<u8> = (1..=100).collect();
elements.as_mut_slice().partial_shuffle(&mut rng, n);
let _elements = elements.split_off(elements.len() - n);
}
}
fn benchmark(c: &mut Criterion) {
c.bench_function("rand with vec", |b| b.iter(rand_solution));
}
criterion_group!(benches, benchmark);
criterion_main!(benches);
Is that possible and desirable in rust?
It is not possible unless you constrain T: Copy or T: Clone: while C++ uses non-destructive moves (the source is in a valid but unspecified state) Rust uses destructive moves (the source is gone).
There are ways around it using unsafe but they require being very careful and it's probably not worth the hassle (you can look at Vec::swap_remove for a taste, it basically does what you're doing here except only between j and the last element of the vec).
I'd also recommend verified_tinker's solution, as I'm not convinced your shuffle is unbiased.
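For a feel of what Vec::swap_remove does, here is a small standard-library sketch. The wrapper name `replace_with_last` is invented for this example. The element at index j is moved out and returned, and the last element is moved into its slot, so nothing is cloned and the vector shrinks by one.

```rust
/// Moves the last element into slot `j` and returns the element that was there.
/// Thin wrapper around `Vec::swap_remove`; panics if `j` is out of bounds.
fn replace_with_last<T>(items: &mut Vec<T>, j: usize) -> T {
    items.swap_remove(j)
}

fn main() {
    let mut pool: Vec<String> = ["a", "b", "c", "d"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let old = replace_with_last(&mut pool, 1);
    println!("{old} {pool:?}"); // b ["a", "d", "c"]
}
```

This only covers the case where the source is the last element; moving from an arbitrary index i into j without a swap is where unsafe code would come in.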

How do I update a variable in a loop to a reference to a value created inside the loop?

I want to enter a loop with a variable n which is borrowed by the function. At each step, n takes a new value; when exiting the loop, the job is done, with the help of other variables, and n will never be used again.
If I don't use references, I have something like this:
fn test(n: Thing) -> usize {
// stuff
let mut n = n;
for i in 1..10 {
let (q, m) = n.do_something(...);
n = m;
// stuff with x
}
x
}
x is the result of some computation with q and m, but it is a usize and I didn't encounter any issue in this part of the code. I didn't test this code, but this is the idea. I could make code written like this work.
Since I want to do it with a reference, I tried to write:
fn test(n: &Thing) -> usize {
// stuff
let mut n = n;
for i in 1..10 {
let (q, m) = (*n).do_something(...);
n = &m;
// stuff with x
}
x
}
Now the code will not compile because m has a shorter lifetime than n. I tried to make it work by doing some tricky things or by cloning things, but this can't be the right way. In C, the code would work because we don't care about what n is pointing to when exiting the loop since n isn't used after the loop. I perfectly understand that this is where Rust and C differ, but I am pretty sure a clean way of doing it in Rust exists.
Consider my question as very general; I am not asking for some ad-hoc solution for a specific problem.
As Chris Emerson points out, what you are doing is unsafe and it is probably not appropriate to write code like that in C either. The variable you are taking a reference to goes out of scope at the end of each loop iteration, and thus you would have a dangling pointer at the beginning of the next iteration. This would lead to all of the memory errors that Rust attempts to prevent; Rust has prevented you from doing something bad that you thought was safe.
If you want something that can be either borrowed or owned, that's a Cow:
use std::borrow::Cow;
#[derive(Clone)]
struct Thing;
impl Thing {
fn do_something(&self) -> (usize, Thing) {
(1, Thing)
}
}
fn test(n: &Thing) -> usize {
let mut n = Cow::Borrowed(n);
let mut x = 0;
for _ in 1..10 {
let (q, m) = n.do_something();
n = Cow::Owned(m);
x = x + q;
}
x
}
fn main() {
println!("{}", test(&Thing));
}
If I understand this right, the problem is not related to life outside of the loop; m doesn't live long enough to keep a reference for the next iteration.
let mut n = n;
for i in 1..10 {
    let (q, m) = (*n).do_something(...);
    n = &m;
} // At this point m is no longer live, i.e. it doesn't live until the next iteration.
Again it depends on the specific types/lifetimes, but you could potentially assign m to a variable with a longer lifetime, but then you're back to the first example.
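One way to give the new value a longer lifetime is to store it in a variable declared outside the loop. The sketch below is one possible shape of that idea, not the only one. The `Thing` type with a `usize` field and the body of `do_something` are placeholders invented for this example.

```rust
// Placeholder type for illustration; the field exists only so the result is visible.
struct Thing(usize);

impl Thing {
    fn do_something(&self) -> (usize, Thing) {
        (self.0, Thing(self.0 + 1))
    }
}

fn test(n: &Thing) -> usize {
    // Lives outside the loop, so the value survives from one iteration to the next.
    let mut owned: Option<Thing> = None;
    let mut x = 0;
    for _ in 1..10 {
        // Use the value from the previous iteration if there is one, else the argument.
        let current: &Thing = owned.as_ref().unwrap_or(n);
        let (q, m) = current.do_something();
        // `current` is not used past this point, so `owned` may be overwritten.
        owned = Some(m);
        x += q;
    }
    x
}

fn main() {
    println!("{}", test(&Thing(1))); // 45
}
```

This is essentially a hand-written version of what Cow does: the first iteration works on the borrowed argument and later iterations work on an owned value.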

Rust use of moved value

When using the function below:
fn factors(number: &BigInt) -> Vec<BigInt> {
let mut n = number.clone();
let mut i: BigInt = ToBigInt::to_bigint(&2).unwrap();
let mut factors = Vec::<BigInt>::new();
while i * i <= n {
if (n % i) == ToBigInt::to_bigint(&1).unwrap() {
i = i + ToBigInt::to_bigint(&1).unwrap();
}
else {
n = n/i as BigInt;
factors.push(i);
}
i = i + ToBigInt::to_bigint(&1).unwrap();
}
if n > i {
factors.push(n);
}
factors
}
I get moved value errors for literally every use of i or n, starting from the line with while and also in the if. I have read about borrowing, which I understand decently, but this I don't understand.
I am not "copying" the value at all, so I don't see anywhere where I could lose ownership of the variables.
Mul (and the other arithmetic operators) take their parameters by value, so i * i moves the value i (this is not a problem for primitive numbers because they implement Copy; BigInt does not).
As Mul is implemented for (two) &BigInt, you can do the multiplication (and the other arithmetic operations) with &:
use num::*;
fn factors(number: &BigInt) -> Vec<BigInt> {
let mut n = number.clone();
let mut i = BigInt::from(2);
let mut factors = Vec::new();
while &i * &i <= n {
if (&n % &i) == BigInt::one() {
i = i + BigInt::one();
} else {
n = n / &i;
factors.push(i.clone());
}
i = i + BigInt::one();
}
if n > i {
factors.push(n);
}
factors
}
Note that I also made some simplifications, like omitting the type on Vec::new and using BigInt::from (cannot fail).
Remember that operators in Rust are just syntactic sugar for function calls.
a + b translates to a.add(b).
Primitive types such as i32 implement the trait Copy. Thus, they can be copied into such an add function and do not need to be moved.
I assume the BigInt type you are working with does not implement this trait.
Therefore, in every binary operation you are moving the values.
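The same behaviour can be reproduced without any external crate. In this sketch, `Big` is a made-up non-Copy wrapper standing in for BigInt. It implements Mul once for values and once for references, and only the reference version leaves the operands usable afterwards.

```rust
use std::ops::Mul;

// Stand-in for a big integer type: deliberately not `Copy`.
#[derive(Debug, Clone, PartialEq)]
struct Big(u64);

// By-value multiplication: consumes both operands.
impl Mul for Big {
    type Output = Big;
    fn mul(self, rhs: Big) -> Big {
        Big(self.0 * rhs.0)
    }
}

// By-reference multiplication: only borrows the operands.
impl Mul<&Big> for &Big {
    type Output = Big;
    fn mul(self, rhs: &Big) -> Big {
        Big(self.0 * rhs.0)
    }
}

fn square(x: &Big) -> Big {
    x * x
}

fn main() {
    let i = Big(7);
    let sq = &i * &i; // `i` is only borrowed here
    println!("{:?} {:?}", i, sq);
    let moved = i.clone() * i; // this consumes `i`; using `i` afterwards would not compile
    println!("{:?}", moved);
}
```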

How can the state be shared between the returned result and the next iteration when using the scan iterator?

I would like to use the Scan iterator to construct a vector in a declarative fashion. It is clear how to achieve this by copying the intermediate state. The following expression compiles and produces the desired series:
let vec = (0..10).scan(0, |state, current| {
*state = *state + current;
Some(*state)
}).collect::<Vec<_>>();
However, if I try to achieve the same behavior by moving the state instead of copying it, I get in trouble with lifetimes. For example, when working with vectors instead of integers, one cannot move the state out of the closure and reuse it in the next iteration. The expression
let vec = (0..10).map(|x| vec![x]).scan(vec![0], |state, current| {
*state = vec![state[0] + current[0]];
Some(*state)
}).collect::<Vec<_>>();
fails to compile due to
error: cannot move out of borrowed content [E0507]
Some(*state)
^~~~~~
See, for example, this MCVE.
Borrowing the state instead of moving would also be an option:
let start = &vec![0];
let vec = (0..10).map(|x| vec![x]).scan(start, |state, current| {
*state = &vec![state[0] + current[0]];
Some(*state)
}).collect::<Vec<_>>();
but this fails because the new value falls out of scope when the state is returned.
error: borrowed value does not live long enough
*state = &vec![state[0] + current[0]]
What I ended up doing is using a for loop:
let xs = &mut Vec::<Vec<i32>>::with_capacity(10);
xs.push(vec![0]);
for x in 1..10 {
    let z = vec![xs.last().unwrap()[0] + x];
    xs.push(z);
}
but I would prefer a chaining solution.
Let's check the definition of scan:
fn scan<St, B, F>(self, initial_state: St, f: F) -> Scan<Self, St, F>
where F: FnMut(&mut St, Self::Item) -> Option<B>
Note that B is distinct from St. The idea of scan is that:
you keep an accumulator of type St
at each iteration, you produce a value of type B
and indeed it is not quite suited to returning values of type St because you are only borrowing St and do not control its lifetime.
scan is made for you to return a brand new value each time:
let xs = (0..10).scan(0, |state, current| {
*state += current;
Some(NonCopy::new(*state))
}).collect::<Vec<_>>();
and that's it!
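Here is a self-contained variant of the same idea in which the produced value is a Vec<i32>, as in the question, instead of the NonCopy placeholder. The function name `running_sums` is an assumption made for this example. The state stays a cheap integer and a fresh vector is built for every item.

```rust
/// Returns the running sums of 0..n, each wrapped in its own one-element vector.
fn running_sums(n: i32) -> Vec<Vec<i32>> {
    (0..n)
        .scan(0, |state, current| {
            // Update the integer accumulator in place...
            *state += current;
            // ...and hand out a brand new, owned value built from it.
            Some(vec![*state])
        })
        .collect()
}

fn main() {
    println!("{:?}", running_sums(5)); // [[0], [1], [3], [6], [10]]
}
```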
A note on efficiency.
The state of scan is a sunk cost so it is best to use a cheap state (a single integer here).
If you need a larger type X and wish to "get your memory back", then you can pass an &mut Option<X> and then use .take() after the scan:
let mut x = Some(NonCopy::new(0));
let xs = (0..10).scan(&mut x, |state, current| {
let i: &mut i32 = &mut state.as_mut().unwrap().value;
*i += current;
Some(NonCopy::new(*i))
}).collect::<Vec<_>>();
let _ = x.take();
It's not as elegant, of course.
I don't think it is possible to do this with the scan method without cloning the value.
When you return a non-Copy value from the closure, you lose ownership of that value. It's not possible to keep any reference to it, because its new owner could move the value anywhere in memory (for example, during vector resizing), and Rust is designed to protect against this kind of error.
By chaining solution, do you mean this?
let vec = (0..10)
.fold((Vec::with_capacity(10), 0), |(mut vec, previous), x| {
vec.push(vec![previous + x]);
(vec, previous + x)
})
.0;

Creating a vector with non-constant length

Editor's note: this question was asked before Rust 1.0 and some of the assertions in the question are not necessarily true in Rust 1.0. Some answers have been updated to address both versions.
I want to create a vector, but I only know the size I want the vector to be at runtime. This is how I'm doing it now (i.e. creating an empty, mutable vector, and adding vectors to it) :
fn add_pairs(pairs: ~[int]) -> ~[int] {
let mut result : ~[int] = ~[];
let mut i = 0;
while i < pairs.len() {
result += ~[pairs[i] + pairs[i + 1]];
i += 2;
}
return result;
}
This is how I want to do it (i.e., creating a vector and putting everything in it, instead of adding lots of vectors together):
fn add_pairs(pairs: ~[int]) -> ~[int] {
let number_of_pairs = pairs.len() / 2;
let result : ~[int, ..number_of_pairs];
let mut i = 0;
while i < pairs.len() {
result[i] = pairs[2 * i] + pairs[2 * i + 1];
i += 1;
}
return result;
}
Unfortunately, doing the above gives me something like:
error: expected constant expr for vector length: Non-constant path in constant expr
let result: ~[int, ..number_of_pairs];
^~~~~~~~~~~~~~~~~~~~~~~~
I get the impression that vectors have to have their size known at compile time (and so you need to set their size to a constant). Coming from a Java background, I'm confused! Is there a way to create a vector whose size you only know at runtime?
I'm using Rust 0.6.
In Rust 1.0.0, the std::vec::Vec struct was stabilized, so you can instantiate a growable vector with let mut my_vec = Vec::new(); You can also use the vec! macro like so: let mut another_vec = vec![1isize, 2isize, 3isize]; Note that in both cases the variable must be declared mut if you want to add items to it later.
With these vectors you can call my_vec.push(num); for individual items or another_vec.extend_from_slice(&[4, 5, 6]); to add several items to the end of the vector.
For your specific problem, you could do something like this:
fn add_pairs(pairs: Vec<Vec<isize>>) -> Vec<isize> {
    let mut result = Vec::new();
    for pair in pairs.iter() {
        // Each inner vector is assumed to hold exactly two numbers.
        result.push(pair[0] + pair[1]);
    }
    result
}
You can see this in action on the Rust Playground, where I assumed you have a nested vector of integer pairs.
There is no way to create an array of constant length with the length determined at runtime; only compile-time constant length arrays are allowed, so (variations of) your first method with Vec<i32> (previously ~[int]) is the only supported way. You could use vec![0; number_of_pairs] to create a vector of the correct size and then use your second approach.
There are many helper functions for what you are trying to do (using while directly in Rust should be very rare):
fn add_pairs(pairs: &[i32]) -> Vec<i32> {
let mut result = Vec::new();
for i in 0..(pairs.len() / 2) {
result.push(pairs[2 * i] + pairs[2 * i + 1])
}
result
}
Or even
fn add_pairs(pairs: &[i32]) -> Vec<i32> {
pairs
.chunks(2)
.filter(|x| x.len() == 2)
.map(|x| x[0] + x[1])
.collect()
}
Docs: chunks, filter, map, collect. (The filter is just because the last element of chunks may have length 1.)
Also note that adding two vectors allocates a whole new one, while push doesn't do this necessarily and is much faster (and .collect is similar).
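Two further variants, sketched with the standard library only. The first uses chunks_exact, which skips a trailing incomplete chunk so no filter is needed. The second follows the vec![0; n] suggestion and fills a pre-sized vector by index. The name `add_pairs_indexed` is made up for this example.

```rust
/// Sums adjacent pairs; a trailing unpaired element is ignored.
fn add_pairs(pairs: &[i32]) -> Vec<i32> {
    pairs.chunks_exact(2).map(|p| p[0] + p[1]).collect()
}

/// Same result, but writes into a vector whose length is chosen at runtime.
fn add_pairs_indexed(pairs: &[i32]) -> Vec<i32> {
    let number_of_pairs = pairs.len() / 2;
    let mut result = vec![0; number_of_pairs];
    for i in 0..number_of_pairs {
        result[i] = pairs[2 * i] + pairs[2 * i + 1];
    }
    result
}

fn main() {
    let input = [1, 2, 3, 4, 5];
    println!("{:?}", add_pairs(&input)); // [3, 7]
    println!("{:?}", add_pairs_indexed(&input)); // [3, 7]
}
```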
Since at least Rust 1.0, there is a Vec::with_capacity() function that handles this scenario.
Example code:
let n = 44; // pretend this is determined at run time
let mut v = Vec::<f64>::with_capacity(n);
v.push(6.26);
println!("{:?}", v); // prints [6.26]
println!("{:?}", v.len()); // prints 1
println!("{:?}", v.capacity()); // prints 44 (the capacity is guaranteed to be at least n)
