How to use Rayon for parallel calculation of PI - multithreading

My code calculates Pi using Ramanujan's formula without Rayon, and I want to use Rayon to parallelize it for my project.
I know that I need to use something like this:
use rayon::prelude::*;
fn sum_of_squares(input: &[f64]) -> f64 {
input.par_iter() // <-- just change that!
.map(|&i| i * i)
.sum()
}
but I still don't understand how to apply it to my code.
Here is my code:
use rayon::prelude::*;
pub fn factorial(n: f64) -> f64 {
if n == 0.0 {
return 1.00;
} else {
let result: f64 = factorial(n - 1.0) * n;
return result;
}
}
pub fn estimate_pi() -> f64 {
let mut total: f64 = 0.0;
let mut k: f64 = 0.0;
let factor1: f64 = 2.0;
let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;
loop {
let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let numm: f64 = 396.0;
let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
let term: f64 = factor * num / den;
total += term;
if term.abs() < 1e-15 {
break;
}
k += 1.0;
}
return 1.0 / total;
}
fn main() {
println!("{}", estimate_pi());
}

The first step is to make your algorithm parallelizable, by making each iteration independent. The first thing I did was add a debug statement to print the final value of k:
k += 1.0;
}
dbg!(k);
return 1.0 / total;
That printed k = 2, so I can use it to build a range of k values, one for each independent iteration:
(0..=iterations) // [0, 1, 2] for iterations = 2
We'll iterate over the elements in that range, instead of using the epsilon check you have:
pub fn estimate_pi(iterations: usize) -> f64 {
let mut total: f64 = 0.0;
let factor1: f64 = 2.0;
let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;
for i in 0..=iterations {
let k: f64 = i as f64;
let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let numm: f64 = 396.0;
let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
let term: f64 = factor * num / den;
total += term;
}
return 1.0 / total;
}
// call estimate_pi(2)
The total is just the sum of the terms of all iterations, so we can convert this from a loop into a map-reduce operation. For each number in the range, we calculate the term (map). Then, we use fold (reduce) to calculate the sum.
pub fn estimate_pi(iterations: usize) -> f64 {
let factor1: f64 = 2.0;
let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;
let sum: f64 = (0..=iterations).into_iter().map(|i| {
let k: f64 = i as f64;
let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let numm: f64 = 396.0;
let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
let term: f64 = factor * num / den;
term
}).fold(0.0, |a, b| a + b);
return 1.0 / sum;
}
Now we can use rayon's methods to convert this into a parallel operation. Replace into_iter() with into_par_iter() and fold(0.0, |a, b| a + b) with reduce(|| 0.0, |a, b| a + b):
pub fn estimate_pi(iterations: usize) -> f64 {
let factor1: f64 = 2.0;
let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;
// map is now a parallel map, and reduce is a parallel reduce
let sum: f64 = (0..=iterations).into_par_iter().map(|i| {
let k: f64 = i as f64;
let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let numm: f64 = 396.0;
let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
let term: f64 = factor * num / den;
term
}).reduce(|| 0.0, |a, b| a + b);
return 1.0 / sum;
}
Now to clean up the code a bit to make it more idiomatic:
remove explicit typing where appropriate
use implicit returns
use constant for sqrt(2)
more meaningful variable names
embed 396 in the expressions
use std::f64::consts::*;
pub fn estimate_pi(iterations: usize) -> f64 {
let factor = (SQRT_2 * 2.0) / 9801.0;
let sum = (0..=iterations).into_par_iter().map(|i| {
let k = i as f64;
let numerator = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let denominator = factorial(k).powf(4.0) * (396_f64).powf(4.0 * k);
factor * numerator / denominator
}).reduce(|| 0.0, |a, b| a + b);
1.0 / sum
}
As a last step, we can make factorial parallel as well:
// now have to call this with a `usize`
pub fn factorial(n: usize) -> f64 {
let out = (1..=n).into_par_iter().reduce(|| 1, |a, b| a * b);
out as f64
}
pub fn estimate_pi(iterations: usize) -> f64 {
let factor = (SQRT_2 * 2.0) / 9801.0;
let sum = (0..=iterations).into_par_iter().map(|i| {
let k = i as f64;
// notice we now pass the `i: usize` in here
let numerator = factorial(4 * i) * (1103.0 + (26390.0 * k));
let denominator = factorial(i).powf(4.0) * (396_f64).powf(4.0 * k);
factor * numerator / denominator
}).reduce(|| 0.0, |a, b| a + b);
1.0 / sum
}
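One caveat with the integer version above: the product is accumulated in a usize, so on a 64-bit target it overflows for n > 20 (21! exceeds u64::MAX). That does not matter for iterations = 2, where the largest argument is 8, but it would for larger iteration counts. A minimal sequential sketch that accumulates in f64 instead (the name factorial_f64 is made up for illustration; the same map-then-product chain should also work on a parallel iterator, which is left out here to keep the sketch dependency-free):

```rust
/// Computes n! as an f64 by multiplying 1.0 * 2.0 * ... * n.
/// Accumulating in f64 trades exactness for range: large results are
/// approximate, but they do not overflow the way an integer product does.
fn factorial_f64(n: usize) -> f64 {
    // An empty range (n = 0) yields the multiplicative identity, 1.0.
    (1..=n).map(|x| x as f64).product()
}

fn main() {
    println!("{}", factorial_f64(8));
    println!("{:e}", factorial_f64(25));
}
```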
Final Code
use rayon::prelude::*;
use std::f64::consts::*;
pub fn factorial(n: usize) -> f64 {
let out = (1..=n).into_par_iter().reduce(|| 1, |a, b| a * b);
out as f64
}
pub fn estimate_pi(iterations: usize) -> f64 {
let factor = (SQRT_2 * 2.0) / 9801.0;
let sum = (0..=iterations).into_par_iter().map(|i| {
let k = i as f64;
let numerator = factorial(4 * i) * (1103.0 + (26390.0 * k));
let denominator = factorial(i).powf(4.0) * (396_f64).powf(4.0 * k);
factor * numerator / denominator
}).reduce(|| 0.0, |a, b| a + b);
1.0 / sum
}
fn main() {
// our algorithm results in the same value as the constant
println!("pi_a: {:.60}", estimate_pi(2));
println!("pi_c: {:.60}", PI);
}
Output
pi_a: 3.141592653589793115997963468544185161590576171875000000000000
pi_c: 3.141592653589793115997963468544185161590576171875000000000000
Recommendations
You should benchmark the different versions with different amounts of parallelism to see which performs best. It may well be that the Rayon version is slower than the sequential one, since there are so few iterations in total.
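As a starting point for such a comparison, here is a minimal timing sketch using only std::time::Instant. The helper name time_it and the stand-in workload are made up for illustration; a dedicated benchmarking harness would likely give more reliable numbers than a single wall-clock measurement.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}

fn main() {
    // Stand-in workload; swap in estimate_pi(2) or any other variant to compare.
    let (sum, elapsed) = time_it(|| (0..1_000_000u64).sum::<u64>());
    println!("sum = {}, took {:?}", sum, elapsed);
}
```

Running each variant several times and comparing the fastest or median run helps smooth out noise from thread-pool startup and scheduling.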
You might also consider using a lookup table for factorials since n <= k * 4 <= 8:
pub fn factorial(n: usize) -> f64 {
const TABLE: [f64; 9] = [
1.0, // 0!
1.0, // 1!
2.0, // 2!
6.0, // 3!
24.0, // 4!
120.0, // 5!
720.0, // 6!
5040.0, // 7!
40320.0, // 8!
];
TABLE[n]
}
And, of course, enabling inlining can help as well.

Related

Writing rust function with traits working for Vec and array []

I would like to implement a function in Rust that computes the norm of an array or a Vec.
For a Vec<f64> I would write the function as
pub fn vector_norm( vec_a : &Vec<f64> ) -> f64 {
let mut norm = 0 as f64;
for i in 0..vec_a.len(){
norm += vec_a[i] * vec_a[i];
}
norm.sqrt()
}
and for an &[f64] I would do
pub fn vector_norm( vec_a : &[f64] ) -> f64 {
let mut norm = 0 as f64;
for i in 0..vec_a.len(){
norm += vec_a[i] * vec_a[i];
}
norm.sqrt()
}
But is there a way to combine both versions into a single function by using traits? I was thinking of something like
pub fn vector_norm<T:std::iter::ExactSizeIterator>
( vec_a : &T ) -> f64 {
let mut norm = 0 as f64;
for i in 0..vec_a.len(){
norm += vec_a[i] * vec_a[i];
}
norm.sqrt()
}
This does not work because the type parameter T is not indexable. Is it possible to do this somehow? Maybe with an iterator trait or something?
First of all, Vec<T> implements Deref<Target = [T]>. This means that a &Vec<f64> is automatically coerced to &[f64] wherever one is expected (deref coercion). So, just taking a &[f64] will work:
fn vector_norm(vec_a: &[f64]) -> f64 {
let mut norm = 0 as f64;
for i in 0..vec_a.len() {
norm += vec_a[i] * vec_a[i];
}
norm.sqrt()
}
fn main() {
let my_vec = vec![1.0, 2.0, 3.0];
// &my_vec is implicitly converted to &[f64]
println!("{:?}", vector_norm(&my_vec));
}
However, if you want to broaden the acceptable values even further to all slice-like types, perhaps AsRef may be of use:
fn vector_norm<T: AsRef<[f64]>>(vec_a: T) -> f64 {
// use AsRef to get a &[f64]
let vec_a: &[f64] = vec_a.as_ref();
let mut norm = 0 as f64;
for i in 0..vec_a.len() {
norm += vec_a[i] * vec_a[i];
}
norm.sqrt()
}
fn main() {
let my_vec = vec![1.0, 2.0, 3.0];
println!("{:?}", vector_norm(&my_vec));
}
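To illustrate what the AsRef<[f64]> bound buys, here is a small sketch that calls one function with a borrowed Vec, an array, a slice, and an owned Vec. The body uses an iterator sum in place of the indexed loop, which is a stylistic substitution and not part of the answer above.

```rust
fn vector_norm<T: AsRef<[f64]>>(values: T) -> f64 {
    // as_ref() gives a &[f64] regardless of the concrete input type.
    values.as_ref().iter().map(|x| x * x).sum::<f64>().sqrt()
}

fn main() {
    let my_vec: Vec<f64> = vec![3.0, 4.0];
    let my_array: [f64; 2] = [3.0, 4.0];
    println!("{}", vector_norm(&my_vec));       // borrowed Vec<f64>
    println!("{}", vector_norm(my_array));      // array by value
    println!("{}", vector_norm(&my_array[..])); // slice
    println!("{}", vector_norm(my_vec));        // owned Vec<f64>
}
```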
In addition to Aplet's answer, I'd add that if you're taking something that is only going to be used in a for _ in loop, you might want to look at IntoIterator.
fn vector_norm<T: IntoIterator<Item = f64>>(t: T) -> f64 {
let mut norm = 0f64;
for i in t {
norm += i * i;
}
norm.sqrt()
}
When you write for i in t, the compiler rewrites that into something that looks a bit more like this:
let mut iter = t.into_iter();
loop {
match iter.next() {
None => break,
Some(i) => {
// loop body
}
}
}
So if you only want to constrain your input as "something that works in a for loop", IntoIterator is the trait you're looking for.
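One caveat with the bound Item = f64: iterating over a &Vec<f64> or a &[f64] yields &f64 items, so those inputs do not satisfy it without something like .iter().copied() at the call site. A possible workaround, sketched here under the assumption that accepting both owned and borrowed items is desirable, is to bound the item type with std::borrow::Borrow<f64>:

```rust
use std::borrow::Borrow;

fn vector_norm<I>(values: I) -> f64
where
    I: IntoIterator,
    I::Item: Borrow<f64>,
{
    let mut norm = 0f64;
    for item in values {
        // Works for both f64 and &f64 items.
        let x: &f64 = item.borrow();
        norm += x * x;
    }
    norm.sqrt()
}

fn main() {
    let my_vec: Vec<f64> = vec![3.0, 4.0];
    println!("{}", vector_norm(&my_vec));      // items are &f64
    println!("{}", vector_norm(&my_vec[..]));  // slice, items are &f64
    println!("{}", vector_norm(my_vec));       // items are f64
}
```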

Lifetimes for threading with Arc [duplicate]

This question already has answers here:
The lifetime of self parameter in Rust when using threads [duplicate]
(1 answer)
How can I pass a reference to a stack variable to a thread?
(1 answer)
Closed 2 years ago.
I implemented a simple game of life and I want to try multithreading for the first time.
I use a 1D representation of the game field according to the formula index = y * width + x. The live cells are stored in a HashSet; a cell is alive if it is contained in the set.
The relevant part of the code regarding the multithreading:
extern crate crossbeam_utils; // 0.8.1
use std::collections::HashSet;
use std::sync::Arc;
use crossbeam_utils::thread;
pub struct GameOfLife {
width: u32,
height: u32,
live_cells: HashSet<u32>,
}
impl GameOfLife {
pub fn new(width: u32, height: u32) -> Self {
GameOfLife {
width,
height,
live_cells: HashSet::with_capacity((width * height / 2) as usize),
}
}
fn live_neighbours(&self, pos: u32) -> usize {
let indicies: [i64; 8] = [
pos as i64 - self.width as i64 - 1,
pos as i64 - self.width as i64,
pos as i64 - self.width as i64 + 1,
pos as i64 - 1,
pos as i64 + 1,
pos as i64 + self.width as i64 - 1,
pos as i64 + self.width as i64,
pos as i64 + self.width as i64 + 1,
];
indicies
.iter()
.filter(|&&i| {
i >= 0
&& i < (self.width * self.height) as i64
&& self.live_cells.contains(&(i as u32))
})
.count()
}
fn next_gen(&self, start: u32, stop: u32) -> HashSet<u32> {
(start..stop)
.into_iter()
.filter(
|&i| match (self.live_neighbours(i), self.live_cells.contains(&i)) {
(2, true) | (3, _) => true,
_ => false,
},
)
.collect::<HashSet<u32>>()
}
pub fn next_generation(&mut self) {
let size = self.width * self.height;
let half = (size / 2) as u32;
let mut new_cells = HashSet::<u32>::with_capacity(half as usize);
{
let h1 = thread::scope(|s| s.spawn(|_| self.next_gen(0, half))).unwrap();
let h2 = thread::scope(|s| s.spawn(|_| self.next_gen(half, size))).unwrap();
new_cells.extend(&h1.join().unwrap());
new_cells.extend(&h2.join().unwrap());
}
self.live_cells = new_cells;
}
}
fn main() {
let mut gol = GameOfLife::new(170, 40);
//seed random values ...
for _i in 0..1000000 {
gol.next_generation();
}
}
I am using the crate crossbeam-utils = "0.8".
I am now getting this error:
error: lifetime may not live long enough
--> src/main.rs:60:40
|
60 | let h1 = thread::scope(|s| s.spawn(|_| self.next_gen(0, half))).unwrap();
| -- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ returning this value requires that `'1` must outlive `'2`
| ||
| |return type of closure is ScopedJoinHandle<'2, HashSet<u32>>
| has type `&'1 Scope<'_>`
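The error arises because each thread::scope call tries to return the ScopedJoinHandle out of the scope closure, but the handle borrows the scope and cannot outlive it. Besides, a scope waits for its threads before returning, so two separate scopes would run one after the other. Spawning and joining both threads inside a single scope avoids both problems. A minimal sketch of that shape follows; note that it swaps in std::thread::scope (stable since Rust 1.63) for crossbeam_utils::thread::scope so that it runs without external crates, and it builds the struct directly in place of new(). With crossbeam the structure would be similar, except that the spawn closures take a scope argument and scope returns a Result.

```rust
use std::collections::HashSet;
use std::thread;

pub struct GameOfLife {
    width: u32,
    height: u32,
    live_cells: HashSet<u32>,
}

impl GameOfLife {
    // Same neighbour rule as in the question, written compactly.
    fn live_neighbours(&self, pos: u32) -> usize {
        let (p, w) = (pos as i64, self.width as i64);
        let size = (self.width * self.height) as i64;
        [p - w - 1, p - w, p - w + 1, p - 1, p + 1, p + w - 1, p + w, p + w + 1]
            .iter()
            .filter(|&&i| i >= 0 && i < size && self.live_cells.contains(&(i as u32)))
            .count()
    }

    fn next_gen(&self, start: u32, stop: u32) -> HashSet<u32> {
        (start..stop)
            .filter(|&i| {
                matches!(
                    (self.live_neighbours(i), self.live_cells.contains(&i)),
                    (2, true) | (3, _)
                )
            })
            .collect()
    }

    pub fn next_generation(&mut self) {
        let size = self.width * self.height;
        let half = size / 2;
        let this = &*self; // shared borrow used by both threads
        let (first, second) = thread::scope(|s| {
            let h1 = s.spawn(|| this.next_gen(0, half));
            let h2 = s.spawn(|| this.next_gen(half, size));
            // Join inside the scope: the handles never leave this closure.
            (h1.join().unwrap(), h2.join().unwrap())
        });
        let mut new_cells = first;
        new_cells.extend(second);
        self.live_cells = new_cells;
    }
}

fn main() {
    // A horizontal blinker in the middle of a 5x5 field.
    let mut gol = GameOfLife {
        width: 5,
        height: 5,
        live_cells: vec![11, 12, 13].into_iter().collect(),
    };
    gol.next_generation();
    let mut cells: Vec<u32> = gol.live_cells.iter().copied().collect();
    cells.sort();
    println!("{:?}", cells); // [7, 12, 17]
}
```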

How to fix rust operations not working as expected?

I have implemented a simple command-line calculator in Rust. The add function acts as normal but the subtract, multiply, and divide functions don't work. The rest of the code is on GitHub: https://github.com/henryboisdequin/rust-calculator.
calc.rs
impl Calc {
pub fn add(arr: Vec<i64>) -> f64 {
let mut total: f64 = 0.0;
for num in arr {
total += num as f64;
}
total
}
pub fn sub(arr: Vec<i64>) -> f64 {
let mut total: f64 = 0.0;
for num in arr {
total -= num as f64;
}
total
}
pub fn mul(arr: Vec<i64>) -> f64 {
let mut total: f64 = 0.0;
for num in arr {
total *= num as f64;
}
total
}
pub fn div(arr: Vec<i64>) -> f64 {
let mut total: f64 = 0.0;
for num in arr {
total /= num as f64;
}
total
}
}
Instead of having your functions take Vec<i64>, I would suggest &[i64], or even &[f64] to avoid the as f64. This wouldn't really break your existing code, as you can just borrow a Vec<i64> and have it automatically dereference to &[i64].
You can simplify add() by using sum(), and mul() by using product().
pub fn add(arr: &[i64]) -> f64 {
arr.iter().map(|&x| x as f64).sum()
}
pub fn mul(arr: &[i64]) -> f64 {
arr.iter().map(|&x| x as f64).product()
}
You can similarly simplify sub() and div() with next() and then fold().
pub fn sub(arr: &[i64]) -> f64 {
let mut it = arr.iter().map(|&x| x as f64);
it.next()
.map(|x| it.fold(x, |acc, x| acc - x))
.unwrap_or(0.0)
}
pub fn div(arr: &[i64]) -> f64 {
let mut it = arr.iter().map(|&x| x as f64);
it.next()
.map(|x| it.fold(x, |acc, x| acc / x))
.unwrap_or(0.0)
}
You can simplify them even further with fold_first(). At the time of writing it was experimental and nightly-only (it has since been stabilized as Iterator::reduce() in Rust 1.51), so the examples below use fold1() from the itertools crate; reduce() from the reduce crate is another option.
// itertools = "0.10"
use itertools::Itertools;
pub fn sub(arr: &[i64]) -> f64 {
arr.iter().map(|&x| x as f64).fold1(|a, b| a - b).unwrap_or(0.0)
}
pub fn div(arr: &[i64]) -> f64 {
arr.iter().map(|&x| x as f64).fold1(|a, b| a / b).unwrap_or(0.0)
}
You can even replace the closures with Sub::sub and Div::div.
// itertools = "0.10"
use itertools::Itertools;
use std::ops::{Div, Sub};
pub fn sub(arr: &[i64]) -> f64 {
arr.iter().map(|&x| x as f64).fold1(Sub::sub).unwrap_or(0.0)
}
pub fn div(arr: &[i64]) -> f64 {
arr.iter().map(|&x| x as f64).fold1(Div::div).unwrap_or(0.0)
}
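With Iterator::reduce from the standard library (stable since Rust 1.51), the same functions need no extra crate. A minimal sketch, keeping the convention from above that an empty input yields 0.0:

```rust
pub fn sub(arr: &[i64]) -> f64 {
    // reduce() uses the first element as the initial accumulator.
    arr.iter().map(|&x| x as f64).reduce(|a, b| a - b).unwrap_or(0.0)
}

pub fn div(arr: &[i64]) -> f64 {
    arr.iter().map(|&x| x as f64).reduce(|a, b| a / b).unwrap_or(0.0)
}

fn main() {
    println!("{}", sub(&[10, 3, 2])); // 10 - 3 - 2 = 5
    println!("{}", div(&[20, 2, 5])); // 20 / 2 / 5 = 2
}
```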
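Siguza helped me fix this problem by pointing out why only my addition function worked. I first put this down to addition being commutative, but the real reason is the starting value: 0.0 is the identity for addition only. Multiplication is commutative too, yet 0.0 * x is always 0.0, and subtraction and division need to start from the first element.
Here is the corrected code: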
pub struct Calc;
impl Calc {
pub fn add(arr: Vec<i64>) -> f64 {
let mut total: f64 = 0.0;
for num in arr {
total += num as f64;
}
total
}
pub fn sub(arr: Vec<i64>) -> f64 {
let mut total: f64 = arr[0] as f64;
let mut counter = 0;
while counter != arr.len() - 1 {
total -= arr[counter + 1] as f64;
counter += 1;
}
total
}
pub fn mul(arr: Vec<i64>) -> f64 {
let mut total: f64 = arr[0] as f64;
let mut counter = 0;
while counter != arr.len() - 1 {
total *= arr[counter + 1] as f64;
counter += 1;
}
total
}
pub fn div(arr: Vec<i64>) -> f64 {
let mut total: f64 = arr[0] as f64;
let mut counter = 0;
while counter != arr.len() - 1 {
total /= arr[counter + 1] as f64;
counter += 1;
}
total
}
}
For the operations other than addition, instead of initializing the total to 0.0, I initialized it to the first element of the given array and then subtracted, multiplied, or divided it by the rest of the elements in the array.

Can I do this with an iterator?

I wrote a function that maps a vector to the interval [0, 1] by computing its normalized cumulative sums:
fn vec2interval(v: &Vec<f32>) -> Vec<f32> {
let total: f32 = v.iter().sum();
let mut interval: Vec<f32> = vec![0f32; v.len()];
interval[0] = v[0] / total;
for i in 1..v.len() {
interval[i] = interval[i-1] + v[i] / total;
}
return interval;
}
Is there any way to do the same with an iterator? I wrote the following, but it's slower and still needs a for loop:
fn vec2interval(v: &Vec<f32>) -> Vec<f32> {
let total: f32 = v.iter().sum();
let mut interval: Vec<f32> = v
.iter()
.map(|x| x / total)
.collect::<Vec<f32>>();
for i in 1..v.len() {
interval[i] = interval[i-1] + interval[i];
}
return interval;
}
scan can do the whole job:
fn vec2interval(v: &Vec<f32>) -> Vec<f32> {
let total: f32 = v.iter().sum();
v.iter()
.scan(0.0, |acc, x| {
*acc += x / total;
Some(*acc)
})
.collect()
}
Also, it is better to take a slice (&[f32]) instead of &Vec<f32> as the parameter.
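A sketch of the same scan-based function with a slice parameter, showing that existing callers passing &Vec<f32> keep working while arrays and sub-slices are accepted as well:

```rust
fn vec2interval(v: &[f32]) -> Vec<f32> {
    let total: f32 = v.iter().sum();
    v.iter()
        // acc carries the running sum of the normalized values.
        .scan(0.0_f32, |acc, &x| {
            *acc += x / total;
            Some(*acc)
        })
        .collect()
}

fn main() {
    let weights: Vec<f32> = vec![1.0, 1.0, 2.0];
    println!("{:?}", vec2interval(&weights));     // &Vec<f32> coerces to &[f32]
    println!("{:?}", vec2interval(&weights[1..])); // a sub-slice works too
}
```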

How to identify floating point precision at runtime with Rust?

How to identify that the precision of 1.0005 in the below code (rust playground link) is 4 at runtime?:
fn round(n: f64, precision: u32) -> f64 {
(n * 10_u32.pow(precision) as f64).round() / 10_i32.pow(precision) as f64
}
fn main() {
let x = 1.0005_f64;
println!("{:?}", round(x, 1));
println!("{:?}", round(x, 2));
println!("{:?}", round(x, 3));
println!("{:?}", round(x, 4));
println!("{:?}", round(x, 5));
}
I'm not sure whether I understand the question correctly. You want the number of decimal places?
fn round(n: f64, precision: u32) -> f64 {
(n * 10_u32.pow(precision) as f64).round() / 10_i32.pow(precision) as f64
}
fn precision(x: f64) -> Option<u32> {
for digits in 0..std::f64::DIGITS {
if round(x, digits) == x {
return Some(digits);
}
}
None
}
fn main() {
let x = 1.0005_f64;
println!("{:?}", precision(x));
}
I'd also recommend making the types in your round function a bit larger, so you don't run into overflow so quickly. The code above already fails for x = 1e-10, because 10_u32.pow(10) overflows a u32.
fn round(n: f64, precision: u32) -> f64 {
let precision = precision as f64;
(n * 10_f64.powf(precision)).round() / 10_f64.powf(precision)
}
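An alternative that avoids repeated rounding, offered as a sketch and not as part of the answer above: Rust's Display formatting for f64 prints the shortest decimal string that parses back to the same value, so counting the characters after the decimal point gives the number of decimal places. The helper name decimal_places is made up for illustration.

```rust
/// Counts the digits after the decimal point in the shortest decimal
/// representation of `x` that round-trips through parsing.
fn decimal_places(x: f64) -> usize {
    let text = format!("{}", x);
    match text.find('.') {
        // Everything after the '.' is the fractional part.
        Some(dot) => text.len() - dot - 1,
        // Whole numbers such as 3.0 are printed without a fractional part.
        None => 0,
    }
}

fn main() {
    println!("{}", decimal_places(1.0005)); // 4
    println!("{}", decimal_places(2.5));    // 1
    println!("{}", decimal_places(3.0));    // 0
}
```

Note that this reports the digits of the nearest representable f64, so a value produced by arithmetic (for example 0.1 + 0.2) may show many more places than the literal the programmer had in mind.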

Resources