Rust - Use threads to iterate over a vector - multithreading

My program creates a grid of numbers, then based on the sum of the numbers surrounding each on the grid the number will change in a set way. I'm using two vectors currently, filling the first with random numbers, calculating the changes, then putting the new values in the second vector. After the new values go into the second vector I then push them back into the first vector before going through the next loop. The error I get currently is:
error[E0499]: cannot borrow `grid_a` as mutable more than once at a time
--> src\main.rs:40:29
|
38 | thread::scope(|s| {
| - has type `&crossbeam::thread::Scope<'1>`
39 | for j in 0..grid_b.len() {
40 | s.spawn(|_| {
| - ^^^ `grid_a` was mutably borrowed here in the previous iteration of the loop
| _____________________|
| |
41 | | grid_a[j] = grid_b[j];
| | ------ borrows occur due to use of `grid_a` in closure
42 | | });
| |______________________- argument requires that `grid_a` is borrowed for `'1`
My current code is below. I'm way more familiar with C++ and C#, in the process of trying to learn Rust for this assignment. If I remove the thread everything compiles and runs properly. I'm not understanding how to avoid the multiple borrow. Ideally I'd like to use a separate thread::scope on with the for loop above the existing thread::scope as well.
use crossbeam::thread;
use rand::Rng;
use std::sync::{Arc, Mutex};
use std::thread::sleep;
use std::time::Duration;
use std::time::Instant;
static NUMROWS: i32 = 4;
static NUMCOLUMNS: i32 = 7;
static GRIDSIZE: i32 = NUMROWS * NUMCOLUMNS;
static PLUSNC: i32 = NUMCOLUMNS + 1;
static MINUSNC: i32 = NUMCOLUMNS - 1;
static NUMLOOP: i32 = 7;
static HIGH: u32 = 35;
fn main() {
let start = Instant::now();
let length = usize::try_from(GRIDSIZE).unwrap();
let total_checks = Arc::new(Mutex::new(0));
let mut grid_a = Vec::<u32>::with_capacity(length);
let mut grid_b = Vec::<u32>::with_capacity(length);
grid_a = fill_grid();
for h in 1..=NUMLOOP {
println!("-- {} --", h);
print_grid(&grid_a);
if h != NUMLOOP {
for i in 0..grid_a.len() {
let mut total_checks = total_checks.lock().unwrap();
grid_b[i] = checker(&grid_a, i.try_into().unwrap());
*total_checks += 1;
}
grid_a.clear();
thread::scope(|s| {
for j in 0..grid_b.len() {
s.spawn(|_| {
grid_a[j] = grid_b[j];
});
}
})
.unwrap();
grid_b.clear();
}
}

When you access a vector (or any slice) via index you're borrowing the whole vector.
You can use iterators which can give you mutable references to all the items in parallel.
use crossbeam::thread;
static NUMROWS: i32 = 4;
static NUMCOLUMNS: i32 = 7;
static GRIDSIZE: i32 = NUMROWS * NUMCOLUMNS;
static NUMLOOP: i32 = 7;
fn fill_grid() -> Vec<u32> {
(0..GRIDSIZE as u32).into_iter().collect()
}
fn main() {
let length = usize::try_from(GRIDSIZE).unwrap();
// just do this since else we create a vec and throw it away immediately
let mut grid_a = fill_grid();
let mut grid_b = Vec::<u32>::with_capacity(length);
for h in 1..=NUMLOOP {
println!("-- {} --", h);
println!("{grid_a:?}");
if h != NUMLOOP {
// removed a bunch of unrelated code
for i in 0..grid_a.len() {
grid_b.push(grid_a[i]);
}
// since we overwrite grid_a anyways we don't need to clear it.
// it would give us headaches anyways since grid_a[j] on an empty
// Vec panics.
// grid_a.clear();
thread::scope(|s| {
// instead of accessing the element via index we iterate over
// mutable references so we don't have to borrow the whole
// vector inside the thread
for (pa, b) in grid_a.iter_mut().zip(grid_b.iter().copied()) {
s.spawn(move |_| {
*pa = b + 1;
});
}
})
.unwrap();
grid_b.clear();
}
}
} // add missing }

Related

How do I move a vector reference into threads? [duplicate]

This question already has an answer here:
How can I pass a reference to a stack variable to a thread?
(1 answer)
Closed last month.
How do I move a vector reference into threads? The closest I get is the (minimized) code below. (I realize that the costly calculation still isn't parallel, as it is locked by the mutex, but one problem at a time.)
Base problem: I'm calculating values based on information saved in a vector. Then I'm storing the results as nodes per vector element. So vector in vector (but only one vector in the example code below). The calculation takes time so I would like to divide it into threads. The structure is big, so I don't want to copy it.
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
let n = Nodes::init();
n.calc();
println!("Result: nodes {:?}", n);
}
#[derive(Debug)]
struct Nodes {
nodes: Vec<Info>,
}
impl Nodes {
fn init() -> Self {
let mut n = Nodes { nodes: Vec::new() };
n.nodes.push(Info::init(1));
n.nodes.push(Info::init(2));
n
}
fn calc(&self) {
Nodes::calc_associative(&self.nodes);
}
fn calc_associative(nodes: &Vec<Info>) {
let mut handles = vec![];
let arc_nodes = Arc::new(nodes);
let counter = Arc::new(Mutex::new(0));
for _ in 0..2 {
let arc_nodes = Arc::clone(&arc_nodes);
let counter = Arc::clone(&counter);
let handle = thread::spawn(move || {
let mut idx = counter.lock().unwrap();
// costly calculation
arc_nodes[*idx].set_length(arc_nodes[*idx].get_length() * 2);
*idx += 1;
});
handles.push(handle);
}
for handle in handles {
handle.join().unwrap();
}
}
}
#[derive(Debug)]
struct Info {
length: u32,
}
impl Info {
fn init(length: u32) -> Self {
Info { length }
}
fn get_length(&self) -> u32 {
self.length
}
fn set_length(&mut self, x: u32) {
self.length = x;
}
}
The compiler complains that life time of the reference isn't fulfilled, but isn't that what Arc::clone() should do? Then Arc require a deref, but maybe there are better solutions before starting to dig into that...?
Compiling threads v0.1.0 (/home/freefox/proj/threads)
error[E0596]: cannot borrow data in an `Arc` as mutable
--> src/main.rs:37:17
|
37 | arc_nodes[*idx].set_length(arc_nodes[*idx].get_length() * 2);
| ^^^^^^^^^ cannot borrow as mutable
|
= help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Arc<&Vec<Info>>`
error[E0521]: borrowed data escapes outside of associated function
--> src/main.rs:34:26
|
25 | fn calc_associative(nodes: &Vec<Info>) {
| ----- - let's call the lifetime of this reference `'1`
| |
| `nodes` is a reference that is only valid in the associated function body
...
34 | let handle = thread::spawn(move || {
| __________________________^
35 | | let mut idx = counter.lock().unwrap();
36 | | // costly calculation
37 | | arc_nodes[*idx].set_length(arc_nodes[*idx].get_length() * 2);
38 | | *idx += 1;
39 | | });
| | ^
| | |
| |______________`nodes` escapes the associated function body here
| argument requires that `'1` must outlive `'static`
Some errors have detailed explanations: E0521, E0596.
For more information about an error, try `rustc --explain E0521`.
error: could not compile `threads` due to 2 previous errors
You wrap a reference with Arc. Now the type is Arc<&Vec<Info>>. There is still a reference here, so the variable could still be destroyed before the thread return and we have a dangling reference.
Instead, you should take a &Arc<Vec<Info>>, and on the construction of the Vec wrap it in Arc, or take &[Info] and clone it (let arc_nodes = Arc::new(nodes.to_vec());). You also need a mutex along the way (either Arc<Mutex<Vec<Info>>> or Arc<Vec<Mutex<Info>>>), since you want to change the items.
Or better, since you immediately join() the threads, use scoped threads:
fn calc_associative(nodes: &[Mutex<Info>]) {
let counter = std::sync::atomic::AtomicUsize::new(0); // Changed to atomic, prefer it to mutex wherever possible
std::thread::scope(|s| {
for _ in 0..2 {
s.spawn(|| {
let idx = counter.fetch_add(1, std::sync::atomic::Ordering::SeqCst);
let node = &mut *nodes[idx].lock().unwrap();
// costly calculation
node.set_length(node.get_length() * 2);
});
}
});
}

How do I tackle lifetimes in Rust?

I am having issues with the concept of lifetimes in rust. I am trying to use the crate bgpkit_parser to read in a bz2 file via url link and then create a radix trie.
One field extracted from the file is the AS Path which I have named path in my code within the build_routetable function. I am having trouble as to why rust does not like let origin = clean_path.last() which takes the last element in the vector.
fn as_parser(element: &BgpElem) -> Vec<u32> {
let x = &element.as_path.as_ref().unwrap().segments[0];
let mut as_vec = &Vec::new();
let mut as_path: Vec<u32> = Vec::new();
if let AsPathSegment::AsSequence(value) = x {
as_vec = value;
}
for i in as_vec {
as_path.push(i.asn);
}
return as_path;
}
fn prefix_parser(element: &BgpElem) -> String {
let subnet_id = element.prefix.prefix.ip().to_string().to_owned();
let prefix_id = element.prefix.prefix.prefix().to_string().to_owned();
let prefix = format!("{}/{}", subnet_id, prefix_id);//.as_str();
return prefix;
}
fn get_aspath(raw_aspath: Vec<u32>) -> Vec<u32> {
let mut as_path = Vec::new();
for i in raw_aspath {
if i < 64511 {
if as_path.contains(&i) {
continue;
}
else {
as_path.push(i);
}
}
else if 65535 < i && i < 4000000000 {
if as_path.contains(&i) {
continue;
}
else {
as_path.push(i);
}
}
}
return as_path;
}
fn build_routetable(mut trie4: Trie<String, Option<&u32>>, mut trie6: Trie<String, Option<&u32>>) {
let url: &str = "http://archive.routeviews.org/route-views.chile/\
bgpdata/2022.06/RIBS/rib.20220601.0000.bz2";
let parser = BgpkitParser::new(url).unwrap();
let mut count = 0;
for elem in parser {
if elem.elem_type == bgpkit_parser::ElemType::ANNOUNCE {
let record_timestamp = &elem.timestamp;
let record_type = "A";
let peer = &elem.peer_ip;
let prefix = prefix_parser(&elem);
let path = as_parser(&elem);
let clean_path = get_aspath(path);
// Issue is on the below line
// `clean_path` does not live long enough
// borrowed value does not live long
// enough rustc E0597
// main.rs(103, 9): `clean_path` dropped
// here while still borrowed
// main.rs(77, 91): let's call the
// lifetime of this reference `'1`
// main.rs(92, 17): argument requires
// that `clean_path` is borrowed for `'1`
let origin = clean_path.last(); //issue line
if prefix.contains(":") {
trie6.insert(prefix, origin);
}
else {
trie4.insert(prefix, origin);
}
count+=1;
if count >= 10000 {
println!("{:?} | {:?} | {:?} | {:?} | {:?}",
record_type, record_timestamp, peer, prefix, path);
count=0
}
};
}
println!("Trie4 size: {:?} prefixes", trie4.len());
println!("Trie6 size: {:?} prefixes", trie6.len());
}
Short answer: you're "inserting" a reference. But what's being referenced doesn't outlive what it's being inserted into.
Longer: The hint is your trie4 argument, the signature of which is this:
mut trie4: Trie<String, Option<&u32>>
So that lives beyond the length of the loop where things are declared. This is all in the loop:
let origin = clean_path.last(); //issue line
if prefix.contains(":") {
trie6.insert(prefix, origin);
}
While origin is a Vec<u32> and that's fine, the insert method is no doubt taking a String and either an Option<&u32> or a &u32. Obviously a key/value pair. But here's your problem: the value has to live as long as the collection, but your value is the last element contained in the Vec<u32>, which goes away! So you can't put something into it that will not live as long as the "container" object! Rust has just saved you from dangling references (just like it's supposed to).
Basically, your containers should be Trie<String, Option<u32>> without the reference, and then this'll all just work fine. Your problem is that the elements are references, and not just contained regular values, and given the size of what you're containing, it's actually smaller to contain a u32 than a reference (pointer size (though actually, it'll likely be the same either way, because alignment issues)).
Also of note: trie4 and trie6 will both be gone at the end of this function call, because they were moved into this function (not references or mutable references). I hope that's what you want.

Cannot borrow value from a hashmap as mutable because it is also borrowed as immutable

I want to get two values from a hashmap at the same time, but I can't escape the following error, I have simplified the code as follows, can anyone help me to fix this error.
#[warn(unused_variables)]
use hashbrown::HashMap;
fn do_cal(a: &[usize], b: &[usize]) -> usize {
a.iter().sum::<usize>() + b.iter().sum::<usize>()
}
fn do_check(i: usize, j:usize) -> bool {
i/2 < j - 10
}
fn do_expensive_cal(i: usize) -> Vec<usize> {
vec![i,i,i]
}
fn main() {
let size = 1000000;
let mut hash: HashMap<usize, Vec<usize>> = HashMap::new();
for i in 0..size{
if i > 0 {
hash.remove(&(i - 1));
}
if !hash.contains_key(&i){
hash.insert(i, do_expensive_cal(i));
}
let data1 = hash.get(&i).unwrap();
for j in i + 1..size {
if do_check(i, j) {
break
}
if !hash.contains_key(&j){
hash.insert(j, do_expensive_cal(j));
}
let data2 = hash.get(&j).unwrap();
let res = do_cal(data1, data2);
println!("res:{}", res);
}
}
}
Playground
error[E0502]: cannot borrow hash as mutable because it is also borrowed as immutable
--> src/main.rs:26:8
|
19 | let data1 = hash.get(&i).unwrap();
| ------------ immutable borrow occurs here
...
26 | hash.insert(j, vec![1,2,3]);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here
...
29 | let res = do_cal(data1, data2);
| ----- immutable borrow later used here
For more information about this error, try rustc --explain E0502.
error: could not compile playground due to previous error
Consider this: the borrow checker doesn't know that hash.insert(j, …) will leave the data you inserted with hash.insert(i, …) alone. For the borrow checker, hash.insert(…) may do anything to any element in hash, including rewriting or removing it. So you can't be allowed to hold the reference data1 over hash.insert(j, …).
How to get over that? The easiest is probably to move let data1 = hash.get(…) so it doesn't have to live for so long:
let data1 = hash.get(&i).unwrap();
let data2 = hash.get(&j).unwrap();
let res = do_cal(data1, data2);
This will of course look up data1 every loop iteration (and it must, since hash.insert(j, …) may have resized and thus realocated the content of the hashmap, giving data1 a new storage location in the hashmap). For completeness's sake, there are ways to get around that, but I don't recommend you do any of this:
Clone: let data1 = hash.get(&i).unwrap().clone() (if your vecs are short, this may actually be reasonable…)
As a way of making the cloning cheap, you could use a HashMap<usize, Rc<Vec<usize>>> instead (where you only need to clone the Rc, no the entire Vec)
If you ever need mutable references to both arguments of do_call, you could combine the Rc with a RefCell: Rc<RefCell<Vec<…>>>
If you need to overengineer it even more, you could replace the Rcs with references obtained from allocating in a bump allocator, e.g. bumpalo.
Since the keys to the hash table are the integers 0..100 you can use a Vec to perform these steps, temporarily splitting the Vec into 2 slices to allow the mutation on one side. If you need a HashMap for later computations, you can then create a HashMap from the Vec.
The following code compiles but panics because the j - 10 calculation underflows:
fn do_cal(a: &[usize], b: &[usize]) -> usize {
a.iter().sum::<usize>() + b.iter().sum::<usize>()
}
fn do_check(i: usize, j:usize) -> bool {
i/2 < j - 10
}
fn main() {
let size = 100;
let mut v: Vec<Option<Vec<usize>>> = vec![None; size];
for i in 0..size {
let (v1, v2) = v.split_at_mut(i + 1);
if v1[i].is_none() {
v1[i] = Some(vec![1,2,3]);
}
let data1 = v1[i].as_ref().unwrap();
for (j, item) in (i + 1..).zip(v2.iter_mut()) {
if do_check(i, j) {
break
}
if item.is_none() {
*item = Some(vec![1,2,3]);
}
let data2 = item.as_ref().unwrap();
let res = do_cal(data1, data2);
println!("res:{}", res);
}
}
}

How to give each CPU core mutable access to a portion of a Vec? [duplicate]

This question already has an answer here:
How do I pass disjoint slices from a vector to different threads?
(1 answer)
Closed 4 years ago.
I've got an embarrassingly parallel bit of graphics rendering code that I would like to run across my CPU cores. I've coded up a test case (the function computed is nonsense) to explore how I might parallelize it. I'd like to code this using std Rust in order to learn about using std::thread. But, I don't understand how to give each thread a portion of the framebuffer. I'll put the full testcase code below, but I'll try to break it down first.
The sequential form is super simple:
let mut buffer0 = vec![vec![0i32; WIDTH]; HEIGHT];
for j in 0..HEIGHT {
for i in 0..WIDTH {
buffer0[j][i] = compute(i as i32,j as i32);
}
}
I thought that it would help to make a buffer that was the same size, but re-arranged to be 3D & indexed by core first. This is the same computation, just a reordering of the data to show the workings.
let mut buffer1 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
for c in 0..num_logical_cores {
for y in 0..y_per_core {
let j = y*num_logical_cores + c;
if j >= HEIGHT {
break;
}
for i in 0..WIDTH {
buffer1[c][y][i] = compute(i as i32,j as i32)
}
}
}
But, when I try to put the inner part of the code in a closure & create a thread, I get errors about the buffer & lifetimes. I basically don't understand what to do & could use some guidance. I want per_core_buffer to just temporarily refer to the data in buffer2 that belongs to that core & allow it to be written, synchronize all the threads & then read buffer2 afterwards. Is this possible?
let mut buffer2 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
let mut handles = Vec::new();
for c in 0..num_logical_cores {
let per_core_buffer = &mut buffer2[c]; // <<< lifetime error
let handle = thread::spawn(move || {
for y in 0..y_per_core {
let j = y*num_logical_cores + c;
if j >= HEIGHT {
break;
}
for i in 0..WIDTH {
per_core_buffer[y][i] = compute(i as i32,j as i32)
}
}
});
handles.push(handle)
}
for handle in handles {
handle.join().unwrap();
}
The error is this & I don't understand:
error[E0597]: `buffer2` does not live long enough
--> src/main.rs:50:36
|
50 | let per_core_buffer = &mut buffer2[c]; // <<< lifetime error
| ^^^^^^^ borrowed value does not live long enough
...
88 | }
| - borrowed value only lives until here
|
= note: borrowed value must be valid for the static lifetime...
The full testcase is:
extern crate num_cpus;
use std::time::Instant;
use std::thread;
fn compute(x: i32, y: i32) -> i32 {
(x*y) % (x+y+10000)
}
fn main() {
let num_logical_cores = num_cpus::get();
const WIDTH: usize = 40000;
const HEIGHT: usize = 10000;
let y_per_core = HEIGHT/num_logical_cores + 1;
// ------------------------------------------------------------
// Serial Calculation...
let mut buffer0 = vec![vec![0i32; WIDTH]; HEIGHT];
let start0 = Instant::now();
for j in 0..HEIGHT {
for i in 0..WIDTH {
buffer0[j][i] = compute(i as i32,j as i32);
}
}
let dur0 = start0.elapsed();
// ------------------------------------------------------------
// On the way to Parallel Calculation...
// Reorder the data buffer to be 3D with one 2D region per core.
let mut buffer1 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
let start1 = Instant::now();
for c in 0..num_logical_cores {
for y in 0..y_per_core {
let j = y*num_logical_cores + c;
if j >= HEIGHT {
break;
}
for i in 0..WIDTH {
buffer1[c][y][i] = compute(i as i32,j as i32)
}
}
}
let dur1 = start1.elapsed();
// ------------------------------------------------------------
// Actual Parallel Calculation...
let mut buffer2 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
let mut handles = Vec::new();
let start2 = Instant::now();
for c in 0..num_logical_cores {
let per_core_buffer = &mut buffer2[c]; // <<< lifetime error
let handle = thread::spawn(move || {
for y in 0..y_per_core {
let j = y*num_logical_cores + c;
if j >= HEIGHT {
break;
}
for i in 0..WIDTH {
per_core_buffer[y][i] = compute(i as i32,j as i32)
}
}
});
handles.push(handle)
}
for handle in handles {
handle.join().unwrap();
}
let dur2 = start2.elapsed();
println!("Runtime: Serial={0:.3}ms, AlmostParallel={1:.3}ms, Parallel={2:.3}ms",
1000.*dur0.as_secs() as f64 + 1e-6*(dur0.subsec_nanos() as f64),
1000.*dur1.as_secs() as f64 + 1e-6*(dur1.subsec_nanos() as f64),
1000.*dur2.as_secs() as f64 + 1e-6*(dur2.subsec_nanos() as f64));
// Sanity check
for j in 0..HEIGHT {
let c = j % num_logical_cores;
let y = j / num_logical_cores;
for i in 0..WIDTH {
if buffer0[j][i] != buffer1[c][y][i] {
println!("wtf1? {0} {1} {2} {3}",i,j,buffer0[j][i],buffer1[c][y][i])
}
if buffer0[j][i] != buffer2[c][y][i] {
println!("wtf2? {0} {1} {2} {3}",i,j,buffer0[j][i],buffer2[c][y][i])
}
}
}
}
Thanks to #Shepmaster for the pointers and clarification that this is not an easy problem for Rust, and that I needed to consider crates to find a reasonable solution. I'm only just starting out in Rust, so this really wasn't clear to me.
I liked the ability to control the number of threads that scoped_threadpool gives, so I went with that. Translating my code from above directly, I tried to use the 4D buffer with core as the most-significant-index and that ran into troubles because that 3D vector does not implement the Copy trait. The fact that it implements Copy makes me concerned about performance, but I went back to the original problem and implemented it more directly & found a reasonable speedup by making each row a thread. Copying each row will not be a large memory overhead.
The code that works for me is:
let mut buffer2 = vec![vec![0i32; WIDTH]; HEIGHT];
let mut pool = Pool::new(num_logical_cores as u32);
pool.scoped(|scope| {
let mut y = 0;
for e in &mut buffer2 {
scope.execute(move || {
for x in 0..WIDTH {
(*e)[x] = compute(x as i32,y as i32);
}
});
y += 1;
}
});
On a 6 core, 12 thread i7-8700K for 400000x4000 testcase this runs in 3.2 seconds serially & 481ms in parallel--a reasonable speedup.
EDIT: I continued to think about this issue and got a suggestion from Rustlang on twitter that I should consider rayon. I converted my code to rayon and got similar speedup with the following code.
let mut buffer2 = vec![vec![0i32; WIDTH]; HEIGHT];
buffer2
.par_iter_mut()
.enumerate()
.map(|(y,e): (usize, &mut Vec<i32>)| {
for x in 0..WIDTH {
(*e)[x] = compute(x as i32,y as i32);
}
})
.collect::<Vec<_>>();

How do I run parallel threads of computation on a partitioned array?

I'm trying to distribute an array across threads and have the threads sum up portions of the array in parallel. I want thread 0 to sum elements 0 1 2 and Thread 1 sum elements 3 4 5. Thread 2 to sum 6 and 7. and Thread 3 to sum 8 and 9.
I'm new to Rust but have coded with C/C++/Java before. I've literally thrown everything and the garbage sink at this program and I was hoping I could receive some guidance.
Sorry my code is sloppy but I will clean it up when it is a finished product. Please ignore all poorly named variables/inconsistent spacing/etc.
use std::io;
use std::rand;
use std::sync::mpsc::{Sender, Receiver};
use std::sync::mpsc;
use std::thread::Thread;
static NTHREADS: usize = 4;
static NPROCS: usize = 10;
fn main() {
let mut a = [0; 10]; // a: [i32; 10]
let mut endpoint = a.len() / NTHREADS;
let mut remElements = a.len() % NTHREADS;
for x in 0..a.len() {
let secret_number = (rand::random::<i32>() % 100) + 1;
a[x] = secret_number;
println!("{}", a[x]);
}
let mut b = a;
let mut x = 0;
check_sum(&mut a);
// serial_sum(&mut b);
// Channels have two endpoints: the `Sender<T>` and the `Receiver<T>`,
// where `T` is the type of the message to be transferred
// (type annotation is superfluous)
let (tx, rx): (Sender<i32>, Receiver<i32>) = mpsc::channel();
let mut scale: usize = 0;
for id in 0..NTHREADS {
// The sender endpoint can be copied
let thread_tx = tx.clone();
// Each thread will send its id via the channel
Thread::spawn(move || {
// The thread takes ownership over `thread_tx`
// Each thread queues a message in the channel
let numTougherThreads: usize = NPROCS % NTHREADS;
let numTasksPerThread: usize = NPROCS / NTHREADS;
let mut lsum = 0;
if id < numTougherThreads {
let mut q = numTasksPerThread+1;
lsum = 0;
while q > 0 {
lsum = lsum + a[scale];
scale+=1;
q = q-1;
}
println!("Less than numToughThreads lsum: {}", lsum);
}
if id >= numTougherThreads {
let mut z = numTasksPerThread;
lsum = 0;
while z > 0 {
lsum = lsum + a[scale];
scale +=1;
z = z-1;
}
println!("Greater than numToughthreads lsum: {}", lsum);
}
// Sending is a non-blocking operation, the thread will continue
// immediately after sending its message
println!("thread {} finished", id);
thread_tx.send(lsum).unwrap();
});
}
// Here, all the messages are collected
let mut globalSum = 0;
let mut ids = Vec::with_capacity(NTHREADS);
for _ in 0..NTHREADS {
// The `recv` method picks a message from the channel
// `recv` will block the current thread if there no messages available
ids.push(rx.recv());
}
println!("Global Sum: {}", globalSum);
// Show the order in which the messages were sent
println!("ids: {:?}", ids);
}
fn check_sum (arr: &mut [i32]) {
let mut sum = 0;
let mut i = 0;
let mut size = arr.len();
loop {
sum += arr[i];
i+=1;
if i == size { break; }
}
println!("CheckSum is {}", sum);
}
So far I've gotten it to do this much. Can't figure out why threads 0 and 1 have the same sum as well as 2 and 3 doing the same thing:
-5
-49
-32
99
45
-65
-64
-29
-56
65
CheckSum is -91
Greater than numTough lsum: -54
thread 2 finished
Less than numTough lsum: -86
thread 1 finished
Less than numTough lsum: -86
thread 0 finished
Greater than numTough lsum: -54
thread 3 finished
Global Sum: 0
ids: [Ok(-86), Ok(-86), Ok(-54), Ok(-54)]
I managed to rewrite it to work with even numbers by using the below code.
while q > 0 {
if id*s+scale == a.len() { break; }
lsum = lsum + a[id*s+scale];
scale +=1;
q = q-1;
}
println!("Less than numToughThreads lsum: {}", lsum);
}
if id >= numTougherThreads {
let mut z = numTasksPerThread;
lsum = 0;
let mut scale = 0;
while z > 0 {
if id*numTasksPerThread+scale == a.len() { break; }
lsum = lsum + a[id*numTasksPerThread+scale];
scale = scale + 1;
z = z-1;
}
Welcome to Rust! :)
Yeah at first I didn't realize each thread gets it's own copy of scale
Not only that! It also gets its own copy of a!
What you are trying to do could look like the following code. I guess it's easier for you to see a complete working example since you seem to be a Rust beginner and asked for guidance. I deliberately replaced [i32; 10] with a Vec since a Vec is not implicitly Copyable. It requires an explicit clone(); we cannot copy it by accident. Please note all the larger and smaller differences. The code also got a little more functional (less mut). I commented most of the noteworthy things:
extern crate rand;
use std::sync::Arc;
use std::sync::mpsc;
use std::thread;
const NTHREADS: usize = 4; // I replaced `static` by `const`
// gets used for *all* the summing :)
fn sum<I: Iterator<Item=i32>>(iter: I) -> i32 {
let mut s = 0;
for x in iter {
s += x;
}
s
}
fn main() {
// We don't want to clone the whole vector into every closure.
// So we wrap it in an `Arc`. This allows sharing it.
// I also got rid of `mut` here by moving the computations into
// the initialization.
let a: Arc<Vec<_>> =
Arc::new(
(0..10)
.map(|_| {
(rand::random::<i32>() % 100) + 1
})
.collect()
);
let (tx, rx) = mpsc::channel(); // types will be inferred
{ // local scope, we don't need the following variables outside
let num_tasks_per_thread = a.len() / NTHREADS; // same here
let num_tougher_threads = a.len() % NTHREADS; // same here
let mut offset = 0;
for id in 0..NTHREADS {
let chunksize =
if id < num_tougher_threads {
num_tasks_per_thread + 1
} else {
num_tasks_per_thread
};
let my_a = a.clone(); // refers to the *same* `Vec`
let my_tx = tx.clone();
thread::spawn(move || {
let end = offset + chunksize;
let partial_sum =
sum( (&my_a[offset..end]).iter().cloned() );
my_tx.send(partial_sum).unwrap();
});
offset += chunksize;
}
}
// We can close this Sender
drop(tx);
// Iterator magic! Yay! global_sum does not need to be mutable
let global_sum = sum(rx.iter());
println!("global sum via threads : {}", global_sum);
println!("global sum single-threaded: {}", sum(a.iter().cloned()));
}
Using a crate like crossbeam you can write this code:
use crossbeam; // 0.7.3
use rand::distributions::{Distribution, Uniform}; // 0.7.3
const NTHREADS: usize = 4;
fn random_vec(length: usize) -> Vec<i32> {
let step = Uniform::new_inclusive(1, 100);
let mut rng = rand::thread_rng();
step.sample_iter(&mut rng).take(length).collect()
}
fn main() {
let numbers = random_vec(10);
let num_tasks_per_thread = numbers.len() / NTHREADS;
crossbeam::scope(|scope| {
// The `collect` is important to eagerly start the threads!
let threads: Vec<_> = numbers
.chunks(num_tasks_per_thread)
.map(|chunk| scope.spawn(move |_| chunk.iter().cloned().sum::<i32>()))
.collect();
let thread_sum: i32 = threads.into_iter().map(|t| t.join().unwrap()).sum();
let no_thread_sum: i32 = numbers.iter().cloned().sum();
println!("global sum via threads : {}", thread_sum);
println!("global sum single-threaded: {}", no_thread_sum);
})
.unwrap();
}
Scoped threads allow you to pass in a reference that is guaranteed to outlive the thread. You can then use the return value of the thread directly, skipping channels (which are great, just not needed here!).
I followed How can I generate a random number within a range in Rust? to generate the random numbers. I also changed it to be the range [1,100], as I think that's what you meant. However, your original code is actually [-98,100], which you could also do.
Iterator::sum is used to sum up an iterator of numbers.
I threw in some rough performance numbers of the thread work, ignoring the vector construction, working on 100,000,000 numbers, using Rust 1.34 and compiling in release mode:
| threads | time (ns) | relative time (%) |
|---------+-----------+-------------------|
| 1 | 33824667 | 100.00 |
| 2 | 16246549 | 48.03 |
| 3 | 16709280 | 49.40 |
| 4 | 14263326 | 42.17 |
| 5 | 14977901 | 44.28 |
| 6 | 12974001 | 38.36 |
| 7 | 13321743 | 39.38 |
| 8 | 13370793 | 39.53 |
See also:
How can I pass a reference to a stack variable to a thread?
All your tasks get a copy of the scale variable. Thread 1 and 2 both do the same thing since each has scale with a value of 0 and modifies it in the same manner as the other thread.
The same goes for Thread 3 and 4.
Rust prevents you from breaking thread safety. If scale were shared by the threads, you would have race conditions when accessing the variable.
Please read about closures, they explain the variable copying part, and about threading which explains when and how you can share variables between threads.

Resources