Mapping a 2d slice in parallel - rust

I'm trying to map a 2d slice in parallel.
I'm doing this by assigning every child slice to one thread. So, if the slice is 10x10, 10 threads will be spawned.
I'm doing this by creating a Mutex for each child slice. The problem is that I'm getting an error for the parent slice:
error[E0759]: `collection` has an anonymous lifetime `'_` but it needs to satisfy a `'static` lifetime requirement
--> src\lib.rs:151:34
|
145 | fn gen_map_par<T: Send + Sync, F: Fn(f32, &mut T) + Send + Sync + 'static>(&self, collection: &[&[T]], predicate: F) {
| ------- this data with an anonymous lifetime `'_`...
...
151 | for (x, i) in collection.iter().enumerate() {
| ^^^^ ...is captured here...
...
155 | let handle = thread::spawn(move || {
| ------------- ...and is required to live as long as `'static` here
For more information about this error, try `rustc --explain E0759`.
The error shows that collection is being captured. Here is the associated code.
fn gen_map_par<T: Send + Sync, F: Fn(f32, &mut T) + Send + Sync + 'static>(
&self,
collection: &[&[T]],
predicate: F,
) {
let perm = Arc::new(self.perm);
let closure = Arc::new(predicate);
let mut handles = Vec::new();
for (x, i) in collection.iter().enumerate() {
// iterating through parent slice
let p = Arc::clone(&perm);
let c = Arc::clone(&closure);
let s = Arc::new(Mutex::new(*i)); // mutex created for one entire child slice
let handle = thread::spawn(move || {
// spawning the thread
let slice = s.lock().unwrap();
// collection gets moved into the thread for some reason.
for (y, e) in slice.iter().enumerate() {
let v = Simplex::convert_range(
max_2d,
min_2d,
1.0,
-1.0,
generate2d(x as f32, y as f32, &p),
);
(c)(v, &mut slice[y]);
}
});
handles.push(handle);
}
for handle in handles {
handle.join().unwrap();
}
}
I've tried wrapping the 'collection' in an arc/mutex, but neither of these works. I'm not sure how to solve this problem. The only solution I've found is to make collection static, but this would require the user to input a static variable, which I do not want to require.

After trying crossbeam::scope(), doing some research, and reading #Ceasars' suggestions, I have concluded that using a closure inside of a thread like this is either impossible or very difficult to do because the data has to be moved into the closure as a mutable reference, something the compiler will not allow as far as I can tell. Moving on to another project.
Thanks to everybody who responded, reading your suggestions helped me understand parallelism in rust better.

Related

Proper way to share references to Vec between threads

I am new to rust and I am attempting to create a Vec that will live on the main thread, and pass a reference to another thread, which then pushes members onto the vector, for the main thread to use.
use std::{thread};
fn main() {
let mut v: Vec<u8> = Vec::new();
let _ = thread::spawn(move || {
vec_push(&mut v, 0)
});
for i in v.iter_mut() {
println!("poo {}", i);
}
}
fn vec_push(v: &mut Vec<u8>, n: u8) {
v.push(n);
}
This is a simplified version of what I am trying to do. In my main code I am want it to be a Vec of TcpStreams.
I think this post would also apply to maintaining a struct (that doesn't implement Copy) between threads.
I get this error
error[E0382]: borrow of moved value: `v`
--> src/main.rs:8:11
|
4 | let mut v: Vec<u8> = Vec::new();
| ----- move occurs because `v` has type `Vec<u8>`, which does not implement the `Copy` trait
5 | let _ = thread::spawn(move || {
| ------- value moved into closure here
6 | vec_push(&mut v, 0)
| - variable moved due to use in closure
7 | });
8 | for i in v.iter_mut() {
| ^^^^^^^^^^^^ value borrowed here after move
Is there a better way to do this? Am I missing some basic concept?
Any help would be useful, I am used to C where I can just throw around references willy-nilly
What you are doing is wildly unsound. You are trying to have two mutable references to a object, which is strictly forbidden in rust. Rust forbids this to prevent you from having data races that would result in memory unsafety.
If you want to mutate an object from different threads you have to synchronize it somehow. The easiest way to do it is by using Mutex. This probably won't be very efficient in a high-congestion scenario (as locking a mutex can become your bottle neck), but it will be safe.
To share this Mutex between threads you can wrap it in an Arc (an atomic counted shared reference smart pointer). So your code can be transformed to something like this:
use std::thread;
use std::sync::{Arc, Mutex};
fn main() {
let v = Arc::new(Mutex::new(Vec::new()));
let v_clone = Arc::clone(&v);
let t = thread::spawn(move || {
vec_push(v_clone, 0)
});
t.join().unwrap();
for i in v.lock().unwrap().iter_mut() {
println!("poo {}", i);
}
}
fn vec_push(v: Arc<Mutex<Vec<u8>>>, n: u8) {
v.lock().unwrap().push(n);
}
You probably will also want to join your spawned thread, so you should name it.

Life-time error in Parallel segmented-prime-sieve in Rust

I am trying to make parallel a prime sieve in Rust but Rust compiler not leave of give me a lifemetime error with the parameter true_block.
And data races are irrelevant because how primes are defined.
The error is:
error[E0621]: explicit lifetime required in the type of `true_block`
--> src/sieve.rs:65:22
|
50 | true_block: &mut Vec<bool>,
| -------------- help: add explicit lifetime `'static` to the type of `true_block`: `&'static mut Vec<bool>`
...
65 | handles.push(thread::spawn(move || {
| ^^^^^^^^^^^^^ lifetime `'static` required
The code is:
fn extend_helper(
primes: &Vec<usize>,
true_block: &mut Vec<bool>,
segment_min: usize,
segment_len: usize,
) {
let mut handles: Vec<thread::JoinHandle<()>> = vec![];
let arc_primes = Arc::new(true_block);
let segment_min = Arc::new(segment_min);
let segment_len = Arc::new(segment_len);
for prime in primes {
let prime = Arc::new(prime.clone());
let segment_min = Arc::clone(&segment_min);
let segment_len = Arc::clone(&segment_len);
let shared = Arc::clone(&arc_primes);
handles.push(thread::spawn(move || {
let tmp = smallest_multiple_of_n_geq_m(*prime, *segment_min) - *segment_min;
for j in (tmp..*segment_len).step_by(*prime) {
shared[j] = false;
}
}));
}
for handle in handles {
handle.join().unwrap();
}
}
The lifetime issue is with true_block which is a mutable reference passed into the function. Its lifetime is only as long as the call to the function, but because you're spawning a thread, the thread could live past the end of the function (the compiler can't be sure). Since it's not known how long the thread will live for, the lifetime of data passed to it must be 'static, meaning it will live until the end of the program.
To fix this, you can use .to_owned() to clone the array stored in the Arc, which will get around the lifetime issue. You would then need to copy the array date out of the Arc after joining the threads and write it back to the true_block reference. But it's also not possible to mutate an Arc, which only holds an immutable reference to the value. You need to use an Arc<Mutex<_>> for arc_primes and then you can mutate shared with shared.lock().unwrap()[j] = false; or something safer

How to return the captured variable from `FnMut` closure, which is a captor at the same time

I have a function, collect_n, that returns a Future that repeatedly polls a futures::sync::mpsc::Receiver and collects the results into a vector, and resolves with that vector. The problem is that it consumes the Receiver, so that Receiver can't be used again. I'm trying to write a version that takes ownership of the Receiver but then returns ownership back to the caller when the returned Future resolves.
Here's what I wrote:
/// Like collect_n but returns the mpsc::Receiver it consumes so it can be reused.
pub fn collect_n_reusable<T>(
mut rx: mpsc::Receiver<T>,
n: usize,
) -> impl Future<Item = (mpsc::Receiver<T>, Vec<T>), Error = ()> {
let mut events = Vec::new();
future::poll_fn(move || {
while events.len() < n {
let e = try_ready!(rx.poll()).unwrap();
events.push(e);
}
let ret = mem::replace(&mut events, Vec::new());
Ok(Async::Ready((rx, ret)))
})
}
This results in a compile error:
error[E0507]: cannot move out of `rx`, a captured variable in an `FnMut` closure
--> src/test_util.rs:200:26
|
189 | mut rx: mpsc::Receiver<T>,
| ------ captured outer variable
...
200 | Ok(Async::Ready((rx, ret)))
| ^^ move occurs because `rx` has type `futures::sync::mpsc::Receiver<T>`, which does not implement the `Copy` trait
How can I accomplish what I'm trying to do?
It is not possible unless your variable is shareable like Rc or Arc, since FnMut can be called multiple times it is possible that your closure needs to return the captured variable more than one. But after returning you lose the ownership of the variable so you cannot return it back, due to safety Rust doesn't let you do this.
According to your logic we know that once your Future is ready it will not need to be polled again, so we can create a solution without using smart pointer. Lets consider a container object like below, I've used Option:
use futures::sync::mpsc;
use futures::{Future, Async, try_ready};
use futures::stream::Stream;
use core::mem;
pub fn collect_n_reusable<T>(
mut rx: mpsc::Receiver<T>,
n: usize,
) -> impl Future<Item = (mpsc::Receiver<T>, Vec<T>), Error = ()> {
let mut events = Vec::new();
let mut rx = Some(rx); //wrapped with an Option
futures::future::poll_fn(move || {
while events.len() < n {
let e = try_ready!(rx.as_mut().unwrap().poll()).unwrap();//used over an Option
events.push(e);
}
let ret = mem::replace(&mut events, Vec::new());
//We took it from option and returned.
//Careful this will panic if the return line got execute more than once.
Ok(Async::Ready((rx.take().unwrap(), ret)))
})
.fuse()
}
Basically we captured the container not the receiver. So this made compiler happy. I used fuse() to make sure the closure will not be called again after returning Async::Ready, otherwise it would panic for further polls.
The other solution would be using std::mem::swap :
futures::future::poll_fn(move || {
while events.len() < n {
let e = try_ready!(rx.poll()).unwrap();
events.push(e);
}
let mut place_holder_receiver = futures::sync::mpsc::channel(0).1; //creating object with same type to swap with actual one.
let ret = mem::replace(&mut events, Vec::new());
mem::swap(&mut place_holder_receiver, &mut rx); //Swapping the actual receiver with the placeholder
Ok(Async::Ready((place_holder_receiver, ret))) //so we can return placeholder in here
})
.fuse()
Simply I've swapped actual receiver with the place_holder_receiver. Since placeholder is created in FnMut(Not captured) we can return it as many time as we want, so compiler is happy again. Thanks to fuse() this closure will not be called after successful return, otherwise it would return the Receivers with a dropped Sender.
See also:
https://docs.rs/futures/0.3.4/futures/future/trait.FutureExt.html#method.fuse

Getting first member of a BTreeSet

In Rust, I have a BTreeSet that I'm using to keep my values in order. I have a loop that should retrieve and remove the first (lowest) member of the set. I'm using a cloned iterator to retrieve the first member. Here's the code:
use std::collections::BTreeSet;
fn main() {
let mut start_nodes = BTreeSet::new();
// add items to the set
while !start_nodes.is_empty() {
let mut start_iter = start_nodes.iter();
let mut start_iter_cloned = start_iter.cloned();
let n = start_iter_cloned.next().unwrap();
start_nodes.remove(&n);
}
}
This, however, gives me the following compile error:
error[E0502]: cannot borrow `start_nodes` as mutable because it is also borrowed as immutable
--> prog.rs:60:6
|
56 | let mut start_iter = start_nodes.iter();
| ----------- immutable borrow occurs here
...
60 | start_nodes.remove(&n);
| ^^^^^^^^^^^ mutable borrow occurs here
...
77 | }
| - immutable borrow ends here
Why is start_nodes.iter() considered an immutable borrow? What approach should I take instead to get the first member?
I'm using version 1.14.0 (not by choice).
Why is start_nodes.iter() considered an immutable borrow?
Whenever you ask a question like this one, you need to look at the prototype of the function, in this case the prototype of BTreeSet::iter():
fn iter(&self) -> Iter<T>
If we look up the Iter type that is returned, we find that it's defined as
pub struct Iter<'a, T> where T: 'a { /* fields omitted */ }
The lifetime 'a is not explicitly mentioned in the definition of iter(); however, the lifetime elision rules make the function definition equivalent to
fn iter<'a>(&'a self) -> Iter<'a, T>
From this expanded version, you can see that the return value has a lifetime that is bound to the lifetime of the reference to self that you pass in, which is just another way of stating that the function call creates a shared borrow that lives as long as the return value. If you store the return value in a variable, the borrow lives at least as long as the variable.
What approach should I take instead to get the first member?
As noted in the comments, your code works on recent versions of Rust due to non-lexical lifetimes – the compiler figures out by itself that start_iter and start_iter_cloned don't need to live longer than the call to next(). In older versions of Rust, you can artificially limit the lifetime by introducing a new scope:
while !start_nodes.is_empty() {
let n = {
let mut start_iter = start_nodes.iter();
let mut start_iter_cloned = start_iter.cloned();
start_iter_cloned.next().unwrap()
};
start_nodes.remove(&n);
}
However, note that this code is needlessly long-winded. The new iterator you create and its cloning version only live inside the new scope, and they aren't really used for any other purpose, so you could just as well write
while !start_nodes.is_empty() {
let n = start_nodes.iter().next().unwrap().clone();
start_nodes.remove(&n);
}
which does exactly the same, and avoids the issues with long-living borrows by avoiding to store the intermediate values in variables, to ensure their lifetime ends immediately after the expression.
Finally, while you don't give full details of your use case, I strongly suspect that you would be better off with a BinaryHeap instead of a BTreeSet:
use std::collections::BinaryHeap;
fn main() {
let mut start_nodes = BinaryHeap::new();
start_nodes.push(42);
while let Some(n) = start_nodes.pop() {
// Do something with `n`
}
}
This code is shorter, simpler, completely sidesteps the issue with the borrow checker, and will also be more efficient.
Not sure this is the best approach, but I fixed it by introducing a new scope to ensure that the immutable borrow ends before the mutable borrow occurs:
use std::collections::BTreeSet;
fn main() {
let mut start_nodes = BTreeSet::new();
// add items to the set
while !start_nodes.is_empty() {
let mut n = 0;
{
let mut start_iter = start_nodes.iter();
let mut start_iter_cloned = start_iter.cloned();
let x = &mut n;
*x = start_iter_cloned.next().unwrap();
}
start_nodes.remove(&n);
}
}

How can multiple threads share an iterator?

I've been working on a function that will copy a bunch of files from a source to a destination using Rust and threads. I'm getting some trouble making the threads share the iterator. I am not still used to the borrowing system:
extern crate libc;
extern crate num_cpus;
use libc::{c_char, size_t};
use std::thread;
use std::fs::copy;
fn python_str_array_2_str_vec<T, U, V>(_: T, _: U) -> V {
unimplemented!()
}
#[no_mangle]
pub extern "C" fn copyFiles(
sources: *const *const c_char,
destinies: *const *const c_char,
array_len: size_t,
) {
let src: Vec<&str> = python_str_array_2_str_vec(sources, array_len);
let dst: Vec<&str> = python_str_array_2_str_vec(destinies, array_len);
let mut iter = src.iter().zip(dst);
let num_threads = num_cpus::get();
let threads = (0..num_threads).map(|_| {
thread::spawn(|| while let Some((s, d)) = iter.next() {
copy(s, d);
})
});
for t in threads {
t.join();
}
}
fn main() {}
I'm getting this compilation error that I have not been able to solve:
error[E0597]: `src` does not live long enough
--> src/main.rs:20:20
|
20 | let mut iter = src.iter().zip(dst);
| ^^^ does not live long enough
...
30 | }
| - borrowed value only lives until here
|
= note: borrowed value must be valid for the static lifetime...
error[E0373]: closure may outlive the current function, but it borrows `**iter`, which is owned by the current function
--> src/main.rs:23:23
|
23 | thread::spawn(|| while let Some((s, d)) = iter.next() {
| ^^ ---- `**iter` is borrowed here
| |
| may outlive borrowed value `**iter`
|
help: to force the closure to take ownership of `**iter` (and any other referenced variables), use the `move` keyword, as shown:
| thread::spawn(move || while let Some((s, d)) = iter.next() {
I've seen the following questions already:
Value does not live long enough when using multiple threads
I'm not using chunks, I would like to try to share an iterator through the threads although creating chunks to pass them to the threads will be the classic solution.
Unable to send a &str between threads because it does not live long enough
I've seen some of the answers to use channels to communicate with the threads, but I'm not quite sure about using them. There should be an easier way of sharing just one object through threads.
Why doesn't a local variable live long enough for thread::scoped
This got my attention, scoped is supposed to fix my error, but since it is in the unstable channel I would like to see if there is another way of doing it just using spawn.
Can someone explain how should I fix the lifetimes so the iterator can be accessed from the threads?
Here's a minimal, reproducible example of your problem:
use std::thread;
fn main() {
let src = vec!["one"];
let dst = vec!["two"];
let mut iter = src.iter().zip(dst);
thread::spawn(|| {
while let Some((s, d)) = iter.next() {
println!("{} -> {}", s, d);
}
});
}
There are multiple related problems:
The iterator lives on the stack and the thread's closure takes a reference to it.
The closure takes a mutable reference to the iterator.
The iterator itself has a reference to a Vec that lives on the stack.
The Vec itself has references to string slices that likely live on the stack but are not guaranteed to live longer than the thread either way.
Said another way, the Rust compiler has stopped you from executing four separate pieces of memory unsafety.
A main thing to recognize is that any thread you spawn might outlive the place where you spawned it. Even if you call join right away, the compiler cannot statically verify that will happen, so it has to take the conservative path. This is the point of scoped threads — they guarantee the thread exits before the stack frame they were started in.
Additionally, you are attempting to use a mutable reference in multiple concurrent threads. There's zero guarantee that the iterator (or any of the iterators it was built on) can be safely called in parallel. It's entirely possible that two threads call next at exactly the same time. The two pieces of code run in parallel and write to the same memory address. One thread writes half of the data and the other thread writes the other half, and now your program crashes at some arbitrary point in the future.
Using a tool like crossbeam, your code would look something like:
use crossbeam; // 0.7.3
fn main() {
let src = vec!["one"];
let dst = vec!["two"];
let mut iter = src.iter().zip(dst);
while let Some((s, d)) = iter.next() {
crossbeam::scope(|scope| {
scope.spawn(|_| {
println!("{} -> {}", s, d);
});
})
.unwrap();
}
}
As mentioned, this will only spawn one thread at a time, waiting for it to finish. An alternative to get more parallelism (the usual point of this exercise) is to interchange the calls to next and spawn. This requires transferring ownership of s and d to the thread via the move keyword:
use crossbeam; // 0.7.3
fn main() {
let src = vec!["one", "alpha"];
let dst = vec!["two", "beta"];
let mut iter = src.iter().zip(dst);
crossbeam::scope(|scope| {
while let Some((s, d)) = iter.next() {
scope.spawn(move |_| {
println!("{} -> {}", s, d);
});
}
})
.unwrap();
}
If you add a sleep call inside the spawn, you can see the threads run in parallel.
I'd have written it using a for loop, however:
let iter = src.iter().zip(dst);
crossbeam::scope(|scope| {
for (s, d) in iter {
scope.spawn(move |_| {
println!("{} -> {}", s, d);
});
}
}).unwrap();
In the end, the iterator is exercised on the current thread, and each value returned from the iterator is then handed off to a new thread. The new threads are guaranteed to exit before the captured references.
You may be interested in Rayon, a crate that allows easy parallelization of certain types of iterators.
See also:
How can I pass a reference to a stack variable to a thread?
Lifetime troubles sharing references between threads
How do I use static lifetimes with threads?
Thread references require static lifetime?
Lifetime woes when using threads
Cannot call a function in a spawned thread because it "does not fulfill the required lifetime"

Resources