How can multiple threads share an iterator?

I've been working on a function that copies a bunch of files from a source to a destination using Rust and threads. I'm having some trouble making the threads share the iterator. I am still not used to the borrowing system:
extern crate libc;
extern crate num_cpus;
use libc::{c_char, size_t};
use std::thread;
use std::fs::copy;
fn python_str_array_2_str_vec<T, U, V>(_: T, _: U) -> V {
unimplemented!()
}
#[no_mangle]
pub extern "C" fn copyFiles(
sources: *const *const c_char,
destinies: *const *const c_char,
array_len: size_t,
) {
let src: Vec<&str> = python_str_array_2_str_vec(sources, array_len);
let dst: Vec<&str> = python_str_array_2_str_vec(destinies, array_len);
let mut iter = src.iter().zip(dst);
let num_threads = num_cpus::get();
let threads = (0..num_threads).map(|_| {
thread::spawn(|| while let Some((s, d)) = iter.next() {
copy(s, d);
})
});
for t in threads {
t.join();
}
}
fn main() {}
I'm getting this compilation error that I have not been able to solve:
error[E0597]: `src` does not live long enough
--> src/main.rs:20:20
|
20 | let mut iter = src.iter().zip(dst);
| ^^^ does not live long enough
...
30 | }
| - borrowed value only lives until here
|
= note: borrowed value must be valid for the static lifetime...
error[E0373]: closure may outlive the current function, but it borrows `**iter`, which is owned by the current function
--> src/main.rs:23:23
|
23 | thread::spawn(|| while let Some((s, d)) = iter.next() {
| ^^ ---- `**iter` is borrowed here
| |
| may outlive borrowed value `**iter`
|
help: to force the closure to take ownership of `**iter` (and any other referenced variables), use the `move` keyword, as shown:
| thread::spawn(move || while let Some((s, d)) = iter.next() {
I've seen the following questions already:
Value does not live long enough when using multiple threads
I'm not using chunks. I would like to try sharing an iterator between the threads, although creating chunks and passing them to the threads would be the classic solution.
Unable to send a &str between threads because it does not live long enough
I've seen some answers that suggest using channels to communicate with the threads, but I'm not quite sure about using them. There should be an easier way of sharing just one object between threads.
Why doesn't a local variable live long enough for thread::scoped
This got my attention: scoped is supposed to fix my error, but since it is in the unstable channel I would like to see if there is another way of doing it using just spawn.
Can someone explain how I should fix the lifetimes so the iterator can be accessed from the threads?

Here's a minimal, reproducible example of your problem:
use std::thread;
fn main() {
    let src = vec!["one"];
    let dst = vec!["two"];
    let mut iter = src.iter().zip(dst);
    thread::spawn(|| {
        while let Some((s, d)) = iter.next() {
            println!("{} -> {}", s, d);
        }
    });
}
There are multiple related problems:
The iterator lives on the stack and the thread's closure takes a reference to it.
The closure takes a mutable reference to the iterator.
The iterator itself has a reference to a Vec that lives on the stack.
The Vec itself has references to string slices that likely live on the stack but are not guaranteed to live longer than the thread either way.
Said another way, the Rust compiler has stopped you from executing four separate pieces of memory unsafety.
The main thing to recognize is that any thread you spawn might outlive the place where you spawned it. Even if you call join right away, the compiler cannot statically verify that will happen, so it has to take the conservative path. This is the point of scoped threads: they guarantee the thread exits before the stack frame they were started in.
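Since Rust 1.63 the standard library provides scoped threads as std::thread::scope, so an external crate is no longer strictly required for this. The following is a minimal sketch under that assumption; the helper name describe_pairs and the sample data are made up for illustration and are not part of the question's code:

```rust
use std::thread;

// Hypothetical helper: formats each (source, destination) pair on its own
// scoped thread and returns the lines in input order.
fn describe_pairs(src: &[&str], dst: &[&str]) -> Vec<String> {
    thread::scope(|scope| {
        // Spawn one thread per pair; the closures borrow from `src` and `dst`.
        let handles: Vec<_> = src
            .iter()
            .zip(dst)
            .map(|(s, d)| scope.spawn(move || format!("{} -> {}", s, d)))
            .collect();

        // Every scoped thread is joined before `scope` returns, so the
        // borrows cannot outlive the caller's data.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let src = vec!["one", "alpha"];
    let dst = vec!["two", "beta"];
    for line in describe_pairs(&src, &dst) {
        println!("{}", line);
    }
}
```

Because the scope joins its threads before returning, the compiler accepts borrows of stack data that thread::spawn would reject.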
Additionally, you are attempting to use a mutable reference in multiple concurrent threads. There's zero guarantee that the iterator (or any of the iterators it was built on) can be safely called in parallel. It's entirely possible that two threads call next at exactly the same time. The two pieces of code run in parallel and write to the same memory address. One thread writes half of the data and the other thread writes the other half, and now your program crashes at some arbitrary point in the future.
Using a tool like crossbeam, your code would look something like:
use crossbeam; // 0.7.3
fn main() {
    let src = vec!["one"];
    let dst = vec!["two"];
    let mut iter = src.iter().zip(dst);
    while let Some((s, d)) = iter.next() {
        crossbeam::scope(|scope| {
            scope.spawn(|_| {
                println!("{} -> {}", s, d);
            });
        })
        .unwrap();
    }
}
Note that this only spawns one thread at a time and waits for it to finish before spawning the next. An alternative that gets more parallelism (the usual point of this exercise) is to interchange the calls to next and spawn. This requires transferring ownership of s and d to the thread via the move keyword:
use crossbeam; // 0.7.3
fn main() {
    let src = vec!["one", "alpha"];
    let dst = vec!["two", "beta"];
    let mut iter = src.iter().zip(dst);
    crossbeam::scope(|scope| {
        while let Some((s, d)) = iter.next() {
            scope.spawn(move |_| {
                println!("{} -> {}", s, d);
            });
        }
    })
    .unwrap();
}
If you add a sleep call inside the spawn, you can see the threads run in parallel.
I'd have written it using a for loop, however:
let iter = src.iter().zip(dst);
crossbeam::scope(|scope| {
    for (s, d) in iter {
        scope.spawn(move |_| {
            println!("{} -> {}", s, d);
        });
    }
})
.unwrap();
In the end, the iterator is exercised on the current thread, and each value returned from the iterator is then handed off to a new thread. The new threads are guaranteed to exit before the references they captured become invalid.
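If the worker threads really should pull items from one shared iterator, as the question asks, the iterator needs synchronization so that only one thread calls next at a time. One possible sketch wraps it in a Mutex and uses std::thread::scope (Rust 1.63 or later) so the threads can borrow it. The helper name process_shared and the worker count are illustrative assumptions, and formatting a string stands in for the real file copy:

```rust
use std::sync::Mutex;
use std::thread;

// Hypothetical helper: `workers` threads pull pairs from one shared iterator.
// Returns every pair that was processed, sorted for a stable result.
fn process_shared(src: &[&str], dst: &[&str], workers: usize) -> Vec<String> {
    // The Mutex serializes calls to `next`, so only one thread advances
    // the iterator at a time.
    let iter = Mutex::new(src.iter().zip(dst));
    let done = Mutex::new(Vec::new());

    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                // Hold the iterator lock only while fetching the next item.
                let next = iter.lock().unwrap().next();
                match next {
                    Some((s, d)) => done.lock().unwrap().push(format!("{} -> {}", s, d)),
                    None => break,
                }
            });
        }
    });

    let mut done = done.into_inner().unwrap();
    done.sort();
    done
}

fn main() {
    let src = vec!["one", "alpha", "x"];
    let dst = vec!["two", "beta", "y"];
    for line in process_shared(&src, &dst, 2) {
        println!("{}", line);
    }
}
```

Each item is handed to exactly one worker, and a fixed number of threads is used instead of one thread per item. The lock is released before the work on the item starts, so the workers only contend while fetching.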
You may be interested in Rayon, a crate that allows easy parallelization of certain types of iterators.
See also:
How can I pass a reference to a stack variable to a thread?
Lifetime troubles sharing references between threads
How do I use static lifetimes with threads?
Thread references require static lifetime?
Lifetime woes when using threads
Cannot call a function in a spawned thread because it "does not fulfill the required lifetime"

Related

Proper way to share references to Vec between threads

I am new to Rust and I am attempting to create a Vec that will live on the main thread, and pass a reference to another thread, which then pushes members onto the vector, for the main thread to use.
use std::{thread};
fn main() {
let mut v: Vec<u8> = Vec::new();
let _ = thread::spawn(move || {
vec_push(&mut v, 0)
});
for i in v.iter_mut() {
println!("poo {}", i);
}
}
fn vec_push(v: &mut Vec<u8>, n: u8) {
v.push(n);
}
This is a simplified version of what I am trying to do. In my main code I want it to be a Vec of TcpStreams.
I think this post would also apply to maintaining a struct (that doesn't implement Copy) between threads.
I get this error
error[E0382]: borrow of moved value: `v`
--> src/main.rs:8:11
|
4 | let mut v: Vec<u8> = Vec::new();
| ----- move occurs because `v` has type `Vec<u8>`, which does not implement the `Copy` trait
5 | let _ = thread::spawn(move || {
| ------- value moved into closure here
6 | vec_push(&mut v, 0)
| - variable moved due to use in closure
7 | });
8 | for i in v.iter_mut() {
| ^^^^^^^^^^^^ value borrowed here after move
Is there a better way to do this? Am I missing some basic concept?
Any help would be useful. I am used to C, where I can just throw around references willy-nilly.

What you are doing is wildly unsound. You are trying to have two mutable references to an object, which is strictly forbidden in Rust. Rust forbids this to prevent data races that would result in memory unsafety.
If you want to mutate an object from different threads you have to synchronize it somehow. The easiest way to do it is by using a Mutex. This probably won't be very efficient in a high-contention scenario (as locking a mutex can become your bottleneck), but it will be safe.
To share this Mutex between threads you can wrap it in an Arc (an atomically reference-counted smart pointer). So your code can be transformed to something like this:
use std::thread;
use std::sync::{Arc, Mutex};
fn main() {
    let v = Arc::new(Mutex::new(Vec::new()));
    let v_clone = Arc::clone(&v);
    let t = thread::spawn(move || {
        vec_push(v_clone, 0)
    });
    t.join().unwrap();
    for i in v.lock().unwrap().iter_mut() {
        println!("poo {}", i);
    }
}
fn vec_push(v: Arc<Mutex<Vec<u8>>>, n: u8) {
    v.lock().unwrap().push(n);
}
You probably will also want to join your spawned thread, so you should name it.
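If only the spawned thread produces values and only the main thread consumes them, a different technique avoids shared mutation altogether: the worker sends values over a std::sync::mpsc channel and the main thread, which owns the Vec, collects them. This is a sketch of that alternative rather than a change to the Mutex approach above; the helper name collect_from_worker is hypothetical:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical helper: a worker thread produces values and the calling
// thread, which owns the Vec, collects them.
fn collect_from_worker(count: u8) -> Vec<u8> {
    let (tx, rx) = mpsc::channel();

    let worker = thread::spawn(move || {
        for n in 0..count {
            // Sending only fails if the receiver was dropped.
            tx.send(n).unwrap();
        }
        // `tx` is dropped here, which closes the channel.
    });

    // Iterating the receiver ends once every sender has been dropped.
    let v: Vec<u8> = rx.iter().collect();
    worker.join().unwrap();
    v
}

fn main() {
    for i in collect_from_worker(3) {
        println!("got {}", i);
    }
}
```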

How to loop over thread handles and join if finished, within another loop?

I have a program that creates threads in a loop, and also checks if they have finished and cleans them up if they have. See below for a minimal example:
use std::thread;
fn main() {
let mut v = Vec::<std::thread::JoinHandle<()>>::new();
for _ in 0..10 {
let jh = thread::spawn(|| {
thread::sleep(std::time::Duration::from_secs(1));
});
v.push(jh);
for jh in v.iter_mut() {
if jh.is_finished() {
jh.join().unwrap();
}
}
}
}
This gives the error:
error[E0507]: cannot move out of `*jh` which is behind a mutable reference
--> src\main.rs:13:17
|
13 | jh.join().unwrap();
| ^^^------
| | |
| | `*jh` moved due to this method call
| move occurs because `*jh` has type `JoinHandle<()>`, which does not implement the `Copy` trait
|
note: this function takes ownership of the receiver `self`, which moves `*jh`
--> D:\rust\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\library\std\src\thread\mod.rs:1461:17
|
1461 | pub fn join(self) -> Result<T> {
How can I get the borrow checker to allow this?

JoinHandle::join actually consumes the JoinHandle.
iter_mut(), however, only borrows the elements of the vector and keeps the vector alive. Therefore your JoinHandles are only borrowed, and you cannot call consuming methods on borrowed objects.
What you need to do is to take the ownership of the elements while iterating over the vector, so they can be then consumed by join(). This is achieved by using into_iter() instead of iter_mut().
The second mistake is that you (probably accidentally) wrote the two for loops inside of each other, while they should be independent loops.
The third problem is a little more complex. You cannot check if a thread has finished and then join it the way you did. Therefore I removed the is_finished() check for now and will talk about this further down again.
Here is your fixed code:
use std::thread;
fn main() {
    let mut v = Vec::<std::thread::JoinHandle<()>>::new();
    for _ in 0..10 {
        let jh = thread::spawn(|| {
            thread::sleep(std::time::Duration::from_secs(1));
        });
        v.push(jh);
    }
    for jh in v.into_iter() {
        jh.join().unwrap();
    }
}
Reacting to finished threads
This one is harder. If you just want to wait until all of them are finished, the code above is the way to go.
However, if you have to react to finished threads right away, you basically have to set up some kind of event propagation. You don't want to loop over all threads over and over again until they are all finished, because that is called busy-waiting and consumes a lot of computational power.
So if you want to achieve that there are two problems that have to be dealt with:
join() consumes the JoinHandle, which would leave behind an incomplete Vec of JoinHandles. This isn't possible, so we need to wrap JoinHandle in a type that can actually be ripped out of the vector partially, like Option.
We need a way to signal to the main thread that a new child thread is finished, so that the main thread doesn't have to continuously iterate over the threads.
All in all this is very complex and tricky to implement.
Here is my attempt:
use std::{
    thread::{self, JoinHandle},
    time::Duration,
};
fn main() {
    let mut v: Vec<Option<JoinHandle<()>>> = Vec::new();
    let (send_finished_thread, receive_finished_thread) = std::sync::mpsc::channel();
    for i in 0..10 {
        let send_finished_thread = send_finished_thread.clone();
        let join_handle = thread::spawn(move || {
            println!("Thread {} started.", i);
            thread::sleep(Duration::from_millis(2000 - i as u64 * 100));
            println!("Thread {} finished.", i);
            // Signal that we are finished.
            // This will wake up the main thread.
            send_finished_thread.send(i).unwrap();
        });
        v.push(Some(join_handle));
    }
    loop {
        // Check if all threads are finished
        let num_left = v.iter().filter(|th| th.is_some()).count();
        if num_left == 0 {
            break;
        }
        // Wait until a thread is finished, then join it
        let i = receive_finished_thread.recv().unwrap();
        let join_handle = std::mem::take(&mut v[i]).unwrap();
        println!("Joining {} ...", i);
        join_handle.join().unwrap();
        println!("{} joined.", i);
    }
    println!("All joined.");
}
Important
This code is just a demonstration. It will deadlock if one of the threads panics. But this shows how complicated the problem is.
It could be solved by utilizing a drop guard, but I think this answer is convoluted enough ;)
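For reference, a drop guard could look roughly like the sketch below. The names FinishedGuard and join_as_finished are hypothetical, and the odd-numbered threads panic on purpose to exercise the failure path (their panic messages go to stderr). The idea is that the guard's Drop implementation sends the notification, and Drop also runs while a panic unwinds the thread, assuming the default unwinding panic strategy:

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

// Hypothetical guard: reports the thread's index when dropped, which also
// happens while a panic unwinds the thread's stack.
struct FinishedGuard {
    index: usize,
    sender: Sender<usize>,
}

impl Drop for FinishedGuard {
    fn drop(&mut self) {
        // Ignore the error case: the receiver may already be gone.
        let _ = self.sender.send(self.index);
    }
}

// Spawns `count` threads (odd-numbered ones panic) and joins each one as
// soon as it reports in. Returns (finished normally, panicked) counts.
fn join_as_finished(count: usize) -> (usize, usize) {
    let (sender, receiver) = mpsc::channel();
    let mut handles: Vec<Option<thread::JoinHandle<()>>> = Vec::new();

    for index in 0..count {
        let guard = FinishedGuard { index, sender: sender.clone() };
        handles.push(Some(thread::spawn(move || {
            let _guard = guard; // dropped on return and on panic
            if index % 2 == 1 {
                panic!("thread {} failed", index);
            }
        })));
    }
    drop(sender); // only the guards hold senders now

    let (mut ok, mut panicked) = (0, 0);
    // The loop ends once every guard (and its sender) has been dropped.
    for index in receiver {
        let handle = handles[index].take().unwrap();
        match handle.join() {
            Ok(()) => ok += 1,
            Err(_) => panicked += 1,
        }
    }
    (ok, panicked)
}

fn main() {
    let (ok, panicked) = join_as_finished(4);
    println!("{} finished, {} panicked", ok, panicked);
}
```

Dropping the original sender in the main thread matters here: once all guards are gone the channel closes, so the receiving loop ends by itself instead of counting the remaining handles.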

How to tell Rust to let me modify a shared variable hidden behind an RwLock?

Safe Rust demands that, at any given time, a resource has either:
One or more shared references (&T), or
Exactly one mutable reference (&mut T).
I want to have one Vec that can be read by multiple threads and written by one, but only one of those should be possible at a time (as the language demands).
So I use an RwLock.
I need a Vec<i8>. To let it outlive the main function, I Box it and then wrap an RwLock around that, like this:
fn main() {
println!("Hello, world!");
let mut v = vec![0, 1, 2, 3, 4, 5, 6];
let val = RwLock::new(Box::new(v));
for i in 0..10 {
thread::spawn(move || threadFunc(&val));
}
loop {
let mut VecBox = (val.write().unwrap());
let ref mut v1 = *(*VecBox);
v1.push(1);
//And be very busy.
thread::sleep(Duration::from_millis(10000));
}
}
fn threadFunc(val: &RwLock<Box<Vec<i8>>>) {
loop {
//Use Vec
let VecBox = (val.read().unwrap());
let ref v1 = *(*VecBox);
println!("{}", v1.len());
//And be very busy.
thread::sleep(Duration::from_millis(1000));
}
}
Rust refuses to compile this:
capture of moved value: `val`
--> src/main.rs:14:43
|
14 | thread::spawn(move || threadFunc(&val));
| ------- ^^^ value captured here after move
| |
| value moved (into closure) here
Without the thread:
for i in 0..10 {
threadFunc(&val);
}
It compiles, so the problem is with the closure. I have to "move" it, or else Rust complains that it can outlive main. I also can't clone val (RwLock doesn't implement Clone).
What should I do?

Note that there's no structural difference between using a RwLock and a Mutex; they just have different access patterns. See "Concurrent access to vector from multiple threads using a mutex lock" for related discussion.
The problem centers around the fact that you've transferred ownership of the vector (in the RwLock) to some thread; therefore your main thread doesn't have it anymore. You can't access it because it's gone.
In fact, you'll have the same problem because you've tried to pass the vector to each of the threads. You only have one vector to give away, so only one thread could have it.
You need thread-safe shared ownership, provided by Arc:
use std::sync::{Arc, RwLock};
use std::thread;
use std::time::Duration;
fn main() {
    println!("Hello, world!");
    let v = vec![0, 1, 2, 3, 4, 5, 6];
    let val = Arc::new(RwLock::new(v));
    for _ in 0..10 {
        let v = val.clone();
        thread::spawn(move || thread_func(v));
    }
    for _ in 0..5 {
        {
            let mut val = val.write().unwrap();
            val.push(1);
        }
        thread::sleep(Duration::from_millis(1000));
    }
}
fn thread_func(val: Arc<RwLock<Vec<i8>>>) {
    loop {
        {
            let val = val.read().unwrap();
            println!("{}", val.len());
        }
        thread::sleep(Duration::from_millis(100));
    }
}
Other things to note:
I removed the infinite loop in main so that the code can actually finish.
I fixed all of the compiler warnings. If you are going to use a compiled language, pay attention to the warnings. The warnings were about:
Unnecessary parentheses.
Identifiers that are not snake_case. Definitely do not use PascalCase for local variables; that's used for types. camelCase does not get used in Rust.
I added some blocks to shorten the lifetime that the read / write locks will be held. Otherwise there's a lot of contention and the child threads never have a chance to get a read lock.
let ref v1 = *(*foo); is non-idiomatic. Prefer let v1 = &**foo. You don't even need to do that at all, thanks to Deref.
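To illustrate the Deref point with the question's original type: method calls auto-dereference through the lock guard and the Box, so no explicit * or & is needed. A small sketch, where the helper names len_through_guard and push_through_guard are made up for the example:

```rust
use std::sync::RwLock;

// Hypothetical helper: reads the length through the guard. Method calls
// auto-dereference RwLockReadGuard -> Box<Vec<i8>> -> Vec<i8>.
fn len_through_guard(val: &RwLock<Box<Vec<i8>>>) -> usize {
    let guard = val.read().unwrap();
    guard.len()
}

// Writing works the same way through RwLockWriteGuard and DerefMut.
fn push_through_guard(val: &RwLock<Box<Vec<i8>>>, n: i8) {
    val.write().unwrap().push(n);
}

fn main() {
    let val = RwLock::new(Box::new(vec![0, 1, 2]));
    push_through_guard(&val, 3);
    println!("{}", len_through_guard(&val));
}
```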

How do I share a mutable object between threads using Arc?

I'm trying to share a mutable object between threads in Rust using Arc, but I get this error:
error[E0596]: cannot borrow data in a `&` reference as mutable
--> src/main.rs:11:13
|
11 | shared_stats_clone.add_stats();
| ^^^^^^^^^^^^^^^^^^ cannot borrow as mutable
This is the sample code:
use std::{sync::Arc, thread};
fn main() {
let total_stats = Stats::new();
let shared_stats = Arc::new(total_stats);
let threads = 5;
for _ in 0..threads {
let mut shared_stats_clone = shared_stats.clone();
thread::spawn(move || {
shared_stats_clone.add_stats();
});
}
}
struct Stats {
hello: u32,
}
impl Stats {
pub fn new() -> Stats {
Stats { hello: 0 }
}
pub fn add_stats(&mut self) {
self.hello += 1;
}
}
What can I do?

Arc's documentation says:
Shared references in Rust disallow mutation by default, and Arc is no exception: you cannot generally obtain a mutable reference to something inside an Arc. If you need to mutate through an Arc, use Mutex, RwLock, or one of the Atomic types.
You will likely want a Mutex combined with an Arc:
use std::{
    sync::{Arc, Mutex},
    thread,
};
struct Stats;
impl Stats {
    fn add_stats(&mut self, _other: &Stats) {}
}
fn main() {
    let shared_stats = Arc::new(Mutex::new(Stats));
    let threads = 5;
    for _ in 0..threads {
        let my_stats = shared_stats.clone();
        thread::spawn(move || {
            let mut shared = my_stats.lock().unwrap();
            shared.add_stats(&Stats);
        });
        // Note: Immediately joining, no multithreading happening!
        // THIS WAS A LIE, see below
    }
}
This is largely cribbed from the Mutex documentation.
How can I use shared_stats after the for? (I'm talking about the Stats object). It seems that the shared_stats cannot be easily converted to Stats.
As of Rust 1.15, it's possible to get the value back. See my additional answer for another solution as well.
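One way to get the value back, sketched here under the assumption that every thread has been joined first, is to combine Arc::try_unwrap with Mutex::into_inner. The helper name count_with_threads is hypothetical, and the Stats struct mirrors the one from the question:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Debug)]
struct Stats {
    hello: u32,
}

// Hypothetical helper: increments the counter from `threads` threads, then
// takes the Stats value back out of the Arc<Mutex<_>>.
fn count_with_threads(threads: u32) -> Stats {
    let shared = Arc::new(Mutex::new(Stats { hello: 0 }));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || shared.lock().unwrap().hello += 1)
        })
        .collect();

    // Joining drops each thread's Arc clone, leaving ours as the only one.
    for handle in handles {
        handle.join().unwrap();
    }

    // try_unwrap succeeds only when no other Arc clones remain.
    let mutex = Arc::try_unwrap(shared).expect("other Arc clones still alive");
    mutex.into_inner().unwrap()
}

fn main() {
    let stats = count_with_threads(5);
    println!("{:?}", stats);
}
```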
[A comment in the example] says that there is no multithreading. Why?
Because I got confused! :-)
In the example code, the result of thread::spawn (a JoinHandle) is immediately dropped because it's not stored anywhere. When the handle is dropped, the thread is detached and may or may not ever finish. I was confusing it with JoinGuard, an old, removed API that joined when it was dropped. Sorry for the confusion!
For a bit of editorial, I suggest avoiding mutability completely:
use std::{ops::Add, thread};
#[derive(Debug)]
struct Stats(u64);
// Implement addition on our type
impl Add for Stats {
    type Output = Stats;
    fn add(self, other: Stats) -> Stats {
        Stats(self.0 + other.0)
    }
}
fn main() {
    let threads = 5;
    // Start threads to do computation
    let threads: Vec<_> = (0..threads).map(|_| thread::spawn(|| Stats(4))).collect();
    // Join all the threads, fail if any of them failed
    let result: Result<Vec<_>, _> = threads.into_iter().map(|t| t.join()).collect();
    let result = result.unwrap();
    // Add up all the results (the first closure argument is the running sum)
    let sum = result.into_iter().fold(Stats(0), |sum, i| sum + i);
    println!("{:?}", sum);
}
Here, we keep each JoinHandle and then wait for all the threads to finish. We then collect the results and add them all up. This is the common map-reduce pattern. Note that no thread needs any mutability; it all happens in the main thread.

How do you send slices of a Vec to a task in rust?

So, this doesn't work:
use std::comm;
#[deriving(Show)]
struct St { v: u8 }
fn main() {
let mut foo:Vec<St> = Vec::new();
for i in range(0u8, 10) {
foo.push(St { v: i });
}
{
let mut foo_slice = foo.as_mut_slice();
let (f1, f2) = foo_slice.split_at_mut(5);
let (sl, rx):(Sender<Option<&mut [St]>>, Receiver<Option<&mut [St]>>) = comm::channel();
let (sx, rl):(Sender<bool>, Receiver<bool>) = comm::channel();
spawn(proc() {
loop {
let v = rx.recv();
match v {
Some(v) => {
v[0].v = 100u8;
sx.send(true);
},
None => {
sx.send(false);
break;
}
}
}
});
sl.send(Some(f1));
sl.send(Some(f2));
sl.send(None);
println!("{}", rl.recv());
println!("{}", rl.recv());
println!("{}", rl.recv());
}
println!("{}", foo);
}
...because:
sl.send(Some(f1));
sl.send(Some(f2));
sl.send(None);
Infers that the variables f1 and f2 must be 'static, because the task may outlive the function it is running in. Which in turn means that foo must be 'static, and not 'a, which is the lifetime of main().
Thus the somewhat odd error:
<anon>:14:27: 14:30 error: `foo` does not live long enough
<anon>:14 let mut foo_slice = foo.as_mut_slice();
^~~
note: reference must be valid for the static lifetime...
<anon>:6:11: 46:2 note: ...but borrowed value is only valid for the block at 6:10
<anon>:6 fn main() {
<anon>:7
<anon>:8 let mut foo:Vec<St> = Vec::new();
<anon>:9 for i in range(0u8, 10) {
<anon>:10 foo.push(St { v: i });
<anon>:11 }
So, to fix this I thought that using Box<Vec<St>> might be the solution, but even then the slices created will have a local lifetime.
I could use unsafe code to transmute the lifetime (this does actually work), but is there a way to safely do the same thing?
playpen: http://is.gd/WQBdSB

Rust prevents you from having mutable access to the same value from within multiple tasks, because that leads to data races. Specifically, a task can't have borrowed pointers (incl. slices) to a value that is owned by another task.
To allow multiple tasks to access the same object, you should use Arc<T>. To provide mutable access to the object, put a Mutex<T> in that Arc: Arc<Mutex<T>> (a RefCell<T> won't do here, because it cannot be shared between tasks). As for that T, you can't use a slice type, as I just explained. I suggest you create 2 different Arc<Mutex<Vec<St>>> objects, send a clone of each Arc<Mutex<Vec<St>>> on the channel and join the Vecs when the tasks have done their job.
In general, when doing parallel algorithms, you should avoid mutating shared state. This leads to poor performance, because the system needs to invalidate memory caches across cores. If possible, consider having the task allocate and hold on to its result until it's complete, and send the complete result over a channel, rather than a mere bool.
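In current Rust, where tasks are simply threads, the advice to let each worker own its data and send back a complete result could be sketched as follows. The Vec is split into two owned halves, each thread modifies its own half, and the halves come back over a std::sync::mpsc channel. The helper name process_in_halves is an assumption for illustration, and plain u8 values stand in for the question's St struct:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical helper: splits `data` into two owned halves, lets one thread
// per half modify its own Vec, and reassembles the results.
fn process_in_halves(mut data: Vec<u8>) -> Vec<u8> {
    let second = data.split_off(data.len() / 2);
    let (tx, rx) = mpsc::channel();

    for (index, mut chunk) in vec![data, second].into_iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            if let Some(first) = chunk.first_mut() {
                *first = 100;
            }
            // Send the finished chunk back, tagged with its position.
            tx.send((index, chunk)).unwrap();
        });
    }
    drop(tx); // so `rx` stops once both workers are done

    // Chunks may arrive in any order; restore the original order.
    let mut chunks: Vec<(usize, Vec<u8>)> = rx.iter().collect();
    chunks.sort_by_key(|(index, _)| *index);
    chunks.into_iter().flat_map(|(_, chunk)| chunk).collect()
}

fn main() {
    let foo: Vec<u8> = (0..10).collect();
    println!("{:?}", process_in_halves(foo));
}
```

No state is shared between the workers, so neither Arc nor a lock is needed in this variant.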
EDIT
We can reformulate your initial program in terms of ownership to understand why it's not sound. The stack frame for the call to main owns foo, the Vec. foo_slice, f1 and f2 borrow that Vec. You spawn a task. That task may outlive the call frame for main, and even outlive the task that spawned it. Therefore, it is illegal to send references to values that are constrained to a stack frame. This is why borrowed pointers, with the exception of &'static T, don't fulfill Send.
Boxing the Vec changes nothing, because the stack frame still owns the Box, so returning from the function will drop the box and its contents.
The compiler cannot verify that the task won't outlive the owner of the values you send references to. If you are sure that the task will terminate before the references you give it become invalid, you can use transmute to cheat on the lifetime, but this is unsafe.
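With a modern toolchain (Rust 1.63 or later) there is a safe way to do what the question wanted: scoped threads from std::thread::scope let the compiler verify that the threads end before the borrowed slices do. A minimal sketch, assuming the question's St struct translated to current syntax; the helper name mark_halves is made up:

```rust
use std::thread;

#[derive(Debug, PartialEq)]
struct St {
    v: u8,
}

// Hypothetical helper: hands each half of the slice to its own scoped
// thread, which overwrites the first element of its half.
fn mark_halves(items: &mut [St]) {
    let mid = items.len() / 2;
    // split_at_mut yields two non-overlapping mutable slices.
    let (left, right) = items.split_at_mut(mid);

    thread::scope(|scope| {
        for half in [left, right] {
            scope.spawn(move || {
                if let Some(first) = half.first_mut() {
                    first.v = 100;
                }
            });
        }
    });
    // Both threads have been joined here, so the borrows have ended.
}

fn main() {
    let mut foo: Vec<St> = (0u8..10).map(|v| St { v }).collect();
    mark_halves(&mut foo);
    println!("{:?}", foo);
}
```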
