Code not running in parallel when using thread::scoped - rust

Can someone please explain why the code below does not run in parallel? I guess I don't understand how thread::scoped works.
use std::thread;
use std::sync::{Arc, Mutex};
use std::time::Duration;
use std::old_io::timer;
fn main() {
    let buf = Arc::new(Mutex::new(Vec::<String>::new()));
    let res = test(buf);
    println!("{:?}", *res.lock().unwrap());
}
fn test(buf: Arc<Mutex<Vec<String>>>) -> Arc<Mutex<Vec<String>>> {
    let guards: Vec<_> = (0..3).map(|i| {
        let mtx = buf.clone();
        thread::scoped(|| {
            println!("Thread: {}", i);
            let mut res = mtx.lock().unwrap();
            timer::sleep(Duration::seconds(5));
            res.push(format!("thread {}", i));
        });
    }).collect();
    buf
}
The code is based on the examples here where it's stated:
The scoped function takes one argument, a closure, indicated by the double bars ||. This closure is executed in a new thread created by scoped. The method is called scoped because it returns a 'join guard', which will automatically join the child thread when it goes out of scope. Because we collect these guards into a Vec, and that vector goes out of scope at the end of our program, our program will wait for every thread to finish before finishing.
Thanks

This is a tricky case. The problem is the humble semicolon. Look at this minimized code:
thread::scoped(|| {});
That semicolon makes the closure passed to map return (), so the result of the collect isn't a vector of JoinGuards; it's a Vec<()>! Each JoinGuard is dropped at the end of its closure, and dropping a JoinGuard joins its thread, so every thread is forced to finish before the next iteration starts.
When you fix this issue, you'll hit the next problem, which is that i and mtx don't live long enough. You'll need to move them into the closure:
thread::scoped(move || {})
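For readers on current Rust: thread::scoped no longer exists in std, and since Rust 1.63 std::thread::scope fills the same role. The sketch below is a port of the question's test function under that assumption, not the original answer's code. Every thread spawned on the scope is joined when the scope's closure returns, so dropping handles early cannot serialize the threads the way the stray semicolon did. Note also that the question's closure sleeps while holding the lock, so even with the semicolon fixed the three pushes would still happen one after another; the sketch sleeps before locking and shortens the sleep to 500 ms.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

// Same shape as the question's `test`, rewritten with std::thread::scope.
fn test(buf: Arc<Mutex<Vec<String>>>) -> Arc<Mutex<Vec<String>>> {
    thread::scope(|s| {
        for i in 0..3 {
            let mtx = Arc::clone(&buf);
            // `move` copies `i` and moves the cloned Arc into the thread.
            s.spawn(move || {
                println!("Thread: {}", i);
                // Sleep *before* locking, so the threads really overlap.
                thread::sleep(Duration::from_millis(500));
                mtx.lock().unwrap().push(format!("thread {}", i));
            });
        }
        // Every thread spawned on `s` is joined here, when the closure returns.
    });
    buf
}

fn main() {
    let start = Instant::now();
    let res = test(Arc::new(Mutex::new(Vec::new())));
    // Finishing in roughly 0.5 s rather than 1.5 s shows the threads overlapped.
    println!("{:?} after {:?}", *res.lock().unwrap(), start.elapsed());
}
```

The push order depends on scheduling, so the printed vector may list the threads in any order.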

Related

Rust - How to pass function parameters to closure

I'm trying to write a function that takes two parameters. The function starts two threads and uses one of the parameters inside one of the thread closures. This doesn't work because of the error "Borrowed data escapes outside of closure". Here's the code.
pub fn measure_stats(testdatapath: &PathBuf, filenameprefix: &String) {
    let (tx, rx) = mpsc::channel();
    let filename = format!("test.txt");
    let measure_thread = thread::spawn(move || {
        let stats = sar();
        fs::write(filename, stats).expect("failed to write output to file");
        // Send a signal that we're done.
        let _ = tx.send(());
    });
    thread::spawn(move || {
        let mut n = 0;
        loop {
            // Break if the measure thread is done.
            match rx.try_recv() {
                Ok(_) | Err(TryRecvError::Disconnected) => break,
                Err(TryRecvError::Empty) => {}
            }
            let filename = format!("{:04}.img", n);
            let filepath = Path::new(testdatapath).join(&filename);
            random_file_write(&filepath).unwrap();
            random_file_read(&filepath).unwrap();
            fs::remove_file(&filepath).expect("failed to remove file");
            n += 1;
        }
    });
    measure_thread.join().expect("joining measure thread panicked");
}
The problem is that testdatapath escapes the function body. I think this is a problem because the lifetime of testdatapath is only guaranteed until the end of the function, but the spawned thread might need it for the lifetime of the entire program. But it's a little confusing to me.
I've tried cloning the variable, but that didn't help. I'm not sure how I'm supposed to do this. How do I use a function parameter inside the closure or accomplish the same goal some other more canonical way?
If it's okay for the function not to return until both threads complete, then use std::thread::scope() to create scoped threads instead of std::thread::spawn(). Scoped threads allow borrowing data whereas regular spawning cannot, but require the threads to all terminate before the scope ends and the function that created them returns.
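A minimal sketch of the scoped-thread approach, assuming Rust 1.63 or later. The question's sar(), random_file_write() and random_file_read() aren't shown, so they are replaced with hypothetical stand-ins: a sleep for the measuring thread, and a worker that only records the paths it would have used. The parameter is taken as &Path, the more idiomatic spelling of &PathBuf; &PathBuf would work the same way.

```rust
use std::path::{Path, PathBuf};
use std::sync::mpsc::{self, TryRecvError};
use std::thread;
use std::time::Duration;

// Hypothetical simplification of the question's function: no real file I/O.
fn measure_stats(testdatapath: &Path, _filenameprefix: &str) -> Vec<PathBuf> {
    let (tx, rx) = mpsc::channel::<()>();
    thread::scope(|s| {
        s.spawn(move || {
            thread::sleep(Duration::from_millis(50)); // stand-in for sar() + fs::write
            let _ = tx.send(()); // signal that measuring is done
        });
        let worker = s.spawn(move || {
            let mut paths = Vec::new();
            let mut n = 0;
            loop {
                match rx.try_recv() {
                    Ok(_) | Err(TryRecvError::Disconnected) => break,
                    Err(TryRecvError::Empty) => {}
                }
                // Borrowing `testdatapath` is allowed: the scope cannot outlive it.
                paths.push(testdatapath.join(format!("{:04}.img", n)));
                n += 1;
                thread::sleep(Duration::from_millis(5));
            }
            paths
        });
        // Both threads are joined before `scope` returns, so the function waits.
        worker.join().unwrap()
    })
}

fn main() {
    let paths = measure_stats(Path::new("testdata"), "prefix");
    println!("worker produced {} paths", paths.len());
}
```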
If this has to be a “background” task, then you need to make sure that all the data used by each thread is owned, i.e. not a reference. In this case, that means you should change the parameters to be owned:
pub fn measure_stats(testdatapath: PathBuf, filenameprefix: String) {
Then, those values will be moved into the receiving thread, without any lifetime constraints.
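A sketch of the owned-parameter variant, assuming the caller can hand over the PathBuf and String. measure_stats_background is a hypothetical name, the work is reduced to building file paths, and returning the JoinHandle is one possible design that lets the caller decide if and when to wait.

```rust
use std::path::PathBuf;
use std::thread::{self, JoinHandle};

// The parameters are owned, so they can be moved into a `thread::spawn`
// closure, which requires 'static data.
fn measure_stats_background(testdatapath: PathBuf, filenameprefix: String) -> JoinHandle<Vec<PathBuf>> {
    thread::spawn(move || {
        // `testdatapath` and `filenameprefix` now belong to this thread.
        (0..3)
            .map(|n| testdatapath.join(format!("{}{:04}.img", filenameprefix, n)))
            .collect()
    })
}

fn main() {
    let handle = measure_stats_background(PathBuf::from("testdata"), "run-".to_string());
    // The caller decides when to wait for the background work.
    let paths = handle.join().unwrap();
    println!("{:?}", paths);
}
```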
You're trying to make testdatapath live longer than the function. Because it is a borrowed value, and you can't guarantee that the original PathBuf will outlive the closure running in the new thread, the compiler rejects code that assumes it does without taking any precautions.
The three simplest choices:
Pass the PathBuf to the function by value instead of borrowing it (remove the &).
Use an Arc.
Clone it and move the clone into the thread.
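Sketches of the clone and Arc options, under the same simplification that the threads only build paths; with_clone and with_arc are hypothetical names. The clone has to happen before the spawn call: cloning inside the closure still captures the borrowed reference, which may be why cloning didn't seem to help. In with_arc the handles are collected into a Vec before any join, so the threads run concurrently.

```rust
use std::path::PathBuf;
use std::sync::Arc;
use std::thread;

// Clone first, then move the owned clone into the thread.
fn with_clone(testdatapath: &PathBuf) -> PathBuf {
    let path = testdatapath.clone(); // PathBuf::clone gives an owned PathBuf
    thread::spawn(move || path.join("0000.img")).join().unwrap()
}

// Share one allocation: each thread gets a cheap clone of the Arc, not of the path.
fn with_arc(testdatapath: Arc<PathBuf>) -> Vec<PathBuf> {
    let handles: Vec<_> = (0..2)
        .map(|n| {
            let path = Arc::clone(&testdatapath);
            thread::spawn(move || path.join(format!("{:04}.img", n)))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("{:?}", with_clone(&PathBuf::from("testdata")));
    println!("{:?}", with_arc(Arc::new(PathBuf::from("testdata"))));
}
```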

How to loop over thread handles and join if finished, within another loop?

I have a program that creates threads in a loop, and also checks if they have finished and cleans them up if they have. See below for a minimal example:
use std::thread;
fn main() {
    let mut v = Vec::<std::thread::JoinHandle<()>>::new();
    for _ in 0..10 {
        let jh = thread::spawn(|| {
            thread::sleep(std::time::Duration::from_secs(1));
        });
        v.push(jh);
        for jh in v.iter_mut() {
            if jh.is_finished() {
                jh.join().unwrap();
            }
        }
    }
}
This gives the error:
error[E0507]: cannot move out of `*jh` which is behind a mutable reference
--> src\main.rs:13:17
|
13 | jh.join().unwrap();
| ^^^------
| | |
| | `*jh` moved due to this method call
| move occurs because `*jh` has type `JoinHandle<()>`, which does not implement the `Copy` trait
|
note: this function takes ownership of the receiver `self`, which moves `*jh`
--> D:\rust\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\library\std\src\thread\mod.rs:1461:17
|
1461 | pub fn join(self) -> Result<T> {
How can I get the borrow checker to allow this?
JoinHandle::join actually consumes the JoinHandle.
iter_mut(), however, only borrows the elements of the vector and keeps the vector alive. Therefore your JoinHandles are only borrowed, and you cannot call consuming methods on borrowed objects.
What you need to do is to take the ownership of the elements while iterating over the vector, so they can be then consumed by join(). This is achieved by using into_iter() instead of iter_mut().
The second mistake is that you (probably accidentally) wrote the two for loops inside of each other, while they should be independent loops.
The third problem is a little more complex. You cannot check whether a thread has finished and then join it the way you did, because join() needs to take the handle out of the vector while you are still iterating over it. Therefore I removed the is_finished() check for now and will come back to it further down.
Here is your fixed code:
use std::thread;
fn main() {
    let mut v = Vec::<std::thread::JoinHandle<()>>::new();
    for _ in 0..10 {
        let jh = thread::spawn(|| {
            thread::sleep(std::time::Duration::from_secs(1));
        });
        v.push(jh);
    }
    for jh in v.into_iter() {
        jh.join().unwrap();
    }
}
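If you want to keep the question's structure and check once per spawned thread, one option (assuming Rust 1.61 or later, where JoinHandle::is_finished is stable) is to remove finished handles with swap_remove, which hands back ownership so join() can consume them. This is still polling, so it suits code that checks occasionally anyway, as the question's loop does; join_finished is a hypothetical helper name.

```rust
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Joins every handle in `v` whose thread has already finished and removes it.
/// Returns how many handles were joined.
fn join_finished(v: &mut Vec<JoinHandle<()>>) -> usize {
    let mut joined = 0;
    let mut i = 0;
    while i < v.len() {
        if v[i].is_finished() {
            // swap_remove moves the handle out of the Vec, so join() can consume it.
            v.swap_remove(i).join().unwrap();
            joined += 1;
            // Don't advance `i`: another handle was swapped into slot `i`.
        } else {
            i += 1;
        }
    }
    joined
}

fn main() {
    let mut v: Vec<JoinHandle<()>> = Vec::new();
    for _ in 0..10 {
        v.push(thread::spawn(|| thread::sleep(Duration::from_millis(100))));
        join_finished(&mut v);
    }
    // Wait for whatever is still running.
    for jh in v {
        jh.join().unwrap();
    }
}
```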
Reacting to finished threads
This one is harder. If you just want to wait until all of them are finished, the code above is the way to go.
However, if you have to react to finished threads right away, you basically have to set up some kind of event propagation. You don't want to loop over all threads over and over again until they are all finished, because that is called busy-waiting and wastes a lot of CPU time.
So if you want to achieve that there are two problems that have to be dealt with:
join() consumes the JoinHandle, which would leave a hole in the Vec of JoinHandles. A Vec can't have holes, so we need to wrap each JoinHandle in a type that can be taken out of the vector individually, like Option.
We need a way to signal to the main thread that a child thread has finished, so that the main thread doesn't have to continuously iterate over the threads.
All in all this is very complex and tricky to implement.
Here is my attempt:
use std::{
    thread::{self, JoinHandle},
    time::Duration,
};
fn main() {
    let mut v: Vec<Option<JoinHandle<()>>> = Vec::new();
    let (send_finished_thread, receive_finished_thread) = std::sync::mpsc::channel();
    for i in 0..10 {
        let send_finished_thread = send_finished_thread.clone();
        let join_handle = thread::spawn(move || {
            println!("Thread {} started.", i);
            thread::sleep(Duration::from_millis(2000 - i as u64 * 100));
            println!("Thread {} finished.", i);
            // Signal that we are finished.
            // This will wake up the main thread.
            send_finished_thread.send(i).unwrap();
        });
        v.push(Some(join_handle));
    }
    loop {
        // Check if all threads are finished
        let num_left = v.iter().filter(|th| th.is_some()).count();
        if num_left == 0 {
            break;
        }
        // Wait until a thread is finished, then join it
        let i = receive_finished_thread.recv().unwrap();
        let join_handle = std::mem::take(&mut v[i]).unwrap();
        println!("Joining {} ...", i);
        join_handle.join().unwrap();
        println!("{} joined.", i);
    }
    println!("All joined.");
}
Important
This code is just a demonstration. It will deadlock if one of the threads panics. But this shows how complicated that problem is.
It could be solved by utilizing a drop guard, but I think this answer is convoluted enough ;)
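A sketch of the drop-guard idea, reusing the demo's channel-plus-Option layout; NotifyOnDrop and run are hypothetical names, and every third thread panics on purpose. The guard sends the thread's index from its Drop impl, which also runs while a panic unwinds the thread, so the main thread is always woken and join() reports the panic as an Err instead of the program deadlocking. With panic = "abort" there is no unwinding, but then the whole process ends anyway.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Sends `index` when dropped, including while a panic unwinds the thread.
struct NotifyOnDrop {
    index: usize,
    tx: Sender<usize>,
}

impl Drop for NotifyOnDrop {
    fn drop(&mut self) {
        // Ignore the error: the receiver may already be gone.
        let _ = self.tx.send(self.index);
    }
}

/// Spawns `n` threads and returns (finished normally, panicked).
fn run(n: usize) -> (usize, usize) {
    let (tx, rx) = mpsc::channel();
    let mut v: Vec<Option<JoinHandle<()>>> = Vec::new();
    for i in 0..n {
        let guard = NotifyOnDrop { index: i, tx: tx.clone() };
        v.push(Some(thread::spawn(move || {
            let _guard = guard; // dropped when the closure returns or unwinds
            if i % 3 == 0 {
                panic!("thread {} failed", i);
            }
        })));
    }
    drop(tx);
    let (mut ok, mut panicked) = (0, 0);
    for _ in 0..n {
        let i = rx.recv().unwrap();
        match v[i].take().unwrap().join() {
            Ok(()) => ok += 1,
            Err(_) => panicked += 1,
        }
    }
    (ok, panicked)
}

fn main() {
    let (ok, panicked) = run(10);
    println!("{} finished normally, {} panicked", ok, panicked);
}
```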

Testing a thread Worker with an anonymous function

I am adding tests to the 'hello' web server from the Rust book.
My issue/error is around how to test whether a Worker has processed a Job.
My idea is to pass an anonymous function which updates a bool from false to true.
I think ownership is an issue here. I tried wrapping f in a Box, thinking it would prevent passing bool as a value as opposed to a reference. Using Box I struggled to mutate the value of state_updated when it was wrapped in this way.
I also tried writing a basic struct to wrap and update the bool. I have since reverted back to a mut bool.
First question: What changes do I need to make to get the test to pass?
Second question: Is there a better way for me to test this?
Below is a minimal version which reproduces my issue.
The full code is available at the bottom of this page in the Rust book.
My current test creates a Worker, sends a Job to the worker, and asserts on an expected change that could only have occurred if the Worker has processed the Job.
I intend to iterate on this test to add proper thread cleanup in the future.
use std::sync::mpsc;
use std::sync::Arc;
use std::sync::Mutex;
use hello_server_help::Worker;
use std::thread;
use std::time::Duration;
#[test]
fn test_worker_processes_job() {
    let (sender, r) = mpsc::channel();
    let receiver = Arc::new(Mutex::new(r));
    let _ = Worker::new(0, receiver);
    let mut state_updated = false;
    let f = move || state_updated = true;
    sender.send(Box::new(f)).unwrap();
    thread::sleep(Duration::from_secs(1)); // primitive wait, for now
    assert_eq!(state_updated, true);
}
It's my understanding that f is taking ownership of state_updated. In the assert line at the end, however, there is no error along the lines of "referenced after move".
Running the tests gives me the output:
running 1 test
test test_worker_processes_job ... FAILED
failures:
---- test_worker_processes_job stdout ----
thread 'test_worker_processes_job' panicked at 'assertion failed: `(left == right)`
left: `false`,
right: `true`', tests/worker_tests.rs:19:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
The MRE implementation:
use std::sync::mpsc;
use std::sync::Arc;
use std::sync::Mutex;
use std::thread;
pub type Job = Box<dyn FnOnce() + Send + 'static>;
pub struct Worker {
    id: usize,
    handle: Option<thread::JoinHandle<()>>,
}
impl Worker {
    pub fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
        let thread = thread::spawn(move || loop {
            let job = receiver
                .lock()
                .expect("Error obtaining lock.")
                .recv()
                .unwrap();
            job();
        });
        Worker {
            id,
            handle: Some(thread),
        }
    }
}
state_updated is a boolean so it implements Copy, which is why you can move it into your closure and keep using it afterwards, and why you can't see the changes: the one that is modified by the closure is the copy and not the original.
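A small self-contained illustration of that Copy behavior; copy_demo is a hypothetical name. The closure changes its own copy of the bool, and the caller's variable keeps its old value.

```rust
// Mirrors the question's `move || state_updated = true`.
fn copy_demo() -> (bool, bool) {
    let mut state_updated = false;
    let mut f = move || {
        state_updated = true; // modifies the closure's own copy
        state_updated
    };
    let seen_by_closure = f();
    (seen_by_closure, state_updated) // the original is untouched
}

fn main() {
    println!("{:?}", copy_demo()); // prints (true, false)
}
```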
If you want to update a boolean in the thread and have the change visible in the caller, you need to share it rather than copy it, and you need some synchronization mechanism. Two solutions:
Use an Arc<Mutex<bool>>:
use std::sync::Arc;
use std::sync::Mutex;
let state_updated = Arc::new(Mutex::new(false));
let state_ref = state_updated.clone();
let f = move || *state_ref.lock().unwrap() = true;
…
assert_eq!(*state_updated.lock().unwrap(), true);
Or use an AtomicBool:
use std::sync::atomic::AtomicBool;
use std::sync::atomic::Ordering;
let state_updated = AtomicBool::new(false);
let state_ref = &state_updated;
let f = move || state_ref.store(true, Ordering::Release);
…
assert_eq!(state_updated.load(Ordering::Acquire), true);
The compiler will complain that the borrowed value does not live long enough, because a Job must be 'static. You can get around that by using a scoped thread (std::thread::scope, or the scoped threads from rayon or crossbeam), or with a bit of unsafe: let state_ref: &'static AtomicBool = unsafe { std::mem::transmute(&state_updated) }; (just make sure you join the child thread before state_updated goes out of scope).
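Another way around the lifetime error, assuming the Job type from the MRE: share the flag through an Arc<AtomicBool> instead of a plain reference. The closure then owns its handle, so it is 'static and needs neither a scoped thread nor unsafe. In the sketch, run_flag_job is a hypothetical helper, and the job runs on a plain thread as a stand-in for the Worker.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

fn run_flag_job() -> bool {
    let state_updated = Arc::new(AtomicBool::new(false));
    let state_ref = Arc::clone(&state_updated);
    // The closure owns an Arc, so it satisfies the 'static bound on Job.
    let f: Job = Box::new(move || state_ref.store(true, Ordering::Release));
    // Stand-in for sending `f` to the Worker: run it on a thread and join it.
    thread::spawn(f).join().unwrap();
    state_updated.load(Ordering::Acquire)
}

fn main() {
    println!("job ran: {}", run_flag_job());
}
```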
It might however be better to use a channel for the return value:
use std::sync::mpsc;
let (rsend, rrecv) = mpsc::channel();
let f = move || rsend.send(()).unwrap();
…
assert_eq!(rrecv.recv_timeout(Duration::from_secs(1)), Ok(()));
That way you only wait until the result is available (the duration is just a timeout in case the thread takes too long to compute the result).
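A self-contained version of the channel approach, with a condensed copy of the MRE Worker so it runs on its own. Two assumptions differ from the question: the test body is a plain function, worker_processes_job, so it can be called from main (in tests/worker_tests.rs it would be a #[test] fn), and the worker loop stops when the sender is dropped instead of panicking on recv().unwrap().

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

pub type Job = Box<dyn FnOnce() + Send + 'static>;

pub struct Worker {
    #[allow(dead_code)]
    id: usize,
    #[allow(dead_code)]
    handle: Option<thread::JoinHandle<()>>,
}

impl Worker {
    pub fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
        let thread = thread::spawn(move || loop {
            // The guard is dropped at the end of this statement, so the lock
            // is not held while the job runs.
            let message = receiver.lock().expect("Error obtaining lock.").recv();
            match message {
                Ok(job) => job(),
                // Unlike the MRE: stop once every sender is gone.
                Err(_) => break,
            }
        });
        Worker { id, handle: Some(thread) }
    }
}

fn worker_processes_job() -> Result<(), mpsc::RecvTimeoutError> {
    let (sender, r) = mpsc::channel::<Job>();
    let _worker = Worker::new(0, Arc::new(Mutex::new(r)));
    // The job reports back over its own channel instead of flipping a bool.
    let (rsend, rrecv) = mpsc::channel();
    sender.send(Box::new(move || rsend.send(()).unwrap())).unwrap();
    rrecv.recv_timeout(Duration::from_secs(1))
}

fn main() {
    assert_eq!(worker_processes_job(), Ok(()));
    println!("worker processed the job");
}
```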

How do I modify a value in one thread and read the value in another thread using shared memory?

The following Python code creates a thread (actually a process) and passes it an array containing two floats. Every 5 seconds the process subtracts 1 from the first float and adds 1 to the second, while the main thread continuously prints the two floats:
from multiprocessing import Process, Array
from time import sleep
def target(states):
    while True:
        states[0] -= 1
        states[1] += 1
        sleep(5)
def main():
    states = Array("d", [0.0, 0.0])
    process = Process(target=target, args=(states,))
    process.start()
    while True:
        print(states[0])
        print(states[1])
if __name__ == "__main__":
    main()
How can I do the same thing using shared memory in Rust? I've tried doing the following (playground):
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
    let data = Arc::new(Mutex::new([0.0]));
    let data = data.clone();
    thread::spawn(move || {
        let mut data = data.lock().unwrap();
        data[0] = 1.0;
    });
    print!("{}", data[0]);
}
But that's giving a compile error:
error: cannot index a value of type `std::sync::Arc<std::sync::Mutex<[_; 1]>>`
--> <anon>:12:18
|>
12 |> print!("{}", data[0]);
|> ^^^^^^^
And even if that worked, it would do something different. I've read this, but I still have no idea how to do it.
Your code is not that far off! :)
Let's look at the compiler error first: it says that you are apparently attempting to index something. This is true, you want to index the data variable (with data[0]), but the compiler complains that the value you want to index is of type std::sync::Arc<std::sync::Mutex<[_; 1]>> and cannot be indexed.
If you look at the type, you can quickly see: my array is still wrapped in a Mutex<T> which is wrapped in an Arc<T>. This brings us to the solution: you have to lock for read access, too. So you have to add the lock().unwrap() like in the other thread:
print!("{}", data.lock().unwrap()[0]);
But now a new compiler error arises: use of moved value: `data`. Dang! This comes from your name shadowing. You say let data = data.clone(); before starting the thread; this shadows the original data. So how about we replace it by let data_for_thread = data.clone() and use data_for_thread in the other thread? You can see the working result here on the playground.
Making it do the same thing as the Python example isn't that hard anymore then, is it?
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;
fn main() {
    let data = Arc::new(Mutex::new([0.0, 0.0]));
    let data_for_thread = data.clone();
    thread::spawn(move || {
        loop {
            thread::sleep(Duration::from_secs(5));
            // The guard is dropped at the end of each iteration, before the next sleep.
            let mut data = data_for_thread.lock().unwrap();
            // Same updates as the Python target(): first value down, second up.
            data[0] -= 1.0;
            data[1] += 1.0;
        }
    });
    loop {
        let data = data.lock().unwrap();
        println!("{}, {}", data[0], data[1]);
    }
}
You can try it here on the playground, although I changed a few minor things to allow running on the playground.
Ok, so let's first fix the compiler error:
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
    let data = Arc::new(Mutex::new([0.0]));
    let thread_data = data.clone();
    thread::spawn(move || {
        let mut data = thread_data.lock().unwrap();
        data[0] = 1.0;
    });
    println!("{}", data.lock().unwrap()[0]);
}
The variable thread_data is moved into the thread, which is why it cannot be accessed after the thread is spawned; data stays in the main thread.
But this still has a problem: you are starting a thread that runs concurrently with the main thread, and the final print statement will usually execute before the thread changes the value (the order is not deterministic).
To fix this you have to wait for the thread to finish before printing the value:
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
    let data = Arc::new(Mutex::new([0.0]));
    let thread_data = data.clone();
    let t = thread::spawn(move || {
        let mut data = thread_data.lock().unwrap();
        data[0] = 1.0;
    });
    t.join().unwrap();
    println!("{}", data.lock().unwrap()[0]);
}
This always prints 1, because join() waits for the thread's write to finish before the value is read.
If one thread updates shared data, the other threads are only guaranteed to see the update if the accesses are synchronized. In Rust there are two ways to do that:
Use the atomic types from std::sync::atomic (with Acquire/Release or stronger orderings) for simple values. They play the role that volatile plays in Java; Rust has no volatile keyword, and ptr::read_volatile/write_volatile are meant for memory-mapped I/O, not for sharing data between threads.
Protect all reads and writes with a lock such as a Mutex. This costs some performance, but it rules out data races and inconsistent reads and writes between threads.
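A sketch of the atomic option applied to the Python example's two floats. std has no AtomicF64, so the hypothetical AtomicF64 type below stores each value's bit pattern in an AtomicU64 and adds with a compare-and-swap loop. Each value is updated atomically, but the pair is not updated as one unit; if readers must always see both values from the same update, a Mutex around the array is the simpler choice.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: an f64 stored as its bits in an AtomicU64.
struct AtomicF64(AtomicU64);

impl AtomicF64 {
    fn new(v: f64) -> Self {
        AtomicF64(AtomicU64::new(v.to_bits()))
    }

    fn load(&self) -> f64 {
        f64::from_bits(self.0.load(Ordering::Acquire))
    }

    /// Atomically adds `delta` with a compare-and-swap loop; returns the previous value.
    fn fetch_add(&self, delta: f64) -> f64 {
        let mut current = self.0.load(Ordering::Relaxed);
        loop {
            let new = (f64::from_bits(current) + delta).to_bits();
            match self.0.compare_exchange_weak(current, new, Ordering::AcqRel, Ordering::Relaxed) {
                Ok(prev) => return f64::from_bits(prev),
                // Another thread changed the value first (or a spurious failure): retry.
                Err(actual) => current = actual,
            }
        }
    }
}

/// Like the Python target(): each thread subtracts 1 from the first value
/// and adds 1 to the second, `iters` times.
fn run(threads: usize, iters: usize) -> (f64, f64) {
    let states = Arc::new([AtomicF64::new(0.0), AtomicF64::new(0.0)]);
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let states = Arc::clone(&states);
            thread::spawn(move || {
                for _ in 0..iters {
                    states[0].fetch_add(-1.0);
                    states[1].fetch_add(1.0);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    (states[0].load(), states[1].load())
}

fn main() {
    println!("{:?}", run(4, 1000));
}
```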

Cannot move data out of a Mutex

Consider the following code example: I have a vector of JoinHandles that I need to iterate over to join them back to the main thread; however, when I do so I get the error error: cannot move out of borrowed content.
let threads = Arc::new(Mutex::new(Vec::new()));
for _x in 0..100 {
    let handle = thread::spawn(move || {
        // do some work
    });
    threads.lock().unwrap().push(handle);
}
for t in threads.lock().unwrap().iter() {
    t.join();
}
Unfortunately, you can't do this directly. Once the data structure is inside the Mutex, you can't get it back by value through the lock: the MutexGuard only gives you a &mut reference to it, which doesn't allow moving out of it. So even into_iter() won't work, because it takes self by value, which you can't get from a MutexGuard.
There is a workaround, however. You can use Arc<Mutex<Option<Vec<_>>>> instead of Arc<Mutex<Vec<_>>> and then just take() the value out of the mutex:
for t in threads.lock().unwrap().take().unwrap().into_iter() {
    t.join().unwrap();
}
Then into_iter() will work just fine as the value is moved into the calling thread.
Of course, you will need to construct the vector and push to it appropriately:
let threads = Arc::new(Mutex::new(Some(Vec::new())));
...
threads.lock().unwrap().as_mut().unwrap().push(handle);
However, the best way is to just drop the Arc<Mutex<..>> layer altogether (of course, if this value is not used from other threads).
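A sketch of another variant, assuming the vector has to stay behind the Arc<Mutex<..>>: because Vec implements Default, std::mem::take can swap an empty Vec into the mutex and return the old one by value, so the Option layer isn't strictly needed. spawn_and_join is a hypothetical name, and the threads return a value only so the result can be checked.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

fn spawn_and_join(n: usize) -> usize {
    let threads: Arc<Mutex<Vec<JoinHandle<usize>>>> = Arc::new(Mutex::new(Vec::new()));
    for x in 0..n {
        let handle = thread::spawn(move || x * 2);
        threads.lock().unwrap().push(handle);
    }
    // Swap an empty Vec in and take the old one out by value. The guard is a
    // temporary dropped at the end of this statement, so no lock is held while joining.
    let handles = std::mem::take(&mut *threads.lock().unwrap());
    handles.into_iter().map(|t| t.join().unwrap()).sum()
}

fn main() {
    println!("sum of results: {}", spawn_and_join(4));
}
```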
As referenced in How to take ownership of T from Arc<Mutex<T>>?, this is now possible without any trickery using Arc::try_unwrap and Mutex::into_inner():
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
fn main() {
    let threads = Arc::new(Mutex::new(Vec::new()));
    for x in 0..100 {
        let handle = thread::spawn(move || {
            println!("{}", x);
        });
        threads.lock().unwrap().push(handle);
    }
    // try_unwrap succeeds because this is the last Arc; into_inner consumes the Mutex.
    let threads_unwrapped: Vec<JoinHandle<_>> = Arc::try_unwrap(threads).unwrap().into_inner().unwrap();
    for t in threads_unwrapped.into_iter() {
        t.join().unwrap();
    }
}
Play around with it in this playground to verify.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9d5635e7f778bc744d1fb855b92db178
While drain is a good solution, you can also do the following:
// with a copy
let built_words: Arc<Mutex<Vec<String>>> = Arc::new(Mutex::new(vec![]));
let result: Vec<String> = built_words.lock().unwrap().clone();
// using drain
let mut locked_result = built_words.lock().unwrap();
let mut result: Vec<String> = vec![];
result.extend(locked_result.drain(..));
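I would prefer to clone the data to get the original value, although cloning does have some overhead: it allocates a new vector and clones every String. Cloning also only works when the element type implements Clone, which JoinHandle does not; drain(..) moves the elements out instead and leaves the vector inside the mutex empty.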

Resources