Read Stdout and Stderr in parallel and kill process on timeout - multithreading

I need to interact with another process.
I want to use two threads, one reading stdout and one reading stderr; these threads pass every line they read to a channel.
The main thread collects the lines and checks for a timeout; if a timeout occurs, the process should be killed.
My problem is how to share the stdout/stderr handles and still be able to kill the process later if a timeout occurs.
let mut stdout = process.stdout.unwrap();
let (tx_stdout, rx_stdout): (Sender<Result<Vec<u8>, ExecutionError>>, Receiver<Result<Vec<u8>, ExecutionError>>) = mpsc::channel();
std::thread::spawn(move || {
PowerShell::read_lines(&tx_stdout, stdout);
});
let mut stderr = process.stderr.unwrap();
let (tx_stderr, rx_stderr): (Sender<Result<Vec<u8>, ExecutionError>>, Receiver<Result<Vec<u8>, ExecutionError>>) = mpsc::channel();
std::thread::spawn(move || {
PowerShell::read_lines(&tx_stderr, &mut stderr);
});
process.kill(); // this is not possible
I understand why the compiler has a problem with that, but I don't know how to solve it.
Using Arc probably causes the same issue.
Do you have some suggestions for me?

First of all, two things that help a lot when asking for Rust help: set up a reduced test case (if possible) and post it on https://play.rust-lang.org, and read and post the compilation errors.
Here, a reduced test case would be:
use std::process::{Command, Stdio};
fn main() {
let mut process = Command::new("echo")
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn().expect("Failed to start echo process");
let stdout = process.stdout.unwrap();
std::thread::spawn(move || {
drop(stdout);
});
let stderr = process.stderr.unwrap();
std::thread::spawn(move || {
drop(stderr);
});
process.kill(); // this is not possible
}
and the compilation error is
error[E0382]: borrow of partially moved value: `process`
--> src/lib.rs:19:4
|
14 | let stderr = process.stderr.unwrap();
| -------- `process.stderr` partially moved due to this method call
...
19 | process.kill(); // this is not possible
| ^^^^^^^ value borrowed here after partial move
|
note: this function consumes the receiver `self` by taking ownership of it, which moves `process.stderr`
This is rather clear: process.stderr is an Option, and Option::unwrap has the following signature:
pub fn unwrap(self) -> T
meaning it takes its subject by value. So after the two unwrap calls the process object is no longer valid: it has been stripped for parts. Some of the parts are still available, but methods can only be called when the object is known to be whole, which is not the case here.
Hence the "partially moved" compilation error: you can't strip something for parts and still use it as-is. Depending on the parts you might get away with it in principle, but Rust doesn't express partial dependencies, so the compiler has to assume that Child::kill needs the child alive and well, not just some of its bits.
There are various possibilities depending on the exact situation: in some cases you can Copy or Clone the bits you want out, or you can use scoped threads to borrow them. None of these work here, but there is still a way: Option::take lets you take the content of an Option and leave a None in its place. So here you can move the ChildStdout and ChildStderr out of the structure while leaving the structure intact; that just requires being able to modify the structure:
use std::process::{Command, Stdio};
fn main() {
let mut process = Command::new("echo")
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn().expect("Failed to start echo process");
let stdout = process.stdout.take().unwrap();
std::thread::spawn(move || {
drop(stdout);
});
let stderr = process.stderr.take().unwrap();
std::thread::spawn(move || {
drop(stderr);
});
let _ = process.kill(); // compiles now: `process` itself was never moved
}
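Putting the pieces together for the original question, here is a minimal sketch of the whole pattern: two reader threads forward lines over one mpsc channel, and the main thread waits with Receiver::recv_timeout and kills the child once the deadline passes. Assumptions: a Unix `sh` is used to spawn the example child; the names Line, forward_lines and run_with_timeout are hypothetical; and the child keeps its pipes open until it exits and does not hand them to grandchildren (otherwise killing it would not close the pipes and the reader threads would keep blocking, which is why the example uses `exec`).

```rust
use std::io::{self, BufRead, BufReader, Read};
use std::process::{Command, Stdio};
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread;
use std::time::{Duration, Instant};

/// Tags each line with the pipe it came from.
#[derive(Debug)]
enum Line {
    Out(String),
    Err(String),
}

/// Forwards `pipe` line by line to `tx` until EOF, a read error,
/// or the receiver going away.
fn forward_lines<R: Read>(pipe: R, tx: Sender<Line>, wrap: fn(String) -> Line) {
    for line in BufReader::new(pipe).lines().map_while(Result::ok) {
        if tx.send(wrap(line)).is_err() {
            break;
        }
    }
}

/// Runs `cmd` and collects its output, killing it if its pipes are still
/// open when `timeout` expires. Returns (stdout lines, stderr lines, timed_out).
fn run_with_timeout(
    cmd: &mut Command,
    timeout: Duration,
) -> io::Result<(Vec<String>, Vec<String>, bool)> {
    let mut child = cmd.stdout(Stdio::piped()).stderr(Stdio::piped()).spawn()?;
    // take() moves the pipes out but leaves `child` whole, so kill() still works.
    let stdout = child.stdout.take().expect("stdout is piped");
    let stderr = child.stderr.take().expect("stderr is piped");

    let (tx, rx) = mpsc::channel();
    let tx_err = tx.clone();
    let out_reader = thread::spawn(move || forward_lines(stdout, tx, Line::Out));
    let err_reader = thread::spawn(move || forward_lines(stderr, tx_err, Line::Err));

    let deadline = Instant::now() + timeout;
    let (mut out, mut err, mut timed_out) = (Vec::new(), Vec::new(), false);
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(Line::Out(l)) => out.push(l),
            Ok(Line::Err(l)) => err.push(l),
            // Both senders are gone: both pipes reached EOF.
            Err(RecvTimeoutError::Disconnected) => break,
            Err(RecvTimeoutError::Timeout) => {
                timed_out = true;
                // Ignore the error: the child may have exited just now.
                let _ = child.kill();
                break;
            }
        }
    }
    child.wait()?; // reap the child so it does not linger as a zombie
    // Once the child is dead its pipes close, so the readers hit EOF and exit.
    out_reader.join().expect("stdout reader panicked");
    err_reader.join().expect("stderr reader panicked");
    Ok((out, err, timed_out))
}

fn main() -> io::Result<()> {
    // Example child (assumes a Unix `sh`): one line per stream, then hang.
    // `exec` makes `sleep` replace the shell, so kill() hits the process
    // that actually holds the pipes.
    let mut cmd = Command::new("sh");
    cmd.args(["-c", "echo out; echo err >&2; exec sleep 10"]);
    let (out, err, timed_out) = run_with_timeout(&mut cmd, Duration::from_millis(500))?;
    println!("stdout={out:?} stderr={err:?} timed_out={timed_out}");
    Ok(())
}
```

Lines that are still sitting in the channel when the deadline hits are simply dropped together with `rx`; if you need them, drain the channel with `rx.try_iter()` before returning.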

Related

Rust - How to pass function parameters to closure

I'm trying to write a function that takes two parameters. The function starts two threads and uses one of the parameters inside one of the thread closures. This doesn't work because of the error "Borrowed data escapes outside of closure". Here's the code.
pub fn measure_stats(testdatapath: &PathBuf, filenameprefix: &String) {
let (tx, rx) = mpsc::channel();
let filename = format!("test.txt");
let measure_thread = thread::spawn(move || {
let stats = sar();
fs::write(filename, stats).expect("failed to write output to file");
// Send a signal that we're done.
let _ = tx.send(());
});
thread::spawn(move || {
let mut n = 0;
loop {
// Break if the measure thread is done.
match rx.try_recv() {
Ok(_) | Err(TryRecvError::Disconnected) => break,
Err(TryRecvError::Empty) => {}
}
let filename = format!("{:04}.img", n);
let filepath = Path::new(testdatapath).join(&filename);
random_file_write(&filepath).unwrap();
random_file_read(&filepath).unwrap();
fs::remove_file(&filepath).expect("failed to remove file");
n += 1;
}
});
measure_thread.join().expect("joining measure thread panicked");
}
The problem is that testdatapath escapes the function body. I think this is a problem because the lifetime of testdatapath is only guaranteed until the end of the closure, but it needs to be the lifetime of the entire program. But it's a little confusing to me.
I've tried cloning the variable, but that didn't help. I'm not sure how I'm supposed to do this. How do I use a function parameter inside the closure or accomplish the same goal some other more canonical way?
If it's okay for the function not to return until both threads complete, then use std::thread::scope() to create scoped threads instead of std::thread::spawn(). Scoped threads allow borrowing data whereas regular spawning cannot, but require the threads to all terminate before the scope ends and the function that created them returns.
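A minimal sketch of the scoped version, assuming Rust 1.63 or later (where std::thread::scope is stable). The question's sar(), random_file_write() and random_file_read() are replaced by sleeps, the loop checks the stop signal after each iteration, and the function returns the paths the worker would have used; that return value is a hypothetical change, only there to make the borrow visible.

```rust
use std::path::{Path, PathBuf};
use std::sync::mpsc::{self, TryRecvError};
use std::thread;
use std::time::Duration;

/// Scoped-thread version of `measure_stats`: the worker borrows both
/// parameters directly, without cloning and without 'static.
fn measure_stats(testdatapath: &Path, filenameprefix: &str) -> Vec<PathBuf> {
    let (tx, rx) = mpsc::channel::<()>();
    thread::scope(|s| {
        s.spawn(move || {
            thread::sleep(Duration::from_millis(50)); // placeholder for sar()
            let _ = tx.send(()); // signal that measuring is done
        });
        let worker = s.spawn(move || {
            // `testdatapath` and `filenameprefix` are plain borrows here:
            // allowed because scope() joins every thread before returning.
            let mut used = Vec::new();
            let mut n = 0;
            loop {
                used.push(testdatapath.join(format!("{filenameprefix}{n:04}.img")));
                n += 1;
                thread::sleep(Duration::from_millis(5)); // placeholder for file I/O
                match rx.try_recv() {
                    Ok(()) | Err(TryRecvError::Disconnected) => break,
                    Err(TryRecvError::Empty) => {}
                }
            }
            used
        });
        worker.join().expect("worker thread panicked")
    })
}

fn main() {
    let dir = PathBuf::from("/tmp/testdata");
    let prefix = String::from("run-");
    let used = measure_stats(&dir, &prefix); // plain borrows
    println!("worker used {} paths, first: {:?}", used.len(), used.first());
}
```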
If this has to be a “background” task, then you need to make sure that all the data used by each thread is owned, i.e. not a reference. In this case, that means you should change the parameters to be owned:
pub fn measure_stats(testdatapath: PathBuf, filenameprefix: String) {
Then, those values will be moved into the receiving thread, without any lifetime constraints.
You're trying to make testdatapath live longer than the function. Since it is a borrowed value, and nothing guarantees that the original PathBuf will outlive the closure running in the new thread, the compiler rejects the code instead of assuming it will.
The 3 simpler choices:
Move the PathBuf into the function instead of borrowing it (remove the &).
Use an Arc.
Clone it and move the clone into the thread.
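A sketch of the third choice, keeping the borrowed parameter. The key detail (likely why cloning "didn't help" in the question) is that the clone has to be made before spawn, outside the closure; the helper name spawn_worker is hypothetical.

```rust
use std::path::{Path, PathBuf};
use std::thread;

/// Keeps the borrowed parameter, but hands the spawned thread its own copy.
fn spawn_worker(testdatapath: &Path) -> thread::JoinHandle<PathBuf> {
    // Clone *before* spawning: cloning inside the closure would still make
    // the closure capture the non-'static reference and fail the same way.
    let owned: PathBuf = testdatapath.to_path_buf();
    thread::spawn(move || owned.join("0000.img")) // the thread owns `owned`
}

fn main() {
    let base = PathBuf::from("/tmp/testdata");
    let handle = spawn_worker(&base);
    println!("{:?}", handle.join().unwrap());
}
```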

How to loop over thread handles and join if finished, within another loop?

I have a program that creates threads in a loop, and also checks if they have finished and cleans them up if they have. See below for a minimal example:
use std::thread;
fn main() {
let mut v = Vec::<std::thread::JoinHandle<()>>::new();
for _ in 0..10 {
let jh = thread::spawn(|| {
thread::sleep(std::time::Duration::from_secs(1));
});
v.push(jh);
for jh in v.iter_mut() {
if jh.is_finished() {
jh.join().unwrap();
}
}
}
}
This gives the error:
error[E0507]: cannot move out of `*jh` which is behind a mutable reference
--> src\main.rs:13:17
|
13 | jh.join().unwrap();
| ^^^------
| | |
| | `*jh` moved due to this method call
| move occurs because `*jh` has type `JoinHandle<()>`, which does not implement the `Copy` trait
|
note: this function takes ownership of the receiver `self`, which moves `*jh`
--> D:\rust\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\library\std\src\thread\mod.rs:1461:17
|
1461 | pub fn join(self) -> Result<T> {
How can I get the borrow checker to allow this?
JoinHandle::join actually consumes the JoinHandle.
iter_mut(), however, only borrows the elements of the vector; the vector keeps ownership of them. Therefore your JoinHandles are only borrowed, and you cannot call consuming methods on borrowed objects.
What you need to do is to take the ownership of the elements while iterating over the vector, so they can be then consumed by join(). This is achieved by using into_iter() instead of iter_mut().
The second mistake is that you (probably accidentally) wrote the two for loops inside of each other, while they should be independent loops.
The third problem is a little more complex. You cannot check whether a thread has finished and then join it the way you did, because join() would consume a handle that is still stored in the vector. Therefore I removed the is_finished() check for now and will talk about it again further down.
Here is your fixed code:
use std::thread;
fn main() {
let mut v = Vec::<std::thread::JoinHandle<()>>::new();
for _ in 0..10 {
let jh = thread::spawn(|| {
thread::sleep(std::time::Duration::from_secs(1));
});
v.push(jh);
}
for jh in v.into_iter() {
jh.join().unwrap();
}
}
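If the loop does have other work to do between checks, as in the question, the is_finished() check (stable since Rust 1.61) can be kept by taking a finished handle out of the vector before joining it. A sketch using Vec::swap_remove; the helper name join_finished is hypothetical, and this is still polling, so it is not a substitute for waiting on all threads.

```rust
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Joins every handle whose thread has already finished and removes it from
/// `handles`; still-running threads stay. Returns how many were joined.
fn join_finished(handles: &mut Vec<JoinHandle<()>>) -> usize {
    let mut joined = 0;
    let mut i = 0;
    while i < handles.len() {
        if handles[i].is_finished() {
            // swap_remove hands us ownership, so join() may consume the handle.
            // It moves the last element into slot i, so don't advance i.
            handles.swap_remove(i).join().unwrap();
            joined += 1;
        } else {
            i += 1;
        }
    }
    joined
}

fn main() {
    let mut v: Vec<JoinHandle<()>> = Vec::new();
    for _ in 0..10 {
        v.push(thread::spawn(|| thread::sleep(Duration::from_millis(100))));
        join_finished(&mut v); // non-blocking: only reaps threads that are done
    }
    // Block on whatever is still running.
    for jh in v {
        jh.join().unwrap();
    }
}
```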
Reacting to finished threads
This one is harder. If you just want to wait until all of them are finished, the code above is the way to go.
However, if you have to react to finished threads right away, you basically have to set up some kind of event propagation. You don't want to loop over all threads over and over again until they are all finished, because that is called busy-waiting and consumes a lot of computational power.
So if you want to achieve that there are two problems that have to be dealt with:
join() consumes the JoinHandle, which would leave a hole in the Vec of JoinHandles. That isn't possible, so we need to wrap each JoinHandle in a type whose content can be taken out while the element stays in the vector, like Option (taking leaves a None behind).
we need a way to signal to the main thread that a new child thread is finished, so that the main thread doesn't have to continuously iterate over the threads.
All in all this is very complex and tricky to implement.
Here is my attempt:
use std::{
thread::{self, JoinHandle},
time::Duration,
};
fn main() {
let mut v: Vec<Option<JoinHandle<()>>> = Vec::new();
let (send_finished_thread, receive_finished_thread) = std::sync::mpsc::channel();
for i in 0..10 {
let send_finished_thread = send_finished_thread.clone();
let join_handle = thread::spawn(move || {
println!("Thread {} started.", i);
thread::sleep(Duration::from_millis(2000 - i as u64 * 100));
println!("Thread {} finished.", i);
// Signal that we are finished.
// This will wake up the main thread.
send_finished_thread.send(i).unwrap();
});
v.push(Some(join_handle));
}
loop {
// Check if all threads are finished
let num_left = v.iter().filter(|th| th.is_some()).count();
if num_left == 0 {
break;
}
// Wait until a thread is finished, then join it
let i = receive_finished_thread.recv().unwrap();
let join_handle = std::mem::take(&mut v[i]).unwrap();
println!("Joining {} ...", i);
join_handle.join().unwrap();
println!("{} joined.", i);
}
println!("All joined.");
}
Important
This code is just a demonstration. It will deadlock if one of the threads panics. But this shows how complicated that problem is.
It could be solved by utilizing a drop guard, but I think this answer is convoluted enough ;)
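For the curious, a minimal sketch of that drop-guard idea: the guard sends its thread's index from Drop, which runs on a normal return and also while a panic unwinds, so recv() is always woken up. FinishedGuard and run_all are hypothetical names, and this relies on unwinding, so it does not help with panic = "abort".

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Reports its thread's index when dropped, whether the thread returned
/// normally or is unwinding from a panic.
struct FinishedGuard {
    index: usize,
    tx: Sender<usize>,
}

impl Drop for FinishedGuard {
    fn drop(&mut self) {
        let _ = self.tx.send(self.index);
    }
}

/// Spawns `n` threads (the one numbered `panicking` panics) and joins each
/// as soon as it reports in. Returns whether each thread finished cleanly.
fn run_all(n: usize, panicking: usize) -> Vec<bool> {
    let (tx, rx) = mpsc::channel();
    let mut handles: Vec<Option<JoinHandle<()>>> = Vec::new();
    for i in 0..n {
        let guard = FinishedGuard { index: i, tx: tx.clone() };
        handles.push(Some(thread::spawn(move || {
            let _guard = guard; // dropped when the closure returns or unwinds
            if i == panicking {
                panic!("thread {i} failed");
            }
        })));
    }
    let mut ok = vec![false; n];
    for _ in 0..n {
        let i = rx.recv().expect("every guard sends exactly once");
        let handle = handles[i].take().expect("index reported twice");
        ok[i] = handle.join().is_ok(); // Err if that thread panicked
    }
    ok
}

fn main() {
    println!("{:?}", run_all(5, 2));
}
```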

How can I move the data between threads safely?

I'm currently trying to call a function to which I pass multiple file names and expect the function to read the files and generate the appropriate structs and return them in a Vec<Audit>. I've been able to accomplish it reading the files one by one but I want to achieve it using threads.
This is the function:
fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
let mut audits = Arc::new(Mutex::new(vec![]));
let mut handlers = vec![];
for file in files {
let audits = Arc::clone(&audits);
handlers.push(thread::spawn(move || {
let mut audits = audits.lock().unwrap();
audits.push(audit_from_xml_file(file.clone()));
audits
}));
}
for handle in handlers {
let _ = handle.join();
}
audits
.lock()
.unwrap()
.into_iter()
.fold(vec![], |mut result, audit| {
result.push(audit);
result
})
}
But it won't compile due to the following error:
error[E0277]: `MutexGuard<'_, Vec<Audit>>` cannot be sent between threads safely
--> src/main.rs:82:23
|
82 | handlers.push(thread::spawn(move || {
| ^^^^^^^^^^^^^ `MutexGuard<'_, Vec<Audit>>` cannot be sent between threads safely
|
::: /home/enthys/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:618:8
I have tried wrapping the generated Audit structs in Some(Audit) to avoid the MutexGuard, but then I ran into poisoning issues.
The cause of the error is that, after pushing the new Audit into the (locked) audits vec, you then try to return the vec's MutexGuard.
In Rust, a thread's function can return a value; the point of doing that is to send the value back to whoever joins the thread. This means the value is going to move between threads, so it needs to be movable between threads (aka Send), which mutex guards have no reason to be[0].
The easy solution is to just... not do that: delete the last line of the spawned closure. The code still won't compile after that, though: the chain at the end calls into_iter() on the MutexGuard, which tries to move the Vec<Audit> out of it, and a guard only lends access to what it protects.
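A sketch that keeps the Arc<Mutex<Vec<_>>> design but fixes both problems: the closure returns (), and the Vec is taken out with Arc::try_unwrap and Mutex::into_inner once every thread has been joined. Audit and audit_from_xml_file are hypothetical stand-ins for the question's types. Note that the audits end up in completion order, not file order.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-ins for the question's Audit type and parser.
#[derive(Debug, PartialEq)]
struct Audit {
    file: String,
}

fn audit_from_xml_file(file: String) -> Audit {
    Audit { file }
}

fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
    let audits = Arc::new(Mutex::new(Vec::new()));
    let mut handlers = Vec::new();
    for file in files {
        let audits = Arc::clone(&audits);
        handlers.push(thread::spawn(move || {
            // Parse first, then lock only for the push. The guard is a
            // temporary dropped at the end of the statement, and the
            // closure returns (), which is Send.
            let audit = audit_from_xml_file(file);
            audits.lock().unwrap().push(audit);
        }));
    }
    for handle in handlers {
        handle.join().unwrap();
    }
    // Every thread has finished and dropped its Arc clone, so this is the
    // last one: take the Mutex out of the Arc, then the Vec out of the Mutex.
    Arc::try_unwrap(audits)
        .expect("no other Arc clones remain")
        .into_inner()
        .unwrap()
}

fn main() {
    let files = vec!["a.xml".to_string(), "b.xml".to_string()];
    println!("{:?}", generate_audits_from_files(files));
}
```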
An alternative is to lean into the feature (especially if Audit objects are not too big): drop the audits vec entirely and instead have each thread return its audit, then collect from the handlers when you join them:
pub fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
let mut handlers = vec![];
for file in files {
handlers.push(thread::spawn(move || {
audit_from_xml_file(file)
}));
}
handlers.into_iter()
.map(|handler| handler.join().unwrap())
.collect()
}
Though at that point you might as well just let Rayon handle it:
use rayon::prelude::*;
pub fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
files.into_par_iter().map(audit_from_xml_file).collect()
}
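Because Rayon runs the work on a fixed-size thread pool instead of spawning one OS thread per file, that also avoids crashing the program or bringing the machine to its knees if you happen to have millions of files.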
[0] And there are good reasons for them not to be: locking on one thread and unlocking on another is not necessarily supported, e.g. ReleaseMutex:
The ReleaseMutex function fails if the calling thread does not own the mutex object.
(NB: in Windows lingo, "owning" a mutex means having acquired it via WaitForSingleObject, which translates to locking in POSIX lingo)
and it can be plain UB, e.g. pthread_mutex_unlock:
If a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked, undefined behavior results.
Your problem is that you are passing your Vec<Audit> (or more precisely the MutexGuard<Vec<Audit>>) to the threads and back again, without really needing it.
And you don't need Mutex or Arc for this simpler task:
fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
let mut handlers = vec![];
for file in files {
handlers.push(thread::spawn(move || {
audit_from_xml_file(file)
}));
}
handlers
.into_iter()
.flat_map(|x| x.join())
.collect()
}

Lifetime of variables passed to a new thread

I have trouble compiling this program:
use std::env;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;
fn main() {
let args: Vec<_> = env::args().skip(1).collect();
let (tx, rx) = mpsc::channel();
for arg in &args {
let t = tx.clone();
thread::spawn(move || {
thread::sleep(Duration::from_millis(50));
let _new_arg = arg.to_string() + "foo";
t.send(arg);
});
}
for _ in &args {
println!("{}", rx.recv().unwrap());
}
}
I read all arguments from the command line and emulate doing some work on each argument in the thread. Then I print out the results of this work, which I do using a channel.
error[E0597]: `args` does not live long enough
--> src/main.rs:11:17
|
11 | for arg in &args {
| ^^^^ does not live long enough
...
24 | }
| - borrowed value only lives until here
|
= note: borrowed value must be valid for the static lifetime...
If I understood correctly, the lifetime of args must be static (i.e. the entire time of program execution), while it only lives within the scope of the main function (?). I don't understand the reason behind this, or how I could fix it.
The problem lies in spawning a background thread. When you call thread::spawn you effectively have to pass ownership of any resource used in it to the thread, as it might run indefinitely, which means that its lifetime must be 'static.
There are two options to resolve that: the simplest one would be to pass ownership. Your code here
let new_arg = arg.to_string() + "foo";
t.send(arg);
looks like you actually wanted to send new_arg, in which case you could just create the owned result of arg.to_string() before spawning the thread, thus eliminating the need to pass the reference arg.
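A sketch of that fix: each thread gets an owned String created before spawn, and sends new_arg rather than the borrowed arg. The function name process_args is hypothetical; results arrive in completion order.

```rust
use std::env;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Appends "foo" to every argument on its own thread. Each thread owns its
/// input String, so nothing it holds borrows `args`.
fn process_args(args: &[String]) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    for arg in args {
        let t = tx.clone();
        let owned = arg.to_string(); // owned copy, created before spawning
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(50)); // simulated work
            let new_arg = owned + "foo";
            t.send(new_arg).unwrap();
        });
    }
    // One recv() per spawned thread, as in the question; order is arrival order.
    args.iter().map(|_| rx.recv().unwrap()).collect()
}

fn main() {
    let args: Vec<String> = env::args().skip(1).collect();
    for result in process_args(&args) {
        println!("{}", result);
    }
}
```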
Another, slightly more involved idea that might be useful at some point is scoped threads, as implemented in crossbeam for example. These are bound to an explicit scope: you spawn them inside it, and they are all joined at the end. This looks somewhat like this:
crossbeam::scope(|scope| {
scope.spawn(|_| {
println!("Hello from a scoped thread!");
});
})
.unwrap(); // scope() returns Err if a scoped thread panicked
Have a look at the docs for further details.
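Since Rust 1.63 the standard library offers the same idea as std::thread::scope. A sketch of the question's program with it, where every thread borrows its &String straight from args (the function name process_args_scoped is hypothetical):

```rust
use std::sync::mpsc;
use std::thread;

/// The threads borrow each &String from `args`; this compiles because
/// thread::scope joins all of them before it returns.
fn process_args_scoped(args: &[String]) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    thread::scope(|s| {
        for arg in args {
            let t = tx.clone();
            s.spawn(move || t.send(arg.to_string() + "foo").unwrap());
        }
    });
    drop(tx); // last sender gone: the iterator below stops once the queue is empty
    rx.into_iter().collect()
}

fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    for result in process_args_scoped(&args) {
        println!("{}", result);
    }
}
```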

Cannot move data out of a Mutex

Consider the following code example: I have a vector of JoinHandles that I need to iterate over to join them back to the main thread; however, upon doing so I get the error "cannot move out of borrowed content".
let threads = Arc::new(Mutex::new(Vec::new()));
for _x in 0..100 {
let handle = thread::spawn(move || {
//do some work
});
threads.lock().unwrap().push(handle);
}
for t in threads.lock().unwrap().iter() {
t.join();
}
Unfortunately, you can't do this directly. Once the Mutex owns the data structure you fed to it, locking it only gives you a MutexGuard, which acts like a &mut reference and won't allow moving out of it. So even into_iter() won't work: it needs a self argument, which it can't get from a MutexGuard.
There is a workaround, however. You can use Arc<Mutex<Option<Vec<_>>>> instead of Arc<Mutex<Vec<_>>> and then just take() the value out of the mutex:
for t in threads.lock().unwrap().take().unwrap().into_iter() {
t.join().unwrap();
}
Then into_iter() will work just fine as the value is moved into the calling thread.
Of course, you will need to construct the vector and push to it appropriately:
let threads = Arc::new(Mutex::new(Some(Vec::new())));
...
threads.lock().unwrap().as_mut().unwrap().push(handle);
However, the best way is to just drop the Arc<Mutex<..>> layer altogether (of course, if this value is not used from other threads).
As referenced in How to take ownership of T from Arc<Mutex<T>>?, this is now possible without any trickery, using Arc::try_unwrap and Mutex::into_inner():
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
fn main() {
let threads = Arc::new(Mutex::new(Vec::new()));
for x in 0..100 {
let handle = thread::spawn(move || {
println!("{}", x);
});
threads.lock().unwrap().push(handle);
}
// Only one Arc exists, so try_unwrap succeeds; into_inner then hands back the Vec.
let threads_unwrapped: Vec<JoinHandle<_>> = Arc::try_unwrap(threads).unwrap().into_inner().unwrap();
for t in threads_unwrapped.into_iter() {
t.join().unwrap();
}
}
Play around with it in this playground to verify.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9d5635e7f778bc744d1fb855b92db178
Draining the vector through the guard is a good solution, and you can also simply clone it:
// with a copy
let built_words: Arc<Mutex<Vec<String>>> = Arc::new(Mutex::new(vec![]));
let result: Vec<String> = built_words.lock().unwrap().clone();
// using drain
let mut locked_result = built_words.lock().unwrap();
let mut result: Vec<String> = vec![];
result.extend(locked_result.drain(..));
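I would prefer cloning the data, which leaves the original vector untouched. It does have overhead, though: every String is copied, whereas drain moves the elements out and leaves the original vector empty.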
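Cloning is not an option for the JoinHandle question above (JoinHandle is not Clone), but moving out through the guard is. A sketch using std::mem::take (a swapped-in technique, equivalent in effect to drain(..)), which also works while other Arc clones are still alive, unlike Arc::try_unwrap. The helper name join_all is hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Joins all handles stored behind the mutex. mem::take swaps in an empty
/// Vec and moves the handles out through the guard; the guard is a
/// temporary, so the lock is released at the end of that statement.
fn join_all(threads: &Arc<Mutex<Vec<JoinHandle<()>>>>) -> usize {
    let handles = std::mem::take(&mut *threads.lock().unwrap());
    let count = handles.len();
    for t in handles {
        t.join().unwrap();
    }
    count
}

fn main() {
    let threads = Arc::new(Mutex::new(Vec::new()));
    for x in 0..10 {
        let handle = thread::spawn(move || println!("{}", x));
        threads.lock().unwrap().push(handle);
    }
    let _also_shared = Arc::clone(&threads); // try_unwrap would fail now
    println!("joined {}", join_all(&threads));
}
```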

Resources