How can I cause a panic on a thread to immediately end the main thread? - rust

In Rust, a panic terminates the current thread but is not propagated to the main thread. The usual advice is to use join, but join blocks the calling thread. So if my main thread spawns 2 threads, I cannot join both of them and immediately get a panic back.
use std::{thread, time::Duration};
let jh1 = thread::spawn(|| { println!("thread 1"); thread::sleep(Duration::from_secs(1_000_000)); });
let jh2 = thread::spawn(|| { panic!("thread 2") });
In the above, if I join on thread 1 and then on thread 2, I will be waiting for thread 1 before ever receiving a panic from either thread.
Although in some cases I desire the current behavior, my goal is to default to Go's behavior where I can spawn a thread and have it panic on that thread and then immediately end the main thread. (The Go specification also documents a protect function, so it is easy to achieve Rust behavior in Go).

Updated for Rust 1.10+, see revision history for the previous version of the answer
Good point, in Go the main thread doesn't get unwound, the program just crashes, but the original panic is reported. This is in fact the behavior I want (although ideally resources would get cleaned up properly everywhere).
This you can achieve with the recently stable std::panic::set_hook() function. With it, you can set a hook which prints the panic info and then exits the whole process, something like this:
use std::thread;
use std::panic;
use std::process;
fn main() {
// take_hook() returns the default hook in case when a custom one is not set
let orig_hook = panic::take_hook();
panic::set_hook(Box::new(move |panic_info| {
// invoke the default handler and exit the process
orig_hook(panic_info);
process::exit(1);
}));
thread::spawn(move || {
panic!("something bad happened");
}).join();
// this line won't ever be invoked because of process::exit()
println!("Won't be printed");
}
Try commenting the set_hook() call out, and you'll see that the println!() line gets executed.
However, because it uses process::exit(), this approach will not allow resources allocated by other threads to be freed. In fact, I'm not sure that the Go runtime allows this either; it likely uses the same approach of aborting the process.

I tried to force my code to stop processing when any of the threads panicked. The only more-or-less clear solution without using unstable features was to use the Drop trait implemented on some struct. This can lead to a resource leak, but in my scenario I'm OK with this.
use std::process;
use std::thread;
use std::time::Duration;
static THREAD_ERROR_CODE: i32 = 0x1;
static NUM_THREADS: u32 = 17;
static PROBE_SLEEP_MILLIS: u64 = 500;
struct PoisonPill;
impl Drop for PoisonPill {
fn drop(&mut self) {
if thread::panicking() {
println!("dropped while unwinding");
process::exit(THREAD_ERROR_CODE);
}
}
}
fn main() {
let mut thread_handles = vec![];
for i in 0..NUM_THREADS {
thread_handles.push(thread::spawn(move || {
let b = PoisonPill;
thread::sleep(Duration::from_millis(PROBE_SLEEP_MILLIS));
if i % 2 == 0 {
println!("kill {}", i);
panic!();
}
println!("this is thread number {}", i);
}));
}
for handle in thread_handles {
let _ = handle.join();
}
}
No matter how b = PoisonPill leaves its scope, normally or after panic!, its Drop method kicks in. You can distinguish whether the thread is panicking using thread::panicking and take some action, in my case killing the process.
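If exiting the process from a destructor is too abrupt, the same Drop guard idea can be turned into a notification instead. The following is only a sketch under assumptions: PanicNotifier and first_panicked are hypothetical names, and a std::sync::mpsc channel is used so that the main thread blocks on recv() rather than on any particular join(). That way it sees the first panic regardless of the order in which the threads were spawned.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;
use std::time::Duration;

/// Hypothetical guard: reports its thread's id to the main thread
/// if it is dropped while the thread is unwinding from a panic.
struct PanicNotifier {
    id: u32,
    tx: Sender<u32>,
}

impl Drop for PanicNotifier {
    fn drop(&mut self) {
        if thread::panicking() {
            // Ignore send errors: the receiver may already be gone.
            let _ = self.tx.send(self.id);
        }
    }
}

/// Spawns `n` workers (worker 1 panics on purpose) and returns the id of
/// the first worker that panicked, or None if every worker finished normally.
fn first_panicked(n: u32) -> Option<u32> {
    let (tx, rx) = mpsc::channel();
    for id in 0..n {
        let tx = tx.clone();
        thread::spawn(move || {
            let _guard = PanicNotifier { id, tx };
            if id == 1 {
                panic!("worker {} failed", id);
            }
            thread::sleep(Duration::from_millis(50));
        });
    }
    // Drop the original sender so recv() errors once all workers are done.
    drop(tx);
    rx.recv().ok()
}

fn main() {
    match first_panicked(3) {
        Some(id) => eprintln!("worker {} panicked, shutting down", id),
        None => println!("all workers finished"),
    }
}
```

Because main returns normally here, destructors on the main thread still run; the other worker threads are simply abandoned when the process ends.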

It looks like exiting the whole process on a panic in any thread is now (Rust 1.62) as simple as adding this to your Cargo.toml:
[profile.release]
panic = 'abort'
[profile.dev]
panic = 'abort'
A panic in a thread then looks like this, with exit code 134:
thread '<unnamed>' panicked at 'panic in thread', src/main.rs:5:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Aborted (core dumped)

Related

Why is fs::read_dir() thread safe on POSIX platforms

Some Background
Originally, Rust switched from readdir(3) to readdir_r(3) for thread safety. But readdir_r(3) has some problems, so they changed it back:
Linux and Android: fs: Use readdir() instead of readdir_r() on Linux and Android
Fuchsia: Switch Fuchsia to readdir (instead of readdir_r)
...
So, in the current implementation, they use readdir(3) on most POSIX platforms:
#[cfg(any(
target_os = "android",
target_os = "linux",
target_os = "solaris",
target_os = "fuchsia",
target_os = "redox",
target_os = "illumos"
))]
fn next(&mut self) -> Option<io::Result<DirEntry>> {
unsafe {
loop {
// As of POSIX.1-2017, readdir() is not required to be thread safe; only
// readdir_r() is. However, readdir_r() cannot correctly handle platforms
// with unlimited or variable NAME_MAX. Many modern platforms guarantee
// thread safety for readdir() as long an individual DIR* is not accessed
// concurrently, which is sufficient for Rust.
super::os::set_errno(0);
let entry_ptr = readdir64(self.inner.dirp.0);
Thread issue of readdir(3)
The problem with readdir(3) is that its return value (struct dirent *) points into the internal buffer of the directory stream (DIR), so it can be overwritten by subsequent readdir(3) calls. So if we share one DIR stream among multiple threads and all of them call readdir(3), we have a race condition.
If we want to safely handle this, an external synchronization is needed.
My question
So I am curious about what Rust does to avoid such issues. It seems that they just call readdir(3), memcpy the return value into a caller-allocated buffer, and then return. But this function is not marked as unsafe, which confuses me.
So my question is why is it safe to call fs::read_dir() in multi-threaded programs?
There is a comment stating that it is safe to use it in Rust without extra external synchronization, but I didn't get it...
It requires external synchronization if a particular directory stream may be shared among threads, but I believe we avoid that naturally from the lack of &mut aliasing. Dir is Sync, but only ReadDir accesses it, and only from its mutable Iterator implementation.
OP's edit after 3 months
At the time of writing this question, I was not familiar with multi-threaded programming in Rust. After refining my skill, taking another look at this post makes me realize that it is pretty easy to verify this question:
// With scoped threads
// Does not compile since we can not mutably borrow pwd more than once
use std::{
fs::read_dir,
thread::{scope, spawn},
};
fn main() {
let mut pwd = read_dir(".").unwrap();
scope(|s| {
for _ in 1..10 {
s.spawn(|| {
let entry = pwd.next().unwrap().unwrap();
println!("{:?}", entry.file_name());
});
}
})
}
// Use interior mutability to share it with multiple threads
// This code does compile because synchronization is applied (RwLock)
use std::{
fs::read_dir,
sync::{Arc, RwLock},
thread::spawn,
};
fn main() {
let pwd = Arc::new(RwLock::new(read_dir(".").unwrap()));
for _ in 1..10 {
spawn({
let pwd = Arc::clone(&pwd);
move || {
let entry = pwd.write().unwrap().next().unwrap().unwrap();
println!("{:?}", entry.file_name());
}
}).join().unwrap();
}
}
readdir is not safe when called from multiple threads with the same DIR* dirp parameter (i.e. with the same self.inner.dirp.0 in the Rust case) but it may be called safely with different dirps. Since calling ReadDir::next requires a &mut self, it is guaranteed that nobody else can call it from another thread at the same time on the same ReadDir instance, and so it is safe.
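To illustrate the &mut self argument, here is a minimal sketch (not taken from the Rust standard library sources) in which several scoped threads pull entries from one shared ReadDir. The function name count_entries_shared is hypothetical. The Mutex hands the &mut ReadDir to one thread at a time, which is exactly the external synchronization that readdir(3) needs for a shared DIR*.

```rust
use std::fs::{self, ReadDir};
use std::path::Path;
use std::sync::Mutex;
use std::thread;

/// Counts the entries of `dir` by letting `workers` scoped threads pull
/// from a single shared `ReadDir` iterator.
fn count_entries_shared(dir: &Path, workers: usize) -> std::io::Result<usize> {
    let iter: Mutex<ReadDir> = Mutex::new(fs::read_dir(dir)?);
    let total = thread::scope(|s| {
        let mut handles = Vec::with_capacity(workers);
        for _ in 0..workers {
            handles.push(s.spawn(|| {
                let mut seen = 0usize;
                loop {
                    // The lock is held only for this single `next()` call.
                    let next = iter.lock().unwrap().next();
                    match next {
                        Some(Ok(_entry)) => seen += 1,
                        // Stop on a read error or at the end of the stream.
                        Some(Err(_)) | None => break,
                    }
                }
                seen
            }));
        }
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    });
    Ok(total)
}

fn main() -> std::io::Result<()> {
    let n = count_entries_shared(Path::new("."), 4)?;
    println!("{} entries", n);
    Ok(())
}
```

Without the Mutex, the closures would each need a mutable borrow of the iterator and the compiler would reject the program, which is the guarantee the answer above relies on.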

Rust deadlock with shared struct: Arc + channel + atomic

I'm new to Rust and was trying to generate plenty of JSON data on the fly for a project, but I'm having deadlocks.
I've tried removing the serialization (serde_json) and sending the HashMaps through the channel instead, but I still get deadlocks on my computer. However, if I comment out the send(generator.next()) line and send a string myself, the code works flawlessly. So the deadlock is caused by my DatasetGenerator, but I don't understand why.
Code summary:
Have a DatasetGenerator object that can generate sequences of "events" and serialize them to JSON.
generator.next() works like an "iterator" - It increments an internal atomic counter in the generator and then generates the i-th item in the sequence + serializes the JSON.
Have a generator threadpool generate these JSONs at high throughput (very large payloads each)
Send these JSONs through a channel to other thread (which will send them through network but irrelevant for this question)
Depending on whether I use tx_ref.send(generator_ref.next()) or tx_ref.send(some_new_string) below, my code deadlocks or succeeds:
src/main.rs:
extern crate threads_pool;
use threads_pool::*;
mod generator;
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
fn main() {
// N will be an argument, and a very high number. For tests use this:
const N: i64 = 12; // Increase this if you're not getting the deadlock yet, or run cargo run again until it happens.
let (tx, rx) = mpsc::channel();
let tx_producer = tx.clone();
let producer_thread = thread::spawn(move || {
let pool = ThreadPool::new(4);
let generator = Arc::new(generator::data_generator::DatasetGenerator::new(3000));
for i in 0..N {
println!("Generating #{}", i);
let tx_ref = tx_producer.clone();
let generator_ref = generator.clone();
pool.execute(move || {
////////// v !!!DEADLOCK HERE!!! v //////////
tx_ref.send(generator_ref.next()).expect("tx failed."); // This locks!
//tx_ref.send(format!(" {} ", i)).expect("tx failed."); // This works!
////////// ^ !!!DEADLOCK HERE!!! ^ //////////
})
.unwrap();
}
println!("Generator done!");
});
println!("-» Consumer consuming!");
for j in 0..N {
let s = rx.recv().expect("rx failed");
println!("-» Consumed #{}: {} ... ", j, &s[..10]);
}
println!("Consumer done!!");
producer_thread.join().unwrap();
println!("Success. Exit!");
}
This is my DatasetGenerator which seems to be causing all the trouble (as not using serde but outputting the HashMaps still gives deadlocks). src/generator/dataset_generator.rs:
use serde_json::Value;
use std::collections::HashMap;
use std::sync::atomic;
pub struct DatasetGenerator {
num_features: usize,
pub counter: atomic::AtomicI64,
feature_names: Vec<String>,
}
type Datapoint = HashMap<String, Value>;
type Out = String;
impl DatasetGenerator {
pub fn new(num_features: usize) -> DatasetGenerator {
let mut feature_names = Vec::new();
for i in 0..num_features {
feature_names.push(format!("f_{}", i));
}
DatasetGenerator {
num_features,
counter: atomic::AtomicI64::new(0),
feature_names,
}
}
/// Generates the next item in the sequence (iterator-like).
pub fn next(&self) -> Out {
let value = self.counter.fetch_add(1, atomic::Ordering::SeqCst);
self.gen(value)
}
/// Generates the ith item in the sequence. DEADLOCKS!!! ///////////////////////////
pub fn gen(&self, ith: i64) -> Out {
let mut data = Datapoint::with_capacity(self.num_features);
for f in 0..self.num_features {
let name = self.feature_names.get(f).unwrap();
data.insert(name.to_string(), Value::from(ith));
}
serde_json::json!(data).to_string() // Tried without serialization and still deadlocks!
}
}
Commit with deadlock code is here if you want to try out yourself with cargo run: https://github.com/AlbertoEAF/learn-rust/tree/dc5fa867e5a70b605553ef65796fdc9dd42d38a0/rest-injector
The deadlock occurs on Windows with Rust 1.60.0.
Thank you for the help! it's greatly appreciated :)
Update
I've followed the suggestions from @kmdreko's answer below, and apparently the problem is in the generator: not all the items are generated. Even though pool.execute() is called N times, only a random number c < N of closures are executed, even if I place pool.close() before leaving the producer_thread. Why does that happen, and how can it be fixed?
Fix: Turns out this lockup is caused by the threads_pool library (0.2.6). I switched the thread pool to rayon's and it worked smoothly at the first try.
One thing you should change: an mpsc::Receiver will return an error on .recv() if it cannot possibly yield a result by realizing that all the associated mpsc::Senders have dropped, which is a good indicator that all the work is done. Your tx_refs and even tx_producer will be dropped when their respective tasks/threads complete, however you still have tx in scope that can theoretically give a value. This is what gives you the apparent deadlock. You should simply remove tx_producer and use tx directly so it is moved into the producer thread and dropped accordingly.
Now, you'll see either all N tasks complete, or you'll get an error indicating that some tasks did not complete. The reason not all tasks are completing is because you're creating the thread pool, spawning all the tasks, and then immediately destroying it. The threads_pool documentation says that the threads will finish their current job when the pool is destroyed, but you want to wait until all jobs have completed. For that you need to call the .close() method provided by the PoolManager trait before the end of the closure.
The reason you saw inconsistent behavior, and why sending a string directly appeared to work, is that those jobs required less work, so the threads could complete all of them before they saw their signal to exit. Your generator_ref.next() requires much more computation, so it's not surprising they'd only process 4-plus-a-bit jobs before they see they've been told to exit.
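The point about dropped senders can be shown with the standard library alone. This is a rough sketch, not the asker's code: collect_all is a hypothetical name, and plain std threads stand in for the thread pool. The original tx is moved into the producer thread, so once the producer and all its workers finish, no Sender remains and the receiving loop ends on its own instead of hanging.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawns `n` workers that each send one message, then collects messages
/// until the channel closes (that is, until every Sender has been dropped).
fn collect_all(n: usize) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || {
        // `tx` is moved in here, so no sender outlives this thread.
        let workers: Vec<_> = (0..n)
            .map(|i| {
                let tx = tx.clone();
                thread::spawn(move || tx.send(format!("item {}", i)).expect("receiver gone"))
            })
            .collect();
        // Wait for every job before the producer (and its `tx`) goes away.
        for w in workers {
            w.join().unwrap();
        }
    });
    // `iter()` stops once all senders are gone, so this cannot hang forever.
    let mut items: Vec<String> = rx.iter().collect();
    producer.join().unwrap();
    items.sort();
    items
}

fn main() {
    for item in collect_all(4) {
        println!("{}", item);
    }
}
```

If a clone of tx were kept alive in the consuming thread, rx.iter() would never terminate, which matches the apparent deadlock described in the question.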

How can I move the data between threads safely?

I'm currently trying to call a function to which I pass multiple file names and expect the function to read the files and generate the appropriate structs and return them in a Vec<Audit>. I've been able to accomplish it reading the files one by one but I want to achieve it using threads.
This is the function:
fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
let mut audits = Arc::new(Mutex::new(vec![]));
let mut handlers = vec![];
for file in files {
let audits = Arc::clone(&audits);
handlers.push(thread::spawn(move || {
let mut audits = audits.lock().unwrap();
audits.push(audit_from_xml_file(file.clone()));
audits
}));
}
for handle in handlers {
let _ = handle.join();
}
audits
.lock()
.unwrap()
.into_iter()
.fold(vec![], |mut result, audit| {
result.push(audit);
result
})
}
But it won't compile due to the following error:
error[E0277]: `MutexGuard<'_, Vec<Audit>>` cannot be sent between threads safely
--> src/main.rs:82:23
|
82 | handlers.push(thread::spawn(move || {
| ^^^^^^^^^^^^^ `MutexGuard<'_, Vec<Audit>>` cannot be sent between threads safely
|
::: /home/enthys/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:618:8
I have tried wrapping the generated Audit structs in Some(Audit) to avoid the MutexGuard, but then I run into poisoned mutex issues.
The cause of the error is that after pushing the new Audit into the (locked) audits vec, you then try to return the vec's MutexGuard.
In Rust, a thread's function can actually return a value; the point of doing that is to send the value back to whoever is joining the thread. This means the value is going to move between threads, so it needs to be movable between threads (aka Send), which mutex guards have no reason to be[0].
The easy solution is to just... not do that. Just delete the last line of the spawn function. That alone won't make the code work, though, as you still have a borrowing issue in the code at the end.
An alternative is to lean into the feature (especially if Audit objects are not too big): drop the audits vec entirely and instead have each thread return its audit, then collect from the handlers when you join them:
pub fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
let mut handlers = vec![];
for file in files {
handlers.push(thread::spawn(move || {
audit_from_xml_file(file)
}));
}
handlers.into_iter()
.map(|handler| handler.join().unwrap())
.collect()
}
Though at that point you might as well just let Rayon handle it:
use rayon::prelude::*;
pub fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
files.into_par_iter().map(audit_from_xml_file).collect()
}
That also avoids crashing the program or bringing the machine to its knees if you happen to have millions of files.
[0] and all the reasons not to be: locking on one thread and unlocking on another is not necessarily supported, e.g. ReleaseMutex
The ReleaseMutex function fails if the calling thread does not own the mutex object.
(NB: in the windows lingo, "owning" a mutex means having acquired it via WaitForSingleObject, which translates to lock in posix lingo)
and can be plain UB e.g. pthread_mutex_unlock
If a thread attempts to unlock a mutex that it has not locked or a mutex which is unlocked, undefined behavior results.
Your problem is that you are passing your Vec<Audit> (or more precisely the MutexGuard<Vec<Audit>>), to the threads and back again, without really needing it.
And you don't need Mutex or Arc for this simpler task:
fn generate_audits_from_files(files: Vec<String>) -> Vec<Audit> {
let mut handlers = vec![];
for file in files {
handlers.push(thread::spawn(move || {
audit_from_xml_file(file)
}));
}
handlers
.into_iter()
.flat_map(|x| x.join())
.collect()
}
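One thing to be aware of: flat_map over the Result returned by join() silently skips any thread that panicked. If that matters, a variant along the following lines reports failures instead. This is only a sketch; the Audit struct and audit_from_xml_file below are hypothetical stand-ins for the types in the question, and the error type (a count of panicked threads) is an arbitrary choice for illustration.

```rust
use std::thread;

// Hypothetical stand-in for the Audit type in the question.
#[derive(Debug, PartialEq)]
struct Audit {
    source: String,
}

// Hypothetical stand-in parser: panics on an empty file name.
fn audit_from_xml_file(file: String) -> Audit {
    if file.is_empty() {
        panic!("empty file name");
    }
    Audit { source: file }
}

/// Returns all audits in input order, or Err with the number of worker
/// threads that panicked.
fn generate_audits_from_files(files: Vec<String>) -> Result<Vec<Audit>, usize> {
    let handlers: Vec<_> = files
        .into_iter()
        .map(|file| thread::spawn(move || audit_from_xml_file(file)))
        .collect();

    let mut audits = Vec::new();
    let mut failed = 0;
    for handler in handlers {
        match handler.join() {
            Ok(audit) => audits.push(audit),
            Err(_) => failed += 1, // the thread panicked
        }
    }
    if failed == 0 { Ok(audits) } else { Err(failed) }
}

fn main() {
    let files = vec!["a.xml".to_string(), "b.xml".to_string()];
    println!("{:?}", generate_audits_from_files(files));
}
```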

Improve Rust's Future to do not create separate thread

I have written a simple future based on this tutorial which looks like this:
extern crate chrono; // 0.4.6
extern crate futures; // 0.1.25
use std::{io, thread};
use chrono::{DateTime, Duration, Utc};
use futures::{Async, Future, Poll, task};
pub struct WaitInAnotherThread {
end_time: DateTime<Utc>,
running: bool,
}
impl WaitInAnotherThread {
pub fn new(how_long: Duration) -> WaitInAnotherThread {
WaitInAnotherThread {
end_time: Utc::now() + how_long,
running: false,
}
}
pub fn run(&mut self, task: task::Task) {
let lend = self.end_time;
thread::spawn(move || {
while Utc::now() < lend {
let delta_sec = lend.timestamp() - Utc::now().timestamp();
if delta_sec > 0 {
thread::sleep(::std::time::Duration::from_secs(delta_sec as u64));
}
task.notify();
}
println!("the time has come == {:?}!", lend);
});
}
}
impl Future for WaitInAnotherThread {
type Item = ();
type Error = Box<io::Error>;
fn poll(&mut self) -> Poll<Self::Item, Self::Error> {
if Utc::now() < self.end_time {
println!("not ready yet! parking the task.");
if !self.running {
println!("side thread not running! starting now!");
self.run(task::current());
self.running = true;
}
Ok(Async::NotReady)
} else {
println!("ready! the task will complete.");
Ok(Async::Ready(()))
}
}
}
So the question is: how do I replace pub fn run(&mut self, task: task::Task) with something that does not create a new thread for the future to resolve? It would be useful if someone could rewrite my code with the run function replaced by a version without a separate thread; that would help me understand how things should be done. I also know that tokio has a timeout implementation, but I need this code for learning.
I think I understand what you mean.
Let's say you have two tasks, Main and Worker1. In this case you are polling Worker1 to wait for an answer, but there is a better way: wait for the completion of Worker1. This can be done without any Future: you simply call the Worker1 function from Main, and when the worker is done, Main goes on. You need no future, you are simply calling a function, and the division between Main and Worker1 is just an over-complication.
Now, I think your question becomes relevant the moment you add at least one more worker. Let's add Worker2, and say you want Main to resume the computation as soon as one of the two tasks completes, and you don't want those tasks to be executed in another thread/process, maybe because you are using asynchronous calls (which simply means the threading is done somewhere else, or you are low-level enough that you receive hardware interrupts).
Since Worker1 and Worker2 have to share the same thread, you need a way to save the current execution context of Main, create one for one of the workers, and after a certain amount of work, time, or some other event (a scheduler), switch to the other worker, and so on. This is a multitasking system, and there are various software implementations of it in Rust. With hardware support, though, you can do things that software alone cannot (like having the hardware prevent one task from accessing the resources of another), plus you can have the CPU take care of the task switching and all... Well, this is what threads and processes are.
Futures are not what you are looking for; they are higher level, and you can find some software schedulers that support them.
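For learning purposes, a thread-free timer future can still be sketched. Note the assumptions: this uses the std::future API rather than the futures 0.1 API from the question, the names WaitWithoutThread, NoopWaker and block_on are hypothetical, and the executor is a toy. With no timer thread or reactor to wake the task later, the future asks to be polled again immediately, so the executor effectively busy-polls until the deadline.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

/// Completes once `deadline` has passed; never spawns a thread.
struct WaitWithoutThread {
    deadline: Instant,
}

impl Future for WaitWithoutThread {
    type Output = ();

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if Instant::now() >= self.deadline {
            Poll::Ready(())
        } else {
            // Nothing else will wake us later, so request another poll now.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Waker that does nothing; the toy executor below re-polls unconditionally.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy single-threaded executor: polls the future in a loop until it is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        // Give other threads a chance to run between polls.
        std::thread::yield_now();
    }
}

fn main() {
    let start = Instant::now();
    block_on(WaitWithoutThread { deadline: start + Duration::from_millis(100) });
    println!("waited {:?}", start.elapsed());
}
```

The trade-off is CPU usage: a real runtime avoids the busy loop by parking the task and letting a timer or I/O driver call the waker, which is the part that the side thread played in the original code.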

How to drop the environment of a closure passed to futures-cpupool?

I have the following code:
extern crate futures;
extern crate futures_cpupool;
extern crate tokio_timer;
use std::time::Duration;
use futures::Future;
use futures_cpupool::CpuPool;
use tokio_timer::Timer;
fn work(foo: Foo) {
std::thread::sleep(std::time::Duration::from_secs(10));
}
#[derive(Debug)]
struct Foo { }
impl Drop for Foo {
fn drop(&mut self) {
println!("Dropping Foo");
}
}
fn main() {
let pool = CpuPool::new_num_cpus();
let foo = Foo { };
let work_future = pool.spawn_fn(|| {
let work = work(foo);
let res: Result<(), ()> = Ok(work);
res
});
println!("Created the future");
let timer = Timer::default();
let timeout = timer.sleep(Duration::from_millis(750))
.then(|_| Err(()));
let select = timeout.select(work_future).map(|(win, _)| win);
match select.wait() {
Ok(()) => { },
Err(_) => { },
}
}
It seems this code doesn't execute Foo::drop - no message is printed.
I expected foo to be dropped as soon as timeout future resolves in select, as it's a part of environment of a closure, passed to dropped future.
How to make it execute Foo::drop?
The documentation for CpuPool states:
The worker threads associated with a thread pool are kept alive so long as there is an open handle to the CpuPool or there is work running on them. Once all work has been drained and all references have gone away the worker threads will be shut down.
Additionally, you transfer ownership of foo from main to the closure, which then transfers it to work. work will drop foo at the end of the block. However, work is also performing a blocking sleep operation. This sleep counts as work running on the thread.
The sleep is still going when the main thread exits, which immediately tears down the program, and all the threads, without any time to clean up.
As pointed out in How to terminate or suspend a Rust thread from another thread? (and other questions in other languages), there's no safe way to terminate a thread.
I expected foo to be dropped as soon as timeout future resolves in select, as it's a part of environment of a closure, passed to dropped future.
The future doesn't actually "have" the closure or foo. All it has is a handle to the thread:
pub struct CpuFuture<T, E> {
inner: Receiver<thread::Result<Result<T, E>>>,
keep_running_flag: Arc<AtomicBool>,
}
Strangely, the docs say:
If the returned future is dropped then this CpuPool will attempt to cancel the computation, if possible. That is, if the computation is in the middle of working, it will be interrupted when possible.
However, I don't see any implementation for Drop for CpuFuture, so I don't see how it could be possible (or safe). Instead of Drop, the threadpool itself runs a Future. When that future is polled, it checks to see if the receiver has been dropped. This behavior is provided by the oneshot::Receiver. However, this has nothing to do with threads, which are outside the view of the future.
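Since a blocked thread cannot be terminated safely from outside, the usual workaround is cooperative cancellation, in the spirit of the keep_running_flag field shown above. The following is a sketch with plain std threads rather than futures-cpupool; run_with_timeout and the DROPS counter are hypothetical and exist only to make the behavior observable. The work sleeps in short slices and checks a shared flag, so it can return early and drop Foo, and the main thread joins the worker before exiting.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Counts how many Foo values have been dropped (for demonstration only).
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Foo;

impl Drop for Foo {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
        println!("Dropping Foo");
    }
}

/// Simulates 10 seconds of work in 100 ms slices. Returns false if it was
/// cancelled early. `_foo` is dropped whenever this function returns.
fn work(_foo: Foo, cancel: &AtomicBool) -> bool {
    for _ in 0..100 {
        if cancel.load(Ordering::SeqCst) {
            return false;
        }
        thread::sleep(Duration::from_millis(100));
    }
    true
}

/// Runs `work` on another thread, requests cancellation after `timeout`,
/// and waits for the worker so that its destructors have run.
fn run_with_timeout(timeout: Duration) -> bool {
    let cancel = Arc::new(AtomicBool::new(false));
    let worker = {
        let cancel = Arc::clone(&cancel);
        thread::spawn(move || work(Foo, &cancel))
    };
    thread::sleep(timeout);
    cancel.store(true, Ordering::SeqCst);
    worker.join().unwrap()
}

fn main() {
    let finished = run_with_timeout(Duration::from_millis(750));
    println!("work finished: {}", finished);
}
```

This only works when the work itself can be split into steps that check the flag; a single long blocking call still cannot be interrupted this way.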
