What is the recommended way to propagate panics in tokio tasks? - rust

Right now my panics are being swallowed. In my use case, I would like a panic to crash the entire program and also print the stack trace. How should I configure it?

Panics are generally not swallowed, instead they are returned as an error when awaiting the tokio::task::JoinHandle returned from tokio::task::spawn() or tokio::task::spawn_blocking() and can be handled accordingly.
If a panic occurs within the Tokio runtime an error message is printed to stderr like this: "thread 'tokio-runtime-worker' panicked at 'Panicking...', src\main.rs:26:17". If you run the binary with the environment variable RUST_BACKTRACE set to 1 a stacktrace is printed as well.
As with all Rust programs you can set your own panic handler with std::panic::set_hook() to make it exit if any thread panics after printing the panic info like this:
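let default_panic = std::panic::take_hook();
std::panic::set_hook(Box::new(move |info| {
    // Print the usual panic message (and backtrace, if enabled) first.
    default_panic(info);
    // Then terminate the whole process, not just the panicking thread.
    std::process::exit(1);
}));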
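As an illustration of the first point (a panic surfacing as an error on the join handle), here is a minimal sketch. It uses std::thread instead of tokio so that it needs no dependencies, which is an assumption made for this example only; awaiting a tokio JoinHandle likewise yields an Err when the task panicked. The helper name run_and_capture is hypothetical.

```rust
use std::thread;

/// Runs `f` on a new thread and returns the panic message if it panicked.
fn run_and_capture<F>(f: F) -> Result<(), String>
where
    F: FnOnce() + Send + 'static,
{
    match thread::spawn(f).join() {
        Ok(()) => Ok(()),
        Err(payload) => {
            // The payload is a Box<dyn Any + Send>. A panic with a plain
            // literal carries a &'static str, a formatted one a String.
            let message = if let Some(s) = payload.downcast_ref::<&str>() {
                s.to_string()
            } else if let Some(s) = payload.downcast_ref::<String>() {
                s.clone()
            } else {
                "unknown panic payload".to_string()
            };
            Err(message)
        }
    }
}

fn main() {
    let outcome = run_and_capture(|| panic!("worker failed"));
    println!("{:?}", outcome);
}
```

To crash the caller instead of only recording the message, the payload can be handed to std::panic::resume_unwind, which continues the panic on the joining thread.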

Related

What happens when a panic hook panics?

What happens if the function I passed to std::panic::set_hook panics?
I can imagine many ways of reacting to this: consider this UB, abort the program like C++ does, invoke the panic handler again for the new panic, simply abort the execution of the hook... What exactly does Rust promise here?
Context. I'm writing a web app with Rust/WASM backend and I would like to make a panic hook that sends any errors to the server for debugging. This involves a network operation, which can itself fail. So I'm trying to figure out how I can ensure some reasonable behavior in this double-failure scenario.
It's not documented outside of the source code.
The source code for the panic entry point in std has this comment:
// If this is the third nested call (e.g., panics == 2, this is 0-indexed),
// the panic hook probably triggered the last panic, otherwise the
// double-panic check would have aborted the process. In this case abort the
// process real quickly as we don't want to try calling it again as it'll
// probably just panic again.
So the answer to your question is either "invoke the panic handler again for the new panic" or "abort the program" depending on how many times the hook already panicked.
This all assumes you aren't using #![no_std]. If you are then you're either disabling panicking altogether or you are implementing your own panic handler with #[panic_handler], in which case you get to decide what happens yourself.
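Given that a panicking hook ends in an abort sooner or later, one defensive approach is to write the hook so that its own failures are ordinary Result values rather than panics. The sketch below is only an illustration under that assumption: try_report and the REPORTS buffer are hypothetical stand-ins for "send the error to the server", and the hook falls back to writeln! (which returns an error) instead of eprintln! (which panics if stderr cannot be written).

```rust
use std::io::Write;
use std::panic;
use std::sync::Mutex;

// Hypothetical stand-in for a remote error collector.
static REPORTS: Mutex<Vec<String>> = Mutex::new(Vec::new());

/// Tries to record a report. Failure is returned as an error, never a panic.
fn try_report(message: &str) -> Result<(), String> {
    // try_lock never blocks; a busy or poisoned lock becomes an Err.
    let mut reports = REPORTS.try_lock().map_err(|e| e.to_string())?;
    reports.push(message.to_owned());
    Ok(())
}

fn install_hook() {
    panic::set_hook(Box::new(|info| {
        let message = info.to_string();
        if let Err(err) = try_report(&message) {
            // Ignore a failed write instead of panicking inside the hook.
            let _ = writeln!(std::io::stderr(), "could not report panic: {err}");
        }
    }));
}

fn main() {
    install_hook();
    let result = panic::catch_unwind(|| panic!("boom"));
    println!("panicked: {}", result.is_err());
    println!("reports: {:?}", REPORTS.lock().unwrap());
}
```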

Why does a panic while panicking result in an illegal instruction?

Consider the following code that purposely causes a double panic:
use scopeguard::defer; // 1.1.0
fn main() {
    defer!{ panic!() };
    defer!{ panic!() };
}
I know this typically happens when a Drop implementation panics while unwinding from a previous panic, but why does it cause the program to issue an illegal instruction? That sounds like the code is corrupted or jumped somewhere unintended. I figure this might be system or code generation dependent but I tested on various platforms and they all issue similar errors with the same reason:
Linux:
thread panicked while panicking. aborting.
Illegal instruction (core dumped)
Windows (with cargo run):
thread panicked while panicking. aborting.
error: process didn't exit successfully: `target\debug\tests.exe` (exit code: 0xc000001d, STATUS_ILLEGAL_INSTRUCTION)
The Rust Playground:
thread panicked while panicking. aborting.
timeout: the monitored command dumped core
/playground/tools/entrypoint.sh: line 11: 8 Illegal instruction timeout --signal=KILL ${timeout} "$@"
What's going on? What causes this?
This behavior is intended.
From a comment by Jonas Schievink in Why does panicking in a Drop impl cause SIGILL?:
It calls intrinsics::abort(), which LLVM turns into a ub2 instruction, which is illegal, thus SIGILL
I couldn't find any documentation for how double panics are handled, but a paragraph for std::intrinsics::abort() lines up with this behavior:
The current implementation of intrinsics::abort is to invoke an invalid instruction, on most platforms. On Unix, the process will probably terminate with a signal like SIGABRT, SIGILL, SIGTRAP, SIGSEGV or SIGBUS. The precise behaviour is not guaranteed and not stable.
Curiously, this behavior is different from calling std::process::abort(), which always terminates with SIGABRT.
The illegal instruction of choice on x86 is UD2 (I think a typo in the comment above) a.k.a. an undefined instruction which is paradoxically reserved and documented to not be an instruction. So there is no corruption or invalid jump, just a quick and loud way to tell the OS that something has gone very wrong.
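Because a second panic during unwinding aborts the process, a Drop implementation that might need to panic can first check std::thread::panicking(). The following is a minimal sketch of that guard, not something taken from the answers above; the type Checked and the counter SKIPPED are hypothetical names used for illustration.

```rust
use std::panic;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts how often Drop decided not to panic because unwinding was in progress.
static SKIPPED: AtomicUsize = AtomicUsize::new(0);

struct Checked;

impl Drop for Checked {
    fn drop(&mut self) {
        if thread::panicking() {
            // Already unwinding: a second panic here would abort the process.
            SKIPPED.fetch_add(1, Ordering::SeqCst);
        } else {
            panic!("cleanup failed");
        }
    }
}

/// Panics while a `Checked` value is alive and reports whether the
/// panic was caught (that is, whether the process survived the unwind).
fn unwind_with_guard() -> bool {
    panic::catch_unwind(|| {
        let _guard = Checked;
        panic!("first panic");
    })
    .is_err()
}

fn main() {
    let unwound = unwind_with_guard();
    let skipped = SKIPPED.load(Ordering::SeqCst);
    println!("unwound: {unwound}, drop panics skipped: {skipped}");
}
```

This assumes the default unwinding panic strategy; with panic = "abort" the first panic already terminates the process.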

Rust Embedded panic destroys stack

I'm using Rust on a Cortex-M4 and using gdb with openocd to debug it.
From C(++) I'm used to looking at the call stack when an exception (like a hardfault) happens. It's really helpful to see which line caused the exception.
However, in Rust, when a panic happens, the call stack is almost empty. Why does this happen?
Is there a way to make Rust preserve the stack (only for the debugger, I don't need to print it)? Or can I insert a breakpoint somewhere where the call stack hasn't been destroyed yet?
Right now I have an unwrap somewhere that panics, but I can't find where unless I step through a whole lot of code.
EDIT: This is the stack trace I do get in the panic handler:
i stack
#0 rust_begin_unwind (info=0x2001f810) at src\main.rs:122
#1 0x080219dc in cortex_m::itm::write_fmt (port=0x2001f820, args=...) at C:\Users\d.dokter\.cargo\registry\src\github.com-1ecc6299db9ec823\cortex-m-0.6.1\src/itm.rs:128
#2 0x2001f894 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
It's also weird that the write_fmt function is on the stack as that is being called inside the handler to log the panic. I find that 0x2001f894 address very suspicious as well, because that's a RAM address.
The easiest solution is to set the panic handler to call abort() instead of unwinding the stack. This can be done by adding this to your Cargo.toml:
[profile.dev]
panic = "abort"
[profile.release]
panic = "abort"
With this setting, the panic handler will immediately call abort(), so gdb can still see the whole backtrace.
If you just want to print the stack trace, you can also set the environment variable RUST_BACKTRACE=1.

How to check if a thread has finished in Rust?

When I spawn a thread in Rust, I get a JoinHandle, which is good for... joining (a blocking operation), and not much else. How can I check if a child thread has exited (i.e., JoinHandle.join() would not block) from the parent thread? Bonus points if you know how to kill a child thread.
I imagine you could do this by creating a channel, sending something to the child, and catching errors, but that seems like needless complexity and overhead.
As of Rust 1.7, there's no API in the standard library to check if a child thread has exited without blocking.
A portable workaround would be to use channels to send a message from the child to the parent to signal that the child is about to exit. Receiver has a non-blocking try_recv method. When try_recv does receive a message, you can then use join() on the JoinHandle to retrieve the thread's result.
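A minimal sketch of that channel workaround, using only the standard library. The function name spawn_with_done_signal and the computed value are made up for the example.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Spawns a worker that signals over a channel just before it returns.
fn spawn_with_done_signal() -> (JoinHandle<u32>, Receiver<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let result = 6 * 7;
        // Ignore the error: it only means the parent dropped the receiver.
        let _ = tx.send(());
        result
    });
    (handle, rx)
}

fn main() {
    let (handle, done) = spawn_with_done_signal();
    loop {
        match done.try_recv() {
            // Either the worker signalled, or it dropped the sender
            // without signalling (for example because it panicked).
            Ok(()) | Err(TryRecvError::Disconnected) => break,
            // Not finished yet: the parent is free to do other work here.
            Err(TryRecvError::Empty) => thread::sleep(Duration::from_millis(5)),
        }
    }
    // The worker is finishing or has finished, so join returns promptly.
    println!("result = {}", handle.join().unwrap());
}
```

Treating Disconnected as "finished" also covers a worker that panicked before sending, since its sender is dropped during unwinding.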
There are also unstable platform-specific extension traits that let you obtain the raw thread handle. You'd then have to write platform-specific code to test whether the thread has exited or not.
If you think this feature should be in Rust's standard library, you can submit an RFC (be sure to read the README first!).
Bonus points if you know how to kill a child thread.
Threads in Rust are implemented using native OS threads. Even though the operating system might provide a way to kill a thread, it's a bad idea to do so, because the resources that the thread allocated will not be cleaned up until the process ends.
The short answer is not possible yet. But this is not the point that should really be addressed.
Bonus points if you know how to kill a child thread.
NEVER
Even in languages that do support killing threads (see Java here), it is recommended not to.
A thread's execution is generally coded with explicit points of interactions, and there are often implicit assumptions that no other interruption will occur.
The most egregious example is of course resources: the naive "kill" method would be to stop executing the thread; this would mean not releasing any resource. You may think about memory, it's the least of your worries. Imagine, instead, all the Mutex that are not unlocked and will create deadlocks later...
The other option would be to inject a panic in the thread, which would cause unwinding. However, you cannot just start unwinding at any point! The program would have to define safe points at which injecting a panic would be guaranteed to be safe (injecting it at any other point means potentially corrupting shared objects); how to define such safe points and inject the panic there is an open research problem in native languages, especially those executed on W^X systems (where memory pages are either Writable or Executable but never both).
In summary, there is no known way to safely (both memory-wise and functionality-wise) kill a thread.
It's possible, friends. Use refcounters which Rust will drop on end or panic. 100% safe. Example:
use std::time::Duration;
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
fn main() {
    // Play with this flag
    let fatal_flag = true;
    let do_stop = true;
    let working = Arc::new(AtomicBool::new(true));
    let control = Arc::downgrade(&working);
    thread::spawn(move || {
        while (*working).load(Ordering::Relaxed) {
            if fatal_flag {
                panic!("Oh, my God!");
            } else {
                thread::sleep(Duration::from_millis(20));
                println!("I'm alive!");
            }
        }
    });
    thread::sleep(Duration::from_millis(50));
    // To stop thread
    if do_stop {
        match control.upgrade() {
            Some(working) => (*working).store(false, Ordering::Relaxed),
            None => println!("Sorry, but thread died already."),
        }
    }
    thread::sleep(Duration::from_millis(50));
    // To check it's alive / died
    match control.upgrade() {
        Some(_) => println!("Thread alive!"),
        None => println!("Thread ends!"),
    }
}
Gist: https://gist.github.com/DenisKolodin/edea80f2f5becb86f718c330219178e2
At playground: https://play.rust-lang.org/?gist=9a0cf161ba0bbffe3824b9db4308e1fb&version=stable&backtrace=0
UPD: I've created thread-control crate which implements this approach: https://github.com/DenisKolodin/thread-control
I think Arc can be used to solve this problem: if the thread exits, the reference counter is reduced by one.
As of Rust 1.61.0, there is an is_finished method.
https://doc.rust-lang.org/stable/std/thread/struct.JoinHandle.html#method.is_finished
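A short sketch of how is_finished can be used to poll without blocking in join. The helper name wait_until_finished and the polling interval are arbitrary choices for the example.

```rust
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Polls the handle until the thread has finished, then collects its result.
fn wait_until_finished<T>(handle: JoinHandle<T>) -> thread::Result<T> {
    while !handle.is_finished() {
        // The parent could do useful work here instead of sleeping.
        thread::sleep(Duration::from_millis(5));
    }
    // The thread has finished running, so join returns promptly.
    handle.join()
}

fn main() {
    let handle = thread::spawn(|| {
        thread::sleep(Duration::from_millis(20));
        1 + 1
    });
    println!("result = {:?}", wait_until_finished(handle));
}
```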

How to handle long running external function calls such as blocking I/O in Rust?

Editor's note: This question is from a version of Rust prior to 1.0 and uses terms and functions that do not exist in Rust 1.0 code. The concepts expressed are still relevant.
I need to read data provided by an external process via a POSIX file descriptor in my Rust program. The file descriptor connection is kept up a very long time (hours) and the other side passes data to me from time to time. I need to read and process the data stream continuously.
To do so, I wrote a loop that calls libc::read() (readv actually) to read the data and processes it when received. Since this would block the whole scheduler, I'm spawning a task on a new scheduler (task::spawn_sched(SingleThreaded)). This works fine as long as it runs, but I can't find a way to cleanly shut down the loop.
Since the loop is blocking most of the time, I can't use a port/channel to notify the loop to exit.
I tried to kill the loop task by taking it down using a failing linked task (spawn the loop task supervised, spawn a linked task within it and wait for a signal on a port to happen before fail!()ing and taking down the loop task with it). It works well in tests, but the libc::read() isn't interrupted (the task doesn't fail until read finishes and it hits task::yield() at some point).
I learned a lot looking at libcore sources, but I can't seem to find a proper solution.
Is there a way to kill a (child) task in Rust even if it's doing some long external function call like a blocking read?
Is there a way to do non-blocking reads on a POSIX file descriptor so that Rust keeps control over the task?
How can I react to signals, e.g. SIGTERM if the user terminates my program? There doesn't seem to be something like sigaction() in Rust yet.
According to Mozilla, killing a task is no longer possible for now, let alone interrupting a blocking read.
It will be possible to do so after mozilla/rust/pull/11410, see also my other issue report for rust-zmq erickt/rust-zmq/issues/24 which also depends on this. (sorry about the links)
Maybe the signal listener will work for you.
Is there a way to kill a (child) task in Rust even if it's doing some long external function call like a blocking read?
No.
See also:
How does Rust handle killing threads?
How to terminate or suspend a Rust thread from another thread?
What is the standard way to get a Rust thread out of blocking operations?
Is there a way to do non-blocking reads [...] so that Rust keeps control over the task?
Yes.
See also:
How can I read non-blocking from stdin?
How do I read the output of a child process without blocking in Rust?
How can I force a thread that is blocked reading from a file to resume in Rust?
Force non blocking read with TcpStream
on a POSIX file descriptor
Yes.
See also:
How can I read from a specific raw file descriptor in Rust?
How do I write to a specific raw file descriptor from Rust?
How to get tokio-io's async_read for a File handle
How to asynchronously read a file?
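For a dependency-free illustration, the sketch below swaps in a different technique from a true non-blocking read: the blocking read is moved to a dedicated thread that forwards lines over a channel, and the consumer uses recv_timeout so it regains control at regular intervals. The function name spawn_line_reader and the in-memory Cursor input are assumptions for the example; any Read + Send + 'static source, such as a File created from a raw descriptor, could be passed instead.

```rust
use std::io::{BufRead, BufReader, Cursor, Read};
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Reads lines from `source` on a background thread and forwards them.
fn spawn_line_reader<R>(source: R) -> Receiver<String>
where
    R: Read + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for line in BufReader::new(source).lines() {
            match line {
                Ok(text) => {
                    // Stop reading once the consumer has gone away.
                    if tx.send(text).is_err() {
                        break;
                    }
                }
                Err(_) => break,
            }
        }
        // Dropping `tx` here tells the consumer that the input ended.
    });
    rx
}

fn main() {
    let lines = spawn_line_reader(Cursor::new(b"hello\nthere\nworld".to_vec()));
    loop {
        match lines.recv_timeout(Duration::from_millis(100)) {
            Ok(line) => println!("Got data: {line:?}"),
            // Nothing arrived in time: a shutdown flag could be checked here.
            Err(RecvTimeoutError::Timeout) => continue,
            // The reader thread finished (end of input or read error).
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
}
```

The reader thread itself still blocks in read and cannot be interrupted this way; the design only keeps the rest of the program responsive.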
How can I react to signals
Decide your desired platform support, then pick an appropriate crate.
See also:
How to catch signals in Rust
Is there a way to listen to signals on Windows
How to handle SIGSEGV signal in userspace using Rust?
Putting it all together
use future::Either;
use signal_hook::iterator::Signals;
use std::os::unix::io::FromRawFd;
use tokio::{fs::File, io, prelude::*};
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;
fn main() -> Result<()> {
    let signals = Signals::new(&[signal_hook::SIGUSR1])?;
    let signals = signals.into_async()?;
    let input = unsafe { std::fs::File::from_raw_fd(5) };
    let input = File::from_std(input);
    let lines = io::lines(std::io::BufReader::new(input));
    let signals = signals.map(Either::A);
    let lines = lines.map(Either::B);
    let combined = signals.select(lines);
    tokio::run({
        combined
            .map_err(|e| panic!("Early error: {}", e))
            .for_each(|v| match v {
                Either::A(signal) => {
                    println!("Got signal: {:?}", signal);
                    Err(())
                }
                Either::B(data) => {
                    println!("Got data: {:?}", data);
                    Ok(())
                }
            })
    });
    Ok(())
}
Cargo.toml
[package]
name = "future_example"
version = "0.1.0"
authors = ["An Devloper <an.devloper@example.com>"]
edition = "2018"
[dependencies]
tokio = "0.1.22"
signal-hook = { version = "0.1.9", features = ["tokio-support"] }
shim.sh
#!/bin/bash
set -eu
exec 5< /tmp/testpipe
exec ./target/debug/future_example
Execution
cargo build
mkfifo /tmp/testpipe
./shim.sh
Another terminal
printf 'hello\nthere\nworld' > /tmp/testpipe
kill -s usr1 $PID_OF_THE_PROCESS

Resources