Program with a spawned thread panics when optimization enabled - multithreading

When I use rustc 1.rs to compile the following code, it runs endlessly as expected.
use std::thread;
fn main() {
    thread::spawn(|| {
        let a = 2;
        loop {
            a * a;
        }
    }).join();
}
A shorter version:
use std::thread;
fn main() {
thread::spawn(|| {
loop {}
}).join();
}
However, if I use rustc -O 1.rs to compile the two programs above, they crash:
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Error { repr: Os { code: 0, message: "Success" } }', src/libcore/result.rs:837
stack backtrace:
1: 0x5650bd0acada - std::sys::imp::backtrace::tracing::imp::write::h917062bce4ff48c3
at /build/rustc-1.14.0+dfsg1/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:42
2: 0x5650bd0b068f - std::panicking::default_hook::{{closure}}::h0bacac31b5ed1870
at /build/rustc-1.14.0+dfsg1/src/libstd/panicking.rs:247
3: 0x5650bd0aee7c - std::panicking::default_hook::h5897799da33ece67
at /build/rustc-1.14.0+dfsg1/src/libstd/panicking.rs:263
4: 0x5650bd0af4d7 - std::panicking::rust_panic_with_hook::h109e116a3a861224
at /build/rustc-1.14.0+dfsg1/src/libstd/panicking.rs:451
5: 0x5650bd0af364 - std::panicking::begin_panic::hbb38be1379e09df0
at /build/rustc-1.14.0+dfsg1/src/libstd/panicking.rs:413
6: 0x5650bd0af289 - std::panicking::begin_panic_fmt::h26713cea9bce3ab0
at /build/rustc-1.14.0+dfsg1/src/libstd/panicking.rs:397
7: 0x5650bd0af217 - rust_begin_unwind
at /build/rustc-1.14.0+dfsg1/src/libstd/panicking.rs:373
8: 0x5650bd0e2f3d - core::panicking::panic_fmt::hcfbb59eeb7f27f75
at /build/rustc-1.14.0+dfsg1/src/libcore/panicking.rs:69
9: 0x5650bd0a6e84 - core::result::unwrap_failed::h15a0fc826f4081f4
10: 0x5650bd0b7ffa - __rust_maybe_catch_panic
at /build/rustc-1.14.0+dfsg1/src/libpanic_unwind/lib.rs:97
11: 0x5650bd0a6fc1 - <F as alloc::boxed::FnBox<A>>::call_box::he32a93ebea7bc7ad
12: 0x5650bd0ae6c4 - std::sys::imp::thread::Thread::new::thread_start::ha102a6120fc52763
at /build/rustc-1.14.0+dfsg1/src/liballoc/boxed.rs:605
at /build/rustc-1.14.0+dfsg1/src/libstd/sys_common/thread.rs:21
at /build/rustc-1.14.0+dfsg1/src/libstd/sys/unix/thread.rs:84
13: 0x7fc2d0042423 - start_thread
14: 0x7fc2cfb6e9be - __clone
15: 0x0 - <unknown>
If I remove all code in the closure, it exits with no error:
use std::thread;
fn main() {
thread::spawn(|| {
}).join();
}
If I add println!() in the loop, it works well too:
use std::thread;
fn main() {
thread::spawn(|| {
loop {
println!("123")
}
}).join();
}
I tested this on Rust 1.14 and 1.15; the same problem appears in both.
Is this because I'm using something wrong or is it a bug?

This is a known issue (#28728). In short, LLVM optimizes away loops that have no observable side-effects:
The implementation may assume that any thread will eventually do one
of the following:
terminate
make a call to a library I/O function
access or modify a volatile object, or
perform a synchronization operation or an atomic operation
In the cases here, none of these hold, so LLVM removes the loop entirely. However, the Rust compiler has generated code that assumes the loop never returns. This mismatch causes the crash.
Since having an infinite loop with no side-effects is basically useless, this issue is not of critical priority. The Rust team is currently waiting for LLVM to provide a better solution.
As a workaround, you should simply do something inside the loop, which is likely what you want to do anyway ^_^
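A minimal sketch of that workaround, assuming the loop only needs to spin until some other thread asks it to stop. The atomic load counts as one of the operations listed in the quote above, so the optimizer cannot treat the loop as free of side effects. The function name spin_until_stopped and the 50 ms delay are made up for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Spins until `stop` is set and returns how many iterations ran.
/// (Hypothetical helper, not part of the original question.)
fn spin_until_stopped(stop: Arc<AtomicBool>) -> u64 {
    let mut iterations = 0u64;
    // The atomic load is an observable operation, so the loop is kept.
    while !stop.load(Ordering::Acquire) {
        iterations = iterations.wrapping_add(1);
        std::hint::spin_loop();
    }
    iterations
}

fn main() {
    let stop = Arc::new(AtomicBool::new(false));
    let worker = {
        let stop = Arc::clone(&stop);
        thread::spawn(move || spin_until_stopped(stop))
    };
    // Let the worker spin briefly, then ask it to finish.
    thread::sleep(Duration::from_millis(50));
    stop.store(true, Ordering::Release);
    let iterations = worker.join().expect("worker panicked");
    println!("worker ran {} iterations", iterations);
}
```

This also gives the main thread a clean way to end the worker, which a bare loop {} never offers.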

Related

What is an example of Rust code that causes a segfault?

I Googled some segfault examples in Rust, but none of them crash now. Is Rust able to prevent all segfaults now? Is there a simple demo that can cause a segfault?
If unsafe code is allowed, then:
fn main() {
unsafe { std::ptr::null_mut::<i32>().write(42) };
}
results in:
Compiling playground v0.0.1 (/playground)
Finished dev [unoptimized + debuginfo] target(s) in 1.37s
Running `target/debug/playground`
timeout: the monitored command dumped core
/playground/tools/entrypoint.sh: line 11: 7 Segmentation fault timeout --signal=KILL ${timeout} "$#"
as seen on the playground.
Any situation that would trigger a segfault would require invoking undefined behavior at some point. The compiler is allowed to optimize out code or otherwise exploit the fact that undefined behavior should never occur, so it's very hard to guarantee that some code will segfault. The compiler is well within its rights to make the above program run without triggering a segfault.
As an example, the code above when compiled in release mode results in an "Illegal instruction" instead.
If unsafe code is not allowed, see How does Rust guarantee memory safety and prevent segfaults? for how Rust can guarantee it doesn't happen as long as its memory safety invariants aren't violated (which could only happen in unsafe code).
Don't use unsafe code if you can avoid it.
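For contrast, here is a small sketch of what happens in safe code when an access goes out of bounds: the bounds check turns it into a panic, which can be caught, rather than a segfault. The helper name checked_read is hypothetical, and the sketch assumes the default panic = "unwind" setting so that catch_unwind can intercept the panic.

```rust
use std::panic;

/// Reads `data[index]`, converting the out-of-bounds panic into an error.
/// (Hypothetical helper for illustration only.)
fn checked_read(data: &[i32], index: usize) -> Result<i32, String> {
    // Indexing a slice out of bounds panics; it never touches invalid memory.
    panic::catch_unwind(|| data[index])
        .map_err(|_| format!("index {} is out of bounds", index))
}

fn main() {
    let data = [1, 2, 3];
    println!("{:?}", checked_read(&data, 1));
    // The default panic hook still prints a message to stderr here,
    // but the process keeps running.
    println!("{:?}", checked_read(&data, 10));
}
```

In everyday code, data.get(index) is the idiomatic way to get an Option without panicking at all; catch_unwind is used here only to make the panic visible.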
Strictly speaking, it is always possible to trick a program into thinking that it had a segmentation fault, since this is a signal sent by the OS:
use libc::kill;
use std::process;
fn main() {
unsafe {
// First SIGSEGV will be consumed by Rust runtime
// (see https://users.rust-lang.org/t/is-sigsegv-handled-by-rust-runtime/45680)...
kill(process::id() as i32, libc::SIGSEGV);
// ...but the second will crash the program, as expected
kill(process::id() as i32, libc::SIGSEGV);
}
}
Playground
This is not really an answer to your question, since that's not a "real" segmentation fault, but taking the question literally, a Rust program can still end with a "segmentation fault" error, and here's a case which reliably triggers it.
If you're looking more generally for something that will dump core, and not specifically cause a segfault, there is another option, which is to cause the compiler to emit a UD2 instruction or equivalent. There are a few things which can produce this:
An empty loop without any side effects is UB because of LLVM optimizations:
fn main() {
(|| loop {})()
}
Playground.
This no longer produces UB.
Trying to create the never (!) type.
#![feature(never_type)]
union Erroneous {
a: (),
b: !,
}
fn main() {
unsafe { Erroneous { a: () }.b }
}
Playground.
Or also trying to use (in this case match on) an enum with no variants:
#[derive(Clone, Copy)]
enum Uninhabited {}
union Erroneous {
a: (),
b: Uninhabited,
}
fn main() {
match unsafe { Erroneous { a: () }.b } {
// Nothing to match on.
}
}
Playground.
And last, you can cheat and just force it to produce a UD2 directly:
#![feature(asm)]
fn main() {
unsafe {
asm! {
"ud2"
}
};
}
Playground
Or using llvm_asm! instead of asm!
#![feature(llvm_asm)]
fn main() {
unsafe {
llvm_asm! {
"ud2"
}
};
}
Playground.
NULL pointer can cause segfault in both C and Rust.
union Foo<'a> {
    a: i32,
    s: &'a str,
}
fn main() {
    let mut a = Foo { s: "fghgf" };
    a.a = 0;
    unsafe {
        print!("{:?}", a.s);
    }
}

How do I print a backtrace without panicking using thiserror?

I am running a Rust warp webserver and I need more descriptive error messages. I'd like to print a backtrace or something similar so I can tell where the error started.
I was using the Failure crate, but it is now deprecated so I migrated to thiserror.
Is it possible (without using nightly), to print a backtrace without panicking?
There are ways to get to the backtrace information - but it relies on what are currently "nightly only" APIs. If you're happy to use nightly and stick with thiserror, here's what you do... (If not then see the ASIDE at the end for other ideas).
If you're creating the error from scratch the steps are:
add a backtrace: Backtrace field to your error type.
set the backtrace when you create an error using Backtrace::force_capture()
If you're creating this error due to another error and want to use its backtrace instead, then just add #[backtrace] to the source field where you keep the original error.
Then to get the backtrace information you can just use the backtrace() function on std::error::Error.
An example looks like this:
#![feature(backtrace)]
extern crate thiserror;
use std::backtrace::Backtrace;
use thiserror::Error;
#[derive(Error, Debug)]
pub enum DataStoreError {
//#[error("data store disconnected")]
//Disconnect(#[from] io::Error),
#[error("the data for key `{0}` is not available")]
Redaction(String),
#[error("invalid header (expected {expected:?}, found {found:?})")]
InvalidHeader {
expected: String,
found: String,
backtrace: Backtrace,
},
#[error("unknown data store error")]
Unknown,
}
pub fn explode() -> Result<(), DataStoreError> {
Err(DataStoreError::InvalidHeader {
expected: "A".to_owned(),
found: "B".to_owned(),
backtrace: Backtrace::force_capture(),
})
}
fn main() {
use std::error::Error;
let e = explode().err().unwrap();
let b = e.backtrace();
println!("e = {}", e);
println!("b = {:#?}", b);
}
This outputs something like:
e = invalid header (expected "A", found "B")
b = Some(
Backtrace [
{ fn: "playground::explode", file: "./src/main.rs", line: 28 },
{ fn: "playground::main", file: "./src/main.rs", line: 34 },
{ fn: "core::ops::function::FnOnce::call_once", file: "/rustc/879aff385a5fe0af78f3d45fd2f0b8762934e41e/library/core/src/ops/function.rs", line: 248 },
...
You can see a working version in the playground
ASIDE
If you're not tied to thiserror and need to use the stable version of the compiler, you could instead use snafu which has support for backtraces using the backtrace crate, rather than std::backtrace, and so works on stable.
There may also be ways to make thiserror work with the backtrace crate rather than std::backtrace but I've not tried that.
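As a further sketch: newer toolchains ship std::backtrace::Backtrace on stable (since Rust 1.65, as far as I know), so a backtrace can be captured and printed without panicking and without nightly, as long as you expose it through your own accessor rather than the nightly-only Error::backtrace(). The example below writes the error type by hand instead of deriving it with thiserror so that it has no dependencies; the type and method names are assumptions made for illustration.

```rust
use std::backtrace::Backtrace;
use std::fmt;

// Hand-written stand-in for the InvalidHeader variant shown earlier.
#[derive(Debug)]
struct InvalidHeader {
    expected: String,
    found: String,
    backtrace: Backtrace,
}

impl InvalidHeader {
    fn new(expected: &str, found: &str) -> Self {
        InvalidHeader {
            expected: expected.to_owned(),
            found: found.to_owned(),
            // Captures regardless of the RUST_BACKTRACE setting.
            backtrace: Backtrace::force_capture(),
        }
    }

    // Plain accessor; this is not the nightly Error::backtrace() method.
    fn backtrace(&self) -> &Backtrace {
        &self.backtrace
    }
}

impl fmt::Display for InvalidHeader {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "invalid header (expected {:?}, found {:?})",
            self.expected, self.found
        )
    }
}

impl std::error::Error for InvalidHeader {}

fn explode() -> Result<(), InvalidHeader> {
    Err(InvalidHeader::new("A", "B"))
}

fn main() {
    let e = explode().unwrap_err();
    println!("e = {}", e);
    println!("b = {}", e.backtrace());
}
```

How much detail the backtrace contains depends on the platform and on whether debug info is available.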

What is the difference between literal string and Args in Rust?

I have a string parsing function using regex: fn parse(s: &str) -> Option<MyObj>. It works when testing with parse("test string"), but it fails when using Args. The failure is that the regex does not match anything in s.
The way I used Args is: args().map(|arg| parse(&arg)).collect(). I could not see type error here. And println in parse shows s is the same string as "test string".
Updated my description. I am not sure if my problem is related with how String and str are different. Because I was using str and it still failed.
extern crate regex;
use regex::Regex;
use std::env::args;
struct IPRange {
start: u32,
mask: u8,
}
fn parse_iprange(ipr: &str) -> Option<IPRange> {
let parser = Regex::new(
r"^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/(\d+|\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})$",
)
.unwrap();
let caps = parser.captures(ipr).unwrap();
return Some(IPRange { start: 0, mask: 0 });
}
fn main() {
let v: Vec<_> = args().map(|arg| parse_iprange(&arg)).collect();
}
$ RUST_BACKTRACE=1 cargo run 192.168.3.1/24
Finished dev [unoptimized + debuginfo] target(s) in 0.04s
Running `target/debug/ip_helper 192.168.3.1/24`
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/libcore/option.rs:345:21
stack backtrace:
0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:39
1: std::sys_common::backtrace::_print
at src/libstd/sys_common/backtrace.rs:70
2: std::panicking::default_hook::{{closure}}
at src/libstd/sys_common/backtrace.rs:58
at src/libstd/panicking.rs:200
3: std::panicking::default_hook
at src/libstd/panicking.rs:215
4: std::panicking::rust_panic_with_hook
at src/libstd/panicking.rs:478
5: std::panicking::continue_panic_fmt
at src/libstd/panicking.rs:385
6: rust_begin_unwind
at src/libstd/panicking.rs:312
7: core::panicking::panic_fmt
at src/libcore/panicking.rs:85
8: core::panicking::panic
at src/libcore/panicking.rs:49
9: <core::option::Option<T>>::unwrap
at /rustc/2aa4c46cfdd726e97360c2734835aa3515e8c858/src/libcore/macros.rs:10
10: ip_helper::parse_iprange
at src/main.rs:18
What the first item of args() contains depends on the platform, as the documentation notes:
The first element is traditionally the path of the executable, but it can be set to arbitrary text, and may not even exist. This means this property should not be relied upon for security purposes.
So, you should skip it in your case:
let v: Vec<_> = args().skip(1).map(|arg| parse_iprange(&arg)).collect();
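A self-contained sketch of the same idea that also returns None instead of unwrapping, so that an unexpected argument does not panic. To avoid the regex dependency it uses std::net::Ipv4Addr and only accepts a numeric prefix length (the dotted-mask form allowed by the original regex is left out); that substitution is an assumption of this sketch, not the asker's code.

```rust
use std::env::args;
use std::net::Ipv4Addr;

#[derive(Debug, PartialEq)]
struct IPRange {
    start: u32,
    mask: u8,
}

/// Parses "a.b.c.d/len"; returns None for anything else,
/// including the executable path in args()[0].
fn parse_iprange(ipr: &str) -> Option<IPRange> {
    let (addr, mask) = ipr.split_once('/')?;
    let start: Ipv4Addr = addr.parse().ok()?;
    let mask: u8 = mask.parse().ok()?;
    if mask > 32 {
        return None;
    }
    Some(IPRange {
        start: u32::from(start),
        mask,
    })
}

fn main() {
    // skip(1) drops the first element, which is usually the program path.
    let ranges: Vec<Option<IPRange>> = args()
        .skip(1)
        .map(|arg| parse_iprange(&arg))
        .collect();
    println!("{:?}", ranges);
}
```

With this shape, forgetting skip(1) would produce a None entry for the program path rather than a panic inside the parser.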

How do I get the return address of a function?

I am writing a Rust library containing an implementation of the callbacks for LLVM SanitizerCoverage. These callbacks can be used to trace the execution of an instrumented program.
A common way to produce a trace is to print the address of each executed basic block. However, in order to do that, it is necessary to retrieve the address of the call instruction that invoked the callback. The C++ examples provided by LLVM rely on the compiler intrinsic __builtin_return_address(0) in order to obtain this information.
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
if (!*guard) return;
void *PC = __builtin_return_address(0);
printf("guard: %p %x PC %p\n", guard, *guard, PC);
}
I am trying to reproduce the same function in Rust but, apparently, there is no equivalent to __builtin_return_address. The only reference I found is from an old version of Rust, but the function described is not available anymore. The function is the following:
pub unsafe extern "rust-intrinsic" fn return_address() -> *const u8
My current hacky solution involves having a C file in my crate that contains the following function:
void* get_return_address() {
return __builtin_return_address(1);
}
If I call it from a Rust function, I am able to obtain the return address of the Rust function itself. This solution, however, requires the compilation of my Rust code with -C force-frame-pointers=yes for it to work, since the C compiler intrinsic relies on the presence of frame pointers.
Concluding, is there a more straightforward way of getting the return address of the current function in Rust?
edit: The removal of the return_address intrinsic is discussed in this GitHub issue.
edit 2: Further testing showed that the backtrace crate is able to correctly extract the return address of the current function, thus avoiding the hack I described before. Credit goes to this tweet.
The problem with this solution is the overhead that is generated creating a full backtrace when only the return address of the current function is needed. In addition, the crate is using C libraries to extract the backtrace; this looks like something that should be done in pure Rust.
edit 3: The compiler intrinsic __builtin_return_address(0) generates a call to the LLVM intrinsic llvm.returnaddress. The corresponding documentation can be found here.
I could not find any official documentation about this, but found out by asking in the rust-lang repository: You can link against LLVM intrinsics, like llvm.returnaddress, with only a few lines of code:
extern {
#[link_name = "llvm.returnaddress"]
fn return_address() -> *const u8;
}
fn foo() {
println!("I was called by {:X}", return_address());
}
The LLVM intrinsic llvm.addressofreturnaddress might also be interesting.
As of 2022 Maurice's answer doesn't work as-is and requires an additional argument.
#![feature(link_llvm_intrinsics)]
extern {
#[link_name = "llvm.returnaddress"]
fn return_address(a: i32) -> *const u8;
}
macro_rules! caller_address {
() => {
unsafe { return_address(0) }
};
}
fn print_caller() {
println!("caller: {:p}", caller_address!());
}
fn main() {
println!("main address: {:p}", main as *const ());
print_caller();
}
Output:
main address: 0x56261a13bb50
caller: 0x56261a13bbaf
Playground link.

How can I cause a panic on a thread to immediately end the main thread?

In Rust, a panic terminates the current thread but is not sent back to the main thread. The solution we are told is to use join. However, this blocks the currently executing thread. So if my main thread spawns 2 threads, I cannot join both of them and immediately get a panic back.
let jh1 = thread::spawn(|| { println!("thread 1"); sleep(1000000); });
let jh2 = thread::spawn(|| { panic!("thread 2") });
In the above, if I join on thread 1 and then on thread 2, I will be waiting for thread 1 before ever receiving a panic from either thread.
Although in some cases I desire the current behavior, my goal is to default to Go's behavior where I can spawn a thread and have it panic on that thread and then immediately end the main thread. (The Go specification also documents a protect function, so it is easy to achieve Rust behavior in Go).
Updated for Rust 1.10+, see revision history for the previous version of the answer
Good point, in Go the main thread doesn't get unwound; the program just crashes, but the original panic is reported. This is in fact the behavior I want (although ideally resources would get cleaned up properly everywhere).
This you can achieve with the recently stable std::panic::set_hook() function. With it, you can set a hook which prints the panic info and then exits the whole process, something like this:
use std::thread;
use std::panic;
use std::process;
fn main() {
// take_hook() returns the default hook in case when a custom one is not set
let orig_hook = panic::take_hook();
panic::set_hook(Box::new(move |panic_info| {
// invoke the default handler and exit the process
orig_hook(panic_info);
process::exit(1);
}));
thread::spawn(move || {
panic!("something bad happened");
}).join();
// this line won't ever be invoked because of process::exit()
println!("Won't be printed");
}
Try commenting the set_hook() call out, and you'll see that the println!() line gets executed.
However, this approach, due to the use of process::exit(), will not allow resources allocated by other threads to be freed. In fact, I'm not sure that Go runtime allows this as well; it is likely that it uses the same approach with aborting the process.
I tried to force my code to stop processing when any of the threads panicked. The only more-or-less clear solution without using unstable features was to use the Drop trait implemented on some struct. This can lead to a resource leak, but in my scenario I'm OK with this.
use std::process;
use std::thread;
use std::time::Duration;
static THREAD_ERROR_CODE: i32 = 0x1;
static NUM_THREADS: u32 = 17;
static PROBE_SLEEP_MILLIS: u64 = 500;
struct PoisonPill;
impl Drop for PoisonPill {
fn drop(&mut self) {
if thread::panicking() {
println!("dropped while unwinding");
process::exit(THREAD_ERROR_CODE);
}
}
}
fn main() {
let mut thread_handles = vec![];
for i in 0..NUM_THREADS {
thread_handles.push(thread::spawn(move || {
let b = PoisonPill;
thread::sleep(Duration::from_millis(PROBE_SLEEP_MILLIS));
if i % 2 == 0 {
println!("kill {}", i);
panic!();
}
println!("this is thread number {}", i);
}));
}
for handle in thread_handles {
let _ = handle.join();
}
}
No matter how b = PoisonPill leaves its scope, normally or after a panic!, its Drop method kicks in. You can distinguish whether the thread is panicking using thread::panicking and take some action, in my case killing the process.
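If exiting the process from inside Drop is too abrupt, a variation on the same trick is to have the guard report the panic over a channel, so the main thread learns about it immediately without joining the threads in order. This is only a sketch: PanicNotifier and first_panicking_worker are hypothetical names, and the sleep stands in for real work.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;
use std::time::Duration;

// Guard that tells the main thread when its owning thread unwinds.
struct PanicNotifier {
    id: usize,
    tx: Sender<usize>,
}

impl Drop for PanicNotifier {
    fn drop(&mut self) {
        if thread::panicking() {
            // Ignore the error if the receiver is already gone.
            let _ = self.tx.send(self.id);
        }
    }
}

/// Spawns two workers and returns the id of the first one that panics,
/// or None if every worker finishes normally.
fn first_panicking_worker() -> Option<usize> {
    let (tx, rx) = channel();

    let tx1 = tx.clone();
    thread::spawn(move || {
        let _guard = PanicNotifier { id: 1, tx: tx1 };
        thread::sleep(Duration::from_millis(200));
    });

    let tx2 = tx.clone();
    thread::spawn(move || {
        let _guard = PanicNotifier { id: 2, tx: tx2 };
        panic!("thread 2");
    });

    // Drop the original sender so recv() fails once all workers are done.
    drop(tx);
    rx.recv().ok()
}

fn main() {
    match first_panicking_worker() {
        Some(id) => println!("worker {} panicked, shutting down", id),
        None => println!("all workers finished normally"),
    }
}
```

The main thread can then decide for itself whether to call process::exit, signal the other workers to stop, or simply return.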
Looks like exiting the whole process on a panic in any thread is now (rust 1.62) as simple as adding this to your Cargo.toml:
[profile.release]
panic = 'abort'
[profile.dev]
panic = 'abort'
A panic in a thread then looks like this, with exit code 134:
thread '<unnamed>' panicked at 'panic in thread', src/main.rs:5:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Aborted (core dumped)

Resources