How to debug illegal hardware instruction error in unsafe block? - rust

Here is the code:
pub struct Node<T> {
data: Option<T>,
level: usize,
forward: [Option<*mut Node<T>>; MAX_HEIGHT],
}
And I want to iterate the linked list:
// let next = some_node.forward[n];
unsafe {
loop {
match next {
None => { break; }
Some(v) => {
write!(f, "{:?}", (*v).data)?;
break;
}
}
}
}
When I use the unsafe keyword, I get the error [1] 74042 illegal hardware instruction cargo run. Is there any way to debug this unsafe block?

unsafe is a way of saying, "shut up, rustc, I know what I'm doing." In this case, you're assuring the compiler that v is always a valid aligned pointer to a Node<T>, that the array indexing of forward resolves to an array of Option<*mut Node<T>> with size MAX_HEIGHT. If any of these assumptions are violated, you're back in undefined behavior land.
You've turned off all the safeties and aimed your compiler at unknown pointers. The rational part of my brain wants to know exactly what you're trying to accomplish here.
The best advice I can offer with the information provided is to use rust-gdb and step through your program until your pointers don't look sane.
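To make the pointer walk easier to inspect in a debugger, here is a minimal sketch of traversing level 0 of such a list. It assumes MAX_HEIGHT = 4 (the question does not show its value) and that every node was allocated with Box::into_raw. new_node, collect_level0 and free_level0 are hypothetical helper names, not part of your code. Note that the loop in the question never advances next, whereas this one does:

```rust
// Assumed value; the question does not show the real constant.
const MAX_HEIGHT: usize = 4;

pub struct Node<T> {
    data: Option<T>,
    level: usize,
    forward: [Option<*mut Node<T>>; MAX_HEIGHT],
}

/// Hypothetical helper: allocates a node on the heap and leaks it as a raw pointer.
fn new_node<T>(data: T) -> *mut Node<T> {
    Box::into_raw(Box::new(Node {
        data: Some(data),
        level: 1,
        forward: [None; MAX_HEIGHT],
    }))
}

/// Collects the data of every node reachable through forward[0].
///
/// Safety: every pointer in the chain must come from `new_node`
/// and must not have been freed yet.
unsafe fn collect_level0<T: Clone>(head: Option<*mut Node<T>>) -> Vec<T> {
    let mut out = Vec::new();
    let mut next = head;
    while let Some(ptr) = next {
        // Cheap sanity check in debug builds before dereferencing.
        debug_assert!(!ptr.is_null(), "Some(null) stored in forward");
        let node = unsafe { &*ptr };
        if let Some(value) = &node.data {
            out.push(value.clone());
        }
        // Advance to the next node; without this the walk never moves on.
        next = node.forward[0];
    }
    out
}

/// Frees every node reachable through forward[0].
///
/// Safety: same requirements as `collect_level0`; the pointers are dangling afterwards.
unsafe fn free_level0<T>(head: Option<*mut Node<T>>) {
    let mut next = head;
    while let Some(ptr) = next {
        let boxed = unsafe { Box::from_raw(ptr) };
        next = boxed.forward[0];
        // `boxed` is dropped here, releasing this node.
    }
}
```

If a pointer stored in forward was never initialized or was already freed, dereferencing it is undefined behavior, so a breakpoint on the dereference in rust-gdb is a reasonable place to start looking.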

Related

What is an example of Rust code that causes a segfault?

I Googled some segfault examples in Rust, but none of them crash now. Is Rust able to prevent all segfaults now? Is there a simple demo that can cause a segfault?
If unsafe code is allowed, then:
fn main() {
unsafe { std::ptr::null_mut::<i32>().write(42) };
}
results in:
Compiling playground v0.0.1 (/playground)
Finished dev [unoptimized + debuginfo] target(s) in 1.37s
Running `target/debug/playground`
timeout: the monitored command dumped core
/playground/tools/entrypoint.sh: line 11: 7 Segmentation fault timeout --signal=KILL ${timeout} "$@"
as seen on the playground.
Any situation that would trigger a segfault would require invoking undefined behavior at some point. The compiler is allowed to optimize out code or otherwise exploit the fact that undefined behavior should never occur, so it's very hard to guarantee that some code will segfault. The compiler is well within its rights to make the above program run without triggering a segfault.
As an example, the code above when compiled in release mode results in an "Illegal instruction" instead.
If unsafe code is not allowed, see How does Rust guarantee memory safety and prevent segfaults? for how Rust can guarantee it doesn't happen as long as its memory safety invariants aren't violated (which could only happen in unsafe code).
Don't use unsafe code if you can avoid it.
Strictly speaking, it is always possible to trick a program into thinking that it had a segmentation fault, since this is a signal sent by the OS:
use libc::kill;
use std::process;
fn main() {
unsafe {
// First SIGSEGV will be consumed by Rust runtime
// (see https://users.rust-lang.org/t/is-sigsegv-handled-by-rust-runtime/45680)...
kill(process::id() as i32, libc::SIGSEGV);
// ...but the second will crash the program, as expected
kill(process::id() as i32, libc::SIGSEGV);
}
}
Playground
This is not really an answer to your question, since that's not a "real" segmentation fault, but taking the question literally: a Rust program can still end with a "segmentation fault" error, and here's a case which reliably triggers it.
If you're looking more generally for something that will dump core, and not specifically cause a segfault, there is another option, which is to cause the compiler to emit a UD2 instruction or equivalent. There are a few things which can produce this:
An empty loop without any side effects is UB because of LLVM optimizations:
fn main() {
(|| loop {})()
}
Playground.
This no longer produces UB.
Trying to create the never (!) type.
#![feature(never_type)]
union Erroneous {
a: (),
b: !,
}
fn main() {
unsafe { Erroneous { a: () }.b }
}
Playground.
Or also trying to use (in this case match on) an enum with no variants:
#[derive(Clone, Copy)]
enum Uninhabited {}
union Erroneous {
a: (),
b: Uninhabited,
}
fn main() {
match unsafe { Erroneous { a: () }.b } {
// Nothing to match on.
}
}
Playground.
And last, you can cheat and just force it to produce a UD2 directly:
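use std::arch::asm; // asm! is stable since Rust 1.59, so no feature gate is needed
fn main() {
unsafe {
// ud2 exists on x86/x86_64 only
asm! {
"ud2"
}
};
}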
Playground
Or, on older nightly compilers only, using llvm_asm! (which has since been removed) instead of asm!:
#![feature(llvm_asm)]
fn main() {
unsafe {
llvm_asm! {
"ud2"
}
};
}
Playground.
A null pointer can cause a segfault in both C and Rust:
union Foo<'a>{
a:i32,
s:&'a str,
}
fn main() {
let mut a = Foo{s:"fghgf"};
a.a = 0;
unsafe {
print!("{:?}", a.s);
}
}

"borrow of possibly uninitialized variable" error in "obviously" unreachable block

In the following code example the compiler can work out that the if blocks aren't reachable, and yet it still gives me an error.
const A_MODE: bool = false; // I manually edit this to switch "modes"
fn main() {
let a: Vec<u32>;
if A_MODE {
a = vec![1,2,3];
}
if A_MODE {
println!("a: {:?}", a); // error: borrow of possibly uninitialized variable
}
}
Rust Playground
I thought that maybe the compiler was really trying to tell me that I need to initialize a at some point, but this compiles fine:
fn main() {
let a: Vec<u32>;
println!("Finished.");
}
Is the error just because the Rust compiler isn't smart enough yet, or does this behaviour have some purpose? Is there any simple workaround which results in a similar code structure?
I know that I could restructure the code to make it work, but for my purposes the above structure is the most straight-forward and intuitive. My current work-around is to comment and uncomment code blocks, which isn't fun. Thanks!
The compiler does not expand constant expressions in the phase for validating lifetime and ownership, so it is not "obvious" to the compiler.
If you really don't want to run that block, you might want to use #[cfg] (or the cfg-if crate if you like if syntax).
fn main() {
let a: Vec<u32>;
#[cfg(a_mode)] {
a = vec![1,2,3];
}
#[cfg(a_mode)] {
println!("a: {:?}", a);
}
}
This way, both blocks are compiled, with no branching at all, if the a_mode cfg is set (for example by passing --cfg a_mode to rustc), and neither is compiled otherwise. Note that a cfg name has to be a valid identifier, hence a_mode rather than a-mode.
The compiler is aware that constant expression conditions never change, but that is handled at a later phase of the compilation for optimizations like removing branching.
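If you would rather keep the plain const switch than use cfg, one possible workaround is to make the "maybe initialized" state explicit with Option, so the borrow checker no longer has to reason about the constant. This is only a sketch; describe is a hypothetical helper introduced here so the result can be inspected:

```rust
const A_MODE: bool = false; // edit this to switch "modes", as in the question

/// Hypothetical helper: returns what the original program would have printed.
fn describe() -> String {
    // `a` is always initialized, either to Some(..) or to None,
    // so there is no "possibly uninitialized" path left.
    let a: Option<Vec<u32>> = if A_MODE { Some(vec![1, 2, 3]) } else { None };

    match &a {
        Some(values) => format!("a: {:?}", values),
        None => String::from("a is unset"),
    }
}
```

The structure stays close to the original two if blocks, at the cost of one extra match.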

How to mimic varargs for utility functions?

Result.expect()'s console output wasn't what I needed, so I extended Result with my own version:
trait ResultExt<T> {
fn or_exit(self, message: &str) -> T;
}
impl<T> ResultExt<T> for ::std::result::Result<T, Error> {
fn or_exit(self, message: &str) -> T {
if self.is_err() {
io::stderr().write(format!("FATAL: {} ({})\n", message, self.err().unwrap()).as_bytes()).unwrap();
process::exit(1);
}
return self.unwrap();
}
}
As I understand, Rust doesn't support varargs yet, so I have to use it like that, correct?
something().or_exit(&format!("Ah-ha! An error! {}", "blah"));
That's too verbose compared to either Java, Kotlin or C. What is the preferred way to solve this?
I don't think the API you suggested is particularly unergonomic. If maximum performance matters, it might make sense to put the error generation in a closure or provide an API for that too, so the String is only allocated when there is actually an error, which might be especially relevant when something is particularly expensive to format. (Like all the _else methods for std::result::Result.)
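A minimal sketch of the closure-based variant, assuming a hypothetical method name or_exit_with and any error type that implements Display. The message is only built on the error path:

```rust
use std::fmt::Display;
use std::process;

trait ResultExt<T> {
    /// Hypothetical name; returns the Ok value or prints a message and exits.
    fn or_exit_with<F: FnOnce() -> String>(self, make_message: F) -> T;
}

impl<T, E: Display> ResultExt<T> for Result<T, E> {
    fn or_exit_with<F: FnOnce() -> String>(self, make_message: F) -> T {
        match self {
            Ok(value) => value,
            Err(e) => {
                // The closure runs only here, so nothing is formatted on success.
                eprintln!("FATAL: {} ({})", make_message(), e);
                process::exit(1)
            }
        }
    }
}
```

A call would then look like something().or_exit_with(|| format!("Ah-ha! An error! {}", "blah")), which is no shorter but avoids the allocation when the result is Ok.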
However, you might be able to make it more ergonomic by defining a macro which takes a result, a &str and format parameters. This could look like this, for example (based on @E_net4's comment):
use std::process;
macro_rules! or_exit {
($res:expr, $fmt:expr, $($arg:tt)+) => {
$res.unwrap_or_else(|e| {
let message = format!($fmt, $($arg)+);
eprintln!("FATAL: {} ({})\n", message, e);
process::exit(1)
})
};
}
fn main() {
let x: Result<i32, &'static str> = Err("dumb user, please replace");
let _ = or_exit!(x, "Ah-ha! An error! {}", "blahh");
}
Rust Playground
Note that this might not yield the best error messages if users supply invalid arguments. I did not want to change your code too much, but if you decide to make the macro pure sugar and nothing else, you should probably extend your API to take a closure instead of a string. You might also want to reconsider the naming of the macro.

How to store a reference without having to deal with lifetimes?

As suggested by the dynamic_reload crate's example, I collected Symbols instead of extracting them every time, but Symbol requires a lifetime. Using a lifetime changes method signatures and breaks the compatibility with method DynamicReload::update.
Is it a valid workaround to use std::mem::transmute to change Symbol's lifetime to 'static?
extern crate dynamic_reload;
use dynamic_reload::{DynamicReload, Lib, Symbol, Search, PlatformName, UpdateState};
use std::sync::Arc;
use std::time::Duration;
use std::thread;
use std::mem::transmute;
struct Plugins {
plugins: Vec<(Arc<Lib>, Arc<Symbol<'static, extern "C" fn() -> i32>>)>,
}
impl Plugins {
fn add_plugin(&mut self, plugin: &Arc<Lib>) {
match unsafe { plugin.lib.get(b"shared_fun\0") } {
Ok(temp) => {
let f: Symbol<extern "C" fn() -> i32> = temp;
self.plugins.push((plugin.clone(), Arc::new(unsafe { transmute(f) })));
},
Err(e) => println!("Failed to load symbol: {:?}", e),
}
}
fn unload_plugins(&mut self, lib: &Arc<Lib>) {
for i in (0..self.plugins.len()).rev() {
if &self.plugins[i].0 == lib {
self.plugins.swap_remove(i);
}
}
}
fn reload_plugin(&mut self, lib: &Arc<Lib>) {
Self::add_plugin(self, lib);
}
// called when a lib needs to be reloaded.
fn reload_callback(&mut self, state: UpdateState, lib: Option<&Arc<Lib>>) {
match state {
UpdateState::Before => Self::unload_plugins(self, lib.unwrap()),
UpdateState::After => Self::reload_plugin(self, lib.unwrap()),
UpdateState::ReloadFailed(_) => println!("Failed to reload"),
}
}
}
fn main() {
let mut plugs = Plugins { plugins: Vec::new() };
// Setup the reload handler. A temporary directory will be created inside the target/debug
// where plugins will be loaded from. That is because on some OS:es loading a shared lib
// will lock the file so we can't overwrite it so this works around that issue.
let mut reload_handler = DynamicReload::new(Some(vec!["target/debug"]),
Some("target/debug"),
Search::Default);
// test_shared is generated in build.rs
match reload_handler.add_library("test_shared", PlatformName::Yes) {
Ok(lib) => plugs.add_plugin(&lib),
Err(e) => {
println!("Unable to load dynamic lib, err {:?}", e);
return;
}
}
//
// While this is running (printing a number) change return value in file src/test_shared.rs
// build the project with cargo build and notice that this code will now return the new value
//
loop {
reload_handler.update(Plugins::reload_callback, &mut plugs);
if plugs.plugins.len() > 0 {
let fun = &plugs.plugins[0].1;
println!("Value {}", fun());
}
// Wait for 0.5 sec
thread::sleep(Duration::from_millis(500));
}
}
I still have to keep Arc<Lib> inside the vector because Symbol doesn't implement PartialEq.
How to store a reference without having to deal with lifetimes?
The answer in 98% of the cases is: you don't. Lifetimes are one of the biggest reasons to use Rust. Lifetimes enforce, at compile time, that your references will always refer to something that is valid. If you wish to "ignore" lifetimes, then perhaps Rust may not be the best language to realize a particular design. You may need to pick a different language or design.
Is it a valid workaround to use std::mem::transmute to change Symbol's lifetime to 'static?
transmute is The Big Hammer, suitable for all sorts of good and bad ideas and implementations. I would encourage never using it directly, but instead wrapping it in a layer of abstraction that somehow helps you enforce the appropriate restrictions that make that particular transmute correct.
If you choose to use transmute, you are assuming the full responsibility that the compiler previously had. It will be up to you to ensure that the reference is always valid, otherwise you are invoking undefined behavior and your program is allowed to do any number of Very Bad things.
For your specific case, you may be able to use the Rental crate to keep around "the library" and "references into the library" in a single struct that hides the lifetimes of the Symbols. In fact, Rental uses libloading as the motivating example and libloading powers dynamic_reload. See Why can't I store a value and a reference to that value in the same struct? for more details and pitfalls.
I'm not optimistic that this will work because DynamicReload::update requires a &mut self. During that method call, it could easily invalidate all of the existing references.
See also:
Why can't I store a value and a reference to that value in the same struct?
How can I avoid a ripple effect from changing a concrete struct to generic?

thread '<main>' has overflowed its stack in Rust

I got an error trying this code, which realizes a simple linked list.
use std::rc::Rc;
use std::cell::RefCell;
struct Node {
a : Option<Rc<RefCell<Node>>>,
value: i32
}
impl Node {
fn new(value: i32) -> Rc<RefCell<Node>> {
let node = Node {
a: None,
value: value
};
Rc::new(RefCell::new(node))
}
}
fn main() {
let first = Node::new(0);
let mut t = first.clone();
for i in 1 .. 10_000
{
if t.borrow().a.is_none() {
t.borrow_mut().a = Some(Node::new(i));
}
if t.borrow().a.is_some() {
t = t.borrow().a.as_ref().unwrap().clone();
}
}
println!("Done!");
}
Why does it happen? Does this mean that Rust is not as safe as positioned?
UPD:
If I add this method, the program does not crash.
use std::mem;
impl Drop for Node {
fn drop(&mut self) {
let mut children = mem::replace(&mut self.a, None);
loop {
children = match children {
Some(mut n) => mem::replace(&mut n.borrow_mut().a, None),
None => break,
}
}
}
}
But I am not sure that this is the right solution.
Does this mean that Rust is not as safe as positioned?
Rust is only safe against certain kinds of failures; specifically memory corrupting crashes, which are documented here: http://doc.rust-lang.org/reference.html#behavior-considered-undefined
Unfortunately there is a tendency to sometimes expect Rust to be more robust against certain sorts of failures that are not memory corrupting. Specifically, you should read http://doc.rust-lang.org/reference.html#behavior-considered-undefined.
tl;dr: In Rust, many things can cause a panic. A panic will cause the current thread to halt, performing shutdown operations.
This may superficially appear similar to a memory corrupting crash from other languages, but it is important to understand that although it is an application failure, it is not a memory corrupting failure.
For example, you can treat panics like exceptions by running actions in a different thread and gracefully handling failure when the thread panics (for whatever reason).
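A minimal sketch of that idea using std::thread. run_isolated is a hypothetical name and the division is just a stand-in for real work:

```rust
use std::thread;

/// Hypothetical helper: runs a computation on a worker thread and turns
/// a panic in that thread into an ordinary Err value.
fn run_isolated(divisor: i32) -> Result<i32, String> {
    let handle = thread::spawn(move || {
        if divisor == 0 {
            panic!("divisor must not be zero");
        }
        100 / divisor
    });

    // join() returns Err if the worker panicked; the calling thread keeps running.
    handle
        .join()
        .map_err(|_| String::from("worker thread panicked"))
}
```

The panic message is still printed to stderr by the default panic hook, but the caller only sees an Err and can decide how to recover.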
In this specific example, you're using up too much memory on the stack.
This simple example will also fail:
fn main() {
let foo:&mut [i8] = &mut [1i8; 1024 * 1024];
}
(On most rustc versions; it depends on the stack size of that particular implementation.)
I would have thought that moving your allocations to the heap using Box::new() would fix it in this example...
use std::rc::Rc;
use std::cell::RefCell;
#[derive(Debug)]
struct Node {
a : Option<Box<Rc<RefCell<Node>>>>,
value: i32
}
impl Node {
fn new(value: i32) -> Box<Rc<RefCell<Node>>> {
let node = Node {
a: None,
value: value
};
Box::new(Rc::new(RefCell::new(node)))
}
}
fn main() {
let first = Node::new(0);
let mut t = first.clone();
for i in 1 .. 10000
{
if t.borrow().a.is_none() {
t.borrow_mut().a = Some(Node::new(i));
}
if t.borrow().a.is_some() {
let c:Box<Rc<RefCell<Node>>>;
{ c = t.borrow().a.as_ref().unwrap().clone(); }
t = c;
println!("{:?}", t);
}
}
println!("Done!");
}
...but it doesn't. I don't really understand why, but hopefully someone else can look at this and post a more authoritative answer about what exactly is causing stack exhaustion in your code.
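One likely explanation, consistent with the Drop workaround in the question's update: when the head of a long chain is dropped, the automatically generated drop code drops the next node from inside the drop of the current one, so the recursion depth grows with the length of the list and the stack runs out at the end of main. That would also explain why boxing does not help, since nodes behind an Rc are already on the heap. Below is a sketch of an iterative Drop under that assumption; build is a hypothetical helper for constructing a long list:

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    a: Option<Rc<RefCell<Node>>>,
    value: i32,
}

impl Drop for Node {
    fn drop(&mut self) {
        // Detach the tail so that dropping `self` cannot recurse into it.
        let mut next = self.a.take();
        while let Some(rc) = next {
            match Rc::try_unwrap(rc) {
                // We were the only owner: unlink this node's tail and continue.
                Ok(cell) => {
                    let mut node = cell.into_inner();
                    next = node.a.take();
                    // `node` is dropped here with an empty tail, so no recursion.
                }
                // Someone else still holds this node; they keep the rest alive.
                Err(_) => break,
            }
        }
    }
}

/// Hypothetical helper: builds a singly linked chain of `n` nodes.
fn build(n: i32) -> Rc<RefCell<Node>> {
    let first = Rc::new(RefCell::new(Node { a: None, value: 0 }));
    let mut t = first.clone();
    for i in 1..n {
        let next = Rc::new(RefCell::new(Node { a: None, value: i }));
        t.borrow_mut().a = Some(next.clone());
        t = next;
    }
    first
}
```

With this in place the nodes are released in a loop, so the stack depth no longer depends on how long the list is.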
For those who come here and are specifically interested in the case where the large struct is a contiguous chunk of memory (instead of a tree of boxes), I found this GitHub issue with further discussion, as well as a solution that worked for me:
https://github.com/rust-lang/rust/issues/53827
Vec's method into_boxed_slice() returns a Box<[T]>, and does not overflow the stack for me.
vec![-1; 3000000].into_boxed_slice()
Note one difference between the vec! macro and array expressions, quoted from the docs:
This will use clone to duplicate an expression, so one should be careful using this with types having a nonstandard Clone implementation.
There is also the with_capacity() method on Vec, which is shown in the into_boxed_slice() examples.
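For completeness, a small sketch of the with_capacity() route; big_buffer is a hypothetical helper name. The elements are written straight into the heap allocation, so no large array is ever built on the stack:

```rust
/// Hypothetical helper: returns a heap-allocated slice of `len` copies of -1.
fn big_buffer(len: usize) -> Box<[i32]> {
    // Reserve the full size up front, then fill it in place on the heap.
    let mut v: Vec<i32> = Vec::with_capacity(len);
    v.resize(len, -1);
    v.into_boxed_slice()
}
```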

Resources