Rust: how to override the default way of reporting error?

When an error occurs in Rust, most standard and third-party libraries (e.g. anyhow) simply print the error message to stdout/stderr. I need to override that with my own function: for example, call the Windows API MessageBox with the location information (the file and line where the panic happened).
So far I have tried several approaches, and each one lacks something.
Option 1: Write my own error type and override its Debug implementation. The problem is that this error type needs to be able to wrap any underlying error (e.g. dyn std::error::Error), so that I can still write
fn test() -> std::result::Result<(), CustomError> {
    ...
    if has_windows_error {
        return Err(WindowsError(123))?;
    }
    if has_io_error {
        return Err(IoError(123))?;
    }
    Ok(())
}
However, this means CustomError needs to implement both std::error::Error and a blanket From<E> for every E: std::error::Error, and these two conflict with each other (the blanket impl overlaps with the standard library's From<T> for T).
Option 2: Write my own result type and implement Try for it. This approach requires the experimental #![feature(try_trait_v2)].
Option 3: Use std::panic::set_hook(), then never use ? in the code and instead immediately unwrap() every Result. This is needed because if I allow the error to bubble up, I lose the location information in PanicInfo.
I'm relatively new to Rust, so I may be missing many things. Option 3 is what I'm currently using, though I don't like how it forces me to abandon idiomatic coding style.
What is the cleanest, least intrusive way to implement custom error output currently?

Option 3 is most certainly not the way to go here. In Rust, panics mean unrecoverable errors, and in my experience with the community over the last 2 years this guideline is taken pretty seriously. Moreover, options 1 and 2 add pretty substantial side effects to operations which were never meant to hide things like that.
Also, I'm not sure what you mean by "3rd party libraries (e.g. anyhow) would simply print the error message to stdout/stderr." If your function returns an anyhow::Result<T> and an error occurs, then all that happens is that the error is converted into an anyhow::Error; it's not printed unless you print it. (The one place where printing happens implicitly is when main itself returns an Err: the runtime then writes the error's Debug output to stderr.)
The bottom line is that there's no "error handling framework" in Rust. Result is not special at all; in fact, I encourage you to read its source code. If you want to handle a result in a certain way, then match on it and write the code to handle it. Moreover, the panic-handling machinery is not meant to elevate panics to something like C++ exceptions. You can read more about the motivations for it here.
It sounds like what would work best for you is using anyhow or something like it, and at some point in your program matching on a result, converting the anyhow::Error (or the std::error::Error implementer of your choice) to a string, and opening a message box. If you want to include file and line information about where the error occurred, that can be added to the error's message/data via the file! and line! macros from the standard library. This could look something like the following:
pub fn main() {
    if let Err(error) = try_main() {
        // show_message_box implementation not shown
        show_message_box(error.to_string());
    }
}
pub fn try_main() -> anyhow::Result<()> {
    // code which makes extensive use of `?`
    Ok(())
}
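To make the file! and line! suggestion concrete, here is a minimal sketch using only the standard library. LocatedError, the located! macro and parse_port are hypothetical names made up for illustration; they are not part of anyhow or std. The string produced by to_string() is what you would hand to your message box function.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type that remembers where it was created.
#[derive(Debug)]
struct LocatedError {
    message: String,
    file: &'static str,
    line: u32,
}

impl fmt::Display for LocatedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} (at {}:{})", self.message, self.file, self.line)
    }
}

impl Error for LocatedError {}

/// Hypothetical macro: builds a `LocatedError` stamped with the call site.
/// `file!()` and `line!()` resolve to where `located!` is invoked.
macro_rules! located {
    ($($arg:tt)*) => {
        LocatedError {
            message: format!($($arg)*),
            file: file!(),
            line: line!(),
        }
    };
}

/// Example function (an assumption, not from the question) that still uses `?`.
fn parse_port(text: &str) -> Result<u16, Box<dyn Error>> {
    let port: u16 = text
        .parse()
        .map_err(|e| located!("invalid port {:?}: {}", text, e))?;
    Ok(port)
}
```

Because LocatedError implements std::error::Error, it converts into Box<dyn Error> (or anyhow::Error) through ? as usual, so the location travels with the error up to the single place where you show the message box.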
Note that Rust's approach to error handling is different from many other widely used languages, so it's going to require a shift in coding style. Errors are deliberately meant to be intrusive in your code so that you actually handle them properly, which is something I came to greatly appreciate with time. In fact, Rust's idiosyncrasies make it a very opinionated language, so when you reach pain points I would advise against trying to solve them by using language features beyond their intended use, as that's more likely to cause additional pain points than to solve your problem.

Related

An easy way to find unwrap() usages of Result only

In Rust, two of the most commonly used enums, Option and Result, have a method with the same name, unwrap(). I'm not sure why the Rust authors gave both enums the same method name. It's clear that the two enums are somewhat similar, but that decision makes it harder to find all the usages of, say, only Result's method. I think it would be very useful in a Rust project if we could easily find all the places where we call unwrap() or something else that might panic, for example when we start off with a proof-of-concept implementation that is OK to panic but later decide to handle errors properly.
Option's unwrap() could also panic, of course, but usually we would have made sure that wouldn't be possible, so there is a clear difference, compared to Result, where we generally expect there might be an error. (Also, I know Option's unwrap() can generally be avoided by using alternatives, but sometimes it does make code simpler.)
Update
It seems from the comments I should probably clarify why I said sometimes Option's unwrapping should be considered safe. I guess an example would be best:
if o.is_none() {
    // ...
    return ...;
}
// ...
o.unwrap() // <--- Here I do NOT expect a None
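As a side note, the guarded unwrap() shown above can be written with let ... else (stable since Rust 1.65), which keeps the early return and removes the Option unwrap() from any search for unwrap() calls. A minimal sketch, with half_if_even as a hypothetical function invented for illustration:

```rust
/// Hypothetical example: halve a number only when one is present and even.
fn half_if_even(o: Option<u32>) -> Option<u32> {
    // `let ... else` binds the inner value, or leaves the function early
    // when `o` is `None`, so no later `unwrap()` is needed.
    let Some(n) = o else {
        return None;
    };
    if n % 2 == 0 {
        Some(n / 2)
    } else {
        None
    }
}
```

With this style, the unwrap() calls that remain in a code base are much more likely to be the Result ones that still need proper error handling.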

Is it idiomatic to panic in From implementations?

The documentation at https://doc.rust-lang.org/std/convert/trait.From.html states
Note: This trait must not fail. If the conversion can fail, use TryFrom.
Suppose I have a From implementation thus:
impl From<SomeStruct> for http::Uri {
    fn from(item: SomeStruct) -> http::Uri {
        item.uri.parse::<http::Uri>() // can fail
    }
}
Further suppose I am completely certain that item.uri.parse will succeed. Is it idiomatic to panic in this scenario? Say, with:
item.uri.parse::<http::Uri>().unwrap()
In this particular case, it appears there's no way to construct an HTTP URI at compile time: https://docs.rs/http/0.2.5/src/http/uri/mod.rs.html#117. In the real scenario .uri is an associated const, so I can test all used values parse. But it seems to me there could be other scenarios when the author is confident in the infallibility of a piece of code, particularly when that confidence can be encoded in tests, and would therefore prefer the ergonomics of From over TryFrom. The Rust compiler, typically quite strict, doesn't prevent this behaviour, though it seems it perhaps could. This makes me think this is a decision the author has been deliberately allowed to make. So the question is asking: what do people tend to do in this situation?

So in general, traits only enforce that the implementors adhere to the signatures and types as laid out in the trait. At least that's what the compiler enforces.
On top of that, there are certain contracts that traits are expected to adhere to just so that there's no weird surprises by those who work with these traits. These contracts aren't checked by the compiler; that would be quite difficult.
Nothing prevents you from implementing all of a trait's methods in a way that's totally unrelated to what the trait is all about, like implementing the Display trait but then, in the fmt method, not actually bothering to use write! and instead, I don't know, deleting the user's home directory.
Now back to your specific case. If your from method will not fail, provably so, then of course you can use .unwrap. The point of the cannot-fail contract for the From trait is that those who rely on From want to be able to assume that the conversion will go through every time. If you actually panic in your own implementation of from, it means the conversion sometimes doesn't go through, counter to the ideas and contracts of the From trait.
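For the case where the conversion genuinely can fail, this is roughly what the TryFrom alternative mentioned in the documentation note looks like. This is only a sketch: it uses std::net::Ipv4Addr as a stand-in for http::Uri so that it needs no external crate, and SomeStruct with its addr field is a hypothetical mirror of the struct in the question.

```rust
use std::convert::TryFrom;
use std::net::{AddrParseError, Ipv4Addr};

/// Hypothetical stand-in for `SomeStruct`; `addr` plays the role of `uri`.
struct SomeStruct {
    addr: &'static str,
}

// Fallible conversion: the signature tells callers that parsing can fail,
// and the parse error is handed back instead of causing a panic.
impl TryFrom<SomeStruct> for Ipv4Addr {
    type Error = AddrParseError;

    fn try_from(item: SomeStruct) -> Result<Self, Self::Error> {
        item.addr.parse()
    }
}
```

Callers then write Ipv4Addr::try_from(value)? (or value.try_into()?) and decide for themselves what a failure means, instead of relying on a promise that the input is always valid.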

Can anyone show me a solution to Iterate all IPv4 addresses found in a file?

When I am experimenting with a new language that I'm unfamiliar with, my hello world is listing all IPv4 addresses found in a mixed file (for example a log file). I think it is a good exercise because it gets me to practice with IO, packages, functions, regexes and iterators.
I tried for 2-3 hours to accomplish that in Rust, and I still haven't found an elegant way to do it. I'm obviously doing it wrong.
Can anyone show me their solution? Seeing the most efficient/elegant way would help me get unstuck. Or do you recommend that I keep bashing at it until I get it right?
The goal: pass a file name to a function that returns an iterator over all IPv4 addresses in that file.
I saw that Rust supports iterators as well as generators/yield. I would like to see solutions for both if possible.

For simplicity I skip error handling (using unwrap and expect instead), since it would hurt readability. For this kind of task you don't need any external crates (e.g. regex), because the parsing is already implemented in the standard library via FromStr. For per-line reading, the BufRead trait with a BufReader wrapper does the job. Note that this version expects every line to be exactly one address and panics otherwise. Composed, it becomes:
use std::{fs::File, io::{self, BufRead}, net::Ipv4Addr, path::Path};
fn iterate_over_ips(filename: impl AsRef<Path>) -> impl Iterator<Item = Ipv4Addr> {
    let file = File::open(filename).unwrap();
    io::BufReader::new(file)
        .lines()
        .map(|line| line.expect("line read").parse().expect("ip invalid format"))
}
Generators are an unstable feature (so the API may change at any time) and are for now mostly used internally by the compiler for asynchronous code. Iterators are far better suited to this particular task.
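Since the question asks about a mixed file such as a log, here is a hedged variation that pulls addresses out of arbitrary text instead of requiring one address per line. It is a different technique from the code above: each line is split on every character that is not a digit or a dot, and only the tokens that FromStr accepts are kept. The name ipv4_in_reader is hypothetical, and taking any BufRead (rather than a file name) is an assumption made so that the sketch can be tried on an in-memory string.

```rust
use std::io::BufRead;
use std::net::Ipv4Addr;

/// Hypothetical helper: yields every token in `reader` that parses as IPv4.
fn ipv4_in_reader(reader: impl BufRead) -> impl Iterator<Item = Ipv4Addr> {
    reader
        .lines()
        // Stop at the first line that cannot be read (e.g. invalid UTF-8).
        .map_while(Result::ok)
        .flat_map(|line| {
            // Split on anything that cannot be part of a dotted quad, then
            // keep only the tokens the standard library parser accepts.
            line.split(|c: char| !(c.is_ascii_digit() || c == '.'))
                .filter_map(|token| token.trim_matches('.').parse::<Ipv4Addr>().ok())
                .collect::<Vec<_>>()
        })
}
```

To read from a file, wrap it first: ipv4_in_reader(std::io::BufReader::new(File::open(path)?)). The per-line Vec keeps the closure simple at the cost of one small allocation per line.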

Is it safe and defined behavior to transmute between a T and an UnsafeCell<T>?

A recent question was looking for the ability to construct self-referential structures. In discussing possible answers for the question, one potential answer involved using an UnsafeCell for interior mutability and then "discarding" the mutability through a transmute.
Here's a small example of such an idea in action. I'm not deeply interested in the example itself, but it's just enough complication to require a bigger hammer like transmute as opposed to just using UnsafeCell::new and/or UnsafeCell::into_inner:
use std::{
    cell::UnsafeCell, mem, rc::{Rc, Weak},
};
// This is our real type.
struct ReallyImmutable {
    value: i32,
    myself: Weak<ReallyImmutable>,
}
fn initialize() -> Rc<ReallyImmutable> {
    // This mirrors ReallyImmutable but we use `UnsafeCell`
    // to perform some initial interior mutation.
    struct NotReallyImmutable {
        value: i32,
        myself: Weak<UnsafeCell<NotReallyImmutable>>,
    }
    let initial = NotReallyImmutable {
        value: 42,
        myself: Weak::new(),
    };
    // Without interior mutability, we couldn't update the `myself` field
    // after we've created the `Rc`.
    let second = Rc::new(UnsafeCell::new(initial));
    // Tie the recursive knot
    let new_myself = Rc::downgrade(&second);
    unsafe {
        // Should be safe as there can be no other accesses to this field
        (&mut *second.get()).myself = new_myself;
        // No one outside of this function needs the interior mutability
        // TODO: Is this call safe?
        mem::transmute(second)
    }
}
fn main() {
    let v = initialize();
    println!("{} -> {:?}", v.value, v.myself.upgrade().map(|v| v.value))
}
This code appears to print out what I'd expect, but that doesn't mean that it's safe or using defined semantics.
Is transmuting from an UnsafeCell<T> to a T memory safe? Does it invoke undefined behavior? What about transmuting in the opposite direction, from a T to an UnsafeCell<T>?

(I am still new to SO and not sure if "well, maybe" qualifies as an answer, but here you go. ;)
Disclaimer: The rules for these kinds of things are not (yet) set in stone. So, there is no definitive answer yet. I'm going to make some guesses based on (a) what kinds of compiler transformations LLVM does/we will eventually want to do, and (b) what kind of models I have in my head that would define the answer to this.
Also, I see two parts to this: The data layout perspective, and the aliasing perspective. The layout issue is that NotReallyImmutable could, in principle, have a totally different layout than ReallyImmutable. I don't know much about data layout, but with UnsafeCell becoming repr(transparent) and that being the only difference between the two types, I think the intent is for this to work. You are, however, relying on repr(transparent) being "structural" in the sense that it should allow you to replace things in larger types, which I am not sure has been written down explicitly anywhere. Sounds like a proposal for a follow-up RFC that extends the repr(transparent) guarantees appropriately?
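The part of the layout story that can be checked directly is the wrapper itself. The sketch below compares the size and alignment of T and UnsafeCell<T>; same_size_and_align is a hypothetical helper. It says nothing about the "structural" question raised above, namely whether two separately declared structs such as ReallyImmutable and NotReallyImmutable are laid out identically.

```rust
use std::cell::UnsafeCell;
use std::mem::{align_of, size_of};

/// Hypothetical check: do `T` and `UnsafeCell<T>` agree on size and alignment?
fn same_size_and_align<T>() -> bool {
    size_of::<T>() == size_of::<UnsafeCell<T>>()
        && align_of::<T>() == align_of::<UnsafeCell<T>>()
}
```

Matching size and alignment is necessary for the transmute to be meaningful, but it is not a proof of soundness on its own.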
As far as aliasing is concerned, the issue is breaking the rules around &T. I'd say that, as long as you never have a live &T around anywhere when writing through the &UnsafeCell<T>, you are good -- but I don't think we can guarantee that quite yet. Let's look in more detail.
Compiler perspective
The relevant optimizations here are the ones that exploit &T being read-only. So if you reordered the last two lines (transmute and the assignment), that code would likely be UB as we may want the compiler to be able to "pre-fetch" the value behind the shared reference and re-use that value later (i.e. after inlining this).
But in your code, we would only emit "read-only" annotations (noalias in LLVM) after the transmute comes back, and the data is indeed read-only starting there. So, this should be good.
Memory models
The "most aggressive" of my memory models essentially asserts that all values are always valid, and I think even that model should be fine with your code. &UnsafeCell is a special case in that model where validity just stops, and nothing is said about what lives behind this reference. The moment the transmute returns, we grab the memory it points to and make it all read-only, and even if we did that "recursively" through the Rc (which my model doesn't, but only because I couldn't figure out a good way to make it do so) you'd be fine as you don't mutate any more after the transmute. (As you may have noticed, this is the same restriction as in the compiler perspective. The point of these models is to allow compiler optimizations, after all. ;)
(As a side-note, I really wish miri was in better shape right now. Seems I have to try and get validation to work again in there, because then I could tell you to just run your code in miri and it'd tell you if that version of my model is okay with what you are doing :D )
I am thinking about other models currently that only check things "on access", but haven't worked out the UnsafeCell story for that model yet. What this example shows is that the model may have to contain ways for a "phase transition" of memory first being UnsafeCell, but later having normal sharing with read-only guarantees. Thanks for bringing this up, that will make for some nice examples to think about!
So, I think I can say that (at least from my side) there is the intent to allow this kind of code, and doing so does not seem to prevent any optimizations. Whether we'll actually manage to find a model that everybody can agree with and that still allows this, I cannot predict.
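Independently of how the transmute question is settled, the knot-tying in the question's example no longer needs unsafe at all: Rc::new_cyclic (stable since Rust 1.60) passes a Weak pointing at the allocation under construction to the closure that builds the value. This is an alternative technique, not an answer to the transmute question itself; the sketch reuses the type and function names from the question.

```rust
use std::rc::{Rc, Weak};

struct ReallyImmutable {
    value: i32,
    myself: Weak<ReallyImmutable>,
}

fn initialize() -> Rc<ReallyImmutable> {
    // `new_cyclic` hands the closure a `Weak` to the not-yet-finished
    // allocation, so the self-reference is stored without `UnsafeCell`
    // or `transmute`.
    Rc::new_cyclic(|weak| ReallyImmutable {
        value: 42,
        myself: weak.clone(),
    })
}
```

Inside the closure the Weak cannot be upgraded yet; it only becomes upgradable once new_cyclic returns the finished Rc.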
The opposite direction: T -> UnsafeCell<T>
Now, this is more interesting. The problem is that, as I said above, you must not have a &T live when writing through an UnsafeCell<T>. But what does "live" mean here? That's a hard question! In some of my models, this could be as weak as "a reference of that type exists somewhere and the lifetime is still active", i.e., it could have nothing to do with whether the reference is actually used. (That's useful because it lets us do more optimizations, like moving a load out of a loop even if we cannot prove that the loop ever runs -- which would introduce a use of an otherwise unused reference.) And since &T is Copy, you cannot even really get rid of such a reference either. So, if you have x: &T, then after let y: &UnsafeCell<T> = transmute(x), the old x is still around and its lifetime still active, so writing through y could well be UB.
I think you'd have to somehow restrict the aliasing that &T allows, very carefully making sure that nobody still holds such a reference. I'm not going to say "this is impossible" because people keep surprising me (especially in this community ;) but TBH I cannot think of a way to make this work. I'd be curious if you have an example though where you think this is reasonable.
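One related conversion that the standard library does offer as safe code starts from &mut T rather than &T: Cell::from_mut turns a &mut T into a &Cell<T>. It is not the &T case discussed above, but it illustrates the same requirement, since an exclusive reference guarantees that no &T to the value is live. A minimal sketch, with bump_twice as a hypothetical helper:

```rust
use std::cell::Cell;

/// Hypothetical helper: bumps a counter through a shared `&Cell<i32>` view.
fn bump_twice(counter: &mut i32) {
    // `&mut i32` guarantees exclusive access, so the standard library can
    // safely hand out a `&Cell<i32>` that points at the same memory.
    let cell: &Cell<i32> = Cell::from_mut(counter);
    let alias = cell; // shared references to the cell can be copied freely
    cell.set(cell.get() + 1);
    alias.set(alias.get() + 1);
}
```

Once the borrow produced by from_mut ends, the original &mut i32 is usable again and sees both increments.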

Is it possible to set a function to only be inlined during a release build?

Possible example:
#[inline(release)]
fn foo() {
    println!("moo");
}
If not, is it possible to only include an attribute based on build type or another attribute?

[...] is it possible to only include an attribute based on build type [...]?
Yes. That's what cfg_attr is for:
#[cfg_attr(not(debug_assertions), inline(always))]
#[cfg_attr(debug_assertions, inline(never))]
fn foo() {
    println!("moo")
}
This is probably the closest you will get to your goal. Note that inline annotations (even with "always" and "never") can be ignored by the compiler. There are good reasons for that, as you can read below.
However: what do you want to achieve?
Humans are pretty bad at inlining decisions, while compilers are pretty smart. Even without #[inline], the compiler will inline the function in release mode whenever it's a good idea to do so. And it won't be inlined in debug mode.
If you don't have a very good and special reason to tinker with the inlining yourself, you should not touch it! The compiler will do the right thing in nearly all cases :)
Even the reference says:
The compiler automatically inlines functions based on internal heuristics. Incorrectly inlining functions can actually make the program slower, so it should be used with care.
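For the second half of the question (attaching an arbitrary attribute only under some condition), cfg_attr works with any predicate and any attribute, not just inline. A small sketch under stated assumptions: moo and build_kind are hypothetical names, and debug_assertions is used as the stand-in for "debug build" because it is on by default in the dev profile and off in the release profile, although a profile can override it.

```rust
// Each `cfg_attr` expands to its inner attribute only when the predicate holds.
#[cfg_attr(not(debug_assertions), inline)] // plain hint, release-like builds only
#[cfg_attr(test, allow(dead_code))] // a non-inline attribute, test builds only
fn moo() -> &'static str {
    "moo"
}

/// Hypothetical helper: reports which side of the predicate is active.
fn build_kind() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}
```

The cfg! macro evaluates the same predicate as an ordinary boolean expression, which is handy for checking which attributes a given build actually received.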
