How to write fallible nom parsers

If I want to write a nom parser that could fail internally, how do I propagate the error?
As an example, something to parse a NaiveDate might look like:
fn parse_date(i: &str) -> IResult<&str, NaiveDate> {
    map(take(10), |s| NaiveDate::parse_from_str(s, "%m/%d/%Y")?)(i)
}
parse_from_str may fail, and it returns its own ParseResult type.
I rely on its success or failure to determine whether this parser succeeds.
How can I convert an inner Result (in this case chrono::format::ParseResult) to something that works with nom?

You can use nom's map_res combinator. map_res reports the failure only as ErrorKind::MapRes (no custom error), but if you only need to know whether the conversion succeeded, that should suffice.
use chrono::NaiveDate;
use nom::bytes::streaming::take;
use nom::combinator::map_res;
use nom::error::{Error, ErrorKind};
use nom::IResult;
fn parse_date(i: &str) -> IResult<&str, NaiveDate> {
    map_res(take(10usize), |s| NaiveDate::parse_from_str(s, "%m/%d/%Y"))(i)
}
fn main() {
    assert_eq!(
        parse_date("01/31/2022: rest").unwrap(),
        (": rest", NaiveDate::from_ymd(2022, 01, 31))
    );
    assert_eq!(
        parse_date("yy/xx/2022").unwrap_err(),
        nom::Err::Error(Error::new("yy/xx/2022", ErrorKind::MapRes))
    );
}
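To make the idea behind map_res concrete, here is a minimal std-only sketch of the same pattern: run a parser, feed its output to a fallible conversion, and turn the inner error into a parser error. This is not nom's implementation. ParseError, PResult, take, and parse_mdy are hypothetical stand-ins, and the date is returned as a plain (month, day, year) tuple rather than a chrono NaiveDate.

```rust
use std::num::ParseIntError;

// Hypothetical stand-in for a parser error: which stage failed, and on what input.
#[derive(Debug, PartialEq)]
enum ParseError<'a> {
    TooShort(&'a str),
    MapRes(&'a str),
}

// Same shape as nom's IResult: the remaining input plus the parsed value.
type PResult<'a, T> = Result<(&'a str, T), ParseError<'a>>;

// Split off the first `n` bytes, failing if the input is too short
// (or if `n` does not fall on a character boundary).
fn take<'a>(input: &'a str, n: usize) -> PResult<'a, &'a str> {
    match (input.get(..n), input.get(n..)) {
        (Some(head), Some(rest)) => Ok((rest, head)),
        _ => Err(ParseError::TooShort(input)),
    }
}

// The fallible inner conversion, with its own unrelated error type.
fn parse_mdy(s: &str) -> Result<(u32, u32, i32), ParseIntError> {
    let mut parts = s.splitn(3, '/');
    let month: u32 = parts.next().unwrap_or("").parse()?;
    let day: u32 = parts.next().unwrap_or("").parse()?;
    let year: i32 = parts.next().unwrap_or("").parse()?;
    Ok((month, day, year))
}

// The map_res idea: an inner Err becomes a parser error for the original input.
fn parse_date(input: &str) -> PResult<'_, (u32, u32, i32)> {
    let (rest, head) = take(input, 10)?;
    match parse_mdy(head) {
        Ok(date) => Ok((rest, date)),
        Err(_) => Err(ParseError::MapRes(input)),
    }
}

fn main() {
    println!("{:?}", parse_date("01/31/2022: rest"));
    println!("{:?}", parse_date("yy/xx/2022"));
}
```

As in the nom version, the inner error's details are dropped; only the fact that the conversion failed is reported.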

Related

Rust error handling - capturing multiple errors

I started learning Rust last week by reading books and articles, while trying to convert some code from other languages.
I came across a situation that I've tried to illustrate with the code below (a simplified version of what I was converting from another language):
#[derive(Debug)]
struct InvalidStringSize;
impl std::fmt::Display for InvalidStringSize {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        write!(f, "string is too short")
    }
}
impl std::error::Error for InvalidStringSize {}
pub fn extract_codes_as_ints(
    message: String,
) -> Result<(i32, i32, i32), Box<dyn std::error::Error>> {
    if message.len() < 20 {
        return Err(Box::new(InvalidStringSize {}));
    }
    let code1: i32 = message[0..3].trim().parse()?;
    let code2: i32 = message[9..14].trim().parse()?;
    let code3: i32 = message[17..20].trim().parse()?;
    Ok((code1, code2, code3))
}
So basically I want to extract 3 integers from specific positions of the given string (I could also try to check the other characters for some patterns, but I've left that part out).
I was wondering whether there is a way to "catch" or verify all three results of the parse calls at the same time. I don't want to add a match block for each; I'd just like to check whether any of them resulted in an error, and return another error in that case. Does that make sense?
The only solution I could think of so far would be to create another function with all parses, and match its result. Is there any other way to do this?
Also, any feedback or suggestions on other parts of the code are very welcome; I'm struggling to find the "right way" to do things in Rust.

The idiomatic way to accomplish this is to define your own error type and return it, with a From<T> implementation for each error type T that can occur in your function. The ? operator will do .into() conversions to match the error type your function is declared to return.
A boxed error is overkill here; just declare an enum listing all of the ways the function can fail. The variant for an integer parse error can even capture the caught error.
use std::fmt::{Display, Formatter, Error as FmtError};
use std::error::Error;
use std::num::ParseIntError;
#[derive(Debug, Clone)]
pub enum ExtractCodeError {
    InvalidStringSize,
    InvalidInteger(ParseIntError),
}
impl From<ParseIntError> for ExtractCodeError {
    fn from(e: ParseIntError) -> Self {
        Self::InvalidInteger(e)
    }
}
impl Error for ExtractCodeError {}
impl Display for ExtractCodeError {
    fn fmt(&self, f: &mut Formatter) -> Result<(), FmtError> {
        match self {
            Self::InvalidStringSize => write!(f, "string is too short"),
            Self::InvalidInteger(e) => write!(f, "invalid integer: {}", e),
        }
    }
}
Now we just need to change your function's return type and have it return ExtractCodeError::InvalidStringSize when the length is too short. Nothing else needs to change as a ParseIntError is automatically converted into an ExtractCodeError:
pub fn extract_codes_as_ints(
    message: String,
) -> Result<(i32, i32, i32), ExtractCodeError> {
    if message.len() < 20 {
        return Err(ExtractCodeError::InvalidStringSize);
    }
    let code1: i32 = message[0..3].trim().parse()?;
    let code2: i32 = message[9..14].trim().parse()?;
    let code3: i32 = message[17..20].trim().parse()?;
    Ok((code1, code2, code3))
}
As an added bonus, callers of this function will be able to inspect errors more easily than with a boxed dyn Error.
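As a sketch of what that inspection can look like, a caller can match on the variants directly instead of downcasting a boxed error. The enum is repeated in trimmed form (Display and Error impls omitted) so the block compiles on its own. The function takes &str here rather than String, and describe and the sample inputs are hypothetical.

```rust
use std::num::ParseIntError;

#[derive(Debug)]
pub enum ExtractCodeError {
    InvalidStringSize,
    InvalidInteger(ParseIntError),
}

impl From<ParseIntError> for ExtractCodeError {
    fn from(e: ParseIntError) -> Self {
        Self::InvalidInteger(e)
    }
}

// Assumes ASCII input: byte-range slicing panics if a range splits a character.
pub fn extract_codes_as_ints(message: &str) -> Result<(i32, i32, i32), ExtractCodeError> {
    if message.len() < 20 {
        return Err(ExtractCodeError::InvalidStringSize);
    }
    let code1: i32 = message[0..3].trim().parse()?;
    let code2: i32 = message[9..14].trim().parse()?;
    let code3: i32 = message[17..20].trim().parse()?;
    Ok((code1, code2, code3))
}

// Hypothetical caller: each failure mode gets its own match arm.
fn describe(message: &str) -> String {
    match extract_codes_as_ints(message) {
        Ok((a, b, c)) => format!("codes: {a}, {b}, {c}"),
        Err(ExtractCodeError::InvalidStringSize) => "message too short".to_string(),
        Err(ExtractCodeError::InvalidInteger(e)) => format!("bad number: {e}"),
    }
}

fn main() {
    println!("{}", describe("123------45678---901"));
    println!("{}", describe("abc"));
}
```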
In more complex cases, such as where you'd want to tweak the error slightly for each possible occurrence of a ParseIntError, you can use .map_err() on results to transform the error. For example:
something_that_can_fail.map_err(|e| SomeOtherError::Foo(e))?;
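A fuller sketch of the map_err approach, under the assumption that you want to know which field failed to parse. FieldError, parse_field, and the field names are hypothetical and not part of the code above.

```rust
use std::num::ParseIntError;
use std::ops::Range;

// Hypothetical error type that records which field was bad.
#[derive(Debug)]
enum FieldError {
    TooShort,
    BadField {
        name: &'static str,
        source: ParseIntError,
    },
}

// Slice one field out of the message and parse it, tagging any
// parse failure with the field's name via map_err.
fn parse_field(message: &str, range: Range<usize>, name: &'static str) -> Result<i32, FieldError> {
    message
        .get(range)
        .ok_or(FieldError::TooShort)?
        .trim()
        .parse::<i32>()
        .map_err(|source| FieldError::BadField { name, source })
}

fn extract_codes(message: &str) -> Result<(i32, i32, i32), FieldError> {
    Ok((
        parse_field(message, 0..3, "code1")?,
        parse_field(message, 9..14, "code2")?,
        parse_field(message, 17..20, "code3")?,
    ))
}

fn main() {
    println!("{:?}", extract_codes("123------45678---901"));
    println!("{:?}", extract_codes("123------45x78---901"));
}
```

Using str::get instead of indexing also means an out-of-range slice becomes an error value rather than a panic.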

Propagating details of an error upwards in Rust

I would like the details of an error to be propagated upwards. I used error-chain previously, but as far as I can tell it has not been maintained or kept compatible with the rest of the ecosystem.
For example:
use std::str::FromStr;
use anyhow::Result;
fn fail() -> Result<u64> {
    Ok(u64::from_str("Some String")?)
}
fn main() {
    if let Err(e) = fail() {
        println!("{:?}", e);
    }
}
The error I am getting is:
invalid digit found in string
I would need the error message to have the key details, including at the point of failure, for example:
- main: invalid digit found in string
- fail: "Some String" is not a valid digit
What's the best way of doing this?

anyhow provides the context() and with_context() methods for that:
use anyhow::{Context, Result};
use std::str::FromStr;
fn fail() -> Result<u64> {
    let s = "Some String";
    Ok(u64::from_str(s).with_context(|| format!("\"{s}\" is not a valid digit"))?)
}
fn main() {
    if let Err(e) = fail() {
        println!("{:?}", e);
    }
}
"Some String" is not a valid digit
Caused by:
invalid digit found in string
If you want custom formatting, you can use the Error::chain() method:
if let Err(e) = fail() {
    for err in e.chain() {
        println!("{err}");
    }
}
"Some String" is not a valid digit
invalid digit found in string
And if you want additional details (e.g. where the error happened), you can use a custom error type and downcast it (for error source you can also capture a backtrace).
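As a rough illustration of the custom-error-type route, here is a std-only sketch (no anyhow) of a wrapper error that exposes its cause through Error::source, a loop that walks the chain, and a downcast of the root cause. ContextError and chain_messages are hypothetical names chosen for this example.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

// Hypothetical wrapper: a context message plus the underlying cause.
#[derive(Debug)]
struct ContextError {
    context: String,
    source: ParseIntError,
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.context)
    }
}

impl Error for ContextError {
    // Exposing the cause here is what makes the chain walkable.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

fn fail(s: &str) -> Result<u64, ContextError> {
    s.parse::<u64>().map_err(|source| ContextError {
        context: format!("\"{}\" is not a valid digit", s),
        source,
    })
}

// Collect the message of the error and of every cause below it.
fn chain_messages(err: &(dyn Error + 'static)) -> Vec<String> {
    let mut messages = Vec::new();
    let mut current = Some(err);
    while let Some(e) = current {
        messages.push(e.to_string());
        current = e.source();
    }
    messages
}

fn main() {
    if let Err(e) = fail("Some String") {
        for message in chain_messages(&e) {
            println!("{message}");
        }
        // Downcast the cause back to its concrete type.
        let root = e.source().and_then(|s| s.downcast_ref::<ParseIntError>());
        println!("root cause is a ParseIntError: {}", root.is_some());
    }
}
```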

This is a tricky thing to accomplish and I'm not sure that there is a simple and non-invasive way to capture all of the details of any possible error without knowledge of the particular function being invoked. For example, we may want to display some arguments to the function call that failed, but evaluating other arguments might be problematic -- they may not even be able to be turned into strings.
Maybe the argument is another function call, too, so should we capture its arguments or only its return value?
I whipped up this example quickly to show that we can at least fairly trivially capture the exact source expression. It provides a detail_error! macro that takes an expression that produces Result<T, E> and emits an expression that produces Result<T, DetailError<E>>. The DetailError wraps the original error value and additionally contains a reference to a string of the original source code fed to the macro.
use std::error::Error;
use std::str::FromStr;
#[derive(Debug)]
struct DetailError<T: Error> {
    expr: &'static str,
    cause: T,
}
impl<T: Error> DetailError<T> {
    pub fn new(expr: &'static str, cause: T) -> DetailError<T> {
        DetailError { expr, cause }
    }
    // Some getters we don't use in this example, but should be present to have
    // a complete API.
    #[allow(dead_code)]
    pub fn cause(&self) -> &T {
        &self.cause
    }
    #[allow(dead_code)]
    pub fn expr(&self) -> &'static str {
        self.expr
    }
}
impl<T: Error> Error for DetailError<T> {}
impl<T: Error> std::fmt::Display for DetailError<T> {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> Result<(), std::fmt::Error> {
        write!(f, "While evaluating ({}): ", self.expr)?;
        std::fmt::Display::fmt(&self.cause, f)
    }
}
macro_rules! detail_error {
    ($e:expr) => {
        ($e).map_err(|err| DetailError::new(stringify!($e), err))
    };
}
fn main() {
    match detail_error!(u64::from_str("Some String")) {
        Ok(_) => {}
        Err(e) => {
            println!("{}", e);
        }
    };
}
This produces the runtime output:
While evaluating (u64::from_str("Some String")): invalid digit found in string
Note that this only shows the string because it's a literal in the source. If you pass a variable/parameter instead, you will see that identifier in the error message instead of the string.
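One possible extension, sketched under the assumption that the call site is also worth recording: the standard file!() and line!() macros can be captured alongside the expression text. Located and located! are hypothetical names, and this variant passes a variable to show that the identifier, not its runtime value, ends up in the message.

```rust
use std::fmt;
use std::num::ParseIntError;
use std::str::FromStr;

// Hypothetical wrapper that records the expression text and its location.
#[derive(Debug)]
struct Located<E> {
    expr: &'static str,
    file: &'static str,
    line: u32,
    cause: E,
}

impl<E: fmt::Display> fmt::Display for Located<E> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{}:{}: while evaluating ({}): {}",
            self.file, self.line, self.expr, self.cause
        )
    }
}

// Wrap the error of a Result-producing expression with source information.
macro_rules! located {
    ($e:expr) => {
        ($e).map_err(|cause| Located {
            expr: stringify!($e),
            file: file!(),
            line: line!(),
            cause,
        })
    };
}

fn parse(input: &str) -> Result<u64, Located<ParseIntError>> {
    // The message will mention `input`, not the string it holds at runtime.
    located!(u64::from_str(input))
}

fn main() {
    if let Err(e) = parse("Some String") {
        println!("{}", e);
    }
}
```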

When you run your app with the environment variable RUST_BACKTRACE set to 1 or full, you'll get more error details without needing to recompile your program. That, however, doesn't mean you'll get an extra message like "Some String" is not a valid digit, as the parsing function simply doesn't generate one.

How can I use Stream::map with a function that returns Result?

I've got the following piece of code (see playground):
use futures::{stream, Future, Stream}; // 0.1.25
use std::num::ParseIntError;
fn into_many(i: i32) -> impl Stream<Item = i32, Error = ParseIntError> {
    stream::iter_ok(0..i)
}
fn convert_to_string(number: i32) -> Result<String, ParseIntError> {
    Ok(number.to_string())
}
fn main() {
    println!("start:");
    let vec = into_many(10)
        .map(|number| convert_to_string(number))
        .collect()
        .wait()
        .unwrap();
    println!("vec={:#?}", vec);
    println!("finish:");
}
It outputs the following (i.e., a Vec<Result<String, ParseIntError>>):
start:
vec=[
Ok(
"0"
),
Ok(
"1"
),
Ok(
"2"
), ...
Is there any way to make it output a Vec<String>, and if any error happens, immediately stop execution and return from the function (e.g., like this example)?
Note: I do want to use futures::Stream (0.1.25), even if it doesn't make sense for this particular example.

The following modification of the code in your question gets the result you want:
use futures::{stream, Future, Stream}; // 0.1.25
use std::num::ParseIntError;
fn into_many(i: i32) -> impl Stream<Item = i32, Error = ParseIntError> {
    stream::iter_ok(0..i)
}
fn convert_to_string(number: i32) -> Result<String, ParseIntError> {
    Ok(number.to_string())
}
fn main() {
    println!("start:");
    let vec: Result<Vec<String>, ParseIntError> = into_many(10)
        .map(|number| convert_to_string(number))
        .collect()
        .wait()
        .unwrap()
        .into_iter()
        .collect();
    println!("vec={:#?}", vec);
    println!("finish:");
}
Since your current code returned a Vec, we can turn that into an iterator and collect that into the type you want. Type annotations are needed so that collect knows what type to collect the iterator into.
Note that the collect method on the Iterator trait isn't to be confused with the collect method on a Stream.
Finally, while this works, it may not be exactly what you want, since it still waits for all results from the stream to be collected into a vector before the second collect transforms that vector. I don't have experience with futures, so I'm not sure how to stop at the first error (it is probably possible, but may require a less neat functional-style solution).
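For comparison, the plain Iterator collect mentioned above does stop at the first Err when it collects into a Result. A small std-only sketch with no futures involved; parse_counting is a hypothetical helper that also counts how many items were examined:

```rust
use std::num::ParseIntError;

// Parse every input, returning the collected result together with
// the number of items the iterator actually looked at.
fn parse_counting(inputs: &[&str]) -> (Result<Vec<i32>, ParseIntError>, usize) {
    let mut attempts = 0;
    let result: Result<Vec<i32>, ParseIntError> = inputs
        .iter()
        .map(|s| {
            attempts += 1;
            s.parse::<i32>()
        })
        .collect();
    (result, attempts)
}

fn main() {
    println!("{:?}", parse_counting(&["1", "2", "3"]));
    // Only two items are examined: collecting stops at "x".
    println!("{:?}", parse_counting(&["1", "x", "3", "4"]));
}
```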

map with a function that returns Result
Don't do this; map is the wrong tool here. Instead, use and_then:
let vec = into_many(10)
    .and_then(|number| convert_to_string(number))
    .collect()
    .wait()
    .unwrap();
You should practice with simpler Rust concepts like Option, Result, and iterators before diving into futures. Many concepts transfer over.
See also:
How do I unwrap an arbitrary number of nested Option types?
What is the idiomatic way to handle/unwrap nested Result types?
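The same distinction shows up on a plain Result, which may be an easier place to see it. In this std-only sketch (half, nested, and flat are hypothetical names), map with a fallible function produces a nested Result, while and_then flattens it:

```rust
// A fallible step: halve even numbers, reject odd ones.
fn half(n: i32) -> Result<i32, String> {
    if n % 2 == 0 {
        Ok(n / 2)
    } else {
        Err(format!("{n} is odd"))
    }
}

// map wraps the closure's Result inside the outer one.
fn nested(n: i32) -> Result<Result<i32, String>, String> {
    Ok(n).map(half)
}

// and_then chains the fallible step and keeps a single Result.
fn flat(n: i32) -> Result<i32, String> {
    Ok(n).and_then(half)
}

fn main() {
    println!("{:?}", nested(8));
    println!("{:?}", flat(8));
    println!("{:?}", flat(3));
}
```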

How to get the name of current program without the directory part?

In Bash this would be ${0##*/}.
use std::env;
use std::path::Path;
fn prog() -> String {
    let prog = env::args().next().unwrap();
    String::from(Path::new(&prog).file_name().unwrap().to_str().unwrap())
}
fn main() {
    println!("{}", prog());
}
Is there a better way? (I particularly dislike those numerous unwrap()s.)

If you don't care about why you can't get the program name, you can handle all the potential errors with a judicious mix of map and and_then. Additionally, return an Option to indicate possible failure:
use std::env;
use std::path::Path;
use std::ffi::OsStr;
fn prog() -> Option<String> {
    env::args().next()
        .as_ref()
        .map(Path::new)
        .and_then(Path::file_name)
        .and_then(OsStr::to_str)
        .map(String::from)
}
fn main() {
    println!("{:?}", prog());
}
If you wanted to follow delnan's awesome suggestion to use std::env::current_exe (which I just learned about!), replace env::args().next() with env::current_exe().ok().
If you do want to know why you can't get the program name (and knowing why is usually the first step to fixing a problem), then check out ker's answer.
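A sketch of the current_exe variant suggested above, with the path handling split into a small helper so it can be exercised on fixed paths. file_name_of is a hypothetical name, and the sample path is only illustrative.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::Path;

// Final path component as an owned String, or None if the path has
// no file name or the name is not valid UTF-8.
fn file_name_of(path: &Path) -> Option<String> {
    path.file_name().and_then(OsStr::to_str).map(String::from)
}

// Program name taken from the executable path rather than argv[0].
fn prog() -> Option<String> {
    env::current_exe().ok().as_deref().and_then(file_name_of)
}

fn main() {
    println!("{:?}", prog());
    println!("{:?}", file_name_of(Path::new("/usr/local/bin/mytool")));
}
```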

You can also get rid of the unwraps and still report all error causes properly (instead of munching them into a "something failed" None). You aren't even required to specify the full paths to the conversion methods:
fn prog() -> Result<String, ProgError> {
    let path = try!(env::current_exe());
    let name = try!(path.file_name().ok_or(ProgError::NoFile));
    let s_name = try!(name.to_str().ok_or(ProgError::NotUtf8));
    Ok(s_name.to_owned())
}
With the ? operator, which has since replaced the now-deprecated try! macro, this can also be written as a single method chain:
fn prog() -> Result<String, ProgError> {
    Ok(env::current_exe()?
        .file_name().ok_or(ProgError::NoFile)?
        .to_str().ok_or(ProgError::NotUtf8)?
        .to_owned())
}
Of course this has the prerequisite of the ProgError type:
use std::io::Error;
#[derive(Debug)]
enum ProgError {
    NoFile,
    NotUtf8,
    Io(Error),
}
impl From<Error> for ProgError {
    fn from(err: Error) -> ProgError {
        ProgError::Io(err)
    }
}
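Putting the pieces together, a self-contained sketch might look like the following. The Display and Error impls, the message texts, and the name_of helper are additions for illustration and are not part of the answer above.

```rust
use std::env;
use std::error::Error;
use std::fmt;
use std::io;
use std::path::Path;

#[derive(Debug)]
enum ProgError {
    NoFile,
    NotUtf8,
    Io(io::Error),
}

impl From<io::Error> for ProgError {
    fn from(err: io::Error) -> ProgError {
        ProgError::Io(err)
    }
}

// Assumed message texts, so that each cause can be reported to the user.
impl fmt::Display for ProgError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ProgError::NoFile => write!(f, "path has no file name"),
            ProgError::NotUtf8 => write!(f, "file name is not valid UTF-8"),
            ProgError::Io(e) => write!(f, "cannot determine executable path: {}", e),
        }
    }
}

impl Error for ProgError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            ProgError::Io(e) => Some(e),
            _ => None,
        }
    }
}

// Hypothetical helper: the path-to-name step, separated out for testing.
fn name_of(path: &Path) -> Result<String, ProgError> {
    let name = path.file_name().ok_or(ProgError::NoFile)?;
    let s_name = name.to_str().ok_or(ProgError::NotUtf8)?;
    Ok(s_name.to_owned())
}

fn prog() -> Result<String, ProgError> {
    // The io::Error from current_exe is converted by the From impl.
    name_of(&env::current_exe()?)
}

fn main() {
    match prog() {
        Ok(name) => println!("{}", name),
        Err(e) => eprintln!("error: {}", e),
    }
}
```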

Just one more Option version :)
fn prog() -> Option<String> {
    std::env::current_exe()
        .ok()?
        .file_name()?
        .to_str()?
        .to_owned()
        .into()
}

Using convert::Into with enum to unwrap and convert value

I'm starting to get comfortable with Rust, but some things about lifetimes still trip me up. In this particular case, I want an enum whose variants wrap different types, so that I can create strongly typed query parameters for a URL (though the specific use case is irrelevant), and I want to convert the wrapped value into a &str. Here's an example of what I want to do:
enum Param<'a> {
    MyBool(bool),
    MyLong(i64),
    MyStr(&'a str),
}
impl<'a> Param<'a> {
    fn into(self) -> (&'static str, &'a str) {
        match self {
            Param::MyBool(b) => ("my_bool", &b.to_string()), // clearly wrong
            Param::MyLong(i) => ("my_long", &i.to_string()), // clearly wrong
            Param::MyStr(s) => ("my_str", s),
        }
    }
}
What I ended up doing is this to deal with the obvious lifetime issue (and yes, it's obvious to me why the lifetime isn't long enough for the into() function):
enum Param<'a> {
    MyBool(&'a str), // no more static typing :(
    MyLong(&'a str), // no more static typing :(
    MyStr(&'a str),
}
impl<'a> Param<'a> {
    fn into(self) -> (&'static str, &'a str) {
        match self {
            Param::MyBool(b) => ("my_bool", b),
            Param::MyLong(i) => ("my_long", i),
            Param::MyStr(s) => ("my_str", s),
        }
    }
}
This seems like an ugly workaround in a case where what I really want is to guarantee the static typing of certain params, because now it's the code constructing the enum that is responsible for the proper type conversion. I'm curious whether there is a way to do this. And yes, at some point I need a &str, as that is a parameter elsewhere, specifically:
let body = url::form_urlencoded::serialize(
    vec![Param::MyBool(&true.to_string()).into()].into_iter());
I went through a whole bunch of things like trying to return String instead of &str from into(), but that only caused conversion issues down the line with a map() of String -> &str. Having the tuple correct from the start is the easiest thing, rather than fighting the compiler at every turn after that.
Update:
Ok, so I went back to a (String,String) tuple in the into() function for the enum. It turns out that there is an "owned" version of the url::form_urlencoded::serialize() function which this is compatible with.
pub fn serialize_owned(pairs: &[(String, String)]) -> String
But, now I'm also trying to use the same pattern for the query string in the hyper::URL, specifically:
fn set_query_from_pairs<'a, I>(&mut self, pairs: I)
where I: Iterator<Item=(&'a str, &'a str)>
and then I try to use map() on the iterator that I have from the (String,String) tuple:
params: Iterator<Item=(String, String)>
url.set_query_from_pairs(params.map(|x: (String, String)| -> (&str, &str) {
    let (ref k, ref v) = x;
    (k, v)
}));
But this gets error: x.0 does not live long enough. Ref seems correct in this case, right? If I don't use ref, then it's k/v that don't live long enough. Is there something 'simple' that I'm missing in this?

It is not really clear why you can't do this:
enum Param<'a> {
    MyBool(bool),
    MyLong(i64),
    MyStr(&'a str),
}
impl<'a> Param<'a> {
    fn into(self) -> (&'static str, String) {
        match self {
            Param::MyBool(b) => ("my_bool", b.to_string()),
            Param::MyLong(i) => ("my_long", i.to_string()),
            Param::MyStr(s) => ("my_str", s.into()),
        }
    }
}
(into() for the &str -> String conversion was slightly more efficient than to_string() in older Rust versions; in current Rust the two perform the same.)
You can always get a &str from String, e.g. with deref coercion or explicit slicing.
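One way this can play out, sketched with std only: collect the owned pairs into a Vec first, then lend out &str views from that Vec. The references stay valid because the Vec outlives the iterator, which is what a closure that receives an owned (String, String) and returns references into it cannot guarantee. The method is named into_pair here to avoid confusion with Into::into, and join_pairs is a hypothetical stand-in for an API that wants an iterator of (&str, &str); it does no URL encoding.

```rust
enum Param<'a> {
    MyBool(bool),
    MyLong(i64),
    MyStr(&'a str),
}

impl<'a> Param<'a> {
    // Same conversion as above, under a different (hypothetical) name.
    fn into_pair(self) -> (&'static str, String) {
        match self {
            Param::MyBool(b) => ("my_bool", b.to_string()),
            Param::MyLong(i) => ("my_long", i.to_string()),
            Param::MyStr(s) => ("my_str", s.into()),
        }
    }
}

// Hypothetical consumer of borrowed pairs (no percent-encoding is done).
fn join_pairs<'a, I>(pairs: I) -> String
where
    I: Iterator<Item = (&'a str, &'a str)>,
{
    pairs
        .map(|(k, v)| format!("{k}={v}"))
        .collect::<Vec<_>>()
        .join("&")
}

fn build_query(params: Vec<Param<'_>>) -> String {
    // Keep the owned Strings alive in a Vec...
    let owned: Vec<(&'static str, String)> =
        params.into_iter().map(Param::into_pair).collect();
    // ...and borrow from it while the consumer runs.
    join_pairs(owned.iter().map(|(k, v)| (*k, v.as_str())))
}

fn main() {
    let params = vec![Param::MyBool(true), Param::MyLong(42), Param::MyStr("hi")];
    println!("{}", build_query(params));
}
```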
