Parse eof or a character in nom - rust

I'm parsing with the nom library.
I would like to match on something that is followed by an 'end', without consuming it.
An end for me is either eof or a char that satisfies a function f: Fn(char) -> bool.
I can use f with nom::character::complete::satisfy, nom::bytes::take_while, or many other nom functions.
The incompatibilities between eof and f make it impossible for me to compose them, whether with nom combinators or my own.
This is illegal, because of an opaque type:
fn end(i: &str) -> IResult<&str, &str> {
match eof(i) {
o @ Ok(_) => o,
e @ Err(_) => peek(satisfy(f)),
}
}
This is also illegal:
alt((eof, satisfy(f)))
I can't even make this work:
alt((eof, char(' ')))
I can't even make THIS work (because the match arms have incompatible types??):
fn end(i: &str) -> IResult<&str, ()> {
match eof(i) {
o @ Ok((sur, res)) => Ok((i, ())),
e @ Err(_) => match satisfy(f) {
Ok(_) => Ok((i, ())),
e @ Err(_) => e,
},
}
}

I made it work by providing an error myself:
fn end(i: &str) -> IResult<&str, ()> {
match eof::<&str, ()>(i) {
o @ Ok((_, _)) => Ok((i, ())),
e @ Err(_) => match satisfy::<_, _, ()>(is_url_terminative)(i) {
Ok(_) => Ok((i, ())),
Err(_) => Err(nom::Err::Error(Error {
input: i,
code: nom::error::ErrorKind::Permutation,
})),
},
}
}
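Stripped of nom's types, the condition being checked is small: the input is at an end if it is empty, or if its first character satisfies f, and nothing is consumed either way. A minimal std-only sketch of that logic follows; at_end and is_terminator are hypothetical names (the latter standing in for f), not part of nom:

```rust
/// Hypothetical stand-in for the question's predicate `f`.
fn is_terminator(c: char) -> bool {
    c.is_whitespace() || c == '"'
}

/// Returns true if `input` is at an "end": either empty (eof) or starting
/// with a character accepted by `f`. The input is only inspected, never consumed.
fn at_end(input: &str, f: impl Fn(char) -> bool) -> bool {
    match input.chars().next() {
        None => true,    // eof: nothing left to read
        Some(c) => f(c), // peek at the next character without advancing
    }
}
```

In the nom version above, these two cases correspond to the eof branch and the satisfy branch, both of which return the original input i untouched.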

Related

Keep the LocatedSpan of an "outer" parser with nom in Rust

I am trying to write a parser using the nom crate (and the nom_locate) that can parse strings such as u{12a}, i.e.:
u\{([0-9a-fA-F]{1,6})\}
I wrote the following parser combinator:
use nom::bytes::complete::{take_while_m_n};
use nom::character::complete::{char};
use nom::combinator::{map_opt, map_res};
use nom::sequence::{delimited, preceded};
pub type LocatedSpan<'a> = nom_locate::LocatedSpan<&'a str>;
pub type IResult<'a, T> = nom::IResult<LocatedSpan<'a>, T>;
#[derive(Clone, Debug)]
pub struct LexerError<'a>(LocatedSpan<'a>, String);
fn expect<'a, F, E, T>(
mut parser: F,
err_msg: E,
) -> impl FnMut(LocatedSpan<'a>) -> IResult<Option<T>>
where
F: FnMut(LocatedSpan<'a>) -> IResult<T>,
E: ToString,
{
use nom::error::Error as NomError;
move |input| match parser(input) {
Ok((remaining, output)) => Ok((remaining, Some(output))),
Err(nom::Err::Error(NomError { input, code: _ }))
| Err(nom::Err::Failure(NomError { input, code: _ })) => {
let err = LexerError(input, err_msg.to_string());
// TODO Report error.
println!("error: {:?}", err);
Ok((input, None))
}
Err(err) => Err(err),
}
}
fn lit_str_unicode_char(input: LocatedSpan) -> IResult<char> {
let parse_hex = take_while_m_n(1, 6, |c: char| c.is_ascii_hexdigit());
// FIXME Figure out a way to keep correct span here.
let parse_delim_hex = preceded(
char('u'),
delimited(
char('{'),
expect(parse_hex, "expected 1-6 hex digits"),
expect(char('}'), "expected closing brace"),
),
);
let parse_u32 = map_res(parse_delim_hex, move |hex| match hex {
None => Err("cannot parse number"),
Some(hex) => match u32::from_str_radix(hex.fragment(), 16) {
Ok(val) => Ok(val),
Err(_) => Err("invalid number"),
},
});
map_opt(parse_u32, std::char::from_u32)(input)
}
fn main() {
let raw = "u{61}";
let span = LocatedSpan::new(raw);
let result = lit_str_unicode_char(span);
println!("{:#?}", result);
}
This works correctly, I am able to get the Unicode character out of the string. However, this approach does not keep the proper spans, i.e.:
u{123}
\..../ <--- the span I want
\/ <--- the span I get
I figured I could wrap the parse_delim_hex in a recognize, which would keep the span correctly, but then I couldn't use the following parsers to "understand" the digits.
How should I get around this issue?
I think you misunderstand the purpose of the first parameter of IResult.
Quote from the documentation:
The Ok side is a pair containing the remainder of the input (the part of the data that was not parsed) and the produced value.
The span you are looking at is not the data that was found, but instead the data that was left over afterwards.
I think what you were trying to achieve is something along those lines:
use nom::bytes::complete::take_while_m_n;
use nom::character::complete::char;
use nom::combinator::{map_opt, map_res};
use nom::{InputTake, Offset};
use nom::sequence::{delimited, preceded};
pub type LocatedSpan<'a> = nom_locate::LocatedSpan<&'a str>;
pub type IResult<'a, T> = nom::IResult<LocatedSpan<'a>, T>;
#[derive(Clone, Debug)]
pub struct LexerError<'a>(LocatedSpan<'a>, String);
fn expect<'a, F, E, T>(
mut parser: F,
err_msg: E,
) -> impl FnMut(LocatedSpan<'a>) -> IResult<Option<T>>
where
F: FnMut(LocatedSpan<'a>) -> IResult<T>,
E: ToString,
{
use nom::error::Error as NomError;
move |input| match parser(input) {
Ok((remaining, output)) => Ok((remaining, Some(output))),
Err(nom::Err::Error(NomError { input, code: _ }))
| Err(nom::Err::Failure(NomError { input, code: _ })) => {
let err = LexerError(input, err_msg.to_string());
// TODO Report error.
println!("error: {:?}", err);
Ok((input, None))
}
Err(err) => Err(err),
}
}
fn lit_str_unicode_char(input: LocatedSpan) -> IResult<(char, LocatedSpan)> {
let parse_hex = take_while_m_n(1, 6, |c: char| c.is_ascii_hexdigit());
// FIXME Figure out a way to keep correct span here.
let parse_delim_hex = preceded(
char('u'),
delimited(
char('{'),
expect(parse_hex, "expected 1-6 hex digits"),
expect(char('}'), "expected closing brace"),
),
);
let parse_u32 = map_res(parse_delim_hex, |hex| match hex {
None => Err("cannot parse number"),
Some(hex) => match u32::from_str_radix(hex.fragment(), 16) {
Ok(val) => Ok(val),
Err(_) => Err("invalid number"),
},
});
// Do the actual parsing
let (s, ch) = map_opt(parse_u32, std::char::from_u32)(input)?;
let span_offset = input.offset(&s);
let span = input.take(span_offset);
Ok((s, (ch, span)))
}
fn main() {
let span = LocatedSpan::new("u{62} bbbb");
let (rest, (ch, span)) = lit_str_unicode_char(span).unwrap();
println!("Leftover: {:?}", rest);
println!("Character: {:?}", ch);
println!("Parsed Span: {:?}", span);
}
Leftover: LocatedSpan { offset: 5, line: 1, fragment: " bbbb", extra: () }
Character: 'b'
Parsed Span: LocatedSpan { offset: 0, line: 1, fragment: "u{62}", extra: () }
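The offset/take step at the end of lit_str_unicode_char boils down to this: the consumed span is the prefix of the input whose length is the difference between the input and the remainder. A plain &str sketch of that idea, without nom or nom_locate, might look as follows; consumed is a hypothetical helper and assumes rest is a suffix of input:

```rust
/// Returns the part of `input` that a parser consumed, given the remainder `rest`.
/// Assumes `rest` is a suffix of `input`, as a parser's leftover input is.
fn consumed<'a>(input: &'a str, rest: &str) -> &'a str {
    // Byte offset at which the remainder starts inside the original input.
    let offset = input.len() - rest.len();
    &input[..offset]
}
```

For the input "u{62} bbbb" with remainder " bbbb", this yields "u{62}", matching the fragment of the parsed span printed above.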

Handling a Response from a Result using match

I'm trying to return a Result from a function and extract its value.
I'm using an i32 and a &str, and I get a mismatched types error in the match statement (I can't use two different types in a match).
How do I fix this?
fn use_result(par: i32)-> Result<i32, &'static str> {
if par == 0 {
Err("some error")
} else {
println!("par is 1");
Ok(par)
}
}
fn main() {
// Result
let res = match use_result(1) {
Ok(v) => v,
Err(e) => e,
};
// Do something with res: v or res: e
}
In Rust, every variable has a single type. In the code you have now, res is either a &'static str or an i32, which is not allowed.
Your options are:
Return early
fn main() {
let res: i32 = match use_result(1) {
Ok(v) => v,
Err(e) => return,
};
}
Different code in each match arm
fn main() {
match use_result(1) {
Ok(v) => {
handle_success(v);
},
Err(e) => {
handle_error(e);
},
};
}
Return an enum
Enums allow you to express that a type is "one of these possible variants" in a type safe way:
enum IntOrString {
Int(i32),
String(&'static str),
}
fn main() {
let i_or_s: IntOrString = match use_result(1) {
Ok(v) => IntOrString::Int(v),
Err(e) => IntOrString::String(e),
};
}
But this is a bit weird, since Result<i32, &'static str> is already an enum. If you want to do anything with an IntOrString, you'll still need to match on it later on (or use an if let, etc.).
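To make the "match on it later" point concrete, here is a small self-contained sketch. use_result and IntOrString are taken from the snippets above; to_int_or_string and describe are hypothetical helpers added only for illustration:

```rust
fn use_result(par: i32) -> Result<i32, &'static str> {
    if par == 0 {
        Err("some error")
    } else {
        Ok(par)
    }
}

#[derive(Debug, PartialEq)]
enum IntOrString {
    Int(i32),
    String(&'static str),
}

/// Converts the Result into the single-typed enum.
fn to_int_or_string(par: i32) -> IntOrString {
    match use_result(par) {
        Ok(v) => IntOrString::Int(v),
        Err(e) => IntOrString::String(e),
    }
}

/// Any later use of the value has to match again to find out which variant it holds.
fn describe(value: &IntOrString) -> String {
    match value {
        IntOrString::Int(v) => format!("int: {}", v),
        IntOrString::String(s) => format!("string: {}", s),
    }
}
```

The second match in describe does the same work a match on the original Result would have done, which is why the extra enum rarely pays off here.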
Panic
fn main() {
let res: i32 = match use_result(1) {
Ok(v) => v,
Err(e) => panic!("cannot be zero"),
};
}
This is more cleanly expressed as use_result(1).unwrap(). It's usually not what you want, since it doesn't allow the caller of the function to recover/handle the error. But if you're calling this from main(), the error has nowhere else to propagate to, so unwrapping is usually OK.

How do I return a custom error from a splitn fn?

I've written this function to parse a comma-separated string and return either a Vec<&str> or a custom error:
fn parse_input(s: &str) -> Result<Vec<&str>, ParseError> {
match s.splitn(2, ',').next() {
Ok(s) => s.collect::<Vec<&str>>(),
Err(err) => Err(ParseError::InvalidInput)
}
}
The compiler gives me this response:
Ok(s) => s.collect::<Vec<&str>>(),
^^^^^ expected enum `Option`, found enum `Result`
...
Err(err) => Err(ParseError::InvalidInput)
^^^^^^^^ expected enum `Option`, found enum `Result`
My problem is that I don't understand how to change the code to satisfy the compiler. What is wrong with this function?
.next() returns an Option<&str>, i.e. Some(s) or None.
fn parse_input(s: &str) -> Result<Vec<&str>, ParseError> {
match s.splitn(2, ',').next() {
Some(s) => s.collect::<Vec<&str>>(),
None => Err(ParseError::InvalidInput),
}
}
Just like you wrapped the error with Err to make it a Result, the non-error needs to be wrapped with Ok.
fn parse_input(s: &str) -> Result<Vec<&str>, ParseError> {
match s.splitn(2, ',').next() {
Some(s) => Ok(s.collect::<Vec<&str>>()),
None => Err(ParseError::InvalidInput),
}
}
Whether it’s the pattern-matched Some(s) or the outer parameter s, s.collect() doesn’t make sense. Going by your description, maybe you want to split the string on commas, collect that into a Vec, and produce an error if the result doesn’t consist of exactly two parts?
fn parse_input(s: &str) -> Result<Vec<&str>, ParseError> {
let parts: Vec<_> = s.split(',').collect();
if parts.len() == 2 {
Ok(parts)
} else {
Err(ParseError::InvalidInput)
}
}
Maybe a pair would be better? Also, if more than one comma is acceptable and you just want to split on the first one, split_once fits perfectly.
fn parse_input(s: &str) -> Result<(&str, &str), ParseError> {
s.split_once(',').ok_or(ParseError::InvalidInput)
}
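A self-contained sketch of the split_once version is shown here. The question never shows the definition of ParseError, so the single-variant enum below is an assumption made only so the snippet compiles:

```rust
// Assumed definition: the original post does not show ParseError.
#[derive(Debug, PartialEq)]
enum ParseError {
    InvalidInput,
}

/// Splits on the first comma only; input without a comma is an error.
fn parse_input(s: &str) -> Result<(&str, &str), ParseError> {
    s.split_once(',').ok_or(ParseError::InvalidInput)
}
```

With this version, "a,b" gives Ok(("a", "b")), "a,b,c" gives Ok(("a", "b,c")) because only the first comma splits, and "abc" gives Err(ParseError::InvalidInput).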

How can I get the T from an Option<T> when using syn?

I'm using syn to parse Rust code. When I read a named field's type using field.ty, I get a syn::Type. When I print it using quote!{#ty}.to_string() I get "Option<String>".
How can I get just "String"? I want to use #ty in quote! to print "String" instead of "Option<String>".
I want to generate code like:
impl Foo {
pub fn set_bar(&mut self, v: String) {
self.bar = Some(v);
}
}
starting from
struct Foo {
bar: Option<String>
}
My attempt:
let ast: DeriveInput = parse_macro_input!(input as DeriveInput);
let data: Data = ast.data;
match data {
Data::Struct(ref data) => match data.fields {
Fields::Named(ref fields) => {
fields.named.iter().for_each(|field| {
let name = &field.ident.clone().unwrap();
let ty = &field.ty;
quote!{
impl Foo {
pub fn set_bar(&mut self, v: #ty) {
self.bar = Some(v);
}
}
};
});
}
_ => {}
},
_ => panic!("You can derive it only from struct"),
}
My updated version of the response from @Boiethios, tested and used in a public crate, with support for several syntaxes for Option:
Option
std::option::Option
::std::option::Option
core::option::Option
::core::option::Option
fn extract_type_from_option(ty: &syn::Type) -> Option<&syn::Type> {
use syn::{GenericArgument, Path, PathArguments, PathSegment};
fn extract_type_path(ty: &syn::Type) -> Option<&Path> {
match *ty {
syn::Type::Path(ref typepath) if typepath.qself.is_none() => Some(&typepath.path),
_ => None,
}
}
// TODO store (with lazy static) the vec of string
// TODO maybe optimization, reverse the order of segments
fn extract_option_segment(path: &Path) -> Option<&PathSegment> {
let idents_of_path = path
.segments
.iter()
.into_iter()
.fold(String::new(), |mut acc, v| {
acc.push_str(&v.ident.to_string());
acc.push('|');
acc
});
vec!["Option|", "std|option|Option|", "core|option|Option|"]
.into_iter()
.find(|s| &idents_of_path == *s)
.and_then(|_| path.segments.last())
}
extract_type_path(ty)
.and_then(|path| extract_option_segment(path))
.and_then(|path_seg| {
let type_params = &path_seg.arguments;
// It should have only one angle-bracketed param ("<String>"):
match *type_params {
PathArguments::AngleBracketed(ref params) => params.args.first(),
_ => None,
}
})
.and_then(|generic_arg| match *generic_arg {
GenericArgument::Type(ref ty) => Some(ty),
_ => None,
})
}
You should do something like this untested example:
use syn::{GenericArgument, Path, PathArguments, Type};
fn extract_type_from_option(ty: &Type) -> Type {
fn path_is_option(path: &Path) -> bool {
path.leading_colon.is_none()
&& path.segments.len() == 1
&& path.segments.iter().next().unwrap().ident == "Option"
}
match ty {
Type::Path(typepath) if typepath.qself.is_none() && path_is_option(&typepath.path) => {
// Get the first segment of the path (there is only one, in fact: "Option"):
let type_params = &typepath.path.segments.first().unwrap().arguments;
// It should have only one angle-bracketed param ("<String>"):
let generic_arg = match type_params {
PathArguments::AngleBracketed(params) => params.args.first().unwrap(),
_ => panic!("TODO: error handling"),
};
// This argument must be a type (cloned, because we only borrow the input):
match generic_arg {
GenericArgument::Type(ty) => ty.clone(),
_ => panic!("TODO: error handling"),
}
}
_ => panic!("TODO: error handling"),
}
}
There isn't much to explain: it just "unrolls" the various components of a type:
Type -> TypePath -> Path -> PathSegment -> PathArguments -> AngleBracketedGenericArguments -> GenericArgument -> Type.
If there is an easier way to do that, I would be happy to know it.
Note that since syn is a parser, it works with tokens. You cannot know for sure that this is an Option. The user could, for example, type std::option::Option, or write type MaybeString = std::option::Option<String>;. You cannot handle those arbitrary names.

Conditions in a match arm without destructuring

use std::cmp::Ordering;
fn cmp(a: i32, b: i32) -> Ordering {
match {
_ if a < b => Ordering::Less,
_ if a > b => Ordering::Greater,
_ => Ordering::Equal,
}
}
fn main() {
let x = 5;
let y = 10;
println!("{}", match cmp(x, y) {
Ordering::Less => "less",
Ordering::Greater => "greater",
Ordering::Equal => "equal",
});
}
How to use match with conditions, without destructuring (because there's nothing to destructure), in the function cmp above?
The code has been adapted from the well-known example in the book which uses only if/else, however, it does not work:
src/main.rs:5:9: 5:10 error: unexpected token: `_`
src/main.rs:5 _ if a < b => Ordering::Less,
^
Could not compile `match_ordering`.
I am using rustc 1.0.0-nightly (3ef8ff1f8 2015-02-12 00:38:24 +0000).
This would work:
fn cmp(a: i32, b: i32) -> Ordering {
match (a, b) {
(a,b) if a < b => Ordering::Less,
(a,b) if a > b => Ordering::Greater,
_ => Ordering::Equal,
}
}
but it would use destructuring. Is there any other way, or is this just the most idiomatic, clean way of writing it?
You need to match on something, i.e. match { ... } isn't valid because there needs to be something between the match and the {. Since you don't care about the value itself, matching on unit, (), should be fine: match () { ... }, e.g.:
match () {
_ if a < b => Ordering::Less,
_ if a > b => Ordering::Greater,
_ => Ordering::Equal
}
(To be strict: match { _ => ... } is actually trying to parse the { ... } as the match head (i.e. the expression being matched on), and _ isn't valid at the start of an expression, hence the error.)
On idiomaticity: I personally think expressing a series of conditions with short results (like this) is fine with match and a series of _ if ...s as you have done.
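For completeness, here is the whole cmp function from the question rewritten with match (), as a sketch that should compile on current stable Rust:

```rust
use std::cmp::Ordering;

fn cmp(a: i32, b: i32) -> Ordering {
    // Matching on unit: the patterns are irrelevant, only the guards decide.
    match () {
        _ if a < b => Ordering::Less,
        _ if a > b => Ordering::Greater,
        _ => Ordering::Equal,
    }
}
```

For plain integers, the standard library's a.cmp(&b) returns the same Ordering, so the guard chain is mostly useful when the conditions are more involved than a simple comparison.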
