Keep the LocatedSpan of an "outer" parser with nom in Rust - rust

I am trying to write a parser using the nom crate (and the nom_locate) that can parse strings such as u{12a}, i.e.:
u\{([0-9a-fA-F]{1,6})\}
I wrote the following parser combinator:
use nom::bytes::complete::{take_while_m_n};
use nom::character::complete::{char};
use nom::combinator::{map_opt, map_res};
use nom::sequence::{delimited, preceded};
pub type LocatedSpan<'a> = nom_locate::LocatedSpan<&'a str>;
pub type IResult<'a, T> = nom::IResult<LocatedSpan<'a>, T>;
#[derive(Clone, Debug)]
pub struct LexerError<'a>(LocatedSpan<'a>, String);
fn expect<'a, F, E, T>(
mut parser: F,
err_msg: E,
) -> impl FnMut(LocatedSpan<'a>) -> IResult<Option<T>>
where
F: FnMut(LocatedSpan<'a>) -> IResult<T>,
E: ToString,
{
use nom::error::Error as NomError;
move |input| match parser(input) {
Ok((remaining, output)) => Ok((remaining, Some(output))),
Err(nom::Err::Error(NomError { input, code: _ }))
| Err(nom::Err::Failure(NomError { input, code: _ })) => {
let err = LexerError(input, err_msg.to_string());
// TODO Report error.
println!("error: {:?}", err);
Ok((input, None))
}
Err(err) => Err(err),
}
}
fn lit_str_unicode_char(input: LocatedSpan) -> IResult<char> {
let parse_hex = take_while_m_n(1, 6, |c: char| c.is_ascii_hexdigit());
// FIXME Figure out a way to keep correct span here.
let parse_delim_hex = preceded(
char('u'),
delimited(
char('{'),
expect(parse_hex, "expected 1-6 hex digits"),
expect(char('}'), "expected closing brace"),
),
);
let parse_u32 = map_res(parse_delim_hex, move |hex| match hex {
None => Err("cannot parse number"),
Some(hex) => match u32::from_str_radix(hex.fragment(), 16) {
Ok(val) => Ok(val),
Err(_) => Err("invalid number"),
},
});
map_opt(parse_u32, std::char::from_u32)(input)
}
fn main() {
let raw = "u{61}";
let span = LocatedSpan::new(raw);
let result = lit_str_unicode_char(span);
println!("{:#?}", result);
}
This works correctly, I am able to get the Unicode character out of the string. However, this approach does not keep the proper spans, i.e.:
u{123}
\..../ <--- the span I want
\/ <--- the span I get
I figured I could wrap the parse_delim_hex in a recognize, which would keep the span correctly, but then I couldn't use the following parsers to "understand" the digits.
How should I get around this issue?

I think you misunderstand the purpose of the first parameter of IResult.
Quote from the documentation:
The Ok side is a pair containing the remainder of the input (the part of the data that was not parsed) and the produced value.
The span you are looking at is not the data that was found, but instead the data that was left over afterwards.
I think what you were trying to achieve is something along those lines:
use nom::bytes::complete::take_while_m_n;
use nom::character::complete::char;
use nom::combinator::{map_opt, map_res};
use nom::{InputTake, Offset};
use nom::sequence::{delimited, preceded};
pub type LocatedSpan<'a> = nom_locate::LocatedSpan<&'a str>;
pub type IResult<'a, T> = nom::IResult<LocatedSpan<'a>, T>;
#[derive(Clone, Debug)]
pub struct LexerError<'a>(LocatedSpan<'a>, String);
fn expect<'a, F, E, T>(
mut parser: F,
err_msg: E,
) -> impl FnMut(LocatedSpan<'a>) -> IResult<Option<T>>
where
F: FnMut(LocatedSpan<'a>) -> IResult<T>,
E: ToString,
{
use nom::error::Error as NomError;
move |input| match parser(input) {
Ok((remaining, output)) => Ok((remaining, Some(output))),
Err(nom::Err::Error(NomError { input, code: _ }))
| Err(nom::Err::Failure(NomError { input, code: _ })) => {
let err = LexerError(input, err_msg.to_string());
// TODO Report error.
println!("error: {:?}", err);
Ok((input, None))
}
Err(err) => Err(err),
}
}
fn lit_str_unicode_char(input: LocatedSpan) -> IResult<(char, LocatedSpan)> {
let parse_hex = take_while_m_n(1, 6, |c: char| c.is_ascii_hexdigit());
// FIXME Figure out a way to keep correct span here.
let parse_delim_hex = preceded(
char('u'),
delimited(
char('{'),
expect(parse_hex, "expected 1-6 hex digits"),
expect(char('}'), "expected closing brace"),
),
);
let parse_u32 = map_res(parse_delim_hex, |hex| match hex {
None => Err("cannot parse number"),
Some(hex) => match u32::from_str_radix(hex.fragment(), 16) {
Ok(val) => Ok(val),
Err(_) => Err("invalid number"),
},
});
// Do the actual parsing
let (s, ch) = map_opt(parse_u32, std::char::from_u32)(input)?;
let span_offset = input.offset(&s);
let span = input.take(span_offset);
Ok((s, (ch, span)))
}
fn main() {
let span = LocatedSpan::new("u{62} bbbb");
let (rest, (ch, span)) = lit_str_unicode_char(span).unwrap();
println!("Leftover: {:?}", rest);
println!("Character: {:?}", ch);
println!("Parsed Span: {:?}", span);
}
Leftover: LocatedSpan { offset: 5, line: 1, fragment: " bbbb", extra: () }
Character: 'b'
Parsed Span: LocatedSpan { offset: 0, line: 1, fragment: "u{62}", extra: () }

Related

in expressions, `_` can only be used on the left-hand side of an assignment

I've the following code I implemented to return an error when a string doesn't contain a match to something like "Mark, 55" to return an error. The compiler complains about my code Err(error) => Err(ParsePersonError::ParseInt(_))
use std::num::ParseIntError;
use std::str::FromStr;
#[derive(Debug, PartialEq)]
struct Person {
name: String,
age: usize,
}
// We will use this error type for the `FromStr` implementation.
#[derive(Debug, PartialEq)]
enum ParsePersonError {
// Empty input string
Empty,
// Incorrect number of fields
BadLen,
// Empty name field
NoName,
// Wrapped error from parse::<usize>()
ParseInt(ParseIntError),
}
// My implementation
impl FromStr for Person {
type Err = ParsePersonError;
fn from_str(s: &str) -> Result<Person, Self::Err> {
if s.len() == 0 {
Err(ParsePersonError::Empty)
}
else {
let v: Vec<&str> = s.split(",").collect();
println!("{:?}",v);
if &v[0]== &""{
Err(ParsePersonError::NoName)
}
else if v.len()!=2 {
Err(ParsePersonError::BadLen)
}
else {
let num = match v[1].parse::<usize>() {
Ok(n) => {
let name = v[0].to_string();
let age = n;
return Ok(Person {name,age})
},
Err(error) => Err(ParsePersonError::ParseInt(_))
};
Err(ParsePersonError::ParseInt(_))
}
}
}
}
However, the compiler doesn't complain about the same code in my test case.
ParsePersonError::ParseInt(_))
#[test]
fn missing_name_and_age() {
assert!(matches!(
",".parse::<Person>(),
Err(ParsePersonError::NoName | ParsePersonError::ParseInt(_))
));
}
What am I missing?
If you de-sugar the match macro in test, it is something like
let result = ",".parse::<Person>();
let r = match result {
Err(ParsePersonError::NoName) => true,
Err(ParsePersonError::ParseInt(_)) => true,
_ => false,
};
assert!(r);
Essentially the '_' is to hold the value attached to ParseInt. Hence an assignment. This is what the compiler error message means.

Match arms have incompatible types. Expected struc `NaiveDate`, found `()`

I'm new to Rust and am trying to wrap my head around error handling.
I'm trying to return error if parsing the date goes wrong, here is the function:
pub fn create_posts(contents: &Vec<String>) -> Result<Vec<Post>, CreatePostError> {
const TITLE_SEP: &str = "Title: ";
const DESC_SEP: &str = "Description: ";
const DATE_SEP: &str = "Date: ";
const TAGS_SEP: &str = "Tags: ";
let mut posts: Vec<Post> = Vec::new();
for entry in contents {
let lines = entry.lines().collect::<Vec<_>>();
let metadata = lines[0].contains(&TITLE_SEP)
&& lines[1].contains(&DESC_SEP)
&& lines[2].contains(&DATE_SEP)
&& lines[3].contains(&TAGS_SEP);
if metadata {
let date = &lines[2][DATE_SEP.len()..];
let parsed_date = match NaiveDate::parse_from_str(date, "%Y-%m-%d") {
Ok(parsed_date) => parsed_date,
Err(e) => eprintln!("Error: {:?}", CreatePostError::ParseError { inner_err: e }),
};
let tags: Vec<String> = lines[3][TAGS_SEP.len()..]
.split(", ")
.map(|s| s.to_string())
.collect();
let mut article_content = String::new();
for line in &lines[4..] {
article_content.push_str(line);
article_content.push_str("\n")
}
let post = Post {
title: lines[0][TITLE_SEP.len()..].to_string(),
description: lines[1][DESC_SEP.len()..].to_string(),
date: parsed_date,
tags,
content: article_content,
};
posts.push(post);
} else {
return Err(CreatePostError::MetadataError);
}
}
return Ok(posts);
}
You can see the full code here since i wrote custom errors: link
The problem I'm having is with this part:
let date = &lines[2][DATE_SEP.len()..];
let parsed_date = match NaiveDate::parse_from_str(date, "%Y-%m-%d") {
Ok(parsed_date) => parsed_date,
Err(e) => eprintln!("Error: {:?}", CreatePostError::ParseError { inner_err: e }),
};
I'm getting match arms have incompatible types. Expected struct NaiveDate, found ()
Here is my enum and impl for the error:
#[derive(Debug)]
pub enum CreatePostError {
ReadFileError { path: PathBuf },
MetadataError,
ParseError { inner_err: ParseError },
}
impl fmt::Display for CreatePostError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
Self::ReadFileError { path } => write!(f, "Error reading file {path:?}"),
Self::MetadataError => write!(f, "Some metadata is missing"),
Self::ParseError { inner_err } => {
write!(f, "Error parsing date: {inner_err}")
}
}
}
}
impl From<chrono::format::ParseError> for CreatePostError {
fn from(e: chrono::format::ParseError) -> Self {
CreatePostError::ParseError { inner_err: e }
}
}
You probably want to have a result here, here is a way to do it:
let parsed_date = match NaiveDate::parse_from_str(date, "%Y-%m-%d") {
Ok(parsed_date) => Ok(parsed_date),
Err(e) => {eprintln!("Error: {:?}", &e);
Err(CreatePostError::ParseError { inner_err: e }}),
}?;
The ? is saying: if the thing before is an error, return it, and if it is Ok, unwrap it.
This pattern is so common that rust's result gives a lot of utilities to make this kind of things easier. Here, the map_err function would make it more straightforward: map_err
see:
let parsed_date = NaiveDate::parse_from_str(date, "%Y-%m-%d")
.map_err(|e| {
eprintln!("Error: {:?}", &e);
CreatePostError::ParseError { inner_err: e }})?;
But it is only a matter of preference and it might be a lot to digest if you are just beginning, so you can choose the way that you like the most.

How can I get the T from an Option<T> when using syn?

I'm using syn to parse Rust code. When I read a named field's type using field.ty, I get a syn::Type. When I print it using quote!{#ty}.to_string() I get "Option<String>".
How can I get just "String"? I want to use #ty in quote! to print "String" instead of "Option<String>".
I want to generate code like:
impl Foo {
pub set_bar(&mut self, v: String) {
self.bar = Some(v);
}
}
starting from
struct Foo {
bar: Option<String>
}
My attempt:
let ast: DeriveInput = parse_macro_input!(input as DeriveInput);
let data: Data = ast.data;
match data {
Data::Struct(ref data) => match data.fields {
Fields::Named(ref fields) => {
fields.named.iter().for_each(|field| {
let name = &field.ident.clone().unwrap();
let ty = &field.ty;
quote!{
impl Foo {
pub set_bar(&mut self, v: #ty) {
self.bar = Some(v);
}
}
};
});
}
_ => {}
},
_ => panic!("You can derive it only from struct"),
}
My updated version of the response from #Boiethios, tested and used in a public crate, with support of several syntaxes for Option:
Option
std::option::Option
::std::option::Option
core::option::Option
::core::option::Option
fn extract_type_from_option(ty: &syn::Type) -> Option<&syn::Type> {
use syn::{GenericArgument, Path, PathArguments, PathSegment};
fn extract_type_path(ty: &syn::Type) -> Option<&Path> {
match *ty {
syn::Type::Path(ref typepath) if typepath.qself.is_none() => Some(&typepath.path),
_ => None,
}
}
// TODO store (with lazy static) the vec of string
// TODO maybe optimization, reverse the order of segments
fn extract_option_segment(path: &Path) -> Option<&PathSegment> {
let idents_of_path = path
.segments
.iter()
.into_iter()
.fold(String::new(), |mut acc, v| {
acc.push_str(&v.ident.to_string());
acc.push('|');
acc
});
vec!["Option|", "std|option|Option|", "core|option|Option|"]
.into_iter()
.find(|s| &idents_of_path == *s)
.and_then(|_| path.segments.last())
}
extract_type_path(ty)
.and_then(|path| extract_option_segment(path))
.and_then(|path_seg| {
let type_params = &path_seg.arguments;
// It should have only on angle-bracketed param ("<String>"):
match *type_params {
PathArguments::AngleBracketed(ref params) => params.args.first(),
_ => None,
}
})
.and_then(|generic_arg| match *generic_arg {
GenericArgument::Type(ref ty) => Some(ty),
_ => None,
})
}
You should do something like this untested example:
use syn::{GenericArgument, PathArguments, Type};
fn extract_type_from_option(ty: &Type) -> Type {
fn path_is_option(path: &Path) -> bool {
leading_colon.is_none()
&& path.segments.len() == 1
&& path.segments.iter().next().unwrap().ident == "Option"
}
match ty {
Type::Path(typepath) if typepath.qself.is_none() && path_is_option(typepath.path) => {
// Get the first segment of the path (there is only one, in fact: "Option"):
let type_params = typepath.path.segments.iter().first().unwrap().arguments;
// It should have only on angle-bracketed param ("<String>"):
let generic_arg = match type_params {
PathArguments::AngleBracketed(params) => params.args.iter().first().unwrap(),
_ => panic!("TODO: error handling"),
};
// This argument must be a type:
match generic_arg {
GenericArgument::Type(ty) => ty,
_ => panic!("TODO: error handling"),
}
}
_ => panic!("TODO: error handling"),
}
}
There's not many things to explain, it just "unrolls" the diverse components of a type:
Type -> TypePath -> Path -> PathSegment -> PathArguments -> AngleBracketedGenericArguments -> GenericArgument -> Type.
If there is an easier way to do that, I would be happy to know it.
Note that since syn is a parser, it works with tokens. You cannot know for sure that this is an Option. The user could, for example, type std::option::Option, or write type MaybeString = std::option::Option<String>;. You cannot handle those arbitrary names.

Why does matching on the result of Regex::find complain about expecting a struct regex::Match but found tuple?

I copied this code from Code Review into IntelliJ IDEA to try and play around with it. I have a homework assignment that is similar to this one (I need to write a version of Linux's bc in Rust), so I am using this code only for reference purposes.
use std::io;
extern crate regex;
#[macro_use]
extern crate lazy_static;
use regex::Regex;
fn main() {
let tokenizer = Tokenizer::new();
loop {
println!("Enter input:");
let mut input = String::new();
io::stdin()
.read_line(&mut input)
.expect("Failed to read line");
let tokens = tokenizer.tokenize(&input);
let stack = shunt(tokens);
let res = calculate(stack);
println!("{}", res);
}
}
#[derive(Debug, PartialEq)]
enum Token {
Number(i64),
Plus,
Sub,
Mul,
Div,
LeftParen,
RightParen,
}
impl Token {
/// Returns the precedence of op
fn precedence(&self) -> usize {
match *self {
Token::Plus | Token::Sub => 1,
Token::Mul | Token::Div => 2,
_ => 0,
}
}
}
struct Tokenizer {
number: Regex,
}
impl Tokenizer {
fn new() -> Tokenizer {
Tokenizer {
number: Regex::new(r"^[0-9]+").expect("Unable to create the regex"),
}
}
/// Tokenizes the input string into a Vec of Tokens.
fn tokenize(&self, mut input: &str) -> Vec<Token> {
let mut res = vec![];
loop {
input = input.trim_left();
if input.is_empty() { break }
let (token, rest) = match self.number.find(input) {
Some((_, end)) => {
let (num, rest) = input.split_at(end);
(Token::Number(num.parse().unwrap()), rest)
},
_ => {
match input.chars().next() {
Some(chr) => {
(match chr {
'+' => Token::Plus,
'-' => Token::Sub,
'*' => Token::Mul,
'/' => Token::Div,
'(' => Token::LeftParen,
')' => Token::RightParen,
_ => panic!("Unknown character!"),
}, &input[chr.len_utf8()..])
}
None => panic!("Ran out of input"),
}
}
};
res.push(token);
input = rest;
}
res
}
}
/// Transforms the tokens created by `tokenize` into RPN using the
/// [Shunting-yard algorithm](https://en.wikipedia.org/wiki/Shunting-yard_algorithm)
fn shunt(tokens: Vec<Token>) -> Vec<Token> {
let mut queue = vec![];
let mut stack: Vec<Token> = vec![];
for token in tokens {
match token {
Token::Number(_) => queue.push(token),
Token::Plus | Token::Sub | Token::Mul | Token::Div => {
while let Some(o) = stack.pop() {
if token.precedence() <= o.precedence() {
queue.push(o);
} else {
stack.push(o);
break;
}
}
stack.push(token)
},
Token::LeftParen => stack.push(token),
Token::RightParen => {
let mut found_paren = false;
while let Some(op) = stack.pop() {
match op {
Token::LeftParen => {
found_paren = true;
break;
},
_ => queue.push(op),
}
}
assert!(found_paren)
},
}
}
while let Some(op) = stack.pop() {
queue.push(op);
}
queue
}
/// Takes a Vec of Tokens converted to RPN by `shunt` and calculates the result
fn calculate(tokens: Vec<Token>) -> i64 {
let mut stack = vec![];
for token in tokens {
match token {
Token::Number(n) => stack.push(n),
Token::Plus => {
let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
stack.push(a + b);
},
Token::Sub => {
let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
stack.push(a - b);
},
Token::Mul => {
let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
stack.push(a * b);
},
Token::Div => {
let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
stack.push(a / b);
},
_ => {
// By the time the token stream gets here, all the LeftParen
// and RightParen tokens will have been removed by shunt()
unreachable!();
},
}
}
stack[0]
}
When I run it, however, it gives me this error:
error[E0308]: mismatched types
--> src\main.rs:66:22
|
66 | Some((_, end)) => {
| ^^^^^^^^ expected struct `regex::Match`, found tuple
|
= note: expected type `regex::Match<'_>`
found type `(_, _)`
It's complaining that I am using a tuple for the Some() method when I am supposed to use a token. I am not sure what to pass for the token, because it appears that the tuple is traversing through the Token options. How do I re-write this to make the Some() method recognize the tuple as a Token? I have been working on this for a day but I have not found any really good solutions.
The code you are referencing is over two years old. Notably, that predates regex 1.0. Version 0.1.80 defines Regex::find as:
fn find(&self, text: &str) -> Option<(usize, usize)>
while version 1.0.6 defines it as:
pub fn find<'t>(&self, text: &'t str) -> Option<Match<'t>>
However, Match defines methods to get the starting and ending indices the code was written assuming. In this case, since you only care about the end index, you can call Match::end:
let (token, rest) = match self.number.find(input).map(|x| x.end()) {
Some(end) => {
// ...

How do I return an Iterator that's generated by a function that takes &'a mut self (when self is created locally)?

Update: The title of the post has been updated, and the answer has been moved out of the question. The short answer is you can't. Please see my answer to this question.
I'm following an Error Handling blog post here (github for it is here), and I tried to make some modifications to the code so that the search function returns an Iterator instead of a Vec. This has been insanely difficult, and I'm stuck.
I've gotten up to this point:
fn search<'a, P: AsRef<Path>>(file_path: &Option<P>, city: &str)
-> Result<FilterMap<csv::reader::DecodedRecords<'a, Box<Read>, Row>,
FnMut(Result<Row, csv::Error>)
-> Option<Result<PopulationCount, csv::Error>>>,
CliError> {
let mut found = vec![];
let input: Box<io::Read> = match *file_path {
None => Box::new(io::stdin()),
Some(ref file_path) => Box::new(try!(fs::File::open(file_path))),
};
let mut rdr = csv::Reader::from_reader(input);
let closure = |row: Result<Row, csv::Error>| -> Option<Result<PopulationCount, csv::Error>> {
let row = match row {
Ok(row) => row,
Err(err) => return Some(Err(From::from(err))),
};
match row.population {
None => None,
Some(count) => if row.city == city {
Some(Ok(PopulationCount {
city: row.city,
country: row.country,
count: count,
}))
} else {
None
}
}
};
let found = rdr.decode::<Row>().filter_map(closure);
if !found.all(|row| match row {
Ok(_) => true,
_ => false,
}) {
Err(CliError::NotFound)
} else {
Ok(found)
}
}
with the following error from the compiler:
src/main.rs:97:1: 133:2 error: the trait `core::marker::Sized` is not implemented for the type `core::ops::FnMut(core::result::Result<Row, csv::Error>) -> core::option::Option<core::result::Result<PopulationCount, csv::Error>>` [E0277]
src/main.rs:97 fn search<'a, P: AsRef<Path>>(file_path: &Option<P>, city: &str) -> Result<FilterMap<csv::reader::DecodedRecords<'a, Box<Read>, Row>, FnMut(Result<Row, csv::Error>) -> Option<Result<PopulationCount, csv::Error>>>, CliError> {
src/main.rs:98 let mut found = vec![];
src/main.rs:99 let input: Box<io::Read> = match *file_path {
src/main.rs:100 None => Box::new(io::stdin()),
src/main.rs:101 Some(ref file_path) => Box::new(try!(fs::File::open(file_path))),
src/main.rs:102 };
...
src/main.rs:97:1: 133:2 note: `core::ops::FnMut(core::result::Result<Row, csv::Error>) -> core::option::Option<core::result::Result<PopulationCount, csv::Error>>` does not have a constant size known at compile-time
src/main.rs:97 fn search<'a, P: AsRef<Path>>(file_path: &Option<P>, city: &str) -> Result<FilterMap<csv::reader::DecodedRecords<'a, Box<Read>, Row>, FnMut(Result<Row, csv::Error>) -> Option<Result<PopulationCount, csv::Error>>>, CliError> {
src/main.rs:98 let mut found = vec![];
src/main.rs:99 let input: Box<io::Read> = match *file_path {
src/main.rs:100 None => Box::new(io::stdin()),
src/main.rs:101 Some(ref file_path) => Box::new(try!(fs::File::open(file_path))),
src/main.rs:102 };
...
error: aborting due to previous error
I've also tried this function definition:
fn search<'a, P: AsRef<Path>, F>(file_path: &Option<P>, city: &str)
-> Result<FilterMap<csv::reader::DecodedRecords<'a, Box<Read>, Row>, F>,
CliError>
where F: FnMut(Result<Row, csv::Error>)
-> Option<Result<PopulationCount, csv::Error>> {
with these errors from the compiler:
src/main.rs:131:12: 131:17 error: mismatched types:
expected `core::iter::FilterMap<csv::reader::DecodedRecords<'_, Box<std::io::Read>, Row>, F>`,
found `core::iter::FilterMap<csv::reader::DecodedRecords<'_, Box<std::io::Read>, Row>, [closure src/main.rs:105:19: 122:6]>`
(expected type parameter,
found closure) [E0308]
src/main.rs:131 Ok(found)
I can't Box the closure because then it won't be accepted by filter_map.
I then tried this out:
fn search<'a, P: AsRef<Path>>(file_path: &Option<P>, city: &'a str)
-> Result<(Box<Iterator<Item=Result<PopulationCount, csv::Error>> + 'a>, csv::Reader<Box<io::Read>>), CliError> {
let input: Box<io::Read> = match *file_path {
None => box io::stdin(),
Some(ref file_path) => box try!(fs::File::open(file_path)),
};
let mut rdr = csv::Reader::from_reader(input);
let mut found = rdr.decode::<Row>().filter_map(move |row| {
let row = match row {
Ok(row) => row,
Err(err) => return Some(Err(err)),
};
match row.population {
None => None,
Some(count) if row.city == city => {
Some(Ok(PopulationCount {
city: row.city,
country: row.country,
count: count,
}))
},
_ => None,
}
});
if found.size_hint().0 == 0 {
Err(CliError::NotFound)
} else {
Ok((box found, rdr))
}
}
fn main() {
let args: Args = Docopt::new(USAGE)
.and_then(|d| d.decode())
.unwrap_or_else(|err| err.exit());
match search(&args.arg_data_path, &args.arg_city) {
Err(CliError::NotFound) if args.flag_quiet => process::exit(1),
Err(err) => fatal!("{}", err),
Ok((pops, rdr)) => for pop in pops {
match pop {
Err(err) => panic!(err),
Ok(pop) => println!("{}, {}: {} - {:?}", pop.city, pop.country, pop.count, rdr.byte_offset()),
}
}
}
}
Which gives me this error:
src/main.rs:107:21: 107:24 error: `rdr` does not live long enough
src/main.rs:107 let mut found = rdr.decode::<Row>().filter_map(move |row| {
^~~
src/main.rs:100:117: 130:2 note: reference must be valid for the lifetime 'a as defined on the block at 100:116...
src/main.rs:100 -> Result<(Box<Iterator<Item=Result<PopulationCount, csv::Error>> + 'a>, csv::Reader<Box<io::Read>>), CliError> {
src/main.rs:101 let input: Box<io::Read> = match *file_path {
src/main.rs:102 None => box io::stdin(),
src/main.rs:103 Some(ref file_path) => box try!(fs::File::open(file_path)),
src/main.rs:104 };
src/main.rs:105
...
src/main.rs:106:51: 130:2 note: ...but borrowed value is only valid for the block suffix following statement 1 at 106:50
src/main.rs:106 let mut rdr = csv::Reader::from_reader(input);
src/main.rs:107 let mut found = rdr.decode::<Row>().filter_map(move |row| {
src/main.rs:108 let row = match row {
src/main.rs:109 Ok(row) => row,
src/main.rs:110 Err(err) => return Some(Err(err)),
src/main.rs:111 };
...
error: aborting due to previous error
Have I designed something wrong, or am I taking the wrong approach? Am I missing something really simple and stupid? I'm not sure where to go from here.
Returning iterators is possible, but it comes with some restrictions.
To demonstrate it's possible, two examples, (A) with explicit iterator type and (B) using boxing (playpen link).
use std::iter::FilterMap;
fn is_even(elt: i32) -> Option<i32> {
if elt % 2 == 0 {
Some(elt)
} else { None }
}
/// (A)
pub fn evens<I: IntoIterator<Item=i32>>(iter: I)
-> FilterMap<I::IntoIter, fn(I::Item) -> Option<I::Item>>
{
iter.into_iter().filter_map(is_even)
}
/// (B)
pub fn cumulative_sums<'a, I>(iter: I) -> Box<Iterator<Item=i32> + 'a>
where I: IntoIterator<Item=i32>,
I::IntoIter: 'a,
{
Box::new(iter.into_iter().scan(0, |acc, x| {
*acc += x;
Some(*acc)
}))
}
fn main() {
// The output is:
// 0 is even, 10 is even,
// 1, 3, 6, 10,
for even in evens(vec![0, 3, 7, 10]) {
print!("{} is even, ", even);
}
println!("");
for cs in cumulative_sums(1..5) {
print!("{}, ", cs);
}
println!("");
}
You experienced a problem with (A) -- explicit type! Unboxed closures, that we get from regular lambda expressions with |a, b, c| .. syntax, have unique anonymous types. Functions require explicit return types, so that doesn't work here.
Some solutions for returning closures:
Use a function pointer fn() as in example (A). Often you don't need a closure environment anyway.
Box the closure. This is reasonable, even if the iterators don't support calling it at the moment. Not your fault.
Box the iterator
Return a custom iterator struct. Requires some boilerplate.
You can see that in example (B) we have to be quite careful with lifetimes. It says that the return value is Box<Iterator<Item=i32> + 'a>, what is this 'a? This is the least lifetime required of anything inside the box! We also put the 'a bound on I::IntoIter -- this ensures we can put that inside the box.
If you just say Box<Iterator<Item=i32>> it will assume 'static.
We have to explicitly declare the lifetimes of the contents of our box. Just to be safe.
This is actually the fundamental problem with your function. You have this: DecodedRecords<'a, Box<Read>, Row>, F>
See that, an 'a! This type borrows something. The problem is it doesn't borrow it from the inputs. There are no 'a on the inputs.
You'll realize that it borrows from a value you create during the function, and that value's lifespan ends when the function returns. We cannot return DecodedRecords<'a> from the function, because it wants to borrow a local variable.
Where to go from here? My easiest answer would be to perform the same split that csv does. One part (Struct or value) that owns the reader, and one part (struct or value) that is the iterator and borrows from the reader.
Maybe the csv crate has an owning decoder that takes ownership of the reader it is processing. In that case you can use that to dispel the borrowing trouble.
This answer is based on #bluss's answer + help from #rust on irc.mozilla.org
One issue that's not obvious from the code, and which was causing the final error displayed just above, has to do with the definition of csv::Reader::decode (see its source). It takes &'a mut self, the explanation of this problem is covered in this answer. This essentially causes the lifetime of the reader to be bounded to the block it's called in. The way to fix this is to split the function in half (since I can't control the function definition, as recommended in the previous answer link). I needed a lifetime on the reader that was valid within the main function, so the reader could then be passed down into the search function. See the code below (It could definitely be cleaned up more):
fn population_count<'a, I>(iter: I, city: &'a str)
-> Box<Iterator<Item=Result<PopulationCount,csv::Error>> + 'a>
where I: IntoIterator<Item=Result<Row,csv::Error>>,
I::IntoIter: 'a,
{
Box::new(iter.into_iter().filter_map(move |row| {
let row = match row {
Ok(row) => row,
Err(err) => return Some(Err(err)),
};
match row.population {
None => None,
Some(count) if row.city == city => {
Some(Ok(PopulationCount {
city: row.city,
country: row.country,
count: count,
}))
},
_ => None,
}
}))
}
fn get_reader<P: AsRef<Path>>(file_path: &Option<P>)
-> Result<csv::Reader<Box<io::Read>>, CliError>
{
let input: Box<io::Read> = match *file_path {
None => Box::new(io::stdin()),
Some(ref file_path) => Box::new(try!(fs::File::open(file_path))),
};
Ok(csv::Reader::from_reader(input))
}
fn search<'a>(reader: &'a mut csv::Reader<Box<io::Read>>, city: &'a str)
-> Box<Iterator<Item=Result<PopulationCount, csv::Error>> + 'a>
{
population_count(reader.decode::<Row>(), city)
}
fn main() {
let args: Args = Docopt::new(USAGE)
.and_then(|d| d.decode())
.unwrap_or_else(|err| err.exit());
let reader = get_reader(&args.arg_data_path);
let mut reader = match reader {
Err(err) => fatal!("{}", err),
Ok(reader) => reader,
};
let populations = search(&mut reader, &args.arg_city);
let mut found = false;
for pop in populations {
found = true;
match pop {
Err(err) => fatal!("fatal !! {}", err),
Ok(pop) => println!("{}, {}: {}", pop.city, pop.country, pop.count),
}
}
if !(found || args.flag_quiet) {
fatal!("{}", CliError::NotFound);
}
}
I've learned a lot trying to get this to work, and have much more appreciation for the compiler errors. It's now clear that had this been C, the last error above could actually have caused segfaults, which would have been much harder to debug. I've also realized that converting from a pre-computed vec to an iterator requires more involved thinking about when the memory comes in and out of scope; I can't just change a few function calls and return types and call it a day.

Resources