Extract substring between `\"` with nom - rust

From input "\"Name 1\" something else" I want to extract "Name 1" and the remaining string as " something else". Notice the escaped \".
My current solution is:
use nom::bytes::complete::{tag, is_not};
use nom::sequence::pair;
use nom::IResult;
fn parse_between(i: &str) -> IResult<&str, &str> {
let (i, (_o1, o2)) = pair(tag("\""), is_not("\""))(i)?;
if let Some(res) = i.strip_prefix("\"") {
return Ok((res, o2));
}
Ok((i, o2))
}
fn main() {
println!("{:?}", parse_between("\"Name 1\" something else"));
}
where the output is Ok((" something else", "Name 1")).
Is there a better way to do this? I feel as though calling strip_prefix is an extra step I shouldn't need.

I believe you're looking for delimited. Here's an example from the nom docs:
use nom::{
IResult,
sequence::delimited,
// see the "streaming/complete" paragraph lower for an explanation of these submodules
character::complete::char,
bytes::complete::is_not
};
fn parens(input: &str) -> IResult<&str, &str> {
delimited(char('('), is_not(")"), char(')'))(input)
}
Adapting it to do what you're looking for:
use nom::bytes::complete::is_not;
use nom::character::complete::char;
use nom::sequence::delimited;
use nom::IResult;
fn parse_between(i: &str) -> IResult<&str, &str> {
delimited(char('"'), is_not("\""), char('"'))(i)
}
fn main() {
println!("{:?}", parse_between("\"Name 1\" something else"));
// Ok((" something else", "Name 1"))
}
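To make explicit what delimited does here, this is a minimal std-only sketch of the same steps: match the opening quote, take everything up to the closing quote, and drop both quotes. The name parse_between_std is hypothetical and not part of nom. One difference from the nom version: this sketch accepts an empty quoted string, whereas is_not, as far as I know, requires at least one character.

```rust
/// Hypothetical std-only counterpart of the `parse_between` parser above.
/// Returns `(remaining, inside)`, the same order nom's `IResult` uses.
fn parse_between_std(i: &str) -> Option<(&str, &str)> {
    // Require the opening quote (the first `char('"')` step).
    let rest = i.strip_prefix('"')?;
    // Split at the closing quote; the quote itself is consumed, not returned.
    let (inside, remaining) = rest.split_once('"')?;
    Some((remaining, inside))
}
```

Calling parse_between_std("\"Name 1\" something else") yields the pair (" something else", "Name 1"), and input without an opening or closing quote yields None.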

Related

Initialize a Vec with not-None values only

If I have variables like this:
let a: u32 = ...;
let b: Option<u32> = ...;
let c: u32 = ...;
what is the shortest way to make a vector of those values, so that b is only included if it's Some?
In other words, is there something simpler than this:
let v = match b {
None => vec![a, c],
Some(x) => vec![a, x, c],
};
P.S. I would prefer a solution where we don't need to use the variables more than once. Consider this example:
let some_person: String = ...;
let best_man: Option<String> = ...;
let a_third_person: &str = ...;
let another_opt: Option<String> = ...;
...
As can be seen, we might have to use longer variable names, more than one Option (None), expressions (like a_third_person.to_string()), etc.
Yours is fine, but here's a sophisticated one:
[Some(a), b, Some(c)].into_iter().flatten().collect::<Vec<_>>()
This works since Option impls IntoIterator.
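Because Option implements IntoIterator, it can also be passed straight to Iterator::chain, which keeps the order and uses each variable exactly once. The following is only a sketch; the function name build_names is made up for illustration, and the parameters mirror the longer example from the question.

```rust
use std::iter::once;

/// Hypothetical helper: builds the vector in order, skipping `best_man` if it is `None`.
fn build_names(some_person: String, best_man: Option<String>, a_third_person: &str) -> Vec<String> {
    once(some_person)
        // An Option yields zero or one item when used as an iterator.
        .chain(best_man)
        .chain(once(a_third_person.to_string()))
        .collect()
}
```

With best_man set to Some, the result has three elements in the original order; with None, it has two.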
If it depends on just one variable:
b.map(|b| vec![a, b, c]).unwrap_or_else(|| vec![a, c]);
After some thinking and investigating, I've come up with the following crazy thing.
The end goal is to have a macro, optional_vec![], to which you can pass either T or Option<T>, and which behaves as described in the question. However, I decided on a strong restriction: it should have the best performance possible. So, you write:
optional_vec![a, b, c]
And get at least the performance of hand-written match, if not more. This forbids the use of the simple [Some(a), b, Some(c)].into_iter().flatten().collect::<Vec<_>>(), suggested in my other answer (though even this solution needs some way to differentiate between Option<T> and just T, which, like we'll see, is not an easy problem at all).
I will first warn that I've not found a way to make my macro work with Option. That is, if you want to build a vector of Option<T> from Option<T> and Option<Option<T>>, it will not work.
When I design a complex macro, I like to think first about what the expanded code will look like. And in this macro, we have several hard problems to solve.
First, the macro takes plain expressions. But somehow, it needs to switch on their type being T or Option<T>. How should such a thing be done?
The feature we use to do such things is specialization.
#![feature(specialization)]
pub trait Optional {
fn some_method(self);
}
impl<T> Optional for T {
default fn some_method(self) {
// Just T
}
}
impl<T> Optional for Option<T> {
fn some_method(self) {
// Option<T>
}
}
Like you probably noticed, now we have two problems: first, specialization is unstable, and I'd like to stay with stable. Second, what should be inside the trait? The second problem is easier to solve, so let's begin with it.
Turns out that the most performant way to do the pushing to the vector is to pre-allocate capacity (Vec::with_capacity), write to the vector by using pointers (don't push(), it optimizes badly!) then set the length (Vec::set_len()).
We can get a pointer to the internal buffer of the vector using Vec::as_mut_ptr(), and advance the pointer via <*mut T>::add(1).
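As a minimal sketch of that pointer-writing pattern on its own, here it is written out by hand for the a, b, c case from the question. The function name build_vec is hypothetical and is not part of the macro developed in this answer.

```rust
/// Hypothetical hand-written version of the pattern: reserve, write through a
/// raw pointer, then set the length once.
fn build_vec(a: u32, b: Option<u32>, c: u32) -> Vec<u32> {
    // `a` and `c` are always present; `b` counts only when it is `Some`.
    let len = 2 + b.is_some() as usize;
    let mut result: Vec<u32> = Vec::with_capacity(len);
    let mut next = result.as_mut_ptr();
    // SAFETY: the buffer has room for `len` elements, exactly `len` elements
    // are written below, and the length is set only after all writes.
    unsafe {
        next.write(a);
        next = next.add(1);
        if let Some(b) = b {
            next.write(b);
            next = next.add(1);
        }
        next.write(c);
        result.set_len(len);
    }
    result
}
```

Whether this actually beats push() depends on the compiler and the surrounding code, so it is worth measuring before relying on it.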
So, we need two methods: one to hint us about the capacity (can be zero for None or one for Some() and non-Option elements), and a write_and_advance() method:
pub trait Optional {
type Item;
fn len(&self) -> usize;
unsafe fn write_and_advance(self, place: &mut *mut Self::Item);
}
impl<T> Optional for T {
default type Item = Self;
default fn len(&self) -> usize { 1 }
default unsafe fn write_and_advance(self, place: &mut *mut Self) {
place.write(self);
*place = place.add(1);
}
}
impl<T> Optional for Option<T> {
type Item = T;
fn len(&self) -> usize { self.is_some() as usize }
unsafe fn write_and_advance(self, place: &mut *mut T) {
if let Some(value) = self {
place.write(value);
*place = place.add(1);
}
}
}
It doesn't even compile! For the reason, see "Mismatch between associated type and type parameter only when impl is marked `default`". Luckily for us, the trick we'll use to work around specialization being unstable does work in this situation. But for now, let's assume it compiles. What will the code using this trait look like?
match (a, b, c) { // The match is here because it's the best binding for lifetimes: see https://stackoverflow.com/a/54855986/7884305
(a, b, c) => {
let len = Optional::len(&a) + Optional::len(&b) + Optional::len(&c);
let mut result = ::std::vec::Vec::with_capacity(len);
let mut next_element = result.as_mut_ptr();
unsafe {
Optional::write_and_advance(a, &mut next_element);
Optional::write_and_advance(b, &mut next_element);
Optional::write_and_advance(c, &mut next_element);
result.set_len(len);
}
result
}
}
And it works! Except that it does not, because the specialization does not compile as I said, and we also want to not repeat all of this boilerplate but insert it into a macro.
So, how do we solve the problems with specialization: being unstable and not working?
dtolnay has a very cool trick he calls autoref specialization (BTW, the whole repo is highly recommended reading!). It can be used to emulate specialization. It works only in macros, but we're in a macro, so this is fine.
I will not elaborate on the trick here (I recommend reading his post; he also used this trick in the excellent and very widely used anyhow crate). In short, the idea is to trick the typechecker by implementing one trait for T under certain conditions (the specialized impl) and another trait for &T for the general case (this could be an inherent impl if not for coherence). Since Rust performs automatic referencing during method resolution, that is, it takes a reference to the receiver as needed, this works: the typechecker will autoref if needed and stop at the first applicable impl, i.e. the specialized impl if it matches, or the general impl otherwise.
Here's an example:
use std::fmt;
pub trait Display {
fn foo(&self);
}
// Level 1
impl<T: fmt::Display> Display for T {
fn foo(&self) { println!("Display({}), {}", std::any::type_name::<T>(), self); }
}
pub trait Debug {
fn foo(&self);
}
// Level 2
impl<T: fmt::Debug> Debug for &T {
fn foo(&self) { println!("Debug({}), {:?}", std::any::type_name::<T>(), self); }
}
macro_rules! foo {
($e:expr) => ((&$e).foo());
}
We can use this trick in our case:
#[doc(hidden)]
pub mod autoref_specialization {
#[derive(Copy, Clone)]
pub struct OptionTag;
pub trait OptionKind {
fn optional_kind(&self) -> OptionTag;
}
impl<T> OptionKind for Option<T> {
#[inline(always)]
fn optional_kind(&self) -> OptionTag { OptionTag }
}
impl OptionTag {
#[inline(always)]
pub fn len<T>(self, this: &Option<T>) -> usize { this.is_some() as usize }
#[inline(always)]
pub unsafe fn write_and_advance<T>(self, this: Option<T>, place: &mut *mut T) {
if let Some(value) = this {
place.write(value);
*place = place.add(1);
}
}
}
#[derive(Copy, Clone)]
pub struct DefaultTag;
pub trait DefaultKind {
fn optional_kind(&self) -> DefaultTag;
}
impl<T> DefaultKind for &'_ T {
#[inline(always)]
fn optional_kind(&self) -> DefaultTag { DefaultTag }
}
impl DefaultTag {
#[inline(always)]
pub fn len<T>(self, _this: &T) -> usize { 1 }
#[inline(always)]
pub unsafe fn write_and_advance<T>(self, this: T, place: &mut *mut T) {
place.write(this);
*place = place.add(1);
}
}
}
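To see the dispatch in isolation, here is a stripped-down sketch of the same tag pattern. All names (the name method, the kind_of! macro) are hypothetical and exist only for this illustration; the real module above carries len and write_and_advance on the tags instead.

```rust
pub struct OptionTag;
pub struct DefaultTag;

pub trait OptionKind {
    fn optional_kind(&self) -> OptionTag { OptionTag }
}
// Specialized impl: implemented directly on Option<T>, found without an extra autoref.
impl<T> OptionKind for Option<T> {}

pub trait DefaultKind {
    fn optional_kind(&self) -> DefaultTag { DefaultTag }
}
// General impl: implemented on &T, so it is reached only after one more autoref.
impl<T> DefaultKind for &T {}

impl OptionTag {
    pub fn name(&self) -> &'static str { "option" }
}
impl DefaultTag {
    pub fn name(&self) -> &'static str { "plain" }
}

// Hypothetical macro: reports which tag method resolution picked for an expression.
macro_rules! kind_of {
    ($e:expr) => { (&$e).optional_kind().name() };
}
```

Here kind_of!(Some(2u32)) resolves to the OptionKind impl and kind_of!(1u32) falls through to the DefaultKind impl, which is the behavior the macro relies on.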
And the expanded code will look like:
use autoref_specialization::{DefaultKind as _, OptionKind as _};
match (a, b, c) {
(a, b, c) => {
let (a_tag, b_tag, c_tag) = (
(&a).optional_kind(),
(&b).optional_kind(),
(&c).optional_kind(),
);
let len = a_tag.len(&a) + b_tag.len(&b) + c_tag.len(&c);
let mut result = ::std::vec::Vec::with_capacity(len);
let mut next_element = result.as_mut_ptr();
unsafe {
a_tag.write_and_advance(a, &mut next_element);
b_tag.write_and_advance(b, &mut next_element);
c_tag.write_and_advance(c, &mut next_element);
result.set_len(len);
}
result
}
}
It may be tempting to try to convert this immediately into a macro, but we still have one unsolved problem: our macro needs to generate identifiers. This may not be obvious, but what if we pass optional_vec![1, Some(2), 3]? We need to generate the bindings for the match (in our case, (a, b, c) => ...) and the tag names ((a_tag, b_tag, c_tag)).
Unfortunately, generating names is not something macro_rules! can do in today's Rust. Fortunately, there is an excellent crate, paste (another one from dtolnay!), a small proc-macro that allows you to do that. It is even available on the playground!
However, we need a series of identifiers. That can be done with tt-munching, by repeatedly adding some letter (I used a), so you get a, aa, aaa, ... you get the idea.
#[doc(hidden)]
pub mod reexports {
pub use std::vec::Vec;
pub use paste::paste;
}
#[macro_export]
macro_rules! optional_vec {
// Empty case
{ #generate_idents
exprs = []
processed_exprs = [$($e:expr,)*]
match_bindings = [$($binding:ident)*]
tags = [$($tag:ident)*]
} => {{
use $crate::autoref_specialization::{DefaultKind as _, OptionKind as _};
match ($($e,)*) {
($($binding,)*) => {
let ($($tag,)*) = (
$((&$binding).optional_kind(),)*
);
let len = 0 $(+ $tag.len(&$binding))*;
let mut result = $crate::reexports::Vec::with_capacity(len);
let mut next_element = result.as_mut_ptr();
unsafe {
$($tag.write_and_advance($binding, &mut next_element);)*
result.set_len(len);
}
result
}
}
}};
{ #generate_idents
exprs = [$e:expr, $($rest:expr,)*]
processed_exprs = [$($processed_exprs:tt)*]
match_bindings = [$first_binding:ident $($bindings:ident)*]
tags = [$($tags:ident)*]
} => {
$crate::reexports::paste! {
$crate::optional_vec! { #generate_idents
exprs = [$($rest,)*]
processed_exprs = [$($processed_exprs)* $e,]
match_bindings = [
[< $first_binding a >]
$first_binding
$($bindings)*
]
tags = [
[< $first_binding a_tag >]
$($tags)*
]
}
}
};
// Entry
[$e:expr $(, $exprs:expr)* $(,)?] => {
$crate::optional_vec! { #generate_idents
exprs = [$($exprs,)+]
processed_exprs = [$e,]
match_bindings = [__optional_vec_a]
tags = [__optional_vec_a_tag]
}
};
}
I can also personally recommend
let mut v = vec![a, c];
v.extend(b);
Short and clear, though note that b ends up last rather than between a and c.
Sometimes the straightforward solution is the best:
fn jim_power(a: u32, b: Option<u32>, c: u32) -> Vec<u32> {
let mut acc = Vec::with_capacity(3);
acc.push(a);
if let Some(b) = b {
acc.push(b);
}
acc.push(c);
acc
}
fn ys_iii(
some_person: String,
best_man: Option<String>,
a_third_person: String,
another_opt: Option<String>,
) -> Vec<String> {
let mut acc = Vec::with_capacity(4);
acc.push(some_person);
// Option implements IntoIterator, so extend pushes zero or one element.
acc.extend(best_man);
acc.push(a_third_person);
acc.extend(another_opt);
acc
}
If you don't care about the order of the values, another option is
Iterator::chain(
[a, c].into_iter(),
[b].into_iter().flatten()
).collect()

How to use nom to parse until a string is found?

It's easy to use nom to parse a string until a character is found. How to use nom to gobble a string until a delimiter or the end? deals with this.
How do I do the same with a string (multiple characters) instead of a single delimiter?
For example, to parse abchello, I want to parse everything until hello is found.
take_until parses everything up to, but not including, the provided string.
use nom::{bytes::complete::take_until, IResult};
fn parser(s: &str) -> IResult<&str, &str> {
take_until("hello")(s)
}
fn main() {
let result = parser("abchello");
assert_eq!(Ok(("hello", "abc")), result);
}
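For this particular input, the following code returns the same result. Note, however, that is_not treats its argument as a set of characters rather than a string: it stops at the first 'h', 'e', 'l' or 'o', so it is not a general replacement for take_until.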
use nom::{IResult, bytes::complete::is_not};
fn parser(s: &str) -> IResult<&str, &str> {
is_not("hello")(s)
}
fn main() {
let result = parser("abchello");
println!("{:?}", result);
}
The documentation is here.
cargo run
-> Ok(("hello", "abc"))
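The two answers differ in what they stop at, which is easy to miss because "abchello" happens to give the same result for both. The sketch below mimics the two behaviors with std only: stopping at a whole string versus stopping at any character from a set. The function names until_str and until_any_char are hypothetical, and the code only approximates the nom parsers' semantics.

```rust
/// Roughly what take_until does: split at the first occurrence of the whole string `pat`.
/// Returns `(remaining, consumed)`.
fn until_str<'a>(s: &'a str, pat: &str) -> Option<(&'a str, &'a str)> {
    let idx = s.find(pat)?;
    Some((&s[idx..], &s[..idx]))
}

/// Roughly what is_not does: split at the first character that appears anywhere in `set`.
/// Returns `(remaining, consumed)`, or `None` if nothing would be consumed.
fn until_any_char<'a>(s: &'a str, set: &str) -> Option<(&'a str, &'a str)> {
    let idx = s.find(|c: char| set.contains(c)).unwrap_or(s.len());
    if idx == 0 {
        None
    } else {
        Some((&s[idx..], &s[..idx]))
    }
}
```

For "abc lo hello", the string-based version consumes "abc lo " and leaves "hello", while the character-set version already stops at the first 'l' and consumes only "abc ".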

Rust: specify template arguments during "use ... as" import

I'm trying to specify a template parameter of an imported class, so that I don't need to specify it each time I want to use it. Something like this:
use self::binary_heap_plus::BinaryHeap<T,MinComparator> as BinaryMinHeap<T>;
Is this possible?
Yes, it is possible with a type alias, like the following:
pub type CustomResult<T> = Result<T, MyError>;
#[derive(Debug)]
pub enum MyError {
MyError1,
}
fn result_returner(prm: i32) -> CustomResult<i32> {
if prm == 1 {
Ok(5)
} else {
Err(MyError::MyError1)
}
}
You can also rename a type on import:
use std::collections::HashMap as CustomNamedMap;
fn main() {
let mut my_map = CustomNamedMap::new();
my_map.insert(1, 2);
println!("Value: {:?}", my_map[&1]);
}
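Applied to the heap from the question, the same idea is a generic alias that fixes one type argument and leaves T open. Since binary_heap_plus is not assumed to be available here, this sketch uses the standard library's BinaryHeap with Reverse as a stand-in for the comparator; the alias name BinaryMinHeap and the helper smallest are for illustration only.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Generic alias: the wrapper is fixed once, the element type stays a parameter.
type BinaryMinHeap<T> = BinaryHeap<Reverse<T>>;

/// Hypothetical helper showing the alias in use.
fn smallest(values: &[i32]) -> Option<i32> {
    let mut heap: BinaryMinHeap<i32> = BinaryMinHeap::new();
    for &v in values {
        heap.push(Reverse(v));
    }
    // The max-heap over Reverse<T> pops the smallest T first.
    heap.pop().map(|Reverse(v)| v)
}
```

With an alias of the form type BinaryMinHeap<T> = BinaryHeap<T, MinComparator>; the comparator would no longer need to be spelled out at each use site.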

In Rust, how can I make this code less repetitive?

The goal is to write a function that gets two paths, input_dir and output_dir, and converts all markdown files from input_dir to html files in output_dir.
I finally managed to get it to run but it was rather frustrating. The parts that should be hard are super easy: the actual conversion from Markdown to HTML is effectively only one line. The seemingly easy parts are what took me the longest. Collecting all files into a vector of paths is something I replaced with the glob crate. Not because I couldn't get it to work, but because it was a mess of if let and unwrap. A simple function that iterates over the list of elements and figures out which of them are actually files and not directories? Either I need four indentation levels of if let or I freak out over matches.
What am I doing wrong?
But let's start with some things I tried to get a list of items in a directory filtered to only contain actual files:
use std::fs;
use std::vec::Vec;
fn list_files (path: &str) -> Result<Vec<&str>, &str> {
if let Ok(dir_list) = fs::read_dir(path) {
Ok(dir_list.filter_map(|e| {
match e {
Ok(entry) => match entry.file_type() {
Ok(_) => entry.file_name().to_str(),
_ => None
},
_ => None
}
}).collect())
} else {
Err("nope")
}
}
fn main() {
let files = list_files("testdir");
println!("{:?}", files.unwrap_or(Vec::new()));
}
So, this code doesn't build, because the file name in Line 10 doesn't live long enough. I guess I could somehow create an owned String, but that would introduce another nesting level because OsString::into_string() returns a Result.
Now I looked through the code of the glob crate and they just use a mutable vector:
fn list_files (path: &str) -> Result<Vec<&str>, &str> {
let mut list = Vec::new();
if let Ok(dir_list) = fs::read_dir(path) {
for entry in dir_list {
if let Ok(entry) = entry {
if let Ok(file_type) = entry.file_type() {
if file_type.is_file() {
if let Some(name) = entry.file_name().to_str() {
list.push(name)
}
}
}
}
}
Ok(list)
} else {
Err("nope")
}
}
This not only adds crazy nesting, it also fails with the same problem. If I change from Vec<&str> to Vec<String>, it works:
fn list_files (path: &str) -> Result<Vec<String>, &str> {
let mut list = Vec::new();
if let Ok(dir_list) = fs::read_dir(path) {
for entry in dir_list {
if let Ok(entry) = entry {
if let Ok(file_type) = entry.file_type() {
if file_type.is_file() {
if let Ok(name) = entry.file_name().into_string() {
list.push(name)
}
}
}
}
}
Ok(list)
} else {
Err("nope")
}
}
Looks like I should apply that to my first try, right?
fn list_files (path: &str) -> Result<Vec<String>, &str> {
if let Ok(dir_list) = fs::read_dir(path) {
Ok(dir_list.filter_map(|e| {
match e {
Ok(entry) => match entry.file_type() {
Ok(_) => Some(entry.file_name().into_string().ok()),
_ => None
},
_ => None
}
}).collect())
} else {
Err("nope")
}
}
At least a bit shorter… but it fails to compile because a collection of type std::vec::Vec<std::string::String> cannot be built from an iterator over elements of type std::option::Option<std::string::String>.
It is hard to stay patient here. Why does .filter_map return Options instead of just using them to filter? Now I have to change line 15 from }).collect()) to }).map(|e| e.unwrap()).collect()) which iterates once more over the result set.
That can't be right!
You can rely heavily on the ? operator:
use std::fs;
use std::io::{Error, ErrorKind};
fn list_files(path: &str) -> Result<Vec<String>, Error> {
let mut list = Vec::new();
for entry in fs::read_dir(path)? {
let entry = entry?;
if entry.file_type()?.is_file() {
list.push(entry.file_name().into_string().map_err(|_| {
Error::new(ErrorKind::InvalidData, "Cannot convert file name")
})?)
}
}
Ok(list)
}
Do not forget that you can split your code into functions or implement your own traits to simplify the final code:
use std::fs;
use std::io::{Error, ErrorKind};
trait CustomGetFileName {
fn get_file_name(self) -> Result<String, Error>;
}
impl CustomGetFileName for std::fs::DirEntry {
fn get_file_name(self) -> Result<String, Error> {
// Map the failed OsString conversion to an io::Error.
self.file_name().into_string().map_err(|_|
Error::new(ErrorKind::InvalidData, "Cannot convert file name")
)
}
}
fn list_files(path: &str) -> Result<Vec<String>, Error> {
let mut list = Vec::new();
for entry in fs::read_dir(path)? {
let entry = entry?;
if entry.file_type()?.is_file() {
list.push(entry.get_file_name()?)
}
}
Ok(list)
}
An alternative answer with iterators:
use std::fs;
use std::error::Error;
use std::path::PathBuf;
fn list_files(path: &str) -> Result<Vec<PathBuf>, Box<dyn Error>> {
let x = fs::read_dir(path)?
.filter_map(|e| e.ok())
// Keep only entries whose metadata is readable and describes a regular file.
.filter(|e| e.metadata().map(|m| m.is_file()).unwrap_or(false))
.map(|e| e.path())
.collect();
Ok(x)
}
fn main() {
let path = ".";
for res in list_files(path).unwrap() {
println!("{:#?}", res);
}
}
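The filter_map attempt from the question can also be made to work. It failed because the closure returned Some(entry.file_name().into_string().ok()), which is an Option wrapped in another Option, so filter_map yielded Option<String> items. Dropping the outer Some fixes that, and using ? inside the closure removes the nested matches. This is a sketch rather than a recommendation over the versions above; like the question's code, it silently skips entries that cannot be read.

```rust
use std::fs;
use std::io;

/// Lists the names of regular files in `path`, skipping unreadable entries
/// and names that are not valid UTF-8.
fn list_files(path: &str) -> io::Result<Vec<String>> {
    let names = fs::read_dir(path)?
        .filter_map(|e| {
            // Inside a closure returning Option, `?` turns a None into "skip this entry".
            let entry = e.ok()?;
            if !entry.file_type().ok()?.is_file() {
                return None;
            }
            // Already an Option<String>; no extra Some(...) around it.
            entry.file_name().into_string().ok()
        })
        .collect();
    Ok(names)
}
```

Only a failure of read_dir itself is reported as an error; everything else is filtered out in a single pass, without a second map over the results.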

How do I convert a Peekable iterator back to the original iterator?

I want to implement an algorithm that skips ! or !^num at the start of a string:
fn extract_common_part(a: &str) -> Option<&str> {
let mut it = a.chars();
if it.next() != Some('!') {
return None;
}
let mut jt = it.clone().peekable();
if jt.peek() == Some(&'^') {
it.next();
jt.next();
while jt.peek().map_or(false, |v| !v.is_whitespace()) {
it.next();
jt.next();
}
it.next();
}
Some(it.as_str())
}
fn main() {
assert_eq!(extract_common_part("!^4324 1234"), Some("1234"));
assert_eq!(extract_common_part("!1234"), Some("1234"));
}
This works, but I cannot find a way to get from Peekable back to Chars, so I have to advance both the it and jt iterators. This causes duplicate code.
How can I return from Peekable iterator to corresponding Chars iterator, or maybe there is a simpler way to implement this algorithm?
In short, you cannot. The general answer is to use something like Iterator::by_ref to avoid consuming the Chars iterator:
fn extract_common_part(a: &str) -> Option<&str> {
let mut it = a.chars();
if it.next() != Some('!') {
return None;
}
{
let mut jt = it.by_ref().peekable();
if jt.peek() == Some(&'^') {
jt.next();
while jt.peek().map_or(false, |v| !v.is_whitespace()) {
jt.next();
}
}
}
Some(it.as_str())
}
The problem with this version is that peek advances the underlying iterator even when the test on the peeked character then fails. Taking the rest of the string loses that character: for "!1234" this returns "234".
However, Itertools has peeking_take_while and take_while_ref, both of which should solve the issue.
extern crate itertools;
use itertools::Itertools;
fn extract_common_part(a: &str) -> Option<&str> {
let mut it = a.chars();
if it.next() != Some('!') {
return None;
}
if it.peeking_take_while(|&c| c == '^').next() == Some('^') {
for _ in it.peeking_take_while(|v| !v.is_whitespace()) {}
for _ in it.peeking_take_while(|v| v.is_whitespace()) {}
}
Some(it.as_str())
}
Other options include:
using a crate like strcursor, which is designed for this kind of incremental advance over a string;
doing the parsing on regular strings directly, and hoping the optimizer eliminates redundant bounds checks;
using a regex or other parsing library.
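As a sketch of the "regular strings" option, the same algorithm can be written with str methods from the standard library only. Like the itertools version above, it skips all whitespace after the !^num prefix, whereas the code in the question skips exactly one character; that difference is an assumption about what is wanted.

```rust
/// Skips a leading `!` or `!^num` (followed by whitespace) and returns the rest.
fn extract_common_part(a: &str) -> Option<&str> {
    // The leading '!' is mandatory.
    let rest = a.strip_prefix('!')?;
    match rest.strip_prefix('^') {
        Some(tagged) => {
            // Drop the non-whitespace run after '^', then the whitespace that follows it.
            let after_num = tagged.trim_start_matches(|c: char| !c.is_whitespace());
            Some(after_num.trim_start())
        }
        // No '^': everything after '!' is the common part.
        None => Some(rest),
    }
}
```

All results are slices of the input, so no iterator needs to be converted back and nothing is allocated.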
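If you are only interested in the result, without validation (note that this assumes the returned part is ASCII, because position counts chars while the slice uses byte offsets):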
fn extract_common_part(a: &str) -> Option<&str> {
a.chars().rev().position(|v| v.is_whitespace() || v == '!')
.map(|pos| &a[a.len() - pos..])
}
fn main() {
assert_eq!(extract_common_part("!^4324 1234"), Some("1234"));
assert_eq!(extract_common_part("!1234"), Some("1234"));
}
