Iterating over Multiple Lines Using the Rust NOM Parsing Library

I'm trying to learn NOM for a project in Rust. I have a text file that consists of:
[tag="#43674"]char[/tag] with multiple tags back to back on each line. I'm trying to pull the '#43674' and 'char', store them in a tuple (x, y) and then push those into a vector Vec<(x, y)> for each line of the text file.
So far I have successfully combined parsers into two functions, one for the '#43674' and one for 'char', which I then combine to return an IResult<&str, (String, String)>. Here is the code:
fn color_tag(i: &str) -> IResult<&str, &str> {
delimited(tag("[color="), take_until("]"), tag("]"))(i)
}
fn char_take(i: &str) -> IResult<&str, &str> {
terminated(take_until("[/color]"), tag("[/color]"))(i)
}
pub fn color_char(i: &str) -> IResult<&str, (String, String)> {
let (i, color) = color_tag(i)?;
let (i, chara) = char_take(i)?;
let colors = color.to_string();
let charas = chara.to_string();
let tuple = (colors, charas);
Ok((i, tuple))
}
How can I iterate this function over a given line of the text file? I already have a function that iterates the text file into lines, but I need color_char to repeat for each closure in that line. Am I missing the point entirely?

You'll probably want to use the nom::multi::many0 combinator to match a parser multiple times, and you can also use the nom::sequence::tuple combinator to combine your color_tag and char_take parsers:
// Match color_tag followed by char_take
fn color_char(i: &str) -> IResult<&str, (&str, &str)> {
tuple((color_tag, char_take))(i)
}
// Match 0 or more times on the same line
fn color_char_multiple(i: &str) -> IResult<&str, Vec<(&str, &str)>> {
many0(color_char)(i)
}
To parse multiple lines, you can modify color_char() to match a trailing newline character with one of the character parsers provided by nom, such as nom::character::complete::line_ending, make it optional using nom::combinator::opt, and combine it with something like nom::sequence::terminated:
terminated(tuple((color_tag, char_take)), opt(line_ending))(i)
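To show what the repetition amounts to without the library, here is a minimal std-only sketch (this is not nom). It assumes input shaped like [color=#43674]a[/color]. The names color_char and color_char_multiple mirror the ones above but are hand-written stand-ins, with Option taking the place of IResult, and parse_lines is a hypothetical helper.

```rust
/// Parses one `[color=X]text[/color]` pair from the front of `input`.
/// Returns the remaining input plus the `(color, text)` pair, or `None` on a mismatch.
fn color_char(input: &str) -> Option<(&str, (&str, &str))> {
    let rest = input.strip_prefix("[color=")?;
    let (color, rest) = rest.split_once(']')?;
    let (text, rest) = rest.split_once("[/color]")?;
    Some((rest, (color, text)))
}

/// Applies `color_char` until it stops matching, which is roughly what `many0` does.
fn color_char_multiple(mut input: &str) -> (&str, Vec<(String, String)>) {
    let mut pairs = Vec::new();
    while let Some((rest, (color, text))) = color_char(input) {
        pairs.push((color.to_string(), text.to_string()));
        input = rest;
    }
    (input, pairs)
}

/// Hypothetical helper: runs the repeated parser over every line of `text`.
fn parse_lines(text: &str) -> Vec<Vec<(String, String)>> {
    text.lines()
        .map(|line| color_char_multiple(line).1)
        .collect()
}
```

Like many0, the loop stops at the first position where the inner parser fails and hands back whatever input is left, so trailing text is not an error.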

Related

Join Vec<String> into String after prepending each value

If given a Vec<String> of values, how can I get back a String where each has been prepended with -a . Here is my solution, and it works, but it feels obtuse, so I'd like to know the best way to do this.
input: ["foo", "bar"]
output: -a foo -a bar
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=5558637f3462414cd57926db0faaf20b
fn f(args: Vec<String>) -> String {
args.into_iter()
.map(|s| format!("-a {}", s))
.collect::<Vec<String>>().join(" ")
}
fn main() {
let args = ["foo", "bar"].iter().map(|&s| s.into()).collect();
println!("{}", f(args));
}

It's concise but there are two points of inefficiency here:
creating a new vec instead of modifying the vec in place
creating a lot of strings when you just want one string
A first improvement would be to just prepend to the strings in place but if you want something more direct you could do
fn f(args: Vec<String>) -> String {
let mut s = String::new();
for arg in args {
if !s.is_empty() {
s.push(' ');
}
s.push_str("-a ");
s.push_str(&arg);
}
s
}
If performance is irrelevant for your use case, then there's no problem with your code. Without additional crates join does need a vec today (you could use intersperse but only in nightly).
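The "prepend in place" variant mentioned above is not spelled out, so here is one way it might look. The function name prepend_in_place is made up for illustration. It still calls join, so it removes only the intermediate Vec, not the per-string growth.

```rust
/// Prepends "-a " to every argument in place, then joins with spaces.
fn prepend_in_place(mut args: Vec<String>) -> String {
    for arg in &mut args {
        // Mutates the existing String instead of allocating a new one via format!.
        arg.insert_str(0, "-a ");
    }
    args.join(" ")
}
```

Calling it with the strings foo and bar yields the same output as the original f.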

If you can accept a slight loss of speed - here is a shorter solution using fold:
fn f(args: Vec<String>) -> String {
args.iter()
.fold(String::new(), |s1, s2| s1 + " -a " + s2)
.trim_start().to_string()
}

Convert key value pair string to HashMap in one line

I have a string with contents like the following:
key1:value1 key2:value2 key3:value3 ...
I want to end up with a HashMap<&str, &str> (or equivalent), which maps the keys and values from the string (e.g. a hash lookup of "key1" should return "value1").
Currently, I am able to easily fill up the HashMap with the following:
let mut hm = HashMap::new();
for item in my_string.split_ascii_whitespace() {
let splits = item.split(":").collect::<Vec<&str>>();
hm.insert(splits[0], splits[1]);
}
However, what if I wanted to do it in one line (even at cost of readability, for "code golf"-ish purposes)? I know how to do this with a HashSet, so I'd imagine it would look somewhat similar; perhaps something like this (which doesn't actually compile):
let hm: HashMap<&str, &str> = HashMap::from_iter(my_string.split_ascii_whitespace().map(|s| s.split(":").take(2).collect::<Vec<&str>>()));
I have tried different combinations of things similar to the above, but I can't seem to find something that will actually compile.

My approach to this problem started from remembering that an Iterator<Item=(K, V)> can be collected into a HashMap<K, V>. From there I worked backwards to figure out how to turn a &str into an Iterator<Item=(&str, &str)>, which I managed to do using the str find and split_at methods:
use std::collections::HashMap;
fn one_liner(string: &str) -> HashMap<&str, &str> {
string.split_whitespace().map(|s| s.split_at(s.find(":").unwrap())).map(|(key, val)| (key, &val[1..])).collect()
}
fn main() {
dbg!(one_liner("key1:value1 key2:value2 key3:value3"));
}
The second map call was necessary to remove the leading : character from the value string.
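On newer toolchains the same idea can be written with str::split_once (stable since Rust 1.52), which drops the separator and so makes the second map unnecessary. This is a sketch, not part of the original answer. The name parse_pairs is invented here, and silently skipping items without a colon is a choice made for the example instead of panicking like unwrap.

```rust
use std::collections::HashMap;

/// Splits "k1:v1 k2:v2 ..." into a map; items without a ':' are skipped.
fn parse_pairs(input: &str) -> HashMap<&str, &str> {
    input
        .split_whitespace()
        .filter_map(|item| item.split_once(':'))
        .collect()
}
```

If a malformed item should be an error, swap filter_map for map and collect into an Option or Result.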

How can I take N bits of byte in nom?

I am trying to write a HTTP2 parser with nom. I'm implementing the HPACK header compression, but having trouble understanding how to work with bit fields in nom.
For example, the Indexed Header Field Representation starts with the first bit as 1.
fn indexed_header_field_tag(i: &[u8]) -> IResult<&[u8], ()> {
nom::bits::streaming::tag(1, 1)(i)
}
This gives me a compiler error I don't really understand (to be honest, I'm having some problems with the types in nom):
error[E0308]: mismatched types
--> src/parser.rs:179:41
|
179 | nom::bits::streaming::tag(1, 1)(i)
| ^ expected tuple, found `&[u8]`
|
= note: expected tuple `(_, usize)`
found reference `&[u8]`
What should I put here?
Another example is:
fn take_2_bits(input: &[u8]) -> IResult<&[u8], u64> {
nom::bits::bits(nom::bits::streaming::take::<_, _, _, (_, _)>(2usize))(input)
}
Here, my problem is that the remaining bits of the first byte are discarded, even though I want to further work on them.
I guess I can do it manually with bitwise-ands, but doing it with nom would be nicer.
I've tried the following approach, but it gives me many compiler errors:
fn check_tag(input: &[u8]) -> IResult<&[u8], ()> {
use nom::bits::{bits, bytes, complete::take_bits, complete::tag};
let converted_bits = bits(take_bits(2usize))(2)?;
let something = tag(0x80, 2)(converted_bits);
nom::bits::bytes(something)
}
(Inspired from https://docs.rs/nom/5.1.2/nom/bits/fn.bytes.html).
It tells me that there is no complete::take_bits (I guess the documentation is a bit off there), but it also tells me:
368 | let converted_bits = bits(take_bits(2usize))(2)?;
| ^ the trait `nom::traits::Slice<std::ops::RangeFrom<usize>>` is not implemented for `{integer}`
along with other errors that merely follow from the first ones.

The bit-oriented interfaces (e.g. take) accept a tuple (I, usize), representing (input, bit_offset), so you need to use a function such as bits to convert the input from i to (i, 0), then convert the output back to bytes by ignoring any remaining bits in the current byte.
For the second question, see the comments on How can I combine nom parsers to get a more bit-oriented interface to the data? : use bits only when you need to switch between bits and bytes, and make bit-oriented functions use bit-oriented input.
Example code
use nom::{IResult, bits::{bits, complete::{take, tag}}};
fn take_2_bits(i: (&[u8], usize)) -> IResult<(&[u8], usize), u8> {
take(2usize)(i)
}
fn check_tag(i: (&[u8], usize)) -> IResult<(&[u8], usize), u8> {
tag(0x01, 1usize)(i)
}
fn do_everything_bits(i: (&[u8], usize)) -> IResult<(&[u8], usize), (u8, u8)> {
let (i, a) = take_2_bits(i)?;
let (i, b) = check_tag(i)?;
Ok((i, (a, b)))
}
fn do_everything_bytes(i: &[u8]) -> IResult<&[u8], (u8, u8)> {
bits(do_everything_bits)(i)
}
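To make the (input, bit_offset) idea concrete, here is a std-only sketch of such a cursor. It is not nom code, and take_bits and indexed_header_field are names invented for this example. It reads bits most significant first and keeps the unread bits of the current byte available for the next call, which is the behavior the question was after. The HPACK part assumes the index fits in 7 bits and ignores the multi-byte integer encoding.

```rust
/// Takes `count` bits (at most 8) from a `(bytes, bit_offset)` cursor, MSB first.
/// Returns the advanced cursor and the bits packed into the low end of a u8.
fn take_bits(input: (&[u8], usize), count: usize) -> Option<((&[u8], usize), u8)> {
    let (mut bytes, mut offset) = input;
    if count > 8 || offset > 7 {
        return None;
    }
    let mut value: u8 = 0;
    for _ in 0..count {
        let byte = *bytes.first()?; // None if the input runs out
        let bit = (byte >> (7 - offset)) & 1;
        value = (value << 1) | bit;
        offset += 1;
        if offset == 8 {
            // Finished this byte: move to the next one.
            bytes = &bytes[1..];
            offset = 0;
        }
    }
    Some(((bytes, offset), value))
}

/// Hypothetical example: a leading 1 bit followed by a 7-bit index.
fn indexed_header_field(bytes: &[u8]) -> Option<(&[u8], u8)> {
    let (cursor, flag) = take_bits((bytes, 0), 1)?;
    if flag != 1 {
        return None;
    }
    let ((rest, _offset), index) = take_bits(cursor, 7)?;
    Some((rest, index))
}
```

Only the outermost function converts between bytes and the bit cursor, mirroring the advice to call bits once at the boundary.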

How to find the starting offset of a string slice of another string? [duplicate]

This question already has answers here:
How to get the byte offset between `&str`
(2 answers)
Closed 3 years ago.
Given a string and a slice referring to some substring, is it possible to find the starting and ending index of the slice?
I have a ParseString function which takes in a reference to a string, and tries to parse it according to some grammar:
ParseString(inp_string: &str) -> Result<(), &str>
If the parsing is fine, the result is just Ok(()), but if there's some error, it usually is in some substring, and the error instance is Err(e), where e is a slice of that substring.
When given the substring where the error occurs, I want to say something like "Error from characters x to y", where x and y are the starting and ending indices of the erroneous substring.
I don't want to encode the position of the errors directly in Err, because I'm nesting these invocations, and the offsets in the nested slice might not correspond to the same slice in the top level string.

As long as all of your string slices borrow from the same string buffer, you can calculate offsets with simple pointer arithmetic. You need the following methods:
str::as_ptr(): Returns the pointer to the start of the string slice
A way to get the difference between two pointers. Right now, the easiest way is to just cast both pointers to usize (which is always a no-op) and then subtract those. On 1.47.0+, there is a method offset_from() which is slightly nicer.
Here is working code (Playground):
fn get_range(whole_buffer: &str, part: &str) -> (usize, usize) {
let start = part.as_ptr() as usize - whole_buffer.as_ptr() as usize;
let end = start + part.len();
(start, end)
}
fn main() {
let input = "Everyone ♥ Ümläuts!";
let part1 = &input[1..7];
println!("'{}' has offset {:?}", part1, get_range(input, part1));
let part2 = &input[7..16];
println!("'{}' has offset {:?}", part2, get_range(input, part2));
}
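The offset_from() variant mentioned above could look roughly like the sketch below. The function name byte_range and the choice to return an Option<Range<usize>> are assumptions made for this example. The address check runs first because offset_from is unsafe and requires both pointers to lie within the same allocation.

```rust
use std::ops::Range;

/// Returns the byte range of `part` inside `whole`, or `None` if `part`
/// does not lie within `whole`'s buffer.
fn byte_range(whole: &str, part: &str) -> Option<Range<usize>> {
    let whole_start = whole.as_ptr() as usize;
    let whole_end = whole_start + whole.len();
    let part_start = part.as_ptr() as usize;
    let part_end = part_start + part.len();
    if part_start < whole_start || part_end > whole_end {
        return None;
    }
    // SAFETY: the check above established that `part` lies inside `whole`,
    // so both pointers belong to the same allocation and the offset is >= 0.
    let start = unsafe { part.as_ptr().offset_from(whole.as_ptr()) } as usize;
    Some(start..start + part.len())
}
```

The returned range can be used directly to index the original buffer or to format a message such as "Error from characters x to y"; note that the numbers are byte offsets, not character counts.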

Rust actually used to have an unstable method for doing exactly this, but it was removed due to being obsolete, which was a bit odd considering the replacement didn't remotely have the same functionality.
That said, the implementation isn't that big, so you can just add the following to your code somewhere:
pub trait SubsliceOffset {
/**
Returns the byte offset of an inner slice relative to an enclosing outer slice.
Examples
```ignore
let string = "a\nb\nc";
let lines: Vec<&str> = string.lines().collect();
assert!(string.subslice_offset_stable(lines[0]) == Some(0)); // &"a"
assert!(string.subslice_offset_stable(lines[1]) == Some(2)); // &"b"
assert!(string.subslice_offset_stable(lines[2]) == Some(4)); // &"c"
assert!(string.subslice_offset_stable("other!") == None);
```
*/
fn subslice_offset_stable(&self, inner: &Self) -> Option<usize>;
}
impl SubsliceOffset for str {
fn subslice_offset_stable(&self, inner: &str) -> Option<usize> {
let self_beg = self.as_ptr() as usize;
let inner = inner.as_ptr() as usize;
if inner < self_beg || inner > self_beg.wrapping_add(self.len()) {
None
} else {
Some(inner.wrapping_sub(self_beg))
}
}
}
You can remove the _stable suffix if you don't need to support old versions of Rust; it's just there to avoid a name conflict with the now-removed subslice_offset method.

Read a single line from stdin to a string in one line of code

I know that I can read one line and convert it to number in one line, i.e.
let lines: u32 = io::stdin().read_line().ok().unwrap().trim().parse().unwrap();
How to do the same without parse and in one line? Right now I do this:
let line_u = io::stdin().read_line().ok().unwrap();
let line_t = line_u.as_slice().trim();
Edit: Explanation of what's going on here:
pub fn stdin() -> StdinReader
fn read_line(&mut self) -> IoResult<String> // method of StdinReader
type IoResult<T> = Result<T, IoError>;
fn ok(self) -> Option<T> // method of Result
fn unwrap(self) -> T // method of Option
fn trim(&self) -> &str // method of str from trait StrExt
fn to_string(?) -> String // I don't know where this is located in the documentation
We can use trim on a String because a String is a str decorated with a pointer, i.e. an owned string.
parse(),
stdin(),
read_line(),
IoResult,
ok(),
unwrap(),
trim(),
str

trim() returns an &str view of the String returned by unwrap(). You can't store this object because the owning String will no longer exist at the end of the statement. So just use to_string() to convert the &str back to a String.
let line = io::stdin().read_line().ok().unwrap().trim().to_string();
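The snippets above use the pre-1.0 std::io API. In current Rust, read_line takes a &mut String buffer, so a true one-liner is less direct. A small helper along these lines is one way to get the same result. The name read_trimmed_line is made up here, and it is generic over BufRead only so that it can be exercised without a terminal.

```rust
use std::io::{self, BufRead};

/// Reads one line from `reader` and returns it with surrounding whitespace removed.
fn read_trimmed_line<R: BufRead>(mut reader: R) -> io::Result<String> {
    let mut line = String::new();
    reader.read_line(&mut line)?;
    // trim() borrows from `line`, so copy the result into an owned String.
    Ok(line.trim().to_string())
}
```

For standard input the call would be read_trimmed_line(io::stdin().lock()), which keeps the call site on a single line.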
