I am trying to write an HTTP/2 parser with nom. I'm implementing HPACK header compression, but I'm having trouble understanding how to work with bit fields in nom.
For example, the Indexed Header Field Representation starts with the first bit set to 1.
fn indexed_header_field_tag(i: &[u8]) -> IResult<&[u8], ()> {
    nom::bits::streaming::tag(1, 1)(i)
}
This gives me a compiler error I don't really understand (to be honest, I'm having some trouble with the types in nom):
error[E0308]: mismatched types
--> src/parser.rs:179:41
|
179 | nom::bits::streaming::tag(1, 1)(i)
| ^ expected tuple, found `&[u8]`
|
= note: expected tuple `(_, usize)`
found reference `&[u8]`
What should I put here?
Another example is:
fn take_2_bits(input: &[u8]) -> IResult<&[u8], u64> {
    nom::bits::bits(nom::bits::streaming::take::<_, _, _, (_, _)>(2usize))(input)
}
Here, my problem is that the remaining bits of the first byte are discarded, even though I want to keep working with them.
I guess I can do it manually with bitwise-ands, but doing it with nom would be nicer.
I've tried the following approach, but it gives me several compiler errors:
fn check_tag(input: &[u8]) -> IResult<&[u8], ()> {
    use nom::bits::{bits, bytes, complete::take_bits, complete::tag};
    let converted_bits = bits(take_bits(2usize))(2)?;
    let something = tag(0x80, 2)(converted_bits);
    nom::bits::bytes(something)
}
(Inspired by https://docs.rs/nom/5.1.2/nom/bits/fn.bytes.html).
It tells me that there is no complete::take_bits (I guess the documentation is just a bit off there), but it also tells me:
368 | let converted_bits = bits(take_bits(2usize))(2)?;
| ^ the trait `nom::traits::Slice<std::ops::RangeFrom<usize>>` is not implemented for `{integer}`
and other errors, which are just consequences of the first one.
The bit-oriented interfaces (e.g. take) accept a tuple (I, usize), representing (input, bit_offset), so you need to use a function such as bits to convert the input from i to (i, 0), then convert the output back to bytes by ignoring any remaining bits in the current byte.
For the second question, see the comments on "How can I combine nom parsers to get a more bit-oriented interface to the data?": use bits only when you need to switch between bytes and bits, and make bit-oriented functions take bit-oriented input.
Example code
use nom::{IResult, bits::{bits, complete::{take, tag}}};
fn take_2_bits(i: (&[u8], usize)) -> IResult<(&[u8], usize), u8> {
    take(2usize)(i)
}

fn check_tag(i: (&[u8], usize)) -> IResult<(&[u8], usize), u8> {
    tag(0x01, 1usize)(i)
}

fn do_everything_bits(i: (&[u8], usize)) -> IResult<(&[u8], usize), (u8, u8)> {
    let (i, a) = take_2_bits(i)?;
    let (i, b) = check_tag(i)?;
    Ok((i, (a, b)))
}

fn do_everything_bytes(i: &[u8]) -> IResult<&[u8], (u8, u8)> {
    bits(do_everything_bits)(i)
}
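A small usage sketch (the input byte 0b0110_0000 is just an illustrative value): the first two bits come back from take_2_bits, and the third bit, which is 1, satisfies check_tag.
fn main() {
    // 0b0110_0000: first two bits = 01, third bit = 1.
    let data = [0b0110_0000u8];
    let (rest, (two_bits, flag)) = do_everything_bytes(&data).unwrap();
    assert_eq!(two_bits, 0b01);
    assert_eq!(flag, 1);
    // bits() drops the unused low bits of the partially consumed byte,
    // so no full bytes remain here.
    assert!(rest.is_empty());
}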
I'm writing a program to convert a number into a sorted, reversed array of its digits,
e.g. 23453 -> vec![5, 4, 3, 3, 2],
but I got this error and I can't fix it:
error[E0599]: no method named `sorted` found for struct `Chars` in the current scope
--> src/main.rs:2050:25
|
2050 | n.to_string().chars().sorted().map(|no| no.to_digit(10).unwrap() as u32).rev().collect::<Vec<u32>>()
| ^^^^^^ method not found in `Chars<'_>`
error[E0277]: `Vec<u32>` doesn't implement `std::fmt::Display`
Here is my code:
fn sorted_rev_arr(n : u32) -> Vec<u32>{
    n.to_string().chars().sorted().map(|no| no.to_digit(10).unwrap() as u32).rev().collect::<Vec<u32>>()
}

fn main(){
    let random_number = 23453;
    println!("the new number in array is {}",sorted_rev_arr(random_number));
}
Can anybody help me resolve this issue?
There is no sorted method in Rust iterators or Vec. You'll have to collect to a Vec first and then sort it:
fn sorted_rev_arr(n: u32) -> Vec<u32> {
    let mut digits = n
        .to_string()
        .chars()
        .map(|no| no.to_digit(10).unwrap() as u32)
        .collect::<Vec<u32>>();
    digits.sort();
    digits.reverse();
    digits
}
You can also do a reverse sort in one go:
fn sorted_rev_arr(n: u32) -> Vec<u32> {
    let mut digits = n
        .to_string()
        .chars()
        .map(|no| no.to_digit(10).unwrap() as u32)
        .collect::<Vec<u32>>();
    digits.sort_by(|a, b| b.cmp(&a));
    digits
}
Also, you need to use {:?} instead of {} to print a Vec.
Playground
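For instance, a minimal sketch of the adjusted main (using the sorted_rev_arr above):
fn main() {
    let random_number = 23453;
    // Vec<u32> implements Debug but not Display, so print it with {:?}.
    println!("the new number in array is {:?}", sorted_rev_arr(random_number));
}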
In this toy example I'd like to map the items from a HashMap<String, String> with a helper function. There are two versions defined, one that takes arguments of the form &String and another with &str. Only the &String one compiles. I had thought that String always dereferences to &str but that doesn't seem to be the case here. What's the difference between a &String and a &str?
use std::collections::HashMap;
// &String works
fn process_item_1(key_value: (&String, &String)) -> String {
    let mut result = key_value.0.to_string();
    result.push_str(", ");
    result.push_str(key_value.1);
    result
}

// &str doesn't work (type mismatch in fn arguments)
fn process_item_2(key_value: (&str, &str)) -> String {
    let mut result = key_value.0.to_string();
    result.push_str(", ");
    result.push_str(key_value.1);
    result
}

fn main() {
    let mut map: HashMap<String, String> = HashMap::new();
    map.insert("a".to_string(), "b".to_string());
    for s in map.iter().map(process_item_2) { // <-- compile error on this line
        println!("{}", s);
    }
}
Here's the error for reference:
error[E0631]: type mismatch in function arguments
--> src/main.rs:23:29
|
12 | fn process_item_2(key_value: (&str, &str)) -> String {
| ---------------------------------------------------- found signature of `for<'r, 's> fn((&'r str, &'s str)) -> _`
...
23 | for s in map.iter().map(process_item_2) {
| ^^^^^^^^^^^^^^ expected signature of `fn((&String, &String)) -> _`
Thanks for your help with a beginner Rust question!
It gets even stranger than that:
map.iter().map(|s| process_item_2(s)) // Does not work
map.iter().map(|(s1, s2)| process_item_2((s1, s2))) // Works
The point is that Rust never performs any expensive coercion. Converting &String to &str is cheap: you just take the data pointer and length. But converting (&String, &String) to (&str, &str) is not so cheap anymore: you have to take the data pointer and length of the first string, then of the second string, and build a new tuple out of them (and besides, if this were done for tuples, what about (((&String, &String, &String), &String), (&String, &String))? It would presumably have to be done for arrays too, so what about &[&String; 10_000]?). That's why the first closure fails. The second closure, however, destructures the tuple and rebuilds it. That means that instead of coercing a tuple, we coerce &String twice and build a tuple from the results. That's fine.
The version without the closure is even more problematic: since you're passing a function directly to map(), and the iterator hands it (&String, &String) items, someone needs to convert those to (&str, &str)! To do that, the compiler would have to introduce a shim, a small function that does that work: it takes (&String, &String) and calls process_item_2() with the arguments coerced to (&str, &str). This is a hidden cost, so Rust (almost) never creates shims. That is why it wouldn't work even for a plain &String argument, not just for (&String, &String). And it's why |v| f(v) is not always the same as f: the first one performs coercions, while the second doesn't.
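Putting it together, a minimal sketch of the question's program with the destructuring closure (nothing else changed):
use std::collections::HashMap;

fn process_item_2(key_value: (&str, &str)) -> String {
    let mut result = key_value.0.to_string();
    result.push_str(", ");
    result.push_str(key_value.1);
    result
}

fn main() {
    let mut map: HashMap<String, String> = HashMap::new();
    map.insert("a".to_string(), "b".to_string());
    // Destructure the (&String, &String) item and rebuild it: each &String
    // is coerced to &str individually, which Rust is happy to do.
    for s in map.iter().map(|(k, v)| process_item_2((k, v))) {
        println!("{}", s);
    }
}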
Using Rust, I want to take a slice of bytes from a vec and display them as hex on the console. I can make this work using the itertools format function and println!, but I cannot figure out how it works. Here is the code, simplified...
use itertools::Itertools;
// Create a vec of bytes
let mut buf = vec![0; 1024];
... fill the vec with some data, doesn't matter how, I'm reading from a socket ...
// Create a slice into the vec
let bytes = &buf[..5];
// Print the slice, using format from itertools, example output could be: 30 27 02 01 00
println!("{:02x}", bytes.iter().format(" "));
(as an aside, I realize I can use the much simpler itertools join function, but in this case I don't want the default 0x## style formatting, as it is somewhat bulky)
How on earth does this work under the covers? I know itertools format is creating a "Format" struct, and I can see the source code here, https://github.com/rust-itertools/itertools/blob/master/src/format.rs , but I am none the wiser. I suspect the answer has to do with "macro_rules! impl_format" but that is just about where my head explodes.
Can some Rust expert explain the magic? I hate to blindly copy-paste code without a clue. Am I abusing itertools? Maybe there's a better, simpler way to go about this.
I suspect the answer has to do with "macro_rules! impl_format" but that is just about where my head explodes.
The impl_format! macro is used to implement the various formatting traits.
impl_format!{Display Debug
             UpperExp LowerExp UpperHex LowerHex Octal Binary Pointer}
The author has chosen to write a macro because the implementations all look the same. The way repetitions work in macros means that macros can be very helpful even when they are used only once (here, we could do the same by invoking the macro once for each trait, but that's not true in general).
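As a rough illustration of that repetition mechanism (a toy macro, not the actual impl_format! from itertools):
use std::fmt;

struct Wrapper(u32);

// One macro arm, expanded once per trait name in the repetition:
// each $t produces a separate impl block that forwards to u32's impl.
macro_rules! impl_fmt_traits {
    ($($t:ident)*) => {
        $(
            impl fmt::$t for Wrapper {
                fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
                    fmt::$t::fmt(&self.0, f)
                }
            }
        )*
    };
}

impl_fmt_traits! { Display LowerHex Octal Binary }

fn main() {
    let w = Wrapper(42);
    println!("{} {:x} {:o} {:b}", w, w, w, w); // 42 2a 52 101010
}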
Let's expand the implementation of LowerHex for Format and look at it:
impl<'a, I> fmt::LowerHex for Format<'a, I>
    where I: Iterator,
          I::Item: fmt::LowerHex,
{
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        self.format(f, fmt::LowerHex::fmt)
    }
}
The fmt method calls another method, format, defined in the same module.
impl<'a, I> Format<'a, I>
    where I: Iterator,
{
    fn format<F>(&self, f: &mut fmt::Formatter, mut cb: F) -> fmt::Result
        where F: FnMut(&I::Item, &mut fmt::Formatter) -> fmt::Result,
    {
        let mut iter = match self.inner.borrow_mut().take() {
            Some(t) => t,
            None => panic!("Format: was already formatted once"),
        };
        if let Some(fst) = iter.next() {
            cb(&fst, f)?;
            for elt in iter {
                if self.sep.len() > 0 {
                    f.write_str(self.sep)?;
                }
                cb(&elt, f)?;
            }
        }
        Ok(())
    }
}
format takes two arguments: the formatter (f) and a formatting function (cb for callback). The formatting function here is fmt::LowerHex::fmt. This is the fmt method from the LowerHex trait; how does the compiler figure out which LowerHex implementation to use? It's inferred from format's type signature. The type of cb is F, and F must implement FnMut(&I::Item, &mut fmt::Formatter) -> fmt::Result. Notice the type of the first argument: &I::Item (I is the type of the iterator that was passed to format). LowerHex::fmt's signature is:
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result;
For any type Self that implements LowerHex, this function will implement FnMut(&Self, &mut fmt::Formatter) -> fmt::Result. Thus, the compiler infers that Self == I::Item.
One important thing to note here is that the formatting attributes (e.g. the 02 in your formatting string) are stored in the Formatter. Implementations of e.g. LowerHex will use methods such as Formatter::width to retrieve an attribute. The trick here is that the same formatter is used to format multiple values (with the same attributes).
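To see that attribute plumbing in isolation, here is a hypothetical HexList wrapper (not part of itertools) that reuses the same Formatter, and with it the 02 width, for every element:
use std::fmt;

struct HexList<'a>(&'a [u8]);

impl fmt::LowerHex for HexList<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let mut first = true;
        for b in self.0 {
            if !first {
                f.write_str(" ")?;
            }
            first = false;
            // Delegate to u8's LowerHex impl; it reads the width/fill
            // attributes (the `02`) from this same Formatter.
            fmt::LowerHex::fmt(b, f)?;
        }
        Ok(())
    }
}

fn main() {
    let bytes: &[u8] = &[0x30, 0x27, 0x02, 0x01, 0x00];
    println!("{:02x}", HexList(bytes)); // 30 27 02 01 00
}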
In Rust, methods can be called in two ways: using method syntax and using function syntax. These two functions are equivalent:
use std::fmt;
pub fn method_syntax(f: &mut fmt::Formatter) -> fmt::Result {
    use fmt::LowerHex;
    let x = 42u8;
    x.fmt(f)
}

pub fn function_syntax(f: &mut fmt::Formatter) -> fmt::Result {
    let x = 42u8;
    fmt::LowerHex::fmt(&x, f)
}
When format is called with fmt::LowerHex::fmt, this means that cb refers to fmt::LowerHex::fmt. format must use function-call syntax because there's no guarantee that the callback is even a method!
am I abusing itertools
Not at all; in fact, this is precisely how format is meant to be used.
maybe there's a better, simpler way to go about this
There are simpler ways, sure, but using format is very efficient because it doesn't allocate dynamic memory.
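For comparison, a simpler route (hex_join is just an illustrative helper, not from itertools) produces the same output but allocates a String per element plus the joined result, which is exactly the cost format avoids:
fn hex_join(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:02x}", b)) // one small String per element
        .collect::<Vec<_>>()
        .join(" ") // plus one allocation for the final result
}

fn main() {
    let bytes = [0x30u8, 0x27, 0x02, 0x01, 0x00];
    assert_eq!(hex_join(&bytes), "30 27 02 01 00");
}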
I'm trying to learn to use nom (5.0.1) and want to get the string between two tags:
use nom::{
    bytes::complete::{tag_no_case, take_while},
    character::{is_alphanumeric},
    error::{ParseError},
    sequence::{delimited},
    IResult,
};

fn root<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, &str, E> {
    delimited(
        tag_no_case("START;"),
        take_while(is_alphanumeric),
        tag_no_case("END;"),
    )(i)
}
But this gives me the error
error[E0271]: type mismatch resolving `<&str as nom::InputTakeAtPosition>::Item == u8`
--> src/main.rs:128:9
|
128 | take_while(is_alphanumeric),
| ^^^^^^^^^^^ expected char, found u8
What have I done wrong here? I'm fairly new to Rust and a total beginner with nom so I'm expecting it to be something really obvious in the end :)
The is_alphanumeric from nom expects a parameter of type u8, but you give it a char. Use is_alphanumeric from std instead:
fn root<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, &str, E> {
    delimited(
        tag_no_case("START;"),
        take_while(char::is_alphanumeric),
        tag_no_case("END;"),
    )(i)
}
I don't have all the info needed (the exact types of take_while and is_alphanumeric), so I'll try to approximate an answer.
As you want take_while(F)(i) to return &str (which, loosely, you can think of as a sequence of chars), take_while(F) must have type impl Fn(&str) -> IResult<&str, &str, Error>.
However, take_while takes an argument of type Fn(<Input as InputTakeAtPosition>::Item) -> bool, and returns impl Fn(Input) -> IResult<Input, Input, Error>.
So with Input = &str the item type is char, which means F, the argument to take_while, must have type Fn(char) -> bool.
Does your is_alphanumeric have that type, or does it take a u8?
Or it could be the opposite: you may have a take_while that works on &[u8], which doesn't work with your function that takes and returns &str (which, again, behaves like a sequence of chars and is definitely not a &[u8] as far as the type system is concerned).
Why won't this compile?
fn isPalindrome<T>(v: Vec<T>) -> bool {
    return v.reverse() == v;
}
I get
error[E0308]: mismatched types
--> src/main.rs:2:25
|
2 | return v.reverse() == v;
| ^ expected (), found struct `std::vec::Vec`
|
= note: expected type `()`
found type `std::vec::Vec<T>`
Since you only need to look at the front half and back half, you can use the DoubleEndedIterator trait (methods .next() and .next_back()) to look at pairs of front and back elements this way:
/// Determine if an iterable equals itself reversed
fn is_palindrome<I>(iterable: I) -> bool
where
    I: IntoIterator,
    I::Item: PartialEq,
    I::IntoIter: DoubleEndedIterator,
{
    let mut iter = iterable.into_iter();
    while let (Some(front), Some(back)) = (iter.next(), iter.next_back()) {
        if front != back {
            return false;
        }
    }
    true
}
(run in playground)
This version is a bit more general, since it supports any iterable that is double ended, for example slice and chars iterators.
It only examines each element once, and it automatically skips the remaining middle element if the iterator was of odd length.
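For instance, a quick sketch of calling it on a few different double-ended iterables:
fn main() {
    // Works for chars iterators, Vecs, and slice iterators alike.
    assert!(is_palindrome("racecar".chars()));
    assert!(!is_palindrome("rust".chars()));
    assert!(is_palindrome(vec![1, 2, 2, 1]));
    assert!(is_palindrome([1, 2, 3, 2, 1].iter()));
}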
Read up on the documentation for the function you are using:
Reverse the order of elements in a slice, in place.
Or check the function signature:
fn reverse(&mut self)
The return value of the method is the unit type, an empty tuple (). You can't compare that against a vector.
Stylistically, Rust uses 4 space indents, snake_case identifiers for functions and variables, and has an implicit return at the end of blocks. You should adjust to these conventions in a new language.
Additionally, you should take a &[T] instead of a Vec<T> if you are not adding items to the vector.
To solve your problem, we will use iterators to compare the slice. You can get forward and backward iterators of a slice, which requires a very small amount of space compared to reversing the entire array. Iterator::eq allows you to do the comparison succinctly.
You also need to state that the T is comparable against itself, which requires Eq or PartialEq.
fn is_palindrome<T>(v: &[T]) -> bool
where
    T: Eq,
{
    v.iter().eq(v.iter().rev())
}

fn main() {
    println!("{}", is_palindrome(&[1, 2, 3]));
    println!("{}", is_palindrome(&[1, 2, 1]));
}
If you wanted to do the less-space efficient version, you have to allocate a new vector yourself:
fn is_palindrome<T>(v: &[T]) -> bool
where
    T: Eq + Clone,
{
    let mut reverse = v.to_vec();
    reverse.reverse();
    reverse == v
}

fn main() {
    println!("{}", is_palindrome(&[1, 2, 3]));
    println!("{}", is_palindrome(&[1, 2, 1]));
}
Note that we are now also required to Clone the items in the vector, so we add that trait bound to the method.