I'm having a hard time finding any useful examples of how to use nom to parse a binary file, since it seems that the documentation is heavily biased towards &str input parsers.
I just want to make a function here that will read 4 bytes, turn it into a u32 and return it as a result. Here is my function:
fn take_u32(input: &[u8]) -> IResult<&[u8], u32> {
map_res(
take(4),
|bytes| Ok(u32::from_be_bytes(bytes))
)(input)
}
I'm getting the following error:
error[E0277]: the trait bound `[u8; 4]: From<u8>` is not satisfied
--> src\main.rs:16:9
|
16 | take(4),
| ^^^^ the trait `From<u8>` is not implemented for `[u8; 4]`
|
::: C:\Users\cmbas\.cargo\registry\src\github.com-1ecc6299db9ec823\nom-7.1.0\src\bits\complete.rs:39:6
|
39 | O: From<u8> + AddAssign + Shl<usize, Output = O> + Shr<usize, Output = O>,
| -------- required by this bound in `nom::complete::take`
What is the canonical way to do what I'm attempting?
take() return Self aka your input, In your case slice &[u8].
map_res() is used to map a Result, from_be_bytes() doesn't return a result, the doc also contains an example using TryInto::try_into.
To make your code compile you need to use map_opt() and use try_into() to transform the slice to array:
fn take_u32(input: &[u8]) -> IResult<&[u8], u32> {
map_opt(
take(4),
|bytes| int_bytes.try_into().map(u32::from_be_bytes)
)(input)
}
That said nom already have such basic conbinator be_u32
Related
I am trying to create a high order function which takes in (a function that takes &str and returns iterator of &str). What I am having difficulty is to relate lifetime variables of for<'a> for the function and the return type of that function. It is hard to describe in words, so let's jump right into the code:
use std::fs::File;
use std::io::{BufRead, BufReader, Result};
fn static_dispatcher<'b, I>(
tokenize: impl for<'a> Fn(&'a str) -> I,
ifs: BufReader<File>,
) -> Result<()>
where
I: Iterator<Item = &'b str>,
{
ifs.lines()
.map(|line| tokenize(&line.unwrap()).count())
.for_each(move |n| {
println!("{n}");
});
Ok(())
}
fn main() -> Result<()> {
let ifs = BufReader::new(File::open("/dev/stdin")?);
static_dispatcher(|line| line.split_whitespace(), ifs)
// static_dispatcher(|line| line.split('\t'), ifs)
}
The compiler complains that the lifetime relation of the tokenize's input 'a and output 'b is not specified.
--> src/main.rs:21:30
|
21 | static_dispatcher(|line| line.split_whitespace(), ifs)
| ----- ^^^^^^^^^^^^^^^^^^^^^^^ returning this value requires that `'1` must outlive `'2`
| | |
| | return type of closure is SplitWhitespace<'2>
| has type `&'1 str`
|
= note: requirement occurs because of the type `SplitWhitespace<'_>`, which makes the generic argument `'_` invariant
= note: the struct `SplitWhitespace<'a>` is invariant over the parameter `'a`
= help: see <https://doc.rust-lang.org/nomicon/subtyping.html> for more information about variance
I want to specify 'a = 'b, but I can't because 'a comes from for<'a>, which is not visible for the type I.
I also tried
fn static_dispatcher<'a, I>(
tokenize: impl Fn(&'a str) -> I,
ifs: BufReader<File>,
) -> Result<()>
where
I: Iterator<Item = &'a str>,
but this does not work either b/c the lifetime of tokenize argument must be generic, i.e., must be used with for <'a>.
How can I fix this problem?
I'm refactoring the cartesian product code in Rust's itertools [1] as a way of learning Rust. The cartesian product is formed from an Iterator I and an IntoIterator J. The IntoIterator J is converted into an iterator multiple times and iterated over.
So far I have the code below, which is a minor modification of the code in the itertools source code. The biggest change is specifying a specific type (i8) instead of using generic types.
struct Product {
a: dyn Iterator<Item=i8>,
a_cur: Option<i8>,
b: dyn IntoIterator<Item=i8, IntoIter=dyn Iterator<Item=i8>>,
b_iter: dyn Iterator<Item=i8>
}
impl Iterator for Product {
type Item = (i8, i8);
fn next(&mut self) -> Option<Self::Item> {
let elt_b = match self.b_iter.next() {
None => {
self.b_iter = self.b.into_iter();
match self.b_iter.next() {
None => return None,
Some(x) => {
self.a_cur = self.a.next();
x
}
}
}
Some(x) => x
};
match self.a_cur {
None => None,
Some(ref a) => {
Some((a, elt_b))
}
}
}
}
fn cp(i: impl Iterator<Item=i8>, j: impl IntoIterator<Item=i8>) -> Product {
let p = Product{
a: i,
a_cur: i.next(),
b: j,
b_iter: j.into_iter()};
return p
}
fn main() {
for foo in cp(vec![1,4,7], vec![2,3,9]) {
println!("{:?}", foo);
}
}
Unfortunately the compiler is giving errors I have been unable to fix. I've attempted the fixes suggested by the compiler, but when I make them I get many more "doesn't have size known at compile time" errors.
I'm especially confused because the implementation in Rust's itertools library (link below) has a very similar structure and didn't require specifying lifetimes, borrowing, using Boxes, or the dyn keyword. I'd love to know what I changed that led to the Rust compiler suggesting using borrowing and/or Boxes.
error[E0277]: the size for values of type `(dyn Iterator<Item = i8> + 'static)` cannot be known at compilation time
--> src/main.rs:15:8
|
15 | a: dyn Iterator<Item=i8>,
| ^^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
|
= help: the trait `Sized` is not implemented for `(dyn Iterator<Item = i8> + 'static)`
= note: only the last field of a struct may have a dynamically sized type
= help: change the field's type to have a statically known size
help: borrowed types always have a statically known size
|
15 | a: &dyn Iterator<Item=i8>,
| ^
help: the `Box` type always has a statically known size and allocates its contents in the heap
|
15 | a: Box<dyn Iterator<Item=i8>>,
| ^^^^ ^
error[E0277]: the size for values of type `(dyn IntoIterator<Item = i8, IntoIter = (dyn Iterator<Item = i8> + 'static)> + 'static)` cannot be known at compilation time
--> src/main.rs:17:8
|
17 | b: dyn IntoIterator<Item=i8, IntoIter=dyn Iterator<Item=i8>>,
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
|
= help: the trait `Sized` is not implemented for `(dyn IntoIterator<Item = i8, IntoIter = (dyn Iterator<Item = i8> + 'static)> + 'static)`
= note: only the last field of a struct may have a dynamically sized type
= help: change the field's type to have a statically known size
help: borrowed types always have a statically known size
|
17 | b: &dyn IntoIterator<Item=i8, IntoIter=dyn Iterator<Item=i8>>,
| ^
help: the `Box` type always has a statically known size and allocates its contents in the heap
|
17 | b: Box<dyn IntoIterator<Item=i8, IntoIter=dyn Iterator<Item=i8>>>,
| ^^^^ ^
error: aborting due to 2 previous errors
[1] Docs at https://nozaq.github.io/shogi-rs/itertools/trait.Itertools.html#method.cartesian_product and code at https://github.com/rust-itertools/itertools/blob/master/src/adaptors/mod.rs#L286 .
You didn't just change the item type, you also removed the generic iterator. In the itertools crate, there is:
pub struct Product<I, J>
where I: Iterator
{
a: I,
…
Meaning that a is an iterator whose exact type will be specified by the user (but still at compile time).
You have removed the generic I parameter and instead you have written:
pub struct Product
{
a: dyn Iterator<Item=i8>,
…
If that worked, it would mean that a is an iterator whose item type is u8 but whose exact type will be specified at runtime. Therefore at compile-time the compiler can't know the exact type of a nor how much space it should allocate to store a inside Product.
If you want your cartesian product to work for any iterator whose items are u8, you need to keep the generic parameter I with an extra constraint:
pub struct Product<I, J>
where I: Iterator<Item=u8>
{
a: I,
…
And a similar change will be required for J in impl Iterator.
This is related to my earlier question on making a modular exponentiation method generic. I've now arrived at the following code:
fn powm<T>(fbase: &T, exponent: &T, modulus: &T) -> T
where
T: Mul<T, Output = T>
+ From<u8>
+ PartialEq<T>
+ Rem<T, Output = T>
+ Copy
+ for<'a> Rem<&'a T, Output = T>
+ Clone
+ PartialOrd<T>
+ ShrAssign<T>,
for<'a> &'a T: PartialEq<T> + Rem<&'a T, Output = T>,
{
if modulus == T::from(1) {
T::from(0)
} else {
let mut result = T::from(1);
let mut base = fbase % modulus;
let mut exp = exponent.clone();
while exp > T::from(0) {
if exp % T::from(2) == T::from(1) {
result = (result * base) % modulus;
}
exp >>= T::from(1);
base = (base * base) % modulus;
}
result
}
}
It is my understanding that by defining the trait bound where for<'a> &'a T: Rem<&'a T, Output=T> that it is understood that I can use the modulo operator % on two operands of type &'a T, and the result will be of type T. However, I get the following error:
error[E0495]: cannot infer an appropriate lifetime due to conflicting requirements
--> src/main.rs:20:30
|
20 | let mut base = fbase % modulus;
| ^
|
note: first, the lifetime cannot outlive the anonymous lifetime #3 defined on the function body at 3:1...
--> src/main.rs:3:1
|
3 | / fn powm<T>(fbase: &T, exponent: &T, modulus: &T) -> T
4 | | where
5 | | T: Mul<T, Output = T>
6 | | + From<u8>
... |
30 | | }
31 | | }
| |_^
note: ...so that reference does not outlive borrowed content
--> src/main.rs:20:32
|
20 | let mut base = fbase % modulus;
| ^^^^^^^
note: but, the lifetime must be valid for the anonymous lifetime #1 defined on the function body at 3:1...
--> src/main.rs:3:1
|
3 | / fn powm<T>(fbase: &T, exponent: &T, modulus: &T) -> T
4 | | where
5 | | T: Mul<T, Output = T>
6 | | + From<u8>
... |
30 | | }
31 | | }
| |_^
note: ...so that types are compatible (expected std::ops::Rem, found std::ops::Rem<&T>)
--> src/main.rs:20:30
|
20 | let mut base = fbase % modulus;
| ^
The code does work if I replace the line in question by
let mut base = fbase.clone() % modulus;
I don't see why I would need to clone in the first place if I can use the modulo operator already to return a "fresh" element of type T. Do I need to modify my trait bounds instead? Why does this go wrong?
When programming, it's very useful to learn how to create a Minimal, Complete, and Verifiable example (MCVE). This allows you to ignore irrelevant details and focus on the core of the problem.
As one example, your entire blob of code can be reduced down to:
use std::ops::Rem;
fn powm<T>(fbase: &T, modulus: &T)
where
for<'a> &'a T: Rem<&'a T, Output = T>,
{
fbase % modulus;
}
fn main() {}
Once you have a MCVE, you can make permutations to it to explore. For example, we can remove the lifetime elision:
fn powm<'a, 'b, T>(fbase: &'a T, modulus: &'b T)
where
for<'x> &'x T: Rem<&'x T, Output = T>,
{
fbase % modulus;
}
Now we start to see something: what is the relation between all three lifetimes? Well, there isn't one, really. What happens if we make one?
If we say that the input references can be unified to the same
lifetime, it works:
fn powm<'a, T>(fbase: &'a T, modulus: &'a T)
If we say that 'b outlives 'a, it works:
fn powm<'a, 'b: 'a, T>(fbase: &'a T, modulus: &'b T)
If we say that we can have two different lifetimes in the operator, it works:
for<'x, 'y> &'x T: Rem<&'y T, Output = T>,
What about if we poke at the call site?
If we directly call the Rem::rem method, it works:
Rem::rem(fbase, modulus);
If we dereference and re-reference, it works:
&*fbase % &*modulus;
I don't know exactly why the original doesn't work — conceptually both the input references should be able to be unified to one lifetime. It's possible that there's a piece of inference that either cannot or isn't happening, but I'm not aware of it.
Some further discussion with a Rust compiler developer led to an issue as it doesn't quite seem right. This issue has now been resolved and should theoretically be available in Rust 1.23.
This program dies because of infinite recursion:
use std::any::Any;
trait Foo {
fn get(&self, index: usize) -> Option<&Any>;
}
impl Foo for Vec<i32> {
fn get(&self, index: usize) -> Option<&Any> {
Vec::get(self, index).map(|v| v as &Any)
}
}
fn main() {
let v: Vec<i32> = vec![1, 2, 4];
println!("Results: {:?}", v.get(0))
}
The compiler itself warns about this:
warning: function cannot return without recurring
--> src/main.rs:8:5
|
8 | fn get(&self, index: usize) -> Option<&Any> {
| _____^ starting here...
9 | | Vec::get(self, index).map(|v| v as &Any)
10 | | }
| |_____^ ...ending here
|
= note: #[warn(unconditional_recursion)] on by default
note: recursive call site
--> src/main.rs:9:9
|
9 | Vec::get(self, index).map(|v| v as &Any)
| ^^^^^^^^^^^^^^^^^^^^^
= help: a `loop` may express intention better if this is on purpose
Why does universal call syntax not work in this case? The compiler does not understand that I want to call Vec::get not Foo::get.
How can I fix this, if I do not want to change function names?
To specify which method to call, whether inherent or provided from a trait, you want to use the fully qualified syntax:
Type::function(maybe_self, needed_arguments, more_arguments)
Trait::function(maybe_self, needed_arguments, more_arguments)
Your case doesn't work because Vec doesn't have a method called get! get is provided from the Deref implementation to [T].
The easiest fix is to call as_slice directly:
self.as_slice().get(index).map(|v| v as &Any)
You could also use the fully qualified syntax which requires the angle brackets in this case (<...>) to avoid ambiguity with declaring an array literal:
<[i32]>::get(self, index).map(|v| v as &Any)
universal call syntax
Note that while Rust originally used the term universal function call syntax (UFCS), the usage of this term conflicted with the existing understood programming term, so the use of it is not suggested. The replacement term is fully qualified syntax.
This program dies because of infinite recursion:
use std::any::Any;
trait Foo {
fn get(&self, index: usize) -> Option<&Any>;
}
impl Foo for Vec<i32> {
fn get(&self, index: usize) -> Option<&Any> {
Vec::get(self, index).map(|v| v as &Any)
}
}
fn main() {
let v: Vec<i32> = vec![1, 2, 4];
println!("Results: {:?}", v.get(0))
}
The compiler itself warns about this:
warning: function cannot return without recurring
--> src/main.rs:8:5
|
8 | fn get(&self, index: usize) -> Option<&Any> {
| _____^ starting here...
9 | | Vec::get(self, index).map(|v| v as &Any)
10 | | }
| |_____^ ...ending here
|
= note: #[warn(unconditional_recursion)] on by default
note: recursive call site
--> src/main.rs:9:9
|
9 | Vec::get(self, index).map(|v| v as &Any)
| ^^^^^^^^^^^^^^^^^^^^^
= help: a `loop` may express intention better if this is on purpose
Why does universal call syntax not work in this case? The compiler does not understand that I want to call Vec::get not Foo::get.
How can I fix this, if I do not want to change function names?
To specify which method to call, whether inherent or provided from a trait, you want to use the fully qualified syntax:
Type::function(maybe_self, needed_arguments, more_arguments)
Trait::function(maybe_self, needed_arguments, more_arguments)
Your case doesn't work because Vec doesn't have a method called get! get is provided from the Deref implementation to [T].
The easiest fix is to call as_slice directly:
self.as_slice().get(index).map(|v| v as &Any)
You could also use the fully qualified syntax which requires the angle brackets in this case (<...>) to avoid ambiguity with declaring an array literal:
<[i32]>::get(self, index).map(|v| v as &Any)
universal call syntax
Note that while Rust originally used the term universal function call syntax (UFCS), the usage of this term conflicted with the existing understood programming term, so the use of it is not suggested. The replacement term is fully qualified syntax.