I'm trying to implement a parser combinator in Rust.
I have the following parser, which is built with no errors/warnings:
use std::char;
#[derive(Debug)]
enum ParseResult<T> {
    Success(T),
    Failure(&'static str),
}
fn pchar(char_to_match: char) -> impl Fn(&str) -> ParseResult<(&str, &str)> {
    move |string: &str| match string.get(0..1) {
        Some(found) => match found.chars().next().unwrap() == char_to_match {
            true => ParseResult::Success((found, string.get(1..).unwrap())),
            false => ParseResult::Failure("char didnt match"),
        },
        None => ParseResult::Failure("No more left to parse."),
    }
}
fn main() {
    println!("{:?}", ParseResult::Success(&3));
    println!("{:?}", pchar('s')("r"));
    println!("{:?}", (|(a, b)| (a, b))((2, 3)));
    let input_abc = "ABC";
    println!("{:?}", pchar('A')(input_abc));
}
Instead of returning a function (closure) from pchar, I want to return a Parser<T> which I have declared as:
union Parser<T> {
    func: dyn Fn(&str) -> ParseResult<(T, &str)>,
}
Then my second iteration of pchar can return a Parser<T> such as:
fn pchar2(char_to_match: char) -> Parser<&str> {
    Parser {
        func: |string: &str| match string.get(0..1) {
            Some(found) => match found.chars().next().unwrap() == char_to_match {
                true => ParseResult::Success((found, string.get(1..).unwrap())),
                false => ParseResult::Failure("char didnt match"),
            },
            None => ParseResult::Failure("No more left to parse."),
        },
    }
}
I get all sorts of warnings regarding lifetime parameters, size at compile-time and expected trait dyn Fn.
What do I need to learn to solve this issue?
Update:
As per Aleksander Krauze's suggestion, I have updated it to use a Box.
struct Parser<T> {
    func: Box<dyn Fn(&str) -> ParseResult<(T, &str)>>,
}
fn pchar2(char_to_match: char) -> Parser<&'static str> {
    Parser {
        func: Box::new(move |string: &str| match string.get(0..1) {
            Some(found) => match found.chars().next().unwrap() == char_to_match {
                true => ParseResult::Success((found, string.get(1..).unwrap())),
                false => ParseResult::Failure("char didnt match"),
            },
            None => ParseResult::Failure("No more left to parse."),
        }),
    }
}
The error I get is that the lifetime of the variable string must outlive char_to_match.
Update 2:
As per Aleksander Krauze's answer, here is the final working version:
fn pchar2<'a>(char_to_match: char) -> Parser<'a, &'a str> {
    Parser {
        func: Box::new(move |string: &'a str| match string.char_indices().next() {
            Some((i, c)) => match c == char_to_match {
                true => {
                    // `c` may be 1 to 4 bytes long, so split after its full UTF-8 width.
                    let end = i + c.len_utf8();
                    ParseResult::Success((
                        string.get(..end).unwrap(),
                        string.get(end..).unwrap(),
                    ))
                }
                false => ParseResult::Failure("char didnt match"),
            },
            None => ParseResult::Failure("No more left to parse."),
        }),
    }
}
The following snippet should fix your lifetime errors. The main problem was that you tried to return &'static str when the result is only a subslice of string. The types here are a little complex, but you could simplify them by making Parser non-generic (fixing T = &'a str). I don't know your more general use case, though, so that might not be viable.
Note that even though this code compiles, it has a bug! In Rust, strings are UTF-8 encoded, so a single character can take anywhere from 1 to 4 bytes, but slicing a &str uses byte indices. Indexing with [..] in the middle of a character panics at runtime; get returns None instead, so this code wrongly reports "No more left to parse." when the input starts with a multi-byte character.
#[derive(Debug)]
enum ParseResult<T> {
    Success(T),
    Failure(&'static str),
}
struct Parser<'a, T> {
    func: Box<dyn Fn(&'a str) -> ParseResult<(T, &'a str)>>,
}
fn pchar2<'a>(char_to_match: char) -> Parser<'a, &'a str> {
    Parser {
        func: Box::new(move |string: &'a str| match string.get(0..1) {
            Some(found) => match found.chars().next().unwrap() == char_to_match {
                true => ParseResult::Success((found, string.get(1..).unwrap())),
                false => ParseResult::Failure("char didnt match"),
            },
            None => ParseResult::Failure("No more left to parse."),
        }),
    }
}
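The UTF-8 concern raised above can be addressed by splitting at the character's encoded width instead of a fixed byte offset. The following is only a minimal sketch under a few assumptions: the name pchar_utf8 is hypothetical, PartialEq is derived purely so results can be compared, and the split uses char::len_utf8 with str::split_at.

```rust
#[derive(Debug, PartialEq)]
enum ParseResult<T> {
    Success(T),
    Failure(&'static str),
}

struct Parser<'a, T> {
    func: Box<dyn Fn(&'a str) -> ParseResult<(T, &'a str)>>,
}

// Hypothetical name; matches one leading character of any UTF-8 width.
fn pchar_utf8<'a>(char_to_match: char) -> Parser<'a, &'a str> {
    Parser {
        func: Box::new(move |input: &'a str| match input.chars().next() {
            None => ParseResult::Failure("No more left to parse."),
            Some(c) if c == char_to_match => {
                // len_utf8() is the number of bytes `c` occupies, so this
                // split always lands on a character boundary.
                let (matched, rest) = input.split_at(c.len_utf8());
                ParseResult::Success((matched, rest))
            }
            Some(_) => ParseResult::Failure("char didnt match"),
        }),
    }
}
```

Because the closure is stored in a field, it is invoked as (parser.func)(input); the parentheses tell the compiler to call the field rather than look for a method named func.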
I have a function Processor::process that returns a vector of functions whose length depends on the input. When I try to use it, I get an error:
error[E0277]: the size for values of type (dyn FnMut(String, Option<Vec<u8>>) -> Option<u8> + 'static) cannot be known at compilation time
This is my code:
fn handler1(a: String, b: Option<Vec<u8>>) -> Option<u8> {
    None
}
fn handler2(a: String, b: Option<Vec<u8>>) -> Option<u8> {
    None
}
fn handler3(a: String, b: Option<Vec<u8>>) -> Option<u8> {
    None
}
struct Processor {}
impl Processor {
    pub fn process(data: u8) -> Vec<dyn FnMut(String, Option<Vec<u8>>) -> Option<u8>> {
        return match data {
            1 => vec![handler1],
            2 => vec![handler1, handler2],
            3 => vec![handler1, handler2, handler3],
            _ => {}
        }
    }
}
This is a minimal sandbox implementation.
Could you please help me set the correct return type for the function?
Either you box them, or you return references with a specific lifetime, in this case 'static:
fn handler1(a: String, b: Option<Vec<u8>>) -> Option<u8> {
    None
}
fn handler2(a: String, b: Option<Vec<u8>>) -> Option<u8> {
    None
}
fn handler3(a: String, b: Option<Vec<u8>>) -> Option<u8> {
    None
}
struct Processor {}
impl Processor {
    // `Fn` rather than `FnMut`: an `FnMut` cannot be called through a shared reference.
    pub fn process(data: u8) -> Vec<&'static dyn Fn(String, Option<Vec<u8>>) -> Option<u8>> {
        return match data {
            1 => vec![&handler1],
            2 => vec![&handler1, &handler2],
            3 => vec![&handler1, &handler2, &handler3],
            _ => vec![]
        }
    }
}
You can also just use function pointers instead of the trait dynamic dispatch:
impl Processor {
    pub fn process(data: u8) -> Vec<fn(String, Option<Vec<u8>>) -> Option<u8>> {
        return match data {
            1 => vec![handler1],
            2 => vec![handler1, handler2],
            3 => vec![handler1, handler2, handler3],
            _ => vec![]
        }
    }
}
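The boxing option mentioned at the start of this answer is not shown above, so here is a minimal sketch of it. The type alias Handler and the helper run_all are hypothetical names, and the handlers return Some(1), Some(2) and Some(3) only so that the calls are distinguishable; the original handlers return None.

```rust
// Hypothetical alias for an owned, dynamically dispatched handler.
type Handler = Box<dyn FnMut(String, Option<Vec<u8>>) -> Option<u8>>;

fn handler1(_a: String, _b: Option<Vec<u8>>) -> Option<u8> {
    Some(1)
}
fn handler2(_a: String, _b: Option<Vec<u8>>) -> Option<u8> {
    Some(2)
}
fn handler3(_a: String, _b: Option<Vec<u8>>) -> Option<u8> {
    Some(3)
}

struct Processor;

impl Processor {
    // Each Box has a known size, so the Vec element type is Sized.
    pub fn process(data: u8) -> Vec<Handler> {
        match data {
            1 => vec![Box::new(handler1)],
            2 => vec![Box::new(handler1), Box::new(handler2)],
            3 => vec![Box::new(handler1), Box::new(handler2), Box::new(handler3)],
            _ => vec![],
        }
    }
}

// Hypothetical helper: FnMut needs mutable access, hence `&mut [Handler]`.
fn run_all(handlers: &mut [Handler], input: &str) -> Vec<Option<u8>> {
    handlers
        .iter_mut()
        .map(|h| (*h)(input.to_string(), None))
        .collect()
}
```

Boxing costs one heap allocation per handler, but unlike plain function pointers it also accepts closures that capture state.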
I can understand borrowing/ownership concepts in Rust, but I have no idea how to work around this case:
use std::collections::{HashMap, HashSet};
struct Val {
    t: HashMap<u16, u16>,
    l: HashSet<u16>,
}
impl Val {
    fn new() -> Val {
        Val {
            t: HashMap::new(),
            l: HashSet::new(),
        }
    }
    fn set(&mut self, k: u16, v: u16) {
        self.t.insert(k, v);
        self.l.insert(v);
    }
    fn remove(&mut self, v: &u16) -> bool {
        self.l.remove(v)
    }
    fn do_work(&mut self, v: u16) -> bool {
        match self.t.get(&v) {
            None => false,
            Some(r) => self.remove(r),
        }
    }
}
fn main() {
    let mut v = Val::new();
    v.set(123, 100);
    v.set(100, 1234);
    println!("Size before: {}", v.l.len());
    println!("Work: {}", v.do_work(123));
    println!("Size after: {}", v.l.len());
}
The compiler has the error:
error[E0502]: cannot borrow `*self` as mutable because it is also borrowed as immutable
  --> src/main.rs:28:24
   |
26 |         match self.t.get(&v) {
   |               ------ immutable borrow occurs here
27 |             None => false,
28 |             Some(r) => self.remove(r),
   |                        ^^^^^------^^^
   |                        |    |
   |                        |    immutable borrow later used by call
   |                        mutable borrow occurs here
I don't understand why I can't mutate in the match arm when I did a get (read value) before; the self.t.get is finished when the mutation via remove begins.
Is this due to scope of the result (Option<&u16>) returned by the get? It's true that the lifetime of the result has a scope inside the match expression, but this design-pattern is used very often (mutate in a match expression).
How do I work around the error?
The declaration of function HashMap::<K,V>::get() is, a bit simplified:
pub fn get<'s>(&'s self, k: &K) -> Option<&'s V>
This means that it returns an optional reference to the contained value, not the value itself. Since the returned reference points to a value inside the map, it actually borrows the map; that is, you cannot mutate the map while this reference exists. This restriction is there to protect you: what would happen if you removed this value while the reference was still alive?
So when you write:
match self.t.get(&v) {
    None => false,
    //r: &u16
    Some(r) => self.remove(r)
}
the captured r is of type &u16 and its lifetime is that of self.t, that is, it is borrowing it. Thus you cannot get a mutable reference to self, that is needed to call remove.
The simplest solution for your problem is the "clone() solves every lifetime issue" pattern. Since your values are of type u16, which is Copy, it is actually trivial:
match self.t.get(&v) {
    None => false,
    //r: u16
    Some(&r) => self.remove(&r)
}
Now r is actually of type u16 so it borrows nothing and you can mutate self at will.
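The same copy-out idea can also be written with Option::copied, which some readers find more explicit than the &r pattern. This is a minimal sketch that reuses the Val type from the question; only do_work differs.

```rust
use std::collections::{HashMap, HashSet};

struct Val {
    t: HashMap<u16, u16>,
    l: HashSet<u16>,
}

impl Val {
    fn new() -> Val {
        Val { t: HashMap::new(), l: HashSet::new() }
    }
    fn set(&mut self, k: u16, v: u16) {
        self.t.insert(k, v);
        self.l.insert(v);
    }
    fn remove(&mut self, v: &u16) -> bool {
        self.l.remove(v)
    }
    fn do_work(&mut self, v: u16) -> bool {
        // copied() turns Option<&u16> into Option<u16>, so the borrow of
        // self.t ends before the match arms run.
        match self.t.get(&v).copied() {
            None => false,
            Some(r) => self.remove(&r),
        }
    }
}
```

Option::copied requires the value type to be Copy; for Clone-only types, Option::cloned plays the same role at the cost of a clone.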
If your key/value types weren't Copy you could try and clone them, if you are willing to pay for that. If not, there is still another option as your remove() function does not modify the HashMap but an unrelated HashSet. You can still mutate that set if you take care not to reborrow self:
fn remove2(v: &u16, l: &mut HashSet<u16>) -> bool {
    l.remove(v)
}
fn do_work(&mut self, v: u16) -> bool {
    match self.t.get(&v) {
        None => false,
        //self.t is borrowed, now we mut-borrow self.l, no problem
        Some(r) => Self::remove2(r, &mut self.l)
    }
}
You are trying to remove a value from the HashMap using the value you got, not the key.
Only line 26 is changed: Some(_) => self.remove(&v)
This will work:
use std::collections::HashMap;
struct Val {
    t: HashMap<u16, u16>
}
impl Val {
    fn new() -> Val {
        Val { t: HashMap::new() }
    }
    fn set(&mut self, k: u16, v: u16) {
        self.t.insert(k, v);
    }
    fn remove(&mut self, v: &u16) -> bool {
        match self.t.remove(v) {
            None => false,
            _ => true,
        }
    }
    fn do_work(&mut self, v: u16) -> bool {
        match self.t.get(&v) {
            None => false,
            Some(_) => self.remove(&v)
        }
    }
}
fn main() {
    let mut v = Val::new();
    v.set(123, 100);
    v.set(1100, 1234);
    println!("Size before: {}", v.t.len());
    println!("Work: {}", v.do_work(123));
    println!("Size after: {}", v.t.len());
}
It seems that the following solution works for Copy types such as u16 here. For other types, the pattern would try to move the value out of the map, which the compiler rejects.
use std::collections::HashMap;
struct Val {
    t: HashMap<u16, u16>,
}
impl Val {
    fn new() -> Val {
        Val { t: HashMap::new() }
    }
    fn set(&mut self, k: u16, v: u16) {
        self.t.insert(k, v);
    }
    fn remove(&mut self, v: &u16) -> bool {
        match self.t.remove(v) {
            None => false,
            _ => true,
        }
    }
    fn do_work(&mut self, v: u16) -> bool {
        match self.t.get(&v) {
            None => false,
            Some(&v) => self.remove(&v)
        }
    }
}
fn main() {
    let mut v = Val::new();
    v.set(123, 100);
    v.set(100, 1234);
    println!("Size before: {}", v.t.len());
    println!("Work: {}", v.do_work(123));
    println!("Size after: {}", v.t.len());
}
For other types, we must clone the value:
use std::collections::{HashMap, HashSet};
#[derive(Debug)]
struct Val {
    t: HashMap<String, String>,
    l: HashSet<String>
}
impl Val {
    fn new() -> Val {
        Val { t: HashMap::new(), l: HashSet::new() }
    }
    fn set(&mut self, k: String, v: String) {
        self.l.insert(v.clone());
        self.t.insert(k, v);
    }
    fn remove(&mut self, v: &String) -> bool {
        self.l.remove(v)
    }
    fn do_work(&mut self, i: &String) -> bool {
        match self.t.get(i) {
            None => false,
            Some(v) => {
                let x = v.clone();
                self.remove(&x)
            }
        }
    }
    fn do_task(&mut self, i: &String) -> bool {
        match self.t.get(i) {
            None => false,
            Some(v) => self.l.insert(v.clone())
        }
    }
}
fn main() {
    let mut v = Val::new();
    v.set("AA".to_string(), "BB".to_string());
    v.set("BB".to_string(), "CC".to_string());
    println!("Start: {:#?}", v);
    println!("Size before: {}", v.l.len());
    println!("Work: {}", v.do_work(&"AA".to_string()));
    println!("Size after: {}", v.l.len());
    println!("After: {:#?}", v);
    println!("Task [Exist]: {}", v.do_task(&"BB".to_string()));
    println!("Task [New]: {}", v.do_task(&"AA".to_string()));
    println!("End: {:#?}", v);
}
But I'd like a solution that does not allocate.
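One allocation-free possibility is the disjoint-field approach described in the earlier answer: touch self.l directly inside the match instead of calling a &mut self method. This is a minimal sketch that assumes the lookup and the removal really do hit different fields, as they do in this example; taking i as &str is my own simplification.

```rust
use std::collections::{HashMap, HashSet};

struct Val {
    t: HashMap<String, String>,
    l: HashSet<String>,
}

impl Val {
    fn new() -> Val {
        Val { t: HashMap::new(), l: HashSet::new() }
    }
    fn set(&mut self, k: String, v: String) {
        self.l.insert(v.clone());
        self.t.insert(k, v);
    }
    fn do_work(&mut self, i: &str) -> bool {
        match self.t.get(i) {
            None => false,
            // `v` borrows only self.t, and remove() borrows only self.l,
            // so the borrow checker accepts both at once; nothing is cloned.
            Some(v) => self.l.remove(v),
        }
    }
}
```

The limitation is that this does not extend to do_task in the code above, because inserting into the set needs an owned String and therefore still clones.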
At the moment, implementing the std::ops::IndexMut trait on a type in Rust requires that I also implement the std::ops::Index trait as well. The bodies of these implementations end up being virtually identical. For example:
use std::ops::{Index, IndexMut};
enum IndexType {
    A,
    B,
}
struct Indexable {
    a: u8,
    b: u8,
}
impl Index<IndexType> for Indexable {
    type Output = u8;
    fn index<'a>(&'a self, idx: IndexType) -> &'a u8 {
        match idx {
            IndexType::A => &self.a,
            IndexType::B => &self.b,
        }
    }
}
impl IndexMut<IndexType> for Indexable {
    fn index_mut<'a>(&'a mut self, idx: IndexType) -> &'a mut u8 {
        match idx {
            IndexType::A => &mut self.a,
            IndexType::B => &mut self.b,
        }
    }
}
fn main() {}
This works, and obviously for trivial types this isn't a serious problem, but for more complex types with more interesting indexing this quickly becomes laborious and error-prone. I'm scratching my head trying to find a way to unify this code, but nothing is jumping out at me, and yet I feel there has to/should be a way to do this without essentially having to copy and paste. Any suggestions? What am I missing?
Unfortunately, this cuts across a few things Rust really isn't good at right now. The cleanest solution I could come up with was this:
macro_rules! as_expr {
    ($e:expr) => { $e };
}
macro_rules! borrow_imm { ($e:expr) => { &$e } }
macro_rules! borrow_mut { ($e:expr) => { &mut $e } }
macro_rules! impl_index {
    (
        <$idx_ty:ty> for $ty:ty,
        ($idx:ident) -> $out_ty:ty,
        $($body:tt)*
    ) => {
        impl ::std::ops::Index<$idx_ty> for $ty {
            type Output = $out_ty;
            fn index(&self, $idx: $idx_ty) -> &$out_ty {
                macro_rules! index_expr { $($body)* }
                index_expr!(self, borrow_imm)
            }
        }
        impl ::std::ops::IndexMut<$idx_ty> for $ty {
            fn index_mut(&mut self, $idx: $idx_ty) -> &mut $out_ty {
                macro_rules! index_expr { $($body)* }
                index_expr!(self, borrow_mut)
            }
        }
    };
}
enum IndexType { A, B }
struct Indexable { a: u8, b: u8 }
impl_index! {
    <IndexType> for Indexable,
    (idx) -> u8,
    ($this:expr, $borrow:ident) => {
        match idx {
            IndexType::A => $borrow!($this.a),
            IndexType::B => $borrow!($this.b),
        }
    }
}
fn main() {
    let mut x = Indexable { a: 1, b: 2 };
    x[IndexType::A] = 3;
    println!("x {{ a: {}, b: {} }}", x[IndexType::A], x[IndexType::B]);
}
The short version is: we turn the body of index/index_mut into a macro so that we can substitute the name of a different macro that, given an expression, expands to either &expr or &mut expr. We also have to re-capture the self parameter (using a different name) because self is really weird in Rust, and I gave up trying to make it work nicely.
I'm writing a lexer in Rust to learn, but I'm stuck with two "cannot move out of borrowed content [E0507]" errors.
I tried all the solutions out there, but nothing seems to work: RefCell, clone(), by_ref(), changing the &mut self to self or &self or mut self, or dereferencing.
Here is my code:
struct Snapshot {
    Index: u32,
}
struct Tokenizable<'a, T: 'a>
    where T: Iterator
{
    Index: u32,
    Items: &'a T,
    Snapshots: Vec<Snapshot>,
}
impl<'a, T> Tokenizable<'a, T>
    where T: Iterator
{
    fn new(items: &'a T) -> Tokenizable<'a, T> {
        Tokenizable {
            Index: 0,
            Items: items,
            Snapshots: Vec::new(),
        }
    }
    fn end(&mut self) -> bool {
        match self.Items.peekable().peek() {
            Some(c) => false,
            None => true,
        }
    }
    fn peek(&mut self) -> Option<&T::Item> {
        match self.Items.peekable().peek() {
            Some(c) => Some(c),
            None => None,
        }
    }
}
fn main() {}
error: cannot move out of borrowed content [E0507]
        match self.Items.peekable().peek() {
              ^~~~~~~~~~
help: see the detailed explanation for E0507
error: borrowed value does not live long enough
        match self.Items.peekable().peek() {
              ^~~~~~~~~~~~~~~~~~~~~
note: reference must be valid for the anonymous lifetime #1 defined on the block at 32:43...
    fn peek(&mut self) -> Option<&T::Item> {
        match self.Items.peekable().peek() {
            Some(c) => Some(c),
            None => None,
        }
    }
note: ...but borrowed value is only valid for the block at 32:43
    fn peek(&mut self) -> Option<&T::Item> {
        match self.Items.peekable().peek() {
            Some(c) => Some(c),
            None => None,
        }
    }
error: cannot move out of borrowed content [E0507]
        match self.Items.peekable().peek() {
              ^~~~~~~~~~
help: see the detailed explanation for E0507
As you can see in the docs, the peekable function takes the iterator by value. Therefore it will only work if you own the iterator. However, in your code, Items is a shared reference to the iterator.
Solving this problem requires approaching it from a different angle. For instance, you could take the iterator by value in the constructor and adapt the struct to store the peekable iterator in the Items field.
Basically, the lesson here is that overcomplicating and overengineering things almost always does more harm than good.
Final fixed code:
use std::iter::Peekable;
struct Snapshot {
    index: u32,
}
struct Tokenizable<T> where T: Iterator {
    index: u32,
    items: Peekable<T>,
    snapshots: Vec<Snapshot>,
}
impl<T> Tokenizable<T> where T: Iterator {
    fn new(items: T) -> Tokenizable<T> {
        Tokenizable {
            index: 0,
            items: items.peekable(),
            snapshots: Vec::new(),
        }
    }
    fn end(&mut self) -> bool {
        match self.items.peek() {
            Some(c) => false,
            None => true,
        }
    }
    fn peek(&mut self) -> Option<&<T as Iterator>::Item> {
        match self.items.peek() {
            Some(c) => Some(c),
            None => None,
        }
    }
}
fn main() {
    let mut data = "Hello".chars();
    let tokenizable = Tokenizable::new(data);
}
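Once the struct owns a Peekable, the two match expressions can be reduced to direct calls. The following is a minimal sketch of that simplification; the advance method is a hypothetical addition (not part of the code above) included only to show that peek does not consume items while next does. The snapshots field is omitted for brevity.

```rust
use std::iter::Peekable;

struct Tokenizable<T: Iterator> {
    index: u32,
    items: Peekable<T>,
}

impl<T: Iterator> Tokenizable<T> {
    fn new(items: T) -> Self {
        Tokenizable { index: 0, items: items.peekable() }
    }

    // True when nothing is left; peek() looks ahead without advancing.
    fn end(&mut self) -> bool {
        self.items.peek().is_none()
    }

    // Peekable::peek already returns Option<&T::Item>, so no match is needed.
    fn peek(&mut self) -> Option<&T::Item> {
        self.items.peek()
    }

    // Hypothetical helper: consume one item and track the position.
    fn advance(&mut self) -> Option<T::Item> {
        let item = self.items.next();
        if item.is_some() {
            self.index += 1;
        }
        item
    }
}
```

Both end and peek still take &mut self because Peekable::peek may need to pull one item from the underlying iterator and cache it.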
I'm trying to implement a "polymorphic" Input enum which hides whether we're reading from a file or from stdin. More concretely, I'm trying to build an enum that will have a lines method that will in turn "delegate" that call to either a File wrapped in a BufReader or to a StdinLock (both of which have the lines() method).
Here's the enum:
enum Input<'a> {
    Console(std::io::StdinLock<'a>),
    File(std::io::BufReader<std::fs::File>)
}
I have three methods:
from_arg for deciding whether we're reading from a file or from stdin by checking whether an argument (filename) was provided,
file for wrapping a file with a BufReader,
console for locking the stdin.
The implementation:
impl<'a> Input<'a> {
    fn console() -> Input<'a> {
        Input::Console(io::stdin().lock())
    }
    fn file(path: String) -> io::Result<Input<'a>> {
        match File::open(path) {
            Ok(file) => Ok(Input::File(std::io::BufReader::new(file))),
            Err(_) => panic!("kita"),
        }
    }
    fn from_arg(arg: Option<String>) -> io::Result<Input<'a>> {
        Ok(match arg {
            None => Input::console(),
            Some(path) => Input::file(path)?,
        })
    }
}
As far as I understand, I have to implement both BufRead and Read traits for this to work. This is my attempt:
impl<'a> io::Read for Input<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match *self {
            Input::Console(ref mut c) => c.read(buf),
            Input::File(ref mut f) => f.read(buf),
        }
    }
}
impl<'a> io::BufRead for Input<'a> {
    fn lines(self) -> Lines<Self> {
        match self {
            Input::Console(ref c) => c.lines(),
            Input::File(ref f) => f.lines(),
        }
    }
    fn consume(&mut self, amt: usize) {
        match *self {
            Input::Console(ref mut c) => c.consume(amt),
            Input::File(ref mut f) => f.consume(amt),
        }
    }
    fn fill_buf(&mut self) -> io::Result<&[u8]> {
        match *self {
            Input::Console(ref mut c) => c.fill_buf(),
            Input::File(ref mut f) => f.fill_buf(),
        }
    }
}
Finally, the invocation:
fn load_input<'a>() -> io::Result<Input<'a>> {
    Ok(Input::from_arg(env::args().skip(1).next())?)
}
fn main() {
    let mut input = match load_input() {
        Ok(input) => input,
        Err(error) => panic!("Failed: {}", error),
    };
    for line in input.lines() { /* do stuff */ }
}
Complete example in the playground
The compiler tells me that I'm pattern matching wrongly and that I have mismatched types:
error[E0308]: match arms have incompatible types
  --> src/main.rs:41:9
   |
41 | /         match self {
42 | |             Input::Console(ref c) => c.lines(),
   | |                                      --------- match arm with an incompatible type
43 | |             Input::File(ref f) => f.lines(),
44 | |         }
   | |_________^ expected enum `Input`, found struct `std::io::StdinLock`
   |
   = note: expected type `std::io::Lines<Input<'a>>`
              found type `std::io::Lines<std::io::StdinLock<'_>>`
I tried to satisfy it with:
match self {
    Input::Console(std::io::StdinLock(ref c)) => c.lines(),
    Input::File(std::io::BufReader(ref f)) => f.lines(),
}
... but that doesn't work either.
I'm really out of my depth here, it seems.
The answer by @A.B. is correct, but it tries to conform to the OP's original program structure. I want to offer a more readable alternative for newcomers who stumble upon this question (just like I did).
use std::env;
use std::fs;
use std::io::{self, BufReader, BufRead};
fn main() {
    let input = env::args().nth(1);
    let reader: Box<dyn BufRead> = match input {
        None => Box::new(BufReader::new(io::stdin())),
        Some(filename) => Box::new(BufReader::new(fs::File::open(filename).unwrap()))
    };
    for line in reader.lines() {
        println!("{:?}", line);
    }
}
See the discussion in reddit from which I borrowed the code.
Note the dyn keyword before boxed BufRead. This pattern is called a trait object.
This is the simplest solution but will borrow and lock Stdin.
use std::fs::File;
use std::io::{self, BufRead, Read};
struct Input<'a> {
    // `dyn` is required for trait objects in current Rust editions.
    source: Box<dyn BufRead + 'a>,
}
impl<'a> Input<'a> {
    fn console(stdin: &'a io::Stdin) -> Input<'a> {
        Input {
            source: Box::new(stdin.lock()),
        }
    }
    fn file(path: &str) -> io::Result<Input<'a>> {
        File::open(path).map(|file| Input {
            source: Box::new(io::BufReader::new(file)),
        })
    }
}
impl<'a> Read for Input<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.source.read(buf)
    }
}
impl<'a> BufRead for Input<'a> {
    fn fill_buf(&mut self) -> io::Result<&[u8]> {
        self.source.fill_buf()
    }
    fn consume(&mut self, amt: usize) {
        self.source.consume(amt);
    }
}
Due to default trait methods, Read and BufRead are fully implemented for Input. So you can call lines on Input.
let input = Input::file("foo.txt").unwrap();
for line in input.lines() {
    println!("input line: {:?}", line);
}
If you're willing to restructure your code a bit, you can actually get away without doing dynamic dispatch. You just need to make sure whatever code is using the reader is wrapped in its own function and the concrete types of the arguments for that function are known at compile time.
So if we eschew the enum Input idea for a moment, and building on @Yerke's answer, we can do:
use std::env;
use std::fs;
use std::io::{BufRead, BufReader, Read};
fn main() {
    let input = env::args().nth(1);
    match input {
        Some(filename) => output_lines(fs::File::open(filename).unwrap()),
        None => output_lines(std::io::stdin()),
    };
}
fn output_lines<R: Read>(reader: R) {
    let buffer = BufReader::new(reader);
    for line in buffer.lines() {
        println!("{:?}", line);
    }
}
Because we have a concrete type for R each time we call output_lines, the compiler can monomorphize the output_lines function and do static dispatch. In addition to being less complicated code in my opinion (no need for Box wrapping), it's also slightly faster and the compiler can do more optimizations.
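For readers who still want the enum from the question, it can also be made to work without Box. The sketch below is my own reading of the problem, not code from any of the answers: it delegates only the required methods (read, fill_buf, consume) and leaves lines to the default implementation, which then returns Lines<Input> as the trait expects. The helper read_lines is a hypothetical name.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Read};

enum Input<'a> {
    Console(io::StdinLock<'a>),
    File(BufReader<File>),
}

impl<'a> Input<'a> {
    fn console(stdin: &'a io::Stdin) -> Input<'a> {
        Input::Console(stdin.lock())
    }
    fn file(path: &str) -> io::Result<Input<'a>> {
        Ok(Input::File(BufReader::new(File::open(path)?)))
    }
}

impl<'a> Read for Input<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self {
            Input::Console(c) => c.read(buf),
            Input::File(f) => f.read(buf),
        }
    }
}

// Only the required methods are delegated; `lines` comes from the trait.
impl<'a> BufRead for Input<'a> {
    fn fill_buf(&mut self) -> io::Result<&[u8]> {
        match self {
            Input::Console(c) => c.fill_buf(),
            Input::File(f) => f.fill_buf(),
        }
    }
    fn consume(&mut self, amt: usize) {
        match self {
            Input::Console(c) => c.consume(amt),
            Input::File(f) => f.consume(amt),
        }
    }
}

// Hypothetical helper: collect all lines from any buffered reader.
fn read_lines<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    reader.lines().collect()
}
```

Here dispatch is a match on the enum variant rather than a vtable call, which sits between the Box<dyn BufRead> version and the fully generic output_lines function.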