Background:
I am using a combination of Alex and Parsec to parse an indentation sensitive language. An line more indented then the one above it my either be data for that line, or a continuation of the commands on that line. Alex has no way to know for sure so it simply returns a Block{...} token.
The parser knows when a data block is expected and passes the data into the Abstract Syntax Tree to be used by the interpreter.
The problem arises when the parser encounters a Block token when it wanted something else. I have a function that will convert a block token into a list of Tokens called lex_block but I cannot seem to apply it to the right place in the input stream.
Question:
Given a parser combinator that which fails at some token in the input stream, is there a way to modify the token that caused the failure and either continue evaluating the new input stream or at least to try again?
I would like to write a function retry as follows:
> retry :: (s -> Maybe s) -> ParsecT s u m a -> ParsecT s u m a
retry f p acts like parser p, except that if p fails the function f is used to alter the input stream at the point p failed and p is retried. When f returns Nothing the failure is passed on.
I have tried to write this using getInput, setInput and try, but I cannot figure out how to get the input stream at the point the parser combinator failed rather then the point that it started. I think that if I could discover a way to be told where the failure happened in the input stream I could make this work.
Related
I have an attoparsec parser, and tests for it, what annoys me is that if I comment part of the parser and run the tests, the parser doesn't return Left "parse error at line ..." but instead I get Right [].
Note that I'm using parseOnly to make it clear that there'll be no more input.
Otherwise it's nice to get the partially parsed input, it can definitely be useful and I'm glad to have it. However I'd like to be informed that the whole input was not consumed. Maybe to get a character offset of the last consumed letter, or if that's what it takes, at least an option to be returned Left.
If it's relevant, the parser can be found there.
If I comment for instance the line:
<|> PlainText <$> choice (string <$> ["[", "]", "*", "`"])
And run the tests, I get for instance:
1) notes parsing tests parses notes properly
simple test
expected: Right [NormalLine [PlainText "one line* # hello world"]]
but got: Right []
This is from that test.
Depending on if consuming the whole input should be the property of parseNoteDocument or just the tests, I'd extend one or the other with endOfInput or atEnd.
I'd suggest to define a proper Parser for your documents, like
parseNoteDocument' :: Text -> Parsec NoteDocument
parseNoteDocument' = many parseLine
and then define parseNoteDocument in terms of it. Then you can use parseNoteDocument' in the tests by defining a helper that parses a given piece of text using
parseNoteDocument' <* endOfInput
to ensure that the whole input is consumed.
I would like to write a Parsec parser that would parse a sequence of numbers, returning a sorted list, but failing on encountering a duplicate number (the error should point to the position of the duplicate). For the sorted part I can simply use:
manySorted = manyAccum insert
But how can I trigger a Parsec error if the number is already on the list. It doesn't seem like manyAccum allows that and I couldn't figure out how to make my own clone of manyAccum that would (implementation uses unParser which doesn't seem to be exposed outside of Parsec library).
You could try to obtain the current parser's position by
sourcePos :: Monad m => ParsecT s u m SourcePos
sourcePos = statePos `liftM` getParserState
Then accumulate the positions together with parsed values so that you can report the position of the original.
If you're sure you want the parser to be doing this, you can use the fail method with an error message to fail out of the parser. (Complete with an error location, and trying other <|> possibilities, and so on.)
I am trying to write a Forth interpreter in Haskell. There are many sub problems and categories to accomplish this, however, I am trying to accomplish the most basic of steps, and I have been at it for some time in different approaches. The simple input case I am trying to get to is "25 12 +" -> [37]. I am not worried about the lists in Forth are backwards from Haskell, but I do want to try and accommodate the extensibility of the input string down the road, so I am using Maybe, as if there is an error, I will just do Nothing.
I first tried to break the input string into a list of "words" using Prelude's words function. From there I used Prelude's reads function to turn it into a list of tuples (Int,String). So this works great, up until I get to a command "word", such as the char + in the sample problem.
So how do I parse/interpret the string's command to something I can use?
Do I create a new data structure that has all the Forth commands or special characters? (assuming this, how do I convert it from the string format to that data type?)
Need anything else, just ask. I appreciate the help thinking this through.
read is essentially a very simple string parser. Rather than adapting it, you might want to consider learning to use a parser combinator library such as Parsec.
There are a bunch of different tutorials about parser combinators so you'll probably need to do a bit of reading before they 'click.' However, the first example in this tutorial is quite closely related to your problem.
import Text.Parsec
import Text.Parsec.String
play :: String -> Either ParseError Integer
play s = parse pmain "parameter" s
pmain :: Parser Integer
pmain = do
x <- pnum `chainl1` pplus
eof
return x
pnum = read `fmap` many1 digit
pplus = char '+' >> return (+)
It's a simple parser that evaluates arbitrarily long lists:
*Main> play "1+2+3+4+5"
Right 15
It also produces useful parse errors:
*Main> play "1+2+3+4+5~"
Left "parameter" (line 1, column 10):
unexpected '~'
expecting digit, "+" or end of input
If you can understand this simple parser, you should be able to work out how to adapt it to your particular problem (referring to the list of generic combinators in the documentation for Text.Parsec.Combinator). It will take a little longer at first than using read, but using a proper parsing library will make it much easier to achieve the ultimate goal of parsing Forth's whole grammar.
One common problem I have with Parsec is that it tends to ignore invalid input if it occurs in the "right" place.
As a concrete example, suppose we have integer :: Parser Int, and I write
expression = sepBy integer (char '+')
(Ignore whitespace issues for a moment.)
This correctly parses something like "123+456+789". However, if I feed it "123+456-789", it merrily ignores the illegal "-" character and the trailing part of the expression; I actually wanted an error message telling me about the invalid input, not just having it silently ignore that part.
I understand why this happens; what I'm not sure about is how to fix it. What is the general method for designing parsers that consume all supplied input and succeed only if all of it is a valid expression?
It's actually pretty simple--just ensure it's followed by eof:
parse (expression <* eof) "<interactive>" "123+456-789"
eof matches the end of the input, even if the input is just a string and not a file.
Obviously, this only makes sense at the top level of your parser.
This is something of an extension to this question:
Dispatching to correct function with command line arguments in Haskell
So, as it turns out, I don't have a good solution yet for dispatching "commands" from the command line to other functions. So, I'd like to extend the approach in the question above. It seems cumbersome to have to manually add functions to the table and apply the appropriate transformation function to each function so that it takes a list of the correct size instead of its normal arguments. Instead, I'd like to build a table where I'll add functions and "tag" them with the number of arguments it needs to take from the command line. The "add" procedure, should then take care of composing with the correct "takesXarguments" procedure and adding it to the table.
I'd like to be able to install "packages" of functions into the table, which makes me think I need to be able to keep track of the state of the table, since it will change when packages get installed. Is the Reader Monad or the State Monad what I'm looking for?
No monad necessary. Your tagging idea is on the right track, but that information is encoded probably in a different way than you expected.
I would start with a definition of a command:
type Command = [String] -> IO ()
Then you can make "command maker" functions:
mkCommand1 :: (String -> IO ()) -> Command
mkCommand2 :: (String -> String -> IO ()) -> Command
...
Which serves as the tag. If you don't like the proliferation of functions, you can also make a "command lambda":
arg :: (String -> Command) -> Command
arg f (x:xs) = f x xs
arg f [] = fail "Wrong number of arguments"
So that you can write commands like:
printHelloName :: Command
printHelloName = arg $ \first -> arg $ \last -> do
putStrLn $ "Hello, Mr(s). " ++ last
putStrLn $ "May I call you " ++ first ++ "?"
Of course mkCommand1 etc. can be easily written in terms of arg, for the best of both worlds.
As for packages, Command sufficiently encapsulates choices between multiple subcommands, but they don't compose. One option here is to change Command to:
type Command = [String] -> Maybe (IO ())
Which allows you to compose multiple Commands into a single one by taking the first action that does not return Nothing. Now your packages are just values of type Command as well. (In general with Haskell we are very interested in these compositions -- rather than packages and lists, think about how you can take two of some object to make a composite object)
To save you from the desire you have surely built up: (1) there is no reasonable way to detect the number of arguments a function takes*, and (2) there is no way to make a type depend on a number, so you won't be able to create a mkCommand which takes as its first argument an Int for the number of arguments.
Hope this helped.
In this case, it turns out that there is, but I recommend against it and think it is a bad habit -- when things get more abstract the technique breaks down. But I'm something of a purist; the more duct-tapey Haskellers might disagree with me.