Make Parsec function fail instead of expecting more input - haskell

I'm using Parsec to parse some expressions (see this question for more context), and the most relevant part of my code is:
statement :: Parser Stmt
statement = assignStmt <|> simpleStmt
assignStmt :: Parser Stmt
assignStmt =
  do var <- identifier
     reservedOp "="
     expr <- expression
     return $ Assign var expr
simpleStmt :: Parser Stmt
simpleStmt =
  do expr <- expression
     return $ Simple expr
In action:
boobla> foo = 100 + ~100
167
boobla> foo
parser error: (line 1, column 4):
unexpected end of input
expecting letter or digit or "="
Second expression should have evaluated to 167, value of foo.
I thought that when Parsec tried to read the token reservedOp "=", it would fail because there is no such token in the string, then try the second parser, simpleStmt, and succeed with it. But it works differently: it expects more input and just throws this error.
What should I use to make assignStmt fail if there are no more characters in the string (or in the current line)? foo = 10 should be parsed with assignStmt and foo should be parsed with simpleStmt.

You're missing the try function.
By default, the <|> operator will try the left parser, and if it fails without consuming any characters, it will try the right parser.
However, if the parser fails after having consumed some characters, as in your case, then the right parser is never tried. Note that this is often the behavior you want; if you had something like
parseForLoop <|> parseWhileLoop
then if the input is something like "for break", then that's not a valid for-loop, and there's no point trying to parse it as a while-loop, since that will surely fail also.
The try combinator changes this behaviour. Specifically, it makes a failed parser appear to have consumed no input. (This has a space penalty; the input could have been thrown away, but try makes it hang around.)
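As a minimal sketch of the fix, here is a cut-down version of the grammar with try added. The Expr type, the lexeme helper and the hand-written identifier and expression parsers are assumptions made for illustration; the original code uses token parsers that are not shown in the question.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical, cut-down AST: an expression is a variable or an integer.
data Expr = Var String | Num Integer deriving (Show, Eq)
data Stmt = Assign String Expr | Simple Expr deriving (Show, Eq)

-- Skip whitespace after a token.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

identifier :: Parser String
identifier = lexeme ((:) <$> letter <*> many alphaNum)

expression :: Parser Expr
expression = (Var <$> identifier) <|> (Num . read <$> lexeme (many1 digit))

assignStmt :: Parser Stmt
assignStmt = do
  var <- identifier
  _ <- lexeme (char '=')
  Assign var <$> expression

simpleStmt :: Parser Stmt
simpleStmt = Simple <$> expression

-- try rewinds the input when assignStmt fails after consuming the
-- identifier, so simpleStmt starts again from the first character.
statement :: Parser Stmt
statement = try assignStmt <|> simpleStmt

-- The same grammar without try, to reproduce the original error.
statementNoTry :: Parser Stmt
statementNoTry = assignStmt <|> simpleStmt

main :: IO ()
main = do
  print (parse (statement <* eof) "" "foo = 100")   -- parsed by assignStmt
  print (parse (statement <* eof) "" "foo")         -- parsed by simpleStmt
  print (parse (statementNoTry <* eof) "" "foo")    -- a parse error
```

With try, the input "foo" falls through to simpleStmt; without it, the failure inside assignStmt is reported directly because the identifier has already been consumed.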

Related

Parsing redundant parenthesis while restoring input on non-committed failure

I have written a parser in Haskell's parsec library for a lisp-like language, and I would like to improve its error messages, but I am stuck in the following situation:
p :: Parser Integer
p = do
  try (string "number")
  space
  integer
parens :: Parser a -> Parser a
parens p = do
  char '('
  v <- p
  char ')'
  return v
finalParser = p <|> parens finalParser
So the parser p behaves quite nicely error-wise: it does not commit and consume input until it sees the keyword "number". Once it has successfully parsed the keyword, it will not restore its input on failure. If this parser is combined with other parsers using the <|> operator, and it parses the keyword, it will show the error message of p and not try the next parser.
parens p, however, no longer has this property.
I would like to write a function parens as above that consumes input only if the parser p consumes input.
Here are a few examples of the behavior I am trying to achieve. finalParser should parse:
"(((number 5)))", "number 5", "(number 5)" all should parse to the integer 5, and consume everything,
"(5)" should consume nothing (not even the parenthesis) and fail
"(number abc)", "number cde" should fail and consume input.
I would like to write a function parens as above that consumes input only if the parser p consumes input.
Rather than that, consider factoring your syntax so you don't even need to backtrack after consuming a parenthesis.
Instead of trying to make this work:
parens p1 <|> parens p2
why not write this instead:
parens (p1 <|> p2)
?
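A small sketch of the difference, assuming two hypothetical alternatives p1 and p2 that are plain keyword parsers (they are not the parsers from the question):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

parens :: Parser a -> Parser a
parens p = char '(' *> p <* char ')'

-- Two hypothetical alternatives that start with different characters.
p1, p2 :: Parser String
p1 = string "foo"
p2 = string "bar"

-- The first branch consumes '(' before failing, so the second branch
-- is never tried on input such as "(bar)".
unfactored :: Parser String
unfactored = parens p1 <|> parens p2

-- The choice is made after the shared '(' has been consumed, so no
-- backtracking over the parenthesis is needed.
factored :: Parser String
factored = parens (p1 <|> p2)

main :: IO ()
main = do
  print (parse unfactored "" "(bar)")  -- a parse error
  print (parse factored "" "(bar)")    -- Right "bar"
```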

Add IO functionality to imperative language parser

I'm trying to add IO functionality such as read and write statements to a parser for an imperative language like the one shown here: https://wiki.haskell.org/Parsing_a_simple_imperative_language
I want to add statements such as write "example" which will write "example" to stdout with something like putStrLn or print.
So far I've made the following changes:
-- Add Write data type so that a statement such as `write "test"` can be represented as a statement
data Stmt = Seq [Stmt]
          | ... (same as before)
          | Write String
          deriving (Show)
-- write needs to be added to reservedNames as it has a function in the language
languageDef =
  emptyDef { Token...
           , ...
           , Token.reservedNames = [ "if"
                                   , ...
                                   , "write"
                                   ...
-- whenever a statement is parsed, writeStmt function now needs to be called to parse a write statement
statement' :: Parser Stmt
statement' = ifStmt
         <|> whileStmt
         <|> skipStmt
         <|> assignStmt
         <|> writeStmt
-- do reserved write, get the identifier, print it to stdout and return the Write statement
writeStmt :: Parser Stmt
writeStmt =
do reserved "write"
var <- identifier
print var
return $ Write var
I'm getting errors on var <- identifier in writeStmt and I'm not sure what else needs to be added or changed to get this to work. Thanks.
As was noted in the comments, you need to fix the indentation so all of the statements in your do-block line up:
writeStmt :: Parser Stmt
writeStmt =
  do reserved "write"
     var <- identifier
     print var
     return $ Write var
and then, you need to delete that print var statement. During parsing, you just want to identify write statements and store them in your abstract syntax tree (AST) (i.e., in your Stmt tree). The actual printing will take place in a separate function that executes the program represented by the AST.
Unfortunately, the wiki page you're working from only shows you how to perform the parsing, not how to actually execute the parsed representation.
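To illustrate the split between parsing and execution, here is a rough sketch of what such an executing function might look like. The reduced Stmt type and the names output and exec are assumptions for this example only; the wiki's AST has more constructors, and a real interpreter would also need a variable store.

```haskell
-- Hypothetical, cut-down AST with only the constructors needed here.
data Stmt
  = Seq [Stmt]
  | Skip
  | Write String
  deriving (Show)

-- Pure part: collect the lines a program would write, in order.
output :: Stmt -> [String]
output (Seq stmts) = concatMap output stmts
output Skip        = []
output (Write s)   = [s]

-- Effectful part: the printing happens here, not in the parser.
exec :: Stmt -> IO ()
exec = mapM_ putStrLn . output

main :: IO ()
main = exec (Seq [Write "example", Skip, Write "done"])
```

The parser only has to build the Write node; exec is then called on the result of a successful parse.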

Why doesn't Parsec backtrack when one part of the parser succeeds and the rest fails?

I have this Parsec parser:
a = optionMaybe $ do {try $ spaceornull *> string "hello";string "No"}
Where spaceornull is ((:[]) <$> try space) <|> string ""
When I test a with input " " I get:
Left (line 1, column 2):
unexpected end of input
expecting "hello"
I don't understand this. spaceornull *> string "hello" should fail because there is no "hello"; then, because of try, Parsec backtracks, so no input has been consumed. The try still fails, so the parser passed to optionMaybe (the one inside the do) fails altogether without trying to consume further input. We end up with a failed parser that has consumed no input, so I should get Right Nothing.
But as the error message says, the space is consumed, so try didn't really backtrack. Does try not backtrack when part of the parser succeeds? And how can I make it backtrack in the example above?
try has nothing to do with whether failure is allowed or not. It merely makes it possible to backtrack in case of failure†, but to commence the backtracking you need to provide an alternative parser to start at that point. The usual way to do that is with the <|> operator:
a = optionMaybe $ (try $ spaceornull *> string "hello") <|> string "No"
OTOH, your code is equivalent to
a = optionMaybe $ (try $ spaceornull *> string "hello") >> string "No"
where the monadic chaining operator >> (same as *>) will in the case of parsec check if the LHS succeeds, then go on and also run the RHS parser. So it must be, because you could also write:
a = optionMaybe $ do
  s <- try $ spaceornull *> string "hello"
  string $ "No"++s
Here I've used the result of the first parser (which you simply threw away by not binding it to a variable with <-) in deciding what the second one should look for. This is clearly only possible if the first one actually succeeded!
†Basically, <|> only works if either the LHS fails immediately at the first character, or you set a backtracking point with try. This is required because it would be very inefficient if Parsec had to leave a backtracking point before each and every alternative that needs to be checked.
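The difference between sequencing and alternation can be seen in a small sketch. The names sequenced and alternated are made up for this example, and the inputs are chosen so that both parsers can be compared on the same strings:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

spaceornull :: Parser String
spaceornull = ((:[]) <$> try space) <|> string ""

-- Sequencing: "No" is only looked for after "hello" has been parsed.
sequenced :: Parser (Maybe String)
sequenced = optionMaybe $ do
  _ <- try (spaceornull *> string "hello")
  string "No"

-- Alternation: "No" is looked for when the first branch fails,
-- starting from the position that try rewound to.
alternated :: Parser (Maybe String)
alternated = optionMaybe $ try (spaceornull *> string "hello") <|> string "No"

main :: IO ()
main = do
  print (parse sequenced "" "helloNo")   -- Right (Just "No")
  print (parse alternated "" "helloNo")  -- Right (Just "hello")
  print (parse sequenced "" "No")        -- Right Nothing
  print (parse alternated "" "No")       -- Right (Just "No")
```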

Trying to simplify the checking of an IO Bool in an Attoparsec parser

I'm trying to simplify the below code that's part of an attoparsec parser for a network packet, and I'm at a loss for a nice way to do it.
It starts with a call to atEnd :: IO Bool to determine if there's more to parse. I can't find a nicer way to use atEnd than to unwrap it from IO and then use it in an if statement, but it seems like there must be a simpler way to check a Bool inside a monad. Here's the code:
maybePayload :: Parser (Maybe A.Value)
maybePayload = do
  e <- atEnd
  if e then return Nothing
       else do
         payload' <- char ':' *> takeByteString
         maybe mzero (return . Just) (maybeResult $ parse A.json payload')
The intention is to return Nothing if there is no payload, to return Just A.Value if there is a valid JSON payload, and for the parser to fail if there is a non-valid payload.
Here's the Packet record that eventually gets created:
data Packet = Packet
  { pID :: Integer
  , pEndpoint :: String
  , pPayload :: Maybe A.Value
  }
You're doing a lot of work you don't need to do. First you check if you're at the end of the data and return Nothing if that doesn't work out. That's just not necessary, because if you're at the end, any parser that requires content will fail, and using maybeResult will turn that failure into Nothing.
The only time your parser fails is with the case where the input has data which doesn't start with the character :, the rest of the time it succeeds, even if that's by returning Nothing.
The only actual parsing going on is checking for : then using A.json. I think you're trying to write the whole program inside one parser, whereas you should just do the parsing on its own then call that as necessary. There's no need to check for end of data, or to make sure you get the whole content - that's all built in for free in a parser. Once you get rid of all that unnecessary checking, you get
payload :: Parser A.Value
payload = char ':' *> A.json
If you want to, you can use that as maybeResult $ parse payload input to get a Maybe A.Value that's not additionally wrapped in a Parser. If you don't apply maybeResult, you can pattern match on the Result returned to deal separately with its Fail, Partial and Done cases.
Edit: OK, clearer now, thanks:
(If there's a colon followed by invalid json, fail)
If there's a colon followed by valid json, succeed, wrapping it in Just
If there's just end of input, succeed, returning Nothing
So we get:
maybePayload :: Parser (Maybe A.Value)
maybePayload = char ':' *> (Just <$> A.json)
           <|> (Nothing <$ endOfInput)
I've used <$> and <$ from Control.Applicative, or if you prefer, from Data.Functor.
<$> is an infix version of fmap, so Just <$> A.json does A.json and wraps any output in Just.
<$ is fmap.const so replaces the () from endOfInput with Nothing.
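The same shape can be tried out in a self-contained sketch. Note the substitutions: this uses Parsec rather than attoparsec, eof in place of endOfInput, and a hypothetical digits parser standing in for A.json, so it only illustrates how <$> and <$ combine with <|>.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Stand-in for A.json: a hypothetical payload made of digits only.
digits :: Parser String
digits = many1 digit

-- Just <$> digits wraps a parsed payload in Just;
-- Nothing <$ eof replaces the () returned by eof with Nothing.
maybePayload :: Parser (Maybe String)
maybePayload = char ':' *> (Just <$> digits)
           <|> (Nothing <$ eof)

main :: IO ()
main = do
  print (parse maybePayload "" ":42")  -- colon plus valid payload
  print (parse maybePayload "" "")     -- end of input
  print (parse maybePayload "" ":x")   -- colon plus invalid payload: a parse error
```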
Why do you need to encode failure in Maybe when the parser monad already has a built-in notion of failure? The problem with using Maybe in this way is that the parser cannot backtrack.
You could try something like this (I haven't tried to typecheck it) and then use option in the caller:
payload :: Parser A.Value
payload = do
  payload' <- char ':' *> takeByteString
  -- parseOnly runs the inner parser to completion and returns an Either
  case parseOnly A.json payload' of
    Left msg -> fail msg
    Right a  -> return a

Parsec error - try doesn't seem to work

I'm currently using the Text.Parsec.Expr module to parse a subset of a scripting language.
Basically, there are two kinds of commands in this language: Assignment of the form $var = expr and a Command of the form $var = $array[$index] - there are of course other commands, but this suffices to explain my problem.
I've created a type Command, to represent this, along with corresponding parsers, where expr for the assignment is handled by Parsec's buildExpressionParser.
Now, the problem. First the parsing code:
main = case parse p "" "$c = $a[$b]" of
  Left err -> putStrLn . show $ err
  Right r -> putStrLn . show $ r
  where p = (try assignment <|> command) <* eof -- (1)
The whole code (50 lines) is pasted here: Link (should compile if you've parsec installed)
The problem is, that parsing fails, since assignment doesn't successfully parse, even though there is a try before. Reversing the parsing order (try command <|> assignment) solves the problem, but is not possible in my case.
Of course I tried to locate the problem further, and it appears to me that the problem is the expression parser (built by buildExpressionParser), since parsing succeeds if I say expr = fail "". However, I can't find anything in the Parsec sources that would explain this behaviour.
Your parser fails because assignment does in fact succeed here, consuming $c = $a (try it with plain where p = assignment). After that, the parser expects eof (or the rest of the expr from assignment), hence the error. It seems that the beginning of your command is identical to your assignment in the case where the assignment's argument is just a variable (like $c = $a).
Not sure why you can't reverse command and assignment but another way to make this particular example work would be:
main = case parse p "" "$c = $a[$b]" of
  Left err -> putStrLn . show $ err
  Right r -> putStrLn . show $ r
  where p = try (assignment <* eof) <|> (command <* eof)
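Since the full 50-line program is not shown, here is a reduced sketch of the same situation. The Command type and the variable, assignment and command parsers are simplified assumptions (the assignment's right-hand side is just a variable instead of a full buildExpressionParser expression), but they reproduce the overlap described above.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Command
  = Assignment String String     -- $var = $other
  | Index String String String   -- $var = $array[$index]
  deriving (Show, Eq)

variable :: Parser String
variable = char '$' *> many1 letter <* spaces

equals :: Parser ()
equals = char '=' *> spaces

assignment :: Parser Command
assignment = Assignment <$> variable <* equals <*> variable

command :: Parser Command
command = Index <$> variable <* equals <*> variable
                <*> between (char '[') (char ']') variable

-- assignment succeeds on the prefix "$c = $a", so try has nothing to
-- undo and the eof that follows fails at '['.
original :: Parser Command
original = (try assignment <|> command) <* eof

-- Moving eof inside try makes the first branch fail as a whole, which
-- rewinds the input and lets command run.
fixed :: Parser Command
fixed = try (assignment <* eof) <|> (command <* eof)

main :: IO ()
main = do
  print (parse original "" "$c = $a[$b]")  -- a parse error
  print (parse fixed "" "$c = $a[$b]")     -- Right (Index "c" "a" "b")
  print (parse fixed "" "$c = $a")         -- Right (Assignment "c" "a")
```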
