Parsec error - try doesn't seem to work - haskell

I'm currently using the Text.Parsec.Expr module to parse a subset of a scripting language.
Basically, there are two kinds of commands in this language: Assignment of the form $var = expr and a Command of the form $var = $array[$index] - there are of course other commands, but this suffices to explain my problem.
I've created a type Command, to represent this, along with corresponding parsers, where expr for the assignment is handled by Parsec's buildExpressionParser.
Now, the problem. First the parsing code:
main = case parse p "" "$c = $a[$b]" of
    Left err -> putStrLn . show $ err
    Right r -> putStrLn . show $ r
  where p = (try assignment <|> command) <* eof -- (1)
The whole code (50 lines) is pasted here: Link (should compile if you've parsec installed)
The problem is that parsing fails because assignment doesn't parse the input successfully, even though it is wrapped in try. Reversing the order (try command <|> assignment) solves the problem, but is not possible in my case.
Of course I tried to narrow the problem down, and it appears to me that the culprit is the expression parser (built by buildExpressionParser), since parsing succeeds if I say expr = fail "". However, I can't find anything in the Parsec sources that would explain this behaviour.

Your parser fails because assignment does in fact succeed here, consuming $c = $a (try it with plain where p = assignment). Since assignment succeeded, try has nothing to backtrack from, and the parser then expects eof (or the rest of an expr from assignment) but finds [ instead, hence the error. It seems that the beginning of your command is identical to your assignment in the case where the assignment's argument is just a var (like $c = $a).
I'm not sure why you can't reverse command and assignment, but another way to make this particular example work would be:
main = case parse p "" "$c = $a[$b]" of
    Left err -> putStrLn . show $ err
    Right r -> putStrLn . show $ r
  where p = try (assignment <* eof) <|> (command <* eof)
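Since the full code is only linked, here is a minimal sketch of the situation with made-up stand-ins: the Command type, var, assignment and command below are assumptions, not the asker's definitions, and the right-hand side of an assignment is reduced to a single variable. It only illustrates why version (1) fails and why moving eof inside the try helps.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-ins for the question's types and parsers.
data Command
  = Assign String String         -- $var = $var (expression reduced to a variable)
  | Index String String String   -- $var = $array[$index]
  deriving (Show, Eq)

-- Run a parser, then skip trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- A variable is '$' followed by one or more letters.
var :: Parser String
var = lexeme (char '$' *> many1 letter)

assignment :: Parser Command
assignment = Assign <$> var <* lexeme (char '=') <*> var

command :: Parser Command
command =
  Index <$> var <* lexeme (char '=') <*> var
        <*> between (lexeme (char '[')) (lexeme (char ']')) var

-- Version (1): assignment succeeds on "$c = $a", so try has no failure
-- to backtrack from, and eof then fails at '['.
broken :: Parser Command
broken = (try assignment <|> command) <* eof

-- With eof inside the try, the first branch fails as a whole on "$a[$b]"
-- and the input is rewound before command is attempted.
fixed :: Parser Command
fixed = try (assignment <* eof) <|> (command <* eof)
```

In GHCi, parse broken "" "$c = $a[$b]" gives a Left, while parse fixed "" "$c = $a[$b]" gives Right (Index "c" "a" "b").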

Related

Haskell: Parsec: Pipeline of transformers of the whole file

I'm trying to use parsec to read a C/C++/java source file and do a series of transformations on the entire file. The first phase removes strings and the second phase removes comments. (That's because you might get a /* inside a string.)
So each phase transforms a string onto Either String Error, and I want to bind (in the sense of Either) them together to make a pipeline of transformations of the whole file. This seems like a fairly general requirement.
import Text.ParserCombinators.Parsec
commentless, stringless :: Parser String
stringless = fmap concat ( (many (noneOf "\"")) `sepBy` quotedString )
quotedString = (char '"') >> (many quotedChar) >> (char '"')
quotedChar = try (string "\\\"" >> return '"' ) <|> (noneOf "\"")
commentless = fmap concat $ notComment `sepBy` comment
notComment = manyTill anyChar (lookAhead (comment <|> eof))
comment = (string "//" >> manyTill anyChar newline >> spaces >> return ())
      <|> (string "/*" >> manyTill anyChar (string "*/") >> spaces >> return ())
main =
    do c <- getContents
       case parse commentless "(stdin)" c of -- THIS WORKS
       -- case parse stringless "(stdin)" c of -- THIS WORKS TOO
       -- case parse (stringless `THISISWHATIWANT` commentless) "(stdin)" c of
           Left e -> do putStrLn "Error parsing input:"
                        print e
           Right r -> print r
So how can I do this? I tried parserBind but it didn't work.
(In case anybody cares why, I'm trying to do a kind of light parse where I just extract what I want but avoid parsing the entire grammar or even knowing whether it's C++ or Java. All I need to extract is the starting and ending line numbers of all classes and functions. So I envisage a bunch of preprocessing phases that just scrub out comments, #defines/ifdefs, template preambles and contents of parentheses (because of the semicolons in for clauses), then I'll parse for snippets preceding {s (or following }s because of typedefs) and stuff those snippets through yet another phase to get the type and name of whatever it is, then recurse to just the second level to get java member functions.)
You need to bind in Either ParseError, not in Parser. Move the bind outside the parse and run multiple parses, feeding the output of one into the next:
parse stringless "(stdin)" input >>= parse commentless "(stdin)"
There is probably a better approach than what you are using, but this will do what you want.
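As a rough, self-contained sketch of that pipeline: the two phases below are simplified stand-ins written for this example (they are not the asker's parsers, they ignore escaped quotes and // comments), and pipeline is a hypothetical name. The point is only that each parse returns an Either ParseError String, so the phases chain with >>= in the Either monad.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Phase 1 (simplified stand-in): drop double-quoted string literals.
stringless :: Parser String
stringless = concat <$> many (quoted <|> plain)
  where
    plain, quoted :: Parser String
    plain  = many1 (noneOf "\"")
    quoted = "" <$ between (char '"') (char '"') (many (noneOf "\""))

-- Phase 2 (simplified stand-in): drop /* ... */ comments.
commentless :: Parser String
commentless = concat <$> many (comment <|> other)
  where
    comment, other :: Parser String
    comment = "" <$ (try (string "/*") *> manyTill anyChar (try (string "*/")))
    other   = (: []) <$> anyChar

-- Each phase consumes the whole output of the previous one; the first
-- ParseError, if any, short-circuits the rest of the chain.
pipeline :: String -> Either ParseError String
pipeline input =
  parse stringless "(stdin)" input >>= parse commentless "(stdin)"
```

Because strings are removed first, a /* inside a string literal never reaches the comment phase.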

Why doesn't Parsec backtrack when one part of the parser succeeds and the rest fails?

I have this parsec parser :
a = optionMaybe $ do {try $ spaceornull *> string "hello";string "No"}
Where spaceornull is ((:[]) <$> try space) <|> string ""
When I test a with input " " I get :
Left (line 1, column 2):
unexpected end of input
expecting "hello"
I don't understand this. spaceornull *> string "hello" should fail because there is no "hello"; try then backtracks so no input is consumed, but try itself still fails, so the parser passed to optionMaybe (the one inside the do) fails as a whole. It shouldn't try to consume further input, so we end up with a parser that failed without consuming anything, and I should get Right Nothing.
But the error message says the space was consumed, so try didn't really backtrack. Does try not backtrack when part of the parser succeeds? And how do I make it backtrack in the case above?
try has nothing to do with whether failure is allowed or not. It merely makes it possible to backtrack in case of failure†, but to commence the backtracking you need to provide an alternative parser to start at that point. The usual way to do that is with the <|> operator:
a = optionMaybe $ (try $ spaceornull *> string "hello") <|> string "No"
OTOH, your code is equivalent to
a = optionMaybe $ (try $ spaceornull *> string "hello") >> string "No"
where the monadic chaining operator >> (same as *>) will in the case of parsec check if the LHS succeeds, then go on and also run the RHS parser. So it must be, because you could also write:
a = optionMaybe $ do
    s <- try $ spaceornull *> string "hello"
    string $ "No"++s
Here I've used the result of the first parser (which you simply threw away, by not binding it to a variable with <-) in deciding what the second one should look for. This clearly is only possible if the first actually succeeded!
†Basically, <|> only works if either the LHS immediately fails right at the first character, or if you set a backtracking point with try. The reason this is required is that it would be very inefficient if parsec had to leave a backtracking point before each and every alternative that needs to be checked.
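To make the footnote concrete, here is a small sketch comparing the same alternative with and without try. The names spaceOrNull, withoutTry and withTry are made up for this example, and spaceOrNull is written with option instead of string "" as a simplifying assumption.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Zero or one whitespace character, returned as a String.
spaceOrNull :: Parser String
spaceOrNull = option "" ((: []) <$> space)

-- No try: once the space has been consumed, a failure of string "hello"
-- counts as a consuming failure, so <|> never runs the right-hand side
-- and optionMaybe cannot recover either.
withoutTry :: Parser (Maybe String)
withoutTry = optionMaybe ((spaceOrNull *> string "hello") <|> string "No")

-- With try: the failure is reported as non-consuming, <|> moves on to
-- string "No", and if that also fails without consuming input,
-- optionMaybe yields Nothing.
withTry :: Parser (Maybe String)
withTry = optionMaybe (try (spaceOrNull *> string "hello") <|> string "No")
```

On the input " ", parse withoutTry "" " " is a Left, whereas parse withTry "" " " is Right Nothing.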

Make Parsec function fail instead of expecting more input

I'm using Parsec to parse some expressions (see this question for more context), and most relevant part of my code is:
statement :: Parser Stmt
statement = assignStmt <|> simpleStmt
assignStmt :: Parser Stmt
assignStmt =
    do var <- identifier
       reservedOp "="
       expr <- expression
       return $ Assign var expr
simpleStmt :: Parser Stmt
simpleStmt =
    do expr <- expression
       return $ Simple expr
In action:
boobla> foo = 100 + ~100
167
boobla> foo
parser error: (line 1, column 4):
unexpected end of input
expecting letter or digit or "="
The second expression should have evaluated to 167, the value of foo.
I thought that when Parsec tried to read the token reservedOp "=", it would fail because there is no such token in the string, then try the second function, simpleStmt, and succeed with it. But it works differently: it expects more input and just throws this exception.
What should I use to make assignStmt fail if there are no more characters in the string (or in the current line)? foo = 10 should be parsed with assignStmt and foo should be parsed with simpleStmt.
You're missing the try function.
By default, the <|> operator will try the left parser, and if it fails without consuming any characters, it will try the right parser.
However, if, as in your case, the left parser fails after having consumed some characters, the right parser is never tried. Note that this is often the behavior you want; if you had something like
parseForLoop <|> parseWhileLoop
then if the input is something like "for break", then that's not a valid for-loop, and there's no point trying to parse it as a while-loop, since that will surely fail also.
The try combinator changes this behaviour. Specifically, it makes a failed parser appear to have consumed no input. (This has a space penalty; the input could have been thrown away, but try makes it hang around.)
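Applied to the question, that would look roughly like the sketch below. The Expr and Stmt types, lexeme, and the tiny identifier and expression parsers are simplified assumptions standing in for the asker's token-based definitions; only the placement of try is the point.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Simplified stand-ins for the question's AST.
data Expr = Var String | Lit Integer deriving (Show, Eq)
data Stmt = Assign String Expr | Simple Expr deriving (Show, Eq)

-- Run a parser, then skip trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

identifier :: Parser String
identifier = lexeme (many1 letter)

expression :: Parser Expr
expression = Var <$> identifier <|> Lit . read <$> lexeme (many1 digit)

assignStmt :: Parser Stmt
assignStmt = Assign <$> identifier <* lexeme (char '=') <*> expression

simpleStmt :: Parser Stmt
simpleStmt = Simple <$> expression

-- Without try: on "foo", assignStmt consumes the identifier and then
-- fails looking for '=', so simpleStmt is never attempted.
noTry :: Parser Stmt
noTry = (assignStmt <|> simpleStmt) <* eof

-- With try: the failed assignStmt is rewound to the start of the input
-- and simpleStmt gets its turn.
statement :: Parser Stmt
statement = (try assignStmt <|> simpleStmt) <* eof
```

Here parse statement "" "foo" gives Right (Simple (Var "foo")) and parse statement "" "foo = 10" gives Right (Assign "foo" (Lit 10)), while parse noTry "" "foo" is a Left.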

Haskell Parsec accounting for multiple expression occurrences in grammar

I have been trying to create a parser using details from the following tutorial.
Much of the code is copied directly from the tutorial with only a few names changed.
import qualified Text.ParserCombinators.Parsec.Token as P
reserved = P.reserved lexer
integer = P.integer lexer
whiteSpace = P.whiteSpace lexer
identifier = P.identifier lexer
data Express = Seq [Express]
             | ID String
             | Num Integer
             | BoolConst Bool
             deriving (Show)
whileParser :: Parser Express
whileParser = whiteSpace >> expr7
expr7 = seqOfStmt
    <|> expr8
seqOfStmt =
    do list <- (sepBy1 expr8 whiteSpace)
    return $ if length list == 1 then head list else Seq list
expr8 :: Parser Express
expr8 = name
    <|> number
    <|> bTerm
name :: Parser Express
name = fmap ID identifier
number :: Parser Express
number = fmap Num integer
bTerm :: Parser Express
bTerm = (reserved "True" >> return (BoolConst True ))
    <|> (reserved "False" >> return (BoolConst False))
I understand that this code might be laughable but I would really like to learn a bit more about where I'm going wrong. I also think that this should provide enough info but if not let me know.
Error:
parse error on input `return'
I believe that the error has something to do with different return types, which is strange because I have tried to use the tutorial at the start of the post as a basis for all that I am attempting.
Thanks in advance,
Seán
If you are not comfortable with the layout rules, you may also use different syntax:
seqOfStmt =
do { list
<- (sepBy1 expr8 whiteSpace);
return $ if length
list == 1
then head list else Seq list;}
The layout without braces and semicolons is regarded superior, though, for 2 reasons:
You don't need to type ugly ; and braces
It forces you to write (mostly) readable code, unlike the distorted crap I gave as example above.
And the rules are really easy:
Don't use tabs, use spaces. Always. (Your editor can do that, if not, throw it away, it's crapware.)
Things that belong together must be aligned in the same column.
For example, you have 2 statements that belong to the do block, hence they must be aligned in the same column. But you have aligned the return with the do, hence the compiler sees this as:
do { list <- sepBy1 expr8 whiteSpace; };
return $ ....;
but what you want is this:
do {
list <- sepBy1 ....;
return $ .....;
}
(Note that you can just leave out the braces and the semicolons and it will be OK as long as you leave the indentation intact.)
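For reference, the brace-free version with both statements aligned in the same column might look like the sketch below. To keep it self-contained, expr8 is a simplified stand-in that uses plain Parsec combinators instead of the tutorial's lexer, and the Express type is cut down accordingly; those parts are assumptions, while the alignment of list and return is what matters.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Reduced version of the question's type (BoolConst left out).
data Express = Seq [Express] | ID String | Num Integer
  deriving (Show, Eq)

-- Simplified stand-in for the lexer-based expr8: a word or a number,
-- followed by optional whitespace.
expr8 :: Parser Express
expr8 = (ID <$> many1 letter <|> Num . read <$> many1 digit) <* spaces

seqOfStmt :: Parser Express
seqOfStmt =
  do list <- sepBy1 expr8 spaces
     -- 'return' starts in the same column as 'list', so both lines are
     -- statements of the same do block.
     return $ case list of
       [single] -> single
       _        -> Seq list
```

With this layout, parse seqOfStmt "" "foo 42 bar" yields Right (Seq [ID "foo",Num 42,ID "bar"]) and a single item comes back unwrapped.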

parsec error in haskellwiki tutorial

I was following the code in http://www.haskell.org/haskellwiki/Hitchhikers_guide_to_Haskell, and the code (in chapter 2) gives an error. There is no author name/email mentioned with the tutorial, so I am coming here for advice. The code is below, and the error occurs on the "eof" word.
module Main where
import Text.ParserCombinators.Parsec
parseInput =
    do dirs <- many dirAndSize
       eof
       return dirs
data Dir = Dir Int String deriving Show
dirAndSize =
    do size <- many1 digit
       spaces
       dir_name <- anyChar `manyTill` newline
       return (Dir (read size) dir_name)
main = do
    input <- getContents
    putStrLn ("Debug: got inputs: " ++ input)
That tutorial was written a long time ago, when parsec was simple. Since parsec-3, the library can wrap other monads, so at some points you now have to specify (or otherwise disambiguate) the type to use. This is one of them: giving eof an expression type signature, e.g. eof :: Parser (), makes it compile.
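One way to do that disambiguation, sketched here as an alternative to annotating eof inline, is to give the parsers top-level type signatures; this pins the stream and monad types just as well. The signatures and the Eq instance are additions for this example and are not part of the tutorial's code.

```haskell
import Text.ParserCombinators.Parsec

data Dir = Dir Int String deriving (Show, Eq)

-- The signature fixes the parser type, so eof is no longer ambiguous.
parseInput :: Parser [Dir]
parseInput =
  do dirs <- many dirAndSize
     eof
     return dirs

-- One line of input: a size, whitespace, then a name up to the newline.
dirAndSize :: Parser Dir
dirAndSize =
  do size <- many1 digit
     spaces
     dirName <- anyChar `manyTill` newline
     return (Dir (read size) dirName)
```

For example, parse parseInput "(input)" "10 /a\n20 /b\n" returns Right [Dir 10 "/a",Dir 20 "/b"].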

Resources