Incomplete input using endOfInput parser - haskell

Consider the string "(=x250) toto e", and the function:
charToText :: Char -> Text
charToText c = pack [c]
The string is successfully parsed by:
mconcat <$> manyTill (charToText <$> anyChar) (char 'e')
with the expected result "(=x250) toto ".
However, the parser:
mconcat <$> manyTill (charToText <$> anyChar) endOfInput
returns Partial _.
Why is that? I thought endOfInput would succeed at the end of the string and stop manyTill (as in the first example).

To get a complete answer, you'll need to provide a fully self-contained example that generates the Result: incomplete input error message. That said, your Attoparsec parser is working correctly, and you can see similar behavior with a much simpler example:
λ> parse (char 'e') "e"
Done "" 'e'
λ> parse endOfInput ""
Partial _
Attoparsec parsers by design allow incremental supply of additional input. When they are run on some (potentially partial) input, they return a Done result if the parser unconditionally succeeds on the supplied input. They return a Partial result if more input is needed to decide if the parser succeeds.
For my example above, char 'e' always successfully parses the partial input "e", no matter what additional input you might decide to supply, hence the result is a Done.
However, endOfInput might succeed on the partial input "", but only if no additional input is going to be supplied. If there is additional input, endOfInput will fail. Because of this, a Partial result is returned.
It's the same for your example. The success of your second parser depends on whether or not additional input is supplied. If there's no additional input, the parser is Done, but if there is additional input, the parser has more to do.
You will either need to arrange to run your parser with parseOnly:
λ> parseOnly (manyTill anyChar endOfInput) "foo"
Right "foo"
or feed your parse result an empty ByteString, which signals that no further input is available:
λ> parse (manyTill anyChar endOfInput) "foo"
Partial _
λ> feed (parse (manyTill anyChar endOfInput) "foo") ""
Done "" "foo"
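As a hedged illustration of the two fixes above, the following sketch runs the same end-of-input parser both ways on the string from the question. It assumes the Text flavour of Attoparsec; the names restOfInput, describe, viaParseOnly and viaFeed are made up for this example. describe exists only because a Partial result holds a continuation and cannot be shown directly.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text
import Data.Text (Text)
import qualified Data.Text as T

-- Collect every remaining character up to the end of the input.
restOfInput :: Parser Text
restOfInput = T.pack <$> manyTill anyChar endOfInput

-- Hypothetical helper: render a Result as a String for inspection.
describe :: Result Text -> String
describe (Done rest r)  = "Done " ++ show rest ++ " " ++ show r
describe (Partial _)    = "Partial"
describe (Fail _ _ msg) = "Fail: " ++ msg

-- parseOnly treats the supplied input as complete.
viaParseOnly :: Either String Text
viaParseOnly = parseOnly restOfInput "(=x250) toto e"

-- parse may return Partial; feeding "" signals that no more input will arrive.
viaFeed :: Result Text
viaFeed = feed (parse restOfInput "(=x250) toto e") ""
```

Under these assumptions, describe (parse restOfInput "abc") reports a Partial, while both viaParseOnly and viaFeed yield the whole string.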

Related

How can I force Parsec to return an error?

I'm making a parser with Parsec and I try to return a specific error during the parsing.
This is a minimal parser example to expose my problem :
parseA = try seq1
     <|> seq2
seq1 = do
    manyTill anyChar (try $ string "\n* ")
    many1 anyChar
    fail "My error message"
seq2 = do
    manyTill anyChar (try $ string "\n- ")
    many1 anyChar
I would like to perform some tests in the first try $ do sequence and stop the parsing and return a specific error message.
When I don't use fail I get :
ghci> parse parseA "" "aaaaaa\nbbbb\n* ccccc\n- ddd"
Right "ccccc\n- ddd"
When I use fail or unexpected, my parser doesn't stop (because of the try function) and executes the next do sequence:
ghci> parse parseA "" "aaaaaa\nbbbb\n* ccccc\n- ddd"
Right "ddd"
And it's not what I want!
I considered using the basic error function to stop the execution of my parser but I would like to have a "clean" error returned by the parsing function like this:
ghci> parse parseA "" "aaaaaa\nbbbb\n* ccccc\n- ddd"
Left "My error message"
Do you know how to properly stop a parser and return a custom error?
If you want the monad to behave differently then perhaps you should build a different monad. (N.B. I'm not entirely clear what you want, but moving forward anyway).
Solution: Use a Monad Transformer Stack
For example, to get a fail-like function that isn't caught and ignored by Parsec's try you could use an Except monad. Except allows you to throw errors much like exceptions but they are plumbed monadically instead of using the actual exception mechanism which demands IO to catch it.
First, let's define our monad:
import Text.Parsec
import Text.Parsec.Combinator
import Text.Parsec.Char
import Control.Monad.Trans.Except
import Control.Monad.Trans
type EscParse a = ParsecT String () (Except String) a
So the monad is EscParse and combines features of Parsec (via the transformer ParsecT) and Except.
Second, let's define some helpers:
run :: EscParse a -> SourceName -> String -> Either String (Either ParseError a)
run op sn input = runExcept (runPT op () sn input)
escFail :: String -> EscParse a
escFail = lift . throwE
Our run is like Parsec's runParser but also runs the Except monad. You might want to do something to avoid the nested Either, but that's an easy cosmetic change. escFail is what you'd use if you don't want the error to be ignored.
Third, we need to implement your parser using this new monad:
parseA :: EscParse String
parseA = try seq1 <|> seq2
seq1 :: EscParse String
seq1 = do manyTill anyChar (try $ string "\n* ")
          many1 anyChar
          escFail "My error message"
seq2 :: EscParse String
seq2 = do manyTill anyChar (try $ string "\n- ")
          many1 anyChar
Other than spacing and type signature, the above matches what you had but using escFail instead of fail.
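For completeness, here is a minimal self-contained sketch of the approach with one possible way to flatten the nested Either. The runFlat helper is hypothetical (it is not part of Parsec or of the code above); it simply renders a ParseError with show so that both kinds of failure end up as a String.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (Except, runExcept, throwE)
import Text.Parsec

type EscParse a = ParsecT String () (Except String) a

-- Throw in the underlying Except monad, so try cannot swallow the error.
escFail :: String -> EscParse a
escFail = lift . throwE

-- Hypothetical helper: collapse escaped errors and parse errors into one Left.
runFlat :: EscParse a -> SourceName -> String -> Either String a
runFlat p sn input =
  case runExcept (runPT p () sn input) of
    Left escaped      -> Left escaped      -- raised via escFail
    Right (Left err)  -> Left (show err)   -- ordinary Parsec failure
    Right (Right val) -> Right val

parseA, seq1, seq2 :: EscParse String
parseA = try seq1 <|> seq2

seq1 = do
  _ <- manyTill anyChar (try $ string "\n* ")
  _ <- many1 anyChar
  escFail "My error message"

seq2 = do
  _ <- manyTill anyChar (try $ string "\n- ")
  many1 anyChar
```

With this sketch, runFlat parseA "" "aaaaaa\nbbbb\n* ccccc\n- ddd" evaluates to Left "My error message", while an input without a "\n* " marker falls through to seq2 as before.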

Haskell: Parsec: Pipeline of transformers of the whole file

I'm trying to use parsec to read a C/C++/java source file and do a series of transformations on the entire file. The first phase removes strings and the second phase removes comments. (That's because you might get a /* inside a string.)
So each phase transforms a string into Either String Error, and I want to bind (in the sense of Either) them together to make a pipeline of transformations of the whole file. This seems like a fairly general requirement.
import Text.ParserCombinators.Parsec
commentless, stringless :: Parser String
stringless = fmap concat ( (many (noneOf "\"")) `sepBy` quotedString )
quotedString = (char '"') >> (many quotedChar) >> (char '"')
quotedChar = try (string "\\\"" >> return '"' ) <|> (noneOf "\"")
commentless = fmap concat $ notComment `sepBy` comment
notComment = manyTill anyChar (lookAhead (comment <|> eof))
comment = (string "//" >> manyTill anyChar newline >> spaces >> return ())
      <|> (string "/*" >> manyTill anyChar (string "*/") >> spaces >> return ())
main =
    do c <- getContents
       case parse commentless "(stdin)" c of -- THIS WORKS
       -- case parse stringless "(stdin)" c of -- THIS WORKS TOO
       -- case parse (stringless `THISISWHATIWANT` commentless) "(stdin)" c of
           Left e -> do putStrLn "Error parsing input:"
                        print e
           Right r -> print r
So how can I do this? I tried parserBind but it didn't work.
(In case anybody cares why, I'm trying to do a kind of light parse where I just extract what I want but avoid parsing the entire grammar or even knowing whether it's C++ or Java. All I need to extract is the starting and ending line numbers of all classes and functions. So I envisage a bunch of preprocessing phases that just scrub out comments, #defines/ifdefs, template preambles and contents of parentheses (because of the semicolons in for clauses), then I'll parse for snippets preceding {s (or following }s because of typedefs) and stuff those snippets through yet another phase to get the type and name of whatever it is, then recurse to just the second level to get java member functions.)
You need to bind Either Error, not Parser. You need to move the bind outside the parse, and use multiple parses:
parse stringless "(stdin)" input >>= parse commentless "(stdin)"
There is probably a better approach than what you are using, but this will do what you want.
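To make the chaining concrete, here is a small sketch of a two-phase pipeline in the Either monad. The two phase parsers are deliberately simplified stand-ins written for this example, not the parsers from the question: this stringless ignores escaped quotes, and this commentless only handles /* ... */ comments. The name scrub is also made up.

```haskell
import Text.ParserCombinators.Parsec

-- Phase 1 (simplified): drop double-quoted strings, keep everything else.
stringless :: Parser String
stringless = concat <$> many (quoted <|> plain) <* eof
  where
    plain  = many1 (noneOf "\"")
    quoted = "" <$ between (char '"') (char '"') (many (noneOf "\""))

-- Phase 2 (simplified): drop /* ... */ comments, keep everything else.
commentless :: Parser String
commentless = concat <$> many (blockComment <|> other) <* eof
  where
    blockComment = "" <$ (try (string "/*") *> manyTill anyChar (try (string "*/")))
    other        = (: []) <$> anyChar

-- The bind happens in Either ParseError: phase 2 runs on the output of phase 1.
scrub :: String -> Either ParseError String
scrub input = parse stringless "(strings)" input >>= parse commentless "(comments)"
```

Because the string phase runs first, a /* that appears inside a string literal is already gone by the time the comment phase sees the text. If either phase fails, its ParseError is returned and later phases are skipped.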

Why doesn't Parsec backtrack when one part of the parser succeeds and the rest fails?

I have this parsec parser :
a = optionMaybe $ do {try $ spaceornull *> string "hello";string "No"}
Where spaceornull is ((:[]) <$> try space) <|> string ""
When I test a with input " " I get :
Left (line 1, column 2):
unexpected end of input
expecting "hello"
I don't understand this. spaceornull *> string "hello" should fail because there is no "hello"; try then backtracks, so no input has been consumed. The try still fails, so the parser passed to optionMaybe (the one inside the do) fails as a whole without trying to consume any further input. We end up with a parser that failed without consuming any input, so I should get Right Nothing.
But the error message says the space was consumed, so try didn't really backtrack. Does try not backtrack when part of the parser succeeds? And how can I make it backtrack in the example above?
try has nothing to do with whether failure is allowed or not. It merely makes it possible to backtrack in case of failure†, but to commence the backtracking you need to provide an alternative parser to start at that point. The usual way to do that is with the <|> operator:
a = optionMaybe $ (try $ spaceornull *> string "hello") <|> string "No"
OTOH, your code is equivalent to
a = optionMaybe $ (try $ spaceornull *> string "hello") >> string "No"
where the monadic chaining operator >> (same as *>) will in the case of parsec check if the LHS succeeds, then go on and also run the RHS parser. So it must be, because you could also write:
a = optionMaybe $ do
    s <- try $ spaceornull *> string "hello"
    string $ "No"++s
Here I've used the result of the first parser (which you simply threw away by not binding it to a variable with <-) in deciding what the second one should look for. This is clearly only possible if the first parser actually succeeded!
†Basically, <|> only works if either the LHS immediately fails right at the first character, or if you set a backtracking point with try. The reason this is required is that it would be very inefficient if Parsec needed to leave a backtracking point before each and every alternative that needs to be checked.
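The difference between offering an alternative and sequencing can be sketched as follows. The names helloOrNo, helloThenNo and runMaybe are invented for this example; spaceornull is copied from the question. The second parser is only one possible reading of what the question was after (the whole sequence is optional), so treat it as an assumption.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

spaceornull :: Parser String
spaceornull = ((: []) <$> try space) <|> string ""

-- Alternative: if the try'd branch fails, input is rewound and "No" is attempted.
helloOrNo :: Parser (Maybe String)
helloOrNo = optionMaybe (try (spaceornull *> string "hello") <|> string "No")

-- Sequence: try wraps the whole chain, so a failure anywhere in it rewinds
-- to the start and optionMaybe can return Nothing.
helloThenNo :: Parser (Maybe String)
helloThenNo = optionMaybe (try (spaceornull *> string "hello" *> string "No"))

-- Hypothetical helper: discard the error details for easy comparison.
runMaybe :: Parser a -> String -> Maybe a
runMaybe p = either (const Nothing) Just . parse p ""
```

With a recent Parsec, runMaybe helloOrNo " " gives Just Nothing (the parse succeeds with no match), and runMaybe helloThenNo " hello" does too, because the try around the full sequence undoes the consumed space and "hello".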

Conduit and Attoparsec - extracting delimited text

Say I have a document with text delimited by Jade-style brackets, like {{foo}}. I've written an Attoparsec parser that seems to extract foo properly:
findFoos :: Parser [String]
findFoos = many $ do
    manyTill anyChar (string "{{")
    manyTill letter (string "}}")
Testing it shows that it works:
> parseOnly findFoos "{{foo}}"
Right ["foo"]
> parseOnly findFoos "{{foo}} "
Right ["foo"]
Now, with the Data.Conduit.Attoparsec module in conduit-extra, I seem to be running into strange behavior:
> yield "{{foo}}" $= (mapOutput snd $ CA.conduitParser findFoos) $$ CL.mapM_ print
["foo"]
> yield "{{foo}} " $= (mapOutput snd $ CA.conduitParser findFoos) $$ CL.mapM_ print
-- floods stdout with empty lists
Is this the desired behavior? Is there a conduit utility I should be using here? Any help with this would be tremendous!
Because it uses many, findFoos will return [] without consuming input when it doesn't find any delimited text.
On the other hand, conduitParser applies a parser repeatedly on a stream, returning each parsed value until it exhausts the stream.
The problem with "{{foo}} " is that the parser will consume {{foo}}, but the blank space remains unconsumed in the stream, so further invocations of the parser always return [].
If you redefine findFoos to consume one quoted element at a time, including the trailing blanks, it should work:
findFoos' :: Parser String
findFoos' = do
    manyTill anyChar (string "{{")
    manyTill letter (string "}}") <* skipSpace
Real-world examples will have other characters between bracketed texts, so skipping the "extra stuff" after each parse (without consuming any of the {{ opening braces for the next parse) will be a bit more involved.
Perhaps something like the following will work:
findFoos'' :: Parser String
findFoos'' = do
    manyTill anyChar (string "{{")
    manyTill letter (string "}}") <* skipMany everythingExceptOpeningBraces
  where
    -- is there a simpler / more efficient way of doing this?
    everythingExceptOpeningBraces =
        -- skip one or more non-braces
        (skip (/='{') *> skipWhile (/='{'))
        <|>
        -- skip single brace followed by non-brace character
        (skip (=='{') *> skip (/='{'))
        <|>
        -- skip a brace at the very end
        (skip (=='{') *> endOfInput)
(This parser will fail, however, if there aren't any bracketed texts in the stream. Perhaps you could build a Parser (Maybe Text) that returns Nothing in that case.)
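One way the Parser (Maybe Text) idea might look is sketched below, using plain Attoparsec only. The names nextFoo and allFoos are invented for this example, and how it interacts with conduitParser has not been checked here, so take it as a starting point rather than a drop-in fix. The fallback branch uses takeWhile1 so that it always consumes at least one character, which keeps repeated application from looping on empty input.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (many, (<|>))
import Data.Attoparsec.Text
import Data.Text (Text)
import qualified Data.Text as T

-- Either the next bracketed word, or Nothing after swallowing trailing text
-- that contains no further complete {{...}} pair.
nextFoo :: Parser (Maybe Text)
nextFoo = (Just . T.pack <$> bracketed) <|> (Nothing <$ takeWhile1 (const True))
  where
    bracketed = manyTill anyChar (string "{{") *> manyTill letter (string "}}")

-- Apply nextFoo until the input is exhausted.
allFoos :: Parser [Maybe Text]
allFoos = many nextFoo
```

For example, parseOnly allFoos "{{foo}} x {{bar}} tail" produces the two bracketed words followed by a single Nothing for the leftover " tail".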

What's the cleanest way to do case-insensitive parsing with Text.ParserCombinators.Parsec?

I'm writing my first program with Parsec. I want to parse MySQL schema dumps and would like to come up with a nice way to parse strings representing certain keywords in case-insensitive fashion. Here is some code showing the approach I'm using to parse "CREATE" or "create". Is there a better way to do this? An answer that doesn't resort to buildExpressionParser would be best. I'm taking baby steps here.
p_create_t :: GenParser Char st Statement
p_create_t = do
    x <- (string "CREATE" <|> string "create")
    xs <- manyTill anyChar (char ';')
    return $ CreateTable (x ++ xs) [] -- refine later
You can build the case-insensitive parser out of character parsers.
-- Match the lowercase or uppercase form of 'c'
caseInsensitiveChar c = char (toLower c) <|> char (toUpper c)
-- Match the string 's', accepting either lowercase or uppercase form of each character
caseInsensitiveString s = try (mapM caseInsensitiveChar s) <?> "\"" ++ s ++ "\""
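A self-contained version of these two combinators, with the imports and type signatures they need, might look like the sketch below. createStatement and runMaybe are hypothetical names added only to show usage; the String-based Parser type is an assumption about the input.

```haskell
import Data.Char (toLower, toUpper)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Match the lowercase or uppercase form of 'c'.
caseInsensitiveChar :: Char -> Parser Char
caseInsensitiveChar c = char (toLower c) <|> char (toUpper c)

-- Match 's' in any mix of cases; try rewinds if the match fails halfway.
caseInsensitiveString :: String -> Parser String
caseInsensitiveString s = try (mapM caseInsensitiveChar s) <?> ("\"" ++ s ++ "\"")

-- Hypothetical usage: the keyword as written, then the body up to ';'.
createStatement :: Parser (String, String)
createStatement = (,) <$> caseInsensitiveString "create" <*> manyTill anyChar (char ';')

-- Hypothetical helper: discard the error details.
runMaybe :: Parser a -> String -> Maybe a
runMaybe p = either (const Nothing) Just . parse p ""
```

Note that the keyword comes back exactly as it appeared in the input, so runMaybe createStatement "CrEaTe TABLE t;" keeps the mixed-case spelling.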
Repeating what I said in a comment, as it was apparently helpful:
The simple sledgehammer solution here is to simply map toLower over the entire input before running the parser, then do all your keyword matching in lowercase.
This presents obvious difficulties if you're parsing something that needs to be case-insensitive in some places and case-sensitive in others, or if you care about preserving case for cosmetic reasons. For example, although HTML tags are case-insensitive, converting an entire webpage to lowercase while parsing it would probably be undesirable. Even when compiling a case-insensitive programming language, converting identifiers could be annoying, as any resulting error messages would not match what the programmer wrote.
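The sledgehammer variant can be sketched in a few lines. createBody and parseLowercased are names made up for this example, and the parser only recognises a single CREATE statement.

```haskell
import Data.Char (toLower)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Keywords are matched in lowercase only.
createBody :: Parser String
createBody = string "create" *> manyTill anyChar (char ';')

-- Lowercase the entire input first, then parse.
parseLowercased :: String -> Maybe String
parseLowercased = either (const Nothing) Just . parse createBody "" . map toLower
```

The drawback described above is visible in the result: parseLowercased "CREATE TABLE Users;" returns the body as " table users", so the original capitalisation of the table name is lost.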
No, Parsec cannot do that in a clean way. string is implemented on top of the primitive tokens combinator, which is hard-coded to use the equality test (==). It's a bit simpler to parse a case-insensitive character, but you probably want more.
There is, however, a modern fork of Parsec called Megaparsec, which has built-in solutions for everything you may want:
λ> parseTest (char' 'a') "b"
parse error at line 1, column 1:
unexpected 'b'
expecting 'A' or 'a'
λ> parseTest (string' "foo") "Foo"
"Foo"
λ> parseTest (string' "foo") "FOO"
"FOO"
λ> parseTest (string' "foo") "fo!"
parse error at line 1, column 1:
unexpected "fo!"
expecting "foo"
Note the last error message: it's better than what you can get by parsing characters one by one (especially useful in your particular case). string' is implemented just like Parsec's string but uses case-insensitive comparison to compare characters. There are also oneOf' and noneOf', which may be helpful in some cases.
Disclosure: I'm one of the authors of Megaparsec.
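As a compiled counterpart to the GHCi session above, a minimal sketch might look like this. It assumes a reasonably recent Megaparsec in which string' lives in Text.Megaparsec.Char; the Parser alias and createKeyword are names chosen for this example.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe)
import Text.Megaparsec.Char (string')

-- No custom error component, String input (an assumption for this sketch).
type Parser = Parsec Void String

-- Matches "create" in any capitalisation and returns it as written.
createKeyword :: Parser String
createKeyword = string' "create"
```

parseMaybe requires the parser to consume the whole input, so parseMaybe createKeyword "CREATE" succeeds with the input's own spelling, while "creat!" is rejected.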
Instead of mapping the entire input with toLower, consider using caseString from Text.ParserCombinators.Parsec.Rfc2234 (from the hsemail package):
import Text.ParserCombinators.Parsec.Rfc2234 (caseString)
p_create_t :: GenParser Char st Statement
p_create_t = do
    x <- (caseString "create")
    xs <- manyTill anyChar (char ';')
    return $ CreateTable (x ++ xs) [] -- refine later
So now x will be whatever case-variant is present in the input without changing your input.
P.S. I know that this is an ancient question; I just thought I would add this, since the question came up while I was searching for a similar problem.
There is a package named parsec-extra for this purpose. Install the package, then use its caseInsensitiveString parser.
:m Text.Parsec
:m +Text.Parsec.Extra
*> parseTest (caseInsensitiveString "values") "vaLUES"
"values"
*> parseTest (caseInsensitiveString "values") "VAlues"
"values"
Link to package is here:
https://hackage.haskell.org/package/parsec-extra
