parseIdent :: Parser (String)
parseIdent = do
    x <- lookAhead $ try $ many1 (choice [alphaNum])
    void $ optional endOfLine <|> eof
    case x of
        "macro" -> fail "illegal"
        _ -> pure x
I'm trying to parse an alphanumeric string that only succeeds if it does not match a predetermined value (macro in this case).
However, the code above gives me this error:
*** Exception: Text.ParserCombinators.Parsec.Prim.many: combinator 'many' is applied to a parser that accepts an empty string.
This does not make sense to me: how does many1 (choice [alphaNum]) accept an empty string?
The error goes away if I remove the lookAhead $ try, but then it 'fails' with illegal:
...
*** Exception: (line 6, column 36):
unexpected " "
expecting letter or digit or new-line
illegal
Am I going about this correctly? Or is there another technique to implement a negative search?
You almost have it:
import Text.Parsec
import Text.Parsec.Char
import Text.Parsec.String
import Control.Monad

parseIdent :: Parser (String)
parseIdent = try $ do
    x <- many1 alphaNum
    void $ optional endOfLine <|> eof
    case x of
        "macro" -> fail "illegal"
        _ -> pure x
So, why didn't your code work?
The try is in the wrong spot. The backtracking you actually need happens after you have read the alphanumeric word and checked that it isn't "macro".
lookAhead has no business here. If you end up with the word you wanted, you do want it consumed from the input, and try already takes care of resetting the input stream to its previous state on failure. With lookAhead in place, the parser can succeed without consuming any input, which is what the many exception complains about.
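On the question of another technique for a negative match: Parsec's notFollowedBy can reject the reserved word up front, before the identifier is read. The following is a minimal sketch rather than the method used above; the name ident is made up for illustration, and end-of-line handling is left out.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical parser: an alphanumeric identifier that is not the
-- reserved word "macro". Longer words such as "macros" stay legal,
-- because the inner notFollowedBy requires the word to end there.
ident :: Parser String
ident = do
    notFollowedBy (try (string "macro" *> notFollowedBy alphaNum))
    many1 alphaNum
```

With this definition, parse ident "" "foo" and parse ident "" "macros" should both succeed, while parse ident "" "macro" is rejected without consuming input.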
Related
I'm learning Parsec. I've got this code:
import Text.Parsec.String (Parser)
import Control.Applicative hiding ((<|>))
import Text.ParserCombinators.Parsec hiding (many)
inBracketsP :: Parser [String]
inBracketsP = (many $ between (char '[') (char ']') (many $ char '.')) <* eof
main :: IO ()
main = putStr $ show $ parse inBracketsP "" "[...][..."
The result is
Left (line 1, column 10):
unexpected end of input
expecting "." or "]"
This message is not useful (adding . won't fix the problem). I'd expect something like ']' expected (only ] fixes the problem).
Is it possible to achieve that easily with Parsec? I've seen the SO question Parsec: error message at specific location, which is inspiring, but I'd prefer to stick to the between combinator, without manual lookahead or other overengineering (kind of), if possible.
You can hide a terminal from being displayed in the expected input list by attaching an empty label to it (parser <?> ""):
inBracketsP :: Parser [String]
inBracketsP = (many $ between (char '[') (char ']') (many $ (char '.' <?> ""))) <* eof
-- >>> main
-- Left (line 1, column 10):
-- unexpected end of input
-- expecting "]"
In megaparsec, there is also a hidden combinator that achieves the same effect.
I'm making a parser with Parsec and I'm trying to return a specific error during parsing.
This is a minimal parser example that exposes my problem:
parseA = try seq1
     <|> seq2

seq1 = do
    manyTill anyChar (try $ string "\n* ")
    many1 anyChar
    fail "My error message"

seq2 = do
    manyTill anyChar (try $ string "\n- ")
    many1 anyChar
I would like to perform some tests in the first try $ do sequence and stop the parsing and return a specific error message.
When I don't use fail I get :
ghci> parse parseA "" "aaaaaa\nbbbb\n* ccccc\n- ddd"
Right "ccccc\n- ddd"
When I use fail or unexpected, my parser doesn't stop (because of the try function) and executes the next do sequence:
ghci> parse parseA "" "aaaaaa\nbbbb\n* ccccc\n- ddd"
Right "ddd"
And it's not what I want!
I considered using the basic error function to stop the execution of my parser but I would like to have a "clean" error returned by the parsing function like this:
ghci> parse parseA "" "aaaaaa\nbbbb\n* ccccc\n- ddd"
Left "My error message"
Do you know how to properly stop a parser and return a custom error?
If you want the monad to behave differently then perhaps you should build a different monad. (N.B. I'm not entirely clear what you want, but moving forward anyway).
Solution: Use a Monad Transformer Stack
For example, to get a fail-like function that isn't caught and ignored by Parsec's try you could use an Except monad. Except allows you to throw errors much like exceptions but they are plumbed monadically instead of using the actual exception mechanism which demands IO to catch it.
First, let's define our monad:
import Text.Parsec
import Text.Parsec.Combinator
import Text.Parsec.Char
import Control.Monad.Trans.Except
import Control.Monad.Trans
type EscParse a = ParsecT String () (Except String) a
So the monad is EscParse and combines features of Parsec (via the transformer ParsecT) and Except.
Second, let's define some helpers:
run :: EscParse a -> SourceName -> String -> Either String (Either ParseError a)
run op sn input = runExcept (runPT op () sn input)
escFail :: String -> EscParse a
escFail = lift . throwE
Our run is like runParser but also runs the Except monad. You might want to do something to avoid the nested Either, but that's an easy cosmetic change. escFail is what you'd use if you don't want the error to be ignored.
Third, we need to implement your parser using this new monad:
parseA :: EscParse String
parseA = try seq1 <|> seq2
seq1 :: EscParse String
seq1 = do manyTill anyChar (try $ string "\n* ")
          many1 anyChar
          escFail "My error message"
seq2 :: EscParse String
seq2 = do manyTill anyChar (try $ string "\n- ")
          many1 anyChar
Other than spacing and type signature, the above matches what you had but using escFail instead of fail.
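As for the nested Either, one possible way to flatten it is sketched below. runFlat is a hypothetical helper, and rendering the ParseError with show is an arbitrary choice made for the example; lift is imported here from Control.Monad.Trans.Class in transformers.

```haskell
import Text.Parsec
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (Except, runExcept, throwE)

type EscParse a = ParsecT String () (Except String) a

-- Throws in the underlying Except monad, so Parsec's try cannot swallow it.
escFail :: String -> EscParse a
escFail = lift . throwE

-- Hypothetical helper: collapse both failure layers into one Left.
runFlat :: EscParse a -> SourceName -> String -> Either String a
runFlat op sn input =
    case runExcept (runPT op () sn input) of
        Left msg          -> Left msg          -- raised by escFail
        Right (Left perr) -> Left (show perr)  -- ordinary Parsec error
        Right (Right a)   -> Right a

seq1, seq2 :: EscParse String
seq1 = do
    _ <- manyTill anyChar (try $ string "\n* ")
    _ <- many1 anyChar
    escFail "My error message"
seq2 = do
    _ <- manyTill anyChar (try $ string "\n- ")
    many1 anyChar

parseA :: EscParse String
parseA = try seq1 <|> seq2
```

On the input from the question, runFlat parseA "" "aaaaaa\nbbbb\n* ccccc\n- ddd" should give Left "My error message", while an input with only a "- " item still falls through to seq2.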
I wanted to replace sed and awk with Parsec. For example, extract number from strings like unknown structure but containing the number 42 and maybe some other stuff.
I run into "unexpected end of input". I'm looking for equivalent of non-greedy .*([0-9]+).*.
module Main where
import Text.Parsec
parser :: Parsec String () Int
parser = do
    _ <- many anyToken
    x <- read <$> many1 digit
    _ <- many anyToken
    return x
main :: IO ()
main = interact (show . parse parser "STDIN")
This can be easily done with my library regex-applicative. It gives you both the combinator interface and the features of regular expressions that you seem to want.
Here's a working version that's closest to your example:
{-# LANGUAGE ApplicativeDo #-}
import Text.Regex.Applicative
import Text.Regex.Applicative.Common (decimal)
parser :: RE Char Int
parser = do
    _ <- few anySym
    x <- decimal
    _ <- many anySym
    return x
main :: IO ()
main = interact (show . match parser)
Here's an even shorter version, using findFirstInfix:
import Text.Regex.Applicative
import Text.Regex.Applicative.Common (decimal)
main :: IO ()
main = interact (show . fmap snd3 . findFirstInfix decimal)
  where snd3 (_, r, _) = r
If you want to perform actual tokenization (e.g. skip 93 in foo93bar), then take a look at lexer-applicative, a tokenizer based on regex-applicative.
Replacing sed and awk with parsers is what the replace-megaparsec library is all about.
Extract numbers from unstructured strings with the sepCap parser combinator.
import Data.Void (Void)
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char.Lexer
parseTest (sepCap (decimal :: Parsec Void String Int))
    $ "unknown structure but containing the number 42 and maybe some other stuff"
[ Left "unknown structure but containing the number "
, Right 42
, Left " and maybe some other stuff"
]
This cannot work, since anyToken accepts and consumes, as its name says, any token, including digits. And you apply it many times. Therefore the attempt to read digits with the second parser must fail: there simply cannot be any tokens left.
Instead make your first parser accept any character, that is not a digit (using isDigit from module Data.Char):
parser :: Parsec String () Int
parser = do
    _ <- many $ satisfy (not . isDigit)
    x <- read <$> many1 digit
    _ <- many anyToken
    return x
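If you would rather mirror the non-greedy prefix of the regex directly, manyTill combined with lookAhead stops skipping at the first digit without consuming it. This is only a sketch of that alternative; firstNumber is an illustrative name, not something from the question.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Skip characters until a digit is next, read the digits,
-- then discard whatever remains of the input.
firstNumber :: Parser Int
firstNumber = do
    _ <- manyTill anyChar (lookAhead digit)
    n <- read <$> many1 digit
    _ <- many anyChar
    return n
```

Applied to the sample sentence from the question, parse firstNumber "" should return Right 42; input with no digits at all is a parse error.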
I have this Parsec parser:
a = optionMaybe $ do {try $ spaceornull *> string "hello";string "No"}
Where spaceornull is ((:[]) <$> try space) <|> string ""
When I test a with input " " I get:
Left (line 1, column 2):
unexpected end of input
expecting "hello"
I don't understand this. spaceornull *> string "hello" should fail because there is no "hello"; then, with try, Parsec backtracks, so no input is consumed, but the try still fails. The parser passed to optionMaybe (the one inside do) therefore fails altogether and shouldn't try to consume further input. We end up with a failed parser that consumed no input, so I should get Right Nothing.
But the error message says the space was consumed, so try didn't really backtrack. Does try not backtrack when part of the parser succeeds? And how can I make it backtrack in the case above?
try has nothing to do with whether failure is allowed or not. It merely makes it possible to backtrack in case of failure†, but to commence the backtracking you need to provide an alternative parser to start at that point. The usual way to do that is with the <|> operator:
a = optionMaybe $ (try $ spaceornull *> string "hello") <|> string "No"
OTOH, your code is equivalent to
a = optionMaybe $ (try $ spaceornull *> string "hello") >> string "No"
where the monadic chaining operator >> (same as *>) will in the case of parsec check if the LHS succeeds, then go on and also run the RHS parser. So it must be, because you could also write:
a = optionMaybe $ do
    s <- try $ spaceornull *> string "hello"
    string $ "No"++s
Here I've used the result of the first parser (which you simply threw away, by not binding it to any variable with <-) in deciding what the second one should look for. This clearly is only possible if the first actually succeeded!
†Basically, <|> only works if either the LHS immediately fails right at the first character, or if you set a backtracking point with try. The reason this is required is that it would be very inefficient if Parsec needed to leave a backtracking point before each and every alternative that needs to be checked.
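The footnote can be seen in a two-alternative toy parser. This is a small sketch with made-up names (noBacktrack, withBacktrack), relying on the fact that Parsec's string consumes the characters it matched before it fails.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Without try: on input "ac", string "ab" consumes the 'a' before
-- failing on 'c', so <|> never runs the second alternative.
noBacktrack :: Parser String
noBacktrack = string "ab" <|> string "ac"

-- With try: the failed first alternative is rewound to the start,
-- so <|> can go on to string "ac".
withBacktrack :: Parser String
withBacktrack = try (string "ab") <|> string "ac"
```

On the input "ac", parse noBacktrack "" reports an error, whereas parse withBacktrack "" succeeds with "ac".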
I'm writing my first program with Parsec. I want to parse MySQL schema dumps and would like to come up with a nice way to parse strings representing certain keywords in case-insensitive fashion. Here is some code showing the approach I'm using to parse "CREATE" or "create". Is there a better way to do this? An answer that doesn't resort to buildExpressionParser would be best. I'm taking baby steps here.
p_create_t :: GenParser Char st Statement
p_create_t = do
    x <- (string "CREATE" <|> string "create")
    xs <- manyTill anyChar (char ';')
    return $ CreateTable (x ++ xs) [] -- refine later
You can build the case-insensitive parser out of character parsers.
import Data.Char (toLower, toUpper)
-- Match the lowercase or uppercase form of 'c'
caseInsensitiveChar c = char (toLower c) <|> char (toUpper c)
-- Match the string 's', accepting either lowercase or uppercase form of each character
caseInsensitiveString s = try (mapM caseInsensitiveChar s) <?> "\"" ++ s ++ "\""
Repeating what I said in a comment, as it was apparently helpful:
The simple sledgehammer solution here is to simply map toLower over the entire input before running the parser, then do all your keyword matching in lowercase.
This presents obvious difficulties if you're parsing something that needs to be case-insensitive in some places and case-sensitive in others, or if you care about preserving case for cosmetic reasons. For example, although HTML tags are case-insensitive, converting an entire webpage to lowercase while parsing it would probably be undesirable. Even when compiling a case-insensitive programming language, converting identifiers could be annoying, as any resulting error messages would not match what the programmer wrote.
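The sledgehammer approach and its main drawback can be sketched as follows. createStmt and parseLowered are illustrative names rather than anything from the question, and the parser returns a plain String instead of the question's Statement type to keep the example self-contained.

```haskell
import Data.Char (toLower)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Keywords only need to be matched in lowercase,
-- because the input is lowercased before parsing.
createStmt :: Parser String
createStmt = do
    kw   <- string "create"
    rest <- manyTill anyChar (char ';')
    return (kw ++ rest)

-- Lowercase the whole input, then parse it.
parseLowered :: String -> Either ParseError String
parseLowered = parse createStmt "" . map toLower
```

parseLowered "CREATE TABLE Foo;" accepts the statement but returns "create table foo", which shows the cost: the original casing of the table name is gone.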
No, Parsec cannot do that in a clean way. string is implemented on top of
the primitive tokens combinator, which is hard-coded to use the equality test
(==). It's a bit simpler to parse a case-insensitive character, but you
probably want more.
There is however a modern fork of Parsec, called
Megaparsec which has
built-in solutions for everything you may want:
λ> parseTest (char' 'a') "b"
parse error at line 1, column 1:
unexpected 'b'
expecting 'A' or 'a'
λ> parseTest (string' "foo") "Foo"
"Foo"
λ> parseTest (string' "foo") "FOO"
"FOO"
λ> parseTest (string' "foo") "fo!"
parse error at line 1, column 1:
unexpected "fo!"
expecting "foo"
Note the last error message, it's better than what you can get parsing
characters one by one (especially useful in your particular case). string'
is implemented just like Parsec's string but uses case-insensitive
comparison to compare characters. There are also oneOf' and noneOf' that
may be helpful in some cases.
Disclosure: I'm one of the authors of Megaparsec.
Instead of mapping the entire input with toLower, consider using caseString from Text.ParserCombinators.Parsec.Rfc2234 (from the hsemail package).
p_create_t :: GenParser Char st Statement
p_create_t = do
    x <- (caseString "create")
    xs <- manyTill anyChar (char ';')
    return $ CreateTable (x ++ xs) [] -- refine later
So now x will be whatever case-variant is present in the input without changing your input.
PS: I know that this is an ancient question. I just thought I would add this, as the question came up while I was searching for a similar problem.
There is a package named parsec-extra for this purpose. You need to install the package and then use its caseInsensitiveString parser.
:m Text.Parsec
:m +Text.Parsec.Extra
*> parseTest (caseInsensitiveString "values") "vaLUES"
"values"
*> parseTest (caseInsensitiveString "values") "VAlues"
"values"
Link to package is here:
https://hackage.haskell.org/package/parsec-extra