What is the difference between "try" and "lookAhead" functions in parsec?
The combinators try and lookAhead are similar in that they both let Parsec "rewind", but they apply in different circumstances. In particular, try rewinds failure while lookAhead rewinds success.
By the documentation, try "pretends that it hasn't consumed any input when an error occurs" while lookAhead p "parses p without consuming any input", but "if p fails and consumes some input, so does lookAhead".
So if you think of a parser running as walking along some streaming state and either failing or succeeding, which we might write in Haskell terms as
type Parser a = [Tokens] -> (Either Error a, [Tokens])
then try ensures that if (try p) input ---> (Left err, output) then input == output and lookAhead has it such that (lookAhead p) input ---> (Right a, output) then input == output, but if (lookAhead p) input ---> (Left err, output) then they may be allowed to differ.
We can see this in action by looking at the code for Parsec directly, which is somewhat more complex than my notion of Parser above. First we examine ParsecT
newtype ParsecT s u m a
= ParsecT {unParser :: forall b .
State s u
-> (a -> State s u -> ParseError -> m b) -- consumed ok
-> (ParseError -> m b) -- consumed err
-> (a -> State s u -> ParseError -> m b) -- empty ok
-> (ParseError -> m b) -- empty err
-> m b
}
ParsecT is a continuation-based datatype. If you look at how one of them is constructed
ParsecT $ \s cok cerr eok eerr -> ...
You'll see how we have access to the State s u, s, and four functions which determine how we move forward. For instance, the fail clause of ParsecT's Monad instance uses the eerr option, constructing a ParseError from the current input position and the passed error message.
parserFail :: String -> ParsecT s u m a
parserFail msg
= ParsecT $ \s _ _ _ eerr ->
eerr $ newErrorMessage (Message msg) (statePos s)
While the most primitive successful token parse (tokenPrim) uses a complex sequence of events eventually culminating in calling cok with an updated State s u.
With this intuition, the source for try is particularly simple.
try :: ParsecT s u m a -> ParsecT s u m a
try p =
ParsecT $ \s cok _ eok eerr ->
unParser p s cok eerr eok eerr
It simply builds a new ParsecT based on the one passed to try, but with the "empty err" continuation in place of the consumed error. Whatever parsing combinator next sees try p will be unable to access its actual "consumed err" continuation and thus try is protected from changing its state on errors.
But lookAhead is more sophisticated
lookAhead :: (Stream s m t) => ParsecT s u m a -> ParsecT s u m a
lookAhead p = do{ state <- getParserState
; x <- p'
; setParserState state
; return x
}
where
p' = ParsecT $ \s cok cerr eok eerr ->
unParser p s eok cerr eok eerr
Examining just the where-clause we see it depends on modifying the passed parser p to use the "empty ok" continuation in place of the "consumed ok" continuation. This is symmetric to what try did. Further, it ensures that the parser state is unaffected by whatever happens when this modified p' is run via its do-block.
Related
I need a combinator like p1 << p2, but p2 should run only if p1 has succeeded and consumed some input.
If p1 succeeded without consuming input, p2 should not run.
If p1 failed, then p2 is also ignored?
Overall result is r1's result
Parsec primitives make an internal distinction between a parser that succeeds after consuming some input and a parser that succeeds after consuming no input which you should be able to leverage. In particular, the following ought to work to parse p and then -- conditioned on p successfully consuming input -- parse q and discard its results:
ifConsumed :: Monad m => ParsecT s u m a -> ParsecT s u m b -> ParsecT s u m a
ifConsumed p q = mkPT k
where -- k :: State s u -> m (Consumed (m (Reply s u a)))
k s = do cons <- runParsecT p s
case cons of
Consumed mrep -> do
rep <- mrep
case rep of
Ok x s' err -> runParsecT (fmap (const x) q) s'
Error err -> return . Consumed . return $ Error err
Empty mrep -> do
rep <- mrep
case rep of
Ok x s' err -> return . Empty . return $ Ok x s' err
Error err -> return . Empty . return $ Error err
It's ugly because Parsec doesn't directly expose the ParsecT constructor, so you have to use the mkPt and runParsecT intermediaries which add a lot of boilerplate.
In a nutshell, it runs the p parser. If this succeeds with input consumed (the Consumed -> Ok branch), it runs the q parser modified via fmap to return the value parsed by p. On the other hand, if p succeeds with no input consumed (the Empty -> Ok branch), it simply returns success without running the q parser.
The only caveat is that I'm not 100% sure how, within the Parsec library itself, the invariant is preserved whereby the Consumed -> Ok branch only gets called when input has been consumed, so I don't know if this is truly reliable. You'll want to test it carefully in your particular use case.
For the following parser --- which parses a list of one or more elements separated commas where each element consists of zero or more digits, then two exclamation marks only if the previous parser consumed some input, then a semicolon --- it seems to work:
p :: Parser [String]
p = ifConsumed (sepBy1 (many digit) (char ',')) (char '!' >> char '!') <* char ';'
runp :: String -> Either ParseError [String]
runp = parse p ""
Some tests:
runp "" -- fails, expecting semicolon
runp ";" -- returns [""]
runp "!!;" -- fails, "!!" w/ no preceding content
runp ",;" -- fails, missing "!!"
runp ",!!;" -- returns ["",""]
runp ",!;" -- fails, expecting second "!"
runp ",1,23;" -- fails, missing "!!"
runp ",1,23!!;" -- returns ["","1","23"]
With a naive parser implementation, you should be able to do this:
(<<) p1 p2 = P $ \inp -> case parse p1 inp of
ErrorResult e -> ErrorResult e
SuccessResult (rem, res) -> if rem == inp
then SuccessResult (rem, res)
else parse p2 rem
Though Parsec is more advanced, you could probably roll your own there as well.
I don't think you can do that for arbitrary parsers p1 and p2: you need them to communicate somehow. If you could do this, it seems to me that you would break referential transparency.
For example, consider parsing the input string repeat 'x': whether p1 consumes a character or not, p2 will see the string as an endless sea of x characters. If it hasn't communicated with p1 somehow (eg by modifying something in the parser's state), then you can't know whether a character was consumed; if your combinator were somehow able to treat these two cases differently, it would be breaking the rules.
I am writing interpreter in Haskell and I have map: name_of_function -> defintion of function.
Map String Defun
My interpreter monad:
type InterpreterMonad = StateT (EnvFun, (Stack EnvEval)) (ErrorT String IO ) ()
And as you can see "last" monad is ErroT.
Now, when I would like handle calling functions:
handleFCall :: Stmt -> InterpreterMonad
handleFCall (VarName name) args = get >>= \(envFun, contextStack) -> case (Map.lookup (VarName name) envFun) of
Nothing -> throwError "Err"
(Just x) -> DoSthOther
And as you can see I have to use case. However I am using >>= so I would like to avoid case of here. But, Map.lookup return Nothing on fail and I would like to add my error message.
I have no experience in Haskell so I don't know how to deal with it . Morever, every criticism to my code is welcome.
Cheers.
There's nothing wrong with using a case statement, although reformatted with do notation your function would look more like
handleFCall (VarName name) args = do
(envFun, contextStack) <- get
case Map.lookup (VarName name) envFun of
Nothing -> throwError "Err"
Just x -> doSthOther x
Which would be more familiar to other Haskell programmers. However, if you really want to avoid using a case, just use the maybe function:
handleFCall (VarName name) args = do
(envFun, contextStack) <- get
maybe (throwError "Err") doSthOther $ Map.lookup (VarName name) envFun
Or with >>=
handleFCall (VarName name) args
= get >>=
(\(envFun, contextStack) ->
maybe (throwError "Err") doSthOther
$ Map.lookup (VarName name) envFun
)
But personally I find the case statement more readable.
The errors package has the following operations in Control.Error.Util (and reexported from Control.Error):
-- You can turn `Maybe` into `Either` by supplying a constant to use
-- for the `Left` case.
note :: a -> Maybe b -> Either a b
-- Ditto for `MaybeT` and `ExceptT`
noteT :: Monad m => a -> MaybeT m b -> ExceptT a m b
-- And there are useful shortcuts for other combinations
failWith :: Applicative m => e -> Maybe a -> ExceptT e m a
failWith in particular looks like it would fit quite nicely here, something like this expression would do the trick.
failWith "Err" (Map.lookup (VarName name) envFun)
Note that the ErrorT transformer that you're using in your example has been deprecated in favor of ExceptT (as the signatures above are using), so you'd have to change your code accordingly to use the errors package. On the other hand these functions are simple enough that you can just bake your own if you so prefer.
Working in a Coding Dojo today I tried the following
example :: IO ()
example = do input <- getLine
parsed <- parseOnly parser input
...
where parseOnly :: Parser a -> Either String a (from attoparsec) of course the compiler complained that Either .. is not IO .. essentially telling me I am mixing monads.
Of course this can be solved by
case parseOnly parser input of .. -> ..
which is kind of unelegant, I think. Also my guess is that somebody else had this problem earlier and the solution I think is related to monad transformers, but the last bits I cannot piece together.
It also reminded me of liftIO - but this is the other way around I think which solves the problem of lifting an IO action happening inside some surrounding monad (more precisely MonadIO - say for example inside Snap when one wants to print something to stdout while getting some http).
More general this problem seems to be for a Monad m1 and a (different) Monad m2 how can I do something like
example = do a <- m1Action
b <- m2Action
..
You can't, in general. The whole do block has to be one specific monad (because example needs to have some specific type). If you could bind an arbitrary other monad inside that do block, you'd have unsafePerformIO.
Monad transformers allow you to produce one monad combining the kinds of things multiple other monads can do. But you have to decide that all the actions in your do block use the same monad transformer stack to use those, they're not a way to arbitrarily switch monads mid do-block.
Your solution with case only works because you've got a particular known monad (Either) that has a way of extracting values from inside it. Not all monads provide this, so it's impossible to build a general solution without knowing the particular monads involved. That's why the do block syntax doesn't provide such a shortcut.
In general, monad transformers are for this kind of interleaving. You can use ExceptT
example :: IO (Either String ())
example = runExceptT $ do
input <- liftIO getLine
parsed <- parseOnly parser input
...
Note that parseOnly must return ExceptT String IO a for some a. Or better ExceptT String m a for any m. Or if you want parseOnly to return Either String a it's
example :: IO (Either String ())
example = runExceptT $ do
input <- lift getLine
parsed <- ExceptT $ return $ parseOnly parser input
...
But I think all you need is just
eitherToIO :: Either String a -> IO a
eitherToIO (Left s) = error s
eitherToIO (Right x) = return x
parseOnly :: ... -> String -> Either String Int
example :: IO ()
example = do
input <- getLine
parsed <- eitherToIO $ parseOnly parser input
...
You need to make that expression type check; just as in pure code. Here,
... = do a <- act1 -- m1 monad
b <- act2 -- m2 monad
...
de-sugars to:
... = act1 >>= (\a -> act2 >>= \b -> ...)
>>= is of signature:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
the outer bind is specialized with m1, so it expects that the expression inside parenthesis be of type: a -> m1 b, whereas the inner bind is specialized with m2, so the expression inside parenthesis will be of type a -> m2 b:
-- outer bind expects ( \a -> m1 b )
act1 >>= (\a -> act2 >>= \b -> ...)
-- inner bind results ( \a -> m2 b )
for this to type check, you need a function of signature m2 b -> m1 b in between the two; that is what lift does for a certain class of m2 and m1 monads: namely m1 ~ t m2 where t is an instance of MonadTrans:
lift :: (Monad m, MonadTrans t) => m a -> t m a
Using Parsec how does one indicate an error at a specific position if a semantic rule is violated. I know typically we don't want to do such things, but consider the example grammar.
<foo> ::= <bar> | ...
<bar> ::= a positive integer power of two
The <bar> rule is a finite set (my example is arbitrary), and a pure approach to the above could be a careful application of the choice combinator, but this might be impractical in space and time. In recursive descent or toolkit-generated parsers the standard trick is to parse an integer (a more relaxed grammar) and then semantically check the harder constraints. For Parsec, I could use a natural parser and check the result calling fail when that doesn't match or unexpected or whatever. But if we do that, the default error location is the wrong one. Somehow I need to raise the error at the earlier state.
I tried a brute force solution and wrote a combinator that uses getPosition and setPosition as illustrated by this very similar question. Of course, I was also unsuccessful (the error location is, of course wrong). I've run into this pattern many times. I am kind of looking for this type of combinator:
withPredicate :: (a -> Bool) -> String -> P a -> P a
withPredicate pred lbl p = do
ok <- lookAhead $ fmap pred (try p) <|> return False -- peek ahead
if ok then p -- consume the input if the value passed the predicate
else fail lbl -- otherwise raise the error at the *start* of this token
pPowerOfTwo = withPredicate isPowerOfTwo "power of two" natural
where isPowerOfTwo = (`elem` [2^i | i<-[1..20]])
The above does not work. (I tried variants on this as well.) Somehow the parser backtracks a says it's expecting a digit. I assume it's returning the error that made it the furthest. Even {get,set}ParserState fails erase that memory.
Am I handling this syntactic pattern wrong? How would all you Parsec users approach these type of problems?
Thanks!
I think both your ideas are OK. The other two answers deal with Parsec, but I'd like to note that in both
cases Megaparsec just does the right thing:
{-# LANGUAGE TypeApplications #-}
module Main (main) where
import Control.Monad
import Data.Void
import Text.Megaparsec
import qualified Text.Megaparsec.Char.Lexer as L
type Parser = Parsec Void String
withPredicate1 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate1 f msg p = do
r <- lookAhead p
if f r
then p
else fail msg
withPredicate2 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate2 f msg p = do
mpos <- getNextTokenPosition -- †
r <- p
if f r
then return r
else do
forM_ mpos setPosition
fail msg
main :: IO ()
main = do
let msg = "I only like numbers greater than 42!"
parseTest' (withPredicate1 #Integer (> 42) msg L.decimal) "11"
parseTest' (withPredicate2 #Integer (> 42) msg L.decimal) "22"
If I run it:
The next big Haskell project is about to start!
λ> :main
1:1:
|
1 | 11
| ^
I only like numbers greater than 42!
1:1:
|
1 | 22
| ^
I only like numbers greater than 42!
λ>
Try it for yourself! Works as expected.
† getNextTokenPosition is more correct than getPosition for streams where tokens contain position of their beginning and end in themselves. This may or may not be important in your case.
It's not a solution I like, but you can hypnotize Parsec into believing it's had a single failure with consumption:
failAt pos msg = mkPT (\_ -> return (Consumed (return $ Error $ newErrorMessage (Expect msg) pos)))
Here's a complete example:
import Control.Monad
import Text.Parsec
import Text.Parsec.Char
import Text.Parsec.Error
import Text.Parsec.Prim
import Debug.Trace
failAt pos msg = mkPT (\_ -> return (Consumed (return $ Error $ newErrorMessage (Expect msg) pos)))
type P a = Parsec String () a
withPredicate :: (a -> Bool) -> String -> P a -> P a
withPredicate pred msg p = do
pos <- getPosition
x <- p
unless (pred x) $ failAt pos msg
return x
natural = read <$> many1 digit
pPowerOfTwo = withPredicate isPowerOfTwo "power of two" natural
where isPowerOfTwo = (`elem` [2^i | i<-[1..20]])
main = print $ runParser pPowerOfTwo () "myinput" "4095"
When run, it results in:
Left "myinput" (line 1, column 1):
expecting power of two
I think the problem stems from how Parsec picks the "best error" in the non-deterministic setting. See Text.Parsec.Error.mergeError. Specifically, this selects the longest match when choosing which error is the error to report. I think we need some way to make Parsec order errors differently, which may be too obscure for us solving this problem.
In my case, I here's how I worked around the problem:
I solved stacked an Exception monad within my ParsecT type.
type P m = P.ParsecT String ParSt (ExceptT Diagnostic m)
Then I introduced a pair of combinators:
(Note: Loc is my internal location type)
-- stops hard on an error (no backtracking)
-- which is why I say "semantic" instead of "syntax" error
throwSemanticError :: (MonadTrans t, Monad m) => Loc -> String -> t (ExceptT Diagnostic m) a
throwSemanticError loc msg = throwSemanticErrorDiag $! Diagnostic loc msg
withLoc :: Monad m => (Loc -> P m a) -> P m a
withLoc pa = getLoc >>= pa
Now in parsing I can write:
parsePrimeNumber = withLoc $ \loc ->
i <- parseInt
unless (isPrime i) $ throwSemanticError loc "number is not prime!"
return i
The top level interface to run one of these monads is really nasty.
runP :: Monad m
=> ParseOpts
-> P m a
-> String
-> m (ParseResult a)
runP pos pma inp =
case runExceptT (P.runParserT pma (initPSt pos) "" inp) of
mea -> do
ea <- mea
case ea of
-- semantic error (throwSemanticError)
Left err -> return $! PError err
-- regular parse error
Right (Left err) -> return $ PError (errToDiag err)
-- success
Right (Right a) -> return (PSuccess a [])
I'm not terribly happy with this solution and desire something better.
I wish parsec had a:
semanticCheck :: (a -> Parsec Bool) -> Parsec a -> Parsec a
semanticCheck pred p =
a <- p
z <- pred a
unless z $
... somehow raise the error from the beginning of this token/parse
rather than the end ... and when propagating the error up,
use the end parse position, so this parse error beats out other
failed parsers that make it past the beginning of this token
(but not to the end)
return a
Using lookAhead, we can run a parser without consuming any input or registering any new errors, but record the state that we end up in. We can then apply a guard to the result of the parser. The guard can fail in whatever manner it desires if the value does not pass the semantic check. If the guard fails, then the error is located at the initial position. If the guard succeeds, we reset the parser to the recorded state, avoiding the need to re-execute p.
guardP :: Stream s m t => (a -> ParsecT s u m ()) -> ParsecT s u m a -> ParsecT s u m a
guardP guard p = do
(a, s) <- try . lookAhead $ do
a <- p
s <- getParserState
return (a, s)
guard a
setParserState s
return a
We can now implement pPowerOfTwo:
pPowerOfTwo :: Stream s m Char => ParsecT s u m Integer
pPowerOfTwo = guardP guardPowerOfTwo natural <?> "power of two"
where guardPowerOfTwo s = unless (s `elem` [2^i | i <- [1..20]]) . unexpected $ show s
I'm getting started with Netwire version 5.
I have no problem writing all the wires I want to transform my inputs into my outputs.
Now the time has come to write the IO wrapper to tie in my real-world inputs, and I am a bit confused.
Am I supposed to create a custom session type for the s parameter of Wire s e m a b and embed my sensor values in there?
If so, I have these questions:
What's up with the Monoid s context of class (Monoid s, Real t) => HasTime t s | s -> t? What is it used for?
I was thinking of tacking on a Map String Double with my sensor readings, but how should my monoid crunch the dictionaries? Should it be left-biased? Right-biased? None of the above?
If not, what am I supposed to do? I want to end up with wires of the form Wire s InhibitionReason Identity () Double for some s, representing my input.
It's my understanding that I don't want or need to use the monadic m parameter of Wire for this purpose, allowing the wires themselves to be pure and confining the IO to the code that steps through the top-level wire(s). Is this incorrect?
The simplest way to put data into a Wire s e m a b is via the input a. It's possible, through the use of WPure or WGen to get data out of the state delta s or the underlying Monad m, but these take us further away from the main abstractions. The main abstractions are Arrow and Category, which only know about a b, and not about s e m.
Here's an example of a very simple program, providing input as the input a. double is the outermost wire of the program. repl is a small read-eval-print loop that calls stepWire to run the wire.
import FRP.Netwire
import Control.Wire.Core
import Prelude hiding (id, (.))
double :: Arrow a => a [x] [x]
double = arr (\xs -> xs ++ xs)
repl :: Wire (Timed Int ()) e IO String String -> IO ()
repl w = do
a <- getLine
(eb, w') <- stepWire w (Timed 1 ()) (Right a)
putStrLn . either (const "Inhibited") id $ eb
repl w'
main = repl double
Notice that we pass in the time difference to stepWire, not the total elapsed time. We can check that this is the correct thing to do by running a different top-level wire.
timeString :: (HasTime t s, Show t, Monad m) => Wire s e m a String
timeString = arr show . time
main = repl timeString
Which has the desired output:
a
1
b
2
c
3
I just solved this in an Arrow way, so this might be more composible. You can read my posts if you like. Kleisli Arrow in Netwire 5? and Console interactivity in Netwire?. The second post has a complete interactive program
First, you need this to lift Kleisli functions (That is, anything a -> m b):
mkKleisli :: (Monad m, Monoid e) => (a -> m b) -> Wire s e m a b
mkKleisli f = mkGen_ $ \a -> liftM Right $ f a
Then, assuming you want to get characters from terminal, you can lift hGetChar by doing this:
inputWire :: Wire s () IO () Char
inputWire = mkKleisli $ \_ -> hGetChar stdin
I haven't tested this runWire function (I just stripped code off from my previous posts), but it should run your wires:
runWire :: (Monad m) => Session m s -> Wire s e m () () -> m ()
runWire s w = do
(ds, s') <- stepSession s
-- | You don't really care about the () returned
(_, w') <- stepWire w ds (Right ())
runWire s' w'
You can compose the input wire wherever you like like any other Wires or Arrows. In my example, I did this (don't just copy, other parts of the program are different):
mainWire = proc _ -> do
c <- inputWire -< ()
q <- quitWire -< c
outputWire -< c
returnA -< q
Or, one-liner:
mainWire = inputWire >>> (quitWire &&& outputWire) >>> arr (\(q,_) -> q)