Parsec: error message at specific location - haskell

Using Parsec, how does one indicate an error at a specific position if a semantic rule is violated? I know typically we don't want to do such things, but consider the example grammar.
<foo> ::= <bar> | ...
<bar> ::= a positive integer power of two
The <bar> rule is a finite set (my example is arbitrary), and a pure approach to the above could be a careful application of the choice combinator, but this might be impractical in space and time. In recursive descent or toolkit-generated parsers the standard trick is to parse an integer (a more relaxed grammar) and then semantically check the harder constraints. For Parsec, I could use a natural parser and check the result calling fail when that doesn't match or unexpected or whatever. But if we do that, the default error location is the wrong one. Somehow I need to raise the error at the earlier state.
I tried a brute-force solution and wrote a combinator that uses getPosition and setPosition, as illustrated by this very similar question. That was also unsuccessful (the error location is still wrong). I've run into this pattern many times. I am looking for this type of combinator:
withPredicate :: (a -> Bool) -> String -> P a -> P a
withPredicate pred lbl p = do
  ok <- lookAhead $ fmap pred (try p) <|> return False -- peek ahead
  if ok then p         -- consume the input if the value passed the predicate
        else fail lbl  -- otherwise raise the error at the *start* of this token
pPowerOfTwo = withPredicate isPowerOfTwo "power of two" natural
  where isPowerOfTwo = (`elem` [2^i | i<-[1..20]])
The above does not work. (I tried variants on this as well.) Somehow the parser backtracks and says it's expecting a digit. I assume it's returning the error that made it the furthest. Even {get,set}ParserState fails to erase that memory.
Am I handling this syntactic pattern wrong? How would all you Parsec users approach this type of problem?
Thanks!

I think both your ideas are OK. The other two answers deal with Parsec, but I'd like to note that in both cases Megaparsec just does the right thing:
{-# LANGUAGE TypeApplications #-}
module Main (main) where
import Control.Monad
import Data.Void
import Text.Megaparsec
import qualified Text.Megaparsec.Char.Lexer as L
type Parser = Parsec Void String
withPredicate1 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate1 f msg p = do
  r <- lookAhead p
  if f r
    then p
    else fail msg
withPredicate2 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate2 f msg p = do
  mpos <- getNextTokenPosition -- †
  r <- p
  if f r
    then return r
    else do
      forM_ mpos setPosition
      fail msg
main :: IO ()
main = do
  let msg = "I only like numbers greater than 42!"
  parseTest' (withPredicate1 @Integer (> 42) msg L.decimal) "11"
  parseTest' (withPredicate2 @Integer (> 42) msg L.decimal) "22"
If I run it:
The next big Haskell project is about to start!
λ> :main
1:1:
|
1 | 11
| ^
I only like numbers greater than 42!
1:1:
|
1 | 22
| ^
I only like numbers greater than 42!
λ>
Try it for yourself! Works as expected.
† getNextTokenPosition is more correct than getPosition for streams where tokens contain position of their beginning and end in themselves. This may or may not be important in your case.

It's not a solution I like, but you can hypnotize Parsec into believing it's had a single failure with consumption:
failAt pos msg = mkPT (\_ -> return (Consumed (return $ Error $ newErrorMessage (Expect msg) pos)))
Here's a complete example:
import Control.Monad
import Text.Parsec
import Text.Parsec.Char
import Text.Parsec.Error
import Text.Parsec.Prim
import Debug.Trace
failAt pos msg = mkPT (\_ -> return (Consumed (return $ Error $ newErrorMessage (Expect msg) pos)))
type P a = Parsec String () a
withPredicate :: (a -> Bool) -> String -> P a -> P a
withPredicate pred msg p = do
  pos <- getPosition
  x <- p
  unless (pred x) $ failAt pos msg
  return x
natural = read <$> many1 digit
pPowerOfTwo = withPredicate isPowerOfTwo "power of two" natural
  where isPowerOfTwo = (`elem` [2^i | i<-[1..20]])
main = print $ runParser pPowerOfTwo () "myinput" "4095"
When run, it results in:
Left "myinput" (line 1, column 1):
expecting power of two

I think the problem stems from how Parsec picks the "best error" in the non-deterministic setting. See Text.Parsec.Error.mergeError. Specifically, this selects the longest match when choosing which error is the error to report. I think we need some way to make Parsec order errors differently, which may be too obscure for us solving this problem.
In my case, here's how I worked around the problem:
I stacked an exception monad (ExceptT) underneath my ParsecT type.
type P m = P.ParsecT String ParSt (ExceptT Diagnostic m)
Then I introduced a pair of combinators:
(Note: Loc is my internal location type)
-- stops hard on an error (no backtracking)
-- which is why I say "semantic" instead of "syntax" error
throwSemanticError :: (MonadTrans t, Monad m) => Loc -> String -> t (ExceptT Diagnostic m) a
throwSemanticError loc msg = throwSemanticErrorDiag $! Diagnostic loc msg
withLoc :: Monad m => (Loc -> P m a) -> P m a
withLoc pa = getLoc >>= pa
Now in parsing I can write:
parsePrimeNumber = withLoc $ \loc -> do
  i <- parseInt
  unless (isPrime i) $ throwSemanticError loc "number is not prime!"
  return i
The top level interface to run one of these monads is really nasty.
runP :: Monad m
     => ParseOpts
     -> P m a
     -> String
     -> m (ParseResult a)
runP pos pma inp =
  case runExceptT (P.runParserT pma (initPSt pos) "" inp) of
    mea -> do
      ea <- mea
      case ea of
        -- semantic error (throwSemanticError)
        Left err -> return $! PError err
        -- regular parse error
        Right (Left err) -> return $ PError (errToDiag err)
        -- success
        Right (Right a) -> return (PSuccess a [])
I'm not terribly happy with this solution and would like something better.
I wish Parsec had something like this (pseudocode):
semanticCheck :: (a -> Parsec Bool) -> Parsec a -> Parsec a
semanticCheck pred p = do
  a <- p
  z <- pred a
  unless z $
    ... somehow raise the error from the beginning of this token/parse
    rather than the end ... and when propagating the error up,
    use the end parse position, so this parse error beats out other
    failed parsers that make it past the beginning of this token
    (but not to the end)
  return a

Using lookAhead, we can run a parser without consuming any input or registering any new errors, but record the state that we end up in. We can then apply a guard to the result of the parser. The guard can fail in whatever manner it desires if the value does not pass the semantic check. If the guard fails, then the error is located at the initial position. If the guard succeeds, we reset the parser to the recorded state, avoiding the need to re-execute p.
guardP :: Stream s m t => (a -> ParsecT s u m ()) -> ParsecT s u m a -> ParsecT s u m a
guardP guard p = do
  (a, s) <- try . lookAhead $ do
    a <- p
    s <- getParserState
    return (a, s)
  guard a
  setParserState s
  return a
We can now implement pPowerOfTwo:
pPowerOfTwo :: Stream s m Char => ParsecT s u m Integer
pPowerOfTwo = guardP guardPowerOfTwo natural <?> "power of two"
  where guardPowerOfTwo s = unless (s `elem` [2^i | i <- [1..20]]) . unexpected $ show s

Related

How to parse a series of lines (with only a few interesting ones) with Parsec in Haskell

I have some input data of the form below (this is just a small sample).
ID_SID_0_LANG=eng
ID_VIDEO_FORMAT=H264
ID_VIDEO_HEIGHT=574
ID_START_TIME=0.00
ID_SUBTITLE_ID=0
ID_VIDEO_ID=0
ID_VIDEO_FPS=25.000
ID_VIDEO_WIDTH=700
I'm trying to see if I can parse this with Parsec. For the sake of our example, I want to pull out two values, the width and the height, subject to the following constraints:
The lines may come in any order
If either the width or the height are missing, I'd like a ParseError
If either the width or the height occur more than once, I'd like a ParseError
The other lines are mixed and varied per input, I can assume nothing beyond their basic format.
I'd like to use Parsec because I'm going to have to parse the values (which, in general, may be of different types - enumerations for codecs, elapsed types, strings, etc.). And I'd like my returned data structure to contain Naturals rather than, say, Maybe Natural, to simplify later code.
My problem is how to "parse" the leading ID_ lines that aren't interesting to me, but pick up only those that are. So I want to parse "any number of uninteresting ID_ lines; a height (or width); any number of uninteresting ID_ lines; a width (or height if width already found); any number of uninteresting ID_ lines". And I'd like to do this without repeating the notion of what constitutes an "interesting" key, because repetition is a primary cause of subtle errors during later maintenance.
My best effort so far is to parse lines producing a list of Data Structure Modifiers for the interesting lines, each with a Key, and separately checking for presence of the required lines and lack of duplication of the unique lines; but that's not satisfying because I'm repeating the "interesting" keys.
Can this be elegantly done with Parsec?
Thanks,
Given that you want an "elegant" Parsec solution, I think you're looking for a variant of a permutation parser.
For background reading, see the documentation for Text.Parsec.Perm and its more modern incarnation in module Control.Applicative.Permutation of the parser-combinators library. In addition, this Functional Pearl paper Parsing Permutation Phrases describes the approach and is great fun to read.
There are two special aspects to your problem: First, I'm not aware of an existing permutation parser that allows for "unmatched" content before, between and after matched portions in a clean manner, and hacks like building the skip logic into the component parsers or deriving an extra parser to identify skippable lines for use in intercalateEffect from Control.Applicative.Permutation seem ugly. Second, the special structure of your input -- the fact that the lines can be recognized by the identifier rather than only general component parsers -- means that we can write a more efficient solution than a usual permutation parser, one that looks up identifiers in a map instead of trying a list of parsers in sequence.
Below is a possible solution. On the one hand, it's using a sledgehammer to kill a fly. In your simple situation, writing an ad hoc parser to read in the identifiers and their RHSs, check for required identifiers and duplicates, and then invoke identifier-specific parsers for the RHSs, seems more straightforward. On the other hand, maybe there are more complicated scenarios where the solution below would be justified, and I think it's conceivable it might be useful to others.
Anyway, here's the idea. First, some preliminaries:
{-# OPTIONS_GHC -Wall #-}
module ParseLines where
import Control.Applicative
import Control.Monad
import Data.List (intercalate)
import Text.Parsec (unexpected, eof, parseTest)
import Text.Parsec.Char (char, letter, alphaNum, noneOf, newline, digit)
import Text.Parsec.String (Parser)
import qualified Data.Map.Lazy as Map
import qualified Data.Set as Set
Let's say we have a data type representing the final result of the parse:
data Video = Video
  { width :: Int
  , height :: Int
  } deriving (Show)
We're going to construct a Permutation a parser. The type a is what we're going to eventually return (and in this case, it's always Video). This Permutation will actually be a Map from "known" identifiers like ID_VIDEO_WIDTH to a special kind of parser that will parse the right-hand side for the given identifier (e.g., an integer like 700) and then return -- not the parsed integer -- but a continuation Permutation a that parses the remaining data to construct a Video, with the parsed integer (e.g., 700) "baked in" to the continuation. The continuation will have a map that recognizes the "remaining" values, and we'll also keep track of known identifiers we've already read to flag duplicates.
We'll use the following type:
type Identifier = String
data Permutation a = Permutation
  -- "seen" identifiers for flagging duplicates
  (Set.Set Identifier)
  (Either
    -- if there are more values to read, map identifier to a parser
    -- that parses RHS and returns continuation for parsing the rest
    (Map.Map Identifier (Parser (Permutation a)))
    -- or we're ready for an eof and can return the final value
    a)
"Running" such a parser involves converting it to a plain Parser, and this is where we implement the logic for identifying recognized lines, flagging duplicates, and skipping unrecognized identifiers. First, here's a parser for identifiers. If you wanted to be more lenient, you could use many1 (noneOf "\n=") or something.
ident :: Parser String
ident = (:) <$> letter' <*> many alphaNum'
  where letter' = letter <|> underscore
        alphaNum' = alphaNum <|> underscore
        underscore = char '_'
and here's a parser for skipping the rest of a line when we see an unrecognized identifier:
skipLine :: Parser ()
skipLine = void $ many (noneOf "\n") >> newline
Finally, here's how we run the Permutation parser:
runPermutation :: Permutation a -> Parser a
runPermutation p@(Permutation seen e)
  = -- if end of file, return the final answer (or error)
    eof *>
    case e of
      Left m -> fail $
        "eof before " ++ intercalate ", " (Map.keys m)
      Right a -> return a
  <|>
    -- otherwise, parse the identifier
    do k <- ident <* char '='
       -- is it one we're waiting for?
       case either (Map.lookup k) (const Nothing) e of
         -- no, it's not, so check for duplicates and skip
         Nothing -> if Set.member k seen
           then unexpected ("duplicate " ++ k)
           else skipLine *> runPermutation p
         -- yes, it is
         Just prhs -> do
           -- parse the RHS to get a continuation Permutation
           -- and run it to parse rest of parameters
           (prhs <* newline) >>= runPermutation
To see how this is supposed to work, here's how we would directly construct a Permutation to parse a Video. It's long, but not that complicated:
perm2 :: Permutation Video
perm2 = Permutation
  -- nothing's been seen yet
  Set.empty
  -- parse width or height
  $ Left (Map.fromList
    [ ("ID_VIDEO_WIDTH", do
          -- parse the width
          w <- int
          -- return a continuation permutation
          return $ Permutation
            -- we've seen width
            (Set.fromList ["ID_VIDEO_WIDTH"])
            -- parse height
            $ Left (Map.fromList
              [ ("ID_VIDEO_HEIGHT", do
                    -- parse the height
                    h <- int
                    -- return a continuation permutation
                    return $ Permutation
                      -- we've seen them all
                      (Set.fromList ["ID_VIDEO_WIDTH", "ID_VIDEO_HEIGHT"])
                      -- have all parameters, so eof returns the video
                      $ Right (Video w h))
              ]))
    -- similarly for other permutation:
    , ("ID_VIDEO_HEIGHT", do
          h <- int
          return $ Permutation
            (Set.fromList ["ID_VIDEO_HEIGHT"])
            $ Left (Map.fromList
              [ ("ID_VIDEO_WIDTH", do
                    w <- int
                    return $ Permutation
                      (Set.fromList ["ID_VIDEO_WIDTH", "ID_VIDEO_HEIGHT"])
                      $ Right (Video w h))
              ]))
    ])
int :: Parser Int
int = read <$> some digit
You can test it like so:
testdata1 :: String
testdata1 = unlines
  [ "ID_SID_0_LANG=eng"
  , "ID_VIDEO_FORMAT=H264"
  , "ID_VIDEO_HEIGHT=574"
  , "ID_START_TIME=0.00"
  , "ID_SUBTITLE_ID=0"
  , "ID_VIDEO_ID=0"
  , "ID_VIDEO_FPS=25.000"
  , "ID_VIDEO_WIDTH=700"
  ]
test1 :: IO ()
test1 = parseTest (runPermutation perm2) testdata1
You should be able to verify that it provides appropriate errors for missing keys, duplicate entries for known keys, and accepts keys in any order.
Finally, we obviously don't want to construct permutation parsers like perm2 manually, so we take a page from the Text.Parsec.Perm module and introduce the following syntax:
video :: Parser Video
video = runPermutation (Video <$$> ("ID_VIDEO_WIDTH", int) <||> ("ID_VIDEO_HEIGHT", int))
and define operators to construct the necessary Permutation objects. These definitions are a little tricky, but they follow pretty directly from the definition of Permutation.
(<$$>) :: (a -> b) -> (Identifier, Parser a) -> Permutation b
f <$$> xq = Permutation Set.empty (Right f) <||> xq
infixl 2 <$$>
(<||>) :: Permutation (a -> b) -> (Identifier, Parser a) -> Permutation b
p@(Permutation seen e) <||> (x, q)
  = Permutation seen (Left (Map.insert x q' m'))
  where
    q' = (\a -> addQ x a p) <$> q
    m' = case e of Right _ -> Map.empty
                   Left m -> Map.map (fmap (<||> (x, q))) m
infixl 1 <||>
addQ :: Identifier -> a -> Permutation (a -> b) -> Permutation b
addQ x a (Permutation seen e)
  = Permutation (Set.insert x seen) $ case e of
      Right f -> Right (f a)
      Left m -> Left (Map.map (fmap (addQ x a)) m)
and the final test:
test :: IO ()
test = parseTest video testdata1
giving:
> test
Video {width = 700, height = 574}
>
Here's the final code, slightly rearranged:
{-# OPTIONS_GHC -Wall #-}
module ParseLines where
import Control.Applicative
import Control.Monad
import Data.List (intercalate)
import Text.Parsec (unexpected, eof, parseTest)
import Text.Parsec.Char (char, letter, alphaNum, noneOf, newline, digit)
import Text.Parsec.String (Parser)
import qualified Data.Map.Lazy as Map
import qualified Data.Set as Set
-- * Permutation parser for identifier settings
-- | General permutation parser for a type @a@.
data Permutation a = Permutation
  -- | "Seen" identifiers for flagging duplicates
  (Set.Set Identifier)
  -- | Either map of continuation parsers for more identifiers or a
  -- final value once we see eof.
  (Either (Map.Map Identifier (Parser (Permutation a))) a)
-- | Create a one-identifier 'Permutation' from a 'Parser'.
(<$$>) :: (a -> b) -> (Identifier, Parser a) -> Permutation b
f <$$> xq = Permutation Set.empty (Right f) <||> xq
infixl 2 <$$>
-- | Add a 'Parser' to a 'Permutation'.
(<||>) :: Permutation (a -> b) -> (Identifier, Parser a) -> Permutation b
p@(Permutation seen e) <||> (x, q)
  = Permutation seen (Left (Map.insert x q' m'))
  where
    q' = (\a -> addQ x a p) <$> q
    m' = case e of Right _ -> Map.empty
                   Left m -> Map.map (fmap (<||> (x, q))) m
infixl 1 <||>
-- | Helper to add a parsed component to a 'Permutation'.
addQ :: Identifier -> a -> Permutation (a -> b) -> Permutation b
addQ x a (Permutation seen e)
  = Permutation (Set.insert x seen) $ case e of
      Right f -> Right (f a)
      Left m -> Left (Map.map (fmap (addQ x a)) m)
-- | Convert a 'Permutation' to a 'Parser' that detects duplicates
-- and skips unknown identifiers.
runPermutation :: Permutation a -> Parser a
runPermutation p@(Permutation seen e)
  = -- if end of file, return the final answer (or error)
    eof *>
    case e of
      Left m -> fail $
        "eof before " ++ intercalate ", " (Map.keys m)
      Right a -> return a
  <|>
    -- otherwise, parse the identifier
    do k <- ident <* char '='
       -- is it one we're waiting for?
       case either (Map.lookup k) (const Nothing) e of
         -- no, it's not, so check for duplicates and skip
         Nothing -> if Set.member k seen
           then unexpected ("duplicate " ++ k)
           else skipLine *> runPermutation p
         -- yes, it is
         Just prhs -> do
           -- parse the RHS to get a continuation Permutation
           -- and run it to parse rest of parameters
           (prhs <* newline) >>= runPermutation
-- | Left-hand side of a setting.
type Identifier = String
-- | Parse an 'Identifier'.
ident :: Parser Identifier
ident = (:) <$> letter' <*> many alphaNum'
  where letter' = letter <|> underscore
        alphaNum' = alphaNum <|> underscore
        underscore = char '_'
-- | Skip (rest of) a line.
skipLine :: Parser ()
skipLine = void $ many (noneOf "\n") >> newline
-- * Parsing video information
-- | Our video data.
data Video = Video
  { width :: Int
  , height :: Int
  } deriving (Show)
-- | Parsing integers (RHS of width and height settings)
int :: Parser Int
int = read <$> some digit
-- | Some test data
testdata1 :: String
testdata1 = unlines
  [ "ID_SID_0_LANG=eng"
  , "ID_VIDEO_FORMAT=H264"
  , "ID_VIDEO_HEIGHT=574"
  , "ID_START_TIME=0.00"
  , "ID_SUBTITLE_ID=0"
  , "ID_VIDEO_ID=0"
  , "ID_VIDEO_FPS=25.000"
  , "ID_VIDEO_WIDTH=700"
  ]
-- | `Video` parser based on `Permutation`.
video :: Parser Video
video = runPermutation (Video <$$> ("ID_VIDEO_WIDTH", int) <||> ("ID_VIDEO_HEIGHT", int))
-- | The final test.
test :: IO ()
test = parseTest video testdata1
Indeed a simple solution would be to parse the file into Map ByteString ByteString, checking for duplicates while parsing, and then build the target result from that, checking that all required fields are present.
parseMap :: Parsec (Map ByteString ByteString)
-- ...
parseValues :: Map ByteString ByteString -> Parsec MyDataStructure
-- ...
Function parseValues can use Parsec again to parse the fields (perhaps using runP on each one) and to report errors or missing fields.
The disadvantage of this solution is that parsing is done on two levels (once to get the ByteStrings and a second time to parse them), and that this way we can't correctly report the position of errors found in parseValues. However, Parsec allows you to get and set the current position in a file, so it might be feasible to include positions in the map and then use them when parsing the individual strings:
parseMap :: Parsec (Map ByteString (SourcePos, ByteString))
Using Parsec directly to parse the full result might be possible, but I'm afraid it'd be tricky to accomplish to allow arbitrary order and at the same time different output types of the fields.
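To make the two-level idea concrete, here is a minimal sketch of it. It swaps in plain String and Either String for ByteString and Parsec so that it needs nothing beyond base and containers, and it rejects every duplicated key, not only the interesting ones. The Video type, the helper requireInt, and the error messages are assumptions made for illustration.

```haskell
import Control.Monad (foldM)
import qualified Data.Map.Strict as Map
import Text.Read (readMaybe)

-- | Hypothetical target type with plain Int fields (no Maybe).
data Video = Video { width :: Int, height :: Int } deriving (Eq, Show)

-- | First level: split "KEY=VALUE" lines into a map,
-- failing on malformed lines and on any duplicated key.
parseMap :: String -> Either String (Map.Map String String)
parseMap = foldM step Map.empty . lines
  where
    step m line = case break (== '=') line of
      (key, '=' : value)
        | Map.member key m -> Left ("duplicate key: " ++ key)
        | otherwise -> Right (Map.insert key value m)
      _ -> Left ("malformed line: " ++ line)

-- | Second level: look up a required key and parse its value.
requireInt :: String -> Map.Map String String -> Either String Int
requireInt key m = case Map.lookup key m of
  Nothing -> Left ("missing key: " ++ key)
  Just raw -> maybe (Left ("not an integer: " ++ key ++ "=" ++ raw)) Right (readMaybe raw)

-- | Build the result; each interesting key is named exactly once.
parseVideo :: String -> Either String Video
parseVideo input = do
  m <- parseMap input
  Video <$> requireInt "ID_VIDEO_WIDTH" m <*> requireInt "ID_VIDEO_HEIGHT" m
```

With the sample input from the question, parseVideo should yield Right (Video {width = 700, height = 574}); dropping the width line gives a Left with a "missing key" message instead.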
If you don't mind a slight performance loss, write one parser for the width-line and one for the length-line and do something like this:
let ls = lines input in
  case ([x | Right x <- parseWidth ls], [x | Right x <- parseLength ls]) of
    ([w],[l]) -> ...
    _ -> parserError ...
It's easy to add separate error cases for repeated/missing values without repeating anything.

In Parsec, how do I run second parser, only if the first parser consumed some input?

I need a combinator like p1 << p2, but p2 should run only if p1 has succeeded and consumed some input.
If p1 succeeded without consuming input, p2 should not run.
If p1 failed, then p2 is also ignored.
The overall result is p1's result.
Parsec primitives make an internal distinction between a parser that succeeds after consuming some input and a parser that succeeds after consuming no input which you should be able to leverage. In particular, the following ought to work to parse p and then -- conditioned on p successfully consuming input -- parse q and discard its results:
ifConsumed :: Monad m => ParsecT s u m a -> ParsecT s u m b -> ParsecT s u m a
ifConsumed p q = mkPT k
  where -- k :: State s u -> m (Consumed (m (Reply s u a)))
        k s = do cons <- runParsecT p s
                 case cons of
                   Consumed mrep -> do
                     rep <- mrep
                     case rep of
                       Ok x s' err -> runParsecT (fmap (const x) q) s'
                       Error err -> return . Consumed . return $ Error err
                   Empty mrep -> do
                     rep <- mrep
                     case rep of
                       Ok x s' err -> return . Empty . return $ Ok x s' err
                       Error err -> return . Empty . return $ Error err
It's ugly because Parsec doesn't directly expose the ParsecT constructor, so you have to use the mkPT and runParsecT intermediaries, which add a lot of boilerplate.
In a nutshell, it runs the p parser. If this succeeds with input consumed (the Consumed -> Ok branch), it runs the q parser modified via fmap to return the value parsed by p. On the other hand, if p succeeds with no input consumed (the Empty -> Ok branch), it simply returns success without running the q parser.
The only caveat is that I'm not 100% sure how, within the Parsec library itself, the invariant is preserved whereby the Consumed -> Ok branch only gets called when input has been consumed, so I don't know if this is truly reliable. You'll want to test it carefully in your particular use case.
For the following parser --- which parses a list of one or more elements separated by commas where each element consists of zero or more digits, then two exclamation marks only if the previous parser consumed some input, then a semicolon --- it seems to work:
p :: Parser [String]
p = ifConsumed (sepBy1 (many digit) (char ',')) (char '!' >> char '!') <* char ';'
runp :: String -> Either ParseError [String]
runp = parse p ""
Some tests:
runp "" -- fails, expecting semicolon
runp ";" -- returns [""]
runp "!!;" -- fails, "!!" w/ no preceding content
runp ",;" -- fails, missing "!!"
runp ",!!;" -- returns ["",""]
runp ",!;" -- fails, expecting second "!"
runp ",1,23;" -- fails, missing "!!"
runp ",1,23!!;" -- returns ["","1","23"]
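If digging into mkPT feels too fragile, a rougher alternative is to compare source positions before and after p using only Parsec's public API. This is a sketch under an assumption: it treats "the position moved" as a stand-in for "input was consumed", which holds for ordinary character parsers but not for parsers that call setPosition themselves. The names ifAdvanced, example and runExample are made up for this illustration.

```haskell
import Control.Monad (void, when)
import Text.Parsec
import Text.Parsec.String (Parser)

-- | Run q after p only when p moved the source position.
-- The overall result is p's result; q's result is discarded.
ifAdvanced :: Monad m => ParsecT s u m a -> ParsecT s u m b -> ParsecT s u m a
ifAdvanced p q = do
  before <- getPosition
  x <- p
  after <- getPosition
  when (after /= before) (void q)
  return x

-- | Same grammar as the test parser above, built on ifAdvanced.
example :: Parser [String]
example = ifAdvanced (sepBy1 (many digit) (char ',')) (char '!' >> char '!') <* char ';'

-- | Nothing on a parse error, Just the parsed elements otherwise.
runExample :: String -> Maybe [String]
runExample = either (const Nothing) Just . parse example ""
```

On the inputs listed above it should agree with ifConsumed: for instance runExample ";" gives Just [""], while runExample ",;" and runExample "!!;" give Nothing.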
With a naive parser implementation, you should be able to do this:
(<<) p1 p2 = P $ \inp -> case parse p1 inp of
  ErrorResult e -> ErrorResult e
  SuccessResult (rem, res) -> if rem == inp
    then SuccessResult (rem, res)
    else parse p2 rem
Though Parsec is more advanced, you could probably roll your own there as well.
I don't think you can do that for arbitrary parsers p1 and p2: you need them to communicate somehow. If you could do this, it seems to me that you would break referential transparency.
For example, consider parsing the input string repeat 'x': whether p1 consumes a character or not, p2 will see the string as an endless sea of x characters. If it hasn't communicated with p1 somehow (eg by modifying something in the parser's state), then you can't know whether a character was consumed; if your combinator were somehow able to treat these two cases differently, it would be breaking the rules.

Parsec: difference between "try" and "lookAhead"?

What is the difference between "try" and "lookAhead" functions in parsec?
The combinators try and lookAhead are similar in that they both let Parsec "rewind", but they apply in different circumstances. In particular, try rewinds failure while lookAhead rewinds success.
By the documentation, try "pretends that it hasn't consumed any input when an error occurs" while lookAhead p "parses p without consuming any input", but "if p fails and consumes some input, so does lookAhead".
So if you think of a parser running as walking along some streaming state and either failing or succeeding, which we might write in Haskell terms as
type Parser a = [Tokens] -> (Either Error a, [Tokens])
then try ensures that if (try p) input ---> (Left err, output) then input == output and lookAhead has it such that (lookAhead p) input ---> (Right a, output) then input == output, but if (lookAhead p) input ---> (Left err, output) then they may be allowed to differ.
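The two rewind rules can be played with directly in that simplified model. The following is only a toy, not Parsec's real machinery: Token is fixed to Char, and the helpers tok, andThen, tryP and lookAheadP are names invented for this sketch.

```haskell
-- Toy model only: these are not Parsec's real types.
type Token = Char
type Error = String
type Parser a = [Token] -> (Either Error a, [Token])

-- | Consume one specific token, or fail without consuming anything.
tok :: Token -> Parser Token
tok c (x : xs) | x == c = (Right x, xs)
tok c input = (Left ("expected " ++ show c), input)

-- | Sequence two parsers; a failure keeps whatever was consumed so far.
andThen :: Parser a -> Parser b -> Parser (a, b)
andThen p q input = case p input of
  (Left e, rest) -> (Left e, rest)
  (Right a, rest) -> case q rest of
    (Left e, rest') -> (Left e, rest')
    (Right b, rest') -> (Right (a, b), rest')

-- | try: rewind the input when the parser fails.
tryP :: Parser a -> Parser a
tryP p input = case p input of
  (Left e, _) -> (Left e, input)
  ok -> ok

-- | lookAhead: rewind the input when the parser succeeds.
lookAheadP :: Parser a -> Parser a
lookAheadP p input = case p input of
  (Right a, _) -> (Right a, input)
  failed -> failed

-- | Parses 'a' followed by 'b'.
ab :: Parser (Token, Token)
ab = tok 'a' `andThen` tok 'b'
```

Here ab "ac" fails having already eaten the 'a', so it returns the remaining input "c". Wrapping it as tryP ab "ac" hands back "ac" untouched, while lookAheadP ab "ab" succeeds with ('a','b') and still returns "ab". A failing lookAheadP ab "ac" is left at "c", matching the quoted documentation.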
We can see this in action by looking at the code for Parsec directly, which is somewhat more complex than my notion of Parser above. First we examine ParsecT
newtype ParsecT s u m a
    = ParsecT {unParser :: forall b .
                 State s u
              -> (a -> State s u -> ParseError -> m b) -- consumed ok
              -> (ParseError -> m b)                   -- consumed err
              -> (a -> State s u -> ParseError -> m b) -- empty ok
              -> (ParseError -> m b)                   -- empty err
              -> m b
             }
ParsecT is a continuation-based datatype. If you look at how one of them is constructed
ParsecT $ \s cok cerr eok eerr -> ...
You'll see how we have access to the State s u, s, and four functions which determine how we move forward. For instance, the fail clause of ParsecT's Monad instance uses the eerr option, constructing a ParseError from the current input position and the passed error message.
parserFail :: String -> ParsecT s u m a
parserFail msg
    = ParsecT $ \s _ _ _ eerr ->
      eerr $ newErrorMessage (Message msg) (statePos s)
While the most primitive successful token parse (tokenPrim) uses a complex sequence of events eventually culminating in calling cok with an updated State s u.
With this intuition, the source for try is particularly simple.
try :: ParsecT s u m a -> ParsecT s u m a
try p =
    ParsecT $ \s cok _ eok eerr ->
    unParser p s cok eerr eok eerr
It simply builds a new ParsecT based on the one passed to try, but with the "empty err" continuation in place of the consumed error. Whatever parsing combinator next sees try p will be unable to access its actual "consumed err" continuation and thus try is protected from changing its state on errors.
But lookAhead is more sophisticated
lookAhead :: (Stream s m t) => ParsecT s u m a -> ParsecT s u m a
lookAhead p = do{ state <- getParserState
                ; x <- p'
                ; setParserState state
                ; return x
                }
    where
    p' = ParsecT $ \s cok cerr eok eerr ->
         unParser p s eok cerr eok eerr
Examining just the where-clause we see it depends on modifying the passed parser p to use the "empty ok" continuation in place of the "consumed ok" continuation. This is symmetric to what try did. Further, it ensures that the parser state is unaffected by whatever happens when this modified p' is run via its do-block.

Unwrapping a monad

Given the below program, I am having issues dealing with monads.
module Main
where
import System.Environment
import System.Directory
import System.IO
import Text.CSV
--------------------------------------------------
exister :: String -> IO Bool
exister path = do
  fileexist <- doesFileExist path
  direxist <- doesDirectoryExist path
  return (fileexist || direxist )
--------------------------------------------------
slurp :: String -> IO String
slurp path = do
  withFile path ReadMode (\handle -> do
    contents <- hGetContents handle
    last contents `seq` return contents )
--------------------------------------------------
main :: IO ()
main = do
  [csv_filename] <- getArgs
  putStrLn (show csv_filename)
  csv_raw <- slurp csv_filename
  let csv_data = parseCSV csv_filename csv_raw
  printCSV csv_data -- unable to compile.
csv_data is an Either (parseerror) CSV type, and printCSV takes only CSV data.
Here's the ediff between the working version and the broken version.
***************
*** 27,30 ****
csv_raw <- slurp csv_filename
let csv_data = parseCSV csv_filename csv_raw
! printCSV csv_data -- unable to compile.
\ No newline at end of file
--- 27,35 ----
csv_raw <- slurp csv_filename
let csv_data = parseCSV csv_filename csv_raw
! case csv_data of
! Left error -> putStrLn $ show error
! Right csv_data -> putStrLn $ printCSV csv_data
!
! putStrLn "done"
!
reference: http://hackage.haskell.org/packages/archive/csv/0.1.2/doc/html/Text-CSV.html
Regarding monads:
Yes, Either a is a monad. So simplifying the problem, you are basically asking for this:
main = print $ magicMonadUnwrap v
v :: Either String Int
v = Right 3
magicMonadUnwrap :: (Monad m) => m a -> a
magicMonadUnwrap = undefined
How do you define magicMonadUnwrap? Well, you see, it's different for each monad. Each one needs its own unwrapper. Many of these have the word "run" in them, for example, runST, runCont, or runEval. However, for some monads, it might not be safe to unwrap them (hence the need for differing unwrappers).
One implementation for lists would be head. But what if the list is empty? An unwrapper for Maybe is fromJust, but what if it's Nothing?
Similarly, the unwrapper for the Either monad would be something like:
fromRight :: Either a b -> b
fromRight (Right x) = x
But this unwrapper isn't safe: what if you had a Left value instead? (Left usually represents an error state; in your case, a parse error.) So the best way to act upon an Either value is to use the either function, or else use a case statement matching Right and Left, as Daniel Wagner illustrated.
tl;dr: there is no magicMonadUnwrap. If you're inside that same monad, you can use <-, but to truly extract the value from a monad...well...how you do it depends on which monad you're dealing with.
Use case.
main = do
  ...
  case csv_data of
    Left err -> {- whatever you're going to do with an error -- print it, throw it as an exception, etc. -}
    Right csv -> putStrLn (printCSV csv)
The either function is shorter (syntax-wise), but boils down to the same thing.
main = do
  ...
  either ({- error condition function -}) (putStrLn . printCSV) csv_data
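As a small self-contained illustration of that equivalence, here are both styles side by side on a plain Either String Int, so that it runs without the csv package. The function names and message strings are made up for the example.

```haskell
-- | Handle both outcomes with an explicit case expression.
reportCase :: Either String Int -> String
reportCase v = case v of
  Left err -> "error: " ++ err
  Right n -> "value: " ++ show n

-- | The same logic written with the either function:
-- the first argument handles Left, the second handles Right.
reportEither :: Either String Int -> String
reportEither = either ("error: " ++) (\n -> "value: " ++ show n)
```

Both reportCase (Right 3) and reportEither (Right 3) produce "value: 3", and both turn Left "boom" into "error: boom".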
You must unlearn what you have learned.
Master Yoda.
Instead of thinking about, or searching for ways to "free", "liberate", "release", "unwrap" or "extract" normal Haskell values from effect-centric (usually monadic) contexts, learn how to use one of Haskell's more distinctive features - functions are first-class values:
you can use functions like values of other types e.g. like Bool, Char, Int, Integer etc:
arithOps :: [(String, Int -> Int -> Int)]
arithOps = zip ["PLUS","MINUS", "MULT", "QUOT", "REM"]
               [(+), (-), (*), quot, rem]
For your purposes, what's more important is that functions can also be used as arguments e.g:
map :: (a -> b) -> [a] -> [b]
map f xs = [ f x | x <- xs ]
filter :: (a -> Bool) -> [a] -> [a]
filter p xs = [ x | x <- xs, p x ]
These higher-order functions are even available for use in effect-bearing contexts e.g:
import Control.Monad
liftM :: Monad m => (a -> b) -> (m a -> m b)
liftM2 :: Monad m => (a -> b -> c) -> (m a -> m b -> m c)
liftM3 :: Monad m => (a -> b -> c -> d) -> (m a -> m b -> m c -> m d)
...etc, which you can use to lift your regular Haskell functions:
do .
   .
   .
   val <- liftM3 calculate this_M that_M other_M
   .
   .
   .
Of course, the direct approach also works:
do .
   .
   .
   x <- this_M
   y <- that_M
   z <- other_M
   let val = calculate x y z
   .
   .
   .
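For a runnable version of the two styles, here is a small sketch in the Maybe monad. The body of calculate and the concrete Just values are placeholders chosen for illustration; only the shape of the code mirrors the fragments above.

```haskell
import Control.Monad (liftM3)

-- | An ordinary pure function; it knows nothing about Maybe.
calculate :: Int -> Int -> Int -> Int
calculate x y z = x * y + z

-- | Lifted style: liftM3 feeds the three monadic values to calculate.
lifted :: Maybe Int
lifted = liftM3 calculate (Just 2) (Just 3) (Just 4)

-- | Direct style: bind each value, then call calculate.
direct :: Maybe Int
direct = do
  x <- Just 2
  y <- Just 3
  z <- Just 4
  return (calculate x y z)

-- | If any input is Nothing, the whole computation is Nothing.
missing :: Maybe Int
missing = liftM3 calculate (Just 2) Nothing (Just 4)
```

Both lifted and direct evaluate to Just 10, and missing is Nothing. In neither case is the value ever "unwrapped" by hand.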
As your skills develop, you'll find yourself delegating more and more code to ordinary functions and leaving the effects to a vanishingly-small set of entities defined in terms of functors, applicatives, monads, arrows, etc as you progress towards Haskell mastery.
You're not convinced? Well, here's a brief note of how effects used to be handled in Haskell - there's also a longer description of how Haskell arrived at the monadic interface. Alternately, you could look at Standard ML, OCaml, and other similar languages - who knows, maybe you'll be happier with using them...

ErrorT catchError in practice

I have a very typical setup with a set of functions in the IO monad that can throw errors. To date I have just been dealing with errors at the end of the monad chain by pattern matching the Either result from runErrorT:
replLisp :: LispScope -> String -> IO String
replLisp s input = do
  result <- runErrorT (evalLisp s input)
  return $ either (id) (show) result
I would now like to add some error handling to my Hacked little scheme, but I'm having trouble making the type checker happy.
How does one use catchError? An example or two would be helpful.
This is my latest attempt:
catch :: [LispVal] -> IOThrowsError LispVal
catch [action rescue] = do
  eval action >>= catchError $ eval rescue
Here is an example use of catchError to recover from a prior call to throwError:
import Control.Monad.Error
import Control.Monad.Identity
type MyMonad = ErrorT String Identity
runMyMonad = runIdentity . runErrorT
main = do
  let x = runMyMonad (func 5 0)
  print x
func :: Double -> Double -> MyMonad Double
func w x = do
  y <- (divider x) `catchError` (\_ -> return 1)
  return (w + y)
divider :: Double -> MyMonad Double
divider x = do
  when (x == 0) (throwError "Can not divide by zero!")
  return (10 / x)
Despite passing 0 in for division, we can complete with the handler's result of 1 to obtain the output Right 6.0.
Does this help? Your question didn't really say what the issue was.
Error monads like Either and Maybe don't allow you to observe the error from within the same monad: you have to run the monad in order to observe it. Exceptions in IO are one notable exception (ahem) because IO is the end of the line... you can't go any further from there.
You have a few possibilities:
Since you're writing a mini-interpreter, it's probably a good idea to explicitly manage all the exceptions, using the ErrorT monad only for true, unrecoverable errors.
For any call that may error that you want to be able to recover from, perform a runErrorT and inspect that result, before passing along the result in the current monad.
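Both recovery styles can be sketched with ExceptT from the transformers package, used here as a stand-in because ErrorT has since been deprecated in its favor. The Eval alias and the names withHandler, withInspection and run are assumptions for this example; divider mirrors the one in the earlier answer.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Except (ExceptT, catchE, runExceptT, throwE)
import Data.Functor.Identity (Identity, runIdentity)

-- | Hypothetical evaluator monad: String errors over Identity.
type Eval = ExceptT String Identity

-- | Fails on zero, like the divider in the earlier answer.
divider :: Double -> Eval Double
divider x
  | x == 0 = throwE "Can not divide by zero!"
  | otherwise = return (10 / x)

-- | Style 1: recover with a handler (catchE plays the role of catchError).
withHandler :: Double -> Eval Double
withHandler x = divider x `catchE` (\_ -> return 1)

-- | Style 2: run the inner computation, inspect the Either,
-- then continue in the current monad with a fallback value.
withInspection :: Double -> Eval Double
withInspection x = do
  result <- lift (runExceptT (divider x))
  return (either (const 1) id result)

-- | Run the whole computation down to a plain Either.
run :: Eval a -> Either String a
run = runIdentity . runExceptT
```

Under these definitions run (divider 0) is Left "Can not divide by zero!", whereas run (withHandler 0) and run (withInspection 0) both give Right 1.0.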
