I have a DSL, and a parser for it written in Haskell with the Parsec package. Now I want to deprecate a specific language feature of the DSL. In the next release, I want the parser to accept both the new and the old syntax, but I want the parser to spawn a deprecation message. I could not find how to do this. Is this possible, and if so, how can this be done?
Instead of emitting messages during parsing, it would be better to return extra information at the end of parsing: whether or not deprecated syntax was encountered.
The ParsecT type admits a type parameter for state set by the user during parsing:
ParsecT s u m a is a parser with stream type s, user state type u, underlying monad m and return type a. Parsec is strict in the user state.
The user state can be set with putState and modifyState. It can be obtained using getState.
Most parsec combinators are polymorphic on the user state. Most combinators for your own DSL should be, as well. But parsers for deprecated parts of the syntax should set a "flag" in your user state.
Something like this:
import Text.Parsec
import Text.Parsec.Char
import Data.Functor.Identity
type Parser = ParsecT [Char] Bool Identity -- using a Bool state
myParser :: Parser Char
myParser =
try (do char 'a'
putState True
char 'b')
<|>
try (do char 'a'
char 'c')
main :: IO ()
main = do
print $ runParser ((,) <$> myParser <*> getState) False "" "ab"
print $ runParser ((,) <$> myParser <*> getState) False "" "ac"
-- results:
-- Right ('b',True)
-- Right ('c',False)
Of course, instead of a simple boolean flag, it would be better to put more information into the state.
Notice that state set by a sub-parser is "forgotten" if the sub-parser backtracks. That is the correct behavior for our purposes: otherwise, we would get "false positives" triggered by branches that are ultimately discarded.
A common alternative to parsec is megaparsec. The latter doesn't allow for user-defined state in the parser type itself, but it can be emulated using a StateT transformer over the ParsecT type.
Related
I'm stuck on a problem with writing a parser in Haskell that I hope someone can help out with!
It is a bit more complicated than my usual parser because there are two layers of parsing. First a language definition is parsed into an AST, then that AST is transformed into another parser that parses the actual language.
I have made pretty good progress so far but I'm stuck on implementing recursion in the language definition. As the language definition is transformed from AST into a parser in a recursive function, I can't work out how it can call itself if it doesn't exist yet.
I'm finding it a bit hard to explain my problem, so maybe an example will help.
The language definition might define that a language consists of three keywords in sequence and then optional recursion in brackets.
A B C ($RECURSE)
Which would be parsed into an AST like:
[Keyword "A", Keyword "B", Keyword "C", Optional (Many [Recurse])]
The Many is not really required for this example, but in my actual project, optional blocks can have multiple syntax elements in them so an Optional would contain a Many with n elements.
I would then want it to get transformed into a parser that parses strings like:
A B C
A B C (A B C)
A B C (A B C (A B C))
I've boiled down my project into the simplest possible example. You can see my TODO comment where I'm stuck trying to implement the recursion.
{-# LANGUAGE OverloadedStrings #-}
module Example
( runExample,
)
where
import Control.Applicative hiding (many, some)
import Data.Text (Text)
import Data.Void
import System.IO as SIO
import Text.Megaparsec hiding (State)
import Text.Megaparsec.Char (space1, string')
import qualified Text.Megaparsec.Char.Lexer as L
import Text.Megaparsec.Debug
import Text.Pretty.Simple (pPrint)
-- Types
type Parser = Parsec Void Text
data SyntaxAst = Keyword Text | Recurse | Optional SyntaxAst | Many [SyntaxAst]
-- Megaparsec Base Parsers
-- Space consumer - used by other parsers to ignore whitespace
sc :: Parser ()
sc =
L.space
space1
(L.skipLineComment "--")
(L.skipBlockComment "/*" "*/")
-- Runs a parser, then consumes any left over space with sc
lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc
-- Parses a string, then consumes any left over space with sc
symbol :: Text -> Parser Text
symbol = L.symbol sc
-- Parses something between parentheses
inParens :: Parser a -> Parser a
inParens =
between
(symbol "(")
(symbol ")")
-- Transforms the AST into a parser
transformSyntaxExprToParser :: SyntaxAst -> Parser [Text]
transformSyntaxExprToParser (Many exprs) = dbg "Many" (createParser exprs)
transformSyntaxExprToParser (Keyword text) = dbg "Keyword" (pure <$> lexeme (string' text))
transformSyntaxExprToParser (Optional inner) = dbg "Optional" (option [] (try (inParens (transformSyntaxExprToParser inner))))
transformSyntaxExprToParser Recurse = dbg "Recurse" (pure ["TODO"]) -- TODO: How do I recurse here?
-- transformSyntaxExprToParser s Recurse = dbg "Recurse" (createParser s) -- Seems to work in the example, but in my actual application creates an infinite loop and freezes
-- Walks over the parser AST and convert it to a parser
createParser :: [SyntaxAst] -> Parser [Text]
createParser expressions =
do
foldr1 (liftA2 (<>)) (fmap transformSyntaxExprToParser expressions)
runExample :: IO ()
runExample = do
-- To make the example simple, lets cut out the language definition parsing and just define
-- it literally.
let languageParser = createParser [Keyword "A", Keyword "B", Keyword "C", Optional (Many [Recurse])]
let run p = runParser p "" "A B C (A B C (A B C))"
let result = run languageParser
case result of
Left bundle -> SIO.putStrLn (errorBundlePretty bundle)
Right xs -> pPrint xs
A few things I've tried:
Pass the original AST up to the transformSyntaxExprToParser function and call createParser when the Recurse token is encountered. This didn't work due to infinite loops.
Using mutable references like IORef/STRef to pass in a reference that is updated to reference the final parser once the transformation is finished. I couldn't work out how to thread the IO/ST monads into the parser transform function.
State monads. I couldn't work out how to pass a reference through the state monad.
I hope that makes sense, let me know if I need to elaborate more. I can also push up my full project if it will help.
Thanks for reading!
Edit: I've made changes to my original example to demonstrate the infinite loop problem (integrating the excellent suggestions in the answer below) at https://pastebin.com/DN0JJ9BA
I believe you can use laziness here. Pass the final parser as a parameter to transformSyntaxExprToParser, and when you see a Recurse, return that parser.
transformSyntaxExprToParser :: Parser [Text] -> SyntaxAst -> Parser [Text]
transformSyntaxExprToParser self = go
where
go (Keyword text) = dbg "Keyword" (pure <$> lexeme (string' text))
go (Optional inner) = dbg "Optional" (option [] (try (inParens (go inner))))
go Recurse = dbg "Recurse" self
createParser :: [SyntaxAst] -> Parser [Text]
createParser expressions = parser
where
parser = foldr1 (liftA2 (<>))
(fmap (transformSyntaxExprToParser parser) expressions)
This ought to produce exactly the same kind of recursive parser as if you had written it directly. A Parser is ultimately just a data structure which you can construct using its instances of Monad, Applicative, Alternative, &c.
Your idea of doing this with a mutable reference such as an IORef is essentially what’s happening under the hood anyway when constructing and evaluating a thunk.
Your idea here was almost correct:
Pass the original AST up to the transformSyntaxExprToParser function and call createParser when the Recurse token is encountered. This didn't work due to infinite loops.
The problem is that you were constructing a new parser for every Recurse, from the same input, which contains a Recurse, thus constructing a new parser…and so on. What my code above does is just pass in the same parser.
If you need to perform monadic side effects while constructing the parser, such as logging, then you can use a recursive do, for example, with some hypothetical MonadLog class for illustration:
{-# Language RecursiveDo #-}
transformSyntaxExprToParser :: (MonadLog m) => Parser [Text] -> SyntaxAst -> m (Parser [Text])
transformSyntaxExprToParser self = go
where
go (Keyword text) = do
logMessage "Got ‘Keyword’"
pure $ dbg "Keyword" (pure <$> lexeme (string' text))
go (Optional inner) = do
logMessage "Got ‘Optional’"
inner' <- go inner
pure $ dbg "Optional" (option [] (try (inParens inner')))
go Recurse = do
logMessage "Got ‘Recurse’"
pure $ dbg "Recurse" self
createParser :: (MonadFix m, MonadLog m) => [SyntaxAst] -> m (Parser [Text])
createParser expressions = do
rec
parser <- fmap (foldr1 (liftA2 (<>)))
(traverse (transformSyntaxExprToParser parser) expressions)
pure parser
The rec block introduces a recursive binding which you may construct using side effects. In general, some care is required to ensure that recursive definitions like this are sufficiently lazy, that is, you don’t force the result sooner than intended, but here the recursion pattern is very simple, and you never examine the self parser, only treat it as a black box to hook up to other parsers.
This method also makes it explicit what the scope of a Recurse is, and opens the possibility of introducing local recursive parsers, with a new call to transformSyntaxExprToParser with a new local self argument.
I'm looking for a parser that would try to use the first parser, and return Left a if it succeeds, or if it fails try the second parser and return Right b. In other words, something with the signature:
Parser a -> Parser b -> Parser (Either a b)
Where, e.g., type Parser a = P.Parsec String () a
It's not particularly hard to implement it on my own:
parseEither pa pb = (Left <$> pa) <|> (Right <$> pb)
But it seems to be such a useful and trivial construct that I was wondering if anything similar already exists in the Parsec library.
Using Megaparsec 5.
Following this guide, I can achieve a back-tracking user-state by combining StateT and ParsecT (non-defined types should be obvious/irrelevant):
type MyParser a = StateT UserState (ParsecT Dec T.Text Identity) a
if I run a parser p :: MyParser a, like this:
parsed = runParser (runStateT p initialUserState) "" input
The type of parsed is:
Either (ParseError Char Dec) (a, UserState)
Which means, in case of error, the user state is lost.
Is there any way to have it in both cases?
EDIT:
Could I perhaps, in case of error, use a custom error component instead of Dec (a feature introduced in 5.0) and encapsulate the user state in there?
You can use a custom error component combined with the observing function for this purpose (see this great post for more information):
{-# LANGUAGE RecordWildCards #-}
module Main where
import Text.Megaparsec
import qualified Data.Set as Set
import Control.Monad.State.Lazy
data MyState = MyState Int deriving (Ord, Eq, Show)
data MyErrorComponent = MyErrorComponent (Maybe MyState) deriving (Ord, Eq, Show)
instance ErrorComponent MyErrorComponent where
representFail _ = MyErrorComponent Nothing
representIndentation _ _ _= MyErrorComponent Nothing
type Parser = StateT MyState (Parsec MyErrorComponent String)
trackState :: Parser a -> Parser a
trackState parser = do
result <- observing parser -- run parser but don't fail right away
case result of
Right x -> return x -- if it succeeds we're done here
Left ParseError {..} -> do
state <- get -- read the current state to add it to the error component
failure errorUnexpected errorExpected $
if Set.null errorCustom then Set.singleton (MyErrorComponent $ Just state) else errorCustom
In the above snipped, observing functions a bit like a try/catch block that catches a parse error, then reads the current state and adds the it to the custom error component. The custom error component in turn is returned when runParser returns a ParseError.
Here's a demonstration how this function could be used:
a = trackState $ do
put (MyState 6)
string "foo"
b = trackState $ do
put (MyState 5)
a
main = putStrLn (show $ runParser (runStateT b (MyState 0)) "" "bar")
In reality you would probably want to do something more clever (for instance I imagine you could also add the entire stack of states you go through while traversing the stack).
You could try sandwiching ParserT between two States, like
type MyParser a = StateT UserState (ParsecT Dec T.Text (State UsersState)) a
And write special-purpose put and modify operations that, after changing the outer state, copy the entire state into the inner State monad using put.
That way, even if parsing fails, you'll have the last "state before failure" available from the inner State monad.
I hit similar problem. I use default typing state:
type SubDefPos = Int
type SubDefName = String
data MyParserSt = MyParserSt {
subDefs :: [(SubDefPos, SubDefName)]
}
ParsecT Void String (StateT MyParserSt Identity) Expr
Every change to user state is supplied with the value of getOffset to be able to reject later if current position is less than position from the state.
Documentation for the parsec package states that u argument is used to carry some user state through monadic computation. But the same functionality can be achieved by basing ParsecT monad transformer on State monad. So if my parser is not stateful, i don't need u altogether, but have to set it to () with parsec. What's rationale for adding non-optional state support to ParsecT?
Because a parser of type ParsecT s () (State st) a behaves differently from a parser of type Parsec s st Identity a when it comes to backtracking:
User state resets when parsec tries an alternative after a failing parse that consumes no input.
But the underlying Monad m does not backtrack; all the effects that happened on the way to a final parse result are kept.
Consider the following example:
{-# LANGUAGE FlexibleContexts #-}
module Foo where
import Control.Applicative
import Control.Monad.State
import Text.Parsec.Prim hiding ((<|>), State(..))
import Text.Parsec.Error (ParseError)
tick :: MonadState Int m => ParsecT s Int m ()
tick = do
lift $ modify (+1)
modifyState (+1)
tickTock :: MonadState Int m => ParsecT s Int m ()
tickTock = (tick >> empty) <|> tick
-- | run a parser that has both user state and an underlying state monad.
--
-- Example:
-- >>> run tickTock
-- (Right 1,2)
run :: ParsecT String Int (State Int) () -> (Either ParseError Int, Int)
run m = runState (runParserT (m >> getState) initUserState "-" "") initStateState
where initUserState = 0
initStateState = 0
As you can see, the underlying state monad registered two ticks (from both alternatives that were tried),
while the user state of the Parsec monad transformer only kept the successful one.
ParsecT carries it's own state already: parsing position and input: http://haddocks.fpcomplete.com/fp/7.8/20140916-162/parsec/Text-Parsec-Prim.html#t:State
So as leftaroundabout pointed out, it's probably due optimisation purposes.
I'm surprised that I could not find any info on this. I must be the only person having any trouble with it.
So, let's say I have a dash counter. I want it to count the number of dashes in the string, and return the string. Pretend I gave an example that won't work using parsec's state handling. So this should work:
dashCounter = do
str <- many1 dash
count <- get
return (count,str)
dash = do
char '-'
modify (+1)
And indeed, this compiles. Okay, so I try to use it:
:t parse dashCounter "" "----"
parse dashCounter "" "----"
:: (Control.Monad.State.Class.MonadState
t Data.Functor.Identity.Identity,
Num t) =>
Either ParseError (t, [Char])
Okay, that makes sense. It should return the state and the string. Cool.
>parse dashCounter "" "----"
<interactive>:1:7:
No instance for (Control.Monad.State.Class.MonadState
t0 Data.Functor.Identity.Identity)
arising from a use of `dashCounter'
Possible fix:
add an instance declaration for
(Control.Monad.State.Class.MonadState
t0 Data.Functor.Identity.Identity)
In the first argument of `parse', namely `dashCounter'
In the expression: parse dashCounter "" "----"
In an equation for `it': it = parse dashCounter "" "----"
Oops. But then how could it have ever hoped to work in the first place? There's no way to input the initial state.
There is also a function:
>runPT dashCounter (0::Int) "" "----"
But it gives a similar error.
<interactive>:1:7:
No instance for (Control.Monad.State.Class.MonadState Int m0)
arising from a use of `dashCounter'
Possible fix:
add an instance declaration for
(Control.Monad.State.Class.MonadState Int m0)
In the first argument of `runPT', namely `dashCounter'
In the expression: runPT dashCounter (0 :: Int) "" "----"
In an equation for `it':
it = runPT dashCounter (0 :: Int) "" "----"
I feel like I should have to runState on it, or there should be a function that already does it internally, but I can't seem to figure out where to go from here.
Edit: I should have specified more clearly, I did not want to use parsec's state handling. The reason is I have a feeling I don't want its backtracking to affect what it collects with the problem I'm preparing to solve it with.
However, Mr. McCann has figured out how this should fit together and the final code would look like this:
dashCounter = do
str <- many1 dash
count <- get
return (count,str)
dash = do
c <- char '-'
modify (+1)
return c
test = runState (runPT dashCounter () "" "----------") 0
Thanks a lot.
You've actually got multiple problems going on here, all of which are relatively non-obvious the first time around.
Starting with the simplest: dash is returning (), which doesn't seem to be what you want given that you're collecting the results. You probably wanted something like dash = char '-' <* modify (+1). (Note that I'm using an operator from Control.Applicative here, because it looks tidier)
Next, clearing up a point of confusion: When you get the reasonable-looking type signature in GHCi, note the context of (Control.Monad.State.Class.MonadState t Data.Functor.Identity.Identity, Num t). That's not saying what things are, it's telling you want they need to be. Nothing guarantees that the instances it's asking for exist and, in fact, they don't. Identity is not a state monad!
On the other hand, you're absolutely correct in thinking that parse doesn't make sense; you can't use it here. Consider its type: Stream s Identity t => Parsec s () a -> SourceName -> s -> Either ParseError a. As is customary with monad transformers, Parsec is an synonym for ParsecT applied to the identity monad. And while ParsecT does provide user state, you apparently don't want to use it, and ParsecT does not give an instance of MonadState anyhow. Here's the only relevant instance: MonadState s m => MonadState s (ParsecT s' u m). In other words, to treat a parser as a state monad you have to apply ParsecT to some other state monad.
This sort of brings us to the next problem: Ambiguity. You're using a lot of type class methods and no type signatures, so you're likely to run into situations where GHC can't know what type you actually want, so you have to tell it.
Now, as a quick solution, let's first define a type synonym to give a name to the monad transformer stack we want:
type StateParse a = ParsecT String () (StateT Int Identity) a
Give dashCounter the relevant type signature:
dashCounter :: StateParse (Int, String)
dashCounter = do str <- many1 dash
count <- get
return (count,str)
And add a special-purpose "run" function:
runStateParse p sn inp count = runIdentity $ runStateT (runPT p () sn inp) count
Now, in GHCi:
Main> runStateParse dashCounter "" "---" 0
(Right (3,"---"),3)
Also, note that it's pretty common to use a newtype around a transformer stack instead of just a type synonym. This can help with the ambiguity issues in some cases, and obviously avoids ending up with gigantic type signatures.
If you want to use the user state component Parsec offers as a built-in feature, then you can use the getState and modifyState monadic functions.
I tried to stay true to your example program, though using the return of dash doesn't seem useful.
import Text.Parsec
dashCounter :: Parsec String Int (Int, [()])
dashCounter = do
str <- many1 dash
count <- getState
return (count,str)
dash :: Parsec String Int ()
dash = do
char '-'
modifyState (+1)
test = runP dashCounter 0 "" "---"
Note that runP is indeed addressing your concern about runState.
Whilst these answers sort out this specific problem, they ignore the more serious underlying issue with an approach like this. I would like to describe it here for anyone else looking at this answer.
There is a difference between the user state and using the StateT transformer. The internal user state is reset on backtracking but StateT is not. Consider the following code. We want to add one to our counter if there is a dash and two if there is a plus. They produce different results.
As can be seen both using the internal state and attaching a StateT transformer provide the correct result. The latter comes at the expense of having to explicitly lift operations and be much more careful with types.
import Text.Parsec hiding (State)
import Control.Monad.State
import Control.Monad.Identity
f :: ParsecT String Int Identity Int
f = do
try dash <|> plus
getState
dash = do
modifyState (+1)
char '-'
plus = do
modifyState (+2)
char '+'
f' :: ParsecT String () (State Int) ()
f' = void (try dash' <|> plus')
dash' = do
modify (+1)
char '-'
plus' = do
modify (+2)
char '+'
f'' :: StateT Int (Parsec String ()) ()
f'' = void (dash'' <|> plus'')
dash'' :: StateT Int (Parsec String ()) Char
dash'' = do
modify (+1)
lift $ char '-'
plus'' :: StateT Int (Parsec String ()) Char
plus'' = do
modify (+2)
lift $ char '+'
This is the result of running f, f' and f''.
*Main> runParser f 0 "" "+"
Right 2
*Main> flip runState 0 $ runPT f' () "" "+"
(Right (),3)
*Main> runParser (runStateT f'' 0) () "" "+"
Right ((),2)