I want to make a Haskell parser for some mathematical expressions. For that, I would like to use reservedOp to define the operators. I tried to find some info and I found this example:
reservedOp :: String -> CharParser st ()
reservedOp = PT.reservedOp lexer
I searched on Google, even on Hoogle :) but couldn't find any explanation for that st (). Can anyone explain in a few words what it means?
CharParser is just a type synonym. The main parser type, ParsecT, has a lot of type variables and a lot of functionality that is often unused. The full type is
newtype ParsecT s u m a = ...
s is the stream (input) type, u is the user state, m is the underlying monad, and a is the return value.
If none of that makes sense, you should read about monad transformers.
But that probably isn't important for understanding CharParser. You can follow the chain of type synonyms in GHCi:
>:i CharParser
type CharParser st = GenParser Char st
>:i GenParser
type GenParser tok st = Parsec [tok] st
>:i Parsec
type Parsec s u = ParsecT s u Identity
to discover that CharParser st () is just ParsecT [Char] st Identity (). That's a parser which operates on a stream of Char (aka a String), whose user state can be of any type, and which returns () (i.e. no useful result). The user state can only be an arbitrary type if the parser never inspects or modifies it. So st means pretty much nothing here; you could have written any of
reservedOp :: String -> CharParser () ()
reservedOp :: String -> CharParser Int ()
reservedOp :: String -> CharParser Bool ()
etc. If the user state is unused, it is customary to write CharParser () () to indicate that (some would say that is wrong and it should be CharParser Void (), where data Void is an uninhabited type, but that is just pedantry). In fact, the author in your link did that in most of their type signatures (e.g., factor :: CharParser () Double).
Since st is in lowercase, it's just a (type) variable name. Look up CharParser to see what it's all about.
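A minimal sketch of the point above, assuming the compatibility modules Text.ParserCombinators.Parsec.* from the parsec package (the PT prefix in the question suggests the Token module is imported qualified). The lexer built from emptyDef and the names plusWithUnit and plusWithInt are hypothetical; they only show that the polymorphic st in reservedOp can be instantiated with () or with Int.

```haskell
import Text.ParserCombinators.Parsec (CharParser, parse, runParser)
import Text.ParserCombinators.Parsec.Language (emptyDef)
import qualified Text.ParserCombinators.Parsec.Token as PT

-- Hypothetical lexer: emptyDef plus a few reserved operator names.
-- It is polymorphic in the user state st because emptyDef is.
lexer :: PT.TokenParser st
lexer = PT.makeTokenParser emptyDef { PT.reservedOpNames = ["+", "-", "*", "/"] }

-- Same signature as in the question: st is just a type variable.
reservedOp :: String -> CharParser st ()
reservedOp = PT.reservedOp lexer

-- reservedOp never touches the user state, so any st will do.
plusWithUnit :: CharParser () ()
plusWithUnit = reservedOp "+"

plusWithInt :: CharParser Int ()
plusWithInt = reservedOp "+"
```

For example, parse plusWithUnit "" "+  " and runParser plusWithInt 0 "" "+" both succeed with Right (), while the input "++" is rejected because reservedOp refuses an operator character directly after the operator name.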
Related
Documentation for the parsec package states that the u argument is used to carry some user state through a monadic computation. But the same functionality can be achieved by basing the ParsecT monad transformer on the State monad. So if my parser is not stateful, I don't need u at all, but I still have to set it to () with Parsec. What's the rationale for adding non-optional state support to ParsecT?
Because a parser of type ParsecT s () (State st) a behaves differently from a parser of type Parsec s st Identity a when it comes to backtracking:
User state resets when parsec tries an alternative after a failing parse that consumes no input.
But the underlying Monad m does not backtrack; all the effects that happened on the way to a final parse result are kept.
Consider the following example:
{-# LANGUAGE FlexibleContexts #-}
module Foo where
import Control.Applicative
import Control.Monad.State
import Text.Parsec.Prim hiding ((<|>), State(..))
import Text.Parsec.Error (ParseError)
tick :: MonadState Int m => ParsecT s Int m ()
tick = do
  lift $ modify (+1)
  modifyState (+1)
tickTock :: MonadState Int m => ParsecT s Int m ()
tickTock = (tick >> empty) <|> tick
-- | run a parser that has both user state and an underlying state monad.
--
-- Example:
-- >>> run tickTock
-- (Right 1,2)
run :: ParsecT String Int (State Int) () -> (Either ParseError Int, Int)
run m = runState (runParserT (m >> getState) initUserState "-" "") initStateState
  where initUserState = 0
        initStateState = 0
As you can see, the underlying state monad registered two ticks (from both alternatives that were tried),
while the user state of the Parsec monad transformer only kept the successful one.
ParsecT already carries its own state, namely the parsing position and the remaining input: http://haddocks.fpcomplete.com/fp/7.8/20140916-162/parsec/Text-Parsec-Prim.html#t:State
So, as leftaroundabout pointed out, it's probably for optimisation purposes.
From Text.Parsec.Token:
lexeme p = do { x <- p; whiteSpace; return x }
It appears that lexeme takes a parser p and delivers a parser that has the same behavior as p, except that it also skips all the trailing whitespace. Correct?
Then how come the following does not work:
constant :: Parser Int
constant = do
  digits <- many1 digit
  return (read digits)
lexConst :: Parser Int
lexConst = lexeme constant
The last line results in the following error message:
Couldn't match expected type `ParsecT
String () Data.Functor.Identity.Identity Int'
with actual type `ParsecT s0 u0 m0 a0 -> ParsecT s0 u0 m0 a0'
Expected type: Parser Int
Actual type: ParsecT s0 u0 m0 a0 -> ParsecT s0 u0 m0 a0
In the return type of a call of `lexeme'
In the expression: lexeme constant
What am I doing wrong?
You misunderstood the documentation: the lexeme exported from Text.Parsec.Token is a field of a GenTokenParser s u m, so its type is
lexeme :: GenTokenParser s u m -> ParsecT s u m a -> ParsecT s u m a
and you haven't supplied the GenTokenParser argument in lexeme constant.
You need to create a GenTokenParser from a GenLanguageDef (typically with makeTokenParser) first to use its lexeme field.
The lexeme function is an accessor into a GenTokenParser record of parsers generated by makeTokenParser, so you need to apply it to such a record to get at it. One common way of doing this is to use record wildcards, e.g.
{-# LANGUAGE RecordWildCards #-}
import qualified Text.Parsec.Token as Tok
Tok.TokenParser { .. } = Tok.makeTokenParser {- language definition -}
This will bring lexeme and all the other parsers into scope already applied to the record, so you can use it like you were trying to do.
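A small sketch of the makeTokenParser approach without record wildcards, assuming Parsec 3's Text.Parsec modules and using emptyDef as a placeholder language definition (a real grammar would customise it). The names lexer and numbers are illustrative.

```haskell
import Text.Parsec (digit, many, many1, parse)
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

-- Build the token-parser record once. emptyDef is a placeholder here;
-- a real grammar would customise the language definition.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

constant :: Parser Int
constant = do
  digits <- many1 digit
  return (read digits)

-- lexeme is a record field, so it is applied to lexer first,
-- and the resulting function is then applied to the parser.
lexConst :: Parser Int
lexConst = Tok.lexeme lexer constant

-- Illustrative use: several constants separated by whitespace.
numbers :: Parser [Int]
numbers = many lexConst
```

With this, parse numbers "" "12 34  5" should give Right [12,34,5], since each lexConst skips the whitespace that follows its digits.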
can someone help me to understand how to use Applicative style for writing Parsec parsers? This is the code i have:
module Main where
import Control.Applicative hiding (many)
import Text.Parsec
import Data.Functor.Identity
data Cmd = A | B deriving (Show)
main = do
  line <- getContents
  putStrLn . show $ parseCmd line
parseCmd :: String -> Either ParseError String
parseCmd input = parse cmdParse "(parser)" input
cmdParse :: Parsec String () String
cmdParse = do
  slash <- char '/'
  whatever <- many alphaNum
  return (slash:whatever)
cmdParse2 :: String -> Parsec String () String
cmdParse2 = (:) <$> (char '/') <*> many alphaNum
but when i try to compile it, i get following:
/home/tomasherman/Desktop/funinthesun.hs:21:13:
Couldn't match expected type `Parsec String () String'
with actual type `[a0]'
Expected type: a0 -> [a0] -> Parsec String () String
Actual type: a0 -> [a0] -> [a0]
In the first argument of `(<$>)', namely `(:)'
In the first argument of `(<*>)', namely `(:) <$> (char '/')'
Failed, modules loaded: none.
The idea is that I want cmdParse2 to do the same thing that cmdParse does, but in applicative style... My approach is probably completely wrong; I'm new to Haskell.
Your applicative usage is spot on, you just have an incorrect signature. Try:
cmdParse2 :: Parsec String () String
Your approach looks correct to me, the problem is that cmdParse2 has the wrong type. It should have the same type as cmdParse. By the way, you can omit the parens around char '/' in the applicative style parser.
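A self-contained sketch of the corrected parsers, assuming a GHC whose Prelude already exports <$> and <*> (base 4.8 or later), so Control.Applicative is not imported. cmdParse2 now has the same type as cmdParse and takes no String argument.

```haskell
import Text.Parsec (Parsec, alphaNum, char, many, parse)

-- do-notation version from the question
cmdParse :: Parsec String () String
cmdParse = do
  slash <- char '/'
  whatever <- many alphaNum
  return (slash : whatever)

-- Applicative version: same type as cmdParse, no function argument
cmdParse2 :: Parsec String () String
cmdParse2 = (:) <$> char '/' <*> many alphaNum
```

Both parsers return Right "/abc123" on the input "/abc123 rest", and both fail on input without a leading slash.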
I'm surprised that I could not find any info on this. I must be the only person having any trouble with it.
So, let's say I have a dash counter. I want it to count the number of dashes in the string, and return the string. Pretend I gave an example that won't work using parsec's state handling. So this should work:
dashCounter = do
  str <- many1 dash
  count <- get
  return (count,str)
dash = do
  char '-'
  modify (+1)
And indeed, this compiles. Okay, so I try to use it:
:t parse dashCounter "" "----"
parse dashCounter "" "----"
:: (Control.Monad.State.Class.MonadState
t Data.Functor.Identity.Identity,
Num t) =>
Either ParseError (t, [Char])
Okay, that makes sense. It should return the state and the string. Cool.
>parse dashCounter "" "----"
<interactive>:1:7:
No instance for (Control.Monad.State.Class.MonadState
t0 Data.Functor.Identity.Identity)
arising from a use of `dashCounter'
Possible fix:
add an instance declaration for
(Control.Monad.State.Class.MonadState
t0 Data.Functor.Identity.Identity)
In the first argument of `parse', namely `dashCounter'
In the expression: parse dashCounter "" "----"
In an equation for `it': it = parse dashCounter "" "----"
Oops. But then how could it have ever hoped to work in the first place? There's no way to input the initial state.
There is also a function:
>runPT dashCounter (0::Int) "" "----"
But it gives a similar error.
<interactive>:1:7:
No instance for (Control.Monad.State.Class.MonadState Int m0)
arising from a use of `dashCounter'
Possible fix:
add an instance declaration for
(Control.Monad.State.Class.MonadState Int m0)
In the first argument of `runPT', namely `dashCounter'
In the expression: runPT dashCounter (0 :: Int) "" "----"
In an equation for `it':
it = runPT dashCounter (0 :: Int) "" "----"
I feel like I should have to runState on it, or there should be a function that already does it internally, but I can't seem to figure out where to go from here.
Edit: I should have specified more clearly that I did not want to use Parsec's state handling. The reason is that I suspect I don't want its backtracking to affect what gets collected in the problem I'm preparing to solve with it.
However, Mr. McCann has figured out how this should fit together and the final code would look like this:
dashCounter = do
  str <- many1 dash
  count <- get
  return (count,str)
dash = do
  c <- char '-'
  modify (+1)
  return c
test = runState (runPT dashCounter () "" "----------") 0
Thanks a lot.
You've actually got multiple problems going on here, all of which are relatively non-obvious the first time around.
Starting with the simplest: dash is returning (), which doesn't seem to be what you want given that you're collecting the results. You probably wanted something like dash = char '-' <* modify (+1). (Note that I'm using an operator from Control.Applicative here, because it looks tidier.)
Next, clearing up a point of confusion: when you get the reasonable-looking type signature in GHCi, note the context of (Control.Monad.State.Class.MonadState t Data.Functor.Identity.Identity, Num t). That's not saying what things are, it's telling you what they need to be. Nothing guarantees that the instances it's asking for exist and, in fact, they don't. Identity is not a state monad!
On the other hand, you're absolutely correct in thinking that parse doesn't make sense; you can't use it here. Consider its type: Stream s Identity t => Parsec s () a -> SourceName -> s -> Either ParseError a. As is customary with monad transformers, Parsec is a synonym for ParsecT applied to the identity monad. And while ParsecT does provide user state, you apparently don't want to use it, and ParsecT does not give an instance of MonadState anyhow. Here's the only relevant instance: MonadState s m => MonadState s (ParsecT s' u m). In other words, to treat a parser as a state monad you have to apply ParsecT to some other state monad.
This sort of brings us to the next problem: Ambiguity. You're using a lot of type class methods and no type signatures, so you're likely to run into situations where GHC can't know what type you actually want, so you have to tell it.
Now, as a quick solution, let's first define a type synonym to give a name to the monad transformer stack we want:
type StateParse a = ParsecT String () (StateT Int Identity) a
Give dashCounter the relevant type signature:
dashCounter :: StateParse (Int, String)
dashCounter = do str <- many1 dash
                 count <- get
                 return (count,str)
And add a special-purpose "run" function:
runStateParse p sn inp count = runIdentity $ runStateT (runPT p () sn inp) count
Now, in GHCi:
Main> runStateParse dashCounter "" "---" 0
(Right (3,"---"),3)
Also, note that it's pretty common to use a newtype around a transformer stack instead of just a type synonym. This can help with the ambiguity issues in some cases, and obviously avoids ending up with gigantic type signatures.
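The pieces of this answer can be assembled into one file roughly as follows. This is a sketch that assumes the mtl package's Control.Monad.State; the variable names n and start replace count so they do not shadow Parsec's count combinator.

```haskell
import Control.Monad.State (StateT, get, modify, runStateT)
import Data.Functor.Identity (Identity, runIdentity)
import Text.Parsec (ParseError, ParsecT, SourceName, char, many1, runPT)

-- The transformer stack: Parsec on top of a State Int monad (no user state).
type StateParse a = ParsecT String () (StateT Int Identity) a

-- Parse one dash, then bump the counter in the underlying StateT.
-- modify works here via Parsec's MonadState instance for ParsecT.
dash :: StateParse Char
dash = char '-' <* modify (+1)

dashCounter :: StateParse (Int, String)
dashCounter = do
  str <- many1 dash
  n <- get
  return (n, str)

-- Run the parser, then run the underlying state from an initial count.
runStateParse :: StateParse a -> SourceName -> String -> Int -> (Either ParseError a, Int)
runStateParse p sn inp start = runIdentity $ runStateT (runPT p () sn inp) start
```

runStateParse dashCounter "" "---" 0 should give (Right (3,"---"),3), matching the GHCi session above.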
If you want to use the user state component Parsec offers as a built-in feature, then you can use the getState and modifyState monadic functions.
I tried to stay true to your example program, though using the return of dash doesn't seem useful.
import Text.Parsec
dashCounter :: Parsec String Int (Int, [()])
dashCounter = do
  str <- many1 dash
  count <- getState
  return (count,str)
dash :: Parsec String Int ()
dash = do
  char '-'
  modifyState (+1)
test = runP dashCounter 0 "" "---"
Note that runP addresses your concern about runState: its second argument (0 here) is the initial user state.
Whilst these answers sort out this specific problem, they ignore the more serious underlying issue with an approach like this. I would like to describe it here for anyone else looking at this answer.
There is a difference between the user state and using the StateT transformer. The internal user state is reset on backtracking, but a State monad underneath ParsecT is not. Consider the following code. We want to add one to our counter if there is a dash and two if there is a plus, so for the input "+" the correct count is 2. The three variants produce different results.
As the results below show, both the internal user state (f) and a StateT transformer layered on top of Parsec (f'') provide the correct result, while ParsecT layered on top of State (f') does not. The StateT version comes at the expense of having to explicitly lift operations and be much more careful with types.
import Text.Parsec hiding (State, (<|>))
-- Parsec's own (<|>) only works on ParsecT; the Alternative version also
-- works on StateT Int (Parsec String ()), which f'' needs.
import Control.Applicative ((<|>))
import Control.Monad (void)
import Control.Monad.State
import Control.Monad.Identity
f :: ParsecT String Int Identity Int
f = do
  try dash <|> plus
  getState
dash = do
  modifyState (+1)
  char '-'
plus = do
  modifyState (+2)
  char '+'
f' :: ParsecT String () (State Int) ()
f' = void (try dash' <|> plus')
dash' = do
  modify (+1)
  char '-'
plus' = do
  modify (+2)
  char '+'
f'' :: StateT Int (Parsec String ()) ()
f'' = void (dash'' <|> plus'')
dash'' :: StateT Int (Parsec String ()) Char
dash'' = do
  modify (+1)
  lift $ char '-'
plus'' :: StateT Int (Parsec String ()) Char
plus'' = do
  modify (+2)
  lift $ char '+'
This is the result of running f, f' and f''.
*Main> runParser f 0 "" "+"
Right 2
*Main> flip runState 0 $ runPT f' () "" "+"
(Right (),3)
*Main> runParser (runStateT f'' 0) () "" "+"
Right ((),2)
What does the constraint (Stream s Identity t) mean in the following type declaration?
parse :: (Stream s Identity t)
=> Parsec s () a -> SourceName -> s -> Either ParseError a
What is Stream in the following class declaration, what does it mean. I'm totally lost.
class Monad m => Stream s m t | s -> t where
When I use Parsec, I get into a jam with the type-signatures (xxx :: yyy) all the time. I always skip the signatures, load the src into ghci, and then copy the type-signature back to my .hs file. It works, but I still don't understand what all these signatures are.
EDIT: more about the point of my question.
I'm still confused about the 'context' of a type signature:
(Show a) =>
means a must be an instance of class Show.
(Stream s Identity t) =>
what's the meaning of this 'context', since t never appears after the =>?
I have many different parsers to run, so I wrote a wrapper function to run any of them on real files, but here comes the problem:
Here is my code. It cannot be loaded; how can I make it work?
module RunParse where
import System.IO
import Data.Functor.Identity (Identity)
import Text.Parsec.Prim (Parsec, parse, Stream)
--what should I write "runIOParse :: ..."
--runIOParse :: (Stream s Identity t, Show a) => Parsec s () a -> String -> IO ()
runIOParse pa filename =
  do
    inh <- openFile filename ReadMode
    outh <- openFile (filename ++ ".parseout") WriteMode
    instr <- hGetContents inh
    let result = show $ parse pa filename instr
    hPutStr outh result
    hClose inh
    hClose outh
the constraint: (Stream s Identity t) means what?
It means that the input s your parser works on (i.e. [Char]) must be an instance of the Stream class. In the documentation you see that [Char] is indeed an instance of Stream, since any list is.
The parameter t is the token type, which is normally Char and is determined by s, as stated by the functional dependency s -> t.
But don't worry too much about this Stream typeclass. It's used only to have a unified interface for any Stream-like type, e.g. lists or ByteStrings.
what is Stream
A Stream is simply a typeclass. It has the uncons function, which returns the head of the input and the tail in a tuple wrapped in Maybe. Normally you won't need this function. As far as I can see, it's only needed in the most basic parsers like tokenPrimEx.
Edit:
what's the meaning of this 'context', since t never showed after the =>
Have a look at functional dependencies. The t never shows up after the =>, because it is determined by s.
And it means that you can use uncons on whatever s is.
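As a small illustration of the functional dependency, here is a hypothetical helper, firstToken, that relies only on the Stream constraint. It assumes FlexibleContexts (needed because the concrete type Identity appears in the constraint) and takes uncons from Text.Parsec.Prim. Its result type mentions t, but callers never choose t themselves: GHC determines it from s.

```haskell
{-# LANGUAGE FlexibleContexts #-}

import Data.Functor.Identity (Identity, runIdentity)
import Text.Parsec.Prim (Stream (..))

-- Hypothetical helper: peek at the first token of any Parsec stream.
-- t does not appear in the argument type, yet it is fixed by s via s -> t.
firstToken :: Stream s Identity t => s -> Maybe t
firstToken input = fmap fst (runIdentity (uncons input))
```

So firstToken "abc" is Just 'a' (token type Char), and for a list of Int the token type is Int.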
Here is my code, It cannot be loaded, how can I make it work?
Simple: Add an import statement for Text.Parsec.String, which defines the missing instance for Stream [tok] m tok. The documentation could be a bit clearer here, because it looks as if this instance was defined in Text.Parsec.Prim.
Alternatively import the whole Parsec library (import Text.Parsec) - this is how I always do it.
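Regarding the signature the question asks for: hGetContents (like readFile) produces a String, so the stream type cannot be left as an arbitrary s here, and a signature with a free s would be rejected. A sketch with the stream fixed to String, assuming Parsec 3's Text.Parsec; renderParse and number are hypothetical helpers, and readFile/writeFile stand in for the explicit handles for brevity.

```haskell
import Text.Parsec (Parsec, digit, many1, parse)

-- Pure part: run the parser on text that has already been read.
renderParse :: Show a => Parsec String () a -> FilePath -> String -> String
renderParse pa filename input = show (parse pa filename input)

-- The input comes from readFile, so the stream type is String rather than
-- an arbitrary s. The output goes to <filename>.parseout as in the question.
runIOParse :: Show a => Parsec String () a -> FilePath -> IO ()
runIOParse pa filename = do
  input <- readFile filename
  writeFile (filename ++ ".parseout") (renderParse pa filename input)

-- Hypothetical parser to try the wrapper with.
number :: Parsec String () String
number = many1 digit
```

For instance, renderParse number "in.txt" "123" is the string Right "123", which is exactly what runIOParse would write to in.txt.parseout for a file containing 123.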
The Stream type class is an abstraction for list-like data structures. Early versions of Parsec only worked for parsing lists of tokens (for example, String is a synonym for [Char], so Char is the token type), which can be a very inefficient representation. These days, most substantial input in Haskell is handled as Text or ByteString types, which are not lists, but can act a lot like them.
So, for example, you mention
parse :: (Stream s Identity t)
=> Parsec s () a -> SourceName -> s -> Either ParseError a
Some specializations of this type would be
parse1 :: Parsec String () a -> SourceName -> String -> Either ParseError a
parse2 :: Parsec Text () a -> SourceName -> Text -> Either ParseError a
parse3 :: Parsec ByteString () a -> SourceName -> ByteString -> Either ParseError a
or even, if you have a separate lexer with a token type MyToken:
parse4 :: Parsec [MyToken] () a -> SourceName -> [MyToken] -> Either ParseError a
Of these, only the first and last use actual lists for the input, but the middle two use other Stream instances that act enough like lists for Parsec to work with them.
You can even declare your own Stream instance, so if your input is in some other type that acts sort of list-like, you can write an instance, implement the uncons function, and Parsec will work with your type, as well.
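A minimal sketch of such an instance, assuming a hypothetical Chunks type that stores text as a list of string pieces. Only uncons has to be written; existing Char parsers such as many1 letter and string then work on it unchanged.

```haskell
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}

import Text.Parsec (Parsec, eof, letter, many1, parse, string)
import Text.Parsec.Prim (Stream (..))

-- Hypothetical list-like input: text stored as a list of string pieces.
newtype Chunks = Chunks [String]

-- uncons hands Parsec the next character, skipping empty pieces.
instance Monad m => Stream Chunks m Char where
  uncons (Chunks []) = return Nothing
  uncons (Chunks ([] : rest)) = uncons (Chunks rest)
  uncons (Chunks ((c : cs) : rest)) = return (Just (c, Chunks (cs : rest)))

-- Ordinary Char parsers work on Chunks without modification.
word :: Parsec Chunks () String
word = many1 letter
```

For example, parse word "" (Chunks ["he", "", "llo", " world"]) should return Right "hello", reading across the piece boundaries.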