Parsec and Applicative style - haskell

Can someone help me understand how to use Applicative style for writing Parsec parsers? This is the code I have:
module Main where
import Control.Applicative hiding (many)
import Text.Parsec
import Data.Functor.Identity
data Cmd = A | B deriving (Show)
main = do
    line <- getContents
    putStrLn . show $ parseCmd line
parseCmd :: String -> Either ParseError String
parseCmd input = parse cmdParse "(parser)" input
cmdParse :: Parsec String () String
cmdParse = do
    slash <- char '/'
    whatever <- many alphaNum
    return (slash:whatever)
cmdParse2 :: String -> Parsec String () String
cmdParse2 = (:) <$> (char '/') <*> many alphaNum
But when I try to compile it, I get the following:
/home/tomasherman/Desktop/funinthesun.hs:21:13:
Couldn't match expected type `Parsec String () String'
with actual type `[a0]'
Expected type: a0 -> [a0] -> Parsec String () String
Actual type: a0 -> [a0] -> [a0]
In the first argument of `(<$>)', namely `(:)'
In the first argument of `(<*>)', namely `(:) <$> (char '/')'
Failed, modules loaded: none.
The idea is that I want cmdParse2 to do the same thing that cmdParse does, but using applicative stuff... My approach is probably completely wrong; I'm new to Haskell.

Your applicative usage is spot on, you just have an incorrect signature. Try:
cmdParse2 :: Parsec String () String

Your approach looks correct to me, the problem is that cmdParse2 has the wrong type. It should have the same type as cmdParse. By the way, you can omit the parens around char '/' in the applicative style parser.
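For reference, here is a minimal sketch of the corrected parser as a standalone module. It assumes a reasonably recent GHC, where <$> and <*> come from the Prelude, so Control.Applicative does not need to be imported at all; parseCmd is wired to cmdParse2 here purely for illustration.

```haskell
import Text.Parsec (ParseError, Parsec, alphaNum, char, many, parse)

-- Same type as cmdParse: a parser value, not a function from String.
cmdParse2 :: Parsec String () String
cmdParse2 = (:) <$> char '/' <*> many alphaNum

-- Runs the applicative parser on an input string.
parseCmd :: String -> Either ParseError String
parseCmd = parse cmdParse2 "(parser)"
```

In GHCi, parseCmd "/abc1 rest" should yield Right "/abc1", since many alphaNum stops at the first non-alphanumeric character.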

Related

How to use the latest version of the Parsec.Indent library?

It might seem that this question is a duplicate of this question. However, either Parsec or the Indent library has changed since 2012, and none of the old examples I have found for the indent library compile with the latest versions.
I want to make a parser for a programming language where indentation is part of the syntax (used to indicate scopes). To achieve this I want to use the Text.Parsec.Indent library, but I am at a loss as to how to use it. It is clear to me that some modification or custom parser type is needed, but my limited knowledge of the State monad and surface-level understanding of Parsec do not seem to be enough.
Let's say you wanted to make a parser for a simple list of ints like below. How would one achieve this?
mylist
    fstitem
    snditem
My attempts to create a simple parser based on some of the old examples floating around on the internet looked like this, but it obviously produces some type errors:
import Control.Monad.State
import Text.Parsec hiding (State)
import Text.Parsec.Indent
import Text.Parsec.Pos
type IParser a = ParsecT String () (State SourcePos) a
parseInt :: IParser Integer
parseInt = read <$> many1 digit
parseIndentedInt :: IParser Integer
parseIndentedInt = indented *> parseInt
specifically these:
Frontend/Parser.hs:14:20: error:
• Couldn't match type ‘Control.Monad.Trans.Reader.ReaderT
Text.Parsec.Indent.Internal.Indentation m0’
with ‘StateT SourcePos Data.Functor.Identity.Identity’
Expected type: IParser Integer
Actual type: ParsecT String () (IndentT m0) Integer
• In the expression: indented *> parseInt
In an equation for ‘parseIndentedInt’:
parseIndentedInt = indented *> parseInt
|
14 | parseIndentedInt = indented *> parseInt
| ^^^^^^^^^^^^^^^^^^^^
Frontend/Parser.hs:14:32: error:
• Couldn't match type ‘StateT
SourcePos Data.Functor.Identity.Identity’
with ‘Control.Monad.Trans.Reader.ReaderT
Text.Parsec.Indent.Internal.Indentation m0’
Expected type: ParsecT String () (IndentT m0) Integer
Actual type: IParser Integer
• In the second argument of ‘(*>)’, namely ‘parseInt’
In the expression: indented *> parseInt
In an equation for ‘parseIndentedInt’:
parseIndentedInt = indented *> parseInt
|
14 | parseIndentedInt = indented *> parseInt
| ^^^^^^^^
Failed, no modules loaded.
Okay after some deep diving into the source code and looking at the tests in the indents GitHub repository I managed to create a working example.
The following code can parse a simple indented list:
import Text.Parsec as Parsec
import Text.Parsec.Indent as Indent
data ExampleList = ExampleList String [ExampleList]
    deriving (Eq, Show)
plistItem :: Indent.IndentParser String () String
plistItem = Parsec.many1 Parsec.lower <* Parsec.spaces
pList :: Indent.IndentParser String () ExampleList
pList = Indent.withPos (ExampleList <$> plistItem <*> Parsec.many (Indent.indented *> pList))
useParser :: Indent.IndentParser String () a -> String -> a
useParser p src = helper res
  where res = Indent.runIndent $ Parsec.runParserT (p <* Parsec.eof) () "<test>" src
        helper (Left err) = error "Parse error"
        helper (Right ok) = ok
Example usage:
*Main> useParser pList "mylist\n\tfstitem\n\tsnditem"
ExampleList "mylist" [ExampleList "fstitem" [],ExampleList "snditem" []]
Note that useParser also unwraps the result from the Either (calling error on a parse failure) and appends an end-of-file parser to the supplied parser. Depending on your application you might want to change this.
Additionally, the type signatures could be shortened with something like this:
type IParser a = Indent.IndentParser String () a
plistItem :: IParser String
pList :: IParser ExampleList
useParser :: IParser a -> String -> a

Parsec: intuit type from parsed string

Is it possible to infer the type from many1?
MWE
module Main where
import System.Environment (getArgs)
import Text.ParserCombinators.Parsec
import Data.Either (rights)
type Vertex vertexWeight = (String, vertexWeight)
parseVertex :: Parser (Vertex a)
parseVertex = do
    name <- many1 (noneOf "/")
    char '/'
    weight <- many1 (noneOf "\n")
    return $ (name, weight)
main :: IO ()
main = do
    putStrLn $ rights $ [parse parseVertex "test" "a/2"]
In the above example, I'd like for the weight parameter to get outputted as an Int, but this does not type-check.
Would it be wiser to represent a vertex as (String, String) and define parsers for the weight?
The type Parser (Vertex a) is shorthand for forall a. Parser (Vertex a), i.e. its type states that for any choice of a, it can have type Parser (Vertex a). This is clearly not what you want: you want to say that parseVertex will always have type Parser (Vertex a) for some choice of a, but this choice is to be made by parseVertex, not at its call site.
What you should do, is use a type T such that Parser (Vertex T) covers all possible return values of parseVertex. For example, if you use Parser (Vertex (Either Int String)), then parseVertex can choose based on the parse results so far if it will return something of the form (s, Left x), or (s, Right t), where s :: String, x :: Int and t :: String.
Of course, that also means that consumers of parseVertex now have to be able to handle both cases.
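A minimal sketch of the Either Int String idea might look like the following. The rule used to pick a branch is an assumption made for illustration: the weight counts as an Int only when digits fill the rest of the line. The helpers intWeight and textWeight are hypothetical names, not part of the question.

```haskell
import Text.Parsec (char, digit, many1, noneOf, notFollowedBy, parse, try, (<|>))
import Text.Parsec.String (Parser)

type Vertex vertexWeight = (String, vertexWeight)

-- The parser, not its caller, decides which alternative it produces.
parseVertex :: Parser (Vertex (Either Int String))
parseVertex = do
  name <- many1 (noneOf "/")
  _ <- char '/'
  weight <- try intWeight <|> textWeight
  return (name, weight)
  where
    -- Assumption: an Int weight is a run of digits reaching the end of the line.
    intWeight :: Parser (Either Int String)
    intWeight = Left . read <$> (many1 digit <* notFollowedBy (noneOf "\n"))
    -- Anything else is kept as raw text.
    textWeight :: Parser (Either Int String)
    textWeight = Right <$> many1 (noneOf "\n")
```

With this, parse parseVertex "test" "a/2" should give Right ("a",Left 2), while an input such as "a/2kg" falls back to the Right branch. If every weight is known to be an integer, the simpler option is to fix the type to Parser (Vertex Int) and use only the digit parser.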

Confusion about IO and do notation

I'm a beginner in Haskell and confused about this code I wrote
readRecords :: String -> [Either String Record]
readRecords path = do
    f <- B.readFile path
    map parseLogLine (C8.lines f)
But it gives me this error:
Main.hs:15:10:
Couldn't match type `IO' with `[]'
Expected type: [C8.ByteString]
Actual type: IO C8.ByteString
In the return type of a call of `B.readFile'
In a stmt of a 'do' block: f <- B.readFile path
In the expression:
do { f <- B.readFile path;
map parseLogLine (C8.lines f) }
parseLogLine's type signature is parseLogLine :: C8.ByteString -> Either String Record.
I'm completely surprised. B.readFile path should return IO ByteString and so f should be ByteString. C8.lines f should return [ByteString] and map should return [Either String Record].
Where am I wrong?
As a starting point, readRecords is defined with the wrong type. If the do block is to work in the IO monad, then it will produce an IO value, but you've defined it as returning [Either String Record] which is in the [] monad. That means that you can't call B.readFile which returns IO without triggering the type error you got.
Once you fix that, you'll find that the map expression on the last line of the do block has the wrong type because it produces [Either String Record] when it should produce IO [Either String Record]. Wrap the call in return to fix this.
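Putting both fixes together gives something like the sketch below. Record and parseLogLine are hypothetical stand-ins, since the question does not show their definitions; only the type and the last line of readRecords differ from the original.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C8

-- Hypothetical stand-in for the asker's Record type.
newtype Record = Record C8.ByteString
  deriving (Eq, Show)

-- Hypothetical stand-in: rejects empty lines, accepts everything else.
parseLogLine :: C8.ByteString -> Either String Record
parseLogLine l
  | C8.null l = Left "empty line"
  | otherwise = Right (Record l)

-- The result type now lives in IO, and the pure list is wrapped with return.
readRecords :: FilePath -> IO [Either String Record]
readRecords path = do
  f <- B.readFile path
  return (map parseLogLine (C8.lines f))
```

The last two lines could equally be written as map parseLogLine . C8.lines <$> B.readFile path, which avoids the do block entirely.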

I don't understand how to use the lexeme function

From Text.Parsec.Token:
lexeme p = do { x <- p; whiteSpace; return x }
It appears that lexeme takes a parser p and delivers a parser that has the same behavior as p, except that it also skips all the trailing whitespace. Correct?
Then how come the following does not work:
constant :: Parser Int
constant = do
digits <- many1 digit
return (read digits)
lexConst :: Parser Int
lexConst = lexeme constant
The last line results in the following error message:
Couldn't match expected type `ParsecT
String () Data.Functor.Identity.Identity Int'
with actual type `ParsecT s0 u0 m0 a0 -> ParsecT s0 u0 m0 a0'
Expected type: Parser Int
Actual type: ParsecT s0 u0 m0 a0 -> ParsecT s0 u0 m0 a0
In the return type of a call of `lexeme'
In the expression: lexeme constant
What am I doing wrong?
You misunderstood the documentation. The lexeme exported from Text.Parsec.Token is a field of a GenTokenParser s u m, so its type is
lexeme :: GenTokenParser s u m -> ParsecT s u m a -> ParsecT s u m a
and you haven't supplied the GenTokenParser argument in lexeme constant.
You need to create a GenTokenParser from a GenLanguageDef (typically with makeTokenParser) first to use its lexeme field.
The lexeme function is an accessor into a GenTokenParser record of parsers generated by makeTokenParser, so you need to apply it to such a record to get at it. One common way of doing this is to use record wildcards, e.g.
{-# LANGUAGE RecordWildCards #-}
import qualified Text.Parsec.Token as Tok
Tok.TokenParser { .. } = Tok.makeTokenParser {- language definition -}
This will bring lexeme and all the other parsers into scope already applied to the record, so you can use it like you were trying to do.
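If you would rather not use record wildcards, a minimal sketch of the explicit route is to build the token parser once and apply the lexeme field to it yourself. Using emptyDef as the language definition is an assumption made for illustration, and lexer and twoConstants are hypothetical names.

```haskell
import Text.Parsec (ParseError, digit, eof, many1, parse)
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

-- Token parser record built from an (assumed) empty language definition.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

-- lexeme with the record argument already supplied.
lexeme :: Parser a -> Parser a
lexeme = Tok.lexeme lexer

constant :: Parser Int
constant = read <$> many1 digit

-- Now type-checks: parses a constant and skips trailing whitespace.
lexConst :: Parser Int
lexConst = lexeme constant

-- Parses two whitespace-separated constants, e.g. "12   34 ".
twoConstants :: String -> Either ParseError (Int, Int)
twoConstants = parse ((,) <$> lexConst <*> lexConst <* eof) "(input)"
```

Because each lexConst consumes the whitespace after its digits, twoConstants "12   34 " should succeed with Right (12,34).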

How to use Control.Monad.State with Parsec?

I'm surprised that I could not find any info on this. I must be the only person having any trouble with it.
So, let's say I have a dash counter. I want it to count the number of dashes in the string, and return the string. Pretend I gave an example that won't work using parsec's state handling. So this should work:
dashCounter = do
    str <- many1 dash
    count <- get
    return (count,str)
dash = do
    char '-'
    modify (+1)
And indeed, this compiles. Okay, so I try to use it:
:t parse dashCounter "" "----"
parse dashCounter "" "----"
:: (Control.Monad.State.Class.MonadState
t Data.Functor.Identity.Identity,
Num t) =>
Either ParseError (t, [Char])
Okay, that makes sense. It should return the state and the string. Cool.
>parse dashCounter "" "----"
<interactive>:1:7:
No instance for (Control.Monad.State.Class.MonadState
t0 Data.Functor.Identity.Identity)
arising from a use of `dashCounter'
Possible fix:
add an instance declaration for
(Control.Monad.State.Class.MonadState
t0 Data.Functor.Identity.Identity)
In the first argument of `parse', namely `dashCounter'
In the expression: parse dashCounter "" "----"
In an equation for `it': it = parse dashCounter "" "----"
Oops. But then how could it have ever hoped to work in the first place? There's no way to input the initial state.
There is also a function:
>runPT dashCounter (0::Int) "" "----"
But it gives a similar error.
<interactive>:1:7:
No instance for (Control.Monad.State.Class.MonadState Int m0)
arising from a use of `dashCounter'
Possible fix:
add an instance declaration for
(Control.Monad.State.Class.MonadState Int m0)
In the first argument of `runPT', namely `dashCounter'
In the expression: runPT dashCounter (0 :: Int) "" "----"
In an equation for `it':
it = runPT dashCounter (0 :: Int) "" "----"
I feel like I should have to runState on it, or there should be a function that already does it internally, but I can't seem to figure out where to go from here.
Edit: I should have specified more clearly, I did not want to use parsec's state handling. The reason is I have a feeling I don't want its backtracking to affect what it collects with the problem I'm preparing to solve it with.
However, Mr. McCann has figured out how this should fit together and the final code would look like this:
dashCounter = do
    str <- many1 dash
    count <- get
    return (count,str)
dash = do
    c <- char '-'
    modify (+1)
    return c
test = runState (runPT dashCounter () "" "----------") 0
Thanks a lot.
You've actually got multiple problems going on here, all of which are relatively non-obvious the first time around.
Starting with the simplest: dash is returning (), which doesn't seem to be what you want given that you're collecting the results. You probably wanted something like dash = char '-' <* modify (+1). (Note that I'm using an operator from Control.Applicative here, because it looks tidier.)
Next, clearing up a point of confusion: When you get the reasonable-looking type signature in GHCi, note the context of (Control.Monad.State.Class.MonadState t Data.Functor.Identity.Identity, Num t). That's not saying what things are, it's telling you what they need to be. Nothing guarantees that the instances it's asking for exist and, in fact, they don't. Identity is not a state monad!
On the other hand, you're absolutely correct in thinking that parse doesn't make sense; you can't use it here. Consider its type: Stream s Identity t => Parsec s () a -> SourceName -> s -> Either ParseError a. As is customary with monad transformers, Parsec is a synonym for ParsecT applied to the identity monad. And while ParsecT does provide user state, you apparently don't want to use it, and ParsecT on its own does not give you an instance of MonadState anyhow. Here's the only relevant instance: MonadState s m => MonadState s (ParsecT s' u m). In other words, to treat a parser as a state monad you have to apply ParsecT to some other state monad.
This sort of brings us to the next problem: Ambiguity. You're using a lot of type class methods and no type signatures, so you're likely to run into situations where GHC can't know what type you actually want, so you have to tell it.
Now, as a quick solution, let's first define a type synonym to give a name to the monad transformer stack we want:
type StateParse a = ParsecT String () (StateT Int Identity) a
Give dashCounter the relevant type signature:
dashCounter :: StateParse (Int, String)
dashCounter = do str <- many1 dash
                 count <- get
                 return (count,str)
And add a special-purpose "run" function:
runStateParse p sn inp count = runIdentity $ runStateT (runPT p () sn inp) count
Now, in GHCi:
Main> runStateParse dashCounter "" "---" 0
(Right (3,"---"),3)
Also, note that it's pretty common to use a newtype around a transformer stack instead of just a type synonym. This can help with the ambiguity issues in some cases, and obviously avoids ending up with gigantic type signatures.
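Assembled into one module, the pieces above might look like the following sketch. The imports and the explicit signatures for dash and runStateParse are additions for illustration; everything else follows the definitions already given.

```haskell
import Control.Monad.State (StateT, get, modify, runStateT)
import Data.Functor.Identity (Identity, runIdentity)
import Text.Parsec (ParseError, ParsecT, char, many1, runPT)

-- ParsecT layered over a state monad, so get/modify are available via
-- the MonadState instance for ParsecT.
type StateParse a = ParsecT String () (StateT Int Identity) a

-- Parses one dash, bumps the counter, and returns the parsed character.
dash :: StateParse Char
dash = char '-' <* modify (+1)

dashCounter :: StateParse (Int, String)
dashCounter = do
  str <- many1 dash
  count <- get
  return (count, str)

-- Runs the parser, then the underlying state monad, from an initial count.
runStateParse :: StateParse a -> String -> String -> Int -> (Either ParseError a, Int)
runStateParse p sn inp count = runIdentity (runStateT (runPT p () sn inp) count)
```

Note the order of unwrapping: runPT produces a StateT action, and runStateT then supplies the initial counter.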
If you want to use the user state component Parsec offers as a built-in feature, then you can use the getState and modifyState monadic functions.
I tried to stay true to your example program, though using the return of dash doesn't seem useful.
import Text.Parsec
dashCounter :: Parsec String Int (Int, [()])
dashCounter = do
    str <- many1 dash
    count <- getState
    return (count,str)
dash :: Parsec String Int ()
dash = do
    char '-'
    modifyState (+1)
test = runP dashCounter 0 "" "---"
Note that runP is indeed addressing your concern about runState.
Whilst these answers sort out this specific problem, they ignore the more serious underlying issue with an approach like this. I would like to describe it here for anyone else looking at this answer.
There is a difference between Parsec's user state and a State monad underneath ParsecT. The internal user state is reset on backtracking, but state held in the underlying monad is not. Consider the following code. We want to add one to our counter if there is a dash and two if there is a plus. The three variants below do not all produce the same result.
As the output shows, both the internal user state (f) and a StateT transformer wrapped around the parser (f'') give the correct result, whereas ParsecT over State (f') keeps the increment made by the failed dash' branch. The StateT-on-top version comes at the expense of having to explicitly lift operations and be much more careful with types.
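-- (<|>) is taken from Control.Applicative so that it also works on StateT in f''
import Text.Parsec hiding (State, (<|>))
import Control.Applicative ((<|>))
import Control.Monad (void)
import Control.Monad.State
import Control.Monad.Identity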
f :: ParsecT String Int Identity Int
f = do
    try dash <|> plus
    getState
dash = do
    modifyState (+1)
    char '-'
plus = do
    modifyState (+2)
    char '+'
f' :: ParsecT String () (State Int) ()
f' = void (try dash' <|> plus')
dash' = do
    modify (+1)
    char '-'
plus' = do
    modify (+2)
    char '+'
f'' :: StateT Int (Parsec String ()) ()
f'' = void (dash'' <|> plus'')
dash'' :: StateT Int (Parsec String ()) Char
dash'' = do
    modify (+1)
    lift $ char '-'
plus'' :: StateT Int (Parsec String ()) Char
plus'' = do
    modify (+2)
    lift $ char '+'
This is the result of running f, f' and f''.
*Main> runParser f 0 "" "+"
Right 2
*Main> flip runState 0 $ runPT f' () "" "+"
(Right (),3)
*Main> runParser (runStateT f'' 0) () "" "+"
Right ((),2)
