This might look like a duplicate of this question, but either Parsec or the Indent library has changed since 2012, and none of the old examples I have found for the Indent library compile with the latest versions.
I want to make a parser for a programming language where indentation is part of the syntax (used to indicate scopes). To achieve this I want to use the Text.Parsec.Indent library, but I am at a loss on how to use it. It is clear to me that some modification or custom parser type is needed, but my limited knowledge of the State monad and surface-level understanding of Parsec do not seem to be enough.
Let's say you wanted to make a parser for a simple list of ints like below. How would one achieve this?
mylist
    fstitem
    snditem
My attempts to create a simple parser based on some of the old examples floating around on the internet looked like this, but it obviously produces some type errors:
import Control.Monad.State
import Text.Parsec hiding (State)
import Text.Parsec.Indent
import Text.Parsec.Pos
type IParser a = ParsecT String () (State SourcePos) a
parseInt :: IParser Integer
parseInt = read <$> many1 digit
parseIndentedInt :: IParser Integer
parseIndentedInt = indented *> parseInt
Specifically, these:
Frontend/Parser.hs:14:20: error:
• Couldn't match type ‘Control.Monad.Trans.Reader.ReaderT
Text.Parsec.Indent.Internal.Indentation m0’
with ‘StateT SourcePos Data.Functor.Identity.Identity’
Expected type: IParser Integer
Actual type: ParsecT String () (IndentT m0) Integer
• In the expression: indented *> parseInt
In an equation for ‘parseIndentedInt’:
parseIndentedInt = indented *> parseInt
|
14 | parseIndentedInt = indented *> parseInt
| ^^^^^^^^^^^^^^^^^^^^
Frontend/Parser.hs:14:32: error:
• Couldn't match type ‘StateT
SourcePos Data.Functor.Identity.Identity’
with ‘Control.Monad.Trans.Reader.ReaderT
Text.Parsec.Indent.Internal.Indentation m0’
Expected type: ParsecT String () (IndentT m0) Integer
Actual type: IParser Integer
• In the second argument of ‘(*>)’, namely ‘parseInt’
In the expression: indented *> parseInt
In an equation for ‘parseIndentedInt’:
parseIndentedInt = indented *> parseInt
|
14 | parseIndentedInt = indented *> parseInt
| ^^^^^^^^
Failed, no modules loaded.
Okay, after some deep diving into the source code and looking at the tests in the indents GitHub repository, I managed to create a working example.
The following code can parse a simple indented list:
import Text.Parsec as Parsec
import Text.Parsec.Indent as Indent
data ExampleList = ExampleList String [ExampleList]
  deriving (Eq, Show)
plistItem :: Indent.IndentParser String () String
plistItem = Parsec.many1 Parsec.lower <* Parsec.spaces
pList :: Indent.IndentParser String () ExampleList
pList = Indent.withPos (ExampleList <$> plistItem <*> Parsec.many (Indent.indented *> pList))
useParser :: Indent.IndentParser String () a -> String -> a
useParser p src = helper res
  where
    res = Indent.runIndent $ Parsec.runParserT (p <* Parsec.eof) () "<test>" src
    -- Include the ParseError in the message instead of discarding it.
    helper (Left err) = error ("Parse error: " ++ show err)
    helper (Right ok) = ok
Example usage:
*Main> useParser pList "mylist\n\tfstitem\n\tsnditem"
ExampleList "mylist" [ExampleList "fstitem" [],ExampleList "snditem" []]
Note that useParser unwraps the Either result (calling error on failure) and appends an end-of-file parser to the supplied parser. Depending on your application you might want to change this.
Additionally, the type signatures could be shortened with something like this:
type IParser a = Indent.IndentParser String () a
plistItem :: IParser String
pList :: IParser ExampleList
useParser :: IParser a -> String -> a
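For intuition about what withPos and indented are doing, here is a rough sketch of the same idea using only Parsec's user state to hold a reference column. This is not how the indents library is implemented (the error messages in the question show that it is built on a ReaderT); the names Tree, P, withRef, indentedMore and parseTree are made up for this illustration.

```haskell
import Text.Parsec

data Tree = Tree String [Tree]
  deriving (Eq, Show)

-- Assumption for this sketch: the user state holds the column of the
-- enclosing item.
type P a = Parsec String Int a

item :: P String
item = many1 lower <* spaces

-- Run p with the reference column set to the current column, then restore it.
withRef :: P a -> P a
withRef p = do
  old <- getState
  col <- sourceColumn <$> getPosition
  putState col
  x <- p
  putState old
  pure x

-- Succeed (consuming nothing) only if we are to the right of the reference column.
indentedMore :: P ()
indentedMore = do
  ref <- getState
  col <- sourceColumn <$> getPosition
  if col > ref then pure () else parserFail "not indented"

tree :: P Tree
tree = withRef (Tree <$> item <*> many (indentedMore *> tree))

parseTree :: String -> Either ParseError Tree
parseTree = runParser (tree <* eof) 0 "<test>"
```

With this sketch, parseTree "mylist\n\tfstitem\n\tsnditem" should give Right (Tree "mylist" [Tree "fstitem" [],Tree "snditem" []]), the same shape as the indents-based parser above. For real code the library version is preferable, since it also offers combinators such as block-level checks that this sketch does not attempt.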
I have a data type called EntrySearchableInfo, written like this:
type EntryDate = UTCTime -- From Data.Time
type EntryTag = Tag -- String
type EntryName = Name -- String
type EntryDescription = Description -- String
type EntryId = Int
data EntrySearchableInfo
  = SearchableEntryDate EntryDate
  | SearchableEntryTag EntryTag
  | SearchableEntryName EntryName
  | SearchableEntryDescription EntryDescription
  | SearchableEntryId EntryId
It basically represents things that make sense in a 'search' context.
I want to write a function with this type:
entrySearchableInfoParser :: Parser (Either String EntrySearchableInfo)
which (I think) will be a combination of several primitive Parser <Type> functions I have already written
entryDateParser :: Parser (Either String UTCTime)
entryDateParser = parseStringToUTCTime <$> strOption
  (long "date" <> short 'd' <> metavar "DATE" <> help entryDateParserHelp)
searchableEntryDateParser :: Parser (Either String EntrySearchableInfo)
searchableEntryDateParser = SearchableEntryDate <$$> entryDateParser -- <$$> is just (fmap . fmap)
searchableEntryTagParser :: Parser (Either String EntrySearchableInfo)
searchableEntryTagParser = ...
...
So I have two questions:
How do I combine those parsers to make the entrySearchableInfoParser function?
The EntrySearchableInfo type is part of a larger Entry type defined like this:
data Entry
  = Add EntryDate EntryInfo EntryTag EntryNote EntryId
  | Replace EntrySearchableInfo Entry
  | ...
...
I already have a function with type
entryAdd :: Parser (Either String Entry)
which constructs an Entry using Add.
But I'm not sure how to construct an Entry using Replace from entrySearchableInfoParser and entryAdd.
So combining those parsers was a lot simpler than I imagined.
I just had to use <|>:
entrySearchableInfoParser :: Parser (Either String EntrySearchableInfo)
entrySearchableInfoParser =
  searchableEntryDateParser
    <|> searchableEntryTagParser
    <|> searchableEntryNameParser
    <|> searchableEntryDescriptionParser
    <|> searchableEntryIdParser
Constructing an Entry using Replace from entrySearchableInfoParser and entryAdd was simple too:
entryAdd :: Parser (Either String Entry)
entryAdd = ...
entryReplace :: Parser (Either String Entry)
entryReplace = liftA2 Replace <$> entrySearchableInfoParser <*> entryAdd
Now it works perfectly!
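To see why liftA2 f <$> p <*> q works for two layers (an outer parser and an inner Either), here is a small sketch that uses Maybe as a stand-in for the outer Applicative, so it runs without optparse-applicative. The types Info and Entry and the function replaceWith are simplified, hypothetical versions of the ones in the question.

```haskell
import Control.Applicative (liftA2, (<|>))

-- Hypothetical, cut-down versions of the question's types.
data Info = ByName String | ById Int
  deriving (Eq, Show)

data Entry = Add String | Replace Info Entry
  deriving (Eq, Show)

-- liftA2 Replace combines the two inner Eithers; <$> and <*> combine the
-- outer layer. This works for any Applicative f, not just Maybe.
replaceWith
  :: Applicative f
  => f (Either String Info)
  -> f (Either String Entry)
  -> f (Either String Entry)
replaceWith info entry = liftA2 Replace <$> info <*> entry

-- Both layers succeed, so the result is a Replace.
demoOk :: Maybe (Either String Entry)
demoOk = replaceWith (Just (Right (ById 1))) (Just (Right (Add "x")))

-- The first inner Left wins, as usual for Either.
demoBad :: Maybe (Either String Entry)
demoBad = replaceWith (Just (Left "bad id")) (Just (Right (Add "x")))

-- <|> picks the first alternative that succeeds at the outer layer.
demoAlt :: Maybe (Either String Info)
demoAlt = Nothing <|> Just (Right (ByName "n"))
```

Note that <|> only looks at the outer layer: a parser that succeeds with a Left value still counts as a success, so later alternatives are not tried in that case.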
I can't understand what the type of (for example) eol means:
eol :: (MonadParsec e s m, Token s ~ Char) => m String
or, better, I don't understand how to use eol with Text.Megaparsec.Text and not Text.Megaparsec.String.
I've been trying to learn how to use Megaparsec by following the (old) tutorial for Parsec from Real World Haskell (I actually started reading the RWH tutorial before finding out that Megaparsec existed). I rewrote the code of the first example to use Megaparsec (see below). But I found that when I try to force the type of eol to Parser Text, the compiler throws the error: Couldn't match type ‘[Char]’ with ‘Text’. What I gather from this is that I cannot use eol with Text or, more likely, I don't know how to change that Token s ~ Char context from the eol declaration to use Token Text.
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE NoImplicitPrelude #-}
module CSVParser (
    module CSVParser
) where
import Foundation
import Data.Functor.Identity (Identity)
import Text.Megaparsec
import Text.Megaparsec.Text
import Data.Text
csvFile :: Parser [[Text]]
csvFile =
    do result <- many line
       eof
       return result
line :: Parser [Text]
line =
    do result <- cells
       --eol :: Parser Text -- uncommenting this line results in a compilation error
       eol
       return result
cells :: Parser [Text]
cells =
    do first <- cellContent
       next <- remainingCells
       return (first : next)
remainingCells =
    (char ',' >> cells)
    <|> return []
cellContent :: Parser Text
cellContent = fromList <$> many (noneOf [',','\n'])
parseCSV :: Text -> Either (ParseError (Token Text) Dec) [[Text]]
parseCSV = parse csvFile "(unknown)"
In the type:
eol :: (MonadParsec e s m, Token s ~ Char) => m String
the ~ is a type equality constraint, MonadParsec is a typeclass, and Token is a type family; both are defined by Megaparsec. They can roughly be interpreted as follows:
MonadParsec e s m is an assertion that type m is a monadic parser that reads a Stream of type s and represents errors using an ErrorComponent of type e
Token s is the underlying type of the tokens read from stream s
So, the full type can be interpreted as: eol is a monadic parser with "return value" String that parses a stream whose tokens are Char.
For your problem, most of this can be ignored. The issue you're running into is that eol returns a String value as the result of the parse, and a String isn't a Text, so you can't make an eol (which is of type Parser String) be of type Parser Text, no matter how hard you try.
Two solutions are to ignore the unwanted String return value or, if you need it as text, convert it:
Data.Text.pack <$> eol
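The same "fmap a conversion over the parser" pattern can be sketched with Parsec (not Megaparsec) reading a Text stream, which only needs the parsec and text packages. The names eolText, cell and row are made up for this example, and it is only meant to illustrate the conversion, not to replace the Megaparsec code in the question.

```haskell
import qualified Data.Text as T
import Text.Parsec
import Text.Parsec.Text (Parser)

-- Parsec's string returns a String even when the input stream is Text,
-- so the result is converted with T.pack.
eolText :: Parser T.Text
eolText = T.pack <$> (try (string "\r\n") <|> string "\n")

-- A cell is any run of characters up to a comma or line ending.
cell :: Parser T.Text
cell = T.pack <$> many (noneOf ",\r\n")

-- A row is one or more comma-separated cells followed by a line ending.
-- The Text returned by eolText is simply discarded by <*.
row :: Parser [T.Text]
row = (cell `sepBy1` char ',') <* eolText
```

For example, parse row "" (T.pack "a,b\n") should yield the two cells "a" and "b" as Text values. If the line ending itself is not needed, discarding it (as row does) avoids the conversion entirely.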
Is it possible to infer the type from many1?
MWE
module Main where
import System.Environment (getArgs)
import Text.ParserCombinators.Parsec
import Data.Either (rights)
type Vertex vertexWeight = (String, vertexWeight)
parseVertex :: Parser (Vertex a)
parseVertex = do
  name <- many1 (noneOf "/")
  char '/'
  weight <- many1 (noneOf "\n")
  return $ (name, weight)
main :: IO ()
main = do
  putStrLn $ rights $ [parse parseVertex "test" "a/2"]
In the above example, I'd like for the weight parameter to get outputted as an Int, but this does not type-check.
Would it be wiser to represent a vertex as (String, String) and define parsers for the weight?
The type Parser (Vertex a) is shorthand for forall a. Parser (Vertex a), i.e. its type states that for any choice of a, it can have type Parser (Vertex a). This is clearly not what you want: you want to say that parseVertex will always have type Parser (Vertex a) for some choice of a, but this choice is to be made by parseVertex, not at its call site.
What you should do is use a type T such that Parser (Vertex T) covers all possible return values of parseVertex. For example, if you use Parser (Vertex (Either Int String)), then parseVertex can choose based on the parse results so far if it will return something of the form (s, Left x), or (s, Right t), where s :: String, x :: Int and t :: String.
Of course, that also means that consumers of parseVertex now have to be able to handle both cases.
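As a minimal sketch of both options, here is one parser that fixes the weight type to Int and one that returns Either Int String. The helper names vertexName, intVertex and eitherVertex are invented for this example, and it imports Text.Parsec rather than the older Text.ParserCombinators.Parsec used in the question.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

type Vertex w = (String, w)

-- The vertex name is everything up to the slash; the slash is consumed.
vertexName :: Parser String
vertexName = many1 (noneOf "/") <* char '/'

-- The parser, not the caller, decides that the weight is an Int.
intVertex :: Parser (Vertex Int)
intVertex = (,) <$> vertexName <*> (read <$> many1 digit)

-- The weight is Left n if the rest of the line is all digits,
-- otherwise Right with the raw text.
eitherVertex :: Parser (Vertex (Either Int String))
eitherVertex = (,) <$> vertexName <*> weight
  where
    restOfLine = noneOf "\n"
    weight =
          Left . read <$> try (many1 digit <* notFollowedBy restOfLine)
      <|> Right <$> many1 restOfLine
```

Here parse intVertex "test" "a/2" should give Right ("a",2), while eitherVertex returns ("a",Left 2) for "a/2" and ("a",Right "heavy") for "a/heavy". The use of read is safe only because many1 digit guarantees a non-empty string of digits.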
Can someone help me understand how to use Applicative style for writing Parsec parsers? This is the code I have:
module Main where
import Control.Applicative hiding (many)
import Text.Parsec
import Data.Functor.Identity
data Cmd = A | B deriving (Show)
main = do
    line <- getContents
    putStrLn . show $ parseCmd line
parseCmd :: String -> Either ParseError String
parseCmd input = parse cmdParse "(parser)" input
cmdParse :: Parsec String () String
cmdParse = do
    slash <- char '/'
    whatever <- many alphaNum
    return (slash:whatever)
cmdParse2 :: String -> Parsec String () String
cmdParse2 = (:) <$> (char '/') <*> many alphaNum
But when I try to compile it, I get the following:
/home/tomasherman/Desktop/funinthesun.hs:21:13:
Couldn't match expected type `Parsec String () String'
with actual type `[a0]'
Expected type: a0 -> [a0] -> Parsec String () String
Actual type: a0 -> [a0] -> [a0]
In the first argument of `(<$>)', namely `(:)'
In the first argument of `(<*>)', namely `(:) <$> (char '/')'
Failed, modules loaded: none.
The idea is that I want cmdParse2 to do the same thing that cmdParse does, but using applicative stuff... My approach is probably completely wrong; I'm new to Haskell.
Your applicative usage is spot on, you just have an incorrect signature. Try:
cmdParse2 :: Parsec String () String
Your approach looks correct to me, the problem is that cmdParse2 has the wrong type. It should have the same type as cmdParse. By the way, you can omit the parens around char '/' in the applicative style parser.
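For reference, here are the two versions side by side with the corrected signature, as a small self-contained sketch (the parsers are the ones from the question; only the type of cmdParse2 and the redundant parentheses have changed):

```haskell
import Text.Parsec

-- Monadic style, as in the question.
cmdParse :: Parsec String () String
cmdParse = do
  slash <- char '/'
  whatever <- many alphaNum
  return (slash : whatever)

-- Applicative style: (:) is applied to the results of the two parsers
-- in sequence. Note that the type is the same as cmdParse's.
cmdParse2 :: Parsec String () String
cmdParse2 = (:) <$> char '/' <*> many alphaNum
```

Both parse cmdParse "(parser)" "/help" and parse cmdParse2 "(parser)" "/help" should return Right "/help", and both fail on input that does not start with a slash.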
I'm surprised that I could not find any info on this. I must be the only person having any trouble with it.
So, let's say I have a dash counter. I want it to count the number of dashes in the string, and return the string. Pretend I gave an example that won't work using parsec's state handling. So this should work:
dashCounter = do
  str <- many1 dash
  count <- get
  return (count,str)
dash = do
  char '-'
  modify (+1)
And indeed, this compiles. Okay, so I try to use it:
:t parse dashCounter "" "----"
parse dashCounter "" "----"
:: (Control.Monad.State.Class.MonadState
t Data.Functor.Identity.Identity,
Num t) =>
Either ParseError (t, [Char])
Okay, that makes sense. It should return the state and the string. Cool.
>parse dashCounter "" "----"
<interactive>:1:7:
No instance for (Control.Monad.State.Class.MonadState
t0 Data.Functor.Identity.Identity)
arising from a use of `dashCounter'
Possible fix:
add an instance declaration for
(Control.Monad.State.Class.MonadState
t0 Data.Functor.Identity.Identity)
In the first argument of `parse', namely `dashCounter'
In the expression: parse dashCounter "" "----"
In an equation for `it': it = parse dashCounter "" "----"
Oops. But then how could it have ever hoped to work in the first place? There's no way to input the initial state.
There is also a function:
>runPT dashCounter (0::Int) "" "----"
But it gives a similar error.
<interactive>:1:7:
No instance for (Control.Monad.State.Class.MonadState Int m0)
arising from a use of `dashCounter'
Possible fix:
add an instance declaration for
(Control.Monad.State.Class.MonadState Int m0)
In the first argument of `runPT', namely `dashCounter'
In the expression: runPT dashCounter (0 :: Int) "" "----"
In an equation for `it':
it = runPT dashCounter (0 :: Int) "" "----"
I feel like I should have to runState on it, or there should be a function that already does it internally, but I can't seem to figure out where to go from here.
Edit: I should have specified more clearly, I did not want to use parsec's state handling. The reason is I have a feeling I don't want its backtracking to affect what it collects with the problem I'm preparing to solve it with.
However, Mr. McCann has figured out how this should fit together and the final code would look like this:
dashCounter = do
  str <- many1 dash
  count <- get
  return (count,str)
dash = do
  c <- char '-'
  modify (+1)
  return c
test = runState (runPT dashCounter () "" "----------") 0
Thanks a lot.
You've actually got multiple problems going on here, all of which are relatively non-obvious the first time around.
Starting with the simplest: dash is returning (), which doesn't seem to be what you want given that you're collecting the results. You probably wanted something like dash = char '-' <* modify (+1). (Note that I'm using an operator from Control.Applicative here, because it looks tidier.)
Next, clearing up a point of confusion: When you get the reasonable-looking type signature in GHCi, note the context of (Control.Monad.State.Class.MonadState t Data.Functor.Identity.Identity, Num t). That's not saying what things are, it's telling you what they need to be. Nothing guarantees that the instances it's asking for exist and, in fact, they don't. Identity is not a state monad!
On the other hand, you're absolutely correct in thinking that parse doesn't make sense; you can't use it here. Consider its type: Stream s Identity t => Parsec s () a -> SourceName -> s -> Either ParseError a. As is customary with monad transformers, Parsec is a synonym for ParsecT applied to the identity monad. And while ParsecT does provide user state, you apparently don't want to use it, and ParsecT on its own does not give an instance of MonadState anyhow. Here's the only relevant instance: MonadState s m => MonadState s (ParsecT s' u m). In other words, to treat a parser as a state monad you have to apply ParsecT to some other state monad.
This sort of brings us to the next problem: Ambiguity. You're using a lot of type class methods and no type signatures, so you're likely to run into situations where GHC can't know what type you actually want, so you have to tell it.
Now, as a quick solution, let's first define a type synonym to give a name to the monad transformer stack we want:
type StateParse a = ParsecT String () (StateT Int Identity) a
Give dashCounter the relevant type signature:
dashCounter :: StateParse (Int, String)
dashCounter = do str <- many1 dash
                 count <- get
                 return (count,str)
And add a special-purpose "run" function:
runStateParse p sn inp count = runIdentity $ runStateT (runPT p () sn inp) count
Now, in GHCi:
Main> runStateParse dashCounter "" "---" 0
(Right (3,"---"),3)
Also, note that it's pretty common to use a newtype around a transformer stack instead of just a type synonym. This can help with the ambiguity issues in some cases, and obviously avoids ending up with gigantic type signatures.
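Putting the pieces of this answer together, a self-contained version might look like the sketch below. It assumes the mtl and parsec packages; the signatures for dash and runStateParse are not given above and are filled in here as one reasonable choice.

```haskell
import Control.Monad.State (StateT, get, modify, runStateT)
import Data.Functor.Identity (Identity, runIdentity)
import Text.Parsec

-- ParsecT layered over a state monad; the counter lives in StateT.
type StateParse a = ParsecT String () (StateT Int Identity) a

-- Parse one dash, then bump the counter. The counter is only touched
-- after char '-' has succeeded.
dash :: StateParse Char
dash = char '-' <* modify (+1)

dashCounter :: StateParse (Int, String)
dashCounter = do
  str <- many1 dash
  count <- get
  return (count, str)

-- Run the parser with an initial counter; returns the parse result
-- together with the final counter.
runStateParse :: StateParse a -> SourceName -> String -> Int -> (Either ParseError a, Int)
runStateParse p sn inp count = runIdentity (runStateT (runPT p () sn inp) count)
```

Loading this in GHCi and evaluating runStateParse dashCounter "" "---" 0 should reproduce the (Right (3,"---"),3) result shown above.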
If you want to use the user state component Parsec offers as a built-in feature, then you can use the getState and modifyState monadic functions.
I tried to stay true to your example program, though using the return of dash doesn't seem useful.
import Text.Parsec
dashCounter :: Parsec String Int (Int, [()])
dashCounter = do
  str <- many1 dash
  count <- getState
  return (count,str)
dash :: Parsec String Int ()
dash = do
  char '-'
  modifyState (+1)
test = runP dashCounter 0 "" "---"
Note that runP is indeed addressing your concern about runState.
Whilst these answers sort out this specific problem, they ignore the more serious underlying issue with an approach like this. I would like to describe it here for anyone else looking at this answer.
There is a difference between Parsec's user state and a StateT transformer underneath ParsecT. The internal user state is reset on backtracking, but the state of an underlying StateT is not. Consider the following code. We want to add one to our counter if there is a dash and two if there is a plus. The three versions produce different results.
As the output after the code shows, both using the internal state (f) and wrapping the parser in a StateT transformer (f'') give the correct result, whereas running ParsecT over State (f') keeps the increment from the failed dash' branch. The StateT-on-top version comes at the expense of having to explicitly lift operations and be much more careful with types.
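import Control.Applicative ((<|>))
import Control.Monad (void)
import Control.Monad.State
import Data.Functor.Identity (Identity)
-- Parsec's own (<|>) only works on ParsecT, so the Alternative version
-- is used instead; f'' needs it for StateT.
import Text.Parsec hiding (State, (<|>))
-- Parsec user state: reset when an alternative fails.
f :: ParsecT String Int Identity Int
f = do
  try dash <|> plus
  getState
dash = do
  modifyState (+1)
  char '-'
plus = do
  modifyState (+2)
  char '+'
-- State underneath ParsecT: not reset on backtracking.
f' :: ParsecT String () (State Int) ()
f' = void (try dash' <|> plus')
dash' = do
  modify (+1)
  char '-'
plus' = do
  modify (+2)
  char '+'
-- StateT on top of Parsec: each alternative starts from the same state.
f'' :: StateT Int (Parsec String ()) ()
f'' = void (dash'' <|> plus'')
dash'' :: StateT Int (Parsec String ()) Char
dash'' = do
  modify (+1)
  lift $ char '-'
plus'' :: StateT Int (Parsec String ()) Char
plus'' = do
  modify (+2)
  lift $ char '+'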
This is the result of running f, f' and f''.
*Main> runParser f 0 "" "+"
Right 2
*Main> flip runState 0 $ runPT f' () "" "+"
(Right (),3)
*Main> runParser (runStateT f'' 0) () "" "+"
Right ((),2)