Is it possible to infer the type from many1?
MWE
module Main where
import System.Environment (getArgs)
import Text.ParserCombinators.Parsec
import Data.Either (rights)
type Vertex vertexWeight = (String, vertexWeight)
parseVertex :: Parser (Vertex a)
parseVertex = do
name <- many1 (noneOf "/")
char '/'
weight <- many1 (noneOf "\n")
return $ (name, weight)
main :: IO ()
main = do
putStrLn $ rights $ [parse parseVertex "test" "a/2"]
In the above example, I'd like for the weight parameter to get outputted as an Int, but this does not type-check.
Would it be wiser to represent a vertex as (String, String) and define parsers for the weight?
The type Parser (Vertex a) is shorthand for forall a. Parser (Vertex a), i.e. its type states that for any choice of a, it can have type Parser (Vertex a). This is clearly not what you want: you want to say that parseVertex will always have type Parser (Vertex a) for some choice of a, but this choice is to be made by parseVertex, not at its call site.
What you should do, is use a type T such that Parser (Vertex T) covers all possible return values of parseVertex. For example, if you use Parser (Vertex (Either Int String)), then parseVertex can choose based on the parse results so far if it will return something of the form (s, Left x), or (s, Right t), where s :: String, x :: Int and t :: String.
Of course, that also means that consumers of parseVertex now have to be able to handle both cases.
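As a minimal sketch of the Either Int String idea (not the only way to do it): the parser below reads the text after the slash and decides for itself whether it is an Int. The use of readMaybe from Text.Read to make that decision is my own assumption; the question does not say how weights should be recognised.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Read (readMaybe)

type Vertex vertexWeight = (String, vertexWeight)

-- The weight is Left n when the text after '/' reads as an Int,
-- and Right s (the raw text) otherwise. The choice is made here,
-- inside the parser, not by the caller.
parseVertex :: Parser (Vertex (Either Int String))
parseVertex = do
  name <- many1 (noneOf "/")
  _    <- char '/'
  raw  <- many1 (noneOf "\n")
  let weight = maybe (Right raw) Left (readMaybe raw)
  return (name, weight)
```

With this, parse parseVertex "test" "a/2" produces a vertex whose weight is Left 2, while an input such as "a/heavy" produces Right "heavy", and the caller pattern-matches on the two cases.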
I have some input data of the form below (this is just a small sample).
ID_SID_0_LANG=eng
ID_VIDEO_FORMAT=H264
ID_VIDEO_HEIGHT=574
ID_START_TIME=0.00
ID_SUBTITLE_ID=0
ID_VIDEO_ID=0
ID_VIDEO_FPS=25.000
ID_VIDEO_WIDTH=700
I'm trying to see if I can parse this with Parsec. For the sake of our example, I want to pull out two values, the width and the height.
The lines may come in any order
If either the width or the height is missing, I'd like a ParseError
If either the width or the height occurs more than once, I'd like a ParseError
The other lines are mixed and varied per input, I can assume nothing beyond their basic format.
I'd like to use Parsec because I'm going to have to parse the values (which, in general, may be of different types - enumerations for codecs, elapsed types, strings, etc.). And I'd like my returned data structure to contain Naturals rather than, say, Maybe Natural, to simplify later code.
My problem is how to "parse" the leading ID_ lines that aren't interesting to me, but pick up only those that are. So I want to parse "any number of uninteresting ID_ lines; a height (or width); any number of uninteresting ID_ lines; a width (or height if width already found); any number of uninteresting ID_ lines". And I'd like to do this without repeating the notion of what constitutes an "interesting" key, because repetition is a primary cause of subtle errors during later maintenance.
My best effort so far is to parse lines producing a list of Data Structure Modifiers for the interesting lines, each with a Key, and separately checking for presence of the required lines and lack of duplication of the unique lines; but that's not satisfying because I'm repeating the "interesting" keys.
Can this be elegantly done with Parsec?
Thanks,
Given that you want an "elegant" Parsec solution, I think you're looking for a variant of a permutation parser.
For background reading, see the documentation for Text.Parsec.Perm and its more modern incarnation in module Control.Applicative.Permutation of the parser-combinators library. In addition, this Functional Pearl paper Parsing Permutation Phrases describes the approach and is great fun to read.
There are two special aspects to your problem: First, I'm not aware of an existing permutation parser that allows for "unmatched" content before, between and after matched portions in a clean manner, and hacks like building the skip logic into the component parsers or deriving an extra parser to identify skippable lines for use in intercalateEffect from Control.Applicative.Permutation seem ugly. Second, the special structure of your input -- the fact that the lines can be recognized by the identifier rather than only general component parsers -- means that we can write a more efficient solution than a usual permutation parser, one that looks up identifiers in a map instead of trying a list of parsers in sequence.
Below is a possible solution. On the one hand, it's using a sledgehammer to kill a fly. In your simple situation, writing an ad hoc parser to read in the identifiers and their RHSs, check for required identifiers and duplicates, and then invoke identifier-specific parsers for the RHSs, seems more straightforward. On the other hand, maybe there are more complicated scenarios where the solution below would be justified, and I think it's conceivable it might be useful to others.
Anyway, here's the idea. First, some preliminaries:
{-# OPTIONS_GHC -Wall #-}
module ParseLines where
import Control.Applicative
import Control.Monad
import Data.List (intercalate)
import Text.Parsec (unexpected, eof, parseTest)
import Text.Parsec.Char (char, letter, alphaNum, noneOf, newline, digit)
import Text.Parsec.String (Parser)
import qualified Data.Map.Lazy as Map
import qualified Data.Set as Set
Let's say we have a data type representing the final result of the parse:
data Video = Video
{ width :: Int
, height :: Int
} deriving (Show)
We're going to construct a Permutation a parser. The type a is what we're going to eventually return (and in this case, it's always Video). This Permutation will actually be a Map from "known" identifiers like ID_VIDEO_WIDTH to a special kind of parser that will parse the right-hand side for the given identifier (e.g., an integer like 700) and then return -- not the parsed integer -- but a continuation Permutation a that parses the remaining data to construct a Video, with the parsed integer (e.g., 700) "baked in" to the continuation. The continuation will have a map that recognizes the "remaining" values, and we'll also keep track of known identifiers we've already read to flag duplicates.
We'll use the following type:
type Identifier = String
data Permutation a = Permutation
-- "seen" identifiers for flagging duplicates
(Set.Set Identifier)
(Either
-- if there are more values to read, map identifier to a parser
-- that parses RHS and returns continuation for parsing the rest
(Map.Map Identifier (Parser (Permutation a)))
-- or we're ready for an eof and can return the final value
a)
"Running" such a parser involves converting it to a plain Parser, and this is where we implement the logic for identifying recognized lines, flagging duplicates, and skipping unrecognized identifiers. First, here's a parser for identifiers. If you wanted to be more lenient, you could use many1 (noneOf "\n=") or something.
ident :: Parser String
ident = (:) <$> letter' <*> many alphaNum'
where letter' = letter <|> underscore
alphaNum' = alphaNum <|> underscore
underscore = char '_'
and here's a parser for skipping the rest of a line when we see an unrecognized identifier:
skipLine :: Parser ()
skipLine = void $ many (noneOf "\n") >> newline
Finally, here's how we run the Permutation parser:
runPermutation :: Permutation a -> Parser a
runPermutation p@(Permutation seen e)
= -- if end of file, return the final answer (or error)
eof *>
case e of
Left m -> fail $
"eof before " ++ intercalate ", " (Map.keys m)
Right a -> return a
<|>
-- otherwise, parse the identifier
do k <- ident <* char '='
-- is it one we're waiting for?
case either (Map.lookup k) (const Nothing) e of
-- no, it's not, so check for duplicates and skip
Nothing -> if Set.member k seen
then unexpected ("duplicate " ++ k)
else skipLine *> runPermutation p
-- yes, it is
Just prhs -> do
-- parse the RHS to get a continuation Permutation
-- and run it to parse rest of parameters
(prhs <* newline) >>= runPermutation
To see how this is supposed to work, here's how we would directly construct a Permutation to parse a Video. It's long, but not that complicated:
perm2 :: Permutation Video
perm2 = Permutation
-- nothing's been seen yet
Set.empty
-- parse width or height
$ Left (Map.fromList
[ ("ID_VIDEO_WIDTH", do
-- parse the width
w <- int
-- return a continuation permutation
return $ Permutation
-- we've seen width
(Set.fromList ["ID_VIDEO_WIDTH"])
-- parse height
$ Left (Map.fromList
[ ("ID_VIDEO_HEIGHT", do
-- parse the height
h <- int
-- return a continuation permutation
return $ Permutation
-- we've seen them all
(Set.fromList ["ID_VIDEO_WIDTH", "ID_VIDEO_HEIGHT"])
-- have all parameters, so eof returns the video
$ Right (Video w h))
]))
-- similarly for other permutation:
, ("ID_VIDEO_HEIGHT", do
h <- int
return $ Permutation
(Set.fromList ["ID_VIDEO_HEIGHT"])
$ Left (Map.fromList
[ ("ID_VIDEO_WIDTH", do
w <- int
return $ Permutation
(Set.fromList ["ID_VIDEO_WIDTH", "ID_VIDEO_HEIGHT"])
$ Right (Video w h))
]))
])
int :: Parser Int
int = read <$> some digit
You can test it like so:
testdata1 :: String
testdata1 = unlines
[ "ID_SID_0_LANG=eng"
, "ID_VIDEO_FORMAT=H264"
, "ID_VIDEO_HEIGHT=574"
, "ID_START_TIME=0.00"
, "ID_SUBTITLE_ID=0"
, "ID_VIDEO_ID=0"
, "ID_VIDEO_FPS=25.000"
, "ID_VIDEO_WIDTH=700"
]
test1 :: IO ()
test1 = parseTest (runPermutation perm2) testdata1
You should be able to verify that it provides appropriate errors for missing keys, duplicate entries for known keys, and accepts keys in any order.
Finally, we obviously don't want to construct permutation parsers like perm2 manually, so we take a page from the Text.Parsec.Perm module and introduce the following syntax:
video :: Parser Video
video = runPermutation (Video <$$> ("ID_VIDEO_WIDTH", int) <||> ("ID_VIDEO_HEIGHT", int))
and define operators to construct the necessary Permutation objects. These definitions are a little tricky, but they follow pretty directly from the definition of Permutation.
(<$$>) :: (a -> b) -> (Identifier, Parser a) -> Permutation b
f <$$> xq = Permutation Set.empty (Right f) <||> xq
infixl 2 <$$>
(<||>) :: Permutation (a -> b) -> (Identifier, Parser a) -> Permutation b
p@(Permutation seen e) <||> (x, q)
= Permutation seen (Left (Map.insert x q' m'))
where
q' = (\a -> addQ x a p) <$> q
m' = case e of Right _ -> Map.empty
Left m -> Map.map (fmap (<||> (x, q))) m
infixl 1 <||>
addQ :: Identifier -> a -> Permutation (a -> b) -> Permutation b
addQ x a (Permutation seen e)
= Permutation (Set.insert x seen) $ case e of
Right f -> Right (f a)
Left m -> Left (Map.map (fmap (addQ x a)) m)
and the final test:
test :: IO ()
test = parseTest video testdata1
giving:
> test
Video {width = 700, height = 574}
>
Here's the final code, slightly rearranged:
{-# OPTIONS_GHC -Wall #-}
module ParseLines where
import Control.Applicative
import Control.Monad
import Data.List (intercalate)
import Text.Parsec (unexpected, eof, parseTest)
import Text.Parsec.Char (char, letter, alphaNum, noneOf, newline, digit)
import Text.Parsec.String (Parser)
import qualified Data.Map.Lazy as Map
import qualified Data.Set as Set
-- * Permutation parser for identifier settings
-- | General permutation parser for a type @a@.
data Permutation a = Permutation
-- | "Seen" identifiers for flagging duplicates
(Set.Set Identifier)
-- | Either map of continuation parsers for more identifiers or a
-- final value once we see eof.
(Either (Map.Map Identifier (Parser (Permutation a))) a)
-- | Create a one-identifier 'Permutation' from a 'Parser'.
(<$$>) :: (a -> b) -> (Identifier, Parser a) -> Permutation b
f <$$> xq = Permutation Set.empty (Right f) <||> xq
infixl 2 <$$>
-- | Add a 'Parser' to a 'Permutation'.
(<||>) :: Permutation (a -> b) -> (Identifier, Parser a) -> Permutation b
p@(Permutation seen e) <||> (x, q)
= Permutation seen (Left (Map.insert x q' m'))
where
q' = (\a -> addQ x a p) <$> q
m' = case e of Right _ -> Map.empty
Left m -> Map.map (fmap (<||> (x, q))) m
infixl 1 <||>
-- | Helper to add a parsed component to a 'Permutation'.
addQ :: Identifier -> a -> Permutation (a -> b) -> Permutation b
addQ x a (Permutation seen e)
= Permutation (Set.insert x seen) $ case e of
Right f -> Right (f a)
Left m -> Left (Map.map (fmap (addQ x a)) m)
-- | Convert a 'Permutation' to a 'Parser' that detects duplicates
-- and skips unknown identifiers.
runPermutation :: Permutation a -> Parser a
runPermutation p@(Permutation seen e)
= -- if end of file, return the final answer (or error)
eof *>
case e of
Left m -> fail $
"eof before " ++ intercalate ", " (Map.keys m)
Right a -> return a
<|>
-- otherwise, parse the identifier
do k <- ident <* char '='
-- is it one we're waiting for?
case either (Map.lookup k) (const Nothing) e of
-- no, it's not, so check for duplicates and skip
Nothing -> if Set.member k seen
then unexpected ("duplicate " ++ k)
else skipLine *> runPermutation p
-- yes, it is
Just prhs -> do
-- parse the RHS to get a continuation Permutation
-- and run it to parse rest of parameters
(prhs <* newline) >>= runPermutation
-- | Left-hand side of a setting.
type Identifier = String
-- | Parse an 'Identifier'.
ident :: Parser Identifier
ident = (:) <$> letter' <*> many alphaNum'
where letter' = letter <|> underscore
alphaNum' = alphaNum <|> underscore
underscore = char '_'
-- | Skip (rest of) a line.
skipLine :: Parser ()
skipLine = void $ many (noneOf "\n") >> newline
-- * Parsing video information
-- | Our video data.
data Video = Video
{ width :: Int
, height :: Int
} deriving (Show)
-- | Parsing integers (RHS of width and height settings)
int :: Parser Int
int = read <$> some digit
-- | Some test data
testdata1 :: String
testdata1 = unlines
[ "ID_SID_0_LANG=eng"
, "ID_VIDEO_FORMAT=H264"
, "ID_VIDEO_HEIGHT=574"
, "ID_START_TIME=0.00"
, "ID_SUBTITLE_ID=0"
, "ID_VIDEO_ID=0"
, "ID_VIDEO_FPS=25.000"
, "ID_VIDEO_WIDTH=700"
]
-- | `Video` parser based on `Permutation`.
video :: Parser Video
video = runPermutation (Video <$$> ("ID_VIDEO_WIDTH", int) <||> ("ID_VIDEO_HEIGHT", int))
-- | The final test.
test :: IO ()
test = parseTest video testdata1
Indeed a simple solution would be to parse the file into Map ByteString ByteString, checking for duplicates while parsing, and then build the target result from that, checking that all required fields are present.
parseMap :: Parsec (Map ByteString ByteString)
-- ...
parseValues :: Map ByteString ByteString -> Parsec MyDataStructure
-- ...
Function parseValues can use Parsec again to parse the fields (perhaps using runP on each one) and to report errors or missing fields.
The disadvantage of this solution is that parsing is done on two levels (once to get the ByteStrings and a second time to parse them), and that this way we can't correctly report the position of errors found in parseValues. However, Parsec lets you get and set the current source position, so it might be feasible to include the positions in the map and then use them when parsing the individual strings:
parseMap :: Parsec (Map ByteString (SourcePos, ByteString))
Using Parsec directly to parse the full result might be possible, but I'm afraid it would be tricky to allow arbitrary order and, at the same time, different output types for the fields.
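To make the two-level idea concrete, here is a rough sketch under some assumptions of mine: it uses String instead of ByteString, a hypothetical Video record as the target, and a map from each key to the list of all values seen for it, so that duplicates can be detected at lookup time without listing the interesting keys twice. The names setting, settings and required are made up for this example.

```haskell
import qualified Data.Map as Map
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Read (readMaybe)

-- Hypothetical target type for this sketch.
data Video = Video { width :: Int, height :: Int } deriving (Show)

-- One "KEY=value" line, including its newline.
setting :: Parser (String, String)
setting = (,) <$> many1 (noneOf "=\n") <* char '=' <*> many (noneOf "\n") <* newline

-- First level: collect every value seen for every key.
settings :: Parser (Map.Map String [String])
settings = Map.fromListWith (++) . map (fmap (: [])) <$> many setting <* eof

-- Second level: a required key must occur exactly once and read as an Int.
required :: String -> Map.Map String [String] -> Parser Int
required k m = case Map.lookup k m of
  Just [v] -> maybe (parserFail ("malformed " ++ k)) return (readMaybe v)
  Just _   -> parserFail ("duplicate " ++ k)
  Nothing  -> parserFail ("missing " ++ k)

video :: Parser Video
video = do
  m <- settings
  Video <$> required "ID_VIDEO_WIDTH" m <*> required "ID_VIDEO_HEIGHT" m
```

As noted above, errors raised in the second level are reported at the end of the input rather than at the offending line; storing a SourcePos next to each value would be the way to improve that.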
If you don't mind a slight performance loss, write one parser for the width-line and one for the length-line and do something like this:
let ls = lines input in
case ([x | Right x <- parseWidth ls], [x | Right x <- parseLength ls]) of
([w],[l]) -> ...
_ -> parserError ...
It's easy to add separate error cases for repeated/missing values without repeating anything.
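Spelled out a little more, that fragment could look like the sketch below. The helper names intSetting, matches and dimensions are hypothetical, I use height where the fragment says length, and the result is a plain Either String rather than a ParseError.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Parser for one whole "KEY=<digits>" line with a fixed key.
intSetting :: String -> Parser Int
intSetting key = read <$> (string key *> char '=' *> many1 digit <* eof)

-- All successful parses of p over the given lines.
matches :: Parser a -> [String] -> [a]
matches p ls = [x | Right x <- map (parse p "") ls]

-- Exactly one width and one height, otherwise an error message.
dimensions :: String -> Either String (Int, Int)
dimensions input =
  case (matches (intSetting "ID_VIDEO_WIDTH") ls, matches (intSetting "ID_VIDEO_HEIGHT") ls) of
    ([w], [h]) -> Right (w, h)
    ([], _)    -> Left "missing width"
    (_, [])    -> Left "missing height"
    _          -> Left "repeated width or height"
  where
    ls = lines input
```

One caveat of this approach: a line with the right key but a malformed value simply fails to parse, so it is reported as missing rather than as malformed.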
I'm using Parsec to write an expression parser and I have a small question (I'm French, so sorry for my English).
I have this code :
data Expression ... -- Recursive type Expression
type Store = [(String, Float)] -- variable's storage
type Parser a = Parsec String () a
parseExpr :: [Char] -> Either ParseError Expression
parseExpr string = parse expr "" stream
where
stream = filter (not . isSpace) string
-- Parser's rules ...
raiseError a = Nothing
evalParser :: [Char] -> Store -> Float
evalParser expr store = fromMaybe 0 (either raiseError (eval store)(parseExpr expr))
This code works really well, but I need this function:
parseExpression :: String -> Maybe Expression
And I have no idea what the right syntax is.
Can someone help me, please?
I'll start it for you, and you can finish it:
eitherToMaybe :: Either a b -> Maybe b
eitherToMaybe (Left a) = ???
eitherToMaybe (Right b) = ???
A severe over-generalization looks like this (you can find this function with a less direct implementation in the monadplus package):
import Control.Applicative (Alternative (..))
import Data.Profunctor.Unsafe ((#.))
import Data.Monoid (Alt (..))
afold :: (Foldable f, Alternative g)
=> f a -> g a
afold = getAlt . foldMap (Alt #. pure)
But you don't really need to get into that business just yet.
I have a question. Is there a way to read a list of tuples from a file when the tuple type depends on the content?
I know that if I need to read integers I can do something like this:
toTuple :: [String] -> [(Int,Int)]
toTuple = map (\y -> read y ::(Int,Int))
But the file can contain tuples of either kind, (int,int) or (char,int). Is there a nice way to do this?
At first I tried looking for the ' character and reading chars if it was present, but that doesn't work for some reason.
[Edit]
I pass toTuple a list of strings containing the tuples; before that I split the lines on spaces.
INPUT EXAMPLE:
Case 1 : ["(1,2)", "(1,3)" ,"(3,4)" ,"(1,4)"]
Case 2 : ["('a',2)", "('b',3)", "('g',8)", "('h',2)", "('r',4)"]
Just try both and choose the one that succeeds:
import Text.Read
import Control.Applicative
choose :: Maybe a -> Maybe b -> Maybe (Either a b)
choose x y = fmap Left x <|> fmap Right y
readListMaybe :: Read a => [String] -> Maybe [a]
readListMaybe = mapM readMaybe
toTuple :: [String] -> Maybe (Either [(Int, Int)] [(Char, Int)])
toTuple ss = readListMaybe ss `choose` readListMaybe ss
main = do
-- Just (Left [(1,2),(1,3),(3,4),(1,4)])
print $ toTuple ["(1,2)", "(1,3)" ,"(3,4)" ,"(1,4)"]
-- Just (Right [('a',2),('b',3),('g',8),('h',2),('r',4)])
print $ toTuple ["('a',2)", "('b',3)", "('g',8)", "('h',2)", "('r',4)"]
Here is a far more efficient (and unsafe) version:
readListWithMaybe :: Read a => String -> [String] -> Maybe [a]
readListWithMaybe s ss = fmap (: map read ss) (readMaybe s)
toTuple :: [String] -> Either [(Int, Int)] [(Char, Int)]
toTuple [] = Left []
toTuple (s:ss) = fromJust $ readListWithMaybe s ss `choose` readListWithMaybe s ss
In the first definition of toTuple
toTuple :: [String] -> Maybe (Either [(Int, Int)] [(Char, Int)])
toTuple ss = readListMaybe ss `choose` readListMaybe ss
readListMaybe is too strict:
readListMaybe :: Read a => [String] -> Maybe [a]
readListMaybe = mapM readMaybe
mapM is defined in terms of sequence, which is defined in terms of (>>=), which is strict for the Maybe monad. Also, the reference to ss is kept for too long. The second version doesn't have these problems.
As I said it may be a good idea to consider using a parsing library, if the task at hand gets a bit more complicated.
First of all, you get error messages, and if you decide to switch to a self-declared data type it is still easily applicable (with slight modifications, of course).
Also, switching from ByteString to Text (both of which are preferable to working with String anyway) is just a matter of (un)commenting 4 lines.
Here is an example, in case you have not had the pleasure of working with it.
I'll explain it some time later today - for I have to leave now.
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Attoparsec.ByteString.Char8
import Data.ByteString.Char8 as X
-- import Data.Attoparsec.Text
-- import Data.Text as X
main :: IO ()
main = do print <$> toTuples $ X.unlines ["(1,2)","(1,3)","(3,4)","(1,4)"]
print <$> toTuples $ X.unlines ["('a',2)","('h',2)","('r',4)"]
print <$> toTuples $ X.unlines ["('a',2)","(1,3)","(1,4)"] --works
print <$> toTuples $ "('a',2)" -- yields Right [Right ('a',2)]!!
print <$> toTuples $ "(\"a\",2)" -- yields Right []!!
toTuples = parseOnly (myparser `sepBy` skipSpace :: Parser [Either (Int,Int) (Char,Int)])
where myparser :: Parser (Either (Int,Int) (Char,Int))
myparser = eitherP (tupleP decimal decimal)
(tupleP charP decimal)
charP = do char '\''
c <- notChar '\''
char '\''
return c
tupleP :: Parser a -> Parser b -> Parser (a, b)
tupleP a b = do char '('
a' <- a
skipSpace
char ','
skipSpace
b' <- b
char ')'
return (a',b')
Edit: Explanation
Parser is a monad, so it comes with do-notation, which enables us to write the tupleP function in this very convenient form. The same goes for charP: we describe what to parse using the primitives provided by the attoparsec library, and it reads something like:
first expect a quote
then something that is not allowed to be a quote
and another quote
return the not quote thingy
If you can write down the parser informally, you're most likely halfway through writing the Haskell code; the only thing left to do is find the primitives in the library or write some auxiliary function like tupleP.
A nice thing is that Parsers (being monads) compose nicely so we get our desired parser eitherP (tupleP ..) (tupleP ..).
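The only magic that happens in the print <$> ... lines is that functions are functors too: print <$> toTuples is the same as print . toTuples, so each line simply prints the whole Either result of the parse (as the comments next to the last two lines show).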
Last thing to note is sepBy returns a list - so in the case where the parsing fails we still get an empty list as a result, if you want to see the failing use sepBy1 instead!
Can someone help me understand how to use Applicative style for writing Parsec parsers? This is the code I have:
module Main where
import Control.Applicative hiding (many)
import Text.Parsec
import Data.Functor.Identity
data Cmd = A | B deriving (Show)
main = do
line <- getContents
putStrLn . show $ parseCmd line
parseCmd :: String -> Either ParseError String
parseCmd input = parse cmdParse "(parser)" input
cmdParse :: Parsec String () String
cmdParse = do
slash <- char '/'
whatever <- many alphaNum
return (slash:whatever)
cmdParse2 :: String -> Parsec String () String
cmdParse2 = (:) <$> (char '/') <*> many alphaNum
But when I try to compile it, I get the following:
/home/tomasherman/Desktop/funinthesun.hs:21:13:
Couldn't match expected type `Parsec String () String'
with actual type `[a0]'
Expected type: a0 -> [a0] -> Parsec String () String
Actual type: a0 -> [a0] -> [a0]
In the first argument of `(<$>)', namely `(:)'
In the first argument of `(<*>)', namely `(:) <$> (char '/')'
Failed, modules loaded: none.
The idea is that I want cmdParse2 to do the same thing that cmdParse does, but using applicative stuff... my approach is probably completely wrong, I'm new to Haskell.
Your applicative usage is spot on, you just have an incorrect signature. Try:
cmdParse2 :: Parsec String () String
Your approach looks correct to me, the problem is that cmdParse2 has the wrong type. It should have the same type as cmdParse. By the way, you can omit the parens around char '/' in the applicative style parser.
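For comparison, here are the two parsers side by side with matching signatures, as a small self-contained sketch. Control.Applicative is not imported because recent versions of the Prelude already export <$> and <*>; that is an assumption about the GHC version in use.

```haskell
import Text.Parsec

-- Monadic version from the question.
cmdParse :: Parsec String () String
cmdParse = do
  slash    <- char '/'
  whatever <- many alphaNum
  return (slash : whatever)

-- Applicative version: same type, no String argument.
cmdParse2 :: Parsec String () String
cmdParse2 = (:) <$> char '/' <*> many alphaNum
```

Both parsers accept the same input: on "/help now" each of them returns "/help" and leaves the rest unconsumed.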
I'm surprised that I could not find any info on this. I must be the only person having any trouble with it.
So, let's say I have a dash counter. I want it to count the number of dashes in the string, and return the string. Pretend I gave an example that won't work using parsec's state handling. So this should work:
dashCounter = do
str <- many1 dash
count <- get
return (count,str)
dash = do
char '-'
modify (+1)
And indeed, this compiles. Okay, so I try to use it:
:t parse dashCounter "" "----"
parse dashCounter "" "----"
:: (Control.Monad.State.Class.MonadState
t Data.Functor.Identity.Identity,
Num t) =>
Either ParseError (t, [Char])
Okay, that makes sense. It should return the state and the string. Cool.
>parse dashCounter "" "----"
<interactive>:1:7:
No instance for (Control.Monad.State.Class.MonadState
t0 Data.Functor.Identity.Identity)
arising from a use of `dashCounter'
Possible fix:
add an instance declaration for
(Control.Monad.State.Class.MonadState
t0 Data.Functor.Identity.Identity)
In the first argument of `parse', namely `dashCounter'
In the expression: parse dashCounter "" "----"
In an equation for `it': it = parse dashCounter "" "----"
Oops. But then how could it have ever hoped to work in the first place? There's no way to input the initial state.
There is also a function:
>runPT dashCounter (0::Int) "" "----"
But it gives a similar error.
<interactive>:1:7:
No instance for (Control.Monad.State.Class.MonadState Int m0)
arising from a use of `dashCounter'
Possible fix:
add an instance declaration for
(Control.Monad.State.Class.MonadState Int m0)
In the first argument of `runPT', namely `dashCounter'
In the expression: runPT dashCounter (0 :: Int) "" "----"
In an equation for `it':
it = runPT dashCounter (0 :: Int) "" "----"
I feel like I should have to runState on it, or there should be a function that already does it internally, but I can't seem to figure out where to go from here.
Edit: I should have specified more clearly, I did not want to use parsec's state handling. The reason is I have a feeling I don't want its backtracking to affect what it collects with the problem I'm preparing to solve it with.
However, Mr. McCann has figured out how this should fit together and the final code would look like this:
dashCounter = do
str <- many1 dash
count <- get
return (count,str)
dash = do
c <- char '-'
modify (+1)
return c
test = runState (runPT dashCounter () "" "----------") 0
Thanks a lot.
You've actually got multiple problems going on here, all of which are relatively non-obvious the first time around.
Starting with the simplest: dash is returning (), which doesn't seem to be what you want given that you're collecting the results. You probably wanted something like dash = char '-' <* modify (+1). (Note that I'm using an operator from Control.Applicative here, because it looks tidier)
Next, clearing up a point of confusion: When you get the reasonable-looking type signature in GHCi, note the context of (Control.Monad.State.Class.MonadState t Data.Functor.Identity.Identity, Num t). That's not saying what things are, it's telling you what they need to be. Nothing guarantees that the instances it's asking for exist and, in fact, they don't. Identity is not a state monad!
On the other hand, you're absolutely correct in thinking that parse doesn't make sense; you can't use it here. Consider its type: Stream s Identity t => Parsec s () a -> SourceName -> s -> Either ParseError a. As is customary with monad transformers, Parsec is a synonym for ParsecT applied to the identity monad. And while ParsecT does provide user state, you apparently don't want to use it, and ParsecT does not give an instance of MonadState anyhow. Here's the only relevant instance: MonadState s m => MonadState s (ParsecT s' u m). In other words, to treat a parser as a state monad you have to apply ParsecT to some other state monad.
This sort of brings us to the next problem: Ambiguity. You're using a lot of type class methods and no type signatures, so you're likely to run into situations where GHC can't know what type you actually want, so you have to tell it.
Now, as a quick solution, let's first define a type synonym to give a name to the monad transformer stack we want:
type StateParse a = ParsecT String () (StateT Int Identity) a
Give dashCounter the relevant type signature:
dashCounter :: StateParse (Int, String)
dashCounter = do str <- many1 dash
count <- get
return (count,str)
And add a special-purpose "run" function:
runStateParse p sn inp count = runIdentity $ runStateT (runPT p () sn inp) count
Now, in GHCi:
Main> runStateParse dashCounter "" "---" 0
(Right (3,"---"),3)
Also, note that it's pretty common to use a newtype around a transformer stack instead of just a type synonym. This can help with the ambiguity issues in some cases, and obviously avoids ending up with gigantic type signatures.
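Putting the pieces of this answer together, a self-contained version might look like the sketch below. The import lists and the explicit signatures for dash and runStateParse are my additions; they assume the mtl and parsec packages.

```haskell
import Control.Monad.State (StateT, runStateT, get, modify)
import Data.Functor.Identity (Identity, runIdentity)
import Text.Parsec

-- ParsecT layered over a State monad that holds the counter.
type StateParse a = ParsecT String () (StateT Int Identity) a

-- Parse one dash and bump the counter in the underlying StateT.
dash :: StateParse Char
dash = char '-' <* modify (+ 1)

dashCounter :: StateParse (Int, String)
dashCounter = do
  str   <- many1 dash
  count <- get
  return (count, str)

-- Run the parser, then run the state layer with an initial count.
runStateParse :: StateParse a -> SourceName -> String -> Int -> (Either ParseError a, Int)
runStateParse p sn inp count = runIdentity (runStateT (runPT p () sn inp) count)
```

Here the counter lives underneath the parser, so runStateParse returns both the parse result and the final count.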
If you want to use the user state component Parsec offers as a built-in feature, then you can use the getState and modifyState monadic functions.
I tried to stay true to your example program, though using the return of dash doesn't seem useful.
import Text.Parsec
dashCounter :: Parsec String Int (Int, [()])
dashCounter = do
str <- many1 dash
count <- getState
return (count,str)
dash :: Parsec String Int ()
dash = do
char '-'
modifyState (+1)
test = runP dashCounter 0 "" "---"
Note that runP is indeed addressing your concern about runState.
Whilst these answers sort out this specific problem, they ignore the more serious underlying issue with an approach like this. I would like to describe it here for anyone else looking at this answer.
There is a difference between Parsec's user state and the StateT transformer. The internal user state is reset on backtracking, but the state of a StateT underneath ParsecT is not. Consider the following code. We want to add one to our counter if there is a dash and two if there is a plus. The three variants below produce different results.
As can be seen, both the internal user state (f) and a StateT layered on top of Parsec (f'') give the correct result, while ParsecT over State (f') keeps the increment from the failed branch. The StateT-on-top version comes at the expense of having to explicitly lift operations and be much more careful with types.
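import Text.Parsec hiding (State, (<|>))
import Control.Applicative ((<|>)) -- the general (<|>) also works for StateT over Parsec
import Control.Monad (void)
import Control.Monad.State
import Control.Monad.Identity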
f :: ParsecT String Int Identity Int
f = do
try dash <|> plus
getState
dash = do
modifyState (+1)
char '-'
plus = do
modifyState (+2)
char '+'
f' :: ParsecT String () (State Int) ()
f' = void (try dash' <|> plus')
dash' = do
modify (+1)
char '-'
plus' = do
modify (+2)
char '+'
f'' :: StateT Int (Parsec String ()) ()
f'' = void (dash'' <|> plus'')
dash'' :: StateT Int (Parsec String ()) Char
dash'' = do
modify (+1)
lift $ char '-'
plus'' :: StateT Int (Parsec String ()) Char
plus'' = do
modify (+2)
lift $ char '+'
This is the result of running f, f' and f''.
*Main> runParser f 0 "" "+"
Right 2
*Main> flip runState 0 $ runPT f' () "" "+"
(Right (),3)
*Main> runParser (runStateT f'' 0) () "" "+"
Right ((),2)