I was reading this Monadic Parsing article while I was trying to implement a pretty simple string parser in Haskell and also get a better understanding of using monads. Down below you can see my code, implementing functions for matching a single character or a whole string. It works as expected, but I observed two strange behaviors that I can't explain.
I have to handle single characters in string, otherwise, the parser will return only empty lists. To be exact, if I remove this line string [c] = do char c; return [c] it won't work anymore. I was expecting that string (c:s) would handle string (c:[]) properly. What could be the cause here?
In my opinion, string definition should be equivalent to string s = mapM char s as it would create a list of [Parser Char] for each character in s and collect the results as Parser [Char]. If I use the definition based on mapM, the program would get stuck in an infinite loop and won't print anything. Is something about lazy evalutation that I miss here?
.
module Main where
newtype Parser a = Parser { apply :: String->[(a, String)] }
instance Monad Parser where
return a = Parser $ \s -> [(a, s)]
ma >>= k = Parser $ \s -> concat [apply (k a) s' | (a, s') <- apply ma s]
instance Applicative Parser where
pure = return
mf <*> ma = do { f <- mf; f <$> ma; }
instance Functor Parser where
fmap f ma = f <$> ma
empty :: Parser a
empty = Parser $ const []
anychar :: Parser Char
anychar = Parser f where
f [] = []
f (c:s) = [(c, s)]
satisfy :: (Char -> Bool) -> Parser Char
satisfy prop = do
c <- anychar
if prop c then return c
else empty
char :: Char -> Parser Char
char c = satisfy (== c)
string :: String -> Parser String
string [] = empty
string [c] = do char c; return [c] --- if I remove this line, all results will be []
string (c:s) = do char c; string s; return (c:s)
main = do
let s = "12345"
print $ apply (string "123") s
print $ apply (string "12") s
print $ apply (string "1") s
print $ apply (string []) s
PS. I think the title of the question is not suggestive enough, please propose an edit if you have a better idea.
Since you did string [] = empty instead of string [] = return [], you can't use it as a base case for recursion that builds up a list.
fmap f ma = f <$> ma is wrong, since <$> is defined in terms of fmap. If you want to define fmap in terms of your other instances, then do fmap = liftA or fmap = liftM. Since mapM uses fmap internally but your original string didn't, this problem didn't come up in your first simple test.
string [] = empty
means: "If you need to parse an empty string, fail -- it can not be parsed at all, no matter what's the input string".
By comparison,
string [] = return ""
means: "If you need to parse an empty string, succeed and return the empty string -- it can always be parsed, no matter what's the input string".
By using the first equation, when you recurse in the case string (c:cs) you need to stop at one character (string [c]) since reaching zero characters will run empty and make the whole parser fail.
Hence, you need to either use that string [c] = return [c] equation, or modify the base "empty string" case so that it succeeds. Arguably, the latter would be more natural.
i have these functions:
item :: Parser Char
item = Parser i
where i [] = []
i (x:xs) = [(x,xs)]
many :: Eq a=> Parser a -> Parser [a]
many p = many1 p +++ return []
many1 :: Eq a=> Parser a -> Parser [a]
many1 p = do v <- p
vs <- many p
returns (v:vs)
Im getting strange results applying differents strings. If i execute:
parse (many item) "x:=0"
i get
[('x',0)]
while if i use another string longer like "if x=0 then x:=0 else x:=1" it looks like go in loop. Doing some attempts it seems that if the string is longer more than 19 chars the function doesnt end. It's strange because on other string long less 19 chars it works good. What it could be?
Other definitions:
newtype Parser a = Parser { parse :: String -> [(a, String)] }
instance Monad Parser where
return t = Parser $ \s -> [(t, s)]
m >>= k = Parser $ \s -> [(x, y) | (u, v) <- parse m s, (x, y) <- parse (k u) v]
(+++) :: Eq a => Parser a -> Parser a -> Parser a
p +++ q = Parser $ \s -> if((parse p s)==[]) then parse q s else parse p s
Your code works fine, it's just that you've written your parser to have infinite backtracking and therefore O(2^n) runtime. Every character you add doubles the time it takes to complete:
$ time hugs foo.hs <<< 'parse (many item) "if x=0 then x:=0 els"'
[...]
Main> [("if x=0 then x:=0 els","")]
Main> [Leaving Hugs]
real 0m11.076s
user 0m10.578s
sys 0m0.016s
vs
$ time hugs foo.hs <<< 'parse (many item) "if x=0 then x:=0 else"'
[...]
Main> [("if x=0 then x:=0 else","")]
Main> [Leaving Hugs]
real 0m22.346s
user 0m22.048s
sys 0m0.036s
Your implementation of (+++) doesn’t do what you think it does. In particular, it will only return successful parsed from one of its arguments, rather than from both. Here is how to do what you want:
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = Parser $ \s -> parse p s ++ parse q s
Although this doesn’t remove duplicates so be warned that you can end up with an explosion of parses by doing eg many (many item).
I'm a newbie to Haskell, and now I'm learning to use parsec. I get stuck in one problem, that is, I want to get all the sub-strings which satisfies some specific pattern in a string. For example, from the following string,
"I want to choose F12 or F 12 from F1(a), F2a, F5-A, F34-5 and so on,
but F alone should not be chosen, that is, choose those which start with F
followed by a digit (before the digit there could be zero or more than one space) and then by any character from ['a'..'z'] ++
['A'..'Z'] ++ ['0'..'9'] ++ ['(',')',"-"]."
the result should be [F12, F12, F1(a), F2a, F5-A, F34-5], where the space between the F and the digit should be deleted.
With the parsec, I have succeeded in getting one sub-string, such as F12, F2a. The code is as follows:
hao :: Parser Char
hao = oneOf "abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ()-"
tuhao :: Parser String
tuhao = do { c <- char 'F'
; many space
; c1 <- digit
; cs <- many hao
; return (c:c1:cs)
}
parse tuhao "" str -- can parse the str and get one sub-string.
However, I am stuck at how to parse the example string above and get all the sub-strings of the specific pattern. I have an idea that if F is found, then begin parsing, else skip parsing or if parsing fails then skip parsing. But I don't know how to implement the plan. I have another idea that uses State to record the remaining string that is not parsed, and use recursion, but still fail to carry it out.
So I appreciate any tip! ^_^
F12, F 12, F1(a), F2a, F5-A, F34-5
This is an incomplete description, so I'll make some guesses.
I would start by defining a type that can contain the logical parts of these expressions. E.g.
newtype F = F (Int, Maybe String) deriving Show
That is, "F" followed by a number and an optional part that is either letters, parenthesised letters, or a dash followed by letters/digits. Since the number after "F" can have multiple digits, I assume that the optional letters/digits may be multiple, too.
Since the examples are limited, I assume that the following aren't valid: F1a(b), F1(a)b, F1a-5, F1(a)-A, F1a(a)-5, F1a1, F1-(a), etc. and that the following are valid: F1A, F1abc, F1(abc), F1-abc, F1-a1b2. This is probably not true. [1]
I would then proceed to write parsers for each of these sub-parts and compose them:
module Main where
import Text.Parsec
import Data.Maybe (catMaybes)
symbol :: String -> Parser String
symbol s = string s <* spaces
parens :: Parser a -> Parser a
parens = between (string "(") (string ")")
digits :: Parser Int
digits = read <$> many1 digit
parseF :: Parser F
parseF = curry F <$> firstPart <*> secondPart
where
firstPart :: Parser Int
firstPart = symbol "F" >> digits
secondPart :: Parser (Maybe String)
secondPart = optionMaybe $ choice
[ many1 letter
, parens (many1 letter)
, string "-" >> many1 alphaNum
]
(As Jon Purdy writes in a comment,) using this parser on a string to get multiple matches,
extract :: Parser a -> Parser [a]
extract p = do (:) <$> try p <*> extract p
<|> do anyChar >> extract p
<|> do eof >> return []
readFs :: String -> Either ParseError [F]
readFs s = parse (extract parseF) "" s
main :: IO ()
main = print (readFs "F12, F 12, F1(a), F2a, F5-A, F34-5")
This prints:
Right [F (12,Nothing),F (12,Nothing),F (1,Just "a"),F (2,Just "a"),F (5,Just "A"),F (34,Just "5")]
Takeaways:
You can parse optional whitespace using token parsing (symbol).
You can parse optional parts with option, optionMaybe or optional.
You can alternate between combinators using a <|> b <|> c or choice [a, b, c].
When alternating between choices, make sure they don't have overlapping FIRST sets. Otherwise you need to try; this is nasty but sometimes unavoidable. (In this case, FIRST sets for the three choices are letter, string "(" and string "-", i.e. not overlapping.)
[1]: For the sake of restriction, I kept to the assumptions above, but I felt that I could also have assumed that F1a-B, F1(a)-5 and F1(a)-5A are valid, in which case I might change the model to:
newtype F = F (Int, Maybe String, Maybe String)
We can get sub-strings of specific pattern in a string with the
findAll
combinator from
replace-megaparsec.
Notice that this tuhao parser doesn't actually return anything. The findAll combinator just checks for success of the parser to find sub-strings which match the pattern.
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char
import Data.Maybe
import Data.Either
let tuhao :: Parsec Void String ()
tuhao = do
void $ single 'F'
void $ space
void $ digitChar
void $ many $ oneOf "abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ()-"
input = "I want to choose F12 or F 12 from F1(a), F2a, F5-A, F34-5 and so on, but F alone should not be chosen, that is, choose those which start with F followed by a digit (before the digit there could be zero or more than one space) and then by any character from ['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9'] ++ ['(',')',\"-\"]."
rights $ fromJust $ parseMaybe (findAll tuhao) input
["F12","F 12","F1(a)","F2a","F5-A","F34-5"]
Im working through the exercises on wikibooks/haskell and there is an exercise in the MonadPlus-chapter that wants you to write this hexChar function. My function works as shown below, but the thing is that when I try to switch the 2 helper parsers (digitParse and alphaParse) around the function ceases to work properly. If I switch them around I can only parse digits and not alphabetic chars anymore.
Why is this so?
char :: Char -> String -> Maybe (Char, String)
char c s = do
let (c':s') = s
if c == c' then Just (c, s') else Nothing
digit :: Int -> String -> Maybe Int
digit i s | i > 9 || i < 0 = Nothing
| otherwise = do
let (c:_) = s
if read [c] == i then Just i else Nothing
hexChar :: String -> Maybe (Char, String)
hexChar s = alphaParse s `mplus` digitParse s -- cannot switch these to parsers around!!
where alphaParse s = msum $ map ($ s) (map char (['a'..'f'] ++ ['A'..'F']))
digitParse s = do let (c':s') = s
x <- msum $ map ($ s) (map digit [0..9])
return (intToDigit x, s')
if read [c] == i then Just i else Nothing
The marked code has a flaw. You're using Int's Read instance, e.g. read :: String -> Int. But if it's not possible to parse [c] as an int (e.g. "a"), read will throw an exception:
> digit 1 "doesnt start with a digit"
*** Exception: Prelude.read: no parse
> -- other example
> (read :: String -> Int) "a"
*** Exception: Prelude.read: no parse
Instead, go the other way:
if [c] == show i then Just i else Nothing
This will always works, since show won't fail (not counting cases where bottom is involved).
parse' :: Parser a -> String -> [(a,String)]
parse' p inp = p `with` inp
parse :: Parser a -> String -> [a]
parse p inp = [ v | (v,[]) <- parse' p inp ]
mkMany1 :: (Parser a -> Parser [a]) -> Parser a -> Parser [a]
mkMany1 many p = do x <- p
xs <- many p
return (x:xs)
many1L :: Parser a -> Parser [a]
many1L = mkMany1 manyL
manyL :: Parser a -> Parser [a]
manyL p = (many1L p) ||| (success [])
I'm trying to parse a String for a number of substrings that doesn't include the characters '<', '>' or ' '(space) but my parser doesn't seem to terminate. Can someone give me some pointers on what I'm missing?
textValid :: Char -> Bool
textValid c = c /= '<' && c /= '>' && not (isSpace c)
text :: Parser String
text = manyL (sat textValid)
When I try to run the following command, it never terminates.
parse (manyL text) "abc def <"
The problem is that manyL parser can succeed without consuming input (returning an empty list).
And one must not pass a parser that can succeed without consuming input as the argument of manyL, because in that case, you get precisely such an infinite loop as you are in.
After the first text consumed the "abc" prefix of the input, you are left with " def <" a String beginning with a space. So trying text on that, it consumes as many textValid characters as there are at the beginning of the String - namely 0 - and returns them - []. That leaves the same input. Now manyL text tries text another time to see if that succeeds too ...
You should probably define
text = many1L (sat textValid)
so that text doesn't succeed without consuming input, and probably it is a good idea to consume spaces from the beginning of the remaining input after each successful parse, like
text = do
result <- many1L (sat textValid)
skipSpaces
return result
(skipSpaces left to implement).