Why does the order of these Haskell Parsec combinators matter?

Why does the order of these Haskell Parsec combinators matter? - haskell

I want to make a simple parser to parse an addition expression. Here is my code:
import Text.Parsec.Char
import Text.Parsec.String
import Text.ParserCombinators.Parsec
data Expr = Number Float |
Add Expr Expr |
number :: Parser Expr
number = do
n <- try $ many1 digit
return $ Number $ read n
add :: Parser Expr
add = do
e1 <- number
char '+'
e2 <- number
return $ Add e1 e2
expr :: Parser Expr
expr = try number <|> try add
p :: String -> Either ParseError Expr
p = parse (do{e <- expr; eof; return e}) "error"
But here is the output
ghci> parse add "err" "1+2"
Right (Add (Number 1.0) (Number 2.0))
ghci> p "1"
Right (Number 1.0)
ghci> p "1+2"
Left "error" (line 1, column 2):
unexpected '+'
expecting digit or end of input
But if I change the order of the expr combinators to
expr :: Parser Expr
expr = try add <|> try number
Then the output changes to
ghci> p "1+2"
Right (Add (Number 1.0) (Number 2.0))
Why does this happen? I thought the try keyword forces the Parsers I am combining to restart after each <|>.
I plan on making this much larger so I want to be sure I understand why this is happening now.
My actual program is larger already but this is still causing a problem independantly.

The problem you're facing is that when the string "1+2" is parsed with number, it succeeds (admittedly, with some unparsed characters). The use of try only matters if it had failed.
Perhaps another way to show this is to consider the example try (string "a") <|> try (string "ab"). This will succeed in matching any string starting with the character a, but it will never match on strings that start with "ab".
If you had tried instead
exprAll :: Parser Expr
exprAll = try (number <* eof) <|> try (add <* eof)
then you may get the behavior you're looking for. In this case, the "try"d parser does not succeed until the end-of-file character is reached, so when the + is encountered, the parse attempt of number <* eof fails and then parsing starts over using add <* eof.

Related

string to Integer parsing with Exception Handling in Haskell

Trying to make type nat which return [ (int, string) ]; handle zero as string
and type int which can handle negative integer also.
import Data.Char ( isDigit )
type Parser tok a = [tok] -> [(a, [tok])]
return :: a -> Parser tok a
return v = \ts -> [(v, ts)]
failure :: Parser tok a
failure = \_ -> []
item :: Parser tok tok
item [] = []
item (t:ts) = [(t, ts)]
sat :: (tok -> Bool) -> Parser tok tok
sat test = item >>= \t ->
if test t then return t
else failure
(<|>) :: Parser tok a -> Parser tok a -> Parser tok a
p1 <|> p2 = \ts -> case p1 ts of
[] -> p2 ts
rs1 -> rs1
many1 :: Parser tok a -> Parser tok [a]
many p = many1 p <|> return []
many1 p = p >>= \v ->
many p >>= \vs ->
return (v:vs)
digit = sat isDigit
first I make nat and use if ~ then ~ else make exception of 0.
nat :: Parser Char Int
nat = many1 digit >>= \s ->
if (s == "0") then failure
else return (read s)
It was able to handle one zero.
nat "0"
> []
nat "0def"
> []
but can't handle when consecutive numbers starting with 0 come to input.
nat "012def"
> [(12, "def")] -- probably my code only handle 1 Zero
-- expect return []
nat "000def"
> [(0, "def")] -- but i really don't know why this output is coming
-- expect return []
I tried to put aside the problem about nat and make an int first.
first I tried to use nat to define int.
int :: Parser Char Int
int = nat >>= \s ->
if (s < 0) then return (-s)
else return s
And I realize that i can't make to compiler recognize that s is negative.
so i tried to make it simillar as nat. which I want to add char '-'?
int :: Parser Char Int
int = many1 digit >>= \s ->
if (s == "-") then return (read s) ++ "-"
else return (read s)
I have two question
Why nat only handle 1 zero? I thought many1 is recursive parsing string step-by-step.
How can I add "-" in int? does it related with thins like synthetic function?
I am weak at using Haskell. is there something i'm missing?

Don't panic. It is not so difficult.
Why nat only handle 1 zero? I thought many1 is recursive parsing
string step-by-step.
You are right many1 digit will apply the parser digit on the longer chain possible. This means that when you are parsing 012def the parsed part is "012" and the remaining is "def". But you are missing where the error is. It is in the following
\s -> if (s == "0") then failure
else return (read s)
Once, many1 digit finish it provides "012" to your function via the bind (>>=) and it is process. But "012" is not "0". So the else part is applied : return (read "012"). The context "def" is not modified. So this results to the answer [12,"def"]. As you can see the problem is not the first part of the bind, but the second one. Since you know that you will only get digit (because many1 digit can only parse digit), I suggest that you change your binder with
let n = read s
in if n == 0 then failure
else n
More elegant (monadic) solution are possibles, but this one is probably quite readable and explicit for new haskeller.
How can I add "-" in int? does it related with thins like synthetic function?
You can't. You want to read a Char that cannot be translate as a digit. So you can't read it as digit. This is a simple parsing problem. You have to deal with the '-' separately. You can use a choice between two parsers. The first parses the Char '-' followed by a natural. The second parses only natural. Then translate the result in an Int. Take care to not forget the 0. It can be handle in the first, the second or both. This means that you can't use nat in both... or you will have to make a third parser that only parse 0. So remember that the order is important.
I am pretty sure that can be write this considering the previous work. You already have the digit and the choice combinator. The char parser is pretty obvious. You just have to think about it.
Hope this will help.

Adding infix operator to expression parser

I'm trying to add a parser for infix operators to a simple expressions parser. I have already looked at the documentation and at this question, but it seems like I am missing something.
import qualified Text.Parsec.Expr as Expr
import qualified Text.Parsec.Token as Tokens
import Text.ParserCombinators.Parsec
import Text.Parsec
data Expr = Number Integer
| Op Expr Expr
| Boolean Bool
instance Show Expr where
show (Op l r) = "(+ " ++ (show l) ++ " " ++ (show r) ++ ")"
show (Number r) = show r
show (Boolean b) = show b
parens = Tokens.parens haskell
reserved = Tokens.reservedOp haskell
infix_ operator func =
Expr.Infix (spaces >> reserved operator >> spaces >> return func) Expr.AssocLeft
infixOp =
Expr.buildExpressionParser table parser
where
table = [[infix_ "+" Op]]
number :: Parser Expr
number =
do num <- many1 digit
return $ Number $ read num
bool :: Parser Expr
bool = (string "true" >> return (Boolean True)) <|> (string "false" >> return (Boolean False))
parser = parens infixOp <|> number <|> bool
run = Text.Parsec.runParser parser () ""
This parser is able to parse expressions like 1, false, (1 + 2), (1 + false), but not 1 + 2 (it's parsed as 1). If I try to change the parser to parens infixOp <|> infixOp <|> number <|> bool, it get stuck.
What should i change in order to parse expressions like 1 + 2 without parenthesis?

You have to run the infixOp parser at the top level like this:
run = Text.Parsec.runParser infixOp () ""
Otherwise the your infix expressions can only be parsed when occuring in parentheses.
The attempt to use parens infixOp <|> infixOp <|> number <|> bool most likely gets stuck because it loops: parser tries to parse using infixOp, which tries to parse using parse and so on...
These tutorial might help you getting started with parsec (they did for me):
https://wiki.haskell.org/Parsing_a_simple_imperative_language
http://dev.stephendiehl.com/fun/002_parsers.html

How to use parsec to get sub-strings of specific pattern in a string

I'm a newbie to Haskell, and now I'm learning to use parsec. I get stuck in one problem, that is, I want to get all the sub-strings which satisfies some specific pattern in a string. For example, from the following string,
"I want to choose F12 or F 12 from F1(a), F2a, F5-A, F34-5 and so on,
but F alone should not be chosen, that is, choose those which start with F
followed by a digit (before the digit there could be zero or more than one space) and then by any character from ['a'..'z'] ++
['A'..'Z'] ++ ['0'..'9'] ++ ['(',')',"-"]."
the result should be [F12, F12, F1(a), F2a, F5-A, F34-5], where the space between the F and the digit should be deleted.
With the parsec, I have succeeded in getting one sub-string, such as F12, F2a. The code is as follows:
hao :: Parser Char
hao = oneOf "abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ()-"
tuhao :: Parser String
tuhao = do { c <- char 'F'
; many space
; c1 <- digit
; cs <- many hao
; return (c:c1:cs)
}
parse tuhao "" str -- can parse the str and get one sub-string.
However, I am stuck at how to parse the example string above and get all the sub-strings of the specific pattern. I have an idea that if F is found, then begin parsing, else skip parsing or if parsing fails then skip parsing. But I don't know how to implement the plan. I have another idea that uses State to record the remaining string that is not parsed, and use recursion, but still fail to carry it out.
So I appreciate any tip! ^_^

F12, F 12, F1(a), F2a, F5-A, F34-5
This is an incomplete description, so I'll make some guesses.
I would start by defining a type that can contain the logical parts of these expressions. E.g.
newtype F = F (Int, Maybe String) deriving Show
That is, "F" followed by a number and an optional part that is either letters, parenthesised letters, or a dash followed by letters/digits. Since the number after "F" can have multiple digits, I assume that the optional letters/digits may be multiple, too.
Since the examples are limited, I assume that the following aren't valid: F1a(b), F1(a)b, F1a-5, F1(a)-A, F1a(a)-5, F1a1, F1-(a), etc. and that the following are valid: F1A, F1abc, F1(abc), F1-abc, F1-a1b2. This is probably not true. [1]
I would then proceed to write parsers for each of these sub-parts and compose them:
module Main where
import Text.Parsec
import Data.Maybe (catMaybes)
symbol :: String -> Parser String
symbol s = string s <* spaces
parens :: Parser a -> Parser a
parens = between (string "(") (string ")")
digits :: Parser Int
digits = read <$> many1 digit
parseF :: Parser F
parseF = curry F <$> firstPart <*> secondPart
where
firstPart :: Parser Int
firstPart = symbol "F" >> digits
secondPart :: Parser (Maybe String)
secondPart = optionMaybe $ choice
[ many1 letter
, parens (many1 letter)
, string "-" >> many1 alphaNum
]
(As Jon Purdy writes in a comment,) using this parser on a string to get multiple matches,
extract :: Parser a -> Parser [a]
extract p = do (:) <$> try p <*> extract p
<|> do anyChar >> extract p
<|> do eof >> return []
readFs :: String -> Either ParseError [F]
readFs s = parse (extract parseF) "" s
main :: IO ()
main = print (readFs "F12, F 12, F1(a), F2a, F5-A, F34-5")
This prints:
Right [F (12,Nothing),F (12,Nothing),F (1,Just "a"),F (2,Just "a"),F (5,Just "A"),F (34,Just "5")]
Takeaways:
You can parse optional whitespace using token parsing (symbol).
You can parse optional parts with option, optionMaybe or optional.
You can alternate between combinators using a <|> b <|> c or choice [a, b, c].
When alternating between choices, make sure they don't have overlapping FIRST sets. Otherwise you need to try; this is nasty but sometimes unavoidable. (In this case, FIRST sets for the three choices are letter, string "(" and string "-", i.e. not overlapping.)
[1]: For the sake of restriction, I kept to the assumptions above, but I felt that I could also have assumed that F1a-B, F1(a)-5 and F1(a)-5A are valid, in which case I might change the model to:
newtype F = F (Int, Maybe String, Maybe String)

We can get sub-strings of specific pattern in a string with the
findAll
combinator from
replace-megaparsec.
Notice that this tuhao parser doesn't actually return anything. The findAll combinator just checks for success of the parser to find sub-strings which match the pattern.
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char
import Data.Maybe
import Data.Either
let tuhao :: Parsec Void String ()
tuhao = do
void $ single 'F'
void $ space
void $ digitChar
void $ many $ oneOf "abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ()-"
input = "I want to choose F12 or F 12 from F1(a), F2a, F5-A, F34-5 and so on, but F alone should not be chosen, that is, choose those which start with F followed by a digit (before the digit there could be zero or more than one space) and then by any character from ['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9'] ++ ['(',')',\"-\"]."
rights $ fromJust $ parseMaybe (findAll tuhao) input
["F12","F 12","F1(a)","F2a","F5-A","F34-5"]

Parsec3 Text parser for quoted string, where everything is allowed in between quotes

I have actually asked this question before (here) but it turns out that the solution provided did not handle all test cases. Also, I need 'Text' parser rather than 'String', so I need parsec3.
Ok, the parser should allow for EVERY type of char inbetween quotes, even quotes. The end of the quoted text is marked by a ' character, followed by |, a space or end of input.
So,
'aa''''|
should return a string
aa'''
This is what I have:
import Text.Parsec
import Text.Parsec.Text
quotedLabel :: Parser Text
quotedLabel = do -- reads the first quote.
spaces
string "'"
lab <- liftM pack $ endBy1 anyChar endOfQuote
return lab
endOfQuote = do
string "'"
try(eof) <|> try( oneOf "| ")
Now, the problem here is of course that eof has a different type than oneOf "| ", so compilation falls.
How do I fix this? Is there a better way to achieve what I am trying to do?

Whitespace
First a comment on handling white space...
Generally the practice is to write your parsers so that they
consume the whitespace following a token
or syntactic unit. It's common to define combinator like:
lexeme p = p <* spaces
to easily convert a parser p to one that discards the whitespace
following whatever p parses. E.g., if you have
number = many1 digit
simply use lexeme number whenever you want to eat up the
whitespace following the number.
For more on this approach to handling whitespace and other advice
on parsing languages, see this Megaparsec tutorial.
Label expressions
Based on your previous SO question it appears you want
to parse expressions of the form:
label1 | label2 | ... | labeln
where each label may be a simple label or a quoted label.
The idiomatic way to parse this pattern is to use sepBy like this:
labels :: Parser String
labels = sepBy1 (try quotedLabel <|> simpleLabel) (char '|')
We define both simpleLabel and quotedLabel in terms of
what characters may occur in them. For simpleLabel a valid
character is a non-| and non-space:
simpleLabel :: Parser String
simpleLabel = many (noneOf "| ")
A quotedLabel is a single quote followed by a run
of valid quotedLabel-characters followed by an ending
single quote:
sq = char '\''
quotedLabel :: Parser String
quotedLabel = do
char sq
chs <- many validChar
char sq
return chs
A validChar is either a non-single quote or a single
quote not followed by eof or a vertical bar:
validChar = noneOf [sq] <|> try validQuote
validQuote = do
char sq
notFollowedBy eof
notFollowedBy (char '|')
return sq
The first notFollowedBy will fail if the single quote appears just
before the end of input. The second notFollowedBy will fail if
next character is a vertical bar. Therefore the sequence of the two
will succeed only if there is a non-vertical bar character following
the single quote. In this case the single quote should be interpreted
as part of the string and not the terminating single quote.
Unfortunately this doesn't quite work because the
current implementation of notFollowedBy
will always succeed with a parser which does not consume any
input -- i.e. like eof. (See this issue for more details.)
To work around this problem we can use this alternate
implementation:
notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
do {a <- try p; return (unexpected (show a));}
<|> return (return ())
Here is the complete solution with some tests. By adding a few lexeme
calls you can make this parser eat up any white space where you decide
it is not significant.
import Text.Parsec hiding (labels)
import Text.Parsec.String
import Control.Monad
notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
do {a <- try p; return (unexpected (show a));}
<|> return (return ())
sq = '\''
validChar = do
noneOf "'" <|> try validQuote
validQuote = do
char sq
notFollowedBy' eof
notFollowedBy (char '|')
return sq
quotedLabel :: Parser String
quotedLabel = do
char sq
str <- many validChar
char sq
return str
plainLabel :: Parser String
plainLabel = many (noneOf "| ")
labels :: Parser [String]
labels = sepBy1 (try quotedLabel <|> try plainLabel) (char '|')
test input expected = do
case parse (labels <* eof) "" input of
Left e -> putStrLn $ "error: " ++ show e
Right v -> if v == expected
then putStrLn $ "OK - got: " ++ show v
else putStrLn $ "NOT OK - got: " ++ show v ++ " expected: " ++ show expected
test1 = test "a|b|c" ["a","b","c"]
test2 = test "a|'b b'|c" ["a", "b b", "c"]
test3 = test "'abc''|def" ["abc'", "def" ]
test4 = test "'abc'" ["abc"]
test5 = test "x|'abc'" ["x","abc"]

To change the result of any functor computation you can just use:
fmap (const x) functor_comp
e.g.:
getLine :: IO String
fmap (const ()) getLine :: IO ()
eof :: Parser ()
oneOf "| " :: Parser Char
fmap (const ()) (oneOf "| ") :: Parser ()
Another option is to use operators from Control.Applicative:
getLine *> return 3 :: IO Integer
This performs getLine, discards the result and returns 3.
In your case, you might use:
try(eof) <|> try( oneOf "| " *> return ())

datatype conversion without using buildExpressionParser

I am stuck at a point in converting a expression entered by the user to my own datatype
I did it using biuldExpressionParser , but using simple parser and recursion I did as follows
openBrace = char '('
closeBrace :: GenParser Char st Char
closeBrace = char ')'
bracketExpr = do
spaces >> openBrace
expr <- expressionParser
spaces >> closeBrace
return expr
bracketExpr will return the entered expression in my own datatype
to convert it into my datatype I did for negation,if expression is a number or a variable as follows:
expressionParser = negate1
<|> number
<|> variable
--<|> addition
<?> "simple expression"
negate1 :: Parser Expr
negate1 = do{ char '-'
;ds <- number
;return (ExprNeg (ds) )
}
<?> "negate"
variable :: Parser Expr
variable = do{ ds<- many1 (letter <|> digit)
; return (ExprVar ds)}
<?> "variable"
number :: Parser Expr
number = do{ ds<- many1 digit
; return (ExprNum (read ds))}
<?> "number"
To do the same for addtion I tried by seperating the expression using sepBy but I am encountering several issues.
If the extered expreesion is 1+2
Then I should getExprAdd (ExprNum 1) (ExprNum 2)
I am unable to proceed further from here .Help would be great.
Thank you.

If you want to be writing a parser with parser combinators you need to think in terms of high-level rules first. Here's a skeleton parser in Parsec; it does not 100% meet your needs because all of the operators are same-precedence and right-associative, whereas you probably want different precedences and left-associativity. Still, this is the basic way to write a parser:
import Text.Parsec
import Text.Parsec.Char
import Data.Char (isDigit)
-- basic data type
data Expr = Op Char Expr Expr | N Integer deriving (Show)
type Parser x = Parsec String () x
-- reverse-sequenced >>, used to implement `parenthesized` and `whitespaced`
(<<) :: Monad m => m x -> m y -> m x
mx << my = mx >>= \x -> my >> return x
infixl 1 <<
parenthesized :: Parser e -> Parser e
parenthesized e = char '(' >> e << char ')'
whitespaced :: Parser e -> Parser e
whitespaced e = spaces >> e << spaces
number :: Parser Expr
number = do
c <- oneOf "123456789" -- leading 0's can be reserved for octal/hexadecimal
cs <- many digit
return (N (read (c:cs)))
operator :: Parser Expr
operator = do
e1 <- expr_no_op
o <- whitespaced (oneOf "+*/-")
e2 <- expression
return (Op o e1 e2)
expr_no_op :: Parser Expr
expr_no_op = whitespaced (try number <|> parenthesized expression)
expression :: Parser Expr
expression = whitespaced (try operator <|> try number <|> parenthesized expression)
Notice that you define tokens (above, just 'number') and then combine them with a "try this <|> try that <|> otherwise..." syntax. Notice also that you have to stop operator from taking an expression as its first argument otherwise you'll get a operator -> expression -> operator loop in the parsing. This is called "left factoring."

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Why does the order of these Haskell Parsec combinators matter? - haskell

Related

string to Integer parsing with Exception Handling in Haskell

Adding infix operator to expression parser

How to use parsec to get sub-strings of specific pattern in a string

Parsec3 Text parser for quoted string, where everything is allowed in between quotes

datatype conversion without using buildExpressionParser

Categories

Resources