Adding infix operator to expression parser - haskell

I'm trying to add a parser for infix operators to a simple expressions parser. I have already looked at the documentation and at this question, but it seems like I am missing something.
import qualified Text.Parsec.Expr as Expr
import qualified Text.Parsec.Token as Tokens
import Text.Parsec.Language (haskell)
import Text.ParserCombinators.Parsec
import Text.Parsec
data Expr = Number Integer
          | Op Expr Expr
          | Boolean Bool
instance Show Expr where
  show (Op l r) = "(+ " ++ (show l) ++ " " ++ (show r) ++ ")"
  show (Number r) = show r
  show (Boolean b) = show b
parens = Tokens.parens haskell
reserved = Tokens.reservedOp haskell
infix_ operator func =
  Expr.Infix (spaces >> reserved operator >> spaces >> return func) Expr.AssocLeft
infixOp =
  Expr.buildExpressionParser table parser
  where
    table = [[infix_ "+" Op]]
number :: Parser Expr
number =
  do num <- many1 digit
     return $ Number $ read num
bool :: Parser Expr
bool = (string "true" >> return (Boolean True)) <|> (string "false" >> return (Boolean False))
parser = parens infixOp <|> number <|> bool
run = Text.Parsec.runParser parser () ""
This parser is able to parse expressions like 1, false, (1 + 2), (1 + false), but not 1 + 2 (it's parsed as 1). If I try to change the parser to parens infixOp <|> infixOp <|> number <|> bool, it gets stuck.
What should I change in order to parse expressions like 1 + 2 without parentheses?

You have to run the infixOp parser at the top level like this:
run = Text.Parsec.runParser infixOp () ""
Otherwise your infix expressions can only be parsed when they occur in parentheses.
The attempt to use parens infixOp <|> infixOp <|> number <|> bool most likely gets stuck because it loops: parser tries to parse using infixOp, which in turn tries to parse using parser, and so on without consuming any input.
These tutorials might help you get started with Parsec (they did for me):
https://wiki.haskell.org/Parsing_a_simple_imperative_language
http://dev.stephendiehl.com/fun/002_parsers.html
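As a minimal sketch of the fix described above, the operator table can be layered on top of a parser for atoms only, with the table-built parser used as the entry point. The names lexeme, term and expr are assumptions introduced for this illustration, and plain char-based tokens are used instead of the haskell token parser from the question.

```haskell
import qualified Text.Parsec.Expr as Expr
import Text.Parsec
import Text.Parsec.String (Parser)

data Expr = Number Integer | Op Expr Expr | Boolean Bool
  deriving (Show, Eq)

-- Hypothetical helper: run a parser, then skip trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- Atoms only. The recursive call to expr happens after "(" has been
-- consumed, so the parser can never loop without making progress.
term :: Parser Expr
term = lexeme (between (lexeme (char '(')) (char ')') expr)
   <|> lexeme (Number . read <$> many1 digit)
   <|> lexeme (Boolean True <$ string "true")
   <|> lexeme (Boolean False <$ string "false")

-- The operator table is applied on top of the atoms.
expr :: Parser Expr
expr = Expr.buildExpressionParser table term
  where
    table = [[Expr.Infix (Op <$ lexeme (char '+')) Expr.AssocLeft]]

-- Entry point: the table-built parser, not the atom parser.
run :: String -> Either ParseError Expr
run = parse (spaces *> expr <* eof) ""
```

With this layout, run "1 + 2" yields Right (Op (Number 1) (Number 2)), and parenthesized input such as "(1 + false)" still parses.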

Related

Why does the order of these Haskell Parsec combinators matter?

I want to make a simple parser to parse an addition expression. Here is my code:
import Text.Parsec.Char
import Text.Parsec.String
import Text.ParserCombinators.Parsec
data Expr = Number Float
          | Add Expr Expr
          deriving (Show)
number :: Parser Expr
number = do
  n <- try $ many1 digit
  return $ Number $ read n
add :: Parser Expr
add = do
  e1 <- number
  char '+'
  e2 <- number
  return $ Add e1 e2
expr :: Parser Expr
expr = try number <|> try add
p :: String -> Either ParseError Expr
p = parse (do{e <- expr; eof; return e}) "error"
But here is the output
ghci> parse add "err" "1+2"
Right (Add (Number 1.0) (Number 2.0))
ghci> p "1"
Right (Number 1.0)
ghci> p "1+2"
Left "error" (line 1, column 2):
unexpected '+'
expecting digit or end of input
But if I change the order of the expr combinators to
expr :: Parser Expr
expr = try add <|> try number
Then the output changes to
ghci> p "1+2"
Right (Add (Number 1.0) (Number 2.0))
Why does this happen? I thought the try keyword forces the Parsers I am combining to restart after each <|>.
I plan on making this much larger so I want to be sure I understand why this is happening now.
My actual program is already larger, but this is still causing a problem independently.
The problem you're facing is that when the string "1+2" is parsed with number, it succeeds (admittedly, with some unparsed characters). The use of try only matters if it had failed.
Perhaps another way to show this is to consider the example try (string "a") <|> try (string "ab"). On any input starting with the character a, the first alternative succeeds after consuming just that a, so the second alternative is never tried and "ab" is never matched as a whole.
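The point about alternative order can be checked with a small sketch. The names shortFirst, longFirst and whole are made up for this illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The first alternative succeeds on "ab" after consuming only "a",
-- so the second alternative is never attempted.
shortFirst :: Parser String
shortFirst = try (string "a") <|> try (string "ab")

-- With the longer alternative first, try lets the parser fall back
-- to the shorter one only when the longer one fails.
longFirst :: Parser String
longFirst = try (string "ab") <|> try (string "a")

-- Hypothetical helper: require the whole input to be consumed.
whole :: Parser a -> String -> Either ParseError a
whole p = parse (p <* eof) ""
```

Here whole shortFirst "ab" fails on the leftover 'b', while whole longFirst "ab" gives Right "ab" and whole longFirst "a" gives Right "a".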
If you had tried instead
exprAll :: Parser Expr
exprAll = try (number <* eof) <|> try (add <* eof)
then you may get the behavior you're looking for. In this case, the "try"d parser does not succeed unless it reaches the end of the input, so when the + is encountered, the parse attempt of number <* eof fails and parsing starts over using add <* eof.
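Another option, which is not what the answer above proposes, is to avoid backtracking altogether by parsing the shared prefix once and then deciding what follows. The sketch below assumes the Expr type from the question with Show and Eq derived.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Expr = Number Float | Add Expr Expr
  deriving (Show, Eq)

number :: Parser Expr
number = Number . read <$> many1 digit

-- Parse the leading number once; if a '+' follows, build an Add node,
-- otherwise return the number unchanged.
expr :: Parser Expr
expr = do
  e1 <- number
  option e1 (Add e1 <$> (char '+' *> number))

p :: String -> Either ParseError Expr
p = parse (expr <* eof) "error"
```

With this version p "1" gives Right (Number 1.0) and p "1+2" gives Right (Add (Number 1.0) (Number 2.0)), and the order of alternatives no longer matters because there is only one path through the shared prefix.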

Operator precedence issue when parsing with Megaparsec

I was parsing a C-like language with arrays and structs. Following C operator precedence, . and [] are given equal precedence.
opTable :: [[Operator Parser Expr]]
opTable = [[ InfixL $ Access <$ symbol "." , opSubscript]]
opSubscript = Postfix $ foldr1 (.) <$> some singleIndex
singleIndex = do
  index <- brackets expr
  return $ \l -> ArrayIndex l index
When parsing
Struct S {
int[3] a;
}
Struct S s;
s.a[1]
it yielded
Access (Var "s") (ArrayIndex (Var "a") 1)
instead of
ArrayIndex (Access (Var "s") (Var "a")) 1
Why? Is it because [] is not parsed as InfixL?
Update:
After changing it to
opTable :: [[Operator Parser Expr]]
opTable = [[ Postfix $ (\ident expr -> Access expr ident) <$ symbol "." <*> identifier, opSubscript]]
I got another error
s.a[1]
| ^
unexpected '['
expecting ')', '_', alphanumeric character, or operator
The documentation for makeExprParser from parser-combinators is terrible with respect to prefix and postfix operators.
First, it fails to explain that with a mixture of prefix/postfix/infix operators at the supposed "same" precedence level, the prefix/postfix operators are always treated as higher precedence than the infix operators.
Second, when it makes the claims that "prefix and postfix operators of the same precedence can only occur once" and then gives --2 as an example for prefix operator -, it actually means that even two separate prefix operators (or two separate postfix operators) aren't allowed, so +-2 with separate prefix operators + and - isn't allowed either. What is allowed is a single prefix operator and a single postfix operator, at the same level, in which case the association is to the left, so -2! is okay (assuming - and ! are prefix and postfix operators at the same precedence level) and is parsed as (-2)!.
Oh, and third, the documentation never makes it clear that the example code for manyUnaryOp only works correctly for multiple prefix operators, and a non-obvious change is needed to get multiple postfix operators in the right order.
So, your first attempt doesn't work because the postfix operator is of secretly higher precedence than the infix operator. Your second attempt doesn't work because two different postfix operators at the same precedence level can't be parsed.
Your best bet is to parse a single "postfix operator" consisting of a chain of access and index operations. Note the need for flip to get the ordering right for postfix operators.
opTable :: [[Operator Parser Expr]]
opTable = [[ indexAccessChain ]]
indexAccessChain = Postfix $ foldr1 (flip (.)) <$> some (singleIndex <|> singleAccess)
singleIndex = flip ArrayIndex <$> brackets expr
singleAccess = flip Access <$> (char '.' *> identifier)
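Why flip is needed can be seen without any parser library. In the sketch below, the postfix builders are replaced by hypothetical string-building stand-ins (index and access) so that the order of application is visible in the result.

```haskell
-- Stand-ins for the postfix builders, producing strings instead of an AST.
index :: String -> String -> String
index i e = e ++ "[" ++ i ++ "]"

access :: String -> String -> String
access f e = e ++ "." ++ f

-- foldr1 (.) composes so that the LAST operation in the list is applied first.
wrongOrder :: String
wrongOrder = foldr1 (.) [access "a", index "1"] "s"

-- foldr1 (flip (.)) applies the operations in the order they were parsed.
rightOrder :: String
rightOrder = foldr1 (flip (.)) [access "a", index "1"] "s"
```

Here wrongOrder evaluates to "s[1].a", while rightOrder evaluates to "s.a[1]", which matches the source order of the postfix chain.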
A self-contained example:
{-# OPTIONS_GHC -Wall #-}
module Operators where
import Text.Megaparsec
import Text.Megaparsec.Char
import Control.Monad.Combinators.Expr
import Data.Void
type Parser = Parsec Void String
data Expr
  = Access Expr String
  | ArrayIndex Expr Expr
  | Var String
  | Lit Int
  deriving (Show)
expr :: Parser Expr
expr = makeExprParser term opTable
identifier :: Parser String
identifier = some letterChar
term :: Parser Expr
term = Var <$> identifier
  <|> Lit . read <$> some digitChar
opTable :: [[Operator Parser Expr]]
opTable = [[ indexAccessChain ]]
indexAccessChain :: Operator Parser Expr
indexAccessChain = Postfix $ foldr1 (flip (.)) <$> some (singleIndex <|> singleAccess)
singleIndex, singleAccess :: Parser (Expr -> Expr)
singleIndex = flip ArrayIndex <$> brackets expr
singleAccess = flip Access <$> (char '.' *> identifier)
brackets :: Parser a -> Parser a
brackets = between (char '[') (char ']')
main :: IO ()
main = parseTest expr "s.a[1][2][3].b.c[4][5][6]"

Parsec3 Text parser for quoted string, where everything is allowed in between quotes

I have actually asked this question before (here), but it turns out that the solution provided did not handle all test cases. Also, I need a 'Text' parser rather than a 'String' one, so I need parsec3.
OK, the parser should allow for EVERY type of char in between quotes, even quotes. The end of the quoted text is marked by a ' character, followed by |, a space or end of input.
So,
'aa''''|
should return a string
aa'''
This is what I have:
import Control.Monad (liftM)
import Data.Text (Text, pack)
import Text.Parsec
import Text.Parsec.Text
quotedLabel :: Parser Text
quotedLabel = do -- reads the first quote.
  spaces
  string "'"
  lab <- liftM pack $ endBy1 anyChar endOfQuote
  return lab
endOfQuote = do
  string "'"
  try(eof) <|> try( oneOf "| ")
Now, the problem here is of course that eof has a different type than oneOf "| ", so compilation fails.
How do I fix this? Is there a better way to achieve what I am trying to do?
Whitespace
First a comment on handling white space...
Generally the practice is to write your parsers so that they
consume the whitespace following a token
or syntactic unit. It's common to define a combinator like:
lexeme p = p <* spaces
to easily convert a parser p to one that discards the whitespace
following whatever p parses. E.g., if you have
number = many1 digit
simply use lexeme number whenever you want to eat up the
whitespace following the number.
For more on this approach to handling whitespace and other advice
on parsing languages, see this Megaparsec tutorial.
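A short sketch of the lexeme convention in use, assuming a String-based Parsec parser. The numbers parser and its comma-separated input format are made up for this illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run a parser, then discard any whitespace that follows it.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

number :: Parser Integer
number = read <$> many1 digit

-- Hypothetical example: a comma-separated list of numbers.
-- Leading whitespace is skipped once at the start; after that, every
-- token is responsible for the whitespace that follows it.
numbers :: Parser [Integer]
numbers = spaces *> sepBy1 (lexeme number) (lexeme (char ',')) <* eof
```

For example, parse numbers "" " 1 , 2,3  " gives Right [1,2,3].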
Label expressions
Based on your previous SO question it appears you want
to parse expressions of the form:
label1 | label2 | ... | labeln
where each label may be a simple label or a quoted label.
The idiomatic way to parse this pattern is to use sepBy like this:
labels :: Parser [String]
labels = sepBy1 (try quotedLabel <|> simpleLabel) (char '|')
We define both simpleLabel and quotedLabel in terms of
what characters may occur in them. For simpleLabel a valid
character is a non-| and non-space:
simpleLabel :: Parser String
simpleLabel = many (noneOf "| ")
A quotedLabel is a single quote followed by a run
of valid quotedLabel-characters followed by an ending
single quote:
sq = '\''
quotedLabel :: Parser String
quotedLabel = do
  char sq
  chs <- many validChar
  char sq
  return chs
A validChar is either a non-single quote or a single
quote not followed by eof or a vertical bar:
validChar = noneOf [sq] <|> try validQuote
validQuote = do
  char sq
  notFollowedBy eof
  notFollowedBy (char '|')
  return sq
The first notFollowedBy will fail if the single quote appears just
before the end of input. The second notFollowedBy will fail if
next character is a vertical bar. Therefore the sequence of the two
will succeed only if there is a non-vertical bar character following
the single quote. In this case the single quote should be interpreted
as part of the string and not the terminating single quote.
Unfortunately this doesn't quite work because the
current implementation of notFollowedBy
will always succeed with a parser which does not consume any
input -- i.e. like eof. (See this issue for more details.)
To work around this problem we can use this alternate
implementation:
notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
      do {a <- try p; return (unexpected (show a));}
  <|> return (return ())
Here is the complete solution with some tests. By adding a few lexeme
calls you can make this parser eat up any white space where you decide
it is not significant.
import Text.Parsec hiding (labels)
import Text.Parsec.String
import Control.Monad
notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
      do {a <- try p; return (unexpected (show a));}
  <|> return (return ())
sq = '\''
validChar = do
  noneOf "'" <|> try validQuote
validQuote = do
  char sq
  notFollowedBy' eof
  notFollowedBy (char '|')
  return sq
quotedLabel :: Parser String
quotedLabel = do
  char sq
  str <- many validChar
  char sq
  return str
plainLabel :: Parser String
plainLabel = many (noneOf "| ")
labels :: Parser [String]
labels = sepBy1 (try quotedLabel <|> try plainLabel) (char '|')
test input expected = do
  case parse (labels <* eof) "" input of
    Left e -> putStrLn $ "error: " ++ show e
    Right v -> if v == expected
                 then putStrLn $ "OK - got: " ++ show v
                 else putStrLn $ "NOT OK - got: " ++ show v ++ " expected: " ++ show expected
test1 = test "a|b|c" ["a","b","c"]
test2 = test "a|'b b'|c" ["a", "b b", "c"]
test3 = test "'abc''|def" ["abc'", "def" ]
test4 = test "'abc'" ["abc"]
test5 = test "x|'abc'" ["x","abc"]
To change the result of any functor computation you can just use:
fmap (const x) functor_comp
e.g.:
getLine :: IO String
fmap (const ()) getLine :: IO ()
eof :: Parser ()
oneOf "| " :: Parser Char
fmap (const ()) (oneOf "| ") :: Parser ()
Another option is to use operators from Control.Applicative:
getLine *> return 3 :: IO Integer
This performs getLine, discards the result and returns 3.
In your case, you might use:
try(eof) <|> try( oneOf "| " *> return ())
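The same idea can be written with void from Control.Monad, which discards a result just like fmap (const ()). The sketch below is an assumption-laden variant of endOfQuote: it uses a String parser rather than the Text parser from the question to stay self-contained, and it drops the try wrappers because neither alternative consumes input when it fails.

```haskell
import Control.Monad (void)
import Text.Parsec
import Text.Parsec.String (Parser)

-- A closing quote followed by end of input, '|' or a space.
-- void turns the Parser Char into a Parser () so both branches agree.
endOfQuote :: Parser ()
endOfQuote = char '\'' *> (eof <|> void (oneOf "| "))
```

With this definition, parse endOfQuote "" "'|" and parse endOfQuote "" "'" both succeed with Right (), while "'x" is rejected.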

Parser for JSON String

I'm trying to write a parser for a JSON String.
A valid example, per my parser, would be: "\"foobar\"" or "\"foo\"bar\"".
Here's what I attempted, but it does not terminate:
parseEscapedQuotes :: Parser String
parseEscapedQuotes = Parser f
  where
    f ('"':xs) = Just ("\"", xs)
    f _ = Nothing
parseStringJValue :: Parser JValue
parseStringJValue = (\x -> S (concat x)) <$>
  ((char '"') *>
   (zeroOrMore (alt parseEscapedQuotes (oneOrMore (notChar '"'))))
   <* (char '"'))
My reasoning is that, I can have a repetition of either escaped quotes "\"" or characters not equal to ".
But it's not working as I expected:
ghci> runParser parseStringJValue "\"foobar\""
Nothing
I don't know what parser combinator library you are using, but here is a working example using Parsec. I'm using monadic style to make it clearer what's going on, but it is easily translated to applicative style.
import Text.Parsec
import Text.Parsec.String
jchar :: Parser Char
jchar = escaped <|> anyChar
escaped :: Parser Char
escaped = do
  char '\\'
  c <- oneOf ['"', '\\', 'r', 't' ] -- etc.
  return $ case c of
    'r' -> '\r'
    't' -> '\t'
    _   -> c
jstringLiteral :: Parser String
jstringLiteral = do
  char '"'
  cs <- manyTill jchar (char '"')
  return cs
test1 = parse jstringLiteral "" "\"This is a test\""
test2 = parse jstringLiteral "" "\"This is an embedded quote: \\\" after quote\""
test3 = parse jstringLiteral "" "\"Embedded return: \\r\""
Note the extra level of backslashes needed to represent parser input as Haskell string literals. Reading the input from a file would make creating the parser input more convenient.
The definition of the manyTill combinator is:
manyTill p end = scan
  where
    scan = do{ end; return [] }
         <|>
           do{ x <- p; xs <- scan; return (x:xs) }
and this might help you figure out why your definitions aren't working.

datatype conversion without using buildExpressionParser

I am stuck at a point in converting an expression entered by the user to my own datatype.
I did it using buildExpressionParser, but using a simple parser and recursion I did as follows:
openBrace = char '('
closeBrace :: GenParser Char st Char
closeBrace = char ')'
bracketExpr = do
  spaces >> openBrace
  expr <- expressionParser
  spaces >> closeBrace
  return expr
bracketExpr will return the entered expression in my own datatype.
To convert it into my datatype, I handled negation, numbers and variables as follows:
expressionParser = negate1
               <|> number
               <|> variable
               --<|> addition
               <?> "simple expression"
negate1 :: Parser Expr
negate1 = do{ char '-'
            ; ds <- number
            ; return (ExprNeg (ds) )
            }
          <?> "negate"
variable :: Parser Expr
variable = do{ ds<- many1 (letter <|> digit)
             ; return (ExprVar ds)}
           <?> "variable"
number :: Parser Expr
number = do{ ds<- many1 digit
           ; return (ExprNum (read ds))}
         <?> "number"
To do the same for addition I tried separating the expression using sepBy, but I am encountering several issues.
If the entered expression is 1+2
then I should get ExprAdd (ExprNum 1) (ExprNum 2)
I am unable to proceed further from here. Help would be great.
Thank you.
If you want to be writing a parser with parser combinators you need to think in terms of high-level rules first. Here's a skeleton parser in Parsec; it does not 100% meet your needs because all of the operators are same-precedence and right-associative, whereas you probably want different precedences and left-associativity. Still, this is the basic way to write a parser:
import Text.Parsec
import Text.Parsec.Char
import Data.Char (isDigit)
-- basic data type
data Expr = Op Char Expr Expr | N Integer deriving (Show)
type Parser x = Parsec String () x
-- reverse-sequenced >>, used to implement `parenthesized` and `whitespaced`
(<<) :: Monad m => m x -> m y -> m x
mx << my = mx >>= \x -> my >> return x
infixl 1 <<
parenthesized :: Parser e -> Parser e
parenthesized e = char '(' >> e << char ')'
whitespaced :: Parser e -> Parser e
whitespaced e = spaces >> e << spaces
number :: Parser Expr
number = do
  c <- oneOf "123456789" -- leading 0's can be reserved for octal/hexadecimal
  cs <- many digit
  return (N (read (c:cs)))
operator :: Parser Expr
operator = do
  e1 <- expr_no_op
  o <- whitespaced (oneOf "+*/-")
  e2 <- expression
  return (Op o e1 e2)
expr_no_op :: Parser Expr
expr_no_op = whitespaced (try number <|> parenthesized expression)
expression :: Parser Expr
expression = whitespaced (try operator <|> try number <|> parenthesized expression)
Notice that you define tokens (above, just 'number') and then combine them with a "try this <|> try that <|> otherwise..." syntax. Notice also that you have to stop operator from taking an expression as its first argument; otherwise you'll get an operator -> expression -> operator loop in the parsing. This is often loosely called "left factoring", though strictly speaking what it does is remove left recursion.
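For the different precedences and left-associativity mentioned above, one possible sketch uses Parsec's chainl1 combinator with one grammar level per precedence. This is a variation on the skeleton, not part of it: the names lexeme, binary, factor, term and parseExpr are assumptions made for this illustration, and numbers here allow leading zeros for brevity.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Expr = Op Char Expr Expr | N Integer
  deriving (Show, Eq)

-- Run a parser, then skip trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

number :: Parser Expr
number = lexeme (N . read <$> many1 digit)

-- One binary operator symbol, returned as a function that builds the node.
binary :: Char -> Parser (Expr -> Expr -> Expr)
binary c = Op c <$ lexeme (char c)

-- Atoms: numbers and parenthesized expressions.
factor :: Parser Expr
factor = number <|> between (lexeme (char '(')) (lexeme (char ')')) expression

-- '*' and '/' bind tighter than '+' and '-';
-- chainl1 makes each level left-associative.
term :: Parser Expr
term = factor `chainl1` (binary '*' <|> binary '/')

expression :: Parser Expr
expression = term `chainl1` (binary '+' <|> binary '-')

parseExpr :: String -> Either ParseError Expr
parseExpr = parse (spaces *> expression <* eof) ""
```

With this layout, parseExpr "1 - 2 - 3" yields Right (Op '-' (Op '-' (N 1) (N 2)) (N 3)), and parseExpr "1 + 2 * 3" yields Right (Op '+' (N 1) (Op '*' (N 2) (N 3))). Because each level starts by parsing the next tighter level, no rule calls itself before consuming input.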
