datatype conversion without using buildExpressionParser - haskell

I am stuck at a point in converting a expression entered by the user to my own datatype
I did it using biuldExpressionParser , but using simple parser and recursion I did as follows
openBrace = char '('
closeBrace :: GenParser Char st Char
closeBrace = char ')'
bracketExpr = do
spaces >> openBrace
expr <- expressionParser
spaces >> closeBrace
return expr
bracketExpr will return the entered expression in my own datatype
to convert it into my datatype I did for negation,if expression is a number or a variable as follows:
expressionParser = negate1
<|> number
<|> variable
--<|> addition
<?> "simple expression"
negate1 :: Parser Expr
negate1 = do{ char '-'
;ds <- number
;return (ExprNeg (ds) )
}
<?> "negate"
variable :: Parser Expr
variable = do{ ds<- many1 (letter <|> digit)
; return (ExprVar ds)}
<?> "variable"
number :: Parser Expr
number = do{ ds<- many1 digit
; return (ExprNum (read ds))}
<?> "number"
To do the same for addtion I tried by seperating the expression using sepBy but I am encountering several issues.
If the extered expreesion is 1+2
Then I should getExprAdd (ExprNum 1) (ExprNum 2)
I am unable to proceed further from here .Help would be great.
Thank you.

If you want to be writing a parser with parser combinators you need to think in terms of high-level rules first. Here's a skeleton parser in Parsec; it does not 100% meet your needs because all of the operators are same-precedence and right-associative, whereas you probably want different precedences and left-associativity. Still, this is the basic way to write a parser:
import Text.Parsec
import Text.Parsec.Char
import Data.Char (isDigit)
-- basic data type
data Expr = Op Char Expr Expr | N Integer deriving (Show)
type Parser x = Parsec String () x
-- reverse-sequenced >>, used to implement `parenthesized` and `whitespaced`
(<<) :: Monad m => m x -> m y -> m x
mx << my = mx >>= \x -> my >> return x
infixl 1 <<
parenthesized :: Parser e -> Parser e
parenthesized e = char '(' >> e << char ')'
whitespaced :: Parser e -> Parser e
whitespaced e = spaces >> e << spaces
number :: Parser Expr
number = do
c <- oneOf "123456789" -- leading 0's can be reserved for octal/hexadecimal
cs <- many digit
return (N (read (c:cs)))
operator :: Parser Expr
operator = do
e1 <- expr_no_op
o <- whitespaced (oneOf "+*/-")
e2 <- expression
return (Op o e1 e2)
expr_no_op :: Parser Expr
expr_no_op = whitespaced (try number <|> parenthesized expression)
expression :: Parser Expr
expression = whitespaced (try operator <|> try number <|> parenthesized expression)
Notice that you define tokens (above, just 'number') and then combine them with a "try this <|> try that <|> otherwise..." syntax. Notice also that you have to stop operator from taking an expression as its first argument otherwise you'll get a operator -> expression -> operator loop in the parsing. This is called "left factoring."

Related

Operator precedence issue when parsing with Megaparsec

I was parsing a C-like language with array and struct. Following C operator precedence, . and [] are made of equal precedence.
opTable :: [[Operator Parser Expr]]
opTable = [[ InfixL $ Access <$ symbol "." , opSubscript]]
opSubscript = Postfix $ foldr1 (.) <$> some singleIndex
singleIndex = do
index < brackets expr
return $ \l -> ArrayIndex l index
When parsing
Struct S {
int[3] a;
}
Struct S s;
s.a[1]
it yielded
Access (Var "s") (ArrayIndex (Var "a") 1)
instead of
ArrayIndex (Access (Var "s") (Var "a")) 1
Why? Is it because [] is not parsed as InfixL?
Update:
After changing it to
opTable :: [[Operator Parser Expr]]
opTable = [[ PostFix $ (\ident expr -> Access expr ident) <$ symbol "." <*> identifier, opSubscript]]
I got another error
s.a[1]
| ^
unexpected '['
expecting ')', '_', alphanumeric character, or operator
The documentation for makeExprParser from parser-combinators is terrible with respect to prefix and postfix operators.
First, it fails to explain that with a mixture of prefix/postfix/infix operators at the supposed "same" precedence level, the prefix/postfix operators are always treated as higher precedence than the infix operators.
Second, when it makes the claims that "prefix and postfix operators of the same precedence can only occur once" and then gives --2 as an example for prefix operator -, it actually means that even two separate prefix operators (or two separate postfix operators) aren't allowed, so +-2 with separate prefix operators + and - isn't allowed either. What is allowed is a single prefix operator and a single postfix operator, at the same level, in which case the association is to the left, so -2! is okay (assuming - and ! are prefix and postfix operators at the same precedence level) and is parsed as (-2)!.
Oh, and third, the documentation never makes it clear that the example code for manyUnaryOp only works correctly for multiple prefix operators, and a non-obvious change is needed to get multiple postfix operators in the right order.
So, your first attempt doesn't work because the postfix operator is of secretly higher precedence than the infix operator. Your second attempt doesn't work because two different postfix operators at the same precedence level can't be parsed.
Your best bet is to parse single "postfix operator" consisting of a chain of access and index operations. Note the need for flip to get the ordering right for postfix operators.
opTable :: [[Operator Parser Expr]]
opTable = [[ indexAccessChain ]]
indexAccessChain = Postfix $ foldr1 (flip (.)) <$> some (singleIndex <|> singleAccess)
singleIndex = flip ArrayIndex <$> brackets expr
singleAccess = flip Access <$> (char '.' *> identifier)
A self-contained example:
{-# OPTIONS_GHC -Wall #-}
module Operators where
import Text.Megaparsec
import Text.Megaparsec.Char
import Control.Monad.Combinators.Expr
import Data.Void
type Parser = Parsec Void String
data Expr
= Access Expr String
| ArrayIndex Expr Expr
| Var String
| Lit Int
deriving (Show)
expr :: Parser Expr
expr = makeExprParser term opTable
identifier :: Parser String
identifier = some letterChar
term :: Parser Expr
term = Var <$> identifier
<|> Lit . read <$> some digitChar
opTable :: [[Operator Parser Expr]]
opTable = [[ indexAccessChain ]]
indexAccessChain :: Operator Parser Expr
indexAccessChain = Postfix $ foldr1 (flip (.)) <$> some (singleIndex <|> singleAccess)
singleIndex, singleAccess :: Parser (Expr -> Expr)
singleIndex = flip ArrayIndex <$> brackets expr
singleAccess = flip Access <$> (char '.' *> identifier)
brackets :: Parser a -> Parser a
brackets = between (char '[') (char ']')
main :: IO ()
main = parseTest expr "s.a[1][2][3].b.c[4][5][6]"

Adding infix operator to expression parser

I'm trying to add a parser for infix operators to a simple expressions parser. I have already looked at the documentation and at this question, but it seems like I am missing something.
import qualified Text.Parsec.Expr as Expr
import qualified Text.Parsec.Token as Tokens
import Text.ParserCombinators.Parsec
import Text.Parsec
data Expr = Number Integer
| Op Expr Expr
| Boolean Bool
instance Show Expr where
show (Op l r) = "(+ " ++ (show l) ++ " " ++ (show r) ++ ")"
show (Number r) = show r
show (Boolean b) = show b
parens = Tokens.parens haskell
reserved = Tokens.reservedOp haskell
infix_ operator func =
Expr.Infix (spaces >> reserved operator >> spaces >> return func) Expr.AssocLeft
infixOp =
Expr.buildExpressionParser table parser
where
table = [[infix_ "+" Op]]
number :: Parser Expr
number =
do num <- many1 digit
return $ Number $ read num
bool :: Parser Expr
bool = (string "true" >> return (Boolean True)) <|> (string "false" >> return (Boolean False))
parser = parens infixOp <|> number <|> bool
run = Text.Parsec.runParser parser () ""
This parser is able to parse expressions like 1, false, (1 + 2), (1 + false), but not 1 + 2 (it's parsed as 1). If I try to change the parser to parens infixOp <|> infixOp <|> number <|> bool, it get stuck.
What should i change in order to parse expressions like 1 + 2 without parenthesis?
You have to run the infixOp parser at the top level like this:
run = Text.Parsec.runParser infixOp () ""
Otherwise the your infix expressions can only be parsed when occuring in parentheses.
The attempt to use parens infixOp <|> infixOp <|> number <|> bool most likely gets stuck because it loops: parser tries to parse using infixOp, which tries to parse using parse and so on...
These tutorial might help you getting started with parsec (they did for me):
https://wiki.haskell.org/Parsing_a_simple_imperative_language
http://dev.stephendiehl.com/fun/002_parsers.html

Parsing arithmetic expression with Haskell Parsec

I'm writing a arithmetic parser to treat expressions like "1+2-3". I use this blog post as reference. To treat left associativity and precedence, I write a parser with Parsec according to this BNF (from blog post).
<exp> ::= <term> { ("+" | "-") <term> }
<term> ::= <factor> { ("*" | "/") <factor> }
<factor> ::= "(" <exp> ")" | <unary_op> <factor> | <int>
This is my parser code.
parseExp :: Parser Exp
parseExp = do
t1 <- parseTerm
loop t1
where termSuffix t1 = do
op <- lexeme $ oneOf "+-"
t2 <- parseTerm
case op of
'+' -> termSuffix (Binary Plus t1 t2)
'-' -> termSuffix (Binary Minus t1 t2)
loop t = termSuffix t <|> return t
parseTerm :: Parser Exp
parseTerm = do
f1 <- parseFactor
loop f1
where factorSuffix f1 = do
op <- lexeme $ oneOf "*/"
f2 <- parseFactor
case op of
'*' -> factorSuffix (Binary Mul f1 f2)
'/' -> factorSuffix (Binary Div f1 f2)
loop t = factorSuffix t <|> return t
parseFactor :: Parser Exp
parseFactor = parseConst <|> parseParen <|> parseUnary
parseParen = do
void $ lexeme $ char '('
e <- parseExp
void $ lexeme $ char ')'
return e
parseUnary :: Parser Exp
parseUnary = do
op <- lexeme $ oneOf "!~-"
f <- parseFactor
case op of
'!' -> return $ Unary LogNeg f
'~' -> return $ Unary BitCompl f
'-' -> return $ Unary ArithNeg f
parseConst :: Parser Exp
parseConst = do
i <- many1 digit
return (Const $ read i)
I also used this tutorial code as reference. tutorial
simpleExpr7 :: Parser SimpleExpr
simpleExpr7 = do
-- first parse a term
e <- term7
-- then see if it is followed by an '+ expr' suffix
maybeAddSuffix e
where
-- this function takes an expression, and parses a
-- '+ expr' suffix, returning an Add expression
-- it recursively calls itself via the maybeAddSuffix function
addSuffix e0 = do
void $ lexeme $ char '+'
e1 <- term7
maybeAddSuffix (Add e0 e1)
-- this is the wrapper for addSuffix, which adapts it so that if
-- addSuffix fails, it returns just the original expression
maybeAddSuffix e = addSuffix e <|> return e
My code doesn't work. This code works like this.
*Main CodeGen Parser> parseWithEof parseExp "-2"
Right (Unary ArithNeg (Const 2))
*Main CodeGen Parser> parseWithEof parseExp "(2)"
Right (Const 2)
*Main CodeGen Parser> parseWithEof parseExp "-!(((2)))"
Right (Unary ArithNeg (Unary LogNeg (Const 2)))
*Main CodeGen Parser> parseWithEof parseExp "1+2"
Left (line 1, column 4):
unexpected end of input
expecting digit
*Main CodeGen Parser> parseWithEof parseExp "1+2+3"
Left (line 1, column 6):
unexpected end of input
expecting digit
*Main CodeGen Parser> parseWithEof parseExp "1+2*3"
Left (line 1, column 6):
unexpected end of input
expecting digit
I can't understand why this results unexpected end of input.
Consider parsing 1+2. In parseExp this parses 1 into t1 = Const 1 and then enters the loop loop (Const 1). The loop tries the first alternative termSuffix (Const 1) which succesfully parses the operator +, the next term t2 = Const 2, and then loops back into termSuffix (Binary Plus (Const 1) (Const 2)) which expects either a + or -. The parse fails. Instead of looping back into termSuffix, you should loop back into loop to allow a single term after the first +:
parseExp :: Parser Exp
parseExp = do
t1 <- parseTerm
loop t1
where termSuffix t1 = do
op <- lexeme $ oneOf "+-"
t2 <- parseTerm
case op of
-- *** use `loop` here, not `termSuffix` ***
'+' -> loop (Binary Plus t1 t2)
'-' -> loop (Binary Minus t1 t2)
loop t = termSuffix t <|> return t
After making a similar change to parseTerm, your test cases all work fine.

Parsec if a match it found then throw error

I am trying to distinguish between Ints and floats in a parser. I have 2 parsers one for each int and float. However, I am having trouble getting into to fail on a '.'. I looked for negating and look ahead and didn't seem to get and fruits.
I hope I am not duplicating any questions.
I had it working with looking at the next character that is not a '.' but that is an ugly solution.
EDIT: Added more code.
--Int--------------------------------------------------------------------
findInt :: Parser String
findInt = plus <|> minus <|> number
number :: Parser String
number = many1 digit
plus :: Parser String
plus = char '+' *> number
minus :: Parser String
minus = char '-' <:> number
makeInt :: Parser Int
makeInt = prepareResult (findInt <* many (noneOf ".") <* endOfLine)
where readInt = read :: String -> Int
prepareResult = liftA readInt
makeInt2 :: Parser Int
makeInt2 = do
numberFound <- (findInt <* many (noneOf ".") <* endOfLine)
match <- char '.'
return (prepareResult numberFound)
where readInt = read :: String -> Int
prepareResult = readInt
--End Int----------------------------------------------------------------
I think you are best off actually combining the two parsers into one. Try something like this:
import Text.Parsec.String (Parser)
import Control.Applicative ((<|>))
import Text.Parsec.Char (char,digit)
import Text.Parsec.Combinator (many1,optionMaybe)
makeIntOrFloat :: Parser (Either Int Float)
makeIntOrFloat = do
sign <- optionMaybe (char '-' <|> char '+')
n <- many1 digit
m <- optionMaybe (char '.' *> many1 digit)
return $ case (m,sign) of
(Nothing, Just '-') -> Left (negate (read n))
(Nothing, _) -> Left (read n)
(Just m, Just '-') -> Right (negate (read n + read m / 10.0^(length m)))
(Just m, _) -> Right (read n + read m / 10.0^(length m))
ErikR has a correct solution, but the use of try means that parsec has to keep track of the possibility of backtracking (which is a bit inefficient) when in fact that is unnecessary in this case.
Here, the key difference is that we can actually tell right away if we have a float or not - if we don't have a float, the char '.' *> many1 digit parser in optionMaybe will fail immediately (without consuming input), so there is no need to consider backtracking.
At GHCi
ghci> import Text.Parsec.Prim
ghci> parseTest makeIntOrFloat "1234.012"
Right 1234.012
ghci> parseTest makeIntOrFloat "1234"
Left 1234
I would use notFollowedBy - e.g.:
import Text.Parsec
import Text.Parsec.String
import Text.Parsec.Combinator
int :: Parser String
int = many1 digit <* notFollowedBy (char '.')
float :: Parser (String,String)
float = do whole <- many1 digit
fracpart <- try (char '.' *> many digit) <|> (return "")
return (whole, fracpart)
intOrFloat :: Parser (Either String (String,String))
intOrFloat = try (fmap Left int) <|> (fmap Right float)
test1 = parseTest (intOrFloat <* eof) "123"
test2 = parseTest (intOrFloat <* eof) "123.456"
test3 = parseTest (intOrFloat <* eof) "123."
It is typically easiest to use applicative combinators to build your parsers - this makes your parsers easier to reason about and often you do not need monadic and backtracking functions of the parser.
For example, a parser for integers could be written as such:
import Text.Parsec hiding ((<|>), optional)
import Text.Parsec.String
import Numeric.Natural
import Control.Applicative
import Data.Foldable
natural :: Parser Natural
natural = read <$> many1 digit
sign :: Num a => Parser (a -> a)
sign = asum [ id <$ char '+'
, negate <$ char '-'
, pure id
]
integer :: Parser Integer
integer = sign <*> (fromIntegral <$> natural)
A decimal number is an integer optionally followed by a decimal portion (a '.' followed by another integer), which is itself a number proper, so your parser can be written as
decimalPart :: Parser Double
decimalPart = read . ("0."++) <$> (char '.' *> many1 digit)
integerOrDecimal :: Parser (Either Integer Double)
integerOrDecimal = liftA2 cmb integer (optional decimalPart) where
cmb :: Integer -> Maybe Double -> Either Integer Double
cmb x Nothing = Left x
cmb x (Just d) = Right (fromIntegral x + d)
The definition of cmb is obvious - if the is no decimal part, then produce an Integer, and if there is, produce a Double, by adding the integer part to the decimal part.
You can also define a parser for decimals in terms of the above:
decimal :: Parser Double
decimal = either fromIntegral id <$> integerOrDecimal
Note that none of the above parsers directly use monadic functions (i.e. >>=) or backtracking - making them simple and efficient.

How to make the Parsec chainl1 function follow operator precedence rules

I'm programming a standard math notation -> DC POSIX-compliant format converter. It takes the input string, parses it into an intermediary data type and then turns it into the output string by showing it.
This is the Data type used. I have no problems with the Data type -> Output String conversion, it works flawlessly:
data Expression = Expression :+ Expression
| Expression :- Expression
| Expression :* Expression
| Expression :/ Expression
| Expression :^ Expression
| Cons String
infixr 0 :+
infixr 0 :-
infixr 1 :*
infixr 1 :/
infixr 2 :^
instance Show Expression where
show (x :+ y) = unwords [show x, show y, "+"]
show (x :- y) = unwords [show x, show y, "-"]
show (x :* y) = unwords [show x, show y, "*"]
show (x :/ y) = unwords [show x, show y, "/"]
show (x :^ y) = unwords [show x, show y, "^"]
show (Cons y) = y
The Parsec parser part, however, refuses to comply with the defined operator precedency rules. Clearly because of the way chainl1 is used in the subexpression parser definition:
expression :: Parser Expression
expression = do
spaces
x <- subexpression
spaces >> eof >> return x
subexpression :: Parser Expression
subexpression = (
(bracketed subexpression) <|>
constant
) `chainl1` (
try addition <|>
try substraction <|>
try multiplication <|>
try division <|>
try exponentiation
)
addition = operator '+' (:+)
substraction = operator '-' (:-)
multiplication = operator '*' (:*)
division = operator '/' (:/)
exponentiation = operator '^' (:^)
operator :: Char -> (a -> a -> a) -> Parser (a -> a -> a)
operator c op = do
spaces >> char c >> spaces
return op
bracketed :: Parser a -> Parser a
bracketed parser = do
char '('
x <- parser
char ')'
return x
constant :: Parser Expression
constant = do
parity <- optionMaybe $ oneOf "-+"
constant <- many1 (digit <|> char '.')
return (if parity == Just '-'
then (Cons $ '_':constant)
else Cons constant)
Is there a way of making the parser take into account the operator precedence rules without having to rewrite the entirety of my code?
Well, you don't need to rewrite your entire code, but since your subexpression parser doesn't take precedence into account at all, you have to rewrite that - substantially.
One possibility is to build it from parsers for subexpressions with top-level operators with the same precedence,
atom :: Parser Expression
atom = bracketed subexpression <|> constant
-- highest precedence operator is exponentiation, usually that's
-- right-associative, hence I use chainr1 here
powers :: Parser Expression
powers = atom `chainr1` try exponentiation
-- a multiplicative expression is a product or quotient of powers,
-- left-associative
multis :: Parser Expression
multis = powers `chainl1` (try multiplication <|> try division)
-- a subexpression is a sum (or difference) of multiplicative expressions
subexpression :: Parser Expression
subexpression = multis `chainl1` (try addition <|> try substraction)
Another option is to let the precedence and associativities be taken care of by the library and use Text.Parsec.Expr, namely buildExpressionParser:
table = [ [binary "^" (:^) AssocRight]
, [binary "*" (:*) AssocLeft, binary "/" (:/) AssocLeft]
, [binary "+" (:+) AssocLeft, binary "-" (:-) AssocLeft]
]
binary name fun assoc = Infix (do{ string name; spaces; return fun }) assoc
subexpression = buildExpressionParser table atom
(which requires that bracketed parser and constant consume the spaces after the used tokens).

Resources