Monadic equivalent of applicative <* - haskell

After having read Anthony's response on a style-related parser question, I was trying to convince myself that writing monadic parsers can still be rather compact.
So instead of
reference :: Parser Transc
reference = try $ do
    string "#{"
    a <- number
    char ','
    b <- number
    char ','
    c <- number
    char '}'
    return $ Outside (a,b,c)
We can simply have:
reference3 :: Parser Transc
reference3 = liftM3 (((Outside .) .) . (,,))
    (string "#{" >> number <<! char ',')
    number
    (char ',' >> number <<! char '}')
  where
    (<<!) = liftM2 const
Which is very similar to the applicative version provided by Anthony:
reference2 :: Parser Transc
reference2 = ((Outside .) .) . (,,)
    <$> (string "#{" *> number2 <* char ',')
    <*> number2
    <*> (char ',' *> number2 <* char '}')
...except for the <<! operator, which is conceptually similar to <*: the latter is defined as liftA2 const and means "sequence both, but discard the right-hand value and keep the one on the left".
Of course << would have been a bad name for liftM2 const: it would have suggested that << is equivalent to flip >>, if we follow the same logic as >>= and =<<.
I can't find "liftM2 const" under a single name anywhere. Is this because it is not that useful?

I don't quite see the problem. Every monad is also an applicative functor, so you can simply use (<*) in the monadic expressions as well.
(At the time of this answer (year 2011), Applicative was not a superclass of Monad, so it may have been necessary to add a corresponding class instance.)
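For example (just a sketch of that suggestion, not code from either answer), reference3 can drop the homemade operator entirely and borrow (<*):
reference3 :: Parser Transc
reference3 = liftM3 (((Outside .) .) . (,,))
    (string "#{" >> number <* char ',')
    number
    (char ',' >> number <* char '}')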

Related

Operator precedence issue when parsing with Megaparsec

I was parsing a C-like language with arrays and structs. Following C operator precedence, . and [] are given equal precedence.
opTable :: [[Operator Parser Expr]]
opTable = [[ InfixL $ Access <$ symbol "." , opSubscript]]
opSubscript = Postfix $ foldr1 (.) <$> some singleIndex
singleIndex = do
  index <- brackets expr
  return $ \l -> ArrayIndex l index
When parsing
Struct S {
int[3] a;
}
Struct S s;
s.a[1]
it yielded
Access (Var "s") (ArrayIndex (Var "a") 1)
instead of
ArrayIndex (Access (Var "s") (Var "a")) 1
Why? Is it because [] is not parsed as InfixL?
Update:
After changing it to
opTable :: [[Operator Parser Expr]]
opTable = [[ Postfix $ (\ident expr -> Access expr ident) <$ symbol "." <*> identifier, opSubscript]]
I got another error
s.a[1]
| ^
unexpected '['
expecting ')', '_', alphanumeric character, or operator
The documentation for makeExprParser from parser-combinators is terrible with respect to prefix and postfix operators.
First, it fails to explain that with a mixture of prefix/postfix/infix operators at the supposed "same" precedence level, the prefix/postfix operators are always treated as higher precedence than the infix operators.
Second, when it makes the claims that "prefix and postfix operators of the same precedence can only occur once" and then gives --2 as an example for prefix operator -, it actually means that even two separate prefix operators (or two separate postfix operators) aren't allowed, so +-2 with separate prefix operators + and - isn't allowed either. What is allowed is a single prefix operator and a single postfix operator, at the same level, in which case the association is to the left, so -2! is okay (assuming - and ! are prefix and postfix operators at the same precedence level) and is parsed as (-2)!.
Oh, and third, the documentation never makes it clear that the example code for manyUnaryOp only works correctly for multiple prefix operators, and a non-obvious change is needed to get multiple postfix operators in the right order.
So, your first attempt doesn't work because the postfix operator is of secretly higher precedence than the infix operator. Your second attempt doesn't work because two different postfix operators at the same precedence level can't be parsed.
Your best bet is to parse a single "postfix operator" consisting of a chain of access and index operations. Note the need for flip to get the ordering right for postfix operators.
opTable :: [[Operator Parser Expr]]
opTable = [[ indexAccessChain ]]
indexAccessChain = Postfix $ foldr1 (flip (.)) <$> some (singleIndex <|> singleAccess)
singleIndex = flip ArrayIndex <$> brackets expr
singleAccess = flip Access <$> (char '.' *> identifier)
A self-contained example:
{-# OPTIONS_GHC -Wall #-}
module Operators where
import Text.Megaparsec
import Text.Megaparsec.Char
import Control.Monad.Combinators.Expr
import Data.Void
type Parser = Parsec Void String
data Expr
  = Access Expr String
  | ArrayIndex Expr Expr
  | Var String
  | Lit Int
  deriving (Show)
expr :: Parser Expr
expr = makeExprParser term opTable
identifier :: Parser String
identifier = some letterChar
term :: Parser Expr
term = Var <$> identifier
   <|> Lit . read <$> some digitChar
opTable :: [[Operator Parser Expr]]
opTable = [[ indexAccessChain ]]
indexAccessChain :: Operator Parser Expr
indexAccessChain = Postfix $ foldr1 (flip (.)) <$> some (singleIndex <|> singleAccess)
singleIndex, singleAccess :: Parser (Expr -> Expr)
singleIndex = flip ArrayIndex <$> brackets expr
singleAccess = flip Access <$> (char '.' *> identifier)
brackets :: Parser a -> Parser a
brackets = between (char '[') (char ']')
main :: IO ()
main = parseTest expr "s.a[1][2][3].b.c[4][5][6]"
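For what it's worth, running main should print a fully left-nested value: the prefix s.a[1] of the test string corresponds to
ArrayIndex (Access (Var "s") "a") (Lit 1)
and each further .field or [index] simply wraps the previous result.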

Parsec: if a match is found then throw an error

I am trying to distinguish between ints and floats in a parser. I have two parsers, one for each. However, I am having trouble getting the int parser to fail on a '.'. I looked at negation and lookahead and didn't seem to get anywhere.
I hope I am not duplicating any questions.
I had it working by looking ahead at the next character and checking that it is not a '.', but that is an ugly solution.
EDIT: Added more code.
--Int--------------------------------------------------------------------
findInt :: Parser String
findInt = plus <|> minus <|> number
number :: Parser String
number = many1 digit
plus :: Parser String
plus = char '+' *> number
minus :: Parser String
minus = char '-' <:> number
makeInt :: Parser Int
makeInt = prepareResult (findInt <* many (noneOf ".") <* endOfLine)
  where readInt = read :: String -> Int
        prepareResult = liftA readInt
makeInt2 :: Parser Int
makeInt2 = do
    numberFound <- (findInt <* many (noneOf ".") <* endOfLine)
    match <- char '.'
    return (prepareResult numberFound)
  where readInt = read :: String -> Int
        prepareResult = readInt
--End Int----------------------------------------------------------------
I think you are best off actually combining the two parsers into one. Try something like this:
import Text.Parsec.String (Parser)
import Control.Applicative ((<|>))
import Text.Parsec.Char (char,digit)
import Text.Parsec.Combinator (many1,optionMaybe)
makeIntOrFloat :: Parser (Either Int Float)
makeIntOrFloat = do
  sign <- optionMaybe (char '-' <|> char '+')
  n <- many1 digit
  m <- optionMaybe (char '.' *> many1 digit)
  return $ case (m, sign) of
    (Nothing, Just '-') -> Left (negate (read n))
    (Nothing, _)        -> Left (read n)
    (Just m, Just '-')  -> Right (negate (read n + read m / 10.0^(length m)))
    (Just m, _)         -> Right (read n + read m / 10.0^(length m))
ErikR has a correct solution, but the use of try means that parsec has to keep track of the possibility of backtracking (which is a bit inefficient) when in fact that is unnecessary in this case.
Here, the key difference is that we can actually tell right away if we have a float or not - if we don't have a float, the char '.' *> many1 digit parser in optionMaybe will fail immediately (without consuming input), so there is no need to consider backtracking.
In GHCi:
ghci> import Text.Parsec.Prim
ghci> parseTest makeIntOrFloat "1234.012"
Right 1234.012
ghci> parseTest makeIntOrFloat "1234"
Left 1234
I would use notFollowedBy - e.g.:
import Text.Parsec
import Text.Parsec.String
import Text.Parsec.Combinator
int :: Parser String
int = many1 digit <* notFollowedBy (char '.')
float :: Parser (String,String)
float = do whole <- many1 digit
           fracpart <- try (char '.' *> many digit) <|> (return "")
           return (whole, fracpart)
intOrFloat :: Parser (Either String (String,String))
intOrFloat = try (fmap Left int) <|> (fmap Right float)
test1 = parseTest (intOrFloat <* eof) "123"
test2 = parseTest (intOrFloat <* eof) "123.456"
test3 = parseTest (intOrFloat <* eof) "123."
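Assuming I've traced these correctly, the three tests should print roughly:
Left "123"
Right ("123","456")
Right ("123","")
(the last one because float treats a trailing '.' as an empty fractional part).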
It is typically easiest to use applicative combinators to build your parsers - this makes your parsers easier to reason about and often you do not need monadic and backtracking functions of the parser.
For example, a parser for integers could be written as such:
import Text.Parsec hiding ((<|>), optional)
import Text.Parsec.String
import Numeric.Natural
import Control.Applicative
import Data.Foldable
natural :: Parser Natural
natural = read <$> many1 digit
sign :: Num a => Parser (a -> a)
sign = asum [ id     <$ char '+'
            , negate <$ char '-'
            , pure id
            ]
integer :: Parser Integer
integer = sign <*> (fromIntegral <$> natural)
A decimal number is an integer optionally followed by a decimal portion (a '.' followed by another run of digits, which is itself parsed like a natural number), so your parser can be written as
decimalPart :: Parser Double
decimalPart = read . ("0."++) <$> (char '.' *> many1 digit)
integerOrDecimal :: Parser (Either Integer Double)
integerOrDecimal = liftA2 cmb integer (optional decimalPart)
  where
    cmb :: Integer -> Maybe Double -> Either Integer Double
    cmb x Nothing  = Left x
    cmb x (Just d) = Right (fromIntegral x + d)
The definition of cmb is obvious: if there is no decimal part, produce an Integer, and if there is, produce a Double by adding the integer part to the decimal part.
You can also define a parser for decimals in terms of the above:
decimal :: Parser Double
decimal = either fromIntegral id <$> integerOrDecimal
Note that none of the above parsers directly use monadic functions (i.e. >>=) or backtracking - making them simple and efficient.
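A quick sanity check in GHCi (hedged, but this is what I'd expect to see):
ghci> parseTest integerOrDecimal "42"
Left 42
ghci> parseTest integerOrDecimal "3.14"
Right 3.14
ghci> parseTest decimal "3.14"
3.14
(modulo the usual floating-point display noise).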

datatype conversion without using buildExpressionParser

I am stuck at a point in converting an expression entered by the user to my own datatype.
I did it using buildExpressionParser, but using a simple parser and recursion I did it as follows:
openBrace = char '('
closeBrace :: GenParser Char st Char
closeBrace = char ')'
bracketExpr = do
  spaces >> openBrace
  expr <- expressionParser
  spaces >> closeBrace
  return expr
bracketExpr will return the entered expression in my own datatype.
To convert it into my datatype, I did the following for negation and for when the expression is a number or a variable:
expressionParser = negate1
               <|> number
               <|> variable
               --<|> addition
               <?> "simple expression"
negate1 :: Parser Expr
negate1 = do{ char '-'
            ; ds <- number
            ; return (ExprNeg ds)
            }
          <?> "negate"
variable :: Parser Expr
variable = do{ ds <- many1 (letter <|> digit)
             ; return (ExprVar ds) }
           <?> "variable"
number :: Parser Expr
number = do{ ds <- many1 digit
           ; return (ExprNum (read ds)) }
         <?> "number"
To do the same for addition I tried separating the expression using sepBy, but I am encountering several issues.
If the entered expression is 1+2
then I should get ExprAdd (ExprNum 1) (ExprNum 2)
I am unable to proceed further from here. Help would be great.
Thank you.
If you want to be writing a parser with parser combinators you need to think in terms of high-level rules first. Here's a skeleton parser in Parsec; it does not 100% meet your needs because all of the operators are same-precedence and right-associative, whereas you probably want different precedences and left-associativity. Still, this is the basic way to write a parser:
import Text.Parsec
import Text.Parsec.Char
import Data.Char (isDigit)
-- basic data type
data Expr = Op Char Expr Expr | N Integer deriving (Show)
type Parser x = Parsec String () x
-- reverse-sequenced >>, used to implement `parenthesized` and `whitespaced`
(<<) :: Monad m => m x -> m y -> m x
mx << my = mx >>= \x -> my >> return x
infixl 1 <<
parenthesized :: Parser e -> Parser e
parenthesized e = char '(' >> e << char ')'
whitespaced :: Parser e -> Parser e
whitespaced e = spaces >> e << spaces
number :: Parser Expr
number = do
  c <- oneOf "123456789" -- leading 0's can be reserved for octal/hexadecimal
  cs <- many digit
  return (N (read (c:cs)))
operator :: Parser Expr
operator = do
  e1 <- expr_no_op
  o <- whitespaced (oneOf "+*/-")
  e2 <- expression
  return (Op o e1 e2)
expr_no_op :: Parser Expr
expr_no_op = whitespaced (try number <|> parenthesized expression)
expression :: Parser Expr
expression = whitespaced (try operator <|> try number <|> parenthesized expression)
Notice that you define tokens (above, just 'number') and then combine them with a "try this <|> try that <|> otherwise..." syntax. Notice also that you have to stop operator from taking an expression as its first argument, otherwise you'll get an operator -> expression -> operator loop in the parsing. This is called "left factoring."
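For instance, loading this skeleton into GHCi, I'd expect the following, which also shows the same-precedence, right-associative caveat mentioned at the top:
ghci> parseTest expression "1*2+3"
Op '*' (N 1) (Op '+' (N 2) (N 3))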

Is this idiomatic use of Text.Parsec?

My use of Text.Parsec is a little rusty. If I just want to return the matched string is this idiomatic?
category :: Stream s m Char => ParsecT s u m [Char]
category = concat <$> (many1 $ (:) <$> char '/' <*> (many1 $ noneOf "/\n"))
I feel like there might be an existing operator for liftM concat . many1 or (:) <$> p1 <*> p2 that I'm ignoring, but I'm not sure.
That's fine, I think. A little judicious naming would make it prettier:
category = concat <$> many1 segment
  where
    segment = (:) <$> char '/' <*> many1 (noneOf "/\n")
I think it would be slightly more idiomatic use of Parsec to return something more structured, for example, the list of strings:
catList :: Parser [String]
catList = char '/' *> many1 alphaNum `sepBy1` char '/'
I don't think there's a combinator quite like the one you were wondering about, but this is Haskell, and roll-your-own-control-structure-or-combinator is always available:
concatMany1 :: Parser [a] -> Parser [a]
concatMany1 p = concat <$> many1 p
catConcat = concatMany1 $ (:) <$> char '/' <*> many1 alphaNum
But this next combinator is even nicer, and definitely idiomatic Haskell at least:
infixr 5 <:>
(<:>) :: Applicative f => f a -> f [a] -> f [a]
hd <:> tl = (:) <$> hd <*> tl
So now we can write
catCons :: Parser String
catCons = concatMany1 (char '/' <:> many1 alphaNum)
but incidentally also
contrivedExample :: IO String
contrivedExample = getChar <:> getLine
moreContrived :: String -> Maybe String
moreContrived name = find isLetter name <:> lookup name symbolTable
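And on an ordinary path the parser version does what you'd hope (a hedged sketch):
ghci> parseTest catCons "/usr/local/bin"
"/usr/local/bin"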
noneOf
You'll notice I've used alphaNum where you used noneOf "/\n". I think noneOf is not good practice; parsers should be really careful to accept only the right thing. Are you absolutely sure you want your parser to accept /qwerty/12345/!"£$%^&*()#:?><.,#{}[] \/ "/" /-=_+~? Should it really be happy with /usr\local\bin?
As it stands, your parser accepts any string as long as it starts with / and ends before \n with something that's not /. I think you should rewrite it with alphaNum <|> oneOf "_-.',~+" or similar instead of using noneOf. Using noneOf lets you avoid thinking about what you should allow, and focus on getting positive examples to parse instead of making sure only positive examples parse.
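For example, a stricter version of your original parser might look like this (the exact character set is only a guess at what a path segment should contain):
segmentChar :: Parser Char
segmentChar = alphaNum <|> oneOf "_-.',~+"
categoryStrict :: Parser String
categoryStrict = concat <$> many1 ((:) <$> char '/' <*> many1 segmentChar)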
Parser
I've also always gone for Parser a instead of Stream s m t => ParsecT s u m a. That's just lazy typing, but let's pretend I did it to make it clearer what my code was doing, shall we? :) Use what type signature suits you, of course.

Can parser combinators be made efficient?

Around 6 years ago, I benchmarked my own parser combinators in OCaml and found that they were ~5× slower than the parser generators on offer at the time. I recently revisited this subject and benchmarked Haskell's Parsec vs a simple hand-rolled precedence climbing parser written in F# and was surprised to find the F# to be 25× faster than the Haskell.
Here's the Haskell code I used to read a large mathematical expression from file, parse and evaluate it:
import Control.Applicative
import Text.Parsec hiding ((<|>))
expr = chainl1 term ((+) <$ char '+' <|> (-) <$ char '-')
term = chainl1 fact ((*) <$ char '*' <|> div <$ char '/')
fact = read <$> many1 digit <|> char '(' *> expr <* char ')'
eval :: String -> Int
eval = either (error . show) id . parse expr "" . filter (/= ' ')
main :: IO ()
main = do
  file <- readFile "expr"
  putStr $ show $ eval file
  putStr "\n"
and here's my self-contained precedence climbing parser in F#:
let rec (|Expr|) = function
  | P(f, xs) -> Expr(loop (' ', f, xs))
  | xs -> invalidArg "Expr" (sprintf "%A" xs)
and loop = function
  | ' ' as oop, f, ('+' | '-' as op)::P(g, xs)
  | (' ' | '+' | '-' as oop), f, ('*' | '/' as op)::P(g, xs) ->
      let h, xs = loop (op, g, xs)
      match op with
      | '+' -> (+) | '-' -> (-) | '*' -> (*) | '/' | _ -> (/)
      |> fun op -> loop (oop, op f h, xs)
  | _, f, xs -> f, xs
and (|P|_|) = function
  | '('::Expr(f, ')'::xs) -> Some(P(f, xs))
  | c::_ as xs when '0' <= c && c <= '9' ->
      let rec loop n = function
        | c2::xs when '0' <= c2 && c2 <= '9' -> loop (10*n + int(string c2)) xs
        | xs -> Some(P(n, xs))
      loop 0 xs
  | _ -> None
My impression is that even state-of-the-art parser combinators waste a lot of time backtracking. Is that correct? If so, is it possible to write parser combinators that generate state machines to obtain competitive performance, or is it necessary to use code generation?
EDIT:
Here's the OCaml script I used to generate a ~2Mb expression for benchmarking:
open Printf
let rec f ff n =
  if n=0 then fprintf ff "1" else
    fprintf ff "%a+%a*(%a-%a)" f (n-1) f (n-1) f (n-1) f (n-1)
let () =
  let n = try int_of_string Sys.argv.(1) with _ -> 3 in
  fprintf stdout "%a\n" f n
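If you save that as, say, gen_expr.ml (the name is mine), the input file can be produced with something like ocaml gen_expr.ml 10 > expr, which is the "expr" file the Haskell main above reads.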
I've come up with a Haskell solution that is 30× faster than the Haskell solution you posted (with my concocted test expression).
Major changes:
Change Parsec/String to Attoparsec/ByteString
In the fact function, change read & many1 digit to decimal
Made the chainl1 recursion strict (remove $! for the lazier version).
I tried to keep everything else you had as similar as possible.
import Control.Applicative
import Data.Attoparsec
import Data.Attoparsec.Char8
import qualified Data.ByteString.Char8 as B
expr :: Parser Int
expr = chainl1 term ((+) <$ char '+' <|> (-) <$ char '-')
term :: Parser Int
term = chainl1 fact ((*) <$ char '*' <|> div <$ char '/')
fact :: Parser Int
fact = decimal <|> char '(' *> expr <* char ')'
eval :: B.ByteString -> Int
eval = either (error . show) id . eitherResult . parse expr . B.filter (/= ' ')
chainl1 :: (Monad f, Alternative f) => f a -> f (a -> a -> a) -> f a
chainl1 p op = p >>= rest
  where
    rest x = (do f <- op
                 y <- p
                 rest $! (f x y))
             <|> pure x
main :: IO ()
main = B.readFile "expr" >>= (print . eval)
I guess what I concluded from this is that the majority of the slowdown for the parser combinator was that it was sitting on an inefficient base, not that it was a parser combinator, per se.
I imagine with more time and profiling this could go faster, as I stopped when I went past the 25× mark.
I don't know if this would be faster than the precedence climbing parser ported to Haskell. Maybe that would be an interesting test?
I'm currently working on the next version of FParsec (v. 0.9), which will in many situations improve performance by up to a factor of 2 relative to the current version.
[Update: FParsec 0.9 has been released, see http://www.quanttec.com/fparsec ]
I've tested Jon's F# parser implementation against two FParsec implementations. The first FParsec parser is a direct translation of djahandarie's parser. The second one uses FParsec's embeddable operator precedence component. As the input I used a string generated with Jon's OCaml script with parameter 10, which gives me an input size of about 2.66MB. All parsers were compiled in release mode and were run on the 32-bit .NET 4 CLR. I only measured the pure parsing time and didn't include startup time or the time needed for constructing the input string (for the FParsec parsers) or the char list (Jon's parser).
I measured the following numbers (updated numbers for v. 0.9 in parens):
Jon's hand-rolled parser: ~230ms
FParsec parser #1: ~270ms (~235ms)
FParsec parser #2: ~110ms (~102ms)
In light of these numbers, I'd say that parser combinators can definitely offer competitive performance, at least for this particular problem, especially if you take into account that FParsec
automatically generates highly readable error messages,
supports very large files as input (with arbitrary backtracking), and
comes with a declarative, runtime-configurable operator-precedence parser module.
Here's the code for the two FParsec implementations:
Parser #1 (Translation of djahandarie's parser):
open FParsec
let str s = pstring s
let expr, exprRef = createParserForwardedToRef()
let fact = pint32 <|> between (str "(") (str ")") expr
let term = chainl1 fact ((str "*" >>% (*)) <|> (str "/" >>% (/)))
do exprRef:= chainl1 term ((str "+" >>% (+)) <|> (str "-" >>% (-)))
let parse str = run expr str
Parser #2 (Idiomatic FParsec implementation):
open FParsec
let opp = new OperatorPrecedenceParser<_,_,_>()
type Assoc = Associativity
let str s = pstring s
let noWS = preturn () // dummy whitespace parser
opp.AddOperator(InfixOperator("-", noWS, 1, Assoc.Left, (-)))
opp.AddOperator(InfixOperator("+", noWS, 1, Assoc.Left, (+)))
opp.AddOperator(InfixOperator("*", noWS, 2, Assoc.Left, (*)))
opp.AddOperator(InfixOperator("/", noWS, 2, Assoc.Left, (/)))
let expr = opp.ExpressionParser
let term = pint32 <|> between (str "(") (str ")") expr
opp.TermParser <- term
let parse str = run expr str
In a nutshell, parser combinators are slow for lexing.
There was a Haskell combinator library for building lexers (see "Lazy Lexing is Fast" Manuel M. T. Chakravarty) - as the tables were generated at runtime, there wasn't the hassle of code generation. The library got used a bit - it was initially used in one of the FFI preprocessors, but I don't think it ever got uploaded to Hackage, so maybe it was a little too inconvenient for regular use.
In the OCaml code above, the parser is directly matching on char-lists so it can be as fast as list destructuring is in the host language (it would be much faster than Parsec if it were re-implemented in Haskell). Christian Lindig had an OCaml library that had a set of parser combinators and a set of lexer combinators - the lexer combinators were certainly much simpler than Manuel Chakravarty's, and it might be worthwhile tracking down this library and benchmarking it before writing a lexer generator.
Have you tried one of the known fast parser libraries? Parsec's aims have never really been speed, but ease of use and clarity. Comparing to something like attoparsec may be a more fair comparison, especially because the string types are likely to be more equal (ByteString instead of String).
I also wonder which compile flags were used. This being another trolling post by the infamous Jon Harrop, it would not surprise me if no optimisations were used at all for the Haskell code.

Resources