The documentation for Parsec.Expr.buildExpressionParser says:
Prefix and postfix operators of the same precedence can only occur
once (i.e. --2 is not allowed if - is prefix negate).
and indeed, this is biting me, since the language I am trying to parse allows arbitrary repetition of its prefix and postfix operators (think of a C expression like **a[1][2]).
So, why does Parsec make this restriction, and how can I work around it?
I think I can move my prefix/postfix parsers down into the term parser since they have the highest precedence.
i.e.
**a + 1
is parsed as
(*(*(a)))+(1)
but what could I have done if I wanted it to parse as
*(*((a)+(1)))
if buildExpressionParser did what I want, I could simply have rearranged the order of the operators in the table.
Note: see here for a better solution.
I solved it myself by using chainl1:
prefix p = Prefix . chainl1 p $ return (.)
postfix p = Postfix . chainl1 p $ return (flip (.))
These combinators use chainl1 with an op parser that always succeeds and simply composes the functions returned by the term parser, in left-to-right or right-to-left order. They can be used in the buildExpressionParser table; where you would have done this:
exprTable = [ [ Postfix subscr
              , Postfix dot
              ]
            , [ Prefix pos
              , Prefix neg
              ]
            ]
you now do this:
exprTable = [ [ postfix $ choice [ subscr
                                 , dot
                                 ]
              ]
            , [ prefix $ choice [ pos
                                , neg
                                ]
              ]
            ]
In this way, buildExpressionParser can still be used to set operator precedence, but it now sees only a single Prefix or Postfix operator at each precedence level. That operator, however, can slurp up as many consecutive operators as it finds and return one composed function, which makes it look as if there were only a single operator.
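To make this concrete, here is a minimal, self-contained sketch of the two combinators in use. The Expr type and the helper names (lexeme, symbol, term, deref, neg, subscr, parseExpr) are assumptions invented for this illustration and are not from the original grammar; only prefix and postfix are the combinators defined above.

```haskell
import Data.Functor.Identity (Identity)
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.String (Parser)

-- Hypothetical AST, for illustration only.
data Expr
  = Var String
  | Num Integer
  | Deref Expr
  | Neg Expr
  | Index Expr Expr
  | Add Expr Expr
  deriving (Eq, Show)

-- Collapse a run of prefix operators into one function (outermost first).
prefix :: Parser (a -> a) -> Operator String () Identity a
prefix p = Prefix . chainl1 p $ return (.)

-- Collapse a run of postfix operators into one function (innermost first).
postfix :: Parser (a -> a) -> Operator String () Identity a
postfix p = Postfix . chainl1 p $ return (flip (.))

lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: String -> Parser String
symbol = lexeme . string

term :: Parser Expr
term = lexeme (Var <$> many1 letter <|> Num . read <$> many1 digit)

deref, neg, subscr :: Parser (Expr -> Expr)
deref  = Deref <$ symbol "*"
neg    = Neg <$ symbol "-"
subscr = flip Index <$> between (symbol "[") (symbol "]") expr

-- Highest precedence first: subscripts, then prefix operators, then "+".
expr :: Parser Expr
expr = buildExpressionParser table term
  where
    table =
      [ [ postfix subscr ]
      , [ prefix (choice [deref, neg]) ]
      , [ Infix (Add <$ symbol "+") AssocLeft ]
      ]

parseExpr :: String -> Either ParseError Expr
parseExpr = parse (spaces *> expr <* eof) ""

main :: IO ()
main = do
  print (parseExpr "--a")            -- Right (Neg (Neg (Var "a")))
  print (parseExpr "**a[1][2] + 1")
```

With this table, "**a[1][2] + 1" comes out as Add (Deref (Deref (Index (Index (Var "a") (Num 1)) (Num 2)))) (Num 1), i.e. the subscripts bind tighter than the dereferences, as in C. Moving the prefix row below the Infix row should give the other grouping asked about in the question.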
I'm attempting to parse permutations of flags. The behavior I want is "one or more flags in any order, without repetition". I'm using the following packages:
megaparsec
parser-combinators
The code I have is outputting what I want, but is too lenient on inputs. I don't understand why it's accepting multiples of the same flags. What am I doing wrong here?
pFlags :: Parser [Flag]
pFlags = runPermutation $ f <$>
    toPermutation (optional (GroupFlag <$ char '\'')) <*>
    toPermutation (optional (LeftJustifyFlag <$ char '-'))
  where f a b = catMaybes [a, b]
Examples:
"'-" = [GroupFlag, LeftJustifyFlag] -- CORRECT
"-'" = [LeftJustifyFlag, GroupFlag] -- CORRECT
"''''-" = [GroupFlag, LeftJustifyFlag] -- INCORRECT, should fail if there's more than one of the same flag.
Instead of toPermutation with optional, I believe you need to use toPermutationWithDefault, something like this (untested):
toPermutationWithDefault Nothing (Just GroupFlag <$ char '\'')
The reasoning is described in the paper “Parsing Permutation Phrases” (PDF) in §4, “adding optional elements” (emph. added):
Consider, for example […] all permutations of a, b and c. Suppose b can be empty and we want to recognise ac. This can be done in three different ways since the empty b can be recognised before a, after a or after c. Fortunately, it is irrelevant for the result of a parse where exactly the empty b is derived, since order is not important. This allows us to use a strategy similar to the one proposed by Cameron: parse nonempty constituents as they are seen and allow the parser to stop if all remaining elements are optional. When the parser stops the default values are returned for all optional elements that have not been recognised.
To implement this strategy we need to be able to determine whether a parser can derive the empty string and split it into its default value and its non-empty part, i.e. a parser that behaves the same except that it does not recognise the empty string.
That is, the permutation parser needs to know which elements can succeed without consuming input, otherwise it will be too eager to commit to a branch. I don’t know why this would lead to accepting multiples of an element, though; perhaps you’re also missing an eof?
Consider the following code snippet in Idris:
myList : List Int
myList = [
    1,
    2,
    3
]
The closing delimiter ] is on the same column as the declaration itself. I find this a quite natural way to want to format long, multi-line lists.
However, the equivalent snippet in Haskell fails to compile with a syntax error:
myList :: [Int]
myList = [
    1,
    2,
    3
]
>> main.hs:9:1: error:
>> parse error (possibly incorrect indentation or mismatched brackets)?
>> |
>> 9 | ]
>> | ^
Instead, it requires that the closing delimiter ] be placed in a column strictly greater than the one where the declaration starts. Or at least, as far as I can tell, that seems to be what is going on.
Is there a reason Haskell doesn't like this syntax? I know there are some subtle interactions between the Haskell parser and lexer that implement the off-side rule, so perhaps it has something to do with that.
Well, ultimately the answer is just “because the Haskell language standard demands it to be parsed this way”.
As for why this is a good idea: indentation is the primary way code is structured, and parentheses/brackets only come in locally. I find this much more consistent than Python's attitude, where indentation is kind of the primary structure, but for an expression to spread over multiple lines you actually need to wrap it in parentheses. (I'm not saying that these are the only two ways it could be done.)
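To spell out the mechanics: top-level declarations form a layout block, and any line that starts in the same column as that block's items is treated as the start of a new item (an implicit semicolon is inserted before it). A ] in the first column is therefore read as the beginning of a new declaration, which is a parse error. As a minimal sketch, indenting the closing bracket by even a single column is enough to make it a continuation line:

```haskell
myList :: [Int]
myList = [
    1,
    2,
    3
 ]  -- one column of indentation: still part of the myList declaration

main :: IO ()
main = print myList  -- prints [1,2,3]
```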
Note that if you really want, you can always disable the indentation sensitivity completely, with something like
myList :: [Int]
myList = l where {
l = [
    1,
    2,
    3
]}
But I would not recommend it. The preferred style to write multiline lists is
myList
  = [ 1
    , 2
    , 3
    ]
or
myList = [ 1
         , 2
         , 3 ]
Again, I would argue that this leading-comma style is much preferable to the trailing-comma style most programmers use in other languages, especially for nested lists: the commas become “bullet points” aligned with the opening bracket, which makes the AST structure very clear.
myMonstrosity :: [(Int, [([Int], Int)])]
myMonstrosity
   = [ ( 1
       , [ ( [37,43]
           , 9 )
         , ( [768,4,9807,3,4,98]
           , 15 ) ]
       )
     , ( 2, [] )
     , ( 3
       , [ ( [], 300 )
         , ( [0..4000], -5 ) ]
       )
     ]
i've been working on an antlr4 grammar for Z Notation (ISO UTF version), and the specification calls for a lex phase, and then a "2 phased" parse.
you first lex it into a bunch of NAME (or DECORWORD) tokens, and then you parse the resulting tokens against the operatorTemplate rules in the spec's parser grammar, replace appropriate tokens, and then finally parse your new modified token stream to get the AST.
i have the above working, but i can't figure out how to set the precedence and associativity of the parser rules dynamically, so the parse trees are wrong.
the operator syntax looks like (numbers are precedence):
-generic 5 rightassoc (_ → _)
-function 65 rightassoc (_ ◁ _)
i don't see any api to set the associativity on a rule, so i tried with semantic predicates, something like:
expression
    : {ZSupport.isLeftAssociative()}? expression I expression
    | <assoc=right> expression I expression
    ;
or
expression
    : {ZSupport.isLeftAssociative()}? expression I expression
    | <assoc=right> {ZSupport.isRightAssociative()}? expression I expression
    ;
but then i get "The following sets of rules are mutually left-recursive [expression]"
can this be done?
I was able to accomplish this by moving the semantic predicate:
expression
    : expression {ZSupport.isLeftAssociative()}? I expression
    | <assoc=right> expression I expression
    ;
I was under the impression that this wasn't going to work based on this discussion:
https://stackoverflow.com/a/23677069/7711235
...but it does seem to work correctly in all my test cases...
I have such a data type :
data Node a = Node
    { label :: a,
      adjacent :: [(a,Int)] } deriving Show
Example : ( Node 'a' [ ( 'b' , 3 ) , ( 'c' ,2 ) ] )
I want to get the label from this structure, I wrote this function (and several other combinations which I thought might work) :
giveLabel Node a [(c,b)] = a;
But I keep getting errors. Can you tell me how should I change my function? Thanks
giveLabel (Node a [(c,b)]) = a
is the syntax you want. Defining functions uses the same rules of precedence as calling them, and according to those rules you defined a function giveLabel with three arguments (Node, a, and [(c,b)]); that is illegal because in that context Node is missing its arguments.
Even that probably isn't what you want: the pattern [(c,b)] only matches lists with exactly one item. Since you don't care about the list of neighbours, you can write:
giveLabel (Node a xs) = a
...where xs will bind to the whole list of neighbours; but actually since you don't even care about that, you can write:
giveLabel (Node a _) = a
...where _ is a useful way of pattern matching against a parameter you aren't going to use.
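Since Node is declared with record syntax, the field name label is also an ordinary accessor function of type Node a -> a, so a hand-written giveLabel is not strictly needed. A minimal sketch putting both side by side, using the example value from the question (the name example is just for illustration):

```haskell
data Node a = Node
  { label    :: a
  , adjacent :: [(a, Int)]
  } deriving Show

-- Pattern-matching version from the answer above.
giveLabel :: Node a -> a
giveLabel (Node a _) = a

-- The example value from the question.
example :: Node Char
example = Node 'a' [('b', 3), ('c', 2)]

main :: IO ()
main = do
  print (giveLabel example)  -- prints 'a'
  print (label example)      -- prints 'a' (record accessor, same result)
```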
I'm currently writing a parser for a simple programming language. It's getting there; however, I'm unable to parse a boolean logic statement such as "i == 0 AND j == 0". All I get back is "non exhaustive patterns in case".
When I parse a boolean expression on its own it works fine e.g. "i == 0". Note "i == 0 a" will also return a boolean statement but "i == 0 AND" does not return anything.
Can anyone help please?
Whilst the above works correctly for input such as run parseBoolean "i == 0"
As @hammar points out, you should use Text.Parsec.Expr for this kind of thing. However, since this is homework, maybe you have to do it the hard way!
The problem is in parseArithmetic, you allow anyChar to be an operator, but then in the case statement, you only allow for +, -, *, /, %, and ^. When parseArithmetic tries to parse i == 0, it uses the first = as the operator, but can't parse an intExp2 from the second =, and fails in the monad, and backtracks, before getting to the case statement. However, when you try to parse i == 0 AND j == 0, it gets the i == part, but then it thinks that there's an arithmetic expression of 0 A ND, where A is an operator, and ND is the name of some variable, so it gets to the case, and boom.
Incidentally, instead of using the parser to match a string, and then using a case statement to match it a second time, you can have your parser return a function instead of a string, and then apply the function directly:
parseOp :: String -> a -> Parser a
parseOp op a = string op >> spaces >> return a

parseLogic :: Parser BoolExp
parseLogic = do
    boolExp1 <- parseBoolExp
    spaces
    operator <- choice [ try $ parseOp "AND" And
                       , parseOp "OR" Or
                       , parseOp "XOR" XOr
                       ]
    boolExp2 <- parseBoolExp
    return $ operator boolExp1 boolExp2

parseBoolean :: Parser BoolExp
parseBoolean = do
    intExp1 <- parseIntExp
    spaces
    operator <- choice [ try $ parseOp "==" Main.EQ
                       , parseOp "=>" GTorEQ
                       , parseOp "<=" LTorEQ
                       ]
    intExp2 <- parseIntExp
    return $ operator intExp1 intExp2
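For reference, here is a minimal, self-contained sketch of the same idea that can be run against "i == 0 AND j == 0". The question's real types are not shown, so IntExp, BoolExp and every parser name below are stand-ins made up for illustration. It also differs from the code above in two ways: try is moved inside parseOp, and the logical operators are chained with chainl1 instead of reading exactly one operator.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical AST; stands in for the question's unseen types.
data IntExp = IVar String | ILit Integer
  deriving (Eq, Show)

data BoolExp
  = Equal IntExp IntExp
  | And BoolExp BoolExp
  | Or BoolExp BoolExp
  deriving (Eq, Show)

lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- Match a literal operator and return the given value (here: a constructor).
-- 'try' makes a partial match backtrack instead of consuming input.
parseOp :: String -> a -> Parser a
parseOp op a = try (string op) >> spaces >> return a

-- Variables are lowercase only, so "AND" can never be read as a name.
intExp :: Parser IntExp
intExp = lexeme (IVar <$> many1 lower <|> ILit . read <$> many1 digit)

comparison :: Parser BoolExp
comparison = do
  l  <- intExp
  op <- parseOp "==" Equal
  r  <- intExp
  return (op l r)

-- One or more comparisons joined left-associatively by AND / OR.
logic :: Parser BoolExp
logic = comparison `chainl1` choice [parseOp "AND" And, parseOp "OR" Or]

parseLogicStr :: String -> Either ParseError BoolExp
parseLogicStr = parse (spaces *> logic <* eof) ""

main :: IO ()
main = print (parseLogicStr "i == 0 AND j == 0")
-- Right (And (Equal (IVar "i") (ILit 0)) (Equal (IVar "j") (ILit 0)))
```

Because the operator parsers return constructors directly, there is no second case statement that could be non-exhaustive; malformed input such as "i == 0 AND" yields a Left with a parse error instead of a runtime crash.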