The example from the ANTLR website (https://www.antlr.org/) gives * and / higher priority than + and -, but I'm confused about what that means.
grammar Expr;
prog: (expr NEWLINE)* ;
expr: expr ('*'|'/') expr
    | expr ('+'|'-') expr
    | INT
    | '(' expr ')'
    ;
NEWLINE : [\r\n]+ ;
INT : [0-9]+ ;
Because I think it's top-down parsing, if we input 100+2*34, the * and / alternative should be selected first, making it higher in the parse tree, so the result should be interpreted as (100+2)*34. But the resulting parse tree on the website shows 100+(2*34).
Can someone help me to clarify this?
The precedence for operators is indeed defined by the order of the alternatives in the expr rule. I assume by "higher in the parse tree" you mean the vertical position when printing the tree top-down, correct? However, when talking about the position of a tree node, you usually use the tree level (with the root being level 0). So the deeper a node is in the tree, the higher its level number.
With that out of the way we can now look at your question: of course the operators with lower precedence (here plus and minus) should appear at a lower level (closer to the root node), because evaluation of expressions begins at the leaves, that is, at the deepest tree level.
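To see this concretely, here is a tiny Haskell sketch (the Ast type and eval function are made up for illustration; this is not the ANTLR-generated tree, just a hand-built value with the same shape) showing that for 100+2*34 the '+' node sits near the root and the '*' node sits deeper, which is exactly why the multiplication binds tighter:
-- Hand-built value mirroring the shape of the parse tree for "100+2*34":
-- the '+' is near the root, the '*' is deeper in the tree.
data Ast = Num Int | Bin Char Ast Ast deriving Show

tree :: Ast
tree = Bin '+' (Num 100) (Bin '*' (Num 2) (Num 34))

-- Evaluating bottom-up (from the leaves) gives 100 + (2 * 34) = 168.
eval :: Ast -> Int
eval (Num n)       = n
eval (Bin '+' a b) = eval a + eval b
eval (Bin '-' a b) = eval a - eval b
eval (Bin '*' a b) = eval a * eval b
eval (Bin '/' a b) = eval a `div` eval b
eval _             = error "unknown operator"

main :: IO ()
main = print (eval tree)  -- 168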
In a code base I'm reading, I found a function declaration like this (some parts are missing):
filepathNormalise :: BS.ByteString -> BS.ByteString
filepathNormalise xs
    | isWindows, Just (a,xs) <- BS.uncons xs, sep a, Just (b,_) <- BS.uncons xs, sep b
    = '/' `BS.cons` f xs
What does the comma do here?
(Only as a bonus, if someone readily knows this: is this syntax mentioned in Haskell Programming from first principles, and if so, where? As I can't remember reading about it.)
Guards are described in Haskell 2010 section 3.13, Case Expressions
(that section is about case expressions, not top-level declarations, but presumably the semantics are the same):
guards → | guard1, …, guardn    (n ≥ 1)
guard  → pat <- infixexp        (pattern guard)
       | let decls              (local declaration)
       | infixexp               (boolean guard)
For each guarded expression, the comma-separated guards are tried sequentially from left to right. If all of them succeed, then the corresponding expression is evaluated in the environment extended with the bindings introduced by the guards. That is, the bindings that are introduced by a guard (either by using a let clause or a pattern guard) are in scope in the following guards and the corresponding expression. If any of the guards fail, then this guarded expression fails and the next guarded expression is tried.
In the simple case, the comma serves a role similar to Boolean and. But the comma is more powerful in that each guard can introduce new bindings that are used by the subsequent guards (proceeding from left to right).
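For example (the describe function below is made up, not from the code base in the question), the pattern guard binds y, and the later boolean guards in the same chain can use that binding:
-- Made-up example: the pattern guard binds y; the later guards use it.
describe :: [(String, Int)] -> String
describe env
    | Just y <- lookup "count" env, y > 0, even y = "positive even count"
    | Just y <- lookup "count" env, y > 0         = "positive odd count"
    | otherwise                                   = "no positive count"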
Commas in guards are uncommon enough (in my experience, at least) that I'd describe this feature as Haskell trivia -- not at all necessary to writing (or, for the most part, reading) Haskell. I suspect that Haskell Programming from first principles omits it for that reason.
This syntax is not legal in Haskell '98; this was added to the language specification in Haskell 2010. It's part of the "pattern guards" language extension.
https://prime.haskell.org/wiki/PatternGuards
The real usefulness of this is that it allows you to pattern match inside a guard clause. The syntactic change also has the side effect of allowing you to AND together several Boolean terms using commas.
(I personally really dislike this extension, and I'm a little shocked it made it into the official spec, but there we are...)
I am reading the Haskell 2010 report and have some questions regarding the meta-logical representation in section 2.4. Here:
1. In the mnemonics "varid" and "varsym", does "var" mean variable?
2. My understanding is that "varid" names identifiers for variables and functions, while "varsym" also names identifiers, but for operators. Is this understanding correct?
3. If 1 and 2 are correct, does it mean an operator is also a kind of variable? (I am very confused, because this seems unlikely to be right.)
Appreciate any advice.
As far as I can tell, the report is defining the difference between symbols that are used prefix, and those that are used infix, for example:
f x y -- f is used prefix
a / b -- / is used infix
This is just a syntactic convenience, as all prefix symbols can be used infix with backticks, and all infix symbols can be used prefix with ()s:
x `f` y -- infix
(/) a b -- prefix
(a /) b -- operator section
(/ b) a -- operator section
Sub-questions:
1. Yes, but I can't figure out any meaningful mnemonic for the id and sym parts. :(
2. Operators are in the realm of Haskell syntax, not its semantics. They're only used to provide a more convenient syntax for writing some expressions. As far as I know, if they were removed from Haskell, the only loss would be that convenient syntax -- there's nothing you need operators for other than convenience, and you can replace every single use of an operator with a non-operator symbol. They are completely identical to variables -- they are variables -- but require different syntax for their use.
3. Yes, I would agree that operator symbols are variables. However, the values bound to operator symbols would not be variables.
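A quick illustration of that last point (the operator (.+) below is made up): an operator symbol is bound and passed around exactly like any other variable; only the syntax at the use site differs:
-- The operator (.+) is just a variable bound to a function value.
(.+) :: Int -> Int -> Int
(.+) = (+)

-- It can be rebound to an ordinary (prefix) identifier...
plus :: Int -> Int -> Int
plus = (.+)

-- ...and used either infix or prefix.
main :: IO ()
main = print (3 .+ 4, (.+) 3 4, 3 `plus` 4)   -- (7,7,7)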
In defining multiple pattern matches for a function, for instance as follows:
1: takeTree 0 tree = Leaf
2: takeTree levels (Leaf) = Leaf
3: takeTree levels (Branch value left right) = Branch value (takeTree...
I get two warnings in particular:
Source.hs:1: Warning: Defined but not used: `tree'
Source.hs:2: Warning: Defined but not used: `levels'
I'm not immediately convinced these are useful warnings though. If my code were instead:
1: takeTree 0 _ = Leaf
2: takeTree _ (Leaf) = Leaf
3: takeTree levels (Branch value left right) = Branch value (takeTree...
which fixes the warnings, I now find far less readable, and it obfuscates the semantics of what I expect as input values.
Why is Defined but not used a reasonable warning at all here, when among my exhaustive patterns each argument is in fact used at least once?
The assumption that if something was important enough to name it must be important enough to use is reasonable in many styles of coding.
But you can have your cake and eat it too:
takeTree 0 _tree = Leaf
takeTree _levels (Leaf) = Leaf
takeTree levels (Branch value left right) = Branch value (takeTree...
The leading underscore in the name signals to both the human reader and the compiler that the name is not intended to be used in this equation, but the name being longer than a single underscore can still convey more meaning to the human.
I've made coding errors that were pointed out by this warning. Simplified example:
fun x xs = go xs
  where
    go []      = ...
    go (y:xs') = f y (go xs)
The recursive call should have xs' as argument, of course, and the "defined but not used" warning will catch that. To me, that's worth the inconvenience of using _ for unused matches.
The compiler is not able to guess your intentions, and the fact that you used the argument in another match does not mean that you didn't mean to use it in the match that generates the warning. After all, you could have used tree, so the warning that you didn't is reasonable.
See also the answer by Ben: you can use _name to use a name but still suppress the warning.
The compiler's trying to suggest that you use a certain style of coding. If you don't like it (fair enough), there's a way to avoid the problem. See e.g.:
How to [temporarily] suppress "defined but not used" warnings?
As to the substance of the matter (whether or not this is a useful warning): the downside of naming your variables in such situations is it suggests that the names are meaningful to the compiler, when they're not. The upside, as you rightly point out, is that they are meaningful to humans. Basically there's a trade-off and it's quite subjective. The important thing is that you can get the behaviour you want if desired.
In programming language textbooks, we are always told that each operator in the language has either left or right associativity. It seems that associativity is a fundamental property of any operator, regardless of the number of operands it takes. It also seems to me that we can assign any associativity to any operator regardless of how we assign associativity to the other operators.
But why is it the case? Perhaps an example is better. Suppose I want to design a hypothetical programming language. Is it valid to assign associativity to these operators in this arbitrary way (all having the same precedence):
unary operator:
! right associative
binary operators:
+ left associative
- right associative
* left associative
/ right associative
! + - * / are my 5 operators all having the same precedence.
If yes, how would an expression like 2+2!3+5*6/3-5!3!3-3*2 be parenthesized by my hypothetical parser? And why?
EDIT:
The first example (2+2!3+5*6/3-5!3!3-3*2) is incorrect. Perhaps forget about the unary op and let me put it this way: can we assign operators of the same precedence different associativity, the way I did above? If yes, how would an example, say 2+3-4*5/3+2, be evaluated? Most programming languages seem to assign the same associativity to operators of the same precedence. But we always talk about OPERATOR ASSOCIATIVITY as if it is a property of an individual operator -- not a property of a precedence level.
Let us remember what associativity means. Take any operator, say #. Its associativity, as we all know, is the rule that disambiguates expressions of the form a # b # c: if # is left associative, it's parsed as (a # b) # c; if it's right associative, a # (b # c). It could also be nonassociative, in which case a # b # c is a syntax error.
What if we have two different operators, say # and @? If one is of higher precedence than the other, there's nothing more to say, no work for associativity to do; precedence takes care of the disambiguation. However, if they are of equal precedence, we need associativity to help us. There are three easy cases:
If both operators are left associative, a # b @ c means (a # b) @ c.
If both operators are right associative, a # b @ c means a # (b @ c).
If both operators are nonassociative, then a # b @ c is a syntax error.
In the remaining cases, the operators do not agree about associativity. Which operator's choice gets precedence? You could probably devise such associativity-precedence rules, but I think the most natural rule to impose is to declare any such case a syntax error. After all, if two operators are of equal precedence, why would one have associativity-precedence over the other?
Under the natural rule I just gave, your example expression is a syntax error.
Now, we could certainly assign differing associativities to operators of the same precedence. However, this would mean that there are combinations of operators of equal precedence (such as your example!) that are syntax errors. Most language designers seem to prefer to avoid that and assign the same associativity to all operators of equal precedence; that way, all combinations are legal. It's just aesthetics, I think.
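Haskell, for what it's worth, behaves exactly like the "natural rule" above: you may give two operators the same precedence but different associativities, and mixing them in one expression without parentheses is then rejected with a precedence parsing error. A small sketch (the operators .+. and .-. are made up):
-- Same precedence, different associativity.
infixl 6 .+.
infixr 6 .-.

(.+.), (.-.) :: Int -> Int -> Int
(.+.) = (+)
(.-.) = (-)

ok :: Int
ok = (1 .+. 2) .-. 3     -- fine: the parentheses disambiguate

-- bad :: Int
-- bad = 1 .+. 2 .-. 3   -- rejected: cannot mix .+. (infixl 6) and .-. (infixr 6)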
You have to define associativity somehow, and most languages choose to assign associativity (and precedence) "naturally" -- to match the rules of common mathematics.
There are notable exceptions, however -- APL has strict right-to-left associativity, with all operators at the same precedence level.
I am working on a compiler/proof checker, and I was wondering, if I had a syntax tree such as this, for example:
data Expr
    = Lambdas (Set String) Expr
    | Var String
    | ...
if there were a way to check the alpha-equivalence (equivalence modulo renaming) of Exprs. This Expr, however, is different from the lambda calculus in that the set of variables in a lambda is commutative -- i.e. the order of parameters does not factor into the checking.
(For simplicity, however, Lambdas ["x","y"] ... is distinct from Lambdas ["x"] (Lambdas ["y"] ...), and in that case the order does matter).
In other words, given two Exprs, how can one efficiently find a renaming from one to the other? This kind of combinatorial problem smells of NP-complete.
The commutativity of the parameters does hint at an exponential comparison, true.
But I suspect you can normalize the parameter lists so you only have to compare them in single order. Then a tree compare with renaming would be essentially linear in the size of the trees.
What I suggest doing is, for each parameter list, visit the subtree (in-order or post-order, it doesn't matter as long as you are consistent) and sort the parameters by the order in which the visit first encounters each one's use. So if you have
lambda(a,b): .... b ..... a ... b ....
you'd sort the parameter list as:
lambda(b,a)
because you encounter b first and a second; the additional encounter of b doesn't matter. Compare the trees with the normalized parameter lists.
Life gets messier if you insist that the operators in a lambda clause can be commutative. My guess is that you can still normalize it.
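A rough Haskell sketch of that normalization (the Expr type and all helper names are hypothetical, not the poster's actual type; the parameter set is modeled as a list, and it is assumed that no source variable already looks like one of the generated canonical names):
import           Data.List  (sort)
import qualified Data.Map   as Map
import           Data.Maybe (fromMaybe)
import qualified Data.Set   as Set

data Expr
    = Lambdas [String] Expr
    | Var String
    | App Expr Expr
    deriving (Eq, Show)

-- Parameters of one lambda, in the order their uses are first encountered.
firstUses :: Set.Set String -> Expr -> [String]
firstUses params body = go params [] body
  where
    go ps seen (Var v)
        | v `Set.member` ps && v `notElem` seen = seen ++ [v]
        | otherwise                             = seen
    go ps seen (App f x)      = go ps (go ps seen f) x
    go ps seen (Lambdas qs e) = go (ps `Set.difference` Set.fromList qs) seen e

-- Rename every lambda's parameters to canonical, depth-indexed names,
-- ordered by first use (unused parameters go last, sorted by name).
canonicalize :: Expr -> Expr
canonicalize = go (0 :: Int)
  where
    go _ (Var v)        = Var v
    go d (App f x)      = App (go d f) (go d x)
    go d (Lambdas ps b) = Lambdas (map rename ordered) (go (d + 1) (rebind table b))
      where
        used     = firstUses (Set.fromList ps) b
        ordered  = used ++ sort (filter (`notElem` used) ps)
        table    = Map.fromList (zip ordered ["v" ++ show d ++ "_" ++ show i | i <- [0 :: Int ..]])
        rename v = fromMaybe v (Map.lookup v table)
    rebind t (Var v)        = Var (fromMaybe v (Map.lookup v t))
    rebind t (App f x)      = App (rebind t f) (rebind t x)
    rebind t (Lambdas qs e) = Lambdas qs (rebind (foldr Map.delete t qs) e)  -- respect shadowing

-- Alpha-equivalence (modulo the commutative parameter sets) is then plain (==).
alphaEq :: Expr -> Expr -> Bool
alphaEq x y = canonicalize x == canonicalize y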
We can appeal to Daan Leijen's HMF for a few ideas. (He is dealing with binders for 'foralls', which also come across as commutative.)
In particular, he rebinds the variables in the occurrence order in the body.
Then comparison of terms involves skolemizing both the same way and comparing the results.
We can do better than that by replacing that skolemization pass with a locally nameless representation.
data Bound t a = Bound {-# UNPACK #-} !Int t | Unbound a
instance Functor (Bound t) where ...
instance Bifunctor Bound where ...
data Expr a
    = Lambdas {-# UNPACK #-} !Int (Expr (Bound () a))
    | Var a
So now occurrences of Bound under a lambda are the variables bound directly by the lambda, along with any type information you want to put in the occurrence; here I just used ().
Now closed terms are polymorphic in 'a' and, if you sort the elements of the lambda by their use site (and can ensure that you always canonicalize the lambda by removing unused terms), alpha-equivalent terms compare simply with (==); if you need open terms you can work with Expr String or some other representation.
A more anal retentive version of the signature for Expr and Bound would use an existential type and a type-level natural to identify the number of variables being bound, and use 'Fin' in the Bound constructor, but since you already have to maintain the invariant that you bind no more variables than the number occurring in the lambda, and that the type information agrees across all of Var (Bound n _) with the same n, it's not too much of a burden to maintain another.
Update: You can use my bound package to do an improved version of this in a fully self-contained way!