Haskell: Get a prefix operator that works without parentheses

One big reason prefix operators are nice is that they can avoid the need for parentheses, so that + - 10 1 2 unambiguously means (10 - 1) + 2. The infix expression becomes ambiguous if parens are dropped; you can do away with the ambiguity by adopting precedence rules, but that's messy, blah, blah, blah.
I'd like to make Haskell use prefix operations but the only way I've seen to do that sort of trades away the gains made by getting rid of parentheses.
Prelude> (-) 10 1
uses two parens.
It gets even worse when you try to compose functions because
Prelude> (+) (-) 10 1 2
yields an error, I believe because it's trying to feed the minus operation into the plus operation rather than first evaluating the minus and then feeding in the result -- so now you need even more parens!
Is there a way to make Haskell intelligently evaluate prefix notation? I think if I made functions like
Prelude> let p x y = x+y
Prelude> let m x y = x-y
I would recover the initial gains on fewer parens but function composition would still be a problem. If there's a clever way to join this with $ notation to make it behave at least close to how I want, I'm not seeing it. If there's a totally different strategy available I'd appreciate hearing it.
I tried reproducing what the accepted answer did here:
Haskell: get rid of parentheses in liftM2
but in both a Prelude console and a Haskell script, the import command didn't work. And besides, this is more advanced Haskell than I'm able to understand, so I was hoping there might be some other simpler solution anyway before I do the heavy lifting to investigate whatever this is doing.

It gets even worse when you try to compose functions because
Prelude> (+) (-) 10 1 2
yields an error, I believe because it's trying to feed the minus operation into the plus operation rather than first evaluating the minus and then feeding in the result -- so now you need even more parens!
Here you raise exactly the key issue that's a blocker for getting what you want in Haskell.
The prefix notation you're talking about is unambiguous for basic arithmetic operations (more generally, for any set of functions of statically known arity). But you have to know that + and - each accept 2 arguments for + - 10 1 2 to be unambiguously resolved as +(-(10, 1), 2) (where I've used explicit argument lists to denote every call).
But ignoring the specific meaning of + and -, the first function taking the second function as an argument is a perfectly reasonable interpretation! For Haskell rather than arithmetic we need to support higher-order functions like map. You would want not not x to turn into not(not(x)), but map not x has to turn into map(not, x).
And what if I had f g x? How is that supposed to work? Do I need to know what f and g are bound to so that I know whether it's a case like not not x or a case like map not x, just to know how to parse the call structure? Even assuming I have all the code available to inspect, how am I supposed to figure out what things are bound to if I can't know what the call structure of any expression is?
You'd end up needing to invent disambiguation syntax like map (not) x, wrapping not in parentheses to disable its ability to act like an arity-1 function (much like Haskell's actual syntax lets you wrap operators in parentheses to disable their ability to act like an infix operator). Or use the fact that all Haskell functions are arity-1, but then you have to write (map not) x and your arithmetic example has to look like (+ ((- 10) 1)) 2. Back to the parentheses!
The truth is that the prefix notation you're proposing isn't unambiguous. Haskell's normal function syntax (without operators) is; the rule is you always interpret a sequence of terms like foo bar baz qux etc as ((((foo) bar) baz) qux) etc (where each of foo, bar, etc can be an identifier or a sub-term in parentheses). You use parentheses not to disambiguate that rule, but to group terms to impose a different call structure than that hard rule would give you.
Infix operators do complicate that rule, and they are ambiguous without knowing something about the operators involved (their precedence and associativity, which unlike arity is associated with the name not the actual value referred to). Those complications were added to help make the code easier to understand; particularly for the arithmetic conventions most programmers are already familiar with (that + is lower precedence than *, etc).
If you don't like the additional burden of having to memorise the precedence and associativity of operators (not an unreasonable position), you are free to use a notation that is unambiguous without needing precedence rules, but it has to be Haskell's prefix notation, not Polish prefix notation. And whatever syntactic convention you're using, in any language, you'll always have to use something like parentheses to indicate grouping where the call structure you need is different from what the standard convention would indicate. So:
(+) ((-) 10 1) 2
Or:
plus (minus 10 1) 2
if you define non-operator function names.
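A minimal GHCi sketch of that last form (nothing here beyond Prelude definitions):
Prelude> let plus x y = x + y
Prelude> let minus x y = x - y
Prelude> plus (minus 10 1) 2
11
Note that the parentheses are still doing real grouping work in the composed call.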

Precedence of function application

In order to illustrate that function application has the highest precedence in Haskell, the following example was provided (by School of Haskell):
sq b = b * b
main = print $
-- show
sq 3+1
-- /show
The result here is 10.
What puzzles me is that the argument constitutes a function application too. Consider the operator + being a shortcut for a function. So when the argument is taken, I would expect that its function application now takes precedence over the original one.
Written that way it delivers the expected result:
sq b = b * b
main = print $
sq ((+) 3 1 )
Any explanation?
What puzzles me is that the argument constitutes a function application too. Consider the operator + being a shortcut for a function.
I think this is the heart of the misunderstanding (actually, very good understanding!) involved here. It is true that 3 + 1 is an expression denoting the application of the (+) function to 3 and 1, you have understood that correctly. However, Haskell has two kinds of function application syntax, prefix and infix. So the more precise version of "function application has the highest precedence" would be something like "syntactically prefix function application has higher precedence than any syntactically infix function application".
You can also convert back and forth between the two forms. Each function has a "natural" position: names with only symbols are naturally syntactically infix and names with letters and numbers and the like are naturally syntactically prefix. You can turn a naturally prefix function to infix with backticks, and a naturally infix function to prefix with parentheses. So, if we define
plus = (+)
then all of the following really mean the same thing:
3 + 1
3 `plus` 1
(+) 3 1
plus 3 1
Returning to your example:
sq 3+1
Because sq is naturally prefix, and + is naturally infix, the sq application takes precedence.
So when the argument is taken, I would expect that its function application now takes precedence over the original one.
The Haskell grammar [Haskell report] specifies:
exp10
→ …
| …
| fexp
fexp
→ [fexp] aexp (function application)
This means that function application syntax has precedence 10 (the superscript on the exp10 part).
This means that, when your expression is parsed, the empty space in sq 3 takes precedence over the + in 3+1, and thus sq 3+1 is interpreted as (sq 3) + 1 which semantically means that it squares 3 first, and then adds 1 to the result, and will thus produce 10.
If you write it as sq (3 + 1) or in canonical form as sq ((+) 3 1) it will first sum up 3 and 1 and then determine the square which will produce 16.
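You can check both parses directly in GHCi (a quick sketch):
Prelude> let sq b = b * b
Prelude> sq 3+1     -- parsed as (sq 3) + 1
10
Prelude> sq (3+1)   -- parentheses group the argument
16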
The addition operator is syntactically different from a function application, and that is what determines its operator precedence.
If you rewrite your addition (3 + 1) as a function application ((+) 3 1), the form (+) follows special rules (akin to the rules for operator sections) inside its own parentheses, but outside those parentheses it's just another parenthesized expression.
Note that your "expected result" is not really parallel to your original example:
sq 3 + 1 -- original example, parsed `(sq 3) + 1`
sq ((+) 3 1) -- "expected result", has added parentheses to force your parse
sq (3 + 1) -- this is the operator version of "expected result"
In Haskell, the parentheses are not part of function application -- they are used solely for grouping!
That is to say: just as (+) is another parenthesized expression, so is (3 + 1)
I think your confusion is just the result of slightly imprecise language (on the part of both the OP and the School of Haskell page linked).
When we say things like "function application has higher precedence than any operator", the term "function application" there is not actually a phrase meaning "applying a function". It's a name for the specific syntactic form func arg (where you just write two terms next to each other in order to apply the first one to the second). We are trying to draw a distinction between "normal" prefix syntax for applying a function and the infix operator syntax for applying a function. So with this specific usage, sq 3 is "function application" and 3 + 1 is not. But this isn't claiming that sq is a function and + is not!
Whereas in many other contexts "function application" does just mean "applying a function"; there it isn't a single term, but just the ordinary meaning of the words "function" and "application" that happen to be used together. In that sense, both sq 3 and 3 + 1 are examples of "function application".
These two senses of the term arise because there are two different contexts we use when thinking about the code1: logical and syntactic. Consider if we define:
add = (+)
In the "logical" view where we think about the idealised mathematical objects represented by our code, both add and (+) are simply functions (the same function in fact). They are even exactly the same object (we defined one by saying it was equal to the other). This underlying mathematical function exists independently of any name, and has exactly the same properties no matter how we choose to refer to it. In particular, the function can be applied (since that is basically the sole defining feature of a function).
But at the syntactic level, the language simply has different rules about how you can use the names add and + (regardless of what underlying objects those names refer to). One of these names is an operator, and the other is not. We have special syntactic rules for how you need to write the application of an operator, which differs from how you need to write the application of any non-operator term (including but not limited to non-operator names like sq2). So when talking about syntax we need to be able to draw a distinction between these two cases. But it's important to remember that this is a distinction about names and syntax, and has nothing to do with the underlying functions being referred to (proved by the fact that the exact same function can have many names in different parts of the program, some of them operators and some of them not).
There isn't really an agreed upon common term for "any term that isn't an operator"; if there was we would probably say "non-operator application has higher precedence than any operator", since that would be clearer. But for better or worse the "application by simple adjacency" syntactic feature is frequently referred to as "function application", even though that term could also mean other things in other contexts3.
So (un?)fortunately there isn't really anything deeper going on here than the phrase "function application" meaning different things in different contexts.
1 Okay, there are way more than two contexts we might use to think about our code. There are two relevant to the point I'm making.
2 For an example of other non-operator terms that can be applied we can also have arbitrary expressions in parentheses. For example (compare `on` fst) (1, ()) is the non-operator application of (compare `on` fst) to (1, ()); the expression being applied is itself the operator-application of on to compare and fst.
3 For yet another usage, $ is often deemed to be the "function application operator"; this is perhaps ironic when considered alongside usages that are trying to use the phrase "function application" specifically to exclude operators!

What is the difference between normal and lambda Haskell functions?

I'm a beginner to Haskell and I've been following the e-book Get Programming with Haskell
I'm learning about closures with Lambda functions but I fail to see the difference in the following code:
genIfEven :: Integral p => p -> (p -> p) -> p
genIfEven x = (\f -> isEven f x)
genIfEven2 :: Integral p => (p -> p) -> p -> p
genIfEven2 f x = isEven f x
It would be great if anyone could explain what the precise difference here is
At a basic level1, there isn't really a difference between "normal" functions and ones created with lambda syntax. What made you think there was a difference to ask what it is? (In the particular example you've shown, the functions take their parameters in a different order, but are otherwise the same; either of them could be defined with lambda syntax or "normal" syntax)
Functions are first class values in Haskell. Which means you can pass them to other functions, return them as results, store and retrieve them in data structures, etc, etc. Just like you can with numbers, or strings, or any other value.
Just like with numbers, strings, etc, it's helpful to have syntax for denoting a function value, because you might want to make a simple one right in the middle of other code. It would be pretty horrible if you, say, needed to pass x + 1 to some function and you couldn't just write the literal 1 for the number one, you had to instead go elsewhere in the file and add a one = 1 binding so that you could come back and write x + one. In exactly the same way, you might need to pass to some other function a function for adding 1; it would be annoying to go elsewhere in the file and add a separate definition plusOne x = x + 1, so lambda syntax gives us a way of writing "function literals": \x -> x + 1.2
Consider "normal" function definition syntax, like this:
incrementAllBy _ [] = []
incrementAllBy n (x:xs) = (x + n) : incrementAllBy n xs
Here we don't have any bit of source code that just represents the function value that incrementAllBy is a name for. The function is implied in this syntax, spread over (possibly) multiple "rules" that say what value our function returns given that it is applied to arguments of a certain form. This syntax also fundamentally forces us to bind the function to a name. All of that is in contrast to lambda syntax which just directly represents the function itself, with no bundled case analysis or name.
However they are both merely different ways of writing down functions. Once defined, there is no difference between the functions, whichever syntax you used to express them.
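For instance, here is the same function written with lambda syntax bound to a name (a sketch; incrementAllBy' is just a hypothetical fresh name), with the case analysis made explicit:
incrementAllBy' :: Num a => a -> [a] -> [a]
incrementAllBy' = \n xs -> case xs of
  []         -> []
  (x : rest) -> (x + n) : incrementAllBy' n rest
Once defined, incrementAllBy' is indistinguishable from incrementAllBy at every call site.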
You mentioned that you were learning about closures. It's not really clear how that relates to the question, but I can guess it's being a bit confusing.
I'm going to say something slightly controversial here: you don't need to learn about closures.3
A closure is what is involved in making something like incrementAllBy n xs = map (\x -> x + n) xs work. The function created here \x -> x + n depends on n, which is a parameter so it can be different every time incrementAllBy is called, and there can be multiple such calls running at the same time. So this \x -> x + n function can't end up as just a chunk of compiled code at a particular address in the program's binary, the way top-level functions are. The structure in memory that is passed to map has to either store a copy of n or store a reference to it. Such a structure is called a "closure", and is said to have "closed over" n, or "captured" it.
In Haskell, you don't need to know any of that. I view the expression \x -> x + n as simply creating a new function value, depending on the value n (and also the value +, which is a first-class value too!) that happens to be in scope. I don't think you need to think about this any differently than you would think about the expression x + n creating a new numeric value depending on a local n. Closures in Haskell only matter when you're trying to understand how the language is implemented, not when you're programming in Haskell.
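For example (a minimal sketch; addN is a hypothetical name):
addN :: Int -> (Int -> Int)
addN n = \x -> x + n                  -- the lambda just depends on the local n

main :: IO ()
main = print (map (addN 5) [1, 2, 3]) -- prints [6,7,8]
No special mental machinery is needed here beyond "the returned function uses the n it was given".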
Closures do matter in imperative languages. There the question of whether (the equivalent of) \x -> x + n stores a reference to n or a copy of n (and when the copy is taken, and how deeply) is critical to understanding how code using this function works, because n isn't just a way of referring to a value, it's a variable that has (potentially) different values over time.
But in Haskell I don't really think we should teach beginners about the term or concept of closures. It vastly over-complicates "you can make new functions out of existing values, even ones that are only locally in scope".
So if you have been given these two functions as examples to try and illustrate the concept of closures, and it isn't making sense to you what difference this "closure" makes, you can probably ignore the whole issue and move on to something more important.
1 Sometimes which choice of "equivalent" syntax you use to write your code does affect operational behaviour, like performance. Usually (but not always) the effect is negligible. As a beginner, I highly recommend ignoring such issues for now, so I haven't mentioned them. It's much easier to learn the principles involved in reasoning about how your code might be executed once you've got a thorough understanding of what all the language elements are supposed to mean.
It can sometimes also affect the way GHC infers types (mostly not actually the fact that they're lambdas, but if you bind function names without syntactic parameters as in plusOne = \x -> x + 1 you can trip up the monomorphism restriction, but that's another topic covered in many Stack Overflow questions, so I won't address it here).
2 In this case you could also use an operator section to write an even simpler function literal as (+1).
3 Now I'm going to teach you about closures so I can explain why you don't need to know about closures. :P
There is no difference whatsoever, except one: lambda expressions don't need a name. For that reason, they are sometimes called "anonymous functions" in other languages.
If you plan to use a function often, you'll want to give it a name, if you only need it once, a lambda will usually do, as you can define it at the site where you use it.
You can, of course, name an anonymous function after it is born:
genIfEven2 = \f x -> isEven f x
That would be completely equivalent to your definition.
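To make that concrete, here is a self-contained sketch (assuming isEven is something like the book's ifEven, applying f only when x is even; that definition is an assumption, not part of your snippet):
isEven :: Integral p => (p -> p) -> p -> p
isEven f x = if even x then f x else x

genIfEven2 :: Integral p => (p -> p) -> p -> p
genIfEven2 = \f x -> isEven f x

main :: IO ()
main = print (genIfEven2 (+ 1) 4)   -- prints 5, because 4 is even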

Does an operator (such as +) behave more like a curried function or a function with an argument of a pair tuple type?

The question "How can I find out the type of operator "+"?" says operator + isn't a function.
Which does an operator such as + behave more like,
a curried function, or
a function whose argument has a pair tuple type?
By an operator, say #, I mean # not (#). (#) is a curried function, and if # behaves like a curried function, I guess there is no need to have (#), is there?
Thanks.
According to the Haskell 2010 report:
An operator symbol [not starting with a colon] is an ordinary identifier.1
Hence, + is an identifier. Identifiers can identify different things, and in the case of +, it identifies a function from the Num typeclass. That function behaves like any other Haskell function, except for different parsing rules of expressions involving said functions.
Specifically, this is described in 3.2.
An operator is a function that can be applied using infix syntax (Section 3.4), or partially applied using a section (Section 3.5).
In order to determine what constitutes an operator, we can read further:
An operator is either an operator symbol, such as + or $$, or is an ordinary identifier enclosed in grave accents (backquotes), such as `op`2.
So, to sum up:
+ is an identifier. It's an operator symbol, because it doesn't start with a colon and only uses non-alphanumeric characters. When used in an expression (parsed as a part of an expression), we treat it as an operator, which in turn means it's a function.
To answer your question specifically, everything you said involves just syntax. The need for () in expressions is merely to enable prefix application. In all other cases where such disambiguation isn't necessary (such as :type), it might have simply been easier to implement that way, because you can then just expect ordinary identifiers and push the burden to provide one to the user.
1 For what it's worth, I think the report is actually misleading here. This specific statement seems to be in conflict with other parts, crucially this:
Dually, an operator symbol can be converted to an ordinary identifier by enclosing it in parentheses.
My understanding is that in the first quote, the context for "ordinary" means that it's not a type constructor operator, but a value category operator, hence "value category" means "ordinary". In other quotes, "ordinary" is used for identifiers which are not operator identifiers; the difference being, obviously, the application. That would corroborate the fact that by enclosing an operator identifier in parens, we turn it into an ordinary identifier for the purposes of prefix application. Phew, at least I didn't write that report ;)
2 I'd like to point out one additional thing here. Neither `(+)` nor (`add`) actually parses. The second one is understandable: since the report specifically says that enclosing in parens only works for operator symbols, one can see that `add`, while being an operator, isn't an operator symbol like +.
The first case is actually a bit more tricky for me. Since we can obtain an operator by enclosing an ordinary identifier in backticks, (+) isn't exactly "as ordinary" as add. The language, or at least GHC parser that I tested this with, seems to differentiate between "ordinary ordinary" identifiers, and ordinary identifiers obtained by enclosing operator symbols with parens. Whether this actually contradicts the spec or is another case of mixed naming is beyond my knowledge at this point.
(+) and + refer to the same thing, a function of type Num a => a -> a -> a. We spell it as (+) when we want to use it prefix, and + when we want to write it infix. There is no essential difference; it is just a matter of syntax.
So + (without section) behaves just like a function which requires a pair argument [because a section is needed for partial application]. () works like currying.
While this is not unreasonable from a syntactical point of view, I wouldn't describe it like that. Uncurried functions in Haskell take a tuple as an argument (say, (a, a) -> a as opposed to a -> a -> a), and there isn't an actual tuple anywhere here. 2 + 3 is better thought of as an abbreviation for (+) 2 3, the key point being that the function being used is the same in both cases.
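A minimal sketch of the distinction (addCurried and addPair are hypothetical names):
addCurried :: Num a => a -> a -> a
addCurried = (+)             -- the very function + names; genuinely curried

addPair :: Num a => (a, a) -> a
addPair = uncurry (+)        -- what a genuinely tuple-taking version looks like

main :: IO ()
main = do
  print (addCurried 2 3)                -- 5
  print (map (addCurried 2) [1, 2, 3])  -- partial application: [3,4,5]
  print (addPair (2, 3))                -- 5, but no partial application this way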

Which programming languages allow you to define paired-bracket like operators?

I am looking to create a DSL and I'm looking for a language where you can define your own bracket-style operators for things like the floor and ceiling functions. I'd rather not go the route of defining my own Antlr parser for a custom syntax.
As far as I know, the languages that allow you to define custom operators only support binary infix operators.
tl;dr: Which languages allow for defined paired symbol (like opening bracket/closed bracket) operators?
Also, I don't see how this question can be "too broad" if no one has named a single language that has this and the criteria are very specific and definitely in the programming domain.
Since Fortress is dead, the only languages I know of where something like this would be imaginable are those of FORTH heritage.
In all others that I know of, braces, brackets and parentheses are already heavily used, and can't be overloaded further.
I suggest giving up the quest for such stuff and get comfortable writing
floor x
ceiling y
or however function application is expressed in the language of your choice.
However, in the article you cite, it is said: Unicode contains codepoints for these symbols at U+2308–U+230B: ⌈x⌉, ⌊x⌋.
Thus you can at least define this as an operator in a Haskell-like language and use it like:
infix 5 ⌈⌉
(foo ⌈⌉)
The best I could come up with is like the following:
--- how to make brackets
module Test where
import Prelude.Math
infix 7 `«`
infix 6 `»`
_ « x = Math.ceil x
(») = const
x = () «2.345» ()
main _ = println x
The output was: 3.0
(This is not actually Haskell, but Frege, a Haskell-like JVM language.)
Note that I used «» instead of ⌈⌉, because I somehow have no font in my IDE that would correctly show the bracket symbols. (This is another reason not to do such.)
The way it works is that with the infix directives, we get this parsed as
(») ((«) () 2.345) ()
(One could insert any expression in place of the ())
Maybe if you ask in the Haskell group, someone will find a better solution.
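For what it's worth, essentially the same trick appears to work in GHC, since Haskell operator names may contain any Unicode symbol or punctuation character; treat this as a sketch rather than a guarantee for every GHC version:
infix 7 «
infix 6 »

(«) :: () -> Double -> Double
_ « x = fromIntegral (ceiling x :: Integer)

(») :: Double -> () -> Double
(») = const

main :: IO ()
main = print (() « 2.345 » ())   -- prints 3.0; parsed as (») ((«) () 2.345) ()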

Good Haskell coding standards

Could someone provide a link to a good coding standard for Haskell? I've found this and this, but they are far from comprehensive. Not to mention that the HaskellWiki one includes such "gems" as "use classes with care" and "defining symbolic infix identifiers should be left to library writers only."
Really hard question. I hope your answers turn up something good. Meanwhile, here is a catalog of mistakes or other annoying things that I have found in beginners' code. There is some overlap with the Cal Tech style page that Kornel Kisielewicz points to. Some of my advice is every bit as vague and useless as the HaskellWiki "gems", but I hope at least it is better advice :-)
Format your code so it fits in 80 columns. (Advanced users may prefer 87 or 88; beyond that is pushing it.)
Don't forget that let bindings and where clauses create a mutually recursive nest of definitions, not a sequence of definitions.
Take advantage of where clauses, especially their ability to see function parameters that are already in scope (nice vague advice). If you are really grokking Haskell, your code should have a lot more where-bindings than let-bindings. Too many let-bindings is a sign of an unreconstructed ML programmer or Lisp programmer.
Avoid redundant parentheses. Some places where redundant parentheses are particularly offensive are
Around the condition in an if expression (brands you as an unreconstructed C programmer)
Around a function application which is itself the argument of an infix operator (Function application binds tighter than any infix operator. This fact should be burned into every Haskeller's brain, in much the same way that us dinosaurs had APL's right-to-left scan rule burned in.)
Put spaces around infix operators. Put a space following each comma in a tuple literal.
Prefer a space between a function and its argument, even if the argument is parenthesized.
Use the $ operator judiciously to cut down on parentheses. Be aware of the close relationship between $ and infix .:
f $ g $ h x == (f . g . h) x == f . g . h $ x
Don't overlook the built-in Maybe and Either types.
Never write if <expression> then True else False; the correct phrase is simply <expression>.
Don't use head or tail when you could use pattern matching (see the sketch after this list).
Don't overlook function composition with the infix dot operator.
Use line breaks carefully. Line breaks can increase readability, but there is a tradeoff: Your editor may display only 40–50 lines at once. If you need to read and understand a large function all at once, you mustn't overuse line breaks.
Almost always prefer the -- comments which run to end of line over the {- ... -} comments. The braced comments may be appropriate for large headers—that's it.
Give each top-level function an explicit type signature.
When possible, align -- lines, = signs, and even parentheses and commas that occur in adjacent lines.
Influenced as I am by GHC central, I have a very mild preference to use camelCase for exported identifiers and short_name with underscores for local where-bound or let-bound variables.
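A minimal sketch pulling a few of these points together (a hypothetical example): pattern matching instead of head, a where-bound helper, $ to cut parentheses, spaces around operators, aligned = signs, and an explicit top-level signature:
describe :: [Int] -> String
describe []      = "empty"
describe (n : _) = "starts with " ++ show (scaled n)
  where
    scaled m = m * factor
    factor   = 10

main :: IO ()
main = putStrLn $ describe [4, 2]   -- prints "starts with 40"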
Some good rules of thumb imho:
Consult HLint to make sure you don't have redundant parentheses and that your code isn't pointlessly point-full.
Avoid recreating existing library functions. Hoogle can help you find them.
Oftentimes existing library functions are more general than what one was going to make. For example, if you want Maybe (Maybe a) -> Maybe a, then join does that among other things.
Argument naming and documentation is important sometimes.
For a function like replicate :: Int -> a -> [a], it's pretty obvious what each of the arguments does, from their types alone.
For a function that takes several arguments of the same type, like isPrefixOf :: (Eq a) => [a] -> [a] -> Bool, naming/documentation of arguments is more important.
If one function exists only to serve another function, and isn't otherwise useful, and/or it's hard to think of a good name for it, then it probably should exist in its caller's where clause instead of in the module's scope.
DRY
Use Template-Haskell when appropriate.
Bundles of functions like zip3, zipWith3, zip4, zipWith4, etc are very meh. Use Applicative style with ZipLists instead. You probably never really need functions like those.
Derive instances automatically. The derive package can help you derive instances for type-classes such as Functor (there is only one correct way to make a type an instance of Functor).
Code that is more general has several benefits:
It's more useful and reusable.
It is less prone to bugs, because the more general type leaves the implementation fewer ways to go wrong.
For example, if you want to program concat :: [[a]] -> [a], notice how it can be generalised to join :: Monad m => m (m a) -> m a (see the sketch after this list). There is less room for error when programming join, because when programming concat you can reverse the lists by mistake, while in join there are very few things you can do.
When using the same stack of monad transformers in many places in your code, make a type synonym for it. This will make the types shorter, more concise, and easier to modify in bulk.
Beware of "lazy IO". For example, readFile doesn't really read the file's contents at the moment it is called.
Avoid indenting so much that I can't find the code.
If your type is logically an instance of a type-class, make it an instance.
The instance can replace other interface functions you may have considered with familiar ones.
Note: If there is more than one logical instance, create newtype-wrappers for the instances.
Make the different instances consistent. It would have been very confusing/bad if the list Applicative behaved like ZipList.
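As promised above, a quick sketch of join generalising concat (using only Control.Monad from base):
import Control.Monad (join)

main :: IO ()
main = do
  print (concat [[1, 2], [3 :: Int]])   -- [1,2,3]
  print (join   [[1, 2], [3 :: Int]])   -- [1,2,3]; join on lists is concat
  print (join (Just (Just 'x')))        -- Just 'x'; the same function on Maybe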
I like to try to organize functions as point-free style compositions as much as possible by doing things like:
func = boo . boppity . bippity . snd
  where
    boo = ...
    boppity = ...
    bippity = ...
I like using ($) only to avoid nested parens or long parenthesized expressions
... I thought I had a few more in me, oh well
I'd suggest taking a look at this style checker.
I found a good markdown file covering almost every aspect of Haskell code style. It can be used as a cheat sheet. You can find it here: link
