In a code base I'm reading, I found a function declaration like this (some parts are missing):
filepathNormalise :: BS.ByteString -> BS.ByteString
filepathNormalise xs
  | isWindows, Just (a,xs) <- BS.uncons xs, sep a, Just (b,_) <- BS.uncons xs, sep b
  = '/' `BS.cons` f xs
What does the comma do here?
(Only as a bonus, if someone readily knows this: is this syntax mentioned in Haskell Programming from first principles, and if so, where? As I can't remember reading about it.)
Guards are described in Haskell 2010 section 3.13, Case Expressions
(that section is about case expressions, not top-level declarations, but presumably the semantics are the same):
guards → | guard1, …, guardn     (n ≥ 1)
guard  → pat <- infixexp         (pattern guard)
       | let decls               (local declaration)
       | infixexp                (boolean guard)
For each guarded expression, the comma-separated guards are tried sequentially from left to right. If all of them succeed, then the corresponding expression is evaluated in the environment extended with the bindings introduced by the guards. That is, the bindings that are introduced by a guard (either by using a let clause or a pattern guard) are in scope in the following guards and the corresponding expression. If any of the guards fail, then this guarded expression fails and the next guarded expression is tried.
In the simple case, the comma serves a role similar to Boolean and. But the comma is more powerful in that each guard can introduce new bindings that are used by the subsequent guards (proceeding from left to right).
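For a concrete illustration (a small sketch with invented names, not from the code base in question), the binding introduced by the first pattern guard below is in scope in the boolean guard that follows it and in the right-hand side:

-- `age`, bound by the pattern guard, is used by the later guard and by the RHS.
classify :: String -> [(String, Int)] -> String
classify name table
  | Just age <- lookup name table, age >= 18 = name ++ " is an adult"
  | Just _   <- lookup name table            = name ++ " is a minor"
  | otherwise                                = name ++ " is unknown"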
Commas in guards are uncommon enough (in my experience, at least) that I'd describe this feature as Haskell trivia -- not at all necessary to writing (or, for the most part, reading) Haskell. I suspect that Haskell Programming from first principles omits it for that reason.
This syntax is not legal in Haskell '98; this was added to the language specification in Haskell 2010. It's part of the "pattern guards" language extension.
https://prime.haskell.org/wiki/PatternGuards
The real usefulness in this is in allowing you to pattern match inside a guard clause. The syntactic change also has the side-effect of allowing you to AND together several Boolean terms using commas.
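For instance (a rough sketch, not taken from the linked page), the commas let two lookups be matched in sequence without nested case expressions:

addLookups :: Int -> Int -> [(Int, Int)] -> Int
addLookups k1 k2 env
  | Just v1 <- lookup k1 env   -- first pattern guard binds v1
  , Just v2 <- lookup k2 env   -- second pattern guard binds v2 in the same chain
  = v1 + v2
  | otherwise = 0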
(I personally really dislike this extension, and I'm a little shocked it made it into the official spec, but there we are...)
Related
Haskell has a restricted syntax for defining type families:
(1) type family Length (xs :: [*]) where
(2)   Length '[] = 0
(3)   Length (x ': xs) = 1 + Length xs
On lines (2) and (3), on the left side of the equals sign (=) we only have simple pattern matching.
On the right side of the equals sign we have just type-level function application and, as syntactic sugar,
type operators ((+) in line (3)).
There are no guards, no case expressions, no if-then-else syntax, no let or where clauses,
and there is no partial function application.
This is not a problem: the missing case expression can be replaced by a specialized type-level function
that pattern matches on the different cases, and the missing if-then-else syntax can be replaced by the If
type family from the Data.Type.Bool module.
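For instance (a minimal sketch; Max is a name made up here, while If comes from Data.Type.Bool and (<=?) from GHC.TypeLits), a type-level maximum can be written with If where a value-level function would use if-then-else:

{-# LANGUAGE DataKinds, TypeFamilies, TypeOperators, UndecidableInstances #-}
import Data.Type.Bool (If)
import GHC.TypeLits (Nat, type (<=?))

-- Pick the larger of two type-level naturals using If instead of if-then-else.
type family Max (m :: Nat) (n :: Nat) :: Nat where
  Max m n = If (m <=? n) n m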
Looking at some examples, we see that pattern-matching syntax at the type level has at least one
additional feature not available in normal Haskell value-level functions:
(1) type family Contains (lst :: [a]) (elem :: a) where
(2)   Contains (x ': xs) (x) = 'True
(3)   Contains '[] (x) = 'False
(4)   Contains (x ': xs) (y) = Contains xs (y)
In line (2) we use the variable x twice. Line (2) evaluates to 'True if the head of the list in the first parameter
is equal to the second parameter.
If we do the same thing in a value-level function, GHC answers with a Conflicting definitions for 'x' error.
In value-level functions we must add an Eq a => context for the function to compile.
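For comparison, a value-level version (just a sketch, not code from the question) has to name the two values differently and test them explicitly:

-- The repeated-variable trick is not allowed here, so we need Eq and (==).
contains :: Eq a => [a] -> a -> Bool
contains (x:xs) y
  | x == y    = True
  | otherwise = contains xs y
contains [] _ = False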
Type-level pattern matching seems to work similarly to unification, as in the old days of Prolog.
I unsuccessfully googled for some documentation about the syntax of type level functions.
Why does GHC not require something like an a ~ b type equality constraint in the definition of the Contains type family?
Is type equality always available?
Has the type family syntax other additional features, that are unavailable on the value level?
Where is this documented?
Haskell's type-level language is a purely first-order language, in which "application" is just another constructor, rather than a thing which computes. There are binding constructs, like forall, but the notion of equality for type-level stuff is fundamentally mere alpha-equivalence: structural up to renaming of bound variables. Indeed, the whole of our constructor-class machinery, monads, etc relies on being able to take an application m v apart unambiguously.
Type-level functions don't really live in the type-level language as first-class citizens: only their full applications do. We end up with an equational (for the ~ notion of equality) theory of type-level expressions in which constraints are expressed and solved, but the underlying notion of value that these expressions denote is always first-order, and thus always equippable with equality.
Hence it always makes sense to interpret repeated pattern variables by a structural equality test, which is exactly how pattern matching was designed in its original 1969 incarnation, as an extension to another language rooted in a fundamentally first-order notion of value, LISP.
I am reading the Haskell 2010 report and have some questions regarding the meta-logical representation in section 2.4:
1. In the mnemonics of "varid" and "varsym", does "var" mean variable?
2. My understanding is that "varid" are identifiers for variables and functions, while "varsym" are also identifiers, but for operators. Is this understanding correct?
3. If 1 and 2 are correct, does it mean an operator is also a kind of variable? (I am very confused because this is likely not right.)
Appreciate any advice.
As far as I can tell, the report is defining the difference between symbols that are used prefix, and those that are used infix, for example:
f x y -- f is used prefix
a / b -- / is used infix
This is just a syntactic convenience, as all prefix symbols can be used infix with backticks, and all infix symbols can be used prefix with ()s:
x `f` y -- infix
(/) a b -- prefix
(a /) b -- operator section
(/ b) a -- operator section
Sub-questions:
1. Yes, but I can't figure out any meaningful mnemonic for the id and sym parts. :(
2. Operators are in the realm of Haskell syntax, not its semantics. They're only used to provide a more convenient syntax for writing some expressions. As far as I know, if they were removed from Haskell, the only loss would be convenient syntax -- i.e. there's nothing that you need operators for, other than convenient syntax, and you can replace every single use of operators with non-operator symbols. They are completely identical to variables -- they are variables -- but require different syntax for their use.
3. Yes, I would agree that operator symbols are variables. However, the values bound to operator symbols would not be variables.
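To make the "operators are variables" point concrete, here is a tiny made-up example: the operator (.+.) is bound exactly like a named function, and the two spellings are interchangeable:

(.+.) :: Int -> Int -> Int
(.+.) = (+)

addThree :: Int -> Int -> Int -> Int
addThree x y z = x .+. y .+. z          -- infix use of the operator variable
-- equivalently: addThree x y z = (.+.) ((.+.) x y) z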
I have an ontological question about monads in Haskell; I'm shaky on whether the language makes a distinction between statements and expressions at all. For example, I feel like in most other languages anything with a signature like a -> SomeMonadProbs () would be considered a statement. That said, since Haskell is purely functional, and functions are composed of expressions, I'm a wee bit confused about what Haskell would say about monads in terms of their expression-hood.
Monad is just one interface for interacting with expressions. For example, consider this list comprehension implemented using do notation:
example :: [(Int, Int)]
example = do
  x <- [1..3]
  y <- [4..6]
  return (x, y)
That desugars to:
[1..3] >>= \x ->
[4..6] >>= \y ->
return (x, y)
... and substituting in the definition of (>>=) for lists gives:
concatMap (\x -> concatMap (\y -> [(x, y)]) [4..6]) [1..3]
The important idea is that anything you can do using do notation can be replaced with calls to (>>=).
The closest thing to "statements" in Haskell are syntactic lines of a do notation block, such as:
x <- [1..3]
These lines do not correspond to isolated expressions, but rather syntactic fragments of an expression which are not self-contained:
[1..3] >>= \x -> ... {incomplete lambda}
So it's really more appropriate to say that everything is an expression in Haskell, and do notation gives you something which appears like a bunch of statements but actually desugars to a bunch of expressions under the hood.
Here are a few thoughts.
a >>= b is an application just like any other application, so from a syntactic point of view there are clearly no statements in Haskell, only expressions.
From a semantic point of view (see, for example, the Tackling the Awkward Squad paper) there are "denotational" and "operational" fragments of Haskell semantics.
The denotational fragment treats >>= like a data constructor, so it considers a >>= b to be in WHNF. The "operational" fragment "deconstructs" the values in the IO monad and performs different effects in the process.
When reasoning about programs, you often don't need to consider the "operational" fragment at all. For example, when you refactor foo a >> foo a into let bar = foo a in bar >> bar you don't care about the nature of foo, so IO actions are indistinguishable from any other values here.
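Spelled out as code (a trivial sketch where print stands in for an arbitrary effectful foo), the two forms are interchangeable:

foo :: Int -> IO ()
foo = print

-- Before and after the refactoring: sharing the action as a value does not
-- change the program's meaning.
twice, twice' :: Int -> IO ()
twice  a = foo a >> foo a
twice' a = let bar = foo a in bar >> bar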
This is where Haskell shines, and it's tempting to say there are no statements at all; however, that leads to a funny and somewhat paradoxical conclusion. For example, the C preprocessor language can be considered a denotational fragment of C. So C has denotational and operational fragments too, but nobody says that C is purely functional or has no statements. See the "The C language is purely functional" post for a detailed treatment of this matter.
Haskell of course differs from C quantitatively: its denotational fragment is expressive enough to be practically useful, so you have to think about underlying transitions in its operational semantics less often than in C.
But when you have to think about those transitions, like when reasoning about the order of data written to a network socket, you have to resort to that statement-after-statement thinking.
So while IO actions are not themselves statements and in a certain narrow technical sense there are no statements at all, the actions represent the statements so I think it's fair to say that statements are present in Haskell in a very indirect form.
whether the language makes a distinction between statements and expressions at all
It does not. There are no productions for "statement" or anything like that in the grammar, and nothing is called "statement" or anything equivalent (as far as I know) in the language description.
That said, the language report does call the elements inside do notation "statements". There are two kinds of statements that are not expressions: pat <- exp and let decls.
in most other languages anything with a signature like a -> SomeMonadProbs () would be considered a statement
Haskell is different from most other languages. That's kinda its point (not being different for the sake of it, obviously, but unifying expressions and statements into a single construct).
This has been a question I've been wondering about for a while. if statements are staples in most programming languages (at least the ones I've worked with), but in Haskell they seem to be quite frowned upon. I understand that for complex situations, Haskell's pattern matching is much cleaner than a bunch of ifs, but is there any real difference?
For a simple example, take a homemade version of sum (yes, I know it could just be foldr (+) 0):
sum :: [Int] -> Int

-- separate all the cases out
sum [] = 0
sum (x:xs) = x + sum xs

-- guards
sum xs
  | null xs = 0
  | otherwise = (head xs) + sum (tail xs)

-- case
sum xs = case xs of
  [] -> 0
  _  -> (head xs) + sum (tail xs)

-- if statement
sum xs = if null xs then 0 else (head xs) + sum (tail xs)
As a second question, which one of these options is considered "best practice" and why? My professor way back when always used the first method whenever possible, and I'm wondering if that's just his personal preference or if there was something behind it.
The problem with your examples is not the if expressions, it's the use of partial functions like head and tail. If you try to call either of these with an empty list, it throws an exception.
> head []
*** Exception: Prelude.head: empty list
> tail []
*** Exception: Prelude.tail: empty list
If you make a mistake when writing code using these functions, the error will not be detected until run time. If you make a mistake with pattern matching, your program will not compile.
For example, let's say you accidentally switched the then and else parts of your function.
-- Compiles, throws error at run time.
sum xs = if null xs then (head xs) + sum (tail xs) else 0
-- Doesn't compile. Also stands out more visually.
sum [] = x + sum xs
sum (x:xs) = 0
Note that your example with guards has the same problem.
I think the Boolean Blindness article answers this question very well. The problem is that boolean values have lost all their semantic meaning as soon as you construct them. That makes them a great source for bugs and also makes the code more difficult to understand.
Your first version, the one preferred by your prof, has the following advantages compared to the others:
no mention of null
list components are named in the pattern, so no mention of head and tail.
I do think that this one is considered "best practice".
What's the big deal? Why would we want to avoid head and tail in particular? Well, everybody knows that those functions are not total, so one automatically tries to make sure that all cases are covered. Not only does a pattern match on [] stand out more than null xs, but a series of pattern matches can be checked by the compiler for completeness. Hence, the idiomatic version with a complete pattern match is easier to grasp (for the trained Haskell reader) and can be proved exhaustive by the compiler.
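A minimal sketch of what that compiler check looks like (mySum is a made-up name to avoid clashing with Prelude.sum):

{-# OPTIONS_GHC -Wincomplete-patterns #-}

-- GHC warns at compile time that the [] case is not covered here;
-- the version written with null/head/tail compiles silently and only
-- fails at run time.
mySum :: [Int] -> Int
mySum (x:xs) = x + mySum xs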
The second version is slightly better than the last one because one sees at once that all cases are handled. Still, in the general case the RHS of the second equation could be longer, and there could be a where clause with a couple of definitions, the last of which could be something like:
where
  ... many definitions here ...
  head xs = ... alternative redefinition of head ...
To be absolutely sure to understand what the RHS does, one has to make sure common names have not been redefined.
The 3rd version is the worst one IMHO: a) The 2nd match fails to deconstruct the list and still uses head and tail. b) The case is slightly more verbose than the equivalent notation with 2 equations.
In many programming languages, if-statements are fundamental primitives, and things like switch-blocks are just syntax sugar to make deeply-nested if-statements easier to write.
Haskell does it the other way around. Pattern matching is the fundamental primitive, and an if-expression is literally just syntax sugar for pattern matching. Similarly, constructs like null and head are simply user-defined functions, which are all ultimately implemented using pattern matching. So pattern matching is the thing at the bottom of it all. (And therefore potentially more efficient than calling user-defined functions.)
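Concretely, the language report defines the conditional by translation into a case expression on Bool's constructors, so these two definitions mean the same thing:

-- if c then t else e  is equivalent to  case c of { True -> t; False -> e }
abs1, abs2 :: Int -> Int
abs1 n = if n < 0 then negate n else n
abs2 n = case n < 0 of
           True  -> negate n
           False -> n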
In many cases - such as the ones you list above - it's simply a matter of style. The compiler can almost certainly optimise things to the point where all versions are roughly equal in performance. But generally [not always!] pattern matching makes it clearer exactly what you're trying to achieve.
(It's annoyingly easy to write an if-expression where you get the two alternatives the wrong way around. You'd think it would be a rare mistake, but it's surprisingly common. With a pattern match, there's little chance of making that specific mistake, although there's still plenty of other things to screw up.)
Each call to null, head and tail entails a pattern match. But the 1st version in your answer does just one pattern match, and reuses its results through named components of the pattern.
Just for that, it is better. But it is also more visually apparent, more readable.
Pattern matching is better than a string of if-then-else statements for (at least) the following reasons:
it is more declarative
it interacts well with sum-types
Pattern matching helps to reduce the amount of "accidental complexity" in your code - that is, code that is really more about implementation details rather than the essential logic of your program.
In most other languages, when the compiler/run-time sees a string of if-then-else statements it has no choice but to test the conditions in exactly the order the programmer specified them. But pattern matching encourages the programmer to focus more on describing what should happen versus how things should be performed. Due to purity and immutability of values in Haskell, the compiler can consider the collection of patterns as a whole and decide how best to implement them.
An analogy would be C's switch statement. If you dump the assembly code for various switch statements you will see that sometimes the compiler will generate a chain/tree of comparisons and in other cases it will generate a jump table. The programmer uses the same syntax in both cases - the compiler chooses the implementation based on what the comparison values are. If they form a contiguous block of values the jump table method is used, otherwise a comparison tree is used. And this separation of concerns allows the compiler to implement even more strategies in the future if other patterns among the comparison values are detected.
I am very new to Haskell and to functional programming in general. My question is pretty basic. What is the difference between Pattern Matching and Guards?
Function using pattern matching
check :: [a] -> String
check [] = "Empty"
check (x:xs) = "Contains Elements"
Function using guards
check_ :: [a] -> String
check_ lst
  | length lst < 1 = "Empty"
  | otherwise = "Contains elements"
To me it looks like Pattern Matching and Guards are fundamentally the same. Both evaluate a condition, and if true will execute the expression hooked to it. Am I correct in my understanding?
In this example I can either use pattern matching or guards to arrive at the same result. But something tells me I am missing out on something important here. Can we always replace one with the other?
Could someone give examples where pattern matching is preferred over guards and vice versa?
Actually, they're fundamentally quite different! At least in Haskell, at any rate.
Guards are both simpler and more flexible: They're essentially just special syntax that translates to a series of if/then expressions. You can put arbitrary boolean expressions in the guards, but they don't do anything you couldn't do with a regular if.
Pattern matches do several additional things: They're the only way to deconstruct data, and they bind identifiers within their scope. In the same sense that guards are equivalent to if expressions, pattern matching is equivalent to case expressions. Declarations (either at the top level, or in something like a let expression) are also a form of pattern match, with "normal" definitions being matches with the trivial pattern, a single identifier.
Pattern matches also tend to be the main way stuff actually happens in Haskell--attempting to deconstruct data in a pattern is one of the few things that forces evaluation.
By the way, you can actually do pattern matching in top-level declarations:
square = (^2)
(one:four:nine:_) = map square [1..]
This is occasionally useful for a group of related definitions.
GHC also provides the ViewPatterns extension which sort of combines both; you can use arbitrary functions in a binding context and then pattern match on the result. This is still just syntactic sugar for the usual stuff, of course.
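A tiny sketch of ViewPatterns (greet is just an invented example): the function map toLower is applied inside the pattern and its result is matched against "hello":

{-# LANGUAGE ViewPatterns #-}
import Data.Char (toLower)

greet :: String -> String
greet (map toLower -> "hello") = "Hi there!"
greet _                        = "I don't understand."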
As for the day-to-day issue of which to use where, here's some rough guides:
Definitely use pattern matching for anything that can be matched directly one or two constructors deep, where you don't really care about the compound data as a whole, but do care about most of the structure. The @ syntax (as-patterns) lets you bind the overall structure to a variable while also pattern matching on it, but doing too much of that in one pattern can get ugly and unreadable quickly.
Definitely use guards when you need to make a choice based on some property that doesn't correspond neatly to a pattern, e.g. comparing two Int values to see which is larger.
If you need only a couple pieces of data from deep inside a large structure, particularly if you also need to use the structure as a whole, guards and accessor functions are usually more readable than some monstrous pattern full of @ and _.
If you need to do the same thing for values represented by different patterns, but with a convenient predicate to classify them, using a single generic pattern with a guard is usually more readable. Note that if a set of guards is non-exhaustive, anything that fails all the guards will drop down to the next pattern (if any). So you can combine a general pattern with some filter to catch exceptional cases, then do pattern matching on everything else to get the details you care about (see the sketch after these guidelines).
Definitely don't use guards for things that could be trivially checked with a pattern. Checking for empty lists is the classic example, use a pattern match for that.
In general, when in doubt, just stick with pattern matching by default, it's usually nicer. If a pattern starts getting really ugly or convoluted, then stop to consider how else you could write it. Besides using guards, other options include extracting subexpressions as separate functions or putting case expressions inside the function body in order to push some of the pattern matching down onto them and out of the main definition.
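As a sketch of the fall-through behaviour mentioned above (describe is an invented example), the first equation catches an exceptional case with a guard; anything that fails the guard drops down to the more detailed patterns below:

describe :: [Int] -> String
describe xs | any (< 0) xs = "contains a negative number"
describe []                = "an empty list"
describe [x]               = "just " ++ show x
describe (x:_)             = "a list starting with " ++ show x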
For one, you can put boolean expressions within a guard.
For example:
Just as with list comprehensions, boolean expressions can be freely mixed with the pattern guards:
f x | [y] <- x
    , y > 3
    , Just z <- h y
    = ...
Update
There is a nice quote from Learn You a Haskell about the difference:
Whereas patterns are a way of making sure a value conforms to some form and deconstructing it, guards are a way of testing whether some property of a value (or several of them) are true or false. That sounds a lot like an if statement and it's very similar. The thing is that guards are a lot more readable when you have several conditions and they play really nicely with patterns.
To me it looks like Pattern Matching and Guards are fundamentally the same. Both evaluate a condition, and if true will execute the expression hooked to it. Am I correct in my understanding?
Not quite. First, pattern matching cannot evaluate arbitrary conditions. It can only check whether a value was created using a given constructor.
Second, pattern matching can bind variables. So while the pattern [] might be equivalent to the guard null lst (not using length because that would not be equivalent; more on that later), the pattern x:xs most certainly is not equivalent to the guard not (null lst), because the pattern binds the variables x and xs, which the guard does not.
A note on using length: Using length to check whether a list is empty is very bad practice, because, to calculate the length, it needs to go through the whole list, which takes O(n) time, while just checking whether the list is empty takes O(1) time with null or pattern matching. Furthermore, using length just plain does not work on infinite lists.
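For instance, in GHCi:

> null [1..]
False

Asking length [1..] == 0 instead would never return, because length has to traverse the whole (infinite) list before it can compare.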
In addition to the other good answers, I'll try to be specific about guards: Guards are just syntactic sugar. If you think about it, you will often have the following structure in your programs:
f y = ...
f x =
  if p x then A else B
That is, if a pattern matches, it is followed right after by an if-then-else discrimination. A guard folds this discrimination into the pattern match directly:
f y = ...
f x | p x       = A
    | otherwise = B
(otherwise is defined to be True in the standard library.) This is more convenient than an if-then-else chain, and sometimes it also makes the handling of the different variants much simpler, so it is easier to write than the if-then-else construction.
In other words, it is sugar on top of another construction in a way which greatly simplifies your code in many cases. You will find that it eliminates a lot of if-then-else chains and makes your code more readable.
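As a concrete (made-up) example of that rewrite, compare a nested if-then-else chain with its guard version:

grade :: Int -> String
grade score =
  if score >= 90
    then "A"
    else if score >= 80
      then "B"
      else if score >= 70
        then "C"
        else "F"

grade' :: Int -> String
grade' score
  | score >= 90 = "A"
  | score >= 80 = "B"
  | score >= 70 = "C"
  | otherwise   = "F"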