I am very new to Haskell and to functional programming in general. My question is pretty basic. What is the difference between Pattern Matching and Guards?
Function using pattern matching
check :: [a] -> String
check [] = "Empty"
check (x:xs) = "Contains Elements"
Function using guards
check_ :: [a] -> String
check_ lst
  | length lst < 1 = "Empty"
  | otherwise = "Contains elements"
To me it looks like Pattern Matching and Guards are fundamentally the same. Both evaluate a condition, and if true will execute the expression hooked to it. Am I correct in my understanding?
In this example I can either use pattern matching or guards to arrive at the same result. But something tells me I am missing out on something important here. Can we always replace one with the other?
Could someone give examples where pattern matching is preferred over guards and vice versa?
Actually, they're fundamentally quite different! At least in Haskell, at any rate.
Guards are both simpler and more flexible: They're essentially just special syntax that translates to a series of if/then expressions. You can put arbitrary boolean expressions in the guards, but they don't do anything you couldn't do with a regular if.
Pattern matches do several additional things: They're the only way to deconstruct data, and they bind identifiers within their scope. In the same sense that guards are equivalent to if expressions, pattern matching is equivalent to case expressions. Declarations (either at the top level, or in something like a let expression) are also a form of pattern match, with "normal" definitions being matches with the trivial pattern, a single identifier.
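For instance, the check function from the question is equivalent to this explicit case expression (a sketch reusing the question's own example):
checkCase :: [a] -> String
checkCase lst = case lst of
  []    -> "Empty"
  (_:_) -> "Contains Elements"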
Pattern matches also tend to be the main way stuff actually happens in Haskell: attempting to deconstruct data in a pattern is one of the few things that forces evaluation.
By the way, you can actually do pattern matching in top-level declarations:
square = (^2)
(one:four:nine:_) = map square [1..]
This is occasionally useful for a group of related definitions.
GHC also provides the ViewPatterns extension which sort of combines both; you can use arbitrary functions in a binding context and then pattern match on the result. This is still just syntactic sugar for the usual stuff, of course.
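A minimal sketch of what a view pattern looks like (greet is a made-up example, not from the original answer):
{-# LANGUAGE ViewPatterns #-}
import Data.Char (toLower)

-- the function map toLower is applied to the argument, and its
-- result is then matched against the pattern "hello"
greet :: String -> String
greet (map toLower -> "hello") = "Hi there!"
greet _                        = "I don't understand."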
As for the day-to-day issue of which to use where, here are some rough guides:
Definitely use pattern matching for anything that can be matched directly one or two constructors deep, where you don't really care about the compound data as a whole, but do care about most of the structure. The @ syntax lets you bind the overall structure to a variable while also pattern matching on it (see the sketch after these guides), but doing too much of that in one pattern can get ugly and unreadable quickly.
Definitely use guards when you need to make a choice based on some property that doesn't correspond neatly to a pattern, e.g. comparing two Int values to see which is larger.
If you need only a couple pieces of data from deep inside a large structure, particularly if you also need to use the structure as a whole, guards and accessor functions are usually more readable than some monstrous pattern full of @ and _.
If you need to do the same thing for values represented by different patterns, but with a convenient predicate to classify them, using a single generic pattern with a guard is usually more readable. Note that if a set of guards is non-exhaustive, anything that fails all the guards will drop down to the next pattern (if any). So you can combine a general pattern with some filter to catch exceptional cases, then do pattern matching on everything else to get details you care about.
Definitely don't use guards for things that could be trivially checked with a pattern. Checking for empty lists is the classic example, use a pattern match for that.
In general, when in doubt, just stick with pattern matching by default, it's usually nicer. If a pattern starts getting really ugly or convoluted, then stop to consider how else you could write it. Besides using guards, other options include extracting subexpressions as separate functions or putting case expressions inside the function body in order to push some of the pattern matching down onto them and out of the main definition.
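To make a couple of these guides concrete, here is a small sketch (dedup is a made-up example) that combines an @-pattern with a non-exhaustive guard falling through to the next equation:
-- remove adjacent duplicates: rest names the whole tail while its
-- head y is matched at the same time
dedup :: Eq a => [a] -> [a]
dedup (x:rest@(y:_))
  | x == y = dedup rest  -- if the guard fails, control drops to the next pattern
dedup (x:xs) = x : dedup xs
dedup []     = []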
For one, you can put boolean expressions within a guard.
For example:
Just as with list comprehensions, boolean expressions can be freely mixed among the pattern guards. For example:
f x | [y] <- x
    , y > 3
    , Just z <- h y
    = ...
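For completeness, here is a runnable variant of that sketch (h here is just a stand-in for any function returning a Maybe):
h :: Int -> Maybe Int
h y = if even y then Just (y `div` 2) else Nothing

f :: [Int] -> Int
f x | [y] <- x        -- x must be a singleton list
    , y > 3           -- its element must exceed 3
    , Just z <- h y   -- and h y must produce a value
    = z
f _ = 0               -- anything else lands here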
Update
There is a nice quote from Learn You a Haskell about the difference:
Whereas patterns are a way of making sure a value conforms to some form and deconstructing it, guards are a way of testing whether some property of a value (or several of them) are true or false. That sounds a lot like an if statement and it's very similar. The thing is that guards are a lot more readable when you have several conditions and they play really nicely with patterns.
To me it looks like Pattern Matching and Guards are fundamentally the same. Both evaluate a condition, and if true will execute the expression hooked to it. Am I correct in my understanding?
Not quite. First, pattern matching cannot evaluate arbitrary conditions. It can only check whether a value was created using a given constructor.
Second, pattern matching can bind variables. So while the pattern [] might be equivalent to the guard null lst (not using length, because that would not be equivalent; more on that later), the pattern x:xs most certainly is not equivalent to the guard not (null lst), because the pattern binds the variables x and xs, which the guard does not.
A note on using length: using length to check whether a list is empty is very bad practice, because calculating the length has to traverse the whole list, which takes O(n) time, while just checking whether the list is empty with null or pattern matching takes O(1) time. Furthermore, length just plain does not work on infinite lists.
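A quick GHCi session makes the point, using the infinite list [1..] as a test case:
> null [1..]
False
> length [1..]
-- never returns: length has to walk the entire infinite list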
In addition to the other good answers, I'll try to be specific about guards: Guards are just syntactic sugar. If you think about it, you will often have the following structure in your programs:
f y = ...
f x =
  if p x then A else B
That is, if a pattern matches, it is followed right after by an if-then-else discrimination. A guard folds this discrimination into the pattern match directly:
f y = ...
f x | p x       = A
    | otherwise = B
(otherwise is defined to be True in the standard library.) It is more convenient than an if-then-else chain, and for definitions with many variants it often makes the code simpler and easier to write than the if-then-else construction.
In other words, it is sugar on top of another construction in a way which greatly simplifies your code in many cases. You will find that it eliminates a lot of if-then-else chains and make your code more readable.
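As a concrete instance of the rewrite above (classify is a made-up example):
classify :: Int -> String
classify n =
  if n < 0 then "negative" else "non-negative"

-- the same function with the if folded into guards
classify' :: Int -> String
classify' n
  | n < 0     = "negative"
  | otherwise = "non-negative"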
Related
Would it be possible to have a function that takes some value and a pattern in order to check if both match?
Let's call this hypothetical function matches and with it we could rewrite the following function...
isSingleton :: [a] -> Bool
isSingleton [_] = True
isSingleton _ = False
...like so...
isSingleton xs = xs `matches` [_]
Would this be theoretically possible? If yes, how? And if not, why?
Well, you can't really use [_] as an expression; the compiler doesn't allow it. There are various bodges that could be applied to make it kind-of-possible:
-fdefer-typed-holes makes GHC ignore the _ during compilation. That doesn't really make it a legal expression; it just means the error will be raised at runtime instead. (Generally a very bad idea; it should only be used for trying out something while another piece of your code isn't complete yet.)
You could define _' = undefined to have similar syntax that is accepted as an expression. Of course, that still means there will be a runtime error.
A runtime error could be caught, but only in the IO monad. That means basically don't do anything that requires it. But of course, technically speaking you could, and then wrap it in unsafePerformIO. Disgusting, but I won't say there aren't situations where this sort of thing is a necessary evil.
With that, you could already implement a function able to determine either that a value definitely doesn't match [_] (if == returns False without an error being raised), or that it possibly matches (if the error is raised before a constructor discrepancy is found). That would be enough to determine that [] does not match [_], but not to determine that [1,_,3] does not match [1,_,0].
Again this could be bodged around with a type class that first converts the values to a proper ⊥-less tree structure (using the unsafePerformIO catch to determine how deep to descend), but... please, let's stop the thought process here. We're clearly working against the language instead of with it with all of that.
What could of course be done is to do it in Template Haskell, but that would fix you to compile-time pattern expressions, I doubt that's in the spirit of the question. Or you could generate a type that explicitly expresses “original type with holes in it”, so the holes could properly be distinguished without any unsafe IO shenanigans. That would become quite verbose though.
In practice, I would just not try passing around first-class patterns, but instead pass around boolean-valued functions. Granted, functions are more general than patterns, but not in a way that would pose any practical problems.
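For instance, instead of passing matches and the pattern [_] around, you would pass the predicate itself (reusing isSingleton from the question):
isSingleton :: [a] -> Bool
isSingleton [_] = True
isSingleton _   = False

-- any higher-order function accepts the predicate where a
-- first-class pattern would have gone
countSingletons :: [[a]] -> Int
countSingletons = length . filter isSingleton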
Something that's more closely analogous to patterns are Prisms in the lens ecosystem. Indeed, there is the is combinator that does what you're asking for. However, prisms can again be fiddly to construct; I'm not even sure a prism corresponding to [_] can be built from the _Cons and _Empty primitives.
tl;dr it's not worth it.
Patterns aren't first-class in Haskell; you cannot write a function that receives a pattern as an argument.
However, anywhere you would call your hypothetical matches function, you can instead make a bespoke function that tests the particular pattern you're interested in, and use that function instead. So you don't really need first class patterns, they just might save a bit of boilerplate.
The LambdaCase extension more or less allows you to write a function that just does a pattern match, with minimal syntactic overhead. There is no special-case syntax for the specific purpose of returning a Bool that says whether a pattern matches; such syntax would let you avoid explicitly mapping the pattern to True and adding a catch-all alternative that maps to False.
For example:
{-# LANGUAGE LambdaCase #-}

isSingleton :: [a] -> Bool
isSingleton = \case
  [_] -> True
  _   -> False
For something like isSingleton (which already is a single pattern match encapsulated into a function) there's not much benefit in doing this over just implementing it directly. But in a more complex function where you want to call x `matches` <pattern> (or pass (`matches` <pattern>) to another function), it might be an alternative that keeps the pattern inline.
Honestly I'd probably just define functions like isSingleton for each pattern I wanted this for though (possibly just in a where clause).
When I try to compile this with ghc it complains the number of parameters in the left hand sides of the function definition are different.
module Example where
import Data.Maybe
from_maybe :: a -> Maybe a -> a
from_maybe a Nothing = a
from_maybe _ = Data.Maybe.fromJust
I'm wondering whether this is a GHC restriction. I tried to see if I could find anything about the number of parameters in the Haskell 2010 Report, but I wasn't successful.
Is this legal Haskell or isn't it? If not, where is this parameter count restriction listed?
It's not legal. The restriction is described in the Haskell 2010 Report:
4.4.3.1 Function bindings
[...]
Note that all clauses defining a function must be contiguous, and the number of patterns in each clause must be the same.
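A minimal fix, keeping the question's argument order, is to give both clauses the same number of patterns:
from_maybe :: a -> Maybe a -> a
from_maybe a Nothing  = a
from_maybe _ (Just x) = x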
The answer from melpomene sums up the syntax rules. In this answer I want to try to describe why this has to be the case.
From the point of view of the programmer, the syntax in the question seems quite reasonable. There is a first, specific case, which is marked by a value in the second parameter. Then there is a general solution which reduces to an existing function. Either pattern on its own is valid, so why can they not co-exist?
The syntax rule that all clauses of a definition must have exactly the same number of patterns is a necessary result of Haskell being a lazy language. Being lazy means that the values of parameters are not evaluated until they are needed.
Let's look at what happens when we partially apply a function, that is, supply it with fewer arguments than the number of parameters in the declaration. When that happens, Haskell produces an anonymous function* that will complete the function, and stores the supplied arguments in the scope of that function without evaluating them.
That last part is important. If clauses could have different numbers of parameters, the implementation might need to evaluate some of the arguments just to choose which pattern to match, yet at the same time be required to leave them unevaluated.
In other words, the compiler needs to know exactly how many arguments a function takes before it evaluates any of them to choose among the patterns. So while it seems like we should not need to put an extra _ in, the compiler needs it to be there.
As an aside, the order in which parameters occur can make a big difference in how easy a function is to implement and use. See the answers to my question Ordering of parameters to make use of currying for suggestions. If the argument order of the example function is reversed, it can be implemented with only the one named point (argument).
from_maybe :: Maybe a -> a -> a
from_maybe Nothing = id
from_maybe (Just x) = const x
*: What an implementation like GHC actually does in this situation is subject to optimization and may not work exactly this way, but the end result must be the same as if it did.
I've been using Haskell for about a year now because I'm writing a paper on it. In one of the first chapters I talk about pattern matching, and refer to the function
fst :: (a,b) -> a
to demonstrate the usefulness of it. In the text I say that it "would be more convoluted" to implement fst without using pattern matching. Out of interest I started thinking about how I'd do this, but can't seem to think of any way that doesn't use pattern matching in one way or another, the best thing I can think of is
fst' tuple = let (first, second) = tuple
             in first
but that essentially still uses pattern matching! Is there a way to access the inner values of a tuple without pattern matching?
The only way to extract the arguments of a data constructor is to pattern match on it. You can do the pattern matching manually, or call a function (like fst) which does it for you - but either way, the pattern matching is being done somewhere.
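For instance, this definition of fst writes no pattern itself, but the match still happens inside uncurry, whose own definition pattern matches on the pair (a sketch, not the Prelude's actual definition of fst):
fst'' :: (a, b) -> a
fst'' = uncurry const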
I've read somewhere lately that pattern matching happens during run-time and not compile-time. (I am looking for the source, but can't find it at the moment.) Is it true? And if so, do guards in functions have the same performance?
Reading this was surprising to me because I used to think GHC was able to optimize some (probably not all) pattern match decisions during compile time. Does this happen at all?
A case for example:
f 1 = 3
f 2 = 4
vs
f' a | a == 1 = 3
     | a == 2 = 4
Do f and f' compile to the same number of instructions (e.g. in Core and/or lower)?
Is the situation any different if I pattern match on a constructor instead of a value? E.g. if GHC sees that a function from a location is always invoked with one constructor, does it optimize that call in a way that eliminates the run-time check? And if so, can you give me an example showing what the optimization produces?
In summary
What is good to know about these two approaches in terms of performance?
When is one preferable performance-wise?
Never mind patterns vs. guards, you might as well ask about if vs. case.
Pattern matching is preferable to equality checks. Equality checking is not really a natural thing to do in Haskell. Boolean blindness is one problem, but apart from that, a full equality check is often simply not feasible: e.g. infinite lists will never compare equal!
How much more efficient direct pattern matching is depends on the type. In the case of numbers, don't expect much difference, since under the hood those patterns are implemented with equality checks.
I generally prefer patterns because they're just nicer and can be more efficient. Equality checks will be either just as expensive or possibly more expensive, and they are just un-idiomatic. Only use boolean evaluation when you have to; otherwise stick to patterns (which can appear in guards too)!
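A sketch of the difference (Color is a made-up type): matching on a constructor is a direct tag check, while the guard version goes through the Eq instance at run time:
data Color = Red | Green | Blue deriving (Eq, Show)

-- constructor pattern: the compiler inspects the tag directly
describe :: Color -> String
describe Red = "warm"
describe _   = "cool"

-- equality guard: requires Eq and evaluates c == Red at run time
describe' :: Color -> String
describe' c
  | c == Red  = "warm"
  | otherwise = "cool"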
This has been a question I've been wondering about for a while. if statements are staples in most programming languages (at least the ones I've worked with), but in Haskell it seems like they are quite frowned upon. I understand that for complex situations, Haskell's pattern matching is much cleaner than a bunch of ifs, but is there any real difference?
For a simple example, take a homemade version of sum (yes, I know it could just be foldr (+) 0):
sum :: [Int] -> Int
-- separate all the cases out
sum [] = 0
sum (x:xs) = x + sum xs
-- guards
sum xs
  | null xs = 0
  | otherwise = (head xs) + sum (tail xs)
-- case
sum xs = case xs of
  [] -> 0
  _  -> (head xs) + sum (tail xs)
-- if statement
sum xs = if null xs then 0 else (head xs) + sum (tail xs)
As a second question, which one of these options is considered "best practice" and why? My professor way back when always used the first method whenever possible, and I'm wondering if that's just his personal preference or if there was something behind it.
The problem with your examples is not the if expressions, it's the use of partial functions like head and tail. If you try to call either of these with an empty list, it throws an exception.
> head []
*** Exception: Prelude.head: empty list
> tail []
*** Exception: Prelude.tail: empty list
If you make a mistake when writing code using these functions, the error will not be detected until run time. If you make a mistake with pattern matching, your program will not compile.
For example, let's say you accidentally switched the then and else parts of your function.
-- Compiles, throws error at run time.
sum xs = if null xs then (head xs) + sum (tail xs) else 0
-- Doesn't compile. Also stands out more visually.
sum [] = x + sum xs
sum (x:xs) = 0
Note that your example with guards has the same problem.
I think the Boolean Blindness article answers this question very well. The problem is that boolean values have lost all their semantic meaning as soon as you construct them. That makes them a great source for bugs and also makes the code more difficult to understand.
Your first version, the one preferred by your prof, has the following advantages compared to the others:
no mention of null
list components are named in the pattern, so no mention of head and tail.
I do think that this one is considered "best practice".
What's the big deal? Why would we especially want to avoid head and tail? Well, everybody knows that those functions are not total, so one automatically tries to make sure that all cases are covered. A pattern match on [] not only stands out more than null xs, a series of pattern matches can also be checked by the compiler for completeness. Hence, the idiomatic version with a complete pattern match is easier to grasp (for the trained Haskell reader) and easier for the compiler to prove exhaustive.
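The completeness check can be seen with GHC's -Wincomplete-patterns warning (part of -Wall); assuming we drop the empty-list clause, compiling produces a warning along these lines:
-- warning: Pattern match(es) are non-exhaustive
--          In an equation for `sum'': Patterns not matched: []
sum' :: [Int] -> Int
sum' (x:xs) = x + sum' xs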
The second version is slightly better than the last one, because one sees at once that all cases are handled. Still, in the general case the RHS of the second equation could be longer, and there could be a where clause with a couple of definitions, the last of which could be something like:
where
... many definitions here ...
  head xs = ... alternative redefinition of head ...
To be absolutely sure of understanding what the RHS does, one has to make sure that common names have not been redefined.
The 3rd version is the worst one IMHO: a) The 2nd match fails to deconstruct the list and still uses head and tail. b) The case is slightly more verbose than the equivalent notation with 2 equations.
In many programming languages, if-statements are fundamental primitives, and things like switch-blocks are just syntax sugar to make deeply-nested if-statements easier to write.
Haskell does it the other way around. Pattern matching is the fundamental primitive, and an if-expression is literally just syntax sugar for pattern matching. Similarly, constructs like null and head are simply user-defined functions, which are all ultimately implemented using pattern matching. So pattern matching is the thing at the bottom of it all. (And therefore potentially more efficient than calling user-defined functions.)
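Concretely, the Haskell Report defines if in terms of case, so the following two functions behave identically:
myIf :: Bool -> a -> a -> a
myIf c t e = if c then t else e

myIf' :: Bool -> a -> a -> a
myIf' c t e = case c of
  True  -> t
  False -> e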
In many cases - such as the ones you list above - it's simply a matter of style. The compiler can almost certainly optimise things to the point where all versions are roughly equal in performance. But generally [not always!] pattern matching makes it clearer exactly what you're trying to achieve.
(It's annoyingly easy to write an if-expression where you get the two alternatives the wrong way around. You'd think it would be a rare mistake, but it's surprisingly common. With a pattern match, there's little chance of making that specific mistake, although there's still plenty of other things to screw up.)
Each call to null, head and tail entails a pattern match. But the 1st version in your question does just one pattern match, and reuses its results through the named components of the pattern.
Just for that, it is better. But it is also more visually apparent, more readable.
Pattern matching is better than a string of if-then-else statements for (at least) the following reasons:
it is more declarative
it interacts well with sum types (see the sketch below)
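A minimal sketch of both points (Shape is a made-up type): the definition reads as a set of declarative equations, one per constructor of the sum type, and the compiler can check that every constructor is covered:
data Shape = Circle Double | Rect Double Double

area :: Shape -> Double
area (Circle r) = pi * r * r
area (Rect w h) = w * h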
Pattern matching helps to reduce the amount of "accidental complexity" in your code - that is, code that is really more about implementation details rather than the essential logic of your program.
In most other languages, when the compiler/run-time sees a string of if-then-else statements, it has no choice but to test the conditions in exactly the order the programmer specified them. But pattern matching encourages the programmer to focus more on describing what should happen rather than how things should be performed. Due to the purity and immutability of values in Haskell, the compiler can consider the collection of patterns as a whole and decide how best to implement them.
An analogy would be C's switch statement. If you dump the assembly code for various switch statements you will see that sometimes the compiler will generate a chain/tree of comparisons and in other cases it will generate a jump table. The programmer uses the same syntax in both cases - the compiler chooses the implementation based on what the comparison values are. If they form a contiguous block of values the jump table method is used, otherwise a comparison tree is used. And this separation of concerns allows the compiler to implement even more strategies in the future if other patterns among the comparison values are detected.