View patterns vs. pattern guards - haskell

I'm trying to get a sense of the relationship between view patterns and pattern guards in GHC. Pattern guards seem quite intuitive, while view patterns seem a bit confusing. It kind of looks like view patterns are better for dealing with things deep in a pattern, while pattern guards can reuse a view more intuitively, but I don't quite get it.

View patterns have significant overlap with pattern guards. The main advantage of view patterns is that they can be nested, and avoid introducing intermediate pattern variables. For a silly example:
{-# LANGUAGE ViewPatterns #-}
import Data.List (sort)
endpoints (sort -> begin : (reverse -> end : _)) = Just (begin, end)
endpoints _ = Nothing
The pattern guard equivalent requires every new view to bind a new pattern variable, alternating between evaluating expressions and binding patterns.
endpoints xs
  | begin : sorted <- sort xs
  , end : _ <- reverse sorted
  = Just (begin, end)
  | otherwise = Nothing
A view pattern can only use variables bound earlier (to its left) in the pattern, but when that is enough it does look nice:
nonzero :: (a -> Int) -> a -> Maybe a
nonzero f (f -> 0) = Nothing
nonzero _ x = Just x
-- nonzero (fromEnum . not . null) "123" == Just "123"
-- nonzero (fromEnum . not . null) "" == Nothing
The main advantage of pattern guards is that they are a simple generalisation of guards, and can include ordinary Boolean expressions. I generally prefer them over view patterns because I find the style of case and guards less repetitious than the equational style.
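To make that last point concrete, here is a minimal sketch of pattern guards mixed with an ordinary Boolean condition in one guard chain. The function name sumIfPositive and the association-list environment are hypothetical, chosen only for illustration.

```haskell
-- Hypothetical example: look up two keys in an association list and
-- succeed only if both are present and their sum is positive.
sumIfPositive :: [(String, Int)] -> String -> String -> Maybe Int
sumIfPositive env a b
  | Just x <- lookup a env   -- pattern guard: binds x when the lookup succeeds
  , Just y <- lookup b env   -- pattern guard: binds y when the lookup succeeds
  , x + y > 0                -- ordinary Boolean guard in the same chain
  = Just (x + y)
  | otherwise = Nothing
```

Writing this with view patterns alone would be awkward, because the Boolean test needs both x and y at once.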

View patterns let you project a value before pattern matching on it. A view pattern can almost be thought of as a shortcut for
foo x = case f x of
  ...
There's a bit of sugar on top for dealing with more complex views, but basically that's it. On the other hand, pattern guards are strictly more general:
They can include arbitrary Boolean conditions for matching.
They can match using more than one of the variables.
I favor view patterns when I'm doing something "lens-like". I have a big piece of data and I'm interested in one particular view of it. For example, with lens
foo (view someLens -> Bar baz quux) = ...
Pattern guards tend to work well when you want something closer to a more flexible case expression.
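The "shortcut for case" reading above can be sketched as follows. The function names isYes and isYes' are hypothetical; the first uses a view pattern, the second spells out roughly what it stands for.

```haskell
{-# LANGUAGE ViewPatterns #-}
import Data.Char (toLower)

-- View-pattern version: apply `map toLower` to the argument,
-- then match the result against the literal "yes".
isYes :: String -> Bool
isYes (map toLower -> "yes") = True
isYes _ = False

-- Roughly the same thing written with an explicit case expression.
isYes' :: String -> Bool
isYes' s = case map toLower s of
  "yes" -> True
  _ -> False
```

Both versions should agree on every input; the view pattern simply avoids naming the intermediate value.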

Related

Pattern matching in Haskell case statements

I came across the following piece of code:
lala :: [[Int]] -> Bool
lala b = case b of
  (a:_) | Just (b, _) <- uncons a -> True
  other -> False
While I understand that the function checks that the first element of the list is not empty (there are better ways of writing this, but that's not the point), I don't fully understand the pattern matching happening in the case statement. Is the left arrow in this case simply pattern matching on the uncons call? Can this style of pattern matching be nested? This almost seems like a list comprehension syntax, are there other places where this type of pattern matching can be used?
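As a hedged sketch of how that alternative reads: the part after | is a pattern guard, so the alternative is taken only when uncons a matches Just (_, _). Guards of this kind can be chained with commas, and the same check can also be written as a plain nested pattern. The names lala' and firstTwoNonEmpty are hypothetical and only illustrate the idea.

```haskell
import Data.List (uncons)

-- The function from the question, with unused bindings replaced by wildcards.
lala :: [[Int]] -> Bool
lala b = case b of
  (a:_) | Just (_, _) <- uncons a -> True
  _ -> False

-- The same check as a nested pattern: the first inner list is non-empty.
lala' :: [[Int]] -> Bool
lala' ((_:_):_) = True
lala' _ = False

-- Pattern guards can be chained with commas; later guards may use
-- variables bound by the pattern or by earlier guards.
firstTwoNonEmpty :: [[Int]] -> Bool
firstTwoNonEmpty b = case b of
  (a:rest) | Just _ <- uncons a
           , (c:_) <- rest
           , Just _ <- uncons c -> True
  _ -> False
```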

How to know the data type of a value of a sum type [duplicate]

What is pattern matching in Haskell and how is it related to guarded equations?
I've tried looking for a simple explanation, but I haven't found one.
EDIT:
Someone tagged this as homework. I don't go to school anymore; I'm just learning Haskell and trying to understand this concept, purely out of interest.
In a nutshell, patterns are like defining piecewise functions in math. You can specify different function bodies for different arguments using patterns. When you call a function, the appropriate body is chosen by comparing the actual arguments with the various argument patterns. Read A Gentle Introduction to Haskell for more information.
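Compare the piecewise definition:
fib(n) = 1                      if n = 0
fib(n) = 1                      if n = 1
fib(n) = fib(n-1) + fib(n-2)    if n ≥ 2
with the equivalent Haskell: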
fib 0 = 1
fib 1 = 1
fib n | n >= 2
      = fib (n-1) + fib (n-2)
Note the "n ≥ 2" in the piecewise function becomes a guard in the Haskell version, but the other two conditions are simply patterns. Patterns are conditions that test values and structure, such as x:xs, (x, y, z), or Just x. In a piecewise definition, conditions based on = or ∈ relations (basically, the conditions that say something "is" something else) become patterns. Guards allow for more general conditions. We could rewrite fib to use guards:
fib n | n == 0 = 1
      | n == 1 = 1
      | n >= 2 = fib (n-1) + fib (n-2)
There are other good answers, so I'm going to give you a very technical answer. Pattern matching is the elimination construct for algebraic data types:
"Elimination construct" means "how to consume or use a value"
"Algebraic data type", in addition to first-class functions, is the big idea in a statically typed functional language like Clean, F#, Haskell, or ML
The idea of algebraic data types is that you define a type of thing, and you say all the ways you can make that thing. As an example, let's define "Sequence of String" as an algebraic data type, with three ways to make it:
data StringSeq = Empty                    -- the empty sequence
               | Cat StringSeq StringSeq  -- two sequences in succession
               | Single String            -- a sequence holding a single element
Now, there are all sorts of things wrong with this definition, but as an example it's interesting because it provides constant-time concatenation of sequences of arbitrary length. (There are other ways to achieve this.) The declaration introduces Empty, Cat, and Single, which are all the ways there are of making sequences. (That makes each one an introduction construct—a way to make things.)
You can make an empty sequence without any other values.
To make a sequence with Cat, you need two other sequences.
To make a sequence with Single, you need an element (in this case a string)
Here comes the punch line: the elimination construct, pattern matching, gives you a way to scrutinize a sequence and ask it the question "what constructor were you made with?". Because you have to be prepared for any answer, you provide at least one alternative for each constructor. Here's a length function:
slen :: StringSeq -> Int
slen s = case s of Empty -> 0
                   Cat s s' -> slen s + slen s'
                   Single _ -> 1
At the core of the language, all pattern matching is built on this case construct. However, because algebraic data types and pattern matching are so important to the idioms of the language, there's special "syntactic sugar" for doing pattern matching in the declaration form of a function definition:
slen Empty = 0
slen (Cat s s') = slen s + slen s'
slen (Single _) = 1
With this syntactic sugar, computation by pattern matching looks a lot like definition by equations. (The Haskell committee did this on purpose.) And as you can see in the other answers, it is possible to specialize either an equation or an alternative in a case expression by slapping a guard on it. I can't think of a plausible guard for the sequence example, and there are plenty of examples in the other answers, so I'll leave it there.
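Putting the pieces of this answer together, here is a small self-contained sketch. The helper toList and the value example are hypothetical additions for illustration; they are not part of the original answer.

```haskell
data StringSeq
  = Empty                    -- the empty sequence
  | Cat StringSeq StringSeq  -- two sequences in succession
  | Single String            -- a sequence holding a single element

-- Length, in the equational (declaration) style.
slen :: StringSeq -> Int
slen Empty = 0
slen (Cat l r) = slen l + slen r
slen (Single _) = 1

-- Hypothetical helper: flatten a sequence into an ordinary list,
-- again with one equation per constructor.
toList :: StringSeq -> [String]
toList Empty = []
toList (Cat l r) = toList l ++ toList r
toList (Single s) = [s]

-- A sample value built with the three introduction constructs.
example :: StringSeq
example = Cat (Single "a") (Cat Empty (Single "b"))
```

With these definitions, slen example is 2 and toList example is ["a", "b"].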
Pattern matching is, at least in Haskell, deeply tied to the concept of algebraic data types. When you declare a data type like this:
data SomeData = Foo Int Int
              | Bar String
              | Baz
...it defines Foo, Bar, and Baz as constructors--not to be confused with "constructors" in OOP--that construct a SomeData value out of other values.
Pattern matching is nothing more than doing this in reverse--a pattern would "deconstruct" a SomeData value into its constituent pieces (in fact, I believe that pattern matching is the only way to extract values in Haskell).
When there are multiple constructors for a type, you write multiple versions of a function for each pattern, with the correct one being selected depending on which constructor was used (assuming you've written patterns to match all possible constructions--which it's generally good practice to do).
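As a minimal sketch of "one version of the function per constructor", assuming the SomeData type above; the function name describe is hypothetical.

```haskell
data SomeData
  = Foo Int Int
  | Bar String
  | Baz

-- One equation per constructor; each pattern deconstructs the value
-- and binds the pieces it was built from.
describe :: SomeData -> String
describe (Foo x y) = "Foo with sum " ++ show (x + y)
describe (Bar s) = "Bar holding " ++ s
describe Baz = "Baz"
```

Because all three constructors are covered, the match is exhaustive and GHC's incomplete-pattern warning stays quiet.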
In a functional language, pattern matching involves checking an argument against different forms. A simple example involves recursively defined operations on lists. I will use OCaml to explain pattern matching since it's my functional language of choice, but the concepts are the same in F# and Haskell, AFAIK.
Here is the definition of a function to compute the length of a list lst. In OCaml, an 'a list is defined recursively as the empty list [], or the structure h::t, where h is an element of type 'a ('a being any type we want, such as an integer or even another list), t is a list (hence the recursive definition), and :: is the cons operator, which creates a new list out of an element and a list.
So the function would look like this:
let rec len lst =
  match lst with
    [] -> 0
  | h :: t -> 1 + len t
rec is a modifier that tells OCaml that a function will call itself recursively. Don't worry about that part. The match statement is what we're focusing on. OCaml will check lst against the two patterns - empty list, or h :: t - and return a different value based on that. Since we know every list will match one of these patterns, we can rest assured that our function will return safely.
Note that even though these two patterns will take care of all lists, you aren't limited to them. A pattern like h1 :: h2 :: t (matching all lists of length 2 or more) is also valid.
Of course, the use of patterns isn't restricted to recursively defined data structures, or recursive functions. Here is a (contrived) function to tell you whether a number is 1 or 2:
let is_one_or_two num =
  match num with
    1 -> true
  | 2 -> true
  | _ -> false
In this case, the forms of our pattern are the numbers themselves. _ is a special catch-all used as a default case, in case none of the above patterns match.
Pattern matching is one of those painful operations that is hard to get one's head around if you come from a procedural programming background. I find it hard to get into because the same syntax used to create a data structure can be used for matching.
In F# you can use the cons operator :: to add an element to the beginning of a list like so:
let a = 1 :: [2;3]
//val a : int list = [1; 2; 3]
Similarly you can use the same operator to split the list up like so:
let a = [1;2;3];;
match a with
| [a;b] -> printfn "List contains 2 elements" //will match a list with 2 elements
| a::tail -> printfn "Head is %d" a //will match any other non-empty list
| [] -> printfn "List is empty" //will match an empty list

A real life example when pattern matching is more preferable than a case expression in Haskell?

So I have been busy with the Real World Haskell book and I did the lastButOne exercise. I came up with 2 solutions, one with pattern matching
lastButOne :: [a] -> a
lastButOne ([]) = error "Empty List"
lastButOne (x:[]) = error "Only one element"
lastButOne (x:[x2]) = x
lastButOne (x:xs) = lastButOne xs
And one using a case expression
lastButOneCase :: [a] -> a
lastButOneCase x =
  case x of
    [] -> error "Empty List"
    (x:[]) -> error "Only One Element"
    (x:[x2]) -> x
    (x:xs) -> lastButOneCase xs
What I wanted to find out is when would pattern matching be preferred over case expressions and vice versa. This example was not good enough for me because it seems that while both of the functions work as intended, it did not lead me to choose one implementation over the other. So the choice "seems" preferential at first glance?
So are there good examples in source code, either in Haskell's own source, on GitHub, or somewhere else, where one can see when either method is preferred?
First a short terminology diversion: I would call both of these "pattern matching". I'm not sure there is a good term for distinguishing pattern-matching-via-case and pattern-matching-via-multiple-definition.
The technical distinction between the two is quite light indeed. You can verify this yourself by asking GHC to dump the core it generates for the two functions, using the -ddump-simpl flag. I tried this at a few different optimization levels, and in all cases the only differences in the Core were naming. (By the way, if anyone knows a good "semantic diff" program for Core -- which knows about at the very least alpha equivalence -- I'm very interested in hearing about it!)
There are a few small gotchas to watch out for, though. You might wonder whether the following is also equivalent:
{-# LANGUAGE LambdaCase #-}
lastButOne = \case
  [] -> error "Empty List"
  (x:[]) -> error "Only One Element"
  (x:[x2]) -> x
  (x:xs) -> lastButOne xs
In this case, the answer is yes. But consider this similar-looking one:
-- ambiguous type error
sort = \case
  [] -> []
  x:xs -> insert x (sort xs)
All of a sudden this is a typeclass-polymorphic CAF, and so on old GHCs this will trigger the monomorphism restriction and cause an error, whereas the superficially identical version with an explicit argument does not:
-- this is fine!
sort [] = []
sort (x:xs) = insert x (sort xs)
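One way to keep the \case form, sketched here under the assumption that writing a signature is acceptable: the monomorphism restriction applies only to bindings without a type signature, so annotating the binding avoids the error. The name insertionSort is hypothetical, chosen to avoid clashing with Data.List.sort.

```haskell
{-# LANGUAGE LambdaCase #-}
import Data.List (insert)

-- The explicit signature keeps the binding polymorphic, so the
-- monomorphism restriction does not apply to this argument-less definition.
insertionSort :: Ord a => [a] -> [a]
insertionSort = \case
  [] -> []
  x:xs -> insert x (insertionSort xs)
```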
The other minor difference (which I forgot about -- thank you to Thomas DuBuisson for reminding me) is in the handling of where clauses. Since where clauses are attached to binding sites, they cannot be shared across multiple equations but can be shared across multiple cases. For example:
-- error; the where clause attaches to the second equation, so
-- empty is not in scope in the first equation
null [] = empty
null (x:xs) = nonempty
  where empty = True
        nonempty = False
-- ok; the where clause attaches to the equation, so both empty
-- and nonempty are in scope for the entire case expression
null x = case x of
    [] -> empty
    x:xs -> nonempty
  where
    empty = True
    nonempty = False
You might think this means you can do something with equations that you can't do with case expressions, namely, have different meanings for the same name in the two equations, like this:
null [] = answer where answer = True
null (x:xs) = answer where answer = False
However, since the patterns of case expressions are binding sites, this can be emulated in case expressions as well:
null x = case x of
  [] -> answer where answer = True
  x:xs -> answer where answer = False
Whether the where clause is attached to the case's pattern or to the equation depends on indentation, of course.
If I recall correctly, both of these will "desugar" into the same Core code in GHC, so the choice is purely stylistic. Personally I would go for the first one. As someone said, it's shorter, and what you term "pattern matching" is intended to be used this way. (Actually the second version is also pattern matching, just using a different syntax for it.)
It's a stylistic preference. Some people sometimes argue that one choice or another makes certain code changes take less effort, but I generally find such arguments, even when accurate, don't actually amount to a big improvement. So do as you like.
A perspective that's well worth bringing into this is Hudak, Hughes, Peyton Jones and Wadler's paper "A History of Haskell: Being Lazy With Class". Section 4.4 is about this topic. The short story: Haskell supports both because the designers couldn't agree on one over the other. Yep, again, it's a stylistic preference.
When you're matching on more than one expression, case expressions start to look more attractive.
f pat11 pat21 = ...
f pat11 pat22 = ...
f pat11 pat23 = ...
f pat12 pat24 = ...
f pat12 pat25 = ...
can be more annoying to write than
f pat11 y =
  case y of
    pat21 -> ...
    pat22 -> ...
    pat23 -> ...
f pat12 y =
  case y of
    pat24 -> ...
    pat25 -> ...
More significantly, I've found that when using GADTs, the "declaration style" doesn't seem to propagate evidence from left to right the way I'd expect it to. There might be some trick I haven't worked out, but I end up having to nest case expressions to avoid spurious incomplete pattern warnings.

Why should I use case expressions if I can use "equations"?

I'm learning Haskell, from the book "Real World Haskell". In pages 66 and 67, they show the case expressions with this example:
fromMaybe defval wrapped =
  case wrapped of
    Nothing -> defval
    Just value -> value
I remember a similar thing in F#, but (as shown earlier in the book) Haskell can define functions as a series of equations, while AFAIK F# cannot. So I tried to define it this way:
fromMaybe2 defval Nothing = defval
fromMaybe2 defval (Just value) = value
I loaded it in GHCi and after a couple of results, I convinced myself it was the same. However, this makes me wonder: why should there be case expressions when equations
are more comprehensible (it's Mathematics; why use case something of, who says that?);
are less verbose (2 vs 4 lines);
require much less structuring and syntactic sugar (-> could be an operator, look what they've done!);
only use variables when needed (in basic cases such as this, wrapped just takes up space).
What's good about case expressions? Do they exist only because similar FP-based languages (such as F#) have them? Am I missing something?
Edit:
I see from @freyrs's answer that the compiler makes these exactly the same. So, equations can always be turned into case expressions (as expected). My next question is the converse; can one go the opposite route of the compiler and use equations with let/where expressions to express any case expression?
This comes from a culture of having small "kernel" expression-oriented languages. Haskell grows from Lisp's roots (i.e. lambda calculus and combinatory logic); it's basically Lisp plus syntax plus explicit data type definitions plus pattern matching minus mutation plus lazy evaluation (lazy evaluation was itself first described in Lisp AFAIK; i.e. in the 70-s).
Lisp-like languages are expression-oriented, i.e. everything is an expression, and a language's semantics is given as a set of reduction rules, turning more complex expressions into simpler ones, and ultimately into "values".
Equations are not expressions. Several equations could be somehow mashed into one expression; you'd have to introduce some syntax for that; case is that syntax.
Haskell's rich syntax gets translated into a smaller "core" language, which has case as one of its basic building blocks. case has to be a basic construct, because pattern matching is such a basic, core feature of the language.
To your new question, yes you can, by introducing auxiliary functions as Luis Casillas shows in his answer, or with the use of pattern guards, so his example becomes:
foo x y | (Meh o p) <- z = baz y p o
        | (Gah t q) <- z = quux x t q
  where
    z = bar x
The two functions compile into exactly the same internal code in GHC (called Core), which you can dump by passing the flags -ddump-simpl -dsuppress-all to ghc.
It may look a bit intimidating with the variable names, but it's effectively just an explicitly typed version of the code you wrote above. The only difference is the variable names.
fromMaybe2
fromMaybe2 =
  \ @ t_aBC defval_aB6 ds_dCK ->
    case ds_dCK of _ {
      Nothing -> (defval_aB6) defval_aB6;
      Just value_aB8 -> (value_aB8) value_aB8
    }
fromMaybe
fromMaybe =
  \ @ t_aBJ defval_aB3 wrapped_aB4 ->
    case wrapped_aB4 of _ {
      Nothing -> (defval_aB3) defval_aB3;
      Just value_aB5 -> (value_aB5) value_aB5
    }
The paper "A History of Haskell: Being Lazy with Class" (PDF) provides some useful perspective on this question. Section 4.4 ("Declaration style vs. expression style," p.13) is about this topic. The money quote:
[W]e engaged in furious debate about which style was “better.” An underlying assumption was that if possible there should be “just one way to do something,” so that, for example, having both let and where would be redundant and confusing. [...] In the end, we abandoned the underlying assumption, and provided full syntactic support for both styles.
Basically they couldn't agree on one so they threw both in. (Note that quote is explicitly about let and where, but they treat both that choice and the case vs. equations choice as two manifestations of the same basic choice—what they call "declaration style" vs. "expression style.")
In modern practice, the declaration style (your "series of equations") has become the more common one. case is often seen in this situation, where you need to match on a value that is computed from one of the arguments:
foo x y = case bar x of
  Meh o p -> baz y p o
  Gah t q -> quux x t q
You can always rewrite this to use an auxiliary function:
foo x y = go (bar x)
  where go (Meh o p) = baz y p o
        go (Gah t q) = quux x t q
This has the very minor disadvantage that you need to name your auxiliary function—but go is normally a perfectly fine name in this situation.
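To compare the three forms discussed in these answers side by side, here is a runnable sketch. The definitions of Result, bar, baz and quux are hypothetical stand-ins (the answers leave them unspecified); only the shapes of fooCase, fooGo and fooGuard come from the text.

```haskell
-- Hypothetical stand-ins for the names used in the answers.
data Result = Meh Int Int | Gah Int Int

bar :: Int -> Result
bar x
  | even x = Meh x (x + 1)
  | otherwise = Gah x (x - 1)

baz, quux :: Int -> Int -> Int -> Int
baz y p o = y + p + o
quux x t q = x * t * q

-- Expression style: case on a value computed from an argument.
fooCase :: Int -> Int -> Int
fooCase x y = case bar x of
  Meh o p -> baz y p o
  Gah t q -> quux x t q

-- Declaration style: an auxiliary function with one equation per constructor.
fooGo :: Int -> Int -> Int
fooGo x y = go (bar x)
  where
    go (Meh o p) = baz y p o
    go (Gah t q) = quux x t q

-- Pattern guards: the computed value is named once in a where clause.
fooGuard :: Int -> Int -> Int
fooGuard x y
  | Meh o p <- z = baz y p o
  | Gah t q <- z = quux x t q
  where
    z = bar x
```

All three should return the same result for any pair of arguments; which one reads best is the stylistic question the answers are debating.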
A case expression can be used anywhere an expression is expected, while equations can't. Example:
1 + (case even 9 of True -> 2; _ -> 3)
You can even nest case expressions, and I've seen code that does that. However I tend to stay away from case expressions, and try to solve the problem with equations, even if I have to introduce a local function using where/let.
Every definition using equations is equivalent to one using case. For instance
negate True = False
negate False = True
stands for
negate x = case x of
  True -> False
  False -> True
That is to say, these are two ways of expressing the same thing, and the former is translated to the latter by GHC.
From the Haskell code that I've read, it seems canonical to use the first style wherever possible.
See section 4.4.3.1 of the Haskell '98 report.
The answer to your added question is yes, but it's pretty ugly.
case exp of
  pat1 -> e1
  pat2 -> e2
  etc.
can, I believe, be emulated by
let
  f pat1 = e1
  f pat2 = e2
  etc.
in f exp
as long as f is not free in exp, e1, e2, etc. But you shouldn't do that because it's horrible.
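A concrete instance of that translation, as a minimal sketch; the function names describeCase and describeLet are hypothetical.

```haskell
-- A case expression on the argument.
describeCase :: Maybe Int -> String
describeCase m = case m of
  Nothing -> "none"
  Just n -> "got " ++ show n

-- The same thing emulated with a local function defined by equations.
-- The local name f must not occur free in m or in the right-hand sides.
describeLet :: Maybe Int -> String
describeLet m =
  let f Nothing = "none"
      f (Just n) = "got " ++ show n
  in f m
```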

Is it recommended to always have exhaustive pattern matches in Haskell, even for "impossible" cases?

For example, in the following code, I am pattern matching on the "accumulator" of a foldr. I am in complete control of the contents of the accumulator, because I create it (it is not passed to me as input, but rather built within my function). Therefore, I know certain patterns should never match it. If I strive to never get the "Pattern match(es) are non-exhaustive" error, then I would add an alternative for it that simply calls error with the message "This pattern should never happen", much like an assert in C#. I can't think of anything else to do there.
What practice would you recommend in this situation and why?
Here's the code:
gb_groupBy p input = foldr step [] input
  where
    step item acc = case acc of
      [] -> [[item]]
      ((x:xs):ys) -> if p x item
                       then (item:x:xs):ys
                       else [item]:acc
The pattern not matched (as reported by the interpreter) is:
Warning: Pattern match(es) are non-exhaustive
In a case alternative: Patterns not matched: [] : _
This is probably more a matter of style than anything else. Personally, I would put in a
_ -> error "Impossible! Empty list in step"
if only to silence the warning :)
You can resolve the warning in this special case by doing this:
gb_groupBy p input = foldr step [] input
  where
    step item acc = case acc of
      [] -> [[item]]
      (xs:xss) -> if p (head xs) item
                    then (item:xs):xss
                    else [item]:acc
The pattern matching is then complete, and the "impossible" condition of an empty list at the head of the accumulator would cause a runtime error but no warning.
Another way of looking at the more general problem of incomplete pattern matchings is to see them as a "code smell", i.e. an indication that we're trying to solve a problem in a suboptimal, or non-Haskellish, way, and try to rewrite our functions.
Implementing groupBy with a foldr makes it impossible to apply it to an infinite list, which is a design goal that the Haskell List functions try to achieve wherever semantically reasonable. Consider
take 5 $ groupBy (==) someFunctionDerivingAnInfiniteList
If the first 5 groups w.r.t. equality are finite, lazy evaluation will terminate. This is something you can't do in a strictly evaluated language. Even if you don't work with infinite lists, writing functions like this will yield better performance on long lists, or avoid the stack overflow that occurs when evaluating expressions like
take 5 $ gb_groupBy (==) [1..1000000]
In List.hs, groupBy is implemented like this:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _ [] = []
groupBy eq (x:xs) = (x:ys) : groupBy eq zs
  where (ys,zs) = span (eq x) xs
This enables the interpreter/compiler to evaluate only the parts of the computation necessary for the result.
span yields a pair of lists, where the first consists of (consecutive) elements from the head of the list all satisfying a predicate, and the second is the rest of the list. It's also implemented to work on infinite lists.
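The behaviour described above can be checked with a small sketch using Data.List.groupBy from base. The names spanExample and firstGroups are hypothetical.

```haskell
import Data.List (groupBy)

-- span splits a list at the first element that fails the predicate.
spanExample :: ([Int], [Int])
spanExample = span even [2, 4, 5, 6]   -- ([2,4],[5,6])

-- groupBy is lazy enough to yield the first groups of an infinite list.
firstGroups :: [[Int]]
firstGroups = take 2 (groupBy (==) (cycle [1, 1, 2]))   -- [[1,1],[2]]
```

A foldr-based version such as gb_groupBy would not terminate on the infinite input, since it has to reach the end of the list before producing the first group.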
I find exhaustiveness checking on case patterns indispensable. I try never to use _ in a case at top level, because _ matches everything, and by using it you vitiate the value of exhaustiveness checking. This is less important with lists but critically important with user-defined algebraic data types, because I want to be able to add a new constructor and have the compiler barf on all the missing cases. For this reason I always compile with -Werror turned on, so there is no way I can leave out a case.
As observed, your code can be extended with this case
[] : _ -> error "this can't happen"
Internally, GHC has a panic function, which unlike error will give source coordinates, but I looked at the implementation and couldn't make head or tail of it.
To follow up on my earlier comment, I realised that there is a way to acknowledge the missing case but still get a useful error with file/line number. It's not ideal as it'll only appear in unoptimized builds, though (see here).
...
[]:xs -> assert False (error "unreachable because I know everything")
The type system is your friend, and the warning is letting you know your function has cracks. The very best approach is to go for a cleaner, more elegant fit between types.
Consider ghc's definition of groupBy:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _ [] = []
groupBy eq (x:xs) = (x:ys) : groupBy eq zs
  where (ys,zs) = span (eq x) xs
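Another way to get a closer fit between types, sketched here as an assumption rather than as anyone's recommended implementation: keep the foldr from the question but accumulate groups as NonEmpty lists (Data.List.NonEmpty from base), so the "impossible" empty group cannot be represented and the match becomes exhaustive. The names groupByNE and groupByPlain are hypothetical.

```haskell
import Data.List.NonEmpty (NonEmpty(..))
import qualified Data.List.NonEmpty as NE

-- Each group is a NonEmpty list, so there is no "[] : _" case to handle.
groupByNE :: (a -> a -> Bool) -> [a] -> [NonEmpty a]
groupByNE p = foldr step []
  where
    step item [] = [item :| []]
    step item acc@((x :| xs) : ys)
      | p x item = (item :| (x : xs)) : ys   -- extend the current group
      | otherwise = (item :| []) : acc       -- start a new group

-- Convert back to plain lists when the caller needs them.
groupByPlain :: (a -> a -> Bool) -> [a] -> [[a]]
groupByPlain p = map NE.toList . groupByNE p
```

This removes the warning without an error or undefined branch, though it is still a foldr over the whole input and so does not gain the laziness of the library groupBy.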
My point of view is that an impossible case is undefined.
If it's undefined we have a function for it: the cunningly named undefined.
Complete your matching with the likes of:
_ -> undefined
And there you have it!
