Pattern matching in Haskell case statements

I came across the following piece of code:
lala :: [[Int]] -> Bool
lala b = case b of
  (a:_) | Just (b, _) <- uncons a -> True
  other -> False
While I understand that the function checks that the first element of the list is not empty (there are better ways of writing this, but that's not the point), I don't fully understand the pattern matching happening in the case statement. Is the left arrow in this case simply pattern matching on the result of the uncons call? Can this style of pattern matching be nested? It almost looks like list comprehension syntax; are there other places where this type of pattern matching can be used?
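For reference, a minimal sketch (not part of the original question or its answers; firstTwo is a made-up name): the Just (b, _) <- uncons a part is a pattern guard, and the same pat <- expr syntax also works in the guards of ordinary function equations, where several guards can be chained with commas.
import Data.List (uncons)

-- The guard succeeds only if the pattern to the left of <- matches the value
-- of the expression on the right; later guards may use earlier bindings.
firstTwo :: [a] -> Maybe (a, a)
firstTwo xs
  | Just (x, rest) <- uncons xs
  , Just (y, _)    <- uncons rest = Just (x, y)
  | otherwise                     = Nothing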

Related

How to Pattern Match With Algebraic Types in Haskell

The goal of the assignment I am working on is to create a bunch of different functions that involve searching a data type called a Trie, in which the constructor is defined as
data Trie = MakeTrie Char [Trie] deriving Eq
I am trying to first build simple functions so I can figure out how to descend this Trie, but it seems like pattern matching is not working.
test :: Trie -> Bool
test t
  | t == MakeTrie '.' [_] = True
  | otherwise = False
I get an error stating that a hole was found and that relevant bindings include t :: Trie. How can I let the interpreter know that [_] represents a list of Tries? The reason I am doing this is that I have no idea how else to go about descending my Trie later if I don't use pattern matching.
You should check out the function syntax chapter in Learn You a Haskell (particularly the first section, on pattern matching).
This is how you do pattern matching in Haskell for this example:
test :: Trie -> Bool
test (MakeTrie '.' _) = True
test _ = False
Testing:
Prelude> test (MakeTrie '.' [])
True
Prelude> test (MakeTrie 'a' [])
False
There are two problems here:
if you write [_], this is a pattern that says "a list of exactly one element, regardless of what that element is"; and
you cannot do pattern matching with (==).
Indeed, (==) is just a function that compares two objects. Two objects being equal does not imply that they were built with the same constructor, etc.; (==) can implement an arbitrary equivalence relation.
We can write the function as:
test :: Trie -> Bool
test (MakeTrie '.' _) = True
test _ = False
So here the first clause checks whether the input matches the pattern MakeTrie '.' _, i.e. whether it was built with the MakeTrie data constructor with '.' as its first parameter; the second parameter can be anything.
The second clause matches everything, and returns False in that case.
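To go further and actually descend the Trie, a hedged sketch (not part of the original answers; children and hasChild are made-up helper names):
-- Pattern matching on MakeTrie gives direct access to the sub-tries.
children :: Trie -> [Trie]
children (MakeTrie _ ts) = ts

-- Is there an immediate child labelled with the given Char?
hasChild :: Char -> Trie -> Bool
hasChild c (MakeTrie _ ts) = any (\(MakeTrie c' _) -> c' == c) ts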

One interesting pattern

I'm solving 99 Haskell Problems. I've successfully solved problem No. 21, and when I opened the solution page, the following solution was proposed:
Insert an element at a given position into a list.
insertAt :: a -> [a] -> Int -> [a]
insertAt x xs (n+1) = let (ys,zs) = split xs n in ys++x:zs
I found the pattern (n + 1) interesting, because it seems to be an elegant way to convert the 1-based argument of insertAt into the 0-based argument of split (a function from previous exercises, essentially the same as splitAt). The problem is that GHC did not find this pattern that elegant; in fact, it says:
Parse error in pattern: n + 1
I don't think that the guy who wrote the answer was dumb, and I would like to know whether this kind of pattern is legal in Haskell and, if it is, how to fix the solution.
This is an n+k pattern. It was part of Haskell 98, and was likely still supported when the author of the 99 Haskell Problems solution wrote it, but it has since been removed from the language (as of Haskell 2010), so it is no longer valid Haskell.
The problem with n+k patterns goes back to a design decision in Haskell, to distinguish between constructors and variables in patterns by the first character of their names. If you go back to ML, a common function definition might look like (using Haskell syntax)
map f nil = nil
map f (x:xn) = f x : map f xn
As you can see, syntactically there's no difference between f and nil on the LHS of the first line, but they have different roles; f is a variable that needs to be bound to the first argument of map, while nil is a constructor that needs to be matched against the second. Now, ML makes this distinction by looking each name up in the surrounding scope, treating it as a constructor when the lookup finds one and as a variable otherwise. So nil is recognized as a constructor because the lookup succeeds. But consider what happens when there's a typo in the pattern:
map f niil = nil
(two i's in niil). niil isn't a constructor name in scope, so it gets treated as a variable, and the definition is silently interpreted incorrectly.
Haskell's solution to this problem is to require constructor names to begin with uppercase letters and variable names to begin with lowercase letters. For infix operators and constructors, constructor names must begin with :, while operator (function) names may not begin with :. This also helps distinguish deconstructing bindings from function definitions:
x:xn = ...
is clearly a deconstructing binding, because you can't define a function named :, while
n - m = ...
is clearly a function definition, because - can't be a constructor name.
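A small illustration of that naming rule, as a hedged sketch (not part of the original answer; the Pair type and the :*: and .+. operators are made up):
-- Infix data constructors must start with ':'; infix functions must not.
data Pair = Double :*: Double

swapPair :: Pair -> Pair
swapPair (x :*: y) = y :*: x   -- (:*:) is matched as a constructor in a pattern

x .+. y = x + y                -- (.+.) can only be a function definition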
But allowing n+k patterns, like n+1, means that + is both a valid function name, and something that works like a constructor in patterns. Now
n + 1 = ...
is ambiguous again; it could be part of the definition of a function named (+), or it could be a deconstructing pattern match definition of n. In Haskell 98, this ambiguity was solved by declaring
n + 1 = ...
a function definition, and
(n + 1) = ...
a deconstructing binding. But that obviously was never a satisfactory solution.
Note that you can now use view patterns instead of n+1.
For example:
{-# LANGUAGE ViewPatterns #-}
module Temp where
import Data.List (splitAt)
split :: [a] -> Int -> ([a], [a])
split = flip splitAt
insertAt :: a -> [a] -> Int -> [a]
insertAt x xs (subtract 1 -> n) = let (ys,zs) = split xs n in ys++x:zs
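A quick check of that version (a hedged example, not from the original answer; the result matches the expected behaviour of Problem 21):
*Temp> insertAt 'X' "abcd" 2
"aXbcd"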

View patterns vs. pattern guards

I'm trying to get a sense of the relationship between view patterns and pattern guards in GHC. Pattern guards seem quite intuitive, while view patterns seem a bit confusing. It kind of looks like view patterns are better for dealing with things deep in a pattern, while pattern guards can reuse a view more intuitively, but I don't quite get it.
View patterns have significant overlap with pattern guards. The main advantage of view patterns is that they can be nested, and avoid introducing intermediate pattern variables. For a silly example:
endpoints (sort -> begin : (reverse -> end : _)) = Just (begin, end)
endpoints _ = Nothing
The pattern guard equivalent requires every new view to bind a new pattern variable, alternating between evaluating expressions and binding patterns.
endpoints xs
  | begin : sorted <- sort xs
  , end : _ <- reverse sorted
  = Just (begin, end)
  | otherwise = Nothing
View patterns can also refer to variables bound earlier in the pattern (and only those), which does look nice:
nonzero :: (a -> Int) -> a -> Maybe a
nonzero f (f -> 0) = Nothing
nonzero _ x = Just x
-- nonzero (fromEnum . not . null) "123" == Just "123"
-- "" == Nothing
The main advantage of pattern guards is that they are a simple generalisation of guards, and can include ordinary Boolean expressions. I generally prefer them over view patterns because I find the style of case and guards less repetitious than the equational style.
View patterns let you project a value before pattern matching on it. They can almost be thought of as a shortcut for
foo x = case f x of
  ...
There's a bit of sugar on top for dealing with more complex views, but basically that's it. On the other hand, pattern guards are strictly more general:
They can include arbitrary Boolean conditions for matching.
They can match using more than one of the bound variables.
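For example, a hedged sketch (not from the original answers; lookup2 is a made-up name) in which a single guard chain binds two intermediate results, uses both of them, and adds an ordinary Boolean condition:
-- Look up two keys in an association list; succeed only if both are present
-- and their sum is positive.
lookup2 :: (Eq k, Num v, Ord v) => k -> k -> [(k, v)] -> Maybe v
lookup2 k1 k2 kvs
  | Just a <- lookup k1 kvs
  , Just b <- lookup k2 kvs
  , a + b > 0                 -- an ordinary Boolean condition in the same chain
  = Just (a + b)
  | otherwise = Nothing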
I favor view patterns when I'm doing something "lens-like". I have a big piece of data and I'm interested in one particular view of it. For example, with lens
foo (view someLens -> Bar baz quux) = ...
Pattern guards tend to work well when you want something closer to a more flexible case expression.

In Haskell, why non-exhaustive patterns are not compile-time errors?

This is a follow-up of Why am I getting "Non-exhaustive patterns in function..." when I invoke my Haskell substring function?
I understand that with -Wall, GHC can warn about non-exhaustive patterns. I'm wondering what the reason is for not making this a compile-time error by default, given that it's always possible to explicitly define a partial function:
f :: [a] -> [b] -> Int
f [] _ = error "undefined for empty array"
f _ [] = error "undefined for empty array"
f (_:xs) (_:ys) = length xs + length ys
The question is not GHC-specific.
Is it because...
nobody wanted to enforce a Haskell compiler to perform this kind of analysis?
a non-exhaustive pattern search can find some but not all cases?
partially defined functions are considered legitimate and used often enough not to impose the kind of construct shown above? If this is the case, can you explain to me why non-exhaustive patterns are helpful/legitimate?
There are cases where you don't mind that a pattern match is non-exhaustive. For example, while this might not be the optimal implementation, I don't think it would help if it didn't compile:
fac 0 = 1
fac n | n > 0 = n * fac (n-1)
That this is non-exhaustive (negative numbers don't match any case) doesn't really matter for the typical usage of the factorial function.
Also it might not generally be possible to decide for the compiler if a pattern match is exhaustive:
mod2 :: Integer -> Integer
mod2 n | even n = 0
mod2 n | odd n = 1
Here all cases should be covered, but the compiler probably can't detect it: since guards can be arbitrarily complex, the compiler cannot always decide whether a set of patterns is exhaustive. Of course this example would be better written with otherwise, but I think it should also compile in its current form.
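For comparison, a hedged sketch of the otherwise form the answer alludes to, which the exhaustiveness checker accepts:
mod2 :: Integer -> Integer
mod2 n
  | even n    = 0
  | otherwise = 1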
You can use -Werror to turn warnings into errors. I don't know if you can turn just the non-exhaustive patterns warnings into errors, sorry!
As for the third part of your question:
I sometimes write a number of functions which tend to work closely together and have properties which you can't easily express in Haskell. At least some of these functions tend to have non-exhaustive patterns, usually the 'consumers'. This comes up, for example in functions which are 'sort-of' inverses of each other.
A toy example:
duplicate :: [a] -> [a]
duplicate [] = []
duplicate (x:xs) = x : x : (duplicate xs)
removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates [] = []
removeDuplicates (x:y:xs) | x == y = x : removeDuplicates xs
Now it's pretty easy to see that removeDuplicates (duplicate as) is equal to as (whenever the element type is in Eq), but in general duplicate (removeDuplicates bs) will crash, because bs may have an odd number of elements or two consecutive elements that differ. If it doesn't crash, it's because bs was produced by (or could have been produced by) duplicate in the first place!
So we have the following laws (not valid Haskell):
removeDuplicates . duplicate == id
duplicate . removeDuplicates == id (for values in the range of duplicate)
Now, if you want to prevent non-exhaustive patterns here, you could make removeDuplicates return Maybe [a], or add error messages for the missing cases. You could even do something along the lines of
newtype DuplicatedList a = DuplicatedList [a]
duplicate :: [a] -> DuplicatedList a
removeDuplicates :: Eq a => DuplicatedList a -> [a]
-- implementations omitted
All this is necessary, because you can't easily express 'being a list of even length, with consecutive pairs of elements being equal' in the Haskell type system (unless you're Oleg :)
But if you don't export removeDuplicates I think it's perfectly okay to use non-exhaustive patterns here. As soon as you do export it, you'll lose control over the inputs and will have to deal with the missing cases!
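A hedged sketch (not part of the original answer) of how the omitted implementations might look; the idea is that if the DuplicatedList constructor is not exported, only duplicate can build such values, so the partial match inside removeDuplicates stays safe:
newtype DuplicatedList a = DuplicatedList [a]

duplicate :: [a] -> DuplicatedList a
duplicate = DuplicatedList . concatMap (\x -> [x, x])

removeDuplicates :: Eq a => DuplicatedList a -> [a]
removeDuplicates (DuplicatedList []) = []
removeDuplicates (DuplicatedList (x:y:xs))
  | x == y = x : removeDuplicates (DuplicatedList xs)
-- any other shape violates the invariant maintained by duplicate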

Is it recommended to always have exhaustive pattern matches in Haskell, even for "impossible" cases?

Is it recommended to always have exhaustive pattern matches in Haskell, even for "impossible" cases?
For example, in the following code, I am pattern matching on the "accumulator" of a foldr. I am in complete control of the contents of the accumulator, because I create it (it is not passed to me as input, but rather built within my function). Therefore, I know certain patterns should never match it. If I strive to never get the "Pattern match(es) are non-exhaustive" warning, then I would place a pattern match for it that simply errors with the message "This pattern should never happen." Much like an assert in C#. I can't think of anything else to do there.
What practice would you recommend in this situation and why?
Here's the code:
gb_groupBy p input = foldr step [] input
  where
    step item acc = case acc of
      []          -> [[item]]
      ((x:xs):ys) -> if p x item
                       then (item:x:xs):ys
                       else [item]:acc
The pattern not matched (as reported by the interpreter) is:
Warning: Pattern match(es) are non-exhaustive
In a case alternative: Patterns not matched: [] : _
This is probably more a matter of style than anything else. Personally, I would put in a
_ -> error "Impossible! Empty list in step"
if only to silence the warning :)
You can resolve the warning in this special case by doing this:
gb_groupBy p input = foldr step [] input
  where
    step item acc = case acc of
      []       -> [[item]]
      (xs:xss) -> if p (head xs) item
                    then (item:xs):xss
                    else [item]:acc
The pattern matching is then complete, and the "impossible" condition of an empty list at the head of the accumulator would cause a runtime error but no warning.
Another way of looking at the more general problem of incomplete pattern matches is to see them as a "code smell", i.e. an indication that we're trying to solve a problem in a suboptimal, or non-Haskellish, way, and to try to rewrite our functions.
Implementing groupBy with a foldr like this makes it impossible to apply it to an infinite list, whereas working on infinite lists wherever semantically reasonable is a design goal of the Haskell list functions. Consider
take 5 $ groupBy (==) someFunctionDerivingAnInfiniteList
If the first 5 groups w.r.t. equality are finite, lazy evaluation will terminate. This is something you can't do in a strictly evaluated language. Even if you don't work with infinite lists, writing functions like this will yield better performance on long lists, or avoid the stack overflow that occurs when evaluating expressions like
take 5 $ gb_groupBy (==) [1..1000000]
In List.hs, groupBy is implemented like this:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _ [] = []
groupBy eq (x:xs) = (x:ys) : groupBy eq zs
  where (ys,zs) = span (eq x) xs
This enables the interpreter/compiler to evaluate only the parts of the computation necessary for the result.
span yields a pair of lists: the first is the longest prefix of (consecutive) elements from the head of the list that satisfy the predicate, and the second is the rest of the list. It is also implemented to work on infinite lists.
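A quick GHCi illustration (a hedged addition, not from the original answer) of span and of groupBy consuming only a finite prefix of an infinite list:
Prelude Data.List> span (< 3) [1,2,3,4,1]
([1,2],[3,4,1])
Prelude Data.List> take 2 (groupBy (==) (cycle "aab"))
["aa","b"]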
I find exhaustiveness checking on case patterns indispensable. I try never to use _ in a case at top level, because _ matches everything, and by using it you vitiate the value of exhaustiveness checking. This is less important with lists but critically important with user-defined algebraic data types, because I want to be able to add a new constructor and have the compiler barf on all the missing cases. For this reason I always compile with -Werror turned on, so there is no way I can leave out a case.
As observed, your code can be extended with this case
[] : _ -> error "this can't happen"
Internally, GHC has a panic function, which unlike error will give source coordinates, but I looked at the implementation and couldn't make head or tail of it.
To follow up on my earlier comment, I realised that there is a way to acknowledge the missing case but still get a useful error with a file/line number. It's not ideal, as it will only appear in unoptimized builds, though.
...
[]:xs -> assert False (error "unreachable because I know everything")
The type system is your friend, and the warning is letting you know your function has cracks. The very best approach is to go for a cleaner, more elegant fit between types.
Consider GHC's definition of groupBy:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _ [] = []
groupBy eq (x:xs) = (x:ys) : groupBy eq zs
  where (ys,zs) = span (eq x) xs
My point of view is that an impossible case is undefined.
If it's undefined we have a function for it: the cunningly named undefined.
Complete your matching with the likes of:
_ -> undefined
And there you have it!
