In Haskell, why non-exhaustive patterns are not compile-time errors? - haskell

This is a follow-up of Why am I getting "Non-exhaustive patterns in function..." when I invoke my Haskell substring function?
I understand that using -Wall, GHC can warn against non-exhaustive patterns. I'm wondering what's the reason behind not making it a compile-time error by default given that it's always possible to explicitly define a partial function:
f :: [a] -> [b] -> Int
f [] _ = error "undefined for empty array"
f _ [] = error "undefined for empty array"
f (_:xs) (_:ys) = length xs + length ys
The question is not GHC-specific.
Is it because...
nobody wanted to enforce a Haskell compiler to perform this kind of analysis?
a non-exhaustive pattern search can find some but not all cases?
partially defined functions are considered legitimate and used often enough not to impose the kind of construct shown above? If this is the case, can you explain to me why non-exhaustive patterns are helpful/legitimate?

There are cases where you don't mind that a pattern match is non-exhaustive. For example, while this might not be the optimal implementation, I don't think it would help if it didn't compile:
fac 0 = 1
fac n | n > 0 = n * fac (n-1)
That this is non-exhaustive (negative numbers don't match any case) doesn't really matter for the typical usage of the factorial function.
Also it might not generally be possible to decide for the compiler if a pattern match is exhaustive:
mod2 :: Integer -> Integer
mod2 n | even n = 0
mod2 n | odd n = 1
Here all cases should be covered, but the compiler probably can't detect it. Since the guards could be arbitrarily complex, the compiler cannot always decide if the patterns are exhaustive. Of course this example would better be written with otherwise, but I think it should also compile in its current form.

You can use -Werror to turn warnings into errors. I don't know if you can turn just the non-exhaustive patterns warnings into errors, sorry!
As for the third part of your question:
I sometimes write a number of functions which tend to work closely together and have properties which you can't easily express in Haskell. At least some of these functions tend to have non-exhaustive patterns, usually the 'consumers'. This comes up, for example in functions which are 'sort-of' inverses of each other.
A toy example:
duplicate :: [a] -> [a]
duplicate [] = []
duplicate (x:xs) = x : x : (duplicate xs)
removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates [] = []
removeDuplicates (x:y:xs) | x == y = x : removeDuplicates xs
Now it's pretty easy to see that removeDuplicates (duplicate as) is equal to as (whenever the element type is in Eq), but in general duplicate (removeDuplicates bs) will crash, because there are an odd number of elements or 2 consecutive elements differ. If it doesn't crash, it's because bs was produced by (or could have been produced by) duplicate in the first place!.
So we have the following laws (not valid Haskell):
removeDuplicates . duplicate == id
duplicate . removeDuplicates == id (for values in the range of duplicate)
Now, if you want to prevent non-exhaustive patterns here, you could make removeDuplicates return Maybe [a], or add error messages for the missing cases. You could even do something along the lines of
newtype DuplicatedList a = DuplicatedList [a]
duplicate :: [a] -> DuplicatedList a
removeDuplicates :: Eq a => DuplicatedList a -> [a]
-- implementations omitted
All this is necessary, because you can't easily express 'being a list of even length, with consecutive pairs of elements being equal' in the Haskell type system (unless you're Oleg :)
But if you don't export removeDuplicates I think it's perfectly okay to use non-exhaustive patterns here. As soon as you do export it, you'll lose control over the inputs and will have to deal with the missing cases!

Related

Haskell: Why ++ is not allowed in pattern matching?

Suppose we want to write our own sum function in Haskell:
sum' :: (Num a) => [a] -> a
sum' [] = 0
sum' (x:xs) = x + sum' xs
Why can't we do something like:
sum' :: (Num a) => [a] -> a
sum' [] = 0
sum' (xs++[x]) = x + sum' xs
In other words why can't we use ++ in pattern matching ?
This is a deserving question, and it has so far received sensible answers (mutter only constructors allowed, mutter injectivity, mutter ambiguity), but there's still time to change all that.
We can say what the rules are, but most of the explanations for why the rules are what they are start by over-generalising the question, addressing why we can't pattern match against any old function (mutter Prolog). This is to ignore the fact that ++ isn't any old function: it's a (spatially) linear plugging-stuff-together function, induced by the zipper-structure of lists. Pattern matching is about taking stuff apart, and indeed, notating the process in terms of the plugger-togetherers and pattern variables standing for the components. Its motivation is clarity. So I'd like
lookup :: Eq k => k -> [(k, v)] -> Maybe v
lookup k (_ ++ [(k, v)] ++ _) = Just v
lookup _ _ = Nothing
and not only because it would remind me of the fun I had thirty years ago when I implemented a functional language whose pattern matching offered exactly that.
The objection that it's ambiguous is a legitimate one, but not a dealbreaker. Plugger-togetherers like ++ offer only finitely many decompositions of finite input (and if you're working on infinite data, that's your own lookout), so what's involved is at worst search, rather than magic (inventing arbitrary inputs that arbitrary functions might have thrown away). Search calls for some means of prioritisation, but so do our ordered matching rules. Search can also result in failure, but so, again, can matching.
We have a sensible way to manage computations offering alternatives (failure and choice) via the Alternative abstraction, but we are not used to thinking of pattern matching as a form of such computation, which is why we exploit Alternative structure only in the expression language. The noble, if quixotic, exception is match-failure in do-notation, which calls the relevant fail rather than necessarily crashing out. Pattern matching is an attempt to compute an environment suitable for the evaluation of a 'right-hand side' expression; failure to compute such an environment is already handled, so why not choice?
(Edit: I should, of course, add that you only really need search if you have more than one stretchy thing in a pattern, so the proposed xs++[x] pattern shouldn't trigger any choices. Of course, it takes time to find the end of a list.)
Imagine there was some sort of funny bracket for writing Alternative computations, e.g., with (|) meaning empty, (|a1|a2|) meaning (|a1|) <|> (|a2|), and a regular old (|f s1 .. sn|) meaning pure f <*> s1 .. <*> sn. One might very well also imagine (|case a of {p1 -> a1; .. pn->an}|) performing a sensible translation of search-patterns (e.g. involving ++) in terms of Alternative combinators. We could write
lookup :: (Eq k, Alternative a) => k -> [(k, v)] -> a k
lookup k xs = (|case xs of _ ++ [(k, v)] ++ _ -> pure v|)
We may obtain a reasonable language of search-patterns for any datatype generated by fixpoints of differentiable functors: symbolic differentiation is exactly what turns tuples of structures into choices of possible substructures. Good old ++ is just the sublists-of-lists example (which is confusing, because a list-with-a-hole-for-a-sublist looks a lot like a list, but the same is not true for other datatypes).
Hilariously, with a spot of LinearTypes, we might even keep hold of holey data by their holes as well as their root, then plug away destructively in constant time. It's scandalous behaviour only if you don't notice you're doing it.
You can only pattern match on constructors, not on general functions.
Mathematically, a constructor is an injective function: each combination of arguments gives one unique value, in this case a list. Because that value is unique, the language can deconstruct it again into the original arguments. I.e., when you pattern match on :, you essentially use the function
uncons :: [a] -> Maybe (a, [a])
which checks if the list is of a form you could have constructed with : (i.e., if it is non-empty), and if yes, gives you back the head and tail.
++ is not injective though, for example
Prelude> [0,1] ++ [2]
[0,1,2]
Prelude> [0] ++ [1,2]
[0,1,2]
Neither of these representations is the right one, so how should the list be deconstructed again?
What you can do however is define a new, “virtual” constructor that acts like : in that it always seperates exactly one element from the rest of the list (if possible), but does so on the right:
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}
pattern (:>) :: [a] -> a -> [a]
pattern (xs:>ω) <- (unsnoc -> Just (xs,ω))
where xs:>ω = xs ++ [ω]
unsnoc :: [a] -> Maybe ([a], a)
unsnoc [] = Nothing
unsnoc [x] = Just x
unsnoc (_:xs) = unsnoc xs
Then
sum' :: Num a => [a] -> a
sum' (xs:>x) = x + sum xs
sum' [] = 0
Note that this is very inefficient though, because the :> pattern-synonym actually needs to dig through the entire list, so sum' has quadratic rather than linear complexity.
A container that allows pattern matching on both the left and right end efficiently is Data.Sequence, with its :<| and :|> pattern synonyms.
You can only pattern-match on data constructors, and ++ is a function, not a data constructor.
Data constructors are persistent; a value like 'c':[] cannot be simplified further, because it is a fundamental value of type [Char]. An expression like "c" ++ "d", however, can replaced with its equivalent "cd" at any time, and thus couldn't reliably be counted on to be present for pattern matching.
(You might argue that "cd" could always replaced by "c" ++ "d", but in general there isn't a one-to-one mapping between a list and a decomposition via ++. Is "cde" equivalent to "c" ++ "de" or "cd" ++ "e" for pattern matching purposes?)
++ isn't a constructor, it's just a plain function. You can only match on constructors.
You can use ViewPatterns or PatternSynonyms to augment your ability to pattern match (thanks #luqui).

Error: Non-exhaustive patterns in function Haskell

I did a thread for an error like this one, in it I explain my program. here's the link
I'm going forward in my project and I have an another problem like that one. I did an other thread but if I just need to edit the first one just tell me.
I want to reverse my matrix. For example [[B,B,N],[N,B,B]] will become [[B,N],[B,B],[N,B]]. Here's my code:
transpose :: Grille -> Grille
transpose [] = []
transpose g
| head(g) == [] = []
| otherwise = [premierElem(g)] ++ transpose(supp g)
supp :: Grille -> Grille
supp [] = []
supp (g:xg) = [tail(g)] ++ supp(xg)
premierElem :: Grille -> [Case]
premierElem [] = []
premierElem (x:xg) = [head(x)] ++ premierElem(xg)
I got the exact same error and I tried like for the first one but that's not it.
EDIT: The exact error
*Main> transpose g0
[[B,B,N,B,N,B],[B,B,B,B,N,B],[B,N,N,B,N,B],[B,B,N,N,B,N],[B,N,N,N,B,N],[B,B,N,B,B,B],[N,B,N,N,B,N],[*** Exception: Prelude.head: empty list
The problem is that your transpose function has a broken termination condition. How does it know when to stop? Try walking through the final step by hand...
In general, your case transpose [] = [] will never occur, because your supp function never changes the number of lists in its argument. A well-formed matrix will end up as [[],[],[],...], which will not match []. The only thing that will stop it is an error like you received.
So, you need to check the remaining length of your nested (row?) vectors, and stop if it is zero. There are many ways to approach this; if it's not cheating, you could look at the implementation of transpose in the Prelude documents.
Also, re your comment above: if you expect your input to be well-formed in some way, you should cover any excluded cases by complaining about the ill-formed input, such as reporting an error.
Fixing Your Code
You should avoid using partial functions, such as tail and head, and instead make your own functions do (more) pattern matching. For example:
premierElem g = [head(head(g))] ++ premierElem(tail(g))
Yuck! If you want the first element of the first list in g then match on the pattern:
premierElem ((a:_):rest) = [a] ++ premierElem rest
This in and of itself is insufficient, you'll want to handle the case where the first list of the Grille is an empty list and at least give a useful error message if you can't use a reasonable default value:
premeirElem ([]:rest) = premeirElem rest
Making Better Code
Eventually you will become more comfortable in the language and learn to express what you want using higher level operations, which often means you'll be able to reuse functions already provided in base or other libraries. In this case:
premeirElem :: [[a]] -> [a]
premeirElem = concatMap (take 1)
Which assumes you are OK with silently ignoring []. If that isn't your intent then other similarly concise solutions can work well, but we'd need clarity on the goal.

A real life example when pattern matching is more preferable than a case expression in Haskell?

So I have been busy with the Real World Haskell book and I did the lastButOne exercise. I came up with 2 solutions, one with pattern matching
lastButOne :: [a] -> a
lastButOne ([]) = error "Empty List"
lastButOne (x:[]) = error "Only one element"
lastButOne (x:[x2]) = x
lastButOne (x:xs) = lastButOne xs
And one using a case expression
lastButOneCase :: [a] -> a
lastButOneCase x =
case x of
[] -> error "Empty List"
(x:[]) -> error "Only One Element"
(x:[x2]) -> x
(x:xs) -> lastButOneCase xs
What I wanted to find out is when would pattern matching be preferred over case expressions and vice versa. This example was not good enough for me because it seems that while both of the functions work as intended, it did not lead me to choose one implementation over the other. So the choice "seems" preferential at first glance?
So are there good cases by means of source code, either in haskell's own source or github or somewhere else, where one is able to see when either method is preferred or not?
First a short terminology diversion: I would call both of these "pattern matching". I'm not sure there is a good term for distinguishing pattern-matching-via-case and pattern-matching-via-multiple-definition.
The technical distinction between the two is quite light indeed. You can verify this yourself by asking GHC to dump the core it generates for the two functions, using the -ddump-simpl flag. I tried this at a few different optimization levels, and in all cases the only differences in the Core were naming. (By the way, if anyone knows a good "semantic diff" program for Core -- which knows about at the very least alpha equivalence -- I'm very interested in hearing about it!)
There are a few small gotchas to watch out for, though. You might wonder whether the following is also equivalent:
{-# LANGUAGE LambdaCase #-}
lastButOne = \case
[] -> error "Empty List"
(x:[]) -> error "Only One Element"
(x:[x2]) -> x
(x:xs) -> lastButOneCase xs
In this case, the answer is yes. But consider this similar-looking one:
-- ambiguous type error
sort = \case
[] -> []
x:xs -> insert x (sort xs)
All of a sudden this is a typeclass-polymorphic CAF, and so on old GHCs this will trigger the monomorphism restriction and cause an error, whereas the superficially identical version with an explicit argument does not:
-- this is fine!
sort [] = []
sort (x:xs) = insert x (sort xs)
The other minor difference (which I forgot about -- thank you to Thomas DuBuisson for reminding me) is in the handling of where clauses. Since where clauses are attached to binding sites, they cannot be shared across multiple equations but can be shared across multiple cases. For example:
-- error; the where clause attaches to the second equation, so
-- empty is not in scope in the first equation
null [] = empty
null (x:xs) = nonempty
where empty = True
nonempty = False
-- ok; the where clause attaches to the equation, so both empty
-- and nonempty are in scope for the entire case expression
null x = case x of
[] -> empty
x:xs -> nonempty
where
empty = True
nonempty = False
You might think this means you can do something with equations that you can't do with case expressions, namely, have different meanings for the same name in the two equations, like this:
null [] = answer where answer = True
null (x:xs) = answer where answer = False
However, since the patterns of case expressions are binding sites, this can be emulated in case expressions as well:
null x = case x of
[] -> answer where answer = True
x:xs -> answer where answer = False
Whether the where clause is attached to the case's pattern or to the equation depends on indentation, of course.
If I recall correctly both these will "desugar" into the same core code in ghc, so the choice is purely stylistic. Personally I would go for the first one. As someone said, its shorter, and what you term "pattern matching" is intended to be used this way. (Actually the second version is also pattern matching, just using a different syntax for it).
It's a stylistic preference. Some people sometimes argue that one choice or another makes certain code changes take less effort, but I generally find such arguments, even when accurate, don't actually amount to a big improvement. So do as you like.
A perspective that's well worth bringing into this is Hudak, Hughes, Peyton Jones and Wadler's paper "A History of Haskell: Being Lazy With Class". Section 4.4 is about this topic. The short story: Haskell supports both because the designers couldn't agree on one over the other. Yep, again, it's a stylistic preference.
When you're matching on more than one expression, case expressions start to look more attractive.
f pat11 pat21 = ...
f pat11 pat22 = ...
f pat11 pat23 = ...
f pat12 pat24 = ...
f pat12 pat25 = ...
can be more annoying to write than
f pat11 y =
case y of
pat21 -> ...
pat22 -> ...
pat23 -> ...
f pat12 y =
case y of
pat24 -> ...
pat25 -> ...
More significantly, I've found that when using GADTs, the "declaration style" doesn't seem to propagate evidence from left to right the way I'd expect it to. There might be some trick I haven't worked out, but I end up having to nest case expressions to avoid spurious incomplete pattern warnings.

One interesting pattern

I'm solving 99 Haskell Probems. I've successfully solved problem No. 21, and when I opened solution page, the following solution was proposed:
Insert an element at a given position into a list.
insertAt :: a -> [a] -> Int -> [a]
insertAt x xs (n+1) = let (ys,zs) = split xs n in ys++x:zs
I found pattern (n + 1) interesting, because it seems to be an elegant way to convert 1-based argument of insertAt into 0-based argument of split (it's function from previous exercises, essentially the same as splitAt). The problem is that GHC did not find this pattern that elegant, in fact it says:
Parse error in pattern: n + 1
I don't think that the guy who wrote the answer was dumb and I would like to know if this kind of patterns is legal in Haskell, and if it is, how to fix the solution.
I believe it has been removed from the language, and so was likely around when the author of 99 Haskell Problems wrote that solution, but it is no longer in Haskell.
The problem with n+k patterns goes back to a design decision in Haskell, to distinguish between constructors and variables in patterns by the first character of their names. If you go back to ML, a common function definition might look like (using Haskell syntax)
map f nil = nil
map f (x:xn) = f x : map f xn
As you can see, syntactically there's no difference between f and nil on the LHS of the first line, but they have different roles; f is a variable that needs to be bound to the first argument to map while nil is a constructor that needs to be matched against the second. Now, ML makes this distinction by looking each variable up in the surrounding scope, and assuming names are variables when the look-up fails. So nil is recognized as a constructor when the lookup fails. But consider what happens when there's a typo in the pattern:
map f niil = nil
(two is in niil). niil isn't a constructor name in scope, so it gets treated as a variable, and the definition is silently interpreted incorrectly.
Haskell's solution to this problem is to require constructor names to begin with uppercase letters, and variable names to begin with lowercase letters. And, for infix operators / constructors, constructor names must begin with : while operator names may not begin with :. This also helps distinguish between deconstructing bindings:
x:xn = ...
is clearly a deconstructing binding, because you can't define a function named :, while
n - m = ...
is clearly a function definition, because - can't be a constructor name.
But allowing n+k patterns, like n+1, means that + is both a valid function name, and something that works like a constructor in patterns. Now
n + 1 = ...
is ambiguous again; it could be part of the definition of a function named (+), or it could be a deconstructing pattern match definition of n. In Haskell 98, this ambiguity was solved by declaring
n + 1 = ...
a function definition, and
(n + 1) = ...
a deconstructing binding. But that obviously was never a satisfactory solution.
Note that you can now use view patterns instead of n+1.
For example:
{-# LANGUAGE ViewPatterns #-}
module Temp where
import Data.List (splitAt)
split :: [a] -> Int -> ([a], [a])
split = flip splitAt
insertAt :: a -> [a] -> Int -> [a]
insertAt x xs (subtract 1 -> n) = let (ys,zs) = split xs n in ys++x:zs

Is it recommended to always have exhaustive pattern matches in Haskell, even for "impossible" cases?

Is it recommended to always have exhaustive pattern matches in Haskell, even for "impossible" cases?
For example, in the following code, I am pattern matching on the "accumulator" of a foldr. I am in complete control of the contents of the accumulator, because I create it (it is not passed to me as input, but rather built within my function). Therefore, I know certain patterns should never match it. If I strive to never get the "Pattern match(es) are non-exhaustive" error, then I would place a pattern match for it that simply error's with the message "This pattern should never happen." Much like an assert in C#. I can't think of anything else to do there.
What practice would you recommend in this situation and why?
Here's the code:
gb_groupBy p input = foldr step [] input
where
step item acc = case acc of
[] -> [[item]]
((x:xs):ys) -> if p x item
then (item:x:xs):ys
else [item]:acc
The pattern not matched (as reported by the interpreter) is:
Warning: Pattern match(es) are non-exhaustive
In a case alternative: Patterns not matched: [] : _
This is probably more a matter of style than anything else. Personally, I would put in a
_ -> error "Impossible! Empty list in step"
if only to silence the warning :)
You can resolve the warning in this special case by doing this:
gb_groupBy p input = foldr step [] input
where
step item acc = case acc of
[] -> [[item]]
(xs:xss) -> if p (head xs) item
then (item:xs):xss
else [item]:acc
The pattern matching is then complete, and the "impossible" condition of an empty list at the head of the accumulator would cause a runtime error but no warning.
Another way of looking at the more general problem of incomplete pattern matchings is to see them as a "code smell", i.e. an indication that we're trying to solve a problem in a suboptimal, or non-Haskellish, way, and try to rewrite our functions.
Implementing groupBy with a foldr makes it impossible to apply it to an infinite list, which is a design goal that the Haskell List functions try to achieve wherever semantically reasonable. Consider
take 5 $ groupBy (==) someFunctionDerivingAnInfiniteList
If the first 5 groups w.r.t. equality are finite, lazy evaluation will terminate. This is something you can't do in a strictly evaluated language. Even if you don't work with infinite lists, writing functions like this will yield better performance on long lists, or avoid the stack overflow that occurs when evaluating expressions like
take 5 $ gb_groupBy (==) [1..1000000]
In List.hs, groupBy is implemented like this:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _ [] = []
groupBy eq (x:xs) = (x:ys) : groupBy eq zs
where (ys,zs) = span (eq x) xs
This enables the interpreter/compiler to evaluate only the parts of the computation necessary for the result.
span yields a pair of lists, where the first consists of (consecutive) elements from the head of the list all satisfying a predicate, and the second is the rest of the list. It's also implemented to work on infinite lists.
I find exhaustiveness checking on case patterns indispensible. I try never to use _ in a case at top level, because _ matches everything, and by using it you vitiate the value of exhaustiveness checking. This is less important with lists but critical important with user-defined algebraic data types, because I want to be able to add a new constructor and have the compiler barf on all the missing cases. For this reason I always compile with -Werror turned on, so there is no way I can leave out a case.
As observed, your code can be extended with this case
[] : _ -> error "this can't happen"
Internally, GHC has a panic function, which unlike error will give source coordinates, but I looked at the implementation and couldn't make head or tail of it.
To follow up on my earlier comment, I realised that there is a way to acknowledge the missing case but still get a useful error with file/line number. It's not ideal as it'll only appear in unoptimized builds, though (see here).
...
[]:xs -> assert False (error "unreachable because I know everything")
The type system is your friend, and the warning is letting you know your function has cracks. The very best approach is to go for a cleaner, more elegant fit between types.
Consider ghc's definition of groupBy:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _ [] = []
groupBy eq (x:xs) = (x:ys) : groupBy eq zs
where (ys,zs) = span (eq x) xs
My point of view is that an impossible case is undefined.
If it's undefined we have a function for it: the cunningly named undefined.
Complete your matching with the likes of:
_ -> undefined
And there you have it!

Resources