Implementation of null function - haskell

I have to learn Haskell for university, and therefore I'm using learnyouahaskell.com to get started. I have always used imperative languages, so I decided to practice Haskell by coding a lot more than I would for other languages.
I started to implement several functions to work with lists such as head, tail, init,...
At some point I looked up the implementations of these functions to compare to mine and I stumbled upon the null function defined in List.lhs.
null's implementation:
-- | Test whether a list is empty.
null :: [a] -> Bool
null [] = True
null (_:_) = False
my implementation:
mNull :: [a] -> Bool
mNull [] = True
mNull _ = False
I know there are no stupid questions even for such simple questions :)
So my question is why the original implementation uses (_:_) instead of just _?
Is there any advantage in using (_:_) or are there any edge cases I don't know of?
I can't really imagine any advantage because _ catches everything.

You can't just turn the clauses around with your solution:
mNull' :: [a] -> Bool
mNull' _ = False
mNull' [] = True
This will always yield False, even if you pass an empty list: clauses are tried from top to bottom, and since _ matches anything, the [] clause is never considered. (GHC will warn you about such an overlapping pattern.)
On the other hand,
null' :: [a] -> Bool
null' (_:_) = False
null' [] = True
still works correctly, because (_:_) fails to match the particular case of an empty list.
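A small sketch to check the clause-order behaviour described above. It reuses the names mNull' and null' from this answer; GHC is expected to emit an overlapping-pattern warning for mNull', but the module should still compile with default settings.

```haskell
-- Wildcard first: the [] clause below can never be reached.
mNull' :: [a] -> Bool
mNull' _ = False
mNull' [] = True

-- Explicit patterns: the order of the clauses does not matter,
-- because (_:_) and [] never match the same list.
null' :: [a] -> Bool
null' (_:_) = False
null' [] = True
```

In GHCi, mNull' "" evaluates to False (the wrong answer), while null' "" evaluates to True.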
That in itself doesn't really give the two explicit clauses an advantage. However, in more complicated code, writing out all the mutually exclusive options has one benefit: if you've forgotten one option, the compiler can warn you about that! A _, by contrast, can and will handle any case not dealt with by the previous clauses, even if that's not actually correct.

Because _ literally means anything apart from explicitly specified patterns. When you specify (_:_) it means anything which can be represented as a list containing at least 1 element, without bothering with what or even how many elements the list actually contains. Since the case with an explicit pattern for empty list is already present, (_:_) might as well be replaced by _.
However, writing it as (_:_) gives you the flexibility to not even write out the empty list pattern explicitly. In fact, this will work:
-- | Test whether a list is empty.
null :: [a] -> Bool
null (_:_) = False
null _ = True

I'll add on to what leftaroundabout said. This is not really a potential concern for the list type, but in general, data types sometimes get modified. If you have an ill-conceived
data FavoriteFood = Pizza
                  | SesameSpinachPancake
                  | ChanaMasala
and later you learn to like DrunkenNoodles and add it to the list, all your functions that pattern match on the type will need to be expanded. If you've used any _ patterns, you will need to find them manually; the compiler can't help you. If you've matched each thing explicitly, you can turn on -fwarn-incomplete-patterns and GHC will tell you about every spot you've missed.
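To make the difference concrete, here is a minimal sketch with DrunkenNoodles already added. The functions foodName and spiceLevelWild and the values they return are hypothetical, invented only for illustration. -Wincomplete-patterns is the current spelling of the -fwarn-incomplete-patterns flag mentioned above.

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

data FavoriteFood
  = Pizza
  | SesameSpinachPancake
  | ChanaMasala
  | DrunkenNoodles

-- Every constructor is matched explicitly. Deleting the last clause
-- makes GHC report DrunkenNoodles as a pattern that is not matched.
foodName :: FavoriteFood -> String
foodName Pizza = "pizza"
foodName SesameSpinachPancake = "sesame spinach pancake"
foodName ChanaMasala = "chana masala"
foodName DrunkenNoodles = "drunken noodles"

-- Written when only three constructors existed. The wildcard silently
-- absorbs DrunkenNoodles, so the compiler has nothing to warn about.
spiceLevelWild :: FavoriteFood -> Int
spiceLevelWild ChanaMasala = 3
spiceLevelWild _ = 0
```

Here spiceLevelWild DrunkenNoodles quietly returns 0, which may or may not be what was intended; nothing flags it for review.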

Related

How do I modify this Haskell function so I don't have to import Data.Bool and only use prelude function?

I want to write the function below using only Prelude functions, without importing Data.Bool. How can I replace the bool function with something else so that the function still returns the same output?
increment :: [Bool] -> [Bool]
increment x = case x of
  [] -> [True]
  (y : ys) -> not y : bool id increment y ys
bool from Data.Bool does exactly the same thing as an if expression, so you can define it yourself:
bool x y b = if b then y else x
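As a sketch of the same idea without any helper, the call bool id increment y ys can be inlined as an if expression, since bool picks its second argument when the condition is True. This keeps the structure of the question's function and needs nothing outside the Prelude.

```haskell
-- Increment a little-endian binary number (least significant bit first).
increment :: [Bool] -> [Bool]
increment x = case x of
  [] -> [True]
  -- bool id increment y ys  ==  (if y then increment else id) ys
  (y : ys) -> not y : (if y then increment ys else ys)
```

For example, increment [True, True, False] gives [False, False, True], and increment [True] gives [False, True].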
@dfeuer suggested in a comment that you should throw away this code because it's disgusting, and instead try to write it yourself. This might be distressing to you if you're the one that wrote the code in the first place and can't see why it's disgusting, so allow me to elaborate.
In fact, "disgusting" is too strong a word. However, the code is unnecessarily complex and difficult to understand. A more straightforward implementation does all the processing using pattern matching on the function argument:
increment :: [Bool] -> [Bool]
increment [] = [True]
increment (False : rest) = True : rest
increment (True : rest) = False : increment rest
This code is easier to read for most people, because all of the decision logic is at the same "level" and implemented the same way -- by inspecting the three patterns on the left-hand side of the definitions, you can see exactly how the three, mutually exclusive cases are handled at a glance.
In contrast, the original code requires the reader to consider the pattern match against an empty versus not empty list, the effect of the "not" computation on the first boolean, the bool call based on that same boolean, and the application of either the function id or the recursive increment on the rest of the boolean list. For any given input, you need to consider all four conceptually distinct processing steps to understand what the function is doing, and at the end, you'll probably still be uncertain about which steps were triggered by which aspects of the input.
Now, ideally, GHC with -O2 would compile both of these versions to exactly the same code internally. It almost does. However, it turns out that, due to an apparent optimization bug, the original code ends up being slightly less efficient than this rewritten version because it unnecessarily checks y == True twice.

How to Pattern Match With Algebraic Types in Haskell

The goal of the assignment I am working on is to create a bunch of different functions that involve searching a data type called a Trie, in which the constructor is defined as
data Trie = MakeTrie Char [Trie] deriving Eq
I am trying to first build simple functions so I can figure out how to descend this Trie, but it seems like pattern matching is not working.
test :: Trie -> Bool
test t
  | t == MakeTrie '.' [_] = True
  | otherwise = False
I get an error stating that a hole was found and that relevant bindings include t :: Trie. How can I let the interpreter know that [_] represents a list of Tries? The reason I am doing this is that I have no idea how else to go about descending my Trie later if I don't use pattern matching.
You should check out the function syntax chapter in Learn You A Haskell (particularly the first section on pattern matching).
This is how you do pattern matching in Haskell for this example:
test :: Trie -> Bool
test (MakeTrie '.' _) = True
test _ = False
Testing:
Prelude> test (MakeTrie '.' [])
True
Prelude> test (MakeTrie 'a' [])
False
There are two problems here:
if you write [_], this is a pattern that says "a list of one element, regardless what that element is"; and
you cannot do pattern matching with (==).
Indeed, (==) is a function that compares two values. Nothing guarantees that two values which compare equal share the same constructor, and so on: an Eq instance can implement an arbitrary equivalence relation.
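To illustrate why (==) says nothing about constructors, here is a small sketch with a hypothetical Length type (not part of the question) whose hand-written Eq instance treats values built from different constructors as equal.

```haskell
-- Hypothetical type, used only to illustrate a custom Eq instance.
data Length = Metres Int | Centimetres Int

toCentimetres :: Length -> Int
toCentimetres (Metres m) = 100 * m
toCentimetres (Centimetres c) = c

-- Equality compares the normalised value, not the constructor.
instance Eq Length where
  a == b = toCentimetres a == toCentimetres b

-- Pattern matching, by contrast, inspects the constructor itself.
isMetres :: Length -> Bool
isMetres (Metres _) = True
isMetres (Centimetres _) = False
```

Here Metres 1 == Centimetres 100 is True, yet isMetres tells the two values apart.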
We can write the function as:
test :: Trie -> Bool
test (MakeTrie '.' _) = True
test _ = False
So here the first clause checks if the input matches the pattern MakeTrie '.' _, so it checks if it is the MakeTrie data constructor where the first parameter is a '.'; the second parameter can be anything.
The second clause matches everything, and returns False in that case.
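Since the underlying goal is to descend the Trie, here is a minimal sketch of how pattern matching on MakeTrie combines with recursion over the list of children. The helpers size and depth are hypothetical examples, not part of the assignment.

```haskell
data Trie = MakeTrie Char [Trie] deriving Eq

-- Count every node by binding the child list and recursing into it.
size :: Trie -> Int
size (MakeTrie _ children) = 1 + sum (map size children)

-- Number of nodes on the longest root-to-leaf path.
depth :: Trie -> Int
depth (MakeTrie _ []) = 1
depth (MakeTrie _ children) = 1 + maximum (map depth children)
```

For MakeTrie '.' [MakeTrie 'a' [], MakeTrie 'b' [MakeTrie 'c' []]], size gives 4 and depth gives 3.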

A real life example when pattern matching is more preferable than a case expression in Haskell?

So I have been busy with the Real World Haskell book and I did the lastButOne exercise. I came up with 2 solutions, one with pattern matching
lastButOne :: [a] -> a
lastButOne ([]) = error "Empty List"
lastButOne (x:[]) = error "Only one element"
lastButOne (x:[x2]) = x
lastButOne (x:xs) = lastButOne xs
And one using a case expression
lastButOneCase :: [a] -> a
lastButOneCase x =
  case x of
    [] -> error "Empty List"
    (x:[]) -> error "Only One Element"
    (x:[x2]) -> x
    (x:xs) -> lastButOneCase xs
What I wanted to find out is when would pattern matching be preferred over case expressions and vice versa. This example was not good enough for me because it seems that while both of the functions work as intended, it did not lead me to choose one implementation over the other. So the choice "seems" preferential at first glance?
So are there good examples in source code, whether in Haskell's own source, on GitHub, or somewhere else, that show when one method is preferred over the other?
First a short terminology diversion: I would call both of these "pattern matching". I'm not sure there is a good term for distinguishing pattern-matching-via-case and pattern-matching-via-multiple-definition.
The technical distinction between the two is quite light indeed. You can verify this yourself by asking GHC to dump the core it generates for the two functions, using the -ddump-simpl flag. I tried this at a few different optimization levels, and in all cases the only differences in the Core were naming. (By the way, if anyone knows a good "semantic diff" program for Core -- which knows about at the very least alpha equivalence -- I'm very interested in hearing about it!)
There are a few small gotchas to watch out for, though. You might wonder whether the following is also equivalent:
{-# LANGUAGE LambdaCase #-}
lastButOne = \case
  [] -> error "Empty List"
  (x:[]) -> error "Only One Element"
  (x:[x2]) -> x
  (x:xs) -> lastButOne xs
In this case, the answer is yes. But consider this similar-looking one:
-- ambiguous type error
sort = \case
  [] -> []
  x:xs -> insert x (sort xs)
All of a sudden this is a typeclass-polymorphic CAF, and so on old GHCs this will trigger the monomorphism restriction and cause an error, whereas the superficially identical version with an explicit argument does not:
-- this is fine!
sort [] = []
sort (x:xs) = insert x (sort xs)
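One way around this, sketched below, is to give the \case version an explicit type signature; the monomorphism restriction only applies to bindings without one. The name insertionSort is an assumption chosen to avoid clashing with Data.List.sort.

```haskell
{-# LANGUAGE LambdaCase #-}
import Data.List (insert)

-- The explicit signature keeps the binding polymorphic even though
-- it is written without an argument on the left-hand side.
insertionSort :: Ord a => [a] -> [a]
insertionSort = \case
  [] -> []
  x : xs -> insert x (insertionSort xs)
```

With the signature in place, insertionSort [3, 1, 2] evaluates to [1, 2, 3].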
The other minor difference (which I forgot about -- thank you to Thomas DuBuisson for reminding me) is in the handling of where clauses. Since where clauses are attached to binding sites, they cannot be shared across multiple equations but can be shared across multiple cases. For example:
-- error; the where clause attaches to the second equation, so
-- empty is not in scope in the first equation
null [] = empty
null (x:xs) = nonempty
  where empty = True
        nonempty = False
-- ok; the where clause attaches to the equation, so both empty
-- and nonempty are in scope for the entire case expression
null x = case x of
    [] -> empty
    x:xs -> nonempty
  where
    empty = True
    nonempty = False
You might think this means you can do something with equations that you can't do with case expressions, namely, have different meanings for the same name in the two equations, like this:
null [] = answer where answer = True
null (x:xs) = answer where answer = False
However, since the patterns of case expressions are binding sites, this can be emulated in case expressions as well:
null x = case x of
  [] -> answer where answer = True
  x:xs -> answer where answer = False
Whether the where clause is attached to the case's pattern or to the equation depends on indentation, of course.
If I recall correctly, both of these will "desugar" into the same Core in GHC, so the choice is purely stylistic. Personally I would go for the first one. As someone said, it's shorter, and what you term "pattern matching" is intended to be used this way. (Actually, the second version is also pattern matching, just using a different syntax for it.)
It's a stylistic preference. Some people sometimes argue that one choice or another makes certain code changes take less effort, but I generally find such arguments, even when accurate, don't actually amount to a big improvement. So do as you like.
A perspective that's well worth bringing into this is Hudak, Hughes, Peyton Jones and Wadler's paper "A History of Haskell: Being Lazy With Class". Section 4.4 is about this topic. The short story: Haskell supports both because the designers couldn't agree on one over the other. Yep, again, it's a stylistic preference.
When you're matching on more than one expression, case expressions start to look more attractive.
f pat11 pat21 = ...
f pat11 pat22 = ...
f pat11 pat23 = ...
f pat12 pat24 = ...
f pat12 pat25 = ...
can be more annoying to write than
f pat11 y =
  case y of
    pat21 -> ...
    pat22 -> ...
    pat23 -> ...
f pat12 y =
  case y of
    pat24 -> ...
    pat25 -> ...
More significantly, I've found that when using GADTs, the "declaration style" doesn't seem to propagate evidence from left to right the way I'd expect it to. There might be some trick I haven't worked out, but I end up having to nest case expressions to avoid spurious incomplete pattern warnings.

Haskell's `otherwise` is a synonym for `_`?

I ran across a piece of code recently that used Haskell's otherwise to pattern match on a list. This struck me as odd, since:
ghci> :t otherwise
otherwise :: Bool
So, I tried the following:
ghci> case [] of otherwise -> "!?"
"!?"
I also tried it with various other patterns of different types and with -XNoImplicitPrelude turned on (to remove otherwise from scope), and it still works. Is this supposed to happen? Where is this documented?
It's not equivalent to _; it's equivalent to any other identifier. That is, if an identifier is used as a pattern in Haskell, the pattern always matches and the matched value is bound to that identifier (unlike _, which also always matches but discards the matched value).
Just to be clear: the identifier otherwise is not special here. The code could just as well have been x -> "!?". Also, since the binding is never actually used, it would make more sense to use _ to avoid an "unused identifier" warning and to make it obvious to the reader that the value does not matter.
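A short sketch that makes the binding visible: inside the second alternative, otherwise no longer refers to the Prelude's Bool but to the scrutinised list. The function describe is a hypothetical example.

```haskell
describe :: [Int] -> String
describe xs = case xs of
  [] -> "empty"
  -- Here `otherwise` is a fresh variable bound to the whole list,
  -- shadowing Prelude's otherwise :: Bool within this alternative.
  otherwise -> "non-empty, length " ++ show (length otherwise)
```

describe [1, 2, 3] returns "non-empty, length 3", which would not type-check if otherwise were still a Bool at that point.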
Just since nobody has said it yet, otherwise is supposed to be used as a guard expression, not a pattern. case ... of pat | ... -> ... | otherwise -> ... Now its definition as True is important. – Reid Barton
An example:
fact n acc
  | n == 0 = acc
  | otherwise = fact (n-1) $! (acc * n)
Since otherwise is True, that second guard will always succeed.
Note that using otherwise in a pattern (as opposed to a guard) is likely to confuse people. It will also trip a name shadowing warning if GHC is run with the appropriate warnings enabled.

Is it recommended to always have exhaustive pattern matches in Haskell, even for "impossible" cases?

For example, in the following code, I am pattern matching on the "accumulator" of a foldr. I am in complete control of the contents of the accumulator, because I create it (it is not passed to me as input, but rather built within my function). Therefore, I know certain patterns should never match it. If I strive to never get the "Pattern match(es) are non-exhaustive" error, then I would place a pattern match for it that simply calls error with the message "This pattern should never happen." Much like an assert in C#. I can't think of anything else to do there.
What practice would you recommend in this situation and why?
Here's the code:
gb_groupBy p input = foldr step [] input
  where
    step item acc = case acc of
      [] -> [[item]]
      ((x:xs):ys) -> if p x item
                     then (item:x:xs):ys
                     else [item]:acc
The pattern not matched (as reported by the interpreter) is:
Warning: Pattern match(es) are non-exhaustive
In a case alternative: Patterns not matched: [] : _
This is probably more a matter of style than anything else. Personally, I would put in a
_ -> error "Impossible! Empty list in step"
if only to silence the warning :)
You can resolve the warning in this special case by doing this:
gb_groupBy p input = foldr step [] input
  where
    step item acc = case acc of
      [] -> [[item]]
      (xs:xss) -> if p (head xs) item
                  then (item:xs):xss
                  else [item]:acc
The pattern matching is then complete, and the "impossible" condition of an empty list at the head of the accumulator would cause a runtime error but no warning.
Another way of looking at the more general problem of incomplete pattern matchings is to see them as a "code smell", i.e. an indication that we're trying to solve a problem in a suboptimal, or non-Haskellish, way, and try to rewrite our functions.
Implementing groupBy with a foldr makes it impossible to apply it to an infinite list, which is a design goal that the Haskell List functions try to achieve wherever semantically reasonable. Consider
take 5 $ groupBy (==) someFunctionDerivingAnInfiniteList
If the first 5 groups w.r.t. equality are finite, lazy evaluation will terminate. This is something you can't do in a strictly evaluated language. Even if you don't work with infinite lists, writing functions like this will yield better performance on long lists, or avoid the stack overflow that occurs when evaluating expressions like
take 5 $ gb_groupBy (==) [1..1000000]
In List.hs, groupBy is implemented like this:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _ [] = []
groupBy eq (x:xs) = (x:ys) : groupBy eq zs
  where (ys,zs) = span (eq x) xs
This enables the interpreter/compiler to evaluate only the parts of the computation necessary for the result.
span yields a pair of lists, where the first consists of (consecutive) elements from the head of the list all satisfying a predicate, and the second is the rest of the list. It's also implemented to work on infinite lists.
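As a quick check of the laziness claim, the sketch below runs the library groupBy on an infinite list and shows what span returns on a small input. The names firstGroups and spanExample are made up for this example.

```haskell
import Data.List (groupBy)

-- An infinite list 1,1,2,2,3,3,... : only the first three groups are forced.
firstGroups :: [[Int]]
firstGroups = take 3 (groupBy (==) (concatMap (replicate 2) [1 ..]))

-- span splits at the first element that fails the predicate.
spanExample :: ([Int], [Int])
spanExample = span even [2, 4, 5, 6]
```

firstGroups evaluates to [[1,1],[2,2],[3,3]] and spanExample to ([2,4],[5,6]).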
I find exhaustiveness checking on case patterns indispensable. I try never to use _ in a case at top level, because _ matches everything, and by using it you vitiate the value of exhaustiveness checking. This is less important with lists but critically important with user-defined algebraic data types, because I want to be able to add a new constructor and have the compiler barf on all the missing cases. For this reason I always compile with -Werror turned on, so there is no way I can leave out a case.
As observed, your code can be extended with this case
[] : _ -> error "this can't happen"
Internally, GHC has a panic function, which unlike error will give source coordinates, but I looked at the implementation and couldn't make head or tail of it.
To follow up on my earlier comment, I realised that there is a way to acknowledge the missing case but still get a useful error with file/line number. It's not ideal, though, as it'll only appear in unoptimized builds.
...
[]:xs -> assert False (error "unreachable because I know everything")
The type system is your friend, and the warning is letting you know your function has cracks. The very best approach is to go for a cleaner, more elegant fit between types.
Consider GHC's definition of groupBy:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _ [] = []
groupBy eq (x:xs) = (x:ys) : groupBy eq zs
  where (ys,zs) = span (eq x) xs
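One possible reading of "a cleaner fit between types", offered here only as a sketch and not as what this answer's author had in mind, is to keep the foldr from the question but store each group as a NonEmpty list from Data.List.NonEmpty. The "impossible" empty group then cannot be constructed, and the case expression is exhaustive without an error branch. The names gbGroupBy and groupsAsLists are assumptions.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- Same folding strategy as the question's gb_groupBy, but each group
-- is non-empty by construction.
gbGroupBy :: (a -> a -> Bool) -> [a] -> [NonEmpty a]
gbGroupBy p = foldr step []
  where
    step item acc = case acc of
      [] -> [item :| []]
      (x :| xs) : ys
        | p x item -> (item :| (x : xs)) : ys
        | otherwise -> (item :| []) : acc

-- Convenience wrapper that converts the groups back to plain lists.
groupsAsLists :: (a -> a -> Bool) -> [a] -> [[a]]
groupsAsLists p = map NE.toList . gbGroupBy p
```

groupsAsLists (==) [1, 1, 2, 3, 3] gives [[1,1],[2],[3,3]]. This version still uses foldr the same way as the question's code, so the remarks above about infinite and very long lists continue to apply.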
My point of view is that an impossible case is undefined.
If it's undefined we have a function for it: the cunningly named undefined.
Complete your matching with the likes of:
_ -> undefined
And there you have it!
