Using pairs in list comprehension gives "redundant clause" warning

I get an odd-looking warning when I use the (_,_) pattern in a list comprehension. My minimal working example is as follows.
theory Misc imports
Main
"~~/src/HOL/Library/Code_Target_Numeral"
begin
definition "xys = [(1::int,2::int),(2,3),(3,4)]"
value "[x+4. (x,_) ← xys]"
end
Everything seems to work fine, but I get the warning
The following clauses are redundant (covered by preceding clauses):
x ⇒ []
Should I be worried?

No need to be worried.
The syntax
[a+4. (a, b) ← xys]
is internally translated to
concat (map (λx. case x of (a, b) ⇒ [a + 4] | _ ⇒ []) xys)
(see the section "List comprehension" of HOL/List.thy).
What is happening here is that if your pattern (a, b) matches, it will be mapped to your expression a + 4 (the first half of the case expression). If it doesn't match, it will be dropped from the output list (the second half of the case expression). In your case, the pattern (a, b) will always match, meaning that the second half is redundant---hence the warning.
In the ideal case, the list comprehension implementation would be modified to not generate the second half of the case expression if the pattern entered by the user will always match. Until this is done, it is safe to just ignore the warning.
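The same kind of translation can be sketched in Haskell rather than Isabelle, purely as an analogy. Here the pattern `Just a` can actually fail, so the fallback clause does real work; with a pair pattern it could never fire, which is what the warning points out. The names `xys`, `viaComprehension` and `viaCase` are made up for this illustration.

```haskell
-- Hypothetical sample data; a refutable pattern (Just a) is used on purpose.
xys :: [Maybe Int]
xys = [Just 1, Nothing, Just 3]

-- Comprehension form: elements that do not match the pattern are skipped.
viaComprehension :: [Int]
viaComprehension = [a + 4 | Just a <- xys]

-- Hand-written translation in the concat/map/case shape described above.
viaCase :: [Int]
viaCase = concat (map (\x -> case x of { Just a -> [a + 4]; _ -> [] }) xys)
```

Both definitions evaluate to the same list, because the `_ ⇒ []` style clause only drops the elements the pattern rejects.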

Related

Is whitespace either used as a function application operator or a word separator

How does outermost evaluation work on an application of a curried function? says:
in Haskell, whitespace is an operator: it applies the lhs function to the rhs argument.
Is it true? I can't find it in documents.
When a Haskell compiler lexically analyzes a Haskell program, is whitespace recognized as a function application operator or as a token separator?
I’ve never heard anyone say that whitespace is an operator before. I suppose you could consider it to be an operator in the context of a function application, but in most contexts it is not an operator. For instance, I don’t see any way to consider whitespace as an operator in the following code sample, where whitespace is used only to separate tokens:
module Main where
x = "test 1"
y = "test 2"
main = do
  (z : zs) <- getLine
  putStrLn $ z : (x ++ y ++ zs)
It seems fairly obvious here that whitespace is acting purely as a token separator. The apparent ‘operator-ness’ in something like f x y z is best thought of as a rule: if two expressions are placed next to each other, the first is applied to the second. For instance, putStrLn"xxx" and putStrLn "xxx" both apply putStrLn to "xxx"; the space is completely irrelevant.
EDIT: In a comment, @DanielWagner provided two great examples. Firstly, (f)x is the same as f x, yet has no whitespace; here we see confirmation that the space is acting purely as a token separator, so it can be replaced by a bracket (which also separates tokens) without any impact on the lexemes of the expression. Secondly, f {x=y} does not apply f to {x=y}, but rather uses record syntax to create a new record based on f; again, we can remove the space to get f{x=y}, which does an equally good job of separating the lexemes.
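A small sketch of both points, assuming a made-up record type `Point` with fields `px` and `py` (none of these names come from the question):

```haskell
-- Hypothetical record type used only for this illustration.
data Point = Point { px :: Int, py :: Int } deriving (Eq, Show)

origin :: Point
origin = Point { px = 0, py = 0 }

-- Record update, not function application: a new Point based on origin.
moved :: Point
moved = origin {px = 3}

-- The same update with the space removed; the braces separate the tokens.
movedNoSpace :: Point
movedNoSpace = origin{px = 3}

-- Application by juxtaposition, with a space and with brackets instead.
lenSpaced, lenBracketed :: Int
lenSpaced = length "abc"
lenBracketed = (length)"abc"
```

In each pair the two spellings denote the same value, which is consistent with whitespace only separating tokens.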
In most cases whitespace acts as "function application": it applies the function on the left to the argument on the right, just like the ($) operator. ($) can be used to make your code clearer. Some examples:
plusOne = (1 +)
you can either do
plusOne 2
or
plusOne $ 2
:t ($)
($) :: (a -> b) -> a -> b
I forgot a useful example:
Imagine you want to keep only the elements greater than 3, but first you want to add one to each element:
filter (>3) $ map plusOne [1,2,3,4]
That will compile, but this won't:
filter (>3) map plusOne [1,2,3,4]
But in other cases it is not function application, as @bradrn's answer and @DanielWagner's comment show.
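As a quick check of the filter example, here is a sketch that spells out the same expression with ($) and with explicit parentheses. The names `viaDollar` and `viaParens` are invented for the comparison.

```haskell
plusOne :: Int -> Int
plusOne = (1 +)

-- ($) has the lowest precedence, so everything to its right is the argument.
viaDollar :: [Int]
viaDollar = filter (> 3) $ map plusOne [1, 2, 3, 4]

-- The same expression with the grouping written out by hand.
viaParens :: [Int]
viaParens = filter (> 3) (map plusOne [1, 2, 3, 4])
```

Without either ($) or the parentheses, application associates to the left, so filter would receive map itself as its list argument and the expression would be rejected by the type checker.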

Strange tilde syntax

GHC accepts this code, but it ought to be illegal syntax(?) Any guesses as to what's going on?
module Tilde where
~ x = x + 2 -- huh?
~ x +++ y = y * 3 -- this makes sense
The (+++) equation makes sense: it's declaring an operator, using infix syntax, and using an irrefutable pattern match on the first argument.
The first 'equation' looks the same to start with. But there's no operator. If I ask
λ> :i ~
===> <interactive>:1:1: error: parse error on input `~'
λ> :i (~)
===> class (a ~ b) => (~) (a :: k) (b :: k)
-- Defined in `Data.Type.Equality'
instance [incoherent] forall k (a :: k) (b :: k). (a ~ b) => a ~ b
-- Defined in `Data.Type.Equality'
which is a bemusing discovery, but nothing to do with it(?) I can't define my own class or operator (~) -- Illegal binding of built-in syntax, not surprisingly.
Oh:
λ> :i x
===> x :: Integer -- GHCi defaulting, presumably
and trying to run x loops for ever. So the strangeness is actually defining
x = x + 2
Then what's the ~ doing?
The tilde is doing exactly what it did in your other example: it makes the pattern irrefutable (so the pattern match can not fail). The pattern already was irrefutable, of course, in both cases (being a plain variable, which always matches), but that doesn't make the tilde illegal, just unnecessary.
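To see a case where the tilde does change behaviour, here is a minimal sketch with two hypothetical functions, `strictMatch` and `lazyMatch`, that differ only in the tilde on a tuple pattern:

```haskell
-- The tuple pattern forces the argument to weak head normal form.
strictMatch :: (Int, Int) -> Int
strictMatch (_, _) = 0

-- The irrefutable pattern never inspects the argument unless a bound
-- variable is used, and here none is.
lazyMatch :: (Int, Int) -> Int
lazyMatch ~(_, _) = 0
```

Under these definitions `lazyMatch undefined` returns 0, while `strictMatch undefined` raises an exception when its result is demanded. For a plain variable pattern such as x there is nothing to force, which is why the tilde is redundant there.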
Writing
x = 5
creates a global variable named x, bound to the value 5. Adding a tilde makes the pattern match irrefutable, but it was already irrefutable, so that doesn't make much sense. But it's legal to write something like
(xs, ys) = span odd [1..10]
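This defines two global variables, xs and ys. Because span stops at the first element that fails the predicate, xs is the longest prefix of odd numbers ([1]) and ys is the rest of the list ([2..10]); to separate all the odd numbers from all the even ones you would use partition instead. You could even make this irrefutable if you want by adding a tilde. Of course, this pattern can't fail (if the expression is well-typed), so there's no point to that. But consider: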
~(x:xs) = filter odd [1..10]
This defines two global variables, x and xs, if the filter returns at least one result. If the filter were to return zero results, the pattern match would fail. (In practice, this means that accessing x or xs would throw a pattern match failure exception.)
You can even write utterly bizarre stuff like
False = True
This seemingly nonsensical declaration pattern-matches the pattern False against the value True, and does nothing either way. It's one of those obscure corners of the language.
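The tuple pattern binding can be tried out directly. This sketch binds the results of `span odd` and, for comparison, `partition odd`; the variable names are invented for the illustration.

```haskell
import Data.List (partition)

-- span splits at the first element that fails the predicate.
prefixOdds, rest :: [Int]
(prefixOdds, rest) = span odd [1 .. 10]

-- partition separates every element by the predicate.
odds, evens :: [Int]
(odds, evens) = partition odd [1 .. 10]
```

Both are ordinary top-level pattern bindings; they are matched lazily, so nothing is evaluated until one of the bound names is used.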

Capitalize Every Other Letter in a String -- take / drop versus head / tail for Lists

I have spent the past afternoon or two poking at my computer as if I had never seen one before. Today's topic: Lists
The exercise is to take a string and capitalize every other letter. I did not get very far...
Let's take a list x = String.toList "abcde" and try to analyze it. If we add the results of take 1 and drop 1 we get back the original list
> x = String.toList "abcde"
['a','b','c','d','e'] : List Char
> (List.take 1 x) ++ (List.drop 1 x)
['a','b','c','d','e'] : List Char
I thought head and tail did the same thing, but I get a big error message:
> [List.head x] ++ (List.tail x)
==================================== ERRORS ====================================
-- TYPE MISMATCH --------------------------------------------- repl-temp-000.elm
The right argument of (++) is causing a type mismatch.
7│ [List.head x] ++ (List.tail x)
^^^^^^^^^^^
(++) is expecting the right argument to be a:
List (Maybe Char)
But the right argument is:
Maybe (List Char)
Hint: I always figure out the type of the left argument first and if it is
acceptable on its own, I assume it is "correct" in subsequent checks. So the
problem may actually be in how the left and right arguments interact.
The error message tells me a lot of what's wrong. Not 100% sure how I would fix it. The list joining operator ++ is expecting [Maybe Char] and instead got Maybe [Char]
Let's just try to capitalize the first letter in a string (which is less cool, but actually realistic).
[String.toUpper ( List.head x)] ++ (List.drop 1 x)
This is wrong since String.toUpper requires a String and instead List.head x is a Maybe Char.
[Char.toUpper ( List.head x)] ++ (List.drop 1 x)
This is also wrong since Char.toUpper requires a Char instead of a Maybe Char.
In real life a user could break a script like this by typing a non-ASCII character (like an emoji). So maybe Elm's feedback is right. This should be an easy problem: it takes "abcde" and turns it into "AbCdE" (or possibly "aBcDe"). How do I handle errors properly?
The same question in JavaScript: How do I make the first letter of a string uppercase in JavaScript?
In Elm, List.head and List.tail both return the Maybe type because either function could be passed an invalid value; specifically, the empty list. Some languages, like Haskell, throw an error when passing an empty list to head or tail, but Elm tries to eliminate as many runtime errors as possible.
Because of this, you must explicitly handle the exceptional case of the empty list if you choose to use head or tail.
Note: There are probably better ways to achieve your end goal of string mixed capitalization, but I'll focus on the head and tail issue because it's a good learning tool.
Since you're using the concatenation operator, ++, you'll need a List for both arguments, so it's safe to say you could create a couple functions that handle the Maybe return values and translate them to an empty list, which would allow you to use your concatenation operator.
myHead list =
    case List.head list of
        Just h -> [h]
        Nothing -> []
myTail list =
    case List.tail list of
        Just t -> t
        Nothing -> []
Using the case statements above, you can handle all possible outcomes and map them to something usable for your circumstances. Now you can swap myHead and myTail into your code and you should be all set.
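For comparison with the Haskell behaviour mentioned above, here is a rough Haskell sketch (not Elm) of the same idea: handle the empty case explicitly instead of calling a partial head. The function names are hypothetical, and the last one is just one possible approach to the original exercise.

```haskell
import Data.Char (toUpper)
import Data.Maybe (listToMaybe)

-- A total head: Nothing for the empty list, like Elm's List.head.
safeHead :: [a] -> Maybe a
safeHead = listToMaybe

-- Pattern matching covers the empty string explicitly.
capitalizeFirst :: String -> String
capitalizeFirst s = case s of
  []     -> []
  c : cs -> toUpper c : cs

-- Alternate toUpper and id across the characters: "abcde" becomes "AbCdE".
capitalizeEveryOther :: String -> String
capitalizeEveryOther = zipWith ($) (cycle [toUpper, id])
```

Because zipWith stops at the shorter list, the empty string needs no special case in `capitalizeEveryOther`.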

One interesting pattern

I'm solving 99 Haskell Problems. I've successfully solved problem No. 21, and when I opened the solution page, the following solution was proposed:
Insert an element at a given position into a list.
insertAt :: a -> [a] -> Int -> [a]
insertAt x xs (n+1) = let (ys,zs) = split xs n in ys++x:zs
I found pattern (n + 1) interesting, because it seems to be an elegant way to convert 1-based argument of insertAt into 0-based argument of split (it's function from previous exercises, essentially the same as splitAt). The problem is that GHC did not find this pattern that elegant, in fact it says:
Parse error in pattern: n + 1
I don't think that the guy who wrote the answer was dumb and I would like to know if this kind of pattern is legal in Haskell, and if it is, how to fix the solution.
I believe it has been removed from the language: n+k patterns were dropped in Haskell 2010. They were likely still around when the author of 99 Haskell Problems wrote that solution, but they are no longer in Haskell.
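One way to fix the solution without n+k patterns is to bind the position directly and subtract one by hand. This is a sketch that uses the standard `splitAt` in place of the `split` helper from the earlier exercises:

```haskell
-- Insert x at the 1-based position n.
insertAt :: a -> [a] -> Int -> [a]
insertAt x xs n =
  let (ys, zs) = splitAt (n - 1) xs -- convert to a 0-based split point
  in ys ++ x : zs
```

For example, `insertAt 'X' "abcd" 2` gives "aXbcd". One difference from the original: the (n+1) pattern only matched positions of at least 1, whereas this version accepts any Int and inserts at the front when n is less than 1.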
The problem with n+k patterns goes back to a design decision in Haskell, to distinguish between constructors and variables in patterns by the first character of their names. If you go back to ML, a common function definition might look like (using Haskell syntax)
map f nil = nil
map f (x:xn) = f x : map f xn
As you can see, syntactically there's no difference between f and nil on the LHS of the first line, but they have different roles; f is a variable that needs to be bound to the first argument to map while nil is a constructor that needs to be matched against the second. Now, ML makes this distinction by looking each name up in the surrounding scope, and assuming names are variables when the look-up fails. So nil is recognized as a constructor because the lookup succeeds. But consider what happens when there's a typo in the pattern:
map f niil = nil
(note the two i's in niil). niil isn't a constructor name in scope, so it gets treated as a variable, and the definition is silently interpreted incorrectly.
Haskell's solution to this problem is to require constructor names to begin with uppercase letters, and variable names to begin with lowercase letters. And, for infix operators / constructors, constructor names must begin with : while operator names may not begin with :. This also helps distinguish between deconstructing bindings and function definitions:
x:xn = ...
is clearly a deconstructing binding, because you can't define a function named :, while
n - m = ...
is clearly a function definition, because - can't be a constructor name.
But allowing n+k patterns, like n+1, means that + is both a valid function name, and something that works like a constructor in patterns. Now
n + 1 = ...
is ambiguous again; it could be part of the definition of a function named (+), or it could be a deconstructing pattern match definition of n. In Haskell 98, this ambiguity was solved by declaring
n + 1 = ...
a function definition, and
(n + 1) = ...
a deconstructing binding. But that obviously was never a satisfactory solution.
Note that you can now use view patterns instead of n+1.
For example:
{-# LANGUAGE ViewPatterns #-}
module Temp where
import Data.List (splitAt)
split :: [a] -> Int -> ([a], [a])
split = flip splitAt
insertAt :: a -> [a] -> Int -> [a]
insertAt x xs (subtract 1 -> n) = let (ys,zs) = split xs n in ys++x:zs

Is it recommended to always have exhaustive pattern matches in Haskell, even for "impossible" cases?

For example, in the following code, I am pattern matching on the "accumulator" of a foldr. I am in complete control of the contents of the accumulator, because I create it (it is not passed to me as input, but rather built within my function). Therefore, I know certain patterns should never match it. If I strive to never get the "Pattern match(es) are non-exhaustive" error, then I would add a pattern match for it that simply calls error with the message "This pattern should never happen." Much like an assert in C#. I can't think of anything else to do there.
What practice would you recommend in this situation and why?
Here's the code:
gb_groupBy p input = foldr step [] input
  where
    step item acc = case acc of
      []          -> [[item]]
      ((x:xs):ys) -> if p x item
                       then (item:x:xs):ys
                       else [item]:acc
The pattern not matched (as reported by the interpreter) is:
Warning: Pattern match(es) are non-exhaustive
In a case alternative: Patterns not matched: [] : _
This is probably more a matter of style than anything else. Personally, I would put in a
_ -> error "Impossible! Empty list in step"
if only to silence the warning :)
You can resolve the warning in this special case by doing this:
gb_groupBy p input = foldr step [] input
  where
    step item acc = case acc of
      []       -> [[item]]
      (xs:xss) -> if p (head xs) item
                    then (item:xs):xss
                    else [item]:acc
The pattern matching is then complete, and the "impossible" condition of an empty list at the head of the accumulator would cause a runtime error but no warning.
Another way of looking at the more general problem of incomplete pattern matchings is to see them as a "code smell", i.e. an indication that we're trying to solve a problem in a suboptimal, or non-Haskellish, way, and try to rewrite our functions.
Implementing groupBy with a foldr makes it impossible to apply it to an infinite list, which is a design goal that the Haskell List functions try to achieve wherever semantically reasonable. Consider
take 5 $ groupBy (==) someFunctionDerivingAnInfiniteList
If the first 5 groups w.r.t. equality are finite, lazy evaluation will terminate. This is something you can't do in a strictly evaluated language. Even if you don't work with infinite lists, writing functions like this will yield better performance on long lists, or avoid the stack overflow that occurs when evaluating expressions like
take 5 $ gb_groupBy (==) [1..1000000]
In List.hs, groupBy is implemented like this:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _ [] = []
groupBy eq (x:xs) = (x:ys) : groupBy eq zs
  where (ys,zs) = span (eq x) xs
This enables the interpreter/compiler to evaluate only the parts of the computation necessary for the result.
span yields a pair of lists, where the first consists of (consecutive) elements from the head of the list all satisfying a predicate, and the second is the rest of the list. It's also implemented to work on infinite lists.
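The laziness claim can be checked with a small sketch that uses the library `groupBy` on an infinite list built with `cycle`. The name `firstGroups` is made up for this example.

```haskell
import Data.List (groupBy)

-- cycle produces the infinite list 1,1,2,1,1,2,...; only the first
-- three groups are ever computed.
firstGroups :: [[Int]]
firstGroups = take 3 (groupBy (==) (cycle [1, 1, 2]))
```

This evaluates to [[1,1],[2],[1,1]] and terminates, because each group is finite and span only consumes as much of the input as it needs.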
I find exhaustiveness checking on case patterns indispensable. I try never to use _ in a case at top level, because _ matches everything, and by using it you vitiate the value of exhaustiveness checking. This is less important with lists but critically important with user-defined algebraic data types, because I want to be able to add a new constructor and have the compiler barf on all the missing cases. For this reason I always compile with -Werror turned on, so there is no way I can leave out a case.
As observed, your code can be extended with this case
[] : _ -> error "this can't happen"
Internally, GHC has a panic function, which unlike error will give source coordinates, but I looked at the implementation and couldn't make head or tail of it.
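The point about avoiding _ on user-defined types can be illustrated with a small sketch. The `Shape` type and `area` function are hypothetical, and the pragma simply switches on GHC's incomplete-pattern warning for this file.

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

-- Hypothetical algebraic data type for the illustration.
data Shape = Circle Double | Square Double

-- Every constructor is listed and there is no wildcard, so adding a third
-- constructor to Shape makes GHC warn about this case expression.
area :: Shape -> Double
area s = case s of
  Circle r -> pi * r * r
  Square a -> a * a
```

With -Werror added on top, that warning becomes a compile error, which is the behaviour described above.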
To follow up on my earlier comment, I realised that there is a way to acknowledge the missing case but still get a useful error with file/line number. It's not ideal as it'll only appear in unoptimized builds, though (see here).
...
[]:xs -> assert False (error "unreachable because I know everything")
The type system is your friend, and the warning is letting you know your function has cracks. The very best approach is to go for a cleaner, more elegant fit between types.
Consider ghc's definition of groupBy:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _ [] = []
groupBy eq (x:xs) = (x:ys) : groupBy eq zs
  where (ys,zs) = span (eq x) xs
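Another way to get a closer fit between types, if you want to keep the foldr structure, is to make "a group is never empty" part of the type. This is a sketch under that assumption, using `NonEmpty` from base; the name `groupByNE` is invented here and the comparison order `p x item` follows the question's code.

```haskell
import Data.List.NonEmpty (NonEmpty (..))

-- Each group is a NonEmpty list, so an empty group at the head of the
-- accumulator cannot be represented and needs no "impossible" case.
groupByNE :: (a -> a -> Bool) -> [a] -> [NonEmpty a]
groupByNE p = foldr step []
  where
    step item [] = [item :| []]
    step item acc@((x :| xs) : ys)
      | p x item  = (item :| x : xs) : ys
      | otherwise = (item :| []) : acc
```

Both clauses of `step` together cover every possible accumulator, so the exhaustiveness checker has nothing to complain about. Like the original, this version is still a foldr and so does not work on infinite lists.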
My point of view is that an impossible case is undefined.
If it's undefined we have a function for it: the cunningly named undefined.
Complete your matching with the likes of:
_ -> undefined
And there you have it!

Resources