Can GHC warn if class instance is a loop?

Can GHC warn if class instance is a loop? - haskell

Real World Haskell has this example:
class BasicEq3 a where
isEqual3 :: a -> a -> Bool
isEqual3 x y = not (isNotEqual3 x y)
isNotEqual3 :: a -> a -> Bool
isNotEqual3 x y = not (isEqual3 x y)
instance BasicEq3 Bool
And when I run it in GHCI:
#> isEqual3 False False
out of memory
So, you have to implement at least one of the 2 methods or it will loop. And you get the flexibility of choosing which one, which is neat.
The question I have is, is there a way to get a warning or something if didn't override enough of the defaults and the defaults form a loop? It seems strange to me that the compiler that is so crazy smart is fine with this example.

I think it's perfectly fine for GHC to issue a warning in case of an "unbroken" cyclic dependency. There's even a ticket along those lines: http://hackage.haskell.org/trac/ghc/ticket/6028
Just because something is "undecidable" doesn't mean no instance of the problem can be solved effectively. GHC (or any other Haskell compiler) already has quite a bit of the information it needs, and it'd be perfectly possible for it to issue a warning if the user is instantiating a class without "breaking" the cyclic dependency. And if the compiler gets it wrong in the rare cases as exemplified in previous posts, then the user can have a -nowarnundefinedcyclicmethods or a similar mechanism to tell GHC to be quiet. In nearly every other case, the warning will be most welcome and would add to programmer productivity; avoiding what's almost always a silly bug.

No, I'm afraid GHC doesn't do anything like that. Also that isn't possible in general.
You see, the methods of a type class could be mutually recursive in a useful way. Here's a contrived example of such a type class. It's perfectly fine to not define either sumOdds or sumEvens, even though their default implementations are in terms of each other.
class Weird a where
measure :: a -> Int
sumOdds :: [a] -> Int
sumOdds [] = 0
sumOdds (_:xs) = sumEvens xs
sumEvens :: [a] -> Int
sumEvens [] = 0
sumEvens (x:xs) = measure x + sumOdds xs

No, there isn't, since if the compiler could make this determination, that would be equivalent to solving the Halting Problem. In general, the fact that two functions call each other in a "loop" pattern is not enough to conclude that actually calling one of the functions will result in a loop.
To use a (contrived) example,
collatzOdd :: Int -> Int
collatzOdd 1 = 1
collatzOdd n = let n' = 3*n+1 in if n' `mod` 2 == 0 then collatzEven n'
else collatzOdd n'
collatzEven :: Int -> Int
collatzEven n = let n' = n `div` 2 in if n' `mod` 2 == 0 then collatzEven n'
else collatzOdd n'
collatz :: Int -> Int
collatz n = if n `mod` 2 == 0 then collatzEven n else collatzOdd n
(This is of course not the most natural way to implement the Collatz conjecture, but it illustrates mutually recursive functions.)
Now collatzEven and collatzOdd depend on each other, but the Collatz conjecture states that calling collatz terminates for all positive n. If GHC could determine whether a class that had collatzOdd and collatzEven as default definitions had a complete definition or not, then GHC would be able to solve the Collatz conjecture! (This is of course not a proof of the undecideability of the Halting Problem, but it should illustrate why determining whether a mutually recursive set of functions is well-defined is not at all as trivial as it may seem.)
In general, since GHC cannot determine this automatically, documentation for Haskell classes will give the "minimal complete definition" necessary for creating an instance of a class.

I don't think so. I worry that you're expecting the compiler to solve the halting problem! Just because two functions are defined in terms of each other, doesn't mean it's a bad default class. Also, I've used classes in the past where I just needed to write instance MyClass MyType to add useful functionality. So asking the compiler to warn you about that class is asking it to complain about other, valid code.
[Of course, use ghci during development and test every function after you've written it!
Use HUnit and/or QuickCheck, just to make sure none of this stuff ends up in final code.]

In my personal opinion, the defaulting mechanism is unnecessary and unwise. It would be easy for the class author to just provide the defaults as ordinary functions:
notEq3FromEq3 :: (a -> a -> Bool) -> (a -> a -> Bool)
notEq3FromEq3 eq3 = (\x y -> not (eq3 x y))
eq3FromNotEq3 :: (a -> a -> Bool) -> (a -> a -> Bool)
eq3FromNotEq3 ne3 = (\x y -> not (ne3 x y))
(In fact, these two definitions are equal, but that would not be true in general). Then an instance looks like:
instance BasicEq3 Bool where
isEqual3 True True = True
isEqual3 False False = True
isEqual3 _ _ = False
isNotEqual3 = notEq3FromEq3 isEqual3
and no defaults are required. Then GHC can warn you if you don't provide the definition, and any unpleasant loops have to be explicitly written by you into your code.
This does remove the neat ability to add new methods to a class with default definitions without affecting existing instances, but that's not so huge a benefit in my view. The above approach is also in principle more flexible: you could e.g. provide functions that allowed an Ord instance to choose any comparison operator to implement.

Related

Pattern match over non-constructor functions

One of the most powerful ways pattern matching and lazy evaluation can come together is to bypass expensive computation. However I am still shocked that Haskell only permits the pattern matching of constructors, which is barely pattern matching at all!
Is there some way to impliment the following functionality in Haskell:
exp :: Double -> Double
exp 0 = 1
exp (log a) = a
--...
log :: Double -> Double
log 1 = 0
log (exp a) = a
--...
The original problem I found this useful in was writing an associativity preference / rule in a Monoid class:
class Monoid m where
iden :: m
(+) m -> m -> m
(+) iden a = a
(+) a iden = a
--Line with issue
(+) ((+) a b) c = (+) a ((+) b c)

There's no reason to be shocked about this. How would it be even remotely feasible to pattern match on arbitrary functions? Most functions aren't invertible, and even for those that are it is typically nontrivial to actually compute the inverses.
Of course the compiler could in principle handle trivial examples like replacing literal exp (log x) with x, but that would be almost completely useless in practice (in the unlikely event somebody were to literally write that, they could as well reduce it right there in the source), and would generally lead to very weird unpredictable behaviour if inlining order changes whether or not the compiler can see that a match applies.
(There is however a thing called rewrite rules, which is similar to what you proposed but is seen as only an optimisation tool.)
Even the two lines from the Monoid class that don't error don't make sense, but for different reasons. First, when you write
(+) iden a = a
(+) a iden = a
this doesn't do what you seem to think. These are actually two redundant catch-call clauses, equivalent to
(+) x y = y
(+) x y = x
...which is an utterly nonsensical thing to write. What you want to state could in fact be written as
default (+) :: Eq a => a -> a -> a
x+y
| x==iden = y
| y==iden = x
| otherwise = ...
...but this still doesn't accomplish anything useful, because this is never going to be a full definition of +. And as soon as a concrete instance even begins to define its own + operator, the complete default one is going to be ignored.
Moreover, if you were to have these kind of clauses all over your Haskell project it would in practice just mean your performing a lot of unnecessary, redundant extra checks. A law-abiding Monoid instance needs to fulfill mempty <> a ≡ a anyway, no point explicitly special-casing it.
I think what you really want is tests. It would make sense to specify laws right in a class declaration in a way that they could automatically be checked, but standard Haskell has no syntax for this. Most projects just do it in a separate test suite, using QuickCheck to generate example inputs. I think there's also a tool that allow you to put the test cases right in your source file, but I forgot what it's called.

Besides as-pattern, what else can # mean in Haskell?

I am studying Haskell currently and try to understand a project that uses Haskell to implement cryptographic algorithms. After reading Learn You a Haskell for Great Good online, I begin to understand the code in that project. Then I found I am stuck at the following code with the "#" symbol:
-- | Generate an #n#-dimensional secret key over #rq#.
genKey :: forall rq rnd n . (MonadRandom rnd, Random rq, Reflects n Int)
=> rnd (PRFKey n rq)
genKey = fmap Key $ randomMtx 1 $ value #n
Here the randomMtx is defined as follows:
-- | A random matrix having a given number of rows and columns.
randomMtx :: (MonadRandom rnd, Random a) => Int -> Int -> rnd (Matrix a)
randomMtx r c = M.fromList r c <$> replicateM (r*c) getRandom
And PRFKey is defined below:
-- | A PRF secret key of dimension #n# over ring #a#.
newtype PRFKey n a = Key { key :: Matrix a }
All information sources I can find say that # is the as-pattern, but this piece of code is apparently not that case. I have checked the online tutorial, blogs and even the Haskell 2010 language report at https://www.haskell.org/definition/haskell2010.pdf. There is simply no answer to this question.
More code snippets can be found in this project using # in this way too:
-- | Generate public parameters (\( \mathbf{A}_0 \) and \(
-- \mathbf{A}_1 \)) for #n#-dimensional secret keys over a ring #rq#
-- for gadget indicated by #gad#.
genParams :: forall gad rq rnd n .
(MonadRandom rnd, Random rq, Reflects n Int, Gadget gad rq)
=> rnd (PRFParams n gad rq)
genParams = let len = length $ gadget #gad #rq
n = value #n
in Params <$> (randomMtx n (n*len)) <*> (randomMtx n (n*len))
I deeply appreciate any help on this.

That #n is an advanced feature of modern Haskell, which is usually not covered by tutorials like LYAH, nor can be found the the Report.
It's called a type application and is a GHC language extension. To understand it, consider this simple polymorphic function
dup :: forall a . a -> (a, a)
dup x = (x, x)
Intuitively calling dup works as follows:
the caller chooses a type a
the caller chooses a value x of the previously chosen type a
dup then answers with a value of type (a,a)
In a sense, dup takes two arguments: the type a and the value x :: a. However, GHC is usually able to infer the type a (e.g. from x, or from the context where we are using dup), so we usually pass only one argument to dup, namely x. For instance, we have
dup True :: (Bool, Bool)
dup "hello" :: (String, String)
...
Now, what if we want to pass a explicitly? Well, in that case we can turn on the TypeApplications extension, and write
dup #Bool True :: (Bool, Bool)
dup #String "hello" :: (String, String)
...
Note the #... arguments carrying types (not values). Those are something that exists at compile time, only -- at runtime the argument does not exist.
Why do we want that? Well, sometimes there is no x around, and we want to prod the compiler to choose the right a. E.g.
dup #Bool :: Bool -> (Bool, Bool)
dup #String :: String -> (String, String)
...
Type applications are often useful in combination with some other extensions which make type inference unfeasible for GHC, like ambiguous types or type families. I won't discuss those, but you can simply understand that sometimes you really need to help the compiler, especially when using powerful type-level features.
Now, about your specific case. I don't have all the details, I don't know the library, but it's very likely that your n represents a kind of natural-number value at the type level. Here we are diving in rather advanced extensions, like the above-mentioned ones plus DataKinds, maybe GADTs, and some typeclass machinery. While I can't explain everything, hopefully I can provide some basic insight. Intuitively,
foo :: forall n . some type using n
takes as argument #n, a kind-of compile-time natural, which is not passed at runtime. Instead,
foo :: forall n . C n => some type using n
takes #n (compile-time), together with a proof that n satisfies constraint C n. The latter is a run-time argument, which might expose the actual value of n. Indeed, in your case, I guess you have something vaguely resembling
value :: forall n . Reflects n Int => Int
which essentially allows the code to bring the type-level natural to the term-level, essentially accessing the "type" as a "value". (The above type is considered an "ambiguous" one, by the way -- you really need #n to disambiguate.)
Finally: why should one want to pass n at the type level if we then later on convert that to the term level? Wouldn't be easier to simply write out functions like
foo :: Int -> ...
foo n ... = ... use n
instead of the more cumbersome
foo :: forall n . Reflects n Int => ...
foo ... = ... use (value #n)
The honest answer is: yes, it would be easier. However, having n at the type level allows the compiler to perform more static checks. For instance, you might want a type to represent "integers modulo n", and allow adding those. Having
data Mod = Mod Int -- Int modulo some n
foo :: Int -> Mod -> Mod -> Mod
foo n (Mod x) (Mod y) = Mod ((x+y) `mod` n)
works, but there is no check that x and y are of the same modulus. We might add apples and oranges, if we are not careful. We could instead write
data Mod n = Mod Int -- Int modulo n
foo :: Int -> Mod n -> Mod n -> Mod n
foo n (Mod x) (Mod y) = Mod ((x+y) `mod` n)
which is better, but still allows to call foo 5 x y even when n is not 5. Not good. Instead,
data Mod n = Mod Int -- Int modulo n
-- a lot of type machinery omitted here
foo :: forall n . SomeConstraint n => Mod n -> Mod n -> Mod n
foo (Mod x) (Mod y) = Mod ((x+y) `mod` (value #n))
prevents things to go wrong. The compiler statically checks everything. The code is harder to use, yes, but in a sense making it harder to use is the whole point: we want to make it impossible for the user to try adding something of the wrong modulus.
Concluding: these are very advanced extensions. If you're a beginner, you will need to slowly progress towards these techniques. Don't be discouraged if you can't grasp them after only a short study, it does take some time. Make a small step at a time, solve some exercises for each feature to understand the point of it. And you'll always have StackOverflow when you are stuck :-)

Haskell: Typeclass vs passing a function

To me it seems that you can always pass function arguments rather than using a typeclass. For example rather than defining equality typeclass:
class Eq a where
(==) :: a -> a -> Bool
And using it in other functions to indicate type argument must be an instance of Eq:
elem :: (Eq a) => a -> [a] -> Bool
Can't we just define our elem function without using a typeclass and instead pass a function argument that does the job?

Yes. This is called "dictionary passing style". Sometimes when I am doing some especially tricky things, I need to scrap a typeclass and turn it into a dictionary, because dictionary passing is more powerful1, yet often quite cumbersome, making conceptually simple code look quite complicated. I use dictionary passing style sometimes in languages that aren't Haskell to simulate typeclasses (but have learned that that is usually not as great an idea as it sounds).
Of course, whenever there is a difference in expressive power, there is a trade-off. While you can use a given API in more ways if it is written using DPS, the API gets more information if you can't. One way this shows up in practice is in Data.Set, which relies on the fact that there is only one Ord dictionary per type. The Set stores its elements sorted according to Ord, and if you build a set with one dictionary, and then inserted an element using a different one, as would be possible with DPS, you could break Set's invariant and cause it to crash. This uniqueness problem can be mitigated using a phantom existential type to mark the dictionary, but, again, at the cost of quite a bit of annoying complexity in the API. This also shows up in pretty much the same way in the Typeable API.
The uniqueness bit doesn't come up very often. What typeclasses are great at is writing code for you. For example,
catProcs :: (i -> Maybe String) -> (i -> Maybe String) -> (i -> Maybe String)
catProcs f g = f <> g
which takes two "processors" which take an input and might give an output, and concatenates them, flattening away Nothing, would have to be written in DPS something like this:
catProcs f g = (<>) (funcSemi (maybeSemi listSemi)) f g
We essentially had to spell out the type we're using it at again, even though we already spelled it out in the type signature, and even that was redundant because the compiler already knows all the types. Because there's only one way to construct a given Semigroup at a type, the compiler can do it for you. This has a "compound interest" type effect when you start defining a lot of parametric instances and using the structure of your types to compute for you, as in the Data.Functor.* combinators, and this is used to great effect with deriving via where you can essentially get all the "standard" algebraic structure of your type written for you.
And don't even get me started on MPTC's and fundeps, which feed information back into typechecking and inference. I have never tried converting such a thing to DPS -- I suspect it would involve passing around a lot of type equality proofs -- but in any case I'm sure it would be a lot more work for my brain than I would be comfortable with.
--
1Unless you use reflection in which case they become equivalent in power -- but reflection can also be cumbersome to use.

Yes. That (called dictionary passing) is basically what the compiler does to typeclasses anyway. For that function, done literally, it would look a bit like this:
elemBy :: (a -> a -> Bool) -> a -> [a] -> Bool
elemBy _ _ [] = False
elemBy eq x (y:ys) = eq x y || elemBy eq x ys
Calling elemBy (==) x xs is now equivalent to elem x xs. And in this specific case, you can go a step further: eq has the same first argument every time, so you can make it the caller's responsibility to apply that, and end up with this:
elemBy2 :: (a -> Bool) -> [a] -> Bool
elemBy2 _ [] = False
elemBy2 eqx (y:ys) = eqx y || elemBy2 eqx ys
Calling elemBy2 (x ==) xs is now equivalent to elem x xs.
...Oh wait. That's just any. (And in fact, in the standard library, elem = any . (==).)

Eta-conversion changes semantics in a strict language

Take this OCaml code:
let silly (g : (int -> int) -> int) (f : int -> int -> int) =
g (f (print_endline "evaluated"; 0))
silly (fun _ -> 0) (fun x -> fun y -> x + y)
It prints evaluated and returns 0. But if I eta-expand f to get g (fun x -> f (print_endline "evaluated"; 0) x), evaluated is no longer printed.
Same holds for this SML code:
fun silly (g : (int -> int) -> int, f : int -> int -> int) : int =
g (f (print "evaluated" ; 0));
silly ((fn _ => 0), fn x => fn y => x + y);
On the other hand, this Haskell code doesn't print evaluated even with the strict pragma:
{-# LANGUAGE Strict #-}
import Debug.Trace
silly :: ((Int -> Int) -> Int) -> (Int -> Int -> Int) -> Int
silly g f = g (f (trace "evaluated" 0))
main = print $ silly (const 0) (+)
(I can make it, though, by using seq, which makes perfect sense for me)
While I understand that OCaml and SML do the right thing theoretically, are there any practical reason to prefer this behaviour to the "lazier" one? Eta-contraction is a common refactoring tool and I'm totally scared of using it in a strict language. I feel like I should paranoically eta-expand everything, just because otherwise arguments to partially applied functions can be evaluated when they're not supposed to. When is the "strict" behaviour useful?
Why and how does Haskell behave differently under the Strict pragma? Are there any references I can familiarize myself with to better understand the design space and pros and cons of the existing approaches?

To address the technical part of your question, eta-conversion also changes the meaning of expressions in lazy languages, you just need to consider the eta-rule of a different type constructor, e.g., + instead of ->.
This is the eta-rule for binary sums:
(case e of Lft y -> f (Lft y) | Rgt y -> f (Rgt y)) = f e (eta-+)
This equation holds under eager evaluation, because e will always be reduced on both sides. Under lazy evaluation, however, the r.h.s. only reduces e if f also forces it. That might make the l.h.s. diverge where the r.h.s. would not. So the equation does not hold in a lazy language.
To make it concrete in Haskell:
f x = 0
lhs = case undefined of Left y -> f (Left y); Right y -> f (Right y)
rhs = f undefined
Here, trying to print lhs will diverge, whereas rhs yields 0.
There is more that could be said about this, but the essence is that the equational theories of both evaluation regimes are sort of dual.
The underlying problem is that under a lazy regime, every type is inhabited by _|_ (non-termination), whereas under eager it is not. That has severe semantic consequences. In particular, there are no inductive types in Haskell, and you cannot prove termination of a structural recursive function, e.g., a list traversal.
There is a line of research in type theory distinguishing data types (strict) from codata types (non-strict) and providing both in a dual manner, thus giving the best of both worlds.
Edit: As for the question why a compiler should not eta-expand functions: that would utterly break every language. In a strict language with effects that's most obvious, because the ability to stage effects via multiple function abstractions is a feature. The simplest example perhaps is this:
let make_counter () =
let x = ref 0 in
fun () -> x := !x + 1; !x
let tick = make_counter ()
let n1 = tick ()
let n2 = tick ()
let n3 = tick ()
But effects are not the only reason. Eta-expansion can also drastically change the performance of a program! In the same way you sometimes want to stage effects you sometimes also want to stage work:
match :: String -> String -> Bool
match regex = \s -> run fsm s
where fsm = ...expensive transformation of regex...
matchFloat = match "[0-9]+(\.[0-9]*)?((e|E)(+|-)?[0-9]+)?"
Note that I used Haskell here, because this example shows that implicit eta-expansion is not desirable in either eager or lazy languages!

With respect to your final question (why does Haskell do this), the reason "Strict Haskell" behaves differently from a truly strict language is that the Strict extension doesn't really change the evaluation model from lazy to strict. It just makes a subset of bindings into "strict" bindings by default, and only in the limited Haskell sense of forcing evaluation to weak head normal form. Also, it only affects bindings made in the module with the extension turned on; it doesn't retroactively affect bindings made elsewhere. (Moreover, as described below, the strictness doesn't take effect in partial function application. The function needs to be fully applied before any arguments are forced.)
In your particular Haskell example, I believe the only effect of the Strict extension is as if you had explicitly written the following bang patterns in the definition of silly:
silly !g !f = g (f (trace "evaluated" 0))
It has no other effect. In particular, it doesn't make const or (+) strict in their arguments, nor does it generally change the semantics of function applications to make them eager.
So, when the term silly (const 0) (+) is forced by print, the only effect is to evaluate its arguments to WHNF as part of the function application of silly. The effect is similar to writing (in non-Strict Haskell):
let { g = const 0; f = (+) } in g `seq` f `seq` silly g f
Obviously, forcing g and f to their WHNFs (which are lambdas) isn't going to have any side effect, and when silly is applied, const 0 is still lazy in its remaining argument, so the resulting term is something like:
(\x -> 0) ((\x y -> <defn of plus>) (trace "evaluated" 0))
(which should be interpreted without the Strict extension -- these are all lazy bindings here), and there's nothing here that will force the side effect.
As noted above, there's another subtle issue that this example glosses over. Even if you had made everything in sight strict:
{-# LANGUAGE Strict #-}
import Debug.Trace
myConst :: a -> b -> a
myConst x y = x
myPlus :: Int -> Int -> Int
myPlus x y = x + y
silly :: ((Int -> Int) -> Int) -> (Int -> Int -> Int) -> Int
silly g f = g (f (trace "evaluated" 0))
main = print $ silly (myConst 0) myPlus
this still wouldn't have printed "evaluated". This is because, in the evaluation of silly when the strict version of myConst forces its second argument, that argument is a partial application of the strict version of myPlus, and myPlus won't force any of its arguments until it's been fully applied.
This also means that if you change the definition of myPlus to:
myPlus x = \y -> x + y -- now it will print "evaluated"
then you'll be able to largely reproduce the ML behavior. Because myPlus is now fully applied, it will force its argument, and this will print "evaluated". You can suppress it again eta-expanding f in the definition of silly:
silly g f = g (\x -> f (trace "evaluated" 0) x) -- now it won't
because now when myConst forces its second argument, that argument is already in WHNF (because it's a lambda), and we never get to the application of f, full or not.
In the end, I guess I wouldn't take "Haskell plus the Strict extension and unsafe side effects like trace" too seriously as a good point in the design space. Its semantics may be (barely) coherent, but they sure are weird. I think the only serious use case is when you have some code whose semantics "obviously" don't depend on lazy versus strict evaluation but where performance would be improved by a lot of forcing. Then, you can just turn on Strict for a performance boost without having to think too hard.

Insufficient definition of replicate

I have a question that I think is rather tricky.
The standard prelude contains the function
replicate :: Int -> a -> [a]
The following might seem like a reasonable definition for it
replicate n x = take n [x,x,..]
But it is actually not sufficient. Why not?
I know that the replicate function is defined as:
replicate :: Int -> a -> [a]
replicate n x = take n (repeat x)
And repeat is defined as:
repeat :: a -> [a]
repeat x = xs where xs = x:xs
Is the definition insufficient (from the question) because it uses an infinite list?

First of all there is a small syntax error in the question, it should be:
replicate n x = take n [x,x..]
-- ^ no comma
but let's not be picky.
Now when you use range syntax (i.e. x..), then x should be of a type that is an instance of Enum. Indeed:
Prelude> :t \n x -> take n [x,x..]
\n x -> take n [x,x..] :: Enum a => Int -> a -> [a]
You can argue that x,x.. will only generate x, but the Haskell compiler does not know that at compile time.
So the type in replicate (in the question) is too specific: it implies a type constraint - Enum a - that is actually not necessary.
Your own definition on the other hand is perfectly fine. Haskell has no problem with infinite lists since it uses lazy evaluation. Furthermore because you define xs with xs as tail, you actually constructed a circular linked list which also is better in terms of memory usage.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string