Does the compiler, or do the more "native" parts of the libraries (IO, or functions that have access to black magic and the implementation), make assumptions about these laws? Will breaking them cause the impossible to happen?
Or do they just express a programming pattern -- i.e., the only people you'll annoy by breaking them are those who use your code and didn't expect you to be so careless?
The monad laws are simply additional rules that instances are expected to follow, beyond what can be expressed in the type system. Insofar as Monad expresses a programming pattern, the laws are part of that pattern. Such laws apply to other type classes as well: Monoid has very similar rules to Monad, and it's generally expected that instances of Eq will follow the rules expected for an equality relation, among other examples.
Because these laws are in some sense "part of" the type class, it should be reasonable for other code to expect they will hold, and act accordingly. Misbehaving instances may thus violate assumptions made by client code's logic, resulting in bugs, the blame for which is properly placed at the instance, not the code using it.
In short, "breaking the monad laws" should generally be read as "writing buggy code".
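For reference, the laws in question are just three equations that every instance is expected to satisfy; nothing in the type system checks them:

return a >>= f   =  f a                         -- left identity
m >>= return     =  m                           -- right identity
(m >>= f) >>= g  =  m >>= (\x -> f x >>= g)     -- associativity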
I'll illustrate this point with an example involving another type class, modified from one given by Daniel Fischer on the haskell-cafe mailing list. It is (hopefully) well known that the standard libraries include some misbehaving instances, namely the Eq and Ord instances for floating-point types. The misbehavior occurs, as you might guess, when NaN is involved. Consider the following data structure (built with Data.Set):
> let x = fromList [0, -1, 0/0, -5, -6, -3] :: Set Float
Here 0/0 produces a NaN, which violates the assumptions that Data.Set makes about Ord instances. Does this Set contain 0?
> member 0 x
True
Yes, of course it does, it's right there in plain sight! Now, we insert a value into the Set:
> let x' = insert (0/0) x
This Set still contains 0, right? We didn't remove anything, after all.
> member 0 x'
False
...oh. Oh, dear.
The compiler doesn't make any assumptions about the laws. However, if your instance does not obey them, it will not behave like a monad -- it will do strange things and otherwise appear to your users not to work correctly (e.g. dropping values, or evaluating things in the wrong order).
Also, refactorings your users might make assuming the monad laws hold will obviously not be sound.
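As a deliberately silly sketch of what that looks like (the Counted type is made up for this illustration), here is an instance that counts its binds; anyone who refactors m >>= return into plain m, which the right identity law promises is safe, gets a different answer:

-- Hypothetical, deliberately broken: every (>>=) bumps a counter.
newtype Counted a = Counted (Int, a) deriving Show

instance Functor Counted where
  fmap f (Counted (n, a)) = Counted (n, f a)

instance Applicative Counted where
  pure a = Counted (0, a)
  Counted (n, f) <*> Counted (m, a) = Counted (n + m, f a)

instance Monad Counted where
  Counted (n, a) >>= f = let Counted (m, b) = f a
                         in Counted (n + m + 1, b)   -- the "+ 1" breaks the identity laws

-- ghci> Counted (0, 5) >>= return
-- Counted (1,5)      -- right identity says this should equal Counted (0,5)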
For people working in more "mainstream" languages, this would be like implementing an interface, but doing so incorrectly. For example, imagine you're using a framework that offers an IShape interface, and you implement it. However, your implementation of the draw() method doesn't draw at all, but instead merely instantiates 1000 more instances of your class.
The framework would try to use your IShape and do reasonable things with it, and God knows what would happen. It'd be kind of an interesting train wreck to watch.
If you say you're a Monad, you're "declaring" that you adhere to its contract and laws. Other code will believe your declaration and act accordingly. Since you lied, things will go wrong in unforeseen ways.
To clarify my question, let me rephrase it in a more or less equivalent way:
Why is there a concept of superclass/class inheritance in Haskell?
What are the historical reasons that led to that design choice?
Why would it be so bad, for example, to have a base library with no class hierarchy, just typeclasses independent from each other?
Here I'll share some random thoughts that made me want to ask this question. My intuitions might be inaccurate, as they are based on my current understanding of Haskell, which is not perfect, but here they are...
It is not obvious to me why type class inheritance exists in Haskell. I find it a bit weird, as it creates asymmetry in concepts.
Often in mathematics, concepts can be defined from different viewpoints; I don't necessarily want to favor one particular order in which they ought to be defined. OK, there is some order in which one should prove things, but once the theorems and structures are there, I'd rather see them as independently available tools.
Moreover one perhaps not so good thing I see with class inheritance is this: I think a class instance will silently pick a corresponding superclass instance, which was probably implemented to be the most natural one for that type. Let's consider a Monad viewed as a subclass of Functor. Maybe there could be more than one way to define a Functor on some type that also happens to be a Monad. But saying that a Monad is a Functor implicitly makes the choice of one particular Functor for that Monad. Someday, you might forget that actually you wanted some other Functor.
Perhaps this example is not the best fit, but I have the feeling this sort of situation might generalize and possibly be dangerous if your class is a child of many. Current Haskell inheritance sounds like it makes default choices about parents implicitly.
If instead you have a design without a hierarchy, I feel you would always have to be explicit about all the properties required, which would perhaps mean a bit less risk, more clarity, and more symmetry. So far, what I'm seeing is that the cost of such a design would be more constraints to write in instance definitions, and newtype wrappers for each meaningful conversion from one set of concepts to another. I am not sure, but perhaps that could have been acceptable. Unfortunately, I think Haskell's auto-deriving mechanism for newtypes doesn't work very well; I would appreciate it if the language were somehow smarter about newtype wrapping/unwrapping and required less verbosity.
I'm not sure, but now that I think about it, perhaps an alternative to newtype wrappers could be specific imports of modules containing specific variations of instances.
Another alternative I thought about while writing this is that maybe one could weaken the meaning of class (P x) => C x: instead of requiring that an instance of C selects an instance of P, we could take it to loosely mean, for example, that the class C also contains P's methods, but that no instance of P is automatically selected and no other relationship with P exists. So we could keep some sort of weaker hierarchy that might be more flexible.
Thanks if you have some clarifications over that topic, and/or correct my possible misunderstandings.
Maybe you're tired of hearing from me, but here goes...
I think superclasses were introduced as a relatively minor and unimportant feature of type classes. In Wadler and Blott, 1988, they are briefly discussed in Section 6 where the example class Eq a => Num a is given. There, the only rationale offered is that it's annoying to have to write (Eq a, Num a) => ... in a function type when it should be "obvious" that data types that can be added, multiplied, and negated ought to be testable for equality as well. The superclass relationship allows "a convenient abbreviation".
(The unimportance of this feature is underscored by the fact that this example is so terrible. Modern Haskell doesn't have class Eq a => Num a, because the logical justification for all Nums also being Eqs is so weak. The example class Eq a => Ord a would have been a lot more convincing.)
So, the base library implemented without any superclasses would look more or less the same. There would just be more logically superfluous constraints on function type signatures in both library and user code, and instead of fielding this question, I'd be fielding a beginner question about why:
leq :: (Ord a) => a -> a -> Bool
leq x y = x < y || x == y
doesn't type check.
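In that hypothetical superclass-free world, the fix would simply be to write the logically redundant constraint out by hand:

leq :: (Eq a, Ord a) => a -> a -> Bool
leq x y = x < y || x == y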
To your point about superclasses forcing a particular hierarchy, you're missing your target.
This kind of "forcing" is actually a fundamental feature of type classes. Type classes are "opinionated by design", and in a given Haskell program (where "program" includes all the libraries, including base, used by the program), there can be only one instance of a particular type class for a particular type. This property is referred to as coherence. (Even though there is a language extension, IncoherentInstances, it is considered very dangerous and should only be used when all possible instances of a particular type class for a particular type are functionally equivalent.)
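A quick sketch of coherence in practice (Size is a made-up class for illustration): a second instance for the same type is simply rejected, so there is never a call-site-by-call-site choice to make.

class Size a where
  size :: a -> Int

instance Size [b] where
  size = length

-- A competing instance anywhere in the same program is a compile error
-- ("Duplicate instance declarations"); there is no way to pick between them:
-- instance Size [b] where
--   size = const 0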
This design decision comes with certain costs, but it also comes with a number of benefits. Edward Kmett talks about this in detail in this video, starting at around 14:25. In particular, he compares Haskell's coherent-by-design type classes with Scala's incoherent-by-design implicits and contrasts the increased power that comes with the Scala approach with the reusability (and refactoring benefits) of "dumb data types" that comes with the Haskell approach.
So, there's enough room in the design space for both coherent type classes and incoherent implicits, and Haskell's approach isn't necessarily the right one.
BUT, since Haskell has chosen coherent type classes, there's no "cost" to having a specific hierarchy:
class Functor a => Monad a
because, for a particular type, like [] or MyNewMonadDataType, there can only be one Monad and one Functor instance anyway. The superclass relationship introduces a requirement that any type with a Monad instance must also have a Functor instance, but it doesn't restrict the choice of Functor instance, because you never had a choice in the first place. Or rather, your choice was between having zero Functor [] instances and having exactly one.
Note that this is separate from the question of whether or not there's only one reasonable Functor instance for a Monad type. In principle, we could define a law-violating data type with incompatible Functor and Monad instances. We'd still be restricted to using that one Functor MyType instance and that one Monad MyType instance throughout our program, whether or not Functor was a superclass of Monad.
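To make the "requirement, not a choice" point concrete, here's a minimal sketch (Wrap is a made-up type; note that modern GHC actually chains Functor => Applicative => Monad, but the effect is the same):

newtype Wrap a = Wrap a

instance Functor Wrap where
  fmap f (Wrap a) = Wrap (f a)

instance Applicative Wrap where
  pure = Wrap
  Wrap f <*> Wrap a = Wrap (f a)

-- Deleting either instance above makes this one fail to compile with a
-- missing-superclass-instance error; nothing is ever picked silently for you.
instance Monad Wrap where
  Wrap a >>= f = f a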
I once asked a question on haskell-beginners about whether to use data/newtype or a typeclass. In my particular case it turned out that no typeclass was required. Additionally, Tom Ellis gave me a brilliant piece of advice on what to do when in doubt:
The simplest way of answering this which is mostly correct is:
use data
I know that typeclasses can make a few things a bit prettier, but not much AFAIK. It also strikes me that typeclasses are mostly used for brain-stem stuff, whereas in newer code new typeclasses hardly ever get introduced and everything is done with data/newtype.
Now I wonder if there are cases where typeclasses are absolutely required and things could not be expressed with data/newtype?
Answering a similar question on StackOverflow, Gabriel Gonzalez said:
Use type classes if:
There is only one correct behavior per given type
The type class has associated equations (i.e. "laws") that all instances must satisfy
Hmm ..
Or are typeclasses and data/newtype somewhat competing concepts which coexist for historical reasons?
I would argue that typeclasses are an essential part of Haskell.
They are the part of Haskell that makes it the easiest language I know of to refactor, and they are a great asset to your being able to reason about the correctness of code.
So, let's talk about dictionary passing.
Now, any sort of dictionary passing is a big improvement over the state of affairs in traditional object-oriented languages. We know how to do OOP with vtables in C++. However, the vtable is 'part of the object' in OOP languages. Fusing the vtable with the object forces your code into a form with a rigid discipline about who can extend the core types with new features; it's really only the original author of the class who has to incorporate all the things others want to bake into their type. This leads to "lava flow code" and all sorts of other design antipatterns, etc.
Languages like C# give you the ability to hack in extension methods to fake new stuff, and "traits" in languages like Scala and multiple inheritance in other languages let you delegate some of the work as well, but they are partial solutions.
When you split the vtable from the objects they manipulate you get a heady rush of power. You can now pass them around wherever you want, but then of course you need to name them and talk about them. The ML discipline around modules / functors and the explicit dictionary passing style take this approach.
Typeclasses take a slightly different tack. We rely on the uniqueness of a typeclass instance for a given type, and it is in large part this choice that permits us to get away with such simple core data types.
Why?
Because we can move the use of the dictionaries to the use sites rather than carrying them around with the data types, and we can rely upon the fact that when we do so, nothing changes about the behavior of the code.
Mechanical translation of the code to more complex manually passed dictionaries loses the uniqueness of such a dictionary at a given type. Passing the dictionaries in at different points in your program now leads to programs with greatly differing behavior. You may or may not have to remember the dictionaries your data type was constructed with, and woe betide you if you want to have conditional behavior based on what your arguments are.
For simple examples like Set you can get away with a manual dictionary translation. The price doesn't seem so high. You have to bake in the dictionary for, say, how you want to sort the Set when you make the object, and then insert/lookup just preserve your choice. This might be a cost you can bear. When you union two Sets, of course, it's now up in the air which ordering you get. Maybe you take the smaller and insert it into the larger, but then the ordering would change willy-nilly, so instead you have to, say, take the left one and always insert it into the right, or document this haphazard behavior. You're now being forced into suboptimally performing solutions in the interest of 'flexibility'.
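To make that concrete, here's a rough sketch of what "Set with a manually passed dictionary" might look like (names invented for illustration): the ordering is an ordinary argument, so nothing stops a caller from using one ordering to insert and a different one to look up.

data SetD a = Tip | Node (SetD a) a (SetD a)

insertD :: (a -> a -> Ordering) -> a -> SetD a -> SetD a
insertD _   x Tip = Node Tip x Tip
insertD cmp x s@(Node l y r) =
  case cmp x y of
    LT -> Node (insertD cmp x l) y r
    GT -> Node l y (insertD cmp x r)
    EQ -> s

memberD :: (a -> a -> Ordering) -> a -> SetD a -> Bool
memberD _   _ Tip = False
memberD cmp x (Node l y r) =
  case cmp x y of
    LT -> memberD cmp x l
    GT -> memberD cmp x r
    EQ -> True

-- With compare: insertD compare 5 (insertD compare 3 Tip) contains 5, yet
-- memberD (flip compare) 5 on that same tree answers False. The "dictionary"
-- chosen at the call site silently changed the meaning of the structure.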
But Set is a trivial example. There you might bake an index into the type about which instance you are using, and there is only one class involved. What happens when you want more complex behavior? One of the things we do with Haskell is work with monad transformers. Now you have lots of instances floating around and no good place to store them all: MonadReader, MonadWriter, MonadState, etc. may all apply, conditionally, based on the underlying monad. What happens when you hoist and swap it out, and now different things may or may not apply?
Carrying around explicit dictionaries for all of this is a lot of work; there isn't a good place to store them, and you are asking users to adopt a global program transformation to adopt this practice.
These are the things that typeclasses make effortless.
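To give a flavor of that effortlessness, here is a small sketch in the mtl style (assuming the mtl library; Config and step are invented for the example): the constraints name the capabilities the code needs, and the right instances are found automatically for whatever concrete stack you eventually run it in.

import Control.Monad.Reader (MonadReader, ask, runReaderT)
import Control.Monad.State  (MonadState, get, put, evalStateT)

data Config = Config { limit :: Int }

step :: (MonadReader Config m, MonadState Int m) => m Bool
step = do
  n <- get            -- needs the MonadState capability
  c <- ask            -- needs the MonadReader capability
  put (n + 1)
  pure (n < limit c)

-- The very same 'step' runs unchanged in any stack providing both
-- capabilities; here, StateT over ReaderT over IO:
runBoth :: IO Bool
runBoth = runReaderT (evalStateT step (0 :: Int)) (Config 10)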
Do I believe you should use them for everything?
Not by a long shot.
But I can't agree with the other replies here that they are inessential to Haskell.
Haskell is the only language that supplies them and they are critical to at least my ability to think in this language, and are a huge part of why I consider Haskell home.
I do agree with a few things here: use typeclasses when there are laws and when the choice is unambiguous.
I'd challenge, however, that if you don't have laws, or if the choice is ambiguous, you may not know enough about how to model the problem domain, and should keep seeking a formulation that fits the typeclass mold, possibly even into existing abstractions -- and when you finally find that solution, you'll find you can easily reuse it.
Typeclasses are, in most cases, inessential. Any typeclass code can be mechanically converted into dictionary-passing style. They mainly provide convenience, sometimes an essential amount of convenience (cf. kmett's answer).
Sometimes the single-instance property of typeclasses is used to enforce invariants. For example, you could not convert Data.Set into dictionary-passing style safely, because if you inserted twice with two different Ord dictionaries, you could break the data structure invariant. Of course you could still convert any working code to working code in dictionary-passing style, but you would not be able to outlaw as much broken code.
Laws are another important cultural aspect of typeclasses. The compiler does not enforce laws, but Haskell programmers expect typeclasses to come with laws that all the instances satisfy. This can be leveraged to provide stronger guarantees about some functions. This advantage comes only from the conventions of the community, and is not a formal property of the language.
To answer that part of the question:
"typeclasses and data/newtype somewhat competing concepts"
No. Typeclasses are an extension to the type system that allows you to place constraints on polymorphic arguments. Like most things in programming, they are, of course, syntactic sugar [so they aren't essential in the sense that anything written with them could be rewritten without them]. That doesn't mean they're superfluous. It just means you could express similar things using other language facilities, but you'd lose some clarity while you're at it. Dictionary passing can be used for mostly the same things, but it's ultimately less strict in the type system because it allows changing behavior at runtime (which is also an excellent example of where you'd use dictionary passing instead of type classes).
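For concreteness, here's a minimal sketch of explicit dictionary passing (the names are made up): a plain record of functions stands in for the class, which is exactly what lets you pick a different "instance" at runtime.

import Data.Char (toLower)

-- An ordinary record playing the role of an Eq "dictionary".
data EqDict a = EqDict { eq :: a -> a -> Bool }

caseSensitive, caseInsensitive :: EqDict String
caseSensitive   = EqDict (==)
caseInsensitive = EqDict (\x y -> map toLower x == map toLower y)

-- The "instance" is an ordinary value, chosen wherever and whenever you like:
nubWith :: EqDict a -> [a] -> [a]
nubWith _ []     = []
nubWith d (x:xs) = x : nubWith d (filter (not . eq d x) xs)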
Data and newtype still mean exactly the same thing whether you have typeclasses or not: they introduce a new type, in the case of data as a new kind of data structure, and in the case of newtype as a typesafe variant of type.
To expand slightly on my comment I would suggest always starting by using data and dictionary passing. If the boilerplate and manual instance plumbing becomes too much to bear then consider introducing a typeclass. I suspect this approach generally leads to a cleaner design.
I just want to make a really mundane point about syntax.
People tend to underestimate the convenience afforded by type classes, probably because they have never tried Haskell without using any. This is a "the grass is greener on the other side of the fence" sort of phenomenon.
while :: Monad m -> m Bool -> m a -> m ()
while m p body = (>>=) m p $ \x ->
  if x
    then (>>) m body (while m p body)
    else return m ()
average :: Floating a -> a -> a -> a -> a
average f a b c = (/) f ((+) (floatingToNum f) a ((+) (floatingToNum f) b c))
                        (fromInteger (floatingToNum f) 3)
This is the historical motivation for type classes and it remains valid today. If we didn't have type classes, we'd certainly need some kind of replacement for it to avoid writing monstrosities like these. (Maybe something like record puns or Agda's "open".)
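For comparison, here are (roughly) the same two functions with type classes doing the dictionary plumbing behind the scenes:

while :: Monad m => m Bool -> m a -> m ()
while p body = do
  x <- p
  if x then body >> while p body
       else return ()

average :: Floating a => a -> a -> a -> a
average a b c = (a + b + c) / 3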
I know that typeclasses can make a few things a bit prettier, but not much AFAIK.
Bit prettier?? No! Way prettier! (as others have already noted)
However, the answer to this really depends very much on where this question is coming from.
If Haskell is your tool of choice for serious software engineering, typeclasses are
powerful and essential.
If you are a beginner using Haskell to learn (functional) programming, the complexity and difficulty of typeclasses can outweigh the advantages – certainly at the beginning of your studies.
Here are a couple of examples comparing GHC with Gofer (the predecessor of Hugs, itself a predecessor of modern Haskell):
gofer
? 1 ++ [2,3,4]
ERROR: Type error in application
*** expression :: 1 ++ [2,3,4]
*** term :: 1
*** type :: Int
*** does not match :: [Int]
Now compare with GHC:
Prelude> 1 ++ [2,3,4]
:2:1:
    No instance for (Num [a0]) arising from the literal `1'
    Possible fix: add an instance declaration for (Num [a0])
    In the first argument of `(++)', namely `1'
    In the expression: 1 ++ [2, 3, 4]
    In an equation for `it': it = 1 ++ [2, 3, 4]

:2:7:
    No instance for (Num a0) arising from the literal `2'
    The type variable `a0' is ambiguous
    Possible fix: add a type signature that fixes these type variable(s)
    Note: there are several potential instances:
      instance Num Double -- Defined in `GHC.Float'
      instance Num Float -- Defined in `GHC.Float'
      instance Integral a => Num (GHC.Real.Ratio a)
        -- Defined in `GHC.Real'
      ...plus three others
    In the expression: 2
    In the second argument of `(++)', namely `[2, 3, 4]'
    In the expression: 1 ++ [2, 3, 4]
This should suggest that error-message-wise, not only are typeclasses not prettier, they can be uglier!
One can go all the way (in Gofer) and use the 'simple prelude', which uses no typeclasses at all. This makes it quite unrealistic for serious programming, but really neat for wrapping your head around Hindley-Milner:
Standard Prelude
? :t (==)
(==) :: Eq a => a -> a -> Bool
? :t (+)
(+) :: Num a => a -> a -> a
Simple Prelude
? :t (==)
(==) :: a -> a -> Bool
? :t (+)
(+) :: Int -> Int -> Int
All the typeclasses in Typeclassopedia have associated laws, such as associativity or commutativity for certain operators. The definition of a "law" seems to be a constraint that cannot be expressed in the type system. I certainly understand why you want to have, say, monad laws, but is there a fundamental reason why a typeclass that can be expressed fully within the type system is pointless?
You will notice that the laws are almost always algebraic laws. They could be expressed in the type system using some extensions, but the proofs would be cumbersome to express. So you have unchecked laws, and implementations might potentially break them. Why is this good?
The reason is that the design patterns used in Haskell are motivated (and in most cases mirrored) by mathematical structures, usually from abstract algebra. While most other languages have an intuitive notion of certain features like safety, performance and semantics, we Haskell programmers prefer to establish a formal notion. The advantage of doing this is: Once your types and functions obey the safety laws, they are safe in the sense of the underlying algebraic structure. They are provably safe.
Take functors as an example. A Haskell functor has the following two laws:
fmap f . fmap g = fmap (f . g)
fmap id = id
Firstly this is very important: functions in Haskell are opaque. You cannot examine, compare, or otherwise inspect them. While this sounds like a bad thing, in Haskell it is actually a very good thing. The fmap function cannot examine the function you've passed it. In particular it can't check whether you've passed the identity function or a composition. In short: it can't cheat! The only way for it to obey these two laws is to not introduce any effects of its own. That means that in a proper functor, fmap will never do anything unexpected. In fact it cannot do anything other than map the given function. This is a very simple example, and I haven't explained all the subtleties of why fmap can't cheat, but it demonstrates the point.
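To see what "cheating" would look like, here's a hypothetical instance (the Sneaky type is made up) that smuggles in an effect of its own and thereby breaks fmap id = id:

-- Deliberately broken, for illustration only: it counts how often fmap ran.
newtype Sneaky a = Sneaky (Int, a) deriving Show

instance Functor Sneaky where
  fmap f (Sneaky (n, a)) = Sneaky (n + 1, f a)   -- the "+ 1" is an effect of fmap's own

-- fmap id (Sneaky (0, 'x'))  evaluates to  Sneaky (1,'x'),  not  Sneaky (0,'x')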
Now extend this all over the language, the base libraries and most sensible third party libraries. This gives you a language that is as predictable as a language can get. When you write code, you know what it's going to do. That's one of the main reasons why Haskell code often works out of the box. I often write pages of Haskell code before compiling. Once my type errors are fixed, my program usually works.
The other reason why this is desirable is that it allows a more compositional style of programming. This is particularly useful when working as a team. First you map your application to algebraic structures and establish the necessary laws. For example: You express what it means for something to be a Valid Web Server. In particular you establish a formal notion of web server composition. If you compose two Valid Web Servers, the result is a Valid Web Server. Do you see where this is going? After establishing these laws the teammates go to work, and they work in isolation. Little to no communication is necessary to get their job done. When they meet again, everybody presents their Valid Web Servers and they just compose them to make the final product, a web site. Since the individual components were all Valid Web Servers, the final result must be a Valid Web Server. Provably.
Yes and no. For instance the Show class does not have any laws associated with it, and it is certainly useful.
However, typeclasses express interfaces. An interface needs to satisfy more than being just a bunch of functions - you want these functions to fulfill a specification. The specification is normally more complicated than what can be expressed in Haskell's type system. For example, take the Eq class. It only needs to provide us with a function, the type of which has to be a -> a -> Bool. That's the most that Haskell's type system will allow us to require from an instance of an Eq type. However, we would normally expect more from this function - you would probably want it to be an equivalence relation (reflexive, symmetric and transitive). So then you state these requirements as separate "laws".
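Such laws are typically stated, and spot-checked, outside the type system; one common way is as QuickCheck properties. A sketch, assuming the QuickCheck library and checking the laws at Int:

import Test.QuickCheck

prop_reflexive :: Int -> Bool
prop_reflexive x = x == x

prop_symmetric :: Int -> Int -> Bool
prop_symmetric x y = (x == y) == (y == x)

prop_transitive :: Int -> Int -> Int -> Property
prop_transitive x y z = (x == y && y == z) ==> (x == z)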
A typeclass doesn't need to have laws, but it often will be more useful if it has them. Many typeclasses are expected to function in a certain way, the laws codify user expectations. The laws let users make assumptions about the way that an instance of a typeclass will work. If you break the typeclass laws, you don't get arrested by the Haskell police, you just end up with confused users.
I'm struggling with what Super Combinators are:
A supercombinator is either a constant, or a combinator which contains only supercombinators as subexpressions.
And also with what Constant Applicative Forms are:
Any super combinator which is not a lambda abstraction. This includes truly constant expressions such as 12, ((+) 1 2), [1,2,3] as well as partially applied functions such as ((+) 4). Note that this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
This is just not making any sense to me! Isn't ((+) 4) just as "truly constant" as 12? CAFs sound like values to my simple mind.
These Haskell wiki pages you reference are old, and I think unfortunately written. Particularly unfortunate is that they mix up CAFs and supercombinators. Supercombinators are interesting but unrelated to GHC. CAFs are still very much a part of GHC, and can be understood without reference to supercombinators.
So let's start with supercombinators. Combinators derive from combinatory logic, and, in the usage here, consist of functions which only apply the values passed in to one another in one or another form -- i.e. they combine their arguments. The most famous set of combinators are S, K, and I, which taken together are Turing-complete. Supercombinators, in this context, are functions built only of values passed in, combinators, and other supercombinators. Hence any supercombinator can be expanded, through substitution, into a plain old combinator.
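Written as Haskell functions, the three famous combinators look like this (a standard rendering, not anything GHC uses internally):

i :: a -> a
i x = x                         -- I: identity

k :: a -> b -> a
k x _ = x                       -- K: constant

s :: (a -> b -> c) -> (a -> b) -> a -> c
s f g x = f x (g x)             -- S: apply f to x and (g x)

-- I is itself expressible from S and K:  s k k x  =  k x (k x)  =  x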
Some compilers for functional languages (not GHC!) use combinators and supercombinators as intermediate steps in compilation. As with any similar compiler technology, the reason for doing this is to admit optimization analysis that is more easily performed in such a simplified, minimal language. One such core language built on supercombinators is Edwin Brady's epic.
Constant Applicative Forms are something else entirely. They're a bit more subtle, and have a few gotchas. The way to think of them is as an aspect of compiler implementation with no separate semantic meaning but with a potentially profound effect on runtime performance. The following may not be a perfect description of a CAF, but I'll try to convey my intuition of what one is, since I haven't seen a really good description anywhere else to crib from. The clean "authoritative" description in the GHC Commentary Wiki reads as follows:
Constant Applicative Forms, or CAFs for short, are top-level values
defined in a program. Essentially, they are objects that are not
allocated dynamically at run-time but, instead, are part of the static
data of the program.
That's a good start. Pure, functional, lazy languages can be thought of in some sense as a graph reduction machine. The first time you demand the value of a node, that forces its evaluation, which in turn can demand the values of subnodes, etc. Once a node is evaluated, the resultant value sticks around (although it does not have to stick around -- since this is a pure language we could always keep the subnodes live and recalculate with no semantic effect). A CAF is indeed just a value. But, in this context, it is a special kind of value -- one which the compiler can determine has a meaning entirely dependent on its subnodes. That is to say:
foo x = ...
  where thisIsACaf = [1..10::Int]

        thisIsNotACaf = [1..x::Int]

        thisIsAlsoNotACaf :: Num a => [a]
        thisIsAlsoNotACaf = [1..10] -- oops, polymorphic! the "num" dictionary is implicitly a parameter.

        thisCouldBeACaf = const [1..10::Int] x -- requires a sufficiently smart compiler
        thisAlsoCouldBeACaf _ = [1..10::Int] -- also requires a sufficiently smart compiler
So why do we care if things are CAFs? Basically because sometimes we really really don't want to recompute something (for example, a memotable!) and so want to make sure it is shared properly. Other times we really do want to recompute something (e.g. a huge boring easy to generate list -- such as the naturals -- which we're just walking over) and not have it stick around in memory forever. A combination of naming things and binding them under lets or writing them inline, etc. typically lets us specify these sorts of things in a natural, intuitive way. Occasionally, however, the compiler is smarter or dumber than we expect, and something we think should only be computed once is always recomputed, or something we don't want to hang on to gets lifted out as a CAF. Then, we need to think things through more carefully. See this discussion to get an idea about some of the trickiness involved: A good way to avoid "sharing"?
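The memo-table case is the classic one: a top-level, monomorphic, argument-free definition is a CAF, so it is computed at most once and then shared by every caller (a standard example, not tied to any particular program):

fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)   -- a CAF: top level, no arguments, monomorphic

fib :: Int -> Integer
fib n = fibs !! n     -- every call walks the same, already-computed prefix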
[By the way, I don't feel up to it, but anyone that wants to should feel free to take as much of this answer as they want to try and integrate it with the existing Haskell Wiki pages and improve/update them]
Matt is right in that the definition is confusing. It is even contradictory. A CAF is defined as:
Any super combinator which is not a lambda abstraction. This includes
truly constant expressions such as 12, ((+) 1 2), [1,2,3] as
well as partially applied functions such as ((+) 4).
Hence, ((+) 4) is seen as a CAF. But in the very next sentence we're told it is equivalent to something that is not a CAF:
this last example is equivalent under eta abstraction to \ x -> (+) 4 x which is not a CAF.
It would be cleaner to rule out partially applied functions on the ground that they are equivalent to lambda abstractions.
I've come across some answers (here on SO) saying that Haskell has many "dark corners" in its type system, and also some messy holes. Could someone elaborate on this?
Thanks in advance
I guess I should answer this, especially since two people so far have misinterpreted my remarks...
Regarding non-termination, the remark in question was slight hyperbole for dramatic effect, and referred to non-termination at the value level. This was in context of comparing Haskell to theorem provers, in an answer to someone who mentioned type-enforced correctness properties as something they particularly appreciated. In that sense, the presence of ⊥ inhabiting otherwise empty types is a "flaw", because it changes the meaning of a type like A -> B from "given an A, produces a B" into "given an A, either produces a B or crashes the program" which is, for obvious reasons, somewhat less satisfying from a proof-of-correctness standpoint.
It's also completely irrelevant to almost all day-to-day programming and no worse than any other general-purpose language because, of course, the possibility of non-termination is required for Turing-completeness.
I don't have any problem with UndecidableInstances. Actually, it bothers me less than ⊥ at the value level does because it only crashes GHC when compiling, not the finished program. OverlappingInstances is another matter, though, and the ad hoc mishmash of GHC extensions to provide little bits of things that would most naturally require dependent types certainly qualifies as "messy".
But keep in mind that most of the things I'm complaining about in Haskell are only a problem because of the otherwise very solid foundation. Most type systems in other statically typed languages aren't even coherent enough to be called "wrong" in comparison, and cleaning up the stuff I'm calling "messy" is an active and ongoing area of research.
Haskell's type system has no problems or messy holes, actually. Haskell 98 can be fully type-inferred. It possesses what is known as the "principal type" property, which is to say that any given expression has at most a single most general type. There are, however, a range of expressions which are good and useful and valid but do not type under Haskell 98. Most important of these are higher-ranked types. forall a b. (a -> b) -> a -> b is an (uninteresting) example of a rank-one type, which is to say that the forall is only at the very outside. forall b. (forall a. a -> a) -> b -> b is an example of a useless but possible type which is not rank-one, and cannot be expressed in Haskell 98. Higher-ranked types are one of many things which break the principal type property.
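A small sketch of a genuinely useful higher-rank type (this one needs the RankNTypes extension; the function name is made up): the argument must itself be polymorphic, which is something a rank-one signature cannot demand.

{-# LANGUAGE RankNTypes #-}

applyToBoth :: (forall a. a -> a) -> (Int, String) -> (Int, String)
applyToBoth f (n, s) = (f n, f s)

-- applyToBoth id (3, "hi")  ==  (3, "hi")
-- Only a function that works at *every* type, like id, can be passed here.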
As one adds more and more extensions to the basic Haskell98 system, there begin to be tradeoffs introduced between the ability to write really powerful types which express both different types of polymorphism and different types of constraints and the ability to have as much code as possible completely type-inferred. At the very edges of what's possible, types can get messy and complicated, and occasionally you can run into things that seem like they should work but don't. But at that point, you're generally doing what's known as "type-level programming" where a great deal of your application logic has been embedded in the types themselves, and through a combination of typeclass tricks you've conned the compiler into, essentially, running the types as a program at compile time.
I disagree, by the way, with camccann's assertion that potential nontermination is a messy compromise in the type-checker. I think it's a perfectly useful feature, in fact a prerequisite for Turing-completeness at the type level, and you only risk it if you explicitly ask the compiler to start allowing lots of dodgy stuff.
So you're referring to Camccann saying "Haskell's type system is full of holes, due to nontermination and other messy compromises"? I think he's talking about the UndecidableInstances extension and probably a few others.
Then you referred to Norman, I can only assume, saying "Haskell's type system is ambitious and powerful, but it is continually being improved, which means there is some inconsistency as a result of history." I'm sure he had something in mind, but I'll let him clarify when he sees this question.