What are algebraic structures in functional programming? - haskell

I've been doing some light reading on functional programming concepts and ideas. So far, so good; I've read about three main concepts: algebraic structures, type classes, and algebraic data types. I have a fairly good understanding of what algebraic data types are. I think sum types and product types are fairly straightforward. For example, I can imagine creating an algebraic data type like a Card type, which is a product type consisting of two enum types, Suit (with four values and symbols) and Rank (with 13 values and symbols).
However, I'm still hung up on trying to understand precisely what algebraic structures and type classes are. I just have a surface-level picture in my head but can't quite completely wrap my head around, for instance, the different types of algebraic structures like functors, monoids, monads, etc. How exactly are these different? How can they be used in a programming setting? How are type classes different from regular classes? Can anyone at least point me in the direction of a good book on abstract algebra and functional programming? Someone recommended I learn Haskell but do I really need to learn Haskell in order to understand functional programming?

"algebraic structure" is a concept that goes well beyond programming, it belongs to mathematics.
Imagine the unfathomably deep sea of all possible mathematical objects. Numbers of every stripe (the naturals, the reals, p-adic numbers...) are there, but also things like sequences of letters, graphs, trees, symmetries of geometrical figures, and all well-defined transformations and mappings between them. And much else.
We can try to "throw a net" into this sea and retain only some of those entities, by specifying conditions. Like "collections of things, for which there is an operation that combines two of those things into a third thing of the same type, and for which the operation is associative". We can give those conditions their own name, like, say, "semigroup". (Because we are talking about highly abstract stuff, choosing a descriptive name is difficult.)
That leaves out many inhabitants of the mathematical "sea", but the description still fits a lot of them! Many collections of things are semigroups. The natural numbers with the multiplication operation for example, but also non-empty lists of letters with concatenation, or the symmetries of a square with composition.
You can expand your description with extra conditions. Like "a semigroup, and there's also an element such that combining it with any other element gives the other element, unchanged". That restricts the number of mathematical entities that fit the description, because you are demanding more of them. Some valid semigroups will lack that "neutral element". But a lot of mathematical entities will still satisfy the expanded description. If you aren't careful, you can declare conditions so restrictive that no possible mathematical entity can actually fit them! At other times, you can be so precise that only one entity fits them.
Working purely with these descriptions of mathematical entities, using only the general properties we require of them, we can obtain unexpected results about them, non-obvious at first sight, results that will apply to all entities which fit the description. Think of these discoveries as the mathematical equivalent of "code reuse". For example, if we know that some collection of things is a semigroup, then we can calculate exponentials using binary exponentiation instead of tediously combining a thing with itself n times. But that only works because of the associative property of the semigroup operation.
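As a rough Haskell sketch of that kind of "code reuse" (my own illustration; the standard library already provides this as stimes in Data.Semigroup):

import Data.Semigroup (Semigroup (..))

-- Combine a value with itself n times (n >= 1), using only associativity.
-- Because (<>) is associative, we can square intermediate results instead
-- of folding one element at a time: O(log n) combines instead of O(n).
power :: Semigroup a => Int -> a -> a
power n x
  | n <= 1    = x
  | even n    = half <> half
  | otherwise = x <> (half <> half)
  where
    half = power (n `div` 2) x

-- e.g. power 5 "ab" == "ababababab"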

You’ve asked quite a few questions here, but I can try to answer them as best I can:
… different types of algebraic structures like functors, monoids, monads, etc. How exactly are these different? How can they be used in a programming setting?
This is a very common question when learning Haskell. I won’t write yet another answer here — and a complete answer is fairly long anyway — but a simple Google search turns up some very good answers to each of these.
How are type classes different from regular classes?
(By ‘regular classes’ I assume you mean classes as found in OOP.)
This is another common question. Basically, the two have almost nothing in common except the name. A class in OOP is a combination of fields and methods. Classes are used by creating instances of that class; each instance can store data in its fields, and manipulate that data using its methods.
By contrast, a type class is simply a collection of functions (often also called methods, though there’s pretty much no connection). You can declare an instance of a type class for a data type (again, no connection) by redefining each method of the class for that type, after which you may use the methods with that type. For instance, the Eq class looks like this:
class Eq a where
  (==) :: a -> a -> Bool
  (/=) :: a -> a -> Bool
And you can define an instance of that class for, say, Bool, by implementing each function:
instance Eq Bool where
  True  == True  = True
  False == False = True
  _     == _     = False
  p     /= q     = not (p == q)
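As another (hypothetical) example, tying back to the Card type from the question, you could define Eq for a Suit type yourself, or simply let the compiler derive it:

data Suit = Clubs | Diamonds | Hearts | Spades

instance Eq Suit where
  Clubs    == Clubs    = True
  Diamonds == Diamonds = True
  Hearts   == Hearts   = True
  Spades   == Spades   = True
  _        == _        = False

-- or, equivalently: data Suit = Clubs | Diamonds | Hearts | Spades deriving (Eq)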
Can anyone at least point me in the direction of a good book on abstract algebra and functional programming?
I must admit that I can’t help with this (and it’s off-topic for Stack Overflow anyway).
Someone recommended I learn Haskell but do I really need to learn Haskell in order to understand functional programming?
No, you don’t — you can learn functional programming from any functional language, including Lisp (particularly Scheme dialects), OCaml, F#, Elm, Scala etc. Haskell happens to be a particularly ‘pure’ functional programming language, and I would recommend it as well, but if you just want to learn and understand functional programming then any one of those will do.

Related

What does "a monad is a model of computation" mean

What does it mean exactly when people say "a monad is a model of computation"? Does this mean computation in the sense of Turing completeness? If so, how?
Clarification: This question is not about explaining monads but what people mean with "model of computation" in this context and how this relates to monads. See towards the end of this answer for a typical use of this phrase.
In my understanding, a Turing machine, the theory of recursive functions, lambda calculus, etc. are all models of computation, and I cannot see how a monad would relate to that, if at all.
The idea of monads as models of computation can be traced back to the work of Eugenio Moggi. Among Haskell practitioners, the best known paper by Moggi on this matter is Notions of computations as monads (1991). Relevant quotes include:
The [lambda]-calculus is considered a useful mathematical tool in the study of programming languages, since programs can be identified with [lambda]-terms. However, if one goes further and uses [beta][eta]-conversion to prove equivalence of programs, then a gross simplification is introduced (programs are identified with total functions from values to values) that may jeopardise the applicability of theoretical results. In this paper we introduce calculi based on a categorical semantics for computations, that provide a correct basis for proving equivalence of programs for a wide range of notions of computation. [p. 1]
[...]
We do not take as a starting point for proving equivalence of programs the theory of [beta][eta]-conversion, which identifies the denotation of a program (procedure) of type A -> B with a total function from A to B, since this identification wipes out completely behaviours such as non-termination, non-determinism, and side-effects, that can be exhibited by real programs. Instead, we proceed as follows:
We take category theory as a general theory of functions and develop on top a categorical semantics of computations based on monads. [...] [p. 1]
[...]
The basic idea behind the categorical semantics below is that, in order to interpret a programming language in a category [C], we distinguish the object A of values (of type A) from the object TA of computations (of type A), and take as denotations of programs (of type A) the elements of TA. In particular, we identify the type A with the object of values (of type A) and obtain the object of computations (of type A) by applying an unary type-constructor T to A. We call T a notion of computation, since it abstracts away from the type of values computations may produce. There are many choices for TA corresponding to different notions of computations. [pp. 2-3]
[...]
We have identified monads as important to modeling notions of computations, but computational monads seem to have additional properties; e.g., they have a tensorial strength and may satisfy the mono requirement. It is likely that there are other properties of computational monads still to be identified, and there is no reason to believe that such properties have to be found in the literature on monads. [p. 27 -- thanks danidiaz]
A related older paper by Moggi, Computational lambda-calculus and monads (1989 -- thanks michid for the reference), speaks literally of "computational model[s]":
A computational model is a monad (T;[eta];[mu]) satisfying the mono requirement: [eta-A] is a mono for every A [belonging to] C.
There is an alternative description of a monad (see[7]), which is easier to justify computationally. [...] [p. 2]
This particular bit of terminology was dropped in Notions of computations as monads, as Moggi sharpened the focus of his presentation on the "alternative description" (namely, Kleisli triples, which consist of, in Haskell parlance, a type constructor, return and bind). The essence, though, remains the same throughout.
Philip Wadler presents the idea with a more practical bent in Monads for functional programming (1992):
The use of monads to structure functional programs is described. Monads provide a convenient framework for simulating effects found in other languages, such as global state, exception handling, output, or non-determinism. [p. 1]
[...]
Pure functional languages have this advantage: all flow of data is made explicit. And this disadvantage: sometimes it is painfully explicit.
A program in a pure functional language is written as a set of equations. Explicit data flow ensures that the value of an expression depends only on its free variables. Hence substitution of equals for equals is always valid, making such programs especially easy to reason about. Explicit data flow also ensures that the order of computation is irrelevant, making such programs susceptible to lazy evaluation.
It is with regard to modularity that explicit data flow becomes both a blessing and a curse. On the one hand, it is the ultimate in modularity. All data in and all data out are rendered manifest and accessible, providing a maximum of flexibility. On the other hand, it is the nadir of modularity. The essence of an algorithm can become buried under the plumbing required to carry data from its point of creation to its point of use. [p. 2]
[...]
Say it is desired to add error checking, so that the second example above returns a sensible error message. In an impure language, this is easily achieved with the use of exceptions.
In a pure language, exception handling may be mimicked by introducing a type to represent computations that may raise an exception. [pp. 3-4 -- note this is before monads are introduced as a unifying abstraction.]
[...]
Each of the variations on the interpreter has a similar structure, which may be abstracted to yield the notion of a monad.
In each variation, we introduced a type of computations. Respectively, M represented computations that could raise exceptions, act on state, and generate output. By now the reader will have guessed that M stands for monad. [p. 6]
This is one of the roots of the usage of "computation" to refer to monadic values.
A significant body of later literature makes use of the concept of computation in this manner. For instance, this is the opening passage of Notions of Computation as Monoids by Exequiel Rivas and Mauro Jaskelioff (2014 -- thanks danidiaz for the suggestion):
When constructing a semantic model of a system or when structuring computer code, there are several notions of computation that one might consider. Monads (Moggi, 1989; Moggi, 1991) are the most popular notion, but other notions, such as arrows (Hughes, 2000) and, more recently, applicative functors (McBride & Paterson, 2008) have been gaining widespread acceptance. Each of these notions of computation has particular characteristics that makes them more suitable for some tasks than for others. Nevertheless, there is much to be gained from unifying all three different notions under a single conceptual framework. [p. 1]
Another good example is Comonadic notions of computation by Tarmo Uustalu and Varmo Vene (2000):
Since the seminal work by Moggi in the late 80s, monads, more precisely, strong monads, have become a generally accepted tool for structuring effectful notions of computation, such as computation with exceptions, output, computation using an environment, state-transforming, nondeterministic and probabilistic computation etc. The idea is to use a Kleisli category as the category of impure, effectful functions, with the Kleisli inclusion giving an embedding of the pure functions from the base category. [...] [p. 263]
[...]
The starting-point in the monadic approach to (call-by-value) effectful computation is the idea that impure, effectful functions from A to B must be nothing else than pure functions from A to TB. Here pure functions live in a base category C and T is an endofunctor on C that describes the notion of effect of interest; it is useful to think of TA as the type of effectful computations of values of a given type A.
For this to work, impure functions must have identities and compose. Therefore T cannot merely be a functor, but must be a monad. [p. 265]
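To make that last point concrete in Haskell terms (a small sketch of my own, not from the quoted papers): take T = Maybe, so that "impure functions from A to B" are ordinary functions A -> Maybe B. Composing them is exactly what return and Kleisli composition (>=>) from the Monad structure provide:

import Control.Monad ((>=>))

-- Two "effectful" functions a -> T b, with T = Maybe modelling possible failure.
safeRecip :: Double -> Maybe Double
safeRecip 0 = Nothing
safeRecip x = Just (1 / x)

safeSqrt :: Double -> Maybe Double
safeSqrt x
  | x < 0     = Nothing
  | otherwise = Just (sqrt x)

-- Kleisli composition: the "impure" analogue of (.), available because Maybe is a Monad.
recipThenSqrt :: Double -> Maybe Double
recipThenSqrt = safeRecip >=> safeSqrt
-- recipThenSqrt 4 == Just 0.5, recipThenSqrt 0 == Nothing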
Such uses of "computation" fit the usual computer science notion of models of computation (see danidiaz's answer for more on that). In the informal functional programming literature, allusions to monads as models of computation have varying degrees of precision. Still, they generally draw from, or at least are offshoots of, a rigorous idea.
Nothing. It doesn't mean anything. It's the output of someone struggling to find metaphors which make monads into something they already know. It almost means something. "It is possible to construct models of computation which form monads," for instance, is a meaningful statement. But the difference is significant. "Monads are models of computation" is an attempt to force a broad abstraction into a narrow interpretation. The other statement merely observes that the broader abstraction can be put to work for that one use case.
Be very wary of reductive explanations. Do you think that an entire community of developers would keep using unfamiliar terminology if familiar terminology communicated the same thing? The term Monad has stuck around for 20 years in a language community that rapidly invents and discards abstractions as it searches for improvements. The only way that can happen is if it communicates something useful and precise.
It's just hard to write an explanation of the application of the idea to programming that makes any sense to people who don't know enough of the language to understand the constructs in use. If you aren't comfortable with at least higher-kinded types, type classes, and higher-order functions there's no way to understand what the notation is saying.
Learning the prerequisite ideas will help. Practicing writing code will help. Looking at how (>>=) works for various concrete types will help. Struggling through learning how to use a library like Parsec (or modern descendants like megaparsec) will help.
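For instance, here is (>>=) at two concrete types, as illustrative GHCi snippets:

ghci> Just 3 >>= \x -> Just (x + 1)      -- Maybe: short-circuits on Nothing
Just 4
ghci> Nothing >>= \x -> Just (x + 1)
Nothing
ghci> [1, 2, 3] >>= \x -> [x, x * 10]    -- lists: map and concatenate
[1,10,2,20,3,30]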
Trying to force the idea to match something you already know via metaphor will not.
Expanding a little on duplode's answer, I think that when talking about computation, "model" can have at least two slightly different meanings.
One is model in the sense of the Church–Turing thesis. Here a model is a way of performing computations that is capable of expressing any algorithm. So Turing machines, lambda calculus, Post correspondence systems... are all models.
Another is model in the sense of programming language semantics. The idea is that we consider programs as composable syntactical structures, and we want them to "mean" something, ideally in a way that lets us determine the meaning of a composition from the meaning of the elements. In this sense, lambda calculus has models.
Now, one kind of semantics is denotational semantics, in which the meaning we assign to a program is some kind of mathematical object. For a trivial example, consider binary numbers. Here the "programs" are strings of 0s and 1s, regarded as mere symbols. And the "model" would be natural numbers, along with a function which maps each string of symbols to the corresponding natural number.
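A tiny Haskell rendering of that toy example (my own illustration; the denote name is made up):

-- "Programs": strings of '0' and '1'.  "Model": the natural numbers,
-- via a denotation function from syntax to meaning.
denote :: String -> Integer
denote = foldl step 0
  where
    step n '0' = 2 * n
    step n '1' = 2 * n + 1
    step _ c   = error ("not a binary digit: " ++ [c])

-- denote "101" == 5; compositionally, denote (s ++ "0") == 2 * denote s.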
Sometimes these denotations of programs are expressed in terms of category theory. This is the context of Moggi's papers: he is making use of machinery from category theory—like monads—to map programming language concepts like exceptions, continuations, input/output... into a mathematical model. Monads become a convenient way of structuring the mathematical universe of program meanings.

What are abstract patterns?

I am learning Haskell and trying to understand the Monoid typeclass.
At the moment, I am reading the haskellbook and it says the following about the pattern (monoid):
One of the finer points of the Haskell community has been its propensity for recognizing abstract patterns in code which have well-defined, lawful representations in mathematics.
What does the author mean by abstract patterns?
Abstract in this sense is the opposite of concrete. This is probably one of the key things to understand about Haskell.
What is a concrete thing? Well, most values in Haskell are concrete. For example, 'a' :: Char. The letter 'a' is a Char value, and it's a concrete value. Char is a concrete type. But in 1 :: Num a => a, the number 1 is actually a value of any type, so long as that type has the set of functions that the Num typeclass sets out as mandatory. This is an abstract value! We can have abstract values, abstract types, and therefore abstract functions. When the program is compiled, the Haskell compiler will pick a particular concrete type that satisfies all of our requirements.
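A small sketch of that abstract-versus-concrete distinction:

-- 'one' is abstract: it has type Num a => a until a concrete type is chosen.
one :: Num a => a
one = 1

asInt :: Int
asInt = one          -- the compiler picks Int here

asDouble :: Double
asDouble = one       -- and Double here; same abstract value, two concrete types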
Haskell, at its core, has a very simple, small but incredibly flexible language. It's very similar to an expression of maths, actually. This makes it very powerful. Why? because most things that would be built in language constructs in other languages are not directly built into Haskell, but defined in terms of this simple core.
One of the core pieces is the function, which, it turns out, most of computation is expressible in terms of. Because so much of Haskell is just defined in terms of this small simple core, it means we can extend it to almost anywhere we can imagine.
Typeclasses are probably the best example of this. Monoid and Num are examples of typeclasses. These are constructs that allow programmers to use an abstraction like a function across a great many types while only having to define it once. Typeclasses let us use the same function names across a whole range of types, provided we can define those functions for those types. Why is that important or useful? Well, if we can recognise a pattern across, for example, all numbers, and we have a mechanism for talking about all numbers in the language itself, then we can write functions that work with all numbers at once. This is an abstract pattern. You'll notice some Haskellers are quite interested in a branch of mathematics called Category Theory, which is pretty much the mathematical study of abstract patterns. Contrast this with other languages, where the patterns the community notices are usually far less rigorous, have to be written out by hand each time, and carry no connection to their mathematical nature. The beauty of following the mathematics is the extremely large body of results we get for free by aligning our language more closely with it.
This is a good explanation of these basics including typeclasses in a book that I helped author: http://www.happylearnhaskelltutorial.com/1/output_other_things.html
Because functions are written in a very general way (Haskell puts hardly any limits on our ability to express things generally), we can write functions whose types express such things as "any type, so long as it's a Monoid". These are called type constraints, as above.
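A minimal sketch of such a constraint (this is essentially the standard mconcat, written out):

-- "Any type, so long as it's a Monoid": one definition, usable at many types.
combineAll :: Monoid m => [m] -> m
combineAll = foldr (<>) mempty

-- combineAll ["ab", "cd", "ef"] == "abcdef"
-- combineAll [[1], [2, 3]]      == [1, 2, 3]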
Generally, abstractions are very useful because we can, for example, write a single function to operate on an entire range of types, which means we can often find functions that do exactly what we want on our types just by making them instances of specific typeclasses. The Ord typeclass is a great example of this: making a type we define ourselves an instance of Ord gives us a whole bunch of sorting and comparing functions for free, as sketched below.
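For instance, reusing the Rank type from the question (a hypothetical example):

import Data.List (sort)

-- Deriving Ord (based on constructor order) gives us compare, (<), max,
-- sort, maximum, ... for free.
data Rank = Two | Three | Four | Five | Six | Seven | Eight | Nine | Ten
          | Jack | Queen | King | Ace
  deriving (Show, Eq, Ord)

-- sort [Ace, Two, King] == [Two, King, Ace]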
This is, in my opinion, one of the most exciting parts of Haskell: while most other languages also allow you to be very general, they mostly take a big hit in how much you can express with that generality, and are therefore also less powerful. (This is because they are less precise about what they are talking about; their types are less well defined.)
This is how we're able to reason about the "possible values" of a function, and it's not limited to Haskell. The more information we encode at the type level, the more we veer toward the specificity end of the spectrum of expressivity. For example, take the classic case of const :: a -> b -> a. This function requires that a and b can be of absolutely any type at all, including the same type if we like. Because it must work for any b, it can't use the second argument at all, and because it must return a value of any type a, the only value it can possibly return is the first argument. It can't return, say, an Int unless we give it an Int as its first value, because Int isn't "any type", right? So the functionality is defined right there in the type! If that's not mindblowing, then I don't know what is. :)
As we move to dependent types (that is, a type system where types are first class, which also means ordinary values can be encoded in the type system), we can get closer and closer to having the type system specify exactly what the constraints on the possible functionality are. The kicker is that it doesn't have to say anything about the implementation of that functionality unless we want it to, because we're in control of how abstract the types are, while still retaining expressivity and precision. That's pretty fascinating, and amazingly powerful.
Much math can be expressed in the language that underpins Haskell, the lambda calculus.

Are there benefits of strong typing besides safety?

In the Haskell community, we are slowly adding features of dependent types. Dependent types is an advanced typing feature by which types can depend on values. Some languages like Agda and Idris already have them. It appears to be a very advanced feature requiring an advanced type system, until you realize that Python has had the dynamically typed version of dependent types (which may or may not be actual dependent types) from the beginning.
For almost any program in a functional programming language, there is a way to represent it as an untyped lambda calculus term, no matter how advanced the typing. That's because typing only eliminates programs; it doesn't enable new ones.
Strong typing wins us safety. Whole classes of errors that used to happen at run time can no longer happen at run time. This safety is rather nice. Besides this safety, though, what does strong typing give you?
Are there additional benefits of a strong type system besides safety?
(Note that I'm not saying that strong typing is worthless. Safety is a huge benefit in and of itself. I'm just wondering if there are additional benefits.)
First, we need to talk a bit about the history of the simply typed lambda calculus.
There are two historical developments of the simply typed lambda calculus.
When Alonzo Church described the lambda calculus the types were baked in as part of the meaning / operational behavior of the terms.
When Haskell Curry described the lambda calculus the types were annotations put on the terms.
So we have the lambda calculus a la Church and the lambda calculus a la Curry. See https://en.wikipedia.org/wiki/Simply_typed_lambda_calculus#Intrinsic_vs._extrinsic_interpretations for more.
Ironically, the language Haskell, which is named after Curry, is based on a lambda calculus a la Church!
What this means is the types aren't simply annotations that rule out bad programs for you. They can "do stuff" too. Such types don't erase without leaving residue.
This shows up in Haskell's notion of type classes, which are really why Haskell is a language a la Church.
In Haskell, when I make a function
sort :: Ord a => [a] -> [a]
we're passing an object or dictionary for Ord a as the first argument.
But you aren't forced to plumb that argument around yourself in the code, it is the job of the compiler to build that up and use it.
instance Ord Char
instance Ord Int
instance Ord a => Ord [a]
So if you go and use sort on a list of strings, which are themselves lists of chars, this will build up the dictionary by passing the Ord Char instance through the instance for Ord a => Ord [a] to get Ord [Char], which is the same as Ord String. Then you can sort a list of strings.
Calling sort above is a lot less verbose than manually building a LexicographicComparator<List<Char>> by passing an IComparator<Char> to its constructor and calling the function with an extra second argument, if I compare the complexity of calling such a sort function in Haskell to calling it in C# or Java.
This shows us that programming with types can be significantly less verbose, because mechanisms like implicits and typeclasses can infer a large part of the code for your program during type checking.
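A rough sketch of what the compiler builds for you behind the scenes (hand-rolled dictionaries, purely illustrative; GHC's actual dictionary representation is an internal detail):

import Data.List (sortBy)

-- A "dictionary" is just a record of the class methods.
newtype OrdDict a = OrdDict { cmp :: a -> a -> Ordering }

-- The Ord Char instance, as an explicit value.
ordChar :: OrdDict Char
ordChar = OrdDict compare

-- The "instance Ord a => Ord [a]" instance, as a function between dictionaries.
ordList :: OrdDict a -> OrdDict [a]
ordList d = OrdDict go
  where
    go []     []     = EQ
    go []     _      = LT
    go _      []     = GT
    go (x:xs) (y:ys) = case cmp d x y of
                         EQ    -> go xs ys
                         other -> other

-- Without type classes, this plumbing would be written (and updated) by hand:
sortStrings :: [String] -> [String]
sortStrings = sortBy (cmp (ordList ordChar))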
On a simpler basis, even the sizes of arguments can depend on types, unless you want to pay fairly massive costs for boxing everything in your language up so that it has a homogeneous representation.
This shows us that programming with types can be significantly more efficient, because it can use dedicated representations, rather than paying for boxed structures everywhere in your code. An int can't just be a machine integer, because it has to somehow look like everything else in the system. If you're willing to give up an order of magnitude or more worth of performance at runtime, then this may not matter to you.
Finally, once we have types "doing stuff" for us, it is often beneficial to consider the refactoring benefits that mere safety provides.
If I refactor the smaller set of code that remains, the compiler will rewrite all of that type-class plumbing for me. It'll figure out the new ways it can rewrite the code to unbox more arguments. I'm not stuck elaborating all of this stuff by hand; I can leave these mundane tasks to the type-checker.
But even when I do change the types, I can move arguments around fairly willy-nilly, comfortable that the compiler will very likely catch my errors. Types give you "free theorems" which are like unit tests for whole classes of such errors.
On the other hand, once I lock down an API in a language like Python I'm deathly afraid of changing it, because it'll silently break at runtime for all my downstream dependencies! This leads to baroque APIs that lean heavily on easily bit-rotted keyword-arguments, and the API of something that evolves over time rarely resembles what you'd build out of the box if you had it to do over again. Consequently, even the mere safety concern has long-term impact in API design once you ever want people to build on top of your work, rather than simply replace it when it gets too unwieldy.
That's because typing only eliminates programs; it doesn't enable new ones.
This is not a correct statement. Type-classes make it possible to generate parts of your program from type-level information.
Consider two expressions:
readMaybe "15" :: Maybe Integer
readMaybe "15" :: Maybe Bool
Here I'm using the readMaybe function from the Text.Read module. At term level those expressions are identical, only their type annotations are different. However, the results they produce at runtime differ (Just 15 in the first case, Nothing in the second case).
This is because the compiler generates code for you from the static type information you have. To be more precise, it selects a suitable type class instance and passes its dictionary to the polymorphic function (readMaybe in our case).
This example is simple, but there are way more complex use cases. Using the mtl library you can write computations that run in different computational contexts (aka Monads). The compiler will automatically insert a lot of code that manages the computational contexts. In a dynamically typed language, you would have no static information to make this possible.
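A small sketch of that idea using mtl's MonadState class (the counter example itself is made up):

{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.State

-- One definition, written only against the MonadState interface.
tick :: MonadState Int m => m Int
tick = do
  n <- get
  put (n + 1)
  return n

-- The same code runs in the pure State monad ...
pureRun :: (Int, Int)
pureRun = runState tick 0        -- (0, 1)

-- ... or in a transformer stack over IO; the compiler inserts all the
-- instance-selection plumbing from the types alone.
ioRun :: IO Int
ioRun = evalStateT tick 0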
As you can see, static typing not only cuts off incorrect programs but also writes correct ones for you.
You need "safety" when you already know what and how you want to write. It's a very small part of what types are useful for. The most important thing about types is that they make your reasoning structured. When someone writes in Python a + b he doesn't see a and b as some abstract variables — he sees them as some numbers. Types are already there in the internal language of humans, Python just doesn't have a type system to talk about them. The actual question in the "typed vs untyped (unityped) programming" dispute is "do we want to reflect our internal structured concepts in a safe and explicit or unsafe and implicit way?". Types don't introduce new concepts — it's untyped reasoning forgets the existing ones.
When someone looks at a tree (I mean a real, green one), he doesn't see every single leaf on it, but he doesn't treat it as an abstract, nameless object either. "A tree" is an approximation that is good enough for most cases, and that's why we have Hindley-Milner type systems; but sometimes you want to talk about a specific tree, and you do want to look at the leaves. That's what dependent types give you: the ability to zoom. "A tree without leaves", "a tree in the forest", "a tree of a particular form"... Dependently typed programming is just another step towards how humans think.
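In Haskell terms, the nearest everyday approximation of that "zooming" is indexing a type by extra information, e.g. the classic length-indexed vector (a sketch using the GADTs and DataKinds extensions):

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Z | S Nat

-- "A list of a particular form": its length is part of its type.
data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- head becomes total: the type rules out the empty case entirely.
vhead :: Vec ('S n) a -> a
vhead (VCons x _) = x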
On a less abstract note, I have a type checker for a toy dependently typed language, where all typing rules are expressed as constructors of a data type. You don't need to dive into the type-checking procedure to understand the rules of the system. That's the power of "zooming": you can introduce invariants as complex as you want, thus distinguishing the essential parts from the unimportant ones.
Another example of the power dependent types give you is various forms of reflection. Look e.g. at Pierre-Évariste Dagand's thesis, which proves that
generic programming is just programming
And of course types are hints: many of the functions and abstractions I have defined would have come out far clumsier in a weakly typed language, but the types suggested better alternatives.
There is just no question "What to choose: simple types or dependent types?". Dependent types are always better and they of course subsume simple types. The question is "What to choose: no types or dependent types?", but that question doesn't stand for me.
Refactoring. By having a strong type system you can safely refactor code and have the compiler tell you whether what you are doing even makes sense. The stronger the type system, the more refactoring errors are avoided. This of course means your code is a lot more maintainable.

Type classes with laws that contain not equalities/symmetries but inequalities/asymmetries

All of the type classes that I've come across have had, I think, laws that establish symmetries by specifying equations. I was wondering, though, whether there are any prominent theoretical or even practical examples of type classes that establish asymmetries, i.e. ones that demand the lack of symmetry? By e.g. specifying <expr1> /= <expr2>, or <, or not somePredicate(a, b).
I understand that an inequality can be expressed as an equality with a free variable, e.g. a > b as b + k = a for some suitable k, but I'm thinking the introduction of free variables itself might align with my idea of enforced asymmetry.
What would be the (theoretical) applications of such law? And are there any (runnable) examples of this?
Alternatively: if this can't be considered Haskell enough to be on SO, should this go on CS or CSTheory?
Algebraic laws in general are typically specified only in terms of equational identities, and not disequalities. The standpoint from which to think about this is model theory. A theory can be thought of as 1) a collection of symbols, of different arities, so that sentences can be constructed from them (e.g. at arity 0 we might have sequences of numerals, at arity 1 negation, and at arity 2 addition) and 2) a set of equations that provide relations between sentences constructed from such signatures.
This lets us describe things like various arithmetic theories, groups, rings, modules, etc.
Now a model of a theory is a set of concrete assignments of mathematical objects (numbers, functions, etc) to the elements of the signature, such that the translation of sentences into the elements of the model respects these equations.
Categorically, we often think of a theory as a special sort of category of all possible sentences generated by the signature. The arrows in this category are implication -- sentences which may be generated from others by application of the equational identities. This in turn induces equivalences between all sentences which are the same under the application of the equational identities (this yields the "generators and relations" approach). And in turn, a model is simply a functor from this theory to any other category, though typically Set.
This yields a very nice adjunction between syntax and semantics. The greater the collection of sentences you want to model, the fewer the models you can get, and the more models you have, the smaller the set of sentences that will be satisfied by all of them. (Here I am only sketching the idea rather than filling in lots of important details).
In any case, one consequence of this that people tend to ignore, but that really pays off, is that in such a setting you want a "terminal model" that is the least among all models, just as you want an "initial theory" that admits all models. The terminal (aka trivial) model is the functor that sends everything in the theory to a one-element set, with every operation interpreted as the only possible map on it. Lots of very nice properties emerge when you have such things. But note -- to have such things, you must only have equational identities and not "disidentities." Such theories are called Algebraic Theories.
What does this all have to do with Haskell? Well, we can think of the signatures of typeclasses exactly as signatures of algebraic theories, and the laws of them as the equations of such theories. And that's generally how typeclasses are used in Haskell and why they were introduced -- to suit these sorts of situations.
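For instance, the Monoid laws are exactly such equational identities over the signature (mempty, <>); in Haskell they are typically stated in documentation or checked as QuickCheck properties (a sketch, assuming the QuickCheck package and specialising to [Int]):

import Test.QuickCheck

-- The Monoid laws, written as equations between terms built from the signature.
prop_leftIdentity :: [Int] -> Bool
prop_leftIdentity x = mempty <> x == x

prop_rightIdentity :: [Int] -> Bool
prop_rightIdentity x = x <> mempty == x

prop_assoc :: [Int] -> [Int] -> [Int] -> Bool
prop_assoc x y z = (x <> y) <> z == x <> (y <> z)

main :: IO ()
main = do
  quickCheck prop_leftIdentity
  quickCheck prop_rightIdentity
  quickCheck prop_assoc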
But of course we don't have to do this -- we can have whatever laws we want. But we lose all sorts of nice properties along the way -- and often, in fact, find that disequalities mean our theory will have very few models, and with weird structure relating them. Since typeclasses are intended to capture common structure between various things, and since non-algebraic theories tend to fix unique(ish) models, it turns out it is rarely the case that we would want to use disequality relations in typeclass laws. And indeed I can't think of any examples where I've seen it come up.
Here's another way to think of it -- consider a theory with equalities and disequalities both, and then eliminate the disequalities. What remains still admits all the prior models, but also may have a bunch of "unintended" ones. So we don't gain additional reasoning in terms of rewrites -- we just have certain models that are a priori excluded. Furthermore, when one wishes to rule out "unintended" models this is usually because we want to fix a particular "intended" one. And if we want to fix a particular intended model, the question immediately arises -- why not just use that concrete structure, instead of the more general typeclass?

Why aren't there many discussions about co- and contra-variance in Haskell (as opposed to Scala or C#)?

I know what covariance and contravariance of types are. My question is why haven't I encountered discussion of these concepts yet in my study of Haskell (as opposed to, say, Scala)?
It seems there is a fundamental difference in the way Haskell views types as opposed to Scala or C#, and I'd like to articulate what that difference is.
Or maybe I'm wrong and I just haven't learned enough Haskell yet :-)
There are two main reasons:
Haskell lacks an inherent notion of subtyping, so in general variance is less relevant.
Contravariance mostly appears where mutability is involved, so most data types in Haskell would simply be covariant and there'd be little value to distinguishing that explicitly.
However, the concepts do apply--for instance, the lifting operation performed by fmap for Functor instances is actually covariant; the terms co-/contravariance are used in Category Theory to talk about functors. The contravariant package defines a type class for contravariant functors, and if you look at the instance list you'll see why I said it's much less common.
There are also places where the idea shows up implicitly, in how manual conversions work--the various numeric type classes define conversions to and from basic types like Integer and Rational, and the module Data.List contains generic versions of some standard functions. If you look at the types of these generic versions you'll see that Integral constraints (giving toInteger) are used on types in contravariant positions, while Num constraints (giving fromInteger) are used on types in covariant positions.
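To make the contravariant case concrete, here is a hand-rolled version of the Predicate example from that package (just a sketch; the real Contravariant class and Predicate type live in Data.Functor.Contravariant):

-- A predicate consumes an a, so the type parameter sits in contravariant position.
newtype Predicate a = Predicate { runPredicate :: a -> Bool }

-- contramap adapts the input, the opposite direction from fmap.
contramapP :: (b -> a) -> Predicate a -> Predicate b
contramapP f (Predicate p) = Predicate (p . f)

isLongEnough :: Predicate String
isLongEnough = Predicate ((>= 8) . length)

-- Reuse the String predicate on a (hypothetical) User record by first
-- projecting out its password field.
data User = User { userName :: String, userPassword :: String }

validPassword :: Predicate User
validPassword = contramapP userPassword isLongEnough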
There are no "sub-types" in Haskell, so covariance and contravariance don't make any sense.
In Scala, you have e.g. Option[+A] with the subclasses Some[+A] and None. You have to provide the covariance annotations + to say that an Option[Foo] is an Option[Bar] if Foo extends Bar. Because of the presence of sub-types, this is necessary.
In Haskell, there are no sub-types. The equivalent of Option in Haskell, called Maybe, has this definition:
data Maybe a = Nothing | Just a
The type variable a can only ever be one type, so no further information about it is necessary.
As mentioned, Haskell does not have subtypes. However, if you're looking at typeclasses it may not be clear how that works without subtyping.
Typeclasses specify predicates on types, not types themselves. So when a Typeclass has a superclass (e.g. Eq a => Ord a), that doesn't mean instances are subtypes, because only the predicates are inherited, not the types themselves.
Also, co-, contra-, and invariance mean different things in different fields of math (see Wikipedia). For example, the terms covariant and contravariant are used for functors (which in turn are used in Haskell), but there the terms mean something quite different. The term invariant can be used in a lot of places.

Resources