Understanding Haskell's Bool Deriving an Ord - haskell

Learn You a Haskell presents the Bool type:
data Bool = False | True deriving (Ord)
I don't understand the reason for comparing Bools.
> False `compare` True
LT
> True `compare` False
GT
What would be lost if Bool did not derive from Ord?

Bool forms a bounded lattice* where False is bottom and True is top. This bounded lattice defines a (total) ordering where False really is strictly less than True. (They are also the only elements of this lattice.)
The boolean operations and and or can also be looked at as meet and join, respectively, in this lattice. Meet finds the greatest lower bound and join finds the least upper bound. This means that a && False = False is the same thing as saying that the greatest lower bound of bottom and anything else is bottom, and a || True = True is the same thing as saying that the least upper bound of top and anything else is top. So meet and join, which use the ordering property of the booleans, are equivalent to the boolean operations you are familiar with.
You can use min and max to show this in Haskell:
False `min` True = False -- this is the greatest lower bound
False && True = False -- so is this
False `max` True = True -- this is the least upper bound
False || True = True -- so is this
This shows that you can define && and || just from the derived Ord instance:
(&&) = min
(||) = max
Note that these definitions are not equivalent in the presence of a different kind of bottom because (&&) and (||) are short-circuiting (non-strict in the second argument when the first is False or True, respectively) while min and max are not.
Also, a small correction: the deriving clause does not say that Bool "derives from" Ord. It instructs GHC to derive an instance of the typeclass Ord for the type Bool.
* More specifically, a complemented distributive lattice. More specifically still, a boolean algebra.
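As a rough check of the claim above, the following sketch compares min and max with (&&) and (||) on all four input pairs. The names allPairs, minAgreesWithAnd, maxAgreesWithOr and shortCircuit are made up for this illustration.

```haskell
-- All four combinations of two Bool arguments.
allPairs :: [(Bool, Bool)]
allPairs = [(a, b) | a <- [False, True], b <- [False, True]]

-- min coincides with (&&) on every defined input pair.
minAgreesWithAnd :: Bool
minAgreesWithAnd = all (\(a, b) -> min a b == (a && b)) allPairs

-- max coincides with (||) on every defined input pair.
maxAgreesWithOr :: Bool
maxAgreesWithOr = all (\(a, b) -> max a b == (a || b)) allPairs

-- (&&) does not inspect its second argument when the first is False,
-- so this is False even though the second argument is undefined.
shortCircuit :: Bool
shortCircuit = False && undefined

main :: IO ()
main = do
  print minAgreesWithAnd
  print maxAgreesWithOr
  print shortCircuit
```

The agreement only covers fully defined arguments. As noted above, min and max are not short-circuiting, so replacing (&&) with min in shortCircuit would not be a safe substitution.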

The Ord instance for Bool becomes much more important when you need to compare values that contain Bool somewhere inside. For example, without it we wouldn't be able to write expressions like:
[False,True] `compare` [False,True,False]
(3, False) < (3, True)
data Person = Person { name :: String, member :: Bool } deriving (Eq, Ord)
etc.
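A small runnable sketch of these three cases follows. The sample people and the names listComparison, tupleComparison and sortedPeople are invented for the example; Show is added to the deriving clause only so the result can be printed.

```haskell
import Data.List (sort)

data Person = Person { name :: String, member :: Bool }
  deriving (Eq, Ord, Show)

-- Lists compare lexicographically, which needs Ord on the elements.
listComparison :: Ordering
listComparison = [False, True] `compare` [False, True, False]

-- Tuples compare component by component, left to right.
tupleComparison :: Bool
tupleComparison = (3 :: Int, False) < (3, True)

-- The derived Ord for Person compares name first, then member.
sortedPeople :: [Person]
sortedPeople = sort [Person "Ann" True, Person "Ann" False, Person "Al" True]

main :: IO ()
main = do
  print listComparison
  print tupleComparison
  print sortedPeople
```

None of these definitions would compile if Bool had no Ord instance, since each derived or built-in instance requires Ord on its components.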

It is because the Haskell designers made a mistake! I have never seen a mathematics textbook that mentions an ordering of booleans. Just because they can be ordered does not mean they should be. Some of us use Haskell exactly because it protects us from confusing or nonsensical things in many cases, but not in this one.
With instance Ord Bool, the implication a => b has to be written a <= b, with the "arrow" seemingly pointing the wrong way!
Earlier arguments in favour of instance Ord Bool were that you can make more types comparable implicitly. Continuing that line of argument, some might want to make every type comparable implicitly, and even have weak dynamic typing and omit type classes altogether. But we want strong typing exactly to disallow what is not obviously correct, and instance Ord Bool defeats that purpose.
As for the argument that Bool is a bounded lattice: unlike boolean := {True, False}, what we have in Haskell is Bool := {True, False, bottom}, which is no longer a bounded lattice, since neither True nor False is an identity element in the presence of bottom. This is related to the comments discussing && versus min, etc.

Related

The difference between algebraic data types and subclasses in data Bool = False | True

I am not really familiar with Haskell but am looking through some of it. I noticed this:
data Bool = False | True
In an OO language this could be done sort of using subclasses:
class Bool
class False < Bool
class True < Bool
Wondering at a high level what the difference is between these two constructs. Wondering if a simple algebraic data type can be considered a class and its subclasses. If not, why not.
It makes more sense to think of sum types as typed disjoint unions. Subclasses in OO languages on the other hand share data layout, whereas in Haskell, the data constructors can be completely disjoint. – Tobias
Bool is a type, while False and True are values. A key difference lies in that. – duplode
To see it, try :k at the GHCi prompt:
~> :k Bool
Bool :: *
~> :k False
***error***
This is because Bool is a type of things, and False (creates) a thing. It is a (nullary) data constructor, which happens to not require any arguments:
x :: Bool
x = False
Things have types:
~> :t False
False :: Bool
~> :t x
x :: Bool
~> :t Bool
***error***
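To make the "typed disjoint union" view concrete, here is a minimal sketch. The Shape type and the functions describeBool and area are hypothetical and only serve the illustration: the constructors are values (or functions producing values) of one closed type, they may carry unrelated payloads, and consumers handle them by pattern matching.

```haskell
-- Bool has two nullary constructors; a function on Bool matches on them.
describeBool :: Bool -> String
describeBool False = "no"
describeBool True  = "yes"

-- A sum type whose constructors carry completely different data.
data Shape
  = Circle Double        -- radius
  | Rect Double Double   -- width and height

-- One function covers every constructor; no subclass overrides it.
area :: Shape -> Double
area (Circle r) = pi * r * r
area (Rect w h) = w * h

main :: IO ()
main = do
  putStrLn (describeBool True)
  print (area (Rect 2 3))
```

Unlike a class hierarchy, the set of constructors is fixed at the definition of the type, and neither Circle nor Rect is itself a type.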

Confusion about type level bools in base

Base contains a number of type families for type level boolean operations as seen here:
https://hackage.haskell.org/package/base-4.8.2.0/docs/Data-Type-Bool.html
However, the links to True and False just refer to data constructors of Bool, not types themselves, so presumably they can't be used for type level operations, as they're values, not types.
Could someone explain what's happening here and where I can find the type level definitions?
the links to True and False just refer to data constructors of Bool, not types themselves
That's right. In fact they are not types (what values would e.g. True have*)? They are still themselves just values, of type Bool. Only, that entire type has been “lifted” one level through the -XDataKinds extension, so Bool is now also a type-level type: aka a kind.
Traditionally in Haskell, we work mainly with a damn single kind: *, the kind of ordinary types†. This kind contains Bool and String and IO () and (Int -> Double) -> Char... everything that actually has values‡. Plus the constructor-kinds, which are all of some form * -> * and contain things like Maybe or [] (when not applied to a contained-type argument).
With DataKinds now, we have a whole added arsenal of kinds: any type§ you could use in runtime-Haskell code can now also be used as a kind in compile-time! All of these kinds contain exactly the values they also have on the runtime level. But those type-level values, such as the False and True you asked about, are not actually types, they just live in the type-level. But you can build actual * types from them, e.g. with something like
data CanContain :: Bool -> * -> * where
  Interesting :: a -> CanContain True a
  Boring :: CanContain False a
then a function with type X -> CanContain True Y must actually generate a Y value, but a function with type X -> CanContain False Y need not.
*No, the answer is not True. Though, then we could implement type Bool = Either False True, which would kinda make sense.
†Arguably, that is not a very good name in a language which otherwise parses * as an infix symbol. It will actually be changed in the future.
‡It's not quite so simple: there are also unboxed kinds, but those are a bit of a technical detail.
§As dfeuer remarks, not all types can be lifted right now (GHC-7.10), but simple ones such as Bool certainly can.
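A compilable sketch of the CanContain example follows, under a few assumptions: it uses Type from Data.Kind in place of *, writes the promoted constructors with a leading tick ('True, 'False), and adds the helper functions extract and describe plus the sample value exampleBoring, none of which appear in the answer above.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

import Data.Kind (Type)

-- The first parameter has kind Bool: it is a type-level True or False.
data CanContain :: Bool -> Type -> Type where
  Interesting :: a -> CanContain 'True a
  Boring      :: CanContain 'False a

-- Only the Interesting constructor can produce a CanContain 'True a,
-- so a value is guaranteed to be present.
extract :: CanContain 'True a -> a
extract (Interesting x) = x

-- A function that is polymorphic in the flag must handle both cases.
describe :: CanContain b a -> String
describe (Interesting _) = "holds a value"
describe Boring          = "holds nothing"

exampleBoring :: CanContain 'False Int
exampleBoring = Boring

main :: IO ()
main = do
  print (extract (Interesting 'x'))
  putStrLn (describe exampleBoring)
```

Passing exampleBoring to extract is rejected at compile time, which is the sense in which the type-level Bool constrains the runtime values.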

Ordering of Bool types (i.e. True > False) - Why? [duplicate]

Can someone please explain the following output?
Prelude> compare True False
GT
it :: Ordering
Prelude> compare False True
LT
it :: Ordering
Why are Bool type values ordered in Haskell - especially, since we can demonstrate that values of True and False are not exactly 1 and 0 (unlike many other languages)?
This is how the derived instance of Ord works:
data D = A | B | C deriving Ord
Given that datatype, we get C > B > A. Bool is defined as False | True, and it kind of makes sense when you look at other examples such as:
data Maybe a = Nothing | Just a
data Either a b = Left a | Right b
In each case, having "some" ("truthy") value is greater than having no value at all (or having a "left", "bad" or "falsy" value).
While Bool is not Int, it can be converted to the 0,1 fragment of Int since it is an Enum type.
fromEnum False = 0
fromEnum True = 1
Now, the Enum could have been different, reversing 0 and 1, but that would probably be surprising to most programmers thinking about bits.
Since it has an Enum instance, everything else being equal, it's better to define an Ord instance which follows the same order, satisfying
compare x y = compare (fromEnum x) (fromEnum y)
In fact, every set of instances generated by deriving (Eq, Ord, Enum) satisfies this property.
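The property can be checked exhaustively for small enumerations. In this sketch the helper ordMatchesEnum and the results holdsForD and holdsForBool are names invented for the illustration, and Bounded is derived only to enumerate all values of D.

```haskell
data D = A | B | C
  deriving (Eq, Ord, Enum, Bounded, Show)

-- The property from the text: comparing values agrees with comparing
-- their fromEnum images.
ordMatchesEnum :: (Ord a, Enum a) => a -> a -> Bool
ordMatchesEnum x y = compare x y == compare (fromEnum x) (fromEnum y)

-- Check every pair of D values.
holdsForD :: Bool
holdsForD =
  and [ordMatchesEnum x y | x <- [minBound .. maxBound :: D], y <- [minBound .. maxBound]]

-- Check every pair of Bool values.
holdsForBool :: Bool
holdsForBool =
  and [ordMatchesEnum x y | x <- [False, True], y <- [False, True]]

main :: IO ()
main = do
  print holdsForD
  print holdsForBool
```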
On a more theoretical note, logicians tend to order propositions from the strongest to the weakest (forming a lattice). In this structure, False (as a proposition) is the bottom, i.e. the least element, while True is the top. While this is only a convention (theory would be just as nice if we picked the opposite ordering), it's a good thing to be consistent.
Minor downside: the implication boolean connective is actually p <= q expressing that p implies q, instead of the converse as the "arrow" seems to indicate.
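To see that (<=) on Bool really is implication, one can tabulate it. The names implies and truthTable below are chosen for this sketch only.

```haskell
-- Implication expressed through the Ord instance of Bool.
implies :: Bool -> Bool -> Bool
implies = (<=)

-- Rows are (p, q, p `implies` q).
truthTable :: [(Bool, Bool, Bool)]
truthTable = [(p, q, p `implies` q) | p <- [False, True], q <- [False, True]]

main :: IO ()
main = mapM_ print truthTable
```

The only row with a False result is p = True, q = False, which matches the usual truth table for implication.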
Let me answer your question with a question: Why is there an Ord instance for ()?
Unlike Bool, () has only one possible value: (). So why the hell would you ever want to compare it? There is only one value possible!
Basically, it's useful if all or most of the standard basic types have instances for common classes. It makes it easier to derive instances for your own types. If Foo doesn't have an Ord instance, and your new type has a single Foo field, then you can't auto-derive an Ord instance.
You might, for example, have some kind of tree type where we can attach several items of information to the leaves. Something like Tree x y z. And you might want to have an Eq instance to compare trees. It would be annoying if Tree () Int String didn't have an Eq instance just because () doesn't. So that's why () has Eq (and Ord and a few others).
Similar remarks apply to Bool. It might not sound particularly useful to compare two bool values, but it would be irritating if your Ord instance vanishes as soon as you put a bool in there.
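A sketch of the point about deriving: the Tree shape, the Flagged record and the sample values below are assumptions made for the example, since the answer only names the type Tree x y z.

```haskell
-- A tree with three pieces of information on each leaf.
data Tree x y z
  = Leaf x y z
  | Node (Tree x y z) (Tree x y z)
  deriving (Eq, Ord, Show)

-- Deriving works here only because (), Int and String all have
-- Eq and Ord instances.
t1, t2 :: Tree () Int String
t1 = Node (Leaf () 1 "a") (Leaf () 2 "b")
t2 = Node (Leaf () 1 "a") (Leaf () 3 "b")

-- Likewise, this derived Ord relies on the Ord instance for Bool.
data Flagged = Flagged { flag :: Bool, count :: Int }
  deriving (Eq, Ord, Show)

main :: IO ()
main = do
  print (t1 == t1)
  print (t1 < t2)
  print (Flagged False 9 < Flagged True 0)
```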
(One other complicating factor is that sometimes we want Ord because there's a logically meaningful ordering for things, and sometimes we just want some arbitrary order, typically so we can use something as a key for Data.Map or similar. Arguably there ought to be two separate classes for that… but there aren't.)
Basically, it comes from math. In set theory or category theory boolean functions are usually thought of as classifiers of subsets/subobjects. In plain terms, function f :: a -> Bool is identified with filter f :: [a] -> [a]. So, if we change one value from False to True, the resulting filtered list (subset, subobject, whatever) is going to have more elements. Therefore, True is considered "bigger" than False.
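The subset reading can be illustrated with two predicates where one is pointwise below the other. The predicates p and q and the range 0 to 10 are arbitrary choices for this sketch.

```haskell
import Data.List (isSubsequenceOf)

-- q is True wherever p is True, so p n <= q n for every n.
p, q :: Int -> Bool
p n = n > 5
q n = n > 2

-- The pointwise ordering of the predicates, using Ord Bool.
pointwiseLE :: Bool
pointwiseLE = all (\n -> p n <= q n) [0 .. 10]

-- The smaller predicate selects a sub-list of what the larger one selects.
smallerSubset :: Bool
smallerSubset = filter p [0 .. 10] `isSubsequenceOf` filter q [0 .. 10]

main :: IO ()
main = do
  print pointwiseLE
  print smallerSubset
```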

How can quotient types help safely expose module internals?

Reading up on quotient types and their use in functional programming, I came across this post. The author mentions Data.Set as an example of a module which provides a ton of functions which need access to module's internals:
Data.Set has 36 functions, when all that are really needed to ensure the meaning of a set ("These elements are distinct") are toList and fromList.
The author's point seems to be that we need to "open up the module and break the abstraction" if we forgot some function which can be implemented efficiently only using module's internals.
He then says
We could alleviate all of this mess with quotient types.
but gives no explanation to that claim.
So my question is: how are quotient types helping here?
EDIT
I've done a bit more research and found a paper "Constructing Polymorphic Programs with Quotient Types". It elaborates on declaring quotient containers and mentions the word "efficient" in abstract and introduction. But if I haven't misread, it does not give any example of an efficient representation "hiding behind" a quotient container.
EDIT 2
A bit more is revealed in Chapter 3 of the paper "Programming in Homotopy Type Theory". The fact that a quotient type can be implemented as a dependent sum is used. Views on abstract types are introduced (which look very similar to type classes to me) and some relevant Agda code is provided. Yet the chapter focuses on reasoning about abstract types, so I'm not sure how this relates to my question.
I recently made a blog post about quotient types, and I was led here by a comment. The blog post may provide some additional context in addition to the papers referenced in the question.
The answer is actually pretty straightforward. One way to arrive at it is to ask the question: why are we using an abstract data type in the first place for Data.Set?
There are two distinct and separable reasons. The first reason is to hide the internal type behind an interface so that we can substitute a completely new type in the future. The second reason is to enforce implicit invariants on values of the internal type. Quotient types and their dual, subset types, allow us to make the invariants explicit and enforced by the type checker so that we no longer need to hide the representation. So let me be very clear: quotient (and subset) types do not provide you with any implementation hiding. If you implement Data.Set with quotient types using lists as your representation, then later decide you want to use trees, you will need to change all code that uses your type.
Let's start with a simpler example (leftaroundabout's). Haskell has an Integer type but not a Natural type. A simple way to specify Natural as a subset type using made up syntax would be:
type Natural = { n :: Integer | n >= 0 }
We could implement this as an abstract type using a smart constructor that threw an error when given a negative Integer. This type says that only a subset of the values of type Integer are valid. Another approach we could use to implement this type is to use a quotient type:
type Natural = Integer / ~ where n ~ m = abs n == abs m
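Since Haskell has neither subset nor quotient types, both flavours can only be approximated with abstract types. The sketch below is one such approximation under stated assumptions: the names MkNatural, mkNatural, QNat, MkQNat and observe are invented here, the smart constructor returns Maybe instead of throwing an error, and in a real module the constructors would not be exported.

```haskell
-- Subset flavour: only non-negative Integers are admitted.
newtype Natural = MkNatural Integer
  deriving (Eq, Ord, Show)

-- Smart constructor: the constraint is placed on producers of values.
mkNatural :: Integer -> Maybe Natural
mkNatural n
  | n >= 0    = Just (MkNatural n)
  | otherwise = Nothing

-- Quotient flavour: every Integer is admitted, but n and -n are
-- meant to be indistinguishable.
newtype QNat = MkQNat Integer

instance Eq QNat where
  MkQNat a == MkQNat b = abs a == abs b

-- Consumers only ever see the absolute value, so they cannot tell
-- equivalent representatives apart.
observe :: (Integer -> r) -> QNat -> r
observe f (MkQNat n) = f (abs n)

main :: IO ()
main = do
  print (mkNatural 3)
  print (mkNatural (-3))
  print (MkQNat 2 == MkQNat (-2))
  print (observe even (MkQNat (-4)))
```

With real quotient types the type checker would verify that every consumer respects the equivalence; here that guarantee rests on routing all access through observe.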
Any function h :: X -> T for some type T induces a quotient type on X quotiented by the equivalence relation x ~ y = h x == h y. Quotient types of this form are more easily encoded as abstract data types. In general, though, there may not be such a convenient function, e.g.:
type Pair a = (a, a) / ~ where (a, b) ~ (x, y) = a == x && b == y || a == y && b == x
(As to how quotient types relate to setoids, a quotient type is a setoid that enforces that you respect its equivalence relation.) This second definition of Natural has the property that there are two values that represent 2, say. Namely, 2 and -2. The quotient type aspect says we are allowed to do whatever we want with the underlying Integer, so long as we never produce a result that differentiates between these two representatives. Another way to see this is that we can encode a quotient type using subset types as:
X/~ = forall a. { f :: X -> a | forEvery (\(x, y) -> x ~ y ==> f x == f y) } -> a
Unfortunately, that forEvery is tantamount to checking equality of functions.
Zooming back out, subset types add constraints on producers of values and quotient types add constraints on consumers of values. Invariants enforced by an abstract data type may be a mixture of these. Indeed, we may decide to represent a Set as the following:
data Tree a = Empty | Branch (Tree a) a (Tree a)
type BST a = { t :: Tree a | isSorted (toList t) }
type Set a = { t :: BST a | noDuplicates (toList t) } / ~
  where s ~ t = toList s == toList t
Note, nothing about this ever requires us to actually execute isSorted, noDuplicates, or toList. We "merely" need to convince the type checker that the implementations of functions on this type would satisfy these predicates. The quotient type allows us to have a redundant representation while enforcing that we treat equivalent representations in the same way. This doesn't mean we can't leverage the specific representation we have to produce a value, it just means that we must convince the type checker that we would have produced the same value given a different, equivalent representation. For example:
maximum :: Set a -> a
maximum s = exposing s as t in go t
  where go Empty = error "maximum of empty Set"
        go (Branch _ x Empty) = x
        go (Branch _ _ r) = go r
The proof obligation for this is that the right-most element of any binary search tree with the same elements is the same. Formally, it's go t == go t' whenever toList t == toList t'. If we used a representation that guaranteed the tree would be balanced, e.g. an AVL tree, this operation would be O(log N) while converting to a list and picking the maximum from the list would be O(N). Even with this representation, this code is strictly more efficient than converting to a list and getting the maximum from the list. Note, that we could not implement a function that displayed the tree structure of the Set. Such a function would be ill-typed.
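In plain Haskell the proof obligation cannot be stated in the types, but it can at least be spot-checked at runtime. The sketch below assumes an ordinary unbalanced binary search tree; insert, fromList and treeMax are illustrative names, and treeMax returns Maybe rather than calling error.

```haskell
data Tree a = Empty | Branch (Tree a) a (Tree a)

-- Standard BST insertion that ignores duplicates.
insert :: Ord a => a -> Tree a -> Tree a
insert x Empty = Branch Empty x Empty
insert x t@(Branch l y r)
  | x < y     = Branch (insert x l) y r
  | x > y     = Branch l y (insert x r)
  | otherwise = t

fromList :: Ord a => [a] -> Tree a
fromList = foldr insert Empty

-- In-order traversal: the "view" that the quotient is defined by.
toList :: Tree a -> [a]
toList Empty          = []
toList (Branch l x r) = toList l ++ [x] ++ toList r

-- Follows the right spine only, without touching the rest of the tree.
treeMax :: Tree a -> Maybe a
treeMax Empty              = Nothing
treeMax (Branch _ x Empty) = Just x
treeMax (Branch _ _ r)     = treeMax r

-- Two differently shaped trees holding the same elements.
t1, t2 :: Tree Int
t1 = fromList [3, 1, 2]
t2 = fromList [1, 2, 3]

main :: IO ()
main = do
  print (toList t1 == toList t2)
  print (treeMax t1 == treeMax t2)
```

Here t1 and t2 have different shapes but the same toList, and treeMax gives the same answer on both, which is the kind of agreement the obligation asks for on all equivalent trees.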
I'll give a simpler example where it's reasonably clear. Admittedly I myself don't really see how this would translate to something like Set, efficiently.
data Nat = Nat (Integer / abs)
To use this safely, we must be sure that any function Nat -> T (with some non-quotient T, for simplicity's sake) does not depend on the actual integer value, but only on its absolute. To do so, it's not really necessary to hide Integer completely; it would be sufficient to prevent you from matching on it directly. Instead, the compiler might rewrite the matches, e.g.
even' :: Nat -> Bool
even' (Nat 0) = True
even' (Nat 1) = False
even' (Nat n) = even' . Nat $ n - 2
could be rewritten to
even' (Nat n') = case abs n' of
  [|abs 0|] -> True
  [|abs 1|] -> False
  n -> even' . Nat $ n - 2
Such a rewriting would point out equivalence violations, e.g.
bad (Nat 1) = "foo"
bad (Nat (-1)) = "bar"
bad _ = undefined
would rewrite to
bad (Nat n') = case abs n' of
  1 -> "foo"
  1 -> "bar"
  _ -> undefined
which is obviously an overlapped pattern.
Disclaimer: I just read up on quotient types upon reading this question.
I think the author is just saying that sets can be described as quotient types over lists. I.e. (making up some Haskell-like syntax):
data Set a = Set [a] / (sort . nub) deriving (Eq)
I.e., a Set a is just a [a], with equality between two Set a's determined by whether the sort . nub of the underlying lists are equal.
We could do this explicitly like this, I guess:
import Data.List
data Set a = Set [a] deriving (Show)
instance (Ord a, Eq a) => Eq (Set a) where
  (Set xs) == (Set ys) = (sort $ nub xs) == (sort $ nub ys)
Not sure if this is actually what the author intended as this isn't a particularly efficient way of implementing a set. Someone can feel free to correct me.

Haskell: do standard libraries assume Eq and Ord are compatible?

This is a followup question to Inconsistent Eq and Ord instances?.
The question there is essentially: when declaring Eq and Ord instances for a type, must one ensure that compare x y returns EQ if and only if x == y returns True? Is it dangerous to create instances that break this assumption? It seems like a natural law one might assume, but it doesn’t appear to be explicitly stated in the Prelude, unlike e.g. the monad or functor laws.
The basic response was: it is a bit dangerous to do this, since libraries may assume that this law holds.
My question, now, is: do any of the standard libraries (in particular, Set or Map) make this assumption? Is it dangerous to have a type with incompatible Eq and Ord, so long as I am only relying on the standard libraries supplied with GHC? (If big-list questions were still acceptable, I would be asking: which commonly used libraries assume this law?)
Edit. My use-case is similar to that of the original question. I have a type with a custom instance of Eq, that I use quite a bit. The only reason I want Ord is so that I can use it as the domain of a Map; I don’t care about the specific order, and will never use it explicitly in code. So if I can use the derived instance of Ord, then my life will be easier and my code clearer.
The definition of Ord itself in the standard prelude requires there already be an Eq instance:
class (Eq a) => Ord a where
  ...
So it would be just as wrong to violate
x == y = compare x y == EQ
x /= y = compare x y /= EQ
as it would be to violate the following (from the default definitions of these operators in Ord):
x <= y = compare x y /= GT
x < y = compare x y == LT
x >= y = compare x y /= LT
x > y = compare x y == GT
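The first of these laws can be turned into a small check. In the sketch below, consistent is a helper written for this illustration, and Labeled is a deliberately inconsistent, hypothetical type whose (==) looks at both fields while compare looks only at the label.

```haskell
-- True when (==) and compare agree on this pair of values.
consistent :: Ord a => a -> a -> Bool
consistent x y = (x == y) == (compare x y == EQ)

data Labeled = Labeled { label :: String, payload :: Int }

-- Equality is a deep comparison of both fields...
instance Eq Labeled where
  a == b = label a == label b && payload a == payload b

-- ...but the ordering ignores the payload, which breaks the law.
instance Ord Labeled where
  compare a b = compare (label a) (label b)

main :: IO ()
main = do
  print (consistent True False)
  print (consistent (Labeled "x" 1) (Labeled "x" 2))
```

The second check fails because the two values compare as EQ while (==) says they differ, which is exactly the situation the question asks about.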
Edit: Use in libraries
I would be quite surprised if standard libraries didn't make use of the == and /= operators that every Ord instance brings along through its Eq superclass. The specific purpose operators (==, /=, <=, <, >=, >) are frequently more convenient than compare, so I'd expect to see them used in code for maps or filters.
You can see == being used in guards on keys in Data.Map in fromAscListWithKey. This specific function only calls out for the Eq class, but if the key is also an Ord instance, Ord's compare will be used for other functions of the resulting Map, which is an assumption that Eq's == is the same as Ord's compare and testing for EQ.
As a library programmer, I wouldn't be surprised if any of the special purpose operators outperformed compare for the specific purpose. After all, that's why they are part of the Eq and Ord classes instead of being defined as polymorphic for all Eq or Ord instances. I might make a point of using them even when compare is more convenient. If I did, I'd probably define something like:
compareOp :: (Ord a) => Ordering -> Bool -> a -> a -> Bool
compareOp EQ True = (==)
compareOp EQ False = (/=)
compareOp LT True = (<)
compareOp LT False = (>=)
compareOp GT True = (>)
compareOp GT False = (<=)
To extend Cirdec's answer, typeclass instances should only be made if the operation being defined is somehow canonical. If there is a reasonable Eq which doesn't extend to a reasonable Ord, then it's best practice to pick either the other Eq or to not define an Ord. It's easy enough to create a non-polymorphic function for the "other" equality.
A great example of this tension is the potential Monoid instance
instance Monoid Int where
  mempty = 0
  mappend = (+)
which contests with the other "obvious" Monoid instance
instance Monoid Int where
  mempty = 1
  mappend = (*)
In this case the chosen path was to instantiate neither, because it's not clear that one is "canonical" over the other. This typically conforms best to a user's expectations and prevents bugs.
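For reference, Data.Monoid in base offers both choices through the newtype wrappers Sum and Product, so the caller picks the monoid explicitly. A brief sketch, with the names total and productOf made up for the example:

```haskell
import Data.Monoid (Product (..), Sum (..))

-- Combine under addition by wrapping each element in Sum.
total :: Int
total = getSum (foldMap Sum [1, 2, 3, 4])

-- Combine under multiplication by wrapping each element in Product.
productOf :: Int
productOf = getProduct (foldMap Product [1, 2, 3, 4])

main :: IO ()
main = do
  print total
  print productOf
```

The same trick applies to a type with two plausible equalities or orderings: keep one as the instance and expose the other through a wrapper or a plain function.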
I've read through this and your original question, so I will address your general problem....
You want this-
Map BigThing OtherType
and this-
(==)::BigThing->BigThing->Bool
One of these cases has to be exact, the other case should ignore some of its data, for performance reasons. (it was (==) that needed to be exact in the first question, but it looks like you might be addressing the reverse in this question.... Same answer either way).
For instance, you want the map to only store the result based on some label, like a
name::BigThing->String
but (==) should do a deep compare. One way to do this would be to define incompatible compare and (==) functions. However....
in this case, this is unnecessary. Why not just instead use the map
Map String OtherType
and do a lookup like this-
lookup (name obj) theMap
It is pretty rare to index directly on very large document data....
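A minimal sketch of this suggestion, assuming Data.Map from the containers package that ships with GHC. The BigThing fields, the sample data, and the use of Int as a stand-in for OtherType are all invented for the illustration.

```haskell
import qualified Data.Map as Map

-- A large record with an ordinary, deep, derived equality.
data BigThing = BigThing { name :: String, contents :: [Int] }
  deriving (Eq, Show)

things :: [BigThing]
things = [BigThing "report" [1, 2, 3], BigThing "memo" [4]]

-- Index by the cheap label; BigThing itself needs no Ord instance.
-- The stored value is the length of the contents, standing in for
-- whatever OtherType would be.
index :: Map.Map String Int
index = Map.fromList [(name t, length (contents t)) | t <- things]

main :: IO ()
main = do
  print (Map.lookup (name (head things)) index)
  print (Map.lookup "missing" index)
```

Because the key is a String, the map relies only on the standard, consistent Eq and Ord instances for String, and (==) on BigThing can stay as exact as needed.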

Resources