Ordering of Bool types (i.e. True > False) - Why? [duplicate] - haskell

This question already has answers here:
Understanding Haskell's Bool Deriving an Ord
(3 answers)
Closed 7 years ago.
Can someone please explain the following output?
Prelude> compare True False
GT
it :: Ordering
Prelude> compare False True
LT
it :: Ordering
Why are Bool type values ordered in Haskell - especially, since we can demonstrate that values of True and False are not exactly 1 and 0 (unlike many other languages)?

This is how the derived instance of Ord works:
data D = A | B | C deriving Ord
Given that datatype, we get C > B > A. Bool is defined as False | True, and it kind of makes sense when you look at other examples such as:
Maybe a = Nothing | Just a
Either a b = Left a | Right b
In each of the case having "some" ("truthy") value is greater than having no values at all (or having "left" or "bad" or "falsy" value).

While Bool is not Int, it can be converted to the 0,1 fragment of Int since it is an Enum type.
fromEnum False = 0
fromEnum True = 1
Now, the Enum could have been different, reversing 0 and 1, but that would probably be surprising to most programmers thinking about bits.
Since it has an Enum type, everything else being equal, it's better to define an Ord instance which follows the same order, satisfying
compare x y = compare (fromEnum x) (fromEnum y)
In fact, each instance generated from deriving (Eq, Ord, Enum) follows such property.
On a more theoretical note, logicians tend to order propositions from the strongest to the weakest (forming a lattice). In this structure, False (as a proposition) is the bottom, i.e. the least element, while True is the top. While this is only a convention (theory would be just as nice if we picked the opposite ordering), it's a good thing to be consistent.
Minor downside: the implication boolean connective is actually p <= q expressing that p implies q, instead of the converse as the "arrow" seems to indicate.

Let me answer your question with a question: Why is there an Ord instance for ()?
Unlike Bool, () has only one possible value: (). So why the hell would you ever want to compare it? There is only one value possible!
Basically, it's useful if all or most of the standard basic types have instances for common classes. It makes it easier to derive instances for your own types. If Foo doesn't have an Ord instance, and your new type has a single Foo field, then you can't auto-derive an Ord instance.
You might, for example, have some kind of tree type where we can attach several items of information to the leaves. Something like Tree x y z. And you might want to have an Eq instance to compare trees. It would be annoying if Tree () Int String didn't have an Eq instance just because () doesn't. So that's why () has Eq (and Ord and a few others).
Similar remarks apply to Bool. It might not sound particularly useful to compare two bool values, but it would be irritating if your Ord instance vanishes as soon as you put a bool in there.
(One other complicating factor is that sometimes we want Ord because there's a logically meaningful ordering for things, and sometimes we just want some arbitrary order, typically so we can use something as a key for Data.Map or similar. Arguably there ought to be two separate classes for that… but there isn't.)

Basically, it comes from math. In set theory or category theory boolean functions are usually thought of as classifiers of subsets/subobjects. In plain terms, function f :: a -> Bool is identified with filter f :: [a] -> [a]. So, if we change one value from False to True, the resulting filtered list (subset, subobject, whatever) is going to have more elements. Therefore, True is considered "bigger" than False.

Related

Using different Ordering for Sets

I was reading a Chapter 2 of Purely Functional Data Structures, which talks about unordered sets implemented as binary search trees. The code is written in ML, and ends up showing a signature ORDERED and a functor UnbalancedSet(Element: ORDERED): SET. Coming from more of a C++ background, this makes sense to me; custom comparison function objects form part of the type and can be passed in at construction time, and this seems fairly analogous to the ML functor way of doing things.
When it comes to Haskell, it seems the behavior depends only on the Ord instance, so if I wanted to have a set that had its order reversed, it seems like I'd have to use a newtype instance, e.g.
newtype ReverseInt = ReverseInt Int deriving (Eq, Show)
instance Ord ReverseInt where
compare (ReverseInt a) (ReverseInt b)
| a == b = EQ
| a < b = GT
| a > b = LT
which I could then use in a set:
let x = Set.fromList $ map ReverseInt [1..5]
Is there any better way of doing this sort of thing that doesn't resort to using newtype to create a different Ord instance?
No, this is really the way to go. Yes, having a newtype is sometimes annoying but you get some big benefits:
When you see a Set a and you know a, you immediately know what type of comparison it uses (sort of the same way that purity makes code more readable by not making you have to trace execution). You don't have to know where that Set a comes from.
For many cases, you can coerce your way through multiple newtypes at once. For example, I can turn xs = [1,2,3] :: Int into ys = [ReverseInt 1, ReverseInt 2, ReverseInt 3] :: [ReverseInt] just using ys = coerce xs :: [ReverseInt]. Unfortunately, that isn't the case for Set (and it shouldn't - you'd need the coercion function to be monotonic to not screw up the data structure invariants, and there is not yet a way to express that in the type system).
newtypes end up being more composable than you expect. For example, the ReverseInt type you made already exists in a form that generalizes to reversing any type with an Ord constraint: it is called Down. To be explicit, you could use Down Int instead of ReversedInt, and you get the instance you wrote out for free!
Of course, if you still feel very strongly about this, nothing is stopping you from writing your version of Set which has to have a field which is the comparison function it uses. Something like
data Set a = Set { comparisionKey :: a -> a -> Ordering
, ...
}
Then, every time you make a Set, you would have to pass in the comparison key.

Why do We Need Sum Types?

Imagine a language which doesn't allow multiple value constructors for a data type. Instead of writing
data Color = White | Black | Blue
we would have
data White = White
data Black = Black
data Blue = Black
type Color = White :|: Black :|: Blue
where :|: (here it's not | to avoid confusion with sum types) is a built-in type union operator. Pattern matching would work in the same way
show :: Color -> String
show White = "white"
show Black = "black"
show Blue = "blue"
As you can see, in contrast to coproducts it results in a flat structure so you don't have to deal with injections. And, unlike sum types, it allows to randomly combine types resulting in greater flexibility and granularity:
type ColorsStartingWithB = Black :|: Blue
I believe it wouldn't be a problem to construct recursive data types as well
data Nil = Nil
data Cons a = Cons a (List a)
type List a = Cons a :|: Nil
I know union types are present in TypeScript and probably other languages, but why did the Haskell committee chose ADTs over them?
Haskell's sum type is very similar to your :|:.
The difference between the two is that the Haskell sum type | is a tagged union, while your "sum type" :|: is untagged.
Tagged means every instance is unique - you can distunguish Int | Int from Int (actually, this holds for any a):
data EitherIntInt = Left Int | Right Int
In this case: Either Int Int carries more information than Int because there can be a Left and Right Int.
In your :|:, you cannot distinguish those two:
type EitherIntInt = Int :|: Int
How do you know if it was a left or right Int?
See the comments for an extended discussion of the section below.
Tagged unions have another advantage: The compiler can verify whether you as the programmer handled all cases, which is implementation-dependent for general untagged unions. Did you handle all cases in Int :|: Int? Either this is isomorphic to Int by definition or the compiler has to decide which Int (left or right) to choose, which is impossible if they are indistinguishable.
Consider another example:
type (Integral a, Num b) => IntegralOrNum a b = a :|: b -- untagged
data (Integral a, Num b) => IntegralOrNum a b = Either a b -- tagged
What is 5 :: IntegralOrNum Int Double in the untagged union? It is both an instance of Integral and Num, so we can't decide for sure and have to rely on implementation details. On the other hand, the tagged union knows exactly what 5 should be because it is branded with either Left or Right.
As for naming: The disjoint union in Haskell is a union type. ADTs are only a means of implementing these.
I will try to expand the categorical argument mentioned by #BenjaminHodgson.
Haskell can be seen as the category Hask, in which objects are types and morphisms are functions between types (disregarding bottom).
We can define a product in Hask as tuple - categorically speaking it meets the definition of the product:
A product of a and b is the type c equipped with projections p and q such that p :: c -> a and q :: c -> b and for any other candidate c' equipped with p' and q' there exists a morphism m :: c' -> c such that we can write p' as p . m and q' as q . m.
Read up on this in Bartosz' Category Theory for Programmers for further information.
Now for every category, there exists the opposite category, which has the same morphism but reverses all the arrows. The coproduct is thus:
The coproduct c of a and b is the type c equipped with injections i :: a -> c and j :: b -> c such that for all other candidates c' with i' and j' there exists a morphism m :: c -> c' such that i' = m . i and j' = m . j.
Let's see how the tagged and untagged union perform given this definition:
The untagged union of a and b is the type a :|: b such that:
i :: a -> a :|: b is defined as i a = a and
j :: b -> a :|: b is defined as j b = b
However, we know that a :|: a is isomorphic to a. Based on that observation we can define a second candidate for the product a :|: a :|: b which is equipped with the exact same morphisms. Therefore, there is no single best candidate, since the morphism m between a :|: a :|: b and a :|: b is id. id is a bijection, which implies that m is invertible and "convert" types either way. A visual representation of that argument. Replace p with i and q with j.
Restricting ourselves Either, as you can verify yourself with:
i = Left and
j = Right
This shows that the categorical complement of the product type is the disjoint union, not the set-based union.
The set union is part of the disjoint union, because we can define it as follows:
data Left a = Left a
data Right b = Right b
type DisjUnion a b = Left a :|: Right b
Because we have shown above that the set union is not a valid candidate for the coproduct of two types, we would lose many "free" properties (which follow from parametricity as leftroundabout mentioned) by not choosing the disjoint union in the category Hask (because there would be no coproduct).
This is an idea I've thought a lot about myself: a language with “first-class type algebra”. Pretty sure we could do about everything this way that we do in Haskell. Certainly if these disjunctions were, like Haskell alternatives, tagged unions; then you could directly rewrite any ADT to use them. In fact GHC can do this for you: if you derive a Generic instance, a variant type will be represented by a :+: construct, which is in essence just Either.
I'm not so sure if untagged unions would also do. As long as you require the types participating in a sum to be discernibly different, the explicit tagging should in principle not be necessary. The language would then need a convenient way to match on types at runtime. Sounds a lot like what dynamic languages do – obviously comes with quite some overhead though.
The biggest problem would be that if the types on both sides of :|: must be unequal then you lose parametricity, which is one of Haskell's nicest traits.
Given that you mention TypeScript, it is instructive to have a look at what its docs have to say about its union types. The example there starts from a function...
function padLeft(value: string, padding: any) { //etc.
... that has a flaw:
The problem with padLeft is that its padding parameter is typed as any. That means that we can call it with an argument that’s neither a number nor a string
One plausible solution is then suggested, and rejected:
In traditional object-oriented code, we might abstract over the two types by creating a hierarchy of types. While this is much more explicit, it’s also a little bit overkill.
Rather, the handbook suggests...
Instead of any, we can use a union type for the padding parameter:
function padLeft(value: string, padding: string | number) { // etc.
Crucially, the concept of union type is then described in this way:
A union type describes a value that can be one of several types.
A string | number value in TypeScript can be either of string type or of number type, as string and number are subtypes of string | number (cf. Alexis King's comment to the question). An Either String Int value in Haskell, however, is neither of String type nor of Int type -- its only, monomorphic, type is Either String Int. Further implications of that difference show up in the remainder of the discussion:
If we have a value that has a union type, we can only access members that are common to all types in the union.
In a roughly analogous Haskell scenario, if we have, say, an Either Double Int, we cannot apply (2*) directly on it, even though both Double and Int have instances of Num. Rather, something like bimap is necessary.
What happens when we need to know specifically whether we have a Fish? [...] we’ll need to use a type assertion:
let pet = getSmallPet();
if ((<Fish>pet).swim) {
(<Fish>pet).swim();
}
else {
(<Bird>pet).fly();
}
This sort of downcasting/runtime type checking is at odds with how the Haskell type system ordinarily works, even though it can be implemented using the very same type system (also cf. leftaroundabout's answer). In contrast, there is nothing to figure out at runtime about the type of an Either Fish Bird: the case analysis happens at value level, and there is no need to deal with anything failing and producing Nothing (or worse, null) due to runtime type mismatches.

Haskell Confusing Type Classes / Polymorphism

So basically I've past learning this part way back a month ago, and I can do more complicated stuff but I still don't understand when I need "Ord" or "Eq" etc in my type definitions. When I look it up online its just so confusing to me for some reason.
E.g.
my_min :: Ord a => a -> a -> a
my_min n1 n2 = if n1<n2 then n1 else n2
Why does this need Ord? Can you give an example of when you need Eq as well (a very simple one)? I need a very clear, basic explanation of when you need to put these, what to look out for to do that, and how to avoid using these at all if possible.
Usually I just need something like "[Int] -> Int -> [Int]" so i know ok this function takes an integer list and an integer, and returns an integer list. But when it includes Eq or Ord I have no idea.
What about this Eq type in this example I found in my lecture notes about finding two lists is identical, how does it apply?
identical :: Eq a =>[a]->[a]->Bool
identical [] [] = True
identical [] _ = False
identical _ [] = False
identical (h1:t1) (h2:t2) =
if h1==h2
then (identical t1 t2)
else False
Thank you.
Ord implies that the thing can be ordered, which means that you can say a is smaller (or greater) than b. Using only Eq is like saying: I know that these two items are not the same, but I cannot say which one is greater or smaller. For example if you take a traffic light as a data type:
data TLight = Red | Yellow | Green deriving (Show, Eq)
instance Eq TLight where
Green == Green = True
Yellow == Yellow = True
Red == Red = True
_ == _ = False
Now we can say: Red is unequal to Yellow but we cannot say what is greater. This is the reason why you could not use TLight in your my_min. You cannot say which one is greater.
To your second question: "Is there any case where you have to use Eq and Ord?":
Ord implies Eq. This means that if a type can be ordered, you can also check it for equality.
You said you have mostly dealt with [Int] -> Int -> [Int] and you then knew it takes a list of integer and an integer and returns an integer. Now if you want to generalise your function you have to ask yourself: Do the possible types I want to use in my function need any special functionality? like if they have to be able to be ordered or equated.
Lets do a few examples: Say we want to write a function which takes a list of type a and an element of type a and returns the lisy with the element consed onto it. How would it's type signature look like? Lets start with simply this:
consfunc :: [a] -> a -> [a]
Do we need any more functionality? No! Our type a can be anything because we do not need it to be able to be ordered simple because that is mot what our function should do.
Now what if we want to take a list and an element and check if the element is in the list already? The beginning type signature is:
elemfunc :: [a] -> a -> Bool
Now does our element have to be able to do something special? Yes it does, we have to be able to check if it is equal to any element in the list, which says that our type a has to be equatable, so our type signature looks like this:
elemfunc :: (Eq a) => [a] -> a -> Bool
Now what if we want to take a list and a element and insert it if it is smaller than the first element? Can you guess how the type signature would look like?
Lets begin with the standard again and ask ourselves: Do we need more than just knowing that the element and the list have to be of the same type: Yes, becuase our condition needs to perform a test that requires our type to be ordered, we have to include Ord in our type signature:
conditionalconsfunc :: (Ord a) => [a] -> a -> [a]
Edit:
Well you want to see if two lists are identical, so there are two things you have to look out for:
Your lists have to contain the same type and the things inside the list have to be equatable, hence the Eq.
If you are working with fixed types like Int, you never need class constraints. These only arise when working with polymorphic code.
You need Eq if you ever use the == or /= functions, or if you call any other functions that do. (I.e., if you call a function that has Eq in its type, then your type needs to have Eq as well.)
You need Ord if you ever use <, >, compare or similar functions. (Again, or if you call something that does.)
Note that you do not need Eq if you only do pattern matching. Thus, the following are different:
factorial 1 = 1
factorial n = n * factorial (n-1)
-- Only needs Num.
factorial n = if n == 1 then 1 else n * factorial (n-1)
-- Needs Num and Eq.
factorial n = if n < 2 then 1 else n * factorial (n-1)
-- Needs Num, Eq and Ord. (But Ord implies Eq automatically.)

Understanding Haskell's Bool Deriving an Ord

Learn You a Haskell presents the Bool type:
data Bool = False | True deriving (Ord)
I don't understand the reason for comparing Bool's.
> False `compare` True
LT
> True `compare` False
GT
What would be lost if Bool did not derive from Ord?
Bool forms a bounded lattice* where False is bottom and True is top. This bounded lattice defines a (total) ordering where False really is strictly less than True. (They are also the only elements of this lattice.)
The boolean operations and and or can also be looked at as meet and join, respectively, in this lattice. Meet finds the greatest lower bound and join finds the least upper bound. This means that a && False = False is the same thing as saying that the lower bound of bottom and anything else is bottom, and a || True = True is the same thing as saying that the upper bound of top and anything is top. So meet and join, which use the ordering property of the booleans, are equivalent to the boolean operations you are familiar with.
You can use min and max to show this in Haskell:
False `min` True = False -- this is the greatest lower bound
False && True = False -- so is this
False `max` True = True -- this is the least upper bound
False || True = True -- so is this
This shows that you can define && and || just from the derived Ord instance:
(&&) = min
(||) = max
Note that these definitions are not equivalent in the presence of a different kind of bottom because (&&) and (||) are short-circuiting (non-strict in the second argument when the first is False or True, respectively) while min and max are not.
Also, a small correction: The deriving clause does not say thatBool "derives from" Ord. It instructs GHC to derive an instance of the typeclass Ord for the type Bool.
* More specifically, a complemented distributive lattice. More specifically still, a boolean algebra.
The Ord instance for Bool becomes much more important when you need to compare values that contain Bool somewhere inside. For example, without it we wouldn't be able to write expressions like:
[False,True] `compare` [False,True,False]
(3, False) < (3, True)
data Person = Person { name :: String, member :: Bool } deriving (Eq, Ord)
etc.
It is because Haskell designers made a mistake! I never saw a mathematics textbook that mentioned ordering of booleans. Just beacuse they can be it does not mean with should. Some of us use Haskell exactly because it disallows/protects us from confusing/nonsensical things in many cases but not this one.
instance Ord Bool causes a => b to mean what you expect a <= b to mean!
Earlier arguments in favour of instance Ord Bool where that you can make more types comparable implicitly. Continuing that line of argument some might want to make every type comparable impicitly and even have weak dynamic typing and omit type classes altogether. But we want strong typing exactly to disallow what is not obviously correct, and instance Ord Bool defeats that purpose.
As for the argument that Bool is a bounded lattice. Unlike boolean:={True,False}, what we have in Haskell is Bool:={True,False,bottom} is no longer a bounded lattice since neither True nor False are identity elements in the presense of bottom. That is related to those comments discussing && vs min etc.

Haskell: do standard libraries assume Eq and Ord are compatible?

This is a followup question to Inconsistent Eq and Ord instances?.
The question there is essentially: when declaring Eq and Ord instances for a type, must one ensure that compare x y returns EQ if and only if x == y returns True? Is it dangerous to create instances that break this assumption? It seems like a natural law one might assume, but it doesn’t appear to be explicitly stated in the Prelude, unlike e.g. the monad or functor laws.
The basic response was: it is a bit dangerous to do this, since libraries may assume that this law holds.
My question, now, is: do any of the standard libraries (in particular, Set or Map) make this assumption? Is it dangerous to have a type with incompatible Eq and Ord, so long as I am only relying on the standard libraries supplied with GHC? (If big-list questions were still acceptable, I would be asking: which commonly used libraries assume this law?)
Edit. My use-case is similar to that of the original question. I have a type with a custom instance of Eq, that I use quite a bit. The only reason I want Ord is so that I can use it as the domain of a Map; I don’t care about the specific order, and will never use it explicitly in code. So if I can use the derived instance of Ord, then my life will be easier and my code clearer.
The definition of Ord itself in the standard prelude requires there already be an Eq instance:
class (Eq a) => Ord a where
...
So it would be just as wrong to violate
x == y = compare x y == EQ
x /= y = compare x y /= EQ
As it would be to violate (from the default definitions for these operators in Ord).
x <= y = compare x y /= GT
x < y = compare x y == LT
x >= y = compare x y /= LT
x > y = compare x y == GT
Edit: Use in libraries
I would be quite surprised if standard libraries didn't make use of Ord's == and /= operators. The specific purpose operators (==, /=, <=, <, >=, >) are frequently more convenient than compare, so I'd expect to see them used in code for maps or filters.
You can see == being used in guards on keys in Data.Map in fromAscListWithKey. This specific function only calls out for the Eq class, but if the key is also an Ord instance, Ord's compare will be used for other functions of the resulting Map, which is an assumption that Eq's == is the same as Ord's compare and testing for EQ.
As a library programmer, I wouldn't be surprised if any of the special purpose operators outperformed compare for the specific purpose. After all, that's why they are part of the Eq and Ord classes instead of being defined as polymorphic for all Eq or Ord instances. I might make a point of using them even when compare is more convenient. If I did, I'd probably define something like:
compareOp :: (Ord a) => Ordering -> Bool -> a -> a -> Bool
compareOp EQ True = (==)
compareOp EQ False = (/=)
compareOp LT True = (<)
compareOp LT False = (>=)
compareOp GT True = (>)
compareOp GT False = (<=)
To extend Cirdec's answer, typeclass instances should only be made if the operation being defined is somehow canonical. If there is a reasonable Eq which doesn't extend to a reasonable Ord, then it's best practice to pick either the other Eq or to not define an Ord. It's easy enough to create a non-polymorphic function for the "other" equality.
A great example of this tension is the potential Monoid instance
instance Monoid Int where
mzero = 0
mappend = (+)
which contests with the other "obvious" Monoid instance
instance Monoid Int where
mzero = 1
mappend = (*)
In this case the chosen path was to instantiate neither because it's not clear that one is "canonical" over the other. This typically conforms best to a user's expectation and which prevent bugs.
I've read through this and your original question, so I will address your general problem....
You want this-
Map BigThing OtherType
and this-
(==)::BigThing->BigThing->Bool
One of these cases has to be exact, the other case should ignore some of its data, for performance reasons. (it was (==) that needed to be exact in the first question, but it looks like you might be addressing the reverse in this question.... Same answer either way).
For instance, you want the map to only store the result based on some label, like a
`name::BigThing->String`
but (==) should do a deep compare. One way to do this would be to define incompatible compare and (==) functions. However....
in this case, this is unnecessary. Why not just instead use the map
Map String OtherThing
and do a lookup like this-
lookup (name obj) theMap
It is pretty rare to index directly on very large document data....

Resources