Overriding default Eq definition for specific pattern - Haskell

Suppose I have defined some data type that derives Eq but want to insert my own definition of (==) for some pattern. Is there any way to do this or do I have to define (==) for every pattern?
e.g.
data Asdf = One Char | Two Char Char --(deriving Eq)
instance Eq Asdf where
  (==) (One _) (One _) = True
  --otherwise use what the derived definition would have done
  --can I do this without defining these patterns myself?

To do what you're trying to do, you have to define it yourself, and that means you have to define it for every pattern.
Basically data MyType x = A x | B x x deriving (Eq) will add a default derivation equivalent to,
instance Eq x => Eq (MyType x) where
  A x1 == A x2 = x1 == x2
  B x1 x2 == B x3 x4 = x1 == x3 && x2 == x4
  _ == _ = False
Note that it figures out the necessary dependencies (the Eq x => part above) as well as fills in the diagonal cases -- the special cases among the n² possible matches where the same constructor was used.
As far as I know, it does this definition all at once, and there is no way to dig into an existing instance declaration to modify it. There is a good reason for this: if you could, then as codebases grow, you could no longer look at an instance derivation or a deriving (Eq) clause and be confident that you know exactly what it means, since some other part of the code might monkey-patch that Eq instance to do something nefarious.
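Concretely, for the Asdf type from the question, the hand-written instance has to cover every pattern itself; a sketch (the One case is the custom one from the question, and the rest mirrors what the derived instance would have produced):
instance Eq Asdf where
  One _   == One _   = True                 -- the custom case from the question
  Two a b == Two c d = a == c && b == d     -- what the derived instance would do
  _       == _       = False                -- different constructors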
So one way is to redefine the diagonal yourself. But that's not the only way. There is at least one alternative which may work if it's easier to modify several usage sites than to shove all n constructors into a single thing:
newtype EverythingIsEqual x = E x deriving (Show)
instance Eq (EverythingIsEqual x) where
  _ == _ = True
-- This Ord instance, consistent with the Eq above, is needed for the derived Ord of MyType below to compile:
instance Ord (EverythingIsEqual x) where
  compare _ _ = EQ
data MyType x = A (EverythingIsEqual x) | B x x deriving (Show, Eq, Ord)
This newtype allows you to strategically mark certain terms as having a different Eq relation, at no runtime cost -- in fact, this is pretty much one of the two central arguments for newtypes. Aside from the lesser one ("I want to have a type-level difference between these two Strings, but they ARE just strings and I don't want to pay any performance penalty"), there is the greater argument: sometimes we want to tell Haskell to use a different Ord dictionary without messing with any of the values that this dictionary acts upon; we just want to swap out the functions.
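For example, these expressions evaluate as follows (a hypothetical check, instantiating x to Char):
A (E 'x') == A (E 'y')   -- True: the wrapped payload is ignored
B 'a' 'b' == B 'a' 'c'   -- False: the derived, field-wise comparison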

This question discusses how to do something very similar for the Show instance, using the https://hackage.haskell.org/package/generic-deriving package: Accessing the "default show" in Haskell?
See this answer in particular: https://stackoverflow.com/a/35385768/936310
I recently used it for the Show instance, and it worked wonderfully. You can similarly derive Eq for your type, assuming it's regular enough.

Related

Haskell Either with fixed types

I have two types; assume they both have Monoid implementations. Is there a way to have another type that will be specified to contain an X or a Y? Or is this not the right way to go about this?
data X = X [Int]
data Y = Y Double
The OP has clarified in the comments that they want 'instance behaviour' for some type Either X Y. Typically, you'd use a newtype in this situation:
newtype EitherXY = EitherXY (Either X Y)
In case you're not already aware, newtypes can have a record-like unwrapping function.
newtype EitherXY = EitherXY { unwrap :: Either X Y } deriving (...)
You may also auto-derive certain type classes (as with data declarations). The set of derivable classes depends on the compiler version and the set of enabled extensions. I won't elaborate on it here.
It's probably better to just do
data X_Or_Y = InX X | InY Y
This type is isomorphic to Either X Y, but it's easier to work with and pattern match on than the newtype, since a value like InX (X [1,2]) has only two layers of nested constructors instead of the newtype's three (EitherXY, Left, X).
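For example, pattern matching goes straight to the payloads (a small illustrative function, not from the original post):
describe :: X_Or_Y -> String
describe (InX (X ns)) = "an X holding " ++ show (length ns) ++ " Ints"
describe (InY (Y d))  = "a Y holding " ++ show d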

Using different Ordering for Sets

I was reading a Chapter 2 of Purely Functional Data Structures, which talks about unordered sets implemented as binary search trees. The code is written in ML, and ends up showing a signature ORDERED and a functor UnbalancedSet(Element: ORDERED): SET. Coming from more of a C++ background, this makes sense to me; custom comparison function objects form part of the type and can be passed in at construction time, and this seems fairly analogous to the ML functor way of doing things.
When it comes to Haskell, the behavior seems to depend only on the Ord instance, so if I wanted to have a set with its order reversed, it seems like I'd have to use a newtype with its own Ord instance, e.g.
newtype ReverseInt = ReverseInt Int deriving (Eq, Show)
instance Ord ReverseInt where
  compare (ReverseInt a) (ReverseInt b)
    | a == b = EQ
    | a < b = GT
    | a > b = LT
which I could then use in a set:
let x = Set.fromList $ map ReverseInt [1..5]
Is there any better way of doing this sort of thing that doesn't resort to using newtype to create a different Ord instance?
No, this is really the way to go. Yes, having a newtype is sometimes annoying but you get some big benefits:
When you see a Set a and you know a, you immediately know what type of comparison it uses (sort of the same way that purity makes code more readable by not making you have to trace execution). You don't have to know where that Set a comes from.
For many cases, you can coerce your way through multiple newtypes at once. For example, I can turn xs = [1,2,3] :: [Int] into ys = [ReverseInt 1, ReverseInt 2, ReverseInt 3] :: [ReverseInt] just using ys = coerce xs :: [ReverseInt]. Unfortunately, that isn't the case for Set (and it shouldn't be - you'd need the coercion function to be monotonic to not screw up the data structure invariants, and there is not yet a way to express that in the type system).
newtypes end up being more composable than you expect. For example, the ReverseInt type you made already exists in a form that generalizes to reversing any type with an Ord constraint: it is called Down. To be explicit, you could use Down Int instead of ReverseInt, and you get the instance you wrote out for free!
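A hedged sketch of that approach (Down lives in Data.Ord; the name descending is just illustrative):
import Data.Ord (Down(..))
import qualified Data.Set as Set

descending :: Set.Set (Down Int)
descending = Set.fromList (map Down [1..5])
-- Set.toAscList descending == [Down 5, Down 4, Down 3, Down 2, Down 1]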
Of course, if you still feel very strongly about this, nothing is stopping you from writing your own version of Set that carries the comparison function it uses as a field. Something like
data Set a = Set { comparisonKey :: a -> a -> Ordering
                 , ...
                 }
Then, every time you make a Set, you would have to pass in the comparison key.
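A minimal sketch of what that could look like (hypothetical names, and a plain list instead of a real balanced tree, purely to illustrate threading the key through the operations):
data Set a = Set { comparisonKey :: a -> a -> Ordering
                 , elements      :: [a]   -- a real version would use a balanced tree
                 }

insert :: a -> Set a -> Set a
insert x s@(Set cmp xs)
  | any (\y -> cmp x y == EQ) xs = s                 -- already present under this key
  | otherwise                    = Set cmp (x : xs)

member :: a -> Set a -> Bool
member x (Set cmp xs) = any (\y -> cmp x y == EQ) xs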

Why is context not considered when selecting a typeclass instance in Haskell?

I understand that when having
instance (Foo a) => Bar a
instance (Xyy a) => Bar a
GHC doesn't consider the contexts, and the instances are reported as duplicate.
What is counterintuitive is that (I guess) after selecting an instance, GHC still needs to check whether the context matches, and if not, discard the instance. So why not reverse the order: discard the instances with non-matching contexts first, and proceed with the remaining set?
Would this be intractable in some way? I see how it could cause more constraint-resolution work up front, but just as there is UndecidableInstances / IncoherentInstances, couldn't there be a ConsiderInstanceContexts for when "I know what I am doing"?
This breaks the open-world assumption. Assume:
class B1 a
class B2 a
class T a
If we allow constraints to disambiguate instances, we may write
instance B1 a => T a
instance B2 a => T a
And may write
instance B1 Int
Now, if I have
f :: T a => a
Then f :: Int works. But the open-world assumption says that, once something works, adding more instances cannot break it. Our new system doesn't obey this:
instance B2 Int
will make f :: Int ambiguous. Which implementation of T should be used?
Another way to state this is that you've broken coherence. For typeclasses to be coherent means that there is only one way to satisfy a given constraint. In normal Haskell, a constraint c has only one implementation. Even with overlapping instances, coherence generally holds true. The idea is that instance T a and instance {-# OVERLAPPING #-} T Int do not break coherence, because GHC can't be tricked into using the former instance in a place where the latter would do. (You can trick it with orphans, but you shouldn't.) Coherence, at least to me, seems somewhat desirable. Typeclass usage is "hidden", in some sense, and it makes sense to enforce that it be unambiguous. You can also break coherence with IncoherentInstances and/or unsafeCoerce, but, y'know.
In a category theoretic way, the category Constraint is thin: there is at most one instance/arrow from one Constraint to another. We first construct two arrows a : () => B1 Int and b : () => B2 Int, and then we break thinness by adding new arrows x_Int : B1 Int => T Int, y_Int : B2 Int => T Int such that x_Int . a and y_Int . b are both arrows () => T Int that are not identical. Diamond problem, anyone?
This does not answer your question as to why this is the case. Note, however, that you can always define a newtype wrapper to disambiguate between the two instances:
newtype FooWrapper a = FooWrapper a
newtype XyyWrapper a = XyyWrapper a
instance (Foo a) => Bar (FooWrapper a)
instance (Xyy a) => Bar (XyyWrapper a)
This has the added advantage that by passing around either a FooWrapper or a XyyWrapper you explicitly control which of the two instances you'd like to use if your a happens to satisfy both.
Classes are a bit weird. The original idea (which still pretty much works) is a sort of syntactic sugar around what would otherwise be data statements. For example you can imagine:
data Num a = Num {plus :: a -> a -> a, ... , fromInt :: Integer -> a}
numInteger :: Num Integer
numInteger = Num (+) ... id
then you can write functions which have e.g. type:
test :: Num x -> x -> x -> x -> x
test lib a b c = a + b * (abs (c + b))
  where (+) = plus lib
        (*) = times lib
        abs = absoluteValue lib
So the idea is "we're going to automatically derive all of this library code." The question is, how do we find the library that we want? It's easy if we have a library of type Num Int, but how do we extend it to "constrained instances" based on functions of type:
fooLib :: Foo x -> Bar x
xyyLib :: Xyy x -> Bar x
The present solution in Haskell is to do a type-pattern-match on the output types of those functions and propagate the inputs to the resulting declaration. But when there are two outputs of the same type, we would need a combinator which merges these into:
eitherLib :: Either (Foo x) (Xyy x) -> Bar x
and basically the problem is that there is no good constraint-combinator of this kind right now. That's your objection.
Well, that's true, but there are ways to achieve something morally similar in practice. Suppose we define some functions with types:
data F
data X
foobar'lib :: Foo x -> Bar' x F
xyybar'lib :: Xyy x -> Bar' x X
bar'barlib :: Bar' x y -> Bar x
Clearly the y is a sort of "phantom type" threaded through all of this, but it remains powerful because, given that we want a Bar x, we will propagate the need for a Bar' x y, and given the need for the Bar' x y we will generate either a Bar' x F or a Bar' x X. So with phantom types and multi-parameter type classes, we get the result we want.
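A hedged, self-contained sketch of that dictionary-passing idea, written with plain records instead of actual type classes (all names are made up to match the pseudocode above):
data Foo x = Foo { fooOp :: x -> x }
data Xyy x = Xyy { xyyOp :: x -> x }

newtype Bar' x y = Bar' { runBar' :: x -> x }   -- y is the phantom tag
newtype Bar  x   = Bar  { runBar  :: x -> x }

data F   -- tag: "this Bar' came from a Foo"
data X   -- tag: "this Bar' came from an Xyy"

foobar'lib :: Foo x -> Bar' x F
foobar'lib lib = Bar' (fooOp lib)

xyybar'lib :: Xyy x -> Bar' x X
xyybar'lib lib = Bar' (xyyOp lib)

bar'barlib :: Bar' x y -> Bar x
bar'barlib lib = Bar (runBar' lib)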
More info: https://www.haskell.org/haskellwiki/GHC/AdvancedOverlap
Adding backtracking would make instance resolution require exponential time, in the worst case.
Essentially, instances become logical statements of the form
P(x) => R(f(x)) /\ Q(x) => R(f(x))
which is equivalent to
(P(x) \/ Q(x)) => R(f(x))
Computationally, the cost of this check is (in the worst case)
c_R(n) = c_P(n-1) + c_Q(n-1)
assuming P and Q have similar costs
c_R(n) = 2 * c_PQ(n-1)
which leads to exponential growth.
To avoid this issue, it is important to have fast ways to choose a branch, i.e. to have clauses of the form
((fastP(x) /\ P(x)) \/ (fastQ(x) /\ Q(x))) => R(f(x))
where fastP and fastQ are computable in constant time, and are incompatible so that at most one branch needs to be visited.
Haskell decided that this "fast check" is head compatibility (hence disregarding contexts). It could use other fast checks, of course -- it's a design decision.
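To make "head compatibility" concrete: the two instances below coexist happily because their heads ([a] and Maybe a) differ, and the contexts play no part in choosing between them (a small illustrative example, not from the question):
class HasLabel a where
  label :: a -> String

instance Show a => HasLabel [a]       where label _ = "a list"
instance Num a  => HasLabel (Maybe a) where label _ = "a Maybe"
Two instances with the same head, like the Foo/Xyy pair in the question, are rejected regardless of their contexts.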

How can quotient types help safely expose module internals?

Reading up on quotient types and their use in functional programming, I came across this post. The author mentions Data.Set as an example of a module which provides a ton of functions which need access to module's internals:
Data.Set has 36 functions, when all that are really needed to ensure the meaning of a set ("These elements are distinct") are toList and fromList.
The author's point seems to be that we need to "open up the module and break the abstraction" if we forgot some function which can be implemented efficiently only using module's internals.
He then says
We could alleviate all of this mess with quotient types.
but gives no explanation to that claim.
So my question is: how are quotient types helping here?
EDIT
I've done a bit more research and found a paper "Constructing Polymorphic Programs with Quotient Types". It elaborates on declaring quotient containers and mentions the word "efficient" in abstract and introduction. But if I haven't misread, it does not give any example of an efficient representation "hiding behind" a quotient container.
EDIT 2
A bit more is revealed in Chapter 3 of the paper "Programming in Homotopy Type Theory". The fact that a quotient type can be implemented as a dependent sum is used. Views on abstract types are introduced (which look very similar to type classes to me) and some relevant Agda code is provided. Yet the chapter focuses on reasoning about abstract types, so I'm not sure how this relates to my question.
I recently made a blog post about quotient types, and I was led here by a comment. The blog post may provide some additional context in addition to the papers referenced in the question.
The answer is actually pretty straightforward. One way to arrive at it is to ask the question: why are we using an abstract data type in the first place for Data.Set?
There are two distinct and separable reasons. The first reason is to hide the internal type behind an interface so that we can substitute a completely new type in the future. The second reason is to enforce implicit invariants on values of the internal type. Quotient types and their dual, subset types, allow us to make the invariants explicit and enforced by the type checker, so that we no longer need to hide the representation. So let me be very clear: quotient (and subset) types do not provide you with any implementation hiding. If you implement Data.Set with quotient types using lists as your representation, then later decide you want to use trees, you will need to change all code that uses your type.
Let's start with a simpler example (leftaroundabout's). Haskell has an Integer type but not a Natural type. A simple way to specify Natural as a subset type using made up syntax would be:
type Natural = { n :: Integer | n >= 0 }
We could implement this as an abstract type, using a smart constructor that throws an error when given a negative Integer. This type says that only a subset of the values of type Integer are valid.
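A minimal sketch of that smart-constructor encoding (the module layout and names are illustrative, not from the post):
module Natural (Natural, toNatural, fromNatural) where

newtype Natural = Natural Integer deriving (Eq, Ord, Show)

toNatural :: Integer -> Natural
toNatural n
  | n >= 0    = Natural n
  | otherwise = error "toNatural: negative Integer"

fromNatural :: Natural -> Integer
fromNatural (Natural n) = n
Another approach we could use to implement this type is to use a quotient type: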
type Natural = Integer / ~ where n ~ m = abs n == abs m
Any function h :: X -> T for some type T induces a quotient type on X quotiented by the equivalence relation x ~ y = h x == h y. Quotient types of this form are more easily encoded as abstract data types. In general, though, there may not be such a convenient function, e.g.:
type Pair a = (a, a) / ~ where (a, b) ~ (x, y) = a == x && b == y || a == y && b == x
(As to how quotient types relate to setoids, a quotient type is a setoid that enforces that you respect its equivalence relation.) This second definition of Natural has the property that there are two values that represent 2, say. Namely, 2 and -2. The quotient type aspect says we are allowed to do whatever we want with the underlying Integer, so long as we never produce a result that differentiates between these two representatives. Another way to see this is that we can encode a quotient type using subset types as:
X/~ = forall a. { f :: X -> a | forEvery (\(x, y) -> x ~ y ==> f x == f y) } -> a
Unfortunately, that forEvery is tantamount to checking equality of functions.
Zooming back out, subset types add constraints on producers of values and quotient types add constraints on consumers of values. Invariants enforced by an abstract data type may be a mixture of these. Indeed, we may decide to represent a Set as the following:
data Tree a = Empty | Branch (Tree a) a (Tree a)
type BST a = { t :: Tree a | isSorted (toList t) }
type Set a = { t :: BST a | noDuplicates (toList t) } / ~
  where s ~ t = toList s == toList t
Note, nothing about this ever requires us to actually execute isSorted, noDuplicates, or toList. We "merely" need to convince the type checker that the implementations of functions on this type would satisfy these predicates. The quotient type allows us to have a redundant representation while enforcing that we treat equivalent representations in the same way. This doesn't mean we can't leverage the specific representation we have to produce a value, it just means that we must convince the type checker that we would have produced the same value given a different, equivalent representation. For example:
maximum :: Set a -> a
maximum s = exposing s as t in go t
  where go Empty = error "maximum of empty Set"
        go (Branch _ x Empty) = x
        go (Branch _ _ r) = go r
The proof obligation for this is that the right-most element of any binary search tree with the same elements is the same. Formally, it's go t == go t' whenever toList t == toList t'. If we used a representation that guaranteed the tree would be balanced, e.g. an AVL tree, this operation would be O(log N) while converting to a list and picking the maximum from the list would be O(N). Even with this representation, this code is strictly more efficient than converting to a list and getting the maximum from the list. Note that we could not implement a function that displayed the tree structure of the Set. Such a function would be ill-typed.
I'll give a simpler example where it's reasonably clear. Admittedly I myself don't really see how this would translate to something like Set, efficiently.
data Nat = Nat (Integer / abs)
To use this safely, we must be sure that any function Nat -> T (with some non-quotient T, for simplicity's sake) does not depend on the actual integer value, but only on its absolute. To do so, it's not really necessary to hide Integer completely; it would be sufficient to prevent you from matching on it directly. Instead, the compiler might rewrite the matches, e.g.
even' :: Nat -> Bool
even' (Nat 0) = True
even' (Nat 1) = False
even' (Nat n) = even' . Nat $ n - 2
could be rewritten to
even' (Nat n') = case abs n' of
    [|abs 0|] -> True
    [|abs 1|] -> False
    n -> even' . Nat $ n - 2
Such a rewriting would point out equivalence violations, e.g.
bad (Nat 1) = "foo"
bad (Nat (-1)) = "bar"
bad _ = undefined
would rewrite to
bad (Nat n') = case n' of
    1 -> "foo"
    1 -> "bar"
    _ -> undefined
which is obviously an overlapped pattern.
Disclaimer: I just read up on quotient types upon reading this question.
I think the author's just saying that sets can be described as quotient types over lists. I.e. (making up some Haskell-like syntax):
data Set a = Set [a] / (sort . nub) deriving (Eq)
I.e., a Set a is just a [a], with equality between two Set a's determined by whether the sort . nub of the underlying lists are equal.
We could do this explicitly like this, I guess:
import Data.List
data Set a = Set [a] deriving (Show)
instance (Ord a, Eq a) => Eq (Set a) where
  (Set xs) == (Set ys) = (sort $ nub xs) == (sort $ nub ys)
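For example, under this instance Set [2,1,1] == Set [1,2] evaluates to True, since both sides normalise to [1,2].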
Not sure if this is actually what the author intended as this isn't a particularly efficient way of implementing a set. Someone can feel free to correct me.

Haskell: do standard libraries assume Eq and Ord are compatible?

This is a followup question to Inconsistent Eq and Ord instances?.
The question there is essentially: when declaring Eq and Ord instances for a type, must one ensure that compare x y returns EQ if and only if x == y returns True? Is it dangerous to create instances that break this assumption? It seems like a natural law one might assume, but it doesn’t appear to be explicitly stated in the Prelude, unlike e.g. the monad or functor laws.
The basic response was: it is a bit dangerous to do this, since libraries may assume that this law holds.
My question, now, is: do any of the standard libraries (in particular, Set or Map) make this assumption? Is it dangerous to have a type with incompatible Eq and Ord, so long as I am only relying on the standard libraries supplied with GHC? (If big-list questions were still acceptable, I would be asking: which commonly used libraries assume this law?)
Edit. My use-case is similar to that of the original question. I have a type with a custom instance of Eq, that I use quite a bit. The only reason I want Ord is so that I can use it as the domain of a Map; I don’t care about the specific order, and will never use it explicitly in code. So if I can use the derived instance of Ord, then my life will be easier and my code clearer.
The definition of Ord itself in the standard prelude requires there already be an Eq instance:
class (Eq a) => Ord a where
  ...
So it would be just as wrong to violate
x == y = compare x y == EQ
x /= y = compare x y /= EQ
as it would be to violate these (from the default definitions for these operators in Ord):
x <= y = compare x y /= GT
x < y = compare x y == LT
x >= y = compare x y /= LT
x > y = compare x y == GT
Edit: Use in libraries
I would be quite surprised if the standard libraries didn't make use of the Eq and Ord operators directly. The specific-purpose operators (==, /=, <=, <, >=, >) are frequently more convenient than compare, so I'd expect to see them used in code for maps or filters.
You can see == being used in guards on keys in Data.Map, in fromAscListWithKey. This specific function only requires the Eq class, but if the key is also an Ord instance, Ord's compare will be used by other functions of the resulting Map, which assumes that Eq's == agrees with testing Ord's compare for EQ.
As a library programmer, I wouldn't be surprised if any of the special purpose operators outperformed compare for the specific purpose. After all, that's why they are part of the Eq and Ord classes instead of being defined as polymorphic for all Eq or Ord instances. I might make a point of using them even when compare is more convenient. If I did, I'd probably define something like:
compareOp :: (Ord a) => Ordering -> Bool -> a -> a -> Bool
compareOp EQ True = (==)
compareOp EQ False = (/=)
compareOp LT True = (<)
compareOp LT False = (>=)
compareOp GT True = (>)
compareOp GT False = (<=)
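For instance, as a quick check of the helper above:
compareOp LT True 3 5    -- same as 3 < 5, i.e. True
compareOp GT False 3 5   -- same as 3 <= 5, i.e. True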
To extend Cirdec's answer: typeclass instances should only be made if the operation being defined is somehow canonical. If there is a reasonable Eq which doesn't extend to a reasonable Ord, then it's best practice either to pick the other Eq (the one compatible with the Ord) or not to define an Ord at all. It's easy enough to create a non-polymorphic function for the "other" equality.
A great example of this tension is the potential Monoid instance
instance Monoid Int where
  mempty = 0
  mappend = (+)
which contests with the other "obvious" Monoid instance
instance Monoid Int where
  mempty = 1
  mappend = (*)
In this case the chosen path was to instantiate neither, because it's not clear that one is "canonical" over the other. This typically conforms best to users' expectations and prevents bugs.
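For reference, the standard library sidesteps this particular tension with newtypes, so the caller picks the semantics explicitly (Sum and Product live in Data.Monoid):
import Data.Monoid (Sum(..), Product(..), (<>))

added, multiplied :: Int
added = getSum (Sum 2 <> Sum 3)                   -- 5
multiplied = getProduct (Product 2 <> Product 3)  -- 6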
I've read through this and your original question, so I will address your general problem....
You want this-
Map BigThing OtherType
and this-
(==)::BigThing->BigThing->Bool
One of these cases has to be exact; the other case should ignore some of its data, for performance reasons. (It was (==) that needed to be exact in the first question, but it looks like you might be addressing the reverse in this question. Same answer either way.)
For instance, you want the map to only store the result based on some label, like a
name :: BigThing -> String
but (==) should do a deep compare. One way to do this would be to define incompatible compare and (==) functions. However....
In this case, this is unnecessary. Why not instead just use the map
Map String OtherThing
and do a lookup like this-
lookup (name obj) theMap
It is pretty rare to index directly on very large document data....
