I have a (fairly) legitimate case where there are two type instance implementations, and I want to specify a default one. After noting that doing modular arithmetic with Int types resulted in lots of hash collisions, I want to try GHC's Int64. I have the following code:
class Hashable64 a where
    hash64 :: a -> Int64
instance Hashable64 a => Hashable a where
    hash = fromInteger . toInteger . hash64
instance Hashable64 a => Hashable64 [a] where
    hash64 = foldl1 (\x y -> x + 22636946317 * y) . map hash64
and an instance Hashable64 Char, which thus results in two implementations for Hashable String, namely:
The one defined in Data.Hashable.
The one obtained by noting that String is a Hashable64 instance, then converting the result to a regular Int for an instance of Data.Hashable.
The second code path may be better because it performs hashing with Int64s. Can I specify to use this derivation of the instance Hashable String?
Edit 1
Sorry, I forgot to add I have already tried the overlapping instances thing; perhaps I'm just not implementing it correctly? The documentation for overlapping instances says it works when one instance is more specific. But when I try to add a specific instance for Hashable String, the situation doesn't improve. Full code at [ http://pastebin.com/9fP6LUX2 ] (sorry for the superfluous default header).
instance Hashable String where
    hash x = hash (hash64 x)
I get
Matching instances:
instance (Hashable a) => Hashable [a] -- Defined in Data.Hashable
instance [overlap ok] Hashable String
-- Defined at Hashable64.hs:70:9-23
Edit 2
Any other solutions to this specific problem are welcome. A good solution might provide insight into this overlapping instances problem.
This sort of situation is handled by GHC's OverlappingInstances extension. Roughly speaking, this extension allows instances to coexist despite the existence of some type(s) to which both could apply. For such types, GHC will select the "most specific" instance, which is a little fuzzy in some cases but usually does what you'd want it to.
This sort of situation, where you have one or more specialized instances and a single catch-all instance Foo a as the "default", usually works pretty well.
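As a minimal sketch of that pattern: the class `Describe` and its instances below are hypothetical, invented only for illustration. The sketch also assumes GHC 7.10 or later, where the per-instance `OVERLAPPABLE` pragma replaces the module-wide OverlappingInstances flag.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Hypothetical class, used only for illustration.
class Describe a where
  describe :: a -> String

-- Catch-all "default" instance; the pragma marks it as willing to be overlapped.
instance {-# OVERLAPPABLE #-} Describe a where
  describe _ = "some value"

-- Specialized instances; GHC prefers these whenever the type is known to match.
instance Describe Bool where
  describe b = "the Bool " ++ show b

instance Describe [Char] where
  describe s = "the String " ++ show s
```

With these definitions, `describe True` uses the Bool instance, `describe "hi"` uses the String instance, and `describe 'x'` falls back to the catch-all.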
The main stumbling blocks to be aware of with overlapping instances are:
If something forces GHC to select an instance on a polymorphic type that's ambiguous, it will refuse with potentially cryptic compiler errors.
The context of an instance is ignored until after it's been selected, so don't try to distinguish between instances that way; there are workarounds but they're annoying.
The latter point would be relevant here if, for example, you have a list of some type that's not an instance of Hashable64; GHC will select the more specific second instance, then fail because of the context, even if the full type (the list, not the element type) is an instance of Hashable64 and thus would work with the first, generic instance.
Edit: Oh, I see, I misinterpreted the situation slightly, regarding where the instances are coming from. Quoth the GHC User's Guide:
The willingness to be overlapped or incoherent is a property of the instance declaration itself (...). Neither flag is required in a module that imports and uses the instance declaration.
(...)
These rules make it possible for a library author to design a library that relies on overlapping instances without the library client having to know.
If an instance declaration is compiled without -XOverlappingInstances, then that instance can never be overlapped. This could perhaps be inconvenient. (...) We are interested to receive feedback on these points.
...in other words, overlapping is only allowed if the less specific instance was built with OverlappingInstances enabled, which the instance for Hashable [a] was not. So your instance for Hashable a is allowed, but one for Hashable [Char] fails, as observed.
This is a tidy illustration of why the User's Guide finds the current rules unsatisfactory (other rules would have their own problems, so it's not clear what the best approach, if any, would be).
Back in the here-and-now, you have multiple options, which are slightly less convenient than what you'd hoped for. Off the top of my head:
Alternate class: Define your equivalent of the Hashable class, write the overlapped instances you want, and use generic instances with Hashable in the context to fall back to the original as needed. This is problematic if you're using another library that expects Hashable instances, rather than pre-hashed values or an explicit hash function.
Type wrapper: newtype wrappers are something of a "standard" way to disambiguate instances (cf. Monoid). By using such a wrapper around your values, you'll be able to write whatever instances you please because none of the pre-defined instances will match. This becomes problematic if you have a lot of functions that would need to wrap/unwrap the newtype, though keep in mind that you can define other instances (e.g., Num, Show, etc.) for the wrapper easily and there's no overhead at run time.
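A rough sketch of the type-wrapper option follows. To keep it dependent on base only, `Hashable` here is a local stand-in for the class from Data.Hashable, and the wrapper name `Via64` is made up. The list instance for `Hashable64` also uses foldr with a zero seed instead of the question's foldl1, so that empty lists are handled.

```haskell
import Data.Char (ord)
import Data.Int (Int64)

-- Local stand-in for Data.Hashable's class (an assumption, so that the
-- sketch compiles without the hashable package).
class Hashable a where
  hash :: a -> Int

instance Hashable Char where
  hash = ord

-- Mimics the library's generic list instance, which cannot be overlapped.
instance Hashable a => Hashable [a] where
  hash = foldr (\x acc -> hash x + 31 * acc) 0

class Hashable64 a where
  hash64 :: a -> Int64

instance Hashable64 Char where
  hash64 = fromIntegral . ord

instance Hashable64 a => Hashable64 [a] where
  hash64 = foldr (\x acc -> hash64 x + 22636946317 * acc) 0

-- Hypothetical wrapper: no existing instance head matches Via64 a,
-- so this instance coexists with the list instance above.
newtype Via64 a = Via64 a

instance Hashable64 a => Hashable (Via64 a) where
  hash (Via64 x) = fromIntegral (hash64 x)
```

Here `hash "ab"` goes through the generic list instance, while `hash (Via64 "ab")` goes through the Int64 code path; the caller picks the path by wrapping or not wrapping the value.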
There are other, more arcane, workarounds, but I can't offer too much explicit guidance because which is the least awkward tends to be very situation-dependent.
It's worth noting that you're definitely pushing the edges of what can be sensibly expressed with type classes, so it's not surprising that things are awkward. It's not a very satisfying situation, but there's little you can do when constrained to adding instances for a class defined elsewhere.
Related
To clarify my question, let me rephrase it in a more or less equivalent way:
Why is there a concept of superclass/class inheritance in Haskell?
What are the historical reasons that led to that design choice?
Why would it be so bad, for example, to have a base library with no class hierarchy, just typeclasses independent from each other?
Here I'll expose some random thoughts that made me want to ask this question. My current intuitions might be inaccurate as they are based on my current understanding of Haskell which is not perfect, but here they are...
It is not obvious to me why type class inheritance exists in Haskell. I find it a bit weird, as it creates asymmetry in concepts.
Often in mathematics, concepts can be defined from different viewpoints, and I don't necessarily want to favor an order in which they ought to be defined. OK, there is some order in which one should prove things, but once theorems and structures are there, I'd rather see them as available independent tools.
Moreover one perhaps not so good thing I see with class inheritance is this: I think a class instance will silently pick a corresponding superclass instance, which was probably implemented to be the most natural one for that type. Let's consider a Monad viewed as a subclass of Functor. Maybe there could be more than one way to define a Functor on some type that also happens to be a Monad. But saying that a Monad is a Functor implicitly makes the choice of one particular Functor for that Monad. Someday, you might forget that actually you wanted some other Functor.
Perhaps this example is not the best fit, but I have the feeling this sort of situation might generalize and possibly be dangerous if your class is a child of many. Current Haskell inheritance sounds like it makes default choices about parents implicitly.
If instead you have a design without hierarchy, I feel you would always have to be explicit about all the properties required, which would perhaps mean a bit less risk, more clarity, and more symmetry. So far, what I'm seeing is that the cost of such a design would be: more constraints to write in instance definitions, and newtype wrappers for each meaningful conversion from one set of concepts to another. I am not sure, but perhaps that could have been acceptable. Unfortunately, I think Haskell's automatic deriving mechanism for newtypes doesn't work very well; I would appreciate it if the language were somehow smarter with newtype wrapping/unwrapping and required less verbosity.
I'm not sure, but now that I think about it, perhaps an alternative to newtype wrappers could be specific imports of modules containing specific variations of instances.
Another alternative I thought about while writing this, is that maybe one could weaken the meaning of class (P x) => C x, where instead of it being a requirement that an instance of C selects an instance of P, we could just take it to loosely mean that for example, C class also contains P's methods but no instance of P is automatically selected, no other relationship with P exists. So we could keep some sort of weaker hierarchy that might be more flexible.
Thanks if you have some clarifications over that topic, and/or correct my possible misunderstandings.
Maybe you're tired of hearing from me, but here goes...
I think superclasses were introduced as a relatively minor and unimportant feature of type classes. In Wadler and Blott, 1988, they are briefly discussed in Section 6 where the example class Eq a => Num a is given. There, the only rationale offered is that it's annoying to have to write (Eq a, Num a) => ... in a function type when it should be "obvious" that data types that can be added, multiplied, and negated ought to be testable for equality as well. The superclass relationship allows "a convenient abbreviation".
(The unimportance of this feature is underscored by the fact that this example is so terrible. Modern Haskell doesn't have class Eq a => Num a because the logical justification for all Nums also being Eqs is so weak. The example class Eq a => Ord a would have been a lot more convincing.)
So, the base library implemented without any superclasses would look more or less the same. There would just be more logically superfluous constraints on function type signatures in both library and user code, and instead of fielding this question, I'd be fielding a beginner question about why:
leq :: (Ord a) => a -> a -> Bool
leq x y = x < y || x == y
doesn't type check.
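To make the hypothetical concrete, here is a small sketch with a made-up ordering class `Ord'` that has no Eq superclass. The names `Ord'`, `lt`, and `leq'` are illustrative only and are not part of base.

```haskell
-- Hypothetical ordering class with no Eq superclass.
class Ord' a where
  lt :: a -> a -> Bool

instance Ord' Int where
  lt = (<)

-- Eq has to be requested explicitly here; dropping it from the context
-- turns the use of (==) below into a type error.
leq' :: (Eq a, Ord' a) => a -> a -> Bool
leq' x y = lt x y || x == y
```

With the real `Ord`, whose superclass is `Eq`, the shorter signature `Ord a => a -> a -> Bool` is enough, which is the "convenient abbreviation" the paper describes.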
To your point about superclasses forcing a particular hierarchy, you're missing your target.
This kind of "forcing" is actually a fundamental feature of type classes. Type classes are "opinionated by design", and in a given Haskell program (where "program" includes all the libraries, including base, used by the program), there can be only one instance of a particular type class for a particular type. This property is referred to as coherence. (Even though there is a language extension IncoherentInstances, it is considered very dangerous and should only be used when all possible instances of a particular type class for a particular type are functionally equivalent.)
This design decision comes with certain costs, but it also comes with a number of benefits. Edward Kmett talks about this in detail in this video, starting at around 14:25. In particular, he compares Haskell's coherent-by-design type classes with Scala's incoherent-by-design implicits and contrasts the increased power that comes with the Scala approach with the reusability (and refactoring benefits) of "dumb data types" that comes with the Haskell approach.
So, there's enough room in the design space for both coherent type classes and incoherent implicits, and Haskell's approach isn't necessarily the right one.
BUT, since Haskell has chosen coherent type classes, there's no "cost" to having a specific hierarchy:
class Functor a => Monad a
because, for a particular type, like [] or MyNewMonadDataType, there can only be one Monad and one Functor instance anyway. The superclass relationship introduces a requirement that any type with a Monad instance must have a Functor instance, but it doesn't restrict the choice of Functor instance because you never had a choice in the first place. Or rather, your choice was between having zero Functor [] instances and exactly one.
Note that this is separate from the question of whether or not there's only one reasonable Functor instance for a Monad type. In principle, we could define a law-violating data type with incompatible Functor and Monad instances. We'd still be restricted to using that one Functor MyType instance and that one Monad MyType instance throughout our program, whether or not Functor was a superclass of Monad.
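The usual way to get a second behaviour under coherence is a newtype, since each newtype is a distinct type with its own single instance. As a small illustration using `Sum` and `Product` from Data.Monoid in base (the helper names `total` and `productOf` are made up):

```haskell
import Data.Monoid (Product (..), Sum (..))

-- Int has no Monoid instance of its own; each wrapper type carries
-- exactly one, so the choice is made explicitly by wrapping.
total :: [Int] -> Int
total = getSum . foldMap Sum

productOf :: [Int] -> Int
productOf = getProduct . foldMap Product
```

For example, `total [1, 2, 3, 4]` is 10 and `productOf [1, 2, 3, 4]` is 24, with no ambiguity about which instance applies at either call site.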
In Haskell, it's possible to add constraints to a type parameter.
For example:
foo :: Functor f => f a
The question: is it possible to negate a constraint?
I want to say that f can be anything except Functor for example.
UPD:
So it comes from the idea of how to map the bottom nested Functor.
Let's say I have Functor a, where a can be a Functor b or not, and the same rule applies to b.
Reasons why this is not possible: (basically all the same reason, just different aspects of it)
There is an open-world assumption about type classes. It isn't possible to prove that a type is not an instance of a class because even if during compilation of a module, the instance isn't there, that doesn't mean somebody doesn't define it in a module “further down the road”. This could in principle be in a separate package, such that the compiler can't possibly know whether or not the instance exists.
(Such orphan instances are generally quite frowned upon, but there are use cases for them and the language doesn't attempt to prevent this.)
Membership in a class is an intuitionistic property, meaning that you shouldn't think of it as a classical boolean value “instance or not instance” but rather, if you can prove that a type is an instance then this gives you certain features for the type (specified by the class methods). If you can't prove that the type is an instance then this doesn't mean there is no instance, but perhaps just that you're not smart enough to prove it. (Read, “perhaps that nobody is smart enough”.) This ties back to the first point: the compiler not yet having the instance available is one case of “not being smart enough”.
A class isn't supposed to be used for making dispatch on whether or not a type is in it, but for enabling certain polymorphic functions even if they require ad-hoc conditions on the types. That's what class methods do, and they can come from a class instance, but how could they come from a “not-in-class instance”?
Now, all that said, there is a way you can kind of fake this: with an overlapping instance. Don't do it, it's a bad idea, but... that's the closest you can get.
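For what it's worth, a sketch of that overlapping-instance trick might look like the following. It uses nested lists rather than arbitrary Functors to keep things small; the class `Depth` is hypothetical, and the `OVERLAPPABLE` pragma assumes GHC 7.10 or later.

```haskell
{-# LANGUAGE FlexibleInstances, ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))

-- Hypothetical class: how many list layers wrap the innermost type?
class Depth a where
  depth :: Proxy a -> Int

-- Fallback that plays the role of "not a list". It is only a default,
-- not a proof that the type really is not a list.
instance {-# OVERLAPPABLE #-} Depth a where
  depth _ = 0

-- More specific instance: peel off one layer and recurse.
instance Depth a => Depth [a] where
  depth _ = 1 + depth (Proxy :: Proxy a)
```

Here `depth (Proxy :: Proxy [[Int]])` is 2 and `depth (Proxy :: Proxy Int)` is 0. The fallback is chosen from what is known about the type at compile time, which is why this only approximates a negated constraint.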
I'll start by introducing a concrete problem (StackOverflow guys like that).
Say you define a simple type
data T a = T a
This type is a Functor, Applicative and a Monad. Ignoring automatic deriving, to get those instances you have to write each one of them, even though Monad implies Applicative, which implies Functor.
More than that, I could define a class like this
class Wrapper f where
    wrap :: a -> f a
    unwrap :: f a -> a
This is a pretty strong condition and it definitely implies Monad, but I can't write
instance Wrapper f => Monad f where
    return = wrap
    fa >>= f = f $ unwrap fa
because this, for some reason, means "everything is a Monad (every f), only if it's a Wrapper", instead of "everything that's a Wrapper is a Monad".
Similarly you can't define the Monad a => Applicative a and Applicative a => Functor a instances.
Another thing you can't do (which is only probably related, I really don't know) is have one class be a superclass of another one, and provide a default implementation of the subclass. Sure, it's great that class Applicative a => Monad a, but it's much less great that I still have to define the Applicative instance before I can define the Monad one.
This isn't a rant. I wrote a lot because otherwise this would quickly be marked as "too broad" or "unclear". The question boils down to the title.
I know (at least I'm pretty sure) that there is some theoretical reason for this, so I'm wondering what exactly are the benefits here.
As a sub question, I would like to ask if there are viable alternatives that still keep all (or most) of those advantages, but allow what I wrote.
Addition:
I suspect one of the answers might be something along the lines "What if my type is a Wrapper, but I don't want to use the Monad instance that that implies?". To this I ask, why couldn't the compiler just pick the most specific one? If there is an instance Monad MyType, surely it's more specific than instance Wrapper a => Monad a.
There's a lot of questions rolled into one here. But let's take them one at a time.
First: why doesn't the compiler look at instance contexts when choosing which instance to use? This is to keep instance search efficient. If you require the compiler to consider only instances whose instance contexts are satisfied, you essentially end up requiring your compiler to do back-tracking search among all possible instances, at which point you have implemented 90% of Prolog. If, on the other hand, you take the stance (as Haskell does) that you look only at instance heads when choosing which instance to use, and then simply enforce the instance context, there is no backtracking: at every moment, there is only one choice you can make.
Next: why can't you have one class be a superclass of another one, and provide a default implementation of the subclass? There is no fundamental reason for this restriction, so GHC offers this feature as an extension. You could write something like this:
{-# LANGUAGE DefaultSignatures #-}
class Applicative f where
    pure :: a -> f a
    (<*>) :: f (a -> b) -> f a -> f b
    default pure :: Monad f => a -> f a
    default (<*>) :: Monad f => f (a -> b) -> f a -> f b
    pure = return
    (<*>) = ap
Then once you had provided an instance Monad M where ..., you could simply write instance Applicative M with no where clause and have it Just Work. I don't really know why this wasn't done in the standard library.
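The same idea can be tried out without redefining anything from the standard library. In this sketch `MyFunctor` and `myFmap` are hypothetical names, and the default is supplied through the type's existing Monad instance:

```haskell
{-# LANGUAGE DefaultSignatures #-}

import Control.Monad (liftM)

-- Hypothetical class with a default that is only available for Monads.
class MyFunctor f where
  myFmap :: (a -> b) -> f a -> f b
  default myFmap :: Monad f => (a -> b) -> f a -> f b
  myFmap = liftM

-- No where clause needed: the default kicks in via Monad Maybe.
instance MyFunctor Maybe

-- Types can still provide their own definition.
newtype Box a = Box a

instance MyFunctor Box where
  myFmap f (Box x) = Box (f x)
```

With this, `myFmap (+ 1) (Just 1)` evaluates to `Just 2` even though the Maybe instance has an empty body.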
Last: why can't the compiler allow many instances and just pick the most specific one? The answer to this one is sort of a mix of the previous two: there are very good fundamental reasons this doesn't work well, yet GHC nevertheless offers an extension that does it. The fundamental reason this doesn't work well is that the most specific instance for a given value can't be known before runtime. GHC's answer to this is, for polymorphic values, to pick the most specific one compatible with the full polymorphism available. If later that thing gets monomorphised, well, too bad for you. The result of this is that some functions may operate on some data with one instance and others may operate on that same data with another instance; this can lead to very subtle bugs. If after all this discussion you still think that's a good idea, and refuse to learn from the mistakes of others, you can turn on IncoherentInstances.
I think that covers all the questions.
Consistency and separate compilation.
If we have two instances whose heads both match, but have different constraints, say:
-- File: Foo.hs
instance Monad m => Applicative m
instance Applicative Foo
Then either this is valid code producing an Applicative instance for Foo, or it's an error producing two different Applicative instances for Foo. Which one it is depends on whether a monad instance exists for Foo. That's a problem, because it's difficult to guarantee that knowledge about whether Monad Foo holds will make it to the compiler when it's compiling this module.
A different module (say Bar.hs) may produce a Monad instance for Foo. If Foo.hs doesn't import that module (even indirectly), then how is the compiler to know? Worse, we can change whether this is an error or a valid definition by changing whether we later include Bar.hs in the final program or not!
For this to work, we'd need to know that all instances that exist in the final compiled program are visible in every module, which leads to the conclusion that every module is a dependency of every other module regardless of whether the module actually imports the other. You'd have to go quite far along the path to requiring whole-program-analysis to support such a system, which makes distributing pre-compiled libraries difficult to impossible.
The only way to avoid this is to never have GHC make decisions based on negative information. You can't choose an instance based on the non-existence of another instance.
This means that the constraints on an instance have to be ignored for instance resolution. You need to select an instance regardless of whether the constraints hold; if that leaves more than one possibly-applicable instance, then you would need negative information (namely that all but one of them require constraints that do not hold) to accept the code as valid.
If you have only one instance that's even a candidate, and you can't see a proof of its constraints, you can accept the code by just passing the constraints on to where the instance is used (we can rely on getting this information to other modules, because they'll have to import this one, even if only indirectly); if those positions can't see a required instance either, then they'll generate an appropriate error about an unsatisfied constraint.
So by ignoring the constraints, we ensure that a compiler can make correct decisions about instances even by only knowing about other modules that it imports (transitively); it doesn't have to know about everything that's defined in every other module in order to know which constraints do not hold.
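A small sketch of "select on the head, pass the constraint on" is shown below; the class `Render` and the function `renderAll` are hypothetical names.

```haskell
-- Hypothetical class, used only for illustration.
class Render a where
  render :: a -> String

instance Render Int where
  render n = "#" ++ show n

-- Selected for every list type by looking at the head [a] alone.
instance Render a => Render [a] where
  render = concatMap render

-- Inside this function nothing is known about a. The list instance is
-- still selected, and its context (Render a) becomes an obligation that
-- is passed on to whoever calls renderAll.
renderAll :: Render a => [a] -> String
renderAll = render
```

A call such as `renderAll [1, 2 :: Int]` discharges the obligation with the Int instance and yields "#1#2"; a call at an element type with no instance is rejected at that call site with an unsatisfied-constraint error.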
First of all, I want to clarify that I've tried to find a solution to my problem googling but I didn't succeed.
I need a way to compare two expressions. The problem is that these expressions are not comparable. I'm coming from Erlang, where I can do:
case exp1 of
    exp2 -> ...
where exp1 and exp2 are bound. But Haskell doesn't allow me to do this. However, in Haskell I could compare using ==. Unfortunately, their type is not a member of the class Eq. Of course, both expressions are unknown until runtime, so I can't write a static pattern in the source code.
How could I compare these two expressions without having to define my own comparison function? I suppose that pattern matching could be used here in some way (as in Erlang), but I don't know how.
Edit
I think that explaining my goal could help to understand the problem.
I'm modifying an Abstract Syntax Tree (AST). I am going to apply a set of rules that modify this AST, but I want to store the modifications in a list. Each element of this list should be a tuple with the original piece of the AST and its modification. So the last step is, for each tuple, to search for a piece of the AST that is exactly the same as the first element and substitute it with the second element of the tuple. So, I will need something like this:
change (old,new):t piece_of_ast =
    case piece_of_ast of
        old -> new
        _ -> piece_of_ast
I hope this explanation clarifies my problem.
Thanks in advance
It's probably an oversight in the library (but maybe I'm missing a subtle reason why Eq is a bad idea!) and I would contact the maintainer to get the needed Eq instances added in.
But for the record and the meantime, here's what you can do if the type you want to compare for equality doesn't have an instance for Eq, but does have one for Data - as is the case in your question.
The Data.Generics.Twins module (part of the syb package) offers a generic version of equality, with type
geq :: Data a => a -> a -> Bool
As the documentation states, this is 'Generic equality: an alternative to "deriving Eq" '. It works by comparing the toplevel constructors and if they are the same continues on to the subterms.
Since the Data class inherits from Typeable, you could even write a function like
veryGenericEq :: (Data a, Data b) => a -> b -> Bool
veryGenericEq a b = case cast a of
    Nothing -> False
    Just a' -> geq a' b
but I'm not sure this is a good idea - it certainly is unhaskelly, smashing all types into one big happy universe :-)
If you don't have a Data instance either, but the data type is simple enough that comparing for equality is 100% straightforward then StandaloneDeriving is the way to go, as @ChristianConkle indicates. To do this you need to add a {-# LANGUAGE StandaloneDeriving #-} pragma at the top of your file and add a number of clauses
deriving instance Eq a => Eq (CStatement a)
one for each type CStatement uses that doesn't have an Eq instance, like CAttribute. GHC will complain about each one you need, so you don't have to trawl through the source.
This will create a bunch of so-called 'orphan instances.' Normally, an instance like instance C T where will be defined in either the module that defines C or the module that defines T. Your instances are 'orphans' because they're separated from their 'parents.' Orphan instances can be bad because you might start using a new library which also has those instances defined; which instance should the compiler use? There's a little note on the Haskell wiki about this issue. If you're not publishing a library for others to use, it's fine; it's your problem and you can deal with it. It's also fine for testing; if you can implement Eq, then the library maintainer can probably include deriving Eq in the library itself, solving your problem.
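Putting the StandaloneDeriving route together with the asker's `change` function, a sketch could look like this. The types `Attr` and `Stmt` are simplified, hypothetical stand-ins for the library's AST types (which would be imported rather than defined locally), and `change` is written with `lookup` instead of a pattern match:

```haskell
{-# LANGUAGE StandaloneDeriving #-}

import Data.Maybe (fromMaybe)

-- Hypothetical stand-ins for library AST types that ship without Eq.
data Attr = Attr String
data Stmt a = Skip a | Labelled Attr (Stmt a)

-- One standalone clause per type that lacks an Eq instance.
deriving instance Eq Attr
deriving instance Eq a => Eq (Stmt a)

-- Replace a piece of the AST by its recorded modification, if there is one;
-- otherwise return the piece unchanged.
change :: Eq a => [(a, a)] -> a -> a
change rules piece = fromMaybe piece (lookup piece rules)
```

For instance, `change [(Skip 1, Skip 2)] (Skip 1)` gives `Skip 2`, while a piece with no matching rule is returned as it was.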
I'm not familiar with Erlang, but Haskell does not assume that all expressions can be compared for equality. Consider, for instance, undefined == undefined or undefined == let x = x in x.
Equality testing with (==) is an operation on values. Values of some types are simply not comparable. Consider, for instance, two values of type IO String: return "s" and getLine. Are they equal? What if you run the program and type "s"?
On the other hand, consider this:
f :: IO Bool
f = do
    x <- return "s" -- note that using return like this is pointless
    y <- getLine
    return (x == y) -- both x and y have type String.
It's not clear what you're looking for. Pattern matching may be the answer, but if you're using a library type that's not an instance of Eq, the likely answer is that comparing two values is actually impossible (or that the library author has for some reason decided to impose that restriction).
What types, specifically, are you trying to compare?
Edit: As a commenter mentioned, you can also compare using Data. I don't know if that is easier for you in this situation, but to me it is unidiomatic; a hack.
You also asked why Haskell can't do "this sort of thing" automatically. There are a few reasons. In part it's historical; in part it's that, with deriving, Eq satisfies most needs.
But there's also an important principle in play: the public interface of a type ideally represents what you can actually do with values of that type. Your specific type is a bad example, because it exposes all its constructors, and it really looks like there should be an Eq instance for it. But there are many other libraries that do not expose the implementation of their types, and instead require you to use provided functions (and class instances). Haskell allows that abstraction; a library author can rely on the fact that a consumer can't muck about with implementation details (at least without using unsafe tools like unsafeCoerce).
Suppose I have the following class:
class P a where
    nameOf :: a -> String
I would like to declare that all instances of this class are automatically instances of Show. My first attempt would be the following:
instance P a => Show a where
    show = nameOf
My first attempt to go this way yesterday resulted in a rabbit warren of language extensions: I was first told to switch on flexible instances, then undecidable instances, then overlapping instances, and finally getting an error about overlapping instance declarations. I gave up and returned to repeating the code. However, this fundamentally seems like a very simple demand, and one that should be easily satisfied.
So, two questions:
Is there a trivially easy way to do this that I've just missed?
Why do I get an overlapping instances problem? I can see why I might need UndecidableInstances, since I seem to be violating the Paterson condition, but there are no overlapping instances around here: there are no instances of P, even. Why does the typechecker believe there are multiple instances for Show Double (as seems to be the case in this toy example)?
You get the overlapping instances error because some of your instances of P may have other instances of Show and then the compiler won't be able to decide which ones to use. If you have an instance of P for Double, then there you go, you get two instances of Show for Double: your general one and the one already declared in Haskell's base library. How this error is triggered is correctly stated by @augustss in the comments to your question. For more info see the specs.
As you already know, there is no way to achieve what you're trying to do without UndecidableInstances. When you enable that flag you must understand that you're taking over the compiler's responsibility to ensure that there won't arise any conflicting instances. This means that, of course, there mustn't be any other instances of Show produced in your library. This also means that your library won't export the P class, which will erase the possibility of users of the library declaring the conflicting instances.
If your case somehow conflicts with the above, it's a reliable sign that there must be something wrong with it. And in fact there is...
What you're trying to achieve is incorrect above all. You are missing several important points about the Show typeclass, distinguishing it from constructs like a toString method of popular OO languages:
From Show's haddock:
The result of show is a syntactically correct Haskell expression containing only constants, given the fixity declarations in force at the point where the type is declared. It contains only the constructor names defined in the data type, parentheses, and spaces. When labelled constructor fields are used, braces, commas, field names, and equal signs are also used.
In other words, declaring an instance of Show, which does not produce a valid Haskell expression, is incorrect per se.
Given the above, it just doesn't make sense to declare a custom instance of Show when the type allows you to simply derive it.
When a type does not allow deriving it (e.g., a GADT), generally you'll still have to stick to type-specific instances to produce correct results.
So, if you need a custom representation function, you shouldn't use Show for that. Just declare a custom class, e.g.:
class Repr a where
    repr :: a -> String
and approach the instances declaration responsibly.
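If the concern is repeating `repr = nameOf` in every instance, one possible sketch (an assumption on my part, not something the answer prescribes) is to give Repr a default implementation via GHC's DefaultSignatures extension. The type `Planet` below is made up for the example.

```haskell
{-# LANGUAGE DefaultSignatures #-}

class P a where
  nameOf :: a -> String

class Repr a where
  repr :: a -> String
  -- Default used by any instance that leaves repr out, provided the
  -- type is also an instance of P.
  default repr :: P a => a -> String
  repr = nameOf

-- Hypothetical example type.
data Planet = Earth | Mars

instance P Planet where
  nameOf Earth = "Earth"
  nameOf Mars = "Mars"

-- One line per type, no overlapping or undecidable instances involved.
instance Repr Planet
```

Each type still opts in with its own `instance Repr T` line, so there is no blanket `instance P a => Repr a` and nothing for other instances to overlap with.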