What should be my expectations for Haskell's "deriving"? - haskell

Apologies in advance for a beginner question, but I have struggled to find useful info on this. I was working through "Learn You Haskell for Great Good" and am trying to understand the deriving keyword, which seems like Java's implements but supposedly with cool automatic code generation because of the category theory stuff or something. I declare a data structure for 2-vectors like
data R2 = R2 {x :: Double, y :: Double} deriving (Show)
Then I can use it for things like
show (R2 1.0 2.0)
Now what I'd like to do is vector addition and scalar multiplication, like
(2.0 * (R2 1.0 2.0)) + (R2 3.0 4.0)
but when I try
Prelude> data R2 = R2 { x :: Double, y :: Double} deriving (Num,Show)
<interactive>:3:52:
Can't make a derived instance of `Num R2':
`Num' is not a derivable class
In the data declaration for `R2'
So the compiler figured out how to show the cartesian product of primitive types, but addition is too hard? Maybe Num isn't the right type class to derive? How often can one expect to derive a type class and get working code without additional work, like how I didn't have to write my own show function?
Thanks very much,
John

trying to understand the deriving keyword, which seems like Java's implements but supposedly with cool automatic code generation
instance is a bit more like implements, in that you state that a type is an instance of a type class and then write the implementations. deriving is all about the cool automatic generation of those implementations (though it does subsume instance).
How often can one expect to derive a type class and get working code without additional work, like how I didn't have to write my own show function?
Alexey Romanov's answer covers for which classes deriving works. There is also another way to auto-generate instances: using generics. From a bird's eye view, it works like this: you describe what an instance should look like for a generic type and then, for any type you want to have an instance, derive Generic and add an empty (i.e. with no implementations, as they will be generated automatically) instance declaration. Some libraries like aeson and binary offer generic instances ready to use, and you can of course roll your own for your classes.
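As a rough sketch of the generics approach described above, the following uses GHC.Generics with a default method signature. The class names CountFields and GCount are made up for illustration and do not come from any library; the point is only that the per-type instance body stays empty.

```haskell
{-# LANGUAGE DefaultSignatures, DeriveGeneric, FlexibleContexts, TypeOperators #-}
import GHC.Generics

-- Hypothetical class: counts the fields stored in a value.
class CountFields a where
  countFields :: a -> Int
  -- Generic default, used whenever an instance gives no implementation.
  default countFields :: (Generic a, GCount (Rep a)) => a -> Int
  countFields = gcount . from

-- Helper class that walks the generic representation of a type.
class GCount f where
  gcount :: f p -> Int

instance GCount U1 where                     -- constructor without fields
  gcount _ = 0
instance GCount (K1 i c) where               -- a single field
  gcount _ = 1
instance (GCount f, GCount g) => GCount (f :*: g) where   -- several fields
  gcount (l :*: r) = gcount l + gcount r
instance (GCount f, GCount g) => GCount (f :+: g) where   -- several constructors
  gcount (L1 l) = gcount l
  gcount (R1 r) = gcount r
instance GCount f => GCount (M1 i c f) where -- metadata wrapper
  gcount (M1 v) = gcount v

data R2 = R2 { x :: Double, y :: Double } deriving (Show, Generic)

-- Empty instance: the generic default supplies countFields.
instance CountFields R2
```

With this in place, countFields (R2 1.0 2.0) evaluates to 2 without any code written specifically for R2.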

See https://downloads.haskell.org/~ghc/7.10.2/docs/html/users_guide/deriving.html:
In Haskell 98, the only classes that may appear in the deriving clause are the standard classes Eq, Ord, Enum, Ix, Bounded, Read, and Show.
GHC also allows deriving Generic, Functor, Data, Typeable, Foldable and Traversable for data declarations, and any classes for newtype declarations (after enabling the relevant extensions, as listed in the linked page).

Here's one reason that you can't derive the Num class
data Vector = Vector Int Int
instance Num Vector where
Vector a b + Vector c d = Vector (a + c) (b + d)
Vector a b * Vector c d = Vector (a * c) (b * d)
data Complex = Complex Int Int
instance Num Complex where
Complex a b + Complex c d = Complex (a + c) (b + d)
Complex a b * Complex c d = Complex (a * c - b * d) (a * d + b * c)
Both are sensible instances that a real programmer might want to define. For a given data definition with two Int fields, which instance should the deriving clause pick?

The Haskell Report 2010 (the document that describes the Haskell language and which all implementations should follow) defines the conditions for deriving a class C as follows:
C is one of Eq, Ord, Enum, Bounded, Show, or Read.
There is a context cx′ such that cx′ ⇒ C tij holds for each of the constituent types tij.
If C is Bounded, the type must be either an enumeration (all constructors must be nullary) or have only one constructor.
If C is Enum, the type must be an enumeration.
There must be no explicit instance declaration elsewhere in the program that makes T u1 … uk an instance of C.
If the data declaration has no constructors (i.e. when n = 0), then no classes are derivable.
Also, later in the report, it is said that it's also possible to derive Data.Ix instances.
To find out how a particular instance is derived exactly (for example, what does the derived Show instance output?), read the section about it in the report. That section only gives implementations for the cases where the above conditions are met. That's why it's impossible to derive Num instances: it's not specified what that instance should do!
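A small sketch of the conditions above in practice. The types Color and Pair are invented for this example: Color is an enumeration, so every standard derivable class applies, while Pair has a single constructor with fields, so Bounded can be derived but Enum cannot.

```haskell
-- An enumeration: all constructors are nullary.
data Color = Red | Green | Blue
  deriving (Show, Read, Eq, Ord, Enum, Bounded)

-- Enum and Bounded together let us list every value.
allColors :: [Color]
allColors = [minBound .. maxBound]

-- One constructor with fields: Bounded is derivable, Enum is not.
-- Adding Enum to this deriving clause would be rejected by the compiler.
data Pair = Pair Bool Color
  deriving (Show, Eq, Ord, Bounded)

smallestPair :: Pair
smallestPair = minBound
```

Here allColors is the list of all three constructors in declaration order, and smallestPair is built from the minBound of each field.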
GHC also provides a few extensions that allow deriving more classes.
Those extensions are not part of standard Haskell, so they have to be enabled explicitly. For example, if GeneralizedNewtypeDeriving is enabled, you can write the following:
newtype MyInt = MyInt Int deriving (Num)
-- By GeneralizedNewtypeDeriving, GHC will just "copy" the instance
-- for the base type of the newtype, in this case, it'll use the `Num Int` instance.
You can read about these extensions in the GHC user guide.

Sadly, deriving only works for a small handful of classes, where the necessary code is hard-wired into the compiler. You can write instances yourself for any class, but only this small handful can be derived automatically.
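For the R2 type from the question, one way to get vector addition and scalar multiplication without a Num instance is to write the two operations by hand. This is only a sketch: the operator names ^+^ and *^ and their fixities are chosen here for illustration and are not part of the question or of the Prelude.

```haskell
data R2 = R2 { x :: Double, y :: Double } deriving (Show, Eq)

-- Operator names and fixities are assumptions made for this example.
infixl 6 ^+^
infixr 7 *^

-- Componentwise vector addition.
(^+^) :: R2 -> R2 -> R2
R2 a b ^+^ R2 c d = R2 (a + c) (b + d)

-- Multiplication of a vector by a scalar.
(*^) :: Double -> R2 -> R2
k *^ R2 a b = R2 (k * a) (k * b)

-- The expression from the question, rewritten with these operators.
example :: R2
example = (2.0 *^ R2 1.0 2.0) ^+^ R2 3.0 4.0
```

Keeping scalar multiplication as a separate function avoids having to decide what multiplying two vectors should mean, which is the ambiguity that stops Num from being derivable in the first place.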

Related

How to say a data of type A is also of type B in Haskell?

I am learning haskell and am having trouble following the line of thought. I am trying to think in C++ terms and I am unable to find the equivalent of C++ subclass in Haskell. How do I say data B is also data A for simple structs A and B?
Background: I have read LearnYouAHaskell at least thrice. I am able to write basic Haskell code, but nothing very advanced and am fairly experienced working in C++.
Attempts: I was trying to think in terms of defining A as a type class and making B an instance of A. However, I don't want to write new definition of the method/data member and just want to use the definition of A. I am unable to comprehend the error messages.
-- Trial.hs
class A a where
data_member :: Int
data B = B {
x :: Int
}
instance A B;
Trial.hs:2:9: error:
• Could not deduce (A a0)
from the context: A a
bound by the type signature for:
data_member :: A a => Int
at Trial.hs:2:9-26
The type variable ‘a0’ is ambiguous
• In the ambiguity check for ‘data_member’
To defer the ambiguity check to use sites, enable AllowAmbiguousTypes
When checking the class method: data_member :: forall a. A a => Int
In the class declaration for ‘A’
Like Damian suggests, use a Sum type: The | type operator/constructor.
Using ADTs (Algebraic Data Types) is a great strength of Haskell. Dive into using them whenever you can. I think all programmers learning Haskell with an imperative background come to this same conclusion: ADTs are incredibly useful and concise.
Coming from a C++ background, when I first grokked the Sum type I was translating it to C++ in my mind this way:
data PureVirtualClassA = ConcreteClassB Member1 Member2 | ConcreteClassC Member3
where Member1, Member2, and Member3 would be the types of a struct member. You can simplify this example with all the three being Int:
data PureVirtualClassA = ConcreteClassB Int Int | ConcreteClassC Int
(If you want a named data member, you should go for using Records, but they are not always needed)
Now you can use it in a function, just like you would use C++ polymorphism, where ConcreteClassB and ConcreteClassC would be deriving from PureVirtualClassA:
myFunction :: PureVirtualClassA -> Int
myFunction (ConcreteClassB x y) = x + y
myFunction (ConcreteClassC z) = z
IMPORTANT NOTE: In those examples for the C++ programmer I have used the word Class with the C++ meaning! Don't use the word Class this way in Haskell. A class in Haskell is something different, it's more like an interface, but the comparison does not stand.
You can create a new type containing both:
data A = A Int
data B = B Int
data AB = MakeA A | MakeB B
:t MakeA $ A 4
MakeA $ A 4 :: AB

How to create a generic Complex type in haskell?

I want to create a Complex type to represent complex numbers.
Following works:
Prelude> data Complex = Complex Int Int
Prelude> :t Complex
Complex :: Int -> Int -> Complex
How can I change this to accept any Num type, instead of just Int.
I tried following:
Prelude> data Complex a = Num a => Complex a a
but got this:
* Data constructor `Complex' has existential type variables, a context, or a specialised result type
Complex :: forall a. Num a => a -> a -> Complex a
(Use ExistentialQuantification or GADTs to allow this)
* In the definition of data constructor `Complex'
In the data type declaration for `Complex'
I'm not really sure what to make of this error. Any help is appreciated.
Traditional data in Haskell is just that: data. It doesn't need to know anything about the properties of its fields, it just needs to be able to store them. Hence there's no real need to constrain the fields at that point; just make it
data Complex a = Complex !a !a
(! because strict fields are better for performance).
Of course when you then implement the Num instance, you will need a constraint:
instance (Num a) => Num (Complex a) where
fromInteger = (`Complex`0) . fromInteger
Complex r i + Complex ρ ι = Complex (r+ρ) (i+ι)
...
...in fact, you need the much stronger constraint RealFloat a to implement abs, at least that's how the standard version does it. (Which means, Complex Int is actually not usable, not with the standard Num hierarchy; you need e.g. Complex Double.)
That said, it is also possible to bake the constraint into the data type itself. The ExistentialQuantification syntax you tried is highly limiting though and not suitable for this; what you want instead is the GADT
data Complex a where
Complex :: Num a => a -> a -> Complex a
With that in place, you could then implement e.g. addition without mentioning any constraint in the signature
cplxAdd :: Complex a -> Complex a -> Complex a
cplxAdd (Complex r i) (Complex ρ ι) = Complex (r+ρ) (i+ι)
You would now need to fulfill Num whenever you try to construct a Complex value though. That means, you'd still need an explicit constraint in the Num instance.
Also, this version is potentially much slower, because the Num dictionary actually needs to be stored in the runtime representation.
Type constructors cannot be constrained in pure Haskell, only functions can. So it is supposed that you declare
data Complex a = Complex a a
and then constrain functions, like
conjugate :: (Num a) => Complex a -> Complex a
conjugate (Complex x y) = Complex x (-y)
In fact, the type and constraint for conjugate can be inferred by the compiler, so you can just define the implementation:
conjugate (Complex x y) = Complex x (-y)
However, if you really wish to constrain the type constructor Complex, you can turn on some extensions that enable it, namely ExistentialQuantification or GADTs, as the compiler suggests. To do this, add this line to the very beginning of your file:
{-# LANGUAGE ExistentialQuantification #-}
or
{-# LANGUAGE GADTs #-}
Those are called pragmas.
While you could, as the compiler message instructs, use ExistentialQuantification, you could also define the type like this:
data Complex a = Complex a a deriving (Show, Eq)
It's a completely unconstrained type, so perhaps another name would be more appropriate... This type seems to often be called Pair...
When you write functions, however, you can constrain the values contained in the type:
myFunction :: Num a => Complex a -> a
myFunction (Complex x y) = x + y

Data families vs Injective type families

Now that we have injective type families, is there any remaining use case for using data families over type families?
Looking at past StackOverflow questions about data families, there is this question from a couple years ago discussing the difference between type families and data families, and this answer about use cases of data families. Both say that the injectivity of data families is their greatest strength.
Looking at the docs on data families, I see no reason not to rewrite all uses of data families using injective type families.
For example, say I have a data family (I've merged some examples from the docs to try to squeeze in all the features of data families)
data family G a b
data instance G Int Bool = G11 Int | G12 Bool deriving (Eq)
newtype instance G () a = G21 a
data instance G [a] b where
G31 :: c -> G [Int] b
G32 :: G [a] Bool
I might as well rewrite it as
type family G a b = g | g -> a b
type instance G Int Bool = G_Int_Bool
type instance G () a = G_Unit_a a
type instance G [a] b = G_lal_b a b
data G_Int_Bool = G11 Int | G12 Bool deriving (Eq)
newtype G_Unit_a a = G21 a
data G_lal_b a b where
G31 :: c -> G_lal_b [Int] b
G32 :: G_lal_b [a] Bool
It goes without saying that associated instances for data families correspond to associated instances with type families in the same way. Then is the only remaining difference that we have less things in the type-namespace?
As a followup, is there any benefit to having less things in the type-namespace? All I can think of is that this will become debugging hell for someone playing with this on ghci - the types of the constructors all seem to indicate that the constructors are all under one GADT...
type family T a = r | r -> a
data family D a
An injective type family T satisfies the injectivity axiom
if T a ~ T b then a ~ b
But a data family satisfies the much stronger generativity axiom
if D a ~ g b then D ~ g and a ~ b
(If you like: Because the instances of D define new types that are different from any existing types.)
In fact D itself is a legitimate type in the type system, unlike a type family like T, which can only ever appear in a fully saturated application like T a. This means:
D can be the argument to another type constructor, like MaybeT D. (MaybeT T is illegal.)
You can define instances for D, like instance Functor D. (You can't write instance Functor T for a type family T, and it would be unusable anyway because instance selection for, e.g., fmap :: Functor f => (a -> b) -> f a -> f b relies on the fact that from the type f a you can determine both f and a; for this to work f cannot be allowed to vary over type families, even injective ones.)
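A minimal sketch of the first point, assuming a made-up wrapper type Wrap and made-up instances for D and T: the data family D can be passed unapplied to another type constructor, while the type family T cannot.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- A data family: each instance declares a brand-new type with its own constructors.
data family D a
newtype instance D Int  = DInt Int   deriving (Show, Eq)
data    instance D Bool = DYes | DNo deriving (Show, Eq)

-- A type family: each instance only names an existing type.
type family T a
type instance T Int = Int

-- Hypothetical wrapper that takes a type constructor as its first argument.
newtype Wrap f a = Wrap (f a)

-- Accepted: D is a type constructor in its own right.
wrapped :: Wrap D Int
wrapped = Wrap (DInt 3)

-- Would be rejected: a type family must always be fully applied.
-- rejected :: Wrap T Int

-- Pattern matching uses the constructor declared by the D Int instance.
unwrap :: Wrap D Int -> Int
unwrap (Wrap (DInt n)) = n
```

Uncommenting the signature for rejected makes GHC complain that T is missing an argument, which is the saturation restriction in action.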
You're missing one other detail - data families create new types. Type families can only refer to other types. In particular, every instance of a data family declares new constructors. And it's nicely generic. You can create a data instance with newtype instance if you want newtype semantics. Your instance can be a record. It can have multiple constructors. It can even be a GADT if you want.
It's exactly the difference between the type and data/newtype keywords. Injective type families don't give you new types, rendering them useless in the case where you need that.
I understand where you're coming from. I had this same issue with the difference initially. Then I finally ran into a use case where they're useful, even without a type class getting involved.
I wanted to write an api for dealing with mutable cells in a few different contexts, without using classes. I knew I wanted to do it with a free monad with interpreters in IO, ST, and maybe some horrible hacks with unsafeCoerce to even go so far as shoehorning it into State. This wasn't for any practical purpose, of course - I was just exploring API designs.
So I had something like this:
data MutableEnv (s :: k) a ...
newRef :: a -> MutableEnv s (Ref s a)
readRef :: Ref s a -> MutableEnv s a
writeRef :: Ref s a -> a -> MutableEnv s ()
The definition of MutableEnv wasn't important. Just standard free/operational monad stuff with constructors matching the three functions in the api.
But I was stuck on what to define Ref as. I didn't want some sort of class, I wanted it to be a concrete type as far as the type system was concerned.
Then late one night I was out for a walk and it hit me - what I essentially want is a type whose constructors are indexed by an argument type. But it had to be open, unlike a GADT - new interpreters could be added at will. And then it hit me. That's exactly what a data family is. An open, type-indexed family of data values. I could complete the api with just the following:
data family Ref (s :: k) :: * -> *
Then, dealing with the underlying representation for a Ref was no big deal. Just create a data instance (or newtype instance, more likely) whenever an interpreter for MutableEnv is defined.
This exact example isn't really useful. But it clearly illustrates something data families can do that injective type families can't.
The answer by Reid Barton explains the distinction between my two examples perfectly. It has reminded me of something I read in Richard Eisenberg's thesis about adding dependent types to Haskell and I thought that since the heart of this question is injectivity and generativity, it would be worth mentioning how DependentHaskell will deal with this (when it eventually gets implemented, and if the quantifiers proposed now are the ones eventually implemented).
What follows is based on pages 56 and 57 (4.3.4 Matchability) of the aforementioned thesis:
Definition (Generativity). If f and g are generative, then f a ~ g b implies f ~ g
Definition (Injectivity). If f is injective, then f a ~ f b implies a ~ b
Definition (Matchability). A function f is matchable iff it is generative and injective
In Haskell as we know it now (8.0.1) the matchable (type-level) functions consist exactly of newtype, data, and data family type constructors. In the future, under DependentHaskell, one of the new quantifiers we will get will be '-> and this will be used to denote matchable functions. In other words, there will be a way to inform the compiler a type-level function is generative (which currently can only be done by making sure that function is a type constructor).

On inferring fmap for ADTs

Suppose that two new types are defined like this
type MyProductType a = (FType1 a, FType2 a)
type MyCoproductType a = Either (FType1 a) (FType2 a)
...and that FType1 and FType2 are both instances of Functor.
If one now were to declare MyProductType and MyCoproductType as instances of Functor, would the compiler require explicit definitions for their respective fmap's, or can it infer these definitions from the previous ones?
Also, is the answer to this question implementation-dependent, or does it follow from the Haskell spec?
By way of background, this question was motivated by trying to make sense of a remark in something I'm reading. The author first defines
type Writer a = (a, String)
...and later writes (my emphasis)
...the Writer type constructor is functorial in a. We don't even have to implement fmap for it, because it's just a simple product type.
The emphasized text is the remark I'm trying to make sense of. I thought it meant that Haskell could infer fmap's for any ADT based on functorial types, and, in particular, it could infer the fmap for a "simple product type" like Writer, but now I think this interpretation is not right (at least if I'm reading Ørjan Johansen's answer correctly).
As for what the author meant by that sentence, now I really have no clue. Maybe all he meant is that it's not worth the trouble to re-define Writer in such a way that its functoriality can be made explicit, since it's such a "simple ... type". (Grasping at straws here.)
First, you cannot generally define new instances for type synonyms, especially not partially applied ones as you would need in your case. I think you meant to define a newtype or data instead:
newtype MyProductType a = MP (FType1 a, FType2 a)
newtype MyCoproductType a = MC (Either (FType1 a) (FType2 a))
Standard Haskell says nothing about deriving Functor automatically at all; that is only possible with GHC's DeriveFunctor extension. (Or sometimes GeneralizedNewtypeDeriving, but that doesn't apply in your examples because the type variable a does not appear only as the last argument of the wrapped type.)
So let's try that:
{-# LANGUAGE DeriveFunctor #-}
data FType1 a = FType1 a deriving Functor
data FType2 a = FType2 a deriving Functor
newtype MyProductType a = MP (FType1 a, FType2 a) deriving Functor
newtype MyCoproductType a = MC (Either (FType1 a) (FType2 a)) deriving Functor
We get the error message:
Test.hs:6:76:
Can't make a derived instance of ‘Functor MyCoproductType’:
Constructor ‘MC’ must use the type variable only as the last argument of a data type
In the newtype declaration for ‘MyCoproductType’
It turns out that GHC can derive the first three, but not the last one. I believe the third one only works because tuples are special cased. Either doesn't work though, because GHC doesn't keep any special knowledge about how Either treats its first argument. It's nominally a mathematical functor in that argument, but not a Haskell Functor.
Note that GHC is smarter about using variables only as last argument of types known to be Functors. The following works fine:
newtype MyWrappedType a = MW (Either (FType1 Int) (FType2 (Maybe a))) deriving Functor
So to sum up: It depends, GHC has an extension for this but it's not always smart enough to do what you want.
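When the derivation is rejected, the instance can still be written by hand. A sketch for the coproduct case, reusing the type names from this answer; the Show and Eq instances are added here only so that results can be inspected and compared.

```haskell
{-# LANGUAGE DeriveFunctor #-}

data FType1 a = FType1 a deriving (Show, Eq, Functor)
data FType2 a = FType2 a deriving (Show, Eq, Functor)

newtype MyCoproductType a = MC (Either (FType1 a) (FType2 a))
  deriving (Show, Eq)

-- Hand-written instance: map over whichever branch is present,
-- delegating to the Functor instance of the component type.
instance Functor MyCoproductType where
  fmap f (MC (Left l))  = MC (Left (fmap f l))
  fmap f (MC (Right r)) = MC (Right (fmap f r))
```

For example, fmap (+ 1) (MC (Left (FType1 1))) gives MC (Left (FType1 2)).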

Constructor that lifts (via DataKinds) to * -> A

Given an ADT like
data K = A | B Bool
the DataKinds extension allows us to lift it into kinds and types/type constructors
K :: BOX
'A :: K
'B :: Bool -> K
Is there a way to add a constructor to K that lifts to the type constructor
'C :: * -> K
?
As Conor states, this is not directly possible. You can, however, define
data K a = ... | C a
Then this promotes to
C :: a -> K a
If you then use K *, you can achieve what you want.
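A sketch of this workaround, assuming a recent GHC where the kind * is spelled Type from Data.Kind. The value p exists only to show that the promoted constructor applied to a type is accepted at kind K Type.

```haskell
{-# LANGUAGE DataKinds, KindSignatures #-}
import Data.Kind (Type)
import Data.Proxy (Proxy (..))

-- K with an extra parameter, so that C can carry a value of type a.
data K a = A | B Bool | C a

-- After promotion, 'C has kind a -> K a. Choosing a to be Type
-- lets 'C take an ordinary type such as Int as its argument.
p :: Proxy ('C Int :: K Type)
p = Proxy
```

The price is that every use of the promoted kind now has to mention the parameter, as in K Type.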
At the moment, I'm afraid not. I haven't spotted an obvious workaround, either.
This ticket documents the prospects for the declaration of data kinds, born kind, rather than being data types with kindness thrust upon them. It would be entirely reasonable for the constructors of such things to pack up types as you propose. We're not there yet, but it doesn't look all that problematic.
My eyes are on a greater prize. I would like * to be perfectly sensible type of runtime values, so that the kind you want could exist by promotion as we have it today. Combine that with the mooted notion of pi-type (non-parametric abstraction over the portion of the language that's effectively shared by types and values) and we might get a more direct way to make ad hoc type abstractions than we have with Data.Typeable. The usual forall would remain parametric.

Resources