Haskell: Understanding generalised algebraic data types - haskell

I have the following code, which outlines a language of boolean and arithmetic expressions:
data Exp a where
Plus :: Exp Int -> Exp Int -> Exp Int
Const :: (Show a) => a -> Exp a
Not :: Exp Bool -> Exp Bool
And :: Exp Bool -> Exp Bool -> Exp Bool
Greater :: Exp Int -> Exp Int -> Exp Bool
The evaluation function for the above language has the following type:
eval :: Exp a -> a
I am trying to understand what are the possible types the eval function can return. The above code uses GADTs which enable the type Exp a to be defined in terms of the type signatures of its constructors.
The return type of the constructors is not always Exp a. I believe that the types eval can return include Int, Bool, or a value of any type implementing Show. However, is it possible for eval to return any other type besides the three I mentioned previously (since the return type of the function is a)? Any insights are appreciated.

Exp a -> a is a type that could have many possible functions. For example, a valid function that probably isn't the one you are thinking of as "the" eval function is
foo :: Exp a -> a
foo (Plus _ _) = 3
foo (Const x) = x
foo (Not _) = True
foo (And _ _) = True
foo (Greater _ _) = True
The image of a function is the set of values it can return. This example demonstrates that the function foo and the eval you are expecting have different images, despite having the same return type forall a. a.
You are asking, in essence, what the union of all possible images of functions of type Exp a -> a is. This depends greatly on the actual definition of Exp. As currently defined, that would be the union of Int, Bool, and Show a => a.
Exp the type constructor, though, is capable of defining uninhabited types. The type Exp (Int -> Int) exists, even though you haven't defined a constructor which can create values of that type. Since you can't provide a value of type Exp (Int -> Int) for any potential eval function, it can't affect the image of any such function, either.
Changing the definition of Exp to include such a constructor would increase the set of values that could be passed to a function of type Exp a -> a, thus increase the set of values that could occur in the image of such a function.

To be clear, given your definition of Exp above, the answer is "yes": any function with type signature eval :: Exp a -> a can only return a (defined) value of type Int, Bool, or some other type with a Show instance. Because any type (of kind *) can be given a Show instance, that technically means that eval can return any type, but within a particular program that has a fixed set of Show instances, eval will be limited to returning values from this set of types.
You can see that this must be true as follows. Suppose that eval e returned a value of some fixed type t for some expression e. By the type signature of eval, that would imply that e must have type Exp t. However, data type declarations are "closed" meaning that the set of constructors given in the data Exp a declaration is exhaustive, and they represent the only methods of constructing a (defined) value of type Exp a. It's clear from the set of constructors that the only possible values of type Exp a are those that appear in the rightmost positions of the constructors' signatures: Exp Int, Exp Bool, and Exp a with the constraint Show a. Therefore, these are the only types possible for e implying that t must be Int, Bool, or some other a satisfying the constraint Show a.
As always in reasoning about Haskell types, we have to be a little bit careful in considering undefined/bottom values. If you consider "returning an undefined value" to be meaningful, then, indeed, eval can "return" an undefined value of any type, even one without a Show instance. For example, the following will typecheck:
stupid :: Exp (Int -> Int)
stupid = eval undefined
However, if your reason for asking the question is to determine whether you'd ever be in a position where the expression eval e might unexpectedly have a type other than Int, Bool, or some Show a => a that you would somehow have to handle, then no. The form of the GADT places limits on the possible types a in the signature eval :: Exp a -> a.

Related

Clarifying Data Constructor in Haskell

In the following:
data DataType a = Data a | Datum
I understand that Data Constructor are value level function. What we do above is defining their type. They can be function of multiple arity or const. That's fine. I'm ok with saying Datum construct Datum. What is not that explicit and clear to me here is somehow the difference between the constructor function and what it produce. Please let me know if i am getting it well:
1 - a) Basically writing Data a, is defining both a Data Structure and its Constructor function (as in scala or java usually the class and the constructor have the same name) ?
2 - b) So if i unpack and make an analogy. With Data a We are both defining a Structure(don't want to use class cause class imply a type already i think, but maybe we could) of object (Data Structure), the constructor function (Data Constructor/Value constructor), and the later return an object of that object Structure. Finally The type of that Structure of object is given by the Type constructor. An Object Structure in a sense is just a Tag surrounding a bunch value of some type. Is my understanding correct ?
3 - c) Can I formally Say:
Data Constructor that are Nullary represent constant values -> Return the the constant value itself of which the type is given by the Type Constructor at the definition site.
Data Constructor that takes an argument represent class of values, where class is a Tag ? -> Return an infinite number of object of that class, of which the type is given by the Type constructor at the definition site.
Another way of writing this:
data DataType a = Data a | Datum
Is with generalised algebraic data type (GADT) syntax, using the GADTSyntax extension, which lets us specify the types of the constructors explicitly:
{-# LANGUAGE GADTSyntax #-}
data DataType a where
Data :: a -> DataType a
Datum :: DataType a
(The GADTs extension would work too; it would also allow us to specify constructors with different type arguments in the result, like DataType Int vs. DataType Bool, but that’s a more advanced topic, and we don’t need that functionality here.)
These are exactly the types you would see in GHCi if you asked for the types of the constructor functions with :type / :t:
> :{
| data DataType a where
| Data :: a -> DataType a
| Datum :: DataType a
| :}
> :type Data
Data :: a -> DataType a
> :t Datum
Datum :: DataType a
With ExplicitForAll we can also specify the scope of the type variables explicitly, and make it clearer that the a in the data definition is a separate variable from the a in the constructor definitions by also giving them different names:
data DataType a where
Data :: forall b. b -> DataType b
Datum :: forall c. DataType c
Some more examples of this notation with standard prelude types:
data Either a b where
Left :: forall a b. a -> Either a b
Right :: forall a b. b -> Either a b
data Maybe a where
Nothing :: Maybe a
Just :: a -> Maybe a
data Bool where
False :: Bool
True :: Bool
data Ordering where
LT, EQ, GT :: Ordering -- Shorthand for repeated ‘:: Ordering’
I understand that Data Constructor are value level function. What we do above is defining their type. They can be function of multiple arity or const. That's fine. I'm ok with saying Datum construct Datum. What is not that explicit and clear to me here is somehow the difference between the constructor function and what it produce.
Datum and Data are both “constructors” of DataType a values; neither Datum nor Data is a type! These are just “tags” that select between the possible varieties of a DataType a value.
What is produced is always a value of type DataType a for a given a; the constructor selects which “shape” it takes.
A rough analogue of this is a union in languages like C or C++, plus an enumeration for the “tag”. In pseudocode:
enum Tag {
DataTag,
DatumTag,
}
// A single anonymous field.
struct DataFields<A> {
A field1;
}
// No fields.
struct DatumFields<A> {};
// A union of the possible field types.
union Fields<A> {
DataFields<A> data;
DatumFields<A> datum;
}
// A pair of a tag with the fields for that tag.
struct DataType<A> {
Tag tag;
Fields<A> fields;
}
The constructors are then just functions returning a value with the appropriate tag and fields. Pseudocode:
<A> DataType<A> newData(A x) {
DataType<A> result;
result.tag = DataTag;
result.fields.data.field1 = x;
return result;
}
<A> DataType<A> newDatum() {
DataType<A> result;
result.tag = DatumTag;
// No fields.
return result;
}
Unions are unsafe, since the tag and fields can get out of sync, but sum types are safe because they couple these together.
A pattern-match like this in Haskell:
case someDT of
Datum -> f
Data x -> g x
Is a combination of testing the tag and extracting the fields. Again, in pseudocode:
if (someDT.tag == DatumTag) {
f();
} else if (someDT.tag == DataTag) {
var x = someDT.fields.data.field1;
g(x);
}
Again this is coupled in Haskell to ensure that you can only ever access the fields if you have checked the tag by pattern-matching.
So, in answer to your questions:
1 - a) Basically writing Data a, is defining both a Data Structure and its Constructor function (as in scala or java usually the class and the constructor have the same name) ?
Data a in your original code is not defining a data structure, in that Data is not a separate type from DataType a, it’s just one of the possible tags that a DataType a value may have. Internally, a value of type DataType Int is one of the following:
The tag for Data (in GHC, a pointer to an “info table” for the constructor), and a reference to a value of type Int.
x = Data (1 :: Int) :: DataType Int
+----------+----------------+ +---------+----------------+
x ---->| Data tag | pointer to Int |---->| Int tag | unboxed Int# 1 |
+----------+----------------+ +---------+----------------+
The tag for Datum, and no other fields.
y = Datum :: DataType Int
+-----------+
y ----> | Datum tag |
+-----------+
In a language with unions, the size of a union is the maximum of all its alternatives, since the type must support representing any of the alternatives with mutation. In Haskell, since values are immutable, they don’t require any extra “padding” since they can’t be changed.
It’s a similar situation for standard data types, e.g., a product or sum type:
(x :: X, y :: Y) :: (X, Y)
+---------+--------------+--------------+
| (,) tag | pointer to X | pointer to Y |
+---------+--------------+--------------+
Left (m :: M) :: Either M N
+-----------+--------------+
| Left tag | pointer to M |
+-----------+--------------+
Right (n :: N) :: Either M N
+-----------+--------------+
| Right tag | pointer to N |
+-----------+--------------+
2 - b) So if i unpack and make an analogy. With Data a We are both defining a Structure(don't want to use class cause class imply a type already i think, but maybe we could) of object (Data Structure), the constructor function (Data Constructor/Value constructor), and the later return an object of that object Structure. Finally The type of that Structure of object is given by the Type constructor. An Object Structure in a sense is just a Tag surrounding a bunch value of some type. Is my understanding correct ?
This is sort of correct, but again, the constructors Data and Datum aren’t “data structures” by themselves. They’re just the names used to introduce (construct) and eliminate (match) values of type DataType a, for some type a that is chosen by the caller of the constructors to fill in the forall
data DataType a = Data a | Datum says:
If some term e has type T, then the term Data e has type DataType T
Inversely, if some value of type DataType T matches the pattern Data x, then x has type T in the scope of the match (case branch or function equation)
The term Datum has type DataType T for any type T
3 - c) Can I formally Say:
Data Constructor that are Nullary represent constant values -> Return the the constant value itself of which the type is given by the Type Constructor at the definition site.
Data Constructor that takes an argument represent class of values, where class is a Tag ? -> Return an infinite number of object of that class, of which the type is given by the Type constructor at the definition site.
Not exactly. A type constructor like DataType :: Type -> Type, Maybe :: Type -> Type, or Either :: Type -> Type -> Type, or [] :: Type -> Type (list), or a polymorphic data type, represents an “infinite” family of concrete types (Maybe Int, Maybe Char, Maybe (String -> String), …) but only in the same way that id :: forall a. a -> a represents an “infinite” family of functions (id :: Int -> Int, id :: Char -> Char, id :: String -> String, …).
That is, the type a here is a parameter filled in with an argument value given by the caller. Usually this is implicit, through type inference, but you can specify it explicitly with the TypeApplications extension:
-- Akin to: \ (a :: Type) -> \ (x :: a) -> x
id :: forall a. a -> a
id x = x
id #Int :: Int -> Int
id #Int 1 :: Int
Data :: forall a. a -> DataType a
Data #Char :: Char -> DataType Char
Data #Char 'x' :: DataType Char
The data constructors of each instantiation don’t really have anything to do with each other. There’s nothing in common between the instantiations Data :: Int -> DataType Int and Data :: Char -> DataType Char, apart from the fact that they share the same tag name.
Another way of thinking about this in Java terms is with the visitor pattern. DataType would be represented as a function that accepts a “DataType visitor”, and then the constructors don’t correspond to separate data types, they’re just the methods of the visitor which accept the fields and return some result. Writing the equivalent code in Java is a worthwhile exercise, but here it is in Haskell:
{-# LANGUAGE RankNTypes #-}
-- (Allows passing polymorphic functions as arguments.)
type DataType a
= forall r. -- A visitor with a generic result type
r -- With one “method” for the ‘Datum’ case (no fields)
-> (a -> r) -- And one for the ‘Data’ case (one field)
-> r -- Returning the result
newData :: a -> DataType a
newData field = \ _visitDatum visitData -> visitData field
newDatum :: DataType a
newDatum = \ visitDatum _visitData -> visitDatum
Pattern-matching is simply running the visitor:
matchDT :: DataType a -> b -> (a -> b) -> b
matchDT dt visitDatum visitData = dt visitDatum visitData
-- Or: matchDT dt = dt
-- Or: matchDT = id
-- case someDT of { Datum -> f; Data x -> g x }
-- f :: r
-- g :: a -> r
-- someDT :: DataType a
-- :: forall r. r -> (a -> r) -> r
someDT f (\ x -> g x)
Similarly, in Haskell, data constructors are just the ways of introducing and eliminating values of a user-defined type.
What is not that explicit and clear to me here is somehow the difference between the constructor function and what it produce
I'm having trouble following your question, but I think you are complicating things. I would suggest not thinking too deeply about the "constructor" terminology.
But hopefully the following helps:
Starting simple:
data DataType = Data Int | Datum
The above reads "Declare a new type named DataType, which has the possible values Datum or Data <some_number> (e.g. Data 42)"
So e.g. Datum is a value of type DataType.
Going back to your example with a type parameter, I want to point out what the syntax is doing:
data DataType a = Data a | Datum
^ ^ ^ These things appear in type signatures (type level)
^ ^ These things appear in code (value level stuff)
There's a bit of punning happening here. so in the data declaration you might see "Data Int" and this is mixing type-level and value-level stuff in a way that you wouldn't see in code. In code you'd see e.g. Data 42 or Data someVal.
I hope that helps a little...

When do I need type annotations?

Consider these functions
{-# LANGUAGE TypeFamilies #-}
tryMe :: Maybe Int -> Int -> Int
tryMe (Just a) b = a
tryMe Nothing b = b
class Test a where
type TT a
doIt :: TT a -> a -> a
instance Test Int where
type TT Int = Maybe Int
doIt (Just a) b = a
doIt (Nothing) b = b
This works
main = putStrLn $ show $ tryMe (Just 2) 25
This doesn't
main = putStrLn $ show $ doIt (Just 2) 25
{-
• Couldn't match expected type ‘TT a0’ with actual type ‘Maybe a1’
The type variables ‘a0’, ‘a1’ are ambiguous
-}
But then, if I specify the type for the second argument it does work
main = putStrLn $ show $ doIt (Just 2) 25::Int
The type signature for both functions seem to be the same. Why do I need to annotate the second parameter for the type class function? Also, if I annotate only the first parameter to Maybe Int it still doesn't work. Why?
When do I need to cast types in Haskell?
Only in very obscure, pseudo-dependently-typed settings where the compiler can't proove that two types are equal but you know they are; in this case you can unsafeCoerce them. (Which is like C++' reinterpret_cast, i.e. it completely circumvents the type system and just treats a memory location as if it contains the type you've told it. This is very unsafe indeed!)
However, that's not what you're talking about here at all. Adding a local signature like ::Int does not perform any cast, it merely adds a hint to the type checker. That such a hint is needed shouldn't be surprising: you didn't specify anywhere what a is supposed to be; show is polymorphic in its input and doIt polymorphic in its output. But the compiler must know what it is before it can resolve the associated TT; choosing the wrong a might lead to completely different behaviour from the intended.
The more surprising thing is, really, that sometimes you can omit such signatures. The reason this is possible is that Haskell, and more so GHCi, has defaulting rules. When you write e.g. show 3, you again have an ambiguous a type variable, but GHC recognises that the Num constraint can be “naturally” fulfilled by the Integer type, so it just takes that pick.
Defaulting rules are handy when quickly evaluating something at the REPL, but they are fiddly to rely on, hence I recommend you never do it in a proper program.
Now, that doesn't mean you should always add :: Int signatures to any subexpression. It does mean that, as a rule, you should aim for making function arguments always less polymorphic than the results. What I mean by that: any local type variables should, if possible, be deducable from the environment. Then it's sufficient to specify the type of the final end result.
Unfortunately, show violates that condition, because its argument is polymorphic with a variable a that doesn't appear in the result at all. So this is one of the functions where you don't get around having some signature.
All this discussion is fine, but it hasn't yet been stated explicitly that in Haskell numeric literals are polymorphic. You probably knew that, but may not have realized that it has bearing on this question. In the expression
doIt (Just 2) 25
25 does not have type Int, it has type Num a => a — that is, its type is just some numeric type, awaiting extra information to pin it down exactly. And what makes this tricky is that the specific choice might affect the type of the first argument. Thus amalloy's comment
GHC is worried that someone might define an instance Test Integer, in which case the choice of instance will be ambiguous.
When you give that information — which can come from either the argument or the result type (because of the a -> a part of doIt's signature) — by writing either of
doIt (Just 2) (25 :: Int)
doIt (Just 2) 25 :: Int -- N.B. this annotates the type of the whole expression
then the specific instance is known.
Note that you do not need type families to produce this behavior. This is par for the course in typeclass resolution. The following code will produce the same error for the same reason.
class Foo a where
foo :: a -> a
main = print $ foo 42
You might be wondering why this doesn't happen with something like
main = print 42
which is a good question, that leftroundabout has already addressed. It has to do with Haskell's defaulting rules, which are so specialized that I consider them little more than a hack.
With this expression:
putStrLn $ show $ tryMe (Just 2) 25
We've got this starting information to work from:
putStrLn :: String -> IO ()
show :: Show a => a -> String
tryMe :: Maybe Int -> Int -> Int
Just :: b -> Maybe b
2 :: Num c => c
25 :: Num d => d
(where I've used different type variables everywhere, so we can more easily consider them all at once in the same scope)
The job of the type-checker is basically to find types to choose for all of those variables, so and then make sure that the argument and result types line up, and that all the required type class instances exist.
Here we can see that tryMe applied to two arguments is going to be an Int, so a (used as input to show) must be Int. That requires that there is a Show Int instance; indeed there is, so we're done with a.
Similarly tryMe wants a Maybe Int where we have the result of applying Just. So b must be Int, and our use of Just is Int -> Maybe Int.
Just was applied to 2 :: Num c => c. We've decided it must be applied to an Int, so c must be Int. We can do that if we have Num Int, and we do, so c is dealt with.
That leaves 25 :: Num d => d. It's used as the second argument to tryMe, which is expecting an Int, so d must be Int (again discharging the Num constraint).
Then we just have to make sure all the argument and result types line up, which is pretty obvious. This is mostly rehashing the above since we made them line up by choosing the only possible value of the type variables, so I won't get into it in detail.
Now, what's different about this?
putStrLn $ show $ doIt (Just 2) 25
Well, lets look at all the pieces again:
putStrLn :: String -> IO ()
show :: Show a => a -> String
doIt :: Test t => TT t -> t -> t
Just :: b -> Maybe b
2 :: Num c => c
25 :: Num d => d
The input to show is the result of applying doIt to two arguments, so it is t. So we know that a and t are the same type, which means we need Show t, but we don't know what t is yet so we'll have to come back to that.
The result of applying Just is used where we want TT t. So we know that Maybe b must be TT t, and therefore Just :: _b -> TT t. I've written _b using GHC's partial type signature syntax, because this _b is not like the b we had before. When we had Just :: b -> Maybe b we could pick any type we liked for b and Just could have that type. But now we need some specific but unknown type _b such that TT t is Maybe _b. We don't have enough information to know what that type is yet, because without knowing t we don't know which instance's definition of TT t we're using.
The argument of Just is 2 :: Num c => c. So we can tell that c must also be _b, and this also means we're going to need a Num _b instance. But since we don't know what _b is yet we can't check whether there's a Num instance for it. We'll come back to it later.
And finally the 25 :: Num d => d is used where doIt wants a t. Okay, so d is also t, and we need a Num t instance. Again, we still don't know what t is, so we can't check this.
So all up, we've figured out this:
putStrLn :: String -> IO ()
show :: t -> String
doIt :: TT t -> t -> t
Just :: _b -> TT t
2 :: _b
25 :: t
And have also these constraints waiting to be solved:
Test t, Num t, Num _b, Show t, (Maybe _b) ~ (TT t)
(If you haven't seen it before, ~ is how we write a constraint that two type expressions must be the same thing)
And we're stuck. There's nothing further we can figure out here, so GHC is going to report a type error. The particular error message you quoted is complaining that we can't tell that TT t and Maybe _b are the same (it calls the type variables a0 and a1), since we didn't have enough information to select concrete types for them (they are ambiguous).
If we add some extra type signatures for parts of the expression, we can go further. Adding 25 :: Int1 immediately lets us read off that t is Int. Now we can get somewhere! Lets patch that into the constrints we had yet to solve:
Test Int, Num Int, Num _b, Show Int, (Maybe _b) ~ (TT Int)
Num Int and Show Int are obvious and built in. We've got Test Int too, and that gives us the definition TT Int = Maybe Int. So (Maybe _b) ~ (Maybe Int), and therefore _b is Int too, which also allows us to discharge that Num _b constraint (it's Num Int again). And again, it's easy now to verify all the argument and result types match up, since we've filled in all the type variables to concrete types.
But why didn't your other attempt work? Lets go back to as far as we could get with no additional type annotation:
putStrLn :: String -> IO ()
show :: t -> String
doIt :: TT t -> t -> t
Just :: _b -> TT t
2 :: _b
25 :: t
Also needing to solve these constraints:
Test t, Num t, Num _b, Show t, (Maybe _b) ~ (TT t)
Then add Just 2 :: Maybe Int. Since we know that's also Maybe _b and also TT t, this tells us that _b is Int. We also now know we're looking for a Test instance that gives us TT t = Maybe Int. But that doesn't actually determine what t is! It's possible that there could also be:
instance Test Double where
type TT Double = Maybe Int
doIt (Just a) _ = fromIntegral a
doIt Nothing b = b
Now it would be valid to choose t as either Int or Double; either would work fine with your code (since the 25 could also be a Double), but would print different things!
It's tempting to complain that because there's only one instance for t where TT t = Maybe Int that we should choose that one. But the instance selection logic is defined not to guess this way. If you're in a situation where it's possible that another matching instance should exist, but isn't there due to an error in the code (forgot to import the module where it's defined, for example), then it doesn't commit to the only matching instance it can see. It only chooses an instance when it knows no other instance could possibly apply.2
So the "there's only one instance where TT t = Maybe Int" argument doesn't let GHC work backward to settle that t could be Int.
And in general with type families you can only "work forwards"; if you know the type you're applying a type family to you can tell from that what the resulting type should be, but if you know the resulting type this doesn't identify the input type(s). This is often surprising, since ordinary type constructors do let us "work backwards" this way; we used this above to conclude from Maybe _b = Maybe Int that _b = Int. This only works because with new data declarations, applying the type constructor always preserves the argument type in the resulting type (e.g. when we apply Maybe to Int, the resulting type is Maybe Int). The same logic doesn't work with type families, because there could be multiple type family instances mapping to the same type, and even when there isn't there is no requirement that there's an identifiable pattern connecting something in the resulting type to the input type (I could have type TT Char = Maybe (Int -> Double, Bool).
So you'll often find that when you need to add a type annotation, you'll often find that adding one in a place whose type is the result of a type family doesn't work, and you'll need to pin down the input to the type family instead (or something else that is required to be the same type as it).
1 Note that the line you quoted as working in your question main = putStrLn $ show $ doIt (Just 2) 25::Int does not actually work. The :: Int signature binds "as far out as possible", so you're actually claiming that the entire expression putStrLn $ show $ doIt (Just 2) 25 is of type Int, when it must be of type IO (). I'm assuming when you really checked it you put brackets around 25 :: Int, so putStrLn $ show $ doIt (Just 2) (25 :: Int).
2 There are specific rules about what GHC considers "certain knowledge" that there could not possibly be any other matching instances. I won't get into them in detail, but basically when you have instance Constraints a => SomeClass (T a), it has to be able to unambiguously pick an instance only by considering the SomeClass (T a) bit; it can't look at the constraints left of the => arrow.

How do I understand the set of valid inputs to a Haskell type constructor?

Warning: very beginner question.
I'm currently mired in the section on algebraic types in the Haskell book I'm reading, and I've come across the following example:
data Id a =
MkId a deriving (Eq, Show)
idInt :: Id Integer
idInt = MkId 10
idIdentity :: Id (a -> a)
idIdentity = MkId $ \x -> x
OK, hold on. I don't fully understand the idIdentity example. The explanation in the book is that:
This is a little odd. The type Id takes an argument and the data
constructor MkId takes an argument of the corresponding polymorphic
type. So, in order to have a value of type Id Integer, we need to
apply a -> Id a to an Integer value. This binds the a type variable to
Integer and applies away the (->) in the type constructor, giving us
Id Integer. We can also construct a MkId value that is an identity
function by binding the a to a polymorphic function in both the type
and the term level.
But wait. Why only fully polymorphic functions? My previous understanding was that a can be any type. But apparently constrained polymorphic type doesn't work: (Num a) => a -> a won't work here, and the GHC error suggests that only completely polymorphic types or "qualified types" (not sure what those are) are valid:
f :: (Num a) => a -> a
f = undefined
idConsPoly :: Id (Num a) => a -> a
idConsPoly = MkId undefined
Illegal polymorphic or qualified type: Num a => a -> a
Perhaps you intended to use ImpredicativeTypes
In the type signature for ‘idIdentity’:
idIdentity :: Id (Num a => a -> a)
EDIT: I'm a bonehead. I wrote the type signature below incorrectly, as pointed out by #chepner in his answer below. This also resolves my confusion in the next sentence below...
In retrospect, this behavior makes sense because I haven't defined a Num instance for Id. But then what explains me being able to apply a type like Integer in idInt :: Id Integer?
So in generality, I guess my question is: What specifically is the set of valid inputs to type constructors? Only fully polymorphic types? What are "qualified types" then? Etc...
You just have the type constructor in the wrong place. The following is fine:
idConsPoly :: Num a => Id (a -> a)
idConsPoly = MkId undefined
The type constructor Id here has kind * -> *, which means you can give it any value that has kind * (which includes all "ordinary" types) and returns a new value of kind *. In general, you are more concerned with arrow-kinded functions(?), of which type constructors are just one example.
TypeProd is a ternary type constructor whose first two arguments have kind * -> *:
-- Based on :*: from Control.Compose
newtype TypeProd f g a = Prod { unProd :: (f a, g a) }
Either Int is an expression whose value has kind * -> * but is not a type constructor, being the partial application of the type constructor Either to the nullary type constructor Int.
Also contributing to your confusion is that you've misinterpreted the error message from GHC. It means "Num a => a -> a is a polymorphic or qualified type, and therefore illegal (in this context)".
The last sentence that you quoted from the book is not very well worded, and maybe contributed to that misunderstanding. It's important to realize that in Id (a -> a) the argument a -> a is not a polymorphic type, but just an ordinary type that happens to mention a type variable. The thing which is polymorphic is idIdentity, which can have the type Id (a -> a) for any type a.
In standard Haskell polymorphism and qualification can only appear at the outermost level of the type in a type signature.
The type signature is almost correct
idConsPoly :: (Num a) => Id (a -> a)
Should be right, though i have no ghc on my phone to test this.
Also i think your question is quite broad, thus i deliberately answer only the concrete problem here.

Evolving data structure

I'm trying to write a compiler for a C-like language in Haskell. The compiler progresses by transforming an AST. The first pass parses the input to create the AST, tying a knot with the symbol table to allow symbols to be located before they have been defined without the need for forward references.
The AST contains information about types and expressions, and there can be connections between them; e.g. sizeof(T) is an expression that depends on a type T, and T[e] is an array type that depends on a constant expression e.
Types and expressions are represented by Haskell data types like so:
data Type = TypeInt Id
| TypePointer Id Type -- target type
| TypeArray Id Type Expr -- elt type, elt count
| TypeStruct Id [(String, Type)] -- [(field name, field type)]
| TypeOf Id Expr
| TypeDef Id Type
data Expr = ExprInt Int -- literal int
| ExprVar Var -- variable
| ExprSizeof Type
| ExprUnop Unop Expr
| ExprBinop Binop Expr Expr
| ExprField Bool Expr String -- Bool gives true=>s.field, false=>p->field
Where Unop includes operators like address-of (&), and dereference (*), and Binop includes operators like plus (+), and times (*), etc...
Note that each type is assigned a unique Id. This is used to construct a type dependency graph in order to detect cycles which lead to infinite types. Once we are sure that there are no cycles in the type graph, it is safe to apply recursive functions over them without getting into an infinite loop.
The next step is to determine the size of each type, assign offsets to struct fields, and replace ExprFields with pointer arithmetic. In doing so, we can determine the type of expressions, and eliminate ExprSizeofs, ExprFields, TypeDefs, and TypeOfs from the type graph, so our types and expressions have evolved, and now look more like this:
data Type' = TypeInt Id
| TypePointer Id Type'
| TypeArray Id Type' Int -- constant expression has been eval'd
| TypeStruct Id [(String, Int, Type')] -- field offset has been determined
data Expr' = ExprInt Type' Int
| ExprVar Type' Var
| ExprUnop Type' Unop Expr'
| ExprBinop Type' Binop Expr' Expr'
Note that we've eliminated some of the data constructors, and slightly changed some of the others. In particular, Type' no longer contains any Expr', and every Expr' has determined its Type'.
So, finally, the question: Is it better to create two almost identical sets of data types, or try to unify them into a single data type?
Keeping two separate data types makes it explicit that certain constructors can no longer appear. However, the function which performs constant folding to evaluate constant expressions will have type:
foldConstants :: Expr -> Either String Expr'
But this means that we cannot perform constant folding later on with Expr's (imagine some pass that manipulates an Expr', and wants to fold any emerged constant expressions). We would need another implementation:
foldConstants' :: Expr' -> Either String Expr'
On the other hand, keeping a single type would solve the constant folding problem, but would prevent the type checker from enforcing static invariants.
Furthermore, what do we put into the unknown fields (like field offsets, array sizes, and expression types) during the first pass? We could plug the holes with undefined, or error "*hole*", but that feels like a disaster waiting to happen (like NULL pointers that you can't even check). We could change the unknown fields into Maybes, and plug the holes with Nothing (like NULL pointers that we can check), but it would be annoying in subsequent passes to have to keep pulling values out of Maybes that are always going to be Justs.
Hopefully someone with more experience will have a more polished, battle-tested and ready answer, but here's my shot at it.
You can have your pie and eat part of it too at relatively little cost with GADTs:
{-# LANGUAGE GADTs #-}
data P0 -- phase zero
data P1 -- phase one
data Type p where
TypeInt :: Id -> Type p
TypePointer :: Id -> Type p -> Type p -- target type
TypeArray :: Id -> Type p -> Expr p -> Type p -- elt type, elt count
TypeStruct :: Id -> [(String, Type p)] -> Type p -- [(field name, field type)]
TypeOf :: Id -> Expr P0 -> Type P0
TypeDef :: Id -> Type P0 -> Type P0
data Expr p where
ExprInt :: Int -> Expr p -- literal int
ExprVar :: Var -> Expr p -- variable
ExprSizeof :: Type P0 -> Expr P0
ExprUnop :: Unop -> Expr p -> Expr p
ExprBinop :: Binop -> Expr p -> Expr p -> Expr p
ExprField :: Bool -> Expr P0 -> String -> Expr P0 -- Bool gives true=>s.field, false=>p->field
Here the things we've changed are:
The datatypes now use GADT syntax. This means that constructors are declared using their type signatures. data Foo = Bar Int Char becomes data Foo where Bar :: Int -> Char -> Foo (aside from syntax, the two are completely equivalent).
We have added a type variable to both Type and Expr. This is a so-called phantom type variable: there is no actual data stored that is of type p, it is used only to enforce invariants in the type system.
We've declared dummy types to represent the phases before and after the transformation: phase zero and phase one. (In a more elaborate system with multiple phases we could potentially use type-level numbers to denote them.)
GADTs let us store type-level invariants in the data structure. Here we have two of them. The first is that recursive positions must be of the same phase as the structure containing them. For example, looking at TypePointer :: Id -> Type p -> Type p, you pass a Type p to the TypePointer constructor and get a Type p as result, and those ps must be the same type. (If we wanted to allow different types, we could use p and q.)
The second is that we enforce the fact that some constructors can only be used in the first phase. Most of the constructors are polymorphic in the phantom type variable p, but some of them require that it be P0. This means that those constructors can only be used to construct values of type Type P0 or Expr P0, not any other phase.
GADTs work in two directions. The first is that if you have a function which returns a Type P1, and try to use one of the constructors which returns a Type P0 to construct it, you will get a type error. This is what's called "correct by construction": it's statically impossible to construct an invalid structure (provided you can encode all of the relevant invariants in the type system). The flip side of it is that if you have a value of Type P1, you can be sure that it was constructed correctly: the TypeOf and TypeDef constructors can't have been used (in fact, the compiler will complain if you try to pattern match on them), and any recursive positions must also be of phase P1. Essentially, when you construct a GADT you store evidence that the type constraints are satisfied, and when you pattern match on it you retrieve that evidence and can take advantage of it.
That was the easy part. Unfortunately, we have some differences between the two types beyond just which constructors are allowed: some of the constructor arguments are different between the phases, and some are only present in the post-transformation phase. We can again use GADTs to encode this, but it's not as low-cost and elegant. One solution would be to duplicate all of the constructors which are different, and have one each for P0 and P1. But duplication isn't nice. We can try to do it more fine-grained:
-- a couple of helper types
-- here I take advantage of the fact that of the things only present in one phase,
-- they're always present in P1 and not P0, and not vice versa
data MaybeP p a where
NothingP :: MaybeP P0 a
JustP :: a -> MaybeP P1 a
data EitherP p a b where
LeftP :: a -> EitherP P0 a b
RightP :: b -> EitherP P1 a b
data Type p where
TypeInt :: Id -> Type p
TypePointer :: Id -> Type p -> Type p
TypeArray :: Id -> Type p -> EitherP p (Expr p) Int -> Type p
TypeStruct :: Id -> [(String, MaybeP p Int, Type p)] -> Type p
TypeOf :: Id -> Expr P0 -> Type P0
TypeDef :: Id -> Type P0 -> Type P0
-- for brevity
type MaybeType p = MaybeP p (Type p)
data Expr p where
ExprInt :: MaybeType p -> Int -> Expr p
ExprVar :: MaybeType p -> Var -> Expr p
ExprSizeof :: Type P0 -> Expr P0
ExprUnop :: MaybeType p -> Unop -> Expr p -> Expr p
ExprBinop :: MaybeType p -> Binop -> Expr p -> Expr p -> Expr p
ExprField :: Bool -> Expr P0 -> String -> Expr P0
Here with some helper types we've enforced the fact that some constructor arguments can only be present in phase one (MaybeP) and some are different between the two phases (EitherP). While this makes us completely type-safe, it feels a bit ad-hoc and we still have to wrap things in and out of the MaybePs and EitherPs all the time. I don't know if there's a better solution in that respect. Complete type safety is something, though: we could write fromJustP :: MaybeP P1 a -> a and be sure that it is completely and utterly safe.
Update: An alternative is to use TypeFamilies:
data Proxy a = Proxy
class Phase p where
type MaybeP p a
type EitherP p a b
maybeP :: Proxy p -> MaybeP p a -> Maybe a
eitherP :: Proxy p -> EitherP p a b -> Either a b
phase :: Proxy p
phase = Proxy
instance Phase P0 where
type MaybeP P0 a = ()
type EitherP P0 a b = a
maybeP _ _ = Nothing
eitherP _ a = Left a
instance Phase P1 where
type MaybeP P1 a = a
type EitherP P1 a b = b
maybeP _ a = Just a
eitherP _ a = Right a
The only change to Expr and Type relative the previous version is that the constructors need to have an added Phase p constraint, e.g. ExprInt :: Phase p => MaybeType p -> Int -> Expr p.
Here if the type of p in a Type or Expr is known, you can statically know whether the MaybePs will be () or the given type and which of the types the EitherPs are, and can use them directly as that type without explicit unwrapping. When p is unknown you can use maybeP and eitherP from the Phase class to find out what they are. (The Proxy arguments are necessary, because otherwise the compiler wouldn't have any way to tell which phase you meant.) This is analogous to the GADT version where, if p is known, you can be sure of what MaybeP and EitherP contains, while otherwise you have to pattern match both possibilities. This solution isn't perfect either in the respect that the 'missing' arguments become () rather than disappearing entirely.
Constructing Exprs and Types also seems to be broadly similar between the two versions: if the value you're constructing has anything phase-specific about it then it must specify that phase in its type. The trouble seems to come when you want to write a function polymorphic in p but still handling phase-specific parts. With GADTs this is straightforward:
asdf :: MaybeP p a -> MaybeP p a
asdf NothingP = NothingP
asdf (JustP a) = JustP a
Note that if I had merely written asdf _ = NothingP the compiler would have complained, because the type of the output wouldn't be guaranteed to be the same as the input. By pattern matching we can figure out what type the input was and return a result of the same type.
With the TypeFamilies version, though, this is a lot harder. Just using maybeP and the resulting Maybe you can't prove anything to the compiler about types. You can get part of the way there by, instead of having maybeP and eitherP return Maybe and Either, making them deconstructor functions like maybe and either which also make a type equality available:
maybeP :: Proxy p -> (p ~ P0 => r) -> (p ~ P1 => a -> r) -> MaybeP p a -> r
eitherP :: Proxy p -> (p ~ P0 => a -> r) -> (p ~ P1 => b -> r) -> EitherP p a b -> r
(Note that we need Rank2Types for this, and note also that these are essentially the CPS-transformed versions of the GADT versions of MaybeP and EitherP.)
Then we can write:
asdf :: Phase p => MaybeP p a -> MaybeP p a
asdf a = maybeP phase () id a
But that's still not enough, because GHC says:
data.hs:116:29:
Could not deduce (MaybeP p a ~ MaybeP p0 a0)
from the context (Phase p)
bound by the type signature for
asdf :: Phase p => MaybeP p a -> MaybeP p a
at data.hs:116:1-29
NB: `MaybeP' is a type function, and may not be injective
In the fourth argument of `maybeP', namely `a'
In the expression: maybeP phase () id a
In an equation for `asdf': asdf a = maybeP phase () id a
Maybe you could solve this with a type signature somewhere, but at that point it seems like more bother than it's worth. So pending further information from someone else I'm going to recommend using the GADT version, being the simpler and more robust one, at the cost of a little syntactic noise.
Update again: The problem here was that because MaybeP p a is a type function and there is no other information to go by, GHC has no way to know what p and a should be. If I pass in a Proxy p and use that instead of phase that solves p, but a is still unknown.
There's no ideal solution to this problem, since each one has different pros and cons.
I would personally go with using a single data type "tree" and add separate data constructors for things that need to be differentiated. I.e.:
data Type
= ...
| TypeArray Id Type Expr
| TypeResolvedArray Id Type Int
| ...
This has the advantage that you can run the same phase multiple times on the same tree, as you say, but the reasoning goes deeper than that: Let's say that you are implementing a syntax element that generates more AST (something like include or a C++ template kind of deal, and it can depend on constant exprs like your TypeArray so you can't evaluate it in the first iteration). With the unified data type approach, you can just insert the new AST in your existing tree, and not only can you run the same phases as before on that tree directly, but you will get caching for free, i.e. if the new AST references an array by using sizeof(typeof(myarr)) or something, you don't have to determine the constant size of myarr again, because its type is already a TypeResolvedArray from your previous resolution phase.
You could use a different representation when you are past all of your compilation phases, and it is time to interpret the code (or something); then you are certain of the fact that no more AST changes will be necessary, and a more streamlined representation might be a good idea.
By the way, you should use Data.Word.Word instead of Data.Int.Int for array sizes. It is such a common error in C to use ints for indexing arrays, while C pointers actually are unsigned. Please don't make this mistake in your language, too, unless you really want to support arrays with negative sizes.

Haskell get type of algebraic parameter

I have a type
class IntegerAsType a where
value :: a -> Integer
data T5
instance IntegerAsType T5 where value _ = 5
newtype (IntegerAsType q) => Zq q = Zq Integer deriving (Eq)
newtype (Num a, IntegerAsType n) => PolyRing a n = PolyRing [a]
I'm trying to make a nice "show" for the PolyRing type. In particular, I want the "show" to print out the type 'a'. Is there a function that returns the type of an algebraic parameter (a 'show' for types)?
The other way I'm trying to do it is using pattern matching, but I'm running into problems with built-in types and the algebraic type.
I want a different result for each of Integer, Int and Zq q.
(toy example:)
test :: (Num a, IntegerAsType q) => a -> a
(Int x) = x+1
(Integer x) = x+2
(Zq x) = x+3
There are at least two different problems here.
1) Int and Integer are not data constructors for the 'Int' and 'Integer' types. Are there data constructors for these types/how do I pattern match with them?
2) Although not shown in my code, Zq IS an instance of Num. The problem I'm getting is:
Ambiguous constraint `IntegerAsType q'
At least one of the forall'd type variables mentioned by the constraint
must be reachable from the type after the '=>'
In the type signature for `test':
test :: (Num a, IntegerAsType q) => a -> a
I kind of see why it is complaining, but I don't know how to get around that.
Thanks
EDIT:
A better example of what I'm trying to do with the test function:
test :: (Num a) => a -> a
test (Integer x) = x+2
test (Int x) = x+1
test (Zq x) = x
Even if we ignore the fact that I can't construct Integers and Ints this way (still want to know how!) this 'test' doesn't compile because:
Could not deduce (a ~ Zq t0) from the context (Num a)
My next try at this function was with the type signature:
test :: (Num a, IntegerAsType q) => a -> a
which leads to the new error
Ambiguous constraint `IntegerAsType q'
At least one of the forall'd type variables mentioned by the constraint
must be reachable from the type after the '=>'
I hope that makes my question a little clearer....
I'm not sure what you're driving at with that test function, but you can do something like this if you like:
{-# LANGUAGE ScopedTypeVariables #-}
class NamedType a where
name :: a -> String
instance NamedType Int where
name _ = "Int"
instance NamedType Integer where
name _ = "Integer"
instance NamedType q => NamedType (Zq q) where
name _ = "Zq (" ++ name (undefined :: q) ++ ")"
I would not be doing my Stack Overflow duty if I did not follow up this answer with a warning: what you are asking for is very, very strange. You are probably doing something in a very unidiomatic way, and will be fighting the language the whole way. I strongly recommend that your next question be a much broader design question, so that we can help guide you to a more idiomatic solution.
Edit
There is another half to your question, namely, how to write a test function that "pattern matches" on the input to check whether it's an Int, an Integer, a Zq type, etc. You provide this suggestive code snippet:
test :: (Num a) => a -> a
test (Integer x) = x+2
test (Int x) = x+1
test (Zq x) = x
There are a couple of things to clear up here.
Haskell has three levels of objects: the value level, the type level, and the kind level. Some examples of things at the value level include "Hello, world!", 42, the function \a -> a, or fix (\xs -> 0:1:zipWith (+) xs (tail xs)). Some examples of things at the type level include Bool, Int, Maybe, Maybe Int, and Monad m => m (). Some examples of things at the kind level include * and (* -> *) -> *.
The levels are in order; value level objects are classified by type level objects, and type level objects are classified by kind level objects. We write the classification relationship using ::, so for example, 32 :: Int or "Hello, world!" :: [Char]. (The kind level isn't too interesting for this discussion, but * classifies types, and arrow kinds classify type constructors. For example, Int :: * and [Int] :: *, but [] :: * -> *.)
Now, one of the most basic properties of Haskell is that each level is completely isolated. You will never see a string like "Hello, world!" in a type; similarly, value-level objects don't pass around or operate on types. Moreover, there are separate namespaces for values and types. Take the example of Maybe:
data Maybe a = Nothing | Just a
This declaration creates a new name Maybe :: * -> * at the type level, and two new names Nothing :: Maybe a and Just :: a -> Maybe a at the value level. One common pattern is to use the same name for a type constructor and for its value constructor, if there's only one; for example, you might see
newtype Wrapped a = Wrapped a
which declares a new name Wrapped :: * -> * at the type level, and simultaneously declares a distinct name Wrapped :: a -> Wrapped a at the value level. Some particularly common (and confusing examples) include (), which is both a value-level object (of type ()) and a type-level object (of kind *), and [], which is both a value-level object (of type [a]) and a type-level object (of kind * -> *). Note that the fact that the value-level and type-level objects happen to be spelled the same in your source is just a coincidence! If you wanted to confuse your readers, you could perfectly well write
newtype Huey a = Louie a
newtype Louie a = Dewey a
newtype Dewey a = Huey a
where none of these three declarations are related to each other at all!
Now, we can finally tackle what goes wrong with test above: Integer and Int are not value constructors, so they can't be used in patterns. Remember -- the value level and type level are isolated, so you can't put type names in value definitions! By now, you might wish you had written test' instead:
test' :: Num a => a -> a
test' (x :: Integer) = x + 2
test' (x :: Int) = x + 1
test' (Zq x :: Zq a) = x
...but alas, it doesn't quite work like that. Value-level things aren't allowed to depend on type-level things. What you can do is to write separate functions at each of the Int, Integer, and Zq a types:
testInteger :: Integer -> Integer
testInteger x = x + 2
testInt :: Int -> Int
testInt x = x + 1
testZq :: Num a => Zq a -> Zq a
testZq (Zq x) = Zq x
Then we can call the appropriate one of these functions when we want to do a test. Since we're in a statically-typed language, exactly one of these functions is going to be applicable to any particular variable.
Now, it's a bit onerous to remember to call the right function, so Haskell offers a slight convenience: you can let the compiler choose one of these functions for you at compile time. This mechanism is the big idea behind classes. It looks like this:
class Testable a where test :: a -> a
instance Testable Integer where test = testInteger
instance Testable Int where test = testInt
instance Num a => Testable (Zq a) where test = testZq
Now, it looks like there's a single function called test which can handle any of Int, Integer, or numeric Zq's -- but in fact there are three functions, and the compiler is transparently choosing one for you. And that's an important insight. The type of test:
test :: Testable a => a -> a
...looks at first blush like it is a function that takes a value that could be any Testable type. But in fact, it's a function that can be specialized to any Testable type -- and then only takes values of that type! This difference explains yet another reason the original test function didn't work. You can't have multiple patterns with variables at different types, because the function only ever works on a single type at a time.
The ideas behind the classes NamedType and Testable above can be generalized a bit; if you do, you get the Typeable class suggested by hammar above.
I think now I've rambled more than enough, and likely confused more things than I've clarified, but leave me a comment saying which parts were unclear, and I'll do my best.
Is there a function that returns the type of an algebraic parameter (a 'show' for types)?
I think Data.Typeable may be what you're looking for.
Prelude> :m + Data.Typeable
Prelude Data.Typeable> typeOf (1 :: Int)
Int
Prelude Data.Typeable> typeOf (1 :: Integer)
Integer
Note that this will not work on any type, just those which have a Typeable instance.
Using the extension DeriveDataTypeable, you can have the compiler automatically derive these for your own types:
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Typeable
data Foo = Bar
deriving Typeable
*Main> typeOf Bar
Main.Foo
I didn't quite get what you're trying to do in the second half of your question, but hopefully this should be of some help.

Resources