Suppose I want to define the type Mod4 of integers modulo 4. After all, Int is Mod2^64. One obvious way I could go is
data Mod4 = ZeroMod4 | OneMod4 | TwoMod4 | ThreeMod4
However I could also do this
data Mod4 = Mod4 Integer
instance Eq Mod4 where
(Mod4 x) == (Mod4 y) = (x-y) `mod` 4 == 0
But then this function is problematic :
f :: Mod4 -> Mod4
f (Mod4 x) = if x < 20 then Mod4 0 else Mod4 1
f (Mod4 16) is different from f (Mod4 20), whereas those two arguments are ==. So I end up with two sorts of equality : representation in memory (
Mod4 16 is different from Mod4 20) and ==.
Since all functions can pattern match their arguments, they can always bypass any == operator. Why didn't Haskell just took the representation in memory as the definition of equality ? This way all types become trivially equatable.
Actually, equality is implied by the very concept of function : a graph that produces equal outputs when given equal inputs. So it makes little sense to speak of a function on a type that is not equatable.
Why didn't Haskell just took the representation in memory as the definition of equality ? This way all types become trivially equatable.
Nope. You can't compare values of type Integer -> Bool. Functions can not be compared, in general.
Back to the blackboard. How to design equality in a typed language?
One option is to let (==) :: a -> a -> Bool, and throw an exception if a is a function. See e.g. Ocaml.
Another option is to partition types in equatable/not equatable. This is eqtype in SML.
Another, but related, option is to express "eq-ability" as a constraint on the polymorphism. Eq in Haskell.
Now, Eq might have been more special. E.g. you can't define its instances by yourself, and you must use deriving Eq, similarly to how Typeable works now.
The Haskell designers instead to allow users to define their own comparison function. The users might know some "smarter" way. E.g. to compare a 10-field record, start by comparing the usually-different fields, and compare usually-equal ones later, trying to improve efficiency.
Note that, if we don't export the data type constructor, we can make equality to be an equivalence and still be useful. E.g. Data.Set.Set equates different (balanced) trees when they represent the same set, yet the exported interface never breaks the equivalence, so equality looks like equality from outside.
So it makes little sense to speak of a function on a type that is not equatable.
True, when "not equatable" is interpreted in a mathematical sense. However. when it is interpreted as "the equality predicate is not computable", it makes a lot of sense. We can speak of a function working on values whose type has undecidable equality.
Related
Why are these not equivalent?
show $ if someCondition then someInt else some double
and
if someCondition then show someInt else show someDouble
I understand that if you isolate the if ... else part in the first example to an expression by itself then you can't represent its type with an anonymous sum type, the kind of Int | Double, like something you could do easily in TypeScript (mentioning TypeScript because it is the langauge I used often and that supports Sum types), and would have to resort to using the Either data then based on it would call show.
The example I gave here is trivial but to me it makes more sense to think "Okay we are going to show something, and that something depends on someCondition" rather than "Okay if someCondition is true then show someInt otherwise show someDouble", and also allows for less code duplication (here the show is repeated twice but it could also be a long function application and instead of an if ... else there could be >2 branches to consider)
In my mind it should be easy for the compiler to check if each of the types that make the sum type (here Int | Double) could be used as a parameter to show function and decides if the types are correct or not. Even better is that show function always returns a string no matter the types of the parameters, so the compiler doesn't have to carry with it all the possible "branches" (so all the possible types).
Is it by choice that such a feature doesn't exist? Or is implementing it harder that I think?
All parts of an expression must be well-typed. The type of if someCondition then someInt else someDouble would have to be something like exists a. Show a => a, but Haskell doesn't support that kind of existential quantification.
Update: As chi points out in a comment, this would also be possible if Haskell had support for union/intersection types (which are not the same as sum/product types), but it unfortunately doesn't.
There are product types with lightweight syntax, written (,), in Haskell. One would thing that a sum type with a lightweight syntax, something like (Int | String), would be a great idea. The reality is more complicated. Let's see why (I'm taking some liberties with Num, they are not important).
if someCondition then 42 else "helloWorld"
If this should return a value of type like (Int | String), then what should the following return?
if someCondition then 42 else 0
(Int | Int) obviously, but if this is distinct from plain Int then we're in deep trouble. So (Int | Int) should be identical to plain Int.
One can immediately see that this is not just lightweight syntax for sum types, but a wholly new language feature. A different kind of type system if you will. Should we have one?
Let's look at this function.
mysteryType x a b = if x then a else b
Now what type does mysteryType have? Obviously
mysteryType :: Bool -> a -> b -> (a|b)
right? Now what if a and b are the same type?
let x = mysteryType True 42 0
This should be plain Int as we have agreed previously. Now mysteryType sometimes return an anonymous sum type, and sometimes it does not, depending on what arguments you pass. How would you pattern match such an expression? What on Earth can you do with it? Except trivial things like "show" (or whatever methods of other type-classes it would be an instance of), not a whole lot. Unless you add run-time type information to the language, that is, so typeof is available — and that make Haskell an entirely different language.
So yeah. Why isn't Haskell a TypeScript? Because we don't need another TypeScript. If you want TypeScript, you know where to find it.
I'm told that in dependent type system, "types" and "values" is mixed, and we can treat both of them as "terms" instead.
But there is something I can't understand: in a strongly typed programming language without Dependent Type (like Haskell), Types is decided (infered or checked) at compile time, but values is decided (computed or inputed) at runtime.
I think there must be a gap between these two stages. Just think that if a value is interactively read from STDIN, how can we reference this value in a type which must be decided AOT?
e.g. There is a natural number n and a list of natural number xs (which contains n elements) which I need to read from STDIN, how can I put them into a data structure Vect n Nat?
Suppose we input n :: Int at runtime from STDIN. We then read n strings, and store them into vn :: Vect n String (pretend for the moment this can be done).
Similarly, we can read m :: Int and vm :: Vect m String. Finally, we concatenate the two vectors: vn ++ vm (simplifying a bit here). This can be type checked, and will have type Vect (n+m) String.
Now it is true that the type checker runs at compile time, before the values n,m are known, and also before vn,vm are known. But this does not matter: we can still reason symbolically on the unknowns n,m and argue that vn ++ vm has that type, involving n+m, even if we do not yet know what n+m actually is.
It is not that different from doing math, where we manipulate symbolic expressions involving unknown variables according to some rules, even if we do not know the values of the variables. We don't need to know what number is n to see that n+n = 2*n.
Similarly, the type checker can type check
-- pseudocode
readNStrings :: (n :: Int) -> IO (Vect n String)
readNStrings O = return Vect.empty
readNStrings (S p) = do
s <- getLine
vp <- readNStrings p
return (Vect.cons s vp)
(Well, actually some more help from the programmer could be needed to typecheck this, since it involves dependent matching and recursion. But I'll neglect this.)
Importantly, the type checker can check that without knowing what n is.
Note that the same issue actually already arises with polymorphic functions.
fst :: forall a b. (a, b) -> a
fst (x, y) = x
test1 = fst # Int # Float (2, 3.5)
test2 = fst # String # Bool ("hi!", True)
...
One might wonder "how can the typechecker check fst without knowing what types a and b will be at runtime?". Again, by reasoning symbolically.
With type arguments this is arguably more obvious since we usually run the programs after type erasure, unlike value parameters like our n :: Int above, which can not be erased. Still, there is some similarity between universally quantifying over types or over Int.
It seems to me that there are two questions here:
Given that some values are unknown during compile-time (e.g., values read from STDIN), how can we make use of them in types? (Note that chi has already given an excellent answer to this.)
Some operations (e.g., getLine) seem to make absolutely no sense at compile-time; how could we possibly talk about them in types?
The answer to (1), as chi has said, is symbolic or abstract reasoning. You can read in a number n, and then have a procedure that builds a Vect n Nat by reading from the command line n times, making use of arithmetic properties such as the fact that 1+(n-1) = n for nonzero natural numbers.
The answer to (2) is a bit more subtle. Naively, you might want to say "this function returns a vector of length n, where n is read from the command line". There are two types you might try to give this (apologies if I'm getting Haskell notation wrong)
unsafePerformIO (do n <- getLine; return (IO (Vect (read n :: Int) Nat)))
or (in pseudo-Coq notation, since I'm not sure what Haskell's notation for existential types is)
IO (exists n, Vect n Nat)
These two types can actually both be made sense of, and say different things. The first type, to me, says "at compile time, read n from the command line, and return a function which, at runtime, gives a vector of length n by performing IO". The second type says "at runtime, perform IO to get a natural number n and a vector of length n".
The way I like looking at this is that all side effects (other than, perhaps, non-termination) are monad transformers, and there is only one monad: the "real-world" monad. Monad transformers work just as well at the type level as at the term level; the one thing which is special is run :: M a -> a which executes the monad (or stack of monad transformers) in the "real world". There are two points in time at which you can invoke run: one is at compile time, where you invoke any instance of run which shows up at the type level. Another is at runtime, where you invoke any instance of run which shows up at the value level. Note that run only makes sense if you specify an evaluation order; if your language does not specify whether it is call-by-value or call-by-name (or call-by-push-value or call-by-need or call-by-something-else), you can get incoherencies when you try to compute a type.
In order to prove that for instance the Category laws hold for some operations on a data type, how do one decide how to define equality? Considering the following type for representing boolean expressions:
data Exp
= ETrue
| EFalse
| EAnd Exp Exp
deriving (Eq)
Is it feasible trying to prove that Exp forms a Category with identity ETrue and operator:
(<&>) = EAnd
without redefining the Eq instance? Using the default instance of Eq the left-identity law breaks, i.e:
ETrue <&> e == e
evaluates to False. However, defining an eval function:
eval ETrue = True
eval EFalse = False
eval (EAnd e1 e2) = eval e1 && eval e2
and the Eq instance as:
instance Eq Exp where
e1 == e2 = eval e1 == eval e2
fixes the problem. Is comparison in terms of (==) a general requirement for claiming to satisfy such laws, or is it sufficient to say that the laws hold for a particular type of equality operator?
Equality is EVIL. You rarely (if ever) need structural equality,
because it is too strong. You only want an equivalence that is strong enough for
what you're doing. This is particularly true for category theory.
In Haskell, deriving Eq will give you structural equality, which means that you'll
often want to write your own implementation of == / /=.
A simple example: Define rational number as pairs of integers,
data Rat = Integer :/ Integer. If you use structural equality (what Haskell is
deriving), you'll have (1:/2) /= (2:/4), but as a fraction 1/2 == 2/4. What
you really care about is the value that your tuples denote, not their
representation. This means you'll need an equivalence that compares reduced
fractions, so you should implement that instead.
Side note: If someone using the code assumes that you've defined a structural
equality test, i.e. that checking with == justifies replacing data sub-components
through pattern matching, their code may break. If that is of importance,
you may hide the constructors to disallow pattern matching, or maybe define your
own class (say, Equiv with === and =/=) to separate both concepts. (This
is mostly important for theorem provers like Agda or Coq, in Haskell it's really
hard to get practical/real-world code so wrong that finally something breaks.)
Really Stupid(TM) example: Let's say that person wants to print long lists of huge
Rats and believes memoizing the string representations of the Integers will save
on binary-to-decimal conversion. There's a lookup table for Rats, such that equal
Rats will never be converted twice, and there's a lookup table for integers. If
(a:/b) == (c:/d), missing integer entries will be filled by copying between a-c /
b-d to skip conversion (ouch!). For the list [ 1:/1, 2:/2, 2:/4 ], 1 gets
converted and then, because 1:/1 == 2:/2, the string for 1 gets copied into the
2 lookup entry. The final result "1/1, 1/1, 1/4" is borked.
I was doing my usual "Read a chapter of LYAH before bed" routine, feeling like my brain was expanding with every code sample. At this point I was convinced that I understood the core awesomeness of Haskell, and now just had to understand the standard libraries and type classes so that I could start writing real software.
So I was reading the chapter about applicative functors when all of a sudden the book claimed that functions don't merely have types, they are types, and can be treated as such (For example, by making them instances of type classes). (->) is a type constructor like any other.
My mind was blown yet again, and I immediately jumped out of bed, booted up the computer, went to GHCi and discovered the following:
Prelude> :k (->)
(->) :: ?? -> ? -> *
What on earth does it mean?
If (->) is a type constructor, what are the value constructors? I can take a guess, but would have no idea how define it in traditional data (->) ... = ... | ... | ... format. It's easy enough to do this with any other type constructor: data Either a b = Left a | Right b. I suspect my inability to express it in this form is related to the extremly weird type signature.
What have I just stumbled upon? Higher kinded types have kind signatures like * -> * -> *. Come to think of it... (->) appears in kind signatures too! Does this mean that not only is it a type constructor, but also a kind constructor? Is this related to the question marks in the type signature?
I have read somewhere (wish I could find it again, Google fails me) about being able to extend type systems arbitrarily by going from Values, to Types of Values, to Kinds of Types, to Sorts of Kinds, to something else of Sorts, to something else of something elses, and so on forever. Is this reflected in the kind signature for (->)? Because I've also run into the notion of the Lambda cube and the calculus of constructions without taking the time to really investigate them, and if I remember correctly it is possible to define functions that take types and return types, take values and return values, take types and return values, and take values which return types.
If I had to take a guess at the type signature for a function which takes a value and returns a type, I would probably express it like this:
a -> ?
or possibly
a -> *
Although I see no fundamental immutable reason why the second example couldn't easily be interpreted as a function from a value of type a to a value of type *, where * is just a type synonym for string or something.
The first example better expresses a function whose type transcends a type signature in my mind: "a function which takes a value of type a and returns something which cannot be expressed as a type."
You touch so many interesting points in your question, so I am
afraid this is going to be a long answer :)
Kind of (->)
The kind of (->) is * -> * -> *, if we disregard the boxity GHC
inserts. But there is no circularity going on, the ->s in the
kind of (->) are kind arrows, not function arrows. Indeed, to
distinguish them kind arrows could be written as (=>), and then
the kind of (->) is * => * => *.
We can regard (->) as a type constructor, or maybe rather a type
operator. Similarly, (=>) could be seen as a kind operator, and
as you suggest in your question we need to go one 'level' up. We
return to this later in the section Beyond Kinds, but first:
How the situation looks in a dependently typed language
You ask how the type signature would look for a function that takes a
value and returns a type. This is impossible to do in Haskell:
functions cannot return types! You can simulate this behaviour using
type classes and type families, but let us for illustration change
language to the dependently typed language
Agda. This is a
language with similar syntax as Haskell where juggling types together
with values is second nature.
To have something to work with, we define a data type of natural
numbers, for convenience in unary representation as in
Peano Arithmetic.
Data types are written in
GADT style:
data Nat : Set where
Zero : Nat
Succ : Nat -> Nat
Set is equivalent to * in Haskell, the "type" of all (small) types,
such as Natural numbers. This tells us that the type of Nat is
Set, whereas in Haskell, Nat would not have a type, it would have
a kind, namely *. In Agda there are no kinds, but everything has
a type.
We can now write a function that takes a value and returns a type.
Below is a the function which takes a natural number n and a type,
and makes iterates the List constructor n applied to this
type. (In Agda, [a] is usually written List a)
listOfLists : Nat -> Set -> Set
listOfLists Zero a = a
listOfLists (Succ n) a = List (listOfLists n a)
Some examples:
listOfLists Zero Bool = Bool
listOfLists (Succ Zero) Bool = List Bool
listOfLists (Succ (Succ Zero)) Bool = List (List Bool)
We can now make a map function that operates on listsOfLists.
We need to take a natural number that is the number of iterations
of the list constructor. The base cases are when the number is
Zero, then listOfList is just the identity and we apply the function.
The other is the empty list, and the empty list is returned.
The step case is a bit move involving: we apply mapN to the head
of the list, but this has one layer less of nesting, and mapN
to the rest of the list.
mapN : {a b : Set} -> (a -> b) -> (n : Nat) ->
listOfLists n a -> listOfLists n b
mapN f Zero x = f x
mapN f (Succ n) [] = []
mapN f (Succ n) (x :: xs) = mapN f n x :: mapN f (Succ n) xs
In the type of mapN, the Nat argument is named n, so the rest of
the type can depend on it. So this is an example of a type that
depends on a value.
As a side note, there are also two other named variables here,
namely the first arguments, a and b, of type Set. Type
variables are implicitly universally quantified in Haskell, but
here we need to spell them out, and specify their type, namely
Set. The brackets are there to make them invisible in the
definition, as they are always inferable from the other arguments.
Set is abstract
You ask what the constructors of (->) are. One thing to point out
is that Set (as well as * in Haskell) is abstract: you cannot
pattern match on it. So this is illegal Agda:
cheating : Set -> Bool
cheating Nat = True
cheating _ = False
Again, you can simulate pattern matching on types constructors in
Haskell using type families, one canoical example is given on
Brent Yorgey's blog.
Can we define -> in the Agda? Since we can return types from
functions, we can define an own version of -> as follows:
_=>_ : Set -> Set -> Set
a => b = a -> b
(infix operators are written _=>_ rather than (=>)) This
definition has very little content, and is very similar to doing a
type synonym in Haskell:
type Fun a b = a -> b
Beyond kinds: Turtles all the way down
As promised above, everything in Agda has a type, but then
the type of _=>_ must have a type! This touches your point
about sorts, which is, so to speak, one layer above Set (the kinds).
In Agda this is called Set1:
FunType : Set1
FunType = Set -> Set -> Set
And in fact, there is a whole hierarchy of them! Set is the type of
"small" types: data types in haskell. But then we have Set1,
Set2, Set3, and so on. Set1 is the type of types which mentions
Set. This hierarchy is to avoid inconsistencies such as Girard's
paradox.
As noticed in your question, -> is used for types and kinds in
Haskell, and the same notation is used for function space at all
levels in Agda. This must be regarded as a built in type operator,
and the constructors are lambda abstraction (or function
definitions). This hierarchy of types is similar to the setting in
System F omega, and more
information can be found in the later chapters of
Pierce's Types and Programming Languages.
Pure type systems
In Agda, types can depend on values, and functions can return types,
as illustrated above, and we also had an hierarchy of
types. Systematic investigation of different systems of the lambda
calculi is investigated in more detail in Pure Type Systems. A good
reference is
Lambda Calculi with Types by Barendregt,
where PTS are introduced on page 96, and many examples on page 99 and onwards.
You can also read more about the lambda cube there.
Firstly, the ?? -> ? -> * kind is a GHC-specific extension. The ? and ?? are just there to deal with unboxed types, which behave differently from just * (which has to be boxed, as far as I know). So ?? can be any normal type or an unboxed type (e.g. Int#); ? can be either of those or an unboxed tuple. There is more information here: Haskell Weird Kinds: Kind of (->) is ?? -> ? -> *
I think a function can't return an unboxed type because functions are lazy. Since a lazy value is either a value or a thunk, it has to be boxed. Boxed just means it is a pointer rather than just a value: it's like Integer() vs int in Java.
Since you are probably not going to be using unboxed types in LYAH-level code, you can imagine that the kind of -> is just * -> * -> *.
Since the ? and ?? are basically just more general version of *, they do not have anything to do with sorts or anything like that.
However, since -> is just a type constructor, you can actually partially apply it; for example, (->) e is an instance of Functor and Monad. Figuring out how to write these instances is a good mind-stretching exercise.
As far as value constructors go, they would have to just be lambdas (\ x ->) or function declarations. Since functions are so fundamental to the language, they get their own syntax.
How can I do this in haskell?
equal(S,S) -> true;
equal(S1, S2) -> {differ, S1, S2}.
Haskell has a perfectly serviceable (==) operator for checking equality (on types for which equality is defined) so I'm assuming you're referring to something else here besides merely testing equality.
I don't know Erlang, but given that you wrote equal(S, S) my first guess would be that you want pattern matches to express equality by reusing the variable name. Unfortunately Haskell (and ML-style in general) pattern matching is less powerful than in languages like Prolog; all the pattern can do is bind variables, not perform full unification.
It's true that there are constant value patterns like foo [1,2] = ... but that's just syntactic sugar for a binding and equality check, and it's only done for constant values, not variables.
The usual Haskell approach would probably be pattern guards, like this:
data EqualResult a b = Yep | Nope (a, b) deriving (Show, Eq)
equal :: (Eq a) => a -> a -> EqualResult a a
equal s1 s2 | s1 == s2 = Yep
| otherwise = Nope (s1, s2)
On the off chance that you wanted some sort of reference equality instead of checking for equal values, that doesn't work because it doesn't even make sense in Haskell.
Edit: It has been pointed out to me that you may also have been asking about returning different result types. Working with types should be covered well in any introduction to Haskell, but the short version in this case is that if you need to return one of two possible types, you need a data type with one constructor for each; you then examine the result using pattern matching (in a declaration or case expression).
In this case, to make it look more like your function I've made a special-purpose type with two constructors: One indicating equality (with no further details) and one indicating inequality that holds a pair of values. You can also do it in a generic way using the built-in type Either a b, which has two constructors Left a and Right b.