Why does GHC 9.0 not have Zero in its linear types, but Idris 2 does?

From the Idris 2 publication about linear types "Idris 2: Quantitative Type Theory in Practice":
For Idris 2, we make a concrete choice of semiring, where a multiplicity can be one of:
0: the variable is not used at run time
1: the variable is used exactly once at run time
ω: no restrictions on the variable’s usage at run time
But for Haskell:
In the fashion of levity polymorphism, the proposal introduces a data type Multiplicity which is treated specially by the type checker, to represent the multiplicities:
data Multiplicity
  = One  -- represents 1
  | Many -- represents ω
Why didn't they add a Zero?

In Idris, the value of a function argument can appear in the return type. You might write a function with type
vecLen : (n : Nat) -> Vect n a -> (m : Nat ** n = m)
This says "vecLen is a function that takes a Nat (call it n) and a Vect of length n and element type a and returns a Nat (call it m) such that n = m". The intent is to walk the linked list Vect and come up with its length as a runtime Nat value. But the issue is that vecLen requires you to already have the length of the vector as a runtime value, because that's the very n argument we're talking about! I.e. it's possible to implement just
vecLen n _ = (n ** Refl) -- does no real work
You can't not take n as an argument, because the Vect type needs its length as an argument to make sense. This is a somewhat serious problem, because having both n and a Vect n a duplicates information—without intervention the "classic" definition of Vect n a itself is in fact space-quadratic in n! So we need a way to take an argument that we know "in principle exists" but may not "actually exist" at runtime. That's what the zero multiplicity is for.
vecLen : (0 n : Nat) -> Vect n a -> (m : Nat ** n = m)
-- old definition invalid; you MUST iterate over the Vect to collect information on n (which is what we want!)
vecLen _ [] = (0 ** Refl)
vecLen _ (_ :: xs) with (vecLen _ xs)
  vecLen _ (_ :: xs) | (n ** Refl) = (S n ** Refl)
This is the use of the zero multiplicity in Idris: it lets you talk about values that may not exist yet/anymore/ever inside of types, so you can reason about them. (In one of Edwin Brady's talks, I believe he used, or at least noted that you could use, zero multiplicities to reason about the previous state of a mutable resource.)
But Haskell does not quite have this kind of dependent typing... so there just isn't a use case for zero multiplicities. From another point of view, the zero multiplicity form of f :: a -> b is just f :: forall (x :: a). b, because Haskell is wholesale type-erased in a way that Idris and other dependently typed languages cannot easily be. (In this sense, IMO, Haskell has the whole dependent type erasure problem that other languages have such trouble with neatly solved, at the cost of having to use an awkward encoding for everything else about dependent types!)
In the latter sense, the "dependent" Haskell (i.e. souped-up with singletons) way to take a dependent, unrestricted argument is this:
--        vvvvvv    vvvvvvvv duplicate information!
vecLen :: SNat n -> Vect n a -> SNat n
vecLen n _ = n -- allowed, ergh!
and the "zero multiplicity" version is just
vecLen :: Vect n a -> SNat n
-- must walk Vect, cannot match on erased types like n
vecLen Nil = SZ
vecLen (_ `Cons` xs) = SS (vecLen xs)
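(For reference, here is a minimal, self-contained sketch under which the snippet above compiles; the definitions are my own spelling of the usual singletons-style conventions, not taken from any particular library.)
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Z | S Nat

-- Singleton naturals: exactly one value inhabits each SNat n.
data SNat (n :: Nat) where
  SZ :: SNat 'Z
  SS :: SNat n -> SNat ('S n)

-- Length-indexed vectors.
data Vect (n :: Nat) a where
  Nil  :: Vect 'Z a
  Cons :: a -> Vect n a -> Vect ('S n) a

-- The "zero multiplicity" version: the length is recomputed by walking the spine.
vecLen :: Vect n a -> SNat n
vecLen Nil         = SZ
vecLen (Cons _ xs) = SS (vecLen xs)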

Related

What does data Vector :: * -> Nat -> * where mean in Haskell?

I'm looking at https://wiki.haskell.org/GHC/Kinds and I found this:
data Nat = Zero | Succ Nat
data Vector :: * -> Nat -> * where
  VNil :: Vector a Zero
  VCons :: a -> Vector a n -> Vector a (Succ n)
I tried looking up what the where does together with data in Haskell, but I could not find info about it. I have an idea of what a kind is: a type constructor that takes no parameters has kind * and one that takes one parameter has kind * -> *; it's the 'type' of the type constructor. I think it's defining the kind of Vector as being * -> Nat -> *, which means something that takes any type and a natural number and returns a Vector?
And more importantly: what would someone use this stuff for?
The data ... where ... syntax is GADT syntax. (The wiki you were looking at also has a page about GADTs, linked from the page you were reading)
GADT syntax is enabled with the GADTs language extension. It gives you an alternative to the standard data declaration syntax, where instead of listing the data constructors as if they were applied to the types of their fields (thus implicitly defining the overall type of the constructor), you instead write an explicit type signature for each constructor.
For example, the standard Maybe type is defined like this in traditional syntax:
-- Maybe type constructor takes an argument a;
-- all data constructors implicitly return Maybe a
data Maybe a
  = Just a  -- the Just data constructor takes an argument of type a, not a itself
  | Nothing -- the Nothing data constructor takes no arguments
And like this in GADT syntax:
-- Maybe type constructor takes an argument a
data Maybe a where
  Just    :: a -> Maybe a -- Just takes an argument of type a to return a Maybe a
  Nothing :: Maybe a      -- Nothing is simply of type Maybe a
This syntax is more verbose. In complex cases it can arguably be clearer, since we explicitly write the type of the constructors in ordinary type expression syntax, rather than defining them in a weird special-purpose syntax where we pseudo-apply term-level data constructors to the types of their arguments.
But more importantly, GADT syntax[1] opens up the door to new features that simply cannot be expressed in the original syntax. That is what is happening in your example.
Primarily the new features come from control over the return type of each constructor. If we wanted to try to define Vector using the traditional data syntax, we would have to do something like this:
data Vector a n
  = VNil
  | VCons a (Vector a n)
The n parameter is supposed to represent the vector's number of elements at the type level. The idea is that we can do things like zip two vectors together with the compiler enforcing that the two vectors have the same length, rather than doing something like Data.List.zip, which simply gives up and silently discards any remaining elements from one list when the other runs out of elements.
But what we've written above can't do that. VNil always returns a value of type Vector a n, and it doesn't have any fields of type n (or a) so any VNil can be used with any n type parameter at all; n isn't going to say anything about the size of the vector! And similarly VCons has a field with a Vector a n in it, but is also going to end up constructing a value of type Vector a n, where the n parameter is the same, when we want it to indicate that the size is one larger than the tail-vector's size.
GADT syntax allows us to fix those problems:
{-# LANGUAGE GADTs #-}
data Vector a n where
  VNil :: Vector a Zero
  VCons :: a -> Vector a n -> Vector a (Succ n)
Now we can explicitly say that the VNil constructor is a vector of size Zero; it's not like Nothing :: Maybe a where it is always polymorphic in that type variable. (Our VNil is still polymorphic in the type variable a, of course, since having no elements means we shouldn't constrain what the element type will be). And VCons takes an a and a Vector a n to produce a Vector a (Succ n)[2]; specifically a vector that is "one item larger" than the vector inside it.
We missed a step though, which is actually defining Zero and Succ anywhere. The way to do that is simply:
data Nat = Zero | Succ Nat
This says: a natural number[2] is either Zero, or it is the Succ of some other Nat. We can pattern match a Nat repeatedly and we'll eventually[3] hit Zero; if we wanted to convert this to a "normal" number we'd do that and count the Succ constructors.
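In code, that conversion is the obvious recursion (a small helper of my own, not from the wiki page):
natToInteger :: Nat -> Integer
natToInteger Zero     = 0
natToInteger (Succ n) = 1 + natToInteger n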
But that has only given us Zero and Succ as data constructors. We were wanting to use them in the types of our VNil and VCons. For that we'll need another extension: DataKinds. It just allows us to use any data declaration both to define a (type-level) type constructor and its associated (term-level) data constructors and to define a (kind-level) "kind constructor" and its associated type constructors (which in turn do not contain any values at the term level; they're purely for type manipulation).
Note that the wiki page you were looking at appears to be describing a not-yet-implemented language extension called Kinds, which is clearly the same as what is now DataKinds. As such, that wiki page is extremely old, so you should probably just ignore it. If you were looking for reading material about kinds in general (rather than the DataKinds extension specifically), you probably need to continue your search.
But using DataKinds we can do this (actually compiles now):
{-# LANGUAGE GADTs, DataKinds #-}
data Nat = Zero | Succ Nat
data Vector a n where
  VNil :: Vector a Zero
  VCons :: a -> Vector a n -> Vector a (Succ n)
Here we defined Nat (at the type level) and Zero & Succ (at the term level) as normal. But then in the definition of Vector we use Zero and Succ at the type level. This is enough for GHC to infer that in Vector a n the n must be of kind Nat (because it is used with type constructors of that kind). But we can also use yet another extension, KindSignatures, and be more explicit about the kind of the type constructor Vector, like so:
{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}
data Nat = Zero | Succ Nat
data Vector :: * -> Nat -> * where -- this line is where the difference is
  VNil :: Vector a Zero
  VCons :: a -> Vector a n -> Vector a (Succ n)
Before we were just saying data Vector a n and trusting both the compiler and the reader to figure out from the usage below that a must be of kind * (it's used as the type of an actual value in the VCons constructor) and n is of kind Nat (because that position in the Vector type constructor is filled by type-level Zero and Succ n in the constructors). Now we can explicitly give the compiler and the reader more information up front with data Vector :: * -> Nat -> *; Vector takes a type parameter of kind *, another type parameter of kind Nat, and results in a type of kind * (which means it's a type that can actually have values at the term level).
It can be a little confusing with DataKinds whether a token like Zero refers to the term-level Zero (of type Nat, which is of kind *) or the type-level Zero (of kind Nat). Most of the time the compiler can perfectly well tell which is which, because it keeps very strong track of whether a given expression is a term expression or a type expression. But DataKinds gives you a way of being more explicit; if you prefix a constructor name with a single quote (like 'Zero) then it definitively means the data constructor promoted to the type level. Some people consider it good style to always use this explicit marking.[4]
So that is pretty much everything that is going on in your example.
As for what it's all for... at the most basic level it allows you to do things like keeping track of the length of lists[5], and then have the compiler enforce that multiple lists have the same size (or have sizes that have some specific other relationship, like being larger, or smaller, or twice as large, etc). People use that capability for a huge number of things that I can't possibly sum up in a single post (and not just with sizes of things; there are countless ways to use DataKinds and GADTs to reflect information at the type level so that the compiler can enforce things for you); it's a bit like asking "what are functions for". But here are a couple of example functions that do interesting things with the type-level length:
vzipWith :: (a -> b -> c) -> Vector a n -> Vector b n -> Vector c n
vzipWith _ VNil VNil = VNil
vzipWith f (a `VCons` as) (b `VCons` bs)
  = f a b `VCons` vzipWith f as bs
This is the zip that enforces that the vectors have the same length, as I mentioned earlier. It guarantees that you can't have a bug where one list was accidentally shorter and you simply ignored elements of the other; instead the compiler will complain. Here's vzipWith at work in GHCi:
λ vzipWith replicate (1 `VCons` (2 `VCons` VNil)) ('a' `VCons` ('b' `VCons` VNil))
VCons "a" (VCons "bb" VNil)
it :: Vector [Char] ('Succ ('Succ 'Zero))
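(As one more tiny example of mine of what the length index buys: a head function that the compiler simply refuses to apply to an empty vector, with no Maybe and no runtime error.)
vhead :: Vector a ('Succ n) -> a
vhead (x `VCons` _) = x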
With a bit more work we can define how to add type level Nats (using even more extensions, which I won't explain in detail here). Then we can append vectors while keeping track of the combined length:
type Plus :: Nat -> Nat -> Nat
type family Plus n m where
  Zero   `Plus` n = n
  Succ n `Plus` m = Succ (n `Plus` m)
vappend :: Vector a n -> Vector a m -> Vector a (n `Plus` m)
vappend VNil ys = ys
vappend (x `VCons` xs) ys = x `VCons` vappend xs ys
And at work:
λ vappend (1 `VCons` (2 `VCons` VNil)) (3 `VCons` (4 `VCons` (5 `VCons` VNil)))
VCons 1 (VCons 2 (VCons 3 (VCons 4 (VCons 5 VNil))))
it :: Num a => Vector a ('Succ ('Succ ('Succ ('Succ ('Succ 'Zero)))))
If you want to play around with it in GHCI, put all of this in a file and load it:
{-# LANGUAGE DataKinds, GADTs, KindSignatures, StandaloneDeriving, TypeFamilies, TypeOperators, StandaloneKindSignatures #-}

data Nat = Zero | Succ Nat

data Vector :: * -> Nat -> * where
  VNil :: Vector a 'Zero
  VCons :: a -> Vector a n -> Vector a ('Succ n)

deriving instance Show a => Show (Vector a n)

vzipWith :: (a -> b -> c) -> Vector a n -> Vector b n -> Vector c n
vzipWith _ VNil VNil = VNil
vzipWith f (a `VCons` as) (b `VCons` bs)
  = f a b `VCons` vzipWith f as bs

type Plus :: Nat -> Nat -> Nat
type family Plus n m where
  Zero   `Plus` n = n
  Succ n `Plus` m = Succ (n `Plus` m)

vappend :: Vector a n -> Vector a m -> Vector a (n `Plus` m)
vappend VNil ys = ys
vappend (x `VCons` xs) ys = x `VCons` vappend xs ys
Here I have also added yet another extension, StandaloneDeriving, to derive a Show instance for Vector, so you can play around in the interpreter and see what you get.
[1] There is in fact an extension GADTSyntax that enables just the new syntax, but doesn't allow you to define any types that you couldn't have defined in the old syntax. Hardly anyone uses this extension as far as I know; GADTs are well-regarded and anyone who bothers to learn the new syntax only does so in the context of learning about GADTs, so if they like the new syntax and want to use it everywhere they're probably just enabling GADTs everywhere.
2 "Succ" is short for "successor". The standard way of defining natural numbers from first principles is to assume that there exists the first natural number (zero), and that for any natural number has a successor which is a different natural number (and is not the successor of any other number). It's a fancy way of saying you can start at zero and count up from there as far as you want to go. But this inductive structure of natural numbers happens to map very nicely into Haskell's type logic, making it very easy to count things at type level using numbers defined this way, which is why it's being used here.
[3] Unless it's infinite, which means our attempt to "count the succs" will never terminate, which is another way to produce bottom/undefined in Haskell.
4 This "tick" syntax is necessary in some cases, for disambiguation. For example, with DataKinds we can have type level lists of types, like [Bool, Char, Maybe Integer], because we can promote lists to operate at type-level instead of term-level. [Bool, Char, Maybe Integer] is a type-level list of kind [*] (a list of things of kind *, i.e. a list of types). The problem arises when we consider things like [Bool]. This is definitely a type expression, but is it the list type constructor applied to Bool (meaning, the type of terms that are a list of boolean values), or is it a singleton list-of-types (i.e. the list data constructors : and [], promoted to type level), whose single value happens to be the type Bool? One is of kind *, and the other is of kind [*]. There's no way to tell the programmer's intent from looking at it, so the rules of DataKinds say we prioritise the pre-DataKinds interpretation; [Bool] is definitely the type of lists of boolean values. We can use the tick mark to explicitly choose the other interpretation: '[Bool] is a singleton list of types.
[5] Lists whose type reflects the length of the list are traditionally called vectors, to distinguish them from ordinary lists whose type says nothing about the length. Perhaps this is not ideal, since there are several other things that are also called "vectors" to distinguish them from ordinary lists. But it's established terminology.

How do I implement a partially injective type family in Haskell?

I am implementing a variety of functions that require type-safe natural numbers in Haskell, and have recently needed an exponentiation type family (Exp) to represent a new type.
Below are the three type families that I have made up to this point for ease of reference.
type family Add n m where
  Add 'One n = 'Succ n
  Add ('Succ n) m = 'Succ (Add n m)
-- Multiplication, allowing ever more powerful operators
type family Mul n m where
  Mul 'One m = m
  Mul ('Succ n) m = Add m (Mul n m)
-- Exponentiation, allowing even more powerful operators
type family Exp n m where
  Exp n 'One = n
  Exp n ('Succ m) = Mul n (Exp n m)
However, when using this type I came across the issue that it isn't injective; this meant that some of the type inferences I kinda wanted didn't exist. (The error was NB: ‘Exp’ is a non-injective type family.) I can ignore the issue by using -XAllowAmbiguousTypes, but I would prefer not to use this extension, so that all the types can be checked where the function is defined.
I think that Exp n m should be injective when m is constant, so I wanted to try to implement that, but I'm unsure as to how to do that after a lot of trial and error. Even if it doesn't solve my problem currently, it might be useful in future. Alternatively, Exp n m is injective for a given n where m changes, and n is not One.
Upon asking other people, they suggested that something like type family Exp n m = inj | inj, n -> m where but that doesn't work, giving a syntax error on the comma if it's there and a parse error on the final n if it isn't. This was intended to allow inj and n to uniquely identify a given m.
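For comparison, here is the shape of injectivity annotation that GHC does accept, via the TypeFamilyDependencies extension: only the result variable may appear on the left of the arrow, so there is no way to write "inj, n -> m". A minimal sketch (Wrap is a hypothetical family of mine, assuming the question's 'Succ is in scope):
{-# LANGUAGE DataKinds, TypeFamilies, TypeFamilyDependencies #-}
-- The result r alone must determine the argument n.
type family Wrap n = r | r -> n where
  Wrap n = 'Succ n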
The function I am trying to implement but am having trouble with currently has a signature as follows.
tensorPower :: forall i a n m . (Num a, KnownNat i) => Matrix n m a -> Matrix (Exp n i) (Exp m i) a
This function can be called with tensorPower @Three a (when -XAllowAmbiguousTypes is set), but I'd like GHC to be able to determine the i value by itself if possible. For the purposes of this question it's fine to assume that a given matrix isn't polymorphic.
Adjusting the constraint to the following doesn't work either; this was an attempt at creating injectivity in the type of the above function instead of where the type family is defined:
forall i a n m
  . ( Num a
    , KnownNat i
    , Exp n ('Succ i) ~ Mul n (Exp n i)
    , Exp m ('Succ i) ~ Mul m (Exp m i)
    , Exp n 'One ~ n
    , Exp m 'One ~ m
    )
So, is it possible to implement injectivity for this function, and if so, how do I implement it?
(To see more of the code in action, please visit the repository. The src folder has the majority of the source of the code, with the main areas being fiddled with in this question belonging to Lib.hs and Quantum.hs. The extensions used can (mostly) be found in the package.yaml)
There's actually a surprisingly simple way to get this to work in at least one way; the following type family, when used appropriately in a constraint, allows tensorPower to be used without an annotation.
-- Reverse the exponent - if the bases can't be matched up, it recurses forever
type family RLog n m x c where
  RLog m n n i = i
  RLog m n x i = RLog m n (Mul m x) ('Succ i)
type ReverseLog n m = RLog n m n 'One
type GetExp n i = ReverseLog n (Exp n i)
----------------
-- adjusted constraint for tensorPower
forall i a n m . (Num a, KnownNat i, i ~ GetExp n i, i ~ GetExp m i)
For example, now one can type (tensorPower hadamard) *.* (zero .*. zero .*. one) (where hadamard is Matrix Two Two Double, both zero and one are Matrix Two One Double, (*.*) is matrix multiplication, (.*.) is the tensor product and the type of i is completely inferred).
The way this type family works is that it has four parameters: the base, the target, the accumulator, and the current exponent. If the target and the accumulator are equal, then the current exponent is "returned". If they are not equal, we recurse by multiplying the current accumulator by the base, and increment the current exponent.
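As a sketch of the reduction, writing Two for 'Succ 'One and Four for the result of Mul Two Two (names used only for this trace):
-- GetExp Two ('Succ 'One)
--   = ReverseLog Two (Exp Two ('Succ 'One))    -- Exp reduces the target to Four
--   = RLog Two Four Two 'One                   -- accumulator Two, exponent 1
--   = RLog Two Four (Mul Two Two) ('Succ 'One) -- accumulator wasn't the target yet
--   = 'Succ 'One                               -- accumulator hit Four: exponent is 2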
There is one issue with this solution that I can see: if it can't match up the "bases", it has a really horribly long error message as it recurses as deep as it can into types. This can be fixed by doing some other type trickery which is out of scope for this question, but can be seen in this commit on my project's repository.
In conclusion: introducing some abstract injectivity seemed to be a no-go, but implementing a sort of reversal of the exponent resulted in clean, simple, and functioning code - this is effectively injectivity, by proving that there is a reversible function for the Exp.
(One note is that this solution needs a bit more fiddling to work fully, since GetExp n i doesn't really work for n == 'One; I've gotten around this by never having GetExp 'One i in the first place.)

Haskell singletons: What do we gain with SNat

I'm trying to grok Haskell singletons.
In the paper Dependently Typed Programming with Singletons
and in his blog post singletons v0.9 Released!
Richard Eisenberg defines the data type Nat, which defines the natural numbers via the Peano axioms:
data Nat = Zero | Succ Nat
By using the language extension DataKinds, this data type is promoted to the type level.
The data constructors Zero and Succ are promoted to the type constructors 'Zero and 'Succ.
With this we get, for every natural number, a single and unique corresponding type on the type level. E.g. for 3 we get 'Succ ('Succ ('Succ 'Zero)).
So we now have natural numbers as types.
He then defines on the value level the function plus and on the type level the type family Plus
to have the addition operation available.
With the promote function/quasiquoter of the singletons library we can automatically create the Plus type family from the plus function, so we can avoid writing the type family ourselves.
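Concretely, the pair looks roughly like this (following the paper's definitions; the Plus family is the sort of thing promote generates):
plus :: Nat -> Nat -> Nat
plus Zero     m = m
plus (Succ n) m = Succ (plus n m)

type family Plus (n :: Nat) (m :: Nat) :: Nat where
  Plus 'Zero     m = m
  Plus ('Succ n) m = 'Succ (Plus n m)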
So far so good!
With GADT syntax he also defines a data type SNat:
data SNat :: Nat -> * where
  SZero :: SNat Zero
  SSucc :: SNat n -> SNat (Succ n)
Basically he only wraps the Nat type in an SNat constructor.
Why is this necessary? What do we gain?
Are the data types Nat and SNat not isomorphic? Why is SNat a singleton, and why is Nat not
a singleton? In both cases every type is inhabited by one single value, the corresponding natural number.
What do we gain? Hmm. The status of singletons is that of awkward but currently necessary workaround, and the sooner we can do away with them, the better.
Let me see if I can clarify the picture. We have a data type Nat:
data Nat = Zero | Suc Nat
(wars have been started over even more trivial issues than the number of 'c's in Suc)
The type Nat has run-time values which are indistinguishable at the type level. The Haskell type system currently has the replacement property, which means that in any well typed program, you may replace any well typed subexpression by an alternative subexpression with the same scope and type, and the program will continue to be well typed. For example, you can rewrite every occurrence of
if <b> then <t> else <e>
to
if <b> then <e> else <t>
and you can be sure that nothing will go wrong...with the outcome of checking your program's type.
The replacement property is an embarrassment. It's clear proof that your type system gives up at the very moment that meaning starts to matter.
Now, by being a data type for run-time values, Nat also becomes a type of type-level values 'Zero and 'Suc. The latter live only in Haskell's type language and have no run-time presence at all. Please note that although 'Zero and 'Suc exist at the type level, it is unhelpful to refer to them as "types" and the people who currently do that should desist. They do not have kind * and can thus not classify values, which is what types worthy of the name do.
There is no direct means of exchange between run-time and type-level Nats, which can be a nuisance. The paradigmatic example concerns a key operation on vectors:
data Vec :: Nat -> * -> * where
  VNil :: Vec 'Zero x
  VCons :: x -> Vec n x -> Vec ('Suc n) x
We might like to compute a vector of copies of a given element (perhaps as part of an Applicative instance). It might look like a good idea to give the type
vec :: forall (n :: Nat) (x :: *). x -> Vec n x
but can that possibly work? In order to make n copies of something, we need to know n at run time: a program has to decide whether to deploy VNil and stop or to deploy VCons and keep going, and it needs some data to do that. A good clue is the forall quantifier, which is parametric: it indicates that the quantified information is available only to types and is erased by run time.
Haskell currently enforces an entirely spurious coincidence between dependent quantification (what forall does) and erasure for run time. It does not support a dependent but not erased quantifier, which we often call pi. The type and implementation of vec should be something like
vec :: pi (n :: Nat) -> forall (x :: *). x -> Vec n x
vec 'Zero x = VNil
vec ('Suc n) x = VCons x (vec n x)
where arguments in pi-positions are written in the type language, but the data are available at run time.
So what do we do instead? We use singletons to capture indirectly what it means to be a run-time copy of type-level data.
data SNat :: Nat -> * where
  SZero :: SNat Zero
  SSuc :: SNat n -> SNat (Suc n)
Now, SZero and SSuc make run-time data. SNat is not isomorphic to Nat: the former has kind Nat -> *, while the latter has kind *, so it is a kind error to try to make them isomorphic. There are many run-time values in Nat, and the type system does not distinguish them; there is exactly one run-time value (worth speaking of) in each different SNat n, so the fact that the type system cannot distinguish them is beside the point. The point is that each SNat n is a different type for each different n, and that GADT pattern matching (where a pattern can be of a more specific instance of the GADT type it is known to be matching) can refine our knowledge of n.
We may now write
vec :: forall (n :: Nat). SNat n -> forall (x :: *). x -> Vec n x
vec SZero x = VNil
vec (SSuc n) x = VCons x (vec n x)
Singletons allow us to bridge the gap between run time and type-level data, by exploiting the only form of run-time analysis that allows the refinement of type information. It's quite sensible to wonder if they're really necessary, and they presently are, only because that gap has not yet been eliminated.
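(To make the bridge concrete, here is a sketch of my own, building on the definitions above, of how a Nat that only exists at run time can be promoted to a singleton through an existential wrapper and then passed to vec:)
{-# LANGUAGE DataKinds, GADTs, RankNTypes #-}

-- An existential wrapper: "some singleton", with the index hidden.
data SomeNat where
  SomeNat :: SNat n -> SomeNat

-- Promote a run-time Nat to the corresponding singleton.
toSNat :: Nat -> SomeNat
toSNat Zero    = SomeNat SZero
toSNat (Suc n) = case toSNat n of SomeNat s -> SomeNat (SSuc s)

-- A length read at run time can now drive vec:
copies :: Nat -> x -> (forall n. Vec n x -> r) -> r
copies n x k = case toSNat n of SomeNat s -> k (vec s x)
The continuation-passing shape of copies is forced on us: the length n is not known statically, so the caller can only handle the resulting vector polymorphically in n.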

Pattern matching in Observational Type Theory

At the end of the "5. Full OTT" section of Towards Observational Type Theory the authors show how to define coercible-under-constructors indexed data types in OTT. The idea is basically to turn indexed data types into parameterized ones like this:
data IFin : ℕ -> Set where
  zero : ∀ {n} -> IFin (suc n)
  suc  : ∀ {n} -> IFin n -> IFin (suc n)

data PFin (m : ℕ) : Set where
  zero : ∀ {n} -> suc n ≡ m -> PFin m
  suc  : ∀ {n} -> suc n ≡ m -> PFin n -> PFin m
Conor also mentions this technique at the bottom of observational type theory (delivery):
The fix, of course, is to do what the GADT people did, and define
inductive families explicitly up to propositional equality. And then of
course you can transport them, by transitivity.
However, a type checker in Haskell is aware of equality constraints in scope and actually uses them during type checking. E.g. we can write
f :: a ~ b => a -> b
f x = x
It doesn't work like that in type theory, since it's not enough to have a proof of a ~ b in scope to be able to rewrite by this equation: that proof also must be refl, because in the presence of a false hypothesis type checking becomes undecidable due to termination issues (something like this). So when you pattern match on Fin m in Haskell, m gets rewritten to suc n in each branch, but that can't happen in type theory; instead you're left with an explicit proof of suc n ~ m. In OTT it's not possible to pattern match on proofs at all, hence you can neither pretend the proof is refl nor actually require that. It's only possible to supply the proof to coerce or just ignore it.
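To see the Haskell side of that contrast concretely, here is a small sketch of mine: matching on the GADT constructors silently refines m to 'S n, so no explicit proof appears anywhere in the program text.
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data N = Z | S N

data Fin :: N -> * where
  FZ :: Fin ('S n)
  FS :: Fin n -> Fin ('S n)

data Vec :: N -> * -> * where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- Each match refines the index m; the VNil cases are statically impossible.
vlookup :: Fin m -> Vec m a -> a
vlookup FZ     (VCons x _)  = x
vlookup (FS i) (VCons _ xs) = vlookup i xs
In OTT, by contrast, each of these matches would leave behind an explicit equation on the indices to be managed by hand.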
This makes it very hard to write anything that involves indexed data types. E.g. the usual three-lines (including the type signature) lookup for vectors becomes this beast:
vlookupₑ : ∀ {n m a} {α : Level a} {A : Univ α} -> ⟦ n ≅ m ⇒ fin n ⇒ vec A m ⇒ A ⟧
vlookupₑ p (fzeroₑ q) (vconsₑ r x xs) = x
vlookupₑ {n} {m} p (fsucₑ {n′} q i) (vconsₑ {m′} r x xs) =
  vlookupₑ (left (suc n′) {m} {suc m′} (trans (suc n′) {n} {m} q p) r) i xs
vlookupₑ {n} {m} p (fzeroₑ {n′} q) (vnilₑ r) =
  ⊥-elim $ left (suc n′) {m} {0} (trans (suc n′) {n} {m} q p) r
vlookupₑ {n} {m} p (fsucₑ {n′} q i) (vnilₑ r) =
  ⊥-elim $ left (suc n′) {m} {0} (trans (suc n′) {n} {m} q p) r

vlookup : ∀ {n a} {α : Level a} {A : Univ α} -> Fin n -> Vec A n -> ⟦ A ⟧
vlookup {n} = vlookupₑ (refl n)
It could be simplified a bit: if two elements of a data type that has decidable equality are observably equal, then they are also equal in the usual intensional sense, and natural numbers do have decidable equality, so we could coerce all the equations to their intensional counterparts and pattern match on them; but that would break some computational properties of vlookup and is verbose anyway. And it's nearly impossible to deal with more complicated cases, where the equality of the indices cannot be decided.
Is my reasoning correct? How is pattern matching in OTT meant to work? If this is indeed a problem, are there any ways to mitigate it?
I guess I'll field this one. I find it a strange question, but that's because of my own particular journey. The short answer is: don't do pattern matching in OTT, or in any kernel type theory. Which is not the same thing as to not do pattern matching ever.
The long answer is basically my PhD thesis.
In my PhD thesis, I show how to elaborate high-level programs written in a pattern matching style into a kernel type theory which has only the induction principles for inductive datatypes and a suitable treatment of propositional equality. The elaboration of pattern matching introduces propositional equations on datatype indices, then solves them by unification. Back then, I was using an intensional equality, but observational equality gives you at least the same power. That is: my technology for elaborating pattern matching (and thus keeping it out of the kernel theory), hiding all the equational piggery-jokery, predates the upgrade to observational equality. The ghastly vlookup you've used to illustrate your point might correspond to the output of the elaboration process, but the input need not be that bad. The nice definition
vlookup : Fin n -> Vec X n -> X
vlookup fz (vcons x xs) = x
vlookup (fs i) (vcons x xs) = vlookup i xs
elaborates just fine. The equation-solving that happens along the way is just the same equation-solving that Agda does at the meta-level when checking a definition by pattern matching, or that Haskell does. Don't be fooled by programs like
f :: a ~ b => a -> b
f x = x
In kernel Haskell, that elaborates to some sort of
f {q} x = coerce q x
but it's not in your face. And it's not in compiled code, either. OTT equality proofs, like Haskell equality proofs, can be erased before computing with closed terms.
Digression. To be clear about the status of equality data in Haskell, the GADT
data Eq :: k -> k -> * where
  Refl :: Eq x x
really gives you
Refl :: x ~ y -> Eq x y
but because the type system is not logically sound, type safety relies on strict pattern matching on that type: you can't erase Refl and you really must compute it and match it at run time, but you can erase the data corresponding to the proof of x~y. In OTT, the entire propositional fragment is proof-irrelevant for open terms and erasable for closed computation. End of digression.
The decidability of equality on this or that datatype is not especially relevant (at least, not if you have uniqueness of identity proofs; if you don't always have UIP, decidability is one way to get it sometimes). The equational problems which show up in pattern matching are on arbitrary open expressions. That's a lot of rope. But a machine can certainly decide the fragment which consists of first-order expressions built from variables and fully applied constructors (and that's what Agda does when you split cases: if the constraints are too weird, the thing just barfs). OTT should allow us to push a bit further into the decidable fragments of higher-order unification. If you know (forall x. f x = t[x]) for unknown f, that's equivalent to f = \ x -> t[x].
So, "no pattern matching in OTT" has always been a deliberate design choice, as we always intended it to be an elaboration target for a translation we already knew how to do. Rather, it's a strict upgrade in kernel theory power.

Pattern matching on length using this GADT:

I've defined the following GADT:
data Vector v where
  Zero :: Num a => Vector a
  Scalar :: Num a => a -> Vector a
  Vector :: Num a => [a] -> Vector [a]
  TVector :: Num a => [a] -> Vector [a]
If it's not obvious, I'm trying to implement a simple vector space. All vector spaces need vector addition, so I want to implement this by making Vector an instance of Num. In a vector space, it doesn't make sense to add vectors of different lengths, and this is something I would like to enforce. One way I thought to do it would be using guards:
instance Num (Vector v) where
  (Vector a) + (Vector b)
    | length a == length b = Vector $ zipWith (+) a b
    | otherwise            = error "Only add vectors with the same length."
There is nothing really wrong with this approach, but I feel like there has to be a way to do this with pattern matching. Perhaps one way to do it would be to define a new data type VectorLength, which would look something like this:
data Length l where
  AnyLength :: Nat a => Length a
  FixedLength :: Nat a -> Length a
Then, a length component could be added to the Vector data type, something like this:
data Vector (Length l) v where
  Zero :: Num a => Vector AnyLength a
  -- ...
  Vector :: Num a => [a] -> Vector (length [a]) [a]
I know this isn't correct syntax, but this is the general idea I'm playing with. Finally, you could define addition to be
instance Num (Vector v) where
  (Vector l a) + (Vector l b) = Vector $ zipWith (+) a b
Is such a thing possible, or is there any other way to use pattern matching for this purpose?
What you're looking for is something (in this instance confusingly) named a Vector as well. Generally, these are used in dependently typed languages where you'd write something like
data Vec (n :: Natural) a where
  Nil :: Vec 0 a
  Cons :: a -> Vec n a -> Vec (n + 1) a
But that's far from valid Haskell (or really any language). Some very recent extensions to GHC are beginning to enable this kind of expression but they're not there yet.
You might be interested in fixed-vector which does a best approximation of a fixed Vector available in relatively stable GHC. It uses a number of tricks between type families and continuations to create classes of fixed-size vectors.
Just to add to the example in the other answer - this nearly works already in GHC 7.6:
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}
import GHC.TypeLits

data Vector (n :: Nat) a where
  Nil :: Vector 0 a
  Cons :: a -> Vector n a -> Vector (n + 1) a
That code compiles fine, it just doesn't work quite the way you'd hope. Let's check it out in ghci:
*Main> :t Nil
Nil :: Vector 0 a
Good so far...
*Main> :t Cons "foo" Nil
Cons "foo" Nil :: Vector (0 + 1) [Char]
Well, that's a little odd... Why does it say (0 + 1) instead of 1?
*Main> :t Cons "foo" Nil :: Vector 1 String
<interactive>:1:1:
    Couldn't match type `0 + 1' with `1'
    Expected type: Vector 1 String
      Actual type: Vector (0 + 1) String
    In the return type of a call of `Cons'
    In the expression: Cons "foo" Nil :: Vector 1 String
Uh. Oops. That'd be why it says (0 + 1) instead of 1: it doesn't know that those are the same. This will be fixed (at least this case will) in GHC 7.8, which is due out... in a couple of months, I think?
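(For completeness, here is a sketch of my own of the addition the question was actually after, written against a unary length index like the one in the earlier questions; because both arguments share the index n, a length mismatch is a compile-time error and no guard is needed:)
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Zero | Succ Nat

data Vector :: Nat -> * -> * where
  VNil  :: Vector 'Zero a
  VCons :: a -> Vector n a -> Vector ('Succ n) a

-- Elementwise addition: both vectors are forced to share the length n.
vadd :: Num a => Vector n a -> Vector n a -> Vector n a
vadd VNil         VNil         = VNil
vadd (VCons x xs) (VCons y ys) = VCons (x + y) (vadd xs ys)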
