Matrix dimensionality checks at compile time - haskell

I was wondering whether is was possible to use the matrix type from Data.Matrix to construct another type with which it becomes possible to perform dimensionality checking at compile time.
E.g. I want to be able to write a function like:
mmult :: Matrix' r c -> Matrix' c r -> Matrix' r r
mmult = ...
However, I don't see how to do this since the arguments to a type constructor Matrix' would have to be types and not integer constants.

I don't see how to do this since the arguments to a type constructor Matrix' would have to be types and not integer constants
They do need to be type-level values, but not necessarily types. “Type-level” basically just means known at compile-time, but this also contains stuff that isn't really types. Types are in particular the type-level values of kind Type, but you can also have type-level strings or, indeed, natural numbers.
{-# LANGUAGE DataKinds, KindSignatures #-}
import GHC.TypeLits
import Data.Matrix
newtype Matrix' (n :: Nat) (m :: Nat) a
= StaMat {getStaticSizeMatrix :: Matrix a}
mmult :: Num a => Matrix' n m a -> Matrix' l n a -> Matrix' l m a
mmult (StaMat f) (StaMat g) = StaMat $ multStd f g
I would remark that matrices are only a special case of a much more general mathematical concept, that of linear maps between vector spaces. And since vector spaces can be seen as particular types, it actually makes a lot of sense to not use mere integers as the type-level tags, but the actual spaces. What you have then is a category, and it allows you to deal with both dynamic- and static size matrix/vector types, and can even be generalised to completely different spaces like infinite-dimensional Hilbert spaces.
{-# LANGUAGE GADTs #-}
newtype StaVect (n :: Nat) a
= StaVect {getStaticSizeVect :: Vector a}
data LinMap v w where
StaMat :: Matrix a -> LinMap (StaVec n a) (StaVec m a)
-- ...Add more constructors for mappings between other sorts of vector spaces...
linCompo :: LinMap v w -> LinMap u v -> LinMap u w
linCompo (StaMat f) (StaMat g) = StaMat $ multStd f g
The linearmap-category package pursues this direction.

Related

Type-level constraints in Haskell

I'm trying to encode term algebra as a data type in Haskell for further use in an algorithm. By term algebra I mean a set of terms which are either variables or functions applied to other terms. The functions with zero arguments are constants (but that's actually does not matter here).
Firstly, one would need the following GHC language extensions to replicate my code:
{-# LANGUAGE GADTs, DataKinds,
TypeApplications,
TypeFamilies,
TypeOperators,
StandaloneKindSignatures,
UndecidableInstances#-}
and the following imports:
import qualified GHC.TypeLits as GTL
import Data.Kind
The direct way to encode terms (the first one I took):
data Term where
Var :: String -> Term
Func :: String -> Integer -> [Term] -> Term
where by String I want to encode the name, by Integer the arity and by [Terms] the list of arguments of a function.
Then I want to be sure that the list of terms as arguments have the same length as an arity.
The first idea is to use smart constructors, but I would like to encode such constraints on a type level. So the second idea would be to create type-level naturals, lists of a specified length and the data type where these numbers coincide:
data Z = Z
data S a = S a
data List n a where
Nil :: List Z a
Cons :: a -> List m a -> List (S m) a
data WWTerm where
WWVar :: String -> WWTerm
WWFunc :: String -> m -> List m WWTerm -> WWTerm
My question here is the following: is there a way to impose type-level constraints using ordinary lists at the same time via 1) creating special type-families, 2) creating special type classes, or 3) via constraints in data types?
Regarding my attempts, I wrote the following:
type MyLength :: [Type] -> GTL.Nat
type family MyLength xs where
MyLength '[] = 0
MyLength (x':xs) = 1 GTL.+ (MyLength xs)
data QFunc n l where
QFunc :: (MyLength l ~ n) => String -> (n :: GTL.Nat) -> l -> QFunc n l
Unfortunately, this part of code doesn't compile for the following reasons:
Expected a type, but ‘n :: Nat’ has kind ‘Nat’
...
Expected a type, but ‘l’ has kind ‘[*]’
...
Are there any thoughts on how to approach my goal?

Heterogenous sized vectors where the types "work elsewhere"

Suppose I have a function that works on a vector with size known at compile-time (these are provided by the vector-sized package):
{-# LANGUAGE DataKinds, GADTs #-}
module Test where
import Data.Vector.Sized
-- Processes vectors known at compile time to have size 4.
processVector :: Vector 4 Int -> String
processVector = undefined
Fine, but what if I don't want to process vector of ints, but a vector of vectors?
-- Same thing but has subvectors of size 3.
processVector2 :: Vector 4 (Vector 3 Int) -> String
processVector2 = undefined
Fine, but there each sub-vector is of a fixed size. I want a function where the subvectors can each be of a different size but still known at compile time.
We can do this with existential quantifications:
data InnerVector = forall n. InnerVector (Vector n Int)
processVector3 :: Vector 4 InnerVector -> String
processVector3 = undefined
Fine, but what if I want to return not a String but a vector of the same dimensions?
processVector4 :: Vector 4 InnerVector -> Vector 4 InnerVector
processVector4 = undefined
This does not work because the second vector might have differently sized subvectors from the input subvectors! I want them known to be same at compile time. (So the subvectors at index 0 have same size, subvectors at index 1 have the same size, and so on.)
Is this possible to achieve? If not, do you know of (or can you create) a data structure that makes this possible?
I am avoiding tuples because:
My vectors will have size over 100.
Vectors make general processing easy (using 0-based indexes), so my processing functions continue to work even if I add more items to my vector.
I do indeed only want values of one type within the inner vectors (Int in the example).
By using existential quantification you effectively hide the sizes of the inner
vectors. But if you want to write code with types that convey that you are
preserving those sizes, you don't want them hidden. Instead you want your types
to be loud and clear about them.
So, let's define some types that broadcast these inner sizes. Essentially, you
need your "vector-of-vectors" type to be a type of heterogenous lists that
restricts the elements of these lists to be vectors. For sure, there are some
libraries out there that can help you put together such a type, but here we'll
roll our own. Just because it's more fun to do so.
Let's start with enabling some language extensions and then writing some types for the inner vectors and their sizes:
{-# LANGUAGE DataKinds, GADTs, InstanceSigs, KindSignatures, TypeOperators #-}
data Nat = Zero | Succ Nat
data Vector :: Nat -> * -> * where
VNil :: Vector Zero a
VCons :: a -> Vector n a -> Vector (Succ n) a
instance Functor (Vector n) where
fmap f VNil = VNil
fmap f (VCons x xs) = VCons (f x) (fmap f xs)
Next up is our type of "jagged" matrices (i.e., a vector of variable-size
vectors). As said, this is just a specific type of heterogeneous lists:
data JaggedMatrix :: [Nat] -> * -> * where
MNil :: JaggedMatrix '[] a
MCons :: Vector n a -> JaggedMatrix ns a -> JaggedMatrix (n : ns) a
There, that's it. The type of jagged matrices is indexed by a list that contains the sizes of the inner vectors. The outer dimension is not explicated in the type, but can simply be derived from the length of the inner-dimensions list.
Let's put it to work and write a dimensions-preservering
function. Here's an obvious one:
instance Functor (JaggedMatrix ns) where
fmap :: (a -> b) -> JaggedMatrix ns a -> JaggedMatrix ns b
fmap f MNil = MNil
fmap f (MCons xs xss) = MCons (fmap f xs) (fmap f xss)

Can a Haskell type constructor have non-type parameters?

A type constructor produces a type given a type. For example, the Maybe constructor
data Maybe a = Nothing | Just a
could be a given a concrete type, like Char, and give a concrete type, like Maybe Char. In terms of kinds, one has
GHCI> :k Maybe
Maybe :: * -> *
My question: Is it possible to define a type constructor that yields a concrete type given a Char, say? Put another way, is it possible to mix kinds and types in the type signature of a type constructor? Something like
GHCI> :k my_type
my_type :: Char -> * -> *
Can a Haskell type constructor have non-type parameters?
Let's unpack what you mean by type parameter. The word type has (at least) two potential meanings: do you mean type in the narrow sense of things of kind *, or in the broader sense of things at the type level? We can't (yet) use values in types, but modern GHC features a very rich kind language, allowing us to use a wide range of things other than concrete types as type parameters.
Higher-Kinded Types
Type constructors in Haskell have always admitted non-* parameters. For example, the encoding of the fixed point of a functor works in plain old Haskell 98:
newtype Fix f = Fix { unFix :: f (Fix f) }
ghci> :k Fix
Fix :: (* -> *) -> *
Fix is parameterised by a functor of kind * -> *, not a type of kind *.
Beyond * and ->
The DataKinds extension enriches GHC's kind system with user-declared kinds, so kinds may be built of pieces other than * and ->. It works by promoting all data declarations to the kind level. That is to say, a data declaration like
data Nat = Z | S Nat -- natural numbers
introduces a kind Nat and type constructors Z :: Nat and S :: Nat -> Nat, as well as the usual type and value constructors. This allows you to write datatypes parameterised by type-level data, such as the customary vector type, which is a linked list indexed by its length.
data Vec n a where
Nil :: Vec Z a
(:>) :: a -> Vec n a -> Vec (S n) a
ghci> :k Vec
Vec :: Nat -> * -> *
There's a related extension called ConstraintKinds, which frees constraints like Ord a from the yoke of the "fat arrow" =>, allowing them to roam across the landscape of the type system as nature intended. Kmett has used this power to build a category of constraints, with the newtype (:-) :: Constraint -> Constraint -> * denoting "entailment": a value of type c :- d is a proof that if c holds then d also holds. For example, we can prove that Ord a implies Eq [a] for all a:
ordToEqList :: Ord a :- Eq [a]
ordToEqList = Sub Dict
Life after forall
However, Haskell currently maintains a strict separation between the type level and the value level. Things at the type level are always erased before the program runs, (almost) always inferrable, invisible in expressions, and (dependently) quantified by forall. If your application requires something more flexible, such as dependent quantification over runtime data, then you have to manually simulate it using a singleton encoding.
For example, the specification of split says it chops a vector at a certain length according to its (runtime!) argument. The type of the output vector depends on the value of split's argument. We'd like to write this...
split :: (n :: Nat) -> Vec (n :+: m) a -> (Vec n a, Vec m a)
... where I'm using the type function (:+:) :: Nat -> Nat -> Nat, which stands for addition of type-level naturals, to ensure that the input vector is at least as long as n...
type family n :+: m where
Z :+: m = m
S n :+: m = S (n :+: m)
... but Haskell won't allow that declaration of split! There aren't any values of type Z or S n; only types of kind * contain values. We can't access n at runtime directly, but we can use a GADT which we can pattern-match on to learn what the type-level n is:
data Natty n where
Zy :: Natty Z
Sy :: Natty n -> Natty (S n)
ghci> :k Natty
Natty :: Nat -> *
Natty is called a singleton, because for a given (well-defined) n there is only one (well-defined) value of type Natty n. We can use Natty n as a run-time stand-in for n.
split :: Natty n -> Vec (n :+: m) a -> (Vec n a, Vec m a)
split Zy xs = (Nil, xs)
split (Sy n) (x :> xs) =
let (ys, zs) = split n xs
in (x :> ys, zs)
Anyway, the point is that values - runtime data - can't appear in types. It's pretty tedious to duplicate the definition of Nat in singleton form (and things get worse if you want the compiler to infer such values); dependently-typed languages like Agda, Idris, or a future Haskell escape the tyranny of strictly separating types from values and give us a range of expressive quantifiers. You're able to use an honest-to-goodness Nat as split's runtime argument and mention its value dependently in the return type.
#pigworker has written extensively about the unsuitability of Haskell's strict separation between types and values for modern dependently-typed programming. See, for example, the Hasochism paper, or his talk on the unexamined assumptions that have been drummed into us by four decades of Hindley-Milner-style programming.
Dependent Kinds
Finally, for what it's worth, with TypeInType modern GHC unifies types and kinds, allowing us to talk about kind variables using the same tools that we use to talk about type variables. In a previous post about session types I made use of TypeInType to define a kind for tagged type-level sequences of types:
infixr 5 :!, :?
data Session = Type :! Session -- Type is a synonym for *
| Type :? Session
| E
I'd recommend #Benjamin Hodgson's answer and the references he gives to see how to make this sort of thing useful. But, to answer your question more directly, using several extensions (DataKinds, KindSignatures, and GADTs), you can define types that are parameterized on (certain) concrete types.
For example, here's one parameterized on the concrete Bool datatype:
{-# LANGUAGE DataKinds, KindSignatures, GADTs #-}
{-# LANGUAGE FlexibleInstances #-}
module FlaggedType where
-- The single quotes below are optional. They serve to notify
-- GHC that we are using the type-level constructors lifted from
-- data constructors rather than types of the same name (and are
-- only necessary where there's some kind of ambiguity otherwise).
data Flagged :: Bool -> * -> * where
Truish :: a -> Flagged 'True a
Falsish :: a -> Flagged 'False a
-- separate instances, just as if they were different types
-- (which they are)
instance (Show a) => Show (Flagged 'False a) where
show (Falsish x) = show x
instance (Show a) => Show (Flagged 'True a) where
show (Truish x) = show x ++ "*"
-- these lists have types as indicated
x = [Truish 1, Truish 2, Truish 3] -- :: Flagged 'True Integer
y = [Falsish "a", Falsish "b", Falsish "c"] -- :: Flagged 'False String
-- this won't typecheck: it's just like [1,2,"abc"]
z = [Truish 1, Truish 2, Falsish 3] -- won't typecheck
Note that this isn't much different from defining two completely separate types:
data FlaggedTrue a = Truish a
data FlaggedFalse a = Falsish a
In fact, I'm hard pressed to think of any advantage Flagged has over defining two separate types, except if you have a bar bet with someone that you can write useful Haskell code without type classes. For example, you can write:
getInt :: Flagged a Int -> Int
getInt (Truish z) = z -- same polymorphic function...
getInt (Falsish z) = z -- ...defined on two separate types
Maybe someone else can think of some other advantages.
Anyway, I believe that parameterizing types with concrete values really only becomes useful when the concrete type is sufficient "rich" that you can use it to leverage the type checker, as in Benjamin's examples.
As #user2407038 noted, most interesting primitive types, like Ints, Chars, Strings and so on can't be used this way. Interestingly enough, though, you can use literal positive integers and strings as type parameters, but they are treated as Nats and Symbols (as defined in GHC.TypeLits) respectively.
So something like this is possible:
import GHC.TypeLits
data Tagged :: Symbol -> Nat -> * -> * where
One :: a -> Tagged "one" 1 a
Two :: a -> Tagged "two" 2 a
Three :: a -> Tagged "three" 3 a
Look at using Generalized Algebraic Data Types (GADTS), which enable you to define concrete outputs based on input type, e.g.
data CustomMaybe a where
MaybeChar :: Maybe a -> CustomMaybe Char
MaybeString :: Maybe a > CustomMaybe String
MaybeBool :: Maybe a -> CustomMaybe Bool
exampleFunction :: CustomMaybe a -> a
exampleFunction (MaybeChar maybe) = 'e'
exampleFunction (MaybeString maybe) = True //Compile error
main = do
print $ exampleFunction (MaybeChar $ Just 10)
To a similar effect, RankNTypes can allow the implementation of similar behaviour:
exampleFunctionOne :: a -> a
exampleFunctionOne el = el
type PolyType = forall a. a -> a
exampleFuntionTwo :: PolyType -> Int
exampleFunctionTwo func = func 20
exampleFunctionTwo func = func "Hello" --Compiler error, PolyType being forced to return 'Int'
main = do
print $ exampleFunctionTwo exampleFunctionOne
The PolyType definition allows you to insert the polymorphic function within exampleFunctionTwo and force its output to be 'Int'.
No. Haskell doesn't have dependent types (yet). See https://typesandkinds.wordpress.com/2016/07/24/dependent-types-in-haskell-progress-report/ for some discussion of when it may.
In the meantime, you can get behavior like this in Agda, Idris, and Cayenne.

Typeclass instances with constrained return type

I'm implementing a notion of inner product that's general over the container and numerical types. The definition states that the return type of this operation is a (non-negative) real number.
One option (shown below) is to write all instances by hand, for each numerical type (Float, Double, Complex Float, Complex Double, Complex CFloat, Complex CDouble, etc.). The primitive types aren't many, but I dislike the repetition.
Another option, or so I thought, is to have a parametric instance with a constraint such as RealFloat (which represents Float and Double).
{-# language MultiParamTypeClasses, TypeFamilies, FlexibleInstances #-}
module Test where
import Data.Complex
class Hilbert c e where
type HT e :: *
dot :: c e -> c e -> HT e
instance Hilbert [] Double where
type HT Double = Double
dot x y = sum $ zipWith (*) x y
instance Hilbert [] (Complex Double) where
type HT (Complex Double) = Double
a `dot` b = realPart $ sum $ zipWith (*) (conjugate <$> a) b
Question
Why does the instance below not work ("Couldn't match type e with Double.. expected type HT e, actual type e")?
instance RealFloat e => Hilbert [] e where
type HT e = Double
dot x y = sum $ zipWith (*) x y
Well, that particular instance doesn't work because the sum only yields an e, but you want the result to be Double. As e is constrained to RealFrac, this is easy to fix though, as any Real (questionable though is is mathematically) can be converted to a Fractional:
dot x y = realToFrac . sum $ zipWith (*) x y
However, that generic instance prevents you from also defining complex instances: with instance RealFloat e => Hilbert [] e where you cover all types, even if they aren't really real numbers. You could still instantiate Complex as an overlapping instance, but I'd rather stay away from those if I could help it.
It's also questionable if such vectorspace classes should be defined on * -> * at all. Yes, linear also does it this way, but IMO parametricity doesn't work in our favour in this application. Have you checked out the vector-space package? Mind, it isn't exactly complete for doing serious linear algebra; that's a gap I hope to fill with my linearmap-category package.

OCaml functors (parametrized modules) emulation in Haskell

Is there any recommended way to use typeclasses to emulate OCaml-like parametrized modules?
For an instance, I need the module that implements the complex
generic computation, that may be parmetrized with different
misc. types, functions, etc. To be more specific, let it be
kMeans implementation that could be parametrized with different
types of values, vector types (list, unboxed vector, vector, tuple, etc),
and distance calculation strategy.
For convenience, to avoid crazy amount of intermediate types, I want to
have this computation polymorphic by DataSet class, that contains all
required interfaces. I also tried to use TypeFamilies to avoid a lot
of typeclass parameters (that cause problems as well):
{-# Language MultiParamTypeClasses
, TypeFamilies
, FlexibleContexts
, FlexibleInstances
, EmptyDataDecls
, FunctionalDependencies
#-}
module Main where
import qualified Data.List as L
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U
import Distances
-- contains instances for Euclid distance
-- import Distances.Euclid as E
-- contains instances for Kulback-Leibler "distance"
-- import Distances.Kullback as K
class ( Num (Elem c)
, Ord (TLabel c)
, WithDistance (TVect c) (Elem c)
, WithDistance (TBoxType c) (Elem c)
)
=> DataSet c where
type Elem c :: *
type TLabel c :: *
type TVect c :: * -> *
data TDistType c :: *
data TObservation c :: *
data TBoxType c :: * -> *
observations :: c -> [TObservation c]
measurements :: TObservation c -> [Elem c]
label :: TObservation c -> TLabel c
distance :: TBoxType c (Elem c) -> TBoxType c (Elem c) -> Elem c
distance = distance_
instance DataSet () where
type Elem () = Float
type TLabel () = Int
data TObservation () = TObservationUnit [Float]
data TDistType ()
type TVect () = V.Vector
data TBoxType () v = VectorBox (V.Vector v)
observations () = replicate 10 (TObservationUnit [0,0,0,0])
measurements (TObservationUnit xs) = xs
label (TObservationUnit _) = 111
kMeans :: ( Floating (Elem c)
, DataSet c
) => c
-> [TObservation c]
kMeans s = undefined -- here the implementation
where
labels = map label (observations s)
www = L.map (V.fromList.measurements) (observations s)
zzz = L.zipWith distance_ www www
wtf1 = L.foldl wtf2 0 (observations s)
wtf2 acc xs = acc + L.sum (measurements xs)
qq = V.fromList [1,2,3 :: Float]
l = distance (VectorBox qq) (VectorBox qq)
instance Floating a => WithDistance (TBoxType ()) a where
distance_ xs ys = undefined
instance Floating a => WithDistance V.Vector a where
distance_ xs ys = sqrt $ V.sum (V.zipWith (\x y -> (x+y)**2) xs ys)
This code somehow compiles and work, but it's pretty ugly and hacky.
The kMeans should be parametrized by value type (number, float point number, anything),
box type (vector,list,unboxed vector, tuple may be) and distance calculation strategy.
There are also types for Observation (that's the type of sample provided by user,
there should be a lot of them, measurements that contained in each observation).
So the problems are:
1) If the function does not contains the parametric types in it's signature,
types will not be deduced
2) Still no idea, how to declare typeclass WithDistance to have different instances
for different distance type (Euclid, Kullback, anything else via phantom types).
Right now WithDistance just polymorphic by box type and value type, so if we need
different strategies, we may only put them in different modules and import the required
module. But this is a hack and non-typed approach, right?
All of this may be done pretty easy in OCaml with is't modules. What the proper approach
to implement such things in Haskell?
Typeclasses with TypeFamilies somehow look similar to parametric modules, but they
work different. I really need something like that.
It is really the case that Haskell lacks useful features found in *ML module systems.
There is ongoing effort to extend Haskell's module system: http://plv.mpi-sws.org/backpack/
But I think you can get a bit further without those ML modules.
Your design follows God class anti-pattern and that is why it is anti-modular.
Type class can be useful only if every type can have no more than a single instance of that class. E.g. DataSet () instance fixes type TVect () = V.Vector and you can't easily create similar instance but with TVect = U.Vector.
You need to start with implementing kMeans function, then generalize it by replacing concrete types with type variables and constraining those type variables with type classes when needed.
Here is little example. At first you have some non-general implementation:
kMeans :: Int -> [(Double,Double)] -> [[(Double,Double)]]
kMeans k points = ...
Then you generalize it by distance calculation strategy:
kMeans
:: Int
-> ((Double,Double) -> (Double,Double) -> Double)
-> [(Double,Double)]
-> [[(Double,Double)]]
kMeans k distance points = ...
Now you can generalize it by type of points, but this requires introducing a class that will capture some properties of points that are used by distance computation e.g. getting list of coordinates:
kMeans
:: Point p
=> Int -> (p -> p -> Coord p) -> [p]
-> [[p]]
kMeans k distance points = ...
class Num (Coord p) => Point p where
type Coord p
coords :: p -> [Coord p]
euclidianDistance
:: (Point p, Floating (Coord p))
=> p -> p -> Coord p
euclidianDistance a b
= sum $ map (**2) $ zipWith (-) (coords a) (coords b)
Now you may wish to make it a bit faster by replacing lists with vectors:
kMeans
:: (Point p, DataSet vec p)
=> Int -> (p -> p -> Coord p) -> vec p
-> [vec p]
kMeans k distance points = ...
class DataSet vec p where
map :: ...
foldl' :: ...
instance Unbox p => DataSet U.Vector p where
map = U.map
foldl' = U.foldl'
And so on.
Suggested approach is to generalize various parts of algorithm and constrain those parts with small loosely coupled type classes (when required).
It is a bad style to collect everything in a single monolithic type class.

Resources