DataKind Unions - haskell

I'm not sure if it is the right terminology, but is it possible to declare function types that take in an 'union' of datakinds?
For example, I know I can do the following:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
...
data Shape'
= Circle'
| Square'
| Triangle'
data Shape :: Shape' -> * where
Circle :: { radius :: Int} -> Shape Circle'
Square :: { side :: Int} -> Shape Square'
Triangle
:: { a :: Int
, b :: Int
, c :: Int}
-> Shape Triangle'
test1 :: Shape Circle' -> Int
test1 = undefined
However, what if I want to take in a shape that is either a circle or a square? What if I also want to take in all shapes for a separate function?
Is there a way for me to either define a set of Shape' kinds to use, or a way for me to allow multiple datakind definitions per data?
Edit:
The usage of unions doesn't seem to work:
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
...
type family Union (a :: [k]) (r :: k) :: Constraint where
Union (x ': xs) x = ()
Union (x ': xs) y = Union xs y
data Shape'
= Circle'
| Square'
| Triangle'
data Shape :: Shape' -> * where
Circle :: { radius :: Int} -> Shape Circle'
Square :: { side :: Int} -> Shape Square'
Triangle
:: { a :: Int
, b :: Int
, c :: Int}
-> Shape Triangle'
test1 :: Union [Circle', Triangle'] s => Shape s -> Int
test1 Circle {} = undefined
test1 Triangle {} = undefined
test1 Square {} = undefined
The part above compiles

You can accomplish something like this in (I think) a reasonably clean way using a type family together with ConstraintKinds and PolyKinds:
type family Union (a :: [k]) (r :: k) :: Constraint where
Union (x ': xs) x = ()
Union (x ': xs) y = Union xs y
test1 :: Union [Circle', Triangle'] s => Shape s -> Int
test1 = undefined
The () above is the empty constraint (it's like an empty "list" of type class constraints).
The first "equation" of the type family makes use of the nonlinear pattern matching available in type families (it uses x twice on the left hand side). The type family also makes use of the fact that if none of the cases match, it will not give you a valid constraint.
You should also be able to use a type-level Boolean instead of ConstraintKinds. That would be a bit more cumbersome and I think it would be best to avoid using a type-level Boolean here (if you can).
Side-note (I can never remember this and I had to look it up for this answer): You get Constraint in-scope by importing it from GHC.Exts.
Edit: Partially disallowing unreachable definitions
Here is a modification to get it to (partially) disallow unreachable definitions as well as invalid calls. It is slightly more roundabout, but it seems to work.
Modify Union to give a * instead of a constraint, like this:
type family Union (a :: [k]) (r :: k) :: * where
Union (x ': xs) x = ()
Union (x ': xs) y = Union xs y
It doesn't matter too much what the type is, as long as it has an inhabitant you can pattern match on, so I give back the () type (the unit type).
This is how you would use it:
test1 :: Shape s -> Union [Circle', Triangle'] s -> Int
test1 Circle {} () = undefined
test1 Triangle {} () = undefined
-- test1 Square {} () = undefined -- This line won't compile
If you forget to match on it (like, if you put a variable name like x instead of matching on the () constructor), it is possible that an unreachable case can be defined. It will still give a type error at the call-site when you actually try to reach that case, though (so, even if you don't match on the Union argument, the call test1 (Square undefined) () will not type check).
Note that it seems the Union argument must come after the Shape argument in order for this to work (fully as described, anyway).

This is getting kind of awful, but I guess you could require a proof that it's either a circle or a square using Data.Type.Equality:
test1 :: Either (s :~: Circle') (s :~: Square') -> Shape s -> Int
Now the user has to give an extra argument (a "proof term") saying which one it is.
In fact you can use the proof term idea to "complete" bradm's solution, with:
class MyOpClass sh where
myOp :: Shape sh -> Int
shapeConstraint :: Either (sh :~: Circle') (sh :~: Square')
Now nobody can go adding any more instances (unless they use undefined, which would be impolite).

You could use typeclasses:
class MyOpClass sh where
myOp :: Shape sh -> Int
instance MyOpClass Circle' where
myOp (Circle r) = _
instance MyOpClass Square' where
myOP (Square s) = _
This doesn't feel like a particularly 'complete' solution to me - anyone could go back and add another instance MyOpClass Triangle' - but I can't think of any other solution. Potentially you could avoid this problem simply by not exporting the typeclass however.

Another solution I've noticed, though pretty verbose, is to create a kind that has a list of feature booleans. You can then pattern match on the features when restricting the type:
-- [circleOrSquare] [triangleOrSquare]
data Shape' =
Shape'' Bool
Bool
data Shape :: Shape' -> * where
Circle :: { radius :: Int} -> Shape (Shape'' True False)
Square :: { side :: Int} -> Shape (Shape'' True True)
Triangle
:: { a :: Int
, b :: Int
, c :: Int}
-> Shape (Shape'' False True)
test1 :: Shape (Shape'' True x) -> Int
test1 Circle {} = 2
test1 Square {} = 2
test1 Triangle {} = 2
Here, Triangle will fail to match:
• Couldn't match type ‘'True’ with ‘'False’
Inaccessible code in
a pattern with constructor:
Triangle :: Int -> Int -> Int -> Shape ('Shape'' 'False 'True),
in an equation for ‘test1’
• In the pattern: Triangle {}
In an equation for ‘test1’: test1 Triangle {} = 2
|
52 | test1 Triangle {} = 2
| ^^^^^^^^^^^
Unfortunately, I don't think you can write this as a record, which may be clearer and avoids the ordering of the features.
This might be usable in conjunction with the class examples for readability.

Related

Parameterized Types in Haskell

Why do types in Haskell have to be explicitly parameterized in the type constructor parameter?
For example:
data Maybe a = Nothing | Just a
Here a has to be specified with the type. Why can't it be specified only in the constructor?
data Maybe = Nothing | Just a
Why did they make this choice from a design point of view? Is one better than the other?
I do understand that first is more strongly typed than the second, but there isn't even an option for the second one.
Edit :
Example function
data Maybe = Just a | Nothing
div :: (Int -> Int -> Maybe)
div a b
| b == 0 = Nothing
| otherwise = Just (a / b)
It would probably clear things up to use GADT notation, since the standard notation kind of mangles together the type- and value-level languages.
The standard Maybe type looks thus as a GADT:
{-# LANGUAGE GADTs #-}
data Maybe a where
Nothing :: Maybe a
Just :: a -> Maybe a
The “un-parameterised” version is also possible:
data EMaybe where
ENothing :: EMaybe
EJust :: a -> EMaybe
(as Joseph Sible commented, this is called an existential type). And now you can define
foo :: Maybe Int
foo = Just 37
foo' :: EMaybe
foo' = EJust 37
Great, so why don't we just use EMaybe always?
Well, the problem is when you want to use such a value. With Maybe it's fine, you have full control of the contained type:
bhrar :: Maybe Int -> String
bhrar Nothing = "No number 😞"
bhrar (Just i)
| i<0 = "Negative 😖"
| otherwise = replicate i '😌'
But what can you do with a value of type EMaybe? Not much, it turns out, because EJust contains a value of some unknown type. So whatever you try to use the value for, will be a type error, because the compiler has no way to confirm it's actually the right type.
bhrar :: EMaybe -> String
bhrar' (EJust i) = replicate i '😌'
=====> Error couldn't match expected type Int with a
If a variable is not reflected in the return type it is considered existential. This is possible to define data ExMaybe = ExNothing | forall a. ExJust a but the argument to ExJust is completely useless. ExJust True and ExJust () both have type ExMaybe and are indistinguisable from the type system's perspective.
Here is the GADT syntax for both the original Maybe and the existential ExMaybe
{-# Language GADTs #-}
{-# Language LambdaCase #-}
{-# Language PolyKinds #-}
{-# Language ScopedTypeVariables #-}
{-# Language StandaloneKindSignatures #-}
{-# Language TypeApplications #-}
import Data.Kind (Type)
import Prelude hiding (Maybe(..))
type Maybe :: Type -> Type
data Maybe a where
Nothing :: Maybe a
Just :: a -> Maybe a
type ExMaybe :: Type
data ExMaybe where
ExNothing :: ExMaybe
ExJust :: a -> ExMaybe
You're question is like asking why a function f x = .. needs to specify its argument, there is the option of making the type argument invisible but this is very odd but the argument is still there even if invisible.
-- >> :t JUST
-- JUST :: a -> MAYBE
-- >> :t JUST 'a'
-- JUST 'a' :: MAYBE
type MAYBE :: forall (a :: Type). Type
data MAYBE where
NOTHING :: MAYBE #a
JUST :: a -> MAYBE #a
mAYBE :: b -> (a -> b) -> MAYBE #a -> b
mAYBE nOTHING jUST = \case
NOTHING -> nOTHING
JUST a -> jUST a
Having explicit type parameters makes it much more expressive. You lose so much information without it. For example, how would you write the type of map? Or functors in general?
map :: (a -> b) -> [a] -> [b]
This version says almost nothing about what’s going on
map :: (a -> b) -> [] -> []
Or even worse, head:
head :: [] -> a
Now we suddenly have access to unsafe coerce and zero type safety at all.
unsafeCoerce :: a -> b
unsafeCoerce x = head [x]
But we don’t just lose safety, we also lose the ability to do some things. For example if we want to read something into a list or Maybe, we can no longer specify what kind of list we want.
read :: Read a => a
example :: [Int] -> String
main = do
xs <- getLine
putStringLine (example xs)
This program would be impossible to write without lists having an explicit type parameter. (Or rather, read would be unable to have different implementations for different list types, since content type is now opaque)
It is however, as was mentioned by others, still possible to define a similar type by using the ExistentialQuantification extension. But in those cases you are very limited in how you can use those data types, since you cannot know what they contain.

Is it possible to promote a value to type level?

Doing this just for fun but I don't end up figuring this out.
Say I have a typeclass that unifies coordinate system on squares and hexagons:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeFamilies #-}
import Data.Proxy
data Shape = Square | Hexagon
class CoordSystem (k :: Shape) where
type Coord k
-- all coordinates arranged in a 2D list
-- given a side length of the shape
allCoords :: forall p. p k -> Int -> [[Coord k]]
-- neighborhoods of a coordinate.
neighborsOf :: forall p. p k -> Int -> Coord k -> [Coord k]
-- omitting implementations
instance CoordSystem 'Square
instance CoordSystem 'Hexagon
Now suppose I want to use this interface with a s :: Shape that is only known at runtime. But to make use of this interface, at some point I'll need a function like this:
-- none of those two works:
-- promote :: CoordSystem k => Shape -> Proxy (k :: Shape) -- signature 1
-- promote :: Shape -> forall k. CoordSystem k => Proxy (k :: Shape)
promote s = case s of
Square -> Proxy #'Square
Hexagon -> Proxy #'Hexagon
However this does not work, if signature 1 is uncommented:
• Couldn't match type ‘k’ with ‘'Square’
‘k’ is a rigid type variable bound by
the type signature for:
promote :: forall (k :: Shape). CoordSystem k => Shape -> Proxy k
at SO.hs:28:1-55
Expected type: Proxy k
Actual type: Proxy 'Square
Understandably, none of 'Square, 'Hexagon, k :: Shape unifies with others, so I have no idea whether this is possible.
I also feel type erasure shouldn't be an issue here as alternatives of Shape can use to uniquely identify the instance - for such reason I feel singletons could be of use but I'm not familiar with that package to produce any working example either.
The usual way is to use either an existential type or its Church encoding. The encoded version is actually easier to understand at first, I think, and closer to what you already attempted. The problem with your forall k. CoordSystem k => {- ... thing mentioning k -} is that it promises to polymorph into whatever k the user likes (so long as the user likes CoordSystems!). To fix it, you can demand that the user polymorph into whatever k you like.
-- `a` must not mention `k`, since `k` is not
-- in scope in the final return type
promote :: forall a. Shape -> (forall k. CoordSystem k => Tagged k a) -> a
promote Square a = unTagged (a #Square)
promote Hexagon a = unTagged (a #Hexagon)
-- usage example
test = promote Hexagon (unproxy $ \p -> length (allCoords p 15))
Note that on the right hand side of the = sign, a has the type forall k. CoordSystem k => {- ... -} that says the user gets to choose k, but this time you're the user.
Another common option is to use an existential:
data SomeSystem where
-- Proxy to be able to name the wrapped type when matching on a SomeSystem;
-- in some future version of GHC we may be able to name it via pattern-matching
-- on a type application instead, which would be better
SomeSystem :: CoordSystem k => Proxy k -> SomeSystem
Then you would write something like
promote :: Shape -> SomeSystem
promote Square = SomeSystem (Proxy #Square)
promote Hexagon = SomeSystem (Proxy #Hexagon)
-- usage example
test = case promote Hexagon of SomeSystem p -> length (allCoords p 15)
and then the user would pattern match to extract the CoordSystem instance from it.
A final choice is singletons:
data ShapeS k where
SquareS :: ShapeS Square
HexagonS :: ShapeS Hexagon
Here we have made a direct connection between SquareS at the computation level and Square at the type level (resp. HexagonS and Hexagon). Then you can write:
-- N.B. not a rank-2 type, and in particular `a` is
-- now allowed to mention `k`
promote :: ShapeS k -> (CoordSystem k => a) -> a
promote SquareS a = a
promote HexagonS a = a
The singletons package offers tools for automatically deriving the singleton types that correspond to your ADTs.

What can type families do that multi param type classes and functional dependencies cannot

I have played around with TypeFamilies, FunctionalDependencies, and MultiParamTypeClasses. And it seems to me as though TypeFamilies doesn't add any concrete functionality over the other two. (But not vice versa). But I know type families are pretty well liked so I feel like I am missing something:
"open" relation between types, such as a conversion function, which does not seem possible with TypeFamilies. Done with MultiParamTypeClasses:
class Convert a b where
convert :: a -> b
instance Convert Foo Bar where
convert = foo2Bar
instance Convert Foo Baz where
convert = foo2Baz
instance Convert Bar Baz where
convert = bar2Baz
Surjective relation between types, such as a sort of type safe pseudo-duck typing mechanism, that would normally be done with a standard type family. Done with MultiParamTypeClasses and FunctionalDependencies:
class HasLength a b | a -> b where
getLength :: a -> b
instance HasLength [a] Int where
getLength = length
instance HasLength (Set a) Int where
getLength = S.size
instance HasLength Event DateDiff where
getLength = dateDiff (start event) (end event)
Bijective relation between types, such as for an unboxed container, which could be done through TypeFamilies with a data family, although then you have to declare a new data type for every contained type, such as with a newtype. Either that or with an injective type family, which I think is not available prior to GHC 8. Done with MultiParamTypeClasses and FunctionalDependencies:
class Unboxed a b | a -> b, b -> a where
toList :: a -> [b]
fromList :: [b] -> a
instance Unboxed FooVector Foo where
toList = fooVector2List
fromList = list2FooVector
instance Unboxed BarVector Bar where
toList = barVector2List
fromList = list2BarVector
And lastly a surjective relations between two types and a third type, such as python2 or java style division function, which can be done with TypeFamilies by also using MultiParamTypeClasses. Done with MultiParamTypeClasses and FunctionalDependencies:
class Divide a b c | a b -> c where
divide :: a -> b -> c
instance Divide Int Int Int where
divide = div
instance Divide Int Double Double where
divide = (/) . fromIntegral
instance Divide Double Int Double where
divide = (. fromIntegral) . (/)
instance Divide Double Double Double where
divide = (/)
One other thing I should also add is that it seems like FunctionalDependencies and MultiParamTypeClasses are also quite a bit more concise (for the examples above anyway) as you only have to write the type once, and you don't have to come up with a dummy type name which you then have to type for every instance like you do with TypeFamilies:
instance FooBar LongTypeName LongerTypeName where
FooBarResult LongTypeName LongerTypeName = LongestTypeName
fooBar = someFunction
vs:
instance FooBar LongTypeName LongerTypeName LongestTypeName where
fooBar = someFunction
So unless I am convinced otherwise it really seems like I should just not bother with TypeFamilies and use solely FunctionalDependencies and MultiParamTypeClasses. Because as far as I can tell it will make my code more concise, more consistent (one less extension to care about), and will also give me more flexibility such as with open type relationships or bijective relations (potentially the latter is solver by GHC 8).
Here's an example of where TypeFamilies really shines compared to MultiParamClasses with FunctionalDependencies. In fact, I challenge you to come up with an equivalent MultiParamClasses solution, even one that uses FlexibleInstances, OverlappingInstance, etc.
Consider the problem of type level substitution (I ran across a specific variant of this in Quipper in QData.hs). Essentially what you want to do is recursively substitute one type for another. For example, I want to be able to
substitute Int for Bool in Either [Int] String and get Either [Bool] String,
substitute [Int] for Bool in Either [Int] String and get Either Bool String,
substitute [Int] for [Bool] in Either [Int] String and get Either [Bool] String.
All in all, I want the usual notion of type level substitution. With a closed type family, I can do this for any types (albeit I need an extra line for each higher-kinded type constructor - I stopped at * -> * -> * -> * -> *).
{-# LANGUAGE TypeFamilies #-}
-- Subsitute type `x` for type `y` in type `a`
type family Substitute x y a where
Substitute x y x = y
Substitute x y (k a b c d) = k (Substitute x y a) (Substitute x y b) (Substitute x y c) (Substitute x y d)
Substitute x y (k a b c) = k (Substitute x y a) (Substitute x y b) (Substitute x y c)
Substitute x y (k a b) = k (Substitute x y a) (Substitute x y b)
Substitute x y (k a) = k (Substitute x y a)
Substitute x y a = a
And trying at ghci I get the desired output:
> :t undefined :: Substitute Int Bool (Either [Int] String)
undefined :: Either [Bool] [Char]
> :t undefined :: Substitute [Int] Bool (Either [Int] String)
undefined :: Either Bool [Char]
> :t undefined :: Substitute [Int] [Bool] (Either [Int] String)
undefined :: Either [Bool] [Char]
With that said, maybe you should be asking yourself why am I using MultiParamClasses and not TypeFamilies. Of the examples you gave above, all except Convert translate to type families (albeit you will need an extra line per instance for the type declaration).
Then again, for Convert, I am not convinced it is a good idea to define such a thing. The natural extension to Convert would be instances such as
instance (Convert a b, Convert b c) => Convert a c where
convert = convert . convert
instance Convert a a where
convert = id
which are as unresolvable for GHC as they are elegant to write...
To be clear, I am not saying there are no uses of MultiParamClasses, just that when possible you should be using TypeFamilies - they let you think about type-level functions instead of just relations.
This old HaskellWiki page does an OK job of comparing the two.
EDIT
Some more contrasting and history I stumbled upon from augustss blog
Type families grew out of the need to have type classes with
associated types. The latter is not strictly necessary since it can be
emulated with multi-parameter type classes, but it gives a much nicer
notation in many cases. The same is true for type families; they can
also be emulated by multi-parameter type classes. But MPTC gives a
very logic programming style of doing type computation; whereas type
families (which are just type functions that can pattern match on the
arguments) is like functional programming.
Using closed type families
adds some extra strength that cannot be achieved by type classes. To
get the same power from type classes we would need to add closed type
classes. Which would be quite useful; this is what instance chains
gives you.
Functional dependencies only affect the process of constraint solving, while type families introduced the notion of non-syntactic type equality, represented in GHC's intermediate form by coercions. This means type families interact better with GADTs. See this question for the canonical example of how functional dependencies fail here.

Pattern matching on a private data constructor

I'm writing a simple ADT for grid axis. In my application grid may be either regular (with constant step between coordinates), or irregular (otherwise). Of course, the regular grid is just a special case of irregular one, but it may worth to differentiate between them in some situations (for example, to perform some optimizations). So, I declare my ADT as the following:
data GridAxis = RegularAxis (Float, Float) Float -- (min, max) delta
| IrregularAxis [Float] -- [xs]
But I don't want user to create malformed axes with max < min or with unordered xs list. So, I add "smarter" construction functions which perform some basic checks:
regularAxis :: (Float, Float) -> Float -> GridAxis
regularAxis (a, b) dx = RegularAxis (min a b, max a b) (abs dx)
irregularAxis :: [Float] -> GridAxis
irregularAxis xs = IrregularAxis (sort xs)
I don't want user to create grids directly, so I don't add GridAxis data constructors into module export list:
module GridAxis (
GridAxis,
regularAxis,
irregularAxis,
) where
But it turned out that after having this done I cannot use pattern matching on GridAxis anymore. Trying to use it
import qualified GridAxis as GA
test :: GA.GridAxis -> Bool
test axis = case axis of
GA.RegularAxis -> True
GA.IrregularAxis -> False
gives the following compiler error:
src/Physics/ImplicitEMC.hs:7:15:
Not in scope: data constructor `GA.RegularAxis'
src/Physics/ImplicitEMC.hs:8:15:
Not in scope: data constructor `GA.IrregularAxis'
Is there something to work this around?
You can define constructor pattern synonyms. This lets you use the same name for smart construction and "dumb" pattern matching.
{-# LANGUAGE PatternSynonyms #-}
module GridAxis (GridAxis, pattern RegularAxis, pattern IrregularAxis) where
import Data.List
data GridAxis = RegularAxis_ (Float, Float) Float -- (min, max) delta
| IrregularAxis_ [Float] -- [xs]
-- The line with "<-" defines the matching behavior
-- The line with "=" defines the constructor behavior
pattern RegularAxis minmax delta <- RegularAxis_ minmax delta where
RegularAxis (a, b) dx = RegularAxis_ (min a b, max a b) (abs dx)
pattern IrregularAxis xs <- IrregularAxis_ xs where
IrregularAxis xs = IrregularAxis_ (sort xs)
Now you can do:
module Foo
import GridAxis
foo :: GridAxis -> a
foo (RegularAxis (a, b) d) = ...
foo (IrregularAxis xs) = ...
And also use RegularAxis and IrregularAxis as smart constructors.
This looks as a use case for pattern synonyms.
Basically you don't export the real constructor, but only a "smart" one
{-# LANGUAGE PatternSynonyms #-}
module M(T(), SmartCons, smartCons) where
data T = RealCons Int
-- the users will construct T using this
smartCons :: Int -> T
smartCons n = if even n then RealCons n else error "wrong!"
-- ... and destruct T using this
pattern SmartCons n <- RealCons n
Another module importing M can then use
case someTvalue of
SmartCons n -> use n
and e.g.
let value = smartCons 23 in ...
but can not use the RealCons directly.
If you prefer to stay in basic Haskell, without extensions, you can use a "view type"
module M(T(), smartCons, Tview(..), toView) where
data T = RealCons Int
-- the users will construct T using this
smartCons :: Int -> T
smartCons n = if even n then RealCons n else error "wrong!"
-- ... and destruct T using this
data Tview = Tview Int
toView :: T -> Tview
toView (RealCons n) = Tview n
Here, users have full access to the view type, which can be constructed/destructed freely, but have only a restricted start constructor for the actual type T. Destructing the actual type T is possible by moving to the view type
case toView someTvalue of
Tview n -> use n
For nested patterns, things become more cumbersome, unless you enable other extensions such as ViewPatterns.

OCaml functors (parametrized modules) emulation in Haskell

Is there any recommended way to use typeclasses to emulate OCaml-like parametrized modules?
For an instance, I need the module that implements the complex
generic computation, that may be parmetrized with different
misc. types, functions, etc. To be more specific, let it be
kMeans implementation that could be parametrized with different
types of values, vector types (list, unboxed vector, vector, tuple, etc),
and distance calculation strategy.
For convenience, to avoid crazy amount of intermediate types, I want to
have this computation polymorphic by DataSet class, that contains all
required interfaces. I also tried to use TypeFamilies to avoid a lot
of typeclass parameters (that cause problems as well):
{-# Language MultiParamTypeClasses
, TypeFamilies
, FlexibleContexts
, FlexibleInstances
, EmptyDataDecls
, FunctionalDependencies
#-}
module Main where
import qualified Data.List as L
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U
import Distances
-- contains instances for Euclid distance
-- import Distances.Euclid as E
-- contains instances for Kulback-Leibler "distance"
-- import Distances.Kullback as K
class ( Num (Elem c)
, Ord (TLabel c)
, WithDistance (TVect c) (Elem c)
, WithDistance (TBoxType c) (Elem c)
)
=> DataSet c where
type Elem c :: *
type TLabel c :: *
type TVect c :: * -> *
data TDistType c :: *
data TObservation c :: *
data TBoxType c :: * -> *
observations :: c -> [TObservation c]
measurements :: TObservation c -> [Elem c]
label :: TObservation c -> TLabel c
distance :: TBoxType c (Elem c) -> TBoxType c (Elem c) -> Elem c
distance = distance_
instance DataSet () where
type Elem () = Float
type TLabel () = Int
data TObservation () = TObservationUnit [Float]
data TDistType ()
type TVect () = V.Vector
data TBoxType () v = VectorBox (V.Vector v)
observations () = replicate 10 (TObservationUnit [0,0,0,0])
measurements (TObservationUnit xs) = xs
label (TObservationUnit _) = 111
kMeans :: ( Floating (Elem c)
, DataSet c
) => c
-> [TObservation c]
kMeans s = undefined -- here the implementation
where
labels = map label (observations s)
www = L.map (V.fromList.measurements) (observations s)
zzz = L.zipWith distance_ www www
wtf1 = L.foldl wtf2 0 (observations s)
wtf2 acc xs = acc + L.sum (measurements xs)
qq = V.fromList [1,2,3 :: Float]
l = distance (VectorBox qq) (VectorBox qq)
instance Floating a => WithDistance (TBoxType ()) a where
distance_ xs ys = undefined
instance Floating a => WithDistance V.Vector a where
distance_ xs ys = sqrt $ V.sum (V.zipWith (\x y -> (x+y)**2) xs ys)
This code somehow compiles and work, but it's pretty ugly and hacky.
The kMeans should be parametrized by value type (number, float point number, anything),
box type (vector,list,unboxed vector, tuple may be) and distance calculation strategy.
There are also types for Observation (that's the type of sample provided by user,
there should be a lot of them, measurements that contained in each observation).
So the problems are:
1) If the function does not contains the parametric types in it's signature,
types will not be deduced
2) Still no idea, how to declare typeclass WithDistance to have different instances
for different distance type (Euclid, Kullback, anything else via phantom types).
Right now WithDistance just polymorphic by box type and value type, so if we need
different strategies, we may only put them in different modules and import the required
module. But this is a hack and non-typed approach, right?
All of this may be done pretty easy in OCaml with is't modules. What the proper approach
to implement such things in Haskell?
Typeclasses with TypeFamilies somehow look similar to parametric modules, but they
work different. I really need something like that.
It is really the case that Haskell lacks useful features found in *ML module systems.
There is ongoing effort to extend Haskell's module system: http://plv.mpi-sws.org/backpack/
But I think you can get a bit further without those ML modules.
Your design follows God class anti-pattern and that is why it is anti-modular.
Type class can be useful only if every type can have no more than a single instance of that class. E.g. DataSet () instance fixes type TVect () = V.Vector and you can't easily create similar instance but with TVect = U.Vector.
You need to start with implementing kMeans function, then generalize it by replacing concrete types with type variables and constraining those type variables with type classes when needed.
Here is little example. At first you have some non-general implementation:
kMeans :: Int -> [(Double,Double)] -> [[(Double,Double)]]
kMeans k points = ...
Then you generalize it by distance calculation strategy:
kMeans
:: Int
-> ((Double,Double) -> (Double,Double) -> Double)
-> [(Double,Double)]
-> [[(Double,Double)]]
kMeans k distance points = ...
Now you can generalize it by type of points, but this requires introducing a class that will capture some properties of points that are used by distance computation e.g. getting list of coordinates:
kMeans
:: Point p
=> Int -> (p -> p -> Coord p) -> [p]
-> [[p]]
kMeans k distance points = ...
class Num (Coord p) => Point p where
type Coord p
coords :: p -> [Coord p]
euclidianDistance
:: (Point p, Floating (Coord p))
=> p -> p -> Coord p
euclidianDistance a b
= sum $ map (**2) $ zipWith (-) (coords a) (coords b)
Now you may wish to make it a bit faster by replacing lists with vectors:
kMeans
:: (Point p, DataSet vec p)
=> Int -> (p -> p -> Coord p) -> vec p
-> [vec p]
kMeans k distance points = ...
class DataSet vec p where
map :: ...
foldl' :: ...
instance Unbox p => DataSet U.Vector p where
map = U.map
foldl' = U.foldl'
And so on.
Suggested approach is to generalize various parts of algorithm and constrain those parts with small loosely coupled type classes (when required).
It is a bad style to collect everything in a single monolithic type class.

Resources