Subtypes for natural language types - haskell

I'm a linguist working on the formal syntax/semantics of natural languages. I started using Haskell quite recently, and very soon I realized that I needed to add subtyping. For example, given the types Human and Animal, I would like to have Human as a subtype of Animal. I found that this is possible using a coerce function where the instances are declared by the user, but I do not know how to define coerce in the instances I'm interested in. So basically I do not know what to add after 'coerce =' to make it work. Here is the code up to that point:
{-# OPTIONS
-XMultiParamTypeClasses
-XFlexibleInstances
-XFunctionalDependencies
-XRankNTypes
-XTypeSynonymInstances
-XTypeOperators
#-}
module Model where
import Data.List
data Animal = A|B deriving (Eq,Show,Bounded,Enum)
data Man = C|D|E|K deriving (Eq,Show,Bounded,Enum)
class Subtype a b where
  coerce :: a->b
instance Subtype Man Animal where
  coerce=
animal:: [Animal]
animal = [minBound..maxBound]
man:: [Man]
man = [minBound..maxBound]
Thanks in advance

Just ignore the Subtype class for a second and examine the type of the coerce function you are writing. If the a is a Man and the b is an Animal, then the type of the coerce function you are writing should be:
coerce :: Man -> Animal
This means that all you have to do is write a sensible function that converts each one of your Man constructors (i.e. C | D | E | K) to a corresponding Animal constructor (i.e. A | B). That's what it means to subtype, where you define some function that maps the "sub" type onto the original type.
Of course, you can imagine that because you have four constructors for your Man type and only two constructors for your Animal type then you will end up with more than one Man constructor mapping to the same Animal constructor. There's nothing wrong with that and it just means that the coerce function is not reversible. I can't comment more on that without knowing exactly what those constructors were meant to represent.
The more general answer to your question is that there is no way to automatically know which constructors in Man should map to which constructors in Animal. That's why you have to write the coerce function to tell it what the relationship between men and animals is.
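To make that concrete, here is a minimal sketch of such an instance. The particular assignment of Man constructors to Animal constructors is purely an assumption for illustration, since the question does not say what A, B, C, D, E and K stand for; menAsAnimals is a hypothetical helper name.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}

data Animal = A | B deriving (Eq, Show, Bounded, Enum)
data Man = C | D | E | K deriving (Eq, Show, Bounded, Enum)

class Subtype a b where
  coerce :: a -> b

-- Hypothetical mapping: the question does not say which Animal each Man
-- corresponds to, so this assignment is only an illustration.
instance Subtype Man Animal where
  coerce C = A
  coerce D = A
  coerce E = B
  coerce K = B

-- Every Man viewed as an Animal; several men share the same image,
-- which is why the coercion cannot be reversed.
menAsAnimals :: [Animal]
menAsAnimals = map coerce ([minBound .. maxBound] :: [Man])
```

Evaluating menAsAnimals in GHCi lists the image of each Man in constructor order.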
Note also that there is nothing special about the 'Subtype' class and 'coerce' function. You can just skip them and write a 'manToAnimal' function. After all, there is no built-in language or compiler support for subtyping, and Subtype is just another class that some random guy came up with (and frankly, subtyping is not really idiomatic Haskell, but you didn't really ask about that). All that defining the class instance does is allow you to overload the function coerce to work on the Man type.
I hope that helps.

At what level of abstraction are you working where you "need to add subtyping"?
(1) Are you trying to create a world model for your program encoded by Haskell types? (I can see this if your types are actually Animal, Dog, etc.)
(2) Are you trying to create more general software, and you think subtypes would be a good design?
(3) Or are you just learning Haskell and playing around with things?
If (1), I think that will not work out for you so well. Haskell does not have very good reflective abilities -- i.e. the ability to weave type logic into runtime logic. Your model would end up pretty deeply entangled with the implementation. I would suggest creating a "world model" (set of) types, as opposed to a set of types corresponding to a specific world model. I.e., answer this question for Haskell: what is a world model?
If (2), think again :-). Subtyping is part of a design tradition in which Haskell does not participate. There are other ways to design your program, and they will end up playing nicer with the functional mindset than subtyping would have. It takes time to develop your functional design sense, so be patient with it. Just remember: keep it simple, stupid. Use data types and functions over them (but remember to use higher-order functions to generalize and share code). If you are reaching for advanced features (even typeclasses are fairly advanced in the sense I mean), you are probably doing it wrong.
If (3), see Doug's answer, and play with stuff. There are lots of ways to fake it, and they all kind of suck eventually.

I don't know much about Natural Languages so my suggestion may be missing the point, but this may be what you are looking for.
{-# OPTIONS
-XMultiParamTypeClasses
-XFlexibleContexts
#-}
module Main where
data Animal = Mammal | Reptile deriving (Eq, Show)
data Dog = Terrier | Hound deriving (Eq, Show)
data Snake = Cobra | Rattle deriving (Eq, Show)
class Subtype a b where
  coerce :: a -> b
instance Subtype Animal Animal where
  coerce = id
instance Subtype Dog Animal where
  coerce _ = Mammal
instance Subtype Snake Animal where
  coerce _ = Reptile
isWarmBlooded :: (Subtype a Animal) => a -> Bool
isWarmBlooded = (Mammal == ) . coerce
main = do
  print $ isWarmBlooded Hound
  print $ isWarmBlooded Cobra
  print $ isWarmBlooded Mammal
Gives you:
True
False
True
Is that kind of what you are shooting for? Haskell doesn't have subtyping built-in, but this might do as a work-around. Admittedly, there are probably better ways to do this.
Note: This answer is not intended to point out the best, correct or idiomatic way to solve the problem at hand. It is intended to answer the question, which was "what to add after 'coerce=' to make it work."

You can't write the coerce function you're looking for — at least, not sensibly. There aren't any values in Animal that correspond with the values in Man, so you can't write a definition for coerce.
Haskell doesn't have subtyping as an explicit design decision, for various reasons (it allows type inference to work better, and allowing subtyping vastly complicates the language's type system). Instead, you should express relationships like this using aggregation:
data Animal = A | B | AnimalMan Man deriving (Eq, Show)
data Man = C | D | E | K deriving (Eq, Show, Bounded, Enum)
AnimalMan now has the type Man -> Animal, exactly as you wanted coerce to have.
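A small sketch of how the aggregation approach might be used. The names manToAnimal, asMan and allAnimals are hypothetical helpers, not part of the question. Note that Animal derives only Eq and Show here, because Enum and Bounded cannot be derived once a constructor carries a field.

```haskell
data Man = C | D | E | K deriving (Eq, Show, Bounded, Enum)

-- Man is embedded in Animal through the AnimalMan constructor.
data Animal = A | B | AnimalMan Man deriving (Eq, Show)

-- The constructor itself plays the role of coerce.
manToAnimal :: Man -> Animal
manToAnimal = AnimalMan

-- Hypothetical helper: recover the Man inside an Animal, if there is one.
asMan :: Animal -> Maybe Man
asMan (AnimalMan m) = Just m
asMan _             = Nothing

-- All animals: the two plain ones plus every embedded Man.
allAnimals :: [Animal]
allAnimals = [A, B] ++ map manToAnimal [minBound .. maxBound]
```

Unlike a many-to-one coerce, this embedding loses no information: asMan gets the original Man back.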

If I understood you correctly, it is quite possible. We will use type classes and generalized algebraic data types to implement this functionality.
If you want to be able to do something like this (where animals and humans can be fed but only humans can think):
animals :: [AnyAnimal]
animals = (replicate 5 . AnyAnimal $ SomeAnimal 10) ++ (replicate 5 . AnyAnimal $ SomeHuman 10 10)
humans :: [AnyHuman]
humans = replicate 5 . AnyHuman $ SomeHuman 10 10
animals' :: [AnyAnimal]
animals' = map coerce humans
animals'' :: [AnyAnimal]
animals'' = (map (\(AnyAnimal x) -> AnyAnimal $ feed 50 x) animals) ++
            (map (\(AnyAnimal x) -> AnyAnimal $ feed 50 x) animals') ++
            (map (\(AnyHuman x) -> AnyAnimal $ feed 50 x) humans)
humans' :: [AnyHuman]
humans' = (map (\(AnyHuman x) -> AnyHuman . think 100 $ feed 50 x) humans)
Then it's possible, for example:
{-# LANGUAGE GADTs #-}
{-# LANGUAGE MultiParamTypeClasses #-}
-- | The show is there only to make things easier
class (Show a) => IsAnimal a where
  feed :: Int -> a -> a
  -- other interface defining functions
class (IsAnimal a) => IsHuman a where
  think :: Int -> a -> a
  -- other interface defining functions
class Subtype a b where
  coerce :: a -> b
data AnyAnimal where
  AnyAnimal :: (IsAnimal a) => a -> AnyAnimal
instance Show AnyAnimal where
  show (AnyAnimal x) = "AnyAnimal " ++ show x
data AnyHuman where
  AnyHuman :: (IsHuman a) => a -> AnyHuman
instance Show AnyHuman where
  show (AnyHuman x) = "AnyHuman " ++ show x
data SomeAnimal = SomeAnimal Int deriving Show
instance IsAnimal SomeAnimal where
  feed = flip const
data SomeHuman = SomeHuman Int Int deriving Show
instance IsAnimal SomeHuman where
  feed = flip const
instance IsHuman SomeHuman where
  think = flip const
instance Subtype AnyHuman AnyAnimal where
  coerce (AnyHuman x) = AnyAnimal x
animals :: [AnyAnimal]
animals = (replicate 5 . AnyAnimal $ SomeAnimal 10) ++ (replicate 5 . AnyAnimal $ SomeHuman 10 10)
humans :: [AnyHuman]
humans = replicate 5 . AnyHuman $ SomeHuman 10 10
animals' :: [AnyAnimal]
animals' = map coerce humans
A few comments:
You can make AnyAnimal and AnyHuman instances of their respective classes for convenience (at the moment you have to unpack them first and pack them afterwards).
We could have a single GADT AnyAnimal like this (both approaches have their uses, I would guess):
data AnyAnimal where
  AnyAnimal :: (IsAnimal a) => a -> AnyAnimal
  AnyHuman :: (IsHuman a) => a -> AnyAnimal
instance Show AnyAnimal where
  show (AnyHuman x) = "AnyHuman " ++ show x
  show (AnyAnimal x) = "AnyAnimal " ++ show x
instance Subtype AnyAnimal AnyAnimal where
  coerce (AnyHuman x) = AnyAnimal x
  coerce (AnyAnimal x) = AnyAnimal x
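The first comment (making the wrapper an instance of its own class) could look roughly like the sketch below. It is trimmed down to the animal side only. Unlike the answer's feed = flip const, the SomeAnimal instance here treats its Int field as an energy level, which is an assumption made so that the effect is visible.

```haskell
{-# LANGUAGE GADTs #-}

class Show a => IsAnimal a where
  feed :: Int -> a -> a

-- Existential wrapper: hides the concrete animal type.
data AnyAnimal where
  AnyAnimal :: IsAnimal a => a -> AnyAnimal

instance Show AnyAnimal where
  show (AnyAnimal x) = "AnyAnimal " ++ show x

-- The wrapper is itself an animal: unpack, delegate, repack.
instance IsAnimal AnyAnimal where
  feed n (AnyAnimal x) = AnyAnimal (feed n x)

-- Hypothetical concrete animal whose Int field is treated as energy.
newtype SomeAnimal = SomeAnimal Int deriving Show

instance IsAnimal SomeAnimal where
  feed n (SomeAnimal e) = SomeAnimal (e + n)

-- No lambda that pattern matches on the wrapper is needed any more.
fedAnimals :: [AnyAnimal]
fedAnimals = map (feed 50) [AnyAnimal (SomeAnimal 10), AnyAnimal (SomeAnimal 20)]
```

With this instance in place, map (feed 50) works directly on a [AnyAnimal].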

It's rather advanced, but have a look at Edward Kmett's work on using the new Constraint kinds for this kind of functionality.

Related

Existential types in Haskell and generics in other languages

I was trying to grasp the concept of existential types in Haskell using the article Haskell/Existentially quantified types. At first glance, the concept seems clear and somewhat similar to generics in object-oriented languages. The main example there is something called a "heterogeneous list", defined as follows:
data ShowBox = forall s. Show s => SB s
heteroList :: [ShowBox]
heteroList = [SB (), SB 5, SB True]
instance Show ShowBox where
  show (SB s) = show s
f :: [ShowBox] -> IO ()
f xs = mapM_ print xs
main = f heteroList
I had a different notion of a "heterogeneous list", something like Shapeless in Scala. But here, it's just a list of items wrapped in an existential type that only adds a type constraint. The exact type of its elements is not manifested in its type signature, the only thing we know is that they all conform to the type constraint.
In object-oriented languages, it seems very natural to write something like this (example in Java). This is a ubiquitous use case, and I don't need to create a wrapper type to process a list of objects that all implement a certain interface. The animals list has a generic type List<Vocal>, so I can assume that its elements all conform to this Vocal interface:
interface Vocal {
    void voice();
}
class Cat implements Vocal {
    public void voice() {
        System.out.println("meow");
    }
}
class Dog implements Vocal {
    public void voice() {
        System.out.println("bark");
    }
}
var animals = Arrays.asList(new Cat(), new Dog());
animals.forEach(Vocal::voice);
I noticed that existential types are only available as a language extension, and they are not described in most of the "basic" Haskell books or tutorials, so my guess is that this is quite an advanced language feature.
My question is, why? Something that seems basic in languages with generics (constructing and using a list of objects whose types implement some interface and accessing them polymorphically), in Haskell requires a language extension, custom syntax and creating an additional wrapper type? Is there no way of achieving something like that without using existential types, or is there just no basic-level use cases for this?
Or maybe I'm just mixing up the concepts, and existential types and generics mean completely different things. Please help me make sense of it.
Yes, existential types and generics mean different things. An existential type can be used similarly to an interface in an object-oriented language. You can put one in a list of course, but a list or any other generic type is not needed to use an interface. It is enough to have a variable of type Vocal to demonstrate its usage.
It is not widely used in Haskell because it is not really needed most of the time.
nonHeteroList :: [IO ()]
nonHeteroList = [print (), print 5, print True]
does the same thing without any language extension.
An existential type (or an interface in an object-oriented language) is nothing but a piece of data with a bundled dictionary of methods. If you only have one method in your dictionary, just use a function. If you have more than one, you can use a tuple or a record of those. So if you have something like
interface Shape {
void Draw();
double Area();
}
you can express it in Haskell as, for example,
type Shape = (IO (), Double)
and say
circle center radius = (drawCircle center radius, pi * radius * radius)
rectangle topLeft bottomRight = (drawRectangle topLeft bottomRight,
  abs $ (topLeft.x-bottomRight.x) * (topLeft.y-bottomRight.y))
shapes = [circle (P 2.0 3.5) 4.2, rectangle (P 3.3 7.2) (P (-2.0) 3.1)]
though you can express exactly the same thing with type classes, instances and existentials
class Shape a where
  draw :: a -> IO ()
  area :: a -> Double
data ShapeBox = forall s. Shape s => SB s
instance Shape ShapeBox where
  draw (SB a) = draw a
  area (SB a) = area a
data Circle = Circle Point Double
instance Shape Circle where
  draw (Circle c r) = drawCircle c r
  area (Circle _ r) = pi * r * r
data Rectangle = Rectangle Point Point
instance Shape Rectangle where
  draw (Rectangle tl br) = drawRectangle tl br
  area (Rectangle tl br) = abs $ (tl.x - br.x) * (tl.y - br.y)
shapes = [SB (Circle (P 2.0 3.5) 4.2), SB (Rectangle (P 3.3 7.2) (P (-2.0) 3.1))]
and there you have it, N times longer.
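For reference, a self-contained sketch of the "bundle of methods" encoding, using a record instead of a tuple. The Point type with fields px and py, and the use of putStrLn as a stand-in for real drawing, are assumptions for the sake of a runnable example; totalArea is a hypothetical helper.

```haskell
-- A shape is just its "method table": an action that draws it and its area.
data Shape = Shape
  { draw :: IO ()
  , area :: Double
  }

-- Hypothetical point type; the answer's P is assumed to look like this.
data Point = P { px :: Double, py :: Double }

-- "Drawing" is stood in for by printing a description.
circle :: Point -> Double -> Shape
circle (P cx cy) r = Shape
  { draw = putStrLn ("circle at " ++ show (cx, cy) ++ " with radius " ++ show r)
  , area = pi * r * r
  }

rectangle :: Point -> Point -> Shape
rectangle (P x1 y1) (P x2 y2) = Shape
  { draw = putStrLn ("rectangle from " ++ show (x1, y1) ++ " to " ++ show (x2, y2))
  , area = abs ((x1 - x2) * (y1 - y2))
  }

-- An ordinary homogeneous list; no extension or wrapper type is involved.
shapes :: [Shape]
shapes = [circle (P 2.0 3.5) 4.2, rectangle (P 3.3 7.2) (P (-2.0) 3.1)]

totalArea :: Double
totalArea = sum (map area shapes)
```

Running mapM_ draw shapes performs each drawing action in turn, which is the record-based counterpart of calling an interface method on every element.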
is there just no basic-level use cases for this?
Sort of, yeah. While in Java you have no choice but to have open classes, Haskell has ADTs, which you'd normally use for these kinds of use cases. In your example, Haskell can represent it in one of two ways:
data Cat = Cat
data Dog = Dog
class Animal a where
  voice :: a -> String
instance Animal Cat where
  voice Cat = "meow"
instance Animal Dog where
  voice Dog = "woof"
or
data Animal = Cat | Dog
voice Cat = "meow"
voice Dog = "woof"
If you needed something extensible, you'd use the former, but if you need to be able to case on the type of animal, you'd use the latter. If you wanted the former, but wanted a list, you don't have to use existential types, you could instead capture what you wanted in a list, like:
voicesOfAnimals :: [() -> String]
voicesOfAnimals = [\_ -> voice Cat, \_ -> voice Dog]
Or even more simply
voicesOfAnimals :: [String]
voicesOfAnimals = [voice Cat, voice Dog]
This is kind of what you're doing with heterogeneous lists anyway: you have a constraint, in this case Animal a, on each element, which lets you call voice on each element but nothing else, since the constraint doesn't give you any more information about the value (well, if you had the constraint Typeable a you'd be able to do more, but let's not worry about dynamic types here).
As for the reason why Haskell doesn't support heterogeneous lists without extensions and wrappers, I'll let someone else explain it, but key topics are:
subtyping
variance
inference
https://gitlab.haskell.org/ghc/ghc/-/wikis/impredicative-polymorphism (I think)
In your Java example, what's the type of Arrays.asList(new Cat())? Well, it depends on what you declare it as. If you declare the variable with List<Cat>, it typechecks, you can declare it with List<Animal>, and you can declare it with List<Object>. If you declared it as a List<Cat>, you wouldn't be able to reassign it to List<Animal> as that would be unsound.
In Haskell, typeclasses can't be used as the type within a list (so [Cat] is valid in the first example and [Animal] is valid in the second example, but [Animal] isn't valid in the first example), and this seems to be due to impredicative polymorphism not being supported in Haskell (not 100% sure). Haskell lists are defined something like [a] = [] | a : [a]. [x, y, z] is just syntactic sugar for x : (y : (z : [])). So consider the example in Haskell. Let's say you type [Dog] in the REPL (this is equivalent to Dog : [], by the way). Haskell infers this to have the type [Dog]. But if you were to put Cat at the front, like [Cat, Dog] (Cat : Dog : []), it would match the second constructor (:), and would infer the type of Cat : ... to be [Cat], which Dog : [] would fail to match.
Since others have explained how you can avoid existential types in many cases, I figured I'd point out why you might want them. The simplest example I can think of is called Coyoneda:
data Coyoneda f a = forall x. Coyoneda (x -> a) (f x)
Coyoneda f a holds a container (or other functor) full of some type x and a function that can be mapped over it to produce an f a. Here's the Functor instance:
instance Functor (Coyoneda f) where
  fmap f (Coyoneda g x) = Coyoneda (f . g) x
Note that this does not have a Functor f constraint! What makes it useful? To explain that takes two more functions:
liftCoyoneda :: f a -> Coyoneda f a
liftCoyoneda = Coyoneda id
lowerCoyoneda :: Functor f => Coyoneda f a -> f a
lowerCoyoneda (Coyoneda f x) = fmap f x
The cool thing is that fmap applications get built up and performed all together:
lowerCoyoneda . fmap f . fmap g . fmap h . liftCoyoneda
is operationally
fmap (f . g . h)
rather than
fmap f . fmap g . fmap h
This can be useful if fmap is expensive in the underlying functor.
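Putting the pieces together as one module, with a small pipeline over lists to show the fusion. The function fused and the specific (+ 1), (* 2), show steps are illustrative choices, not from the answer.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data Coyoneda f a = forall x. Coyoneda (x -> a) (f x)

instance Functor (Coyoneda f) where
  fmap f (Coyoneda g x) = Coyoneda (f . g) x

liftCoyoneda :: f a -> Coyoneda f a
liftCoyoneda = Coyoneda id

lowerCoyoneda :: Functor f => Coyoneda f a -> f a
lowerCoyoneda (Coyoneda f x) = fmap f x

-- Three fmaps on the Coyoneda side only compose functions; the list is
-- traversed once, when lowerCoyoneda finally calls the list's own fmap.
fused :: [Int] -> [String]
fused = lowerCoyoneda . fmap show . fmap (* 2) . fmap (+ 1) . liftCoyoneda
```

The existential x is what lets the stored function change its input type freely while the underlying f x stays untouched.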

Subset algebraic data type, or type-level set, in Haskell

Suppose you have a large number of types and a large number of functions that each return "subsets" of these types.
Let's use a small example to make the situation more explicit. Here's a simple algebraic data type:
data T = A | B | C
and there are two functions f, g that return a T
f :: T
g :: T
For the situation at hand, assume it is important that f can only return an A or B and g can only return a B or C.
I would like to encode this in the type system. Here are a few reasons/circumstances why this might be desirable:
Let the functions f and g have a more informative signature than just ::T
Enforce that implementations of f and g do not accidentally return a forbidden type that users of the implementation then accidentally use
Allow code reuse, e.g. when helper functions are involved that only operate on subsets of type T
Avoid boilerplate code (see below)
Make refactoring (much!) easier
One way to do this is to split up the algebraic datatype and wrap the individual types as needed:
data A = A
data B = B
data C = C
data Retf = RetfA A | RetfB B
data Retg = RetgB B | RetgC C
f :: Retf
g :: Retg
This works, and is easy to understand, but carries a lot of boilerplate for frequent unwrapping of the return types Retf and Retg.
I don't see polymorphism being of any help, here.
So, probably, this is a case for dependent types. It's not really a type-level list, rather a type-level set, but I've never seen a type-level set.
The goal, in the end, is to encode the domain knowledge via the types, so that compile-time checks are available, without having excessive boilerplate. (The boilerplate gets really annoying when there are lots of types and lots of functions.)
Define an auxiliary sum type (to be used as a data kind) where each branch corresponds to a version of your main type:
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneKindSignatures #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
import Data.Kind
import Data.Void
import GHC.TypeLits
data Version = AllEnabled | SomeDisabled
Then define a type family that maps the version and the constructor name (given as a type-level Symbol) to the type () if that branch is allowed, and to the empty type Void if it's disallowed.
type Enabled :: Version -> Symbol -> Type
type family Enabled v ctor where
  Enabled SomeDisabled "C" = Void
  Enabled _ _ = ()
Then define your type as follows:
type T :: Version -> Type
data T v = A !(Enabled v "A")
         | B !(Enabled v "B")
         | C !(Enabled v "C")
(The strictness annotations are there to help the exhaustivity checker.)
Typeclass instances can be derived, but separately for each version:
deriving instance Show (T AllEnabled)
deriving instance Eq (T AllEnabled)
deriving instance Show (T SomeDisabled)
deriving instance Eq (T SomeDisabled)
Here's an example of use:
noC :: T SomeDisabled
noC = A ()
main :: IO ()
main = print $ case noC of
  A _ -> "A"
  B _ -> "B"
  -- this doesn't give a warning with -Wincomplete-patterns
This solution makes pattern-matching and construction more cumbersome, because those () are always there.
A variation is to have one type family per branch (as in Trees that Grow) instead of a two-parameter type family.
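A rough sketch of that per-branch variation, under the assumption that each constructor gets its own closed type family. The names EnabledA, EnabledB, EnabledC and describe are made up for this example. Here the disabled branch is handled explicitly with absurd rather than omitted.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE StandaloneKindSignatures #-}

import Data.Kind (Type)
import Data.Void (Void, absurd)

data Version = AllEnabled | SomeDisabled

-- One closed type family per constructor: () enables the branch,
-- Void disables it.
type EnabledA :: Version -> Type
type family EnabledA v where
  EnabledA _ = ()

type EnabledB :: Version -> Type
type family EnabledB v where
  EnabledB _ = ()

type EnabledC :: Version -> Type
type family EnabledC v where
  EnabledC 'SomeDisabled = Void
  EnabledC _             = ()

type T :: Version -> Type
data T v
  = A !(EnabledA v)
  | B !(EnabledB v)
  | C !(EnabledC v)

-- The C field has type Void in this version, so the branch is discharged
-- with absurd and can never actually run.
describe :: T 'SomeDisabled -> String
describe (A _) = "A"
describe (B _) = "B"
describe (C v) = absurd v
```

Each family can then be extended independently when a new version is added, at the cost of one more declaration per constructor.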
I tried to achieve something like this in the past, but without much success -- I was not too satisfied with my solution.
Still, one can use GADTs to encode this constraint:
data TagA = IsA | NotA
data TagC = IsC | NotC
data T (ta :: TagA) (tc :: TagC) where
  A :: T 'IsA 'NotC
  B :: T 'NotA 'NotC
  C :: T 'NotA 'IsC
-- existential wrappers
data TnotC where TnotC :: T ta 'NotC -> TnotC
data TnotA where TnotA :: T 'NotA tc -> TnotA
f :: TnotC
g :: TnotA
This, however, gets boring fast because of the wrapping/unwrapping of the existentials. Consumer functions are more convenient, since we can write
giveMeNotAnA :: T 'NotA tc -> Int
to require anything but an A. Producer functions instead need to use existentials.
In a type with many constructors, it also gets inconvenient since we have to use a GADT with many tags/parameters. Maybe this can be streamlined with some clever typeclass machinery.
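A compilable sketch of the producer/consumer asymmetry described above. The Bool argument of f and the function nameOfNotC are hypothetical, added only so there is something to run.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}

data TagA = IsA | NotA
data TagC = IsC | NotC

data T (ta :: TagA) (tc :: TagC) where
  A :: T 'IsA 'NotC
  B :: T 'NotA 'NotC
  C :: T 'NotA 'IsC

-- Existential wrapper for "some T that is not a C".
data TnotC where
  TnotC :: T ta 'NotC -> TnotC

-- Producer: has to wrap, because the two results have different types.
-- The Bool argument is a hypothetical stand-in for real logic.
f :: Bool -> TnotC
f useA = if useA then TnotC A else TnotC B

-- Consumer: accepts anything but an A, so only two cases are possible.
giveMeNotAnA :: T 'NotA tc -> Int
giveMeNotAnA B = 1
giveMeNotAnA C = 2

-- Unwrapping the existential; a C cannot occur here.
nameOfNotC :: TnotC -> String
nameOfNotC (TnotC A) = "A"
nameOfNotC (TnotC B) = "B"
```

Writing TnotC C anywhere is rejected by the type checker, which is the compile-time guarantee the question asks for.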
Giving each individual value its own type scales extremely badly, and is quite unnecessarily fine-grained.
What you probably want is just to restrict the types by some property on their values. In Coq, for example, that would be a subset type:
Inductive T: Type :=
| A
| B
| C.
Definition Retf: Type := { x: T | x<>C }.
Definition Retg: Type := { x: T | x<>A }.
Well, Haskell has no way of expressing such value constraints, but that doesn't stop you from creating types that conceptually fulfill them. Just use newtypes:
newtype Retf = Retf { getRetf :: T }
mkRetf :: T -> Maybe Retf
mkRetf C = Nothing
mkRetf x = Just (Retf x)
newtype Retg = Retg { getRetg :: T }
mkRetg :: ...
Then in the implementation of f, you match for the final result of mkRetf and raise an error if it's Nothing. That way, an implementation mistake that makes it give a C will unfortunately not give a compilation error, but at least a runtime error from within the function that's actually at fault, rather than somewhere further down the line.
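As a sketch of what such an implementation of f might look like. The Int argument and the even/odd logic are placeholders, since the question leaves the body of f unspecified.

```haskell
import Data.Maybe (fromMaybe)

data T = A | B | C deriving (Eq, Show)

-- In a real module the Retf constructor would not be exported, so that
-- mkRetf is the only way to build a value.
newtype Retf = Retf { getRetf :: T } deriving (Eq, Show)

mkRetf :: T -> Maybe Retf
mkRetf C = Nothing
mkRetf x = Just (Retf x)

-- Hypothetical body for f: the Int argument stands in for whatever f
-- really computes. A forbidden C fails here, inside f, not downstream.
f :: Int -> Retf
f n = fromMaybe (error "f: produced a forbidden C") (mkRetf result)
  where
    result = if even n then A else B
```

Callers only ever see a Retf, so they can rely on getRetf never yielding C without re-checking.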
An alternative that might be ideal for you is Liquid Haskell, which does support subset types. I can't say too much about it, but it's supposedly pretty good (and will in new GHC versions have direct support).

Set specific properties for data in Haskell

Let us say I want to make an ADT as follows in Haskell:
data Properties = Property String [String]
  deriving (Show,Eq)
I want to know if it is possible to give the second list a bounded and enumerated property? Basically the first element of the list will be the minBound and the last element will be the maxBound. I am trying,
data Properties a = Property String [a]
deriving (Show, Eq)
instance Bounded (Properties a) where
minBound a = head a
maxBound a = (head . reverse) a
But not having much luck.
Well no, you can't do quite what you're asking, but maybe you'll find inspiration in this other neat trick.
{-# language ScopedTypeVariables, FlexibleContexts, UndecidableInstances #-}
import Data.Reflection -- from the reflection package
import qualified Data.List.NonEmpty as NE
import Data.List.NonEmpty (NonEmpty (..))
import Data.Proxy
-- Just the plain string part
newtype Pstring p = P String deriving Eq
-- Those properties you're interested in. It will
-- only be possible to produce bounds if there's at
-- least one property, so NonEmpty makes more sense
-- than [].
type Props = NonEmpty String
-- This is just to make a Show instance that does
-- what you seem to want easier to write. It's not really
-- necessary.
data Properties = Property String [String] deriving Show
Now we get to the key part, where we use reflection to produce class instances that can depend on run-time values. Roughly speaking, you can think of
Reifies x t => ...
as being a class-level version of
\(x :: t) -> ...
Because it operates at the class level, you can use it to parametrize instances. Since Reifies x t binds a type variable x, rather than a term variable, you need to use reflect to actually get the value back. If you happen to have a value on hand whose type ends in p, then you can just apply reflect to that value. Otherwise, you can always magic up a Proxy :: Proxy p to do the job.
-- If some Props are "in the air" tied to the type p,
-- then we can show them along with the string.
instance Reifies p Props => Show (Pstring p) where
  showsPrec k p@(P str) =
    showsPrec k $ Property str (NE.toList $ reflect p)
-- If some Props are "in the air" tied to the type p,
-- then we can give Pstring p a Bounded instance.
instance Reifies p Props => Bounded (Pstring p) where
  minBound = P $ NE.head (reflect (Proxy :: Proxy p))
  maxBound = P $ NE.last (reflect (Proxy :: Proxy p))
Now we need to have a way to actually bind types that can be passed to the type-level lambdas. This is done using the reify function. So let's throw some Props into the air and then let the butterfly nets get them back.
main :: IO ()
main = reify ("Hi" :| ["how", "are", "you"]) $
  \(_ :: Proxy p) -> do
    print (minBound :: Pstring p)
    print (maxBound :: Pstring p)
dfeuer@squirrel:~/src> ./WeirdBounded
Property "Hi" ["Hi","how","are","you"]
Property "you" ["Hi","how","are","you"]
You can think of reify x $ \(p :: Proxy p) -> ... as binding a type p to the value x; you can then pass the type p where you like by constraining things to have types involving p.
If you're just doing a couple of things, all this machinery is way more than necessary. Where it gets nice is when you're performing lots of operations with values that have phantom types carrying extra information. In many cases, you can avoid most of the explicit applications of reflect and the explicit proxy handling, because type inference just takes care of it all for you. For a good example of this technique in action, see the hyperloglog package. Configuration information for the HyperLogLog data structure is carried in a type parameter; this guarantees, at compile time, that only similarly configured structures are merged with each other.

What's a better way of managing large Haskell records?

Replacing fields names with letters, I have cases like this:
data Foo = Foo { a :: Maybe ...
               , b :: [...]
               , c :: Maybe ...
               , ... for a lot more fields ...
               } deriving (Show, Eq, Ord)
instance Writer Foo where
  write x = maybeWrite a ++
            listWrite b ++
            maybeWrite c ++
            ... for a lot more fields ...
parser = permute (Foo
                   <$?> (Nothing, Just `liftM` aParser)
                   <|?> ([], bParser)
                   <|?> (Nothing, Just `liftM` cParser)
                   ... for a lot more fields ...
-- this is particularly hideous
foldl1 merge [foo1, foo2, ...]
merge (Foo a b c ...seriously a lot more...)
      (Foo a' b' c' ...) =
        Foo (max a a') (b ++ b') (max c c') ...
What techniques would allow me to better manage this growth?
In a perfect world a, b, and c would all be the same type so I could keep them in a list, but they can be many different types. I'm particularly interested in any way to fold the records without needing the massive patterns.
I'm using this large record to hold the different types resulting from permutation parsing the vCard format.
Update
I've implemented both the generics and the foldl approaches suggested below. They both work, and they both reduce three large field lists to one.
Datatype-generic programming techniques can be used to transform all the fields of a record in some "uniform" sort of way.
Perhaps all the fields in the record implement some typeclass that we want to use (the typical example is Show). Or perhaps we have another record of "similar" shape that contains functions, and we want to apply each function to the corresponding field of the original record.
For these kinds of uses, the generics-sop library is a good option. It expands the default Generics functionality of GHC with extra type-level machinery that provides analogues of functions like sequence or ap, but which work over all the fields of a record.
Using generics-sop, I tried to create a slightly less verbose version of your merge function. Some preliminary imports:
{-# language TypeOperators #-}
{-# language DeriveGeneric #-}
{-# language TypeFamilies #-}
{-# language DataKinds #-}
import Control.Applicative (liftA2)
import qualified GHC.Generics as GHC
import Generics.SOP
A helper function that lifts a binary operation to a form useable by the functions of generics-sop:
fn_2' :: (a -> a -> a) -> (I -.-> (I -.-> I)) a -- I is simply an Identity functor
fn_2' = fn_2 . liftA2
A general merge function that takes a vector of operators and works on any single-constructor record that derives Generic:
merge :: (Generic a, Code a ~ '[ xs ]) => NP (I -.-> (I -.-> I)) xs -> a -> a -> a
merge funcs reg1 reg2 =
  case (from reg1, from reg2) of
    (SOP (Z np1), SOP (Z np2)) ->
      let npResult = funcs `hap` np1 `hap` np2
      in to (SOP (Z npResult))
Code is a type family that returns a type-level list of lists describing the structure of a datatype. The outer list is for constructors, the inner lists contain the types of the fields for each constructor.
The Code a ~ '[ xs ] part of the constraint says "the datatype can only have one constructor" by requiring the outer list to have exactly one element.
The SOP (Z _) pattern matches extract the (heterogeneous) vector of field values from the record's generic representation. SOP stands for "sum-of-products".
A concrete example:
data Person = Person
  {
    name :: String
  , age :: Int
  } deriving (Show,GHC.Generic)
instance Generic Person -- this Generic is from generics-sop
mergePerson :: Person -> Person -> Person
mergePerson = merge (fn_2' (++) :* fn_2' (+) :* Nil)
The Nil and :* constructors are used to build the vector of operators (the type is called NP, from n-ary product). If the vector doesn't match the number of fields in the record, the program won't compile.
Update. Given that the types in your record are highly uniform, an alternative way of creating the vector of operations is to define instances of an auxiliary typeclass for each field type, and then use the hcpure function:
class Mergeable a where
  mergeFunc :: a -> a -> a
instance Mergeable String where
  mergeFunc = (++)
instance Mergeable Int where
  mergeFunc = (+)
mergePerson :: Person -> Person -> Person
mergePerson = merge (hcpure (Proxy :: Proxy Mergeable) (fn_2' mergeFunc))
The hcliftA2 function (that combines hcpure, fn_2 and hap) could be used to simplify things further.
Some suggestions:
(1) You can use the RecordWildCards extension to automatically unpack a record into variables. It doesn't help if you need to unpack two records of the same type, but it's useful to keep in mind. Oliver Charles has a nice blog post on it: (link)
(2) It appears your example application is performing a fold over the records. Have a look at Gabriel Gonzalez's foldl package. There is also a blog post: (link)
Here is an example of how you might use it with a record like:
data Foo = Foo { _a :: Int, _b :: String }
The following code computes the maximum of the _a fields and the concatenation of the _b fields.
import qualified Control.Foldl as L
import Data.Profunctor
data Foo = Foo { _a :: Int, _b :: String }
  deriving (Show)
fold_a :: L.Fold Foo Int
fold_a = lmap _a (L.Fold max 0 id)
fold_b :: L.Fold Foo String
fold_b = lmap _b (L.Fold (++) "" id)
fold_foos :: L.Fold Foo Foo
fold_foos = Foo <$> fold_a <*> fold_b
theFoos = [ Foo 1 "a", Foo 3 "b", Foo 2 "c" ]
test = L.fold fold_foos theFoos
Note the use of the Profunctor function lmap to extract the fields we want to fold over. The expression:
L.Fold max 0 id
is a fold over a list of Ints (or any type with Num and Ord instances), and therefore:
lmap _a (L.Fold max 0 id)
is the same fold but over a list of Foo records where we use _a to produce the Ints.
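To see why Foo <$> fold_a <*> fold_b still makes only one pass, here is a cut-down stand-in for the Fold type written against base alone. This is a sketch of the idea, not the foldl package's actual source. The Fold, fold and premap definitions below are local re-implementations, with premap playing the role lmap has above; foldFoos is a hypothetical name.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')

-- A step function, an initial accumulator, and a final extraction,
-- with the accumulator type hidden.
data Fold a b = forall x. Fold (x -> a -> x) x (x -> b)

instance Functor (Fold a) where
  fmap f (Fold step begin done) = Fold step begin (f . done)

-- Two folds are combined by running both accumulators side by side.
instance Applicative (Fold a) where
  pure b = Fold (\() _ -> ()) () (const b)
  Fold stepF beginF doneF <*> Fold stepX beginX doneX =
    Fold
      (\(f, x) a -> (stepF f a, stepX x a))
      (beginF, beginX)
      (\(f, x) -> doneF f (doneX x))

-- Run a fold over a list in a single pass.
fold :: Fold a b -> [a] -> b
fold (Fold step begin done) = done . foldl' step begin

-- Feed the fold a projection of each input element.
premap :: (a -> b) -> Fold b r -> Fold a r
premap f (Fold step begin done) = Fold (\x a -> step x (f a)) begin done

data Foo = Foo { _a :: Int, _b :: String } deriving (Eq, Show)

-- One combined fold per record, built field by field.
foldFoos :: Fold Foo Foo
foldFoos = Foo <$> premap _a (Fold max 0 id) <*> premap _b (Fold (++) "" id)
```

The payoff for large records is that each field's merge rule sits next to its field accessor, and no pattern over all fields is ever written.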

Examining the binding structure in a free monad AST

Take this simple base functor and other machinery for a free monad with binding terms:
{-# LANGUAGE DeriveFunctor #-}
import Control.Monad.Free
data ProgF r =
    FooF (Double -> r)
  | BarF Double (Int -> r)
  | EndF
  deriving Functor
type Program = Free ProgF
foo = liftF (FooF id)
bar a = liftF (BarF a id)
And here's a simple program
prog :: Program Int
prog = do
  a <- foo
  bar a
It has the following (hand-crafted) AST:
prog =
  Free (FooF (\p0 ->
    Free (BarF p0 (\p1 ->
      Pure p1))))
What I'd like to be able to do is reason about bound terms in the following way:
look at the Pure term in the AST
note the bound variables that occur there
annotate the corresponding binding nodes in the AST
Annotating a free monad AST directly via a cofree comonad seems to be impossible without doing some kind of pairing, but you could imagine getting to something like the following annotated AST (via, say, Fix) in which nodes binding variables that appear in Pure are annotated with Just True:
annotatedProg =
  Just False :< FooF (\p0 ->
    Just True :< BarF p0 (\p1 ->
      Nothing :< EndF))
So: is there a way to inspect the bindings in a program like this in such an ad-hoc way? I.e., without introducing a distinct variable type à la this question, for example.
I suspect that this might be impossible to do. Options like data-reify are attractive but it seems to be extremely difficult or impossible to make ProgF an instance of the requisite typeclasses (Foldable, Traversable, MuRef).
Is that intuition correct, or is there some means to do this that I haven't considered? Note that I'm happy to entertain any gruesomely unsafe or dynamic means.
I'm satisfied that this is not possible to do by any 'sane' ad-hoc method, for much the same reason that it's not possible to examine the binding structure of e.g. \a -> \b -> \c -> b + a.
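A small self-contained sketch of what that means in practice. The binders are ordinary Haskell functions, so the only thing an interpreter can do with them is apply them to some value. The Free type is re-declared locally so that no package is needed, and the function trace with its sample-value arguments is a hypothetical illustration.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Minimal local copy of the free monad.
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Free fa) = Free (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g  <*> x = fmap g x
  Free fg <*> x = Free (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a  >>= k = k a
  Free fa >>= k = Free (fmap (>>= k) fa)

liftF :: Functor f => f a -> Free f a
liftF = Free . fmap Pure

data ProgF r
  = FooF (Double -> r)
  | BarF Double (Int -> r)
  | EndF
  deriving Functor

type Program = Free ProgF

foo :: Program Double
foo = liftF (FooF id)

bar :: Double -> Program Int
bar a = liftF (BarF a id)

prog :: Program Int
prog = do
  a <- foo
  bar a

-- Walk the tree by feeding fixed sample values to every binder and
-- recording which node was visited. The lambdas are only ever applied.
trace :: Double -> Int -> Program a -> [String]
trace _ _ (Pure _)          = ["Pure"]
trace d i (Free (FooF k))   = "FooF" : trace d i (k d)
trace d i (Free (BarF x k)) = ("BarF " ++ show x) : trace d i (k i)
trace _ _ (Free EndF)       = ["EndF"]
```

The sample Double shows up again inside the BarF node, but nothing in the tree says that it came from the FooF binder; that link exists only inside the opaque lambda.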
