Is there a canonical way of comparing/changing one/two records in haskell? - haskell

I want to compare two records in haskell, without defining each change in the datatype of the record with and each function of 2 datas for all of the elements of the record over and over.
I read about lens, but I could not find an example for that,
and do not know where begin to read in the documentation.
Example, not working:
data TheState = TheState { number :: Int,
truth :: Bool
}
initState = TheState 77 True
-- not working, example:
stateMaybe = fmap Just initState
-- result should be:
-- ANewStateType{ number = Just 77, truth = Just True}
The same way, I want to compare the 2 states:
state2 = TheState 78 True
-- not working, example
stateMaybe2 = someNewCompare initState state2
-- result should be:
-- ANewStateType{ number = Just 78, truth = Nothing}

As others have mentioned in comments, it's most likely easier to create a different record to hold the Maybe version of the fields and do the manual conversion. However there is a way to get the functor like mapping over your fields in a more automated way.
It's probably more involved than what you would want but it's possible to achieve using a pattern called Higher Kinded Data (HKD) and a library called barbies.
Here is a amazing blog post on the subject: https://chrispenner.ca/posts/hkd-options
And here is my attempt at using HKD on your specific example:
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
-- base
import Data.Functor.Identity
import GHC.Generics (Generic)
-- barbie
import Data.Barbie
type TheState = TheState_ Identity
data TheState_ f = TheState
{ number :: f Int
, truth :: f Bool
} deriving (Generic, FunctorB)
initState :: TheState
initState = TheState (pure 77) (pure True)
stateMaybe :: TheState_ Maybe
stateMaybe = bmap (Just . runIdentity) initState
What is happening here, is that we are wrapping every field of the record in a custom f. We now get to choose what to parameterise TheState with in order to wrap every field. A normal record now has all of its fields wrapped in Identity. But you can have other versions of the record easily available as well. The bmap function let's you map your transformation from one type of TheState_ to another.
Honestly, the blog post will do a much better job at explaining this than I would. I find the subject very interesting, but I am still very new to it myself.
Hope this helped! :-)

How to make a Functor out of a record. For that I have an answer: apply the function to > all of the items of the record.
I want to use the record as an heterogenous container / hashmap, where
the names determine the values-types
While there's no "easy", direct way of doing this, it can be accomplished with several existing libraries.
This answer uses red-black-record library, which is itself built over the anonymous products of sop-core. "sop-core" allows each field in a product to be wrapped in a functor like Maybe and provides functions to manipulate fields uniformly. "red-black-record" inherits this, adding named fields and conversions from normal records.
To make TheState compatible with "red-black-record", we need to do the following:
{-# LANGUAGE DataKinds, FlexibleContexts, ScopedTypeVariables,
DeriveGeneric, DeriveAnyClass,
TypeApplications #-}
import GHC.Generics
import Data.SOP
import Data.SOP.NP (NP,cliftA2_NP) -- anonymous n-ary products
import Data.RBR (Record, -- generalized record type with fields wrapped in functors
I(..), -- an identity functor for "simple" cases
Productlike, -- relates a map of types to its flattened list of types
ToRecord, toRecord, -- convert a normal record to its generalized form
RecordCode, -- returns the map of types correspoding to a normal record
toNP, fromNP, -- convert generalized record to and from n-ary product
getField) -- access field from generalized record using TypeApplication
data TheState = TheState { number :: Int,
truth :: Bool
} deriving (Generic,ToRecord)
We auto-derive the Generic instance that allows other code to introspect the structure of the datatype. This is needed by ToRecord, that allows conversion of normal records into their "generalized forms".
Now consider the following function:
compareRecords :: forall r flat. (ToRecord r,
Productlike '[] (RecordCode r) flat,
All Eq flat)
=> r
-> r
-> Record Maybe (RecordCode r)
compareRecords state1 state2 =
let mapIIM :: forall a. Eq a => I a -> I a -> Maybe a
mapIIM (I val1) (I val2) = if val1 /= val2 then Just val2
else Nothing
resultNP :: NP Maybe flat
resultNP = cliftA2_NP (Proxy #Eq)
mapIIM
(toNP (toRecord state1))
(toNP (toRecord state2))
in fromNP resultNP
It compares two records whatsoever that have ToRecord r instances, and also a corresponding flattened list of types that all have Eq instances (the Productlike '[] (RecordCode r) flat and All Eq flat constraints).
First it converts the initial record arguments to their generalized forms with toRecord. These generalized forms are parameterized with an identity functor I because they come from "pure" values and there aren't any effects are play, yet.
The generalized record forms are in turn converted to n-ary products with toNP.
Then we can use the cliftA2_NP function from "sop-core" to compare accross all fields using their respective Eq instances. The function requires specifying the Eq constraint using a Proxy.
The only thing left to do is reconstructing a generalized record (this one parameterized by Maybe) using fromNP.
An example of use:
main :: IO ()
main = do
let comparison = compareRecords (TheState 0 False) (TheState 0 True)
print (getField #"number" comparison)
print (getField #"truth" comparison)
getField is used to extract values from generalized records. The field name is given as a Symbol by way of -XTypeApplications.

Related

Haskell subclassing and instance overlap

Coming from the OOP world, I sometimes find myself trying to use the inheritance pattern in Haskell, with varying degrees of success. Here's a little puzzle I encountered with subclassing (using GHC 8.10.7).
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE UndecidableInstances #-}
import Data.List (sort)
class Collection c a where
-- gets list of elements in the collection
elements :: c a -> [a]
class OrderedCollection c a where
-- gets sorted list of elements in the collection
orderedElements :: c a -> [a]
instance (Ord a, OrderedCollection c a) => Collection c a where
-- "default" implementation
elements = orderedElements
newtype SortedList a = SortedList [a]
deriving Show
instance (Ord a) => OrderedCollection SortedList a where
-- need to sort the elements in the list
orderedElements (SortedList xs) = sort xs
instance Collection SortedList a where
-- "optimized" implementation: no need to sort
elements (SortedList xs) = xs
test :: (Ord a, Show a, OrderedCollection c a) => c a -> IO ()
test coll = do
putStrLn $ "ordered elements: " ++ show (orderedElements coll)
putStrLn $ "elements: " ++ show (elements coll)
myList :: SortedList Int
myList = SortedList [3, 2, 1]
main :: IO ()
main = do
test myList
After including the necessary language extensions, this still gave me an error: Overlapping instances for Collection c a arising from a use of ‘elements’. It suggests using IncoherentInstances. Since this extension is now deprecated in favor of per-instance pragmas, I added an INCOHERENT pragma to the subclass instance:
instance {-# INCOHERENT #-} (Ord a, OrderedCollection c a) => Collection c a where
...
This successfully compiled. However, the result was not what I expected, as the output was:
ordered elements: [1,2,3]
elements: [1,2,3]
What I wanted was for the specialized implementation of Collection for SortedList to override the default (in an OO language, SortedList would inherit from OrderedCollection and then override the elements method). But here the type checker does not know to use SortedList's custom Collection implementation, because the type signature of test only imposes the constraint OrderedCollection c a.
Next, I tried adding the Collection constraint:
test :: (Ord a, Show a, Collection c a, OrderedCollection c a) => c a -> IO ()
This gave me the output I wanted:
ordered elements: [1,2,3]
elements: [3,2,1]
However, GHC also issued a warning about "fragile inner bindings" and suggested I add the MonoLocalBinds extension, which silences that warning. In any case, I'm not thrilled with having to include the Collection c a constraint (given it's implied by OrderedCollection c a), or having to use incoherent instances.
Interestingly, if I changed the INCOHERENT pragma to OVERLAPPABLE, it still compiled, and it also allowed me to remove MonoLocalBinds.
My question is, are there any alternative approaches to achieving the desired "inheritance" behavior here, without needing the redundant constraint in test?
When you write this:
instance ... => Collection c a where
You're declaring a Collection instance for all types ever. And it doesn't matter at all what's on the left of the fat arrow =>. Constraints do not participate in instance resolution. When the compiler tries to lookup an instance for a particular type, it only looks at what's on the right of the fat arrow =>, and only after finding a matching instance does it check if its constraints are satisfied. And if they're not, the compiler won't go back to look for another instance. That's how instance resolution works, and there are good reasons for it.
So, to reiterate: Collection c a means that this is an instance for all types.
And therefore, any subsequent Collection instances you might declare would of course be overlapping.
Thankfully, in this particular case, there is a better way: you can declare default methods without creating a universal instance like that. To do that, declare the method right inside the class declaration. And yes, you can put constraints on it too (see docs):
class Collection c a where
-- gets list of elements in the collection
elements :: c a -> [a]
default elements :: OrderedCollection c a => c a -> [a]
elements = orderedElements
But more generally, while type classes plus existential quantification is technically equivalent to OOP-style class hierarchies, if you try to actually model your domain like that, it would be more and more awkward and painful the further you go. It's a bit like trying to model ADTs in something like Java. Technically possible, but oh so messy!
There are some legitimate cases where a class hierarchy may make sense (one notable example is the GHC exception system), but most of the time there are much simpler ways.

Subset algebraic data type, or type-level set, in Haskell

Suppose you have a large number of types and a large number of functions that each return "subsets" of these types.
Let's use a small example to make the situation more explicit. Here's a simple algebraic data type:
data T = A | B | C
and there are two functions f, g that return a T
f :: T
g :: T
For the situation at hand, assume it is important that f can only return a A or B and g can only return a B or C.
I would like to encode this in the type system. Here are a few reasons/circumstances why this might be desirable:
Let the functions f and g have a more informative signature than just ::T
Enforce that implementations of f and g do not accidentally return a forbidden type that users of the implementation then accidentally use
Allow code reuse, e.g. when helper functions are involved that only operate on subsets of type T
Avoid boilerplate code (see below)
Make refactoring (much!) easier
One way to do this is to split up the algebraic datatype and wrap the individual types as needed:
data A = A
data B = B
data C = C
data Retf = RetfA A | RetfB B
data Retg = RetgB B | RetgC C
f :: Retf
g :: Retg
This works, and is easy to understand, but carries a lot of boilerplate for frequent unwrapping of the return types Retf and Retg.
I don't see polymorphism being of any help, here.
So, probably, this is a case for dependent types. It's not really a type-level list, rather a type-level set, but I've never seen a type-level set.
The goal, in the end, is to encode the domain knowledge via the types, so that compile-time checks are available, without having excessive boilerplate. (The boilerplate gets really annoying when there are lots of types and lots of functions.)
Define an auxiliary sum type (to be used as a data kind) where each branch corresponds to a version of your main type:
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE StandaloneKindSignatures #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE DataKinds #-}
import Data.Kind
import Data.Void
import GHC.TypeLits
data Version = AllEnabled | SomeDisabled
Then define a type family that maps the version and the constructor name (given as a type-level Symbol) to the type () if that branch is allowed, and to the empty type Void if it's disallowed.
type Enabled :: Version -> Symbol -> Type
type family Enabled v ctor where
Enabled SomeDisabled "C" = Void
Enabled _ _ = ()
Then define your type as follows:
type T :: Version -> Type
data T v = A !(Enabled v "A")
| B !(Enabled v "B")
| C !(Enabled v "C")
(The strictness annotations are there to help the exhaustivity checker.)
Typeclass instances can be derived, but separately for each version:
deriving instance Show (T AllEnabled)
deriving instance Eq (T AllEnabled)
deriving instance Show (T SomeDisabled)
deriving instance Eq (T SomeDisabled)
Here's an example of use:
noC :: T SomeDisabled
noC = A ()
main :: IO ()
main = print $ case noC of
A _ -> "A"
B _ -> "B"
-- this doesn't give a warning with -Wincomplete-patterns
This solution makes pattern-matching and construction more cumbersome, because those () are always there.
A variation is to have one type family per branch (as in Trees that Grow) instead of a two-parameter type family.
I tried to achieve something like this in the past, but without much success -- I was not too satisfied with my solution.
Still, one can use GADTs to encode this constraint:
data TagA = IsA | NotA
data TagC = IsC | NotC
data T (ta :: TagA) (tc :: TagC) where
A :: T 'IsA 'NotC
B :: T 'NotA 'NotC
C :: T 'NotA 'IsC
-- existential wrappers
data TnotC where TnotC :: T ta 'NotC -> TnotC
data TnotA where TnotA :: T 'NotA tc -> TnotA
f :: TnotC
g :: TnotA
This however gets boring fast, because of the wrapping/unwrapping of the exponentials. Consumer functions are more convenient since we can write
giveMeNotAnA :: T 'NotA tc -> Int
to require anything but an A. Producer functions instead need to use existentials.
In a type with many constructors, it also gets inconvenient since we have to use a GADT with many tags/parameters. Maybe this can be streamlined with some clever typeclass machinery.
Giving each individual value its own type scales extremely badly, and is quite unnecessarily fine-grained.
What you probably want is just restrict the types by some property on their values. In e.g. Coq, that would be a subset type:
Inductive T: Type :=
| A
| B
| C.
Definition Retf: Type := { x: T | x<>C }.
Definition Retg: Type := { x: T | x<>A }.
Well, Haskell has no way of expressing such value constraints, but that doesn't stop you from creating types that conceptually fulfill them. Just use newtypes:
newtype Retf = Retf { getRetf :: T }
mkRetf :: T -> Maybe Retf
mkRetf C = Nothing
mkRetf x = Retf x
newtype Retg = Retg { getRetg :: T }
mkRetg :: ...
Then in the implementation of f, you match for the final result of mkRetf and raise an error if it's Nothing. That way, an implementation mistake that makes it give a C will unfortunately not give a compilation error, but at least a runtime error from within the function that's actually at fault, rather than somewhere further down the line.
An alternative that might be ideal for you is Liquid Haskell, which does support subset types. I can't say too much about it, but it's supposedly pretty good (and will in new GHC versions have direct support).

What's a better way of managing large Haskell records?

Replacing fields names with letters, I have cases like this:
data Foo = Foo { a :: Maybe ...
, b :: [...]
, c :: Maybe ...
, ... for a lot more fields ...
} deriving (Show, Eq, Ord)
instance Writer Foo where
write x = maybeWrite a ++
listWrite b ++
maybeWrite c ++
... for a lot more fields ...
parser = permute (Foo
<$?> (Nothing, Just `liftM` aParser)
<|?> ([], bParser)
<|?> (Nothing, Just `liftM` cParser)
... for a lot more fields ...
-- this is particularly hideous
foldl1 merge [foo1, foo2, ...]
merge (Foo a b c ...seriously a lot more...)
(Foo a' b' c' ...) =
Foo (max a a') (b ++ b') (max c c') ...
What techniques would allow me to better manage this growth?
In a perfect world a, b, and c would all be the same type so I could keep them in a list, but they can be many different types. I'm particularly interested in any way to fold the records without needing the massive patterns.
I'm using this large record to hold the different types resulting from permutation parsing the vCard format.
Update
I've implemented both the generics and the foldl approaches suggested below. They both work, and they both reduce three large field lists to one.
Datatype-generic programming techniques can be used to transform all the fields of a record in some "uniform" sort of way.
Perhaps all the fields in the record implement some typeclass that we want to use (the typical example is Show). Or perhaps we have another record of "similar" shape that contains functions, and we want to apply each function to the corresponding field of the original record.
For these kinds of uses, the generics-sop library is a good option. It expands the default Generics functionality of GHC with extra type-level machinery that provides analogues of functions like sequence or ap, but which work over all the fields of a record.
Using generics-sop, I tried to create a slightly less verbose version of your merge funtion. Some preliminary imports:
{-# language TypeOperators #-}
{-# language DeriveGeneric #-}
{-# language TypeFamilies #-}
{-# language DataKinds #-}
import Control.Applicative (liftA2)
import qualified GHC.Generics as GHC
import Generics.SOP
A helper function that lifts a binary operation to a form useable by the functions of generics-sop:
fn_2' :: (a -> a -> a) -> (I -.-> (I -.-> I)) a -- I is simply an Identity functor
fn_2' = fn_2 . liftA2
A general merge function that takes a vector of operators and works on any single-constructor record that derives Generic:
merge :: (Generic a, Code a ~ '[ xs ]) => NP (I -.-> (I -.-> I)) xs -> a -> a -> a
merge funcs reg1 reg2 =
case (from reg1, from reg2) of
(SOP (Z np1), SOP (Z np2)) ->
let npResult = funcs `hap` np1 `hap` np2
in to (SOP (Z npResult))
Code is a type family that returns a type-level list of lists describing the structure of a datatype. The outer list is for constructors, the inner lists contain the types of the fields for each constructor.
The Code a ~ '[ xs ] part of the constraint says "the datatype can only have one constructor" by requiring the outer list to have exactly one element.
The (SOP (Z _) pattern matches extract the (heterogeneus) vector of field values from the record's generic representation. SOP stands for "sum-of-products".
A concrete example:
data Person = Person
{
name :: String
, age :: Int
} deriving (Show,GHC.Generic)
instance Generic Person -- this Generic is from generics-sop
mergePerson :: Person -> Person -> Person
mergePerson = merge (fn_2' (++) :* fn_2' (+) :* Nil)
The Nil and :* constructors are used to build the vector of operators (the type is called NP, from n-ary product). If the vector doesn't match the number of fields in the record, the program won't compile.
Update. Given that the types in your record are highly uniform, an alternative way of creating the vector of operations is to define instances of an auxiliary typeclass for each field type, and then use the hcpure function:
class Mergeable a where
mergeFunc :: a -> a -> a
instance Mergeable String where
mergeFunc = (++)
instance Mergeable Int where
mergeFunc = (+)
mergePerson :: Person -> Person -> Person
mergePerson = merge (hcpure (Proxy :: Proxy Mergeable) (fn_2' mergeFunc))
The hcliftA2 function (that combines hcpure, fn_2 and hap) could be used to simplify things further.
Some suggestions:
(1) You can use the RecordWildCards extension to automatically
unpack a record into variables. Doesn't help if you need to unpack
two records of the same type, but it's a useful to keep in mind.
Oliver Charles has a nice blog post on it: (link)
(2) It appears your example application is performing a fold over the records.
Have a look at Gabriel Gonzalez's foldl package. There is also a blog post: (link)
Here is a example of how you might use it with a record like:
data Foo = Foo { _a :: Int, _b :: String }
The following code computes the maximum of the _a fields and the
concatenation of the _b_ fields.
import qualified Control.Foldl as L
import Data.Profunctor
data Foo = Foo { _a :: Int, _b :: String }
deriving (Show)
fold_a :: L.Fold Foo Int
fold_a = lmap _a (L.Fold max 0 id)
fold_b :: L.Fold Foo String
fold_b = lmap _b (L.Fold (++) "" id)
fold_foos :: L.Fold Foo Foo
fold_foos = Foo <$> fold_a <*> fold_b
theFoos = [ Foo 1 "a", Foo 3 "b", Foo 2 "c" ]
test = L.fold fold_foos theFoos
Note the use of the Profunctor function lmap to extract out
the fields we want to fold over. The expression:
L.Fold max 0 id
is a fold over a list of Ints (or any Num instance), and therefore:
lmap _a (L.Fold max 0 id)
is the same fold but over a list of Foo records where we use _a
to produce the Ints.

How can I make my type an instance of Arbitrary?

I have the following data and function
data Foo = A | B deriving (Show)
foolist :: Maybe Foo -> [Foo]
foolist Nothing = [A]
foolist (Just x) = [x]
prop_foolist x = (length (foolist x)) == 1
when running quickCheck prop_foolist, ghc tells me that Foo needs to be an instance of Arbitrary.
No instance for (Arbitrary Foo) arising from a use of ‘quickCheck’
In the expression: quickCheck prop_foolist
In an equation for ‘it’: it = quickCheck prop_foolist
I tried data Foo = A | B deriving (Show, Arbitrary), but this results in
Can't make a derived instance of ‘Arbitrary Foo’:
‘Arbitrary’ is not a derivable class
Try enabling DeriveAnyClass
In the data declaration for ‘Foo’
However, I can't figure out how to enble DeriveAnyClass. I just wanted to use quickcheck with my simple function! The possible values of x is Nothing, Just A and Just B. Surely this should be possible to test?
There are two reasonable approaches:
Reuse an existing instance
If there's another instance that looks similar, you can use it. The Gen type is an instance of Functor, Applicative, and even Monad, so you can easily build generators from other ones. This is probably the most important general technique for writing Arbitrary instances. Most complex instances will be built up from one or more simpler ones.
boolToFoo :: Bool -> Foo
boolToFoo False = A
boolToFoo True = B
instance Arbitrary Foo where
arbitrary = boolToFoo <$> arbitrary
In this case, Foo can't be "shrunk" to subparts in any meaningful way, so the default trivial implementation of shrink will work fine. If it were a more interesting type, you could have used some analogue of
shrink = map boolToFoo . shrink . fooToBool
Use the pieces available in Test.QuickCheck.Arbitrary and/or Test.QuickCheck.Gen
In this case, it's pretty easy to just put together the pieces:
import Test.QuickCheck.Arbitrary
data Foo = A | B
deriving (Show,Enum,Bounded)
instance Arbitrary Foo where
arbitrary = arbitraryBoundedEnum
As mentioned, the default shrink implementation would be fine in this case. In the case of a recursive type, you'd likely want to add
{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics (Generic)
and then derive Generic for your type and use
instance Arbitrary ... where
...
shrink = genericShrink
As the documentation warns, genericShrink does not respect any internal validity conditions you may wish to impose, so some care may be required in some cases.
You asked about DeriveAnyClass. If you wanted that, you'd add
{-# LANGUAGE DeriveAnyClass #-}
to the top of your file. But you don't want that. You certainly don't want it here, anyway. It only works for classes that have a full complement of defaults based on Generics, typically using the DefaultSignatures extension. In this case, there is no default arbitrary :: Generic a => Gen a line in the Arbitrary class definition, and arbitrary is mandatory. So an instance of Arbitrary produced by DeriveAnyClass will produce a runtime error as soon as QuickCheck tries to call its arbitrary method.

Test if a value matches a constructor

Say I have a data type like so:
data NumCol = Empty |
Single Int |
Pair Int Int |
Lots [Int]
Now I wish to filter out the elements matching a given constructor from a [NumCol]. I can write it for, say, Pair:
get_pairs :: [NumCol] -> [NumCol]
get_pairs = filter is_pair
where is_pair (Pair _ _) = True
is_pair _ = False
This works, but it's not generic. I have to write a separate function for is_single, is_lots, etc.
I wish instead I could write:
get_pairs = filter (== Pair)
But this only works for type constructors that take no arguments (i.e. Empty).
So the question is, how can I write a function that takes a value and a constructor, and returns whether the value matches the constructor?
At least get_pairs itself can be defined relatively simply by using a list comprehension to filter instead:
get_pairs xs = [x | x#Pair {} <- xs]
For a more general solution of matching constructors, you can use prisms from the lens package:
{-# LANGUAGE TemplateHaskell #-}
import Control.Lens
import Control.Lens.Extras (is)
data NumCol = Empty |
Single Int |
Pair Int Int |
Lots [Int]
-- Uses Template Haskell to create the Prisms _Empty, _Single, _Pair and _Lots
-- corresponding to your constructors
makePrisms ''NumCol
get_pairs :: [NumCol] -> [NumCol]
get_pairs = filter (is _Pair)
Tags of tagged unions ought to be first-class values, and with a wee bit of effort, they are.
Jiggery-pokery alert:
{-# LANGUAGE GADTs, DataKinds, KindSignatures,
TypeFamilies, PolyKinds, FlexibleInstances,
PatternSynonyms
#-}
Step one: define type-level versions of the tags.
data TagType = EmptyTag | SingleTag | PairTag | LotsTag
Step two: define value-level witnesses for the representability of the type-level tags. Richard Eisenberg's Singletons library will do this for you. I mean something like this:
data Tag :: TagType -> * where
EmptyT :: Tag EmptyTag
SingleT :: Tag SingleTag
PairT :: Tag PairTag
LotsT :: Tag LotsTag
And now we can say what stuff we expect to find associated with a given tag.
type family Stuff (t :: TagType) :: * where
Stuff EmptyTag = ()
Stuff SingleTag = Int
Stuff PairTag = (Int, Int)
Stuff LotsTag = [Int]
So we can refactor the type you first thought of
data NumCol :: * where
(:&) :: Tag t -> Stuff t -> NumCol
and use PatternSynonyms to recover the behaviour you had in mind:
pattern Empty = EmptyT :& ()
pattern Single i = SingleT :& i
pattern Pair i j = PairT :& (i, j)
pattern Lots is = LotsT :& is
So what's happened is that each constructor for NumCol has turned into a tag indexed by the kind of tag it's for. That is, constructor tags now live separately from the rest of the data, synchronized by a common index which ensures that the stuff associated with a tag matches the tag itself.
But we can talk about tags alone.
data Ex :: (k -> *) -> * where -- wish I could say newtype here
Witness :: p x -> Ex p
Now, Ex Tag, is the type of "runtime tags with a type level counterpart". It has an Eq instance
instance Eq (Ex Tag) where
Witness EmptyT == Witness EmptyT = True
Witness SingleT == Witness SingleT = True
Witness PairT == Witness PairT = True
Witness LotsT == Witness LotsT = True
_ == _ = False
Moreover, we can easily extract the tag of a NumCol.
numColTag :: NumCol -> Ex Tag
numColTag (n :& _) = Witness n
And that allows us to match your specification.
filter ((Witness PairT ==) . numColTag) :: [NumCol] -> [NumCol]
Which raises the question of whether your specification is actually what you need. The point is that detecting a tag entitles you an expectation of that tag's stuff. The output type [NumCol] doesn't do justice to the fact that you know you have just the pairs.
How might you tighten the type of your function and still deliver it?
One approach is to use DataTypeable and the Data.Data module. This approach relies on two autogenerated typeclass instances that carry metadata about the type for you: Typeable and Data. You can derive them with {-# LANGUAGE DeriveDataTypeable #-}:
data NumCol = Empty |
Single Int |
Pair Int Int |
Lots [Int] deriving (Typeable, Data)
Now we have a toConstr function which, given a value, gives us a representation of its constructor:
toConstr :: Data a => a -> Constr
This makes it easy to compare two terms just by their constructors. The only remaining problem is that we need a value to compare against when we define our predicate! We can always just create a dummy value with undefined, but that's a bit ugly:
is_pair x = toConstr x == toConstr (Pair undefined undefined)
So the final thing we'll do is define a handy little class that automates this. The basic idea is to call toConstr on non-function values and recurse on any functions by first passing in undefined.
class Constrable a where
constr :: a -> Constr
instance Data a => Constrable a where
constr = toConstr
instance Constrable a => Constrable (b -> a) where
constr f = constr (f undefined)
This relies on FlexibleInstance, OverlappingInstances and UndecidableInstances, so it might be a bit evil, but, using the (in)famous eyeball theorem, it should be fine. Unless you add more instances or try to use it with something that isn't a constructor. Then it might blow up. Violently. No promises.
Finally, with the evil neatly contained, we can write an "equal by constructor" operator:
(=|=) :: (Data a, Constrable b) => a -> b -> Bool
e =|= c = toConstr e == constr c
(The =|= operator is a bit of a mnemonic, because constructors are syntactically defined with a |.)
Now you can write almost exactly what you wanted!
filter (=|= Pair)
Also, maybe you'd want to turn off the monomorphism restriction. In fact, here's the list of extensions I enabled that you can just use:
{-# LANGUAGE DeriveDataTypeable, FlexibleInstances, NoMonomorphismRestriction, OverlappingInstances, UndecidableInstances #-}
Yeah, it's a lot. But that's what I'm willing to sacrifice for the cause. Of not writing extra undefineds.
Honestly, if you don't mind relying on lens (but boy is that dependency a doozy), you should just go with the prism approach. The only thing to recommend mine is that you get to use the amusingly named Data.Data.Data class.

Resources