Haskell: is there a way of 'mapping' over an algebraic data type? - haskell

Suppose that I have some simple algebraic data (essentially enums) and another type which has these enums as fields.
data Color = Red | Green | Blue deriving (Eq, Show, Enum, Ord)
data Width = Thin | Normal | Fat deriving (Eq, Show, Enum, Ord)
data Height = Short | Medium | Tall deriving (Eq, Show, Enum, Ord)
data Object = Object { color :: Colour
, width :: Width
, height :: Height } deriving (Show)
Given a list of objects, I want to test that the attributes are all distinct. For this I have following functions (using sort from Data.List)
allDifferent = comparePairwise . sort
where comparePairwise xs = and $ zipWith (/=) xs (drop 1 xs)
uniqueAttributes :: [Object] -> Bool
uniqueAttributes objects = all [ allDifferent $ map color objects
, allDifferent $ map width objects
, allDifferent $ map height objects ]
This works, but is rather dissatisfying because I had to type each field (color, width, height) manually. In my actual code, there are more fields! Is there a way of 'mapping' the function
\field -> allDifferent $ map field objects
over the fields of an algebraic datatype like Object? I want to treat Object as a list of its fields (something that would be easy in e.g. javascript), but these fields have different types...

Here is a solution using generics-sop:
pointwiseAllDifferent
:: (Generic a, Code a ~ '[ xs ], All Ord xs) => [a] -> Bool
pointwiseAllDifferent =
and
. hcollapse
. hcmap (Proxy :: Proxy Ord) (K . allDifferent)
. hunzip
. map (unZ . unSOP . from)
hunzip :: SListI xs => [NP I xs] -> NP [] xs
hunzip = foldr (hzipWith ((:) . unI)) (hpure [])
This assumes that the type Object you want to compare is a record type and requires that you make this type an instance of the class Generic, which can be done using Template Haskell:
deriveGeneric ''Object
Let's try to see what's going on here by looking at a concrete example:
objects = [Object Red Thin Short, Object Green Fat Short]
The line map (unZ . unSOP . from) converts each Object into a heterogeneous list (called an n-ary product in the library):
GHCi> map (unZ . unSOP . from) objects
[I Red :* (I Thin :* (I Short :* Nil)),I Green :* (I Fat :* (I Short :* Nil))]
The hunzip then turns this list of products into a product where each element is a list:
GHCi> hunzip it
[Red,Green] :* ([Thin,Fat] :* ([Short,Short] :* Nil))
Now, we apply allDifferent to each list in the product:
GHCi> hcmap (Proxy :: Proxy Ord) (K . allDifferent) it
K True :* (K True :* (K False :* Nil))
The product is now in fact homogeneous, as every position contains a Bool, so hcollapse turns it into a normal homogeneous list again:
GHCi> hcollapse it
[True,True,False]
The last step just applies and to it:
GHCi> and it
False

For this very specific situation (checking a set of attributes that are simple sum types with 0-arity constructors), you can use the following construction using Data.Data generics:
{-# LANGUAGE DeriveDataTypeable #-}
module Signature where
import Data.List (sort, transpose)
import Data.Data
data Color = Red | Green | Blue deriving (Eq, Show, Enum, Ord, Data)
data Width = Thin | Normal | Fat deriving (Eq, Show, Enum, Ord, Data)
data Height = Short | Medium | Tall deriving (Eq, Show, Enum, Ord, Data)
data Object = Object { color :: Color
, width :: Width
, height :: Height } deriving (Show, Data)
-- |Signature of attribute constructors used in object
signature :: Object -> [String]
signature = gmapQ (show . toConstr)
uniqueAttributes :: [Object] -> Bool
uniqueAttributes = all allDifferent . transpose . map signature
allDifferent :: (Ord a) => [a] -> Bool
allDifferent = comparePairwise . sort
where comparePairwise xs = and $ zipWith (/=) xs (drop 1 xs)
The key here is the function signature which takes an object and generically across its immediate children calculates the constructor name of each child. So:
*Signature> signature (Object Red Fat Medium)
["Red","Fat","Medium"]
*Signature>
If there are any fields other than these simple sum types, (like say an attribute of type data Weight = Weight Int or if you added a name :: String field to Object), then this will suddenly fail.
(Edited to add:) Note that you can use constrIndex . toConstr in place of show . toConstr to use an Int-valued constructor index (basically, the index starting with 1 of the constructor within the data definition), if this feels less indirect. If the Constr returned by toConstr had an Ord instance, there would be no indirection at all, but unfortunately...

Related

Deriving Eq and Show for an ADT that contains fields that can't have Eq or Show

I'd like to be able to derive Eq and Show for an ADT that contains multiple fields. One of them is a function field. When doing Show, I'd like it to display something bogus, like e.g. "<function>"; when doing Eq, I'd like it to ignore that field. How can I best do this without hand-writing a full instance for Show and Eq?
I don't want to wrap the function field inside a newtype and write my own Eq and Show for that - it would be too bothersome to use like that.
One way you can get proper Eq and Show instances is to, instead of hard-coding that function field, make it a type parameter and provide a function that just “erases” that field. I.e., if you have
data Foo = Foo
{ fooI :: Int
, fooF :: Int -> Int }
you change it to
data Foo' f = Foo
{ _fooI :: Int
, _fooF :: f }
deriving (Eq, Show)
type Foo = Foo' (Int -> Int)
eraseFn :: Foo -> Foo' ()
eraseFn foo = foo{ fooF = () }
Then, Foo will still not be Eq- or Showable (which after all it shouldn't be), but to make a Foo value showable you can just wrap it in eraseFn.
Typically what I do in this circumstance is exactly what you say you don’t want to do, namely, wrap the function in a newtype and provide a Show for that:
data T1
{ f :: X -> Y
, xs :: [String]
, ys :: [Bool]
}
data T2
{ f :: OpaqueFunction X Y
, xs :: [String]
, ys :: [Bool]
}
deriving (Show)
newtype OpaqueFunction a b = OpaqueFunction (a -> b)
instance Show (OpaqueFunction a b) where
show = const "<function>"
If you don’t want to do that, you can instead make the function a type parameter, and substitute it out when Showing the type:
data T3' a
{ f :: a
, xs :: [String]
, ys :: [Bool]
}
deriving (Functor, Show)
newtype T3 = T3 (T3' (X -> Y))
data Opaque = Opaque
instance Show Opaque where
show = const "..."
instance Show T3 where
show (T3 t) = show (Opaque <$ t)
Or I’ll refactor my data type to derive Show only for the parts I want to be Showable by default, and override the other parts:
data T4 = T4
{ f :: X -> Y
, xys :: T4' -- Move the other fields into another type.
}
instance Show T4 where
show (T4 f xys) = "T4 <function> " <> show xys
data T4' = T4'
{ xs :: [String]
, ys :: [Bool]
}
deriving (Show) -- Derive ‘Show’ for the showable fields.
Or if my type is small, I’ll use a newtype instead of data, and derive Show via something like OpaqueFunction:
{-# LANGUAGE DerivingVia #-}
newtype T5 = T5 (X -> Y, [String], [Bool])
deriving (Show) via (OpaqueFunction X Y, [String], [Bool])
You can use the iso-deriving package to do this for data types using lenses if you care about keeping the field names / record accessors.
As for Eq (or Ord), it’s not a good idea to have an instance that equates values that can be observably distinguished in some way, since some code will treat them as identical and other code will not, and now you’re forced to care about stability: in some circumstance where I have a == b, should I pick a or b? This is why substitutability is a law for Eq: forall x y f. (x == y) ==> (f x == f y) if f is a “public” function that upholds the invariants of the type of x and y (although floating-point also violates this). A better choice is something like T4 above, having equality only for the parts of a type that can satisfy the laws, or explicitly using comparison modulo some function at use sites, e.g., comparing someField.
The module Text.Show.Functions in base provides a show instance for functions that displays <function>. To use it, just:
import Text.Show.Functions
It just defines an instance something like:
instance Show (a -> b) where
show _ = "<function>"
Similarly, you can define your own Eq instance:
import Text.Show.Functions
instance Eq (a -> b) where
-- all functions are equal...
-- ...though some are more equal than others
_ == _ = True
data Foo = Foo Int Double (Int -> Int) deriving (Show, Eq)
main = do
print $ Foo 1 2.0 (+1)
print $ Foo 1 2.0 (+1) == Foo 1 2.0 (+2) -- is True
This will be an orphan instance, so you'll get a warning with -Wall.
Obviously, these instances will apply to all functions. You can write instances for a more specialized function type (e.g., only for Int -> String, if that's the type of the function field in your data type), but there is no way to simultaneously (1) use the built-in Eq and Show deriving mechanisms to derive instances for your datatype, (2) not introduce a newtype wrapper for the function field (or some other type polymorphism as mentioned in the other answers), and (3) only have the function instances apply to the function field of your data type and not other function values of the same type.
If you really want to limit applicability of the custom function instances without a newtype wrapper, you'd probably need to build your own generics-based solution, which wouldn't make much sense unless you wanted to do this for a lot of data types. If you go this route, then the Generics.Deriving.Show and Generics.Deriving.Eq modules in generic-deriving provide templates for these instances which could be modified to treat functions specially, allowing you to derive per-datatype instances using some stub instances something like:
instance Show Foo where showsPrec = myGenericShowsPrec
instance Eq Foo where (==) = myGenericEquality
I proposed an idea for adding annotations to fields via fields, that allows operating on behaviour of individual fields.
data A = A
{ a :: Int
, b :: Int
, c :: Int -> Int via Ignore (Int->Int)
}
deriving
stock GHC.Generic
deriving (Eq, Show)
via Generically A -- assuming Eq (Generically A)
-- Show (Generically A)
But this is already possible with the "microsurgery" library, but you might have to write some boilerplate to get it going. Another solution is to write separate behaviour in "sums-of-products style"
data A = A Int Int (Int->Int)
deriving
stock GHC.Generic
deriving
anyclass SOP.Generic
deriving (Eq, Show)
via A <-𝈖-> '[ '[ Int, Int, Ignore (Int->Int) ] ]

In this case, is it possible to fold Applicative <*> to avoid repetition?

In the following code
module Main where
import Control.Monad.State
import Control.Applicative
type Code = String
toSth :: Read a => State [Code] a
toSth = state $ \(c:cs) -> ((read c), cs)
codes = ["12", "True", ""]
data Tick = Tick {n :: Int, bid :: Bool} deriving (Show)
res = runState (pure Tick <*> toSth <*> toSth) codes
main = print res
I get the correct results
(Tick {n = 12, bid = True},[""])
But my problem is with the repetition of
pure Tick <*> toSth <*> toSth
I.e., if the record has 100 fields, then I have to write <*> toSth 100 times, which does not look Haskell.
Is there a way to foldl on <*>? I know the standard foldl :: Foldable t => (b -> a -> b) -> b -> t a -> b won't work here, because the accumulator type changes in each iteration.
Thanks a lot!
This can be done with some advanced generics libraries, like generics-sop.
Generics libraries translate datatypes from and to some kind of "uniform" representation. The libraries also provide functions to create or modify such representation. We can work over the representation and afterwards transform back into the original datatype.
{-# language DeriveGeneric, TypeApplications #-}
import qualified GHC.Generics
import Generics.SOP (Generic,to,SOP(SOP),NS(Z),hsequence,hcpure)
import Data.Proxy
data Tick = Tick {n :: Int, bid :: Bool} deriving (Show,GHC.Generics.Generic)
instance Generic Tick -- this Generic is from generics-sop
res :: (Tick, [Code])
res =
let tickAction :: State [Code] Tick
tickAction = to . SOP . Z <$> hsequence (hcpure (Proxy #Read) toSth)
in runState tickAction codes
hcpure creates an n-ary product out of an effectful function (here toSth) that knows how to create every member of the product. We have to pass a Proxy with the constraint to convince the compiler. The result is a product where each component is wrapped in State.
hsequence is like sequenceA but for n-ary products having different types for each component. The result is similar: the Applicative is "pulled outwards".
SOP and Z are constructors that wrap the product and let us call to to get a value of the original Tick type.
res could be given this more general signature to work over any single-constructor record that is an instance of Generics.SOP.Generic:
{-# language DataKinds #-}
res :: (Generic r, Generics.SOP.Code r ~ '[ xs ], Generics.SOP.All Read xs) => (r,[Code])
res =
let tickAction = to . SOP . Z <$> hsequence (hcpure (Proxy #Read) toSth)
in runState tickAction codes

What's a better way of managing large Haskell records?

Replacing fields names with letters, I have cases like this:
data Foo = Foo { a :: Maybe ...
, b :: [...]
, c :: Maybe ...
, ... for a lot more fields ...
} deriving (Show, Eq, Ord)
instance Writer Foo where
write x = maybeWrite a ++
listWrite b ++
maybeWrite c ++
... for a lot more fields ...
parser = permute (Foo
<$?> (Nothing, Just `liftM` aParser)
<|?> ([], bParser)
<|?> (Nothing, Just `liftM` cParser)
... for a lot more fields ...
-- this is particularly hideous
foldl1 merge [foo1, foo2, ...]
merge (Foo a b c ...seriously a lot more...)
(Foo a' b' c' ...) =
Foo (max a a') (b ++ b') (max c c') ...
What techniques would allow me to better manage this growth?
In a perfect world a, b, and c would all be the same type so I could keep them in a list, but they can be many different types. I'm particularly interested in any way to fold the records without needing the massive patterns.
I'm using this large record to hold the different types resulting from permutation parsing the vCard format.
Update
I've implemented both the generics and the foldl approaches suggested below. They both work, and they both reduce three large field lists to one.
Datatype-generic programming techniques can be used to transform all the fields of a record in some "uniform" sort of way.
Perhaps all the fields in the record implement some typeclass that we want to use (the typical example is Show). Or perhaps we have another record of "similar" shape that contains functions, and we want to apply each function to the corresponding field of the original record.
For these kinds of uses, the generics-sop library is a good option. It expands the default Generics functionality of GHC with extra type-level machinery that provides analogues of functions like sequence or ap, but which work over all the fields of a record.
Using generics-sop, I tried to create a slightly less verbose version of your merge funtion. Some preliminary imports:
{-# language TypeOperators #-}
{-# language DeriveGeneric #-}
{-# language TypeFamilies #-}
{-# language DataKinds #-}
import Control.Applicative (liftA2)
import qualified GHC.Generics as GHC
import Generics.SOP
A helper function that lifts a binary operation to a form useable by the functions of generics-sop:
fn_2' :: (a -> a -> a) -> (I -.-> (I -.-> I)) a -- I is simply an Identity functor
fn_2' = fn_2 . liftA2
A general merge function that takes a vector of operators and works on any single-constructor record that derives Generic:
merge :: (Generic a, Code a ~ '[ xs ]) => NP (I -.-> (I -.-> I)) xs -> a -> a -> a
merge funcs reg1 reg2 =
case (from reg1, from reg2) of
(SOP (Z np1), SOP (Z np2)) ->
let npResult = funcs `hap` np1 `hap` np2
in to (SOP (Z npResult))
Code is a type family that returns a type-level list of lists describing the structure of a datatype. The outer list is for constructors, the inner lists contain the types of the fields for each constructor.
The Code a ~ '[ xs ] part of the constraint says "the datatype can only have one constructor" by requiring the outer list to have exactly one element.
The (SOP (Z _) pattern matches extract the (heterogeneus) vector of field values from the record's generic representation. SOP stands for "sum-of-products".
A concrete example:
data Person = Person
{
name :: String
, age :: Int
} deriving (Show,GHC.Generic)
instance Generic Person -- this Generic is from generics-sop
mergePerson :: Person -> Person -> Person
mergePerson = merge (fn_2' (++) :* fn_2' (+) :* Nil)
The Nil and :* constructors are used to build the vector of operators (the type is called NP, from n-ary product). If the vector doesn't match the number of fields in the record, the program won't compile.
Update. Given that the types in your record are highly uniform, an alternative way of creating the vector of operations is to define instances of an auxiliary typeclass for each field type, and then use the hcpure function:
class Mergeable a where
mergeFunc :: a -> a -> a
instance Mergeable String where
mergeFunc = (++)
instance Mergeable Int where
mergeFunc = (+)
mergePerson :: Person -> Person -> Person
mergePerson = merge (hcpure (Proxy :: Proxy Mergeable) (fn_2' mergeFunc))
The hcliftA2 function (that combines hcpure, fn_2 and hap) could be used to simplify things further.
Some suggestions:
(1) You can use the RecordWildCards extension to automatically
unpack a record into variables. Doesn't help if you need to unpack
two records of the same type, but it's a useful to keep in mind.
Oliver Charles has a nice blog post on it: (link)
(2) It appears your example application is performing a fold over the records.
Have a look at Gabriel Gonzalez's foldl package. There is also a blog post: (link)
Here is a example of how you might use it with a record like:
data Foo = Foo { _a :: Int, _b :: String }
The following code computes the maximum of the _a fields and the
concatenation of the _b_ fields.
import qualified Control.Foldl as L
import Data.Profunctor
data Foo = Foo { _a :: Int, _b :: String }
deriving (Show)
fold_a :: L.Fold Foo Int
fold_a = lmap _a (L.Fold max 0 id)
fold_b :: L.Fold Foo String
fold_b = lmap _b (L.Fold (++) "" id)
fold_foos :: L.Fold Foo Foo
fold_foos = Foo <$> fold_a <*> fold_b
theFoos = [ Foo 1 "a", Foo 3 "b", Foo 2 "c" ]
test = L.fold fold_foos theFoos
Note the use of the Profunctor function lmap to extract out
the fields we want to fold over. The expression:
L.Fold max 0 id
is a fold over a list of Ints (or any Num instance), and therefore:
lmap _a (L.Fold max 0 id)
is the same fold but over a list of Foo records where we use _a
to produce the Ints.

Sort by constructor ignoring (part of) value

Suppose I have
data Foo = A String Int | B Int
I want to take an xs :: [Foo] and sort it such that all the As are at the beginning, sorted by their strings, but with the ints in the order they appeared in the list, and then have all the Bs at the end, in the same order they appeared.
In particular, I want to create a new list containg the first A of each string and the first B.
I did this by defining a function taking Foos to (Int, String)s and using sortBy and groupBy.
Is there a cleaner way to do this? Preferably one that generalizes to at least 10 constructors.
Typeable, maybe? Something else that's nicer?
EDIT: This is used for processing a list of Foos that is used elsewhere. There is already an Ord instance which is the normal ordering.
You can use
sortBy (comparing foo)
where foo is a function that extracts the interesting parts into something comparable (e.g. Ints).
In the example, since you want the As sorted by their Strings, a mapping to Int with the desired properties would be too complicated, so we use a compound target type.
foo (A s _) = (0,s)
foo (B _) = (1,"")
would be a possible helper. This is more or less equivalent to Tikhon Jelvis' suggestion, but it leaves space for the natural Ord instance.
To make it easier to build comparison function for ADTs with large number of constructors, you can map values to their constructor index with SYB:
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics
data Foo = A String Int | B Int deriving (Show, Eq, Typeable, Data)
cIndex :: Data a => a -> Int
cIndex = constrIndex . toConstr
Example:
*Main Data.Generics> cIndex $ A "foo" 42
1
*Main Data.Generics> cIndex $ B 0
2
Edit:After re-reading your question, I think the best option is to make Foo an instance of Ord. I do not think there is any way to do this automatically that will act the way you want (just using deriving will create different behavior).
Once Foo is an instance of Ord, you can just use sort from Data.List.
In your exact example, you can do something like this:
data Foo = A String Int | B Int deriving (Eq)
instance Ord Foo where
(A _ _) <= (B _) = True
(A s _) <= (A s' _) = s <= s'
(B _) <= (B _) = True
When something is an instance of Ord, it means the data type has some ordering. Once we know how to order something, we can use a bunch of existing functions (like sort) on it and it will behave how you want. Anything in Ord has to be part of Eq, which is what the deriving (Eq) bit does automatically.
You can also derive Ord. However, the behavior will not be exactly what you want--it will order by all of the fields if it has to (e.g. it will put As with the same string in order by their integers).
Further edit: I was thinking about it some more and realized my solution is probably semantically wrong.
An Ord instance is a statement about your whole data type. For example, I'm saying that Bs are always equal with each other when the derived Eq instance says otherwise.
If the data your representing always behaves like this (that is, Bs are all equal and As with the same string are all equal) then an Ord instance makes sense. Otherwise, you should not actually do this.
However, you can do something almost exactly like this: write your own special compare function (Foo -> Foo -> Ordering) that encapsulates exactly what you want to do then use sortBy. This properly codifies that your particular sorting is special rather than the natural ordering of the data type.
You could use some template haskell to fill in the missing transitive cases. The mkTransitiveLt creates the transitive closure of the given cases (if you order them least to greatest). This gives you a working less-than, which can be turned into a function that returns an Ordering.
{-# LANGUAGE TemplateHaskell #-}
import MkTransitiveLt
import Data.List (sortBy)
data Foo = A String Int | B Int | C | D | E deriving(Show)
cmp a b = $(mkTransitiveLt [|
case (a, b) of
(A _ _, B _) -> True
(B _, C) -> True
(C, D) -> True
(D, E) -> True
(A s _, A s' _) -> s < s'
otherwise -> False|])
lt2Ord f a b =
case (f a b, f b a) of
(True, _) -> LT
(_, True) -> GT
otherwise -> EQ
main = print $ sortBy (lt2Ord cmp) [A "Z" 1, A "A" 1, B 1, A "A" 0, C]
Generates:
[A "A" 1,A "A" 0,A "Z" 1,B 1,C]
mkTransitiveLt must be defined in a separate module:
module MkTransitiveLt (mkTransitiveLt)
where
import Language.Haskell.TH
mkTransitiveLt :: ExpQ -> ExpQ
mkTransitiveLt eq = do
CaseE e ms <- eq
return . CaseE e . reverse . foldl go [] $ ms
where
go ms m#(Match (TupP [a, b]) body decls) = (m:ms) ++
[Match (TupP [x, b]) body decls | Match (TupP [x, y]) _ _ <- ms, y == a]
go ms m = m:ms

Haskell data structures oddity

I've been attempting to write a small file to try out a bag-like data structure. So far my code is as follows:
data Fruit = Apple | Banana | Pear deriving (Eq, Show)
data Bag a = EmptyBag | Contents [(a, Integer)]
emptyBag :: Bag a
emptyBag = EmptyBag
unwrap :: [a] -> a
unwrap [x] = x
isObject theObject (obj, inte) = theObject == obj
count :: Bag a -> a -> Integer
count (Contents [xs]) theObject = snd (unwrap (filter (isObject theObject) [xs]))
count EmptyBag _ = 0
But when I try and run it I get the error
Could not deduce (Eq a) from the context ()
arising from a use of 'isObject' at ....
Whereas when I take the count function out and call
snd(unwrap(filter (isObject Banana) [(Apple,1),(Banana,2)]))
it happily returns 2.
Any clues on why this is, or advice on writing this kind of data structure would be much appreciated.
(==) can only be used in a context that includes Eq, but when you declared count you didn't include that context. If I'm reading correctly, that would be
count :: Eq a => Bag a -> a -> Integer
If you declare count without including the type, you can ask ghci for the inferred type; or just ask for the inferred type of snd (unwrap (filter (isObject Banana) [(Apple,1),(Banana,2)]))

Resources