Pattern matching on a private data constructor - haskell

I'm writing a simple ADT for grid axis. In my application grid may be either regular (with constant step between coordinates), or irregular (otherwise). Of course, the regular grid is just a special case of irregular one, but it may worth to differentiate between them in some situations (for example, to perform some optimizations). So, I declare my ADT as the following:
data GridAxis = RegularAxis (Float, Float) Float -- (min, max) delta
| IrregularAxis [Float] -- [xs]
But I don't want user to create malformed axes with max < min or with unordered xs list. So, I add "smarter" construction functions which perform some basic checks:
regularAxis :: (Float, Float) -> Float -> GridAxis
regularAxis (a, b) dx = RegularAxis (min a b, max a b) (abs dx)
irregularAxis :: [Float] -> GridAxis
irregularAxis xs = IrregularAxis (sort xs)
I don't want user to create grids directly, so I don't add GridAxis data constructors into module export list:
module GridAxis (
GridAxis,
regularAxis,
irregularAxis,
) where
But it turned out that after having this done I cannot use pattern matching on GridAxis anymore. Trying to use it
import qualified GridAxis as GA
test :: GA.GridAxis -> Bool
test axis = case axis of
GA.RegularAxis -> True
GA.IrregularAxis -> False
gives the following compiler error:
src/Physics/ImplicitEMC.hs:7:15:
Not in scope: data constructor `GA.RegularAxis'
src/Physics/ImplicitEMC.hs:8:15:
Not in scope: data constructor `GA.IrregularAxis'
Is there something to work this around?

You can define constructor pattern synonyms. This lets you use the same name for smart construction and "dumb" pattern matching.
{-# LANGUAGE PatternSynonyms #-}
module GridAxis (GridAxis, pattern RegularAxis, pattern IrregularAxis) where
import Data.List
data GridAxis = RegularAxis_ (Float, Float) Float -- (min, max) delta
| IrregularAxis_ [Float] -- [xs]
-- The line with "<-" defines the matching behavior
-- The line with "=" defines the constructor behavior
pattern RegularAxis minmax delta <- RegularAxis_ minmax delta where
RegularAxis (a, b) dx = RegularAxis_ (min a b, max a b) (abs dx)
pattern IrregularAxis xs <- IrregularAxis_ xs where
IrregularAxis xs = IrregularAxis_ (sort xs)
Now you can do:
module Foo
import GridAxis
foo :: GridAxis -> a
foo (RegularAxis (a, b) d) = ...
foo (IrregularAxis xs) = ...
And also use RegularAxis and IrregularAxis as smart constructors.

This looks as a use case for pattern synonyms.
Basically you don't export the real constructor, but only a "smart" one
{-# LANGUAGE PatternSynonyms #-}
module M(T(), SmartCons, smartCons) where
data T = RealCons Int
-- the users will construct T using this
smartCons :: Int -> T
smartCons n = if even n then RealCons n else error "wrong!"
-- ... and destruct T using this
pattern SmartCons n <- RealCons n
Another module importing M can then use
case someTvalue of
SmartCons n -> use n
and e.g.
let value = smartCons 23 in ...
but can not use the RealCons directly.
If you prefer to stay in basic Haskell, without extensions, you can use a "view type"
module M(T(), smartCons, Tview(..), toView) where
data T = RealCons Int
-- the users will construct T using this
smartCons :: Int -> T
smartCons n = if even n then RealCons n else error "wrong!"
-- ... and destruct T using this
data Tview = Tview Int
toView :: T -> Tview
toView (RealCons n) = Tview n
Here, users have full access to the view type, which can be constructed/destructed freely, but have only a restricted start constructor for the actual type T. Destructing the actual type T is possible by moving to the view type
case toView someTvalue of
Tview n -> use n
For nested patterns, things become more cumbersome, unless you enable other extensions such as ViewPatterns.

Related

OCaml functors (parametrized modules) emulation in Haskell

Is there any recommended way to use typeclasses to emulate OCaml-like parametrized modules?
For an instance, I need the module that implements the complex
generic computation, that may be parmetrized with different
misc. types, functions, etc. To be more specific, let it be
kMeans implementation that could be parametrized with different
types of values, vector types (list, unboxed vector, vector, tuple, etc),
and distance calculation strategy.
For convenience, to avoid crazy amount of intermediate types, I want to
have this computation polymorphic by DataSet class, that contains all
required interfaces. I also tried to use TypeFamilies to avoid a lot
of typeclass parameters (that cause problems as well):
{-# Language MultiParamTypeClasses
, TypeFamilies
, FlexibleContexts
, FlexibleInstances
, EmptyDataDecls
, FunctionalDependencies
#-}
module Main where
import qualified Data.List as L
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U
import Distances
-- contains instances for Euclid distance
-- import Distances.Euclid as E
-- contains instances for Kulback-Leibler "distance"
-- import Distances.Kullback as K
class ( Num (Elem c)
, Ord (TLabel c)
, WithDistance (TVect c) (Elem c)
, WithDistance (TBoxType c) (Elem c)
)
=> DataSet c where
type Elem c :: *
type TLabel c :: *
type TVect c :: * -> *
data TDistType c :: *
data TObservation c :: *
data TBoxType c :: * -> *
observations :: c -> [TObservation c]
measurements :: TObservation c -> [Elem c]
label :: TObservation c -> TLabel c
distance :: TBoxType c (Elem c) -> TBoxType c (Elem c) -> Elem c
distance = distance_
instance DataSet () where
type Elem () = Float
type TLabel () = Int
data TObservation () = TObservationUnit [Float]
data TDistType ()
type TVect () = V.Vector
data TBoxType () v = VectorBox (V.Vector v)
observations () = replicate 10 (TObservationUnit [0,0,0,0])
measurements (TObservationUnit xs) = xs
label (TObservationUnit _) = 111
kMeans :: ( Floating (Elem c)
, DataSet c
) => c
-> [TObservation c]
kMeans s = undefined -- here the implementation
where
labels = map label (observations s)
www = L.map (V.fromList.measurements) (observations s)
zzz = L.zipWith distance_ www www
wtf1 = L.foldl wtf2 0 (observations s)
wtf2 acc xs = acc + L.sum (measurements xs)
qq = V.fromList [1,2,3 :: Float]
l = distance (VectorBox qq) (VectorBox qq)
instance Floating a => WithDistance (TBoxType ()) a where
distance_ xs ys = undefined
instance Floating a => WithDistance V.Vector a where
distance_ xs ys = sqrt $ V.sum (V.zipWith (\x y -> (x+y)**2) xs ys)
This code somehow compiles and work, but it's pretty ugly and hacky.
The kMeans should be parametrized by value type (number, float point number, anything),
box type (vector,list,unboxed vector, tuple may be) and distance calculation strategy.
There are also types for Observation (that's the type of sample provided by user,
there should be a lot of them, measurements that contained in each observation).
So the problems are:
1) If the function does not contains the parametric types in it's signature,
types will not be deduced
2) Still no idea, how to declare typeclass WithDistance to have different instances
for different distance type (Euclid, Kullback, anything else via phantom types).
Right now WithDistance just polymorphic by box type and value type, so if we need
different strategies, we may only put them in different modules and import the required
module. But this is a hack and non-typed approach, right?
All of this may be done pretty easy in OCaml with is't modules. What the proper approach
to implement such things in Haskell?
Typeclasses with TypeFamilies somehow look similar to parametric modules, but they
work different. I really need something like that.
It is really the case that Haskell lacks useful features found in *ML module systems.
There is ongoing effort to extend Haskell's module system: http://plv.mpi-sws.org/backpack/
But I think you can get a bit further without those ML modules.
Your design follows God class anti-pattern and that is why it is anti-modular.
Type class can be useful only if every type can have no more than a single instance of that class. E.g. DataSet () instance fixes type TVect () = V.Vector and you can't easily create similar instance but with TVect = U.Vector.
You need to start with implementing kMeans function, then generalize it by replacing concrete types with type variables and constraining those type variables with type classes when needed.
Here is little example. At first you have some non-general implementation:
kMeans :: Int -> [(Double,Double)] -> [[(Double,Double)]]
kMeans k points = ...
Then you generalize it by distance calculation strategy:
kMeans
:: Int
-> ((Double,Double) -> (Double,Double) -> Double)
-> [(Double,Double)]
-> [[(Double,Double)]]
kMeans k distance points = ...
Now you can generalize it by type of points, but this requires introducing a class that will capture some properties of points that are used by distance computation e.g. getting list of coordinates:
kMeans
:: Point p
=> Int -> (p -> p -> Coord p) -> [p]
-> [[p]]
kMeans k distance points = ...
class Num (Coord p) => Point p where
type Coord p
coords :: p -> [Coord p]
euclidianDistance
:: (Point p, Floating (Coord p))
=> p -> p -> Coord p
euclidianDistance a b
= sum $ map (**2) $ zipWith (-) (coords a) (coords b)
Now you may wish to make it a bit faster by replacing lists with vectors:
kMeans
:: (Point p, DataSet vec p)
=> Int -> (p -> p -> Coord p) -> vec p
-> [vec p]
kMeans k distance points = ...
class DataSet vec p where
map :: ...
foldl' :: ...
instance Unbox p => DataSet U.Vector p where
map = U.map
foldl' = U.foldl'
And so on.
Suggested approach is to generalize various parts of algorithm and constrain those parts with small loosely coupled type classes (when required).
It is a bad style to collect everything in a single monolithic type class.

Omitting constructor arguments in Haskell case statements

Omitting function arguments is a nice tool for concise Haskell code.
h :: String -> Int
h = (4 +) . length
What about omitting data constructor arguments in case statements. The following code might be considered a little grungy, where s and i are the final arguments in A and B but are repeated as the final arguments in the body of each case match.
f :: Foo -> Int
f = \case
A s -> 4 + length s
B i -> 2 + id i
Is there a way to omit such arguments in case pattern matching? For constructors with a large number of arguments, this would radically shorten code width. E.g. the following pseudo code.
g :: Foo -> Int
g = \case
{- match `A` constructor -> function application to A's arguments -}
A -> (4 +) . length
{- match `B` constructor -> function application to B's arguments -}
B -> (2 +) . id
The GHC extension RecordWildCards lets you concisely bring all the fields of a constructor into scope (of course, this requires you to give names to those fields).
{-# LANGUAGE LambdaCase, RecordWildCards #-}
data Foo = Foo {field1, field2 :: Int} | Bar {field1 :: Int}
baz = \case
Foo{..} -> 4 + field2
Bar{..} -> 2 + field1
-- plus it also "sucks in" fields from a scope
mkBar400 = let field1 = 400 in Bar{..}
`
You can always refactor case statements on constructors into a single function so that from then on you only pass your concise function definitions as arguments to these specific functions. Allow me to illustrate.
Consider the Maybe a datatype:
data Maybe a = Nothing | Just a
Should you now need to define a function f :: Maybe a -> b (for some fixed b and perhaps also a), instead of writing it like
f Nothing = this
f (Just x) = that x
you could start by first defining a function
maybe f _ Nothing = f
maybe _ g (Just x) = g x
and then f can by defined as maybe this that. Pretty much as what happens with all the familiar recursion patterns.
This way you're effectively refactoring out case statements. The code gets arguably cleaner and it does not require language extensions.

Different types in case expression result in Haskell

I'm trying to implement some kind of message parser in Haskell, so I decided to use types for message types, not constructors:
data DebugMsg = DebugMsg String
data UpdateMsg = UpdateMsg [String]
.. and so on. I belive it is more useful to me, because I can define typeclass, say, Msg for message with all information/parsers/actions related to this message.
But I have problem here. When I try to write parsing function using case:
parseMsg :: (Msg a) => Int -> Get a
parseMsg code =
case code of
1 -> (parse :: Get DebugMsg)
2 -> (parse :: Get UpdateMsg)
..type of case result should be same in all branches. Is there any solution? And does it even possible specifiy only typeclass for function result and expect it to be fully polymorphic?
Yes, all the right hand sides of all your subcases must have the exact same type; and this type must be the same as the type of the whole case expression. This is a feature; it's required for the language to be able to guarantee at compilation time that there cannot be any type errors at runtime.
Some of the comments on your question mention that the simplest solution is to use a sum (a.k.a. variant) type:
data ParserMsg = DebugMsg String | UpdateMsg [String]
A consequence of this is that the set of alternative results is defined ahead of time. This is sometimes an upside (your code can be certain that there are no unhandled subcases), sometimes a downside (there is a finite number of subcases and they are determined at compilation time).
A more advanced solution in some cases—which you might not need, but I'll just throw it in—is to refactor the code to use functions as data. The idea is that you create a datatype that has functions (or monadic actions) as its fields, and then different behaviors = different functions as record fields.
Compare these two styles with this example. First, specifying different cases as a sum (this uses GADTs, but should be simple enough to understand):
{-# LANGUAGE GADTs #-}
import Data.Vector (Vector, (!))
import qualified Data.Vector as V
type Size = Int
type Index = Int
-- | A 'Frame' translates between a set of values and consecutive array
-- indexes. (Note: this simplified implementation doesn't handle duplicate
-- values.)
data Frame p where
-- | A 'SimpleFrame' is backed by just a 'Vector'
SimpleFrame :: Vector p -> Frame p
-- | A 'ProductFrame' is a pair of 'Frame's.
ProductFrame :: Frame p -> Frame q -> Frame (p, q)
getSize :: Frame p -> Size
getSize (SimpleFrame v) = V.length v
getSize (ProductFrame f g) = getSize f * getSize g
getIndex :: Frame p -> Index -> p
getIndex (SimpleFrame v) i = v!i
getIndex (ProductFrame f g) ij =
let (i, j) = splitIndex (getSize f, getSize g) ij
in (getIndex f i, getIndex g j)
pointIndex :: Eq p => Frame p -> p -> Maybe Index
pointIndex (SimpleFrame v) p = V.elemIndex v p
pointIndex (ProductFrame f g) (p, q) =
joinIndexes (getSize f, getSize g) (pointIndex f p) (pointIndex g q)
joinIndexes :: (Size, Size) -> Index -> Index -> Index
joinIndexes (_, rsize) i j = i * rsize + j
splitIndex :: (Size, Size) -> Index -> (Index, Index)
splitIndex (_, rsize) ij = (ij `div` rsize, ij `mod` rsize)
In this first example, a Frame can only ever be either a SimpleFrame or a ProductFrame, and every Frame function must be defined to handle both cases.
Second, datatype with function members (I elide code common to both examples):
data Frame p = Frame { getSize :: Size
, getIndex :: Index -> p
, pointIndex :: p -> Maybe Index }
simpleFrame :: Eq p => Vector p -> Frame p
simpleFrame v = Frame (V.length v) (v!) (V.elemIndex v)
productFrame :: Frame p -> Frame q -> Frame (p, q)
productFrame f g = Frame newSize getI pointI
where newSize = getSize f * getSize g
getI ij = let (i, j) = splitIndex (getSize f, getSize g) ij
in (getIndex f i, getIndex g j)
pointI (p, q) = joinIndexes (getSize f, getSize g)
(pointIndex f p)
(pointIndex g q)
Here the Frame type takes the getIndex and pointIndex operations as data members of the Frame itself. There isn't a fixed compile-time set of subcases, because the behavior of a Frame is determined by its element functions, which are supplied at runtime. So without having to touch those definitions, we could add:
import Control.Applicative ((<|>))
concatFrame :: Frame p -> Frame p -> Frame p
concatFrame f g = Frame newSize getI pointI
where newSize = getSize f + getSize g
getI ij | ij < getSize f = ij
| otherwise = ij - getSize f
pointI p = getPoint f p <|> fmap (+(getSize f)) (getPoint g p)
I call this second style "behavioral types," but that really is just me.
Note that type classes in GHC are implemented similarly to this—there is a hidden "dictionary" argument passed around, and this dictionary is a record whose members are implementations for the class methods:
data ShowDictionary a { primitiveShow :: a -> String }
stringShowDictionary :: ShowDictionary String
stringShowDictionary = ShowDictionary { primitiveShow = ... }
-- show "whatever"
-- ---> primitiveShow stringShowDictionary "whatever"
You could accomplish something like this with existential types, however it wouldn't work how you want it to, so you really shouldn't.
Doing it with normal polymorphism, as you have in your example, won't work at all. What your type says is that the function is valid for all a--that is, the caller gets to choose what kind of message to receive. However, you have to choose the message based on the numeric code, so this clearly won't do.
To clarify: all standard Haskell type variables are universally quantified by default. You can read your type signature as ∀a. Msg a => Int -> Get a. What this says is that the function is define for every value of a, regardless of what the argument may be. This means that it has to be able to return whatever particular a the caller wants, regardless of what argument it gets.
What you really want is something like ∃a. Msg a => Int -> Get a. This is why I said you could do it with existential types. However, this is relatively complicated in Haskell (you can't quite write a type signature like that) and will not actually solve your problem correctly; it's just something to keep in mind for the future.
Fundamentally, using classes and types like this is not very idiomatic in Haskell, because that's not what classes are meant to do. You would be much better off sticking to a normal algebraic data type for your messages.
I would have a single type like this:
data Message = DebugMsg String
| UpdateMsg [String]
So instead of having a parse function per type, just do the parsing in the parseMsg function as appropriate:
parseMsg :: Int -> String -> Message
parseMsg n msg = case n of
1 -> DebugMsg msg
2 -> UpdateMsg [msg]
(Obviously fill in whatever logic you actually have there.)
Essentially, this is the classical use for normal algebraic data types. There is no reason to have different types for the different kinds of messages, and life is much easier if they have the same type.
It looks like you're trying to emulate sub-typing from other languages. As a rule of thumb, you use algebraic data types in place of most of the uses of sub-types in other languages. This is certainly one of those cases.

What type signature do I need to allow a list of functions to be converted to haskell code? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why is such a function definition not allowed in haskell?
I made a haskell function called funlist. What it does is it takes a starting value, and a list of functions, and applies all of the functions in the list to the starting value.
funlist thing [function] = function thing
funlist thing (function:functions) = funlist (function thing) functions
funlist _ _ = error "need a list of functions"
The problem with this function is that it has a type of funlist :: t -> [t -> t] -> t. That type means that while ghc will allow a list of functions that don't convert the starting value to a completely different type (e.g [sin,cos,tan] will be allowed), a function that converts the starting value to a different type (e.g show) will generate an error because that function doesn't match the type signature.
This isn't how the function should work. It should be able to take a list of functions that change the starting values type (e.g. [sin,show]). This function basically converts funlist 5 [sin,cos,tan,isInfinite,show] to show $ isInfinite $ tan $ cos $ sin $ 5, and while the latter works, the former doesn't.
Is there any way that I can get this function to work properly?
EDIT: I know about . and >>>, I'm just wondering if there's a way to make this work.
You can write what you want with a GADT:
{-# LANGUAGE GADTs #-}
module Funlist where
data F x y where
Id :: F a a
Ap :: (a->b) -> F b c -> F a c
-- A very round about way to write f x = x + x
f1 :: Int -> Char
f1 = toEnum
f2 :: Char -> String
f2 x = x:x:[]
f3 :: String -> [Int]
f3 = map fromEnum
f4 :: [Int] -> Integer
f4 = foldr (+) 0 . map toInteger
f_list :: F Int Integer
f_list = Ap f1 (Ap f2 (Ap f3 (Ap f4 Id)))
ap :: F a b -> a -> b
ap Id x = x
ap (Ap f gs) x = ap gs (f x)
Now ap f_list 65 is 130
This does not work with normal functions/normal lists in Haskell, since it requires a dynamically typed language, and not a statically typed language like Haskell. The funlist function can't have a different type depending on what the contents of the function list is at runtime; its type must be known at compile-time. Further, the compiler must be able to check that the function chain is valid, so that you can't use the list [tan, show, sin] for example.
There are two solutions to this problem.
You can either use heterogenous lists. These lists can store lists where each element is a different type. You can then check the constraint that each element must be a function and that one elements return type must be the next function's parameter type. This can become very difficult very quickly.
You can also use Data.Dynamic to let your functions take and return dynamic types. You have to perform some dynamic type casts in that case.
If all you're going to do with this list of functions is apply them to a single value in a pipeline, then instead of writing and calling your funlist function, do this:
show . isInfinite . tan . cos . sin $ 5
or, if you don't want the list reversed in your code, do this:
import Control.Arrow (>>>)
(sin >>> cos >>> tan >>> isInfinite >>> show) 5
Functions in Haskell, in general, have types that look like a -> b, for some choice of a and b. In your case, you have a list [f0, ..., fn] of functions, and you want to compute this:
funlist [f0, ..., fn] x == f0 (funlist [f1, ..., fn] x)
== f0 (f1 (funlist [f2, ..., fn] x))
...
== f0 (f1 (... (fn x)))
The t -> t problem you're having is a consequence of these two things:
This computation requires the argument type of f0 to be the return type of f1, the argument type of f1 to be the return type of f2, and so on: f0 :: y -> z, f1 :: x -> y, ..., fn :: a -> b.
But you're putting all those functions in a list, and all the elements of a list in Haskell must have the same type.
These two, taken together, imply that the list of functions used in funlist must have type [t -> t], because that's the only way both conditions can be met at the same time.
Other than that, dave4420's answer is the best simple answer, IMO: use function composition. If you can't use it because the computation to be done is only known at runtime, then you want to have some data structure more complex than the list to represent the possible computations. Chris Kuklewicz presents a very generic solution for that, but I'd normally do something custom-made for the specific problem area at hand.
Also good to know that your funlist can be written like this:
funlist :: a -> [a -> a] -> a
funlist x fs = foldr (.) id fs x
Short answer: No, there's no way to do what you want with lists (in a sensible way, at least).
The reason is that lists in Haskell are always homogenous, i.e. each element of a list must have the same type. The functions you want to put to the list have types:
sin :: Floating a => a -> a
isInfinite :: Floating b => b -> Bool
show :: Show c => c -> String
So you can't just put the functions in the same list. Your two main options are to:
Use a structure other than list (e.g. HList or a custom GADT)
Use dynamic typing
Since the other answers already gave GADT examples, here's how you could implement your function using dynamic types:
import Data.Dynamic
funlist :: Dynamic -> [Dynamic] -> Dynamic
funlist thing (function:functions) = funlist (dynApp function thing) functions
funlist thing [] = thing
However, using dynamic types causes some boilerplate, because you have to convert between static and dynamic types. So, to call the function, you'd need to write
funlist (toDyn 5) [toDyn sin, toDyn cos, toDyn tan, toDyn isInfinite, toDyn show]
And unfortunately, even that is not enough. The next problem is that dynamic values must have homomorphic types, so for example instead of the function show :: Show a => a -> String you need to manually specify e.g. the concrete type show :: Bool -> String, so the above becomes:
funlist (toDyn (5::Double)) [toDyn sin, toDyn cos, toDyn tan, toDyn isInfinite,
toDyn (show :: Bool -> String)]
What's more, the result of the function is another dynamic value, so we need to convert it back to a static value if we want to use it in regular functions.
fromDyn (funlist (toDyn (5::Double)) [toDyn sin, toDyn cos, toDyn tan,
toDyn isInfinite, toDyn (show :: Bool -> String)]) ""
What you want works in Haskell, but it's not a list. It is a function composition and can actually be wrapped in a GADT:
import Control.Arrow
import Control.Category
import Prelude hiding ((.), id)
data Chain :: * -> * -> * where
Chain :: (a -> c) -> Chain c b -> Chain a b
Id :: Chain a a
apply :: Chain a b -> a -> b
apply (Chain f k) x = apply k (f x)
apply Id x = x
Now you can inspect the structure of the function chain to some extent. There isn't much you can find out, but you can add further meta information to the Chain constructor, if you need more.
The type also forms an interesting category that preserves the additional information:
instance Category Chain where
id = Id
Id . c = c
c . Id = c
c2 . Chain f1 k1 = Chain f1 (c2 . k1)
instance Arrow Chain where
arr f = Chain f Id
first (Chain f c) = Chain (first f) (first c)
first Id = Id
There where some answers using GADTs, which is a good way to do such things. What I want to add here is that the structure used in these answers already exists in a more general fashion: it's called a thrist ("type threaded list"):
Prelude Data.Thrist> let fs = Cons (show :: Char -> String) (Cons length Nil)
Prelude Data.Thrist> let f = foldl1Thrist (flip (.)) fs
Prelude Data.Thrist> :t fs
fs :: Thrist (->) Char Int
Prelude Data.Thrist> :t f
f :: Char -> Int
Prelude Data.Thrist> f 'a'
3
Of course, you could also use foldl1Thrist (>>>) fs instead. Note that thrists form a category, an arrow and a monoid (with appendThrist).

Sort by constructor ignoring (part of) value

Suppose I have
data Foo = A String Int | B Int
I want to take an xs :: [Foo] and sort it such that all the As are at the beginning, sorted by their strings, but with the ints in the order they appeared in the list, and then have all the Bs at the end, in the same order they appeared.
In particular, I want to create a new list containg the first A of each string and the first B.
I did this by defining a function taking Foos to (Int, String)s and using sortBy and groupBy.
Is there a cleaner way to do this? Preferably one that generalizes to at least 10 constructors.
Typeable, maybe? Something else that's nicer?
EDIT: This is used for processing a list of Foos that is used elsewhere. There is already an Ord instance which is the normal ordering.
You can use
sortBy (comparing foo)
where foo is a function that extracts the interesting parts into something comparable (e.g. Ints).
In the example, since you want the As sorted by their Strings, a mapping to Int with the desired properties would be too complicated, so we use a compound target type.
foo (A s _) = (0,s)
foo (B _) = (1,"")
would be a possible helper. This is more or less equivalent to Tikhon Jelvis' suggestion, but it leaves space for the natural Ord instance.
To make it easier to build comparison function for ADTs with large number of constructors, you can map values to their constructor index with SYB:
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics
data Foo = A String Int | B Int deriving (Show, Eq, Typeable, Data)
cIndex :: Data a => a -> Int
cIndex = constrIndex . toConstr
Example:
*Main Data.Generics> cIndex $ A "foo" 42
1
*Main Data.Generics> cIndex $ B 0
2
Edit:After re-reading your question, I think the best option is to make Foo an instance of Ord. I do not think there is any way to do this automatically that will act the way you want (just using deriving will create different behavior).
Once Foo is an instance of Ord, you can just use sort from Data.List.
In your exact example, you can do something like this:
data Foo = A String Int | B Int deriving (Eq)
instance Ord Foo where
(A _ _) <= (B _) = True
(A s _) <= (A s' _) = s <= s'
(B _) <= (B _) = True
When something is an instance of Ord, it means the data type has some ordering. Once we know how to order something, we can use a bunch of existing functions (like sort) on it and it will behave how you want. Anything in Ord has to be part of Eq, which is what the deriving (Eq) bit does automatically.
You can also derive Ord. However, the behavior will not be exactly what you want--it will order by all of the fields if it has to (e.g. it will put As with the same string in order by their integers).
Further edit: I was thinking about it some more and realized my solution is probably semantically wrong.
An Ord instance is a statement about your whole data type. For example, I'm saying that Bs are always equal with each other when the derived Eq instance says otherwise.
If the data your representing always behaves like this (that is, Bs are all equal and As with the same string are all equal) then an Ord instance makes sense. Otherwise, you should not actually do this.
However, you can do something almost exactly like this: write your own special compare function (Foo -> Foo -> Ordering) that encapsulates exactly what you want to do then use sortBy. This properly codifies that your particular sorting is special rather than the natural ordering of the data type.
You could use some template haskell to fill in the missing transitive cases. The mkTransitiveLt creates the transitive closure of the given cases (if you order them least to greatest). This gives you a working less-than, which can be turned into a function that returns an Ordering.
{-# LANGUAGE TemplateHaskell #-}
import MkTransitiveLt
import Data.List (sortBy)
data Foo = A String Int | B Int | C | D | E deriving(Show)
cmp a b = $(mkTransitiveLt [|
case (a, b) of
(A _ _, B _) -> True
(B _, C) -> True
(C, D) -> True
(D, E) -> True
(A s _, A s' _) -> s < s'
otherwise -> False|])
lt2Ord f a b =
case (f a b, f b a) of
(True, _) -> LT
(_, True) -> GT
otherwise -> EQ
main = print $ sortBy (lt2Ord cmp) [A "Z" 1, A "A" 1, B 1, A "A" 0, C]
Generates:
[A "A" 1,A "A" 0,A "Z" 1,B 1,C]
mkTransitiveLt must be defined in a separate module:
module MkTransitiveLt (mkTransitiveLt)
where
import Language.Haskell.TH
mkTransitiveLt :: ExpQ -> ExpQ
mkTransitiveLt eq = do
CaseE e ms <- eq
return . CaseE e . reverse . foldl go [] $ ms
where
go ms m#(Match (TupP [a, b]) body decls) = (m:ms) ++
[Match (TupP [x, b]) body decls | Match (TupP [x, y]) _ _ <- ms, y == a]
go ms m = m:ms

Resources