Sort by constructor ignoring (part of) value - haskell

Suppose I have
data Foo = A String Int | B Int
I want to take an xs :: [Foo] and sort it such that all the As are at the beginning, sorted by their strings, but with the ints in the order they appeared in the list, and then have all the Bs at the end, in the same order they appeared.
In particular, I want to create a new list containg the first A of each string and the first B.
I did this by defining a function taking Foos to (Int, String)s and using sortBy and groupBy.
Is there a cleaner way to do this? Preferably one that generalizes to at least 10 constructors.
Typeable, maybe? Something else that's nicer?
EDIT: This is used for processing a list of Foos that is used elsewhere. There is already an Ord instance which is the normal ordering.

You can use
sortBy (comparing foo)
where foo is a function that extracts the interesting parts into something comparable (e.g. Ints).
In the example, since you want the As sorted by their Strings, a mapping to Int with the desired properties would be too complicated, so we use a compound target type.
foo (A s _) = (0,s)
foo (B _) = (1,"")
would be a possible helper. This is more or less equivalent to Tikhon Jelvis' suggestion, but it leaves space for the natural Ord instance.

To make it easier to build comparison function for ADTs with large number of constructors, you can map values to their constructor index with SYB:
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics
data Foo = A String Int | B Int deriving (Show, Eq, Typeable, Data)
cIndex :: Data a => a -> Int
cIndex = constrIndex . toConstr
Example:
*Main Data.Generics> cIndex $ A "foo" 42
1
*Main Data.Generics> cIndex $ B 0
2

Edit:After re-reading your question, I think the best option is to make Foo an instance of Ord. I do not think there is any way to do this automatically that will act the way you want (just using deriving will create different behavior).
Once Foo is an instance of Ord, you can just use sort from Data.List.
In your exact example, you can do something like this:
data Foo = A String Int | B Int deriving (Eq)
instance Ord Foo where
(A _ _) <= (B _) = True
(A s _) <= (A s' _) = s <= s'
(B _) <= (B _) = True
When something is an instance of Ord, it means the data type has some ordering. Once we know how to order something, we can use a bunch of existing functions (like sort) on it and it will behave how you want. Anything in Ord has to be part of Eq, which is what the deriving (Eq) bit does automatically.
You can also derive Ord. However, the behavior will not be exactly what you want--it will order by all of the fields if it has to (e.g. it will put As with the same string in order by their integers).
Further edit: I was thinking about it some more and realized my solution is probably semantically wrong.
An Ord instance is a statement about your whole data type. For example, I'm saying that Bs are always equal with each other when the derived Eq instance says otherwise.
If the data your representing always behaves like this (that is, Bs are all equal and As with the same string are all equal) then an Ord instance makes sense. Otherwise, you should not actually do this.
However, you can do something almost exactly like this: write your own special compare function (Foo -> Foo -> Ordering) that encapsulates exactly what you want to do then use sortBy. This properly codifies that your particular sorting is special rather than the natural ordering of the data type.

You could use some template haskell to fill in the missing transitive cases. The mkTransitiveLt creates the transitive closure of the given cases (if you order them least to greatest). This gives you a working less-than, which can be turned into a function that returns an Ordering.
{-# LANGUAGE TemplateHaskell #-}
import MkTransitiveLt
import Data.List (sortBy)
data Foo = A String Int | B Int | C | D | E deriving(Show)
cmp a b = $(mkTransitiveLt [|
case (a, b) of
(A _ _, B _) -> True
(B _, C) -> True
(C, D) -> True
(D, E) -> True
(A s _, A s' _) -> s < s'
otherwise -> False|])
lt2Ord f a b =
case (f a b, f b a) of
(True, _) -> LT
(_, True) -> GT
otherwise -> EQ
main = print $ sortBy (lt2Ord cmp) [A "Z" 1, A "A" 1, B 1, A "A" 0, C]
Generates:
[A "A" 1,A "A" 0,A "Z" 1,B 1,C]
mkTransitiveLt must be defined in a separate module:
module MkTransitiveLt (mkTransitiveLt)
where
import Language.Haskell.TH
mkTransitiveLt :: ExpQ -> ExpQ
mkTransitiveLt eq = do
CaseE e ms <- eq
return . CaseE e . reverse . foldl go [] $ ms
where
go ms m#(Match (TupP [a, b]) body decls) = (m:ms) ++
[Match (TupP [x, b]) body decls | Match (TupP [x, y]) _ _ <- ms, y == a]
go ms m = m:ms

Related

Using data constructor as a function parameter

I am making my way through "Haskell Programming..." and, in Chapter 10, have been working with a toy database. The database is defined as:
data DatabaseItem = DBString String
| DBNumber Integer
| DBDate UTCTime
deriving (Eq, Ord, Show)
and, given a database of the form [databaseItem], I am asked to write a function
dbNumberFilter :: [DatabaseItem] -> [Integer]
that takes a list of DatabaseItems, filters them for DBNumbers, and returns a list the of Integer values stored in them.
I solved that with:
dbNumberFilter db = foldr selectDBNumber [] db
where
selectDBNumber (DBNumber a) b = a : b
selectDBNumber _ b = b
Obviously, I can write an almost identical to extract Strings or UTCTTimes, but I am wondering if there is a way to create a generic filter that can extract a list of Integers, Strings, by passing the filter a chosen data constructor. Something like:
dbGenericFilter :: (a -> DataBaseItem) -> [DatabaseItem] -> [a]
dbGenericFilter DBICon db = foldr selectDBDate [] db
where
selectDBDate (DBICon a) b = a : b
selectDBDate _ b = b
where by passing DBString, DBNumber, or DBDate in the DBICon parameter, will return a list of Strings, Integers, or UTCTimes respectively.
I can't get the above, or any variation of it that I can think of, to work. But is there a way of achieving this effect?
You can't write a function so generic that it just takes a constructor as its first argument and then does what you want. Pattern matches are not first class in Haskell - you can't pass them around as arguments. But there are things you could do to write this more simply.
One approach that isn't really any more generic, but is certainly shorter, is to make use of the fact that a failed pattern match in a list comprehension skips the item:
dbNumberFilter db = [n | DBNumber n <- db]
If you prefer to write something generic, such that dbNUmberFilter = genericFilter x for some x, you can extract the concept of "try to match a DBNumber" into a function:
import Data.Maybe (mapMaybe)
genericFilter :: (DatabaseItem -> Maybe a) -> [DatabaseItem] -> [a]
genericFilter = mapMaybe
dbNumberFilter = genericFilter getNumber
where getNumber (DBNumber n) = Just n
getNumber _ = Nothing
Another somewhat relevant generic thing you could do would be to define the catamorphism for your type, which is a way of abstracting all possible pattern matches for your type into a single function:
dbCata :: (String -> a)
-> (Integer -> a)
-> (UTCTime -> a)
-> DatabaseItem -> a
dbCata s i t (DBString x) = s x
dbCata s i t (DBNumber x) = i x
dbCata s i t (DBDate x) = t x
Then you can write dbNumberFilter with three function arguments instead of a pattern match:
dbNumberFilter :: [DatabaseItem] -> [Integer]
dbNumberFilter = (>>= dbCata mempty pure mempty)

What's a better way of managing large Haskell records?

Replacing fields names with letters, I have cases like this:
data Foo = Foo { a :: Maybe ...
, b :: [...]
, c :: Maybe ...
, ... for a lot more fields ...
} deriving (Show, Eq, Ord)
instance Writer Foo where
write x = maybeWrite a ++
listWrite b ++
maybeWrite c ++
... for a lot more fields ...
parser = permute (Foo
<$?> (Nothing, Just `liftM` aParser)
<|?> ([], bParser)
<|?> (Nothing, Just `liftM` cParser)
... for a lot more fields ...
-- this is particularly hideous
foldl1 merge [foo1, foo2, ...]
merge (Foo a b c ...seriously a lot more...)
(Foo a' b' c' ...) =
Foo (max a a') (b ++ b') (max c c') ...
What techniques would allow me to better manage this growth?
In a perfect world a, b, and c would all be the same type so I could keep them in a list, but they can be many different types. I'm particularly interested in any way to fold the records without needing the massive patterns.
I'm using this large record to hold the different types resulting from permutation parsing the vCard format.
Update
I've implemented both the generics and the foldl approaches suggested below. They both work, and they both reduce three large field lists to one.
Datatype-generic programming techniques can be used to transform all the fields of a record in some "uniform" sort of way.
Perhaps all the fields in the record implement some typeclass that we want to use (the typical example is Show). Or perhaps we have another record of "similar" shape that contains functions, and we want to apply each function to the corresponding field of the original record.
For these kinds of uses, the generics-sop library is a good option. It expands the default Generics functionality of GHC with extra type-level machinery that provides analogues of functions like sequence or ap, but which work over all the fields of a record.
Using generics-sop, I tried to create a slightly less verbose version of your merge funtion. Some preliminary imports:
{-# language TypeOperators #-}
{-# language DeriveGeneric #-}
{-# language TypeFamilies #-}
{-# language DataKinds #-}
import Control.Applicative (liftA2)
import qualified GHC.Generics as GHC
import Generics.SOP
A helper function that lifts a binary operation to a form useable by the functions of generics-sop:
fn_2' :: (a -> a -> a) -> (I -.-> (I -.-> I)) a -- I is simply an Identity functor
fn_2' = fn_2 . liftA2
A general merge function that takes a vector of operators and works on any single-constructor record that derives Generic:
merge :: (Generic a, Code a ~ '[ xs ]) => NP (I -.-> (I -.-> I)) xs -> a -> a -> a
merge funcs reg1 reg2 =
case (from reg1, from reg2) of
(SOP (Z np1), SOP (Z np2)) ->
let npResult = funcs `hap` np1 `hap` np2
in to (SOP (Z npResult))
Code is a type family that returns a type-level list of lists describing the structure of a datatype. The outer list is for constructors, the inner lists contain the types of the fields for each constructor.
The Code a ~ '[ xs ] part of the constraint says "the datatype can only have one constructor" by requiring the outer list to have exactly one element.
The (SOP (Z _) pattern matches extract the (heterogeneus) vector of field values from the record's generic representation. SOP stands for "sum-of-products".
A concrete example:
data Person = Person
{
name :: String
, age :: Int
} deriving (Show,GHC.Generic)
instance Generic Person -- this Generic is from generics-sop
mergePerson :: Person -> Person -> Person
mergePerson = merge (fn_2' (++) :* fn_2' (+) :* Nil)
The Nil and :* constructors are used to build the vector of operators (the type is called NP, from n-ary product). If the vector doesn't match the number of fields in the record, the program won't compile.
Update. Given that the types in your record are highly uniform, an alternative way of creating the vector of operations is to define instances of an auxiliary typeclass for each field type, and then use the hcpure function:
class Mergeable a where
mergeFunc :: a -> a -> a
instance Mergeable String where
mergeFunc = (++)
instance Mergeable Int where
mergeFunc = (+)
mergePerson :: Person -> Person -> Person
mergePerson = merge (hcpure (Proxy :: Proxy Mergeable) (fn_2' mergeFunc))
The hcliftA2 function (that combines hcpure, fn_2 and hap) could be used to simplify things further.
Some suggestions:
(1) You can use the RecordWildCards extension to automatically
unpack a record into variables. Doesn't help if you need to unpack
two records of the same type, but it's a useful to keep in mind.
Oliver Charles has a nice blog post on it: (link)
(2) It appears your example application is performing a fold over the records.
Have a look at Gabriel Gonzalez's foldl package. There is also a blog post: (link)
Here is a example of how you might use it with a record like:
data Foo = Foo { _a :: Int, _b :: String }
The following code computes the maximum of the _a fields and the
concatenation of the _b_ fields.
import qualified Control.Foldl as L
import Data.Profunctor
data Foo = Foo { _a :: Int, _b :: String }
deriving (Show)
fold_a :: L.Fold Foo Int
fold_a = lmap _a (L.Fold max 0 id)
fold_b :: L.Fold Foo String
fold_b = lmap _b (L.Fold (++) "" id)
fold_foos :: L.Fold Foo Foo
fold_foos = Foo <$> fold_a <*> fold_b
theFoos = [ Foo 1 "a", Foo 3 "b", Foo 2 "c" ]
test = L.fold fold_foos theFoos
Note the use of the Profunctor function lmap to extract out
the fields we want to fold over. The expression:
L.Fold max 0 id
is a fold over a list of Ints (or any Num instance), and therefore:
lmap _a (L.Fold max 0 id)
is the same fold but over a list of Foo records where we use _a
to produce the Ints.

Combining patterns

Consider the following data type and pattern synonyms:
{-# LANGUAGE PatternSynonyms, NamedFieldPuns #-}
data Foo = Foo {
a :: Int
, b :: String
, c :: Maybe Bool
}
pattern Bar a b <- Foo { a, b }
pattern Baz c <- Foo { c }
I'd like to match a Foo but get all of a, b, and c. Something like this (invalid Haskell):
showit :: Foo -> String
showit (Bar a b & Baz c) = show a ++ b ++ show c
One option is to use a ViewPattern:
dup :: a -> (a, a)
dup a = (a, a)
showall' :: Foo -> String
showall' (dup -> (Bar a b, Baz c)) = show a ++ b ++ show c
But this results in a non-exhaustive match warning. But we know that Bar and Baz are irrefutable so matching each is also irrefutable.
How can this be expressed without a compiler warning?
The motivation for this is to have fine-grained pattern synonyms against fields of a large data type and allow callers to extract just the desired fields similar to records with NamedFieldPuns. Pattern synonyms don't yet support record syntax, but it's in the works : https://ghc.haskell.org/trac/ghc/ticket/8582
In my case I cannot expose the constructor from the module since I'm using the "smart-constructor" pattern, and hence cannot give callers the benefit of record pattern matching with NamedFieldPuns.
See https://stackoverflow.com/a/25161749/3198498 as the inspiration. I'm trying to expand the idea in that answer to allow callers to arbitrarily extract n of m fields, for a largish m.
Edit: It turns out there is a rather broad problem with PatternSynonyms and exhaustiveness checks : https://ghc.haskell.org/trac/ghc/ticket/10339
This seems to make pattern synonyms used as field extractors very unpleasant when compiling with warnings enabled.
Not sure if this is at all helpful, but I will give it a shot. Would either of these solutions be acceptable?
showit :: Foo -> String
showit x#(Bar a b) = show a ++ b ++ show (c x)
showit' :: Foo -> String
showit' x#(Bar a b) = show a ++ b ++ showthat x
where
showthat (Baz c) = show c

data type with a default field and that needs a function that works with it

Say, I have a data type
data FooBar a = Foo String Char [a]
| Bar String Int [a]
I need to create values of this type and give empty list as the second field:
Foo "hello" 'a' []
or
Bar "world" 1 []
1) I do this everywhere in my code and I think it would be nice if I could omit the empty list part somehow and have the empty list assigned implicitly. Is this possible? Something similar to default function arguments in other languages.
2) Because of this [] "default" value, I often need to have a partial constructor application that results in a function that takes the first two values:
mkFoo x y = Foo x y []
mkBar x y = Bar x y []
Is there a "better" (more idiomatic, etc) way to do it? to avoid defining new functions?
3) I need a way to add things to the list:
add (Foo u v xs) x = Foo u v (x:xs)
add (Bar u v xs) x = Bar u v (x:xs)
Is this how it is done idiomatically? Just a general purpose function?
As you see I am a beginner, so maybe these questions make little sense. Hope not.
I'll address your questions one by one.
Default arguments do not exist in Haskell. They are simply not worth the added complexity and loss of compositionally. Being a functional language, you do a lot more function manipulation in Haskell, so funkiness like default arguments would be tough to handle.
One thing I didn't realize when I started Haskell is that data constructors are functions just like everything else. In your example,
Foo :: String -> Char -> [a] -> FooBar a
Thus you can write functions for filling in various arguments of other functions, and then those functions will work with Foo or Bar or whatever.
fill1 :: a -> (a -> b) -> b
fill1 a f = f a
--Note that fill1 = flip ($)
fill2 :: b -> (a -> b -> c) -> (a -> c)
--Equivalently, fill2 :: b -> (a -> b -> c) -> a -> c
fill2 b f = \a -> f a b
fill3 :: c -> (a -> b -> c -> d) -> (a -> b -> d)
fill3 c f = \a b -> f a b c
fill3Empty :: (a -> b -> [c] -> d) -> (a -> b -> d)
fill3Empty f = fill3 [] f
--Now, we can write
> fill3Empty Foo x y
Foo x y []
The lens package provides elegant solutions to questions like this. However, you can tell at a glance that this package is enormously complicated. Here is the net result of how you would call the lens package:
_list :: Lens (FooBar a) (FooBar b) [a] [b]
_list = lens getter setter
where getter (Foo _ _ as) = as
getter (Bar _ _ as) = as
setter (Foo s c _) bs = Foo s c bs
setter (Bar s i _) bs = Bar s i bs
Now we can do
> over _list (3:) (Foo "ab" 'c' [2,1])
Foo "ab" 'c' [3,2,1]
Some explanation: the lens function produces a Lens type when given a getter and a setter for some type. Lens s t a b is a type that says "s holds an a and t holds a b. Thus, if you give me a function a -> b, I can give you a function s -> t". That is exactly what over does: you provide it a lens and a function (in our case, (3:) was a function that adds 3 to the front of a List) and it applies the function "where the lens indicates". This is very similar to a functor, however, we have significantly more freedom (in this example, the functor instance would be obligated to change every element of the lists, not operate on the lists themselves).
Note that our new _list lens is very generic: it works equally well over Foo and Bar and the lens package provides many functions other than over for doing magical things.
The idiomatic thing is to take those parameters of a function or constructor that you commonly want to partially apply, and move them toward the beginning:
data FooBar a = Foo [a] String Char
| Bar [a] String Int
foo :: String -> Char -> FooBar a
foo = Foo []
bar :: String -> Int -> FooBar a
bar = Bar []
Similarly, reordering the parameters to add lets you partially apply add to get functions of type FooBar a -> FooBar a, which can be easily composed:
add :: a -> FooBar a -> FooBar a
add x (Foo xs u v) = Foo (x:xs) u v
add123 :: FooBar Int -> FooBar Int
add123 = add 1 . add 2 . add 3
add123 (foo "bar" 42) == Foo [1, 2, 3] "bar" 42
(2) and (3) are perfectly normal and idiomatic ways of doing such things. About (2) in particular, one expression you will occasionally hear is "smart constructor". That just means a function like your mkFoo/mkBar that produces a FooBar a (or a Maybe (FooBar a) etc.) with some extra logic to ensure only reasonable values can be constructed.
Here are some additional tricks that might (or might not!) make sense, depending on what you are trying to do with FooBar.
If you use Foo values and Barvalues in similar ways most of the time (i.e. the difference between having the Char field and the Int one is a minor detail), it makes sense to factor out the similarities and use a single constructor:
data FooBar a = FooBar String FooBarTag [a]
data FooBarTag = Foo Char | Bar Int
Beyond avoiding case analysis when you don't care about the FooBarTag, that allows you to safely use record syntax (records and types with multiple constructors do not mix well).
data FooBar a = FooBar
{ fooBarName :: String
, fooBarTag :: FooBarTag
, fooBarList :: [a]
}
Records allow you to use the fields without having to pattern match the whole thing.
If there are sensible defaults for all fields in a FooBar, you can go one step beyond mkFoo-like constructors and define a default value.
defaultFooBar :: FooBar a
defaultFooBar = FooBar
{ fooBarName = ""
, fooBarTag = Bar 0
, fooBarList = []
}
You don't need records to use a default, but they allow overriding default fields conveniently.
myFooBar = defaultFooBar
{ fooBarTag = Foo 'x'
}
If you ever get tired of typing long names for the defaults over and over, consider the data-default package:
instance Default (FooBar a) where
def = defaultFooBar
myFooBar = def { fooBarTag = Foo 'x' }
Do note that a significant number of people do not like the Default class, and not without reason. Still, for types which are very specific to your application (e.g. configuration settings) Default is perfectly fine IMO.
Finally, updating record fields can be messy. If you end up annoyed by that, you will find lens very useful. Note that it is a big library, and it might be a little overwhelming to a beginner, so take a deep breath beforehand. Here is a small sample:
{-# LANGUAGE TemplateHaskell #-} -- At the top of the file. Needed for makeLenses.
import Control.Lens
-- Note the underscores.
-- If you are going to use lenses, it is sensible not to export the field names.
data FooBar a = FooBar
{ _fooBarName :: String
, _fooBarTag :: FooBarTag
, _fooBarList :: [a]
}
makeLenses ''FooBar -- Defines lenses for the fields automatically.
defaultFooBar :: FooBar a
defaultFooBar = FooBar
{ _fooBarName = ""
, _fooBarTag = Bar 0
, _fooBarList = []
}
-- Using a lens (fooBarTag) to set a field without record syntax.
-- Note the lack of underscores in the name of the lens.
myFooBar = set fooBarTag (Foo 'x') defaultFooBar
-- Using a lens to access a field.
myTag = view fooBarTag myFooBar -- Results in Foo 'x'
-- Using a lens (fooBarList) to modify a field.
add :: a -> FooBar a -> FooBar a
add x fb = over fooBarList (x :) fb
-- set, view and over have operator equivalents, (.~). (^.) and (%~) respectively.
-- Note that (^.) is flipped with respect to view.
Here is a gentle introduction to lens which focuses on aspects I have not demonstrated here, specially in how nicely lenses can be composed.

Checking for a particular data constructor

Let's say that I defined my own data-Type like
data MyData = A arg| B arg2| C arg3
How would I write a function (for instance: isMyDataType) that checks wether the given argument is one out of the particular types in MyData and successively returns a boolean (True or False) , e.g. typing in Ghci:
isMyDataType B returns True and isMyDataType Int returns False.
I believe you want functions to test for particular constructors:
isA :: MyData -> Bool
isB :: MyData -> Bool
If so, then you can write these yourself or derive them. The implementation would look like:
isA (A _) = True
isA _ = False
isB (B _) = True
isB _ = False
To derive them automatically, just use the derive library and add, in your source code:
{-# LANGUAGE TemplateHaskell #-}
import Data.DeriveTH
data MyData = ...
deriving (Eq, Ord, Show}
derive makeIs ''MyData
-- Older GHCs require more syntax: $( derive makeIs ''MyData)
Also note: your data declaration is invalid, the name must be capitalized, MyData instead of myData.
Finally, this whole answer is based on the assumption you want to test constructors, not data types as you said (which are statically checked at compile time, as Tarrasch said).
Haskell always checks that the types makes sense. The compiler would complain immediately if you wrote isMyDataType 4, because 4 is not of type MyData, it's of type Int.
I'm not sure this is what you asked for, but either way I strongly suggest for you to try out what you've asked here in practice, so you can see for yourself. Most important is that you check out type signatures in haskell, it is key for learning haskell.
You can use Maybes. You can create a set of functions that check for each of the types
getA, getB, getC :: MyData a -> Maybe a
getA x = case x of {(A v) -> Just v; _ -> Nothing}
getB x = case x of {(B v) -> Just v; _ -> Nothing}
getC x = case x of {(C v) -> Just v; _ -> Nothing}
This affords some practical idioms for certain tasks:
allAs :: [MyData a] -> [a]
allAs xs = mapMaybe getA xs
printIfA :: Show a => MyData a -> IO ()
printIfA x = maybe (return ()) print $ getA x

Resources