What is an elegant idiom for a lexicographic Ord instance? - haskell

This code works, but it's verbose, and I'm sure there's a more concise way.
import qualified Data.Vector as V
data Event a = Event { start :: Time
, duration :: Time
, payload :: Maybe a } deriving (Show, Eq)
instance Ord a => Ord (Event a) where
(<=) a b = start a < start b
|| start a == start b && duration a < duration b
|| start a == start b && duration a == duration b && payload a <= payload b
The idea behind it is that if one thing starts before the other, you should call it the lesser, and don't even look at the other two fields. Similarly, if they start at the same time but one is more brief, then that briefer one is the lesser, and you can ignore the third field.

Use deriving:
data Event a = Event { start :: Time
, duration :: Time
, payload :: Maybe a } deriving (Show, Eq, Ord)
The derived instance is automatically lexicographic.

As #HTNW notes, the automatically derived Ord instance will work. In more general situations, if you need to sort lexicographically on multiple items each with existing Ord instances, you can use the automatic tuple Ord instance:
instance Ord a => Ord (Event a) where
(<=) a b = order a <= order b
where order x = (start x, duration x, payload x)

As HTNW suggests, use deriving if you can. If you can't (e.g. the record fields aren't in the appropriate order, or you aren't actually writing an instance), a very nice option is doing it in terms of compare (or comparing, which adds extra convenience in your case) rather than (<=) and then taking advantage of the Monoid instance for Ordering, which amounts to lexicographic order:
import Data.Ord (comparing)
import Data.Monoid ((<>))
instance Ord a => Ord (Event a) where
compare a b = comparing start a b
<> comparing duration a b
<> comparing payload a b
As functions have a Monoid instance which acts on the results, you might take it even further:
instance Ord a => Ord (Event a) where
compare = comparing start <> comparing duration <> comparing payload

Related

Haskell - create instance of class (how to do it right?)

I read the chapter about that topic in "learn you a haskell" and tried to find some hints on different websites - but are still unable to solve the following task.
Im a haskell newbie (6 weeks of "experience") and its the first time I have to work with instances.
So here is the task, my code has to pass the HUnit tests and the end. I tried to implement the instances but it seems like I´ve missed something there. Hope you can help me! THX
module SemiGroup where
{-
A type class 'SemiGroup' is given. It has exactly one method: a binary operation
called '(<>)'. Also a data type 'Tree' a newtype 'Sum' and a newtype 'Max' are
given. Make them instances of the 'SemiGroup' class.
The 'Tree' instance should build a 'Branch' of the given left and right side.
The 'Sum' instance should take the sum of its given left and right side. You need
a 'Num' constraint for that.
The 'Max' instance should take the maximum of its given left and right side. You
also need a constraint for that but you have to figure out yourself which one.
This module is not going to compile until you add the missing instances.
-}
import Test.HUnit (runTestTT,Test(TestLabel,TestList),(~?=))
-- | A semigroup has a binary operation.
class SemiGroup a where
(<>) :: a -> a -> a
-- Leaf = Blatt, Branch = Ast
-- | A binary tree data type.
data Tree a = Leaf a
| Branch (Tree a) (Tree a)
deriving (Eq,Show)
-- | A newtype for taking the sum.
newtype Sum a = Sum {unSum :: a}
-- | A newtype for taking the maximum.
newtype Max a = Max {unMax :: a}
instance SemiGroup Tree where
(<>) x y = ((x) (y))
instance SemiGroup (Num Sum) where
(<>) x y = x+y
instance SemiGroup (Eq Max) where
(<>) x y = if x>y then x else y
-- | Tests the implementation of the 'SemiGroup' instances.
main :: IO ()
main = do
testresults <- runTestTT tests
print testresults
-- | List of tests for the 'SemiGroup' instances.
tests :: Test
tests = TestLabel "SemiGroupTests" (TestList [
Leaf "Hello" <> Leaf "Friend" ~?= Branch (Leaf "Hello") (Leaf "Friend"),
unSum (Sum 4 <> Sum 8) ~?= 12,
unMax (Max 8 <> Max 4) ~?= 8])
I tried something like:
class SemiGroup a where
(<>) :: a -> a -> a
-- Leaf = Blatt, Branch = Ast
-- | A binary tree data type.
data Tree a = Leaf a
| Branch (Tree a) (Tree a)
deriving (Eq,Show)
-- | A newtype for taking the sum.
newtype Sum a = Sum {unSum :: a}
-- | A newtype for taking the maximum.
newtype Max a = Max {unMax :: a}
instance SemiGroup Tree where
x <> y = Branch x y
instance Num a => SemiGroup (Sum a) where
x <> y = x+y
instance Eq a => SemiGroup (Max a) where
x <> y = if x>y then x else y
But there a still some failures left! At least the wrap/unwrap thing that "chi" mentioned. But I have no idea. maybe another hint ? :/
I fail to see how to turn Tree a into a semigroup (unless it has to be considered up-to something).
For the Sum a newtype, you need to require that a is of class Num. Then, you need to wrap/unwrap the Sum constructor around values so that: 1) you take two Sum a, 2) you convert them into two a, which is a proper type over which + is defined, 3) you sum them, 4) you turn the result back into a Sum a.
You can try to code the above yourself starting from
instance Num a => Semigroup (Sum a) where
x <> y = ... -- Here both x and y have type (Sum a)
The Max a instance will require a similar wrap/unwrap code.
A further hint: to unwrap a Sum a into an a you can use the function
unSum :: Sum a -> a
to wrap an a into a Sum a you can use instead
Sum :: a -> Sum a
Note that both functions Sum, unSum are already implicitly defined by your newtype declaration, so you do not have to define them (you already did).
Alternatively, you can use pattern matching to unwrap your values. Instead of defining
x <> y = ... -- x,y have type Sum a (they are wrapped)
you can write
Sum x <> Sum y = ... -- x,y have type a (they are unwrapped)
Pay attention to the types. Either manually, or with some help from GHCi, figure out the type of the functions you are writing -- you'll find they don't match the types that the typeclass instance needs. You'll use wrapping and unwrapping to adjust the types until they work.

Set-Like data structure that maintains insertion Order?

The properties I'm looking for are
initially maintains insertion order
transversing in the insertion order
and of course maintain that each element is unique
But there are cases where It's okay to disregard insertion order, such as...
retrieving a difference between two different sets
performing a union the two sets eliminating any duplicates
Java's LinkedHashSet seems to be exactly what I'm after, except for the fact it's not written in Haskell.
current & initial solution
The easiest (and a relevantly inefficient) solution is to implement it as a list and transform it into a set when I need too, but I believe there is likely a better way.
other ideas
My first idea was to implement it as a Data.Set of a newtype of (Int, a) where it would be ordered by the first tuple index, and the second index (a) being the actual value. I quickly realised this wasn't going to work because as the set would allow for duplicates of the type a, which would have defeated the whole purpose of using a set.
simultaneously maintaining a list and a set? (nope)
Another Idea I had was have an abstract data type that would maintain both a list and set representation of the data, which doesn't sound to efficient either.
recap
Are there any descent implementations of such a data structure in Haskell? I've seen Data.List.Ordered but it seems to just add set operations to lists, which sounds terribly inefficient as well (but likely what I'll settle with if I can't find a solution). Another solution suggested here, was to implement it via finger tree, but I would prefer to not reimplement it if it's already a solved problem.
You can certainly use Data.Set with what is isomorphic to (Int, a), but wrapped in a newtype with different a Eq instance:
newtype Entry a = Entry { unEntry :: (Int, a) } deriving (Show)
instance Eq a => Eq (Entry a) where
(Entry (_, a)) == (Entry (_, b)) = a == b
instance Ord a => Ord (Entry a) where
compare (Entry (_, a)) (Entry (_, b)) = compare a b
But this won't quite solve all your problems if you want automatic incrementing of your index, so you could make a wrapper around (Set (Entry a), Int):
newtype IndexedSet a = IndexedSet (Set (Entry a), Int) deriving (Eq, Show)
But this does mean that you'll have to re-implement Data.Set to respect this relationship:
import qualified Data.Set as S
import Data.Set (Set)
import Data.Ord (comparing)
import Data.List (sortBy)
-- declarations from above...
null :: IndexedSet a -> Bool
null (IndexedSet (set, _)) = S.null set
-- | If you re-index on deletions then size will just be the associated index
size :: IndexedSet a -> Int
size (IndexedSet (set, _)) = S.size set
-- Remember that (0, a) == (n, a) for all n
member :: Ord a => a -> IndexedSet a -> Bool
member a (IndexedSet (set, _)) = S.member (Entry (0, a)) set
empty :: IndexedSet a
empty = IndexedSet (S.empty, 0)
-- | This function is critical, you have to make sure to increment the index
-- Might also want to consider making it strict in the i field for performance
insert :: Ord a => a -> IndexedSet a -> IndexedSet a
insert a (IndexedSet (set, i)) = IndexedSet (S.insert (Entry (i, a)) set, i + 1)
-- | Simply remove the `Entry` wrapper, sort by the indices, then strip those off
toList :: IndexedSet a -> [a]
toList (IndexedSet (set, _))
= map snd
$ sortBy (comparing fst)
$ map unEntry
$ S.toList set
But this is fairly trivial in most cases and you can add functionality as you need it. The only thing you'll need to really worry about is what to do in deletions. Do you re-index everything or are you just concerned about order? If you're just concerned about order, then it's simple (and size can be left sub-optimal by having to actually calculate the size of the underlying Set), but if you re-index then you can get your size in O(1) time. These sorts of decisions should be decided based on what problem you're trying to solve.
I would prefer to not reimplement it if it's already a solved problem.
This approach is definitely a re-implementation. But it isn't complicated in most of the cases, could be pretty easily turned into a nice little library to upload to Hackage, and retains a lot of the benefits of sets without much bookkeeping.

Inconsistent Eq and Ord instances?

I have a large Haskell program which is running dismayingly slow. Profiling and testing has revealed that a large fraction of the time is spend comparing equality and ordering of a particular large datatype that is very important. Equality is a useful operation (this is state-space search, and graph search is much preferable to tree search), but I only need an Ord instance for this class in order to use Maps. So what I want to do is say
instance Eq BigThing where
(==) b b' = name b == name b' &&
firstPart b == firstPart b' &&
secondPart b == secondPart b' &&
{- ...and so on... -}
instance Ord BigThing where
compare b b' = compare (name b) (name b')
But since the names may not always be different for different objects, this risks the curious case where two BigThings may be inequal according to ==, but comparing them yields EQ.
Is this going to cause problems with Haskell libraries? Is there another way I could satisfy the requirement for a detailed equality operation but a cheap ordering?
First, using Text or ByteString instead of String could help a lot without changing anything else.
Generally I wouldn't recommend creating an instance of Eq inconsistent with Ord. Libraries can rightfully depend on it, and you never know what kind of strange problems it can cause. (For example, are you sure that Map doesn't use the relationship between Eq and Ord?)
If you don't need the Eq instance at all, you can simply define
instance Eq BigThing where
x == y = compare x y == EQ
Then equality will be consistent with comparison. There is no requirement that equal values must have all fields equal.
If you need an Eq instance that compares all fields, then you can stay consistent by wrapping BigThing into a newtype, define the above Eq and Ord for it, and use it in your algorithm whenever you need ordering according to name:
newtype BigThing' a b c = BigThing' (BigThing a b c)
instance Eq BigThing' where
x == y = compare x y == EQ
instance Ord BigThing' where
compare (BigThing b) (BigThing b') = compare (name b) (name b')
Update: Since you say any ordering is acceptable, you can use hashing to your advantage. For this, you can use the hashable package. The idea is that you pre-compute hash values on data creation and use them when comparing values. If two values are different, it's almost sure that their hashes will differ and you the compare only their hashes (two integers), nothing more. It could look like this:
module BigThing
( BigThing()
, bigThing
, btHash, btName, btSurname
)
where
import Data.Hashable
data BigThing = BigThing { btHash :: Int,
btName :: String,
btSurname :: String } -- etc
deriving (Eq, Ord)
-- Since the derived Eq/Ord instances compare fields lexicographically and
-- btHash is the first, they'll compare the hash first and continue with the
-- other fields only if the hashes are equal.
-- See http://www.haskell.org/onlinereport/derived.html#sect10.1
--
-- Alternativelly, you can create similar Eq/Ord instances yourself, if for any
-- reason you don't want the hash to be the first field.
-- A smart constructor for creating instances. Your module will not export the
-- BigThing constructor, it will export this function instead:
bigThing :: String -> String -> BigThing
bigThing nm snm = BigThing (hash (nm, snm)) nm snm
Note that with this solution, the ordering will be seemingly random with no apparent relation to the fields.
You can also combine this solution with the previous ones. Or, you can create a small module for wrapping any type with its precomputed hash (wrapped values must have Eq instances consistent with their Hashable instances).
module HashOrd
( Hashed()
, getHashed
, hashedHash
)
where
import Data.Hashable
data Hashed a = Hashed { hashedHash :: Int, getHashed :: a }
deriving (Ord, Eq, Show, Read, Bounded)
hashed :: (Hashable a) => a -> Hashed a
hashed x = Hashed (hash x) x
instance Hashable a => Hashable (Hashed a) where
hashWithSalt salt (Hashed _ x) = hashWithSalt salt x

Sort by constructor ignoring (part of) value

Suppose I have
data Foo = A String Int | B Int
I want to take an xs :: [Foo] and sort it such that all the As are at the beginning, sorted by their strings, but with the ints in the order they appeared in the list, and then have all the Bs at the end, in the same order they appeared.
In particular, I want to create a new list containg the first A of each string and the first B.
I did this by defining a function taking Foos to (Int, String)s and using sortBy and groupBy.
Is there a cleaner way to do this? Preferably one that generalizes to at least 10 constructors.
Typeable, maybe? Something else that's nicer?
EDIT: This is used for processing a list of Foos that is used elsewhere. There is already an Ord instance which is the normal ordering.
You can use
sortBy (comparing foo)
where foo is a function that extracts the interesting parts into something comparable (e.g. Ints).
In the example, since you want the As sorted by their Strings, a mapping to Int with the desired properties would be too complicated, so we use a compound target type.
foo (A s _) = (0,s)
foo (B _) = (1,"")
would be a possible helper. This is more or less equivalent to Tikhon Jelvis' suggestion, but it leaves space for the natural Ord instance.
To make it easier to build comparison function for ADTs with large number of constructors, you can map values to their constructor index with SYB:
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics
data Foo = A String Int | B Int deriving (Show, Eq, Typeable, Data)
cIndex :: Data a => a -> Int
cIndex = constrIndex . toConstr
Example:
*Main Data.Generics> cIndex $ A "foo" 42
1
*Main Data.Generics> cIndex $ B 0
2
Edit:After re-reading your question, I think the best option is to make Foo an instance of Ord. I do not think there is any way to do this automatically that will act the way you want (just using deriving will create different behavior).
Once Foo is an instance of Ord, you can just use sort from Data.List.
In your exact example, you can do something like this:
data Foo = A String Int | B Int deriving (Eq)
instance Ord Foo where
(A _ _) <= (B _) = True
(A s _) <= (A s' _) = s <= s'
(B _) <= (B _) = True
When something is an instance of Ord, it means the data type has some ordering. Once we know how to order something, we can use a bunch of existing functions (like sort) on it and it will behave how you want. Anything in Ord has to be part of Eq, which is what the deriving (Eq) bit does automatically.
You can also derive Ord. However, the behavior will not be exactly what you want--it will order by all of the fields if it has to (e.g. it will put As with the same string in order by their integers).
Further edit: I was thinking about it some more and realized my solution is probably semantically wrong.
An Ord instance is a statement about your whole data type. For example, I'm saying that Bs are always equal with each other when the derived Eq instance says otherwise.
If the data your representing always behaves like this (that is, Bs are all equal and As with the same string are all equal) then an Ord instance makes sense. Otherwise, you should not actually do this.
However, you can do something almost exactly like this: write your own special compare function (Foo -> Foo -> Ordering) that encapsulates exactly what you want to do then use sortBy. This properly codifies that your particular sorting is special rather than the natural ordering of the data type.
You could use some template haskell to fill in the missing transitive cases. The mkTransitiveLt creates the transitive closure of the given cases (if you order them least to greatest). This gives you a working less-than, which can be turned into a function that returns an Ordering.
{-# LANGUAGE TemplateHaskell #-}
import MkTransitiveLt
import Data.List (sortBy)
data Foo = A String Int | B Int | C | D | E deriving(Show)
cmp a b = $(mkTransitiveLt [|
case (a, b) of
(A _ _, B _) -> True
(B _, C) -> True
(C, D) -> True
(D, E) -> True
(A s _, A s' _) -> s < s'
otherwise -> False|])
lt2Ord f a b =
case (f a b, f b a) of
(True, _) -> LT
(_, True) -> GT
otherwise -> EQ
main = print $ sortBy (lt2Ord cmp) [A "Z" 1, A "A" 1, B 1, A "A" 0, C]
Generates:
[A "A" 1,A "A" 0,A "Z" 1,B 1,C]
mkTransitiveLt must be defined in a separate module:
module MkTransitiveLt (mkTransitiveLt)
where
import Language.Haskell.TH
mkTransitiveLt :: ExpQ -> ExpQ
mkTransitiveLt eq = do
CaseE e ms <- eq
return . CaseE e . reverse . foldl go [] $ ms
where
go ms m#(Match (TupP [a, b]) body decls) = (m:ms) ++
[Match (TupP [x, b]) body decls | Match (TupP [x, y]) _ _ <- ms, y == a]
go ms m = m:ms

Haskell newbie on types

I'm completely new to Haskell (and more generally to functional programming), so forgive me if this is really basic stuff. To get more than a taste, I try to implement in Haskell some algorithmic stuff I'm working on. I have a simple module Interval that implements intervals on the line. It contains the type
data Interval t = Interval t t
the helper function
makeInterval :: (Ord t) => t -> t -> Interval t
makeInterval l r | l <= r = Interval l r
| otherwise = error "bad interval"
and some utility functions about intervals.
Here, my interest lies in multidimensional intervals (d-intervals), those objects that are composed of d intervals. I want to separately consider d-intervals that are the union of d disjoint intervals on the line (multiple interval) from those that are the union of d interval on d separate lines (track interval). With distinct algorithmic treatments in mind, I think it would be nice to have two distinct types (even if both are lists of intervals here) such as
import qualified Interval as I
-- Multilple interval
newtype MInterval t = MInterval [I.Interval t]
-- Track interval
newtype TInterval t = TInterval [I.Interval t]
to allow for distinct sanity checks, e.g.
makeMInterval :: (Ord t) => [I.Interval t] -> MInterval t
makeMInterval is = if foldr (&&) True [I.precedes i i' | (i, i') <- zip is (tail is)]
then (MInterval is)
else error "bad multiple interval"
makeTInterval :: (Ord t) => [I.Interval t] -> TInterval t
makeTInterval = TInterval
I now get to the point, at last! But some functions are naturally concerned with both multiple intervals and track intervals. For example, a function order would return the number of intervals in a multiple interval or a track interval. What can I do? Adding
-- Dimensional interval
data DInterval t = MIntervalStuff (MInterval t) | TIntervalStuff (TInterval t)
does not help much, since, if I understand well (correct me if I'm wrong), I would have to write
order :: DInterval t -> Int
order (MIntervalStuff (MInterval is)) = length is
order (TIntervalStuff (TInterval is)) = length is
and call order as order (MIntervalStuff is) or order (TIntervalStuff is) when is is a MInterval or a TInterval. Not that great, it looks odd. Neither I want to duplicate the function (I have many functions that are concerned with both multiple and track intevals, and some other d-interval definitions such as equal length multiple and track intervals).
I'm left with the feeling that I'm completely wrong and have missed some important point about types in Haskell (and/or can't forget enough here about OO programming). So, quite a newbie question, what would be the best way in Haskell to deal with such a situation? Do I have to forget about introducing MInterval and TInterval and go with one type only?
Thanks a lot for your help,
Garulfo
Edit: this is the same idea as sclv's answer; his links provide more information on this technique.
What about this approach?
data MInterval = MInterval --multiple interval
data TInterval = TInterval --track interval
data DInterval s t = DInterval [I.Interval t]
makeMInterval :: (Ord t) => [I.Interval t] -> Maybe (DInterval MInterval t)
makeMInterval is = if foldr (&&) True [I.precedes i i' | (i, i') <- zip is (tail is)]
then Just (DInterval is)
else Nothing
order :: DInterval s t -> Int
order (DInterval is) = length is
equalOrder :: DInterval s1 t -> DInterval s2 t -> Bool
equalOrder i1 i2 = order i1 == order i2
addToMInterval :: DInterval MInterval t -> Interval t -> Maybe (DInterval MInterval t)
addToMInterval = ..
Here the type DInterval represents multi-dimension intervals, but it takes an extra type parameter as a phantom type. This extra type information allows the typechecker to differentiate between different kinds of intervals even though they have exactly the same representation.
You get the type safety of your original design, but all your structures share the same implementation. Crucially, when the type of the interval doesn't matter, you can leave it unspecified.
I also changed the implementation of your makeMInterval function; returning a Maybe for a function like this is much more idiomatic than calling error.
More explanation on Maybe:
Let's examine your function makeInterval. This function is supposed to take a list of Interval's, and if they meet the criteria return a multiple interval, otherwise a track interval. This explanation leads to the type:
makeInterval :: (Ord t) =>
[I.Interval t] ->
Either (DInterval TInterval t) (DInterval MInterval t)
Now that we have the type, how can we implement it? We'd like to re-use our makeMInterval function.
makeInterval is = maybe
(Left $ DInterval TInterval is)
Right
(makeMInterval is)
The function maybe takes three arguments: a default b to use if the Maybe is Nothing, a function a -> b if the Maybe is Just a, and a Maybe a. It returns either the default or the result of applying the function to the Maybe value.
Our default is the track interval, so we create a Left track interval for the first argument. If the maybe is Just (DInterval MInterval t), the multiple interval already exists so all that's necessary is to stick it into the Right side of the either. Finally, makeMInterval is used to create the multiple interval.
What you want are types with the same structure and common operations, but which may be distinguished, and for which certain operations only make sense on one or another type. The idiom for this in Haskell is Phantom Types.
http://www.haskell.org/haskellwiki/Phantom_type
http://www.haskell.org/haskellwiki/Wrapper_types
http://blog.malde.org/index.php/2009/05/14/using-a-phantom-type-to-label-different-kinds-of-sequences/
http://neilmitchell.blogspot.com/2007/04/phantom-types-for-real-problems.html
What you want is a typeclass. Typeclasses is how overloading is done in Haskell. A type class is a common structure that is shared by many types. In your case you want to create a class of all types that have a order function:
class Dimensional a where
order :: a -> Int
This is saying there's a class of types a called Dimensional, and for each type belonging to this class there's a function a -> Int called order. Now you need to declare all your types as an belonging to the class:
instance Dimensional MInterval where
order (MInterval is) = length is
and so on. You can also declare functions that act on things that are from a given class, this way:
isOneDimensional :: (Dimensional a) => a -> Bool
isOneDimensional interval = (order interval == 1)
The type declaration can be read "for all types a that belong to typeclass Dimensional, this functions takes a and returns a Bool".
I like to think about classes as algebraic structures in mathematics (it's not quite the same thing as you can't enforce some logical constraints, but...): you state what are the requirements for a type to belong to some class (in the analogy, the operations must be defined over a set for it to belong to a certain algebraic construction), and then, for each specific type you state what is the specific implementation of the required functions (in the analogy, what are the specific functions for each specific set).
Take, for example, a group. Groups must have an identity, an inverse for each element and a multiplication:
class Group a where:
identity :: a
inverse :: a -> a
(<*>) :: a -> a -> a
and Integers form a group under addition:
instance Group Integer where:
identity = 0
inverse x = (- x)
x <*> y = x + y
Note that this will not solve the repetition, as you have to instantiate each type.
consider the use of typeclasses to show similarities between different types. See the haskell tutorial[1] about classes and overloading for help.
[1]: http://www.haskell.org/tutorial/classes.html haskell tutorial

Resources