I have a large Haskell program which is running dismayingly slow. Profiling and testing has revealed that a large fraction of the time is spend comparing equality and ordering of a particular large datatype that is very important. Equality is a useful operation (this is state-space search, and graph search is much preferable to tree search), but I only need an Ord instance for this class in order to use Maps. So what I want to do is say
instance Eq BigThing where
(==) b b' = name b == name b' &&
firstPart b == firstPart b' &&
secondPart b == secondPart b' &&
{- ...and so on... -}
instance Ord BigThing where
compare b b' = compare (name b) (name b')
But since the names may not always be different for different objects, this risks the curious case where two BigThings may be inequal according to ==, but comparing them yields EQ.
Is this going to cause problems with Haskell libraries? Is there another way I could satisfy the requirement for a detailed equality operation but a cheap ordering?
First, using Text or ByteString instead of String could help a lot without changing anything else.
Generally I wouldn't recommend creating an instance of Eq inconsistent with Ord. Libraries can rightfully depend on it, and you never know what kind of strange problems it can cause. (For example, are you sure that Map doesn't use the relationship between Eq and Ord?)
If you don't need the Eq instance at all, you can simply define
instance Eq BigThing where
x == y = compare x y == EQ
Then equality will be consistent with comparison. There is no requirement that equal values must have all fields equal.
If you need an Eq instance that compares all fields, then you can stay consistent by wrapping BigThing into a newtype, define the above Eq and Ord for it, and use it in your algorithm whenever you need ordering according to name:
newtype BigThing' a b c = BigThing' (BigThing a b c)
instance Eq BigThing' where
x == y = compare x y == EQ
instance Ord BigThing' where
compare (BigThing b) (BigThing b') = compare (name b) (name b')
Update: Since you say any ordering is acceptable, you can use hashing to your advantage. For this, you can use the hashable package. The idea is that you pre-compute hash values on data creation and use them when comparing values. If two values are different, it's almost sure that their hashes will differ and you the compare only their hashes (two integers), nothing more. It could look like this:
module BigThing
( BigThing()
, bigThing
, btHash, btName, btSurname
)
where
import Data.Hashable
data BigThing = BigThing { btHash :: Int,
btName :: String,
btSurname :: String } -- etc
deriving (Eq, Ord)
-- Since the derived Eq/Ord instances compare fields lexicographically and
-- btHash is the first, they'll compare the hash first and continue with the
-- other fields only if the hashes are equal.
-- See http://www.haskell.org/onlinereport/derived.html#sect10.1
--
-- Alternativelly, you can create similar Eq/Ord instances yourself, if for any
-- reason you don't want the hash to be the first field.
-- A smart constructor for creating instances. Your module will not export the
-- BigThing constructor, it will export this function instead:
bigThing :: String -> String -> BigThing
bigThing nm snm = BigThing (hash (nm, snm)) nm snm
Note that with this solution, the ordering will be seemingly random with no apparent relation to the fields.
You can also combine this solution with the previous ones. Or, you can create a small module for wrapping any type with its precomputed hash (wrapped values must have Eq instances consistent with their Hashable instances).
module HashOrd
( Hashed()
, getHashed
, hashedHash
)
where
import Data.Hashable
data Hashed a = Hashed { hashedHash :: Int, getHashed :: a }
deriving (Ord, Eq, Show, Read, Bounded)
hashed :: (Hashable a) => a -> Hashed a
hashed x = Hashed (hash x) x
instance Hashable a => Hashable (Hashed a) where
hashWithSalt salt (Hashed _ x) = hashWithSalt salt x
Related
I am trying to build a lattice-type of FCA-type of data structure in Haskell where I could check if two entities have a join or not. Actually, I'm not even sure that the lattice is the right structure as it might be a bit "too much".
Here is the context. In a COBOL program analyzer, I have an ontology of data set names, files, records and fields. A data set name can have multiple file names depending on the program, a file can have multiple records and a record can have multiple fields. I'd like this hierarchy to be reflected in the Haskell data structure. But I'd like also to be able to have a relation inherited for file1 and file2 such that I can check if file1 and file2 belong to the same data set name. Actually, this relation could almost be that of "==". But it could simply be that they do have a join in dsn0, for instance.
I have other ontologies in that context that would benefit from a lattice or FCA data structure. For example, I have programs that belongs to job steps and job steps that belong to jobs. If I could easily figure out if two programs belong to the same job, that would be great. Here also, it seems like a "join" operator. Getting the extension (code) of a certain entity would be useful too.
I'm still a bit new to Haskell. I tried to look at the Lattice library but I'm not sure where to go from there, concretely. Any idea of how to get started? A small example of a lattice in Haskell would be very helpful. Thank you very much for your help (and patience).
UPDATE:
The Lattice might not be the best formalism for this as mentioned in the comments. I realize I might just have to use a regular class type of data structure along those lines:
data DSN = DSN {
programFiles :: [ProgramFile]
name :: String
ddn :: DDN
}
data ProgramFile = ProgramFile {
records :: [Record]
name :: String
}
data Record = Record {
fields :: [Field]
name :: String
}
data Field = Field {
name :: String
order :: Int
}
I guess my initial intention behind using a tree/lattice/FCA type of structure is to take full advantage of the functor potential in Haskell which should lead to interesting lattice operations including geting all the extension of a concept, checking that two concepts belong to the same higher-level concept, checking equality '==' of two files through their DSN,...
Maybe a non-binary tree structure would be better? Is it easy to do that in Haskell?
I recommend making an abstract data type to represent one-to-many relationships. It might look something like this:
module OneToMany (OMRel, empty, insert, delete, source, targets) where
import Data.Map (Map)
import Data.Set (Set)
import qualified Data.Map.Strict as M
import qualified Data.Set as S
data OMRel a b = OMRel
{ oneToMany :: Map a (Set b)
, manyToOne :: Map b a
} deriving (Eq, Ord, Read, Show)
empty :: OMRel a b
empty = OMRel M.empty M.empty
insert :: (Ord a, Ord b) => a -> b -> OMRel a b -> OMRel a b
insert a b (OMRel otm mto) = OMRel
{ oneToMany = M.insertWith S.union a (S.singleton b) $
case M.lookup b mto of
Just oldA -> M.adjust (S.delete b) oldA otm
Nothing -> otm
, manyToOne = M.insert b a mto
}
delete :: (Ord a, Ord b) => a -> b -> OMRel a b -> OMRel a b
delete a b (OMRel otm mto) = OMRel (M.adjust (S.delete b) a otm) (M.delete b mto)
source :: Ord b => b -> OMRel a b -> Maybe a
source b = M.lookup b . manyToOne
targets :: Ord a => a -> OMRel a b -> Set b
targets a = M.findWithDefault S.empty a . oneToMany
(Of course you can flesh out the API with more efficient bulk operations like merging, bulk insert, sequential composition, etc. But this is sort of the minimal construct/consume API that gets you where you need to go.)
Then you need a couple data types to represent your various ontological entries:
newtype Dataset = Dataset { dataset :: String }
newtype Record = Record { record :: String }
newtype Field = Field { order :: Int }
From there you can use values with types like OMRel Dataset FilePath to represent the fact that datasets "contain" files. For querying containment equality, you can write this once and for all via the OMRel API above:
sameSource :: (Eq a, Ord b) => OMRel a b -> b -> b -> Bool
sameSource rel b b' = source b rel == source b' rel
(You may need an extra clause if two missing targets should be considered inequal.) Then, e.g., this can be specialized to
sameSource :: OMRel Dataset FilePath -> FilePath -> FilePath -> Bool
and friends.
You will not be able to make a Functor (or Bifunctor) instance for OMRel because of the Ord constraints. It's not clear to me that fmap/bimap make a TON of sense for this particular data structure, though. For example, if we have associations x :: a <-> y :: b and x' :: a <-> y' :: b, and f b = f b', then should fmap f associate f b with a or a'? If they do turn out to have a sensible interpretation and be useful, you could either pursue the constrained-monads approach or simply offer a function named bimap from the OneToMany module with the appropriate type, but without making it an instance method.
This code works, but it's verbose, and I'm sure there's a more concise way.
import qualified Data.Vector as V
data Event a = Event { start :: Time
, duration :: Time
, payload :: Maybe a } deriving (Show, Eq)
instance Ord a => Ord (Event a) where
(<=) a b = start a < start b
|| start a == start b && duration a < duration b
|| start a == start b && duration a == duration b && payload a <= payload b
The idea behind it is that if one thing starts before the other, you should call it the lesser, and don't even look at the other two fields. Similarly, if they start at the same time but one is more brief, then that briefer one is the lesser, and you can ignore the third field.
Use deriving:
data Event a = Event { start :: Time
, duration :: Time
, payload :: Maybe a } deriving (Show, Eq, Ord)
The derived instance is automatically lexicographic.
As #HTNW notes, the automatically derived Ord instance will work. In more general situations, if you need to sort lexicographically on multiple items each with existing Ord instances, you can use the automatic tuple Ord instance:
instance Ord a => Ord (Event a) where
(<=) a b = order a <= order b
where order x = (start x, duration x, payload x)
As HTNW suggests, use deriving if you can. If you can't (e.g. the record fields aren't in the appropriate order, or you aren't actually writing an instance), a very nice option is doing it in terms of compare (or comparing, which adds extra convenience in your case) rather than (<=) and then taking advantage of the Monoid instance for Ordering, which amounts to lexicographic order:
import Data.Ord (comparing)
import Data.Monoid ((<>))
instance Ord a => Ord (Event a) where
compare a b = comparing start a b
<> comparing duration a b
<> comparing payload a b
As functions have a Monoid instance which acts on the results, you might take it even further:
instance Ord a => Ord (Event a) where
compare = comparing start <> comparing duration <> comparing payload
Given the following types:
import Data.Set as Set
-- http://json.org/
type Key = String
data Json = JObject Key (Set JValue)
| JArray JArr
deriving Show
data JObj = JObj Key JValue
deriving Show
data JArr = Arr [JValue] deriving Show
data Null = Null deriving Show
data JValue = Num Double
| S String
| B Bool
| J JObj
| Array JArr
| N Null
deriving Show
I created a JObject Key (Set Value) with a single element:
ghci> JObject "foo" (Set.singleton (B True))
JObject "foo" (fromList [B True])
But, when I tried to create a 2-element Set, I got a compile-time error:
ghci> JObject "foo" (Set.insert (Num 5.5) $ Set.singleton (B True))
<interactive>:159:16:
No instance for (Ord JValue) arising from a use of ‘insert’
In the expression: insert (Num 5.5)
In the second argument of ‘JObject’, namely
‘(insert (Num 5.5) $ singleton (B True))’
In the expression:
JObject "foo" (insert (Num 5.5) $ singleton (B True))
So I asked, "Why is it necessary for JValue to implement the Ord typeclass?"
The docs on Data.Set answer that question.
The implementation of Set is based on size balanced binary trees (or trees of bounded balance)
But, is there a Set-like, i.e. non-ordered, data structure that does not require Ord's implementation that I can use?
You will pretty much always need at least Eq to implement a set (or at least the ability to write an Eq instance, whether or not one exists). Having only Eq will give you a horrifyingly inefficient one. You can improve this with Ord or with Hashable.
One thing you might want to do here is use a trie, which will let you take advantage of the nested structure instead of constantly fighting it.
You can start by looking at generic-trie. This does not appear to offer anything for your Array pieces, so you may have to add some things.
Why Eq is not good enough
The simplest way to implement a set is using a list:
type Set a = [a]
member a [] = False
member (x:xs) | a == x = True
| otherwise = member a xs
insert a xs | member a xs = xs
| otherwise = a:xs
This is no good (unless there are very few elements), because you may have to traverse the entire list to see if something is a member.
To improve matters, we need to use some sort of tree:
data Set a = Node a (Set a) (Set a) | Tip
There are a lot of different kinds of trees we can make, but in order to use them, we must be able, at each node, to decide which of the branches to take. If we only have Eq, there is no way to choose the right one. If we have Ord (or Hashable), that gives us a way to choose.
The trie approach structures the tree based on the structure of the data. When your type is deeply nested (a list of arrays of records of lists...), either hashing or comparison can be very expensive, so the trie will probably be better.
Side note on Ord
Although I don't think you should use the Ord approach here, it very often is the right one. In some cases, your particular type may not have a natural ordering, but there is some efficient way to order its elements. In this case you can play a trick with newtype:
newtype WrappedThing = Wrap Thing
instance Ord WrappedThing where
....
newtype ThingSet = ThingSet (Set WrappedThing)
insertThing thing (ThingSet s) = ThingSet (insert (Wrap thing) s)
memberThing thing (ThingSet s) = member (WrapThing) s
...
Yet another approach, in some cases, is to define a "base type" that is an Ord instance, but only export a newtype wrapper around it; you can use the base type for all your internal functions, but the exported type is completely abstract (and not an Ord instance).
I know this question has been asked and answered lots of times but I still don't really understand why putting constraints on a data type is a bad thing.
For example, let's take Data.Map k a. All of the useful functions involving a Map need an Ord k constraint. So there is an implicit constraint on the definition of Data.Map. Why is it better to keep it implicit instead of letting the compiler and programmers know that Data.Map needs an orderable key.
Also, specifying a final type in a type declaration is something common, and one can see it as a way of "super" constraining a data type.
For example, I can write
data User = User { name :: String }
and that's acceptable. However is that not a constrained version of
data User' s = User' { name :: s }
After all 99% of the functions I'll write for the User type don't need a String and the few which will would probably only need s to be IsString and Show.
So, why is the lax version of User considered bad:
data (IsString s, Show s, ...) => User'' { name :: s }
while both User and User' are considered good?
I'm asking this, because lots of the time, I feel I'm unnecessarily narrowing my data (or even function) definitions, just to not have to propagate constraints.
Update
As far as I understand, data type constraints only apply to the constructor and don't propagate. So my question is then, why do data type constraints not work as expected (and propagate)? It's an extension anyway, so why not have a new extension doing data properly, if it was considered useful by the community?
TL;DR:
Use GADTs to provide implicit data contexts.
Don't use any kind of data constraint if you could do with Functor instances etc.
Map's too old to change to a GADT anyway.
Scroll to the bottom if you want to see the User implementation with GADTs
Let's use a case study of a Bag where all we care about is how many times something is in it. (Like an unordered sequence. We nearly always need an Eq constraint to do anything useful with it.
I'll use the inefficient list implementation so as not to muddy the waters over the Data.Map issue.
GADTs - the solution to the data constraint "problem"
The easy way to do what you're after is to use a GADT:
Notice below how the Eq constraint not only forces you to use types with an Eq instance when making GADTBags, it provides that instance implicitly wherever the GADTBag constructor appears. That's why count doesn't need an Eq context, whereas countV2 does - it doesn't use the constructor:
{-# LANGUAGE GADTs #-}
data GADTBag a where
GADTBag :: Eq a => [a] -> GADTBag a
unGADTBag (GADTBag xs) = xs
instance Show a => Show (GADTBag a) where
showsPrec i (GADTBag xs) = showParen (i>9) (("GADTBag " ++ show xs) ++)
count :: a -> GADTBag a -> Int -- no Eq here
count a (GADTBag xs) = length.filter (==a) $ xs -- but == here
countV2 a = length.filter (==a).unGADTBag
size :: GADTBag a -> Int
size (GADTBag xs) = length xs
ghci> count 'l' (GADTBag "Hello")
2
ghci> :t countV2
countV2 :: Eq a => a -> GADTBag a -> Int
Now we didn't need the Eq constraint when we found the total size of the bag, but it didn't clutter up our definition anyway. (We could have used size = length . unGADTBag just as well.)
Now lets make a functor:
instance Functor GADTBag where
fmap f (GADTBag xs) = GADTBag (map f xs)
oops!
DataConstraints_so.lhs:49:30:
Could not deduce (Eq b) arising from a use of `GADTBag'
from the context (Eq a)
That's unfixable (with the standard Functor class) because I can't restrict the type of fmap, but need to for the new list.
Data Constraint version
Can we do as you asked? Well, yes, except that you have to keep repeating the Eq constraint wherever you use the constructor:
{-# LANGUAGE DatatypeContexts #-}
data Eq a => EqBag a = EqBag {unEqBag :: [a]}
deriving Show
count' a (EqBag xs) = length.filter (==a) $ xs
size' (EqBag xs) = length xs -- Note: doesn't use (==) at all
Let's go to ghci to find out some less pretty things:
ghci> :so DataConstraints
DataConstraints_so.lhs:1:19: Warning:
-XDatatypeContexts is deprecated: It was widely considered a misfeature,
and has been removed from the Haskell language.
[1 of 1] Compiling Main ( DataConstraints_so.lhs, interpreted )
Ok, modules loaded: Main.
ghci> :t count
count :: a -> GADTBag a -> Int
ghci> :t count'
count' :: Eq a => a -> EqBag a -> Int
ghci> :t size
size :: GADTBag a -> Int
ghci> :t size'
size' :: Eq a => EqBag a -> Int
ghci>
So our EqBag count' function requires an Eq constraint, which I think is perfectly reasonable, but our size' function also requires one, which is less pretty. This is because the type of the EqBag constructor is EqBag :: Eq a => [a] -> EqBag a, and this constraint must be added every time.
We can't make a functor here either:
instance Functor EqBag where
fmap f (EqBag xs) = EqBag (map f xs)
for exactly the same reason as with the GADTBag
Constraintless bags
data ListBag a = ListBag {unListBag :: [a]}
deriving Show
count'' a = length . filter (==a) . unListBag
size'' = length . unListBag
instance Functor ListBag where
fmap f (ListBag xs) = ListBag (map f xs)
Now the types of count'' and show'' are exactly as we expect, and we can use standard constructor classes like Functor:
ghci> :t count''
count'' :: Eq a => a -> ListBag a -> Int
ghci> :t size''
size'' :: ListBag a -> Int
ghci> fmap (Data.Char.ord) (ListBag "hello")
ListBag {unListBag = [104,101,108,108,111]}
ghci>
Comparison and conclusions
The GADTs version automagically propogates the Eq constraint everywhere the constructor is used. The type checker can rely on there being an Eq instance, because you can't use the constructor for a non-Eq type.
The DatatypeContexts version forces the programmer to manually propogate the Eq constraint, which is fine by me if you want it, but is deprecated because it doesn't give you anything more than the GADT one does and was seen by many as pointless and annoying.
The unconstrained version is good because it doesn't prevent you from making Functor, Monad etc instances. The constraints are written exactly when they're needed, no more or less. Data.Map uses the unconstrained version partly because unconstrained is generally seen as most flexible, but also partly because it predates GADTs by some margin, and there needs to be a compelling reason to potentially break existing code.
What about your excellent User example?
I think that's a great example of a one-purpose data type that benefits from a constraint on the type, and I'd advise you to use a GADT to implement it.
(That said, sometimes I have a one-purpose data type and end up making it unconstrainedly polymorphic just because I love to use Functor (and Applicative), and would rather use fmap than mapBag because I feel it's clearer.)
{-# LANGUAGE GADTs #-}
import Data.String
data User s where
User :: (IsString s, Show s) => s -> User s
name :: User s -> s
name (User s) = s
instance Show (User s) where -- cool, no Show context
showsPrec i (User s) = showParen (i>9) (("User " ++ show s) ++)
instance (IsString s, Show s) => IsString (User s) where
fromString = User . fromString
Notice since fromString does construct a value of type User a, we need the context explicitly. After all, we composed with the constructor User :: (IsString s, Show s) => s -> User s. The User constructor removes the need for an explicit context when we pattern match (destruct), becuase it already enforced the constraint when we used it as a constructor.
We didn't need the Show context in the Show instance because we used (User s) on the left hand side in a pattern match.
Constraints
The problem is that constraints are not a property of the data type, but of the algorithm/function that operates on them. Different functions might need different and unique constraints.
A Box example
As an example, let's assume we want to create a container called Box which contains only 2 values.
data Box a = Box a a
We want it to:
be showable
allow the sorting of the two elements via sort
Does it make sense to apply the constraint of both Ord and Show on the data type? No, because the data type in itself could be only shown or only sorted and therefore the constraints are related to its use, not it's definition.
instance (Show a) => Show (Box a) where
show (Box a b) = concat ["'", show a, ", ", show b, "'"]
instance (Ord a) => Ord (Box a) where
compare (Box a b) (Box c d) =
let ca = compare a c
cb = compare b d
in if ca /= EQ then ca else cb
The Data.Map case
Data.Map's Ord constraints on the type is really needed only when we have > 1 elements in the container. Otherwise the container is usable even without an Ord key. For example, this algorithm:
transf :: Map NonOrd Int -> Map NonOrd Int
transf x =
if Map.null x
then Map.singleton NonOrdA 1
else x
Live demo
works just fine without the Ord constraint and always produce a non empty map.
Using DataTypeContexts reduces the number of programs you can write. If most of those illegal programs are nonsense, you might say it's worth the runtime cost associated with ghc passing in a type class dictionary that isn't used. For example, if we had
data Ord k => MapDTC k a
then #jefffrey's transf is rejected. But we should probably have transf _ = return (NonOrdA, 1) instead.
In some sense the context is documentation that says "every Map must have ordered keys". If you look at all of the functions in Data.Map you'll get a similar conclusion "every useful Map has ordered keys". While you can create maps with unordered keys using
mapKeysMonotonic :: (k1 -> k2) -> Map k1 a -> Map k2 a
singleton :: k2 a -> Map k2 a
But the moment you try to do anything useful with them, you'll wind up with No instance for Ord k2 somewhat later.
The properties I'm looking for are
initially maintains insertion order
transversing in the insertion order
and of course maintain that each element is unique
But there are cases where It's okay to disregard insertion order, such as...
retrieving a difference between two different sets
performing a union the two sets eliminating any duplicates
Java's LinkedHashSet seems to be exactly what I'm after, except for the fact it's not written in Haskell.
current & initial solution
The easiest (and a relevantly inefficient) solution is to implement it as a list and transform it into a set when I need too, but I believe there is likely a better way.
other ideas
My first idea was to implement it as a Data.Set of a newtype of (Int, a) where it would be ordered by the first tuple index, and the second index (a) being the actual value. I quickly realised this wasn't going to work because as the set would allow for duplicates of the type a, which would have defeated the whole purpose of using a set.
simultaneously maintaining a list and a set? (nope)
Another Idea I had was have an abstract data type that would maintain both a list and set representation of the data, which doesn't sound to efficient either.
recap
Are there any descent implementations of such a data structure in Haskell? I've seen Data.List.Ordered but it seems to just add set operations to lists, which sounds terribly inefficient as well (but likely what I'll settle with if I can't find a solution). Another solution suggested here, was to implement it via finger tree, but I would prefer to not reimplement it if it's already a solved problem.
You can certainly use Data.Set with what is isomorphic to (Int, a), but wrapped in a newtype with different a Eq instance:
newtype Entry a = Entry { unEntry :: (Int, a) } deriving (Show)
instance Eq a => Eq (Entry a) where
(Entry (_, a)) == (Entry (_, b)) = a == b
instance Ord a => Ord (Entry a) where
compare (Entry (_, a)) (Entry (_, b)) = compare a b
But this won't quite solve all your problems if you want automatic incrementing of your index, so you could make a wrapper around (Set (Entry a), Int):
newtype IndexedSet a = IndexedSet (Set (Entry a), Int) deriving (Eq, Show)
But this does mean that you'll have to re-implement Data.Set to respect this relationship:
import qualified Data.Set as S
import Data.Set (Set)
import Data.Ord (comparing)
import Data.List (sortBy)
-- declarations from above...
null :: IndexedSet a -> Bool
null (IndexedSet (set, _)) = S.null set
-- | If you re-index on deletions then size will just be the associated index
size :: IndexedSet a -> Int
size (IndexedSet (set, _)) = S.size set
-- Remember that (0, a) == (n, a) for all n
member :: Ord a => a -> IndexedSet a -> Bool
member a (IndexedSet (set, _)) = S.member (Entry (0, a)) set
empty :: IndexedSet a
empty = IndexedSet (S.empty, 0)
-- | This function is critical, you have to make sure to increment the index
-- Might also want to consider making it strict in the i field for performance
insert :: Ord a => a -> IndexedSet a -> IndexedSet a
insert a (IndexedSet (set, i)) = IndexedSet (S.insert (Entry (i, a)) set, i + 1)
-- | Simply remove the `Entry` wrapper, sort by the indices, then strip those off
toList :: IndexedSet a -> [a]
toList (IndexedSet (set, _))
= map snd
$ sortBy (comparing fst)
$ map unEntry
$ S.toList set
But this is fairly trivial in most cases and you can add functionality as you need it. The only thing you'll need to really worry about is what to do in deletions. Do you re-index everything or are you just concerned about order? If you're just concerned about order, then it's simple (and size can be left sub-optimal by having to actually calculate the size of the underlying Set), but if you re-index then you can get your size in O(1) time. These sorts of decisions should be decided based on what problem you're trying to solve.
I would prefer to not reimplement it if it's already a solved problem.
This approach is definitely a re-implementation. But it isn't complicated in most of the cases, could be pretty easily turned into a nice little library to upload to Hackage, and retains a lot of the benefits of sets without much bookkeeping.