Set-like Data Structure without `Ord`?

Set-like Data Structure without `Ord`? - haskell

Given the following types:
import Data.Set as Set
-- http://json.org/
type Key = String
data Json = JObject Key (Set JValue)
| JArray JArr
deriving Show
data JObj = JObj Key JValue
deriving Show
data JArr = Arr [JValue] deriving Show
data Null = Null deriving Show
data JValue = Num Double
| S String
| B Bool
| J JObj
| Array JArr
| N Null
deriving Show
I created a JObject Key (Set Value) with a single element:
ghci> JObject "foo" (Set.singleton (B True))
JObject "foo" (fromList [B True])
But, when I tried to create a 2-element Set, I got a compile-time error:
ghci> JObject "foo" (Set.insert (Num 5.5) $ Set.singleton (B True))
<interactive>:159:16:
No instance for (Ord JValue) arising from a use of ‘insert’
In the expression: insert (Num 5.5)
In the second argument of ‘JObject’, namely
‘(insert (Num 5.5) $ singleton (B True))’
In the expression:
JObject "foo" (insert (Num 5.5) $ singleton (B True))
So I asked, "Why is it necessary for JValue to implement the Ord typeclass?"
The docs on Data.Set answer that question.
The implementation of Set is based on size balanced binary trees (or trees of bounded balance)
But, is there a Set-like, i.e. non-ordered, data structure that does not require Ord's implementation that I can use?

You will pretty much always need at least Eq to implement a set (or at least the ability to write an Eq instance, whether or not one exists). Having only Eq will give you a horrifyingly inefficient one. You can improve this with Ord or with Hashable.
One thing you might want to do here is use a trie, which will let you take advantage of the nested structure instead of constantly fighting it.
You can start by looking at generic-trie. This does not appear to offer anything for your Array pieces, so you may have to add some things.
Why Eq is not good enough
The simplest way to implement a set is using a list:
type Set a = [a]
member a [] = False
member (x:xs) | a == x = True
| otherwise = member a xs
insert a xs | member a xs = xs
| otherwise = a:xs
This is no good (unless there are very few elements), because you may have to traverse the entire list to see if something is a member.
To improve matters, we need to use some sort of tree:
data Set a = Node a (Set a) (Set a) | Tip
There are a lot of different kinds of trees we can make, but in order to use them, we must be able, at each node, to decide which of the branches to take. If we only have Eq, there is no way to choose the right one. If we have Ord (or Hashable), that gives us a way to choose.
The trie approach structures the tree based on the structure of the data. When your type is deeply nested (a list of arrays of records of lists...), either hashing or comparison can be very expensive, so the trie will probably be better.
Side note on Ord
Although I don't think you should use the Ord approach here, it very often is the right one. In some cases, your particular type may not have a natural ordering, but there is some efficient way to order its elements. In this case you can play a trick with newtype:
newtype WrappedThing = Wrap Thing
instance Ord WrappedThing where
....
newtype ThingSet = ThingSet (Set WrappedThing)
insertThing thing (ThingSet s) = ThingSet (insert (Wrap thing) s)
memberThing thing (ThingSet s) = member (WrapThing) s
...
Yet another approach, in some cases, is to define a "base type" that is an Ord instance, but only export a newtype wrapper around it; you can use the base type for all your internal functions, but the exported type is completely abstract (and not an Ord instance).

Related

The simplest way to generically traverse a tree in haskell

Suppose I used language-javascript library to build AST in Haskell. The AST has nodes of different types, and each node can have fields of those different types.
And each type can have numerous constructors. (All the types instantiate Data, Eq and Show).
I would like to count each type's constructor occurrence in the tree. I could use toConstr to get the constructor, and ideally I'd make a Tree -> [Constr] function fisrt (then counting is easy).
There are different ways to do that. Obviously pattern matching is too verbose (imagine around 3 types with 9-28 constructors).
So I'd like to use a generic traversal, and I tried to find the solution in SYB library.
There is an everywhere function, which doesn't suit my needs since I don't need a Tree -> Tree transformation.
There is gmapQ, which seems suitable in terms of its type, but as it turns out it's not recursive.
The most viable option so far is everywhereM. It still does the useless transformation, but I can use a Writer to collect toConstr results. Still, this way doesn't really feel right.
Is there an alternative that will not perform a useless (for this task) transformation and still deliver the list of constructors? (The order of their appearance in the tree doesn't matter for now)

Not sure if it's the simplest, but:
> data T = L | B T T deriving Data
> everything (++) (const [] `extQ` (\x -> [toConstr (x::T)])) (B L (B (B L L) L))
[B,L,B,B,L,L,L]
Here ++ says how to combine the results from subterms.
const [] is the base case for subterms who are not of type T. For those of type T, instead, we apply \x -> [toConstr (x::T)].
If you have multiple tree types, you'll need to extend the query using
const [] `extQ` (handleType1) `extQ` (handleType2) `extQ` ...
This is needed to identify the types for which we want to take the constructors. If there are a lot of types, probably this can be made shorter in some way.
Note that the code above is not very efficient on large trees since using ++ in this way can lead to quadratic complexity. It would be better, performance wise, to return a Data.Map.Map Constr Int. (Even if we do need to define some Ord Constr for that)

universe from the Data.Generics.Uniplate.Data module can give you a list of all the sub-trees of the same type. So using Ilya's example:
data T = L | B T T deriving (Data, Show)
tree :: T
tree = B L (B (B L L) L)
λ> import Data.Generics.Uniplate.Data
λ> universe tree
[B L (B (B L L) L),L,B (B L L) L,B L L,L,L,L]
λ> fmap toConstr $ universe tree
[B,L,B,B,L,L,L]

The limit set of types with new data like `Tree a`

Exploring and studing type system in Haskell I've found some problems.
1) Let's consider polymorphic type as Binary Tree:
data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving Show
And, for example, I want to limit my considerations only with Tree Int, Tree Bool and Tree Char. Of course, I can make a such new type:
data TreeIWant = T1 (Tree Int) | T2 (Tree Bool) | T3 (Tree Char) deriving Show
But could it possible to make new restricted type (for homogeneous trees) in more elegant (and without new tags like T1,T2,T3) way (perhaps with some advanced type extensions)?
2) Second question is about trees with heterogeneous values. I can do them with usual Haskell, i.e. I can do the new helping type, contained tagged heterogeneous values:
data HeteroValues = H1 Int | H2 Bool | H3 Char deriving Show
and then make tree with values of this type:
type TreeH = Tree HeteroValues
But could it possible to make new type (for heterogeneous trees) in more elegant (and without new tags like H1,H2,H3) way (perhaps with some advanced type extensions)?
I know about heterogeneous list, perhaps it is the same question?

For question #2, it's easy to construct a "restricted" heterogeneous type without explicit tags using a GADT and a type class:
{-# LANGUAGE GADTs #-}
data Thing where
T :: THING a => a -> Thing
class THING a
Now, declare THING instances for the the things you want to allow:
instance THING Int
instance THING Bool
instance THING Char
and you can create Things and lists (or trees) of Things:
> t1 = T 'a' -- Char is okay
> t2 = T "hello" -- but String is not
... type error ...
> tl = [T (42 :: Int), T True, T 'x']
> tt = Branch (Leaf (T 'x')) (Leaf (T False))
>
In terms of the type names in your question, you have:
type HeteroValues = Thing
type TreeH = Tree Thing
You can use the same type class with a new GADT for question #1:
data ThingTree where
TT :: THING a => Tree a -> ThingTree
and you have:
type TreeIWant = ThingTree
and you can do:
> tt1 = TT $ Branch (Leaf 'x') (Leaf 'y')
> tt2 = TT $ Branch (Leaf 'x') (Leaf False)
... type error ...
>
That's all well and good, until you try to use any of the values you've constructed. For example, if you wanted to write a function to extract a Bool from a possibly boolish Thing:
maybeBool :: Thing -> Maybe Bool
maybeBool (T x) = ...
you'd find yourself stuck here. Without a "tag" of some kind, there's no way of determining if x is a Bool, Int, or Char.
Actually, though, you do have an implicit tag available, namely the THING type class dictionary for x. So, you can write:
maybeBool :: Thing -> Maybe Bool
maybeBool (T x) = maybeBool' x
and then implement maybeBool' in your type class:
class THING a where
maybeBool' :: a -> Maybe Bool
instance THING Int where
maybeBool' _ = Nothing
instance THING Bool where
maybeBool' = Just
instance THING Char where
maybeBool' _ = Nothing
and you're golden!
Of course, if you'd used explicit tags:
data Thing = T_Int Int | T_Bool Bool | T_Char Char
then you could skip the type class and write:
maybeBool :: Thing -> Maybe Bool
maybeBool (T_Bool x) = Just x
maybeBool _ = Nothing
In the end, it turns out that the best Haskell representation of an algebraic sum of three types is just an algebraic sum of three types:
data Thing = T_Int Int | T_Bool Bool | T_Char Char
Trying to avoid the need for explicit tags will probably lead to a lot of inelegant boilerplate elsewhere.
Update: As #DanielWagner pointed out in a comment, you can use Data.Typeable in place of this boilerplate (effectively, have GHC generate a lot of boilerplate for you), so you can write:
import Data.Typeable
data Thing where
T :: THING a => a -> Thing
class Typeable a => THING a
instance THING Int
instance THING Bool
instance THING Char
maybeBool :: Thing -> Maybe Bool
maybeBool = cast
This perhaps seems "elegant" at first, but if you try this approach in real code, I think you'll regret losing the ability to pattern match on Thing constructors at usage sites (and so having to substitute chains of casts and/or comparisons of TypeReps).

Using different Ordering for Sets

I was reading a Chapter 2 of Purely Functional Data Structures, which talks about unordered sets implemented as binary search trees. The code is written in ML, and ends up showing a signature ORDERED and a functor UnbalancedSet(Element: ORDERED): SET. Coming from more of a C++ background, this makes sense to me; custom comparison function objects form part of the type and can be passed in at construction time, and this seems fairly analogous to the ML functor way of doing things.
When it comes to Haskell, it seems the behavior depends only on the Ord instance, so if I wanted to have a set that had its order reversed, it seems like I'd have to use a newtype instance, e.g.
newtype ReverseInt = ReverseInt Int deriving (Eq, Show)
instance Ord ReverseInt where
compare (ReverseInt a) (ReverseInt b)
| a == b = EQ
| a < b = GT
| a > b = LT
which I could then use in a set:
let x = Set.fromList $ map ReverseInt [1..5]
Is there any better way of doing this sort of thing that doesn't resort to using newtype to create a different Ord instance?

No, this is really the way to go. Yes, having a newtype is sometimes annoying but you get some big benefits:
When you see a Set a and you know a, you immediately know what type of comparison it uses (sort of the same way that purity makes code more readable by not making you have to trace execution). You don't have to know where that Set a comes from.
For many cases, you can coerce your way through multiple newtypes at once. For example, I can turn xs = [1,2,3] :: Int into ys = [ReverseInt 1, ReverseInt 2, ReverseInt 3] :: [ReverseInt] just using ys = coerce xs :: [ReverseInt]. Unfortunately, that isn't the case for Set (and it shouldn't - you'd need the coercion function to be monotonic to not screw up the data structure invariants, and there is not yet a way to express that in the type system).
newtypes end up being more composable than you expect. For example, the ReverseInt type you made already exists in a form that generalizes to reversing any type with an Ord constraint: it is called Down. To be explicit, you could use Down Int instead of ReversedInt, and you get the instance you wrote out for free!
Of course, if you still feel very strongly about this, nothing is stopping you from writing your version of Set which has to have a field which is the comparison function it uses. Something like
data Set a = Set { comparisionKey :: a -> a -> Ordering
, ...
}
Then, every time you make a Set, you would have to pass in the comparison key.

Set-Like data structure that maintains insertion Order?

The properties I'm looking for are
initially maintains insertion order
transversing in the insertion order
and of course maintain that each element is unique
But there are cases where It's okay to disregard insertion order, such as...
retrieving a difference between two different sets
performing a union the two sets eliminating any duplicates
Java's LinkedHashSet seems to be exactly what I'm after, except for the fact it's not written in Haskell.
current & initial solution
The easiest (and a relevantly inefficient) solution is to implement it as a list and transform it into a set when I need too, but I believe there is likely a better way.
other ideas
My first idea was to implement it as a Data.Set of a newtype of (Int, a) where it would be ordered by the first tuple index, and the second index (a) being the actual value. I quickly realised this wasn't going to work because as the set would allow for duplicates of the type a, which would have defeated the whole purpose of using a set.
simultaneously maintaining a list and a set? (nope)
Another Idea I had was have an abstract data type that would maintain both a list and set representation of the data, which doesn't sound to efficient either.
recap
Are there any descent implementations of such a data structure in Haskell? I've seen Data.List.Ordered but it seems to just add set operations to lists, which sounds terribly inefficient as well (but likely what I'll settle with if I can't find a solution). Another solution suggested here, was to implement it via finger tree, but I would prefer to not reimplement it if it's already a solved problem.

You can certainly use Data.Set with what is isomorphic to (Int, a), but wrapped in a newtype with different a Eq instance:
newtype Entry a = Entry { unEntry :: (Int, a) } deriving (Show)
instance Eq a => Eq (Entry a) where
(Entry (_, a)) == (Entry (_, b)) = a == b
instance Ord a => Ord (Entry a) where
compare (Entry (_, a)) (Entry (_, b)) = compare a b
But this won't quite solve all your problems if you want automatic incrementing of your index, so you could make a wrapper around (Set (Entry a), Int):
newtype IndexedSet a = IndexedSet (Set (Entry a), Int) deriving (Eq, Show)
But this does mean that you'll have to re-implement Data.Set to respect this relationship:
import qualified Data.Set as S
import Data.Set (Set)
import Data.Ord (comparing)
import Data.List (sortBy)
-- declarations from above...
null :: IndexedSet a -> Bool
null (IndexedSet (set, _)) = S.null set
-- | If you re-index on deletions then size will just be the associated index
size :: IndexedSet a -> Int
size (IndexedSet (set, _)) = S.size set
-- Remember that (0, a) == (n, a) for all n
member :: Ord a => a -> IndexedSet a -> Bool
member a (IndexedSet (set, _)) = S.member (Entry (0, a)) set
empty :: IndexedSet a
empty = IndexedSet (S.empty, 0)
-- | This function is critical, you have to make sure to increment the index
-- Might also want to consider making it strict in the i field for performance
insert :: Ord a => a -> IndexedSet a -> IndexedSet a
insert a (IndexedSet (set, i)) = IndexedSet (S.insert (Entry (i, a)) set, i + 1)
-- | Simply remove the `Entry` wrapper, sort by the indices, then strip those off
toList :: IndexedSet a -> [a]
toList (IndexedSet (set, _))
= map snd
$ sortBy (comparing fst)
$ map unEntry
$ S.toList set
But this is fairly trivial in most cases and you can add functionality as you need it. The only thing you'll need to really worry about is what to do in deletions. Do you re-index everything or are you just concerned about order? If you're just concerned about order, then it's simple (and size can be left sub-optimal by having to actually calculate the size of the underlying Set), but if you re-index then you can get your size in O(1) time. These sorts of decisions should be decided based on what problem you're trying to solve.
I would prefer to not reimplement it if it's already a solved problem.
This approach is definitely a re-implementation. But it isn't complicated in most of the cases, could be pretty easily turned into a nice little library to upload to Hackage, and retains a lot of the benefits of sets without much bookkeeping.

Inconsistent Eq and Ord instances?

I have a large Haskell program which is running dismayingly slow. Profiling and testing has revealed that a large fraction of the time is spend comparing equality and ordering of a particular large datatype that is very important. Equality is a useful operation (this is state-space search, and graph search is much preferable to tree search), but I only need an Ord instance for this class in order to use Maps. So what I want to do is say
instance Eq BigThing where
(==) b b' = name b == name b' &&
firstPart b == firstPart b' &&
secondPart b == secondPart b' &&
{- ...and so on... -}
instance Ord BigThing where
compare b b' = compare (name b) (name b')
But since the names may not always be different for different objects, this risks the curious case where two BigThings may be inequal according to ==, but comparing them yields EQ.
Is this going to cause problems with Haskell libraries? Is there another way I could satisfy the requirement for a detailed equality operation but a cheap ordering?

First, using Text or ByteString instead of String could help a lot without changing anything else.
Generally I wouldn't recommend creating an instance of Eq inconsistent with Ord. Libraries can rightfully depend on it, and you never know what kind of strange problems it can cause. (For example, are you sure that Map doesn't use the relationship between Eq and Ord?)
If you don't need the Eq instance at all, you can simply define
instance Eq BigThing where
x == y = compare x y == EQ
Then equality will be consistent with comparison. There is no requirement that equal values must have all fields equal.
If you need an Eq instance that compares all fields, then you can stay consistent by wrapping BigThing into a newtype, define the above Eq and Ord for it, and use it in your algorithm whenever you need ordering according to name:
newtype BigThing' a b c = BigThing' (BigThing a b c)
instance Eq BigThing' where
x == y = compare x y == EQ
instance Ord BigThing' where
compare (BigThing b) (BigThing b') = compare (name b) (name b')
Update: Since you say any ordering is acceptable, you can use hashing to your advantage. For this, you can use the hashable package. The idea is that you pre-compute hash values on data creation and use them when comparing values. If two values are different, it's almost sure that their hashes will differ and you the compare only their hashes (two integers), nothing more. It could look like this:
module BigThing
( BigThing()
, bigThing
, btHash, btName, btSurname
)
where
import Data.Hashable
data BigThing = BigThing { btHash :: Int,
btName :: String,
btSurname :: String } -- etc
deriving (Eq, Ord)
-- Since the derived Eq/Ord instances compare fields lexicographically and
-- btHash is the first, they'll compare the hash first and continue with the
-- other fields only if the hashes are equal.
-- See http://www.haskell.org/onlinereport/derived.html#sect10.1
--
-- Alternativelly, you can create similar Eq/Ord instances yourself, if for any
-- reason you don't want the hash to be the first field.
-- A smart constructor for creating instances. Your module will not export the
-- BigThing constructor, it will export this function instead:
bigThing :: String -> String -> BigThing
bigThing nm snm = BigThing (hash (nm, snm)) nm snm
Note that with this solution, the ordering will be seemingly random with no apparent relation to the fields.
You can also combine this solution with the previous ones. Or, you can create a small module for wrapping any type with its precomputed hash (wrapped values must have Eq instances consistent with their Hashable instances).
module HashOrd
( Hashed()
, getHashed
, hashedHash
)
where
import Data.Hashable
data Hashed a = Hashed { hashedHash :: Int, getHashed :: a }
deriving (Ord, Eq, Show, Read, Bounded)
hashed :: (Hashable a) => a -> Hashed a
hashed x = Hashed (hash x) x
instance Hashable a => Hashable (Hashed a) where
hashWithSalt salt (Hashed _ x) = hashWithSalt salt x

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string