Retaining list-ness operations for a data type - haskell

I want to define a custom data type, as opposed to using a type alias, to ensure that proper values are being passed around. Below is a sketch of how that might look:
module Example (fromList) where
import Data.Ord (comparing, Down(..))
import Data.List (sort)
data DictEntry = DictEntry (String, Integer) deriving (Show, Eq)
instance Ord DictEntry where
  (DictEntry (word1, freq1)) `compare` (DictEntry (word2, freq2))
    | freq1 == freq2 = word1 `compare` word2
    | otherwise = comparing Down freq1 freq2
data Dictionary = Dictionary [DictEntry] deriving (Show)
fromList :: [(String, Integer)] -> Dictionary
fromList l = Dictionary $ sort $ map DictEntry l
However, I'd also like to retain the "list-ness" of the underlying type without having to unwrap and re-wrap [DictEntry], and without having to define utility functions such as head :: Dictionary -> DictEntry and tail :: Dictionary -> Dictionary. Is that possible? Is there some type class that I could define an instance of or a language extension that enables this?

Never use head, and avoid tail, for lists or anything else. They are unsafe (both fail on an empty list) and can always easily be replaced with pattern matching.
But yes, there is a typeclass that supports list-like operations, or rather multiple classes. The simplest of these is Monoid, which just implements concatenation and empty-initialisation. Foldable allows you to deconstruct containers as if they were lists. Traversable additionally allows you to assemble them again as you go over the data.
The latter two won't quite work with Dictionary because it's not parametric on the contained type. You can circumvent that by switching to the “monomorphic version”.
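As a minimal sketch of the Monoid option mentioned above (not necessarily what the answerer had in mind), the following gives Dictionary a Semigroup and a Monoid instance. It assumes that combining two dictionaries should simply merge and re-sort the entries; the instances themselves are an illustration and do not appear in the question.

```haskell
import Data.List (sort)
import Data.Ord (comparing, Down(..))

data DictEntry = DictEntry (String, Integer) deriving (Show, Eq)

-- Higher frequency first; ties are broken alphabetically,
-- which is the same ordering as in the question.
instance Ord DictEntry where
  compare (DictEntry (w1, f1)) (DictEntry (w2, f2)) =
    comparing Down f1 f2 <> compare w1 w2

newtype Dictionary = Dictionary [DictEntry] deriving (Show, Eq)

-- Assumption: "concatenation" means merging the entries and re-sorting them.
instance Semigroup Dictionary where
  Dictionary xs <> Dictionary ys = Dictionary (sort (xs ++ ys))

instance Monoid Dictionary where
  mempty = Dictionary []

fromList :: [(String, Integer)] -> Dictionary
fromList = Dictionary . sort . map DictEntry
```

With these instances, fromList xs <> fromList ys gives the same Dictionary as fromList (xs ++ ys), and mconcat works on a list of dictionaries.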
However, I frankly don't think you should do any of this – just use the standard Map type to store key-value associative data, instead of rolling your own dictionary type.
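A rough sketch of the Map-based alternative follows, assuming the dictionary maps words to frequencies and that repeated words should have their counts added. The names fromList and byFrequency are hypothetical helpers chosen for this example.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (sortBy)
import Data.Ord (comparing, Down(..))

type Dictionary = Map.Map String Integer

-- Assumption: duplicate words accumulate their frequencies.
fromList :: [(String, Integer)] -> Dictionary
fromList = Map.fromListWith (+)

-- Entries ordered by descending frequency, then alphabetically.
byFrequency :: Dictionary -> [(String, Integer)]
byFrequency = sortBy (comparing (Down . snd) <> comparing fst) . Map.toList
```

All of Data.Map's functions (lookup, insertWith, foldr, and so on) then work directly, with no wrapping or unwrapping.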

Related

How to define an unordered collection in Haskell

Wondering how you would define an unordered group/collection in Haskell, where by "collection" I mean it can have many copies of the same element, and the items are unordered. I know of the List data type in Haskell, but this is inherently ordered. I would like to see what the definition would look like for an unordered collection/group/list.
I would define it this way
import qualified Data.Map.Lazy as Map
type MultiSet' a = Map.Map a Int
Just a mapping from a type a to an Int. In mathematics it would be something like f : S -> N. The elements you put into it must be orderable (that is, have an Ord instance), because the underlying structure of the Map is a binary tree. This shouldn't be a problem, as you can forget about it when using the data structure. See the very extensive documentation of Data.Map for functions to deal with our MultiSet'.
Now there is already a definition together with an implementation for this, and it is called MultiSet. You can browse its source code as well; there you see they defined it in an almost identical way (they used the strict version of the map).
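To make the Map-based definition concrete, here is a small sketch of the basic operations. The function names insert, count and fromList are chosen for this example and are not taken from the MultiSet package.

```haskell
import qualified Data.Map.Strict as Map

type MultiSet a = Map.Map a Int

-- Add one occurrence of x.
insert :: Ord a => a -> MultiSet a -> MultiSet a
insert x = Map.insertWith (+) x 1

-- Number of occurrences of x (0 if absent).
count :: Ord a => a -> MultiSet a -> Int
count = Map.findWithDefault 0

fromList :: Ord a => [a] -> MultiSet a
fromList = foldr insert Map.empty
```

Because only the counts are stored, fromList "ab" and fromList "ba" are equal, which is the "unordered" behaviour asked for.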
Alternatively you can use a hashmap, it will look like this:
import qualified Data.HashMap.Lazy as Map
type MultiSet'' a = Map.HashMap a Int
The elements you put into it do not need to be orderable, but they must be hashable.
If you just want a structure that has no reasonable order then why not compose a Map with a hash?
import Data.Hashable (hash)
import qualified Data.Map as Map
type MyBag a = Map.Map (Int, a) Int
insert x mp = Map.insertWith (+) (hash x, x) 1 mp
The above is a balanced binary tree with an order that depends on the hash of the value you have inserted. The map itself is boring, along the lines of data Map k a = Bin (Map k a) a (Map k a) | Nil.
This said, I think you underspecified what you are looking for and what you hope to learn. Your searches have probably yielded hashtables and unordered-containers - why aren't those sufficiently informative?

Using different Ordering for Sets

I was reading Chapter 2 of Purely Functional Data Structures, which talks about unordered sets implemented as binary search trees. The code is written in ML, and ends up showing a signature ORDERED and a functor UnbalancedSet(Element: ORDERED): SET. Coming from more of a C++ background, this makes sense to me; custom comparison function objects form part of the type and can be passed in at construction time, and this seems fairly analogous to the ML functor way of doing things.
When it comes to Haskell, it seems the behavior depends only on the Ord instance, so if I wanted to have a set that had its order reversed, it seems like I'd have to use a newtype instance, e.g.
newtype ReverseInt = ReverseInt Int deriving (Eq, Show)
instance Ord ReverseInt where
  compare (ReverseInt a) (ReverseInt b)
    | a == b = EQ
    | a < b = GT
    | a > b = LT
which I could then use in a set:
let x = Set.fromList $ map ReverseInt [1..5]
Is there any better way of doing this sort of thing that doesn't resort to using newtype to create a different Ord instance?
No, this is really the way to go. Yes, having a newtype is sometimes annoying but you get some big benefits:
When you see a Set a and you know a, you immediately know what type of comparison it uses (sort of the same way that purity makes code more readable by not making you have to trace execution). You don't have to know where that Set a comes from.
For many cases, you can coerce your way through multiple newtypes at once. For example, I can turn xs = [1,2,3] :: [Int] into ys = [ReverseInt 1, ReverseInt 2, ReverseInt 3] :: [ReverseInt] just using ys = coerce xs :: [ReverseInt]. Unfortunately, that isn't the case for Set (and it shouldn't be: you'd need the coercion function to be monotonic to not screw up the data structure invariants, and there is not yet a way to express that in the type system).
newtypes end up being more composable than you expect. For example, the ReverseInt type you made already exists in a form that generalizes to reversing any type with an Ord constraint: it is called Down. To be explicit, you could use Down Int instead of ReverseInt, and you get the instance you wrote out for free!
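As a short illustration of the Down suggestion, the sketch below builds a reversed-order set and reads it back as plain Ints. The names descending and descendingList are made up for this example.

```haskell
import qualified Data.Set as Set
import Data.Coerce (coerce)
import Data.Ord (Down(..))

-- The set orders its elements by the Ord instance of Down Int,
-- which is the reverse of the ordering on Int.
descending :: Set.Set (Down Int)
descending = Set.fromList (map Down [1 .. 5])

-- Coercing the resulting list (not the Set itself) strips the
-- Down wrappers at no runtime cost.
descendingList :: [Int]
descendingList = coerce (Set.toAscList descending)
```

Here the set's "ascending" traversal yields the Ints from largest to smallest, and Set.findMin descending is Down 5.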
Of course, if you still feel very strongly about this, nothing is stopping you from writing your version of Set which has to have a field which is the comparison function it uses. Something like
data Set a = Set { comparisonKey :: a -> a -> Ordering
                 , ...
                 }
Then, every time you make a Set, you would have to pass in the comparison key.
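To show what carrying the comparison function around might look like, here is a deliberately naive sketch. It assumes a sorted list as the backing store instead of a balanced tree, and the names CmpSet, cmpKey, emptyWith, member and insert are all hypothetical.

```haskell
import Data.List (insertBy)

-- A set that carries its own comparison function.
data CmpSet a = CmpSet
  { cmpKey :: a -> a -> Ordering
  , elems  :: [a]  -- kept sorted by cmpKey, without duplicates
  }

-- The comparison has to be supplied whenever a set is created.
emptyWith :: (a -> a -> Ordering) -> CmpSet a
emptyWith cmp = CmpSet cmp []

member :: a -> CmpSet a -> Bool
member x (CmpSet cmp xs) = any (\y -> cmp x y == EQ) xs

insert :: a -> CmpSet a -> CmpSet a
insert x s@(CmpSet cmp xs)
  | member x s = s
  | otherwise  = CmpSet cmp (insertBy cmp x xs)
```

Note the cost of this design: nothing in the type says which ordering a given CmpSet Int uses, so an operation such as union on two sets cannot check that both were built with the same comparison.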

Set-like Data Structure without `Ord`?

Given the following types:
import Data.Set as Set
-- http://json.org/
type Key = String
data Json = JObject Key (Set JValue)
          | JArray JArr
          deriving Show
data JObj = JObj Key JValue
          deriving Show
data JArr = Arr [JValue] deriving Show
data Null = Null deriving Show
data JValue = Num Double
            | S String
            | B Bool
            | J JObj
            | Array JArr
            | N Null
            deriving Show
I created a JObject Key (Set JValue) with a single element:
ghci> JObject "foo" (Set.singleton (B True))
JObject "foo" (fromList [B True])
But, when I tried to create a 2-element Set, I got a compile-time error:
ghci> JObject "foo" (Set.insert (Num 5.5) $ Set.singleton (B True))
<interactive>:159:16:
No instance for (Ord JValue) arising from a use of ‘insert’
In the expression: insert (Num 5.5)
In the second argument of ‘JObject’, namely
‘(insert (Num 5.5) $ singleton (B True))’
In the expression:
JObject "foo" (insert (Num 5.5) $ singleton (B True))
So I asked, "Why is it necessary for JValue to implement the Ord typeclass?"
The docs on Data.Set answer that question.
The implementation of Set is based on size balanced binary trees (or trees of bounded balance)
But, is there a Set-like, i.e. non-ordered, data structure that does not require Ord's implementation that I can use?
You will pretty much always need at least Eq to implement a set (or at least the ability to write an Eq instance, whether or not one exists). Having only Eq will give you a horrifyingly inefficient one. You can improve this with Ord or with Hashable.
One thing you might want to do here is use a trie, which will let you take advantage of the nested structure instead of constantly fighting it.
You can start by looking at generic-trie. This does not appear to offer anything for your Array pieces, so you may have to add some things.
Why Eq is not good enough
The simplest way to implement a set is using a list:
type Set a = [a]
member a [] = False
member a (x:xs) | a == x = True
                | otherwise = member a xs
insert a xs | member a xs = xs
            | otherwise = a:xs
This is no good (unless there are very few elements), because you may have to traverse the entire list to see if something is a member.
To improve matters, we need to use some sort of tree:
data Set a = Node a (Set a) (Set a) | Tip
There are a lot of different kinds of trees we can make, but in order to use them, we must be able, at each node, to decide which of the branches to take. If we only have Eq, there is no way to choose the right one. If we have Ord (or Hashable), that gives us a way to choose.
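The branch decision described above can be sketched with an unbalanced binary search tree. This is a minimal illustration under the assumption of an Ord instance; a real implementation such as Data.Set also keeps the tree balanced.

```haskell
data Set a = Node a (Set a) (Set a) | Tip

-- compare tells us which subtree can contain the element,
-- so only one path from the root is ever followed.
member :: Ord a => a -> Set a -> Bool
member _ Tip = False
member a (Node x l r) = case compare a x of
  LT -> member a l
  GT -> member a r
  EQ -> True

insert :: Ord a => a -> Set a -> Set a
insert a Tip = Node a Tip Tip
insert a t@(Node x l r) = case compare a x of
  LT -> Node x (insert a l) r
  GT -> Node x l (insert a r)
  EQ -> t  -- already present
```

With only (==) available, the LT and GT cases cannot be told apart, so both subtrees would have to be searched.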
The trie approach structures the tree based on the structure of the data. When your type is deeply nested (a list of arrays of records of lists...), either hashing or comparison can be very expensive, so the trie will probably be better.
Side note on Ord
Although I don't think you should use the Ord approach here, it very often is the right one. In some cases, your particular type may not have a natural ordering, but there is some efficient way to order its elements. In this case you can play a trick with newtype:
newtype WrappedThing = Wrap Thing
instance Ord WrappedThing where
  ....
newtype ThingSet = ThingSet (Set WrappedThing)
insertThing thing (ThingSet s) = ThingSet (insert (Wrap thing) s)
memberThing thing (ThingSet s) = member (Wrap thing) s
...
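A complete version of this newtype trick might look as follows. The Thing type here is purely hypothetical: it is assumed to carry an Int identifier that is cheap to compare, plus a function field that rules out a derived Ord instance.

```haskell
import qualified Data.Set as Set
import Data.Ord (comparing)

-- Hypothetical type: Ord cannot be derived because of the function field.
data Thing = Thing { thingId :: Int, thingAction :: Int -> Int }

newtype WrappedThing = Wrap Thing

-- Assumption: two Things with the same thingId count as the same element.
instance Eq WrappedThing where
  Wrap a == Wrap b = thingId a == thingId b

instance Ord WrappedThing where
  compare (Wrap a) (Wrap b) = comparing thingId a b

newtype ThingSet = ThingSet (Set.Set WrappedThing)

emptyThings :: ThingSet
emptyThings = ThingSet Set.empty

insertThing :: Thing -> ThingSet -> ThingSet
insertThing thing (ThingSet s) = ThingSet (Set.insert (Wrap thing) s)

memberThing :: Thing -> ThingSet -> Bool
memberThing thing (ThingSet s) = Set.member (Wrap thing) s

sizeThings :: ThingSet -> Int
sizeThings (ThingSet s) = Set.size s
```

Users of ThingSet never see WrappedThing or its Ord instance, so Thing itself stays free of an ordering that has no natural meaning.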
Yet another approach, in some cases, is to define a "base type" that is an Ord instance, but only export a newtype wrapper around it; you can use the base type for all your internal functions, but the exported type is completely abstract (and not an Ord instance).

How to return a polymorphic type in Haskell based on the results of string parsing?

TL;DR:
How can I write a function which is polymorphic in its return type? I'm working on an exercise where the task is to write a function which is capable of analyzing a String and, depending on its contents, generate either a Vector [Int], Vector [Char] or Vector [String].
Longer version:
Here are a few examples of how the intended function would behave:
The string "1 2\n3 4" would generate a Vector [Int] that's made up of two lists: [1,2] and [3,4].
The string "'t' 'i' 'c'\n't' 'a' 'c'\n't' 'o' 'e'" would generate a Vector [Char] (i.e., made up of the lists "tic", "tac" and "toe").
The string "\"hello\" \"world\"\n\"monad\" \"party\"" would generate a Vector [String] (i.e., ["hello","world"] and ["monad","party"]).
Error-checking/exception handling is not a concern for this particular exercise. At this stage, all testing is done purely, i.e., this isn't in the realm of the IO monad.
What I have so far:
I have a function (and new datatype) which is capable of classifying a string. I also have functions (one for each Int, Char and String) which can convert the string into the necessary Vector.
My question: how can I combine these three conversion functions into a single function?
What I've tried:
It obviously doesn't typecheck if I stuff the three conversion functions into a single function (i.e., using a case..of structure to pattern match on the VectorType of the string).
I tried making a Vectorable class and defining a separate instance for each type; I quickly realized that this approach only works if the functions' arguments vary by type. In our case, the type of the argument doesn't vary (i.e., it's always a String).
My code:
A few comments
Parsing: the mySplitter object and the mySplit function handle the parsing. It's admittedly a crude parser based on the Splitter type and the split function from Data.List.Split.Internals.
Classifying: The classify function is capable of determining the final VectorType based on the string.
Converting: The toVectorNumber, toVectorChar and toVectorString functions are able to convert a string to type Vector [Int], Vector [Char] and Vector [String], respectively.
As a side note, I'm trying out CorePrelude based on a recommendation from a mentor. That's why you'll see me use the generalized versions of the normal Prelude functions.
Code:
import qualified Prelude
import CorePrelude
import Data.Foldable (concat, elem, any)
import Control.Monad (mfilter)
import Text.Read (read)
import Data.Char (isAlpha, isSpace)
import Data.List.Split (split)
import Data.List.Split.Internals (Splitter(..), DelimPolicy(..), CondensePolicy(..), EndPolicy(..), Delimiter(..))
import Data.Vector ()
import qualified Data.Vector as V
data VectorType = Number | Character | TextString deriving (Show)
mySplitter :: [Char] -> Splitter Char
mySplitter elts = Splitter { delimiter = Delimiter [(`elem` elts)]
                           , delimPolicy = Drop
                           , condensePolicy = Condense
                           , initBlankPolicy = DropBlank
                           , finalBlankPolicy = DropBlank }
mySplit :: [Char]-> [Char]-> [[Char]]
mySplit delims = split (mySplitter delims)
classify :: String -> VectorType
classify xs
  | '\"' `elem` cs = TextString
  | hasAlpha cs = Character
  | otherwise = Number
  where
    cs = concat $ split (mySplitter "\n") xs
    hasAlpha = any isAlpha . mfilter (/=' ')
toRows :: [Char] -> [[Char]]
toRows = mySplit "\n"
toVectorChar :: [Char] -> Vector [Char]
toVectorChar = let toChar = concat . mySplit " \'"
               in V.fromList . fmap (toChar) . toRows
toVectorNumber :: [Char] -> Vector [Int]
toVectorNumber = let toNumber = fmap (\x -> read x :: Int) . mySplit " "
                 in V.fromList . fmap toNumber . toRows
toVectorString :: [Char] -> Vector [[Char]]
toVectorString = let toString = mfilter (/= " ") . mySplit "\""
                 in V.fromList . fmap toString . toRows
You can't.
Covariant polymorphism is not supported in Haskell, and wouldn't be useful if it were.
That's basically all there is to answer. Now as to why this is so.
It's no good "returning a polymorphic value" like OO languages so like to do, because the only reason to return any value at all is to use it in other functions. Now, in OO languages you don't have functions but methods that come with the object, so it's quite easy to "return different types": each will have its suitable methods built-in, and they can per instance vary. (Whether that's a good idea is another question.)
But in Haskell, the functions come from elsewhere. They don't know about implementation changes for a particular instance, so the only way such functions can safely be defined is to know every possible implementation. But if your return type is really polymorphic, that's not possible, because polymorphism is an "open" concept (it allows new implementation varieties to be added any time later).
Instead, Haskell has a very convenient and totally safe mechanism of describing a closed set of "instances" – you've actually used it yourself already! ADTs.
data PolyVector = NumbersVector (Vector [Int])
                | CharsVector (Vector [Char])
                | StringsVector (Vector [String])
That's the return type you want. The function won't be polymorphic as such, it'll simply return a more versatile type.
If you insist it should be polymorphic
Now... actually, Haskell does have a way to sort-of deal with "polymorphic returns". As in OO when you declare that you return a subclass of a specified class. Well, you can't "return a class" at all in Haskell, you can only return types. But those can be made to express "any instance of...". It's called existential quantification.
{-# LANGUAGE GADTs #-}
data PolyVector' where
  PolyVector :: YourVElemClass e => Vector [e] -> PolyVector'
class YourVElemClass e where
  ...?
instance YourVElemClass Int
instance YourVElemClass Char
instance YourVElemClass String -- needs FlexibleInstances, since String is [Char]
I don't know if that looks intriguing to you. Truth is, it's much more complicated and rather harder to use; you can't just use any of the possible results directly but can only make use of the elements through methods of YourVElemClass. GADTs can in some applications be extremely useful, but these usually involve classes with very deep mathematical motivation. YourVElemClass doesn't seem to have such a motivation, so you'll be much better off with a simple ADT alternative than with existential quantification.
There's a famous rant against existentials by Luke Palmer (note he uses another syntax, existential-specific, which I consider obsolete, as GADTs are strictly more general).
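To show how restrictive the existential route is in practice, here is a small sketch that uses Show as a stand-in for YourVElemClass and plain lists in place of Vector (both are assumptions made to keep the example self-contained). AnyRows and renderRows are hypothetical names.

```haskell
{-# LANGUAGE GADTs #-}

-- The element type e is hidden; all a consumer knows is that it has a Show instance.
data AnyRows where
  AnyRows :: Show e => [[e]] -> AnyRows

-- The only thing that can be done with the hidden elements is to call
-- class methods on them, here show.
renderRows :: AnyRows -> [String]
renderRows (AnyRows rows) = map (unwords . map show) rows
```

There is no way to get the original [[Int]] or [[Char]] back out of an AnyRows value, which is why the plain ADT is usually the more practical choice.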
Easy, use a sum type!
data ParsedVector = NumberVector (Vector [Int]) | CharacterVector (Vector [Char]) | TextStringVector (Vector [String]) deriving (Show)
parse :: [Char] -> ParsedVector
parse cs = case classify cs of
  Number -> NumberVector $ toVectorNumber cs
  Character -> CharacterVector $ toVectorChar cs
  TextString -> TextStringVector $ toVectorString cs
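For a version that can be run without the split and vector packages, the following simplified sketch uses plain lists instead of Vector and relies on read for each cell. It assumes that cells are separated by spaces and that quoted strings contain no spaces; Parsed, parseRows, cells and rowCount are hypothetical names, not the asker's functions.

```haskell
data Parsed
  = Numbers [[Int]]
  | Chars [String]
  | Strings [[String]]
  deriving (Show, Eq)

-- Split into rows and cells, then read every cell at the requested type.
cells :: Read a => String -> [[a]]
cells = map (map read . words) . lines

-- Classify the input, then wrap the result in the matching constructor.
parseRows :: String -> Parsed
parseRows input
  | '"' `elem` input  = Strings (cells input)
  | '\'' `elem` input = Chars (cells input)
  | otherwise         = Numbers (cells input)

-- Callers recover the concrete type by pattern matching.
rowCount :: Parsed -> Int
rowCount (Numbers rows) = length rows
rowCount (Chars rows)   = length rows
rowCount (Strings rows) = length rows
```

The single return type Parsed is what lets all three branches typecheck; the consumer decides what to do with each case by matching on the constructor.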

Haskell Ord instance with a Set

I have some code that I would like to use to append an edge to a Node data structure:
import Data.Set (Set)
import qualified Data.Set as Set
data Node = Vertex String (Set Node)
  deriving Show
addEdge :: Node -> Node -> Node
addEdge (Vertex name neighbors) destination
  | Set.null neighbors = Vertex name (Set.singleton destination)
  | otherwise = Vertex name (Set.insert destination neighbors)
However when I try to compile I get this error:
No instance for (Ord Node)
arising from a use of `Set.insert'
As far as I can tell, Set.insert expects nothing but a value and a set to insert it into. What is this Ord?
In GHCi:
> import Data.Set
> :t insert
insert :: (Ord a) => a -> Set a -> Set a
So yes, it does expect Ord. As for what Ord means, it's a type class for ordered values. It's required in this case because Data.Set uses a search tree, and so needs to be able to compare values to see which is larger or if they're equal.
Nearly all of the standard built-in data types are instances of Ord, and things like lists, tuples and Maybe are instances of Ord when their type parameter(s) are. The most notable exception, of course, is functions, for which no sensible concept of ordering (or even equality) can be defined.
In many cases, you can automatically create instances of type classes for your own data types using a deriving clause after the declaration:
data Foo a = Foo a a Int deriving (Eq, Ord, Show, Read)
For parameterized types, the automatic derivation depends on the type parameter also being an instance, as is the case with lists, tuples, and such.
Besides Ord, some important type classes are Eq (equality comparisons, but not less/greater than), Enum (types you can enumerate values of, such as counting Integers), and Read/Show (simple serialization/deserialization with strings). To learn more about type classes, try this chapter in Real World Haskell or, for a more general overview, there's a Wikipedia article.
Haskell sets are based on a search tree. In order to put an element in a search tree an ordering over the elements must be given. You can derive Ord just like you are deriving Show by adding it to your data declaration, i.e.:
data Node = Vertex String (Set Node)
  deriving (Show, Eq, Ord)
You can see the requirement of Ord by the signature of Data.Set.insert
(Ord a) => a -> Set a -> Set a
The part (Ord a) => establishes a constraint that there is an instance of the typeclass Ord for a. The section on type classes in the Haskell tutorial gives a more thorough explanation.
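Putting the pieces together, a version of the question's code that compiles could look like the sketch below. Dropping the Set.null check is a simplification made here, on the grounds that Set.insert also works on an empty set.

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

-- Eq and Ord are derived; the derived Ord compares the name first,
-- then the neighbor sets.
data Node = Vertex String (Set Node)
  deriving (Show, Eq, Ord)

-- Set.insert handles the empty set too, so no special case is needed.
addEdge :: Node -> Node -> Node
addEdge (Vertex name neighbors) destination =
  Vertex name (Set.insert destination neighbors)
```

Adding the same destination twice leaves the node unchanged, since a Set holds each element at most once.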

Resources