Wondering how you would define an unordered group/collection in Haskell, where by "collection" I mean it can have many copies of the same element, and the items are unordered. I know of the List data type in Haskell, but this is inherently ordered. I would like to see what the definition would look like for an unordered collection/group/list.
I would define it this way
import qualified Data.Map.Lazy as Map
type MultiSet' a = Map.Map a Int
Just a mapping from a type a to an Int. In mathematics it would be something like f : S -> N. The elements you put into it must be orderable (that is, have an Ord instance), because the underlying structure of the Map is a balanced binary tree. This shouldn't be a problem, as you can forget about it when using the data structure. See the very extensive documentation of Data.Map for functions to deal with our MultiSet'.
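To make the idea concrete, here is a minimal sketch of a few operations on such a count map. The names Bag, insertBag, count, deleteOne and fromListBag are hypothetical helpers chosen for illustration; they are not part of Data.Map or of the multiset package.

```haskell
import qualified Data.Map.Strict as Map

-- A multiset maps each element to the number of copies it holds.
-- (Bag is a hypothetical name, equivalent to MultiSet' above.)
type Bag a = Map.Map a Int

-- Add one copy of an element, creating the entry if needed.
insertBag :: Ord a => a -> Bag a -> Bag a
insertBag x = Map.insertWith (+) x 1

-- Number of copies of an element (0 if it is absent).
count :: Ord a => a -> Bag a -> Int
count = Map.findWithDefault 0

-- Remove one copy; drop the key once its count would reach zero.
deleteOne :: Ord a => a -> Bag a -> Bag a
deleteOne = Map.update (\n -> if n > 1 then Just (n - 1) else Nothing)

-- Build a bag from a list; the order of the list does not matter.
fromListBag :: Ord a => [a] -> Bag a
fromListBag = foldr insertBag Map.empty
```

Because two bags built from the same elements in a different order are the same Map, equality on the bag ignores insertion order, which is the "unordered" property asked for.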
Now there is already a definition together with an implementation for this, and it is called MultiSet. You can browse its source code as well; there you will see that they defined it in an almost identical way (they used the strict version of the map).
Alternatively you can use a hashmap, it will look like this:
import qualified Data.HashMap.Lazy as Map
type MultiSet'' a = Map.HashMap a Int
The elements you put into it do not need to be orderable, but they must be hashable (that is, have a Hashable instance).
If you just want a structure that has no reasonable order then why not compose a Map with a hash?
type MyBag a = Map (Int,a) Int
insert x mp = Data.Map.insertWith (+) (hash x, x) 1 mp
The above is a balanced binary tree with an order that depends on the hash of the value you have inserted. The map itself is boring, along the lines of data Map k a = Bin (Map k a) a (Map k a) | Nil.
This said, I think you underspecified what you are looking for and what you hope to learn. Your searches have probably yielded hashtables and unordered-containers - why aren't those sufficiently informative?
tl;dr: how do you write instances of Arbitrary that don't explode if your data type allows for way too much nesting? And how would you guarantee these instances produce truly random specimens of your data structure?
I want to generate random tree structures, then test certain properties of these structures after I've mangled them with my library code. (NB: I'm writing an implementation of a subtyping algorithm, i.e. given a hierarchy of types, is type A a subtype of type B. This can be made arbitrarily complex, by including multiple-inheritance and post-initialization updates to the hierarchy. The classical method that supports neither of these is Schubert Numbering, and the latest result known to me is Alavi et al. 2008.)
Let's take the example of rose-trees, following Data.Tree:
data Tree a = Node a (Forest a)
type Forest a = [Tree a]
A very simple (and don't-try-this-at-home) instance of Arbitrary would be:
instance (Arbitrary a) => Arbitrary (Tree a) where
  arbitrary = Node <$> arbitrary <*> arbitrary
Since a already has an Arbitrary instance as per the type constraint, and the Forest will have one, because [] is an instance, too, this seems straight-forward. It won't (typically) terminate for very obvious reasons: since the lists it generates are arbitrarily long, the structures become too large, and there's a good chance they won't fit into memory. Even a more conservative approach:
arbitrary = Node <$> arbitrary <*> oneof [arbitrary,return []]
won't work, again, for the same reason. One could tweak the size parameter, to keep the length of the lists down, but even that won't guarantee termination, since it's still multiple consecutive dice-rolls, and it can turn out quite badly (and I want the odd node with 100 children.)
Which means I need to limit the size of the entire tree. That is not so straight-forward. unordered-containers has it easy: just use fromList. This is not so easy here: How do you turn a list into a tree, randomly, and without incurring bias one way or the other (i.e. not favoring left-branches, or trees that are very left-leaning.)
Some sort of breadth-first construction (the functions provided by Data.Tree are all pre-order) from lists would be awesome, and I think I could write one, but it would turn out to be non-trivial. Since I'm using trees now, but will use even more complex stuff later on, I thought I might try to find a more general and less complex solution. Is there one, or will I have to resort to writing my own non-trivial Arbitrary generator? In the latter case, I might actually just resort to unit-tests, since this seems too much work.
Use sized:
import Control.Monad (replicateM)
instance Arbitrary a => Arbitrary (Tree a) where
  arbitrary = sized arbTree
arbTree :: Arbitrary a => Int -> Gen (Tree a)
arbTree 0 = do
  a <- arbitrary
  return $ Node a []
arbTree n = do
  (Positive m) <- arbitrary
  -- split the remaining size budget among the m children
  let n' = n `div` (m + 1)
  f <- replicateM m (arbTree n')
  a <- arbitrary
  return $ Node a f
(Adapted from the QuickCheck presentation).
P.S. Perhaps this will generate overly balanced trees...
You might want to use the library presented in the paper "Feat: Functional Enumeration of Algebraic Types" at the Haskell Symposium 2012. It is on Hackage as testing-feat, and a video of the talk introducing it is available here: http://www.youtube.com/watch?v=HbX7pxYXsHg
As Janis mentioned, you can use the package testing-feat, which creates enumerations of arbitrary algebraic data types. This is the easiest way to create unbiased, uniformly distributed generators for all trees of up to a given size.
Here is how you would use it for rose trees:
import Test.Feat (Enumerable(..), uniform, consts, unary, funcurry)
import Test.Feat.Class (Constructor)
import Data.Tree (Tree(..))
import qualified Test.QuickCheck as QC
-- We make an enumerable instance by listing all constructors
-- for the type. In this case, we have one binary constructor:
-- Node :: a -> [Tree a] -> Tree a
instance Enumerable a => Enumerable (Tree a) where
  enumerate = consts [binary Node]
    where
      binary :: (a -> b -> c) -> Constructor c
      binary = unary . funcurry
-- Now we use the Enumerable instance to create an Arbitrary
-- instance with the help of the function:
-- uniform :: Enumerable a => Int -> QC.Gen a
instance Enumerable a => QC.Arbitrary (Tree a) where
  -- method names on the left-hand side of an instance must be unqualified
  arbitrary = QC.sized uniform
  -- shrink = <some implementation>
The Enumerable instance can also be generated automatically with TemplateHaskell:
deriveEnumerable ''Tree
I want to be able to define a custom data type, as opposed to using a type alias, to ensure that proper values are being passed around. Below is a sketch of how that might look:
module Example (fromList) where
import Data.Ord (comparing, Down(..))
import Data.List (sort)
data DictEntry = DictEntry (String, Integer) deriving (Show, Eq)
instance Ord DictEntry where
  (DictEntry (word1, freq1)) `compare` (DictEntry (word2, freq2))
    | freq1 == freq2 = word1 `compare` word2
    | otherwise = comparing Down freq1 freq2
data Dictionary = Dictionary [DictEntry] deriving (Show)
fromList :: [(String, Integer)] -> Dictionary
fromList l = Dictionary $ sort $ map DictEntry l
However, I'd also like to retain the "list-ness" of the underlying type without having to unwrap and re-wrap [DictEntry], and without having to define utility functions such as head :: Dictionary -> DictEntry and tail :: Dictionary -> Dictionary. Is that possible? Is there some type class that I could define an instance of or a language extension that enables this?
Never use head, and avoid using tail, for lists or anything else. These are unsafe and can always easily be replaced with pattern matching.
But yes, there is a typeclass that supports list-like operations, or rather multiple classes. The simplest of these is Monoid, which just implements concatenation and empty-initialisation. Foldable allows you to deconstruct containers as if they were lists. Traversable additionally allows you to assemble them again as you go over the data.
The latter two won't quite work with Dictionary because it's not parametric on the contained type. You can circumvent that by switching to the “monomorphic version”.
However, I frankly don't think you should do any of this – just use the standard Map type to store key-value associative data, instead of rolling your own dictionary type.
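As a rough sketch of what the Map-based alternative could look like: the names fromPairs and byFrequency are hypothetical, and summing the frequencies of repeated words is an assumption that the original fromList does not make. The ordering mirrors the Ord instance in the question (descending frequency, ties broken alphabetically).

```haskell
import qualified Data.Map.Strict as Map
import Data.List (sortBy)
import Data.Ord (comparing, Down(..))

-- Word -> frequency. All of Data.Map's API is available on this
-- type without any wrapping or unwrapping.
type Dictionary = Map.Map String Integer

-- Build a dictionary from pairs. Assumption: frequencies of a word
-- that appears more than once are summed.
fromPairs :: [(String, Integer)] -> Dictionary
fromPairs = Map.fromListWith (+)

-- Entries by descending frequency, ties broken alphabetically.
byFrequency :: Dictionary -> [(String, Integer)]
byFrequency = sortBy (comparing (\(w, f) -> (Down f, w))) . Map.toList
```

The sort order is computed only when a frequency-ordered view is needed, so lookups and updates keep Map's logarithmic cost.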
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text
import Control.Applicative(many)
import Data.Word
parseManyNumbers :: Parser [Int] -- I'd like many to return a Vector instead
parseManyNumbers = many (decimal <* skipSpace)
main :: IO ()
main = print $ parseOnly parseManyNumbers "131 45 68 214"
The above is just an example, but I need to parse a large number of primitive values in Haskell and need to use arrays instead of lists. This is possible in F#'s FParsec, so I went as far as looking at Attoparsec's source, but I can't figure out a way to do it. In fact, I can't figure out where many from Control.Applicative is defined in the base Haskell library. I thought it would be there, as that is where the documentation on Hackage points to, but no such luck.
Also, I am having trouble deciding what data structure to use here as I can't find something as convenient as a resizable array in Haskell, but I would rather not use inefficient tree based structures.
An option to me would be to skip Attoparsec and implement an entire parser inside the ST monad, but I would rather avoid it except as a very last resort.
There is a growable vector implementation in Haskell, which is based on the great AMT (array mapped trie) algorithm: "persistent-vector". Unfortunately, the library isn't well known in the community yet. However, to give you an idea of the algorithm's performance, it is the one that drives the standard vector implementations in Scala and Clojure.
I suggest you implement your parser around that data-structure under the influence of the list-specialized implementations. Here the functions are, btw:
-- | One or more.
some :: f a -> f [a]
some v = some_v
  where
    many_v = some_v <|> pure []
    some_v = (fmap (:) v) <*> many_v
-- | Zero or more.
many :: f a -> f [a]
many v = many_v
  where
    many_v = some_v <|> pure []
    some_v = (fmap (:) v) <*> many_v
Some ideas:
Data Structures
I think the most practical data structure to use for the list of Ints is something like [Vector Int]. If each component Vector is sufficiently long (i.e. has length 1k) you'll get good space economy. You'll have to write your own "list operations" to traverse it, but you'll avoid the re-copying of data that you would have to perform to return the data in a single Vector Int.
Also consider using a Dequeue instead of a list.
Stateful Parsing
Unlike Parsec, Attoparsec does not provide for user state. However, you might be able to make use of the runScanner function:
runScanner :: s -> (s -> Word8 -> Maybe s) -> Parser (ByteString, s)
(It also returns the parsed ByteString which in your case may be problematic since it will be very large. Perhaps you can write an alternate version which doesn't do this.)
Using unsafeFreeze and unsafeThaw you can incrementally fill in a Vector. Your s data structure might look something like:
data MyState = MyState
  { inNumber :: Bool           -- True if seen a digit
  , val      :: Int            -- value of int being parsed
  , vecs     :: [ Vector Int ] -- past parsed vectors
  , v        :: Vector Int     -- current vector we are filling
  , vsize    :: Int            -- number of items filled in current vector
  }
Maybe instead of a [Vector Int] you use a Dequeue (Vector Int).
I imagine, however, that this approach will be slow since your parsing function will get called for every single character.
Represent the list as a single token
Parsec can be used to parse a stream of tokens, so how about writing your own tokenizer and letting Parsec create the AST?
The key idea is to represent these large sequences of Ints as a single token. This gives you a lot more latitude in how you parse them.
Defer Conversion
Instead of converting the numbers to Ints at parse time, just have parseManyNumbers return a ByteString and defer the conversion until you actually need the values. This might enable you to avoid reifying the values as an actual list.
Vectors are arrays, under the hood. The tricky thing about arrays is that they are fixed-length. You pre-allocate an array of a certain length, and the only way of extending it is to copy the elements into a larger array.
This makes linked lists simply better at representing variable-length sequences. (It's also why list implementations in imperative languages amortise the cost of copying by allocating arrays with extra space and copying only when the space runs out.) If you don't know in advance how many elements there are going to be, your best bet is to use a list (and perhaps copy the list into a Vector afterwards using fromList, if you need to). That's why many returns a list: it runs the parser as many times as it can with no prior knowledge of how many that'll be.
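The "collect a list, then copy it once" approach can be sketched as follows. To keep the sketch free of extra dependencies it uses UArray from the array package as a stand-in for Vector, and words/read as a stand-in for the Attoparsec parser; both substitutions, and the name parseNumbers, are assumptions made for illustration. With the vector package the copy step would be Data.Vector.Unboxed.fromList instead.

```haskell
import Data.Array.Unboxed (UArray, listArray, bounds, (!))

-- Parse whitespace-separated decimals into a list first, then copy
-- the list into an unboxed array once its length is known.
-- Note: read fails at runtime on malformed input; a real parser
-- would report the error instead.
parseNumbers :: String -> UArray Int Int
parseNumbers input = listArray (0, length xs - 1) xs
  where
    xs = map read (words input) :: [Int]
```

The intermediate list costs one extra pass over the data, but the array is allocated exactly once at its final size.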
On the other hand, if you happen to know how many numbers you're parsing, then a Vector could be more efficient. Perhaps you know a priori that there are always n numbers, or perhaps the protocol specifies before the start of the sequence how many numbers there'll be. Then you can use replicateM to allocate and populate the vector efficiently.
I want to create a HashTable in Haskell, insert hash values inside and look up in this HashTable.
I found this documentation, but I just started Haskell and therefore I don't really know how to use these functions.
If some of you could show me some lines of code it would be perfect.
I second Ingo's comment about starting with something simpler. However, I'll break down a few things in a bit of detail.
First of all, I assume you've installed the latest Haskell Platform. In the website for the Platform there is a page with collected documentation for the libraries included with it. Any library that's not in that page would be something you'd need to install separately.
The Platform does include Data.HashTable, so you don't need to install anything, but if you look at the latest Platform's documentation on it, you'll see that it's deprecated and going to be removed soon. So I would not use that module.
The Haskell Platform comes with the two most popular Haskell implementations of a map/dictionary data structure:
Data.Map. (Most of the documentation for this is in Data.Map.Lazy.) This implements a map as a kind of balanced search tree, which means that the keys need to be an ordered type—a type that implements the Ord class. A lot of the built-in Haskell types already implement this class, so this would probably be your easiest choice at first.
The Data.HashMap module hierarchy, with two variants; Data.HashMap.Lazy would be a good starting point. This implements maps as a kind of hash table, so the keys need to implement the Hashable class. This class is newer and not as popular as Ord, so often you might need to implement this class for your key types.
So Data.Map is the easier type to use. But to use it effectively you're going to need to understand a few things beside the most basic language constructs:
How to import a module in a source file.
How to use qualified imports—Data.Map has function names that collide with many of the built-in ones in Haskell, which requires some special syntax.
How to load a module into the ghci interpreter.
How to compile a project that uses the containers library where Data.Map lives (using the cabal tool).
Once you have that down, the easiest way to build a map is from a list of key/value pairs:
module MyModule where
import Data.Map (Map) -- This just imports the type name
import qualified Data.Map as Map -- Imports everything else, but with names
-- prefixed with "Map." (with the period).
-- Example: make a Map from a key/value pair
ages :: Map String Integer
ages = Map.fromList [("Joe", 35), ("Mary", 37), ("Irma", 16)]
A few examples on how to use maps:
-- Example: look up somebody and return a message saying what their age is.
-- 'Nothing' means that the map didn't have the key.
findAge :: String -> String
findAge name = case Map.lookup name ages of
  Nothing  -> "I don't know the age of " ++ name ++ "."
  Just age -> name ++ " is " ++ show age ++ " years old."
-- Example: make a map with one extra entry compared to `ages` above.
moreAges :: Map String Integer
moreAges = Map.insert "Steve" 23 ages
-- Example: union of two maps.
evenMoreAges :: Map String Integer
evenMoreAges = Map.union moreAges anotherMap
  where anotherMap = Map.fromList [("Metuselah", 111), ("Anuq", 3)]
As a complement to Ingo's answer, consider using the purely functional Data.Map.
import qualified Data.Map as M
myMap :: M.Map Int String
myMap = M.fromList $ zip [1..10] (map (:[]) ['a'..'j'])
insertedMap :: M.Map Int String
insertedMap = M.insert 11 "fizzbuzz" myMap
at11 :: Maybe String
at11 = M.lookup 11 insertedMap
Then you can use M.lookup, M.insert, and many other functions to modify/query the map. This data structure is also purely functional/persistent (notice how IO is nowhere in the types). That means that we can do something like
let newMap = M.insert key val oldMap
in M.union oldMap newMap
See how we can still use the older version of the map even after inserting something? That's "persistence": we never destroy the older versions of our data structure.
Just to avoid someone calling the Haskell community arrogant, here is a short breakdown of the first function you'll need:
new :: (key -> key -> Bool) -> (key -> Int32) -> IO (HashTable key val)
This tells us the following: to create a HashTable for a specific key type key you need to pass a function that checks equality on keys, and a function that computes a hash value for keys. So, if eq and hashit would be the desired functions, the following:
new eq hashit
gives you an empty HashTable in the IO-Monad.
An easier way could be to create a HashTable from a list using one of the predefined hash functions:
fromList hashInt [(42, "forty-two"), (0, "zero")]
As part of a coding challenge I have to implement a dungeon map.
I have already designed it using Data.Map as a design choice, because printing the map was not required and sometimes I had to update a map tile, e.g. when an obstacle was destroyed.
type Dungeon = Map Pos Tile
type Pos = (Int,Int) -- cartesian coordinates
data Tile = Wall | Destroyable | ...
But what if I had to print it too - then I would have to use something like
elaboratePrint $ toList dungeon, where elaboratePrint takes care of the linebreaks and makes nice unicode symbols from the tileset (toList already returns the entries sorted by position, so no separate sort is needed).
Another choice I considered would be a nested list
type Dungeon = [[Tile]]
This would have the disadvantage, that it is hard to update a single element in such a data structure. But printing then would be a simple one liner unlines . map show.
Another structure I considered was Array, but I am not used to arrays. From a short glance at the Hackage docs, I only found a map function that operates on indexes and one that works on elements, so unless one is willing to work with mutable arrays, updating a single element does not look easy. It is also not clear to me how to print an array quickly and easily.
So now my question: is there a better data structure for representing a dungeon map that has the property of easy printing and easy updating of single elements?
How about an Array? Haskell has real, 2-d arrays.
import Data.Array.IArray -- Immutable Arrays
Now an Array is indexed by any Ix a => a. And luckily, there is an instance (Ix a, Ix b) => Ix (a, b). So we can have
type Dungeon = Array (Integer, Integer) Tile
Now you construct one of these with any of several functions, the simplest to use being
array :: Ix i => (i, i) -> [(i, a)] -> Array i a
So for you,
startDungeon = array ( (0, 0), (100, 100) )
                     [ ( (x, y), Empty ) | x <- [0..100], y <- [0..100]]
And just substitute 100 and Empty for the appropriate values.
If speed becomes a concern, then it's a simple fix to use MArray and ST. I'd suggest not switching unless speed is actually a real concern here.
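Before reaching for MArray, note that an immutable array can still be "updated" with the (//) operator, which returns a modified copy. The following is a minimal sketch; the Tile constructors, the small dimensions, and the names start and setTile are assumptions for illustration.

```haskell
import Data.Array (Array, listArray, (//), (!))

-- Hypothetical tile set; substitute your own constructors.
data Tile = Empty | Wall | Destroyable deriving (Show, Eq)

type Dungeon = Array (Int, Int) Tile

-- A small dungeon filled with walls; the bounds are arbitrary.
-- listArray only consumes as many elements as the bounds require.
start :: Dungeon
start = listArray ((0, 0), (2, 3)) (repeat Wall)

-- Replace a single tile. This returns a new array and leaves the
-- old one untouched; each call copies the whole array, so it is O(n).
setTile :: (Int, Int) -> Tile -> Dungeon -> Dungeon
setTile pos t d = d // [(pos, t)]
```

Since (//) accepts a list of changes, several tiles destroyed in the same game step can be applied with one copy rather than one copy per tile.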
To address the pretty printing
import Data.List
import Data.Function
pretty :: Array (Integer, Integer) Tile -> String
-- assocs lists entries with the first index varying slowest, so group on it
pretty = unlines . map (show . map snd) . groupBy ((==) `on` (fst . fst)) . assocs
And show . map snd can be replaced with however you want to format a [Tile] as a row. If you decide that you really want these to be printed in an awesome and efficient manner (Console game maybe) you should look at a proper pretty printing library, like this one.
First — tree-likes such as Data.Map and lists remain the natural data structures for functional languages. Map is a bit of an overkill structure-wise if you only need rectangular maps, but [[Tile]] may actually be pretty fine. It has O(√n) for both random-access and updates, that's not too bad.
In particular, it's better than pure-functional updates of a 2D array (O(n))! So if you need really good performance, there's no way around using mutable arrays. Which isn't necessarily bad though, after all a game is intrinsically concerned with IO and state. What is good about Data.Array, as noted by jozefg, is the ability to use tuples as Ix indexes, so I would go with MArray.
Printing is easy with arrays. You probably only need rectangular parts of the whole map, so I'd just extract such slices with a simple list comprehension
[ [ arrayMap ! (x,y) | x<-[21..38] ] | y<-[37..47] ]
You already know how to print lists.
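Putting the slice extraction and the printing together could look like this. It is only a sketch: the Tile type, the characters chosen in tileChar, and the name renderSlice are assumptions made for illustration.

```haskell
import Data.Array (Array, listArray, (!))

-- Hypothetical tile set and its one-character rendering.
data Tile = Floor | Wall deriving (Show, Eq)

tileChar :: Tile -> Char
tileChar Floor = '.'
tileChar Wall  = '#'

-- Render the rectangle with corners (x0, y0) and (x1, y1),
-- producing one text line per y value.
renderSlice :: Array (Int, Int) Tile -> (Int, Int) -> (Int, Int) -> String
renderSlice m (x0, y0) (x1, y1) =
  unlines [ [ tileChar (m ! (x, y)) | x <- [x0 .. x1] ] | y <- [y0 .. y1] ]
```

Passing the result to putStr prints the visible part of the map; the corners must lie within the array bounds, otherwise (!) raises an error.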