What datatype to choose for a dungeon map - haskell

As part of a coding challenge I have to implement a dungeon map.
I have already designed it using Data.Map, because printing the map was not required and I sometimes had to update a map tile, e.g. when an obstacle was destroyed.
type Dungeon = Map Pos Tile
type Pos = (Int,Int) -- cartesian coordinates
data Tile = Wall | Destroyable | ...
But what if I had to print it too - then I would have to use something like
elaboratePrint . sort $ toList dungeon, where elaboratePrint takes care of the line breaks and turns the tileset into nice Unicode symbols.
Another choice I considered would be a nested list
type Dungeon = [[Tile]]
This has the disadvantage that it is hard to update a single element in such a data structure. But printing would then be a simple one-liner: unlines . map show.
Another structure I considered was Array, but I am not used to arrays. A short glance at the Hackage docs only turned up a map function that operates on indexes and one that works on elements, so unless one is willing to work with mutable arrays, updating a single element does not look easy at first glance. It is also not clear to me how to print an array quickly and easily.
So now my question: is there a better data structure for representing a dungeon map that allows both easy printing and easy updating of single elements?

How about an Array? Haskell has real, 2-d arrays.
import Data.Array.IArray -- Immutable Arrays
Now an Array is indexed by any Ix a => a. And luckily, there is an instance (Ix a, Ix b) => Ix (a, b). So we can have
type Dungeon = Array (Integer, Integer) Tile
Now you construct one of these with any of several functions, the simplest to use being
array :: Ix i => (i, i) -> [(i, a)] -> Array i a
So for you,
startDungeon = array ( (0, 0), (100, 100) )
    [ ( (x, y), Empty ) | x <- [0..100], y <- [0..100]]
And just substitute 100 and Empty for the appropriate values.
If speed becomes a concern, then it's a simple fix to use MArray and ST. I'd suggest not switching unless speed is actually a real concern here.
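As a minimal sketch of the single-tile update the question asks about, the immutable array can be updated with (//). The Tile constructors and the helper names emptyDungeon, setTile and destroy are made up for illustration (the question elides the real tile set), and Int indices are used as in the question.

```haskell
import Data.Array (Array, listArray, (!), (//))

-- Hypothetical tile set; the question does not list the real constructors.
data Tile = Empty | Wall | Destroyable
  deriving (Eq, Show)

type Pos = (Int, Int)
type Dungeon = Array Pos Tile

-- A w-by-h dungeon filled with Empty tiles (listArray only consumes
-- as many list elements as the bounds require).
emptyDungeon :: Int -> Int -> Dungeon
emptyDungeon w h = listArray ((0, 0), (w - 1, h - 1)) (repeat Empty)

-- Pure update of one tile. (//) builds a new array, so this costs O(n).
setTile :: Pos -> Tile -> Dungeon -> Dungeon
setTile pos tile dungeon = dungeon // [(pos, tile)]

-- Destroying an obstacle turns a Destroyable tile into Empty and
-- leaves every other kind of tile alone.
destroy :: Pos -> Dungeon -> Dungeon
destroy pos dungeon
  | dungeon ! pos == Destroyable = setTile pos Empty dungeon
  | otherwise                    = dungeon
```

For example, destroy (1, 2) (setTile (1, 2) Destroyable (emptyDungeon 3 4)) ! (1, 2) evaluates to Empty.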
To address the pretty printing
import Data.List
import Data.Function
pretty :: Array (Integer, Integer) Tile -> String
pretty = unlines . map (show . map snd) . groupBy ((==) `on` (fst . fst)) . assocs
Each group holds the tiles that share the first index component, and show can be replaced with however you want to format a [Tile] into a row. If you decide that you really want these to be printed in an awesome and efficient manner (Console game maybe) you should look at a proper pretty printing library, like this one.
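A row-by-row rendering can also be written directly against the array bounds, which avoids the grouping step. This is only a sketch: the Tile constructors, the glyph characters and the names glyph, render and sample are assumptions, and the index is taken to be (x, y) with y selecting the line.

```haskell
import Data.Array (Array, bounds, listArray, (!))

-- Hypothetical tile set and glyphs, chosen only for this example.
data Tile = Empty | Wall | Destroyable
  deriving (Eq, Show)

glyph :: Tile -> Char
glyph Empty       = '.'
glyph Wall        = '#'
glyph Destroyable = '+'

-- One output line per y value, one character per x value.
render :: Array (Int, Int) Tile -> String
render dungeon =
  unlines [ [ glyph (dungeon ! (x, y)) | x <- [x0 .. x1] ] | y <- [y0 .. y1] ]
  where
    ((x0, y0), (x1, y1)) = bounds dungeon

-- A 3-wide, 2-high example. listArray fills in index order:
-- (0,0), (0,1), (1,0), (1,1), (2,0), (2,1).
sample :: Array (Int, Int) Tile
sample = listArray ((0, 0), (2, 1)) [Wall, Wall, Empty, Wall, Destroyable, Wall]
```

Here putStr (render sample) prints the two lines #.+ and ###.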

First — tree-likes such as Data.Map and lists remain the natural data structures for functional languages. Map is a bit of an overkill structure-wise if you only need rectangular maps, but [[Tile]] may actually be pretty fine. It has O(√n) for both random-access and updates, that's not too bad.
In particular, it's better than pure-functional updates of a 2D array (O(n))! So if you need really good performance, there's no way around using mutable arrays. Which isn't necessarily bad though, after all a game is intrinsically concerned with IO and state. What is good about Data.Array, as noted by jozefg, is the ability to use tuples as Ix indexes, so I would go with MArray.
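A hedged sketch of what the mutable route can look like without leaving pure code: thaw the array inside ST, mutate it in place, and freeze the result with runSTArray. The Tile constructors and the names destroyAll, sample and demo are invented for this example.

```haskell
import Control.Monad (forM_, when)
import Data.Array (Array, elems, listArray)
import Data.Array.ST (readArray, runSTArray, thaw, writeArray)

-- Hypothetical tile set.
data Tile = Empty | Wall | Destroyable
  deriving (Eq, Show)

-- Clear every Destroyable tile among the targets. The array is copied
-- once by thaw; each read and write after that is O(1).
destroyAll :: [(Int, Int)] -> Array (Int, Int) Tile -> Array (Int, Int) Tile
destroyAll targets dungeon = runSTArray $ do
  marr <- thaw dungeon
  forM_ targets $ \pos -> do
    tile <- readArray marr pos
    when (tile == Destroyable) $ writeArray marr pos Empty
  pure marr

-- 2x2 example in index order (0,0), (0,1), (1,0), (1,1).
sample :: Array (Int, Int) Tile
sample = listArray ((0, 0), (1, 1)) [Wall, Destroyable, Destroyable, Empty]

-- The wall at (0,0) survives, (0,1) is cleared, (1,0) was not targeted.
demo :: [Tile]
demo = elems (destroyAll [(0, 0), (0, 1)] sample)
```

With these definitions demo is [Wall, Empty, Destroyable, Empty]. A game loop that lives in IO could keep an IOArray around instead and skip the copy entirely.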
Printing is easy with arrays. You probably only need rectangular parts of the whole map, so I'd just extract such slices with a simple list comprehension
[ [ arrayMap ! (x,y) | x<-[21..38] ] | y<-[37..47] ]
You already know how to print lists.

Related

Haskell: Assigning unique char to matrix values if x > 0

So my goal is for the program to receive an Int matrix as input and convert all numbers > 0 to unique sequential chars, while 0s are converted into '_' (it doesn't matter which, just any character not in the sequence).
eg.
main> matrixGroupings [[0,2,1],[2,2,0],[0,0,2]]
["_ab","cd_","__e"]
The best I've been able to achieve is
["_aa","aa_","__a"]
using:
matrixGroupings xss = map (map (\x -> if x > 0 then 'a' else '_')) xss
As far as I can tell, the issue I'm having is getting the program to remember what its last value was, so that when the value check is > 0, it picks the next char in line. I can't for the life of me figure out how to do this though.
Any help would be appreciated.
Your problem is an instance of an ancient art: labelling of various structures with a stream of
labels. It dates back at least to Chris Okasaki, and my favourite treatment is by Jeremy
Gibbons.
As you can see from these two examples, there is some variety to the way a structure may be
labelled. But in this present case, I suppose the most straightforward way will do. And in Haskell
it would be really short. Let us dive in.
The recipe is this:
1. Define a polymorphic type for your matrices. It must be such that a matrix of numbers and a matrix of characters are both rightful members.
2. Provide an instance of the Traversable class. It may in many cases be derived automagically.
3. Pick a monad to your liking. One simple choice is State. (Actually, that is the only choice I can think of.)
4. Create an action in this monad that takes a number to a character.
5. Traverse a matrix with this action.
Let's cook!
A type may be as simple as this:
newtype Matrix a = Matrix [[a]] deriving Show
It is entirely possible that the inner lists will be of unequal length — this type does not
protect us from making a "ragged" matrix. This is poor design. But I am going to skim over
it for now. Haskell provides an endless depth for perfection. This type is good enough for
our needs here.
We can immediately define an example of a matrix:
example :: Matrix Int
example = Matrix [[0,2,1],[2,2,0],[0,0,2]]
How hard is it to define a Traversable? 0 hard.
{-# language DeriveTraversable #-}
...
newtype Matrix a = Matrix [[a]] deriving (Show, Functor, Foldable, Traversable)
Presto.
Where do we get labels from? It is a side effect. The function reaches somewhere, takes a
stream of labels, takes the head, and puts the tail back in the extra-dimensional pocket. A
monad that can do this is State.
It works like this:
import Control.Monad.State -- State, get, put, evalState (mtl package)
label :: Int -> State String Char
label 0 = return '_'
label x = do
    ls <- get
    case ls of
        [] -> error "No more labels!"
        (l : ls') -> do
            put ls'
            return l
I hope the code explains itself. When a function "creates" a monadic value, we call it
"effectful", or an "action" in a given monad. For instance, print is an action that,
well, prints stuff. Which is an effect. label is also an action, though in a different
monad. Compare and see for yourself.
Now we are ready to cook a solution:
matrixGroupings m = evalState (traverse label m) ['a'..'z']
This is it.
λ matrixGroupings example
Matrix ["_ab","cd_","__e"]
Bon appetit!
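For readers without the mtl package at hand, the same state threading can be sketched with mapAccumL from base, working on the plain [[Int]] of the question. This swaps State for an explicit accumulator and is not the traversal shown above. The name labelMatrix and the '?' fallback for exhausted labels are assumptions, and unlike label it treats negative numbers like zeros.

```haskell
import Data.Traversable (mapAccumL)

-- Thread the list of unused labels through every row and every cell.
-- Entries > 0 consume the next label; anything else becomes '_'.
-- Assumption: once the labels run out, '?' is emitted instead of failing.
labelMatrix :: [[Int]] -> [String]
labelMatrix = snd . mapAccumL (mapAccumL step) ['a' .. 'z']
  where
    step labels x
      | x <= 0    = (labels, '_')
      | otherwise = case labels of
          []       -> ([], '?')
          (l : ls) -> (ls, l)
```

labelMatrix [[0,2,1],[2,2,0],[0,0,2]] gives ["_ab","cd_","__e"].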
P.S. I took all glory from you, it is unfair. To make things fun again, I challenge you for an exercise: can you define a Traversable instance that labels a matrix in another order — by columns first, then rows?

How to define an unordered collection in Haskell

Wondering how you would define an unordered group/collection in Haskell, where by "collection" I mean it can have many copies of the same element, and the items are unordered. I know of the List data type in Haskell, but this is inherently ordered. I would like to see what the definition would look like for an unordered collection/group/list.
I would define it this way
import qualified Data.Map.Lazy as Map
type MultiSet' a = Map.Map a Int
Just a mapping from a type a to an Int. In mathematics it would be something like f : S -> N. The elements you put into it must be orderable (have an Ord instance), because the underlying structure of the Map is a binary tree. This shouldn't be a problem, as you can forget about it when using the data structure. See the very extensive documentation of Data.Map for functions to deal with our MultiSet'.
Now there is already a definition together with an implementation for this, and it is called MultiSet. You can browse its source code as well; there you will see they defined it in an almost identical way (they used the strict version of the map).
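To make the Map-based definition concrete, here is a small sketch of the usual bag operations on top of Data.Map.Strict. The names Bag, insertBag, count, deleteBag and fromListBag are invented for this example and are not the API of the multiset package.

```haskell
import qualified Data.Map.Strict as Map

-- A bag maps each element to the number of times it occurs.
type Bag a = Map.Map a Int

emptyBag :: Bag a
emptyBag = Map.empty

-- Add one occurrence of x.
insertBag :: Ord a => a -> Bag a -> Bag a
insertBag x = Map.insertWith (+) x 1

-- Number of occurrences of x (0 if absent).
count :: Ord a => a -> Bag a -> Int
count = Map.findWithDefault 0

-- Remove one occurrence; the key disappears when its count reaches zero.
deleteBag :: Ord a => a -> Bag a -> Bag a
deleteBag = Map.update dec
  where
    dec n = if n > 1 then Just (n - 1) else Nothing

fromListBag :: Ord a => [a] -> Bag a
fromListBag = foldr insertBag emptyBag
```

Because only the counts are stored, fromListBag "abca" == fromListBag "caba" is True, which is exactly the "unordered" property asked for.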
Alternatively you can use a hashmap, it will look like this:
import qualified Data.HashMap.Lazy as Map
type MultiSet'' a = Map.HashMap a Int
The elements you put into it do not need to be orderable, but hashable.
If you just want a structure that has no reasonable order then why not compose a Map with a hash?
type MyBag a = Map (Int,a) Int
insert x mp = Data.Map.insertWith (+) (hash x, x) 1 mp
The above is a balanced binary tree with an order that depends on the hash of the value you have inserted. The map itself is boring, along the lines of data Map k a = Bin (Map k a) a (Map k a) | Nil.
This said, I think you underspecified what you are looking for and what you hope to learn. Your searches have probably yielded hashtables and unordered-containers - why aren't those sufficiently informative?

Haskell: Should I get "Stack space overflow" when constructing an IntMap from a list with a million values?

My problem is that when using any of the Map implementations in Haskell that I always get a "Stack space overflow" when working with a million values.
What I'm trying to do is process a list of pairs. Each pair contains two Ints (not Integers, I failed miserably with them so I tried Ints instead). I want to go through each pair in the list and use the first Int as a key. For each unique key I want to build up a list of second elements where each of the second elements are in a pair that have the same first element. So what I want at the end is a "Map" from an Int to a list of Ints. Here's an example.
Given a list of pairs like this:
[(1,10),(2,11),(1,79),(3,99),(1,42),(3,18)]
I would like to end up with a "Map" like this:
{1 : [42,79,10], 2 : [11], 3 : [18,99]}
(I'm using a Python-like notation above to illustrate a "Map". I know it ain't Haskell. It's just there for illustrative purposes.)
So the first thing I tried was my own hand built version where I sorted the list of pairs of Ints and then went through the list building up a new list of pairs but this time the second element was a list. The first element is the key i.e. the unique Int values of the first element of each pair and the second element is a list of the second values of each original pair which have the key as the first element.
So given a list of pairs like this:
[(1,10),(2,11),(1,79),(3,99),(1,42),(3,18)]
I end up with a list of pairs like this:
[(1, [42,79,10]), (2, [11]), (3, [18,99])]
This is easy to do. But there is one problem. The performance of the "sort" function on the original list (of 10 million pairs) is shockingly bad. I can generate the original list of pairs in less than a second. I can process the sorted list into my hand built map in less than a second. However, sorting the original list of pairs takes 40 seconds.
So I thought about using one of the built-in "Map" data structures in Haskell to do the job. The idea being I build my original list of pairs and then using standard Map functions to build a standard Map.
And that's where it all went pear-shaped. It works well on a list of 100,000 values but when I move to 1 million values, I get a "Stack space overflow" error.
So here's some example code that suffers from the problem. Please, please note that is not the actual code that I want to implement. It is just a very simplified version of code for which the same problem exists. I don't really want to separate a million consecutive numbers into odd and even partitions!!
import Data.IntMap.Strict(IntMap, empty, findWithDefault, insert, size)
power = 6
ns :: [Int]
ns = [1..10^power-1]
mod2 n = if odd n then 1 else 0
mod2Pairs = zip (map mod2 ns) ns
-- Takes a list of pairs and returns a Map where the key is the unique Int values
-- of the first element of each pair and the value is a list of the second values
-- of each pair which have the key as the first element.
-- e.g. makeMap [(1,10),(2,11),(1,79),(3,99),(1,42),(3,18)] =
-- 1 -> [42,79,10], 2 -> [11], 3 -> [18,99]
makeMap :: [(Int,a)] -> IntMap [a]
makeMap pairs = makeMap' empty pairs
  where
    makeMap' m [] = m
    makeMap' m ((a, b):cs) = makeMap' m' cs
      where
        bs = findWithDefault [] a m
        m' = insert a (b:bs) m
mod2Map = makeMap mod2Pairs
main = do
    print $ "Yowzah"
    print $ "length mod2Pairs="++ (show $ length mod2Pairs)
    print $ "size mod2Map=" ++ (show $ size mod2Map)
When I run this, I get:
"Yowzah"
"length mod2Pairs=999999"
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it.
From the above output, it should be clear that the stack space overflow happens when I try to do "makeMap mod2Pairs".
But to my naive eye all this seems to do is go through a list of pairs and for each pair lookup a key (the first element of each pair) and A) if it doesn't find a match return an empty list or B) if it does find a match, return the list that has previously been inserted. In either case it "cons"'s the second element of the pair to the "found" list and inserts that back into the Map with the same key.
(PS instead of findWithDefault, I've also tried lookup and handled the Just and Nothing using case but to no avail.)
I've had a look through the Haskell documentation on the various Map implementations, and from the point of view of performance in terms of CPU and memory (especially stack memory), it seems that A) a strict implementation and B) one where the keys are Ints would be best. I have also tried Data.Map and Data.Map.Strict and they suffer from the same problem.
I am convinced the problem is with the "Map" implementation. Am I right? Why would I get a stack overflow error i.e. what is the Map implementation doing in the background that is causing a stack overflow? Is it making lots and lots of recursive calls behind the scenes?
Can anyone help explain what is going on and how to get around the problem?
I don't have an old enough GHC to check (this works just fine for me, and I don't have 7.6.3 as you do), but my guess would be that your makeMap' is too lazy. Probably this will fix it:
makeMap' m ((a, b):cs) = m `seq` makeMap' m' cs
Without it, nothing forces the map until the very end, so you are building up a million-deep nested thunk, and deeply nested thunks are the traditional way to cause stack overflows in Haskell.
Alternately, I would try just replacing the whole makeMap implementation with fromListWith:
makeMap pairs = fromListWith (++) [(k, [v]) | (k, v) <- pairs]
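A third option, sketched here under the assumption that Data.IntMap.Strict from containers is in use, is a strict left fold: foldl' forces the map after every insertion, so no chain of pending inserts can build up. The name groupPairs is made up for this example.

```haskell
import Data.List (foldl')
import qualified Data.IntMap.Strict as IM

-- Group the second components by their key. foldl' evaluates the
-- accumulated map at every step, and insertWith passes the new value
-- first, so later pairs end up at the front of each list.
groupPairs :: [(Int, a)] -> IM.IntMap [a]
groupPairs = foldl' step IM.empty
  where
    step m (k, v) = IM.insertWith (++) k [v] m
```

IM.toList (groupPairs [(1,10),(2,11),(1,79),(3,99),(1,42),(3,18)]) gives [(1,[42,79,10]),(2,[11]),(3,[18,99])], matching the order in the question.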

How to make a custom Attoparsec parser combinator that returns a Vector instead of a list?

{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text
import Control.Applicative(many)
import Data.Word
parseManyNumbers :: Parser [Int] -- I'd like many to return a Vector instead
parseManyNumbers = many (decimal <* skipSpace)
main :: IO ()
main = print $ parseOnly parseManyNumbers "131 45 68 214"
The above is just an example, but I need to parse a large number of primitive values in Haskell and need to use arrays instead of lists. This is possible in F#'s FParsec, so I went as far as looking at Attoparsec's source, but I can't figure out a way to do it. In fact, I can't figure out where many from Control.Applicative is defined in the base Haskell library. I thought it would be there, as that is where the documentation on Hackage points to, but no such luck.
Also, I am having trouble deciding what data structure to use here as I can't find something as convenient as a resizable array in Haskell, but I would rather not use inefficient tree based structures.
An option to me would be to skip Attoparsec and implement an entire parser inside the ST monad, but I would rather avoid it except as a very last resort.
There is a growable vector implementation in Haskell, which is based on the great AMT algorithm: "persistent-vector". Unfortunately, the library isn't well known in the community so far. However, to give you a clue about the performance of the algorithm, I'll say that it is the algorithm that drives the standard vector implementations in Scala and Clojure.
I suggest you implement your parser around that data-structure under the influence of the list-specialized implementations. Here the functions are, btw:
-- Default definitions of the Alternative class methods (so f is an Alternative).
-- | One or more.
some :: f a -> f [a]
some v = some_v
  where
    many_v = some_v <|> pure []
    some_v = (fmap (:) v) <*> many_v
-- | Zero or more.
many :: f a -> f [a]
many v = many_v
  where
    many_v = some_v <|> pure []
    some_v = (fmap (:) v) <*> many_v
Some ideas:
Data Structures
I think the most practical data structure to use for the list of Ints is something like [Vector Int]. If each component Vector is sufficiently long (e.g. has length 1k) you'll get good space economy. You'll have to write your own "list operations" to traverse it, but you'll avoid the re-copying of data that you would have to perform to return the data in a single Vector Int.
Also consider using a Dequeue instead of a list.
Stateful Parsing
Unlike Parsec, Attoparsec does not provide for user state. However, you might be able to make use of the runScanner function (link):
runScanner :: s -> (s -> Word8 -> Maybe s) -> Parser (ByteString, s)
(It also returns the parsed ByteString which in your case may be problematic since it will be very large. Perhaps you can write an alternate version which doesn't do this.)
Using unsafeFreeze and unsafeThaw you can incrementally fill in a Vector. Your s data structure might look something like:
data MyState = MyState
    { inNumber :: Bool           -- True if seen a digit
    , val      :: Int            -- value of int being parsed
    , vecs     :: [ Vector Int ] -- past parsed vectors
    , v        :: Vector Int     -- current vector we are filling
    , vsize    :: Int            -- number of items filled in current vector
    }
Maybe instead of a [Vector Int] you use a Dequeue (Vector Int).
I imagine, however, that this approach will be slow since your parsing function will get called for every single character.
Represent the list as a single token
Parsec can be used to parse a stream of tokens, so how about writing your own tokenizer and letting Parsec create the AST.
The key idea is to represent these large sequences of Ints as a single token. This gives you a lot more latitude in how you parse them.
Defer Conversion
Instead of converting the numbers to Ints at parse time, just have parseManyNumbers return a ByteString and defer the conversion until you actually need the values. This may enable you to avoid reifying the values as an actual list.
Vectors are arrays, under the hood. The tricky thing about arrays is that they are fixed-length. You pre-allocate an array of a certain length, and the only way of extending it is to copy the elements into a larger array.
This makes linked lists simply better at representing variable-length sequences. (It's also why list implementations in imperative languages amortise the cost of copying by allocating arrays with extra space and copying only when the space runs out.) If you don't know in advance how many elements there are going to be, your best bet is to use a list (and perhaps copy the list into a Vector afterwards using fromList, if you need to). That's why many returns a list: it runs the parser as many times as it can with no prior knowledge of how many that'll be.
On the other hand, if you happen to know how many numbers you're parsing, then a Vector could be more efficient. Perhaps you know a priori that there are always n numbers, or perhaps the protocol specifies before the start of the sequence how many numbers there'll be. Then you can use replicateM to allocate and populate the vector efficiently.
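The "collect in a list, then copy once" step can be sketched with only the array package, using an unboxed UArray as a stand-in for Vector (with the vector package the corresponding call would be fromList). The names pack and nth are invented for this example, and the input list is assumed to be whatever the list-returning parser produced.

```haskell
import Data.Array.Unboxed (UArray, bounds, listArray, (!))

-- Copy the parsed numbers into one flat, unboxed array. The length is
-- only known after parsing, so the list is walked once for the length
-- and once for the copy.
pack :: [Int] -> UArray Int Int
pack xs = listArray (0, length xs - 1) xs

-- Bounds-checked O(1) access into the packed array.
nth :: UArray Int Int -> Int -> Maybe Int
nth arr i
  | i >= lo && i <= hi = Just (arr ! i)
  | otherwise          = Nothing
  where
    (lo, hi) = bounds arr
```

For example, nth (pack [131, 45, 68, 214]) 2 is Just 68, and an empty list yields an empty array with bounds (0, -1).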

How would you represent a graph (the kind associated with the travelling salesman problem) in Haskell

It's pretty easy to represent a tree in Haskell:
data Tree a = Node (Tree a) a (Tree a) | Leaf a
but that's because it has no need for the concept of an imperative style "pointer" because each Node/Leaf has one, and only one parent. I guess I could represent it as a list of lists of Maybe Ints ...to create a table with Nothing for those nodes without a path between and Just n for those that do... but that seems really ugly and unwieldy.
You can use a type like
type Graph a = [Node a]
data Node a = Node a [Node a]
The list of nodes holds the outgoing (or incoming if you prefer) edges of that node. Since you can build cyclic data structures this can represent arbitrary (multi-)graphs. The drawback of this kind of graph structure is that it cannot be modified once you have built it. To do traversals each node probably needs a unique name (can be included in the a) so you can keep track of which nodes you have visited.
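A small sketch of how laziness lets such a cyclic value be written down: three nodes that point at each other in a ring. The names ring and walk are invented for this example, and the string labels double as the unique names mentioned above.

```haskell
data Node a = Node a [Node a]

-- A three-node cycle a -> b -> c -> a. The where-bindings refer to each
-- other, which is fine because the successor lists are evaluated lazily.
ring :: Node String
ring = a
  where
    a = Node "a" [b]
    b = Node "b" [c]
    c = Node "c" [a]

-- Follow the first outgoing edge for at most n steps and collect labels.
-- A step limit is needed because the structure itself never ends.
walk :: Int -> Node a -> [a]
walk n (Node x successors)
  | n <= 0    = []
  | otherwise = x : case successors of
      []         -> []
      (next : _) -> walk (n - 1) next
```

walk 5 ring evaluates to ["a","b","c","a","b"], going once around the cycle and then two steps further.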
Disclaimer: below is a mostly pointless exercise in "tying the knot" technique. Fgl is the way to go if you want to actually use your graphs. However if you are wondering how it's possible to represent cyclic data structures functionally, read on.
It is pretty easy to represent a graph in Haskell!
-- a directed graph
data Vertex a b = Vertex { vdata :: a, edges :: [Edge a b] }
data Edge a b = Edge { edata :: b, src :: Vertex a b, dst :: Vertex a b }
-- My graph, with vertices labeled with strings, and edges unlabeled
type Myvertex = Vertex String ()
type Myedge = Edge String ()
-- A couple of helpers for brevity
e :: Myvertex -> Myvertex -> Myedge
e = Edge ()
v :: String -> [Myedge] -> Myvertex
v = Vertex
-- This is a full 5-graph
mygraph5 = map vv [ "one", "two", "three", "four", "five" ] where
    vv s = let vk = v s (zipWith e (repeat vk) mygraph5) in vk
This is a cyclic, finite, recursive, purely functional data structure. Not a very efficient or beautiful one, but look, ma, no pointers! Here's an exercise: include incoming edges in the vertex
data Vertex a b = Vertex {vdata::a, outedges::[Edge a b], inedges::[Edge a b]}
It's easy to build a full graph that has two (indistinguishable) copies of each edge:
mygraph5 = map vv [ "one", "two", "three", "four", "five" ] where
    vv s =
        let vks = repeat vk
            vk = v s (zipWith e vks mygraph5)
                     (zipWith e mygraph5 vks)
        in vk
but try to build one that has one copy of each! (Imagine that there's some expensive computation involved in e v1 v2).
The knot-tying techniques that others have outlined can work, but are a bit of a pain, especially when you're trying to construct the graph on the fly. I think the approach you describe is a bit more practical. I would use an array/vector of node types where each node type holds a list/array/vector of neighbors (in addition to any other data you need) represented as ints of the appropriate size, where the int is an index into the node array. I probably wouldn't use Maybe Ints. With Int you can still use -1 or any suitable value as your uninitialized default. Once you have populated all your neighbor lists and know they are good values you won't need the failure machinery provided by Maybe anyway, which as you observed imposes overhead and inconvenience. But your pattern of using Maybe would be the correct thing to do if you needed to make complete use of all possible values the node pointer type could contain.
The simplest way is to give the vertices in the graph unique names (which could be as simple as Ints) and use either the usual adjacency matrix or neighbor list approach, i.e., if the names are Ints, use either Array (Int,Int) Bool or Array Int [Int].
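The neighbor-list variant can be sketched with accumArray from the array package plus a visited set from containers. The names Graph, mkGraph and reachable are made up here, vertices are assumed to be numbered 0 to n-1, and an edge that mentions a vertex outside that range makes accumArray fail at runtime.

```haskell
import Data.Array (Array, accumArray, (!))
import qualified Data.Set as Set

-- Each vertex index maps to the list of its neighbors.
type Graph = Array Int [Int]

-- Build an undirected graph on vertices 0..n-1 by recording every
-- edge in both directions.
mkGraph :: Int -> [(Int, Int)] -> Graph
mkGraph n edges =
  accumArray (flip (:)) [] (0, n - 1)
    (concat [ [(u, v), (v, u)] | (u, v) <- edges ])

-- All vertices reachable from start, using an explicit stack and a
-- set of visited vertices so that cycles terminate.
reachable :: Graph -> Int -> Set.Set Int
reachable g start = go Set.empty [start]
  where
    go seen [] = seen
    go seen (v : stack)
      | v `Set.member` seen = go seen stack
      | otherwise           = go (Set.insert v seen) (g ! v ++ stack)
```

With g = mkGraph 5 [(0,1),(1,2),(3,4)], reachable g 0 is the set {0,1,2}. For a weighted graph such as a travelling salesman instance, the neighbor lists could hold (vertex, cost) pairs instead of bare Ints.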
Have a look at this knot-tying technique, it is used to create circular structures. You may need it if your graph contains cycles.
Also, you can represent your graph using the adjacency matrix.
Or you can keep maps between each node and the inbound and outbound edges.
In fact, each of them is useful in one context and a pain in others. Depending on your problem, you'll have to choose.

Resources