Haskell - Calculating the shortest path using trees - haskell

i am trying to write a code in haskell, that goes from point A, to point F, on a board game, that is essentially a Matrix, following the shortest path.
This is the board:
AAAA
ACCB
ADEF
*
0 0 N
The robot enters on the letter A, on the bottom (where it is the * ), and must reach F, on the bottom of the board are the coordinates, x=0, y=0, and pointing towards North. F coordinate is (3,0)
The trick is, it can't jump more than one letter, it can go from A to B, B to C, etc. and it can walk through the letters of the type (A to A, B to B, etc)
It can only move forward and make turns (Left, right) so the path to let me go to F would be
Forward, Forward, Right, Forward ,Forward, Forward, Right, Jump, Right, Jump, Forward, Left, Jump, Left, Forward, Forward
Once it reaches F, it's done.
I want to try this approach, using a Tree
A
/ \
A D
/ \
/ \
A C
/ \ / \
/ \ D C
A
/ \
/ \
A
/
/
A
/ \
B A
/ \
C F
After that i would only need to validate the correct path and shortest right?
Problem is , i don't have that much experience using trees.
Would you indicate any other way to get the best path?
Thank you very much .

We're going to solve this problem by searching a tree in three parts. First we will build a Tree representing the paths through the problem, with branches for each state. We'd like to find the shortest path to get to a state with a certain criteria, so we will write a breadth first search for searching any Tree. This won't be fast enough for the example problem you provided, so we will improve on the breadth first search with a transposition table which keeps track of states we have already explored to avoid exploring them again.
Building a Tree
We'll assume that your playing board is represented in an Array from Data.Array
import Data.Array
type Board = Array (Int, Int) Char
board :: Board
board = listArray ((1,1),(3,4)) ("AAAA" ++ "ACCB" ++ "ADEF")
Data.Array doesn't provide a default easy way to make sure indexes that we look up values for with ! are actually in the bounds of the Array. For convenience, we'll provide a safe version that returns Just v if the value is in the Array or Nothing otherwise.
import Data.Maybe
(!?) :: Ix i => Array i a -> i -> Maybe a
a !? i = if inRange (bounds a) i then Just (a ! i) else Nothing
The State of the puzzle can be represented by the combination of a position of the robot and the direction that the robot is facing.
data State = State {position :: (Int, Int), direction :: (Int, Int)}
deriving (Eq, Ord, Show)
The direction is a unit vector that can be added to the position to get a new position. We can rotate the direction vector left or right and moveTowards it.
right :: Num a => (a, a) -> (a, a)
right (down, across) = (across, -down)
left :: Num a => (a, a) -> (a, a)
left (down, across) = (-across, down)
moveTowards :: (Num a, Num b) => (a, b) -> (a, b) -> (a, b)
moveTowards (x1, y1) (x2, y2) = (x1 + x2, y1 + y2)
To explore a board, we will need to be able to determine from a state what moves are legal. To do this it'd be useful to name the moves, so we'll make a data type to represent the possible moves.
import Prelude hiding (Right, Left)
data Move = Left | Right | Forward | Jump
deriving (Show)
To determine what moves are legal on a board we need to know which Board we are using and the State of the robot. This suggests the type moves :: Board -> State -> Move, but we re going to be computing the new state after each move just to decide if the move was legal, so we will also return the new state for convenience.
moves :: Board -> State -> [(Move, State)]
moves board (State pos dir) =
(if inRange (bounds board) pos then [(Right, State pos (right dir)), (Left, State pos (left dir))] else []) ++
(if next == Just here then [(Forward, State nextPos dir)] else []) ++
(if next == Just (succ here) then [(Jump, State nextPos dir)] else [])
where
here = fromMaybe 'A' (board !? pos)
nextPos = moveTowards dir pos
next = board !? nextPos
If we're on the board, we can turn Left and Right; the restriction that we be on the board guarantees all the States returned by moves have positions that are on the board. If the value held at the nextPos, next position matches what is Just here we can go Forward to it (if we're off the board, we assume what is here is 'A'). If next is Just the successor of what is here we can Jump to it. If next is off the board it is Nothing and can't match either Just here or Just (succ here).
Up until this point, we've just provided the description of the problem and haven't touched on answering the question with tree. We are going to use the rose tree Tree defined in Data.Tree.
data Tree a = Node {
rootLabel :: a, -- ^ label value
subForest :: Forest a -- ^ zero or more child trees
}
type Forest a = [Tree a]
Each node of a Tree a holds a single value a and a list of branches which are each a Tree a.
We are going to build a list of Trees in a straightforward manner from our moves function. We are going to make each result of moves the rootLabel of a Node and make the branches be the list of Trees we get when we explore the new state.
import Data.Tree
explore :: Board -> State -> [Tree (Move, State)]
explore board = map go . moves board
where
go (label, state) = Node (label, state) (explore board state)
At this point, our trees are infinite; nothing keeps the robot from endlessly spinning in place.. We can't draw one, but we could if we could limit the tree to just a few steps.
limit :: Int -> Tree a -> Tree a
limit n (Node a ts)
| n <= 0 = Node a []
| otherwise = Node a (map (limit (n-1)) ts)
We'll display just the first couple levels of the tree when we start off the bottom left corner facing towards the board in State (4, 1) (-1, 0).
(putStrLn .
drawForest .
map (fmap (\(m, s) -> show (m, board ! position s)) . limit 2) .
explore board $ State (4, 1) (-1, 0))
(Forward,'A')
|
+- (Right,'A')
| |
| +- (Right,'A')
| |
| `- (Left,'A')
|
+- (Left,'A')
| |
| +- (Right,'A')
| |
| `- (Left,'A')
|
`- (Forward,'A')
|
+- (Right,'A')
|
+- (Left,'A')
|
`- (Forward,'A')
Breadth First Search
Breadth first search explores all the possibilities at one level (across the "breadth" of what is being searched) before descending into the next level (into the "depth" of what is being searched). Breadth first search finds the shortest path to a goal. For our trees, this means exploring everything at one layer before exploring any of what's in the inner layers. We'll accomplish this by making a queue of nodes to explore adding the nodes we discover in the next layer to the end of the queue. The queue will always hold nodes from the current layer followed by nodes from the next layer. It will never hold any nodes from the layer past that because we won't discover those nodes until we have moved on to the next layer.
To implement that, we need an efficient queue, so we'll use a sequence from Data.Sequence/
import Data.Sequence (viewl, ViewL (..), (><))
import qualified Data.Sequence as Seq
We start with an empty queue Seq.empty of nodes to explore and an empty path [] into the Trees. We add the initial possibilities to the end of the queue with >< (concatenation of sequences) and go. We look at the start of the queue. If there's nothing left, EmptyL, we didn't find a path to the goal and return Nothing. If there is something there, and it matches the goal p, we return the path we have accumulate backwards. If the first thing in the queue doesn't match the goal we add it as the most recent part of the path and add all of its branches to the remainder of what was queued.
breadthFirstSearch :: (a -> Bool) -> [Tree a] -> Maybe [a]
breadthFirstSearch p = combine Seq.empty []
where
combine queue ancestors branches =
go (queue >< (Seq.fromList . map ((,) ancestors) $ branches))
go queue =
case viewl queue of
EmptyL -> Nothing
(ancestors, Node a bs) :< queued ->
if p a
then Just . reverse $ a:ancestors
else combine queued (a:ancestors) bs
This lets us write our first solve for Boards. It's convenient here that all of the positions returned from moves are on the board.
solve :: Char -> Board -> State -> Maybe [Move]
solve goal board = fmap (map fst) . breadthFirstSearch ((== goal) . (board !) . position . snd) . explore board
If we run this for our board it never finishes! Well, eventually it will, but my back of a napkin calculation suggests it will take about 40 million steps. The path to the end of the maze is 16 steps long and the robot is frequently presented with 3 options for what to do at each step.
> solve 'F' board (State (4, 1) (-1, 0))
We can solve much smaller puzzles like
AB
AC
*
Which we can represent the board for this puzzle with
smallBoard :: Board
smallBoard = listArray ((1,1),(2,2)) ("AB" ++ "AC")
We solve it looking for 'C' starting in row 3 column 1 looking towards lower numbered rows.
> solve 'C' smallBoard (State (3, 1) (-1, 0))
Just [Forward,Forward,Right,Jump,Right,Jump]
Transposition Table
Certainly this problem must be easier to solve than exploring 40 million possible paths. Most of those paths consist of spinning in place or randomly meandering back and forth. The degenerate paths all share one property, they keep visiting states they have already visited. In the breadthFirstSeach code, those paths keep adding the same nodes to the queue. We can get rid of all of this extra work just by remembering the nodes that we've already seen.
We'll remember the set of nodes we've already seen with a Set from Data.Set.
import qualified Data.Set as Set
To the signature of breadthFirstSearch we'll add a function from the label for a node to a representation for the branches of that node. The representation should be equal whenever all the branches out of the node are the same. In order to quickly compare the representations in O(log n) time with a Set we require that the representation have an Ord instance instead of just equality. The Ord instance allows Set to check for membership with binary search.
breadthFirstSearchUnseen:: Ord r => (a -> r) -> (a -> Bool) -> [Tree a] -> Maybe [a]
In addition to keeping track of the queue, breadthFirstSearchUnseen keeps track of the set of representations that have been seen, starting with Set.empty. Each time we add branches to the queue with combine we also add the representations to seen. We only add the unseen branches whose representations are not in the set of branches we've already seen.
breadthFirstSearchUnseen repr p = combine Set.empty Seq.empty []
where
combine seen queued ancestors unseen =
go
(seen `Set.union` (Set.fromList . map (repr . rootLabel) $ unseen))
(queued >< (Seq.fromList . map ((,) ancestors ) $ unseen))
go seen queue =
case viewl queue of
EmptyL -> Nothing
(ancestors, Node a bs) :< queued ->
if p a
then Just . reverse $ ancestors'
else combine seen queued ancestors' unseen
where
ancestors' = a:ancestors
unseen = filter (flip Set.notMember seen . repr . rootLabel) bs
Now we can improve our solve function to use breadthFirstSearchUnseen. All of the branches from a node are determined by the State - the Move label that got to that state is irrelevant - so we only use the snd part of the (Move, State) tuple as the representation for a node.
solve :: Char -> Board -> State -> Maybe [Move]
solve goal board = fmap (map fst) . breadthFirstSearchUnseen snd ((== goal) . (board !) . position . snd) . explore board
We can now solve the original puzzle very quickly.
> solve 'F' board (State (4, 1) (-1, 0))
Just [Forward,Forward,Forward,Right,Forward,Forward,Forward,Right,Jump,Right,Jump,Forward,Left,Jump,Left,Jump,Jump]

Related

Building a Binary Tree (not BST) in Haskell Breadth-First

I recently started using Haskell and it will probably be for a short while. Just being asked to use it to better understand functional programming for a class I am taking at Uni.
Now I have a slight problem I am currently facing with what I am trying to do. I want to build it breadth-first but I think I got my conditions messed up or my conditions are also just wrong.
So essentially if I give it
[“A1-Gate”, “North-Region”, “South-Region”, “Convention Center”, “Rectorate”, “Academic Building1”, “Academic Building2”] and [0.0, 0.5, 0.7, 0.3, 0.6, 1.2, 1.4, 1.2], my tree should come out like
But my test run results are haha not what I expected. So an extra sharp expert in Haskell could possibly help me spot what I am doing wrong.
Output:
*Main> l1 = ["A1-Gate", "North-Region", "South-Region", "Convention Center",
"Rectorate", "Academic Building1", "Academic Building2"]
*Main> l3 = [0.0, 0.5, 0.7, 0.3, 0.6, 1.2, 1.4, 1.2]
*Main> parkingtree = createBinaryParkingTree l1 l3
*Main> parkingtree
Node "North-Region" 0.5
(Node "A1-Gate" 0.0 EmptyTree EmptyTree)
(Node "Convention Center" 0.3
(Node "South-Region" 0.7 EmptyTree EmptyTree)
(Node "Academic Building2" 1.4
(Node "Academic Building1" 1.2 EmptyTree EmptyTree)
(Node "Rectorate" 0.6 EmptyTree EmptyTree)))
A-1 Gate should be the root but it ends up being a child with no children so pretty messed up conditions.
If I could get some guidance it would help. Below is what I've written so far::
data Tree = EmptyTree | Node [Char] Float Tree Tree deriving (Show,Eq,Ord)
insertElement location cost EmptyTree =
Node location cost EmptyTree EmptyTree
insertElement newlocation newcost (Node location cost left right) =
if (left == EmptyTree && right == EmptyTree)
then Node location cost (insertElement newlocation newcost EmptyTree)
right
else if (left == EmptyTree && right /= EmptyTree)
then Node location cost (insertElement newlocation newcost EmptyTree)
right
else if (left /= EmptyTree && right == EmptyTree)
then Node location cost left
(insertElement newlocation newcost EmptyTree)
else Node newlocation newcost EmptyTree
(Node location cost left right)
buildBPT [] = EmptyTree
--buildBPT (xs:[]) = insertElement (fst xs) (snd xs) (buildBPT [])
buildBPT (x:xs) = insertElement (fst x) (snd x) (buildBPT xs)
createBinaryParkingTree a b = buildBPT (zip a b)
Thank you for any guidance that might be provided. Yes I have looked at some of the similar questions I do think my problem is different but if you think a certain post has a clear answer that will help I am willing to go and take a look at it.
Here's a corecursive solution.
{-# bft(Xs,T) :- bft( Xs, [T|Q], Q). % if you don't read Prolog, see (*)
bft( [], Nodes , []) :- maplist( =(empty), Nodes).
bft( [X|Xs], [N|Nodes], [L,R|Q]) :- N = node(X,L,R),
bft( Xs, Nodes, Q).
#-}
data Tree a = Empty | Node a (Tree a) (Tree a) deriving Show
bft :: [a] -> Tree a
bft xs = head nodes -- Breadth First Tree
where
nodes = zipWith g (map Just xs ++ repeat Nothing) -- values and
-- Empty leaves...
(pairs $ tail nodes) -- branches...
g (Just x) (lt,rt) = Node x lt rt
g Nothing _ = Empty
pairs ~(a: ~(b:c)) = (a,b) : pairs c
{-
nodes!!0 = g (Just (xs!!0)) (nodes!!1, nodes!!2) .
nodes!!1 = g (Just (xs!!1)) (nodes!!3, nodes!!4) . .
nodes!!2 = g (Just (xs!!2)) (nodes!!5, nodes!!6) . . . .
................ .................
-}
nodes is the breadth-first enumeration of all the subtrees of the result tree. The tree itself is the top subtree, i.e., the first in this list. We create Nodes from each x in the input xs, and when the input
is exhausted we create Emptys by using an indefinite number of Nothings instead (the Empty leaves' true length is length xs + 1 but we don't need to care about that).
And we didn't have to count at all.
Testing:
> bft [1..4]
Node 1 (Node 2 (Node 4 Empty Empty) Empty) (Node 3 Empty Empty)
> bft [1..10]
Node 1
(Node 2
(Node 4
(Node 8 Empty Empty)
(Node 9 Empty Empty))
(Node 5
(Node 10 Empty Empty)
Empty))
(Node 3
(Node 6 Empty Empty)
(Node 7 Empty Empty))
How does it work: the key is g's laziness, that it doesn't force lt's nor rt's value, while the tuple structure is readily served by -- very lazy in its own right -- pairs. So both are just like the not-yet-set variables in that Prolog pseudocode(*), when served as 2nd and 3rd arguments to g. But then, for the next x in xs, the node referred to by this lt becomes the next invocation of g's result.
And then it's rt's turn, etc. And when xs end, and we hit the Nothings, g stops pulling the values from pairs's output altogether. So pairs stops advancing on the nodes too, which is thus never finished though it's defined as an unending stream of Emptys past that point, just to be on the safe side.
(*) Prolog's variables are explicitly set-once: they are allowed to be in a not-yet-assigned state. Haskell's (x:xs) is Prolog's [X | Xs].
The pseudocode: maintain a queue; enqueue "unassigned pointer"; for each x in xs: { set pointer in current head of the queue to Node(x, lt, rt) where lt, rt are unassigned pointers; enqueue lt; enqueue rt; pop queue }; set all pointers remaining in queue to Empty; find resulting tree in the original head of the queue, i.e. the original first "unassigned pointer" (or "empty box" instead of "unassigned pointer" is another option).
This Prolog's "queue" is of course fully persistent: "popping" does not mutate any data structure and doesn't change any outstanding references to the queue's former head -- it just advances the current pointer into the queue. So what's left in the wake of all this queuing, is the bfs-enumeration of the built tree's nodes, with the tree itself its head element -- the tree is its top node, with the two children fully instantiated to the bottom leaves by the time the enumeration is done.
Update: #dfeuer came up with much simplified version of it which is much closer to the Prolog original (that one in the comment at the top of the post), that can be much clearer. Look for more efficient code and discussion and stuff in his post. Using the simple [] instead of dfeuer's use of the more efficient infinite stream type data IS a = a :+ IS a for the sub-trees queue, it becomes
bftree :: [a] -> Tree a
bftree xs = t
where
t : q = go xs q
go [] _ = repeat Empty
go (x:ys) ~(l : ~(r : q)) = Node x l r : go ys q
---READ-- ----READ---- ---WRITE---
{-
xs = [ x x2 x3 x4 x5 x6 x7 x8 … ]
(t:q) = [ t l r ll lr rl rr llr … Empty Empty … … ]
-}
For comparison, the opposite operation of breadth-first enumeration of a tree is
bflist :: Tree a -> [a]
bflist t = [x | Node x _ _ <- q]
where
q = t : go 1 q
go 0 _ = []
go i (Empty : q) = go (i-1) q
go i (Node _ l r : q) = l : r : go (i+1) q
-----READ------ --WRITE--
How does bftree work: t : q is the list of the tree's sub-trees in breadth-first order. A particular invocation of go (x:ys) uses l and r before they are defined by subsequent invocations of go, either with another x further down the ys, or by go [] which always returns Empty. The result t is the very first in this list, the topmost node of the tree, i.e. the tree itself.
This list of tree nodes is created by the recursive invocations of go at the same speed with which the input list of values xs is consumed, but is consumed as the input to go at twice that speed, because each node has two child nodes.
These extra nodes thus must also be defined, as Empty leaves. We don't care how many are needed and simply create an infinite list of them to fulfill any need, although the actual number of empty leaves will be one more than there were xs.
This is actually the same scheme as used in computer science for decades for array-backed trees where tree nodes are placed in breadth-first order in a linear array. Curiously, in such setting both conversions are a no-op -- only our interpretation of the same data is what's changing, our handling of it, how are we interacting with / using it.
Update: the below solution is big-O optimal and (I think) pretty easy to understand, so I'm leaving it here in case anyone's interested. However, Will Ness's solution is much more beautiful and, especially when optimized a bit, can be expected to perform better in practice. It is much more worthy of study!
I'm going to ignore the fake edge labels for now and just focus on the core of what's happening.
A common pattern in algorithm design is that it's sometimes easier to solve a more general problem. So instead of trying to build a tree, I'm going to look at how to build a forest (a list of trees) with a given number of trees. I'll make the node labels polymorphic to avoid having to think about what they look like; you can of course use the same building technique with your original tree type.
data Tree a = Empty | Node a (Tree a) (Tree a)
-- Built a tree from a breadth-first list
bft :: [a] -> Tree a
bft xs = case dff 1 xs of
[] -> Empty
[t] -> t
_ -> error "something went wrong"
-- Build a forest of nonempty trees.
-- The given number indicates the (maximum)
-- number of trees to build.
bff :: Int -> [a] -> [Tree a]
bff _ [] = []
bff n xs = case splitAt n xs of
(front, rear) -> combine front (bff (2 * n) rear)
where
combine :: [a] -> [Tree a] -> [Tree a]
-- you write this
Here's a full, industrial-strength, maximally lazy implementation. This is the most efficient version I've been able to come up with that's as lazy as possible. A slight variant is less lazy but still works for fully-defined infinite inputs; I haven't tried to test which would be faster in practice.
bft' :: [a] -> Tree a
bft' xs = case bff 1 xs of
[] -> Empty
[t] -> t
_ -> error "whoops"
bff' :: Int -> [a] -> [Tree a]
bff' !_ [] = []
bff' n xs = combine n xs (bff (2 * n) (drop n xs))
where
-- The "take" portion of the splitAt in the original
-- bff is integrated into this version of combine. That
-- lets us avoid allocating an intermediate list we don't
-- really need.
combine :: Int -> [a] -> [Tree a] -> [Tree a]
combine 0 !_ ~[] = [] -- These two lazy patterns are just documentation
combine _k [] ~[] = []
combine k (y : ys) ts = Node y l r : combine (k - 1) ys dropped
where
(l, ~(r, dropped)) = case ts of -- This lazy pattern matters.
[] -> (Empty, (Empty, []))
t1 : ts' -> (t1, case ts' of
[] -> (Empty, [])
t2 : ts'' -> (t2, ts''))
For the less-lazy variant, replace (!l, ~(!r, dropped)) with (!l, !r, dropped) and adjust the RHS accordingly.
For true industrial strength, forests should be represented using lists strict in their elements:
data SL a = Cons !a (SL a) | Nil
And the pairs in the above (l, ~(r, dropped)) should both be represented using a type like
data LSP a b = LSP !a b
This should avoid some (pretty cheap) run-time checks. More importantly, it makes it easier to see where things are and aren't getting forced.
The method that you appear to have chosen is to build the tree up backwards: from bottom-to-top, right-to-left; starting from the last element of your list. This makes your buildBPT function look nice, but requires your insertElement to be overly complex. To construct a binary tree in a breadth-first fashion this way would require some difficult pivots at every step past the first three.
Adding 8 nodes to the tree would require the following steps (see how the nodes are inserted from last to first):
. 4
6 6
8 7 8 . .
. .
3
7 4 5
8 . 6 7 8 .
6 2
7 8 3 4
5 6 7 8
5
6 7 1
8 . . . 2 3
4 5 6 7
8 . . . . . . .
If, instead, you insert the nodes left-to-right, top-to-bottom, you end up with a much simpler solution, requiring no pivoting, but instead some tree structure introspection. See the insertion order; at all times, the existing values remain where they were:
. 1
2 3
1 4 5 . .
. .
1
1 2 3
2 . 4 5 6 .
1 1
2 3 2 3
4 5 6 7
1
2 3 1
4 . . . 2 3
4 5 6 7
8 . . . . . . .
The insertion step has an asymptotic time complexity on the order of O(n^2) where n is the number of nodes to insert, as you are inserting the nodes one-by-one, and then iterating the nodes already present in the tree.
As we insert left-to-right, the trick is to check whether the left sub-tree is complete:
if it is, and the right sub-tree is not complete, then recurse to the right.
if it is, and the right sub-tree is also complete, then recurse to the left (starting a new row).
if it is not, then recurse to the left.
Here is my (more generic) solution:
data Tree a = Leaf | Node a (Tree a) (Tree a)
deriving (Eq, Show)
main = do
let l1 = ["A1-Gate", "North-Region", "South-Region", "Convention Center",
"Rectorate", "Academic Building1", "Academic Building2"]
let l2 = [0.0, 0.5, 0.7, 0.3, 0.6, 1.2, 1.4, 1.2]
print $ treeFromList $ zip l1 l2
mkNode :: a -> Tree a
mkNode x = Node x Leaf Leaf
insertValue :: Tree a -> a -> Tree a
insertValue Leaf y = mkNode y
insertValue (Node x left right) y
| isComplete left && nodeCount left /= nodeCount right = Node x left (insertValue right y)
| otherwise = Node x (insertValue left y) right
where nodeCount Leaf = 0
nodeCount (Node _ left right) = 1 + nodeCount left + nodeCount right
depth Leaf = 0
depth (Node _ left right) = 1 + max (depth left) (depth right)
isComplete n = nodeCount n == 2 ^ (depth n) - 1
treeFromList :: (Show a) => [a] -> Tree a
treeFromList = foldl insertValue Leaf
EDIT: more detailed explanation:
The idea is to remember in what order you insert nodes: left-to-right first, then top-to-bottom. I compressed the different cases in the actual function, but you can expand them into three:
Is the left side complete? If not, then insert to the left side.
Is the right side as complete as the left side, which is complete? If not, then insert to the right side.
Both sides are full, so we start a new level by inserting to the left side.
Because the function fills the nodes up from left-to-right and top-to-bottom, then we always know (it's an invariant) that the left side must fill up before the right side, and that the left side can never be more than one level deeper than the right side (nor can it be shallower than the right side).
By following the growth of the second set of example trees, you can see how the values are inserted following this invariant. This is enough to describe the process recursively, so it extrapolates to a list of any size (the recursion is the magic).
Now, how do we determine whether a tree is 'complete'? Well, it is complete if it is perfectly balanced, or if – visually – its values form a triangle. As we are working with binary trees, then the base of the triangle (when filled) must have a number of values equal to a power of two. More specifically, it must have 2^(depth-1) values. Count for yourself in the examples:
depth = 1 -> base = 1: 2^(1-1) = 1
depth = 2 -> base = 2: 2^(2-1) = 2
depth = 3 -> base = 4: 2^(3-1) = 4
depth = 4 -> base = 8: 2^(4-1) = 8
The total number of nodes above the base is one less than the width of the base: 2^(n-1) - 1. The total number of nodes in the complete tree is therefore the number of nodes above the base, plus those of the base, so:
num nodes in complete tree = 2^(depth-1) - 1 + 2^(depth-1)
= 2 × 2^(depth-1) - 1
= 2^depth - 1
So now we can say that a tree is complete if it has exactly 2^depth - 1 non-empty nodes in it.
Because we go left-to-right, top-to-bottom, when the left side is complete, we move to the right, and when the right side is just as complete as the left side (meaning that it has the same number of nodes, which is means that it is also complete because of the invariant), then we know that the whole tree is complete, and therefore a new row must be added.
I originally had three special cases in there: when both nodes are empty, when the left node is empty (and therefore so was the right) and when the right node is empty (and therefore the left could not be). These three special cases are superseded by the final case with the guards:
If both sides are empty, then countNodes left == countNodes right, so therefore we add another row (to the left).
If the left side is empty, then both sides are empty (see previous point).
If the right side is empty, then the left side must have depth 1 and node count 1, meaning that it is complete, and 1 /= 0, so we add to the right side.

Haskell :: Recursion in Recursion for Loop in Loop

How it goes: Based on the set of tuple (id, x, y), find the min max for x and y , then two dots (red points) created. Each element in tuple are grouped to two groups based on the distance towards the red dots.
Each group cant exceed 5 dots. If exceed, new group should be computed. I've managed to do recursion for the first phase. But I have no idea how to do it for second phase. The second phase should look like this:
Based on these two groups, again it need to find the min max for x and y (for each group), then four dots (red points) created. Each element in tuple are grouped to two groups based on the distance towards the red dots.
getDistance :: (Int, Double, Double) -> (Int, Double, Double) -> Double
getDistance (_,x1,y1) (_,x2,y2) = sqrt $ (x1-x2)^2 + (y1-y2)^2
getTheClusterID :: (Int, Double, Double) -> Int
getTheClusterID (id, _, _) = id
idxy = [(id, x, y)]
createCluster id cs = [(id, minX, minY),(id+1, maxX, minY), (id+2, minX, maxY), (id+3, maxX, maxY)]
where minX = minimum $ map (\(_,x,_,_) -> x) cs
maxX = maximum $ map (\(_,x,_,_) -> x) cs
minY = minimum $ map (\(_,_,y,_) -> y) cs
maxY = maximum $ map (\(_,_,y,_) -> y) cs
idCluster = [1]
cluster = createCluster (last idCluster) idxy
clusterThis (id,a,b) = case (a,b) of
j | getDistance (a,b) (cluster!!0) < getDistance (a,b) (cluster!!1) &&
-> (getTheClusterID (cluster!!0), a, b)
j | getDistance (a,b) (cluster!!1) < getDistance (a,b) (cluster!!0) &&
-> (getTheClusterID (cluster!!1), a, b)
_ -> (getTheClusterID (cluster!!0), a, b)
groupAll = map clusterThis idxy
I am moving from imperative to functional. Sorry if my way of thinking is still in imperative way. Still learning.
EDIT:
To clarify, this is the original data looks like.
The basic principle to follow in writing such an algorithm is to write small, compositional programs; each program is then easy to reason about and test in isolation, and the final program can be written in terms of the smaller ones.
The algorithm can be summarized as follows:
Compute the points which bound the set of points.
Split the rest of the points into two clusters, one containing points closer to the minimum point, the other containing all other points (equivalently, points closer to the maximum point).
If any cluster contains more than 5 points, repeat the process on that cluster.
The presence of a 'repeat the process' step indicates this to be a divide and conquer problem.
I see no need for an ID for each point, so I've dispensed with this.
To begin, define datatypes for each type of data you will be working with:
import Data.List (partition)
data Point = Point { ptX :: Double, ptY :: Double }
data Cluster = Cluster { clusterPts :: [Point] }
This may seem silly for such simple data, but it can potentially save you quite a bit of confusion during debugging. Also note the import of a function we will be using later.
The 1st step:
minMaxPoints :: [Point] -> (Point, Point)
minMaxPoints ps =
(Point minX minY
,Point maxX maxY)
where minX = minimum $ map ptX ps
maxX = maximum $ map ptX ps
minY = minimum $ map ptY ps
maxY = maximum $ map ptY ps
This is essentially the same as your createCluster function.
The 2nd step:
pointDistance :: Point -> Point -> Double
pointDistance (Point x1 y1) (Point x2 y2) = sqrt $ (x1-x2)^2 + (y1-y2)^2
cluster1 :: [Point] -> [Cluster]
cluster1 ps =
let (mn, mx) = minMaxPoints ps
(psmn, psmx) = partition (\p -> pointDistance mn p < pointDistance mx p) ps
in [ Cluster psmn, Cluster psmx ]
This function should clear - it is a direct translation of the above statement of this step into code. The partition function takes a predicate and a list and produces two lists, the first containing all elements for which the predicate is true, and the second all elements for which it is false. pointDistance is essentially the same as your getDistance function.
The 3rd step:
cluster :: [Point] -> [Cluster]
cluster ps =
cluster1 ps >>= \cl#(Cluster c) ->
if length c > 5
then cluster c
else [cl]
This also implements the statement above very directly. Perhaps the only confusing part is the use of >>=, which (here) has type [a] -> (a -> [b]) -> [b]; it simply applies the given function to each element of the given list, and concatenates the result (equivalently, it is written flip concatMap).
Finally your test case (which I hope I've translated correctly from pictures to Haskell data):
testPts :: [Point]
testPts = map (uncurry Point)
[ (0,0), (1,0), (2,1), (0,2)
, (5,2), (5,4), (4,3), (4,4)
, (8,2), (9,3), (10,2)
, (11,4), (12,3), (13,3), (13,5) ]
main = mapM_ (print . map (\p -> (ptX p, ptY p)) . clusterPts) $ cluster testPts
Running this program produces
[(0.0,0.0),(0.0,2.0),(2.0,1.0),(1.0,0.0)]
[(4.0,4.0),(5.0,2.0),(5.0,4.0),(4.0,3.0)]
[(10.0,2.0),(9.0,3.0),(8.0,2.0)]
[(13.0,3.0),(12.0,3.0),(11.0,4.0),(13.0,5.0)]
Functional programmers love recursion, yet they go to great lengths to avoid writing it. Jeez, people, make up your minds!
I like to structure my code, to the extent possible, using common, well-understood combinators. I want to demonstrate a style of Haskell programming which leans heavily on standard tools to implement the boring parts of a program (mapping, zipping, looping) as tersely and generically as possible, freeing you up to focus on the problem at hand.
So don't worry if you don't understand everything here. I just want to show you what's possible! (And please ask if you have questions!)
Vectors
First things first: we're working with two-dimensional space, so we'll need two-dimensional vectors and some secondary school vector algebra to work with them.
I'm going to parameterise my vector by the scalar on which our vector space is built. This'll allow me to work with standard type classes like Functor, so I can delegate a lot of the work of building a vector algebra to the machine. I've turned on DeriveFunctor and DeriveFoldable, which allow me to utter the magic words deriving (Functor, Foldable).
data Pair a = Pair {
px :: a,
py :: a
} deriving (Show, Functor, Foldable)
Hereafter I'm going to avoid working explicitly with Pair, and program to an interface, not an implementation. This'll allow me to build a simple linear algebra library in a manner that's independent of the dimensionality of the vector space. I'll give example type signatures in terms of V2:
type V2 = Pair Double
Scalar multiplication: functors
A vector space is required to have two operations: scalar multiplication and vector addition. Scalar multiplication means multiplying each component of a vector by a constant scalar. If you view a vector as a container of components, it should be clear that this means "do the same thing to every element in a container" - that is, it's a mapping operation. That's what Functor is for.
-- mul :: Double -> V2 -> V2
mul :: (Functor f, Num n) => n -> f n -> f n
mul k f = fmap (k *) f
Vector addition: zippy applicatives
Vector addition involves adding up the components of a vector point-wise. Thinking of a vector as a container of components, addition is a zipping operation - match up each element of the two vectors and add them up.
Applicative functors are functors with an additional "apply" operation. Thinking of a functor f as a container, Applicative's <*> :: f (a -> b) -> f a -> f b gives you a way to take a container of functions and apply it to a container of values to get a new container of values. It should be clear that one way to make Pair into an Applicative is to use zipping to apply functions to values.
instance Applicative Pair where
pure x = Pair x x
Pair f g <*> Pair x y = Pair (f x) (g y)
(For another example of a zippy applicative, see this answer of mine.)
Now that we have a way to zip two pairs, we can leverage a bit of standard Applicative machinery to implement vector addition.
-- add :: V2 -> V2 -> V2
add :: (Applicative f, Num n) => f n -> f n -> f n
add = liftA2 (+)
Vector subtraction, which gives you a way to find the distance between two points, is defined in terms of multiplication and addition.
-- minus :: V2 -> V2 -> V2
minus :: (Applicative f, Num n) => f n -> f n -> f n
v `minus` u = v `add` mul (-1) u
Dot products: foldable containers
2D Euclidean space is actually a Hilbert space - a vector space equipped with a way to measure lengths and angles in the form of a dot product. To take the dot product of two vectors, you multiply the components together and then add up the results. Once more, we'll be using Applicative to multiply the components, but that just gives us another vector: how do we implement "adding up the results"?
Foldable is the class of containers which admit an "aggregation" operation foldr :: (a -> b -> b) -> b -> f a -> b. The standard prelude's sum is defined in terms of foldr, so:
-- dot :: V2 -> V2 -> Double
dot :: (Applicative f, Foldable f, Num n) => f n -> f n -> n
v `dot` u = sum $ liftA2 (*) v u
This gives us a way to find the absolute length of a vector: dot it with itself and take the square root.
-- modulus :: V2 -> Double
modulus :: (Applicative f, Foldable f, Floating n) => f n -> n
modulus v = sqrt $ v `dot` v
So the distance between two points is the modulus of the difference of the vectors.
dist :: (Applicative f, Foldable f, Floating n) => f n -> f n -> n
dist v u = modulus (v `minus` u)
N-ary zipping: traversable containers
An axis-aligned (hyper-)rectangle can be defined by just two points. We'll represent the bounding box of a set of points as a Pair of vectors pointing to opposite corners of the bounding box.
Given a collection of vectors of components, we can find the opposite corners of the bounding box by finding the maximum and minimum of each component across the collection. This requires us to zip up, or transpose, a collection of vectors of components into a vector of collections of components. For this I'll use Traversable's sequenceA.
-- boundingBox :: [V2] -> Pair V2
boundingBox :: (Traversable t, Applicative f, Ord n) => t (f n) -> Pair (f n)
boundingBox vs =
let components = sequenceA vs
in Pair (minimum <$> components) (maximum <$> components)
Clustering
Now that we have a library for working with vectors, we can get down to the meaty part of the algorithm: dividing sets of points into clusters.
Partitioning
Let me rephrase the specification of the inner loop of your algorithm. You want to partition a set of points based on whether they're closer to the bottom-left corner of the set's bounding box or to the top-right corner. That's what partition does.
We can write a function, whichCluster which uses minus and modulus to decide this for a single point, and then use partition to apply it to the whole set.
type Cluster = []
-- cluster :: Cluster V2 -> [Cluster V2]
cluster :: (Applicative f, Foldable f, Ord n, Floating n) => Cluster (f n) -> [Cluster (f n)]
cluster vs =
let Pair bottomLeft topRight = boundingBox vs
whichCluster v = dist v bottomLeft <= dist v topRight
(g1, g2) = partition whichCluster vs
in [g1, g2]
Repetition, repetition, repetition
Now we want to repeatedly cluster until we don't have any groups larger than 5. Here's the plan. We'll keep track of two sets of clusters, those which are small enough, and those which require further sub-clustering. I'll use partition to sort a list of clusters into those which are small enough and those which need subclustering. I'll use the list monad's >>= :: [a] -> (a -> [b]) -> [b] (here [Cluster V2] -> ([V2] -> [Cluster V2]) -> [Cluster V2]), which maps a function over a list and flattens the result, to implement the notion of subclustering. And I'll use until to repeatedly subcluster until the set of remaining too-large clusters is empty.
-- smallClusters :: Int -> Cluster V2 -> [Cluster V2]
smallClusters :: (Applicative f, Foldable f, Ord n, Floating n) => Int -> Cluster (f n) -> [Cluster (f n)]
smallClusters maxSize vs = fst $ until (null . snd) splitLarge ([], [vs])
where
smallEnough xs = length xs <= maxSize
splitLarge (small, remaining) =
let (newSmall, large) = partition smallEnough remaining
in (small ++ newSmall, large >>= cluster)
A quick test, cribbed from #user2407038's answer:
testPts :: [V2]
testPts = map (uncurry Pair)
[ (0,0), (1,0), (2,1), (0,2)
, (5,2), (5,4), (4,3), (4,4)
, (8,2), (9,3), (10,2)
, (11,4), (12,3), (13,3), (13,5) ]
ghci> smallClusters 5 testPts
[
[Pair {px = 0.0, py = 0.0},Pair {px = 1.0, py = 0.0},Pair {px = 2.0, py = 1.0},Pair {px = 0.0, py = 2.0}],
[Pair {px = 5.0, py = 2.0},Pair {px = 5.0, py = 4.0},Pair {px = 4.0, py = 3.0},Pair {px = 4.0, py = 4.0}],
[Pair {px = 8.0, py = 2.0},Pair {px = 9.0, py = 3.0},Pair {px = 10.0, py = 2.0}]
[Pair {px = 11.0, py = 4.0},Pair {px = 12.0, py = 3.0},Pair {px = 13.0, py = 3.0},Pair {px = 13.0, py = 5.0}]
]
There you go. Small clusters in n-dimensional space, all without a single recursive function.
Labelling
Part of the point of working with the Applicative and Foldable interfaces, rather than working with V2 directly, was so I could demonstrate the following little magic trick.
Your original code represented points as 3-tuples consisting of two Doubles for the location and an Int for the point's label, but my V2 has no label. Can we recover this? Well, since the code doesn't at any point mention any concrete types - just standard type classes - we can just build a new type for labelled vectors. As long as said type is a Foldable Applicative all of the above code will continue to work without modification!
data Labelled m f a = Labelled m (f a) deriving (Show, Functor, Foldable)
instance (Monoid m, Applicative f) => Applicative (Labelled m f) where
pure = Labelled mempty . pure
Labelled m ff <*> Labelled n fx = Labelled (m <> n) (ff <*> fx)
The Monoid constraint is there because when combining actions you also need a way to combine their labels. I'm just going to use First - left-biased choice - because I'm not expecting the points' labels to be relevant to the zipping operations like modulus and boundingBox.
type LabelledV2 = Labelled (First Int) Pair Double
testPts :: [LabelledV2]
testPts = zipWith (Labelled . First . Just) [0..] $ map (uncurry Pair)
[ (0,0), (1,0), (2,1), (0,2)
, (5,2), (5,4), (4,3), (4,4)
, (8,2), (9,3), (10,2)
, (11,4), (12,3), (13,3), (13,5) ]
ghci> traverse (traverse (getFirst . lbl)) $ smallClusters 5 testPts
Just [[0,1,2,3],[4,5,6,7],[8,9,10],[11,12,13,14]] -- try reordering testPts

Simplify haskell function

I'm really struggling with Haskell atm.
It took me almost 6 hours to write a function that does what I want. Unfortunately I'm not satisfied with the look of it.
Could someone please give me any hints how to rewrite it?
get_connected_area :: Eq generic_type => [[generic_type]] -> (Int, Int) -> [(Int,Int)] -> generic_type -> [(Int,Int)]
get_connected_area habitat point area nullValue
| elem point area = area
| not ((fst point) >= 0) = area
| not ((snd point) >= 0) = area
| not ((fst point) < (length habitat)) = area
| not ((snd point) < (length (habitat!!0))) = area
| (((habitat!!(fst point))!!(snd point))) == nullValue = area
| otherwise =
let new_area = point : area
in
get_connected_area habitat (fst point+1, snd point) (
get_connected_area habitat (fst point-1, snd point) (
get_connected_area habitat (fst point, snd point+1) (
get_connected_area habitat (fst point, snd point-1) new_area nullValue
) nullValue
) nullValue
) nullValue
The function get's a [[generic_type]] (representing a landscape-map) and searches the fully connected area around a point that isn't equal to the given nullValue.
Eg.:
If the function gets called like this:
get_connected_area [[0,1,0],[1,1,1],[0,1,0],[1,0,0]] (1,1) [] 0
That literally means
0 1 0
1 1 1
0 1 0
1 0 0
Represents a map (like google maps). Start from the point (coordinates) (1,1) I want to get all coordinates of the elements that form a connected area with the given point.
The result therefore should be:
0 1 0
1 1 1
0 1 0
1 0 0
And the corresponting return value (list of coordinates of bold 1s):
[(2,1),(0,1),(1,2),(1,0),(1,1)]
One small change is that you can use pattern matching for the variable point. This means you can use (x, y) instead of point in the function declaration:
get_connected_area habitat (x, y) area nullValue = ...
Now everywhere you have fst point, just put x, and everywhere you have snd point, put y.
Another modification is to use more variables for subexpressions. This can help with the nested recursive calls. For example, make a variable for the inner-most nested call:
....
where foo = get_connected_area habitat (x, y-1) new_area nullValue
Now just put foo instead of the call. This technique can now be repeated for the "new" inner-most call. (Note that you should pick a more descriptive name than foo. Maybe down?)
Note that not (x >= y) is the same as x < y. Use this to simplify all of the conditions. Since these conditions test if a point is inside a bounding rectangle, most of this logic can be factored to a function isIn :: (Int, Int) -> (Int, Int) -> (Int, Int) -> Bool which will make get_connected_area more readable.
This would be my first quick pass through the function, and sort of the minimum that might pass a code review (just in terms of style):
getConnectedArea :: Eq a => [[a]] -> a -> (Int, Int) -> [(Int,Int)] -> [(Int,Int)]
getConnectedArea habitat nullValue = go where
go point#(x,y) area
| elem point area = area
| x < 0 = area
| y < 0 = area
| x >= length habitat = area
| y >= length (habitat!!0) = area
| ((habitat!!x)!!y) == nullValue = area
| otherwise =
foldr go (point : area)
[ (x+1, y), (x-1, y), (x, y+1), (x, y-1) ]
We bind habitat and nullValue once at the top level (clarifying what the recursive work is doing), remove indirection in the predicates, use camel-case (underdashes obscure where function application is happening), replace generic_type with a (using a noisy variable here actually has the opposite effect from the one you intended; I end up trying to figure out what special semantics you're trying to call out when the interesting thing is that the type doesn't matter (so long as it can be compared for equality)).
At this point there are lots of things we can do:
pretend we're writing real code and worry about asymptotics of treating lists as arrays (!!, and length) and sets (elem), and use proper array and set data structures instead
move your bounds checking (and possible null value checking) into a new lookup function (the goal being to have only a single ... = area clause if possible
consider improvements to the algorithm: can we avoid recursively checking the cell we just came from algorithmically? can we avoid passing area entirely (making our search nicely lazy/"productive")?
Here is my take:
import qualified Data.Set as Set
type Point = (Int, Int)
getConnectedArea :: (Point -> Bool) -> Point -> Set.Set Point
getConnectedArea habitat = \p -> worker p Set.empty
-- \p is to the right of = to keep it out of the scope of the where clause
where
worker p seen
| p `Set.member` seen = seen
| habitat p = foldr worker (Set.insert p seen) (neighbors p)
| otherwise = seen
neighbors (x,y) = [(x-1,y), (x+1,y), (x,y-1), (x,y+1)]
What I've done
foldr over the neighbors, as some commenters suggested.
Since the order of points doesn't matter, I use a Set instead of a list, so it's semantically a better fit and faster to boot.
Named some helpful intermediate abstractions such as Point and neighbors.
A better data structure for the habitat would also be good, since lists are linear time to access, maybe a 2D Data.Array—but as far as this function cares, all you need is an indexing function Point -> Bool (out of bounds and null value are treated the same), so I've replaced the data structure parameter with the indexing function itself (this is a common transformation in FP).
We can see that it would also be possible to abstract away the neighbors function and then we would arrive at a very general graph traversal method
traverseGraph :: (Ord a) => (a -> [a]) -> a -> Set.Set a
in terms of which you could write getConnectedArea. I recommend doing this for educational purposes—left as an exercise.
EDIT
Here's an example of how to call the function in terms of (almost) your old function:
import Control.Monad ((<=<))
-- A couple helpers for indexing lists.
index :: Int -> [a] -> Maybe a
index _ [] = Nothing
index 0 (x:_) = x
index n (_:xs) = index (n-1) xs
index2 :: (Int,Int) -> [[a]] -> Maybe a
index2 (x,y) = index x <=< index y
-- index2 uses Maybe's monadic structure, and I think it's quite pretty.
-- But if you're not ready for that, you might prefer
index2' (x,y) xss
| Just xs <- index y xss = index x xs
| otherwise = Nothing
getConnectedArea' :: (Eq a) => [[a]] -> Point -> a -> [a]
getConnectedArea' habitat point nullValue = Set.toList $ getConnectedArea nonnull point
where
nonnull :: Point -> Bool
nonnull p = case index2 p habitat of
Nothing -> False
Just x -> x /= nullValue
OK i will try to simplify your code. However there are already good answers and that's why i will tackle this with a slightly more conceptual approach.
I believe you could chose better data types. For instance Data.Matrix seems to provide an ideal data type in the place of your [[generic_type]] type. Also for coordinates i wouldn't chose a tuple type since tuple type is there to pack different types. It's functor and monad instances are not very helpful when it is chosen as a coordinate system. Yet since it seems Data.Matrix is just happy with tuples as coordinates i will keep them.
OK your rephrased code is as follows;
import Data.Matrix
gca :: Matrix Int -> (Int, Int) -> Int -> [(Int,Int)]
gca fld crd nil = let nbs = [id, subtract 1, (+1)] >>= \f -> [id, subtract 1, (+1)]
>>= \g -> return (f,g)
>>= \(f,g) -> return ((f . fst) crd, (g . snd) crd)
in filter (\(x,y) -> fld ! (x,y) /= nil) nbs
*Main> gca (fromLists [[0,1,0],[1,1,1],[0,1,0],[1,0,0]]) (2,2) 0
[(2,2),(2,1),(2,3),(1,2),(3,2)]
The first thing to note is, the Matrix data type is index 1 based. So we have our center point at (2,2).
The second is... we have a three element list of functions defined as [id, subtract 1, (+1)]. The contained functions are all Num a => a -> a type and i need them to define the surrounding pixels of the given coordinate including the given coordinate. So we have a line just like if we did;
[1,2,3] >>= \x -> [1,2,3] >>= \y -> return [x,y] would result [[1,1],[1,2],[1,3],[2,1],[2,2],[2,3],[3,1],[3,2],[3,3]] which, in our case, would yield a 2 combinations of all functions in the place of the numbers 1,2 and 3.
Which then we apply to our given coordinate one by one with a cascading instruction
>>= \[f,g] -> return ((f . fst) crd, (g . snd) crd)
which yields all neighboring coordinates.
Then its nothing more than filtering the neighboring filters by checking if they are not equal to the nil value within out matrix.

Return all reachable states in a DFA starting from the initial state

I'm trying to write a program in Haskell that returns a list of reachable states starting from the initial state, similar to a depth first search.
states_reachable :: Eq st => DFA st -> [st]
states_reachable (qs, sigma, delta, s, inF) =
[delta q a | q <- qs, a <- sigma]
Note:
qs is the set of states
sigma is the alphabet
delta is a transition function that takes a state and a symbol and returns the next state
s is the initial state
inF is a function that returns true if the state is an accepting state, false otherwise
As the function is defined now:
[delta q a | q <- qs, a <- sigma]
returns all the states in the DFA (which is incorrect).
What I would like is to populate the list by starting at the initial state s and testing the delta function with each input.
For example:
// returns the states reachable from the initial state for all symbols in sigma
[delta s a | a <- sigma]
The next step would be to repeat the process for each new state added to the list. This might add duplicates, but I can remove them later.
Then I tried:
[delta q a | q <- [delta s a | a <- sigma], a <- sigma]
I thought this might work, but it doesn't because it's creating the inner list and then using it to populate the outer list and then stopping.
I need to recursively build this list by exploring all the new states returned by the delta function.
You're attempting to compute the transitive closure of a relation here, where the relation is "x can get to y in one step". As such, I'd suggest using a generic transitive-closure solution, rather than a DFA-specific one, and then produce that relation from your DFA. Here's one fairly basic way to do it:
module Closure (closure) where
import Data.List (nub)
closure :: Eq a => (a -> [a]) -> a -> [a]
closure nexts init = go [init]
where go xs = let xs' = nub $ xs ++ (xs >>= nexts)
in if xs == xs'
then xs
else go xs'
The algorithm here is to have a list of reachable states, and at each step expand it by walking from each to all its nearest neighbors, and then nub the list to get rid of duplicates. Once that expansion step adds no new nodes, you're done.
Now, how to map your DFA problem onto this? It's not too hard: you just need to produce a nexts function using sigma and delta. Here we need to assume that your DFA delta function is total, ie that every node has a transition specified for every letter in sigma. This is easy enough by creating one additional "failure" node that all nodes can transition to if they don't like their input, so I'll just assume that's been done.
neighbors :: (node -> letter -> node) -> [letter] -> node -> [node]
neighbors delta sigma n = map (delta n) sigma
With that in place, your original problem reduces to:
closure (neighbors delta sigma) s

Haskell profilling subgraph mining algorithm

I try to solve problem of finding all connected subgraphs in Haskell. Algorithm used is described here. Quote from that paper:
As in every path algorithm, there are forward steps and back steps. A step forward is done if a given connected subgraph can be extended by addition of edge k, that is if edge k is not already part of the given subgraph, if k is adjacent to at least one edge of the given subgraph, and if addition of edge k is not forbidden by some restrictions given below.
A step back is done as soon as a given connected subgraph cannot be further elongated. In this case the edge added last is removed from the string, it is temporarily given the status "forbidden", and any other edges which were forbidden by backtracking from a previous longer string are simultaneously "allowed" again. In contrast, an edge which is forbidden by being removed from a string shorter than the present one remains forbidden, thus assuring that every connected subgraph is constructed once and only once.
To do this algorithm, I represented graphs as list of edges:
type Edge = (Int,Int)
type Graph = [Edge]
Firstly, I wrote function addEdge that check if is it possible to extend graph, return Nothing if it isn't possible or Edge to extend.
I have a "parent" graph and "extensible" graph, so I try to found one and only one edge that exists in "parent" graph, connected with "extensible" graph, not already included in "extensible" graph and so not included in forbidden set.
I wrote this function below:
addEdge :: Graph -> Graph -> [Edge] -> Maybe Edge
addEdge !parent !extensible !forb = listToMaybe $ intersectBy (\ (i,j) (k,l) -> (i == k || i == l || j == k || j == l)) (parent \\ (extensible `union` forb)) extensible
It's work! but, as I see from profiling whole program, addEdge is the most heavy function. I am sure, that my code isn't optimal. Leastways, intersectBy function that finds all possible solutions but i need only one. Is there any ways to make this code more rapid? Maybe, don't use standard lists but Set from Data.Set? It's first point of attention.
Main recursive function ext presented below:
ext :: Graph -> [Graph] -> Maybe Graph -> [(Edge,Int)] -> Int -> [Graph]
ext !main !list !grow !forb !maxLength | isEnd == True = (filter (\g -> (length g /= 1)) list) ++ (group main)
| ((addEdge main workGraph forbEdges) == Nothing) || (length workGraph) >= maxLength = ext main list (Just workGraph) forbProcess maxLength
| otherwise = ext main ((addedEdge:workGraph):list) Nothing forb maxLength where
workGraph = if grow == Nothing then (head list) else (bite (fromJust grow)) -- [Edge] graph now proceeded
workGraphLength = length workGraph
addedEdge = fromJust $ addEdge'
addEdge' = addEdge main workGraph forbEdges
bite xz = if (length xz == 1) then (fromJust (addEdge main xz forbEdges)):[] else tail xz
forbProcess = (head workGraph,workGraphLength):(filter ((<=workGraphLength).snd) forb)
forbEdges = map fst forb -- convert from (Edge,Level) to [Edge]
isEnd = (grow /= Nothing) && (length (fromJust grow) == 1) && ((addEdge main (fromJust grow) forbEdges) == Nothing)
I test my program on graph
c60 = [(1,4),(1,3),(1,2),(2,6),(2,5),(3,10),(3,7),(4,24),(4,21),(5,8),(5,7),(6,28),(6,25),
(7,9),(8,11),(8,12),(9,16),(9,13),(10,20),(10,17),(11,14),(11,13),(12,28),(12,30),(13,15),
(14,43),(14,30),(15,44),(15,18),(16,18),(16,17),(17,19),(18,47),(19,48),(19,22),(20,22),(20,21),
(21,23),(22,31),(23,32),(23,26),(24,26),(24,25),(25,27),(26,35),(27,36),(27,29),(28,29),(29,39),
(30,40),(31,32),(31,33),(32,34),(33,50),(33,55),(34,37),(34,55),(35,36),(35,37),(36,38),(37,57),
(38,41),(38,57),(39,40),(39,41),(40,42),(41,59),(42,45),(42,59),(43,44),(43,45),(44,46),(45,51),
(46,49),(46,51),(47,48),(47,49),(48,50),(49,53),(50,53),(51,52),(52,60),(52,54),(53,54),(54,56),(55,56),(56,58),(57,58),(58,60),(59,60)] :: Graph
For example, find all subgraphs with length from 1 to 7
length $ ext c60 [[(1,2)]] Nothing [] 7
>102332
Problem is too low speed of computation. As it pointed in original article, program have been written in FORTRAN 77 and launched on 150MHz workstation, perform test task minimum 30 times faster then my code on modern i5 processor.
I can't understand, why my program is so slow? Is there any ways to refactor this code? Or the best solution is porting it on C, and write bindings to C library over FFI?
I decided to take a shot at implementing the algorithm described in the paper using fgl. The complete code follows.
{-# LANGUAGE NoMonomorphismRestriction #-}
import Data.Graph.Inductive
import Data.List
import Data.Tree
uniq = map head . group . sort . map (\(a, b) -> (min a b, max a b))
delEdgeLU (from, to) = delEdge (from, to) . delEdge (to, from)
insEdgeDU (from, to) = insEdge (from, to, ()) . insNodeU to . insNodeU from where
insNodeU n g = if gelem n g then g else insNode (n, ()) g
nextEdges subgraph remaining
| isEmpty subgraph = uniq (edges remaining)
| otherwise = uniq $ do
n <- nodes subgraph
n' <- suc remaining n
return (n, n')
search_ subgraph remaining
= Node subgraph
. snd . mapAccumL step remaining
$ nextEdges subgraph remaining
where
step r e = let r' = delEdgeLU e r in (r', search_ (insEdgeDU e subgraph) r')
search = search_ empty
mkUUGraph :: [(Int, Int)] -> Gr () ()
mkUUGraph es = mkUGraph ns (es ++ map swap es) where
ns = nub (map fst es ++ map snd es)
swap (a, b) = (b, a)
-- the one from the paper
sampleGraph = mkUUGraph cPaper
cPaper = [(1, 2), (1, 5), (1, 6), (2, 3), (3, 4), (4, 5)]
The functions you'll want to use at the top-level are mkUUGraph, which constructs a graph from a list of edges, and search, which constructs a tree whose nodes are connected subgraphs of its input. For example, to compute the statistics shown at the bottom of "Scheme 1" in the paper, you might do this:
*Main> map length . tail . levels . search . mkUUGraph $ [(1, 2), (1, 5), (1, 6), (2, 3), (3, 4), (4, 5)]
[6,7,8,9,6,1]
*Main> sum it
37
I had a little trouble comparing it to your implementation, because I don't understand what all the arguments to ext are supposed to do. In particular, I couldn't work out how to call ext on the adjacency graph in the paper in such a way that I got 37 results. Perhaps you have a bug.
In any case, I did my best to emulate what I think your code is trying to do: finding graphs with up to seven edges, and certainly containing the edge (1, 2) (despite the fact that your code outputs many graphs that do not contain (1, 2)). I added this code:
mainHim = print . length $ ext c60 [[(1,2)]] Nothing [] 7
mainMe = print . length . concat . take 7 . levels $ search_ (mkUUGraph [(1,2)]) (mkUUGraph c60)
My code finds 3301 such graphs; yours finds 35571. I didn't try very hard to figure out where that discrepancy came from. In ghci, mainHim takes 36.45s; mainMe takes 0.13s. When compiled with -O2, mainHim takes 4.65s; mainMe takes 0.05s. The numbers for mainMe can be cut in half again by using the PatriciaTree graph implementation rather than the default one, and probably cut still farther with profiling and some thought. Just in case the reason mainMe is so much faster is that it is finding so many fewer graphs, I tested a modified main as well:
main = print . length . concat . take 8 . levels $ (search (mkUUGraph c60) :: Tree (Gr () ()))
This prints 35853, so it is finding roughly the same number of graphs as your test command. It takes 0.72s in ghci and 0.38s when compiled with -O2.
Or the best solution is porting it on C, and write bindings to C library over FFI?
No, you don't have to write it in C. The code generated by GHC is not that much slower than C. This huge speed difference suggests that you're implementing a different algorithm. So instead of rewriting in a different language, you should rewrite the Haskell code.
I guess the problem with your code is that you ...
use lists instead of sets
use breadth-first instead of depth-first enumeration (not sure)
use operations on the whole set of edges instead of cleverly keeping track of which edges are in which set
encode the recursive structure of the algorithm by hand, instead of using recursive calls.
I have to admit that I don't fully understand your code. But I read the paper you linked to, and the algorithm described there seems to be a simple brute-force enumeration of all results. So I guess the Haskell implementation should use the list monad (or list comprehensions) to enumerate all subgraphs, filtering out non-connected subgraphs during the enumeration. If you've never written code with the list monad before, just enumerating all subgraphs might be a good starting point.

Resources