How to declare data constructor for graphs - haskell

Below I give data types for lists and trees.
data List a = NilL | Cons a (List a) deriving Show
data Tree a = NilT | Branch a [Tree a] deriving Show
With these definitions I can create infinite structures easily as shown below:
list = Cons 1 list
tree = Branch 1 lt
  where
    lt = tree : lt
I want to create infinite graphs (both directed and undirected) in the same manner. How do I declare a data type for them, and how do I create an infinite graph with it in Haskell?

A simple solution is to use some form of indirection, like indices
type Vertex = Integer
data Graph = Graph [Vertex] [(Vertex, Vertex)]
infGraph = Graph [1..] [(a, b) | a <- [1..], b <- [1..]]
However, this isn't quite as satisfying as tying the knot:
data Vertex = Vertex { tag   :: Integer
                     , edges :: [Vertex] }
type Graph = [Vertex] -- A graph is determined by its list of vertices
-- Construct a graph of infinitely many vertices in which
-- every vertex is connected to every vertex (including itself).
infGraph = map (flip Vertex infGraph) [1..]
infGraph' = map (\v' -> v' infGraph') . map Vertex $ [1..]
We map Vertex over [1..], which gives us a list of functions [Vertex] -> Vertex, each of which wants a list of edges to connect its vertex to. Since infGraph is the list of all vertices, we pass that to each function and tie the knot.
Of course for serious work, use a package.
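The same knot-tying idea also works for a small finite graph, which may be easier to experiment with. The following is only a sketch: the names triangle and walk are made up for illustration, and the graph is a directed 3-cycle. For an undirected graph you would list each neighbour in both directions.

```haskell
-- Same Vertex type as in the answer above.
data Vertex = Vertex { tag :: Integer, edges :: [Vertex] }

-- Hypothetical example: the directed cycle 1 -> 2 -> 3 -> 1.
-- The definitions refer to each other, so laziness ties the knot.
triangle :: [Vertex]
triangle = [v1, v2, v3]
  where
    v1 = Vertex 1 [v2]
    v2 = Vertex 2 [v3]
    v3 = Vertex 3 [v1]

-- Follow the first outgoing edge n - 1 times, recording the tags visited.
-- Assumes every vertex on the path has at least one outgoing edge.
walk :: Int -> Vertex -> [Integer]
walk n v = map tag (take n (iterate (head . edges) v))
```

For example, walk 5 (head triangle) gives [1,2,3,1,2]. Note that deriving Show for Vertex would print forever on a cyclic value, which is why the sketch only inspects tags.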

Related

Find connected components from lists of edges

I need to find the connected components of a graph given its edges.
The graph edges are represented as a list of tuples, and the result needs to be a list of lists of vertices representing the connected components,
e.g. [(1,2), (2,3), (2,6), (5,6), (6,7), (8,9), (9,10)] -> [[1,2,3,5,6,7], [8,9,10]].
There could be any number of connected edges and unconnected graph components in the list. The tuples will always be in ascending order though, if that helps.
I have the signature groupEdges :: [(Int, Int)] -> [[Int]] but I just can't think of how to get from tuples to lists.
I thought of taking one tuple at a time and searching the rest for matching elements, but I don't know how to make this list of sublists.
This question is similar to this question on the CS Stack Exchange, but I don't want to use Data.Graph. I would like to do this without other packages if possible.
-- edit - comments from chepner and ThibautM have given me the first step. I can convert from tuples to lists with map (\(x,y) -> [x,y]) pairs before calling the function.
Now I need to take this list of lists and group the connected components, e.g. [[1,2], [2,3], [2,6], [5,6], [6,7], [8,9], [9,10]] -> [[1,2,3,5,6,7], [8,9,10]]
You mentioned not using packages. The 4 functions I used are relatively easy to implement yourself if you really want. (And I encourage you to either do so or at least look up their implementations.)
Lists are used as sets here, which performs a lot worse than a dedicated structure such as Data.Set. Using a disjoint-set (union-find, merge-find) data structure (referenced in your linked answer) would be even better, but is probably not a very good starting point for understanding.
import Data.List (elem, intersect, union, partition)
pairs = [(1,2), (2,3), (2,6), (5,6), (6,7), (8,9), (9,10)]
pairs2 = map (\(x,y) -> [x,y]) pairs
-- add item to list, only if it's not already present - using list as set
addItem item list | elem item list = list
                  | otherwise = item : list
-- used to test whether subgraphs are connected i.e. contain common node
intersects a b = not $ null $ intersect a b
unionAll :: (Eq a) => [[a]] -> [a]
unionAll (x1:x2:xs) = unionAll ((union x1 x2):xs)
unionAll [x] = x
unionAll [] = []
-- find edges that are connected to first edge/subgraph and merge them
groupFirst :: (Eq a) => [[a]] -> [[a]]
groupFirst (x:xs) = (unionAll (x:connected)) : disconnected
  where
    -- separate 'xs' edges/subgraphs into those that are connected to 'x' and the rest
    (connected, disconnected) = partition (intersects x) xs
groupFirst [] = []
-- if no more edges/subgraphs can be connected with first edge, continue with remaining (disconnected) edge/subgraph ([5,6] in second iteration)
groupAll :: (Eq a) => [[a]] -> [[a]]
groupAll (x:xs) = y:(groupAll ys)
  where
    (y:ys) = groupFirst (x:xs)
groupAll [] = []
-- after first 'groupAll pairs2' - [[1,2,3,6],[5,6,7],[8,9,10]]
-- repeat this process until no more components can be connected together
groupAllRepeat x = if x /= groupAll x then groupAllRepeat (groupAll x) else x
main = print (groupAllRepeat pairs2)
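For comparison, here is a sketch of the Data.Set variant mentioned above. It is not the code from this answer: it uses the containers package, and the names addEdge and components are made up for illustration. Each edge is merged with every component it touches, so a single pass over the edges is enough and no repeat-until-stable loop is needed.

```haskell
import Data.List (foldl', partition, sort)
import qualified Data.Set as Set
import Data.Set (Set)

-- Merge one edge into the components found so far: every component that
-- contains either endpoint is fused with the edge into a single set.
addEdge :: Ord a => [Set a] -> (a, a) -> [Set a]
addEdge comps (x, y) = Set.unions (edge : touching) : rest
  where
    edge = Set.fromList [x, y]
    (touching, rest) = partition (\c -> Set.member x c || Set.member y c) comps

-- Connected components as sorted lists, themselves sorted for a stable order.
components :: Ord a => [(a, a)] -> [[a]]
components = sort . map Set.toAscList . foldl' addEdge []
```

With the pairs from the question, components pairs evaluates to [[1,2,3,5,6,7],[8,9,10]].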

Extracting elements of a graph in Haskell

I have a little question concerning Haskell.
If I have a data type representing a graph like this one:
import Data.Map (Map,empty,member,insert)
import Graphviz
-- | A directed graph
data Graph v = Graph
    { arcsMap :: Map v [v]     -- A map associating a vertex with its successors
    , labelMap :: Map v String -- The Graphviz label of each node
    , styleMap :: Map v String -- The Graphviz style of each node
    }
I want to create a list containing every vertex of a given graph, like this:
-- | Returns the list of vertices of a graph in ascending order
--
-- >>> vertices emptyGraph
-- []
-- >>> vertices $ addVertices emptyGraph [1,4,5,2,1]
-- [1,2,4,5]
vertices :: Graph v -> [v]
My question is: how can I tell Haskell to look at every vertex in arcsMap and build a list from them?
Thank you!
The function keys from Data.Map returns all keys of the map in ascending order. So, after adding keys to the import list, you could implement vertices like this:
vertices :: Graph v -> [v]
vertices = keys . arcsMap
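A self-contained sketch of this, using a simplified stand-in for the question's Graph (the label and style maps are left out). The definitions of emptyGraph and addVertices are not given in the question, so the versions here are assumptions made only to exercise vertices.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- Simplified stand-in for the question's Graph type.
newtype Graph v = Graph { arcsMap :: Map v [v] }

-- Assumed definition: a graph with no vertices at all.
emptyGraph :: Graph v
emptyGraph = Graph Map.empty

-- Assumed definition: insert each vertex with no successors,
-- keeping the successors of vertices that are already present.
addVertices :: Ord v => Graph v -> [v] -> Graph v
addVertices g vs = Graph (foldr addOne (arcsMap g) vs)
  where
    addOne v = Map.insertWith (\_new old -> old) v []

-- Map.keys returns the keys in ascending order without duplicates.
vertices :: Graph v -> [v]
vertices = Map.keys . arcsMap
```

Here vertices (addVertices emptyGraph [1,4,5,2,1]) gives [1,2,4,5]. This relies on every vertex being a key of arcsMap, including vertices that have no successors.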

Arbitrary instance for generating unbiased graphs for quickcheck

module Main where
import Test.QuickCheck
import Data.Set as Set
data Edge v = Edge {source :: v, target :: v}
    deriving (Show,Eq,Ord)
data Graph v = Graph {nodes :: Set v, edges :: Set (Edge v)}
    deriving Show
instance (Eq v, Arbitrary v) => Arbitrary (Edge v) where
    arbitrary = sized aux
        where aux n = do s <- arbitrary
                         t <- arbitrary `suchThat` (/= s)
                         return $ Edge {source = s, target = t}
instance (Ord v, Arbitrary v) => Arbitrary (Graph v) where
    arbitrary = aux `suchThat` isValid
        where aux = do ns <- arbitrary
                       es <- arbitrary
                       return $ Graph {nodes = fromList ns, edges = fromList es}
The current definition of the instance generates graphs with few edges. How do I alter it so that it is less biased and the generated graphs satisfy these two functions?
-- | The function 'isDAG' tests if the graph is acyclic.
isDAG :: Ord v => Graph v -> Bool
isDAG g = isValid g && all nocycle (nodes g)
    where nocycle v = all (\a -> v `notMember` reachable g a) $ Set.map target (adj g v)
-- | The function 'isForest' tests if a valid DAG is a forest (a set of trees), in other words,
-- if every node (vertex) has at most one adjacent node.
isForest :: Ord v => DAG v -> Bool
isForest g = isDAG g && all (\v -> length (adj g v) <= 1) (nodes g)
First you must figure out how to construct a graph which satisfies those properties.
DAG: If your nodes admit some ordering, and for each edge (u,v) you have u < v then the graph is acyclic. This ordering can be any ordering at all, so you can just manufacture an arbitrary ordering on the set of nodes in the graph.
Forest: If your graph has no edges, this property is trivially satisfied. Initially you can add any edge whose source is any node. If you add an edge, remove the source of that edge from the remaining available nodes.
I guess the big question is how to translate this to code. QuickCheck provides many combinators, esp. for selecting from lists, with and without replacement, of various sizes, etc.
instance (Ord v, Arbitrary v) => Arbitrary (Graph v) where
  -- Needs liftA2 (Control.Applicative), replicateM (Control.Monad) and inits (Data.List).
  arbitrary = do
    ns <- Set.fromList <$> liftA2 (++) (replicateM 10 arbitrary) arbitrary
First you generate a random set of nodes.
    let ns' = map reverse $ drop 2 $ inits $ Set.toList ns
For each node except the first, this computes a list that starts with that node and continues with the (non-empty) list of nodes which are "smaller" than it. Here "smaller" just means according to the ordering induced by the order of the elements in the list. Since edges will only ever point from a node to a smaller one, this gets you the DAG property.
    es <- sublistOf ns' >>=
      mapM (\(f:ts) -> Edge f <$> elements ts)
You then take a random sublist of that list (each node occurs as a source at most once, which gets you the forest property), and for each element in that random sublist, you create an edge pointing from the "largest" node in that list to one that is "smaller".
    return $ Graph ns (Set.fromList es)
Then you're done! Test like so:
main = quickCheck $ forAll arbitrary (liftA2 (&&) (isDAG :: Graph Integer -> Bool) isForest)
A natural way of constructing graphs is inductively, adding one node at a time. Then it becomes quite easy to ensure the required properties hold:
If for each added node its edges point only to existing nodes (and not in the other direction), we ensure the DAG property.
If there is at most one edge going from a node, we ensure the forest property. (As you didn't provide the adj function, it's not clear if by "forest" you mean there is at most one edge going from a node or to a node.)
So the process of generating such a graph would go as follows:
Generate a list of random nodes.
Construct a graph by adding them one by one. For each node, either add a random edge to one of the already added nodes, or no edge (decide randomly).
The main factor here is deciding whether to add an edge or not. By tweaking this parameter you get more or fewer trees in your forest. One option is to use frequency for that.
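The inductive construction can be sketched as a pure function, leaving the random decisions to the caller. This is only an illustration: buildForest is a made-up name, nodes are assumed to be distinct, edges are plain pairs rather than the question's Edge type, and "forest" is taken to mean at most one outgoing edge per node. In a QuickCheck generator the decisions could come from something like frequency [(1, pure Nothing), (3, Just <$> arbitrary)], where the weights 1 and 3 are arbitrary choices.

```haskell
-- Each node comes with a decision: Nothing means "no edge", Just k means
-- "add an edge to one of the already added nodes, chosen by index k".
-- Returns the nodes in insertion order and the edges as (source, target).
buildForest :: [(v, Maybe Int)] -> ([v], [(v, v)])
buildForest = go [] []
  where
    -- 'seen' holds the already added nodes, most recent first.
    go seen es [] = (reverse seen, reverse es)
    go seen es ((v, pick) : rest) =
      case (pick, seen) of
        (Just k, _ : _) ->
          -- Wrap k into range so that any Int is a valid choice.
          let target = seen !! (k `mod` length seen)
          in go (v : seen) ((v, target) : es) rest
        _ -> go (v : seen) es rest   -- no edge requested, or no earlier node
```

Because every edge points from a newly added node to an older one, no cycle can form, and each node is the source of at most one edge.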

Reconstruct a graph from BFS output in Haskell

I want to reconstruct the incidence structure of a graph in Haskell, which is given by the output of a breadth first traversal of it. Explicitly, the output consists of a root vertex and a list of neighborhoods (a neighborhood is a list of vertices marked as new or old (= already visited)), where each neighborhood corresponds to the least vertex which has not been assigned to a neighborhood, yet.
In any imperative language, I would solve the problem by using a queue:
Input: root vertex r, list of neighborhoods L
(1) Put r into the empty queue Q
(2) if Q is empty then STOP
(3) extract the first vertex v of Q
(4) extract the first neighborhood N of L
(5) append the unvisited vertices of N to Q
(6) remove the markings (new/old) of the nodes of N and assign v to N
(7) goto (2)
I tried to implement this naive algorithm in Haskell (by using a list or by using Data.Sequence as queue), but ghci always runs out of memory. This should not happen, because although the input consists of 300MB data, 16GB RAM should clearly suffice.
Therefore the naive implementation seems to cause a memory leak. How would you implement this algorithm in Haskell?
Edit:
Here are the (slightly simplified) data types, I use:
data Output = Out !Vertex ![[BFSNode]]
data Vertex = Vertex Integer SomeMoreComplexData
data BFSNode = New Vertex | Old Integer
data Graph = Graph ![Vertex] ![(Integer,[Integer])]
The data type "Output" contains the already parsed BFS output, consisting of the root vertex and the list of neighborhoods. BFSNode corresponds to a node in the BFS tree which belongs either to a new vertex which is visited for the first time, or to an old vertex which has already been visited and which is therefore referred to by its unique number. Note that the parsing process works fine and consumes very little memory.
My aim is to convert "Output" into the data type "Graph" which consists of the lists of vertices and of an incidence list.
Here is a simplified version of my implementation:
readTree :: [[BFSNode]] -> Seq Integer -> Graph
readTree [] _ = Graph [] []
readTree (nb:nbs) qs =
    let (i :< qs') = viewl qs
        newVs = fromList $! map nodeNr . filter isNew $ nb
        (Graph vs adj) = readTree nbs $ qs' >< newVs
    in Graph (map unNew (filter isNew nb) ++ vs) ((i,nub $ map nodeNr nb):adj)
"nbs" is the list of neighborhoods, "qs" is the queue. The function "nodeNr" extracts the unique identification number from a vertex, "isNew" tests whether a vertex is new, and "unNew" unpacks a new vertex from the data type "BFSNode".
Edit2:
I think I have localized the problem now. Maybe it has nothing to do with my implementation of the conversion process. My mistake was to use the built-in function "read" to read the data type "Output" from a file. I realize now that Haskell has problems with reading big files this way. Even if it were just about reading a list of integers, e.g.
main = do
    txt <- readFile "test"
    writeFile "test2" . show $ (read txt :: [Integer])
the program will run out of memory if the file "test" is big enough. I understand now that it is not a good idea to parse data in this way, since "read" will load all data into memory before showing any output, but I still do not understand why it fills 16GB of RAM although the file is not even 500MB in size. Do you have any idea what is wrong with "read"? Does Haskell show the same behavior on your machines?
Edit3:
Now I implemented a stream based parsing function "readOutput" which takes a String and returns the data type "Output". This function is lazy, so I immediately get an output when I call it. But when I compose it with my conversion function "readTree" (which is clearly tail-recursive) I get no output at all and the memory usage increases as usual. What am I doing wrong?
Edit4:
The problem in Edit3 came from some strictifications which I removed now.
This question does not specify a key ingredient - how is the graph going to be represented in Haskell? Functional programs require carefully thought out data structures to maximize sharing and run efficiently. Usually, this means they're recursively built from nothing (inductive). There's a paper on inductive graphs and functional graph algorithms that gives one representation:
module Test where
data Graph a = Empty | Extension (Graph a) [Int] (Int, a)
deriving Show
That is, a graph is either Empty, or a (smaller) graph extended by one node. This is exactly how lists are built using Cons in functional languages, except that the additional node has to specify the smaller graph, the predecessor links ([Int]), and the new node number and data, (Int,a). Note that they also implemented this as an abstract type "for efficiency reasons."
A graph with one node can be generated by extending the empty graph.
singleton :: (Int,a) -> Graph a
singleton x = Extension Empty [] x
Using this structure, it's simple to define a recursive parse algorithm for your BFS tree.
data Mark a = Visited Int | New (Int,a) deriving Show
parse :: (Int,a) -> [[Mark a]] -> Graph a
parse x nbrs = extend Empty [x] nbrs
extend :: Graph a -> [(Int,a)] -> [[Mark a]] -> Graph a
extend g [] [] = g
extend g _ [] = Empty -- leftover nodes, really an error.
extend g [] _ = Empty -- leftover neighborhoods, really an error.
extend g (x : tl) (nbr : nbrs) =
    extend (Extension g (seen nbr) x) (news tl nbr) nbrs
news :: [(Int,a)] -> [Mark a] -> [(Int,a)]
news l (New x : tl) = news (uniq l x) tl
news l (_ : tl) = news l tl
news l [] = l
uniq :: [(Int,a)] -> (Int,a) -> [(Int,a)]
uniq (x:tl) y = x : if (fst x == fst y) then tl else uniq tl y
uniq [] y = [y]
seen :: [Mark a] -> [Int]
seen (Visited i : tl) = i : seen tl
seen (_ : tl) = seen tl
seen [] = []
m0 = [New (1,())]
m1 = [Visited 0, New (2,()), New (3,())]
m2 = [Visited 1, New (3,())]
m3 = [Visited 1, Visited 2]
nbrs = [m0,m1,m2,m3]
Testing it out,
$ ghci
GHCi, version 7.6.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> :load Test
[1 of 1] Compiling Test ( Test.hs, interpreted )
Ok, modules loaded: Test.
*Test> parse (0,()) nbrs
Extension (Extension (Extension (Extension Empty [] (0,())) [0] (1,())) [1] (2,())) [1,2] (3,())
For efficiency, you could do the following:
The news and seen functions could be combined into a single pass, e.g. let (ns,sn) = newseen nbr ([],[]), and made tail-recursive (passing along their partially constructed lists and returning immediately at the end).
Your input could keep track of the node at the center of each neighbor list. This would avoid the list concatenation in the stack of neighbors. Alternatively, you could use a functional deque to hold that stack.
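One possible shape for the combined function is sketched below. It is a variation rather than the exact suggestion: it uses a right fold instead of explicit accumulator arguments, so it takes only the neighborhood, and it does not perform the duplicate check that uniq does in news.

```haskell
data Mark a = Visited Int | New (Int, a) deriving Show

-- Split a neighborhood into its new nodes and the numbers of the
-- already visited nodes in a single traversal, preserving order.
newseen :: [Mark a] -> ([(Int, a)], [Int])
newseen = foldr step ([], [])
  where
    -- The lazy pattern lets either result list be consumed incrementally.
    step (New x)     ~(ns, sn) = (x : ns, sn)
    step (Visited i) ~(ns, sn) = (ns, i : sn)
```

With m1 from the test data above, newseen m1 gives ([(2,()),(3,())],[0]).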
If you haven't seen it, I'd recommend Okasaki's book on purely functional data structures.
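For reference, the queue-based steps (1) to (7) from the question can also be written directly with Data.Sequence. This is a minimal sketch under simplifying assumptions: vertices are bare Int numbers without extra data, only the adjacency list is produced, and rebuild is a made-up name. Each result element is emitted before the recursive call, so the output can be consumed lazily instead of being built up in one go.

```haskell
import Data.Sequence (ViewL (..), viewl, (><))
import qualified Data.Sequence as Seq

-- Simplified BFS node: a vertex seen for the first time, or a reference
-- to a vertex that was already visited.
data BFSNode = New Int | Old Int

nodeNr :: BFSNode -> Int
nodeNr (New i) = i
nodeNr (Old i) = i

-- Pair each neighborhood with the vertex at the front of the queue and
-- enqueue the vertices that the neighborhood marks as new.
rebuild :: Int -> [[BFSNode]] -> [(Int, [Int])]
rebuild root = go (Seq.singleton root)
  where
    go _ [] = []
    go queue (nb : nbs) =
      case viewl queue of
        EmptyL  -> []   -- queue exhausted: stop, as in step (2)
        v :< qs -> (v, map nodeNr nb) : go (qs >< Seq.fromList [i | New i <- nb]) nbs
```

For instance, rebuild 0 [[New 1, New 2], [Old 0, New 3], [Old 0], [Old 1]] yields [(0,[1,2]),(1,[0,3]),(2,[0]),(3,[1])].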

Haskell 2-3-4 Tree

We've been asked to create a 2-3-4 tree in Haskell, as in write the data type, the insert function, and a display function.
I'm finding it very difficult to get information on this kind of tree, even in a language I'm comfortable with (Java, C++).
What I have so far -
data Tree t = Empty
    | Two t (Tree t)(Tree t)
    | Three t t (Tree t)(Tree t)(Tree t)
    | Four t t t (Tree t)(Tree t)(Tree t)(Tree t) deriving (Eq, Ord, Show)
leaf2 a = Two a Empty Empty
leaf3 a b = Three a b Empty Empty Empty
leaf4 a b c = Four a b c Empty Empty Empty Empty
addNode::(Ord t) => t -> Tree t -> Tree t
addNode t Empty = leaf2 t
addNode x (Two t left right)
    | x < t = Two t (addNode x left) right
    | otherwise = Two t left (addNode x right)
This compiles, but I'm not sure if it's correct, and I'm not sure how to start writing the insert into a three node or four node.
The assignment also says that "deriving show" for the display function is not enough, that it should print out the tree in the format normally seen in diagrams. Again, unsure on the way to go with this.
Any help or direction appreciated.
I know nothing about 2-3-4 trees, but for the Three node, you would start with something like this:
addNode t (Three x y left mid right)
    | cond1 = expr1
    | cond2 = expr2
    (etc)
What cond1, cond2, expr1, and expr2 are, exactly, is dependent on the definition of what a 2-3-4 tree is.
As for a show method, the general outline would be this:
instance (Show t) => Show (Tree t) where
    show Empty = ...
    show (Two x l r) = ...show x...show l...show r...
    show (Three x y l m r) = ...
    show (Four x y z l m n r) = ...
The implementation depends on how you want it to look, but for the non-Empty cases, you will probably invoke show on all of the components of the tree being shown. If you want to indent the nested parts of the tree, then perhaps you should create a separate method:
instance (Show t) => Show (Tree t) where
    show = showTree 0
showTree :: Show t => Int -> Tree t -> String
showTree n = indent . go
    where indent = (replicate n ' ' ++)
          go Empty = "Empty"
          go (Two x l r) = (...show x...showTree (n+1) l...showTree (n+1) r...)
          (etc)
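To make the outline concrete, here is one way the indented version could be filled in. It is a sketch, not the required output format: it prints one node per line with children indented two spaces below their parent and "." for an empty subtree, which is an arbitrary layout choice. The diagram-like layout the assignment asks for would need more work.

```haskell
-- Same shape as the Tree type in the question (without the derived instances).
data Tree t = Empty
            | Two t (Tree t) (Tree t)
            | Three t t (Tree t) (Tree t) (Tree t)
            | Four t t t (Tree t) (Tree t) (Tree t) (Tree t)

-- Render the keys of a node on one line, indented by n spaces,
-- followed by each child rendered two spaces deeper.
showTree :: Show t => Int -> Tree t -> String
showTree n tree = case tree of
    Empty                -> line "."
    Two a l r            -> node [a] [l, r]
    Three a b l m r      -> node [a, b] [l, m, r]
    Four a b c l m1 m2 r -> node [a, b, c] [l, m1, m2, r]
  where
    line s = replicate n ' ' ++ s ++ "\n"
    node keys kids = line (unwords (map show keys)) ++ concatMap (showTree (n + 2)) kids
```

Printing it with putStr (showTree 0 t) shows the nesting directly; a Show instance could then be defined as show = showTree 0.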
"We've been asked to create a 2-3-4 tree"
My condolences. I myself once had to implement one for homework. A 2-3-4 tree is a B-tree with all the disadvantages of the B-tree and none of the advantages, because writing the cases separately for each number of children as you do is as cumbersome as having a list of only 2-4 elements.
Point being: B-tree insertion algorithms should work, just fix the size. Cormen et al. have pseudocode for one in their book Introduction to algorithms (heavy imperativeness warning!).
It might still be better to have lists of data elements and children instead of the four-case algebraic data type, even if the type wouldn't enforce the size of the nodes then. At least it would make it easier to expand the node size.

Resources