How to ensure correct edges in graph - haskell

I was trying to make a datatype for a graph in Haskell as follows:
type Vertex a = a
type Edge a = (Vertex a, Vertex a)
data Graph a = Combine [Vertex a] [Edge a]
This is a representation that worked for what I wanted to do, but I realized there could be edges for vertices which are not in the vertex-list.
My question is thus: is there a way to ensure that every edge contains only vertices from the vertex-list?
I have thought a lot about it already, but the best idea I've had so far is a function that makes a new graph, with all vertices from the edge-list that are missing from the vertex-list added. Something like:
fix_graph :: Eq a => Graph a -> Graph a
fix_graph (Combine v e) =
    Combine (removeDuplicates (v ++ [x | (x,_) <- e] ++ [y | (_,y) <- e])) e

removeDuplicates :: Eq t => [t] -> [t]
...
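For what it's worth, here is a minimal completion of that sketch (my filling-in, not code from the original question): removeDuplicates is essentially Data.List.nub, so the whole repair can read
import Data.List (nub)

fix_graph :: Eq a => Graph a -> Graph a
fix_graph (Combine v e) = Combine (nub (v ++ concat [[x, y] | (x, y) <- e])) e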
Because this idea did not really satisfy me (also because I didn't take the time to implement it well), I wondered whether it would be possible to have a data constructor that immediately adds any vertices from the edge-list that are not yet in the vertex-list.
I've already read through the answers here, but I'm not really fond of the adjacency representation used there. I know I am being annoying, but I would just like to know whether there are other possibilities to solve this problem.
If anybody could help me with a solution, or with getting rid of my illusions, it would be most helpful...
Thanks in advance

So there are a couple different options:
Tying the knot
There are lots of ways to encode graphs in Haskell. The absolute simplest is to use a process called "tying the knot" to create circularity in a tree data structure. So for example to encode this graph:
               .--.
A -- B -- C -- D -'
|    |    |    |
E -- F    G -- H
|              |
'--------------'
You can simply write a node as its name and list of children:
data Node = Node String [Node]

instance Eq Node where
    Node a _ == Node b _ = a == b

my_graph = [a, b, c, d, e, f, g, h] where
    a = Node "A" [b, e]
    b = Node "B" [a, c, f]
    c = Node "C" [b, d, g]
    d = Node "D" [c, d, h]
    e = Node "E" [a, f, h]
    f = Node "F" [b, e]
    g = Node "G" [c, h]
    h = Node "H" [d, e, g]
This has a lot of convenience: you can now walk through the data structure like any other Haskell data structure, with recursive functions. To terminate on cycles, you tack the current item onto a path argument at each recursive call, and the first guard of your recursive logic should be | node `elem` path = ... to handle the cycles however you want.
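As an illustration (my sketch, not code from the original answer), here is a cycle-safe reachability test that uses exactly that path-tracking guard:
-- Is `goal` reachable from `start`? The `elem` guard terminates on cycles.
reachable :: Node -> Node -> Bool
reachable start goal = go [] [start] where
    go _       []     = False
    go visited (n:ns)
        | n `elem` visited = go visited ns            -- already seen: skip it
        | n == goal        = True
        | otherwise        = let Node _ children = n
                             in go (n : visited) (children ++ ns)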
The flip side is that your minor consistency issues have blown up a bit into really thorny consistency issues. Consider for example the difference between these two:
-- A has one child, which is itself; B has one child, which is A.
let a = Node "A" [a]; b = Node "B" [a] in [a, b]

-- This looks almost the same, but if you descend far enough on B you find an
-- impostor node with the wrong children.
let a = Node "A" [a]
    impostor = Node "A" [b]
    b = Node "B" [Node "A" [Node "A" [impostor]]]
in [a, b]
So that sucks and the only real answer I have for it is, "normalize by converting to one of the below...".
Anyway, the above trick also goes by the names mutual recursion and letrec, and basically means that within a where or let clause, all of the definitions that you put there can "see each other". It is not a property of laziness; you can make the above data structure in a strict language too -- but the language design for a functional strict language which understands mutually-recursive definitions this way might be a little difficult. (With a non-functional language, you just create the pointers as you need.)
Explicit numerology
Now think about how you'd take such a graph as we've got above, and convert it to your representation. The easiest way would involve going through a middle-man step which contains an Array:
import From.Above.Code (Node)
import Data.Array
type Graph = Array Int [Int]

graph :: [Node] -> Maybe Graph
graph nodes = fmap (array (1, length nodes)) . sequence $ map format nodes where
    indices = zip nodes [1..]
    format node@(Node _ children) = do -- in the Maybe monad
        index <- lookup node indices
        index_list <- sequence $ map (flip lookup indices) children
        return (index, index_list)
Now, this has a lot fewer consistency issues, which can now all be alleviated programmatically. However, those consistency issues can serve a purpose if you want to programmatically create such a graph with the State monad, and you want to temporarily leave the data structure in an inconsistent state until the proper node is read. The only disadvantage is that when you write the graph into your file, it looks a bit harder to understand because numbers are less friendly than strings:
array (1, 8) [
(1, [2, 5]),
(2, [1, 3, 6]),
(3, [2, 4, 7]),
(4, [3, 4, 8]),
(5, [1, 6, 8]),
(6, [2, 5]),
(7, [3, 8]),
(8, [4, 5, 7])]
You can solve this with, say, a Map String [String], at the cost of accesses becoming O(log n). In any case you should learn that representation: you will want to convert to an IntMap [Int] and back when you do the "completeness checks" you're proposing.
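For example, such a completeness check over the IntMap form might look like this (my sketch, not from the original answer):
import qualified Data.IntMap as IM

-- Every index mentioned in any adjacency list must itself be a key of the map.
consistent :: IM.IntMap [Int] -> Bool
consistent m = all (all (`IM.member` m)) (IM.elems m)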
Once you've got these, it turns out that you can use a backing Array Int Node to create a recursive [Node] as above:
nodesFromArray arr = nodes where
    makeNode index children = Node (show index) [backingArray ! c | c <- children]
    backingArray = array (bounds arr) [(i, makeNode i c) | (i, c) <- assocs arr]
    nodes = elems backingArray
Lists of edges
Once you've got the above lists (either Map.toList or Array.assocs), lists of edges become very easy to make:
edges_from_array :: Graph -> [(Int, Int)]
edges_from_array = concatMap (uncurry (fmap . pair)) . assocs
    where pair x y = (x, y)
The flip-side is a little more complicated and accomplishes what you're trying to do directly:
import Data.Map (Map)
import Data.Set (Set)
import qualified Data.Map as Map
import qualified Data.Set as Set
makeGraphMap vertices edges = add edges (Map.fromList $ blankGraph vertices) where
    blankGraph verts = zip verts (repeat Set.empty)
    setInsert x Nothing    = Just $ Set.singleton x
    setInsert x (Just set) = Just $ Set.insert x set
    add [] graphMap = fmap Set.toList graphMap
    add ((a, b) : es) graphMap = add es $ Map.alter (setInsert b) a graphMap
That is, we walk the list of edges with a map which maps keys to their sets of children; we initialize this to the list of vertices mapping to empty sets (so that we can have disconnected single nodes in our graphs), then walk through the edges, inserting each value into the set at its key and creating that set if we haven't seen the key before.
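A quick sanity check in ghci (my example, using the corrected function above; note that the duplicate edge collapses into the Set):
ghci> makeGraphMap "abc" [('a','b'), ('a','b'), ('b','c')]
fromList [('a',"b"),('b',"c"),('c',"")]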

Related

Join elements with list with some same attributes

I have a list with the type [(Int, Char, Int)]. E.g.:
[(1, 'x', 1), (1, 'y', 2), (2, 'x', 1)]
The first Int is the number of times the Char appears, and the second Int differentiates the same Char from others. For example, there could be an x1 and an x2.
I want to join elements of that list that have the same 2nd and 3rd element. In the case of the list above, it would become [(3, 'x', 1), (1, 'y', 2)] (the first and third tuples from the initial list were added together).
I've looked into zipWith and list comprehensions, but none of them seem to work. Is there any other way that I'm not thinking about that might work here?
The two functions you want to use are Data.List.sortBy and Data.List.groupBy.
If we sort by comparing the second and third elements of each tuple, we get the entries in the list sorted by variable and exponent. This is accomplished by passing a lambda which uses pattern matching to extract and compare just those elements.
import Data.List
lst = [(1, 'x', 1), (1, 'y', 2), (2, 'x', 1)]
lst' = sortBy (\(_, a, b) (_, a', b') -> (a,b) `compare` (a',b')) lst
-- [(1,'x',1), (2,'x',1), (1,'y',2)]
Now we need to group based on the second and third values. groupBy will not work the way you need on an unsorted list, so don't skip that step.
The lambda being passed to groupBy here should look very familiar from the previous example.
lst'' = groupBy (\(_, a, b) (_, a', b') -> a == a' && b == b') lst'
-- [[(1,'x',1), (2,'x',1)], [(1,'y',2)]]
Now summing the first elements of the tuples and incorporating the other information is trivial with list comprehensions.
We get the variable and exponent info from the first element in the list and bind those to x and y respectively, then sum up the first coefficients and build a new tuple.
[let (_,x,y) = lst!!0 in (sum [c | (c,_,_) <- lst], x, y) | lst <- lst'', not (null lst)]
-- [(3,'x',1), (1,'y',2)]
First of all, I would suggest working with more meaningful domain types. A 3-tuple of built-in types could mean a lot of different things. By defining a new type and naming the components, you make everything clearer and prevent mistakes like getting the two Ints mixed up:
type Power = Int
type Coefficient = Int
data Exp var = Exp var Power deriving (Show, Eq, Ord)
data Term var = Term Coefficient (Exp var) deriving Show
What you're doing looks a lot to me like combining terms in polynomials, so I've defined types that make sense in that context. You may prefer different names, or a different structure, if you're actually doing something else.
Now you're looking for a function of type [Term Char] -> [Term Char], which groups together like Exps. Generally Data.Map.fromListWith is a great tool for grouping list items together by a key:
import qualified Data.Map as M
combine :: Ord a => [Term a] -> M.Map (Exp a) Coefficient
combine = M.fromListWith (+) . map toTuple
    where toTuple (Term coef exp) = (exp, coef)
Then all that's left is to re-inflate the Map we've extracted to a list again:
simplify :: Ord a => [Term a] -> [Term a]
simplify = map fromTuple . M.assocs . combine
    where fromTuple (exp, coef) = Term coef exp
And indeed, we get the grouping you hoped for:
*Main> simplify [Term 1 (Exp 'x' 1), Term 1 (Exp 'y' 2), Term 2 (Exp 'x' 1)]
[Term 3 (Exp 'x' 1),Term 1 (Exp 'y' 2)]

Haskell :: Recursion in Recursion for Loop in Loop

How it goes: given a set of tuples (id, x, y), find the min and max for x and y; this yields two dots (the red points). Each tuple is then assigned to one of two groups based on its distance to the red dots.
Each group can't exceed 5 dots. If it does, new groups must be computed. I've managed to write the recursion for the first phase, but I have no idea how to do the second phase. The second phase should look like this:
based on these two groups, again find the min and max for x and y (for each group), so that four dots (red points) are created; each tuple is again assigned to a group based on its distance to the red dots.
getDistance :: (Int, Double, Double) -> (Int, Double, Double) -> Double
getDistance (_,x1,y1) (_,x2,y2) = sqrt $ (x1-x2)^2 + (y1-y2)^2

getTheClusterID :: (Int, Double, Double) -> Int
getTheClusterID (id, _, _) = id

idxy = [(id, x, y)]

createCluster id cs = [(id, minX, minY), (id+1, maxX, minY), (id+2, minX, maxY), (id+3, maxX, maxY)]
    where minX = minimum $ map (\(_,x,_) -> x) cs
          maxX = maximum $ map (\(_,x,_) -> x) cs
          minY = minimum $ map (\(_,_,y) -> y) cs
          maxY = maximum $ map (\(_,_,y) -> y) cs

idCluster = [1]
cluster = createCluster (last idCluster) idxy

clusterThis p@(id, a, b)
    | getDistance p (cluster!!0) < getDistance p (cluster!!1) = (getTheClusterID (cluster!!0), a, b)
    | getDistance p (cluster!!1) < getDistance p (cluster!!0) = (getTheClusterID (cluster!!1), a, b)
    | otherwise = (getTheClusterID (cluster!!0), a, b)

groupAll = map clusterThis idxy
I am moving from imperative to functional. Sorry if my way of thinking is still in imperative way. Still learning.
EDIT:
To clarify, this is what the original data looks like (shown as a picture in the original post).
The basic principle to follow in writing such an algorithm is to write small, compositional programs; each program is then easy to reason about and test in isolation, and the final program can be written in terms of the smaller ones.
The algorithm can be summarized as follows:
Compute the points which bound the set of points.
Split the rest of the points into two clusters, one containing points closer to the minimum point, the other containing all other points (equivalently, points closer to the maximum point).
If any cluster contains more than 5 points, repeat the process on that cluster.
The presence of a 'repeat the process' step indicates this to be a divide and conquer problem.
I see no need for an ID for each point, so I've dispensed with this.
To begin, define datatypes for each type of data you will be working with:
import Data.List (partition)
data Point = Point { ptX :: Double, ptY :: Double }
data Cluster = Cluster { clusterPts :: [Point] }
This may seem silly for such simple data, but it can potentially save you quite a bit of confusion during debugging. Also note the import of a function we will be using later.
The 1st step:
minMaxPoints :: [Point] -> (Point, Point)
minMaxPoints ps = (Point minX minY, Point maxX maxY)
    where minX = minimum $ map ptX ps
          maxX = maximum $ map ptX ps
          minY = minimum $ map ptY ps
          maxY = maximum $ map ptY ps
This is essentially the same as your createCluster function.
The 2nd step:
pointDistance :: Point -> Point -> Double
pointDistance (Point x1 y1) (Point x2 y2) = sqrt $ (x1-x2)^2 + (y1-y2)^2

cluster1 :: [Point] -> [Cluster]
cluster1 ps =
    let (mn, mx) = minMaxPoints ps
        (psmn, psmx) = partition (\p -> pointDistance mn p < pointDistance mx p) ps
    in [ Cluster psmn, Cluster psmx ]
This function should be clear: it is a direct translation of the above statement of this step into code. The partition function takes a predicate and a list and produces two lists, the first containing all elements for which the predicate is true, and the second all elements for which it is false. pointDistance is essentially the same as your getDistance function.
The 3rd step:
cluster :: [Point] -> [Cluster]
cluster ps =
    cluster1 ps >>= \cl@(Cluster c) ->
        if length c > 5
            then cluster c
            else [cl]
This also implements the statement above very directly. Perhaps the only confusing part is the use of >>=, which (here) has type [a] -> (a -> [b]) -> [b]; it simply applies the given function to each element of the given list, and concatenates the result (equivalently, it is written flip concatMap).
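A tiny illustration of that behaviour (my example):
ghci> [1,2,3] >>= \x -> [x, 10*x]
[1,10,2,20,3,30]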
Finally your test case (which I hope I've translated correctly from pictures to Haskell data):
testPts :: [Point]
testPts = map (uncurry Point)
[ (0,0), (1,0), (2,1), (0,2)
, (5,2), (5,4), (4,3), (4,4)
, (8,2), (9,3), (10,2)
, (11,4), (12,3), (13,3), (13,5) ]
main = mapM_ (print . map (\p -> (ptX p, ptY p)) . clusterPts) $ cluster testPts
Running this program produces
[(0.0,0.0),(0.0,2.0),(2.0,1.0),(1.0,0.0)]
[(4.0,4.0),(5.0,2.0),(5.0,4.0),(4.0,3.0)]
[(10.0,2.0),(9.0,3.0),(8.0,2.0)]
[(13.0,3.0),(12.0,3.0),(11.0,4.0),(13.0,5.0)]
Functional programmers love recursion, yet they go to great lengths to avoid writing it. Jeez, people, make up your minds!
I like to structure my code, to the extent possible, using common, well-understood combinators. I want to demonstrate a style of Haskell programming which leans heavily on standard tools to implement the boring parts of a program (mapping, zipping, looping) as tersely and generically as possible, freeing you up to focus on the problem at hand.
So don't worry if you don't understand everything here. I just want to show you what's possible! (And please ask if you have questions!)
Vectors
First things first: we're working with two-dimensional space, so we'll need two-dimensional vectors and some secondary school vector algebra to work with them.
I'm going to parameterise my vector by the scalar on which our vector space is built. This'll allow me to work with standard type classes like Functor, so I can delegate a lot of the work of building a vector algebra to the machine. I've turned on DeriveFunctor and DeriveFoldable, which allow me to utter the magic words deriving (Functor, Foldable).
{-# LANGUAGE DeriveFunctor, DeriveFoldable #-}

data Pair a = Pair {
    px :: a,
    py :: a
} deriving (Show, Functor, Foldable)
Hereafter I'm going to avoid working explicitly with Pair, and program to an interface, not an implementation. This'll allow me to build a simple linear algebra library in a manner that's independent of the dimensionality of the vector space. I'll give example type signatures in terms of V2:
type V2 = Pair Double
Scalar multiplication: functors
A vector space is required to have two operations: scalar multiplication and vector addition. Scalar multiplication means multiplying each component of a vector by a constant scalar. If you view a vector as a container of components, it should be clear that this means "do the same thing to every element in a container" - that is, it's a mapping operation. That's what Functor is for.
-- mul :: Double -> V2 -> V2
mul :: (Functor f, Num n) => n -> f n -> f n
mul k f = fmap (k *) f
Vector addition: zippy applicatives
Vector addition involves adding up the components of a vector point-wise. Thinking of a vector as a container of components, addition is a zipping operation - match up each element of the two vectors and add them up.
Applicative functors are functors with an additional "apply" operation. Thinking of a functor f as a container, Applicative's <*> :: f (a -> b) -> f a -> f b gives you a way to take a container of functions and apply it to a container of values to get a new container of values. It should be clear that one way to make Pair into an Applicative is to use zipping to apply functions to values.
instance Applicative Pair where
    pure x = Pair x x
    Pair f g <*> Pair x y = Pair (f x) (g y)
(For another example of a zippy applicative, see this answer of mine.)
Now that we have a way to zip two pairs, we can leverage a bit of standard Applicative machinery to implement vector addition.
-- add :: V2 -> V2 -> V2
add :: (Applicative f, Num n) => f n -> f n -> f n
add = liftA2 (+)
Vector subtraction, which gives you a way to find the distance between two points, is defined in terms of multiplication and addition.
-- minus :: V2 -> V2 -> V2
minus :: (Applicative f, Num n) => f n -> f n -> f n
v `minus` u = v `add` mul (-1) u
Dot products: foldable containers
2D Euclidean space is actually a Hilbert space - a vector space equipped with a way to measure lengths and angles in the form of a dot product. To take the dot product of two vectors, you multiply the components together and then add up the results. Once more, we'll be using Applicative to multiply the components, but that just gives us another vector: how do we implement "adding up the results"?
Foldable is the class of containers which admit an "aggregation" operation foldr :: (a -> b -> b) -> b -> f a -> b. The standard prelude's sum is defined in terms of foldr, so:
-- dot :: V2 -> V2 -> Double
dot :: (Applicative f, Foldable f, Num n) => f n -> f n -> n
v `dot` u = sum $ liftA2 (*) v u
This gives us a way to find the absolute length of a vector: dot it with itself and take the square root.
-- modulus :: V2 -> Double
modulus :: (Applicative f, Foldable f, Floating n) => f n -> n
modulus v = sqrt $ v `dot` v
So the distance between two points is the modulus of the difference of the vectors.
dist :: (Applicative f, Foldable f, Floating n) => f n -> f n -> n
dist v u = modulus (v `minus` u)
N-ary zipping: traversable containers
An axis-aligned (hyper-)rectangle can be defined by just two points. We'll represent the bounding box of a set of points as a Pair of vectors pointing to opposite corners of the bounding box.
Given a collection of vectors of components, we can find the opposite corners of the bounding box by finding the maximum and minimum of each component across the collection. This requires us to zip up, or transpose, a collection of vectors of components into a vector of collections of components. For this I'll use Traversable's sequenceA.
-- boundingBox :: [V2] -> Pair V2
boundingBox :: (Traversable t, Applicative f, Ord n) => t (f n) -> Pair (f n)
boundingBox vs =
    let components = sequenceA vs
    in Pair (minimum <$> components) (maximum <$> components)
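To see the transposition concretely (my example, relying on the zippy Applicative for Pair defined earlier):
ghci> sequenceA [Pair 1 2, Pair 3 4, Pair 5 6]
Pair {px = [1,3,5], py = [2,4,6]}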
Clustering
Now that we have a library for working with vectors, we can get down to the meaty part of the algorithm: dividing sets of points into clusters.
Partitioning
Let me rephrase the specification of the inner loop of your algorithm. You want to partition a set of points based on whether they're closer to the bottom-left corner of the set's bounding box or to the top-right corner. That's what partition does.
We can write a function, whichCluster, which uses minus and modulus (via dist) to decide this for a single point, and then use partition to apply it to the whole set.
type Cluster = []

-- cluster :: Cluster V2 -> [Cluster V2]
cluster :: (Applicative f, Foldable f, Ord n, Floating n) => Cluster (f n) -> [Cluster (f n)]
cluster vs =
    let Pair bottomLeft topRight = boundingBox vs
        whichCluster v = dist v bottomLeft <= dist v topRight
        (g1, g2) = partition whichCluster vs
    in [g1, g2]
Repetition, repetition, repetition
Now we want to repeatedly cluster until we don't have any groups larger than 5. Here's the plan. We'll keep track of two sets of clusters, those which are small enough, and those which require further sub-clustering. I'll use partition to sort a list of clusters into those which are small enough and those which need subclustering. I'll use the list monad's >>= :: [a] -> (a -> [b]) -> [b] (here [Cluster V2] -> ([V2] -> [Cluster V2]) -> [Cluster V2]), which maps a function over a list and flattens the result, to implement the notion of subclustering. And I'll use until to repeatedly subcluster until the set of remaining too-large clusters is empty.
-- smallClusters :: Int -> Cluster V2 -> [Cluster V2]
smallClusters :: (Applicative f, Foldable f, Ord n, Floating n) => Int -> Cluster (f n) -> [Cluster (f n)]
smallClusters maxSize vs = fst $ until (null . snd) splitLarge ([], [vs])
    where
        smallEnough xs = length xs <= maxSize
        splitLarge (small, remaining) =
            let (newSmall, large) = partition smallEnough remaining
            in (small ++ newSmall, large >>= cluster)
A quick test, cribbed from @user2407038's answer:
testPts :: [V2]
testPts = map (uncurry Pair)
[ (0,0), (1,0), (2,1), (0,2)
, (5,2), (5,4), (4,3), (4,4)
, (8,2), (9,3), (10,2)
, (11,4), (12,3), (13,3), (13,5) ]
ghci> smallClusters 5 testPts
[
[Pair {px = 0.0, py = 0.0},Pair {px = 1.0, py = 0.0},Pair {px = 2.0, py = 1.0},Pair {px = 0.0, py = 2.0}],
[Pair {px = 5.0, py = 2.0},Pair {px = 5.0, py = 4.0},Pair {px = 4.0, py = 3.0},Pair {px = 4.0, py = 4.0}],
[Pair {px = 8.0, py = 2.0},Pair {px = 9.0, py = 3.0},Pair {px = 10.0, py = 2.0}],
[Pair {px = 11.0, py = 4.0},Pair {px = 12.0, py = 3.0},Pair {px = 13.0, py = 3.0},Pair {px = 13.0, py = 5.0}]
]
There you go. Small clusters in n-dimensional space, all without a single recursive function.
Labelling
Part of the point of working with the Applicative and Foldable interfaces, rather than working with V2 directly, was so I could demonstrate the following little magic trick.
Your original code represented points as 3-tuples consisting of two Doubles for the location and an Int for the point's label, but my V2 has no label. Can we recover this? Well, since the code doesn't at any point mention any concrete types - just standard type classes - we can just build a new type for labelled vectors. As long as said type is a Foldable Applicative all of the above code will continue to work without modification!
data Labelled m f a = Labelled { lbl :: m, vec :: f a } deriving (Show, Functor, Foldable)

instance (Monoid m, Applicative f) => Applicative (Labelled m f) where
    pure = Labelled mempty . pure
    Labelled m ff <*> Labelled n fx = Labelled (m <> n) (ff <*> fx)
The Monoid constraint is there because when combining actions you also need a way to combine their labels. I'm just going to use First - left-biased choice - because I'm not expecting the points' labels to be relevant to the zipping operations like modulus and boundingBox.
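For reference (my example), First is left-biased: the first Just wins.
ghci> import Data.Monoid
ghci> First (Just 1) <> First Nothing <> First (Just 3)
First {getFirst = Just 1}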
type LabelledV2 = Labelled (First Int) Pair Double
testPts :: [LabelledV2]
testPts = zipWith (Labelled . First . Just) [0..] $ map (uncurry Pair)
[ (0,0), (1,0), (2,1), (0,2)
, (5,2), (5,4), (4,3), (4,4)
, (8,2), (9,3), (10,2)
, (11,4), (12,3), (13,3), (13,5) ]
ghci> traverse (traverse (getFirst . lbl)) $ smallClusters 5 testPts
Just [[0,1,2,3],[4,5,6,7],[8,9,10],[11,12,13,14]] -- try reordering testPts

Haskell profiling subgraph mining algorithm

I am trying to solve the problem of finding all connected subgraphs in Haskell. The algorithm I used is described here. A quote from that paper:
As in every path algorithm, there are forward steps and back steps. A step forward is done if a given connected subgraph can be extended by addition of edge k, that is if edge k is not already part of the given subgraph, if k is adjacent to at least one edge of the given subgraph, and if addition of edge k is not forbidden by some restrictions given below.
A step back is done as soon as a given connected subgraph cannot be further elongated. In this case the edge added last is removed from the string, it is temporarily given the status "forbidden", and any other edges which were forbidden by backtracking from a previous longer string are simultaneously "allowed" again. In contrast, an edge which is forbidden by being removed from a string shorter than the present one remains forbidden, thus assuring that every connected subgraph is constructed once and only once.
To implement this algorithm, I represented graphs as lists of edges:
type Edge = (Int,Int)
type Graph = [Edge]
Firstly, I wrote a function addEdge that checks whether it is possible to extend the graph, returning Nothing if it isn't possible, or Just the edge to extend with.
I have a "parent" graph and an "extensible" graph, so I try to find one and only one edge that exists in the "parent" graph, is connected to the "extensible" graph, is not already included in the "extensible" graph, and is not included in the forbidden set.
I wrote this function below:
addEdge :: Graph -> Graph -> [Edge] -> Maybe Edge
addEdge !parent !extensible !forb =
    listToMaybe $ intersectBy (\(i,j) (k,l) -> i == k || i == l || j == k || j == l)
                              (parent \\ (extensible `union` forb)) extensible
It works! But, as I see from profiling the whole program, addEdge is the heaviest function. I am sure that my code isn't optimal. At the very least, the intersectBy call finds all possible solutions, but I need only one. Are there any ways to make this code faster? Maybe I shouldn't use standard lists, but Set from Data.Set? That's the first point of attention.
The main recursive function ext is presented below:
ext :: Graph -> [Graph] -> Maybe Graph -> [(Edge,Int)] -> Int -> [Graph]
ext !main !list !grow !forb !maxLength
    | isEnd = filter (\g -> length g /= 1) list ++ group main
    | addEdge main workGraph forbEdges == Nothing || length workGraph >= maxLength
        = ext main list (Just workGraph) forbProcess maxLength
    | otherwise = ext main ((addedEdge:workGraph):list) Nothing forb maxLength
    where
        workGraph = if grow == Nothing then head list else bite (fromJust grow) -- the [Edge] graph currently being processed
        workGraphLength = length workGraph
        addedEdge = fromJust addEdge'
        addEdge' = addEdge main workGraph forbEdges
        bite xz = if length xz == 1 then [fromJust (addEdge main xz forbEdges)] else tail xz
        forbProcess = (head workGraph, workGraphLength) : filter ((<= workGraphLength) . snd) forb
        forbEdges = map fst forb -- convert from [(Edge,Level)] to [Edge]
        isEnd = grow /= Nothing && length (fromJust grow) == 1
                && addEdge main (fromJust grow) forbEdges == Nothing
I test my program on this graph:
c60 = [(1,4),(1,3),(1,2),(2,6),(2,5),(3,10),(3,7),(4,24),(4,21),(5,8),(5,7),(6,28),(6,25),
(7,9),(8,11),(8,12),(9,16),(9,13),(10,20),(10,17),(11,14),(11,13),(12,28),(12,30),(13,15),
(14,43),(14,30),(15,44),(15,18),(16,18),(16,17),(17,19),(18,47),(19,48),(19,22),(20,22),(20,21),
(21,23),(22,31),(23,32),(23,26),(24,26),(24,25),(25,27),(26,35),(27,36),(27,29),(28,29),(29,39),
(30,40),(31,32),(31,33),(32,34),(33,50),(33,55),(34,37),(34,55),(35,36),(35,37),(36,38),(37,57),
(38,41),(38,57),(39,40),(39,41),(40,42),(41,59),(42,45),(42,59),(43,44),(43,45),(44,46),(45,51),
(46,49),(46,51),(47,48),(47,49),(48,50),(49,53),(50,53),(51,52),(52,60),(52,54),(53,54),(54,56),(55,56),(56,58),(57,58),(58,60),(59,60)] :: Graph
For example, find all subgraphs with length from 1 to 7
length $ ext c60 [[(1,2)]] Nothing [] 7
>102332
The problem is the low speed of the computation. As noted in the original article, the program was written in FORTRAN 77 and launched on a 150 MHz workstation, and it performs the test task at least 30 times faster than my code on a modern i5 processor.
I can't understand why my program is so slow. Are there any ways to refactor this code? Or is the best solution porting it to C and writing bindings to the C library over FFI?
I decided to take a shot at implementing the algorithm described in the paper using fgl. The complete code follows.
{-# LANGUAGE NoMonomorphismRestriction #-}
import Data.Graph.Inductive
import Data.List
import Data.Tree
uniq = map head . group . sort . map (\(a, b) -> (min a b, max a b))

delEdgeLU (from, to) = delEdge (from, to) . delEdge (to, from)

insEdgeDU (from, to) = insEdge (from, to, ()) . insNodeU to . insNodeU from where
    insNodeU n g = if gelem n g then g else insNode (n, ()) g

nextEdges subgraph remaining
    | isEmpty subgraph = uniq (edges remaining)
    | otherwise = uniq $ do
        n <- nodes subgraph
        n' <- suc remaining n
        return (n, n')

search_ subgraph remaining
    = Node subgraph
    . snd . mapAccumL step remaining
    $ nextEdges subgraph remaining
    where
        step r e = let r' = delEdgeLU e r in (r', search_ (insEdgeDU e subgraph) r')

search = search_ empty

mkUUGraph :: [(Int, Int)] -> Gr () ()
mkUUGraph es = mkUGraph ns (es ++ map swap es) where
    ns = nub (map fst es ++ map snd es)
    swap (a, b) = (b, a)
-- the one from the paper
sampleGraph = mkUUGraph cPaper
cPaper = [(1, 2), (1, 5), (1, 6), (2, 3), (3, 4), (4, 5)]
The functions you'll want to use at the top-level are mkUUGraph, which constructs a graph from a list of edges, and search, which constructs a tree whose nodes are connected subgraphs of its input. For example, to compute the statistics shown at the bottom of "Scheme 1" in the paper, you might do this:
*Main> map length . tail . levels . search . mkUUGraph $ [(1, 2), (1, 5), (1, 6), (2, 3), (3, 4), (4, 5)]
[6,7,8,9,6,1]
*Main> sum it
37
I had a little trouble comparing it to your implementation, because I don't understand what all the arguments to ext are supposed to do. In particular, I couldn't work out how to call ext on the adjacency graph in the paper in such a way that I got 37 results. Perhaps you have a bug.
In any case, I did my best to emulate what I think your code is trying to do: finding graphs with up to seven edges, and certainly containing the edge (1, 2) (despite the fact that your code outputs many graphs that do not contain (1, 2)). I added this code:
mainHim = print . length $ ext c60 [[(1,2)]] Nothing [] 7
mainMe = print . length . concat . take 7 . levels $ search_ (mkUUGraph [(1,2)]) (mkUUGraph c60)
My code finds 3301 such graphs; yours finds 35571. I didn't try very hard to figure out where that discrepancy came from. In ghci, mainHim takes 36.45s; mainMe takes 0.13s. When compiled with -O2, mainHim takes 4.65s; mainMe takes 0.05s. The numbers for mainMe can be cut in half again by using the PatriciaTree graph implementation rather than the default one, and probably cut still farther with profiling and some thought. Just in case the reason mainMe is so much faster is that it is finding so many fewer graphs, I tested a modified main as well:
main = print . length . concat . take 8 . levels $ (search (mkUUGraph c60) :: Tree (Gr () ()))
This prints 35853, so it is finding roughly the same number of graphs as your test command. It takes 0.72s in ghci and 0.38s when compiled with -O2.
Or the best solution is porting it on C, and write bindings to C library over FFI?
No, you don't have to write it in C. The code generated by GHC is not that much slower than C. This huge speed difference suggests that you're implementing a different algorithm. So instead of rewriting in a different language, you should rewrite the Haskell code.
I guess the problem with your code is that you ...
use lists instead of sets (see the sketch after this list)
use breadth-first instead of depth-first enumeration (not sure)
use operations on the whole set of edges instead of cleverly keeping track of which edges are in which set
encode the recursive structure of the algorithm by hand, instead of using recursive calls.
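For the first point, here is a hedged sketch of what a Set-based addEdge might look like (my code, preserving the original's semantics; not benchmarked):
import qualified Data.Set as Set
import Data.Maybe (listToMaybe)

type Edge = (Int, Int)

addEdgeS :: Set.Set Edge -> Set.Set Edge -> Set.Set Edge -> Maybe Edge
addEdgeS parent extensible forb =
    listToMaybe [ e | e <- Set.toList candidates
                    , any (adjacent e) (Set.toList extensible) ]
    where
        candidates = parent Set.\\ (extensible `Set.union` forb)
        adjacent (i,j) (k,l) = i == k || i == l || j == k || j == l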
I have to admit that I don't fully understand your code. But I read the paper you linked to, and the algorithm described there seems to be a simple brute-force enumeration of all results. So I guess the Haskell implementation should use the list monad (or list comprehensions) to enumerate all subgraphs, filtering out non-connected subgraphs during the enumeration. If you've never written code with the list monad before, just enumerating all subgraphs might be a good starting point.
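As a concrete starting point (my sketch, not code from the answer): filterM over the list monad enumerates every subset of the edge list, and a connectivity filter would then be layered on top.
import Control.Monad (filterM)

-- All 2^n sublists of a list; for a graph given as an edge list, each
-- sublist is a candidate subgraph, still to be filtered for connectedness.
allSubgraphs :: [a] -> [[a]]
allSubgraphs = filterM (const [False, True])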

List processing in Haskell

I am teaching myself Haskell and have run into a problem and need help.
Background:
type AInfo = (Char, Int)
type AList = [AInfo]   -- let's say [('a', 2), ('b', 5), ('a', 1), ('w', 21)]
type BInfo = Char
type BList = [BInfo]   -- let's say ['a', 'a', 'c', 'g', 'a', 'w', 'b']
One quick edit: The above information is for illustrative purposes only. The actual elements of the lists are a bit more complex. Also, the lists are not static; they are dynamic (hence the uses of the IO monad) and I need to keep/pass/"return"/have access to and change the lists during the running of the program.
I am looking to do the following:
For each element of AList, check against all elements of BList; wherever the character of the AList element (pair) equals a character in BList, add one to the Int value of the AList element (pair) and remove that character from BList.
So what this means is after the first element of AList is checked against all elements of BList the values of the lists should be:
AList [('a', 5), ('b', 5), ('a', 1), ('w', 21)]
BList ['c', 'g', 'w', 'b']
And in the end, the lists' values should be:
AList [('a', 5), ('b', 6), ('a', 1), ('w', 22)]
BList ['c', 'g']
Of course, all of this is happening in an IO monad.
Things I have tried:
Using mapM and a recursive helper function. I have looked at both:
every element of AList checked against every element of BList -- mapM (myHelpF1 alist) blist -- and
every element of BList checked against every element of AList -- mapM (myHelpF2 alist) blist.
Passing both lists to a function and using complicated if/then/else logic and helper function calls (feels like I am forcing Haskell to be iterative; messy, convoluted code; does not feel right).
I have thought about using filter, the character value of an AList element, and BList to create a third list of Bools, counting the number of True values and updating the Int value, then using filter on BList to remove the BList elements that ... (again, does not feel right; not very Haskell-like).
Things I think I know about the problem:
The solution may be exceedingly trivial. So much so that the more experienced Haskellers will be muttering "what a noob" under their breath as they type their responses.
Any pointers would be greatly appreciated. (mutter away….)
A few pointers:
Don't use [(Char, Int)] for "AList". The data structure you are looking for is a finite map: Map Char Int. Particularly look at member and insertWith. toList and fromList convert from the representation you currently have for AList, so even if you are stuck with that representation, you can convert to a Map for this algorithm and convert back at the end. (This will be more efficient than staying in a list because you are doing so many lookups, and the finite map API is easier to work with than lists)
I'd approach the problem as two phases: (1) partition out the elements of blist by whether they are in the map, (2) insertWith the elements which are already in the map. Then you can return the resulting map and the other partition.
I would also get rid of the meaningless assumptions such as that keys are Char -- you can just say they are any type k (for "key") that satisfies the necessary constraints (that you can put it in a Map, which requires that it is Orderable). You do this with lowercase type variables:
import qualified Data.Map as Map
sieveList :: (Ord k) => Map.Map k Int -> [k] -> (Map.Map k Int, [k])
Writing algorithms in greater generality helps catch bugs, because it makes sure that you don't use any assumptions you don't need.
Oh, also this program has no business being in the IO monad. This is pure code.
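A possible filling-in of that signature (my sketch of the two-phase approach described above, not luqui's code):
import qualified Data.Map as Map
import Data.List (partition)

-- Phase 1: split the incoming keys by membership in the map.
-- Phase 2: bump the count of every key that was already present.
sieveList :: (Ord k) => Map.Map k Int -> [k] -> (Map.Map k Int, [k])
sieveList m ks = (foldr bump m hits, misses)
    where
        (hits, misses) = partition (`Map.member` m) ks
        bump k = Map.insertWith (+) k 1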
import Data.List
type AInfo = (Char, Int)
type AList = [AInfo]
type BInfo = Char
type BList = [BInfo]
process :: AList -> BList -> AList
process [] _ = []
process (a:as) b =
    if is_in a b
        then (fst a, snd a + 1) : process as (delete (fst a) b)
        else a : process as b
    where
        is_in f [] = False
        is_in f (s:ss) = if fst f == s then True else is_in f ss
*Main> process [('a',5),('b',5),('a',1),('b',21)] ['c','b','g','w','b']
[('a',5),('b',6),('a',1),('b',22)]
*Main> process [('a',5),('b',5),('a',1),('w',21)] ['c','g','w','b']
[('a',5),('b',6),('a',1),('w',22)]
Probably an important disclaimer: I'm rusty at Haskell to the point of ineptness, but as a relaxing midnight exercise I wrote this thing. It should do what you want, although it doesn't return a BList. With a bit of modification, you can get it to return an (AList,BList) tuple, but methinks you'd be better off using an imperative language if that kind of manipulation is required.
Alternately, there's an elegant solution and I'm too ignorant of Haskell to know it.
While I am by no means a Haskell expert, I have a partial attempt that returns the result of the operation once. Maybe you can find out how to map it over the rest to get your solution. The addWhile is clever: since you only want to update the first occurrence of an element in lista, any later occurrence just gets 0 added to it. Code critiques are more than welcome.
import Data.List
type AInfo = (Char, Int)
type AList = [AInfo]
type BInfo = Char
type BList = [BInfo]
lista = ([('a', 2), ('b',5), ('a', 1), ('w', 21)] :: AList)
listb = ['a','a','c','g','a','w','b']
-- step one, get the head, and its occurrences
items list = (eleA, eleB)
    where
        eleA = length $ filter (\x -> x == head list) list
        eleB = head list

getRidOfIt list ele = dropWhile (\x -> x == ele) list -- drop like it's hot
--add to lista
addWhile :: [(Char, Int)] -> Char -> Int -> [(Char, Int)]
addWhile [] _ _ = []
addWhile ((x,y):xs) letter times =
    if x == letter
        then (x, y + times) : addWhile xs letter 0     -- first match gets the count
        else (x, y) : addWhile xs letter times
--first answer
firstAnswer = addWhile lista (snd $ items listb) (fst $ items listb)
--[('a',5),('b',5),('a',1),('w',21)]
The operation you describe is pure, as @luqui points out, so we just define it as a pure Haskell function. It can be used inside a monad (including IO) by means of fmap (or do).
import Data.List
combine alist blist = (reverse a, b4) where
First we sort and count the B list:
b = map (\g->(head g,length g)) . group . sort $ blist
We need the import for group and sort to be available. Next, we roll along the alist and do our thing:
(a, b2) = foldl g ([], b) alist
g (acc, b) e@(x, c) = case pick x b of
    Nothing      -> (e : acc, b)
    Just (n, b2) -> ((x, c + n) : acc, b2)
b3 = map fst b2
b4 = [ c | c <- blist, elem c b3 ]
Now pick, as used, must be
pick x [] = Nothing
pick x ((y,n):t)
    | x == y    = Just (n, t)
    | otherwise = case pick x t of
        Nothing     -> Nothing
        Just (k, r) -> Just (k, (y,n):r)
Of course pick performs a linear search, so if performance (speed) becomes a problem, b should be changed to allow for binary search (tree etc, like Map). The calculation of b4 which is filter (`elem` b3) blist is another potential performance problem with its repeated checks for presence in b3. Again, checking for presence in trees is faster than in lists, in general.
Test run:
> combine [('a', 2), ('b',5), ('a', 1), ('w', 21)] "aacgawb"
([('a',5),('b',6),('a',1),('w',22)],"cg")
edit: you probably want it the other way around: rolling along the blist while updating the alist and producing (or not) the elements of blist in the result (b4 in my code). That way the algorithm will operate in a more local manner on long input streams (assuming your blist is long, though you didn't say so). As written above, it will have a space problem, consuming the input stream blist several times over. I'll keep it as is for illustration, as food for thought.
So if you decide to go the 2nd route, first convert your alist into a Map (beware the duplicates!). Then, scanning (with scanl) over blist, make use of updateLookupWithKey to update the counts map and at the same time decide for each member of blist, one by one, whether to output it or not. The type of the accumulator will thus have to be (Map a Int, Maybe a), with a your element type (blist :: [a]):
scanl :: (acc -> a -> acc) -> acc -> [a] -> [acc]

scanning = tail $ scanl g (Nothing, fromList $ reverse alist) blist

g (_, cmap) a = case updateLookupWithKey (\_ c -> Just (c+1)) a cmap of
    (Just _, m2) -> (Nothing, m2)  -- seen before
    _            -> (Just a, cmap) -- not present in counts

new_b_list  = [ a | (Just a, _) <- scanning ]
last_counts = snd $ last scanning
You will have to combine the toList last_counts with the original alist if you have to preserve the old duplicates there (why would you?).

Efficient table for Dynamic Programming in Haskell

I've coded up the 0-1 Knapsack problem in Haskell. I'm fairly proud about the laziness and level of generality achieved so far.
I start by providing functions for creating and dealing with a lazy 2d matrix.
mkList f = map f [0..]
mkTable f = mkList (\i -> mkList (\j -> f i j))
tableIndex table i j = table !! i !! j
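As a quick sanity check (my example), entry (i, j) of mkTable f holds f i j:
ghci> tableIndex (mkTable (+)) 2 3
5
ghci> take 4 (mkList (*2))
[0,2,4,6]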
I then make a specific table for a given knapsack problem
knapsackTable = mkTable f
    where f 0 _ = 0
          f _ 0 = 0
          f i j | ws !! i > j = leaveI
                | otherwise   = max takeI leaveI
              where takeI  = tableIndex knapsackTable (i-1) (j - (ws !! i)) + vs !! i
                    leaveI = tableIndex knapsackTable (i-1) j

-- weight-value pairs; item i has weight ws!!i and value vs!!i
ws = [0,1,2, 5, 6, 7] -- weights
vs = [0,1,7,11,21,31] -- values
And finish off with a couple of helper functions for looking at the table:
viewTable table maxI maxJ = take (maxI+1) . map (take (maxJ+1)) $ table
printTable table maxI maxJ = mapM_ print $ viewTable table maxI maxJ
This much was pretty easy. But I want to take it a step further.
I want a better data structure for the table. Ideally, it should be
Unboxed (immutable) [edit] never mind this
Lazy
Unbounded
O(1) time to construct
O(1) time complexity for looking up a given entry,
(more realistically, at worst O(log n), where n is i*j for looking up the entry at row i, column j)
Bonus points if you can explain why/how your solution satisfies these ideals.
Also bonus points if you can further generalize knapsackTable, and prove that it is efficient.
In improving the data structure you should try to satisfy the following goals:
If I ask for the solution where the maximum weight is 10 (in my current code, that would be tableIndex knapsackTable 5 10; the 5 means include items 1-5), only the minimal amount of work necessary should be performed. Ideally this means no O(i*j) work for forcing the spine of each row of the table to the necessary column length. You could say this isn't "true" DP, if you believe DP means evaluating the entirety of the table.
If I ask for the entire table to be printed (something like printTable knapsackTable 5 10), the values of each entry should be computed once and only once. The values of a given cell should depend on the values of other cells (DP style: the idea being, never recompute the same subproblem twice)
Ideas:
Data.Array bounded :(
UArray strict :(
Memoization techniques (SO question about DP in Haskell) this might work
Answers that make some compromises to my stated ideals will be upvoted (by me, anyways) as long as they are informative. The answer with the least compromises will probably be the "accepted" one.
First, your criterion for an unboxed data structure is probably a bit misguided. Unboxed values must be strict, and they have nothing to do with immutability. The solution I'm going to propose is immutable, lazy, and boxed. Also, I'm not sure in what way you are wanting construction and querying to be O(1). The structure I'm proposing is lazily constructed, but because it's potentially unbounded, its full construction would take infinite time. Querying the structure will take O(k) time for any particular key of size k, but of course the value you're looking up may take further time to compute.
The data structure is a lazy trie. I'm using Conal Elliott's MemoTrie library in my code. For genericity, it takes functions instead of lists for the weights and values.
knapsack :: (Enum a, Eq a, Num a, Num w, Num v, Ord w, Ord v, HasTrie a, HasTrie w) =>
            (a -> w) -> (a -> v) -> a -> w -> v
knapsack weight value = knapsackMem
    where knapsackMem = memo2 knapsack'
          knapsack' 0 w = 0
          knapsack' i 0 = 0
          knapsack' i w
              | weight i > w = knapsackMem (pred i) w
              | otherwise = max (knapsackMem (pred i) w)
                                (knapsackMem (pred i) (w - weight i) + value i)
Basically, it's implemented as a trie with a lazy spine and lazy values. It's bounded only by the key type. Because the entire thing is lazy, its construction before forcing it with queries is O(1). Each query forces a single path down the trie and its value, so it's O(key size): O(1) for a bounded key size, and O(log n) in general. As I already said, it's immutable, but not unboxed.
It will share all work in the recursive calls. It doesn't actually allow you to print the trie directly, but something like this should not do any redundant work:
mapM_ (print . uncurry (knapsack (ws !!) (vs !!))) $ range ((0,0), (i,w))
Unboxed implies strict and bounded. Anything 100% Unboxed cannot be Lazy or Unbounded. The usual compromise is embodied in converting [Word8] to Data.ByteString.Lazy where there are unboxed chunks (strict ByteString) which are linked lazily together in an unbounded way.
A much more efficient table generator (enhanced to track individual items) could be made using scanl, zipWith, and my takeOnto. This effectively avoids using (!!) while creating the table:
import Data.List(sort,genericTake)

type Table = [ [ Entry ] ]

data Entry = Entry { bestValue :: !Integer, pieces :: [[WV]] }
    deriving (Read,Show)

data WV = WV { weight, value :: !Integer }
    deriving (Read,Show,Eq,Ord)

instance Eq Entry where
    (==) a b = (==) (bestValue a) (bestValue b)

instance Ord Entry where
    compare a b = compare (bestValue a) (bestValue b)

solutions :: Entry -> Int
solutions = length . filter (not . null) . pieces

addItem :: Entry -> WV -> Entry
addItem e wv = Entry { bestValue = bestValue e + value wv, pieces = map (wv:) (pieces e) }

-- Utility function for improve
takeOnto :: ([a] -> [a]) -> Integer -> [a] -> [a]
takeOnto endF = go
    where go n rest | n <= 0    = endF rest
                    | otherwise = case rest of
                        (x:xs) -> x : go (pred n) xs
                        []     -> error "takeOnto: unexpected []"

improve oldList wv@(WV {weight = wi, value = vi}) = newList
    where
        newList | vi <= 0   = oldList
                | otherwise = takeOnto (zipWith maxAB oldList) wi oldList
        -- Dual traversal of index (w-wi) and index w makes this a zipWith
        maxAB e2 e1 =
            let e2v = addItem e2 wv
            in case compare e1 e2v of
                LT -> e2v
                EQ -> Entry { bestValue = bestValue e1
                            , pieces = pieces e1 ++ pieces e2v }
                GT -> e1

-- Note that the returned table is finite.
-- The dependence on only the previous row makes this a "scanl" operation.
makeTable :: [Int] -> [Int] -> Table
makeTable ws vs =
    let wvs  = zipWith WV (map toInteger ws) (map toInteger vs)
        nil  = repeat (Entry { bestValue = 0, pieces = [[]] })
        totW = sum (map weight wvs)
    in map (genericTake (succ totW)) $ scanl improve nil wvs

-- Create the specific table; note that values (1+7) equal value 8
ws, vs :: [Int]
ws = [2,3, 5, 5, 6, 7] -- weights
vs = [1,7,8,11,21,31] -- values

t = makeTable ws vs

-- Investigate table
seeTable = mapM_ seeBestValue t
    where seeBestValue row = mapM_ (\v -> putStr (' ' : show (bestValue v))) row >> putChar '\n'

ways = mapM_ seeWays t
    where seeWays row = mapM_ (\v -> putStr (' ' : show (solutions v))) row >> putChar '\n'

-- This has two ways of achieving a bestValue of 8 for 3 items up to total weight 5
interesting = print (t !! 3 !! 5)
Lazy storable vectors: http://hackage.haskell.org/package/storablevector
Unbounded, lazy, O(chunksize) time to construct, O(n/chunksize) indexing, where chunksize can be sufficiently large for any given purpose. Basically a lazy list with some significant constant-factor benefits.
To memoize functions, I recommend a library like Luke Palmer's memo combinators. The library uses tries, which are unbounded and have O(key size) lookup. (In general, you can't do better than O(key size) lookup because you always have to touch every bit of the key.)
knapsack :: (Int, Int) -> Solution
knapsack = memo f
    where
        memo = pair integral integral
        f (i, j) = ... knapsack (i-b, j) ...
Internally, the integral combinator probably builds an infinite data structure
data IntTrie a = Branch (IntTrie a) a (IntTrie a)

integral f = \n -> lookup n table
    where table = build f
          build g = Branch (build (\n -> g (2*n))) (g 0) (build (\n -> g (2*n+1)))
Lookup works like this:
lookup 0 (Branch l a r) = a
lookup n (Branch l a r) = if even n then lookup n2 l else lookup n2 r
    where n2 = n `div` 2
There are other ways to build infinite tries, but this one is popular.
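To make the idea concrete, here is a tiny self-contained memoized Fibonacci built on such a trie (my sketch, following the corrected definitions above; lookup is renamed to avoid the Prelude clash):
data IntTrie a = Branch (IntTrie a) a (IntTrie a)

-- Build a trie holding g at every non-negative Int index.
trie :: (Int -> a) -> IntTrie a
trie g = Branch (trie (\n -> g (2*n))) (g 0) (trie (\n -> g (2*n+1)))

look :: Int -> IntTrie a -> a
look 0 (Branch _ a _) = a
look n (Branch l _ r) = look (n `div` 2) (if even n then l else r)

-- Each fib value is computed once and cached in the shared trie.
memoFib :: Int -> Integer
memoFib = \n -> look n table
    where
        table = trie fib
        fib 0 = 0
        fib 1 = 1
        fib n = memoFib (n-1) + memoFib (n-2)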
Why not use a Data.Map with another Data.Map inside it? As far as I know it's quite fast.
It wouldn't be lazy, though.
More than that, you can implement the Ord typeclass for your data
data Index = Index Int Int
    deriving (Eq, Ord)
and put a two-dimensional index directly as a key. You can achieve laziness by generating this map as a list and then just using
fromList [(Index 0 0, value11), (Index 0 1, value12), ...]
