Haskell :: Recursion in Recursion for Loop in Loop - haskell

How it goes: Based on the set of tuple (id, x, y), find the min max for x and y , then two dots (red points) created. Each element in tuple are grouped to two groups based on the distance towards the red dots.
Each group cant exceed 5 dots. If exceed, new group should be computed. I've managed to do recursion for the first phase. But I have no idea how to do it for second phase. The second phase should look like this:
Based on these two groups, again it need to find the min max for x and y (for each group), then four dots (red points) created. Each element in tuple are grouped to two groups based on the distance towards the red dots.
getDistance :: (Int, Double, Double) -> (Int, Double, Double) -> Double
getDistance (_,x1,y1) (_,x2,y2) = sqrt $ (x1-x2)^2 + (y1-y2)^2
getTheClusterID :: (Int, Double, Double) -> Int
getTheClusterID (id, _, _) = id
idxy = [(id, x, y)]
createCluster id cs = [(id, minX, minY),(id+1, maxX, minY), (id+2, minX, maxY), (id+3, maxX, maxY)]
where minX = minimum $ map (\(_,x,_,_) -> x) cs
maxX = maximum $ map (\(_,x,_,_) -> x) cs
minY = minimum $ map (\(_,_,y,_) -> y) cs
maxY = maximum $ map (\(_,_,y,_) -> y) cs
idCluster = [1]
cluster = createCluster (last idCluster) idxy
clusterThis (id,a,b) = case (a,b) of
j | getDistance (a,b) (cluster!!0) < getDistance (a,b) (cluster!!1) &&
-> (getTheClusterID (cluster!!0), a, b)
j | getDistance (a,b) (cluster!!1) < getDistance (a,b) (cluster!!0) &&
-> (getTheClusterID (cluster!!1), a, b)
_ -> (getTheClusterID (cluster!!0), a, b)
groupAll = map clusterThis idxy
I am moving from imperative to functional. Sorry if my way of thinking is still in imperative way. Still learning.
EDIT:
To clarify, this is the original data looks like.

The basic principle to follow in writing such an algorithm is to write small, compositional programs; each program is then easy to reason about and test in isolation, and the final program can be written in terms of the smaller ones.
The algorithm can be summarized as follows:
Compute the points which bound the set of points.
Split the rest of the points into two clusters, one containing points closer to the minimum point, the other containing all other points (equivalently, points closer to the maximum point).
If any cluster contains more than 5 points, repeat the process on that cluster.
The presence of a 'repeat the process' step indicates this to be a divide and conquer problem.
I see no need for an ID for each point, so I've dispensed with this.
To begin, define datatypes for each type of data you will be working with:
import Data.List (partition)
data Point = Point { ptX :: Double, ptY :: Double }
data Cluster = Cluster { clusterPts :: [Point] }
This may seem silly for such simple data, but it can potentially save you quite a bit of confusion during debugging. Also note the import of a function we will be using later.
The 1st step:
minMaxPoints :: [Point] -> (Point, Point)
minMaxPoints ps =
(Point minX minY
,Point maxX maxY)
where minX = minimum $ map ptX ps
maxX = maximum $ map ptX ps
minY = minimum $ map ptY ps
maxY = maximum $ map ptY ps
This is essentially the same as your createCluster function.
The 2nd step:
pointDistance :: Point -> Point -> Double
pointDistance (Point x1 y1) (Point x2 y2) = sqrt $ (x1-x2)^2 + (y1-y2)^2
cluster1 :: [Point] -> [Cluster]
cluster1 ps =
let (mn, mx) = minMaxPoints ps
(psmn, psmx) = partition (\p -> pointDistance mn p < pointDistance mx p) ps
in [ Cluster psmn, Cluster psmx ]
This function should clear - it is a direct translation of the above statement of this step into code. The partition function takes a predicate and a list and produces two lists, the first containing all elements for which the predicate is true, and the second all elements for which it is false. pointDistance is essentially the same as your getDistance function.
The 3rd step:
cluster :: [Point] -> [Cluster]
cluster ps =
cluster1 ps >>= \cl#(Cluster c) ->
if length c > 5
then cluster c
else [cl]
This also implements the statement above very directly. Perhaps the only confusing part is the use of >>=, which (here) has type [a] -> (a -> [b]) -> [b]; it simply applies the given function to each element of the given list, and concatenates the result (equivalently, it is written flip concatMap).
Finally your test case (which I hope I've translated correctly from pictures to Haskell data):
testPts :: [Point]
testPts = map (uncurry Point)
[ (0,0), (1,0), (2,1), (0,2)
, (5,2), (5,4), (4,3), (4,4)
, (8,2), (9,3), (10,2)
, (11,4), (12,3), (13,3), (13,5) ]
main = mapM_ (print . map (\p -> (ptX p, ptY p)) . clusterPts) $ cluster testPts
Running this program produces
[(0.0,0.0),(0.0,2.0),(2.0,1.0),(1.0,0.0)]
[(4.0,4.0),(5.0,2.0),(5.0,4.0),(4.0,3.0)]
[(10.0,2.0),(9.0,3.0),(8.0,2.0)]
[(13.0,3.0),(12.0,3.0),(11.0,4.0),(13.0,5.0)]

Functional programmers love recursion, yet they go to great lengths to avoid writing it. Jeez, people, make up your minds!
I like to structure my code, to the extent possible, using common, well-understood combinators. I want to demonstrate a style of Haskell programming which leans heavily on standard tools to implement the boring parts of a program (mapping, zipping, looping) as tersely and generically as possible, freeing you up to focus on the problem at hand.
So don't worry if you don't understand everything here. I just want to show you what's possible! (And please ask if you have questions!)
Vectors
First things first: we're working with two-dimensional space, so we'll need two-dimensional vectors and some secondary school vector algebra to work with them.
I'm going to parameterise my vector by the scalar on which our vector space is built. This'll allow me to work with standard type classes like Functor, so I can delegate a lot of the work of building a vector algebra to the machine. I've turned on DeriveFunctor and DeriveFoldable, which allow me to utter the magic words deriving (Functor, Foldable).
data Pair a = Pair {
px :: a,
py :: a
} deriving (Show, Functor, Foldable)
Hereafter I'm going to avoid working explicitly with Pair, and program to an interface, not an implementation. This'll allow me to build a simple linear algebra library in a manner that's independent of the dimensionality of the vector space. I'll give example type signatures in terms of V2:
type V2 = Pair Double
Scalar multiplication: functors
A vector space is required to have two operations: scalar multiplication and vector addition. Scalar multiplication means multiplying each component of a vector by a constant scalar. If you view a vector as a container of components, it should be clear that this means "do the same thing to every element in a container" - that is, it's a mapping operation. That's what Functor is for.
-- mul :: Double -> V2 -> V2
mul :: (Functor f, Num n) => n -> f n -> f n
mul k f = fmap (k *) f
Vector addition: zippy applicatives
Vector addition involves adding up the components of a vector point-wise. Thinking of a vector as a container of components, addition is a zipping operation - match up each element of the two vectors and add them up.
Applicative functors are functors with an additional "apply" operation. Thinking of a functor f as a container, Applicative's <*> :: f (a -> b) -> f a -> f b gives you a way to take a container of functions and apply it to a container of values to get a new container of values. It should be clear that one way to make Pair into an Applicative is to use zipping to apply functions to values.
instance Applicative Pair where
pure x = Pair x x
Pair f g <*> Pair x y = Pair (f x) (g y)
(For another example of a zippy applicative, see this answer of mine.)
Now that we have a way to zip two pairs, we can leverage a bit of standard Applicative machinery to implement vector addition.
-- add :: V2 -> V2 -> V2
add :: (Applicative f, Num n) => f n -> f n -> f n
add = liftA2 (+)
Vector subtraction, which gives you a way to find the distance between two points, is defined in terms of multiplication and addition.
-- minus :: V2 -> V2 -> V2
minus :: (Applicative f, Num n) => f n -> f n -> f n
v `minus` u = v `add` mul (-1) u
Dot products: foldable containers
2D Euclidean space is actually a Hilbert space - a vector space equipped with a way to measure lengths and angles in the form of a dot product. To take the dot product of two vectors, you multiply the components together and then add up the results. Once more, we'll be using Applicative to multiply the components, but that just gives us another vector: how do we implement "adding up the results"?
Foldable is the class of containers which admit an "aggregation" operation foldr :: (a -> b -> b) -> b -> f a -> b. The standard prelude's sum is defined in terms of foldr, so:
-- dot :: V2 -> V2 -> Double
dot :: (Applicative f, Foldable f, Num n) => f n -> f n -> n
v `dot` u = sum $ liftA2 (*) v u
This gives us a way to find the absolute length of a vector: dot it with itself and take the square root.
-- modulus :: V2 -> Double
modulus :: (Applicative f, Foldable f, Floating n) => f n -> n
modulus v = sqrt $ v `dot` v
So the distance between two points is the modulus of the difference of the vectors.
dist :: (Applicative f, Foldable f, Floating n) => f n -> f n -> n
dist v u = modulus (v `minus` u)
N-ary zipping: traversable containers
An axis-aligned (hyper-)rectangle can be defined by just two points. We'll represent the bounding box of a set of points as a Pair of vectors pointing to opposite corners of the bounding box.
Given a collection of vectors of components, we can find the opposite corners of the bounding box by finding the maximum and minimum of each component across the collection. This requires us to zip up, or transpose, a collection of vectors of components into a vector of collections of components. For this I'll use Traversable's sequenceA.
-- boundingBox :: [V2] -> Pair V2
boundingBox :: (Traversable t, Applicative f, Ord n) => t (f n) -> Pair (f n)
boundingBox vs =
let components = sequenceA vs
in Pair (minimum <$> components) (maximum <$> components)
Clustering
Now that we have a library for working with vectors, we can get down to the meaty part of the algorithm: dividing sets of points into clusters.
Partitioning
Let me rephrase the specification of the inner loop of your algorithm. You want to partition a set of points based on whether they're closer to the bottom-left corner of the set's bounding box or to the top-right corner. That's what partition does.
We can write a function, whichCluster which uses minus and modulus to decide this for a single point, and then use partition to apply it to the whole set.
type Cluster = []
-- cluster :: Cluster V2 -> [Cluster V2]
cluster :: (Applicative f, Foldable f, Ord n, Floating n) => Cluster (f n) -> [Cluster (f n)]
cluster vs =
let Pair bottomLeft topRight = boundingBox vs
whichCluster v = dist v bottomLeft <= dist v topRight
(g1, g2) = partition whichCluster vs
in [g1, g2]
Repetition, repetition, repetition
Now we want to repeatedly cluster until we don't have any groups larger than 5. Here's the plan. We'll keep track of two sets of clusters, those which are small enough, and those which require further sub-clustering. I'll use partition to sort a list of clusters into those which are small enough and those which need subclustering. I'll use the list monad's >>= :: [a] -> (a -> [b]) -> [b] (here [Cluster V2] -> ([V2] -> [Cluster V2]) -> [Cluster V2]), which maps a function over a list and flattens the result, to implement the notion of subclustering. And I'll use until to repeatedly subcluster until the set of remaining too-large clusters is empty.
-- smallClusters :: Int -> Cluster V2 -> [Cluster V2]
smallClusters :: (Applicative f, Foldable f, Ord n, Floating n) => Int -> Cluster (f n) -> [Cluster (f n)]
smallClusters maxSize vs = fst $ until (null . snd) splitLarge ([], [vs])
where
smallEnough xs = length xs <= maxSize
splitLarge (small, remaining) =
let (newSmall, large) = partition smallEnough remaining
in (small ++ newSmall, large >>= cluster)
A quick test, cribbed from #user2407038's answer:
testPts :: [V2]
testPts = map (uncurry Pair)
[ (0,0), (1,0), (2,1), (0,2)
, (5,2), (5,4), (4,3), (4,4)
, (8,2), (9,3), (10,2)
, (11,4), (12,3), (13,3), (13,5) ]
ghci> smallClusters 5 testPts
[
[Pair {px = 0.0, py = 0.0},Pair {px = 1.0, py = 0.0},Pair {px = 2.0, py = 1.0},Pair {px = 0.0, py = 2.0}],
[Pair {px = 5.0, py = 2.0},Pair {px = 5.0, py = 4.0},Pair {px = 4.0, py = 3.0},Pair {px = 4.0, py = 4.0}],
[Pair {px = 8.0, py = 2.0},Pair {px = 9.0, py = 3.0},Pair {px = 10.0, py = 2.0}]
[Pair {px = 11.0, py = 4.0},Pair {px = 12.0, py = 3.0},Pair {px = 13.0, py = 3.0},Pair {px = 13.0, py = 5.0}]
]
There you go. Small clusters in n-dimensional space, all without a single recursive function.
Labelling
Part of the point of working with the Applicative and Foldable interfaces, rather than working with V2 directly, was so I could demonstrate the following little magic trick.
Your original code represented points as 3-tuples consisting of two Doubles for the location and an Int for the point's label, but my V2 has no label. Can we recover this? Well, since the code doesn't at any point mention any concrete types - just standard type classes - we can just build a new type for labelled vectors. As long as said type is a Foldable Applicative all of the above code will continue to work without modification!
data Labelled m f a = Labelled m (f a) deriving (Show, Functor, Foldable)
instance (Monoid m, Applicative f) => Applicative (Labelled m f) where
pure = Labelled mempty . pure
Labelled m ff <*> Labelled n fx = Labelled (m <> n) (ff <*> fx)
The Monoid constraint is there because when combining actions you also need a way to combine their labels. I'm just going to use First - left-biased choice - because I'm not expecting the points' labels to be relevant to the zipping operations like modulus and boundingBox.
type LabelledV2 = Labelled (First Int) Pair Double
testPts :: [LabelledV2]
testPts = zipWith (Labelled . First . Just) [0..] $ map (uncurry Pair)
[ (0,0), (1,0), (2,1), (0,2)
, (5,2), (5,4), (4,3), (4,4)
, (8,2), (9,3), (10,2)
, (11,4), (12,3), (13,3), (13,5) ]
ghci> traverse (traverse (getFirst . lbl)) $ smallClusters 5 testPts
Just [[0,1,2,3],[4,5,6,7],[8,9,10],[11,12,13,14]] -- try reordering testPts

Related

Simplify haskell function

I'm really struggling with Haskell atm.
It took me almost 6 hours to write a function that does what I want. Unfortunately I'm not satisfied with the look of it.
Could someone please give me any hints how to rewrite it?
get_connected_area :: Eq generic_type => [[generic_type]] -> (Int, Int) -> [(Int,Int)] -> generic_type -> [(Int,Int)]
get_connected_area habitat point area nullValue
| elem point area = area
| not ((fst point) >= 0) = area
| not ((snd point) >= 0) = area
| not ((fst point) < (length habitat)) = area
| not ((snd point) < (length (habitat!!0))) = area
| (((habitat!!(fst point))!!(snd point))) == nullValue = area
| otherwise =
let new_area = point : area
in
get_connected_area habitat (fst point+1, snd point) (
get_connected_area habitat (fst point-1, snd point) (
get_connected_area habitat (fst point, snd point+1) (
get_connected_area habitat (fst point, snd point-1) new_area nullValue
) nullValue
) nullValue
) nullValue
The function get's a [[generic_type]] (representing a landscape-map) and searches the fully connected area around a point that isn't equal to the given nullValue.
Eg.:
If the function gets called like this:
get_connected_area [[0,1,0],[1,1,1],[0,1,0],[1,0,0]] (1,1) [] 0
That literally means
0 1 0
1 1 1
0 1 0
1 0 0
Represents a map (like google maps). Start from the point (coordinates) (1,1) I want to get all coordinates of the elements that form a connected area with the given point.
The result therefore should be:
0 1 0
1 1 1
0 1 0
1 0 0
And the corresponting return value (list of coordinates of bold 1s):
[(2,1),(0,1),(1,2),(1,0),(1,1)]
One small change is that you can use pattern matching for the variable point. This means you can use (x, y) instead of point in the function declaration:
get_connected_area habitat (x, y) area nullValue = ...
Now everywhere you have fst point, just put x, and everywhere you have snd point, put y.
Another modification is to use more variables for subexpressions. This can help with the nested recursive calls. For example, make a variable for the inner-most nested call:
....
where foo = get_connected_area habitat (x, y-1) new_area nullValue
Now just put foo instead of the call. This technique can now be repeated for the "new" inner-most call. (Note that you should pick a more descriptive name than foo. Maybe down?)
Note that not (x >= y) is the same as x < y. Use this to simplify all of the conditions. Since these conditions test if a point is inside a bounding rectangle, most of this logic can be factored to a function isIn :: (Int, Int) -> (Int, Int) -> (Int, Int) -> Bool which will make get_connected_area more readable.
This would be my first quick pass through the function, and sort of the minimum that might pass a code review (just in terms of style):
getConnectedArea :: Eq a => [[a]] -> a -> (Int, Int) -> [(Int,Int)] -> [(Int,Int)]
getConnectedArea habitat nullValue = go where
go point#(x,y) area
| elem point area = area
| x < 0 = area
| y < 0 = area
| x >= length habitat = area
| y >= length (habitat!!0) = area
| ((habitat!!x)!!y) == nullValue = area
| otherwise =
foldr go (point : area)
[ (x+1, y), (x-1, y), (x, y+1), (x, y-1) ]
We bind habitat and nullValue once at the top level (clarifying what the recursive work is doing), remove indirection in the predicates, use camel-case (underdashes obscure where function application is happening), replace generic_type with a (using a noisy variable here actually has the opposite effect from the one you intended; I end up trying to figure out what special semantics you're trying to call out when the interesting thing is that the type doesn't matter (so long as it can be compared for equality)).
At this point there are lots of things we can do:
pretend we're writing real code and worry about asymptotics of treating lists as arrays (!!, and length) and sets (elem), and use proper array and set data structures instead
move your bounds checking (and possible null value checking) into a new lookup function (the goal being to have only a single ... = area clause if possible
consider improvements to the algorithm: can we avoid recursively checking the cell we just came from algorithmically? can we avoid passing area entirely (making our search nicely lazy/"productive")?
Here is my take:
import qualified Data.Set as Set
type Point = (Int, Int)
getConnectedArea :: (Point -> Bool) -> Point -> Set.Set Point
getConnectedArea habitat = \p -> worker p Set.empty
-- \p is to the right of = to keep it out of the scope of the where clause
where
worker p seen
| p `Set.member` seen = seen
| habitat p = foldr worker (Set.insert p seen) (neighbors p)
| otherwise = seen
neighbors (x,y) = [(x-1,y), (x+1,y), (x,y-1), (x,y+1)]
What I've done
foldr over the neighbors, as some commenters suggested.
Since the order of points doesn't matter, I use a Set instead of a list, so it's semantically a better fit and faster to boot.
Named some helpful intermediate abstractions such as Point and neighbors.
A better data structure for the habitat would also be good, since lists are linear time to access, maybe a 2D Data.Array—but as far as this function cares, all you need is an indexing function Point -> Bool (out of bounds and null value are treated the same), so I've replaced the data structure parameter with the indexing function itself (this is a common transformation in FP).
We can see that it would also be possible to abstract away the neighbors function and then we would arrive at a very general graph traversal method
traverseGraph :: (Ord a) => (a -> [a]) -> a -> Set.Set a
in terms of which you could write getConnectedArea. I recommend doing this for educational purposes—left as an exercise.
EDIT
Here's an example of how to call the function in terms of (almost) your old function:
import Control.Monad ((<=<))
-- A couple helpers for indexing lists.
index :: Int -> [a] -> Maybe a
index _ [] = Nothing
index 0 (x:_) = x
index n (_:xs) = index (n-1) xs
index2 :: (Int,Int) -> [[a]] -> Maybe a
index2 (x,y) = index x <=< index y
-- index2 uses Maybe's monadic structure, and I think it's quite pretty.
-- But if you're not ready for that, you might prefer
index2' (x,y) xss
| Just xs <- index y xss = index x xs
| otherwise = Nothing
getConnectedArea' :: (Eq a) => [[a]] -> Point -> a -> [a]
getConnectedArea' habitat point nullValue = Set.toList $ getConnectedArea nonnull point
where
nonnull :: Point -> Bool
nonnull p = case index2 p habitat of
Nothing -> False
Just x -> x /= nullValue
OK i will try to simplify your code. However there are already good answers and that's why i will tackle this with a slightly more conceptual approach.
I believe you could chose better data types. For instance Data.Matrix seems to provide an ideal data type in the place of your [[generic_type]] type. Also for coordinates i wouldn't chose a tuple type since tuple type is there to pack different types. It's functor and monad instances are not very helpful when it is chosen as a coordinate system. Yet since it seems Data.Matrix is just happy with tuples as coordinates i will keep them.
OK your rephrased code is as follows;
import Data.Matrix
gca :: Matrix Int -> (Int, Int) -> Int -> [(Int,Int)]
gca fld crd nil = let nbs = [id, subtract 1, (+1)] >>= \f -> [id, subtract 1, (+1)]
>>= \g -> return (f,g)
>>= \(f,g) -> return ((f . fst) crd, (g . snd) crd)
in filter (\(x,y) -> fld ! (x,y) /= nil) nbs
*Main> gca (fromLists [[0,1,0],[1,1,1],[0,1,0],[1,0,0]]) (2,2) 0
[(2,2),(2,1),(2,3),(1,2),(3,2)]
The first thing to note is, the Matrix data type is index 1 based. So we have our center point at (2,2).
The second is... we have a three element list of functions defined as [id, subtract 1, (+1)]. The contained functions are all Num a => a -> a type and i need them to define the surrounding pixels of the given coordinate including the given coordinate. So we have a line just like if we did;
[1,2,3] >>= \x -> [1,2,3] >>= \y -> return [x,y] would result [[1,1],[1,2],[1,3],[2,1],[2,2],[2,3],[3,1],[3,2],[3,3]] which, in our case, would yield a 2 combinations of all functions in the place of the numbers 1,2 and 3.
Which then we apply to our given coordinate one by one with a cascading instruction
>>= \[f,g] -> return ((f . fst) crd, (g . snd) crd)
which yields all neighboring coordinates.
Then its nothing more than filtering the neighboring filters by checking if they are not equal to the nil value within out matrix.

Arbitrary instance for generating unbiased graphs for quickcheck

module Main where
import Test.QuickCheck
import Data.Set as Set
data Edge v = Edge {source :: v, target :: v}
deriving (Show,Eq,Ord)
data Graph v = Graph {nodes :: Set v, edges :: Set (Edge v)}
deriving Show
instance Arbitrary v => Int-> Arbitrary (Edge v) where
arbitrary = sized aux
where aux n = do s <- arbitrary
t <- arbitrary `suchThat` (/= s)
return $ Edge {source = s, target = t}
instance (Ord v, Arbitrary v) => Arbitrary (Graph v) where
arbitrary = aux `suchThat` isValid
where aux = do ns <- arbitrary
es <- arbitrary
return $ Graph {nodes = fromList ns, edges = fromList es}
This current definition of the instance is generating graphs with few edges, how do I alter it so it's less biased and it satisfies these two functions? :
-- | The function 'isDAG' tests if the graph is acyclic.
isDAG :: Ord v => Graph v -> Bool
isDAG g = isValid g && all nocycle (nodes g)
where nocycle v = all (\a -> v `notMember` reachable g a) $ Set.map target (adj g v)
-- | The function 'isForest' tests if a valid DAG is a florest (a set of trees), in other words,
-- if every node(vertex) has a maximum of one adjacent.
isForest :: Ord v => DAG v -> Bool
isForest g = isDAG g && all (\v -> length (adj g v) <= 1) (nodes g)
First you must figure out how to construct a graph which satisfies those properties.
DAG: If your nodes admit some ordering, and for each edge (u,v) you have u < v then the graph is acyclic. This ordering can be any ordering at all, so you can just manufacture an arbitrary ordering on the set of nodes in the graph.
Forest: If your graph has no edges, this property is trivially satisfied. Initially you can add any edge whose source is any node. If you add an edge, remove the source of that edge from the remaining available nodes.
I guess the big question is how to translate this to code. QuickCheck provides many combinators, esp. for selecting from lists, with and without replacement, of various sizes, etc.
instance (Ord v, Arbitrary v) => Arbitrary (Graph v) where
arbitrary = do
ns <- Set.fromList <$> liftA2 (++) (replicateM 10 arbitrary) arbitrary
First you generate a random set of nodes.
let ns' = map reverse $ drop 2 $ inits $ Set.toList ns
For each node, this computes the (non-empty) set of nodes which are "greater" than that node. Here "greater" just means according to the arbitrary ordering induced by the order of the elements in the list. This gets you the DAG property.
es <- sublistOf ns' >>=
mapM (\(f:ts) -> Edge f <$> elements ts)
You then get a random sublist of that list (which gets you the forest property), and for each element in that random sublist, you create an edge pointing from the "largest" node in that set to one that is "smaller".
return $ Graph ns (Set.fromList es)
Then you're done! Test like so:
main = quickCheck $ forAll arbitrary (liftA2 (&&) (isDAG :: Graph Integer -> Bool) isForest)
A natural way of constructing graphs is inductively, adding one node at a time. Then it becomes quite easy to ensure the required properties hold:
If for each added node its edges point only to existing nodes (and not in the other direction), we ensure the DAG property.
If there is at most one edge going from a node, we ensure the forest property. (As you didn't provide the adj function, it's not clear if by "forest" you mean there is at most one edge going from a node or to a node.)
So the process of generating such a graph would go as follows:
Generate a list of random nodes.
Construct a graph by add them one by one. For each node, either add a random edge to one of the already added nodes, or no edge (decide randomly).
The main factor here is deciding whether to add an edge or not. By tweaking this parameter you get more or less trees in your forest. One option is to use frequency for that.

Chinese Remainder Theorem Haskell

I need to write a function or functions in Haskell that can solve the Chinese Remainder Theorem. It needs to be created with the following definition:
crt :: [(Integer, Integer)] -> (Integer, Integer)
That the answer looks like
>crt [(2,7), (0,3), (1,5)]
(51, 105)
I think I have the overall idea, I just don't have the knowledge to write it. I know that the crt function must be recursive. I have created a helper function to split the list of tuples into a tuple of two lists:
crtSplit xs = (map fst xs, product(map snd xs))
Which, in this example, gives me:
([2,0,1],105)
I think what I need to do know is create a list for each of the elements in the first list. How would I begin to do this?
Chinese remainder theorem has an algebraic solution, based on the fact that x = r1 % m1 and x = r2 % m2 can be reduced to one modular equation if m1 and m2 are coprime.
To do so you need to know what modular inverse is and how it can efficiently be calculated using extended Euclidean algorithm.
If you put these pieces together, you can solve Chinese remainder theorem with a right fold:
crt :: (Integral a, Foldable t) => t (a, a) -> (a, a)
crt = foldr go (0, 1)
where
go (r1, m1) (r2, m2) = (r `mod` m, m)
where
r = r2 + m2 * (r1 - r2) * (m2 `inv` m1)
m = m2 * m1
-- Modular Inverse
a `inv` m = let (_, i, _) = gcd a m in i `mod` m
-- Extended Euclidean Algorithm
gcd 0 b = (b, 0, 1)
gcd a b = (g, t - (b `div` a) * s, s)
where (g, s, t) = gcd (b `mod` a) a
then:
\> crt [(2,7), (0,3), (1,5)]
(51,105)
\> crt [(2,3), (3,4), (1,5)] -- wiki example
(11,60)
Without going into algebra, you can also solve this with brute force. Perhaps that's what you've been asked to do.
For your example, create a list for each mod independent of the other two (upper bound will be least common multiple of the mod, assuming they are co-prime as a precondition, product, i.e. 105). These three list should have one common element which will satisfy all constraints.
m3 = [3,6,9,12,15,...,105]
m5 = [6,11,16,21,...,101]
m7 = [9,16,23,30,...,100]
you can use Data.Set to find the intersection of these lists. Now, extend this logic to n number of terms using recursion or fold.
Update
Perhaps an easier approach is defining a filter to create a sequence with a fixed remainder for a modulus and repeatedly apply for the given pairs
Prelude> let rm (r,m) = filter (\x -> x `mod` m == r)
verify that it works,
Prelude> take 10 $ rm (1,5) [1..]
[1,6,11,16,21,26,31,36,41,46]
now, for the given example use it repeatedly,
Prelude> take 3 $ rm (1,5) $ rm (0,3) $ rm (2,7) [1..]
[51,156,261]
of course we just need the first element, change to head instead
Prelude> head $ rm (1,5) $ rm (0,3) $ rm (2,7) [1..]
51
which we can generalize with fold
Prelude> head $ foldr rm [1..] [(1,5),(0,3),(2,7)]
51

Haskell - finding bigrams from an input list of words

I'm following the NLPWP Computational Linguistics site and trying to create a Haskell procedure to find collocations (most common groupings of two words, like "United States" or "to find") in a list of words. I've got the following working code to find bigram frequency:
import Data.Map (Map)
import qualified Data.Map as Map
-- | Function for creating a list of bigrams
-- | e.g. [("Colorless", "green"), ("green", "ideas")]
bigram :: [a] -> [[a]]
bigram [] = []
bigram [_] = []
bigram xs = take 2 xs : bigram (tail xs)
-- | Helper for freqList and freqBigram
countElem base alow = case (Map.lookup alow base) of
Just v -> Map.insert alow (v + 1) base
Nothing -> Map.insert alow 1 base
-- | Maps each word to its frequency.
freqList alow = foldl countElem Map.empty alow
-- | Maps each bigram to its frequency.
freqBigram alow = foldl countElem Map.empty (bigram alow)
I'm trying to write a function that outputs a Map from each bigram to [freq of bigram]/[(freq word 1)*(freq word 2)]. Could you possibly provide advice on how to approach it?
None of the following code is working, but it gives a vague outline for what I was trying to do.
collocations alow =
| let f key = (Map.lookup key freqBi) / ((Map.lookup (first alow) freqs)*(Map.lookup (last alow) freqs))
in Map.mapWithKey f = freqBi
where freqs = (freqList alow)
where freqBi = (freqBigram alow)
I'm very new to Haskell, so let me know if you've got any idea how to fix the collocations procedure. Style tips are also welcome.
Most of your code looks sane, except for the final colloctions function.
I'm not sure why there's a stray pipe in there after the equals sign. You're not trying to write any kind of pattern guard, so I don't think that should be there.
Map.lookup returns a Maybe key, so trying to do division or multiplication isn't going to work. Maybe what you want is some kind of function that takes a key and a map, and returns the associated count or zero if the key doesn't exist?
Other than that, it looks like you're not too far off having this work.
As I read it, your confusion stems from mistaking types, more or less. General advice: Use type signatures on all your top level functions and make sure they are sensible and what you expect of the function (I often do this even before implementing the function).
Let's take a look at your
-- | Function for creating a list of bigrams
-- | e.g. [("Colorless", "green"), ("green", "ideas")]
bigram :: [a] -> [[a]]
If you're giving in a list of Strings, you'll be getting a list of lists of Strings, so your bigram is a list.
You could decide to be more explicit (only allow Strings instead of sometype a - for the beginning at least). So, actually we get a list of Words an make a list of Bigrams from it:
type Word = String
type Bigram = (Word, Word)
bigram :: [Word] -> [Bigram]
For the implementation you can try to use readily available functions from Data.List, for example zipWith and tail.
Now your freqList and freqBigram look like
freqList :: [Word] -> Map Word Int
freqBigram :: [Word] -> Map Bigram Int
With this error messages of the compiler will be clearer to you. To point at it: Take care what you're doing in the lookups for the word frequencies. You're searching for the frequency of word1 and word2, and the bigram is (word1,word2).
Now you should be able to figure the solution out on your own, I guess.
First of all I advise you to have a look at the function
insertWith :: Ord k => (a -> a -> a) -> k -> a -> Map k a -> Map k a
maybe you'll recognize the pattern if used
f freqs bg = insertWith (+) bg 1 freqs
Next as #MathematicalOrchid already pointed out your solution is not too far from being correct.
lookup :: Ord k => k -> Map k a -> Maybe a
You already took care of that in your countElems function.
what I'd like to note that there is this neat abstraction called Applicative, which works really well for problems like yours.
First of all you have to import Control.Applicative if you're using GHC prior to 7.10 for newer versions it is already at your fingertips.
So what does this abstraction provide, similar to Functor it gives you a way to handle "side effects" in your case the possibility of the failing lookup resulting in Nothing.
We have two operators provided by Applicative: pure and <*>, and in addition as every Applicative is required to be a Functor we also get fmap or <$> which are the latter is just an infix alias for convenience.
So how does this apply to your situation?
<*> :: Applicative f => f (a -> b) -> f a -> f b
<$> :: Functor f => a -> b -> f a -> f b
First of all you see that those two look darn similar but with <*> being slightly less familiar.
Now having a function
f :: Int -> Int
f x = x + 3
and
x1 :: Maybe Int
x1 = Just 4
x2 :: Maybe Int
x2 = Nothing
one couldn't simply just f y because that wouldn't typecheck - but and that is the first idea to keep in mind. Maybe is a Functor (it is also an Applicative - it is even more an M-thing, but let's not go there).
f <$> x1 = Just 7
f <$> x2 = Nothing
so you can imagine the f looking up the value and performing the calculation inside the Just and if there is no value - a.k.a. we have the Nothing situation, we'll do what every lazy student does - be lazy and do nothing ;-).
Now we get to the next part <*>
g1 :: Maybe (Int -> Int)
g1 = Just (x + 3)
g2 :: Maybe (Int -> Int)
g2 = Nothing
Still g1 x1 wouldn't work, but
g1 <*> x1 = Just 7
g1 <*> x2 = Nothing
g2 <*> x1 = Nothing -- remember g2 is Nothing
g2 <*> x2 = Nothing
NEAT! - but still how does this solve your problem?
The 'magic' is using both operators ... for multi-argument functions
h :: Int -> Int -> Int
h x y = x + y + 2
and partial function application, which just means put in one value get back a function that waits for the next value.
GHCi> :type h 1
h 1 :: Int -> Int
Now the strange thing happens we can use with a function like h.
GHCi> :type h1 <$> x1
h1 <$> x1 :: Maybe (Int -> Int)
well that's good because then we can use our <*> with it
y1 :: Maybe Int
y1 = Just 7
h1 <$> x1 <*> y1 = Just (4 + 7 + 2)
= Just 13
and this even works with an arbitrary number of arguments
k :: Int -> Int -> Int -> Int -> Int
k x y z w = ...
k <$> x1 <*> y1 <*> z1 <*> w1 = ...
So design a pure function that works with Int, Float, Double or whatever you like and then use the Functor/Applicative abstraction to make your lookup and frequency calculation work with each other.

polymorphism in haskell - using multiple versions of one function without giving it different names

the other day I wrote a small program to collect a bunch of numbers in a matrix - data Matrix = Matrix [[Int]] starting at a corner - Corner and follow a Path - [Direction] all of the three types are instance of a class Transformable
then i have a function that generates a transformation function, that turns a given corner and direction - to LeftUp and Right
turnLUR :: Transformable a => (Corner, Direction) -> a -> a
then i use this function to "harvest" all the numbers in the matrix:
harvest :: Matrix → [(Corner,Direction)] → [Int]
harvest _ [] = []
harvest (Matrix []) _ = []
harvest as (cd:csds) = b ++ harvest (Matrix bs) csds' --cd = (Corner,Direction)
where f1 = turnLUR cd -- Matrix -> Matrix
f2 = turnLUR cd -- Corner -> Corner
f3 = turnLUR cd -- Direction -> Direction
Matrix (b:bs) = f1 as -- b = first line of [[Int]]
fcfd (c,d) = (f2 c,f3 d)
csds' = map fcfd csds
Now my question why do I have to write down f1, f2 and f3 instead of using one function f three times (keeping DRY in my mind!) - all three types Corners, Directions and Matrix are instances of the class Transformable.
How would I write that code without - making three versions of the "same" function?
This is because of the monomorphism restriction. Give it an explicit type signature, and you'll be able to reuse the same function each time:
harvest :: Matrix → [(Corner,Direction)] → [Int]
harvest _ [] = []
harvest (Matrix []) _ = []
harvest as (cd:csds) = b ++ harvest (Matrix bs) csds' --cd = (Corner,Direction)
where f :: Transformable a => a -> a
f = turnLUR cd
Matrix (b:bs) = f as -- b = first line of [[Int]]
fcfd (c,d) = (f c,f d)
csds' = map fcfd csds
Alternatively, you can turn it off by putting {-# LANGUAGE NoMonomorphismRestriction #-} at the top of your file.
The monomorphism restriction isn't well-liked — even GHC's manual refers to it as the dreaded monomorphism restriction — but there are some tricky corner cases, especially related to sharing (see the wiki page I linked for more information), that it avoids. In this case, I'd recommend just adding the type signature.

Resources