Passage: usage example - statistics

Has anyone used Passage, the PArallel SAmpler GEnerator?
Potentially it could be very useful to me, but the only documentation I found is the three-line README on GitHub (https://github.com/cscherrer/passage) and the terse API docs on Hackage.
A simple example of how to run it would be great!

At a high level, Passage has two important monads to keep in mind: BayesianNetwork and BayesianSimulator.
First, you build a model in the BayesianNetwork monad:
myModel :: Int -> BayesianNetwork (Node, Node, [Node])
myModel n = do
  mu  <- normal 0 0.001
  tau <- improperScale
  xs  <- replicateM n $ normal mu tau
  return (mu, tau, xs)
This is specified as a generative model, so anything that is not random (here the number of data points n) must be passed as a parameter. Alternatively, we could have put a distribution on n.
Next, we build a simulator that calls the model:
mySim :: [Double] -> BayesianSimulator ()
mySim xs0 = do
  setThreadNum 4
  let n = length xs0
  (mu, tau, xs) <- model $ myModel n
  forM_ (zip xs xs0) $ \(x, x0) -> observe x x0
  monitor mu
  monitor tau
Finally, take some data:
xs0 = [1, -1, 2, 2, 2, -2]
and run the simulator:
main = genSimulator "myExample" (mySim xs0)
This creates a new directory myExample with OpenMP code for the sampler.

Related

Haskell `randoms` function not behaving well with my library

I'm trying to write a Haskell library for cryptographically secure random numbers. The code follows:
module URandom (URandom, initialize) where
import qualified Data.ByteString.Lazy as B
import System.Random
import Data.Word
newtype URandom = URandom [Word8]
instance RandomGen URandom where
  next (URandom (x : xs)) = (fromIntegral x, URandom xs)
  split (URandom l) = (URandom (evens l), URandom (odds l))
    where evens (x : _ : xs) = x : evens xs
          odds  (_ : x : xs) = x : odds xs
  genRange _ = (fromIntegral (minBound :: Word8), fromIntegral (maxBound :: Word8))
initialize :: IO URandom
initialize = URandom . B.unpack <$> B.readFile "/dev/urandom"
Unfortunately, it's not behaving like I want. In particular, performing
take 10 . randoms <$> initialize
yields (something similar to)
[-4611651379516519433,-4611644973572935887,-31514321567846,9223361179177989878,-4611732094835278236,9223327886739677537,4611709625714976418,37194416358963,4611669560113361421,-4611645373004878170,-9223329383535098640,4611675323959360258,-27021785867556,9223330964083681227,4611705212636167666]
which to my, albeit untrained, eye does not appear very random. A lot of 46... and 92... in there.
What could be going wrong? Why doesn't this produce well-distributed numbers? It's worth noting that even if I concatenate Word8s together to form Ints, the distribution does not improve; I didn't think it was worth including that code here.
Edit: here's some evidence that it's not distributed correctly. I've written a function called histogram:
histogram :: ∀ t . (Integral t, Bounded t)
          => [t] -> Int -> S.Seq Int
histogram [] buckets = S.replicate buckets 0
histogram (x : xs) buckets = S.adjust (+ 1) (whichBucket x) (histogram xs buckets)
  where whichBucket x = fromIntegral $ ((fromIntegral x * fromIntegral buckets) :: Integer) `div` fromIntegral (maxBound :: t)
and when I run
g <- initialize
histogram (take 1000000 $ randoms g :: [Word64]) 16
I get back
fromList [128510,0,0,121294,129020,0,0,122090,127873,0,0,120919,128637,0,0,121657]
Some of the buckets are completely empty!
The issue is a bug in random-1.0.1.1 that was fixed in random-1.1. The changelog points to this ticket. In particular, referring to the older version:
It also assumes that all RandomGen implementations produce the same range of random values as StdGen.
Here randomness is produced 8 bits at a time, and that caused the observed behavior.
random-1.1 fixed this:
This implementation also works with any RandomGen, even ones that produce as little as a single bit of entropy per next call or have a minimum bound other than zero.
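As a sanity check that the bytes from /dev/urandom themselves are fine (and that the skew really comes from the old library's range scaling), you can assemble Word64s straight from the byte stream, bypassing RandomGen entirely. This is only an illustrative sketch; word64s is my own helper, not part of any library:
import qualified Data.ByteString.Lazy as B
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word64)

-- Pack successive groups of 8 bytes into big-endian Word64s.
word64s :: [Word8] -> [Word64]
word64s ws = case splitAt 8 ws of
  (chunk@[_, _, _, _, _, _, _, _], rest) ->
    foldl (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0 chunk : word64s rest
  _ -> []

main :: IO ()
main = do
  bytes <- B.unpack <$> B.readFile "/dev/urandom"
  print (take 5 (word64s bytes))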

how does parBuffer work?

I was looking at the code of parBuffer in parallel-3.2.0.4, but I am missing something about how it works. I don't see how it can create new sparks aside from the initial ones.
As far as I can see, it uses start in parBufferWHNF to spark the first n elements with par, and then, going through ret, it uses par again on the same entries (shouldn't this just discard y and only risk the spark getting GC'd?) while returning the corresponding results. After that it returns xs directly, without creating any additional sparks, since rdeepseq just calls pseq.
But clearly testing code like this
withStrategy (parBuffer 10 rdeepseq) $ take 100 [ expensive stuff ]
I can see all 100 sparks in the GHC RTS statistics, but where are the other 90 created?
Here is the code I was looking at:
parBufferWHNF :: Int -> Strategy [a]
parBufferWHNF n0 xs0 = return (ret xs0 (start n0 xs0))
  where -- ret :: [a] -> [a] -> [a]
        ret (x:xs) (y:ys) = y `par` (x : ret xs ys)
        ret xs     _      = xs

        -- start :: Int -> [a] -> [a]
        start 0   ys     = ys
        start !_n []     = []
        start !n  (y:ys) = y `par` start (n-1) ys

-- | Like 'evalBuffer' but evaluates the list elements in parallel when
-- pushing them into the buffer.
parBuffer :: Int -> Strategy a -> Strategy [a]
parBuffer n strat = parBufferWHNF n . map (withStrategy strat)
parBuffer is conceptually similar to a circular buffer with a constant window size rolling over the input and producing the output, and it is useful when implementing pipeline parallelism or working with lazy streams.
Its implementation relies on how the result is evaluated: it exploits laziness and graph sharing (which explains why the sparks are not discarded) to produce output as the input is consumed, ensuring that the number of outstanding sparks is limited to N and hence constant space is used (as opposed to parList, which is linear in the length of the argument list).
The start function creates the initial N sparks and passes the rest of the input to ret unsparked. The ret function takes two lists (xs0, and xs0 without its initial N elements, as returned by start) and sparks one more element from the second list each time an element of the result (the x) is demanded by the consumer, until there are no elements left.
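For reference, here is a minimal way to reproduce the spark behaviour described in the question; slowFib is just my stand-in for "expensive stuff". Compile with -threaded and run with +RTS -N -s to see the spark counts:
import Control.Parallel.Strategies (withStrategy, parBuffer, rdeepseq)

-- A deliberately slow function standing in for an expensive computation.
slowFib :: Integer -> Integer
slowFib n | n < 2     = n
          | otherwise = slowFib (n - 1) + slowFib (n - 2)

main :: IO ()
main = print . sum $ withStrategy (parBuffer 10 rdeepseq)
                                  (map (slowFib . (+ 20)) [1 .. 100])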

Haskell - How do I get Random Points (Int,Int)

I'm trying to get a set of random points (x,y) for drawing graph nodes to a screen. I need one randomly generated point for each node name passed in.
I found this code on a SO page, and modified it slightly to work for me, but it doesn't really do what I need.
I need a list of random (as random as possible) (Int,Int).
Anyway, here is what I have so far, and of course, it gives the same values every time, so it isn't particularly random :)
rndPoints :: [String] -> [Point]
rndPoints [] = []
rndPoints xs = zip x y
  where
    size = length xs
    x = take size (tail (map fst $ scanl (\(r, gen) _ -> randomR (25::Int,1000::Int) gen) (random (mkStdGen 1)) $ repeat ()))
    y = take size (tail (map fst $ scanl (\(r, gen) _ -> randomR (25::Int,775::Int) gen) (random (mkStdGen 1)) $ repeat ()))
Any help would be much appreciated.
First, let's clean up your code a bit. There is a plural version of randomR that delivers an infinite list of random values: randomRs. This simplifies things a bit:
rndPoints1 :: [String] -> [Point]
rndPoints1 [] = []
rndPoints1 xs = zip x y
  where
    size = length xs
    x = take size $ randomRs (25, 1000) (mkStdGen 1)
    y = take size $ randomRs (25, 775) (mkStdGen 1)
We can simplify that further, by using zip's property that it stops after the shorter list is exhausted:
rndPoints2 :: [a] -> [Point]
rndPoints2 xs = map snd $ zip xs $ zip x y
  where
    x = randomRs (25, 1000) (mkStdGen 1)
    y = randomRs (25, 775) (mkStdGen 1)
Notice I've also generalized the type of incoming list to just [a]. Since the values are never used, they needn't be Strings!
Now, it gives the same value every time because it uses mkStdGen to create a pseudo-random generator from the same seed (1) each time. If you want it to be different each time, then you need to create a generator in IO, which can be based on the random state of the computer. Rather than put the whole computation in IO, it is cleaner to pass in a StdGen:
rndPoints3 :: StdGen -> [Point]
rndPoints3 sg = zip x y
  where
    (sg1, sg2) = split sg
    x = randomRs (25, 1000) sg1
    y = randomRs (25, 775) sg2

pointsForLabels :: [a] -> StdGen -> [(a, Point)]
pointsForLabels xs sg = zip xs $ rndPoints3 sg

example3 :: [a] -> IO [(a, Point)]
example3 xs = newStdGen >>= return . pointsForLabels xs
Here, newStdGen creates a new pseudo-random generator each time, but it is in IO. That is passed eventually to a pure (non-IO) function rndPoints3 that takes the generator, and returns an infinite list of random Points. Within that function, split is used to create two generators from it, and each is used to derive the random list of coordinates.
pointsForLabels now separates out the logic of matching up a new random point for each label. I also changed it to return the more likely useful pairs of labels and Points.
Finally, example3 lives in IO, and creates the generator and passes it all into the otherwise pure code.
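A quick usage sketch, assuming Point is (Int, Int) (which matches the ranges used above):
type Point = (Int, Int)

main :: IO ()
main = do
  labelled <- example3 ["a", "b", "c"]   -- one random Point per label
  mapM_ print labelled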
I ended up using MonadRandom for this. I think the code was a little clearer and easier for me to understand. You could adapt the following code to address the original question.
import Control.Applicative
import Control.Monad.Random
type Point = (Float, Float)
type Poly = [Point]
randomScalar :: (RandomGen g) => Rand g Float
randomScalar = getRandomR (-500, 500)
randomPoint :: (RandomGen g) => Rand g Point
randomPoint = (,) <$> randomScalar <*> randomScalar
randomPoly :: (RandomGen g) => Int -> Rand g Poly
randomPoly n = sequence (replicate n randomPoint)
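To actually draw samples, the Rand computation can be run in IO with evalRandIO, e.g. (a small usage sketch):
main :: IO ()
main = evalRandIO (randomPoly 5) >>= print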

Haskell cartesian product of infinite lists

I want to generate a vectorspace from a basis pair, which looks something like:
genFromPair (e1, e2) = [x*e1 + y*e2 | x <- [0..], y <- [0..]]
When I examine the output, though, it seems like I'm getting [0, e2, 2*e2, ...] (i.e. x never gets above 0). Which sort of makes sense when I think about how I would write the code to do this list comprehension.
I wrote some code to take expanding "shells" from the origin (first the ints with norm 0, then with norm 1, then norm 2...) but this is kind of annoying and specific to Z^2 - I'd have to rewrite it for Z^3 or Z[i] etc. Is there a cleaner way of doing this?
The data-ordlist package has some functions which are extremely useful for working with sorted infinite lists. One of these is mergeAllBy, which combines an infinite list of infinite lists using some comparison function.
The idea is then to build an infinite list of lists such that y is fixed in each list, while x grows. As long as we can guarantee that each list is sorted, and that the heads of the lists are sorted, according to our ordering, we get a merged sorted list back.
Here's a quick example:
import Data.List.Ordered
import Data.Ord
genFromPair (e1, e2) = mergeAllBy (comparing norm) [[x.*e1 + y.*e2 | x <- [0..]] | y <- [0..]]
-- The rest just defines a simple vector type so we have something to play with
data Vec a = Vec a a
  deriving (Eq, Show)

instance Num a => Num (Vec a) where
  (Vec x1 y1) + (Vec x2 y2) = Vec (x1+x2) (y1+y2)
  -- ...

s .* (Vec x y) = Vec (s*x) (s*y)
norm (Vec x y) = sqrt (x^2 + y^2)
Trying this in GHCi we get the expected result:
*Main> take 5 $ genFromPair (Vec 0 1, Vec 1 0)
[Vec 0.0 0.0,Vec 0.0 1.0,Vec 1.0 0.0,Vec 1.0 1.0,Vec 0.0 2.0]
You could look at your space as a tree. At the root of the tree you pick the first element, and in its child you pick the second element, and so on.
Here's your tree defined using the ListTree package:
import Control.Monad.ListT
import Data.List.Class
import Data.List.Tree
import Prelude hiding (scanl)
infiniteTree :: ListT [] Integer
infiniteTree = repeatM [0..]
spacesTree :: ListT [] [Integer]
spacesTree = scanl (\xs x -> xs ++ [x]) [] infiniteTree
twoDimSpaceTree = genericTake 3 spacesTree
It's an infinite tree, but we could enumerate over it for example in DFS order:
ghci> take 10 (dfs twoDimSpaceTree)
[[],[0],[0,0],[0,1],[0,2],[0,3],[0,4],[0,5],[0,6],[0,7]]
The order you want, in tree-speak, is a variant of best-first-search for infinite trees, where one assumes that the children of tree nodes are sorted (you can't compare all the node's children as in normal best-first-search because there are infinitely many of those). Luckily, this variant is already implemented:
ghci> take 10 $ bestFirstSearchSortedChildrenOn sum $ genericTake 3 $ spacesTree
[[],[0],[0,0],[0,1],[1],[1,0],[1,1],[0,2],[2],[2,0]]
You can use any norm you like for your expanding shells, instead of sum above.
Using the diagonal snippet from CodeCatalog:
genFromPair (e1, e2) = diagonal [[x*e1 + y*e2 | x <- [0..]] | y <- [0..]]

diagonal :: [[a]] -> [a]
diagonal = concat . stripe
  where
    stripe [] = []
    stripe ([]:xss) = stripe xss
    stripe ((x:xs):xss) = [x] : zipCons xs (stripe xss)

    zipCons [] ys = ys
    zipCons xs [] = map (:[]) xs
    zipCons (x:xs) (y:ys) = (x:y) : zipCons xs ys
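A quick check in GHCi (using plain integer pairs instead of a vector type) shows the order in which diagonal walks the grid:
ghci> take 10 $ diagonal [[(x, y) | x <- [0..]] | y <- [0..]]
[(0,0),(1,0),(0,1),(2,0),(1,1),(0,2),(3,0),(2,1),(1,2),(0,3)]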
Piggybacking on hammar's reply: His approach seems fairly easy to extend to higher dimensions:
Prelude> import Data.List.Ordered
Prelude Data.List.Ordered> import Data.Ord
Prelude Data.List.Ordered Data.Ord> let norm (x,y,z) = sqrt (fromIntegral x^2+fromIntegral y^2+fromIntegral z^2)
Prelude Data.List.Ordered Data.Ord> let mergeByNorm = mergeAllBy (comparing norm)
Prelude Data.List.Ordered Data.Ord> let sorted = mergeByNorm (map mergeByNorm [[[(x,y,z)| x <- [0..]] | y <- [0..]] | z <- [0..]])
Prelude Data.List.Ordered Data.Ord> take 20 sorted
[(0,0,0),(1,0,0),(0,1,0),(0,0,1),(1,1,0),(1,0,1),(0,1,1),(1,1,1),(2,0,0),(0,2,0),(0,0,2),(2,1,0),(1,2,0),(2,0,1),(0,2,1),(1,0,2),(0,1,2),(2,1,1),(1,2,1),(1,1,2)]

How can I write a MST algorithm (Prim or Kruskal) in Haskell?

I can write both Prim's and Kruskal's algorithms to find a minimum spanning tree in C++ or Java, but I want to know how to implement them in Haskell in O(m log m) or O(m log n) time (purely functional code is preferred). Thanks a lot.
As svenningsson suggests, a priority search queue is well suited for both Kruskal's and Prim's algorithms (at least the author claims so in his paper). The problem with Kruskal is that it requires an O(log n) union-find algorithm. A union-find data structure with a purely functional interface is described here, but it uses mutable state internally, and a purely functional implementation might be impossible; in fact, there are several problems for which an efficient purely functional solution is not known, as discussed in this related SO question.
A non-pure alternative is to implement the union-find algorithm in the ST monad. A search on Hackage finds that the equivalence package suits our needs. The following is an implementation of Kruskal using Data.Equivalence.Monad from the equivalence package:
import Data.Equivalence.Monad
import Data.Graph as G
import Data.List(sortBy)
import Data.Map as M
import Control.Monad(filterM)
import Data.Ord(comparing)
run = runEquivM (const ()) (const $ const ())
kruskal weight graph = run $
    filterM go (sortBy (comparing weight) theEdges)
  where
    theEdges = G.edges graph
    go (u,v) = do
      eq <- equivalent u v
      if eq then return False
            else equate u v >> return True
It can be used like this:
import Data.Maybe(fromJust)

fromL xs = fromJust . flip M.lookup (M.fromList xs)
testWeights = fromL [((1,2),1),((2,3),4),((3,4),5),((1,4),30),((1,3),4)]
testGraph = G.buildG (1,4) [(1,2),(2,3),(3,4),(1,4),(1,3)]
test = kruskal testWeights testGraph
and running test gives:
[(1,2),(1,3),(3,4)]
It should be noted that this running time assumes the weight function runs in O(1) time; however, fromL creates a weight function that runs in O(log n) time. This can be improved to O(1) by using arrays or by simply keeping track of the weight in the input list, but it's not really part of the algorithm.
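For illustration, here is what that array-based O(1) lookup could look like for the small test graph above (a sketch with my own names):
import qualified Data.Array as A

-- Dense weight array for the 4-vertex test graph; absent edges default to 0.
testWeightArr :: A.Array (Int, Int) Int
testWeightArr = A.accumArray (flip const) 0 ((1,1),(4,4))
                  [((1,2),1),((2,3),4),((3,4),5),((1,4),30),((1,3),4)]

testFast = kruskal (testWeightArr A.!) testGraph   -- each weight lookup is O(1)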
Here is a crude Kruskal implementation.
import Data.List(sort)
import Data.Set (Set, member, fromList, insert, union)

data Edge a = Edge a a Double deriving Show

instance (Eq a) => Eq (Edge a) where
  Edge x1 y1 z1 == Edge x2 y2 z2 = x1 == x2 && y1 == y2 && z1 == z2

instance Eq a => Ord (Edge a) where
  (Edge _ _ x) `compare` (Edge _ _ y) = x `compare` y

kruskal :: Ord a => [Edge a] -> [Edge a]
kruskal = fst . foldl mst ([],[]) . sort

mst :: Ord a => ([Edge a],[Set a]) -> Edge a -> ([Edge a],[Set a])
mst (es, sets) e@(Edge p q _) = step $ extract sets where
  step (rest, Nothing, Nothing) = (e : es, fromList [p,q] : rest)
  step (rest, Just ps, Nothing) = (e : es, q `insert` ps : rest)
  step (rest, Nothing, Just qs) = (e : es, p `insert` qs : rest)
  step (rest, Just ps, Just qs) | ps == qs  = (es, sets) -- cycle
                                | otherwise = (e : es, ps `union` qs : rest)
  extract = foldr f ([], Nothing, Nothing) where
    f s (list, setp, setq) =
      let list' = if member p s || member q s then list else s : list
          setp' = if member p s then Just s else setp
          setq' = if member q s then Just s else setq
      in  (list', setp', setq')
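For example, feeding it a small test graph (my own data, mirroring the one used earlier) returns the expected tree, in reverse order of insertion:
testEdges :: [Edge Int]
testEdges = [Edge 1 2 1, Edge 2 3 4, Edge 3 4 5, Edge 1 4 30, Edge 1 3 4]

-- kruskal testEdges == [Edge 3 4 5.0, Edge 2 3 4.0, Edge 1 2 1.0]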
The first step is sorting the edges, which is O(n log n). The problem is to find a faster lookup for the vertex sets in the extract function. I couldn't find a faster solution for this; maybe someone has an idea...
[Update]
For a Scala implementation I used a map-like data-structure for (hopefully) better performance, but unfortunately it uses mutable sets, and I have no clue how to translate this into Haskell. The code is in my blog (sorry, description is in German): http://dgronau.wordpress.com/2010/11/28/nochmal-kruskal/
I think a priority search queue is what you're looking for. It can be implemented optimally in a functional language, as demonstrated by Ralf Hinze in a paper. It seems the paper is only available via the ACM library at a cost.
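For completeness, here is a rough Prim sketch in the same spirit, using a plain Data.Set of (weight, from, to) triples as a poor man's priority queue instead of the priority search queue from the paper (so there is no decrease-key; stale entries are simply skipped). All names here are my own:
import qualified Data.Map as M
import qualified Data.Set as S

-- Adjacency list: vertex -> [(weight, neighbour)]; assumes both directions
-- are present for an undirected graph.
type AdjGraph = M.Map Int [(Double, Int)]

prim :: AdjGraph -> Int -> [(Int, Int, Double)]
prim adj start = go (S.singleton start) (frontier start) []
  where
    frontier u = S.fromList [ (w, u, v) | (w, v) <- M.findWithDefault [] u adj ]
    go visited cand acc = case S.minView cand of
      Nothing -> reverse acc
      Just ((w, u, v), rest)
        | v `S.member` visited -> go visited rest acc          -- stale entry, skip
        | otherwise            -> go (S.insert v visited)
                                     (rest `S.union` frontier v)
                                     ((u, v, w) : acc)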
