Pseudorandom number generators in Haskell - haskell

I'm working on solutions to the latest Programming Praxis puzzles—the first on implementing the minimal standard random number generator and the second on implementing a shuffle box to go with either that one or a different pseudorandom number generator. Implementing the math is pretty straightforward. The tricky bit for me is figuring out how to put the pieces together properly.
Conceptually, a pseudorandom number generator is a function stepRandom :: s -> (s, a) where s is the type of the internal state of the generator and a is the type of randomly chosen object produced. For a linear congruential PRNG, we could have s = a = Int64, for example, or perhaps s = Int64 and a = Double. This post on PSE does a pretty good job of showing how to use a monad to thread the PRNG state through a random computation, and finish things off with runRandom to run a computation with a certain initial state (seed).
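As a concrete instance of stepRandom, here is a minimal sketch of the Park-Miller "minimal standard" generator with s = a = Int64. The multiplier 16807 and the modulus 2^31 - 1 are the classic constants; the helper name takeRandoms is made up for this illustration, and the state is threaded by hand rather than through a monad.

```haskell
import Data.Int (Int64)

-- One step of the "minimal standard" linear congruential generator.
-- The new state doubles as the output, so s = a = Int64 here.
-- The seed is assumed to lie in 1 .. 2147483646.
stepRandom :: Int64 -> (Int64, Int64)
stepRandom s = (s', s')
  where
    s' = (16807 * s) `mod` 2147483647

-- Hypothetical helper: take n outputs, threading the state by hand.
takeRandoms :: Int -> Int64 -> ([Int64], Int64)
takeRandoms n s
    | n <= 0    = ([], s)
    | otherwise =
        let (s', x)   = stepRandom s
            (xs, s'') = takeRandoms (n - 1) s'
        in (x : xs, s'')
```

Threading the state explicitly like this is exactly the boilerplate that a state monad hides.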
Conceptually, a shuffle box is a function shuffle :: box -> a -> (box, a) along with a function to initialize a new box of the desired size with values from a PRNG. In practice, however, the representation of this box is a bit trickier. For efficiency, it should be represented as a mutable array, which forces it into ST or IO. Something vaguely like this:
mkShuffle :: (Integral i, Ix i, MArray a e m) => i -> m e -> m (a i e)
mkShuffle size getRandom = do
    thelist <- replicateM (fromInteger.fromIntegral $ size) getRandom
    newListArray (0,size-1) thelist

shuffle :: (Integral b, Ix b, MArray a b m) => a b b -> b -> m b
shuffle box n = do
    (start,end) <- getBounds box
    let index = start + n `quot` (end-start+1)
    value <- readArray box index
    writeArray box index n
    return value
What I really want to do, however, is attach an (initialized?) shuffle box to a PRNG, so as to "pipe" the output from the PRNG into the shuffle box. I don't understand how to set up that plumbing properly.

I'm assuming that the goal is to implement an algorithm as follows: we have a random generator of some sort which we can think of as somehow producing a stream of random values
import Pipes
prng :: Monad m => Producer Int m r
-- produces Ints using the effects of m; it never stops, thus the
-- return type r is polymorphic
We would like to modify this PRNG via a shuffle box. Shuffle boxes have a mutable state Box which is an array of random integers and they modify a stream of random integers in a particular way
shuffle :: Monad m => Box -> Pipe Int Int m r
-- given a box, convert a stream of integers into a different
-- stream of integers using the effects of m without stopping
-- (polymorphic r)
shuffle works on an integer-by-integer basis by indexing into its Box by the incoming random value modulo the size of the box, storing the incoming value there, and emitting the value which was previously stored there. In some sense it's like a stochastic delay function.
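To make the "stochastic delay" behaviour concrete before bringing in mutable arrays, here is a small pure sketch that models the box as a plain list. It is deliberately inefficient, and the names shuffleStep and shufflePure are invented for this illustration; it assumes a non-empty box.

```haskell
import Data.List (mapAccumL)

-- One shuffle step on a box modelled as a list: store the incoming
-- value at (value mod size) and emit whatever was stored there before.
shuffleStep :: [Int] -> Int -> ([Int], Int)
shuffleStep box x = (before ++ x : after, old)
  where
    i                     = x `mod` length box
    (before, old : after) = splitAt i box

-- Run a whole input stream through the box, collecting the outputs.
shufflePure :: [Int] -> [Int] -> [Int]
shufflePure box = snd . mapAccumL shuffleStep box
```

For example, with the box [10,20,30] and the inputs [4,7,5], the outputs are [20,4,30]: the 4 that went in at the first step comes back out at the second.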
So with that spec let's get to a real implementation. We want to use a mutable array so we'll use the vector library and the ST monad. ST requires that we pass around a phantom s parameter that matches throughout a particular ST monad invocation, so when we write Box it'll need to expose that parameter.
import qualified Data.Vector.Mutable as Vm
import Control.Monad.ST
data Box s = Box { sz :: Int, vc :: Vm.STVector s Int }
The sz parameter is the size of the Box's memory and the Vm.STVector s is a mutable ST Vector linked to the s ST thread. We can immediately use this to build our shuffle algorithm, now knowing that the Monad m must actually be ST s.
import Control.Monad
shuffle :: Box s -> Pipe Int Int (ST s) r
shuffle box = forever $ do                  -- this pipe runs forever
    up <- await                             -- wait for upstream
    next <- lift $ do
        let index = up `rem` sz box         -- perform the shuffle
        prior <- Vm.read (vc box) index     -- using our mutation
        Vm.write (vc box) index up          -- primitives in the ST
        return prior                        -- monad
    yield next                              -- then yield the result
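If the pipes machinery obscures what the ST part is doing, the same read-then-overwrite step can be sketched with only the array package that ships with GHC. This is an illustrative stand-in, not the code used in the rest of this answer: shuffleST is a made-up name, it takes the initial box contents and the input stream as plain lists, and it assumes the box is non-empty.

```haskell
import Control.Monad (forM)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newListArray, readArray, writeArray)

-- Push every input through a mutable box and collect what falls out.
-- All mutation stays inside runST, so the function itself is pure.
shuffleST :: [Int] -> [Int] -> [Int]
shuffleST initial inputs = runST $ do
    let n = length initial
    box <- newListArray (0, n - 1) initial :: ST s (STUArray s Int Int)
    forM inputs $ \x -> do
        let i = x `mod` n          -- slot chosen by the incoming value
        old <- readArray box i     -- value that was waiting in the slot
        writeArray box i x         -- the incoming value takes its place
        return old
```

The phantom s never escapes: callers only see a function from lists to a list.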
Now we'd just like to be able to attach this shuffle to some prng Producer. Since we're using vector it's nice to use the high-performance mwc-random library.
import qualified System.Random.MWC as MWC
-- | Produce a uniformly distributed non-negative integer
uniformPos :: MWC.GenST s -> ST s Int
uniformPos gen = liftM abs (MWC.uniform gen)
prng :: MWC.GenST s -> Producer Int (ST s) r
prng gen = forever $ do
    val <- lift (uniformPos gen)
    yield val
Notice that since we're passing the PRNG state, MWC.GenST s, along in an ST s thread, we don't need to catch modifications and thread them along as well. Instead, mwc-random keeps the generator state in mutable storage tied to the s thread. Also notice that we wrap MWC.uniform in abs so that the indices computed in shuffle are non-negative, as our indexing scheme requires. (One edge case remains: abs minBound is still negative for Int, so a hardened version would want to guard against it.)
We can also use mwc-random to generate our initial box.
mkBox :: MWC.GenST s -> Int -> ST s (Box s)
mkBox gen size = do
vec <- Vm.replicateM size (uniformPos gen)
return (Box size vec)
The only trick here is the very nice Vm.replicateM function which effectively has the constrained type
Vm.replicateM :: Int -> ST s Int -> ST s (Vm.STVector s Int)
where the second argument is an ST s action which generates a new element of the vector.
Finally we have all the pieces. We just need to assemble them. Fortunately, the modularity we get from using pipes makes this trivial.
import qualified Pipes.Prelude as P
run10 :: MWC.GenST s -> ST s [Int]
run10 gen = do
    box <- mkBox gen 1000
    P.toListM (prng gen >-> shuffle box >-> P.take 10)
Here we use (>->) to build a production pipeline and P.toListM to run that pipeline and produce a list. Finally, we need to execute this ST s thread from IO. That is also where we create the initial MWC.GenST s generator: MWC.withSystemRandom seeds it from the system's source of randomness and feeds it to run10.
main :: IO ()
main = do
    result <- MWC.withSystemRandom run10
    print result
And we have our pipeline.
*ShuffleBox> main
[743244324568658487,8970293000346490947,7840610233495392020,6500616573179099831,1849346693432591466,4270856297964802595,3520304355004706754,7475836204488259316,1099932102382049619,7752192194581108062]
Note that the actual operations of these pieces are not terrifically complex. Unfortunately, the types in ST, mwc-random, vector, and pipes are each individually highly generalized and thus can be quite burdensome to comprehend at first. Hopefully the above, where I've deliberately weakened and specialized nearly every type to this exact problem, will be much easier to follow and provide a little bit of intuition for how each of these wonderful libraries works individually and together.

Related

Haskell vector C++ push_back analogue

I've discovered that Haskell Data.Vector.* miss C++ std::vector::push_back's functionality. There is grow/unsafeGrow, but they seem to have O(n) complexity.
Is there a way to grow vectors in O(1) amortized time for an element?
No, there really is no such facility in Data.Vector. It isn't too difficult to implement this from scratch using MutableArray, as Data.Vector.Mutable does (see my implementation below), but there are some significant drawbacks. In particular, all of its operations end up happening inside some state context, usually ST or IO. This has the following downsides:
Any code that manipulates such a data structure ends up having to be monadic
The compiler is much less likely to be able to optimize. For example, libraries like vector use something really clever called fusion to optimize away intermediate allocations. This sort of thing is not possible in a state context.
Parallelism is going to be a lot tougher: in ST I can't even have two threads and in IO I will have race conditions all over the place. The nasty bit here is that any sharing is going to have to happen in IO.
As if all this wasn't enough, garbage collection also performs better inside pure code.
What do I do then?
It isn't particularly often that you have a need for exactly this behaviour - usually you are better off using an immutable data structure (thereby avoiding all of the aforementioned problems) which does something similar. Just limiting ourselves to containers, which ships with GHC, some alternatives include:
if you are almost always just using push_back, maybe you just want a stack (a plain old [a]).
if you anticipate doing more push_back than lookups, Data.Sequence gives you O(1) appending to either end and O(log n) lookup.
if you are interested in a lot of operations especially hashmap-like, Data.IntMap is pretty optimized. Even if the theoretical cost of those operations is O(log n), you will need a pretty big IntMap to start feeling those costs.
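As a small illustration of the Data.Sequence option, the sketch below uses (|>) as a stand-in for push_back. The helper name pushAll is invented for this example; everything else comes from containers and base.

```haskell
import Data.Foldable (foldl', toList)
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq

-- Append the elements one at a time at the right end, like repeated
-- push_back calls. Each (|>) returns a new, immutable sequence.
pushAll :: [a] -> Seq a
pushAll = foldl' (|>) Seq.empty

-- Positional access stays cheap without any mutable state.
secondElement :: Seq a -> a
secondElement s = Seq.index s 1
```

Because every operation returns a new value, this code stays pure and needs no ST or IO context.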
Making something like C++ vector
Of course, if one doesn't care about the restrictions mentioned initially, there is no reason not to have a C++ like vector. Just for fun, I went ahead and implemented this from scratch (needs packages data-default and primitive).
The reason this code is probably not already in some library is that it goes against much of the spirit of Haskell (I do this with the intent of conforming to a C++ style vector).
The only operation that actually makes a new vector is newEmpty - everything else "modifies" an existing vector. Since pushBack doesn't return a new GrowVector, it has to modify the existing one (including its length and/or capacity), so length and capacity have to be "pointers". In turn, that means that even getting the length is a monadic operation.
While this isn't unboxed, it would not be too difficult to replicate vector's data family approach - it is just tedious [1].
With that said:
module GrowVector (
    GrowVector, newEmpty, size, read, write, pushBack, popBack
) where
import Data.Primitive.Array
import Data.Primitive.MutVar
import Data.Default
import Control.Monad
import Control.Monad.Primitive (PrimState, PrimMonad)
import Prelude hiding (length, read)
data GrowVector s a = GrowVector
    { underlying :: MutVar s (MutableArray s a) -- ^ underlying array
    , length :: MutVar s Int -- ^ perceived length of vector
    , capacity :: MutVar s Int -- ^ actual capacity
    }
type GrowVectorIO = GrowVector (PrimState IO)
-- | Make a new empty vector with the given capacity. O(n)
newEmpty :: (Default a, PrimMonad m) => Int -> m (GrowVector (PrimState m) a)
newEmpty cap = do
    arr <- newArray cap def
    GrowVector <$> newMutVar arr <*> newMutVar 0 <*> newMutVar cap
-- | Read an element in the vector (unchecked). O(1)
read :: PrimMonad m => GrowVector (PrimState m) a -> Int -> m a
g `read` i = do arr <- readMutVar (underlying g); arr `readArray` i
-- | Find the size of the vector. O(1)
size :: PrimMonad m => GrowVector (PrimState m) a -> m Int
size g = readMutVar (length g)
-- | Double the vector capacity. O(n)
resize :: (Default a, PrimMonad m) => GrowVector (PrimState m) a -> m ()
resize g = do
    curCap <- readMutVar (capacity g)          -- read current capacity
    curArr <- readMutVar (underlying g)        -- read current array
    curLen <- readMutVar (length g)            -- read current length
    let newCap = max 1 (2 * curCap)            -- double (and handle capacity 0)
    newArr <- newArray newCap def              -- allocate the bigger array
    copyMutableArray newArr 0 curArr 0 curLen  -- copy the old elements, from index 0
    underlying g `writeMutVar` newArr          -- use the new array in the vector
    capacity g `writeMutVar` newCap            -- update the capacity in the vector
-- | Write an element to the array (unchecked). O(1)
write :: PrimMonad m => GrowVector (PrimState m) a -> Int -> a -> m ()
write g i x = do arr <- readMutVar (underlying g); writeArray arr i x
-- | Pop an element of the vector, mutating it (unchecked). O(1)
popBack :: PrimMonad m => GrowVector (PrimState m) a -> m a
popBack g = do
    s <- size g
    x <- g `read` (s - 1)
    length g `modifyMutVar'` subtract 1
    pure x
-- | Push an element. (Amortized) O(1)
pushBack :: (Default a, PrimMonad m) => GrowVector (PrimState m) a -> a -> m ()
pushBack g x = do
    s <- readMutVar (length g)       -- read current size
    c <- readMutVar (capacity g)     -- read current capacity
    when (s == c) (resize g)         -- if the array is full, resize
    write g s x                      -- index s is the first free slot
    length g `modifyMutVar'` (+1)    -- increase the length
Current semantics of grow
I think the github issue does a pretty good job of explaining the semantics:
I think the intended semantics are that it may do a realloc, but not guaranteed to, and all the current implementations do the simpler copying semantics because for on heap allocations the cost should be roughly the same.
Basically you should use grow when you want a new mutable vector of an increased size, starting with the elements of the old vector (and no longer care about the old vector). This is quite useful - for example one could implement GrowVector using MVector and grow.
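As a rough sketch of that last remark, here is what a growable vector built on Data.Vector.Mutable and grow might look like, specialised to IO. The names GrowVec, newGrowVec, pushBackGV and toListGV are invented for this example, and the doubling policy is just one reasonable choice.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import qualified Data.Vector.Mutable as MV

-- A growable vector: the current backing store plus the used length.
data GrowVec a = GrowVec
    { gvData :: IORef (MV.IOVector a)
    , gvLen  :: IORef Int
    }

-- Start with capacity 1 and no elements.
newGrowVec :: IO (GrowVec a)
newGrowVec = GrowVec <$> (MV.new 1 >>= newIORef) <*> newIORef 0

-- Amortized O(1): when full, grow by the current capacity (doubling).
pushBackGV :: GrowVec a -> a -> IO ()
pushBackGV gv x = do
    vec <- readIORef (gvData gv)
    len <- readIORef (gvLen gv)
    vec' <- if len < MV.length vec
        then return vec
        else do
            bigger <- MV.grow vec (MV.length vec)
            writeIORef (gvData gv) bigger   -- forget the old vector
            return bigger
    MV.write vec' len x
    writeIORef (gvLen gv) (len + 1)

-- Read back only the used prefix.
toListGV :: GrowVec a -> IO [a]
toListGV gv = do
    vec <- readIORef (gvData gv)
    len <- readIORef (gvLen gv)
    mapM (MV.read vec) [0 .. len - 1]
```

The old vector is dropped as soon as grow returns, which matches the "no longer care about the old vector" semantics described above.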
[1] The approach is that for every new type of unboxed vector you want to have, you make a data instance that "expands" your type into a fixed number of unboxed arrays (or other unboxed vectors). This is the point of data family - to allow different instantiations of a type to have totally different runtime representations, and to also be extensible (you can add your own data instance if you want).

Generation of infinite list of Ints with MWC-random

I would like to write a function like this:
myFnc :: Gen -> ([Int], Gen)
using MWC Random. General idea is to create first Gen using predefined seed, then generate infinite sequences of Ints and new Gens in absolutely pure manner.
So I started trying to get Gen from seed represented as Int. Documentation says I can do it with initialize function. Very well, let's see, it takes vector of Word32s, so tried this to get at least one random Int:
import System.Random.MWC
import Data.Vector.Generic
rere :: IO ()
rere =
    do gen <- initialize (singleton 42)
       x <- uniform gen :: Int
       print x
but it does not compile. Error:
Couldn't match expected type ‘Int’ with actual type ‘m0 a0’
In a stmt of a 'do' block: x <- uniform gen :: Int
In the expression:
  do { gen <- initialize (singleton 42);
       x <- uniform gen :: Int;
       print x }
I looked at documentation, but it seems infinitely far from my very very simple initial desire...
It seems like I cannot use uniform either, because it returns a value inside a monad, so I don't really know how to make a simple plain list of Ints from all this stuff.
For example, here is what I want to implement, but with System.Random:
import System.Random
mkStdGen 5 -- first: that's how to create generator from given Int
myFnc :: StdGen -> ([Int], StdGen) -- second: desired function
myFnc g = (randoms g, fst . split $ g)
This works.
I've implemented a sample of this on FP Haskell Center. The core of the implementation is:
randoms :: (Variate a, PrimMonad m) => Gen (PrimState m) -> m [a]
randoms gen =
    loop
  where
    loop = return $ unsafeInlinePrim $ do
        x <- uniform gen
        xs <- loop
        return $! x : xs
Note that this has a different type signature than what you asked for. In particular, there's no concept of "split" in mwc-random. Also, getting random numbers in mwc-random is inherently a mutable action, so we need to live in some PrimMonad as well as use unsafe inlining. This is probably safe presuming you never use the Gen provided somewhere else.
However, I think you should try to restructure your program to accept the mutable nature of mwc-random, or switch to a pure random number generator like mersenne-random-pure64.
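In System.Random.MWC, uniform returns a value of type m a (here IO Int), but the annotation in your code constrains the whole expression uniform gen to Int. That is what your error is complaining about.
Annotate the action rather than its result:
x <- uniform gen :: IO Int
Alternatively, drop the annotation from that line and constrain x where it is used, for example print (x :: Int).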
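Putting that together, a minimal version of the question's snippet that type-checks might look like the sketch below. It assumes the unboxed vector type from the vector package for the seed (the question's Data.Vector.Generic import leaves the vector type ambiguous), and the name firstInt is made up for this example.

```haskell
import Data.Word (Word32)
import qualified Data.Vector.Unboxed as U
import System.Random.MWC (initialize, uniform)

-- Draw one Int from a generator seeded with a fixed value.
-- The annotation is on the action (IO Int), not on the result.
firstInt :: IO Int
firstInt = do
    gen <- initialize (U.singleton (42 :: Word32))
    uniform gen :: IO Int
```

Because the seed is fixed, every call to firstInt creates the same generator and therefore returns the same number.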

How can I generalize my sampling framework?

In the context of a stochastic ray tracer, I'd like to decouple the MC integration (path tracing, bidirectional path tracing) from sample generation (uniform random, stratified, poisson, metropolis, ...). Most of this is already implemented, but it's tedious to use. So I ditched that and tried to build something nicer, by splitting sampled computations into two phases: In SampleGen you are allowed to request a random value using the mk1d and mk2d functions, which are then supplied with actual Floats by the sampling algorithm. Those values can be examined in SampleRun to do the actual computation. Here's some code with the interesting bits of a stratified sampler and its use:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Applicative
import Control.Monad.State.Strict
import Control.Monad.Primitive
import System.Random.MWC as MWC
-- allows to construct sampled computations
newtype SampleGen s m a = SampleGen (StateT s m a)
    deriving ( Functor, Applicative, Monad
             , MonadState s, MonadTrans )
-- allows to evaluate sampled computations constructed in SampleGen
newtype SampleRun s m a = SampleRun (StateT s m a)
    deriving ( Functor, Applicative, Monad
             , MonadState s )
-- a sampled computation, parametrized over the generator's state g,
-- the evaluator's state r, the underlying monad m and the result
-- type a
type Sampled g r m a = SampleGen g m (SampleRun r m a)
----------------------
-- Stratified Sampling
----------------------
-- | we just count the number of requested 1D samples
type StratGen = Int
-- | the pre-computed values and a RNG for additional ones
type StratRun m = ([Float], Gen (PrimState m))
-- | specialization of Sampled for stratified sampling
type Stratified m a = Sampled StratGen (StratRun m) m a
-- | gives a sampled value in [0..1), this is kind
-- of the "prime" value, upon which all computations
-- are built
mk1d :: PrimMonad m => Stratified m Float
mk1d = do
    n1d <- get
    put $ n1d + 1
    return $ SampleRun $ do
        fs <- gets fst
        if length fs > n1d
            then return (fs !! n1d)
            else gets snd >>= lift . MWC.uniform
-- | gives a pair of stratified values, should really also
-- be a "prime" value, but here we just construct them
-- from two 1D samples for fun
mk2d :: (Functor m, PrimMonad m) => Stratified m (Float, Float)
mk2d = mk1d >>= \f1 -> mk1d >>= \f2 ->
    return $ (,) <$> f1 <*> f2
-- | evaluates a stratified computation
runStratified
    :: (PrimMonad m)
    => Int -- ^ number of samples
    -> Stratified m a -- ^ computation to evaluate
    -> m [a] -- ^ the values produced, a list of nsamples values
runStratified nsamples (SampleGen c) = do
    (SampleRun x, n1d) <- runStateT c 0
    -- let's just pretend I'd use n1d to actually
    -- compute stratified samples
    gen <- MWC.create
    replicateM nsamples $ evalStateT x ([{- samples would go here -}], gen)
-- estimate Pi by Monte Carlo sampling
-- mcPi :: (Functor m, PrimMonad m) => Sampled g r m Float
mcPi :: (Functor m, PrimMonad m) => Stratified m Float
mcPi = do
    v <- mk2d
    return $ v >>= \(x, y) -> return $ if x * x + y * y < 1 then 4 else 0
main :: IO ()
main = do
    vs <- runStratified 10000 mcPi :: IO [Float]
    print $ sum vs / fromIntegral (length vs)
The missing part here is that in its current form, the mcPi function has the type
mcPi :: (Functor m, PrimMonad m) => Stratified m Float
while it should really be something like
mcPi :: (Functor m, PrimMonad m) => Sampled g r m Float
Admittedly, the four type parameters on Sampled aren't exactly beautiful, but at least something like this would be useful. In summary, I'm looking for something that allows expressing computations like mcPi independently of the sampling algorithm, e.g.:
a uniform random sampler does not need to maintain any state in the SampleGen phase, and needs only a RNG in the SampleRun phase
both the stratified and the poisson disk sampler (and probably others) keep track of the number of 1D and 2D samples needed and precompute them into a vector, and they would be allowed to share a SampleGen and SampleRun implementation, differing only in what happens in between SampleGen and SampleRun (how the vector is actually filled)
a metropolis sampler would use a lazy sample generation technique in its SampleRun phase
I'd like to compile it using GHC, so extensions like MultiParamTypeClasses and TypeFamilies are ok to me, but I did not come up with anything remotely usable.
PS: As motivation, some pretty pictures. And the code in its current form is on GitHub
I'm going to start off with a radically different question, "What should the code look like?", and then work towards the question "How is the sampling framework put together?".
What the code should look like
The definition of mcPi should be
mcPi :: (Num s, Ord s, Num p) => s -> s -> p
mcPi x y = if x * x + y * y < 1 then 4 else 0
The Monte Carlo estimation of pi is that, given two numbers (that happen to come from the interval [0..1)), the estimate is 4, the area of the square enclosing the unit circle, if the point they describe falls within the circle; otherwise it's 0. The Monte Carlo estimation of pi doesn't know anything about computation. It doesn't know if it's going to be repeated, or anything about where the numbers came from. It does know that the numbers should be uniformly distributed over the square, but that's a topic for a different question. The Monte Carlo estimation of pi is just a function from the samples to the estimate.
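Because mcPi is just a function, it can be driven by any source of points at all. As an illustration, the sketch below feeds it the midpoints of a regular n-by-n grid over the unit square instead of random samples; estimatePi is a name made up for this example, and the Ord constraint on mcPi is there because the comparison needs it.

```haskell
-- The estimator itself: a pure function of one sample point.
mcPi :: (Num s, Ord s, Num p) => s -> s -> p
mcPi x y = if x * x + y * y < 1 then 4 else 0

-- Average mcPi over the midpoints of an n-by-n grid on [0,1) x [0,1).
-- No randomness and no monad is involved; the "sampler" is a list.
estimatePi :: Int -> Double
estimatePi n = sum [ mcPi x y | x <- pts, y <- pts ] / fromIntegral (n * n)
  where
    pts :: [Double]
    pts = [ (fromIntegral i + 0.5) / fromIntegral n | i <- [0 .. n - 1] ]
```

Swapping the grid for uniform random, stratified or Poisson disk points changes only where the list comes from, not mcPi.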
Other random things will know that they are part of a random process. A simple random process might be: flip a coin, if the coin comes up "heads", flip it again.
simpleRandomProcess :: (Monad m, MonadCoinFlip m) => m Coin
simpleRandomProcess =
    do
        firstFlip <- flipACoin
        case firstFlip of
            Heads -> flipACoin
            Tails -> return firstFlip
This random process would like to be able to see things like
data Coin = Heads | Tails
class MonadCoinFlip m where
    flipACoin :: m Coin -- The coin should be fair
Random processes may change how much random data they need based on the results of previous experiments. This suggests that we will ultimately need to provide a Monad.
The interface
You would like to "decouple the MC integration (path tracing, bidirectional path tracing) from sample generation (uniform random, stratified, poisson, metropolis, ...)". In your examples, they all want to sample floats. That suggests the following class
class MonadSample m where
    sample :: m Float -- Should be on the interval [0..1)
This is very similar to the existing MonadRandom class, except for two things. A MonadRandom implementation essentially needs to provide a uniformly random Int in some range of its own choosing. Your sampler will provide a Float sample of unknown distribution on the interval [0..1). This is different enough to justify having your own new class.
Due to the upcoming Monad Applicative change, I'm instead going to suggest a different name for this class, SampleSource.
class SampleSource f where
    sample :: f Float -- Should be on the interval [0..1)
sample replaces mk1d in your code. mk2d can also be replaced, again not knowing what the source of the samples will be. sample2d, the replacement for mk2d, will work with any Applicative sample source, it doesn't need it to be a Monad. The reason it doesn't need a Monad is it won't decide how many samples to get, or what else to do, based on the result of samples; the structure of its computation is known ahead of time.
sample2d :: (Applicative f, SampleSource f) => f (Float, Float)
sample2d = (,) <$> sample <*> sample
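To see sample2d run against something concrete, here is a sketch with a deliberately simple sample source that hands out values from a fixed list. ListSource and runListSource are invented for this illustration, and falling back to 0 when the list runs out is an arbitrary choice.

```haskell
class SampleSource f where
    sample :: f Float -- should be on the interval [0..1)

-- Hypothetical source: consumes pre-computed samples from a list.
newtype ListSource a = ListSource { runListSource :: [Float] -> (a, [Float]) }

instance Functor ListSource where
    fmap f (ListSource r) = ListSource $ \fs ->
        let (a, rest) = r fs in (f a, rest)

instance Applicative ListSource where
    pure a = ListSource $ \fs -> (a, fs)
    ListSource rf <*> ListSource ra = ListSource $ \fs ->
        let (f, rest)  = rf fs
            (a, rest') = ra rest
        in (f a, rest')

instance SampleSource ListSource where
    sample = ListSource $ \fs -> case fs of
        (x : xs) -> (x, xs)
        []       -> (0, [])   -- out of samples: fall back to 0

-- Works with any Applicative sample source; no Monad needed.
sample2d :: (Applicative f, SampleSource f) => f (Float, Float)
sample2d = (,) <$> sample <*> sample
```

Running runListSource sample2d [0.25, 0.75] pairs up the first two list elements and returns the empty remainder.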
If you are going to allow the sample source to introduce interactions between dimensions, for example for Poisson disk sampling, you'd need to add that to the interface, either explicitly enumerating the dimensions
class SampleSource f where
    sample :: f Float
    sample2d :: f (Float, Float)
    sample3d :: f (Float, Float, Float)
    sample4d :: f (Float, Float, Float, Float)
or using some vector library.
class SampleSource f where
    sample :: f Float
    samples :: Int -> f (Vector Float)
Implementing the interface
Now, we need to describe how each of your sample sources can be used as a SampleSource. As an example, I'll implement SampleSource for one of the worst sample sources there is.
newtype ZeroSampleSourceT m a = ZeroSampleSourceT {
    unZeroSampleSourceT :: IdentityT m a
} deriving (MonadTrans, Monad, Functor, MonadPlus, Applicative, Alternative, MonadIO)
instance (Monad m) => SampleSource (ZeroSampleSourceT m) where
    sample = return 0
runZeroSampleSourceT :: (Monad m) => ZeroSampleSourceT m a -> m a
runZeroSampleSourceT = runIdentityT . unZeroSampleSourceT
When all Monads are Applicative I'd instead write
instance (Applicative f) => SampleSource (ZeroSampleSourceT f) where
    sample = pure 0
I'll also implement an MWC uniform SampleSource.
-- MonadTrans is not in the deriving list here: the reader environment
-- mentions m, so the newtype cannot be eta-reduced far enough to derive it
newtype MWCUniformSampleSourceT m a = MWCUniformSampleSourceT {
    unMWCUniformSampleSourceT :: ReaderT (Gen (PrimState m)) m a
} deriving (Monad, Functor, MonadPlus, Applicative, Alternative, MonadIO)
runMWCUniformSampleSourceT :: MWCUniformSampleSourceT m a -> (Gen (PrimState m)) -> m a
runMWCUniformSampleSourceT = runReaderT . unMWCUniformSampleSourceT
-- MWC's uniform generates floats on the open-closed interval (0,1]
uniformClosedOpen :: PrimMonad m => Gen (PrimState m) -> m Float
uniformClosedOpen = fmap (\x -> x - 2**(-33)) . uniform
instance (PrimMonad m) => SampleSource (MWCUniformSampleSourceT m) where
    sample = MWCUniformSampleSourceT . ReaderT $ uniformClosedOpen
We won't completely implement Stratified or runStratified, since your example code doesn't contain complete implementations for them.
But I want to know how many samples will be used ahead of time
I'm not sure exactly what you are trying to do with "stratified" sampling. Pre-generating numbers, and using a generator when those run out isn't what I understand stratified sampling to be. If you are going to provide a monadic interface to something, you won't be able to tell ahead of time what will be executed, so you won't be able to predict how many samples a computation will need before you start executing it. If you can settle for only an Applicative interface, then you can test ahead of time how many samples will be needed by the entire computation.
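The Applicative point can be made concrete with a sample source that performs no sampling at all and only counts requests. The sketch below is an illustration under that assumption; the Count type is invented here, and the class and sample2d are repeated so the block stands alone.

```haskell
class SampleSource f where
    sample :: f Float -- should be on the interval [0..1)

sample2d :: (Applicative f, SampleSource f) => f (Float, Float)
sample2d = (,) <$> sample <*> sample

-- Hypothetical counting source: it never produces a value, it only
-- records how many samples a computation would ask for.
newtype Count a = Count { getCount :: Int }

instance Functor Count where
    fmap _ (Count n) = Count n

instance Applicative Count where
    pure _ = Count 0
    Count m <*> Count n = Count (m + n)

instance SampleSource Count where
    sample = Count 1

-- Number of samples needed by six independent 2D samples.
sixPoints :: Int
sixPoints = getCount (sequenceA (replicate 6 sample2d))
```

Count cannot be given a lawful Monad instance, since (>>=) would need an actual sample value to continue; that is precisely why the count is only available for Applicative computations.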
But Poisson Disk sampling needs to know how many points are being sampled ahead of time
If a single sampling can depend on both the number of samples needed and the number of dimensions, like in Poisson Disk sampling, those need to be passed to the sampler when they become known.
class SampleSource f where
    sample :: f Float
    samples :: Int -> f ([Float])
    sampleN :: Int -> f (Vector Float)
    samplesN :: Int -> Int -> f ([Vector Float])
You could generalize this to sampling in arbitrary shapes in arbitrary dimensions, which is what we'd need to do if we took the next leap.
Applicative query language with a Monadic interpreter
We can go very, very elaborate and make an Applicative query language for requests for samples. The language will need to add two features on top of what Applicative already does. It will need to be able to repeat requests and it will need to group requests for samples together to identify which groupings are meaningful. It's motivated by the following code, which wants to get 6 different 2d samples, where sample2d is the same as our first definition.
take 6 (repeat sample2d)
First, we'll need to be able to repeat things over and over. The nicest way to do this would be if we could write, e.g.
take 6 (repeat sample) :: SampleSource f => [f Float]
We'd need a way to go from an [f a] to f [a]. This already exists; it's Data.Traversable's sequenceA, which requires that f be Applicative. So we already get repetition from Applicative.
sequenceA . take 6 . repeat $ sample2d
To group requests together, we'll add a function to mark which groupings are meaningful.
sequenceA . take 6 . repeat . mark $ sample2d
and a class for things that can mark some grouping. If we need more meaning than just groupings - for example if the internal things should be dependent or independent, we'd add it here.
class Mark f where
    mark :: f a -> f a
If everything is going to be very homogeneous, we might add a class for query-able sample sources
class (Applicative f, Mark f, SampleSource f) => QueryableSampleSource f where
Now we will talk about the idea of a monad that has a more-optimized query language. Here we will start using all of those GHC-specific extensions; specifically TypeFamilies.
class MonadQuery m where
    type Query m :: * -> *
    interpret :: (Query m a) -> m a
And finally a class for monad sample sources with an Applicative query language
class (MonadQuery m, QueryableSampleSource (Query m), SampleSource m, Monad m) => MonadSample m where
At this point, we will want to work out what laws these should follow. I'd suggest a few:
interpret sample == sample
interpret (sequenceA a) == sequence (map interpret a)
That is, without a mark, sample sources don't get to do anything terribly special with the queries. This would mean that a query that wants to be subject to Poisson disk's special treatment of 2d points and special treatment of the set of points would need to be marked twice:
mark . sequenceA . take 6 . repeat . mark $ sample2d
The Applicative query language sort-of corresponds with your StratGen type; by having a merely Applicative interface it allows you to look ahead at the structure of the incoming query. The Monad then corresponds with your StratRun type.

The Haskell RNG and state

As a Java person learning Haskell, I was getting used to the new way of thinking about everything, but I've spent half a day trying to implement something with a simple RNG and am getting nowhere. In Java I could create a static RNG and call it with Classname.random.nextInt(10) and it would meet these criteria:
I wouldn't have to keep a reference to the RNG and I could call it ad-hoc (even from inside a loop or a recursive function)
It would produce a new random number every time it was called
It would produce a new set of random numbers every time the project executed
So far in Haskell I'm facing the classic programmer's dilemma - I can have 2/3. I'm still learning and have absolutely no idea about Monads, except that they might be able to help me here.
My Most recent attempt has been this:
getRn :: (RandomGen g) => Int -> Int -> Rand g Int
getRn lo hi = getRandomR (lo,hi)
--EDIT: Trimming my questions so that it's not so long winded, replacing with a summary and then what I ended up doing instead:
After creating a bunch of random cities (for TSP), I mapped over them with a function createEdges that took a city and connected it to the rest of the cities: M.mapWithKey (\x y -> (x,(createEdges y [1..3] makeCountry)))
PROBLEM:
I wanted to replace [1..3] with something random. I.e. I wanted to map randomness (IO) over pure code. This caused no end of confusion for me (see people's attempt to answer me below to get a good sense of my confusion). In fact I'm still not even sure if I'm explaining the problem correctly.
I was getting this type of error: Couldn't match expected type [Int] with actual type IO [Int]
SOLUTION:
So after finding out that what I wanted to do was fundamentally wrong in a functional environment, I decided to change my approach. Instead of generating a list of cities and then applying randomness to connect them, I instead created an [[Int]] where each inner list represented the random edges. Thereby creating my randomness at the start of the process, rather than trying to map randomness over the pure code.
(I posted the final result as my own answer, but SO won't let me accept my own answer yet. Once I've reached that threshold I'll come back and accept it.)
You can work with random numbers without any monads or IO at all if you like.
All you have to know is that, since there is state involved (the internal state of the random number generator), you have to take this state with you.
In my opinion the easiest framework for this is System.Random.
Using this your getRn function could look like this:
getRn :: (RandomGen g) => Int -> Int -> g -> (Int, g)
getRn lo hi g = randomR (lo,hi) g
here you can view g as the state I mentioned above - you put it in and you get another back like this (in ghci):
> let init = mkStdGen 11
> let (myNr, nextGen) = getRn 1 6 init
> myNr
6
> let (myNr, nextGen') = getRn 1 6 nextGen
> myNr
4
I think you can start by using just this - thread the gen around and later when you get all the monad stuff come back and make it a bit easier to write/read.
I don't know the definitions of your data but here is a simple example that uses this technique:
module StackOQuestion where
import System.Random
getRn :: (RandomGen g) => Int -> Int -> g -> (Int, g)
getRn lo hi = randomR (lo,hi)
getRnList :: (RandomGen g) => (g -> (a, g)) -> Int -> g -> ([a], g)
getRnList f n g
    | n <= 0 = ([], g)
    | otherwise = let (ls, g') = getRnList f (n-1) g
                      (a, g'') = f g'
                  in (a:ls, g'')
type City = (Int, Int)
randomCity :: (RandomGen g) => g -> (City, g)
randomCity g =
    let (f, g') = getRn 1 6 g
        (s, g'') = getRn 1 6 g'
    in ((f, s), g'')
randomCities :: (RandomGen g) => (Int, Int) -> g -> ([City], g)
randomCities (minC, maxC) g =
    let (count, g') = getRn minC maxC g
    in getRnList randomCity count g'
and you can test it like this:
> let init = mkStdGen 23
> randomCities (2,6) init
([(4,3),(1,2)],394128088 652912057)
As you can see this creates two Cities (here simply represented as an integer-pair) - for other values of init you will get other answers.
If you look at this the right way, you can see that the beginning of a state monad is already there (the g -> (a, g) part) ;)
PS: mkStdGen is a bit like the Random initialization you know from Java and co. (the part where you usually put in your system clock's tick count). I chose 11 because it was quick to type ;) Of course you will always get the same numbers if you stick with 11, so you will need to initialize this with something from IO, but you can push this back to main and stay pure otherwise if you just pass the g around.
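To make that last point concrete, here is a minimal sketch, assuming the random package. The names rollDice and rollDiceIO are hypothetical helpers invented for illustration: the only IO is obtaining a generator, and everything else stays pure.

```haskell
import System.Random (StdGen, mkStdGen, newStdGen, randomR)

-- Pure: roll n six-sided dice, threading the generator by hand.
-- rollDice is a hypothetical helper, not part of System.Random.
rollDice :: Int -> StdGen -> ([Int], StdGen)
rollDice n g
  | n <= 0    = ([], g)
  | otherwise =
      let (x, g')   = randomR (1, 6) g
          (xs, g'') = rollDice (n - 1) g'
      in  (x : xs, g'')

-- Impure shell: fetch a fresh generator from IO, then call the pure code.
rollDiceIO :: Int -> IO [Int]
rollDiceIO n = do
  g <- newStdGen
  return (fst (rollDice n g))
```

For reproducible tests you can skip IO entirely and call fst (rollDice 5 (mkStdGen 11)), which returns the same five rolls every time.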
I would say if you want to work with random numbers, the easiest thing to do is to use a utility library like Control.Monad.Random.
The more educational, work intensive path is to learn to write your own monad like that. First you want to understand the State monad and get comfortable with it. I think studying this older question (disclaimer: I have an answer there) may be a good starting point for studying this. The next step I would take is to be able to write the State monad on my own.
After that, the next exercise I would try is to write a "utility" monad for random number generation. By "utility" monad what I mean is a monad that basically repackages the standard State monad with an API that makes it easier for that specific task. This is how that Control.Monad.Random package is implemented:
-- | A monad transformer which adds a random number generator to an
-- existing monad.
newtype RandT g m a = RandT (StateT g m a)
Their RandT monad is really just a newtype definition that reuses StateT and adds a few utility functions so that you can concentrate on using random numbers rather than on the state monad itself. So for this exercise, you basically design a random number generation monad with the API you'd like to have, then use the State and Random libraries to implement it.
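As a rough sketch of what that exercise might produce: the version below writes the state-passing monad by hand instead of reusing StateT, so it is not how Control.Monad.Random is implemented. The names Rnd, intBetween, evalRnd and twoDice are all hypothetical; only StdGen, mkStdGen and randomR come from System.Random.

```haskell
import System.Random (StdGen, mkStdGen, randomR)

-- A hand-rolled state monad specialised to carrying a StdGen.
-- Rnd and the helpers below are hypothetical names for illustration.
newtype Rnd a = Rnd { runRnd :: StdGen -> (a, StdGen) }

instance Functor Rnd where
  fmap f (Rnd step) = Rnd $ \g ->
    let (a, g') = step g in (f a, g')

instance Applicative Rnd where
  pure a = Rnd $ \g -> (a, g)
  Rnd stepF <*> Rnd stepA = Rnd $ \g ->
    let (f, g')  = stepF g
        (a, g'') = stepA g'
    in  (f a, g'')

instance Monad Rnd where
  Rnd step >>= k = Rnd $ \g ->
    let (a, g') = step g
    in  runRnd (k a) g'

-- The small API: draw an Int from an inclusive range.
intBetween :: Int -> Int -> Rnd Int
intBetween lo hi = Rnd (randomR (lo, hi))

-- Run a computation from an integer seed, discarding the final generator.
evalRnd :: Rnd a -> Int -> a
evalRnd m seed = fst (runRnd m (mkStdGen seed))

-- Example: a pair of dice, with no generator in sight.
twoDice :: Rnd (Int, Int)
twoDice = do
  a <- intBetween 1 6
  b <- intBetween 1 6
  return (a, b)
```

Once this works, swapping the hand-written instances for a newtype around State StdGen is a natural next step and gets you close to the shape of RandT shown above.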
Edit: After a lot more reading and some extra help from a friend, I finally reduced it to this solution. However I'll keep my original solution in the answer as well just in case the same approach helps another newbie like me (it was a vital part of my learning process as well).
import System.Random (StdGen, newStdGen, randomR, randomRs, split)
-- Use a unique random generator (for testing, use createCitiesWeighted (mkStdGen 123) instead)
generateTemplate = createCitiesWeighted <$> newStdGen
-- create random edges (with weight as pair) by taking a random sized sample of randoms
multiTakePair :: [Int] -> [Int] -> [Int] -> [[(Int,Int)]]
multiTakePair _ [] _ = []
multiTakePair ws (l:ls) is = zip chunka chunkb : multiTakePair remaindera ls remainderb
  where
    (chunkb,remainderb) = splitAt l is
    (chunka,remaindera) = splitAt l ws
-- pure version of utilizing multitake by passing around an RNG using "split"
createCitiesWeighted :: StdGen -> [[(Int,Int)]]
createCitiesWeighted gen = take count result
  where
    (count,g1) = randomR (15,20) gen
    -- split twice so that no generator is used again after it has been split
    (g2,g') = split g1
    (g3,g4) = split g'
    cs = randomRs (0, count - 2) g4
    es = randomRs (3,7) g2
    ws = randomRs (1,10) g3
    result = multiTakePair ws es cs
The original solution -----
As well as #user2407038's insightful comments, my solution relied very heavily on what I read from these two questions:
Sampling sequences of random numbers in Haskell
Random Integer in Haskell
(NB. I was having an issue where I couldn't work out how to randomize how many edges each city would have. #AndrewC provided an awesome response that not only answered that question but massively reduced excess code.)
module TspRandom (
  generateCityTemplate
) where
import Control.Monad (liftM, liftM2) -- promote a pure function to a monad
import Control.Monad.Random (getRandomRs) -- from the MonadRandom package
-- #AndrewC's suggestion
multiTake :: [Int] -> [Int] -> [[Int]]
multiTake [] _ = []
multiTake (l:ls) is = chunk : multiTake ls remainder
  where (chunk,remainder) = splitAt l is
-- Create a list [[Int]] where each inner list is of a random size (3-7)
-- The values inside each inner list max out at 19 (total - 1)
createCities :: IO [[Int]]
createCities = liftM (take 20) $ liftM2 multiTake (getRandomRs (3,7)) (getRandomRs (0, 19))
-- Run the generator
generateCityTemplate :: IO ()
generateCityTemplate = do
  putStrLn "Calculating # Cities"
  x <- createCities
  print x
  return ()
The state monad is actually very simple. It is just a function from a state to a value and a new state, or:
newtype State s a = State { runState :: s -> (a, s) }
In fact, this is exactly what the Rand monad is. It isn't necessary to understand the mechanics of State to use Rand. You shouldn't be evaluating the Rand inside of IO, just use it directly, using the same do notation you have been using for IO. do notation works for any monad.
createCities :: Rand StdGen Int
createCities = getRn minCities maxCities
x :: Cities -> X
x = ...
func :: Rand StdGen X
func = do
  cities <- createCities
  return (x cities)
-- also valid
func = x <$> createCities
func = createCities >>= return . x
You can't write getConnections like you have written it. You must do the following:
getConnections :: City -> Country -> Rand StdGen [Int]
getConnections c country = do
  edgeCount <- createEdgeCount
  fromIndecies [] edgeCount (citiesExcludeSelf c country)
Any function which calls getConnections will have to also return a value of type Rand StdGen x. You can only get rid of it once you have written the entire algorithm and want to run it.
Then, you can run the result using evalRandIO func, or, if you want to test some algorithm and you want to give it the same inputs on every test, you can use evalRand func (mkStdGen 12345), where 12345, or any other number, is your seed value.
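A small sketch of both ways of running a Rand computation, assuming the MonadRandom package. The names cityCount, edgeCounts, plan, planWithSeed and planIO are hypothetical stand-ins, not the asker's actual functions, and the ranges are borrowed from elsewhere in this thread purely for illustration.

```haskell
import Control.Monad.Random (Rand, evalRand, evalRandIO, getRandomR)
import System.Random (StdGen, mkStdGen)

-- Hypothetical stand-in for the question's createCities.
cityCount :: Rand StdGen Int
cityCount = getRandomR (15, 20)

-- Hypothetical: pick a number of edges (3 to 7) for each of n cities.
edgeCounts :: Int -> Rand StdGen [Int]
edgeCounts n = mapM (const (getRandomR (3, 7))) [1 .. n]

-- Everything that uses randomness stays inside Rand.
plan :: Rand StdGen (Int, [Int])
plan = do
  n  <- cityCount
  es <- edgeCounts n
  return (n, es)

-- Reproducible: the same seed always yields the same plan.
planWithSeed :: Int -> (Int, [Int])
planWithSeed seed = evalRand plan (mkStdGen seed)

-- Uses the global generator, so results differ between runs.
planIO :: IO (Int, [Int])
planIO = evalRandIO plan
```

Note that Rand only disappears from the types in planWithSeed and planIO, which are the points where the whole computation is finally run.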

How to generate random array in Haskell?

How can I generate a random array using Data.Array?
I have a function that gives me a random number:
randomNumber :: (Random r) => r -> r -> IO r
randomNumber a b = getStdRandom (randomR (a,b))
And then I'm trying to use a function from Data.Array to generate the list:
assocs $ array (1,100) [(i,i) | i <- (randomNumber 1 10)]
I know that the result type of randomNumber is in IO. Is there any method to convert IO Int -> Int? Or do I need to use other methods to get a random list? Should I combine these functions with the bind operator or in a do block?
You should use pure functions to generate a random list from a generator, and obtain the generator with getStdGen:
randomList :: Int -> Int -> IO [Int]
randomList a b = getStdGen >>= return . randomRs (a,b)
The function that you need is randomRs; getStdGen supplies the standard generator to pass to it.
The function randomList first gets the standard generator with getStdGen and then passes it to randomRs. Note that randomList can be rewritten with the generator parameter made explicit:
randomList a b = getStdGen >>= \gen -> return (randomRs (a,b) gen)
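One thing to watch for: randomRs returns an infinite list, and getStdGen reads the global generator without advancing it. The sketch below, assuming the random package, uses two hypothetical helpers (randomListN and freshRandomListN) to show a finite variant and the newStdGen alternative.

```haskell
import System.Random (getStdGen, newStdGen, randomRs)

-- Finite variant of randomList: take n values from the infinite stream.
-- getStdGen only reads the global generator, so repeated calls return the
-- same numbers unless something else updates that generator in between.
randomListN :: Int -> Int -> Int -> IO [Int]
randomListN n a b = take n . randomRs (a, b) <$> getStdGen

-- newStdGen splits the global generator and stores one half back,
-- so every call starts from a different generator.
freshRandomListN :: Int -> Int -> Int -> IO [Int]
freshRandomListN n a b = take n . randomRs (a, b) <$> newStdGen
```

If you call the list function more than once in a program, the newStdGen version is usually the one you want.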
Since #mariop's answer covers lists rather than arrays, I'll pick up from there and try to explain the nature of Haskell randomness a little more.
(if you're not interested in theory, skip to the (tl;dr) section)
First, let's choose a signature for our presumed function. I'll assume that you need a plain array (as in C or Java), indexed by consecutive natural numbers (if my guess is wrong, please correct me).
As you may know, all Haskell functions are pure and deterministic, so each function must always return the same result for the same arguments. That's not the case for randomness, of course. The solution is to use pseudorandom values, where we have a generator. A generator is a complicated function that has an internal hidden state called the seed, and can produce a value and a new generator with a new seed (which can then produce a new (value, generator) pair, and so on). A good generator is built in such a way that the next value cannot be predicted from the previous value (when we don't know the seed), so the values appear random to the user.
In fact, the default random implementations in most languages are pseudorandom, because "true" randomness (which gets its values from sources of "natural" randomness, called entropy, such as CPU temperature) is computationally expensive.
All so-called random functions in Haskell are dealing with the generator in some way. If you look at methods from the Random typeclass, they are divided in two groups:
Those which take the random generator explicitly: randomR, random and so on. You can build an explicit generator, initialized with a seed, with mkStdGen (or even make your own).
Those which work in the IO monad: randomIO, randomRIO. They actually get the generator from the environment "carried" within the IO monad (with getStdRandom), and give it to a function from the first group.
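The two groups side by side, as a minimal sketch assuming the random package (pureRoll and ioRoll are hypothetical names):

```haskell
import System.Random (StdGen, mkStdGen, randomR, randomRIO)

-- Group 1: the generator is an explicit argument and a new one is returned.
-- The same generator always gives the same result.
pureRoll :: StdGen -> (Int, StdGen)
pureRoll = randomR (1, 6)

-- Group 2: the generator is the global one kept in IO,
-- so no generator appears in the type.
ioRoll :: IO Int
ioRoll = randomRIO (1, 6)
```

For example, fst (pureRoll (mkStdGen 7)) evaluates to the same number every time, while ioRoll may differ from call to call.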
So, we can organize our function in either way:
--Arguments are generator, array size, min and max bound
generateArray :: (RandomGen g, Random r) => g -> Int -> r -> r -> Array Int r
or
--Arguments are array size, min and max bound
generateArray :: Random r => Int -> r -> r -> IO (Array Int r)
Because Haskell is lazy, there is no need to make a fixed set of random values — we can make an infinite one and take as many values as we need. The infinite list of random bounded values is produced by the randomRs function.
(tl;dr)
If the array is indexed consecutively, the easiest way is to build it from a plain list of values rather than from an assocs list of (key, value) pairs:
generateArray gen size min max =
  listArray (0, size - 1) $ randomRs (min, max) gen
or
generateArray size min max =
  getStdGen >>= return . listArray (0, size - 1) . randomRs (min, max)
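Putting the pieces together, here is a self-contained sketch with imports and the signatures chosen earlier, assuming the array and random packages. Two things differ from the snippets above and are my own choices: the IO variant is renamed generateArrayIO (a hypothetical name) so that both can live in one module, and it uses newStdGen instead of getStdGen so that repeated calls give different arrays.

```haskell
import Data.Array (Array, bounds, elems, listArray)
import System.Random (Random, RandomGen, mkStdGen, newStdGen, randomRs)

-- Pure version: arguments are generator, array size, min and max bound.
-- listArray only consumes as many list elements as the bounds require,
-- so passing the infinite list from randomRs is fine.
generateArray :: (RandomGen g, Random r) => g -> Int -> r -> r -> Array Int r
generateArray gen size lo hi =
  listArray (0, size - 1) (randomRs (lo, hi) gen)

-- IO version: arguments are array size, min and max bound.
-- Uses newStdGen (an assumption, see above) to get a fresh generator.
generateArrayIO :: Random r => Int -> r -> r -> IO (Array Int r)
generateArrayIO size lo hi = do
  gen <- newStdGen
  return (generateArray gen size lo hi)
```

In ghci, elems (generateArray (mkStdGen 1) 5 1 10 :: Array Int Int) lists five values between 1 and 10, and bounds on the same array gives (0,4).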

Resources