I am fairly new to Haskell and I'm always confused when it comes to dealing with random values. This time I'm trying to create a 5x5 2d-array in which every cell contains a random value between 4 options.
data Content = Bomb | One | Two | Three deriving (Eq, Show)
data Board = Array (Int, Int) Cell
data Cell = Cell {
content :: Content,
state :: CellState,
notes :: Notes
}
type Notes = [String]
type CellState = Bool
then in the main function I typed
main :: IO()
main = do
let cellMatrix = createMatrix
print cellMatrix
How can I create a function that creates a Board, and where do I need to do all the g<-newStdGen stuff?
Any advice would be greatly appreciated!
Focusing solely on the randomness aspect of the problem here, we see that the System.Random module in its version 1.2 provides an uniformR function that can produce one pseudo-random integer value from a specified range. Typically, we would need 5*5=25 values from the (0,3) range.
Side Note: Beware the changes between Random v1.1 and v1.2 are massive ones.
Alternatively, the same module provides a randomRs function, but that one provides an infinite list of integer values, hence cannot return an updated value of the generator. That might not be what you require.
We can use as our workhorse a function that returns a finite count of random integer values, together with an updated generator:
import System.Random
getManyInts :: RandomGen g => g -> Int -> (Int,Int) -> ([Int],g)
getManyInts g0 count range =
if (count <= 0) then ([], g0)
else let
(v,g1) = uniformR range g0
(vs,gf) = getManyInts g1 (count-1) range
in
(v:vs, gf)
In order to navigate from integer to Enum values, we will use a slightly improved Content type:
data Content = Bomb | One | Two | Three
deriving (Eq, Show, Enum, Bounded)
At that point, we can write a small test program that provides 5*5=25 random values of Content type:
main :: IO ()
main = do
g0 <- newStdGen
let count = 5*5
(loC, hiC) = (minBound :: Content, maxBound :: Content)
intRange = (fromEnum loC, fromEnum hiC)
(xs, g1) = getManyInts g0 count intRange
cs = (map toEnum xs) :: [Content]
putStrLn $ "contents: " ++ (show cs)
Test program output:
contents: [Two,Three,One,One,One,Two,One,Two,One,Bomb,Two,Two,One,Three,Bomb,Two,Two,Bomb,Three,One,One,Bomb,One,One,Bomb]
Addendum:
A better, polymorphic function:
Next, we can provide a function that returns a list of random values for any similar type:
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE ExplicitForAll #-}
getManyEnums :: forall g e. (RandomGen g, Bounded e, Enum e) => g -> Int -> ([e],g)
getManyEnums g0 count = (map toEnum xs, g1)
where
intRange = (fromEnum (minBound :: e), fromEnum (maxBound :: e))
(xs, g1) = getManyInts g0 count intRange
This function can be specialized as required. For example:
Testing under ghci:
$ ghci
GHCi, version 8.10.5: https://www.haskell.org/ghc/ :? for help
...
λ>
λ> :load q73268595.hs
[1 of 1] Compiling Main ( q73268595.hs, interpreted )
Ok, one module loaded.
λ>
λ> getManyContents = getManyEnums :: StdGen -> Int -> ([Content],StdGen)
λ>
λ> g0 <- newStdGen
λ>
λ> (cs,gf) = getManyContents g0 10
λ>
λ> cs
[One,One,Two,One,One,Three,Three,Three,Bomb,One]
λ>
λ> :q
Leaving GHCi.
$
Related
I'm a newbie so I'm having problems doing this:P
Currently I have tried this:
xs1 :: RandomGen g => Int -> g -> [[Char]]
xs1 n = sequence $ replicate n $ randomRs ('!', '~' ::Char)
but I can't give this string to my function:
fun1 :: (Eq a, Num a) => [a] -> [Char] -> [Char]
fun1 a xs1=[if (a!!n==1) then (xs1!!n) else b | n <- [0, 1,2,3,4]]
any help would be greatly appreciated
IMO the easiest way for a beginner to use random stuff is randomIO or randomRIO - here is an example printing 5 random characters (in the range '!' to '~'):
import Control.Monad (replicateM)
import System.Random (Random(randomRIO))
randomString :: Int -> IO String
randomString len = replicateM len $ randomRIO ('!', '~')
select :: (Num f, Eq f) => [f] -> String -> String
select =
zipWith (\f c -> if f == 1 then c else ' ')
main :: IO ()
main = do
-- there will be a warning without the `:: Int`
s <- select [1 :: Int,0,1,0,1] <$> randomString 5
putStrLn s
I'm using randomRIO here because using just randomIO will give you random characters from all over the place most likely all unprintable ;)
I don't know what you are trying to do with fun1 but I'll gone edit if you make it clear
How can I randomly generate a string 5 symbols long ?
Due to the use of randomRs, your xs1 function returns a list of infinitely long strings.
Also, as xs1 is a function not a list, you cannot really index it thru the !! operator.
Please please: paste the full text of the error message in your question, rather than just writing:
“I can't give this string to my function:”.
In order to generate random strings, you need two things:
a generator, typically returned by mkStdGen or mkTFGen
a monadic action object with the appropriate return type
Let's try to do that under the ghci Haskell interpreter:
$ ghci
GHCi, version 8.8.4: https://www.haskell.org/ghc/ :? for help
λ>
λ> import System.Random
λ> import Control.Monad.Random
λ>
So let's get a generator first:
λ>
λ> seed = 42
λ> gen1 = mkStdGen seed
λ>
Then let's write the action object:
λ>
λ> act5 = sequence $ replicate 5 (getRandomR ('!', '~' ::Char))
λ>
λ> :type act5
act5 :: MonadRandom m => m [Char]
λ>
The action object is essentially a wrapper around a transition function that takes a generator and returns some (output value, new generator) pair.
Finally, we combine these two things thru the runRand :: Rand g a -> g -> (a, g) library function, which returns both the value you want and an updated generator:
λ>
λ> (str1, gen2) = runRand act5 gen1
λ>
λ> str1
"D6bK/"
λ>
And we can even get a second pseudo-random string, courtesy of the updated generator:
λ>
λ> (str2, gen3) = runRand act5 gen2
λ>
λ> str2
"|}#5h"
λ>
And so on ... As you can see, it is not really necessary to use the IO monad for this.
A slight generalization:
Now, say you need 6 such random strings:
λ>
λ> act5rep = sequence (replicate 6 act5)
λ>
λ> :type act5rep
act5rep :: MonadRandom m => m [[Char]]
λ>
λ> (strs, newGen) = runRand act5rep gen1
λ>
λ> strs
["D6bK/","|}#5h","0|qSQ","h4:.1","3+e-}","}eu,I"]
λ>
My code aims to create a word search puzzle. There is a data called Orientation representing the direction of each word in the puzzle.
data Orientation =
Forward | Back | Up | Down | UpForward | UpBack | DownForward | DownBack
deriving (Eq, Ord, Show, Read)
Now given a input of strings which is [String], I want to randomly assign each string an orientation like [(Orientation, String)]
assignWordDir :: [String] -> [(Orientation, String)]
assignWordDir [] = []
assignWordDir (s:strs) = (ori, s) : assignWordDir
where ori = pickOri [Forward, Back, Up, Down, UpForward, UpBack, DownForward, DownBack]
pickOri :: [a] -> IO a
pickOri xs = do
i <- randomRIO (0, len)
pure $ xs !! i
where len = length xs - 1
I cannot compile because the output of pickOri is IO Orientation, is there any suggestions on how to modify my code? Thanks a lot
Couldn't match expected type ‘[(IO Orientation, String)]’
with actual type ‘[String] -> [(Orientation, String)]’
You might consider modifying the functions so that they stay pure by taking a RandomGen parameter. The pickOri function, for example, might be modified thusly:
pickOri :: RandomGen g => g -> [a] -> (a, g)
pickOri rnd xs =
let len = length xs - 1
(i, g) = randomR (0, len) rnd
in (xs !! i, g)
It's necessary to return the new RandomGen value g together with the selected list element, so that it'll generate another pseudo-random number the next time around.
Likewise, you can modify assignWordDir like this:
assignWordDir :: RandomGen g => g -> [b] -> [(Orientation, b)]
assignWordDir _ [] = []
assignWordDir rnd (s:strs) = (ori, s) : assignWordDir g strs
where (ori, g) =
pickOri rnd [Forward, Back, Up, Down, UpForward, UpBack, DownForward, DownBack]
Notice that when recursing into to assignWordDir, the recursive function call uses the g it receives from pickOri.
You can use mkStdGen or newStdGen to produce RandomGen values. Here's an example using newStdGen:
*Q65132918> rnd <- newStdGen
*Q65132918> assignWordDir rnd ["foo", "bar", "baz"]
[(UpBack,"foo"),(Up,"bar"),(UpBack,"baz")]
*Q65132918> assignWordDir rnd ["foo", "bar", "baz"]
[(UpBack,"foo"),(Up,"bar"),(UpBack,"baz")]
Notice that when you use the same RandomGen value, you get the same sequence. That's because assignWordDir is a pure function, so that's expected.
You can, however, produce a new random sequence by creating or getting a new StdGen value:
*Q65132918> rnd <- newStdGen
*Q65132918> assignWordDir rnd ["foo", "bar", "baz"]
[(Up,"foo"),(Up,"bar"),(Forward,"baz")]
If you want to play with this in a compiled module, you can keep these functions as presented here, and then compose them with a newStdGen-generated StdGen in the main entry point.
i'm trying to write a function that for n gives matrix n*n with unique rows and columns (latin square).
I got function that gives my list of strings "1" .. "2" .. "n"
numSymbol:: Int -> [String]
I tried to generate all permutations of this, and them all n-length tuples of permutations, and them check if it is unique in row / columns. But complexity (n!)^2 works perfect for 2 and 3, but with n > 3 it takes forever. It is possible to build latin square from permutations directly, for example from
permutation ( numSymbol 3) = [["1","2","3"],["1","3","2"],["2","1","3"],["2","3","1"],["3","1","2"],["3","2","1"]]
get
[[["1","2","3",],["2","1","3"],["3","1","2"]] , ....]
without generating list like [["1",...],["1",...],...], when we know first element disqualify it ?
Note: since we can easily take a Latin square that's been filled with numbers from 1 to n and re-label it with anything we want, we can write code that uses integer symbols without giving anything away, so let's stick with that.
Anyway, the stateful backtracking/nondeterministic monad:
type StateList s = StateT s []
is helpful for this sort of problem.
Here's the idea. We know that every symbol s is going to appear exactly once in each row r, so we can represent this with an urn of all possible ordered pairs (r,s):
my_rs_urn = [(r,s) | r <- [1..n], s <- [1..n]]
Similarly, as every symbol s appears exactly once in each column c, we can use a second urn:
my_cs_urn = [(c,s) | c <- [1..n], s <- [1..n]]
Creating a Latin square is matter of filling in each position (r,c) with a symbol s by removing matching balls (r,s) and (c,s) (i.e., removing two balls, one from each urn) so that every ball is used exactly once. Our state will be the content of the urns.
We need backtracking because we might reach a point where for a particular position (r,c), there is no s such that (r,s) and (c,s) are both still available in their respective urns. Also, a pleasant side-effect of list-based backtracking/nondeterminism is that it'll generate all possible Latin squares, not just the first one it finds.
Given this, our state will look like:
type Urn = [(Int,Int)]
data S = S
{ size :: Int
, rs :: Urn
, cs :: Urn }
I've included the size in the state for convenience. It won't ever be modified, so it actually ought to be in a Reader instead, but this is simpler.
We'll represent a square by a list of cell contents in row-major order (i.e., the symbols in positions [(1,1),(1,2),...,(1,n),(2,1),...,(n,n)]):
data Square = Square
Int -- square size
[Int] -- symbols in row-major order
deriving (Show)
Now, the monadic action to generate latin squares will look like this:
type M = StateT S []
latin :: M Square
latin = do
n <- gets size
-- for each position (r,c), get a valid symbol `s`
cells <- forM (pairs n) (\(r,c) -> getS r c)
return $ Square n cells
pairs :: Int -> [(Int,Int)]
pairs n = -- same as [(x,y) | x <- [1..n], y <- [1..n]]
(,) <$> [1..n] <*> [1..n]
The worker function getS picks an s so that (r,s) and (c,s) are available in the respective urns, removing those pairs from the urns as a side effect. Note that getS is written non-deterministically, so it'll try every possible way of picking an s and associated balls from the urns:
getS :: Int -> Int -> M Int
getS r c = do
-- try each possible `s` in the row
s <- pickSFromRow r
-- can we put `s` in this column?
pickCS c s
-- if so, `s` is good
return s
Most of the work is done by the helpers pickSFromRow and pickCS. The first, pickSFromRow picks an s from the given row:
pickSFromRow :: Int -> M Int
pickSFromRow r = do
balls <- gets rs
-- "lift" here non-determinstically picks balls
((r',s), rest) <- lift $ choices balls
-- only consider balls in matching row
guard $ r == r'
-- remove the ball
modify (\st -> st { rs = rest })
-- return the candidate "s"
return s
It uses a choices helper which generates every possible way of pulling one element out of a list:
choices :: [a] -> [(a,[a])]
choices = init . (zipWith f <$> inits <*> tails)
where f a (x:b) = (x, a++b)
f _ _ = error "choices: internal error"
The second, pickCS checks if (c,s) is available in the cs urn, and removes it if it is:
pickCS :: Int -> Int -> M ()
pickCS c s = do
balls <- gets cs
-- only continue if the required ball is available
guard $ (c,s) `elem` balls
-- remove the ball
modify (\st -> st { cs = delete (c,s) balls })
With an appropriate driver for our monad:
runM :: Int -> M a -> [a]
runM n act = evalStateT act (S n p p)
where p = pairs n
this can generate all 12 Latin square of size 3:
λ> runM 3 latin
[Square 3 [1,2,3,2,3,1,3,1,2],Square 3 [1,2,3,3,1,2,2,3,1],...]
or the 576 Latin squares of size 4:
λ> length $ runM 4 latin
576
Compiled with -O2, it's fast enough to enumerate all 161280 squares of size 5 in a couple seconds:
main :: IO ()
main = print $ length $ runM 5 latin
The list-based urn representation above isn't very efficient. On the other hand, because the lengths of the lists are pretty small, there's not that much to be gained by finding more efficient representations.
Nonetheless, here's complete code that uses efficient Map/Set representations tailored to the way the rs and cs urns are used. Compiled with -O2, it runs in constant space. For n=6, it can process about 100000 Latin squares per second, but that still means it'll need to run for a few hours to enumerate all 800 million of them.
{-# OPTIONS_GHC -Wall #-}
module LatinAll where
import Control.Monad.State
import Data.List
import Data.Set (Set)
import qualified Data.Set as Set
import Data.Map (Map, (!))
import qualified Data.Map as Map
data S = S
{ size :: Int
, rs :: Map Int [Int]
, cs :: Set (Int, Int) }
data Square = Square
Int -- square size
[Int] -- symbols in row-major order
deriving (Show)
type M = StateT S []
-- Get Latin squares
latin :: M Square
latin = do
n <- gets size
cells <- forM (pairs n) (\(r,c) -> getS r c)
return $ Square n cells
-- All locations in row-major order [(1,1),(1,2)..(n,n)]
pairs :: Int -> [(Int,Int)]
pairs n = (,) <$> [1..n] <*> [1..n]
-- Get a valid `s` for position `(r,c)`.
getS :: Int -> Int -> M Int
getS r c = do
s <- pickSFromRow r
pickCS c s
return s
-- Get an available `s` in row `r` from the `rs` urn.
pickSFromRow :: Int -> M Int
pickSFromRow r = do
urn <- gets rs
(s, rest) <- lift $ choices (urn ! r)
modify (\st -> st { rs = Map.insert r rest urn })
return s
-- Remove `(c,s)` from the `cs` urn.
pickCS :: Int -> Int -> M ()
pickCS c s = do
balls <- gets cs
guard $ (c,s) `Set.member` balls
modify (\st -> st { cs = Set.delete (c,s) balls })
-- Return all ways of removing one element from list.
choices :: [a] -> [(a,[a])]
choices = init . (zipWith f <$> inits <*> tails)
where f a (x:b) = (x, a++b)
f _ _ = error "choices: internal error"
-- Run an action in the M monad.
runM :: Int -> M a -> [a]
runM n act = evalStateT act (S n rs0 cs0)
where rs0 = Map.fromAscList $ zip [1..n] (repeat [1..n])
cs0 = Set.fromAscList $ pairs n
main :: IO ()
main = do
print $ runM 3 latin
print $ length (runM 4 latin)
print $ length (runM 5 latin)
Somewhat remarkably, modifying the program to produce only reduced Latin squares (i.e., with symbols [1..n] in order in both the first row and the first column) requires changing only two functions:
-- All locations in row-major order, skipping first row and column
-- i.e., [(2,2),(2,3)..(n,n)]
pairs :: Int -> [(Int,Int)]
pairs n = (,) <$> [2..n] <*> [2..n]
-- Run an action in the M monad.
runM :: Int -> M a -> [a]
runM n act = evalStateT act (S n rs0 cs0)
where -- skip balls [(1,1)..(n,n)] for first row
rs0 = Map.fromAscList $ map (\r -> (r, skip r)) [2..n]
-- skip balls [(1,1)..(n,n)] for first column
cs0 = Set.fromAscList $ [(c,s) | c <- [2..n], s <- skip c]
skip i = [1..(i-1)]++[(i+1)..n]
With these modifications, the resulting Square will include symbols in row-major order but skipping the first row and column. For example:
λ> runM 3 latin
[Square 3 [3,1,1,2]]
means:
1 2 3 fill in question marks 1 2 3
2 ? ? =====================> 2 3 1
3 ? ? in row-major order 3 1 2
This is fast enough to enumerate all 16,942,080 reduced Latin squares of size 7 in a few minutes:
$ stack ghc -- -O2 -main-is LatinReduced LatinReduced.hs && time ./LatinReduced
[1 of 1] Compiling LatinReduced ( LatinReduced.hs, LatinReduced.o )
Linking LatinReduced ...
16942080
real 3m9.342s
user 3m8.494s
sys 0m0.848s
I am struggling on Real World Haskell Chapter 11 quickCheck generator implementation for a an algebraic data type.
Following the book implementation (which was published in 2008), I came up with the following:
-- file: ch11/Prettify2.hs
module Prettify2(
Doc(..)
) where
data Doc = Empty
| Char Char
| Text String
| Line
| Concat Doc Doc
| Union Doc Doc
deriving (Show, Eq)
And my Arbitrary implementation:
-- file: ch11/Arbitrary.hs
import System.Random
import Test.QuickCheck.Gen
import qualified Test.QuickCheck.Arbitrary
class Arbitrary a where
arbitrary :: Gen a
-- elements' :: [a] => Gen a {- Expected a constraint, but ‘[a]’ has kind ‘*’ -}
-- choose' :: Random a => (a, a) -> Gen a
-- oneof' :: [Gen a] -> a
data Ternary = Yes
| No
| Unknown
deriving(Eq, Show)
instance Arbitrary Ternary where
arbitrary = do
n <- choose (0, 2) :: Gen Int
return $ case n of
0 -> Yes
1 -> No
_ -> Unknown
instance (Arbitrary a, Arbitrary b) => Arbitrary (a, b) where
arbitrary = do
x <- arbitrary
y <- arbitrary
return (x, y)
instance Arbitrary Char where
arbitrary = elements (['A'..'Z'] ++ ['a' .. 'z'] ++ " ~!##$%^&*()")
I tried the two following implementation with no success:
import Prettify2
import Control.Monad( liftM, liftM2 )
instance Arbitrary Doc where
arbitrary = do
n <- choose (1,6) :: Gen Int
case n of
1 -> return Empty
2 -> do x <- arbitrary
return (Char x)
3 -> do x <- arbitrary
return (Text x)
4 -> return Line
5 -> do x <- arbitrary
y <- arbitrary
return (Concat x y)
6 -> do x <- arbitrary
y <- arbitrary
return (Union x y)
instance Arbitrary Doc where
arbitrary =
oneof [ return Empty
, liftM Char arbitrary
, liftM Text arbitrary
, return Line
, liftM2 Concat arbitrary arbitrary
, liftM2 Union arbitrary arbitrary ]
But it doesn't compile since No instance for (Arbitrary String)
I tried then to implement the instance for Arbitrary String in the following ways:
import qualified Test.QuickCheck.Arbitrary but it does not implement Arbitrary String neither
installing Test.RandomStrings hackage link
instance Arbitrary String where
arbitrary = do
n <- choose (8, 16) :: Gen Int
return $ randomWord randomASCII n :: Gen String
With the following backtrace:
$ ghci
GHCi, version 7.10.3: http://www.haskell.org/ghc/ :? for help
Prelude> :l Arbitrary.hs
[1 of 2] Compiling Prettify2 ( Prettify2.hs, interpreted )
[2 of 2] Compiling Main ( Arbitrary.hs, interpreted )
Arbitrary.hs:76:9:
The last statement in a 'do' block must be an expression
return <- randomWord randomASCII n :: Gen String
Failed, modules loaded: Prettify2
Would you have any good suggestion about how to implement this particular generator and - more in general - how to proceed in these cases?
Thank you in advance
Don't define a new Arbitrary type class, import Test.QuickCheck instead. It defines most of these instances for you. Also be careful about the version of quickcheck, RWH assumes version 1.
The resulting full implementation will be:
-- file: ch11/Arbitrary.hs
import Test.QuickCheck
import Prettify2
import Control.Monad( liftM, liftM2 )
data Ternary = Yes
| No
| Unknown
deriving(Eq, Show)
instance Arbitrary Ternary where
arbitrary = do
n <- choose (0, 2) :: Gen Int
return $ case n of
0 -> Yes
1 -> No
_ -> Unknown
instance Arbitrary Doc where
arbitrary =
oneof [ return Empty
, liftM Char arbitrary
, liftM Text arbitrary
, return Line
, liftM2 Concat arbitrary arbitrary
, liftM2 Union arbitrary arbitrary ]
I am trying to write a very large amount of data to a file in constant memory.
import qualified Data.ByteString.Lazy as B
{- Creates and writes num grids of dimensions aa x aa -}
writeGrids :: Int -> Int -> IO ()
writeGrids num aa = do
rng <- newPureMT
let (grids,shuffleds) = createGrids rng aa
createDirectoryIfMissing True "data/grids/"
B.writeFile (gridFileName num aa)
(encode (take num grids))
B.writeFile (shuffledFileName num aa)
(encode (take num shuffleds))
However this consumes memory proportional to the size of num. I know createGrids is a sufficiently lazy function because I have tested it by appending error "not lazy enough" (as suggested by the Haskell wiki here) to the end of the lists it returns and no errors are raised. take is a lazy function that is defined in Data.List. encode is also a lazy function defined in Data.Binary. B.writeFile is defined in Data.ByteString.Lazy.
Here is the complete code so you can execute it:
import Control.Arrow (first)
import Data.Binary
import GHC.Float (double2Float)
import System.Random (next)
import System.Random.Mersenne.Pure64 (PureMT, newPureMT, randomDouble)
import System.Random.Shuffle (shuffle')
import qualified Data.ByteString.Lazy as B
main :: IO ()
main = writeGrids 1000 64
{- Creates and writes num grids of dimensions aa x aa -}
writeGrids :: Int -> Int -> IO ()
writeGrids num aa = do
rng <- newPureMT
let (grids,shuffleds) = createGrids rng aa
B.writeFile "grids.bin" (encode (take num grids))
B.writeFile "shuffleds.bin" (encode (take num shuffleds))
{- a random number generator, dimension of grids to make
returns a pair of lists, the first is a list of grids of dimensions
aa x aa, the second is a list of the shuffled grids corresponding to the first list -}
createGrids :: PureMT -> Int -> ([[(Float,Float)]],[[(Float,Float)]])
createGrids rng aa = (grids,shuffleds) where
rs = randomFloats rng
grids = map (getGridR aa) (chunksOf (2 * aa * aa) rs)
shuffleds = shuffler (aa * aa) rng grids
{- length of each grid, a random number generator, a list of grids
returns a the list with each grid shuffled -}
shuffler :: Int -> PureMT -> [[(Float,Float)]] -> [[(Float,Float)]]
shuffler n rng (xs:xss) = shuffle' xs n rng : shuffler n (snd (next rng)) xss
shuffler _ _ [] = []
{- divides list into chunks of size n -}
chunksOf :: Int -> [a] -> [[a]]
chunksOf n = go
where go xs = case splitAt n xs of
(ys,zs) | null ys -> []
| otherwise -> ys : go zs
{- dimension of grid, list of random floats [0,1]
returns a list of (x,y) points of length n^2 such that all
points are in the range [0,1] and the points are a randomly
perturbed regular grid -}
getGridR :: Int -> [Float] -> [(Float,Float)]
getGridR n rs = pts where
nn = n * n
(irs,jrs) = splitAt nn rs
n' = fromIntegral n
grid = [ (p,q) | p <- [0..n'-1], q <- [0..n'-1] ]
pts = zipWith (\(p,q) (ir,jr) -> ((p+ir)/n',(q+jr)/n')) grid (zip irs jrs)
{- an infinite list of random floats in range [0,1] -}
randomFloats :: PureMT -> [Float]
randomFloats rng = let (d,rng') = first double2Float (randomDouble rng)
in d : randomFloats rng'
The required packages are:
, bytestring
, binary
, random
, mersenne-random-pure64
, random-shuffle
Two reasons for the memory usage:
First, Data.Binary.encode doesn't seem to run in constant space. The following program uses 910 MB memory:
import Data.Binary
import qualified Data.ByteString.Lazy as B
len = 10000000 :: Int
main = B.writeFile "grids.bin" $ encode [0..len]
If we leave a 0 out from len we get 97 MB memory usage.
In contrast, the following program uses 1 MB:
import qualified Data.ByteString.Lazy.Char8 as B
main = B.writeFile "grids.bin" $ B.pack $ show [0..(1000000::Int)]
Second, in your program shuffleds contains references to contents of grids, which prevents garbage collection of grids. So when we print grids, we also evaluate it and then it has to sit in memory until we finish printing shuffleds. The following version of your program still consumes lots of memory, but it uses constant space if we comment out one of the two lines with B.writeFile.
import qualified Data.ByteString.Lazy.Char8 as B
writeGrids :: Int -> Int -> IO ()
writeGrids num aa = do
rng <- newPureMT
let (grids,shuffleds) = createGrids rng aa
B.writeFile "grids.bin" (B.pack $ show (take num grids))
B.writeFile "shuffleds.bin" (B.pack $ show (take num shuffleds))
For what it's worth, here is a full solution combining the ideas of everyone here. Memory consumption is constant at ~6MB (compiled with -O2).
import Control.Arrow (first)
import Control.Monad.State (state, evalState)
import Data.Binary
import GHC.Float (double2Float)
import System.Random (next)
import System.Random.Mersenne.Pure64 (PureMT, newPureMT, randomDouble)
import System.Random.Shuffle (shuffle')
import qualified Data.ByteString as B (hPut)
import qualified Pipes.Binary as P (encode)
import qualified Pipes.Prelude as P (zip, mapM, drain)
import Pipes (runEffect, (>->))
import System.IO (withFile, IOMode(AppendMode))
main :: IO ()
main = writeGrids 1000 64
{- Creates and writes num grids of dimensions aa x aa -}
writeGrids :: Int -> Int -> IO ()
writeGrids num aa = do
rng <- newPureMT
let (grids, shuffleds) = createGrids rng aa
gridFile = "grids.bin"
shuffledFile = "shuffleds.bin"
encoder = P.encode . SerList . take num
writeFile gridFile ""
writeFile shuffledFile ""
withFile gridFile AppendMode $ \hGr ->
withFile shuffledFile AppendMode $ \hSh ->
runEffect
$ P.zip (encoder grids) (encoder shuffleds)
>-> P.mapM (\(ch1, ch2) -> B.hPut hGr ch1 >> B.hPut hSh ch2)
>-> P.drain -- discards the stream of () results.
{- a random number generator, dimension of grids to make
returns a pair of lists, the first is a list of grids of dimensions
aa x aa, the second is a list of the shuffled grids corresponding to the first list -}
createGrids :: PureMT -> Int -> ( [[(Float,Float)]], [[(Float,Float)]] )
createGrids rng aa = unzip gridsAndShuffleds where
rs = randomFloats rng
grids = map (getGridR aa) (chunksOf (2 * aa * aa) rs)
gridsAndShuffleds = shuffler (aa * aa) rng grids
{- length of each grid, a random number generator, a list of grids
returns a the list with each grid shuffled -}
shuffler :: Int -> PureMT -> [[(Float,Float)]] -> [( [(Float,Float)], [(Float,Float)] )]
shuffler n rng xss = evalState (traverse oneShuffle xss) rng
where
oneShuffle xs = state $ \r -> ((xs, shuffle' xs n r), snd (next r))
newtype SerList a = SerList { runSerList :: [a] }
deriving (Show)
instance Binary a => Binary (SerList a) where
put (SerList (x:xs)) = put False >> put x >> put (SerList xs)
put _ = put True
get = do
stop <- get :: Get Bool
if stop
then return (SerList [])
else do
x <- get
SerList xs <- get
return (SerList (x : xs))
{- divides list into chunks of size n -}
chunksOf :: Int -> [a] -> [[a]]
chunksOf n = go
where go xs = case splitAt n xs of
(ys,zs) | null ys -> []
| otherwise -> ys : go zs
{- dimension of grid, list of random floats [0,1]
returns a list of (x,y) points of length n^2 such that all
points are in the range [0,1] and the points are a randomly
perturbed regular grid -}
getGridR :: Int -> [Float] -> [(Float,Float)]
getGridR n rs = pts where
nn = n * n
(irs,jrs) = splitAt nn rs
n' = fromIntegral n
grid = [ (p,q) | p <- [0..n'-1], q <- [0..n'-1] ]
pts = zipWith (\(p,q) (ir,jr) -> ((p+ir)/n',(q+jr)/n')) grid (zip irs jrs)
{- an infinite list of random floats in range [0,1] -}
randomFloats :: PureMT -> [Float]
randomFloats rng = let (d,rng') = first double2Float (randomDouble rng)
in d : randomFloats rng'
Comments on the changes:
shuffler is now a traversal with the State functor. It produces, in a single pass through the input list, a list of pairs, in which each grid is paired with its shuffled version. createGrids then (lazily) unzips this list.
The files are written to using pipes machinery, in a way loosely inspired by this answer (I originally wrote this using P.foldM). Note that the hPut I used is the strict bytestring one, for it acts on strict chunks supplied by the producer made with P.zip (which, in spirit, is a pair of lazy bytestrings that supplies chunks in pairs).
SerList is there to hold the custom Binary instance Thomas M. DuBuisson alludes to. Note that I haven't thought too much about laziness and strictness in the get method of the instance. If that causes you trouble, this question looks useful.