Pure pseudo-random generators for Haskell with a nice API? - haskell

What are the recommended Haskell packages for pure pseudo-random generators (uniform Doubles)?
I'm interested in a convenient API in the first place, speed would be nice too.
Maybe mwc-random?

I like the mersenne-random-pure64 package. For example you can use it like this to generate an infinite lazy stream of random doubles from a seed value:
import Data.Word (Word64)
import Data.List (unfoldr)
import System.Random.Mersenne.Pure64
randomStream :: (PureMT -> (a, PureMT)) -> PureMT -> [a]
randomStream rndstep g = unfoldr (Just . rndstep) g
toStream :: Word64 -> [Double]
toStream seed = randomStream randomDouble $ pureMT seed
main = print . take 10 $ toStream 42
using System.Random (randoms)
You can get a similar output with the builtin randoms function, which is shorter and more general (Thanks to ehird for pointing that out):
import System.Random (randoms)
import System.Random.Mersenne.Pure64 (pureMT)
main = print . take 10 $ randomdoubles where
randomdoubles :: [Double]
randomdoubles = randoms $ pureMT 42
making it an instance of MonadRandom
After reading about MonadRandom i got curious how to get PureMT working as an instance of it. Out of the box it doesn't work, because PureMT doesn't instantiate RandomGen's split function. One way to make it work is to wrap PureMT in a newtype and write a custom split instance for the RandomGen typeclass, for which there exists a default MonadRandom instance.
import Control.Monad.Random
import System.Random.Mersenne.Pure64
getTenRandomDoubles :: Rand MyPureMT [Double]
getTenRandomDoubles = getRandoms >>= return . take 10
main = print $ evalRand getTenRandomDoubles g
where g = MyPureMT $ pureMT 42
newtype MyPureMT = MyPureMT { unMyPureMT :: PureMT }
myPureMT = MyPureMT . pureMT
instance RandomGen MyPureMT where
next = nextMyPureMT
split = splitMyPureMT
splitMyPureMT :: MyPureMT -> (MyPureMT, MyPureMT)
splitMyPureMT (MyPureMT g) = (myPureMT s, myPureMT s') where
(s',g'') = randomWord64 g'
(s ,g' ) = randomWord64 g
nextMyPureMT (MyPureMT g) = (s, MyPureMT g') where
(s, g') = randomInt g

The standard System.Random has a pure interface. I would recommend wrapping it in a State g (for whatever generator g you're using) to avoid threading the state; the state function makes turning functions like next into stateful actions easy:
next :: (RandomGen g) => g -> (Int, g)
state :: (s -> (a, s)) -> State s a
state next :: (RandomGen g) => State g Int
The MonadRandom package is based on the State g interface with pre-written wrappers for the generator functions; I think it's fairly popular.
Note that you can still run actions using this pure interface on the global RNG. MonadRandom has evalRandIO for the purpose.
I think you could write an (orphan) RandomGen instance to use mwc-random with these.

A particularly nice package with a pure interface that is also suitable for cryptographic applications and yet maintains a high performance is the cprng-aes package.
It provides two interfaces: A deterministic pure one using the type classes from System.Random as well as a strong IO interface using the type classes from the Crypto-API package.
As a side note: I would generally prefer the mersenne-random packages over mwc-random. They use the original Mersenne Twister algorithm and in my benchmarks outperformed mwc-random by a large factor.

Related

Haskell: Replace mapM in a monad transformer stack to achieve lazy evaluation (no space leaks)

It has already been discussed that mapM is inherently not lazy, e.g. here and here. Now I'm struggling with a variation of this problem where the mapM in question is deep inside a monad transformer stack.
Here's a function taken from a concrete, working (but space-leaking) example using LevelDB that I put on gist.github.com:
-- read keys [1..n] from db at DirName and check that the values are correct
doRead :: FilePath -> Int -> IO ()
doRead dirName n = do
success <- runResourceT $ do
db <- open dirName defaultOptions{ cacheSize= 2048 }
let check' = check db def in -- is an Int -> ResourceT IO Bool
and <$> mapM check' [1..n] -- space leak !!!
putStrLn $ if success then "OK" else "Fail"
This function reads the values corresponding to keys [1..n] and checks that they are all correct. The troublesome line inside the ResourceT IO a monad is
and <$> mapM check' [1..n]
One solution would be to use streaming libraries such as pipes, conduit, etc. But these seem rather heavy and I'm not at all sure how to use them in this situation.
Another path I looked into is ListT as suggested here. But the type signatures of ListT.fromFoldable :: [Bool]->ListT Bool and ListT.fold :: (r -> a -> m r) -> r -> t m a -> mr (where m=IO and a,r=Bool) do not match the problem at hand.
What is a 'nice' way to get rid of the space leak?
Update: Note that this problem has nothing to do with monad transformer stacks! Here's a summary of the proposed solutions:
1) Using Streaming:
import Streaming
import qualified Streaming.Prelude as S
S.all_ id (S.mapM check' (S.each [1..n]))
2) Using Control.Monad.foldM:
foldM (\a i-> do {b<-check' i; return $! a && b}) True [1..n]
3) Using Control.Monad.Loops.allM
allM check' [1..n]
I know you mention you don't want to use streaming libraries, but your problem seems pretty easy to solve with streaming without changing the code too much.
import Streaming
import qualified Streaming.Prelude as S
We use each [1..n] instead of [1..n] to get a stream of elements:
each :: (Monad m, Foldable f) => f a -> Stream (Of a) m ()
Stream the elements of a pure, foldable container.
(We could also write something like S.take n $ S.enumFrom 1).
We use S.mapM check' instead of mapM check':
mapM :: Monad m => (a -> m b) -> Stream (Of a) m r -> Stream (Of b) m r
Replace each element of a stream with the result of a monadic action
And then we fold the stream of booleans with S.all_ id:
all_ :: Monad m => (a -> Bool) -> Stream (Of a) m r -> m Bool
Putting it all together:
S.all_ id (S.mapM check' (S.each [1..n]))
Not too different from the code you started with, and without the need for any new operator.
I think what you need is allM from the monad-loops package.
Then it would be just allM check' [1..n]
(Or if you don't want the import it's a pretty small function to copy.)

What are good Haskell conventions for managing deeply nested bracket patterns?

I am currently working with Haskell bindings to a HDF5 C library. Like many C libraries, this one uses many pointers in its functions calls.
The usual "best practice" Haskell functions for allocating and releasing C resources follow the bracket pattern, like alloca, withArray, etc. In using them, I often enter several nested brackets. For instance, here is a small excerpt for HDF5 bindings:
selectHyperslab rID dName = withDataset rID dName $ \dID -> do
v <- withDataspace 10 $ \dstDS -> do
srcDS <- c'H5Dget_space dID
dat <- alloca3 (0, 1, 10) $ \(start, stride, count) -> do
err <- c'H5Sselect_hyperslab srcDS c'H5S_SELECT_SET start stride count nullPtr
-- do some work ...
return value
alloca3 (a, b, c) action =
alloca $ \aP -> do
poke aP a
alloca $ \bP -> do
poke bP b
alloca $ \cP -> do
poke cP c
action (aP, bP, cP)
In the code above, the nested brackets are bracket functions I wrote withDataset, withDataspace, and alloca3, which I wrote to prevent the bracket nesting from going another 3 levels deep in the code. For C libraries with lots of resource acquisition calls and pointer arguments, coding with the standard bracket primitives can get unmanageable (which is why I wrote alloca3 to reduce the nesting.)
So generally, are there any best practices or coding techniques to help reduce the nesting of brackets when needing to allocate and deallocate many resources (such as with C calls)? The only alternative I have found is the ResourceT transformer, which from the tutorial looks like it is designed to make interleaving resource acquire/release possible, and not to simplify the bracket pattern.
Recently I was investigating this problem in Scala. The recurring pattern is (a -> IO r) -> IO r, where a given function is executed within some resource allocation context given a value of type a. And this is just ContT r IO a, which is readily available in Haskell. So we can write:
import Control.Monad
import Control.Monad.Cont
import Control.Monad.IO.Class
import Control.Exception (bracket)
import Foreign.Ptr (Ptr)
import Foreign.Storable (Storable)
import Foreign.Marshal.Alloc (alloca)
allocaC :: Storable a => ContT r IO (Ptr a)
allocaC = ContT alloca
bracketC :: IO a -> (a -> IO b) -> ContT r IO a
bracketC start end = ContT (bracket start end)
bracketC_ :: IO a -> IO b -> ContT r IO a
bracketC_ start end = ContT (bracket start (const end))
-- ...etc...
-- | Example:
main :: IO ()
main = flip runContT return $ do
bracketC_ (putStrLn "begin1") (putStrLn "end1")
bracketC_ (putStrLn "begin2") (putStrLn "end2")
liftIO $ putStrLn "..."
The standard monad/applicative functions allow you to simplify a lot of your code, for example:
allocAndPoke :: (Storable a) => a -> ContT r IO (Ptr a)
allocAndPoke x = allocaC >>= \ptr -> liftIO (poke ptr x) >> return ptr
-- With the monad alloca3 won't be probably needed, just as an example:
alloca3C (a, b, c) =
(,,) <$> allocAndPoke a <*> allocAndPoke b <*> allocAndPoke c
allocaManyC :: (Storable a) => [a] -> ContT r IO [Ptr a]
allocaManyC = mapM allocAndPoke

How to generate a random propositional formula (CNF) in haskell?

How can I get a random propositional formula in haskell? Preferably I need the formula in CNF, but I would
I want to use the formulas for performance testing that also involves SAT solvers. Please note that my goal is not to test the performance of SAT solvers! I am also not interested in very difficult formulas, so the difficulty should be random or otherwise only include easy formulas.
I know that my real world data leads to propositional formulas that are not difficult for SAT solvers.
At the moment I use the hatt and SBV libraries as data structures to work with propositional formulas. I also looked at the hGen library, perhaps it can be used to generate the random formulas. However there is no documentation and I didn’t get far by looking at the source code of hGen.
My goal is to chose n and get back a formula that includes n boolean variables.
If you want to generate something randomly, I suggest a nondeterminism monad, of which MonadRandom is a popular choice.
I would suggest two inputs to this procedure: vars, the number of variables, and clauses the number of clauses. Of course you could always generate the number of clauses at random as well, using this same idea. Here's a sketch:
import Control.Monad.Random (Rand, StdGen, uniform)
import Control.Applicative ((<$>))
import Control.Monad (replicateM)
type Cloud = Rand StdGen -- for "probability cloud"
newtype Var = Var Int
data Atom = Positive Var -- X
| Negative Var -- not X
type CNF = [[Atom]] -- conjunction of disjunctions
genCNF :: Int -> Int -> Cloud CNF
genCNF vars clauses = replicateM clauses genClause
where
genClause :: Could [Atom]
genClause = replicateM 3 genAtom -- for CNF-3
genAtom :: Cloud Atom
genAtom = do
f <- uniform [Positive, Negative]
v <- Var <$> uniform [0..vars-1]
return (f v)
I included the optional type signatures in the where clause to make it easier to follow the structure.
Essentially, follow the "grammar" of what you are trying to generate; with each nonterminal associate with a gen* function.
I don't know how to judge whether a CNF expression is easy or hard.
Using the types in hatt:
import Data.Logic.Propositional
import System.Random
import Control.Monad.State
import Data.Maybe
import Data.SBV
type Rand = State StdGen
rand :: Random a => (a, a) -> Rand a
rand = state . randomR
runRand :: Rand a -> IO a
runRand r = randomIO >>= return . fst . runState r . mkStdGen
randFormula :: Rand Expr
randFormula = rand (3, 10) >>= randFormulaN 50
randFormulaN :: Int -> Int -> Rand Expr
randFormulaN negC n = replicateM n clause >>= return . foldl1 Conjunction
where vars = take n (map (Variable . Var) ['a'..])
clause = rand (1, n) >>= mapM f . flip take vars >>= return . foldl1 Disjunction
f v = rand (0,100) >>= \neg -> return (if neg <= negC then Negation v else v)

Join two consumers into a single consumer that returns multiple values?

I have been experimenting with the new pipes-http package and I had a thought. I have two parsers for a web page, one that returns line items and another a number from elsewhere in the page. When I grab the page, it'd be nice to string these parsers together and get their results at the same time from the same bytestring producer, rather than fetching the page twice or fetching all the html into memory and parsing it twice.
In other words, say you have two Consumers:
c1 :: Consumer a m r1
c2 :: Consumer a m r2
Is it possible to make a function like this:
combineConsumers :: Consumer a m r1 -> Consumer a m r2 -> Consumer a m (r1, r2)
combineConsumers = undefined
I have tried a few things, but I can't figure it out. I understand if it isn't possible, but it would be convenient.
Edit:
I'm sorry it turns out I was making an assumption about pipes-attoparsec, due to my experience with conduit-attoparsec that caused me to ask the wrong question. Pipes-attoparsec turns an attoparsec into a pipes Parser when I just assumed that it would return a pipes Consumer. That means that I can't actually turn two attoparsec parsers into consumers that take text and return a result, then use them with the plain old pipes ecosystem. I'm sorry but I just don't understand pipes-parse.
Even though it doesn't help me, Arthur's answer is pretty much what I envisioned when I asked the question, and I'll probably end up using his solution in the future. In the meantime I'm just going to use conduit.
It the results are "monoidal", you can use the tee function from the Pipes prelude, in combination with a WriterT.
{-# LANGUAGE OverloadedStrings #-}
import Data.Monoid
import Control.Monad
import Control.Monad.Writer
import Control.Monad.Writer.Class
import Pipes
import qualified Pipes.Prelude as P
import qualified Data.Text as T
textSource :: Producer T.Text IO ()
textSource = yield "foo" >> yield "bar" >> yield "foo" >> yield "nah"
counter :: Monoid w => T.Text
-> (T.Text -> w)
-> Consumer T.Text (WriterT w IO) ()
counter word inject = P.filter (==word) >-> P.mapM (tell . inject) >-> P.drain
main :: IO ()
main = do
result <-runWriterT $ runEffect $
hoist lift textSource >->
P.tee (counter "foo" inject1) >-> (counter "bar" inject2)
putStrLn . show $ result
where
inject1 _ = (,) (Sum 1) mempty
inject2 _ = (,) mempty (Sum 1)
Update: As mentioned in a comment, the real problem I see is that in pipes parsers aren't Consumers. And how can you run two parsers concurrently if they have different behaviours regarding leftovers? What happens if one of the parsers wants to "un-draw" some text and the other parser doesn't?
One possible solution is to run the parsers in a truly concurrent manner, in different threads. The primitives in the pipes-concurrency package let you "duplicate" a Producer by writing the same data to two different mailboxes. And then each parser can do whatever it wants with its own copy of the producer. Here's an example which also uses the pipes-parse, pipes-attoparsec and async packages:
{-# LANGUAGE OverloadedStrings #-}
import Data.Monoid
import qualified Data.Text as T
import Data.Attoparsec.Text hiding (takeWhile)
import Data.Attoparsec.Combinator
import Control.Applicative
import Control.Monad
import Control.Monad.State.Strict
import Pipes
import qualified Pipes.Prelude as P
import qualified Pipes.Attoparsec as P
import qualified Pipes.Concurrent as P
import qualified Control.Concurrent.Async as A
parseChars :: Char -> Parser [Char]
parseChars c = fmap mconcat $
many (notChar c) *> many1 (some (char c) <* many (notChar c))
textSource :: Producer T.Text IO ()
textSource = yield "foo" >> yield "bar" >> yield "foo" >> yield "nah"
parseConc :: Producer T.Text IO ()
-> Parser a
-> Parser b
-> IO (Either P.ParsingError a,Either P.ParsingError b)
parseConc producer parser1 parser2 = do
(outbox1,inbox1,seal1) <- P.spawn' P.Unbounded
(outbox2,inbox2,seal2) <- P.spawn' P.Unbounded
feeding <- A.async $ runEffect $ producer >-> P.tee (P.toOutput outbox1)
>-> P.toOutput outbox2
sealing <- A.async $ A.wait feeding >> P.atomically seal1 >> P.atomically seal2
r <- A.runConcurrently $
(,) <$> (A.Concurrently $ parseInbox parser1 inbox1)
<*> (A.Concurrently $ parseInbox parser2 inbox2)
A.wait sealing
return r
where
parseInbox parser inbox = evalStateT (P.parse parser) (P.fromInput inbox)
main :: IO ()
main = do
(Right a, Right b) <- parseConc textSource (parseChars 'o') (parseChars 'a')
putStrLn . show $ (a,b)
The result is:
("oooo","aa")
I'm not sure how much overhead this approach introduces.
I think something is wrong with the way you are going about this, for the reasons Davorak mentions in his remark. But if you really need such a function, you can define it.
import Pipes.Internal
import Pipes.Core
zipConsumers :: Monad m => Consumer a m r -> Consumer a m s -> Consumer a m (r,s)
zipConsumers p q = go (p,q) where
go (p,q) = case (p,q) of
(Pure r , Pure s) -> Pure (r,s)
(M mpr , ps) -> M (do pr <- mpr
return (go (pr, ps)))
(pr , M mps) -> M (do ps <- mps
return (go (pr, ps)))
(Request _ f, Request _ g) -> Request () (\a -> go (f a, g a))
(Request _ f, Pure s) -> Request () (\a -> do r <- f a
return (r, s))
(Pure r , Request _ g) -> Request () (\a -> do s <- g a
return (r,s))
(Respond x _, _ ) -> closed x
(_ , Respond y _) -> closed y
If you are 'zipping' consumers without using their return value, only their 'effects' you can just use tee consumer1 >-> consumer2
The idiomatic solution is to rewrite your Consumers as a Fold or FoldM from the foldl library and then combine them using Applicative style. You can then convert this combined fold to one that works on pipes.
Let's assume that you either have two Folds:
fold1 :: Fold a r1
fold2 :: Fold a r2
... or two FoldMs:
foldM1 :: Monad m => FoldM a m r1
foldM2 :: Monad m => FoldM a m r2
Then you combine these into a single Fold/FoldM using Applicative style:
import Control.Applicative
foldBoth :: Fold a (r1, r2)
foldBoth = (,) <$> fold1 <*> fold2
foldBothM :: Monad m => FoldM a m (r1, r2)
foldBothM = (,) <$> foldM1 <*> foldM2
-- or: foldBoth = liftA2 (,) fold1 fold2
-- foldMBoth = liftA2 (,) foldM1 foldM2
You can turn either fold into a Pipes.Prelude-style fold or a Parser. Here are the necessary conversion functions:
import Control.Foldl (purely, impurely)
import qualified Pipes.Prelude as Pipes
import qualified Pipes.Parse as Parse
purely Pipes.fold
:: Monad m => Fold a b -> Producer a m () -> m b
impurely Pipes.foldM
:: Monad m => FoldM m a b -> Producer a m () -> m b
purely Parse.foldAll
:: Monad m => Fold a b -> Parser a m r
impurely Parse.foldMAll
:: Monad m => FoldM a m b -> Parser a m r
The reason for the purely and impurely functions is so that foldl and pipes can interoperate without either one incurring a dependency on the other. Also, they allow libraries other than pipes (like conduit) to reuse foldl without a dependency, too (Hint hint, #MichaelSnoyman).
I apologize that this feature is not documented, mainly because it took me a while to figure out how to get pipes and foldl to interoperate in a dependency-free manner, and that was after I wrote the pipes tutorial. I will update the tutorial to point out this trick.
To learn how to use foldl, just read the documentation in the main module. It's a very small and easy-to-learn library.
For what it's worth, in the conduit world, the relevant function is zipSinks. There might be some way to adapt this function to work for pipes, but automatic termination may get in the way.
Consumer forms a Monad so
combineConsumers = liftM2 (,)
will type check. Unfortunately, the semantics might be unlike what you're expecting: the first consumer will run to completion and then the second.

Declare all instances of a typeclass are in another typeclass without modifying the original class declarations

There is an Crypto.Random API inside the crypto-api package that specifies what it means for something to be a "pseudorandom number generator".
I have implemented this API using an instance of System.Random's RandomGen class, namely, StdGen:
instance CryptoRandomGen StdGen where
newGen bs = Right $ mkStdGen $ shift e1 24 + shift e2 16 + shift e3 8 + e4
where (e1 : e2 : e3 : e4 : _) = Prelude.map fromIntegral $ unpack bs
genSeedLength = Tagged 4
genBytes n g = Right $ genBytesHelper n empty g
where genBytesHelper 0 partial gen = (partial, gen)
genBytesHelper n partial gen = genBytesHelper (n-1) (partial `snoc` nextitem) newgen
where (nextitem, newgen) = randomR (0, 255) gen
reseed bs _ = newGen bs
However, this implementation is only for the StdGen type, but it would really work for anything in System.Random's RandomGen typeclass.
Is there a way to say that everything in RandomGen is a member of CryptoRandomGen using the given shim functions? I'd like to be able to do this in my own code, without having to change the source of either of those two libraries. My instincts would be to change the first line to something like
instance (RandomGen a) => CryptoRandomGen a where
but that doesn't appear to be syntactically correct.
Crypto-API author here. Please don't do this - it's really a violation of the implicit properties of CryptoRandomGen.
That said, here's how I'd do it: Just make a newtype that wraps your RandomGen and make that newtype an instance of CryptoRandomGen.
newtype AsCRG g = ACRG { unACRG :: g}
instance RandomGen g => CryptoRandomGen (AsCRG g) where
newGen = -- This is not possible to implement with only a 'RandomGen' constraint. Perhaps you want a 'Default' instance too?
genSeedLength = -- This is also not possible from just 'RandomGen'
genBytes nr g =
let (g1,g2) = split g
randInts :: [Word32]
randInts = B.concat . map Data.Serialize.encode
. take ((nr + 3) `div` 4)
$ (randoms g1 :: [Word32])
in (B.take nr randInts, g2)
reseed _ _ = -- not possible w/o more constraints
newGenIO = -- not possible w/o more constraints
So you see, you can split the generator (or manage many intermediate generators), make the right number of Ints (or in my case, Word32s), encode them, and return the bytes.
Because RandomGen is limited to just generation (and splitting), there isn't any straight-forward way to support instatiation, reinstantiation, or querying properties such as the seed length.
As far as I know, this is impossible, unless you're willing to turn on UndecidableInstances (which, of course, can make the typechecker go in an infinite loop). Here's an example that makes every instance of Monad an instance of Functor:
{-# LANGUAGE FlexibleInstances, UndecidableInstances #-}
module Main
where
import Control.Monad (liftM)
instance (Monad a) => Functor a where
fmap = liftM
-- Test code
data MyState a = MyState { unM :: a }
deriving Show
instance Monad MyState where
return a = MyState a
(>>=) m k = k (unM m)
main :: IO ()
main = print . fmap (+ 1) . MyState $ 1
Testing:
*Main> :main
MyState { unM = 2 }
In your case, this translates to:
{-# LANGUAGE FlexibleInstances, UndecidableInstances #-}
instance (RandomGen a) => CryptoRandomGen a where
newGen = ...
genSeedLength = ...
genBytes = ...
reseed = ...
As an aside, I once asked how to implement this without UndecidableInstances on haskell-cafe and got this answer (the same workaround that Thomas proposed; I consider it ugly).

Resources