Overflow problems in dealing with IO and MonadRandom and chained computations - haskell

So basically I have a computation step that takes in a previous result and outputs a Rand g Path, where Path is a custom data type (think of it like a traveling salesman kind of problem). I'm letting MonadRandom handle all of the generator passing and stuff.
I want to find the, say, nth composition of this computation upon itself. Right now I'm using
thecomputation :: (RandomGen g) => Rand g Path
thecomputation = (iterate (>>= step) (return startingPath)) !! n
And then to print it out I would run
main = do
    res <- evalRandIO thecomputation
    print res
However, if I pick a high enough n (I need on the order of 10^6), I get a stack overflow.
I've managed to track the problem to the fact that thecomputation is actually a heavily composed (nested?) IO object. It's a series of IO computations and so ghc has to keep track of all of those layers of nested IO's, and after enough layers, it gives up.
How am I supposed to deal with this? In an imperative language there really isn't much to this. But what should I do here? Should I force some of the IO's to evaluate or ...?
There is a similar question on this site, but I wasn't able to get anything helpful out of the accepted answer, so I'm still pretty lost.
Concrete Example
import System.Random
import Control.Monad.Random
import Control.Monad
data Path = DoublePath Double deriving Show
step :: (RandomGen g) => Path -> Rand g Path
step (DoublePath x) = do
    dx <- getRandom
    return (DoublePath ((x + dx)/x))
thecomputation :: (RandomGen g) => Rand g Path
thecomputation = (iterate (>>= step) (return (DoublePath 10.0))) !! 1000000
main = do
    result <- evalRandIO thecomputation
    print result
This does overflow on my computer.

You are bitten by laziness: every time you call step on some value x, GHC creates a thunk step x that is not evaluated until the final value is required.
A simple fix is to make step strict in its argument, e.g. by pattern-matching on DoublePath !x (and using -XBangPatterns) or inserting x `seq` before the body of the function. Then your code finishes without a stack overflow (heh).
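To see the strictness idea in isolation, here is a minimal sketch that deliberately does not use MonadRandom. It threads a toy linear congruential generator by hand (nextSeed, seedToUnit, runSteps and the constants are assumptions made for illustration, not part of any library) and uses bang patterns so that each step is evaluated before the next one starts, rather than leaving a chain of a million thunks behind.

```haskell
{-# LANGUAGE BangPatterns #-}

data Path = DoublePath Double deriving Show

-- Hypothetical toy generator (a simple LCG) standing in for a real RandomGen.
-- Assumes a 64-bit Int so the multiplication does not overflow.
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- Map a seed to a Double in [0, 1).
seedToUnit :: Int -> Double
seedToUnit s = fromIntegral s / 2147483648

-- Apply the step n times. The bangs force the current value and the seed
-- on every iteration, so no chain of unevaluated thunks builds up.
runSteps :: Int -> Path
runSteps n = go n 10.0 42
  where
    go :: Int -> Double -> Int -> Path
    go 0 !x _  = DoublePath x
    go k !x !s =
      let s' = nextSeed s
          dx = seedToUnit s'
      in go (k - 1) ((x + dx) / x) s'
```

Loading this in ghci and evaluating runSteps 1000000 should run in constant stack space. Removing the bangs (and compiling without optimization) is one way to observe the thunk build-up the answer describes.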

It is enough to make the type strict. This ought to be second nature especially for numerical and other 'unboxable' parameters and doesn't require a language extension.
data Path = DoublePath !Double deriving Show
-- $ ghc -O2 doublepath.hs
-- ...
-- $ time ./doublepath
-- DoublePath 1.526581416150007
-- real 0m2.516s
-- user 0m2.307s
-- sys 0m0.092s

Related

Haskell: How to use a HashMap in a main function

I beg for your help, speeding up the following program:
main = do
    jobsToProcess <- fmap read getLine
    forM_ [1..jobsToProcess] $ \_ -> do
        [r, k] <- fmap (map read . words) getLine :: IO [Int]
        putStrLn $ doSomeReallyLongWorkingJob r k
There could(!) be a lot of identical jobs to do, but I can't modify the inputs, so I tried to use Data.HashMap to cache already processed jobs. I have already optimized the algorithms in the doSomeReallyLongWorkingJob function, and it now seems to be about as fast as C.
But unfortunately I'm not able to implement a simple cache without producing a lot of errors. I need a simple cache of type HashMap (Int, Int) Int, but every time I end up with too many or too few brackets. And if I do manage to define the cache, I get stuck putting data into it or retrieving data from it because of lots of errors.
I have already Googled for some hours, but it seems I'm stuck. BTW: the result of the long-running job is an Int as well.
It's pretty simple to make a stateful action that caches operations. First some boilerplate:
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.State
import Data.Map (Map)
import qualified Data.Map as M
import Debug.Trace
I'll use Data.Map, but of course you can substitute in a hash map or any similar data structure without much trouble. My long-running computation will just add up its arguments. I'll use trace to show when this computation is executed; we'll hope not to see the output of the trace when we enter a duplicate input.
reallyLongRunningComputation :: [Int] -> Int
reallyLongRunningComputation args = traceShow args $ sum args
Now the caching operation will just look up whether we've seen a given input before. If we have, we'll return the precomputed answer; otherwise we'll compute the answer now and store it.
cache :: (MonadState (Map a b) m, Ord a) => (a -> b) -> a -> m b
cache f x = do
    mCached <- gets (M.lookup x)
    case mCached of
        -- depending on your goals, you may wish to force `result` here
        Nothing     -> modify (M.insert x result) >> return result
        Just cached -> return cached
    where
    result = f x
The main function now just consists of calling cache reallyLongRunningComputation on appropriate inputs.
main = do
    iterations <- readLn
    flip evalStateT M.empty . replicateM_ iterations
        $   liftIO getLine
        >>= liftIO . mapM readIO . words
        >>= cache reallyLongRunningComputation
        >>= liftIO . print
Let's try it in ghci!
> main
5
1 2 3
[1,2,3]
6
4 5
[4,5]
9
1 2
[1,2]
3
1 2
3
1 2 3
6
As you can see by the bracketed outputs, reallyLongRunningComputation was called the first time we entered 1 2 3 and the first time we entered 1 2, but not the second time we entered these inputs.
I hope I'm not too far off base, but first you need a way to carry the past jobs around with you. The easiest would be to use foldM instead of forM.
import Control.Monad
import Data.Maybe
main = do
    jobsToProcess <- fmap read getLine
    foldM doJobAcc acc0 [1..jobsToProcess]
  where
    acc0 = --initial value of some type of accumulator, e.g. hash map
    doJobAcc acc _ = do
        [r, k] <- fmap (map read . words) getLine :: IO [Int]
        case getFromHash acc (r,k) of
            Nothing -> do
                -- the job is a pure function in the question, so bind it with let
                let i = doSomeReallyLongWorkingJob r k
                return $ insertNew acc (r,k) i
            Just i -> do
                return acc
Note that getFromHash and insertNew are placeholders; I don't actually use the real interface for putting and getting hash table entries. It doesn't have to be a hash table: Data.Map from containers would work, or even a list if it's going to be a small one.
Another way to carry around the hash table would be to use a State transformer monad.
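As a rough, runnable version of the foldM idea, the sketch below uses Data.Map.Strict from containers as the accumulator. The names Cache, slowJob, lookupOrRun, processJob and processAll are made up for this example, and slowJob is only a stand-in for the real long-running function. Jobs are passed in as a list rather than read from stdin to keep the example short.

```haskell
import Control.Monad (foldM)
import qualified Data.Map.Strict as M

type Cache = M.Map (Int, Int) Int

-- Hypothetical stand-in for the expensive job from the question.
slowJob :: Int -> Int -> Int
slowJob r k = r * k

-- Return the result for a job plus the (possibly extended) cache.
lookupOrRun :: Cache -> (Int, Int) -> (Int, Cache)
lookupOrRun cache job@(r, k) =
  case M.lookup job cache of
    Just hit -> (hit, cache)
    Nothing  -> let result = slowJob r k
                in (result, M.insert job result cache)

-- One fold step: print the result and hand the cache on to the next job.
processJob :: Cache -> (Int, Int) -> IO Cache
processJob cache job = do
  let (result, cache') = lookupOrRun cache job
  print result
  return cache'

-- Run all jobs left to right, starting from an empty cache.
processAll :: [(Int, Int)] -> IO Cache
processAll = foldM processJob M.empty
```

Keeping lookupOrRun pure means the caching logic can be tested without any IO; only processJob knows about printing.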
I am just adding this answer since I feel like the other answers are diverging a bit from the original question, namely using hashtable constructs in Main function (inside IO monad).
Here is a minimal hashtable example using hashtables module. To install the module with cabal, simply use
cabal install hashtables
In this example, we simply put some values in a hashtable and use lookup to print a value retrieved from the table.
import qualified Data.HashTable.IO as H
main :: IO ()
main = do
    t <- H.new :: IO (H.CuckooHashTable Int String)
    H.insert t 22 "Hello world"
    H.insert t 5 "No problem"
    msg <- H.lookup t 5
    print msg
Notice that we need to use explicit type annotation to specify which implementation of the hashtable we wish to use.

Randomness in a nested pure function

I want to provide a function that replaces each occurrence of # in a string with a different random number. In a non-pure language, it's trivial. However, how should it be designed in a pure language? I don't want to use unsafePerformIO, as it rather looks like a hack and not a proper design.
Should this function require a random generator as one of its parameters? And if so, would that generator have to be passed through the whole stack of invocations? Are there other possible approaches? Should I use the State monad, here? I would appreciate a toy example demonstrating a viable approach...
You would, in fact, use a variant of the state monad to pass the random generator around behind the scenes. The Rand type in Control.Monad.Random helps with this. The API is a bit confusing, but more because it's polymorphic over the type of random generator you use than because it has to be functional. This extra bit of scaffolding is useful, however, because you can easily reuse your existing code with different random generators which lets you test different algorithms as well as explicitly controlling whether the generator is deterministic (good for testing) or seeded with outside data (in IO).
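For comparison, this is roughly what passing the generator around by hand looks like, which is the plumbing that Rand hides. The sketch uses only base: Seed, nextSeed and seedDigit form a toy generator invented for this example (a real program would use StdGen from System.Random), and replaceHashes is a hypothetical name.

```haskell
import Data.List (mapAccumL)

-- Hypothetical toy generator state; assumes a 64-bit Int.
type Seed = Int

nextSeed :: Seed -> Seed
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- Derive a decimal digit from the higher bits of the seed.
seedDigit :: Seed -> Char
seedDigit s = toEnum (fromEnum '0' + (s `div` 65536) `mod` 10)

-- Replace every '#' with a digit, returning the final seed alongside
-- the new string so the caller can keep generating from where we stopped.
replaceHashes :: Seed -> String -> (Seed, String)
replaceHashes = mapAccumL step
  where
    step s '#' = let s' = nextSeed s in (s', seedDigit s')
    step s c   = (s, c)
```

Every function that needs randomness has to accept a seed and return the updated one, which is exactly the pattern a state monad automates.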
Here's a simple example of Rand in action. The RandomGen g => in the type signature tells us that we can use any type of random generator for it. We have to explicitly annotate n as an Int because otherwise GHC only knows that it has to be some numeric type that can be generated and turned into a string, which can be one of multiple possible options (like Double).
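{-# LANGUAGE ScopedTypeVariables #-}  -- needed for the `n :: Int <- ...` binding
import Control.Monad (foldM)
import Control.Monad.Random
randomReplace :: RandomGen g => String -> Rand g String
randomReplace = foldM go ""
  where go str '#' = do
          n :: Int <- getRandomR (0, 10)
          return (str ++ show n)
        go str chr = return $ str ++ [chr]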
To run this, we need to get a random generator from somewhere and pass it into evalRand. The easiest way to do this is to get the global system generator which we can do in IO:
main :: IO ()
main = do gen <- getStdGen
          print $ evalRand (randomReplace "ab#c#") gen
This is such a common pattern that the library provides an evalRandIO function which does it for you:
main :: IO ()
main = do res <- evalRandIO $ randomReplace "ab#c#"
          print res
In the end, the code is a bit more explicit about having a random generator and passing it around, but it's still reasonably easy to follow. For more involved code, you could also use RandT, which allows you to extend other monads (like IO) with the ability to generate random values, letting you relegate all the plumbing and setup to one part of your code.
It's just a monadic mapping:
import Control.Applicative
import Control.Monad.Random
import Data.Char
randomReplace :: RandomGen g => String -> Rand g String
randomReplace = mapM f where
    -- intToDigit 10 would give 'a', so the range stays within 0..9
    f '#' = intToDigit <$> getRandomR (0, 9)
    f c = return c
main = evalRandIO (randomReplace "#abc#def#") >>= print

MonadRandom: why stack overflow happens?

This question is certainly for stackoverflow.com
Here is the sample:
module Main where
import Control.Monad.Random
import Control.Exception
data Tdata = Tdata Int Int Integer String
randomTdata :: (Monad m, RandomGen g) => RandT g m Tdata
randomTdata = do
    a <- getRandom
    b <- getRandom
    c <- getRandom
    return $ Tdata a b c "random"
manyTdata :: IO [Tdata]
manyTdata = do
    g <- newStdGen
    evalRandT (sequence $ repeat randomTdata) g
main = do
    a <- manyTdata
    b <- evaluate $ take 1 a
    return ()
After compilation, running this gives
Stack space overflow: current size 8388608 bytes.
Use `+RTS -Ksize -RTS' to increase it
How can this happen? Is MonadRandom not lazy, or is it something else? And how can I find the cause of a stack overflow in cases like this?
The issue arises because you are building IO into your manyTdata function. The monad transformer ends up being of type RandT g IO Tdata. Because each element of your infinite list can consist of IO actions, the entirety of the infinite list returned by manyTdata must be evaluated completely before the function can return any results.
The simplest solution would be to use Rand instead of RandT, as using the transformer isn't really useful here anyway; you could also change the base monad to something like the Identity monad by changing manyTdata to
-- needs: import Data.Functor.Identity (runIdentity)
manyTdata :: IO [Tdata]
manyTdata = do
    g <- newStdGen
    return $ runIdentity $ evalRandT (sequence $ repeat randomTdata) g
This will terminate in a finite amount of time. The error concerning your stack size is simply a result of recursively expanding your list of IO actions. Your code says to sequence all of these actions, so they all have to be performed; it has nothing to do with laziness.
Something else to think about: rather than using randomTdata, consider making Tdata an instance of the Random class.
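The difference between the two base monads can be seen without MonadRandom at all. The following minimal sketch (firstThree and neverReturns are names chosen for this example) sequences an infinite list of trivial actions over Identity, which is lazy enough to yield a prefix, and shows the IO counterpart only as a comment because it would not return.

```haskell
import Data.Functor.Identity (Identity (..))

-- With Identity as the base monad, sequencing an infinite list is
-- productive: only the demanded prefix is ever computed.
firstThree :: [Int]
firstThree = take 3 (runIdentity (sequence (repeat (Identity 1))))

-- The same shape in IO would never return, because every action in the
-- infinite list has to run before `sequence` can produce its result:
--
--   neverReturns :: IO [Int]
--   neverReturns = sequence (repeat (return 1))
```

Here firstThree evaluates to [1,1,1]. Whether the full Rand version behaves the same way also depends on how lazy the underlying state monad of the installed MonadRandom version is, so treat this only as an illustration of the base-monad point.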

Dice Game in Haskell

I'm trying to spew out randomly generated dice for every roll that the user plays. The user has 3 rolls per turn and he gets to play 5 turns (I haven't implemented this part yet and I would appreciate suggestions).
I'm also wondering how I can display the colors randomly. I have the list of tuples in place, but I reckon I need some function that uses random and that list to match those colors. I'm struggling as to how.
module Main where
import System.IO
import System.Random
import Data.List
diceColor = [("Black",1),("Green",2),("Purple",3),("Red",4),("White",5),("Yellow",6)]
{-
randomList :: (RandomGen g) -> Int -> g -> [Integer]
random 0 _ = []
randomList n generator = r : randomList (n-1) newGenerator
    where (r, newGenerator) = randomR (1, 6) generator
-}
rand :: Int -> [Int] -> IO ()
rand n rlst = do
    num <- randomRIO (1::Int, 6)
    if n == 0
        then doSomething rlst
        else rand (n-1) (num:rlst)
doSomething x = putStrLn (show (sort x))
main :: IO ()
main = do
    --hSetBuffering stdin LineBuffering
    putStrLn "roll, keep, score?"
    cmd <- getLine
    doYahtzee cmd
    --rand (read cmd) []
doYahtzee :: String -> IO ()
doYahtzee cmd = do
    if cmd == "roll"
        then rand 5 []
        else do print "You won"
There are really a lot of errors sprinkled throughout this code, which suggests to me that you tried to build the whole thing at once. This is a recipe for disaster; you should be building very small things and testing them often in ghci.
Lecture aside, you might find the following facts interesting (in order of the associated errors in your code):
List is deprecated; you should use Data.List instead.
No let is needed for top-level definitions.
Variable names must begin with a lower case letter.
Class prerequisites are separated from a type by =>.
The top-level module block should mainly have definitions; you should associate every where clause (especially the one near randomList) with a definition by either indenting it enough not to be a new line in the module block or keeping it on the same line as the definition you want it to be associated with.
do introduces a block; those things in the block should be indented equally and more than their context.
doYahtzee is declared and used as if it has three arguments, but seems to be defined as if it only has one.
The read function is used to parse a String. Unless you know what it does, using read to parse a String from another String is probably not what you want to do -- especially on user input.
putStrLn only takes one argument, not four, and that argument has to be a String. However, making a guess at what you wanted here, you might like the (!!) and print functions.
dieRoll doesn't seem to be defined anywhere.
It's possible that there are other errors, as well. Stylistically, I recommend that you check out replicateM, randomRs, and forever. You can use hoogle to search for their names and read more about them; in the future, you can also use it to search for functions you wish existed by their type.
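On the question of matching rolled numbers to colors, one possible approach is a plain lookup over the existing association list, flipped so that the number is the key. This is only a sketch: colorOf and showRoll are hypothetical helper names, and the rolls themselves would still come from randomRIO or similar as in the question.

```haskell
diceColor :: [(String, Int)]
diceColor = [("Black",1),("Green",2),("Purple",3),("Red",4),("White",5),("Yellow",6)]

-- Look up the color name for a rolled face; Nothing for faces outside 1..6.
colorOf :: Int -> Maybe String
colorOf face = lookup face [(n, name) | (name, n) <- diceColor]

-- Render a whole roll, marking unknown faces explicitly with "?".
showRoll :: [Int] -> String
showRoll = unwords . map (maybe "?" id . colorOf)
```

For example, showRoll [6,1,3] gives "Yellow Black Purple". Using lookup rather than (!!) avoids a crash if a face outside the table ever shows up.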

How to get system time in Haskell using Data.Time.Clock?

I need some Ints to use as seeds for random number generation, so I wanted to use the old trick of using the system time as the seed.
So I tried to use the Data.Time package and I managed to do the following:
import Data.Time.Clock
time = getCurrentTime >>= return . utctDayTime
When I run time I get things like:
Prelude Data.Time.Clock> time
55712.00536s
The type of time is IO DiffTime. I expected to see an IO Something type as this depends on things external to the program. So I have two questions:
a) Is it possible to somehow unwrap the IO and get the underlying DiffTime value?
b) How do I convert a DiffTime to an integer with its value in seconds? There's a function secondsToDiffTime but I couldn't find its inverse.
Is it possible to somehow unwrap the IO and get the underlying DiffTime value?
Yes. There are dozens of tutorials on monads which explain how. They are all based on the idea that you write a function that takes DiffTime and does something (say returning IO ()) or just returns an Answer. So if you have f :: DiffTime -> Answer, you write
time >>= \t -> return (f t)
which some people would prefer to write
time >>= (return . f)
and if you have continue :: DiffTime -> IO () you have
time >>= continue
Or you might prefer do notation:
do { t <- time
; continue t -- or possibly return (f t)
}
For more, consult one of the many fine tutorials on monads.
a) Of course it is possible to get the DiffTime value; otherwise, this function would be rather pointless. You'll need to read up on monads. This chapter and the next of Real World Haskell have a good introduction.
b) The docs for DiffTime say that it's an instance of the Real class, i.e. it can be treated as a real number, in this case the number of seconds. Converting it to seconds is thus a simple matter of chaining conversion functions:
diffTimeToSeconds :: DiffTime -> Integer
diffTimeToSeconds = floor . toRational
If you are planning to use the standard System.Random module for random number generation, then there is already a generator with a time-dependent seed initialized for you: you can get it by calling getStdGen :: IO StdGen. (Of course, you still need the answer to part (a) of your question to use the result.)
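Putting parts (a) and (b) together, a small sketch might look like the following. It assumes the time package is available; secondsSinceMidnight is a name invented for this example, and the result stays inside IO, as discussed above.

```haskell
import Data.Time.Clock (DiffTime, getCurrentTime, utctDayTime)

-- Whole seconds in a DiffTime (the fractional part is dropped).
diffTimeToSeconds :: DiffTime -> Integer
diffTimeToSeconds = floor . toRational

-- Seconds elapsed since midnight UTC; the value stays inside IO.
secondsSinceMidnight :: IO Integer
secondsSinceMidnight = do
  now <- getCurrentTime
  return (diffTimeToSeconds (utctDayTime now))
```

A caller would then write something like secondsSinceMidnight >>= print, or bind the result in a do block and pass it on to whatever needs the seed.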
This function is not exactly what the OP asks. But it's useful:
λ: import Data.Time.Clock
λ: let getSeconds = getCurrentTime >>= return . fromRational . toRational . utctDayTime
λ: :i getSeconds
getSeconds :: IO Double -- Defined at <interactive>:56:5
λ: getSeconds
57577.607162
λ: getSeconds
57578.902397
λ: getSeconds
57580.387334

Resources