I'm trying to find out why the following code has a memory leak:
module Main where
import System.IO
func :: Int -> Int -> ([Int], Int)
func input 0 = ([], input)
func input numTimes = do
let (rest, ret) = func (input + 1) (numTimes - 1)
((input : rest), ret)
main :: IO ()
main = do
hSetBuffering stdout LineBuffering
let (list, final) = func 0 10000000000
listStr = map (\x -> (show x) ++ "\n") list
putStr (foldr (++) "" listStr)
putStr (show final)
printStrs :: [String] -> String -> IO ()
printStrs [] str = do
putStrLn str
printStrs (first : rest) str = do
putStr first
printStrs rest str
When I compile it with ghc --make Main and run it, the top command shows it eating up more and more memory even though the amount of memory it uses should be constant because of lazy evaluation. I've tried using the printStrs function I wrote instead, and it still eats up all memory. I've tried using ghci on the code and using :sprint to print out the thunks from func and it seems like the thunks aren't increasing the amount of memory used for each evaluation of an element in the list.
I honestly don't know what else to do.
The problem is that func will build a huge list and laziness will not be able to avoid it. It reminds me of continuation passing where the order of computations are sequentialized.
I think, the part with foldr is responsible for the memory consumption. By avoiding it and compiling it with ghc -O3, the memory usage is constant in my test:
module Main where
import System.IO
func :: Int -> Int -> ([Int], Int)
func input 0 = ([], input)
func input numTimes = do
let (rest, ret) = func (input + 1) (numTimes - 1)
((input : rest), ret)
main :: IO ()
main = do
hSetBuffering stdout LineBuffering
let (list, final) = func 0 10000000000
mapM_ (putStrLn . show) list
putStr (show final)
In ghci, it still blows the memory. But it might be because the interpreter is not able to optimize the recursion away.
You mention in a comment that
I just want to get intermediate values so I can have some idea of how much progress the program is making and then take the final value as a separate return value
Let's try defining a special-purpose datatype which models the idea of "inspect some bit of progress, or get hold of the final result, if we have finished". Something like
{-# LANGUAGE DeriveFunctor #-}
data Progress a r = Emit a (Progress a r) | Result r
deriving Functor -- maps over the result value r, not over the as
Notice that, unlike ([Int], Int), Progress doesn't give us "direct" access to the final result until we have gone trough all the nested Emit constructors. Hopefully this will help us avoid unexpected dependencies between thunks.
Now let's define func like this:
{-# LANGUAGE BangPatterns #-}
func :: Int -> Int -> Progress Int Int
func input 0 =
Result input
-- the bang avoids the accumulation of thunks behind the input param
func !input numTimes =
Emit input (func (input + 1) (numTimes - 1))
Notice that we don't need to go through all the recursive calls to get hold of the first progress "notification". Even if input is 10000000000, we can pattern-match on the outermost Emit constructor after the first iteration!
The disadvantage of the Progress a r datatype is that we can't easily use regular list functions to print the progress. But we can define our own:
printProgress :: Show a => Progress a r -> IO r
printProgress (Result r) =
pure r
printProgress (Emit a rest) =
do print a
printProgress rest
In practice, we often also want to be able to perform monadic effects at each "step". At that point, it's common to turn to some streaming library like streaming. If you squint a little, the Stream type from "streaming" and similar libraries is basically an effectful list which returns a special result after reaching the end.
Related
I'm learning Monad Transformers and decided to write an interpreter for a simple language(with loop constructs) similar to Brainfuck using Monad Transformers. I would like to terminate the interpreter after certain number of statements.
This simple language is made of single memory cell capable of holding an Int and 5 instructions Input, Output, Increment, Decrement and Loop. A loop terminates when value in the memory is zero. Input is read from a list and similarly output is written to another list. Increment and Decrement does +1 and -1 to memory correspondingly.
I'm using World type to keep track of input, output (streams) and memory, Sum Int to count number of instructions evaluated. Except World to terminate evaluation after certain statements.
module Transformers where
import qualified Data.Map as Map
import Data.Maybe
import Control.Monad.State.Lazy
import Control.Monad.Writer.Lazy
import Control.Monad.Except
data Term = Input
| Output
| Increment
| Decrement
| Loop [Term]
deriving (Show)
data World = World {
inp :: [Int],
out :: [Int],
mem :: Int
} deriving Show
op_limit = 5
loop
:: [Term]
-> StateT World (WriterT (Sum Int) (Except World)) ()
-> StateT World (WriterT (Sum Int) (Except World)) ()
loop terms sp = sp >> do
s <- get
if mem s == 0 then put s else loop terms (foldM (\_ t -> eval t) () terms)
limit :: StateT World (WriterT (Sum Int) (Except World)) ()
limit = do
(s, count) <- listen get
when (count >= op_limit) $ throwError s
tick :: StateT World (WriterT (Sum Int) (Except World)) ()
tick = tell 1
eval :: Term -> StateT World (WriterT (Sum Int) (Except World)) ()
eval Input =
limit >> tick >> modify (\s -> s { inp = tail (inp s), mem = head (inp s) })
eval Output = limit >> tick >> modify (\s -> s { out = mem s : out s })
eval Increment = limit >> tick >> modify (\s -> s { mem = mem s + 1 })
eval Decrement = limit >> tick >> modify (\s -> s { mem = mem s - 1 })
eval (Loop terms) = loop terms (void get)
type Instructions = [Term]
interp :: Instructions -> World -> Either World (World, Sum Int)
interp insts w =
let sp = foldM (\_ inst -> eval inst) () insts
in runExcept (runWriterT (execStateT sp w))
Example run in ghci:
*Transformers> interp [Loop [Output, Decrement]] $ World [] [] 5
Right (World {inp = [], out = [1,2,3,4,5], mem = 0},Sum {getSum = 10})
The monad limit based on count and should decide to either Fail with current state or do nothing. But I noticed that count in (s, count) <- listen get is always zero. I don't understand why is this happening. Please help me understand where I went wrong.
Is my ordering of transformers in the stack correct? Are there any rules (informal) to decide the layering?
Computations inside the Writer monad can't have access to their own accumulator. What's more: the accumulator is never forced while the computation runs, not even to WHNF. This applies to both the strict and lazy variants of Writer—the strict variant is strict in a sense unrelated to the accumulator. This unavoidable laziness in the accumulator can be a source of space leaks if the computation runs for too long.
Your limit function is not branching on the value of the "mainline" WriterT accumulator. The get action (you are using mtl) simply reads the state from the StateT layer, and performs no effects in the other layers: it adds mempty to its WriterT accumulator an throws no error.
Then, the listen extracts the Writer accumulator of the get action (only of the get, not of the whole computation) and adds it to the "mainline" accumulator. But this extracted value (the one returned in the tuple) will always be mempty, that is, Sum 0!
Instead of WriterT, you could put the counter in the StateT state, as #chi has mentioned. You could also use AccumT, which is very similar to WriterT but lets you inspect the accumulator (it also lets you force it to WHNF using bang patterns).
AccumT doesn't seem to have a corresponding mtl typeclass though, so you'll need to sprinkle a few lifts in order to use it.
I beg for your help, speeding up the following program:
main = do
jobsToProcess <- fmap read getLine
forM_ [1..jobsToProcess] $ \_ -> do
[r, k] <- fmap (map read . words) getLine :: IO [Int]
putStrLn $ doSomeReallyLongWorkingJob r k
There could(!) be a lot of identical jobs to do, but it's not up to me modifying the inputs, so I tried to use Data.HashMap for backing up already processed jobs. I already optimized the algorithms in the doSomeReallyLongWorkingJob function, but now it seems, it's quite as fast as C.
But unfortunately it seems, I'm not able to implement a simple cache without producing a lot of errors. I need a simple cache of Type HashMap (Int, Int) Int, but everytime I have too much or too few brackets. And IF I manage to define the cache, I'm stuck in putting data into or retrieving data from the cache cause of lots of errors.
I already Googled for some hours but it seems I'm stuck. BTW: The result of the longrunner is an Int as well.
It's pretty simple to make a stateful action that caches operations. First some boilerplate:
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.State
import Data.Map (Map)
import qualified Data.Map as M
import Debug.Trace
I'll use Data.Map, but of course you can substitute in a hash map or any similar data structure without much trouble. My long-running computation will just add up its arguments. I'll use trace to show when this computation is executed; we'll hope not to see the output of the trace when we enter a duplicate input.
reallyLongRunningComputation :: [Int] -> Int
reallyLongRunningComputation args = traceShow args $ sum args
Now the caching operation will just look up whether we've seen a given input before. If we have, we'll return the precomputed answer; otherwise we'll compute the answer now and store it.
cache :: (MonadState (Map a b) m, Ord a) => (a -> b) -> a -> m b
cache f x = do
mCached <- gets (M.lookup x)
case mCached of
-- depending on your goals, you may wish to force `result` here
Nothing -> modify (M.insert x result) >> return result
Just cached -> return cached
where
result = f x
The main function now just consists of calling cache reallyLongRunningComputation on appropriate inputs.
main = do
iterations <- readLn
flip evalStateT M.empty . replicateM_ iterations
$ liftIO getLine
>>= liftIO . mapM readIO . words
>>= cache reallyLongRunningComputation
>>= liftIO . print
Let's try it in ghci!
> main
5
1 2 3
[1,2,3]
6
4 5
[4,5]
9
1 2
[1,2]
3
1 2
3
1 2 3
6
As you can see by the bracketed outputs, reallyLongRunningComputation was called the first time we entered 1 2 3 and the first time we entered 1 2, but not the second time we entered these inputs.
I hope i'm not too far off base, but first you need a way to carry around the past jobs with you. Easiest would be to use a foldM instead of a forM.
import Control.Monad
import Data.Maybe
main = do
jobsToProcess <- fmap read getLine
foldM doJobAcc acc0 [1..jobsToProcess]
where
acc0 = --initial value of some type of accumulator, i.e. hash map
doJobAcc acc _ = do
[r, k] <- fmap (map read . words) getLine :: IO [Int]
case getFromHash acc (r,k) of
Nothing -> do
i <- doSomeReallyLongWorkingJob r k
return $ insertNew acc (r,k) i
Just i -> do
return acc
Note, I don't actually use the interface for putting and getting the hash table key. It doesn't actually have to be a hash table, Data.Map from containers could work. Or even a list if its going to be a small one.
Another way to carry around the hash table would be to use a State transformer monad.
I am just adding this answer since I feel like the other answers are diverging a bit from the original question, namely using hashtable constructs in Main function (inside IO monad).
Here is a minimal hashtable example using hashtables module. To install the module with cabal, simply use
cabal install hashtables
In this example, we simply put some values in a hashtable and use lookup to print a value retrieved from the table.
import qualified Data.HashTable.IO as H
main :: IO ()
main = do
t <- H.new :: IO (H.CuckooHashTable Int String)
H.insert t 22 "Hello world"
H.insert t 5 "No problem"
msg <- H.lookup t 5
print msg
Notice that we need to use explicit type annotation to specify which implementation of the hashtable we wish to use.
I am trying to write to file a list of random Integers in a file. There seems to be a problem with writeFile here. When I use my function randomFile it says no instance for (Show (IO a0)). I see writeFile doesn't print anything to screen but IO(), so when I call the function randomFile 1 2 3 it says no Instance for Show (IO a0) but actually I just want to execute the function and not have to print anything but how can I avoid this problem. I might be making a lot of errors here. Any help.
import Control.Monad
import Control.Applicative
import System.Random
randNo mind maxd = randomRIO (mind,maxd)
randomFile mind maxd noe = do
let l=(replicate (fromInteger(noe ^ noe)) ( mind `randNo` maxd))
writeFile "RFile.txt" (show l)
I think you have a misunderstanding of what IO is. If you haven't done it, I strongly recommend going through the Input and Output section of Learn You a Haskell.
IO doesn't necessarily have anything to do with print. In Haskell every entry in memory that was made by your own code is considered "pure" while any entry that touches the rest of the computer lives in IO (with some exceptions you will learn about over time).
We model IO using something called a Monad. Which you will learn more about the longer you do Haskell. To understand this, let's look at an example of some code that does and doesn't use IO:
noIOused :: Int -> Int
noIOused x = x + 5
usesIO :: Int -> IO Int
usesIO x = print x >> return (x + 5)
usesIO2 :: Int -> IO Int
usesIO2 x = do
print x
return (x + 5)
The first function is "pure". The second and third functions have an IO "effect" that comes in the form of printing to the screen. usesIO and usesIO2 are just 2 different ways of doing the same thing (it's the same code but with different syntax). I'll use the second format, called do notation from here.
Here are some other ways you could have had IO effects:
add5WithFile :: Int -> IO Int
add5WithFile x = do
writeFile "someFile.txt" (show x)
return (x + 5)
Notice that in that function we didn't print anything, we wrote a file. But writing a file has a side effect and interacts with the rest of the system. So any value we return has to get wrapped in IO.
addRandom :: Int -> IO Int
addRandom x = do
y <- randomRIO (1,10)
return (x + y)
In addRandom we called randomRIO (1,10). But the problem is that randomRIO doesn't return an Int. It returns an IO Int. Why? Because in order to get true randomness we need to interact with the system in some way. To get around that, we have to temporarily strip away the IO. That's where this line comes in:
y <- randomRIO (1,10)
That <- arrow tells us that we want a y value outside of IO. For as long as we remain inside the do syntax that y value is going to be "pure". Now we can use it just like any other value.
So for example we couldn't do this:
let w = x + (randomRIO (1,10))
Because that would be trying to add Int to IO Int. And unfortunately our + function doesn't know how to do that. So first we have to "bind" the result of randomRIO to y before we can add it to x.
Now let's look at your code:
let l=(replicate (fromInteger(noe ^ noe)) ( mind `randNo` maxd))
writeFile "RFile.txt" (show l)
The type of l is actually IO a0. It's a0 because you haven't told the compiler what kind of number you want. So it doesn't know if you want a fraction, a double, a big integer or whatever.
So the first problem is to let the compiler know a little bit more about what kind of random number you want. We do this by adding a type annotation:
randNo :: Int -> Int -> IO Int
randNo mind maxd = randomRIO (mind,maxd)
Now both you and the compiler knows what kind of value randNo is.
Now we need to "bind" that value inside of the do notation to temporarily escape IO. You might think that would be simple, like this:
randomFile mind maxd noe = do
l <- replicate (fromInteger(noe ^ noe)) ( mind `randNo` maxd)
writeFile "RFile.txt" (show l)
Surely that will "bind" the IO Int to l right? Unfortunately not. The problem here is that replicate is a function of the form Int -> a -> [a]. That is, given a number and a type, it will give you a list of that type.
If you give replicate an IO Int it's going to make [IO Int]. That actually looks more like this: List (IO Int) except we use [] as syntactic sugar for lists. Unfortunately if we want to "bind" an IO value to something with <- it has to be the out-most type.
So what you need is a way to turn an [IO Int] into an IO [Int]. There are two ways to do that. If we put \[IO a\] -> IO \[a\] into Hoogle we get this:
sequence :: Monad m => [m a] -> m [a]
As I mentioned before, we generalise IO to something called a Monad. Which isn't really that big a deal, we could pretend that sequence has this signature: sequence :: [IO a] -> IO [a] and it would be the same thing just specialised to IO.
Now your function would be done like this:
randomFile mind maxd noe = do
l <- sequence (replicate (fromInteger(noe ^ noe)) ( mind `randNo` maxd))
writeFile "RFile.txt" (show l)
But a sequence followed by replicate is something people have to do all the time. So someone went and made a function called replicateM:
replicateM :: Monad m => Int -> m a -> m [a]
Now we can write your function like this:
randomFile mind maxd noe = do
l <- replicateM (fromInteger(noe ^ noe)) ( mind `randNo` maxd)
writeFile "RFile.txt" (show l)
And for some real Haskell magic, you can write all 3 lines of code in a single line, like this:
randomFile mind maxd noe = randomRIO >>= writeFile "RFile.txt" . replicateM (fromInteger(noe ^ noe))
If that looks like gibberish to you, then there's a lot you need to learn. Here is the suggested path:
If you haven't already, start from the beginning with Learn You a Haskell
Then learn about how You could have invented Monads
Then learn more about how to use randomness in Haskell
Finally see if you can complete the 20 intermediate Haskell exercises
I am building some moderately large DIMACS files, however with the method used below the memory usage is rather large compared to the size of the files generated, and on some of the larger files I need to generate I run in to out of memory problems.
import Control.Monad.State.Strict
import Control.Monad.Writer.Strict
import qualified Data.ByteString.Lazy.Char8 as B
import Control.Monad
import qualified Text.Show.ByteString as BS
import Data.List
main = printDIMACS "test.cnf" test
test = do
xs <- freshs 100000
forM_ (zip xs (tail xs))
(\(x,y) -> addAll [[negate x, negate y],[x,y]])
type Var = Int
type Clause = [Var]
data DIMACSS = DS{
nextFresh :: Int,
numClauses :: Int
} deriving (Show)
type DIMACSM a = StateT DIMACSS (Writer B.ByteString) a
freshs :: Int -> DIMACSM [Var]
freshs i = do
next <- gets nextFresh
let toRet = [next..next+i-1]
modify (\s -> s{nextFresh = next+i})
return toRet
fresh :: DIMACSM Int
fresh = do
i <- gets nextFresh
modify (\s -> s{nextFresh = i+1})
return i
addAll :: [Clause] -> DIMACSM ()
addAll c = do
tell
(B.concat .
intersperse (B.pack " 0\n") .
map (B.unwords . map BS.show) $ c)
tell (B.pack " 0\n")
modify (\s -> s{numClauses = numClauses s + length c})
add h = addAll [h]
printDIMACS :: FilePath -> DIMACSM a -> IO ()
printDIMACS file f = do
writeFile file ""
appendFile file (concat ["p cnf ", show i, " ", show j, "\n"])
B.appendFile file b
where
(s,b) = runWriter (execStateT f (DS 1 0))
i = nextFresh s - 1
j = numClauses s
I would like to keep the monadic building of clauses since it is very handy, but I need to overcome the memory problem. How do I optimize the above program so that it doesn't use too much memory?
If you want good memory behavior, you need to make sure that you write out the clauses as you generate them, instead of collecting them in memory and dumping them as such, either using lazyness or a more explicit approach such as conduits, enumerators, pipes or the like.
The main obstacle to that approach is that the DIMACS format expects the number of clauses and variables in the header. This prevents the naive implementation from being sufficiently lazy. There are two possibilities:
The pragmatic one is to write the clauses first to a temporary location. After that the numbers are known, so you write them to the real file and append the contents of the temporary file.
The prettier approach is possible if the generation of clauses has no side effects (besides the effects offered by your DIMACSM monad) and is sufficiently fast: Run it twice, first throwing away the clauses and just calculating the numbers, print the header line, run the generator again; now printing the clauses.
(This is from my experience with implementing SAT-Britney, where I took the second approach, because it fitted better with other requirements in that context.)
Also, in your code, addAll is not lazy enough: The list c needs to be retained even after writing (in the MonadWriter sense) the clauses. This is another space leak. I suggest you implement add as the primitive operation and then addAll = mapM_ add.
As explained in Joachim Breitner's answer the problem was that DIMACSM was not lazy enough, both because the strict versions of the monads was used and because the number of variables and clauses are needed before the ByteString can be written to the file. The solution is to use the lazy versions of the Monads and execute them twice. It turns out that it is also necessary to have WriterT be the outer monad:
import Control.Monad.State
import Control.Monad.Writer
...
type DIMACSM a = WriterT B.ByteString (State DIMACSS) a
...
printDIMACS :: FilePath -> DIMACSM a -> IO ()
printDIMACS file f = do
writeFile file ""
appendFile file (concat ["p cnf ", show i, " ", show j, "\n"])
B.appendFile file b
where
s = execState (execWriterT f) (DS 1 0)
b = evalState (execWriterT f) (DS 1 0)
i = nextFresh s - 1
j = numClauses s
How can we print current state value, for debugging purposes? E.g., in code from concrete-example-1 of http://www.haskell.org/haskellwiki/State_Monad , how can we print the current state value after each input character is read?
module StateGame where
import Control.Monad.State
type GameValue = Int
type GameState = (Bool, Int)
playGame :: String -> State GameState GameValue
playGame [] = do
(_, score) <- get
return score
playGame (x:xs) = do
(on, score) <- get
case x of
'a' | on -> put (on, score + 1)
'b' | on -> put (on, score - 1)
'c' -> put (not on, score)
_ -> put (on, score)
playGame xs
startState = (False, 0)
main = print $ evalState (playGame "abcaaacbbcabbab") startState
For quick and dirty debugging, use trace from Debug.Trace. I often find it useful to flip the argument order and define
infixl 0 `debug`
debug :: a -> String -> a
debug = flip trace
Then you can tack the debugging to the end of the line and it's easier to comment out (and at the end remove).
For more principled logging, combine the State with a Writer and tell the logging messages or use StateT s IO if you directly want to log to a file or stderr.
As n.m. points out there is Debug.Trace, but it's easy to write something yourself. However I strongly recommend to use this only for debugging, and to remove it for real world code. Here is an example:
import System.IO.Unsafe
output a b = seq (unsafePerformIO (print a)) b
(output "test" 23) * 25
-- "test"
-- 527
Here output takes an argument to print out, and a return value, behaving like a const, just with a side effect. seq is needed to force the evaluation of print, else laziness will prevent to print anything.