The program below works when run with runhaskell, or when compiled without -O2. When compiled with -O2, it seems to hang.
I'm using GHC 7.10.2.
I've changed the min/max iterations to 10 and 20 respectively. It will
generate anywhere from 20 to 100 MB of output into the file test.out.
Run time is about 15 - 60 secs.
Program Explanation
Below is a multi-threaded program that has a pool of workers and a manager. The workers generate traces to be used in plotting a Buddhabrot, put it in a queue, and a manager periodically empties the queue and writes the data to disk. When a certain amount of data has been generated, the program stops.
But when the program runs the manager thread only does one check, and then it gets stuck (the worker threads are still running). However, if I remove the part where the manager thread writes to file, then everything seems to work. I just don't understand why...
import Control.Concurrent
import Control.Concurrent.Async
import Control.Concurrent.STM
import Control.Monad
( forever
, unless
)
import Control.Monad.Loops
import System.IO
import System.Random
import qualified Data.Binary as B
import qualified Data.ByteString.Lazy as BS
type Coord = (Double, Double)
type Trace = [Coord]
-- | Represents a rectangle in the complex plane, bounded by a lower left
-- coordinate and an upper right coordinate.
data Plane
= Plane { ll :: Coord, ur :: Coord }
deriving (Show)
-- | Adds two coordinates.
(+.) :: Coord -> Coord -> Coord
(r1, i1) +. (r2, i2) = (r1 + r2, i1 + i2)
-- | Multiplies two coordinates.
(*.) :: Coord -> Coord -> Coord
(r1, i1) *. (r2, i2) = (r1*r2 - i1*i2, r1*i2 + r2*i1)
-- | Computes the square of a coordinate.
square :: Coord -> Coord
square (r, i) = (r*r - i*i, 2*r*i)
-- | Distance from origin to a given coordinate.
distFromOrigin :: Coord -> Double
distFromOrigin (r, i) = r*r + i*i
-- | A structure for passing data to the worker threads.
data WorkerData
= WorkerData { wdMinIt :: Int
, wdMaxIt :: Int
, wdTraceQueue :: TQueue Trace
-- ^ A queue of traces to be written to disk.
}
-- | A structure for passing data to the manager thread.
data ManagerData
= ManagerData { mdOutHandle :: Handle
-- ^ Handle to the output file.
, mdNumTraces :: Integer
-- ^ Number of traces to gather.
, mdTraceQueue :: TQueue Trace
-- ^ A queue of traces to be written to disk.
}
-- | Encodes an entity to binary bytestring.
encode :: B.Binary a => a -> BS.ByteString
encode = B.encode
-- | Writes a lazy bytestring to file.
writeToFile :: Handle -> BS.ByteString -> IO ()
writeToFile = BS.hPut
mkManagerData :: TQueue Trace -> IO ManagerData
mkManagerData t_queue =
do let out_f = "test.out"
out_h <- openBinaryFile out_f WriteMode
let num_t = 1000
return $ ManagerData { mdOutHandle = out_h
, mdNumTraces = num_t
, mdTraceQueue = t_queue
}
mkWorkerData :: TQueue Trace -> IO WorkerData
mkWorkerData t_queue =
do let min_it = 10 -- 1000
max_it = 20 -- 10000
return $ WorkerData { wdMinIt = min_it
, wdMaxIt = max_it
, wdTraceQueue = t_queue
}
-- | The actions to be performed by the manager thread.
runManager :: ManagerData -> IO ()
runManager m_data =
  do execute 0
     return ()
  where execute count =
          do new_traces <- purgeTQueue $ mdTraceQueue m_data
             let new_count = count + (toInteger $ length new_traces)
             putStrLn $ "Found " ++ (show $ new_count) ++ " traces so far. "
             if length new_traces > 0
               then do putStrLn $ "Writing new traces to file..."
                       _ <- mapM (writeToFile (mdOutHandle m_data))
                                 (map encode new_traces)
                       putStr "Done"
               else return ()
             putStrLn ""
             unless (new_count >= mdNumTraces m_data) $
               do threadDelay (1000 * 1000) -- Sleep 1s
                  execute new_count
-- | The actions to be performed by a worker thread.
runWorker :: WorkerData -> IO ()
runWorker w_data =
forever $
do c <- randomCoord
case computeTrace c (wdMinIt w_data) (wdMaxIt w_data) of
Just t -> atomically $ writeTQueue (wdTraceQueue w_data) t
Nothing -> return ()
-- | Reads all values from a given 'TQueue'. If any other thread reads from the
-- same 'TQueue' during the execution of this function, then this function may
-- deadlock.
purgeTQueue :: Show a => TQueue a -> IO [a]
purgeTQueue q =
  whileJust (atomically $ tryReadTQueue q)
            (return . id)
-- | Generates a random coordinate to trace.
randomCoord :: IO Coord
randomCoord =
do x <- randomRIO (-2.102613, 1.200613)
y <- randomRIO (-1.237710, 1.239710)
return (x, y)
-- | Computes a trace, using the classical Mandelbrot function, for a given
-- coordinate and minimum and maximum iteration count. If the length of the
-- trace is less than the minimum iteration count, or exceeds the maximum
-- iteration count, 'Nothing' is returned.
computeTrace
:: Coord
-> Int
-- ^ Minimum iteration count.
-> Int
-- ^ Maximum iteration count.
-> Maybe Trace
computeTrace c0 min_it max_it =
if isUsefulCoord c0
then let step c = square c +. c0
computeIt c it = if it < max_it
then computeIt (step c) (it + 1)
else it
computeTr [] = error "computeTr: empty list"
computeTr (c:cs) = if length cs < max_it
then computeTr (step c:(c:cs))
else (c:cs)
num_it = computeIt c0 0
in if num_it >= min_it && num_it <= max_it
then Just $ reverse $ computeTr [c0]
else Nothing
else Nothing
-- | Checks if a given coordinate is useful by checking if it belongs in the
-- cardioid or period-2 bulb of the Mandelbrot.
isUsefulCoord :: Coord -> Bool
isUsefulCoord (x, y) =
let t1 = x - 1/4
p = sqrt (t1*t1 + y*y)
is_in_cardioid = x < p - 2*p*p + 1/4
t2 = x + 1
is_in_bulb = t2*t2 + y*y < 1/16
in not is_in_cardioid && not is_in_bulb
main :: IO ()
main =
do t_queue <- newTQueueIO
m_data <- mkManagerData t_queue
w_data <- mkWorkerData t_queue
let num_workers = 1
workers <- mapM async (replicate num_workers (runWorker w_data))
runManager m_data
_ <- mapM cancel workers
_ <- mapM waitCatch workers
putStrLn "Tracing finished"
Why It Fails
After reviewing the answers below, I finally realized why it doesn't work as intended. The program does not hang, but the time it takes for the manager thread to encode a single trace is on the order of tens of seconds (and when encoded it consumes several megabytes)! This means that even if there are only some tens of traces in the queue when it is emptied (on my machine the workers manage to produce about 250 traces before the manager thread empties the queue), it will take forever before the queue is emptied again.
Hence it matters little what solution I choose unless the work of the manager thread is greatly reduced. For that, I will have to abandon my idea of dumping each individual trace to file and instead process it once generated.
The problem is two-fold:
(1) The manager thread doesn't process any Traces until it has exhausted the queue.
(2) The worker thread can add elements to the queue very, very quickly.
This results in a race that the manager thread rarely wins. [ This also explains the observed behavior with -O2 - the optimization just made the worker thread faster. ]
Adding some debugging code shows that the worker can add
items to the queue in excess of 100K Traces per second.
Moreover, even though the manager is only interested in
writing out the first 1000 Traces, the worker doesn't
stop at this limit. So, under certain circumstances,
the manager is never able to exit this loop:
purgeTQueue q =
  whileJust (atomically $ tryReadTQueue q)
            (return . id)
The simplest way to fix the code is to have the
manager thread use readTQueue to read and process just one
item off the queue at a time. This will also block
the manager thread when the queue is empty, obviating
the need for the manager thread to periodically sleep.
Changing purgeTQueue to:
purgeTQueue q = do item <- atomically $ readTQueue q
                   return [item]
and removing the threadDelay from runManager fixes the problem.
Example code available in the Lib4.hs module at: https://github.com/erantapaa/mandel
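As a rough, self-contained illustration of the blocking-read approach (this is not the code from the linked repository), the sketch below consumes exactly n items, one at a time, while a much faster producer keeps writing. The names consumeN and demo are made up for this example, and the producer is a stand-in for the worker thread.

```haskell
import Control.Concurrent (forkIO, killThread)
import Control.Concurrent.STM
  (TQueue, atomically, newTQueueIO, readTQueue, writeTQueue)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- | Hypothetical helper: hand exactly @n@ queue items to @process@,
-- one at a time. 'readTQueue' blocks while the queue is empty, so no
-- polling or 'threadDelay' is needed.
consumeN :: Int -> TQueue a -> (a -> IO ()) -> IO ()
consumeN n q process = go 0
  where
    go count
      | count >= n = return ()
      | otherwise = do
          item <- atomically (readTQueue q)
          process item
          go (count + 1)

-- | Demo: the producer writes far more items than the consumer wants,
-- yet the consumer stops after five.
demo :: IO [Int]
demo = do
  q <- newTQueueIO
  producer <- forkIO $ mapM_ (atomically . writeTQueue q) [1 .. 100000 :: Int]
  seen <- newIORef []
  consumeN 5 q (\x -> modifyIORef seen (x :))
  killThread producer
  reverse <$> readIORef seen
```

Because the consumer never tries to empty the queue, the producer's speed no longer decides whether the loop terminates.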
How can computations done in ST be made to run in parallel?
I have a vector which needs to be filled in by random access, hence the use of ST, and the computation runs correctly single-threaded, but have been unable to figure out how to use more than one core.
Random access is needed because of the meaning of the indices into the vector. There are n things and every possible way of choosing among n things has an entry in the vector, as in the choice function. Each of these choices corresponds to a binary number (conceptually, a packed [Bool]) and these Int values are the indices. If there are n things, then the size of the vector is 2^n. The natural way the algorithm runs is for every entry corresponding to "n choose 1" to be filled in, then every entry for "n choose 2," etc. The entries corresponding to "n choose k" depends on the entries corresponding to "n choose (k-1)." The integers for the different choices do not occur in numerical order, and that's why random access is needed.
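To make the indexing scheme concrete, here is a minimal sketch (not part of the original program) that lists the indices belonging to "n choose k" by filtering on the number of set bits. The name subsetsOfSize is made up for illustration; popCount comes from Data.Bits in base.

```haskell
import Data.Bits (popCount)

-- | Hypothetical helper: all n-bit masks with exactly k bits set,
-- i.e. the vector indices that belong to "n choose k".
subsetsOfSize :: Int -> Int -> [Int]
subsetsOfSize n k = [ i | i <- [0 .. 2 ^ n - 1], popCount i == k ]
```

For n = 3 this yields [1,2,4] for k = 1 and [3,5,6] for k = 2, so the entries of one level are scattered through the vector rather than stored contiguously.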
Here's a pointless (but slow) computation that follows the same pattern. The example function shows how I tried to break the computation up so that the bulk of the work is done in a pure world (no ST monad). In the code below, bogus is where most of the work is done, with the intent of calling that in parallel, but only one core is ever used.
import qualified Data.Vector as Vb
import qualified Data.Vector.Mutable as Vm
import qualified Data.Vector.Generic.Mutable as Vg
import qualified Data.Vector.Generic as Gg
import Control.Monad.ST as ST ( ST, runST )
import Data.Foldable(forM_)
import Data.Char(digitToInt)

main :: IO ()
main = do
  putStrLn $ show (example 9)

example :: Int -> Vb.Vector Int
example n = runST $ do
  m <- Vg.new (2^n) :: ST s (Vm.STVector s Int)
  Vg.unsafeWrite m 0 (1)
  forM_ [1..n] $ \i -> do
    p <- prev m n (i-1)
    let newEntries = (choiceList n i) :: [Int]
    forM_ newEntries $ \e -> do
      let v = bogus p e
      Vg.unsafeWrite m e v
  Gg.unsafeFreeze m

choiceList :: Int -> Int -> [Int]
choiceList _ 0 = [0]
choiceList n 1 = [ 2^k | k <- [0..(n-1) ] ]
choiceList n k
  | n == k = [2^n - 1]
  | otherwise = (choiceList (n-1) k) ++ (map ((2^(n-1)) +) $ choiceList (n-1) (k-1))

prev :: Vm.STVector s Int -> Int -> Int -> ST s Integer
prev m n 0 = return 1
prev m n i = do
  let chs = choiceList n i
  v <- mapM (\k -> Vg.unsafeRead m k ) chs
  let e = map (\k -> toInteger k ) v
  return (sum e)

bogus :: Integer -> Int -> Int
bogus prior index = do
  let f = fac prior
  let g = (f^index) :: Integer
  let d = (map digitToInt (show g)) :: [Int]
  let a = fromIntegral (head d)^2
  a

fac :: Integer -> Integer
fac 0 = 1
fac n = n * fac (n - 1)
If anyone tests this, using more than 9 or 10 in show (example 9) will take much longer than you want to wait for such a pointless sequence of numbers.
Just do it in IO. If you need to use the result in pure code, then unsafePerformIO is available.
The following version runs about 3-4 times faster with +RTS -N16 than +RTS -N1. My changes involved converting the ST vectors to IO, changing the forM_ to forConcurrently_, and adding a bang annotation to let !v = bogus ....
Full code:
{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector as Vb
import qualified Data.Vector.Mutable as Vm
import qualified Data.Vector.Generic.Mutable as Vg
import qualified Data.Vector.Generic as Gg
import Control.Monad.ST as ST ( ST, runST )
import Data.Foldable(forM_)
import Data.Char(digitToInt)
import Control.Concurrent.Async
import System.IO.Unsafe

main :: IO ()
main = do
  let m = unsafePerformIO (example 9)
  putStrLn $ show m

example :: Int -> IO (Vb.Vector Int)
example n = do
  m <- Vg.new (2^n)
  Vg.unsafeWrite m 0 (1)
  forM_ [1..n] $ \i -> do
    p <- prev m n (i-1)
    let newEntries = (choiceList n i) :: [Int]
    forConcurrently_ newEntries $ \e -> do
      let !v = bogus p e  -- the bang forces the work inside the worker thread
      Vg.unsafeWrite m e v
  Gg.unsafeFreeze m

choiceList :: Int -> Int -> [Int]
choiceList _ 0 = [0]
choiceList n 1 = [ 2^k | k <- [0..(n-1) ] ]
choiceList n k
  | n == k = [2^n - 1]
  | otherwise = (choiceList (n-1) k) ++ (map ((2^(n-1)) +) $ choiceList (n-1) (k-1))

prev :: Vm.IOVector Int -> Int -> Int -> IO Integer
prev m n 0 = return 1
prev m n i = do
  let chs = choiceList n i
  v <- mapM (\k -> Vg.unsafeRead m k ) chs
  let e = map (\k -> toInteger k ) v
  return (sum e)

bogus :: Integer -> Int -> Int
bogus prior index = do
  let f = fac prior
  let g = (f^index) :: Integer
  let d = (map digitToInt (show g)) :: [Int]
  let a = fromIntegral (head d)^2
  a

fac :: Integer -> Integer
fac 0 = 1
fac n = n * fac (n - 1)
I think this can not be done in a safe way. In the general case, it seems it would break Haskell's referential transparency.
If we could perform multi-threaded computations within ST s, then we could spawn two threads that race over the same STRef s Bool. Let's say one thread is writing False and the other one True.
After we use runST on the computation, we get an expression of type Bool which is sometimes False and sometimes True. That should not be possible.
If you are absolutely certain that your parallelization does not break referential transparency, you could try using unsafe primitives like unsafeIOToST to spawn new threads. Use with extreme care.
There might be safer ways to achieve something similar. Outside ST, we do have some parallelism available in Control.Parallel.Strategies.
There are a number of ways to do parallelization in Haskell. Usually they give comparable performance improvements; however, some are better than others, and the best choice mostly depends on the problem being parallelized. This particular use case looked very interesting to me, so I decided to investigate a few approaches.
Approaches
vector-strategies
We are using a boxed vector, therefore we can utilize laziness and built-in spark pool for parallelization. One very simple approach is provided by vector-strategies package, which can iterate over any immutable boxed vector and evaluate all of the thunks in parallel. It is also possible to split the vector in chunks, but as it turns out the chunk size of 1 is the optimal one:
exampleParVector :: Int -> Vb.Vector Int
exampleParVector n = example n `using` parVector 1
parallel
parVector uses par underneath and requires one extra iteration over the vector. In this case we are already iterating over the vector, so it makes more sense to use par from the parallel package directly. This allows us to perform the computation in parallel while continuing to use the ST monad:
import Control.Parallel (par)
...
  forM_ [1..n] $ \i -> do
    p <- prev m n (i-1)
    let newEntries = choiceList n i :: [Int]
    forM_ newEntries $ \e -> do
      let v = bogus p e
      v `par` Vg.unsafeWrite m e v
It is important to note that the computation of each element of the vector is expensive when compared to the total number of elements in the vector. That is why using par is a very good solution here. If it was the opposite, namely the vector was very large, but elements weren't too expensive to compute, it would be better to use an unboxed vector and switch it to a different parallelization method.
async
Another way was described by #K.A.Buhr. Switch to IO from ST and use async:
import Control.Concurrent.Async (forConcurrently_)
...
  forM_ [1..n] $ \i -> do
    p <- prev m n (i-1)
    let newEntries = choiceList n i :: [Int]
    forConcurrently_ newEntries $ \e -> do
      let !v = bogus p e
      Vg.unsafeWrite m e v
The concern that #chi has raised is a valid one, however in this particular implementation it is safe to use unsafePerformIO instead of runST, because parallelization does not violate the invariant of deterministic computation. Namely, we can promise that regardless of the input supplied to example function, the output will always be exactly the same.
scheduler
Green threads are pretty cheap in Haskell, but they aren't free. The solution above with the async package has one slight drawback: it will spin up at least as many threads as there are elements in the newEntries list each time forConcurrently_ is called. It would be better to spin up as many threads as there are capabilities (the -N RTS option) and let them do all the work. For this we can use the scheduler package, which is a work-stealing scheduler:
import Control.Scheduler (Comp(Par), runBatch_, scheduleWork_, withScheduler_)
...
  withScheduler_ Par $ \scheduler ->
    forM_ [1..n] $ \i -> runBatch_ scheduler $ \_ -> do
      p <- prev m n (i-1)
      let newEntries = choiceList n i :: [Int]
      forM_ newEntries $ \e -> scheduleWork_ scheduler $ do
        let !v = bogus p e
        Vg.unsafeWrite m e v
Spark pool in GHC also uses a work stealing scheduler, which is built into RTS and is unrelated to the package above in any shape or form, but the idea is very similar: few threads with many units of computation.
Benchmarks
Here are some benchmarks on a 16-core machine for all of the approaches with example 7 (value 9 takes on the order of seconds, which introduces too much noise for criterion). We only get about a 5x speedup, because a significant part of the algorithm is sequential in nature and can't be parallelized.
I've read in multiple places that the writer monad of a list keeps that full list in memory, and therefore shouldn't be used in anything other than small samples (no logging, for instance).
For instance, read here
However, to test the claim, I wrote the following program, and actually showed that it successfully outputs an infinite list lazily!
import Control.Monad.Writer

createInfiniteList :: Int -> Writer [Int] ()
createInfiniteList i = do
  tell [i]
  createInfiniteList (i+1)

main :: IO ()
main = do
  let x = execWriter $ createInfiniteList 1
  print x
I've watched this program output over 1 billion items (it runs very fast), and monitored that the memory usage never went past 0.1% on my machine.
Has the writer monad been rewritten to fix the original problem? Can I count on it continuing to work this way in the future?
note- I am aware that better logging monads exist (I use those elsewhere)... My desired use case is not logging (but it is similar)
There are two factors at play here. The way the <>/mappend calls are nested, and whether the entire log is kept in memory.
How are the <> calls nested?
This depends on how you write your code using Writer, not on Writer's implementation. To see why, let's cheat.
data Tree a = Nil | Leaf a | Node (Tree a) (Tree a)
  deriving (Show)

instance Semigroup (Tree a)
  where x <> y = Node x y

instance Monoid (Tree a)
  where mempty = Nil
This is not a proper monoid, since <> is not associative. x <> (y <> z) gives Node x (Node y z) whereas (x <> y) <> z gives Node (Node x y) z. It allows us to tell after the fact whether the Writer's "log" was reduced left-nested or right-nested.
go :: Int -> Writer (Tree Int) ()
go i
  | i < 5
  = do tell (Leaf i)
       go (i+1)
  | otherwise
  = pure ()

main :: IO ()
main = do
  let (result, log) = runWriter $ go 1
  putStrLn (render log)

render Nil = "Nil"
render (Leaf x) = show x
render (Node x y) = "(" ++ render x ++ ") <> (" ++ render y ++ ")"
With this, you get: (1) <> ((2) <> ((3) <> ((4) <> (Nil))))
Clearly right-nested. Hence how you could generate an infinite list as the "log" of a Writer and consume it as it was generated in a relatively small amount of space.
But swap the order of the tell and the recursion, so that it looks like this:
go :: Int -> Writer (Tree Int) ()
go i
  | i < 5
  = do go (i+1)
       tell (Leaf i)
  | otherwise
  = pure ()
And you get this: ((((Nil) <> (4)) <> (3)) <> (2)) <> (1). Now it's left-nested, and infinite recursion does not work:
import Control.Monad.Writer

createInfiniteList :: Int -> Writer [Int] ()
createInfiniteList i = do
  createInfiniteList (i+1)
  tell [i]

main :: IO ()
main = do
  let x = execWriter $ createInfiniteList 1
  print x
This never prints anything, and consumes an ever-growing amount of memory.
Basically the structure of the <> calls is similar to the structure of your Writer expressions. Everywhere you bind in a call to another function (including the equivalent in a do-block), all of the <> calls resulting from that call will be "inside parentheses". So tell _ >> recurse results in right-nested <>s, while recurse >> tell _ results in left-nested <>s, and more complex call-graphs result in similarly-structured nesting of <>s.
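To see the "similarly-structured nesting" for a call graph that mixes both orders, here is a small sketch that uses only base. W is a hand-rolled stand-in for Writer (it is not the real Control.Monad.Writer type), and mixed and nestingOf are names invented for this example; Tree and render repeat the cheat from above.

```haskell
-- Minimal stand-in for Writer, just enough to observe how <> nests.
newtype W w a = W { runW :: (a, w) }

instance Functor (W w) where
  fmap f (W (a, w)) = W (f a, w)

instance Monoid w => Applicative (W w) where
  pure a = W (a, mempty)
  W (f, w1) <*> W (a, w2) = W (f a, w1 <> w2)

instance Monoid w => Monad (W w) where
  W (a, w1) >>= k = let W (b, w2) = k a in W (b, w1 <> w2)

tell :: w -> W w ()
tell w = W ((), w)

-- The deliberately non-associative "monoid" that records nesting.
data Tree a = Nil | Leaf a | Node (Tree a) (Tree a)

instance Semigroup (Tree a) where
  (<>) = Node

instance Monoid (Tree a) where
  mempty = Nil

render :: Show a => Tree a -> String
render Nil        = "Nil"
render (Leaf x)   = show x
render (Node x y) = "(" ++ render x ++ ") <> (" ++ render y ++ ")"

-- One tell before the recursive call and one after it.
mixed :: Int -> W (Tree Int) ()
mixed i
  | i < 3 = do
      tell (Leaf i)
      mixed (i + 1)
      tell (Leaf (negate i))
  | otherwise = pure ()

nestingOf :: W (Tree Int) () -> String
nestingOf = render . snd . runW
```

nestingOf (mixed 1) gives (1) <> (((2) <> ((Nil) <> (-2))) <> (-1)): each recursive call contributes its own parenthesised group, with the tells before it to the left and the tells after it to the right.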
Forcing the result builds the entire log
Another particular thing about your test program is that it doesn't use the "result" of the Writer at all, only the "log". Obviously if the recursion is infinite there's never any final result at all, but if we change your program like so:
import Control.Monad.Writer

createLargeList :: Int -> Writer [Int] ()
createLargeList i
  | i < 50000000
  = do tell [i]
       createLargeList (i+1)
  | otherwise
  = pure ()

main :: IO ()
main = do
  let (result, log) = runWriter $ createLargeList 1
  print $ length log
  print result
Then it behaves similarly; length consumes the list as it is produced and completes in short order (and relatively low memory usage). After that the () is readily available and is printed immediately.
But if we change it to print the result first:
import Control.Monad.Writer

createLargeList :: Int -> Writer [Int] ()
createLargeList i
  | i < 50000000
  = do tell [i]
       createLargeList (i+1)
  | otherwise
  = pure ()

main :: IO ()
main = do
  let (result, log) = runWriter $ createLargeList 1
  print result
  print $ length log
Then on my system this takes much longer, and consumes nearly 15 GB of RAM (see note 1 below). It does have to materialise the log fully in RAM just to get at the final result, even when the <>s are right-nested and the log can be consumed lazily.
Technically I believe it is not building the list in memory, but rather a chain of thunks applying <> to singleton lists, which is just as long as the final list and probably uses more memory per link in the chain. The resulting list is still consumed by length as it is generated by forcing those thunks, but that doesn't really help since the entire thunk chain had to be generated to get at the final () result, instead of having the thunk chain itself generated as length demands more of the list.
Note 1: That's compiling with plain ghc foo.hs; if I compile with -O2 then it behaves similarly to printing the length of the log first. This is a fairly simple case for GHC to inline everything and figure out a better way of computing the same result; I wouldn't assume its optimisations could address this problem if the program were more complicated.
I've been trying to solve Advent of Code day 5, question 2 (https://adventofcode.com/2017/day/5). It differs from the first question in that if the item is greater than or equal to 3, it's decreased by 1 instead of increased.
When running the implementation with the test data, it produces the correct outcome, so the implementation seems to be correct. Also, the recursive call looks to be in tail position, but it's still producing a stack overflow exception.
The code looks like this
module AdventOfCode5 where

type Instruction = Int
type Position = Int

main :: IO ()
main = do
  input <- readFile "day5input.txt"
  let instructions = fmap (read :: String -> Instruction) $ lines input
  _ <- putStrLn $ show $ computeResult (Prelude.length instructions) 0 (+1) $ instructions
  return ()

main2 :: IO ()
main2 = do
  input <- readFile "day5input.txt"
  let instructions = fmap (read :: String -> Instruction) $ lines input
  _ <- putStrLn $ show $ computeResult (Prelude.length instructions) 0 decAbove3AndIncBelow3 instructions
  return ()

decAbove3AndIncBelow3 :: Int -> Int
decAbove3AndIncBelow3 x
  | x >= 3 = x - 1
  | otherwise = x + 1

computeResult :: Int -> Position -> (Int -> Int) -> [Instruction] -> Int
computeResult = takeStep' 0
  where takeStep' :: Int -> Int -> Position -> (Int -> Int) -> [Instruction] -> Int
        takeStep' count max pos changeInteger instructions
          | pos >= max = count
          | otherwise =
              let
                elementAtPos = instructions!!pos
                newCount = count + 1
                newPos = pos + elementAtPos
                newInstructions = (take pos instructions) ++ ([(changeInteger elementAtPos)]) ++ (drop (pos + 1)) instructions
              in
                takeStep' newCount max newPos changeInteger newInstructions
The idea of the implementation is that you hold a counter and increment it on every iteration, while replacing the list of instructions with an updated version (where the Int -> Int is the function that knows how to update an entry). You have a position to look at, and the recursion stops as soon as the position is larger than the list size (which I passed as input but could also be derived from the list of instructions).
Can anybody explain to me why this one is producing a stackoverflow?
There is a space leak in the first argument of takeStep', because it builds a thunk (... ((0 + 1) + 1) ...) + 1 instead of just evaluating the integer.
The stack may explode when that thunk gets evaluated.
Use seq to force count before continuing, e.g., count `seq` otherwise in the guard;
or compile with optimizations.
ghci is interpreting it, not compiling it. In particular, it doesn't perform the strictness analysis necessary to automatically fix that leak.
You can run this command to compile with optimizations (-O)
ghc -O -main-is AdventOfCode5.main2 AdventOfCode5.hs
(although even without optimizations compilation seems to reduce space usage enough to succeed.)
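As a sketch of the seq suggestion, here is a condensed, self-contained variant of the step loop in which the counter is forced before every recursive call. The names countSteps and partTwoRule are made up for this example, and the list is still rebuilt on every step as in the question, so it only addresses the counter thunk, not the overall running time.

```haskell
-- | Count jumps until the position leaves the list. The counter is
-- forced with 'seq' on each step, so no chain of (+ 1) thunks builds up.
countSteps :: (Int -> Int) -> [Int] -> Int
countSteps change instrs0 = go 0 0 instrs0
  where
    size = length instrs0
    go count pos instrs
      | pos < 0 || pos >= size = count
      | otherwise =
          let x       = instrs !! pos
              instrs' = take pos instrs ++ [change x] ++ drop (pos + 1) instrs
              count'  = count + 1
          in count' `seq` go count' (pos + x) instrs'

-- | The part-two update rule described in the question.
partTwoRule :: Int -> Int
partTwoRule x
  | x >= 3    = x - 1
  | otherwise = x + 1
```

On the puzzle's sample input [0,3,0,1,-3], countSteps (+ 1) gives 5 and countSteps partTwoRule gives 10.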
To practice concurrent programming, I wrote the following (suboptimal) program, which repeatedly calculates the first prime bigger than whatever the user inputs:
import Control.Concurrent
import Control.Concurrent.Chan
import Control.Monad (forever)

primeAtLeast n = -- Some pure code that looks up the first prime at least as big as n

outputPrimeAtLeast n = putStrLn $ show $ (n, primeAtLeast n)

main = do
  chan <- newChan
  worker <- forkIO $ forever $ readChan chan >>= outputPrimeAtLeast
  forever $ (readLn :: (IO Int)) >>= (writeChan chan)
  killThread worker
I want to have a worker thread in the background that does the actual calculation and outputs (n, primeAtLeast n) as soon as it's finished.
What it's doing now: As soon as I enter a number n, it immediately outputs (n,, returns the control to the main thread, calculates primeAtLeast n in the background and outputs the second half primeAtLeast n) as soon as it's finished.
So is putStrLn not atomic? Or where is the problem?
Try this:
outputPrimeAtLeast n = let p = primeAtLeast n in p `seq` putStrLn $ show (n, p)
The above forces the computation of the prime before the putStrLn is run.
Further, you may use print instead of putStrLn . show:
outputPrimeAtLeast n = let p = primeAtLeast n in p `seq` print (n, p)
Alternatively, you may use a putStrLn function which forces every single character before starting printing anything.
strictPutStrLn :: Show a => a -> IO ()
strictPutStrLn x = let str = show x in str `listSeq` putStrLn str
listSeq :: [a] -> b -> b
listSeq [] w = w
listSeq (x:xs) w = x `seq` listSeq xs w
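Another way to express the same "compute first, then print" ordering is evaluate from Control.Exception, which forces its argument to weak head normal form as an IO step. This is a different technique from the seq versions above, shown only as a sketch; the primeAtLeast below is a naive, hypothetical stand-in for the pure code elided in the question.

```haskell
import Control.Exception (evaluate)

-- | Hypothetical stand-in for the question's primeAtLeast:
-- naive trial division, for illustration only.
primeAtLeast :: Int -> Int
primeAtLeast n = head (filter isPrime [max 2 n ..])
  where
    isPrime k = all (\d -> k `mod` d /= 0) (takeWhile (\d -> d * d <= k) [2 ..])

-- | Force the prime before any output starts. For an Int, weak head
-- normal form means the value is fully computed.
outputPrimeAtLeast :: Int -> IO ()
outputPrimeAtLeast n = do
  p <- evaluate (primeAtLeast n)
  print (n, p)
```

With this version nothing is written until the computation has finished, so the pair appears in one piece.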
I have a function in my main block
map anyHeavyFunction [list]
I'd like to show a progress bar during the computation process or add additional actions (pause, stop process etc.), but because map is a pure function I can't do it directly. I can guess I have to use monads, but what monad is appropriate? IO, State?
I know there is at least one library on Hackage that has some pre-made monad transformers for this task, but I normally turn to the pipes package to roll my own when I need one. I am using pipes-4.0.0, which is going to be on Hackage this weekend, but you can grab it from the GitHub repo before that.
I also used the terminal-progress-bar package so that it makes a nice terminal animation as well.
{-# language BangPatterns #-}
import Pipes
import qualified Pipes.Prelude as P
import Control.Monad.IO.Class
import System.ProgressBar
import System.IO ( hSetBuffering, BufferMode(NoBuffering), stdout )

-- | Takes the total size of the stream to be processed as l and
-- redraws the progress bar before each element is passed downstream
progress l = loop 0
  where
    loop n = do
      liftIO $ progressBar (msg "Working") percentage 40 n l
      !x <- await -- bang pattern to make strict
      yield x
      loop (n+1)

main = do
  -- Force progress bar to print immediately
  hSetBuffering stdout NoBuffering
  let n = 10^6
  let heavy x = last . replicate n $ x -- time wasting function
  r <- P.toListM $ each [1..100] >-> P.map heavy >-> progress 100
  putStrLn ""
  return r
This animates:
> Working [=>.......................] 7%
> Working [=====>...................] 20%
Every update erases the last bar so it only takes up one line on the terminal. Then it finishes like so:
> main
Working [=========================] 100%
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100]
Here's a (kind of) simple answer that I'm not satisfied with. It is based on the fact that #shellenberg wanted to apply a heavy function on each element of a (supposedly long) list. If it suffices to move the "progress bar" once for every element of the list, then the following can be turned into a general solution.
First of all, you need to pick the monad in which you'll work. This depends on what exactly your "progress bar" is. For this discussion, let's say that the IO monad is enough and that we want to alternately display the characters -, /, | and \. You'll also (most probably) need some kind of state S (here it is only the number of elements processed so far, therefore S is Int), so the real monad used will be StateT S IO.
Suppose your original program is:
m = 100000 -- how many elements the list has
-- Your (pure) function
anyHeavyFunction :: Int -> Bool
anyHeavyFunction n =
length [1..n] + length [n+1..4217] == 4217
-- Your list
list :: [Int]
list = take m $ repeat 4217
-- The main program
main :: IO ()
main = do
let l = map anyHeavyFunction list
if and l
then putStrLn "OK"
else putStrLn "WRONG"
(Notice that, very conveniently, the heavy function takes the same time for each element of the list.)
This is how you could convert it to display the crude "progress bar":
import Control.Monad.State
import System.IO (hFlush, stdout)
m = 100000 -- how many elements the list has
k = 5000 -- how often you want to "tick"
tick :: a -> StateT Int IO a
tick x = do
s <- get
put $ s+1
when (s `mod` k == 0) $ liftIO $ do
let r = (s `div` k) `mod` 4
putChar $ "-/|\\" !! r
putChar '\b'
hFlush stdout
x `seq` return x
-- Your (pure) function
anyHeavyFunction :: Int -> Bool
anyHeavyFunction n =
length [1..n] + length [n+1..4217] == 4217
-- Your list
list :: [Int]
list = take m $ repeat 4217
-- The main program
main :: IO ()
main = do
l <- flip evalStateT 0 $ mapM (tick . anyHeavyFunction) list
if and l
then putStrLn "OK"
else putStrLn "WRONG"
An interesting point: The seq in tick forces evaluation of the result for each element of the list. This is enough, if the result has a basic type (Bool here). Otherwise, it's not clear what you would want to do -- remember Haskell is lazy!
If one wants a finer progress bar or if one is not satisfied with the assumption that one "tick" will be counted for each element of the list, then I believe it's necessary to incorporate the ticking in the logic of the heavy function. This makes it ugly... I'd like to see what kind of general solutions can be suggested to that. I'm all in for Haskell, but I think it just sucks for such things as progress bars... There's no free lunch; you can't be pure and lazy and have your progress bars made easy!
EDIT: A version which uses the ProgressBar module suggested by #Davorak. It certainly looks nicer than my rotating bar.
import Control.Monad.State
import System.ProgressBar
import System.IO (hSetBuffering, BufferMode(NoBuffering), stdout)
m = 100000 -- how many elements the list has
k = 5000 -- how often you want to "tick"
tick :: a -> StateT Int IO a
tick x = do
s <- get
put $ s+1
when (s `mod` k == 0) $ liftIO $ do
progressBar (msg "Working") percentage 40 (toInteger s) (toInteger m)
x `seq` return x
-- Your (pure) function
anyHeavyFunction :: Int -> Bool
anyHeavyFunction n =
length [1..n] + length [n+1..4217] == 4217
-- Your list
list :: [Int]
list = take m $ repeat 4217
-- The main program
main :: IO ()
main = do
hSetBuffering stdout NoBuffering
l <- flip evalStateT 0 $ mapM (tick . anyHeavyFunction) list
if and l
then putStrLn "OK"
else putStrLn "WRONG"
The idea is the same, the drawbacks too.
You could use parMap to apply the expensive function in parallel (if the dependencies permit), together with a list of TVars, one per list element (or per chunk of elements), each set once the respective function application has completed. A separate thread could check on the values and update the display (obviously some IO action would happen here).
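A rough sketch of that idea follows, with two simplifications that are assumptions of this example rather than part of the suggestion: plain forkIO threads replace parMap, and a single TVar counter replaces the list of per-element TVars. The name mapWithProgress and its report callback are invented here.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Concurrent.STM
import Control.Exception (evaluate)
import Control.Monad (forM, unless, when)

-- | Hypothetical helper: apply @f@ to every element in its own thread
-- and call @report done total@ whenever the completed count changes.
-- Results are forced only to weak head normal form, and exceptions
-- thrown by @f@ are not handled in this sketch.
mapWithProgress :: (Int -> Int -> IO ()) -> (a -> b) -> [a] -> IO [b]
mapWithProgress report f xs = do
  let total = length xs
  done <- newTVarIO (0 :: Int)
  boxes <- forM xs $ \x -> do
    box <- newEmptyMVar
    _ <- forkIO $ do
      y <- evaluate (f x)
      putMVar box y
      atomically (modifyTVar' done (+ 1))
    return box
  -- The monitor blocks in STM until the counter moves, so it does not poll.
  let monitor seen = do
        n <- atomically $ do
          n <- readTVar done
          when (n == seen) retry
          return n
        report n total
        unless (n >= total) (monitor n)
  monitor (-1)
  mapM takeMVar boxes
```

A call such as mapWithProgress (\n t -> putStrLn (show n ++ "/" ++ show t)) heavy list would print a line each time the monitor wakes up; a progress-bar library could be plugged into the same callback.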