Synchronising parallel processes in Haskell

In Haskell, how can I run multiple (monadic?) functions (such as print) in parallel and see their output in the order in which they finish? I want three processes, each one ending up in a print call.
import Control.Parallel

main = a `par` b `pseq` (a, b)
  where
    a = print ("ack", ack 3 10)
    b = print ("fac", fac 42)
If I don't use pseq, only the last one specified in the par combination is shown. I want to make sure all processes have finished before the program ends. I tried this, but it does not show the output of a and b:
...
main = a `par` b `pseq` print("done.")
...
Note: my program ends with the following lines:
fac 0 = 1
fac n = n * fac (n-1)
ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))

Don't use Control.Parallel for running IO actions concurrently.
Control.Concurrent.Async should do what you want – Control.Parallel is used for hinting which pure operations can be evaluated simultaneously, not for running multiple actions (monadic or otherwise) at the same time.
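For example, here is a minimal sketch using the async package (assuming the fac and ack definitions from the question): concurrently_ runs both actions in separate threads and only returns once both have finished, so both lines are printed, each as soon as its computation is done.

import Control.Concurrent.Async (concurrently_)

main :: IO ()
main = concurrently_
  (print ("ack", ack 3 10))
  (print ("fac", fac 42))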

In the C family of languages, there's a difference between these two lines:
a = foobar; // This just sets a to be a function pointer to foobar
b = foobar(); // This actually runs foobar and stores the result in `b`
Your code basically has the same problem as if you were writing C and forgot the () at the end of all your function calls. It assigns the function pointers a and b in parallel but doesn't actually call them.
If you are writing pure functions in Haskell, you can add parallelism using par, but it doesn't work for IO because IO a is like a function pointer. Your code "evaluates" a and b in parallel but the result of that evaluation is still waiting for you to actually execute it.
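For contrast, here is a sketch of the situation par is meant for (again assuming the question's fac and ack, compiled with -threaded and run with +RTS -N): the sparked values are pure, and the single print at the end merely displays results that may already have been computed in parallel.

import Control.Parallel (par, pseq)

main :: IO ()
main =
  let x = ack 3 10
      y = fac 42
  in x `par` y `pseq` print (("ack", x), ("fac", y))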
For code that lives in IO, you have to import Control.Concurrent and use forkIO to create the new threads. Then, because all child threads automatically die when the main thread finishes, you need some way of making the main thread wait (I'll use MVars in the example because they're the simplest reliable way to do it).
import Control.Concurrent
import Control.Concurrent.MVar

main = do
  -- MVars have a type parameter because you can also use them to send data
  -- between threads, but in this example I'm basically just using them as locks
  await <- newEmptyMVar :: IO (MVar ())
  bwait <- newEmptyMVar :: IO (MVar ())
  forkIO $ print ("ack", ack 3 10) >> putMVar await ()
  forkIO $ print ("fac", fac 42) >> putMVar bwait ()
  takeMVar await
  takeMVar bwait

Related

Adding a Progress Bar to Parallel Code (Haskell)

In Haskell, I have a list that can be evaluated in parallel. Each individual evaluation doesn't take that long, but there are many of them (1 million, for example). I'm using the following library. The plan is to split the list into chunks and run them in parallel. I have something like the following that works:
import Control.Parallel.Strategies
import Control.DeepSeq

-- Imagine this being slightly more expensive
kindaExpensiveComputation :: Int -> [Int]
kindaExpensiveComputation n = replicate n 42

main :: IO ()
main = do
  let n = 1000000
  let args = replicate n 20
  let chunkSize = n `div` 10
  let result = force $ withStrategy (parListChunk chunkSize rseq) . map kindaExpensiveComputation $ args
  -- do stuff with result here
  -- end program
I would like to add a progress bar to this so I can keep track of how much of the list has been done. My instinct was to try something like the following:
import Control.Parallel.Strategies
import Control.DeepSeq
import System.ProgressBar

-- Imagine this being slightly more expensive
kindaExpensiveComputation :: ProgressBar s -> Int -> IO [Int]
kindaExpensiveComputation pb n = do
  let res = replicate n 42
  incProgress pb 1
  return res

main :: IO ()
main = do
  let n = 1000000
  let args = replicate n 20
  let chunkSize = n `div` 10
  pb <- newProgressBar defStyle 10 (Progress 0 n ())
  let result = force $ withStrategy (parListChunk chunkSize rseq) . map (kindaExpensiveComputation pb) $ args
  -- do stuff with result here
  -- end program
But force doesn't seem to be able to handle IO. I've tried a couple other things, but whatever I try evaluates the list of IO [Int] in parallel but not the actual contents of the IO. I see that the parallel library has some functions like withStrategyIO, although I'm not sure how to use it or if it's what I'm looking for.
I think my understanding of how Haskell evaluates expressions is causing my confusion, so any pointers on that would be helpful as well.
Unfortunately, I believe GHC currently does not expose any functionality for observing (in IO, obviously) the evaluation progress of parallel computations. You will need to use concurrency in the form of forkIO and friends (or a library that wraps them like the async package) instead.
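One hedged sketch of that approach (the chunking, mapConcurrently from async, and chunksOf from the split package are all my assumptions, not part of your code): run each chunk on its own thread, force its result to normal form there with evaluate and force, and tick the progress bar once per finished chunk, so the bar counts chunks rather than elements.

import Control.Concurrent.Async (mapConcurrently)
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import Data.List.Split (chunksOf) -- from the "split" package
import System.ProgressBar

-- Imagine this being slightly more expensive
kindaExpensiveComputation :: Int -> [Int]
kindaExpensiveComputation n = replicate n 42

main :: IO ()
main = do
  let n = 1000000
      args = replicate n 20
      numChunks = 10
      chunkSize = n `div` numChunks
  pb <- newProgressBar defStyle 10 (Progress 0 numChunks ())
  results <- mapConcurrently
    (\chunk -> do
        -- force the whole chunk to normal form on this worker thread
        r <- evaluate (force (map kindaExpensiveComputation chunk))
        incProgress pb 1 -- one tick per completed chunk
        return r)
    (chunksOf chunkSize args)
  let result = concat results
  -- do stuff with result here
  print (length result)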

Waiting on multiple async's reliably?

My code needs to fire off multiple threads and keep track of which have finished and which are still running. I was planning on using waitAny or waitAnyCatch, but was thrown off by the following in the documentation:
If multiple Asyncs complete or have completed, then the value returned corresponds to the first completed Async in the list.
If that is really the case, how does one ever keep track of running / exited threads reliably?
Here's my simplified code:
chan <- newChan
currentThreadsRef <- newIORef []

-- read jobs from a channel, and run them in parallel asyncs/threads,
-- while adding all thread references to currentThreadsRef
async $ do
  jobArgs <- readChan chan
  jobAsync <- async $ runJob jobArgs
  atomicModifyIORef' currentThreadsRef $ \x -> (jobAsync:x, ())

-- wait for jobs to be finished, and remove the thread reference
-- from currentThreadsRef
waitForAllJobs currentJobsRef = do
  readIORef currentJobsRef >>= \case
    [] -> logDebug "All jobs exited"
    currentJobs -> do
      (exitedJob, jobResult) <- waitAnyCatch currentJobs
      atomicModifyIORef' currentJobsRef $ \x -> (filter (/= exitedJob) x, ())
      logDebug $ "Job completed with result=" <> show jobResult
      waitForAllJobs currentJobsRef
PS: Although it may not be obvious from my simplified code above, there is a reason why I cannot simply use mapConcurrently over the input-data. Actually, async-pool seems like a good fit for my use-case, but even that has the same problem with waitAny.
Here's a program that launches 1000 asyncs all set to terminate within a second and waits for them all in a loop. Compiled with ghc -O2 -threaded and run with +RTS -N, it runs in about 1.5 seconds, and none of the asyncs gets "lost":
import Control.Concurrent
import Control.Concurrent.Async
import qualified Data.Set as Set

main :: IO ()
main = do
  let n = 1000 :: Int
  asyncs0 <- mapM (\i -> async (threadDelay 1000000 >> return i)) [1..n]
  let loop :: Set.Set (Async Int) -> IO ()
      loop asyncs | null asyncs = return ()
                  | otherwise = do
                      (a, _i) <- waitAny (Set.toList asyncs)
                      loop (Set.delete a asyncs)
  loop (Set.fromList asyncs0)
So, as was mentioned in a comment, the documentation is referring to the fact that the first completed async in the provided list is the one that will be "returned", but if multiple asyncs have completed, the additional ones aren't "forgotten". You just need to remove the returned async from the list and re-poll, and you'll eventually get them all.
So, you shouldn't have any trouble waiting on multiple asyncs with waitAny.
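And if you want the per-job exception handling your original code was after, the same loop works with waitAnyCatch; a sketch, reusing the imports above (the logging is just illustrative):

loopCatch :: Set.Set (Async Int) -> IO ()
loopCatch asyncs
  | null asyncs = return ()
  | otherwise = do
      (a, result) <- waitAnyCatch (Set.toList asyncs)
      case result of
        Left err -> putStrLn ("job failed: " ++ show err)
        Right i  -> putStrLn ("job finished: " ++ show i)
      loopCatch (Set.delete a asyncs)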

How can I make a Pipe concurrent with Haskell's Pipe library?

I have some Haskell code that uses Pipes:
module Main(main) where
import Pipes
a :: Producer Int IO ()
a = each [1..10]
b :: Pipe Int Int IO ()
b = do
x <- await
yield (x*2)
b
c :: Consumer Int IO ()
c = do
x <- await
lift $ print x
c
main :: IO ()
main = runEffect $ a >-> b >-> c
The Pipes.Concurrent tutorial demonstrates using multiple workers along with work stealing. How can I do something similar inside of b? I would like b to perform its work concurrently, using a set number of workers.
Obviously, concurrency isn't useful in this exact case, but it's the simplest example I could come up with. In my real use case I'd like to make some web requests concurrently using a limited number of workers.
EDIT: I misunderstood what you were asking;
You may be able to do this inside a pipe, but I'm not really sure what the motivation would be. I'd recommend building re-usable pipe chains and just dispatching to them using workers rather than trying to build workers INSIDE the pipe. You'll lose any ordering guarantees that the first in is the first out if you build it into the pipe itself.
The section on Work Stealing is what you're looking for; this code is basically verbatim from the tutorial, but let's break down how it works. Here's one way we could do what you want:
module Main (main) where

import Pipes
import Pipes.Concurrent
import Control.Concurrent.Async (async, wait)
import Control.Concurrent (threadDelay)
import Control.Monad (forM)

a :: Producer Int IO ()
a = each [1..10]

b :: Pipe Int Int IO ()
b = do
  x <- await
  yield (x*2)
  b

c :: Consumer Int IO ()
c = do
  x <- await
  lift $ print x
  c

main :: IO ()
main = do
  (output, input) <- spawn unbounded
  feeder <- async $ do
    runEffect $ a >-> toOutput output
    performGC
  workers <- forM [1..3] $ \i ->
    async $ do
      runEffect $ fromInput input >-> b >-> c
      performGC
  mapM_ wait (feeder:workers)
The first line, spawn unbounded, is from Pipes.Concurrent; it initializes a 'mailbox' that has a handle for input and output. It confused me at first, but in this case we send messages TO the output and pull them FROM the input. This resembles a push-pull message channel in languages like golang.
We specify a Buffer to say how many messages we can store, in this case setting no limit with unbounded.
Okay, so the mailbox is initialized; we can now create Effects which send messages to it. The mailbox channels are implemented using STM, which is how it can collect messages asynchronously.
Let's create an asynchronous job that feeds the mailbox;
feeder <- async $ do runEffect $ a >-> toOutput output
performGC
The a >-> toOutput output is just normal pipe composition; we need toOutput to convert output into a Consumer we can compose with. Note the performGC call, which is also part of the IO action; it lets Pipes.Concurrent know to clean up after the job has completed. We could run this using forkIO if we liked, but in this case we use async so that we can wait for the result to finish later on. Okay, so our mailbox should be asynchronously receiving messages; let's pull them out and do some work.
workers <- forM [1..3] $ \i ->
async $ do runEffect $ fromInput input >-> b >-> c
performGC
Same idea as before, but this time we're just spawning a few of them. We read from the input just like a normal pipe using fromInput and then run it through the rest of our chain, cleaning up when we're done. input ensures that each value pulled out is received by only one worker. When all the jobs feeding into output have completed (it keeps track of all the open jobs), it will close the input pipe and the workers will finish.
If you're using this in a web-worker scenario you would have a main loop which keeps sending requests to the toOutput output channel, and then spawn as many workers as you like who pull into their pipeline from fromInput input.
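If you want back-pressure in that scenario, you can give the mailbox a bounded buffer instead of unbounded (I'm assuming here that your pipes-concurrency version exports the bounded constructor); a sketch of just the changed main, with a, b, c and the imports as above:

main :: IO ()
main = do
  (output, input) <- spawn (bounded 100) -- at most 100 queued requests
  feeder <- async $ do
    runEffect $ a >-> toOutput output -- blocks when the mailbox is full
    performGC
  workers <- forM [1..3] $ \_ ->
    async $ do
      runEffect $ fromInput input >-> b >-> c
      performGC
  mapM_ wait (feeder:workers)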

Why does my parallel traversal Haskell program leak memory?

Consider the following Haskell program (I'm doing this mostly for learning purposes):
import qualified Control.Concurrent.MSem as Sem
import System.Environment (getArgs)
import Control.Concurrent (forkIO)
import Control.Monad

-- Traverse with maximum n threads
parallelTraverse :: Foldable a => Int -> (b -> IO ()) -> a b -> IO ()
parallelTraverse n action values = do
  sem <- Sem.new n
  forM_ values $ \value -> Sem.with sem (forkIO $ action value)

main :: IO ()
main = do
  args <- getArgs
  let nThreads = read . head $ args :: Int
  parallelTraverse nThreads print [(1::Int)..]
When I run it, memory quickly climbs to several GB. I tried various combinations to make sure I discard the results of intermediate computations (the print actions). Why is it still leaking space?
First of all, you have an evident mistake in the following piece:
Sem.with sem (forkIO $ action value)
You're acquiring the semaphore from the master thread, around the "fork" operation, instead of around the action itself. Here is the proper way to implement it:
forkIO (Sem.with sem (action value))
I.e., to address the semaphore from the context of the forked thread.
Secondly, in the following code you're calling the parallelTraverse operation on an infinite list:
parallelTraverse nThreads print [(1::Int)..]
This results in threads being forked without bound. And since the forkIO operation returns almost immediately for the calling thread, it's no surprise that you run out of resources quite soon.
To use the semaphore to limit the number of worker threads, the with pattern simply won't do in your case. Instead you should use an explicit combination of wait and signal, and not forget to handle exceptions properly (in case you expect them). E.g.:
parallelTraverse :: Foldable a => Int -> (b -> IO ()) -> a b -> IO ()
parallelTraverse n action values = do
  sem <- Sem.new n
  forM_ values $ \value -> do
    Sem.wait sem
    forkIO $ finally (action value) (Sem.signal sem)
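As a hedged follow-up for the finite case (it obviously never returns for the infinite list above): once every item has been dispatched, you can wait for the remaining workers by re-acquiring all n slots of the semaphore. The name parallelTraverseWait is mine; it only needs wait and signal from Control.Concurrent.MSem, plus finally from Control.Exception and replicateM_ from Control.Monad.

parallelTraverseWait :: Foldable a => Int -> (b -> IO ()) -> a b -> IO ()
parallelTraverseWait n action values = do
  sem <- Sem.new n
  forM_ values $ \value -> do
    Sem.wait sem
    forkIO $ finally (action value) (Sem.signal sem)
  -- when all n slots can be re-acquired, every worker has signalled
  replicateM_ n (Sem.wait sem)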

Strict evaluation techniques for concurrent channels in Haskell

I'm toying with Haskell threads, and I'm running into the problem of communicating lazily-evaluated values across a channel. For example, with N worker threads and 1 output thread, the workers communicate unevaluated work and the output thread ends up doing the work for them.
I've read about this problem in various documentation and seen various solutions, but I only found one solution that works and the rest do not. Below is some code in which worker threads start some computation that can take a long time. I start the threads in descending order, so that the first thread should take the longest, and the later threads should finish earlier.
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan -- .Strict
import Control.Concurrent.MVar
import Control.Exception (finally, evaluate)
import Control.Monad (forM_)
import Control.Parallel.Strategies (using, rdeepseq)

main = (>>=) newChan $ (>>=) (newMVar []) . run

run :: Chan (Maybe String) -> MVar [MVar ()] -> IO ()
run logCh statVars = do
  logV <- spawn1 readWriteLoop
  say "START"
  forM_ [18,17..10] $ spawn . busyWork
  await
  writeChan logCh Nothing -- poison the logger
  takeMVar logV
  putStrLn "DONE"
  where
    say mesg = force mesg >>= writeChan logCh . Just
    force s = mapM evaluate s -- works
    -- force s = return $ s `using` rdeepseq -- no difference
    -- force s = return s -- no-op; try this with strict channel
    busyWork = say . show . sum . filter odd . enumFromTo 2 . embiggen
    embiggen i = i*i*i*i*i

    readWriteLoop = readChan logCh >>= writeReadLoop
    writeReadLoop Nothing = return ()
    writeReadLoop (Just mesg) = putStrLn mesg >> readWriteLoop

    spawn1 action = do
      v <- newEmptyMVar
      forkIO $ action `finally` putMVar v ()
      return v
    spawn action = do
      v <- spawn1 action
      modifyMVar statVars $ \vs -> return (v:vs, ())
    await = do
      vs <- modifyMVar statVars $ \vs -> return ([], vs)
      mapM_ takeMVar vs
Using most techniques, the results are reported in the order spawned; that is, the longest-running computation first. I interpret this to mean that the output thread is doing all the work:
-- results in order spawned (longest-running first = broken)
START
892616806655
503999185040
274877906943
144162977343
72313663743
34464808608
15479341055
6484436675
2499999999
DONE
I thought the answer to this would be strict channels, but they didn't work. I understand that WHNF for strings is insufficient because that would just force the outermost constructor (nil or cons for the first character of the string). The rdeepseq is supposed to fully evaluate, but it makes no difference. The only thing I've found that works is to map Control.Exception.evaluate :: a -> IO a over all the characters in the string. (See the force function comments in the code for several different alternatives.) Here's the result with Control.Exception.evaluate:
-- results in order finished (shortest-running first = correct)
START
2499999999
6484436675
15479341055
34464808608
72313663743
144162977343
274877906943
503999185040
892616806655
DONE
So why don't strict channels or rdeepseq produce this result? Are there other techniques? Am I misinterpreting why the first result is broken?
There are two issues going on here.
The reason the first attempt (using an explicit rnf) doesn't work is that, by using return, you've created a thunk that fully evaluates itself when it is evaluated, but the thunk itself has not been evaluated. Notice that the type of evaluate is a -> IO a: the fact that it returns a value in IO means that evaluate can impose ordering:
return (error "foo") >> return 1 == return 1
evaluate (error "foo") >> return 1 == error "foo"
The upshot is that this code:
force s = evaluate $ s `using` rdeepseq
will work (as in, have the same behavior as mapM_ evaluate s).
The case of using strict channels is a little trickier, but I believe this is due to a bug in strict-concurrency. The expensive computation is actually being run on the worker threads, but it's not doing you much good (you can check for this explicitly by hiding some asynchronous exceptions in your strings and seeing which thread the exception surfaces on).
What's the bug? Let's take a look at the code for strict writeChan:
writeChan :: NFData a => Chan a -> a -> IO ()
writeChan (Chan _read write) val = do
  new_hole <- newEmptyMVar
  modifyMVar_ write $ \old_hole -> do
    putMVar old_hole $! ChItem val new_hole
    return new_hole
We see that modifyMVar_ is called on write before we evaluate the thunk. The sequence of operations then is:
1. writeChan is entered
2. We takeMVar write (blocking anyone else who wants to write to the channel)
3. We evaluate the expensive thunk
4. We put the expensive thunk onto the channel
5. We putMVar write, unblocking all of the other threads
You don't see this behavior with the evaluate variants, because they perform the evaluation before the lock is acquired.
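A sketch of the kind of fix being discussed (not the actual patch), reusing the internal ChItem representation shown above: evaluate the payload to normal form before modifyMVar_ takes the write lock, so the expensive work still happens on the writing thread but no longer blocks the other writers. It assumes evaluate from Control.Exception and rnf from Control.DeepSeq.

writeChan :: NFData a => Chan a -> a -> IO ()
writeChan (Chan _read write) val = do
  evaluate (rnf val) -- force the payload *before* taking the lock
  new_hole <- newEmptyMVar
  modifyMVar_ write $ \old_hole -> do
    putMVar old_hole (ChItem val new_hole)
    return new_hole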
I’ll send Don mail about this and see if he agrees that this behavior is kind of suboptimal.
Don agrees that this behavior is suboptimal. We're working on a patch.
