Why does my parallel traversal Haskell program leak memory?

Why does my parallel traversal Haskell program leak memory? - haskell

Consider the following Haskell program (I'm doing this mostly for learning purposes):
import qualified Control.Concurrent.MSem as Sem
import System.Environment (getArgs)
import Control.Concurrent (forkIO)
import Control.Monad
-- Traverse with maximum n threads
parallelTraverse :: Foldable a => Int -> (b -> IO()) -> a b -> IO ()
parallelTraverse n action values = do
sem <- Sem.new n
forM_ values $ \value -> Sem.with sem (forkIO $ action value)
main :: IO ()
main = do
args <- getArgs
let nThreads = read . head $ args :: Int
parallelTraverse nThreads print [(1::Int)..]
when I run it, memory quickly climbs to several GB. I tried various combinations to make sure I discard the results of intermediate computations (the print actions). Why is it still leaking space?

First of all, you have an evident mistake in the following piece:
Sem.with sem (forkIO $ action value)
You're addressing the semaphore from the master thread around the "fork" operation instead of the action there. Following is the proper way to implement it:
forkIO (Sem.with sem (action value))
I.e., to address the semaphore from the context of the forked thread.
Secondly, in the following code you're calling the parallelTraverse operation on an infinite list:
parallelTraverse nThreads print [(1::Int)..]
Which results in the infinite forking of threads. And since the forkIO operation is roughly instantaneous for the calling thread, it's pretty much no surprise that you're running out of resources quite soon.
To use the semaphore to limit the number of worker threads the with pattern simply won't do in your case. Instead you should use the explicit combination of wait and signal and not forget to treat the exceptions properly (in case you expect them). E.g.,:
parallelTraverse :: Foldable a => Int -> (b -> IO()) -> a b -> IO ()
parallelTraverse n action values = do
sem <- Sem.new n
forM_ values $ \value -> do
Sem.wait sem
forkIO $ finally (action value) (Sem.signal sem)

Related

Waiting on multiple async's reliably?

My code needs to fire multiple threads and keep track of which have finished and which are still running. I as planning on using waitAny or waitAnyCatch, but was thrown off by the following in the documentation
If multiple Asyncs complete or have completed, then the value returned corresponds to the first completed Async in the list.
If that is really the case, how does one ever keep track of running / exited threads reliably?
Here's my simplified code:
chan <- newChan
currentThreadsRef <- newIORef []
-- read jobs from a channel, and run them in parallel asyncs/threads,
-- while adding all threads references to currentThreadsRef
async $ do
jobArgs <- readChan chan
jobAsync <- async $ runJob jobArgs
atomicallyModifyIORef' currentThreadsRef $ \x -> (jobAsync:x, ())
-- wait for jobs to be finished, and remove the thread refernece
-- from currentThreadsRef
waitForAllJobs currentJobsRef = do
(readIORef currentJobsRef) >>= \case
[] -> logDebug "All jobs exited"
currentJobs -> do
(exitedJob, jobResult) <- waitAnyCatch currentJobs
atomicallyModifyIORef currentJobsRef $ \x -> (filter (/= exitedjob) x, ())
logDebug $ "Job completed with result=" <> show result
waitForAllJobs currentJobsRef
PS: Although it may not be obvious from my simplified code above, there is a reason why I cannot simply use mapConcurrently over the input-data. Actually, async-pool seems like a good fit for my use-case, but even that has the same problem with waitAny.

Here's a program that launches 1000 asyncs all set to terminate within a second and waits for them all in a loop. Compiled with ghc -O2 -threaded and run with +RTS -N, it runs in about 1.5 seconds, and none of the asyncs gets "lost":
import Control.Concurrent
import Control.Concurrent.Async
import qualified Data.Set as Set
main :: IO ()
main = do
let n = 1000 :: Int
asyncs0 <- mapM (\i -> async (threadDelay 1000000 >> return i)) [1..n]
let loop :: Set.Set (Async Int) -> IO ()
loop asyncs | null asyncs = return ()
| otherwise = do
(a, _i) <- waitAny (Set.toList asyncs)
loop (Set.delete a asyncs)
loop (Set.fromList asyncs0)
So, as was mentioned in a comment, the documentation is referring to the fact that the first completed async in the provided list is the one that will be "returned", but if multiple asyncs have completed, the additional ones aren't "forgotten". You just need to remove the returned async from the list and re-poll, and you'll eventually get them all.
So, you shouldn't have any trouble waiting on multiple asyncs with waitAny.

Synchronising parallel processes in Haskell

In Haskell, how can I run multiple (monad?) functions (such as print) in parallel and see their output in the order of their finish time? I want three processes each one ending up in a print function.
import Control.Parallel
main = a `par` b `pseq` (a,b)
where
a = print("ack",ack 3 10)
b = print("fac",fac 42)
If I don't use pseq, it will show the last one specified in par combination. I want to make sure all processes are finished before the program ends. I tried this but it does not show the output of a,b:
...
main = a `par` b `pseq` print("done.")
...
Note: my program ends with the following lines:
fac 0 = 1
fac n = n * fac (n-1)
ack 0 n = n+1
ack m 0 = ack (m-1) 1
ack m n = ack (m-1) (ack m (n-1))

Don't use Control.Parallel for running IO actions concurrently.
Control.Concurrent.Async should do what you want – Control.Parallel is used for hinting which pure operations can be evaluated simultaneously, not for running multiple actions (monadic or otherwise) at the same time.

In the C family of languages, there's a difference between these 2 lines
a = foobar; // This just sets a to be a function pointer to foobar
b = foobar(); // This actually runs foobar and stores the result in `b`
Your code basically has the same problem as if you were writing C and forgot the () at the end of all your function calls. It assigns the function pointers a and b in parallel but doesn't actually call them.
If you are writing pure functions in Haskell, you can add parallelism using par, but it doesn't work for IO because IO a is like a function pointer. Your code "evaluates" a and b in parallel but the result of that evaluation is still waiting for you to actually execute it.
For code that lives in IO, you have to import Control.Concurrent and use the function forkIO to create the new thread. Then because the child threads all automatically die when the main thread finishes, you need some way of making the main thread wait (I'll use MVars in the example because they're the simplest reliable way to do it)
import Control.Concurrent
import Control.Concurrent.MVar
main = do
-- MVars have a type parameter because you can also use them to send data
-- between threads, but in this example I'm basically just using them as locks
await <- newEmptyMVar :: IO (MVar ())
bwait <- newEmptyMVar :: IO (MVar ())
forkIO $ print ("ack",ack 3 10) >> putMVar await ()
forkIO $ print ("fac",fac 42) >> putMVar bwait ()
takeMVar await
takeMVar bwait

Haskell - Actor based mutability

I'm working on a haskell network application and I use the actor pattern to manage multithreading. One thing I came across is how to store for example a set of client sockets/handles. Which of course must be accessible for all threads and can change when clients log on/off.
Since I'm coming from the imperative world I thought about some kind of lock-mechanism but when I noticed how ugly this is I thought about "pure" mutability, well actually it's kind of pure:
import Control.Concurrent
import Control.Monad
import Network
import System.IO
import Data.List
import Data.Maybe
import System.Environment
import Control.Exception
newStorage :: (Eq a, Show a) => IO (Chan (String, Maybe (Chan [a]), Maybe a))
newStorage = do
q <- newChan
forkIO $ storage [] q
return q
newHandleStorage :: IO (Chan (String, Maybe (Chan [Handle]), Maybe Handle))
newHandleStorage = newStorage
storage :: (Eq a, Show a) => [a] -> Chan (String, Maybe (Chan [a]), Maybe a) -> IO ()
storage s q = do
let loop = (`storage` q)
(req, reply, d) <- readChan q
print ("processing " ++ show(d))
case req of
"add" -> loop ((fromJust d) : s)
"remove" -> loop (delete (fromJust d) s)
"get" -> do
writeChan (fromJust reply) s
loop s
store s d = writeChan s ("add", Nothing, Just d)
unstore s d = writeChan s ("remove", Nothing, Just d)
request s = do
chan <- newChan
writeChan s ("get", Just chan, Nothing)
readChan chan
The point is that a thread (actor) is managing a list of items and modifies the list according to incoming requests. Since thread are really cheap I thought this could be a really nice functional alternative.
Of course this is just a prototype (a quick dirty proof of concept).
So my question is:
Is this a "good" way of managing shared mutable variables (in the actor world) ?
Is there already a library for this pattern ? (I already searched but I found nothing)
Regards,
Chris

Here is a quick and dirty example using stm and pipes-network. This will set up a simple server that allows clients to connect and increment or decrement a counter. It will display a very simple status bar showing the current tallies of all connected clients and will remove client tallies from the bar when they disconnect.
First I will begin with the server, and I've generously commented the code to explain how it works:
import Control.Concurrent.STM (STM, atomically)
import Control.Concurrent.STM.TVar
import qualified Data.HashMap.Strict as H
import Data.Foldable (forM_)
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (unless)
import Control.Monad.Trans.State.Strict
import qualified Data.ByteString.Char8 as B
import Control.Proxy
import Control.Proxy.TCP
import System.IO
main = do
hSetBuffering stdout NoBuffering
{- These are the internal data structures. They should be an implementation
detail and you should never expose these references to the
"business logic" part of the application. -}
-- I use nRef to keep track of creating fresh Ints (which identify users)
nRef <- newTVarIO 0 :: IO (TVar Int)
{- hMap associates every user (i.e. Int) with a counter
Notice how I've "striped" the hash map by storing STM references to the
values instead of storing the values directly. This means that I only
actually write the hashmap when adding or removing users, which reduces
contention for the hash map.
Since each user gets their own unique STM reference for their counter,
modifying counters does not cause contention with other counters or
contention with the hash map. -}
hMap <- newTVarIO H.empty :: IO (TVar (H.HashMap Int (TVar Int)))
{- The following code makes heavy use of Haskell's pure closures. Each
'let' binding closes over its current environment, which is safe since
Haskell is pure. -}
let {- 'getCounters' is the only server-facing command in our STM API. The
only permitted operation is retrieving the current set of user
counters.
'getCounters' closes over the 'hMap' reference currently in scope so
that the server never needs to be aware about our internal
implementation. -}
getCounters :: STM [Int]
getCounters = do
refs <- fmap H.elems (readTVar hMap)
mapM readTVar refs
{- 'init' is the only client-facing command in our STM API. It
initializes the client's entry in the hash map and returns two
commands: the first command is what the client calls to 'increment'
their counter and the second command is what the client calls to log
off and delete
'delete' command.
Notice that those two returned commands each close over the client's
unique STM reference so the client never needs to be aware of how
exactly 'init' is implemented under the hood. -}
init :: STM (STM (), STM ())
init = do
n <- readTVar nRef
writeTVar nRef $! n + 1
ref <- newTVar 0
modifyTVar' hMap (H.insert n ref)
let incrementRef :: STM ()
incrementRef = do
mRef <- fmap (H.lookup n) (readTVar hMap)
forM_ mRef $ \ref -> modifyTVar' ref (+ 1)
deleteRef :: STM ()
deleteRef = modifyTVar' hMap (H.delete n)
return (incrementRef, deleteRef)
{- Now for the actual program logic. Everything past this point only uses
the approved STM API (i.e. 'getCounters' and 'init'). If I wanted I
could factor the above approved STM API into a separate module to enforce
the encapsulation boundary, but I am lazy. -}
{- Fork a thread which polls the current state of the counters and displays
it to the console. There is a way to implement this without polling but
this gets the job done for now.
Most of what it is doing is just some simple tricks to reuse the same
console line instead of outputting a stream of lines. Otherwise it
would be just:
forkIO $ forever $ do
ns <- atomically getCounters
print ns
-}
forkIO $ (`evalStateT` 0) $ forever $ do
del <- get
lift $ do
putStr (replicate del '\b')
putStr (replicate del ' ' )
putStr (replicate del '\b')
ns <- lift $ atomically getCounters
let str = show ns
lift $ putStr str
put $! length str
lift $ threadDelay 10000
{- Fork a thread for each incoming connection, which listens to the client's
commands and translates them into 'STM' actions -}
serve HostAny "8080" $ \(socket, _) -> do
(increment, delete) <- atomically init
{- Right now, just do the dumb thing and convert all keypresses into
increment commands, with the exception of the 'q' key, which will
quit -}
let handler :: (Proxy p) => () -> Consumer p Char IO ()
handler () = runIdentityP loop
where
loop = do
c <- request ()
unless (c == 'q') $ do
lift $ atomically increment
loop
{- This uses my 'pipes' library. It basically is a high-level way to
say:
* Read binary packets from the socket no bigger than 4096 bytes
* Get the first character from each packet and discard the rest
* Handle the character using the above 'handler' function -}
runProxy $ socketReadS 4096 socket >-> mapD B.head >-> handler
{- The above pipeline finishes either when the socket closes or
'handler' stops looping because it received a 'q'. Either case means
that the client is done so we log them out using 'delete'. -}
atomically delete
Next up is the client, which simply opens a connections and forwards all key presses as single packets:
import Control.Monad
import Control.Proxy
import Control.Proxy.Safe
import Control.Proxy.TCP.Safe
import Data.ByteString.Char8 (pack)
import System.IO
main = do
hSetBuffering stdin NoBuffering
hSetEcho stdin False
{- Again, this uses my 'pipes' library. It basically says:
* Read characters from the console using 'commands'
* Pack them into a binary format
* send them to a server running at 127.0.0.1:8080
This finishes looping when the user types a 'q' or the connection is
closed for whatever reason.
-}
runSafeIO $ runProxy $ runEitherK $
try . commands
>-> mapD (\c -> pack [c])
>-> connectWriteD Nothing "127.0.0.1" "8080"
commands :: (Proxy p) => () -> Producer p Char IO ()
commands () = runIdentityP loop
where
loop = do
c <- lift getChar
respond c
unless (c == 'q') loop
It's pretty simple: commands generates a stream of Chars, which then get converted to ByteStrings and then sent as packets to the server.
If you run the server and a few clients and have them each type in a few keys, your server display will output a list showing how many keys each client typed:
[1,6,4]
... and if some of the clients disconnect they will be removed from the list:
[1,4]
Note that the pipes component of these examples will simplify greatly in the upcoming pipes-4.0.0 release, but the current pipes ecosystem still gets the job done as is.

First, I'd definitely recommend using your own specific data type for representing commands. When using (String, Maybe (Chan [a]), Maybe a) a buggy client can crash your actor simply by sending an unknown command or by sending ("add", Nothing, Nothing), etc. I'd suggest something like
data Command a = Add a | Remove a | Get (Chan [a])
Then you can pattern match on commands in storage in a save way.
Actors have their advantages, but also I feel that they have some drawbacks. For example, getting an answer from an actor requires sending it a command and then waiting for a reply. And the client can't be completely sure that it gets a reply and that the reply will be of some specific type - you can't say I want only answers of this type (and how many of them) for this particular command.
So as an example I'll give a simple, STM solution. It'd be better to use a hash table or a (balanced tree) set, but since Handle implements neither Ord nor Hashable, we can't use these data structures, so I'll keep using lists.
module ThreadSet (
TSet, add, remove, get
) where
import Control.Monad
import Control.Monad.STM
import Control.Concurrent.STM.TVar
import Data.List (delete)
newtype TSet a = TSet (TVar [a])
add :: (Eq a) => a -> TSet a -> STM ()
add x (TSet v) = readTVar v >>= writeTVar v . (x :)
remove :: (Eq a) => a -> TSet a -> STM ()
remove x (TSet v) = readTVar v >>= writeTVar v . delete x
get :: (Eq a) => TSet a -> STM [a]
get (TSet v) = readTVar v
This module implements a STM based set of arbitrary elements. You can have multiple such sets and use them together in a single STM transaction that succeeds or fails at once. For example
-- | Ensures that there is exactly one element `x` in the set.
add1 :: (Eq a) => a -> TSet a -> STM ()
add1 x v = remove x v >> add x v
This would be difficult with actors, you'd have to add it as another command for the actor, you can't compose it of existing actions and still have atomicity.
Update: There is an interesting article explaining why Clojure designers chose not to use actors. For example, using actors, even if you have many reads and only very little writes to a mutable structure, they're all serialized, which can greatly impact performance.

Compilation error: "The last statement in a 'do' construct must be an expression"

The following is my dining philosophers code and yields a compilation error saying "The last statement in a 'do' construct must be an expression: mVar2 <- newEmptyMVar mVar3"
Can Somebody help me fix this error and get this program working? thank you
import Control.Concurrent
import Control.Concurrent.MVar
import System.Random
takefork :: Int -> forks -> IO ()
takefork n forks = takeMVar (forks!!n)
releasefork :: Int -> forks -> IO ()
releasefork n forks = putMVar (forks!!n)
philosopher :: [Int]
philosopher = [1,2,3,4,5]
forks :: [MVar] -> [Int]
forks = do
takefork n ( philosopher - 1)
threadDelay delay
let delay = 100000
takefork n philosopher
putStrLn("Philosopher" ++ philosopher ++ "has started eating")
releasefork n philosopher
releasefork n ( philosopher - 1)
ptStrLn ("Philosopher" ++ philosopher ++ "has stopped eating")
forks
main :: IO ()
main = do
mVar1 <- newEmptyMVar
mVar2 <- newEmptyMVar
mVar3 <- newEmptyMVar
mVar4 <- newEmptyMVar
mVar5 <- newEmptyMVar
let mVar = [mVar1, mVar2, mVar3, mVar4, mVar5]
sequence_ [ forkIO forks (mVar philosopher) ]

There are many problems with your code.
The error message you report indicates you are probably mixing spaces and tabs. Get rid of the tabs and use only spaces.
You are presumably writing this program in order to practice writing Haskell programs, not in order to run the program for fun and profit. So we don't want to simply give you a working Dining Philosophers implementation, we want to help you write your implementation.
I cannot tell from your code how you expect it to work.
I'm going to focus on the last line:
sequence_ [ forkIO forks (mVar philosopher) ]
sequence_ :: [IO a] -> IO () --- give sequence_ a list of i/o actions, and it (returns an i/o action that) performs each action in order. From the [...], it looks like you are trying to give it a list, but with only one element. This is probably not what you mean.
forkIO :: IO () -> IO ThreadID --- give forkIO an i/o action, and it (returns an i/o action that) starts that i/o action running in a new thread, giving you the id of that thread.
There are two problems here:
forks is a function, not an i/o action (it's not even a function that returns an i/o action, though you probably mean it to be)
you give forkIO a second argunment ((mVar philosopher)), but it only takes one argument
mVar philosopher itself doesn't make any sense: mVar :: [MVar a] (it's a list of MVars, and I haven't worked out what type the MVars are supposed to contain) but you treat it like a function, passing it philosopher as an argument.
At this point a lightbulb blinks on above my head. You wish to call forks with parameters mVar and philosopher?
sequence_ [ forkIO (forks mVar philosopher) ]
We're still sequencing a single action though. Perhaps you wish to call forks with each element of philosopher in turn?
sequence_ $ map (\n -> forkIO (forks mVar n)) philosopher
We can simplify this to
mapM_ (\n -> forkIO (forks mVar n)) philosopher
This doesn't match up with the type you given forks :: [MVar] -> [Int]. But that's probably wrong, so you'll want to fix that function next.

Strict evaluation techniques for concurrent channels in Haskell

I'm toying with Haskell threads, and I'm running into the problem of communicating lazily-evaluated values across a channel. For example, with N worker threads and 1 output thread, the workers communicate unevaluated work and the output thread ends up doing the work for them.
I've read about this problem in various documentation and seen various solutions, but I only found one solution that works and the rest do not. Below is some code in which worker threads start some computation that can take a long time. I start the threads in descending order, so that the first thread should take the longest, and the later threads should finish earlier.
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan -- .Strict
import Control.Concurrent.MVar
import Control.Exception (finally, evaluate)
import Control.Monad (forM_)
import Control.Parallel.Strategies (using, rdeepseq)
main = (>>=) newChan $ (>>=) (newMVar []) . run
run :: Chan (Maybe String) -> MVar [MVar ()] -> IO ()
run logCh statVars = do
logV <- spawn1 readWriteLoop
say "START"
forM_ [18,17..10] $ spawn . busyWork
await
writeChan logCh Nothing -- poison the logger
takeMVar logV
putStrLn "DONE"
where
say mesg = force mesg >>= writeChan logCh . Just
force s = mapM evaluate s -- works
-- force s = return $ s `using` rdeepseq -- no difference
-- force s = return s -- no-op; try this with strict channel
busyWork = say . show . sum . filter odd . enumFromTo 2 . embiggen
embiggen i = i*i*i*i*i
readWriteLoop = readChan logCh >>= writeReadLoop
writeReadLoop Nothing = return ()
writeReadLoop (Just mesg) = putStrLn mesg >> readWriteLoop
spawn1 action = do
v <- newEmptyMVar
forkIO $ action `finally` putMVar v ()
return v
spawn action = do
v <- spawn1 action
modifyMVar statVars $ \vs -> return (v:vs, ())
await = do
vs <- modifyMVar statVars $ \vs -> return ([], vs)
mapM_ takeMVar vs
Using most techniques, the results are reported in the order spawned; that is, the longest-running computation first. I interpret this to mean that the output thread is doing all the work:
-- results in order spawned (longest-running first = broken)
START
892616806655
503999185040
274877906943
144162977343
72313663743
34464808608
15479341055
6484436675
2499999999
DONE
I thought the answer to this would be strict channels, but they didn't work. I understand that WHNF for strings is insufficient because that would just force the outermost constructor (nil or cons for the first character of the string). The rdeepseq is supposed to fully evaluate, but it makes no difference. The only thing I've found that works is to map Control.Exception.evaluate :: a -> IO a over all the characters in the string. (See the force function comments in the code for several different alternatives.) Here's the result with Control.Exception.evaluate:
-- results in order finished (shortest-running first = correct)
START
2499999999
6484436675
15479341055
34464808608
72313663743
144162977343
274877906943
503999185040
892616806655
DONE
So why don't strict channels or rdeepseq produce this result? Are there other techniques? Am I misinterpreting why the first result is broken?

There are two issues going on here.
The reason the first attempt (using an explicit rnf) doesn't work is that, by using return, you've created a thunk that fully evaluates itself when it is evaluated, but the thunk itself has not being evaluated. Notice that the type of evaluate is a -> IO a: the fact that it returns a value in IO means that evaluate can impose ordering:
return (error "foo") >> return 1 == return 1
evaluate (error "foo") >> return 1 == error "foo"
The upshot is that this code:
force s = evaluate $ s `using` rdeepseq
will work (as in, have the same behavior as mapM_ evaluate s).
The case of using strict channels is a little trickier, but I believe this is due to a bug in strict-concurrency. The expensive computation is actually being run on the worker threads, but it's not doing you much good (you can check for this explicitly by hiding some asynchronous exceptions in your strings and seeing which thread the exception surfaces on).
What's the bug? Let's take a look at the code for strict writeChan:
writeChan :: NFData a => Chan a -> a -> IO ()
writeChan (Chan _read write) val = do
new_hole <- newEmptyMVar
modifyMVar_ write $ \old_hole -> do
putMVar old_hole $! ChItem val new_hole
return new_hole
We see that modifyMVar_ is called on write before we evaluate the thunk. The sequence of operations then is:
writeChan is entered
We takeMVar write (blocking anyone else who wants to write to the channel)
We evaluate the expensive thunk
We put the expensive thunk onto the channel
We putMVar write, unblocking all of the other threads
You don't see this behavior with the evaluate variants, because they perform the evaluation before the lock is acquired.
I’ll send Don mail about this and see if he agrees that this behavior is kind of suboptimal.
Don agrees that this behavior is suboptimal. We're working on a patch.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Why does my parallel traversal Haskell program leak memory? - haskell

Related

Waiting on multiple async's reliably?

Synchronising parallel processes in Haskell

Haskell - Actor based mutability

Compilation error: "The last statement in a 'do' construct must be an expression"

Strict evaluation techniques for concurrent channels in Haskell

Categories

Resources