Synchronized section in async map - Haskell

I have a big IO function that will continuously load data from a folder, perform pure calculations on the data, and write it back.
I am running this function over multiple folders in parallel using
mapConcurrently_ iofun folderList
from http://hackage.haskell.org/package/async-2.1.1.1/docs/Control-Concurrent-Async.html#v%3amapConcurrently
This works perfectly... but a little too well. Now even the character-level output of the putStrLn calls is interleaved between threads, which leads to an unreadable console log.
Is there a way to make IO actions synchronized, or, even better, to get a synchronized version of putStrLn?

The way you coordinate threads is via MVars or TVars if you want to use STM. You can read all about them in "Parallel and Concurrent Haskell". You could do something like:
do mutex <- newMVar ()
   let putStrLn' = withMVar mutex . const . putStrLn
   mapConcurrently_ (iofunPrintingWith putStrLn') folderList
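For instance, a complete toy version of this idea (with the real folder work elided behind the print calls) could look like:
import Control.Concurrent.Async (mapConcurrently_)
import Control.Concurrent.MVar

main :: IO ()
main = do
    mutex <- newMVar ()
    -- Only one thread can hold the mutex at a time, so lines never interleave.
    let putStrLn' msg = withMVar mutex (\_ -> putStrLn msg)
        iofun folder = do
            putStrLn' ("processing " ++ folder)
            -- ... load, run pure calculations, write back ...
            putStrLn' ("finished " ++ folder)
    mapConcurrently_ iofun ["folderA", "folderB", "folderC"]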

Related

Behavior of async package

What would the following code using the async package do:
action <- async $ mapM_ someFunc someList
wait action
Will this merely spawn a single thread in which mapM_ occurs? (Implying that this has no benefit over just mapM_ someFunc someList)
Or will it perform the mapM_ action asynchronously (or is mapConcurrently the only way to get such behavior)?
Will this merely spawn a single thread in which mapM_ occurs?
Yes, it will fork a thread in which the mapM_ runs, and wait will then block until it finishes and returns () (or throws an exception).
The async package is very simple; you might like to look at the source to see how it all works together and learn more about the underlying Haskell concurrency primitives.
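To make the contrast concrete, here is a small sketch (someFunc is just a placeholder): the first version forks one extra thread that still runs the whole traversal sequentially, while the mapConcurrently_ variant runs one thread per element.
import Control.Concurrent.Async

someFunc :: Int -> IO ()
someFunc n = print (n * n)

-- One extra thread; the list is still processed sequentially inside it.
oneThread :: IO ()
oneThread = do
    action <- async (mapM_ someFunc [1 .. 10])
    wait action

-- One thread per element; the elements are processed concurrently.
manyThreads :: IO ()
manyThreads = mapConcurrently_ someFunc [1 .. 10]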

Concurrency considerations between pipes and non-pipes code

I'm in the process of wrapping a C library for some encoding in a pipes interface, but I've hit upon some design decisions that need to be made.
After the C library is set up, we hold on to an encoder context. With this, we can either encode, or change some parameters (let's call the Haskell interface to this last function tune :: Context -> Int -> IO ()). There are two parts to my question:
(1) The encoding part is easily wrapped up in a Pipe Foo Bar IO (), but I would also like to expose tune. Since simultaneous use of the encoding context must be lock protected, I would need to take a lock at every iteration in the pipe and protect tune with the same lock. But now I feel I'm forcing hidden locks on the user. Am I barking up the wrong tree here? How is this kind of situation normally resolved in the pipes ecosystem? In my case I expect the pipe that my specific code is part of to always run in its own thread, with tuning happening concurrently, but I don't want to force this point of view upon any users. Other packages in the pipes ecosystem don't seem to force this on their users either.
(2) An encoding context that is no longer used needs to be properly de-initialized. How does one, in the pipes ecosystem, ensure that such things (in this case performing some IO actions) are taken care of when the pipe is destroyed?
A concrete example would be wrapping a compression library, in which case the above can be:
(1) The compression strength is tunable. We set up the pipe and it runs along merrily. How should one best go about allowing the compression strength setting to be changed while the pipe keeps running, assuming that concurrent access to the compression codec context must be serialized?
(2) The compression library allocated a bunch of memory off the Haskell heap when set up, and we'll need to call some library function to clean this up when the pipe is torn down.
Thanks… this might all be obvious, but I'm quite new to the pipes ecosystem.
Edit: Reading this after posting, I'm quite sure it's the vaguest question I've ever asked here. Ugh! Sorry ;-)
Regarding (1), the general solution is to change your Pipe's type to:
Pipe (Either (Context, Int) Foo) Bar IO ()
In other words, it accepts both Foo inputs and tune requests, which it processes internally.
So let's then assume that you have two concurrent Producers corresponding to inputs and tune requests:
producer1 :: Producer Foo IO ()
producer2 :: Producer (Context, Int) IO ()
You can use pipes-concurrency to create a buffer that they both feed into, like this:
example = do
    (output, input) <- spawn Unbounded
    -- input  :: Input  (Either (Context, Int) Foo)
    -- output :: Output (Either (Context, Int) Foo)
    let io1 = runEffect $ producer1 >-> Pipes.Prelude.map Right >-> toOutput output
        io2 = runEffect $ producer2 >-> Pipes.Prelude.map Left  >-> toOutput output
    as <- mapM async [io1, io2]
    runEffect (fromInput input >-> yourPipe >-> someConsumer)
    mapM_ wait as
You can learn more about the pipes-concurrency library by reading this tutorial.
By forcing all tune requests to go through the same single-threaded Pipe you can ensure that you don't accidentally have two concurrent invocations of the tune function.
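For concreteness, such a pipe could look roughly like the sketch below. Context, Foo, Bar, tune and encode are only stand-ins for the real library wrapper, and here the pipe takes the context as an argument:
import Pipes
import Control.Monad (forever)

-- Hypothetical stand-ins for the wrapped C library:
data Context = Context
data Foo = Foo
data Bar = Bar

tune :: Context -> Int -> IO ()
tune _ n = putStrLn ("tune: " ++ show n)

encode :: Context -> Foo -> IO Bar
encode _ _ = pure Bar

-- All use of the context happens inside this single pipe, so a tune request
-- can never run concurrently with an encode call.
encoderPipe :: Context -> Pipe (Either (Context, Int) Foo) Bar IO ()
encoderPipe ctx = forever $ do
    req <- await
    case req of
        Left (_, n) -> lift (tune ctx n)
        Right foo   -> lift (encode ctx foo) >>= yield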
Regarding (2), there are two ways you can acquire a resource using pipes. The more sophisticated approach is to use the pipes-safe library, which provides a bracket function that you can use within a Pipe, but that is probably overkill for your purpose; it mainly exists for acquiring and releasing multiple resources over the lifetime of a pipe. A simpler solution is just to use the following with-style idiom to acquire the pipe:
withEncoder :: (Pipe Foo Bar IO () -> IO r) -> IO r
withEncoder k = bracket acquire release $ \resource ->
    k (createPipeFromResource resource)
Then a user would just write:
withEncoder $ \yourPipe ->
    runEffect (someProducer >-> yourPipe >-> someConsumer)
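Spelled out with dummy acquire/release actions (everything below is hypothetical scaffolding, with String standing in for Foo and Bar), the whole pattern looks like this:
import Control.Exception (bracket)
import Pipes
import qualified Pipes.Prelude as P

data Resource = Resource

acquire :: IO Resource
acquire = Resource <$ putStrLn "encoder initialised"

release :: Resource -> IO ()
release _ = putStrLn "encoder de-initialised"

createPipeFromResource :: Resource -> Pipe String String IO ()
createPipeFromResource _ = P.map id

-- bracket guarantees release runs even if the effect in between throws.
withEncoder :: (Pipe String String IO () -> IO r) -> IO r
withEncoder k = bracket acquire release (k . createPipeFromResource)

main :: IO ()
main = withEncoder $ \yourPipe ->
    runEffect (P.stdinLn >-> yourPipe >-> P.stdoutLn)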
You can optionally use the managed package, which simplifies the types a bit and makes it easier to acquire multiple resources. You can learn more about it from reading this blog post of mine.

How to implement Haskell equivalent of invokeAll

I'm looking for equivalent functionality to ExecutorService.invokeAll in Haskell. Do I need to implement it? From documentation:
Executes the given tasks, returning a list of Futures holding their status and results when all complete.
In my use case, tasks spend most of their time waiting on IO, so I mainly need to avoid blocking the main thread while building up a collection of Either results or errors.
The standard solution to that category of problems is the "async" library. As far as I understand your problem, you need something like this:
import Control.Concurrent.Async
processActions :: [IO a] -> IO ()
processActions actions = do
    -- Execute all the actions concurrently and
    -- get a list of Asyncs (an equivalent to futures):
    asyncs <- mapM async actions
    -- Do whatever you want...
    -- ...
    -- Block until all the asyncs complete:
    mapM_ wait asyncs
However, the library provides a lot of other patterns, so you should check it out. It's quite possible that what you actually need is the mapConcurrently function.
No problem. Just use the async library and run mapM async someListOfJobs to get a list of Asyncs, on which you can wait, poll, and perform many other operations.
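If you specifically want the invokeAll shape from the question, i.e. a collection of Either results or errors once everything has completed, waitCatch gives you exactly that (invokeAll below is my own name, not part of the library):
import Control.Concurrent.Async (async, waitCatch)
import Control.Exception (SomeException)

-- Run every action concurrently, then block until all have finished,
-- collecting each outcome as Left exception or Right result.
invokeAll :: [IO a] -> IO [Either SomeException a]
invokeAll actions = do
    asyncs <- mapM async actions
    mapM waitCatch asyncs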

Implement main server loop in Haskell?

What is the generally accepted way to implement the main loop of a server that needs to wait on a heterogeneous set of events? That is, the server should wait (not busy-wait) until one of the following occurs:
new socket connection
data available on an existing socket
OS signal
third-party library callbacks
I think you're thinking in terms of a C paradigm with a single thread, nonblocking I/O, and a select() call.
You can manage to write something like that in Haskell, but Haskell has much more to offer:
lightweight threads
safe and efficient concurrent data primitives like MVar and Chan
the Big Gun: Software Transactional Memory
I recommend you fork a new thread for every separate point of contact with the outside world, and keep everything coordinated with STM.
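A minimal sketch of that shape, with the event sources faked by threadDelay (a real server would block on accept, recv, a signal handler, or a callback handing the event over instead):
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Monad (forever)

data Event = NewConnection | DataAvailable | Signal | Callback
  deriving Show

main :: IO ()
main = do
    events <- newTQueueIO
    -- One lightweight thread per contact point with the outside world;
    -- each just pushes what it sees onto the shared queue.
    let source ev delay = forkIO . forever $ do
            threadDelay delay
            atomically (writeTQueue events ev)
    _ <- source NewConnection 300000
    _ <- source DataAvailable 100000
    -- The single main loop blocks (without busy-waiting) until any source fires.
    forever $ do
        ev <- atomically (readTQueue events)
        print ev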
Use takeMVar and putMVar to synchronize between threads; they block the thread when the operation cannot proceed (taking from an empty MVar, or putting into a full one).
Read the GHC docs.
I'd like to make it clear I think the two solutions posted first are better than this one for the specific problem you have, but here's a way to solve the type of problem you presented.
A simple way round this is to take your definitions like
data SocketConn = ....
data DataAvail = ...
data OSSignal = ...
data Callback = ...
and define the unsimplified version of
data ServerEvent = Sok SocketConn | Dat DataAvail | Sig OSSignal | Call Callback
handleEvent :: ServerEvent -> IO ()
handleEvent (Sok s) = ....
handleEvent (Dat d) = ....
handleEvent (Sig o) = ....
handleEvent (Call c) = ....
Like I say, read up on the other answers!
Software Transactional Memory (STM) is the main way to do a multi-way wait.
However, by the looks of things, in your case you probably just want to spawn a separate Haskell thread for each task, and let each such thread block while there's nothing happening.
You wouldn't want to create a thousand OS threads, but a thousand Haskell threads is no trouble at all.
(If these threads need to coordinate from time to time, then again, STM is probably the simplest, most reliable way to do that.)

A way to form a 'select' on MVars without polling

I have two MVars (well an MVar and a Chan). I need to pull things out of the Chan and process them until the other MVar is not empty any more. My ideal solution would be something like the UNIX select function where I pass in a list of (presumably empty) MVars and the thread blocks until one of them is full, then it returns the full MVar. Try as I might I can think of no way of doing this beyond repeatedly polling each MVar with isEmptyMVar until I get false. This seems inefficient.
A different thought was to use throwTo, but it interrupts whatever is happening in the thread, and I need to complete processing a job out of the Chan in an atomic fashion.
A final thought as I'm typing is to create a new forkIO for each MVar which tries to read its MVar then fill a newly created MVar with its own instance. The original thread can then block on that MVar. Are Haskell threads cheap enough to go running that many?
Haskell threads are very cheap, so you could solve it that way, but it sounds like STM would be a better fit for your problem. With STM you can do
do var <- atomically (takeTMVar a `orElse` takeTMVar b)
   -- ... do stuff with var
Because of the behavior of retry and orElse, this code tries to get a, then if that fails, get b. If both fail, it blocks until either of them is updated and tries again.
You could even use this to make your own rudimentary version of select:
select :: [TMVar a] -> STM a
select = foldr1 orElse . map takeTMVar
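For illustration, a small usage sketch of that helper (the delays and the two worker threads are made up; the definition is repeated so the snippet stands alone). Whichever TMVar is filled first is the one returned:
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM

select :: [TMVar a] -> STM a
select = foldr1 orElse . map takeTMVar

main :: IO ()
main = do
    a <- newEmptyTMVarIO
    b <- newEmptyTMVarIO
    _ <- forkIO $ threadDelay 200000 >> atomically (putTMVar a "slow")
    _ <- forkIO $ threadDelay 100000 >> atomically (putTMVar b "fast")
    -- Blocks until either TMVar is filled, without polling.
    winner <- atomically (select [a, b])
    putStrLn winner   -- prints "fast"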
How about using STM versions, TChan and TVar, with the retry and orElse behavior?
Implementing select is one of STM's nice capabilities. From "Composable Memory Transactions":
Beyond this, we also provide orElse, which allows them to be composed as alternatives, so that the second is run if the first retries (Section 3.4). This ability allows threads to wait for many things at once, like the Unix select system call – except that orElse composes well, whereas select does not.
orElse in RWH.
The STM package
Papers on Haskell's STM
