How to implement Haskell equivalent of invokeAll

I'm looking for equivalent functionality to ExecutorService.invokeAll in Haskell. Do I need to implement it? From documentation:
Executes the given tasks, returning a list of Futures holding their status and results when all complete.
In my use case, tasks spend most of their time waiting on IO, so I only need to avoid constantly blocking the main thread, which would build up a collection of results or errors as Either values.

The standard solution to that category of problems is the "async" library. As far as I understand your problem, you need something like this:
import Control.Concurrent.Async

processActions :: [IO a] -> IO ()
processActions actions = do
  -- Execute all the actions concurrently and
  -- get a list of asyncs (an equivalent to futures):
  asyncs <- mapM async actions
  -- Do whatever you want...
  -- ...
  -- Block until all the asyncs complete:
  mapM_ wait asyncs
However the library provides a lot of other patterns, so you should just check it out. There's a possibility that what you need is the mapConcurrently function.
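For the "list of Either results or errors" part of the question, one possible sketch combines mapConcurrently with try from Control.Exception. The name invokeAll and the sample actions are made up for illustration; catching SomeException is an assumption that you want every failure reported rather than rethrown.

```haskell
import Control.Concurrent.Async (mapConcurrently)
import Control.Exception (ErrorCall (..), SomeException, throwIO, try)

-- Hypothetical helper: run every action concurrently and collect
-- each outcome as either the exception it threw or its result.
invokeAll :: [IO a] -> IO [Either SomeException a]
invokeAll = mapConcurrently try

main :: IO ()
main = do
  results <- invokeAll [pure 1, throwIO (ErrorCall "boom"), pure (3 :: Int)]
  -- Results come back in the same order as the input actions.
  mapM_ (either (\e -> putStrLn ("failed: " ++ show e)) print) results
```

Because each action is wrapped in try, one failing task does not cancel the others, which is roughly the behaviour invokeAll has in Java.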

No problem. Just use the async library and run mapM async someListOfJobs to get a list of Asyncs, which you can wait on, poll, and use with many other operations.

Related

Synchronized section in async map

I have a big IO function that will continuously load data from a folder, perform pure calculations on the data, and write it back.
I am running this function over multiple folders in parallel using
mapConcurrently_ iofun folderList
from http://hackage.haskell.org/package/async-2.1.1.1/docs/Control-Concurrent-Async.html#v%3amapConcurrently
This works perfectly... but a little bit too well. Now even the character output of the putStrLn calls is interleaved across threads, which leads to an unreadable console log.
Is there a way to make IO Actions synchronized or even better a synchronized version of putStrLn?

The way you coordinate threads is via MVars or TVars if you want to use STM. You can read all about them in "Parallel and Concurrent Haskell". You could do something like:
do mutex <- newMVar ()
   let putStrLn' = withMVar mutex . const . putStrLn
   mapConcurrently_ (iofunPrintingWith putStrLn') folderList
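A slightly fuller sketch of the same idea, in case it helps. The worker function and the folder names are hypothetical stand-ins for the question's iofun and folderList; the only real technique here is holding an MVar while printing.

```haskell
import Control.Concurrent.Async (mapConcurrently_)
import Control.Concurrent.MVar (newMVar, withMVar)

-- Build a putStrLn that holds a lock while printing,
-- so a line from one thread is never split by another.
newLockedPutStrLn :: IO (String -> IO ())
newLockedPutStrLn = do
  lock <- newMVar ()
  pure (\s -> withMVar lock (\_ -> putStrLn s))

-- Hypothetical worker standing in for the question's iofun.
worker :: (String -> IO ()) -> FilePath -> IO ()
worker say folder = do
  say ("loading " ++ folder)
  say ("finished " ++ folder)

main :: IO ()
main = do
  say <- newLockedPutStrLn
  mapConcurrently_ (worker say) ["a", "b", "c"]
```

Lines from different folders can still appear in any order; the lock only guarantees that each line is printed whole.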

Behavior of async package

What would the following code using the async package do:
action <- async $ mapM_ someFunc someList
wait action
Will this merely spawn a single thread in which mapM_ occurs? (Implying that this has no benefit over just mapM_ someFunc someList)
Or will it perform the mapM_ action asynchronously (or is mapConcurrently the only way to get such behavior)?

Will this merely spawn a single thread in which mapM_ occurs?
Yes, it will fork a thread and immediately block waiting for the mapM_ to finish and return a () (or to throw an exception).
The async package is very simple; you might like to look at the source to see how it all works together and learn more about the underlying Haskell concurrency primitives.
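To see the difference, here is a small sketch that runs both variants. The body of someFunc and the 0.1 s delay are made up for illustration.

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (async, mapConcurrently_, wait)

-- Hypothetical stand-in for someFunc: pause 0.1 s, then print.
someFunc :: Int -> IO ()
someFunc n = threadDelay 100000 >> print n

main :: IO ()
main = do
  -- One forked thread; the elements still run one after another,
  -- so this takes about 0.3 s in total.
  action <- async (mapM_ someFunc [1, 2, 3])
  wait action
  -- One thread per element; the delays overlap (about 0.1 s),
  -- and the printed lines may appear in any order.
  mapConcurrently_ someFunc [1, 2, 3]
```

The first form is still useful when the main thread has other work to do between the async and the wait.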

Distributing Haskell on a cluster

I have a piece of code that processes files,
processFiles :: [FilePath] -> (FilePath -> IO ()) -> IO ()
This function spawns an async process that executes an IO action. This IO action must be submitted to a cluster through a job scheduling system (e.g. Slurm).
Because I must use the job scheduling system, it's not possible to use cloudHaskell to distribute the closure. Instead the program writes a new Main.hs containing the desired computations, which is copied to the cluster node together with all the modules that main depends on and then executed remotely with "runhaskell Main.hs [opts]". Then the async process should periodically ask the job scheduling system (using threadDelay) if the job is done.
Is there a way to avoid creating a new Main? Can I serialize the IO action and execute it somehow in the node?

Yep. There is a magical library called packman. It allows you to turn any Haskell thing into data (as long as it does not have IORefs or related things in it). Here are the things you would need:
trySerialize :: a -> IO (Serialized a)
deserialize :: Serialized a -> IO a
instance Typeable a => Binary (Serialized a)
Yep, those are the exact types. You can package up your IO actions using trySerialize, use Binary to transfer it to wherever, and then deserialize to get the IO action out, ready for use.
The caveats for packman are:
It stores things as thunks. This is probably what you want, so that the node can do the evaluating.
That said, if your thunk is huge, the Binary will probably be huge. Evaluating the thunk can fix this.
Like I said, mutable references are a no-no. One thing to watch out for is them being inside thunks without you knowing it.
Other than that, this seems like what you want!

Concurrency considerations between pipes and non-pipes code

I'm in the process of wrapping a C library for some encoding in a pipes interface, but I've hit upon some design decisions that need to be made.
After the C library is set up, we hold on to an encoder context. With this, we can either encode, or change some parameters (let's call the Haskell interface to this last function tune :: Context -> Int -> IO ()). There are two parts to my question:
1. The encoding part is easily wrapped up in a Pipe Foo Bar IO (), but I would also like to expose tune. Since simultaneous use of the encoding context must be lock protected, I would need to take a lock at every iteration in the pipe, and protect tune by taking the same lock. But now I feel I'm forcing hidden locks on the user. Am I barking up the wrong tree here? How is this kind of situation normally resolved in the pipes ecosystem? In my case I expect the pipe that my specific code is part of to always run in its own thread, with tuning happening concurrently, but I don't want to force this point of view upon any users. Other packages in the pipes ecosystem do not seem to force their users like this either.
2. An encoding context that is no longer used needs to be properly de-initialized. How does one, in the pipes ecosystem, ensure that such things (in this case performing some IO actions) are taken care of when the pipe is destroyed?
A concrete example would be wrapping a compression library, in which case the above can be:
1. The compression strength is tunable. We set up the pipe and it runs along merrily. How should one best go about allowing the compression strength setting to be changed while the pipe keeps running, assuming that concurrent access to the compression codec context must be serialized?
2. The compression library allocated a bunch of memory off the Haskell heap when set up, and we'll need to call some library function to clean this up when the pipe is torn down.
Thanks… this might all be obvious, but I'm quite new to the pipes ecosystem.
Edit: Reading this after posting, I'm quite sure it's the vaguest question I've ever asked here. Ugh! Sorry ;-)

Regarding (1), the general solution is to change your Pipe's type to:
Pipe (Either (Context, Int) Foo) Bar IO ()
In other words, it accepts both Foo inputs and tune requests, which it processes internally.
So let's then assume that you have two concurrent Producers corresponding to inputs and tune requests:
producer1 :: Producer Foo IO ()
producer2 :: Producer (Context, Int) IO ()
You can use pipes-concurrency to create a buffer that they both feed into, like this:
example = do
    (output, input) <- spawn Unbounded
    -- input  :: Input  (Either (Context, Int) Foo)
    -- output :: Output (Either (Context, Int) Foo)
    let io1 = runEffect $ producer1 >-> Pipes.Prelude.map Right >-> toOutput output
        io2 = runEffect $ producer2 >-> Pipes.Prelude.map Left >-> toOutput output
    as <- mapM async [io1, io2]
    runEffect (fromInput input >-> yourPipe >-> someConsumer)
    mapM_ wait as
You can learn more about the pipes-concurrency library by reading this tutorial.
By forcing all tune requests to go through the same single-threaded Pipe you can ensure that you don't accidentally have two concurrent invocations of the tune function.
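As a rough illustration of what such a Pipe could look like inside, here is a toy sketch. The Context type, tune, and encode below are made-up stand-ins (an IORef holding a multiplier), and the Left case carries only the new setting because this pipe already holds its context; a real encoder would call into the C library instead.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Pipes
import qualified Pipes.Prelude as P

-- Hypothetical encoder context: just a mutable "strength" setting.
newtype Context = Context (IORef Int)

tune :: Context -> Int -> IO ()
tune (Context ref) = writeIORef ref

-- Toy "encoding": multiply the input by the current strength.
encode :: Context -> Int -> IO Int
encode (Context ref) x = (* x) <$> readIORef ref

-- Left values are tune requests, Right values are inputs to encode.
-- Both are handled in the pipe's own thread, so no lock is needed.
encoderPipe :: Context -> Pipe (Either Int Int) Int IO ()
encoderPipe ctx = for cat $ \msg -> case msg of
  Left level -> lift (tune ctx level)
  Right x    -> lift (encode ctx x) >>= yield

main :: IO ()
main = do
  ctx <- Context <$> newIORef 1
  out <- P.toListM (each [Right 2, Left 10, Right 3] >-> encoderPipe ctx)
  print out  -- prints [2,30]
```

The tune request takes effect between the two inputs, in exactly the position it occupied in the stream.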
Regarding (2) there are two ways you can acquire a resource using pipes. The more sophisticated approach is to use the pipes-safe library, which provides a bracket function that you can use within a Pipe, but that is probably overkill for your purpose and only exists for acquiring and releasing multiple resources over the lifetime of a pipe. A simpler solution is just to use the following with idiom to acquire the pipe:
withEncoder :: (Pipe Foo Bar IO () -> IO r) -> IO r
withEncoder k = bracket acquire release $ \resource -> do
    k (createPipeFromResource resource)
Then a user would just write:
withEncoder $ \yourPipe -> do
    runEffect (someProducer >-> yourPipe >-> someConsumer)
You can optionally use the managed package, which simplifies the types a bit and makes it easier to acquire multiple resources. You can learn more about it from reading this blog post of mine.
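A minimal, dependency-free sketch of the with idiom, to show the order in which things happen. The Encoder type and the acquire and release bodies are hypothetical; here the callback receives the raw resource rather than a Pipe built from it, which keeps the example to base only.

```haskell
import Control.Exception (bracket)

-- Hypothetical handle standing in for the C library's encoder context.
newtype Encoder = Encoder String

acquire :: IO Encoder
acquire = putStrLn "init encoder" >> pure (Encoder "ctx")

release :: Encoder -> IO ()
release _ = putStrLn "free encoder"

-- bracket runs release even if the callback throws an exception.
withEncoder :: (Encoder -> IO r) -> IO r
withEncoder = bracket acquire release

main :: IO ()
main = withEncoder $ \(Encoder name) -> putStrLn ("using " ++ name)
```

Running this prints "init encoder", then "using ctx", then "free encoder".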

A way to form a 'select' on MVars without polling

I have two MVars (well an MVar and a Chan). I need to pull things out of the Chan and process them until the other MVar is not empty any more. My ideal solution would be something like the UNIX select function where I pass in a list of (presumably empty) MVars and the thread blocks until one of them is full, then it returns the full MVar. Try as I might I can think of no way of doing this beyond repeatedly polling each MVar with isEmptyMVar until I get false. This seems inefficient.
A different thought was to use throwTo, but it interrupts whatever is happening in the thread and I need to complete processing a job out of the Chan in an atomic fashion.
A final thought as I'm typing is to create a new forkIO for each MVar which tries to read its MVar then fill a newly created MVar with its own instance. The original thread can then block on that MVar. Are Haskell threads cheap enough to go running that many?

Haskell threads are very cheap, so you could solve it that way, but it sounds like STM would be a better fit for your problem. With STM you can do
do var <- atomically (takeTMVar a `orElse` takeTMVar b)
   ... do stuff with var
Because of the behavior of retry and orElse, this code tries to get a, then if that fails, get b. If both fail, it blocks until either of them is updated and tries again.
You could even use this to make your own rudimentary version of select:
select :: [TMVar a] -> STM a
select = foldr1 orElse . map takeTMVar
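Here is a small runnable sketch of that select, assuming the stm package. It uses foldr with retry as the base case instead of foldr1, so an empty list blocks rather than raising an error; the two TMVars and the delayed writer are made up for the demonstration.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM

-- Take from whichever TMVar is full first, trying them left to right.
-- With an empty list this simply retries (blocks) forever.
select :: [TMVar a] -> STM a
select = foldr orElse retry . map takeTMVar

main :: IO ()
main = do
  a <- newEmptyTMVarIO
  b <- newEmptyTMVarIO
  -- Fill b after a short delay; a is never filled.
  _ <- forkIO (threadDelay 100000 >> atomically (putTMVar b "from b"))
  -- Blocks without polling until one of the TMVars is written.
  msg <- atomically (select [a, b])
  putStrLn msg  -- prints "from b"
```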

How about using STM versions, TChan and TVar, with the retry and orElse behavior?
Implementing select is one of STM's nice capabilities. From "Composable Memory Transactions":
Beyond this, we also provide orElse, which allows them to be composed as alternatives, so that the second is run if the first retries (Section 3.4). This ability allows threads to wait for many things at once, like the Unix select system call – except that orElse composes well, whereas select does not.
orElse in RWH.
The STM package
Papers on Haskell's STM
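Applied to the original question (process jobs from a channel until a stop signal arrives), a sketch with TChan and TMVar might look like the following. The worker name, the Int jobs, and the choice to prefer pending jobs over the stop signal are all assumptions; swapping the two operands of orElse would give the stop signal priority instead.

```haskell
import Control.Concurrent.STM

-- Collect jobs until the stop signal is seen. Each job is taken
-- atomically, so a job is never interrupted halfway through its receipt.
worker :: TChan Int -> TMVar () -> IO [Int]
worker jobs stop = go []
  where
    go acc = do
      next <- atomically $
        (Right <$> readTChan jobs) `orElse` (Left <$> takeTMVar stop)
      case next of
        Right job -> go (job : acc)       -- "process" the job, then loop
        Left ()   -> pure (reverse acc)   -- stop signal received

main :: IO ()
main = do
  jobs <- newTChanIO
  stop <- newEmptyTMVarIO
  atomically (mapM_ (writeTChan jobs) [1, 2, 3])
  atomically (putTMVar stop ())
  done <- worker jobs stop
  print done  -- prints [1,2,3]
```

When neither a job nor the stop signal is available, the transaction blocks until one of them changes, so no polling loop is needed.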
