Multithreading in a monad

I need to run multiple concurrent processes in a context of the same monad, say, Connection. I expected something like the following would work:
main = runConnection connectionSettings $ do
    forkIO action1
    forkIO action2
    action3
but forkIO must be run in an IO context, and the forked actions must be in IO too.
Assuming those actions have the type Connection (), what needs to be done to run them concurrently (which instances to implement, etc.)?
Currently I'm working around this as follows, but evidently this ain't right:
main = do
    forkIO $ runConnection connectionSettings action1
    forkIO $ runConnection connectionSettings action2
    runConnection connectionSettings action3

I've found a beautiful "monad-parallel" library intended for very similar purposes, and an even more powerful fork of it - "classy-parallel".
For a monad to be usable in a parallelizable manner it needs an instance of the MonadParallel typeclass, and the "classy-parallel" library already provides instances for the most important types for such purposes: ResourceT IO and ReaderT.
Assuming the presence of appropriate instances, the code in question can be transformed to the following:
import qualified Control.Monad.Parallel as Parallel
main = runConnection connectionSettings $
    Parallel.sequence [action1, action2, action3]
For simply forking in ResourceT, resourceForkIO can be useful. There's also a "monad-fork" library, which provides a neat and simple generalization over forking on top of forkIO and resourceForkIO.
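To see why forking inside such a monad is possible at all, here is a minimal sketch; it assumes, hypothetically, that Connection is a newtype over ReaderT ConnectionSettings IO (the real type in your library may differ). The forked computation is unwrapped, the environment is captured with ask, and the resulting plain IO action is handed to forkIO:

```haskell
import Control.Concurrent (ThreadId, forkIO)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)

-- Hypothetical stand-ins for the real library types.
data ConnectionSettings = ConnectionSettings

newtype Connection a = Connection (ReaderT ConnectionSettings IO a)

-- Fork a Connection action onto a new thread, sharing the same settings.
forkConnection :: Connection () -> Connection ThreadId
forkConnection (Connection m) = Connection $ do
    settings <- ask
    liftIO $ forkIO (runReaderT m settings)
```

This is essentially what the parallel/fork instances do under the hood for ReaderT-shaped monads.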


Behavior of async package

What would the following code using the async package do:
action <- async $ mapM_ someFunc someList
wait action
Will this merely spawn a single thread in which mapM_ occurs? (Implying that this has no benefit over just mapM_ someFunc someList)
Or will it perform the mapM_ action asynchronously (or is mapConcurrently the only way to get such behavior)?
Will this merely spawn a single thread in which mapM_ occurs?
Yes, it will fork a thread and immediately block waiting for the mapM_ to finish and return a () (or to throw an exception).
The async package is very simple; you might like to look at the source to see how it all fits together and to learn more about the underlying Haskell concurrency primitives.
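A small sketch of the difference, with print standing in for someFunc: async forks one thread that runs the whole mapM_ sequentially, while mapConcurrently_ (available in recent versions of async) forks one thread per element:

```haskell
import Control.Concurrent.Async (async, mapConcurrently_, wait)

main :: IO ()
main = do
    -- One extra thread; the prints inside it still run one after another.
    a <- async $ mapM_ print [1, 2, 3 :: Int]
    wait a
    -- One thread per element; the prints may interleave in any order.
    mapConcurrently_ print [1, 2, 3 :: Int]
```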

Concurrency considerations between pipes and non-pipes code

I'm in the process of wrapping a C library for some encoding in a pipes interface, but I've hit upon some design decisions that need to be made.
After the C library is set up, we hold on to an encoder context. With this, we can either encode, or change some parameters (let's call the Haskell interface to this last function tune :: Context -> Int -> IO ()). There are two parts to my question:
The encoding part is easily wrapped up in a Pipe Foo Bar IO (), but I would also like to expose tune. Since simultaneous use of the encoding context must be lock protected, I would need to take a lock at every iteration in the pipe, and protect tune with the same lock. But now I feel I'm forcing hidden locks on the user. Am I barking up the wrong tree here? How is this kind of situation normally resolved in the pipes ecosystem? In my case I expect the pipe that my specific code is part of to always run in its own thread, with tuning happening concurrently, but I don't want to force this point of view upon any users. Other packages in the pipes ecosystem do not seem to force this on their users either.
An encoding context that is no longer used needs to be properly de-initialized. How does one, in the pipes ecosystem, ensure that such things (in this case performing some IO actions) are taken care of when the pipe is destroyed?
A concrete example would be wrapping a compression library, in which case the above can be:
The compression strength is tunable. We set up the pipe and it runs along merrily. How should one best go about allowing the compression strength setting to be changed while the pipe keeps running, assuming that concurrent access to the compression codec context must be serialized?
The compression library allocated a bunch of memory off the Haskell heap when set up, and we'll need to call some library function to clean this up when the pipe is torn down.
Thanks… this might all be obvious, but I'm quite new to the pipes ecosystem.
Edit: Reading this after posting, I'm quite sure it's the vaguest question I've ever asked here. Ugh! Sorry ;-)
Regarding (1), the general solution is to change your Pipe's type to:
Pipe (Either (Context, Int) Foo) Bar IO ()
In other words, it accepts both Foo inputs and tune requests, which it processes internally.
So let's then assume that you have two concurrent Producers corresponding to inputs and tune requests:
producer1 :: Producer Foo IO ()
producer2 :: Producer (Context, Int) IO ()
You can use pipes-concurrency to create a buffer that they both feed into, like this:
example = do
    (output, input) <- spawn Unbounded
    -- input  :: Input  (Either (Context, Int) Foo)
    -- output :: Output (Either (Context, Int) Foo)
    let io1 = runEffect $ producer1 >-> Pipes.Prelude.map Right >-> toOutput output
        io2 = runEffect $ producer2 >-> Pipes.Prelude.map Left  >-> toOutput output
    as <- mapM async [io1, io2]
    runEffect (fromInput input >-> yourPipe >-> someConsumer)
    mapM_ wait as
You can learn more about the pipes-concurrency library by reading this tutorial.
By forcing all tune requests to go through the same single-threaded Pipe you can ensure that you don't accidentally have two concurrent invocations of the tune function.
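A sketch of such a single-threaded pipe, assuming the tune :: Context -> Int -> IO () wrapper from the question and a hypothetical encodeOne :: Context -> Foo -> IO Bar for the C library:

```haskell
import Control.Monad (forever)
import Control.Monad.IO.Class (liftIO)
import Pipes

encoderPipe :: Context -> Pipe (Either (Context, Int) Foo) Bar IO ()
encoderPipe ctx = forever $ do
    msg <- await
    case msg of
        Left (c, n) -> liftIO (tune c n)                      -- apply a tune request
        Right foo   -> liftIO (encodeOne ctx foo) >>= yield   -- encode one input
```

Since only this one thread ever touches the context, no lock is needed.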
Regarding (2), there are two ways you can acquire a resource using pipes. The more sophisticated approach is to use the pipes-safe library, which provides a bracket function that you can use within a Pipe; that is probably overkill for your purpose, and is mainly for acquiring and releasing multiple resources over the lifetime of a pipe. A simpler solution is just to use the following with idiom to acquire the pipe:
withEncoder :: (Pipe Foo Bar IO () -> IO r) -> IO r
withEncoder k = bracket acquire release $ \resource ->
    k (createPipeFromResource resource)
Then a user would just write:
withEncoder $ \yourPipe ->
    runEffect (someProducer >-> yourPipe >-> someConsumer)
You can optionally use the managed package, which simplifies the types a bit and makes it easier to acquire multiple resources. You can learn more about it from reading this blog post of mine.

Clean up child threads when main thread terminates

I'm finding myself trying to write something like this:
main = do t1 <- forkIO (forever io)
          t2 <- forkIO (forever io)
          forever io
  `finally` traverse_ killThread [t1, t2]
But t1 and t2 can't be accessed in the finally clause because it sits outside the do block.
Since here the IO actions are run forever, my main concern is giving the threads a chance to exit cleanly in the event of a user interruption or an IOException in the last IO action.
I'm aware that packages like async and threads are great for this, but is it possible to do this easily with the basic concurrency primitives?
BTW, it'd be nice to have the runtime automatically send killThread to all child threads. When wouldn't you want that?
Just realized that there is no problem including the finally in the monadic code block.
main = do t1 <- forkIO (forever io)
          t2 <- forkIO (forever io)
          forever io `finally` traverse_ killThread [t1, t2]
I'm not marking this question as answered in case someone spots something wrong with this.
You don't need to kill the child threads because the Haskell runtime kills them when the main thread terminates anyway. Normally, you write code in main to wait for the child threads to complete before completing itself.
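With just the base primitives, one way to let main wait for (and clean up after) a child is forkFinally, in Control.Concurrent since base 4.6, paired with an MVar; a minimal sketch:

```haskell
import Control.Concurrent (forkFinally)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

main :: IO ()
main = do
    done <- newEmptyMVar
    -- The handler runs whether the child returns normally or throws.
    _ <- forkFinally (putStrLn "child running") (\_ -> putMVar done ())
    takeMVar done  -- block until the child has finished
```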

Ensure IO computations are run in a specific thread

I need to make sure that some actions are run on a specific OS thread. I wrote an API where this thread runs a loop listening to a TQueue and executes the given commands. From the API user side, there is an opaque value that is really just a newtype over this queue.
One problem is that what I really need is to embed arbitrary actions (type IO a), but I believe I can't directly exchange messages of that type. So I currently have something like this (pseudo code):
makeSafe :: RubyInterpreter -> IO a -> IO (Either RubyError a)
makeSafe (RubyInterpreter q) a = do
    mv <- newEmptyTMVarIO
    -- embedded is of type IO (), letting me send it through the queue
    let embedded = handleErrors a >>= atomically . putTMVar mv
    atomically (writeTQueue q (SomeMessage embedded))
    atomically (readTMVar mv)
(for more details, this is for the hruby package)
Edit (clarifications):
Being able to send actions of type IO a would be nicer, but is not my main objective.
My main problem is that you can shoot yourself in the foot with this API: for example, if the IO action passed as a parameter itself calls makeSafe, it will deadlock.
My secondary problem is that this solution feels a bit contrived, and I wondered if there was a nicer/safer solution around.
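For reference, the worker-loop pattern described above can be sketched like this (Worker, newWorker and submit are hypothetical names; forkOS pins the loop to a dedicated OS thread):

```haskell
import Control.Concurrent (forkOS)
import Control.Concurrent.STM (TQueue, atomically, newTQueueIO, readTQueue, writeTQueue)
import Control.Monad (forever, join)

newtype Worker = Worker (TQueue (IO ()))

-- Start a dedicated OS thread that executes queued actions in order.
newWorker :: IO Worker
newWorker = do
    q <- newTQueueIO
    _ <- forkOS $ forever $ join (atomically (readTQueue q))
    return (Worker q)

-- Fire-and-forget; combine with a TMVar (as in makeSafe) to get results back.
submit :: Worker -> IO () -> IO ()
submit (Worker q) = atomically . writeTQueue q
```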

Using the Par monad with STM and Deterministic IO

I'm in the process of writing a report for an assignment in which I implemented a concurrent multicore branch and bound algorithm using the STM package and there was an issue I've come up against.
The implementation which uses STM is obviously in the IO monad since it both uses STM's 'atomically' and Concurrent's 'forkIO', but it is deterministic. Despite the use of a shared memory variable, the final result of the function will always be the same for the same input.
My question is, what are my options when it comes to getting out of IO, besides 'unsafePerformIO'? Should I even try and get it out of the IO monad, since the use of multiple cores could potentially affect other concurrent code that doesn't have the same guarantee for determinism.
I've heard of the Par monad package (although not used it), but STM exists in the IO monad, and in order to get thread safe global variables my only alternative to STM is MVars (that I'm aware of), which also exist in the IO monad.
Please do not use unsafePerformIO with STM. STM has side-effects under the hood, using unsafePerformIO hides these side effects, making your code deceptively impure and thus hard or dangerous to refactor. Try harder to see if the parallel package will help you.
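As an illustration, the Par monad from the monad-par package offers deterministic parallelism behind a pure runPar, so a result like the following needs no IO at all (a sketch; spawnP evaluates its argument in parallel):

```haskell
import Control.Monad.Par (runPar, spawnP, get)

example :: Int
example = runPar $ do
    a <- spawnP (sum [1 .. 100 :: Int])     -- evaluated in parallel
    b <- spawnP (product [1 .. 10 :: Int])
    x <- get a                              -- get blocks until the IVar is filled
    y <- get b
    return (x + y)
```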
One example of unsafePerformIO with STM being unsafe is when you end up with a "pure" STM operation nested inside another (perhaps via a higher-level library). For example, the code below loops (terminates with <loop>) due to the nested STM transactions. I recall older GHC versions crashing, but I can't reproduce that behavior right now with GHC 7.0.1.
import Control.Concurrent
import Control.Concurrent.STM
import System.IO.Unsafe
import GHC.Conc.Sync

main = newTVarIO 5 >>= runComputation >>= print

runComputation :: TVar Int -> IO Int
runComputation tv = atomically $ do
    let y = getFiveUnsafe tv + 1
    writeTVar tv y
    return y

getFiveUnsafe tv = unsafePerformIO . atomically $ do
    x <- readTVar tv
    writeTVar tv (x + 5)
    return x
(I welcome other people editing and adding more convincing examples - I believe better ones exist)
STM and its related functions cannot be used from unsafePerformIO, but forkIO can be, and the new thread can call atomically safely. You can do something like this:
purifiedAlgorithm = unsafePerformIO $ do
    rr <- newEmptyMVar
    forkIO $ concurrentAlgorithm >>= putMVar rr
    takeMVar rr
