What would the following code, which uses the async package, do?
action <- async $ mapM_ someFunc someList
wait action
Will this merely spawn a single thread in which mapM_ occurs? (Implying that this has no benefit over just mapM_ someFunc someList)
Or will it perform the mapM_ action asynchronously (or is mapConcurrently the only way to get such behavior)?
Will this merely spawn a single thread in which mapM_ occurs?
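Yes. It forks one thread, and the calling thread immediately blocks in wait until the mapM_ finishes and returns () (or throws an exception). Inside that thread the elements of someList are still processed one after another, so in this code you gain nothing over a plain mapM_ someFunc someList. To run someFunc on the elements concurrently, use mapConcurrently_ (or mapConcurrently if you need the results).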
The async package is very simple; you might like to look at its source to see how it all fits together and to learn more about the underlying Haskell concurrency primitives.
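A quick way to see the difference is to record which thread handles each element. The sketch below assumes the async package; tagWithThread, viaSingleAsync, viaMapConcurrently and distinctThreads are hypothetical names made up for this illustration, and mapM stands in for mapM_ so that there is something to inspect.

```haskell
import Control.Concurrent (ThreadId, myThreadId)
import Control.Concurrent.Async (async, mapConcurrently, wait)
import Data.List (nub)

-- Hypothetical helper: pair each element with the thread that handled it.
tagWithThread :: Int -> IO (Int, ThreadId)
tagWithThread x = do
  tid <- myThreadId
  pure (x, tid)

-- The pattern from the question: one forked thread runs the whole
-- traversal, element after element, while the caller blocks in 'wait'.
viaSingleAsync :: [Int] -> IO [(Int, ThreadId)]
viaSingleAsync xs = do
  action <- async (mapM tagWithThread xs)
  wait action

-- mapConcurrently forks a separate thread for each element's action.
viaMapConcurrently :: [Int] -> IO [(Int, ThreadId)]
viaMapConcurrently = mapConcurrently tagWithThread

-- How many distinct threads did the work?
distinctThreads :: [(Int, ThreadId)] -> Int
distinctThreads = length . nub . map snd
```

Run on [1 .. 5], viaSingleAsync should report a single thread that is not the caller's, while viaMapConcurrently spreads the work over several threads. Both return results in list order.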
Related
I have a big IO function that continuously loads data from a folder, performs pure calculations on the data, and writes it back.
I am running this function over multiple folders in parallel using
mapConcurrently_ iofun folderList
from http://hackage.haskell.org/package/async-2.1.1.1/docs/Control-Concurrent-Async.html#v%3amapConcurrently
This works perfectly... but a little too well. Now even the characters printed by the putStrLn calls interleave, which makes the console log unreadable.
Is there a way to make IO Actions synchronized or even better a synchronized version of putStrLn?
The usual way to coordinate threads is with MVars (or TVars, if you want to use STM). You can read all about them in "Parallel and Concurrent Programming in Haskell". You could do something like this:
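do mutex <- newMVar ()
   -- putStrLn' takes the lock, prints the whole line, then releases the lock
   let putStrLn' = withMVar mutex . const . putStrLn
   -- iofunPrintingWith stands for a version of iofun that prints via the function it is given
   mapConcurrently_ (iofunPrintingWith putStrLn') folderList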
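Here is a self-contained sketch of that idea, assuming the async package for mapConcurrently_. Since iofun is not shown, iofunPrintingWith below is a hypothetical stand-in that takes its logger as an argument, and charByChar is a made-up printer that writes to an IORef one character at a time (yielding in between) to imitate the interleaving seen with putStrLn. The mutex makes each line come out whole.

```haskell
import Control.Concurrent (yield)
import Control.Concurrent.Async (mapConcurrently_)
import Control.Concurrent.MVar (newMVar, withMVar)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Wrap any line-printing action so only one thread can run it at a time.
synchronized :: (String -> IO ()) -> IO (String -> IO ())
synchronized emit = do
  mutex <- newMVar ()
  pure (\line -> withMVar mutex (const (emit line)))

-- Hypothetical noisy printer: prepends one character at a time to an IORef
-- (so the log is stored reversed) and yields in between, which would let
-- unsynchronized concurrent calls interleave.
charByChar :: IORef String -> String -> IO ()
charByChar ref line = mapM_ (\c -> modifyIORef' ref (c :) >> yield) (line ++ "\n")

-- Hypothetical stand-in for iofun that logs through the function it is given.
iofunPrintingWith :: (String -> IO ()) -> FilePath -> IO ()
iofunPrintingWith say folder = do
  say ("loading " ++ folder)
  say ("done " ++ folder)

-- Process all folders concurrently and return the logged lines in order.
runAll :: [FilePath] -> IO [String]
runAll folders = do
  ref <- newIORef ""
  say <- synchronized (charByChar ref)
  mapConcurrently_ (iofunPrintingWith say) folders
  lines . reverse <$> readIORef ref
```

In a real program you would wrap putStrLn itself: say <- synchronized putStrLn, then pass say to every worker.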
How could I watch several files/sockets from Haskell and wait for these to become readable/writable?
Is there anything like select/epoll/... in Haskell? Or am I forced to spawn one thread per file/socket and always use the blocking resource from within that thread?
The question is wrong: you aren't forced to spawn one thread per file/socket and use blocking calls, you get to spawn one thread per file/socket and use blocking calls. This is the cleanest solution (in any language); the only reason to avoid it in other languages is that it's a bit inefficient there. GHC's threads are cheap enough, however, that it is not inefficient in Haskell. (Additionally, behind the scenes, GHC's IO manager uses an epoll-alike to wake up threads as appropriate.)
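A minimal sketch of the thread-per-resource style, using only base: fanIn (a hypothetical name) forks one thread per blocking source and merges everything into a single Chan, which the caller then reads much like a select loop. With real sockets each source would be a blocking read such as hGetLine on that socket's Handle; since any IO action works here, MVars stand in for the sockets.

```haskell
import Control.Concurrent
import Control.Monad (forM_, forever)

-- Fork one thread per blocking source. Each thread loops forever, blocking
-- on its own source and forwarding every result, tagged with the source's
-- index, into one shared channel. Reading that channel plays the role a
-- select/epoll loop would play in C.
fanIn :: [IO a] -> IO (Chan (Int, a))
fanIn sources = do
  out <- newChan
  forM_ (zip [0 ..] sources) $ \(i, source) ->
    forkIO (forever (source >>= writeChan out . (,) i))
  pure out
```

The forked threads are never stopped in this sketch; a longer-lived program would keep their ThreadIds and killThread them when it is done.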
There's a wrapper for select(2): https://hackage.haskell.org/package/select
Example usage here: https://github.com/pxqr/udev/blob/master/examples/monitor.hs#L36
There's a wrapper for poll(2):
https://hackage.haskell.org/package/poll
GHC's base library comes with functionality that wraps epoll on Linux (and its equivalents on other platforms) in the GHC.Event module.
Example usage:
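import GHC.Event
import Data.Maybe (fromMaybe)
import Control.Concurrent (threadDelay)

main = do
  -- getSomeFileDescriptorOfInterest is a placeholder for however you obtain the Fd
  fd <- getSomeFileDescriptorOfInterest
  mgr <- fromMaybe (error "Must be compiled with -threaded") <$> getSystemEventManager
  -- OneShot: call the callback once, the next time fd becomes readable
  _ <- registerFd mgr (\_fdkey event -> print event) fd evtRead OneShot
  -- keep main alive long enough for the callback to fire
  threadDelay 100000000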
More documentation at http://hackage.haskell.org/package/base-4.11.1.0/docs/GHC-Event.html
Example use of an older version of the lib at https://wiki.haskell.org/Simple_Servers#Epoll-based_event_callbacks
However, the loop in that example has since moved to the hidden module GHC.Event.Manager and, as far as I can tell, is not exported publicly. GHC.Event itself says "This module should be considered GHC internal."
Control.Concurrent provides threadWaitRead and threadWaitWrite.
So, to translate the above epoll example:
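import Control.Concurrent (threadWaitRead)

main = do
  -- getSomeFileDescriptorOfInterest is a placeholder for however you obtain the Fd
  fd <- getSomeFileDescriptorOfInterest
  -- blocks only this Haskell thread, not the whole program, until fd is readable
  threadWaitRead fd
  putStrLn "Got a read ready event"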
You can wrap threadWaitRead and the IO action that follows it in Control.Monad.forever to run them repeatedly, and wrap the whole loop in forkIO to run it in the background while your program does something else.
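Putting those pieces together gives something like the sketch below. It assumes a POSIX system and the unix package (for createPipe and fdToHandle); watchReadable and newWatchedPipe are hypothetical helper names, and the pipe exists only so there is a real Fd to wait on.

```haskell
import Control.Concurrent
import Control.Monad (forever)
import System.IO
import System.Posix.IO (createPipe, fdToHandle)
import System.Posix.Types (Fd)

-- Run 'onReady' every time 'fd' becomes readable, in a background thread.
-- threadWaitRead only waits; 'onReady' must consume the pending data,
-- otherwise the next threadWaitRead returns at once and the loop spins.
watchReadable :: Fd -> IO () -> IO ThreadId
watchReadable fd onReady = forkIO (forever (threadWaitRead fd >> onReady))

-- Hypothetical helper for trying it out (POSIX only): a pipe whose read end
-- is returned both as a raw Fd (to watch) and as a Handle (to read from),
-- plus a line-buffered Handle for the write end.
newWatchedPipe :: IO (Fd, Handle, Handle)
newWatchedPipe = do
  (readFd, writeFd) <- createPipe
  readH <- fdToHandle readFd
  writeH <- fdToHandle writeFd
  hSetBuffering writeH LineBuffering
  pure (readFd, readH, writeH)
```

Usage would look like watchReadable fd (hGetLine readH >>= putStrLn). Because readH is buffered, pairing a raw-Fd wait with hGetLine like this assumes roughly one line arrives per wakeup; for real sockets it is often simpler to call the blocking read in its own thread.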
I'm looking for functionality equivalent to ExecutorService.invokeAll in Haskell. Do I need to implement it myself? From the documentation:
Executes the given tasks, returning a list of Futures holding their status and results when all complete.
In my use case, tasks spend most of their time waiting on IO, so all I need is to avoid constantly blocking the main thread, which should build up a collection of Either results or errors.
The standard solution to this category of problems is the async library. As far as I understand your problem, you need something like this:
import Control.Concurrent.Async

processActions :: [IO a] -> IO ()
processActions actions = do
  -- Execute all the actions concurrently and
  -- get a list of asyncs (an equivalent to futures):
  asyncs <- mapM async actions
  -- Do whatever you want...
  -- ...
  -- Block until all the asyncs complete:
  mapM_ wait asyncs
However, the library provides many other patterns, so it's worth checking out. Possibly what you need is the mapConcurrently function.
No problem. Just use the async library and run mapM async someListOfJobs to get a list of Asyncs, on which you can wait, poll, and perform many other operations.
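If you want the Either-per-task result described in the question, a rough analogue of invokeAll can be sketched with async and waitCatch, which returns Either SomeException a instead of rethrowing. The name invokeAll is just borrowed from the Java API for illustration; this assumes the async package.

```haskell
import Control.Concurrent.Async (async, waitCatch)
import Control.Exception (SomeException)

-- Start every action at once, then block until all of them have finished,
-- collecting each outcome (result or exception) instead of letting the
-- first failure propagate to the caller.
invokeAll :: [IO a] -> IO [Either SomeException a]
invokeAll actions = do
  asyncs <- mapM async actions
  mapM waitCatch asyncs
```

One difference from mapConcurrently: if the calling thread is itself interrupted while waiting here, the tasks already started are not cancelled, whereas mapConcurrently cancels the remaining tasks when anything fails.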
I'm finding myself trying to write something like this:
main = do t1 <- forkIO (forever io)
          t2 <- forkIO (forever io)
          forever io
  `finally` traverse_ killThread [t1,t2]
But t1 and t2 are not in scope in the finally handler, because the finally is outside the do block.
Since here the IO actions are run forever, my main concern is giving the threads a chance to exit cleanly in the event of a user interruption or an IOException in the last IO action.
I'm aware that packages like async and threads are great for this, but is it possible to do this easily with the basic concurrency primitives?
BTW, it'd be nice to have the runtime automatically send killThread to all child threads. When wouldn't you want that?
I just realized there is no problem putting the finally inside the do block:
main = do t1 <- forkIO (forever io)
          t2 <- forkIO (forever io)
          forever io `finally` traverse_ killThread [t1,t2]
I'm not marking this question as answered in case someone spots something wrong with this.
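You don't need to kill the child threads just to stop them: in a compiled GHC program, all other threads stop when the main thread terminates. They are stopped abruptly, though, without running their own cleanup handlers, so if they need to exit cleanly, main has to signal them (for example with killThread) and then wait for them. Normally, you write code in main that waits for the child threads to complete before main itself finishes.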
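If the workers do need to clean up, one sketch using only base is to start them with forkFinally, so each child can signal an MVar once its handlers have run, and to have main kill and then wait for them in a finally. withWorkers is a hypothetical helper name; in the question's terms the call would be withWorkers [forever io, forever io] (forever io).

```haskell
import Control.Concurrent
import Control.Exception (finally)
import Control.Monad (forM)

-- Hypothetical helper: fork the workers, run 'body' in the current thread,
-- and however 'body' ends (normal return, an exception, or Ctrl-C, which GHC
-- delivers to the main thread as a UserInterrupt exception), kill every
-- worker and wait until each one has finished running its own handlers.
withWorkers :: [IO ()] -> IO a -> IO a
withWorkers workers body = do
  children <- forM workers $ \work -> do
    done <- newEmptyMVar
    -- forkFinally runs the second argument after 'work' ends, however it ends
    tid <- forkFinally work (\_ -> putMVar done ())
    pure (tid, done)
  body `finally` (mapM_ (killThread . fst) children >> mapM_ (takeMVar . snd) children)
```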
I'm new to Haskell and I can't figure out what I'm not understanding about this example on the Haskell wiki: http://www.haskell.org/haskellwiki/Implement_a_chat_server
The specific code in question is this:
import Control.Concurrent (Chan, dupChan, forkIO, readChan, writeChan)
import Control.Monad (liftM)
import Data.Function (fix)
import Network.Socket (SockAddr, Socket, socketToHandle)
import System.IO

type Msg = String

runConn :: (Socket, SockAddr) -> Chan Msg -> IO ()
runConn (sock, _) chan = do
    let broadcast msg = writeChan chan msg
    hdl <- socketToHandle sock ReadWriteMode
    hSetBuffering hdl NoBuffering
    chan' <- dupChan chan
    -- fork off thread for reading from the duplicated channel
    forkIO $ fix $ \loop -> do
        line <- readChan chan'
        hPutStrLn hdl line
        loop
    -- read lines from socket and echo them back to the user
    fix $ \loop -> do
        line <- liftM init (hGetLine hdl)
        broadcast line
        loop
The code above has one thread writing to the handle hdl while another thread is (potentially) reading from it at the same time. Is this safe?
I suspect the nature of forkIO (being internal to Haskell and not a system thread library or process) is what makes this work, but I'm not sure.
I checked the documentation of forkIO for any mention of IO handles but found nothing. I also checked the documentation of System.IO but couldn't find any mention of sharing handles between threads without locking.
So can someone tell me how I should know when something like this is safe when the docs don't mention anything about thread safety?
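It's not the nature of forkIO that makes this work but the nature of MVar, which is used to implement both Chan and Handle. Each Handle is guarded by an MVar lock, and a read/write stream handle such as the one socketToHandle returns in ReadWriteMode is a duplex handle with separate locks for its read side and its write side, so a thread blocked in hGetLine does not prevent another thread from calling hPutStrLn.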
If you want to understand how Chan works, look at the section "MVar as building blocks: Unbounded Channels" in chapter 7 of the excellent book "Parallel and Concurrent Programming in Haskell" by Simon Marlow. The same chapter has a section on forkIO and MVar that will help you understand how Handle can be implemented in a thread-safe way.
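To make that section concrete, here is a condensed sketch of the MVar-based unbounded channel it describes, leaving out the exception-safety refinements the book adds later. The names MChan, newMChan, writeMChan, readMChan and dupMChan are made up to avoid clashing with Control.Concurrent.Chan, which is built along the same lines. It also shows why dupChan suits the chat server: every duplicate gets its own read pointer into the shared stream, so each client sees every message.

```haskell
import Control.Concurrent.MVar

-- The stream is a linked list of MVars; an empty MVar marks the "hole"
-- at the end where the next item will go.
type Stream a = MVar (Item a)
data Item a = Item a (Stream a)

-- A channel is a read pointer and a write pointer into the stream.
data MChan a = MChan (MVar (Stream a)) (MVar (Stream a))

newMChan :: IO (MChan a)
newMChan = do
  hole <- newEmptyMVar
  readVar <- newMVar hole
  writeVar <- newMVar hole
  pure (MChan readVar writeVar)

-- Fill the current hole with the value and a fresh hole, then advance.
writeMChan :: MChan a -> a -> IO ()
writeMChan (MChan _ writeVar) val = do
  newHole <- newEmptyMVar
  oldHole <- takeMVar writeVar
  putMVar oldHole (Item val newHole)
  putMVar writeVar newHole

-- Taking readVar serialises readers; readMVar (not takeMVar) leaves the
-- item in place so that duplicated channels can read it too.
readMChan :: MChan a -> IO a
readMChan (MChan readVar _) = do
  stream <- takeMVar readVar
  Item val rest <- readMVar stream
  putMVar readVar rest
  pure val

-- A duplicate shares the write end but gets its own read pointer, starting
-- at the current end of the stream: it sees only messages written from now on.
dupMChan :: MChan a -> IO (MChan a)
dupMChan (MChan _ writeVar) = do
  hole <- readMVar writeVar
  newReadVar <- newMVar hole
  pure (MChan newReadVar writeVar)
```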
Chapter 12 talks specifically about various ways to implement network servers, including a chat server that is implemented using STM instead of Chans.
If it weren't safe, blocking sockets would be almost impossible to use. If your protocol is asynchronous and you're using blocking sockets, you need a thread blocking on read pretty much all the time. If you then needed to send a message to the other side, how could you do it? Wait for the other side to send you a message?