How would you collect the results of a list of Async a in Haskell as they become available? The idea is to start processing the results of asynchronous tasks as soon as they are available.
The best I could come up with is the following function:
collect :: [Async a] -> IO [a]
collect [] = return []
collect asyncs = do
(a, r) <- waitAny asyncs
rs <- collect (filter (/= a) asyncs)
return (r:rs)
However, this function does not exhibits the desired behavior since, as pointed out in the comment below, it doesn't return till all the asynchronous tasks are completed. Furthermore, collect runs in O(n^2) since I'm filtering the list at each recursive step. This could be improved by using a more efficient structure (and maybe indexing the position of the Async values in the list).
Maybe there are library functions that take care of this, but I could not find them in the Control.Concurrent.Async module and I wonder why.
EDIT: after thinking the problem a bit more carefully, I'm wondering whether such function is a good idea. I could just use fmap on the asynchronous tasks. Maybe it is a better practice to wait for the results when there is no other choice.
As I mentioned in my other answer, streaming results out of a list of Asyncs as they become available is best achieved using a stream processing library. Here's an example using pipes.
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async
import Control.Concurrent.STM
import Data.Functor (($>))
import Pipes
import Pipes.Concurrent -- from the pipes-concurrency package
import qualified Pipes.Prelude as P
asCompleted :: MonadIO m => [Async a] -> Producer a m ()
asCompleted asyncs = do
(o, i, seal) <- liftIO $ spawn' unbounded
liftIO $ forkIO $ do
forConcurrently asyncs (\async -> atomically $ waitSTM async >>= send o)
atomically seal
fromInput i
main = do
actions <- traverse async [threadDelay 2000000 $> "bar", threadDelay 1000000 $> "foo"]
runEffect $ asCompleted actions >-> P.print
-- after one second, prints "foo", then "bar" a second later
Using pipes-concurrency, we spawn' an Output-Input pair and immediately convert the Input to a Producer using fromInput. Asynchronously, we send items as they become available. When all the Asyncs have completed we seal the inbox to close down the Producer.
Implemented via TChan, additionally implemented a version which can react immediately, but it is more complex and also might have problems with exceptions (if you want to receive exceptions, use SlaveThread.fork instead of forkIO), so I commented that code in case you're not interested in it:
import Control.Concurrent (threadDelay)
import Control.Concurrent (forkIO)
import Control.Concurrent.Async
import Control.Concurrent.STM
import Control.Monad
collect :: [Async a] -> IO [a]
collect = atomically . collectSTM
collectSTM :: [Async a] -> STM [a]
collectSTM as = do
c <- newTChan
collectSTMChan c as
collectSTMChan :: TChan a -> [Async a] -> STM [a]
collectSTMChan chan as = do
mapM_ (waitSTM >=> writeTChan chan) as
replicateM (length as) (readTChan chan)
main :: IO ()
main = do
a1 <- async (threadDelay 2000000 >> putStrLn "slept 2 secs" >> return 2)
a2 <- async (threadDelay 3000000 >> putStrLn "slept 3 secs" >> return 3)
a3 <- async (threadDelay 1000000 >> putStrLn "slept 1 sec" >> return 1)
res <- collect [a1,a2,a3]
putStrLn (show res)
-- -- reacting immediately
-- a1 <- async (threadDelay 2000000 >> putStrLn "slept 2 secs" >> return 2)
-- a2 <- async (threadDelay 3000000 >> putStrLn "slept 3 secs" >> return 3)
-- a3 <- async (threadDelay 1000000 >> putStrLn "slept 1 sec" >> return 1)
-- c <- collectChan [a1,a2,a3]
-- replicateM_ 3 (atomically (readTChan c) >>= \v -> putStrLn ("Received: " ++ show v))
-- collectChan :: [Async a] -> IO (TChan a)
-- collectChan as = do
-- c <- newTChanIO
-- forM_ as $ \a -> forkIO ((atomically . (waitSTM >=> writeTChan c)) a)
-- return c
I'm reading your question as "is it possible to sort a list of Asyncs by their completion time?". If that's what you meant, the answer is yes.
import Control.Applicative (liftA2)
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async
import Data.Functor (($>))
import Data.List (sortBy)
import Data.Ord (comparing)
import Data.Time (getCurrentTime)
sortByCompletion :: [Async a] -> IO [a]
sortByCompletion = fmap (fmap fst . sortBy (comparing snd)) . mapConcurrently withCompletionTime
where withCompletionTime async = liftA2 (,) (wait async) getCurrentTime
main = do
asyncs <- traverse async [threadDelay 2000000 $> "bar", threadDelay 1000000 $> "foo"]
sortByCompletion asyncs
-- ["foo", "bar"], after two seconds
Using mapConcurrently we wait for each Async on a separate thread. Upon completion we get the current time - the time at which the Async completed - and use it to sort the results. This is O(n log n) complexity because we are sorting the list. (Your original algorithm was effectively a selection sort.)
Like your collect, sortByCompletion doesn't return until all the Asyncs in the list have completed. If you wanted to stream results onto the main thread as they become available, well, lists aren't a very good tool for that. I'd use a streaming abstraction like conduit or pipes, or, working at a lower level, a TQueue. See my other answer for an example.
Is it possible to create pipes that get all values that have been sent downstream in a certain time period? I'm implementing a server where the protocol allows me to concatenate outgoing packets and compress them together, so I'd like to effectively "empty out" the queue of downstream ByteStrings every 100ms and mappend them together to then yield on to the next pipe which does the compression.
Here's a solution using pipes-concurrency. You give it any Input and it will periodically drain the input of all values:
import Control.Applicative ((<|>))
import Control.Concurrent (threadDelay)
import Data.Foldable (forM_)
import Pipes
import Pipes.Concurrent
drainAll :: Input a -> STM (Maybe [a])
drainAll i = do
ma <- recv i
case ma of
Nothing -> return Nothing
Just a -> loop (a:)
where
loop diffAs = do
ma <- recv i <|> return Nothing
case ma of
Nothing -> return (Just (diffAs []))
Just a -> loop (diffAs . (a:))
bucketsEvery :: Int -> Input a -> Producer [a] IO ()
bucketsEvery microseconds i = loop
where
loop = do
lift $ threadDelay microseconds
ma <- lift $ atomically $ drainAll i
forM_ ma $ \a -> do
yield a
loop
This gives you much greater control over how you consume elements from upstream, by selecting the type of Buffer you use to build the Input.
If you're new to pipes-concurrency, you can read the tutorial which explains how to use spawn, Buffer and Input.
Here is a possible solution. It is based on a Pipe that tags ByteStrings going downstream with a Bool, in order to identify ByteStrings belonging to the same "time bucket".
First, some imports:
import Data.AdditiveGroup
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Builder as BB
import Data.Thyme.Clock
import Data.Thyme.Clock.POSIX
import Control.Monad.State.Strict
import Control.Lens (view)
import Control.Concurrent (threadDelay)
import Pipes
import Pipes.Lift
import qualified Pipes.Prelude as P
import qualified Pipes.Group as PG
Here is the tagging Pipe. It uses StateT internally:
tagger :: Pipe B.ByteString (B.ByteString,Bool) IO ()
tagger = do
startTime <- liftIO getPOSIXTime
evalStateP (startTime,False) $ forever $ do
b <- await
currentTime <- liftIO getPOSIXTime
-- (POSIXTime,Bool) inner state
(baseTime,tag) <- get
if (currentTime ^-^ baseTime > timeLimit)
then let tag' = not tag in
yield (b,tag') >> put (currentTime, tag')
else yield $ (b,tag)
where
timeLimit = fromSeconds 0.1
Then we can use functions from the pipes-group package to group ByteStrings belonging to the same "time bucket" into lazy ByteStrings:
batch :: Producer B.ByteString IO () -> Producer BL.ByteString IO ()
batch producer = PG.folds (<>) mempty BB.toLazyByteString
. PG.maps (flip for $ yield . BB.byteString . fst)
. view (PG.groupsBy $ \t1 t2-> snd t1 == snd t2)
$ producer >-> tagger
It seems to batch correctly. This program:
main :: IO ()
main = do
count <- P.length $ batch (yield "boo" >> yield "baa")
putStrLn $ show count
count <- P.length $ batch (yield "boo" >> yield "baa"
>> liftIO (threadDelay 200000) >> yield "ddd")
putStrLn $ show count
Has the output:
1
2
Notice that the contents of a "time bucket" are only yielded when the first element of the next bucket arrives. They are not yielded automatically each 100ms. This may or may not be a problem for you. It you want to yield automatically each 100ms, you would need a different solution, possibly based on pipes-concurrency.
Also, you could consider working directly with the FreeT-based "effectul lists" provided by pipes-group. That way you could start compressing the data in a "time bucket" before the bucket is full.
So unlike Daniel's answer my does not tag the data as it is produced. It just takes at least element from upstream and then continues to aggregate more values in the monoid until the time interval has passed.
This codes uses a list to aggregate, but there are better monoids to aggregate with
import Pipes
import qualified Pipes.Prelude as P
import Data.Time.Clock
import Data.Time.Calendar
import Data.Time.Format
import Data.Monoid
import Control.Monad
-- taken from pipes-rt
doubleToNomDiffTime :: Double -> NominalDiffTime
doubleToNomDiffTime x =
let d0 = ModifiedJulianDay 0
t0 = UTCTime d0 (picosecondsToDiffTime 0)
t1 = UTCTime d0 (picosecondsToDiffTime $ floor (x/1e-12))
in diffUTCTime t1 t0
-- Adapted from from pipes-parse-1.0
wrap
:: Monad m =>
Producer a m r -> Producer (Maybe a) m r
wrap p = do
p >-> P.map Just
forever $ yield Nothing
yieldAggregateOverTime
:: (Monoid y, -- monoid dependance so we can do aggregation
MonadIO m -- to beable to get the current time the
-- base monad must have access to IO
) =>
(t -> y) -- Change element from upstream to monoid
-> Double -- Time in seconds to aggregate over
-> Pipe (Maybe t) y m ()
yieldAggregateOverTime wrap period = do
t0 <- liftIO getCurrentTime
loop mempty (dtUTC `addUTCTime` t0)
where
dtUTC = doubleToNomDiffTime period
loop m ts = do
t <- liftIO getCurrentTime
v0 <- await -- await at least one element
case v0 of
Nothing -> yield m
Just v -> do
if t > ts
then do
yield (m <> wrap v)
loop mempty (dtUTC `addUTCTime` ts)
else do
loop (m <> wrap v) ts
main = do
runEffect $ wrap (each [1..]) >-> yieldAggregateOverTime (\x -> [x]) (0.0001)
>-> P.take 10 >-> P.print
Depending on cpu load you the output data will be aggregated differently. With at least on element in each chunk.
$ ghc Main.hs -O2
$ ./Main
[1,2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
$ ./Main
[1,2]
[3]
[4]
[5]
[6,7,8,9,10]
[11,12,13,14,15,16,17,18]
[19,20,21,22,23,24,25,26]
[27,28,29,30,31,32,33,34]
[35,36,37,38,39,40,41,42]
[43,44,45,46,47,48,49,50]
$ ./Main
[1,2,3,4,5,6]
[7]
[8]
[9,10,11,12,13,14,15,16,17,18,19,20]
[21,22,23,24,25,26,27,28,29,30,31,32,33]
[34,35,36,37,38,39,40,41,42,43,44]
[45,46,47,48,49,50,51,52,53,54,55]
[56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72]
[73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88]
[89,90,91,92,93,94,95,96,97,98,99,100,101,102,103]
$ ./Main
[1,2,3,4,5,6,7]
[8]
[9]
[10,11,12,13,14,15,16,17,18]
[19,20,21,22,23,24,25,26,27]
[28,29,30,31,32,33,34,35,36,37]
[38,39,40,41,42,43,44,45,46]
[47,48,49,50]
[51,52,53,54,55,56,57]
[58,59,60,61,62,63,64,65,66]
You might want to look at the source code of
pipes-rt it shows one approach to deal with time in pipes.
edit: Thanks to Daniel Díaz Carrete, adapted pipes-parse-1.0 technique to handle upstream termination. A pipes-group solution should be possible using the same technique as well.
I'm following this tutorial and looking at the test case in source code.
My code use SimplePool.hs in the source code and created the following file: (snippet)
sampleTask :: (TimeInterval, String) -> Process String
sampleTask (t, s) = sleep t >> return s
$(remotable ['sampleTask])
jobTest :: MVar (AsyncResult (Either String String)) -> Process ()
jobTest result = do
pid <- startTestPool 1 -- start the pool of workers here only one worker
job <- return $ ($(mkClosure 'sampleTask) (seconds 2, "foobar"))
-- callAsync put job into pool
p <- callAsync pid job
a <- wait p
setResult result a
where
setResult :: MVar a -> a -> Process ()
setResult mvar x = liftIO $ putMVar mvar x
startTestPool :: Int -> Process ProcessId
startTestPool s = spawnLocal $ do
_ <- runPool s
return ()
runPool :: Int -> Process (Either (InitResult (Pool String)) TerminateReason)
runPool s =
-- setting a to String
let s' = poolServer :: ProcessDefinition (Pool String)
in simplePool s s'
myRemoteTable :: RemoteTable
myRemoteTable = Control.Distributed.Process.Platform.__remoteTable initRemoteTable
main :: IO ()
main = do
Right (transport, _) <- createTransportExposeInternals
"127.0.0.1" "9901" defaultTCPParameters
localNode <- newLocalNode transport myRemoteTable
result <- newEmptyMVar
pid <- forkProcess localNode $ jobTest result
ans <- takeMVar result
putStrLn $ show pid
putStrLn $ show ans
I'm getting this error once I run it:
AsyncFailed (DiedException "exit-from=pid://127.0.0.1:9901:0:6")
Correct me if I'm wrong, I assume the job did not run correctly, must be some problem with the slave process.p <- callAsync pid job this line of code I think is where the task is passed on to slave process for execution. I looked into the library to find the definition of callAsync. The key line in callAsyncUsing is sendTo sid (CallMessage msg (Pid wpid)) where the function passes the task to the poolServer.
SimplePool.hs in the acceptTask the line asyncHandle <- async proc is where I think they spawn a new process to execute the task. So I think maybe the async process didn't finish running cause the the caller terminated prematurely? Or could it be that the process didn't spawn correctly? Any idea on what the best way to debug this? Also, can someone point me in the right direction to finding out how to make the poolSever span different nodes/different computers (using Control.Distributed.Process.Platform.Async.AsyncChan?)?
I have modified your code slightly, and this snippet includes the imports, so it compiles. Make sure that you are using the latest SimplePool module, as your code is using simplePool which I cannot find, and your use of runPool is ambiguous.
{-# LANGUAGE TemplateHaskell #-}
import Control.Concurrent.MVar
import Control.Exception (SomeException)
import Control.Distributed.Process hiding (call)
import Control.Distributed.Process.Closure
import Control.Distributed.Process.Node
import Control.Distributed.Process.Platform hiding (__remoteTable)
import Control.Distributed.Process.Platform.Async
import Control.Distributed.Process.Platform.ManagedProcess
import Control.Distributed.Process.Platform.Test
import Control.Distributed.Process.Platform.Time
import Control.Distributed.Process.Platform.Timer
import Control.Distributed.Process.Serializable()
import Network.Transport
import Network.Transport.TCP
import Data.Binary
import Data.Typeable (Typeable)
import SimplePool hiding (runPool)
import qualified SimplePool (runPool)
sampleTask :: (TimeInterval, String) -> Process String
sampleTask (t, s) = sleep t >> return s
$(remotable ['sampleTask])
jobTest :: MVar (AsyncResult (Either String String)) -> Process ()
jobTest result = do
pid <- startTestPool 1 -- start the pool of workers here only one worker
let job = $(mkClosure 'sampleTask) (seconds 2, "foobar")
-- callAsync put job into pool
p <- callAsync pid job
a <- wait p
setResult result a
where
setResult :: MVar a -> a -> Process ()
setResult mvar x = liftIO $ putMVar mvar x
startTestPool :: Int -> Process ProcessId
startTestPool s = spawnLocal $ do
_ <- runPool s
return ()
runPool :: Int -> Process (Either (InitResult (Pool String)) TerminateReason)
runPool = SimplePool.runPool
myRemoteTable :: RemoteTable
myRemoteTable = Main.__remoteTable initRemoteTable
main :: IO ()
main = do
Right (transport, _) <- createTransportExposeInternals
"127.0.0.1" "9901" defaultTCPParameters
localNode <- newLocalNode transport myRemoteTable
result <- newEmptyMVar
pid <- forkProcess localNode $ jobTest result
ans <- takeMVar result
print pid >> print ans
Running this compilable code:
$ ./Example
pid://127.0.0.1:9901:0:3
AsyncDone (Right "foobar")
Please note that the SimplePool sample module from the distributed-process-platform test suite has been promoted to a fully fledged component of the library. Its new location on the latest (development) branch is https://github.com/haskell-distributed/distributed-process-platform/blob/development/src/Control/Distributed/Process/Platform/Task/Queue/BlockingQueue.hs.
Some names/types have changed, so you may need to update your code in order to continue using it.
how could one implement a function in concurrent haskell that either returns 'a' successfully or due to timeout 'b'?
timed :: Int → IO a → b → IO (Either a b)
timed max act def = do
Best Regards,
Cetin SertNote: the signature of timed can be completely or slightly different.
Implementing your desired timed on top of System.Timeout.timeout is easy:
import System.Timeout (timeout)
timed :: Int -> IO a -> b -> IO (Either b a)
timed us act def = liftM (maybe (Left def) Right) (timeout us act)
By the way, the common implementation of timeout is closer to this: ($! = seq to try to force evaluation of the returned value in the thread rather than only returning a thunk):
import Control.Concurrent (forkIO, threadDelay, killThread)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import System.IO (hPrint, stderr)
timeout :: Int -> IO a -> IO (Maybe a)
timeout us act = do
mvar <- newEmptyMVar
tid1 <- forkIO $ (putMVar mvar . Just $!) =<< act
tid2 <- forkIO $ threadDelay us >> putMVar mvar Nothing
res <- takeMVar mvar
killThread (maybe tid1 (const tid2) res) `catch` hPrint stderr
return res
The implementation of System.Timeout.timeout in the libraries is a little more complex, handling more exceptional cases.
import Control.Concurrent (forkIO, threadDelay, myThreadId, killThread)
import Control.Exception (Exception, handleJust, throwTo, bracket)
import Data.Typeable
import Data.Unique (Unique, newUnique)
data Timeout = Timeout Unique deriving Eq
timeoutTc :: TyCon
timeoutTc = mkTyCon "Timeout"
instance Typeable Timeout where { typeOf _ = mkTyConApp timeoutTc [] }
instance Show Timeout where
show _ = "<<timeout>>"
instance Exception Timeout
timeout n f
| n < 0 = fmap Just f
| n == 0 = return Nothing
| otherwise = do
pid <- myThreadId
ex <- fmap Timeout newUnique
handleJust (\e -> if e == ex then Just () else Nothing)
(\_ -> return Nothing)
(bracket (forkIO (threadDelay n >> throwTo pid ex))
(killThread)
(\_ -> fmap Just f))
Here's the first answer I could come up with. I needed this for a port scanner. o_O Forgot the admin password of my router and wanted to check which ports I had opened on our home server before that I could now repurpose and reuse ^ _ ^" ... This implementation should do the job for the time being.
module Control.Concurrent.Timed (timed) where
import Prelude hiding (take)
import System.IO
import Control.Monad
import System.Process
import System.Timeout
import Control.Concurrent
import System.Environment
timed :: Int → IO a → b → IO (Either b a)
timed max act def = do
w ← new
r ← new
t ← forkIO $ do
a ← act
r ≔ Right a
e ← em w
case e of
False → kill =<< take w
True → return ()
s ← forkIO $ do
(w ≔) =<< mine
wait max
e ← em r
case e of
True → do
kill t
r ≔ Left def
False → return ()
take r
timed_ :: Int → IO a → a → IO a
timed_ max act def = do
r ← timed max act def
return $ case r of
Right a → a
Left a → a
(≔) = putMVar
new = newEmptyMVar
wait = threadDelay
em = isEmptyMVar
kill = killThread
mine = myThreadId
take = takeMVar
or just use System.Timeout.timeout -__-"