Collecting the Async results as they become available - multithreading

How would you collect the results of a list of Async a in Haskell as they become available? The idea is to start processing the results of asynchronous tasks as soon as they are available.
The best I could come up with is the following function:
collect :: [Async a] -> IO [a]
collect [] = return []
collect asyncs = do
(a, r) <- waitAny asyncs
rs <- collect (filter (/= a) asyncs)
return (r:rs)
However, this function does not exhibits the desired behavior since, as pointed out in the comment below, it doesn't return till all the asynchronous tasks are completed. Furthermore, collect runs in O(n^2) since I'm filtering the list at each recursive step. This could be improved by using a more efficient structure (and maybe indexing the position of the Async values in the list).
Maybe there are library functions that take care of this, but I could not find them in the Control.Concurrent.Async module and I wonder why.
EDIT: after thinking the problem a bit more carefully, I'm wondering whether such function is a good idea. I could just use fmap on the asynchronous tasks. Maybe it is a better practice to wait for the results when there is no other choice.

As I mentioned in my other answer, streaming results out of a list of Asyncs as they become available is best achieved using a stream processing library. Here's an example using pipes.
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async
import Control.Concurrent.STM
import Data.Functor (($>))
import Pipes
import Pipes.Concurrent -- from the pipes-concurrency package
import qualified Pipes.Prelude as P
asCompleted :: MonadIO m => [Async a] -> Producer a m ()
asCompleted asyncs = do
(o, i, seal) <- liftIO $ spawn' unbounded
liftIO $ forkIO $ do
forConcurrently asyncs (\async -> atomically $ waitSTM async >>= send o)
atomically seal
fromInput i
main = do
actions <- traverse async [threadDelay 2000000 $> "bar", threadDelay 1000000 $> "foo"]
runEffect $ asCompleted actions >-> P.print
-- after one second, prints "foo", then "bar" a second later
Using pipes-concurrency, we spawn' an Output-Input pair and immediately convert the Input to a Producer using fromInput. Asynchronously, we send items as they become available. When all the Asyncs have completed we seal the inbox to close down the Producer.

Implemented via TChan, additionally implemented a version which can react immediately, but it is more complex and also might have problems with exceptions (if you want to receive exceptions, use SlaveThread.fork instead of forkIO), so I commented that code in case you're not interested in it:
import Control.Concurrent (threadDelay)
import Control.Concurrent (forkIO)
import Control.Concurrent.Async
import Control.Concurrent.STM
import Control.Monad
collect :: [Async a] -> IO [a]
collect = atomically . collectSTM
collectSTM :: [Async a] -> STM [a]
collectSTM as = do
c <- newTChan
collectSTMChan c as
collectSTMChan :: TChan a -> [Async a] -> STM [a]
collectSTMChan chan as = do
mapM_ (waitSTM >=> writeTChan chan) as
replicateM (length as) (readTChan chan)
main :: IO ()
main = do
a1 <- async (threadDelay 2000000 >> putStrLn "slept 2 secs" >> return 2)
a2 <- async (threadDelay 3000000 >> putStrLn "slept 3 secs" >> return 3)
a3 <- async (threadDelay 1000000 >> putStrLn "slept 1 sec" >> return 1)
res <- collect [a1,a2,a3]
putStrLn (show res)
-- -- reacting immediately
-- a1 <- async (threadDelay 2000000 >> putStrLn "slept 2 secs" >> return 2)
-- a2 <- async (threadDelay 3000000 >> putStrLn "slept 3 secs" >> return 3)
-- a3 <- async (threadDelay 1000000 >> putStrLn "slept 1 sec" >> return 1)
-- c <- collectChan [a1,a2,a3]
-- replicateM_ 3 (atomically (readTChan c) >>= \v -> putStrLn ("Received: " ++ show v))
-- collectChan :: [Async a] -> IO (TChan a)
-- collectChan as = do
-- c <- newTChanIO
-- forM_ as $ \a -> forkIO ((atomically . (waitSTM >=> writeTChan c)) a)
-- return c

I'm reading your question as "is it possible to sort a list of Asyncs by their completion time?". If that's what you meant, the answer is yes.
import Control.Applicative (liftA2)
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async
import Data.Functor (($>))
import Data.List (sortBy)
import Data.Ord (comparing)
import Data.Time (getCurrentTime)
sortByCompletion :: [Async a] -> IO [a]
sortByCompletion = fmap (fmap fst . sortBy (comparing snd)) . mapConcurrently withCompletionTime
where withCompletionTime async = liftA2 (,) (wait async) getCurrentTime
main = do
asyncs <- traverse async [threadDelay 2000000 $> "bar", threadDelay 1000000 $> "foo"]
sortByCompletion asyncs
-- ["foo", "bar"], after two seconds
Using mapConcurrently we wait for each Async on a separate thread. Upon completion we get the current time - the time at which the Async completed - and use it to sort the results. This is O(n log n) complexity because we are sorting the list. (Your original algorithm was effectively a selection sort.)
Like your collect, sortByCompletion doesn't return until all the Asyncs in the list have completed. If you wanted to stream results onto the main thread as they become available, well, lists aren't a very good tool for that. I'd use a streaming abstraction like conduit or pipes, or, working at a lower level, a TQueue. See my other answer for an example.

Related

Using IO inside the ST monad and runSTUArray

I'm just starting to experiment with monad transformers so I may have missed something trivial. Anyway: how can I print from inside the ST monad?
Example code: I want to create code that is almost as fast as C, by reading and writing as quickly as possible to memory, with C-like for loops. I have an ST monad action that safely mutates an unboxed array that I run with runSTUArray.
My question is, how can I use IO actions inside the ST action? If it was a State monad action, I could use the StateT transformer to lift the IO actions to, but how is that done for the ST monad?
Example: ST. -- how can I print form inside the ST monad?
import Control.Monad.ST (ST)
import Data.Array.Base ( STUArray(STUArray), newArray, readArray, writeArray )
import Data.Array.ST (runSTUArray)
import Data.Array.Unboxed ( elems )
import Control.Monad (forM_)
test :: IO ()
test = print . elems $ runSTUArray action
where
action :: ST s (STUArray s Int Int)
action = do
arr <- newArray (1,10) 1
forM_ [3..10] $ \i -> do
-- liftIO . print $ "i is " ++ show i. --- <--- How should I do this?
x1 <- readArray arr (i-1)
x2 <- readArray arr (i-2)
writeArray arr i (x1+x2)
return arr
Example: StateT, -- here it is possible to lift the print to use it inside the monad/
import Data.Array.IArray (listArray, (!), (//), elems)
import Data.Array (Array)
import Control.Monad.Trans.State.Strict (StateT (runStateT), get, put)
import Control.Monad.IO.Class (MonadIO(liftIO))
import Control.Monad (forM_)
test :: IO ()
test = do
let n = listArray (1,10) [1..] :: Array Int Int
(_,x) <- runStateT action n
print $ elems x
return ()
where action = do
forM_ [3..10] $ \i -> do
x <- get
liftIO . print $ "i is " ++ show i. -- <--- here printing works fine
put (x // [(i, x!(i-1) + x!(i-2))])
You don't print from an ST action, you print from an IO action. Luckily for you, there are IOUArrays -- and they are even STUArrays under the hood, so there can be no fear of performance lost.

Haskell: function that gives next element of a list

I was wondering if it is possible in Haskell to define a function which upon calling gives the next element of an (infinite) list, so for example:
Prelude> func
1
Prelude> func
2
Is it possible to have such a function in Haskell, and if there is, can you give me an example?
You could do a Stateful thing like this:
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.State
import Data.List
import Data.Maybe
-- This is not a function! The misleading name func comes from the question text.
func :: MonadState [a] m => m a
func = state (fromJust . uncons)
exampleUsage :: State [Int] (Int, Int)
exampleUsage = do
x <- func
y <- func
return (x, y)
You can try it in ghci:
> evalState exampleUsage [1..]
(1, 2)
However, at a high level, I would suggest rethinking your requirements. func is not very idiomatic at all; simply working with the infinite list directly is generally going to be much clearer, have lower (syntactic) overhead, and lead to better generated code. For example:
exampleUsage' :: [a] -> (a, a)
exampleUsage' (x:y:_) = (x,y)
N.B. this is two lines of code with no extensions or imports, compared to the previous 11 lines of code including a language extension and three imports. Usage is also simplified; you can drop the call to evalState entirely and be done.
> exampleUsage' [1..]
(1, 2)
You can use mutable references and the IO monad (or other stateful monad). This can be made rather pretty via partial application:
Prelude> import Data.IORef
Prelude Data.IORef> ref <- newIORef 0
Prelude Data.IORef> let func = readIORef ref >>= \r -> writeIORef ref (r+1) >> return r
Prelude Data.IORef> func
0
Prelude Data.IORef> func
1
Or closer to what you requested:
Prelude Data.IORef> ref2 <- newIORef [0..]
Prelude Data.IORef> let func2 = readIORef ref2 >>= \(x:xs) -> writeIORef ref2 xs >> return x
Prelude Data.IORef> func2
0
Prelude Data.IORef> func2
1
It sounds like you are looking for something like other languages' Iterator or Generator constructs. If so, this seems like a good use case for the conduit library. Note that there are options (e.g. pipes); however, conduit may be a good starting point for you.
If you are trying to operate only over lists, using the State Monad may be a simpler answer (as Daniel suggests); however, if you are looking for a more general solution, conduit (or the like) may indeed be the answer.
The func you are searching for is therefore most likely the await function.
Here's a simple example -
import Prelude
import Conduit
import Data.MonoTraversable
main :: IO ()
main = runConduit (source .| consume) >>= print
source :: Monad m => Producer m (Element [Integer])
source = yieldMany [0..]
consume :: Monad m => ConduitM i o m (Maybe (i, i))
consume = do
mx <- await
my <- await
return $ (,) <$> mx <*> my
And its output -
λ main
Just (0,1)

Limiting pipes based on time?

Is it possible to create pipes that get all values that have been sent downstream in a certain time period? I'm implementing a server where the protocol allows me to concatenate outgoing packets and compress them together, so I'd like to effectively "empty out" the queue of downstream ByteStrings every 100ms and mappend them together to then yield on to the next pipe which does the compression.
Here's a solution using pipes-concurrency. You give it any Input and it will periodically drain the input of all values:
import Control.Applicative ((<|>))
import Control.Concurrent (threadDelay)
import Data.Foldable (forM_)
import Pipes
import Pipes.Concurrent
drainAll :: Input a -> STM (Maybe [a])
drainAll i = do
ma <- recv i
case ma of
Nothing -> return Nothing
Just a -> loop (a:)
where
loop diffAs = do
ma <- recv i <|> return Nothing
case ma of
Nothing -> return (Just (diffAs []))
Just a -> loop (diffAs . (a:))
bucketsEvery :: Int -> Input a -> Producer [a] IO ()
bucketsEvery microseconds i = loop
where
loop = do
lift $ threadDelay microseconds
ma <- lift $ atomically $ drainAll i
forM_ ma $ \a -> do
yield a
loop
This gives you much greater control over how you consume elements from upstream, by selecting the type of Buffer you use to build the Input.
If you're new to pipes-concurrency, you can read the tutorial which explains how to use spawn, Buffer and Input.
Here is a possible solution. It is based on a Pipe that tags ByteStrings going downstream with a Bool, in order to identify ByteStrings belonging to the same "time bucket".
First, some imports:
import Data.AdditiveGroup
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Builder as BB
import Data.Thyme.Clock
import Data.Thyme.Clock.POSIX
import Control.Monad.State.Strict
import Control.Lens (view)
import Control.Concurrent (threadDelay)
import Pipes
import Pipes.Lift
import qualified Pipes.Prelude as P
import qualified Pipes.Group as PG
Here is the tagging Pipe. It uses StateT internally:
tagger :: Pipe B.ByteString (B.ByteString,Bool) IO ()
tagger = do
startTime <- liftIO getPOSIXTime
evalStateP (startTime,False) $ forever $ do
b <- await
currentTime <- liftIO getPOSIXTime
-- (POSIXTime,Bool) inner state
(baseTime,tag) <- get
if (currentTime ^-^ baseTime > timeLimit)
then let tag' = not tag in
yield (b,tag') >> put (currentTime, tag')
else yield $ (b,tag)
where
timeLimit = fromSeconds 0.1
Then we can use functions from the pipes-group package to group ByteStrings belonging to the same "time bucket" into lazy ByteStrings:
batch :: Producer B.ByteString IO () -> Producer BL.ByteString IO ()
batch producer = PG.folds (<>) mempty BB.toLazyByteString
. PG.maps (flip for $ yield . BB.byteString . fst)
. view (PG.groupsBy $ \t1 t2-> snd t1 == snd t2)
$ producer >-> tagger
It seems to batch correctly. This program:
main :: IO ()
main = do
count <- P.length $ batch (yield "boo" >> yield "baa")
putStrLn $ show count
count <- P.length $ batch (yield "boo" >> yield "baa"
>> liftIO (threadDelay 200000) >> yield "ddd")
putStrLn $ show count
Has the output:
1
2
Notice that the contents of a "time bucket" are only yielded when the first element of the next bucket arrives. They are not yielded automatically each 100ms. This may or may not be a problem for you. It you want to yield automatically each 100ms, you would need a different solution, possibly based on pipes-concurrency.
Also, you could consider working directly with the FreeT-based "effectul lists" provided by pipes-group. That way you could start compressing the data in a "time bucket" before the bucket is full.
So unlike Daniel's answer my does not tag the data as it is produced. It just takes at least element from upstream and then continues to aggregate more values in the monoid until the time interval has passed.
This codes uses a list to aggregate, but there are better monoids to aggregate with
import Pipes
import qualified Pipes.Prelude as P
import Data.Time.Clock
import Data.Time.Calendar
import Data.Time.Format
import Data.Monoid
import Control.Monad
-- taken from pipes-rt
doubleToNomDiffTime :: Double -> NominalDiffTime
doubleToNomDiffTime x =
let d0 = ModifiedJulianDay 0
t0 = UTCTime d0 (picosecondsToDiffTime 0)
t1 = UTCTime d0 (picosecondsToDiffTime $ floor (x/1e-12))
in diffUTCTime t1 t0
-- Adapted from from pipes-parse-1.0
wrap
:: Monad m =>
Producer a m r -> Producer (Maybe a) m r
wrap p = do
p >-> P.map Just
forever $ yield Nothing
yieldAggregateOverTime
:: (Monoid y, -- monoid dependance so we can do aggregation
MonadIO m -- to beable to get the current time the
-- base monad must have access to IO
) =>
(t -> y) -- Change element from upstream to monoid
-> Double -- Time in seconds to aggregate over
-> Pipe (Maybe t) y m ()
yieldAggregateOverTime wrap period = do
t0 <- liftIO getCurrentTime
loop mempty (dtUTC `addUTCTime` t0)
where
dtUTC = doubleToNomDiffTime period
loop m ts = do
t <- liftIO getCurrentTime
v0 <- await -- await at least one element
case v0 of
Nothing -> yield m
Just v -> do
if t > ts
then do
yield (m <> wrap v)
loop mempty (dtUTC `addUTCTime` ts)
else do
loop (m <> wrap v) ts
main = do
runEffect $ wrap (each [1..]) >-> yieldAggregateOverTime (\x -> [x]) (0.0001)
>-> P.take 10 >-> P.print
Depending on cpu load you the output data will be aggregated differently. With at least on element in each chunk.
$ ghc Main.hs -O2
$ ./Main
[1,2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
$ ./Main
[1,2]
[3]
[4]
[5]
[6,7,8,9,10]
[11,12,13,14,15,16,17,18]
[19,20,21,22,23,24,25,26]
[27,28,29,30,31,32,33,34]
[35,36,37,38,39,40,41,42]
[43,44,45,46,47,48,49,50]
$ ./Main
[1,2,3,4,5,6]
[7]
[8]
[9,10,11,12,13,14,15,16,17,18,19,20]
[21,22,23,24,25,26,27,28,29,30,31,32,33]
[34,35,36,37,38,39,40,41,42,43,44]
[45,46,47,48,49,50,51,52,53,54,55]
[56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72]
[73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88]
[89,90,91,92,93,94,95,96,97,98,99,100,101,102,103]
$ ./Main
[1,2,3,4,5,6,7]
[8]
[9]
[10,11,12,13,14,15,16,17,18]
[19,20,21,22,23,24,25,26,27]
[28,29,30,31,32,33,34,35,36,37]
[38,39,40,41,42,43,44,45,46]
[47,48,49,50]
[51,52,53,54,55,56,57]
[58,59,60,61,62,63,64,65,66]
You might want to look at the source code of
pipes-rt it shows one approach to deal with time in pipes.
edit: Thanks to Daniel Díaz Carrete, adapted pipes-parse-1.0 technique to handle upstream termination. A pipes-group solution should be possible using the same technique as well.

Haskell - Actor based mutability

I'm working on a haskell network application and I use the actor pattern to manage multithreading. One thing I came across is how to store for example a set of client sockets/handles. Which of course must be accessible for all threads and can change when clients log on/off.
Since I'm coming from the imperative world I thought about some kind of lock-mechanism but when I noticed how ugly this is I thought about "pure" mutability, well actually it's kind of pure:
import Control.Concurrent
import Control.Monad
import Network
import System.IO
import Data.List
import Data.Maybe
import System.Environment
import Control.Exception
newStorage :: (Eq a, Show a) => IO (Chan (String, Maybe (Chan [a]), Maybe a))
newStorage = do
q <- newChan
forkIO $ storage [] q
return q
newHandleStorage :: IO (Chan (String, Maybe (Chan [Handle]), Maybe Handle))
newHandleStorage = newStorage
storage :: (Eq a, Show a) => [a] -> Chan (String, Maybe (Chan [a]), Maybe a) -> IO ()
storage s q = do
let loop = (`storage` q)
(req, reply, d) <- readChan q
print ("processing " ++ show(d))
case req of
"add" -> loop ((fromJust d) : s)
"remove" -> loop (delete (fromJust d) s)
"get" -> do
writeChan (fromJust reply) s
loop s
store s d = writeChan s ("add", Nothing, Just d)
unstore s d = writeChan s ("remove", Nothing, Just d)
request s = do
chan <- newChan
writeChan s ("get", Just chan, Nothing)
readChan chan
The point is that a thread (actor) is managing a list of items and modifies the list according to incoming requests. Since thread are really cheap I thought this could be a really nice functional alternative.
Of course this is just a prototype (a quick dirty proof of concept).
So my question is:
Is this a "good" way of managing shared mutable variables (in the actor world) ?
Is there already a library for this pattern ? (I already searched but I found nothing)
Regards,
Chris
Here is a quick and dirty example using stm and pipes-network. This will set up a simple server that allows clients to connect and increment or decrement a counter. It will display a very simple status bar showing the current tallies of all connected clients and will remove client tallies from the bar when they disconnect.
First I will begin with the server, and I've generously commented the code to explain how it works:
import Control.Concurrent.STM (STM, atomically)
import Control.Concurrent.STM.TVar
import qualified Data.HashMap.Strict as H
import Data.Foldable (forM_)
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (unless)
import Control.Monad.Trans.State.Strict
import qualified Data.ByteString.Char8 as B
import Control.Proxy
import Control.Proxy.TCP
import System.IO
main = do
hSetBuffering stdout NoBuffering
{- These are the internal data structures. They should be an implementation
detail and you should never expose these references to the
"business logic" part of the application. -}
-- I use nRef to keep track of creating fresh Ints (which identify users)
nRef <- newTVarIO 0 :: IO (TVar Int)
{- hMap associates every user (i.e. Int) with a counter
Notice how I've "striped" the hash map by storing STM references to the
values instead of storing the values directly. This means that I only
actually write the hashmap when adding or removing users, which reduces
contention for the hash map.
Since each user gets their own unique STM reference for their counter,
modifying counters does not cause contention with other counters or
contention with the hash map. -}
hMap <- newTVarIO H.empty :: IO (TVar (H.HashMap Int (TVar Int)))
{- The following code makes heavy use of Haskell's pure closures. Each
'let' binding closes over its current environment, which is safe since
Haskell is pure. -}
let {- 'getCounters' is the only server-facing command in our STM API. The
only permitted operation is retrieving the current set of user
counters.
'getCounters' closes over the 'hMap' reference currently in scope so
that the server never needs to be aware about our internal
implementation. -}
getCounters :: STM [Int]
getCounters = do
refs <- fmap H.elems (readTVar hMap)
mapM readTVar refs
{- 'init' is the only client-facing command in our STM API. It
initializes the client's entry in the hash map and returns two
commands: the first command is what the client calls to 'increment'
their counter and the second command is what the client calls to log
off and delete
'delete' command.
Notice that those two returned commands each close over the client's
unique STM reference so the client never needs to be aware of how
exactly 'init' is implemented under the hood. -}
init :: STM (STM (), STM ())
init = do
n <- readTVar nRef
writeTVar nRef $! n + 1
ref <- newTVar 0
modifyTVar' hMap (H.insert n ref)
let incrementRef :: STM ()
incrementRef = do
mRef <- fmap (H.lookup n) (readTVar hMap)
forM_ mRef $ \ref -> modifyTVar' ref (+ 1)
deleteRef :: STM ()
deleteRef = modifyTVar' hMap (H.delete n)
return (incrementRef, deleteRef)
{- Now for the actual program logic. Everything past this point only uses
the approved STM API (i.e. 'getCounters' and 'init'). If I wanted I
could factor the above approved STM API into a separate module to enforce
the encapsulation boundary, but I am lazy. -}
{- Fork a thread which polls the current state of the counters and displays
it to the console. There is a way to implement this without polling but
this gets the job done for now.
Most of what it is doing is just some simple tricks to reuse the same
console line instead of outputting a stream of lines. Otherwise it
would be just:
forkIO $ forever $ do
ns <- atomically getCounters
print ns
-}
forkIO $ (`evalStateT` 0) $ forever $ do
del <- get
lift $ do
putStr (replicate del '\b')
putStr (replicate del ' ' )
putStr (replicate del '\b')
ns <- lift $ atomically getCounters
let str = show ns
lift $ putStr str
put $! length str
lift $ threadDelay 10000
{- Fork a thread for each incoming connection, which listens to the client's
commands and translates them into 'STM' actions -}
serve HostAny "8080" $ \(socket, _) -> do
(increment, delete) <- atomically init
{- Right now, just do the dumb thing and convert all keypresses into
increment commands, with the exception of the 'q' key, which will
quit -}
let handler :: (Proxy p) => () -> Consumer p Char IO ()
handler () = runIdentityP loop
where
loop = do
c <- request ()
unless (c == 'q') $ do
lift $ atomically increment
loop
{- This uses my 'pipes' library. It basically is a high-level way to
say:
* Read binary packets from the socket no bigger than 4096 bytes
* Get the first character from each packet and discard the rest
* Handle the character using the above 'handler' function -}
runProxy $ socketReadS 4096 socket >-> mapD B.head >-> handler
{- The above pipeline finishes either when the socket closes or
'handler' stops looping because it received a 'q'. Either case means
that the client is done so we log them out using 'delete'. -}
atomically delete
Next up is the client, which simply opens a connections and forwards all key presses as single packets:
import Control.Monad
import Control.Proxy
import Control.Proxy.Safe
import Control.Proxy.TCP.Safe
import Data.ByteString.Char8 (pack)
import System.IO
main = do
hSetBuffering stdin NoBuffering
hSetEcho stdin False
{- Again, this uses my 'pipes' library. It basically says:
* Read characters from the console using 'commands'
* Pack them into a binary format
* send them to a server running at 127.0.0.1:8080
This finishes looping when the user types a 'q' or the connection is
closed for whatever reason.
-}
runSafeIO $ runProxy $ runEitherK $
try . commands
>-> mapD (\c -> pack [c])
>-> connectWriteD Nothing "127.0.0.1" "8080"
commands :: (Proxy p) => () -> Producer p Char IO ()
commands () = runIdentityP loop
where
loop = do
c <- lift getChar
respond c
unless (c == 'q') loop
It's pretty simple: commands generates a stream of Chars, which then get converted to ByteStrings and then sent as packets to the server.
If you run the server and a few clients and have them each type in a few keys, your server display will output a list showing how many keys each client typed:
[1,6,4]
... and if some of the clients disconnect they will be removed from the list:
[1,4]
Note that the pipes component of these examples will simplify greatly in the upcoming pipes-4.0.0 release, but the current pipes ecosystem still gets the job done as is.
First, I'd definitely recommend using your own specific data type for representing commands. When using (String, Maybe (Chan [a]), Maybe a) a buggy client can crash your actor simply by sending an unknown command or by sending ("add", Nothing, Nothing), etc. I'd suggest something like
data Command a = Add a | Remove a | Get (Chan [a])
Then you can pattern match on commands in storage in a save way.
Actors have their advantages, but also I feel that they have some drawbacks. For example, getting an answer from an actor requires sending it a command and then waiting for a reply. And the client can't be completely sure that it gets a reply and that the reply will be of some specific type - you can't say I want only answers of this type (and how many of them) for this particular command.
So as an example I'll give a simple, STM solution. It'd be better to use a hash table or a (balanced tree) set, but since Handle implements neither Ord nor Hashable, we can't use these data structures, so I'll keep using lists.
module ThreadSet (
TSet, add, remove, get
) where
import Control.Monad
import Control.Monad.STM
import Control.Concurrent.STM.TVar
import Data.List (delete)
newtype TSet a = TSet (TVar [a])
add :: (Eq a) => a -> TSet a -> STM ()
add x (TSet v) = readTVar v >>= writeTVar v . (x :)
remove :: (Eq a) => a -> TSet a -> STM ()
remove x (TSet v) = readTVar v >>= writeTVar v . delete x
get :: (Eq a) => TSet a -> STM [a]
get (TSet v) = readTVar v
This module implements a STM based set of arbitrary elements. You can have multiple such sets and use them together in a single STM transaction that succeeds or fails at once. For example
-- | Ensures that there is exactly one element `x` in the set.
add1 :: (Eq a) => a -> TSet a -> STM ()
add1 x v = remove x v >> add x v
This would be difficult with actors, you'd have to add it as another command for the actor, you can't compose it of existing actions and still have atomicity.
Update: There is an interesting article explaining why Clojure designers chose not to use actors. For example, using actors, even if you have many reads and only very little writes to a mutable structure, they're all serialized, which can greatly impact performance.

Pipes and callbacks in Haskell

I'm processing some audio using portaudio. The haskell FFI bindings call a user defined callback whenever there's audio data to be processed. This callback should be handled very quickly and ideally with no I/O. I wanted to save the audio input and return quickly since my application doesn't need to react to the audio in realtime (right now I'm just saving the audio data to a file; later I'll construct a simple speech recognition system).
I like the idea of pipes and thought I could use that library. The problem is that I don't know how to create a Producer that returns data that came in through a callback.
How do I handle my use case?
Here's what I'm working with right now, in case that helps (the datum mvar isn't working right now but I don't like storing all the data in a seq... I'd rather process it as it came instead of just at the end):
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
module Main where
import Codec.Wav
import Sound.PortAudio
import Sound.PortAudio.Base
import Sound.PortAudio.Buffer
import Foreign.Ptr
import Foreign.ForeignPtr
import Foreign.C.Types
import Foreign.Storable
import qualified Data.StorableVector as SV
import qualified Data.StorableVector.Base as SVB
import Control.Exception.Base (evaluate)
import Data.Int
import Data.Sequence as Seq
import Control.Concurrent
instance Buffer SV.Vector a where
fromForeignPtr fp = return . SVB.fromForeignPtr fp
toForeignPtr = return . (\(a, b, c) -> (a, c)) . SVB.toForeignPtr
-- | Wrap a buffer callback into the generic stream callback type.
buffCBtoRawCB' :: (StreamFormat input, StreamFormat output, Buffer a input, Buffer b output) =>
BuffStreamCallback input output a b -> StreamCallback input output
buffCBtoRawCB' func = \a b c d e -> do
fpA <- newForeignPtr_ d -- We will not free, as callback system will do that for us
fpB <- newForeignPtr_ e -- We will not free, as callback system will do that for us
storeInp <- fromForeignPtr fpA (fromIntegral $ 1 * c)
storeOut <- fromForeignPtr fpB (fromIntegral $ 0 * c)
func a b c storeInp storeOut
callback :: MVar (Seq.Seq [Int32]) -> PaStreamCallbackTimeInfo -> [StreamCallbackFlag] -> CULong
-> SV.Vector Int32 -> SV.Vector Int32 -> IO StreamResult
callback seqmvar = \timeinfo flags numsamples input output -> do
putStrLn $ "timeinfo: " ++ show timeinfo ++ "; flags are " ++ show flags ++ " in callback with " ++ show numsamples ++ " samples."
print input
-- write data to output
--mapM_ (uncurry $ pokeElemOff output) $ zip (map fromIntegral [0..(numsamples-1)]) datum
--print "wrote data"
input' <- evaluate $ SV.unpack input
modifyMVar_ seqmvar (\s -> return $ s Seq.|> input')
case flags of
[] -> return $ if unPaTime (outputBufferDacTime timeinfo) > 0.2 then Complete else Continue
_ -> return Complete
done doneMVar = do
putStrLn "total done dood!"
putMVar doneMVar True
return ()
main = do
let samplerate = 16000
Nothing <- initialize
print "initialized"
m <- newEmptyMVar
datum <- newMVar Seq.empty
Right s <- openDefaultStream 1 0 samplerate Nothing (Just $ buffCBtoRawCB' (callback datum)) (Just $ done m)
startStream s
_ <- takeMVar m -- wait until our callbacks decide they are done!
Nothing <- terminate
print "let's see what we've recorded..."
stuff <- takeMVar datum
print stuff
-- write out wav file
-- let datum =
-- audio = Audio { sampleRate = samplerate
-- , channelNumber = 1
-- , sampleData = datum
-- }
-- exportFile "foo.wav" audio
print "main done"
The simplest solution is to use MVars to communicate between the callback and Producer. Here's how:
import Control.Proxy
import Control.Concurrent.MVar
fromMVar :: (Proxy p) => MVar (Maybe a) -> () -> Producer p a IO ()
fromMVar mvar () = runIdentityP loop where
loop = do
ma <- lift $ takeMVar mvar
case ma of
Nothing -> return ()
Just a -> do
respond a
loop
Your stream callback will write Just input to the MVar and your finalization callback will write Nothing to terminate the Producer.
Here's a ghci example demonstrating how it works:
>>> mvar <- newEmptyMVar :: IO (MVar (Maybe Int))
>>> forkIO $ runProxy $ fromMVar mvar >-> printD
>>> putMVar mvar (Just 1)
1
>>> putMVar mvar (Just 2)
2
>>> putMVar mvar Nothing
>>> putMVar mvar (Just 3)
>>>
Edit: The pipes-concurrency library now provides this feature, and it even has a section in the tutorial explaining specifically how to use it to get data out of callbacks.

Resources