Basic threading, interleaving threads confusion? - multithreading

I am slightly confused by the interleave function given in this guide.
I have the following data type:
data M m r = Atomic (m (M m r)) | Done r
I've created a lifting function to take a m a, inserts the a inside Done, and then re-inserts Done a back into m (). This forms the Atomic structure:
atm :: Monad m => m a -> M m a
atm m = Atomic $ liftM Done m
I've made M m an instance of the Monad class (which pattern matches based on data constructor):
instance (Monad m) => Monad (M m) where
return = Done
(Atomic m) >>= f = Atomic liftM (>>= f) m
(Done r) >>= f = f v
There's a simple implementation function which accesses nested values within the Atomic wrapper:
runThread :: Monad m => M m a -> m a
runThread (Atomic m) = m >>= runThread --Extract m and recursively pass to runThread
runThread (Done r) = return r --Return Done
Then, there is the following interleave function:
interleave :: Monad m => M m r -> M m r -> M m r
interleave (Atomic m1) (Atomic m2) = do
next1 <- atm m1 --?
next2 <- atm m2 --?
interleave next1 next2
interleave (Done _) t2 = interleave t2
interleave t1 (Done _) = interleave t1
My points of confusion are at next1 <- atm m1 and next2 <- atm m2.
As I understand it, all this is doing is taking m1 from the (Atomic m1) wrapper and reinserting it back into an Atomic wrapper? How is this operation interleaving?
Am I missing something basic here? The code works fine, so I'm sure it's due to my confusion.
The rest of the code:
threadOne :: M IO ()
threadOne = do
atm $ print 1
threadTwo :: M IO ()
threadTwo = do
atm $ print 2
main = do
runThread (interleave threadOne threadTwo)

You are partially right. The code
next1 <- atm m1
takes the atomic action that the first thread starts with and inserts it into the merged/interleaved thread. What's returned from the call to atm is the continuation next1 of that first thread.
But next, we are taking an action from the second thread, by saying
next2 <- atm m2
So the merged thread ends up executing an action from the first thread, and then one from the second thread. Hence "interleaving". We then continue by calling interleave recursively on the two continuations via
interleave next1 next2

Related

Threading continuations confusion?

I have a simple continuation type (similar to the Free monad):
data C m a = C {runC :: (a -> m a) -> m a}
toBeContinued :: Int -> C Maybe Int
toBeContinued i = C $ \todo -> todo i
continuing :: Int -> Maybe Int
continuing i = Just 100
I'm then looking to create two interleaved threads of these suspended continuations, all of which to be executed using the same continuing function:
data PoorMans m a = Atomic (m (PoorMans m a)) | Done a
instance Monad m => Monad (PoorMans m) where
(>>=) (Atomic m) f = Atomic $ (liftM (>>= f) m)
(>>=) (Done m) f = f m
return = Done
atomic :: Monad m => m a -> PoorMans m a
atomic m = Atomic $ liftM Done m
runThread :: Monad m => m a -> PoorMans m a
runThread (Atomic m) = m >>= runThread
runThread (Done m) = return m
interleave :: Monad m => PoorMans m a -> PoorMans m a -> PoorMans m a
interleave (Atomic m1) (Atomic m2) = do
n1 <- m1
n2 <- m2
interleave n1 n2
interleave (Done _) m2 = m2
interleave m1 (Done _) = m1
I'm now trying to create two threads of suspended operations, which I can then pass to interleave then into runThread with a continuation operation to be applied to all suspended operations:
createSuspendedOperations :: Int -> PoorMans (C Maybe) Int
createSuspendedOperations i = atomic $ toBeContinued 12
Error:
No instance for (Monad (C Maybe))
arising from a use of ‘atomic’
I understand that I must make C an instance of Monad as atomic is bound by Monad m =>:
instance Monad m => Monad (C m) where
However here I am stuck as to how to define the behaviors of >>= and return for C m a?
There's no way this is a monad, because a is an invariant parameter of C m a. (Here is an FPComplete post about variance.)
You might want to take a look at the "continuation monad" in transformers.

Using Poor Mans Concurrency with continuations/free monads?

I'm wanting to create an interleaved thread with suspended functions, which I can pass a single function at the time of invocation.
Basic continuation type:
data ContT m a = ContT {runContT :: (a -> m a) -> m a}
suspendedFunction :: Int -> ContT Maybe Int
suspendedFunction i = ContT $ \todo -> todo i
calculation :: Int -> Maybe Int --(a -> m a)
calculation i = Just 100
Continuations with Poor Mans Concurrency:
data C m a = Atomic (m (C m a)) | Done a
instance Monad m => Monad (C m) where
(>>=) (Atomic m) f = Atomic $ (liftM (>>= f) m)
(>>=) (Done a) f = f a
return = Done
atomic :: Monad m => m a -> C m a
atomic m = Atomic $ liftM Done m
interleave :: Monad m => C m a -> C m a -> C m a
interleave (Atomic m1) (Atomic m2) = do
n1 <- atom $ m1
n2 <- atom $ m2
interleave n1 n2
interleave m1 (Done _) = m1
interleave (Done _) m2 = m2
runInterleavedThread :: Monad m => C m a -> m a
runInterleavedThread m = m >>= runInterleaved
createSuspendedThread :: Int -> C (ContT Maybe) Int
createSuspendedThread i = atomic $ suspendedFunction i
main = do
let inter = interleave (createSuspendedThread 1) (createSuspendedThread 2)
let complete = runContT $ runThreads inter $ calculation
Error:
"No instance for (Monad (ContT Maybe))
arising from a use of ‘atomic’"
Am I missing something basic here, can anyone shed some light?

How do `pass` and `listen` work in WriterT?

The code below probably isn't a good way to do this, but it's what I've managed to cobble together. Basically, I run a series of complex tasks, during which several things get logged. At the end of each one I dump the log into a .txt file and move on to the next batch in a loop.
To achieve this I make use of listen and pass in WriterT (as part of RWST). The code is below:
-- Miscelaneous stuff
newtype Log = Log [String]
type ConnectT a = EitherT String (RWST ConnectReader Log ConnectState IO) a
timeStampLog :: String -> Log
timeStampLog msg = do
theTime <- liftIO $ fmap zonedTimeToLocalTime getZonedTime
let msgStart = show theTime ++ ": "
tell $ Log [msgStart ++ msg]
logToFileIO :: Log -> IO ()
logToFileIO (Log xs) = appendFile "Log.txt" $ "\r\n" ++ intercalate "\r\n" (reverse xs)
---------------------
logToFile :: ConnectT a -> ConnectT ()
logToFile cta = let ctaw = listen cta
in pass $ do
(_,w) <- ctaw
liftIO $ logToFileIO w
return ((),const mempty)
mapFunction :: (Show a) => a -> ConnectT ()
mapFunction a = logToFile $ do
timeStampLog $ "Starting sequence for " ++ show a
lotsOfLogging a
timeStampLog $ "Finishing sequence for " ++ show a
loopFunction :: ConnectT ()
loopFunction = logToFile $ do
timeStampLog "Starting Loop"
mapM_ mapFunction someList
timeStampLog "Finishing Loop"
What I end up with is something like this:
2015-03-17 20:21:40.8198823: Starting sequence for a
2015-03-17 20:21:41.8198823: (logs for a)
2015-03-17 20:21:41.8198823: Finishing sequence for a
2015-03-17 20:21:41.8198823: Starting sequence for b
2015-03-17 20:21:42.8198823: (logs for b)
2015-03-17 20:21:42.8198823: Finishing sequence for b
2015-03-17 20:21:39.8198823: Starting Loop
2015-03-17 20:21:42.8198823: Finishing Loop
Where the log entry for starting/finishing the loop end up together at the end.
I'm not entirely surprised that the call to logToFile in mapFunction doesn't include the log information from the loopFunction as the information hasn't passed to it via a bind.
But I'm still having trouble understanding how pass and listen work. And also how I would go about fixing this (admittedly minor) issue.
We can determine how listen and pass work almost entirely from their types. We'll start with listen.
listen
listen :: (Monoid w, Monad m) => RWST r w s m a -> RWST r w s m (a, w)
Unwrapping the RWST we have
listen :: (Monoid w, Monad m) => (r -> s -> m (a, s, w)) -> r -> s -> m ((a, w), s, w)
It needs to return an m .... The only way we have to make ms is to return something or apply the input function to an rand an s (we can't use >>= since it requires we already have an m). We don't have an a to return so we have to apply the function to an r and s. There's only one r and s we can use, those passed into the result.
listen k r s = ... (k r s)
Now we have an m (a, s, w) but need an m ((a, w), s, w). We can run the action again to get another m (nonsense for "listening") or do something with the (a, s, w) inside the m with >>=.
listen k r s = k r s >>= \(a, s' w) -> ...
To use bind we need an m. We can either return something or apply the input function to an r and s and repeat the action again, which is nonsense for "listening". We return something.
listen k r s = k r s >>= \(a, s', w) -> return ...
We need an a, a w, an s, and another w. We only have one a and no way to get any others.
listen k r s = k r s >>= \(a, s', w) -> return ((a,...),...,...)
There are 3 ways we can get a w: mempty, the w from the result of the action, or combining two ws together with <>. Returning mempty is pointless; the user could have just used mempty themselves. Duplicating what was logged with <> is as much nonsense as running an action twice, so we return what was logged by the first action.
listen k r s = k r s >>= \(a, s', w) -> return ((a,w),...,...)
We have two s es: s and s'. Reverting the state changes of the action is nonsense for "listening", so we return the changed state s'.
listen k r s = k r s >>= \(a, s', w) -> return ((a,w),s',...)
Now we are faced with the only interesting choice: what w should we keep for what was logged? The user has "listened" for what was logged; we could say that it's their problem now and reset the log to mempty. But "listening" doesn't suggest that it should change what something does, it should only observe it. Therefore, we keep the resulting log w intact.
listen k r s = k r s >>= \(a, s', w) -> return ((a,w),s',w)
If we wrap this in its RWSTs again we have
listen m = RWST \r s -> (runRWST m) r s >>= \(a, s', w) -> return ((a,w),s',w)
All we did was run the input action and include what it logged along with its resulting a in the result as a tuple. This matches the documentation for listen:
listen m is an action that executes the action m and adds its output to the value of the computation.
runRWST (listen m) r s = liftM (\ (a, w) -> ((a, w), w)) (runRWST m r s)
tell
pass :: (Monoid w, Monad m) => RWST r w s m (a, w -> w) -> RWST r w s m a
We begin as before, unwrapping the RWST
pass :: (Monoid w, Monad m) => (r -> s -> m ((a, w->w), s, w)) -> r -> s -> m (a, s, w)
We follow the same argument for how to get a resulting m as we used for listen
pass k r s = ... (k r s)
Now we have an m ((a, w->w), s, w)) but need an m (a, s, w). We can run the action again to get another m (nonsense for "passing") or do something with the ((a, w->w), s, w) inside the m with >>=.
pass k r s = k r s >>= \((a, f), s', w) -> ...
To use bind we need an m. We can either return something or apply the input function to an r and s and repeat the action again, which is nonsense for "passing". We return something.
pass k r s = k r s >>= \((a, f), s', w) -> return ...
We need an a, an s, and a w. We only have one a and no way to get any others.
pass k r s = k r s >>= \((a, f), s', w) -> return (a,...,...)
We have two s es: s and s'. Reverting the state changes of the action is nonsense for "passing", so we return the changed state s'.
pass k r s = k r s >>= \((a, f), s', w) -> return (a,s',...)
There are 4 ways we can get a w: mempty, the w from the result of the action, combining two ws together with <>, or applying the function f to another w. Setting the result to mempty leaves us wondering why the user provided a function f :: w -> w. themselves. Duplicating what was logged with <> is as much nonsense as running an action twice. We should be applying the function f to something.
pass k r s = k r s >>= \((a, f), s', w) -> return (a,s',f ...)
We could apply f to something built from memptys and <>, but if that were the case all of the fs would be equivalent to const ...; the type for it might as well have been a w. We could apply f to some elaborate structure built from w, mempty, <>, and f, but all of those structures could have been defined in f itself if we simply pass it w.
pass k r s = k r s >>= \((a, f), s', w) -> return (a,s',f w)
If we wrap this in its RWSTs again we have
pass m = RWST \r s -> (runRWST k) r s >>= \((a, f), s', w) -> return (a,s',f w)
We ran the input action and changed what was logged by the function that was a result of the action. This matches the documentation for pass:
pass m is an action that executes the action m, which returns a value and a function, and returns the value, applying the function to the output.
runRWST (pass m) r s = liftM (\ ((a, f), w) -> (a, f w)) (runRWST m r s)
The existing WriterT w m can't perform any action in the underlying m to perform logging until after the action has been run and the w has been assembled. As your question illustrates, this is confusing. The log for the do block loopFunction isn't written by logToFile until after the do block itself finishes running.
LoggerT
Let's invent a new WriterT called LoggerT. Our new LoggerT is going to provide a new function
logTells :: (Monoid w, Monoid w', Monad m) =>
(w -> LoggerT w' m ()) -> LoggerT w m a -> LoggerT w' m a
The intuition behind this is: we'll be able to provide an action (with type w -> LoggerT w' m ()) to log every tell, replacing the logged result with the result of the action. If we smash two things the user tells us together with <> we'll no longer be able to log both of them; we'll only ever be able to log the result of <>. Since our LoggerT will never be able to use <> it will never need the Monoid instances. We must drop the Monoid constraint from everything in LoggerT.
logTells :: (Monad m) =>
(w -> LoggerT w' m ()) -> LoggerT w m a -> LoggerT w' m a
We need to remember every tell so that we can replace it later. But when we replace it "later", the logging should happen at the point the tell appeared in the code. For example, if we make
processX :: LoggerT String m ()
processX = do
tell "Starting process X"
lotsOfProcessing
tell "Finishing process X"
And then "later" write logTells logToFile processX we want the resulting computation to look like the following.
logTells logToFile processX = do
logToFile "Starting process X"
lotsOfProcessing
logToFile "Finishing process X"
None of lotsOfProcessing should happen until the logToFile for tell "Starting process X" has already happened. This means that when the user tells us something we need to remember not only what we were told, but everything that happens after that. We "remember" things in the constructor for a data.
data LoggerT w m a
= Tell w (LoggerT w m a)
| ...
tell :: w -> LoggerT w m ()
tell w = Tell w (return ())
We also need to be able to perform actions in the underlying Monad. It would be tempting to add another constructor Lift (m a), but then we couldn't decide what to log as a result of the underlying computation. Instead, we'll let it decide the entire future LoggerT w m a to run.
data LoggerT w m a
= Tell w (LoggerT w m a)
| M (m (LoggerT w m a))
...
If we try to lift an underlying computation m a into LoggerT we now have a problem; we don't have a way to turn the a into a LoggerT w m a to put it in the M constructor.
instance MonadTrans (LoggerT w m) where
lift ma = M (??? ma)
We could try lifting return from the underlying Monad, but that's just a circular definition. We'll add another constructor for Returning.
data LoggerT w m a
= Tell w (LoggerT w m a)
| M (m (LoggerT w m a))
| Return a
instance MonadTrans (LoggerT w m) where
lift = M . liftM Return
To finish our monad transformer, we'll write a Monad instance.
instance Monad m => Monad (LoggerT w m) where
return = Return
la0 >>= k = go la0
where
go (Tell w la ) = Tell w (go la)
go (M mla) = M (liftM go mla)
go (Return a ) = Return a
We can now define logTells. It replaces every Tell with the action to perform to log it.
logTells :: (w -> LoggerT w' m ()) -> LoggerT w m a -> LoggerT w' m a
logTells k = go
where
go (Tell w la ) = k w >> go la
go (M mla) = M (liftM go mla)
go (Return a) = return a
Finally, we'll provide a way to get out of LoggerT by replacing all of the Tells with an action, very similar to logTells but dropping the LoggerT from the result.
Since it will get rid of the LoggerT we'll call it runLoggerT and swap the arguments to match the convention of other transformers.
runLoggerT :: LoggerT w m a -> (w -> m ()) -> m a
runLoggerT la0 k = go la0
where
go (Tell w la ) = k w >> go la
go (M mla) = liftM go mla
go (Return a) = return a
LoggerT already exists, we don't need to write it ourself. It's the Producer from the very mature pipes library.
pipes
The Producer from the pipes library is the correct logging transformer.
type Producer b = Proxy X () () b
Every Proxy has a MonadTrans (Proxy a' a b' b) instance and a Monad m => Monad (Proxy a' a b' b m) instance.
We tell it what to log with yield.
yield :: Monad m => a -> Producer' a m ()
tell = yield
When we know what we want to do with the yields, we replace them with what we want to do using for.
for :: Monad m =>
Proxy x' x b' b m a' ->
(b -> Proxy x' x c' c m b')
-> Proxy x' x c' c m a'
Specialized for Producer and (), for has the type
for :: Monad m =>
Producer b m a ->
(b -> Producer c m ()) ->
Producer c m a
logTells = flip for
If we replace each of the yields with an action in the underlying monad, we won't have anything produced anymore and can run the Proxy with runEffect.
runEffect :: Monad m => Effect m r -> m r
runEffect :: Monad m => Proxy X () () X m r -> m r
runEffect :: Monad m => Producer X m r -> m r
runLoggerT la0 k = runEffect $ for la0 (lift . k)
We can even recover the WriterT with hoist which replaces the underlying monad (every Proxy a' a b' b has an MFunctor instance).
hoist :: (Monad m, MFunctor t) => (forall a. m a -> n a) -> t m b -> t n b
We use hoist to replace the underlying monad with WriterT w m by lifting each m a into WriterT w m a. Then we replace each yield with lift . tell, and run the result.
toWriterT :: (Monad m, Monoid w) => Producer w m r -> WriterT w m r
toWriterT p0 = runEffect $ for (hoist lift p0) (lift . tell)
toWriterT p0 = runLoggerT (hoist lift p0) tell
Producer is essentially the free WriterT that doesn't require a Monoid for the items being written.
Here's a simplified, but definitely real-life example that uses censor (which is defined in terms of pass as
censor :: (MonadWriter w m) => (w -> w) -> m a -> m a
censor f m = pass $ (,f) <$> m
) to collect free variables of a lambda term:
import Control.Monad.Writer
import Data.Set (Set)
import qualified Data.Set as Set
type VarId = String
data Term = Var VarId
| Lam VarId Term
| App Term Term
freeVars :: Term -> Set VarId
freeVars = execWriter . go
where
go :: Term -> Writer (Set VarId) ()
go (Var x) = tell $ Set.singleton x
go (App f e) = go f >> go e
go (Lam x e) = censor (Set.delete x) $ go e
Now, of course you can implement this without all the Writer machinery, but remember this is just a simplified example standing in for some more involved compilation/analysis function, where tracking free variables is just one of the things going on.
The documentation is clear enough? http://hackage.haskell.org/package/mtl-2.2.1/docs/Control-Monad-Writer-Lazy.html#g:1
Examples (run the following in ghci)
import Control.Monad.Writer
runWriterT ( do (a,w) <- listen $ do { tell "foo" ; return 42 } ; tell $ reverse w ; return a )
==> (42,"foooof")
runWriterT ( pass $ do { tell "foo" ; return (42,reverse) } )
==> (42,"oof")

Single-stepping a conduit

I want to do something along the lines of ArrowChoice, but with conduits. I want to await an Either value and then pass Left values to one conduit and Right values to another, and then merge the results back into an Either stream.
Presumably this can be done by making the inner conduits like automata: turn a conduit into a function that takes an argument and returns a monadic list of outputs yielded:
newtype AutomataM i m o = Automata (i -> m (o, Automata i o))
conduitStep :: Conduit i m o -> AutomataM i m [o]
The reason for the list of outputs is that a Conduit may yield 0 or more outputs for each input.
I've looked at ResumableConduit and its relatives, and presumably the answer is in there somewhere. But I can't quite see how its done.
It's not exactly the same type signature you provided, but:
import Data.Conduit
import Data.Conduit.Internal (Pipe (..), ConduitM (..))
newtype Automata i o m r = Automata (m ([o], Either r (i -> Automata i o m r)))
conduitStep :: Monad m => ConduitM i o m r -> Automata i o m r
conduitStep (ConduitM con0) =
Automata $ go [] id con0
where
go _ front (Done r) = return (front [], Left r)
go ls front (HaveOutput p _ o) = go ls (front . (o:)) p
go ls front (NeedInput p _) =
case ls of
[] -> return (front [], Right $ conduitStep . ConduitM . p)
l:ls' -> go ls' front (p l)
go ls front (PipeM mp) = mp >>= go ls front
go ls front (Leftover p l) = go (l:ls) front p
But just be careful with this approach:
By keeping the output as a list, it's not constant memory.
We're throwing away finalizers.
There's probably a way to provide a ZipConduit abstraction, similar to ZipSource and ZipSink, that would handle this kind of problem more elegantly, but I haven't thought about it too much.
EDIT I ended up implementing ZipConduit in conduit-extra 0.1.5. Here's a demonstration of using it which sounds a bit like your case:
import Control.Applicative
import Data.Conduit
import Data.Conduit.Extra
import qualified Data.Conduit.List as CL
conduit1 :: Monad m => Conduit Int m String
conduit1 = CL.map $ \i -> "conduit1: " ++ show i
conduit2 :: Monad m => Conduit Double m String
conduit2 = CL.map $ \d -> "conduit2: " ++ show d
conduit :: Monad m => Conduit (Either Int Double) m String
conduit = getZipConduit $
ZipConduit (lefts =$= conduit1) *>
ZipConduit (rights =$= conduit2)
where
lefts = CL.mapMaybe (either Just (const Nothing))
rights = CL.mapMaybe (either (const Nothing) Just)
main :: IO ()
main = do
let src = do
yield $ Left 1
yield $ Right 2
yield $ Left 3
yield $ Right 4
sink = CL.mapM_ putStrLn
src $$ conduit =$ sink
There's a folk method of doing this using pipes by using "push-category" Pipes. The complete implementation comes from both this mailing list post and this Stack Overflow answer. I think it hasn't been released yet due to both an effort to simplify the Pipes interface, a focus on using the "sequencing" monad instance which is hidden via this method, and no proof yet that this implementation truly implements the Arrow class properly.
The idea is to implement a newtype Edge (demonstrated below) which is a push-based pipe with the type arguments in the right order for Category, Arrow, ArrowChoice and both Functor and Applicative over their output values. This lets you compose them into directed acyclic graphs using arrow notation. I'll run over the implementation below, but it's safe to just ignore it and use the Arrow/ArrowChoice/Applicative instances of Edge without too much concern.
(Edit: This code is best made available at https://github.com/Gabriel439/Haskell-RCPL-Library)
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TypeSynonymInstances #-}
import Prelude hiding ((.), id)
import Pipes.Core
import Pipes.Lift
import Control.Monad.Morph
import Control.Category
import Control.Monad.State.Strict
import Control.Arrow
This is an atypical mode of using pipes and isn't exposed in the Pipes module; you must import Pipes.Core to use push. Push-based pipes look like
-- push :: a -> Proxy a' a a' a m r
and thus they demand at least one upstream value before the Proxy is allowed to run. This means the whole process needs to be "kickstarted" by passing the first value as a function call and that the leftmost push-Proxy will control the entire stream.
Given a push-based pipe we can implement Category, Arrow and ArrowChoice. The standard solution also involves the Edge typeclass so that we have the type arguments in the right order for Category and Arrow
newtype Edge m r a b = Edge { unEdge :: a -> Pipe a b m r }
For the Category instance, we use the "push" Category which has push as id and (<~<) as composition:
instance Monad m => Category (Edge m r) where
id = Edge push
Edge a . Edge b = Edge (a <~< b)
We embed functions into Edge with arr by augmenting id (i.e. push) on the downward edge. To do this we use the respond category which has the law p />/ respond == p, but jam our f into the process.
instance Monad m => Arrow (Edge m r) where
arr f = Edge (push />/ respond . f)
We also use a local state transformer to store the snd half of our pairs and pass it "around" the input pipe in first
first (Edge p) = Edge $ \(b, d) ->
evalStateP d $ (up \>\ hoist lift . p />/ dn) b
where
up () = do
(b, d) <- request ()
lift (put d)
return b
dn c = do
d <- lift get
respond (c, d)
Finally, we get an ArrowChoice instance by implementing left. To do so we split the burden of passing the Left and Right sides using either the return or the pipe to pass values.
instance (Monad m) => ArrowChoice (Edge m r) where
left (Edge k) = Edge (bef >=> (up \>\ (k />/ dn)))
where
bef x = case x of
Left b -> return b
Right d -> do
_ <- respond (Right d)
x2 <- request ()
bef x2
up () = do
x <- request ()
bef x
dn c = respond (Left c)
We can use Edge to create "push-based" producers and consumers
type PProducer m r b = Edge m r () b
type PConsumer m r a = forall b . Edge m r a b
and then we'll provide Functor and Applicative instances for PProducer. This goes by case analysis on the underlying Pipe, so it's a bit verbose. Essentially, however, all that happens is that we insert f into the yield slot of the Pipe.
instance Functor (PProducer m r) where
fmap f (Edge k) = $ Edge $ \() -> go (k ()) where
go p = case p of
Request () ku -> Request () (\() -> go (ku ()))
-- This is the only interesting line
Respond b ku -> Respond (f b) (\() -> go (ku ()))
M m -> M (m >>= \p' -> return (go p'))
Pure r -> Pure r
Finally, Applicative is much the same except that we have to switch between running the upstream pipe to produce functions and running the downstream pipe to produce arguments.
instance (Monad m) => Applicative (Edge m r ()) where
pure b = Edge $ \() -> forever $ respond b
(Edge k1) <*> (Edge k2) = Edge (\() -> goL (k1 ()) (k2 ()))
where
goL p1 p2 = case p1 of
Request () ku -> Request () (\() -> goL (ku ()) p2)
Respond f ku -> goR f (ku ()) p2
M m -> M (m >>= \p1' -> return (goL p1' p2))
Pure r -> Pure r
goR f p1 p2 = case p2 of
Request () ku -> Request () (\() -> goR f p1 (ku ()))
Respond x ku -> Respond (f x) (\() -> goL p1 (ku ()))
M m -> M (m >>= \p2' -> return (goR f p1 p2'))
Pure r -> Pure r

Fusing conduits with multiple inputs

I am trying to create a conduit that can consume multiple input streams. I need to be able to await on one or the other of the input streams in no particular order (e.g., not alternating) making zip useless. There is nothing parallel or non-deterministic going on here: I await on one stream or the other. I want to be able to write code similar to the following (where awaitA and awaitB await on the first or second input stream respectively):
do
_ <- awaitA
x <- awaitA
y <- awaitB
yield (x,y)
_ <- awaitB
_ <- awaitB
y' <- awaitB
yield (x,y')
The best solution I have is to make the inner monad another conduit, e.g.
foo :: Sink i1 (ConduitM i2 o m) ()
Which then allows
awaitA = await
awaitB = lift await
And this mostly works. Unfortunately, this seems to make it very difficult to fuse to the inner conduit before the outer conduit is fully connected. The first thing I tried was:
fuseInner :: Monad m =>
Conduit i2' m i2 ->
Sink i1 (ConduitM i2 o m) () ->
Sink i1 (ConduitM i2' o m) ()
fuseInner x = transPipe (x =$=)
But this doesn't work, at least when x is stateful since (x =$=) is run multiple times, effectively restarting x each time.
Is there any way to write fuseInner, short of breaking into the internals of conduit (which looks like it would be pretty messy)? Is there some better way to handle multiple input streams? Am I just way to far beyond what conduit was designed for?
Thanks!
If you want to combine two IO-generated streams, then Gabriel's comment is the solution.
Otherwise, you can't wait for both streams, which one produces a value first. Conduits are single-threaded and deterministic - it processes only one pipe at a time. But you could create a function that interleaves two streams, letting them decide when to switch:
{-# OPTIONS_GHC -fwarn-incomplete-patterns #-}
import Control.Monad (liftM)
import Data.Conduit.Internal (
Pipe (..), Source, Sink,
injectLeftovers, ConduitM (..),
mapOutput, mapOutputMaybe
)
-- | Alternate two given sources, running one until it yields `Nothing`,
-- then switching to the other one.
merge :: Monad m
=> Source m (Maybe a)
-> Source m (Maybe b)
-> Source m (Either a b)
merge (ConduitM l) (ConduitM r) = ConduitM $ goL l r
where
goL :: Monad m => Pipe () () (Maybe a) () m ()
-> Pipe () () (Maybe b) () m ()
-> Pipe () () (Either a b) () m ()
goL (Leftover l ()) r = goL l r
goL (NeedInput _ c) r = goL (c ()) r
goL (PipeM mx) r = PipeM $ liftM (`goL` r) mx
goL (Done _) r = mapOutputMaybe (liftM Right) r
goL (HaveOutput c f (Just o)) r = HaveOutput (goL c r) f (Left o)
goL (HaveOutput c f Nothing) r = goR c r
-- This is just a mirror copy of goL. We should combine them together to
-- avoid code repetition.
goR :: Monad m => Pipe () () (Maybe a) () m ()
-> Pipe () () (Maybe b) () m ()
-> Pipe () () (Either a b) () m ()
goR l (Leftover r ()) = goR l r
goR l (NeedInput _ c) = goR l (c ())
goR l (PipeM mx) = PipeM $ liftM (goR l) mx
goR l (Done _) = mapOutputMaybe (liftM Left) l
goR l (HaveOutput c f (Just o)) = HaveOutput (goR l c) f (Right o)
goR l (HaveOutput c f Nothing) = goL l c
It processes one source until it returns Nothing, then switches to another, etc. If one source finishes, the other one is processed to the end.
As an example, we can combine and interleave two lists:
import Control.Monad.Trans
import Data.Conduit (($$), awaitForever)
import Data.Conduit.List (sourceList)
main = (merge (sourceList $ concatMap (\x -> [Just x, Just x, Nothing]) [ 1..10])
(sourceList $ concatMap (\x -> [Just x, Nothing]) [101..110]) )
$$ awaitForever (\x -> lift $ print x)
If you need multiple sources, merge could be adapted to something like
mergeList :: Monad m => [Source m (Maybe a)] -> Source m a
which would cycle through the given list of sources until all of them are finished.
This can be done by diving into the internals of conduit. I wanted to avoid this because it looked extremely messy. Based on the responses here, it sounds like there is no way around it (but I would really appreciate a cleaner solution).
The key difficulty is that (x =$=) is a pure function, but to make transPipe give the correct answer, it needs a kind of stateful, function-like thing:
data StatefulMorph m n = StatefulMorph
{ stepStatefulMorph :: forall a. m a -> n (StatefulMorph m n, a)
, finalizeStatefulMorph :: n () }
Stepping StatefulMorph m n takes a value in m and returns, in n, both that value and the next StatefulMorph, which should be used to transform the next m value. The last StatefulMorph should be finalized (which, in the case of the "stateful (x =$=)", finalizes the x conduit.
Conduit fusion can be implemented as a StatefulMorph, using the code for pipeL with minor changes. The signature is:
fuseStateful :: Monad m
=> Conduit a m b
-> StatefulMorph (ConduitM b c m) (ConduitM a c m)
I also need a replacement for transPipe (a special case of hoist) that uses StatefulMorph values instead of functions.
class StatefulHoist t where
statefulHoist :: (Monad m, Monad n)
=> StatefulMorph m n
-> t m r -> t n r
A StatefulHoist instance for ConduitM i o can be written using the code for transPipe with some minor changes.
fuseInner is then easy to implement.
fuseInner :: Monad m
=> Conduit a m b
-> ConduitM i o (ConduitM b c m) r
-> ConduitM i o (ConduitM a c m) r
fuseInner left = statefulHoist (fuseStateful left)
I've written a more detailed explanation here and posted the full code here. If someone can come up with a cleaner solution, or one that uses the conduit public API, please post it.
Thanks for all the suggestions and input!

Resources