Handling event streams in haskell

Handling event streams in haskell - haskell

I want to process stream of events received via the MQTT. Library which I'm using uses a callback to provide the results. Processing I'm doing depends on the previous state not only the latest event. Also in the future events might be gathered from the other sources.
At the first I decided to compose it into the list which sounds as a good idea. I had the minor issue cause IO prevents lazy evaluation and waiting for infinite stream might be long, but I solved it with interleaving IO.
stream :: IO [Event] allows me to do the nice stuff like foldl, foldM map, mapM, etc... Unfortunately with this approach I rather wont be able to combine two streams, cause there is no more locking feature there.
I was diging through many libs, and found STM with TQueue for example. Unfortunately it is not what I exactly want.
I decide to create custom type and make it Foldable so I will be able to fold it. I failed due to IO.
import Control.Concurrent.STM
newtype Stream a = Stream (STM a)
runStream
:: ((a -> IO ()) -> IO i)
-> IO (Stream a)
runStream block = do
queue <- newTQueueIO
block (atomically . writeTQueue queue)
return $ Stream (readTQueue queue)
foldStream :: (a -> b -> IO b) -> b -> Stream a -> IO b
foldStream f s (Stream read) = do
n <- atomically read
m <- f n s
foldStream f m (Stream read)
mapStream :: (a -> b) -> Stream a -> Stream b
mapStream f (Stream read) = Stream $ f <$> read
zipStream :: [Stream a] -> Stream a
zipStream = undefined
Whih can be used like main = foldStream (\x _ -> print x) () =<< events
Is it possible to implement base some of base classes to work with this stream as with regular List?

The usual trick in these cases is to make the callback write to a queue, and then read from the other end of the queue.
Using a bounded, closeable queue from the stm-chans package, we can define this function:
import Control.Concurrent.STM
import Control.Concurrent.STM.TBMQueue
foldQueue :: TBMQueue a -> (x -> a -> IO x) -> IO x -> (x -> IO b) -> IO b
foldQueue queue step start done =
let go state =
do m <- atomically (readTBMQueue queue)
case m of
Nothing -> done state
Just a -> step state a >>= go
in start >>= go
It takes the channel, a step function (similar to the one required by foldM), an action to obtain the initial state, and a "done" action that returns the final result, and then feeds data from the channel until it is closed. Notice that the fold state x is chosen by the caller of foldQueue.
If later we want to upgrade to the monadic folds from the foldl package—which have a very useful Applicative instance—we can do it like this:
import qualified Control.Foldl as L
foldQueue' :: TBMQueue a -> L.FoldM IO a b -> IO b
foldQueue' queue = L.impurely (foldQueue queue)
Using impurely from the "foldl" package.
Sometimes (like when parsing, grouping, or decoding) it's easier to use a pull-based consumer. We can do that with the streaming package:
import Streaming
import qualified Streaming.Prelude as S
foldQueue' :: TBMQueue a -> (Stream (Of a) IO () -> IO r) -> IO r
foldQueue' queue consume = consume (S.untilRight (do
m <- atomically (readTBMQueue queue)
return (case m of
Nothing -> Right ()
Just a -> Left a)))
Given a function that consumes a stream, we feed to it a stream of values read from the queue.
Often, reading from the channel and writing to it must happen in different threads. We can use functions like concurrently from async to handle it cleanly.

Related

Streaming of a resource with proper acquisition and release

This is a question about the Haskell streaming library.
Stream (Of a) m r is a "stream of individual Haskell values derived from actions in some monad m and returning a value of type r". Streaming.Prelude defines many useful functions that allow nice streaming applications:
import qualified Streaming.Prelude as S
S.print $ do
S.yield "a"
S.yield "b"
S.yield "c"
The tutorial is good for getting started.
Now, the particular issue at hand is how to use this framework with a monad that requires careful instantiation and release of resources. The streaming-with package seems to be the right candidate, it has a function
bracket :: MonadMask m => m a -> (a -> m c) -> (a -> m b) -> m b
that acquires (m a), releases (a->m c) and uses (a->m b) a resource. All three actions are encapsulated in the returned m b. withFile is a good example for how to use this:
withFile :: FilePath -> IOMode -> (Handle -> m r) -> m r
withFile fp md = bracket (liftIO (openFile fp md)) (liftIO . hClose)
Acquisition and release of the handle are nicely sandwiching the usage Handle->m r.
But: I absolutely do not see how this should be used with Stream (Of a) m r. I have to provide an a->m b and I get a m b. How is this supposed to be connected so that I obtain a Stream?
To figure this out, let's play with withFile:
import System.IO
use :: Handle -> IO (S.Stream (Of String) IO ())
use = return . S.repeatM . hGetLine
main :: IO ()
main = do
str <- S.withFile "input.dat" ReadMode use
S.print str
but that results in hGetLine: illegal operation (handle is closed). That actually makes sense, by the time S.print str is called withFile has already acquired and released the handle.
So let's move the stream consumption inside the use function:
use :: Handle -> IO ()
use h = do
S.print $ S.repeatM (hGetLine h)
and that gives a hGetLine: invalid argument (invalid byte sequence). I'm not quite sure what this error means. An isEOFError would be acceptable, but 'invalid byte sequence'? In any case, this doesn't work either.
I'm running out of ideas... How is this done?
The withFile is just a toy example, the question is really about how to correctly create and consume a stream inside a bracket.

let's move the stream consumption inside the use function
This is indeed the right approach.
I'm actually getting a proper hGetLine: end of file when running the example code. The problem is that S.repeatM (hGetLine h) never bothers to check if it has reached then end of the file, and throws an exception when it bumps into it.
The following definition of use doesn't have that problem:
use :: Handle -> IO ()
use h = do
S.print $ S.untilRight $ do eof <- System.IO.hIsEOF h
if eof then Right <$> pure ()
else Left <$> hGetLine h
It uses the untilRight function.

Haskell: Replace mapM in a monad transformer stack to achieve lazy evaluation (no space leaks)

It has already been discussed that mapM is inherently not lazy, e.g. here and here. Now I'm struggling with a variation of this problem where the mapM in question is deep inside a monad transformer stack.
Here's a function taken from a concrete, working (but space-leaking) example using LevelDB that I put on gist.github.com:
-- read keys [1..n] from db at DirName and check that the values are correct
doRead :: FilePath -> Int -> IO ()
doRead dirName n = do
success <- runResourceT $ do
db <- open dirName defaultOptions{ cacheSize= 2048 }
let check' = check db def in -- is an Int -> ResourceT IO Bool
and <$> mapM check' [1..n] -- space leak !!!
putStrLn $ if success then "OK" else "Fail"
This function reads the values corresponding to keys [1..n] and checks that they are all correct. The troublesome line inside the ResourceT IO a monad is
and <$> mapM check' [1..n]
One solution would be to use streaming libraries such as pipes, conduit, etc. But these seem rather heavy and I'm not at all sure how to use them in this situation.
Another path I looked into is ListT as suggested here. But the type signatures of ListT.fromFoldable :: [Bool]->ListT Bool and ListT.fold :: (r -> a -> m r) -> r -> t m a -> mr (where m=IO and a,r=Bool) do not match the problem at hand.
What is a 'nice' way to get rid of the space leak?
Update: Note that this problem has nothing to do with monad transformer stacks! Here's a summary of the proposed solutions:
1) Using Streaming:
import Streaming
import qualified Streaming.Prelude as S
S.all_ id (S.mapM check' (S.each [1..n]))
2) Using Control.Monad.foldM:
foldM (\a i-> do {b<-check' i; return $! a && b}) True [1..n]
3) Using Control.Monad.Loops.allM
allM check' [1..n]

I know you mention you don't want to use streaming libraries, but your problem seems pretty easy to solve with streaming without changing the code too much.
import Streaming
import qualified Streaming.Prelude as S
We use each [1..n] instead of [1..n] to get a stream of elements:
each :: (Monad m, Foldable f) => f a -> Stream (Of a) m ()
Stream the elements of a pure, foldable container.
(We could also write something like S.take n $ S.enumFrom 1).
We use S.mapM check' instead of mapM check':
mapM :: Monad m => (a -> m b) -> Stream (Of a) m r -> Stream (Of b) m r
Replace each element of a stream with the result of a monadic action
And then we fold the stream of booleans with S.all_ id:
all_ :: Monad m => (a -> Bool) -> Stream (Of a) m r -> m Bool
Putting it all together:
S.all_ id (S.mapM check' (S.each [1..n]))
Not too different from the code you started with, and without the need for any new operator.

I think what you need is allM from the monad-loops package.
Then it would be just allM check' [1..n]
(Or if you don't want the import it's a pretty small function to copy.)

How to merge two Consumer into one in Haskell Pipes?

I use Haskell stream processing library pipes to write a command line tool. Each command line actions may output result to stdout and logs to stderr with pipes API.
I need Consumer which has type as Consumer (Either String String) m r to print chunk of data (Left to stderr, Right to stdout) with single Consumer.
Code I wrote (should be improved)
This function consumeEither doesn't have flexibility so I want to improve it.
consumeEither :: (MonadIO m) => Consumer (Either String String) m ()
consumeEither = do
eitherS <- await
case eitherS of
(Left l) -> for (yield l) (liftIO . (IO.hPutStrLn IO.stderr))
(Right r) -> for (yiled r) (liftIO . putStrLn)
Furthermore it would be useful to provide a function which takes two Consumers and merge them into one Consumer.
Question
Does anybody know good example or implementation of the following interface?
merge :: (Monad m) => Consumer a m r -> Consumer b m r -> Consumer (Either a b) m r
1st argument as stderr
2nd argument as stdout
Usage of the function
import Pipes
import qualified Pipes.Prelude as P
import qualified System.IO as IO
stdoutOrErr :: Consumer (Either String String) IO ()
stdoutOrErr = merge (P.toHandle IO.stderr) P.stdoutLn
Thanks

(This is #Michael's answer, but I'd like to write it up here so we can move the question out of the unanswered queue for the Haskell tag.)
See (+++) in pipes-extras. Keep in mind a Consumer is a Pipe (to nowhere), so P.toHandle IO.stderr +++ P.stdoutLn :: MonadIO m => Pipe (Either String String) (Either b d) m ().
To get a Consumer, you would have to get rid of the Lefts e.g with >-> P.concat or >-> P.drain. There are more robust and handsome ways of doing this with Folds.

Haskell Conduit: is it possible to optionally have the result of a source?

I have the following types built from Data.Conduit:
type Footers = [(ByteString, ByteString)]
type DataAndConclusion = ConduitM () ByteString IO Footers
The idea of the second type being "produce a lot of ByteStrings, and if you can produce all of them, return a Footers". The condition is because conduits are governed by downstream functions, so the consumer of DataAndConclusion may have no need to consume all its items, and in that case the return wouldn't be reached. Which is precisely the behavior that I need. But when the end of the source is reached, I would like to have the produced Footers. This would be useful for example if the DataAndConclusions were incrementally computing an MD5 and such MD5 was only needed if the entire message was processed by the downstream (for example, downstream could be simply sending it through the network, but it doesn't make sense to finish computing and send the MD5 if the socket was closed before the last piece was sent by downstream).
So, basically I want to have something with this signature to consume a DataAndConclusions:
type MySink = Sink ByteString IO ()
mySink :: MySink
mySink = ...
difficultFunction :: ConduitM () a2 m r1 -> ConduitM a2 Void m r2 -> m (Maybe r1)
Question is, is there any way to implement "difficultFunction"? How?

There should be definitely a nice solution, but I wasn't able to construct it using ConduitM primitives. Something with signature
ConduitM i a m r1 -> ConduitM a o m r2 -> ConduitM i o m (Maybe r1, r2)
Looks like a primitive function with this signature would be a good addition for the conduit library.
Nevertheless, #danidiaz's suggestion about StateT lead me to the following generic solution that lifts the whole computation to WriterT internally in order to remember the output of the first conduit, if it's reached:
import Control.Monad
import Control.Monad.Trans
import Control.Monad.Trans.Writer
import Data.Conduit
import Data.Monoid
import Data.Void
difficultFunction :: (Monad m)
=> ConduitM () a2 m r1 -> ConduitM a2 Void m r2
-> m (r2, Maybe r1)
difficultFunction l r = liftM (fmap getLast) $ runWriterT (l' $$ r')
where
l' = transPipe lift l >>= lift . tell . Last . Just
r' = transPipe lift r
(untested!)

This would be useful for example if the DataAndConclusions were
incrementally computing an MD5 and such MD5 was only needed if the
entire message was processed by the downstream
Instead of relying on the return value of the upstream conduit, in this case perhaps you could accumulate the ongoing MD5 computation in a StateT layer beneath ConduitM, and access it after running the conduit.
The other part of the puzzle is detecting that the producer has finished first. Sinks can detect upstream end-of-input in await calls. You could write a Sink that notifies you of upstream termination in its own result type, perhaps with a Maybe.
But what if you are given a Sink that doesn't already do that? We would need a function like Sink i m r -> Sink i m (Maybe r). "Given a Sink that may halt early, return a new Sink that returns Nothing if upstream finishes first". But I don't know how to write that function.
Edit: This conduit sets an IORef to True when it detects upstream termination:
detectUpstreamClose :: IORef Bool -> Conduit i IO i
detectUpstreamClose ref = conduit
where
conduit = do
m <- await
case m of
Nothing -> liftIO (writeIORef ref True)
Just i -> do
yield i
conduit
detectUpstreamClose could be inserted in a pipeline, and the IORef could be checked afterwards.

Storing arbitrary function calls across threads

I'm trying to write a library aiming to reproduce Qt's threading semantics: signals can be connected to slots, and all slots execute in a known thread, so that slots tied to the same thread are threadsafe with regards to each other.
I have the following API:
data Signal a = Signal Unique a
data Slot a = Slot Unique ThreadId (a -> IO ())
mkSignal :: IO (Signal a)
mkSlot :: ThreadId -> (Slot a -> a -> IO ()) -> IO (Slot a)
connect :: Signal a -> Slot a -> IO ()
-- callable from any thread
emit :: Signal a -> a -> IO ()
-- runs in Slot's thread as a result of `emit`
execute :: Slot a -> a -> IO ()
execute (Slot _ _ f) arg = f arg
The problem is getting from emit to execute. The argument needs to be stored at runtime somehow, and then an IO action performed, but I can't seem to get past the type checker.
The things I need:
Type safety: signals shouldn't be connected to slots expecting a different type.
Type-independence: there can be more than one slots for any given type (Perhaps this can be relaxed with newtype and/or TH).
Ease of use: since this is a library, signals and slots should be easy to create.
The things I've tried:
Data.Dynamic: makes the whole thing really fragile, and I haven't found a way to perform a correctly-typed IO action on a Dynamic. There's dynApply, but it's pure.
Existential types: I need to execute the function passed to mkSlot, as opposed to an arbitrary function based on the type.
Data.HList: I'm not smart enough to figure it out.
What am I missing?

Firstly, are you sure Slots really want to execute in a specific thread? It's easy to write thread-safe code in Haskell, and threads are very lightweight in GHC, so you're not gaining much by tying all event-handler execution to a specific Haskell thread.
Also, mkSlot's callback doesn't need to be given the Slot itself: you can use recursive do-notation to bind the slot in its callback without adding the concern of tying the knot to mkSlot.
Anyway, you don't need anything as complicated as those solutions. I expect when you talk about existential types, you're thinking about sending something like (a -> IO (), a) through a TChan (which you mentioned using in the comments) and applying it on the other end, but you want the TChan to accept values of this type for any a, rather than just one specific a. The key insight here is that if you have (a -> IO (), a) and don't know what a is, the only thing you can do is apply the function to the value, giving you an IO () — so we can just send those through the channel instead!
Here's an example:
import Data.Unique
import Control.Applicative
import Control.Monad
import Control.Concurrent
import Control.Concurrent.STM
newtype SlotGroup = SlotGroup (IO () -> IO ())
data Signal a = Signal Unique (TVar [Slot a])
data Slot a = Slot Unique SlotGroup (a -> IO ())
-- When executed, this produces a function taking an IO action and returning
-- an IO action that writes that action to the internal TChan. The advantage
-- of this approach is that it's impossible for clients of newSlotGroup to
-- misuse the internals by reading the TChan or similar, and the interface is
-- kept abstract.
newSlotGroup :: IO SlotGroup
newSlotGroup = do
chan <- newTChanIO
_ <- forkIO . forever . join . atomically . readTChan $ chan
return $ SlotGroup (atomically . writeTChan chan)
mkSignal :: IO (Signal a)
mkSignal = Signal <$> newUnique <*> newTVarIO []
mkSlot :: SlotGroup -> (a -> IO ()) -> IO (Slot a)
mkSlot group f = Slot <$> newUnique <*> pure group <*> pure f
connect :: Signal a -> Slot a -> IO ()
connect (Signal _ v) slot = atomically $ do
slots <- readTVar v
writeTVar v (slot:slots)
emit :: Signal a -> a -> IO ()
emit (Signal _ v) a = atomically (readTVar v) >>= mapM_ (`execute` a)
execute :: Slot a -> a -> IO ()
execute (Slot _ (SlotGroup send) f) a = send (f a)
This uses a TChan to send actions to the worker thread each slot is tied to.
Note that I'm not very familiar with Qt, so I may have missed some subtlety of the model. You can also disconnect Slots with this:
disconnect :: Signal a -> Slot a -> IO ()
disconnect (Signal _ v) (Slot u _ _) = atomically $ do
slots <- readTVar v
writeTVar v $ filter keep slots
where keep (Slot u' _) = u' /= u
You might want something like Map Unique (Slot a) instead of [Slot a] if this is likely to be a bottleneck.
So, the solution here is to (a) recognise that you have something that's fundamentally based upon mutable state, and use a mutable variable to structure it; (b) realise that functions and IO actions are first-class just like everything else, so you don't have to do anything special to construct them at runtime :)
By the way, I suggest keeping the implementations of Signal and Slot abstract by not exporting their constructors from the module defining them; there are many ways to tackle this approach without changing the API, after all.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Handling event streams in haskell - haskell

Related

Streaming of a resource with proper acquisition and release

Haskell: Replace mapM in a monad transformer stack to achieve lazy evaluation (no space leaks)

How to merge two Consumer into one in Haskell Pipes?

Haskell Conduit: is it possible to optionally have the result of a source?

Storing arbitrary function calls across threads

Categories

Resources