Streaming of a resource with proper acquisition and release - haskell

This is a question about the Haskell streaming library.
Stream (Of a) m r is a "stream of individual Haskell values derived from actions in some monad m and returning a value of type r". Streaming.Prelude defines many useful functions that allow nice streaming applications:
import qualified Streaming.Prelude as S
S.print $ do
S.yield "a"
S.yield "b"
S.yield "c"
The tutorial is good for getting started.
Now, the particular issue at hand is how to use this framework with a monad that requires careful instantiation and release of resources. The streaming-with package seems to be the right candidate, it has a function
bracket :: MonadMask m => m a -> (a -> m c) -> (a -> m b) -> m b
that acquires (m a), releases (a->m c) and uses (a->m b) a resource. All three actions are encapsulated in the returned m b. withFile is a good example for how to use this:
withFile :: FilePath -> IOMode -> (Handle -> m r) -> m r
withFile fp md = bracket (liftIO (openFile fp md)) (liftIO . hClose)
Acquisition and release of the handle are nicely sandwiching the usage Handle->m r.
But: I absolutely do not see how this should be used with Stream (Of a) m r. I have to provide an a->m b and I get a m b. How is this supposed to be connected so that I obtain a Stream?
To figure this out, let's play with withFile:
import System.IO
use :: Handle -> IO (S.Stream (Of String) IO ())
use = return . S.repeatM . hGetLine
main :: IO ()
main = do
str <- S.withFile "input.dat" ReadMode use
S.print str
but that results in hGetLine: illegal operation (handle is closed). That actually makes sense, by the time S.print str is called withFile has already acquired and released the handle.
So let's move the stream consumption inside the use function:
use :: Handle -> IO ()
use h = do
S.print $ S.repeatM (hGetLine h)
and that gives a hGetLine: invalid argument (invalid byte sequence). I'm not quite sure what this error means. An isEOFError would be acceptable, but 'invalid byte sequence'? In any case, this doesn't work either.
I'm running out of ideas... How is this done?
The withFile is just a toy example, the question is really about how to correctly create and consume a stream inside a bracket.

let's move the stream consumption inside the use function
This is indeed the right approach.
I'm actually getting a proper hGetLine: end of file when running the example code. The problem is that S.repeatM (hGetLine h) never bothers to check if it has reached then end of the file, and throws an exception when it bumps into it.
The following definition of use doesn't have that problem:
use :: Handle -> IO ()
use h = do
S.print $ S.untilRight $ do eof <- System.IO.hIsEOF h
if eof then Right <$> pure ()
else Left <$> hGetLine h
It uses the untilRight function.

Related

Handling event streams in haskell

I want to process stream of events received via the MQTT. Library which I'm using uses a callback to provide the results. Processing I'm doing depends on the previous state not only the latest event. Also in the future events might be gathered from the other sources.
At the first I decided to compose it into the list which sounds as a good idea. I had the minor issue cause IO prevents lazy evaluation and waiting for infinite stream might be long, but I solved it with interleaving IO.
stream :: IO [Event] allows me to do the nice stuff like foldl, foldM map, mapM, etc... Unfortunately with this approach I rather wont be able to combine two streams, cause there is no more locking feature there.
I was diging through many libs, and found STM with TQueue for example. Unfortunately it is not what I exactly want.
I decide to create custom type and make it Foldable so I will be able to fold it. I failed due to IO.
import Control.Concurrent.STM
newtype Stream a = Stream (STM a)
runStream
:: ((a -> IO ()) -> IO i)
-> IO (Stream a)
runStream block = do
queue <- newTQueueIO
block (atomically . writeTQueue queue)
return $ Stream (readTQueue queue)
foldStream :: (a -> b -> IO b) -> b -> Stream a -> IO b
foldStream f s (Stream read) = do
n <- atomically read
m <- f n s
foldStream f m (Stream read)
mapStream :: (a -> b) -> Stream a -> Stream b
mapStream f (Stream read) = Stream $ f <$> read
zipStream :: [Stream a] -> Stream a
zipStream = undefined
Whih can be used like main = foldStream (\x _ -> print x) () =<< events
Is it possible to implement base some of base classes to work with this stream as with regular List?
The usual trick in these cases is to make the callback write to a queue, and then read from the other end of the queue.
Using a bounded, closeable queue from the stm-chans package, we can define this function:
import Control.Concurrent.STM
import Control.Concurrent.STM.TBMQueue
foldQueue :: TBMQueue a -> (x -> a -> IO x) -> IO x -> (x -> IO b) -> IO b
foldQueue queue step start done =
let go state =
do m <- atomically (readTBMQueue queue)
case m of
Nothing -> done state
Just a -> step state a >>= go
in start >>= go
It takes the channel, a step function (similar to the one required by foldM), an action to obtain the initial state, and a "done" action that returns the final result, and then feeds data from the channel until it is closed. Notice that the fold state x is chosen by the caller of foldQueue.
If later we want to upgrade to the monadic folds from the foldl package—which have a very useful Applicative instance—we can do it like this:
import qualified Control.Foldl as L
foldQueue' :: TBMQueue a -> L.FoldM IO a b -> IO b
foldQueue' queue = L.impurely (foldQueue queue)
Using impurely from the "foldl" package.
Sometimes (like when parsing, grouping, or decoding) it's easier to use a pull-based consumer. We can do that with the streaming package:
import Streaming
import qualified Streaming.Prelude as S
foldQueue' :: TBMQueue a -> (Stream (Of a) IO () -> IO r) -> IO r
foldQueue' queue consume = consume (S.untilRight (do
m <- atomically (readTBMQueue queue)
return (case m of
Nothing -> Right ()
Just a -> Left a)))
Given a function that consumes a stream, we feed to it a stream of values read from the queue.
Often, reading from the channel and writing to it must happen in different threads. We can use functions like concurrently from async to handle it cleanly.

Haskell: Replace mapM in a monad transformer stack to achieve lazy evaluation (no space leaks)

It has already been discussed that mapM is inherently not lazy, e.g. here and here. Now I'm struggling with a variation of this problem where the mapM in question is deep inside a monad transformer stack.
Here's a function taken from a concrete, working (but space-leaking) example using LevelDB that I put on gist.github.com:
-- read keys [1..n] from db at DirName and check that the values are correct
doRead :: FilePath -> Int -> IO ()
doRead dirName n = do
success <- runResourceT $ do
db <- open dirName defaultOptions{ cacheSize= 2048 }
let check' = check db def in -- is an Int -> ResourceT IO Bool
and <$> mapM check' [1..n] -- space leak !!!
putStrLn $ if success then "OK" else "Fail"
This function reads the values corresponding to keys [1..n] and checks that they are all correct. The troublesome line inside the ResourceT IO a monad is
and <$> mapM check' [1..n]
One solution would be to use streaming libraries such as pipes, conduit, etc. But these seem rather heavy and I'm not at all sure how to use them in this situation.
Another path I looked into is ListT as suggested here. But the type signatures of ListT.fromFoldable :: [Bool]->ListT Bool and ListT.fold :: (r -> a -> m r) -> r -> t m a -> mr (where m=IO and a,r=Bool) do not match the problem at hand.
What is a 'nice' way to get rid of the space leak?
Update: Note that this problem has nothing to do with monad transformer stacks! Here's a summary of the proposed solutions:
1) Using Streaming:
import Streaming
import qualified Streaming.Prelude as S
S.all_ id (S.mapM check' (S.each [1..n]))
2) Using Control.Monad.foldM:
foldM (\a i-> do {b<-check' i; return $! a && b}) True [1..n]
3) Using Control.Monad.Loops.allM
allM check' [1..n]
I know you mention you don't want to use streaming libraries, but your problem seems pretty easy to solve with streaming without changing the code too much.
import Streaming
import qualified Streaming.Prelude as S
We use each [1..n] instead of [1..n] to get a stream of elements:
each :: (Monad m, Foldable f) => f a -> Stream (Of a) m ()
Stream the elements of a pure, foldable container.
(We could also write something like S.take n $ S.enumFrom 1).
We use S.mapM check' instead of mapM check':
mapM :: Monad m => (a -> m b) -> Stream (Of a) m r -> Stream (Of b) m r
Replace each element of a stream with the result of a monadic action
And then we fold the stream of booleans with S.all_ id:
all_ :: Monad m => (a -> Bool) -> Stream (Of a) m r -> m Bool
Putting it all together:
S.all_ id (S.mapM check' (S.each [1..n]))
Not too different from the code you started with, and without the need for any new operator.
I think what you need is allM from the monad-loops package.
Then it would be just allM check' [1..n]
(Or if you don't want the import it's a pretty small function to copy.)

How to merge two Consumer into one in Haskell Pipes?

I use Haskell stream processing library pipes to write a command line tool. Each command line actions may output result to stdout and logs to stderr with pipes API.
I need Consumer which has type as Consumer (Either String String) m r to print chunk of data (Left to stderr, Right to stdout) with single Consumer.
Code I wrote (should be improved)
This function consumeEither doesn't have flexibility so I want to improve it.
consumeEither :: (MonadIO m) => Consumer (Either String String) m ()
consumeEither = do
eitherS <- await
case eitherS of
(Left l) -> for (yield l) (liftIO . (IO.hPutStrLn IO.stderr))
(Right r) -> for (yiled r) (liftIO . putStrLn)
Furthermore it would be useful to provide a function which takes two Consumers and merge them into one Consumer.
Question
Does anybody know good example or implementation of the following interface?
merge :: (Monad m) => Consumer a m r -> Consumer b m r -> Consumer (Either a b) m r
1st argument as stderr
2nd argument as stdout
Usage of the function
import Pipes
import qualified Pipes.Prelude as P
import qualified System.IO as IO
stdoutOrErr :: Consumer (Either String String) IO ()
stdoutOrErr = merge (P.toHandle IO.stderr) P.stdoutLn
Thanks
(This is #Michael's answer, but I'd like to write it up here so we can move the question out of the unanswered queue for the Haskell tag.)
See (+++) in pipes-extras. Keep in mind a Consumer is a Pipe (to nowhere), so P.toHandle IO.stderr +++ P.stdoutLn :: MonadIO m => Pipe (Either String String) (Either b d) m ().
To get a Consumer, you would have to get rid of the Lefts e.g with >-> P.concat or >-> P.drain. There are more robust and handsome ways of doing this with Folds.

How can I write a pipe that sends downstream a list of what it receives from upstream?

I'm having a hard time to write a pipe with this signature:
toOneBigList :: (Monad m, Proxy p) => () -> Pipe p a [a] m r
It should simply take all as from upstream and send them in a list downstream.
All my attempts look fundamentally broken.
Can anybody point me in the right direction?
There are two pipes-based solutions and I'll let you pick which one you prefer.
Note: It's not clear why you output the list on the downstream interface instead of just returning it directly as a result.
Conduit-style
The first one, which is very close to the conduit-based solution uses the upcoming pipes-pase, which is basically complete and just needs documentation. You can find the latest draft on Github.
Using pipes-parse, the solution is identical to the conduit solution that Petr gave:
import Control.Proxy
import Control.Proxy.Parse
combine
:: (Monad m, Proxy p)
=> () -> Pipe (StateP [Maybe a] p) (Maybe a) [a] m ()
combine () = loop []
where
loop as = do
ma <- draw
case ma of
Nothing -> respond (reverse as)
Just a -> loop (a:as)
draw is like conduit's await: it requests a value from either the leftovers buffer (that's the StateP part) or from upstream if the buffer is empty. Nothing indicates end of file.
You can wrap a pipe that does not have an end of file signal using the wrap function from pipes-parse, which has type:
wrap :: (Monad m, Proxy p) => p a' a b' b m r -> p a' a b' (Maybe b) m s
Classic Pipes Style
The second alternative is a bit simpler. If you want to fold a given pipe you can do so directly using WriterP:
import Control.Proxy
import Control.Proxy.Trans.Writer
foldIt
:: (Monad m, Proxy p) =>
(() -> Pipe p a b m ()) -> () -> Pipe p a [b] m ()
foldIt p () = runIdentityP $ do
r <- execWriterK (liftP . p >-> toListD >-> unitU) ()
respond r
That's a higher-level description of what is going on, but it requires passing in the pipe as an explicit argument. It's up to you which one you prefer.
By the way, this is why I was asking why you want to send a single value downstream. The above is much simpler if you return the folded list:
foldIt p = execWriterK (liftP . p >-> toListD)
The liftP might not even be necessary if p is completely polymorphic in its proxy type. I only include it as a precaution.
Bonus Solution
The reason pipes-parse does not provide the toOneBigList is that it's always a pipes anti-pattern to group the results into a list. pipes has several nice features that make it possible to never have to group the input into a list, even if you are trying to yield multiple lists. For example, using respond composition you can have a proxy yield the subset of the stream it would have traversed and then inject a handler that uses that subset:
example :: (Monad m, Proxy p) => () -> Pipe p a (() -> Pipe p a a m ()) m r
example () = runIdentityP $ forever $ do
respond $ \() -> runIdentityP $ replicateM_ 3 $ request () >>= respond
printIt :: (Proxy p, Show a) => () -> Pipe p a a IO r
printIt () = runIdentityP $ do
lift $ putStrLn "Here we go!"
printD ()
useIt :: (Proxy p, Show a) => () -> Pipe p a a IO r
useIt = example />/ (\p -> (p >-> printIt) ())
Here's an example of how to use it:
>>> runProxy $ enumFromToS 1 10 >-> useIt
Here we go!
1
2
3
Here we go!
4
5
6
Here we go!
7
8
9
Here we go!
10
This means you never need to bring a single element into memory even when you need to group elements.
I'll give only a partial answer, perhaps somebody else will have a better one.
As far as I know, standard pipes have no mechanism of detecting when the other part of the pipeline terminates. The first pipe that terminates produces the final result of the pipe-line and all the others are just dropped. So if you have a pipe that consumes input forever (to eventually produce a list), it will have no chance acting and producing output when its upstream finishes. (This is intentional so that both up- and down-stream parts are dual to each other.) Perhaps this is solved in some library building on top of pipes.
The situation is different with conduit. It has consume function that combines all inputs into a list and returns (not outputs) it. Writing a function like the one you need, that outputs the list at the end, is not difficult:
import Data.Conduit
combine :: (Monad m) => Conduit a m [a]
combine = loop []
where
loop xs = await >>= maybe (yield $ reverse xs) (loop . (: xs))

How withFile is implemented in haskell

Following a haskell tutorial, the author provides the following implementation of the withFile method:
withFile' :: FilePath -> IOMode -> (Handle -> IO a) -> IO a
withFile' path mode f = do
handle <- openFile path mode
result <- f handle
hClose handle
return result
But why do we need to wrap the result in a return? Doesn't the supplied function f already return an IO as can be seen by it's type Handle -> IO a?
You're right: f already returns an IO, so if the function were written like this:
withFile' path mode f = do
handle <- openFile path mode
f handle
there would be no need for a return. The problem is hClose handle comes in between, so we have to store the result first:
result <- f handle
and doing <- gets rid of the IO. So return puts it back.
This is one of the tricky little things that confused me when I first tried Haskell. You're misunderstanding the meaning of the <- construct in do-notation. result <- f handle doesn't mean "assign the value of f handle to result"; it means "bind result to a value 'extracted' from the monadic value of f handle" (where the 'extraction' happens in some way that's defined by the particular Monad instance that you're using, in this case the IO monad).
I.e., for some Monad typeclass m, the <- statement takes an expression of type m a in the right hand side and a variable of type a on the left hand side, and binds the variable to a value. Thus in your particular example, with result <- f handle, we have the types f result :: IO a, result :: a and return result :: IO a.
PS do-notation has also a special form of let (without the in keyword in this case!) that works as a pure counterpart to <-. So you could rewrite your example as:
withFile' :: FilePath -> IOMode -> (Handle -> IO a) -> IO a
withFile' path mode f = do
handle <- openFile path mode
let result = f handle
hClose handle
result
In this case, because the let is a straightforward assignment, the type of result is IO a.

Resources