How can I conditionally apply a conduit?

How can I conditionally apply a conduit? - haskell

I have a Conduit of type Conduit a m a and a function of type (a -> Maybe a). I want to run the function, and then if it returns Nothing, use the Conduit. That is, I want a function of type
maybePipe :: Conduit a m b -> (a -> Maybe b) -> Conduit a m b
or, of the more restricted type
maybePipe :: Conduit a m a -> (a -> Maybe a) -> Conduit a m a
If it helps, my specific case is as follows:
I'm writing code that deals with IRC messages, and I have a function:
runClient :: Conduit IRC.Message IO IRC.Message -> ClientSettings -> IO ()
runClient pipe address = runTCPClient' pipe' address where
pipe' = mapC IRC.decode $= concatMapC id $= pipe $= mapC IRC.encode $= mapC (++ "\r\n")
handlePings (IRC.Message (Just (IRC.Server serverName)) "PING" []) = Just $ IRC.pong serverName
handlePings (IRC.Message Nothing "PING" [server]) = Just $ IRC.pong server
handlePings (IRC.Message Nothing "PING" []) = Just $ IRC.pong (getHost address)
handlePings _ = Nothing
runTCPClient' :: Conduit ByteString IO ByteString -> ClientSettings -> IO ()
runTCPClient' pipe address = runTCPClient address runClient where
runClient appdata = appSource appdata $= linesUnboundedAsciiC $= pipe $$ appSink appdata
I want to be able to do maybePipe handlePings pipe (or equivalent) in that function, so when the IRC message is a ping, we respond with a pong and don't call the user-specified Conduit.

Searching Hoogle reveals a function with almost exactly that type signature: mapOutputMaybe. But the more idiomatic way would be to fuse with Data.Conduit.List.mapMaybe.
EDIT
Scratch that, I understand what you're asking now. No, there's no built in combinator. But it's easy to build one up:
myHelper onNothing f = awaitForever $ maybe onNothing yield . f

Using Michael's combinator only calls the (a -> Maybe b) on the first item it comes acress, then lets the onNothing pipe take over. This was not what I was looking for.
Instead, using ZipConduit, in my specific example (using conduit-combinators):
pingHandlingPipe =
getZipConduit $ ZipConduit (concatMapC handlePings)
*> ZipConduit (takeWhileC (not.isJust.handlePings) $= pipe)
or, generalized
pipeMaybe maybeF pipe =
getZipConduit $ ZipConduit (concatMapC maybeF)
*> ZipConduit (takeWhileC (not.isJust.maybeF) $= pipe)
Unfortunately, this calls the (a -> Maybe b) function two times.

Related

Convert IO callback to infinite list

I am using a library that I can provide with a function a -> IO (), which it will call occasionally.
Because the output of my function depends not only on the a it receives as input, but also on the previous a's, it would be much easier for me to write a function [a] -> IO (), where [a] is infinite.
Can I write a function:
magical :: ([a] -> IO ()) -> (a -> IO ())
That collects the a's it receives from the callback and passes them to my function as a lazy infinite list?

The IORef solution is indeed the simplest one. If you'd like to explore a pure (but more complex) variant, have a look at conduit. There are other implementations of the same concept, see Iteratee I/O, but I found myself conduit to be very easy to use.
A conduit (AKA pipe) is an abstraction of of program that can accept input and/or produce output. As such, it can keep internal state, if needed. In your case, magical would be a sink, that is, a conduit that accepts input of some type, but produces no output. By wiring it into a source, a program that produces output, you complete the pipeline and then ever time the sink asks for an input, the source is run until it produces its output.
In your case you'd have roughly something like
magical :: Sink a IO () -- consumes a stream of `a`s, no result
magical = go (some initial state)
where
go state = do
m'input <- await
case m'input of
Nothing -> return () -- finish
Just input -> do
-- do something with the input
go (some updated state)

This is not exactly what you asked for, but it might be enough for your purposes, I think.
magical :: ([a] -> IO ()) -> IO (a -> IO ())
magical f = do
list <- newIORef []
let g x = do
modifyIORef list (x:)
xs <- readIORef list
f xs -- or (reverse xs), if you need FIFO ordering
return g
So if you have a function fooHistory :: [a] -> IO (), you can use
main = do
...
foo <- magical fooHistory
setHandler foo -- here we have foo :: a -> IO ()
...
As #danidaz wrote above, you probably do not need magical, but can play the same trick directly in your fooHistory, modifying a list reference (IORef [a]).
main = do
...
list <- newIORef []
let fooHistory x = do
modifyIORef list (x:)
xs <- readIORef list
use xs -- or (reverse xs), if you need FIFO ordering
setHandler fooHistory -- here we have fooHistory :: a -> IO ()
...

Control.Concurrent.Chan does almost exactly what I wanted!
import Control.Monad (forever)
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan
setHandler :: (Char -> IO ()) -> IO ()
setHandler f = void . forkIO . forever $ getChar >>= f
process :: String -> IO ()
process ('h':'i':xs) = putStrLn "hi" >> process xs
process ('a':xs) = putStrLn "a" >> process xs
process (x:xs) = process xs
process _ = error "Guaranteed to be infinite"
main :: IO ()
main = do
c <- newChan
setHandler $ writeChan c
list <- getChanContents c
process list

This seems like a flaw in the library design to me. You might consider an upstream patch so that you could provide something more versatile as input.

Streaming of a resource with proper acquisition and release

This is a question about the Haskell streaming library.
Stream (Of a) m r is a "stream of individual Haskell values derived from actions in some monad m and returning a value of type r". Streaming.Prelude defines many useful functions that allow nice streaming applications:
import qualified Streaming.Prelude as S
S.print $ do
S.yield "a"
S.yield "b"
S.yield "c"
The tutorial is good for getting started.
Now, the particular issue at hand is how to use this framework with a monad that requires careful instantiation and release of resources. The streaming-with package seems to be the right candidate, it has a function
bracket :: MonadMask m => m a -> (a -> m c) -> (a -> m b) -> m b
that acquires (m a), releases (a->m c) and uses (a->m b) a resource. All three actions are encapsulated in the returned m b. withFile is a good example for how to use this:
withFile :: FilePath -> IOMode -> (Handle -> m r) -> m r
withFile fp md = bracket (liftIO (openFile fp md)) (liftIO . hClose)
Acquisition and release of the handle are nicely sandwiching the usage Handle->m r.
But: I absolutely do not see how this should be used with Stream (Of a) m r. I have to provide an a->m b and I get a m b. How is this supposed to be connected so that I obtain a Stream?
To figure this out, let's play with withFile:
import System.IO
use :: Handle -> IO (S.Stream (Of String) IO ())
use = return . S.repeatM . hGetLine
main :: IO ()
main = do
str <- S.withFile "input.dat" ReadMode use
S.print str
but that results in hGetLine: illegal operation (handle is closed). That actually makes sense, by the time S.print str is called withFile has already acquired and released the handle.
So let's move the stream consumption inside the use function:
use :: Handle -> IO ()
use h = do
S.print $ S.repeatM (hGetLine h)
and that gives a hGetLine: invalid argument (invalid byte sequence). I'm not quite sure what this error means. An isEOFError would be acceptable, but 'invalid byte sequence'? In any case, this doesn't work either.
I'm running out of ideas... How is this done?
The withFile is just a toy example, the question is really about how to correctly create and consume a stream inside a bracket.

let's move the stream consumption inside the use function
This is indeed the right approach.
I'm actually getting a proper hGetLine: end of file when running the example code. The problem is that S.repeatM (hGetLine h) never bothers to check if it has reached then end of the file, and throws an exception when it bumps into it.
The following definition of use doesn't have that problem:
use :: Handle -> IO ()
use h = do
S.print $ S.untilRight $ do eof <- System.IO.hIsEOF h
if eof then Right <$> pure ()
else Left <$> hGetLine h
It uses the untilRight function.

How to turn a pull based pipe into a push based one?

By default pipes are pull based. This is due to the operator >-> which is implemented via +>> which is the pointful bind operator for his pull category. My understanding is that this means that if you have code like producer >-> consumer, the consumer's body will be called first, then once it awaits data, the producer will be called.
I've seen in the pipes documentation here that you can use the code (reflect .) from Pipes.Core to turn a pull based pipe into a push based pipe. That means instead (correct me if I'm wrong) that in the code above producer >-> consumer, the producer is run first, produces a value, then the consumer tries to consume. That seems really useful and I'd like to know how to do it.
I've also seen in discussions here that there is no push based counterpart to >-> because it is easy to turn any pipe around (I assume with reflect?), but I can't really figure how to do it or find any examples.
Here's some code I've attempted:
stdin :: Producer String IO r
stdin = forever $ do
lift $ putStrLn "stdin"
str <- lift getLine
yield str
countLetters :: Consumer String IO r
countLetters = forever $ do
lift $ putStrLn "countLetters"
str <- await
lift . putStrLn . show . length $ str
-- this works in pull mode
runEffect (stdin >-> countLetters)
-- equivalent to above, works
runEffect ((\() -> stdin) +>> countLetters)
-- push based operator, doesn't do what I hoped
runEffect (stdin >>~ (\_ -> countLetters))
-- does not compile
runEffect (countLetters >>~ (\() -> stdin))

-- push based operator, doesn't do what I hoped
runEffect (stdin >>~ (\_ -> countLetters))
I gather the problem here is that, while the producer is ran first as expected, the first produced value is dropped. Compare...
GHCi> runEffect (stdin >-> countLetters)
countLetters
stdin
foo
3
countLetters
stdin
glub
4
countLetters
stdin
... with:
GHCi> runEffect (stdin >>~ (\_ -> countLetters))
stdin
foo
countLetters
stdin
glub
4
countLetters
stdin
This issue is discussed in detail by Gabriella Gonzalez's answer to this question. It boils down to how the argument to the function you give to (>>~) is the "driving" input in the push-based flow, and so if you const it away you end up dropping the first input. The solution is to reshape countLetters accordingly:
countLettersPush :: String -> Consumer String IO r
countLettersPush str = do
lift $ putStrLn "countLetters"
lift . putStrLn . show . length $ str
str' <- await
countLettersPush str'
GHCi> runEffect (stdin >>~ countLettersPush)
stdin
foo
countLetters
3
stdin
glub
countLetters
4
stdin
I've also seen in discussions here that there is no push based counterpart to >-> because it is easy to turn any pipe around (I assume with reflect?)
I'm not fully sure of my ground, but it seems that doesn't quite apply to the solution above. What we can do, now that we have the push-based flow working correctly, is using reflect to turn it around back to a pull-based flow:
-- Preliminary step: switching to '(>~>)'.
stdin >>~ countLettersPush
(const stdin >~> countLettersPush) ()
-- Applying 'reflect', as the documentation suggests.
reflect . (const stdin >~> countLettersPush)
reflect . const stdin <+< reflect . countLettersPush
const (reflect stdin) <+< reflect . countLettersPush
-- Rewriting in terms of '(+>>)'.
(reflect . countLettersPush >+> const (reflect stdin)) ()
reflect . countLettersPush +>> reflect stdin
This is indeed pull-based, as the flow is driven by reflect stdin, the downstream Client:
GHCi> :t reflect stdin
reflect stdin :: Proxy String () () X IO r
GHCi> :t reflect stdin :: Client String () IO r
reflect stdin :: Client String () IO r :: Client String () IO r
The flow, however, involves sending Strings upstream, and so it cannot be expressed in terms of (>->), which is, so to say, downstream-only:
GHCi> -- Compare the type of the second argument with that of 'reflect stdin'
GHCi> :t (>->)
(>->)
:: Monad m =>
Proxy a' a () b m r -> Proxy () b c' c m r -> Proxy a' a c' c m

How can I write a pipe that sends downstream a list of what it receives from upstream?

I'm having a hard time to write a pipe with this signature:
toOneBigList :: (Monad m, Proxy p) => () -> Pipe p a [a] m r
It should simply take all as from upstream and send them in a list downstream.
All my attempts look fundamentally broken.
Can anybody point me in the right direction?

There are two pipes-based solutions and I'll let you pick which one you prefer.
Note: It's not clear why you output the list on the downstream interface instead of just returning it directly as a result.
Conduit-style
The first one, which is very close to the conduit-based solution uses the upcoming pipes-pase, which is basically complete and just needs documentation. You can find the latest draft on Github.
Using pipes-parse, the solution is identical to the conduit solution that Petr gave:
import Control.Proxy
import Control.Proxy.Parse
combine
:: (Monad m, Proxy p)
=> () -> Pipe (StateP [Maybe a] p) (Maybe a) [a] m ()
combine () = loop []
where
loop as = do
ma <- draw
case ma of
Nothing -> respond (reverse as)
Just a -> loop (a:as)
draw is like conduit's await: it requests a value from either the leftovers buffer (that's the StateP part) or from upstream if the buffer is empty. Nothing indicates end of file.
You can wrap a pipe that does not have an end of file signal using the wrap function from pipes-parse, which has type:
wrap :: (Monad m, Proxy p) => p a' a b' b m r -> p a' a b' (Maybe b) m s
Classic Pipes Style
The second alternative is a bit simpler. If you want to fold a given pipe you can do so directly using WriterP:
import Control.Proxy
import Control.Proxy.Trans.Writer
foldIt
:: (Monad m, Proxy p) =>
(() -> Pipe p a b m ()) -> () -> Pipe p a [b] m ()
foldIt p () = runIdentityP $ do
r <- execWriterK (liftP . p >-> toListD >-> unitU) ()
respond r
That's a higher-level description of what is going on, but it requires passing in the pipe as an explicit argument. It's up to you which one you prefer.
By the way, this is why I was asking why you want to send a single value downstream. The above is much simpler if you return the folded list:
foldIt p = execWriterK (liftP . p >-> toListD)
The liftP might not even be necessary if p is completely polymorphic in its proxy type. I only include it as a precaution.
Bonus Solution
The reason pipes-parse does not provide the toOneBigList is that it's always a pipes anti-pattern to group the results into a list. pipes has several nice features that make it possible to never have to group the input into a list, even if you are trying to yield multiple lists. For example, using respond composition you can have a proxy yield the subset of the stream it would have traversed and then inject a handler that uses that subset:
example :: (Monad m, Proxy p) => () -> Pipe p a (() -> Pipe p a a m ()) m r
example () = runIdentityP $ forever $ do
respond $ \() -> runIdentityP $ replicateM_ 3 $ request () >>= respond
printIt :: (Proxy p, Show a) => () -> Pipe p a a IO r
printIt () = runIdentityP $ do
lift $ putStrLn "Here we go!"
printD ()
useIt :: (Proxy p, Show a) => () -> Pipe p a a IO r
useIt = example />/ (\p -> (p >-> printIt) ())
Here's an example of how to use it:
>>> runProxy $ enumFromToS 1 10 >-> useIt
Here we go!
1
2
3
Here we go!
4
5
6
Here we go!
7
8
9
Here we go!
10
This means you never need to bring a single element into memory even when you need to group elements.

I'll give only a partial answer, perhaps somebody else will have a better one.
As far as I know, standard pipes have no mechanism of detecting when the other part of the pipeline terminates. The first pipe that terminates produces the final result of the pipe-line and all the others are just dropped. So if you have a pipe that consumes input forever (to eventually produce a list), it will have no chance acting and producing output when its upstream finishes. (This is intentional so that both up- and down-stream parts are dual to each other.) Perhaps this is solved in some library building on top of pipes.
The situation is different with conduit. It has consume function that combines all inputs into a list and returns (not outputs) it. Writing a function like the one you need, that outputs the list at the end, is not difficult:
import Data.Conduit
combine :: (Monad m) => Conduit a m [a]
combine = loop []
where
loop xs = await >>= maybe (yield $ reverse xs) (loop . (: xs))

How do I combine IOError exceptions with locally relevant exceptions?

I am building a Haskell application and trying to figure out how I am going to build the error handling mechanism. In the real application, I'm doing a bunch of work with Mongo. But, for this, I'm going to simplify by working with basic IO operations on a file.
So, for this test application, I want to read in a file and verify that it contains a proper fibonnacci sequence, with each value separated by a space:
1 1 2 3 5 8 13 21
Now, when reading the file, any number of things could actually be wrong, and I am going to call all of those exceptions in the Haskell usage of the word.
data FibException = FileUnreadable IOError
| FormatError String String
| InvalidValue Integer
| Unknown String
instance Error FibException where
noMsg = Unknown "No error message"
strMsg = Unknown
Writing a pure function that verifies the sequence and throws an error in the case that the sequence is invalid is easy (though I could probably do better):
verifySequence :: String -> (Integer, Integer) -> Either FibException ()
verifySequence "" (prev1, prev2) = return ()
verifySequence s (prev1, prev2) =
let readInt = reads :: ReadS Integer
res = readInt s in
case res of
[] -> throwError $ FormatError s
(val, rest):[] -> case (prev1, prev2, val) of
(0, 0, 1) -> verifySequence rest (0, 1)
(p1, p2, val') -> (if p1 + p2 /= val'
then throwError $ InvalidValue val'
else verifySequence rest (p2, val))
_ -> throwError $ InvalidValue val
After that, I want the function that reads the file and verifies the sequence:
type FibIOMonad = ErrorT FibException IO
verifyFibFile :: FilePath -> FibIOMonad ()
verifyFibFile path = do
sequenceStr <- liftIO $ readFile path
case (verifySequence sequenceStr (0, 0)) of
Right res -> return res
Left err -> throwError err
This function does exactly what I want if the file is in the invalid format (it returns Left (FormatError "something")) or if the file has a number out of sequence (Left (InvalidValue 15)). But it throws an error if the file specified does not exist.
How do I catch the IO errors that readFile may produce so that I can transform them into the FileUnreadable error?
As a side question, is this even the best way to do it? I see the advantage that the caller of verifyFibFile does not have to set up two different exception handling mechanisms and can instead catch just one exception type.

You might consider EitherT and the errors package in general. http://hackage.haskell.org/packages/archive/errors/1.3.1/doc/html/Control-Error-Util.html has a utility tryIO for catching IOError in EitherT and you could use fmapLT to map error values to your custom type.
Specifically:
type FibIOMonad = EitherT FibException IO
verifyFibFile :: FilePath -> FibIOMonad ()
verifyFibFile path = do
sequenceStr <- fmapLT FileUnreadable (tryIO $ readFile path)
hoistEither $ verifySequence sequenceStr (0, 0)

#Savanni D'Gerinel: you are on the right track. Let's extract your error-catching code from verifyFibFile to make it more generic, and modify it slightly so that it works directly in ErrorT:
catchError' :: ErrorT e IO a -> (IOError -> ErrorT e IO a) -> ErrorT e IO a
catchError' m f =
ErrorT $ catchError (runErrorT m) (fmap runErrorT f)
verifyFibFile can now be written as:
verifyFibFile' :: FilePath -> FibIOMonad ()
verifyFibFile' path = do
sequenceStr <- catchError' (liftIO $ readFile path) (throwError . FileUnReadable)
ErrorT . return $ verifySequence sequenceStr' (0, 0)
Notice what we have done in catchError'. We have stripped the ErrorT constructor from the ErrorT e IO a action, and also from the return value of the error-handling function, knowing than we can reconstruct them afterwards by wrapping the result of the control operation in ErrorT again.
Turns out that this is a common pattern, and it can be done with monad transformers other than ErrorT. It can get tricky though (how to do this with ReaderT for example?). Luckily, the monad-control packgage already provides this functionality for many common transformers.
The type signatures in monad-control can seem scary at first. Start by looking at just one function: control. It has the type:
control :: MonadBaseControl b m => (RunInBase m b -> b (StM m a)) -> m a
Let's make it more specific by making b be IO:
control :: MonadBaseControl IO m => (RunInBase m IO -> IO (StM m a)) -> m a
m is a monad stack built on top of IO. In your case, it would be ErrorT IO.
RunInBase m IO is a type alias for a magical function, that takes a value of type m a and returns a value of type IO *something*, something being some complex magic that encodes the state of the whole monad stack inside IO and lets you reconstruct the m a value afterwards, once you have "fooled" the control operation that only accepts IO values. control provides you with that function, and also handles the reconstruction for you.
Applying this to your problem, we rewrite verifyFibFile once more as:
import Control.Monad.Trans.Control (control)
import Control.Exception (catch)
verifyFibFile'' :: FilePath -> FibIOMonad ()
verifyFibFile'' path = do
sequenceStr <- control $ \run -> catch (run . liftIO $ readFile path)
(run . throwError . FileUnreadable)
ErrorT . return $ verifySequence sequenceStr' (0, 0)
Keep in mind that this only works when the proper instance of MonadBaseControl b m exists.
Here is a nice introduction to monad-control.

So, here's an answer that I have developed. It centers around getting readFile wrapped into the proper catchError statement, and then lifted.
verifyFibFile :: FilePath -> FibIOMonad ()
verifyFibFile path = do
contents <- liftIO $ catchError (readFile path >>= return . Right) (return . Left . FileUnreadable)
case contents of
Right sequenceStr' -> case (verifySequence sequenceStr' (0, 0)) of
Right res -> return res
Left err -> throwError err
Left err -> throwError err
So, verifyFibFile gets a little more nested in this solution.
readFile path has type IO String, obviously. In this context, the type for catchError will be:
catchError :: IO String -> (IOError -> IO String) -> IO String
So, my strategy was to catch the error and turn it into the left side of an Either, and turn the successful value into the right side, changing my data type to this:
catchError :: IO (Either FibException String) -> (IOError -> IO (Either FibException String)) -> IO (Either FibException String)
I do this by, in the first parameter, simply wrapping the result into Right. I figure that I won't actually execute the return . Right branch of the code unless readFile path was successful. In the other parameter to catch, I start with an IOError, wrap it in Left, and then return it back into the IO context. After that, no matter what the result is, I lift the IO value up into the FibIOMonad context.
I'm bothered by the fact that the code gets even more nested. I have Left values, and all of those Left values get thrown. I'm basically in an Either context, and I had thought that one of the benefits Either's implementation of the Monad class was that Left values would simply be passed along through the binding operations and that no further code in that context would be executed. I would love some elucidation on this, or to see how the nesting can be removed from this function.
Maybe it can't. It does seem that the caller, however, can call verifyFibFile repeatedly and execution basically stops the first time verifyFibFile returns an error. This works:
runTest = do
res <- verifyFibFile "goodfib.txt"
liftIO $ putStrLn "goodfib.txt"
--liftIO $ printResult "goodfib.txt" res
res <- verifyFibFile "invalidValue.txt"
liftIO $ putStrLn "invalidValue.txt"
res <- verifyFibFile "formatError.txt"
liftIO $ putStrLn "formatError.txt"
Main> runErrorT $ runTest
goodfib.txt
Left (InvalidValue 17)
Given the files that I have created, both invalidValue.txt and formatError.txt cause errors, but this function returns Left (InvalidValue ...) for me.
That's okay, but I still feel like I've missed something with my solution. And I have no idea whether I'll be able to translate this into something that makes MongoDB access more robust.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How can I conditionally apply a conduit? - haskell

Related

Convert IO callback to infinite list

Streaming of a resource with proper acquisition and release

How to turn a pull based pipe into a push based one?

How can I write a pipe that sends downstream a list of what it receives from upstream?

How do I combine IOError exceptions with locally relevant exceptions?

Categories

Resources