Signalling to downstream that upstream is exhausted - haskell

Question
Using the Haskell pipes library, I'm trying to define a Pipe with the following type:
signalExhausted :: Monad m => Pipe a (Value a) m r
where the Value data type is defined by:
data Value a = Value a | Exhausted
The pipe should obey the following laws:
toList (each [] >-> signalExhausted) == [Exhausted]
toList (each xs >-> signalExhausted) == map Value xs ++ [Exhausted]
In other words, the pipe should be equivalent to Pipes.Prelude.map Value, except that it should yield an additional Exhausted after all upstream values have been processed, giving downstream a chance to perform some final action.
Can such a Pipe be defined?
Example
> let xs = words "hubble bubble toil and trouble"
> toList $ each xs >-> signalExhausted
[Value "hubble", Value "bubble", Value "toil", Value "and", Value "trouble", Exhausted]
Notes
I'm aware that the pipes-parse library provides the functions draw and parseForever. These look useful, but I can't quite see how to combine them into a Pipe that matches the specification above.

A pipe like signalExhausted can't be defined, but a function equivalent to (>-> signalExhausted) can.
>-> is a specialized version of the pull category. Execution is driven by the downstream proxies pulling data from upstream proxies. The downstream proxy sends an empty request () upstream and blocks until a response holding a value comes back from the upstream proxy. When the upstream proxy is exhausted and doesn't have any more values to send back, it returns. You can see the return that matters for these examples in the definition of each.
each = F.foldr (\a p -> yield a >> p) (return ())
-- what to do when the data's exhausted ^
The downstream proxy needs a value to continue running, but there's no value the pipes library can possibly provide it, so the downstream proxy never runs again. Since it never runs again, there's no way it can modify or react to the data.
There are two solutions to this problem. The simplest is to map Value over the upstream pipe and add a yield Exhausted after it's done.
import Pipes
import qualified Pipes.Prelude as P
data Value a = Value a | Exhausted
deriving (Show)
signalExhausted p = p >-> P.map Value >> yield Exhausted
This does exactly what you're looking for except the function signalExhausted takes the place of (>-> signalExhausted).
let xs = words "hubble bubble toil and trouble"
print . P.toList . signalExhausted $ each xs
[Value "hubble",Value "bubble",Value "toil",Value "and",Value "trouble",Exhausted]
The more general solution to this problem is to stop the upstream proxy from returning and instead signal downstream when it is exhausted. I demonstrated how to do so in an answer to a related question.
import Control.Monad
import Pipes.Core
returnDownstream :: Monad m => Proxy a' a b' b m r -> Proxy a' a b' (Either r b) m r'
returnDownstream = (forever . respond . Left =<<) . (respond . Right <\\)
This replaces each respond with respond . Right and replaces return with forever . respond . left, sending returns downstream along with responses.
returnDownstream is more general than what you are looking for. We can demonstrate how to use it to recreate signalExhausted. returnDownstream transforms a pipe that returns into one that never returns, and instead forwards its return value downstream as the Left value of an Either.
signalExhausted p = returnDownstream p >-> respondLeftOnce
respondLeftOnce is an example downstream proxy. The downstream proxy can discern between regular values held in Right and the return value held in Left.
respondLeftOnce :: Monad m => Pipe (Either e a) (Value a) m ()
respondLeftOnce = go
where
go = do
ea <- await
case ea of
Right a -> yield (Value a) >> go
Left _ -> yield Exhausted -- The upstream proxy is exhausted; do something else

Related

Haskell sequence of IO actions processing with filtration their results in realtime+perfoming some IO actions in certain moments

I want to do some infinite sequence of IO actions processing with filtration their results in realtime+perfoming some IO actions in certain moments:
We have some function for reducing sequences (see my question haskell elegant way to filter (reduce) sequences of duplicates from infinte list of numbers):
f :: Eq a => [a] -> [a]
f = map head . group
and expression
join $ sequence <$> ((\l -> (print <$> l)) <$> (f <$> (sequence $ replicate 6 getLine)))
if we run this, user can generate any seq of numbers, for ex:
1
2
2
3
3
"1"
"2"
"3"
[(),(),()]
This means that at first all getLine actions performed (6 times in the example and at the end of this all IO actions for filtered list performed, but I want to do IO actions exactly in the moments then sequencing reduces done for some subsequences of same numbers.
How can I archive this output:
1
2
"1"
2
3
"2"
3
3
"3"
[(),(),()]
So I Want this expression not hangs:
join $ sequence <$> ((\l -> (print <$> l)) <$> (f <$> (sequence $ repeat getLine)))
How can I archive real-time output as described above without not blocking it on infinite lists?
Without a 3rd-party library, you can lazily read the contents of standard input, appending a dummy string to the end of the expected input to force output. (There's probably a better solution that I'm stupidly overlooking.)
import System.IO
print_unique :: (String, String) -> IO ()
print_unique (last, current) | last == current = return ()
| otherwise = print last
main = do
contents <- take 6 <$> lines <$> hGetContents stdin
traverse print_unique (zip <*> tail $ (contents ++ [""]))
zip <*> tail produces tuples consisting of the ith and i+1st lines without blocking. print_unique then immediately outputs a line if the following line is different.
Essentially, you are sequencing the output actions as the input is executed, rather than sequencing the input actions.
This seems like a job for a streaming library, like streaming.
{-# LANGUAGE ImportQualifiedPost #-}
module Main where
import Streaming
import Streaming.Prelude qualified as S
main :: IO ()
main =
S.mapM_ print
. S.catMaybes
. S.mapped S.head
. S.group
$ S.replicateM 6 getLine
"streaming" has an API reminiscent to that of lists, but works with effectful sequences.
The nice thing about streaming's version of group is that it doesn't force you to keep the whole group in memory if it isn't needed.
The least intuitive function in this answer is mapped, because it's very general. It's not obvious that streaming's version of head fits as its parameter. The key idea is that the Stream type can represent both normal effectful sequences, and sequences of elements on which groups have been demarcated. This is controlled by changing a functor type parameter (Of in the first case, a nested Stream (Of a) m in the case of grouped Streams).
mapped let's you transform that functor parameter while having some effect in the underlying monad (here IO). head processes the inner Stream (Of a) m groups, getting us back to an Of (Maybe a) functor parameter.
I found a nice solution with iterateUntilM
iterateUntilM (\_->False) (\pn -> getLine >>= (\n -> if n==pn then return n else (if pn/="" then print pn else return ()) >> return n) ) ""
I don't like some verbose with
(if pn/="" then print pn else return ())
if you know how to reduce this please comment)
ps.
It is noteworthy that I made a video about this function :)
And could not immediately apply it :(

Haskell Pipes -- Having a pipe consume what it yields (itself)

I'm trying to write a webscraper using Pipes and I've come to the part of following scraped links. I have a process function that downloads a url, finds links, and yields them.
process :: Pipe Item Item (StateT CState IO) ()
....
for (each links) yield
....
Now I want to some how recursively follow these links, threading the StateT through. I realize there is probably doing something more idiomatic then using a single pipe for the bulk of the scraper (especially as I start adding more features), I'm open for suggestions. I'm probably going to have to rethink the design when I consider multithreading w/ shared state anyways.
You can connect a Pipe a b m r to a side-effect through the m parameter, which swaps out which Monad the pipe is operating over. You can use this to requeue links by connecting the downstream end of your pipe to another pipe that sticks the links in a queue and connecting the upstream end of your pipe to a pipe that reads links from the queue.
Our goal is to write
import Pipes
loopLeft :: Monad m => Pipe (Either l a) (Either l b) m r -> Pipe a b m r
We'll take a pipe whose downstream output, Either l b, is either an Left l to send back upstream or a Right b to send downstream, and send the ls back in the upstream input Either l a, which is either a queued up Left l or a Right a coming from upstream. We'll connect the Left ls together to make a pipe that only sees as coming from upstream and only yields bs headed downstream.
At the downstream end we'll push the ls from Left l onto a stack. We yield the rs from Right r downstream.
import Control.Monad
import Control.Monad.Trans.State
pushLeft :: Monad m => Pipe (Either l a) a (StateT [l] m) r
pushLeft = forever $ do
o <- await
case o of
Right a -> yield a
Left l -> do
stack <- lift get
lift $ put (l : stack)
At the upstream end we'll look for something on top of the stack to yield. If there isn't one, we'll await for a value from upstream and yield it.
popLeft :: Monad m => Pipe a (Either l a) (StateT [l] m) r
popLeft = forever $ do
stack <- lift get
case stack of
[] -> await >>= yield . Right
(x : xs) -> do
lift $ put xs
yield (Left x)
Now we can write loopLeft. We compose the upstream and downstream pipes together with pipe composition popLeft >-> hoist lift p >-> pushLeft. The hoist lift turns a Pipe a b m r into a Pipe a b (t m) r. The distribute turns a Pipe a b (t m) r into a t (Pipe a b m) r. To get back to a Pipe a b m r we run the whole StateT computation starting with an empty stack []. In Pipes.Lift there's a nice name evalStateP for the combination of evalStateT and distribute.
import Pipes.Lift
loopLeft :: Monad m => Pipe (Either l a) (Either l b) m r -> Pipe a b m r
loopLeft p = flip evalStateT [] . distribute $ popLeft >-> hoist lift p >-> pushLeft
I would do it like this:
import Pipes
type Url = String
getLinks :: Url -> IO [Url]
getLinks = undefined
crawl :: MonadIO m => Pipe Url Url m a
crawl = loop []
where
loop [] = do url <- await; loop [url]
loop (url:urls) = do
yield url
urls' <- liftIO $ getLinks url
loop (urls ++ urls')
You can achieve DFS or BFS depending on how you combine url' with urls.

How to write a Haskell Pipes "sum" function?

I'm trying to learn the pipes package by writing my own sum function and I'm getting stumped. I'd like to not use the utility functions from Pipes.Prelude (since it has sum and fold and other functions which make it trivial) and only use the information as described in Pipes.Tutorial. The tutorial doesn't talk about the constructors of Proxy, but if I look in the source of sum and fold it uses those constructors and I wonder whether it is possible to write my sum function without knowledge of these low level details.
I'm having trouble coming to terms with how this function would be able to continue taking in values as long as there are values available, and then somehow return that sum to the user. I guess the type would be:
sum' :: Monad m => Consumer Int m Int
It appears to me this could work because this function could consume values until there are no more, then return the final sum. I would use it like this:
mysum <- runEffect $ inputs >-> sum'
However, the function in Pipes.Prelude has the following signature instead:
sum :: (Monad m, Num a) => Producer a m () -> m a
So I guess this is my first hurdle. Why does the sum function take a Producer as an argument as opposed to using >-> to connect?
FYI I ended up with the following after the answer from danidiaz:
sum' = go 0
where
go n p = next p >>= \x -> case x of
Left _ -> return n
Right (_, p') -> go (n + 1) p'
Consumers are actually quite limited in what they can do. They can't detect end-of-input (pipes-parse uses a different technique for that) and when some other part of the pipeline stops (for example the Producer upstream) that part is the one that must provide the result value for the pipeline. So putting the sum in the return value of the Consumer won't work in general.
Some alternatives are:
Implement a function that deals directly with Producer internals, or perhaps uses an auxiliary function like next. There are adapters of this type that can feed Producer data to "smarter" consumers, like Folds from the foldl package.
Keep using a Consumer, but instead of putting the sum in the return value of the Consumer, use a WriterT as the base monad with a Sum Int monoid as accumulator. That way, even if the Producer stop first, you can still run the writer to get to the accumulator This solution is likely to be less efficient, though.
Example code for the WriterT approach:
import Data.Monoid
import Control.Monad
import Control.Monad.Trans.Writer
import Pipes
producer :: Monad m => Producer Int m ()
producer = mapM_ yield [1..10]
summator :: Monad n => Consumer Int (WriterT (Sum Int) n) ()
summator = forever $ await >>= lift . tell . Sum
main :: IO ()
main = do
Sum r <- execWriterT . runEffect $ producer >-> summator
print r

How can I write a pipe that sends downstream a list of what it receives from upstream?

I'm having a hard time to write a pipe with this signature:
toOneBigList :: (Monad m, Proxy p) => () -> Pipe p a [a] m r
It should simply take all as from upstream and send them in a list downstream.
All my attempts look fundamentally broken.
Can anybody point me in the right direction?
There are two pipes-based solutions and I'll let you pick which one you prefer.
Note: It's not clear why you output the list on the downstream interface instead of just returning it directly as a result.
Conduit-style
The first one, which is very close to the conduit-based solution uses the upcoming pipes-pase, which is basically complete and just needs documentation. You can find the latest draft on Github.
Using pipes-parse, the solution is identical to the conduit solution that Petr gave:
import Control.Proxy
import Control.Proxy.Parse
combine
:: (Monad m, Proxy p)
=> () -> Pipe (StateP [Maybe a] p) (Maybe a) [a] m ()
combine () = loop []
where
loop as = do
ma <- draw
case ma of
Nothing -> respond (reverse as)
Just a -> loop (a:as)
draw is like conduit's await: it requests a value from either the leftovers buffer (that's the StateP part) or from upstream if the buffer is empty. Nothing indicates end of file.
You can wrap a pipe that does not have an end of file signal using the wrap function from pipes-parse, which has type:
wrap :: (Monad m, Proxy p) => p a' a b' b m r -> p a' a b' (Maybe b) m s
Classic Pipes Style
The second alternative is a bit simpler. If you want to fold a given pipe you can do so directly using WriterP:
import Control.Proxy
import Control.Proxy.Trans.Writer
foldIt
:: (Monad m, Proxy p) =>
(() -> Pipe p a b m ()) -> () -> Pipe p a [b] m ()
foldIt p () = runIdentityP $ do
r <- execWriterK (liftP . p >-> toListD >-> unitU) ()
respond r
That's a higher-level description of what is going on, but it requires passing in the pipe as an explicit argument. It's up to you which one you prefer.
By the way, this is why I was asking why you want to send a single value downstream. The above is much simpler if you return the folded list:
foldIt p = execWriterK (liftP . p >-> toListD)
The liftP might not even be necessary if p is completely polymorphic in its proxy type. I only include it as a precaution.
Bonus Solution
The reason pipes-parse does not provide the toOneBigList is that it's always a pipes anti-pattern to group the results into a list. pipes has several nice features that make it possible to never have to group the input into a list, even if you are trying to yield multiple lists. For example, using respond composition you can have a proxy yield the subset of the stream it would have traversed and then inject a handler that uses that subset:
example :: (Monad m, Proxy p) => () -> Pipe p a (() -> Pipe p a a m ()) m r
example () = runIdentityP $ forever $ do
respond $ \() -> runIdentityP $ replicateM_ 3 $ request () >>= respond
printIt :: (Proxy p, Show a) => () -> Pipe p a a IO r
printIt () = runIdentityP $ do
lift $ putStrLn "Here we go!"
printD ()
useIt :: (Proxy p, Show a) => () -> Pipe p a a IO r
useIt = example />/ (\p -> (p >-> printIt) ())
Here's an example of how to use it:
>>> runProxy $ enumFromToS 1 10 >-> useIt
Here we go!
1
2
3
Here we go!
4
5
6
Here we go!
7
8
9
Here we go!
10
This means you never need to bring a single element into memory even when you need to group elements.
I'll give only a partial answer, perhaps somebody else will have a better one.
As far as I know, standard pipes have no mechanism of detecting when the other part of the pipeline terminates. The first pipe that terminates produces the final result of the pipe-line and all the others are just dropped. So if you have a pipe that consumes input forever (to eventually produce a list), it will have no chance acting and producing output when its upstream finishes. (This is intentional so that both up- and down-stream parts are dual to each other.) Perhaps this is solved in some library building on top of pipes.
The situation is different with conduit. It has consume function that combines all inputs into a list and returns (not outputs) it. Writing a function like the one you need, that outputs the list at the end, is not difficult:
import Data.Conduit
combine :: (Monad m) => Conduit a m [a]
combine = loop []
where
loop xs = await >>= maybe (yield $ reverse xs) (loop . (: xs))

Reading in arbitrary amount of binary messages

I am parsing binary data out of files using Binary.Get and have something like the following:
data FileMessageHeaders = FileMessageHeaders [FileMessageHeader]
data FileMessageHeader = FileMessageHeader ...
instance Binary FileMessageHeaders where
put = undefined
get = do
messages <- untilM get isEmpty
return (FileMessageHeaders messages)
instance Binary FileMessageHeader where
put = undefined
get = ..
The problem I am having is that the untilM from monad-loops on hackage uses sequence so I believe that this is what is causing a massive delay in returning the head of the FileMessageHeader list as the whole file must be read (is this correct?). I am having trouble coming up with a way to rewrite this and avoid sequencing all of the FileMessageHeaders in the file. Any suggestions?
Thanks!
As FUZxxl notes, the problem is untilM; the Get monad is strict and requires that the entire untilM action completes before it returns. IO has nothing to do with it.
The easiest thing to do is probably switch to attoparsec and use that for parsing instead of binary. Attoparsec supports streaming parses and would likely be much easier to use for this case.
If you can't switch to attoparsec, you'll need to use some of the lower-level functions of binary rather than just using the Binary instance. Something like the following (completely untested).
getHeaders :: ByteString -> [FileMessageHeader]
getHeaders b = go b 0
where
go bs n
| B.null bs = []
| otherwise = let (header, bs', n') = runGetState get bs n
in header : go bs' n'
Unfortunately this means you won't be able to use the Binary instance or the get function, you'll have to use getHeaders. It will stream though.
The problem here is, that an IO action has to finish before the control flow can continue. Thus, the program has to read in all the messages, before they get evaluated. You could try to define an own combinator sequenceI, that uses the function unsafeInterleaveIO from System.IO.Unsafe. This function allows you, well, to interleave actions. It is used, for instance by getContents. I would define sequenceI like this:
sequenceI (x:xs) = do v <- x
vs <- unsafeInterleaveIO $ sequenceI xs
return (v:vs)
On top of this combinator, you can define your own untilM, that streams. Doing this is left as an excercise to the reader.
Edit (corrected for compilation)
This is a proof-of-concept, untested implementation of untilM:
untilMI f p = do
f' <- f
p' <- p
if p'
then return [f']
else do g' <- unsafeInterleaveIO $ untilMI f p
return (f' : g')

Resources