Is there a pre-defined function for conduit analogy of `takeWhile`? - haskell

I find the following function missing from the Data.Conduit.List module, and I couldn't find an easy way to compose this using functions in that module.
takeWhile :: Monad m => (a -> Bool) -> Consumer a m [a]
takeWhile p = await >>= \case
Nothing -> return []
Just b -> if p b
then (b :) <$> takeWhile p
else (leftover b) >> return []
This function is very useful in my application where I sometimes need to group the next few items together, and I am not sure how many are there.
The missing of this function is kind of strange to me as there are take :: Monad m => Int -> Consumer a m [a], and groupBy :: Monad m => (a -> a -> Bool) -> Conduit a m [a], but no takeWhile.
Am I missing something?
Edit: Per #ErikR's request, here is two simple examples that can perhaps clarify why I think this function could be useful.
Case 1: the protocol specifies there be a header section in the stream. For simplicity let's assume it's a String stream and the header items are marked by a leading #.
Stream content:
#language=English
#encoding=Unicode
Apple
Orange
Blue
Red
Sheep
Dog
...
Code using takeWhile:
myConduit :: Conduit String IO String ()
myConduit = do
headers <- takeWhile ((== '#') . head)
awaitForever $ \ item -> do
case getLanguage headers of
English -> ...
French -> ...
Case 2: the protocol specifies that items with prefix # has several continuations prefixed by +.
Stream content:
Apple
Orange
Blue
#Has
+kell
#A
+Really
+Long
+Word
Dog
...
Code using takeWhile:
myConduit :: Conduit String IO String ()
myConduit = runMaybeC . forever $ do
a <- maybe (lift mzero) return =<< await
aConts <- if head item == '#' then takeWhile ((== '+') . head)
else return []
liftIO . putStrLn . concat $ a : aConts
However, aside from being useful, it is also for completeness. I see that Data.Conduit.List's goal is to provide a set of "list-like" operations in the Conduit context. I think bread-and-butter functions like takeWhile should be provided, along with its siblings like dropWhile, so that people don't have to change their style of coding when thinking about conduits as lists.

Related

How to convert a haskell List into a monadic function that uses list values for operations?

I am having trouble wrapping my head around making to work a conversion of a list into a monadic function that uses values of the list.
For example, I have a list [("dir1/content1", "1"), ("dir1/content11", "11"), ("dir2/content2", "2"), ("dir2/content21", "21")] that I want to be converted into a monadic function that is mapped to a following do statement:
do
mkBlob ("dir1/content1", "1")
mkBlob ("dir1/content11", "11")
mkBlob ("dir2/content2", "2")
mkBlob ("dir2/content21", "21")
I imagine it to be a function similar to this:
contentToTree [] = return
contentToTree (x:xs) = (mkBlob x) =<< (contentToTree xs)
But this does not work, failing with an error:
• Couldn't match expected type ‘() -> TreeT LgRepo m ()’
with actual type ‘TreeT LgRepo m ()’
• Possible cause: ‘(>>=)’ is applied to too many arguments
In the expression: (mkBlob x) >>= (contentToTree xs)
In an equation for ‘contentToTree’:
contentToTree (x : xs) = (mkBlob x) >>= (contentToTree xs)
• Relevant bindings include
contentToTree :: [(TreeFilePath, String)] -> () -> TreeT LgRepo m ()
I do not quite understand how to make it work.
Here is my relevant code:
import Data.Either
import Git
import Data.Map
import Conduit
import qualified Data.List as L
import qualified Data.ByteString.Char8 as BS
import qualified Data.ByteString.Lazy as BL
import Control.Monad (join)
type FileName = String
data Content = Content {
content :: Either (Map FileName Content) String
} deriving (Eq, Show)
contentToPaths :: String -> Content -> [(TreeFilePath, String)]
contentToPaths path (Content content) = case content of
Left m -> join $ L.map (\(k, v) -> (contentToPaths (if L.null path then k else path ++ "/" ++ k) v)) $ Data.Map.toList m
Right c -> [(BS.pack path, c)]
mkBlob :: MonadGit r m => (TreeFilePath, String) -> TreeT r m ()
mkBlob (path, content) = putBlob path
=<< lift (createBlob $ BlobStream $
sourceLazy $ BL.fromChunks [BS.pack content])
sampleContent = Content $ Left $ fromList [
("dir1", Content $ Left $ fromList [
("content1", Content $ Right "1"),
("content11", Content $ Right "11")
]),
("dir2", Content $ Left $ fromList [
("content2", Content $ Right "2"),
("content21", Content $ Right "21")
])
]
Would be grateful for any tips or help.
You have:
A list of values of some type a (in this case a ~ (String, String)). So, xs :: [a]
A function f from a to some type b in a monadic context, m b. Since you're ignoring the return value, we can imagine b ~ (). So, f :: Monad m => a -> m ().
You want to perform the operation, yielding some monadic context and an unimportant value, m (). So overall, we want some function doStuffWithList :: Monad m => [a] -> (a -> m ()) -> m (). We can search Hoogle for this type, and it yields some results. Unfortunately, as we've chosen to order the arguments, the first several results are little-used functions from other packages. If you scroll further, you start to find stuff in base - very promising. As it turns out, the function you are looking for is traverse_ :: (Foldable t, Applicative f) => (a -> f b) -> t a -> f (). With that, we can replace your do-block with just:
traverse_ mkBlob [ ("dir1/content1", "1")
, ("dir1/content11", "11")
, ("dir2/content2", "2")
, ("dir2/content21", "21")
]
As it happens there are many names for this function, some for historical reasons and some for stylistic reasons. mapM_, forM_, and for_ are all the same and all in base, so you could use any of these. But the M_ versions are out of favor these days because really you only need Applicative, not Monad; and the for versions take their arguments in an order that's convenient for lambdas but inconvenient for named functions. So, traverse_ is the one I'd suggest.
Assuming mkBlob is a function that looks like
mkBlob :: (String, String) -> M ()
where M is some specific monad, then you have the list
xs = [("dir1/content1", "1"), ("dir1/content11", "11"), ("dir2/content2", "2"), ("dir2/content21", "21")]
whose type is xs :: [(String, String)]. The first thing we need is to run the mkBlob function on each element, i.e. via map.
map mkBlob xs :: [M ()]
Now, we have a list of monadic actions, so we can use sequence to run them in sequence.
sequence (map mkBlob xs) :: M [()]
The resulting [()] value is all but useless, so we can use void to get rid of it
void . sequence . map mkBlob $ xs :: M ()
Now, void . sequence is called sequence_ in Haskell (since this pattern is fairly common), and sequence . map is called mapM. Putting the two together, the function you want is called mapM_.
mapM_ mkBlob xs :: M ()

Haskell: Replace mapM in a monad transformer stack to achieve lazy evaluation (no space leaks)

It has already been discussed that mapM is inherently not lazy, e.g. here and here. Now I'm struggling with a variation of this problem where the mapM in question is deep inside a monad transformer stack.
Here's a function taken from a concrete, working (but space-leaking) example using LevelDB that I put on gist.github.com:
-- read keys [1..n] from db at DirName and check that the values are correct
doRead :: FilePath -> Int -> IO ()
doRead dirName n = do
success <- runResourceT $ do
db <- open dirName defaultOptions{ cacheSize= 2048 }
let check' = check db def in -- is an Int -> ResourceT IO Bool
and <$> mapM check' [1..n] -- space leak !!!
putStrLn $ if success then "OK" else "Fail"
This function reads the values corresponding to keys [1..n] and checks that they are all correct. The troublesome line inside the ResourceT IO a monad is
and <$> mapM check' [1..n]
One solution would be to use streaming libraries such as pipes, conduit, etc. But these seem rather heavy and I'm not at all sure how to use them in this situation.
Another path I looked into is ListT as suggested here. But the type signatures of ListT.fromFoldable :: [Bool]->ListT Bool and ListT.fold :: (r -> a -> m r) -> r -> t m a -> mr (where m=IO and a,r=Bool) do not match the problem at hand.
What is a 'nice' way to get rid of the space leak?
Update: Note that this problem has nothing to do with monad transformer stacks! Here's a summary of the proposed solutions:
1) Using Streaming:
import Streaming
import qualified Streaming.Prelude as S
S.all_ id (S.mapM check' (S.each [1..n]))
2) Using Control.Monad.foldM:
foldM (\a i-> do {b<-check' i; return $! a && b}) True [1..n]
3) Using Control.Monad.Loops.allM
allM check' [1..n]
I know you mention you don't want to use streaming libraries, but your problem seems pretty easy to solve with streaming without changing the code too much.
import Streaming
import qualified Streaming.Prelude as S
We use each [1..n] instead of [1..n] to get a stream of elements:
each :: (Monad m, Foldable f) => f a -> Stream (Of a) m ()
Stream the elements of a pure, foldable container.
(We could also write something like S.take n $ S.enumFrom 1).
We use S.mapM check' instead of mapM check':
mapM :: Monad m => (a -> m b) -> Stream (Of a) m r -> Stream (Of b) m r
Replace each element of a stream with the result of a monadic action
And then we fold the stream of booleans with S.all_ id:
all_ :: Monad m => (a -> Bool) -> Stream (Of a) m r -> m Bool
Putting it all together:
S.all_ id (S.mapM check' (S.each [1..n]))
Not too different from the code you started with, and without the need for any new operator.
I think what you need is allM from the monad-loops package.
Then it would be just allM check' [1..n]
(Or if you don't want the import it's a pretty small function to copy.)

Chaining functions of type IO (Maybe a )

I am writing a small library for interacting with a few external APIs. One set of functions will construct a valid request to the yahoo api and parse the result to a data type. Another set of functions will look up the users current location based on IP and return a data type representing the current location. While the code works, it seems having to explicitly pattern match to sequence multiple functions of type IO (Maybe a).
-- Yahoo API
constructQuery :: T.Text -> T.Text -> T.Text
constructQuery city state = "select astronomy, item.condition from weather.forecast" <>
" where woeid in (select woeid from geo.places(1)" <>
" where text=\"" <> city <> "," <> state <> "\")"
buildRequest :: T.Text -> IO ByteString
buildRequest yql = do
let root = "https://query.yahooapis.com/v1/public/yql"
datatable = "store://datatables.org/alltableswithkeys"
opts = defaults & param "q" .~ [yql]
& param "env" .~ [datatable]
& param "format" .~ ["json"]
r <- getWith opts root
return $ r ^. responseBody
run :: T.Text -> IO (Maybe Weather)
run yql = buildRequest yql >>= (\r -> return $ decode r :: IO (Maybe Weather))
-- IP Lookup
getLocation:: IO (Maybe IpResponse)
getLocation = do
r <- get "http://ipinfo.io/json"
let body = r ^. responseBody
return (decode body :: Maybe IpResponse)
-- Combinator
runMyLocation:: IO (Maybe Weather)
runMyLocation = do
r <- getLocation
case r of
Just ip -> getWeather ip
_ -> return Nothing
where getWeather = (run . (uncurry constructQuery) . (city &&& region))
Is it possible to thread getLocation and run together without resorting to explicit pattern matching to "get out" of the Maybe Monad?
You can happily nest do blocks that correspond to different monads, so it's just fine to have a block of type Maybe Weather in the middle of your IO (Maybe Weather) block.
For example,
runMyLocation :: IO (Maybe Weather)
runMyLocation = do
r <- getLocation
return $ do ip <- r; return (getWeather ip)
where
getWeather = run . (uncurry constructQuery) . (city &&& region)
This simple pattern do a <- r; return f a indicates that you don't need the monad instance for Maybe at all though - a simple fmap is enough
runMyLocation :: IO (Maybe Weather)
runMyLocation = do
r <- getLocation
return (fmap getWeather r)
where
getWeather = run . (uncurry constructQuery) . (city &&& region)
and now you see that the same pattern appears again, so you can write
runMyLocation :: IO (Maybe Weather)
runMyLocation = fmap (fmap getWeather) getLocation
where
getWeather = run . (uncurry constructQuery) . (city &&& region)
where the outer fmap is mapping over your IO action, and the inner fmap is mapping over your Maybe value.
I misinterpreted the type of getWeather (see comment below) such that you will end up with IO (Maybe (IO (Maybe Weather))) rather than IO (Maybe Weather).
What you need is a "join" through a two layer monad stack. This is essentially what a monad transformer provides for you (see #dfeuer's answer) but it is possible to write this combinator manually in the case of Maybe -
import Data.Maybe (maybe)
flatten :: (Monad m) => m (Maybe (m (Maybe a))) -> m (Maybe a)
flatten m = m >>= fromMaybe (return Nothing)
in which case you can write
runMyLocation :: IO (Maybe Weather)
runMyLocation = flatten $ fmap (fmap getWeather) getLocation
where
getWeather = run . (uncurry constructQuery) . (city &&& region)
which should have the correct type. If you are going to chain multiple functions like this, you will need multiple calls to flatten, in which case it maybe be easier to build a monad transformer stack instead (with the caveat's in #dfeuer's answer).
There is probably a canonical name for the function I've called "flatten" in the transformers or mtl libraries, but I can't find it at the moment.
Note that the function fromMaybe from Data.Maybe essentially does the case analysis for you, but abstracts it into a function.
Some consider this an anti-pattern, but you could use MaybeT IO a instead of IO (Maybe a). The problem is that you only deal with one of the ways getLocation can fail—it could also throw an IO exception. From that perspective, you might as well drop the Maybe and just throw your own exception if decoding fails, catching it wherever you like.
change getWeather to have Maybe IpResponse->IO.. and use >>= to implement it and then you can do getLocation >>= getWeather. The >>= in getWeather is the one from Maybe, that will deal with Just and Nothing and the other getLocation>>= getWeather the one from IO.
you can even abstract from Maybe and use any Monad: getWeather :: Monad m -> m IpResponse -> IO .. and will work.

How can I write a pipe that sends downstream a list of what it receives from upstream?

I'm having a hard time to write a pipe with this signature:
toOneBigList :: (Monad m, Proxy p) => () -> Pipe p a [a] m r
It should simply take all as from upstream and send them in a list downstream.
All my attempts look fundamentally broken.
Can anybody point me in the right direction?
There are two pipes-based solutions and I'll let you pick which one you prefer.
Note: It's not clear why you output the list on the downstream interface instead of just returning it directly as a result.
Conduit-style
The first one, which is very close to the conduit-based solution uses the upcoming pipes-pase, which is basically complete and just needs documentation. You can find the latest draft on Github.
Using pipes-parse, the solution is identical to the conduit solution that Petr gave:
import Control.Proxy
import Control.Proxy.Parse
combine
:: (Monad m, Proxy p)
=> () -> Pipe (StateP [Maybe a] p) (Maybe a) [a] m ()
combine () = loop []
where
loop as = do
ma <- draw
case ma of
Nothing -> respond (reverse as)
Just a -> loop (a:as)
draw is like conduit's await: it requests a value from either the leftovers buffer (that's the StateP part) or from upstream if the buffer is empty. Nothing indicates end of file.
You can wrap a pipe that does not have an end of file signal using the wrap function from pipes-parse, which has type:
wrap :: (Monad m, Proxy p) => p a' a b' b m r -> p a' a b' (Maybe b) m s
Classic Pipes Style
The second alternative is a bit simpler. If you want to fold a given pipe you can do so directly using WriterP:
import Control.Proxy
import Control.Proxy.Trans.Writer
foldIt
:: (Monad m, Proxy p) =>
(() -> Pipe p a b m ()) -> () -> Pipe p a [b] m ()
foldIt p () = runIdentityP $ do
r <- execWriterK (liftP . p >-> toListD >-> unitU) ()
respond r
That's a higher-level description of what is going on, but it requires passing in the pipe as an explicit argument. It's up to you which one you prefer.
By the way, this is why I was asking why you want to send a single value downstream. The above is much simpler if you return the folded list:
foldIt p = execWriterK (liftP . p >-> toListD)
The liftP might not even be necessary if p is completely polymorphic in its proxy type. I only include it as a precaution.
Bonus Solution
The reason pipes-parse does not provide the toOneBigList is that it's always a pipes anti-pattern to group the results into a list. pipes has several nice features that make it possible to never have to group the input into a list, even if you are trying to yield multiple lists. For example, using respond composition you can have a proxy yield the subset of the stream it would have traversed and then inject a handler that uses that subset:
example :: (Monad m, Proxy p) => () -> Pipe p a (() -> Pipe p a a m ()) m r
example () = runIdentityP $ forever $ do
respond $ \() -> runIdentityP $ replicateM_ 3 $ request () >>= respond
printIt :: (Proxy p, Show a) => () -> Pipe p a a IO r
printIt () = runIdentityP $ do
lift $ putStrLn "Here we go!"
printD ()
useIt :: (Proxy p, Show a) => () -> Pipe p a a IO r
useIt = example />/ (\p -> (p >-> printIt) ())
Here's an example of how to use it:
>>> runProxy $ enumFromToS 1 10 >-> useIt
Here we go!
1
2
3
Here we go!
4
5
6
Here we go!
7
8
9
Here we go!
10
This means you never need to bring a single element into memory even when you need to group elements.
I'll give only a partial answer, perhaps somebody else will have a better one.
As far as I know, standard pipes have no mechanism of detecting when the other part of the pipeline terminates. The first pipe that terminates produces the final result of the pipe-line and all the others are just dropped. So if you have a pipe that consumes input forever (to eventually produce a list), it will have no chance acting and producing output when its upstream finishes. (This is intentional so that both up- and down-stream parts are dual to each other.) Perhaps this is solved in some library building on top of pipes.
The situation is different with conduit. It has consume function that combines all inputs into a list and returns (not outputs) it. Writing a function like the one you need, that outputs the list at the end, is not difficult:
import Data.Conduit
combine :: (Monad m) => Conduit a m [a]
combine = loop []
where
loop xs = await >>= maybe (yield $ reverse xs) (loop . (: xs))

Unwrapping a monad

Given the below program, I am having issues dealing with monads.
module Main
where
import System.Environment
import System.Directory
import System.IO
import Text.CSV
--------------------------------------------------
exister :: String -> IO Bool
exister path = do
fileexist <- doesFileExist path
direxist <- doesDirectoryExist path
return (fileexist || direxist )
--------------------------------------------------
slurp :: String -> IO String
slurp path = do
withFile path ReadMode (\handle -> do
contents <- hGetContents handle
last contents `seq` return contents )
--------------------------------------------------
main :: IO ()
main = do
[csv_filename] <- getArgs
putStrLn (show csv_filename)
csv_raw <- slurp csv_filename
let csv_data = parseCSV csv_filename csv_raw
printCSV csv_data -- unable to compile.
csv_data is an Either (parseerror) CSV type, and printCSV takes only CSV data.
Here's the ediff between the working version and the broken version.
***************
*** 27,30 ****
csv_raw <- slurp csv_filename
let csv_data = parseCSV csv_filename csv_raw
! printCSV csv_data -- unable to compile.
\ No newline at end of file
--- 27,35 ----
csv_raw <- slurp csv_filename
let csv_data = parseCSV csv_filename csv_raw
! case csv_data of
! Left error -> putStrLn $ show error
! Right csv_data -> putStrLn $ printCSV csv_data
!
! putStrLn "done"
!
reference: http://hackage.haskell.org/packages/archive/csv/0.1.2/doc/html/Text-CSV.html
Regarding monads:
Yes, Either a is a monad. So simplifying the problem, you are basically asking for this:
main = print $ magicMonadUnwrap v
v :: Either String Int
v = Right 3
magicMonadUnwrap :: (Monad m) => m a -> a
magicMonadUnwrap = undefined
How do you define magicMonadUnwrap? Well, you see, it's different for each monad. Each one needs its own unwrapper. Many of these have the word "run" in them, for example, runST, runCont, or runEval. However, for some monads, it might not be safe to unwrap them (hence the need for differing unwrappers).
One implementation for lists would be head. But what if the list is empty? An unwrapper for Maybe is fromJust, but what if it's Nothing?
Similarly, the unwrapper for the Either monad would be something like:
fromRight :: Either a b -> b
fromRight (Right x) = x
But this unwrapper isn't safe: what if you had a Left value instead? (Left usually represents an error state, in your case, a parse error). So the best way to act upon an Either value it is to use the either function, or else use a case statement matching Right and Left, as Daniel Wagner illustrated.
tl;dr: there is no magicMonadUnwrap. If you're inside that same monad, you can use <-, but to truly extract the value from a monad...well...how you do it depends on which monad you're dealing with.
Use case.
main = do
...
case csv_data of
Left err -> {- whatever you're going to do with an error -- print it, throw it as an exception, etc. -}
Right csv -> printCSV csv
The either function is shorter (syntax-wise), but boils down to the same thing.
main = do
...
either ({- error condition function -}) printCSV csv_data
You must unlearn what you have learned.
Master Yoda.
Instead of thinking about, or searching for ways to "free", "liberate", "release", "unwrap" or "extract" normal Haskell values from effect-centric (usually monadic) contexts, learn how to use one of Haskell's more distinctive features - functions are first-class values:
you can use functions like values of other types e.g. like Bool, Char, Int, Integer etc:
arithOps :: [(String, Int -> Int -> Int)]
arithOps = zip ["PLUS","MINUS", "MULT", "QUOT", "REM"]
[(+), (-), (*), quot, rem]
For your purposes, what's more important is that functions can also be used as arguments e.g:
map :: (a -> b) -> [a] -> [b]
map f xs = [ f x | x <- xs ]
filter :: (a -> Bool) -> [a] -> [a]
filter p xs = [ x | x <- xs, p x ]
These higher-order functions are even available for use in effect-bearing contexts e.g:
import Control.Monad
liftM :: Monad m => (a -> b) -> (m a -> m b)
liftM2 :: Monad m => (a -> b -> c) -> (m a -> m b -> m c)
liftM3 :: Monad m => (a -> b -> c -> d) -> (m a -> m b -> m c -> m d)
...etc, which you can use to lift your regular Haskell functions:
do .
.
.
val <- liftM3 calculate this_M that_M other_M
.
.
.
Of course, the direct approach also works:
do .
.
.
x <- this_M
y <- that_M
z <- other_M
let val = calculate x y z
.
.
.
As your skills develop, you'll find yourself delegating more and more code to ordinary functions and leaving the effects to a vanishingly-small set of entities defined in terms of functors, applicatives, monads, arrows, etc as you progress towards Haskell mastery.
You're not convinced? Well, here's a brief note of how effects used to be handled in Haskell - there's also a longer description of how Haskell arrived at the monadic interface. Alternately, you could look at Standard ML, OCaml, and other similar languages - who knows, maybe you'll be happier with using them...

Resources