How to write a Haskell Pipes "sum" function? - haskell

I'm trying to learn the pipes package by writing my own sum function and I'm getting stumped. I'd like to not use the utility functions from Pipes.Prelude (since it has sum and fold and other functions which make it trivial) and only use the information as described in Pipes.Tutorial. The tutorial doesn't talk about the constructors of Proxy, but if I look in the source of sum and fold it uses those constructors and I wonder whether it is possible to write my sum function without knowledge of these low level details.
I'm having trouble coming to terms with how this function would be able to continue taking in values as long as there are values available, and then somehow return that sum to the user. I guess the type would be:
sum' :: Monad m => Consumer Int m Int
It appears to me this could work because this function could consume values until there are no more, then return the final sum. I would use it like this:
mysum <- runEffect $ inputs >-> sum'
However, the function in Pipes.Prelude has the following signature instead:
sum :: (Monad m, Num a) => Producer a m () -> m a
So I guess this is my first hurdle. Why does the sum function take a Producer as an argument as opposed to using >-> to connect?
FYI I ended up with the following after the answer from danidiaz:
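import Pipes
sum' :: (Monad m, Num a) => Producer a m () -> m a
sum' = go 0
  where
    -- next reports either that the producer is done (Left) or the next
    -- value together with the rest of the producer (Right)
    go n p = next p >>= \x -> case x of
      Left _ -> return n
      Right (a, p') -> go (n + a) p'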

Consumers are actually quite limited in what they can do. They can't detect end-of-input (pipes-parse uses a different technique for that), and when some other part of the pipeline stops (for example, the upstream Producer), that part is the one that must provide the result value for the pipeline. So putting the sum in the return value of the Consumer won't work in general.
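To see that rule in action, here is a minimal sketch (assuming pipes 4.x; `sumN`, `numbers` and `pipelineResult` are hypothetical names) in which a Consumer does stop on its own after a fixed number of values. Whichever side finishes first supplies the pipeline's result. This is also why the Producer here has to return an Int: `>->` requires both ends of the pipeline to share one return type.

```haskell
import Data.Functor.Identity (runIdentity)
import Pipes

-- Hypothetical helper: a Consumer that sums exactly k values and then returns.
-- Because it stops on its own, it can supply the pipeline's result.
sumN :: Monad m => Int -> Consumer Int m Int
sumN = go 0
  where
    go acc 0 = return acc
    go acc k = do
      x <- await
      go (acc + x) (k - 1)

-- A Producer whose own return value is -1, so we can tell who finished first.
numbers :: Monad m => Producer Int m Int
numbers = each [1 .. 10] >> return (-1)

-- Run the pipeline purely and report whichever side's result wins.
pipelineResult :: Int -> Int
pipelineResult k = runIdentity (runEffect (numbers >-> sumN k))

main :: IO ()
main = do
  print (pipelineResult 3)    -- the Consumer stops first: 1 + 2 + 3
  print (pipelineResult 100)  -- the Producer runs out first: its -1 wins
```

With `sumN 100` the Consumer is still waiting for input when the Producer returns, so its partial sum is simply discarded.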
Some alternatives are:
Implement a function that deals directly with Producer internals, or perhaps uses an auxiliary function like next. There are adapters of this type that can feed Producer data to "smarter" consumers, like Folds from the foldl package.
Keep using a Consumer, but instead of putting the sum in the return value of the Consumer, use a WriterT as the base monad with a Sum Int monoid as accumulator. That way, even if the Producer stops first, you can still run the writer to get to the accumulator. This solution is likely to be less efficient, though.
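For the first alternative, a minimal sketch of the foldl adapter idea, assuming the foldl and pipes packages: `L.purely` unpacks a `Fold` into the step/initial/extract triple that `Pipes.Prelude.fold` expects. The names `sumP` and `sumAndLength` are hypothetical.

```haskell
import qualified Control.Foldl as L
import Data.Functor.Identity (runIdentity)
import Pipes
import qualified Pipes.Prelude as P

-- L.purely feeds a foldl Fold's (step, begin, done) functions to P.fold,
-- which drives the Producer until it is exhausted.
sumP :: (Monad m, Num a) => Producer a m () -> m a
sumP = L.purely P.fold L.sum

-- Folds combine applicatively, so this computes both results in one pass.
sumAndLength :: Monad m => Producer Int m () -> m (Int, Int)
sumAndLength = L.purely P.fold ((,) <$> L.sum <*> L.length)

main :: IO ()
main = do
  print (runIdentity (sumP (each [1 .. 10 :: Int])))
  print (runIdentity (sumAndLength (each [1 .. 10])))
```

Note that `sumP` has exactly the shape of `Pipes.Prelude.sum`: it takes the whole Producer as an argument, because only a function that drives the Producer can observe where it ends.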
Example code for the WriterT approach:
import Data.Monoid
import Control.Monad
import Control.Monad.Trans.Writer
import Pipes
producer :: Monad m => Producer Int m ()
producer = mapM_ yield [1..10]
summator :: Monad n => Consumer Int (WriterT (Sum Int) n) ()
summator = forever $ await >>= lift . tell . Sum
main :: IO ()
main = do
  Sum r <- execWriterT . runEffect $ producer >-> summator
  print r
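A possible variant, swapping WriterT for a strict StateT: the lazy Writer can build up a chain of unevaluated Sum thunks, whereas `modify'` forces the running total on every step. This is a sketch assuming pipes and transformers; `summatorS` and `total` are hypothetical names.

```haskell
import Control.Monad (forever)
import Control.Monad.Trans.State.Strict (StateT, execStateT, modify')
import Data.Functor.Identity (runIdentity)
import Pipes

-- Same idea as summator, but the accumulator lives in a strict StateT
-- and modify' evaluates the new total before storing it.
summatorS :: Monad n => Consumer Int (StateT Int n) ()
summatorS = forever $ await >>= \x -> lift (modify' (+ x))

-- The Producer ends first, yet the total survives in the state.
total :: [Int] -> Int
total xs = runIdentity (execStateT (runEffect (each xs >-> summatorS)) 0)

main :: IO ()
main = print (total [1 .. 10])
```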


Create random numbers in a reproducible way and hide generator threading (Using Haskell Monad)

I need to create random data in Haskell.
I want my code to be:
a) reproducible from a seed
b) the threading of generators to be implicit
I understand Monads generally and the way that Random Generators work.
My approach is to thread the generator through the code so I can reproduce the random numbers but want to hide the threading of the generators in a Monad.
I'm thinking that the State Monad is a good approach.
Here's some simple code:
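import Control.Monad.State (State, evalState, state)
import System.Random (StdGen, mkStdGen, randomR)
type Gen a = State StdGen a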
roll :: Gen Int
roll = state $ randomR (1, 6)
roll2 :: Gen Int
roll2 = (+) <$> roll <*> roll
test :: Int -> IO ()
test seed = do
  let gen = mkStdGen seed
  print (evalState roll gen)
  print (evalState roll gen)
  print (evalState roll2 gen)
  print (evalState roll2 gen)
I'm trying to use State so that I can push the threading of the generator into the State Monad but the results of roll are the same and results of roll2 are the same. I can see that this is because I'm passing gen into the functions multiple times so of course it would produce the same output. So that makes me think I need to get a new generator from each function. But then I'm back to having to thread the generator through the code which is what I'm trying to avoid by using State. I feel like I'm missing a trick!
I explored MonadRandom too and that did push the threading away from my code but I couldn't see how to make that approach be reproducible.
I've hunted a lot and tried many things but seem to always either be able to hide the generators OR make the code reproducible but not both.
I'm keen to use a Monad more specific than IO.
I'm also going to build a series of more complex functions which will generate random lists of numbers so I need to have a simple way to make these random functions rely on each other. I managed that with MonadRandom but again I couldn't see how that could be reproducible.
Any help appreciated.
If you needn't interleave IO with randomness, as here, then the answer is just to lump your State actions together into one with the Monad operations (they're the thing passing the state around for you!).
test :: Int -> IO ()
test seed = do
    print a
    print b
    print c
    print d
  where
    (a,b,c,d) = flip evalState (mkStdGen seed) $ do
      a <- roll
      b <- roll
      c <- roll2
      d <- roll2
      return (a,b,c,d)
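The same pattern scales to the "functions that rely on each other" from the question: build bigger generators out of smaller ones with ordinary monad combinators, and supply the seed once at the edge. A sketch assuming mtl and random; `rolls`, `dependentRolls` and `runGen` are hypothetical names.

```haskell
import Control.Monad (replicateM)
import Control.Monad.State (State, evalState, state)
import System.Random (StdGen, mkStdGen, randomR)

type Gen a = State StdGen a

roll :: Gen Int
roll = state (randomR (1, 6))

-- A random list: State hands the updated generator from one roll to the next.
rolls :: Int -> Gen [Int]
rolls n = replicateM n roll

-- A generator that depends on another: the list length is itself random.
dependentRolls :: Gen [Int]
dependentRolls = roll >>= rolls

-- The seed is supplied once, at the very edge, so every run is reproducible.
runGen :: Int -> Gen a -> a
runGen seed g = evalState g (mkStdGen seed)

main :: IO ()
main = do
  print (runGen 42 (rolls 5))
  print (runGen 42 (rolls 5))  -- same seed, same list
  print (runGen 42 dependentRolls)
```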
If you will need to interleave IO and randomness, then you will want to look into StateT StdGen IO as your monad instead of using State StdGen and IO separately. That might look like this, say:
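{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.State (MonadState, state)
import System.Random (StdGen, randomR)
roll :: MonadState StdGen m => m Int
roll = state (randomR (1,6))
roll2 :: MonadState StdGen m => m Int
roll2 = (+) <$> roll <*> roll
test :: (MonadState StdGen m, MonadIO m) => m ()
test = do
  roll >>= liftIO . print
  roll >>= liftIO . print
  roll2 >>= liftIO . print
  roll2 >>= liftIO . print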
(You could then use e.g. evalStateT test (mkStdGen seed) to turn this back into an IO () action, or embed it into a larger computation if there were further random things you needed to generate and do IO about.)
MonadRandom does little more than package up StateT StdGen in a way that lets you still use non-seed state, so I encourage you to reconsider using it. evalRand and evalRandT from Control.Monad.Random.Lazy (or .Strict) should give you the repeatability you need; if they don't, you should open a fresh question with the full details of what you tried and how it went wrong.
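A minimal sketch of the repeatable MonadRandom approach, assuming the MonadRandom and random packages: the generator is threaded by `Rand`, and reproducibility comes from seeding `evalRand` with an explicit `mkStdGen`. The names `rollR`, `rollsR` and `rollsFromSeed` are hypothetical.

```haskell
import Control.Monad (replicateM)
import Control.Monad.Random (Rand, evalRand, getRandomR)
import System.Random (StdGen, mkStdGen)

-- getRandomR comes from the MonadRandom class; Rand StdGen threads the generator.
rollR :: Rand StdGen Int
rollR = getRandomR (1, 6)

rollsR :: Int -> Rand StdGen [Int]
rollsR n = replicateM n rollR

-- Seeding explicitly with mkStdGen is what makes the run repeatable.
rollsFromSeed :: Int -> Int -> [Int]
rollsFromSeed seed n = evalRand (rollsR n) (mkStdGen seed)

main :: IO ()
main = do
  print (rollsFromSeed 2024 5)
  print (rollsFromSeed 2024 5)  -- identical to the line above
```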
Normally, it's pretty much the whole point of a random generator that you don't always get the same result. And that's the reason why you use a state monad: to pass on the updated generator, so that the next random event will actually be different.
If you always want the same value, then there's not really any reason to use special random tooling at all: just generate one value once (or two values), then pass it on wherever needed, like you would pass on any other variable.
import Control.Monad (replicateM)
import System.Random (randomRIO)
test :: IO ()
test = do
  [dice0, dice1] <- replicateM 2 $ randomRIO (1, 6 :: Int)
  print dice0
  print dice0
  print $ dice0+dice1
  print $ dice0+dice1

Haskell: Chaining State Monad

If I have a function that shuffles a "deck of cards", how do I use the State Monad to iterate through a defined set number of shuffles and then return a result?
For example I have the following function that will do 1 shuffle of the deck then return a specific card:
step :: State [String] String
step = do
  modify shuffle
  deck <- get
  pure $ bestCard deck
What I would like to be able to do is iterate through the state changes 5 times before I return the value.
What I have tried is this:
steps :: Int -> State [String] String
steps n = case n of
  0 -> do
    deck <- get
    pure $ bestCard deck
  _ -> do
    modify shuffle
    steps (n - 1)
but that looks far from being the correct way of doing it, even though it works.
NB. I am aware this can be done without using State Monad but I am trying to use this example to learn how to use State.
edit:
Thanks to @Koterpillar, I can use replicateM to get what I want.
evalState (replicateM n $ modify shuffle >> get >>= pure . bestCard)
The most succinct way to do it is replicateM_, which repeats a monadic action a specified number of times and discards the result:
replicateM_ 5 $ modify shuffle
Because State is a monad, you only have to care about repeating an action, not specifically working with State. I found the above function by searching Hoogle for the signature of the function I wanted:
Monad m => Int -> m a -> m ()
Note that the result doesn't even require a monad, just an applicative:
replicateM_ :: Applicative m => Int -> m a -> m ()
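Putting it together with the question's `step`: repeat the shuffle with `replicateM_`, then read the answer from the final state. In this sketch (assuming mtl), `shuffle` and `bestCard` are hypothetical stand-ins for the question's functions, chosen so the result is predictable.

```haskell
import Control.Monad (replicateM_)
import Control.Monad.State (State, evalState, gets, modify)

-- Hypothetical stand-in for shuffle: a "cut" that moves the top card to the bottom.
shuffle :: [String] -> [String]
shuffle [] = []
shuffle (c : cs) = cs ++ [c]

-- Hypothetical stand-in for bestCard: report the top card.
bestCard :: [String] -> String
bestCard [] = "no cards"
bestCard (c : _) = c

-- Shuffle n times, then read the result from the final state.
steps :: Int -> State [String] String
steps n = do
  replicateM_ n (modify shuffle)
  gets bestCard

main :: IO ()
main = putStrLn (evalState (steps 5) ["A", "K", "Q", "J", "10", "9"])
```

With these stand-ins, five cuts of a six-card deck bring "9" to the top.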

Haskell: Replace mapM in a monad transformer stack to achieve lazy evaluation (no space leaks)

It has already been discussed that mapM is inherently not lazy, e.g. here and here. Now I'm struggling with a variation of this problem where the mapM in question is deep inside a monad transformer stack.
Here's a function taken from a concrete, working (but space-leaking) example using LevelDB that I put on gist.github.com:
-- read keys [1..n] from db at DirName and check that the values are correct
doRead :: FilePath -> Int -> IO ()
doRead dirName n = do
success <- runResourceT $ do
db <- open dirName defaultOptions{ cacheSize= 2048 }
let check' = check db def in -- is an Int -> ResourceT IO Bool
and <$> mapM check' [1..n] -- space leak !!!
putStrLn $ if success then "OK" else "Fail"
This function reads the values corresponding to keys [1..n] and checks that they are all correct. The troublesome line inside the ResourceT IO a monad is
and <$> mapM check' [1..n]
One solution would be to use streaming libraries such as pipes, conduit, etc. But these seem rather heavy and I'm not at all sure how to use them in this situation.
Another path I looked into is ListT as suggested here. But the type signatures of ListT.fromFoldable :: [Bool] -> ListT Bool and ListT.fold :: (r -> a -> m r) -> r -> t m a -> m r (where m=IO and a,r=Bool) do not match the problem at hand.
What is a 'nice' way to get rid of the space leak?
Update: Note that this problem has nothing to do with monad transformer stacks! Here's a summary of the proposed solutions:
1) Using Streaming:
import Streaming
import qualified Streaming.Prelude as S
S.all_ id (S.mapM check' (S.each [1..n]))
2) Using Control.Monad.foldM:
foldM (\a i-> do {b<-check' i; return $! a && b}) True [1..n]
3) Using Control.Monad.Loops.allM
allM check' [1..n]
I know you mention you don't want to use streaming libraries, but your problem seems pretty easy to solve with streaming without changing the code too much.
import Streaming
import qualified Streaming.Prelude as S
We use each [1..n] instead of [1..n] to get a stream of elements:
each :: (Monad m, Foldable f) => f a -> Stream (Of a) m ()
Stream the elements of a pure, foldable container.
(We could also write something like S.take n $ S.enumFrom 1).
We use S.mapM check' instead of mapM check':
mapM :: Monad m => (a -> m b) -> Stream (Of a) m r -> Stream (Of b) m r
Replace each element of a stream with the result of a monadic action
And then we fold the stream of booleans with S.all_ id:
all_ :: Monad m => (a -> Bool) -> Stream (Of a) m r -> m Bool
Putting it all together:
S.all_ id (S.mapM check' (S.each [1..n]))
Not too different from the code you started with, and without the need for any new operator.
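A runnable version of that pipeline, assuming the streaming package. Since the LevelDB setup isn't reproduced here, `check'` is a hypothetical stand-in in IO that accepts every key; `allKeysOk` is likewise a hypothetical name.

```haskell
import qualified Streaming.Prelude as S

-- Hypothetical stand-in for the question's check': pretend every key is valid.
check' :: Int -> IO Bool
check' i = return (i > 0)

-- Keys are generated, checked and folded one at a time; unlike
-- `and <$> mapM check' [1..n]`, no list of results is built first.
allKeysOk :: Int -> IO Bool
allKeysOk n = S.all_ id (S.mapM check' (S.each [1 .. n]))

main :: IO ()
main = allKeysOk 1000000 >>= print
```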
I think what you need is allM from the monad-loops package.
Then it would be just allM check' [1..n]
(Or if you don't want the import it's a pretty small function to copy.)
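A hand-written version might look roughly like this. It is named `allM'` so it doesn't clash with Control.Monad.Loops, and `checkKey` is a hypothetical stand-in for the question's `check'`. It stops at the first False and never collects the results into a list.

```haskell
-- A small copy-it-yourself version of allM.
allM' :: Monad m => (a -> m Bool) -> [a] -> m Bool
allM' _ [] = return True
allM' p (x : xs) = do
  ok <- p x
  -- stop at the first False; otherwise keep walking the list
  if ok then allM' p xs else return False

-- Hypothetical stand-in for the question's check'.
checkKey :: Int -> IO Bool
checkKey i = return (i > 0)

main :: IO ()
main = allM' checkKey [1 .. 1000000] >>= print
```

Because it short-circuits, `allM'` even terminates on an infinite list as soon as some element fails the check.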

Haskell do clause with multiple monad types

I'm using a graphic library in Haskell called Threepenny-GUI. In this library the main function returns a UI monad object. This causes me much headache as when I attempt to unpack IO values into local variables I receive errors complaining of different monad types.
Here's an example of my problem. This is a slightly modified version of the standard main function, as given by Threepenny-GUI's code example:
main :: IO ()
main = startGUI defaultConfig setup
setup :: Window -> UI ()
setup w = do
  labelsAndValues <- shuffle [1..10]
shuffle :: [Int] -> IO [Int]
shuffle [] = return []
shuffle xs = do randomPosition <- getStdRandom (randomR (0, length xs - 1))
                let (left, (a:right)) = splitAt randomPosition xs
                fmap (a:) (shuffle (left ++ right))
Please notice the fifth line:
labelsAndValues <- shuffle [1..10]
Which returns the following error:
Couldn't match type ‘IO’ with ‘UI’
Expected type: UI [Int]
Actual type: IO [Int]
In a stmt of a 'do' block: labelsAndValues <- shuffle [1 .. 10]
As to my question, how do I unpack the IO function using the standard arrow notation (<-), and keep on having these variables as IO () rather than UI (), so I can easily pass them on to other functions.
Currently, the only solution I found was to use liftIO, but this causes conversion to the UI monad type, while I actually want to keep on using the IO type.
A do block is for a specific type of monad; you can't just change the type in the middle.
You can either transform the action or nest it inside the do. Most of the time the transformations will already be available to you. You can, for instance, have a nested do that works with IO and then convert it only at the point of interaction.
In your case, the threepenny-gui package offers a liftIOLater function to handle this for you.
liftIOLater :: IO () -> UI ()
Schedule an IO action to be run later.
In order to perform the converse conversion, you can use runUI:
runUI :: Window -> UI a -> IO a
Execute an UI action in a particular browser window. Also runs all scheduled IO action.
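A sketch of the nested-do approach, assuming threepenny-gui's `Graphics.UI.Threepenny.Core` (`startGUI`, `getBody`, `#+`, `string`) and that `UI` has a `MonadIO` instance so `liftIO` converts an IO action once. `prepareLabels` is a hypothetical name. Note that `liftIOLater` returns `UI ()`, so `liftIO` is the tool when you need the IO action's result.

```haskell
import Control.Monad (void)
import Control.Monad.IO.Class (liftIO)
import Graphics.UI.Threepenny.Core
import System.Random (getStdRandom, randomR)

main :: IO ()
main = startGUI defaultConfig setup

-- All of the IO-side work lives in plain IO, in its own do block.
prepareLabels :: IO [Int]
prepareLabels = do
  values <- shuffle [1 .. 10]
  -- further IO steps that need `values` would go here
  return values

-- Inside UI, convert the finished IO action once, at the point of use.
setup :: Window -> UI ()
setup w = do
  labelsAndValues <- liftIO prepareLabels
  void $ getBody w #+ [string (show labelsAndValues)]

-- The question's shuffle, unchanged apart from layout.
shuffle :: [Int] -> IO [Int]
shuffle [] = return []
shuffle xs = do
  randomPosition <- getStdRandom (randomR (0, length xs - 1))
  let (left, a : right) = splitAt randomPosition xs
  fmap (a :) (shuffle (left ++ right))
```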
This is more an extended comment: it doesn't address the main question, but your implementation of shuffle. There are 2 issues with it:
1) Your implementation is inefficient: O(n^2).
2) IO isn't the right type for it: shuffle has no general side effects, it just needs a source of randomness.
For (1) there are several solutions: One is to use Seq and its index, which is O(log n), which would make shuffle O(n log n). Or you could use ST arrays and one of the standard algorithms to get O(n).
For (2), all you need is to thread a random generator, not the full power of IO. There is already a nice library, MonadRandom, that defines a monad (and a type class) for randomized computations, and another package already provides the shuffle function. Since IO is an instance of MonadRandom, you can just use shuffle directly as a replacement for your function.
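A sketch combining both suggestions: a Seq-based shuffle (assuming the containers package's `Data.Sequence`, where `index` and `deleteAt` are O(log n)), which threads a `StdGen` through `State` instead of running in IO. This swaps in plain `State StdGen` rather than MonadRandom; `shuffleSeq` is a hypothetical name.

```haskell
import Control.Monad.State (State, evalState, state)
import qualified Data.Sequence as Seq
import System.Random (StdGen, mkStdGen, randomR)

-- Repeatedly pick a random position, emit that element and delete it.
-- Seq.index and Seq.deleteAt are O(log n), so the shuffle is O(n log n).
shuffleSeq :: [a] -> State StdGen [a]
shuffleSeq xs = go (Seq.fromList xs)
  where
    go s
      | Seq.null s = return []
      | otherwise = do
          i <- state (randomR (0, Seq.length s - 1))
          rest <- go (Seq.deleteAt i s)
          return (Seq.index s i : rest)

main :: IO ()
main = print (evalState (shuffleSeq [1 .. 10 :: Int]) (mkStdGen 42))
```

Because the generator is passed in explicitly, the same seed always yields the same permutation, and no IO is involved.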
Under the cover, do is simply syntactic sugar for >>= (bind) and let:
do { x<-e; es } = e >>= \x -> do { es }
do { e; es } = e >> do { es }
do { e } = e
do {let ds; es} = let ds in do {es}
And the type of bind:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
So yes, a do block only "supports" one monad: the same m runs through every statement.
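To make that concrete, here is a small sketch (assuming the transformers package; `tick` and `tick'` are hypothetical names) of a do block in `StateT Int IO` next to its desugared form. The IO action `print n` only fits once it has been lifted into the block's monad, which is exactly what `liftIO` does for `UI`.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, execStateT, get, modify)

-- Every statement here has type StateT Int IO something;
-- print n :: IO () must be lifted before it can appear.
tick :: StateT Int IO ()
tick = do
  modify (+ 1)
  n <- get
  lift (print n)

-- The same function with the do-sugar removed: the m in
-- (>>=) :: m a -> (a -> m b) -> m b is StateT Int IO throughout.
tick' :: StateT Int IO ()
tick' = modify (+ 1) >> (get >>= \n -> lift (print n))

main :: IO ()
main = execStateT (tick >> tick') 0 >>= print
```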

How can I write a pipe that sends downstream a list of what it receives from upstream?

I'm having a hard time writing a pipe with this signature:
toOneBigList :: (Monad m, Proxy p) => () -> Pipe p a [a] m r
It should simply take all the a values from upstream and send them downstream as one list.
All my attempts look fundamentally broken.
Can anybody point me in the right direction?
There are two pipes-based solutions and I'll let you pick which one you prefer.
Note: It's not clear why you output the list on the downstream interface instead of just returning it directly as a result.
Conduit-style
The first one, which is very close to the conduit-based solution, uses the upcoming pipes-parse, which is basically complete and just needs documentation. You can find the latest draft on GitHub.
Using pipes-parse, the solution is identical to the conduit solution that Petr gave:
import Control.Proxy
import Control.Proxy.Parse
combine
    :: (Monad m, Proxy p)
    => () -> Pipe (StateP [Maybe a] p) (Maybe a) [a] m ()
combine () = loop []
  where
    loop as = do
        ma <- draw
        case ma of
            Nothing -> respond (reverse as)
            Just a  -> loop (a:as)
draw is like conduit's await: it requests a value from either the leftovers buffer (that's the StateP part) or from upstream if the buffer is empty. Nothing indicates end of file.
You can wrap a pipe that does not have an end of file signal using the wrap function from pipes-parse, which has type:
wrap :: (Monad m, Proxy p) => p a' a b' b m r -> p a' a b' (Maybe b) m s
Classic Pipes Style
The second alternative is a bit simpler. If you want to fold a given pipe you can do so directly using WriterP:
import Control.Proxy
import Control.Proxy.Trans.Writer
foldIt
    :: (Monad m, Proxy p) =>
       (() -> Pipe p a b m ()) -> () -> Pipe p a [b] m ()
foldIt p () = runIdentityP $ do
    r <- execWriterK (liftP . p >-> toListD >-> unitU) ()
    respond r
That's a higher-level description of what is going on, but it requires passing in the pipe as an explicit argument. It's up to you which one you prefer.
By the way, this is why I was asking why you want to send a single value downstream. The above is much simpler if you return the folded list:
foldIt p = execWriterK (liftP . p >-> toListD)
The liftP might not even be necessary if p is completely polymorphic in its proxy type. I only include it as a precaution.
Bonus Solution
The reason pipes-parse does not provide the toOneBigList is that it's always a pipes anti-pattern to group the results into a list. pipes has several nice features that make it possible to never have to group the input into a list, even if you are trying to yield multiple lists. For example, using respond composition you can have a proxy yield the subset of the stream it would have traversed and then inject a handler that uses that subset:
example :: (Monad m, Proxy p) => () -> Pipe p a (() -> Pipe p a a m ()) m r
example () = runIdentityP $ forever $ do
    respond $ \() -> runIdentityP $ replicateM_ 3 $ request () >>= respond
printIt :: (Proxy p, Show a) => () -> Pipe p a a IO r
printIt () = runIdentityP $ do
    lift $ putStrLn "Here we go!"
    printD ()
useIt :: (Proxy p, Show a) => () -> Pipe p a a IO r
useIt = example />/ (\p -> (p >-> printIt) ())
Here's an example of how to use it:
>>> runProxy $ enumFromToS 1 10 >-> useIt
Here we go!
1
2
3
Here we go!
4
5
6
Here we go!
7
8
9
Here we go!
10
This means you never need to bring more than a single element into memory, even when you need to group elements.
I'll give only a partial answer, perhaps somebody else will have a better one.
As far as I know, standard pipes have no mechanism of detecting when the other part of the pipeline terminates. The first pipe that terminates produces the final result of the pipe-line and all the others are just dropped. So if you have a pipe that consumes input forever (to eventually produce a list), it will have no chance acting and producing output when its upstream finishes. (This is intentional so that both up- and down-stream parts are dual to each other.) Perhaps this is solved in some library building on top of pipes.
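In the current pipes 4 API (which replaced the `Control.Proxy` API used in the answers above), one way around the missing end-of-input signal is to work on the whole Producer instead of writing a Pipe. This is a sketch assuming `Pipes.Prelude.toListM`; `oneBigList` is a hypothetical name. It keeps the entire list in memory, which is the anti-pattern the other answer warns about.

```haskell
import Pipes
import qualified Pipes.Prelude as P

-- Drain a finite Producer into a list (P.toListM runs it to completion),
-- then emit that list downstream as a single value.
oneBigList :: Monad m => Producer a m () -> Producer [a] m ()
oneBigList p = do
  xs <- lift (P.toListM p)
  yield xs

main :: IO ()
main = runEffect $ oneBigList (each [1 .. 5 :: Int]) >-> P.print
```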
The situation is different with conduit. It has a consume function that combines all inputs into a list and returns (not outputs) it. Writing a function like the one you need, that outputs the list at the end, is not difficult:
import Data.Conduit
combine :: (Monad m) => Conduit a m [a]
combine = loop []
  where
    loop xs = await >>= maybe (yield $ reverse xs) (loop . (: xs))
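For newer conduit releases, which are built around `ConduitT i o m r`, the same `combine` could be written and run like this. It is a sketch assuming conduit 1.3's `Data.Conduit` and `Data.Conduit.List`. `await` returns Nothing at end of input, which is exactly the signal plain pipes lacks.

```haskell
import Data.Conduit (ConduitT, await, runConduitPure, yield, (.|))
import qualified Data.Conduit.List as CL

-- Accumulate inputs until await returns Nothing (end of input),
-- then yield them downstream as one list.
combine :: Monad m => ConduitT a [a] m ()
combine = loop []
  where
    loop xs = await >>= maybe (yield (reverse xs)) (\x -> loop (x : xs))

main :: IO ()
main = do
  -- combine emits one list downstream; CL.consume collects the outputs
  print (runConduitPure (CL.sourceList [1 .. 5 :: Int] .| combine .| CL.consume))
  -- consume on its own returns the list directly as the result
  print (runConduitPure (CL.sourceList [1 .. 5 :: Int] .| CL.consume))
```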
