Pipe that maintains state - haskell

I'm trying to calculate rolling hash values (buzzhash) for a big file using pipes.
This is what I have so far, but I don't know how to write a pipe that maintains state.
import qualified Data.ByteString.Lazy as L
import Data.Word
import Data.Bits(xor,rotate)
import Data.Array
import Pipes
import Control.Monad.State.Strict
import Control.Monad(forever)
produceFromList (x:xs) = do
  yield x
  produceFromList xs
buzzHash = do
  x <- await
  h <- lift $ get -- pull out previous value
  let h' = rotate h 1 `xor` (hashArrW8!x) -- calculate new value
  lift $ put h' -- save new value
  yield h'
stdoutLn :: Consumer Word64 IO ()
stdoutLn = do
  a <- await
  lift $ print a
main = do
  bs <- L.unpack `fmap` L.getContents
  runEffect $ produceFromList bs >-> buzzHash >-> stdoutLn
hashArrW8 :: Array Word8 Word64
How do I make buzzHash save previous value and use it for the calculation of next value? Initial state value should be 0.

You were almost there; you just need to run the state.
main = do
  bs <- L.unpack `fmap` L.getContents
  flip evalStateT 0 $ runEffect $ produceFromList bs >-> buzzHash >-> hoist lift stdoutLn
I assume you don't want to recover the final state, so I use evalStateT rather than runStateT.
The only curiosity here is that stdoutLn was marked as Consumer Word64 IO (), so I use hoist lift to make it Consumer Word64 (StateT Word64 IO) (). Everything in the series a >-> b >-> c must agree in the underlying monad and return type.
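For reference, here is a minimal sketch of how the three StateT runners differ in what they hand back. The names sumAll and demo are made up for this illustration and are not part of the question's code.

```haskell
import Control.Monad.State.Strict (StateT, evalStateT, execStateT, modify', runStateT)

-- Hypothetical example action: add every input to the running total (the state)
-- and return the number of inputs as the result.
sumAll :: Monad m => [Int] -> StateT Int m Int
sumAll xs = do
  mapM_ (\x -> modify' (+ x)) xs
  pure (length xs)

-- runStateT  returns (result, final state)
-- evalStateT returns only the result
-- execStateT returns only the final state
demo :: IO ((Int, Int), Int, Int)
demo = do
  both   <- runStateT  (sumAll [1, 2, 3]) 0
  result <- evalStateT (sumAll [1, 2, 3]) 0
  final  <- execStateT (sumAll [1, 2, 3]) 0
  pure (both, result, final)
```

Here demo produces ((3,6),3,6): the result is 3 (three inputs) and the final state is 6 (their sum).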
Here are a few further comments that might save you time. First, produceFromList is each, which Pipes already exports.
Moreover, you could have avoided the hoist lift by relabeling your stdoutLn:
stdoutLn :: MonadIO m => Consumer Word64 m ()
stdoutLn = do
  a <- await
  liftIO $ print a
But here there is some trouble: you are not repeating the action. This should pretty clearly be a loop:
stdoutLn :: MonadIO m => Consumer Word64 m ()
stdoutLn = do
  a <- await
  liftIO $ print a
  stdoutLn
In fact, this is already available as P.print, so we can write
import qualified Pipes.Prelude as P
main = do
  bs <- L.unpack `fmap` L.getContents
  flip evalStateT 0 $ runEffect $ each bs >-> buzzHash >-> P.print
If I understand you, buzzHash is meant to be repeated indefinitely too:
buzzHash = do
  x <- await
  h <- lift $ get -- pull out previous value
  let h' = rotate h 1 `xor` (hashArrW8!x) -- calculate new value
  lift $ put h' -- save new value
  yield h'
  buzzHash
(this is forever buzzHash, where we use your buzzHash)
Finally, if you
import qualified Pipes.ByteString as PB
import Control.Lens (view) -- (or Lens.Micro.MTL or Lens.Simple)
we see we don't need the lazy bytestring IO, which doesn't stream properly anyway.
Pipes.ByteString already has the unpack we want, packaged as a lens, so that we use view PB.unpack where elsewhere we would use B.unpack. So in the end we can write
main = flip evalStateT 0 $ runEffect $ view PB.unpack PB.stdin >-> buzzHash >-> P.print
Once it is in this form we see we aren't using the underlying state of the pipeline except in buzzHash, so we can localize this
import Pipes.Lift (evalStateP)
main = runEffect $ view PB.unpack PB.stdin >-> evalStateP 0 buzzHash >-> P.print
or, if you like you can rewrite
buzzHash' :: Monad m => Word64 -> Pipe Word8 Word64 m r
buzzHash' n = evalStateP n $ forever $ do
  x <- await
  h <- lift $ get -- pull out previous value
  let h' = rotate h 1 `xor` (hashArrW8!x) -- calculate new value
  lift $ put h' -- save new value
  yield h'
Then you would write
main = runEffect $ view PB.unpack PB.stdin >-> buzzHash' 0 >-> P.print
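As a further variation (a sketch, not something from the answer above), the previous hash can be threaded through as an ordinary function argument, which avoids StateT altogether. The names toyTable, step, buzzHashFrom and hashes are invented for this example, and toyTable is only a placeholder for the question's hashArrW8 lookup table, whose contents are not shown.

```haskell
import Data.Bits (rotate, xor)
import Data.Word (Word8, Word64)
import Pipes
import qualified Pipes.Prelude as P

-- Placeholder for the question's hashArrW8 table (NOT the real table).
toyTable :: Word8 -> Word64
toyTable x = fromIntegral x * 0x9E3779B97F4A7C15

-- One hashing step, using the same formula as buzzHash.
step :: Word64 -> Word8 -> Word64
step h x = rotate h 1 `xor` toyTable x

-- The previous hash is carried as an argument instead of living in StateT.
buzzHashFrom :: Monad m => Word64 -> Pipe Word8 Word64 m r
buzzHashFrom h = do
  x <- await
  let h' = step h x
  yield h'
  buzzHashFrom h'

-- Run the pipe purely over a list of bytes, starting from hash 0.
hashes :: [Word8] -> [Word64]
hashes bytes = P.toList (each bytes >-> buzzHashFrom 0)
```

With this shape the initial value is just the argument, so the pipeline would read view PB.unpack PB.stdin >-> buzzHashFrom 0 >-> P.print.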

Related

Pipes (Haskell lib) - piping pipes with different state monad

My goal is to have the last value produced equal to 80 (40 + 40) (see code below)...
{-# LANGUAGE LambdaCase #-}
import Pipes
import qualified Pipes.Prelude as P
import Pipes.Lift
import Control.Monad.State.Strict
data Input = A Integer | B Integer | C Integer
main :: IO ()
main = runEffect $ each [A 10,B 2,C 3,A 40,A 40] >-> pipeline >-> P.print
pipeline :: Pipe Input Integer IO ()
pipeline = for cat $ \case
  A x -> yield x >-> accumulate
  B x -> yield x
  C x -> yield x
accumulate :: Pipe Integer Integer IO ()
accumulate = evalStateP 0 accumulate'
accumulate' :: Pipe Integer Integer (StateT Integer IO) ()
accumulate' = go
  where
    go = do
      x <- await
      lift $ modify (+x)
      r <- lift get
      yield r
      go
With this example, the A inputs are not accumulated: yield x >-> accumulate starts a new stream for each A input, so the state is reset every time.
Piping pipes with different state monads sequentially works well, but here I want to nest them inside the case pattern (like a substream).
The problem is that you call evalStateP too early, discarding state you want to preserve across calls to accumulate. Try something like this:
pipeline :: Pipe Input Integer IO ()
pipeline = evalStateP 0 $ for cat $ \case
  A x -> yield x >-> accumulate
  B x -> yield x
  C x -> yield x
accumulate :: Pipe Integer Integer (StateT Integer IO) ()
accumulate = for cat $ \x -> do
  modify (+x)
  r <- get
  yield r
Note that Proxy has a MonadState instance, so you don't need to lift state operations manually if you use mtl.
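To see that the state now survives between A inputs, here is a small sketch of the same pipeline generalized to any base monad so that it can be run purely with P.toList. The generalized signatures and the name outputs are choices made for this illustration, and a plain case expression stands in for \case so that no extension is needed.

```haskell
import Control.Monad.State.Strict (StateT, get, modify)
import Pipes
import Pipes.Lift (evalStateP)
import qualified Pipes.Prelude as P

data Input = A Integer | B Integer | C Integer

-- Running total kept in StateT; it only sees the values routed to it.
accumulate :: Monad m => Pipe Integer Integer (StateT Integer m) ()
accumulate = for cat $ \x -> do
  modify (+ x)
  r <- get
  yield r

-- evalStateP wraps the whole loop, so the total persists across inputs.
pipeline :: Monad m => Pipe Input Integer m ()
pipeline = evalStateP 0 $ for cat $ \input -> case input of
  A x -> yield x >-> accumulate
  B x -> yield x
  C x -> yield x

-- Run purely and collect everything the pipeline yields.
outputs :: [Integer]
outputs = P.toList (each [A 10, B 2, C 3, A 40, A 40] >-> pipeline)
```

Here outputs is [10,2,3,50,90]: B and C values pass straight through, while every A value is added to one shared running total.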

Forking the streaming flow in haskell-pipes

I'm having trouble directing flow through a pipeline with haskell-pipes. Basically, I analyze a bunch of files and then I have to either
print results to the terminal in a human-friendly way
encode results to JSON
The chosen path depends upon a command line option.
In the second case, I have to output an opening bracket, then every incoming value followed by a comma and then a closing bracket. Currently insertCommas never terminates, so the closing bracket is never outputted.
import Pipes
import qualified Pipes.Prelude as P
import qualified Data.ByteString.Lazy as B
import Data.Aeson (encode)
insertCommas :: Consumer B.ByteString IO ()
insertCommas = do
  first <- await
  lift $ B.putStr first
  for cat $ \obj -> lift $ do
    putStr ","
    B.putStr obj
jsonExporter :: Consumer (FilePath, AnalysisResult) IO ()
jsonExporter = do
  lift $ putStr "["
  P.map encode >-> insertCommas
  lift $ putStr "]"
exportStream :: Config -> Consumer (FilePath, AnalysisResult) IO ()
exportStream conf =
  case outputMode conf of
    JSON -> jsonExporter
    _ -> P.map (export conf) >-> P.stdoutLn
main :: IO ()
main = do
  -- The first two lines are Docopt stuff, not relevant
  args <- parseArgsOrExit patterns =<< getArgs
  ins <- allFiles $ args `getAllArgs` argument "paths"
  let conf = readConfig args
  runEffect $ each ins
    >-> P.mapM analyze
    >-> P.map (filterResults conf)
    >-> P.filter filterNulls
    >-> exportStream conf
AFAIK a Consumer cannot detect the end of a stream. To do that you need to use a Parser from Pipes.Parse, which inverts the control.
Here is a Parser which inserts commas between String elements:
import Pipes
import qualified Pipes.Prelude as P
import Pipes.Parse (draw, evalStateT)
commify = do
  lift $ putStrLn "["
  m1 <- draw
  case m1 of
    Nothing -> lift $ putStrLn "]"
    Just x1 -> do
      lift $ putStrLn x1
      let loop = do
            mx <- draw
            case mx of
              Nothing -> lift $ putStrLn "]"
              Just x -> lift (putStr "," >> putStrLn x) >> loop
      loop
test1 = evalStateT commify ( mapM_ yield (words "this is a test") )
test2 = evalStateT commify P.stdinLn
To handle the different output formats I would probably make both formats a Parser:
exportParser = do
  mx <- draw
  case mx of
    Nothing -> return ()
    Just x -> (lift $ putStrLn $ export x) >> exportParser
and then:
let parser = case outputMode conf of
      JSON -> commify
      _ -> exportParser
evalStateT parser (each ins
  >-> P.mapM analyze
  >-> P.map (filterResults conf)
  >-> P.filter filterNulls)
There is probably a slicker way to write exportParser in terms of foldAllM. You can also use the MaybeT transformer to more succinctly write the commify parser. I've written both out explicitly to make them easier to understand.
I think you should 'commify' with pipes-group. It has an intercalates, but not an intersperse, but it's not a big deal to write. You should stay away from the Consumer end, I think, for this sort of problem.
{-# LANGUAGE OverloadedStrings #-}
import Pipes
import qualified Pipes.Prelude as P
import qualified Data.ByteString.Lazy.Char8 as B
import Pipes.Group
import Lens.Simple -- or Control.Lens or Lens.Micro or anything with view/^.
import System.Environment
intersperse_ :: Monad m => a -> Producer a m r -> Producer a m r
intersperse_ a producer = intercalates (yield a) (producer ^. chunksOf 1)
main = do
  args <- getArgs
  let op prod = case args of
        "json":_ -> yield "[" *> intersperse_ "," prod <* yield "]"
        _ -> intersperse_ " " prod
  runEffect $ op producer >-> P.mapM_ B.putStr
  putStrLn ""
  where
    producer = mapM_ yield (B.words "this is a test")
which gives me this:
>>> :main json
[this,is,a,test]
>>> :main ---
this is a test
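If pulling in pipes-group feels heavy, the same interspersing can be sketched with next from the core Pipes module, which lets a Producer transformer see whether the stream has ended. This is an alternative illustration rather than the code from either answer; intersperseP, jsonArray and rendered are names made up here, and plain String is used instead of ByteString to keep the example small.

```haskell
import Pipes
import qualified Pipes.Prelude as P

-- Yield the first element as is, then put the separator before each later one.
intersperseP :: Monad m => a -> Producer a m r -> Producer a m r
intersperseP sep p0 = do
  e <- lift (next p0)              -- step the producer once to look at its head
  case e of
    Left r        -> return r      -- empty stream: nothing to separate
    Right (x, p1) -> do
      yield x
      for p1 $ \y -> yield sep >> yield y

-- Wrap the separated stream in brackets; the closing bracket is emitted
-- only after the inner producer has finished.
jsonArray :: Monad m => Producer String m r -> Producer String m r
jsonArray p = do
  yield "["
  r <- intersperseP "," p
  yield "]"
  return r

-- Pure rendering of a small stream, for checking the output.
rendered :: String
rendered = concat (P.toList (jsonArray (each ["1", "2", "3"])))
```

Here rendered is "[1,2,3]". Because the work happens on the Producer side, the closing bracket is reached naturally once the input ends, which is exactly what a Consumer cannot observe.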

Using Pipes to read and write binary data in Haskell

I am trying to read and write very many ints in constant memory. I have figured out how to write the ints to a file but have not figured out how to read them back.
import Control.Lens (zoom)
import System.IO (IOMode(..), withFile)
import Pipes
import qualified Pipes.Prelude as P
import qualified Pipes.ByteString as PB
import qualified Pipes.Parse as P
import qualified Pipes.Binary as P
intStream :: Monad m => Proxy x' x () Int m b
intStream = go (0 :: Int) where
  go i = yield i >> go (i + 1)
decoder :: Monad m => Int -> P.Parser P.ByteString m [Int]
decoder n = zoom (P.decoded . P.splitAt n) P.drawAll
main :: IO ()
main = do
  withFile "ints" WriteMode $ \h -> do
    runEffect $ for intStream P.encode >-> P.take 10000 >-> PB.toHandle h
  withFile "ints" ReadMode $ \h -> do
    xs <- P.evalStateT (decoder 10000000) (PB.fromHandle h)
    print xs
I got the decoder function from the documentation for Pipes.Binary. However, it uses drawAll, which according to the documentation is not an idiomatic use of Pipes and is provided for testing purposes.
My question is how to modify decoder so that it doesn't use drawAll and thus does not load all the values of xs into memory. So instead of printing the list of xs I could P.map print over a stream of decoded ints being read from the file.
The docs say that decoded is a lens from a stream of bytes to a stream of decoded values. We can get the latter out of the former using view from lens:
import Control.Lens (view) -- in addition to the imports in the question
import Control.Monad (void)
decoder :: Monad m => Int -> Producer P.ByteString m a -> Producer Int m ()
decoder n p = void (view P.decoded p) >-> P.take n
main :: IO ()
main = do
  withFile "ints" WriteMode $ \h -> do
    runEffect $ for intStream P.encode >-> P.take 10000 >-> PB.toHandle h
  withFile "ints" ReadMode $ \h -> do
    runEffect $ decoder 10000 (PB.fromHandle h) >-> P.print
I don't have much experience with pipes, I just followed the types here. The program seems to function as intended though.

Why is putStrLn not atomic?

To practice concurrent programming, I wrote the following (suboptimal) program, which repeatedly calculates the first prime bigger than whatever the user inputs:
import Control.Concurrent
import Control.Concurrent.Chan
import Control.Monad (forever)
primeAtLeast n = -- Some pure code that looks up the first prime at least as big as n
outputPrimeAtLeast n = putStrLn $ show $ (n, primeAtLeast n)
main = do
  chan <- newChan
  worker <- forkIO $ forever $ readChan chan >>= outputPrimeAtLeast
  forever $ (readLn :: (IO Int)) >>= (writeChan chan)
  killThread worker
I want to have a worker thread in the background that does the actual calculation and outputs (n, primeAtLeast n) as soon as it's finished.
What it's doing now: As soon as I enter a number n, it immediately outputs (n, and returns control to the main thread, calculates primeAtLeast n in the background, and outputs the second half, primeAtLeast n), as soon as it's finished.
So is putStrLn not atomic? Or where is the problem?
Try this:
outputPrimeAtLeast n = let p = primeAtLeast n in p `seq` putStrLn $ show (n, p)
The above forces the computation of the prime before the putStrLn is run.
Further, you may use print instead of putStrLn . show:
outputPrimeAtLeast n = let p = primeAtLeast n in p `seq` print (n, p)
Alternatively, you may use a putStrLn function which forces every single character before starting printing anything.
strictPutStrLn :: Show a => a -> IO ()
strictPutStrLn x = let str = show x in str `listSeq` putStrLn str
listSeq :: [a] -> b -> b
listSeq [] w = w
listSeq (x:xs) w = x `seq` listSeq xs w
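The underlying cause is laziness: show on a pair can produce the opening "(n," before the second component has been evaluated, so putStrLn starts printing early. Below is a small sketch of that behaviour and of two ways to force the value first. slowSquare is a hypothetical stand-in for primeAtLeast, and the other names are also invented for this illustration.

```haskell
import Control.Exception (evaluate)

-- Hypothetical stand-in for the expensive pure function in the question.
slowSquare :: Int -> Int
slowSquare n = n * n

-- The first three characters are available even though the second
-- component is undefined: show never had to look at it to get this far.
lazyPrefix :: String
lazyPrefix = take 3 (show (7 :: Int, undefined :: Int))

-- Force the result before building the string, as in the seq-based fix.
strictLine :: Int -> String
strictLine n = let p = slowSquare n in p `seq` show (n, p)

-- The same idea in IO: evaluate runs the computation at this point,
-- so nothing is printed until the value is ready.
printStrict :: Int -> IO ()
printStrict n = do
  p <- evaluate (slowSquare n)
  print (n, p)
```

Note that evaluate and seq only force to weak head normal form, which is enough for an Int; a result with nested structure would need something like the listSeq approach above.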

list monad transformer

I need to use a list monad transformer. I've read that there are potential problems with ListT IO from Control.Monad.List, since IO isn't commutative, so I'm looking at ListT done right. But I'm getting some unexpected behavior.
Consider this simple test:
test = runListT $ do
  x <- liftList [1..3]
  liftIO $ print x
  y <- liftList [6..8]
  liftIO $ print (x,y)
Using Control.Monad.List:
Main> test
1
(1,6)
(1,7)
(1,8)
2
(2,6)
(2,7)
(2,8)
3
(3,6)
(3,7)
(3,8)
[(),(),(),(),(),(),(),(),()]
Using "ListT done right":
Main> test
1
(1,6)
Is this a problem with "ListT done right", or am I just using it wrong? Is there a preferred alternative?
Thanks!
This might be intentional on the part of the author, since they say
"it lets each element of the list have its own side effects, which only get `excecuted' if this element of the list is really inspected."
I'm not sure, though. Anyway, you can use this function to sequence the whole list:
runAll_ :: (Monad m) => ListT m a -> m ()
runAll_ (ListT m) = runAll_' m where
  runAll_' m = do
    mm <- m
    case mm of
      MNil -> return ()
      _ `MCons` mxs -> runAll_' mxs
And an analogous runAll that returns a list should be easy to construct.
main = runAll_ $ do
  x <- liftList [1..3]
  liftIO $ print x
  y <- liftList [6..8]
  liftIO $ print (x,y)
1
(1,6)
(1,7)
(1,8)
2
(2,6)
(2,7)
(2,8)
3
(3,6)
(3,7)
(3,8)
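The list-returning runAll mentioned above could look roughly like the sketch below. To keep it self-contained, the types are redeclared here under the assumption that they match the MNil / MCons shape used by runAll_; the helper fromList is invented for the example and plays the role of liftList.

```haskell
-- Assumed to mirror the types behind runAll_ above.
data MList' m a = MNil | a `MCons` MList m a
type MList m a = m (MList' m a)
newtype ListT m a = ListT { runListT :: MList m a }

-- Walk the whole list, running each cell's effects and collecting the values.
runAll :: Monad m => ListT m a -> m [a]
runAll (ListT m) = go m
  where
    go mm = do
      cell <- mm
      case cell of
        MNil           -> return []
        x `MCons` rest -> (x :) <$> go rest

-- Hypothetical helper: lift a plain list into ListT without extra effects.
fromList :: Monad m => [a] -> ListT m a
fromList = ListT . foldr (\x rest -> return (x `MCons` rest)) (return MNil)
```

For example, runAll (fromList [1, 2, 3]) in IO returns [1,2,3]. Since it builds the entire result list, it is only suitable for finite lists.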
