Piping an http stream through a haskell conduit - haskell

I am trying to create a conduit that will stream data from HTTP through a conduit source. Here is what I have so far:
import qualified Network.HTTP.Client.Conduit as CC
getStream :: String -> IO (ConduitM () BS.ByteString IO ())
getStream url = do
req <- parseUrl url
return $ CC.withResponse req $ \res -> do
responseBody res $= (awaitForever $ \bytes -> liftIO $ do
putStrLn $ "Got " ++ show (BS.length bytes) ++ " but will ignore them")
But I am getting
No instance for (Control.Monad.Reader.Class.MonadReader env0 IO) …
arising from a use of ‘CC.withResponse’
In the expression: CC.withResponse req
In the second argument of ‘($)’, namely
‘CC.withResponse req
$ \ res
-> do { responseBody res $= (awaitForever $ \ bytes -> ...) }’
In a stmt of a 'do' block:
return
$ CC.withResponse req
$ \ res
-> do { responseBody res $= (awaitForever $ \ bytes -> ...) }
How come a MonadReader is expected? It doesn't make any sense to me.

How about this variation of the example in the Network.HTTP.Conduit docs:
{-# LANGUAGE OverloadedStrings #-}
module Lib2 () where
import Data.Conduit (($$+-), awaitForever)
import qualified Network.HTTP.Client.Conduit as CC
import Network.HTTP.Conduit (http, tlsManagerSettings, newManager)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Resource (runResourceT)
import Data.Conduit.Binary (sinkFile) -- Exported from the package conduit-extra
main2 :: IO ()
main2 = do
request <- CC.parseUrl "http://google.com/"
manager <- newManager tlsManagerSettings
runResourceT $ do
response <- http request manager
CC.responseBody response $$+- (awaitForever $ \x -> liftIO $ putStrLn "Chunk")
Original answer
The return type for getStream is wrong. Try removing the type signature and use FlexibleContexts, e.g.:
{-# LANGUAGE OverloadedStrings, FlexibleContexts #-}
module Lib () where
import Data.Conduit
import qualified Data.ByteString as BS
import qualified Network.HTTP.Client.Conduit as CC
import Control.Monad.IO.Class
getStream url = do
req <- CC.parseUrl url
CC.withResponse req $ \res -> do
CC.responseBody res $= (awaitForever $ \x -> liftIO $ putStrLn "Got a chunk")
And then :t getStream reports:
getStream
:: (monad-control-1.0.0.4:Control.Monad.Trans.Control.MonadBaseControl
IO (ConduitM a c m),
mtl-2.2.1:Control.Monad.Reader.Class.MonadReader env m, MonadIO m,
CC.HasHttpManager env,
exceptions-0.8.0.2:Control.Monad.Catch.MonadThrow m) =>
String -> ConduitM a c m ()
which shows that the return type has the form ConduitM ..., not IO ....
This also shows how MonadReader gets into the picture... The monad m must have access to an HTTP manager through a reader environment as expressed by the following constraints:
CC.HasHttpManager env
MonadReader env m
All this is saying is that m has a reader environment of some type env which itself has a way of accessing an HTTP manager.
In particular, m cannot be just the plain IO monad, which is what the error message is complaining about.
Answer to question in the comments
Here is an example of how to create a Producer from a HTTP response:
{-# LANGUAGE OverloadedStrings #-}
module Lib3 () where
import qualified Data.ByteString as BS
import qualified Network.HTTP.Client.Conduit as CC
import Network.HTTP.Conduit (http, tlsManagerSettings, newManager)
import qualified Network.HTTP.Client as Client (httpLbs, responseOpen, responseClose)
import Data.Conduit (Producer, addCleanup)
import Data.Conduit (awaitForever, await, ($$))
import qualified Network.HTTP.Client.Conduit as HCC
import Control.Monad.IO.Class (liftIO, MonadIO)
getStream url = do
request <- CC.parseUrl url
manager <- newManager tlsManagerSettings
response <- Client.responseOpen request manager
let producer :: Producer IO BS.ByteString
producer = HCC.bodyReaderSource $ CC.responseBody response
cleanup _ = do liftIO $ putStrLn "(cleaning up)"; Client.responseClose response
producerWithCleanup = addCleanup cleanup producer
return $ response { CC.responseBody = producerWithCleanup }
test = do
res <- getStream "http://google.com"
let producer = CC.responseBody res
consumer = awaitForever $ \_ -> liftIO $ putStrLn "Got a chunk"
producer $$ consumer

Related

Haskell Scotty ‘Home.main’ is applied to too few arguments

I need to start up my very simple webapp with Haskell's Scotty and I just can't seem to get the IO () ReaderT stuff workinng. I am basing this off of another example I found online, and am pretty new to Monads and Haskell overall.
My IDE is throwing this error:
Couldn't match expected type ‘IO t0’
with actual type ‘(m0 Network.Wai.Internal.Response
-> IO Network.Wai.Internal.Response)
-> IO ()’
• Probable cause: ‘Home.main’ is applied to too few arguments
In the expression: Home.main
When checking the type of the IO action ‘main’
It is also throwing this one but I think it should get fixed once I fixed the other one
Ambiguous occurrence ‘main’
It could refer to either ‘Home.main’,
imported from ‘Platform.Home’ at Main.hs:16:1-28
or ‘Main.main’, defined at Main.hs:28:1
I am leaving here the needed code, if there is anything else I should show please let me know.
In "Main.hs":
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main
( main
) where
import Control.Monad (join)
import Control.Applicative ((<$>))
import Core.Item.Controller (routes)
import Core.Item.Controller as ItemController
import Core.Item.Service as ItemService
import Core.Item.DAO as ItemDAO
import Platform.Postgres as Postgres
import Platform.Home as Home
import Data.Maybe (fromMaybe)
import Network.Wai.Middleware.RequestLogger (logStdoutDev)
import Network.Wai.Middleware.Static (addBase, noDots, staticPolicy, (>->))
import System.Environment (lookupEnv)
import Text.Read (readMaybe)
import Web.Scotty (middleware, scotty)
import Language.Haskell.TH (Type(AppT))
import ClassyPrelude
main :: IO ()
main = do
pgEnv <- Postgres.init
let runner app = flip runReaderT pgEnv $ unAppT app
Home.main runner
type Environment = Postgres.Env
newtype AppT a = AppT
{ unAppT :: ReaderT Environment IO a
} deriving (Applicative, Functor, Monad, MonadIO, MonadReader Environment)
instance ItemController.Service AppT where
getItem = ItemService.getItem
getItems = ItemService.getItems
createItem = ItemService.createItem
instance ItemService.ItemRepo AppT where
findItems = ItemDAO.findItems
addItem = ItemDAO.addItem
instance ItemService.TimeRepo AppT where
currentTime = liftIO getCurrentTime
In "Postgres.hs"
type Env = Pool Connection
type Postgres r m = (MonadReader r m, Has Env r, MonadIO m)
init :: IO Env
init = do
pool <- acquirePool
migrateDb pool
return pool
And this is my "Home.hs":
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ConstraintKinds #-}
module Platform.Home
( main
) where
import ClassyPrelude (MonadIO, LText, fromMaybe, readMay)
import Web.Scotty.Trans
import Network.HTTP.Types.Status
import Network.Wai.Handler.WarpTLS (runTLS, tlsSettings)
import Network.Wai.Handler.Warp (defaultSettings, setPort)
import Network.Wai (Response)
import Network.Wai.Middleware.Cors
import qualified Core.Item.Controller as ItemController
import System.Environment (lookupEnv)
type App r m = (ItemController.Service m, MonadIO m)
main :: (App r m) => (m Response -> IO Response) -> IO ()
main runner = do
port <- acquirePort
mayTLSSetting <- acquireTLSSetting
case mayTLSSetting of
Nothing ->
scottyT port runner routes
Just tlsSetting -> do
app <- scottyAppT runner routes
runTLS tlsSetting (setPort port defaultSettings) app
where
acquirePort = do
port <- fromMaybe "" <$> lookupEnv "PORT"
return . fromMaybe 3000 $ readMay port
acquireTLSSetting = do
env <- (>>= readMay) <$> lookupEnv "ENABLE_HTTPS"
let enableHttps = fromMaybe True env
return $ if enableHttps
then Just $ tlsSettings "secrets/tls/certificate.pem" "secrets/tls/key.pem"
else Nothing
routes :: (App r m) => ScottyT LText m ()
routes = do
-- middlewares
middleware $ cors $ const $ Just simpleCorsResourcePolicy
{ corsRequestHeaders = "Authorization":simpleHeaders
, corsMethods = "PUT":"DELETE":simpleMethods
}
options (regex ".*") $ return ()
-- errors
defaultHandler $ \str -> do
status status500
json str
-- feature routes
ItemController.routes
-- health
get "/api/health" $
json True
Actually, the errors are related. In Main.hs, change the import of Home to:
import qualified Platform.Home as Home
^^^^^^^^^-- add this
and it should fix both errors. The following minimal example gives the same pair of errors:
-- contents of Home.hs
module Home where
main :: (Int -> Int) -> IO ()
main = undefined
-- contents of Main.hs
import Home
main = Home.main id
but works if you change import Home to import qualified Home.
The issue appears to be that GHC tries to type-check Home.main as the program's main function (perhaps simply because it was the first one defined, having been imported before the definition of Main.main in the body of the module), and it generates this extra error message because Home.main's type doesn't match the required signature of IO t for a main function. This happens before it gets around to noticing that there are two definitions of main (i.e., the "ambiguous occurrence" error), and it's typechecked the wrong one.

Is there any way to catch all exceptions in scotty without wrapping all of my code in Exception Catching

I would like for my default handler to be able to catch all of the exceptions that my App throws but in order for this to happen I need to manually call raise after manually adding some exception catching around my IO code.
below is an example minimal server:
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Lib
( someFunc
) where
import Web.Scotty.Trans
import Control.Monad.Trans
import Control.Monad.Reader
import Control.Monad.Catch
import Control.Monad.Except
import Data.Text.Lazy as TL
data AppEnv = AppEnv
{ appStuff :: String
}
newtype App a = App
{ unApp :: ReaderT AppEnv IO a
} deriving (Functor, Applicative, Monad, MonadIO, MonadReader AppEnv, MonadThrow)
someFunc :: IO ()
someFunc = do
let run a = runReaderT (unApp $ App a) (AppEnv "APPY STUFF")
scottyT 8080 run $ do
defaultHandler $ \(e :: TL.Text) -> do
liftIO $ print "HERE"
liftIO $ print $ showError e
html $ "Something Went Seriously Wrong"
get "/" $ do
(r :: (Either TL.Text String)) <- liftIO $ runExceptT $ do
(uId) <- lift $ readFile "./helloworld.txt"
return $ ("hello")
liftIO $ print r
case r of
Left l -> raise l
Right s -> (html "hello world")
get "/catch-this" $ do
error "Catch Me"
(html "hello world")
notFound $ do
html "That is not a valid route"
I would like to be able to catch all of my uncaught exceptions in my default handler however this is not the default behavior of scotty that only happens if you call raise. I could wrap all of my ActionM code blocks in ExceptT however this seems like a messy/mechanical way of solving this problem. I mostly want to do this for logging purposes so I can report out to Sentry or Log to a file and this would make it much more convenient.
I figured I'd throw this in there as I recently was looking for this same solution again. It's unfortunate but I was never able to get the behavior that I wanted out of scotty.
Fortunately since scotty is just a nice library to create WAI application's you can get a nice work around using the Settings type from warp and the Options type from scotty.
Below is an example of how you can approach this:
{-# LANGUAGE OverloadedStrings #-}
module Lib
( someFunc
) where
import Web.Scotty.Trans
import Data.Text
import qualified Data.Text.Lazy as TL
import Control.Monad.IO.Class
import Control.Exception
import Network.HTTP.Types
import System.IO.Error
import Network.Wai.Handler.Warp
import Network.Wai
myOpts :: Options
myOpts = Options 1 mySettings
mySettings :: Settings
mySettings = setOnExceptionResponse myHandler $ setPort 3002 $ defaultSettings
myHandler :: SomeException -> Response
myHandler se = responseLBS status500 [] "HERE WE ARE"
someFunc :: IO ()
someFunc = do
scottyOptsT myOpts id routes
myExceptions :: (MonadIO m) => TL.Text -> ActionT TL.Text m ()
myExceptions t = do
liftIO $ print t
html "error"
routes :: (MonadIO m) => ScottyT TL.Text m ()
routes = do
defaultHandler $ \str -> do
liftAndCatchIO $ print str
status status500
json ("welp you thought"::Text)
get "/:here" $ do
liftIO $ ioError $ userError "Hahah"
text "here"
You can then tap further into the Settings type provided by warp so that you could maybe log all the error messages or perform some custom action using the following methods setOnException setOnExceptionResponse.

How to compress the output when writing to a file?

I have a computation that along with other things generates some data (a lot of it) and I want to write into a file.
The way the code is structured now is (simplified):
writeRecord :: Handle -> Record -> IO ()
writeRecord h r = hPutStrLn h (toByteString r)
This function is then called periodically during a bigger computation. It is almost like a log, and in fact, multiple files are being written simultaneously.
Now I want the output file to be compressed with Gzip.
In languages like Java I would do something like:
outStream = new GzipOutputStream(new FileOutputStream(path))
and then would just write into that wrapped output stream.
What is the way of doing it in Haskell?
I think writing something like
writeRecord h r = hPut h ((compressed . toByteString) r)
is not correct because compressing each small bit individually isn't efficient (I even tried it and the size of the compressed file is bigger than uncompressed this way).
I also don't think that I can just produce a lazy ByteString (or even a list of chunks) and then write it with compressed . fromChunks because this will require my "generator" building the full thing in memory. And the fact that more than one file is produced at the same time makes it even more complicated.
So what would be a way to solve this in Haskell? Writing to file(s) and have them gzipped?
All the streaming libraries support compression. If I understand the particular problem and the way you are thinking about it, io-streams might be the simplest for your purposes. Here I alternate between writing to trump and clinton output streams, which are written as compressed files. I follow by showing the pipes equivalent of Michael's conduit program
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package io-streams
{-# LANGUAGE OverloadedStrings #-}
import qualified System.IO.Streams as IOS
import qualified System.IO as IO
import Data.ByteString (ByteString)
analyzer :: IOS.OutputStream ByteString -> IOS.OutputStream ByteString -> IO ()
analyzer clinton trump = do
IOS.write (Just "This is a string\n") clinton
IOS.write (Just "This is a string\n") trump
IOS.write (Just "Clinton string\n") clinton
IOS.write (Just "Trump string\n") trump
IOS.write (Just "Another Clinton string\n") clinton
IOS.write (Just "Another Trump string\n") trump
IOS.write Nothing clinton
IOS.write Nothing trump
main:: IO ()
main =
IOS.withFileAsOutput "some-file-clinton.txt.gz" $ \clinton_compressed ->
IOS.withFileAsOutput "some-file-trump.txt.gz" $ \trump_compressed -> do
clinton <- IOS.gzip IOS.defaultCompressionLevel clinton_compressed
trump <- IOS.gzip IOS.defaultCompressionLevel trump_compressed
analyzer clinton trump
Obviously you can mix all kinds of IO in analyzer between acts of writing to the two output streams - I'm just show in the writes, so to speak. In particular, if analyzer is understood as depending on an input stream, the writes can depend on reads from the input stream. Here's a (slightly!) more complicated program that does that. If I run the program above I see
$ stack gzip_so.hs
$ gunzip some-file-clinton.txt.gz
$ gunzip some-file-trump.txt.gz
$ cat some-file-clinton.txt
This is a string
Clinton string
Another Clinton string
$ cat some-file-trump.txt
This is a string
Trump string
Another Trump string
With pipes and conduit there are various ways of achieving the above effect, with a higher level of decomposition of parts. Writing to separate files will however be a little more subtle. Here in any case is the pipes equivalent of Michael S's conduit program:
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package pipes-zlib
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.ByteString (ByteString, hPutStr)
import System.IO (IOMode(..), withFile, Handle)
import Pipes
import qualified Pipes.ByteString as PB
import qualified Pipes.GZip as P
-- Some helper function you may have
someAction :: IO ByteString
someAction = return "This is a string\n"
-- Original version
producerHandle :: Handle -> IO ()
producerHandle h = do
str <- someAction
hPutStr h str
producerPipe :: MonadIO m => Producer ByteString m ()
producerPipe = do
str <- liftIO someAction
yield str
main :: IO ()
main = withFile "some-file-pipes.txt.gz" WriteMode $ \h ->
runEffect $ P.compress P.defaultCompression producerPipe >-> PB.toHandle h
-- Edit
Here for what it's worth is yet another way of superimposing several producers on a single thread with pipes or conduit, to add to the different approaches Michael S and danidiaz mentioned:
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package pipes-zlib
{-# LANGUAGE OverloadedStrings #-}
import Pipes
import Pipes.GZip
import qualified Pipes.Prelude as P
import qualified Pipes.ByteString as Bytes
import System.IO
import Control.Monad (replicateM_)
producer = replicateM_ 50000 $ do
marie "This is going to Marie\n" -- arbitary IO can be interspersed here
arthur "This is going to Arthur\n" -- with liftIO
sylvia "This is going to Sylvia\n"
where
marie = yield; arthur = lift . yield; sylvia = lift . lift . yield
sinkHelper h p = runEffect (compress bestSpeed p >-> Bytes.toHandle h)
main :: IO ()
main =
withFile "marie.txt.gz" WriteMode $ \marie ->
withFile "arthur.txt.gz" WriteMode $ \arthur ->
withFile "sylvia.txt.gz" WriteMode $ \sylvia ->
sinkHelper sylvia
$ sinkHelper arthur
$ sinkHelper marie
$ producer
It is quite simple and fast, and can be written in conduit with the obvious alterations - but finding it natural involves a higher level of buy-in with the 'monad transformer stack' point of view. It would be the most natural way of writing such a program from the point of view of something like the streaming library.
Doing this with conduit is fairly straightforward, though you'd need to adjust your code a bit. I've put together an example of before and after code to demonstrate it. The basic idea is:
Replace hPutStr h with yield
Add some liftIO wrappers
Instead of using withBinaryFile or the like, use runConduitRes, gzip, and sinkFile
Here's the example:
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package conduit-extra
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.ByteString (ByteString, hPutStr)
import Data.Conduit (ConduitM, (.|), yield, runConduitRes)
import Data.Conduit.Binary (sinkFile)
import Data.Conduit.Zlib (gzip)
import System.IO (Handle)
-- Some helper function you may have
someAction :: IO ByteString
someAction = return "This is a string\n"
-- Original version
producerHandle :: Handle -> IO ()
producerHandle h = do
str <- someAction
hPutStr h str
-- Conduit version
producerConduit :: MonadIO m => ConduitM i ByteString m ()
producerConduit = do
str <- liftIO someAction
yield str
main :: IO ()
main = runConduitRes $ producerConduit
.| gzip
.| sinkFile "some-file.txt.gz"
You can learn more about conduit in the conduit tutorial.
Your Java idea is interesting, give me a few more minutes, I'll add an answer that looks more like that.
EDIT
Here's a version that's closer to your Java style approach. It relies on a SinkFunc.hs module which is available as a Gist at: https://gist.github.com/snoyberg/283154123d30ff9e201ea4436a5dd22d
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package conduit-extra
{-# LANGUAGE OverloadedStrings #-}
{-# OPTIONS_GHC -Wall -Werror #-}
import Data.ByteString (ByteString)
import Data.Conduit ((.|))
import Data.Conduit.Binary (sinkHandle)
import Data.Conduit.Zlib (gzip)
import System.IO (withBinaryFile, IOMode (WriteMode))
import SinkFunc (withSinkFunc)
-- Some helper function you may have
someAction :: IO ByteString
someAction = return "This is a string\n"
producerFunc :: (ByteString -> IO ()) -> IO ()
producerFunc write = do
str <- someAction
write str
main :: IO ()
main = withBinaryFile "some-file.txt.gz" WriteMode $ \h -> do
let sink = gzip .| sinkHandle h
withSinkFunc sink $ \write -> producerFunc write
EDIT 2 One more for good measure, actually using ZipSink to stream the data to multiple different files. There are lots of different ways of slicing this, but this is one way that works:
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package conduit-extra
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Trans.Resource (MonadResource)
import Data.ByteString (ByteString)
import Data.Conduit (ConduitM, (.|), yield, runConduitRes, ZipSink (..))
import Data.Conduit.Binary (sinkFile)
import qualified Data.Conduit.List as CL
import Data.Conduit.Zlib (gzip)
data Output = Foo ByteString | Bar ByteString
fromFoo :: Output -> Maybe ByteString
fromFoo (Foo bs) = Just bs
fromFoo _ = Nothing
fromBar :: Output -> Maybe ByteString
fromBar (Bar bs) = Just bs
fromBar _ = Nothing
producer :: Monad m => ConduitM i Output m ()
producer = do
yield $ Foo "This is going to Foo"
yield $ Bar "This is going to Bar"
sinkHelper :: MonadResource m
=> FilePath
-> (Output -> Maybe ByteString)
-> ConduitM Output o m ()
sinkHelper fp f
= CL.mapMaybe f
.| gzip
.| sinkFile fp
main :: IO ()
main = runConduitRes
$ producer
.| getZipSink
(ZipSink (sinkHelper "foo.txt.gz" fromFoo) *>
ZipSink (sinkHelper "bar.txt.gz" fromBar))
For incremental compression, I think you could make use of compressIO/foldCompressStream in Codec.Compression.Zlib.Internal.
If you're able to represent your producer action as an IO (Maybe a) (such as an MVar take or InputStream/Chan read) where Nothing signifies end of input, something like this should work:
import System.IO (Handle)
import qualified Data.ByteString as BS
import qualified Codec.Compression.Zlib.Internal as ZLib
compressedWriter :: Handle -> (IO (Maybe BS.ByteString)) -> IO ()
compressedWriter handle source =
ZLib.foldCompressStream
(\next -> source >>= maybe (next BS.empty) next)
(\chunk next -> BS.hPut handle chunk >> next)
(return ())
(ZLib.compressIO ZLib.rawFormat ZLib.defaultCompressParams)
This solution is similar to Michael Snoyman's EDIT 2, but uses the foldl, pipes, pipes-zlib and streaming-eversion packages.
{-# language OverloadedStrings #-}
module Main where
-- cabal install bytestring foldl pipes pipes-zlib streaming-eversion
import Data.Foldable
import Data.ByteString
import qualified Control.Foldl as L
import Pipes
import qualified Pipes.Prelude
import Pipes.Zlib (compress,defaultCompression,defaultWindowBits)
import Streaming.Eversion.Pipes (transvertMIO)
import System.IO
type Tag = String
producer :: Monad m => Producer (Tag,ByteString) m ()
producer = do
yield $ ("foo","This is going to Foo")
yield $ ("bar","This is going to Bar")
foldForTag :: Handle -> Tag -> L.FoldM IO (Tag,ByteString) ()
foldForTag handle tag =
L.premapM (\(tag',bytes) -> if tag' == tag then Just bytes else Nothing)
. L.handlesM L.folded
. transvertMIO (compress defaultCompression defaultWindowBits)
$ L.mapM_ (Data.ByteString.hPut handle)
main :: IO ()
main = do
withFile "foo.txt" WriteMode $ \h1 ->
withFile "bar.txt" WriteMode $ \h2 ->
let multifold = traverse_ (uncurry foldForTag) [(h1,"foo"),(h2,"bar")]
in L.impurely Pipes.Prelude.foldM multifold producer
This solution is similar to Michael Snoyman's EDIT 2, but uses the streaming, streaming-bytestring, pipes and pipes-zlib packages.
{-# language OverloadedStrings #-}
module Main where
-- cabal install bytestring streaming streaming-bytestring pipes pipes-zlib
import Data.ByteString
import qualified Data.ByteString.Streaming as B
import Streaming
import qualified Streaming.Prelude as S
import Pipes (next)
import qualified Pipes.Prelude
import Pipes.Zlib (compress,defaultCompression,defaultWindowBits)
import System.IO
type Tag = String
producer :: Monad m => Stream (Of (Tag,ByteString)) m ()
producer = do
S.yield ("foo","This is going to Foo")
S.yield ("bar","This is going to Bar")
-- I couldn't find a streaming-zlib on Hackage, took a pipes detour
compress' :: MonadIO m
=> Stream (Of ByteString) m r -> Stream (Of ByteString) m r
compress' = S.unfoldr Pipes.next
. compress defaultCompression defaultWindowBits
. Pipes.Prelude.unfoldr S.next
keepTag :: Monad m
=> Tag -> Stream (Of (Tag,ByteString)) m r -> Stream (Of ByteString) m r
keepTag tag = S.map snd . S.filter ((tag==) . fst)
main :: IO ()
main = runResourceT
. B.writeFile "foo.txt" . B.fromChunks . compress' . keepTag "foo"
. B.writeFile "bar.txt" . B.fromChunks . compress' . keepTag "bar"
$ S.copy producer
I make use of the copy function from Streaming.Prelude, that allows you to
Duplicate the content of stream, so that it can be acted on twice in
different ways, but without breaking streaming.

Haskell IO-Streams and Groundhog db usage

How to compile the following program? Somehow I cannot escape the error "No instance for (PersistBackend IO).
My aim is to see, how to efficiently fill a db-table using io-streams. The type of makeOutputStream is (Maybe a -> IO ()) -> IO (OutputStream a) while insertWords returns m () and it does not accept IO () as return type.
(Late addition: a work around found, but it is not an answer to the question. See below.)
The error msg is:
Words_read2.hs:30:36:
No instance for (PersistBackend IO)
arising from a use of `insertWord'
Possible fix: add an instance declaration for (PersistBackend IO)
In the first argument of `Streams.makeOutputStream', namely
`insertWord'
In a stmt of a 'do' block:
os <- Streams.makeOutputStream insertWord
In the expression:
do { is <- Streams.handleToInputStream h >>= Streams.words;
os <- Streams.makeOutputStream insertWord;
Streams.connect is os }
And the code producing this error is:
{-# LANGUAGE GADTs, TypeFamilies, TemplateHaskell, QuasiQuotes, FlexibleInstances, FlexibleContexts, StandaloneDeriving #-}
import qualified Data.ByteString as B
import Data.Maybe
import Control.Monad.IO.Class (MonadIO, liftIO)
import Database.Groundhog.Core
import Database.Groundhog.TH
import Database.Groundhog.Sqlite
import System.IO
import System.IO.Streams.File
import qualified System.IO.Streams as Streams
data Words = Words {word :: String} deriving (Eq, Show)
mkPersist defaultCodegenConfig [groundhog|
definitions:
- entity: Words
|]
insertWord :: (MonadIO m, PersistBackend m) => Maybe B.ByteString -> m ()
insertWord wo = case wo of
Just ww -> insert_ $ Words ((show . B.unpack) ww)
Nothing -> return ()
main = do
withSqliteConn "words2.sqlite" $ runDbConn $ do
runMigration defaultMigrationLogger $ migrate (undefined :: Words)
liftIO $ withFile "web2" ReadMode $ \h -> do -- a link to /usr/share/dict/web2 - a list of words one per line
is <- Streams.handleToInputStream h >>= Streams.words
os <- Streams.makeOutputStream insertWord
Streams.connect is os
As a work around, we can do things other way: we do not try to work inside runDbConn but rather return a handle to a (pool of) connection and pass it around. The idea come from SO answer to question:
Making Custom Instances of PersistBackend.
{-# LANGUAGE GADTs, TypeFamilies, TemplateHaskell, QuasiQuotes, FlexibleInstances, FlexibleContexts, StandaloneDeriving #-}
import qualified Data.ByteString as B
import Data.Maybe
import qualified Data.Text as T
import qualified Data.Text.Encoding as T
import Control.Monad.IO.Class -- (MonadIO, liftIO)
import Control.Monad.Trans.Control
import Database.Groundhog.Core
import Database.Groundhog.TH
import Database.Groundhog.Sqlite
import System.IO
import System.IO.Streams.File
import qualified System.IO.Streams as Streams
data Words = Words {word :: T.Text} deriving (Eq, Show)
mkPersist defaultCodegenConfig [groundhog|
definitions:
- entity: Words
|]
main = do
gh <- do withSqlitePool "words5.sqlite" 5 $ \pconn -> return pconn
runDbConn (runMigration defaultMigrationLogger $ migrate (undefined :: Words)) gh
withFile "web3" ReadMode $ \h -> do -- 500 words from /usr/share/dict/web2 - a list of words one per line
is <- Streams.handleToInputStream h >>= Streams.words
os <- Streams.makeOutputStream (iw2db gh)
Streams.connect is os
iw2db :: (MonadIO m, MonadBaseControl IO m, ConnectionManager cm Sqlite) => cm -> Maybe B.ByteString -> m()
iw2db gh (Just x) = runDbConn (insert_ $ Words (T.decodeUtf8 x)) gh
iw2db gh Nothing = return ()
Groundhog actions can run only in monad which is an instance of PersistBackend. IO cannot be made its instance because unlike DbPersist it does not carry connection information.
I like the code in the workaround, but can be made much faster. Now each action is run within its own transaction opened by runDbConn. To avoid this we can open a connection from pool and begin a single transaction. And then each action reuses this connection avoiding transaction overhead. Also createSqlitePool is nicer than withSqlitePool in this case.
{-# LANGUAGE GADTs, TypeFamilies, TemplateHaskell, QuasiQuotes, FlexibleInstances, FlexibleContexts, StandaloneDeriving #-}
import qualified Data.ByteString as B
import Data.Maybe
import qualified Data.Text as T
import qualified Data.Text.Encoding as T
import Control.Monad.IO.Class -- (MonadIO, liftIO)
import Control.Monad.Trans.Control
import Database.Groundhog.Core
import Database.Groundhog.TH
import Database.Groundhog.Sqlite
import System.IO
import System.IO.Streams.File
import qualified System.IO.Streams as Streams
import Control.Monad.Logger (MonadLogger, NoLoggingT(..))
data Words = Words {word :: T.Text} deriving (Eq, Show)
mkPersist defaultCodegenConfig [groundhog|
definitions:
- entity: Words
|]
main = do
gh <- createSqlitePool "words5.sqlite" 5
runDbConn (runMigration defaultMigrationLogger $ migrate (undefined :: Words)) gh
withFile "/usr/share/dict/words" ReadMode $ \h -> do -- 500 words from /usr/share/dict/web2 - a list of words one per line
is <- Streams.handleToInputStream h >>= Streams.words
withConn (\conn -> liftIO $ do -- (conn :: Sqlite) with opened transaction
os <- Streams.makeOutputStream (iw2db conn)
-- It is important to put Streams.connect inside withConn so that it uses the same transaction
-- If we put it outside, the transaction will be already closed and Sqlite will automatically do a new transaction for each insert
Streams.connect is os) gh
iw2db :: (MonadIO m, MonadBaseControl IO m, ConnectionManager cm Sqlite)
=> cm -> Maybe B.ByteString -> m ()
iw2db gh (Just x) = runDbConnNoTransaction (insert_ $ Words (T.decodeUtf8 x)) gh
iw2db gh Nothing = return ()
-- Probably this function should go to the Generic module
runDbConnNoTransaction :: (MonadBaseControl IO m, MonadIO m, ConnectionManager cm conn) => DbPersist conn (NoLoggingT m) a -> cm -> m a
runDbConnNoTransaction f cm = runNoLoggingT (withConnNoTransaction (runDbPersist f) cm)

Aeson deriveJSON combined with conduit sinkParser

Continuing my exploration of conduit and aeson, how would I go about using my own data type in stead of Value in this (slightly modified) code snippet from the Yesod book.
{-# LANGUAGE OverloadedStrings, TemplateHaskell #-}
import Network.Wai (Response, responseLBS, Application, requestBody)
import Network.HTTP.Types (status200, status400)
import Network.Wai.Handler.Warp (run)
import Data.Aeson.Parser (json)
import Data.Conduit.Attoparsec (sinkParser)
import Control.Monad.IO.Class (liftIO)
import Data.Aeson (Value(..), encode, object, (.=))
import Control.Exception (SomeException)
import Data.ByteString (ByteString)
import Data.Conduit (ResourceT, ($$))
import Control.Exception.Lifted (handle)
import qualified Data.HashMap.Strict as M
import Data.Aeson.TH (deriveJSON)
-- I ADDED THIS
data JSONRequest = JSONRequest {
command :: ByteString,
params :: M.HashMap ByteString ByteString
}
deriveJSON id ''JSONRequest
-- END OF WHAT I ADDED
main :: IO ()
main = run 3000 app
app :: Application
app req = handle invalidJson $ do
value <- requestBody req $$ sinkParser json
newValue <- liftIO $ dispatch value
return $ responseLBS
status200
[("Content-Type", "application/json")]
$ encode newValue
invalidJson :: SomeException -> ResourceT IO Response
invalidJson ex = return $ responseLBS
status400
[("Content-Type", "application/json")]
$ encode $ object
[ ("message" .= show ex)
]
-- Application-specific logic would go here.
dispatch :: Value -> IO Value
dispatch = return
Basically, I want to change the type of dispatch to JSONRequest -> IO JSONRequest. How do I tell the parser to use my own derived instance of fromJSON?
I tried just adding a type declaration, praying for polymorphic return type on json, but I realised it is strictly for Value.
Just looking at the types, don't you just need to fmap your fromJSON over the result coming from json? With a suitable signature for dispatch we just need:
-- import Data.Aeson
app :: Application
app req = handle invalidJson $ do
result <- requestBody req $$ sinkParser (fmap fromJSON json)
next_result <- liftIO $ dispatch result
return $ responseLBS status200 [("Content-Type", "application/json")]
$ encode next_result
dispatch :: Result JSONRequest -> IO JSONRequest
dispatch (Error str) = undefined
dispatch (Success jsonreq) = return jsonreq
But maybe it's a little clearer written thus:
-- import Data.Aeson
-- import qualified Data.Attoparsec as Atto
toRequest :: Value -> Result JSONRequest
toRequest = fromJSON -- specialized now to your fromJSON
jsonRequestParser :: Atto.Parser (Result JSONRequest)
jsonRequestParser = fmap toRequest json
app :: Application
app req = handle invalidJson $ do
result <- requestBody req $$ sinkParser jsonRequestParser
next_result <- liftIO $ dispatch result
return $ responseLBS status200 [("Content-Type", "application/json")]
$ encode next_result
dispatch :: Result JSONRequest -> IO JSONRequest
dispatch (Error str) = undefined
dispatch (Success jsonreq) = return jsonreq
I left the parser returning a Result JSONRequest so dispatch is handling Error cases too, which might mean you need your exception handling somehow?

Resources