How to compile the following program? Somehow I cannot escape the error "No instance for (PersistBackend IO).
My aim is to see, how to efficiently fill a db-table using io-streams. The type of makeOutputStream is (Maybe a -> IO ()) -> IO (OutputStream a) while insertWords returns m () and it does not accept IO () as return type.
(Late addition: a work around found, but it is not an answer to the question. See below.)
The error msg is:
Words_read2.hs:30:36:
No instance for (PersistBackend IO)
arising from a use of `insertWord'
Possible fix: add an instance declaration for (PersistBackend IO)
In the first argument of `Streams.makeOutputStream', namely
`insertWord'
In a stmt of a 'do' block:
os <- Streams.makeOutputStream insertWord
In the expression:
do { is <- Streams.handleToInputStream h >>= Streams.words;
os <- Streams.makeOutputStream insertWord;
Streams.connect is os }
And the code producing this error is:
{-# LANGUAGE GADTs, TypeFamilies, TemplateHaskell, QuasiQuotes, FlexibleInstances, FlexibleContexts, StandaloneDeriving #-}
import qualified Data.ByteString as B
import Data.Maybe
import Control.Monad.IO.Class (MonadIO, liftIO)
import Database.Groundhog.Core
import Database.Groundhog.TH
import Database.Groundhog.Sqlite
import System.IO
import System.IO.Streams.File
import qualified System.IO.Streams as Streams
data Words = Words {word :: String} deriving (Eq, Show)
mkPersist defaultCodegenConfig [groundhog|
definitions:
- entity: Words
|]
insertWord :: (MonadIO m, PersistBackend m) => Maybe B.ByteString -> m ()
insertWord wo = case wo of
Just ww -> insert_ $ Words ((show . B.unpack) ww)
Nothing -> return ()
main = do
withSqliteConn "words2.sqlite" $ runDbConn $ do
runMigration defaultMigrationLogger $ migrate (undefined :: Words)
liftIO $ withFile "web2" ReadMode $ \h -> do -- a link to /usr/share/dict/web2 - a list of words one per line
is <- Streams.handleToInputStream h >>= Streams.words
os <- Streams.makeOutputStream insertWord
Streams.connect is os
As a work around, we can do things other way: we do not try to work inside runDbConn but rather return a handle to a (pool of) connection and pass it around. The idea come from SO answer to question:
Making Custom Instances of PersistBackend.
{-# LANGUAGE GADTs, TypeFamilies, TemplateHaskell, QuasiQuotes, FlexibleInstances, FlexibleContexts, StandaloneDeriving #-}
import qualified Data.ByteString as B
import Data.Maybe
import qualified Data.Text as T
import qualified Data.Text.Encoding as T
import Control.Monad.IO.Class -- (MonadIO, liftIO)
import Control.Monad.Trans.Control
import Database.Groundhog.Core
import Database.Groundhog.TH
import Database.Groundhog.Sqlite
import System.IO
import System.IO.Streams.File
import qualified System.IO.Streams as Streams
data Words = Words {word :: T.Text} deriving (Eq, Show)
mkPersist defaultCodegenConfig [groundhog|
definitions:
- entity: Words
|]
main = do
gh <- do withSqlitePool "words5.sqlite" 5 $ \pconn -> return pconn
runDbConn (runMigration defaultMigrationLogger $ migrate (undefined :: Words)) gh
withFile "web3" ReadMode $ \h -> do -- 500 words from /usr/share/dict/web2 - a list of words one per line
is <- Streams.handleToInputStream h >>= Streams.words
os <- Streams.makeOutputStream (iw2db gh)
Streams.connect is os
iw2db :: (MonadIO m, MonadBaseControl IO m, ConnectionManager cm Sqlite) => cm -> Maybe B.ByteString -> m()
iw2db gh (Just x) = runDbConn (insert_ $ Words (T.decodeUtf8 x)) gh
iw2db gh Nothing = return ()
Groundhog actions can run only in monad which is an instance of PersistBackend. IO cannot be made its instance because unlike DbPersist it does not carry connection information.
I like the code in the workaround, but can be made much faster. Now each action is run within its own transaction opened by runDbConn. To avoid this we can open a connection from pool and begin a single transaction. And then each action reuses this connection avoiding transaction overhead. Also createSqlitePool is nicer than withSqlitePool in this case.
{-# LANGUAGE GADTs, TypeFamilies, TemplateHaskell, QuasiQuotes, FlexibleInstances, FlexibleContexts, StandaloneDeriving #-}
import qualified Data.ByteString as B
import Data.Maybe
import qualified Data.Text as T
import qualified Data.Text.Encoding as T
import Control.Monad.IO.Class -- (MonadIO, liftIO)
import Control.Monad.Trans.Control
import Database.Groundhog.Core
import Database.Groundhog.TH
import Database.Groundhog.Sqlite
import System.IO
import System.IO.Streams.File
import qualified System.IO.Streams as Streams
import Control.Monad.Logger (MonadLogger, NoLoggingT(..))
data Words = Words {word :: T.Text} deriving (Eq, Show)
mkPersist defaultCodegenConfig [groundhog|
definitions:
- entity: Words
|]
main = do
gh <- createSqlitePool "words5.sqlite" 5
runDbConn (runMigration defaultMigrationLogger $ migrate (undefined :: Words)) gh
withFile "/usr/share/dict/words" ReadMode $ \h -> do -- 500 words from /usr/share/dict/web2 - a list of words one per line
is <- Streams.handleToInputStream h >>= Streams.words
withConn (\conn -> liftIO $ do -- (conn :: Sqlite) with opened transaction
os <- Streams.makeOutputStream (iw2db conn)
-- It is important to put Streams.connect inside withConn so that it uses the same transaction
-- If we put it outside, the transaction will be already closed and Sqlite will automatically do a new transaction for each insert
Streams.connect is os) gh
iw2db :: (MonadIO m, MonadBaseControl IO m, ConnectionManager cm Sqlite)
=> cm -> Maybe B.ByteString -> m ()
iw2db gh (Just x) = runDbConnNoTransaction (insert_ $ Words (T.decodeUtf8 x)) gh
iw2db gh Nothing = return ()
-- Probably this function should go to the Generic module
runDbConnNoTransaction :: (MonadBaseControl IO m, MonadIO m, ConnectionManager cm conn) => DbPersist conn (NoLoggingT m) a -> cm -> m a
runDbConnNoTransaction f cm = runNoLoggingT (withConnNoTransaction (runDbPersist f) cm)
Related
I would like for my default handler to be able to catch all of the exceptions that my App throws but in order for this to happen I need to manually call raise after manually adding some exception catching around my IO code.
below is an example minimal server:
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Lib
( someFunc
) where
import Web.Scotty.Trans
import Control.Monad.Trans
import Control.Monad.Reader
import Control.Monad.Catch
import Control.Monad.Except
import Data.Text.Lazy as TL
data AppEnv = AppEnv
{ appStuff :: String
}
newtype App a = App
{ unApp :: ReaderT AppEnv IO a
} deriving (Functor, Applicative, Monad, MonadIO, MonadReader AppEnv, MonadThrow)
someFunc :: IO ()
someFunc = do
let run a = runReaderT (unApp $ App a) (AppEnv "APPY STUFF")
scottyT 8080 run $ do
defaultHandler $ \(e :: TL.Text) -> do
liftIO $ print "HERE"
liftIO $ print $ showError e
html $ "Something Went Seriously Wrong"
get "/" $ do
(r :: (Either TL.Text String)) <- liftIO $ runExceptT $ do
(uId) <- lift $ readFile "./helloworld.txt"
return $ ("hello")
liftIO $ print r
case r of
Left l -> raise l
Right s -> (html "hello world")
get "/catch-this" $ do
error "Catch Me"
(html "hello world")
notFound $ do
html "That is not a valid route"
I would like to be able to catch all of my uncaught exceptions in my default handler however this is not the default behavior of scotty that only happens if you call raise. I could wrap all of my ActionM code blocks in ExceptT however this seems like a messy/mechanical way of solving this problem. I mostly want to do this for logging purposes so I can report out to Sentry or Log to a file and this would make it much more convenient.
I figured I'd throw this in there as I recently was looking for this same solution again. It's unfortunate but I was never able to get the behavior that I wanted out of scotty.
Fortunately since scotty is just a nice library to create WAI application's you can get a nice work around using the Settings type from warp and the Options type from scotty.
Below is an example of how you can approach this:
{-# LANGUAGE OverloadedStrings #-}
module Lib
( someFunc
) where
import Web.Scotty.Trans
import Data.Text
import qualified Data.Text.Lazy as TL
import Control.Monad.IO.Class
import Control.Exception
import Network.HTTP.Types
import System.IO.Error
import Network.Wai.Handler.Warp
import Network.Wai
myOpts :: Options
myOpts = Options 1 mySettings
mySettings :: Settings
mySettings = setOnExceptionResponse myHandler $ setPort 3002 $ defaultSettings
myHandler :: SomeException -> Response
myHandler se = responseLBS status500 [] "HERE WE ARE"
someFunc :: IO ()
someFunc = do
scottyOptsT myOpts id routes
myExceptions :: (MonadIO m) => TL.Text -> ActionT TL.Text m ()
myExceptions t = do
liftIO $ print t
html "error"
routes :: (MonadIO m) => ScottyT TL.Text m ()
routes = do
defaultHandler $ \str -> do
liftAndCatchIO $ print str
status status500
json ("welp you thought"::Text)
get "/:here" $ do
liftIO $ ioError $ userError "Hahah"
text "here"
You can then tap further into the Settings type provided by warp so that you could maybe log all the error messages or perform some custom action using the following methods setOnException setOnExceptionResponse.
I'm trying to thread configuration through my Scotty based application using ReaderT monad transformer approach, and having trouble doing so. I have to use configuration both when defining routes (as some of them depend on the config) and when handling actual requests.
The latter works just fine in the ActionT, but no matter what I try I just can't get the types right in ScottyT.
Here's the minimal example I compiled from the ReaderT sample from Scotty GitHub repository:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Control.Applicative
import Control.Monad.Reader (MonadIO, MonadReader, ReaderT, asks, lift, runReaderT)
import Data.Default.Class (def)
import Data.Text.Lazy (Text, pack)
import Prelude
import Web.Scotty.Trans (ScottyT, get, scottyOptsT, text, capture)
data Config = Config
{ environment :: String
} deriving (Eq, Read, Show)
newtype ConfigM a = ConfigM
{ runConfigM :: ReaderT Config IO a
} deriving (Applicative, Functor, Monad, MonadIO, MonadReader Config)
application :: ScottyT Text ConfigM ()
application = do
get "/" $ do
e <- lift $ asks environment
text $ pack $ show e
path <- lift $ asks environment
get (capture path) $ do
text $ pack $ "Hello, custom path"
main :: IO ()
main = scottyOptsT def runIO application where
runIO :: ConfigM a -> IO a
runIO m = runReaderT (runConfigM m) config
config :: Config
config = Config
{ environment = "Development"
}
The error I'm getting is:
• No instance for (Control.Monad.Trans.Class.MonadTrans
(ScottyT Text))
arising from a use of ‘lift’
• In a stmt of a 'do' block: path <- lift $ asks environment
I've looked through the code where ScottyT type is outlined, and indeed there doesn't seem to be an instance of MonadTrans defined for it.
However, I don't feel I have enough both mana and Haskell experience to find a way out of it and would appreciate any help!
Thank you!
With a collective mind we all came to a currently viable solution to the problem.
ScottyT type cased to be a monad transformer with after https://github.com/scotty-web/scotty/pull/167 got merged, therefore there's currently no way of using it this way. There was a PR https://github.com/scotty-web/scotty/pull/181 aimed at bringing that feature back, but as far as I understood it has never got merged.
Since it's not a monad transformer we can only wrap it again:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Control.Applicative
import Control.Monad.Reader (MonadIO, MonadReader, ReaderT, asks, lift, runReaderT)
import Data.Default.Class (def)
import Data.Text.Lazy (Text, pack)
import Prelude
import Web.Scotty.Trans (ScottyT, get, scottyOptsT, text, capture)
data Config = Config
{ environment :: String
} deriving (Eq, Read, Show)
newtype ConfigM a = ConfigM
{ runConfigM :: ReaderT Config IO a
} deriving (Applicative, Functor, Monad, MonadIO, MonadReader Config)
application :: ConfigM (ScottyT Text ConfigM ())
application = do
path <- asks environment
return $
get "/" $ do
e <- lift $ asks environment
text $ pack $ show e
get (capture path) $
text $ pack $ "Hello, custom path"
runIO :: Config -> ConfigM a -> IO a
runIO c m = runReaderT (runConfigM m) c
main :: IO ()
main = do
let config = Config { environment = "/path" }
app <- runIO config application
scottyOptsT def (runIO config) app
Thanks everyone for helping me out, and hopefully this helps another wandering Scotty like me :).
I have a computation that along with other things generates some data (a lot of it) and I want to write into a file.
The way the code is structured now is (simplified):
writeRecord :: Handle -> Record -> IO ()
writeRecord h r = hPutStrLn h (toByteString r)
This function is then called periodically during a bigger computation. It is almost like a log, and in fact, multiple files are being written simultaneously.
Now I want the output file to be compressed with Gzip.
In languages like Java I would do something like:
outStream = new GzipOutputStream(new FileOutputStream(path))
and then would just write into that wrapped output stream.
What is the way of doing it in Haskell?
I think writing something like
writeRecord h r = hPut h ((compressed . toByteString) r)
is not correct because compressing each small bit individually isn't efficient (I even tried it and the size of the compressed file is bigger than uncompressed this way).
I also don't think that I can just produce a lazy ByteString (or even a list of chunks) and then write it with compressed . fromChunks because this will require my "generator" building the full thing in memory. And the fact that more than one file is produced at the same time makes it even more complicated.
So what would be a way to solve this in Haskell? Writing to file(s) and have them gzipped?
All the streaming libraries support compression. If I understand the particular problem and the way you are thinking about it, io-streams might be the simplest for your purposes. Here I alternate between writing to trump and clinton output streams, which are written as compressed files. I follow by showing the pipes equivalent of Michael's conduit program
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package io-streams
{-# LANGUAGE OverloadedStrings #-}
import qualified System.IO.Streams as IOS
import qualified System.IO as IO
import Data.ByteString (ByteString)
analyzer :: IOS.OutputStream ByteString -> IOS.OutputStream ByteString -> IO ()
analyzer clinton trump = do
IOS.write (Just "This is a string\n") clinton
IOS.write (Just "This is a string\n") trump
IOS.write (Just "Clinton string\n") clinton
IOS.write (Just "Trump string\n") trump
IOS.write (Just "Another Clinton string\n") clinton
IOS.write (Just "Another Trump string\n") trump
IOS.write Nothing clinton
IOS.write Nothing trump
main:: IO ()
main =
IOS.withFileAsOutput "some-file-clinton.txt.gz" $ \clinton_compressed ->
IOS.withFileAsOutput "some-file-trump.txt.gz" $ \trump_compressed -> do
clinton <- IOS.gzip IOS.defaultCompressionLevel clinton_compressed
trump <- IOS.gzip IOS.defaultCompressionLevel trump_compressed
analyzer clinton trump
Obviously you can mix all kinds of IO in analyzer between acts of writing to the two output streams - I'm just show in the writes, so to speak. In particular, if analyzer is understood as depending on an input stream, the writes can depend on reads from the input stream. Here's a (slightly!) more complicated program that does that. If I run the program above I see
$ stack gzip_so.hs
$ gunzip some-file-clinton.txt.gz
$ gunzip some-file-trump.txt.gz
$ cat some-file-clinton.txt
This is a string
Clinton string
Another Clinton string
$ cat some-file-trump.txt
This is a string
Trump string
Another Trump string
With pipes and conduit there are various ways of achieving the above effect, with a higher level of decomposition of parts. Writing to separate files will however be a little more subtle. Here in any case is the pipes equivalent of Michael S's conduit program:
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package pipes-zlib
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.ByteString (ByteString, hPutStr)
import System.IO (IOMode(..), withFile, Handle)
import Pipes
import qualified Pipes.ByteString as PB
import qualified Pipes.GZip as P
-- Some helper function you may have
someAction :: IO ByteString
someAction = return "This is a string\n"
-- Original version
producerHandle :: Handle -> IO ()
producerHandle h = do
str <- someAction
hPutStr h str
producerPipe :: MonadIO m => Producer ByteString m ()
producerPipe = do
str <- liftIO someAction
yield str
main :: IO ()
main = withFile "some-file-pipes.txt.gz" WriteMode $ \h ->
runEffect $ P.compress P.defaultCompression producerPipe >-> PB.toHandle h
-- Edit
Here for what it's worth is yet another way of superimposing several producers on a single thread with pipes or conduit, to add to the different approaches Michael S and danidiaz mentioned:
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package pipes-zlib
{-# LANGUAGE OverloadedStrings #-}
import Pipes
import Pipes.GZip
import qualified Pipes.Prelude as P
import qualified Pipes.ByteString as Bytes
import System.IO
import Control.Monad (replicateM_)
producer = replicateM_ 50000 $ do
marie "This is going to Marie\n" -- arbitary IO can be interspersed here
arthur "This is going to Arthur\n" -- with liftIO
sylvia "This is going to Sylvia\n"
where
marie = yield; arthur = lift . yield; sylvia = lift . lift . yield
sinkHelper h p = runEffect (compress bestSpeed p >-> Bytes.toHandle h)
main :: IO ()
main =
withFile "marie.txt.gz" WriteMode $ \marie ->
withFile "arthur.txt.gz" WriteMode $ \arthur ->
withFile "sylvia.txt.gz" WriteMode $ \sylvia ->
sinkHelper sylvia
$ sinkHelper arthur
$ sinkHelper marie
$ producer
It is quite simple and fast, and can be written in conduit with the obvious alterations - but finding it natural involves a higher level of buy-in with the 'monad transformer stack' point of view. It would be the most natural way of writing such a program from the point of view of something like the streaming library.
Doing this with conduit is fairly straightforward, though you'd need to adjust your code a bit. I've put together an example of before and after code to demonstrate it. The basic idea is:
Replace hPutStr h with yield
Add some liftIO wrappers
Instead of using withBinaryFile or the like, use runConduitRes, gzip, and sinkFile
Here's the example:
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package conduit-extra
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.ByteString (ByteString, hPutStr)
import Data.Conduit (ConduitM, (.|), yield, runConduitRes)
import Data.Conduit.Binary (sinkFile)
import Data.Conduit.Zlib (gzip)
import System.IO (Handle)
-- Some helper function you may have
someAction :: IO ByteString
someAction = return "This is a string\n"
-- Original version
producerHandle :: Handle -> IO ()
producerHandle h = do
str <- someAction
hPutStr h str
-- Conduit version
producerConduit :: MonadIO m => ConduitM i ByteString m ()
producerConduit = do
str <- liftIO someAction
yield str
main :: IO ()
main = runConduitRes $ producerConduit
.| gzip
.| sinkFile "some-file.txt.gz"
You can learn more about conduit in the conduit tutorial.
Your Java idea is interesting, give me a few more minutes, I'll add an answer that looks more like that.
EDIT
Here's a version that's closer to your Java style approach. It relies on a SinkFunc.hs module which is available as a Gist at: https://gist.github.com/snoyberg/283154123d30ff9e201ea4436a5dd22d
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package conduit-extra
{-# LANGUAGE OverloadedStrings #-}
{-# OPTIONS_GHC -Wall -Werror #-}
import Data.ByteString (ByteString)
import Data.Conduit ((.|))
import Data.Conduit.Binary (sinkHandle)
import Data.Conduit.Zlib (gzip)
import System.IO (withBinaryFile, IOMode (WriteMode))
import SinkFunc (withSinkFunc)
-- Some helper function you may have
someAction :: IO ByteString
someAction = return "This is a string\n"
producerFunc :: (ByteString -> IO ()) -> IO ()
producerFunc write = do
str <- someAction
write str
main :: IO ()
main = withBinaryFile "some-file.txt.gz" WriteMode $ \h -> do
let sink = gzip .| sinkHandle h
withSinkFunc sink $ \write -> producerFunc write
EDIT 2 One more for good measure, actually using ZipSink to stream the data to multiple different files. There are lots of different ways of slicing this, but this is one way that works:
#!/usr/bin/env stack
-- stack --resolver lts-6.21 --install-ghc runghc --package conduit-extra
{-# LANGUAGE OverloadedStrings #-}
import Control.Monad.Trans.Resource (MonadResource)
import Data.ByteString (ByteString)
import Data.Conduit (ConduitM, (.|), yield, runConduitRes, ZipSink (..))
import Data.Conduit.Binary (sinkFile)
import qualified Data.Conduit.List as CL
import Data.Conduit.Zlib (gzip)
data Output = Foo ByteString | Bar ByteString
fromFoo :: Output -> Maybe ByteString
fromFoo (Foo bs) = Just bs
fromFoo _ = Nothing
fromBar :: Output -> Maybe ByteString
fromBar (Bar bs) = Just bs
fromBar _ = Nothing
producer :: Monad m => ConduitM i Output m ()
producer = do
yield $ Foo "This is going to Foo"
yield $ Bar "This is going to Bar"
sinkHelper :: MonadResource m
=> FilePath
-> (Output -> Maybe ByteString)
-> ConduitM Output o m ()
sinkHelper fp f
= CL.mapMaybe f
.| gzip
.| sinkFile fp
main :: IO ()
main = runConduitRes
$ producer
.| getZipSink
(ZipSink (sinkHelper "foo.txt.gz" fromFoo) *>
ZipSink (sinkHelper "bar.txt.gz" fromBar))
For incremental compression, I think you could make use of compressIO/foldCompressStream in Codec.Compression.Zlib.Internal.
If you're able to represent your producer action as an IO (Maybe a) (such as an MVar take or InputStream/Chan read) where Nothing signifies end of input, something like this should work:
import System.IO (Handle)
import qualified Data.ByteString as BS
import qualified Codec.Compression.Zlib.Internal as ZLib
compressedWriter :: Handle -> (IO (Maybe BS.ByteString)) -> IO ()
compressedWriter handle source =
ZLib.foldCompressStream
(\next -> source >>= maybe (next BS.empty) next)
(\chunk next -> BS.hPut handle chunk >> next)
(return ())
(ZLib.compressIO ZLib.rawFormat ZLib.defaultCompressParams)
This solution is similar to Michael Snoyman's EDIT 2, but uses the foldl, pipes, pipes-zlib and streaming-eversion packages.
{-# language OverloadedStrings #-}
module Main where
-- cabal install bytestring foldl pipes pipes-zlib streaming-eversion
import Data.Foldable
import Data.ByteString
import qualified Control.Foldl as L
import Pipes
import qualified Pipes.Prelude
import Pipes.Zlib (compress,defaultCompression,defaultWindowBits)
import Streaming.Eversion.Pipes (transvertMIO)
import System.IO
type Tag = String
producer :: Monad m => Producer (Tag,ByteString) m ()
producer = do
yield $ ("foo","This is going to Foo")
yield $ ("bar","This is going to Bar")
foldForTag :: Handle -> Tag -> L.FoldM IO (Tag,ByteString) ()
foldForTag handle tag =
L.premapM (\(tag',bytes) -> if tag' == tag then Just bytes else Nothing)
. L.handlesM L.folded
. transvertMIO (compress defaultCompression defaultWindowBits)
$ L.mapM_ (Data.ByteString.hPut handle)
main :: IO ()
main = do
withFile "foo.txt" WriteMode $ \h1 ->
withFile "bar.txt" WriteMode $ \h2 ->
let multifold = traverse_ (uncurry foldForTag) [(h1,"foo"),(h2,"bar")]
in L.impurely Pipes.Prelude.foldM multifold producer
This solution is similar to Michael Snoyman's EDIT 2, but uses the streaming, streaming-bytestring, pipes and pipes-zlib packages.
{-# language OverloadedStrings #-}
module Main where
-- cabal install bytestring streaming streaming-bytestring pipes pipes-zlib
import Data.ByteString
import qualified Data.ByteString.Streaming as B
import Streaming
import qualified Streaming.Prelude as S
import Pipes (next)
import qualified Pipes.Prelude
import Pipes.Zlib (compress,defaultCompression,defaultWindowBits)
import System.IO
type Tag = String
producer :: Monad m => Stream (Of (Tag,ByteString)) m ()
producer = do
S.yield ("foo","This is going to Foo")
S.yield ("bar","This is going to Bar")
-- I couldn't find a streaming-zlib on Hackage, took a pipes detour
compress' :: MonadIO m
=> Stream (Of ByteString) m r -> Stream (Of ByteString) m r
compress' = S.unfoldr Pipes.next
. compress defaultCompression defaultWindowBits
. Pipes.Prelude.unfoldr S.next
keepTag :: Monad m
=> Tag -> Stream (Of (Tag,ByteString)) m r -> Stream (Of ByteString) m r
keepTag tag = S.map snd . S.filter ((tag==) . fst)
main :: IO ()
main = runResourceT
. B.writeFile "foo.txt" . B.fromChunks . compress' . keepTag "foo"
. B.writeFile "bar.txt" . B.fromChunks . compress' . keepTag "bar"
$ S.copy producer
I make use of the copy function from Streaming.Prelude, that allows you to
Duplicate the content of stream, so that it can be acted on twice in
different ways, but without breaking streaming.
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric, ScopedTypeVariables #-}
import Data.Generics (Data, Typeable)
import Data.Conduit
import qualified Data.Conduit.List as CL
import Database.CouchDB.Conduit.Generic
import Database.CouchDB.Conduit
import Database.CouchDB.Conduit.View
import Data.ByteString.Char8 (ByteString, empty)
import Control.Monad.IO.Class (liftIO)
import Data.Aeson
import Data.Aeson.Types
import GHC.Generics
data Page = Page { id_ :: ByteString, url :: ByteString }
deriving (Show, Data, Typeable, Generic)
instance FromJSON Page
getPages :: IO ()
getPages = runCouch (def { couchHost = "192.168.0.103" }) $ do
couchView_ "reader" "reader" "pages" [] $ CL.mapM_ (liftIO . print)
This works and gives me this:
*Main> getPages
fromList [("key",String "802e343945c7f8da2d8a71fdb80025a7"),("id",String "802e343945c7f8da2d8a71fdb80025a7"),("value",String "http://yandex.ru")]`
But I actually want a function getPages :: IO [Page], so I tried this:
getPages = runCouch (def { couchHost = "192.168.0.103" }) $ do
couchView_ "reader" "reader" "pages" [] $ toType =$ CL.consume`
which gives me type error:
`Reader/Couch.hs:24:47:
Couldn't match expected type `Object' with actual type `Value'
Expected type: Conduit Object m1 b0
Actual type: Conduit Value m0 a0
In the first argument of `(=$)', namely `toType'
In the second argument of `($)', namely `toType =$ CL.consume'
Failed, modules loaded: none.
This is not surprising because couchView needs Sink Object m a as a parameter.
The question is: how to implement getPages :: IO [Page]?
I'm not familiar with conduit, aeson, or couchDB, but this at least type-checks:
getPages :: IO [Result Page]
getPages = runCouch (def { couchHost = "192.168.0.103" }) $ do
couchView_ "reader" "reader" "pages" [] $ CL.map (fromJSON . Object) =$ CL.consume
Using Parsec 3.1, it is possible to parse several types of inputs:
[Char] with Text.Parsec.String
Data.ByteString with Text.Parsec.ByteString
Data.ByteString.Lazy with Text.Parsec.ByteString.Lazy
I don't see anything for the Data.Text module. I want to parse Unicode content without suffering from the String inefficiencies. So I've created the following module based on the Text.Parsec.ByteString module:
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
{-# OPTIONS_GHC -fno-warn-orphans #-}
module Text.Parsec.Text
( Parser, GenParser
) where
import Text.Parsec.Prim
import qualified Data.Text as T
instance (Monad m) => Stream T.Text m Char where
uncons = return . T.uncons
type Parser = Parsec T.Text ()
type GenParser t st = Parsec T.Text st
Does it make sense to do so?
It this compatible with the rest of the Parsec API?
Additional comments:
I had to add {-# LANGUAGE NoMonomorphismRestriction #-} pragma in my parse modules to make it work.
Parsing Text is one thing, building an AST with Text is another thing. I will also need to pack my String before return:
module TestText where
import Data.Text as T
import Text.Parsec
import Text.Parsec.Prim
import Text.Parsec.Text
input = T.pack "xxxxxxxxxxxxxxyyyyxxxxxxxxxp"
parser = do
x1 <- many1 (char 'x')
y <- many1 (char 'y')
x2 <- many1 (char 'x')
return (T.pack x1, T.pack y, T.pack x2)
test = runParser parser () "test" input
Since Parsec 3.1.2 support of Data.Text is built-in!
See http://hackage.haskell.org/package/parsec-3.1.2
If you are stuck with older version, the code snippets in other answers are helpful, too.
That looks like exactly what you need to do.
It should be compatible with the rest of Parsec, include the Parsec.Char parsers.
If you're using Cabal to build your program, please put an upper bound of parsec-3.1 in your package description, in case the maintainer decides to include that instance in a future version of Parsec.
I added a function parseFromUtf8File to help reading UTF-8 encoded files in an efficient fashion. Works flawlessly with umlaut characters. Function type matches parseFromFile from Text.Parsec.ByteString. This version uses strict ByteStrings.
-- A derivate work from
-- http://stackoverflow.com/questions/4064532/using-parsec-with-data-text
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
{-# OPTIONS_GHC -fno-warn-orphans #-}
module Text.Parsec.Text
( Parser, GenParser, parseFromUtf8File
) where
import Text.Parsec.Prim
import qualified Data.Text as T
import qualified Data.ByteString as B
import Data.Text.Encoding
import Text.Parsec.Error
instance (Monad m) => Stream T.Text m Char where
uncons = return . T.uncons
type Parser = Parsec T.Text ()
type GenParser t st = Parsec T.Text st
-- | #parseFromUtf8File p filePath# runs a strict bytestring parser
-- #p# on the input read from #filePath# using
-- 'ByteString.readFile'. Returns either a 'ParseError' ('Left') or a
-- value of type #a# ('Right').
--
-- > main = do{ result <- parseFromFile numbers "digits.txt"
-- > ; case result of
-- > Left err -> print err
-- > Right xs -> print (sum xs)
-- > }
parseFromUtf8File :: Parser a -> String -> IO (Either ParseError a)
parseFromUtf8File p fname = do
raw <- B.readFile fname
let input = decodeUtf8 raw
return (runP p () fname input)