Copying data between databases using Haskell Persistent


First up, I have a large Sqlite database. I want to copy a subset of the data in this database to another Sqlite database in a streaming fashion using the Haskell persistent library.
Here's the only way I managed to do it, but it's very slow (and uses more memory) compared to loading the data into memory and then writing it to the new database.
import Database.Persist.Sqlite (SqliteConf(..), runSqlite, runMigration, runSqlPersistMPool)
import Database.Persist (entityVal, Entity, createPoolConfig)
import Data.Pool (Pool)
import Data.Conduit (ConduitT, (.|), runConduit)
import qualified Data.Conduit.List as CL

main :: RIO App ()
main = do
    ipool <- liftIO $ createPoolConfig (SqliteConf "mydb.db" 1)
    opool <- liftIO $ createPoolConfig (SqliteConf "mydbsmall.db" 1)
    liftIO $ flip runSqlPersistMPool opool $ do
        runMigration migrateAll
    liftIO $ flip runSqlPersistMPool ipool $ do
        runConduit $ selectSource (distinct $ from $ \pays -> return pays) .| (migrate opool)

migrate :: (Monad m, MonadIO m)
        => Pool SqlBackend
        -> ConduitT (Entity Pays) Void m ()
migrate pool = CL.mapM_ (\e -> liftIO $ flip runSqlPersistMPool pool $ insertKey (entityKey e) (entityVal e))
The problem is in the migrate function, since I'm calling runSqlPersistMPool for every single row in my conduit stream.
What is the right way to do it?

Related

Collecting the Async results as they become available

How would you collect the results of a list of Async a in Haskell as they become available? The idea is to start processing the results of asynchronous tasks as soon as they are available.
The best I could come up with is the following function:
collect :: [Async a] -> IO [a]
collect [] = return []
collect asyncs = do
    (a, r) <- waitAny asyncs
    rs <- collect (filter (/= a) asyncs)
    return (r:rs)
However, this function does not exhibit the desired behavior since, as pointed out in the comment below, it doesn't return until all the asynchronous tasks are completed. Furthermore, collect runs in O(n^2) since I'm filtering the list at each recursive step. This could be improved by using a more efficient structure (and maybe indexing the position of the Async values in the list).
Maybe there are library functions that take care of this, but I could not find them in the Control.Concurrent.Async module and I wonder why.
EDIT: after thinking about the problem a bit more carefully, I'm wondering whether such a function is a good idea. I could just use fmap on the asynchronous tasks. Maybe it is better practice to wait for the results only when there is no other choice.

As I mentioned in my other answer, streaming results out of a list of Asyncs as they become available is best achieved using a stream processing library. Here's an example using pipes.
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async
import Control.Concurrent.STM
import Data.Functor (($>))
import Pipes
import Pipes.Concurrent -- from the pipes-concurrency package
import qualified Pipes.Prelude as P

asCompleted :: MonadIO m => [Async a] -> Producer a m ()
asCompleted asyncs = do
    (o, i, seal) <- liftIO $ spawn' unbounded
    liftIO $ forkIO $ do
        forConcurrently asyncs (\async -> atomically $ waitSTM async >>= send o)
        atomically seal
    fromInput i

main = do
    actions <- traverse async [threadDelay 2000000 $> "bar", threadDelay 1000000 $> "foo"]
    runEffect $ asCompleted actions >-> P.print
    -- after one second, prints "foo", then "bar" a second later
Using pipes-concurrency, we spawn' an Output-Input pair and immediately convert the Input to a Producer using fromInput. Asynchronously, we send items as they become available. When all the Asyncs have completed we seal the inbox to close down the Producer.
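For comparison, the same "hand over each result as soon as it is ready" idea can be sketched with nothing but base, using a Chan in place of the pipes-concurrency mailbox. This is only a rough sketch: forEachCompleted and onResult are names made up for illustration, it takes plain IO actions rather than Async values, and it assumes none of the actions throws.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Monad (forM_, replicateM_)

-- | Run each action on its own thread and pass every result to the
-- callback as soon as it arrives, i.e. in completion order.
-- Assumption: no action throws. A failed action would leave the loop
-- below waiting forever for a result that never comes.
forEachCompleted :: [IO a] -> (a -> IO ()) -> IO ()
forEachCompleted actions onResult = do
  results <- newChan
  -- Every worker writes its result to the shared channel when it finishes.
  forM_ actions $ \act -> forkIO (act >>= writeChan results)
  -- Read exactly one result per action, handling each one immediately.
  replicateM_ (length actions) (readChan results >>= onResult)
```

Calling forEachCompleted [threadDelay 2000000 >> pure "bar", threadDelay 1000000 >> pure "foo"] putStrLn should hand "foo" to the callback before "bar". Unlike the pipes version there is no sealing step, because the number of results to read is known up front.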

Here is an implementation via TChan. I also implemented a version that can react immediately, but it is more complex and might have problems with exceptions (if you want to receive exceptions, use SlaveThread.fork instead of forkIO), so I commented that code out in case you're not interested in it:
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.Async
import Control.Concurrent.STM
import Control.Monad

collect :: [Async a] -> IO [a]
collect = atomically . collectSTM

collectSTM :: [Async a] -> STM [a]
collectSTM as = do
    c <- newTChan
    collectSTMChan c as

collectSTMChan :: TChan a -> [Async a] -> STM [a]
collectSTMChan chan as = do
    mapM_ (waitSTM >=> writeTChan chan) as
    replicateM (length as) (readTChan chan)

main :: IO ()
main = do
    a1 <- async (threadDelay 2000000 >> putStrLn "slept 2 secs" >> return 2)
    a2 <- async (threadDelay 3000000 >> putStrLn "slept 3 secs" >> return 3)
    a3 <- async (threadDelay 1000000 >> putStrLn "slept 1 sec" >> return 1)
    res <- collect [a1,a2,a3]
    putStrLn (show res)

    -- -- reacting immediately
    -- a1 <- async (threadDelay 2000000 >> putStrLn "slept 2 secs" >> return 2)
    -- a2 <- async (threadDelay 3000000 >> putStrLn "slept 3 secs" >> return 3)
    -- a3 <- async (threadDelay 1000000 >> putStrLn "slept 1 sec" >> return 1)
    -- c <- collectChan [a1,a2,a3]
    -- replicateM_ 3 (atomically (readTChan c) >>= \v -> putStrLn ("Received: " ++ show v))

-- collectChan :: [Async a] -> IO (TChan a)
-- collectChan as = do
--     c <- newTChanIO
--     forM_ as $ \a -> forkIO ((atomically . (waitSTM >=> writeTChan c)) a)
--     return c

I'm reading your question as "is it possible to sort a list of Asyncs by their completion time?". If that's what you meant, the answer is yes.
import Control.Applicative (liftA2)
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async
import Data.Functor (($>))
import Data.List (sortBy)
import Data.Ord (comparing)
import Data.Time (getCurrentTime)

sortByCompletion :: [Async a] -> IO [a]
sortByCompletion = fmap (fmap fst . sortBy (comparing snd)) . mapConcurrently withCompletionTime
  where withCompletionTime async = liftA2 (,) (wait async) getCurrentTime

main = do
    asyncs <- traverse async [threadDelay 2000000 $> "bar", threadDelay 1000000 $> "foo"]
    sortByCompletion asyncs
    -- ["foo", "bar"], after two seconds
Using mapConcurrently we wait for each Async on a separate thread. Upon completion we get the current time - the time at which the Async completed - and use it to sort the results. This is O(n log n) complexity because we are sorting the list. (Your original algorithm was effectively a selection sort.)
Like your collect, sortByCompletion doesn't return until all the Asyncs in the list have completed. If you wanted to stream results onto the main thread as they become available, well, lists aren't a very good tool for that. I'd use a streaming abstraction like conduit or pipes, or, working at a lower level, a TQueue. See my other answer for an example.

Scotty: connection pool as monad reader

There are trillions of monad tutorials, including ones on the Reader, and it all seems clear when you read about it. But when you actually need to write the code, it becomes a different matter.
I've never used the Reader; I just never got to it in practice. So I don't know how to go about it, although I have read about it.
I need to implement a simple database connection pool in Scotty so every action can use the pool. The pool must be "global" and accessible by all action functions. I read that the way to do it is the Reader monad. If there are any other ways please let me know.
Can you please help me and show how to do this with the Reader correctly?
I'll probably learn faster if I see how it is done with my own examples.
{-# LANGUAGE OverloadedStrings #-}
module DB where
import Data.Pool
import Database.MongoDB
-- Get data from config
ip = "127.0.0.1"
db = "index"
--Create the connection pool
pool :: IO (Pool Pipe)
pool = createPool (runIOE $ connect $ host ip) close 1 300 5
-- Run a database action with connection pool
run :: Action IO a -> IO (Either Failure a)
run act = flip withResource (\x -> access x master db act) =<< pool
So the above is simple, and I want to use the 'run' function in every Scotty action to access the database connection pool. Now, the question is how to wrap it in the Reader monad to make it accessible to all functions. I understand that the 'pool' variable must be 'like global' to all the Scotty action functions.
Thank you.
UPDATE
I am updating the question with the full code snippet, where I pass the 'pool' variable down the function chain. Could someone please show how to change it to use the Reader monad?
I don't understand how to do it.
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Network.HTTP.Types
import Web.Scotty
import qualified Data.Text as T
import qualified Data.Text.Lazy as LT
import Data.Text.Lazy.Internal
import Data.Monoid (mconcat)
import Data.Aeson (object, (.=), encode)
import Network.Wai.Middleware.Static
import Data.Pool
import Database.MongoDB
import Control.Monad.Trans (liftIO,lift)

main = do
    -- Create connection pool to be accessible by all action functions
    pool <- createPool (runIOE $ connect $ host "127.0.0.1") close 1 300 5
    scotty 3000 (basal pool)

basal :: Pool Pipe -> ScottyM ()
basal pool = do
    middleware $ staticPolicy (noDots >-> addBase "static")
    get "/json" (showJson pool)

showJson :: Pool Pipe -> ActionM ()
showJson pool = do
    let run act = withResource pool (\pipe -> access pipe master "index" act)
    d <- lift $ run $ fetch (select [] "tables")
    let r = either (const []) id d
    text $ LT.pack $ show r
Thanks.
UPDATE 2
I tried to do it the way it was suggested below but it does not work.
If anyone has any ideas, please. The list of compile errors is so long that I don't even know where to begin ....
main = do
    pool <- createPool (runIOE $ connect $ host "127.0.0.1") close 1 300 5
    scotty 3000 $ runReaderT basal pool

basal :: ScottyT LT.Text (ReaderT (Pool Pipe) IO) ()
basal = do
    middleware $ staticPolicy (noDots >-> addBase "static")
    get "/json" $ showJson

showJson :: ActionT LT.Text (ReaderT (Pool Pipe) IO) ()
showJson = do
    p <- lift ask
    let rdb a = withResource p (\pipe -> access pipe master "index" a)
    j <- liftIO $ rdb $ fetch (select [] "tables")
    text $ LT.pack $ show j
UPDATE 3
Thanks to cdk for giving the idea and thanks to Ivan Meredith for giving the scottyT suggestion. This question also helped: How do I add the Reader monad to Scotty's monad
This is the version that compiles. I hope it helps someone and saves some time.
import qualified Data.Text.Lazy as T
import qualified Data.Text.Lazy.Encoding as T
import Data.Text.Lazy (Text)
import Control.Monad.Reader
import Web.Scotty.Trans
import Data.Pool
import Database.MongoDB
type ScottyD = ScottyT Text (ReaderT (Pool Pipe) IO)
type ActionD = ActionT Text (ReaderT (Pool Pipe) IO)
-- Get data from config
ip = "127.0.0.1"
db = "basal"

main = do
    pool <- createPool (runIOE $ connect $ host ip) close 1 300 5
    let read = \r -> runReaderT r pool
    scottyT 3000 read read basal

-- Application, middleware and routes
basal :: ScottyD ()
basal = do
    get "/" shoot

-- Route action handlers
shoot :: ActionD ()
shoot = do
    r <- rundb $ fetch $ select [] "computers"
    html $ T.pack $ show r

-- Database access shortcut
rundb :: Action IO a -> ActionD (Either Failure a)
rundb a = do
    pool <- lift ask
    liftIO $ withResource pool (\pipe -> access pipe master db a)

I've been trying to figure out this exact problem myself. Thanks to hints on this SO question and some other research, I've come up with the following, which works for me. The key bit you were missing was to use scottyT.
No doubt there is a prettier way to write runDB but I don't have much experience in Haskell, so please post it if you can do better.
type MCScottyM = ScottyT TL.Text (ReaderT (Pool Pipe) IO)
type MCActionM = ActionT TL.Text (ReaderT (Pool Pipe) IO)

main :: IO ()
main = do
    pool <- createPool (runIOE $ connect $ host "127.0.0.1") close 1 300 5
    scottyT 3000 (f pool) (f pool) $ app
  where
    f = \p -> \r -> runReaderT r p

app :: MCScottyM ()
app = do
    middleware $ staticPolicy (noDots >-> addBase "public")
    get "/" $ do
        p <- runDB dataSources
        html $ TL.pack $ show p

runDB :: Action IO a -> MCActionM (Either Failure a)
runDB a = (lift ask) >>= (\p -> liftIO $ withResource p (\pipe -> access pipe master "botland" a))

dataSources :: Action IO [Document]
dataSources = rest =<< find (select [] "datasources")
Update
I guess this is a bit prettier.
runDB :: Action IO a -> MCActionM (Either Failure a)
runDB a = do
    p <- lift ask
    liftIO $ withResource p db
  where
    db pipe = access pipe master "botland" a

As you've alluded to, the way to make it accessible is to wrap your computations in the Reader monad or, more likely, the ReaderT transformer. So your run function (changed slightly)
run :: Pool Pipe -> Action IO a -> IO (Either Failure a)
run pool act =
    withResource pool (\x -> access x master db act)
becomes
run :: Action IO a -> ReaderT (Pool Pipe) IO (Either Failure a)
run act = do
    pool <- ask
    liftIO $ withResource pool (\x -> access x master db act)
Computations inside a ReaderT r m a environment can access the r using ask and ReaderT seemingly conjures it out of thin air! In reality, the ReaderT monad is just plumbing the Env throughout the computation without you having to worry about it.
To run a ReaderT action, you use runReaderT :: ReaderT r m a -> r -> m a. So you call runReaderT on your top level scotty function to provide the Pool and runReaderT will unwrap the ReaderT environment and return you a value in the base monad.
For example, to evaluate your run function
-- remember: run act :: ReaderT (Pool Pipe) IO (Either Failure a)
runReaderT (run act) pool
but you wouldn't want to use runReaderT on run, as it is probably part of a larger computation that should also share the ReaderT environment. Try to avoid using runReaderT on "leaf" computations; you should generally call it as high up in the program logic as possible.
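To make the "call runReaderT as high up as possible" advice concrete, here is a small self-contained sketch. It is not Scotty or MongoDB code: FakePool, describePool, handler and runApp are all invented for illustration, with FakePool standing in for the real Pool Pipe.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Reader (ReaderT, ask, runReaderT)

-- Hypothetical stand-in for the real connection pool.
newtype FakePool = FakePool { poolName :: String }

-- A "leaf" computation: it reads the environment with ask and
-- never calls runReaderT itself.
describePool :: ReaderT FakePool IO String
describePool = do
  pool <- ask
  pure ("using pool " ++ poolName pool)

-- A larger computation that shares the same environment with its leaves.
handler :: ReaderT FakePool IO [String]
handler = do
  a <- describePool
  b <- describePool
  liftIO (putStrLn "both lookups saw the same pool")
  pure [a, b]

-- runReaderT is applied exactly once, at the top, where the pool is created.
runApp :: IO [String]
runApp = runReaderT handler (FakePool "mongo")
```

Nothing below runApp ever mentions the pool as a function argument, which is the plumbing ReaderT does for you.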
EDIT: The difference between Reader and ReaderT is that Reader is a monad while ReaderT is a monad transformer. That is, ReaderT adds the Reader behaviour to another monad (or monad transformer stack). If you're not familiar with monad transformers, I'd recommend the monad transformers chapter of Real World Haskell.
You have showJson pool ~ ActionM () and you want to add a Reader environment with access to a Pool Pipe. In this case, you actually need ActionT and ScottyT transformers rather than ReaderT in order to work with functions from the scotty package.
Note that ActionM is defined as type ActionM = ActionT Text IO, and similarly for ScottyM.
I don't have all the necessary libraries installed, so this might not typecheck, but it should give you the right idea.
basal :: ScottyT Text (ReaderT (Pool Pipe) IO) ()
basal = do
    middleware $ staticPolicy (...)
    get "/json" showJson

showJson :: ActionT Text (ReaderT (Pool Pipe) IO) ()
showJson = do
    pool <- lift ask
    let run act = withResource pool (\p -> access p master "index" act)
    d <- liftIO $ run $ fetch $ select [] "tables"
    text . TL.pack $ either (const "") show d

An infinite reading from file

I'm trying to read some irregular input (for example, commands that can appear from time to time) from a file. E.g. the source file is initially empty when my program starts. Then some string is appended to the file, and my program must read this string.
A first naive implementation:
import System.IO
import Control.Monad

listen :: Handle -> IO ()
listen file = forever $ do
    ineof <- hIsEOF file
    if ineof
        then return ()    -- nothing new yet; the loop polls again immediately
        else do
            s <- hGetLine file
            putStrLn s
But it's not working properly, of course (because of performance issues, first of all). How can I implement this correctly (maybe using conduits)?

I've put together an example of implementing this below. The basic idea is:
Monitor for file changes using the fsnotify package.
Use sourceFileRange to stream the previously unconsumed portions of the file.
Use an MVar to let the fsnotify callback signal the Source to continue reading.
This assumes that the source file is only ever added to, never deleted or shortened.
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar,
                                takeMVar)
import Control.Exception (IOException, try)
import Control.Monad (forever, void, when)
import Control.Monad.IO.Class (liftIO)
import Data.ByteString (ByteString)
import qualified Data.ByteString as S
import Data.Conduit (MonadResource, Source, bracketP,
                     runResourceT, ($$), ($=))
import Data.Conduit.Binary (sourceFileRange)
import qualified Data.Conduit.List as CL
import Data.IORef (IORef, modifyIORef, newIORef,
                   readIORef)
import Data.Time (getCurrentTime)
import Filesystem (canonicalizePath)
import Filesystem.Path.CurrentOS (decodeString, directory)
import System.FSNotify (Event (..), startManager,
                        stopManager, watchDir)

tryIO :: IO a -> IO (Either IOException a)
tryIO = try

sourceFileForever :: MonadResource m => FilePath -> Source m ByteString
sourceFileForever fp' = bracketP startManager stopManager $ \manager -> do
    fp <- liftIO $ canonicalizePath $ decodeString fp'
    baton <- liftIO newEmptyMVar
    liftIO $ watchDir manager (directory fp) (const True) $ \event -> void $ tryIO $ do
        fpE <- canonicalizePath $
            case event of
                Added x _ -> x
                Modified x _ -> x
                Removed x _ -> x
        when (fpE == fp) $ putMVar baton ()
    consumedRef <- liftIO $ newIORef 0
    loop baton consumedRef
  where
    loop :: MonadResource m => MVar () -> IORef Integer -> Source m ByteString
    loop baton consumedRef = forever $ do
        consumed <- liftIO $ readIORef consumedRef
        sourceFileRange fp' (Just consumed) Nothing $= CL.iterM counter
        liftIO $ takeMVar baton
      where
        counter bs = liftIO $ modifyIORef consumedRef (+ fromIntegral (S.length bs))

main :: IO ()
main = do
    let fp = "foo.txt"
    writeFile fp "Hello World!"
    _ <- forkIO $ runResourceT $ sourceFileForever fp $$ CL.mapM_ (liftIO . print)
    forever $ do
        now <- getCurrentTime
        appendFile fp $ show now ++ "\n"
        threadDelay 1000000
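
The offset bookkeeping that sourceFileRange and consumedRef do above can also be looked at in isolation. Below is a minimal sketch of just that step, without conduit or fsnotify: readNewBytes is a made-up helper name, and like the answer it assumes the file only ever grows.

```haskell
import qualified Data.ByteString.Char8 as B
import System.IO
  ( IOMode (ReadMode), SeekMode (AbsoluteSeek)
  , hFileSize, hSeek, withBinaryFile )

-- | Read everything past the given byte offset.
-- Returns the newly read bytes together with the updated offset.
-- Assumption: the file is append-only, so the offset stays valid.
readNewBytes :: FilePath -> Integer -> IO (B.ByteString, Integer)
readNewBytes path consumed =
  withBinaryFile path ReadMode $ \h -> do
    size <- hFileSize h
    if size <= consumed
      then pure (B.empty, consumed)          -- nothing new since the last read
      else do
        hSeek h AbsoluteSeek consumed        -- skip what was already consumed
        bytes <- B.hGet h (fromIntegral (size - consumed))
        pure (bytes, consumed + fromIntegral (B.length bytes))
```

You would call this each time a change notification arrives and feed the returned offset into the next call, which is roughly what the IORef in the conduit version is for.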

How to perform database queries in GHCi in Yesod Application

How do I, for example, insert a new User into a database using a Yesod application's models? Or is there a better way?
I am dealing with a scaffolded application. I have created an App instance and don't know how to perform requests using it.
:i Extra
data Extra
  = Extra {extraCopyright :: Data.Text.Internal.Text,
           extraAnalytics :: Maybe Data.Text.Internal.Text}
        -- Defined in `Settings'
let e = Extra "asdf" Nothing
let c = AppConfig {appEnv = Development, appPort = 3000, appRoot = "/", appHost = "localhost", appExtra = e}
f <- makeFoundation c
:t f
f :: App
:i App
data App
  = App {settings :: AppConfig DefaultEnv Extra,
         getStatic :: Yesod.Static.Static,
         connPool :: persistent-1.2.3.0:Database.Persist.Class.PersistConfig.PersistConfigPool
                       PersistConf,
         httpManager :: http-client-0.2.0.1:Network.HTTP.Client.Types.Manager,
         persistConfig :: PersistConf,
         appLogger :: Yesod.Core.Types.Logger}
        -- Defined in `Foundation'
What next?

If you just want to do Persistent queries in ghci, you can do this without creating a Yesod application. Unfortunately, how you do this differs quite a bit depending on the specific back end you want to use.
For SQLite:
> import Database.Persist.Sqlite
> import Model
> pool <- createSqlitePool "yesod-test.sqlite3" 2
> runSqlite "yesod-test.sqlite3" (runMigration migrateAll)
> userId <- runSqlite "yesod-test.sqlite3" (insert (User "foo@bar.com" Nothing))
For Postgresql:
-- In Shell: $ createdb yesod-test
> import Database.PostgreSQL.Simple
> con <- connectPostgreSQL "dbname=yesod-test"
> import Database.Persist.Postgresql
> pcon <- openSimpleConn con
> import Model
> runSqlPersistM (runMigration migrateAll) pcon
> userId <- runSqlPersistM (insert (User "foo@bar.com" Nothing)) pcon
> Just user <- runSqlPersistM (get userId) pcon
> userIdent user

The scaffolding provides (at least with yesod-bin 1.4.5) a function db in Application.hs which you can use:
$ cabal repl
...
*Application> db $ insert $ User "foo@bar.com" Nothing

It looks like the accepted answer is pretty old. Here is an update for running persistent queries in IO with the postgresql backend.
import Control.Monad.Reader (ReaderT)
import Control.Monad.Logger (LoggingT, runStdoutLoggingT)
import Database.Persist.Sql (SqlBackend, runSqlConn)
import Database.Persist.Postgresql (withPostgresqlConn)
runDBIO :: ReaderT SqlBackend (LoggingT IO) a -> IO a
runDBIO = runStdoutLoggingT . withPostgresqlConn "dbname=test-db" . runSqlConn

How to create a Database Monad Stack in Happstack?

I want to create a Happstack application with lots of access to a database. I think that a monad stack with IO at the bottom and a database Writer-like monad on top (with a log writer in the middle) would give each access a clear function, for example:
itemsRequest :: ServerConfig -> ServerPart Response
itemsRequest cf = dir "items" $ do
    methodM [GET,HEAD]
    liftIO $ noticeM (scLogger cf) "sended job list"
    items <- runDBMonad (scDBConnString cf) $ getItemList
    case items of
        (Right xs) -> ok $ toResponse $ show xs
        (Left err) -> internalServerError $ toResponse $ show err
With:
getItemList :: MyDBMonad (Error [Item])
getItemList = do
    -- etc...
But I have little knowledge of monads and monad transformers (I see this question as an exercise to learn about them), and I have no idea how to begin creating the database monad, how to lift the IO from Happstack into the database stack, and so on.

Here is some minimal working code compiled from snippets above for confused newbies like me.
You put stuff into AppConfig type and grab it with ask inside your response makers.
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Happstack.Server
import Control.Monad.Reader
import qualified Data.ByteString.Char8 as C

myApp :: AppMonad Response
myApp = do
    -- access app config. look mom, no lift!
    test <- ask
    -- try some happstack funs. no lift either.
    rq <- askRq
    bs <- lookBS "lol"
    -- test IO please ignore
    liftIO . print $ test
    liftIO . print $ rq
    liftIO . print $ bs
    -- bye
    ok $ toResponse ("Oh, hi!" :: C.ByteString)

-- Put your stuff here.
data AppConfig = AppConfig { appSpam :: C.ByteString
                           , appEggs :: [C.ByteString] } deriving (Eq, Show)
config = AppConfig "THIS. IS. SPAAAAAM!!1" []

type AppMonad = ReaderT AppConfig (ServerPartT IO)

main = simpleHTTP (nullConf {port=8001}) $ runReaderT myApp config {appEggs=["red", "gold", "green"]}

You likely want to use 'ReaderT':
type MyMonad a = ReaderT DbHandle ServerPart a
The Reader monad transformer makes a single value accessible using the ask function - in this case, the value we want everyone to get at is the database connection.
Here, DbHandle is some connection to your database.
Because 'ReaderT' is already an instance of all of the happstack-server type-classes all normal happstack-server functions will work in this monad.
You probably also want some sort of helper to open and close the database connection:
runMyMonad :: String -> MyMonad a -> ServerPart a
runMyMonad connectionString m = do
    db <- liftIO $ connect_to_your_db connectionString
    result <- runReaderT m db
    liftIO $ close_your_db_connection db
    return result
(It might be better to use a function like 'bracket' here, but I don't know that there is such an operation for the ServerPart monad)
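For reference, this is roughly what the bracket pattern looks like in plain IO. It is only a sketch: it does not answer whether an equivalent exists for ServerPart, and DbHandle, openDb, closeDb, isOpen and withDb are all hypothetical stand-ins for whatever your database library provides.

```haskell
import Control.Exception (bracket)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical stand-in for a real connection; the IORef only records
-- whether the "connection" is currently open.
newtype DbHandle = DbHandle (IORef Bool)

openDb :: String -> IO DbHandle
openDb _connectionString = DbHandle <$> newIORef True

closeDb :: DbHandle -> IO ()
closeDb (DbHandle open) = writeIORef open False

isOpen :: DbHandle -> IO Bool
isOpen (DbHandle open) = readIORef open

-- | Open a connection, run the action with it, and close the connection
-- afterwards, even if the action throws an exception.
withDb :: String -> (DbHandle -> IO a) -> IO a
withDb connectionString = bracket (openDb connectionString) closeDb
```

The advantage over the open/run/close sequence in runMyMonad is that the close step still runs when the action in the middle fails.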
I don't know how you want to do logging - how do you plan to interact with your log-file? Something like:
type MyMonad a = ReaderT (DbHandle, LogHandle) ServerPart a
and then:
askDb :: MyMonad DbHandle
askDb = fst <$> ask
askLogger :: MyMonad LogHandle
askLogger = snd <$> ask
might be enough. You could then build on those primitives to make higher-level functions. You would also need to change runMyMonad to be passed in a LogHandle, whatever that is.
Once you get more than two things you want access to it pays to have a proper record type instead of a tuple.
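A sketch of what that record-based version might look like, using asks to pick out one field at a time. To keep it self-contained the base monad is IO rather than ServerPart, and Env, its field names and the two handle types are invented for the example.

```haskell
import Control.Monad.Reader (ReaderT, asks, runReaderT)

-- Hypothetical stand-ins for the real handle types.
newtype DbHandle = DbHandle String deriving (Eq, Show)
newtype LogHandle = LogHandle String deriving (Eq, Show)

-- One record instead of a growing tuple; new fields can be added
-- without touching the existing accessors.
data Env = Env
  { envDb      :: DbHandle
  , envLogger  :: LogHandle
  , envAppName :: String
  }

-- Assumption: IO as the base monad here, where the answer uses ServerPart.
type MyMonad = ReaderT Env IO

askDb :: MyMonad DbHandle
askDb = asks envDb

askLogger :: MyMonad LogHandle
askLogger = asks envLogger
```

Compared with fst <$> ask and snd <$> ask, the accessors no longer depend on the position of each component, so adding a third or fourth thing to the environment does not break them.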
