I'm trying to read irregular input (for example, commands that appear from time to time) from a file. E.g. the source file is initially empty when my program starts. Later a string is appended to the file, and my program must read that string.
A first naive implementation:
import System.IO
import Control.Monad

listen :: Handle -> IO ()
listen file = forever $ do
  ineof <- hIsEOF file
  if ineof
    then return ()  -- nothing new yet; the loop spins again immediately
    else do
      s <- hGetLine file
      putStrLn s
But of course it doesn't work properly (first of all because of performance issues). How can I implement this correctly (maybe using conduits)?
I've put together an example of implementing this below. The basic idea is:
Monitor for file changes using the fsnotify package.
Use sourceFileRange to stream the previously unconsumed portions of the file.
Use an MVar to let the fsnotify callback signal the Source to continue reading.
This assumes that the source file is only ever appended to, never deleted or truncated.
import           Control.Concurrent        (forkIO, threadDelay)
import           Control.Concurrent.MVar   (MVar, newEmptyMVar, putMVar,
                                            takeMVar)
import           Control.Exception         (IOException, try)
import           Control.Monad             (forever, void, when)
import           Control.Monad.IO.Class    (liftIO)
import           Data.ByteString           (ByteString)
import qualified Data.ByteString           as S
import           Data.Conduit              (MonadResource, Source, bracketP,
                                            runResourceT, ($$), ($=))
import           Data.Conduit.Binary       (sourceFileRange)
import qualified Data.Conduit.List         as CL
import           Data.IORef                (IORef, modifyIORef, newIORef,
                                            readIORef)
import           Data.Time                 (getCurrentTime)
import           Filesystem                (canonicalizePath)
import           Filesystem.Path.CurrentOS (decodeString, directory)
import           System.FSNotify           (Event (..), startManager,
                                            stopManager, watchDir)

tryIO :: IO a -> IO (Either IOException a)
tryIO = try

sourceFileForever :: MonadResource m => FilePath -> Source m ByteString
sourceFileForever fp' = bracketP startManager stopManager $ \manager -> do
    fp <- liftIO $ canonicalizePath $ decodeString fp'
    baton <- liftIO newEmptyMVar
    liftIO $ watchDir manager (directory fp) (const True) $ \event -> void $ tryIO $ do
        fpE <- canonicalizePath $
            case event of
                Added x _ -> x
                Modified x _ -> x
                Removed x _ -> x
        when (fpE == fp) $ putMVar baton ()
    consumedRef <- liftIO $ newIORef 0
    loop baton consumedRef
  where
    loop :: MonadResource m => MVar () -> IORef Integer -> Source m ByteString
    loop baton consumedRef = forever $ do
        consumed <- liftIO $ readIORef consumedRef
        sourceFileRange fp' (Just consumed) Nothing $= CL.iterM counter
        liftIO $ takeMVar baton
      where
        counter bs = liftIO $ modifyIORef consumedRef (+ fromIntegral (S.length bs))

main :: IO ()
main = do
    let fp = "foo.txt"
    writeFile fp "Hello World!"
    _ <- forkIO $ runResourceT $ sourceFileForever fp $$ CL.mapM_ (liftIO . print)
    forever $ do
        now <- getCurrentTime
        appendFile fp $ show now ++ "\n"
        threadDelay 1000000
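The offset-tracking part of this approach can be tried in isolation, without conduit or fsnotify. The following is only a minimal sketch using base and bytestring: `readNewBytes` and the scratch file name `offset-demo.txt` are made up for illustration, and it simply re-reads from the last consumed byte offset each time it is called (the change notification is left out).

```haskell
import qualified Data.ByteString as S
import System.IO

-- | Read everything from byte offset `consumed` to the current end of the
-- file. Returns the new bytes together with the updated offset.
-- (Hypothetical helper, not part of any library.)
readNewBytes :: FilePath -> Integer -> IO (S.ByteString, Integer)
readNewBytes path consumed =
  withBinaryFile path ReadMode $ \h -> do
    hSeek h AbsoluteSeek consumed
    chunk <- S.hGetContents h
    pure (chunk, consumed + fromIntegral (S.length chunk))

main :: IO ()
main = do
  let path = "offset-demo.txt"  -- scratch file, name chosen for this demo
  writeFile path "first\n"
  (a, off1) <- readNewBytes path 0
  appendFile path "second\n"
  -- Only the bytes appended since the previous call are returned here.
  (b, off2) <- readNewBytes path off1
  print (a, off1)
  print (b, off2)
```

Like the conduit version, this assumes the file only ever grows; if it is truncated, the stored offset points past the end and nothing is read.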
I have a long running process which I need to start.
It takes a few seconds to start, and outputs logs to stdout, with one that indicates it is ready.
I would like to:
start the process silently, so that the stdout from the process is not displayed in my session.
capture the output as it streams so that I can determine that it is ready.
have some handle on the process so that I can stop the process at a later point.
I have come close using Shelly, Turtle and System.Process, but fail to capture the stdout.
Using System.Process I had:
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (race)
import Control.Monad (unless)
import System.IO
import System.Process

startService :: IO ProcessHandle
startService = do
  let cmd = "./my-service"
      args = [ "-p 1234" ]
  (_, Just hout, _, p) <- createProcess $ (proc cmd args) { std_out = CreatePipe }
  started <- either id id <$> race (checkStarted hout) timeOut
  unless started $ fail "Service not started"
  pure p
  where
    checkStarted :: Handle -> IO Bool
    checkStarted h = do
      str <- hGetLine h
      -- check str for started log, else loop
      -- (isStartedLog is the log-matching predicate; its definition is not shown)
      if isStartedLog str then pure True else checkStarted h
    timeOut :: IO Bool
    timeOut = do
      threadDelay 10000000
      pure False
But the handle hout was never in a ready state.
Using Shelly I had:
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (Async, race)
import Control.Concurrent.MVar
import Control.Monad (unless, when)
import Data.Text (Text)
import Shelly
import System.IO

startService :: IO (Async ())
startService = do
  let cmd = "./my-service"
      args = [ "-p 1234" ]
  startedMVar <- newEmptyMVar
  async <- shelly $ asyncSh $ runHandle cmd args $ recordWhenStarted startedMVar
  started <- either id id <$> race (readMVar startedMVar) timeOut
  unless started $ fail "Service not started"
  pure async
  where
    recordWhenStarted :: MVar Bool -> Text -> IO ()
    recordWhenStarted mvar txt =
      when (isStartedLog txt) $
        modifyMVar_ mvar (const $ pure True)
    timeOut :: IO Bool
    timeOut = do
      threadDelay 10000000
      pure False
But the recordWhenStarted is never called.
The following is an example of starting a process and reading its stdout, taken from a program of mine:
runMystem :: [T.Text] -> IO T.Text
runMystem stemWords = do
  (i, o, _, ph) <- createProcess (proc mystemExecutabe mystemParams) { std_in = CreatePipe, std_out = CreatePipe }
  res <- flip (maybe (return T.empty)) i $ \hIn ->
    flip (maybe (return T.empty)) o $ \hOut -> do
      hSetEncoding hIn utf8
      hSetEncoding hOut utf8
      forM_ stemWords $ TIO.hPutStrLn hIn
      TIO.hGetContents hOut
  void $ waitForProcess ph
  return res
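For the "wait until a ready line appears" part of the question, the same createProcess/CreatePipe setup can be combined with `timeout` from base. This is only a sketch under assumptions: `waitForLine` and `startAndWait` are names invented here, the shell command in `main` stands in for the real service, and the readiness test is a plain substring match.

```haskell
import Data.List (isInfixOf)
import System.IO (Handle, hGetLine, hIsEOF)
import System.Process
import System.Timeout (timeout)

-- | Read lines from the handle until one satisfies the predicate.
-- Returns False if the stream ends before such a line is seen.
waitForLine :: (String -> Bool) -> Handle -> IO Bool
waitForLine isReady h = do
  eof <- hIsEOF h
  if eof
    then pure False
    else do
      line <- hGetLine h
      if isReady line then pure True else waitForLine isReady h

-- | Start a shell command with its stdout piped (so nothing is echoed to
-- the terminal) and wait up to `micros` microseconds for a ready line.
-- On failure the process is terminated and Nothing is returned.
startAndWait :: String -> (String -> Bool) -> Int -> IO (Maybe ProcessHandle)
startAndWait cmd isReady micros = do
  (_, Just hout, _, ph) <- createProcess (shell cmd) { std_out = CreatePipe }
  ready <- timeout micros (waitForLine isReady hout)
  case ready of
    Just True -> pure (Just ph)
    _         -> terminateProcess ph >> waitForProcess ph >> pure Nothing

main :: IO ()
main = do
  -- Stand-in for a real service: prints two lines, then keeps running.
  r <- startAndWait "echo starting; echo ready; sleep 1" ("ready" `isInfixOf`) 5000000
  case r of
    Nothing -> putStrLn "service did not start"
    Just ph -> do
      putStrLn "service is ready"
      terminateProcess ph
      _ <- waitForProcess ph
      pure ()
```

One caveat with this sketch: after the ready line is seen, nothing keeps reading the pipe, so a service that logs heavily could eventually block on a full pipe unless another thread drains it.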
This answer uses the process-streaming library (written by the author of this answer) which is a set of helpers over process.
{-# language OverloadedStrings #-}
{-# language NumDecimals #-}
import System.Process.Streaming (execute,piped,shell,foldOut,transduce1)
import qualified System.Process.Streaming.Text as PT
import Data.Text.Lazy (isInfixOf)
import Control.Applicative
import Control.Monad
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async
import Control.Concurrent.MVar

main :: IO ()
main = do
    started <- newEmptyMVar
    let execution =
            execute (piped (shell "{ sleep 3 ; echo fooo ; sleep 3 ; }")) $
                foldOut . transduce1 PT.utf8x . PT.eachLine $ lookline
        lookline line = do
            when (isInfixOf "foo" line) (putMVar started ())
            return (Right ())
        stopOrNot =
            do abort <- race (threadDelay 4e6) (readMVar started)
               case abort of
                   Left () -> return () -- stop immediately
                   Right () -> runConcurrently empty -- sleep forever
    result <- race stopOrNot execution
    print result
execute installs exception handlers that terminate the external process when an asynchronous exception arrives, so it is safe to use race with it.
execute also takes care to drain any standard stream that is not read explicitly (like stderr in this case) to avoid a common source of deadlocks.
I have two threads: producer and consumer. Producer produces some (key,value) pairs and consumer inserts them into Map, wrapped into Data.IORef. I tried to use Control.Concurrent.BoundedChan for communication between producer and consumer, and it works fine (memory usage is constant), provided that I use BangPatterns where needed. The code is as follows:
{-# LANGUAGE BangPatterns #-}
module Main where

import qualified Data.Map.Strict as M
import Control.Concurrent
import Control.Concurrent.STM
import Control.Concurrent.STM.TBQueue
import qualified Control.Concurrent.BoundedChan as BC
import qualified Control.Concurrent.Chan.Unagi.Bounded as UBC
import qualified Data.Text as T
import System.Random
import Data.IORef
import Control.Monad

data Item = Item !Int !Int

data SinkState = SinkState {
    myMap :: !(M.Map Int Int)
  }

testBCs = do
  chan <- BC.newBoundedChan 1000
  forkIO $ source chan
  sink chan
  where
    source chan = forever $ do
      threadDelay 500
      key <- getStdRandom (randomR (1,5000))
      value <- getStdRandom (randomR (1,1000000))
      BC.writeChan chan $ Item key value
    sink chan = do
      state <- newIORef SinkState {
          myMap = M.empty
        }
      forever $ do
        (Item key value) <- BC.readChan chan
        atomicModifyIORef' state (\s -> (s { myMap = myMap s `seq` M.insert key value (myMap s) }, ()))
Now, when I switch from BoundedChan to Control.Concurrent.STM.TBQueue, memory starts to leak:
testTBs = do
  chan <- atomically $ newTBQueue 1000
  forkIO $ source chan
  sink chan
  where
    source chan = forever $ do
      threadDelay 500
      key <- getStdRandom (randomR (1,5000))
      value <- getStdRandom (randomR (1,1000000))
      atomically $ writeTBQueue chan $ Item key value
    sink chan = do
      state <- newIORef SinkState {
          myMap = M.empty
        }
      forever $ do
        (Item key value) <- chan `seq` atomically $ readTBQueue chan
        atomicModifyIORef' state (\s -> (s { myMap = myMap s `seq` M.insert key value (myMap s) }, ()))
And the profiling results look like this (run with +RTS -hd):
So my questions are:
What's going on in the second case?
What are those stg_ap_2_upd_info and other symbols?
How do I fix the leak?
Without digging in to the space leak itself, one solution is simply moving the consumer logic entirely into the STM. This is as simple as replacing the IORef with a TVar. But to fully take advantage of the STM the queue read and state updates should be placed into a single atomically block. Both operations will be executed in a transaction. A helpful side effect is that we've also gained exception safety.
import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad
import System.Random
import qualified Data.Map.Strict as M

data Item = Item {-# UNPACK #-} !Int {-# UNPACK #-} !Int

data SinkState = SinkState {
    myMap :: !(M.Map Int Int)
  }

main :: IO ()
main = do
  chan <- newTBQueueIO 1000

  forkIO . forever $ do
    threadDelay 500
    key <- getStdRandom $ randomR (1,5000)
    value <- getStdRandom $ randomR (1,1000000)
    atomically . writeTBQueue chan $ Item key value

  state <- newTVarIO SinkState {
      myMap = M.empty
    }

  forever . atomically $ do
    Item key value <- readTBQueue chan
    modifyTVar' state $ \s -> s { myMap = M.insert key value (myMap s) }
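To check the single-transaction pattern on a run that terminates, here is a small sketch using only stm and containers. The helper `consumeOne`, the queue size, and the sample data are made up for this demo; the map is kept directly in the TVar rather than in a SinkState record.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (forM_, replicateM_)
import qualified Data.Map.Strict as M

-- | Take one (key, value) pair off the queue and insert it into the map.
-- Both steps run inside the same STM transaction.
consumeOne :: TBQueue (Int, Int) -> TVar (M.Map Int Int) -> STM ()
consumeOne queue state = do
  (key, value) <- readTBQueue queue
  modifyTVar' state (M.insert key value)

main :: IO ()
main = do
  queue <- newTBQueueIO 16
  state <- newTVarIO M.empty
  let items = [(k, k * k) | k <- [1 .. 100]]
  -- Producer thread: blocks whenever the bounded queue is full.
  _ <- forkIO $ forM_ items (atomically . writeTBQueue queue)
  -- Consumer: one transaction per item.
  replicateM_ (length items) (atomically (consumeOne queue state))
  final <- readTVarIO state
  print (M.size final, M.lookup 10 final)
```

Because the read and the insert commit together, an exception thrown at the consumer mid-transaction cannot leave an item removed from the queue but missing from the map.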
I want to create a process and write some text from my haskell program into the process's stdin periodically (from an IO action).
The following works correctly in GHCi but doesn't work correctly when built and run. In GHCi everything works perfectly and the value from the IO action is fed in periodically. When built and run, however, it seems to pause for arbitrarily long periods of time when writing to the stdin of the process.
I've used createProcess (from System.Process) to create the handle and tried hPutStrLn (with buffering set to NoBuffering; LineBuffering didn't work either).
So I'm trying the process-streaming package and pipes but can't seem to get anything to work at all.
The real question is this: How do I create a process from Haskell and write to it periodically?
Minimal example that exhibits this behavior:
import System.Process
import System.IO (BufferMode (..), hSetBuffering)
import Data.IORef
import qualified Data.Text as T -- from the text package
import qualified Data.Text.IO as TIO
import Control.Concurrent.Timer -- from the timers package
import Control.Concurrent.Suspend -- from the suspend package

main = do
  (Just hin, _,_,_) <- createProcess_ "bgProcess" $
    (System.Process.proc "grep" ["10"]) { std_in = CreatePipe }
  ref <- newIORef 0 :: IO (IORef Int)
  flip repeatedTimer (msDelay 1000) $ do
    x <- atomicModifyIORef' ref $ \x -> (x + 1, x)
    hSetBuffering hin NoBuffering
    TIO.hPutStrLn hin $ T.pack $ show x
Any help will be greatly appreciated.
This is a pipes Producer that emits a sequence of numbers with a one-second delay between them:
{-# language NumDecimals #-}
import Control.Concurrent
import Pipes
import qualified Data.ByteString.Char8 as Bytes

periodic :: Producer Bytes.ByteString IO ()
periodic = go 0
  where
    go n = do
      d <- liftIO (pure (Bytes.pack (show n ++ "\n"))) -- put your IO action here
      Pipes.yield d
      liftIO (threadDelay 1e6)
      go (succ n)
And, using process-streaming, we can feed the producer to an external process like this:
import System.Process.Streaming

main :: IO ()
main = do
  executeInteractive (shell "grep 10"){ std_in = CreatePipe } (feedProducer periodic)
I used executeInteractive, which sets std_in automatically to NoBuffering.
Also, if you pipe std_out and want to process each match immediately, be sure to pass the --line-buffered option to grep (or use the stdbuf command) to ensure that matches are immediately available at the output.
What about using threadDelay, e.g.:
import Control.Monad (forever)
import Control.Concurrent (threadDelay)
...
forever $ do
  x <- atomicModifyIORef' ref $ \x -> (x + 1, x)
  hSetBuffering hin NoBuffering
  TIO.hPutStrLn hin $ T.pack $ show x
  threadDelay 1000000 -- 1 sec
Spawn this off in another thread if you need to do other work at the same time.
You can remove the need for the IORef with:
loop h x = do
  hSetBuffering h NoBuffering
  TIO.hPutStrLn h $ T.pack $ show x
  threadDelay 1000000
  loop h (x+1)
And, of course, you only need to do the hSetBuffering once - e.g. do it just before you enter the loop.
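Putting those points together, a self-contained sketch with only base and process might look like the following. The function `feedNumbers` and its parameters are invented for this example; it writes a fixed number of lines (so that it terminates), sets the buffering mode once before the loop, and reads the child's output back to show that the writes arrived.

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (forM_)
import System.IO
import System.Process

-- | Feed the numbers 0..n-1 to the stdin of `cmd`, one per line, pausing
-- `micros` microseconds between writes. Returns what the process printed.
-- (Hypothetical helper for illustration.)
feedNumbers :: String -> Int -> Int -> IO String
feedNumbers cmd n micros = do
  (Just hin, Just hout, _, ph) <-
    createProcess (shell cmd) { std_in = CreatePipe, std_out = CreatePipe }
  hSetBuffering hin LineBuffering  -- set once, before entering the loop
  forM_ [0 .. n - 1] $ \x -> do
    hPutStrLn hin (show x)
    threadDelay micros
  hClose hin                       -- signal end of input to the child
  out <- hGetContents hout
  -- Force the lazy output before waiting for the process to exit.
  _ <- length out `seq` waitForProcess ph
  pure out

main :: IO ()
main = feedNumbers "grep 10" 11 100000 >>= putStr
```

This sketch only reads the output after all input has been written, which is fine for a handful of lines; a child that produces a lot of output would need its stdout drained concurrently to avoid blocking on a full pipe.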
I am trying to write a simple program in Haskell that listens on a ZeroMQ socket and publishes what it receives to a WebSocket connection. Below is my code:
{-# LANGUAGE OverloadedStrings #-}
import Data.Char (isPunctuation, isSpace)
import Data.Monoid (mappend)
import Data.Text (Text)
import Control.Exception (fromException)
import Control.Monad (forM_, forever)
import Control.Concurrent (MVar, newMVar, modifyMVar_, readMVar)
import Control.Monad.IO.Class (liftIO)
import Control.Monad
import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Network.WebSockets as WS
import System.ZMQ3.Monadic
import Data.ByteString.Char8 (pack, unpack)
import Control.Concurrent (threadDelay)
import Data.Text.Encoding
import Data.ByteString.Internal

main :: IO ()
main = do
  liftIO $ putStrLn "starting main..."
  WS.runServer "0.0.0.0" 9160 $ application

application :: WS.Request -> WS.WebSockets WS.Hybi00 ()
application rq = do
  liftIO $ putStrLn "starting..."
  WS.acceptRequest rq
  sink <- WS.getSink
  WS.getVersion >>= liftIO . putStrLn . ("Client version: " ++)
  msg <- WS.receiveData
  liftIO $ putStrLn $ show $ (msg:: Text)
  WS.sendTextData (msg :: Text)
  runZMQ $ do
    repSocket<- socket Rep
    s<-return $bind repSocket "tcp://*:6555"
    msg2 <- receive repSocket
    let quote = msg2
    --msg2 <- WS.receiveData
    --liftIO $ putStrLn $ quote
    WS.sendTextData $ ("test"::Text)
But the compiler fails at the statement WS.sendTextData $ ("test"::Text) with the error below:
websocket_server.hs:42:17:
    Couldn't match expected type `ZMQ z a0'
                with actual type `WS.WebSockets p0 ()'
    In a stmt of a 'do' block: WS.sendTextData $ ("test" :: Text)
    In the second argument of `($)', namely
      `do { repSocket <- socket Rep;
            s <- return $ bind repSocket "tcp://*:6555";
            msg2 <- receive repSocket;
            let quote = msg2;
            .... }'
    In a stmt of a 'do' block:
      runZMQ
      $ do { repSocket <- socket Rep;
             s <- return $ bind repSocket "tcp://*:6555";
             msg2 <- receive repSocket;
             let quote = msg2;
             .... }
I am not sure how to deal with this issue. How can I make the statements of a do block return the same type when the values cannot be converted into each other?
A simple liftIO should be enough for that call, but I haven't tried.
The trouble is that both the ZMQ and Websockets libraries define a "top level monad" that is not a monad transformer. So there is no provided way to layer the monads. This is poor design on their parts.
My suggestion would be to A) write your own ZMQ transformer or B) use the nonmonadic interface provided by ZMQ at the top level.
I'm processing some audio using portaudio. The haskell FFI bindings call a user defined callback whenever there's audio data to be processed. This callback should be handled very quickly and ideally with no I/O. I wanted to save the audio input and return quickly since my application doesn't need to react to the audio in realtime (right now I'm just saving the audio data to a file; later I'll construct a simple speech recognition system).
I like the idea of pipes and thought I could use that library. The problem is that I don't know how to create a Producer that returns data that came in through a callback.
How do I handle my use case?
Here's what I'm working with right now, in case that helps (the datum mvar isn't working right now but I don't like storing all the data in a seq... I'd rather process it as it came instead of just at the end):
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
module Main where

import Codec.Wav
import Sound.PortAudio
import Sound.PortAudio.Base
import Sound.PortAudio.Buffer
import Foreign.Ptr
import Foreign.ForeignPtr
import Foreign.C.Types
import Foreign.Storable
import qualified Data.StorableVector as SV
import qualified Data.StorableVector.Base as SVB
import Control.Exception.Base (evaluate)
import Data.Int
import Data.Sequence as Seq
import Control.Concurrent

instance Buffer SV.Vector a where
  fromForeignPtr fp = return . SVB.fromForeignPtr fp
  toForeignPtr = return . (\(a, b, c) -> (a, c)) . SVB.toForeignPtr

-- | Wrap a buffer callback into the generic stream callback type.
buffCBtoRawCB' :: (StreamFormat input, StreamFormat output, Buffer a input, Buffer b output) =>
                  BuffStreamCallback input output a b -> StreamCallback input output
buffCBtoRawCB' func = \a b c d e -> do
  fpA <- newForeignPtr_ d -- We will not free, as callback system will do that for us
  fpB <- newForeignPtr_ e -- We will not free, as callback system will do that for us
  storeInp <- fromForeignPtr fpA (fromIntegral $ 1 * c)
  storeOut <- fromForeignPtr fpB (fromIntegral $ 0 * c)
  func a b c storeInp storeOut

callback :: MVar (Seq.Seq [Int32]) -> PaStreamCallbackTimeInfo -> [StreamCallbackFlag] -> CULong
            -> SV.Vector Int32 -> SV.Vector Int32 -> IO StreamResult
callback seqmvar = \timeinfo flags numsamples input output -> do
  putStrLn $ "timeinfo: " ++ show timeinfo ++ "; flags are " ++ show flags ++ " in callback with " ++ show numsamples ++ " samples."
  print input
  -- write data to output
  --mapM_ (uncurry $ pokeElemOff output) $ zip (map fromIntegral [0..(numsamples-1)]) datum
  --print "wrote data"
  input' <- evaluate $ SV.unpack input
  modifyMVar_ seqmvar (\s -> return $ s Seq.|> input')
  case flags of
    [] -> return $ if unPaTime (outputBufferDacTime timeinfo) > 0.2 then Complete else Continue
    _ -> return Complete

done doneMVar = do
  putStrLn "total done dood!"
  putMVar doneMVar True
  return ()

main = do
  let samplerate = 16000
  Nothing <- initialize
  print "initialized"
  m <- newEmptyMVar
  datum <- newMVar Seq.empty
  Right s <- openDefaultStream 1 0 samplerate Nothing (Just $ buffCBtoRawCB' (callback datum)) (Just $ done m)
  startStream s
  _ <- takeMVar m -- wait until our callbacks decide they are done!
  Nothing <- terminate
  print "let's see what we've recorded..."
  stuff <- takeMVar datum
  print stuff
  -- write out wav file
  -- let datum =
  --     audio = Audio { sampleRate = samplerate
  --                   , channelNumber = 1
  --                   , sampleData = datum
  --                   }
  -- exportFile "foo.wav" audio
  print "main done"
The simplest solution is to use MVars to communicate between the callback and Producer. Here's how:
import Control.Proxy
import Control.Concurrent.MVar

fromMVar :: (Proxy p) => MVar (Maybe a) -> () -> Producer p a IO ()
fromMVar mvar () = runIdentityP loop where
    loop = do
        ma <- lift $ takeMVar mvar
        case ma of
            Nothing -> return ()
            Just a -> do
                respond a
                loop
Your stream callback will write Just input to the MVar and your finalization callback will write Nothing to terminate the Producer.
Here's a ghci example demonstrating how it works:
>>> mvar <- newEmptyMVar :: IO (MVar (Maybe Int))
>>> forkIO $ runProxy $ fromMVar mvar >-> printD
>>> putMVar mvar (Just 1)
1
>>> putMVar mvar (Just 2)
2
>>> putMVar mvar Nothing
>>> putMVar mvar (Just 3)
>>>
Edit: The pipes-concurrency library now provides this feature, and it even has a section in the tutorial explaining specifically how to use it to get data out of callbacks.
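The MVar hand-off itself does not depend on any streaming library, so it can be tried with base alone. In this sketch `drainMVar` is a made-up name, and a forked thread stands in for the audio library: it "calls back" with three buffers and then writes Nothing, the way the finalization callback would.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

-- | Take values from the MVar until the producer side writes Nothing,
-- handing each value to the supplied action as it arrives.
drainMVar :: MVar (Maybe a) -> (a -> IO ()) -> IO ()
drainMVar mvar handle = loop
  where
    loop = do
      ma <- takeMVar mvar
      case ma of
        Nothing -> pure ()
        Just a  -> handle a >> loop

main :: IO ()
main = do
  mvar <- newEmptyMVar
  -- Stand-in for the stream callbacks: deliver three buffers, then
  -- signal completion with Nothing.
  _ <- forkIO $ do
    mapM_ (putMVar mvar . Just) [[1, 2], [3, 4], [5, 6 :: Int]]
    putMVar mvar Nothing
  drainMVar mvar print
```

Note that putMVar blocks while the previous value has not been taken yet, so a slow consumer would stall the callback. For a realtime callback an unbounded buffer (for example a Chan, or the mailbox from pipes-concurrency) is probably the safer choice.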