Capture the output while terminating a process - haskell

I need to run a process, do something while it is running, and finally terminate it. The
process in question writes things to standard output that I would like to retain. Unfortunately,
it seems that the process dies before I can connect and extract its last words. Having scarce
experience with asynchronous programming, I am having a difficulty finding a nice solution. It
would be fortunate if I can accomplish this task within the framework of RIO.Process, although I
am prepared to step outside of it if it cannot be avoided. (Note that RIO employs an unusual
way of invoking external processes via a callback system.)
Below is a highly simplified runnable example of what I am trying to achieve.
Here is an emulation of the program to be run:
(Put it in a file called x.sh and say chmod +x x.sh to make it executable.)
#!/bin/sh
trap 'echo "Terminating..."; exit 0' TERM
echo "Initialization complete."
while true; do sleep 1; done
Here is my code:
(Put it in a file called X.hs and compile with ghc -package rio X.hs.)
{-# language NoImplicitPrelude #-}
{-# language BlockArguments #-}
{-# language OverloadedStrings #-}
module Main where
import RIO
import RIO.Process
import Data.Text.IO (hGetContents, hGetLine)
main :: IO ()
main = runSimpleApp do
proc "./x.sh" [ ]
\processConfig -> withProcessWait_ (setStdout createPipe processConfig)
\processHandle -> bracket_
(initialize processHandle)
(terminate processHandle)
(return ())
initialize :: (HasProcessContext env, HasLogFunc env) => Process () Handle () -> RIO env ()
initialize processHandle = do
x <- liftIO $ hGetLine (getStdout processHandle)
if x == "Initialization complete." then return () else error "This should not happen."
terminate :: HasLogFunc env => Process () Handle () -> RIO env ()
terminate processHandle = do
log' <- async $ liftIO $ hGetContents (getStdout processHandle)
stopProcess processHandle
log <- wait log'
logInfo $ display log
Here is what happens:
% ./X
X: fd:3: hGetBuffering: illegal operation (handle is closed)
— x.sh is saying something, but I cannot hear.
What is the right way to manage this?

From the documentation for stopProcess:
Close a process and release any resources acquired. This will ensure terminateProcess is called, wait for the process to actually exit, and then close out resources allocated for the streams. In the event of any cleanup exceptions being thrown this will throw an exception.
(emphasis mine) You don't want stopProcess to do that before you read the output. You just want terminateProcess. withProcessWait_ will take care of the rest of it. Unfortuntately, you do have to step outside of RIO to do that, with import System.Process (terminateProcess) and then liftIO $ terminateProcess (unsafeProcessHandle processHandle).
Side notes: You're kind of misusing bracket_. Since the "middle" of your bracket_ is a no-op, and especially now that the beginning and end aren't actually acquiring or releasing any resources, it's kind of pointless. Also, instead of using async at all, you can just read the output normally after terminating the process, since the output that a process already produced doesn't just disappear when it's terminated.
Here's your code with all of the above fixed:
{-# language NoImplicitPrelude #-}
{-# language BlockArguments #-}
{-# language OverloadedStrings #-}
module Main where
import RIO
import RIO.Process
import Data.Text.IO (hGetContents, hGetLine)
import System.Process (terminateProcess)
main :: IO ()
main = runSimpleApp do
proc "./x.sh" [ ]
\processConfig -> withProcessWait_ (setStdout createPipe processConfig)
\processHandle -> do
initialize processHandle
terminate processHandle
initialize :: (HasProcessContext env, HasLogFunc env) => Process () Handle () -> RIO env ()
initialize processHandle = do
x <- liftIO $ hGetLine (getStdout processHandle)
if x == "Initialization complete." then return () else error "This should not happen."
terminate :: HasLogFunc env => Process () Handle () -> RIO env ()
terminate processHandle = do
liftIO $ terminateProcess (unsafeProcessHandle processHandle)
log <- liftIO $ hGetContents (getStdout processHandle)
logInfo $ display log

Related

Haskell: start a long running process, silently, capturing stdout

I have a long running process which I need to start.
It takes a few seconds to start, and outputs logs to stdout, with one that indicates it is ready.
I would like to:
start the process silently, so that the stdout from the process is not displayed in my session.
capture the output as it streams so that I can determine that it is ready.
have some handle on the process so that I can stop the process at a later point.
I have come close using Shelly, Turtle and System.Process, but fail to capture the stdout.
Using System.Process I had:
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (race)
import System.IO
import System.Process
startService :: IO ProcessHandle
startService = do
let cmd = "./my-service"
args = [ "-p 1234" ]
(_, Just hout, _, p) <- createProcess $ (proc cmd args) { std_out = CreatePipe }
started <- either id id <$> race (checkStarted hout) timeOut
unless started $ fail "Service not started"
pure p
where
checkStarted :: Handle -> IO Bool
checkStarted h = do
str <- hGetLine h
-- check str for started log, else loop
timeOut :: IO Bool
timeOut = do
threadDelay 10000000
pure False
But The handler hout was never in a ready state.
Using Shelly I had:
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (race)
import Control.Concurrent.MVar
import Shelly
import System.IO
startService :: IO (Async ())
startService = do
let cmd = "./my-service"
args = [ "-p 1234" ]
startedMVar <- newEmptyMVar
async <- shelly $ asyncSh $ runHandle cmd args $ recordWhenStarted startedMVar
started <- either id id <$> race (readMVar startedMVar) timeOut
unless started $ fail "Service not started"
pure async
where
recordWhenStarted :: MVar Bool -> Text -> IO ()
recordWhenStarted mvar txt =
when (isStartedLog txt) $
modifyMVar_ mvar (const $ pure True)
timeOut :: IO Bool
timeOut = do
threadDelay 10000000
pure False
But the recordWhenStarted is never called.
The following is example of starting process and reading stdout in a program of mine:
runMystem :: [T.Text] -> IO T.Text
runMystem stemWords = do
(i, o, _, ph) <- createProcess (proc mystemExecutabe mystemParams) { std_in = CreatePipe, std_out = CreatePipe }
res <- flip (maybe (return T.empty)) i $ \hIn ->
flip (maybe (return T.empty)) o $ \hOut -> do
hSetEncoding hIn utf8
hSetEncoding hOut utf8
forM_ stemWords $ TIO.hPutStrLn hIn
TIO.hGetContents hOut
void $ waitForProcess ph
return res
This answer uses the process-streaming library (written by the author of this answer) which is a set of helpers over process.
{-# language OverloadedStrings #-}
{-# language NumDecimals #-}
import System.Process.Streaming (execute,piped,shell,foldOut,transduce1)
import qualified System.Process.Streaming.Text as PT
import Data.Text.Lazy (isInfixOf)
import Control.Applicative
import Control.Monad
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async
import Control.Concurrent.MVar
main :: IO ()
main = do
started <- newEmptyMVar
let execution =
execute (piped (shell "{ sleep 3 ; echo fooo ; sleep 3 ; }")) $
foldOut . transduce1 PT.utf8x . PT.eachLine $ lookline
lookline line = do
when (isInfixOf "foo" line) (putMVar started ())
return (Right ())
stopOrNot =
do abort <- race (threadDelay 4e6) (readMVar started)
case abort of
Left () -> return () -- stop immediately
Right () -> runConcurrently empty -- sleep forever
result <- race stopOrNot execution
print result
execute installs exception handlers that terminate the external process when an asynchronous exceptions arrives, to it is safe to use race with it.
execute also takes care to drain any standard stream that is not read explicitly (like stderr in this case) to avoid a common source of deadlocks.

How do I shut down `runTCPServer`?

I'm writing a socket server with runTCPServer from conduit-extra (formerly known as network-conduit). My goal is to interact with my editor using this server --- activate the server from the editor (most likely just by calling external command), use it, and terminate the server when the work is done.
For simplicity, I start with a simple echo server, and let's say I'd like to shut down the whole process when the connection is closed.
So I tried:
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Conduit
import Data.Conduit.Network
import Data.ByteString (ByteString)
import Control.Monad.IO.Class (liftIO)
import System.Exit (exitSuccess)
import Control.Exception
defaultPort :: Int
defaultPort = 4567
main :: IO ()
main = runTCPServer (serverSettings defaultPort "*") $ \ appData ->
appSource appData $$ conduit =$= appSink appData
conduit :: ConduitM ByteString ByteString IO ()
conduit = do
msg <- await
case msg of
Nothing -> liftIO $ do
putStrLn "Nothing left"
exitSuccess
-- I'd like the server to shut down here
(Just s) -> do
yield s
conduit
But this doesn't work -- the program continues to accept new connections. If I am not mistaken, this is because the thread listening to the connection we're dealing with exits with exitSuccess, but the entire process doesn't. So this is totally understandable, but I haven't been able to find a way to exit the whole process.
How do I terminate a server run by runTCPServer? Is runTCPServer something that's supposed to serve forever?
Here's a simple implementation of the idea described in comments:
main = do
mv <- newEmptyMVar
tid <- forkTCPServer (serverSettings defaultPort "*") $ \ appData ->
appSource appData $$ conduit mv =$= appSink appData
() <- takeMVar mv -- < -- wait for done signal
return ()
conduit :: MVar () -> ConduitM ByteString ByteString IO ()
conduit mv = do
msg <- await
case msg of
Nothing -> liftIO $ do
putStrLn "Nothing left"
putMVar mv () -- < -- signal that we're done
(Just s) -> do
yield s
conduit mv

How to exit a conduit when using mergeSources

I have a simple forked conduit setup, with two inputs feeding one single output....
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent (threadDelay)
import Control.Monad.IO.Class
import Control.Monad.Trans.Resource
import qualified Data.ByteString as B
import Data.Conduit
import Data.Conduit.TMChan
import Data.Conduit.Network
main::IO ()
main = do
runTCPClient (clientSettings 3000 "127.0.0.1") $ \server -> do
runResourceT $ do
input <- mergeSources [
transPipe liftIO (appSource server),
infiniteSource
] 2
input $$ transPipe liftIO (appSink server)
infiniteSource::MonadIO m=>Source m B.ByteString
infiniteSource = do
liftIO $ threadDelay 10000000
yield "infinite source"
infiniteSource
(here I connect to a tcp socket, then combine the socket input with a timed infinite source, then respond back to the socket)
This works great, until the connection is dropped.... Because the second input still exists, the conduit keeps running. (In this case, the program does end when the timed input fires and there is no socket to write to, but this isn't always the case in my real example).
What is the proper way to shut down the full conduit when one of the inputs is closed?
I tried to brute force a crash by adding the following
crashOnEndOfStream::MonadIO m=>Conduit B.ByteString m B.ByteString
crashOnEndOfStream = do
awaitForever $ yield
error "the peer connection has disconnected" --tried with error
liftIO $ exitWith ExitSuccess --also tried with exitWith
but because the input conduit runs in a thread, the executable was immune to runtime exceptions shutting it down (plus, there is probably a smoother way to shut stuff down than halting the program).
the Source created by mergeSources keeps a count of unclosed sources. It's only closed when the count reaches 0 i.e. every upstream source is closed. This mechanism and the underlying TBMChannel is hidden from user code so you have no way to change its behavior.
One possible solution is to create the channel and the source manually with some medium-level functions exported by Data.Conduit.TMChan, so you can finalize the source by closing the TBMChannel. I haven't tested the code below since your program is not runnable on my machine.
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent (threadDelay)
import Control.Monad.IO.Class
import Control.Monad.Trans.Resource
import qualified Data.ByteString as B
import Data.Conduit
import Data.Conduit.Network
import Data.Conduit.TMChan
main::IO ()
main = do
runTCPClient (clientSettings 3000 "127.0.0.1") $ \server -> do
runResourceT $ do
-- create the TBMChannel
chan <- liftIO $ newTBMChanIO 2
let
-- everything piped to the sink will appear at the source
chanSink = sinkTBMChan chan True
chanSource = sourceTBMChan chan
tid1 <- resourceForkIO $ appSource server $$ chanSink
tid2 <- resourceForkIO $ infiniteSource $$ chanSink
chanSource $$ transPipe liftIO (appSink server)
-- and call 'closeTBMChan chan' when you want to exit.
-- 'chanSource' will be closed when the underlying TBMChannel is closed.
infiniteSource :: MonadIO m => Source m B.ByteString
infiniteSource = do
liftIO $ threadDelay 10000000
yield "infinite source"
infiniteSource

QSem doesn't seem to block threads

I'm writing a simple script to run bunch of tasks in parallel using the Shelly library but I want to limit the max number of tasks running at any one time. The script takes a file with an input on each line and runs a task for that input. There are a few hundred inputs in the file and I want to limit to around 16 processes at a time.
The current script actually limits to 1 (well tries to) using a QSem with an initial count of 1. I seem to be missing something though because when I run on a test file with 4 inputs I see this:
Starting
Starting
Starting
Starting
Done
Done
Done
Done
So the threads are not blocking on the QSem as I would expect, they're all running simultaneously. I've even gone so far as to implement my own semaphores both on MVar and TVar and neither worked the way I expected. I'm obviously missing something fundamental but what? I've also tried compiling the code and running it as a binary.
#!/usr/bin/env runhaskell
{-# LANGUAGE TemplateHaskell, QuasiQuotes, DeriveDataTypeable, OverloadedStrings #-}
import Shelly
import Prelude hiding (FilePath)
import Text.Shakespeare.Text (lt)
import qualified Data.Text.Lazy as LT
import Control.Monad (forM)
import System.Environment (getArgs)
import qualified Control.Concurrent.QSem as QSem
import Control.Concurrent (forkIO, MVar, putMVar, newEmptyMVar, takeMVar)
-- Define max number of simultaneous processes
maxProcesses :: IO QSem.QSem
maxProcesses = QSem.newQSem 1
bkGrnd :: ShIO a -&gt ShIO (MVar a)
bkGrnd proc = do
mvar &lt- liftIO newEmptyMVar
_ &lt- liftIO $ forkIO $ do
-- Block until there are free processes
sem &lt- maxProcesses
QSem.waitQSem sem
putStrLn "Starting"
-- Run the shell command
result &lt- shelly $ silently proc
liftIO $ putMVar mvar result
putStrLn "Done"
-- Signal that this process is done and another can run.
QSem.signalQSem sem
return mvar
main :: IO ()
main = shelly $ silently $ do
[img, file] &lt- liftIO $ getArgs
contents &lt- readfile $ fromText $ LT.pack file
-- Run a backgrounded process for each line of input.
results &lt- forM (LT.lines contents) $ \line -> bkGrnd $ do
runStdin &ltcommand> &ltarguments>
liftIO $ mapM_ takeMVar results
As I said in my comment, each call to bkGrnd creates its own semaphonre, allowing every thread to continue without waiting. I would try something like this instead, where the semaphore is created in the main and passed each time to bkGrnd.
bkGrnd :: QSem.QSem -> ShIO a -> ShIO (MVar a)
bkGrnd sem proc = do
mvar <- liftIO newEmptyMVar
_ <- liftIO $ forkIO $ do
-- Block until there are free processes
QSem.waitQSem sem
--
-- code continues as before
--
main :: IO ()
main = shelly $ silently $ do
[img, file] <- liftIO $ getArgs
contents <- readfile $ fromText $ LT.pack file
sem <- maxProcesses
-- Run a backgrounded process for each line of input.
results <- forM (LT.lines contents) $ \line -> bkGrnd sem $ do
runStdin <command> <arguments>
liftIO $ mapM_ takeMVar results
You have an answer, but I need to add: QSem and QSemN are not thread safe if killThread or asynchronous thread death is possible.
My bug report and patch are GHC trac ticket #3160. The fixed code is available as a new library called SafeSemaphore with module Control.Concurrent.MSem, MSemN, MSampleVar, and a bonus FairRWLock.
Isn't it better
bkGrnd sem proc = do
QSem.waitQSem sem
mvar <- liftIO newEmptyMVar
_ <- liftIO $ forkIO $ do
...
so not even forkIO until you get the semaphore?

Catching Control-C exception in GHC (Haskell)

I built a really simple read-eval-print-loop in Haskell that catches Control-C (UserInterrupt). However, whenever I compile and run this program, it always catches the first Control-C and always aborts on the second Control-C with exit code 130. It doesn't matter how many lines of input I give it before and between the two Control-Cs, it always happens this way. I know I must be missing something simple... please help, thanks!
Note: this is with base-4 exceptions, so Control.Exception and not Control.OldException.
import Control.Exception as E
import System.IO
main :: IO ()
main = do hSetBuffering stdout NoBuffering
hSetBuffering stdin NoBuffering
repLoop
repLoop :: IO ()
repLoop
= do putStr "> "
line <- interruptible "<interrupted>" getLine
if line == "exit"
then putStrLn "goodbye"
else do putStrLn $ "input was: " ++ line
repLoop
interruptible :: a -> IO a -> IO a
interruptible a m
= E.handleJust f return m
where
f UserInterrupt
= Just a
f _
= Nothing
Wei Hu is correct; the Haskell runtime system deliberately aborts the program when a second control-C is pressed. To get the behavior one might expect:
import Control.Exception as E
import Control.Concurrent
import System.Posix.Signals
main = do
tid <- myThreadId
installHandler keyboardSignal (Catch (throwTo tid UserInterrupt)) Nothing
... -- rest of program
Disclaimer: I'm not familiar with GHC internals and my answer is based on grepping the source code, reading the comments, and making guesses.
The main function you define is in fact wrapped by runMainIO defined in GHC.TopHandler (this is further confirmed by looking at TcRnDriver.lhs):
-- | 'runMainIO' is wrapped around 'Main.main' (or whatever main is
-- called in the program). It catches otherwise uncaught exceptions,
-- and also flushes stdout\/stderr before exiting.
runMainIO :: IO a -> IO a
runMainIO main =
do
main_thread_id <- myThreadId
weak_tid <- mkWeakThreadId main_thread_id
install_interrupt_handler $ do
m <- deRefWeak weak_tid
case m of
Nothing -> return ()
Just tid -> throwTo tid (toException UserInterrupt)
a <- main
cleanUp
return a
`catch`
topHandler
And install_interrupt_handler is defined as:
install_interrupt_handler :: IO () -> IO ()
#ifdef mingw32_HOST_OS
install_interrupt_handler handler = do
_ <- GHC.ConsoleHandler.installHandler $
Catch $ \event ->
case event of
ControlC -> handler
Break -> handler
Close -> handler
_ -> return ()
return ()
#else
#include "rts/Signals.h"
-- specialised version of System.Posix.Signals.installHandler, which
-- isn't available here.
install_interrupt_handler handler = do
let sig = CONST_SIGINT :: CInt
_ <- setHandler sig (Just (const handler, toDyn handler))
_ <- stg_sig_install sig STG_SIG_RST nullPtr
-- STG_SIG_RST: the second ^C kills us for real, just in case the
-- RTS or program is unresponsive.
return ()
On Linux, stg_sig_install is a C function that calls out to sigaction. The parameter STG_SIG_RST is translated to SA_RESETHAND. On Windows, things are done differently, which probably explains ja's observation.
The most reliable solution for me (at least on Linux), has been to install a signal handler using System.Posix.Signals. I was hoping for a solution that would not require this, but the real reason I posted the question was that I wanted to know why GHC behaved the way it did. As explained on #haskell, a likely explanation is that GHC behaves this way so that the user can always Control-C an application if it hangs. Still, it would be nice if GHC provided a way to affect this behavior without the somewhat lower-level method that we resorted to :).

Resources