Haskell: How to timeout a function that runs an external command

I call an external program inside a function. Now I would like to time out this function, not just the external program. But after the function times out, the external program keeps running on my computer (I'm using Debian) until it finishes its computation; after that, its entry still remains in the process table as a child of my main program until the main program terminates.
Here are two minimal examples that illustrate what I would like to do. The first uses unsafePerformIO, the second is completely in the IO monad. I don't really depend on unsafePerformIO but would like to keep it if possible. The described problem occurs with and without it.
With unsafePerformIO
module Main where
import System.Timeout
import Criterion.Measurement
import System.IO.Unsafe
import System.Process
main = do
    x <- time $ timeoutP (1 * 1000000) $ mytest 2
    y <- getLine
    putStrLn $ show x ++ y
timeoutP :: Int -> a -> IO (Maybe a)
timeoutP t fun = timeout t $ return $! fun
mytest :: Int -> String
mytest n =
    let
        x = runOnExternalProgram $ n * 1000
    in
        x ++ ". Indeed."
runOnExternalProgram :: Int -> String
runOnExternalProgram n = unsafePerformIO $ do
    -- convert the input to a parameter of the external program
    let x = show $ n + 12
    -- run the external program
    -- (here I use "sleep" to indicate a slow computation)
    answer <- readProcess "sleep" [x] ""
    -- convert the output as needed
    let verboseAnswer = "External program answered: " ++ answer
    return verboseAnswer
Without unsafePerformIO
module Main where
import System.Timeout
import Criterion.Measurement
import System.IO.Unsafe
import System.Process
main = do
    x <- time $ timeout (1 * 1000000) $ mytest 2
    y <- getLine
    putStrLn $ show x ++ y
mytest :: Int -> IO String
mytest n = do
    x <- runOnExternalProgram $ n * 1000
    return $ x ++ ". Indeed."
runOnExternalProgram :: Int -> IO String
runOnExternalProgram n = do
    -- convert the input to a parameter for the external program:
    let x = show $ n + 12
    -- run the external program
    -- (here I use "sleep" to indicate a slow computation):
    answer <- readProcess "sleep" [x] ""
    -- convert the output as needed:
    let verboseAnswer = "External program answered: " ++ answer
    return verboseAnswer
Maybe bracket can be of help here, but I don't really know how.
Edit: I adopted John L's answer. Now I am using the following:
import Control.Concurrent
import Control.Exception
import System.Exit
import System.IO
import System.IO.Error
import System.Posix.Signals
import System.Process
import System.Process.Internals
safeCreateProcess :: String -> [String] -> StdStream -> StdStream -> StdStream
                  -> ( ( Maybe Handle
                       , Maybe Handle
                       , Maybe Handle
                       , ProcessHandle
                       ) -> IO a )
                  -> IO a
safeCreateProcess prog args streamIn streamOut streamErr fun = bracket
    ( do
        h <- createProcess (proc prog args)
                 { std_in  = streamIn
                 , std_out = streamOut
                 , std_err = streamErr
                 , create_group = True }
        return h
    )
    -- "interruptProcessGroupOf" is in the new System.Process. Since some
    -- programs return funny exit codes I implemented a "terminateProcessGroup".
    -- (\(_, _, _, ph) -> interruptProcessGroupOf ph >> waitForProcess ph)
    (\(_, _, _, ph) -> terminateProcessGroup ph >> waitForProcess ph)
    fun
{-# NOINLINE safeCreateProcess #-}
safeReadProcess :: String -> [String] -> String -> IO String
safeReadProcess prog args str =
    safeCreateProcess prog args CreatePipe CreatePipe Inherit
      (\(Just inh, Just outh, _, ph) -> do
        hPutStr inh str
        hClose inh
        -- fork a thread to consume output
        output <- hGetContents outh
        outMVar <- newEmptyMVar
        forkIO $ evaluate (length output) >> putMVar outMVar ()
        -- wait on output
        takeMVar outMVar
        hClose outh
        return output
        -- The following would be great, if some programs did not return funny
        -- exit codes!
        --     ex <- waitForProcess ph
        --     case ex of
        --         ExitSuccess -> return output
        --         ExitFailure r ->
        --             fail ("spawned process " ++ prog ++ " exit: " ++ show r)
      )
terminateProcessGroup :: ProcessHandle -> IO ()
terminateProcessGroup ph = do
    let (ProcessHandle pmvar) = ph
    ph_ <- readMVar pmvar
    case ph_ of
        OpenHandle pid -> do  -- pid is a POSIX pid
            signalProcessGroup sigTERM pid  -- sigTERM is signal 15
        _ -> return ()
This solves my problem: it kills all child processes of the spawned process, and it does so at the right time.
Kind regards.

Edit: it is possible to get the pid of the spawned process. You can do so with code like the following:
-- highly non-portable, and liable to change between versions
import System.Process.Internals
-- from the finalizer of the bracketed function
-- `ph` is a ProcessHandle as returned by createProcess
(\(_,_,_,ph) -> do
    let (ProcessHandle pmvar) = ph
    ph_ <- takeMVar pmvar
    case ph_ of
        OpenHandle pid -> do -- pid is a POSIX pid
            ... -- do stuff
            putMVar pmvar ph_
If you kill the process, instead of putting the open ph_ into the mvar you should create an appropriate ClosedHandle and put that back instead. It's important that this code executes masked (bracket will do this for you).
Now that you have a POSIX id you can use system calls or shell out to kill as necessary. Just be careful that your Haskell executable isn't in the same process group if you go that route.
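On newer versions of the process package the pid can be read without reaching into System.Process.Internals. The following is only a sketch under assumptions: it assumes process 1.6.3.0 or later (which exports getPid) plus the unix package, and it assumes the child was started with create_group = True. The names terminateGroupOf and demo are hypothetical helpers, not part of any library.

```haskell
import System.Exit (ExitCode (..))
import System.Posix.Signals (sigTERM, signalProcessGroup)
import System.Process

-- Hypothetical helper: send SIGTERM to the process group led by the child.
-- Assumes the child was created with create_group = True, so its group id
-- equals its pid and the Haskell program itself is not in that group.
terminateGroupOf :: ProcessHandle -> IO ()
terminateGroupOf ph = do
  mpid <- getPid ph            -- Nothing once the process has been waited for
  case mpid of
    Just pid -> signalProcessGroup sigTERM pid
    Nothing  -> return ()

-- Hypothetical demonstration: start a long "sleep" in its own group,
-- terminate the group, and report the exit code of the child.
demo :: IO ExitCode
demo = do
  (_, _, _, ph) <- createProcess (proc "sleep" ["30"]) { create_group = True }
  terminateGroupOf ph
  waitForProcess ph
```

Because the signal goes to the whole group, children spawned by the external program are terminated as well, which is the behavior the question asks for.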
/end edit
This behavior seems sort of sensible. The documentation for timeout claims that it doesn't work at all for non-Haskell code, and indeed I don't see any way that it could generically. What's happening is that readProcess spawns a new process, but then is timed out while waiting for output from that process. It seems that readProcess doesn't terminate the spawned process when it's aborted abnormally. This could be a bug in readProcess, or it could be by design.
As a workaround, I think you'll need to implement some of this yourself. timeout works by raising an async exception in a spawned thread. If you wrap your runOnExternalProgram in an exception handler, you'll get the behavior you want.
The key function here is the new runOnExternalProgram, which is a combination of your original function and readProcess. It would be better (more modular, more reusable, more maintainable) to make a new readProcess that kills the spawned process when an exception is raised, but I'll leave that as an exercise.
module Main where
import System.Timeout
import Criterion.Measurement
import System.IO.Unsafe
import System.Process
import Control.Exception
import System.IO
import System.IO.Error
import GHC.IO.Exception
import System.Exit
import Control.Concurrent.MVar
import Control.Concurrent
main = do
    x <- time $ timeoutP (1 * 1000000) $ mytest 2
    y <- getLine
    putStrLn $ show x ++ y
timeoutP :: Int -> IO a -> IO (Maybe a)
timeoutP t fun = timeout t $ fun
mytest :: Int -> IO String
mytest n = do
    x <- runOnExternalProgram $ n * 1000
    return $ x ++ ". Indeed."
runOnExternalProgram :: Int -> IO String
runOnExternalProgram n =
    -- convert the input to a parameter of the external program
    let x = show $ n + 12
    in bracketOnError
        (createProcess (proc "sleep" [x]){std_in = CreatePipe
                                         ,std_out = CreatePipe
                                         ,std_err = Inherit})
        (\(Just inh, Just outh, _, pid) -> terminateProcess pid >> waitForProcess pid)
        (\(Just inh, Just outh, _, pid) -> do
            -- fork a thread to consume output
            output <- hGetContents outh
            outMVar <- newEmptyMVar
            forkIO $ evaluate (length output) >> putMVar outMVar ()
            -- no input in this case
            hClose inh
            -- wait on output
            takeMVar outMVar
            hClose outh
            -- wait for process
            ex <- waitForProcess pid
            case ex of
                ExitSuccess -> do
                    -- convert the output as needed
                    let verboseAnswer = "External program answered: " ++ output
                    return verboseAnswer
                ExitFailure r ->
                    ioError (mkIOError OtherError ("spawned process exit: " ++ show r) Nothing Nothing) )
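For the reusable variant left as an exercise, newer versions of the process package provide withCreateProcess, a bracket-style wrapper that terminates the process and closes its pipes if the body is aborted by an exception. The sketch below assumes process 1.4.3.0 or later; readProcessWithin is a hypothetical name, and only the direct child is terminated (not a whole process group).

```haskell
import Control.Exception (evaluate)
import System.IO (hGetContents)
import System.Process
import System.Timeout (timeout)

-- Hypothetical helper: run a program and return its stdout, or Nothing if
-- it did not finish within the given number of microseconds.
readProcessWithin :: Int -> FilePath -> [String] -> IO (Maybe String)
readProcessWithin micros prog args =
  timeout micros $
    withCreateProcess (proc prog args) { std_out = CreatePipe } $
      \_ mOut _ ph -> case mOut of
        Nothing   -> ioError (userError "stdout pipe was not created")
        Just outh -> do
          output <- hGetContents outh
          _ <- evaluate (length output)   -- force the whole output
          _ <- waitForProcess ph
          return output
```

When the timeout fires, the asynchronous exception unwinds through withCreateProcess, whose cleanup terminates the child, so no "sleep" is left running after the function returns Nothing.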

Related

How can I poll a process for its stdout / stderr output? Blocked by isEOF

The following example requires these packages:
- text
- string-conversions
- process
Code:
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE LambdaCase #-}
module Example where
import qualified Data.Text as T
import Data.Text (Text)
import Data.Monoid
import Control.Monad.Identity
import System.Process
import GHC.IO.Handle
import Debug.Trace
import Data.String.Conversions
runGhci :: Text -> IO Text
runGhci _ = do
  let expr = "print \"test\""
  let inputLines = (<> "\n") <$> T.lines expr :: [Text]
  print inputLines
  createProcess ((proc "ghci" ["-v0", "-ignore-dot-ghci"]) {std_in=CreatePipe, std_out=CreatePipe, std_err=CreatePipe}) >>= \case
    (Just pin, Just pout, Just perr, ph) -> do
      output <-
        forM inputLines (\i -> do
          let script = i <> "\n"
          do
            hPutStr pin $ cs $ script
            hFlush pin
            x <- hIsEOF pout >>= \case
              True -> return ""
              False -> hGetLine pout
            y <- hIsEOF perr >>= \case
              True -> return ""
              False -> hGetLine perr
            let output = cs $! x ++ y
            return $ trace "OUTPUT" $ output
        )
      let f i o = "ghci>" <> i <> o
      let final = T.concat ( zipWith f (inputLines :: [Text]) (output :: [Text]) :: [Text])
      print final
      terminateProcess ph
      pure $ T.strip $ final
    _ -> error "Invalid GHCI process"
If I attempt to run the above:
stack ghci src/Example.hs
ghci> :set -XOverloadedStrings
ghci> runGhci ""
["print \"test\"\n"]
It appears to block on hIsEOF perr. According to https://stackoverflow.com/a/26510673/1663462 it sounds like I shouldn't call this function unless there is some output ready to be read. But how do I handle the case where there is no output at that stage? I don't mind periodically checking or having a timeout.
How can I prevent the above from hanging? I've tried various approaches involving hGetContents and hGetLine, but they all seem to end up blocking (or closing the handle) in this situation.

I had to use additional threads, MVars, as well as timeouts:
-- NOTE: magic, extract' and f are not defined in this snippet, and the
-- imports for forkIO, the MVar functions, timeout and fromMaybe are omitted.
runGhci :: Text -> IO Text
runGhci _ = do
  let expr = "123 <$> 123"
  let inputLines = filter (/= "") (T.lines expr)
  print inputLines
  createProcess ((proc "ghci" ["-v0", "-ignore-dot-ghci"]) {std_in=CreatePipe, std_out=CreatePipe, std_err=CreatePipe}) >>= \case
    (Just pin, Just pout, Just perr, ph) -> do
      output <- do
        forM inputLines
          (\i -> do
            let script = "putStrLn " ++ show magic ++ "\n"
                           ++ cs i ++ "\n"
                           ++ "putStrLn " ++ show magic ++ "\n"
            do
              stdoutMVar <- newEmptyMVar
              stderrMVar <- newMVar ""
              hPutStr pin script
              hFlush pin
              tOutId <- forkIO $ extract' pout >>= putMVar stdoutMVar
              tErrId <- forkIO $ do
                let f' = hGetLine perr >>= (\l -> modifyMVar_ stderrMVar (return . (++ (l ++ "\n"))))
                forever f'
              x <- timeout (1 * (10^6)) (takeMVar stdoutMVar) >>= return . fromMaybe "***ghci timed out"
              y <- timeout (1 * (10^6)) (takeMVar stderrMVar) >>= return . fromMaybe "***ghci timed out"
              killThread tOutId
              killThread tErrId
              return $ trace "OUTPUT" $ cs $! x ++ y
          )
      let final = T.concat ( zipWith f (inputLines :: [Text]) (output :: [Text]) :: [Text])
      print final
      terminateProcess ph
      pure $ T.strip $ cs $ final
    _ -> error "Invalid GHCI process"
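The core idea of waiting for output with a deadline can be shown in isolation. This is a minimal sketch, not the code above: readLineWithin and demo are hypothetical names, and the demonstration uses an OS pipe from createPipe (process 1.2.1.0 or later) in place of a real ghci process. One caveat I would assume rather than rule out: if the timeout fires in the middle of a line, the characters already consumed may be lost.

```haskell
import System.IO
import System.Process (createPipe)
import System.Timeout (timeout)

-- Hypothetical helper: wait at most `micros` microseconds for one line.
-- Returns Nothing if no complete line arrived in time.
readLineWithin :: Int -> Handle -> IO (Maybe String)
readLineWithin micros h = timeout micros (hGetLine h)

-- Demonstration on a pipe: the first read finds a line, the second times out.
demo :: IO (Maybe String, Maybe String)
demo = do
  (readEnd, writeEnd) <- createPipe
  hPutStrLn writeEnd "hello"
  hFlush writeEnd
  first  <- readLineWithin 500000 readEnd   -- a line is available
  second <- readLineWithin 200000 readEnd   -- nothing more was written
  hClose writeEnd                           -- also keeps the write end open until here
  hClose readEnd
  return (first, second)
```

Applied to the question, each of pout and perr could be read this way after sending a command, so an empty stderr no longer blocks the loop.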

Elegant egress from `Async` actions

Is there a simple way of cancelling an Async a value, such that it won't be interrupted in the middle of some critical action? I suppose I could use a semaphore inside a loop condition.
async $ whileM readSemaphore runLoopBody
But I'd like to know whether async or some other related library supports it out-of-the-box.
My code, as requested.
-- |
-- TODO | - Rename (?)
--        - Time-out
awaitResult :: String -> IO a -> IO a
awaitResult s act = do
  putStr s
  sem <- newMVar True
  a <- async $ ellipsis sem
  r <- act
  swapMVar sem False
  return r
  where
    ellipsis :: MVar Bool -> IO ()
    ellipsis sem = void $ do
      whileM (readMVar sem) $ forM [". ", ".. ", "...", " "] $ \dots -> do
        putStr dots
        cursorBackward 3
        threadDelay (floor $ 0.4 * second)
      -- TODO | - If the Windows console wasn't shit, I'd use a checkmark
      putStr " (" >> withPretty fgGreen "done" >> putStrLn ")"

The phrasing of the question made me think of a more antagonistic relationship, in which case using Control.Exception.mask to limit when the task can be interrupted should be reasonable.
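As a rough illustration of the mask approach, here is a sketch using only base. The worker, its "critical action" (two increments that must happen together) and demo are all hypothetical; killThread stands in for cancel from the async package, which delivers an asynchronous exception in the same way.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Exception (mask_)
import Control.Monad (forever)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical worker: the two increments are the critical action.
-- Under mask_ an asynchronous exception cannot land between them.
worker :: IORef Int -> IO ()
worker ref = forever $ do
  mask_ $ do
    modifyIORef' ref (+ 1)
    modifyIORef' ref (+ 1)
  threadDelay 1000   -- unmasked: the thread can be cancelled here

-- Start the worker, let it run briefly, cancel it, and read the counter.
demo :: IO Int
demo = do
  ref <- newIORef 0
  tid <- forkIO (worker ref)
  threadDelay 50000
  killThread tid     -- returns once the exception has been delivered
  readIORef ref
```

The counter read by demo is always even, because cancellation only takes effect outside the masked section. Note that blocking operations inside mask_ remain interruptible; if the critical action blocks, uninterruptibleMask_ is the stricter (and riskier) alternative.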
Based on the updated code it seems the threads have a tight coupling and the first alternative I can think of would be spawning the action instead of the indicator and using poll, which I think is less noisy than the MVar route:
import Control.Monad (forM_)
import Control.Exception (throwIO)
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (Async,async,poll)
second :: (Num a) => a
second = 1000000
awaitResult :: String -> IO a -> IO a
awaitResult s act = do
  putStrLn s
  a <- async $ act
  ellipsis a
  where
    ellipsis :: Async a -> IO a
    ellipsis a = do
      result <- poll a
      case result of
        Nothing -> do
          forM_ [". ",".. ","..."," "] $ \dots -> do
            putStr dots
            putStr "\r"
            threadDelay $ floor $ 0.4 * second
          ellipsis a
        Just (Left e) -> throwIO e  -- rethrow the action's exception in IO
        Just (Right x) -> return x
main = awaitResult "testing" (threadDelay (5 * second) >> return 5)

Forking the streaming flow in haskell-pipes

I'm having trouble directing flow through a pipeline with haskell-pipes. Basically, I analyze a bunch of files and then I have to either
- print results to the terminal in a human-friendly way, or
- encode results to JSON.
The chosen path depends upon a command line option.
In the second case, I have to output an opening bracket, then every incoming value followed by a comma, and then a closing bracket. Currently insertCommas never terminates, so the closing bracket is never output.
import Pipes
import qualified Pipes.Prelude as P
import qualified Data.ByteString.Lazy as B
import Data.Aeson (encode)
insertCommas :: Consumer B.ByteString IO ()
insertCommas = do
  first <- await
  lift $ B.putStr first
  for cat $ \obj -> lift $ do
    putStr ","
    B.putStr obj
jsonExporter :: Consumer (FilePath, AnalysisResult) IO ()
jsonExporter = do
  lift $ putStr "["
  P.map encode >-> insertCommas
  lift $ putStr "]"
exportStream :: Config -> Consumer (FilePath, AnalysisResult) IO ()
exportStream conf =
  case outputMode conf of
    JSON -> jsonExporter
    _    -> P.map (export conf) >-> P.stdoutLn
main :: IO ()
main = do
  -- The first two lines are Docopt stuff, not relevant
  args <- parseArgsOrExit patterns =<< getArgs
  ins <- allFiles $ args `getAllArgs` argument "paths"
  let conf = readConfig args
  runEffect $ each ins
    >-> P.mapM analyze
    >-> P.map (filterResults conf)
    >-> P.filter filterNulls
    >-> exportStream conf

AFAIK a Consumer cannot detect the end of a stream. In order to do that you need to use Pipes.Parse and invert the control.
Here is a Parser which inserts commas between String elements:
import Pipes
import qualified Pipes.Prelude as P
import Pipes.Parse (draw, evalStateT)
commify = do
  lift $ putStrLn "["
  m1 <- draw
  case m1 of
    Nothing -> lift $ putStrLn "]"
    Just x1 -> do
      lift $ putStrLn x1
      let loop = do mx <- draw
                    case mx of
                      Nothing -> lift $ putStrLn "]"
                      Just x -> lift (putStr "," >> putStrLn x) >> loop
      loop
test1 = evalStateT commify ( mapM_ yield (words "this is a test") )
test2 = evalStateT commify P.stdinLn
To handle the different output formats I would probably make both formats a Parser:
exportParser = do
  mx <- draw
  case mx of
    Nothing -> return ()
    Just x -> (lift $ putStrLn $ export x) >> exportParser
and then:
let parser = case outputMode conf of
      JSON -> commify
      _    -> exportParser
-- the parser needs a Producer, so the pipeline starts from the input files
evalStateT parser (each ins
                   >-> P.mapM analyze
                   >-> P.map (filterResults conf)
                   >-> P.filter filterNulls)
There is probably a slicker way to write exportParser in terms of foldAllM. You can also use the MaybeT transformer to more succinctly write the commify parser. I've written both out explicitly to make them easier to understand.
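Another way to invert control, shown here only as a sketch, is to step the Producer with next from the core Pipes module and re-yield its elements with a separator in between. The names intersperseP and demo are hypothetical and not part of pipes.

```haskell
import Pipes
import qualified Pipes.Prelude as P

-- Hypothetical helper: yield `sep` between consecutive elements.
-- `next` reveals whether the producer is empty, which a Consumer cannot see.
intersperseP :: Monad m => a -> Producer a m r -> Producer a m r
intersperseP sep p0 = do
  step <- lift (next p0)
  case step of
    Left r        -> return r            -- empty stream: nothing to yield
    Right (x, p1) -> do
      yield x                            -- first element, no separator
      for p1 $ \y -> yield sep >> yield y

-- Pure demonstration on a list-backed producer.
demo :: [String]
demo = P.toList (intersperseP "," (each ["a", "b", "c"]))
```

With this shape the JSON exporter would take the upstream Producer as an argument, so the closing bracket can be written after the transformed producer finishes.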

I think you should 'commify' with pipes-group. It has intercalates but no intersperse, though that is not hard to write. For this sort of problem I think you should stay away from the Consumer end.
{-# LANGUAGE OverloadedStrings #-}
import Pipes
import qualified Pipes.Prelude as P
import qualified Data.ByteString.Lazy.Char8 as B
import Pipes.Group
import Lens.Simple -- or Control.Lens or Lens.Micro or anything with view/^.
import System.Environment
intersperse_ :: Monad m => a -> Producer a m r -> Producer a m r
intersperse_ a producer = intercalates (yield a) (producer ^. chunksOf 1)
main = do
  args <- getArgs
  let op prod = case args of
        "json":_ -> yield "[" *> intersperse_ "," prod <* yield "]"
        _        -> intersperse_ " " prod
  runEffect $ op producer >-> P.mapM_ B.putStr
  putStrLn ""
  where
    producer = mapM_ yield (B.words "this is a test")
which gives me this:
>>> :main json
[this,is,a,test]
>>> :main ---
this is a test

Why is putStrLn not atomic?

To practice concurrent programming, I wrote the following (suboptimal) program, which repeatedly calculates the first prime bigger than whatever the user inputs:
import Control.Concurrent
import Control.Concurrent.Chan
import Control.Monad (forever)
primeAtLeast n = -- Some pure code that looks up the first prime at least as big as n
outputPrimeAtLeast n = putStrLn $ show $ (n, primeAtLeast n)
main = do
  chan <- newChan
  worker <- forkIO $ forever $ readChan chan >>= outputPrimeAtLeast
  forever $ (readLn :: (IO Int)) >>= (writeChan chan)
  killThread worker
I want to have a worker thread in the background that does the actual calculation and outputs (n, primeAtLeast n) as soon as it's finished.
What it's doing now: as soon as I enter a number n, it immediately outputs "(n,", returns control to the main thread, calculates primeAtLeast n in the background, and outputs the second half, "primeAtLeast n)", as soon as it's finished.
So is putStrLn not atomic? Or where is the problem?

Try this:
outputPrimeAtLeast n = let p = primeAtLeast n in p `seq` putStrLn $ show (n, p)
The above forces the computation of the prime before the putStrLn is run.
Further, you may use print instead of putStrLn . show:
outputPrimeAtLeast n = let p = primeAtLeast n in p `seq` print (n, p)
Alternatively, you may use a putStrLn function which forces every single character before starting printing anything.
strictPutStrLn :: Show a => a -> IO ()
strictPutStrLn x = let str = show x in str `listSeq` putStrLn str
listSeq :: [a] -> b -> b
listSeq [] w = w
listSeq (x:xs) w = x `seq` listSeq xs w
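The reason for the split output is laziness: show (n, p) is a lazy String whose prefix "(n," is available before p has been computed, and putStrLn writes characters as the string is produced. A sketch of the same fix using evaluate from Control.Exception follows; renderStrict and printStrict are hypothetical names, and forcing the length walks the whole list spine, which for a shown tuple requires the value to have been computed.

```haskell
import Control.Exception (evaluate)

-- Hypothetical helper: render a value and force the resulting string's
-- spine before returning it, so no computation is left for the printer.
renderStrict :: Show a => a -> IO String
renderStrict x = do
  let str = show x
  _ <- evaluate (length str)
  return str

-- Print only after the whole string has been produced.
printStrict :: Show a => a -> IO ()
printStrict x = renderStrict x >>= putStrLn
```

In the worker thread, printStrict (n, primeAtLeast n) would then do the slow work first and write the complete line afterwards.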

How to abort getChar safely?

I would like to optionally abort a getChar action.
I need the following function:
getChar' :: (Char -> IO ()) -> IO (IO ())
After abort <- getChar' callback, a character is read from standard input unless abort is called before a character is available.
If a character is read, callback is called with it.
I have the following prototype implementation:
import Control.Monad
import Control.Concurrent
getChar' :: (Char -> IO ()) -> IO (IO ())
getChar' callback = do
  v <- newEmptyMVar
  tid <- forkIO $ do
    c <- getChar
    b <- tryPutMVar v ()
    when b $ callback c
  return $ do
    b <- tryPutMVar v ()
    when b $ killThread tid
The problem is that killThread may abort the thread after reading the char but before putting () into the MVar.
I have no idea how to solve this problem. Is it possible at all with the base package?
If not, have you seen a similar function implemented in other packages?

I think the easiest way to achieve this is to perform your own buffering. Here's a simple prototype. It assumes that you call launchIOThread exactly once in your program. It doesn't handle EOF or other IO exceptions, but that should be easy.
import Control.Concurrent
import Control.Concurrent.STM
import Data.Maybe
import Control.Monad
type Buffer = TVar (Maybe Char)
launchIOThread :: IO Buffer
launchIOThread = do
  buf <- atomically $ newTVar Nothing
  _ <- forkIO $ ioThread buf
  return buf
ioThread :: Buffer -> IO ()
ioThread buf = loop where
  loop =
    join $ atomically $ do
      contents <- readTVar buf
      if isJust contents -- no-one has taken the character yet
        then retry -- relax
        else return $ do
          c <- getChar
          atomically $ writeTVar buf (Just c)
          loop
getChar' :: Buffer -> (Char -> IO ()) -> IO (IO ())
getChar' buf callback = do
  abortFlag <- atomically $ newTVar False
  _ <- forkIO $ doGetChar abortFlag
  return $ atomically $ writeTVar abortFlag True
  where
    doGetChar abortFlag = join $ atomically $ do
      mbC <- readTVar buf
      abort <- readTVar abortFlag
      case mbC of
        Just c ->
          do writeTVar buf Nothing; return $ callback c
        Nothing | abort -> return $ return ()
        _ -> retry

What you want to do is use exception-handling constructs such that regardless of exceptions, the MVar is always left in a safe state. In particular, you probably want withMVar.
