If a process thread opens an inproc ZMQ socket and then dies from an unhandled exception, what happens if the socket is never closed? How bad a practice is this?
To be more specific, I've implemented a very simple message broker in Haskell, closely following http://zguide.zeromq.org/page:all#Multithreading-with-MQ.
The worker thread opens a new socket, and waits in an infinite loop for processing messages.
The socket is not closed anywhere in the worker thread.
Now, if there is an unhandled exception in the worker thread and the thread dies, how bad is it to just restart the thread without any cleanup?
I'm pasting the worker code from the Haskell example:
worker :: ZMQ z ()
worker = do
    receiver <- socket Rep
    connect receiver "inproc://workers"
    forever $ do
        receive receiver >>= liftIO . printf "Received request:%s\n" . unpack
        -- Suppose there is some exception here
        liftIO $ threadDelay (1 * 1000 * 1000)
        send receiver [] "World"
So it seems that if you don't close the inproc socket, the restarted thread cannot receive messages properly. I'm not sure I understand this behaviour, but I can confirm that this modified example from the ZMQ Haskell guide works:
{-# LANGUAGE ScopedTypeVariables #-}

import System.ZMQ3.Monadic
import Prelude hiding (catch)
import Control.Monad (forever)
import Control.Monad.CatchIO

worker :: ZMQ z ()
worker = do
    liftIO $ putStrLn "Starting the worker thread..."
    receiver <- socket Rep
    connect receiver "inproc://workers"
    catch
        (forever $ do
            liftIO $ putStrLn "Waiting for an inproc message"
            request <- receiveMulti receiver -- request :: [ByteString]
            liftIO $ putStrLn "I'm doing something that may throw an error"
            -- error "IO Error has happened"
        )
        (\(e :: IOError) -> do
            liftIO $ putStrLn $ "Caught error: " ++ show e
            -- Commenting this out results in the restarted worker thread
            -- being unable to accept new messages.
            close receiver
        )
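Given that, a cleaner variant is to tie the socket's lifetime to the worker with bracket, so the socket is closed on any exception and a restarted worker starts from a clean slate. This is only a sketch, assuming ZMQ z has the MonadCatchIO instance the example above already relies on:

{-# LANGUAGE OverloadedStrings #-}

import Control.Monad (forever)
import Control.Monad.CatchIO (bracket)
import System.ZMQ3.Monadic

-- Sketch: bracket closes the REP socket no matter what the loop body
-- throws, so the supervising code can simply restart the worker.
worker :: ZMQ z ()
worker = bracket (socket Rep) close $ \receiver -> do
    connect receiver "inproc://workers"
    forever $ do
        _request <- receiveMulti receiver
        -- processing that may throw goes here
        send receiver [] "World"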
I have a Yesod app with a warp server, and some of its functionality depends on async exceptions. Recently I needed to migrate it to https, which I did with the warp-tls package. However, now I can't kill the warp thread by throwing ThreadKilled at it; the throwTo call just hangs and does nothing.
Consider the following example. Here we monitor the state of the warp thread with an MVar (), which is empty while the thread is running and holds () once the thread has been killed.
import Control.Concurrent (forkIO, threadDelay, throwTo)
import Control.Concurrent.MVar (MVar, isEmptyMVar, newEmptyMVar, putMVar, readMVar)
import Control.Exception (AsyncException (ThreadKilled), bracket)
import Network.Wai.Handler.Warp (defaultSettings, runSettings, setPort)
import Network.Wai.Handler.WarpTLS (runTLS, tlsSettings)
import MyApplication (waiPage)
-- waiPage :: Application

runWai :: MVar () -> IO ()
runWai finishVar = bracket
    (return ())
    (const $ putMVar finishVar ())
    (const runApp)
  where
    -- change this to runAppNormal or runAppTls to compare
    runApp = runAppTls
    runAppNormal = runSettings warpSettings waiPage
    runAppTls = runTLS siteTlsSettings warpSettings waiPage
    --
    warpSettings = setPort 8080 defaultSettings
    siteTlsSettings = tlsSettings "cert.pem" "key.pem"

main :: IO ()
main = do
    finishVar <- newEmptyMVar
    thread_id <- forkIO $ runWai finishVar
    -- Try to kill the warp thread. Fork, because throwTo might hang.
    forkIO $ throwTo thread_id ThreadKilled
    threadDelay (2 * 10^6) -- microseconds to seconds
    isAlive <- isEmptyMVar finishVar
    if isAlive then putStrLn "Couldn't kill warp thread"
               else putStrLn "Successfully killed warp thread"
    -- Wait for the forked warp thread to finish
    readMVar finishVar
When you have runApp = runAppNormal, you get the "Successfully killed warp thread" message, and the application exits.
When you have runApp = runAppTls, you get the "Couldn't kill warp thread" message, and the app hangs and keeps serving.
So how do I get rid of this exception-intercepting behavior? Or, at least, is there any other way to kill the warp-tls thread?
It turned out to be a Windows-only bug in the version I used (warp-tls == 3.2.4) that was fixed in a later release. I've looked at the fix; it's a function named windowsThreadBlockHack, so if anyone is stuck with a slightly outdated warp, you can backport the fix yourself.
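For anyone curious, the workaround has roughly the following shape (reconstructed from memory, not a verbatim copy of warp's code). On Windows, a thread blocked in a foreign call never receives asynchronous exceptions, so the trick is to run the blocking action in a child thread while the parent waits on an MVar, since takeMVar is always interruptible:

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, throwIO, try)

-- Run the action in a child thread; the parent blocks on takeMVar,
-- which can be interrupted by ThreadKilled even while the child is
-- stuck inside a foreign call.
windowsThreadBlockHack :: IO a -> IO a
windowsThreadBlockHack act = do
    var <- newEmptyMVar
    _ <- forkIO $ try act >>= putMVar var
    res <- takeMVar var
    case res of
        Left e  -> throwIO (e :: SomeException)
        Right r -> return r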
I'm writing an app using yesod-websockets, and whenever I receive a "Start" message, I need to run a thread that continuously produces data and sends it to the client, until the client tells it to stop. The producer may also stop producing data on its own.
When the producer stops (regardless of whether it stopped producing on its own, or the client told it to stop), it goes back to the main loop and waits for another "Start" message to be received.
The code looks something like this (runnable minimal repro below):
wsApp :: WebSocketsT Handler ()
wsApp =
    forever $ do
        msg <- receiveMsg
        case msg of
            StartMsg ->
                race_
                    produceData
                    (whileM ((/= StopMsg) <$> receiveMsg))
The problem is that if produceData stops on its own, then the thread running receiveMsg will be cancelled, and this causes the websocket connection to be closed.
21/Mar/2018:08:21:06 +0000 [Error#yesod] Exception from Warp: ConnectionClosed #(app-0.0.0-5bzI9Onrk2fFepGGsdocDz:Application src/Application.hs:122:15)
Is there a way to cancel a thread that's listening on the connection, without killing the connection?
Here's a minimal repro:
wsApp :: WebSocketsT Handler ()
wsApp = forever $ do
    race_
        (receiveData :: WebSocketsT Handler Text)
        (pure ())
    $logDebug "Trying again"
The second thread will finish, and the first one will be cancelled, causing the connection to be killed.
Since I couldn't find a more graceful solution, I ended up using an IORef Bool to synchronize the websockets thread and the producer thread.
wsApp :: WebSocketsT Handler ()
wsApp = do
    producing <- newIORef False
    forever $ do
        msg <- receiveMsg
        case msg of
            StartMsg ->
                whenM (not <$> readIORef producing) $ do
                    atomicWriteIORef producing True
                    void . async $ produceData producing
            StopMsg -> atomicWriteIORef producing False

produceData :: IORef Bool -> WebSocketsT Handler ()
produceData producing =
    whenM (readIORef producing) $
        case produce of
            Nothing -> atomicWriteIORef producing False
            Just x  -> sendMsg x >> produceData producing
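Here receiveMsg, sendMsg, and produce are helpers elided from the post. To make the sketch self-contained, a hypothetical Msg type and receiveMsg (assuming the client sends the literal text frames "start" and "stop", and Handler is the site's Yesod handler monad) might look like:

{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Yesod.WebSockets (WebSocketsT, receiveData)

-- Hypothetical message type and decoder; the original post elides them.
data Msg = StartMsg | StopMsg | OtherMsg
    deriving (Eq, Show)

receiveMsg :: WebSocketsT Handler Msg
receiveMsg = do
    frame <- receiveData :: WebSocketsT Handler Text
    pure $ case frame of
        "start" -> StartMsg
        "stop"  -> StopMsg
        _       -> OtherMsg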
I'm trying to run a RabbitMQ background process on Heroku to pick tasks off a queue and process them. I'm working with the amqp Haskell library, and it gives the following example (parts omitted for brevity):
main = do
    -- setup connection omitted
    -- connect to queue, wait for messages
    consumeMsgs chan "myQueue" Ack myCallback
    -- halts the process so messages are taken off the queue until a key is pressed
    getLine -- wait for keypress
    closeConnection conn -- close connection after keypress
    putStrLn "connection closed"
This works fine locally, because getLine keeps the process running until you press a key. However, when I deploy this to Heroku the process exits with
2016-04-19T08:37:23.373087+00:00 app[worker.1]: worker: <stdin>: hGetChar: end of file
I figured out from the accepted answer to this question that this happens because, in order to deploy a background process via ssh, you need to redirect /dev/null to stdin, which makes stdin hit end-of-file.
In our case getLine fails on that EOF and the entire process stops, preventing our worker from staying up.
How can I keep this worker running when I deploy?
EDIT: Final Solution
Using @carston's comments, I ended up with the following implementation, which worked:
main :: IO ()
main = do
    mvar <- newEmptyMVar
    conn <- setupConnection
    queueName <- pack <$> getEnv "QUEUE_NAME"
    chan <- openChannel conn
    consumeMsgs chan queueName Ack processMessage
    installHandler sigINT (Catch (cleanupConnection conn mvar)) Nothing
    putStrLn "Running forever, press ctrl-c to exit"
    -- This blocks until SIGINT is received and the handler for SIGINT
    -- "fills" the mvar; once it is filled, the process exits.
    takeMVar mvar

mixpanelConfig :: IO Config
mixpanelConfig = liftM2 Config
    (ApiToken . pack <$> getEnv "MIXPANEL_API_TOKEN")
    (newManager tlsManagerSettings)

cleanupConnection :: Connection -> MVar () -> IO ()
cleanupConnection conn mvar = do
    putStrLn "SIGINT received... closing rabbitmq connection"
    closeConnection conn
    putMVar mvar ()

processMessage :: (Message, Envelope) -> IO ()
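Note that installHandler and sigINT come from System.Posix.Signals in the unix package, so this approach is POSIX-only (which is fine on Heroku dynos). For reference, here is a plausible import list for the snippet above; Config and ApiToken would come from whatever Mixpanel client library is in use, and setupConnection and processMessage are defined elsewhere in the project:

import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (liftM2)
import Data.Text (pack)
import Network.AMQP (Ack (..), Connection, Envelope, Message,
                     closeConnection, consumeMsgs, openChannel)
import Network.HTTP.Client (newManager)
import Network.HTTP.Client.TLS (tlsManagerSettings)
import System.Environment (getEnv)
import System.Posix.Signals (Handler (Catch), installHandler, sigINT)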
As I pointed out in the comment, if you just want to keep the app running forever you can use forever with, for example, threadDelay:
import Control.Concurrent (threadDelay)
import Control.Monad (forever)

main = do
    -- setup connection omitted
    -- connect to queue, wait for messages
    consumeMsgs chan "myQueue" Ack myCallback
    -- halts the process so messages are taken off the queue forever
    forever $ threadDelay 10000
    -- so this will never happen and you could remove it
    closeConnection conn -- close connection
    putStrLn "connection closed"
Note that this will of course never really close the connection or exit the application; you'll have to kill the process.
The alternative is a bit more involved, as you need some message or other way to send your program a termination signal.
An easy way is to use an MVar, which you could set in your myCallback when a certain stop message is received on your queue:
import Control.Concurrent.MVar

main = do
    -- MVar to receive the quit signal
    quitSignal <- newEmptyMVar
    -- setup connection omitted
    -- connect to queue, wait for messages; the callback should
    -- set the quitSignal with putMVar quitSignal ()
    consumeMsgs chan "myQueue" Ack (myCallback quitSignal)
    -- halts the process so messages are taken off the queue till quitSignal
    takeMVar quitSignal
    -- so this will happen after quitSignal was set
    closeConnection conn -- close connection
    putStrLn "connection closed"
I'm running a Haskell websocket server using Wai:
application :: MVar ServerState -> Wai.Application
application state = WaiWS.websocketsOr WS.defaultConnectionOptions wsApp staticApp
  where
    wsApp :: WS.ServerApp
    wsApp pendingConn = do
        conn <- WS.acceptRequest pendingConn
        talk conn state
To allow a single client to send asynchronous messages, talk is defined as follows:
talk :: WS.Connection -> MVar ServerState -> IO ()
talk conn state = forever $ do
    msg <- WS.receiveMessage conn
    putStrLn "received message"
    let mid = serverMessageId msg
    successLock <- newEmptyMVar
    tid <- timeoutAsync successLock mid $ processMessage conn state msg
    putStrLn "forked thread"
    modifyMVar_ state $ \curState ->
        return $ curState & threads %~ M.insert mid tid -- thread bookkeeping
    putStrLn "modified state"
    putMVar successLock ()
    putStrLn "unlocked success"
  where
    timeoutAsync lock mid f = forkIO $
        timeout S.process_message_timeout onTimeout (onSuccess lock mid) f
    onSuccess lock mid = do
        -- block until the first modifyMVar_ above finishes
        takeMVar lock
        modifyMVar_ state $ \curState ->
            return $ curState & threads %~ M.delete mid -- thread cleanup
    onTimeout = ...
Here's the thing: when I bombard this server with many CPU-heavy messages (from a single client), the main thread occasionally hangs at "forked thread".
This is surprising because all work on messages is (in theory) done in separate threads, so the main loop (forever) should never block.
What's going on here?
[EDIT]
A minimal verifiable example is pretty hard to provide in this case (the work is done in processMessage, which comprises a lot of moving parts, any of which might be the problem). Instead, I'm looking for high-level pointers to things I could investigate.
Here is data from an example run (send the server an expensive request, then a bunch of smaller less-expensive ones):
gc productivity 36%: http://puu.sh/nSxnj/d8bb5995ae.png
event log (using +RTS -ls and -eventlog): http://puu.sh/nSxDy/efe457bee2.eventlog
CPU usage is ~300% (for 4 capabilities), which made me think GC might be competing for OS resources; I decreased the number of capabilities to n-1, and this seemed to improve responsiveness
Also, the app has the following properties, which I think are potential causes of the problem:
ratio of GC'd to live data is high; processMessage basically constructs a giant list which is aeson'd and sent back to the user, but not kept in state
many foreign calls are made on a single request (due to ZMQ, which IIRC makes unsafe foreign calls; see the FFI sketch after this list)
ThreadScope tells me that lots of heap overflows occur, triggering GC requests
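On that last pair of points: an unsafe foreign call cannot be preempted, so a capability stuck in one delays every stop-the-world GC that those heap overflows request, and everything (including the receive loop) waits for the sync. Here is a hedged sketch of the distinction at the FFI level, using zmq_send's C prototype as an illustration (these are not the actual bindings the zeromq package uses):

{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Ptr (Ptr)

-- An unsafe import is cheap to call but blocks its capability and
-- cannot be interrupted for GC, so GC syncs queue up behind it.
foreign import ccall unsafe "zmq_send"
    c_zmq_send_unsafe :: Ptr () -> Ptr () -> CSize -> CInt -> IO CInt

-- A safe import releases the capability for the call's duration,
-- letting the RTS run other threads and start GC on time.
foreign import ccall safe "zmq_send"
    c_zmq_send_safe :: Ptr () -> Ptr () -> CSize -> CInt -> IO CInt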
The following yields a deadlock error message (*** Exception: thread blocked indefinitely in an MVar operation). I've thought it through step by step, and I don't see the issue.
On the main thread, an MVar is created and given to producer, which runs on a new thread.
producer starts, and blocks at listenOn, waiting for a connection.
The main thread continues into its loop and blocks, waiting for the MVar to receive something.
Once producer gets a connection, it continues into its loop, and after receiving something from the socket, puts it into the MVar.
Meaning (as far as I understand it), it should end up with producer putting something into the MVar and main waiting to receive something.
If it's getting stuck because listenOn doesn't connect immediately, how can I get around this? The MVar needs to be created in main, before producer is forked, so that it can be passed in.
import Control.Concurrent
import Network
import Network.Socket
import System.IO

getSockInfo :: Socket -> IO String
getSockInfo s = do
    info <- getPeerName s
    return $ case info of
        SockAddrInet port addr ->
            "Addr/Port: " ++ show addr ++ " / " ++ show port
        SockAddrInet6 port flow addr scope ->
            "Addr/Port: " ++ show addr ++ " / " ++ show port
                ++ "Flow/Scope: " ++ show flow ++ " / " ++ show scope

producer :: MVar String -> IO ()
producer m = do
    s <- listenOn (PortNumber 5555)
    putStrLn "Listening..."
    info <- getSockInfo s
    putStrLn $ "Connected to " ++ info
    h <- socketToHandle s ReadMode
    loop h m
  where
    loop h m = do
        message <- hGetLine h
        putMVar m message
        loop h m

main :: IO ()
main = withSocketsDo $ do
    m <- newEmptyMVar
    prod <- forkIO $ producer m
    loop m
  where
    loop m = do
        n <- takeMVar m
        print n
        loop m
listenOn returns immediately but doesn't give you a connected socket, so attempts to use it or read from it fail. I'm not sure why you aren't seeing an error message to indicate that, since I do when I run your code. In any case the listening thread is probably dying at that point, which leaves the main thread deadlocked as nothing can write to the MVar.
Using accept after listenOn to wait for a remote connection should fix this.
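A minimal sketch of that fix, using accept from the Network module, which blocks until a remote client connects and yields a ready-to-read Handle (note that Network.Socket exports a different accept, so you may need to import it qualified to avoid the clash); everything else from the question stays the same:

producer :: MVar String -> IO ()
producer m = do
    sock <- listenOn (PortNumber 5555)
    putStrLn "Listening..."
    -- accept blocks here until a client actually connects
    (h, host, port) <- accept sock
    putStrLn $ "Connected to " ++ host ++ " / " ++ show port
    loop h
  where
    loop h = do
        message <- hGetLine h
        putMVar m message
        loop h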