Haskell: thread blocked indefinitely in an STM transaction - multithreading

Is there any way to increase the time interval on the basis of which the RTS decides that a thread has blocked indefinitely in an STM transaction?
Here is my code:
import Control.Concurrent (ThreadId)
import Control.Concurrent.MVar (MVar, newMVar, withMVar)
import Control.Concurrent.STM
import qualified Control.Concurrent.ThreadManager as TM

data ThreadManager = ThreadManager { tmCounter :: TVar Int, tmTM :: MVar TM.ThreadManager }

data Settings = Settings {
  maxThreadsCount :: Int }

createThreadManager :: Settings -> IO ThreadManager
createThreadManager s = do
  counter <- atomically $ newTVar (maxThreadsCount s)
  tm <- TM.make >>= newMVar
  return $ ThreadManager counter tm

forkManaged :: ThreadManager -> IO () -> IO ThreadId
forkManaged tm fn = do
  atomically $ do
    counter <- readTVar $ tmCounter tm
    check $ counter > 0
    writeTVar (tmCounter tm) (counter - 1)
  withMVar (tmTM tm) $ \thrdmgr -> TM.fork thrdmgr $ do
    fn
    atomically $ do
      counter <- readTVar $ tmCounter tm
      writeTVar (tmCounter tm) (counter + 1)
forkManaged makes sure that the number of simultaneously running managed threads does not exceed maxThreadsCount. It works fine until the load gets heavy. Under heavy load the RTS throws this exception. I think that under heavy load, with hard concurrent competition for resources, some threads simply don't get a chance to access the STM context in time. So I think that increasing the time interval after which the RTS decides to throw this exception might solve the problem.

Daniel Wagner is right: the decision is not made with timeouts. The relevant code in the RTS is in Schedule.c.
See the resurrectThreads function for where the exception is thrown. The comment there explains that it is only thrown to threads found to be garbage after a GC. ezyang described how this works for MVars: http://blog.ezyang.com/2011/07/blockedindefinitelyonmvar/
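To make that concrete, here is a minimal sketch of my own (not from the question) showing the GC-driven detection: a thread that blocks on a TVar reachable only from itself is found to be garbage at the next major GC and receives BlockedIndefinitelyOnSTM. Whether and when the exception actually arrives depends on GC timing, so treat it as illustrative only.

import Control.Concurrent
import Control.Concurrent.STM
import Control.Exception
import System.Mem (performGC)

main :: IO ()
main = do
  -- Fork a thread that blocks on a TVar nobody else can ever write to.
  _ <- forkIO $ do
    tv <- atomically (newTVar False)
    r <- try (atomically (readTVar tv >>= check))
           :: IO (Either BlockedIndefinitelyOnSTM ())
    print r    -- expected: Left thread blocked indefinitely in an STM transaction
  threadDelay 100000
  performGC          -- a major GC lets the RTS notice the blocked thread is unreachable
  threadDelay 100000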
[bad speculation concerning check removed when I checked its source and realized that it was just a simple guard/retry and not what was described in an earlier paper -- oops! I now suspect that Daniel Wagner is correct here as well, and the issue is somehow that the counter isn't being incremented.]
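If the real problem is that the counter is sometimes never incremented back (for example because fn throws before the final transaction runs), a sketch of a more defensive forkManaged would restore the counter in a finally block. This is an assumption about the failure mode, not a confirmed diagnosis:

import Control.Exception (finally)

-- Hypothetical variant of forkManaged: the slot is returned even if fn throws.
forkManaged' :: ThreadManager -> IO () -> IO ThreadId
forkManaged' tm fn = do
  atomically $ do
    counter <- readTVar (tmCounter tm)
    check (counter > 0)
    writeTVar (tmCounter tm) (counter - 1)
  withMVar (tmTM tm) $ \thrdmgr ->
    TM.fork thrdmgr $
      fn `finally` atomically (modifyTVar' (tmCounter tm) (+ 1))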

Related

forkIO seems to block on haskell websocket server

I'm running a haskell websocket server using Wai:
application :: MVar ServerState -> Wai.Application
application state = WaiWS.websocketsOr WS.defaultConnectionOptions wsApp staticApp
  where
    wsApp :: WS.ServerApp
    wsApp pendingConn = do
      conn <- WS.acceptRequest pendingConn
      talk conn state
To allow a single client to send asynchronous messages, talk is defined as follows:
talk :: WS.Connection -> MVar ServerState -> IO ()
talk conn state = forever $ do
  msg <- WS.receiveMessage conn
  putStrLn "received message"
  successLock <- newEmptyMVar
  tid <- timeoutAsync successLock $ processMessage conn state msg
  putStrLn "forked thread"
  modifyMVar_ state $ \curState ->
    return $ curState & threads %~ (M.insert mid tid) -- thread bookkeeping
  putStrLn "modified state"
  putMVar successLock ()
  putStrLn "unlocked success"
  where
    mid = serverMessageId msg
    timeoutAsync lock f = forkIO $ do
      timeout S.process_message_timeout onTimeout (onSuccess lock) f
    onSuccess lock = do
      -- block until the first modifyMVar_ above finishes.
      takeMVar lock
      modifyMVar_ state $ \curState ->
        return $ curState & threads %~ (M.delete mid) -- thread cleanup
    onTimeout = ...
Here's the thing: when I bombard this server with many CPU-heavy messages (from a single client), the main thread occasionally hangs at "forked thread".
This is surprising because all of the work on messages is (in theory) done in separate threads, so the main forever loop should never block.
What's going on here?
[EDIT]
A minimal verifiable example is pretty hard to provide in this case (the work is done in processMessage, but it comprises a lot of moving parts, any of which might be the problem). Instead, I'm looking for high-level pointers to things I could investigate.
Here is data from an example run (send the server an expensive request, then a bunch of smaller less-expensive ones):
gc productivity 36%: http://puu.sh/nSxnj/d8bb5995ae.png
event log (using +RTS -ls and -eventlog): http://puu.sh/nSxDy/efe457bee2.eventlog
CPU usage ~300% (for 4 caps) -- this made me think the GC might be competing for OS resources; I decreased the number of capabilities to n-1, and this seemed to improve responsiveness (see the sketch after the next list)
Also, the app has the following properties, which I think are potential causes of the problem:
ratio of GC'd to live data is high; processMessage basically constructs a giant list which is aeson'd and sent back to the user, but not kept in state
many foreign calls are made on a single request (due to ZMQ, which IIRC makes unsafe foreign calls)
ThreadScope tells me that lots of heap overflows occur, causing GC requests
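Given the observation above about capabilities, one cheap experiment is to leave a core free for the OS and the GC at startup; setNumCapabilities lives in Control.Concurrent and getNumProcessors in GHC.Conc. This is a guess based on the reported symptoms, not a diagnosis:

import Control.Concurrent (setNumCapabilities)
import GHC.Conc (getNumProcessors)

-- Sketch: use one fewer capability than the machine has cores, so the RTS
-- and the OS have some headroom while the worker threads are busy.
tuneCapabilities :: IO ()
tuneCapabilities = do
  n <- getNumProcessors
  setNumCapabilities (max 1 (n - 1))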

Why does my parallel traversal Haskell program leak memory?

Consider the following Haskell program (I'm doing this mostly for learning purposes):
import qualified Control.Concurrent.MSem as Sem
import System.Environment (getArgs)
import Control.Concurrent (forkIO)
import Control.Monad

-- Traverse with maximum n threads
parallelTraverse :: Foldable a => Int -> (b -> IO ()) -> a b -> IO ()
parallelTraverse n action values = do
  sem <- Sem.new n
  forM_ values $ \value -> Sem.with sem (forkIO $ action value)

main :: IO ()
main = do
  args <- getArgs
  let nThreads = read . head $ args :: Int
  parallelTraverse nThreads print [(1::Int)..]
When I run it, memory quickly climbs to several GB. I tried various combinations to make sure I discard the results of the intermediate computations (the print actions). Why is it still leaking space?
First of all, you have an evident mistake in the following piece:
  Sem.with sem (forkIO $ action value)
You're acquiring the semaphore in the master thread around the "fork" operation instead of around the action itself. The proper way to implement it is:
  forkIO (Sem.with sem (action value))
I.e., acquire the semaphore from the context of the forked thread.
Secondly, in the following code you're calling the parallelTraverse operation on an infinite list:
parallelTraverse nThreads print [(1::Int)..]
This results in the infinite forking of threads. And since the forkIO operation is roughly instantaneous for the calling thread, it's no surprise that you run out of resources quite soon.
To use the semaphore to limit the number of worker threads, the with pattern simply won't do in your case. Instead you should use the explicit combination of wait and signal, and not forget to handle exceptions properly (in case you expect them). E.g.:
-- needs: import Control.Exception (finally)
parallelTraverse :: Foldable a => Int -> (b -> IO ()) -> a b -> IO ()
parallelTraverse n action values = do
  sem <- Sem.new n
  forM_ values $ \value -> do
    Sem.wait sem
    forkIO $ finally (action value) (Sem.signal sem)
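For completeness, here is a hypothetical variant (my own sketch, not part of the answer) that also waits for all workers to finish, by re-acquiring every slot once the traversal is done; each worker signals exactly once, so this acts as a completion barrier.

import qualified Control.Concurrent.MSem as Sem
import Control.Concurrent (forkIO)
import Control.Exception (finally)
import Control.Monad (forM_, replicateM_)

-- Sketch: like parallelTraverse, but blocks until every forked worker has finished.
parallelTraverse_ :: Foldable t => Int -> (b -> IO ()) -> t b -> IO ()
parallelTraverse_ n action values = do
  sem <- Sem.new n
  forM_ values $ \value -> do
    Sem.wait sem
    forkIO (action value `finally` Sem.signal sem)
  replicateM_ n (Sem.wait sem)  -- drain all n slots: succeeds only when every worker has signalled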

Stop threads from interleaving output

The following program creates two threads running concurrently, that each sleep for a random amount of time, before printing a line of text to stdout.
import Control.Concurrent
import Control.Monad
import System.Random

randomDelay t = randomRIO (0, t) >>= threadDelay

printer str = forkIO . forever $ do
  randomDelay 1000000 -- μs
  putStrLn str

main = do
  printer "Hello"
  printer "World"
  return ()
The output generally looks something like
>> main
Hello
World
World
Hello
WoHrelld
o
World
Hello
*Interrupted
>>
How do you ensure that only one thread can write to stdout at a time? This seems like the kind of thing that STM should be good at, but all STM transactions must have the type STM a for some a, and an action that prints to the screen has type IO a, and there doesn't seem to be a way to embed IO into STM.
The way to handle output with STM is to have an output queue that is shared between all threads and which is processed by a single thread.
import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad
import System.Random

randomDelay t = randomRIO (0, t) >>= threadDelay

printer queue str = forkIO . forever $ do
  randomDelay 1000000 -- μs
  atomically $ writeTChan queue str

prepareOutputQueue = do
  queue <- newTChanIO
  forkIO . forever $ atomically (readTChan queue) >>= putStrLn
  return queue

main = do
  queue <- prepareOutputQueue
  printer queue "Hello"
  printer queue "World"
  return ()
Locking in the way you're describing isn't possible using STM. This is because STM is based on optimistic locking, so every transaction must be restartable at any point. If you embedded an IO operation in STM, it could be executed multiple times.
Probably the easiest solution for this problem is to use an MVar as a lock:
import Control.Concurrent
import Control.Concurrent.MVar
import Control.Monad
import System.Random

randomDelay t = randomRIO (0, t) >>= threadDelay

printer lock str = forkIO . forever $ do
  randomDelay 1000000
  withMVar lock (\_ -> putStrLn str)

main = do
  lock <- newMVar ()
  printer lock "Hello"
  printer lock "World"
  return ()
In this solution the lock is passed as an argument to printer.
Some people prefer to declare the lock as a top-level global variable, but currently this requires unsafePerformIO and relies on properties of GHC that AFAIK aren't part of the Haskell Language Report (in particular, it relies on the fact that a global variable with non-polymorphic type is evaluated at most once during the execution of a program).
A bit of research, based on Petr Pudlák's answer shows that there is a module Control.Concurrent.Lock in the concurrent-extra package that provides an abstraction around MVar ()-based locks.
The solution using that library is
import Control.Concurrent
import qualified Control.Concurrent.Lock as Lock
import Control.Monad
import System.Random

randomDelay t = randomRIO (0, t) >>= threadDelay

printer lock str = forkIO . forever $ do
  randomDelay 1000
  Lock.with lock (putStrLn str)

main = do
  lock <- Lock.new
  printer lock "Hello"
  printer lock "World"
  return ()
This is an example using a global lock, as mentioned by Petr.
import Control.Concurrent
import Control.Monad
import System.Random
import Control.Concurrent.MVar (newMVar, takeMVar, putMVar, MVar)
import System.IO.Unsafe (unsafePerformIO)

{-# NOINLINE lock #-}
lock :: MVar ()
lock = unsafePerformIO $ newMVar ()

printer x = forkIO . forever $ do
  randomDelay 100000
  () <- takeMVar lock
  let atomicPutStrLn str = putStrLn str >> putMVar lock ()
  atomicPutStrLn x

randomDelay t = randomRIO (0, t) >>= threadDelay

main = do
  printer "Hello"
  printer "World"
  return ()
You can actually implement a lock using STM if you want, though an MVar will almost certainly perform better.
{-# LANGUAGE LambdaCase #-}
import Control.Concurrent.STM
import Control.Exception (Exception (..))

newtype Lock = Lock (TVar Status)

data Status = Locked | Unlocked

newLocked :: IO Lock
newLocked = Lock <$> newTVarIO Locked

newUnlocked :: IO Lock
newUnlocked = Lock <$> newTVarIO Unlocked

-- | Acquire a lock.
acquire :: Lock -> IO ()
acquire (Lock tv) = atomically $
  readTVar tv >>= \case
    Locked -> retry
    Unlocked -> writeTVar tv Locked

-- | Try to acquire a lock. If the operation succeeds,
-- return `True`.
tryAcquire :: Lock -> IO Bool
tryAcquire (Lock tv) = atomically $
  readTVar tv >>= \case
    Locked -> pure False
    Unlocked -> True <$ writeTVar tv Locked

-- | Release a lock. This version throws an exception
-- if the lock is unlocked.
release :: Lock -> IO ()
release (Lock tv) = atomically $
  readTVar tv >>= \case
    Unlocked -> throwSTM DoubleRelease
    Locked -> writeTVar tv Unlocked

data DoubleRelease = DoubleRelease deriving Show

instance Exception DoubleRelease where
  displayException ~DoubleRelease = "Attempted to release an unlocked lock."

-- | Release a lock. This version does nothing if
-- the lock is unlocked.
releaseIdempotent :: Lock -> IO ()
releaseIdempotent (Lock tv) = atomically $ writeTVar tv Unlocked

-- | Get the status of a lock.
isLocked :: Lock -> IO Status
isLocked (Lock tv) = readTVarIO tv
acquire/release pairs need careful masking and exception handling, much like primitive MVar operations. The documentation suggests, but does not actually state, that STM operations are interruptible when they retry; assuming this is true, the same approach used for withMVar will work here. Note: I've opened a GHC ticket to document retry interruptibility.
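Assuming retry really is interruptible, a bracket-based wrapper in the spirit of withMVar might look like this sketch (withLock is my name, not part of the code above; bracket_ handles the masking and releases the lock even if the action throws):

import Control.Exception (bracket_)

-- Sketch: run an action while holding the lock, releasing it on any exit path.
withLock :: Lock -> IO a -> IO a
withLock lock action = bracket_ (acquire lock) (release lock) action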

Killing a thread when MVar is garbage collected

I have a worker thread which reads data repeatedly from an MVar and performs some useful work on that. After a while, the rest of the program forgets about that worker thread, which means that it will wait on an empty MVar and become very lonely. My question is:
Will the MVar be garbage collected if threads no longer write to it, for instance because they all wait for it?
Will garbage collection kill the waiting threads?
If neither, can I somehow indicate to the compiler that the MVar should be garbage collected and the thread be killed?
EDIT: I should probably clarify the purpose of my question. I don't want general protection against deadlocks; instead, what I would like to do is to tie the life of the worker thread to the life of a value (as in: dead values are reclaimed by garbage collection). In other words, the worker thread is a resource that I would like to free not by hand, but when a certain value (the MVar or a derivative) is garbage collected.
Here an example program that demonstrates what I have in mind
import Control.Concurrent
import Control.Concurrent.MVar

main = do
  something
  -- the thread forked in something can be killed here
  -- because the MVar used for communication is no longer in scope
  etc

something = do
  v <- newEmptyMVar
  forkIO $ forever $ work =<< takeMVar v
  putMVar v "Haskell"
  putMVar v "42"
In other words, I want the thread to be killed when I can no longer communicate with it, i.e. when the MVar used for communication is no longer in scope. How to do that?
It will just work: when the MVar is only reachable by the thread that is blocked on it, then the thread is sent the BlockedIndefinitelyOnMVar exception, which will normally cause it to die silently (the default exception handler for a thread ignores this exception).
BTW, for doing some cleanup when the thread dies, you'll want to use forkFinally (which I just added to Control.Concurrent).
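For example, here is a sketch of that cleanup pattern (startWorker and the messages are mine, purely illustrative); the handler runs both on normal exit and when the thread dies from BlockedIndefinitelyOnMVar:

import Control.Concurrent (forkFinally)
import Control.Concurrent.MVar
import Control.Monad (forever)

-- Sketch: the worker from the question, with a cleanup handler attached.
startWorker :: (String -> IO ()) -> IO (MVar String)
startWorker work = do
  v <- newEmptyMVar
  _ <- forkFinally (forever (work =<< takeMVar v))
                   (\_ -> putStrLn "worker exiting")  -- runs on normal exit or any exception
  return v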
If you're lucky, you'll get a "BlockedIndefinitelyOnMVar", indicating that you're waiting on an MVar that no thread will ever write to.
But, to quote Ed Yang,
GHC only knows that a thread can be considered garbage if there are no references to the thread. Who is holding a reference to the thread? The MVar, as the thread is blocking on this data structure and has added itself to the blocking list of this. Who is keeping the MVar alive? Why, our closure that contains a call to takeMVar. So the thread stays.
So, without a bit of work (which would, by the way, be quite interesting to see), BlockedIndefinitelyOnMVar is not an obviously useful mechanism for giving your Haskell programs deadlock protection.
GHC simply can't solve the general problem of knowing whether your thread will make progress.
A better approach would be to explicitly terminate threads by sending them a Done message. E.g. just lift your message type into an optional value that also includes an end-of-message value:
import Control.Concurrent
import Control.Concurrent.MVar
import Control.Monad
import Control.Exception
import Prelude hiding (catch)

main = do
  something
  threadDelay (10 * 10^6)
  print "Still here"

something = do
  v <- newEmptyMVar
  forkIO $
    finally
      (let go = do x <- takeMVar v
                   case x of
                     Nothing -> return ()
                     Just v -> print v >> go
       in go)
      (print "Done!")
  putMVar v $ Just "Haskell"
  putMVar v $ Just "42"
  putMVar v Nothing
and we get the correct cleanup:
$ ./A
"Haskell"
"42"
"Done!"
"Still here"
I tested the simple weak-MVar approach, and the MVar did get finalized and the thread killed. The code is:
import Control.Monad
import Control.Exception
import Control.Concurrent
import Control.Concurrent.MVar
import System.Mem (performGC)
import System.Mem.Weak

dologger :: MVar String -> IO ()
dologger mv = do
  tid <- myThreadId
  weak <- mkWeakPtr mv (Just (putStrLn "X" >> killThread tid))
  logger weak

logger :: Weak (MVar String) -> IO ()
logger weak = act where
  act = do
    v <- deRefWeak weak
    case v of
      Just mv -> do
        a <- try (takeMVar mv) :: IO (Either SomeException String)
        print a
        either (\_ -> return ()) (\_ -> act) a
      Nothing -> return ()

play mv = act where
  act = do
    c <- getLine
    if c == "quit" then return ()
      else putMVar mv c >> act

doplay mv = do
  forkIO (dologger mv)
  play mv

main = do
  putStrLn "Enter a string to escape, or quit to exit"
  mv <- newEmptyMVar
  doplay mv
  putStrLn "*"
  performGC
  putStrLn "*"
  yield
  putStrLn "*"
  threadDelay (10^6)
  putStrLn "*"
The session with the program was:
(chrisk)-(/tmp)
(! 624)-> ghc -threaded -rtsopts --make weak2.hs
[1 of 1] Compiling Main ( weak2.hs, weak2.o )
Linking weak2 ...
(chrisk)-(/tmp)
(! 625)-> ./weak2 +RTS -N4 -RTS
Enter a string to escape, or quit to exit
This is a test
Right "This is a test"
Tab Tab
Right "Tab\tTab"
quit
*
*
X
*
Left thread killed
*
So blocking on takeMVar did not keep the MVar alive on ghc-7.4.1 despite expectations.
While BlockedIndefinitelyOnMVar should work, also consider using ForeignPtr finalizers. The normal role of those is to free C structures that are no longer accessible from Haskell, but you can attach any IO finalizer to them.
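A sketch of that idea using Foreign.Concurrent (spawnTied and the nullPtr token are mine, purely illustrative): the worker is killed when the token becomes unreachable. Note that GHC does not guarantee that finalizers run promptly, or at all before program exit.

import Control.Concurrent (forkIO, killThread)
import Foreign.Concurrent (newForeignPtr)
import Foreign.ForeignPtr (ForeignPtr)
import Foreign.Ptr (nullPtr)

-- Sketch: tie a worker thread's lifetime to a ForeignPtr token.
-- When the token is garbage collected, its finalizer kills the worker.
spawnTied :: IO () -> IO (ForeignPtr ())
spawnTied worker = do
  tid <- forkIO worker
  newForeignPtr nullPtr (killThread tid)

-- Callers must keep the token reachable (touchForeignPtr helps) for as long
-- as they want the worker to stay alive.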

STM monad problem

This is just a hypothetical scenario to illustrate my question. Suppose that there are two threads and one TVar shared between them. In one thread there is an atomically block that reads the TVar and takes 10s to complete. In another thread is an atomically block that modifies the TVar every second. Will the first atomically block ever complete? Surely it will just keep going back to the beginning, because the log is perpetually in an inconsistent state?
As others have said: in theory there is no guarantee of progress. In practice there is also no guarantee of progress:
import Control.Monad -- not needed, but cleans some things up
import Control.Monad.STM
import Control.Concurrent.STM
import Control.Concurrent
import GHC.Conc
import System.IO

main = do
  tv <- newTVarIO 0
  forkIO (f tv)
  g tv

f :: TVar Int -> IO ()
f tv = forever $ do
  atomically $ do
    n <- readTVar tv
    writeTVar tv (n + 1)
    unsafeIOToSTM (threadDelay 100000)
  putStr "."
  hFlush stdout

g :: TVar Int -> IO ()
g tv = forever $ do
  atomically $ do
    n <- readTVar tv
    writeTVar tv (n + 1)
    unsafeIOToSTM (threadDelay 1000000)
  putStrLn "Done with long STM"
The above never says "Done with long STM" in my tests.
Obviously, if you think the computation is still going to be valid/pertinent, then you would want to either:
Leave the atomic block, perform the expensive computation, then re-enter the atomic block, confirm the assumptions are still valid, and update the value (sketched below). Potentially dangerous, but no more so than most locking strategies.
Memoize the results in the atomic block, so that the still-valid result is no more than a cheap lookup after a retry.
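Here is a sketch of the first option (the names are mine): read a snapshot in one cheap transaction, do the slow work outside STM, then verify the snapshot and commit, starting over if it changed in the meantime.

import Control.Concurrent.STM
import Control.Exception (evaluate)

-- Sketch of the leave-and-re-check pattern for an expensive update.
updateExpensively :: TVar Int -> (Int -> Int) -> IO ()
updateExpensively tv slowFn = loop
  where
    loop = do
      snapshot <- readTVarIO tv              -- cheap read, no transaction held open
      result <- evaluate (slowFn snapshot)   -- expensive work forced outside STM
      committed <- atomically $ do
        current <- readTVar tv
        if current == snapshot               -- is the assumption still valid?
          then writeTVar tv result >> return True
          else return False
      if committed then return () else loop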
STM prevents deadlock, but is still vulnerable to starvation. It is possible in a pathological case for the 1-second atomic action to always acquire the resource first.
However, the chances of this happening are very slim -- I don't believe I've ever seen it in practice.
For the semantics, see Composable Memory Transactions, section 6.5 "Progress". STM in Haskell guarantees only that some running transaction successfully commits (i.e. no deadlock), so in the worst case a long transaction can be starved indefinitely by shorter ones.
No, it would work fine. Exactly how the two threads would interact depends on the retry logic.
For example, let's say you have:
ten tv = do
  n <- readTVar tv
  when (n < 7) retry
  writeTVar tv 0
  -- do something that takes about 10 seconds

one tv = do
  modifyTVar tv (+1)
  -- do something that takes about 1 second
So the "ten" thread will be in retry state until the TVar reaches
the value 7, then it will proceed.
Note that you can't directly control how long these computations will take
inside the STM monad. That would be a side-effect, and side-effects are not
allowed in STM calculations. The only way to communicate with the outside
world is via values passed through transactional memory.
And that means that if the "baton-passing" logic through transactional memory is
correct, the program will work correctly independently of the exact amount
of time any part of it takes. That's part of the guarantee of STM.
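A runnable sketch of that baton-passing idea (the names, delays, and messages are mine; the delays stand in for the real work and happen outside the transactions):

import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad

main :: IO ()
main = do
  tv <- newTVarIO (0 :: Int)
  _ <- forkIO (oneLoop tv)
  tenLoop tv

-- Waits until the counter reaches 7, resets it, then does ~10 s of work outside STM.
tenLoop :: TVar Int -> IO ()
tenLoop tv = forever $ do
  atomically $ do
    n <- readTVar tv
    when (n < 7) retry
    writeTVar tv 0
  putStrLn "ten: got the baton"
  threadDelay (10 * 1000000)

-- Bumps the counter roughly once a second, outside the transaction.
oneLoop :: TVar Int -> IO ()
oneLoop tv = forever $ do
  atomically $ modifyTVar' tv (+ 1)
  threadDelay 1000000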
