I have a TChan as input for a thread which should behave like this:
If somebody writes to the TChan within a specific time, the content should be retrieved. If nothing is written within the specified time, it should unblock and continue with Nothing.
My attempt on this was to use the timeout function from System.Timeout like this:
timeout 1000000 $ atomically $ readTChan pktChannel
This seemed to work, but now I discovered that I am sometimes losing packets (they are written to the channel, but not read on the other side). In the log I get this:
2014.063.11.53.43.588365 Pushing Recorded Packet: 2 1439
2014.063.11.53.43.592319 Run into timeout
2014.063.11.53.44.593396 Run into timeout
2014.063.11.53.44.593553 Pushing Recorded Packet: 3 1439
2014.063.11.53.44.597177 Sending Recorded Packet: 3 1439
Where "Pushing Recorded Packet" is the writing from the one thread and "Sending Recorded Packet" is the reading from the TChan in the sender thread. The line with Sending Recorded Packet 2 1439 is missing, which would indicate a successful read from the TChan.
It seems that if the timeout is received at the wrong point in time, the channel loses the packet. I suspect that the killThread function used inside timeout and STM don't play well together.
Is this correct? Does somebody have another solution that does not lose the packet?
Use registerDelay, an STM function, to signal a TVar when the timeout is reached. You can then use the orElse function or the Alternative operator <|> to select between the next TChan value or the timeout.
import Control.Applicative
import Control.Monad
import Control.Concurrent
import Control.Concurrent.STM
import System.Random
-- write random values after a random delay
packetWriter :: Int -> TChan Int -> IO ()
packetWriter maxDelay chan = do
  let xs = randomRs (10000 :: Int, maxDelay + 50000) (mkStdGen 24036583)
  forM_ xs $ \x -> do
    threadDelay x
    atomically $ writeTChan chan x
-- block (retry) until the delay TVar is set to True
fini :: TVar Bool -> STM ()
fini = check <=< readTVar
-- Read the next value from a TChan or timeout
readTChanTimeout :: Int -> TChan a -> IO (Maybe a)
readTChanTimeout timeoutAfter pktChannel = do
  delay <- registerDelay timeoutAfter
  atomically $
        Just <$> readTChan pktChannel
    <|> Nothing <$ fini delay
-- | Print packets until a timeout is reached
readLoop :: Show a => Int -> TChan a -> IO ()
readLoop timeoutAfter pktChannel = do
  res <- readTChanTimeout timeoutAfter pktChannel
  case res of
    Nothing  -> putStrLn "timeout"
    Just val -> do
      putStrLn $ "packet: " ++ show val
      readLoop timeoutAfter pktChannel
main :: IO ()
main = do
  let timeoutAfter = 1000000

  -- spin up a packet writer simulation
  pktChannel <- newTChanIO
  tid <- forkIO $ packetWriter timeoutAfter pktChannel

  readLoop timeoutAfter pktChannel
  killThread tid
The rule of thumb in concurrency is: if adding a sleep at some point inside an IO action matters, your program is not safe.
To understand why the code timeout 1000000 $ atomically $ readTChan pktChannel does not work, consider the following alternative implementation of atomically:
atomically' :: STM a -> IO a
atomically' action = do
  result <- atomically action
  threadDelay someTimeAmount
  return result
The above is equivalent to atomically, except for an extra, innocuous delay. Now it is easy to see that if timeout kills the thread during the threadDelay, the atomic action has already completed (consuming a message from the channel), yet timeout will return Nothing.
A simple fix to timeout n $ atomically ... could be the following:
smartTimeout :: Int -> STM a -> IO (Maybe a)
smartTimeout n action = do
  v <- atomically $ newEmptyTMVar
  _ <- timeout n $ atomically $ do
    result <- action
    putTMVar v result
  atomically $ tryTakeTMVar v
The above uses an extra transactional variable v to do the trick. The result value of the action is stored into v inside the same atomic block in which the action is run. The return value of timeout is not trusted, since it does not tell us if action was run or not. After that, we check the TMVar v, which will be full if and only if action was run.
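For instance, the read from the question could then become a one-liner (a hypothetical wrapper, not from the original answer, assuming the smartTimeout above):

readTChanWithTimeout :: Int -> TChan a -> IO (Maybe a)
readTChanWithTimeout n pktChannel = smartTimeout n (readTChan pktChannel)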
Instead of TChan a, use TChan (Maybe a). Your normal producer (of x) now writes Just x. Fork an extra "ticking" process that writes Nothing to the channel every x seconds. Then have a reader for the channel, and abort if you get two successive Nothings. This way you avoid exceptions, which may cause data to get lost in your case (but I am not sure).
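A minimal sketch of this idea; the names (readerWithTicker, tickMicros, handle) are illustrative and not from the original answer:

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Monad (forever)

readerWithTicker :: Int -> TChan (Maybe a) -> (a -> IO ()) -> IO ()
readerWithTicker tickMicros chan handle = do
    _ <- forkIO $ forever $ do          -- the "ticking" process
        threadDelay tickMicros
        atomically $ writeTChan chan Nothing
    let loop misses
          | misses >= (2 :: Int) = putStrLn "timeout: two ticks with no data"
          | otherwise = do
              m <- atomically $ readTChan chan
              case m of
                  Nothing -> loop (misses + 1)   -- a tick arrived, no data yet
                  Just x  -> handle x >> loop 0  -- real data resets the count
    loop 0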
Let's consider this simple Haskell program:
module Main where
import Control.Concurrent.STM
import Control.Concurrent
import Control.Exception
import Control.Monad
import Data.Maybe
import Data.Monoid
import Control.Applicative
terminator :: Either SomeException () -> IO ()
terminator r = print $ "Dying with " <> show r
doStuff :: TMVar () -> TChan () -> Int -> IO ()
doStuff writeToken barrier w = void $ flip forkFinally terminator $ do
  hasWriteToken <- isJust <$> atomically (tryTakeTMVar writeToken)
  case hasWriteToken of
    True -> do
      print $ show w <> "I'm the lead.."
      threadDelay (5 * 10^6)
      print "Done heavy work"
      atomically $ writeTChan barrier ()
    False -> do
      print $ show w <> " I'm the worker, waiting for the barrier..."
      myChan <- atomically $ dupTChan barrier
      _ <- atomically $ readTChan myChan
      print "Unlocked!"
main :: IO ()
main = do
  writeToken <- newTMVarIO ()
  barrier <- newBroadcastTChanIO
  _ <- forM [1..20] (doStuff writeToken barrier)
  threadDelay (20 * 10^6)
  return ()
It essentially models a concurrency scenario where a "lead" acquires the write token, does something, and the workers sync on a barrier and wait for the "green light" from the lead. This works, but if we replace the worker's "atomically" block with this:
_ <- atomically $ do
  myChan <- dupTChan barrier
  readTChan myChan
All the workers remain blocked indefinitely inside an STM transaction:
"Done heavy work"
"Dying with Right ()"
"Dying with Left thread blocked indefinitely in an STM transaction"
"Dying with Left thread blocked indefinitely in an STM transaction"
"Dying with Left thread blocked indefinitely in an STM transaction"
...
I suspect the key lies inside the semantic of atomically. Any idea?
Thanks!
Alfredo
I think this behavior comes from the definition of dupTChan, copied here for readability along with readTChan:
dupTChan :: TChan a -> STM (TChan a)
dupTChan (TChan _read write) = do
  hole <- readTVar write
  new_read <- newTVar hole
  return (TChan new_read write)

readTChan :: TChan a -> STM a
readTChan (TChan read _write) = do
  listhead <- readTVar read
  head <- readTVar listhead
  case head of
    TNil -> retry
    TCons a tail -> do
      writeTVar read tail
      return a
Inlining those functions, we get this STM block:
worker_block (TChan _read write) = do
  hole <- readTVar write
  new_read <- newTVar hole
  listhead <- readTVar new_read
  head <- readTVar listhead
  case head of
    TNil -> retry
    ...
When you try to run this block atomically, it makes a new read end from the tail of the channel, then calls readTVar on it. The tail is of course empty, so this readTVar will retry. However, when the lead writes to the channel, the act of writing invalidates this transaction! So every follower transaction will always have to retry.
In fact, I don't think there is any case where dupTChan >>= readTChan will ever result in anything other than the thread being blocked indefinitely on an STM transaction. You can reason this out from the documentation as well. dupTChan begins empty, so within a single atomic transaction it will never have any items unless that same transaction adds them.
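For reference, here is the working pattern from the question (dupTChan in one transaction, readTChan in a separate one) as a minimal sketch; waitForBarrier is an illustrative name:

import Control.Concurrent.STM

waitForBarrier :: TChan () -> IO ()
waitForBarrier barrier = do
    myChan <- atomically $ dupTChan barrier   -- transaction 1: obtain a private read end
    atomically $ readTChan myChan             -- transaction 2: block until the lead broadcasts ()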
I'm working on a Haskell network application and I use the actor pattern to manage multithreading. One thing I came across is how to store, for example, a set of client sockets/handles, which of course must be accessible to all threads and can change when clients log on/off.
Since I'm coming from the imperative world, I thought about some kind of lock mechanism, but when I noticed how ugly this is I thought about "pure" mutability; well, actually it's kind of pure:
import Control.Concurrent
import Control.Monad
import Network
import System.IO
import Data.List
import Data.Maybe
import System.Environment
import Control.Exception
newStorage :: (Eq a, Show a) => IO (Chan (String, Maybe (Chan [a]), Maybe a))
newStorage = do
  q <- newChan
  forkIO $ storage [] q
  return q

newHandleStorage :: IO (Chan (String, Maybe (Chan [Handle]), Maybe Handle))
newHandleStorage = newStorage

storage :: (Eq a, Show a) => [a] -> Chan (String, Maybe (Chan [a]), Maybe a) -> IO ()
storage s q = do
  let loop = (`storage` q)
  (req, reply, d) <- readChan q
  print ("processing " ++ show d)
  case req of
    "add"    -> loop ((fromJust d) : s)
    "remove" -> loop (delete (fromJust d) s)
    "get"    -> do
      writeChan (fromJust reply) s
      loop s

store s d = writeChan s ("add", Nothing, Just d)
unstore s d = writeChan s ("remove", Nothing, Just d)

request s = do
  chan <- newChan
  writeChan s ("get", Just chan, Nothing)
  readChan chan
The point is that a thread (actor) is managing a list of items and modifies the list according to incoming requests. Since threads are really cheap, I thought this could be a really nice functional alternative.
Of course this is just a prototype (a quick dirty proof of concept).
So my question is:
Is this a "good" way of managing shared mutable variables (in the actor world) ?
Is there already a library for this pattern ? (I already searched but I found nothing)
Regards,
Chris
Here is a quick and dirty example using stm and pipes-network. This will set up a simple server that allows clients to connect and increment or decrement a counter. It will display a very simple status bar showing the current tallies of all connected clients and will remove client tallies from the bar when they disconnect.
First I will begin with the server, and I've generously commented the code to explain how it works:
import Control.Concurrent.STM (STM, atomically)
import Control.Concurrent.STM.TVar
import qualified Data.HashMap.Strict as H
import Data.Foldable (forM_)
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (unless)
import Control.Monad.Trans.State.Strict
import qualified Data.ByteString.Char8 as B
import Control.Proxy
import Control.Proxy.TCP
import System.IO
main = do
    hSetBuffering stdout NoBuffering

    {- These are the internal data structures. They should be an implementation
       detail and you should never expose these references to the
       "business logic" part of the application. -}
    -- I use nRef to keep track of creating fresh Ints (which identify users)
    nRef <- newTVarIO 0 :: IO (TVar Int)

    {- hMap associates every user (i.e. Int) with a counter

       Notice how I've "striped" the hash map by storing STM references to the
       values instead of storing the values directly. This means that I only
       actually write the hashmap when adding or removing users, which reduces
       contention for the hash map.

       Since each user gets their own unique STM reference for their counter,
       modifying counters does not cause contention with other counters or
       contention with the hash map. -}
    hMap <- newTVarIO H.empty :: IO (TVar (H.HashMap Int (TVar Int)))

    {- The following code makes heavy use of Haskell's pure closures. Each
       'let' binding closes over its current environment, which is safe since
       Haskell is pure. -}

    let {- 'getCounters' is the only server-facing command in our STM API. The
           only permitted operation is retrieving the current set of user
           counters.

           'getCounters' closes over the 'hMap' reference currently in scope so
           that the server never needs to be aware about our internal
           implementation. -}
        getCounters :: STM [Int]
        getCounters = do
            refs <- fmap H.elems (readTVar hMap)
            mapM readTVar refs

        {- 'init' is the only client-facing command in our STM API. It
           initializes the client's entry in the hash map and returns two
           commands: the first command is what the client calls to 'increment'
           their counter and the second command is what the client calls to log
           off and delete their entry (the 'delete' command).

           Notice that those two returned commands each close over the client's
           unique STM reference so the client never needs to be aware of how
           exactly 'init' is implemented under the hood. -}
        init :: STM (STM (), STM ())
        init = do
            n <- readTVar nRef
            writeTVar nRef $! n + 1

            ref <- newTVar 0
            modifyTVar' hMap (H.insert n ref)

            let incrementRef :: STM ()
                incrementRef = do
                    mRef <- fmap (H.lookup n) (readTVar hMap)
                    forM_ mRef $ \ref -> modifyTVar' ref (+ 1)

                deleteRef :: STM ()
                deleteRef = modifyTVar' hMap (H.delete n)

            return (incrementRef, deleteRef)

    {- Now for the actual program logic. Everything past this point only uses
       the approved STM API (i.e. 'getCounters' and 'init'). If I wanted I
       could factor the above approved STM API into a separate module to enforce
       the encapsulation boundary, but I am lazy. -}

    {- Fork a thread which polls the current state of the counters and displays
       it to the console. There is a way to implement this without polling but
       this gets the job done for now.

       Most of what it is doing is just some simple tricks to reuse the same
       console line instead of outputting a stream of lines. Otherwise it
       would be just:

       forkIO $ forever $ do
           ns <- atomically getCounters
           print ns
    -}
    forkIO $ (`evalStateT` 0) $ forever $ do
        del <- get
        lift $ do
            putStr (replicate del '\b')
            putStr (replicate del ' ' )
            putStr (replicate del '\b')
        ns <- lift $ atomically getCounters
        let str = show ns
        lift $ putStr str
        put $! length str
        lift $ threadDelay 10000

    {- Fork a thread for each incoming connection, which listens to the client's
       commands and translates them into 'STM' actions -}
    serve HostAny "8080" $ \(socket, _) -> do
        (increment, delete) <- atomically init

        {- Right now, just do the dumb thing and convert all keypresses into
           increment commands, with the exception of the 'q' key, which will
           quit -}
        let handler :: (Proxy p) => () -> Consumer p Char IO ()
            handler () = runIdentityP loop
              where
                loop = do
                    c <- request ()
                    unless (c == 'q') $ do
                        lift $ atomically increment
                        loop

        {- This uses my 'pipes' library. It basically is a high-level way to
           say:

           * Read binary packets from the socket no bigger than 4096 bytes
           * Get the first character from each packet and discard the rest
           * Handle the character using the above 'handler' function -}
        runProxy $ socketReadS 4096 socket >-> mapD B.head >-> handler

        {- The above pipeline finishes either when the socket closes or
           'handler' stops looping because it received a 'q'. Either case means
           that the client is done so we log them out using 'delete'. -}
        atomically delete
Next up is the client, which simply opens a connection and forwards all key presses as single packets:
import Control.Monad
import Control.Proxy
import Control.Proxy.Safe
import Control.Proxy.TCP.Safe
import Data.ByteString.Char8 (pack)
import System.IO
main = do
    hSetBuffering stdin NoBuffering
    hSetEcho      stdin False

    {- Again, this uses my 'pipes' library. It basically says:

       * Read characters from the console using 'commands'
       * Pack them into a binary format
       * send them to a server running at 127.0.0.1:8080

       This finishes looping when the user types a 'q' or the connection is
       closed for whatever reason.
    -}
    runSafeIO $ runProxy $ runEitherK $
            try . commands
        >-> mapD (\c -> pack [c])
        >-> connectWriteD Nothing "127.0.0.1" "8080"

commands :: (Proxy p) => () -> Producer p Char IO ()
commands () = runIdentityP loop
  where
    loop = do
        c <- lift getChar
        respond c
        unless (c == 'q') loop
It's pretty simple: commands generates a stream of Chars, which then get converted to ByteStrings and sent as packets to the server.
If you run the server and a few clients and have them each type in a few keys, your server display will output a list showing how many keys each client typed:
[1,6,4]
... and if some of the clients disconnect they will be removed from the list:
[1,4]
Note that the pipes component of these examples will simplify greatly in the upcoming pipes-4.0.0 release, but the current pipes ecosystem still gets the job done as is.
First, I'd definitely recommend using your own specific data type for representing commands. When using (String, Maybe (Chan [a]), Maybe a), a buggy client can crash your actor simply by sending an unknown command or by sending ("add", Nothing, Nothing), etc. I'd suggest something like
data Command a = Add a | Remove a | Get (Chan [a])
Then you can pattern match on commands in storage in a safe way.
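For illustration, a minimal sketch of the storage loop rewritten against such a Command type (this is my rendering of the idea, not code from the question):

import Control.Concurrent.Chan
import Data.List (delete)

data Command a = Add a | Remove a | Get (Chan [a])

storage :: Eq a => [a] -> Chan (Command a) -> IO ()
storage s q = do
    cmd <- readChan q
    case cmd of
        Add x     -> storage (x : s) q
        Remove x  -> storage (delete x s) q
        Get reply -> writeChan reply s >> storage s q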
Actors have their advantages, but I also feel that they have some drawbacks. For example, getting an answer from an actor requires sending it a command and then waiting for a reply. And the client can't be completely sure that it gets a reply, or that the reply will be of some specific type - you can't say that you want only answers of this type (and how many of them) for this particular command.
So as an example I'll give a simple, STM solution. It'd be better to use a hash table or a (balanced tree) set, but since Handle implements neither Ord nor Hashable, we can't use these data structures, so I'll keep using lists.
module ThreadSet (
TSet, add, remove, get
) where
import Control.Monad
import Control.Monad.STM
import Control.Concurrent.STM.TVar
import Data.List (delete)
newtype TSet a = TSet (TVar [a])
add :: (Eq a) => a -> TSet a -> STM ()
add x (TSet v) = readTVar v >>= writeTVar v . (x :)
remove :: (Eq a) => a -> TSet a -> STM ()
remove x (TSet v) = readTVar v >>= writeTVar v . delete x
get :: (Eq a) => TSet a -> STM [a]
get (TSet v) = readTVar v
This module implements an STM-based set of arbitrary elements. You can have multiple such sets and use them together in a single STM transaction that succeeds or fails at once. For example:
-- | Ensures that there is exactly one element `x` in the set.
add1 :: (Eq a) => a -> TSet a -> STM ()
add1 x v = remove x v >> add x v
This would be difficult with actors: you'd have to add it as another command for the actor; you can't compose it out of existing actions and still have atomicity.
Update: There is an interesting article explaining why the Clojure designers chose not to use actors. For example, with actors, even if you have many reads and only very few writes to a mutable structure, they're all serialized, which can greatly impact performance.
I am doing things with STM and have, among other things, used the TBQueue data structure with great success. A useful feature I've been using it for involves reading from it based on a precondition in a TVar, basically like so:
shouldRead <- readTVar shouldReadVar
if shouldRead
  then do
    a <- readTBQueue queue
    doSomethingWith a
  else doSomethingElse
If we assume that queue is empty and shouldReadVar contains True before executing this block, it will result in readTBQueue calling retry, and the block will be re-executed when shouldReadVar contains False or queue contains an element, whichever happens first.
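For reference, here is the same precondition pattern as a self-contained STM action that returns a Maybe instead of acting inside the transaction (readOrSkip is an illustrative name, not from the question):

import Control.Concurrent.STM

readOrSkip :: TVar Bool -> TBQueue a -> STM (Maybe a)
readOrSkip shouldReadVar queue = do
    shouldRead <- readTVar shouldReadVar
    if shouldRead
        then Just <$> readTBQueue queue   -- retries until an element arrives or shouldReadVar changes
        else return Nothing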
I am now in need of a synchronous channel data structure, similar to the structure described in this article (Please read it if you want to understand this question), except it needs to be readable with a pre-condition like in the previous example, and possibly compose with other stuff as well.
Let's call this data structure SyncChan with writeSyncChan and readSyncChan operations defined on it.
And here's a possible use case, in (pseudo) code (which will not work because it mixes STM/IO concepts):
shouldRead <- readTVar shouldReadVar
if shouldRead
  then do
    a <- readSyncChan syncChan
    doSomethingWith a
  else doSomethingElse
Assuming that no other thread is currently blocking on a writeSyncChan call, and shouldReadVar contains True, I want the block to "retry" until either shouldReadVar contains False, or a different thread blocks on a writeSyncChan call. In other words: when one thread retries on writeSyncChan and another thread reaches a readSyncChan, or vice versa, I want the value to be transferred along the channel. In all other cases, both sides should be in a retry state and thus react to a change in shouldReadVar, so that the read or write can be cancelled.
The naïve approach described in the article linked above using two (T)MVars is of course not possible. Because the data structure is synchronous, it is impossible to use it within two atomically blocks, because you cannot change one TMVar and wait for another TMVar to be changed in an atomic context.
Instead, I am looking for a kind of partial atomicity, where I can "commit" a certain part of a transaction and only roll it back when certain variables change, but not others. If I have "msg" and "ack" variables like the first example in the article above, I want to be able to write to the "msg" variable, then wait for either a value to arrive on "ack", or for my other transactional variables to change. If other transactional variables change, the whole atomic block should be retried, and if an "ack" value arrives, the transaction should continue as it was in the previous state. For the reading side, something similar should happen, except I would of course be reading from "msg" and writing to "ack."
Is this possible to do using GHC STM, or do I need to do manual MVar/rollback handling?
This is what you want:
import Control.Concurrent
import Control.Concurrent.STM
import Control.Monad
data SyncChan a = SyncChan (TMVar a) (TMVar ())
newSyncChan :: IO (SyncChan a)
newSyncChan = do
    msg <- newEmptyTMVarIO
    ack <- newEmptyTMVarIO
    return (SyncChan msg ack)

readIf :: SyncChan a -> TVar Bool -> STM (Maybe a)
readIf (SyncChan msg ack) shouldReadVar = do
    b <- readTVar shouldReadVar
    if b
        then do
            a <- takeTMVar msg
            putTMVar ack ()
            return (Just a)
        else return Nothing

write :: SyncChan a -> a -> IO ()
write (SyncChan msg ack) a = do
    atomically $ putTMVar msg a
    atomically $ takeTMVar ack

main = do
    sc <- newSyncChan
    tv <- newTVarIO True
    forkIO $ forever $ forM_ [False, True] $ \b -> do
        threadDelay 2000000
        atomically $ writeTVar tv b
    forkIO $ forM_ [0..] $ \i -> do
        putStrLn "Writing..."
        write sc i
        putStrLn "Write Complete"
        threadDelay 300000
    forever $ do
        putStrLn "Reading..."
        a <- atomically $ readIf sc tv
        print a
        putStrLn "Read Complete"
This gives the behavior you had in mind. While the TVar is True the input and output ends will be synchronized with each other. When the TVar switches to False then the read end freely aborts and returns Nothing.
The function below is supposed to either receive an ack or wait until its duetime has come and then return.
It works when it receives an ack, and it correctly waits until the duetime when no ack is received. But when the duetime is reached it freezes. It seems that it is not exiting my self-constructed loop correctly. I have also tried if-then-else, but with the same result. I do not want to use whileM.
How do I correctly exit the loop?
import Control.Monad (unless, when)
import qualified Data.ByteString.Char8 as B
import Data.Time.Clock.POSIX (getPOSIXTime)
import Network.Socket hiding (send, sendTo, recv, recvFrom)
import Network.Socket.ByteString

waitAck s duetime' = do
    print ("in")
    (a, _) <- recvFrom s 4711
    now' <- getPOSIXTime
    unless (B.unpack a == "ack") (when (now' < duetime') (waitAck s duetime'))
    print (B.unpack a)
    return ()
The correct solution is to race two threads, one that waits for the ack, and one that waits for the time. Kill the one that loses the race. Perhaps this (untested) code will give you a hint about how:
import Control.Concurrent
import Control.Concurrent.MVar

withTimeout :: Int -> IO a -> IO (Maybe a)
withTimeout n io = do
    mvar <- newEmptyMVar
    timeout <- forkIO (threadDelay n >> putMVar mvar Nothing)
    action  <- forkIO (io >>= putMVar mvar . Just)
    result  <- takeMVar mvar
    killThread timeout
    killThread action
    return result
waitAck s timeout = withTimeout timeout go where
    go = do
        (a, _) <- recvFrom s 4711
        if B.unpack a == "ack" then print (B.unpack a) else go
edit: It seems that base provides System.Timeout.timeout for exactly this purpose. Its implementation is more likely to be correct than this one, too.
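For illustration, the same loop written with System.Timeout.timeout directly (a sketch: s, recvFrom, B and the 4711 buffer size are assumed from the question; micros is the timeout in microseconds):

import System.Timeout (timeout)

waitAck' s micros = timeout micros go
  where
    go = do
        (a, _) <- recvFrom s 4711
        if B.unpack a == "ack" then print (B.unpack a) else go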
That's not an iterative loop. You don't place any conditions on the stuff after the recursive call, so when the conditions finally fail, the whole thing will unwind, printing once for every recursive call. I suspect that might be enough to make it appear frozen.
Try something like this:
waitAck s duetime' = do
print ("in")
(a, _) <- recvFrom s 4711
now' <- getPOSIXTime
if B.unpack a == "ack" || now' >= duetime'
then print (B.unpack a)
else waitAck s duetime'
I would like to call a UDP send function within an STM transaction so that I can avoid code like the below, where m' is read (and could be updated by another thread) before the values are eventually sent, and where the two consecutive when clauses make me look quite "helpless".
sendRecv s newmsgs q m = do
    m' <- atomically $ readTVar m
    time <- getPOSIXTime
    result <- appendMsg newmsgs key m
    when (result > 0) (atomically $ do
        mT <- readTVar m
        qT <- readTVar q
        --let Just messages = Map.lookup key mT in sendq s (B.pack $ unwords messages) "192.168.1.1" 4711
        let mT' = Map.delete key mT
            qT' = PSQ.delete key qT
        writeTVar q (PSQ.insert key time qT')
        writeTVar m (Map.insert key [newmsgs] mT'))
    when (result > 0) (let Just messages = Map.lookup key m' in sendq s (B.pack $ unwords messages) "192.168.1.1" 4711)
sendq :: Socket -> B.ByteString -> String -> PortNumber -> IO ()
sendq s datastring host port = do
    hostAddr <- inet_addr host
    sendAllTo s datastring (SockAddrInet port hostAddr)
    return ()
I thought that by creating the TVars with newTVarIO and importing System.IO.Unsafe I could eventually use unsafePerformIO somewhere and call my sendq function (which returns IO ()) from within the transaction.
However, I cannot find where this "somewhere" is. Is it at the creation of the TVar? Is it instead of atomically $ do? Do I misunderstand the sense and applicability of unsafePerformIO?
IO cannot be done from inside an STM block, because general IO cannot be undone. If you want to do some IO, you must schedule it in the STM block, but do it outside. For example:
foo tvar = do
    scheduledAction <- atomically $ do
        v <- readTVar tvar
        when v retry
        return (sendSomethingOnASocket "okay, we're done here")
    scheduledAction
If you really need to do IO within a transaction, there's unsafeIOToSTM :: IO a -> STM a, however you should make sure to read the documentation first, as there are several gotchas to be aware of. In particular, the IO action may be run several times if the transaction has to be retried.
That said, I don't think that is appropriate in this case, and you should probably refactor your code so that the message is sent outside the transaction.
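For illustration, a hedged sketch of that refactoring applied to the question's sendRecv (key, appendMsg, sendq, Map, PSQ and the socket s are assumed from the question): the transaction computes what to send and updates the maps, and the UDP send happens afterwards, outside atomically.

sendRecv s newmsgs q m = do
    time   <- getPOSIXTime
    result <- appendMsg newmsgs key m
    when (result > 0) $ do
        messages <- atomically $ do
            mT <- readTVar m
            qT <- readTVar q
            let msgs = Map.findWithDefault [] key mT
            writeTVar q (PSQ.insert key time (PSQ.delete key qT))
            writeTVar m (Map.insert key [newmsgs] (Map.delete key mT))
            return msgs                                             -- hand the messages back to IO
        sendq s (B.pack $ unwords messages) "192.168.1.1" 4711      -- send outside the transaction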