How to use System.IO.Unsafe together with TVars? - haskell

I would like to call a UDP send function within an STM transaction so that I can avoid code like the one below, where m' is read (and could be updated by another thread) before the values are eventually sent, and where the two consecutive when clauses make me look quite "helpless".
sendRecv s newmsgs q m = do
  m' <- atomically $ readTVar m
  time <- getPOSIXTime
  result <- appendMsg newmsgs key m
  when (result > 0) (atomically $ do
    mT <- readTVar m
    qT <- readTVar q
    --let Just messages = Map.lookup key mT in sendq s (B.pack $ unwords messages) "192.168.1.1" 4711
    let mT' = Map.delete key mT
        qT' = PSQ.delete key qT
    writeTVar q (PSQ.insert key time qT')
    writeTVar m (Map.insert key [newmsgs] mT'))
  when (result > 0) (let Just messages = Map.lookup key m' in sendq s (B.pack $ unwords messages) "192.168.1.1" 4711)
sendq :: Socket -> B.ByteString -> String -> PortNumber -> IO ()
sendq s datastring host port = do
  hostAddr <- inet_addr host
  sendAllTo s datastring (SockAddrInet port hostAddr)
  return ()
I thought that by creating TVars with newTVarIO and importing System.IO.Unsafe I could eventually use unsafePerformIO somewhere and call my sendq function (which returns IO ()) from within the transaction.
However, I cannot find where this "somewhere" is. Is it at the creation of the TVar? Is it instead of atomically $ do? Do I misunderstand the sense and applicability of unsafePerformIO?

IO cannot be done from inside an STM block, because general IO cannot be undone. If you want to do some IO, you must schedule it in the STM block, but do it outside. For example:
foo tvar = do
  scheduledAction <- atomically $ do
    v <- readTVar tvar
    when v retry
    return (sendSomethingOnASocket "okay, we're done here")
  scheduledAction

If you really need to do IO within a transaction, there's unsafeIOToSTM :: IO a -> STM a, however you should make sure to read the documentation first, as there are several gotchas to be aware of. In particular, the IO action may be run several times if the transaction has to be retried.
That said, I don't think that is appropriate in this case, and you should probably refactor your code so that the message is sent outside the transaction.
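As a rough sketch of that refactoring applied to the code in the question (reusing key, appendMsg and sendq from there; this is an illustration, not a drop-in fix), the transaction decides what to send and returns it as an IO action, which is then run after atomically returns:
sendRecv s newmsgs q m = do
  time <- getPOSIXTime
  result <- appendMsg newmsgs key m
  when (result > 0) $ do
    sendAction <- atomically $ do
      mT <- readTVar m
      qT <- readTVar q
      writeTVar q (PSQ.insert key time (PSQ.delete key qT))
      writeTVar m (Map.insert key [newmsgs] (Map.delete key mT))
      -- choose what to send inside the transaction, but send outside it
      return $ case Map.lookup key mT of
        Just messages -> sendq s (B.pack $ unwords messages) "192.168.1.1" 4711
        Nothing       -> return ()
    sendAction
Because the map is read in the same transaction that deletes the key, no other thread can remove the messages between the read and the decision about what to send.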

Related

What's an idiomatic way of handling a lazy input channel in Haskell

I am implementing an IRC bot, and since I am connecting over SSL using OpenSSL.Session, I use the lazyRead function to read data from the socket. During the initial phase of the connection I need to perform several things in order (nick negotiation, NickServ identification, joining channels, etc.), so there is some state involved. Right now I have come up with the following:
data ConnectionState = Initial | NickIdentification | Connected

listen :: SSL.SSL -> IO ()
listen ssl = do
  lines <- BL.lines `fmap` SSL.lazyRead ssl
  evalStateT (mapM_ (processLine ssl) lines) Initial
processLine :: SSL.SSL -> BL.ByteString -> StateT ConnectionState IO ()
processLine ssl line = do
    case message of
      Just a -> processMessage ssl a
      Nothing -> return ()
  where message = IRC.decode $ BL.toStrict line
processMessage :: SSL.SSL -> IRC.Message -> StateT ConnectionState IO ()
processMessage ssl m = do
  state <- S.get
  case state of
    Initial -> when (IRC.msg_command m == "376") $ do
      liftIO $ putStrLn "connected!"
      liftIO $ privmsg ssl "NickServ" ("identify " ++ nick_password)
      S.put NickIdentification
    NickIdentification -> do
      when (identified m) $ do
        liftIO $ putStrLn "identified!"
        liftIO $ joinChannel ssl chan
        S.put Connected
    Connected -> return ()
  liftIO $ print m
  when (IRC.msg_command m == "PING") $ (liftIO . pong . mconcat . map show) (IRC.msg_params m)
So when I get to the "Connected" state I still end up going through the case statement even though it's only really needed to initialize the connection. The other problem is that adding nested StateT's would be very painful.
Another way would be to replace mapM_ with something custom that only processes lines until we are connected and then starts another loop over the rest. This would require either keeping track of what's left in the list or invoking SSL.lazyRead once again (which is not too bad).
Another solution is to keep the remaining lines list in the state and draw lines when needed similar to getLine.
What's the better thing to do in this case? Would Haskell's laziness make it so that we go directly to the Connected case after the state stops updating, or is case always strict?
You can use the Pipe type from pipes. The trick is that instead of creating a state machine and a transition function, you can encode the state implicitly in the control flow of the Pipe.
Here is what the Pipe would look like:
stateful :: Pipe ByteString ByteString IO r
stateful = do
  msg <- await
  if (IRC.msg_command msg == "376")
    then do
      liftIO $ putStrLn "connected!"
      liftIO $ privmsg ssl "NickServ" ("identify " ++ nick_password)
      yield msg
      nick
    else stateful
nick :: Pipe ByteString ByteString IO r
nick = do
  msg <- await
  if identified msg
    then do
      liftIO $ putStrLn "identified!"
      liftIO $ joinChannel ssl chan
      yield msg
      cat -- Forward the remaining input to the output indefinitely
    else nick
The stateful pipe corresponds to the stateful part of your processMessage function. It handles initialization and authentication, but defers further message processing to downstream stages by re-yielding the msg.
You can then loop over every message this Pipe yields by using for:
processMessage :: Consumer ByteString IO r
processMessage = for stateful $ \msg -> do
  liftIO $ print msg
  when (IRC.msg_command msg == "PING") $ (liftIO . pong . mconcat . map show) (IRC.msg_params msg)
Now all you need is a source of ByteString lines to feed to processMessage. You can use the following Producer:
lines :: Producer ByteString IO ()
lines = do
  bs <- liftIO ByteString.getLine
  if ByteString.null bs
    then return ()
    else do
      yield bs
      lines
Then you can connect lines to processMessage and run them:
runEffect (lines >-> processMessage) :: IO ()
Note that the lines Producer does not use lazy IO. It will work even if you use the strict ByteString module, but the behavior of the entire program will still be lazy.
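If you instead want to drive processMessage from the SSL connection in the question, a rough sketch of an adapter could look like this (it reuses SSL.lazyRead, so this part still relies on lazy IO, and it assumes the SSL and BL imports from the question; sslLines is a made-up name):
sslLines :: SSL.SSL -> Producer ByteString IO ()
sslLines ssl = do
  bs <- liftIO (SSL.lazyRead ssl)
  each (map BL.toStrict (BL.lines bs))
-- and then:  runEffect (sslLines ssl >-> processMessage)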
If you want to learn more about how pipes works, you can read the pipes tutorial.

STM and atomically: why the semantic of these two programs differ?

Let's consider this simple Haskell program:
module Main where

import Control.Concurrent.STM
import Control.Concurrent
import Control.Exception
import Control.Monad
import Data.Maybe
import Data.Monoid
import Control.Applicative

terminator :: Either SomeException () -> IO ()
terminator r = print $ "Dying with " <> show r

doStuff :: TMVar () -> TChan () -> Int -> IO ()
doStuff writeToken barrier w = void $ flip forkFinally terminator $ do
  hasWriteToken <- isJust <$> atomically (tryTakeTMVar writeToken)
  case hasWriteToken of
    True -> do
      print $ show w <> "I'm the lead.."
      threadDelay (5 * 10^6)
      print "Done heavy work"
      atomically $ writeTChan barrier ()
    False -> do
      print $ show w <> " I'm the worker, waiting for the barrier..."
      myChan <- atomically $ dupTChan barrier
      _ <- atomically $ readTChan myChan
      print "Unlocked!"

main :: IO ()
main = do
  writeToken <- newTMVarIO ()
  barrier <- newBroadcastTChanIO
  _ <- forM [1..20] (doStuff writeToken barrier)
  threadDelay (20 * 10^6)
  return ()
It essentially models a concurrency scenario where a "lead" acquires the write token and does something, while the workers sync on a barrier and wait for the "green light" from the lead. This works, but if we replace the worker's "atomically" blocks with this:
_ <- atomically $ do
  myChan <- dupTChan barrier
  readTChan myChan
All the workers remain blocked indefinitely inside an STM transaction:
"Done heavy work"
"Dying with Right ()"
"Dying with Left thread blocked indefinitely in an STM transaction"
"Dying with Left thread blocked indefinitely in an STM transaction"
"Dying with Left thread blocked indefinitely in an STM transaction"
...
I suspect the key lies in the semantics of atomically. Any idea?
Thanks!
Alfredo
I think this behavior comes from the definition of dupTChan, copied here for readability along with readTChan:
dupTChan :: TChan a -> STM (TChan a)
dupTChan (TChan _read write) = do
  hole <- readTVar write
  new_read <- newTVar hole
  return (TChan new_read write)

readTChan :: TChan a -> STM a
readTChan (TChan read _write) = do
  listhead <- readTVar read
  head <- readTVar listhead
  case head of
    TNil -> retry
    TCons a tail -> do
      writeTVar read tail
      return a
Inlining those functions, we get this STM block:
worker_block (TChan _read write) = do
  hole <- readTVar write
  new_read <- newTVar hole
  listhead <- readTVar new_read
  head <- readTVar listhead
  case head of
    TNil -> retry
    ...
When you try to run this block atomically, you make a new read end from the tail of the channel and then read from it. The tail is of course empty, so this read will retry. However, when the lead writes to the channel, the act of writing invalidates this transaction, so every follower transaction will always have to retry.
In fact, I don't think there is any case where dupTChan >>= readTChan will ever result in anything other than the thread being blocked indefinitely on an STM transaction. You can reason this out from the documentation as well. dupTChan begins empty, so within a single atomic transaction it will never have any items unless that same transaction adds them.
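For contrast, here is the pattern from the question that does work, restated as a small helper (waitAtBarrier is a made-up name): duplicate the broadcast channel in one transaction, then read from the duplicate in a second transaction, so that the lead's later write can actually be observed.
waitAtBarrier :: TChan () -> IO ()
waitAtBarrier barrier = do
  myChan <- atomically (dupTChan barrier)  -- duplicate first, in its own transaction
  _ <- atomically (readTChan myChan)       -- then block until the lead broadcasts
  return ()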

Using TChan with Timeout

I have a TChan as input for a thread which should behave like this:
If somebody writes to the TChan within a specific time, the content should be retrieved. If there is nothing written within the specified time, it should unblock and continue with Nothing.
My attempt on this was to use the timeout function from System.Timeout like this:
timeout 1000000 $ atomically $ readTChan pktChannel
This seemed to work, but now I have discovered that I am sometimes losing packets (they are written to the channel, but not read on the other side). In the log I get this:
2014.063.11.53.43.588365 Pushing Recorded Packet: 2 1439
2014.063.11.53.43.592319 Run into timeout
2014.063.11.53.44.593396 Run into timeout
2014.063.11.53.44.593553 Pushing Recorded Packet: 3 1439
2014.063.11.53.44.597177 Sending Recorded Packet: 3 1439
Where "Pushing Recorded Packet" is the writing from the one thread and "Sending Recorded Packet" is the reading from the TChan in the sender thread. The line with Sending Recorded Packet 2 1439 is missing, which would indicate a successful read from the TChan.
It seems that if the timeout is received at the wrong point in time, the channel looses the packet. I suspect that the threadKill function used inside timeout and STM don't play well together.
Is this correct? Does somebody have another solution that does not loose the packet?
Use registerDelay, an STM function, to signal a TVar when the timeout is reached. You can then use the orElse function or the Alternative operator <|> to select between the next TChan value or the timeout.
import Control.Applicative
import Control.Monad
import Control.Concurrent
import Control.Concurrent.STM
import System.Random

-- write random values after a random delay
packetWriter :: Int -> TChan Int -> IO ()
packetWriter maxDelay chan = do
  let xs = randomRs (10000 :: Int, maxDelay + 50000) (mkStdGen 24036583)
  forM_ xs $ \ x -> do
    threadDelay x
    atomically $ writeTChan chan x

-- block (retry) until the delay TVar is set to True
fini :: TVar Bool -> STM ()
fini = check <=< readTVar

-- Read the next value from a TChan or timeout
readTChanTimeout :: Int -> TChan a -> IO (Maybe a)
readTChanTimeout timeoutAfter pktChannel = do
  delay <- registerDelay timeoutAfter
  atomically $
        Just <$> readTChan pktChannel
    <|> Nothing <$ fini delay

-- | Print packets until a timeout is reached
readLoop :: Show a => Int -> TChan a -> IO ()
readLoop timeoutAfter pktChannel = do
  res <- readTChanTimeout timeoutAfter pktChannel
  case res of
    Nothing -> putStrLn "timeout"
    Just val -> do
      putStrLn $ "packet: " ++ show val
      readLoop timeoutAfter pktChannel

main :: IO ()
main = do
  let timeoutAfter = 1000000

  -- spin up a packet writer simulation
  pktChannel <- newTChanIO
  tid <- forkIO $ packetWriter timeoutAfter pktChannel

  readLoop timeoutAfter pktChannel
  killThread tid
The rule of thumb for concurrency is: if adding a sleep at some point inside an IO action matters, your program is not safe.
To understand why the code timeout 1000000 $ atomically $ readTChan pktChannel does not work, consider the following alternative implementation of atomically:
atomically' :: STM a -> IO a
atomically' action = do
  result <- atomically action
  threadDelay someTimeAmount
  return result
The above is equivalent to atomically, except for an extra, innocent delay. Now it is easy to see that if timeout kills the thread during the threadDelay, the atomic action has already completed (consuming a message from the channel), yet timeout will return Nothing.
A simple fix to timeout n $ atomically ... could be the following:
smartTimeout :: Int -> STM a -> IO (Maybe a)
smartTimeout n action = do
  v <- atomically newEmptyTMVar
  _ <- timeout n $ atomically $ do
    result <- action
    putTMVar v result
  atomically $ tryTakeTMVar v
The above uses an extra transactional variable v to do the trick. The result value of the action is stored into v inside the same atomic block in which the action is run. The return value of timeout is not trusted, since it does not tell us if action was run or not. After that, we check the TMVar v, which will be full if and only if action was run.
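For instance, in the asker's setting the read would become something like this (a sketch, assuming the pktChannel from the question):
result <- smartTimeout 1000000 (readTChan pktChannel)
-- result is Nothing if the timeout elapsed, otherwise Just the packet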
Instead of TChan a, use TChan (Maybe a). Your normal producer (of x) now writes Just x. Fork an extra "ticking" process that writes Nothing to the channel every x seconds. Then have a reader for the channel, and abort if you get two successive Nothings. This way you avoid exceptions, which may be what causes data to get lost in your case (but I am not sure). A rough sketch of this idea follows.
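This sketch uses made-up names (ticker, readUntilIdle) and a handler supplied by the caller; the producer side simply wraps its writes in Just:
import Control.Concurrent (threadDelay)
import Control.Concurrent.STM
import Control.Monad (forever)

-- Ticker: writes Nothing every 'micros' microseconds.
-- In main you would start it with something like: forkIO (ticker 1000000 chan)
ticker :: Int -> TChan (Maybe a) -> IO ()
ticker micros chan = forever $ do
  threadDelay micros
  atomically $ writeTChan chan Nothing

-- Reader: handles values, gives up after two successive Nothings.
readUntilIdle :: TChan (Maybe a) -> (a -> IO ()) -> IO ()
readUntilIdle chan handle = go False
  where
    go sawTick = do
      next <- atomically $ readTChan chan
      case next of
        Just x  -> handle x >> go False
        Nothing | sawTick   -> return ()  -- two ticks with no data in between: stop
                | otherwise -> go True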

Haskell - Actor based mutability

I'm working on a Haskell network application and I use the actor pattern to manage multithreading. One thing I came across is how to store, for example, a set of client sockets/handles, which of course must be accessible to all threads and can change when clients log on or off.
Since I'm coming from the imperative world I first thought about some kind of locking mechanism, but when I noticed how ugly this gets I thought about "pure" mutability - well, actually it's only kind of pure:
import Control.Concurrent
import Control.Monad
import Network
import System.IO
import Data.List
import Data.Maybe
import System.Environment
import Control.Exception

newStorage :: (Eq a, Show a) => IO (Chan (String, Maybe (Chan [a]), Maybe a))
newStorage = do
  q <- newChan
  forkIO $ storage [] q
  return q

newHandleStorage :: IO (Chan (String, Maybe (Chan [Handle]), Maybe Handle))
newHandleStorage = newStorage

storage :: (Eq a, Show a) => [a] -> Chan (String, Maybe (Chan [a]), Maybe a) -> IO ()
storage s q = do
  let loop = (`storage` q)
  (req, reply, d) <- readChan q
  print ("processing " ++ show d)
  case req of
    "add" -> loop (fromJust d : s)
    "remove" -> loop (delete (fromJust d) s)
    "get" -> do
      writeChan (fromJust reply) s
      loop s

store s d = writeChan s ("add", Nothing, Just d)
unstore s d = writeChan s ("remove", Nothing, Just d)
request s = do
  chan <- newChan
  writeChan s ("get", Just chan, Nothing)
  readChan chan
The point is that a thread (actor) manages a list of items and modifies the list according to incoming requests. Since threads are really cheap, I thought this could be a really nice functional alternative.
Of course this is just a prototype (a quick and dirty proof of concept).
So my question is:
Is this a "good" way of managing shared mutable variables (in the actor world)?
Is there already a library for this pattern? (I searched but found nothing.)
Regards,
Chris
Here is a quick and dirty example using stm and pipes-network. This will set up a simple server that allows clients to connect and increment or decrement a counter. It will display a very simple status bar showing the current tallies of all connected clients and will remove client tallies from the bar when they disconnect.
First I will begin with the server, and I've generously commented the code to explain how it works:
import Control.Concurrent.STM (STM, atomically)
import Control.Concurrent.STM.TVar
import qualified Data.HashMap.Strict as H
import Data.Foldable (forM_)
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever, unless)
import Control.Monad.Trans.State.Strict
import qualified Data.ByteString.Char8 as B
import Control.Proxy
import Control.Proxy.TCP
import System.IO

main = do
    hSetBuffering stdout NoBuffering

    {- These are the internal data structures. They should be an implementation
       detail and you should never expose these references to the
       "business logic" part of the application. -}
    -- I use nRef to keep track of creating fresh Ints (which identify users)
    nRef <- newTVarIO 0 :: IO (TVar Int)

    {- hMap associates every user (i.e. Int) with a counter

       Notice how I've "striped" the hash map by storing STM references to the
       values instead of storing the values directly. This means that I only
       actually write the hashmap when adding or removing users, which reduces
       contention for the hash map.

       Since each user gets their own unique STM reference for their counter,
       modifying counters does not cause contention with other counters or
       contention with the hash map. -}
    hMap <- newTVarIO H.empty :: IO (TVar (H.HashMap Int (TVar Int)))

    {- The following code makes heavy use of Haskell's pure closures. Each
       'let' binding closes over its current environment, which is safe since
       Haskell is pure. -}
    let {- 'getCounters' is the only server-facing command in our STM API. The
           only permitted operation is retrieving the current set of user
           counters.

           'getCounters' closes over the 'hMap' reference currently in scope so
           that the server never needs to be aware about our internal
           implementation. -}
        getCounters :: STM [Int]
        getCounters = do
            refs <- fmap H.elems (readTVar hMap)
            mapM readTVar refs

        {- 'init' is the only client-facing command in our STM API. It
           initializes the client's entry in the hash map and returns two
           commands: the first command is what the client calls to 'increment'
           their counter and the second command is what the client calls to log
           off and 'delete' their entry.

           Notice that those two returned commands each close over the client's
           unique STM reference so the client never needs to be aware of how
           exactly 'init' is implemented under the hood. -}
        init :: STM (STM (), STM ())
        init = do
            n <- readTVar nRef
            writeTVar nRef $! n + 1

            ref <- newTVar 0
            modifyTVar' hMap (H.insert n ref)

            let incrementRef :: STM ()
                incrementRef = do
                    mRef <- fmap (H.lookup n) (readTVar hMap)
                    forM_ mRef $ \ref -> modifyTVar' ref (+ 1)

                deleteRef :: STM ()
                deleteRef = modifyTVar' hMap (H.delete n)

            return (incrementRef, deleteRef)

    {- Now for the actual program logic. Everything past this point only uses
       the approved STM API (i.e. 'getCounters' and 'init'). If I wanted I
       could factor the above approved STM API into a separate module to enforce
       the encapsulation boundary, but I am lazy. -}

    {- Fork a thread which polls the current state of the counters and displays
       it to the console. There is a way to implement this without polling but
       this gets the job done for now.

       Most of what it is doing is just some simple tricks to reuse the same
       console line instead of outputting a stream of lines. Otherwise it
       would be just:

       forkIO $ forever $ do
           ns <- atomically getCounters
           print ns
    -}
    forkIO $ (`evalStateT` 0) $ forever $ do
        del <- get
        lift $ do
            putStr (replicate del '\b')
            putStr (replicate del ' ' )
            putStr (replicate del '\b')
        ns <- lift $ atomically getCounters
        let str = show ns
        lift $ putStr str
        put $! length str
        lift $ threadDelay 10000

    {- Fork a thread for each incoming connection, which listens to the client's
       commands and translates them into 'STM' actions -}
    serve HostAny "8080" $ \(socket, _) -> do
        (increment, delete) <- atomically init

        {- Right now, just do the dumb thing and convert all keypresses into
           increment commands, with the exception of the 'q' key, which will
           quit -}
        let handler :: (Proxy p) => () -> Consumer p Char IO ()
            handler () = runIdentityP loop
              where
                loop = do
                    c <- request ()
                    unless (c == 'q') $ do
                        lift $ atomically increment
                        loop

        {- This uses my 'pipes' library. It basically is a high-level way to
           say:

           * Read binary packets from the socket no bigger than 4096 bytes
           * Get the first character from each packet and discard the rest
           * Handle the character using the above 'handler' function -}
        runProxy $ socketReadS 4096 socket >-> mapD B.head >-> handler

        {- The above pipeline finishes either when the socket closes or
           'handler' stops looping because it received a 'q'. Either case means
           that the client is done so we log them out using 'delete'. -}
        atomically delete
Next up is the client, which simply opens a connection and forwards all key presses as single packets:
import Control.Monad
import Control.Proxy
import Control.Proxy.Safe
import Control.Proxy.TCP.Safe
import Data.ByteString.Char8 (pack)
import System.IO

main = do
    hSetBuffering stdin NoBuffering
    hSetEcho stdin False

    {- Again, this uses my 'pipes' library. It basically says:

       * Read characters from the console using 'commands'
       * Pack them into a binary format
       * Send them to a server running at 127.0.0.1:8080

       This finishes looping when the user types a 'q' or the connection is
       closed for whatever reason.
    -}
    runSafeIO $ runProxy $ runEitherK $
        try . commands
        >-> mapD (\c -> pack [c])
        >-> connectWriteD Nothing "127.0.0.1" "8080"

commands :: (Proxy p) => () -> Producer p Char IO ()
commands () = runIdentityP loop
  where
    loop = do
        c <- lift getChar
        respond c
        unless (c == 'q') loop
It's pretty simple: commands generates a stream of Chars, which then get converted to ByteStrings and then sent as packets to the server.
If you run the server and a few clients and have them each type in a few keys, your server display will output a list showing how many keys each client typed:
[1,6,4]
... and if some of the clients disconnect they will be removed from the list:
[1,4]
Note that the pipes component of these examples will simplify greatly in the upcoming pipes-4.0.0 release, but the current pipes ecosystem still gets the job done as is.
First, I'd definitely recommend using your own specific data type for representing commands. When using (String, Maybe (Chan [a]), Maybe a), a buggy client can crash your actor simply by sending an unknown command or by sending ("add", Nothing, Nothing), etc. I'd suggest something like:
data Command a = Add a | Remove a | Get (Chan [a])
Then you can pattern match on commands in storage in a safe way.
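As a sketch, storage rewritten against that type could look like this (same behaviour as the original, but malformed requests can no longer be constructed):
storage :: (Eq a, Show a) => [a] -> Chan (Command a) -> IO ()
storage s q = do
  cmd <- readChan q
  case cmd of
    Add x     -> storage (x : s) q
    Remove x  -> storage (delete x s) q
    Get reply -> do
      writeChan reply s
      storage s q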
Actors have their advantages, but I also feel that they have some drawbacks. For example, getting an answer from an actor requires sending it a command and then waiting for a reply, and the client can't be completely sure that it gets a reply or that the reply will be of some specific type - you can't say "I want only answers of this type (and this many of them) for this particular command".
So as an example I'll give a simple STM solution. It'd be better to use a hash table or a (balanced tree) set, but since Handle implements neither Ord nor Hashable, we can't use those data structures, so I'll keep using lists.
module ThreadSet (
    TSet, add, remove, get
  ) where

import Control.Monad
import Control.Monad.STM
import Control.Concurrent.STM.TVar
import Data.List (delete)

newtype TSet a = TSet (TVar [a])

add :: (Eq a) => a -> TSet a -> STM ()
add x (TSet v) = readTVar v >>= writeTVar v . (x :)

remove :: (Eq a) => a -> TSet a -> STM ()
remove x (TSet v) = readTVar v >>= writeTVar v . delete x

get :: (Eq a) => TSet a -> STM [a]
get (TSet v) = readTVar v
This module implements an STM-based set of arbitrary elements. You can have multiple such sets and use them together in a single STM transaction that succeeds or fails as a whole. For example:
-- | Ensures that there is exactly one element `x` in the set.
add1 :: (Eq a) => a -> TSet a -> STM ()
add1 x v = remove x v >> add x v
This would be difficult with actors: you'd have to add it as another command for the actor; you can't compose it out of existing actions and still have atomicity.
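Another illustration of that composability (move is a hypothetical helper, not part of the module above): atomically moving an element between two TSets, so that either both updates happen or neither does.
move :: (Eq a) => a -> TSet a -> TSet a -> STM ()
move x from to = remove x from >> add x to
-- used as: atomically (move h loggedIn admins)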
Update: There is an interesting article explaining why Clojure's designers chose not to use actors. For example, with actors, even if you have many reads and only very few writes to a mutable structure, they're all serialized, which can greatly impact performance.

Irrefutable pattern inside an action?

In my application several threads modify the same data. Whilst under "normal" circumstances there will always be messages, I found out that with very large amounts of data it can happen that another thread has been faster and deleted the messages before loopAction runs.
transMit :: Socket -> POSIXTime -> String -> TPSQ -> TMap -> IO ()
transMit s time newmsgs q m = do
  loopAction <- atomically $ do
    mT <- readTVar m
    qT <- readTVar q
    let mT' = Map.delete key mT
    let qT' = PSQ.delete key qT
    writeTVar q (PSQ.insert key time qT')
    writeTVar m (Map.insert key [newmsgs] mT')
    return (let Just messages = Map.lookup key mT in sendq s (B.pack $ unwords messages) "192.168.35.84" 4711)
  loopAction
I tried a case expression here, such as
case Map.lookup key mT of
  Nothing -> return ()
  _ -> something with IO
but of course it does not work, since one branch returns () and the other returns IO (), etc. What is my best option to resolve this?
This should fix the type error:
return $ case Map.lookup key mT of
  Nothing -> return ()
  Just messages -> sendq s (B.pack $ unwords messages) "192.168.35.84" 4711
This way both branches yield an IO (), resulting in an STM (IO ()) overall.
But I'm not sure what's happening in terms of the values. mT surely can't be modified by a different thread...
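As a compact variant of the same idea (a sketch, using forM_ from Data.Foldable over the Maybe), the explicit case can be avoided altogether; the returned action simply does nothing when the key has already been deleted:
return $ forM_ (Map.lookup key mT) $ \messages ->
  sendq s (B.pack $ unwords messages) "192.168.35.84" 4711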
