Irrefutable pattern inside an action? - haskell

In my application several threads modify the same data. Whilst under "normal" circumstances there will always be a messages, I found out that with very large amounts of data it can happen that an other thread has been faster and deleted messages before loopAction runs.
transMit :: Socket -> POSIXTime -> String -> TPSQ -> TMap -> IO ()
transMit s time newmsgs q m = do
loopAction <- atomically $ do
mT <- readTVar m
qT <- readTVar q
let mT' = Map.delete key mT
let qT' = PSQ.delete key qT
writeTVar q (PSQ.insert key time qT')
writeTVar m (Map.insert key [newmsgs] mT')
return (let Just messages = Map.lookup key mT in sendq s (B.pack $ unwords messages) "192.168.35.84" 4711)
loopAction
I tried here a case expressions such as
case (Map.lookup key MT) of
Nothing -> return ()
_ -> something w IO
but it does not work of course since the one returns () and the other branch returns IO (), etc.. What is my best take to resolve this?

This should fix the type error:
return $ case Map.lookup key mT of
Nothing -> return ()
Just messages -> sendq s (B.pack $ unwords messages) "192.168.35.84" 4711
This way both branches yield a IO (), resulting in a STM (IO ()) overall.
But I'm not sure what's happening in terms of the values. mT surely can't be modified by a different thread...

Related

Convert IO callback to infinite list

I am using a library that I can provide with a function a -> IO (), which it will call occasionally.
Because the output of my function depends not only on the a it receives as input, but also on the previous a's, it would be much easier for me to write a function [a] -> IO (), where [a] is infinite.
Can I write a function:
magical :: ([a] -> IO ()) -> (a -> IO ())
That collects the a's it receives from the callback and passes them to my function as a lazy infinite list?
The IORef solution is indeed the simplest one. If you'd like to explore a pure (but more complex) variant, have a look at conduit. There are other implementations of the same concept, see Iteratee I/O, but I found myself conduit to be very easy to use.
A conduit (AKA pipe) is an abstraction of of program that can accept input and/or produce output. As such, it can keep internal state, if needed. In your case, magical would be a sink, that is, a conduit that accepts input of some type, but produces no output. By wiring it into a source, a program that produces output, you complete the pipeline and then ever time the sink asks for an input, the source is run until it produces its output.
In your case you'd have roughly something like
magical :: Sink a IO () -- consumes a stream of `a`s, no result
magical = go (some initial state)
where
go state = do
m'input <- await
case m'input of
Nothing -> return () -- finish
Just input -> do
-- do something with the input
go (some updated state)
This is not exactly what you asked for, but it might be enough for your purposes, I think.
magical :: ([a] -> IO ()) -> IO (a -> IO ())
magical f = do
list <- newIORef []
let g x = do
modifyIORef list (x:)
xs <- readIORef list
f xs -- or (reverse xs), if you need FIFO ordering
return g
So if you have a function fooHistory :: [a] -> IO (), you can use
main = do
...
foo <- magical fooHistory
setHandler foo -- here we have foo :: a -> IO ()
...
As #danidaz wrote above, you probably do not need magical, but can play the same trick directly in your fooHistory, modifying a list reference (IORef [a]).
main = do
...
list <- newIORef []
let fooHistory x = do
modifyIORef list (x:)
xs <- readIORef list
use xs -- or (reverse xs), if you need FIFO ordering
setHandler fooHistory -- here we have fooHistory :: a -> IO ()
...
Control.Concurrent.Chan does almost exactly what I wanted!
import Control.Monad (forever)
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan
setHandler :: (Char -> IO ()) -> IO ()
setHandler f = void . forkIO . forever $ getChar >>= f
process :: String -> IO ()
process ('h':'i':xs) = putStrLn "hi" >> process xs
process ('a':xs) = putStrLn "a" >> process xs
process (x:xs) = process xs
process _ = error "Guaranteed to be infinite"
main :: IO ()
main = do
c <- newChan
setHandler $ writeChan c
list <- getChanContents c
process list
This seems like a flaw in the library design to me. You might consider an upstream patch so that you could provide something more versatile as input.

What's an idiomatic way of handling a lazy input channel in Haskell

I am implementing an IRC bot and since I am connecting over SSL by using OpenSSL.Session I use lazyRead function to read data from the socket. During the initial phase of the connection I need to perform several things in order: nick negotiation, nickserv identification, joining channels etc) so there is some state involved. Right now I came up with the following:
data ConnectionState = Initial | NickIdentification | Connected
listen :: SSL.SSL -> IO ()
listen ssl = do
lines <- BL.lines `fmap` SSL.lazyRead ssl
evalStateT (mapM_ (processLine ssl) lines) Initial
processLine :: SSL.SSL -> BL.ByteString -> StateT ConnectionState IO ()
processLine ssl line = do case message of
Just a -> processMessage ssl a
Nothing -> return ()
where message = IRC.decode $ BL.toStrict line
processMessage :: SSL.SSL -> IRC.Message -> StateT ConnectionState IO ()
processMessage ssl m = do
state <- S.get
case state of
Initial -> when (IRC.msg_command m == "376") $ do
liftIO $ putStrLn "connected!"
liftIO $ privmsg ssl "NickServ" ("identify " ++ nick_password)
S.put NickIdentification
NickIdentification -> do
when (identified m) $ do
liftIO $ putStrLn "identified!"
liftIO $ joinChannel ssl chan
S.put Connected
Connected -> return ()
liftIO $ print m
when (IRC.msg_command m == "PING") $ (liftIO . pong . mconcat . map show) (IRC.msg_params m)
So when I get to the "Connected" state I still end up going through the case statement even though it's only really needed to initialize the connection. The other problem is that adding nested StateT's would be very painful.
Other way would be to replace mapM with something custom to only process lines until we are connected and then start another loop over the rest. This would require either keeping track of what's left in the list or invoking SSL.lazyRead once again (which is not too bad).
Another solution is to keep the remaining lines list in the state and draw lines when needed similar to getLine.
What's the better thing to do in this case? Would Haskell's laziness make it so that we go directly to Connected case after state stops updating or is case always strict?
You can use the Pipe type from pipes. The trick is that instead of creating a state machine and a transition function you can encode the the state implicitly in the control flow of the Pipe.
Here is what the Pipe would look like:
stateful :: Pipe ByteString ByteString IO r
stateful = do
msg <- await
if (IRC.msg_command msg == "376")
then do
liftIO $ putStrLn "connected!"
liftIO $ privmsg ssl "NickServ" ("identify " ++ nick_password)
yield msg
nick
else stateful
nick :: Pipe ByteString ByteString IO r
nick = do
msg <- await
if identified msg
then do
liftIO $ putStrLn "identified!"
liftIO $ joinChannel ssl chan
yield msg
cat -- Forward the remaining input to output indefinitely
else nick
The stateful pipe corresponds to the stateful part of your processMessage function. It handles initialization and authentication, but defers further message processing to downstream stages by re-yielding the msg.
You can then loop over every message this Pipe yields by using for:
processMessage :: Consumer ByteString IO r
processMessage = for stateful $ \msg -> do
liftIO $ print m
when (IRC.msg_command m == "PING") $ (liftIO . pong . mconcat . map show) (IRC.msg_params m)
Now all you need is a source of ByteString lines to feed to processMessage. You can use the following Producer:
lines :: Producer ByteString IO ()
lines = do
bs <- liftIO (ByteString.getLine)
if ByteString.null bs
then return ()
else do
yield bs
lines
Then you can connect lines to processMessage and run them:
runEffect (lines >-> processMessage) :: IO ()
Note that the lines Producer does not use lazy IO. It will work even if you use the strict ByteString module, but the behavior of the entire program will still be lazy.
If you want to learn more about how pipes works, you can read the pipes tutorial.

Using TChan with Timeout

I have a TChan as input for a thread which should behave like this:
If sombody writes to the TChan within a specific time, the content should be retrieved. If there is nothing written within the specified time, it should unblock and continue with Nothing.
My attempt on this was to use the timeout function from System.Timeout like this:
timeout 1000000 $ atomically $ readTChan pktChannel
This seemed to work but now I discovered, that I am sometimes loosing packets (they are written to the channel, but not read on the other side. In the log I get this:
2014.063.11.53.43.588365 Pushing Recorded Packet: 2 1439
2014.063.11.53.43.592319 Run into timeout
2014.063.11.53.44.593396 Run into timeout
2014.063.11.53.44.593553 Pushing Recorded Packet: 3 1439
2014.063.11.53.44.597177 Sending Recorded Packet: 3 1439
Where "Pushing Recorded Packet" is the writing from the one thread and "Sending Recorded Packet" is the reading from the TChan in the sender thread. The line with Sending Recorded Packet 2 1439 is missing, which would indicate a successful read from the TChan.
It seems that if the timeout is received at the wrong point in time, the channel looses the packet. I suspect that the threadKill function used inside timeout and STM don't play well together.
Is this correct? Does somebody have another solution that does not loose the packet?
Use registerDelay, an STM function, to signal a TVar when the timeout is reached. You can then use the orElse function or the Alternative operator <|> to select between the next TChan value or the timeout.
import Control.Applicative
import Control.Monad
import Control.Concurrent
import Control.Concurrent.STM
import System.Random
-- write random values after a random delay
packetWriter :: Int -> TChan Int -> IO ()
packetWriter maxDelay chan = do
let xs = randomRs (10000 :: Int, maxDelay + 50000) (mkStdGen 24036583)
forM_ xs $ \ x -> do
threadDelay x
atomically $ writeTChan chan x
-- block (retry) until the delay TVar is set to True
fini :: TVar Bool -> STM ()
fini = check <=< readTVar
-- Read the next value from a TChan or timeout
readTChanTimeout :: Int -> TChan a -> IO (Maybe a)
readTChanTimeout timeoutAfter pktChannel = do
delay <- registerDelay timeoutAfter
atomically $
Just <$> readTChan pktChannel
<|> Nothing <$ fini delay
-- | Print packets until a timeout is reached
readLoop :: Show a => Int -> TChan a -> IO ()
readLoop timeoutAfter pktChannel = do
res <- readTChanTimeout timeoutAfter pktChannel
case res of
Nothing -> putStrLn "timeout"
Just val -> do
putStrLn $ "packet: " ++ show val
readLoop timeoutAfter pktChannel
main :: IO ()
main = do
let timeoutAfter = 1000000
-- spin up a packet writer simulation
pktChannel <- newTChanIO
tid <- forkIO $ packetWriter timeoutAfter pktChannel
readLoop timeoutAfter pktChannel
killThread tid
The thumb rule of concurrency is: if adding a sleep in some point inside an IO action matters, your program is not safe.
To understand why the code timeout 1000000 $ atomically $ readTChan pktChannel does not work, consider the following alternative implementation of atomically:
atomically' :: STM a -> IO a
atomically' action = do
result <- atomically action
threadDelay someTimeAmount
return result
The above is equal to atomically, but for an extra innocent delay. Now it is easy to see that if timeout kills the thread during the threadDelay, the atomic action has completed (consuming a message from the channel), yet timeout will return Nothing.
A simple fix to timeout n $ atomically ... could be the following
smartTimeout :: Int -> STM a -> IO (Maybe a)
smartTimeout n action = do
v <- atomically $ newEmptyTMvar
_ <- timeout n $ atomically $ do
result <- action
putTMvar v result
atomically $ tryTakeTMvar v
The above uses an extra transactional variable v to do the trick. The result value of the action is stored into v inside the same atomic block in which the action is run. The return value of timeout is not trusted, since it does not tell us if action was run or not. After that, we check the TMVar v, which will be full if and only if action was run.
Instead of TChan a, use TChan (Maybe a) . Your normal producer (of x) now writes Just x. Fork an extra "ticking" process that writes Nothing to the channel (every x seconds). Then have a reader for the channel, and abort if you get two successive Nothing. This way, you avoid exceptions, which may cause data to get lost in your case (but I am not sure).

theoretical deadlock in Control.Concurrent.Chan readChan

Browsing the source of readChan one finds the following implementation and comment, starting with version 4.6 of base:
-- |Read the next value from the 'Chan'.
readChan :: Chan a -> IO a
readChan (Chan readVar _) = do
modifyMVarMasked readVar $ \read_end -> do -- Note [modifyMVarMasked]
(ChItem val new_read_end) <- readMVar read_end
-- Use readMVar here, not takeMVar,
-- else dupChan doesn't work
return (new_read_end, val)
-- Note [modifyMVarMasked]
-- This prevents a theoretical deadlock if an asynchronous exception
-- happens during the readMVar while the MVar is empty. In that case
-- the read_end MVar will be left empty, and subsequent readers will
-- deadlock. Using modifyMVarMasked prevents this. The deadlock can
-- be reproduced, but only by expanding readMVar and inserting an
-- artificial yield between its takeMVar and putMVar operations.
Prior to base version 4.6, modifyMVar was used rather than modifyMVarMasked.
I don't understand what theoretical problem is solved for here. The last sentence states there is a problem if the thread yields between the takeMVar and putMVar that comprise readMVar. But as readMVar executes under mask_, how can an async exception prevent the put after successful take?
Any help understanding the issue here is appreciated.
Let's compare the source of modifyMVar and modifyMVarMasked, since the code changed from using one to using the other:
modifyMVar m io =
mask $ \restore -> do
a <- takeMVar m
(a',b) <- restore (io a) `onException` putMVar m a
putMVar m a'
return b
modifyMVarMasked m io =
mask_ $ do
a <- takeMVar m
(a',b) <- io a `onException` putMVar m a
putMVar m a'
return b
The key here is that modifyMVar calls restore before executing its second argument, whereas modifyMVarMasked does not. So readMVar was not called under mask_ in the old version of the code as you claim in your question! It was called under restore, instead, and therefore asynchronous exceptions could be enabled after all.
Here's me working through it.
So in readMVar...
readMVar :: MVar a -> IO a
readMVar m =
mask_ $ do
a <- takeMVar m
putMVar m a
return a
...despite the mask_ the runtime may raise an exception in a blocked takeMVar. Note in that function there's no need to actually handle that case; either the readMVar worked, in which case we're safe from async exceptions, or the takeMVar never succeeds; either way we never break the MVar by leaving it empty. (Is this correct? This is what I took away from the answer to my own related question.)
modifyMVar and modifyMVarMasked are:
modifyMVar :: MVar a -> (a -> IO (a,b)) -> IO b
modifyMVar m io =
mask $ \restore -> do
a <- takeMVar m
(a',b) <- restore (io a) `onException` putMVar m a
putMVar m a'
return b
modifyMVarMasked :: MVar a -> (a -> IO (a,b)) -> IO b
modifyMVarMasked m io =
mask_ $ do
a <- takeMVar m
(a',b) <- io a `onException` putMVar m a
putMVar m a'
return b
...where the difference is in modifyMVar the masking state is restored (i.e. async exceptions probably become unmasked) in io a, which in our case is more or less readMVar.
EDIT: Although readMVar is mask_-ed as well, so now I can't see why either choice of modifyMVarMasked or modifyMVar would make a difference...
The comment seems to imply that yield (inserted into readMVar) is interruptible (I can't find this documented anywhere) and so an async exception might be raised, in which case readVar would be restored (in both current and pre-4.6 versions), but in a non-empty queue readers would see an empty one and block.
You may be interested in reading the GHC trac on this commit, which has a sample program that consistently reproduces this bug when both Control.Concurrent.Chan and the test program are compiled -O0
https://ghc.haskell.org/trac/ghc/ticket/6153
In a similar vein:
https://ghc.haskell.org/trac/ghc/ticket/5870

How to use System.IO.Unsafe together with TVars?

I would like to call a UDP send function within an STM transaction so that I can avoid code like below where m' is read (and could be updated by an other thread) before the values are eventually sent & where two consecutive where clauses make me look quite "helpless".
sendRecv s newmsgs q m = do
m' <- atomically $ readTVar m
time <- getPOSIXTime
result <- appendMsg newmsgs key m
when (result > 0) (atomically $ do
mT <- readTVar m
qT <- readTVar q
--let Just messages = Map.lookup key mT in sendq s (B.pack $ unwords messages) "192.168.1.1" 4711
let mT' = Map.delete key mT
qT' = PSQ.delete key qT
writeTVar q (PSQ.insert key time qT')
writeTVar m (Map.insert key [newmsgs] mT'))
when (result > 0) (let Just messages = Map.lookup key m' in sendq s (B.pack $ unwords messages) "192.168.1.1" 4711)
sendq :: Socket -> B.ByteString -> String -> PortNumber -> IO ()
sendq s datastring host port = do
hostAddr <- inet_addr host
sendAllTo s datastring (SockAddrInet port hostAddr)
return ()
I thought that by invoking TVars with newTVarIO and using import System.IO.Unsafe I could eventually use unsafePerformIO somewhere and call my sendq function (that returns IO() ) from within the transaction.
However, I do not find where this "somewhere" is? Is it at the creation of the TVar? Is it instead of atomically $ do? Do I understand the sense an applicability of unsafePerformIO wrong?
IO cannot be done from inside an STM block, because general IO cannot be undone. If you want to do some IO, you must schedule it in the STM block, but do it outside. For example:
foo tvar = do
scheduledAction <- atomically $ do
v <- readTVar tvar
when v retry
return (sendSomethingOnASocket "okay, we're done here")
scheduledAction
If you really need to do IO within a transaction, there's unsafeIOToSTM :: IO a -> STM a, however you should make sure to read the documentation first, as there are several gotchas to be aware of. In particular, the IO action may be run several times if the transaction has to be retried.
That said, I don't think that is appropriate in this case, and you should probably refactor your code so that the message is sent outside the transaction.

Resources