Global state in multithreaded server - haskell

I'm implementing a simple HTTP server and I want my responses to depend on some global state. For example, if I get a 'get_settings' request from the same client for the first time, I send the large settings JSON; the second time, I just send a 'Not-modified' HTTP response.
Something like this:
import Network.Simple.TCP
import qualified Data.ByteString as B
main = withSocketsDo $ do
  let settings_state = 0 -- flag for settings response
  serve (Host "127.0.0.1") "23980" $ \(conn_sock, remote_addr) -> do
    putStrLn $ "TCP connection established from " ++ show remote_addr
    (Just inp) <- recv conn_sock 1024
    send conn_sock (process inp settings_state)
process :: B.ByteString -> Int -> B.ByteString
process inp flag
  | flag == 0 = ... -- return full response and change global flag
  | otherwise = ... -- return 'Not-modified'
The question is: how can I implement this? I would like to do it as simply as possible, manually, without any monad transformers and so on. Let the code be ugly, but simple.
Thanks

Since changing the flag clearly has some side effects, the result of process will be in IO:
process :: B.ByteString -> Int -> IO B.ByteString
Since you don't want to use any kind of monad transformer, you need to replace the Int with some mutable reference. Yes, you've read correctly: there are several mutable types, such as IORef, MVar, TVar, MVector, STRef …. To keep it simple, let's stick to IORef.
process :: B.ByteString -> IORef Int -> IO B.ByteString
process inp flag = do
  oldFlag <- readIORef flag
  if oldFlag == 0
    then do modifyIORef' flag (+1)
            return bigJSONObject
    else return notModified
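Note that you didn't provide any logic for the flag, so I simply increased the value, but you probably want to do something else (or change the flag to IORef Bool). Note also that reading the flag and then modifying it in two separate steps is not atomic: two threads can both read 0 before either writes. To use the IORef safely in a multithreaded program, use atomicModifyIORef', which updates the flag and returns the old value in a single step: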
oldFlag <- atomicModifyIORef' flag (\o -> (o+1,o))
Either way, you need to create the IORef with newIORef value, so your code snippet becomes something like
main = withSocketsDo $ do
  settings_state <- newIORef 0
  serve (Host "127.0.0.1") "23980" $ \(conn_sock, remote_addr) -> do
    -- ...
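Putting the pieces of this answer together, a minimal sketch of process could look as follows. The two payloads bigJSONObject and notModified are hypothetical placeholders (the question never defines them), and the flag is simply a counter as in the answer above.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')
import qualified Data.ByteString.Char8 as B

-- Hypothetical placeholder payloads; the question leaves both unspecified.
bigJSONObject, notModified :: B.ByteString
bigJSONObject = B.pack "{\"settings\": \"...\"}"
notModified   = B.pack "HTTP/1.1 304 Not Modified\r\n\r\n"

-- Atomically bump the flag and get back the value it had before the bump,
-- so two threads can never both observe 0.
process :: B.ByteString -> IORef Int -> IO B.ByteString
process _inp flag = do
  oldFlag <- atomicModifyIORef' flag (\o -> (o + 1, o))
  return (if oldFlag == 0 then bigJSONObject else notModified)
```

Inside the serve callback this would be used as `process inp settings_state >>= send conn_sock`, since the result now lives in IO.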

Related

Can one store a value indexed by a string in the IO monad?

I want to use custom ManagerSettings to build a custom Network.Wreq.Session. I modify from the defaultManagerSettings both managerRawConnection and managerModifyRequest.
I want managerModifyRequest to use a configuration value known at runtime from a file. As I do a lot of requests, I would rather not make a lot of syscalls to get the configuration value from the file.
I find the type managerModifyRequest :: Request -> IO Request to be problematic. How can I use a configuration value if it is not possible to get it from the parameters?
I thought about IORefs and MVars to store the value in memory, but I should be able to pass said IORef or MVar to the function somehow...
I am not familiar with the library, but I guess you need to write something like this:
-- pseudocode
foo = do
  -- create the IORef here
  r <- newIORef "hello"
  let settings = defaultManagerSettings{
        ...
        managerModifyRequest = \req -> do
          -- use the IORef here
          s <- readIORef r
          putStrLn s
          writeIORef r (s ++ "!!")
          return req
        }
  use settings
You don't need to pass the IORef as an additional argument to managerModifyRequest; you only need to define that Request -> IO Request function in a scope where the IORef is available.
Alternatively, use a helper function with the additional argument, and then partially apply it to the IORef:
-- pseudocode
foo = do
  -- create the IORef here
  r <- newIORef "hello"
  let settings = defaultManagerSettings{
        ...
        managerModifyRequest = myManager r
        }
  use settings
myManager :: IORef String -> Request -> IO Request
myManager r req = do
  -- use the IORef here
  s <- readIORef r
  putStrLn s
  writeIORef r (s ++ "!!")
  return req
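The partial-application idea can be tried without the HTTP library at all. In the sketch below, Request, Settings, modifyRequest, addConfigHeader and mkSettings are all hypothetical stand-ins invented for illustration; only the IORef calls are real library functions.

```haskell
import Data.IORef (IORef, newIORef, readIORef)

-- Hypothetical stand-ins for the library types, so the sketch runs on its own.
newtype Request = Request { requestHeaders :: [(String, String)] }
  deriving (Eq, Show)
newtype Settings = Settings { modifyRequest :: Request -> IO Request }

-- Helper that takes the reference as an extra first argument.
addConfigHeader :: IORef String -> Request -> IO Request
addConfigHeader ref (Request hs) = do
  value <- readIORef ref            -- a memory read, no file access
  return (Request (("X-Config", value) : hs))

-- The configuration value is obtained once (here it is passed in as a plain
-- String); partially applying the helper yields a Request -> IO Request.
mkSettings :: String -> IO Settings
mkSettings configValue = do
  ref <- newIORef configValue
  return (Settings { modifyRequest = addConfigHeader ref })
```

If the value never changes after startup, closing over the plain String works just as well; the IORef only earns its keep when the value may be updated later.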

In Haskell, how can I abort a calculation when a web client disconnects

I have a Haskell-based web service that performs a calculation that for some input can take a really long time to finish. ("really long" here means over a minute)
Because performing that calculation takes all the CPU available on the server, I place incoming requests in a queue (well, actually a stack for reasons that have to do with the typical client, but that's beside the point) when they arrive and service them when the currently running calculation finishes.
My problem is that the clients don't always wait long enough, and sometimes time out on their end, disconnect, and try a different server (well, they try again and hit the ELB, and usually get a different instance). Also, occasionally the calculation the web client was asking for will become obsolete because of external factors and the web client will be killed.
In those cases I'd really like to be able to detect that the web client has gone away before I pull the next request off the stack and start the (expensive) calculation. Unfortunately, my experience with snap leads me to believe that there's no way in that framework to ask "is the client's TCP connection still connected?" and I haven't found any documentation for other web frameworks that cover the "client disconnected" case.
So is there a Haskell web framework that makes it easy to detect whether a web client has disconnected? Or failing that, is there one that at least makes it possible?
(I understand that it may not be possible to be absolutely certain in all cases whether a TCP client is still there without sending data to the other end; however, when the client actually sends RST packets to the server and the server's framework doesn't let the application code determine that the connection is gone, that's a problem)
Incidentally, though one might suspect that warp's onClose handler would let you do this, this fires only when a response is ready and written to the client so is useless as a way of aborting a calculation in progress. There also seems to be no way to get access to the accepted socket so as to set SO_KEEPALIVE or similar. (There are ways to access the initial listening socket, but not the accepted one)
So I found an answer that works for me and it might work for someone else.
It turns out that you can in fact mess around enough with the internals of Warp to do this, but what you're left with is a basic version of Warp, and if you need things like logging you will need to add other packages on top of that.
Also, note that so-called "half-closed" connections (when the client closes their sending end, but is still waiting for data) will be detected as closed, interrupting your calculation. I don't know of any HTTP clients that deal in half-closed connections, but just something to be aware of.
Anyway, what I did first was copy the functions runSettings and runSettingsSocket exposed by Network.Wai.Handler.Warp and Network.Wai.Handler.Warp.Internal and make versions that call a function I supply instead of WarpI.socketConnection, so that I have the signature:
runSettings' :: Warp.Settings -> (Socket -> IO (IO WarpI.Connection))
             -> Wai.Application -> IO ()
This required copying out a few helper methods, like setSocketCloseOnExec and windowsThreadBlockHack. The double-IO signature there might look weird, but it's what you want - the outer IO is run in the main thread (that calls accept) and the inner IO is run in the per-connection thread that is forked after accept returns. The original Warp function runSettings is equivalent to:
\set -> runSettings' set (WarpI.socketConnection >=> return . return)
Then I did:
data ClientDisappeared = ClientDisappeared deriving (Show, Eq, Enum, Ord)
instance Exception ClientDisappeared
runSettingsSignalDisconnect :: Warp.Settings -> Wai.Application -> IO ()
runSettingsSignalDisconnect set =
  runSettings' set (WarpI.socketConnection >=> return . wrapConn)
  where
    -- Fork a 'monitor' thread that does nothing but attempt to
    -- perform a read from conn in a loop 1/sec, and wrap the receive
    -- methods on conn so that they first consume from the stuff read
    -- by the monitoring thread. If the monitoring thread sees
    -- end-of-file (signaled by an empty string read), raise
    -- ClientDisappeared on the per-connection thread.
    wrapConn conn = do
      tid <- myThreadId
      nxtBstr <- newEmptyMVar :: IO (MVar ByteString)
      semaphore <- newMVar ()
      readerCount <- newIORef (0 :: Int)
      monitorThread <- forkIO (monitor tid nxtBstr semaphore readerCount)
      return $ conn {
          WarpI.connClose = throwTo monitorThread ClientDisappeared
                            >> WarpI.connClose conn
        , WarpI.connRecv = newRecv nxtBstr semaphore readerCount
        , WarpI.connRecvBuf = newRecvBuf nxtBstr semaphore readerCount
        }
      where
        newRecv :: MVar ByteString -> MVar () -> IORef Int
                -> IO ByteString
        newRecv nxtBstr sem readerCount =
          bracket_
            (atomicModifyIORef' readerCount $ \x -> (succ x, ()))
            (atomicModifyIORef' readerCount $ \x -> (pred x, ()))
            (withMVar sem $ \_ -> do w <- tryTakeMVar nxtBstr
                                     case w of
                                       Just w' -> return w'
                                       Nothing -> WarpI.connRecv conn
            )
        newRecvBuf :: MVar ByteString -> MVar () -> IORef Int
                   -> WarpI.Buffer -> WarpI.BufSize -> IO Bool
        newRecvBuf nxtBstr sem readerCount buf bufSize =
          bracket_
            (atomicModifyIORef' readerCount $ \x -> (succ x, ()))
            (atomicModifyIORef' readerCount $ \x -> (pred x, ()))
            (withMVar sem $ \_ -> do
                (fulfilled, buf', bufSize') <-
                  if bufSize == 0 then return (False, buf, bufSize)
                  else
                    do w <- tryTakeMVar nxtBstr
                       case w of
                         Nothing -> return (False, buf, bufSize)
                         Just w' -> do
                           let wlen = B.length w'
                           if wlen > bufSize
                             then do BU.unsafeUseAsCString w' $ \cw' ->
                                       copyBytes buf (castPtr cw') bufSize
                                     putMVar nxtBstr (B.drop bufSize w')
                                     return (True, buf, 0)
                             else do BU.unsafeUseAsCString w' $ \cw' ->
                                       copyBytes buf (castPtr cw') wlen
                                     return (wlen == bufSize, plusPtr buf wlen,
                                             bufSize - wlen)
                if fulfilled then return True
                  else WarpI.connRecvBuf conn buf' bufSize'
            )
        dropClientDisappeared :: ClientDisappeared -> IO ()
        dropClientDisappeared _ = return ()
        monitor tid nxtBstr sem st =
          catch (monitor' tid nxtBstr sem st) dropClientDisappeared
        monitor' tid nxtBstr sem st = do
          (hitEOF, readerCount) <- withMVar sem $ \_ -> do
            w <- tryTakeMVar nxtBstr
            case w of
              -- No one picked up our bytestring from last time
              Just w' -> putMVar nxtBstr w' >> return (False, 0)
              Nothing -> do
                w <- WarpI.connRecv conn
                putMVar nxtBstr w
                readerCount <- readIORef st
                return (B.null w, readerCount)
          if hitEOF && (readerCount == 0)
            -- Don't signal if main thread is also trying to read -
            -- in that case, main thread will see EOF directly
            then throwTo tid ClientDisappeared
            else do threadDelay oneSecondInMicros
                    monitor' tid nxtBstr sem st
        oneSecondInMicros = 1000000
Assuming that 'web service' means HTTP(S)-based clients, one option is to use a RESTful approach. Instead of assuming that clients are going to stay connected, the service could accept the request and return 202 Accepted. As the HTTP status code specification outlines:
The request has been accepted for processing, but the processing has not been completed [...]
The 202 response is intentionally non-committal. Its purpose is to allow a server to accept a request for some other process (perhaps a batch-oriented process that is only run once per day) without requiring that the user agent's connection to the server persist until the process is completed. The entity returned with this response SHOULD include an indication of the request's current status and either a pointer to a status monitor or some estimate of when the user can expect the request to be fulfilled.
The server immediately responds with a 202 Accepted response, and also includes a URL that the client can use to poll for status. One option is to put this URL in the response's Location header, but you can also put the URL in a link in the response's body.
The client can poll the status URL for status. Once the calculation finishes, the status resource can provide a link to the finished result.
You can add cache headers to the status resource and final result if you're concerned that the clients will be polling too hard.
REST in Practice outlines the general concepts, while the RESTful Web Services Cookbook has lots of good details.
I'm not saying that you can't do something with either HTTP or TCP/IP (I don't know), but if you can't, then the above is a tried-and-true solution to similar problems.
Obviously, this is completely independent of programming language, but it's been my experience that REST and algebraic data types go well together.
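As a rough illustration of how the 202 Accepted workflow maps onto algebraic data types, here is a small framework-free sketch. Every name in it (JobId, JobState, Response, statusUrl, accepted, status) and the URL layout are assumptions made up for this example, not part of any web framework.

```haskell
-- Hypothetical model of the 202 Accepted workflow; all names and paths
-- are illustrative only.
type JobId = Int

data JobState
  = Pending            -- still queued or being calculated
  | Finished String    -- carries the URL of the finished result
  deriving (Eq, Show)

-- A response reduced to status code, headers and body.
data Response = Response Int [(String, String)] String
  deriving (Eq, Show)

statusUrl :: JobId -> String
statusUrl jid = "/jobs/" ++ show jid ++ "/status"

-- Reply to the initial request: accept it and point at the status resource.
accepted :: JobId -> Response
accepted jid = Response 202 [("Location", statusUrl jid)] ""

-- Reply to a poll of the status resource.
status :: JobState -> Response
status Pending        = Response 200 [] "pending"
status (Finished url) = Response 200 [] ("finished: " ++ url)
```

With this shape the server never has to notice a disconnect: a client that gives up simply stops polling.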

Using TChan with Timeout

I have a TChan as input for a thread which should behave like this:
If somebody writes to the TChan within a specific time, the content should be retrieved. If nothing is written within the specified time, it should unblock and continue with Nothing.
My attempt on this was to use the timeout function from System.Timeout like this:
timeout 1000000 $ atomically $ readTChan pktChannel
This seemed to work, but now I have discovered that I am sometimes losing packets (they are written to the channel, but not read on the other side). In the log I get this:
2014.063.11.53.43.588365 Pushing Recorded Packet: 2 1439
2014.063.11.53.43.592319 Run into timeout
2014.063.11.53.44.593396 Run into timeout
2014.063.11.53.44.593553 Pushing Recorded Packet: 3 1439
2014.063.11.53.44.597177 Sending Recorded Packet: 3 1439
Where "Pushing Recorded Packet" is the writing from the one thread and "Sending Recorded Packet" is the reading from the TChan in the sender thread. The line with Sending Recorded Packet 2 1439 is missing, which would indicate a successful read from the TChan.
It seems that if the timeout fires at the wrong point in time, the channel loses the packet. I suspect that the asynchronous exception used inside timeout and STM don't play well together.
Is this correct? Does somebody have another solution that does not lose the packet?
Use registerDelay, an STM function, to signal a TVar when the timeout is reached. You can then use the orElse function or the Alternative operator <|> to select between the next TChan value or the timeout.
import Control.Applicative
import Control.Monad
import Control.Concurrent
import Control.Concurrent.STM
import System.Random
-- write random values after a random delay
packetWriter :: Int -> TChan Int -> IO ()
packetWriter maxDelay chan = do
  let xs = randomRs (10000 :: Int, maxDelay + 50000) (mkStdGen 24036583)
  forM_ xs $ \ x -> do
    threadDelay x
    atomically $ writeTChan chan x
-- block (retry) until the delay TVar is set to True
fini :: TVar Bool -> STM ()
fini = check <=< readTVar
-- Read the next value from a TChan or timeout
readTChanTimeout :: Int -> TChan a -> IO (Maybe a)
readTChanTimeout timeoutAfter pktChannel = do
  delay <- registerDelay timeoutAfter
  atomically $
        Just <$> readTChan pktChannel
    <|> Nothing <$ fini delay
-- | Print packets until a timeout is reached
readLoop :: Show a => Int -> TChan a -> IO ()
readLoop timeoutAfter pktChannel = do
  res <- readTChanTimeout timeoutAfter pktChannel
  case res of
    Nothing -> putStrLn "timeout"
    Just val -> do
      putStrLn $ "packet: " ++ show val
      readLoop timeoutAfter pktChannel
main :: IO ()
main = do
  let timeoutAfter = 1000000
  -- spin up a packet writer simulation
  pktChannel <- newTChanIO
  tid <- forkIO $ packetWriter timeoutAfter pktChannel
  readLoop timeoutAfter pktChannel
  killThread tid
The rule of thumb for concurrency is: if adding a sleep at some point inside an IO action changes the outcome, your program is not safe.
To understand why the code timeout 1000000 $ atomically $ readTChan pktChannel does not work, consider the following alternative implementation of atomically:
atomically' :: STM a -> IO a
atomically' action = do
  result <- atomically action
  threadDelay someTimeAmount
  return result
The above is equivalent to atomically, except for an extra, innocent-looking delay. Now it is easy to see that if timeout kills the thread during the threadDelay, the atomic action has already completed (consuming a message from the channel), yet timeout will return Nothing.
A simple fix to timeout n $ atomically ... could be the following
smartTimeout :: Int -> STM a -> IO (Maybe a)
smartTimeout n action = do
  v <- atomically newEmptyTMVar
  _ <- timeout n $ atomically $ do
    result <- action
    putTMVar v result
  atomically $ tryTakeTMVar v
The above uses an extra transactional variable v to do the trick. The result value of the action is stored into v inside the same atomic block in which the action is run. The return value of timeout is not trusted, since it does not tell us if action was run or not. After that, we check the TMVar v, which will be full if and only if action was run.
Instead of TChan a, use TChan (Maybe a). Your normal producer (of x) now writes Just x. Fork an extra "ticking" thread that writes Nothing to the channel every x seconds. Then have the reader abort when it gets two successive Nothing values. This way you avoid exceptions, which may be what causes data to get lost in your case (but I am not sure).
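A minimal sketch of this ticking approach, assuming the stm package; the names startTicker and drainUntilIdle are made up for illustration. Note that the effective timeout is only approximate: the reader gives up somewhere between one and two tick intervals after the last packet.

```haskell
import Control.Concurrent (ThreadId, forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Monad (forever)

-- Fork a ticker that writes Nothing to the channel every `micros` microseconds.
startTicker :: Int -> TChan (Maybe a) -> IO ThreadId
startTicker micros chan = forkIO . forever $ do
  threadDelay micros
  atomically (writeTChan chan Nothing)

-- Read from the channel until two ticks (Nothing) arrive with no packet in
-- between; return the packets seen so far, in arrival order.
drainUntilIdle :: TChan (Maybe a) -> IO [a]
drainUntilIdle chan = go False []
  where
    go sawTick acc = do
      msg <- atomically (readTChan chan)
      case msg of
        Just x  -> go False (x : acc)
        Nothing
          | sawTick   -> return (reverse acc)
          | otherwise -> go True acc
```

Because the reader only ever blocks inside readTChan and is never interrupted by an exception, a packet that was written to the channel cannot be dropped on the reading side.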

Haskell - Actor based mutability

I'm working on a Haskell network application and I use the actor pattern to manage multithreading. One thing I came across is how to store, for example, a set of client sockets/handles, which of course must be accessible to all threads and can change when clients log on or off.
Since I'm coming from the imperative world, I thought about some kind of lock mechanism, but when I noticed how ugly that is, I thought about "pure" mutability (well, actually it's kind of pure):
import Control.Concurrent
import Control.Monad
import Network
import System.IO
import Data.List
import Data.Maybe
import System.Environment
import Control.Exception
newStorage :: (Eq a, Show a) => IO (Chan (String, Maybe (Chan [a]), Maybe a))
newStorage = do
  q <- newChan
  forkIO $ storage [] q
  return q
newHandleStorage :: IO (Chan (String, Maybe (Chan [Handle]), Maybe Handle))
newHandleStorage = newStorage
storage :: (Eq a, Show a) => [a] -> Chan (String, Maybe (Chan [a]), Maybe a) -> IO ()
storage s q = do
  let loop = (`storage` q)
  (req, reply, d) <- readChan q
  print ("processing " ++ show(d))
  case req of
    "add" -> loop ((fromJust d) : s)
    "remove" -> loop (delete (fromJust d) s)
    "get" -> do
      writeChan (fromJust reply) s
      loop s
store s d = writeChan s ("add", Nothing, Just d)
unstore s d = writeChan s ("remove", Nothing, Just d)
request s = do
  chan <- newChan
  writeChan s ("get", Just chan, Nothing)
  readChan chan
The point is that a thread (actor) manages a list of items and modifies the list according to incoming requests. Since threads are really cheap, I thought this could be a really nice functional alternative.
Of course this is just a prototype (a quick dirty proof of concept).
So my question is:
Is this a "good" way of managing shared mutable variables (in the actor world) ?
Is there already a library for this pattern ? (I already searched but I found nothing)
Regards,
Chris
Here is a quick and dirty example using stm and pipes-network. This will set up a simple server that allows clients to connect and increment or decrement a counter. It will display a very simple status bar showing the current tallies of all connected clients and will remove client tallies from the bar when they disconnect.
First I will begin with the server, and I've generously commented the code to explain how it works:
import Control.Concurrent.STM (STM, atomically)
import Control.Concurrent.STM.TVar
import qualified Data.HashMap.Strict as H
import Data.Foldable (forM_)
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (unless)
import Control.Monad.Trans.State.Strict
import qualified Data.ByteString.Char8 as B
import Control.Proxy
import Control.Proxy.TCP
import System.IO
main = do
    hSetBuffering stdout NoBuffering
    {- These are the internal data structures. They should be an implementation
       detail and you should never expose these references to the
       "business logic" part of the application. -}
    -- I use nRef to keep track of creating fresh Ints (which identify users)
    nRef <- newTVarIO 0 :: IO (TVar Int)
    {- hMap associates every user (i.e. Int) with a counter
       Notice how I've "striped" the hash map by storing STM references to the
       values instead of storing the values directly. This means that I only
       actually write the hashmap when adding or removing users, which reduces
       contention for the hash map.
       Since each user gets their own unique STM reference for their counter,
       modifying counters does not cause contention with other counters or
       contention with the hash map. -}
    hMap <- newTVarIO H.empty :: IO (TVar (H.HashMap Int (TVar Int)))
    {- The following code makes heavy use of Haskell's pure closures. Each
       'let' binding closes over its current environment, which is safe since
       Haskell is pure. -}
    let {- 'getCounters' is the only server-facing command in our STM API. The
           only permitted operation is retrieving the current set of user
           counters.
           'getCounters' closes over the 'hMap' reference currently in scope so
           that the server never needs to be aware about our internal
           implementation. -}
        getCounters :: STM [Int]
        getCounters = do
            refs <- fmap H.elems (readTVar hMap)
            mapM readTVar refs
        {- 'init' is the only client-facing command in our STM API. It
           initializes the client's entry in the hash map and returns two
           commands: the first command is what the client calls to 'increment'
           their counter and the second command is what the client calls to log
           off and delete their counter (the 'delete' command).
           Notice that those two returned commands each close over the client's
           unique STM reference so the client never needs to be aware of how
           exactly 'init' is implemented under the hood. -}
        init :: STM (STM (), STM ())
        init = do
            n <- readTVar nRef
            writeTVar nRef $! n + 1
            ref <- newTVar 0
            modifyTVar' hMap (H.insert n ref)
            let incrementRef :: STM ()
                incrementRef = do
                    mRef <- fmap (H.lookup n) (readTVar hMap)
                    forM_ mRef $ \ref -> modifyTVar' ref (+ 1)
                deleteRef :: STM ()
                deleteRef = modifyTVar' hMap (H.delete n)
            return (incrementRef, deleteRef)
    {- Now for the actual program logic. Everything past this point only uses
       the approved STM API (i.e. 'getCounters' and 'init'). If I wanted I
       could factor the above approved STM API into a separate module to enforce
       the encapsulation boundary, but I am lazy. -}
    {- Fork a thread which polls the current state of the counters and displays
       it to the console. There is a way to implement this without polling but
       this gets the job done for now.
       Most of what it is doing is just some simple tricks to reuse the same
       console line instead of outputting a stream of lines. Otherwise it
       would be just:
       forkIO $ forever $ do
           ns <- atomically getCounters
           print ns
    -}
    forkIO $ (`evalStateT` 0) $ forever $ do
        del <- get
        lift $ do
            putStr (replicate del '\b')
            putStr (replicate del ' ' )
            putStr (replicate del '\b')
        ns <- lift $ atomically getCounters
        let str = show ns
        lift $ putStr str
        put $! length str
        lift $ threadDelay 10000
    {- Fork a thread for each incoming connection, which listens to the client's
       commands and translates them into 'STM' actions -}
    serve HostAny "8080" $ \(socket, _) -> do
        (increment, delete) <- atomically init
        {- Right now, just do the dumb thing and convert all keypresses into
           increment commands, with the exception of the 'q' key, which will
           quit -}
        let handler :: (Proxy p) => () -> Consumer p Char IO ()
            handler () = runIdentityP loop
              where
                loop = do
                    c <- request ()
                    unless (c == 'q') $ do
                        lift $ atomically increment
                        loop
        {- This uses my 'pipes' library. It basically is a high-level way to
           say:
           * Read binary packets from the socket no bigger than 4096 bytes
           * Get the first character from each packet and discard the rest
           * Handle the character using the above 'handler' function -}
        runProxy $ socketReadS 4096 socket >-> mapD B.head >-> handler
        {- The above pipeline finishes either when the socket closes or
           'handler' stops looping because it received a 'q'. Either case means
           that the client is done so we log them out using 'delete'. -}
        atomically delete
Next up is the client, which simply opens a connection and forwards all key presses as single packets:
import Control.Monad
import Control.Proxy
import Control.Proxy.Safe
import Control.Proxy.TCP.Safe
import Data.ByteString.Char8 (pack)
import System.IO
main = do
    hSetBuffering stdin NoBuffering
    hSetEcho stdin False
    {- Again, this uses my 'pipes' library. It basically says:
       * Read characters from the console using 'commands'
       * Pack them into a binary format
       * send them to a server running at 127.0.0.1:8080
       This finishes looping when the user types a 'q' or the connection is
       closed for whatever reason.
    -}
    runSafeIO $ runProxy $ runEitherK $
            try . commands
        >-> mapD (\c -> pack [c])
        >-> connectWriteD Nothing "127.0.0.1" "8080"
commands :: (Proxy p) => () -> Producer p Char IO ()
commands () = runIdentityP loop
  where
    loop = do
        c <- lift getChar
        respond c
        unless (c == 'q') loop
It's pretty simple: commands generates a stream of Chars, which then get converted to ByteStrings and then sent as packets to the server.
If you run the server and a few clients and have them each type in a few keys, your server display will output a list showing how many keys each client typed:
[1,6,4]
... and if some of the clients disconnect they will be removed from the list:
[1,4]
Note that the pipes component of these examples will simplify greatly in the upcoming pipes-4.0.0 release, but the current pipes ecosystem still gets the job done as is.
First, I'd definitely recommend using your own specific data type for representing commands. When using (String, Maybe (Chan [a]), Maybe a) a buggy client can crash your actor simply by sending an unknown command or by sending ("add", Nothing, Nothing), etc. I'd suggest something like
data Command a = Add a | Remove a | Get (Chan [a])
Then you can pattern match on commands in storage in a safe way.
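As a sketch of what the storage actor from the question might look like with such a Command type (same function names as in the question, with the logging and the Handle-specific wrapper left out for brevity):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Data.List (delete)

data Command a = Add a | Remove a | Get (Chan [a])

-- Start the actor thread and return its mailbox.
newStorage :: Eq a => IO (Chan (Command a))
newStorage = do
  q <- newChan
  _ <- forkIO (storage [] q)
  return q

-- The actor loop: every constructor is handled, so a malformed request
-- such as ("add", Nothing, Nothing) can no longer be expressed at all.
storage :: Eq a => [a] -> Chan (Command a) -> IO ()
storage s q = do
  cmd <- readChan q
  case cmd of
    Add x     -> storage (x : s) q
    Remove x  -> storage (delete x s) q
    Get reply -> writeChan reply s >> storage s q

-- Ask the actor for the current list and wait for its reply.
request :: Chan (Command a) -> IO [a]
request q = do
  reply <- newChan
  writeChan q (Get reply)
  readChan reply
```

Clients then send `Add x` or `Remove x` with writeChan, and the compiler can warn about any command the actor forgets to handle.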
Actors have their advantages, but also I feel that they have some drawbacks. For example, getting an answer from an actor requires sending it a command and then waiting for a reply. And the client can't be completely sure that it gets a reply and that the reply will be of some specific type - you can't say I want only answers of this type (and how many of them) for this particular command.
So as an example I'll give a simple STM solution. It'd be better to use a hash table or a (balanced tree) set, but since Handle implements neither Ord nor Hashable, we can't use these data structures, so I'll keep using lists.
module ThreadSet (
    TSet, add, remove, get
  ) where
import Control.Monad
import Control.Monad.STM
import Control.Concurrent.STM.TVar
import Data.List (delete)
newtype TSet a = TSet (TVar [a])
add :: (Eq a) => a -> TSet a -> STM ()
add x (TSet v) = readTVar v >>= writeTVar v . (x :)
remove :: (Eq a) => a -> TSet a -> STM ()
remove x (TSet v) = readTVar v >>= writeTVar v . delete x
get :: (Eq a) => TSet a -> STM [a]
get (TSet v) = readTVar v
This module implements an STM-based set of arbitrary elements. You can have multiple such sets and use them together in a single STM transaction that succeeds or fails as a whole. For example:
-- | Ensures that there is exactly one element `x` in the set.
add1 :: (Eq a) => a -> TSet a -> STM ()
add1 x v = remove x v >> add x v
This would be difficult with actors: you'd have to add it as another command for the actor, because you can't compose it from existing actions and still have atomicity.
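The same composability extends across several sets. The sketch below repeats a minimal version of the TSet API so that it compiles on its own; newTSet and move are hypothetical additions (the module above does not export a way to create a set), and modifyTVar' assumes a reasonably recent stm package.

```haskell
import Control.Concurrent.STM
import Data.List (delete)

-- Minimal copy of the TSet API, so the sketch compiles on its own.
newtype TSet a = TSet (TVar [a])

-- Hypothetical constructor function for an empty set.
newTSet :: STM (TSet a)
newTSet = TSet <$> newTVar []

add, remove :: Eq a => a -> TSet a -> STM ()
add x (TSet v)    = modifyTVar' v (x :)
remove x (TSet v) = modifyTVar' v (delete x)

get :: TSet a -> STM [a]
get (TSet v) = readTVar v

-- Hypothetical helper: move x from one set to another in one transaction.
-- No other thread can observe x in both sets or in neither. Note that x is
-- added to the target even if it was not present in the source.
move :: Eq a => a -> TSet a -> TSet a -> STM ()
move x from to = remove x from >> add x to
```

A caller runs the whole thing with `atomically (move h waiting active)`, where waiting and active are two sets created with newTSet.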
Update: There is an interesting article explaining why Clojure's designers chose not to use actors. For example, with actors, even if you have many reads and only very few writes to a mutable structure, they're all serialized, which can greatly impact performance.

How can I refactor this without IORefs?

How could I refactor this so that eventually IORefs would not be necessary?
inc :: IORef Int -> IO ()
inc ref = modifyIORef ref (+1)
main = withSocketsDo $ do
  s <- socket AF_INET Datagram defaultProtocol
  c <- newIORef 0
  f <- newIORef 0
  hostAddr <- inet_addr host
  time $ forM [0 .. 10000] $ \i -> do
    sendAllTo s (B.pack "ping") (SockAddrInet port hostAddr)
    (r, _) <- recvFrom s 1024
    if (B.unpack r) == "PING" then (inc c) else (inc f)
  c' <- readIORef c
  print (c')
  sClose s
  return()
What's wrong with using IORefs here? You're in IO anyways with the networking operations. IORefs aren't always the cleanest solution, but they seem to do the job well in this case.
Regardless, for the sake of answering the question, let's remove the IORefs. These references serve as a way of keeping state, so we'll have to come up with an alternate way to keep the stateful information.
The pseudocode for what we want to do is this:
open the connection
10000 times:
    send a message
    receive the response
    (keep track of how many responses are the message "PING")
print how many responses were the message "PING"
The chunk that is indented under "10000 times" can be abstracted into its own function. If we are to avoid IORefs, then this function will have to take in a previous state and produce a next state.
main = withSocketsDo $ do
  s <- socket AF_INET Datagram defaultProtocol
  hostAddr <- inet_addr host
  let sendMsg = sendAllTo s (B.pack "ping") (SockAddrInet port hostAddr)
      recvMsg = fst `fmap` recvFrom s 1024
  (c,f) <- ???
  print c
  sClose s
So the question is this: what do we put at the ??? place? We need to define some way to "perform" an IO action, take its result, and modify state with that result somehow. We also need to know how many times to do it.
performRepeatedlyWithState :: a             -- some state
                           -> IO b          -- some IO action that yields a value
                           -> (a -> b -> a) -- some way to produce a new state
                           -> Int           -- how many times to do it
                           -> IO a          -- the resultant state, as an IO action
performRepeatedlyWithState s _ _ 0 = return s
performRepeatedlyWithState someState someAction produceNewState timesToDoIt = do
  actionResult <- someAction
  let newState = produceNewState someState actionResult
  performRepeatedlyWithState newState someAction produceNewState (pred timesToDoIt)
All I did here was write down the type signature that matched what I said above, and produced the relatively obvious implementation. I gave everything a very verbose name to hopefully make it apparent exactly what this function means. Equipped with this simple function, we just need to use it.
let origState = (0,0)
    action = ???
    mkNewState = ???
    times = 10000
(c,f) <- performRepeatedlyWithState origState action mkNewState times
I've filled in the easy parameters here. The original state is (c,f) = (0,0), and we want to perform this 10000 times. (Or is it 10001?) But what should action and mkNewState look like? The action should have type IO b; it's some IO action that produces something.
action = sendMsg >> recvMsg
I bound sendMsg and recvMsg to expressions from your code earlier. The action we want to perform is to send a message, and then receive a message. The value this action produces is the message received.
Now, what should mkNewState look like? It should have the type a -> b -> a, where a is the type of the State, and b is the type of the action result.
mkNewState (c,f) val = if (B.unpack val) == "PING"
                         then (succ c, f)
                         else (c, succ f)
This isn't the cleanest solution, but do you get the general idea? You can replace IORefs by writing a function that recursively calls itself, passing extra parameters along in order to keep track of state. The exact same idea is embodied in the foldM solution suggested on the similar question.
Bang patterns, as Nathan Howell suggests, would be wise, to avoid building up a large thunk of succ (succ (succ ...)) in your state:
mkNewState (!c, !f) val = ...
Building on the earlier comment regarding a stack overflow:
The accumulators 'f' and 'c' in either the IORef or foldM case need to be evaluated to prevent a long chain of thunks from being allocated while you're iterating. One way of forcing evaluation of the thunks is to use a bang pattern. This tells the compiler to evaluate the value, removing the thunk, even though its value is not demanded in the function.
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent
import Control.Monad
import Data.ByteString.Char8
import Data.Foldable (foldlM)
import Data.IORef
import Network.Socket hiding (recvFrom)
import Network.Socket.ByteString (recvFrom, sendAllTo)
main = withSocketsDo $ do
  let host = "127.0.0.1"
      port = 9898
  s <- socket AF_INET Datagram defaultProtocol
  hostAddr <- inet_addr host
  -- explicitly mark both accumulators as strict using bang patterns
  let step (!c, !f) i = do
        sendAllTo s "PING" (SockAddrInet port hostAddr)
        (r, _) <- recvFrom s 1024
        return $ case r of
          -- without the bang patterns, c and f are never forced inside the
          -- loop, so the additions below would build a thunk chain. that can
          -- lead to a stack overflow when the chain is finally evaluated by
          -- the 'print c' call below.
          "PING" -> (c+1, f)
          _      -> (c, f+1)
  (c, f) <- foldlM step (0, 0) [0..10000]
  print c
  sClose s
  return ()

Resources