Implement main server loop in Haskell?

What is the generally accepted way to implement the main loop of a server that needs to wait on a heterogeneous set of events? That is, the server should wait (not busy-wait) until one of the following occurs:
new socket connection
data available on an existing socket
OS signal
third-party library callbacks

I think you're thinking in terms of a C paradigm with a single thread, nonblocking I/O, and a select() call.
You can manage to write something like that in Haskell, but Haskell has much more to offer:
lightweight threads
safe and efficient concurrent data primitives like MVar and Chan
the Big Gun: Software Transactional Memory
I recommend you fork a new thread for every separate point of contact with the outside world, and keep everything coordinated with STM.
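A minimal sketch of that recommendation, assuming a placeholder Event type and dummy sources (neither is part of the original answer): each point of contact gets its own lightweight thread, and the main loop blocks on a shared STM channel.

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM (atomically, newTChanIO, readTChan, writeTChan)
import Control.Monad (forever)

-- Hypothetical event type standing in for the four sources in the question.
data Event = NewConnection | DataReady | GotSignal | LibraryCallback
    deriving Show

main :: IO ()
main = do
    events <- newTChanIO
    -- One lightweight thread per point of contact with the outside world.
    -- These just sleep and emit dummy events; a real server would block in
    -- accept, recv, a signal handler, or the library's callback instead.
    _ <- forkIO $ forever $ threadDelay 1000000 >> atomically (writeTChan events NewConnection)
    _ <- forkIO $ forever $ threadDelay 1500000 >> atomically (writeTChan events DataReady)
    -- The main loop blocks (without busy-waiting) until any source produces an event.
    forever $ atomically (readTChan events) >>= print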

Use takeMVar and putMVar to synchronize between threads. They block the thread when the operation cannot proceed: takeMVar blocks on an empty MVar, putMVar on a full one.
Read the GHC docs.
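A tiny illustration of that blocking behaviour (the "done" MVar and the dummy work are just examples):

import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

main :: IO ()
main = do
    done <- newEmptyMVar              -- starts empty, so takeMVar below will block
    _ <- forkIO $ do
        threadDelay 1000000           -- simulate some work in the forked thread
        putMVar done "worker finished"
    msg <- takeMVar done              -- blocks until the worker calls putMVar
    putStrLn msg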

I'd like to make it clear I think the two solutions posted first are better than this one for the specific problem you have, but here's a way to solve the type of problem you presented.
A simple way round this is to take your definitions like
data SocketConn = ....
data DataAvail = ...
data OSSignal = ...
data Callback = ...
and define the unsimplified version of
data ServerEvent = Sock SocketConn | Dat DataAvail | Sig OSSignal | Call Callback
handleEvent :: ServerEvent -> IO ()
handleEvent (Sock s) = ....
handleEvent (Dat d)  = ....
handleEvent (Sig o)  = ....
handleEvent (Call c) = ....
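One way to drive handleEvent is to read ServerEvents from a shared channel, sketched below; the Chan and the producer threads that feed it are assumptions, not part of the answer above.

import Control.Concurrent.Chan (Chan, readChan)
import Control.Monad (forever)

-- Assumes the ServerEvent type and handleEvent above; one producer thread per
-- event source would write into this shared channel.
serverLoop :: Chan ServerEvent -> IO ()
serverLoop events = forever (readChan events >>= handleEvent)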
Like I say, read up on the other answers!

Software Transactional Memory (STM) is the main way to do a multi-way wait.
However, by the looks of things, in your case you probably just want to spawn a separate Haskell thread for each task, and let each such thread block while there's nothing happening.
You wouldn't want to create a thousand OS threads, but a thousand Haskell threads is no trouble at all.
(If these threads need to coordinate from time to time, then again, STM is probably the simplest, most reliable way to do that.)

Related

Concurrency considerations between pipes and non-pipes code

I'm in the process of wrapping a C library for some encoding in a pipes interface, but I've hit upon some design decisions that need to be made.
After the C library is set up, we hold on to an encoder context. With this, we can either encode, or change some parameters (let's call the Haskell interface to this last function tune :: Context -> Int -> IO ()). There are two parts to my question:
1. The encoding part is easily wrapped up in a Pipe Foo Bar IO (), but I would also like to expose tune. Since simultaneous use of the encoding context must be lock-protected, I would need to take a lock at every iteration in the pipe, and protect tune by taking the same lock. But now I feel I'm forcing hidden locks on the user. Am I barking up the wrong tree here? How is this kind of situation normally resolved in the pipes ecosystem? In my case I expect the pipe that my specific code is part of to always run in its own thread, with tuning happening concurrently, but I don't want to force this point of view upon any users. Other packages in the pipes ecosystem don't seem to force this on their users either.
2. An encoding context that is no longer used needs to be properly de-initialized. How does one, in the pipes ecosystem, ensure that such things (in this case performing some IO actions) are taken care of when the pipe is destroyed?
A concrete example would be wrapping a compression library, in which case the above can be:
The compression strength is tunable. We set up the pipe and it runs along merrily. How should one best go about allowing the compression strength setting to be changed while the pipe keeps running, assuming that concurrent access to the compression codec context must be serialized?
The compression library allocated a bunch of memory off the Haskell heap when set up, and we'll need to call some library function to clean this up when the pipe is torn down.
Thanks… this might all be obvious, but I'm quite new to the pipes ecosystem.
Edit: Reading this after posting, I'm quite sure it's the vaguest question I've ever asked here. Ugh! Sorry ;-)
Regarding (1), the general solution is to change your Pipe's type to:
Pipe (Either (Context, Int) Foo) Bar IO ()
In other words, it accepts both Foo inputs and tune requests, which it processes internally.
So let's then assume that you have two concurrent Producers corresponding to inputs and tune requests:
producer1 :: Producer Foo IO ()
producer2 :: Producer (Context, Int) IO ()
You can use pipes-concurrency to create a buffer that they both feed into, like this:
example = do
    (output, input) <- spawn Unbounded
    -- input  :: Input  (Either (Context, Int) Foo)
    -- output :: Output (Either (Context, Int) Foo)
    let io1 = runEffect $ producer1 >-> Pipes.Prelude.map Right >-> toOutput output
        io2 = runEffect $ producer2 >-> Pipes.Prelude.map Left  >-> toOutput output
    as <- mapM async [io1, io2]
    runEffect (fromInput input >-> yourPipe >-> someConsumer)
    mapM_ wait as
You can learn more about the pipes-concurrency library by reading this tutorial.
By forcing all tune requests to go through the same single-threaded Pipe you can ensure that you don't accidentally have two concurrent invocations of the tune function.
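A sketch of what that Either-accepting pipe could look like; tune comes from the question, while encode and the Context/Foo/Bar types are assumed stand-ins for the real C wrappers:

import Control.Monad (forever)
import Control.Monad.IO.Class (liftIO)
import Pipes

encoderPipe :: Context -> Pipe (Either (Context, Int) Foo) Bar IO ()
encoderPipe ctx = forever $ do
    req <- await
    case req of
        Left (c, n) -> liftIO (tune c n)                   -- tune requests are handled in-line
        Right foo   -> liftIO (encode ctx foo) >>= yield   -- normal encoding path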
Regarding (2), there are two ways you can acquire a resource using pipes. The more sophisticated approach is to use the pipes-safe library, which provides a bracket function that you can use within a Pipe, but that is probably overkill for your purpose; it mainly exists for acquiring and releasing multiple resources over the lifetime of a pipe. A simpler solution is just to use the following with-style idiom to acquire the pipe:
withEncoder :: (Pipe Foo Bar IO () -> IO r) -> IO r
withEncoder k = bracket acquire release $ \resource ->
    k (createPipeFromResource resource)
Then a user would just write:
withEncoder $ \yourPipe ->
    runEffect (someProducer >-> yourPipe >-> someConsumer)
You can optionally use the managed package, which simplifies the types a bit and makes it easier to acquire multiple resources. You can learn more about it from reading this blog post of mine.

Design choice of Haskell data types in multithreaded programs

In a multiple threaded server application, I use the type Client to represent a client. The nature of Client is quite mutable: clients send UDP heartbeat messages to keep registered with the server, the message may also contain some realtime data (think of a sensor). I need to keep track of many things such as the timestamp and source address of the last heartbeat, the realtime data, etc. The result is a pretty big structure with many states. Each client has a client ID, and I use a HashMap wrapped in an MVar to store the clients, so lookup is easy and fast.
type ID = ByteString
type ClientMap = MVar (HashMap ID Client)
There's a "global" value of ClientMap which is made available to each thread. It's stored in a ReaderT transformer along with many other global values.
The Client by itself is a big immutable structure, using strict fields to prevent space leaks:
data Client = Client
    { _c_id        :: !ID
    , _c_timestamp :: !POSIXTime
    , _c_addr      :: !SockAddr
    , _c_load      :: !Int
    ...
    }

makeLenses ''Client
Using immutable data structures in a mutable wrapper is a common design pattern in Concurrent Haskell, according to Parallel and Concurrent Programming in Haskell. When a heartbeat message is received, the thread that processes the message constructs a new Client, locks the MVar of the HashMap, inserts the Client into the HashMap, and puts the new HashMap back in the MVar. The code is basically:
modifyMVar_ hashmap_mvar (\hm ->
    let c = Client id ...
    in  return $! M.insert id c hm)
This approach works fine, but as the number of clients grows (we now have tens of thousands of clients), several problems emerge:
The client sends heartbeat messages pretty frequently (around every 30 seconds), resulting in access contention of the ClientMap.
Memory consumption of the program seems to be quite high. My understanding is that, updating large immutable structures wrapped in MVar frequently will make the garbage collector very busy.
Now, to reduce the contention of the global hashmap_mvar, I tried to wrap the mutable fields of Client in an MVar for each client, such as:
data ClientState = ClientState
    { _c_timestamp :: !POSIXTime
    , _c_addr      :: !SockAddr
    , _c_load      :: !Int
    ...
    }

makeLenses ''ClientState

data Client = Client
    { c_id    :: !ID
    , c_state :: MVar ClientState
    }
This seems to reduce the level of contention (because now I only need to update the MVar inside each Client, so the granularity is finer), but the memory footprint of the program is still high. I've also tried to UNPACK some of the fields, but that didn't help.
Any suggestions? Will STM solve the contention problem? Should I resort to mutable data structures other than immutable ones wrapped in MVar?
See also Updating a Big State Fast in Haskell.
Edit:
As Nikita Volkov pointed out, a shared map smells like bad design in a typical TCP-based server-client application. However, in my case the system is UDP-based, meaning there's no such thing as a "connection". The server uses a single thread to receive UDP messages from all the clients, parses them and performs actions accordingly, e.g., updating the client data. Another thread reads the map periodically, checks the timestamp of the heartbeats, and deletes the clients that have not sent a heartbeat in, say, the last 5 minutes. So a shared map seems inevitable? Anyway, I understand that using UDP was a poor design choice in the first place, but I would still like to know how I can improve my situation with UDP.
First of all, why exactly do you need that shared map at all? Do you really need to share the private states of clients with anything? If not (which is the typical case with a client-server application), then you can simply get around without any shared map.
In fact, there is a "remotion" library, which encompasses all the client-server communication, allowing you to create services simply by extending it with your custom protocol. You should take a look.
Secondly, using multiple MVars over fields of some entity is always potentially a race condition bug. You should use STM, when you need to update multiple things atomically. I'm not sure if that's the case in your app, nonetheless you should be aware of that.
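A minimal sketch of that point, with two per-client fields held in TVars (the cs_* field names are made up for the example): a single atomically block updates both, so no other thread can observe one field updated and the other not.

import Control.Concurrent.STM (TVar, atomically, writeTVar)
import Data.Time.Clock.POSIX (POSIXTime)
import Network.Socket (SockAddr)

data ClientState = ClientState
    { cs_timestamp :: TVar POSIXTime
    , cs_addr      :: TVar SockAddr
    }

-- Both fields change together or not at all.
recordHeartbeat :: ClientState -> POSIXTime -> SockAddr -> IO ()
recordHeartbeat st now addr = atomically $ do
    writeTVar (cs_timestamp st) now
    writeTVar (cs_addr st) addr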
Thirdly,
The client sends heartbeat messages pretty frequently (around every 30 seconds), resulting in access contention of the ClientMap
Seems like a job for the Map type from the recently released "stm-containers" library. See this blog post for an introduction to the library. You'll be able to get back to the immutable Client model with this.
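A rough sketch of that suggestion, assuming the StmContainers.Map module from a recent stm-containers release (module name and argument order have varied between versions, so treat the exact calls as an assumption); Client is the type from the question.

import Control.Concurrent.STM (atomically)
import Data.ByteString (ByteString)
import qualified StmContainers.Map as StmMap

type ID = ByteString

-- Inserting or refreshing a client touches only the affected key, not a whole
-- immutable HashMap, so concurrent heartbeats for different clients don't contend.
registerHeartbeat :: StmMap.Map ID Client -> ID -> Client -> IO ()
registerHeartbeat clients cid client =
    atomically (StmMap.insert client cid clients)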

Is the random number generator in Haskell thread-safe?

Is the same "global random number generator" shared across all threads, or does each thread get its own?
If one is shared, how can I ensure thread-safety? The approach using getStdGen and setStdGen described in the "Monads" chapter of Real World Haskell doesn't look safe.
If each thread has an independent generator, will the generators for two threads started in rapid succession have different seeds? (They won't, for example, if the seed is a time in seconds, but milliseconds might be OK. I don't see how to get a time with millisecond resolution from Data.Time.)
There is a function named newStdGen, which gives one a new std. gen every time it's called. Its implementation uses atomicModifyIORef and thus is thread-safe.
newStdGen is better than get/setStdGen not only in terms of thread-safety, but it also guards you from potential single-threaded bugs like this: let rnd = (fst . randomR (1,5)) <$> getStdGen in (==) <$> rnd <*> rnd.
In addition, if you think about the semantics of newStdGen vs getStdGen/setStdGen, the first ones can be very simple: you just get a new std. gen in a random state, chosen non-deterministically. On the other hand, with the get/set pair you can't abstract away the global program state, which is bad for multiple reasons.
I would suggest you use getStdGen only once (in the main thread) and then use the split function to generate new generators. I would do it like this:
Make an MVar that contains the generator. Whenever a thread needs a new generator, it takes the current value out of the MVar, calls split, and puts the new generator back. Due to the semantics of MVar, this should be thread-safe.
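A sketch of that MVar-plus-split idea; the generator MVar would be created once in main and passed to each thread.

import Control.Concurrent.MVar (MVar, modifyMVar, newMVar)
import System.Random (StdGen, getStdGen, split)

-- Create the shared generator once (e.g. in main) ...
newGenVar :: IO (MVar StdGen)
newGenVar = getStdGen >>= newMVar

-- ... and whenever a thread needs randomness, split atomically:
-- one half stays in the MVar, the other half goes to the caller.
freshGen :: MVar StdGen -> IO StdGen
freshGen genVar = modifyMVar genVar $ \g ->
    let (keep, give) = split g in return (keep, give)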
By themselves, getStdGen and setStdGen are not thread-safe in a certain sense. Suppose two threads both perform this action:
do ...
   g <- getStdGen
   (v, g') <- someRandOperation g
   setStdGen g'
It is possible for the threads to both run the g <- getStdGen line before the other thread reaches setStdGen, therefore they both could get the exact same generator. (Am I wrong?)
If they both grab the same version of the generator, and use it in the same function, they will get the same "random" result. So you do need to be a little more careful when dealing with random number generation and multithreading. There are many solutions; one that comes to mind is to have a single dedicated random number generator thread that produces a stream of random numbers which other threads could consume in a thread-safe way. Putting the generator in an MVar, as FUZxxl suggests, is probably the simplest and most straightforward solution.
Of course I would encourage you to inspect your code and make sure it is necessary to generate random numbers in more than one thread.
You can use split as in FUZxxl's answer. However, instead of using an MVar, whenever you call forkIO, just have your IO action for the forked thread close over one of the resulting generators, and leave the other one with the original thread. This way each thread has its own generator.
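A sketch of that split-at-fork approach; worker is a placeholder for whatever the forked thread actually does.

import Control.Concurrent (forkIO)
import Control.Monad (void)
import System.Random (StdGen, getStdGen, randomR, split)

worker :: StdGen -> IO ()
worker g = print (fst (randomR (1, 6 :: Int) g))

main :: IO ()
main = do
    g0 <- getStdGen
    let (gChild, gParent) = split g0
    void (forkIO (worker gChild))   -- the child closes over its own generator
    worker gParent                  -- the parent keeps the other half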
As Dan Burton said, do inspect your code and see if you really need RNG in multiple threads.

A way to form a 'select' on MVars without polling

I have two MVars (well, an MVar and a Chan). I need to pull things out of the Chan and process them until the other MVar is no longer empty. My ideal solution would be something like the UNIX select function, where I pass in a list of (presumably empty) MVars and the thread blocks until one of them is full, then it returns the full MVar. Try as I might, I can think of no way of doing this beyond repeatedly polling each MVar with isEmptyMVar until I get false. This seems inefficient.
A different thought was to use throwTo, but it interrupts whatever is happening in the thread, and I need to complete processing a job out of the Chan in an atomic fashion.
A final thought as I'm typing is to create a new forkIO thread for each MVar, each of which tries to read its MVar and then fills a newly created MVar with its own instance. The original thread can then block on that MVar. Are Haskell threads cheap enough to go running that many?
Haskell threads are very cheap, so you could solve it that way, but it sounds like STM would be a better fit for your problem. With STM you can do
do var <- atomically (takeTMVar a `orElse` takeTMVar b)
   ... do stuff with var
Because of the behavior of retry and orElse, this code tries to get a, then if that fails, get b. If both fail, it blocks until either of them is updated and tries again.
You could even use this to make your own rudimentary version of select:
select :: [TMVar a] -> STM a
select = foldr1 orElse . map takeTMVar
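A sketch closer to the original question's mix of a Chan and an MVar, using their STM counterparts: a TChan of jobs and a TMVar acting as a stop signal, combined with orElse so the thread blocks on both at once (the Wakeup wrapper is made up for the example).

import Control.Concurrent.STM
    (TChan, TMVar, atomically, orElse, readTChan, takeTMVar)

data Wakeup job stop = Job job | Stop stop

waitEither :: TChan job -> TMVar stop -> IO (Wakeup job stop)
waitEither jobs stopVar = atomically $
    (Job <$> readTChan jobs) `orElse` (Stop <$> takeTMVar stopVar)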
How about using STM versions, TChan and TVar, with the retry and orElse behavior?
Implementing select is one of STM's nice capabilities. From "Composable Memory Transactions":
Beyond this, we also provide orElse, which allows them to be composed as alternatives, so that the second is run if the first retries (Section 3.4). This ability allows threads to wait for many things at once, like the Unix select system call – except that orElse composes well, whereas select does not.
orElse in RWH.
The STM package
Papers on Haskell's STM

Automatically reconnect a Haskell Network connection in an idiomatic way

I've worked my way through Don Stewart's Roll your own IRC bot tutorial, and am playing around with some extensions to it. My current code is essentially the same as "The monadic, stateful, exception-handling bot in all its glory"; it's a bit too long to paste here unless someone requests it.
Being a Comcast subscriber, it's particularly important that the bot be able to reconnect after periods of poor connectivity. My approach is to simply time the PING requests from the server, and if it goes without seeing a PING for a certain time, to try reconnecting.
So far, the best solution I've found is to wrap the hGetLine in the listen loop with System.Timeout.timeout. However, this seems to require defining a custom exception so that the catch in main can call main again, rather than return (). It also seems quite fragile to specify a timeout value for each individual hGetLine.
Is there a better solution, perhaps something that wraps an IO a like bracket and catch so that the entire main can handle network timeouts without the overhead of a new exception type?
How about running a separate thread that performs all the reading and writing and takes care of periodically reconnecting the handle?
Something like this:

input :: Chan Char
output :: Chan Char

putChar c = writeChan output c

keepAlive = forever $ do
    h <- connectToServer
    catch
        (forever $ do
            c <- readChan output
            timeout 4000 (hPutChar h c)
            return ())
        (\_ -> return ())
The idea is to encapsulate all the difficulty with periodically reconnecting into a separate thread.
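The reading side could follow the same pattern, sketched below under the assumption that connectToServer and the input channel from the snippet above exist: if nothing arrives for 5 minutes, the handle is dropped and the outer loop reconnects.

import Control.Concurrent.Chan (writeChan)
import Control.Exception (SomeException, catch)
import Control.Monad (forever)
import System.IO (hClose, hGetChar)
import System.Timeout (timeout)

readLoop :: IO ()
readLoop = forever $ do
    h <- connectToServer
    let go = do
            mc <- timeout (300 * 1000000) (hGetChar h)   -- 5 minutes of silence = dead link
            case mc of
                Just c  -> writeChan input c >> go
                Nothing -> hClose h
    -- Any I/O error also falls through to the outer forever, which reconnects.
    go `catch` \e -> const (return ()) (e :: SomeException)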
