Long polling in Yesod - haskell

Can I do long polling in Yesod, or any other Haskell web framework with comparable database facilities?
To be precise, I want to delay an HTTP response until something interesting happens. There should also be a timeout, after which the client will be served a response saying "nothing happened", and then the client will issue the same request again.
To make life even more complicated, the app I have in mind is serving all its stuff over both HTTP/HTML5 and a really compact UDP protocol to MIDP clients. Events from either protocol can release responses in either protocol.
TIA,
Adrian.

I can't answer all the issues of the more complicated UDP stuff, but the short answer is that, yes, Yesod supports long polling. You can essentially do something like:
myHandler = do
    mres <- timeout timeoutInMicroseconds someAction
    case mres of
        Nothing  -> return nothingHappenedResponse
        Just res -> doSomething res
You'll probably want to use System.Timeout.Lifted from the lifted-base package.

Michael's answer hits the timeout requirement. For general clients you do not want to keep HTTP responses waiting more than about 60 seconds, as they may be connecting through a proxy or similar, which tends to get impatient after about that long. If you're on a more tightly controlled network you may be able to relax this timeout. One minor correction: the parameter to timeout is in microseconds, not nanoseconds.
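To make the microsecond semantics concrete, here is a minimal self-contained sketch using the plain System.Timeout.timeout from base (the Lifted variant behaves the same); the actions and durations are made up for illustration:

```haskell
import Control.Concurrent (threadDelay)
import System.Timeout (timeout)

main :: IO ()
main = do
    -- Completes well inside the 1-second budget: timeout returns Just.
    fast <- timeout 1000000 (threadDelay 1000 >> return "event")
    print fast  -- Just "event"
    -- Overruns the 1-millisecond budget: the action is cut short.
    slow <- timeout 1000 (threadDelay 1000000 >> return ("event" :: String))
    print slow  -- Nothing
```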
For the 'wait for something interesting to happen' part, we use the check combinator from Control.Concurrent.STM (which wraps up retry) so our handler thread waits on a TVar:
someAction = do
    interestingStuff <- atomically $ do
        currentStuff <- readTVar theStuff
        check $ isInteresting currentStuff
        return currentStuff
    respondWith interestingStuff
Meanwhile, other threads (including HTTP handlers) update theStuff :: TVar Stuff; each update triggers a fresh evaluation of isInteresting and, if it returns True, potentially a response.
This is compatible with serving the same information over UDP: simply share theStuff between your UDP server threads and the Yesod threads.
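As a self-contained sketch of the whole pattern, here theStuff is a TVar Int and isInteresting a toy predicate (both stand-ins for the real application types); one forked thread plays the part of the UDP/HTTP updater:

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM

type Stuff = Int  -- stand-in for the real application state

isInteresting :: Stuff -> Bool
isInteresting = (> 0)

main :: IO ()
main = do
    theStuff <- newTVarIO (0 :: Stuff)
    -- Updater thread: any write to the TVar reruns the waiting transaction.
    _ <- forkIO $ do
        threadDelay 100000  -- simulate an event arriving later
        atomically $ writeTVar theStuff 42
    -- Handler thread: retries until isInteresting holds, then responds.
    interesting <- atomically $ do
        current <- readTVar theStuff
        check (isInteresting current)
        return current
    print interesting  -- 42
```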

Related

Haskell req package: how to use withReqManager to reuse connection for many requests to the same host?

I would like to reuse an HTTPS connection for many requests, to keep latency as low as possible for every request to the same host. Using the Haskell package req, if I send too many requests in a short period of time, I seem to hit a connection limit in the Manager and a new connection is started, showing higher latency. I can't change the Manager's connection limit in the req package to fix this. Instead, the package documentation recommends using withReqManager to reuse connections, but I can't work out how to use this function. Could someone explain how to use it so that a connection is always reused for an explicit series of requests?
Another case where the connection is not reused is when too much time passes between requests to the same host. When I delay a request for 5 seconds the connection is reused, but if I delay for 60 seconds it is not. If someone could provide an example using withReqManager to reuse the connection every time I run ttst, I'd really appreciate it.
lttest :: IO ()
lttest = do
    ttst
    threadDelay 5000000
    ttst
    threadDelay 60000000
    ttst
  where
    ttst = do
        metm <- getCurrentTime
        runReq defaultHttpConfig { httpConfigCheckResponse = \_ _ _ -> Nothing } $ do
            v <- req GET (https "ifconfig.me") NoReqBody lbsResponse mempty
            liftIO $ print (responseBody v :: Data.ByteString.Lazy.ByteString)
        metm2 <- getCurrentTime
        print (diffUTCTime metm2 metm)
Edit: I think I may have found a way to ensure that the connection is reused, but it requires a small hack and not using the req package. As @WillemVanOnsem commented, the server will likely close the connection after some time, so I have to send a dummy request to the same host every few seconds to keep the connection alive. But I still need a way to keep the connection from being dropped when many requests are sent over a short period. The wreq package has a module called Network.Wreq.Session, which lets you initialize a Session and run all of your requests over the same connection by passing the same Session to each request. So far this seems to be working. An important note: your dummy request, or any other request using the same Session, should not occur at the same time as another; if two ever do, the connection won't be reused that time.

Why should buffering not be used in the following example?

I was reading this tutorial:
http://www.catonmat.net/blog/simple-haskell-tcp-server/
to learn the basics of Haskell's Network module. The author wrote a small function called sockHandler:
sockHandler :: Socket -> IO ()
sockHandler sock = do
    (handle, _, _) <- accept sock
    hSetBuffering handle NoBuffering
    forkIO $ commandProcessor handle
    sockHandler sock
That accepts a connection and forks its handling off to a new thread. While breaking down the code, he says:
"Next we use hSetBuffering to change buffering mode for the client's socket handle to NoBuffering, so we didn't have buffering surprises."
But he doesn't elaborate on that point. What surprises is he talking about? I googled it and saw a few security articles (which I'm guessing relate to cached data being intercepted), but nothing seemingly related to the content of this tutorial.
What is the issue? I thought about it, but I don't think I have enough networking experience to fill in the blanks.
Thank you.
For the sake of illustration, suppose the protocol allows the server to query the client for some information, e.g. (silly example follows)
hPutStr sock "Please choose between A or B"
choice <- hGetLine sock
case decode choice of
    Just A  -> handleA
    Just B  -> handleB
    Nothing -> protocolError
Everything looks fine... but the server seems to hang. Why? This is because the message was not really sent over the network by hPutStr, but merely inserted in a local buffer. Hence, the other end never receives the query, so does not reply, causing the server to get stuck in its read.
A solution here would be to insert an hFlush sock before reading. This has to be manually inserted at the "right" points, and is prone to error. A lazier option would be to disable buffering entirely -- this is safer, albeit it severely impacts performance.
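The hang is easy to reproduce without a network. The sketch below uses a local pipe (via createPipe from the process package) instead of a socket, since Handle buffering behaves the same either way; the timeout calls just keep the demo from blocking forever:

```haskell
import System.IO
import System.Process (createPipe)
import System.Timeout (timeout)

main :: IO ()
main = do
    (readEnd, writeEnd) <- createPipe
    hSetBuffering writeEnd (BlockBuffering Nothing)
    hPutStr writeEnd "Please choose between A or B"
    -- Nothing has reached the pipe yet: the text sits in the writer's buffer,
    -- so the read blocks until the timeout fires.
    r1 <- timeout 100000 (hGetChar readEnd)
    print r1  -- Nothing
    hFlush writeEnd
    -- After the flush the reader finally sees the data.
    r2 <- timeout 100000 (hGetChar readEnd)
    print r2  -- Just 'P'
```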

Design choice of Haskell data types in multithreaded programs

In a multiple threaded server application, I use the type Client to represent a client. The nature of Client is quite mutable: clients send UDP heartbeat messages to keep registered with the server, the message may also contain some realtime data (think of a sensor). I need to keep track of many things such as the timestamp and source address of the last heartbeat, the realtime data, etc. The result is a pretty big structure with many states. Each client has a client ID, and I use a HashMap wrapped in an MVar to store the clients, so lookup is easy and fast.
type ID = ByteString
type ClientMap = MVar (HashMap ID Client)
There's a "global" value of ClientMap which is made available to each thread. It's stored in a ReaderT transformer along with many other global values.
The Client itself is a big immutable structure, using strict fields to prevent space leaks:
data Client = Client
    { _c_id        :: !ID
    , _c_timestamp :: !POSIXTime
    , _c_addr      :: !SockAddr
    , _c_load      :: !Int
    ...
    }

makeLenses ''Client
Using immutable data structures inside a mutable wrapper is a common design pattern in Concurrent Haskell, according to Parallel and Concurrent Programming in Haskell. When a heartbeat message is received, the thread processing it constructs a new Client, takes the MVar holding the HashMap, inserts the Client into the HashMap, and puts the new HashMap back in the MVar. The code is basically:
modifyMVar_ hashmap_mvar (\hm ->
    let c = Client id ...
    in return $! M.insert id c hm)
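A runnable sketch of this pattern, with Data.Map standing in for HashMap (so it needs only boot packages) and Client trimmed to two illustrative fields:

```haskell
import Control.Concurrent.MVar
import qualified Data.Map.Strict as M

type ID = String

data Client = Client { c_id :: !ID, c_load :: !Int } deriving Show

main :: IO ()
main = do
    clients <- newMVar (M.empty :: M.Map ID Client)
    -- On each heartbeat, build a fresh Client and swap in an updated map.
    let heartbeat cid load =
            modifyMVar_ clients (\hm ->
                return $! M.insert cid (Client cid load) hm)
    heartbeat "client-1" 3
    heartbeat "client-2" 7
    hm <- readMVar clients
    print (M.size hm)  -- 2
```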
This approach works fine, but as the number of clients grows (we now have tens of thousands of clients), several problems emerge:
Clients send heartbeat messages pretty frequently (around every 30 seconds), resulting in access contention on the ClientMap.
Memory consumption of the program seems quite high. My understanding is that frequently updating a large immutable structure wrapped in an MVar keeps the garbage collector very busy.
Now, to reduce the contention of the global hashmap_mvar, I tried to wrap the mutable fields of Client in an MVar for each client, such as:
data ClientState = ClientState
    { _c_timestamp :: !POSIXTime
    , _c_addr      :: !SockAddr
    , _c_load      :: !Int
    ...
    }

makeLenses ''ClientState

data Client = Client
    { c_id    :: !ID
    , c_state :: MVar ClientState
    }
This seems to reduce the level of contention (now I only need to update the MVar inside each Client, so the granularity is finer), but the memory footprint of the program is still high. I've also tried to UNPACK some of the fields, but that didn't help.
Any suggestions? Will STM solve the contention problem? Should I resort to mutable data structures rather than immutable ones wrapped in an MVar?
See also Updating a Big State Fast in Haskell.
Edit:
As Nikita Volkov pointed out, a shared map smells like bad design in a typical TCP-based client-server application. However, in my case the system is UDP-based, meaning there is no such thing as a "connection". The server uses a single thread to receive UDP messages from all clients, parses them, and acts accordingly, e.g. updating the client data. Another thread reads the map periodically, checks the heartbeat timestamps, and deletes clients that have not sent a heartbeat in, say, the last 5 minutes. So a shared map seems inevitable? Anyway, I understand that using UDP was a poor design choice in the first place, but I would still like to know how I can improve the situation with UDP.
First of all, why exactly do you need that shared map at all? Do you really need to share the private states of clients with anything? If not (which is the typical case with a client-server application), then you can simply get around without any shared map.
In fact, there is a "remotion" library, which encompasses all the client-server communication, allowing you to create services simply by extending it with your custom protocol. You should take a look.
Secondly, using multiple MVars over fields of one entity is always a potential race-condition bug. You should use STM when you need to update multiple things atomically. I'm not sure whether that is the case in your app, but you should be aware of it.
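For illustration, a minimal STM sketch in which two TVars (stand-ins for, say, a timestamp field and a load field of the client state) are updated in one transaction, so no other thread can ever observe one write without the other:

```haskell
import Control.Concurrent.STM

main :: IO ()
main = do
    tsVar   <- newTVarIO (0 :: Int)  -- stands in for a timestamp field
    loadVar <- newTVarIO (0 :: Int)  -- stands in for a load field
    -- Both writes commit together or not at all.
    atomically $ do
        writeTVar tsVar 1700000000
        writeTVar loadVar 5
    -- A consistent snapshot of both fields, also one transaction.
    snapshot <- atomically ((,) <$> readTVar tsVar <*> readTVar loadVar)
    print snapshot  -- (1700000000,5)
```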
Thirdly,
The client sends heartbeat messages pretty frequently (around every 30 seconds), resulting in access contention of the ClientMap
This seems like exactly the job for the Map of the recently released "stm-containers" library. See this blog post for an introduction to the library. With it you'll be able to get back to the immutable Client model.

Is my (beginner) understanding of blocking and non-blocking IO correct?

Right now I am doing a lot of research on concurrency and parallelism. Could you tell me whether my understanding is correct (at the OS level):
Blocking IO:
When I explicitly wait for a connection (e.g. in Ruby):
conn = socket.accept
So my thread is blocked until I get something on the socket, right?
(And I understand that accept is polling the socket for data in some loop, right?)
Non-blocking:
I have a thread that asks all registered fds (file descriptors) from time to time whether they have something I need. But there is also the 'don't call us, we'll call you' rule; how does that work at the OS level? (In libraries like EventMachine or Node it is done with callbacks?)
PS. I would welcome readings and presentations, like:
http://www.paperplanes.de/2011/4/25/eventmachine-how-does-it-work.html
http://www.kegel.com/c10k.html
Blocking IO:
When I explicitly wait for a connection (e.g. in Ruby):
conn = socket.accept
So my thread is blocked until I get something on the socket, right?
Right.
(And I understand that accept is polling the socket for data in some loop, right?)
Wrong. You are blocked. Period. The operating system will wake you up when something relevant happens.
Non-blocking:
I have a thread that asks all registered fds (file descriptors) from time to time whether they have something I need. But there is also the 'don't call us, we'll call you' rule; how does that work at the OS level? (In libraries like EventMachine or Node it is done with callbacks?)
What you have just described, including the callbacks, is 'asynchronous' I/O.
Non-blocking I/O just means that the calls don't block, so e.g. if you call read() and no data is there yet, nothing happens. When to make the calls is up to you, but this is assisted by select()/poll()/epoll(), which block until various events have occurred on the socket(s).

Automatically reconnect a Haskell Network connection in an idiomatic way

I've worked my way through Don Stewart's Roll your own IRC bot tutorial, and am playing around with some extensions to it. My current code is essentially the same as the "The monadic, stateful, exception-handling bot in all its glory"; it's a bit too long to paste here unless someone requests it.
Being a Comcast subscriber, it's particularly important that the bot be able to reconnect after periods of poor connectivity. My approach is simply to time the PING requests from the server and, if the bot goes without seeing a PING for a certain time, to try reconnecting.
So far, the best solution I've found is to wrap the hGetLine in the listen loop with System.Timeout.timeout. However, this seems to require defining a custom exception so that the catch in main can call main again, rather than return (). It also seems quite fragile to specify a timeout value for each individual hGetLine.
Is there a better solution, perhaps something that wraps an IO a like bracket and catch so that the entire main can handle network timeouts without the overhead of a new exception type?
How about running a separate thread that performs all the reading and writing and takes care of periodically reconnecting the handle?
Something like this:
input :: Chan Char
output :: Chan Char

putChar c = writeChan output c

keepAlive = forever $ do
    h <- connectToServer
    catch
        (forever $ do
            c <- readChan output
            timeout 4000 (hPutChar h c)
            return ())
        (\_ -> return ())
The idea is to encapsulate all the difficulty with periodically reconnecting into a separate thread.
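The core of that idea, timeout wrapped around a blocking read, can be sketched in a self-contained way with a Chan (the values and durations here are made up):

```haskell
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import System.Timeout (timeout)

main :: IO ()
main = do
    ch <- newChan
    writeChan ch 'x'
    -- Data available: the read completes within the timeout.
    r1 <- timeout 500000 (readChan ch)
    print r1  -- Just 'x'
    -- Empty channel: the read blocks and the timeout fires, which is the
    -- hook for deciding to reconnect.
    r2 <- timeout 100000 (readChan ch)
    print r2  -- Nothing
```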
