What is the optimum polling duration limit for Socket.io? - node.js

I'm using Socket.io on a project and would use XHR polling, but I have a limit of 6 concurrent connections. Therefore, after opening 5 tabs, Socket.io starts to hang.
If I set the polling duration to 0 seconds (20 is the default), the limit no longer affects the application but Firebug shows that there's a request every second.
If I use a 0-second limit, how does this affect my server and users?

When you set a duration, you are using XHR long-polling. The duration tells the server how long to keep an HTTP request open when it has no data to send. If the server does have data to send, it sends the data immediately and closes the connection. The client then opens a new connection and the cycle continues.
When you set the duration to zero, you are effectively telling the server to use short-polling: when the client asks the server for data, the server responds immediately, either with the data or with an empty response.
The effect of short-polling on the client and server is that the client will not receive messages as quickly as long-polling allows, but it consumes fewer resources because the HTTP request is not kept open. It also means that you probably won't hit your concurrent connection limit, because each connection ends immediately.
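For illustration, here is a minimal sketch of the two behaviours using plain Node.js and its built-in http module; this is not Socket.io's internal implementation, and the /long-poll, /short-poll and /publish routes are made up for the example.

    const http = require('http');

    const pending = [];        // clients currently waiting on a long-poll request
    const DURATION_MS = 20000; // how long to hold an open request (the "duration")

    function finish(res, messages) {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(messages));
    }

    http.createServer((req, res) => {
      if (req.url === '/long-poll') {
        // Long-polling: hold the request open until data arrives or the duration expires.
        const timer = setTimeout(() => finish(res, []), DURATION_MS);
        pending.push({ res, timer });
      } else if (req.url === '/short-poll') {
        // Duration of zero: answer immediately, with data or an empty body.
        finish(res, []);
      } else if (req.url === '/publish') {
        // New data: flush every waiting long-poll client right away.
        pending.splice(0).forEach((waiter) => {
          clearTimeout(waiter.timer);
          finish(waiter.res, ['hello']);
        });
        res.end('ok');
      } else {
        res.writeHead(404);
        res.end();
      }
    }).listen(3000);

With short-polling every idle client still generates one request per interval, which is what Firebug was showing; with long-polling each idle client instead holds a single connection open for up to the duration.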

Related

Sending a response after jobs have finished processing in Express

So, I have an Express server that accepts a request. The request triggers web scraping that takes 3-4 minutes to finish. I'm using Bull to queue the jobs and process them as they become ready. The challenge is to send the results from the processed jobs back as the response. Is there any way I can achieve this? I'm running the app on Heroku, but Heroku has a request timeout of 30 seconds.
You don't have to wait until the back end has finished. When the request comes in, identify who is making it, authenticate the user, and do a res.status(202).send({ message: "text" });
Even though the response has already been sent to the client, you can keep processing.
NOTE: Do not put a return keyword before res.status...
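A minimal sketch of that pattern, assuming an Express app and a Bull queue; the /scrape route and the queue name are illustrative, and Bull needs a reachable Redis instance:

    const express = require('express');
    const Queue = require('bull');

    const app = express();
    const scrapeQueue = new Queue('scrape'); // connects to a local Redis by default

    app.post('/scrape', async (req, res) => {
      const job = await scrapeQueue.add({ url: req.query.url });
      // Respond right away so Heroku's 30-second limit is never hit.
      // Per the note above there is no `return` here; in this sketch the heavy
      // work happens in the queue processor below rather than in the handler.
      res.status(202).send({ message: 'accepted', jobId: job.id });
    });

    // The worker runs outside the request/response cycle.
    scrapeQueue.process(async (job) => {
      // ...3-4 minutes of scraping here...
      return { done: true };
    });

    app.listen(3000);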
The HyperText Transfer Protocol (HTTP) 202 Accepted response status code indicates that the request has been accepted for processing, but the processing has not been completed; in fact, processing may not have started yet. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place.
202 is non-committal, meaning that there is no way in HTTP to later send an asynchronous response indicating the outcome of processing the request. It is intended for cases where another process or server handles the request, or for batch processing.
You always need to send a response immediately because of the timeout. Since your process takes about 3-4 minutes, it is better to respond straight away, saying that the request was successfully received and will be processed.
Now, when the task is completed, you can use socket.io or web sockets to notify the client from the server side, and you can include the result in that notification (a sketch follows below).
The client can also check continuously whether the job has completed on the server side; this is called polling and is required for older browsers which don't support web sockets. socket.io falls back to polling when a browser doesn't support web sockets.
Visit socket.io for more information and documentation.
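As a rough sketch of the notification part, assuming Socket.IO v4; the 'register' and 'job:done' event names and the user-to-socket mapping are made up for the example:

    const http = require('http');
    const { Server } = require('socket.io');

    const server = http.createServer();
    const io = new Server(server);

    const socketsByUser = new Map();

    io.on('connection', (socket) => {
      // The client identifies itself after connecting, e.g. socket.emit('register', userId)
      socket.on('register', (userId) => socketsByUser.set(userId, socket.id));
      socket.on('disconnect', () => {
        for (const [userId, id] of socketsByUser) {
          if (id === socket.id) socketsByUser.delete(userId);
        }
      });
    });

    // Call this from the queue processor once the scraping result is ready.
    function notifyJobDone(userId, result) {
      const socketId = socketsByUser.get(userId);
      if (socketId) io.to(socketId).emit('job:done', result);
    }

    server.listen(3000);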
The best approach to this problem is the socket.io library. It can send data to the client whenever you want, triggering a function on the client side which receives the data. Socket.io supports different languages and is really easy to use.
Create a jobs table in a database or persistent storage like Redis.
Save each job in the table upon request with a unique id.
Update the status to running when starting the job.
Send HTTP 202 - Accepted.
At the client, implement a polling script; at the server, implement a job status route/API. The API accepts a job id, queries the jobs table and responds with the status (see the sketch after this answer).
When the job is finished, update the jobs table with status completed; when the job errors, update the jobs table with status failed, and perhaps add a description column to store the cause of the error.
This solution makes your system horizontally scalable and distributed. It also limits the consequences of unexpected connection drops. The polling interval depends on the average job completion duration; I would recommend an interval of about 5 seconds.
This can even be improved by storing job completion progress in the jobs table so that the client can display a progress bar.
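A sketch of the status route and a matching client-side poller described above; the jobs Map stands in for the jobs table (in practice this would live in a database or Redis), and the route and field names are illustrative:

    const express = require('express');
    const app = express();

    // Stand-in for the jobs table; in practice keep this in a database or Redis.
    const jobs = new Map(); // id -> { status: 'running' | 'completed' | 'failed', progress, error }

    app.get('/jobs/:id', (req, res) => {
      const job = jobs.get(req.params.id);
      if (!job) return res.status(404).send({ message: 'unknown job' });
      res.send(job);
    });

    app.listen(3000);

    // Client side: poll the status route every 5 seconds until the job settles.
    async function waitForJob(id) {
      for (;;) {
        const job = await fetch(`/jobs/${id}`).then((r) => r.json());
        if (job.status === 'completed' || job.status === 'failed') return job;
        await new Promise((resolve) => setTimeout(resolve, 5000));
      }
    }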
Request timeouts occur when your connection is idle; different servers implement this differently, so the timeout duration varies.
The solution to this timeout problem is to keep your connections open, that is, the connection between client and server should remain constant.
So for such scenarios use WebSockets, which ensure that after the initial request/response handshake between client and server the connection stays open (a minimal example follows this answer).
There are many libraries for implementing a realtime connection, e.g. PubNub or socket.io. This is the same technology used for live streaming.
Node.js can handle many concurrent connections and is lightweight, so it won't use many resources either.
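A bare-bones example of a connection that stays open after the handshake, here using the ws package (version 8 exposes WebSocketServer as shown); Socket.IO and PubNub wrap the same idea:

    const { WebSocketServer } = require('ws');

    const wss = new WebSocketServer({ port: 8080 });

    wss.on('connection', (ws) => {
      // The socket stays open after the handshake; push data whenever it becomes
      // available, with no request/response timeout involved.
      const timer = setInterval(() => ws.send(JSON.stringify({ now: Date.now() })), 10000);
      ws.on('close', () => clearInterval(timer));
    });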

HTTP/1.1 client: How to decide good keep-alive timeout default?

I'm writing an HTTP/1.1 client that will be used against a variety of servers.
How can I decide a reasonable default keep-alive timeout value, as in, how long the client should keep an unused connection open before closing? Any value I think of seems extremely arbitrary.
First note that with HTTP keep-alive both client and server can close an idle connection (i.e. no outstanding response, no unfinished request) at any time. In particular, the client cannot make the server keep the connection open by enforcing some timeout; all a client-side timeout does is limit how long the client will try to keep the connection open. The server might close the connection even before this client-side timeout is reached.
Based on this there is no generic good value for the timeout, but there does not actually need to be one. The timeout is essentially used to limit resources, i.e. how many idle connections will be open at the same time. If your specific use case never visits the same site again anyway, then using HTTP keep-alive would just be a waste of resources. If instead you don't know your specific usage pattern, you could simply place a limit on the number of open connections, i.e. close the longest-unused connection when the limit is reached and a new connection is needed. It might make sense to have an upper-limit timeout of 10-15 minutes anyway, since after this time firewalls and NAT routers in between will usually have abandoned the connection state, so the idle connection will no longer work for new requests anyway.
But in any case you also need to make sure that you detect when the server closes a connection, and then discard that connection from the list of reusable connections. And if you use HTTP keep-alive you also need to be aware that the server might close the connection at the very moment you are trying to send a new request on an existing connection, i.e. you then need to retry that request on a newly created connection.
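For a Node.js client, a sketch of both points: capping idle connections with the built-in http.Agent and retrying once when a reused connection turns out to have been closed. The limits and the single-retry policy are only examples:

    const http = require('http');

    const agent = new http.Agent({
      keepAlive: true,
      maxSockets: 10,      // cap on concurrent connections per host
      maxFreeSockets: 5,   // cap on idle connections kept around for reuse
      timeout: 60 * 1000,  // inactivity timeout applied to sockets created by this agent
    });

    function get(url, retried = false) {
      return new Promise((resolve, reject) => {
        const req = http.get(url, { agent }, (res) => {
          let body = '';
          res.on('data', (chunk) => (body += chunk));
          res.on('end', () => resolve(body));
        });
        req.on('error', (err) => {
          // The server may close an idle connection just as we reuse it;
          // retry once on a fresh connection in that case.
          if (!retried && err.code === 'ECONNRESET') resolve(get(url, true));
          else reject(err);
        });
      });
    }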

maximum reasonable timeout for a synchronous HTTP request

This applies to non-user-facing backend applications communicating with each other through HTTP. I'm wondering if there is a guideline for a maximum timeout for a synchronous HTTP request. For example, let's say a request can take up to 10 minutes to complete. Can I simply create a worker thread on the client and, in the worker thread, invoke the request synchronously? Or should I implement the request asynchronously, i.e. return HTTP 202 Accepted, spin off a worker thread on the server side to complete the request, and figure out a way to send the results back, presumably through a messaging framework?
One of my concerns is whether it is safe to keep a socket open for an extended period of time.
How long a socket connection can remain open (without activity) depends on the (quality of the) network infrastructure.
A client HTTP request waiting for an answer from a server results in an open socket connection without any data going through that connection for a while. A proxy server might decide to close such inactive connections after 5 minutes. Similarly, a firewall can decide to close connections that are open for more than 30 minutes, active or not.
But since you are in the backend, these cases can be tested (just let the server thread handling the request sleep for a certain time before giving an answer). Once it is verified that socket connections are not closed by different network components, it is safe to rely on socket connections to remain open. Keep in mind though that network cables can be unplugged and servers can crash - you will always need a strategy to handle disruptions.
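One way to run that test in Node.js: a route that sleeps before answering, and a client that just waits to see whether any intermediary drops the connection. The delay, port and hostname here are only examples, and depending on the Node version you may need to relax the built-in server timeouts as shown:

    const express = require('express');
    const app = express();

    app.get('/slow', (req, res) => {
      const delayMs = Number(req.query.delayMs) || 10 * 60 * 1000; // default 10 minutes
      setTimeout(() => res.send({ sleptMs: delayMs }), delayMs);
    });

    const server = app.listen(3000);
    server.requestTimeout = 0; // newer Node versions enforce timeouts by default; disable for the test
    server.headersTimeout = 0;

    // Client side: no timeout of our own, just wait for the answer or an error.
    fetch('http://some-backend-host:3000/slow?delayMs=600000')
      .then((r) => r.json())
      .then((body) => console.log('survived:', body))
      .catch((err) => console.error('connection dropped:', err));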
As for synchronous versus asynchronous: both are feasible and both have advantages and disadvantages. But what is right for you depends on a whole lot more than just the reliability of socket connections.

Will using Socket.io instead of normal ajax calls prevent a server from running out of TCP sockets?

I'm trying to set up a server that can handle a high sustained amount of simultaneous requests. I found that at a certain point, the server won't be able to recycle "old" TCP connections quickly enough to accommodate extreme amounts of requests.
Do websockets eliminate or decrease the number of TCP connections that a server needs to handle, and are they a good alternative to "normal" requests?
Websockets are persistent connections, so it really depends on what you're talking about. The way socket.io uses XHR is different from a typical ajax call in that it hangs onto the request for as long as possible before sending a response. It's a technique called long-polling, and it tries to simulate a persistent connection by never letting go of the request. When the request is about to time out, it sends a response, and a new request is initiated immediately which it hangs onto yet again, and the cycle continues.
So I guess if you're getting flooded with connections because of ajax calls then that's probably because your client code is polling the server at some sort of interval. This means that even idle clients will be hitting your server with fury because of this polling. If that's the case then yes, socket.io will reduce your number of connections because it tries to hang onto one single connection per client for as long as possible.
These days I recommend socket.io over doing plain ajax requests. Socket.io is designed to be performant with whatever transport it settles on. The way it gracefully degrades based on what connection is possible is great and means your server will be overloaded as little as possible while still reaching as wide an audience as it can.
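A rough client-side contrast between the two approaches; the /api/updates endpoint, the 'update' event name and the render() callback are placeholders for the example:

    // Interval polling: every client hits the server every 5 seconds, even when idle.
    setInterval(async () => {
      const updates = await fetch('/api/updates').then((r) => r.json());
      render(updates); // render() is whatever your UI does with the data
    }, 5000);

    // Socket.IO: one connection per client, data arrives only when the server emits.
    const socket = io(); // client script served by the same host
    socket.on('update', render);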

Advantage/disadvantage of using socketio heartbeats

Socket.io allows you to use heartbeats to "check the health of Socket.IO connections." What exactly are heartbeats and why should or shouldn't I use them?
A heartbeat is a small message sent from a client to a server (or from a server to a client and back to the server) at periodic intervals to confirm that the client is still around and active.
For example, if you have a Node.js app serving a chat room, and a user doesn't say anything for many minutes, there's no way to tell if they're really still connected. By sending a heartbeat at a predetermined interval (say, every 15 seconds), the client informs the server that it's still there. If it's been e.g. 20 seconds since the server last got a heartbeat from a client, the client has likely been disconnected.
This is necessary because you cannot be guaranteed a clean connection termination over TCP--if a client crashes, or something else happens, you won't receive the termination packets from the client, and the server won't know that the client has disconnected. Furthermore, Socket.IO supports various other mechanisms (other than TCP sockets) to transfer data, and in these cases the client won't (or can't) send a termination message to the server.
By default, a Socket.IO client will send a heartbeat to the server every 15 seconds (heartbeat interval), and if the server hasn't heard from the client in 20 seconds (heartbeat timeout) it will consider the client disconnected.
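Those knobs are exposed as Socket.IO server options; the exact names and defaults depend on the Socket.IO version (newer releases use pingInterval and pingTimeout, in milliseconds, while the 15/20-second figures above match older releases):

    const { Server } = require('socket.io');

    const io = new Server(3000, {
      pingInterval: 15000, // how often a heartbeat/ping is sent
      pingTimeout: 20000,  // how long to wait for a reply before treating the peer as gone
    });

    io.on('connection', (socket) => {
      socket.on('disconnect', (reason) => {
        // 'ping timeout' is reported when the heartbeat stopped coming back.
        console.log('client gone:', reason);
      });
    });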
I can't think of many use cases where you wouldn't want to use heartbeats.
