Node.js design approach. Server polling periodically from clients - node.js

I'm trying to learn Node.js and adequate design approaches.
I've implemented a little API server (using express) that fetches a set of data from several remote sites, according to client requests that use the API.
This process can take some time (several fecth / await), so I want the user to know how is his request doing. I've read about socket.io / websockets but maybe that's somewhat an overkill solution for this case.
So what I did is:
For each client request, a requestID is generated and returned to the client.
With that ID, the client can query the API (via another endpoint) to know his request status at any time.
Using setTimeout() on the client page and some DOM manipulation, I can update and display the current request status every X, like a polling approach.
Although the solution works fine, even with several clients connecting concurrently, maybe there's a better solution?. Are there any caveats I'm not considering?

TL;DR The approach you're using is just fine, although it may not scale very well. Websockets are a different approach to solve the same problem, but again, may not scale very well.
You've identified what are basically the only two options for real-time (or close to it) updates on a web site:
polling the server - the client requests information periodically
using Websockets - the server can push updates to the client when something happens
There are a couple of things to consider.
How important are "real time" updates? If the user can wait several seconds (or longer), then go with polling.
What sort of load can the server handle? If load is a concern, then Websockets might be the way to go.
That last question is really the crux of the issue. If you're expecting a few or a few dozen clients to use this functionality, then either solution will work just fine.
If you're expecting thousands or more to be connecting, then polling starts to become a concern, because now we're talking about many repeated requests to the server. Of course, if the interval is longer, the load will be lower.
It is my understanding that the overhead for Websockets is lower, but still can be a concern when you're talking about large numbers of clients. Again, a lot of clients means the server is managing a lot of open connections.
The way large services handle this is to design their applications in such a way that they can be distributed over many identical servers and which server you connect to is managed by a load balancer. This is true for either polling or Websockets.

Related

Best approach to connect two nodejs REST API servers

Scenario is I have two Node applications which are providing some REST APIs, Server_A has some set of REST endpoints, and Server_B has some other set of endpoints.
We have a requirement where Server_A need some data from Server_B. We can create some REST endpoints for this but there will be some performance issues. Server_A will create http connection each time to Server_B.
We can use Websockets but I am not much sure if it would be good approach or not.
In all cases Server_A will be calling Server_B and Server_B will return data instantly.
Server_B will be doing most of the database operations, Server_A has calculations only. Server_A will call Server_B for some data requirements.
In Addition, there will be only one socket connection that is between Server_A and Server_B, all other clients will connect via REST only.
Could anyone suggest if it would be correct approach?
Or you have some better idea.
It would be helpful if I get some code references, modules suggestions.
Thanks
What you are asking about is premature optimization. You are attempting to optimize before you even know you have a problem.
HTTP connections are pretty darn fast. There are databases that work using an HTTP API and those databases are consulted on every HTTP request of the server. So, an HTTP API that is used frequently can work just fine.
What you need to do is to implement your server A using the regular HTTP requests to server B that are already supported. Then, test your system at load and see how it performs. Chances are pretty good that the real bottleneck won't have anything to do with the fact that you're using HTTP requests between server A and server B and if you want to improve the performance of your system, you will probably be working on different problems. This is why you don't want to do premature optimization.
The more moving parts in a system, the less likely you have any idea where the actual bottlenecks are when you put the system under load. That's why you have to test the system under load, instrument it like crazy so you can see where the performance is being impacted the most and then measure like crazy. Then, and only then, will you know where it makes sense to invest your development resources to improve your scalablity or performance.
FYI, a webSocket connection has some advantages over repeated HTTP connections (less connection overhead per request), but also some disadvantages (it's not request/response so you have invent your own way to match a response with a given request).

Should I use a REST API, or Socket.io for a Geolocation App?

I need to track moving cars.
Should I post the location every time the location changes, and send it over the socket?
Or should make a REST API and post the location (from the tracked device) and check it (with the tracker device) every 10 seconds, regardless if the location changed or not?
(The App is being made with React Native)
Building HTTP requests by frequent updates requires more resources then sending messages through websocket. Keeping websocket connections open by a lot of users requires more resources than using HTTP. In my opinion the answer depends on the user count, the update frequency, whether you apply the REST constraints (no server side session) and which version of HTTP you use (HTTP2 is more efficient than HTTP1.1 as far as I know). I don't think this is something we can tell you without measurements.
The same is true if you want to push data from the server to the client. If you do it frequently and the update must be almost immediate, then websocket is probably a better choice than polling. If you do the rarely and the delay (polling frequency) can be a few minutes, then polling might be better.
Note that I am not an expert of load scaling, this is just a layman's logic.
I would use WebSockets. For small deployments and low-frequency updates basically anything works, but with WebSockets you have technology that scales better in the long term. (And no, I would not consider this premature optimization, since the choice of technology here does not mean unnecessary initial overhead.)
Shameless plug: If you're using WebSocket, you could take a look at Crossbar.io - http://crossbar.io, or WAMP (http://wamp-proto.org) in general, which provides messaging mechanisms on top of WebSocket and should work well for you use case. I work for the company which is at the core of this, but it's open source software.

What is the best way to communicate between two servers?

I am building a web app which has two parts. In one part it uses a real time connection between the server and the client and in the other part it does some cpu intensive task to provide relevant data.
Implementing the real time communication in nodejs and the cpu intensive part in python/java. What is the best way the nodejs server can participate in a duplex communication with the other server ?
For a basic solution you can use Socket.IO if you are already using it and know how it works, it will get the job done since it allows for communication between a client and server where the client can be a different server in a different language.
If you want a more robust solution with additional options and controls or which can handle higher traffic throughput (though this shouldn't be an issue if you are ultimately just sending it through the relatively slow internet) you can look at something like ØMQ (ZeroMQ). It is a messaging queue which gives you more control and lots of different communications methods beyond just request-response.
When you set either up I would recommend using your CPU intensive server as the stable end(server) and your web server(s) as your client. Assuming that you are using a single server for your CPU intensive tasks and you are running several NodeJS server instances to take advantage of multi-cores for your web server. This simplifies your communication since you want to have a single point to connect to.
If you foresee needing multiple CPU servers you will want to setup a routing server that can route between multiple web servers and multiple CPU servers and in this case I would recommend the extra work of learning ØMQ.
You can use http.request method provided to make curl request within node's code.
http.request method is also used for implementing Authentication api.
You can put your callback in the success of request and when you get the response data in node, you can send it back to user.
While in backgrount java/python server can utilize node's request for CPU intensive task.
I maintain a node.js application that intercommunicates among 34 tasks spread across 2 servers.
In your case, for communication between the web server and the app server you might consider mqtt.
I use mqtt for this kind of communication. There are mqtt clients for most languages, including node/javascript, python and java. In my case I publish json messages using mqtt 'topics' and any task that has registered to subscribe to a 'topic' receives it's data when published. If you google "pub sub", "mqtt" and "mosquitto" you'll find lots of references and examples. Mosquitto (now an Eclipse project) is only one of a number of mqtt brokers that are available. Another very good broker that is written in Java is called hivemq.
This is a very simple, reliable solution that scales well. In my case literally millions of messages reliably pass through mqtt every day.
You must be looking for socketio
Socket.IO enables real-time bidirectional event-based communication.
It works on every platform, browser or device, focusing equally on reliability and speed.
Sockets have traditionally been the solution around which most
realtime systems are architected, providing a bi-directional
communication channel between a client and a server.

Socket.io vs AJAX Use cases

Background: I am building a web app using NodeJS + Express. Most of the communication between client and server is REST (GET and POST) calls. I would typically use AJAX XMLHttpRequest like mentioned in https://developers.google.com/appengine/articles/rpc. And I don't seem to understand how to make my RESTful service being used for Socket.io as well.
My questions are
What scenarios should I use Socket.io over AJAX RPC?
Is there a straight forward way to make them work together. At least for Expressjs style REST.
Do I have real benefits of using socket.io(if websockets are used -- TCP layer) on non real time web applications. Like a tinyurl site (where users post queries and server responds and forgets).
Also I was thinking a tricky but nonsense idea. What if I use RESTful for requests from clients and close connection from server side and do socket.emit().
Thanks in advance.
Your primary problem is that WebSockets are not request/response oriented like HTTP is. You mention REST and HTTP interchangeably, keep in mind that REST is a methodology behind designing and modeling your HTTP routes.
Your questions,
1. Socket.io would be a good scenario when you don't require a request/response format. For instance if you were building a multiplayer game in which whoever could click on more buttons won, you would send the server each click from each user, not needing a response back from the server that it registered each click. As long as the WebSocket connection is open, you can assume the message is making it to the server. Another use case is when you need a server to contact a client sporadically. An analytics page would be a good use case for WebSockets as there is no uniform pattern as to when data needs to be at the client, it could happen at anytime.
The WebSocket connection is an HTTP GET request with a special header requesting the server to upgrade it to a WebSocket connection. Distinguishing different events and message on the WebSocket connection is up to your application logic and likely won't match REST style URIs and methods (otherwise you are replication HTTP request/reply in a sense).
No.
Not sure what you mean on the last bit.
I'll just explain more about when you want to use Socket.IO and leave the in-depth explanation to Tj there.
Generally you will choose Socket.IO when performance and/or latency is a major concern and you have a site that involves users polling for data often. AJAX or long-polling is by far easier to implement, however, it can have serious performance problems in high load situations. By high-load, I mean like Facebook. Imagine millions of people loading their feed, and every minute each user is asking the server for new data. That could require some serious hardware and software to make that work well. With Socket.IO, each user could instead connect and just indefinitely wait for new data from the server as it arrives, resulting in far less overall server traffic.
Also, if you have a real-time application, Socket.IO would allow for a much better user experience while maintaining a reasonable server load. A common example is a chat room. You really don't want to have to constantly poll the server for new messages. It would be much better for the server to broadcast new messages as they are received. Although you can do it with long-polling, it can be pretty expensive in terms of server resources.

Node.js App with API endpoints which take 20sec+ :: Connections Left Open :: How to Optimize?

I have a Node.js RESTful API returning JSON data. One of the API calls can (and frequently does) take 10 - 20 seconds to finish. This long RTT is due to connecting to external APIs, like DiffBot, MailChimp, Facebook, Twitter, etc. I wish I could make the API call shorter, but I cannot.
Of course, I've implemented the node code in a nice async way, but the problem is that the client's inbound connection (to the node app) is alive while it waits for the server to finish, and thus might be killing my performance. In fact, I'm currently guessing that this may explain my long-running timeout issue in node.
I've already increased maxSockets to a huge number...
require('http').globalAgent.maxSockets = 9999;
For the sake of interest, I'm printing out the active sockets each time a new connection is made (here's the code).
Which gives me output like this:
SOCKETS: {} { 'graph.facebook.com:443': 5, 'api.instagram.com:443': 1 }
Nothing too enlightening there. The max connections I ever see is around 20 or so, total, across all hosts. But this doesn't really tell me anything about incoming connections, or how to optimize them so that my server does not choke when there are many of them alive at once (which I suspect it is).
You should optimize your architecture, not just the code.
First, I would change the way the client/server interact with each other. The server should end the request upon recept and notify the client once all the tasks for that request are truly complete.
There are different ways to achieve that. For example, the client can query the stats of the request using AJAX (poll) every X seconds. Another example would be to use WebSocket.
If you're going with this approach, look into Socket.IO. It supports many transports with the same API, if WebSocket is available, it would use that, otherwise, it would fall back to other transports such as Flash Socket, long-polling, etc.
Second, you shouldn't use one process to do all this work. You should use a queue (preferably a messaging system that supports queues), then, run workers (separate processes) to do the "heavy lifting".
Personally, I use AMQP due to its features and portability (it's an open-standard) but feel free to use any other queue system with a persistant backend.
That way, if one or more process(es) crash(es) and you use the right queue, you wouldn't lose any data (such as the API tasks you mentioned).
Hope it helps.

Resources