Best approach to connect two nodejs REST API servers

Best approach to connect two nodejs REST API servers - node.js

Scenario is I have two Node applications which are providing some REST APIs, Server_A has some set of REST endpoints, and Server_B has some other set of endpoints.
We have a requirement where Server_A need some data from Server_B. We can create some REST endpoints for this but there will be some performance issues. Server_A will create http connection each time to Server_B.
We can use Websockets but I am not much sure if it would be good approach or not.
In all cases Server_A will be calling Server_B and Server_B will return data instantly.
Server_B will be doing most of the database operations, Server_A has calculations only. Server_A will call Server_B for some data requirements.
In Addition, there will be only one socket connection that is between Server_A and Server_B, all other clients will connect via REST only.
Could anyone suggest if it would be correct approach?
Or you have some better idea.
It would be helpful if I get some code references, modules suggestions.
Thanks

What you are asking about is premature optimization. You are attempting to optimize before you even know you have a problem.
HTTP connections are pretty darn fast. There are databases that work using an HTTP API and those databases are consulted on every HTTP request of the server. So, an HTTP API that is used frequently can work just fine.
What you need to do is to implement your server A using the regular HTTP requests to server B that are already supported. Then, test your system at load and see how it performs. Chances are pretty good that the real bottleneck won't have anything to do with the fact that you're using HTTP requests between server A and server B and if you want to improve the performance of your system, you will probably be working on different problems. This is why you don't want to do premature optimization.
The more moving parts in a system, the less likely you have any idea where the actual bottlenecks are when you put the system under load. That's why you have to test the system under load, instrument it like crazy so you can see where the performance is being impacted the most and then measure like crazy. Then, and only then, will you know where it makes sense to invest your development resources to improve your scalablity or performance.
FYI, a webSocket connection has some advantages over repeated HTTP connections (less connection overhead per request), but also some disadvantages (it's not request/response so you have invent your own way to match a response with a given request).

Related

Node.js design approach. Server polling periodically from clients

I'm trying to learn Node.js and adequate design approaches.
I've implemented a little API server (using express) that fetches a set of data from several remote sites, according to client requests that use the API.
This process can take some time (several fecth / await), so I want the user to know how is his request doing. I've read about socket.io / websockets but maybe that's somewhat an overkill solution for this case.
So what I did is:
For each client request, a requestID is generated and returned to the client.
With that ID, the client can query the API (via another endpoint) to know his request status at any time.
Using setTimeout() on the client page and some DOM manipulation, I can update and display the current request status every X, like a polling approach.
Although the solution works fine, even with several clients connecting concurrently, maybe there's a better solution?. Are there any caveats I'm not considering?

TL;DR The approach you're using is just fine, although it may not scale very well. Websockets are a different approach to solve the same problem, but again, may not scale very well.
You've identified what are basically the only two options for real-time (or close to it) updates on a web site:
polling the server - the client requests information periodically
using Websockets - the server can push updates to the client when something happens
There are a couple of things to consider.
How important are "real time" updates? If the user can wait several seconds (or longer), then go with polling.
What sort of load can the server handle? If load is a concern, then Websockets might be the way to go.
That last question is really the crux of the issue. If you're expecting a few or a few dozen clients to use this functionality, then either solution will work just fine.
If you're expecting thousands or more to be connecting, then polling starts to become a concern, because now we're talking about many repeated requests to the server. Of course, if the interval is longer, the load will be lower.
It is my understanding that the overhead for Websockets is lower, but still can be a concern when you're talking about large numbers of clients. Again, a lot of clients means the server is managing a lot of open connections.
The way large services handle this is to design their applications in such a way that they can be distributed over many identical servers and which server you connect to is managed by a load balancer. This is true for either polling or Websockets.

Is it a bad idea to use a web api instead of a tcp socket for master/slave communication?

TL:DR; Are there any drawbacks / pitfalls to use a RESTful API instead of a TCP/Socket connection for a master-slave pattern?
I'm writing a web application that fetches data from an 3rd party API, calculates statistics (and more), and presents it. The fetched data is stored in a database.
I'm writing this web application with NodeJS and AngularJS.
So there are 2 basic opertions the app will do:
Answer HTTP requests from the front end
Update the database. This includes: Fetching data, saving data to database, and caching results of some heavy queries every 5 minutes in a in-memory database like redis.
I want to seperate these operations in two applications. The HTTP server shall be the master. The second application is a just a slave, of which as many instances can be spawned.
The master will implement some kind of task-processor which just distributes tasks to idle slaves. The Slave is very dumb. It can report about its status (idle/busy and some details like current load etc). You can start tasks on a slave. The master will take care of queueing tasks and so on.
I guess this is common server/agent pattern.
So I've started implenting a TCP Server and Client, but it just feels like so much overhead to do this. A bit like reinventing the wheel. So I thought I just could use a HTTP server on my client which just has two endpoints like
[GET] /status
[POST] /execute/:task
Am I on the right track here?

TL;DR; There are drawbacks to rolling your own REST API for master-slave architecture.
Your server/agent pattern is commonly referred to as microservices.
Rolling your own REST API might work, but is probably not optimal. Dealing with delivery semantics (for example, at most once vs at least once), transient failures, polling, etc will likely cause a lot of pain before you get it right.
There are libraries/services to provide varying levels of convenience:
Seneca - http://senecajs.org
Pigato - http://prdn.github.io/pigato/
Kong by Mashape - http://getkong.org
Webtask by Auth0 (paid) - https://webtask.io

Is RESTful server and use of socket.io contradictory?

I am developing a node/express RESTful server with normal routes and HTTP request and response. Now I am asked to extend the server to provide near real time data to clients using socket.io or some such thing.
I feel like the real time update requires a client and serve connection management and client state management on the server and that on its own is orthogonal to the RESTful aspects of my server. For a client, to get continuous data feed from a RESTful server, the client has to "poll" for it.
Is my statement correct? If not, is there a pattern for providing the two features?

Correct, RESTful services are contrary to the stateful client/server connection needed to give real time I/O. Without knowing how much data your talking about RESTful would be very expensive in I/O.

No, there's no contradiction. Each HTTP request takes a round-trip to set-up, and many more if you support SSL (which you should), or do most forms of authentication. Socket.io lets you minimize those setups, by leaving the connection open and authenticated, ready for the next request. You could do something fancy with the 100 Continue header, following section 8.2.3 of the spec: http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html
But, just doing normal RESTful https pull requests over socket.io should show a meaningful decrease in latency, by minimizing the need to re-establish connections. Of course, implementing full server push support will lower latency even more.
The tradeoff is increased complexity (though socket.io can be implemented in Express in a few lines), and increased memory consumption by leaving sockets open.

Dealing with / preventing potentially malicious requests (AWS, Node.js)

I have a server that is running on aws - it's load balanced to some ec2 instances that run node.js servers. The security groups are set up so that only the LB can hit them on the HTTP port.
I was tailing some log files, and saw a bunch (50 or so at a time, seemingly somewhat periodically) of requests to /manager/html - AFAIK this looks like an attempt to expose a vulnerability in my app or gain access to a database manager of some sort.
My questions are:
Am I being targeted or are these random crawlers? This is on a service that is not even launched yet, so it's definitely obscure. There's been a bit of press about the service, so it's feasible that a person would be aware of our domain, but this subdomain has not been made public.
Are there common conventions for not allowing these types of requests to hit my instances? Preferably, I'd be able to configure some sort of frequency or blacklist in my LB, and never have these types of requests hit an instance. Not sure how to detect malicious vs normal traffic though.
Should I be running a local proxy on my ec2 instances to avoid this type of thing? Are there any existing node.js solutions that can just refuse the requests at the app level? Is that a bad idea?
Bonus: If I were to log the origin of these requests, would that information be useful? Should I try to go rogue and hunt down the origin and send some hurt their way? Should I beeswithmachineguns the originating IP if it's a single origin? (I realize this is silly, but may inspire some fun answers).
Right now these requests are not effecting me, they get 401s or 404s, and it has virtually no impact on other clients. But if this were to go up in scale, what are my options?

There are too many random automated requests are being made, even I host a nodejs server, they try to use cgi and phpmyadmin/wordpress configs. You can just use basic rate limiting techniques (redis-throttle)[https://npmjs.org/package/redis-throttle] for both your NodeJS server and ssh fail2ban to protect yourself from simple DoS attacks.
Automatic requests cannot do harm unless NodeJS or the libraries you have as well known flaws, so you should be always input & security checking all over your server. You should not be worried if you coded well. (Don't dump errors to users, sanitize input etc.)
You can log your 401 and 404s for a week, and filter the most common ones via your LB. Hunting down the IPs and sources will not help you if you are not a hollywood producer or fighting terrorists, as yoır problem is not so imporant and most importantly these requests are mostly from botnets.

We had faced similar issues in the past and we had taken some preventive measures to stop such attacks though it can't guarantee to stop them completely but it showed significant measures in the reduction of such attacks.
http://uksysadmin.wordpress.com/2011/03/21/protecting-ssh-against-brute-force-attacks/
http://www.prolexic.com/knowledge-center-white-paper-ddos-mitigation-incident-response-plan-playbook.html
https://serverfault.com/questions/340307/how-can-i-prevent-a-ddos-attack-on-amazon-ec2
Hope this helps.

Consider running a proxy cache like Varnish in front of your app servers. Use it's VCL to allow access to only the URI you define and reject everything else, allow GET but block PUT and POST, etc... Can also be used to filter http response headers you return. This would let you mask your node.js server as apache for example. Many tuts out on the net to implement this.

Socket.io vs AJAX Use cases

Background: I am building a web app using NodeJS + Express. Most of the communication between client and server is REST (GET and POST) calls. I would typically use AJAX XMLHttpRequest like mentioned in https://developers.google.com/appengine/articles/rpc. And I don't seem to understand how to make my RESTful service being used for Socket.io as well.
My questions are
What scenarios should I use Socket.io over AJAX RPC?
Is there a straight forward way to make them work together. At least for Expressjs style REST.
Do I have real benefits of using socket.io(if websockets are used -- TCP layer) on non real time web applications. Like a tinyurl site (where users post queries and server responds and forgets).
Also I was thinking a tricky but nonsense idea. What if I use RESTful for requests from clients and close connection from server side and do socket.emit().
Thanks in advance.

Your primary problem is that WebSockets are not request/response oriented like HTTP is. You mention REST and HTTP interchangeably, keep in mind that REST is a methodology behind designing and modeling your HTTP routes.
Your questions,
1. Socket.io would be a good scenario when you don't require a request/response format. For instance if you were building a multiplayer game in which whoever could click on more buttons won, you would send the server each click from each user, not needing a response back from the server that it registered each click. As long as the WebSocket connection is open, you can assume the message is making it to the server. Another use case is when you need a server to contact a client sporadically. An analytics page would be a good use case for WebSockets as there is no uniform pattern as to when data needs to be at the client, it could happen at anytime.
The WebSocket connection is an HTTP GET request with a special header requesting the server to upgrade it to a WebSocket connection. Distinguishing different events and message on the WebSocket connection is up to your application logic and likely won't match REST style URIs and methods (otherwise you are replication HTTP request/reply in a sense).
No.
Not sure what you mean on the last bit.

I'll just explain more about when you want to use Socket.IO and leave the in-depth explanation to Tj there.
Generally you will choose Socket.IO when performance and/or latency is a major concern and you have a site that involves users polling for data often. AJAX or long-polling is by far easier to implement, however, it can have serious performance problems in high load situations. By high-load, I mean like Facebook. Imagine millions of people loading their feed, and every minute each user is asking the server for new data. That could require some serious hardware and software to make that work well. With Socket.IO, each user could instead connect and just indefinitely wait for new data from the server as it arrives, resulting in far less overall server traffic.
Also, if you have a real-time application, Socket.IO would allow for a much better user experience while maintaining a reasonable server load. A common example is a chat room. You really don't want to have to constantly poll the server for new messages. It would be much better for the server to broadcast new messages as they are received. Although you can do it with long-polling, it can be pretty expensive in terms of server resources.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string