Separate REST API and App Server - node.js

We are trying to figure out whether it is good practice to split the REST API and the App Server. We are coding both in Node.js and hosting them on AWS; note that we also want to connect other clients (Android/iOS) to the API and have a separate database server.
Our main questions are:
Is it more secure?
Does it give better performance?
Are there special aspects of development we must consider?
What happens when the REST server is down? Do we have to cache the data on the App Server?
We also have some simple logic on the client side, like "forgot password". Which server handles this (App or REST server)?
Which of them handles authentication?

Is it more secure?
No, it is not. In fact, it is less secure, because the attack surface is larger and you need to authenticate and authorize each service individually.
Does it give better performance?
No. Function calls within the same application are much faster than serializing -> network latency (HTTP(S) overhead) -> deserializing -> processing -> serializing -> network latency (HTTP(S) overhead) -> deserializing.
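To make that overhead concrete, here is a minimal, hypothetical sketch contrasting the two (the local module and the service host are assumptions; fetch assumes Node 18+):

const { getUser } = require('./user-service'); // assumed local module

async function inProcess(id) {
  // Plain function call: no serialization, no network round trip.
  return getUser(id);
}

async function overHttp(id) {
  // The same operation across services pays for JSON serialization,
  // HTTP(S)/network overhead and deserialization on both sides, per call.
  const res = await fetch(`http://user-service.internal/users/${id}`); // assumed host
  return res.json();
}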
Are there special aspects of development we must consider?
Yes: deployment strategy, service discovery, and graceful degradation when an upstream service is unavailable.
What happens when the REST server is down? Do we have to cache the data on the App Server?
This depends on the situation; there is no universal answer. Trust me, nobody other than your team/product owner can answer this question. Mostly the decision will be driven by the contract you establish with your consumers. I would suggest reading about circuit breaking, graceful degradation, and HTTP response codes for partial responses.
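As a starting point, a minimal sketch of the timeout-plus-cached-fallback idea (assuming Node 18+ for the built-in fetch; the URL, timeout and in-memory cache are placeholders for whatever you actually use):

const cache = new Map(); // last known good responses, keyed by URL

async function fetchWithFallback(url, timeoutMs = 2000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, { signal: controller.signal });
    if (!res.ok) throw new Error(`Upstream returned ${res.status}`);
    const data = await res.json();
    cache.set(url, data);                           // refresh cache on success
    return { data, stale: false };
  } catch (err) {
    if (cache.has(url)) {
      return { data: cache.get(url), stale: true }; // degrade gracefully
    }
    throw err;                                      // nothing cached: surface the failure
  } finally {
    clearTimeout(timer);
  }
}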
We also have some simple logic on the client side, like "forgot password". Which server handles this (App or REST server)? Which of them handles authentication?
From this question, I assume that you don't yet have a clear separation of responsibilities between the individual services. That begs the question: why split them into two at this point at all? Why can't all the functionality live in one application to begin with? As you evolve and start feeling the pain of a monolithic application, you can revisit your architecture and break it into smaller pieces as you see fit. In my opinion, for smaller applications a monolithic architecture is much more manageable than microservices.
Just one humble suggestion: I wouldn't name one service "REST" and the other "App server"; that naming may give the impression that you are looking at this from the wrong angle.
In my opinion, if I reached the point where my monolithic app was no longer manageable, I would split it by functionality (look for unrelated entities and take them out as separate services).

Related

Data Aggregator/composition service in Microservices

I am developing an application where there is a dashboard for data insights.
The backend is a set of microservices written in NodeJS with the Express framework, with a MySQL backend. The pattern used is the Database-Per-Service pattern, with a message broker in between.
The problem I am facing is that this dashboard derives data from multiple backend services (different databases altogether: some SQL, some NoSQL, and some from a graph DB).
I want to avoid multiple queries between the front end and the backend for this screen. However, I also want to avoid a single point of failure. I have come up with the following solutions.
Use an API gateway aggregator/composer that makes multiple calls to backend services on behalf of a single frontend request, composes all the responses together, and sends the result to the client. However, scaling even one service would require scaling the gateway itself, and it makes the gateway a single point of contact.
Create a facade service, maybe called the dashboard service, that issues calls to multiple backend services, composes the responses together, and sends a single payload back to the client. However, this creates a synchronous dependency.
I favor approach 2. However, I have a question there as well: since the services are written in NodeJS, is there a way to enforce time-bound SLAs for each service, so that if a service doesn't respond to the facade aggregator, the client is returned partial or cached data? Is there any mechanism for that?
GraphQL was designed for exactly this.
You start by defining a global GraphQL schema that covers all the schemas of your microservices. Then you implement the fetchers (resolvers) that populate the response by querying the appropriate microservices. You can start several instances so that you don't have a single point of failure. You can return partial responses when you hit a timeout (the answer will include resolver errors). GraphQL implementations also know how to manage caching.
Honestly, it is a bit confusing at first, but once you get it, it is really simple to extend the schema and include new microservices in it.
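As an illustration only (not your actual schema), a minimal sketch using the plain graphql package; the type definitions and the two service URLs are assumptions, and fetch assumes Node 18+:

const { graphql, buildSchema } = require('graphql');

const schema = buildSchema(`
  type Profile { id: ID! name: String }
  type Order { id: ID! total: Float }
  type Query {
    profile(id: ID!): Profile
    orders(userId: ID!): [Order]
  }
`);

// Resolvers ("fetchers") populate each field from the service that owns it.
const root = {
  profile: ({ id }) =>
    fetch(`http://profile-service/profiles/${id}`).then(r => r.json()),
  orders: ({ userId }) =>
    fetch(`http://order-service/orders?userId=${userId}`).then(r => r.json()),
};

// One dashboard query fans out to both services; if a resolver throws or
// times out, the result still carries partial `data` plus an `errors` array.
graphql({
  schema,
  source: '{ profile(id: "1") { name } orders(userId: "1") { total } }',
  rootValue: root,
}).then(result => console.log(JSON.stringify(result)));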
I can't speak to the Node-specific implementation, but the second approach indeed lets you model the calls to the remote services so that the answer is expected within some time boundary.
It depends on how you interconnect the services. The easiest approach is to have the aggregator service issue an HTTP request to each service that actually provides the data.
Each HTTP request can be configured so that it won't wait longer than X seconds for a response. So you issue multiple HTTP requests to different services simultaneously and wait for the responses. I come from the Java world, where these settings can be applied at the level of the HTTP client making the connections; I'm sure the Node ecosystem has something similar.
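In Node, one way to get that behaviour with the built-in fetch (Node 18+) is an AbortController per request plus Promise.allSettled, so slow or failing services simply drop out of the composed payload; the service names and deadlines below are made up for the example:

async function fetchWithDeadline(url, ms) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    const res = await fetch(url, { signal: controller.signal });
    return await res.json();
  } finally {
    clearTimeout(timer);
  }
}

async function buildDashboard(userId) {
  const [sales, inventory, graph] = await Promise.allSettled([
    fetchWithDeadline(`http://sales-service/summary/${userId}`, 800),
    fetchWithDeadline('http://inventory-service/levels', 800),
    fetchWithDeadline(`http://graph-service/relations/${userId}`, 800),
  ]);
  // Fulfilled entries carry data; rejected ones (timeout or error) become
  // null so the client still receives a partial payload.
  const value = r => (r.status === 'fulfilled' ? r.value : null);
  return { sales: value(sales), inventory: value(inventory), graph: value(graph) };
}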
If you prefer an asynchronous style of communication between the services, the situation is somewhat more complicated. In this case you can design a 'transactionId' into the message protocol: the requests from the aggregator service include such a 'transactionId' (a UUID would work) and require that the answer carry the same transactionId. The sender, having sent the messages, waits for the responses for a certain amount of time and then quits waiting after X seconds/milliseconds. Any responses that arrive after that are discarded, because nothing on the aggregator side is expecting them anymore.
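A very rough, broker-agnostic sketch of that correlation-id idea (the in-process EventEmitter only stands in for your real messaging client, and the 'orders.get' responder is a fake downstream service):

const { EventEmitter } = require('events');
const { randomUUID } = require('crypto');

const bus = new EventEmitter(); // stand-in for a real broker client

const pending = new Map(); // transactionId -> resolver

// Replies arriving after the deadline find no entry here and are discarded.
bus.on('reply', msg => {
  const resolve = pending.get(msg.transactionId);
  if (resolve) resolve(msg.payload);
});

function ask(topic, payload, timeoutMs = 1000) {
  const transactionId = randomUUID();
  return new Promise(resolve => {
    pending.set(transactionId, data => {
      pending.delete(transactionId);
      resolve(data);
    });
    bus.emit(topic, { transactionId, payload });        // "publish" the request
    setTimeout(() => {
      if (pending.delete(transactionId)) resolve(null); // quit waiting
    }, timeoutMs);
  });
}

// Fake downstream service echoing the transactionId back.
bus.on('orders.get', msg =>
  bus.emit('reply', { transactionId: msg.transactionId, payload: { orders: [] } }));

ask('orders.get', { userId: 1 }).then(console.log); // { orders: [] }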
By the way, this aggregator approach is also good and simple from the front end's perspective, because it doesn't have to deal with many requests to the backend, as it would in the gateway approach, but only with one request. So I completely agree that the aggregator approach is better here.

Is a node.js app that both serves a REST API and handles web sockets a good idea?

Disclaimer: I'm new to node.js so I am sorry if this is a weird question :)
I have a node.js app using express.js to serve a REST API. The data served by the REST API is fetched from a NoSQL database by the node.js app. All clients only use HTTP GET. There is one exception though: data is PUT and DELETEd by the master database (a relational database on another server).
The idea behind this setup is of course to let the node.js/NoSQL server(s) be a public front end and thereby protect the master database from heavy traffic.
Potentially a number of different client applications will use the REST API, but mainly it will be used by a client app with a long lifetime (typically 0.5 to 2 hours). Instead of letting this app constantly poll the REST API for possible new data, I want to use websockets so that data is only sent to the client when there is new data. I will use a node.js app for this, probably with socket.io so that it can fall back to API polling if websockets are not supported by the client. New data should be sent to clients each time the master database PUTs or DELETEs objects in the NoSQL database.
The question is whether I should use one node.js app for both the API and the websockets, or one for the API and one for the websockets.
Things to consider:
- Performance: The app(s) will be hosted on a cluster of servers with a load balancer and an HTTP accelerator in front. Would one app handling everything perform better than two apps with distinct tasks?
- Traffic between apps: If I choose a two-app solution, the API app that receives PUTs and DELETEs from the master database will have to notify the websocket app every time it receives new data (or the master database will have to notify both apps). Could the doubled traffic be a performance issue?
- Code cleanliness: I believe two apps will result in cleaner and better code, but then again there will surely be some code common to both apps, which will lead to having two copies of it.
As to how heavy the load can be, it is very difficult to say, but a possible peak could involve:
- 50000 clients
- each listening to up to 5 different channels
- new data being sent from the master every 5 seconds
- new data should be sent to approximately 25% of the clients (for some data it should be sent to all clients, for other data to probably below 1% of the clients)
UPDATE:
Thanks for the answers guys. More food for thought here. I have decided to have two node.js apps, one for the REST API and one for web sockets. The reason is that I believe it will be easier to scale them. To begin with, the whole system will be hosted on three physical servers, and one node.js app for the REST API on each server should be sufficient, but for the websocket app there will probably need to be several instances on each physical server.
This is a very good question.
If you are looking at a legacy system and you already have a REST interface defined, there is not a lot of advantage to adding WebSockets. Things that may point you to WebSockets would be:
a demand for server-to-client or client-to-client real-time data
a need to integrate with server-components using a classic bi-directional protocol (e.g. you want to write an FTP or sendmail client in javascript).
If you are starting a new project, I would try to have a hard split in the project between:
the serving of static content (images, js, css) using HTTP (that was what it was designed for) and
the serving of dynamic content (real-time data) using WebSockets (load-balanced, subscription/messaging based, automatic reconnect enabled to handle network blips).
So, why should we try to have a hard separation? Let's consider the advantages of an HTTP-based REST protocol.
The use of the HTTP protocol for REST semantics is an invention that has certain advantages:
Stateless interactions: none of the client's context is stored on the server side between requests.
Cacheable: clients can cache the responses.
Layered system: intermediaries (proxies, caches, gateways) are transparent to the client.
Easy testing: it's easy to use curl to test an HTTP-based protocol.
On the other hand...
The use of a messaging protocol (e.g. AMQP, JMS/STOMP) on top of WebSockets does not preclude any of these advantages.
WebSockets can be transparently load-balanced, messages and state can be cached, efficient stateful or stateless interactions can be defined.
A basic reactive analysis style can define which events trigger which messages between the client and the server.
Key additional advantages are:
a WebSocket is intended to be a long-lived persistent connection, usable for multiple different messaging purposes over a single connection
a WebSocket connection allows for full bi-directional communication, allowing data to be sent in either direction in sympathy with network characteristics.
one can use connection offloading to share subscriptions to common topics using intermediaries. This means with very few connections to a core message broker, you can serve millions of connected users efficiently at scale.
monitoring and testing can be implemented with an admin interface for sending/receiving messages (provided with all message brokers).
The cost of all this is that one needs to deal with re-establishing state when the WebSocket has to reconnect after being dropped. Many protocol designers build in the notion of a "sync" message to provide context from the server to the client.
Either way, your model object could be the same whether you use REST or WebSockets, but that might mean you are still thinking too much in terms of request-response rather than publish/subscribe.
The first thing you must think about is how you're going to scale the servers and manage their state. With a REST API this is largely straightforward: REST APIs are for the most part stateless, and every load balancer knows how to proxy HTTP requests, so they can be scaled horizontally, leaving the few bits of state to the persistence layer (database). With websockets it is often a different matter. You need to research which load balancer you're going to use (for a cloud deployment this often depends on the cloud provider), and figure out what websocket support or configuration that load balancer needs. Then, depending on your application, you need to figure out how to manage the state of your websocket connections across the cluster. Think about the different use cases: if a websocket event on one server alters the state of the data, will you need to propagate this change to a different user on a different connection? If the answer is yes, then you'll probably need something like Redis to manage your ws connections and communicate changes between the servers.
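For example (a sketch, assuming socket.io v4 with the @socket.io/redis-adapter and redis packages; host, port and channel names are placeholders), the Redis adapter lets an emit on one instance reach clients connected to any other instance:

const { createServer } = require('http');
const { Server } = require('socket.io');
const { createClient } = require('redis');
const { createAdapter } = require('@socket.io/redis-adapter');

const httpServer = createServer();
const io = new Server(httpServer);

const pubClient = createClient({ url: 'redis://localhost:6379' });
const subClient = pubClient.duplicate();

Promise.all([pubClient.connect(), subClient.connect()]).then(() => {
  io.adapter(createAdapter(pubClient, subClient));

  io.on('connection', socket => {
    socket.on('subscribe', channel => socket.join(channel));
  });

  httpServer.listen(3000);
});

// Later, when the master database pushes new data to this instance,
// the adapter fans the emit out across the whole cluster:
// io.to('prices').emit('update', payload);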
As for performance, at the end of the day it's still just HTTP connections, so I doubt there will be a big difference in separating the server functionality. However, I think two servers would go a long way toward improving code cleanliness, as long as you have another 'core' module to isolate the code common to both servers.
Personally I would do them together, because you can share the models and most of the code between the REST and the WS parts.
At the end of the day, what Yuri said in his answer is correct, but it is not that much work to load-balance WS anyway; everyone does it nowadays. The approach I took is to have REST for everything and then create some WS "endpoints" for subscribing to real-time data pushed from server to client.
From what I understood, your client would just get notifications from the server with updates, so I would definitely go with WS. You subscribe to some events and then you get new results when there are any. Polling with HTTP calls is not the best way.
We had this need and basically built a small framework around this idea: http://devalien.github.io/Axolot/
You can get a feel for our approach from the controller below (this is just an example; in our real-world app we have subscriptions so we can notify clients when there is new data or when a procedure finishes). Under actions are the REST endpoints and under sockets the websocket endpoints.
module.exports = {
  model: 'user', // Attach the user model, so CRUD operations are available (good for dev purposes)
  path: '/user', // This is the endpoint
  actions: {
    'get /': [
      function (req, res) {
        var query = {};
        Model.user.find(query).then(function (user) { // Find from the user model declared above
          res.send(user);
        }).catch(function (err) {
          res.send(400, err);
        });
      }],
  },
  sockets: {
    getSingle: function (userId, cb) { // Callable from socket.io as "user:getSingle"
      Model.user.findOne(userId).then(function (user) {
        cb(user);
      }).catch(function (err) {
        cb({ error: err });
      });
    }
  }
};
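On the client side, judging by the comment above, calling that socket endpoint would presumably look something like this (hypothetical; the exact event name depends on how Axolot registers the handlers):

socket.emit('user:getSingle', userId, function (user) {
  console.log(user); // or { error: ... } on failure
});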

Is it a bad idea to use a web api instead of a tcp socket for master/slave communication?

TL;DR: Are there any drawbacks or pitfalls to using a RESTful API instead of a TCP/socket connection for a master-slave pattern?
I'm writing a web application that fetches data from a 3rd-party API, calculates statistics (and more), and presents it. The fetched data is stored in a database.
I'm writing this web application with NodeJS and AngularJS.
So there are 2 basic operations the app will do:
Answer HTTP requests from the front end
Update the database. This includes: fetching data, saving data to the database, and caching the results of some heavy queries every 5 minutes in an in-memory database like Redis.
I want to separate these operations into two applications. The HTTP server shall be the master. The second application is just a slave, of which as many instances as needed can be spawned.
The master will implement some kind of task processor which distributes tasks to idle slaves. The slave is very dumb: it can report its status (idle/busy and some details like current load, etc.), and you can start tasks on it. The master will take care of queueing tasks and so on.
I guess this is a common server/agent pattern.
So I've started implementing a TCP server and client, but it just feels like so much overhead, a bit like reinventing the wheel. So I thought I could just run an HTTP server on my client (the slave) with just two endpoints like
[GET] /status
[POST] /execute/:task
Am I on the right track here?
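For what it's worth, a minimal sketch of those two slave endpoints in Express (the task runner is a stub, and the port, status payload and response codes are assumptions):

const express = require('express');
const app = express();
app.use(express.json());

// Stub for the real work (fetching, saving, caching).
const runTask = (name, params) => new Promise(resolve => setTimeout(resolve, 1000));

let busy = false;

app.get('/status', (req, res) => {
  res.json({ status: busy ? 'busy' : 'idle', load: process.cpuUsage() });
});

app.post('/execute/:task', (req, res) => {
  if (busy) return res.status(409).json({ error: 'busy' });
  busy = true;
  runTask(req.params.task, req.body).finally(() => { busy = false; });
  res.status(202).json({ accepted: req.params.task });
});

app.listen(4000);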
TL;DR: There are drawbacks to rolling your own REST API for a master-slave architecture.
Your server/agent pattern is commonly referred to as microservices.
Rolling your own REST API might work, but is probably not optimal. Dealing with delivery semantics (for example, at-most-once vs at-least-once), transient failures, polling, etc. will likely cause a lot of pain before you get it right.
There are libraries/services to provide varying levels of convenience:
Seneca - http://senecajs.org
Pigato - http://prdn.github.io/pigato/
Kong by Mashape - http://getkong.org
Webtask by Auth0 (paid) - https://webtask.io

Communication between RESTful APIs on the same server with NodeJS

I am building two sets of services on a website (all written in NodeJS on the server), both using a RESTful approach. For the sake of modularity I decided to make the two services separate entities. The first service deals with the products of the site and the second deals specifically with user-related functions. So the first might have functions like getProducts, deleteProduct, etc., while the second would have functions like isLoggedIn, register, hasAccessTo, etc. The product module will make several calls to the user module to make sure that the person making the calls has the privilege to do so.
Now, the reason I separated them like this is that in the near future I foresee a separate product range opening up that will need to use the same user system as the first (even sharing the same database). The user system will use a database that spans the entire site and all subsequent products.
My question is about communication between these projects and the users project. What is the most effective way of keeping the users module separate without suffering any significant speed hits? If the product API makes a call to the user API on the same server (localhost), is there a significant cost to this, versus building the user API into each of the subsequent projects? Is there a better way to do this, perhaps through interprocess communication? Is simply having the users API run as its own service an effective solution?
If the two Node.js processes run on the same server (machine), network latency is not a problem, because both are on localhost.
The processes will then communicate over a REST API, so under the hood you will be using Node.js sockets. You could use Unix domain sockets instead of HTTP over TCP because they are faster, but they are harder to debug, so I recommend you don't do that (though it's good to know the alternatives).
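For reference, a small hypothetical sketch of the Unix-socket variant: the users API listening on a domain socket, and the products process calling it via Node's socketPath option (the paths, route and response are made up):

const express = require('express');
const http = require('http');

const users = express();
users.get('/hasAccessTo/:resource', (req, res) => {
  res.json({ allowed: true }); // placeholder logic
});

// Listen on a Unix domain socket instead of a TCP port.
users.listen('/tmp/users-api.sock', () => {
  // From the products service, on the same machine:
  http.get(
    { socketPath: '/tmp/users-api.sock', path: '/hasAccessTo/products' },
    res => {
      let body = '';
      res.on('data', chunk => (body += chunk));
      res.on('end', () => console.log(JSON.parse(body)));
    }
  );
});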
Finally, your system looks like the "actor" design pattern. At first glance this design pattern is a little difficult to understand, but you can have a look at these if you want more info about the actor model:
Actor model for NodeJS https://github.com/benlau/nactor
Actor model explanation http://en.wikipedia.org/wiki/Actor_model

How do I maintain state across multiple web servers?

Can I have multiple web servers hooked up to a SQL Server cluster and still maintain a user's session?
I've thought of various approaches. The one suggested by the Microsoft site is to use response.redirect to the "correct" server. While I can understand the reasoning for this, it seems kind of short sighted.
If the load balancer is sending you to the server currently under the least strain, surely as a developer you should honor that?
Are there any best practices to follow in this instance? If so, I would appreciate knowing what they are and any insights into the pros/cons of using them.
Some options:
The load balancer can be configured for sticky sessions. Make sure your app's session timeout is less than the load balancer's, or you'll get bounced around with unpredictable results.
You can use a designated state server to handle session. Then it won't matter where they get bounced by the LB.
You can use SQL server to manage session.
Check this on serverfault.
https://serverfault.com/questions/19717/load-balanced-iis-servers-with-asp-net-inproc-session
I'm speaking here from my experience with Java app servers, some of which have very sophisticated balancing algorithms.
A reasonable general assumption is that "session affinity" is preferable to balancing every request. If we allocate the initial request for each user with some level of workload knowledge (or even on a random basis), then as the population comes and goes we end up with reasonable behaviour. Remember that the objective is to give each user a good experience, not to end up with evenly used servers!
In the event of a server failing, we can then see our requests move elsewhere, and we expect to see our session transferred. There are lots of ways to achieve that (session in a DB, session state propagated via high-speed messaging, ...).
This probably isn't the answer you're looking for, but can you eliminate the need for session state? We've gone to great lengths to encode whatever we might need between requests in the page itself. That way I have no concern about state across a farm, or about the scalability issues of having to hang onto something owned by someone who might never come back.
While you could use "sticky" sessions in your load balancer, a more optimal path is to have your Session use a State Server instead of InProc. At that point, all of your webservers can point to the same state server and share session.
http://msdn.microsoft.com/en-us/library/ms972429.aspx MSDN has plenty to say on the subject :D
UPDATE:
The State Server is a service on your Windows server boxes, but yes, it is a single point of failure.
Additionally, you could specify serialization of the session to a SQL Server, which wouldn't be a single point of failure if you had it farmed.
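For completeness, a hypothetical web.config sketch of those two out-of-process options (the server names and connection strings are placeholders):

<system.web>
  <!-- Option 1: shared ASP.NET State Server -->
  <sessionState mode="StateServer"
                stateConnectionString="tcpip=stateserver:42424"
                timeout="20" />

  <!-- Option 2: persist session to SQL Server instead -->
  <!--
  <sessionState mode="SQLServer"
                sqlConnectionString="Data Source=sqlcluster;Integrated Security=SSPI;"
                timeout="20" />
  -->
</system.web>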
I'm not sure how "heavy" the workload is for a state server; does anyone else have any metrics?

Resources