Data aggregator/composition service in microservices - Node.js

I am developing an application where there is a dashboard for data insights.
The backend is a set of microservices written in Node.js with the Express framework, mostly backed by MySQL. The pattern used is Database-per-Service, with a message broker in between.
The problem I am facing is that this dashboard derives its data from multiple backend services (different databases altogether: some SQL, some NoSQL, and some a graph DB).
I want to avoid multiple round trips between the frontend and backend for this screen, but I also want to avoid a single point of failure. I have come up with the following solutions.
1. Use an API gateway aggregator/composer that makes multiple calls to backend services on behalf of a single frontend request, composes the responses, and sends the result to the client. However, scaling even one backend service would require scaling the gateway itself, and it makes the gateway a single point of contact.
2. Create a facade service, perhaps called a dashboard service, that issues calls to the multiple backend services, composes the responses, and sends a single payload back to the client. However, this creates a synchronous dependency.
I favor approach 2. However, I have a question there as well: since the services are written in Node.js, is there a way to enforce a time-bound SLA on each call, so that if a service doesn't respond to the facade aggregator in time, the client is returned partial or cached data? Is there any mechanism for this?

GraphQL has been designed for exactly this.
You start by defining a global GraphQL schema that covers the schemas of all your microservices. Then you implement the fetchers (resolvers) that populate the response by querying the appropriate microservices. You can run several instances so that there is no single point of failure. You can return partial responses on a timeout (the answer will include resolver errors), and GraphQL knows how to manage caching.
Honestly, it is a bit confusing at first, but once you get it, it is really simple to extend the schema and include new microservices in it.
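
To make that concrete, here is a minimal sketch of such a gateway using the graphql npm package and the fetch built into Node 18+; the schema, the two service URLs, and the 2-second timeout are invented for illustration. A fetcher that times out makes its field resolve to null and adds an entry to the errors array, which is the partial-response behavior described above:

const { graphql, buildSchema } = require('graphql');

// Global schema covering two hypothetical microservices
const schema = buildSchema(`
    type Query {
        sales: SalesStats
        traffic: TrafficStats
    }
    type SalesStats { total: Float }
    type TrafficStats { visits: Int }
`);

// Each fetcher queries one microservice with its own timeout
// (AbortSignal.timeout requires Node 17.3+)
async function fetchJson(url) {
    const res = await fetch(url, { signal: AbortSignal.timeout(2000) });
    return res.json();
}

const rootValue = {
    sales: () => fetchJson('http://sales-service/stats'),     // hypothetical URL
    traffic: () => fetchJson('http://traffic-service/stats'), // hypothetical URL
};

// A slow service yields { data: { sales: null, ... }, errors: [...] }, i.e. a partial response
graphql({ schema, source: '{ sales { total } traffic { visits } }', rootValue })
    .then((result) => console.log(JSON.stringify(result)));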

I can't speak to Node's technical implementation, but indeed the second approach lets you model the query calls to remote services in a way that the answer is expected within some time boundary.
It depends on the way you interconnect the services. The easiest approach is to spawn an HTTP request from the aggregator service to each service that actually provides the data.
Each such HTTP request can be configured so that it won't wait longer than X seconds for a response. So you spawn multiple HTTP requests to the different services simultaneously and wait for the responses. I come from the Java world, where these settings are set at the level of the HTTP client making the connections; I'm sure the Node ecosystem has something similar.
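
In Node, one way to get this behavior (a sketch, assuming Node 18+ for the built-in fetch and AbortSignal.timeout; the service URLs, SLAs, and cache are made up for illustration) is to give each call its own abort timeout and fall back to cached data for whichever services are too slow:

// Hypothetical per-widget cache; in practice this could be Redis or an LRU
const cache = new Map();

async function fetchWithSla(name, url, slaMs) {
    try {
        const res = await fetch(url, { signal: AbortSignal.timeout(slaMs) });
        const data = await res.json();
        cache.set(name, data); // refresh the cache on success
        return { name, data, stale: false };
    } catch (err) { // timeout or network error: degrade to cached data
        return { name, data: cache.get(name) ?? null, stale: true };
    }
}

async function buildDashboard() {
    // Every promise resolves (errors are caught above), so no call can sink the whole payload
    const parts = await Promise.all([
        fetchWithSla('orders', 'http://orders-service/summary', 1500),
        fetchWithSla('users', 'http://users-service/summary', 1500),
        fetchWithSla('graph', 'http://graph-service/summary', 3000),
    ]);
    return Object.fromEntries(parts.map((p) => [p.name, p]));
}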
If you prefer an asynchronous style of communication between the services, the situation is somewhat more complicated. In this case you can design some kind of 'transactionId' into the message protocol. The requests from the aggregator service include such a transactionId (a UUID would work) and "demand" that the answer carry the same transactionId. The aggregator, having sent its messages, waits for the responses for a certain amount of time and "quits waiting" after X seconds/milliseconds. Any responses that arrive after that time are discarded, because no one on the aggregator side is expecting to handle them anymore.
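
A broker-agnostic sketch of that correlation scheme (broker.publish and the reply handler below are stand-ins for whatever messaging client you actually use): the aggregator keeps a map of pending transactionIds, resolves the matching promise when a reply arrives, and quits waiting after the deadline, so late replies find no entry and are simply dropped:

const { randomUUID } = require('node:crypto');

const pending = new Map(); // transactionId -> completion callback

// Wire this up wherever reply messages arrive from the broker
function onReply(message) {
    const complete = pending.get(message.transactionId);
    if (complete) complete(message.payload); // late replies find nothing and are discarded
}

function request(broker, topic, payload, timeoutMs) {
    const transactionId = randomUUID();
    return new Promise((resolve) => {
        const timer = setTimeout(() => {
            pending.delete(transactionId);
            resolve(null); // timed out: caller falls back to partial or cached data
        }, timeoutMs);
        pending.set(transactionId, (data) => {
            clearTimeout(timer);
            pending.delete(transactionId);
            resolve(data);
        });
        broker.publish(topic, { transactionId, ...payload });
    });
}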
BTW, this "aggregator" approach is also good and simple from the frontend's perspective, because the frontend deals with only one request rather than the many backend requests of the gateway approach. So I completely agree that the aggregator approach is the better choice here.

Related

Alternative to GraphQL long polling on an Express server for a large request?

Objective
I need to show a big table of data in my React web app frontend.
My backend is an Express server with a GraphQL layer and a few "normal" endpoints.
My server gets data from various sources, including an external API, which is the data source for my current task.
My server has a database that I can use freely. I cannot directly access the external API from my front end.
The data all comes from the external API I mentioned. In fact, it comes from multiple similar calls to the same endpoint with many different IDs. Each of those individual calls takes a while to return but doesn't risk timing out.
Current Solution
My naive implementation: I do one GraphQL query whose resolver makes all the calls to the external API in parallel, waiting for them all to complete with Promise.all(). It then returns one big array containing all the data I need, and my server returns that data to the frontend.
Problem With Current Solution
Unfortunately, this sometimes leaves my frontend hanging so long that the request times out (it takes longer than 2 minutes).
Proposed Solution
Is there a better way than manually implementing long polling in GraphQL?
This is my main plan for a solution at the moment:
1. Frontend sends a request to my server.
2. Server returns a 200, starts hitting the external API, and sets a flag in the database.
3. Server stores the result of each API call in the database as it completes.
4. Meanwhile, the frontend shows a loading screen and keeps making the same GraphQL query for an entity like MyBigTableData, which tells me how many of the external API calls have returned (see the resolver sketch after this list).
5. When they've all returned, the next time I ask for MyBigTableData, the server will send back all the data.
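
As a sketch of step 4, the resolver behind such a MyBigTableData query could read the flag and counters written by the background job; db.jobs and its { total, completed, rows } fields are invented here:

const resolvers = {
    Query: {
        myBigTableData: async (_parent, { jobId }, { db }) => {
            const job = await db.jobs.findById(jobId);
            if (job.completed < job.total) {
                // Still loading: report progress so the frontend keeps polling
                return { done: false, completed: job.completed, total: job.total, rows: [] };
            }
            return { done: true, completed: job.total, total: job.total, rows: job.rows };
        },
    },
};

On the frontend, Apollo Client's pollInterval option (see the polling link further down) can repeat this query every few seconds until done comes back true.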
Question
Is there a better alternative to GraphQL long polling on an Express server for this large request that I have to do?
An alternative that comes to mind is to not use GraphQL and instead use a standard HTTP endpoint, but I'm not sure that really makes much difference.
I also see that HTTP/2 has multiplexing which could be relevant. My server currently runs HTTP/1.1 and upgrading is something of an unknown to me.
I see here that Keep-Alive, which sounds like it could be relevant, is unusable in Safari, which is bad since many of my users access the frontend with Safari.
I can't use WebSockets because of technical constraints, and I don't want to set a ridiculously long timeout on my client either (I'm not sure that's even possible).
I discovered that Apollo's GraphQL client has polling built in: https://www.apollographql.com/docs/react/data/queries/#polling
In the end, I made a REST polling system.

Azure Logic Apps - HTTP communication between microservices

Are Logic Apps considered microservices? If so, isn't making HTTP API calls from Logic Apps, whether through HTTP/Function/APIM connectors, a violation of the rule against direct HTTP communication between microservices?
If possible, never depend on synchronous communication (request/response) between multiple microservices, not even for queries. The goal of each microservice is to be autonomous and available to the client consumer, even if the other services that are part of the end-to-end application are down or unhealthy. If you think you need to make a call from one microservice to other microservices (like performing an HTTP request for a data query) in order to be able to provide a response to a client application, you have an architecture that will not be resilient when some microservices fail.
Moreover, having HTTP dependencies between microservices, like when creating long request/response cycles with HTTP request chains, as shown in the first part of Figure 4-15, not only makes your microservices not autonomous, but also means their performance is impacted as soon as one of the services in that chain is not performing well.
Source: https://learn.microsoft.com/en-us/dotnet/standard/microservices-architecture/architect-microservice-container-applications/communication-in-microservice-architecture
Yes, Logic Apps are primarily HTTP-based services. Whether or not they're "micro" really doesn't matter, because "micro" is too abstract to have any real meaning. It was a useful marketing term at one point, but its tour on the tech fashion runway has ended. So don't even think about that. ;)
What the authors are trying to express is that you should avoid chaining dependencies in an app's architecture: A waits for B, which waits for C, which waits for D, which waits for E, and so on. That's the first line in the graphic.
Instead, Basket can check Catalog on its own, then call Ordering, while Inventory is checked in the background. You go only one level deep instead of four.

Node.js REST API wrapper for async messaging

Given an event-driven microservice architecture with asynchronous messaging, what solutions are there for implementing a "synchronous" REST API wrapper, such that requests to the REST interface wait for a response event to be published before a response is sent to the client?
Example: POST /api/articles
Internally this would send a CreateArticleEvent in the services layer, eventually expecting an ArticleCreatedEvent in response containing the ID of the persisted article.
Only then would the REST interface respond to the end client with this ID.
Dealing with multiple simultaneous requests: is keeping an in-memory map of in-flight requests in the REST API layer, keyed by some correlating identifier, conceptually a workable approach?
How can we deal with timing out requests after a certain period?
Generally you don't need to maintain a map of in-flight requests yourself, because this is basically done for you by Node.js's http library.
Just use Express as it's intended, and this is probably something you never really have to worry about, as long as you avoid any global state.
If you have a weirder pattern in mind and are not sure how to build it, it might help to share a simple example. Chances are that it's not hard to rebuild it without global state.
With Express, have you tried middleware? You can chain a series of callback functions, with a certain timeout, to run after the article is created.
I assume you are in the context of Event Sourcing and microservices? If so, I recommend that you don't publish a CreateArticleEvent to the event store, and instead directly create the article in the database and then publish the ArticleCreatedEvent to the event store.
Why, you ask? Generally this pattern is used to orchestrate different microservices. In the example shown in the link above, it was used to orchestrate how the Customer service should react when an Order is created. Note the past tense: the Order service created the order, and the Customer service reacts to it.
In your case it is easier (and probably better) to just insert the article into the database (by calling the ArticleService directly) and respond with the article ID. Then publish the ArticleCreatedEvent to your event store to trigger other microservices that may want to listen for it (for example, to trigger a notification to the editor for review).
Event Sourcing is a good pattern, but we don't need to apply it to everything.
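
A sketch of that suggested flow in Express, where articleService and eventStore stand in for your own persistence module and event-store client:

const express = require('express');

const app = express();
app.use(express.json());

app.post('/api/articles', async (req, res) => {
    try {
        const article = await articleService.create(req.body); // direct database insert
        await eventStore.publish('ArticleCreatedEvent', { id: article.id }); // for interested services
        res.status(201).json({ id: article.id }); // respond immediately with the ID
    } catch (err) {
        res.status(500).json({ error: err.message });
    }
});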

Node.js, Socket.IO, Express: Should app logic be in socket handlers or the REST API?

I'm planning a non-trivial realtime chat platform. The app has several types of resources: Users, Groups, Channels, Messages. There are roughly 20 types of realtime events having to do with these resources: for instance, submitting a message, a user connecting or disconnecting, a user joining a group, a moderator kicking a user from a group, etc...
Overall, I see two paths to organizing all this complexity.
The first is to build a REST API to manage the resources. For instance, to send a message, POST to /api/v1/messages. Or, to kick a user from a group, POST to /api/v1/group/:group_id/kick/. Then, from within the Express route handler, call io.emit (made accessible through res.locals) with the updated data to notify all related clients. In this case, clients talk to the server through HTTP and the server notifies clients through socket.io.
The other option is to have no REST API at all and to handle all events through Socket.IO. For instance, to send a message, emit a SEND_MESSAGE event. Or, to kick a user, emit a KICK_USER event. Then, from within the socket.io event handler, call io.emit with the updated data to notify all clients.
Yet another option is to have certain actions handled by a REST API, others by socket.IO. For instance, to get all messages, GET api/v1/channel/:id/messages. But to post a message, emit SEND_MESSAGE to the socket.
Which is the most suitable option? How do I determine which actions should be sent through the API and which through socket.io? Is it better not to have a REST API at all for this type of application?
Some of my thoughts so far, nothing conclusive:
Advantages of REST API over the socket.io-only approach:
Easier to organize hierarchically, more modular
Easier to test
More robust and elegant
Simpler auth implementation with middleware
Disadvantages of REST API over the socket.io-only approach:
Slightly less performant (source)
Since a socket connection needs to be open anyways, why not use it for everything?
Slightly harder to manage on the client side.
Thanks for reading!
This could be achieved using sockets.
That's because a chat application will have dozens of actions, like
'STARTS_TYPING', 'STOPS_TYPING', 'SEND_MESSAGE', 'RECEIVE_MESSAGE', ...
Accommodating all these features with REST APIs would produce a complex system that lacks performance.
Also, the concept of rooms in socket.io removes a lot of the headache around group chat implementation.
So it's better to build everything on sockets (socket.io or a WebSocket cluster).
Here is the solution I found for this problem.
The key mistake in my question was assuming that a REST API and websockets were mutually exclusive, because I intended to integrate the business and database logic directly into Express routes and socket.io handlers. Choosing between socket.io and HTTP therefore seemed important, because it would shape the core business logic of my app.
Instead, it shouldn't matter which transport is used. The business logic has to be independent of the transport logic, in its own module.
To do this, I developed a service layer that handles CRUD tasks, but also more specific tasks such as authentication. This service layer can then easily be consumed from Express routes, socket.io handlers, or both.
In the end, this architecture allowed me to easily switch between transport technologies.
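
For illustration, a minimal sketch of that split (the message model, the deps wiring, and the in-memory store are invented; only express and socket.io are assumed): the same sendMessage service function backs both transports.

const express = require('express');
const { createServer } = require('node:http');
const { Server } = require('socket.io');

const app = express();
app.use(express.json());
const httpServer = createServer(app);
const io = new Server(httpServer);

// deps.db is a stand-in for your real data layer
const deps = {
    db: { messages: { insert: async (m) => ({ id: Date.now(), ...m }) } },
    broadcast: (channel, event, data) => io.to(channel).emit(event, data),
};

// Transport-agnostic business logic, in its own module in a real app
async function sendMessage({ channelId, userId, text }, { db, broadcast }) {
    const message = await db.messages.insert({ channelId, userId, text });
    broadcast(channelId, 'message', message); // push to connected clients
    return message;
}

// Express route consuming the service
app.post('/api/v1/messages', async (req, res) => {
    res.status(201).json(await sendMessage(req.body, deps));
});

// socket.io handler consuming the same service
io.on('connection', (socket) => {
    socket.on('SEND_MESSAGE', async (payload, ack) => {
        ack(await sendMessage(payload, deps));
    });
});

httpServer.listen(3000);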

Is a node.js app that both serves a REST API and handles web sockets a good idea?

Disclaimer: I'm new to node.js so I am sorry if this is a weird question :)
I have a Node.js app using Express to serve a REST API. The data served by the REST API is fetched from a NoSQL database by the Node.js app. All clients only use HTTP GET. There is one exception though: data is PUT and DELETEd from the master database (a relational database on another server).
The thought behind this setup is of course to let the "Node.js/NoSQL database" server(s) be a public front end and thereby protect the master database from heavy traffic.
Potentially a number of different client applications will use the REST API, but mainly it will be used by a client app with a long lifetime (typically 0.5 to 2 hours). Instead of letting this app constantly poll the REST API for possible new data, I want to use websockets so that data is only sent to a client when there is new data. I will use a Node.js app for this, and probably socket.io so that it can fall back to API polling if websockets are not supported by the client. New data should be sent to clients each time the master database PUTs or DELETEs objects in the NoSQL database.
The question is whether I should use one Node.js app for both the API and the websockets, or one for the API and one for the websockets.
Things to consider:
- Performance: The app(s) will be hosted on a cluster of servers with a load balancer and an HTTP accelerator in front. Would one app handling everything perform better than two apps with distinct tasks?
- Traffic between apps: If I choose a two-app solution, the API app that receives PUTs and DELETEs from the master database will have to notify the websocket app every time it receives new data (or the master database will have to notify both apps). Could the doubled traffic be a performance issue?
- Code cleanliness: I believe two apps will result in cleaner and better code, but then again there will surely be some code common to both apps, which will mean maintaining two copies of it.
As to how heavy the load can be, it is very difficult to say, but a possible peak can involve:
50000 clients
each listening to up to 5 different channels
new data being sent from the master every 5 seconds
new data should be sent to approximately 25% of the clients (for some data it should be sent to all clients and other data probably below 1% of the clients)
UPDATE:
Thanks for the answers, guys. More food for thought here. I have decided to have two Node.js apps, one for the REST API and one for websockets. The reason is that I believe they will be easier to scale. To begin with, the whole system will be hosted on three physical servers; one Node.js app for the REST API on each server should be sufficient, but the websocket app will probably need several instances on each physical server.
This is a very good question.
If you are looking at a legacy system and you already have a REST interface defined, there is not a lot of advantage to adding WebSockets. Things that may point you to WebSockets would be:
a demand for server-to-client or client-to-client real-time data
a need to integrate with server-components using a classic bi-directional protocol (e.g. you want to write an FTP or sendmail client in javascript).
If you are starting a new project, I would try to have a hard split in the project between:
the serving of static content (images, js, css) using HTTP (that was what it was designed for) and
the serving of dynamic content (real-time data) using WebSockets (load-balanced, subscription/messaging based, automatic reconnect enabled to handle network blips).
So, why should we try to have a hard separation? Let's consider the advantages of an HTTP-based REST protocol.
The use of the HTTP protocol for REST semantics is an invention that has certain advantages:
Stateless Interactions: none of the client's context is to be stored on the server side between the requests.
Cacheable: Clients can cache the responses.
Layered System: undetectability of intermediaries
Easy testing: it's easy to use curl to test an HTTP-based protocol
On the other hand...
The use of a messaging protocol (e.g. AMQP, JMS/STOMP) on top of WebSockets does not preclude any of these advantages.
WebSockets can be transparently load-balanced, messages and state can be cached, efficient stateful or stateless interactions can be defined.
A basic reactive analysis style can define which events trigger which messages between the client and the server.
Key additional advantages are:
a WebSocket is intended to be a long-term persistent connection, usable for multiple different messaging purposes over a single connection
a WebSocket connection allows for full bi-directional communication, allowing data to be sent in either direction in sympathy with network characteristics.
one can use connection offloading to share subscriptions to common topics using intermediaries. This means with very few connections to a core message broker, you can serve millions of connected users efficiently at scale.
monitoring and testing can be implemented with an admin interface to send/receive messages (provided with all message brokers).
the cost of all this is that one needs to deal with re-establishment of state when the WebSocket needs to reconnect after being dropped. Many protocol designers build in the notion of a "sync" message to provide context from the server to the client.
Either way, your model object could be the same whether you use REST or WebSockets, but that might mean you are still thinking too much in terms of request-response rather than publish/subscribe.
The first thing you must think about is how you're going to scale the servers and manage their state. With a REST API this is largely straightforward, as REST APIs are for the most part stateless and every load balancer knows how to proxy HTTP requests. Hence, REST APIs can be scaled horizontally, leaving the few bits of state to the persistence layer (database) to deal with.
With websockets, it's often a different matter. You need to research which load balancer you're going to use (for a cloud deployment, this often depends on the cloud provider), then figure out what kind of websocket support or configuration the load balancer will need. Then, depending on your application, you need to figure out how to manage the state of your websocket connections across the cluster. Think about the different use cases: e.g., if a websocket event on one server alters the state of the data, will you need to propagate this change to a different user on a different connection? If the answer is yes, then you'll probably need something like Redis to manage your ws connections and communicate changes between the servers.
As for performance, at the end of the day it's still just HTTP connections, so I doubt there will be a big difference in separating the server functionality. However, I think two servers would go a long way toward improving code cleanliness, as long as you have another "core" module to isolate the code common to both servers.
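
If you do end up spreading socket.io across several instances, a common way to apply the Redis idea above is socket.io's Redis adapter; a sketch, assuming the redis and @socket.io/redis-adapter packages:

const { createServer } = require('node:http');
const { Server } = require('socket.io');
const { createAdapter } = require('@socket.io/redis-adapter');
const { createClient } = require('redis');

async function start() {
    const pubClient = createClient({ url: 'redis://localhost:6379' });
    const subClient = pubClient.duplicate();
    await Promise.all([pubClient.connect(), subClient.connect()]);

    const httpServer = createServer();
    const io = new Server(httpServer, {
        adapter: createAdapter(pubClient, subClient), // fan events out via Redis pub/sub
    });
    httpServer.listen(3000);

    // io.emit(...) now reaches clients connected to any instance
}

start();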
Personally, I would do them together, because you can share the models and most of the code between the REST API and the WS endpoints.
At the end of the day, what Yuri said in his answer is correct, but it is not so much work to load-balance WS anyway; everyone does it nowadays. The approach I took is to have REST for everything and then create some WS "endpoints" for subscribing to real-time data pushed server-to-client.
From what I understood, your client would just get notifications from the server with updates, so I would definitely go with WS. You subscribe to some events and then you get new results whenever there are any. Repeatedly asking with HTTP calls is not the best way.
We had this need and basically built a small framework around this idea http://devalien.github.io/Axolot/
You can see our approach in the controller below (this is just an example; in our real-world app we have subscriptions so we can notify clients when we have new data or when we finish a procedure). Under actions are the REST endpoints, and under sockets the websocket endpoints.
module.exports = {
    model: 'user', // We are attaching the user model, so CRUD operations are there (good for dev purposes)
    path: '/user', // This is the endpoint
    actions: {
        'get /': [
            function (req, res) {
                var query = {};
                Model.user.find(query).then(function (user) { // Find from the user model declared above
                    res.send(user);
                }).catch(function (err) {
                    res.status(400).send(err);
                });
            }],
    },
    sockets: {
        getSingle: function (userId, cb) { // Callable from socket.io as "user:getSingle"
            Model.user.findOne(userId).then(function (user) {
                cb(user);
            }).catch(function (err) {
                cb({ error: err });
            });
        }
    }
};
