Connecting two Node/Express apps with streaming JSON - node.js

I currently have two apps running...
One is my REST API layer that provides a number of services to the frontend.
The other is a 'translation app': it can be fed a JSON object (over an HTTP POST call), perform some data translation and mapping on that object, and return it to the REST layer.
My situation is that I want to do this for a large number of objects. The flow I want is:
User requests 100,000 objects in a specific format -> REST layer retrieves them from the database -> passes each JSON object to the translation service for formatting -> the translation service passes each one back to the REST layer -> the REST layer returns the new objects to the user.
What I don't want to do is call translate.example.com/translate with 100,000 separate requests, or pass megabytes of data through one single huge POST request.
So the obvious answer is streaming data to the translate app, and then streaming data back.
There seem to be a lot of ways to stream data between apps: open a WebSocket (socket.io), open a raw TCP connection between the two, or, since the HTTP request and response objects in Node are actually streams, use those and emit a JSON object whenever one is successfully translated.
My question is: is there a best practice for streaming data between two apps? It seems I should use the http (req, res) streams and keep a long-lived connection open to preserve the 'REST' model. Any samples would be great.
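For the HTTP-stream option described above, here is a minimal sketch of what the REST layer could do, assuming newline-delimited JSON framing and the translate.example.com/translate endpoint from the question (everything else here is illustrative):
```js
// Stream objects to the translate service as newline-delimited JSON over one
// long-lived POST, and parse translated objects from the response stream.
const http = require('http');
const readline = require('readline');

function translateAll(objects, onTranslated) {
  const req = http.request(
    {
      host: 'translate.example.com',
      path: '/translate',
      method: 'POST',
      headers: { 'Content-Type': 'application/x-ndjson' },
    },
    (res) => {
      // Each line of the response is one translated JSON object.
      const rl = readline.createInterface({ input: res });
      rl.on('line', (line) => onTranslated(JSON.parse(line)));
    }
  );
  // Write each object as its own line instead of one huge body.
  for (const obj of objects) req.write(JSON.stringify(obj) + '\n');
  req.end();
}
```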

This is one of the best use cases for message queues. You basically create a queue for data to be translated by the translate service, and a queue for data that has already been translated and is ready to be sent back to the user. Your REST layer and translation layer publish and subscribe to the applicable queues and can process the data as it comes in. This has the added benefit of decoupling your REST and translation layers, meaning it becomes trivial to add more translation workers later to handle additional load if necessary.
Take a look at RabbitMQ, but there are plenty of other options as well.
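As a rough illustration of the two-queue setup (a sketch only; the queue names and the translate() callback are assumptions), using the amqplib client for RabbitMQ:
```js
// Worker side: consume raw objects, translate them, and publish the results
// to the queue the REST layer is subscribed to.
const amqp = require('amqplib');

async function startTranslationWorker(translate) {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('to-translate');
  await ch.assertQueue('translated');

  ch.consume('to-translate', (msg) => {
    const obj = JSON.parse(msg.content.toString());
    const result = translate(obj); // your mapping/formatting logic
    ch.sendToQueue('translated', Buffer.from(JSON.stringify(result)));
    ch.ack(msg);
  });
}
```
The REST layer would publish each object to to-translate with ch.sendToQueue(...) in the same way and consume results from translated as they arrive.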

Related

How to give data in small parts from BE to the FE?

My case is that the web application (FE / react.js) is trying to generate a CSV file from the response of a gateway (BE / node.js) service request.
Because the data is too large, the FE currently sends partial requests using limit and offset values and then tries to merge the results.
But the FE wants to get the data in a single request. It looks like we can use a stream for this, but when I searched for usage examples I couldn't find one.
On the gateway service, how can I send multiple requests to the internal service using limit and offset, and serve the result to the FE via a stream?
I'm expecting to return the data in parts to the web application.
Yes, streams or Server-Sent Events (SSE) are the way to go. In your case, I recommend trying SSE.
Here are some examples to start:
https://dev.to/dhiwise/how-to-implement-server-sent-events-in-nodejs-11d9
https://www.digitalocean.com/community/tutorials/nodejs-server-sent-events-build-realtime-app
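As a rough sketch of what the gateway route could look like, assuming a hypothetical internal endpoint that accepts limit/offset, made-up route names, and Node 18+ for the global fetch:
```js
// Page through the internal service with limit/offset and forward each page
// to the FE over a single SSE connection.
const express = require('express');
const app = express();

app.get('/export/stream', async (req, res) => {
  res.set({
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });

  const limit = 1000;
  for (let offset = 0; ; offset += limit) {
    const page = await fetch(
      `http://internal-service/rows?limit=${limit}&offset=${offset}`
    ).then((r) => r.json());
    if (page.length === 0) break;
    // One SSE message per page; the FE appends each page to the CSV it builds.
    res.write(`data: ${JSON.stringify(page)}\n\n`);
  }
  res.write('event: done\ndata: {}\n\n');
  res.end();
});

app.listen(3000);
```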

Data Aggregator/composition service in Microservices

I am developing an application where there is a dashboard for data insights.
The backend is a set of microservices written in Node.js with the Express framework, with a MySQL backend. The pattern used is Database-per-Service, with a message broker in between.
The problem I am facing is that this dashboard derives data from multiple backend services (different databases altogether: some are SQL, some are NoSQL, and some are graph databases).
I want to avoid multiple queries between front end and backend for this screen. However, I want to avoid a single point of failure as well. I have come up with the following solutions.
Use an API gateway aggregator/composer that makes multiple calls to backend services on behalf of a single frontend request, composes all the responses, and sends the result to the client. However, scaling even one service would require scaling the gateway itself. Also, it makes the gateway a single point of contact.
Create a facade service, maybe called a dashboard service, that issues calls to multiple backend services, composes the responses, and sends a single payload back to the client. However, this creates a synchronous dependency.
I favor approach 2. However, I have a question there as well. Since the services are written in Node.js, is there a way to enforce time-bound SLAs for each service, so that if a service doesn't respond to the facade aggregator, the client is returned partial or cached data? Is there any mechanism for this?
GraphQL has been designed for this.
You start by defining a global GraphQL schema that covers all the schemas of your microservices. Then you implement the fetchers (resolvers) that "populate" the response by querying the appropriate microservices. You can run several instances so that you don't have a single point of failure. You can return partial responses if you hit a timeout (the answer will include resolver errors). GraphQL also knows how to manage caching.
Honestly, it is a bit confusing at first, but once you get it, it is really simple to extend the schema and include new microservices in it.
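A minimal sketch of the gateway-schema idea, using the graphql package with made-up field names and service URLs (Node 18+ global fetch assumed):
```js
// A gateway schema whose resolvers fan out to the microservices. A failing
// call surfaces as a resolver error on an otherwise partial response instead
// of failing the whole query.
const { graphql, buildSchema } = require('graphql');

const schema = buildSchema(`
  type Dashboard { sales: String, users: String }
  type Query { dashboard: Dashboard }
`);

const root = {
  dashboard: () => ({
    // Each field is resolved by querying the appropriate microservice.
    sales: fetch('http://sales-service/summary').then((r) => r.text()),
    users: fetch('http://user-service/summary').then((r) => r.text()),
  }),
};

graphql({ schema, source: '{ dashboard { sales users } }', rootValue: root })
  .then((result) => console.log(JSON.stringify(result)));
```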
I can’t speak to the Node-specific implementation, but indeed the second approach lets you model the calls to remote services in a way that the answer is expected within some time boundary.
It depends on how you interconnect the services. The easiest approach is to spawn an HTTP request from the aggregator service to the service that actually provides the data.
This HTTP request can be configured so that it won’t wait longer than X seconds for a response. So you spawn multiple HTTP requests to different services simultaneously and wait for the responses. I come from the Java world, where these settings can be set on the HTTP client making those connections; I’m sure the Node ecosystem has something similar…
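In Node, one way to sketch this (an illustration under assumptions, not the only mechanism; service URLs and the cache object are made up, and AbortSignal.timeout plus global fetch require a recent Node version) is per-call timeouts combined with Promise.allSettled, falling back to cached data for services that miss the deadline:
```js
// Time-bound fan-out from the facade/aggregator: each backend call gets its
// own deadline, and a service that misses it contributes cached data instead
// of failing the whole dashboard request.
async function buildDashboard(cache) {
  const call = (url) =>
    fetch(url, { signal: AbortSignal.timeout(2000) }).then((r) => r.json());

  const [sales, users, graph] = await Promise.allSettled([
    call('http://sales-service/summary'),
    call('http://user-service/summary'),
    call('http://graph-service/summary'),
  ]);

  // Fulfilled results are fresh; rejected or timed-out ones use the cache.
  return {
    sales: sales.status === 'fulfilled' ? sales.value : cache.get('sales'),
    users: users.status === 'fulfilled' ? users.value : cache.get('users'),
    graph: graph.status === 'fulfilled' ? graph.value : cache.get('graph'),
  };
}
```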
If you prefer an asynchronous style of communication between the services, the situation is somewhat more complicated. In this case you can design some kind of ‘transactionId’ into the message protocol. The requests from the aggregator service include such a ‘transactionId’ (a UUID might work) and “demand” that the answer include the same transactionId. The sender, having sent the messages, waits for the responses for a certain amount of time and then stops waiting after X seconds/milliseconds. Any responses that arrive after that time are discarded, because no one on the aggregator side is expecting them anymore.
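A minimal sketch of that correlation-id idea, with the broker publish function, the reply wiring, and queue names left as placeholders:
```js
// The aggregator tags each request with a correlation id, waits up to a
// deadline for the matching reply, and silently drops anything that arrives
// later.
const { randomUUID } = require('crypto');

const pending = new Map(); // correlationId -> resolve function

// Wire this up as the handler for the aggregator's reply queue/topic.
function onReply({ correlationId, payload }) {
  const resolve = pending.get(correlationId);
  pending.delete(correlationId);
  if (resolve) resolve(payload); // late replies find no entry and are ignored
}

// Publish a request and resolve with null if no reply arrives in time.
function ask(publish, queue, payload, timeoutMs) {
  const correlationId = randomUUID();
  return new Promise((resolve) => {
    pending.set(correlationId, resolve);
    publish(queue, { correlationId, payload });
    setTimeout(() => {
      pending.delete(correlationId);
      resolve(null); // timed out -> caller returns partial/cached data
    }, timeoutMs);
  });
}
```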
BTW, this “aggregator” approach is also good and simple from the front-end point of view, because the front end doesn’t have to deal with many requests to the backend, as in the gateway approach, but only with one request. So I completely agree that the aggregator approach is better here.

EventStreams (SSE) - Broadcasting updates to clients. Is it possible?

I have React web application and REST API (Express.js).
I found that using an EventStream is the better choice if you do not want to use long-polling or sockets (there is no need to send data client -> server).
Use case:
A user opens a page with an empty table; other users can add data to it via POST /data.
The table is filled with initial data from the API via GET /data.
The page then connects to the EventStream at /data/stream and listens for updates.
Someone adds a new row and the table needs to be updated...
Is it possible to broadcast this change (new row added) from the backend (the controller for adding rows) to all users connected to /data/stream?
It is generally not good practice to have a fetch for the initial data, then a separate live stream for updates. That's because there is a window where data can arrive on the server between the initial fetch and the live update stream.
Usually, that means you either miss messages or you get duplicates that are published to both. You can eliminate duplicates by tracking some kind of id or sequence number, but that means additional coding and computation.
SSE can be used for both the initial fetch and the live updates on a single stream, avoiding the aforementioned sync challenges.
The client creates an EventSource to initiate an SSE stream. The server responds with the data that is already there, and thereafter publishes any new data that arrives on the server.
If you want, the server can include an event-id with each message. Then, if a client becomes disconnected, the SSE client will automatically reconnect with the last-event-id and the data flow resumes from where it left off. On the client side, the reconnect and resume-from-last-event-id behavior is automatic because it is specified by the standard; the developer doesn't have to do anything.
SSE is kind of like an HTTP / REST / XHR request that stays open and continues to stream data, so you get the best of both worlds. The API is lightweight, easy to understand, and standards-based.
I will try to answer it myself :)
It never occurred to me that I can use just about any pub/sub system on the backend. Every user who connects to the stream (/data/stream) gets subscribed, and the server just publishes to them whenever it receives a new row via POST /data.
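A minimal sketch of that flow with Express, keeping the table in an in-memory array purely for illustration: the stream endpoint sends the current rows on connect, and the POST handler broadcasts each new row to every connected client.
```js
const express = require('express');
const app = express();
app.use(express.json());

const rows = [];
const clients = new Set();

app.get('/data/stream', (req, res) => {
  res.set({
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });
  // Initial state and live updates on the same stream.
  res.write(`data: ${JSON.stringify(rows)}\n\n`);
  clients.add(res);
  req.on('close', () => clients.delete(res));
});

app.post('/data', (req, res) => {
  rows.push(req.body);
  // Broadcast the new row, with an event id so clients can resume.
  for (const client of clients) {
    client.write(`id: ${rows.length}\ndata: ${JSON.stringify(req.body)}\n\n`);
  }
  res.status(201).end();
});

app.listen(3000);
```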

Node.js REST API wrapper for async messaging

Given an event-driven microservice architecture with asynchronous messaging, what solutions are there for implementing a 'synchronous' REST API wrapper such that requests to the REST interface wait for a response event to be published before sending a response to the client?
Example: POST /api/articles
Internally this would send a CreateArticleEvent in the services layer, eventually expecting an ArticleCreatedEvent in response containing the ID of the persisted article.
Only then would the REST interface respond to the end client with this ID.
Dealing with multiple simultaneous requests: is keeping an in-memory map of in-flight requests in the REST API layer, keyed by some correlating identifier, conceptually a workable approach?
How can we deal with timing out requests after a certain period?
Generally you don't need to maintain a map of in-flight requests, because this is basically done for you by node.js's http library.
Just use express as it's intended, and this is probably something you never really have to worry about, as long as you avoid any global state.
If you have a weirder pattern in mind to build and are not sure how to solve it, it might help to share a simple example. Chances are that it's not hard to rebuild it while avoiding global state.
With express, have you tried middleware? You can chain a series of callback functions with a certain timeout after the article is created.
I assume you are in the context of Event Sourcing and microservices? If so, I recommend that you don't publish a CreateArticleEvent to the event store, and instead directly create the article in the database and then publish the ArticleCreatedEvent to the event store.
Why, you ask? Generally this pattern is used to orchestrate different microservices. In the example shown in the link above, it was used to orchestrate how the Customer service should react when an Order is created. Note the past tense. The Order service created the order, and the Customer service reacts to it.
In your case it is easier (and probably better) to just insert the article into the database (by calling the ArticleService directly) and respond with the article ID. Then publish the ArticleCreatedEvent to your event store to trigger other microservices that may want to listen for it (for example, to trigger a notification to the editor for review).
Event Sourcing is a good pattern, but we don't need to apply it to everything.
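A minimal sketch of that suggestion, with articleService and eventStore as placeholder stubs standing in for your real service and event store:
```js
// Persist the article synchronously, respond with the id, then publish the
// past-tense event for other services to react to.
const express = require('express');
const app = express();
app.use(express.json());

// Placeholder implementations for illustration only.
const articleService = {
  create: async (data) => ({ id: Date.now().toString(), ...data }),
};
const eventStore = {
  publish: async (type, payload) => console.log('published', type, payload),
};

app.post('/api/articles', async (req, res) => {
  // 1. Create the article directly (no CreateArticleEvent round-trip).
  const article = await articleService.create(req.body);

  // 2. Respond to the client right away with the persisted id.
  res.status(201).json({ id: article.id });

  // 3. Publish ArticleCreatedEvent so other services can react to it.
  await eventStore.publish('ArticleCreatedEvent', { id: article.id });
});

app.listen(3000);
```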

Node.js best socket.io practice (request-response vs broadcast)

I am new to node.js/socket.io and may not be asking this the right way or even asking the right question.
The goal of my app is to get data from an API, convert it to a JSON object, and store it in MongoDB, then serve the data to the client side when needed. I have thought about two possibilities and was wondering what the best practice would be. My first notion was to broadcast all of the entries in the database to all of the connections every so often. The other idea was to have the client request the data it needs from the server, which then sends the requested data back.
The data stored in the database is around 100 entries. The data would be updated from the API approximately every 30 seconds. If method 1 were chosen, the data would be broadcast every 5-10 seconds. If method 2 were chosen, the data would be sent when requested. The client side will have different situations where not all of the data is needed all the time, and it will have to request data every so often to make sure the data is "fresh".
So my question is: what is the best practice, broadcasting a large chunk every X seconds or sending smaller chunks when requested?
Sorry if this doesn't make sense.
Thanks for your time.
The DDP protocol is definitely an interesting way to go, but it might be overkill. A simpler solution is to take the best from both methods 1 and 2. If latency is not so important and you have spare bandwidth, you can broadcast an "update" message to all clients when new data arrives. Each client then decides whether the update affects it and downloads the data it needs.
A slightly more complicated but more effective approach is a subscription procedure very similar to DDP. For smaller projects you can implement it yourself fairly quickly (a sketch with socket.io rooms follows after the list below). This is how it could work:
Client subscribes to a chunk of data.
Server sends this chunk to the client and remembers which clients subscribed to what.
If the chunk is updated the server goes through the subscription list and sends the new data to subscribers.
Client can unsubscribe at any time by sending a special message, by disconnecting (or optionally by subscribing to a different chunk).
By "chunk" I mean any way of identifying some data. It can be a record ID, a time range, a filter, or anything else that makes sense in your project.
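Here is the sketch mentioned above, using socket.io rooms as the subscription list so the server doesn't have to track subscribers by hand (chunk ids and the update trigger are assumptions for illustration):
```js
const { Server } = require('socket.io');
const io = new Server(3000);

io.on('connection', (socket) => {
  // 1. Client subscribes to a chunk; the room doubles as the subscription list.
  socket.on('subscribe', (chunkId) => socket.join(`chunk:${chunkId}`));

  // 4. Explicit unsubscribe; disconnecting leaves all rooms automatically.
  socket.on('unsubscribe', (chunkId) => socket.leave(`chunk:${chunkId}`));
});

// 3. When a chunk changes (e.g. after the ~30-second API refresh),
//    push the new data only to its subscribers.
function publishChunk(chunkId, data) {
  io.to(`chunk:${chunkId}`).emit('chunk:update', { chunkId, data });
}
```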
