How to give data in small parts from BE to the FE? - node.js

My case: a web application (FE, React) generates a CSV file from the response to a request it sends to the gateway service (BE, Node.js).
Because the data is too large, the FE currently sends partial requests using limit and offset values and then merges the results.
However, the FE wants to get the data in a single request. It looks like a stream could solve this, but when I searched for how to use one, I couldn't find an example.
On the gateway service, how can I send multiple requests to the internal service using limit and offset, and serve the results to the FE via a stream?
I expect the data to be returned to the web application in parts.

Yes, a stream or Server-Sent Events (SSE) is the way to go. In your case, I recommend trying SSE.
Here are some examples to start:
https://dev.to/dhiwise/how-to-implement-server-sent-events-in-nodejs-11d9
https://www.digitalocean.com/community/tutorials/nodejs-server-sent-events-build-realtime-app
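
As a rough illustration of the plain-stream option, here is a minimal sketch of a gateway handler that pages through the internal service with limit and offset and writes each page to the response as soon as it arrives, so the FE receives one chunked CSV download. The names fetchPage, csvPages and handleExport are hypothetical, fetchPage is an in-memory stand-in for the real internal-service call, and the CSV formatting does no quoting or escaping.

```javascript
const { Readable, pipeline } = require('node:stream');

// Hypothetical stand-in for the internal service: returns one page of rows.
// In a real gateway this would be an HTTP call that passes limit and offset.
async function fetchPage(offset, limit) {
  const total = 25; // pretend the internal service holds 25 rows
  const rows = [];
  for (let i = offset; i < Math.min(offset + limit, total); i++) {
    rows.push({ id: i, name: `item-${i}` });
  }
  return rows;
}

// Async generator: yields the CSV header, then one chunk of CSV text per page.
// It stops when the internal service returns fewer rows than the limit.
async function* csvPages(fetchPageFn, limit) {
  yield 'id,name\n';
  for (let offset = 0; ; offset += limit) {
    const rows = await fetchPageFn(offset, limit);
    if (rows.length > 0) {
      // Naive formatting: values are assumed to contain no commas or quotes.
      yield rows.map((row) => `${row.id},${row.name}`).join('\n') + '\n';
    }
    if (rows.length < limit) return;
  }
}

// Request handler: Node sends the body with chunked transfer encoding because
// no Content-Length is set, so only one page is held in memory at a time.
function handleExport(req, res) {
  res.writeHead(200, {
    'Content-Type': 'text/csv; charset=utf-8',
    'Content-Disposition': 'attachment; filename="export.csv"',
  });
  pipeline(Readable.from(csvPages(fetchPage, 10)), res, (err) => {
    if (err) console.error('export stream failed:', err.message);
  });
}

// Usage (not started here):
// require('node:http').createServer(handleExport).listen(3000);
```

On the FE side, a single request to this endpoint is enough; the browser can either download the file directly or read the body incrementally.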

Related

Data Aggregator/composition service in Microservices

I am developing an application where there is a dashboard for data insights.
The backend is a set of microservices written in NodeJS express framework, with MySQL backend. The pattern used is the Database-Per-Service pattern, with a message broker in between.
The problem I am facing is that this dashboard derives its data from multiple backend services (different databases altogether: some SQL, some NoSQL, and some graph databases).
I want to avoid multiple queries between front end and backend for this screen. However, I want to avoid a single point of failure as well. I have come up with the following solutions.
1. Use an API gateway aggregator/composition that makes multiple calls to backend services on behalf of a single frontend request, then composes all the responses together and sends the result to the client. However, scaling even one server would require scaling the gateway itself. It also makes the gateway a single point of contact.
2. Create a facade service, maybe called dashboard service, that issues calls to multiple services in the backend, then composes the responses together and sends a single payload back to the client. However, this creates a synchronous dependency.
I favor approach 2. However, I have a question there as well. Since the services are written in Node.js, is there a way to enforce time-bound SLAs for each service, so that if a service doesn't respond to the facade aggregator in time, the client is returned partial or cached data? Is there any mechanism for this?
GraphQL has been designed for this.
You start by defining a global GraphQL schema that covers all the schemas of your microservices. Then you implement the fetchers (resolvers) that "populate" the response by querying the appropriate microservices. You can start several instances so that you do not have a single point of failure. You can return partial responses if a timeout occurs (your answer will include the resolver errors). GraphQL tooling also knows how to manage caching.
Honestly, it is a bit confusing at first, but once you get it, it is really simple to extend the schema and include new microservices in it.
I can’t speak to Node’s technical implementation, but the second approach does let you model the query calls to remote services so that each answer is expected within some time bound.
It depends on how the services are interconnected. The easiest approach is to issue an HTTP request from the aggregator service to the service that actually provides the data.
This HTTP request can be configured so that it won’t wait longer than X seconds for a response. So you issue multiple HTTP requests to different services simultaneously and wait for the responses. I come from the Java world, where these settings can be set at the level of the HTTP client making those connections; I’m sure the Node ecosystem has something similar…
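
In Node, one way to sketch this is to race each call against a timer and collect whatever has settled, falling back to cached values where available. The helper names withTimeout and aggregate, and the shape of the result object (data, stale, missing), are assumptions made for this illustration; the source functions stand in for real HTTP calls to the backend services.

```javascript
// Rejects if the given promise does not settle within ms milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is always cleaned up afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// sources: { name: () => Promise } map of hypothetical service calls.
// cache:   { name: value } map of last known good values (may be empty).
async function aggregate(sources, ms, cache = {}) {
  const names = Object.keys(sources);
  // allSettled never rejects, so one slow or failing service cannot
  // take down the whole dashboard response.
  const settled = await Promise.allSettled(
    names.map((name) =>
      withTimeout(Promise.resolve().then(() => sources[name]()), ms, name)
    )
  );
  const result = { data: {}, stale: [], missing: [] };
  settled.forEach((outcome, i) => {
    const name = names[i];
    if (outcome.status === 'fulfilled') {
      result.data[name] = outcome.value;
    } else if (name in cache) {
      result.data[name] = cache[name]; // serve cached data and flag it
      result.stale.push(name);
    } else {
      result.missing.push(name); // partial response: this section is absent
    }
  });
  return result;
}
```

Note that the timer only stops the aggregator from waiting; it does not cancel the underlying request. With an HTTP client that accepts an AbortSignal, the request itself can be aborted as well.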
If you prefer an asynchronous style of communication between the services, the situation is somewhat more complicated. In this case you can design some kind of ‘transactionId’ into the message protocol. The requests from the aggregator service include such a ‘transactionId’ (a UUID might work) and “demand” that the answer include the same transactionId. The sender, having sent its messages, waits for the responses for a certain amount of time and then “quits waiting” after X seconds/milliseconds. Any responses that arrive after that time are discarded, because nothing on the aggregator side is expecting to handle them.
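
A minimal sketch of that transactionId idea follows. An in-process EventEmitter stands in for the real message broker, and the class name Requester and the event names 'request' and 'response' are hypothetical choices for this example only.

```javascript
const { EventEmitter } = require('node:events');
const { randomUUID } = require('node:crypto');

class Requester {
  constructor(bus) {
    this.bus = bus;
    this.pending = new Map(); // transactionId -> { resolve, timer }
    bus.on('response', (message) => this.onResponse(message));
  }

  // Sends a request and resolves with the matching response body,
  // or rejects once timeoutMs has passed without a matching response.
  request(payload, timeoutMs) {
    const transactionId = randomUUID();
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(transactionId); // quit waiting
        reject(
          new Error(`no response for ${transactionId} within ${timeoutMs} ms`)
        );
      }, timeoutMs);
      // Register before emitting so that even an immediate reply is matched.
      this.pending.set(transactionId, { resolve, timer });
      this.bus.emit('request', { transactionId, payload });
    });
  }

  onResponse({ transactionId, body }) {
    const entry = this.pending.get(transactionId);
    if (!entry) return; // late or unknown response: discard it
    clearTimeout(entry.timer);
    this.pending.delete(transactionId);
    entry.resolve(body);
  }
}
```

A responding service only has to copy the transactionId from the request into its response message.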
BTW, this “aggregator” approach is also good and simple from the front end's point of view, because the front end doesn’t have to deal with many requests to the backend, as it would in the gateway approach, but only with one. So I completely agree that the aggregator approach is better here.

Firestore snapshot listener limit in node.js application

I have a nodejs backend that is serving as a gRPC server in front of a cloud firestore datastore. In perusing the best practices documentation for Firestore, I noticed: "limit snapshot listeners to 100 per client".
This is a pretty reasonable limitation if a "client" is a web UI or flutter application, but does the same limitation apply to a node.js or golang server connecting to the database via the admin interface? Suddenly, in the best case I am looking at 100 concurrent users per server process, which isn't super-great, if those users each request a single resource in streaming mode.
So: does that 100 snapshot listeners per client limitation apply when the "client" is actually a backend API service? And if so, what are some best practices to work around this?
(yes, I know I could just use the regular client API in the client itself, and will be doing that, I am mostly wondering about the limitations in an academic sense, as I was considering using streaming GRPC because there's a fair bit of data massaging that needs to happen between the storage representation and what the client consumes, so putting that all into a single place on a server where I control the rollout frequency is easier than dealing with data representation sync errors because some client is using an older implementation of a transformer method. Plus: that's extra data / code to ship to clients).
The 100 snapshot listeners per client limit should apply for any client, including a backend API service.
Firestore doesn't have a way to distinguish where the calls come from, and as such there's no built-in mechanism to exempt a backend service from the limitation.

Alternative to GraphQL long polling on an Express server for a large request?

Objective
I need to show a big table of data in my React web app frontend.
My backend is an Express server with a GraphQL layer and a few "normal" endpoints.
My server gets data from various sources, including an external API, which is the data source for my current task.
My server has a database that I can use freely. I cannot directly access the external API from my front end.
The data all comes from the external API I mentioned. In fact, it comes from multiple similar calls to the same endpoint with many different IDs. Each of those individual calls takes a while to return but doesn't risk timing out.
Current Solution
My naive implementation: I do one GraphQL query in which the resolver does all the API calls to the external service in parallel. It waits on them all to complete using Promise.all(). It then returns a big array containing all the data I need to my server. My server then returns that data to me.
Problem With Current Solution
Unfortunately, this sometimes leaves my frontend hanging for too long and it times out (takes longer than 2 minutes).
Proposed Solution
Is there a better way than manually implementing long polling in GraphQL?
This is my main plan for a solution at the moment:
1. The frontend sends a request to my server.
2. The server returns a 200, starts hitting the external API, and sets a flag in the database.
3. The server stores the result of each API call in the database as it completes.
4. Meanwhile, the frontend shows a loading screen and keeps making the same GraphQL query for an entity like MyBigTableData, which tells it how many of the external API calls have returned.
5. Once they have all returned, the next request for MyBigTableData gets all the data back from the server.
Question
Is there a better alternative to GraphQL long polling on an Express server for this large request that I have to do?
An alternative that comes to mind is to not use GraphQL and instead use a standard HTTP endpoint, but I'm not sure that really makes much difference.
I also see that HTTP/2 has multiplexing which could be relevant. My server currently runs HTTP/1.1 and upgrading is something of an unknown to me.
I see here that Keep-Alive, which sounds like it could be relevant, is unusable in Safari which is bad as many of my users use Safari to access the frontend.
I can't use WebSockets because of technical constraints. I don't want to set a ridiculously long timeout on my client either (and I'm not sure if it's possible).
I discovered that Apollo Client (GraphQL) has polling built in: https://www.apollographql.com/docs/react/data/queries/#polling
In the end, I made a REST polling system.
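
The poster did not share their implementation, so the following is only a sketch of what such a REST polling system could look like, following the plan described in the question. An in-memory Map stands in for the database, and startJob, getJobStatus and the shape of the status object are hypothetical.

```javascript
const { randomUUID } = require('node:crypto');

// Stand-in for the database table that tracks in-flight work.
const jobs = new Map();

// Kicks off one external call per id and returns immediately with a job id.
// fetchOne is a hypothetical function wrapping a single external API call.
function startJob(ids, fetchOne) {
  const jobId = randomUUID();
  const job = { total: ids.length, results: [], errors: [] };
  jobs.set(jobId, job);
  for (const id of ids) {
    // Store each result (or failure) as soon as that call completes.
    Promise.resolve()
      .then(() => fetchOne(id))
      .then((row) => job.results.push(row))
      .catch((err) => job.errors.push({ id, message: err.message }));
  }
  return jobId;
}

// What a status endpoint would return on each poll: progress counters
// while the job is running, and the full data set once it is done.
function getJobStatus(jobId) {
  const job = jobs.get(jobId);
  if (!job) return null; // unknown job: the endpoint would answer 404
  const completed = job.results.length + job.errors.length;
  const done = completed === job.total;
  return {
    done,
    completed,
    total: job.total,
    data: done ? job.results : undefined,
    errors: job.errors,
  };
}
```

In an Express app, one route would call startJob and return the job id straight away, and a second route would return getJobStatus(jobId); the frontend polls the second route every few seconds until done is true.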

Streaming an API using request module

The API endpoint I need to access provides a live streaming option only, but what I need is a regular, non-streaming API. Can I achieve this using the request node module?
You can hook up to the stream on your server and store data that arrives in the stream locally on the server in a database and then when a REST request comes in for some data, you look in your local database and satisfy the request from that database (the traditional, non-streaming way).
Other than that, I can't figure out what else you might be trying to do. You cannot "turn a streaming API into a non-streaming one". They just aren't even close to the same thing. A streaming API is like subscribing to a feed of information. You don't make a request, new data is just sent to you when it's available. A typical non-streaming API is that a client makes a specific request and the server responds with data for that specific request.
Here's a discussion of the Twitter streaming API that might be helpful: https://dev.twitter.com/streaming/overview
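
The store-locally-then-serve idea from the first paragraph of this answer can be sketched as follows. This assumes, purely for illustration, that the upstream stream delivers newline-delimited JSON objects with an id field; the Map stands in for the local database, and ingest and getItem are hypothetical names.

```javascript
// Stand-in for the local database that the REST endpoints read from.
const store = new Map();

// Consumes a readable stream of newline-delimited JSON and stores every item.
// Works with any async-iterable stream, such as an HTTP response body.
async function ingest(stream) {
  const decoder = new TextDecoder();
  let buffered = '';
  for await (const chunk of stream) {
    // Chunks may split a line (or a multi-byte character) anywhere.
    buffered +=
      typeof chunk === 'string' ? chunk : decoder.decode(chunk, { stream: true });
    const lines = buffered.split('\n');
    buffered = lines.pop(); // keep the incomplete last line for the next chunk
    for (const line of lines) {
      if (!line.trim()) continue; // skip blank keep-alive lines
      const item = JSON.parse(line);
      store.set(item.id, item);
    }
  }
}

// What a regular REST handler would call: no streaming involved,
// it only reads what has been stored so far.
function getItem(id) {
  return store.get(id) ?? null;
}
```

The ingest call runs for as long as the upstream connection stays open, while REST requests are answered from the store independently of it.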

Connecting two Node/Express apps with streaming JSON

I currently have two apps running...
One is my REST API layer that provides a number of services to the frontend.
The other is a 'translation app'. It can be fed a JSON object (over an HTTP POST call), perform some data translation and mappings on that object, and return it to the REST layer.
My situation is that I want to do this for a large number of objects. The flow I want is:
User requests 100,000 objects in a specific format -> REST layer retrieves them from the database -> passes each JSON data object to the translation service to perform formatting -> pass each one back to the REST layer -> REST layer returns the new objects to the user.
What I don't want to do is call tranlate.example.com/translate in 100,000 separate calls, or pass megabytes of data through a single huge POST request.
So the obvious answer is streaming data to the translate app, and then streaming data back.
There seem to be a lot of ways to stream data across apps: open a websocket (socket.io), open a raw TCP connection between the two, or, since the HTTP request and response objects in Node are actually streams, use those and emit a JSON object each time one is successfully translated.
My question is: is there a best practice for streaming data between two apps? It seems I should use the http (req, res) streams and keep a long-lived connection open to preserve the 'REST' model. Any samples that could be provided would be great.
This is one of the best use cases for message queues. You basically create one queue for data to be translated by the translate service, and another queue for data that has already been translated and is ready to be sent back to the user. Your REST layer and translation layer publish and subscribe to the applicable queues, and can process the data as it comes in. This has the added benefit of decoupling your REST and translation layers, meaning it becomes trivial to add multiple translation layers later to handle additional load if necessary.
Take a look at RabbitMQ, but there are plenty of other options as well.
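
For comparison, the HTTP stream option raised in the question (not the queue-based design recommended in this answer) could be sketched with a Transform stream that translates newline-delimited JSON one object at a time. The translate mapping, the field names, and createTranslateStream are all hypothetical.

```javascript
const { Transform } = require('node:stream');
const { StringDecoder } = require('node:string_decoder');

// Hypothetical mapping; the real translation rules are not given in the question.
function translate(obj) {
  return { id: obj.id, label: String(obj.name).toUpperCase() };
}

// Transform stream: newline-delimited JSON in, translated newline-delimited
// JSON out. Only the current partial line is kept in memory.
function createTranslateStream() {
  const decoder = new StringDecoder('utf8');
  let buffered = '';

  // Parses one line, translates it, and serializes it back to one line.
  const emitLine = (stream, line) => {
    if (line.trim()) {
      stream.push(JSON.stringify(translate(JSON.parse(line))) + '\n');
    }
  };

  return new Transform({
    transform(chunk, _encoding, callback) {
      try {
        buffered += decoder.write(chunk);
        const lines = buffered.split('\n');
        buffered = lines.pop(); // incomplete tail waits for the next chunk
        for (const line of lines) emitLine(this, line);
        callback();
      } catch (err) {
        callback(err); // malformed JSON fails the whole pipeline
      }
    },
    flush(callback) {
      try {
        emitLine(this, buffered + decoder.end()); // last line without newline
        callback();
      } catch (err) {
        callback(err);
      }
    },
  });
}

// Usage inside the translation app's POST handler (not started here):
// require('node:stream').pipeline(req, createTranslateStream(), res, onDone);
```

The REST layer would then pipe its database results into one long-lived POST request and read translated objects from the response as they arrive, so neither side has to buffer all 100,000 objects.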
