Let's say I have an application that returns exchange rates for today.
The service should read data via REST, save it in a cache, and serve clients from that cache. I want the request to the 3rd party API to happen upon the first attempt to get today's rate (a kind of lazy init for every day).
Something like this:
(1) HTTP request to get the rate (from my app's client)
(2) if rate for today is available then return it
else
(3) read it from 3rd party service (via REST request)
(4) save in cache
(5) return from cache
The whole logic is written by means of promises, but there is a problem if I have millions of requests simultaneously at the very beginning of the day. In this case, if one of the requests is on operations (3), (4), or (5) (which are organized as a promise chain), operations (1) and (2) for other requests can be handled by Node in between.
E.g. while the first request is still waiting for the 3rd party API to respond and the cache is empty, another million requests can also fire the same request to the same 3rd party API.
My thought is to chain operation (3) to some kind of object A with a promise (A.promise) inside that exposes its resolve function on A. All other requests would wait (not synchronously wait, of course) till the first request updates the cache and calls A.resolve(), which resolves A.promise.
But it looks a bit ugly; any idea of a better approach?
Update
I've got one solution; not sure whether it's Node.js style:
function Deferred() {
    this.promise = false;
    this.markInProgress = () => {
        this.promise = new Promise((res, rej) => {
            this.resolve = res;
            this.reject = rej;
        });
    };
    this.markDone = () => {
        this.resolve();
        this.promise = false;
    };
    // This must be a function: assigning `this.promise` directly would
    // capture its value once, at construction time, and never update
    this.isInProgress = () => !!this.promise;
}
let state = new Deferred();
function updateCurrencyRate() {
    return db.any(`select name from t_currency group by name`)
        .then((currencies) =>
            getRateFromCbr()
                .then(res => Promise.all(
                    currencies.map(currency =>
                        saveCurrency(
                            currency.name,
                            parseRate(res, currency.name))))));
}
function loadCurrencyRateForDate(date) {
    if (state.isInProgress()) {
        return state.promise;
    } else {
        state.markInProgress();
        return updateCurrencyRate()
            .then(() => {
                state.markDone();
            })
            .catch((err) => {
                // Reject the shared promise and reset the state, so waiting
                // requests fail fast and the next request retries the load
                state.reject(err);
                state.promise = false;
                throw err;
            });
    }
}
function getCurrencyRateForDate(date) {
    return getCurrencyRateFromDb(date)
        .then((rate) => {
            if (rate[0]) {
                return rate;
            } else {
                // Note the `return` here: without it, callers would
                // resolve with undefined before the rate is loaded
                return loadCurrencyRateForDate(date)
                    .then(() => getCurrencyRateFromDb(date));
            }
        });
}
I would take a very simple queue, flush, and fallback approach to this.
Implement a queuing mechanism (maybe with RabbitMQ) and route all your requests through the queue. This way you can hold off responding to requests when the cache expires.
Create an expirable cache layer (maybe a Redis cache) and expire your cache every day.
By default, route your requests from the queue to get data from the cache. If, on reading from the cache, you find it has expired, hold the queue, get the data directly from the 3rd party, and update your cache and its expiry.
Flush your cache every day.
With queues, you have better control over the traffic. You can also add the 3rd party API call as a fallback way to get data when your cache fails or anything else goes wrong.
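To make the expirable-cache part concrete, here is a minimal sketch using node-redis (v3, callback style) with a single shared in-flight refresh, so only one request hits the 3rd party API on a cache miss. getRateFromCbr comes from the question; the key naming and the expiry-at-midnight choice are assumptions:
const redis = require("redis");
const { promisify } = require("util");
const client = redis.createClient();
const getAsync = promisify(client.get).bind(client);
const setAsync = promisify(client.set).bind(client);

let inFlight = null; // single in-flight refresh shared by all cache misses

function secondsUntilMidnight() {
    const now = new Date();
    const midnight = new Date(now);
    midnight.setHours(24, 0, 0, 0);
    return Math.ceil((midnight - now) / 1000);
}

async function getTodayRate() {
    const key = "rate:" + new Date().toISOString().slice(0, 10);
    const cached = await getAsync(key);
    if (cached) return JSON.parse(cached);
    // Every request that misses the cache awaits the same promise,
    // so the 3rd party API is called only once per day
    if (!inFlight) {
        inFlight = getRateFromCbr()
            .then(async (rate) => {
                // "EX": expire the key at the end of the day
                await setAsync(key, JSON.stringify(rate), "EX", secondsUntilMidnight());
                return rate;
            })
            .finally(() => { inFlight = null; });
    }
    return inFlight;
}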
Related
I have an events table with multiple records.
Columns - id, start_time, end_time ...
I have to fetch the analytics for all the live events (which can be thousands at a certain time) repeatedly via third party API calls, which can take one event at a time. I have to do this repeatedly until each live event ends. Let's say the minimum interval for fetching an event's analytics is 15 minutes.
Third party API calls need to be sequential.
I am open to using any tool, e.g. Redis.
What are efficient ways to do this?
I need something like an LRU system with repetition, but I don't know exactly how to implement it.
One efficient way to achieve this would be to use an asynchronous control-flow helper such as the async library's mapSeries function in combination with the Redis set command.
Here is an example of how you could use the mapSeries function and Redis to make API requests for all the matching IDs in a table:
const async = require("async");
const redis = require("redis");
const client = redis.createClient();

// function to get the ids that match the start_time and end_time filter
const IDS = getIdsFromTable();

async.mapSeries(IDS, (ID, callback) => {
    // Make API request with ID, producing `result`
    // ...
    client.set(ID, JSON.stringify(result), function (err, reply) {
        if (err) {
            console.log(err);
        }
        console.log(reply);
        // Once the result is saved, call the callback function
        // to move on to the next ID
        callback(null, result);
    });
}, (err, results) => {
    // All API requests have completed and are saved in Redis;
    // `results` is an array of all the responses
});
You can set a timeout for each request so that if any request takes more time than expected it does not block the execution, and handle any errors that occur. It is also important to consider the data expiration time in Redis: if you don't need the data after a certain time, you can set a time-to-live (TTL) on the keys.
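For the repetition part, a minimal sketch could re-run the whole series every 15 minutes and let keys expire on their own. The fetchAnalytics helper and the 15-minute TTL are assumptions, not part of the code above:
const FIFTEEN_MIN = 15 * 60 * 1000;

function runAnalyticsPass() {
    const IDS = getIdsFromTable(); // live events only
    async.mapSeries(IDS, (ID, callback) => {
        fetchAnalytics(ID) // hypothetical sequential 3rd party call
            .then((result) => {
                // "EX", 900 sets a 15-minute time-to-live on the key
                client.set(ID, JSON.stringify(result), "EX", 900, (err) => {
                    callback(err, result);
                });
            })
            .catch(callback);
    }, (err) => {
        if (err) console.error(err);
    });
}

runAnalyticsPass();
setInterval(runAnalyticsPass, FIFTEEN_MIN);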
I want to make a progress bar that tells the user where in the process of fetching the API my backend is. But it seems like every time I send a response it stops the request. How can I avoid this, and what should I google to learn more, since I didn't find anything online?
React:
const { data, error, isError, isLoading } = useQuery('posts', fetchPosts);
if (isLoading) { return <p>Loading..</p>; }
return (data && <p>{data}</p>);
Express:
app.get("api/v1/testData", async (req, res) => {
try {
const info = req.query.info
const sortByThis = req.query.sortBy;
if (info) {
let yourMessage = "Getting Data";
res.status(200).send(yourMessage);
const valueArray = await fetchData(info);
yourMessage = "Data retrived, now sorting";
res.status(200).send(yourMessage);
const sortedArray = valueArray.filter((item) => item.value === sortByThis);
yourMessage = "Sorting Done now creating geojson";
res.status(200).send(yourMessage);
createGeoJson(sortedArray)
res.status(200).send(geojson);
}
else { res.status(400) }
} catch (err) { console.log(err) res.status(500).send }
}
You can only send one response to a request in HTTP.
In case you want to have status updates over HTTP, the client needs to poll the server, i.e. request status updates from the server. Keep in mind, though, that every request needs to be processed on the server side and takes away resources that are then not available for other (more important) requests from other clients. So don't poll too frequently.
If you want to support long-running operations over HTTP, have a look at the following API design pattern.
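As an illustration of that pattern, a minimal sketch: the server acknowledges the request immediately and exposes the progress under a separate endpoint the client can poll. The route names, the 202 status code, and the in-memory progress map are assumptions:
const progress = new Map(); // jobId -> latest status message

app.get("/api/v1/testData", (req, res) => {
    const jobId = Date.now().toString();
    progress.set(jobId, "Getting Data");
    // Acknowledge immediately; do the actual work in the background
    res.status(202).send({ jobId });
    fetchData(req.query.info)
        .then((valueArray) => {
            progress.set(jobId, "Data retrieved, now sorting");
            // ... keep calling progress.set(jobId, ...) at each step
        })
        .catch(() => progress.set(jobId, "Failed"));
});

app.get("/api/v1/status/:jobId", (req, res) => {
    res.status(200).send({ status: progress.get(req.params.jobId) });
});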
Alternatively, you could use a WebSockets connection to push updates from the server to the client. I assume your computation on the backend does not take minutes and you want to update the client in real time, so WebSockets will probably be the best option for you. A WebSocket connection, once established, has considerably less overhead than sending full HTTP requests/responses between client and server.
Have a look at this thread, which discusses the abovementioned and other possibilities.
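For example, here is a minimal server-side sketch with the ws package (v8+, npm install ws); the port, message shapes, and the reuse of fetchData/createGeoJson from the question are illustrative:
const { WebSocketServer } = require("ws");
const wss = new WebSocketServer({ port: 8081 });

wss.on("connection", async (ws, req) => {
    const info = new URL(req.url, "http://localhost").searchParams.get("info");
    try {
        // Each send() is just another frame on the same open connection,
        // so HTTP's one-response-per-request limit does not apply
        ws.send(JSON.stringify({ progress: "Getting Data" }));
        const valueArray = await fetchData(info);
        ws.send(JSON.stringify({ progress: "Data retrieved, now sorting" }));
        const geojson = createGeoJson(valueArray);
        ws.send(JSON.stringify({ done: true, data: geojson }));
    } catch (err) {
        ws.send(JSON.stringify({ error: true }));
    } finally {
        ws.close();
    }
});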
Rephrased at the end
NodeJS communicates with other APIs through GRPC.
Each external API has its own dedicated GRPC connection with Node and every dedicated GRPC connection has an upper bound of concurrent clients that it can serve simultaneously (e.g. External API 1 has an upper bound of 30 users).
Every request to the Express API may need to communicate with External API 1, External API 2, or External API 3 (from now on, EAP1, EAP2, etc.), and the Express API also has an upper bound of concurrent clients (e.g. 100 clients) that it can feed the EAPs with.
So, how I am thinking of solving the issue:
A Client makes a new request to the Express API.
A middleware, queueManager, creates a Ticket for the client (think of it as a Ticket that approves access to the System - it has basic data of the Client (e.g. name))
The Client gets the Ticket, creates an Event Listener that listens for an event with their Ticket ID as the event name (when the System is ready to accept a Ticket, it emits the Ticket's ID as an event), and enters a "Lobby" where the Client just waits till their Ticket ID is accepted/announced (see the sketch after this list).
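A minimal sketch of that Lobby idea with Node's built-in EventEmitter; the waitForTicket/admit helpers are illustrative, not part of the question's code:
const { EventEmitter } = require("events");
const lobby = new EventEmitter();
lobby.setMaxListeners(0); // many clients may be waiting at once

function waitForTicket(ticketId) {
    // The client "enters the Lobby": the promise resolves when the
    // System announces the Ticket ID as an event
    return new Promise((resolve) => lobby.once(ticketId, resolve));
}

function admit(ticketId) {
    // Called by the System when a position frees up
    lobby.emit(ticketId);
}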
My issue is that I can't really think of how to implement the way the system will keep track of the tickets, and how to have a queue based on the concurrent clients of the system.
Before the client is granted access to the System, the System itself should:
Check if the Express API has reached its upper-bound of concurrent clients -> If that's true, it should just wait till a new Ticket position is available
If a new position is available, it should check the Ticket and find out which API it needs to contact. If, for example, it needs to contact EAP1, it should check how many clients currently use the GRPC connection (this is already implemented: every External API is under a Class that has all the information that is needed). If EAP1 has reached its upper bound, then NodeJS should try again later (but how much later? Should I emit a system event after the System has completed another request to EAP1?).
I'm aware of Bull, but I am not really sure if it fits my requirements.
What I really need to do is to have the Clients in a queue, and:
Check if Express API has reached its upper-bound of concurrent users
If a position is free, pop() a Ticket from the Ticket's array
Check if the EAPx has reached its upper-bound limit of concurrent users
If true, try another ticket (if available) that needs to communicate with a different EAP
If false, grant access
Edit: One more idea could be to have two Bull Queues. One for the Express API (where the option "concurrency" could be set as the upper bound of the Express API) and one for the EAPs. Each EAP Queue will have a distinct worker (in order to set the upper bound limits).
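For illustration, a minimal sketch of that two-queue idea with Bull (assuming a local Redis instance; the queue name, the concurrency of 30, and the callEap1 helper are placeholders):
const Queue = require("bull");

// One queue per EAP; Bull's per-queue concurrency acts as the upper bound
const eap1Queue = new Queue("EAP1");
eap1Queue.process(30, async (job) => {
    // talk to External API 1 over its GRPC connection
    return callEap1(job.data);
});

// Enqueue from the Express side and wait for the worker's result
async function handleEap1Request(payload) {
    const job = await eap1Queue.add(payload);
    return job.finished();
}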
REPHRASED
In order to be more descriptive about the issue, I'll try to rephrase the needs.
A simple view of the System could be:
I have used Clem's suggestion (RabbitMQ), but again, I can't achieve concurrency with limits (upper-bounds).
So,
Client asks for a Ticket from the TicketHandler. In order for the TicketHandler to construct a new Ticket, the client, along with other information, provides a callback:
TicketHandler.getTicket(variousInfo, function () {
next();
})
The callback will be used by the system to allow a Client to connect with an EAP.
TicketHandler gets the ticket:
i) It adds it to the queue.
ii) When the ticket can be accessed (the upper bound is not reached), it asks the appropriate EAP Handler whether the client can make use of the GRPC connection. If yes, it asks the EAP Handler to lock a position and then calls the ticket's callback (from Step 1).
If no, TicketHandler checks the next available Ticket that needs to contact a different EAP. This goes on until the EAP Handler that first reported "no position is available" sends a message to TicketHandler informing it that "now there are X available positions" (or "1 available position"). Then TicketHandler should check the ticket that couldn't access EAPx before and ask EAPx again whether it can access the GRPC connection.
From your description, I understand the following:
You have a Node.js front-tier. Each Node.js box needs to be limited to up to 100 clients
You have an undefined back-tier that has GRPC connections with the boxes in the front-tier (let's call them EAPs). Each EAP <-> Node.js GRPC link is limited to N concurrent connections.
What I see here are only server-level and connection-level limits, thus I see no reason to have any distributed system (like Bull) to manage the queue. If a Node.js box dies, no one is able to recover the HTTP request context to offer a response to that specific request; therefore, when a Node.js box dies, responses to its requests are no longer useful.
This being considered, I would simply create a local queue (as simple as an array) to manage your queuing.
Disclaimer: what follows has to be considered pseudo-code; it is simplified and untested.
This may be a Queue implementation:
interface SimpleQueueObject<Req, Res> {
  req: Req;
  then: (res: Res) => void;
  catch: (err: any) => void;
}
class SimpleQueue<Req = any, Res = any> {
constructor(
protected size: number = 100,
/** async function to be executed when a request is de-queued */
protected execute: (req: Req) => Promise<Res>,
    /** an optional function that may be used to indicate a request is
    not yet ready to be de-queued. In such a case the next request will be attempted */
    protected ready?: (req: Req) => boolean,
) { }
_queue: SimpleQueueObject<Req, Res>[] = [];
_running: number = 0;
private _dispatch() {
// Queues all available
while (this._running < this.size && this._queue.length > 0) {
// Accept
let obj;
if (this.ready) {
const ix = this._queue.findIndex(o => this.ready(o.req));
        // todo : this may cause the queue to stall (for now we simply return)
        if (ix === -1) return;
        obj = this._queue.splice(ix, 1)[0];
      } else {
        // FIFO: take the oldest request (pop() would serve the newest first)
        obj = this._queue.shift();
}
// Execute
this.execute(obj.req)
// Resolves the main request
.then(obj.then)
.catch(obj.catch)
// Attempts to queue something else after an outcome from EAP
.finally(() => {
this._running --;
this._dispatch();
});
this._running ++;
}
}
/** Queue a request, fail if queue is busy */
  queue(req: Req): Promise<Res> {
    if (this._running >= this.size) {
      // Reject instead of throwing synchronously, so callers can
      // always rely on the returned promise's .catch()
      return Promise.reject("Queue is busy");
    }
// Queue up
return new Promise<Res>((resolve, reject) => {
this._queue.push({ req, then: resolve, catch: reject });
this._dispatch();
});
}
/** Queue a request (even if busy), but wait a maximum time
* for the request to be de-queued */
queueTimeout(req: Req, maxWait: number): Promise<Res> {
return new Promise<Res>((resolve, reject) => {
      // Expire if not de-queued after maxWait
      const _t = setTimeout(() => {
        const ix = this._queue.indexOf(obj);
        if (ix !== -1) {
          this._queue.splice(ix, 1);
          reject("Request expired");
        }
      }, maxWait);
      // Clear the expiry timer as soon as the request is served
      const obj: SimpleQueueObject<Req, Res> = {
        req,
        then: (res) => { clearTimeout(_t); resolve(res); },
        catch: (err) => { clearTimeout(_t); reject(err); },
      };
      // Queue up
      this._queue.push(obj);
      this._dispatch();
})
}
isBusy(): boolean {
return this._running >= this.size;
}
}
And then your Node.js business logic may do something like:
const EAP1: SimpleQueue = /* ... */;
const EAP2: SimpleQueue = /* ... */;
const INGRESS: SimpleQueue = new SimpleQueue<any, any>(
100,
// Forward request to EAP
async req => {
if (req.forEap1) {
// Example 1: this will fail if EAP1 is busy
return EAP1.queue(req);
} else if (req.forEap2) {
// Example 2: this will fail if EAP2 is busy and the request can not
// be queued within 200ms
return EAP2.queueTimeout(req, 200);
}
}
)
app.get('/', function (req, res) {
// Forward request to ingress queue
INGRESS.queue(req)
.then(r => res.status(200).send(r))
.catch(e => res.status(400).send(e));
})
Alternatively, this solution will allow you (as requested) to also accept requests for busy EAPs (up to a maximum of 100 in total) and dispatch them when they become ready:
const INGRESS: SimpleQueue = new SimpleQueue<any, any>(
100,
// Forward request to EAP
async req => {
if (req.forEap1) {
return EAP1.queue(req);
} else if (req.forEap2) {
return EAP2.queue(req);
}
},
// Delay queue for busy consumers
req => {
if (req.forEap1) {
return !EAP1.isBusy();
} else if (req.forEap2) {
return !EAP2.isBusy();
} else {
return true;
}
}
)
Please note that:
in this example, Node.js will start rejecting when more than 100 concurrent requests are received (it is not unusual to respond with a 503 while throttling)
Be careful when you have multiple throttling limits (Node.js and GRPC in your case), as the first may cause the second's starvation (think about receiving 100 requests for EAP1 and then 10 for EAP2: Node.js will be full of EAP1 requests and will refuse EAP2 ones even though EAP2 is doing nothing)
I have to insert rows into a table with data regarding sent emails, after each email is sent.
Inside a loop I'm stuffing an array to be resolved by Promise.all().
insertData is a function that inserts data, given two arguments: connector, the connection pool, and dataToInsert, an object with the data to be inserted.
async function sendAndInsert(payload) {
  for (const data of payload) {
    const array = [];
    const dataToInsert = {
      id: data.id,
      campaign: data.campaign,
    };
    for (const email of data) {
      array.push(insertData(connector, dataToInsert));
    }
    await Promise.all(array);
  }
}
Afterwards, the function is invoked:
async function invoke() {
  await sendAndInsert(toInsertdata);
}
To insert 5000 records, it takes about 10 minutes, which is nuts.
Using
nodejs v10
pg-txclient as DB connector to PostgreSql.
What I've done (and what can be discarded as a possible source of error):
Inserted random stuff into the table using the same connection.
I'm sure there is no issue with the DB server or the connection.
The issue must be in the Promise.all() / await stuff.
It looks like each record is being inserted through a separate call to insertData. Each call is likely to include overhead such as network latency, and 5000 requests cannot all be handled simultaneously. One call to insertData has to send the data to the database and wait for a response before the next call can even start sending its data. 5000 requests over 10 minutes corresponds to 120 ms of latency per request, which is not unreasonable if the database is on another machine.
A better strategy is to insert all of the objects in one network request. You should modify insertData to allow it to accept an array of objects to insert instead of just one at a time. Then, all data can be sent at once to the database and you only suffer through the latency a single time.
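As a sketch of what that could look like with pg-promise (the question uses pg-txclient, so the exact API may differ; the table and column names are assumptions):
const pgp = require("pg-promise")();

// Describe the target columns once; reused for every batch
const cs = new pgp.helpers.ColumnSet(["id", "campaign"], { table: "sent_emails" });

async function insertDataBatch(db, rows) {
    if (rows.length === 0) return;
    // helpers.insert builds one INSERT ... VALUES (...), (...), ... statement,
    // so the round-trip latency is paid only once
    await db.none(pgp.helpers.insert(rows, cs));
}

// Usage: collect all rows first, then make one round trip
async function sendAndInsert(db, payload) {
    const rows = payload.map((data) => ({ id: data.id, campaign: data.campaign }));
    await insertDataBatch(db, rows);
}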
On some page I have to get information from 8 different endpoints. 2 of them are outside of my application and sometimes they cause a delay in displaying data. The web browser waits until the data is processed. Since they're outside of my app I can't refactor them to make them fast, but I need to show the information they provide. In addition, sometimes one of them returns nothing; if so, I use default data to show to the user. The wait hurts the user experience.
I'm using promises to call these endpoints. Below is part of the code snippet that I am using.
The code is working fine. The issue is the delay.
First, here is the array that contains all the services that I need to process:
var requests = [{
    // 0
    url: urlLocalApi + '/endpointURL_1/',
    headers: {
        'headers': 'apitoken'
    },
}, {
    // 1
    url: urlLocalApi + '/endpointURL_2/',
    headers: {
        'headers': 'apitoken'
    },
}];
The creation of this array is encapsulated in this method:
const requests = homePageFunctions.createRequest();
Now, this is how the data is processed. I am using both 'request-promise' and 'bluebird', and a personal logger to check whether everything goes fine.
const Promise = require("bluebird");
const request = require('request-promise');
var viewsHelper = {
getPageData: function (requests) {
return Promise.map(requests, function (obj) {
return request(obj).then(function (body) {
AppLogger.log(`Endpoint parsed`, statusLogger.infodate);
return JSON.parse(body);
});
});
}
}
module.exports = viewsHelper;
How do I call this?
viewsHelper.getPageData(requests)
.then(results => {
var output = [];
for (var i = 0; i < results.length; i++) {
output.push(results[i]);
}
// render data
res.render('homepage/index', output);
AppLogger.log(`PageData is rendered`, statusLogger.infodate);
})
.catch(err => {
console.log(err);
});
};
Note that each index of the "output" array contains the data returned by the corresponding endpoint.
The problem here is:
If any of the endpoints takes long, the entire chain is delayed, even if the others have already been processed. The web page waits on a blank screen.
How to prevent this behavior?
That is an interesting question, but I have some questions in order to answer it effectively.
You have a Node server and a client (HTML/JS).
You have 8 endpoints; 2 are slow because you don't have control over them.
Is the client (page) aware of the 8 endpoints? I.e., do you make 8 calls every time you reload the page?
OR
Does the page make one request to your NodeJS, and your NodeJS synchronously calls the 8 endpoints?
If it is 1, then lazy loading will work easily for you, since the page is making the requests.
If it is 2, lazy loading will work only on the server side; however, the client will be blocked because it doesn't know (or care) how you load your data. The page made one request and is blocked waiting for that request.
Obviously each method has its pros and cons.
One way you can solve this is to asynchronously call those endpoints on Node and cache the results, so that when the page makes its one request you have the data ready (see the sketch below).
Again, we know very little about the situation; there are many ways to solve this.
Hope this helps.
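A minimal sketch of that pre-fetch-and-cache idea, reusing the question's requests array and request-promise, and assuming the same Express app; the refresh interval and the in-memory cache are assumptions:
var cache = {};              // endpoint index -> last parsed body
var REFRESH_MS = 60 * 1000;  // refresh every minute (tune to taste)

function refreshEndpoints() {
    requests.forEach(function (obj, i) {
        request(obj)
            .then(function (body) { cache[i] = JSON.parse(body); })
            .catch(function () {
                // keep the last known (or default) data on failure,
                // so a slow or empty endpoint never blocks the page
            });
    });
}

refreshEndpoints();
setInterval(refreshEndpoints, REFRESH_MS);

app.get('/', function (req, res) {
    // Served from memory: the slow endpoints are off the request path
    res.render('homepage/index', Object.values(cache));
});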