Node Promise.all() takes too long to execute - node.js

I have to insert into a table a row of data regarding each sent email, after the email is sent.
Inside a loop I'm filling an array of promises to be resolved by Promise.all().
insertData is a function that inserts data, given two arguments: connector, the connection pool, and dataToInsert, an object with the data to be inserted.
async function sendAndInsert(payload) {
  for (const data of payload) {
    const array = [];
    const dataToInsert = {
      id: data.id,
      campaign: data.campaign,
    }
    for (const email of data) {
      array.push(insertData(connector, dataToInsert));
    }
    await Promise.all(array);
  }
}
Afterwards, the function is invoked:
async function invoke() {
  await sendAndInsert(toInsertdata);
}
To insert 5000 records, it takes about 10 minutes, which is nuts.
Using:
nodejs v10
pg-txclient as the DB connector to PostgreSQL.
What I've done and can be discarded as a possible source of error:
Inserted random stuff into the table using the same connection.
I'm sure there is no issue with the DB server or connection.
The issue must be in the Promise.all(), await stuff.

It looks like each record is being inserted through a separate call to insertData. Each call incurs overhead such as network latency, and the 5000 requests are not all handled simultaneously: one call to insertData has to send its data to the database and wait for a response before the next call can even start sending. 5000 requests over 10 minutes corresponds to about 120 ms of latency per request, which is not unreasonable if the database is on another machine.
A better strategy is to insert all of the objects in one network request. Modify insertData to accept an array of objects to insert instead of just one at a time. Then all the data can be sent to the database at once and you only pay the latency a single time.
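As a sketch of that strategy, the rows can be packed into a single parameterized multi-row INSERT. The table and column names (sent_emails, id, campaign) and the pg-style pool.query call are assumptions for illustration:

```javascript
// Build one "INSERT ... VALUES ($1, $2), ($3, $4), ..." statement for all
// rows, so only a single round trip to the database is needed.
function buildBatchInsert(rows) {
  const params = [];
  const placeholders = rows.map((row, i) => {
    params.push(row.id, row.campaign);
    return `($${i * 2 + 1}, $${i * 2 + 2})`;
  });
  return {
    text: `INSERT INTO sent_emails (id, campaign) VALUES ${placeholders.join(', ')}`,
    params,
  };
}

// With a pg-style pool this becomes one query instead of 5000:
// const { text, params } = buildBatchInsert(allDataToInsert);
// await pool.query(text, params);
```

For very large batches you may still want to chunk the rows (e.g. 1000 at a time) to stay under parameter-count limits.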

Related

What is the efficient way to make API request for all the IDs of a table

I have events table with multiple records.
Columns - id, start_time, end_time ...
I have to fetch the analytics for all the live events (which can be thousands at a certain time) repeatedly via third party API calls, which can handle one event at a time. I have to do this repeatedly for each live event until that event ends. Let's say the minimum interval for fetching an event's analytics is every 15 minutes.
Third Party API calls need to be sequential.
I am open to use any tool e.g. Redis.
What are efficient ways to do this?
I need something like an LRU system with repetition, but I don't know exactly how to implement it.
One efficient way to achieve this would be to use an asynchronous control-flow helper such as the async library's mapSeries function in combination with the Redis SET command.
Here is an example of how you could use mapSeries and Redis to make API requests for all the matching IDs in a table:
const async = require("async");
const redis = require("redis");
const client = redis.createClient();

// function to get the ids that match the start_time and end_time filter
const IDS = getIdsFromTable();

async.mapSeries(IDS, (ID, callback) => {
  // Make API request with ID
  // const result = ...
  client.set(ID, JSON.stringify(result), function (err, reply) {
    if (err) {
      console.log(err);
    }
    console.log(reply);
    // Once the API request is complete and saved, call the callback
    // function to move on to the next ID
    callback(null, result);
  });
}, (err, results) => {
  // All API requests have completed and been saved in redis;
  // `results` is an array of all the responses
});
You can set a timeout of 15 minutes for each request so that if any request takes longer than expected it will not block execution, and you can handle any error. It is also important to consider data expiration in Redis: if you don't need the data after a certain time, set a time-to-live (TTL) on the keys.
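The timeout part can be sketched with a small Promise.race wrapper. The helper name and the 15-minute value are assumptions, not part of any library:

```javascript
// Reject a promise if it does not settle within `ms` milliseconds, so one
// slow API call cannot block the rest of the series indefinitely.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('request timed out')), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// e.g. withTimeout(makeApiRequest(ID), 15 * 60 * 1000)
```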

How should I go about using Redis for creating notifications with express/nodejs?

Okay so I have a Nodejs/Express app that has an endpoint which allows users to receive notifications by opening up a connection to said endpoint:
// this is a list of all the streams opened by pract users to the backend
var practitionerStreams = []
async function notificationEventsHandler(req, res) {
  const headers = {
    'Content-Type': 'text/event-stream',
    'Connection': 'keep-alive',
    'Cache-Control': 'no-cache'
  }
  const practEmail = req.headers.practemail
  console.log("PRACT EMAIL", practEmail)
  const data = await ApptNotificationData.findAll({
    where: {
      practEmail: practEmail
    }
  })
  //console.log("DATA", data)
  res.writeHead(200, headers)
  res.write(`data:${JSON.stringify(data)}\n\n`)
  // create a new stream
  const newPractStream = {
    practEmail: practEmail,
    res
  }
  // add the new stream to the list of streams
  practitionerStreams.push(newPractStream)
  req.on('close', () => {
    console.log(`${practEmail} Connection closed`);
    // keep every stream except the one that just closed
    practitionerStreams = practitionerStreams.filter(pract => pract.practEmail !== practEmail);
  });
  return res
}
async function sendApptNotification(newNotification, practEmail) {
  // iterate through the array and find the stream that contains the pract
  // email we want, then write the new notification to that stream
  var updatedPractitionerStream = practitionerStreams.map((stream) => {
    if (stream["practEmail"] == practEmail) {
      console.log("IF")
      stream.res.write(`data:${JSON.stringify(newNotification)}\n\n`)
      return stream
    } else {
      // if it doesn't contain the stream we want, leave it unchanged
      console.log("ELSE")
      return stream
    }
  })
  practitionerStreams = updatedPractitionerStream
}
Basically, when the user connects, the handler takes the response object (which will stay open), puts it in an object along with a unique email, and writes to it in the future in sendApptNotification.
But obviously this is slow for a full app; how exactly do I replace this with Redis? Would I still have a response object that I write to? Or would that be replaced with a Redis stream that I can subscribe to on the frontend? I also assume I would store all my streams in Redis as well.
edit: from the examples I've seen, people are writing events from Redis to the response object
Thank you in advance
If you want to use a Redis Stream as a notification system, you can follow this official guide:
https://redis.com/blog/how-to-create-notification-services-with-redis-websockets-and-vue-js/ .
To get this data in real time you need to create a websocket connection. I prefer to point you to an official guide rather than write it out here because of the quality of that guide. It is enough for anyone to understand how to build this, but you will need to adapt it to your own situation, of course.
However, as I said in the comments, I believe it is simpler to poll an API endpoint such as /api/v1/notifications with setInterval in your frontend code, making a request every 5 seconds, for example. If you prefer a real-time notification system, I think you should first understand why you need it, so that in the future you can change your system if necessary. Basically, it's a trade-off you must make!
For my example, imagine two tables in a relational database: one for Users and a second for Notifications.
The tables of this example:
UsersTable
id  name
1   andrew
2   mark
NotificationTable
id  message   userId  isRead
1   message1  1       true
2   message2  1       false
3   message3  2       false
The endpoint in this example returns all cached notifications that the user hasn't read. If the cache doesn't exist, it returns the data from the database, puts it in the cache, and returns it to the user; the next API call gets the result from the cache. There are some points left to complete in this example: the database query that fetches the notifications, the cache expiration configuration, and, importantly, keeping the cached notifications up to date. For that you need to create a middleware and trigger it from the parts of your code that create notifications for a user, updating both the database and the cache. But I think you can complete these points.
const redis = require('redis');
const redisClient = redis.createClient();

app.get('/notifications', async (request, response) => {
  const userId = request.user.id;
  const cacheResult = await redisClient.get(`user:${userId}:notifications`)
  if (cacheResult) return response.send(JSON.parse(cacheResult));
  const notifications = await getUserNotificationsFromDatabase(userId);
  redisClient.set(`user:${userId}:notifications`, JSON.stringify(notifications));
  response.send(notifications);
})
Besides that, there is another way: you can simply use only Redis, or only the database, to manage these notifications. A relational database with the correct indexes will return results as fast as you would expect; you only need to think about how many notifications you will accumulate.
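The cache-update middleware mentioned above can be sketched like this; `db` and `cache` are injected stand-ins, and their method names are assumptions, not a specific library's API:

```javascript
// Write the notification to the database, then drop the user's cached
// list so the next GET /notifications rebuilds it from fresh data.
async function addNotification(db, cache, userId, message) {
  await db.insertNotification(userId, message);
  await cache.del(`user:${userId}:notifications`);
}
```

Deleting the key (instead of rewriting it) keeps the invalidation logic trivial: the read path already knows how to repopulate the cache on a miss.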

Asynchronous processing of data in Expressjs

I've an Express route which receives some data, processes it, then inserts it into mongo (using mongoose).
This works well if I return a response after the following steps are done:
Receive request
Process the request data
Insert the processed data into Mongo
Return 204 response
But the client will be calling this API concurrently for millions of records. Hence the requirement is not to block the client while the data is processed, so I made a small change in the code:
Receive request
Return response immediately with 204
Process the requested data
Insert the processed data into Mongo
The above works fine for the first few requests (say the first 1000s); after that the client gets a socket exception: connection reset by peer. I guess it is because the server is blocking connections as ports are not freed, and at some point I notice my nodejs process throwing an Out of memory error.
Sample code is as follows:
async function enqueue(data) {
  // 1. Process the data
  // 2. Insert the data in mongo
}

async function expressController(request, response) {
  logger.info('received request')
  response.status(204).send()
  try {
    await enqueue(request.body)
  } catch (err) {
    throw new Error(err)
  }
}
Am I doing something wrong here?
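One hedged sketch of a fix, assuming the goal is to keep the fast 204 response: bound how many enqueue() jobs run at once, so accepted-but-unprocessed work cannot grow without limit and exhaust memory. The limit of 10 is an arbitrary example to tune:

```javascript
// A minimal concurrency limiter: at most `maxConcurrent` tasks run at a
// time; the rest wait in a FIFO queue.
function createLimiter(maxConcurrent) {
  let active = 0;
  const waiting = [];
  const next = () => {
    if (active >= maxConcurrent || waiting.length === 0) return;
    active++;
    const { task, resolve, reject } = waiting.shift();
    task().then(resolve, reject).finally(() => {
      active--;
      next();
    });
  };
  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
}

// In the controller: respond 204 immediately, but run enqueue() through
// the limiter so at most N jobs process concurrently.
// const limit = createLimiter(10);
// limit(() => enqueue(request.body)).catch((err) => logger.error(err));
```

Note the queue itself can still grow if producers outpace consumers; for millions of records a real broker (RabbitMQ, Kafka, BullMQ) with backpressure is the more robust answer.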

Nodejs - make requests wait on lazy init of an object

Let's say I have an application that returns exchange rates for today.
The service should read the data via REST, then save it in a cache and serve clients from that cache. I want the request to the 3rd party API to happen upon the first attempt to get today's rate (a kind of lazy init for every day).
Something like this:
(1) HTTP Request to get rate (from my app's client)
(2) if rate for today is available then return it
else
(3) read it from 3rd party service (via REST request)
(4) save in cache
(5) return from cache
The whole logic is written by means of promises, but there is a problem if I have millions of requests simultaneously at the very beginning of the day. In this case, while one request is in operations (3), (4) or (5) (which are organized as a promise chain), operations (1) and (2) for other requests can be handled by node in between.
E.g. while the first request is still waiting for the 3rd party API to respond and the cache is empty, a million other requests can each fire the same request to the same 3rd party API.
My thought is to chain operation (3) to some kind of object A with a promise (A.promise) inside that exposes its resolve function on A. All other requests would wait (not synchronously wait, of course) until the first request updates the cache and calls A.resolve(), which resolves A.promise.
But it looks a bit ugly, any idea of a better approach?
Update
I've got one solution; not sure whether it's node.js style:
function Deferred() {
  this.promise = false
  this.markInProgress = () => {
    this.promise = new Promise((res, rej) => {
      this.resolve = res;
      this.reject = rej;
    })
  }
  this.markDone = () => {
    this.resolve()
    this.promise = false
  }
  this.isInProgress = () => !!this.promise
}

let state = new Deferred();

function updateCurrencyRate() {
  return db.any(`select name from t_currency group by name`)
    .then((currencies) => {
      return getRateFromCbr()
        .then(res => Promise.all(
          currencies.map((currency) =>
            saveCurrency(
              currency.name,
              parseRate(res, currency.name)))));
    })
}

function loadCurrencyRateFroDate(date) {
  if (state.isInProgress()) {
    return state.promise
  } else {
    state.markInProgress();
    return updateCurrencyRate()
      .then(() => {
        state.markDone();
      })
  }
}

function getCurrencyRateForDate(date) {
  return getCurrencytRateFromDb(date)
    .then((rate) => {
      if (rate[0]) {
        return Promise.resolve(rate)
      } else {
        return loadCurrencyRateFroDate(date)
          .then(() => getCurrencytRateFromDb(date))
      }
    })
}
I would take a very simple queue, flush and fallback approach to this.
Implement a queueing mechanism (maybe with RabbitMQ) and route all your requests through the queue. This way you can hold off responding to requests when the cache expires.
Create an expirable cache layer (maybe a Redis cache) and expire your cache every day.
By default, route requests from the queue to get data from the cache. On receiving the data, if the cache has expired, hold the queue, get the data directly from the 3rd party, and update your cache and its expiry.
Flush your cache every day.
With queues you have better control over the traffic. You can also add the 3rd party API call as a fallback way to get data when your cache fails or anything else goes wrong.
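In-process, the question's "object A with a promise inside" idea boils down to caching the refresh promise itself, so concurrent callers share one 3rd-party request. This is a minimal sketch; `fetchRates` is a placeholder for the real REST call, and daily expiry is left out:

```javascript
// Single-flight cache: while a refresh is running, every caller gets the
// same in-flight promise instead of firing its own 3rd-party request.
function createRateCache(fetchRates) {
  let cached = null;    // today's rates, once loaded
  let inFlight = null;  // the refresh promise, while a load is running
  return async function getRates() {
    if (cached) return cached;
    if (!inFlight) {
      inFlight = fetchRates()
        .then((rates) => { cached = rates; return rates; })
        .finally(() => { inFlight = null; });
    }
    return inFlight;
  };
}
```

Resetting `cached` to null at the start of each day (or storing a date alongside it) restores the lazy-init-per-day behavior from the question.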

waiting for data to appear/change in DB

I am writing a REST api which has to provide a kind of real-time communication between users. Let's say I have a db.orders collection, and I have an api GET /order/{id}. This api should wait for some change in the order document; for example, it should return data only when order.status is ready. I know how to do long-polling, but I have no idea how to check for data to appear/change in the db. It would be easy if there were one app instance - then I could do this in memory, something like this:
var queue = []

// GET /order/{id}
function (req, res, next) {
  var data = getDataFromDb();
  if (data && data.status == 'ready') {
    res.send(data);
    return;
  }
  queue.push({ id: req.params.id, req: req, res: res, next: next });
}

// POST /order/{id}
function (req, res, next) {
  req.params.data.status = 'ready'
  saveToDb(req.params.data);
  var item = findInQueue(queue, req.params.id);
  if (item) item.res.send(req.params.data);
}
The first handler waits for the data to have status ready, and the second sets the status of the data to ready. It's just pseudocode and many things are missing (a timeout, for example).
The problem is when I want to use many instances of such an app - I need some messaging mechanism that allows the instances to communicate in near real time.
I read about Redis PUB/SUB but I am not sure whether I can use it in this way...
I am using node.js + restify + mongoDB for now.
You are looking for the oplog. It's a special capped collection where all the operations on the database are stored. To enable it on a single server you can do
mongod --dbpath=./data --oplogSize=100 --replSet test
then connect to the server using the mongo console and run
rs.initiate()
then, in the same console, do
use local
show collections
Notice the collection oplog.rs. It contains all the operations that have been applied to the server. If you are using node.js you can listen to the changes in the following way:
var local = db.db("local");
var stream = local.collection("oplog.rs").find({}, { tailable: true, awaitdata: true }).stream();
stream.on('data', function (doc) {
  // ...
});
For each operation on mongodb you'll receive a doc from which you can establish whether something you are interested in has changed state.
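A minimal sketch of that filtering step, assuming the orders live in a `mydb.orders` namespace (the database name is an assumption). Oplog documents carry `ns` (namespace), `op` ('i' for insert, 'u' for update) and `o` (the inserted document or the update spec):

```javascript
// Decide whether an oplog entry is the change a long-polling request is
// waiting for: an order being inserted with, or updated to, status 'ready'.
function isOrderReady(doc) {
  if (doc.ns !== 'mydb.orders') return false;
  if (doc.op === 'i') return doc.o.status === 'ready';
  if (doc.op === 'u') return !!(doc.o.$set && doc.o.$set.status === 'ready');
  return false;
}

// stream.on('data', (doc) => {
//   if (isOrderReady(doc)) { /* resolve the waiting request for this order */ }
// });
```

On modern MongoDB (3.6+) change streams (`collection.watch()`) are the supported way to get the same notifications without tailing oplog.rs directly.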
