Send batches to web API - node.js

I have a Node.js API built on Express with MongoDB. What this API basically does is store e-mail addresses and other information about users.
These records are called personas and are stored in a MongoDB database. What I'm trying to do now is call a URL in my app which sends all personas to the Mailchimp API.
However, since the number of stored personas is quite high (144,000), I can't send them to Mailchimp in one batch. I'm trying to send them in smaller batches, without much luck.
How would I go about setting this up? Currently I'm using the async package to limit the number of simultaneous requests to the Mailchimp API, but I'm not sure this is the right approach.
I suspect the code below fails because the personas array I collect is too big to fit in memory, but I'm not sure how to chunk it up correctly.
// This is a model function which queries the database for all personas
Persona.getAllSubscriptions(function (err, personas) {
    // Loop over the personas and send each one to Mailchimp
    var i = 1;
    // async.forEachLimit caps the number of simultaneous requests to Mailchimp
    async.forEachLimit(personas, 10, function (persona, callback) {
        // This is the helper that sends one item to Mailchimp
        mailchimpHelper.sendToMailchimp(persona, mailchimpMergefields, function (err, body) {
            if (err) {
                callback(err);
            } else if (!body) {
                callback(new Error("No response from Mailchimp"));
            } else {
                console.log(i);
                i++;
                callback();
            }
        });
    }, function (err) {
        if (err) console.log(err);
        // Set a success message
        res.json({error: false, message: "All personas updated"});
    });
});
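(For reference, if the personas array does fit in memory and only the sending needs batching, a minimal sketch of slicing it into fixed-size chunks might look like the following; chunkSize and the sendOnePersona wrapper are hypothetical, and the answers below cover the case where the array doesn't fit.)

// Hypothetical chunking sketch: process the personas in fixed-size slices.
var chunkSize = 500; // check Mailchimp's API limits for a sensible size
var chunks = [];
for (var j = 0; j < personas.length; j += chunkSize) {
    chunks.push(personas.slice(j, j + chunkSize));
}
// Handle one chunk at a time, with at most 10 parallel requests inside a chunk.
async.eachSeries(chunks, function (chunk, done) {
    async.forEachLimit(chunk, 10, sendOnePersona, done); // sendOnePersona wraps mailchimpHelper.sendToMailchimp
}, function (err) {
    if (err) console.log(err);
    res.json({error: false, message: "All personas updated"});
});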

I ran into a similar problem with a query to a collection that could return more than 170,000 documents. I ended up using the "stream" API to build batches to be processed. You could do something similar to "build" batches to send to MailChimp.
Here's an example.
var stream = db.collection.find().stream(); // be sure find() is returning a cursor
var batch = [];
var maxBatchSize = 500; // pick a size within Mailchimp's limits

stream.on('data', function (data) {
    batch.push(data);
    if (batch.length >= maxBatchSize) {
        stream.pause(); // triggers the 'pause' handler below
    }
});

stream.on('pause', function () {
    // send the current batch to MailChimp; when MailChimp has finished,
    // empty the batch and resume the stream.
    // sendBatchToMailchimp is a placeholder for your own upload function.
    sendBatchToMailchimp(batch, function (err) {
        if (err) console.log(err);
        batch = [];
        stream.resume();
    });
});

stream.on('end', function () {
    // data finished; send any remaining partial batch here
});
You can look at the documentation for cursor and stream here
Hope this helps.
Cheers.
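As a side note, newer versions of the MongoDB Node.js driver expose cursors as async iterables, which makes the same batching pattern more direct (a sketch, assuming an async sendBatchToMailchimp helper like the one above):

// Sketch using the driver's async-iterable cursor (Node 10+).
async function syncAllPersonas(collection) {
    const cursor = collection.find();
    let batch = [];
    for await (const doc of cursor) {
        batch.push(doc);
        if (batch.length >= 500) {             // hypothetical batch size
            await sendBatchToMailchimp(batch); // placeholder upload helper
            batch = [];
        }
    }
    if (batch.length > 0) {
        await sendBatchToMailchimp(batch);     // flush the final partial batch
    }
}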

There are a few things here I would do differently. You are trying to run quite heavy processing inside the Node server, and triggering it by URL can cause you a lot of problems if you don't secure that endpoint.
Also, a heavy job like this is better implemented as a queue-worker setup, separate from the web server. That gives you more control over the process: some of the sends may fail, or errors may occur on the Mailchimp side (the API is down, etc.). So instead of sending directly from the request handler, have the endpoint enqueue a job and let a worker process the emails in chunks, as #jackfrster described (see the sketch below).
Make sure you have checked the Mailchimp API limits. Have you considered alternatives such as creating a campaign and sending that out, so you would not need a separate request for each person on the list?
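A minimal sketch of that queue-worker split, using Bull as one possible queue library (Redis-backed; the Express app, route, batch size, and the sendBatchToMailchimp helper are all assumptions, not part of the question):

const Queue = require('bull');
const syncQueue = new Queue('mailchimp-sync', 'redis://127.0.0.1:6379');

// Web server: the route only enqueues the job and returns immediately.
app.post('/sync-personas', (req, res) => {
    syncQueue.add({}, { attempts: 3, backoff: 5000 }); // retry on Mailchimp errors
    res.json({ error: false, message: 'Sync job queued' });
});

// Worker: does the heavy lifting in batches, off the request path.
syncQueue.process(async (job) => {
    const cursor = Persona.find().cursor(); // assuming a Mongoose model
    let batch = [];
    for await (const doc of cursor) {
        batch.push(doc);
        if (batch.length >= 500) {
            await sendBatchToMailchimp(batch); // placeholder upload helper
            batch = [];
        }
    }
    if (batch.length) await sendBatchToMailchimp(batch);
});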

Related

Websocket vs SSE to implement a real time friend invitation system on React/Node

I would like to implement a system that allows users to add each other as friends and share data between them. I have gotten the authentication done and am currently researching ways to do this in real time. This project of mine is purely a learning experience, so I am looking at many ways to perform this task to grow my knowledge.
I have experience using WebSockets on a previous project and they were easy to use. WebSockets seem like the best solution to my problem, as they allow the user to send and receive invites through the open socket. However, I have also learnt that the downside would be a long-lived open socket connection that might potentially be performance-taxing(?). Since I'm only sending/receiving information when an invite is sent/received, WebSockets might be overkill for such a simple feature.
At the same time I would like to learn about new technologies, and I found out about Server-Sent Events, which would be less performance-heavy(?). Using SSE would be more efficient, as it only sends data over HTTP whenever the user sends an invite.
Please correct me if I'm wrong about anything I typed out above, as this is what I gathered from my reading online. So now I'm having a hard time understanding whether SSE is better than WebSockets for my project. If there are other technologies, please do let me know too! Thank you
How are you doing?
The usual advice in this context would be to use WebSockets, because your project can grow and need features that are better served by them.
But you have other options, and one of them is Firebase. Yes, FIREBASE!
You can build a nice reactive application with Firebase, because its observers update data in real time, just like WebSockets do.
But here are some pros and cons.
WebSockets: can scale with your project and are more complete; you can use them in any context. BUT: they are harder to implement and take more time to learn and understand.
Firebase: easy and fast to implement; you can build a chat in 20 minutes, and it would surely help with your problem. There is Firestore and the Realtime Database, and even Firestore updates in real time. BUT: Firebase costs in a big project can be expensive, so I don't think it is a good option for a big project.
That's it, the better options to me for building a real-time data application.
A little bit more about Firebase vs WebSockets:
https://ably.com/compare/firebase-vs-socketio
To send a friend invitation, you just send an API request; WebSockets are used for real-time communication. From the React side, get the email and send it to the server:
export const sendFriendInvitation = async (data) => {
    try {
        return axios.post("/friend-invitation", data);
    } catch (error) {
        console.error(error);
    }
};
On the Node.js side, write a controller to handle this request:
const invitationRequest = async (req, res) => {
    // get the email
    const { targetMail } = req.body;
    // write code to check that a person is not sending a request to themselves
    // look up the user being invited by their email
    const targetUser = await User.findOne({
        mail: targetMail.toLowerCase(),
    });
    if (!targetUser) {
        return res
            .status(404)
            .send("send error message");
    }
    // you should have an Invitations model
    // check if the invitation has already been sent
    // check if the user we would like to invite is already our friend
    // now create a new invitation
    // if the invitation has been successfully created, update the user's friends
    return res.status(201).send("Invitation has been sent");
};
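Since the question also asks about SSE: for completeness, a minimal sketch of an SSE endpoint in Express that could push invitation events to a connected client (the inviteEmitter EventEmitter, route, and event names are hypothetical):

const EventEmitter = require('events');
const inviteEmitter = new EventEmitter(); // emit('invite', payload) when an invitation is created

app.get('/events', (req, res) => {
    // headers that keep the HTTP response open as an event stream
    res.set({
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        Connection: 'keep-alive',
    });
    res.flushHeaders();

    const onInvite = (invite) => {
        res.write(`event: friend-invite\ndata: ${JSON.stringify(invite)}\n\n`);
    };
    inviteEmitter.on('invite', onInvite);

    // stop writing when the client disconnects
    req.on('close', () => inviteEmitter.off('invite', onInvite));
});

On the client, new EventSource('/events') with addEventListener('friend-invite', ...) would receive these pushes.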

Nodejs how to separate multiple "multipart/form-data" POST requests

In Nodejs I have developed a small Client application that sends multiple “multipart/form-data” to my Server application using POST requests.
Each form to be sent is composed of a file (loaded from the Client hard disk) and a string. Basically I have the following situation:
Form 1: (File 1, String 1)
Form 2: (File 2, String 2)
Form 3: (File 3, String 3)
Etc..
To make the POST requests I’m using the “form-data” library ( https://www.npmjs.com/package/form-data ).
The problem that I’m facing is that all the POST requests are sent after the end of the execution of my Client application, but I would like to be able to send each POST request separately.
Here is part of the code that I'm using:
function FormSubmit(item)
{
    var FileStream = fs.createReadStream(item.path);
    // Create an "Upload" form and set all form parameters.
    let form = new FormData();
    form.append('Text1', 'test');
    form.append('file', FileStream);
    // Form submit.
    form.submit('http://localhost:5000/upload', function(err, res) {
        if (err) {
            console.log(err);
        }
        if (res != undefined)
            res.resume();
        else
            console.log('Res undefined: ', res);
    });
}
I'm calling the "FormSubmit" function multiple times, and I was expecting the Server application to receive a POST request each time "form.submit" executes, but in reality I receive all the POST requests together after the entire application execution finishes.
In particular, the Server receives the requests on the command "self.emit('connect');" inside the "afterConnect" function in the "net.js" file in the core module.
It seems it has nothing to do with timing, because even if I put a breakpoint and wait for some minutes after the first execution of the "FormSubmit" function, I don't receive anything on the Server application.
It is probably not related to the "form-data" library, because I get the same behaviour using "request", etc.
I guess it is something related to Node.js itself or to how I wrote the Client application.
I am new to Node.js, so any help/advice would be appreciated.
Thanks.
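For reference, one way to make each upload an explicit, awaitable step is to wrap form.submit in a Promise and submit the forms sequentially (a sketch based on the question's code; whether this resolves the queuing behaviour depends on the rest of the client):

// Sketch: wrap form.submit so each upload can be awaited in turn.
function formSubmitAsync(item) {
    return new Promise((resolve, reject) => {
        const form = new FormData();
        form.append('Text1', 'test');
        form.append('file', fs.createReadStream(item.path));
        form.submit('http://localhost:5000/upload', (err, res) => {
            if (err) return reject(err);
            res.resume(); // drain the response so the socket is freed
            res.on('end', resolve);
        });
    });
}

// Submit one form at a time instead of queueing them all at once.
async function submitAll(items) {
    for (const item of items) {
        await formSubmitAsync(item);
        console.log('Uploaded', item.path);
    }
}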

How to trigger background-processes, how to receive intermediate results?

I have a NodeJS background-process issue that I don't know how to solve in an 'elegant', straightforward, correct way.
The user submits some (~10 or more) URLs via a textarea, and then they should be processed asynchronously. [A screenshot has to be taken with puppeteer, some information gathered, the screenshot processed with sharp, and the result persisted in MongoDB: the screenshot via GridFS and the URL in its own collection with a reference to the screenshot.]
While this async process runs in the background, the page should be updated whenever a URL has been processed.
There are so many ways to do this, but which one is the most correct/straightforward/resource-saving way?
Browserify it and do everything in the browser? No, too much stuff on the client side. AJAX/Axios posts that wait for the URLs to be processed and then reflect the results on the page? Trigger the processing before the response is sent back to the client, or let the client start it?
So, I made a workflow engine of sorts that supports long-running jobs, following this tutorial: https://farazdagi.com/2014/rest-and-long-running-jobs/
The idea is simple: when a request comes in, you immediately return a status code and a job id, and when the job completes you log its result somewhere and serve it from a status endpoint.
For this I used an EventEmitter inside a promise. It's only my solution, maybe not elegant, maybe outright wrong. I made a little POC for you.
const express = require('express');
const events = require('events');

const app = express();
const emitter = new events.EventEmitter();

// Stand-in for the real job (puppeteer, sharp, MongoDB, ...).
const actualWork = function () {
    return new Promise((res, rej) => {
        setTimeout(res, 1000);
    });
};

emitter.on('workCompleted', function (job) {
    // log the completed job somewhere, e.g. keyed by job.id
});

app.get('/someroute', (req, res) => {
    res.json({msg: 'request initiated', id: 'some_id'});
    actualWork()
        .then(() => {
            emitter.emit('workCompleted', {id: 'some_id'});
        });
});

app.get('/someroute/:id/status', (req, res) => {
    // look up the logged result for req.params.id and return it
});
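To make the status endpoint concrete, one option for a POC is to keep job state in an in-memory Map (route names and the use of crypto.randomUUID are illustrative; a real setup would persist job state):

const crypto = require('crypto');
const jobs = new Map(); // jobId -> { status, result }

app.post('/screenshots', (req, res) => {
    const id = crypto.randomUUID();
    jobs.set(id, { status: 'pending', result: null });
    actualWork()
        .then((result) => jobs.set(id, { status: 'done', result }))
        .catch((err) => jobs.set(id, { status: 'failed', result: err.message }));
    // 202 Accepted: the work continues after the response is sent
    res.status(202).json({ id });
});

app.get('/screenshots/:id/status', (req, res) => {
    const job = jobs.get(req.params.id);
    if (!job) return res.status(404).json({ error: 'unknown job id' });
    res.json(job);
});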

Node JS & Socket.io Advice - Is it better to post information to a database through a route, or through the socket?

I am currently building a new application which, at a basic level, lets users add a task which needs to be completed, and then lets a different group of users pick up the task and complete it.
Previously, I took over building a real-time chat application written with NodeJS and Socket.io and on that, all the messages were posted to a database over the socket connection.
In the current application I am doing the same, but I was wondering whether it might be better to post the information to the database via a route instead, then emit the socket event on success to update the list of available tasks.
I was just looking for advice, how would you guys do this? Commit info to the database through a route or over the socket?
If you go the route way, things are pretty much laid out for you: you can be sure whether or not your update worked. A socket, by default, guarantees neither success nor failure.
But you can program it to. For example:
client: send data
socket.emit('update', data); // send data
server: receive data & send back an update as to whether the operation was successful or not
socket.on('update', function(data){
    findOrUpdateOrWhateverAsync(function(err){
        if (!err) socket.emit('update', null);   // send back a "null" on success
        else socket.emit('update', err.message); // or send back the error message
    });
});
client: receive update on error/success
socket.on('update', function(err){
    if (err) alert(err);
});
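Worth noting: Socket.IO also supports acknowledgements out of the box, which express the same request/response round trip without a second 'update' event (a sketch with placeholder names):

// client: pass a callback as the last argument to emit
socket.emit('update', data, function (err) {
    if (err) alert(err);
});

// server: the ack callback arrives as the handler's last argument
socket.on('update', function (data, ack) {
    findOrUpdateOrWhateverAsync(function (err) {
        ack(err ? err.message : null);
    });
});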

node.js wait for response

I have very limited knowledge about Node and non-blocking I/O, so forgive me if my question is too naive.
In order to return needed information in response body, I need to
Make a call to 3rd party API
Wait for response
Add some modifications and return JSON response with the information I got from API.
My question is: how can I wait for the response? Or is it possible to send the information to the client only once I have received the response from the API? (As far as I know, the connection would have to be bidirectional in that case, which means I wouldn't be able to do it over HTTP.)
And yet another question: if one request is waiting for a response from the API, does this mean other users will be forced to wait too (since Node is single-threaded), until I increase the number of threads/processes from 1 to N?
You pass a callback to the function which calls the service. If the service is a database, for example:
db.connect(host, callback);
And somewhere else in the code:
var callback = function(err, dbObject) {
    // The connection was made; it's safe to handle the result here
    console.log(dbObject.status);
    res.status(200).json(jsonObject);
};
Or you can use an anonymous function:
db.connect(host, function(err, dbObject) {
    // The connection was made; it's safe to handle the result here
    console.log(dbObject.status);
    res.status(200).json(jsonObject);
});
Between the call and the callback, node handles other clients / connections freely, "non-blocking".
This type of situation is exactly what node was designed to solve. Once you receive the request from your client, you can make a http request, which should take a callback parameter. This will call your callback function when the request is done, but node can do other work (including serving other clients) while you are waiting for the response. Once the request is done, you can have your code return the response to the client that is still waiting.
The amount of memory and CPU used by the node process will increase as additional clients connect to it, but only one process is needed to handle many simultaneous clients.
Node focuses on doing slow I/O asynchronously, so that the application code can start a task, and then have code start executing again after the I/O has completed.
A typical example might make this clear. We make a call to the FB API; when we get a response, we modify it and then send JSON to the user.
var express = require('express');
var fb = require('facebook-js');
var app = express();

app.get('/user', function(req, res){
    // access the FB API (access_token is assumed to be defined elsewhere)
    fb.apiCall('GET', '/me/', {access_token: access_token}, function(error, response, body){
        // when FB responds, this part of the code will execute
        if (error){
            throw new Error('Error getting user information');
        }
        body.platform = 'Facebook'; // modify the Facebook response, available as JSON in body
        res.json(body); // send the response to the client
    });
});
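The same flow reads more linearly with async/await; here's a sketch using axios against a made-up third-party endpoint (the URL and field names are illustrative, and access_token is assumed to be defined elsewhere):

const axios = require('axios');

app.get('/user', async (req, res) => {
    try {
        // execution pauses here without blocking other requests
        const response = await axios.get('https://api.example.com/me', {
            params: { access_token }, // assumed defined elsewhere
        });
        const body = response.data;
        body.platform = 'Example';   // modify the API response
        res.json(body);              // then send it to the client
    } catch (err) {
        res.status(502).json({ error: 'Error getting user information' });
    }
});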
