Node js async queue and multi-thread - node.js

I'm building an app with Node.js that must take a request from a user, read from MongoDB, and update that user's data before handling another request for the same user. I was able to do this using async.queue with this code:
// Create a queue with a concurrency of 1
var tasksQueue = async.queue(function (userInfo, callback) {
    User.verifyAndUpdateAccount(userInfo, callback);
}, 1);
router.post('/', function (req, res, next) {
    res.setHeader('Content-Type', 'application/json');
    // Are the required fields present?
    if (!req.body.username || !req.body.password) {
        return res.send(null);
    }
    var username = req.body.username;
    var password = req.body.password;
    // Verify the account information and update it
    tasksQueue.push({username: username, password: password}, function (err, updatedUser) {
        if (err == 'NO_MORE_REQUESTS_ALLOW') {
            // The user has no SMS left
            return res.send({
                code: 400,
                message: 'NO_MORE_REQUESTS_ALLOW'
            });
        } else if (err) return next(err); // hand unexpected errors to Express instead of throwing
        //**********************************************************
        // Build the response here and respond to user requests
    });
});
The problem is that it doesn't work well when I use the cluster module, because each worker has its own queue and knows nothing about the others. I would like my app to work like this:
Imagine that user1 makes 2 requests and then user2 makes 1 request. Suppose I have 4 cores. I would like the first worker to start handling user1's first request, and every worker to wait until that request has been handled before handling user1's second request. The other workers, however, must be able to handle user2's request before user1's first request is finished, because a request only updates the data of a single user in MongoDB and doesn't affect other users, so there is no need to wait.

You may be able to refine the code so that response processing is fast enough. If you do need a non-blocking way to run heavy computation, you can try 'napajs', a module released by Microsoft that works with Node.js to enable multithreaded JavaScript scenarios in the same process. Here is a quick introduction for your reference.
Your code would then look like:
var napa = require('napajs');
// One-time setup.
// You can change the number of workers to suit your requirements.
var zone = napa.zone.create('response-builders', { workers: 4 });
function serveRequest() {
    var request = null;
    // Get request from queue.
    // ...
    zone.execute((request) => {
        // Build the response here
        var response = null;
        // ...
        return response;
    }, [request]).then((result) => {
        // Respond to user
        console.log(result.value);
    });
}
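The napajs suggestion addresses heavy computation rather than ordering. As a separate, minimal sketch of the per-user ordering the question asks for, the code below keeps one promise chain per username, so requests for the same user run one after another while different users proceed concurrently. The names `runExclusive` and `chains` are hypothetical, and the sketch only covers a single process: with the cluster module each worker would still hold its own map, so requests for a given user would additionally need to reach the same worker or rely on a lock shared between workers.

```javascript
// Hypothetical helper, not part of async or Express.
// Maps a key (for example a username) to the tail of that key's promise chain.
const chains = new Map();

function runExclusive(key, task) {
    // Tails never reject (see below), so a failed task does not block later ones.
    const previous = chains.get(key) || Promise.resolve();
    const current = previous.then(() => task());
    const tail = current
        .catch(() => {})
        .then(() => {
            // Drop the entry once no newer task has been queued behind this one.
            if (chains.get(key) === tail) {
                chains.delete(key);
            }
        });
    chains.set(key, tail);
    // The caller receives the task's own result or error.
    return current;
}
```

In the route handler this might be called as `runExclusive(username, () => verifyAndUpdate(userInfo))`, where `verifyAndUpdate` stands for an assumed promise-returning version of `User.verifyAndUpdateAccount`.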

Related

Express hold a request until last request is finished

So I'm writing an application in Node.js + Express with which I want to achieve the following:
A user POSTs many requests at nearly the same time (for example with curl ... &, where & runs the command in the background).
Requests are processed one at a time, and the others are held until the current one is finished. The order can be determined by arrival time, with ties broken randomly. So if I POST 5 requests to add things to the database at nearly the same time, the first request is added to the database first while the other requests are held (not responding with anything yet) until the first request has been processed and answered with a 200 code; then processing moves on to the second request.
Is it possible to achieve this with Express, so that when I send several requests at once, I don't run into issues such as something not being added to MongoDB properly?
You can set up middleware before and after your routes to queue requests while one is in progress and dequeue them afterwards. As people have mentioned, this is not really best practice, but it is a way to do it within a single process (it will not work for serverless models).
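const queue = [];
// Must be `let`: it is reassigned for every request.
let inprogress = null;
app.use((req, res, next) => {
    if (inprogress) {
        // Another request is being handled: hold this one.
        queue.push({req, res, next});
    } else {
        inprogress = res;
        // Nothing in progress: continue to the route right away.
        next();
    }
});
app.get('/your-route', (req, res, next) => {
    // run your code
    res.json({ some: 'payload' });
    next();
});
app.use((req, res, next) => {
    inprogress = null;
    if (queue.length > 0) {
        const queued = queue.shift();
        inprogress = queued.res;
        queued.next();
    }
    // Only pass the request on if no route has answered it yet.
    if (!res.headersSent) {
        next();
    }
});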
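A variation on the same idea, as a minimal sketch: a single middleware that releases the next held request when the current response emits `finish` or `close`, so no second middleware after the routes is needed. `serializeRequests` is a hypothetical name; the sketch assumes a single process and does not handle clients that disconnect while they are still waiting in line.

```javascript
// Hypothetical middleware factory: lets one request through at a time.
function serializeRequests() {
    const waiting = [];   // held requests, in arrival order
    let busy = false;     // true while a request is being handled

    function start(entry) {
        let released = false;
        const release = () => {
            if (released) return;   // 'finish' and 'close' may both fire
            released = true;
            const nextEntry = waiting.shift();
            if (nextEntry) {
                start(nextEntry);
            } else {
                busy = false;
            }
        };
        entry.res.once('finish', release);
        entry.res.once('close', release);
        entry.next();
    }

    return function (req, res, next) {
        if (busy) {
            waiting.push({ res, next });
        } else {
            busy = true;
            start({ res, next });
        }
    };
}
```

It would be registered before the routes with `app.use(serializeRequests());`.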

Send 'Received post' back to requester before async finishes (NodeJS, ExpressJS)

I have an API POST route where I receive data from a client and upload the data to another service. The upload is done inside the POST request handler (asynchronously) and takes a while. The client wants to know that their POST request was received before the asynchronous createProject function finishes. How can I send a message without ending the POST? (res.send ends the response, and res.write doesn't send it out.)
I thought about making an HTTP request back to their server as soon as this POST route is hit...
app.post('/v0/projects', function postProjects(req, res, next) {
    console.log('POST notice to me');
    // *** HERE, I want to send client message
    // This is the async function
    createProject(req.body, function (projectResponse) {
        projectResponse.on('data', function (data) {
            parseString(data.toString('ascii'), function (err, result) {
                res.message = result;
            });
        });
        projectResponse.on('end', function () {
            if (res.message.error) {
                console.log('MY ERROR: ' + JSON.stringify(res.message.error));
                next(new Error(res));
            } else {
                // *** HERE is where they finally receive a message
                res.status(200).send(res.message);
            }
        });
        projectResponse.on('error', function (err) {
            res.status(500).send(err.message);
        });
    });
});
The internal system requires that createProject is called within the POST request (the project needs to exist and have something uploaded, or else it doesn't exist); otherwise I'd call it later.
Thank you!
I don't think you can send a first response saying that the POST request was received and then send another one when the internal job (createProject) has finished, whether it succeeded or failed.
But you can try the following, assuming:
createProject(payload, callback); // async: reports completion through callback and pushes payload.jobId into jobsDone
Possibility 1, if the actual job response is not required:
app.post('/v0/projects', function (req, res, next) {
    // call any async job(s) here
    createProject(req.body);
    // The response is complete after this, so next() is not called.
    res.send('Hey Client! I have received post request, stay tuned!');
});
Possibility 2, if the actual job response is required, try maintaining a queue:
var q = []; // try possibility 3 if this is not making sense
var jobsDone = []; // this will be updated by the `createProject` callback
app.post('/v0/projects', function (req, res, next) {
    // call async job and push it to queue
    // randomId() generates a random but unique id for each request received;
    // the variable needs a different name, or it would shadow the function
    let jobId = randomId();
    q.push({jobId: jobId});
    req.body.jobId = jobId;
    createProject(req.body);
    // Return the id so that the client can ask for the status later.
    res.send({message: 'Hey Client! I have received post request, stay tuned!', jobId: jobId});
});
// hit this API after some time to find out whether the job is done
app.get('/v0/status/:jobId', function (req, res, next) {
    // check if job is done
    // depending on the result, remove it from **q**, retry, or do whatever is needed
    let result = jobsDone.indexOf(req.params.jobId) > -1 ? 'Done' : 'Still Processing';
    res.send(result);
});
Possibility 3: Redis can be used instead of the in-memory queue in possibility 2.
P.S. There are other ways to achieve the desired result, but the ones above are workable options.
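Possibility 2 could be sketched with a small in-memory job registry, as below. The names `startJob` and `getJobStatus` are hypothetical, `work` stands for an assumed promise-returning version of `createProject`, and the registry lives in one process only, so it is lost on restart (which is where possibility 3 comes in). `crypto.randomUUID` requires a reasonably recent Node.js version.

```javascript
const crypto = require('crypto');

// jobId -> { status: 'processing' | 'done' | 'failed', result?, error? }
const jobs = new Map();

// Starts `work` in the background and returns an id the client can poll with.
function startJob(work) {
    const jobId = crypto.randomUUID();
    jobs.set(jobId, { status: 'processing' });
    Promise.resolve()
        .then(work)
        .then(
            (result) => jobs.set(jobId, { status: 'done', result }),
            (error) => jobs.set(jobId, { status: 'failed', error: String(error) })
        );
    return jobId;
}

// Returns the job record, or null for an unknown id.
function getJobStatus(jobId) {
    return jobs.get(jobId) || null;
}
```

The POST handler could then answer right away, for instance with a 202 Accepted status and the id returned by `startJob`, while the status route returns `getJobStatus(req.params.jobId)`.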

Nodejs: Async request with a list of URL

I am working on a crawler. I have a list of URLs that need to be requested. Unless I run them one after another, several hundred requests are issued at the same time. I am afraid that this would saturate my bandwidth or put too much load on the target website. What should I do?
Here is what I am doing:
urlList.forEach((url, index) => {
    console.log('Fetching ' + url);
    request(url, function(error, response, body) {
        // do something with body
    });
});
I want each request to start only after the previous one has completed.
You can use a Promise library such as Bluebird, for example:
const Promise = require("bluebird");
const axios = require("axios");
// Axios wrapper for error handling: always resolves, never rejects
const axios_wrapper = (options) => {
    // Pass the options object directly; it cannot be spread into arguments
    return axios(options)
        .then((r) => {
            return Promise.resolve({
                data: r.data,
                error: null,
            });
        })
        .catch((e) => {
            return Promise.resolve({
                data: null,
                error: e.response ? e.response.data : e,
            });
        });
};
Promise.map(
    urlList, // the array of URLs from the question
    (k) => {
        return axios_wrapper({
            method: "GET",
            url: k,
        });
    },
    { concurrency: 1 } // how many requests may run in parallel
)
    .then((r) => {
        console.log(r);
        // r is an array of objects like {data: [{}], error: null}: data is set if the request succeeded, otherwise error is non-null
    })
    .catch((e) => {
        console.error(e);
    });
The things you need to watch for are:
Whether the target site has rate limiting, so that you may be blocked if you request too much too fast.
How many simultaneous requests the target site can handle without degrading its performance.
How much bandwidth your server has on its end.
How many simultaneous requests your own server can have in flight and process without causing excess memory usage or a pegged CPU.
In general, the scheme for managing all this is to create a way to tune how many requests you launch. There are many different ways to control this by number of simultaneous requests, number of requests per second, amount of data used, etc...
The simplest way to start would be to just control how many simultaneous requests you make. That can be done like this:
function runRequests(arrayOfData, maxInFlight, fn) {
    return new Promise((resolve, reject) => {
        let index = 0;
        let inFlight = 0;
        function next() {
            while (inFlight < maxInFlight && index < arrayOfData.length) {
                ++inFlight;
                fn(arrayOfData[index++]).then(result => {
                    --inFlight;
                    next();
                }).catch(err => {
                    --inFlight;
                    console.log(err);
                    // purposely eat the error and let the rest of the processing continue
                    // if you want to stop further processing, you can call reject() here
                    next();
                });
            }
            if (inFlight === 0) {
                // all done
                resolve();
            }
        }
        next();
    });
}
And, then you would use that like this:
const rp = require('request-promise');
// run the whole urlList, no more than 10 at a time
runRequests(urlList, 10, function(url) {
    return rp(url).then(function(data) {
        // process fetched data here for one url
    }).catch(function(err) {
        console.log(url, err);
    });
}).then(function() {
    // all requests done here
});
This can be made as sophisticated as you want by adding a time element to it (no more than N requests per second) or even a bandwidth element to it.
Regarding wanting each request to start only after the previous one has completed:
That's a very slow way to do things. If you really want that, then you can just pass a 1 for the maxInFlight parameter to the above function, but typically, things would work a lot faster and not cause problems by allowing somewhere between 5 and 50 simultaneous requests. Only testing would tell you where the sweet spot is for your particular target sites and your particular server infrastructure and amount of processing you need to do on the results.
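The time element mentioned above (no more than N requests per second) could be sketched as follows, assuming requests run one at a time with a minimum interval between their start times. `runSpaced` and `sleep` are hypothetical helpers, and `fn` is any promise-returning function, such as the `rp(url)` call above.

```javascript
// Resolves after roughly `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls fn(item) for each item in order, one at a time, leaving at least
// minIntervalMs between the start of one call and the start of the next.
// Failures are recorded instead of aborting the run.
async function runSpaced(items, minIntervalMs, fn) {
    const results = [];
    let lastStart = 0;
    for (const item of items) {
        const wait = lastStart + minIntervalMs - Date.now();
        if (wait > 0) {
            await sleep(wait);
        }
        lastStart = Date.now();
        try {
            results.push({ item, value: await fn(item) });
        } catch (error) {
            results.push({ item, error });
        }
    }
    return results;
}
```

For example, `runSpaced(urlList, 200, (url) => rp(url))` would start at most about five requests per second.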
You can also use setTimeout to space out the requests within the loop. For that, you need to know the maximum time a request takes to process.

effective way of sending the body as a callback

In my app.js:
var employees = require('../models/employees');
employees.read(req.params.id, function(body) {
    console.log(body.firstName);
});
In my models/employees:
var request = require('request');
var employees = {
    read: function(id, callback) {
        request
            .get('http://api.mysite.com/employees/' + id, function(error, response, body) {
                body = JSON.parse(body);
                return callback(body);
            })
    },
};
module.exports = employees;
This works (it returns the employee name correctly), but I'm not sure if this is the correct (async) way of getting data from an API and displaying it.
thank you!
I/O in Node.js is asynchronous by default, so you don't have to 'make' it work in an async manner.
For future use, though, once you have more requests, there may be times when you have to wait for one request to finish before you can fire off the next, i.e. run tasks sequentially. In that case you can use something like http://caolan.github.io/async/ and queue the function calls in a waterfall/series model.
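As a minimal sketch of the series model without an extra library, assuming each task is a function that returns a promise, `async`/`await` can run tasks one after another. `runInSeries` is a hypothetical helper name.

```javascript
// Runs promise-returning task functions one at a time, in order.
// Stops at the first failure and rejects with that error.
async function runInSeries(tasks) {
    const results = [];
    for (const task of tasks) {
        // The next task is not started until this one has settled.
        results.push(await task());
    }
    return results;
}
```

Each entry in `tasks` would wrap one request, so that the second request is only issued once the first has returned.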

Sending different POST response in NodeJS

I would like some help with the following problem. I'm writing my BSc thesis, and this small piece of code is responsible for registering a user. (I'm actually new to Node.js.) I'm using Express and Mongoose for this too.
I would like to process the request data and check for some errors: first, whether all fields exist, and second, whether someone has already registered with this e-mail address.
Based on the errors (or on success), I would like to send different responses: 400 Bad Request if a field is missing, 409 Conflict if the user exists, and 200 OK if everything is fine. I would only like to call the callback if there are no errors, but I'm kind of stuck here... I get the error Can't set headers after they are sent, which makes sense, because JS continues processing the code even after a response has been sent.
app.post('/register', function (req, res) {
    var user = new User(req.body);
    checkErrors(req, res, user, registerUser);
});
var registerUser = function(req, res, user){
    user.save(function(err, user){
        if (err) return console.log(err);
    });
    res.sendStatus(200);
};
var checkErrors = function(req, res, user, callback){
    var properties = [ 'firstName', 'lastName', 'email', 'password', 'dateOfBirth' ];
    for(var i = 0; i < properties.length; i++){
        if(!req.body.hasOwnProperty(properties[i])){
            res.status(400).send('field ' + properties[i] + ' not found');
        }
    }
    var criteria = {
        email: req.body.email
    };
    User.find(criteria).exec(function(err, user){
        if(user.length > 0){
            res.status(409).send('user already exists');
        }
    });
    callback(req, res, user);
};
I think the problem is in the for loop in checkErrors. Since you call res.status(400).send() within the loop, you can end up calling it multiple times, which will trigger an error after the first call since a response will already have been sent back to the client.
Inside the loop, you can instead add missing fields to an array, then check the length of the array to see if you should respond with a 400 or continue. That way, you will only call res.status(400).send() one time.
For example:
...
var missingFields = [];
for(var i = 0; i < properties.length; i++){
    if(!req.body.hasOwnProperty(properties[i])){
        missingFields.push(properties[i]);
    }
}
if(missingFields.length > 0) {
    return res.status(400).send({"missingFields" : missingFields});
}
...
In general, I advise that you put return in front of each res.send() call, to ensure that no others are accidentally called later on.
An example of this is:
User.find(criteria).exec(function(err, existingUsers){
    // Named existingUsers so that it does not shadow the new user
    // passed to checkErrors, which the callback still needs
    if(err){
        return res.status(500).send(err.message);
    }
    // We put return here in case you later add conditionals that are not
    // mutually exclusive, since execution will continue past the
    // res.status() call without return
    if(existingUsers.length > 0){
        return res.status(409).send('user already exists');
    }
    // Note that we also put this function call within the block of the
    // User.find() callback, since it should not execute until
    // User.find() completes and we can check for existing users.
    return callback(req, res, user);
});
You probably noticed that I moved callback(req, res, user). If we leave callback(req, res, user) outside the body of the User.find() callback, it is possible that it will be executed before User.find() is completed. This is one of the gotchas of asynchronous programming with Node.js. Callback functions signal when a task is completed, so execution can be done "out of order" in relation to your source code if you don't wrap operations that you want to be sequential within callbacks.
On a side note, in the function registerUser, if user.save fails then the client will never know, since the function sends a 200 status code for any request. This happens for the same reason I mentioned above: because res.sendStatus(200) is not wrapped inside the user.save callback function, it may run before the save operation has completed. If an error occurs during a save, you should tell the client, probably with a 500 status code. For example:
var registerUser = function(req, res, user){
    user.save(function(err, user){
        if (err) {
            console.error(err);
            return res.status(500).send(err);
        }
        return res.sendStatus(201);
    });
};
registerUser is assigned with var after the route is defined. That works here only because the route handler runs later, once a request arrives and the assignment has already happened; a function declaration would be hoisted together with its body and is less fragile.
Your use of scope in the closures isn't correct. Your specific error occurs because you're running res.send() in a loop when it's only supposed to be called once per request (hence the headers, i.e. the response, have already been sent). You should also return from the function directly after calling res.send().
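The advice above comes down to deciding on the response once and returning immediately. As a minimal sketch of that idea, the checks can be collected in a function that returns at most one problem, which the route handler then sends. `checkRegistration` and its return shape are hypothetical; `existingUsers` is assumed to be the array produced by `User.find(criteria)`.

```javascript
const REQUIRED_FIELDS = ['firstName', 'lastName', 'email', 'password', 'dateOfBirth'];

// Returns { status, body } for the first problem found, or null if there is none.
function checkRegistration(body, existingUsers) {
    const missingFields = REQUIRED_FIELDS.filter(
        (field) => !Object.prototype.hasOwnProperty.call(body, field)
    );
    if (missingFields.length > 0) {
        return { status: 400, body: { missingFields } };
    }
    if (existingUsers.length > 0) {
        return { status: 409, body: 'user already exists' };
    }
    return null;
}
```

Inside the `User.find()` callback, the handler would then send a single response: if `checkRegistration(req.body, existingUsers)` returns a problem, respond with `res.status(problem.status).send(problem.body)` and return; otherwise continue to save the user.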

Resources