Node.js - process millions of HTTP POST items without blocking

Using node.js, what is the best way to process a million items in an HTTP POST request without blocking the server? My only guess is some sort of message queue, but I really have no idea.

You would want to use a library like async.js to create non-blocking loops.
https://github.com/caolan/async
var async = require("async");

async.each(yourArrayOfThings, function(oneItem, callback) {
    // do something with oneItem
    // ...
    return callback(null);
}, function(err) {
    // if any of the callbacks returned an error, err would equal that error
});
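For a million items you may also want to cap how many are in flight at once rather than scheduling everything immediately. A minimal sketch using async.eachLimit (yourArrayOfThings and processItem are placeholders for your actual data and worker):

var async = require("async");

// Process at most 100 items concurrently.
async.eachLimit(yourArrayOfThings, 100, function(oneItem, callback) {
    processItem(oneItem, callback); // hypothetical async worker function
}, function(err) {
    if (err) console.error(err);
});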
Give some more information about your processing needs if this is not an applicable solution for you.

Related

make nodejs http get call in synchronous way

I was asked a question in an interview; below is the question.
const JsonFromHTTPCall = function() {
    // make get request to some api url and return json object.
};

// code below is not editable
let result = JsonFromHTTPCall();
console.log("result ", result);
I am not finding a way to make the console.log statement wait until I get the result from the HTTP call.
Please give me a way to solve it.
Thanks in advance.
Nodejs does not offer synchronous networking in any way. All built-in networking is asynchronous. Therefore, you cannot directly return a value from a function when that value is retrieved via networking. Instead, you need to communicate the result back either via a callback function, an event you trigger, or a returned promise.
For a summary of this issue see this highly active question/answer:
How do I return the response from an asynchronous call?
There is a gross hack that involves using a synchronous child process and having it do the networking for you, but it's unlikely that is what they were asking for in your interview.
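For completeness, that hack looks roughly like this (a sketch only; it assumes curl is installed, and it blocks the entire event loop while the child process runs):

const { execSync } = require('child_process');

// Blocks the whole Node.js process until curl finishes - don't do this
// in production code.
function JsonFromHTTPCall(url) {
    const body = execSync('curl -s ' + JSON.stringify(url));
    return JSON.parse(body.toString());
}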
So, the main answer to the question is that "nodejs does not offer synchronous networking" and further "you cannot change an asynchronous result into a synchronous result". Therefore the proper way to code this is to use nodejs asynchronous coding techniques.
The cleanest way I know of to make http get calls is to use a library such as request-promise() or my newer favorite got(), and to use the promise interface plus async/await for a nice clean code path:
const got = require('got');

async function getSomeJSON(url) {
    let data = await got(url).json();
    console.log(data);
    return data;
}

getSomeJSON(myURL).then(data => {
    console.log("got my data");
}).catch(err => {
    console.log(err);
});

node async call return data in response

I am new to nodejs, so I have a basic question, and this is my scenario:
I have a javascript client which is making an http request to a node server to read a value from the database.
Once the node server receives the request, it makes a simple db call and returns the data to the client in the response, and this is where the problem is.
router.get('/state', function(req, res) {
    var result = dbServer.makeDBCall(); // before this line returns the result, the next line executes
    res.send(result);
});
The database call from the node server is asynchronous, so before the result is returned the node server has already sent a blank response to the client. What is the standard/acceptable way of achieving this? I know I could block the node thread until the call finishes, but then the whole purpose of node is gone, right?
It depends on what kind of database node module you are using.
Other than the standard callback approach, there is also the promise way. The pg-promise library is one of that kind.
See sample code:
this.databaseConnection.makeDBCall('your query...')
    .then(function(dbResponse) {
        // parse the response into the format you want, then...
        var result = dbResponse; // e.g. pick out the rows you need here
        res.send(result);
    })
    .catch(function(error) {
        // handle error
        res.send(error.message);
    });
#spdev: I saw one of your comments about being worried how Node actually knows who to send the response to, especially when there are multiple requests.
This is a very good question, and to be honest with you - I don't know much about it either.
In short the answer is yes, Node handles this by creating a corresponding ServerResponse object when an HTTP request comes through. That object is tied to the underlying connection, which is how the Node networking stack knows how to route the response back to the right caller when it gets written out as data packets.
I tried Googling a bit for an answer but didn't get too far. I hope the ServerResponse documentation can provide more insight for you. Share with me if you find an answer, thanks!
https://nodejs.org/api/all.html#http_class_http_serverresponse
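For what it's worth, here is a minimal sketch (mine, not from the docs) of why responses don't get crossed: every incoming request invokes the handler with its own req/res pair, and any later callback closes over exactly those objects.

var http = require('http');

// Each request gets its own req/res pair bound to its own socket.
// The setTimeout callback closes over *this* request's res, so even
// with many concurrent requests nothing gets mixed up.
http.createServer(function(req, res) {
    setTimeout(function() {
        res.end('answered: ' + req.url);
    }, 1000);
}).listen(3000);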
Try the code below.
router.get('/state', function(req, res) {
    dbServer.makeDBCall(function(err, result) {
        if (!err) {
            res.send(result);
        }
    });
});
Hope this helps.
The dbServer.makeDBCall() must accept a callback that runs when the statement finishes executing.
Something like -
dbServer.makeDBCall({query: 'args'}, function(err, result) {
    if (err) {
        // handle error
        return;
    }
    res.send(result);
});
You send the response from the db inside that callback function.
Learn more about callbacks here:
nodeJs callbacks simple example
https://docs.nodejitsu.com/articles/getting-started/control-flow/what-are-callbacks/

nodejs - multiple async http requests

I have just started my journey with nodejs and would like to create a simple nodejs app that needs to:
- first request/get some initial data via HTTP,
- use the received JSON to do another set of requests (some can be done in parallel, some need to be executed first, and the data received will be used to create a valid URL).
Taking into account that nodejs is asynchronous and based on callbacks, I am wondering what the best way to achieve this is in order to have 'clean code' and not mess the code up too much.
Thanks for any hints / guidelines, Mark
Maybe check out the Async library. It has a lot of built-in functionality that seems to accomplish what you're looking for. A couple of useful ones right off the bat might be "async.waterfall" and "async.map".
async.waterfall
async.map
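To make that concrete, here is a rough sketch of the flow described in the question (the URLs, the request client, and the initialData.urls field are all assumptions on my part):

var async = require('async');
var request = require('request'); // assumed HTTP client

async.waterfall([
    // step 1: fetch the initial JSON
    function(callback) {
        request({ url: 'http://example.com/initial', json: true },
            function(err, response, body) {
                callback(err, body);
            });
    },
    // step 2: fan out in parallel over URLs derived from step 1
    function(initialData, callback) {
        async.map(initialData.urls, function(url, cb) {
            request({ url: url, json: true }, function(err, response, body) {
                cb(err, body);
            });
        }, callback);
    }
], function(err, results) {
    if (err) return console.error(err);
    console.log(results); // one result per parallel request
});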
Agreed that this is subjective; in general the way to go is promises, and there are native promises:
Native Promise Docs - MDN
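As a quick illustration (my own sketch, not from any library's docs), you can wrap Node's callback-style http.get in a native Promise yourself, assuming a plain http:// URL:

var http = require('http');

// Wrap the callback-style http.get in a native Promise.
function getJSON(url) {
    return new Promise(function(resolve, reject) {
        http.get(url, function(res) {
            var body = '';
            res.on('data', function(chunk) { body += chunk; });
            res.on('end', function() {
                try { resolve(JSON.parse(body)); }
                catch (e) { reject(e); }
            });
        }).on('error', reject);
    });
}

getJSON('http://example.com/data').then(console.log).catch(console.error);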
For your particular question, imo, the npm module request-promise offers some great solutions. It is essentially a 'promisified' version of the request module: it will allow you to GET/POST/PUT/DELETE and follow up each request with a .then() where you can continue to do more calls.
The code below first GETs something from a server, then POSTs something else to that server:
var request = require('request-promise');

function addUserToAccountName(url, accountName, username, password) {
    var options = assignUrl(url); // assignUrl is not in this code
    return request
        .get(options) // first get
        .auth(username, password)
        .then(function(res) {
            var id = parseId(res.data, accountName); // parse response
            return id;
        })
        .then(function(id) {
            var postOptions = Object.assign(defaultSettings, { url: url + id + '/users' });
            return request.post(postOptions) // then make a post
                .auth(username, password)
                .then(function(response) {
                    //console.log(response);
                })
                .catch(function(err) {
                    console.log(err.response.body.message);
                });
        });
}
You can just keep going with .then(); whatever you return from the previous .then() will be passed into the next function.
Request-Promise

To async, or not to async in node.js?

I'm still learning the node.js ropes and am just trying to get my head around what I should be deferring, and what I should just be executing.
I know there are other questions relating to this subject generally, but I'm afraid without a more relatable example I'm struggling to 'get it'.
My general understanding is that if the code being executed is non-trivial, then it's probably a good idea to async it, so as to avoid it holding up someone else's session. There's clearly more to it than that, and callbacks get mentioned a lot, and I'm not 100% sure why you wouldn't just sync everything. I've got some way to go.
So here's some basic code I've put together in an express.js app:
app.get('/directory', function(req, res) {
    process.nextTick(function() {
        Item.
            find().
            sort('date-modified').
            exec(function(err, items) {
                if (err) {
                    return next(err);
                }
                res.render('directory.ejs', {
                    items: items
                });
            });
    });
});
Am I right to be using process.nextTick() here? My reasoning is that, as it's a database call, some actual work has to be done, and it's the kind of thing that could slow down active sessions. Or is that wrong?
Secondly, I have a feeling that if I'm deferring the database query then it should be in a callback, and I should have the actual page rendering happen synchronously, on condition of receiving the callback response. I'm only assuming this because it seems like a more common format in some of the examples I've seen - if it's a correct assumption, can anyone explain why that's the case?
Thanks!
You are using it wrong in this case, because .exec() is already asynchronous (you can tell by the fact that it accepts a callback as a parameter).
To be fair, most of what needs to be asynchronous in nodejs already is.
As for page rendering, if you require the results from the database to render the page, and those arrive asynchronously, you can't really render the page synchronously.
Generally speaking, it's best practice to make everything you can asynchronous rather than relying on synchronous functions ... in most cases that would be something like readFile vs. readFileSync. In your example, you're not doing anything synchronous with I/O. The only synchronous code you have is the logic of your program (which requires CPU and thus has to be synchronous in node), but these are tiny little things by comparison.
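For illustration, the contrast looks like this (a sketch; the file path is made up):

var fs = require('fs');

// Synchronous: blocks the event loop until the whole file is read.
var data = fs.readFileSync('/tmp/example.txt', 'utf8');
console.log(data);

// Asynchronous: the callback runs later; other requests are served meanwhile.
fs.readFile('/tmp/example.txt', 'utf8', function(err, contents) {
    if (err) return console.error(err);
    console.log(contents);
});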
I'm not sure what Item is, but if I had to guess what .find().sort() does is build a query string internally to the system. It does not actually run the query (talk to the DB) until .exec is called. .exec takes a callback, so it will communicate with the DB asynchronously. When that communication is done, the callback is called.
Using process.nextTick does nothing useful in this case. It would just delay the calling of its code until the next tick of the event loop, which there is no need to do. It has no effect on whether things are synchronous or not.
I don't really understand your second question, but if the rendering of the page depends on the result of the query, you have to defer rendering of the page until the query completes -- you are doing this by rendering in the callback. The rendering itself (res.render) may not be entirely synchronous either; it depends on the internal mechanism of the library that defines the render function.
In your example, next is not defined. Instead your code should probably look like:
app.get('/directory', function(req, res) {
    Item.
        find().
        sort('date-modified').
        exec(function(err, items) {
            if (err) {
                console.error(err);
                res.status(500).end("Database error");
            }
            else {
                res.render('directory.ejs', {
                    items: items
                });
            }
        });
});

How to avoid the need to delay event emission to the next tick of the event loop?

I'm writing a Node.js application using a global event emitter. In other words, my application is built entirely around events. I find this kind of architecture working extremely well for me, with the exception of one side case which I will describe here.
Note that I do not think knowledge of Node.js is required to answer this question. Therefore I will try to keep it abstract.
Imagine the following situation:
A global event emitter (called mediator) allows individual modules to listen for application-wide events.
An HTTP server is created, accepting incoming requests.
For each incoming request, an event emitter is created to deal with events specific to that request.
An example (purely to illustrate this question) of an incoming request:
mediator.on('http.request', function(request, response, emitter) {
    // deal with the new request here, e.g.:
    response.send("Hello World.");
});
So far, so good. One can now extend this application by identifying the requested URL and emitting appropriate events:
mediator.on('http.request', function(request, response, emitter) {
    // identify the requested URL
    if (request.url === '/') {
        emitter.emit('root');
    }
    else {
        emitter.emit('404');
    }
});
Following this one can write a module that will deal with a root request.
mediator.on('http.request', function(request, response, emitter) {
    // when root is requested
    emitter.once('root', function() {
        response.send('Welcome to the frontpage.');
    });
});
Seems fine, right? Actually, it is potentially broken code. The reason is that the line emitter.emit('root') may be executed before the line emitter.once('root', ...). The result is that the listener never gets executed.
One could deal with this specific situation by delaying the emission of the root event to the end of the event loop:
mediator.on('http.request', function(request, response, emitter) {
    // identify the requested URL
    if (request.url === '/') {
        process.nextTick(function() {
            emitter.emit('root');
        });
    }
    else {
        process.nextTick(function() {
            emitter.emit('404');
        });
    }
});
The reason this works is that the emission is now delayed until the current tick of the event loop has finished, and therefore all listeners have been registered.
However, there are many issues with this approach:
one of the advantages of such an event-based architecture is that emitting modules do not need to know who is listening to their events. Therefore it should not be necessary to decide whether the event emission needs to be delayed, because one cannot know what is going to listen for the event and whether it needs the emission to be delayed or not.
it significantly clutters and complicates the code (compare the two examples)
it probably worsens performance
As a consequence, my question is: how does one avoid the need to delay event emission to the next tick of the event loop, such as in the described situation?
Update 19-01-2013
An example illustrating why this behavior is useful: to allow an HTTP request to be handled in parallel.
mediator.on('http.request', function(req, res) {
    req.onceall('json.parsed', 'validated', 'methodoverridden', 'authenticated', function() {
        // the request has now been validated, parsed as JSON, had its HTTP
        // method overridden when requested, and been authenticated
    });
});
If each event like json.parsed were to emit the original request, the above would not be possible, because each event would be related to a different request and you could not listen for a combination of actions executed in parallel for one specific request.
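For reference, a helper like that onceall could be sketched as follows (my sketch, not part of the original post):

// Invoke callback once every named event has fired at least once.
function onceall(emitter, names, callback) {
    var remaining = names.length;
    names.forEach(function(name) {
        emitter.once(name, function() {
            if (--remaining === 0) callback();
        });
    });
}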
Having both a mediator that listens for events and an emitter that also listens for and triggers events seems overly complicated. I'm sure there is a legit reason, but my suggestion is to simplify. We use a global eventBus in our nodejs service that does something similar. For this situation, I would emit a new event.
bus.on('http:request', function(req, res) {
    if (req.url === '/')
        bus.emit('ns:root', req, res);
    else
        bus.emit('404');
});

// note the use of namespace here to target specific subsystem
bus.once('ns:root', function(req, res) {
    res.send('Welcome to the frontpage.');
});
It sounds like you're starting to run into some of the disadvantages of the observer pattern (as mentioned in many books/articles that describe this pattern). My solution is not ideal – assuming an ideal one exists – but:
If you can make the simplifying assumption that the event is emitted only once per emitter (i.e. emitter.emit('root'); is called only once for any emitter instance), then perhaps you can write something that works like jQuery's $.ready() event.
In that case, subscribing via emitter.once('root', function() { ... }) will check whether 'root' was emitted already, and if so, will invoke the handler anyway. And if 'root' was not emitted yet, it will defer to the normal, existing functionality.
That's all I got.
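A rough sketch of what that could look like (my sketch, under the one-shot assumption above; StickyEmitter is a made-up name):

var EventEmitter = require('events').EventEmitter;

// An emitter that remembers events it has already fired, so late
// subscribers are still invoked (similar in spirit to jQuery's $.ready()).
function StickyEmitter() {
    EventEmitter.call(this);
    this.fired = {}; // event name -> the args it was emitted with
}
StickyEmitter.prototype = Object.create(EventEmitter.prototype);

StickyEmitter.prototype.emit = function(name) {
    this.fired[name] = Array.prototype.slice.call(arguments, 1);
    return EventEmitter.prototype.emit.apply(this, arguments);
};

StickyEmitter.prototype.once = function(name, listener) {
    if (this.fired.hasOwnProperty(name)) {
        listener.apply(this, this.fired[name]); // replay immediately
        return this;
    }
    return EventEmitter.prototype.once.call(this, name, listener);
};

// Usage: emitting before subscribing no longer loses the event.
var emitter = new StickyEmitter();
emitter.emit('root');
emitter.once('root', function() { console.log('still runs'); });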
I think this architecture is in trouble, as you're doing sequential work (I/O) that requires a definite order of actions, but you still plan to build the app on components that naturally allow a non-deterministic order of execution.
What you can do
Include a context selector in the mediator.on function, e.g. in this way:
mediator.on('http.request > root', function( ... ) { });
Or define it as a submediator:
var submediator = mediator.yield('http.request > root');
submediator.on(function( ... ) {
    emitter.once('root', ... );
});
This would trigger the callback only if root was emitted from the http.request handler.
Another, trickier way is to do the ordering in the background, but it's not feasible with your current one-mediator-rules-them-all interface. Implement the code so that each .emit call does not actually send the event, but puts the produced event in a list. Each .once puts a consume-event record in the same list. When all mediator.on callbacks have been executed, walk through the list and sort it by dependency order (e.g. if the list has consume 'root' before produce 'root', swap them). Then execute the consume handlers in order. If you run out of events, stop executing.
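A very rough sketch of that produce/consume bookkeeping (entirely my interpretation of the idea above; it skips real dependency sorting and simply pairs records up after all registration has happened):

function DeferredBus() {
    this.produced = {};  // event name -> array of argument lists
    this.consumers = {}; // event name -> array of handlers
}

// Record the intent to produce an event instead of firing it right away.
DeferredBus.prototype.emit = function(name) {
    (this.produced[name] = this.produced[name] || [])
        .push(Array.prototype.slice.call(arguments, 1));
};

// Record the intent to consume an event.
DeferredBus.prototype.once = function(name, handler) {
    (this.consumers[name] = this.consumers[name] || []).push(handler);
};

// Call after all mediator.on callbacks have run; pairing happens here,
// so registration order no longer matters.
DeferredBus.prototype.flush = function() {
    for (var name in this.produced) {
        var handlers = this.consumers[name] || [];
        this.produced[name].forEach(function(args) {
            handlers.forEach(function(h) { h.apply(null, args); });
        });
    }
};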
Oi, this seems like a very broken architecture for a few reasons:
How do you pass around request and response? It looks like you've got global references to them.
If I answer your question, you will turn your server into a purely synchronous function and you'd lose the power of async node.js. (Requests would effectively be queued, and each could only start executing once the previous request is 100% finished.)
To fix this:
Pass request & response to the emit() call as parameters (see the sketch after this list). Now you don't need to force everything to run synchronously anymore, because when the next component handles the event, it will have a reference to the right request & response objects.
Learn about other common solutions that don't need a global mediator. Look at the pattern that Connect was based on many Internet-years ago: http://howtonode.org/connect-it <- describes middleware/onion routing
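A minimal sketch of the first fix (mine, not the answerer's; it reuses the question's own example):

// Carry req/res with every event instead of relying on shared state.
mediator.on('http.request', function(req, res) {
    if (req.url === '/') {
        mediator.emit('root', req, res);
    } else {
        mediator.emit('404', req, res);
    }
});

// The listener receives the exact req/res pair for this request.
mediator.on('root', function(req, res) {
    res.send('Welcome to the frontpage.');
});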
