How to avoid the need to delay event emission to the next tick of the event loop? - node.js

I'm writing a Node.js application using a global event emitter. In other words, my application is built entirely around events. This kind of architecture works extremely well for me, with the exception of one edge case which I will describe here.
Note that I do not think knowledge of Node.js is required to answer this question. Therefore I will try to keep it abstract.
Imagine the following situation:
A global event emitter (called mediator) allows individual modules to listen for application-wide events.
An HTTP server is created, accepting incoming requests.
For each incoming request, an event emitter is created to deal with events specific to this request.
An example (purely to illustrate this question) of an incoming request:
mediator.on('http.request', function(request, response, emitter) {
  //deal with the new request here, e.g.:
  response.send("Hello World.");
});
So far, so good. One can now extend this application by identifying the requested URL and emitting appropriate events:
mediator.on('http.request', function(request, response, emitter) {
  //identify the requested URL
  if (request.url === '/') {
    emitter.emit('root');
  }
  else {
    emitter.emit('404');
  }
});
Following this one can write a module that will deal with a root request.
mediator.on('http.request', function(request, response, emitter) {
//when root is requested
emitter.once('root', function() {
response.send('Welcome to the frontpage.');
});
});
Seems fine, right? Actually, it is potentially broken code. The reason is that the line emitter.emit('root') may be executed before the line emitter.once('root', ...). The result is that the listener never gets executed.
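The failure mode can be reproduced in isolation. The following is a minimal sketch using Node's built-in EventEmitter; the `calls` array and the `hadListeners` variable exist only for this illustration.

```javascript
const { EventEmitter } = require('events');

const emitter = new EventEmitter();
const calls = [];

// Emitting before anyone listens: the event is simply dropped.
const hadListeners = emitter.emit('root');

// A listener registered afterwards never sees the earlier emission.
emitter.once('root', () => calls.push('late listener'));

console.log(hadListeners); // false: no listener was registered at emit time
console.log(calls.length); // 0: the late listener was not invoked
```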
One could deal with this specific situation by delaying the emission of the root event to the next tick of the event loop:
mediator.on('http.request', function(request, response, emitter) {
  //identify the requested URL
  if (request.url === '/') {
    process.nextTick(function() {
      emitter.emit('root');
    });
  }
  else {
    process.nextTick(function() {
      emitter.emit('404');
    });
  }
});
This works because the emission is now delayed until the currently running code has finished, by which point all listeners have been registered.
However, there are many issues with this approach:
one of the advantages of such event based architecture is that emitting modules do not need to know who is listening to their events. Therefore it should not be necessary to decide whether the event emission needs to be delayed, because one cannot know what is going to listen for the event and if it needs it to be delayed or not.
it significantly clutters and complicates the code (compare the two examples)
it probably worsens performance
As a consequence, my question is: how does one avoid the need to delay event emission to the next tick of the event loop, such as in the described situation?
Update 19-01-2013
An example illustrating why this behavior is useful: it allows an HTTP request to be handled in parallel.
mediator.on('http.request', function(req, res) {
req.onceall('json.parsed', 'validated', 'methodoverridden', 'authenticated', function() {
//the request has now been validated, parsed as JSON, the kind of HTTP method has been overridden when requested to and it has been authenticated
});
});
If each event such as json.parsed were instead emitted globally with the original request as an argument, the above would not be possible: each emission could belong to a different request, so you could not listen for a combination of actions executed in parallel for one specific request.
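To make the idea concrete, here is a rough sketch of what a helper like onceall could look like. `onceAll` is a hypothetical function written for this illustration, not part of Node's EventEmitter, and the plain EventEmitter named `req` is a stand-in for the per-request emitter.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: call `callback` once every event in `names`
// has fired at least once on `emitter`.
function onceAll(emitter, names, callback) {
  const pending = new Set(names);
  for (const name of names) {
    emitter.once(name, () => {
      pending.delete(name);
      if (pending.size === 0) callback();
    });
  }
}

const req = new EventEmitter(); // stand-in for a per-request emitter
let ready = false;
onceAll(req, ['json.parsed', 'validated', 'authenticated'], () => { ready = true; });

req.emit('validated');
req.emit('json.parsed');
const readyBeforeLast = ready; // still false: 'authenticated' has not fired
req.emit('authenticated');
console.log(readyBeforeLast, ready); // false true
```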

Having both a mediator which listens for events and an emitter which also listens and triggers events seems overly complicated. I'm sure there is a legit reason but my suggestion is to simplify. We use a global eventBus in our nodejs service that does something similar. For this situation, I would emit a new event.
bus.on('http:request', function(req, res) {
if (req.url === '/')
bus.emit('ns:root', req, res);
else
bus.emit('404');
});
// note the use of namespace here to target specific subsystem
bus.once('ns:root', function(req, res) {
res.send('Welcome to the frontpage.');
});

It sounds like you're starting to run into some of the disadvantages of the observer pattern (as mentioned in many books/articles that describe this pattern). My solution is not ideal – assuming an ideal one exists – but:
If you can make a simplifying assumption that the event is emitted only 1 time per emitter (i.e. emitter.emit('root'); is called only once for any emitter instance), then perhaps you can write something that works like jQuery's $.ready() event.
In that case, subscribing to emitter.once('root', function() { ... }) will check whether 'root' was emitted already, and if so, will invoke the handler anyway. And if 'root' was not emitted yet, it'll defer to the normal, existing functionality.
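A minimal sketch of that idea, assuming each event fires at most once per emitter. `StickyEmitter` is a made-up name for this illustration; it only overrides `emit` and `once` on Node's EventEmitter.

```javascript
const { EventEmitter } = require('events');

// Hypothetical "sticky" emitter: remembers the arguments of the first emission
// of each event so that late `once` subscribers are still invoked.
class StickyEmitter extends EventEmitter {
  constructor() {
    super();
    this.fired = new Map(); // event name -> arguments of its first emission
  }

  emit(name, ...args) {
    if (!this.fired.has(name)) this.fired.set(name, args);
    return super.emit(name, ...args);
  }

  once(name, listener) {
    if (this.fired.has(name)) {
      // Already emitted: invoke right away instead of waiting forever.
      listener.apply(this, this.fired.get(name));
      return this;
    }
    return super.once(name, listener);
  }
}

const emitter = new StickyEmitter();
const seen = [];
emitter.emit('root', 'first');
emitter.once('root', (value) => seen.push(value)); // registered late, still runs
console.log(seen); // [ 'first' ]
```

Note that the late listener runs synchronously inside `once` here, which is a design choice of this sketch.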
That's all I got.

I think this architecture is in trouble, as you're doing sequential work (I/O) that requires definite order of actions but still plan to build app on components that naturally allow non-deterministic order of execution.
What you can do
Include a context selector in the mediator.on call, for example:
mediator.on('http.request > root', function( .. ) { } )
Or define it as a submediator:
var submediator = mediator.yield('http.request > root');
submediator.on(function( ... ) {
emitter.once('root', ... )
});
This would trigger the callback only if root was emitted from http.request handler.
Another, trickier way is to do the ordering in the background, but that is not feasible with your current one-mediator-rules-them-all interface. Implement the code so that each .emit call does not actually send the event, but puts the produced event in a list. Each .once puts a consume-event record in the same list. When all mediator.on callbacks have been executed, walk through the list and sort it by dependency order (e.g. if the list has produce 'root' before consume 'root', swap them so that the listener is registered first). Then execute the consume handlers in order. If you run out of events, stop executing.
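A simplified sketch of this buffering idea follows. Instead of sorting one mixed list, it keeps produced events and consumers in separate structures and delivers only when `flush` is called, which has the same effect for this case. `DeferredBus` and `flush` are names invented for the illustration.

```javascript
// Hypothetical buffer: `emit` and `once` only record intent; `flush` delivers.
class DeferredBus {
  constructor() {
    this.produced = [];         // { name, args } in emission order
    this.consumers = new Map(); // event name -> listeners waiting for it
  }

  emit(name, ...args) {
    this.produced.push({ name, args });
  }

  once(name, listener) {
    if (!this.consumers.has(name)) this.consumers.set(name, []);
    this.consumers.get(name).push(listener);
  }

  flush() {
    // Handlers may produce further events, so continue until the list is empty.
    while (this.produced.length > 0) {
      const { name, args } = this.produced.shift();
      const listeners = this.consumers.get(name) || [];
      this.consumers.delete(name); // `once` semantics: consume each listener once
      for (const listener of listeners) listener(...args);
    }
  }
}

const bus = new DeferredBus();
const seen = [];
bus.emit('root', '/');                     // produced first
bus.once('root', (url) => seen.push(url)); // consumer registered later
bus.flush();                               // call after all handlers have run
console.log(seen); // [ '/' ]
```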

Oi, this seems like a very broken architecture for a few reasons:
How do you pass around request and response? It looks like you've got global references to them.
If I answer your question, you will turn your server into a pure synchronous function and you'd lose the power of async node.js. (Requests would be queued effectively, and could only start executing once the last request is 100% finished.)
To fix this:
Pass request & response to the emit() call as parameters. Now you don't need to force everything to run synchronously anymore, because when the next component handles the event, it will have a reference to the right request & response objects.
Learn about other common solutions that don't need a global mediator. Look at the pattern that Connect was based on many Internet-years ago: http://howtonode.org/connect-it <- describes middleware/onion routing

Related

Sequentially execute webhooks received in node application

I have a Node application using Koa. It receives webhooks from an external application for specific resources.
To illustrate, let's say the webhook sends me, in a POST request, an object of this type:
{
'resource_id':'<SomeID>',
'resource_origin':'<SomeResourceOrigin>',
'value' : '<SomeValue>'
}
I would like to process sequentially all resources coming from the same origin, to avoid desynchronization of the resources involved in my processing.
I was thinking of using the database as a lock and a cron job to process the resources of each origin sequentially.
But I'm not sure that is the most efficient method.
So my question is:
Do you know of a method/package/service providing global queues that I could set up per origin, ensuring that resources from the same origin are processed one after another without making all webhooks be processed sequentially? If it does not use a database, even better.
If I were you I would start by serializing the handling of all your webhooks. In other words, I suggest you handle them one at a time no matter their origin. Use a simple queue inside your nodejs application.
(Once you've convinced yourself that works correctly, you can then serialize them based on origin.)
First, structure your function (let's call it handleOneWebhook()) for handling incoming webhooks as a Promise or an async function. Then you could invoke them using code with this outline.
let busy = false
async function handleManyWebhooks (queue) {
  if (busy) return
  busy = true
  try {
    while (queue.length > 0) {
      const item = queue.shift()
      await handleOneWebhook(item)
    }
  } finally {
    // release the flag even if a handler throws, so later webhooks still run
    busy = false
  }
}
The queue you pass to handleManyWebhooks is a simple array, where each element is the object from a POST request. You use it as a queue: push() each object to put it into the queue, and shift() to remove it.
Then, whenever you receive a webhook POST object you use code with this outline.
const queue = []
...
function handlePostObject (postObject) {
  queue.push(postObject)
  handleManyWebhooks(queue)
}
Even though you call handleManyWebhooks once for each incoming object, the busy flag makes sure it handles only one at a time.
Notice this is a very simple solution. Once you have it working correctly, two possible refinements suggest themselves.
Use something more efficient for your queue than a simple array. shift() is not very fast.
Create a separate queue object with its own busy flag for each separate origin. Then you will be able to parallelize the handling of webhooks from different origins while still serializing the stream of webhooks from each origin.
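The second refinement could be sketched as follows. This version swaps the array plus busy flag for one promise chain per origin, which is a different technique with the same effect. `enqueue`, `tails`, `sleep` and the stand-in `handleOneWebhook` are all assumptions made for this illustration.

```javascript
// Hypothetical per-origin serializer: one promise chain per origin.
const tails = new Map(); // origin -> promise that settles when its last job is done

function enqueue(origin, job) {
  const previous = tails.get(origin) || Promise.resolve();
  const next = previous.then(() => job());
  // Store a version that never rejects, so one failure does not block the chain.
  tails.set(origin, next.catch(() => {}));
  return next;
}

// Demo with a stand-in handler that just waits for a while.
const log = [];
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async function handleOneWebhook(item) {
  log.push(`start ${item.resource_id}`);
  await sleep(item.delay);
  log.push(`end ${item.resource_id}`);
}

const done = Promise.all([
  enqueue('A', () => handleOneWebhook({ resource_id: 'A1', delay: 20 })),
  enqueue('A', () => handleOneWebhook({ resource_id: 'A2', delay: 5 })),
  enqueue('B', () => handleOneWebhook({ resource_id: 'B1', delay: 5 })),
]);
done.then(() => console.log(log));
```

In this sketch A2 starts only after A1 has ended, while B1 runs alongside A1. The map keeps one entry per origin forever, so a long-running service would probably want to delete idle entries.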
Solution I decide to use
Brief summary of the discussion
As Ivan Rubinson pointed out, my problem is just a producer-consumer problem.
So I finally chose to use RabbitMQ because I have a huge number of webhooks to process. For people who have a small number of requests to process and do not want to use external tools, O. Jones's answer is a really good way to solve the problem.
Solution design
I installed and configured a RabbitMQ server, then created one queue for each origin of my webhooks.
Producer
On the producer side, when I receive the webhook data, I send a message to the queue corresponding to the origin of the webhook. The message contains the serialized information needed for processing, in fact just the id of the row in the database, to make messages as light as possible.
Consumer
On the consumer side, I create a consumer function for each origin queue and set the prefetch count to one so that messages are processed one by one in each queue. Finally, I configure the channel to wait for an acknowledgement before sending the next message. With this configuration, consumers proceed message by message, which solves the initial problem.
Implementation
Producer
const amqp = require('amqplib');
// lazily create one shared publisher channel
async function create() {
  const conn = await amqp.connect(RBMQ_CONNECTION_STRING);
  global.channel_publisher = await conn.createChannel();
}
async function sendtask(queue, task) {
  if (!global.channel_publisher) {
    await create();
  }
  // make sure the queue exists before publishing to it
  await global.channel_publisher.assertQueue(queue);
  global.channel_publisher.sendToQueue(queue, Buffer.from(task));
}
I call the sendtask(queue,task) function at the place where I receive my webhook.
Consumer
async function create() {
  const conn = await amqp.connect(RBMQ_CONNECTION_STRING);
  const ch = await conn.createChannel();
  // deliver at most one unacknowledged message at a time
  await ch.prefetch(1);
  global.channel_consumer = ch;
}
async function consumeTask(queue) {
  if (!global.channel_consumer) {
    await create();
  }
  await global.channel_consumer.assertQueue(queue);
  // the callback must be async because it awaits processWebhooks
  await global.channel_consumer.consume(queue, async (message) => {
    const args = message.content.toString().split(';');
    await processWebhooks(args);
    global.channel_consumer.ack(message);
  });
}
I call consumeTask(queue) when I have to process a new origin of webhooks. I also use it to initialize my application with all known origins in the database.

Is there any risk to read/write the same file content from different 'sessions' in Node JS?

I'm new to Node.js and I wonder whether the snippet below has a multi-session problem.
Consider that I have a Node.js server (Express) and I listen for a POST request:
app.post('/sync/:method', onPostRequest);
// a function declaration is hoisted, so it can be referenced in app.post above
function onPostRequest(req,res){
// parse request and fetch email list
var emails = [....]; // pseudocode
doJob(emails);
res.status(200).end('OK');
}
function doJob(_emails){
try {
var emailsFromFile = fs.readFileSync(FILE_PATH, "utf8") || {};
if(_.isString(emailsFromFile)){
emailsFromFile = JSON.parse(emailsFromFile);
}
_emails.forEach(function(_email){
if( !emailsFromFile[_email] ){
emailsFromFile[_email] = 0;
}
else{
emailsFromFile[_email] += 1;
}
});
// write object back
fs.writeFileSync(FILE_PATH, JSON.stringify(emailsFromFile));
} catch (e) {
console.error(e);
};
}
So the doJob method receives the _emails list and increments the counter for each of these emails in the emailsFromFile object loaded from the file.
Consider that I get 2 requests at the same time and they trigger doJob twice. I'm afraid that after one request has loaded emailsFromFile from the file, the second request might change the file content.
Can anybody shed some light on this issue?
Because the code in the doJob() function is all synchronous, there is no risk of multiple requests causing a concurrency problem.
If you were using async IO in that function, then there would be possible concurrency issues.
To explain, Javascript in node.js is single threaded. So, there is only one thread of Javascript execution running at a time and that thread of execution runs until it returns back to the event loop. So, any sequence of entirely synchronous code like you have in doJob() will run to completion without interruption.
If, on the other hand, you use any asynchronous operations such as fs.readFile() instead of fs.readFileSync(), then that thread of execution will return back to the event loop at the point you call fs.readFile() and another request can be run while it is reading the file. If that were the case, then you could end up with two requests conflicting over the same file. In that case, you would have to implement some form of concurrency protection (some sort of flag or queue). This is the type of thing that databases offer lots of features for.
I have a node.js app running on a Raspberry Pi that uses lots of async file I/O and I can have conflicts with that code from multiple requests. I solved it by setting a flag anytime I'm writing to a specific file. Any other requests that want to write to that file first check that flag and, if it is set, go into my own queue and are served when the prior request finishes its write operation. There are many other ways to solve that too. If this happens in a lot of places, then it's probably worth just getting a database that offers features for this type of write contention.
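As an illustration of the flag-or-queue idea with async file I/O, here is a minimal sketch that serializes updates through a promise chain. It is not the code from my app: `withFileLock`, `bumpCounter` and the temp-file path are made up for this example.

```javascript
const fs = require('fs/promises');
const os = require('os');
const path = require('path');

let tail = Promise.resolve(); // tail of the chain of pending updates

// Run `update` only after every previously queued update has settled.
function withFileLock(update) {
  const result = tail.then(() => update());
  tail = result.catch(() => {}); // keep the chain alive after a failure
  return result;
}

const FILE_PATH = path.join(os.tmpdir(), `emails-${process.pid}.json`); // demo path

async function bumpCounter(email) {
  let counts = {};
  try {
    counts = JSON.parse(await fs.readFile(FILE_PATH, 'utf8'));
  } catch (err) {
    if (err.code !== 'ENOENT') throw err; // a missing file just means "no counts yet"
  }
  counts[email] = (counts[email] || 0) + 1;
  await fs.writeFile(FILE_PATH, JSON.stringify(counts));
  return counts[email];
}

// Two "simultaneous" requests: the lock makes the second wait for the first.
const finished = fs.rm(FILE_PATH, { force: true }).then(() => Promise.all([
  withFileLock(() => bumpCounter('a@example.com')),
  withFileLock(() => bumpCounter('a@example.com')),
]));
finished.then((results) => console.log(results)); // [ 1, 2 ]
```

Without `withFileLock`, both calls could read the file before either one writes it, and one increment would be lost.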

Optimization callbacks in MeteorJS

I don't know how to ask my question correctly, but for example I have a structure like this:
get_data:function(){
this.unblock();
request("example.com", Meteor.bindEnvironment(function(error, response, body) {
if (!error && response.statusCode == 200) {
$ = Cheerio.load(body);// get HTML of example.com
$(".someclass").each(function() {
if (!somedata_doesnt_exist_in_Mongo) {
request(nexturl, Meteor.bindEnvironment(function(error, response, body) {
//... logic
}));
}
});
}
}))
}
The main idea is that I get data from many sites, like an aggregator, and I have a lot of methods like this. And it takes a lot of time. So I have 2 questions:
1 - For Meteor guys: when I use this.unblock(), does this ensure that my method will work without taking time from other clients, as if it ran in another thread?
2 - How can I optimize a code structure like the one above?
Sorry if it's not in StackOverflow format, but I would appreciate any help!
this.unblock is relevant only to each client individually. It allows subsequent method calls from client A to run without waiting for the previous method calls from that client A to finish. It is like working asynchronously in a new thread, in the sense that a method using this.unblock does not block later method calls from client A. If you have a client B, his/her method invocations would not block A's regardless of whether you use this.unblock.
I recommend using this.unblock whenever you are sure subsequent method calls will not rely on the result of the function you use this.unblock in. Sending out emails is the most common example. Subsequent method calls will not need the emails to finish sending before doing its job. For your example, I think it should be good to use this.unblock, but of course it depends on what you plan to do with the results following the execution of code after this.unblock.

How to iterate on each record of a Model.stream waterline query?

I need to do something like:
Lineup.stream({foo:"bar"}).exec(function(err,lineup){
// Do something with each record
});
Lineup is a collection with over 18000 records so I think using find is not a good option. What's the correct way to do this? From docs I can't figure out how to.
The .stream() method returns a node stream interface (a read stream) that emits events as data is read. Your options here are either to .pipe() to something else that can take "stream" input, such as the response object of the server, or to attach an event listener to the events emitted from the stream. For example:
Piped to response
Lineup.stream({foo:"bar"}).pipe(res);
Setup event listeners
var stream = Lineup.stream({foo:"bar"});
stream.on("data",function(data) {
stream.pause(); // stop emitting events for a moment
/*
* Do things
*/
stream.resume(); // resume events
});
stream.on("error",function(err) {
// handle any errors that will throw in reading here
});
The .pause() and .resume() calls are quite important, as otherwise the processing code just keeps responding to emitted events before it has completed the previous one. While fine for small cases, this is not desirable for the larger "streams" that the interface is meant to be used for.
Additionally, if you are calling any "asynchronous" actions inside the event handler like this, then you need to take care to .resume() within the callback or promise resolution , thus waiting for that "async" action to complete itself.
But look at the "node documentation" linked earlier for more in depth information on "stream".
P.S I believe the following syntax should also be supported if it suits your sensibilities better:
var stream = Lineup.find({foo:"bar"}).stream();

how to remove listener from inside the callback function in node.js

I set up a listener for an event emitter and what I want to do is to remove the same listener if I get certain events. The problem I am running into is that I don't know how to pass the callback function to removeListener inside the callback function. I tried "this", but it errors out. Is there any way to achieve this? By the way, I am not using once because I am only removing the listener on a certain event.
P.S. I am using redis here so whatever message I receive I would always be listening on the key "message". It would not be possible to just listen on different keys. Channel wouldn't help either because I only want to remove a specific listener.
Also, what I want to do is communication between two completely independent process. No hierarchy of any kind. In process B, there are many independent functions that will get data from process A. My initial thought was using a message queue, but with that I cannot think of a way to ensure that each function in B will get the right data from A.
One cool thing about closures is that you can assign them a name, and that name can be used internally by the function. I haven't tested this, but you should try:
object.on('event', function handler() {
// do stuff
object.off('event', handler);
});
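With Node's built-in EventEmitter the same idea might look like the sketch below. The `client` emitter is a stand-in for the redis subscriber, and the 'stop' message is an invented condition for removing the listener.

```javascript
const { EventEmitter } = require('events');

const client = new EventEmitter(); // stand-in for the redis subscriber
const received = [];

client.on('message', function handler(message) {
  received.push(message);
  if (message === 'stop') {
    // `handler` refers to this very function, so it can unregister itself.
    client.removeListener('message', handler);
  }
});

client.emit('message', 'one');
client.emit('message', 'stop');
client.emit('message', 'ignored'); // no listener left at this point
console.log(received); // [ 'one', 'stop' ]
```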
You should also look into whether your event emitter supports namespaces. That would let you do something like:
object.on('event.namespace', function() {
// do stuff
object.off('.namespace');
});

Resources