Sequentially execute webhooks received in node application - node.js

I have a node application using koa. It receives webhooks from an external application about specific resources.
To illustrate, let's say the webhook sends me, via a POST request, an object of this type:
{
  "resource_id": "<SomeID>",
  "resource_origin": "<SomeResourceOrigin>",
  "value": "<SomeValue>"
}
I would like to process resources coming from the same origin sequentially, to avoid desynchronizing the resources my processing depends on.
I was thinking of using the database as a lock and a cron job to sequentially run my process for the resources of each origin.
But I'm not sure it's the most efficient method.
So my question is:
Do you know of a method/package/service that would let me use global queues, one per origin, ensuring that resources from the same origin are processed sequentially, without forcing all webhooks to be processed one after the other? If it does not use a database, even better.

If I were you I would start by serializing the handling of all your webhooks. In other words, I suggest you handle them one at a time no matter their origin. Use a simple queue inside your nodejs application.
(Once you've convinced yourself that works correctly, you can then serialize them based on origin.)
First, structure your function for handling incoming webhooks (let's call it handleOneWebhook()) as a Promise or an async function. Then you could invoke it using code with this outline.
let busy = false

async function handleManyWebhooks (queue) {
  if (busy) return
  busy = true
  try {
    while (queue.length > 0) {
      const item = queue.shift()
      await handleOneWebhook(item)
    }
  } finally {
    busy = false  // reset the flag even if handleOneWebhook throws
  }
}
The queue you pass to handleManyWebhooks is a simple array, where each element is the object from a POST request. You use it as a queue: push() each object to put it into the queue, and shift() to remove it.
Then, whenever you receive a webhook POST object you use code with this outline.
const queue = []
...
function handlePostObject (postObject) {
  queue.push(postObject)
  handleManyWebhooks(queue)
}
Even though you call handleManyWebhooks once for each incoming object, the busy flag makes sure it handles only one at a time.
Notice this is a very simple solution. Once you have it working correctly, two possible refinements suggest themselves.
Use something more efficient for your queue than a simple array. shift() is not very fast.
Create a separate queue object with its own busy flag for each separate origin. Then you will be able to parallelize the handling of webhooks from different origins while still serializing the stream of webhooks from each origin, as sketched below.
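A minimal sketch of that per-origin refinement, reusing handleOneWebhook from above (the Map-based bookkeeping and the function names are illustrative, not part of the original answer; each queue is still a plain array, for clarity):
const queues = new Map() // origin -> { items: [], busy: false }

function handlePostObjectByOrigin (postObject) {
  const origin = postObject.resource_origin
  let state = queues.get(origin)
  if (!state) {
    state = { items: [], busy: false }
    queues.set(origin, state)
  }
  state.items.push(postObject)
  drainOrigin(state)
}

async function drainOrigin (state) {
  if (state.busy) return // this origin is already being drained
  state.busy = true
  try {
    while (state.items.length > 0) {
      const item = state.items.shift()
      await handleOneWebhook(item)
    }
  } finally {
    state.busy = false
  }
}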

Solution I decided to use
A brief summary of the discussion
As Ivan Rubinson pointed out to me, my problem is just a producer-consumer problem.
So I finally chose to use RabbitMQ, because I have a huge amount of webhooks to process. For people who have a small number of requests to process and do not want to use external tools, O. Jones' answer is a really good way to solve the problem.
Solution design
I finally installed and configured a RabbitMQ server, then created one queue for each origin of my webhooks.
Producer
On the producer side, when I receive the webhook data I send a message to the queue corresponding to the origin of the webhook. The message contains only the serialized information needed for processing (in fact, just the id of the row in the database), to keep messages as light as possible.
Consumer
On the consumer side, I create a consumer function for each origin queue and set the prefetch policy to one, so that messages are processed one at a time in each queue. Finally, I configure the channel to wait for an acknowledgement before delivering the next message. With this configuration, consumers proceed message by message, which solves the initial problem.
Implementation
Producer
const amqp = require('amqplib');

async function create () {
  const conn = await amqp.connect(RBMQ_CONNECTION_STRING);
  global.channel_publisher = await conn.createChannel();
}

async function sendtask (queue, task) {
  if (!global.channel_publisher) {
    await create();
  }
  await global.channel_publisher.assertQueue(queue);
  global.channel_publisher.sendToQueue(queue, Buffer.from(task));
}
I call the sendtask(queue, task) function at the place where I receive my webhook.
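For context, a minimal sketch of what that call site could look like in a koa route (the router, the body parser, and the saveWebhookRow helper are assumptions for illustration, not my original code):
const Router = require('@koa/router');
const router = new Router();

// assumes a body parser (e.g. koa-bodyparser) has populated ctx.request.body
router.post('/webhook', async (ctx) => {
  const { resource_id, resource_origin, value } = ctx.request.body;
  const rowId = await saveWebhookRow(resource_id, value); // hypothetical DB helper
  await sendtask(resource_origin, String(rowId));         // one queue per origin, row id as payload
  ctx.status = 200;
});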
Consumer
const amqp = require('amqplib');

async function create () {
  const conn = await amqp.connect(RBMQ_CONNECTION_STRING);
  const ch = await conn.createChannel();
  ch.prefetch(1); // deliver at most one unacknowledged message at a time
  global.channel_consumer = ch;
}

async function consumeTask (queue) {
  if (!global.channel_consumer) {
    await create();
  }
  await global.channel_consumer.assertQueue(queue);
  global.channel_consumer.consume(queue, async (message) => {
    const args = message.content.toString().split(';');
    await processWebhooks(args);
    global.channel_consumer.ack(message); // acknowledge so the next message is delivered
  });
}
I call consumeTask(queue) whenever I have to process a new webhook origin. I also use it when initializing my application, for all the origins already known in the database.
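For example, the initialization could be sketched like this (loadKnownOrigins is a hypothetical helper standing in for the actual database query):
async function startConsumers () {
  const origins = await loadKnownOrigins(); // e.g. SELECT DISTINCT resource_origin FROM ...
  for (const origin of origins) {
    await consumeTask(origin);              // one consumer per origin queue
  }
}

startConsumers().catch(console.error);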

Related

Is there any risk to read/write the same file content from different 'sessions' in Node JS?

I'm new to Node JS and I wonder whether the below snippets of code have a multi-session problem.
Say I have a Node JS server (express) and I listen for POST requests:
var onPostRequest = function(req, res){
  // parse request and fetch email list
  var emails = [....]; // pseudocode
  doJob(emails);
  res.status(200).end('OK');
};

app.post('/sync/:method', onPostRequest);
function doJob(_emails){
  try {
    var emailsFromFile = fs.readFileSync(FILE_PATH, "utf8") || {};
    if(_.isString(emailsFromFile)){
      emailsFromFile = JSON.parse(emailsFromFile);
    }
    _emails.forEach(function(_email){
      if( !emailsFromFile[_email] ){
        emailsFromFile[_email] = 0;
      }
      else{
        emailsFromFile[_email] += 1;
      }
    });
    // write object back
    fs.writeFileSync(FILE_PATH, JSON.stringify(emailsFromFile));
  } catch (e) {
    console.error(e);
  }
}
So the doJob method receives the _emails list, and I update (increment the counter of) these emails in the emailsFromFile object loaded from the file.
Say I get 2 requests at the same time, triggering doJob twice. I'm afraid that after one request has loaded emailsFromFile from the file, the second request might change the file content.
Can anybody shed some light on this issue?
Because the code in the doJob() function is all synchronous, there is no risk of multiple requests causing a concurrency problem.
If you were using async IO in that function, then there would be possible concurrency issues.
To explain, Javascript in node.js is single threaded. So, there is only one thread of Javascript execution running at a time and that thread of execution runs until it returns back to the event loop. So, any sequence of entirely synchronous code like you have in doJob() will run to completion without interruption.
If, on the other hand, you use any asynchronous operations such as fs.readFile() instead of fs.readFileSync(), then that thread of execution will return back to the event loop at the point you call fs.readFile(), and another request can run while the file is being read. If that were the case, then you could end up with two requests conflicting over the same file. In that case, you would have to implement some form of concurrency protection (some sort of flag or queue). This is the type of thing that databases offer lots of features for.
I have a node.js app running on a Raspberry Pi that uses lots of async file I/O, and I can get conflicts in that code from multiple requests. I solved it by setting a flag any time I'm writing to a specific file; any other request that wants to write to that file first checks the flag and, if it is set, goes into my own queue and is served when the prior request finishes its write operation. There are many other ways to solve this too. If it happens in a lot of places, then it's probably worth just getting a database that offers features for this type of write contention.
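A minimal sketch of that flag-plus-queue idea for a single file might look like this (queueWrite and drain are illustrative names, not the code from my app):
const fs = require('fs');

let writing = false;  // the "flag": a write to FILE_PATH is in flight
const pending = [];   // queued write jobs for FILE_PATH

function queueWrite (data) {
  pending.push(data);
  drain();
}

function drain () {
  if (writing || pending.length === 0) return;
  writing = true;
  const data = pending.shift();
  fs.writeFile(FILE_PATH, data, (err) => {
    if (err) console.error(err);
    writing = false;
    drain(); // serve the next queued request, if any
  });
}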

a request-method in java that implements long-polling

I have already written a request method in Java that sends a request to a simple server. I wrote this simple server myself, and the connection is based on sockets. When the server has the answer for the request, it sends it automatically to the client. Now I want to write a new method that behaves as follows:
if the server does not answer after a fixed period of time, then I send a new request to the server using my request method
My problem is implementing this idea. I am thinking of launching a thread whenever the request method is executed. If this thread does not hear anything for a fixed period of time, then the request method should be executed again. But how can I listen on the same socket used between that client and server?
I am also asking if there is a simpler method that does not use threads.
Currently I am working on this idea:
1) send a request using my request method
2) launch a thread that listens on the socket
3) if (no answer) { go to (1) } else { exit }
I have some difficulty with step 3. How can I go to (1)?
You may be able to accomplish this with a single thread using a SocketChannel and a Selector; see also these tutorials on SocketChannel and Selector. The gist of it is that you'll use long-polling on the Selector to let you know when your SocketChannel(s) are ready to read/write/etc. using Selector#select(long timeout). (Note that a SocketChannel must be put into non-blocking mode before it can be registered with a Selector.)
SocketChannel socketChannel = SocketChannel.open();
socketChannel.connect(new InetSocketAddress("jenkov.com", 80));
// a channel must be in non-blocking mode before registering it with a Selector
socketChannel.configureBlocking(false);
Selector selector = Selector.open();
SelectionKey key = socketChannel.register(selector, SelectionKey.OP_READ);
// returns the number of channels ready after 5000ms; if you have
// multiple channels attached to the selector then you may prefer
// to iterate through the SelectionKeys
if (selector.select(5000) > 0) {
    SocketChannel keyedChannel = (SocketChannel) key.channel();
    // read/write the SocketChannel
} else {
    // I think your best bet here is to close and reopen the Socket
    // or to reinstantiate a new socket - depends on your Request method
}

node-vertica throws exception for multi-queries with a callback

I have created a simple web interface for Vertica.
I expose simple operations on top of a Vertica cluster.
One of the functionalities I expose is querying Vertica.
When my user enters a multi-query, the node module throws an exception and my process exits with code 1.
Is there any way to catch this exception?
Is there any way to overcome the problem in a different way?
Right now there's no way to overcome this when using a callback for the query result.
Preventing this from happening would involve making sure there's only one query in the user's input. This is hard because it involves parsing SQL.
The callback API isn't built to deal with multi-queries. I simply haven't bothered implementing proper handling of this case, because this has never been an issue for me.
Instead of a callback, you could use the event listener API, which will send you lower level messages, and handle this yourself.
var q = conn.query("SELECT...; SELECT...");
q.on("fields", function(fields) { ... }); // 1 time per query
q.on("row", function(row) { ... });       // 0..* times per query
q.on("end", function(status) { ... });    // 1 time per query

Can the Azure Service Bus be delayed before retrying a message?

The Azure Service Bus supports a built-in retry mechanism which makes an abandoned message immediately visible for another read attempt. I'm trying to use this mechanism to handle some transient errors, but the message is made available immediately after being abandoned.
What I would like to do is make the message invisible for a period of time after it is abandoned, preferably based on an exponentially incrementing policy.
I've tried to set the ScheduledEnqueueTimeUtc property when abandoning the message, but it doesn't seem to have an effect:
var messagingFactory = MessagingFactory.CreateFromConnectionString(...);
var receiver = messagingFactory.CreateMessageReceiver("test-queue");
receiver.OnMessageAsync(async brokeredMessage =>
{
    await brokeredMessage.AbandonAsync(
        new Dictionary<string, object>
        {
            { "ScheduledEnqueueTimeUtc", DateTime.UtcNow.AddSeconds(30) }
        });
});
I've considered not abandoning the message at all and just letting the lock expire, but this would require some way to influence how the MessageReceiver specifies the lock duration on a message, and I can't find anything in the API to let me change this value. In addition, it wouldn't be possible to read the delivery count of the message (and therefore decide how long to wait for the next retry) until after the lock has already been acquired.
Can the retry policy in the Message Bus be influenced in some way, or can a delay be artificially introduced in some other way?
Careful here, because I think you are confusing the retry feature with the automatic Complete/Abandon mechanism of the OnMessage event-driven message handling. The built-in retry mechanism comes into play when a call to the Service Bus fails. For example, if you call to set a message as complete and that call fails, then the retry mechanism kicks in. If an exception occurs in your own code while you are processing a message, that will NOT trigger a retry through the retry feature. Your question isn't explicit about whether the error comes from your code or from an attempt to contact the service bus.
If you are indeed after modifying the retry policy that applies when an error occurs while communicating with the service bus, you can modify the RetryPolicy that is set on the MessageReceiver itself. There is a RetryExponential used by default, as well as an abstract RetryPolicy you can derive your own from.
What I think you are after is more control over what happens when you get an exception during your processing, and you want to push off working on that message. There are a few options:
When you create your message handler you can set up OnMessageOptions. One of its properties is "AutoComplete". By default this is set to true, which means that as soon as processing for the message is completed, the Complete method is called automatically. If an exception occurs, Abandon is automatically called, which is what you are seeing. By setting AutoComplete to false you are required to call Complete yourself from within the message handler. Failing to do so will cause the message lock to eventually run out, which is one of the behaviors you are looking for.
So, you could write your handler so that if an exception occurs during your processing, you simply do not call Complete. The message would then remain on the queue until its lock runs out and would then become available again. The standard dead-lettering mechanism applies, and after x number of tries the message will be put into the dead-letter queue automatically.
A caution of handling this way is that any type of exception will be treated this way. You really need to think about what types of exceptions are doing this and if you really want to push off processing or not. For example, if you are calling a third party system during your processing and it gives you an exception you know is transient, great. If, however, it gives you an error that you know will be a big problem then you may decide to do something else in the system besides just bailing on the message.
You could also look at the "Defer" method. This method will prevent the message from being processed off the queue unless it is specifically pulled by its sequence number. Your code would have to remember the sequence number value and pull it. This isn't quite what you described, though.
Another option is to move away from the OnMessage, event-driven style of processing messages. While it is very helpful, you don't get a lot of control over things. Instead, hook up your own processing loop and handle the abandon/complete yourself. You'll also need to deal with some of the threading/concurrent call management that the OnMessage pattern handles for you. This can be more work, but you get the ultimate in flexibility.
Finally, I believe the reason the call you made to AbandonAsync passing the properties you wanted to modify didn't work is that those properties are referring to Metadata properties on the method, not standard properties on BrokeredMessage.
I actually asked this same question last year (implementation aside) with the three approaches I could think of from looking at the API. @ClemensVasters, who works on the SB team, responded that using Defer with some kind of re-receive is really the only way to control this precisely.
You can read my comment to his answer for a specific approach to doing it where I suggest using a secondary queue to store messages that indicate which primary messages have been deferred and need to be re-received from the main queue. Then you can control how long you wait by setting the ScheduledEnqueueTimeUtc on those secondary messages to control exactly how long you wait before you retry.
I ran into a similar issue where our order-picking system is legacy and goes into maintenance mode each night.
Using the ideas in this article (https://markheath.net/post/defer-processing-azure-service-bus-message), I created a custom property to track how many times a message has been resubmitted, and I manually dead-letter the message after 10 tries. If the message is under 10 retries, the code clones the message, increments the custom property, and schedules the clone for a later enqueue time.
using Microsoft.Azure.ServiceBus;

public PickQueue()
{
    queueClient = new QueueClient(QUEUE_CONN_STRING, QUEUE_NAME);
}

public async Task QueueMessageAsync(int OrderId)
{
    string body = JsonConvert.SerializeObject(OrderId);
    var message = new Message(Encoding.UTF8.GetBytes(body));
    await queueClient.SendAsync(message);
}

public async Task ReQueueMessageAsync(Message message, DateTime utcEnqueueTime)
{
    int resubmitCount = message.UserProperties.ContainsKey("ResubmitCount")
        ? (int)message.UserProperties["ResubmitCount"] + 1
        : 1;
    if (resubmitCount > 10)
    {
        await queueClient.DeadLetterAsync(message.SystemProperties.LockToken);
    }
    else
    {
        Message clone = message.Clone();
        clone.UserProperties["ResubmitCount"] = resubmitCount; // already incremented above
        // schedule the clone, not the original, then settle the original
        await queueClient.ScheduleMessageAsync(clone, utcEnqueueTime);
        await queueClient.CompleteAsync(message.SystemProperties.LockToken);
    }
}
This question asks how to implement exponential backoff in Azure Functions. If you do not want to use the built-in RetryPolicy (only available when autoComplete = false), here's the solution I've been using:
public static async Task ExceptionHandler(IMessageSession MessageSession, string LockToken, int DeliveryCount)
{
    if (DeliveryCount < Globals.MaxDeliveryCount)
    {
        var DelaySeconds = Math.Pow(Globals.ExponentialBackoff, DeliveryCount);
        await Task.Delay(TimeSpan.FromSeconds(DelaySeconds));
        await MessageSession.AbandonAsync(LockToken);
    }
    else
    {
        await MessageSession.DeadLetterAsync(LockToken);
    }
}

How to avoid the need to delay event emission to the next tick of the event loop?

I'm writing a Node.js application using a global event emitter. In other words, my application is built entirely around events. I find that this kind of architecture works extremely well for me, with the exception of one edge case, which I will describe here.
Note that I do not think knowledge of Node.js is required to answer this question. Therefore I will try to keep it abstract.
Imagine the following situation:
A global event emitter (called mediator) allows individual modules to listen for application-wide events.
An HTTP server is created, accepting incoming requests.
For each incoming request, an event emitter is created to deal with events specific to this request
An example (purely to illustrate this question) of an incoming request:
mediator.on('http.request', function(request, response, emitter) {
  //deal with the new request here, e.g.:
  response.send("Hello World.");
});
So far, so good. One can now extend this application by identifying the requested URL and emitting appropriate events:
mediator.on('http.request', function(request, response, emitter) {
  //identify the requested URL
  if (request.url === '/') {
    emitter.emit('root');
  }
  else {
    emitter.emit('404');
  }
});
Following this one can write a module that will deal with a root request.
mediator.on('http.request', function(request, response, emitter) {
  //when root is requested
  emitter.once('root', function() {
    response.send('Welcome to the frontpage.');
  });
});
Seems fine, right? Actually, it is potentially broken code. The reason is that the line emitter.emit('root') may be executed before the line emitter.once('root', ...). The result is that the listener never gets executed.
One could deal with this specific situation by delaying the emission of the root event to the end of the event loop:
mediator.on('http.request', function(request, response, emitter) {
  //identify the requested URL
  if (request.url === '/') {
    process.nextTick(function() {
      emitter.emit('root');
    });
  }
  else {
    process.nextTick(function() {
      emitter.emit('404');
    });
  }
});
The reason this works is because the emission is now delayed until the current event loop has finished, and therefore all listeners have been registered.
However, there are many issues with this approach:
one of the advantages of such an event-based architecture is that emitting modules do not need to know who is listening to their events. Therefore it should not be necessary to decide whether the event emission needs to be delayed, because one cannot know who is going to listen for the event and whether they need it to be delayed or not.
it significantly clutters and complicates the code (compare the two examples)
it probably worsens performance
As a consequence, my question is: how does one avoid the need to delay event emission to the next tick of the event loop, such as in the described situation?
Update 19-01-2013
An example illustrating why this behavior is useful: allowing an HTTP request to be handled by several modules in parallel.
mediator.on('http.request', function(req, res) {
  req.onceall('json.parsed', 'validated', 'methodoverridden', 'authenticated', function() {
    //the request has now been validated, parsed as JSON, its HTTP method has been overridden when requested, and it has been authenticated
  });
});
If each event like json.parsed were to emit the original request, the above would not be possible, because each event would then relate to a different request, and you could not listen for a combination of actions executed in parallel on one specific request.
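(For reference, onceall is not a built-in EventEmitter method; a combinator with these semantics could be sketched as a plain helper:)
function onceall (emitter, names, callback) {
  var remaining = names.length;
  names.forEach(function (name) {
    emitter.once(name, function () {
      remaining -= 1;                  // one of the awaited events fired
      if (remaining === 0) callback(); // all of them have fired once
    });
  });
}

// usage: onceall(req, ['json.parsed', 'validated'], function () { ... });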
Having both a mediator which listens for events and an emitter which also listens and triggers events seems overly complicated. I'm sure there is a legit reason but my suggestion is to simplify. We use a global eventBus in our nodejs service that does something similar. For this situation, I would emit a new event.
bus.on('http:request', function(req, res) {
  if (req.url === '/')
    bus.emit('ns:root', req, res);
  else
    bus.emit('404');
});
// note the use of a namespace here to target a specific subsystem
bus.once('ns:root', function(req, res) {
  res.send('Welcome to the frontpage.');
});
It sounds like you're starting to run into some of the disadvantages of the observer pattern (as mentioned in many books/articles that describe this pattern). My solution is not ideal – assuming an ideal one exists – but:
If you can make a simplifying assumption that the event is emitted only 1 time per emitter (i.e. emitter.emit('root'); is called only once for any emitter instance), then perhaps you can write something that works like jQuery's $.ready() event.
In that case, subscribing to emitter.once('root', function() { ... }) will check whether 'root' was emitted already, and if so, will invoke the handler anyway. And if 'root' was not emitted yet, it'll defer to the normal, existing functionality.
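A minimal sketch of that idea, under the one-emission-per-event assumption (StickyEmitter, emitSticky and onceSticky are illustrative names, not an existing API):
var EventEmitter = require('events').EventEmitter;
var util = require('util');

function StickyEmitter () {
  EventEmitter.call(this);
  this._emitted = {};  // event name -> arguments of the first emission
}
util.inherits(StickyEmitter, EventEmitter);

// emit once and remember that it happened
StickyEmitter.prototype.emitSticky = function (name) {
  if (this._emitted.hasOwnProperty(name)) return;
  var args = Array.prototype.slice.call(arguments, 1);
  this._emitted[name] = args;
  this.emit.apply(this, [name].concat(args));
};

// like once(), but fires immediately if the event was already emitted
StickyEmitter.prototype.onceSticky = function (name, listener) {
  if (this._emitted.hasOwnProperty(name)) {
    listener.apply(null, this._emitted[name]);
  } else {
    this.once(name, listener);
  }
};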
That's all I got.
I think this architecture is in trouble, as you're doing sequential work (I/O) that requires a definite order of actions, yet you still plan to build the app on components that naturally allow a non-deterministic order of execution.
What you can do
Include a context selector in the mediator.on function, e.g. in this way:
mediator.on('http.request > root', function( .. ) { } )
Or define it as a submediator:
var submediator = mediator.yield('http.request > root');
submediator.on(function( ... ) {
  emitter.once('root', ... )
});
This would trigger the callback only if root was emitted from http.request handler.
Another, trickier way is to do background ordering, but it's not feasible with your current one-mediator-rules-them-all interface. Implement the code so that each .emit call does not actually send the event, but instead puts the produced event into a list. Each .once puts a consume-event record into the same list. Once all mediator.on callbacks have been executed, walk through the list and sort it into dependency order (e.g. if the list first has consume 'root' and then produce 'root', swap them). Then execute the consume handlers in order. If you run out of events, stop executing.
Oi, this seems like a very broken architecture for a few reasons:
How do you pass around request and response? It looks like you've got global references to them.
If I answer your question as asked, you will turn your server into a purely synchronous function and you'd lose the power of async node.js. (Requests would effectively be queued, and each could only start executing once the previous request is 100% finished.)
To fix this:
Pass request & response to the emit() call as parameters. Now you don't need to force everything to run synchronously anymore, because when the next component handles the event, it will have a reference to the right request & response objects.
Learn about other common solutions that don't need a global mediator. Look at the pattern that Connect was based on many Internet-years ago: http://howtonode.org/connect-it <- describes middleware/onion routing
