I am building a system that adds a new repeatable action after a POST request.
In the Nest documentation, I saw that queues are registered in modules.
So when I want to add repeatable jobs, should I create one queue and simply add a new job to it from a controller, or should I create a separate queue? If separate, how do I create it from a controller?
I'm not sure what you meant by "should I create a separated queue?". Did you mean creating a separate queue per repeatable job?
The answer depends on multiple factors:
What is the concurrency level of each of your repeatable jobs?
Does any of them have a priority?
.....
.....
As you can see, if all the jobs use the same Bull queue options, there is no reason to create additional queues.
How to create a queue: https://docs.nestjs.com/techniques/queues
Is there something specific that is unclear in their tutorial? (I used it a week ago and everything is working great in production.)
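To make the single-queue option concrete, here is a minimal sketch of what the POST handler could do: it adds one more repeatable job to an existing queue instead of creating a new queue. Everything named here is an assumption for illustration: QueueLike is a synchronous stand-in for the queue you would normally inject in Nest (its add(name, data, opts) shape with opts.repeat.every is meant to mirror Bull's, whose real add is asynchronous), and the "reminder" job, its DTO and the jobId format are hypothetical.

```typescript
// Stand-in for the injected queue; only the one method used here.
// The add(name, data, opts) shape is assumed to mirror Bull's, not copied from it.
interface RepeatOptions {
  repeat: { every: number }; // repeat interval in milliseconds
  jobId?: string;
}

interface QueueLike {
  add(name: string, data: unknown, opts: RepeatOptions): void;
}

// Hypothetical request body for the POST endpoint.
interface CreateReminderDto {
  userId: string;
  everyMinutes: number;
}

// What the controller's POST handler would do:
// no new queue, just a new repeatable job on the shared one.
function handleCreateReminder(queue: QueueLike, body: CreateReminderDto): string {
  if (!Number.isFinite(body.everyMinutes) || body.everyMinutes <= 0) {
    throw new Error("everyMinutes must be a positive number");
  }
  const jobId = `reminder:${body.userId}`;
  queue.add(
    "reminder",
    { userId: body.userId },
    { repeat: { every: body.everyMinutes * 60 * 1000 }, jobId },
  );
  return jobId;
}

// Minimal in-memory queue, only here to make the sketch runnable.
class RecordingQueue implements QueueLike {
  readonly jobs: { name: string; data: unknown; opts: RepeatOptions }[] = [];

  add(name: string, data: unknown, opts: RepeatOptions): void {
    this.jobs.push({ name, data, opts });
  }
}
```

In a real Nest app the queue would come from the module where it is registered, and the handler would await the add call. A second queue only becomes interesting once some jobs need different queue-level options than the rest.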
In a Producer-Consumer case with multiple app instances, I know I am supposed to have some type of queue for the distribution of events to the consumers. But how do I deal with the producer?
I must query a database for objects with an expired deadline every minute. That will push work to a message queue, so distribution is not a problem. My concern is that if I have multiple instances of the app, I have to make sure that only one is producing work.
Am I supposed to solve this by electing a cluster leader? Is there a common algorithm or library in Node.js for this? My guess is that I will have to reach for some magic Redis command and make my instances aware of each other.
There are always many different ways to achieve things, but my suggestion is to create an idempotent outbox table in your database, where multiple producers throw the records to be published to the message queue.
Then, you can deploy a tool like Debezium that does transaction log tailing (reads the database transaction log) and pushes the message to whatever message queue technology you're using.
Please note that it's also a good practice to implement the idempotency check on your consumers to make sure they don't process the same message twice.
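As a rough illustration of the consumer-side idempotency check, the sketch below remembers the ids of messages it has already handled and skips repeats. The Message shape and the class name are hypothetical, and the in-memory Set is only a stand-in: in a real deployment the "seen" record would have to live in shared, durable storage (ideally written together with the side effect), otherwise two consumer instances would not see each other's work.

```typescript
// Hypothetical message shape; the only requirement is a stable, unique id.
interface Message {
  id: string;
  payload: string;
}

class IdempotentConsumer {
  // Ids of messages that were already handled (in-memory stand-in for a table).
  private readonly seen = new Set<string>();
  // Stand-in for the real side effect, so the behaviour is observable.
  readonly processed: string[] = [];

  // Returns true if the message was handled now, false if it was a duplicate.
  handle(msg: Message): boolean {
    if (this.seen.has(msg.id)) {
      return false; // redelivery: do nothing
    }
    this.processed.push(msg.payload);
    this.seen.add(msg.id);
    return true;
  }
}
```

With this in place, a message that the queue delivers twice changes state only once, which is what makes at-least-once delivery safe to use.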
Wix - How We Implemented Idempotency in a Billing System at Scale
I have a problem when I scale the project (NestJS) to multiple instances. In my project, I have a crawler service that runs every 10 minutes. When 2 instances are running, the crawler runs on both of them, so the data is duplicated. Does anyone know how to handle this?
It looks like this could be handled with a queue, but I don't have a solution yet.
Jobs aren't the right construct in this case.
Instead, use a job Queue: https://docs.nestjs.com/techniques/queues
You won't even need to set up a separate worker server to handle the jobs. Instead, add Redis (or similar) to your setup, configure a queue to use it, then set up 1) a producer module to add jobs to the queue whenever they need to be run, 2) a consumer module which will pull jobs off the queue and process them. Add logic into the producer module to ensure that duplicate jobs aren't being created, if that logic is running on both your machines.
Alternatively, it may be easier just to move job production/processing onto a separate server.
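One way to sketch the producer-side duplicate check: every instance derives the same job id from the current 10-minute window, and the queue refuses an id it has already seen. The names crawlJobId and DedupingQueue are hypothetical, and the in-memory queue is just a stand-in for a Redis-backed one. As far as I know, Bull's jobId option behaves similarly (a job whose id already exists is not added), but treat that as an assumption and check the docs for your version.

```typescript
const TEN_MINUTES_MS = 10 * 60 * 1000;

// Every instance computes the same id for the same time window,
// so it does not matter which instance (or how many) tries to enqueue.
function crawlJobId(nowMs: number, intervalMs: number = TEN_MINUTES_MS): string {
  const slot = Math.floor(nowMs / intervalMs);
  return `crawl:${slot}`;
}

// In-memory stand-in for a shared queue that rejects duplicate job ids.
class DedupingQueue {
  private readonly ids = new Set<string>();
  readonly pending: string[] = [];

  // Returns true if the job was added, false if that id was already known.
  add(jobId: string): boolean {
    if (this.ids.has(jobId)) {
      return false;
    }
    this.ids.add(jobId);
    this.pending.push(jobId);
    return true;
  }
}
```

Both instances can then keep their own 10-minute timer and call add(crawlJobId(Date.now())); only the first call per window results in a job, and only one consumer picks it up.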
I have a Node.js project that exposes a simple REST API for an external web application. This webhook must cope with a large number of requests per second and return 200 OK to the caller very quickly. To achieve that, I am investigating a simple Redis queue: each request would be enqueued and handled asynchronously later on (by a consumer thread).
The Redis simple queue seems like an easy way to achieve this task (https://github.com/smrchy/rsmq).
1) Is rsmq.receiveMessage() { ....... } a blocking method? If this handler is slow, will it impact my server's performance?
2) If the answer to question 1 is yes, is it recommended to move the consumption of the messages to an external microservice (a dedicated consumer)? What are the best practices for creating multi-threaded consumers in such an environment?
You can use the pub/sub feature provided by Redis: https://redis.io/topics/pubsub
You can publish to various channels without any knowledge of the subscribers. Subscribers can subscribe to the channels they wish.
1) No, it won't block the event loop, however you will only start processing a second message once you call the "next" method, i.e., you will process one message at a time. To overcome this, you can start multiple workers in parallel. Take a look here: https://stackoverflow.com/a/45984677/7201847
2) That's an architectural decision that depends on the load you have to support and the hardware capacity you have. I would recommend at least two Node.js processes, one for adding the messages to the queue and another one for actually processing them, with the option to start additional worker processes if needed, depending on the results of your performance tests.
I am working on a Laravel project in which the customers of the application import data into it through various other APIs.
I am thinking of creating a job for each kind of data that needs to be imported. But that would mean many customers dispatching jobs that have to be handled by the queue workers. If one customer has dispatched a job to import a certain kind of data and another customer dispatches a job to import the same kind of data, the second customer has to wait until the first customer's job completes before the queue worker starts on theirs. We cannot have 100 customers waiting for each other.
So what would be a viable solution for this kind of import? Should I put the jobs on hashed queues and then call those queues? Or is there a better way to handle this? Has anyone worked on an application where customers import data from various APIs asynchronously, and how is that best handled?
You're pointing in the right direction. You need a queue worker such as Laravel's, but you need to choose an async driver. You can use the built-in drivers or install your own queue server such as RabbitMQ. Here is a nice package with a RabbitMQ driver for Laravel 5:
https://github.com/vladimir-yuldashev/laravel-queue-rabbitmq
This puts all your customers' requests into the async queue, with a separate thread for each one. These queue systems work very efficiently, and you could call this a kind of async PHP. To be more asynchronous inside the PHP code itself, you can use the Guzzle HTTP package, which provides async requests (entirely from within the PHP code, on the server side of the application).
I have two instances of a worker role.
I want to run a sub-task (on a Thread Pool thread) only on one of the Worker Role instances.
My initial idea was to do something like this:
ThreadPool.QueueUserWorkItem((o) =>
{
    if (RoleEnvironment.CurrentRoleInstance.Id == RoleEnvironment.Roles[RoleEnvironment.CurrentRoleInstance.Role.Name].Instances.First().Id)
    {
        emailWorker.Start();
    }
});
However, the above code relies on the Role.Instances collection always returning the instances in the same order. Is this the case? Or can the items be returned in any order?
Is there another approved way of running a task on one role instance only?
Joe, the solution you are looking for typically relies on one of the following:
either acquiring a lease (similar to a lock, but with an expiration) on a specific blob, using the Blob Storage as a synchronization point between your role instances,
or queuing / dequeuing a message from the Queue Storage, which is usually the suggested pattern to delay long-running operations such as sending an email.
Either way, you need to go through the Azure Storage to make it work. I suggest having a look at Lokad.Cloud, as we have designed this open-source framework precisely to handle this sort of situation.
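The lease idea can be sketched independently of Azure: a lease is a lock with an expiry, so if the holder dies, another instance can take over once the lease runs out. The LeaseStore below is a hypothetical in-memory model written in TypeScript purely to show the rule; it is not the Blob Storage API, and a real implementation needs shared storage with an atomic acquire (a blob lease, or for example Redis SET with NX and PX).

```typescript
interface Lease {
  owner: string;
  expiresAt: number; // timestamp in milliseconds
}

// In-memory model of a lease store. The clock is passed in explicitly
// so the behaviour is easy to follow and to test.
class LeaseStore {
  private readonly leases = new Map<string, Lease>();

  // Succeeds if the lease is free, expired, or already held by this owner
  // (in which case it is renewed). Fails while another owner holds it.
  tryAcquire(key: string, owner: string, ttlMs: number, nowMs: number): boolean {
    const current = this.leases.get(key);
    if (current && current.expiresAt > nowMs && current.owner !== owner) {
      return false;
    }
    this.leases.set(key, { owner, expiresAt: nowMs + ttlMs });
    return true;
  }
}
```

Each instance would call tryAcquire periodically with its own instance id and start the sub-task only while the call keeps succeeding. That avoids depending on the order of the Instances collection.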
If they need to be doing different things, then it sounds to me like you don't have 2 instances of a single worker role. In reality you have 2 different worker roles.
Especially when looking at the scalability of your application, processes need to be able to run on more than one instance. What happens when that task that you only want to run on one role gets large enough that it needs to scale to 2 or more role instances?
One of the benefits of developing for Azure is that you get scalability automatically if you design your app properly. It makes you do extra work to get something that's not scalable, which is what you're trying to do.
What triggers this task to be started? If you use a message on Queue Storage (as suggested by Joannes) then only one worker role will pick up the message and process it and it doesn't matter which instance of your worker role does that.
So for now if you've got one worker role that's doing the sub task and another worker role that's doing everything else, just add 2 worker roles to your Azure solution. However, even if you do that, the worker role that processes the sub task should be written in such a way that if you ever scale it to run more than a single instance that it will run properly. In that case, you might as well stick with the single worker role and code for processing messages off the Queue to start your sub task.