Is it safe to spawn multiple jobs from within a job, so that idle workers can start working on them?
Currently my setup is like this: I have 20 workers waiting for jobs to be pushed. One of the jobs is to send iOS push notifications, and the problem with iOS is that you can't send bulk messages.
Current: What I built is a job that gets a list of specific users in batches, fetches each device token from my DB, and starts sending notifications.
Scenario: If one topic has 1000 users, I have to get all 1000 users and their devices and then send to each device. This pushes a single new job onto my queue, and one worker picks it up while the other workers sit idle waiting for incoming jobs. If no other jobs arrive for a while, worker 1 has to do all of the sending alone.
What I'm considering right now: is it safe if that one big job instead creates other jobs, so that the idle workers can pick them up and share the work?
P.S. All jobs run in one tube.
That sounds quite reasonable to me, spreading the load out among a number of workers.
There are some things I would be careful about, such as setting an appropriate priority. If the task that creates dozens or hundreds more tasks has a higher priority than the jobs that actually do the sending, then you could quickly end up with potentially hundreds of thousands of jobs queued while the workers aren't running them, and the queue would keep filling up.
Leaving large gaps between the priorities means you can also slot in jobs that are really important. A more important customer could have a priority closer to zero, and hence be processed and sent ahead of a smaller customer.
Other matters to think about include the account being rate-limited: if you were limited to, say, 10 notifications per second, running 20 workers would be a non-starter.
I would also put the new group of jobs into a new tube (running dozens of tubes is not expensive). You can watch a number of tubes at once (getting the most important job from any of them), but you can't count different types of job within a single tube, so splitting the types into different tubes lets you easily see how many jobs of each type are waiting. Thus, if the sending jobs are building up, you could slow the creation of splitting jobs for a while, or mark them with an even lower priority.
Finally, to keep some of the advantage of batching and avoid per-job overhead, rather than creating 1000+ single-notification jobs I'd split the work into packets of maybe 25-50 notifications per job, as in the sketch below.
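For concreteness, here's a minimal sketch of that splitter job in Python using the greenstalk beanstalkd client (the client choice, tube name, priorities, and batch size are assumptions for illustration, not anything from the original setup):

```python
# Hypothetical splitter: fans one topic's users out into small send jobs.
import json
import greenstalk

client = greenstalk.Client(("127.0.0.1", 11300))

def split_topic(user_ids, batch_size=25):
    # Put send jobs in their own tube so they can be counted separately.
    client.use("notification-send")
    for i in range(0, len(user_ids), batch_size):
        batch = user_ids[i:i + batch_size]
        # beanstalkd treats lower numbers as more urgent, so priority 100
        # runs ahead of splitter jobs (e.g. priority 1000) and leaves a
        # gap below 100 for really important customers.
        client.put(json.dumps({"user_ids": batch}), priority=100)
```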
Related
I have been researching how to efficiently solve the following use case and I am struggling to find the best solution.
Basically I have a Node.js REST API which handles requests from users of a mobile application. We want some requests to launch background tasks outside the req/res flow, because they are CPU-intensive or might just take a while to execute. We are trying to implement, or use an existing framework, that can handle different job queues in the following way (or at least is compatible with the use case):
Every user has their own set of job queues (there are different kinds of jobs).
The jobs within one specific queue have to be executed sequentially, one job at a time, but everything else can be executed in parallel (preferably with no queue hogging the workers, so that all queues get more or less the same priority).
Some queues might fill up with hundreds of tasks at a given time, but most of the time they will be empty.
Queues need to be persistent.
We currently have a solution with RabbitMQ, with one queue for every kind of task, shared by all users. Users dump tasks into the same queues, which results in a queue filling up with tasks from a single user for a long time, while the rest of the users wait for those tasks to be done before their own start being consumed. We have looked into priority queues, but we don't think that's the way to go for our use case.
The first somewhat logical solution we thought of is to create temporary queues whenever a user needs to run background jobs, and have them deleted when empty. Nevertheless, we are not sure whether having that many queues is scalable, and we are also struggling with dynamically creating RabbitMQ queues, exchanges, etc. (we have even read somewhere that it might be an anti-pattern?).
We have been doing some more research, and maybe the way to go would be something like Kafka, or a Redis-based queue such as BullMQ.
What would you recommend?
If you're on AWS, have you considered SQS? There is no limit on the number of standard queues you can create, and in-flight messages can reach up to 120k. This would seem to satisfy your requirements above.
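If it helps, a minimal boto3 sketch of the per-user-queue idea (the queue naming scheme and payload are illustrative; create_queue is idempotent when called again with the same name and attributes):

```python
import boto3

sqs = boto3.client("sqs")

def enqueue_for_user(user_id, payload):
    # One standard queue per user, created on demand; SQS places no limit
    # on the number of standard queues. Queue name is a made-up convention.
    queue = sqs.create_queue(QueueName=f"jobs-user-{user_id}")
    sqs.send_message(QueueUrl=queue["QueueUrl"], MessageBody=payload)
```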
While the suggested SQS solution did prove to be very scalable, the amount of polling we would need to do (or the use of SNS) made it less than optimal for us. On the other hand, implementing a home-made solution via database polling was too much for our use case, and we did not have the time or computational resources to consider adding a new database to our stack.
Luckily, we found that the Pro version of BullMQ has a "Groups" functionality that round-robins between different groups of tasks within a single queue. This fit our use case perfectly and is what we ended up using.
I want to be able to process background jobs that have multiple tasks associated with them. Tasks may consist of launching API requests (blocking operations) and manipulating and persisting the responses. Some of these tasks could also have subtasks that must be executed asynchronously.
For a language such as Ruby, I might use a worker to execute the jobs. As I understand it, every time a new job arrives on the queue, a new thread executes it. As I mentioned before, sometimes a task contains a series of subtasks to be executed asynchronously, so as I see it, I have two options:
Add the subtask execution to the worker queue (but a job could easily have lots of subtasks, which would fill the queue fast and block new jobs from being processed).
Use an event-driven Node server to handle a job's execution. I would not need to add subtasks to a queue, as a single Node server could handle a job's entire execution asynchronously, as in the sketch below. Is there something wrong with doing this?
This is the first time I have encountered this kind of problem, and I want to know which approach is better suited to solve my issue.
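To make the second option concrete: the question mentions Node, but the same event-loop shape in Python with asyncio looks roughly like this (the subtask names and the sleep stand-in are illustrative):

```python
import asyncio

async def run_subtask(name):
    # Stand-in for a blocking API request and persisting the response.
    await asyncio.sleep(1)
    return f"{name} done"

async def run_job(job):
    # All subtasks of one job run concurrently inside this single worker,
    # so they never land on the shared queue.
    results = await asyncio.gather(*(run_subtask(t) for t in job["subtasks"]))
    print(results)

asyncio.run(run_job({"subtasks": ["fetch-a", "fetch-b", "persist"]}))
```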
I am putting together a Celery-based data ingestion pipeline. One thing I do not see anywhere in the documentation is how to build a flow where workers only run when there is work to be done (honestly, this seems like a major flaw in the design of Celery).
I understand Celery itself won't handle autoscaling of actual servers, and that's fine, but when I simulate this, Flower doesn't see the work that was submitted unless the worker was online when the task was submitted. Why? I'd love a world where I'm not paying for servers unless there is actual work to be done.
Workflow:
Imagine a while loop that adds new data to be processed using the celery_app.send_task method.
I have custom code that sees there are N messages in the queue. It spins up a server and starts a Celery worker for that task.
The Celery worker comes online and does the work.
BUT.
Flower has no record of that task, even though I can see the broker has a "message", and while watching the worker's output I can see it did its thing.
If I keep the worker online and then submit a task, everything is monitored just fine and dandy.
Anyone know why?
You can use Celery autoscaling. For example, setting autoscale to 8 means Celery will fire up to 8 processes to work your queue(s). It will still have a master process sitting and waiting, though. You can also set a minimum, for example 2-8, which keeps 2 worker processes waiting but fires up more (up to 8) when needed, and then scales back down when the queue is empty.
This is the process-based autoscaler. You could use it as a reference if you want to create a cloud-based autoscaler, for example one that fires up new nodes instead of just processes.
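As a minimal sketch (the app name, broker URL, and task are placeholders), the autoscale range is given on the worker command line as max,min:

```python
# tasks.py - a hypothetical Celery app; the broker URL is an assumption.
from celery import Celery

app = Celery("pipeline", broker="redis://localhost:6379/0")

@app.task
def ingest(record):
    ...  # process one record

# Start the worker with 2 processes always running, scaling up to 8:
#   celery -A tasks worker --autoscale=8,2
```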
As for your Flower issue, it's hard to say without knowing your broker (Redis/RabbitMQ/etc.). Flower doesn't capture everything: it relies on the broker doing that, and some configurations cause the broker to delete information such as which tasks have run.
I have the following task to implement using the AWS stack:
One job is triggered periodically and puts a message on a queue (SQS). A worker receives this task, and based on it, additional tasks need to be created (approximately 1-10K tasks). All of these tasks are put onto another queue, and additional workers process them.
This flow can be displayed in the following way:
Periodic task -> SQS -> worker_1 (creates more tasks) -> SQS -> workers_2
Based on project conventions and bureaucracy, it will take some time to create two separate services (worker_1, which listens for the periodic task and creates the fine-grained tasks, and workers_2, which just processes individual tasks), build Docker images, set up CI jobs, etc., and get it all deployed.
So, here is the trade-off:
1. Spend the additional time and create two separate services. On the other hand, these services might be really simple, and it's doubtful whether two separate projects are even warranted.
2. Make this one service that puts messages onto the queue and also listens for messages on the same queue, performing the work of both worker_1 and worker_2.
Any suggestions or thoughts are appreciated!
I don't think there can be a "correct" answer to this; you already have a good list of pros and cons for both options. Some additional things I thought of:
SQS queues don't really allow you to pick out specific types of messages; you pretty much need to read everything first-in-first-out. So if you share queues, you have less control over prioritizing messages.
For the two services to interact, they need a shared message definition. Sharing the same codebase would make it easier to develop and test the messaging code. Of course, it could also be a shared library.
Deploying both worker types in the same server/application would share resources, which might be more economical at the low end, but it might get confusing at high scale.
It may be possible to develop all the code in the same application, and leave it to deployment time whether it all runs on the same server and queue or on separate servers reading from separate queues. This seems ideal to me; a rough sketch follows.
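Here is what that could look like with boto3 (the env vars, handler names, and the ROLE switch are all invented for illustration): one worker application whose queue and behaviour are chosen at deployment time.

```python
import os
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]  # set per deployment

def handle_split(body):
    ...  # worker_1: create the 1-10K fine-grained tasks on the next queue

def handle_task(body):
    ...  # workers_2: process a single fine-grained task

# The same application acts as either worker depending on configuration.
handler = handle_split if os.environ.get("ROLE") == "splitter" else handle_task

while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                               MaxNumberOfMessages=10,
                               WaitTimeSeconds=20)  # long polling
    for msg in resp.get("Messages", []):
        handler(msg["Body"])
        sqs.delete_message(QueueUrl=QUEUE_URL,
                           ReceiptHandle=msg["ReceiptHandle"])
```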
I am working on a system that has lots of tasks that are perfect for queueing. It has some home-made legacy solutions already in place that work to varying degrees. I am familiar with Gearman and have read through the RabbitMQ tutorials, and I am keen to upgrade the current solutions to one of these more robust options (leaning towards RabbitMQ at the moment because of its flexibility, scalability, and the management plugin).
I am having trouble understanding how to address a problem where user A queues up a large number of jobs (let's say 5000) of type A, which then blocks the processing of any newly added jobs of type A until user A's jobs are done. I'd like to implement a solution that shares the load fairly, or even just round-robins between the queued users.
Does anyone have any suggestions or insights into how I might implement a solution to this?
I thought routing keys might help, but if user A's jobs are queued before user B adds theirs, won't they still not be processed until user A's jobs have been consumed?
I have also thought of creating a queue for each user and job type, but I am unsure how to do this dynamically.
Perhaps I need to implement some sort of control queue that sets up queues and dynamically adjusts the worker processes to consume the newly added per-user queues. But would the workers collect jobs from the queues in a round-robin way? And how would I decide when to remove the queues?
Thanks in advance for any help!
OK, no comments from anyone, so in the end I figured out that in RabbitMQ you can consume from multiple queues in a round-robin fashion. So I built a control queue that tells consumer workers to start consuming from a queue, and I dynamically create a queue for each user's tasks, which are periodically deleted when empty. A minimal sketch of the idea is below.
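This sketch uses pika (the queue names, the control-message format, and the 60-second x-expires value are assumptions). With prefetch_count=1 the broker interleaves deliveries across all the queues a worker consumes, which approximates round-robin between users:

```python
import json
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.basic_qos(prefetch_count=1)  # one message at a time, from any queue

def handle_job(ch, method, properties, body):
    print("processing", body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

def handle_control(ch, method, properties, body):
    # A control message names a new per-user queue to start consuming.
    queue_name = json.loads(body)["queue"]
    ch.queue_declare(queue=queue_name,
                     arguments={"x-expires": 60000})  # deleted after 60s unused
    ch.basic_consume(queue=queue_name, on_message_callback=handle_job)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.queue_declare(queue="control")
channel.basic_consume(queue="control", on_message_callback=handle_control)
channel.start_consuming()
```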