Solution for user-specific background job queues - node.js

I have been researching how to efficiently solve the following use case and I am struggling to find the best solution.
Basically I have a Node.js REST API which handles requests for users from a mobile application. We want some requests to launch background tasks outside of the req/res flow because they are CPU intensive or might just take a while to execute. We are trying to implement or use any existing frameworks which are able to handle different job queues in the following way (or at least compatible with the use case):
Every user has their own set job queues (there are different kind of jobs).
The jobs within one specific queue have to be executed sequentially and only one job at a time but everything else can be executed in parallel (it would be preferable if there are no queues hogging the workers or whatever is actually consuming the tasks so all queues get more or less the same priority).
Some queues might fill up with hundreds of tasks at a given time but most likely they will be empty a lot of the time.
Queues need to be persistent.
We currently have a solution with RabbitMQ with one queue for every kind of task which all the users share. The users dump tasks into the same queues which results in them filling up with tasks from a specific user for a long time and having the rest of users wait for those tasks to be done before their own start to be consumed. We have looked into priority queues but we don't think that's the way to go for our own use case.
The first somewhat logical solution we thought of is to create temporary queues whenever a user needs to run background jobs and have them be deleted when empty. Nevertheless we are not sure if having that many queues is scalable and we are also struggling with dynamically creating RabbitMQ queues, exchanges, etc (we have even read somewhere that it might be an anti-pattern?).
We have been doing some more research and maybe the way to go would be with other stuff such as Kafka or Redis based stuff like BullMQ or similar.
What would you recommend?

If you're on AWS, have you considered SQS? There is no limit on number of standard queues created, and in flight messages can reach up to 120k. This would seem to satisfy your requirements above.

While the mentioned SQS solution did prove to be very scalable our amount of polling we would need to do or use of SNS did not make the solution optimal. On the other hand implementing a self made solution via database polling was too much for our use case and we did not have the time or computational resources to consider a new database in our stack.
Luckily, we ended up finding that the Pro version of BullMQ does have a "Group" functionality which performs a round robin strategy for different tasks within a single queue. This ended up adjusting perfectly to our use case and is what we ended up using.

Related

BullMQ - add jobs to queues from different service to worker

I'm trying to figure out the best architecture for a scalable BullMQ implementation. We have a number of different services that are going to be feeding jobs into queues. In some situations we may have multiple different services feeding jobs into the same queue.
Initially I had thought to contain all BullMQ implementation on a single instance and stand up a simple API with an endpoint that can receive jobs to be added to the queue. So for any service that wants to add a job to a queue, they just hit a specific endpoint and the job gets added to the queue.
I was wondering though whether an alternative approach could be to instantiate a BullMQ queue on the various services that want to add jobs to queues, and then just have the workers located on a separate service to pick up jobs from the queue when they are ready for execution? This 'worker box' can then horizontally scale up as required.
If this approach is possible, I have concerns about what the implications may be of having multiple services adding jobs to the same queue - can this cause issues or is BullMQ designed to handle such a situation?
I'm finding it difficult to find information about what standard 'best-practice' approaches are for BullMQ implementation. Any guidance greatly appreciated. Thanks.

Task scheduling behind multiple instances

Currently I am solving an engineering problem, and want to open the conversation to the SO community.
I want to implement a task scheduler. I have two separate instances of a nodeJS application sitting behind an elastic load balancer (ELB). The problem is when both instances come up, they try to execute the same tasks logic, causing the tasks run more than once.
My current solution is to use node-schedule to schedule tasks to run, then have them referencing the database to check if the task hasn't already been run since it's specified run time interval.
The logic here is a little messy, and I am wondering if there is a more elegant way I could go about doing this.
Perhaps it is possible to set a particular env variable on a specific instance - so that only that instance will run the tasks.
What do you all think?
What you are describing appears to be a perfect example of a use case for AWS Simple Queue Service.
https://aws.amazon.com/sqs/details/
Key points to look out for in your solution:
Make sure that you pick a visibility timeout that is reflective of your workload (so messages don't reenter the queue whilst still in process by another worker)
Don't store your workload in the message, reference it! A message can only be up to 256kb in size and message sizes have an impact on performance and cost.
Make sure you understand billing! As billing is charged in 64KB chunks, meaning 1 220KB message is charged as 4x 64KB chucks / requests.
If you make your messages small, you can save more money by doing batch requests as your bang for buck will be far greater!
Use longpolling to retrieve messages to get the most value out of your message requests.
Grant your application permissions to SQS by the use of an EC2 IAM Role, as this is the best security practice and the recommended approach by AWS.
It's an excellent service, and should resolve your current need nicely.
Thanks!
Xavier.

AWS multiple SQS queues and workers optimal design

I have following task to implement using AWS stack:
One job is triggered periodically and put message to queue (SQS). Worker recieves this task and based on it additional tasks need to be created (approximately 1-10 K tasks). And all these tasks are also put to another queue and there are additional workers to process these tasks.
These flow can be described displayed in following way:
Periodic task ->SQS->woker_1(creates more tasks) -> SQS -> workers_2
Based on project conventions and bureaucracy it will take some time to create two separate services for worker_1 that listen to periodic task and creates fine grained tasks and for workers_2 that just process particular tasks, make docker images, CI jobs etc... and get deploy it.
So, here is the tradeof:
1. Spend additional time and create two separate services. On the other hand these services might be really simple. And even there is a doubt to have 2 separate projects.
2. Make this as a one service that put messages to the same queue and also will listen to the messages on the same queue and perorm work for: worker_1 and worker_2.
Any suggestions or thoughts are appreciated!
I don't think there can be a "correct" answer to this, you already have a good list of pros and cons for both options. Some additional things I thought of:
SQS queues don't really allow you to pick out specific types of messages, you pretty much need to read everything first-in-first-out. So if you share queues, you may have less control of prioritizing messages.
For the two services to interact, they need a shared message definition. Sharing the same codebase would make it easier to dev and test the messaging code. Of course, it could also be a shared library.
Deploying both worker types in the same server/application would share resources, which might be more economical at the low end, or it might be confusing at high scale.
It may be possible to develop all the code into the same application, and leave the decision to deployment-time if it is all on the same server and queue or separate servers reading from separate queues. This seems ideal to me.

Handle long-running processes in NodeJS?

I've seen some older posts touching on this topic but I wanted to know what the current, modern approach is.
The use case is: (1) assume you want to do a long running task on a video file, say 60 seconds long, say jspm install that can take up to 60 seconds. (2) you can NOT subdivide the task.
Other requirements include:
need to know when a task finishes
nice to be able to stop a running task
stability: if one task dies, it doesn't bring down the server
needs to be able to handle 100s of simultaneous requests
I've seen these solutions mentioned:
nodejs child process
webworkers
fibers - not used for CPU-bound tasks
generators - not used for CPU-bound tasks
https://adambom.github.io/parallel.js/
https://github.com/xk/node-threads-a-gogo
any others?
Which is the modern, standard-based approach? Also, if nodejs isn't suited for this type of task, then that's also a valid answer.
The short answer is: Depends
If you mean a nodejs server, then the answer is no for this use case. Nodejs's single-thread event can't handle CPU-bound tasks, so it makes sense to outsource the work to another process or thread. However, for this use case where the CPU-bound task runs for a long time, it makes sense to find some way of queueing tasks... i.e., it makes sense to use a worker queue.
However, for this particular use case of running JS code (jspm API), it makes sense to use a worker queue that uses nodejs. Hence, the solution is: (1) use a nodejs server that does nothing but queue tasks in the worker queue. (2) use a nodejs worker queue (like kue) to do the actual work. Use cluster to spread the work across different CPUs. The result is a simple, single server that can handle hundreds of requests (w/o choking). (Well, almost, see the note below...)
Note:
the above solution uses processes. I did not investigate thread solutions because it seems that these have fallen out of favor for node.
the worker queue + cluster give you the equivalent of a thread pool.
yea, in the worst case, the 100th parallel request will take 25 minutes to complete on a 4-core machine. The solution is to spin up another worker queue server (if I'm not mistaken, with a db-backed worker queue like kue this is trivial---just make each point server point to the same db).
You're mentioning a CPU-bound task, and a long-running one, that's definitely not a node.js thing. You also mention hundreds of simultaneous tasks.
You might take a look at something like Gearman job server for things like that - it's a dedicated solution.
Alternatively, you can still have Node.js manage the requests, just not do the actual job execution.
If it's relatively acceptable to have lower then optimal performance, and you want to keep your code in JavaScript, you can still do it, but you should have some sort of job queue - something like Redis or RabbitMQ comes to mind.
I think job queue will be a must-have requirement for long-running, hundreds/sec tasks, regardless of your runtime. Except if you can spawn this job on other servers/services/machines - then you don't care, your Node.js API is just a front and management layer for the job cluster, then Node.js is perfectly ok for the job, and you need to focus on that job cluster, and you could then make a better question.
Now, node.js can still be useful for you here, it can help manage and hold those hundreds of tasks, depending where they come from (ie. you might only allow requests to go through to your job server for certain users, or limit the "pause" functionality to others etc.
Easily perform Concurrent Execution to LongRunning Processes using Simple ConcurrentQueue. Feel free to improve and share feedback.
👨🏻‍💻 Create your own Custom ConcurrentExecutor and set your concurrency limit.
🔥 Boom you got all your long-running processes run in concurrent mode.
For Understanding you can have a look:
Concurrent Process Executor Queue

RabbitMQ/AMQP/Gearman distributing workload based on job type and grouping

I am working on a system that has lots of tasks that are perfect for queueing and has some existing home made legacy solutions already in place that work to varying degrees, I am familiar with gearman and have read through the RabbitMQ tutorials and am keen to upgrade the current solutions to use one of these more robust existing solutions (leaning towards rabbitMQ atm because of the flexibility and scalability and the management plugin).
I am having trouble understanding how to address a problem that allows user A to queue up a large number of a jobs (lets say 5000) of type A which then blocks the processing of any newly added jobs of type A until user A's jobs are done. Id like to implement a solution that will fairly share the load, or even just round-robin between the queued users.
Does anyone have any suggestions or insights into how I might implement a solution to this ?
I thought routing_keys might help but if User A's jobs are queued before User B adds their jobs then they still wont be processed until User A's jobs have been consumed ?
I have also thought of creating a queue for each user & jobtype but I am unsure how to do this dynamically ?
Perhaps I need to implement some sort of control queue that sets up queues and dynamically adjusts the worker processes to consume the newly added user only queue, but would the worker collect the jobs from the queues in a round-robin type way ? And how would I decide when to remove the queues ?
thanks in advance for any help !
Ok no comments from anyone so in the end I figured out that in rabbitmq you can consume from multiple queues in a round robin type fashion. So I built a queue that informs consumer workers to consume from a queue and dynamically create a queue for each users tasks, that are periodically deleted when empty.

Resources