How does Node.js handle jobs in parallel? [Bull v3] - node.js

I have started experimenting with Node.js lately.
I'm currently setting up an app which will handle multiple queues in parallel, using the bull library to run heavy jobs in the background.
I'm looking for an answer which I hope I did not miss in the documentation.
**It's still kind of "blurry" to me how this library handles those tasks in parallel.**
So I have the following scenario:
2 jobs are running at the same time; both of them are heavy and
take some time to finish.
During the run of those 2 jobs, I can still use the rest of the
application - the event loop is not blocked.
What I take from that is that something else probably handles those 2 jobs, since JavaScript is single-threaded. What is it?
Any guidance or any advice will be highly appreciated!
https://github.com/OptimalBits/bull
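(For context: Bull can run processors either inline, on the queue process's own event loop, or - when you pass a file path instead of a function - as "sandboxed processors" in separate child processes, which is what keeps heavy jobs from blocking the app. A minimal sketch, assuming a local Redis; doHeavyWork is a placeholder:)

```js
// queue.js - minimal Bull v3 sketch, assuming Redis on 127.0.0.1:6379
const Queue = require('bull');

const heavyQueue = new Queue('heavy-jobs', 'redis://127.0.0.1:6379');

// A file path (instead of an inline function) makes Bull run each job in a
// sandboxed child process, so the main event loop stays free.
// The number is the concurrency: up to 2 jobs may run at the same time.
heavyQueue.process(2, __dirname + '/processor.js');

heavyQueue.add({ videoId: 42 }); // enqueue a job
```

```js
// processor.js - executed by Bull inside a child process
module.exports = function (job) {
  return doHeavyWork(job.data); // doHeavyWork is a placeholder
};
```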

Related

How to use nodejs & socket.io server for multi user in a non-blocking way?

I've been working with NodeJs for a while & it was doing a really good job when there was only I/O. And then I faced this challenge.
We have a game in which, on average, 250 users are playing at any given time. Currently its back-end server runs on Java, but now we want to convert it to NodeJs.
So we were doing fine until we reached the game engine, where there are many CPU-bound jobs. While one user is being served such a CPU-bound request, all others are blocked. I know this is expected behaviour, so we tested all the known solutions to this problem before abandoning the project.
We tried the following:
callbacks
thread and threadPool from the node module webworker-threads
a separate js file for all CPU-bound jobs, run via process.exec
cluster
splitting each piece into a different module
But except for process.exec & cluster, all were in vain. And of those two, cluster is also too unpredictable: it happened that multiple requests were assigned to one worker with a CPU-bound job at the front, and then we hit the same issue again.
Only process.exec is working well. But we have so many CPU-bound tasks that creating a separate file for each of them would be a mess.
So I want to know whether it is possible in NodeJs at all. If anyone in the stack-overflow community has faced this issue and solved it, or wants to suggest a solution, a big thanks to all of them...
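For what it's worth, one common way to avoid a file per task is a single forked worker that dispatches on a task name. A rough sketch, using only child_process.fork and process messaging (file names, task names, and the routines behind them are illustrative):

```js
// main.js - fork one generic worker per heavy request
const { fork } = require('child_process');

function runHeavyTask(name, payload) {
  return new Promise((resolve, reject) => {
    const worker = fork(__dirname + '/heavy-worker.js');
    worker.once('message', (result) => {
      resolve(result);
      worker.kill();
    });
    worker.once('error', reject);
    worker.send({ name, payload });
  });
}

// usage: the event loop stays free while the child burns CPU
runHeavyTask('pathfinding', { from: 'A', to: 'B' }).then(console.log);
```

```js
// heavy-worker.js - every CPU-bound routine behind one dispatch table
const tasks = {
  pathfinding: (payload) => findPath(payload),    // placeholder routines:
  matchScoring: (payload) => scoreMatch(payload)  // replace with real work
};

process.on('message', ({ name, payload }) => {
  process.send(tasks[name](payload));
  process.exit(0);
});
```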

Handle long-running processes in NodeJS?

I've seen some older posts touching on this topic but I wanted to know what the current, modern approach is.
The use case is: (1) assume you want to run a long task (say, processing a video file, or a jspm install) that can take up to 60 seconds. (2) you can NOT subdivide the task.
Other requirements include:
need to know when a task finishes
nice to be able to stop a running task
stability: if one task dies, it doesn't bring down the server
needs to be able to handle 100s of simultaneous requests
I've seen these solutions mentioned:
nodejs child process
webworkers
fibers - not suited to CPU-bound tasks
generators - not suited to CPU-bound tasks
https://adambom.github.io/parallel.js/
https://github.com/xk/node-threads-a-gogo
any others?
Which is the modern, standard-based approach? Also, if nodejs isn't suited for this type of task, then that's also a valid answer.
The short answer is: it depends.
If you mean a nodejs server, then the answer is no for this use case. Nodejs's single-threaded event loop can't handle CPU-bound tasks, so it makes sense to outsource the work to another process or thread. And since in this use case the CPU-bound task runs for a long time, it makes sense to find some way of queueing tasks... i.e., it makes sense to use a worker queue.
Furthermore, since this particular use case runs JS code (the jspm API), it makes sense for the worker queue itself to use nodejs. Hence, the solution is: (1) use a nodejs server that does nothing but queue tasks in the worker queue. (2) use a nodejs worker queue (like kue) to do the actual work, and use cluster to spread the work across different CPUs; a rough sketch follows the notes below. The result is a simple, single server that can handle hundreds of requests (w/o choking). (Well, almost, see the note below...)
Note:
the above solution uses processes. I did not investigate thread solutions because it seems that these have fallen out of favor for node.
the worker queue + cluster give you the equivalent of a thread pool.
yea, in the worst case, the 100th parallel request will take 25 minutes to complete on a 4-core machine (100 jobs x 60 seconds each, 4 at a time, is 1500 seconds). The solution is to spin up another worker-queue server (if I'm not mistaken, with a db-backed worker queue like kue this is trivial - just make each server point to the same db).
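A rough sketch of the kue + cluster layout described above, assuming Redis on localhost (the job type and the runLongTask helper are illustrative):

```js
// server.js - the master enqueues; one forked worker per core processes
const kue = require('kue');
const cluster = require('cluster');
const os = require('os');

const queue = kue.createQueue(); // connects to local Redis by default

if (cluster.isMaster) {
  for (let i = 0; i < os.cpus().length; i++) cluster.fork();
  // the web server only enqueues, it never does the heavy work itself:
  queue.create('long-task', { videoId: 1 }).save();
} else {
  // each worker pulls jobs off the queue one at a time
  queue.process('long-task', (job, done) => {
    runLongTask(job.data, done); // placeholder for the 60-second task
  });
}
```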
You're mentioning a CPU-bound task, and a long-running one - that's definitely not a node.js thing. You also mention hundreds of simultaneous tasks.
You might take a look at something like Gearman job server for things like that - it's a dedicated solution.
Alternatively, you can still have Node.js manage the requests, just not do the actual job execution.
If it's relatively acceptable to have lower-than-optimal performance, and you want to keep your code in JavaScript, you can still do it, but you should have some sort of job queue - something like Redis or RabbitMQ comes to mind.
I think a job queue will be a must-have requirement for long-running tasks arriving hundreds at a time, regardless of your runtime. Except if you can spawn these jobs on other servers/services/machines - then you don't care: your Node.js API is just a front and management layer for the job cluster, Node.js is perfectly OK for that job, and you need to focus on the job cluster (and could then ask a better question).
Now, node.js can still be useful for you here; it can help manage and hold those hundreds of tasks, depending on where they come from (i.e. you might only allow requests through to your job server for certain users, or limit the "pause" functionality to others, etc.).
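For example, with RabbitMQ via amqplib, the Node.js API process only enqueues jobs, while separate worker processes consume them one at a time. A minimal sketch, assuming a local broker (the queue name and the runJob helper are illustrative):

```js
const amqp = require('amqplib');

// API side: accept the request, enqueue the job, respond immediately
async function enqueueJob(payload) {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('jobs', { durable: true });
  ch.sendToQueue('jobs', Buffer.from(JSON.stringify(payload)), { persistent: true });
  await ch.close();
  await conn.close();
}

// worker side: prefetch(1) means no new job is delivered until the current one is acked
async function startWorker() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('jobs', { durable: true });
  await ch.prefetch(1);
  await ch.consume('jobs', async (msg) => {
    await runJob(JSON.parse(msg.content.toString())); // runJob is a placeholder
    ch.ack(msg);
  });
}
```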
Easily run long-running processes concurrently using a simple ConcurrentQueue. Feel free to improve and share feedback.
👨🏻‍💻 Create your own custom ConcurrentExecutor and set your concurrency limit.
🔥 Boom, you've got all your long-running processes running in concurrent mode.
For understanding, you can have a look:
Concurrent Process Executor Queue
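In the same spirit, a minimal home-grown concurrency limiter might look like this (the class and method names are illustrative, not the linked package's API; note an in-process limiter only bounds how many tasks are in flight - CPU-bound work would still need child processes underneath):

```js
// A tiny concurrency-limited executor: at most `limit` tasks run at once.
class ConcurrentQueue {
  constructor(limit) {
    this.limit = limit;
    this.running = 0;
    this.pending = [];
  }

  // task is a function returning a Promise
  push(task) {
    return new Promise((resolve, reject) => {
      this.pending.push({ task, resolve, reject });
      this._drain();
    });
  }

  _drain() {
    while (this.running < this.limit && this.pending.length > 0) {
      const { task, resolve, reject } = this.pending.shift();
      this.running++;
      task()
        .then(resolve, reject)
        .finally(() => { this.running--; this._drain(); });
    }
  }
}

// usage: at most 3 long-running processes in flight at once
const q = new ConcurrentQueue(3);
q.push(() => encodeVideo('a.mp4')); // encodeVideo is a placeholder
```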

Equivalent of Celery in Node JS

Please suggest an equivalent of Celery in Node JS to run asynchronous tasks.
I have been able to find the following:
Later
Kue
coffee-resque
cron
node-celery
I want to run both manual and scheduled tasks in the background and interact with MongoDB.
node-celery uses Redis and not MongoDB. Is there any way I can change that? When I installed node-celery, Redis was installed as a dependency.
I am new to celery, please guide. Thanks.
Celery is basically a RabbitMQ client. There are producers (tasks), consumers (workers) and an AMQP message broker which delivers messages between tasks and workers.
Knowing that will enable you to write your own celery in node.js.
node-celery here is a library that enables your node process to work both as a celery client (Producer/Publisher) and a celery worker (Consumer).
See https://abhishek-tiwari.com/post/amqp-rabbitmq-and-celery-a-visual-guide-for-dummies
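For illustration, a minimal node-celery client sketch, assuming a RabbitMQ broker on localhost and a Python worker that defines a task named tasks.add:

```js
const celery = require('node-celery');

const client = celery.createClient({
  CELERY_BROKER_URL: 'amqp://guest:guest@localhost:5672//',
  CELERY_RESULT_BACKEND: 'amqp'
});

client.on('connect', function () {
  // invoke the Python-side task by name; the callback gets the result message
  client.call('tasks.add', [1, 2], function (result) {
    console.log(result); // includes the task status and return value
    client.end();
  });
});
```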
Edit-1/2018
My recommendation is not to use Kue now, as it seems to be a stalled project; use Celery instead. It is very well supported and maintained by the community and supports a large number of use cases.
Old Answer
Go for Kue; it's a holistic solution that resembles Celery in the Python world: it has the concepts of producers/consumers, delayed tasks, task retries, task TTL, the ability to round-robin tasks across multiple consumers listening to the same queue, etc.
Celery is probably more advanced, with more features and more brokers supported, and you can use node-celery if you like, but, in my opinion, there's no need to go for a hybrid solution that requires installing both python and node when you can use only one language that's sufficient in 90% of the cases (unless necessary, of course).
Kue, after so much time has passed, still has the same old core issues unsolved:
github.com/Automattic/kue/issues/514
github.com/Automattic/kue/issues/130
github.com/Automattic/kue/issues/53
If anyone reading this doesn't want to rewrite Kue, don't start with it. It's good for simple tasks. But if you want to deal with a lot of them, with concurrency, or with task chains (where one task creates another) - stop wasting your time.
I've wasted a month trying to debug Kue, still with no success. The best choice was to swap Kue for a pub/sub messaging queue on RabbitMQ plus Rabbot (another RabbitMQ wrapper).
Personally, I haven't used Celery enough to go all in for it, but as I've been searching for a Celery alternative, seeing someone advising Kue just made my blood boil.
If you want to send a delayed email (as in the Kue example) you can go with whatever you'd like without worrying about errors. But if you want a reliable system task/message queue, don't even start with Kue. I'd personally go with node-celery.
It is also worth mentioning https://github.com/OptimalBits/bull. It is a fast, reliable, Redis-based queue written for stability and atomicity.
Bull 4 (BullMQ) is currently in beta and has some nice features: https://github.com/taskforcesh/bullmq
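A minimal BullMQ sketch, assuming Redis on localhost (the queue name, job name, and sendEmail helper are illustrative):

```js
const { Queue, Worker } = require('bullmq');

const connection = { host: '127.0.0.1', port: 6379 };

// producer: enqueue a job; the call returns as soon as Redis stores it
const queue = new Queue('emails', { connection });
queue.add('welcome', { to: 'user@example.com' });

// consumer: typically a separate worker process picks jobs up from Redis
new Worker('emails', async (job) => {
  await sendEmail(job.data); // sendEmail is a placeholder
}, { connection });
```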
In our experience, Kue was unreliable, losing jobs. Granted, we were using an older version; it's probably been fixed since. That was also during the period when TJ abandoned the project and the new maintainers hadn't been chosen yet. We switched to beanstalkd and have been very happy. We're using https://github.com/ceejbot/fivebeans as the node interface to beanstalkd.

Controlling the flow of requests without dropping them - NodeJS

I have a simple nodejs webserver running, it:
Accepts requests
Spawns separate thread to perform background processing
Background thread returns results
App responds to client
Using Apache benchmark "ab -r -n 100 -c 10", performing 100 requests with 10 at a time.
Average response time of 5.6 seconds.
My logic for using nodejs is that it is typically quite resource-efficient, especially when the bulk of the work is being done by another process. It seems like the most lightweight webserver option for this scenario.
The Problem
With 10 concurrent requests my CPU was maxed out, which is no surprise since there is CPU-intensive work going on in the background.
Scaling horizontally is an easy thing to do, although I want to make the most out of each server for obvious reasons.
So, with nodejs, either raw or via some framework, how can one keep the load under control so as not to go overboard on the CPU?
Potential Approach?
Could I accept the request, store it in a db or some other persistent storage, and have a separate process that uses an async library to process x of them at a time?
In your potential approach, you're basically describing a queue. You can store incoming messages (jobs) there and have each process take one job at a time, only getting the next one once the previous job has finished. You could spawn a number of processes working in parallel, say equal to the number of cores in your system. Spawning more won't help performance, because multiple processes sharing a core will just run slower. Keeping one core free might be preferred, to keep the system responsive for administrative tasks; a sketch of this layout follows below.
Many different queues exist. A node-based one using redis for persistence that seems to be well supported is Kue (I have no personal experience using it). I found a tutorial for building an implementation with Kue here. Depending on the software your environment is running in though, another choice might make more sense.
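A sketch of that one-worker-per-core layout with node's cluster module (the pullAndProcessJobs loop is a placeholder for whichever queue client you pick):

```js
const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  // one worker per core, keeping one core free for the rest of the system
  const size = Math.max(1, os.cpus().length - 1);
  for (let i = 0; i < size; i++) cluster.fork();
} else {
  // each worker repeatedly takes one job, finishes it, then takes the next
  pullAndProcessJobs(); // placeholder: fetch-run-repeat against your queue
}
```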
Good luck and have fun!

worker queue for nodejs?

I am in the process of beginning to write a worker queue for node using node's cluster API and mongoose.
I noticed that a lot of libs exist that already do this but using redis and forking. Is there a good reason to fork versus using the cluster API?
Edit: and now I have also found this: https://github.com/xk/node-threads-a-gogo -- too many options!
I would rather not add redis to the mix since I already use mongo. Also, my requirements are very loose: I would like persistence, but could go without it for the first version.
Part two of the question:
What are the most stable/used nodejs worker queue libs out there today?
Wanted to follow up on this. My solution ended up being a roll-your-own cluster implementation where some of my cluster workers are dedicated job workers (i.e. they just have code to work on jobs).
I use agenda for job scheduling.
Cron-type jobs are scheduled by the cluster master. The rest of the jobs are created in the non-worker processes as they are needed (verification emails, etc.).
Before that I was using kue but dropped it because the rest of my app uses mongodb and I didn't like having to run redis just for job scheduling.
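A minimal agenda sketch in that spirit, assuming a recent agenda version and a local MongoDB (the job name and the sendVerificationEmail helper are illustrative):

```js
const Agenda = require('agenda');

// agenda persists its jobs in MongoDB, so no extra redis is needed
const agenda = new Agenda({ db: { address: 'mongodb://127.0.0.1/myapp' } });

agenda.define('send verification email', async (job) => {
  await sendVerificationEmail(job.attrs.data.to); // placeholder helper
});

(async () => {
  await agenda.start();
  // cron-type job, scheduled once by the cluster master:
  await agenda.every('0 3 * * *', 'send verification email', { to: 'admin@example.com' });
  // on-demand job, created by a non-worker process when needed:
  await agenda.now('send verification email', { to: 'user@example.com' });
})();
```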
Have you tried https://github.com/rvagg/node-worker-farm?
It is very lightweight and doesn't require a separate server.
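A quick worker-farm sketch (the child module path and its contents are illustrative):

```js
// main.js - calls are transparently farmed out to a pool of child processes
const workerFarm = require('worker-farm');
const workers = workerFarm(require.resolve('./child'));

workers(42, (err, result) => {
  console.log(result);     // computed in a pooled child process
  workerFarm.end(workers); // shut the pool down when finished
});
```

```js
// child.js - the exported function runs inside the worker process
module.exports = function (input, callback) {
  callback(null, input * 2); // stand-in for real CPU-bound work
};
```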
I personally am partial to cluster-master.
https://github.com/isaacs/cluster-master
The reason I like cluster-master is that it does very little besides adding the logic for forking your process, giving you the ability to manage the number of processes you're running, and a little bit of logging/recovery to boot! I find overly bloated process-management libraries tend to be unstable, and sometimes even slow things down.
This library will be good for you if the following are true:
Your module is largely asynchronous
You don't have a huge number of different event types triggering
The events that fire have small amounts of work to do, but you have lots of similar events firing (things like web servers)
The reasons behind the above list are also the reasons why threads-a-gogo may be good for you, in the opposite cases. If you have a few spots in your code where there is a lot of work to do within your event loop, something like threads-a-gogo, which launches a "thread" specifically for that work, is awesome, because you aren't determining ahead of time how many workers to spawn, but rather spawning them to do work when needed. Note: this can also be bad if there is the potential for a lot of them to spawn; if you start launching too many processes, things can actually bog down. But I digress.
To summarize: if your module is largely asynchronous already, what you really want is a worker pool, to minimize the downtime when your process is not listening for events and to maximize the amount of processor you can use. Unless you have a very busy synchronous call, a single node event loop will have trouble taking advantage of even a single core of a processor. Under this circumstance, you are best off with cluster-master. What I recommend is doing a little benchmarking to see how much of a single core your program can use under the "worst case scenario". Let's say this is 33% of one core. If you have a quad-core machine, you then tell cluster-master to launch 12 workers (4 cores / 0.33 of a core per worker ≈ 12).
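For reference, the cluster-master invocation for that example might look like this (worker.js stands in for your server's entry point):

```js
// master.js
var clusterMaster = require('cluster-master');

clusterMaster({
  exec: 'worker.js', // your existing, largely-async server module
  size: 12           // 4 cores / ~33% of a core per worker ≈ 12 workers
});
```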
Hope this helped!
