Equivalent of Celery in Node JS - node.js

Please suggest an equivalent of Celery in Node JS to run asynchronous tasks.
I have been able to search for the following:
(Later)
Kue (Kue),
coffee-resque (coffee-resque)
cron (cron)
node-celery(node celery)
I have run both manual and automated threads in background and interact with MongoDB.
node-celery is using redis DB and not Mongo DB. Is there any way I can change that?When I installed node-celery redis was installed as dependency.
I am new to celery, Please guide.Thanks.

Celery is basically a RabbitMQ client. There are producers (tasks), consumers (workers) and AMQP message broker which delivers messages between tasks and workers.
Knowing that will enable you to write your own celery in node.js.
node-celery here is a library that enables your node process to work both as a celery client (Producer/Publisher) and a celery worker (Consumer).
See https://abhishek-tiwari.com/post/amqp-rabbitmq-and-celery-a-visual-guide-for-dummies

Edit-1/2018
My recommendation is not to use Kue now, as it seems to be a stalled project, use Celery instead. It is very well supported and maintained by the community and supports large number of use cases.
Old Answer
Go for Kue, it's a wholistic solution that resembles Celery in Python word; it has the concepts of: producers/consumers, delayed tasks, task retrial, task TTL, ability to round-robin tasks across multiple consumers listening to the same queue, etc.
Probably Celery is more advanced with more features with more brokers to support and you can use celery-node if you like, but, in my opinion, I think no need to go for a hybrid solution that requires installation of python and node when you can only use only language that's sufficient in 90% of the cases (unless necessary of course).

Go for Kue, it's a wholistic solution that resembles Celery in Python word; it has the concepts of: producers/consumers, delayed tasks, task retrial, task TTL, ability to round-robin tasks across multiple consumers listening to the same queue, etc.
Kue, after so much time have passed, still has the same old core issues unsolved:
github.com/Automattic/kue/issues/514
github.com/Automattic/kue/issues/130
github.com/Automattic/kue/issues/53
If anyone reading this don't want to rewrite Kue, don't start with it. It's good for simple tasks. But if you want to deal with a lot of them, concurrent, or task chains (when one task creates another) - stop wasting your time.
I've wasted a month trying to debug Kue and still no success. The best choice was to change Kue for Pubs/sub Messaging queue on RabbitMQ and Rabbot (another RabbitMQ wrap up).
Personally, I haven't used Celery as much to put all in for it, but as I've been searching for Celery alternative and found how someone is advising for Kue just boiled my blood in veins.
If you want to send a delayed email (as in Kue example) you can go for whatever you'd like without worrying about errors. But if you want a reliable system task/message queue, don't even start with Kue. I'd personally go with 5. node-celery(node celery)

It is also worth mentioning https://github.com/OptimalBits/bull. It is a fast, reliable, Redis-based queue written for stability and atomicity.
Bull 4 is currently in beta and has some nice features https://github.com/taskforcesh/bullmq

In our experience, Kue was unreliable, losing jobs. Granted, we were using an older version, it's probably been fixed since. That was also during the period when TJ abandoned the project and the new maintainers hadn't been chosen. We switched to beanstalkd and have been very happy. We're using https://github.com/ceejbot/fivebeans as the node interface to beanstalkd.

Related

Scheduling function calls in a stateless Node.js application

I'm trying to figure out a design pattern for scheduling events in a stateless Node back-end with multiple instances running simultaneously.
Use case example:
Create an message object with a publish date/time and save it to a database
Optionally update the publishing time or delete the object
When the publish time is reached, the message content is sent to a 3rd party API endpoint
Right now my best idea is to use bee-queue or bull to queue delayed jobs. It should be able to store the state and ensure that the job is executed only once. However, I feel like it might introduce a single point of failure, especially when maintaining state on Redis for months and then hoping that the future version of the queue library is still working.
Another option is a service worker that polls the database for upcoming events every n minutes, but this seems like a potential scaling issue down the line for multi-tenant SaaS.
Are there more robust design patterns for solving this?
Don't worry about redis breaking. It's pretty stable, and eventually you can decide to freeze the version.
If there are jobs that will be executed in the future I would suggest a database, like Mongo or Redis, with a disk-store. So you will survive a reboot, you don't have to reinvent the wheel, and already have a nice set of tools for scalability.

Handle long-running processes in NodeJS?

I've seen some older posts touching on this topic but I wanted to know what the current, modern approach is.
The use case is: (1) assume you want to do a long running task on a video file, say 60 seconds long, say jspm install that can take up to 60 seconds. (2) you can NOT subdivide the task.
Other requirements include:
need to know when a task finishes
nice to be able to stop a running task
stability: if one task dies, it doesn't bring down the server
needs to be able to handle 100s of simultaneous requests
I've seen these solutions mentioned:
nodejs child process
webworkers
fibers - not used for CPU-bound tasks
generators - not used for CPU-bound tasks
https://adambom.github.io/parallel.js/
https://github.com/xk/node-threads-a-gogo
any others?
Which is the modern, standard-based approach? Also, if nodejs isn't suited for this type of task, then that's also a valid answer.
The short answer is: Depends
If you mean a nodejs server, then the answer is no for this use case. Nodejs's single-thread event can't handle CPU-bound tasks, so it makes sense to outsource the work to another process or thread. However, for this use case where the CPU-bound task runs for a long time, it makes sense to find some way of queueing tasks... i.e., it makes sense to use a worker queue.
However, for this particular use case of running JS code (jspm API), it makes sense to use a worker queue that uses nodejs. Hence, the solution is: (1) use a nodejs server that does nothing but queue tasks in the worker queue. (2) use a nodejs worker queue (like kue) to do the actual work. Use cluster to spread the work across different CPUs. The result is a simple, single server that can handle hundreds of requests (w/o choking). (Well, almost, see the note below...)
Note:
the above solution uses processes. I did not investigate thread solutions because it seems that these have fallen out of favor for node.
the worker queue + cluster give you the equivalent of a thread pool.
yea, in the worst case, the 100th parallel request will take 25 minutes to complete on a 4-core machine. The solution is to spin up another worker queue server (if I'm not mistaken, with a db-backed worker queue like kue this is trivial---just make each point server point to the same db).
You're mentioning a CPU-bound task, and a long-running one, that's definitely not a node.js thing. You also mention hundreds of simultaneous tasks.
You might take a look at something like Gearman job server for things like that - it's a dedicated solution.
Alternatively, you can still have Node.js manage the requests, just not do the actual job execution.
If it's relatively acceptable to have lower then optimal performance, and you want to keep your code in JavaScript, you can still do it, but you should have some sort of job queue - something like Redis or RabbitMQ comes to mind.
I think job queue will be a must-have requirement for long-running, hundreds/sec tasks, regardless of your runtime. Except if you can spawn this job on other servers/services/machines - then you don't care, your Node.js API is just a front and management layer for the job cluster, then Node.js is perfectly ok for the job, and you need to focus on that job cluster, and you could then make a better question.
Now, node.js can still be useful for you here, it can help manage and hold those hundreds of tasks, depending where they come from (ie. you might only allow requests to go through to your job server for certain users, or limit the "pause" functionality to others etc.
Easily perform Concurrent Execution to LongRunning Processes using Simple ConcurrentQueue. Feel free to improve and share feedback.
👨🏻‍💻 Create your own Custom ConcurrentExecutor and set your concurrency limit.
🔥 Boom you got all your long-running processes run in concurrent mode.
For Understanding you can have a look:
Concurrent Process Executor Queue

Meteor backend parsing xml api, scraping websites etc

I would be very gratefull if someone could answer my question. I'm new to nodejs. Doing app in meteor. Everything is fine, mongo etc. But when I end huge crud...I will need to parse some xml apis, scrape some websites. All as backend tasks done via cron etc. My question is...I don't see any examples of such backends in meteor. I see using npm libs. Is this the only path to follow? Also meteor writes mongo's ids as strings. While php writes as objectid. If I would use npm will it write as objectid? Will it do harm? Overall question is...to do parsing backend in meteor the npm are the good path?
I quote #Dan Dascalescu in his excellent answer to another question:
There are several packages to run background tasks in Meteor. From the simplest to the most involved:
super basic cron packages: cron,
easycron
percolatestudio:synced-cron
cron jobs distributed across multiple app servers
queue-async - minimalistic async (see below), written by D3 author Mike Bostock
peerlibrary:async - wrapper for the popular async package for Node.js and the
browser. Offers over 20 functions
(map, reduce, every, filter etc.) and supports powerful control flow
(serial, parallel, waterfall etc.); see also this
example.
artwells:queue - priorities, scheduling, logging, re-queuing. Queue backed by MongoDB.
vsivsi:jobCollection
schedule persistent jobs to be run anywhere (servers, clients). I used this to power the RSS feed aggregation at a financial news
aggregator startup (StockBase.com).
differential:workers
Spawn headless worker meteor processes to work on async jobs
Packages I would recommend caution with:
PowerQueue - queue async tasks, throttle resource usage, retry failed. Supports sub
queues. No
scheduling.
No tests, but nifty demo. Not
suitable for running for a long
while
due to using recursive
calls.
Kue - the priority job queue for Node.js backed by redis.
Not updated for Meteor 0.9+.

Controlling the flow of requests without dropping them - NodeJS

I have a simple nodejs webserver running, it:
Accepts requests
Spawns separate thread to perform background processing
Background thread returns results
App responds to client
Using Apache benchmark "ab -r -n 100 -c 10", performing 100 requests with 10 at a time.
Average response time of 5.6 seconds.
My logic for using nodejs is that is typically quite resource efficient, especially when the bulk of the work is being done by another process. Seems like the most lightweight webserver option for this scenario.
The Problem
With 10 concurrent requests my CPU was maxed out, which is no surprise since there is CPU intensive work going on the background.
Scaling horizontally is an easy thing to, although I want to make the most out of each server for obvious reasons.
So how with nodejs, either raw or some framework, how can one keep that under control as to not go overkill on the CPU.
Potential Approach?
Could accepting the request storing it in a db or some persistent storage and having a separate process that uses an async library to process x at a time?
In your potential approach, you're basically describing a queue. You can store incoming messages (jobs) there and have each process get one job at the time, only getting the next one when processing the previous job has finished. You could spawn a number of processes working in parallel, like an amount equal to the number of cores in your system. Spawning more won't help performance, because multiple processes sharing a core will just run slower. Keeping one core free might be preferred to keep the system responsive for administrative tasks.
Many different queues exist. A node-based one using redis for persistence that seems to be well supported is Kue (I have no personal experience using it). I found a tutorial for building an implementation with Kue here. Depending on the software your environment is running in though, another choice might make more sense.
Good luck and have fun!

worker queue for nodejs?

I am in the process of beginning to write a worker queue for node using node's cluster API and mongoose.
I noticed that a lot of libs exist that already do this but using redis and forking. Is there a good reason to fork versus using the cluster API?
edit and now i also find this: https://github.com/xk/node-threads-a-gogo -- too many options!
I would rather not add redis to the mix since I already use mongo. Also, my requirements are very loose, I would like persistence but could go without it for the first version.
Part two of the question:
What are the most stable/used nodejs worker queue libs out there today?
Wanted to follow up on this. My solution ended up being a roll your own cluster impl where some of my cluster workers are dedicated job workers (ie they just have code to work on jobs).
I use agenda for job scheduling.
Cron type jobs are scheduled by the cluster master. The rest of the jobs are created in the non-worker clusters as they are needed. (verification emails etc)
Before that I was using kue but dropped it because the rest of my app uses mongodb and I didnt like having to use redis just for job scheduling.
Have u tried https://github.com/rvagg/node-worker-farm?
It is very light weight and doesn't require a separate server.
I personally am partial to cluster-master.
https://github.com/isaacs/cluster-master
The reason I like cluster master is because it does very little besides add in logic for forking your process, and give you the ability to manage the number of process you're running, and a little bit of logging/recovery to boot! I find overly bloated process management libraries tend to be unstable, and sometimes even slow things down.
This library will be good for you if the following are true:
Your module is largely asynchronous
You don't have a huge amount of different types of events triggering
The events that fire have small amounts of work to do, but you have lots of similar events firing(things like web servers)
The reason for the above list, is the reason why threads-a-gogo may be good for you, for the opposite reasons. If you have a few spots in your code, where there is a lot of work to do within your event loop, something like threads-a-gogo that launches a "thread" specifically for this work is awesome, because you aren't determining ahead of time how many workers to spawn, but rather spawning them to do work when needed. Note: this can also be bad if there is the potential for a lot of them to spawn, if you start launching too many processes things can actually bog down, but I digress.
To summarize, if your module is largely asynchronous already, what you really want is a worker pool. To minimize the down time when your process is not listening for events, and to maximize the amount of processor you can use. Unless you have a very busy syncronous call, a single node event loop will have troubles taking advantage of even a single core of a processor. Under this circumstance, you are best off with cluster-master. What I recommend is doing a little benchmarking, and see how much of a single core your program can use under the "worst case scenario". Let's say this is 33% of one core. If you have a quad core machine, you then tell cluster master to launch you 12 workers.
Hope this helped!

Resources