Scheduling with gearman vs. cron? - cron

I have noticed a lot of people discussing Gearman and it's scheduling features making it enable to distribute work onto other servers. However, I have not yet seen a comparison to native cronjobs.
What are the differences between cron and Gearman?

If you're doing pure scheduling, using Gearman is unnecessary.
The main differences between Gearman and cron are that:
cron jobs are only triggered only based on time, while Gearman functions are triggered by calls by other applications.
Gearman is used for coordinating tasks between multiple systems, as you mentioned, while cron provides no synchronization. As a result, asynchronous tasks are better for cron, and vice versa.
Unless your application needs to farm out heavy-duty synchronous processing to other servers, I would recommend you to use cron and keep it simple.

Related

Is there any Kotlin coroutine scheduler?

I'm building a system in Kotlin that need's to schedule a lot of jobs at the same time (sometimes even a few thousands of jobs). Most of the jobs are not complex, they are simply doing a few HTTP requests with different libraries.
I'm currently working with Quartz scheduler. the problem is that sometimes it misfires a job and that something that can be critical to me. I know Quartz has different settings you can apply to handle misfires but I didn't find something suitable.
I know that in Kotlin there are coroutines that can be considered as lightweight threads. Is there any scheduler like Quartz for Kotlin that executes the jobs as coroutines instead of threads with the same (or at least most of the same) features? I'm asking because I believe that such scheduler can increase my system performance.
follow up question: if such scheduler exists will it be safe to use him even if I don't know the implementation of the libraries I'm using inside the job? (For example making sure there are no Thread.sleep() calls)

Linux crontab command can replace java quartz

I have a question about job schedular, can use crontab(linux) command replace java quartz?
I want to know the advantage of quartz, someone could give some advice.
Depending on the scale of the problem being solved, using the cron scheduler provided by Linux will work well for many problems (on a single host). When you would like some fail over capability quartz is going to be the solution. Quartz can act as a clustered scheduler. Configured correctly, one node could be brought down for patching while the jobs running on quartz continue to process. There are also features of quartz that cron does not provide. Persistence and disallowing concurrent execution are two that I am using for a project. Those are some of the features that stand out to me. It would be best to check the documentation and look at some of the examples provided.
Cron is available by default on any unix based os. Quarz is simply a Java API with (more scheduling options). If you wish to schedule tasks within a Java application, Quartz is the way to go. If you wish to schedule adhoc os commands, unless you feel like writing your own generic scheduler, cron is the way to go.

Handle long-running processes in NodeJS?

I've seen some older posts touching on this topic but I wanted to know what the current, modern approach is.
The use case is: (1) assume you want to do a long running task on a video file, say 60 seconds long, say jspm install that can take up to 60 seconds. (2) you can NOT subdivide the task.
Other requirements include:
need to know when a task finishes
nice to be able to stop a running task
stability: if one task dies, it doesn't bring down the server
needs to be able to handle 100s of simultaneous requests
I've seen these solutions mentioned:
nodejs child process
webworkers
fibers - not used for CPU-bound tasks
generators - not used for CPU-bound tasks
https://adambom.github.io/parallel.js/
https://github.com/xk/node-threads-a-gogo
any others?
Which is the modern, standard-based approach? Also, if nodejs isn't suited for this type of task, then that's also a valid answer.
The short answer is: Depends
If you mean a nodejs server, then the answer is no for this use case. Nodejs's single-thread event can't handle CPU-bound tasks, so it makes sense to outsource the work to another process or thread. However, for this use case where the CPU-bound task runs for a long time, it makes sense to find some way of queueing tasks... i.e., it makes sense to use a worker queue.
However, for this particular use case of running JS code (jspm API), it makes sense to use a worker queue that uses nodejs. Hence, the solution is: (1) use a nodejs server that does nothing but queue tasks in the worker queue. (2) use a nodejs worker queue (like kue) to do the actual work. Use cluster to spread the work across different CPUs. The result is a simple, single server that can handle hundreds of requests (w/o choking). (Well, almost, see the note below...)
Note:
the above solution uses processes. I did not investigate thread solutions because it seems that these have fallen out of favor for node.
the worker queue + cluster give you the equivalent of a thread pool.
yea, in the worst case, the 100th parallel request will take 25 minutes to complete on a 4-core machine. The solution is to spin up another worker queue server (if I'm not mistaken, with a db-backed worker queue like kue this is trivial---just make each point server point to the same db).
You're mentioning a CPU-bound task, and a long-running one, that's definitely not a node.js thing. You also mention hundreds of simultaneous tasks.
You might take a look at something like Gearman job server for things like that - it's a dedicated solution.
Alternatively, you can still have Node.js manage the requests, just not do the actual job execution.
If it's relatively acceptable to have lower then optimal performance, and you want to keep your code in JavaScript, you can still do it, but you should have some sort of job queue - something like Redis or RabbitMQ comes to mind.
I think job queue will be a must-have requirement for long-running, hundreds/sec tasks, regardless of your runtime. Except if you can spawn this job on other servers/services/machines - then you don't care, your Node.js API is just a front and management layer for the job cluster, then Node.js is perfectly ok for the job, and you need to focus on that job cluster, and you could then make a better question.
Now, node.js can still be useful for you here, it can help manage and hold those hundreds of tasks, depending where they come from (ie. you might only allow requests to go through to your job server for certain users, or limit the "pause" functionality to others etc.
Easily perform Concurrent Execution to LongRunning Processes using Simple ConcurrentQueue. Feel free to improve and share feedback.
πŸ‘¨πŸ»β€πŸ’» Create your own Custom ConcurrentExecutor and set your concurrency limit.
πŸ”₯ Boom you got all your long-running processes run in concurrent mode.
For Understanding you can have a look:
Concurrent Process Executor Queue

Equivalent of Celery in Node JS

Please suggest an equivalent of Celery in Node JS to run asynchronous tasks.
I have been able to search for the following:
(Later)
Kue (Kue),
coffee-resque (coffee-resque)
cron (cron)
node-celery(node celery)
I have run both manual and automated threads in background and interact with MongoDB.
node-celery is using redis DB and not Mongo DB. Is there any way I can change that?When I installed node-celery redis was installed as dependency.
I am new to celery, Please guide.Thanks.
Celery is basically a RabbitMQ client. There are producers (tasks), consumers (workers) and AMQP message broker which delivers messages between tasks and workers.
Knowing that will enable you to write your own celery in node.js.
node-celery here is a library that enables your node process to work both as a celery client (Producer/Publisher) and a celery worker (Consumer).
See https://abhishek-tiwari.com/post/amqp-rabbitmq-and-celery-a-visual-guide-for-dummies
Edit-1/2018
My recommendation is not to use Kue now, as it seems to be a stalled project, use Celery instead. It is very well supported and maintained by the community and supports large number of use cases.
Old Answer
Go for Kue, it's a wholistic solution that resembles Celery in Python word; it has the concepts of: producers/consumers, delayed tasks, task retrial, task TTL, ability to round-robin tasks across multiple consumers listening to the same queue, etc.
Probably Celery is more advanced with more features with more brokers to support and you can use celery-node if you like, but, in my opinion, I think no need to go for a hybrid solution that requires installation of python and node when you can only use only language that's sufficient in 90% of the cases (unless necessary of course).
Go for Kue, it's a wholistic solution that resembles Celery in Python word; it has the concepts of: producers/consumers, delayed tasks, task retrial, task TTL, ability to round-robin tasks across multiple consumers listening to the same queue, etc.
Kue, after so much time have passed, still has the same old core issues unsolved:
github.com/Automattic/kue/issues/514
github.com/Automattic/kue/issues/130
github.com/Automattic/kue/issues/53
If anyone reading this don't want to rewrite Kue, don't start with it. It's good for simple tasks. But if you want to deal with a lot of them, concurrent, or task chains (when one task creates another) - stop wasting your time.
I've wasted a month trying to debug Kue and still no success. The best choice was to change Kue for Pubs/sub Messaging queue on RabbitMQ and Rabbot (another RabbitMQ wrap up).
Personally, I haven't used Celery as much to put all in for it, but as I've been searching for Celery alternative and found how someone is advising for Kue just boiled my blood in veins.
If you want to send a delayed email (as in Kue example) you can go for whatever you'd like without worrying about errors. But if you want a reliable system task/message queue, don't even start with Kue. I'd personally go with 5. node-celery(node celery)
It is also worth mentioning https://github.com/OptimalBits/bull. It is a fast, reliable, Redis-based queue written for stability and atomicity.
Bull 4 is currently in beta and has some nice features https://github.com/taskforcesh/bullmq
In our experience, Kue was unreliable, losing jobs. Granted, we were using an older version, it's probably been fixed since. That was also during the period when TJ abandoned the project and the new maintainers hadn't been chosen. We switched to beanstalkd and have been very happy. We're using https://github.com/ceejbot/fivebeans as the node interface to beanstalkd.

Eclipse RCP: Only one Job runs at a time?

The Jobs API in Eclipse RCP apparently works much differently than I expected. I thought that creating and scheduling multiple Jobs would actually cause multiple worker threads to be created, executing the Jobs in parallel unless there was an ISchedulingRule conflict.
I went back and read the documentation more closely, and also discovered this comment in the JobManager class:
/**
* Returns a running or blocked job whose scheduling rule conflicts with the
* scheduling rule of the given waiting job. Returns null if there are no
* conflicting jobs. A job can only run if there are no running jobs and no blocked
* jobs whose scheduling rule conflicts with its rule.
*/
Now it looks to me like the Job manager will only ever attempt to use one background worker thread. Am I completely wrong about this? If I'm right,
what is the point of scheduling rules and locks? If there is only one worker thread, Jobs can never preemt each other. Wouldn't these only ever be used in case a Job's sleep() method is called (e.g. sleeping while holding a Lock)?
does any part of the platform allow two Jobs to actually run concurrently, on multiple worker threads, thus making the above features useful somehow?
What am I missing here?
Take a look at the run method in the documentation, specifically this part:
Jobs can optionally finish their
execution asynchronously (in another
thread) by returning a result status
of ASYNC_FINISH. Jobs that finish
asynchronously must specify the
execution thread by calling setThread,
and must indicate when they are
finished by calling the method done.
ASYNC_FINISH there looks interresting.
AFAIK creating and scheduling multiple Jobs DO actually cause multiple worker threads to be created and to be executed in parallel.
However if you specify an optional scheduling rule to your job (using the setRule() method) and if that rule conflicts with another job's scheduling rule then those two jobs can't run simultaneously.
This Eclipse Corner article provides good description as well as few code samples for Eclipse Job API.
The IJobManager API is only needed for advanced job manipulation, e.g. when you need to use locks, synchronize between several jobs, terminate jobs, etc.
Note: Eclipse 4.5M4 will include now (Q4 2014) a way to Support for Job Groups with throttling
See bug 432049:
Eclipse provides a simple Jobs API to perform different tasks in parallel and in asynchronous fashion. One limitation of the Eclipse Jobs is that there is no easy way to limit the number of worker threads being used to execute jobs.
This may lead to a thread pool explosion when many jobs are scheduled in quick succession. Due to that it’s easy to use Jobs to perform different unrelated tasks in parallel, but hard to implement thousands of Jobs co-operating to complete a single large task.
Eclipse currently supports the concept of Job Families, which provides one way of grouping with support for join, cancel, sleep, and wakeup operations on the whole family.
To address all these issue we would like to propose a simple way to group a set of Eclipse Jobs that are responsible for pieces of the same large task.
The API would support throttling, join, cancel, combined progress and error reporting for all of the jobs in the group and the job grouping functionality can be used to rewrite performance critical algorithms to use parallel execution of cooperating jobs.
You can see the implementation in this commit 26471fa

Resources