Monitoring a cron running on HPCC cluster - hpcc-ecl

I have a cron scheduled to run on a Thor cluster. Is there a way to monitor a cron running on an HPCC cluster and send a notification if the cron is not running due to a failure or a system shutdown?

Akhilesh,
The only way I can think of to do that would be to make the CRON job periodically send a "ping" of some sort (an email, an update to a semaphore file, or ... ), then have a separate process running on another box that alerts someone if that "ping" doesn't arrive as scheduled (indicating the CRON job is no longer working).
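The heartbeat idea can be sketched in a few lines of Python. The file path, threshold, and function names below are illustrative assumptions, not anything HPCC-specific, and the alerting side (email, pager, etc.) is left out:

```python
import os
import time

# Assumptions: pick a real shared path, and a threshold somewhat longer
# than the CRON job's schedule interval.
HEARTBEAT_FILE = "/tmp/thor_cron_heartbeat"
MAX_AGE_SECONDS = 15 * 60

def send_ping(path=HEARTBEAT_FILE):
    """Call at the end of each CRON iteration: touch the semaphore file."""
    with open(path, "w") as f:
        f.write(str(time.time()))

def ping_is_stale(path=HEARTBEAT_FILE, max_age=MAX_AGE_SECONDS):
    """Run from a watchdog process on another box; True means 'alert someone'."""
    if not os.path.exists(path):
        return True
    return (time.time() - os.path.getmtime(path)) > max_age
```

The watchdog simply schedules `ping_is_stale()` on its own timer and sends the notification when it returns True.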
HTH,
Richard

Related

Why doesn't my NestJS cron job work? I can't even guess

I implemented cron job A (EVERY_DAY_AT_8AM) on the batch server.
When cron job A executes, it queues a job to a Bull queue,
and another server consumes the job from that queue.
The batch server also runs other cron jobs that execute more frequently than cron job A.
The problem is that although cron job A executes, the job never gets queued to the Bull queue.
In short, the job that should have been consumed has gone missing somewhere.
I can't even guess why this situation happens.
Does anyone have any ideas why this issue occurs?

Issue after Quartz web restart: Quartz triggers all the scheduled jobs whose fire times have passed

I set up a JDBCJobStore to store jobs, and I schedule the jobs with Cron triggers.
Sometimes I manually stop the Quartz scheduler in order to bypass certain scheduled jobs for a specific purpose.
However, I face an issue after restarting the Quartz scheduler: all the jobs whose scheduled fire times passed while it was down are triggered at the same time. I checked the database and found that all these jobs are saved in the QRTZ_FIRED_TRIGGERS table but cannot be deleted; Cron only reschedules a job after it runs.
Is there any way to make Quartz reschedule jobs by their Cron expressions when I restart the Quartz server, without triggering these expired schedules?
Any help will be highly appreciated.
Best Regards,
Dean Huang
The job will be rescheduled by Cron if I configure a RAMJobStore and set up the jobs via XML, but not with JDBCJobStore.

Oozie: kill a job after a timeout

Sorry, but I can't find the configuration option I need. I schedule Spark applications, and sometimes they still haven't succeeded after 1 hour; in that case I want to automatically kill the task (because I am sure it will never succeed, and another scheduled run may start).
I found a timeout configuration, but as I understand it, it is only used to delay the start of a workflow.
So is there some kind of "running time" timeout?
Oozie cannot kill a workflow that it triggered. However, you can ensure that only a single workflow instance runs at a time by setting concurrency = 1 in the coordinator.
You can also have a second Oozie workflow monitor the status of the Spark job.
Anyway, you should investigate the root cause of the Spark job not succeeding or being blocked.
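For reference, both knobs live in the coordinator's <controls> block. A hedged sketch, with the app name, frequency, dates, and property names as placeholders:

```xml
<coordinator-app name="spark-coordinator" frequency="${coord:hours(1)}"
                 start="2024-01-01T00:00Z" end="2025-01-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <controls>
    <!-- minutes a materialized action may sit WAITING before it is
         discarded; this is the timeout from the question -- it bounds
         waiting time, not running time -->
    <timeout>60</timeout>
    <!-- at most one workflow instance runs at a time -->
    <concurrency>1</concurrency>
  </controls>
  <action>
    <workflow>
      <app-path>${workflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```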

Node js and system cron jobs

I am using node-cron to schedule some tasks inside my Node app. This package has an API to create, start, and stop cron jobs. However, I can't seem to find these cron jobs when I run the crontab -l command in my OS terminal. I tried on both macOS and CentOS.
Specific questions:
1. Do such node packages create cron jobs at the OS level?
2. If the answer to 1 is yes, will these cron jobs execute regardless of whether my Node app is running?
3. If the answer to 2 is yes, how do I stop and clear out all such scheduled cron jobs?
Taking a quick look at the node-cron source code, you can check that node-cron does not create any cron job at the OS level.
It is essentially just a long-timeout mechanism inside the process.
So if the Node process is restarted, you lose the launched cron jobs.
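That "long timeout" behaviour can be pictured with a rough in-process analogue (sketched here in Python rather than Node, purely for illustration): the job lives only inside the running process and never touches the OS crontab, so a restart wipes it out.

```python
import threading

class InProcessJob:
    """Rough analogue of what node-cron does: a repeating in-process
    timer, not an OS crontab entry. If the process exits, the job is
    gone with it -- nothing ever shows up in `crontab -l`."""

    def __init__(self, interval_seconds, func):
        self.interval = interval_seconds
        self.func = func
        self._timer = None

    def _tick(self):
        # Re-arm the timer, then run the job body.
        self._timer = threading.Timer(self.interval, self._tick)
        self._timer.daemon = True
        self._timer.start()
        self.func()

    def start(self):
        self._tick()

    def stop(self):
        if self._timer is not None:
            self._timer.cancel()
```

The daemon flag mirrors the observed behaviour: when the host process dies, the pending timers die with it, and nothing persists anywhere.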

Determine which Celery workers are consuming jobs and tell them to stop

Scenario: how do I gracefully tell a worker to stop accepting new jobs, and identify when it has finished processing its current jobs, so that I can shut it down as new workers come online?
Details (Feel free to correct any of my assumptions):
Here is a snippet of my current queue.
As you can see, I have 2 exchange queues for the workers (I believe this is the *.pidbox), 2 queues representing celeryev on each host (yes, I know I only need one), and one default celery queue. Clearly I have 90+ jobs in this queue.
(Side question) Where do you go in the management console to find the worker consuming a job? I know I can look at djcelery and figure that out.
So... I know there are jobs running on each host. I can't shut down celery on those machines, as that will kill the running jobs (and any pending ones?).
How do I stop any further processing of new jobs while allowing the jobs that are still running to complete? I know that on each host I can stop celery, but that will kill any currently running jobs as well. I want to tell the 22 jobs in the hopper to halt.
Thanks!!
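The drain-then-shutdown sequence described above can be sketched with Celery's remote-control commands. `cancel_consumer`, `inspect().active()`, and `shutdown` are real control commands; the default queue name, poll interval, and helper functions below are assumptions, not a definitive implementation:

```python
def worker_is_drained(active_tasks):
    """True once a worker's active-task list is empty, i.e. it has
    finished everything it had already accepted."""
    return not active_tasks

def drain_worker(app, worker_name, queue="celery", poll_seconds=5):
    """Sketch: tell one worker to stop consuming `queue`, wait for its
    running jobs to finish, then warm-shut it down. `app` is a
    celery.Celery instance; nothing currently running gets killed."""
    import time
    target = [worker_name]
    # 1. Stop pulling new jobs; jobs already accepted keep running.
    app.control.cancel_consumer(queue, destination=target)
    # 2. Poll until the worker's active-task list is empty.
    while True:
        active = app.control.inspect(destination=target).active() or {}
        if all(worker_is_drained(tasks) for tasks in active.values()):
            break
        time.sleep(poll_seconds)
    # 3. Warm shutdown: the worker exits once its state is clean.
    app.control.shutdown(destination=target)
```

The jobs still sitting unacknowledged in the queue (the "22 in the hopper") are simply left for whichever workers remain subscribed; cancelling the consumer does not delete them.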
