I've been running an hourly cron job that worked wonderfully for a while. Then one day it stopped working. Below are consecutive lines from my cron_hourly.log (note the gap; cron never initiated anything in that time).
Mon Feb 8 18:01:27 EST 2016: END hourly cron run - status=0
__________________________________________________________________________
__________________________________________________________________________
Sun Feb 21 11:01:10 EST 2016: START hourly cron run
__________________________________________________________________________
If you are using the free plan, your gear idles if it has not received any legitimate web traffic within 24 hours. When your gear is idle, cron jobs do not run. You can avoid this by upgrading to the Bronze plan.
Related
I realize a Databricks cluster has an inactivity timeout, meaning that after N minutes it will shut the cluster down. Here's a sample.
As nice as this feature is, though, it is not what we need. Our team works from 8AM to 6PM on weekdays. We want the cluster to auto-start at 8AM, stay "always on" during working hours, then time out after, say, 6PM. Make sense?
Q: Is this possible?
You can do everything inside Databricks by scheduling a small job on the existing cluster. In this case, if the cluster is stopped, it will be started to execute the job and will stay up until the auto-termination feature kicks in (I would recommend an auto-termination setting of 65-70 minutes to balance costs). You can create a notebook with something like
display(spark.range(1))
and schedule it for execution on the selected cluster. To keep the cluster running during work hours, you need to schedule the job to run periodically. This can be done with the following cron expression (see the Quartz docs for reference; the Quartz fields are seconds, minutes, hours, day-of-month, month, day-of-week, and one of day-of-month/day-of-week must be ?):
0 0 8-17 ? * MON-FRI
P.S. Really, it should probably be 0 55 7-16 ? * MON-FRI, so that the cluster starts 5 minutes before 8AM.
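As a sketch of how that keep-alive job could be defined programmatically (this is not from the original answer: the job name, notebook path, and cluster ID below are placeholders, and the payload targets the Databricks Jobs API 2.1 create-job endpoint):

```python
def build_keepalive_job(notebook_path, cluster_id):
    """Return a Databricks Jobs API 2.1 create-job payload that runs a
    trivial notebook on the existing cluster at :55 past each working hour,
    so the cluster is (re)started before 8AM and kept alive until 6PM."""
    return {
        "name": "cluster-keepalive",  # placeholder job name
        "tasks": [
            {
                "task_key": "keepalive",
                "notebook_task": {"notebook_path": notebook_path},
                "existing_cluster_id": cluster_id,
            }
        ],
        "schedule": {
            # Quartz syntax: sec min hour day-of-month month day-of-week
            "quartz_cron_expression": "0 55 7-16 ? * MON-FRI",
            "timezone_id": "America/New_York",
            "pause_status": "UNPAUSED",
        },
    }

payload = build_keepalive_job("/Shared/keepalive", "1234-567890-abcdefgh")
```

You would then POST this payload to https://&lt;workspace-url&gt;/api/2.1/jobs/create with a bearer token to register the schedule.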
Yes, it is possible to start the Databricks cluster to match your team's 8AM-to-6PM weekday hours using Azure Automation.
To start the cluster at 8 AM, you can use a PowerShell runbook in Azure Automation, scheduled for that time:
# Requires the DatabricksPS module: Install-Module -Name DatabricksPS
$accessToken = "<Personal_Access_Token>"
$apiUrl = "<Azure_Databricks_Endpoint_URL>"
Set-DatabricksEnvironment -AccessToken $accessToken -ApiRootUrl $apiUrl
Start-DatabricksCluster -ClusterID "<Cluster_ID>"
To stop at 6 PM, you can set the cluster property Terminate after 600 minutes of inactivity. Note: business hours of 8AM to 6PM are 10 hours × 60 minutes = 600 minutes, hence the 600-minute setting.
The tutorial Start Azure Databricks clusters during business hours walks you through creating a PowerShell Workflow runbook in Azure Automation that starts Azure Databricks clusters during business hours.
Unfortunately, it is not possible.
However, you can opt out of auto termination by clearing the Auto Termination checkbox or by specifying an inactivity period of 0.
Refer to the official documentation.
I set up JDBCJobStore to persist jobs, and I schedule the jobs with cron expressions.
Sometimes I manually stop the Quartz scheduler so that certain scheduled jobs are not triggered, for specific purposes.
However, I face an issue after restarting the Quartz scheduler: all the jobs that were scheduled fire at the same time, even though their scheduled times have already passed. I checked the database and found all those jobs saved in the QRTZ_FIRED_TRIGGERS table, but they cannot be deleted; they are only re-scheduled by cron after they run.
Is there any way to make Quartz re-schedule jobs by cron when I restart the Quartz server, without triggering these expired schedules?
Any help will be highly appreciated.
Best Regards,
Dean Huang
The jobs are re-scheduled by cron if I configure RAMJobStore and set up the jobs via XML, but not with JDBCJobStore.
I have set up a cron job on GCP Kubernetes. It runs once per day at 10:00am.
The job runs as expected; however, I don't really understand what the charts on the GCP K8S console are saying.
As shown in the charts, around 1.5 CPU and 8 GB of RAM are in use at this point, when the cron job is not running. I would expect current usage to be zero while it is not running.
Could anyone see what is wrong? Or am I reading the charts wrong?
Note that I retain 7 jobs in the history. Each job ran for about 15 seconds and completed successfully.
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 08 Mar 2018 04:00:56 +0000
Finished: Thu, 08 Mar 2018 04:01:09 +0000
20180320 EDIT:
I found that the graphs for all my other cron jobs look the same. Is it something I set up wrong?
I reproduced your situation this way:
Created a CronJob named "Application"
Added an application using a Deployment, also named "Application"
And now the Cron Job details graphs show information not only about the CronJob itself, but also about the application.
So I think the dashboard uses the object's name to fetch the data.
It looks like you have some other deployment, replica set, etc. that is always running and has the same name as your CronJob, so the graphs show a mix of its data and your CronJob's data.
When I submit a job on a set of machines located in the PST timezone, the Spark master shows the correct time, but the history server shows times that are 8 hours ahead, i.e., GMT.
Is there a way to fix this?
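One likely cause is that the history server renders timestamps in its JVM's default timezone. A possible fix (a sketch, assuming a standalone deployment and that America/Los_Angeles is the desired zone) is to pin the JVM timezone in conf/spark-env.sh on the host running the history server:

```shell
# conf/spark-env.sh on the history-server host:
# force the history server JVM into the Pacific timezone instead of GMT
export SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Duser.timezone=America/Los_Angeles"
```

Then restart the history server with sbin/stop-history-server.sh followed by sbin/start-history-server.sh for the setting to take effect.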
I've been monitoring the cron jobs I set up last week on my Bolt-powered website. I've noticed that the daily cron jobs seem to be running at 11am (that's the time the database logs them at). In config.yml the time is set to 3am.
I've checked the server time and that's using UTC. The MySQL database is using the server time so I would assume that to be UTC as well.
Is this on a Git master version, or 1.x?
Either way this definitely sounds like a bug.
Edit:
It was a bug, and I have submitted a PR that fixes it for master and the version 1.6 branch.
However, a possible workaround is to update the time of the existing records in the database so that the time of the last run is 03:00.
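As an illustration of that workaround, here is a hedged SQL sketch; the table and column names (bolt_cron, lastrun, interim) are hypothetical stand-ins, so inspect your actual schema before running anything like this:

```sql
-- Hypothetical schema: adjust table and column names to match your database.
-- Rewind the recorded last-run time so the next daily run fires at 03:00.
UPDATE bolt_cron
   SET lastrun = '2016-02-21 03:00:00'
 WHERE interim = 'cron.Daily';
```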