Spark Streaming Notebook for x hours - databricks

I created a notebook in Databricks (Azure) that streams data directly into our data warehouse.
I would like to schedule it to run every day from, for example, 2 AM until 10 PM; some maintenance would run during the off period.
I can schedule it to start every day at 2 AM, but how can I make it stop cleanly every day at around 10 PM?
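
One possible approach, sketched under assumptions: let the scheduled 2 AM run start the query, then have the notebook itself wait until the stop time and stop the query. The helper name `seconds_until` and the commented `df` / `writeStream` lines are hypothetical; `awaitTermination(timeout)` and `stop()` are the PySpark `StreamingQuery` methods, with the timeout given in seconds. The helper uses the cluster's local clock, so the stop hour has to be expressed in that timezone.

```python
from datetime import datetime, timedelta

def seconds_until(stop_hour, now=None):
    """Seconds from `now` until the next occurrence of stop_hour:00 (0-23)."""
    now = now or datetime.now()
    stop = now.replace(hour=stop_hour, minute=0, second=0, microsecond=0)
    if stop <= now:
        # The stop time has already passed today, so aim for tomorrow.
        stop += timedelta(days=1)
    return (stop - now).total_seconds()

# In the notebook (assumed names, not taken from the question):
# query = df.writeStream.format(...).start()
# query.awaitTermination(seconds_until(22))  # returns once the timeout elapses or the query ends
# query.stop()
```

Whether `stop()` lets an in-flight micro-batch finish is not something this sketch controls, so the sink should tolerate a batch being retried on the next start.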

Related

Cron Job With Altering Behavior Depending on Hour

Is it possible to execute a Cron job such that between certain hours, it executes every 30 minutes, but other hours, only every 1 hour?
I was unable to figure this out using my basic Cron abilities.

Airflow terminates current run, and starts new run, every day at midnight, despite my CRON schedule

I have an Airflow job that I wish to run every 130 minutes. I have set the cron schedule like this: "*/130 * * * *".
This schedule functions normally, until the clock hits midnight. Each day at midnight, if a job is currently underway, Airflow will terminate the job and start a new job. I do NOT want this behavior. Thanks in advance for your advice!

How to get Spark Streaming running time

I need to set up a Spark Streaming application. Jobs of the application need to make some decisions based on the whole application running time.
For example, assume the Spark Streaming application was submitted at 08:00. The jobs run between 08:00 and 10:00 should do a plus operation, while the jobs run after 10:00 should do a minus operation.
How can I record the first job's (or the application's) start time and determine the interval between each job and the first job? Or is there any other good solution?
SparkContext's startTime() method returns the time when it became active.
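
As a rough sketch of how that start time could drive the decision: in PySpark the value is exposed as the property `sc.startTime`, in epoch milliseconds. The function name `choose_operation` and the two-hour threshold are illustrative assumptions based on the 08:00 to 10:00 example.

```python
import time

TWO_HOURS_MS = 2 * 60 * 60 * 1000

def choose_operation(start_ms, now_ms, threshold_ms=TWO_HOURS_MS):
    """Return "plus" while the application is younger than the threshold, else "minus"."""
    elapsed_ms = now_ms - start_ms
    return "plus" if elapsed_ms < threshold_ms else "minus"

# Inside a job, with an active SparkContext `sc` (assumed):
# op = choose_operation(sc.startTime, int(time.time() * 1000))
```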

Azure WebJob/Scheduler every 30 minutes from 8am-6pm?

When I go to configure a Schedule in the Azure management console, I'm only given the option of scheduling with an absolute end date/time (or never ending) and an interval.
So I can't, from this UI, schedule a job to run every 30 minutes every day from 8:00 AM to 6:00 PM only (i.e. don't run from 6:01 PM to 7:59 AM). Windows Task Scheduler and all other schedulers (cron, quartz) I've used before support the behaviour I want.
Is this type of schedule supported at all in Azure, e.g. through the API or a hackish use of the Portal HTTP/JSON interfaces?
You can use the built-in scheduling which is more flexible than the Azure one.
You can learn more about how that works from this blog post http://blog.amitapple.com/post/2015/06/scheduling-azure-webjobs/
The summary: create a file called settings.job that contains the following piece of JSON
{"schedule": "cron expression for the schedule"}
The cron expression here has six fields (second, minute, hour, day, month, day of week), so in your case the expression for "every 30 minutes from 8am to 6pm" would be 0 0,30 8-18 * * *
so the JSON you want is
{"schedule": "0 0,30 8-18 * * *"}
Keep in mind that this uses the timezone of the machine, which is UTC by default.
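
To see which times of day such a schedule covers, the minute and hour fields can be expanded by hand. This is a minimal sketch, not the WebJobs scheduler's own parser; `expand_field` and `fire_times` are hypothetical helpers that only understand comma lists and `a-b` ranges.

```python
def expand_field(field):
    """Expand a cron field made of comma-separated values and a-b ranges."""
    values = set()
    for part in field.split(","):
        if "-" in part:
            low, high = part.split("-")
            # Cron ranges are inclusive at both ends.
            values.update(range(int(low), int(high) + 1))
        else:
            values.add(int(part))
    return sorted(values)

def fire_times(minute_field, hour_field):
    """List the HH:MM times matched by the given minute and hour fields."""
    return [f"{h:02d}:{m:02d}"
            for h in expand_field(hour_field)
            for m in expand_field(minute_field)]

times = fire_times("0,30", "8-18")
print(times[0], times[-1], len(times))
```

Because the hour range 8-18 is inclusive, the last match is 18:30. If nothing should run after 6 PM, that run has to be excluded some other way, for example with a check inside the job.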
This is something you need to implement in your WebJob. I have a similar issue in that I have WebJobs with complex schedules. Fortunately it isn't hard to implement.
This snippet converts UTC, which everything in Azure is set to, to your local time (Eastern, from what I can tell). It then checks whether it is Saturday or Sunday and, if so, exits (not sure if you need this). It then checks whether it is before 8 AM or after 6 PM and, if so, exits. If it passes both checks, the WebJob runs.
//Get current UTC time, adjust 4 hours to convert UTC to Eastern (Daylight) Time
DateTime dt = DateTime.UtcNow.AddHours(-4);
//This job should only run Monday - Friday from 8am to 6pm Eastern Time.
if (dt.DayOfWeek == DayOfWeek.Saturday || dt.DayOfWeek == DayOfWeek.Sunday) return;
//Hours 8 through 17 cover 8:00am to 5:59pm
if (dt.Hour < 8 || dt.Hour >= 18) return;
//Go run WebJob
Hope this helps.
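
One caveat with a fixed four-hour offset is that it is only right during daylight saving time; Eastern Standard Time is UTC-5. Below is a sketch of the same check in Python using the standard `zoneinfo` module (Python 3.9+), which handles the switch automatically. The function name `should_run` is hypothetical, and the 8 AM to 6 PM weekday window is taken from the question and the answer above.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")

def should_run(utc_now):
    """True on Monday-Friday between 08:00 and 17:59 Eastern Time."""
    local = utc_now.astimezone(EASTERN)
    if local.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return 8 <= local.hour < 18

print(should_run(datetime.now(timezone.utc)))
```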

Linux: Start a cron job inside another cron job

I am dealing with a workflow where I need to start three processes. The first process is to be scheduled at the beginning of every hour, and the other two at the 45th minute and the 52nd minute of every hour.
But instead of making the client schedule multiple jobs on their server, I would rather have just one job, configured to run at the beginning of every hour, which does a bunch of stuff and then starts the other jobs at their respective times, i.e. the 45th minute and the 52nd minute of the hour.
Is there any way to do this?
I don't have any experience with shell scripting and always schedule cron jobs manually on cron-tab.
Thanks!

Resources