cron job two parameters - start time and end time should cover entire day - linux

I have written a shell script for data extraction that accepts two parameters: a start time and an end time in YYDDMMHHSSSS format. The shell script in turn runs SQL queries and fetches the data between these two timestamps.
My intention is to deploy the shell script as a cron job that runs at least once every day (preferably every 6 hours). The second time it runs, it should use the last End time as the Start time, and the new End time should be, say, (Start time + 6 hours), so that all data is extracted exactly once. Another job will kick off at, say, midnight every day and pick up the data that the shell script deposited for that day.
I have never set up a cron job before, but it looks doable from what I have read. I'm not sure whether the above can be done, though.

Cron executes jobs at specific times and/or days, with all parameters for the script fixed at the time the job is placed into the cron table. The script needs to handle all other requirements. If your requirements depend on the current time and the last time the script was executed, then the script will need to record the time of execution each time it runs and then obtain the time of the previous run from that record.
In this particular case, because you are accessing a database, I suggest that you use the database to preserve the time of the previous script execution.
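A minimal sketch of that idea, using a plain state file instead of the database purely to keep the example short. The state file location, the timestamp format, and the extract.sh name are assumptions for illustration, not part of the original script.

```shell
#!/bin/sh
# Hypothetical wrapper that remembers the previous End time between cron runs.
STATE_FILE="${STATE_FILE:-$HOME/.extract_last_end}"   # assumed location
FMT='+%Y%m%d%H%M'                                     # assumed timestamp format

# Print "START END" for this run: START is the saved End time of the
# previous run (or the start of today on the very first run), END is now.
next_window() {
    end=$(date "$FMT")
    if [ -s "$STATE_FILE" ]; then
        start=$(cat "$STATE_FILE")
    else
        start="$(date '+%Y%m%d')0000"
    fi
    printf '%s %s\n' "$start" "$end"
}

# Save END so that the next run starts where this one stopped.
commit_window() {
    printf '%s\n' "$1" > "$STATE_FILE"
}

# Typical use inside the cron-invoked script (extract.sh is a placeholder):
#   set -- $(next_window)
#   ./extract.sh "$1" "$2" && commit_window "$2"
```

Saving the End time only after the extraction succeeds means a failed run is retried over the same window next time, so no interval is skipped.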

Related

How to check whether a DAG completes within a given time or not?

I have a DAG, A, that runs at a fixed time, say 10 AM, and typically completes within 15-20 minutes. Sometimes it takes longer, and because of some tables in the database it can get stuck in an endless running state. How can I detect that the DAG has not completed within a given time frame and, if so, send an email alert saying that it has not finished in time and needs to be checked?
My thought process:
Build a parallel DAG, or a process within the same DAG, containing a Python function that compares the start time with the current time and keeps checking until the difference reaches some fixed value, say 10 minutes, and then sends an email saying that the DAG has not completed.
Please correct me if I am wrong, or tell me what other ways there are to check this.
It sounds like you just need to define an SLA. You can find an example here.

How can I schedule a PySpark script on an hourly basis in a Linux environment

I have a PySpark script that I want to execute on an hourly basis, meaning the script should run once every hour.
How can I execute that script hourly?
I've searched a lot but didn't find anything.
You can use any of the below approaches
https://developer.ibm.com/hadoop/2017/06/30/scheduling-spark-job-written-pyspark-sparkr-yarn-oozie/
https://github.com/pinterest/pinball
crontab
http://airflow.apache.org/scheduler.html
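For the crontab option, a rough sketch of an hourly entry might look like the following. The script and log paths are hypothetical placeholders, and it assumes spark-submit is on the PATH that cron uses.

```shell
#!/bin/sh
# Hypothetical paths; replace them with the real script and log locations.
SCRIPT=/path/to/job.py
LOG=/path/to/job.log

# Crontab fields: minute hour day-of-month month day-of-week command.
# "0 * * * *" means minute 0 of every hour.
CRON_LINE="0 * * * * spark-submit $SCRIPT >> $LOG 2>&1"
printf '%s\n' "$CRON_LINE"

# To install it, append the line to the current user's crontab:
#   ( crontab -l 2>/dev/null; printf '%s\n' "$CRON_LINE" ) | crontab -
```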

Run cron at a different frequency in specific interval

I have a cron job that runs every 30 minutes, starting 10 minutes past a whole hour:
0 10/30 * * * ?
Now, this needs to be changed, so that in a specific time interval, it runs every 15 minutes instead. E.g. at 7.50, 8.05, 8.20 and 8.35. Then every 30 minutes again.
Is this possible with a single cron job and if so, how? Or do I need multiple jobs to accomplish this?
Thank you in advance.
This is not easy in a single cron expression, and the result would be hard to read.
Multiple jobs should work fine and are much clearer:
// Runs every 30 minutes, starting at 10 past, in odd hours (first run at 1:10 am).
0 10/30 1-23/2 * * ?
// Runs every 15 minutes, starting at 10 past, in even hours (first run at 0:10 am).
0 10/15 0-23/2 * * ?
You may also want to avoid the two jobs running at the same time.
As far as I've understood, this is not possible within a single cron job.
The question "setup cron from morning to evening only" points out that three different cron jobs are needed, so I am closing my question.

creating cron job that sends output to file every day and overwrites this file every month

I need help with a cron job that sends output to a file every day and overwrites that file every month. My only problem is how to make it overwrite each month. I need this in a single job, so creating two jobs (one that writes to the file and another that removes it every month) is out of the picture.
You could run it every day but use date +%w to print the day-of-week number and act differently based on that: call with > to clobber the file instead of >> to append. (Use date +%d, the day of the month, for a monthly rather than weekly overwrite.)
Note that some cron daemons require % to be escaped, hence \%.
# Run every day at 00:30 but overwrite file on Mondays; append every other day.
# Note that this requires bash as your shell.
# May need to override with SHELL=/bin/bash
30 00 * * * if [ "$(date +\%w)" = "1" ]; then /your/command > /your/logfile; else /your/command >> /your/logfile; fi
Edit:
You mention in comments above that your actual goal is log rotation.
The norm for Linux systems is to use something like logrotate to manage logs like this. That also has the advantage that you can keep multiple previous log files and compress them if you like.
I would recommend making use of a logrotate config snippet to accomplish your goal instead of doing it in the cron job itself. To put this in the cron job is counter-intuitive if it's merely for log rotation.
Here's an example logrotate snippet, which may go in a location like /etc/logrotate.d/yourapp depending on which Linux distribution you're using.
/var/log/yourlog {
    daily
    missingok
    # keep one year of logs
    rotate 365
    compress
    # keep the first one uncompressed for ease of viewing
    delaycompress
}
This will result in your log file being rotated daily, with the first iteration being like /var/log/yourlog.1 and then compressed iterations like /var/log/yourlog.2.gz, /var/log/yourlog.3.gz and so on.
In my opinion therefore, your question is not actually a cron question. The kind of cron trickery used above would only be appropriate in situations such as when you want a job to fire on the last Sunday of the month, or the last day of the month, or other criteria that can't be expressed in cron syntax.
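As an illustration of the "last day of the month" case, here is a small sketch. It assumes GNU date (the -d option behaves differently on BSD and macOS), and the function name is made up for this example.

```shell
#!/bin/sh
# Succeeds (exit status 0) when the given day is the last day of its month,
# i.e. when the following day is the 1st. Defaults to today.
# Assumes GNU date; the -d option is not portable to BSD/macOS date.
is_last_day_of_month() {
    day=${1:-today}
    [ "$(date -d "$day +1 day" +%d)" = "01" ]
}

# Possible crontab use: fire on days 28-31 and let the test decide.
# Remember that % must be escaped as \% inside a crontab line:
#   30 0 28-31 * * [ "$(date -d tomorrow +\%d)" = "01" ] && /your/command
```

Cron itself only narrows the candidates to days 28 through 31; the date test picks the one that is actually last.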

Linux: Start a cron job inside another cron job

I am dealing with a workflow where I need to start three processes. The first process is to be scheduled at the beginning of every hour, and the other two at the 45th and 52nd minute of every hour.
Instead of making the client schedule separate jobs on their server, I would rather have just one job, configured to run at the beginning of every hour, that does a bunch of work and then starts the other two jobs at their respective times, i.e. the 45th and 52nd minute of the hour.
Is there any way to do this?
I don't have any experience with shell scripting and have always scheduled cron jobs manually in the crontab.
Thanks!
